The aggregate() Function in R

The aggregate() function in R is used to compute summary statistics of data grouped by one or more factors. It is particularly useful when you want to calculate statistics like the mean, sum, or median of a variable, split by levels of one or more grouping variables.

Basic Syntax 

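aggregate(x, by, FUN, ...)            # data frame / vector interface
aggregate(formula, data, FUN, ...)    # formula interface, used in the examples below
  • x: The data to be aggregated (typically a numeric vector or data frame).
  • by: A list of grouping variables, each with one element per row of x. Naming the list elements sets the names of the grouping columns in the result.
  • formula: A formula such as value ~ group, with the variable(s) to summarise on the left of ~ and the grouping variable(s) on the right.
  • data: The data frame containing the variables named in the formula.
  • FUN: The function to apply to each group (e.g., mean, sum, median).
  • ...: Further arguments passed on to FUN.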
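As a minimal sketch of the x/by interface, the grouped mean from Example 1 can also be written without a formula. The names df and res are illustrative only. Note that by must be a list; naming its element (here group) gives the grouping column its name, whereas unnamed elements would become Group.1, Group.2, and so on.

```r
# Toy data with the same values as Example 1 (illustrative only)
df <- data.frame(
  group = c("A", "A", "B", "B", "C", "C"),
  value = c(10, 20, 30, 40, 50, 60)
)

# x: a data frame holding the column(s) to summarise
# by: a named list of grouping vectors, one entry per row of x
res <- aggregate(df["value"], by = list(group = df$group), FUN = mean)
print(res)
#   group value
# 1     A    15
# 2     B    35
# 3     C    55
```

Passing df["value"] (a one-column data frame) rather than df$value keeps the column name value in the result; a bare vector would come back in a column named x.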

Detailed Examples

Aggregating a Single Numeric Vector

Example 1: Calculate the mean of a numeric vector by a factor 

# Create a data frame
data <- data.frame(
  group = factor(c("A", "A", "B", "B", "C", "C")),
  value = c(10, 20, 30, 40, 50, 60)
)
# Aggregate to find the mean of 'value' for each 'group'
result <- aggregate(value ~ group, data = data, FUN = mean)
print(result)
# Output:
#   group value
# 1     A    15
# 2     B    35
# 3     C    55

In this example, aggregate() computes the mean of the value column for each level of the group factor.
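Arguments given after FUN are forwarded to it through .... A minimal sketch, using base R's quantile() and its probs argument to get the 90th percentile of value per group instead of the mean (df and res are illustrative names):

```r
df <- data.frame(
  group = factor(c("A", "A", "B", "B", "C", "C")),
  value = c(10, 20, 30, 40, 50, 60)
)

# probs is not an argument of aggregate(), so it is passed on to quantile()
res <- aggregate(value ~ group, data = df, FUN = quantile, probs = 0.9)
print(res)
```

With quantile()'s default interpolation (type 7), each two-value group gives a result 90% of the way from its smaller to its larger value, i.e. 19, 39 and 59.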

Aggregating with Multiple Factors

Example 2: Calculate the sum of a numeric variable grouped by two factors 

# Create a more complex data frame
data <- data.frame(
  group1 = factor(c("A", "A", "B", "B", "A", "B")),
  group2 = factor(c("X", "Y", "X", "Y", "X", "Y")),
  value = c(10, 20, 30, 40, 50, 60)
)
# Aggregate to find the sum of 'value' by 'group1' and 'group2'
result <- aggregate(value ~ group1 + group2, data = data, FUN = sum)
print(result)
# Output:
#   group1 group2 value
# 1      A      X    60
# 2      B      X    30
# 3      A      Y    20
# 4      B      Y   100

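Here, aggregate() calculates the sum of value for each combination of group1 and group2 that occurs in the data. The rows are ordered with the first grouping variable (group1) varying fastest.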
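If you would rather have the rows sorted by group1 first and then by group2, one simple approach (a sketch, not the only option) is to reorder the result with base R's order(). The names df and res are illustrative.

```r
df <- data.frame(
  group1 = factor(c("A", "A", "B", "B", "A", "B")),
  group2 = factor(c("X", "Y", "X", "Y", "X", "Y")),
  value  = c(10, 20, 30, 40, 50, 60)
)
res <- aggregate(value ~ group1 + group2, data = df, FUN = sum)

# Sort by group1, then group2, and reset row names for a clean printout
res <- res[order(res$group1, res$group2), ]
rownames(res) <- NULL
print(res)
#   group1 group2 value
# 1      A      X    60
# 2      A      Y    20
# 3      B      X    30
# 4      B      Y   100
```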

Using Custom Functions

Example 3: Apply a custom function to compute the range of values 

# Create a data frame
data <- data.frame(
  group = factor(c("A", "A", "B", "B", "A", "B")),
  value = c(10, 15, 30, 35, 25, 40)
)
# Define a custom function to calculate range
range_fun <- function(x) {
  return(max(x) - min(x))
}
# Aggregate to find the range of 'value' for each 'group'
result <- aggregate(value ~ group, data = data, FUN = range_fun)
print(result)
# Output:
#   group value
# 1     A    15
# 2     B    10

In this example, a custom function range_fun is used to calculate the range (difference between the maximum and minimum) of value for each group.
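A one-off function does not need a name. A minimal sketch of the same computation with an anonymous function built from base R's range(), which returns the minimum and maximum, and diff(), which subtracts them (df and res are illustrative names):

```r
df <- data.frame(
  group = factor(c("A", "A", "B", "B", "A", "B")),
  value = c(10, 15, 30, 35, 25, 40)
)

# range(x) gives c(min, max); diff() of that pair is max - min
res <- aggregate(value ~ group, data = df, FUN = function(x) diff(range(x)))
print(res)
#   group value
# 1     A    15
# 2     B    10
```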

Aggregating Data Frames

Example 4: Aggregating multiple columns 

# Create a data frame with multiple numeric columns
data <- data.frame(
  group = factor(c("A", "A", "B", "B", "A", "B")),
  value1 = c(10, 20, 30, 40, 50, 60),
  value2 = c(5, 15, 25, 35, 45, 55)
)
# Aggregate to find the mean of 'value1' and 'value2' for each 'group'
result <- aggregate(. ~ group, data = data, FUN = mean)
print(result)
# Output:
#   group   value1   value2
# 1     A 26.66667 21.66667
# 2     B 43.33333 38.33333

Here, the . on the left-hand side of the formula stands for every column not used on the right-hand side, so aggregate() calculates the mean of both value1 and value2 for each level of group.
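When the data frame also contains columns that should not be summarised, the . shortcut would include them too. A minimal sketch of listing the columns explicitly with cbind() on the left-hand side; the id column is a hypothetical addition to illustrate the point, and df and res are illustrative names.

```r
df <- data.frame(
  id     = 1:6,  # hypothetical identifier column that should not be averaged
  group  = factor(c("A", "A", "B", "B", "A", "B")),
  value1 = c(10, 20, 30, 40, 50, 60),
  value2 = c(5, 15, 25, 35, 45, 55)
)

# cbind() selects exactly which columns to summarise; id is left out
res <- aggregate(cbind(value1, value2) ~ group, data = df, FUN = mean)
print(res)
#   group   value1   value2
# 1     A 26.66667 21.66667
# 2     B 43.33333 38.33333
```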

Key Points to Remember

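  • Grouping Variables: In the formula interface, the grouping variables are the terms on the right-hand side of ~; in the x/by interface, they are supplied as a list through the by argument. You can group by one or more variables.
  • Aggregation Function: The FUN argument determines which summary statistic is computed. It is usually a function that takes a vector and returns a single value (e.g., mean, sum, median, or a custom function). If it returns several values, the corresponding result column becomes a matrix with one column per value.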
  • Data Frames and Vectors: The aggregate() function can handle both data frames (where multiple columns can be aggregated) and numeric vectors (where only one column is aggregated).
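When FUN returns more than one value per group, aggregate() stores the results as a matrix column. A minimal sketch, where the custom function returning a mean and a count is purely illustrative, and do.call(data.frame, ...) is one common way to flatten the matrix into ordinary columns:

```r
df <- data.frame(
  group = factor(c("A", "A", "B", "B", "C", "C")),
  value = c(10, 20, 30, 40, 50, 60)
)

# FUN returns a named vector of length 2, so 'value' becomes a 3 x 2 matrix column
res <- aggregate(value ~ group, data = df,
                 FUN = function(x) c(mean = mean(x), n = length(x)))

# Flatten the matrix column into separate columns named value.mean and value.n
flat <- do.call(data.frame, res)
print(flat)
```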

Summary

The aggregate() function in R summarises data by grouping factors. It computes summary statistics such as means and sums, or the result of any custom function, for each level (or combination of levels) of the grouping variables, which makes it a quick way to break a complex dataset down into manageable groups.