Common Functions Used with Factors
levels()
The levels() function is used to get or set the levels of a factor. Levels are the distinct categories that a factor can take.
Getting Levels
# Create a factor
data <- factor(c("High", "Low", "Medium", "Medium", "High", "Low"))
# Get the levels of the factor
levels(data)
# Output:
# [1] "High"   "Low"    "Medium"
Setting Levels
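Assigning a character vector to levels() replaces the level labels by position: the first existing level takes the first new label, the second takes the second, and so on. This renames the levels rather than reordering them, so the value printed for each observation changes. Any extra labels are added as unused levels.
# Rename the levels by position and add an unused level, "Very High"
# (the existing levels are "High", "Low", "Medium", in that order)
levels(data) <- c("Low", "Medium", "High", "Very High")
# Print the factor with updated levels
print(data)
# Output:
# [1] Low    Medium High   High   Low    Medium
# Levels: Low Medium High Very High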
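If the goal is to change the order of the levels while keeping each observation's value, one option is to rebuild the factor with factor(x, levels = ...) instead of assigning to levels(). The following is a minimal sketch of the difference; the variable names sizes, reordered, and relabeled are illustrative and not part of the examples above.

```r
# Illustrative data (same values as the example factor)
sizes <- factor(c("High", "Low", "Medium", "Medium", "High", "Low"))

# Reorder the levels: each observation keeps its value
reordered <- factor(sizes, levels = c("Low", "Medium", "High"))
as.character(reordered)
# [1] "High"   "Low"    "Medium" "Medium" "High"   "Low"

# Assign to levels(): labels are replaced by position, so values change
relabeled <- sizes
levels(relabeled) <- c("Low", "Medium", "High")
as.character(relabeled)
# [1] "Low"    "Medium" "High"   "High"   "Low"    "Medium"
```

Use the first form to control sort or plot order, and the second only when you really mean to rename categories.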
nlevels()
The nlevels() function returns the number of levels in a factor, including levels that do not occur in the data.
# Number of levels in the factor (the unused "Very High" is counted)
nlevels(data)
# Output:
# [1] 4
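Because nlevels() counts declared levels rather than observed values, it can be larger than the number of categories that actually appear. A small sketch of how you might compare the two, using base R's droplevels() to discard unused levels; the variable name sizes is illustrative.

```r
# A factor with four declared levels but only three observed values
sizes <- factor(c("High", "Low", "Medium"),
                levels = c("Low", "Medium", "High", "Very High"))

nlevels(sizes)              # declared levels
# [1] 4
nlevels(droplevels(sizes))  # levels that actually occur
# [1] 3
```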
as.factor()
The as.factor() function converts a vector, typically a character or numeric vector, into a factor. By default, the levels are the sorted unique values of the input.
# Convert a character vector to a factor
char_vector <- c("Red", "Green", "Blue", "Green", "Red")
factor_char_vector <- as.factor(char_vector)
# Print the factor
print(factor_char_vector)
# Output:
# [1] Red   Green Blue  Green Red
# Levels: Blue Green Red
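The numeric case deserves a short illustration, because converting such a factor back to numbers is a common source of mistakes. The sketch below assumes a small made-up vector, num_vector; calling as.numeric() directly on the factor returns the internal level codes, so the usual workaround is to go through as.character() first.

```r
# Convert a numeric vector to a factor (illustrative values)
num_vector <- c(10, 20, 10, 30)
num_factor <- as.factor(num_vector)

levels(num_factor)
# [1] "10" "20" "30"

# Direct conversion returns the level codes, not the original numbers
as.numeric(num_factor)
# [1] 1 2 1 3

# Going through character recovers the original values
as.numeric(as.character(num_factor))
# [1] 10 20 10 30
```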
summary()
The summary() function provides a summary of a factor, showing the frequency of each level.
# Summary of the factor
summary(factor_char_vector)
# Output:
#  Blue Green   Red
#     1     2     2
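The result of summary() on a factor is a named integer vector, so individual counts can be looked up by level name. A brief sketch, using an illustrative factor named colors:

```r
colors <- factor(c("Red", "Green", "Blue", "Green", "Red"))
counts <- summary(colors)

# Look up the count for one level by name
counts[["Green"]]
# [1] 2

# The names are the factor levels, in level order
names(counts)
# [1] "Blue"  "Green" "Red"
```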
table()
The table() function creates a frequency table of the factor levels. This function is useful for seeing how many observations fall into each category.
# Frequency table of the factor
freq_table <- table(factor_char_vector)
print(freq_table)
# Output:
# factor_char_vector
#  Blue Green   Red
#     1     2     2
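A frequency table can also be turned into proportions with base R's prop.table(), and it reports a count of zero for levels that are declared but never observed. The sketch below assumes an illustrative factor, colors, with an extra unused level "Yellow".

```r
# A factor with one declared level ("Yellow") that never occurs
colors <- factor(c("Red", "Green", "Blue", "Green", "Red"),
                 levels = c("Blue", "Green", "Red", "Yellow"))

counts <- table(colors)
counts[["Yellow"]]   # unused levels still appear, with a count of zero
# [1] 0

# Convert counts to proportions of the total
props <- prop.table(counts)
props[["Green"]]
# [1] 0.4
```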
relevel()
The relevel() function changes the reference level of a factor. This is useful in modeling when you want to change which level is used as the baseline.
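# Relevel the factor to set "Red" as the reference level
# ("Blue" is already the first level, so ref = "Blue" would change nothing)
relevel_factor <- relevel(factor_char_vector, ref = "Red")
# Print the releveled factor: the values are unchanged, only the level order differs
print(relevel_factor)
# Output:
# [1] Red   Green Blue  Green Red
# Levels: Red Blue Green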
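To see why the reference level matters in modeling, it can help to fit the same linear model before and after releveling. The following is a rough sketch with made-up data (group, y); with R's default treatment contrasts, the intercept is the mean of the reference group and the other coefficients are differences from it.

```r
# Made-up data: two observations per group
group <- factor(c("Blue", "Blue", "Green", "Green", "Red", "Red"))
y <- c(1, 3, 4, 6, 9, 11)

# Default reference level is the first level, "Blue"
fit_default <- lm(y ~ group)
names(coef(fit_default))
# [1] "(Intercept)" "groupGreen"  "groupRed"

# Make "Red" the baseline instead
group_red <- relevel(group, ref = "Red")
fit_red <- lm(y ~ group_red)
names(coef(fit_red))
# [1] "(Intercept)"    "group_redBlue"  "group_redGreen"
```

In the first fit the intercept estimates the mean of the "Blue" group (2); in the second it estimates the mean of the "Red" group (10).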
fct_reorder()
From the forcats package, fct_reorder() reorders the levels of a factor based on another variable. This is useful when you want to order levels by some numeric summary.
# Install and load the forcats package if not already installed
# install.packages("forcats")
library(forcats)
# Create a data frame
df <- data.frame(
  category = factor(c("A", "B", "C", "B", "A", "C")),
  value = c(10, 20, 30, 40, 50, 60)
)
# Reorder levels of 'category' based on the mean of 'value'
# (the means here are A = 30, B = 30, C = 45, so the order stays A, B, C)
df$category <- fct_reorder(df$category, df$value, .fun = mean)
# Print the reordered factor
print(df$category)
# Output:
# [1] A B C B A C
# Levels: A B C
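The effect of fct_reorder() is easier to see when the group means differ from alphabetical order. The sketch below uses made-up vectors, region and score, and assumes the forcats package is installed; the .desc argument reverses the ordering.

```r
library(forcats)

# Made-up data: group means are A = 40, B = 7.5, C = 22.5
region <- factor(c("A", "B", "C", "A", "B", "C"))
score  <- c(30, 10, 20, 50, 5, 25)

# Levels ordered by increasing mean score
levels(fct_reorder(region, score, .fun = mean))
# [1] "B" "C" "A"

# Levels ordered by decreasing mean score
levels(fct_reorder(region, score, .fun = mean, .desc = TRUE))
# [1] "A" "C" "B"
```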
fct_recode()
Also from the forcats package, fct_recode() allows you to rename the levels of a factor.
# Recode the levels of a factor (syntax: "new name" = "old name")
df$category <- fct_recode(df$category,
  "Group 1" = "A",
  "Group 2" = "B",
  "Group 3" = "C"
)
# Print the recoded factor
print(df$category)
# Output:
# [1] Group 1 Group 2 Group 3 Group 2 Group 1 Group 3
# Levels: Group 1 Group 2 Group 3
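You do not have to rename every level at once: levels that are not mentioned in the call keep their existing names. A short sketch with an illustrative factor, letters_f, assuming forcats is installed:

```r
library(forcats)

letters_f <- factor(c("A", "B", "C", "B"))

# Rename only "A"; "B" and "C" are left as they are
partly_recoded <- fct_recode(letters_f, "Group 1" = "A")
levels(partly_recoded)
# [1] "Group 1" "B"       "C"
```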
fct_collapse()
fct_collapse() is another function from the forcats package that allows you to combine levels into broader categories.
# Collapse the levels of the factor
# (after the recode above, the levels are "Group 1", "Group 2", "Group 3")
df$category <- fct_collapse(df$category,
  "Group A" = c("Group 1", "Group 2"),
  "Group B" = "Group 3"
)
# Print the collapsed factor
print(df$category)
# Output:
# [1] Group A Group A Group B Group A Group A Group B
# Levels: Group A Group B
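Collapsing levels also merges their counts, which is a quick way to check that the grouping did what you intended. A minimal sketch on a fresh illustrative factor, letters_f, assuming forcats is installed; levels not named in the call are kept unchanged.

```r
library(forcats)

letters_f <- factor(c("A", "B", "C", "B", "A", "C"))

# Merge "A" and "B" into one level; "C" is left as is
merged <- fct_collapse(letters_f, AB = c("A", "B"))
levels(merged)
# [1] "AB" "C"

# The counts of the merged levels are added together
table(merged)[["AB"]]
# [1] 4
```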
fct_expand()
fct_expand() ensures that all levels specified are included in the factor, even if they are not present in the data.
# Expand the factor to include all specified levels
# ("Group C" does not occur in the data and is added as an unused level)
df$category <- fct_expand(df$category, "Group A", "Group B", "Group C")
# Print the expanded factor
print(df$category)
# Output:
# [1] Group A Group A Group B Group A Group A Group B
# Levels: Group A Group B Group C
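One practical reason to add levels that are absent from the data is to make them show up, with a count of zero, in frequency tables. The sketch below illustrates this with a made-up factor, letters_f, assuming forcats is installed.

```r
library(forcats)

letters_f <- factor(c("A", "B", "A"))

# Add a level "C" that has no observations
expanded <- fct_expand(letters_f, "C")
levels(expanded)
# [1] "A" "B" "C"

# The new level appears in the frequency table with a zero count
table(expanded)[["C"]]
# [1] 0
```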