Working with Tables with R
Introduction to Tables in R

In R, a table summarizes categorical data. Tables are usually built from factors or other categorical variables and hold the count (frequency) of each category, which makes them useful for checking how categorical data are distributed during exploratory data analysis.

Creating Tables

Using table()

The table() function is the most common way to create a frequency table from a vector or from the columns of a data frame.

Basic Usage:

    # Create a vector of categorical data
    categories <- c("A", "B", "A", "C", "B", "A", "C", "C", "B")

    # Create a frequency table
    freq_table <- table(categories)
    print(freq_table)

    # Output:
    # categories
    # A B C
    # 3 3 3

This table shows the count of each category in the categories vector.

Using table() with Multiple Factors

You can create a contingency table (cross-tabulation) for two or more factors.

    # Create vectors for two factors
    gender <- c("Male", "Female", "Female", "Male", "Male", "Female")
    age_group <- c("Young", "Old", "Young", "Young", "Old", "Old")

    # Create a contingency table
    contingency_table <- table(gender, age_group)
    print(contingency_table)

    # Output:
    #         age_group
    # gender   Old Young
    #   Female   2     1
    #   Male     1     2

This table shows the count of each combination of gender and age_group. Because age_group is a character vector here, its categories are listed in alphabetical order, so Old comes before Young.

Manipulating Tables

Accessing Table Elements

You can access specific elements of a table by indexing it with row and column names (or positions).

    # Access the count of Females in the Old age group
    count_female_old <- contingency_table["Female", "Old"]
    print(count_female_old)

    # Output:
    # [1] 2

Adding and Removing Table Elements

One way to add or remove categories in a table is to change the levels of the underlying factor and call table() again. A level that is declared but never observed appears with a count of zero.
    # Add a new level to the 'age_group' factor
    age_group <- factor(age_group, levels = c("Young", "Old", "Middle-aged"))

    # Create a new contingency table with the additional level
    contingency_table_updated <- table(gender, age_group)
    print(contingency_table_updated)

    # Output:
    #         age_group
    # gender   Young Old Middle-aged
    #   Female     1   2           0
    #   Male       2   1           0

Converting Tables to Data Frames

You can convert a table to a data frame for easier manipulation and analysis. Each combination of categories becomes one row, with its count in the Freq column.

    # Convert the contingency table to a data frame
    df_from_table <- as.data.frame(contingency_table)
    print(df_from_table)

    # Output:
    #   gender age_group Freq
    # 1 Female       Old    2
    # 2   Male       Old    1
    # 3 Female     Young    1
    # 4   Male     Young    2

Analyzing Tables

Computing Proportions

You can compute proportions from a frequency table to understand the relative distribution.

    # Compute proportions
    prop_table <- prop.table(freq_table)
    print(prop_table)

    # Output:
    # categories
    #         A         B         C
    # 0.3333333 0.3333333 0.3333333

Aggregating Data

Once a table has been converted to a data frame, you can use aggregate() to summarize the counts across different dimensions.

    # Aggregate data by gender and age group to compute the total counts
    agg_table <- aggregate(Freq ~ gender + age_group, data = df_from_table, FUN = sum)
    print(agg_table)

    # Output:
    #   gender age_group Freq
    # 1 Female       Old    2
    # 2   Male       Old    1
    # 3 Female     Young    1
    # 4   Male     Young    2

Each combination of gender and age group occurs only once in df_from_table, so the sums here equal the original counts. Dropping a term from the formula (for example, Freq ~ gender) collapses the table over that dimension.

Marginal Tables

You can compute marginal totals for rows or columns.

    # Compute row-wise marginal totals
    row_totals <- margin.table(contingency_table, 1)
    print(row_totals)

    # Compute column-wise marginal totals
    col_totals <- margin.table(contingency_table, 2)
    print(col_totals)

    # Output:
    # Row totals:
    # gender
    # Female   Male
    #      3      3
    # Column totals:
    # age_group
    #   Old Young
    #     3     3

Extended Examples

Example: Creating and Analyzing a Multi-Dimensional Table

Suppose we have a data frame with more complex categorical data.
    # Create a more complex data frame
    data <- data.frame(
      region = factor(c("North", "South", "East", "West", "North", "East")),
      outcome = factor(c("Success", "Failure", "Success", "Success", "Failure", "Failure"))
    )

    # Create a two-way table of region by outcome.
    # dnn labels the dimensions; without it, table(data$region, data$outcome)
    # prints no "region" / "outcome" headers.
    multi_table <- table(data$region, data$outcome, dnn = c("region", "outcome"))
    print(multi_table)

    # Output:
    #        outcome
    # region  Failure Success
    #   East        1       1
    #   North       1       1
    #   South       1       0
    #   West        0       1

In this table, we see the counts of Failure and Success outcomes for each region.

Example: Visualizing Tables

You can visualize tables using bar plots.

    # Create a bar plot of the frequency table
    barplot(freq_table,
            main = "Frequency of Categories",
            xlab = "Categories",
            ylab = "Frequency")

This will create a bar plot showing the frequency of each category.

Summary

Tables in R are a compact way to summarize and analyze categorical data. With table() you can create one-way frequency tables and multi-way contingency tables. Manipulating and analyzing these tables involves accessing elements, converting to data frames, computing proportions and marginal totals, and aggregating data. Plotting a table, for example with barplot(), gives a quick visual check of the distribution.
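
As a follow-up to the steps summarized above, here is a minimal sketch that combines several of them. It reuses the gender and age_group data from the contingency-table section, removes the unused Middle-aged level with droplevels(), appends totals with addmargins(), and computes within-row proportions with prop.table() and margin = 1. The variable names tab_trimmed, tab_with_totals, and row_props are illustrative assumptions and do not appear in the examples above.

```r
# Same illustrative data as in the contingency-table section
gender <- c("Male", "Female", "Female", "Male", "Male", "Female")
age_group <- factor(
  c("Young", "Old", "Young", "Young", "Old", "Old"),
  levels = c("Young", "Old", "Middle-aged")  # "Middle-aged" has no observations
)

# Removing a category: drop unused factor levels, then tabulate again
tab_trimmed <- table(gender, age_group = droplevels(age_group))

# Append row and column totals (a "Sum" row and a "Sum" column)
tab_with_totals <- addmargins(tab_trimmed)

# Proportions within each gender: margin = 1 makes every row sum to 1
row_props <- prop.table(tab_trimmed, margin = 1)

print(tab_with_totals)
print(round(row_props, 2))
```

With margin = 2 instead, prop.table() would give proportions within each age group, so that every column sums to 1.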