Extracting Sub-Data Frames
Extracting Rows and Columns
You can extract sub-data frames by selecting specific rows and columns.
Extracting Rows
To extract specific rows from a data frame, you can use indices or logical conditions.
Example: Extraction by Indices
# Create a data frame
df <- data.frame(Name = c("Alice", "Bob", "Charlie", "David"),
                 Age = c(25, 30, 35, 40),
                 City = c("Paris", "London", "Berlin", "New York"))
# Extract rows 1 and 3
subset_rows <- df[c(1, 3), ]
print(subset_rows)
# Output (the original row names 1 and 3 are kept):
#      Name Age   City
# 1   Alice  25  Paris
# 3 Charlie  35 Berlin
Example: Extraction by Condition
# Extract rows where Age is greater than 30
subset_age <- df[df$Age > 30, ]
print(subset_age)
# Output (the original row names 3 and 4 are kept):
#      Name Age     City
# 3 Charlie  35   Berlin
# 4   David  40 New York
Extracting Columns
To extract specific columns, you can use indices or column names.
Example: Extraction by Column Names
# Extract the "Name" column (the result is still a data frame)
name_column <- df["Name"]
print(name_column)
# Output:
#      Name
# 1   Alice
# 2     Bob
# 3 Charlie
# 4   David
Example: Extraction by Indices
# Extract the first column (the result is a vector, not a data frame)
first_column <- df[, 1]
print(first_column)
# Output:
# [1] "Alice"   "Bob"     "Charlie" "David"
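Because df[, 1] simplifies a single column to a vector, it can help to see how to keep the data frame structure when that is what you need. The following is a minimal sketch using the drop argument of the [ operator; the object names as_vector and as_df are illustrative only.

```r
# Recreate the example data frame so this block runs on its own
df <- data.frame(Name = c("Alice", "Bob", "Charlie", "David"),
                 Age = c(25, 30, 35, 40),
                 City = c("Paris", "London", "Berlin", "New York"))

# Selecting a single column with [, j] simplifies the result to a vector
as_vector <- df[, 1]
is.data.frame(as_vector)   # FALSE

# drop = FALSE keeps the result as a one-column data frame
as_df <- df[, 1, drop = FALSE]
is.data.frame(as_df)       # TRUE
names(as_df)               # "Name"
```

Passing drop = FALSE is a common defensive choice in code where the number of selected columns is not known in advance.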
Extraction with Logical Conditions
Logical conditions allow you to extract subsets based on specific criteria.
Example: Extraction with Multiple Conditions
# Extract rows where Age is at least 25 and City is "Paris"
# (with Age > 25 the result would be empty, because Alice is exactly 25)
subset_condition <- df[df$Age >= 25 & df$City == "Paris", ]
print(subset_condition)
# Output:
#    Name Age  City
# 1 Alice  25 Paris
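Conditions can also be combined with | (or), and %in% can be used to match a column against several values. The sketch below shows both on the same example data; the object names either and in_cities are illustrative only.

```r
# Recreate the example data frame so this block runs on its own
df <- data.frame(Name = c("Alice", "Bob", "Charlie", "David"),
                 Age = c(25, 30, 35, 40),
                 City = c("Paris", "London", "Berlin", "New York"))

# Rows where Age is greater than 35 OR City is "Paris"
either <- df[df$Age > 35 | df$City == "Paris", ]
print(either$Name)      # "Alice" "David"

# Rows whose City is one of several values
in_cities <- df[df$City %in% c("Paris", "Berlin"), ]
print(in_cities$Name)   # "Alice" "Charlie"
```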
Extraction Using subset()
The subset() function allows you to filter data based on conditions.
Example: Extraction with subset()
# Extract rows where Age is less than 35
subset_df <- subset(df, Age < 35)
print(subset_df)
# Output:
#    Name Age   City
# 1 Alice  25  Paris
# 2   Bob  30 London
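subset() can also restrict the columns in the same call through its select argument. A short sketch on the same example data follows; the object name subset_cols is illustrative only.

```r
# Recreate the example data frame so this block runs on its own
df <- data.frame(Name = c("Alice", "Bob", "Charlie", "David"),
                 Age = c(25, 30, 35, 40),
                 City = c("Paris", "London", "Berlin", "New York"))

# Rows where Age is less than 35, keeping only the Name and City columns
subset_cols <- subset(df, Age < 35, select = c(Name, City))
print(subset_cols)
# Output:
#    Name   City
# 1 Alice  Paris
# 2   Bob London
```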
Extraction Using dplyr Functions
The dplyr package provides powerful functions for manipulating and extracting subsets of data.
Example: Extraction with filter() and select()
# Load the dplyr package
library(dplyr)
# Extract rows where Age is greater than 30 and select the "Name" and "City" columns
subset_dplyr <- df %>%
  filter(Age > 30) %>%
  select(Name, City)
print(subset_dplyr)
# Output:
#      Name     City
# 1 Charlie   Berlin
# 2   David New York
Extraction Using slice() for Row Ranges
The slice() function from dplyr selects rows by position, which makes it convenient for extracting ranges of rows.
Example: Extraction of Row Ranges
# Extract rows 2 to 4
subset_slice <- df %>% slice(2:4)
print(subset_slice)
# Output:
#      Name Age     City
# 1     Bob  30   London
# 2 Charlie  35   Berlin
# 3   David  40 New York
Extraction with which() for Logical Indices
The which() function can be used to get indices corresponding to a logical condition.
Example: Extraction with which()
# Get indices of rows where Age is greater than 30
indices <- which(df$Age > 30)
# Use the indices to extract the sub-data frame
subset_which <- df[indices, ]
print(subset_which)
# Output (the original row names 3 and 4 are kept):
#      Name Age     City
# 3 Charlie  35   Berlin
# 4   David  40 New York
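One practical difference between which() and plain logical indexing shows up when the condition contains missing values. The sketch below uses a small hypothetical data frame, df_na, that is not part of the original example.

```r
# Hypothetical data with a missing Age, used only for this illustration
df_na <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                    Age = c(25, NA, 35))

# The condition evaluates to FALSE, NA, TRUE
condition <- df_na$Age > 30

# Logical indexing returns an all-NA row for the NA entry
by_logical <- df_na[condition, ]
nrow(by_logical)   # 2

# which() keeps only the positions where the condition is TRUE
by_which <- df_na[which(condition), ]
nrow(by_which)     # 1
```

Wrapping the condition in which() is therefore a simple way to avoid rows of NA in the result when the filtered column may contain missing values.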
Extraction Using Negative Indices
Negative indices allow you to exclude specific rows or columns during extraction.
Example: Excluding Rows or Columns
# Exclude row 2
subset_exclude_row <- df[-2, ]
print(subset_exclude_row)
# Output (for rows):
#      Name Age     City
# 1   Alice  25    Paris
# 3 Charlie  35   Berlin
# 4   David  40 New York
# Exclude the "City" column (column 3)
subset_exclude_col <- df[, -3]
print(subset_exclude_col)
# Output (for columns):
#      Name Age
# 1   Alice  25
# 2     Bob  30
# 3 Charlie  35
# 4   David  40
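Negative indexing works with positions only; a minus sign cannot be applied to a column name. A minimal sketch of two common ways to exclude a column by name follows; the object names no_city and no_city2 are illustrative only.

```r
# Recreate the example data frame so this block runs on its own
df <- data.frame(Name = c("Alice", "Bob", "Charlie", "David"),
                 Age = c(25, 30, 35, 40),
                 City = c("Paris", "London", "Berlin", "New York"))

# Keep every column whose name is not "City"
no_city <- df[, !(names(df) %in% "City"), drop = FALSE]
names(no_city)    # "Name" "Age"

# Equivalent approach: compute the names to keep with setdiff()
no_city2 <- df[, setdiff(names(df), "City"), drop = FALSE]
names(no_city2)   # "Name" "Age"
```

Excluding by name tends to be more robust than excluding by position, since it keeps working if the column order changes.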