Merging Data Frames with R

Merging Data Frames

Merging Data Frames is a fundamental operation in data analysis, allowing you to combine multiple datasets based on common keys. In R, there are several methods to merge Data Frames, each suited to specific needs.

Merging with merge()

The merge() function is the primary method for combining Data Frames in R. It works similarly to SQL joins and allows you to specify merge keys and join types.

Basic Syntax

The syntax for the merge() function is:

merge(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FALSE, all.x = FALSE, all.y = FALSE)
  • x, y: The Data Frames to merge.
  • by: The name(s) of the columns on which to merge (for both Data Frames).
  • by.x: The name(s) of the columns in Data Frame x.
  • by.y: The name(s) of the columns in Data Frame y.
  • all: Performs a full outer join (keeps all rows from both Data Frames).
  • all.x: Performs a left join (keeps all rows from Data Frame x).
  • all.y: Performs a right join (keeps all rows from Data Frame y).

Inner Join

Example: Inner Join on a Common Key 

# Create two Data Frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
                   Name = c("Alice", "Bob", "Charlie", "David"))
df2 <- data.frame(ID = c(3, 4, 5, 6),
                   Age = c(25, 30, 35, 40))
# Merge Data Frames on the ID column
merged_df <- merge(df1, df2, by = "ID")
print(merged_df)
# Output:
#   ID    Name Age
# 1  3 Charlie  25
# 2  4   David  30

 Full Outer Join

Example: Full Outer Join 

# Merge Data Frames, keeping all rows from both Data Frames
merged_df_full <- merge(df1, df2, by = "ID", all = TRUE)
print(merged_df_full)
# Output:
#   ID     Name Age
# 1  1   Alice  NA
# 2  2     Bob  NA
# 3  3 Charlie  25
# 4  4   David  30
# 5  5     NA  35
# 6  6     NA  40

Left and Right Joins

Example: Left Join 

# Merge Data Frames, keeping all rows from Data Frame df1
merged_df_left <- merge(df1, df2, by = "ID", all.x = TRUE)
print(merged_df_left)
# Output:
#   ID     Name Age
# 1  1   Alice  NA
# 2  2     Bob  NA
# 3  3 Charlie  25
# 4  4   David  30

Example: Right Join 

# Merge Data Frames, keeping all rows from Data Frame df2
merged_df_right <- merge(df1, df2, by = "ID", all.y = TRUE)
print(merged_df_right)
# Output:
#   ID     Name Age
# 1  3 Charlie  25
# 2  4   David  30
# 3  5      NA  35
# 4  6      NA  40

 Using dplyr for Data Frame Merging

The dplyr package provides modern and intuitive functions for merging Data Frames, including left_join(), right_join(), inner_join(), and full_join().

Basic Syntax

The dplyr functions for merging are:

  • left_join(x, y, by = NULL): Left join.
  • right_join(x, y, by = NULL): Right join.
  • inner_join(x, y, by = NULL): Inner join.
  • full_join(x, y, by = NULL): Full outer join.

Inner Join with dplyr

Example: Inner Join 

# Load the dplyr package
library(dplyr)
# Merge Data Frames, keeping only the rows with matching keys
joined_df_inner <- inner_join(df1, df2, by = "ID")
print(joined_df_inner)
# Output:
#   ID     Name Age
# 1  3 Charlie  25
# 2  4   David  30

Full Outer Join with dplyr

Example: Full Outer Join 

# Merge Data Frames, keeping all rows from both Data Frames
joined_df_full <- full_join(df1, df2, by = "ID")
print(joined_df_full)
# Output:
#   ID     Name Age
# 1  1   Alice  NA
# 2  2     Bob  NA
# 3  3 Charlie  25
# 4  4   David  30
# 5  5      NA  35
# 6  6      NA  40

Left Join with dplyr

Example: Left Join 

# Merge Data Frames, keeping all rows from Data Frame df1
joined_df_left <- left_join(df1, df2, by = "ID")
print(joined_df_left)
# Output:
#   ID    Name Age
# 1  1   Alice  NA
# 2  2     Bob  NA
# 3  3 Charlie  25
# 4  4   David  30

Merging Data Frames with Multiple Columns

When Data Frames to be merged have multiple columns to base the merge on, you can specify multiple columns in the by, by.x, or by.y arguments.

Example: Merging on Multiple Columns 

# Create two Data Frames with multiple key columns
df1 <- data.frame(ID = c(1, 2, 3, 4),
                   Type = c("A", "B", "A", "B"),
                   Name = c("Alice", "Bob", "Charlie", "David"))
df2 <- data.frame(ID = c(3, 4, 5, 6),
                   Type = c("A", "B", "A", "B"),
                   Age = c(25, 30, 35, 40))
# Merge on the ID and Type columns
merged_df_multi <- merge(df1, df2, by = c("ID", "Type"))
print(merged_df_multi)
# Output:
#   ID Type   Name Age
# 1  3   A Charlie  25
# 2  4   B   David  30

Merging Data Frames with Non-Common Columns

If the Data Frames being merged have non-common columns, these columns will automatically be added to the resulting Data Frame with NA values for missing combinations.

Example: Merging with Non-Common Columns 

# Create two Data Frames with different columns
df1 <- data.frame(ID = c(1, 2, 3),
                   Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = c(2, 3, 4),
                   Age = c(25, 30, 35),
                   City = c("Paris", "Berlin", "New York"))
# Merge Data Frames on the ID column
merged_df_non_common <- merge(df1, df2, by = "ID")
print(merged_df_non_common)
# Output:
#   ID     Name Age   City
# 1  2     Bob  25  Paris
# 2  3 Charlie  30 Berlin

Laisser un commentaire

Votre adresse e-mail ne sera pas publiée. Les champs obligatoires sont indiqués avec *

Facebook
Twitter
LinkedIn
WhatsApp
Email
Print