The UNION Operator in SQL
Functionality of UNION
The UNION operator in SQL is used to combine the result sets of two or more SELECT queries into a single result set. The key characteristic of UNION is that it removes duplicate rows from the result set. The result set will only include unique rows across all the combined queries.
Syntax of UNION
The basic syntax for using UNION is:
SELECT column1, column2, ... FROM table1 UNION SELECT column1, column2, ... FROM table2;
Points to Remember:
- Column Consistency: Each SELECT statement must have the same number of columns, and the columns must be in the same order. The data types of the corresponding columns should be compatible.
- Column Names: The column names in the result set are taken from the first SELECT statement.
Example of UNION
Consider the following tables:
- Table: EmployeesUSA
ID | Name | Position |
1 | John Smith | Manager |
2 | Emily Johnson | Developer |
- Table: EmployeesCanada
ID | Name | Position |
3 | Michael Brown | Developer |
4 | Sarah Davis | Designer |
Query Using UNION:
SELECT ID, Name, Position FROM EmployeesUSA UNION SELECT ID, Name, Position FROM EmployeesCanada;
Result:
ID | Name | Position |
1 | John Smith | Manager |
2 | Emily Johnson | Developer |
3 | Michael Brown | Developer |
4 | Sarah Davis | Designer |
Here, UNION combines the rows from both tables and removes any duplicate rows (if any existed).
Detailed Points and Use Cases
Column Matching and Compatibility
- Data Type Compatibility: The data types of corresponding columns in each SELECT statement must be compatible. For example, if the first column in one SELECT is an INTEGER, the corresponding column in the other SELECT should also be an INTEGER or a type that can be implicitly converted to INTEGER.
- Column Order: The order of columns must be the same across all SELECT statements. If SELECT column1, column2 is used in the first query, the subsequent queries should also use SELECT column1, column2.
Performance Considerations
- Duplicate Removal: Since UNION removes duplicate rows, it can be slower than UNION ALL, especially when dealing with large datasets. This is because the database engine has to perform additional processing to identify and eliminate duplicates.
- Indexes: Having appropriate indexes on the columns involved in UNION queries can improve performance, especially for large datasets.
Sorting Results
- ORDER BY Clause: To sort the results of a UNION query, you need to use the ORDER BY clause at the end of the final query. The ORDER BY clause can sort based on any of the columns from the SELECT statements.
Example with Sorting:
SELECT ID, Name, Position FROM EmployeesUSA UNION SELECT ID, Name, Position FROM EmployeesCanada ORDER BY Name;
This query sorts the combined result set by the Name column.
Use Cases
- Combining Similar Data: UNION is useful for combining data from similar tables that might have the same structure but are stored in different tables. For instance, combining employees from different country-specific tables into a single list.
- Creating Reports: When creating reports that need to aggregate data from various sources or segments, UNION allows you to merge these segments into a comprehensive report.
- Handling Partitioned Data: In cases where data is partitioned across multiple tables or databases, UNION helps in querying and aggregating this distributed data into a unified result.
Common Pitfalls and Tips
- Check Data Types: Ensure that corresponding columns have compatible data types to avoid errors or unexpected results.
- Column Count: Always match the number of columns and their order across all SELECT statements.
- Performance Tuning: For large datasets, consider using UNION ALL if duplicates are not a concern, or ensure that appropriate indexes are in place to optimize performance.
Example of Using UNION in Practice
Let’s consider a scenario where you have different tables for different branches of a company and you want to consolidate the employee data.
Tables:
- Table: EmployeesNY
ID | Name | Department |
1 | Alice Green | HR |
2 | Bob White | IT |
- Table: EmployeesLA
ID | Name | Department |
3 | Charlie Black | IT |
4 | Dana Blue | Marketing |
Query to Consolidate Data:
SELECT ID, Name, Department FROM EmployeesNY UNION SELECT ID, Name, Department FROM EmployeesLA;
This query combines employees from the New York and Los Angeles branches into a single result set.