The UNION ALL Operator in SQL
Functionality of UNION ALL
The UNION ALL operator in SQL combines the result sets of two or more SELECT queries into a single result set. Unlike UNION, UNION ALL does not remove duplicate rows; it includes all rows from the combined queries, preserving duplicates.
Syntax of UNION ALL
The basic syntax for using UNION ALL is:
SELECT column1, column2, ... FROM table1 UNION ALL SELECT column1, column2, ... FROM table2;
Points to Remember:
- Column Compatibility: The queries combined with UNION ALL must have the same number of columns with compatible data types.
- Column Order: The order of columns must be identical in all SELECT statements.
Example of UNION ALL
Consider the following tables:
- Table: Sales2023
ID | Product | Amount |
1 | Laptop | 1000 |
2 | Smartphone | 500 |
- Table: Sales2024
ID | Product | Amount |
3 | Tablet | 300 |
4 | Headphones | 150 |
2 | Smartphone | 500 |
Query Using UNION ALL:
SELECT ID, Product, Amount FROM Sales2023 UNION ALL SELECT ID, Product, Amount FROM Sales2024;
Result:
ID | Product | Amount |
1 | Laptop | 1000 |
2 | Smartphone | 500 |
3 | Tablet | 300 |
4 | Headphones | 150 |
2 | Smartphone | 500 |
Here, UNION ALL includes all rows, including duplicates. The row for Smartphone appears twice.
Detailed Points and Use Cases
Preservation of Duplicates
- Duplicates Not Removed: Unlike UNION, which removes duplicates, UNION ALL preserves all occurrences of rows. This can be useful when duplicates are significant or need to be kept for further analysis.
Performance
- Better Performance: UNION ALL is generally more performant than UNION because it does not involve the overhead of removing duplicates. This extra step in UNION can be resource-intensive, especially for large datasets.
- Indexing: Using indexes on columns involved in UNION ALL queries can improve performance, though UNION ALL is usually faster due to the absence of duplicate removal.
Sorting Results
- ORDER BY Clause: As with UNION, to sort the results of a UNION ALL query, use the ORDER BY clause at the end of the final combined query. You can sort by any column from the result set.
Example with Sorting:
SELECT ID, Product, Amount FROM Sales2023 UNION ALL SELECT ID, Product, Amount FROM Sales2024 ORDER BY Amount DESC;
This query combines sales data from both years and sorts the results by Amount in descending order.
Use Cases
- Complete Data Combination: UNION ALL is ideal when you need to combine datasets while keeping all occurrences, including duplicates. This is useful when each instance is important, such as counting transactions or events.
- Data Aggregation: When aggregating data from multiple sources or periods and all instances need to be preserved for analysis, UNION ALL is effective.
- Transaction Reporting: In financial or sales reporting where each record is unique and needs to be counted separately, UNION ALL ensures all transactions are included.
Common Pitfalls and Tips
- Check for Duplicates: Ensure that preserving duplicates is intentional and meaningful for your analysis or reporting.
- Performance Optimization: For large datasets, use UNION ALL to improve performance if duplicate removal is not required.
- Result Validity: If you need a result set without duplicates for final analysis or reporting, consider using UNION after combining the datasets.
Practical Example of Using UNION ALL
Imagine you have sales tables for different quarters and want to consolidate all sales data into one list for a comprehensive analysis.
Tables:
- Table: SalesQ1
ID | Product | Amount |
1 | Laptop | 1000 |
2 | Smartphone | 500 |
- Table: SalesQ2
ID | Product | Amount |
3 | Tablet | 300 |
4 | Headphones | 150 |
2 | Smartphone | 500 |
Query to Consolidate Data:
SELECT ID, Product, Amount FROM SalesQ1 UNION ALL SELECT ID, Product, Amount FROM SalesQ2;
This query combines sales data from both quarters and keeps all occurrences of each record.