Analytical Functions in SQL
Analytical functions are powerful tools for performing complex calculations over a range of rows defined by a window. They are essential for advanced data analysis and reporting. Here are the primary analytical functions with examples:
ROW_NUMBER()
- ROW_NUMBER() OVER (PARTITION BY partition_column ORDER BY order_column): Assigns a unique number to each row within a partition of results based on a specified order.
Example
SELECT employee_id, salary, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num FROM employees;
Assigns a unique number to each employee within their department, ordered by salary in descending order.
RANK()
- RANK() OVER (PARTITION BY partition_column ORDER BY order_column): Assigns a rank to each row within a partition of results, with tied rows receiving the same rank. The ranks are not consecutive if there are ties.
Example
SELECT employee_id, salary, RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank FROM employees;
Assigns a rank to each employee within their department based on salary, with tied salaries receiving the same rank.
DENSE_RANK()
- DENSE_RANK() OVER (PARTITION BY partition_column ORDER BY order_column): Assigns a rank to each row within a partition of results, with tied rows receiving the same rank. The ranks are consecutive, with no gaps.
Example
SELECT employee_id, salary, DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank FROM employees;
Assigns a dense rank to each employee within their department, ensuring that ranks are consecutive without gaps for tied salaries.
NTILE()
- NTILE(n) OVER (PARTITION BY partition_column ORDER BY order_column): Divides the rows in a partition into n approximately equal groups and assigns a group number to each row.
Example
SELECT employee_id, salary, NTILE(4) OVER (PARTITION BY department_id ORDER BY salary DESC) AS quartile FROM employees;
Divides employees within each department into four groups (quartiles) based on their salary.
SUM()
- SUM(expression) OVER (PARTITION BY partition_column ORDER BY order_column ROWS BETWEEN frame_start AND frame_end): Calculates the sum of an expression over a window defined around each row.
Example
SELECT employee_id, salary, SUM(salary) OVER (PARTITION BY department_id ORDER BY salary RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW) AS rolling_sum FROM salaries;
Calculates the rolling sum of salaries within a window that includes the previous month and the current row.
AVG()
- AVG(expression) OVER (PARTITION BY partition_column ORDER BY order_column ROWS BETWEEN frame_start AND frame_end): Calculates the average of an expression over a window defined around each row.
Example
SELECT employee_id, salary, AVG(salary) OVER (PARTITION BY department_id ORDER BY salary ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_avg FROM salaries;
Calculates the rolling average of salaries within a window that includes the two preceding rows and the current row.
MIN() / MAX()
- MIN(expression) OVER (PARTITION BY partition_column ORDER BY order_column ROWS BETWEEN frame_start AND frame_end): Finds the minimum value of an expression within a window defined around each row.
- MAX(expression) OVER (PARTITION BY partition_column ORDER BY order_column ROWS BETWEEN frame_start AND frame_end): Finds the maximum value of an expression within a window defined around each row.
Example
SELECT employee_id, salary, MIN(salary) OVER (PARTITION BY department_id ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS min_salary, MAX(salary) OVER (PARTITION BY department_id ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS max_salary FROM employees;
Finds the minimum and maximum salaries within a window that extends from the start of the partition to the current row.
LAG()
- LAG(expression, offset, default) OVER (PARTITION BY partition_column ORDER BY order_column): Accesses the value of an expression from a preceding row within the partition, based on a specified offset.
Example
SELECT employee_id, salary, LAG(salary, 1, 0) OVER (PARTITION BY department_id ORDER BY salary) AS prev_salary FROM employees;
Returns the salary of the previous row within the same department.
LEAD()
- LEAD(expression, offset, default) OVER (PARTITION BY partition_column ORDER BY order_column): Accesses the value of an expression from a following row within the partition, based on a specified offset.
Example
SELECT employee_id, salary, LEAD(salary, 1, 0) OVER (PARTITION BY department_id ORDER BY salary) AS next_salary FROM employees;
Returns the salary of the next row within the same department.
PERCENTILE_CONT()
- PERCENTILE_CONT(fraction) WITHIN GROUP (ORDER BY expression) OVER (PARTITION BY partition_column): Calculates the continuous percentile of an expression within a partition. Used to find a value at a specific percentile.
Example
SELECT employee_id, salary, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) AS median_salary FROM employees;
Returns the median salary within each department.
PERCENTILE_DISC()
- PERCENTILE_DISC(fraction) WITHIN GROUP (ORDER BY expression) OVER (PARTITION BY partition_column): Calculates the discrete percentile of an expression within a partition. Used to find a value at a specific percentile, where the result is an actual value from the dataset.
Example
SELECT employee_id, salary, PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) AS percentile_90_salary FROM employees;
Returns the salary at the 90th percentile within each department.
Summary
Analytical functions in SQL are essential for performing detailed calculations across sets of rows defined by a window. They provide powerful capabilities for data analysis, such as ranking, moving averages, and percentile calculations, without the need for complex subqueries or temporary tables. Mastering these functions allows for more in-depth insights and detailed reporting in SQL queries. Note that the exact names and availability of these functions may vary depending on the SQL database system in use.