FIRST_VALUE() and LAST_VALUE() with SQL

Overview of FIRST_VALUE() and LAST_VALUE()

The FIRST_VALUE() and LAST_VALUE() functions are analytic functions used to retrieve the first and last values within a specific window of data. They are especially useful when working with ordered datasets and you need to obtain specific values from the beginning or end of these datasets.

FIRST_VALUE()

The FIRST_VALUE() function returns the first value in an ordered partition of a result set.

Syntax: 

FIRST_VALUE(column_name) OVER (
    [PARTITION BY partition_column]
    ORDER BY order_column [ASC|DESC]
)
  • column_name: The column from which you want to retrieve the first value.
  • PARTITION BY partition_column: (Optional) Divides the result set into partitions to which FIRST_VALUE() is applied independently.
  • ORDER BY order_column [ASC|DESC]: Specifies the order in which rows are processed within each partition. The default is ascending order (ASC).

LAST_VALUE()

The LAST_VALUE() function returns the last value in an ordered partition of a result set. It requires a window frame specification to ensure it includes all rows in the partition.

Syntax: 

LAST_VALUE(column_name) OVER (
    [PARTITION BY partition_column]
    ORDER BY order_column [ASC|DESC]
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
  • column_name: The column from which you want to retrieve the last value.
  • PARTITION BY partition_column: (Optional) Divides the result set into partitions to which LAST_VALUE() is applied independently.
  • ORDER BY order_column [ASC|DESC]: Specifies the order in which rows are processed within each partition.
  • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Defines the window frame to include all rows in the partition, which is necessary to get the last value.

Examples of Using FIRST_VALUE() and LAST_VALUE()

Example 1: Using FIRST_VALUE()

Suppose you have a table sales with columns salesperson_id, sales_amount, and sales_date. To get the amount of the first sale for each salesperson: 

SELECT
    salesperson_id,
    sales_amount,
    FIRST_VALUE(sales_amount) OVER (
        PARTITION BY salesperson_id
        ORDER BY sales_date ASC
    ) AS first_sale_amount
FROM sales;

Explanation:

  • FIRST_VALUE(sales_amount) OVER (PARTITION BY salesperson_id ORDER BY sales_date ASC) retrieves the amount of the first sale for each salesperson based on the sale date.

Example 2: Using LAST_VALUE()

To get the amount of the last sale for each salesperson: 

SELECT
    salesperson_id,
    sales_amount,
    LAST_VALUE(sales_amount) OVER (
        PARTITION BY salesperson_id
        ORDER BY sales_date ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_sale_amount
FROM sales;

 Explanation:

  • LAST_VALUE(sales_amount) OVER (PARTITION BY salesperson_id ORDER BY sales_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) retrieves the amount of the last sale for each salesperson, considering all rows in the partition.

Key Points

  • Window Frames:
    • For FIRST_VALUE(), you do not need to specify the window frame as the function inherently retrieves the first value based on the ORDER BY.
    • For LAST_VALUE(), it’s crucial to use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to ensure that all rows in the partition are considered, allowing you to retrieve the last value.
  • Partitioning:
    • Both functions can be used with PARTITION BY to apply the function within each partition of the result set.
  • Performance:
    • Using these functions can be resource-intensive, especially on large datasets or complex queries. Ensure your queries are optimized for performance if necessary.
  • Practical Applications:
    • These functions are useful for retrieving boundary values in ordered data, such as the first and last transactions, initial and final sales figures, etc.

Use Cases

Employee Performance Tracking

To get the first and last sale amounts for each employee: 

SELECT
    employee_id,
    sales_amount,
    FIRST_VALUE(sales_amount) OVER (
        PARTITION BY employee_id
        ORDER BY sale_date ASC
    ) AS first_sale,
    LAST_VALUE(sales_amount) OVER (
        PARTITION BY employee_id
        ORDER BY sale_date ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_sale
FROM employee_sales;

Sales Trend Analysis

To track the first and last sales amounts for each product: 

SELECT
    product_id,
    sales_amount,
    FIRST_VALUE(sales_amount) OVER (
        PARTITION BY product_id
        ORDER BY sale_date ASC
    ) AS first_sale_amount,
    LAST_VALUE(sales_amount) OVER (
        PARTITION BY product_id
        ORDER BY sale_date ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_sale_amount
FROM product_sales;

Transaction History

To get the first and last transaction amount for each customer: 

SELECT
    customer_id,
    transaction_amount,
    FIRST_VALUE(transaction_amount) OVER (
        PARTITION BY customer_id
        ORDER BY transaction_date ASC
    ) AS first_transaction,
    LAST_VALUE(transaction_amount) OVER (
        PARTITION BY customer_id
        ORDER BY transaction_date ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_transaction
FROM customer_transactions;

Conclusion

The FIRST_VALUE() and LAST_VALUE() functions are powerful tools for extracting boundary values within ordered partitions. Understanding how to use these functions effectively, especially with respect to window frames and partitions, is crucial for accurate and efficient data analysis.

Laisser un commentaire

Votre adresse e-mail ne sera pas publiée. Les champs obligatoires sont indiqués avec *

Facebook
Twitter
LinkedIn
WhatsApp
Email
Print