Back to DAG

Window Functions

databases

Window Functions: Aggregation Without Collapsing Rows

A GROUP BY query collapses rows — if you group 1 million orders by country, you get one row per country and lose the individual order details. Window functions solve a different problem: they compute a value across a set of related rows while keeping every row in the result. This makes them essential for rankings, running totals, moving averages, and comparisons within groups.

The OVER Clause

Every window function uses an OVER() clause that defines the window — which rows participate in the computation:

SELECT name, department, salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
  SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;
  • PARTITION BY: divides rows into groups (like GROUP BY, but rows are not collapsed). Each partition is processed independently.
  • ORDER BY: defines the order of rows within each partition. Required for ranking functions and running aggregates.

Window Function Categories

Ranking functions (require ORDER BY):

FunctionBehavior
ROW_NUMBER()Unique sequential number: 1, 2, 3, 4 (no ties)
RANK()Tied values get the same rank, gaps follow: 1, 2, 2, 4
DENSE_RANK()Tied values get the same rank, no gaps: 1, 2, 2, 3

Offset functions (access other rows by position):

FunctionBehavior
LAG(col, n)Value from n rows BEFORE the current row
LEAD(col, n)Value from n rows AFTER the current row
FIRST_VALUE(col)First value in the window frame
LAST_VALUE(col)Last value in the window frame
NTH_VALUE(col, n)The nth value in the window frame

Aggregate functions as windows: SUM, AVG, COUNT, MIN, MAX can all be used as window functions. When combined with ORDER BY, they compute running aggregates — the aggregate of all rows from the start of the partition up to the current row.

Frame Specification

The frame defines exactly which rows relative to the current row are included in the computation:

SUM(amount) OVER (
  PARTITION BY customer_id
  ORDER BY order_date
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)

This computes a running total per customer. Common frame options:

  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — default for ORDER BY, gives running aggregate.
  • ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING — 5-row sliding window (moving average).
  • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING — entire partition (same as no ORDER BY).

Execution

The database first sorts data by (PARTITION BY columns, ORDER BY columns), then makes a single pass computing the window function for each row. Multiple window functions with the same PARTITION BY and ORDER BY share a single sort. Window functions are evaluated after WHERE, GROUP BY, and HAVING, but before ORDER BY and LIMIT.

Running Totals and Moving Averages

Running total: SUM(revenue) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)

7-day moving average: AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

These are extremely common in financial reporting, time-series analysis, and dashboards.

Real-Life: Top-N Per Group

Real-World Example

Problem: Show the top 3 highest-paid employees in each department.

With GROUP BY, you can find the MAX salary per department, but you cannot list the actual employees. Window functions solve this:

SELECT * FROM (
  SELECT name, department, salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
  FROM employees
) ranked
WHERE rn <= 3;

ROW_NUMBER assigns 1, 2, 3, ... within each department ordered by salary descending. The outer query filters to keep only the top 3 per department.

Running total example:

SELECT order_date, amount,
  SUM(amount) OVER (ORDER BY order_date) AS cumulative_revenue
FROM orders;
order_dateamountcumulative_revenue
Jan 1100100
Jan 2250350
Jan 375425
Jan 4300725

Each row shows the total revenue up to and including that date, without collapsing the individual order rows.

LAG for period-over-period comparison:

SELECT month, revenue,
  revenue - LAG(revenue, 1) OVER (ORDER BY month) AS month_over_month_change
FROM monthly_revenue;

This computes how much revenue changed compared to the previous month — a single SQL expression replaces a complex self-join or application-level logic.

Window Function: RANK and Running SUM

RANK() OVER (PARTITION BY dept ORDER BY salary DESC), running SUM(salary) name dept salary rank run_sum partition: Engineering Alice Eng 120k 1 120k Bob Eng 100k 2 220k Carol Eng 100k 2 320k Dave Eng 85k 4 405k partition: Sales Eve Sales 95k 1 95k Frank Sales 80k 2 175k Grace Sales 80k 2 255k Key observations: - RANK: Bob and Carol tie at rank 2 (same salary). Dave skips to rank 4 (gap after tie). - Running SUM: resets at each partition boundary. 120 → 220 → 320 → 405 within Engineering. - Every input row is preserved — unlike GROUP BY, no rows are collapsed. - Frame for running SUM: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (default with ORDER BY).
Step 1 of 2