Window functions in SQL — a working analyst guide

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

What window functions are and why they matter

A window function computes a value for every row using a related set of rows — but, unlike GROUP BY, it does not collapse the result. You keep all the rows and gain a new column. That single property is what makes window functions the workhorse of analytics SQL at Stripe, Snowflake, DoorDash, and Airbnb. If a PM pings you for "every order from Q1 with its share of category revenue, before standup tomorrow," a window function is the difference between a one-screen query and a self-join that times out.

The mental model is simple. GROUP BY changes the shape of the output: one thousand rows in, ten rows out. A window function leaves the rows alone and adds a computed column to each one. The rest is mechanics: how to describe the window, how to choose the right function, and how to avoid the frame-clause traps.

Canonical example — every order with its category total and percentage share, in one query:

SELECT
    order_id,
    category,
    amount,
    SUM(amount) OVER (PARTITION BY category) AS category_total,
    ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY category), 1) AS pct
FROM orders;

The deeper comparison of these two clauses lives in GROUP BY vs PARTITION BY in SQL.

The OVER() syntax

Every window function uses an OVER(...) clause that describes three things: how to partition the rows, how to order them inside each partition, and which rows to include relative to the current row.

function() OVER (
    PARTITION BY column    -- split ROWS INTO independent windows
    ORDER BY column        -- ORDER ROWS inside each WINDOW
    ROWS BETWEEN ... AND ...  -- frame: which ROWS participate
)

PARTITION BY is the windowed cousin of GROUP BY. It splits the input into independent windows but does not collapse them. Leave it out and the whole table is one window.

ORDER BY defines the order inside each window. It is mandatory for ranking and offset functions like ROW_NUMBER, RANK, LAG, and LEAD. For plain aggregates it is optional, but adding it switches SQL to a cumulative interpretation.

The frame clause defines which rows participate relative to the current row. The default frame is rarely what you want, and getting it wrong silently produces wrong numbers.

Ranking functions: ROW_NUMBER, RANK, DENSE_RANK

Three functions, three different behaviors when values tie.

SELECT
    name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,
    RANK()       OVER (ORDER BY score DESC) AS rnk,
    DENSE_RANK() OVER (ORDER BY score DESC) AS drnk
FROM candidates;
name score rn rnk drnk
Alice 95 1 1 1
Boris 95 2 1 1
Vera 90 3 3 2
Gleb 85 4 4 3

ROW_NUMBER always produces a unique integer per row. When values tie, the order is not deterministic — add a tiebreaker column to ORDER BY if you need reproducibility. RANK gives ties the same rank but skips the next numbers (1, 1, 3). DENSE_RANK gives ties the same rank without skipping (1, 1, 2).

Reach for ROW_NUMBER for top-N-per-group tasks (one row per group). Reach for RANK or DENSE_RANK when ties must be visible — leaderboards, creator rankings.

Offset functions: LAG and LEAD

LAG(column, n) returns the value n rows back in the same window; LEAD(column, n) returns the value n rows forward. Default offset is 1.

Classic use case — month-over-month growth.

WITH monthly AS (
    SELECT
        DATE_TRUNC('month', created_at) AS month,
        SUM(revenue) AS rev
    FROM sales
    GROUP BY 1
)
SELECT
    month,
    rev,
    LAG(rev) OVER (ORDER BY month) AS prev_month,
    ROUND(
        100.0 * (rev - LAG(rev) OVER (ORDER BY month))
        / NULLIF(LAG(rev) OVER (ORDER BY month), 0), 1
    ) AS growth_pct
FROM monthly;

Both functions return NULL when the previous or next row does not exist. Pass a third argument to specify a default, for example LAG(rev, 1, 0). Wrap the denominator in NULLIF to avoid divide-by-zero on the first month.

Aggregates with a window: SUM, AVG, COUNT OVER

Any aggregate becomes a window function when you add OVER(...). The three patterns below cover most dashboard work.

Running total

SELECT
    DATE,
    new_users,
    SUM(new_users) OVER (ORDER BY DATE) AS cumulative_users
FROM daily_registrations;

Each row shows the sum of all earlier rows including itself — the cumulative signup curve on every growth dashboard.

Seven-day moving average

SELECT
    DATE,
    dau,
    AVG(dau) OVER (
        ORDER BY DATE
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS dau_7d_avg
FROM daily_metrics;

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means the current row plus the six before it — exactly seven days. The standard way to smooth daily product metrics before plotting them.

COUNT inside a window

SELECT
    user_id,
    order_id,
    COUNT(*) OVER (PARTITION BY user_id) AS total_orders
FROM orders;

Every order row now carries the total order count for that user — no subquery, no self-join.

FIRST_VALUE, LAST_VALUE, NTH_VALUE

FIRST_VALUE(column) returns the value from the first row of the window; LAST_VALUE(column) from the last row; NTH_VALUE(column, n) from the nth row.

SELECT
    user_id,
    order_date,
    amount,
    FIRST_VALUE(amount) OVER (
        PARTITION BY user_id ORDER BY order_date
    ) AS first_order_amount
FROM orders;

One trap catches almost every analyst the first time. By default, when you add ORDER BY, the frame ends at the current row — so LAST_VALUE returns the current row, not the actual last row. To get the true last value, force the frame:

SELECT
    user_id,
    order_date,
    LAST_VALUE(amount) OVER (
        PARTITION BY user_id ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_order_amount
FROM orders;
Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Frame clause: ROWS BETWEEN and RANGE BETWEEN

The frame is the set of rows the function looks at, relative to the current row.

Boundary Meaning
UNBOUNDED PRECEDING Start of the window
n PRECEDING n rows back
CURRENT ROW The current row
n FOLLOWING n rows forward
UNBOUNDED FOLLOWING End of the window

ROWS counts physical rows. RANGE counts logical ranges of the ORDER BY value — and when values tie, RANGE lumps all tied rows together. SUM(x) OVER (ORDER BY date) with no explicit frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so two rows sharing the same date are added at once. In practice you almost always want ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Make the frame explicit and you remove a class of silent bugs from your dashboards.

Worked patterns from real dashboards

Top-N per group

Find the three biggest orders per category.

WITH ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount DESC) AS rn
    FROM orders
)
SELECT *
FROM ranked
WHERE rn <= 3;

ROW_NUMBER in a CTE, filter on rn outside — the single most asked window-function pattern in analyst interviews at Meta, DoorDash, and Stripe.

Day-over-day retention with LAG

Did the user come back the next day.

WITH daily_sessions AS (
    SELECT DISTINCT user_id, session_date::DATE AS day
    FROM sessions
),
with_prev AS (
    SELECT
        user_id,
        day,
        LAG(day) OVER (PARTITION BY user_id ORDER BY day) AS prev_day
    FROM daily_sessions
)
SELECT
    prev_day AS cohort_day,
    COUNT(DISTINCT user_id) FILTER (WHERE day - prev_day = 1) AS retained,
    COUNT(DISTINCT user_id) AS total
FROM with_prev
WHERE prev_day IS NOT NULL
GROUP BY prev_day;

LAG attaches the previous session date to each row, then you count users where the gap is exactly one day. Generalizes to D7, D30, and Nth-day retention.

Seven-day moving average by channel

SELECT
    channel,
    DATE,
    conversion_rate,
    AVG(conversion_rate) OVER (
        PARTITION BY channel
        ORDER BY DATE
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS cr_7d_avg
FROM channel_daily_stats;

The only addition versus the global version is PARTITION BY channel, which restarts the window for each channel.

Common pitfalls

The first trap is omitting ORDER BY on a ranking function. ROW_NUMBER() OVER (PARTITION BY user_id) with no order clause is technically legal but the row order is undefined, so the same query returns different numbers on different runs. Anything that uses position — ROW_NUMBER, RANK, LAG, LEAD, FIRST_VALUE, LAST_VALUE — needs a deterministic ORDER BY, with a unique tiebreaker column when the sort key has duplicates.

A second trap, much worse because it is silent, is the default frame on cumulative aggregates. SUM(x) OVER (ORDER BY date) without an explicit frame uses RANGE, which groups all rows sharing the current date together. If two orders fall on the same day, the running total jumps by both amounts on the first of them. Always write ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for running totals.

A third trap is the LAST_VALUE surprise. The default frame ends at the current row, so LAST_VALUE returns the value of the row you are on, not the actual last row of the window. The fix is to set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, or to use FIRST_VALUE with a DESC order instead.

A fourth trap is reaching for GROUP BY when a window function is the right tool. If the task asks for both detail rows and an aggregate next to them, GROUP BY forces a self-join. The window function does the same job in one pass.

A fifth trap is writing multiple window functions with subtly different OVER clauses when you meant the same window. Each unique OVER triggers its own sort. If the window is identical, name it once with WINDOW:

SELECT
    user_id,
    order_id,
    ROW_NUMBER() OVER w AS rn,
    SUM(amount) OVER w AS running_total
FROM orders
WINDOW w AS (PARTITION BY user_id ORDER BY created_at);

On a large orders table this is the difference between a fifteen-second query and a five-minute one.

Optimization tips

Sort cost dominates window functions on big tables. The engine has to order rows inside each partition before it can compute a rank or offset. Match the table sort key or index to your PARTITION BY and ORDER BY columns — clustering keys on Snowflake, covering indexes on PostgreSQL, clustered tables on BigQuery. And reuse one window with WINDOW whenever multiple functions share it.

Push filters down before the window. Apply WHERE created_at >= '2026-01-01' inside the CTE that produces the windowed data, not the outer query, so the planner narrows the scan early.

Materialize the result when the metric is read many times per day. Running-total revenue, cumulative installs, and 7-day moving averages belong in a scheduled materialized view, turning thirty downstream queries into thirty fast index lookups.

If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.

FAQ

Which window functions should I learn first?

ROW_NUMBER, LAG/LEAD, and SUM OVER. Together they cover roughly eighty percent of analyst interview tasks: top-N per group, period-over-period comparisons, and running totals. Once those feel automatic, add RANK and DENSE_RANK for tied-rank cases and AVG OVER for moving averages. Frame clauses and LAST_VALUE come next — that is where most silent bugs live.

What is the difference between ROWS and RANGE in the frame clause?

ROWS counts physical rows. ROWS BETWEEN 2 PRECEDING AND CURRENT ROW includes exactly three rows. RANGE counts logical ranges of the ORDER BY value, so when two rows share a value, RANGE groups them and may include more rows than expected. Use ROWS for running totals and moving averages — its behavior is predictable. RANGE matters only for time-range queries where ties carry meaning.

Do window functions slow down a query?

It depends on data volume, indexes, and how many distinct windows you have. Each unique OVER triggers its own sort, so two functions sharing one window are essentially free together. On analyst workloads of tens of thousands to a few million rows, a single window function adds milliseconds to seconds. On hundreds of millions of rows you care about clustering keys and materialization, but the fix is rarely to drop the window function — it is to pre-aggregate.

Can I filter on a window function inside WHERE?

No. WHERE runs before window functions, so the rank column does not exist yet. Compute the window in a CTE or subquery and filter on the result in the outer query. The top-N-per-group pattern above is the canonical example.

What happens if I omit PARTITION BY entirely?

The whole result set is one window. SUM(amount) OVER () returns the grand total on every row. ROW_NUMBER() OVER (ORDER BY date) returns a continuous numbering across the table with no group resets. Useful for share-of-total calculations against one global aggregate.

Which databases support window functions?

All major analytical engines: PostgreSQL, MySQL 8+, ClickHouse, BigQuery, Snowflake, Redshift, SQL Server, Oracle, Databricks SQL. Syntax is nearly identical, with small differences in RANGE support and less-common functions like NTH_VALUE and PERCENT_RANK. Standard PostgreSQL syntax is enough for interview prep at Stripe, Meta, Airbnb, or DoorDash.