May 18, 2026·10 min read

Window functions SQL cheat sheet

Q: Which window functions show up most on interviews?

`ROW_NUMBER`, `RANK`, and `LAG`/`LEAD` are the clear top three across phone screens, take-homes, and onsite SQL rounds. `SUM OVER` for running totals and `AVG OVER` for moving averages are the close second tier — every product analytics screen at a B2C company will touch them. `NTILE` shows up when the prompt mentions segmentation, RFM, or deciles. Knowing the difference between `ROW_NUMBER`, `RANK`, and `DENSE_RANK` cold is the highest-leverage prep, because that question is asked almost universally.

Q: What is the difference between ROWS and RANGE frames?

`ROWS` counts physical rows: `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` always includes exactly three rows. `RANGE` operates on values of the `ORDER BY` expression — when duplicates exist, `RANGE` groups all rows sharing a value into the same frame position, which can silently inflate aggregates. Default to `ROWS` unless you have a specific reason for value-based windowing. The biggest gotcha: omitting the frame entirely with an `ORDER BY` defaults to `RANGE`, not `ROWS`.

Q: Can I use multiple window functions in one query?

Yes. Each window function carries its own `PARTITION BY` and `ORDER BY`, independent of others in the same `SELECT`. A common pattern combines a `ROW_NUMBER` for ranking, a `SUM OVER` for the partition total, and a `LAG` for the previous row's value — three windows, one pass, no joins. Name shared specs with `WINDOW w AS (...)` and reference `OVER w`.

Q: Why does my WHERE clause on a window function fail?

Because `WHERE` is evaluated before window functions. The logical query order is `FROM` then `WHERE` then `GROUP BY` then `HAVING` then window functions then `SELECT` then `ORDER BY`, so referencing a window expression in `WHERE` is invalid. Use a CTE to compute the window result, then filter in the outer query. Snowflake, BigQuery, and Databricks support `QUALIFY` for this.

Q: When should I use NTILE versus PERCENT_RANK?

`NTILE(n)` puts rows into `n` buckets of roughly equal size — ideal for quartiles or deciles. `PERCENT_RANK` returns a continuous value between 0 and 1 representing the row's relative position — ideal for per-row percentiles like churn risk or leaderboard scores. Use `NTILE` for discrete buckets, `PERCENT_RANK` for a continuous score.

Q: How do I deduplicate rows with window functions?

Use `ROW_NUMBER() OVER (PARTITION BY dedup_key ORDER BY recency_column DESC, stable_tiebreaker)` filtered to `= 1`. The stable tiebreaker matters: without it, rows with identical `recency_column` produce non-deterministic results, so reruns return different "deduped" rows. A primary key or surrogate id works. Wrap in a CTE and filter outside, or use `QUALIFY rn = 1` on warehouses that support it.

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Join the waitlist

Contents:

Why window functions matter
General syntax
Ranking functions
Offset functions: LAG and LEAD
Aggregates with OVER
Window frames: ROWS BETWEEN
15 interview drills
Common pitfalls
Optimization tips
Related reading
FAQ

Why window functions matter

If a hiring manager at Stripe, Airbnb, or Snowflake asks you one SQL question on a phone screen, the odds are well above even that it involves a window function. They test row-level reasoning without collapsing the grain. A candidate who only knows GROUP BY cannot rank, deduplicate, walk a session, or compute a running total without nesting joins nobody wants to read in code review.

The surface area is small. Five families of functions, one syntax block, one frame clause. This cheat sheet covers every clause you will see on a screen, with runnable snippets and a 15-drill set at the end.

General syntax

function(arg) OVER (
    PARTITION BY group_column
    ORDER BY sort_column
    ROWS BETWEEN frame_start AND frame_end
)

PARTITION BY carves the table into independent windows; functions reset at each partition boundary. ORDER BY defines the order inside each window — required for ranking and offset functions. ROWS BETWEEN defines the physical frame, which only matters for aggregates and FIRST_VALUE / LAST_VALUE. Ranking functions ignore the frame clause, and ordered aggregates default to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — not ROWS. More on that trap below.

Ranking functions

The three workhorses are ROW_NUMBER, RANK, and DENSE_RANK. They differ only in tie handling.

SELECT
    user_id, score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,
    RANK()       OVER (ORDER BY score DESC) AS rk,
    DENSE_RANK() OVER (ORDER BY score DESC) AS drk
FROM exam_results;

If two rows tie at 95, ROW_NUMBER gives 1, 2 (non-deterministic unless you add a tiebreaker). RANK gives 1, 1, 3. DENSE_RANK gives 1, 1, 2. Use ROW_NUMBER with a stable tiebreaker to pick one row per group. Use RANK = 1 or DENSE_RANK = 1 to keep all rows tied for first.

NTILE(n) rounds out the family for segmentation.

SELECT
    user_id, total_spent,
    NTILE(4) OVER (ORDER BY total_spent DESC) AS quartile
FROM user_stats;

NTILE(4) splits ordered rows into four buckets — bucket 1 is the top 25%. Canonical tool for RFM quartiles.

Offset functions: LAG and LEAD

LAG looks backward, LEAD looks forward, and together they replace most self-joins for adjacent-row comparisons.

SELECT
    DATE, revenue,
    LAG(revenue, 1)  OVER (ORDER BY DATE) AS prev_day,
    LEAD(revenue, 1) OVER (ORDER BY DATE) AS next_day,
    revenue - LAG(revenue, 1) OVER (ORDER BY DATE) AS day_over_day
FROM daily_stats;

Both take an offset (default 1) and a default when the offset falls outside the window (default NULL). Time between consecutive sessions is a one-liner.

SELECT
    user_id, session_start,
    session_start - LAG(session_start) OVER (
        PARTITION BY user_id ORDER BY session_start
    ) AS gap_since_last
FROM sessions;

FIRST_VALUE and LAST_VALUE return values at specific positions in the window. LAST_VALUE requires an explicit unbounded frame; without it, the default frame stops at the current row.

SELECT DISTINCT
    user_id,
    FIRST_VALUE(order_id) OVER w AS first_order,
    LAST_VALUE(order_id) OVER (
        PARTITION BY user_id ORDER BY created_at
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_order
FROM orders
WINDOW w AS (PARTITION BY user_id ORDER BY created_at);

Aggregates with OVER

Any aggregate — SUM, AVG, COUNT, MIN, MAX, STDDEV — becomes a window function by adding OVER. Most common uses: running totals and moving averages.

SELECT
    DATE, new_users,
    SUM(new_users) OVER (ORDER BY DATE) AS cumulative_users
FROM daily_registrations;

SELECT
    DATE, dau,
    AVG(dau) OVER (
        ORDER BY DATE
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS dau_7d_ma
FROM daily_metrics;

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW is exactly seven rows — what Mixpanel and Amplitude compute when you toggle a trailing 7d smoother. Window aggregates without an ORDER BY compute over the whole partition — perfect for share-of-total.

SELECT
    order_id, user_id, amount,
    ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY user_id), 1) AS pct_of_user_total
FROM orders;

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Join the waitlist

Window frames: ROWS BETWEEN

The frame clause defines which rows are visible to an aggregate relative to the current row.

Clause	Meaning
`UNBOUNDED PRECEDING`	Start of partition
`n PRECEDING`	`n` rows before current
`CURRENT ROW`	Current row
`n FOLLOWING`	`n` rows after current
`UNBOUNDED FOLLOWING`	End of partition

Three patterns cover most use cases.

-- Running total
SUM(x) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- Centered 3-row moving average
AVG(x) OVER (ORDER BY DATE ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)

-- Trailing 7-day window
SUM(x) OVER (ORDER BY DATE ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

RANGE is the value-based sibling of ROWS. With duplicates in ORDER BY, RANGE groups all rows sharing a value into the same frame position, which can silently double-count. Default to ROWS.

15 interview drills

Drill 1. Rank customers by lifetime revenue

SELECT
    user_id,
    SUM(amount) AS total,
    ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS rn
FROM orders
GROUP BY user_id;

Drill 2. Top product per category

WITH ranked AS (
    SELECT
        product_id, category,
        SUM(quantity) AS sold,
        ROW_NUMBER() OVER (
            PARTITION BY category ORDER BY SUM(quantity) DESC
        ) AS rn
    FROM order_items
    JOIN products USING (product_id)
    GROUP BY product_id, category
)
SELECT * FROM ranked WHERE rn = 1;

Drill 3. ROW_NUMBER vs RANK vs DENSE_RANK

Two products tied on sales. ROW_NUMBER gives 1, 2. RANK gives 1, 1, 3. DENSE_RANK gives 1, 1, 2.

Drill 4. Month-over-month growth

WITH monthly AS (
    SELECT
        DATE_TRUNC('month', created_at) AS month,
        SUM(amount) AS revenue
    FROM orders
    GROUP BY 1
)
SELECT
    month, revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month,
    ROUND(
        100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
        / NULLIF(LAG(revenue) OVER (ORDER BY month), 0),
        1
    ) AS growth_pct
FROM monthly;

Drill 5. Cumulative daily revenue

SELECT
    created_at::DATE AS day,
    SUM(amount) AS daily,
    SUM(SUM(amount)) OVER (
        ORDER BY created_at::DATE
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative
FROM orders
GROUP BY created_at::DATE;

Drill 6. 7-day moving average

SELECT
    day, revenue,
    AVG(revenue) OVER (
        ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS ma_7d
FROM daily_revenue;

SELECT
    order_id, user_id, amount,
    ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY user_id), 1) AS pct
FROM orders;

Drill 8. First and last order per user

SELECT DISTINCT
    user_id,
    FIRST_VALUE(order_id) OVER w AS first_order,
    LAST_VALUE(order_id) OVER (
        w ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_order
FROM orders
WINDOW w AS (PARTITION BY user_id ORDER BY created_at);

Drill 9. Gap between consecutive orders

SELECT
    user_id, created_at,
    created_at - LAG(created_at) OVER (
        PARTITION BY user_id ORDER BY created_at
    ) AS gap
FROM orders;

Drill 10. Monthly running total that resets each month

SELECT
    created_at::DATE AS day, amount,
    SUM(amount) OVER (
        PARTITION BY DATE_TRUNC('month', created_at)
        ORDER BY created_at
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS monthly_cumulative
FROM orders;

Drill 11. User percentile by order count

SELECT
    user_id, total_orders,
    PERCENT_RANK() OVER (ORDER BY total_orders) AS pct_rank
FROM (
    SELECT user_id, COUNT(*) AS total_orders
    FROM orders
    GROUP BY user_id
) t;

Drill 12. Distance from each row to its group max

SELECT
    user_id, order_id, amount,
    MAX(amount) OVER (PARTITION BY user_id) - amount AS diff_from_max
FROM orders;

Drill 13. Reset numbering by user and category

SELECT
    user_id, category, order_id,
    ROW_NUMBER() OVER (
        PARTITION BY user_id, category ORDER BY created_at
    ) AS order_in_category
FROM orders
JOIN products USING (product_id);

Drill 14. Quartile users by spend

SELECT
    user_id, total_spent,
    NTILE(4) OVER (ORDER BY total_spent) AS quartile
FROM (
    SELECT user_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY user_id
) t;

Drill 15. Funnel with sequence check

WITH ordered AS (
    SELECT
        user_id, event_name, event_time,
        LEAD(event_name) OVER (
            PARTITION BY user_id ORDER BY event_time
        ) AS next_event
    FROM events
)
SELECT
    COUNT(DISTINCT user_id) FILTER (WHERE event_name = 'view') AS step1,
    COUNT(DISTINCT user_id) FILTER (
        WHERE event_name = 'view' AND next_event = 'cart'
    ) AS step2,
    COUNT(DISTINCT user_id) FILTER (
        WHERE event_name = 'cart' AND next_event = 'purchase'
    ) AS step3
FROM ordered;

Common pitfalls

The frame default is the most common trap. Writing SUM(amount) OVER (ORDER BY date) without specifying the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With duplicates in date, RANGE includes all rows sharing the current date, not just rows physically earlier. Specify ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW whenever the result must be deterministic at duplicate values.

LAST_VALUE without an explicit unbounded frame returns the current row, not the last row in the partition. The default frame stops at the current row, so LAST_VALUE over an ordered window with no frame override looks at a window of size one. Override with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING whenever you want the partition's last value.

Using ROW_NUMBER for deduplication without a deterministic tiebreaker is a subtle data-quality bug. ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) to keep the latest record per user is non-deterministic if two records share the same updated_at. Reruns can produce different "latest" rows. Always add a stable secondary key like primary id to the ORDER BY.

Filtering on a window function in WHERE fails because WHERE runs before window functions. Use an outer query referencing the windowed result via a CTE, or QUALIFY on Snowflake, BigQuery, and Databricks.

Performance: every distinct window specification can force another sort pass. Consolidate using named windows via WINDOW w AS (...) wherever the partition and order match.

Optimization tips

Sort once, use many. Compose window specifications so they share PARTITION BY and ORDER BY keys when possible. Snowflake, BigQuery, Postgres, and Databricks all deduplicate the sort when window specs match exactly.

Partition by columns already clustered or sorted in storage. On BigQuery, PARTITION BY event_date over a partitioned table is cheap. On Databricks with Z-ordered or liquid-clustered tables, the same logic applies. On Postgres, an index on the partition and order columns can let the planner skip the sort entirely.

When the windowed result feeds a top-N filter, push it down with QUALIFY (warehouses that support it) or wrap the window in a CTE and filter outside. For wide tables, project only what you need before the window function — the sort is memory-bound, so narrower rows mean fewer disk spills.

If you want to drill window-function questions like this every day, NAILDD is launching with 500+ SQL problems organized by exactly this pattern.

FAQ

Which window functions show up most on interviews?

ROW_NUMBER, RANK, and LAG/LEAD are the clear top three across phone screens, take-homes, and onsite SQL rounds. SUM OVER for running totals and AVG OVER for moving averages are the close second tier — every product analytics screen at a B2C company will touch them. NTILE shows up when the prompt mentions segmentation, RFM, or deciles. Knowing the difference between ROW_NUMBER, RANK, and DENSE_RANK cold is the highest-leverage prep, because that question is asked almost universally.

What is the difference between ROWS and RANGE frames?

ROWS counts physical rows: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW always includes exactly three rows. RANGE operates on values of the ORDER BY expression — when duplicates exist, RANGE groups all rows sharing a value into the same frame position, which can silently inflate aggregates. Default to ROWS unless you have a specific reason for value-based windowing. The biggest gotcha: omitting the frame entirely with an ORDER BY defaults to RANGE, not ROWS.

Can I use multiple window functions in one query?

Yes. Each window function carries its own PARTITION BY and ORDER BY, independent of others in the same SELECT. A common pattern combines a ROW_NUMBER for ranking, a SUM OVER for the partition total, and a LAG for the previous row's value — three windows, one pass, no joins. Name shared specs with WINDOW w AS (...) and reference OVER w.

Why does my WHERE clause on a window function fail?

Because WHERE is evaluated before window functions. The logical query order is FROM then WHERE then GROUP BY then HAVING then window functions then SELECT then ORDER BY, so referencing a window expression in WHERE is invalid. Use a CTE to compute the window result, then filter in the outer query. Snowflake, BigQuery, and Databricks support QUALIFY for this.

When should I use NTILE versus PERCENT_RANK?

NTILE(n) puts rows into n buckets of roughly equal size — ideal for quartiles or deciles. PERCENT_RANK returns a continuous value between 0 and 1 representing the row's relative position — ideal for per-row percentiles like churn risk or leaderboard scores. Use NTILE for discrete buckets, PERCENT_RANK for a continuous score.

How do I deduplicate rows with window functions?

Use ROW_NUMBER() OVER (PARTITION BY dedup_key ORDER BY recency_column DESC, stable_tiebreaker) filtered to = 1. The stable tiebreaker matters: without it, rows with identical recency_column produce non-deterministic results, so reruns return different "deduped" rows. A primary key or surrogate id works. Wrap in a CTE and filter outside, or use QUALIFY rn = 1 on warehouses that support it.