Window functions SQL cheat sheet
Contents:
Why window functions matter
If a hiring manager at Stripe, Airbnb, or Snowflake asks you one SQL question on a phone screen, the odds are well above even that it involves a window function. They test row-level reasoning without collapsing the grain. A candidate who only knows GROUP BY cannot rank, deduplicate, walk a session, or compute a running total without nesting joins nobody wants to read in code review.
The surface area is small. Five families of functions, one syntax block, one frame clause. This cheat sheet covers every clause you will see on a screen, with runnable snippets and a 15-drill set at the end.
General syntax
function(arg) OVER (
PARTITION BY group_column
ORDER BY sort_column
ROWS BETWEEN frame_start AND frame_end
)PARTITION BY carves the table into independent windows; functions reset at each partition boundary. ORDER BY defines the order inside each window — required for ranking and offset functions. ROWS BETWEEN defines the physical frame, which only matters for aggregates and FIRST_VALUE / LAST_VALUE. Ranking functions ignore the frame clause, and ordered aggregates default to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — not ROWS. More on that trap below.
Ranking functions
The three workhorses are ROW_NUMBER, RANK, and DENSE_RANK. They differ only in tie handling.
SELECT
user_id, score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,
RANK() OVER (ORDER BY score DESC) AS rk,
DENSE_RANK() OVER (ORDER BY score DESC) AS drk
FROM exam_results;If two rows tie at 95, ROW_NUMBER gives 1, 2 (non-deterministic unless you add a tiebreaker). RANK gives 1, 1, 3. DENSE_RANK gives 1, 1, 2. Use ROW_NUMBER with a stable tiebreaker to pick one row per group. Use RANK = 1 or DENSE_RANK = 1 to keep all rows tied for first.
NTILE(n) rounds out the family for segmentation.
SELECT
user_id, total_spent,
NTILE(4) OVER (ORDER BY total_spent DESC) AS quartile
FROM user_stats;NTILE(4) splits ordered rows into four buckets — bucket 1 is the top 25%. Canonical tool for RFM quartiles.
Offset functions: LAG and LEAD
LAG looks backward, LEAD looks forward, and together they replace most self-joins for adjacent-row comparisons.
SELECT
DATE, revenue,
LAG(revenue, 1) OVER (ORDER BY DATE) AS prev_day,
LEAD(revenue, 1) OVER (ORDER BY DATE) AS next_day,
revenue - LAG(revenue, 1) OVER (ORDER BY DATE) AS day_over_day
FROM daily_stats;Both take an offset (default 1) and a default when the offset falls outside the window (default NULL). Time between consecutive sessions is a one-liner.
SELECT
user_id, session_start,
session_start - LAG(session_start) OVER (
PARTITION BY user_id ORDER BY session_start
) AS gap_since_last
FROM sessions;FIRST_VALUE and LAST_VALUE return values at specific positions in the window. LAST_VALUE requires an explicit unbounded frame; without it, the default frame stops at the current row.
SELECT DISTINCT
user_id,
FIRST_VALUE(order_id) OVER w AS first_order,
LAST_VALUE(order_id) OVER (
PARTITION BY user_id ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_order
FROM orders
WINDOW w AS (PARTITION BY user_id ORDER BY created_at);Aggregates with OVER
Any aggregate — SUM, AVG, COUNT, MIN, MAX, STDDEV — becomes a window function by adding OVER. Most common uses: running totals and moving averages.
SELECT
DATE, new_users,
SUM(new_users) OVER (ORDER BY DATE) AS cumulative_users
FROM daily_registrations;SELECT
DATE, dau,
AVG(dau) OVER (
ORDER BY DATE
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS dau_7d_ma
FROM daily_metrics;ROWS BETWEEN 6 PRECEDING AND CURRENT ROW is exactly seven rows — what Mixpanel and Amplitude compute when you toggle a trailing 7d smoother. Window aggregates without an ORDER BY compute over the whole partition — perfect for share-of-total.
SELECT
order_id, user_id, amount,
ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY user_id), 1) AS pct_of_user_total
FROM orders;Window frames: ROWS BETWEEN
The frame clause defines which rows are visible to an aggregate relative to the current row.
| Clause | Meaning |
|---|---|
UNBOUNDED PRECEDING |
Start of partition |
n PRECEDING |
n rows before current |
CURRENT ROW |
Current row |
n FOLLOWING |
n rows after current |
UNBOUNDED FOLLOWING |
End of partition |
Three patterns cover most use cases.
-- Running total
SUM(x) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
-- Centered 3-row moving average
AVG(x) OVER (ORDER BY DATE ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
-- Trailing 7-day window
SUM(x) OVER (ORDER BY DATE ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)RANGE is the value-based sibling of ROWS. With duplicates in ORDER BY, RANGE groups all rows sharing a value into the same frame position, which can silently double-count. Default to ROWS.
15 interview drills
Drill 1. Rank customers by lifetime revenue
SELECT
user_id,
SUM(amount) AS total,
ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS rn
FROM orders
GROUP BY user_id;Drill 2. Top product per category
WITH ranked AS (
SELECT
product_id, category,
SUM(quantity) AS sold,
ROW_NUMBER() OVER (
PARTITION BY category ORDER BY SUM(quantity) DESC
) AS rn
FROM order_items
JOIN products USING (product_id)
GROUP BY product_id, category
)
SELECT * FROM ranked WHERE rn = 1;Drill 3. ROW_NUMBER vs RANK vs DENSE_RANK
Two products tied on sales. ROW_NUMBER gives 1, 2. RANK gives 1, 1, 3. DENSE_RANK gives 1, 1, 2.
Drill 4. Month-over-month growth
WITH monthly AS (
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY 1
)
SELECT
month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
ROUND(
100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0),
1
) AS growth_pct
FROM monthly;Drill 5. Cumulative daily revenue
SELECT
created_at::DATE AS day,
SUM(amount) AS daily,
SUM(SUM(amount)) OVER (
ORDER BY created_at::DATE
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative
FROM orders
GROUP BY created_at::DATE;Drill 6. 7-day moving average
SELECT
day, revenue,
AVG(revenue) OVER (
ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS ma_7d
FROM daily_revenue;Drill 7. Share of each order in its user's total
SELECT
order_id, user_id, amount,
ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY user_id), 1) AS pct
FROM orders;Drill 8. First and last order per user
SELECT DISTINCT
user_id,
FIRST_VALUE(order_id) OVER w AS first_order,
LAST_VALUE(order_id) OVER (
w ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_order
FROM orders
WINDOW w AS (PARTITION BY user_id ORDER BY created_at);Drill 9. Gap between consecutive orders
SELECT
user_id, created_at,
created_at - LAG(created_at) OVER (
PARTITION BY user_id ORDER BY created_at
) AS gap
FROM orders;Drill 10. Monthly running total that resets each month
SELECT
created_at::DATE AS day, amount,
SUM(amount) OVER (
PARTITION BY DATE_TRUNC('month', created_at)
ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS monthly_cumulative
FROM orders;Drill 11. User percentile by order count
SELECT
user_id, total_orders,
PERCENT_RANK() OVER (ORDER BY total_orders) AS pct_rank
FROM (
SELECT user_id, COUNT(*) AS total_orders
FROM orders
GROUP BY user_id
) t;Drill 12. Distance from each row to its group max
SELECT
user_id, order_id, amount,
MAX(amount) OVER (PARTITION BY user_id) - amount AS diff_from_max
FROM orders;Drill 13. Reset numbering by user and category
SELECT
user_id, category, order_id,
ROW_NUMBER() OVER (
PARTITION BY user_id, category ORDER BY created_at
) AS order_in_category
FROM orders
JOIN products USING (product_id);Drill 14. Quartile users by spend
SELECT
user_id, total_spent,
NTILE(4) OVER (ORDER BY total_spent) AS quartile
FROM (
SELECT user_id, SUM(amount) AS total_spent
FROM orders
GROUP BY user_id
) t;Drill 15. Funnel with sequence check
WITH ordered AS (
SELECT
user_id, event_name, event_time,
LEAD(event_name) OVER (
PARTITION BY user_id ORDER BY event_time
) AS next_event
FROM events
)
SELECT
COUNT(DISTINCT user_id) FILTER (WHERE event_name = 'view') AS step1,
COUNT(DISTINCT user_id) FILTER (
WHERE event_name = 'view' AND next_event = 'cart'
) AS step2,
COUNT(DISTINCT user_id) FILTER (
WHERE event_name = 'cart' AND next_event = 'purchase'
) AS step3
FROM ordered;Common pitfalls
The frame default is the most common trap. Writing SUM(amount) OVER (ORDER BY date) without specifying the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With duplicates in date, RANGE includes all rows sharing the current date, not just rows physically earlier. Specify ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW whenever the result must be deterministic at duplicate values.
LAST_VALUE without an explicit unbounded frame returns the current row, not the last row in the partition. The default frame stops at the current row, so LAST_VALUE over an ordered window with no frame override looks at a window of size one. Override with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING whenever you want the partition's last value.
Using ROW_NUMBER for deduplication without a deterministic tiebreaker is a subtle data-quality bug. ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) to keep the latest record per user is non-deterministic if two records share the same updated_at. Reruns can produce different "latest" rows. Always add a stable secondary key like primary id to the ORDER BY.
Filtering on a window function in WHERE fails because WHERE runs before window functions. Use an outer query referencing the windowed result via a CTE, or QUALIFY on Snowflake, BigQuery, and Databricks.
Performance: every distinct window specification can force another sort pass. Consolidate using named windows via WINDOW w AS (...) wherever the partition and order match.
Optimization tips
Sort once, use many. Compose window specifications so they share PARTITION BY and ORDER BY keys when possible. Snowflake, BigQuery, Postgres, and Databricks all deduplicate the sort when window specs match exactly.
Partition by columns already clustered or sorted in storage. On BigQuery, PARTITION BY event_date over a partitioned table is cheap. On Databricks with Z-ordered or liquid-clustered tables, the same logic applies. On Postgres, an index on the partition and order columns can let the planner skip the sort entirely.
When the windowed result feeds a top-N filter, push it down with QUALIFY (warehouses that support it) or wrap the window in a CTE and filter outside. For wide tables, project only what you need before the window function — the sort is memory-bound, so narrower rows mean fewer disk spills.
Related reading
- SQL window functions interview questions
- Window functions for data engineering interviews
- GROUP BY vs PARTITION BY in SQL
- CTE vs subquery in SQL
- How to calculate cohort retention in SQL
If you want to drill window-function questions like this every day, NAILDD is launching with 500+ SQL problems organized by exactly this pattern.
FAQ
Which window functions show up most on interviews?
ROW_NUMBER, RANK, and LAG/LEAD are the clear top three across phone screens, take-homes, and onsite SQL rounds. SUM OVER for running totals and AVG OVER for moving averages are the close second tier — every product analytics screen at a B2C company will touch them. NTILE shows up when the prompt mentions segmentation, RFM, or deciles. Knowing the difference between ROW_NUMBER, RANK, and DENSE_RANK cold is the highest-leverage prep, because that question is asked almost universally.
What is the difference between ROWS and RANGE frames?
ROWS counts physical rows: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW always includes exactly three rows. RANGE operates on values of the ORDER BY expression — when duplicates exist, RANGE groups all rows sharing a value into the same frame position, which can silently inflate aggregates. Default to ROWS unless you have a specific reason for value-based windowing. The biggest gotcha: omitting the frame entirely with an ORDER BY defaults to RANGE, not ROWS.
Can I use multiple window functions in one query?
Yes. Each window function carries its own PARTITION BY and ORDER BY, independent of others in the same SELECT. A common pattern combines a ROW_NUMBER for ranking, a SUM OVER for the partition total, and a LAG for the previous row's value — three windows, one pass, no joins. Name shared specs with WINDOW w AS (...) and reference OVER w.
Why does my WHERE clause on a window function fail?
Because WHERE is evaluated before window functions. The logical query order is FROM then WHERE then GROUP BY then HAVING then window functions then SELECT then ORDER BY, so referencing a window expression in WHERE is invalid. Use a CTE to compute the window result, then filter in the outer query. Snowflake, BigQuery, and Databricks support QUALIFY for this.
When should I use NTILE versus PERCENT_RANK?
NTILE(n) puts rows into n buckets of roughly equal size — ideal for quartiles or deciles. PERCENT_RANK returns a continuous value between 0 and 1 representing the row's relative position — ideal for per-row percentiles like churn risk or leaderboard scores. Use NTILE for discrete buckets, PERCENT_RANK for a continuous score.
How do I deduplicate rows with window functions?
Use ROW_NUMBER() OVER (PARTITION BY dedup_key ORDER BY recency_column DESC, stable_tiebreaker) filtered to = 1. The stable tiebreaker matters: without it, rows with identical recency_column produce non-deterministic results, so reruns return different "deduped" rows. A primary key or surrogate id works. Wrap in a CTE and filter outside, or use QUALIFY rn = 1 on warehouses that support it.