CTE SQL cheat sheet
Contents:
Why CTEs matter for analysts
CTEs (Common Table Expressions, also known as WITH clauses) turn tangled five-level subqueries into readable step-by-step code. On an analyst interview at Stripe, Airbnb, or DoorDash, the ability to rewrite a nest of subqueries into a clean CTE chain is a strong positive signal — it shows you decompose problems before you write SQL.
A CTE is a named intermediate result that lives inside one query. After the outer SELECT finishes, the CTE is gone — no throwaway tables every time you build a funnel or a cohort.
The other reason CTEs show up everywhere on interview loops is that they map onto how interviewers think. A clean question is a sequence of steps: pick active users, attach their orders, compute LTV, rank by cohort. Each step is one CTE. Reviewers and on-call engineers pick the CTE version every time.
Basic syntax
WITH cte_name AS (
SELECT ...
FROM ...
WHERE ...
)
SELECT *
FROM cte_name
WHERE ...;Everything between the parentheses after AS is a regular SELECT. After the closing paren you reference the result by name as if it were a real table. The semicolon terminates the whole statement.
Chained CTEs
You can define several CTEs separated by commas. Each subsequent CTE sees the ones declared earlier in the same WITH block.
WITH
active_users AS (
SELECT user_id
FROM users
WHERE last_active >= CURRENT_DATE - INTERVAL '7 day'
),
user_orders AS (
SELECT user_id, COUNT(*) AS orders_cnt, SUM(amount) AS revenue
FROM orders
WHERE user_id IN (SELECT user_id FROM active_users)
GROUP BY user_id
)
SELECT
a.user_id,
COALESCE(o.orders_cnt, 0) AS orders_cnt,
COALESCE(o.revenue, 0) AS revenue
FROM active_users a
LEFT JOIN user_orders o USING (user_id);Every step has a name and can be inspected on its own (swap the final SELECT for SELECT * FROM active_users to debug). That single property is why analysts at Notion, Linear, and Snowflake reach for WITH instead of nested subqueries.
CTE vs subquery vs temp table
A recurring interview question. The short answer:
| Criterion | CTE | Subquery | Temp table |
|---|---|---|---|
| Scope | one query | one query | session |
| Reusable | yes, within the query | no | yes |
| Materialization | engine-dependent | usually inline | physical |
| Indexes | no | no | yes |
| Readability | high | drops fast with nesting | high |
What to say out loud: CTEs and subqueries are one-shot constructs scoped to one statement. Temp tables live for the session, can be indexed, and can be reused across queries. Performance is roughly equivalent on modern engines; the CTE form is dramatically easier to review.
Materialization
A nuance interviewers love to probe:
- PostgreSQL 12 and older materializes every CTE by default. Force inlining with
WITH ... AS NOT MATERIALIZED. - PostgreSQL 13+ lets the planner decide. Simple non-recursive CTEs are inlined.
- Snowflake, BigQuery, Databricks inline CTEs the same way they inline subqueries.
For interviews: modern engines treat CTE as a transparent wrapper, so the old performance argument against WITH is dead.
Recursive CTEs
The most powerful — and most frequently fumbled — part of the syntax.
WITH RECURSIVE numbers AS (
SELECT 1 AS n -- base CASE
UNION ALL
SELECT n + 1 -- RECURSIVE step
FROM numbers
WHERE n < 10 -- stop condition
)
SELECT n FROM numbers;That returns the integers 1 through 10. The structure is always: base SELECT (seed rows), UNION ALL, recursive SELECT that references the CTE itself, plus a WHERE that stops the recursion.
Unfolding an employee hierarchy is the canonical use:
WITH RECURSIVE tree AS (
SELECT id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL -- root: the CEO
UNION ALL
SELECT e.id, e.name, e.manager_id, t.level + 1
FROM employees e
JOIN tree t ON e.manager_id = t.id
)
SELECT * FROM tree ORDER BY level;Interview prompts for org-chart traversal, comment threads, or referral chains map onto this template line-for-line.
Filling missing dates is the other classic case — generate a calendar, LEFT JOIN orders onto it, and zero-revenue days appear as rows instead of being silently dropped.
CTEs with window functions
The canonical pattern: one CTE computes a window function, the outer SELECT filters on it. You cannot filter on a window function inside WHERE directly because window functions evaluate after WHERE.
WITH ranked AS (
SELECT
user_id,
order_id,
amount,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS order_num
FROM orders
)
SELECT *
FROM ranked
WHERE order_num = 1;"Find every user's first order" is one of the most common SQL screens — Meta, Amazon, and Uber all ask a variant.
Modifying CTEs
In PostgreSQL (and a few other engines) a CTE can contain INSERT, UPDATE, or DELETE in addition to SELECT. Combined with RETURNING, you can chain mutations in a single statement.
WITH deleted AS (
DELETE FROM logs
WHERE created_at < CURRENT_DATE - INTERVAL '90 day'
RETURNING *
)
INSERT INTO logs_archive
SELECT * FROM deleted;One statement archives old log rows. This shows up on senior data-engineering loops more than analyst screens.
Common pitfalls
The first trap is reference order. CTE definitions are evaluated top-down inside a single WITH block, so a CTE can only see siblings declared earlier. If you reference cte_b from inside cte_a while cte_b is defined later, the planner rejects the query with an "undefined table" error. The fix is to reorder the definitions. Recursive CTEs are the only legal self-reference, and even those reference themselves, not later siblings.
The second pitfall is the missing RECURSIVE keyword. PostgreSQL, Snowflake, and BigQuery all require WITH RECURSIVE even when only one CTE in a chain is recursive. Forget it and you get a parser error pointing at the inner UNION ALL, which is rarely the actual problem.
The third trap is unbounded recursion. Every recursive CTE needs a stop condition in the recursive step's WHERE clause. Without one, the query runs until it hits the engine's recursion limit — high enough on PostgreSQL that you can melt a small warehouse first. Always add an explicit ceiling on the depth column or date range, even when you "know" the data is bounded.
The fourth trap is treating a CTE like a temp table that survives across statements. A CTE is born and dies with the statement that defines it. If you need to reuse the result, materialize into CREATE TEMP TABLE ... AS SELECT ... or a view. Referencing a CTE in a separate statement is the single most common screen-share interview mistake.
Ten interview tasks
Task 1. Each user's first order
WITH first_orders AS (
SELECT user_id, order_id, created_at,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS rn
FROM orders
)
SELECT user_id, order_id, created_at
FROM first_orders
WHERE rn = 1;Task 2. Users with at least three orders in a month
WITH monthly_orders AS (
SELECT user_id, DATE_TRUNC('month', created_at) AS month, COUNT(*) AS cnt
FROM orders
GROUP BY user_id, DATE_TRUNC('month', created_at)
)
SELECT DISTINCT user_id
FROM monthly_orders
WHERE cnt >= 3;Task 3. Each city's share of total revenue
WITH city_revenue AS (
SELECT city, SUM(amount) AS rev FROM orders GROUP BY city
),
total AS (
SELECT SUM(rev) AS all_rev FROM city_revenue
)
SELECT city, rev, ROUND(100.0 * rev / all_rev, 2) AS pct
FROM city_revenue, total
ORDER BY rev DESC;Task 4. Funnel: view to cart to purchase
WITH events_pivot AS (
SELECT user_id,
MAX(CASE WHEN event = 'view' THEN 1 ELSE 0 END) AS viewed,
MAX(CASE WHEN event = 'cart' THEN 1 ELSE 0 END) AS carted,
MAX(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS purchased
FROM events
GROUP BY user_id
)
SELECT
SUM(viewed) AS step1,
SUM(carted) AS step2,
SUM(purchased) AS step3
FROM events_pivot;Task 5. Weekly retention
WITH cohort AS (
SELECT user_id, DATE_TRUNC('week', MIN(created_at)) AS cohort_week
FROM users GROUP BY user_id
),
activity AS (
SELECT user_id, DATE_TRUNC('week', event_time) AS active_week
FROM events
)
SELECT
c.cohort_week,
a.active_week,
COUNT(DISTINCT a.user_id) AS active_users
FROM cohort c
JOIN activity a USING (user_id)
GROUP BY c.cohort_week, a.active_week
ORDER BY 1, 2;Task 6. Employees paid above the department average
WITH dept_avg AS (
SELECT department_id, AVG(salary) AS avg_sal
FROM employees
GROUP BY department_id
)
SELECT e.name, e.salary, e.department_id, d.avg_sal
FROM employees e
JOIN dept_avg d USING (department_id)
WHERE e.salary > d.avg_sal;Task 7. Management chain (recursive CTE)
WITH RECURSIVE chain AS (
SELECT id, name, manager_id, 0 AS level FROM employees WHERE name = 'Alex'
UNION ALL
SELECT e.id, e.name, e.manager_id, c.level + 1
FROM employees e JOIN chain c ON e.id = c.manager_id
)
SELECT * FROM chain;Task 8. Daily revenue including zero days
WITH RECURSIVE cal AS (
SELECT DATE '2026-04-01' AS day
UNION ALL
SELECT day + INTERVAL '1 day' FROM cal WHERE day < DATE '2026-04-30'
)
SELECT cal.day, COALESCE(SUM(o.amount), 0) AS revenue
FROM cal LEFT JOIN orders o ON o.created_at::DATE = cal.day
GROUP BY cal.day ORDER BY cal.day;Task 9. Cohort first and last activity
WITH user_activity AS (
SELECT user_id,
MIN(event_time) AS first_seen,
MAX(event_time) AS last_seen
FROM events
GROUP BY user_id
)
SELECT
DATE_TRUNC('month', first_seen) AS cohort,
COUNT(*) AS users,
AVG(EXTRACT(EPOCH FROM (last_seen - first_seen)) / 86400) AS avg_lifetime_days
FROM user_activity
GROUP BY cohort
ORDER BY cohort;Task 10. Categories with MoM growth above 20 percent
WITH monthly AS (
SELECT category, DATE_TRUNC('month', created_at) AS month, SUM(amount) AS rev
FROM orders JOIN products USING (product_id)
GROUP BY category, DATE_TRUNC('month', created_at)
),
with_lag AS (
SELECT category, month, rev,
LAG(rev) OVER (PARTITION BY category ORDER BY month) AS prev_rev
FROM monthly
)
SELECT category, month, rev, prev_rev,
ROUND(100.0 * (rev - prev_rev) / prev_rev, 1) AS growth_pct
FROM with_lag
WHERE prev_rev IS NOT NULL AND rev > prev_rev * 1.2;Optimization tips
Each CTE in a chain is a join target, so the planner can push filters into it — but only when the CTE is inlined. On PostgreSQL 13+, Snowflake, Databricks, and BigQuery, inlining is the default. On PostgreSQL 12, add AS NOT MATERIALIZED when you want filter pushdown.
For huge intermediate results reused three or more times in the same query, the calculus flips: materialization wins because the inner result is computed once. Force materialization on Postgres 13+ with AS MATERIALIZED; measure with EXPLAIN ANALYZE.
Recursive CTEs benefit from indexing the join key. The recursive step joins the working table to the source on a key column — if it is unindexed, every iteration triggers a sequential scan. For org-chart traversals, an index on employees(manager_id) turns a 10-second query into a 50-millisecond one.
Related reading
- SQL window functions interview questions
- CASE WHEN SQL cheat sheet
- NULL in SQL cheat sheet
- How to calculate cohort retention in SQL
- How to calculate funnel in SQL
If you want to drill SQL questions like these daily, NAILDD is launching with 500+ SQL problems built around exactly this CTE-first pattern.
FAQ
What is the actual difference between a CTE and a subquery?
Functionally they are equivalent — both are one-shot intermediate results scoped to a single statement. A CTE has a name and can be referenced multiple times in the outer query; a subquery is anonymous and single-use. On modern engines (PostgreSQL 13+, Snowflake, BigQuery, Databricks) the optimizer inlines the CTE so the execution plan matches the subquery form. The reason to reach for CTEs is readability, not performance.
When should I use a CTE versus a temp table?
Use a CTE when the logic is only needed inside one query and the intermediate result is not enormous. Use a temp table when you need to reuse the result across multiple statements, when the dataset is large enough that recomputing it is expensive, or when you need an index on the intermediate result. Temp tables also give the planner more flexibility when CTE materialization rules hurt performance.
Can a CTE reference itself?
Only if declared with WITH RECURSIVE. A non-recursive CTE can only reference previously declared siblings in the same WITH block. A recursive CTE has three required pieces: a base SELECT for the seed rows, UNION ALL, and a recursive SELECT that joins back to the CTE name. Use cases: hierarchies, graph traversal, and generating calendars or integer ranges.
How many CTEs can I put in one query?
Engines allow dozens, but the practical readability ceiling is three to seven. Beyond ten you should extract stable subroutines into views or dbt models. The pattern at Airbnb and Notion: analytical queries stay under five CTEs, and shared upstream logic lives in dbt models referenced like regular tables.
Are CTEs slower than subqueries?
Not on modern engines. On PostgreSQL 12 and older they were always materialized and could be slower because the planner could not push filters in. From PostgreSQL 13 onward, simple non-recursive CTEs are inlined by default. Snowflake, BigQuery, and Databricks have always inlined CTEs. If a CTE outperforms the same logic as a subquery, the cause is usually one-shot materialization of a repeated computation, which you can force explicitly when needed.