May 18, 2026·9 min read

CTE SQL cheat sheet

Q: Can a CTE reference itself?

Only if declared with `WITH RECURSIVE`. A non-recursive CTE can only reference previously declared siblings in the same `WITH` block. A recursive CTE has three required pieces: a base `SELECT` for the seed rows, `UNION ALL`, and a recursive `SELECT` that joins back to the CTE name. Use cases: hierarchies, graph traversal, and generating calendars or integer ranges.

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Join the waitlist

Contents:

Why CTEs matter for analysts
Basic syntax
Chained CTEs
CTE vs subquery vs temp table
Materialization
Recursive CTEs
CTEs with window functions
Modifying CTEs
Common pitfalls
Ten interview tasks
Optimization tips
Related reading
FAQ

Why CTEs matter for analysts

CTEs (Common Table Expressions, also known as WITH clauses) turn tangled five-level subqueries into readable step-by-step code. On an analyst interview at Stripe, Airbnb, or DoorDash, the ability to rewrite a nest of subqueries into a clean CTE chain is a strong positive signal — it shows you decompose problems before you write SQL.

A CTE is a named intermediate result that lives inside one query. After the outer SELECT finishes, the CTE is gone — no throwaway tables every time you build a funnel or a cohort.

The other reason CTEs show up everywhere on interview loops is that they map onto how interviewers think. A clean question is a sequence of steps: pick active users, attach their orders, compute LTV, rank by cohort. Each step is one CTE. Reviewers and on-call engineers pick the CTE version every time.

Basic syntax

WITH cte_name AS (
    SELECT ...
    FROM ...
    WHERE ...
)
SELECT *
FROM cte_name
WHERE ...;

Everything between the parentheses after AS is a regular SELECT. After the closing paren you reference the result by name as if it were a real table. The semicolon terminates the whole statement.

Chained CTEs

You can define several CTEs separated by commas. Each subsequent CTE sees the ones declared earlier in the same WITH block.

WITH
    active_users AS (
        SELECT user_id
        FROM users
        WHERE last_active >= CURRENT_DATE - INTERVAL '7 day'
    ),
    user_orders AS (
        SELECT user_id, COUNT(*) AS orders_cnt, SUM(amount) AS revenue
        FROM orders
        WHERE user_id IN (SELECT user_id FROM active_users)
        GROUP BY user_id
    )
SELECT
    a.user_id,
    COALESCE(o.orders_cnt, 0) AS orders_cnt,
    COALESCE(o.revenue, 0) AS revenue
FROM active_users a
LEFT JOIN user_orders o USING (user_id);

Every step has a name and can be inspected on its own (swap the final SELECT for SELECT * FROM active_users to debug). That single property is why analysts at Notion, Linear, and Snowflake reach for WITH instead of nested subqueries.

CTE vs subquery vs temp table

A recurring interview question. The short answer:

Criterion	CTE	Subquery	Temp table
Scope	one query	one query	session
Reusable	yes, within the query	no	yes
Materialization	engine-dependent	usually inline	physical
Indexes	no	no	yes
Readability	high	drops fast with nesting	high

What to say out loud: CTEs and subqueries are one-shot constructs scoped to one statement. Temp tables live for the session, can be indexed, and can be reused across queries. Performance is roughly equivalent on modern engines; the CTE form is dramatically easier to review.

Materialization

A nuance interviewers love to probe:

PostgreSQL 12 and older materializes every CTE by default. Force inlining with WITH ... AS NOT MATERIALIZED.
PostgreSQL 13+ lets the planner decide. Simple non-recursive CTEs are inlined.
Snowflake, BigQuery, Databricks inline CTEs the same way they inline subqueries.

For interviews: modern engines treat CTE as a transparent wrapper, so the old performance argument against WITH is dead.

Recursive CTEs

The most powerful — and most frequently fumbled — part of the syntax.

WITH RECURSIVE numbers AS (
    SELECT 1 AS n           -- base CASE
    UNION ALL
    SELECT n + 1             -- RECURSIVE step
    FROM numbers
    WHERE n < 10             -- stop condition
)
SELECT n FROM numbers;

That returns the integers 1 through 10. The structure is always: base SELECT (seed rows), UNION ALL, recursive SELECT that references the CTE itself, plus a WHERE that stops the recursion.

Unfolding an employee hierarchy is the canonical use:

WITH RECURSIVE tree AS (
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL   -- root: the CEO
    UNION ALL
    SELECT e.id, e.name, e.manager_id, t.level + 1
    FROM employees e
    JOIN tree t ON e.manager_id = t.id
)
SELECT * FROM tree ORDER BY level;

Interview prompts for org-chart traversal, comment threads, or referral chains map onto this template line-for-line.

Filling missing dates is the other classic case — generate a calendar, LEFT JOIN orders onto it, and zero-revenue days appear as rows instead of being silently dropped.

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Join the waitlist

CTEs with window functions

The canonical pattern: one CTE computes a window function, the outer SELECT filters on it. You cannot filter on a window function inside WHERE directly because window functions evaluate after WHERE.

WITH ranked AS (
    SELECT
        user_id,
        order_id,
        amount,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS order_num
    FROM orders
)
SELECT *
FROM ranked
WHERE order_num = 1;

"Find every user's first order" is one of the most common SQL screens — Meta, Amazon, and Uber all ask a variant.

Modifying CTEs

In PostgreSQL (and a few other engines) a CTE can contain INSERT, UPDATE, or DELETE in addition to SELECT. Combined with RETURNING, you can chain mutations in a single statement.

WITH deleted AS (
    DELETE FROM logs
    WHERE created_at < CURRENT_DATE - INTERVAL '90 day'
    RETURNING *
)
INSERT INTO logs_archive
SELECT * FROM deleted;

One statement archives old log rows. This shows up on senior data-engineering loops more than analyst screens.

Common pitfalls

The first trap is reference order. CTE definitions are evaluated top-down inside a single WITH block, so a CTE can only see siblings declared earlier. If you reference cte_b from inside cte_a while cte_b is defined later, the planner rejects the query with an "undefined table" error. The fix is to reorder the definitions. Recursive CTEs are the only legal self-reference, and even those reference themselves, not later siblings.

The second pitfall is the missing RECURSIVE keyword. PostgreSQL, Snowflake, and BigQuery all require WITH RECURSIVE even when only one CTE in a chain is recursive. Forget it and you get a parser error pointing at the inner UNION ALL, which is rarely the actual problem.

The third trap is unbounded recursion. Every recursive CTE needs a stop condition in the recursive step's WHERE clause. Without one, the query runs until it hits the engine's recursion limit — high enough on PostgreSQL that you can melt a small warehouse first. Always add an explicit ceiling on the depth column or date range, even when you "know" the data is bounded.

The fourth trap is treating a CTE like a temp table that survives across statements. A CTE is born and dies with the statement that defines it. If you need to reuse the result, materialize into CREATE TEMP TABLE ... AS SELECT ... or a view. Referencing a CTE in a separate statement is the single most common screen-share interview mistake.

Ten interview tasks

Task 1. Each user's first order

WITH first_orders AS (
    SELECT user_id, order_id, created_at,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS rn
    FROM orders
)
SELECT user_id, order_id, created_at
FROM first_orders
WHERE rn = 1;

Task 2. Users with at least three orders in a month

WITH monthly_orders AS (
    SELECT user_id, DATE_TRUNC('month', created_at) AS month, COUNT(*) AS cnt
    FROM orders
    GROUP BY user_id, DATE_TRUNC('month', created_at)
)
SELECT DISTINCT user_id
FROM monthly_orders
WHERE cnt >= 3;

WITH city_revenue AS (
    SELECT city, SUM(amount) AS rev FROM orders GROUP BY city
),
total AS (
    SELECT SUM(rev) AS all_rev FROM city_revenue
)
SELECT city, rev, ROUND(100.0 * rev / all_rev, 2) AS pct
FROM city_revenue, total
ORDER BY rev DESC;

Task 4. Funnel: view to cart to purchase

WITH events_pivot AS (
    SELECT user_id,
        MAX(CASE WHEN event = 'view' THEN 1 ELSE 0 END) AS viewed,
        MAX(CASE WHEN event = 'cart' THEN 1 ELSE 0 END) AS carted,
        MAX(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS purchased
    FROM events
    GROUP BY user_id
)
SELECT
    SUM(viewed) AS step1,
    SUM(carted) AS step2,
    SUM(purchased) AS step3
FROM events_pivot;

Task 5. Weekly retention

WITH cohort AS (
    SELECT user_id, DATE_TRUNC('week', MIN(created_at)) AS cohort_week
    FROM users GROUP BY user_id
),
activity AS (
    SELECT user_id, DATE_TRUNC('week', event_time) AS active_week
    FROM events
)
SELECT
    c.cohort_week,
    a.active_week,
    COUNT(DISTINCT a.user_id) AS active_users
FROM cohort c
JOIN activity a USING (user_id)
GROUP BY c.cohort_week, a.active_week
ORDER BY 1, 2;

Task 6. Employees paid above the department average

WITH dept_avg AS (
    SELECT department_id, AVG(salary) AS avg_sal
    FROM employees
    GROUP BY department_id
)
SELECT e.name, e.salary, e.department_id, d.avg_sal
FROM employees e
JOIN dept_avg d USING (department_id)
WHERE e.salary > d.avg_sal;

Task 7. Management chain (recursive CTE)

WITH RECURSIVE chain AS (
    SELECT id, name, manager_id, 0 AS level FROM employees WHERE name = 'Alex'
    UNION ALL
    SELECT e.id, e.name, e.manager_id, c.level + 1
    FROM employees e JOIN chain c ON e.id = c.manager_id
)
SELECT * FROM chain;

Task 8. Daily revenue including zero days

WITH RECURSIVE cal AS (
    SELECT DATE '2026-04-01' AS day
    UNION ALL
    SELECT day + INTERVAL '1 day' FROM cal WHERE day < DATE '2026-04-30'
)
SELECT cal.day, COALESCE(SUM(o.amount), 0) AS revenue
FROM cal LEFT JOIN orders o ON o.created_at::DATE = cal.day
GROUP BY cal.day ORDER BY cal.day;

Task 9. Cohort first and last activity

WITH user_activity AS (
    SELECT user_id,
        MIN(event_time) AS first_seen,
        MAX(event_time) AS last_seen
    FROM events
    GROUP BY user_id
)
SELECT
    DATE_TRUNC('month', first_seen) AS cohort,
    COUNT(*) AS users,
    AVG(EXTRACT(EPOCH FROM (last_seen - first_seen)) / 86400) AS avg_lifetime_days
FROM user_activity
GROUP BY cohort
ORDER BY cohort;

Task 10. Categories with MoM growth above 20 percent

WITH monthly AS (
    SELECT category, DATE_TRUNC('month', created_at) AS month, SUM(amount) AS rev
    FROM orders JOIN products USING (product_id)
    GROUP BY category, DATE_TRUNC('month', created_at)
),
with_lag AS (
    SELECT category, month, rev,
        LAG(rev) OVER (PARTITION BY category ORDER BY month) AS prev_rev
    FROM monthly
)
SELECT category, month, rev, prev_rev,
    ROUND(100.0 * (rev - prev_rev) / prev_rev, 1) AS growth_pct
FROM with_lag
WHERE prev_rev IS NOT NULL AND rev > prev_rev * 1.2;

Optimization tips

Each CTE in a chain is a join target, so the planner can push filters into it — but only when the CTE is inlined. On PostgreSQL 13+, Snowflake, Databricks, and BigQuery, inlining is the default. On PostgreSQL 12, add AS NOT MATERIALIZED when you want filter pushdown.

For huge intermediate results reused three or more times in the same query, the calculus flips: materialization wins because the inner result is computed once. Force materialization on Postgres 13+ with AS MATERIALIZED; measure with EXPLAIN ANALYZE.

Recursive CTEs benefit from indexing the join key. The recursive step joins the working table to the source on a key column — if it is unindexed, every iteration triggers a sequential scan. For org-chart traversals, an index on employees(manager_id) turns a 10-second query into a 50-millisecond one.

If you want to drill SQL questions like these daily, NAILDD is launching with 500+ SQL problems built around exactly this CTE-first pattern.

FAQ

What is the actual difference between a CTE and a subquery?

Functionally they are equivalent — both are one-shot intermediate results scoped to a single statement. A CTE has a name and can be referenced multiple times in the outer query; a subquery is anonymous and single-use. On modern engines (PostgreSQL 13+, Snowflake, BigQuery, Databricks) the optimizer inlines the CTE so the execution plan matches the subquery form. The reason to reach for CTEs is readability, not performance.

When should I use a CTE versus a temp table?

Use a CTE when the logic is only needed inside one query and the intermediate result is not enormous. Use a temp table when you need to reuse the result across multiple statements, when the dataset is large enough that recomputing it is expensive, or when you need an index on the intermediate result. Temp tables also give the planner more flexibility when CTE materialization rules hurt performance.

Can a CTE reference itself?

Only if declared with WITH RECURSIVE. A non-recursive CTE can only reference previously declared siblings in the same WITH block. A recursive CTE has three required pieces: a base SELECT for the seed rows, UNION ALL, and a recursive SELECT that joins back to the CTE name. Use cases: hierarchies, graph traversal, and generating calendars or integer ranges.

How many CTEs can I put in one query?

Engines allow dozens, but the practical readability ceiling is three to seven. Beyond ten you should extract stable subroutines into views or dbt models. The pattern at Airbnb and Notion: analytical queries stay under five CTEs, and shared upstream logic lives in dbt models referenced like regular tables.

Are CTEs slower than subqueries?

Not on modern engines. On PostgreSQL 12 and older they were always materialized and could be slower because the planner could not push filters in. From PostgreSQL 13 onward, simple non-recursive CTEs are inlined by default. Snowflake, BigQuery, and Databricks have always inlined CTEs. If a CTE outperforms the same logic as a subquery, the cause is usually one-shot materialization of a repeated computation, which you can force explicitly when needed.