CTE in SQL: a working guide to the WITH clause

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why analysts need CTEs

Real analytical queries are almost never simple. You prepare data, filter, aggregate, join, aggregate again, and rank. Without CTEs that turns into a nest of subqueries nobody can read. With CTEs it becomes named steps where each block does one thing and the query reads top to bottom like a script.

CTE (Common Table Expression) is the construct you declare with WITH before the main statement. Supported in Postgres, Snowflake, BigQuery, Redshift, SQL Server, and MySQL 8.0+. On a data analyst loop at Stripe, DoorDash, or Airbnb, interviewers expect mid-level candidates to decompose non-trivial questions into named blocks. This guide covers WITH syntax, recursion, and the materialization rules screens at Snowflake and Databricks lean on.

Basic WITH syntax

A CTE is WITH followed by a name, AS, and a parenthesised SELECT. After the closing parenthesis you write the main query and reference the CTE by name as if it were a table.

WITH cte_name AS (
    SELECT ...
    FROM ...
    WHERE ...
)
SELECT *
FROM cte_name;

A worked example — AOV per category, only above one million in revenue:

WITH category_stats AS (
    SELECT
        category,
        COUNT(*)   AS order_count,
        SUM(amount) AS revenue,
        AVG(amount) AS avg_check
    FROM orders
    GROUP BY category
)
SELECT *
FROM category_stats
WHERE revenue > 1000000
ORDER BY avg_check DESC;

Without the CTE this would be a subquery in FROM. With the CTE it splits into two steps: compute statistics, then filter and sort. You can also run the CTE on its own to sanity check the numbers.

CTE vs subquery: when to pick which

Subquery and CTE often produce the same result. Difference: readability and reuse. A top-ten-users query as a subquery:

SELECT *
FROM (
    SELECT
        user_id,
        COUNT(*) AS sessions,
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rn
    FROM user_sessions
    WHERE session_date >= '2026-01-01'
    GROUP BY user_id
) ranked
WHERE rn <= 10;

The same logic with a CTE:

WITH ranked AS (
    SELECT
        user_id,
        COUNT(*) AS sessions,
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rn
    FROM user_sessions
    WHERE session_date >= '2026-01-01'
    GROUP BY user_id
)
SELECT *
FROM ranked
WHERE rn <= 10;

For a single level the difference is cosmetic. Once you have two or three levels, the subquery version becomes unreadable while the CTE still flows in one direction. A subquery is fine for short conditions in WHERE — WHERE user_id IN (SELECT user_id FROM vip_users). A CTE wins whenever the intermediate result is reused, the same logic appears more than once, or you want to comment out the outer query and inspect the partial result.

Chaining multiple CTEs

The real power of WITH is declaring several blocks where each one references the previous ones. You write WITH once — additional CTEs are separated by commas.

WITH daily_orders AS (
    SELECT
        DATE_TRUNC('day', order_date) AS dt,
        COUNT(*)   AS order_count,
        SUM(amount) AS revenue
    FROM orders
    WHERE order_date >= '2026-01-01'
    GROUP BY DATE_TRUNC('day', order_date)
),
weekly_avg AS (
    SELECT
        dt,
        order_count,
        revenue,
        AVG(revenue) OVER (
            ORDER BY dt
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS revenue_7d_avg
    FROM daily_orders
)
SELECT
    dt,
    order_count,
    revenue,
    ROUND(revenue_7d_avg, 2) AS revenue_7d_avg
FROM weekly_avg
ORDER BY dt;

Three steps: aggregate by day, layer a seven-day rolling average, then format the output. Each CTE has one job. If a number looks wrong, comment out the final SELECT, run the chain up to weekly_avg, and see where the bug entered. Reviewers at Snowflake and Databricks watch for this pattern.

Step-by-step transforms: a conversion funnel

Funnel questions show up in every product analyst loop. The shape is always the same: how many users reached step one, step two, and so on. CTEs are tailor-made — each step becomes its own block, and the final SELECT glues counts together.

WITH registrations AS (
    SELECT user_id, created_at
    FROM users
    WHERE created_at >= '2026-01-01'
      AND created_at <  '2026-02-01'
),
activated AS (
    SELECT DISTINCT r.user_id
    FROM registrations r
    JOIN user_actions a ON a.user_id = r.user_id
    WHERE a.action = 'complete_onboarding'
),
first_purchase AS (
    SELECT DISTINCT r.user_id
    FROM registrations r
    JOIN orders o ON o.user_id = r.user_id
    WHERE o.order_date <= r.created_at + INTERVAL '7 days'
),
funnel AS (
    SELECT
        (SELECT COUNT(*) FROM registrations)  AS step_1_registered,
        (SELECT COUNT(*) FROM activated)      AS step_2_activated,
        (SELECT COUNT(*) FROM first_purchase) AS step_3_purchased
)
SELECT
    step_1_registered,
    step_2_activated,
    ROUND(100.0 * step_2_activated  / step_1_registered, 1) AS activation_pct,
    step_3_purchased,
    ROUND(100.0 * step_3_purchased / step_1_registered, 1) AS purchase_pct
FROM funnel;

Four CTEs: the cohort, the activated subset, the seven-day purchase subset, and the assembled funnel. Try the same with nested subqueries and you produce something nobody can review. Each block can be queried alone when QA asks "why did activation drop in February".

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Recursive CTE

Recursive CTEs are a separate construct that lets a query reference itself — useful for hierarchies: org charts, category trees, referral chains, threaded comments. The shape is an anchor query, a UNION ALL, and a recursive arm that joins back to the CTE.

WITH RECURSIVE cte_name AS (
    -- anchor: the base case
    SELECT ...
    UNION ALL
    -- recursive part: references cte_name
    SELECT ...
    FROM cte_name
    JOIN ...
)
SELECT * FROM cte_name;

Textbook example — an employee tree where employees.manager_id points at the boss:

WITH RECURSIVE org_tree AS (
    -- anchor: the CEO (manager_id IS NULL)
    SELECT
        id,
        name,
        manager_id,
        1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- recursion: employees whose manager is already in the tree
    SELECT
        e.id,
        e.name,
        e.manager_id,
        t.level + 1
    FROM employees e
    JOIN org_tree t ON e.manager_id = t.id
)
SELECT
    level,
    name,
    manager_id
FROM org_tree
ORDER BY level, name;

The result is every employee labelled with their depth: CEO at level one, direct reports at level two, and so on. Recursive CTEs come up at mid-level and above. For a junior role, knowing they exist is enough. For mid or senior, you need to write one in front of an interviewer without panicking — especially at Amazon and Microsoft.

CTE vs temporary table

CTEs and temporary tables both name an intermediate result, but differ on every other axis. A CTE lives for one query and disappears when it finishes. A temporary table lives for the whole session, can have indexes, and can be referenced from multiple queries.

Property CTE Temporary table
Scope Single query Whole session
Indexes None Can be created
Reuse Only inside one statement Across many statements
Plan impact Optimizer may inline Always materialised
Best for Decomposing one query Heavy intermediates reused many times

Postgres 12+ inlines a CTE used once and non-recursive, so it adds zero overhead. Force materialisation with AS MATERIALIZED (...); force inlining with AS NOT MATERIALIZED (...). Snowflake and BigQuery always inline non-recursive CTEs.

The common interview phrasing is "when would you pick a CTE over a temp table?". CTE for structuring a single query; temp table when an expensive intermediate is reused across several follow-up queries.

Common pitfalls

Engineers new to CTEs often assume they behave like a view and are therefore "free". In older Postgres (11 and below) every CTE was a hard optimization fence — the planner could not push predicates inside, and it was always materialised. On a billion-row table that meant a fast query suddenly took an hour. Upgrade to Postgres 12+ or add NOT MATERIALIZED when you want inlining.

The second trap is recursion without a stopping condition. If your hierarchy has a cycle, a naive recursive CTE spins forever and either crashes the planner or hits the recursion limit. Add a level counter with WHERE level < 20 to the recursive arm, or track the path and refuse to revisit a visited node. Treat any production recursive CTE as if the data has cycles.

A third pitfall is under-materialising heavy CTEs referenced multiple times. If daily_orders produces ten million rows and the outer query references it three times, an inlining optimizer runs the aggregation three times. There you want AS MATERIALIZED or a temp table loaded once. Check the explain plan first.

The last trap is column naming. A CTE inherits column names from the inner SELECT, so COUNT(*) becomes count. Always alias computed columns inside the CTE — COUNT(*) AS order_count. Without aliases the outer query fails or silently picks up the wrong column when two CTEs share a default name.

Interview questions

1. What is a CTE and why use one?

A CTE is a named intermediate result declared with WITH before the main statement, alive for one query. The point is decomposition — every step gets a name that explains what it does. The query reads top to bottom, debugging is faster because each block can be executed alone, and an intermediate result can be referenced multiple times without rewriting it.

2. Is a CTE materialised or not?

Depends on the engine. Postgres up to 11 always materialised CTEs and treated them as optimization fences. From 12 onward, a non-recursive CTE referenced once is automatically inlined. BigQuery and Snowflake always inline non-recursive CTEs. Force materialisation in Postgres with AS MATERIALIZED (...), force inlining with AS NOT MATERIALIZED (...). The honest interview answer is "depends on the engine, here is how I would check the plan", not a blanket yes or no.

3. Find users whose every order is above their own average order value.

WITH user_avg AS (
    SELECT
        user_id,
        AVG(amount) AS avg_amount
    FROM orders
    GROUP BY user_id
),
below_avg AS (
    SELECT DISTINCT o.user_id
    FROM orders o
    JOIN user_avg u ON o.user_id = u.user_id
    WHERE o.amount <= u.avg_amount
)
SELECT DISTINCT user_id
FROM user_avg
WHERE user_id NOT IN (SELECT user_id FROM below_avg);

Two CTEs: per-user average, then users with at least one order at or below their average. The final SELECT keeps everyone not in that list. Reviewers prefer this shape over a correlated subquery — each block is easy to verify on its own.

4. Write a recursive CTE that returns all subordinates of a given manager.

WITH RECURSIVE subordinates AS (
    SELECT id, name, manager_id
    FROM employees
    WHERE manager_id = 42  -- the starting manager

    UNION ALL

    SELECT e.id, e.name, e.manager_id
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;

The anchor is the direct reports of employee 42. The recursive arm walks down the tree until no further reports remain. Follow-up: "what if there is a cycle?". Answer: infinite recursion. Guard with a depth counter and WHERE level < N, or by tracking the path and refusing to revisit a visited node.

5. How is a CTE different from a subquery?

Functionally almost identical — both produce an intermediate result for the outer query. Practical differences: a CTE has a name and can be referenced multiple times, while a subquery has to be duplicated. A CTE reads top-to-bottom like a script; nested subqueries read inside-out. For one-shot WHERE conditions, a subquery is simpler. For multi-step pipelines, the CTE wins on readability.

If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems built around exactly this kind of decomposition.

FAQ

Can I use a CTE in INSERT, UPDATE, and DELETE?

Yes. WITH is not restricted to SELECT. Postgres lets you write WITH ... INSERT INTO, WITH ... UPDATE, and WITH ... DELETE. The pattern is useful when you need to compute an intermediate and then mutate — a CTE that finds users inactive for ninety days, then a DELETE that consumes it.

Does a CTE hurt performance?

In most cases, no. Postgres 12+, BigQuery, Snowflake, and SQL Server inline a non-recursive CTE referenced once, producing the same plan as the equivalent subquery. The exception is a CTE referenced multiple times — inlining means recomputing on every reference, so materialisation can be faster. Check the explain plan; legacy Postgres (11 and below) always materialised.

Is CTE supported in MySQL?

Yes, from MySQL 8.0 (2018). Recursive CTEs too. MySQL 5.7 and earlier have no WITH — fall back to subqueries or temporary tables. Most analyst interviews are Postgres dialect or unspecified, so CTEs are safe. If the interviewer says MySQL, confirm the version first.

When should I prefer a recursive CTE over multiple self-joins?

Self-joins are fine when the depth is fixed and small — manager and manager's manager. Recursion is the right tool when the depth is unbounded: full org charts, comment threads, category trees with arbitrary nesting. A recursive CTE handles unknown depth in one block; self-joins require you to predict the maximum depth and break the moment data exceeds it.

How do I debug a CTE returning wrong numbers?

Treat the chain like a pipeline. Comment out the final SELECT and run SELECT * FROM <last_cte> LIMIT 100. If that looks right, walk up one CTE at a time. The CTE where the numbers first go wrong is the one to fix. This is exactly why CTEs are worth writing — each block is independently runnable, so you can binary-search the bug.