CTE vs subquery — when each one wins

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why this question keeps coming up

A hiring manager at Stripe asks you to rewrite a forty-line block of nested SELECTs so a junior can read it next quarter. Parentheses three levels deep, the same aggregation duplicated twice because someone needed it in both a JOIN and a HAVING clause. You know the answer involves a CTE. But if you blindly wrap everything in WITH, the next interviewer asks why the plan looks worse and whether you understand materialization.

CTE vs subquery is an interview prompt that looks like trivia but is really about engineering judgement. Both constructs let you compose a query from smaller pieces. The honest framing: they are two ways of naming and reusing a result set, and the right pick depends on how many times you read that result, whether you need recursion, what engine you are on, and whether you trust the planner to inline it.

This guide walks both forms through the same example and gives you the answer to expect at a Snowflake or Databricks interview.

What a CTE looks like

A Common Table Expression is a named, scoped result set declared with the WITH keyword. It lives only for the duration of a single statement and behaves like a virtual table you can reference by name.

WITH active_users AS (
  SELECT user_id, COUNT(*) AS sessions
  FROM sessions
  WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT u.name, a.sessions
FROM users u
JOIN active_users a ON u.user_id = a.user_id
WHERE a.sessions > 10;

You define active_users once at the top, then treat it as if it were a real table. The reader scans top to bottom: first the building block, then the assembly. There is no nesting to track.

What a subquery looks like

A subquery is a SELECT placed inline inside another statement — in the FROM clause, the WHERE clause, the SELECT list, or as part of a JOIN target. Here is the same query rewritten with the active-user calculation as an inline derived table:

SELECT u.name, a.sessions
FROM users u
JOIN (
  SELECT user_id, COUNT(*) AS sessions
  FROM sessions
  WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY user_id
) a ON u.user_id = a.user_id
WHERE a.sessions > 10;

Identical output, same execution plan on most modern engines, but the building block is now buried inside the FROM clause and your eye has to jump between the alias a and the parenthesized definition above it.

The real differences

The honest comparison has five axes: readability, reuse, recursion, performance, and scoping.

Readability. CTEs read like a paragraph: each step has a name, each step appears once, and the final SELECT pulls them together. Subqueries read like a math expression with parentheses — fine for one or two levels, painful at three or four. On any query longer than fifteen lines, the CTE form is easier to onboard a new teammate to.

Reuse. A CTE can be referenced from multiple places in the outer statement, including from later CTEs in the same WITH chain. A subquery cannot — if you need the same derived table in two JOINs, you copy and paste it, and now you have two places to change when the logic shifts.

Recursion. Only CTEs can be recursive. If you need to walk a manager hierarchy, expand a bill-of-materials tree, or follow a chain of referrals, WITH RECURSIVE is the only standard-SQL way to do it. Subqueries cannot reference themselves.

Performance. This is the slippery one. On PostgreSQL 12+, MySQL 8+, Snowflake, BigQuery, and Databricks SQL, the optimizer will usually inline a non-recursive CTE so the plan is identical to the subquery form. On older PostgreSQL (≤11), CTEs were always materialized — a fence the planner could not cross — which sometimes helped (avoided recomputation) and sometimes hurt (blocked predicate pushdown).

Scoping. CTEs are visible to everything below them in the WITH chain and to the final SELECT. Subqueries are visible only at their lexical position. CTEs are also easier to debug: comment out the final SELECT, replace it with SELECT * FROM my_cte, and you have a one-line inspector for the intermediate result.

When CTEs are the better choice

The first scenario is a multi-step pipeline. You take raw events, aggregate to daily revenue, compute a rolling seven-day average, and finally flag the days where revenue spiked.

WITH daily_revenue AS (
  SELECT DATE_TRUNC('day', created_at) AS day,
         SUM(amount) AS revenue
  FROM orders
  GROUP BY 1
),
weekly_avg AS (
  SELECT day,
         revenue,
         AVG(revenue) OVER (ORDER BY day ROWS 6 PRECEDING) AS rolling_avg
  FROM daily_revenue
)
SELECT day, revenue, rolling_avg
FROM weekly_avg
WHERE revenue > rolling_avg * 1.5;

Each step gets a name. The reader sees the recipe before they see the seasoning. Try writing the same logic with nested subqueries and you will have a four-deep parenthesis stack that nobody wants to touch.

The second scenario is reuse. You compute a user_stats aggregate once and reference it from both arms of a UNION:

WITH user_stats AS (
  SELECT user_id, COUNT(*) AS orders, SUM(amount) AS total
  FROM orders
  GROUP BY user_id
)
SELECT 'high_value' AS segment, COUNT(*) AS users
FROM user_stats
WHERE total > 10000
UNION ALL
SELECT 'low_value' AS segment, COUNT(*) AS users
FROM user_stats
WHERE total <= 10000;

With subqueries you would duplicate the aggregation. Two copies means two places to fix when the spec changes, and two scans of orders if the planner cannot deduplicate.

The third scenario is recursion. There is no subquery equivalent — you have to use a recursive CTE:

WITH RECURSIVE org_tree AS (
  SELECT id, name, manager_id, 1 AS level
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, t.level + 1
  FROM employees e
  JOIN org_tree t ON e.manager_id = t.id
)
SELECT id, name, level
FROM org_tree
ORDER BY level, name;

Recursive CTEs are how you walk trees and graphs in SQL: org charts, comment threads, category hierarchies, graph traversals up to a depth limit.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

When a subquery is the better choice

The honest case for subqueries is short, one-off filters where naming the result adds noise. The classic example is filtering against a scalar aggregate:

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Wrapping (SELECT AVG(salary) FROM employees) in a CTE would not make this clearer — it would add three lines for a single number. Same with EXISTS and IN predicates: a correlated subquery is idiomatic, and rewriting it as a CTE often produces a worse plan because the planner loses the correlation hint.

The rule of thumb: if the derived table is used exactly once, shorter than four lines, and has no business-logic name worth giving it, leave it as a subquery. Anything more complex earns a CTE.

Performance — the honest answer

This is where bad interview advice creeps in. Some people will tell you "CTEs are always slower because they materialize." That has not been true on most engines for years.

On PostgreSQL 12+, non-recursive CTEs are inlined by default unless you explicitly write WITH foo AS MATERIALIZED (...). On PostgreSQL 11 and earlier, every CTE was a hard optimization fence — useful when you wanted to force materialization (for example, a CTE referenced five times that would otherwise be recomputed five times), painful when you wanted predicate pushdown.

On MySQL 8, Snowflake, BigQuery, and Databricks SQL, the planner treats CTEs and inline subqueries equivalently in most cases. The choice is about readability, not speed.

The two cases where the form does change the plan: recursive CTEs (no subquery equivalent exists), and explicit MATERIALIZED or NOT MATERIALIZED hints on PostgreSQL 12+. The latter is worth knowing because it gives you a knob: force the fence if you want to avoid recomputation, or force the inline if you want predicate pushdown.

What to say in an interview: "On modern engines they are usually equivalent. I default to CTEs for readability. If a CTE is referenced once and the planner cannot push predicates into it, that is an old-Postgres footgun I would rewrite or add the inline hint to."

Common pitfalls

The most common mistake is reaching for a CTE when a scalar subquery would do. A four-line WITH block to compute a single AVG adds vertical space without adding clarity. Every CTE is a name your reader has to keep in their head, and names are not free.

A second trap is assuming CTEs are caches. They are not. A non-materialized CTE referenced three times will be evaluated three times — the planner inlines it three times. If the underlying scan is expensive and the result is small, you may want the materialized form to compute once and reuse, but you have to ask for it explicitly on engines that default to inlining.

A third trap is the recursive CTE that never terminates. If your join condition is wrong or the data has a cycle, the recursion runs until it hits the engine's depth limit and dies with an opaque error. Include a level counter and a WHERE level < 100 guard while you develop, and remove it only after you have verified the data is acyclic.

A fourth pitfall is name shadowing. A CTE named users silently overrides a real table named users for the rest of the query. New readers assume the SELECT is reading the table; they are actually reading your CTE. Pick names that signal a derived set — active_users, eligible_orders, monthly_cohort.

A fifth pitfall, especially in interviews, is forgetting that CTEs only live for the duration of the statement. They are not temporary tables, and you cannot reference one from a second query in the same session. If you need that, use CREATE TEMP TABLE.

Optimization tips

When a CTE is referenced more than once and the underlying scan is expensive, force materialization. On PostgreSQL 12+ that is WITH foo AS MATERIALIZED (...). On engines without the hint, emulate it by writing the derived set to a CREATE TEMP TABLE and joining against the temp table.

When a CTE wraps a filtered scan and you want the planner to push the outer WHERE into it, do the opposite — use NOT MATERIALIZED on Postgres or inline it as a subquery. The cost is readability; the benefit is the engine reads fewer rows.

For recursive CTEs, watch the recursion depth. PostgreSQL has no built-in limit; Snowflake caps at 100 by default; BigQuery caps at 500. Add a level filter in the recursive arm if your hierarchy can grow unbounded.

For pipelines with more than ten CTEs, break them into materialized views or tables in your transformation tool (dbt, Dataform). A 200-line WITH block is hard to debug, hard to test, and hard to incrementalize.

If you want to drill SQL problems like this every day until the patterns are reflex, NAILDD is launching with 500+ SQL questions across exactly these kinds of composition tradeoffs.

FAQ

Does a CTE always slow down a query?

No, and on most modern engines it does not change the plan at all. PostgreSQL 12+, MySQL 8+, Snowflake, BigQuery, and Databricks SQL inline a non-recursive CTE so the optimizer sees the same shape as the subquery form. The "CTEs are always slower" claim comes from PostgreSQL 11 and earlier where every CTE was a hard optimization fence. On Postgres 12+, pick either form for performance and let readability decide.

How many CTEs can I chain in a single query?

There is no fixed limit in the SQL standard, and engines support hundreds in practice. The practical ceiling is readability — once you cross ten or fifteen CTEs, you are writing a small data pipeline and you would be better served materializing intermediate steps as tables or views in dbt. Long WITH chains are hard to test in isolation, hard to incrementalize, and slow to iterate on.

Is a CTE the same as a temporary table?

No. A CREATE TEMP TABLE is a real object that lives until the end of the session and can be referenced by multiple statements. A CTE exists only during a single statement and disappears the moment it finishes. Temp tables also have their own statistics, which the planner uses for cardinality estimates — CTEs do not.

Can I do INSERT or UPDATE inside a CTE?

Yes, on engines that support data-modifying CTEs. PostgreSQL is the most permissive — you can put INSERT, UPDATE, or DELETE inside a CTE and use RETURNING to pipe affected rows into the main statement. Useful for patterns like "delete from one table and archive the rows in a single transaction." Snowflake, BigQuery, and MySQL are more restrictive; check the docs.

Why is my recursive CTE running forever?

Two common causes. First, the recursive arm has no termination — the join keeps producing rows because the data has a cycle or the join key is wrong. Second, the depth is larger than expected and the engine's default cap kicks in. Add an explicit level column and a WHERE level < N guard in the recursive arm during development, then raise N once you have confirmed the recursion terminates.