UNION in SQL — how to combine query results the right way
Contents:
What UNION does
The PM pings you on Slack at 4:47pm: "Pull every customer who showed up in either the 2024 or 2025 orders table. Board deck is at 9." Two tables, identical schema, sitting in different partitions. You want one result set, deduped, sorted, ready to paste — not two queries and a spreadsheet merge.
That is what UNION is for. UNION stacks the rows of one SELECT on top of another vertically, returning a single result. If the same row appears in both queries, UNION keeps it once. If you want every row preserved — including duplicates — you reach for UNION ALL, the faster and more common variant in real analyst work.
SELECT customer_id, customer_name FROM orders_2024
UNION
SELECT customer_id, customer_name FROM orders_2025;The result is a single deduplicated customer list. A buyer who appears in both years shows up exactly once. Swap UNION for UNION ALL and that same buyer shows up twice — once per source row.
UNION vs UNION ALL vs INTERSECT vs EXCEPT
The four ANSI set operators all combine query results, but they answer different questions. Memorize this table — half of SQL interview misses on set operators trace back to picking the wrong one.
| Operator | What it returns | Dedupes? | Typical use case | Relative cost |
|---|---|---|---|---|
UNION |
Rows in A or B | Yes | Distinct customers across two regions | Medium (sorts internally) |
UNION ALL |
Rows in A or B | No | Stacking partitions, event logs, monthly slices | Cheap (streaming concat) |
INTERSECT |
Rows in A and B | Yes | Customers who bought online and offline | Medium |
EXCEPT |
Rows in A but not in B | Yes | Users active in 2024 who churned in 2025 | Medium |
Load-bearing trick: default to UNION ALL. Use plain UNION only when you specifically need duplicate elimination and the input could actually contain duplicates. Picking UNION "just to be safe" costs you a sort on millions of rows for no gain.
On Postgres and Snowflake, UNION performs a hash aggregate or sort-distinct after concatenation. On BigQuery, plain UNION is not even valid — you must write UNION DISTINCT explicitly. That dialect difference alone has burned plenty of analysts moving from Snowflake to BigQuery.
Rules every set operator enforces
UNION, UNION ALL, INTERSECT, and EXCEPT share the same column contract. Break either rule and the engine refuses to run the query.
Same column count. Every SELECT must return the same number of columns. The engine does not auto-pad with NULLs — you write them in yourself.
Compatible data types per position. Columns match by position, not by name. The first column of the first query unions with the first column of the second, and the types must be compatible. INT with BIGINT is fine. INT with VARCHAR is not.
-- Wrong: 3 columns vs 2 columns
SELECT name, city, signup_date FROM customers
UNION ALL
SELECT name, city FROM suppliers;
-- Right: pad the shorter side
SELECT name, city, signup_date FROM customers
UNION ALL
SELECT name, city, NULL AS signup_date FROM suppliers;Column names in the final result come from the first SELECT. If your first query aliases a column as client_name and the second calls it supplier_name, the output column is client_name — which matters when downstream BI tools key off column names.
ORDER BY with UNION
ORDER BY belongs at the end of the whole statement, never inside an individual SELECT in a UNION chain. It sorts the combined result.
SELECT product_name, revenue FROM sales_q1
UNION ALL
SELECT product_name, revenue FROM sales_q2
ORDER BY revenue DESC;Putting ORDER BY inside the first SELECT is a syntax error. If you need to limit rows within each branch — say, top 5 products per quarter before combining — wrap each branch in a subquery.
SELECT * FROM (
SELECT product_name, revenue FROM sales_q1
ORDER BY revenue DESC LIMIT 5
) top_q1
UNION ALL
SELECT * FROM (
SELECT product_name, revenue FROM sales_q2
ORDER BY revenue DESC LIMIT 5
) top_q2
ORDER BY revenue DESC;Worked examples from analyst work
Stitching partitioned event tables
A common warehouse layout splits events by year or month for cost control. Querying a window across the boundary needs UNION ALL.
SELECT user_id, event_type, created_at
FROM events_2024
WHERE created_at >= '2024-12-15'
UNION ALL
SELECT user_id, event_type, created_at
FROM events_2025
WHERE created_at < '2025-01-15'
ORDER BY created_at;No duplicates can exist across the partition boundary, so UNION ALL is the only sane choice. A plain UNION here would force the engine to dedupe a result set that is guaranteed unique — pure waste.
Building a lookup table inline
Sometimes you need a small mapping that does not exist as a real table — month names, status decoders, A/B variant labels.
WITH month_lookup AS (
SELECT 1 AS month_num, 'January' AS month_name
UNION ALL SELECT 2, 'February'
UNION ALL SELECT 3, 'March'
UNION ALL SELECT 4, 'April'
)
SELECT m.month_name, COUNT(*) AS signups
FROM signups s
JOIN month_lookup m ON EXTRACT(MONTH FROM s.created_at) = m.month_num
GROUP BY m.month_name;Cleaner than maintaining a dimension table for something this trivial. Pair this with a JOIN and you have a full reporting query.
Assembling a unified event log
The single most common UNION ALL use case at companies like Stripe, Notion, or Linear: pull heterogeneous events from separate tables into one timeline.
SELECT user_id, 'purchase' AS event, created_at, amount FROM purchases
UNION ALL
SELECT user_id, 'login' AS event, created_at, NULL::numeric FROM logins
UNION ALL
SELECT user_id, 'signup' AS event, created_at, NULL::numeric FROM registrations
ORDER BY user_id, created_at;Notice the NULL::numeric casts — required because purchases.amount is numeric and stricter engines need an explicit type match across branches.
Cohort overlap with INTERSECT and EXCEPT
INTERSECT answers "who is in both?"; EXCEPT answers "who is in A but not B?". A churn snapshot in two lines:
-- Customers active in 2024 who did not return in 2025
SELECT customer_id FROM customers_active_2024
EXCEPT
SELECT customer_id FROM customers_active_2025;That is a churned-cohort definition you can ship to a dashboard without a single LEFT JOIN ... WHERE IS NULL. Same result, far more readable.
UNION vs JOIN — the interview classic
Interviewers love this question because it separates candidates who think in tables from candidates who memorized syntax. The distinction is geometric.
UNION is vertical — stacks rows. Both inputs must share column structure. Output has the same width as either input, more rows.
JOIN is horizontal — extends columns. Inputs are linked by a key. Output has the combined width and a row count that depends on the join type and key cardinality.
-- UNION: "all customers across two acquisition channels"
SELECT customer_id, name FROM customers_online
UNION
SELECT customer_id, name FROM customers_offline;
-- JOIN: "customers with their orders attached"
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.id;When a stakeholder says "combine these two tables," your first question should be: rows or columns? That single clarification will save you from writing the wrong query.
Common pitfalls
The traps below come up in production code reviews and on whiteboards in roughly equal measure.
Using SELECT * across UNION branches. Two tables with the "same schema" today drift apart tomorrow — someone adds a column to one of them, the union breaks, and the on-call analyst inherits the mess at 11pm. Always list columns explicitly in every branch.
Reaching for UNION when you needed UNION ALL. This is the single most common performance footgun in SQL. The query is correct but slow because the engine sorts millions of rows to remove duplicates that could not exist. If your data sources are disjoint by construction — different time partitions, different event types, different geographies — UNION only wastes CPU. Default to UNION ALL and switch to UNION only when you can name the duplicate scenario you are protecting against.
Placing ORDER BY inside a branch. ORDER BY belongs once, at the very end. If you need per-branch ordering plus a per-branch LIMIT — say, "top 5 from each quarter" — wrap each branch in a subquery. Putting ORDER BY directly between two SELECTs in a UNION chain is a parse error.
Confusing UNION with JOIN. A surprising number of "I need to combine two tables" requests are JOIN problems wearing a UNION costume. If the two tables share a key and you want their attributes side by side, that is a JOIN. Asking rows or columns? out loud kills this confusion fast.
Forgetting that INTERSECT and EXCEPT dedupe too. Both do an implicit DISTINCT on their inputs. If you need multiplicities, Postgres, BigQuery, and Snowflake support INTERSECT ALL and EXCEPT ALL — check your dialect.
Interview questions
1. What is the difference between UNION and UNION ALL?
UNION combines two result sets and removes duplicates by running an implicit DISTINCT over the full output. UNION ALL combines them without deduplication, so every row from every branch survives. UNION ALL is faster because it skips the sort or hash aggregate that dedup requires. Default to UNION ALL and only switch to UNION when you specifically need duplicates removed.
2. Can you use ORDER BY inside a UNION?
No — ORDER BY applies only to the final combined result and must appear after the last SELECT in the chain. If you want to order or limit rows within a single branch, wrap that branch in a subquery and put ORDER BY and LIMIT inside the subquery.
3. How does UNION differ from JOIN?
UNION is vertical: it stacks the rows of one query on top of another, and both queries must share the same column structure. JOIN is horizontal: it links two tables by a key and produces a wider result with both sides' columns attached. Figure out whether the stakeholder wants more rows (UNION) or more columns (JOIN).
4. Write a query that returns customers who bought both online and offline.
SELECT customer_id FROM online_orders
INTERSECT
SELECT customer_id FROM offline_orders;INTERSECT returns only rows that appear in both inputs, which is exactly the dual-channel customer definition. An equivalent INNER JOIN on customer_id also works but reads less clearly when the only purpose is set membership.
5. How do you combine three tables with different schemas?
Align the SELECT lists by padding missing columns with NULL or constant literals so every branch returns the same column count and compatible types. Add a discriminator column so downstream consumers can tell which branch each row came from.
SELECT user_id, amount, 'purchase' AS type FROM purchases
UNION ALL
SELECT user_id, NULL::numeric, 'login' AS type FROM logins
UNION ALL
SELECT user_id, NULL::numeric, 'signup' AS type FROM signups;Related reading
- JOIN in SQL — the guide
- JOIN cheat sheet
- JOIN types — visual cheat sheet
- SQL window functions interview questions
Drill set operators the way real interviews ask them — NAILDD is launching with 500+ SQL problems on this pattern.
FAQ
How many queries can I chain together with UNION?
There is no hard upper bound — chain as long as the engine's statement size limit allows. In production code, chains of more than five or six branches usually mean you should be reading from a partitioned table or a view that already does the union for you. Long chains become hard to maintain because every column-list change has to be applied N times.
Does UNION preserve row order?
No. Without an explicit ORDER BY at the end, output order is engine-defined and can change between runs, between query planner versions, or even between parallel execution shards. If row order matters, always add an explicit ORDER BY after the last SELECT.
Can you use UNION with INSERT?
Yes — INSERT INTO target_table SELECT ... UNION ALL SELECT ... is valid and common, and is the easiest way to bulk-load a staging table from several sources in one statement. Watch the column order: INSERT matches by position unless you supply an explicit column list after the target table name.
Does UNION work with GROUP BY?
Yes. GROUP BY applies inside each individual SELECT, then UNION or UNION ALL combines the already-aggregated results — the canonical pattern for combining metrics from two source tables. Output column names come from the first SELECT, so name your aggregates consistently.
Is UNION ALL always faster than UNION?
In essentially every engine, yes — UNION ALL is a streaming concatenation, while UNION adds a sort or hash aggregate to deduplicate. The gap is small on tiny inputs and large on multi-million-row inputs. If your data is already deduplicated downstream — for example, you wrap the union in SELECT DISTINCT or a GROUP BY — prefer UNION ALL and let the existing dedup step do the work once instead of twice.
Can I use UNION across different databases or schemas?
If the engine supports cross-database queries — Snowflake, BigQuery, Postgres with foreign data wrappers — you can union across them as long as column counts and types align. In federated setups, expect the slower side to dominate query time, because the engine must materialize both inputs before stacking.