JOIN types SQL cheat sheet

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why deep JOIN knowledge separates seniors from juniors

Roughly half of analytics-report bugs trace back to a wrong JOIN. Duplicated rows, lost users, inflated revenue, broken cohort sizes — almost all of it comes from the wrong JOIN type or key. The math may be right, but if the row set is wrong, every dashboard tile is silently lying.

In interviews at Stripe, Airbnb, DoorDash, and Snowflake, the JOIN block is non-negotiable. Any time you JOIN two tables, say out loud "one-to-many" or "many-to-many" before writing any SQL. Skip that step and you ship inflated SUMs and broken COUNTs.

JOIN is also the most expensive line in most warehouse queries. Knowing which strategy the planner picks — hash, merge, nested loop — is the difference between a 3-second and a 3-hour query.

The six JOIN flavors at a glance

JOIN What it returns
INNER Only matched rows from both tables
LEFT (OUTER) All rows from the left table + matched rows from the right
RIGHT (OUTER) All rows from the right table + matched rows from the left
FULL (OUTER) All rows from both, with NULL where no match
CROSS Cartesian product (every left row times every right row)
SELF A table joined against itself, usually for hierarchies or time shifts

ANTI JOIN and SEMI JOIN are not separate keywords — they are patterns expressed with NOT EXISTS / EXISTS or LEFT JOIN ... WHERE r.key IS NULL.

INNER, LEFT, RIGHT, FULL — runnable examples

INNER JOIN keeps only rows that match on both sides.

SELECT u.name, o.amount
FROM users u
INNER JOIN orders o ON o.user_id = u.user_id;

One row per matching (user, order) pair. Users with no orders disappear. If a user has three orders, the name appears three times — duplication that bites the moment you forget a GROUP BY.

LEFT JOIN keeps every row from the left side regardless of match.

SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id;

Every user shows up. Users with zero orders appear once with o.amount = NULL. This is the workhorse of product analytics because most reports anchor on a dimension table.

The most common LEFT JOIN bug: moving a filter on the right table into WHERE silently converts the LEFT JOIN into an INNER JOIN. NULL = 'paid' evaluates to UNKNOWN, the row gets dropped, and "all users with paid orders" quietly becomes "users with at least one paid order". Put the filter in the ON clause:

SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o
  ON o.user_id = u.user_id
  AND o.status = 'paid';

RIGHT JOIN is the mirror of LEFT JOIN and is almost never the right answer in production. Reviewers will ask you to flip the tables and use LEFT JOIN, because reading left-to-right matches how humans read code.

FULL OUTER JOIN keeps everything from both sides, filling NULLs wherever a match is missing.

SELECT COALESCE(u.user_id, o.user_id) AS user_id,
       u.name,
       o.amount
FROM users u
FULL OUTER JOIN orders o ON o.user_id = u.user_id;

Most product analytics never needs this, but it is invaluable for reconciliation — Stripe vs internal ledger, Salesforce vs warehouse, app events vs server-side events.

CROSS, SELF, ANTI, SEMI

CROSS JOIN produces the Cartesian product.

SELECT *
FROM categories
CROSS JOIN products;
-- 10 categories x 1000 products = 10,000 rows

Legitimate use: grid generation (every category for every day of the month). Illegitimate: a comma-join without an ON condition.

-- Wrong: Cartesian product, not a JOIN
SELECT *
FROM users u, orders o
WHERE u.name = 'Alex';

SELF JOIN is a table joined to itself.

-- Employees with their managers
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;
-- Each order vs the same user's previous-day order
SELECT o1.user_id,
       o1.amount AS curr,
       o2.amount AS prev
FROM orders o1
LEFT JOIN orders o2
  ON o2.user_id = o1.user_id
  AND o2.created_at = o1.created_at - INTERVAL '1 day';

Most SELF JOINs over time can be replaced with a window function like LAG, which is cleaner and almost always faster because it scans once.

ANTI JOIN means "rows on the left with no match on the right".

-- Users who never ordered
SELECT u.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
WHERE o.user_id IS NULL;
SELECT u.*
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.user_id
);

SEMI JOIN means "rows on the left with at least one match on the right" without duplicating rows.

SELECT u.*
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.user_id
);

EXISTS is preferable to INNER JOIN ... DISTINCT because it preserves the left-side row count and most planners short-circuit on the first match.

USING vs ON vs NATURAL JOIN

-- Equivalent when the column has the same name on both sides
SELECT * FROM users JOIN orders ON users.user_id = orders.user_id;
SELECT * FROM users JOIN orders USING (user_id);

USING is shorter and collapses the duplicated column in the output. Mentioning it in an interview is a small but real signal of syntax fluency.

NATURAL JOIN auto-joins on every column that shares a name across both tables.

SELECT * FROM users NATURAL JOIN orders;

If both tables happen to have an unrelated created_at or is_active column, NATURAL JOIN silently includes it in the join key and the query returns nonsense. Never use it in production — the implicit behavior breaks six months later when someone adds a column.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Common pitfalls

The first pitfall is row duplication after joining a one-to-many table. If a user has five orders, you get five rows for that user. A SUM(o.amount) after GROUP BY u.user_id is correct because the SUM aggregates them back. But SUM(u.subscription_price) is wrong — you just multiplied it by five. Either aggregate before joining, or only aggregate on the many-side.

The second pitfall is many-to-many chains like users -> orders -> order_items. One user expands to N orders, each order to M items. SUM(oi.price) grouped by user is correct only by accident. Pre-aggregate items to one row per order in a CTE, then join to users.

WITH order_totals AS (
  SELECT order_id, SUM(price) AS total
  FROM order_items
  GROUP BY order_id
)
SELECT u.name, SUM(t.total) AS revenue
FROM users u
JOIN orders o USING (user_id)
JOIN order_totals t USING (order_id)
GROUP BY u.name;

The third pitfall is the LEFT-to-INNER trap: a filter on the right-side table placed in WHERE drops NULL rows and silently converts a LEFT JOIN into an INNER JOIN. Any predicate on the right side of a LEFT JOIN belongs in ON, not WHERE. The only exception is a predicate that explicitly checks for IS NULL (the ANTI JOIN pattern).

The fourth pitfall is joining on the wrong granularity. If your left table is one-row-per-user and your right is one-row-per-user-per-day, joining without a date predicate gives you one row per user per day. Add the date to the ON clause or pre-aggregate the right table to match.

The fifth pitfall is NULL-equals-NULL. In ANSI SQL, NULL = NULL is UNKNOWN, not TRUE. If your join key contains NULLs and you expected them to match across tables, they will not. Use IS NOT DISTINCT FROM (Postgres) or COALESCE(key, sentinel) if you need NULL-to-NULL joining, and leave a comment.

Optimization tips

Hash joins dominate modern warehouses. Snowflake, BigQuery, and Databricks build a hash on the smaller side and probe with the larger side. Read the EXPLAIN plan and check whether the planner is broadcasting or shuffling — broadcasts are fast, shuffles can be brutal.

Partition pruning matters more than indexing. If your fact table is partitioned by event_date and you have a date predicate, the engine reads a fraction of the data. If the predicate is wrapped in a function (DATE_TRUNC('day', event_ts) >= '2026-05-01'), pruning often fails and you scan everything.

Filter early, join late. Push predicates into a CTE before the JOIN. Modern planners often do this for you, but writing it explicitly is a useful signal in code review.

For SELF JOINs against time-shifted versions of the same table, prefer window functions. LAG(amount) OVER (PARTITION BY user_id ORDER BY created_at) is one scan and one sort.

For one-to-many JOINs feeding an aggregation, a correlated EXISTS or NOT EXISTS is often enough. When you only need "did the user ever do X", SEMI JOIN is faster because the planner stops after the first match.

10 interview-style JOIN tasks

These come up in SQL rounds at Stripe, DoorDash, Notion, and Linear.

1. Every user with total order amount, including users with zero orders (zero, not NULL).

SELECT u.name, COALESCE(SUM(o.amount), 0) AS total
FROM users u
LEFT JOIN orders o USING (user_id)
GROUP BY u.name;

2. Every order labeled with the user's name.

SELECT o.order_id, u.name, o.amount
FROM orders o
JOIN users u USING (user_id);

3. Users who have never placed an order.

SELECT u.*
FROM users u
LEFT JOIN orders o USING (user_id)
WHERE o.user_id IS NULL;

4. Products that no one has ever bought.

SELECT p.*
FROM products p
LEFT JOIN order_items oi USING (product_id)
WHERE oi.product_id IS NULL;

5. All distinct pairs of orders belonging to the same user.

SELECT o1.order_id, o2.order_id
FROM orders o1
JOIN orders o2
  ON o1.user_id = o2.user_id
  AND o1.order_id < o2.order_id;

6. Employee hierarchy — each employee with their manager's name.

SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;

7. Difference between each order and the same user's order on the previous day.

SELECT o1.user_id, o1.amount - o2.amount AS diff
FROM orders o1
JOIN orders o2
  ON o2.user_id = o1.user_id
  AND o2.created_at = o1.created_at - INTERVAL '1 day';

8. A full matrix of every category for every day of April 2026.

SELECT c.category, d.day
FROM categories c
CROSS JOIN generate_series('2026-04-01'::DATE,
                           '2026-04-30'::DATE,
                           '1 day'::INTERVAL) d(day);

9. Reconcile two data sources and surface only the disagreements.

SELECT COALESCE(a.id, b.id) AS id,
       a.value AS source_a,
       b.value AS source_b
FROM source_a a
FULL OUTER JOIN source_b b ON b.id = a.id
WHERE a.value IS DISTINCT FROM b.value;

10. Users who placed an order but the order has no payment record.

SELECT DISTINCT u.user_id
FROM users u
JOIN orders o USING (user_id)
LEFT JOIN payments p USING (order_id)
WHERE p.payment_id IS NULL;

If you can write all ten without looking, no SQL onsite will surprise you. For daily problems like these, NAILDD is launching with 500+ SQL problems indexed by JOIN type and pitfall.

FAQ

What is the difference between LEFT JOIN and LEFT OUTER JOIN?

Nothing. The OUTER keyword is optional in the SQL standard, so LEFT JOIN and LEFT OUTER JOIN are identical. The same applies to RIGHT and FULL. Some teams spell out OUTER for clarity, but no engine treats them differently.

Can I put multiple conditions in the ON clause?

Yes, and you often should. ON a.x = b.x AND a.y = b.y is a compound key join. ON a.x = b.x AND b.status = 'active' is a filtered join — and for LEFT JOIN, it is the only correct place for a predicate on the right table. Moving the second condition into WHERE silently turns the LEFT JOIN into an INNER JOIN.

Is JOIN faster than IN, or is IN faster than JOIN?

In modern engines, the planner usually rewrites both into the same operator. As a rough rule, JOIN (or EXISTS) wins for large probing tables because the planner can pick a hash join, while IN (...) with a small literal list is fastest for a tiny fixed set. The right interview answer is "it depends on table sizes and indexes, and I would read the plan".

When do I actually need FULL OUTER JOIN?

Rarely. Most product analytics anchors on a dimension table and uses LEFT JOIN. FULL OUTER JOIN shines in reconciliation: Stripe vs internal ledger, diffing snapshots, joining client-side and server-side event streams to find drift. In a normal analytics report you probably want a LEFT JOIN against a calendar instead.

Should I worry about RIGHT JOIN in interviews?

Know it exists and that it is the mirror of LEFT JOIN. Then always pick LEFT JOIN. Reviewers prefer LEFT because it matches the reading order of the query and never forces them to mentally swap tables.

How do I JOIN when the join key has NULLs and I want NULL to match NULL?

Standard = treats NULL = NULL as UNKNOWN, so those rows do not match. Postgres has IS NOT DISTINCT FROM, which returns TRUE for two NULLs. Other engines may require COALESCE(a.key, '__sentinel__') = COALESCE(b.key, '__sentinel__'). Either way, leave a comment — the next reader will not expect NULL-to-NULL matching.