JOIN in SQL: a practical guide

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why JOIN matters

In a production database your data is almost never in one place. Users live in users, orders in orders, sessions in events. A relational schema keeps each fact in one row in one table and stitches them back together at query time. The stitching tool is JOIN.

JOINs appear on every SQL screen for data analyst and analytics engineer roles. In a Snowflake or Databricks interview at a tier-1 employer, expect at least one question that hinges on subtle JOIN behavior — ON versus WHERE, or row duplication. This guide walks through the five join types, the conditions that drive them, and the traps that show up in real interviews.

Sample data

Two tiny tables, used throughout. Every query below operates on these exact rows.

users:

user_id name
1 Alice
2 Ben
3 Carla
4 David

orders:

order_id user_id amount
101 1 500
102 1 300
103 3 700
104 5 200

Two things to notice: Ben and David have zero orders, and order 104 references user_id = 5, which does not exist in users. This referential drift is normal in raw event tables and is exactly what JOIN behavior decides how to handle.

INNER JOIN

INNER JOIN returns only rows that have a match in both tables. No match, no row.

SELECT u.user_id, u.name, o.order_id, o.amount
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;

Three rows come back: Alice (two orders) and Carla (one). Ben and David disappear because they have no orders; order 104 disappears because there is no matching user. INNER JOIN is the right pick when you only want the intersection — for example, "users who have ever placed an order".

LEFT JOIN

LEFT JOIN keeps every row from the left table. If there is no matching row on the right, the right-side columns come back as NULL.

SELECT u.user_id, u.name, o.order_id
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id;

All four users come back. Ben and David show up with order_id = NULL. Order 104 still vanishes because it has no matching user on the left. In analytics, LEFT JOIN is the single most common operation — it builds "users with their last activity" or "products with optional reviews" queries without losing the entity you care about.

RIGHT JOIN

RIGHT JOIN is the mirror of LEFT JOIN: it keeps every row from the right table.

SELECT u.name, o.order_id, o.amount
FROM users u
RIGHT JOIN orders o ON u.user_id = o.user_id;

All four orders come back. Order 104 shows up with name = NULL. Most teams never write RIGHT JOIN in practice — it is clearer to flip the table order and use LEFT JOIN so the primary entity reads left to right. Some style guides at companies like Linear and Figma explicitly ban RIGHT JOIN in code review.

FULL OUTER JOIN

FULL OUTER JOIN keeps every row from both sides. Where there is no match, the missing side fills with NULL.

SELECT u.name, o.order_id
FROM users u
FULL OUTER JOIN orders o ON u.user_id = o.user_id;

Returns users without orders (Ben, David) and orders without users (104) in one result set. The right tool when reconciling two data sources and surfacing mismatches in either direction — for example, comparing what your event tracker logged against what billed in Stripe.

CROSS JOIN

CROSS JOIN produces a Cartesian product: every row on the left paired with every row on the right. There is no ON condition. M rows joined to N rows produces M times N output rows.

SELECT u.name, o.order_id
FROM users u
CROSS JOIN orders o;

Four users by four orders is sixteen rows. Cartesian products are used deliberately for date spines (every day for every store, even zero-sales days), gap analysis, and test data. They are also the leading cause of accidental table explosions when someone forgets the ON clause on a regular join.

The ON condition

The expression after ON decides which rows on the left match which rows on the right. The most common form is an equality between a primary key and a foreign key:

ON u.user_id = o.user_id

You can pile on extra conditions:

LEFT JOIN orders o
    ON u.user_id = o.user_id
    AND o.created_at >= '2026-01-01'

This is where most people get tripped up. For a LEFT JOIN, a condition in ON filters only the right-hand table before the join; rows on the left are preserved even when the right-side condition does not match. Move the same condition to WHERE and your LEFT JOIN quietly degrades into an INNER JOIN — see the pitfalls section.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Chaining multiple JOINs

Real production queries rarely stop at two tables. JOINs chain left-to-right, and the result of each step becomes the left side of the next:

SELECT
    u.name,
    o.order_id,
    p.product_name,
    c.city
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
LEFT JOIN products p ON o.product_id = p.product_id
LEFT JOIN cities c ON u.city_id = c.city_id;

Each step builds on the previous one. Order matters: if an early INNER JOIN drops rows, no later LEFT JOIN brings them back. A useful habit is to start with the entity you must keep (users) and add optional tables to the right. When a chain mixes INNER and LEFT, lead with the strict joins so you know exactly when rows are dropped.

JOIN with aggregation

The bread and butter of analytics. Count orders per user, sum revenue per cohort, average session length per region:

SELECT
    u.user_id,
    u.name,
    COUNT(o.order_id) AS orders_count,
    COALESCE(SUM(o.amount), 0) AS total_amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.name;

LEFT JOIN plus GROUP BY returns all four users, including the two with zero orders. COUNT(o.order_id) correctly returns 0 for them because COUNT ignores NULL. COALESCE turns the NULL sum into 0, so Ben renders as a real zero in your BI tool instead of blank. INNER JOIN here would silently drop Ben and David from the dashboard — almost always the wrong default.

Anti-join pattern

Finding rows in one table that have no match in another is one of the most asked SQL interview questions, sometimes called "find the missing" or "find users who did not X":

-- Users who never placed an order
SELECT u.user_id, u.name
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.order_id IS NULL;

Result: Ben and David. The mechanic is simple — do a LEFT JOIN, then filter to rows where the right-side key came back NULL. Alternatives are NOT EXISTS and NOT IN. The LEFT JOIN + IS NULL form is the most readable, and on most engines its query plan is at least as good as NOT IN, which has nasty edge cases when the subquery returns NULL.

Self join

A self join is a regular join where both inputs are the same table, distinguished by aliases. Canonical use cases: hierarchies, sequential events, and pairwise comparisons.

-- Employee and their manager
SELECT
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

The same pattern handles "sessions within 10 minutes of another session by the same user" or "pairs of products bought together". Self joins can also be the right answer for gap-and-island problems before you reach for window functions.

Common pitfalls

The most common bug is filtering in WHERE for what should be a condition in ON. The pattern looks innocent — LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.status = 'paid'. The intent is "all users with their paid orders if any". The result is wrong: users with no paid orders disappear because WHERE o.status = 'paid' rejects rows where o.status is NULL, which is exactly what LEFT JOIN produced for those users. Move the predicate into ON as AND o.status = 'paid'.

A second trap is unexpected row duplication. JOIN does not check that the right side is unique on the key. If a user has ten orders, joining users to orders produces ten rows for that user. Fine when you mean to expand to order granularity, disastrous when you forget and then SUM from the user table — lifetime value gets multiplied by order count. Always run SELECT COUNT(*) to sanity-check, and aggregate first or use ROW_NUMBER filtered to rank 1 when you only want one row per user.

A third pitfall is performance from missing indexes. A JOIN on a column without an index on the right side forces a full scan and turns a one-second query into a ten-minute one on a billion-row table. On Snowflake and Databricks the equivalent concern is clustering keys and partition pruning. If a query suddenly slows down, check whether a JOIN now has to read the whole right table.

A fourth trap is accidental Cartesian products. You meant INNER JOIN orders o ON u.user_id = o.user_id, but typed INNER JOIN orders o with no ON. Some engines reject it, others happily return the product — ten thousand by ten thousand is one hundred million rows. Always check the row count against what you expected.

A fifth trap is NOT IN with a subquery that can return NULL. WHERE user_id NOT IN (SELECT user_id FROM blocked) returns zero rows whenever any row of blocked.user_id is NULL, because NULL makes the inequality unknown. Use NOT EXISTS or LEFT JOIN ... WHERE right.key IS NULL — both handle NULL predictably.

Interview questions

Expect any of these on a SQL screen for data analyst, analytics engineer, or junior data engineer roles at companies like Airbnb, Uber, or Anthropic.

"List the JOIN types in SQL." INNER, LEFT, RIGHT, FULL OUTER, CROSS. Strong candidates also mention SELF JOIN as a usage pattern, and know that NATURAL JOIN and USING exist as shortcuts.

"Difference between LEFT JOIN and INNER JOIN?" LEFT JOIN keeps every row from the left table even when no match exists on the right, filling the right-side columns with NULL. INNER JOIN keeps only the intersection.

"Find users who never placed an order." LEFT JOIN on orders followed by WHERE orders.id IS NULL. Strong candidates also mention NOT EXISTS and explain why NOT IN is risky when the subquery can return NULL.

"WHERE versus ON for a LEFT JOIN?" The WHERE version effectively becomes an INNER JOIN because rows with NULL on the right are filtered out. The ON version preserves left-side rows and only filters the right before the join.

"Can a JOIN increase row count?" Yes, whenever the right side has more than one match per left key. The classic source of inflated metrics in dashboards. Fix by aggregating first or picking one row with ROW_NUMBER.

"When do you use CROSS JOIN on purpose?" Date spines that include every day even with zero activity, combinations for gap analysis, and test fixtures.

If you want to drill SQL JOIN questions like these every day, NAILDD is launching with 500+ SQL problems covering exactly these patterns.

FAQ

What is the default JOIN type if I just write JOIN?

A bare JOIN is INNER JOIN in PostgreSQL, MySQL, Snowflake, BigQuery, Databricks SQL, and SQL Server. In code reviews, prefer writing INNER JOIN explicitly — it costs one word and matches most internal style guides.

Can I join more than two tables in one query?

Yes, no language-level cap. Three to five tables is normal in analytics. Watch your row count at each step — running SELECT COUNT(*) after each new JOIN surfaces duplication or unexpected drops before they cascade into wrong numbers.

Difference between ON and WHERE?

For INNER JOIN the difference is stylistic. For outer joins it is structural: a predicate in ON constrains which rows from the optional side participate, preserving the required side. A predicate in WHERE runs after the join, and can silently eliminate the NULL-filled rows the outer join was supposed to keep.

Is INNER JOIN faster than LEFT JOIN?

Usually slightly — fewer rows returned, more planner freedom. On an indexed table the difference is often single-digit percent. Choose the join type by whether you want optional or required rows, not micro-performance. Saving milliseconds while silently dropping dashboard rows is a much worse outcome than a slightly slower correct query.

Why does my LEFT JOIN return more rows than the left table has?

The right table has more than one match per join key. A LEFT JOIN is not a lookup — it produces one row per matching combination. If you need exactly one row per left-side entity, aggregate the right side first in a subquery or CTE, or pick one row with ROW_NUMBER() OVER (PARTITION BY join_key ORDER BY ...) filtered to rank 1 before joining.