NULL in SQL cheat sheet
Contents:
Why NULL trips up analysts
A query returns 100 rows. You add what looks like a harmless filter. Now it returns 50. Nine times out of ten the missing rows had NULL in the filtered column, and the predicate quietly dropped them. That's the difference between an analyst who ships a clean dashboard at Stripe and one who quietly under-reports revenue for two quarters.
At interview, NULL questions are the cheapest way for a hiring manager to separate people who memorized SQL syntax from people who have shipped against real warehouses. The classic prompt — "what does WHERE col != 'x' do when col can be NULL?" — takes thirty seconds and tells them whether you understand three-valued logic: every comparison can return TRUE, FALSE, or UNKNOWN, and only TRUE passes through WHERE.
This page covers the mental model, the four operators that handle NULL safely, the four places NULL breaks naive queries, and ten worked examples mapped to interview tasks at Meta, Airbnb, DoorDash, and Snowflake.
The mental model
NULL is not a value — it's the explicit absence of one. Treating it as "zero" or "empty string" is the root of every NULL bug you will ever ship. Say "NULL means unknown" and the rest of SQL stops being surprising.
NULL = NULL -- UNKNOWN, NOT TRUE
NULL != NULL -- UNKNOWN
NULL + 1 -- NULL
'abc' = NULL -- UNKNOWN
NULL OR TRUE -- TRUE (TRUE absorbs unknown)
NULL AND TRUE -- UNKNOWN
NULL OR FALSE -- UNKNOWN
NULL AND FALSE -- FALSE (FALSE absorbs unknown)The rule that covers ninety percent of cases: any arithmetic or comparison against NULL produces NULL. Booleans have one exception each — TRUE OR anything is TRUE and FALSE AND anything is FALSE — because the unknown side can't change the outcome.
Checking, coalescing, nullifying
Never compare to NULL with =. Use IS NULL and IS NOT NULL — the only operators that return TRUE or FALSE when one side is NULL.
-- Correct
SELECT * FROM users WHERE email IS NULL;
SELECT * FROM users WHERE email IS NOT NULL;
-- Broken: returns zero rows every time
SELECT * FROM users WHERE email = NULL;
SELECT * FROM users WHERE email != NULL;email = NULL evaluates to UNKNOWN for every row, and WHERE only keeps TRUE rows. Result: empty set. This is the most-asked NULL question in junior analyst loops at Amazon and Microsoft.
COALESCE is the workhorse. It returns the first non-NULL argument, so you chain fallbacks in business priority order.
-- Pick the first available contact channel
SELECT COALESCE(phone, email, 'unknown') AS contact
FROM users;
-- Fill missing aggregates so dashboards don't show blank cells
SELECT u.user_id, COALESCE(SUM(o.amount), 0) AS revenue
FROM users u
LEFT JOIN orders o USING (user_id)
GROUP BY u.user_id;NULLIF is the inverse — it converts a specific value back into NULL. The canonical use case is defending against division by zero.
-- Returns NULL instead of erroring when users = 0
SELECT revenue / NULLIF(users, 0) AS arpu
FROM daily_stats;NULLIF(a, b) returns NULL when a = b, otherwise a. Use it for sentinel values (0, -1, 'N/A') that should be treated as missing.
NULL in aggregates, WHERE, and JOIN
Every aggregate except COUNT(*) silently skips NULL — the second most common source of "why is my average wrong?" questions.
-- Source rows: amount IN (10, 20, NULL, 30)
SELECT
COUNT(*) AS row_count, -- 4
COUNT(amount) AS non_null, -- 3
SUM(amount) AS total, -- 60
AVG(amount) AS avg_amount, -- 20 (60/3, NOT 60/4)
MIN(amount) AS min_amount, -- 10
MAX(amount) AS max_amount -- 30
FROM orders;If the spec says "treat missing as zero," wrap the column: AVG(COALESCE(amount, 0)). If the spec says "ignore missing," do nothing. Pin down which the stakeholder means before writing the query.
In WHERE, any predicate involving NULL with =, !=, <, > returns UNKNOWN and the row is dropped:
-- Drops users whose email is NULL, even though logically
-- a NULL email is "not equal to" admin@example.com
SELECT * FROM users WHERE email != 'admin@example.com';
-- Explicit version that keeps both
SELECT * FROM users
WHERE email != 'admin@example.com' OR email IS NULL;JOIN follows the same rule — join conditions are predicates. An INNER JOIN on a column containing NULL on either side never matches, and those rows vanish.
-- Employees whose manager is NULL never appear in this self-join
SELECT u.name AS employee, m.name AS manager
FROM users u
INNER JOIN users m ON u.manager_id = m.id;LEFT JOIN keeps the left side and inserts NULL for unmatched right-side columns. Combined with WHERE right.key IS NULL, that's the standard anti-join pattern.
-- Users with zero orders
SELECT u.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
WHERE o.user_id IS NULL;This is the second-most-asked SQL interview pattern after window functions, and the answer involves NULL.
The NOT IN trap
NOT IN surprises even experienced engineers. If the subquery returns a single NULL, NOT IN produces an empty result set — not "all rows except non-NULL matches."
-- Looks correct, returns zero rows if any referred_by is NULL
SELECT * FROM users
WHERE user_id NOT IN (SELECT referred_by FROM users);Fix it by stripping NULL from the subquery, or switch to NOT EXISTS, which handles NULL the way humans expect.
-- Option 1: explicitly exclude NULL
SELECT * FROM users
WHERE user_id NOT IN (
SELECT referred_by FROM users WHERE referred_by IS NOT NULL
);
-- Option 2: NOT EXISTS, safe by default
SELECT u.*
FROM users u
WHERE NOT EXISTS (
SELECT 1 FROM users r WHERE r.referred_by = u.user_id
);Senior engineers at Databricks and Snowflake default to NOT EXISTS whenever the subquery column is nullable — it's also usually faster because the planner can short-circuit.
GROUP BY, DISTINCT, ORDER BY, concatenation
GROUP BY treats all NULLs as a single group, producing one row with NULL in the grouping column. Label the bucket explicitly if it'll show in a dashboard:
SELECT COALESCE(category, 'unknown') AS category, COUNT(*)
FROM products
GROUP BY 1;DISTINCT keeps exactly one NULL — it deduplicates them like any other value.
-- Source: ('a', NULL, 'b', NULL, 'a')
SELECT DISTINCT col FROM t;
-- Result: ('a', 'b', NULL)ORDER BY puts NULL last for ASC and first for DESC in PostgreSQL, Snowflake, Redshift, and BigQuery. MySQL and SQL Server are the opposite. Spell it out with NULLS FIRST or NULLS LAST whenever ordering matters:
SELECT * FROM orders ORDER BY shipped_at NULLS LAST;
SELECT * FROM orders ORDER BY shipped_at DESC NULLS LAST;String concatenation with || produces NULL when any operand is NULL — a common dashboard bug where one missing middle name blanks the entire column.
-- PostgreSQL: any NULL operand makes the whole thing NULL
SELECT first_name || ' ' || last_name FROM users;
-- CONCAT skips NULL operands
SELECT CONCAT(first_name, ' ', last_name) FROM users;
-- Or coalesce each operand to empty string
SELECT COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')
FROM users;Common pitfalls
The first pitfall is using = or != to test for NULL. The query runs without error and looks correct in a code review unless the reviewer is paying attention. Replace with IS NULL and IS NOT NULL, and read every WHERE clause out loud asking "what if this column is NULL?" before merging.
The second pitfall is treating empty string as equivalent to NULL. They are not the same: LENGTH('') = 0 while LENGTH(NULL) = NULL. Production tables often contain both because the ETL layer wrote empty strings while the source database wrote NULLs. When a stakeholder asks for "users without an email," handle both: email IS NULL OR email = ''.
The third pitfall is forgetting that SUM returns NULL — not zero — when every input is NULL or the input set is empty. A revenue dashboard with blank cells looks broken to a CFO even when technically correct. Wrap aggregates that feed dashboards: COALESCE(SUM(amount), 0). The same applies to AVG, MIN, and MAX.
The fourth pitfall is the NOT IN trap covered above. It is common enough at Meta and Netflix interviews that some loops use it as an automatic disqualifier. Default to NOT EXISTS when the subquery column might contain NULL.
The fifth pitfall is silent row loss in INNER JOIN when the join key is nullable. A hierarchy query joining employees to managers on manager_id drops the CEO and anyone whose manager is missing. Switch to LEFT JOIN and decide explicitly what happens for the missing side — usually a placeholder, not a drop.
Ten interview tasks
These cover roughly eighty percent of NULL-related SQL in analytics interviews at the FAANG tier and tier-two unicorns. Memorize the patterns, not the syntax.
1. Replace NULL in amount with zero
SELECT COALESCE(amount, 0) AS amount FROM orders;2. Find rows where email is missing (NULL or empty string)
SELECT * FROM users WHERE email IS NULL OR email = '';3. Safe division without crashing on zero
SELECT revenue / NULLIF(users_count, 0) AS arpu
FROM daily_stats;4. Users with zero orders
SELECT u.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
WHERE o.user_id IS NULL;5. Count of users with a phone, alongside total
SELECT COUNT(phone) AS with_phone, COUNT(*) AS total
FROM users;6. Average order amount treating NULL as zero
SELECT AVG(COALESCE(amount, 0)) AS avg_amount FROM orders;7. Orders whose status is not 'canceled', keeping NULL status rows
SELECT *
FROM orders
WHERE status != 'canceled' OR status IS NULL;8. Top users by revenue with NULL revenue at the bottom
SELECT user_id, SUM(amount) AS revenue
FROM orders
GROUP BY user_id
ORDER BY revenue DESC NULLS LAST;9. Paid vs unpaid order counts in one row
SELECT
COUNT(*) FILTER (WHERE paid_at IS NOT NULL) AS paid,
COUNT(*) FILTER (WHERE paid_at IS NULL) AS unpaid
FROM orders;10. Percentage of NULL in a column
SELECT
COUNT(*) FILTER (WHERE col IS NULL) * 100.0 / COUNT(*) AS null_pct
FROM t;If you want to drill these patterns every day, NAILDD is launching with 500+ SQL problems built around exactly this kind of question.
Related reading
- SQL window functions interview questions
- A/B testing peeking mistake
- How to calculate cohort retention in SQL
FAQ
How is NULL different from an empty string?
NULL means the value is unknown or absent. An empty string is a known value with length zero. NULL != '' evaluates to UNKNOWN while '' = '' is TRUE. Production data often contains both because ETL layers normalize missing fields to empty strings while upstream sources write NULL. When a stakeholder asks for "users without an email," handle both with WHERE email IS NULL OR email = ''. Treating them as interchangeable is one of the most common bugs in cohort queries.
Why does WHERE col != 'x' exclude rows where col is NULL?
Because NULL != 'x' evaluates to UNKNOWN under three-valued logic, and WHERE only keeps rows where the predicate is TRUE. UNKNOWN rows are dropped silently, which is why your row count shrinks even though the predicate feels like it should include them. The fix is WHERE col != 'x' OR col IS NULL. This is the canonical interview question for testing whether a candidate understands two- vs three-valued logic.
How does SUM behave with NULL values?
SUM ignores NULL inputs and adds the rest. If every input is NULL — or the input set is empty — SUM returns NULL, not zero. The standard defense is COALESCE(SUM(amount), 0), which converts a NULL aggregate to zero before it hits the dashboard layer. The same pattern applies to AVG, MIN, and MAX whenever you need a non-NULL sentinel.
Should I use IS NULL or = NULL?
Always IS NULL. The = NULL form is syntactically valid but semantically wrong — it evaluates to UNKNOWN for every row. Interview panels at Stripe, Airbnb, and Notion use it as a quick screen: candidates who write = NULL typically spend the next ten minutes being walked through three-valued logic. Build the habit early so you never type it by accident.
What's the safest way to handle NULL in a NOT IN clause?
Switch to NOT EXISTS. The NOT IN operator returns an empty set whenever the subquery contains a single NULL — a silent failure mode that breaks queries months after they're written. NOT EXISTS uses correlated comparison and behaves the way most engineers expect, and most warehouses also execute it faster because the planner can short-circuit.
Are NULL ordering defaults consistent across databases?
No, and this is a real source of cross-warehouse bugs. PostgreSQL, Snowflake, Redshift, and BigQuery put NULL last in ASC order and first in DESC order. MySQL and SQL Server do the opposite. Whenever NULL ordering matters for a report, append NULLS FIRST or NULLS LAST explicitly. It costs nothing and prevents behavior from changing during a warehouse migration.