HAVING SQL cheat sheet

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why HAVING exists

You are five minutes into an analyst onsite at Stripe. The prompt on the screen: "Return cities with more than 100 active users and average order value above 75 dollars." You start typing WHERE COUNT(*) > 100 and the screen-share goes quiet. The interviewer is not waiting for syntax — they are watching for the difference between filtering rows and filtering groups. That difference is what HAVING was invented for, and it is the most common reason candidates fail the first SQL screen at Linear, Notion, and Snowflake.

HAVING filters groups after aggregation has happened. Use it when the condition you care about does not exist until rows have been collapsed by GROUP BY — counts, sums, averages, distinct counts, min, max, conditional aggregates. Use WHERE for everything else. The rule looks trivial on paper, but interviewers stretch it across joins, subqueries, and nullable group keys. By the end of this cheat sheet you will write HAVING cold and avoid the five pitfalls that tank otherwise-good candidates.

Basic syntax and execution order

The full skeleton of a SELECT statement that uses HAVING has a fixed clause order. You do not get to reorder them, even when the engine lets you alias your way around the rules.

SELECT col, agg_func(x)
FROM TABLE
WHERE row_condition
GROUP BY col
HAVING group_condition
ORDER BY something;

The logical evaluation order is FROM then WHERE then GROUP BY then HAVING then SELECT then ORDER BY. Memorize that order before any SQL interview — every confusing rule about aliases, aggregates, and column visibility falls out of it for free.

-- Logical evaluation order:
-- 1. FROM       -> assemble the row source (joins happen here)
-- 2. WHERE      -> drop rows that fail the predicate (no aggregates yet)
-- 3. GROUP BY   -> collapse rows into groups by the grouping keys
-- 4. HAVING     -> drop groups that fail the predicate (aggregates legal)
-- 5. SELECT     -> project final columns and compute aliases
-- 6. ORDER BY   -> sort the result set

A SELECT alias is invisible to WHERE because SELECT runs later. HAVING runs after GROUP BY so aggregates are fully computed by the time the predicate is evaluated. PostgreSQL and MySQL let you reference SELECT aliases from HAVING, but SQL Server and BigQuery in strict mode do not — the portable habit is to repeat the expression.

WHERE vs HAVING in plain English

WHERE filters rows before they are grouped. HAVING filters groups after. If you can compute the condition without looking at any aggregate, put it in WHERE. If it references COUNT, SUM, AVG, MIN, MAX, or any window-free aggregate, it must live in HAVING.

-- WHERE filters rows: drop everyone signed up before 2026
SELECT city, COUNT(*) AS active_users
FROM users
WHERE created_at >= DATE '2026-01-01'
GROUP BY city;

-- HAVING filters groups: keep cities with more than 100 active users
SELECT city, COUNT(*) AS active_users
FROM users
WHERE created_at >= DATE '2026-01-01'
GROUP BY city
HAVING COUNT(*) > 100;

Both clauses can coexist and usually should. Push every non-aggregate predicate into WHERE so GROUP BY has fewer rows to bucket, then use HAVING exclusively for aggregate conditions. Writing HAVING city = 'New York' instead of WHERE city = 'New York' is legal but wastes work — the database still groups the entire table before throwing most of it away.

Worked examples

Three scenarios cover nearly every analyst interview loop. Each one shows where WHERE belongs, where HAVING belongs, and what the engine does.

The cleanest case for WHERE: every city in a single country with a row count. No aggregate in the predicate, so WHERE handles it and HAVING is unnecessary.

SELECT city, COUNT(*) AS users
FROM users
WHERE country = 'US'
GROUP BY city
ORDER BY users DESC;

The cleanest case for HAVING: every city, no row-level filter, only groups above a threshold. The count does not exist until the group is formed, so there is no way to write this at the row level.

SELECT city, COUNT(*) AS users
FROM users
GROUP BY city
HAVING COUNT(*) > 10
ORDER BY users DESC;

The third combines both — filter rows first, group, then filter groups. This is the shape of nearly every real production query that uses HAVING.

SELECT city,
       COUNT(*)              AS active_users,
       AVG(order_amount)     AS aov
FROM orders
WHERE country = 'US'
  AND created_at >= DATE '2026-01-01'
GROUP BY city
HAVING COUNT(*) > 100
   AND AVG(order_amount) > 75
ORDER BY aov DESC;

A senior variation references an aggregate from a correlated subquery. Snowflake and Databricks loops lean on this pattern to test whether you can keep two scopes of aggregation straight.

SELECT category, SUM(revenue) AS cat_revenue
FROM orders
JOIN products USING (product_id)
GROUP BY category
HAVING SUM(revenue) > (
    SELECT AVG(cat_rev)
    FROM (
        SELECT SUM(revenue) AS cat_rev
        FROM orders
        JOIN products USING (product_id)
        GROUP BY category
    ) t
);

The inner derived table computes one number per category, the outer query averages those, and HAVING compares each category total against that single scalar. Narrate the scopes back to the interviewer — "outer groups by category, inner averages per-category sums, HAVING is scalar comparison" — and you pass the round.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Ten drill questions with answers

Run through ten variations back-to-back. Try each before reading the answer. Schema: users(user_id, city, country, created_at), orders(order_id, user_id, product_id, amount, created_at), products(product_id, category).

-- 1. Cities with more than 100 users
SELECT city, COUNT(*) AS user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 100;

-- 2. Users with more than three orders
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 3;

-- 3. Products with total revenue above one million
SELECT product_id, SUM(amount) AS revenue
FROM orders
GROUP BY product_id
HAVING SUM(amount) > 1000000;

-- 4. Categories with more than ten unique buyers
SELECT category, COUNT(DISTINCT user_id) AS unique_buyers
FROM orders
JOIN products USING (product_id)
GROUP BY category
HAVING COUNT(DISTINCT user_id) > 10;

-- 5. Days with revenue above the monthly average
WITH daily AS (
    SELECT CAST(created_at AS DATE) AS day, SUM(amount) AS rev
    FROM orders
    GROUP BY 1
)
SELECT day, rev
FROM daily
WHERE rev > (SELECT AVG(rev) FROM daily);

-- 6. Users who bought all three products A, B, C
SELECT user_id
FROM orders
WHERE product_id IN ('A', 'B', 'C')
GROUP BY user_id
HAVING COUNT(DISTINCT product_id) = 3;

-- 7. Users with at least two payments totaling more than 5000
SELECT user_id, COUNT(*) AS payments, SUM(amount) AS total
FROM payments
GROUP BY user_id
HAVING COUNT(*) >= 2
   AND SUM(amount) > 5000;

-- 8. Cities where the average ticket is above 2000
SELECT city, AVG(amount) AS avg_ticket
FROM orders
GROUP BY city
HAVING AVG(amount) > 2000;

-- 9. Users with at least one purchase above 10000
SELECT user_id, MAX(amount) AS biggest_order
FROM orders
GROUP BY user_id
HAVING MAX(amount) > 10000;

-- 10. (user, month) pairs with more than ten orders
SELECT user_id,
       DATE_TRUNC('month', created_at) AS month,
       COUNT(*) AS orders_in_month
FROM orders
GROUP BY 1, 2
HAVING COUNT(*) > 10;

If you can get through all ten without peeking, you are ready for the HAVING portion of any analyst screen. Question six — counting distinct values inside a filtered set — is the trick interviewers reuse most, because it tests whether you understand that IN ('A','B','C') plus COUNT(DISTINCT product_id) = 3 is equivalent to set inclusion, not just row presence.

Common pitfalls

The first trap is putting an aggregate inside WHERE. The engine has not run GROUP BY yet, so COUNT(*) is undefined at that stage and every database rejects the query with a parse error. Candidates still write it under interview pressure because they translate the requirement word-by-word. Keep one sentence in your head before you type: aggregates live downstream of GROUP BY, so any condition on an aggregate lives in HAVING.

The second trap is relying on SELECT aliases inside HAVING. PostgreSQL and MySQL allow it as a convenience, so the query runs locally and the candidate ships it. The same code throws an error in SQL Server, BigQuery strict mode, and several cloud warehouses. Repeat the aggregate expression in HAVING rather than referencing the alias — it never breaks when the team migrates from Postgres to Snowflake.

The third trap is using HAVING for a filter that could have been a WHERE. GROUP BY city HAVING city = 'New York' is legal, but the engine groups the entire users table first and throws away most of the work. On a small dataset this is invisible. On a billion-row event log it doubles your query cost. Every non-aggregate predicate belongs in WHERE.

The fourth trap is forgetting how HAVING interacts with NULL. A condition like HAVING avg_amount = NULL is never true, because NULL = NULL is itself NULL. If a group has no non-null amounts then AVG(amount) is NULL and the only way to keep it is HAVING AVG(amount) IS NULL. The same three-valued logic that bites WHERE bites HAVING too, but it surprises candidates more there because they think the aggregate has "fixed" the nulls.

The fifth trap is mixing HAVING with window functions. Windows run in the SELECT stage, after HAVING, so you cannot filter on a window result with HAVING. "Top-three customers per region by revenue" requires either a subquery wrapping the window or a QUALIFY clause on Snowflake, BigQuery, and Databricks. HAVING ROW_NUMBER() OVER (...) <= 3 is a guaranteed parse error.

Optimization tips

Push every predicate that can be expressed as a row condition into WHERE. The optimizer uses indexes, partition pruning, and statistics on WHERE predicates, but none of those apply to HAVING because rows have already been merged into groups by then. A date range filter that looks identical in both can run ten or a hundred times faster in WHERE on a large partitioned table.

Use covering indexes that include the grouping key and the aggregated column. On Postgres a partial index on (city) INCLUDE (amount) WHERE created_at >= DATE '2026-01-01' lets GROUP BY city HAVING SUM(amount) > 1000 run as an index-only scan over the recent partition. On Snowflake and BigQuery, partition by date and cluster by city so the engine reads only the relevant micro-partitions.

For very large group-bys, pre-aggregate into a daily summary table. If your HAVING predicate is on total revenue per merchant, the dashboard does not need to scan raw events every time — a nightly merchant_daily_revenue job is one or two orders of magnitude cheaper, and your HAVING runs on a row count that fits in memory.

When the aggregate is COUNT(DISTINCT user_id) on a huge table, approximate cardinality functions help. Warehouses expose APPROX_COUNT_DISTINCT or HLL_COUNT.DISTINCT with error bounds under two percent — mention it if the interviewer cares about performance.

If you want to drill problems like these every day, NAILDD is launching with hundreds of analyst SQL drills built around exactly this pattern.

FAQ

Can I use HAVING without GROUP BY?

Yes, but only when every column in SELECT is an aggregate. The query returns one row if the predicate is satisfied and zero rows otherwise, which is occasionally useful for sanity checks like SELECT COUNT(*) FROM events HAVING COUNT(*) > 0. Almost every HAVING you write in production sits on top of a GROUP BY — treat the no-group-by form as a curiosity, not a tool.

Why does my SELECT alias work in HAVING on Postgres but not on SQL Server?

The SQL standard says HAVING is evaluated before SELECT, so aliases defined in SELECT should not be visible. PostgreSQL and MySQL bend the rule as a convenience; SQL Server and several others follow the standard strictly. Repeat the expression — it costs a few keystrokes and runs unchanged when the team migrates warehouses.

Which is faster, WHERE or HAVING?

WHERE is always at least as fast as HAVING, and usually much faster, because WHERE predicates can use indexes, partition pruning, and runtime filters before any grouping happens. HAVING operates on the already-grouped result and has none of those acceleration paths. Use WHERE for every row-level condition and reserve HAVING for genuine aggregate conditions.

How does HAVING handle NULL aggregates?

Aggregate functions ignore NULL inputs by default, so AVG, SUM, MIN, and MAX return NULL only when every input in the group is NULL. A predicate like HAVING AVG(amount) > 100 silently drops those all-null groups because NULL > 100 evaluates as unknown. If you want to keep them, write HAVING AVG(amount) > 100 OR AVG(amount) IS NULL.

Can I use a window function inside HAVING?

No. Windows run in the SELECT stage of evaluation, which happens after HAVING. To filter on a window result — like "top three rows per group" — wrap the window in a subquery or CTE and filter outside, or use QUALIFY on Snowflake, BigQuery, and Databricks. HAVING ROW_NUMBER() OVER ... <= 3 is a guaranteed parse error.

Is HAVING evaluated before or after ORDER BY?

Before. The engine filters groups first, then sorts what remains. That ordering also means a SELECT alias is visible to ORDER BY on every major engine, even when the same alias is not visible to HAVING on engines that follow the standard strictly.