HAVING in SQL: filtering groups after GROUP BY

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

What HAVING does and when you reach for it

Picture a Tuesday morning at Stripe. The growth PM pings you in Slack: "Pull every merchant category that did more than 250 thousand dollars last month, and only count categories with at least fifty active merchants." You open your editor and type WHERE SUM(amount) > 250000. Three seconds later the engine returns a syntax error, the PM is waiting, and you remember the small but unforgiving rule: aggregate conditions belong in HAVING, not WHERE. Every analyst at DoorDash, Airbnb, Linear, and Snowflake learns this the same way.

HAVING filters groups after GROUP BY has already collapsed rows into buckets. The clause exists because some questions cannot be expressed without seeing aggregates first. "Users with five or more orders," "categories with revenue above one hundred thousand dollars," "departments where average tenure exceeds three years" — none of those conditions exist at the row level. They only exist after the engine has grouped, counted, and summed.

HAVING vs WHERE

WHERE filters rows before grouping. HAVING filters groups after grouping. The rule for deciding which one you need is mechanical: if your condition references an aggregate function, it goes in HAVING; if it references a column value, it goes in WHERE. Mixing the two in the same statement is normal and often required.

-- WHERE removes rows BEFORE the engine groups
SELECT category, SUM(amount) AS revenue
FROM orders
WHERE amount > 0
GROUP BY category;

-- HAVING removes groups AFTER the engine groups
SELECT category, SUM(amount) AS revenue
FROM orders
GROUP BY category
HAVING SUM(amount) > 100000;

In the first query, WHERE strips out zero or negative rows before they are ever counted toward a category total. In the second, the engine builds a sum for every category, then keeps only the categories whose total clears one hundred thousand dollars. Writing WHERE SUM(amount) > 100000 is a syntax error because at the moment WHERE is evaluated the aggregate does not yet exist.

If you can express the filter at the row level, do it in WHERE. Filtering by an aggregate makes the engine carry more rows into the group phase than it needs to, and on a billion-row table that is a real cost. Use HAVING only when the condition truly depends on grouped output.

Syntax and clause order

HAVING always sits after GROUP BY and before ORDER BY. The full skeleton has a fixed shape you should be able to write from muscle memory.

SELECT column, AGG_FUNCTION(column2) AS alias
FROM TABLE
WHERE row_predicate
GROUP BY column
HAVING aggregate_predicate
ORDER BY alias
LIMIT n;

Inside HAVING you can use any aggregate function the dialect supports — COUNT, SUM, AVG, MIN, MAX, and COUNT(DISTINCT ...) are the everyday choices. You can combine predicates with AND, OR, and parentheses, and you can compare aggregates to a scalar, to another aggregate, or to the result of a subquery.

SELECT
    user_id,
    COUNT(*) AS order_count,
    SUM(amount) AS total_spent
FROM orders
GROUP BY user_id
HAVING COUNT(*) >= 5 AND SUM(amount) > 10000;

The query returns one row per user who has placed at least five orders and spent at least ten thousand dollars in total. Both predicates run against aggregates, so both belong in HAVING.

How the engine runs your query

Understanding why HAVING exists requires a short detour through the logical execution order of a SELECT statement. The dialect may rewrite the plan for performance, but the logical order is fixed and is what you reason about when you write SQL.

  1. FROM resolves the source tables and joins.
  2. WHERE filters rows.
  3. GROUP BY collapses surviving rows into groups.
  4. HAVING filters groups.
  5. SELECT evaluates expressions and aliases.
  6. ORDER BY sorts the surviving rows.
  7. LIMIT truncates the result.

WHERE runs at step two. At that moment no aggregates exist yet, so aggregate functions are illegal there. HAVING runs at step four. By then the engine has finished computing every aggregate, so they are all available. This is also why most dialects reject aliases inside HAVING — aliases are bound at step five, one step later.

Worked examples from real interviews

Users with five or more orders

SELECT
    user_id,
    COUNT(*) AS order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) >= 5
ORDER BY order_count DESC;

GROUP BY produces one row per user. HAVING keeps only the users who cross the activity threshold. This pattern is the building block for power-user dashboards, retention cohorts, and abuse detection.

Categories with revenue above one hundred thousand dollars

SELECT
    p.category,
    SUM(o.amount) AS revenue,
    COUNT(DISTINCT o.user_id) AS buyers
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= DATE '2026-01-01'
GROUP BY p.category
HAVING SUM(o.amount) > 100000
ORDER BY revenue DESC;

Here WHERE and HAVING cooperate. WHERE limits the source to orders from this year before the engine bothers grouping. HAVING keeps only the categories whose total cleared the revenue gate.

Products bought by three or more unique users

SELECT
    product_id,
    COUNT(DISTINCT user_id) AS unique_buyers,
    SUM(quantity) AS total_units
FROM order_items
GROUP BY product_id
HAVING COUNT(DISTINCT user_id) >= 3;

COUNT(DISTINCT ...) works inside HAVING, not only in SELECT. Marketplace teams at Uber Eats and DoorDash use this exact shape to isolate items with real organic demand from items that one repeat buyer keeps reordering.

Departments with high average salary and meaningful headcount

SELECT
    department,
    AVG(salary) AS avg_salary,
    COUNT(*) AS headcount
FROM employees
WHERE hire_date >= DATE '2023-01-01'
GROUP BY department
HAVING AVG(salary) > 150000 AND COUNT(*) >= 3;

Two aggregate predicates combined with AND. The WHERE clause restricts the population first, and HAVING keeps only departments where the average salary clears one hundred fifty thousand dollars and the team is at least three people. Combining a magnitude condition with a sample-size condition is a common interviewer move.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

HAVING without GROUP BY and with subqueries

HAVING without GROUP BY is legal. In that case the entire table is treated as a single implicit group, and HAVING decides whether that one row of aggregates should be returned at all.

SELECT COUNT(*) AS total_orders
FROM orders
HAVING COUNT(*) > 1000;

If the table holds more than one thousand orders, you get back a single row with the count. If it holds fewer, you get back an empty result. The pattern is rare in production but appears in interview gotchas.

The more useful advanced pattern is HAVING with a scalar subquery, which lets you compare a group aggregate to a global aggregate.

SELECT
    category,
    AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > (SELECT AVG(price) FROM products);

The outer query computes one average per category. The subquery computes the overall average across every product. HAVING keeps only the categories whose average exceeds the global mean. The subquery runs once and is reused for every group.

Common pitfalls

The first pitfall is putting an aggregate inside WHERE. Writing WHERE COUNT(*) > 5 looks reasonable at a glance and is the single most common mistake interviewers see. The engine rejects it because at the row-filter stage no aggregate has been computed. The fix is mechanical: move the predicate to HAVING. Flag the error out loud in a live interview to show you understand the logical order.

The second pitfall is referencing a select-list alias inside HAVING. Standard SQL forbids it because aliases are bound at step five and HAVING runs at step four. Writing HAVING revenue > 100000 works in MySQL and PostgreSQL as a vendor extension, but the same query crashes in Snowflake strict mode and confuses reviewers. The safe habit is to repeat the full expression: HAVING SUM(amount) > 100000.

The third pitfall is filtering rows through HAVING when WHERE would have done the job. A predicate like HAVING status = 'active' is technically legal, but it forces the engine to group every row first and then throw most groups away. Move the same predicate up to WHERE and the engine drops inactive rows before grouping, which can be one or two orders of magnitude faster on a wide fact table.

The fourth pitfall is forgetting GROUP BY entirely. The engine silently treats the whole table as one group, returns a single row, and the candidate stares at a result that does not match what they meant to write. If you expected one row per category and got one row period, the missing GROUP BY category is the first place to look.

Optimization tips

The first lever is to push every row-level predicate into WHERE. The further down the query plan the engine can prune rows, the less data has to travel through GROUP BY. On a multi-billion-row fact table the difference between WHERE order_date >= DATE '2026-01-01' and the same predicate inside HAVING is the difference between seconds and a timeout.

The second lever is making sure the columns referenced by GROUP BY are well indexed or, on a columnar warehouse, are good clustering keys. Snowflake clusters cooperate when grouping keys are low-cardinality and queried often. On Postgres with a row store, an index on the grouping column lets the planner pick a hash or sort aggregate that does not need a separate scan.

The third lever is keeping the aggregate expressions inside HAVING identical to the ones in SELECT. Most engines compute each distinct aggregate once and reuse the result. If you write SUM(amount) in SELECT and SUM(amount) in HAVING, the engine evaluates it one time. Write SUM(amount * 1.0) in one place and SUM(amount) in the other, and you have asked for two separate aggregates.

Interview questions with answers

What is the difference between WHERE and HAVING? WHERE filters individual rows before GROUP BY runs, and it cannot reference aggregate functions because no aggregate exists yet. HAVING filters whole groups after GROUP BY has produced them, and it is the only clause where you may reference aggregates as a predicate. A typical query uses both.

Can you use HAVING without GROUP BY? Yes. With no GROUP BY, the engine treats the whole table as a single implicit group, computes the aggregates once, and applies the HAVING predicate to that single result. The construction is rare in production code but useful as a sanity test inside data pipelines.

Write a query that finds users with an average order value above seventy-five dollars and more than three orders. SELECT user_id, AVG(amount) AS avg_check, COUNT(*) AS order_count FROM orders GROUP BY user_id HAVING AVG(amount) > 75 AND COUNT(*) > 3. Both predicates reference aggregates so both belong in HAVING.

Why does WHERE SUM(amount) > 10000 fail? Because the logical execution order places WHERE before GROUP BY. At step two the engine is still looking at unaggregated rows; there is no SUM to compare against. Aggregate functions are valid only in HAVING, SELECT, and ORDER BY, all of which run after grouping.

If you want to drill HAVING patterns until they feel automatic, NAILDD is launching with hundreds of SQL problems covering exactly this shape — aggregate filters, multi-clause queries, and interviewer-style follow-ups.

FAQ

Is WHERE faster than HAVING for the same filter?

When the filter can be expressed against a row-level column, WHERE is faster and almost always the right choice. It prunes rows before they reach GROUP BY, which means the grouping stage processes a smaller input. HAVING is the only option when the predicate references an aggregate, and you should treat it as a tool for that case rather than as a general-purpose filter.

Can HAVING combine multiple conditions?

Yes. Inside HAVING you can join predicates with AND and OR, wrap them in parentheses, and mix different aggregate functions. A typical compound predicate looks like HAVING COUNT(*) >= 5 AND SUM(amount) > 10000. As with WHERE, AND binds tighter than OR, so use parentheses whenever the ordering is not obvious from the expression.

Does HAVING work with window functions?

No. HAVING operates only on the aggregate functions that GROUP BY produced. Window functions evaluate at the SELECT stage, after HAVING has already run. To filter on a window function, wrap the query in a subquery or a CTE and apply a normal WHERE predicate on the outer level.

Can I reference a SELECT alias inside HAVING?

Standard SQL says no, because aliases are bound at the SELECT stage which runs after HAVING. PostgreSQL and MySQL allow it as a non-standard extension, and Snowflake allows it in lenient modes. The portable habit is to repeat the full expression — for example HAVING SUM(amount) > 100000 rather than HAVING revenue > 100000.

When should I prefer a CTE over HAVING?

If your filter is simple — one or two aggregate predicates on a single grouped query — HAVING is the cleanest option. If you need to compose several aggregate filters across multiple group keys, or you want to reuse the grouped result more than once, a CTE that materializes the aggregates and a downstream WHERE will read better and run no slower on a modern engine.