How to read EXPLAIN ANALYZE

Train for your next tech interview
1,500+ real interview questions across engineering, product, design, and data — with worked solutions.
Join the waitlist

Why read EXPLAIN

When a SQL query at Stripe or Notion takes 45 seconds instead of 45 milliseconds, two questions matter: what is the database doing, and how do you stop it? EXPLAIN ANALYZE answers the first. Without it, every optimization is a guess — you might add an index the planner ignores, or rewrite a query the optimizer was already handling. For a mid-level analyst interview at Meta or Databricks, reading a plan out loud and pointing at the bottleneck is a baseline skill. Syntax is simple:

EXPLAIN ANALYZE SELECT ...;

One safety note. EXPLAIN ANALYZE actually executes the query. For SELECT that is fine. For UPDATE, DELETE, or INSERT, it will write rows. Use plain EXPLAIN (plan only) for those, or wrap the run in a transaction and ROLLBACK at the end.

Anatomy of the output

A toy query:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42;

A typical response:

Index Scan using idx_orders_user on orders
    (cost=0.43..8.45 rows=1 width=64)
    (actual time=0.023..0.025 rows=1 loops=1)
    Index Cond: (user_id = 42)
Planning Time: 0.102 ms
Execution Time: 0.054 ms

Reading the blocks. Index Scan using idx_orders_user walks the index on user_id — the fast path. cost=0.43..8.45 is the planner's estimate; first number is startup cost (time to first row), second is total. Units are arbitrary and only meaningful within the same plan — comparing cost between unrelated queries is a category error.

rows=1 is the estimated row count. If it drifts far from actual rows, statistics are stale and the plan was built on a fantasy. actual time=0.023..0.025 is real wall-clock time in milliseconds per loop. rows=1 loops=1 means the node was entered once and returned one row — if loops is huge, actual time gets multiplied by it. For most production queries, execution dominates planning by orders of magnitude.

Scan types

A table can be read several ways, each with a different cost shape.

A Seq Scan reads every page of the table. For small tables, this is fine — sometimes faster than using an index. For a 50-million-row events table without a usable filter, it is a disaster.

An Index Scan walks the index to find matching rows, then jumps into the table for each one. Great for selective filters, but pays a random-read penalty per row.

An Index Only Scan is the fastest variant. Every column the query needs is already in the index, so the executor never touches the table. This requires a covering index — composite indexes that include both filter and selected columns can be a big win.

A Bitmap Heap Scan paired with a Bitmap Index Scan is a hybrid. The index builds a bitmap of matching pages, then the heap scan reads those pages sequentially. Wins when a filter is moderately selective — too many rows for a plain index scan, too few for a seq scan.

Whether any of these is "good" depends on context. A Seq Scan on a 10-row lookup table is correct. An Index Scan driven by a million loops from a Nested Loop is a smoking gun.

Join algorithms

PostgreSQL picks between three join strategies, and the difference is often 50 ms versus 50 seconds.

A Nested Loop walks the outer table and, for each row, scans the inner. Complexity is O(N x M). Works well when the outer side is small and the inner has a tight index on the join key. Falls apart when the outer side is large and the planner mis-estimated it.

A Hash Join builds a hash table on one side, then streams the other through it. Complexity is roughly O(N + M). The workhorse for medium-to-large equality joins where neither side has the perfect index.

A Merge Join sorts both sides by the join key, then walks them together. Shines when inputs are already sorted — both come from index scans on the join column. If the sort happens on the fly, the cost flips.

The classic failure mode:

Nested Loop  (actual time=0.1..5432 rows=1000000 loops=1)
    ->  Seq Scan on users  (rows=1000)
    ->  Index Scan on orders  (loops=1000)

A million rows from a Nested Loop with 1000 outer rows means the inner index scan is being executed a thousand times. A Hash Join would build one hash table on users and stream orders through it once. The fix is rarely "force a different join" — it is "tell the planner the truth" by running ANALYZE orders; so estimates match reality.

Sorts

When ORDER BY, GROUP BY, or DISTINCT cannot be served by an index, a Sort node appears:

Sort  (actual time=120.5..125.8 rows=50000 loops=1)
  Sort Method: quicksort  Memory: 125kB
  ->  Seq Scan on orders

Sort Method: quicksort Memory: 125kB means the sort fit in memory. Good. The bad version:

Sort Method: external merge  Disk: 245MB

The data did not fit in work_mem and PostgreSQL spilled to disk. Disk sorts are dramatically slower — often 10-50x — and they hammer the I/O of every other query on the server. Three fixes: add an index that matches the ORDER BY so the sort disappears, raise work_mem for this session, or rewrite the query to reduce the volume being sorted. The first is almost always right for production.

The ideal endgame is no Sort node:

Index Scan using idx_orders_created_at on orders

The index already stores rows in sorted order. The executor walks it.

Aggregates

For GROUP BY, PostgreSQL chooses between HashAggregate and GroupAggregate. HashAggregate builds an in-memory hash table and accumulates aggregates as rows stream in — fast when distinct groups are modest. GroupAggregate requires sorted input and runs a single pass — slower per row but constant memory.

If the plan shows HashAggregate consuming enormous memory or spilling, the grouping cardinality is higher than the planner thought. Grouping by user_id on a billion-row table can be legitimately expensive, but it can also be a sign to pre-aggregate into a daily roll-up instead of computing it live.

To drill query-plan reading against real interview questions, NAILDD is building an SQL practice library where each problem ships with the expected plan and a worked optimization path.

Train for your next tech interview
1,500+ real interview questions across engineering, product, design, and data — with worked solutions.
Join the waitlist

Row estimates and statistics

Side by side in every node, the plan prints estimated rows and actual rows. A big gap is the single most common source of bad plans:

Seq Scan on orders  (rows=100) (actual rows=1000000)

The planner thought it would get 100 rows and got a million. Every operation built on top of that estimate — joins, sorts, aggregates — was chosen for a hundred-row world. In reality those operations are crushing a million rows, and the plan is wrong from that node up.

The fix is usually ANALYZE table_name; to refresh statistics. PostgreSQL runs autovacuum and auto-analyze in the background, but high-velocity tables drift between runs. Raise the auto-analyze frequency in autovacuum settings, or add a manual ANALYZE at the end of a batch load. A subtler case is correlated columns — if country and currency are dependent, the planner assumes independence and gets joint selectivity wrong. CREATE STATISTICS with dependencies tells the planner about it.

BUFFERS and real cost

By default EXPLAIN ANALYZE does not show how much data was read from disk versus served from the page cache. Turn it on:

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

Extra lines appear:

Buffers: shared hit=125 read=50

shared hit counts blocks already in PostgreSQL's shared buffer cache — fast. shared read counts blocks fetched from disk — slow. On a cold cache, most numbers are read. After the same query runs twice, blocks land in cache and the next run is mostly hit. This matters for benchmarking: a first execution measures cold performance; subsequent ones measure warm. If a dashboard query runs at 50 ms in your testing but users see 5 seconds, the cold-cache cost is the gap.

Common pitfalls

The first trap is reading cost and stopping. cost is what the planner thinks the query will cost in arbitrary units; actual time is the truth in milliseconds. When cost is low and actual time is high, the planner was wrong and a better plan likely exists. Cross-check both — a divergence is begging for an ANALYZE.

The second trap is trusting a single run. The first execution pays cold-cache penalties — disk reads, parser caches, JIT compilation in PostgreSQL 11+. Run the query two or three times and read the stabilized numbers. If only the first run is slow, you have a cold-cache problem, not a plan problem.

The third trap is comparing cost across different queries. The unit is arbitrary and only meaningful within a single planning round. A query with cost 1000 is not necessarily slower than one with cost 100 — it depends on table sizes, per-row work, and dozens of other inputs. Inside one plan, cost rankings are reliable; across queries, they are noise.

The fourth trap is ignoring loops on Nested Loop inner nodes. If the inner side shows actual time=0.5 ms loops=100000, real time on that node is roughly 50 seconds, not half a millisecond. PostgreSQL prints time per loop, not total. People miss this and chase phantom bottlenecks elsewhere.

The fifth trap is fixing one slow node and declaring victory. Real queries usually have two or three weak nodes layered on top of each other. The first fix exposes the second. Treat optimization as iterative — re-run EXPLAIN ANALYZE after every change and re-read the new plan from scratch.

A step-by-step debug routine

When a query is slow, work through this in order. Step one: run EXPLAIN (ANALYZE, BUFFERS). Note the total Execution Time so you have a baseline.

Step two: find the heaviest node. Scan the plan for the largest actual time value, accounting for loops. PostgreSQL prints per-loop time, so a 1 ms inner node looped 100,000 times outweighs a 50 ms outer node looped once.

Step three: classify the node. Seq Scan on a large table? You probably need an index, or the planner avoided your existing index because of a function over the column or a type cast. Nested Loop with millions of inner rows? Statistics are likely stale, or a Hash Join would be better. Sort Method: external merge Disk? Add an index that matches the sort order, or raise work_mem.

Step four: apply the change, re-run EXPLAIN ANALYZE, compare Execution Time and plan shape. If the same node is still heaviest, your fix did not help — back it out and try a different angle. If the heavy node moved, you made progress, and the next iteration tackles the new bottleneck.

Step five: inside 2-3x of the target latency, stop tuning the query and look at the system. Concurrent long-running transactions holding locks, undersized buffer cache, autovacuum debt — plan-level fixes have a floor, and past it you are negotiating with infrastructure, not SQL.

FAQ

EXPLAIN or EXPLAIN ANALYZE — which one should I run?

Plain EXPLAIN returns the plan only, without executing. Instant and safe even for UPDATE or DELETE. EXPLAIN ANALYZE runs the query for real and reports actual timings, row counts, and loops — much more useful for debugging, dangerous for write statements. For any SELECT you are tuning, reach for ANALYZE. For DML, use plain EXPLAIN or wrap the run inside BEGIN ... ROLLBACK so the changes never commit.

Why does the plan look fine but the query is still slow?

Check BUFFERS first. If most blocks are read and few are hit, the query is paying cold-cache cost — run it twice and read the second run, or use pg_prewarm to seed the cache. If the query is consistently slow across runs, look outside the plan: concurrent transactions, locks in pg_locks, saturated CPU or I/O on the host, vacuum debt. A perfect plan executed by a starved server still takes seconds.

Why is PostgreSQL ignoring my index?

Three usual suspects. One: stale statistics — run ANALYZE and re-check the plan. Two: the predicate is not index-sargable. Wrapping the indexed column in a function (LOWER(email) = ...) or an implicit cast forces the planner off the index unless you build a matching expression index. Three: the table is small enough that a Seq Scan is genuinely cheaper. The planner is not always wrong.

How do I make a complex plan readable?

Use a plan visualizer. explain.depesz.com accepts raw EXPLAIN ANALYZE output and renders it as a tree with per-node row counts and time, highlighting the heaviest nodes. pgMustard does similar work with more opinionated suggestions. For 200-line plans from a multi-CTE analytical query, these turn a 20-minute reading exercise into a 30-second scan — they also help when explaining a plan during review.

What is the difference between Planning Time and Execution Time?

Planning Time is what PostgreSQL spent figuring out how to run the query — parsing, rewriting, picking join orders, choosing access methods. Execution Time is what it spent doing the work. Execution dominates by 100-1000x in most queries, so you ignore planning time. For very fast queries hit at extreme rates on a hot OLTP path, planning time becomes the bottleneck — that is when prepared statements start to matter, because the plan is computed once and reused.