Materialized views in SQL: hands-on guide

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Plain VIEW vs materialized VIEW

A plain VIEW is a saved query — every read re-runs the SELECT. It encapsulates logic but gives no performance win; the query plan is identical to running the SELECT inline.

A MATERIALIZED VIEW is a physical table storing query results. Reads just scan stored data — fast, but the data is a frozen snapshot from the last refresh. If a dashboard reads at 9am and orders change by 9:05am, you do not see those changes until you run REFRESH MATERIALIZED VIEW.

The split matters for analytical dashboards. A plain view aggregating GMV across 100 million order rows costs thirty seconds and twenty CPUs per dashboard open. A materialized version returns in half a second because the heavy lifting happened once overnight. The trade-off is freshness: if you need sub-minute latency, materialized views are not the answer.

When materialized views make sense

Three patterns make materialized views pay off. The first is heavy aggregates for dashboards — daily GMV over five million transactions across twelve months, weekly active users by acquisition channel, monthly cohort retention. These queries scan tens of gigabytes every time a PM opens Looker, and computing them on demand burns warehouse credits and patience.

The second pattern is complex joins across several tables. A wide user profile joining events, orders, payments, and CRM data can run for minutes on raw tables. Materializing reduces query time to milliseconds, which makes interactive exploration in BI tools actually interactive. A data scientist at Stripe or Airbnb slicing this profile fifty times a day will love you for it.

The third pattern is reference data that rarely changes — country lookups, denormalized product catalogs, currency tables. Materializing the join saves you from doing it on every query.

The flip side: avoid materialized views when freshness beats speed. Real-time fraud monitoring, live operational dashboards, anything that drives a paging alert needs the source of truth, not an hour-old snapshot. Avoid them when queries are already cheap, and avoid them when source tables churn faster than your refresh cadence — an hourly view over per-minute writes is permanently stale.

PostgreSQL syntax

Wrap a SELECT in CREATE MATERIALIZED VIEW and Postgres runs the query once, storing the result.

CREATE MATERIALIZED VIEW mart_daily_gmv AS
SELECT
    DATE_TRUNC('day', created_at)::DATE AS day,
    SUM(amount) AS gmv,
    COUNT(DISTINCT user_id) AS buyers,
    AVG(amount) AS avg_check
FROM orders
WHERE status = 'paid'
GROUP BY 1;

CREATE INDEX ON mart_daily_gmv (day);

Dashboards can now query it like any table — SELECT * FROM mart_daily_gmv WHERE day >= CURRENT_DATE - INTERVAL '30 day' runs in milliseconds because Postgres reads a pre-aggregated, indexed table rather than scanning every order row. Without the index on day, even small materialized views degrade to sequential scans for filtered queries.

The SELECT inside the materialized view is the contract. If the raw orders table later gets a new column, the view will not pick it up until you DROP and CREATE again. Treat the SELECT as part of your schema migration.

REFRESH and CONCURRENTLY

A materialized view does not refresh itself. The data is frozen at the last refresh time. To recompute, run:

REFRESH MATERIALIZED VIEW mart_daily_gmv;

This re-runs the underlying SELECT and overwrites the stored rows. Duration tracks the SELECT — a five-second query takes five seconds, a ten-minute query takes ten minutes. No incremental magic by default.

The catch with plain REFRESH is that it takes an ACCESS EXCLUSIVE lock on the view for the duration of the refresh. Any dashboard query trying to read during that window blocks. For an hourly refresh on a small mart this is fine; for a daily refresh on a giant mart mid-business-day, it is a production incident waiting to happen.

The fix is CONCURRENTLY:

REFRESH MATERIALIZED VIEW CONCURRENTLY mart_daily_gmv;

CONCURRENTLY rebuilds data in the background and atomically swaps it in without blocking reads. Two costs: it requires a UNIQUE INDEX on the view, and it is slower because it has to compare old and new state. For anything user-facing during business hours, CONCURRENTLY is the default.

Refresh strategies

How often to refresh is a product decision, not a database one. Ask the dashboard owner what staleness they can live with, then pick the cheapest strategy that meets it.

Scheduled refresh through Airflow, dbt Cloud, or plain cron is by far the most common approach. An Airflow DAG runs the upstream ETL, lands fresh rows in orders, then triggers REFRESH MATERIALIZED VIEW CONCURRENTLY as the final step. The advantage is predictability — finance reports always land at 7am, and stakeholders calibrate against that. Most analytics teams at DoorDash or Notion run on this model.

Trigger-based refresh — firing REFRESH from an INSERT trigger on the source — is almost always a bad idea. A full refresh is expensive, and turning every write into a recompute does not scale. If you want this, you actually want a streaming tool like Materialize or a Snowflake dynamic table.

On-demand refresh works for reports humans open rarely. A finance dashboard with two views per week per executive can refresh when somebody clicks "update" rather than running on a schedule.

Incremental refresh is the holy grail and Postgres does not give it to you out of the box. Snowflake's dynamic tables and Databricks' materialized views support it natively; dbt's incremental model approximates it with custom merge logic. In pure Postgres you can roll your own, but the engineering cost is real. For most teams, "refresh the whole thing nightly" is the right answer until it isn't.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Materialized view vs a regular table

A fair question: how is a materialized view different from a regular table you refill with INSERT INTO ... SELECT ... from Airflow? Functionally they are equivalent — both store pre-computed data and need orchestration to stay fresh.

The differences are organizational. A materialized view is part of the database DDL. The SELECT lives in the schema; you can read it with \d+ mart_daily_gmv in psql. This is great for casual exploration and bad for version control, because the canonical query lives inside the database rather than in git.

A regular table populated by an external pipeline is the opposite. The schema is just storage; the logic lives in your dbt project or Airflow repo. The SELECT is version-controlled, code-reviewed, and tested. Modern analytics teams on dbt almost universally pick this pattern — every table is a dbt model with tests, docs, and a clear DAG. Materialized views inside Postgres give you none of that.

Practical guidance: if your stack uses dbt or any modern transformation layer, build regular tables and let it own freshness. If you are in a Postgres-only codebase without dbt — internal tooling, a small startup, embedded analytics — materialized views are a perfectly good way to make dashboards fast without another moving part.

Worked example: daily product metrics

A realistic materialized view — DAU, orders, revenue, paying users, and signups joined into one row per day.

CREATE MATERIALIZED VIEW mart_daily_product_metrics AS
WITH active_users AS (
    SELECT
        event_time::DATE AS day,
        COUNT(DISTINCT user_id) AS dau
    FROM events
    WHERE event_name = 'app_open'
    GROUP BY 1
),
purchases AS (
    SELECT
        created_at::DATE AS day,
        COUNT(*) AS orders_count,
        SUM(amount) AS revenue,
        COUNT(DISTINCT user_id) AS paying_users
    FROM orders
    WHERE status = 'paid'
    GROUP BY 1
),
new_users AS (
    SELECT
        registered_at::DATE AS day,
        COUNT(*) AS new_signups
    FROM users
    GROUP BY 1
)
SELECT
    COALESCE(a.day, p.day, n.day) AS day,
    COALESCE(a.dau, 0) AS dau,
    COALESCE(p.orders_count, 0) AS orders,
    COALESCE(p.revenue, 0) AS revenue,
    COALESCE(p.paying_users, 0) AS paying_users,
    COALESCE(n.new_signups, 0) AS new_signups
FROM active_users a
FULL OUTER JOIN purchases p USING (day)
FULL OUTER JOIN new_users n USING (day);

CREATE UNIQUE INDEX ON mart_daily_product_metrics (day);

Three CTEs join with FULL OUTER JOIN because not every day has activity in every metric. COALESCE fills nulls with zero. The UNIQUE INDEX on day speeds up daily lookups and unlocks REFRESH CONCURRENTLY.

The Airflow task that owns this mart:

-- runs after the ETL DAG lands new rows in orders, events, users
REFRESH MATERIALIZED VIEW CONCURRENTLY mart_daily_product_metrics;

In front of a Looker or Metabase dashboard, query times drop from thirty seconds to instant.

Limitations in PostgreSQL

Postgres materialized views are useful but limited compared to Snowflake dynamic tables or Databricks materialized views. A few constraints to plan around.

There is no cascading refresh. If you build view B on top of view A, Postgres does not know that refreshing A should also refresh B — you orchestrate the order yourself in Airflow or dbt. This is a frequent source of subtle bugs.

Volatile functions like NOW() or RANDOM() are evaluated at refresh time and frozen until the next refresh. A column defined as NOW() AS snapshot_taken_at will show the refresh time, not the current time. Often useful, but it surprises people the first time.

There is no built-in incremental refresh — every refresh is a full recompute. For a 200-million-row mart this is painful and forces you toward custom logic or a different tool.

Materialized views do not refresh themselves. You need an external orchestrator to call REFRESH. A view that "looked right yesterday" can sit untouched for weeks if nobody owns the refresh job.

Common pitfalls

The first failure mode is treating materialized views as live data. A new analyst queries mart_daily_gmv to investigate "what happened in the last hour," sees yesterday's numbers, and concludes the system is broken. The fix is naming and docs: prefix marts with mart_ or mv_ so it is obvious they are snapshots, and put the refresh cadence in a comment.

A second trap is forgetting the UNIQUE INDEX and discovering it during an incident. Somebody creates a view, ships a daily Airflow job, everything works for a month. Then traffic grows, the dashboard becomes business-critical, and switching to REFRESH CONCURRENTLY errors out because there is no unique index. Add one up front, even if you do not need concurrent refresh today.

The third pitfall is refresh time silently growing. A view that took 30 seconds six months ago now takes 12 minutes because source tables doubled. Nobody notices until the morning run starts overlapping with US business hours. Instrument refresh duration and alert on threshold — you cannot fix what you do not measure.

The fourth pitfall is using a materialized view when a dbt model fits better. If your team runs dbt, every materialized view is one more thing outside the DAG — no tests, no docs, no lineage. The convenience of one-line creation rarely beats losing the ecosystem. Audit hybrid codebases and migrate where you can.

If you want to drill these patterns the way a data engineering interviewer would test them, NAILDD is launching with hundreds of SQL problems on exactly this kind of dashboard-and-mart design.

FAQ

Materialized view or dbt model?

In a modern analytics stack that uses dbt, a dbt model materialized as a regular table is almost always the better choice. You get version control in git, automated tests, documentation, and a lineage graph showing how the mart fits into the warehouse. Materialized views live inside the database and miss all of that. The exception is a Postgres-only project without dbt, where pulling in a transformation layer for two aggregates is overkill.

Can I build a materialized view on top of another materialized view?

Technically yes — a materialized view is a queryable object. The catch is that Postgres has no cascading refresh, so you manage refresh order yourself. If view B reads from view A, refresh A first. Most teams encode this in an Airflow DAG or a dbt project. Forgetting the order is a common production bug: one dashboard shows current numbers, a dependent dashboard shows yesterday's.

How do I know when a materialized view was last refreshed?

Postgres does not track this. The standard workaround is to add NOW() AS refreshed_at to the SELECT — the value gets frozen at refresh time and tells you exactly when the snapshot was taken. Alternatively, log refresh timestamps from your orchestrator into a side table. Pick one and standardize so future engineers do not have to guess.

Is REFRESH CONCURRENTLY always better than plain REFRESH?

For anything user-facing during business hours, yes — the slightly slower refresh is worth not blocking reads. For batch refreshes at 2am when nobody is looking, plain REFRESH is faster and you do not need a unique index. Rule of thumb: if any consumer might query during refresh, use CONCURRENTLY; if you have a guaranteed maintenance window, plain REFRESH is fine.

How big can a materialized view get before it becomes a problem?

No hard limit — Postgres will store hundreds of millions of rows. The practical limit is refresh time. Once a full refresh exceeds your tolerable staleness window, you have a problem. A 30-minute refresh is fine for a daily mart and a disaster for an hourly one. The options at that point: rewrite the SELECT to be cheaper, switch to a regular table with incremental dbt logic, or move to a tool with native incremental refresh like Snowflake dynamic tables.