How to monitor data volume anomaly in SQL
Contents:
Why volume monitoring matters
A production events table normally lands one hundred thousand rows a day. This morning it landed twenty thousand. The pipeline did not fail loudly, the dashboard still rendered, the on-call channel was quiet — but a downstream model is now scoring users on a quarter of the signal it expected, and revenue attribution for yesterday is silently wrong. Volume anomaly detection is the cheapest possible monitor for catching this class of bug, and across data engineering teams at Stripe, Airbnb, Snowflake, Databricks, and Netflix it is usually the first check anyone bolts onto a new pipeline.
The same question gets asked verbatim in DE on-sites at Uber, DoorDash, Vercel, and Linear. The phrasing varies — "alert when the daily row count looks off", "we just shipped a new ingestion path, how would you guard it" — and the answer is the same shape every time. Compute a daily count, build a rolling baseline that excludes today, score today with a z-score, and stratify by day of week so weekends do not page you. Strong candidates name those four steps before touching SQL; weak candidates jump straight to SELECT COUNT(*) and ship a detector that fires every Saturday at 3 a.m.
The whole pattern fits in roughly thirty lines and runs on any warehouse — Postgres, Snowflake, BigQuery, Redshift, Databricks SQL. This post walks the recipe end to end, then collects the traps real detectors hit in their first month.
Daily volume baseline
Start with the simplest possible query: one row per day, with the count of records inserted on that day. Pull sixty days so you have enough signal for a 28-day rolling window plus a month of context to eyeball on a chart.
SELECT
DATE_TRUNC('day', created_at)::DATE AS day,
COUNT(*) AS rows_inserted
FROM events
WHERE created_at >= CURRENT_DATE - INTERVAL '60 days'
GROUP BY DATE_TRUNC('day', created_at)
ORDER BY day;That single column is already a usable dashboard tile — paste it into a line chart and an outage shows up as a visible cliff. The reason you keep going past the eyeball is scale: with one table you stare at the chart every morning; with three hundred tables you need an alert. The next two sections are the path from a chart to an alert that does not lie.
Rolling baseline
A fixed 60-day mean works for the first month of a stable pipeline and quietly stops working the moment the product grows. A healthy 10 percent week-over-week increase in events trips the static detector every Monday for the same reason a tide trips a flood gauge. Swap the fixed baseline for a rolling one — last 28 days, excluding today — and the detector tracks the trend instead of fighting it.
WITH daily AS (
SELECT
DATE_TRUNC('day', created_at)::DATE AS day,
COUNT(*) AS ROWS
FROM events
WHERE created_at >= CURRENT_DATE - INTERVAL '60 days'
GROUP BY DATE_TRUNC('day', created_at)
),
rolling AS (
SELECT
day,
ROWS,
AVG(ROWS) OVER w AS rolling_avg,
STDDEV_SAMP(ROWS) OVER w AS rolling_std
FROM daily
WINDOW w AS (
ORDER BY day
ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
)
)
SELECT *
FROM rolling
WHERE day >= CURRENT_DATE - INTERVAL '7 days'
ORDER BY day;The frame ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING is the load-bearing detail. The lower bound says "look at the previous 28 days"; the upper bound says "do not include the row you are scoring." Drop the 1 PRECEDING and a true anomaly inflates its own baseline, hiding in plain sight — the single most common bug interviewers listen for. Twenty-eight days covers four full weeks and absorbs day-to-day noise without lagging real growth.
Z-score alert
The baseline is a reference point, not a detector. To turn it into an alert compute how many standard deviations today is away from the rolling mean, then threshold. The convention is roughly the same across teams: above three standard deviations is a real alert, between two and three is a warning, anything else is normal.
WITH daily AS (
SELECT
DATE_TRUNC('day', created_at)::DATE AS day,
COUNT(*) AS ROWS
FROM events
WHERE created_at >= CURRENT_DATE - INTERVAL '60 days'
GROUP BY DATE_TRUNC('day', created_at)
),
rolling AS (
SELECT
day,
ROWS,
AVG(ROWS) OVER w AS rolling_avg,
STDDEV_SAMP(ROWS) OVER w AS rolling_std
FROM daily
WINDOW w AS (ORDER BY day ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING)
)
SELECT
day,
ROWS,
rolling_avg,
(ROWS - rolling_avg)::NUMERIC / NULLIF(rolling_std, 0) AS z_score,
CASE
WHEN ABS((ROWS - rolling_avg) / NULLIF(rolling_std, 0)) > 3 THEN 'CRITICAL'
WHEN ABS((ROWS - rolling_avg) / NULLIF(rolling_std, 0)) > 2 THEN 'WARNING'
ELSE 'ok'
END AS verdict
FROM rolling
WHERE day = CURRENT_DATE - INTERVAL '1 day';Under a normal distribution |z| > 3 corresponds to about 0.27 percent of days — roughly one false positive per metric per year. With three hundred tables on a detector that is still close to one false alarm per day from pure noise, so most teams either raise the threshold to four on noisy tables or, better, calibrate per table from a backtest of the last twelve months. The NULLIF around the standard deviation is not cosmetic: a fresh table with low variance produces a zero std on day one and the detector will divide by zero unless you guard it.
The other thing worth pointing out in an interview is the sign of the deviation. A drop usually means an upstream ETL broke; a spike usually means duplicates or a runaway loop — different runbooks. Split the verdict into CRITICAL_DROP and CRITICAL_SPIKE by replacing ABS() with two CASE branches on the signed z-score.
Day-of-week stratification
A flat z-score detector fires every weekend on almost every consumer product. Saturday DAU is routinely twenty to forty percent below the Tuesday average, so a baseline that mixes weekday and weekend days into a single mean scores every Saturday as a real anomaly. The fix is stratification: compute a separate baseline per day of week, then score Saturday against the last eight Saturdays instead of the last 28 mixed days.
WITH daily AS (
SELECT
DATE_TRUNC('day', created_at)::DATE AS day,
EXTRACT(DOW FROM created_at)::INT AS dow,
COUNT(*) AS ROWS
FROM events
GROUP BY DATE_TRUNC('day', created_at), EXTRACT(DOW FROM created_at)
),
baseline_by_dow AS (
SELECT
dow,
AVG(ROWS) AS dow_avg,
STDDEV_SAMP(ROWS) AS dow_std
FROM daily
WHERE day BETWEEN CURRENT_DATE - INTERVAL '56 days' AND CURRENT_DATE - INTERVAL '1 day'
GROUP BY dow
)
SELECT
d.day,
d.dow,
d.ROWS,
b.dow_avg,
(d.ROWS - b.dow_avg) / NULLIF(b.dow_std, 0) AS z_dow
FROM daily d
JOIN baseline_by_dow b USING (dow)
WHERE d.day = CURRENT_DATE - INTERVAL '1 day';Fifty-six days gives you eight observations per weekday — the minimum for a usable standard deviation. If your table has a strong intra-week pattern but a clean trend, layer this on top of the rolling baseline by including day in the partition and detrending first. For pipelines without weekly seasonality — backend system metrics, B2B internal events — the unstratified version is fine and one fewer moving part.
Common pitfalls
When teams roll their own volume anomaly detector the most common bug is the one called out above: the rolling window includes today, so a true outage inflates its own baseline and the z-score lands close to zero. The fix is the 1 PRECEDING upper bound on the window frame, and it is worth a unit test that injects a fake drop and asserts the detector catches it. Interviewers ask "what would happen if you wrote ROWS BETWEEN 28 PRECEDING AND CURRENT ROW instead?" as a deliberate trap.
A close second is ignoring day-of-week seasonality. Almost every consumer product has a thirty percent gap between weekday and weekend volume, so a flat detector pages the on-call engineer every Saturday. After two weekends of false alarms the team mutes the alert, and three months later a real Saturday outage goes undetected. Stratification by EXTRACT(DOW FROM date) is mandatory for any user-facing metric.
A third trap is z-scoring a skewed distribution. DAU and event volumes are often log-normal, so the right tail is much fatter than a normal distribution predicts and a z of three corresponds to far more than the textbook 0.27 percent of days. Either apply a log transform before the z-score, or switch to a robust alternative — median plus median absolute deviation, or a percentile threshold like value > p99(baseline).
A fourth pitfall is using one threshold for every table. A volatile clickstream table with new sources flowing in and out every week needs a wider threshold — |z| > 4 is sensible — than a stable orders table where |z| > 2 is enough. Calibrate per table on a backtest of the last twelve months and pick the threshold that gives no more than one alert per table per month.
A fifth pitfall is failing to handle holidays. New Year's Day, Black Friday, the Fourth of July are reliably different from a normal weekday, and a vanilla detector fires on every one. The cheap fix is a holidays-aware mute: maintain a small holidays table keyed on date and country, and silence alerts on those days. The better fix is to add holiday flags to the baseline so the detector compares this Black Friday against last year's.
Optimization tips
On a table with billions of rows the COUNT(*) over sixty days is the expensive part. The first lever is to partition or cluster on the event date column — daily partitions cut the scanned bytes for a 60-day window from the entire table to roughly two months of data, which on Snowflake and BigQuery turns minutes into seconds. On Postgres the same effect comes from declarative partitioning plus a date index; on Databricks you want Z-ordering on the event date.
The second lever is to maintain a daily roll-up table. The detector does not need raw events — it needs a single row per day with the count. Materialize that as a small events_daily_volume(day, rows) table refreshed by a once-a-day job, and the alert query becomes a thirty-row window function over a tiny table that runs in milliseconds. Every downstream detector — volume, freshness, completeness — can share the same roll-up.
The third lever is to keep STDDEV_SAMP off raw event tables. Standard deviation is non-additive, so the warehouse cannot reuse the partial aggregates it caches for SUM and COUNT. Running it on top of the daily roll-up is cheap; running it on raw events is not. The same goes for PERCENTILE_CONT if you use the percentile-based variant below.
Related reading
- How to detect anomalies in SQL
- How to calculate data freshness in SQL
- How to calculate data quality score in SQL
- How to calculate data completeness in SQL
- SQL window functions interview questions
If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.
FAQ
Z-score or percentile-based threshold — which should I use?
Default to z-score for metrics that are roughly normal — counts of independent events on a stable surface, signups per day, orders per hour on a mature product. Switch to a percentile-based threshold — "alert when today is below the fifth percentile of the trailing 28 days" — for skewed distributions like revenue per order or session duration, where the tail is fat enough that a z of three corresponds to many more days than the textbook number. The percentile method is also a safe default when you have not yet plotted the histogram.
How long should the baseline window be?
Twenty-eight days is the default: it covers four full weeks, absorbs day-to-day noise, and supports per-day-of-week stratification with eight observations per weekday. Shorter windows — seven or fourteen days — adapt faster to genuine growth but are noisier. Longer windows — sixty or ninety days — are quieter but lag real changes and keep paging you after a deliberate launch that doubled volume.
What if my data is growing on a linear trend?
A pure z-score detector against a flat baseline starts firing every day because the trend itself looks like an anomaly. Two fixes work. The cheap one is to detrend before scoring — fit a linear regression to the last 60 days and compute the z-score on the residuals. The better one is the rolling baseline shown above, which naturally tracks a smooth trend.
Should an alert on a drop look the same as an alert on a spike?
No, and most production runbooks treat them as different incidents. A drop usually means an upstream pipeline broke — an ingestion job failed, an API returned errors, a partner stopped sending data — and the playbook is to page the team that owns the source. A spike usually means duplicates or a runaway producer, and the playbook is to find what is over-emitting before storage costs explode. Split the verdict by sign and route the two cases to different rotations.
Row count versus other volume measures?
Row count is the right starting point — it requires no schema knowledge, works on every table, and catches outright pipeline failures. It can miss anomalies where the count is normal but the content is off: a partner sends the usual number of events but they are all zero-amount, or all from one user. The next layer monitors aggregate measures on the daily roll-up: SUM(amount), COUNT(DISTINCT user_id), AVG(amount). The full pattern is row count plus two or three business-meaningful sums, with the same z-score detector on each.