Changepoint detection in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why changepoints matter

An outlier is a single weird point. A changepoint is the moment the regime shifts — mean or variance moves to a new level and stays there. A release degrades p95 latency, a cheap acquisition channel turns on, a pricing change ships, a payment provider silently routes traffic differently. The number you stare at on the dashboard does not just spike, it lives at a new level for the next two weeks.

Outlier detection answers "is this point weird?". Changepoint detection answers "when exactly did things break?". That second question is the one your engineering manager actually needs at 9pm on a Friday, because it lines up with deploys, feature flags, and config rollouts. Pointing at a date is what unblocks the root-cause conversation.

In analytics interviews at Stripe, DoorDash, and Snowflake, this comes up framed as "metric X dropped, find the day". The candidates who reach for CUSUM and a pre/post t-test land the loop. The candidates who eyeball a chart and say "around April 15" do not, because they cannot defend the boundary.

CUSUM in SQL

CUSUM (cumulative sum) is the simplest, most readable method. The idea: subtract a baseline (the overall mean) from each observation, then take the running sum. If the regime is stable, the cumulative sum drifts around zero. If the mean shifted up, the sum starts climbing monotonically. The inflection point — where the slope of the cumulative sum changes — is your changepoint candidate.

WITH base AS (
    SELECT
        event_date,
        value,
        AVG(value) OVER () AS overall_mean
    FROM daily_metric
    WHERE event_date >= CURRENT_DATE - INTERVAL '60 days'
),
cusum AS (
    SELECT
        event_date,
        value,
        SUM(value - overall_mean) OVER (ORDER BY event_date) AS cusum_score
    FROM base
)
SELECT
    event_date,
    value,
    cusum_score
FROM cusum
ORDER BY event_date;

cusum_score is what you plot. A stable series gives a flat, near-zero line. A regime that shifted up gives a V-shaped curve with the minimum at the changepoint — before the change, deviations are mostly negative; after, mostly positive. To pick the date programmatically, locate the row where the absolute value of cusum_score is maximized, or where the running minimum and running maximum diverge fastest.

A subtle property: CUSUM is sensitive to the baseline. If you compute overall_mean over the full 60-day window and the regime change happened on day 20, your baseline is contaminated by both regimes. The cleaner pattern is to use a fixed reference window — the first 14 days, or a known "good" period before the suspected event — and only let later observations contribute to the cumulative sum.

Mean shift on rolling windows

CUSUM is great for one global changepoint. When you suspect the shift is somewhere in the middle of the series but do not know where, scan two rolling windows: one looking backward, one looking forward. The candidate changepoint is wherever the difference between them is largest.

WITH rolling AS (
    SELECT
        event_date,
        value,
        AVG(value) OVER (
            ORDER BY event_date
            ROWS BETWEEN 14 PRECEDING AND 1 PRECEDING
        ) AS pre_mean,
        AVG(value) OVER (
            ORDER BY event_date
            ROWS BETWEEN CURRENT ROW AND 14 FOLLOWING
        ) AS post_mean
    FROM daily_metric
    WHERE event_date >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT
    event_date,
    value,
    post_mean - pre_mean AS mean_shift,
    ABS(post_mean - pre_mean) AS shift_magnitude
FROM rolling
WHERE event_date BETWEEN CURRENT_DATE - INTERVAL '60 days'
                     AND CURRENT_DATE - INTERVAL '14 days'
ORDER BY shift_magnitude DESC
LIMIT 10;

Two design choices to defend in an interview. First, why exclude the current row from the pre_mean window? Because if the changepoint is exactly on day D, you do not want D's value polluting the "before" mean. Second, why filter the outer query to BETWEEN -60 and -14 days? Because the first and last 14 rows of the series do not have full windows on both sides, so their mean_shift is biased by partial-window noise. Ranking those rows in the top 10 would produce false positives at the boundaries.

The output is a ranked shortlist of candidate dates. Eyeball the top three, then confirm one with a statistical test.

Pre/post t-test

A visual shift in the CUSUM plot is a hypothesis, not a finding. To say "the regime changed on April 15" with a straight face, run a two-sample t-test on the 14 days before vs the 14 days after.

WITH candidate AS (
    SELECT DATE '2026-04-15' AS changepoint
),
pre AS (
    SELECT value
    FROM daily_metric, candidate
    WHERE event_date BETWEEN changepoint - INTERVAL '14 days'
                         AND changepoint - INTERVAL '1 day'
),
post AS (
    SELECT value
    FROM daily_metric, candidate
    WHERE event_date BETWEEN changepoint
                         AND changepoint + INTERVAL '14 days'
),
stats AS (
    SELECT
        (SELECT AVG(value)      FROM pre)  AS mean_pre,
        (SELECT AVG(value)      FROM post) AS mean_post,
        (SELECT VAR_SAMP(value) FROM pre)  AS var_pre,
        (SELECT VAR_SAMP(value) FROM post) AS var_post,
        (SELECT COUNT(*)        FROM pre)  AS n_pre,
        (SELECT COUNT(*)        FROM post) AS n_post
)
SELECT
    mean_post - mean_pre AS diff,
    (mean_post - mean_pre)
        / NULLIF(SQRT(var_pre / n_pre + var_post / n_post), 0) AS t_stat,
    CASE
        WHEN ABS(
            (mean_post - mean_pre)
                / NULLIF(SQRT(var_pre / n_pre + var_post / n_post), 0)
        ) > 2 THEN 'significant shift'
        ELSE 'NOT significant'
    END AS verdict
FROM stats;

A |t_stat| above 2 corresponds to roughly a 95% confidence level with these sample sizes — close enough to defend the call. Above 3 is "the regime definitely changed". Below 2 means the shift you saw on the chart is plausibly noise from a 14-day window. Use the Welch form (unequal variances) shown above rather than the pooled-variance version, because pre and post regimes usually have different volatility.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Multiple changepoints

A series can have more than one regime shift in 90 days. The cheap heuristic: if the CUSUM curve has several local maxima and minima of comparable amplitude, treat each one as a candidate and run the pre/post t-test on each independently. Walk left to right, accept the first significant changepoint, recompute the baseline using only the data after it, repeat. This is essentially binary segmentation, and it is what the Python ruptures library does under the hood with Binseg.

For a rigorous probability over the entire segmentation — "what is P(changepoint = April 15) given the data?" — you want Bayesian online changepoint detection, which is an MCMC problem and does not belong in SQL. Run that in a notebook on the dates SQL already flagged.

Common pitfalls

The first mistake is running CUSUM on trending data without detrending. If your series grows linearly because the product is growing, the cumulative sum of value - overall_mean will climb forever and every late date looks like a changepoint. Fit a linear trend first (a simple regression on the date column), subtract it, then CUSUM the residuals. The cumulative sum should now be a zero-centered random walk under the null.

The second pitfall is pre/post windows that are too short. Five-day windows give you variance estimates with 4 degrees of freedom — the t-statistic is unstable and you will flag noise as changepoints all week. Fourteen days is the floor for daily metrics. For hourly metrics use 48 to 72 hours per side. The narrower the window, the higher the threshold you should require on |t_stat| to compensate.

The third pitfall is skipping the statistical test entirely. CUSUM and mean-shift plots are excellent at showing you something interesting, but they are not significance tests. Every series has a "biggest CUSUM excursion" even if nothing actually changed. Without the t-test you are pattern-matching on noise. The fix is to always close the loop: the chart tells you the candidate date, the t-test tells you whether to ship the finding to the engineering team.

The fourth pitfall is using one threshold for every metric. Revenue and conversion rate have very different signal-to-noise profiles. For volatile metrics like daily revenue, require |t_stat| > 3 before opening a ticket. For low-variance metrics like API latency on a stable service, |t_stat| > 2 is usually sufficient. Calibrate per metric using historical regimes you know are stable.

The fifth pitfall is ignoring seasonality. Weekly cycles in B2B traffic, summer dips in retail, end-of-month spikes in finance dashboards — all of these create apparent changepoints if you do not deseasonalize first. Subtract a 7-day moving average (or a same-day-last-week baseline) before running CUSUM, or detect changepoints on the residuals from an STL decomposition.

If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.

FAQ

CUSUM or mean shift — which should I reach for first?

Reach for CUSUM first when you want a global picture of how the regime evolved across the whole window — it gives you a single readable curve. Reach for mean shift when you already suspect "somewhere in the last 60 days something happened" and want a ranked shortlist of candidate dates. In practice the workflow is CUSUM to see the shape, mean shift to nominate dates, t-test to confirm one.

What is the right pre/post window length?

For a daily metric, 14 days per side is the working default. Shorter than 7 days and your variance estimates are too noisy to trust the t-statistic. Longer than 28 days and you lose temporal resolution — a real changepoint will get diluted by two weeks of pre-change data living inside your "post" window. For hourly metrics use 48 to 72 hours per side; for weekly metrics use 6 to 8 weeks.

Bayesian changepoint detection vs CUSUM — when do I need BCP?

Bayesian online changepoint detection (BCP) gives you a probability distribution over changepoint locations, which is what you want when you are building an automated alerting system that needs to quantify uncertainty. CUSUM gives you a single best guess plus a t-test for significance, which is what you want for ad-hoc investigation. For interview answers and most internal dashboards, CUSUM plus t-test is enough. Reach for BCP when the stakes justify the additional setup — fraud monitoring, infra SLO breaches, regulated-metric reporting.

How is a changepoint different from an outlier?

An outlier is a single observation that is far from the local distribution but does not change the regime — the next day the series returns to where it was. A changepoint is a transition: every observation after the changepoint date is drawn from a different distribution than every observation before. The diagnostic is simple: if you delete the suspicious day and refit, does the rest of the series look normal (outlier) or does the post-period still look shifted (changepoint)?

Can I do this entirely in SQL or do I need Python?

CUSUM, rolling mean-shift detection, and the pre/post t-test all run cleanly in SQL using window functions, which is why most analytics teams ship the first version of changepoint monitoring as a scheduled query in Snowflake or BigQuery. The harder methods — Bayesian online changepoint detection, kernel-based changepoint detection, multivariate changepoint models — need Python and a library like ruptures or bayesian_changepoint_detection. The practical split: SQL for daily monitoring dashboards, Python for one-off forensic analysis when the SQL flags something the team wants to dig into.

How do I avoid being woken up at 3am by false changepoint alerts?

Three knobs. First, require statistical significance on the t-test before paging — never alert on CUSUM amplitude alone. Second, require persistence: the post-period mean must still be shifted seven days later, not just on the candidate day. Third, deseasonalize and detrend before running the detector, because most "changepoints" alerted by naive systems are weekly seasonality the model never saw. Apply all three and the alert volume drops by an order of magnitude without losing the real regime shifts.