How to calculate exponential smoothing in SQL
Contents:
Why exponential smoothing matters
Your product manager pings you on a Thursday evening: "Daily active users were flat for a month, then jumped on Tuesday — what should I expect tomorrow?" You have one number to bring to the standup on Friday, and the answer needs to look defensible if a director asks you to explain the method on the spot. Exponential smoothing is the cheapest, most respectable baseline for that situation. It is one line of arithmetic, it weighs recent days more than ancient days, and it runs in pure SQL on the warehouse table you already have.
This question shows up in analytics rounds at Stripe, DoorDash, Uber, Netflix, Airbnb, Snowflake, and Databricks. The framing is almost always the same: here is a daily series with no obvious trend or seasonality, give me a forecast for tomorrow in SQL, then explain when the method breaks. Strong candidates write the recursive CTE in five minutes, name a sensible alpha, and call out the failure modes before the interviewer asks. Weak candidates either reach for Python because they think SQL cannot do recursion, or they pick a moving average and never explain why it lags.
Postgres, Snowflake, and Redshift all support WITH RECURSIVE, which is the natural way to express the EWMA recurrence in SQL. This post walks the formula end to end, builds the recursive query, tunes alpha by mean absolute error, and collects the traps that get candidates dinged.
The formula in one line
Exponential smoothing at time t is a weighted average of the new observation and the previous smoothed value:
S_t = alpha * Y_t + (1 - alpha) * S_{t-1}
S_1 = Y_1The smoothing weight alpha lives in the open interval from zero to one. A large alpha — say 0.8 — leans heavily on the most recent observation, which means the smoothed series reacts fast to new data and tracks noise. A small alpha — say 0.1 — leans on history, which means the series moves slowly and shrugs off single-day spikes. The one-step-ahead forecast is the current smoothed value: Y_hat_{t+1} = S_t.
Expanding the recurrence shows the weights explicitly: S_t = alpha * Y_t + alpha * (1 - alpha) * Y_{t-1} + alpha * (1 - alpha)^2 * Y_{t-2} + ... The weights decay geometrically, which is where the name comes from. Compared to a moving average, exponential smoothing never throws an old point away, it just gradually forgets it.
Recursive CTE in Postgres
The natural way to express the recurrence in SQL is WITH RECURSIVE. The anchor row sets S_1 = Y_1, and the recursive step joins the next row in the series and applies the smoothing formula. Order the input by date and assign a ROW_NUMBER so the recursive join has a stable predecessor to find.
WITH RECURSIVE base AS (
SELECT
event_date,
value,
ROW_NUMBER() OVER (ORDER BY event_date) AS rn
FROM daily_metrics
WHERE metric_name = 'dau'
),
smoothed AS (
SELECT
event_date,
value,
rn,
value::NUMERIC AS s_t
FROM base
WHERE rn = 1
UNION ALL
SELECT
b.event_date,
b.value,
b.rn,
0.3 * b.value + 0.7 * s.s_t AS s_t
FROM base b
JOIN smoothed s ON b.rn = s.rn + 1
)
SELECT event_date, value, s_t AS smoothed
FROM smoothed
ORDER BY event_date;The constant 0.3 here is the smoothing weight alpha, with the complement 0.7 applied to the previous smoothed value. That is a reasonable starting point for a daily product metric — fast enough to react to a real shift within a week, slow enough to ignore single-day noise. The next section walks through how to pick alpha properly instead of guessing.
On Snowflake and Redshift the same query works with minor adjustments to numeric casting. BigQuery does not support recursive CTEs in the same form, so on BigQuery you would translate the recurrence into a scripting block, or pull the series into Python and use pandas.Series.ewm.
Tuning alpha by MAE
Picking alpha by feel is a bad habit. The defensible choice is to grid-search alpha against a holdout slice of the series, compute the mean absolute error of the one-step-ahead forecast, and take the alpha with the smallest error. In SQL the cleanest pattern is a generate_series of candidate alphas, joined to the smoothed series, with the error aggregated per alpha.
WITH RECURSIVE base AS (
SELECT
event_date,
value,
ROW_NUMBER() OVER (ORDER BY event_date) AS rn
FROM daily_metrics
WHERE metric_name = 'dau'
),
alphas AS (
SELECT generate_series(0.05, 0.95, 0.05)::NUMERIC AS alpha
),
seeds AS (
SELECT a.alpha, b.event_date, b.value, b.rn, b.value::NUMERIC AS s_t
FROM alphas a
JOIN base b ON b.rn = 1
),
smoothed AS (
SELECT * FROM seeds
UNION ALL
SELECT
s.alpha,
b.event_date,
b.value,
b.rn,
s.alpha * b.value + (1 - s.alpha) * s.s_t AS s_t
FROM base b
JOIN smoothed s ON b.rn = s.rn + 1
),
forecasts AS (
SELECT
s.alpha,
b.value AS actual,
LAG(s.s_t) OVER (PARTITION BY s.alpha ORDER BY s.rn) AS prediction
FROM smoothed s
JOIN base b ON b.rn = s.rn
)
SELECT alpha, AVG(ABS(actual - prediction)) AS mae
FROM forecasts
WHERE prediction IS NOT NULL
GROUP BY alpha
ORDER BY mae ASC
LIMIT 1;The query computes the one-step-ahead prediction for every candidate alpha by carrying the smoothed value forward with LAG, compares it to the actual observation, and picks the alpha with the smallest mean absolute error. In practice you would also split the series into train and test windows by date so the alpha is chosen on out-of-sample error. On long series the same logic is faster in Python with scipy.optimize.minimize_scalar, but the SQL version is enough for an interview.
Forecasting one step ahead
Exponential smoothing produces a flat forecast at any horizon: the predicted value for tomorrow, the day after, and the day after that is all the same number, namely the last smoothed state S_T. That is the honest property of the method — it has no trend, no seasonality, no momentum. If you need to write the forecast back to a table for the next seven days, the pattern is a small generate_series joined to the last smoothed value.
WITH last_smoothed AS (
SELECT s_t AS last_state
FROM smoothed
ORDER BY event_date DESC
LIMIT 1
),
forecast AS (
SELECT
CURRENT_DATE + i AS forecast_date,
last_state AS prediction
FROM last_smoothed
CROSS JOIN generate_series(1, 7) AS g(i)
)
SELECT * FROM forecast;If the series has an obvious upward or downward trend, a flat forecast will consistently under or overshoot. The fix is Holt's linear method, which adds a second equation for the trend term and produces a sloped forecast. If there is weekly or yearly seasonality, the right baseline is Holt-Winters. Both extensions follow the same recursive CTE shape — the recurrence just has more state.
Common pitfalls
The first trap is picking alpha too high on a noisy series. With alpha at 0.9 the smoothed series is barely smoother than the raw data — most of the weight goes to today's observation, so a one-day spike pulls the forecast with it and a one-day dip pulls it back. The fix is to validate alpha against a holdout slice as shown above, or to start at 0.2 to 0.3 on consumer metrics where you expect daily noise but want the forecast to react to real shifts within a week.
The second trap is the opposite: picking alpha too low on a series that just had a real shift. With alpha at 0.05 the smoothed series carries roughly 20 days of history before it forgets the past, so after a release, a price change, or a seasonal kickoff the forecast lags reality for weeks. If you suspect a structural break has happened, do not just reach for a smaller alpha — detect the changepoint, restart the smoothing from there, and only then forecast forward.
The third trap is initialization. Setting S_0 = 0 is convenient and wrong: it makes the first dozen smoothed values look depressed because every step is dragged toward zero. The standard initialization is S_1 = Y_1 — anchor the series to its own first observation. On very short series some practitioners initialize with the mean of the first three or five points to reduce sensitivity to one noisy starting value, which is fine as long as you document the choice.
The fourth trap is applying plain exponential smoothing to a series with trend or seasonality. The method has no machinery for either component, so the residuals will show a clear pattern and the forecast will be visibly biased. The fix is not to crank alpha — it is to switch to Holt's linear method for trend, Holt-Winters for trend plus seasonality, or to detrend the series first and smooth the residual. An autocorrelation plot at lag seven will tell you whether you are in this case.
The fifth trap is running a recursive CTE on a long series. Postgres WITH RECURSIVE is single-threaded and accumulates the entire intermediate set in memory, so on a million-row daily series it becomes slow. The right answer is to materialize smoothed values incrementally, or pull the series into Python and use pandas.Series.ewm(alpha=0.3).mean().
Optimization tips
The first optimization is to materialize smoothed values incrementally. Once you have a smoothed value for day T minus one, the smoothed value for day T is a single multiplication and addition — there is no need to re-run the recursion over the full history every night. A daily job that reads yesterday's smoothed value, today's observation, and writes today's smoothed value will run in milliseconds and scale to any horizon. Store the smoothed values in a thin table keyed by metric and date.
The second optimization is partitioning by metric. If you are smoothing many metrics in the same recursive CTE, partition the row numbering by metric name so each series gets its own anchor and recursion. That keeps the recursive join cheap because the join condition includes the metric key, which gives Postgres a small set to scan for each step.
The third optimization is to drop precision where you do not need it. Smoothed values do not need DOUBLE PRECISION if the underlying metric is a count — NUMERIC(18, 4) is plenty and stores faster.
Related reading
- SQL window functions interview questions
- How to calculate autocorrelation in SQL
- How to detrend a time series in SQL
- How to detect changepoints in SQL
- How to detect anomalies in SQL
If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.
FAQ
What value of alpha should I start with?
For a stable daily metric — DAU on a mature product, weekly revenue for an established business — start at 0.2 to 0.3. For a more volatile series — early-stage signups, marketing-driven traffic, holiday-sensitive orders — start at 0.4 to 0.5. Always validate the choice against a holdout slice using mean absolute error or mean absolute percentage error; the best alpha is the one that minimizes out-of-sample error, not the one that produces the prettiest in-sample chart.
How does exponential smoothing compare to a moving average?
A moving average gives every point inside the window equal weight and every point outside zero weight, which means a hard cutoff at the window edge. Exponential smoothing gives the most recent point the most weight and decays geometrically backward, so recent observations count more but ancient ones never get fully discarded. On business data exponential smoothing usually reacts faster to real shifts while staying robust to single-day noise.
Can I compute exponential smoothing in Postgres without a recursive CTE?
Not in a clean way. The formula is inherently recursive — each smoothed value depends on the previous one. The practical alternatives are WITH RECURSIVE for short series, materializing smoothed values incrementally for long series, or computing the smoothing in Python and writing the result back to the warehouse.
Does exponential smoothing work on monthly data?
It does, but it is rarely the best choice on monthly cadence. Monthly series tend to be short — three to five years gives you 36 to 60 points — and most of that signal is trend and seasonality, which plain exponential smoothing cannot model. On monthly data you usually want Holt's linear method for trend, Holt-Winters for yearly seasonality, or a SARIMA fit in Python.
Can exponential smoothing produce a forecast for the next year?
In a strict sense yes, but the forecast will be a flat line at the last smoothed value, which is almost never useful at a one-year horizon. Any trend, seasonality, or known event in the next twelve months will be missed. For a one-year horizon the minimum reasonable baseline is Holt-Winters; for a metric tied to product launches or marketing pushes, you want a model that can take those events as inputs.
How does exponential smoothing compare to ARIMA?
Simple exponential smoothing is exactly equivalent to an ARIMA(0,1,1) model, and Holt-Winters is closely related to seasonal ARIMA. The practical difference is that exponential smoothing is faster to fit, easier to explain, and more forgiving on short series. ARIMA gives you tighter confidence intervals when you have enough clean data, which is why production forecasting stacks at Netflix and Uber use both.