How to calculate Days Between Orders in SQL
Contents:
Why Days Between Orders matters
Days Between Orders — sometimes called inter-purchase interval or repeat cadence — is the average number of days between two consecutive purchases by the same customer. It looks small, but it sits at the center of three product surfaces every consumer team obsesses over. The first is retention timing: if the typical Stripe-billing DTC shopper repurchases every 34 days, you know when the lifecycle email should fire. The second is churn detection: a customer whose gap since last order is 1.7x the cohort median is much more likely to have lapsed than one at 0.8x. The third is category strategy: comparing grocery cadence against electronics cadence in the same marketplace tells merchandising where loyalty programs pay back.
Interview panels at DoorDash, Amazon, and Uber Eats love this question because it forces the candidate through three SQL ideas at once. You need a window function to pair each order with the previous one. You need a distribution view because the mean is almost always misleading on its own. And you need to reason about cohorts because cross-category averages collapse into noise the moment you mix a coffee subscription with a TV purchase. The query is short. The reasoning is what gets you the offer.
Skim a typical Series B dashboard and you will see a single number labeled "avg days between orders" with no breakdown. This post fixes that — SQL, cuts, and pitfalls a senior analyst at Airbnb or Notion expects.
The SQL formula
The core idea is simple. For each customer, order events are paired with the previous order using LAG, the gap is the difference between the two timestamps, and the global metric is the average and median of those gaps across all valid pairs. Single-purchase customers produce a NULL previous order and are excluded.
WITH order_gaps AS (
SELECT
user_id,
created_at,
LAG(created_at) OVER (
PARTITION BY user_id ORDER BY created_at
) AS prev_order_at
FROM orders
WHERE status = 'paid'
)
SELECT
AVG(EXTRACT(EPOCH FROM (created_at - prev_order_at)) / 86400)
AS avg_days_between,
PERCENTILE_CONT(0.5) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (created_at - prev_order_at)) / 86400
) AS median_days_between,
COUNT(*) AS pair_count
FROM order_gaps
WHERE prev_order_at IS NOT NULL;A few notes on the shape. The EXTRACT(EPOCH FROM ...) pattern is Postgres-flavored; on Snowflake or BigQuery you would use DATE_DIFF(created_at, prev_order_at, DAY) or TIMESTAMP_DIFF(... , DAY). The status = 'paid' filter is non-negotiable: pending and cancelled orders inflate the count of pairs and distort the gap distribution because cancellations cluster on the same hour. Reporting pair_count alongside the means is a habit worth keeping — it tells the reader how thick the sample is before they trust the number.
The median is more honest than the mean here for the same reason it is more honest in salary surveys. The distribution of inter-purchase intervals is right-skewed in nearly every consumer category. One customer who returns after fourteen months pulls the mean up by minutes per pair across the whole base. Show both numbers, lead with the median.
Distribution
The single most useful follow-up to the average is the bucketed histogram. It reveals whether the population behaves like one cohort with a single peak or like two cohorts stitched together — a pattern you will see constantly in marketplaces that mix subscription and one-off behaviors.
WITH gaps AS (
SELECT
EXTRACT(EPOCH FROM (
created_at - LAG(created_at) OVER (
PARTITION BY user_id ORDER BY created_at
)
)) / 86400 AS days
FROM orders
WHERE status = 'paid'
)
SELECT
CASE
WHEN days <= 7 THEN '0-7 days'
WHEN days <= 30 THEN '8-30 days'
WHEN days <= 90 THEN '31-90 days'
WHEN days <= 180 THEN '91-180 days'
ELSE '180+ days'
END AS bucket,
COUNT(*) AS pair_count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS share_pct
FROM gaps
WHERE days IS NOT NULL
GROUP BY 1
ORDER BY MIN(days);The shape of this histogram is more diagnostic than any single number. A unimodal distribution with most mass between 7 and 30 days is what you expect from a healthy grocery or beauty subscription. A bimodal shape — one spike at 0-7 days and another at 90+ — almost always means the merchant is serving two distinct jobs to be done, and the cohort should be split before any retention model is fit.
Segment breakdowns
The category cut is the one product managers will ask for first. Partition by both user and category, then compute the gap per segment so a customer who buys coffee weekly and a TV every three years is not blended into a single nonsense number.
WITH gaps AS (
SELECT
user_id,
category,
created_at,
LAG(created_at) OVER (
PARTITION BY user_id, category ORDER BY created_at
) AS prev_order_at
FROM orders
WHERE status = 'paid'
)
SELECT
category,
AVG(EXTRACT(EPOCH FROM (created_at - prev_order_at)) / 86400)
AS avg_days,
PERCENTILE_CONT(0.5) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (created_at - prev_order_at)) / 86400
) AS median_days,
COUNT(*) AS pair_count
FROM gaps
WHERE prev_order_at IS NOT NULL
GROUP BY category
ORDER BY median_days;Typical ranges, calibrated against public DoorDash and Stripe benchmarks: grocery and household consumables sit at 7-30 days, fashion and beauty at 30-90, home goods at 90-180, and electronics at 180-365. If your category numbers fall wildly outside those windows, the first thing to check is not the model — it is whether the category column is populated correctly for the orders in question.
The same shape generalizes to acquisition channel, plan tier, or device. Replace category with acquisition_channel and the query immediately tells you whether paid-search shoppers repurchase faster than organic ones, which is the single most useful input to a CAC payback model. For a deeper take on payback by acquisition source, see how to calculate CAC by channel in SQL.
Churn triggers from gap data
Once you have the cohort median, you can flag at-risk customers in a single query. The convention is to compare each customer's days-since-last-order against a multiple — typically 1.5x to 2x — of the cohort or category median.
WITH cohort_median AS (
SELECT
category,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY gap_days) AS median_gap
FROM (
SELECT
category,
EXTRACT(EPOCH FROM (created_at - LAG(created_at) OVER (
PARTITION BY user_id, category ORDER BY created_at
))) / 86400 AS gap_days
FROM orders
WHERE status = 'paid'
) g
WHERE gap_days IS NOT NULL
GROUP BY category
),
last_order AS (
SELECT
user_id,
category,
MAX(created_at) AS last_purchase_at
FROM orders
WHERE status = 'paid'
GROUP BY 1, 2
)
SELECT
lo.user_id,
lo.category,
EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - lo.last_purchase_at)) / 86400
AS days_since_last,
cm.median_gap,
CASE
WHEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - lo.last_purchase_at)) / 86400
> 1.5 * cm.median_gap THEN 'at_risk'
ELSE 'healthy'
END AS state
FROM last_order lo
JOIN cohort_median cm USING (category);The output of this query is the input to a retention email job. Anyone whose days_since_last has crossed the 1.5x threshold is a candidate for a winback campaign. Anyone past 2.5x is closer to a reactivation problem than a retention problem and should be routed to a different creative.
Common pitfalls
The most common mistake is leaving single-purchase customers in the denominator. Every first-time buyer produces a NULL for prev_order_at under LAG, and if you forget the WHERE prev_order_at IS NOT NULL filter they will be silently included as zero-day gaps when the column gets coalesced downstream. The fix is to explicitly filter NULL in every gap query and to report pair_count next to the mean so the reader can see how thick the sample is.
A second trap is reporting the mean alone in the face of a long tail. Real inter-purchase intervals follow a right-skewed distribution, often log-normal in subscription categories and Pareto-tailed in fashion. A customer who returns after eighteen months is mathematically real but practically irrelevant for a 90-day retention strategy, and that one outlier will shove the mean up by days. Lead with the median, show the bucketed histogram, and only quote the mean when the distribution is visibly symmetric.
A third trap is treating same-day orders as legitimate zero-day gaps. A shopper who splits an order in two — once because the coupon failed, once again when it worked — produces a gap of less than a minute that is structurally noise rather than behavior. The conventional fix is to cap the minimum gap at one day, or to collapse same-day orders into a single basket at the staging layer before the metric is computed.
A fourth trap is mixing categories in a single average. A customer who buys groceries weekly and electronics every five years produces two gap distributions with means an order of magnitude apart. Averaging across them returns a number that describes neither and is useless for any downstream decision. Always segment by category, plan tier, or channel before computing the mean across users.
A fifth trap is survivorship bias. If you restrict the cohort to "active users" — anyone with an order in the last 90 days — the average gap will look artificially short, because you have already excluded the long tail. The mean and median should be computed on the full base of repeat purchasers, and the active-only cut should be labeled separately on the dashboard.
Optimization tips
For large orders tables, the LAG over PARTITION BY user_id is the expensive operation, and it benefits from a composite index on (user_id, created_at) with status either covered or pre-filtered. On Postgres, adding INCLUDE (status) to the index pays off when the planner would otherwise need a heap fetch for every row. On Snowflake and BigQuery, sort or cluster the table on (user_id, created_at) and the window function will benefit from local data ordering without a global sort.
Materializing the order_gaps CTE as a daily incremental table is worth doing the moment the orders table crosses a hundred million rows. The gap between two orders is immutable once both orders are paid, so the incremental job only needs to recompute pairs for users whose latest order arrived after the watermark. A two-line dbt model is usually enough.
When the dashboard repeatedly slices by category, channel, and tier, pre-aggregate at the user-segment level into a small user_gap_stats table with the median and pair count per user. The downstream BI tool reads tens of thousands of rows instead of tens of millions, and the segment-level numbers come back in milliseconds rather than minutes.
Related reading
- How to calculate AOV in SQL
- How to calculate cart abandonment in SQL
- How to calculate churn in SQL
- How to calculate cohort retention in SQL
- How to calculate customer lifetime in SQL
If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.
FAQ
What counts as a "good" Days Between Orders value?
There is no universal good number — only one that is consistent with the category benchmark. Grocery and consumables generally sit between 7 and 21 days. Fashion and beauty cluster around 30 to 90 days. Home goods and electronics stretch to 180 days or longer. Compare against the median for your own category, not against an absolute target borrowed from a different vertical.
Should I use the mean or the median?
Lead with the median for any consumer dataset because the underlying distribution is almost always right-skewed by a long tail of dormant customers who return after a year or more. Quote the mean as a supplementary number when the audience is forecasting revenue, since revenue projections need the mean by construction, but always pair it with the median so the audience sees how far the tail is pulling the average up.
How do I use this metric for retention triggers?
Compute the cohort or category median, then compare each customer's days-since-last-purchase against a multiple of that median. The standard convention is 1.5x for a winback email and 2.5x for a deeper reactivation campaign. Tune the thresholds against your own conversion data after the first month — the multipliers that work for a weekly grocery cadence will be wrong for a quarterly fashion cadence.
Why does the distribution histogram matter more than the average?
The histogram surfaces bimodal patterns that a single mean or median will hide. A marketplace that serves both subscription buyers and one-off buyers will look stable on the average view and obviously broken on the histogram, because the two populations sit at different peaks. Any retention model fit on the blended average will fail because it is averaging across two jobs to be done.
Should I cap same-day orders at one day or filter them out?
Capping at one day is the safer default for analytics because it preserves the pair in the denominator while removing the noise contribution from same-cart splits. Filtering them out entirely is appropriate only when the upstream ETL has not yet collapsed split orders into baskets, and you want to avoid double-counting the same purchase event. Document whichever convention you pick directly on the dashboard so downstream consumers do not silently compare two definitions.
Segment by user or by category?
Both, for different decisions. The per-user median is the input to churn and loyalty models because retention is fundamentally a per-customer phenomenon. The per-category median is the input to merchandising and marketing strategy because acquisition spend and promotion timing are set at the category level. A single dashboard should expose both views side by side and never blend them.