How to calculate LTV by cohort in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why LTV by cohort matters

It is Sunday night and your head of growth pings: "What is our LTV?" If you answer with a single number, you will be wrong by Monday lunch. A flat LTV averages across every customer through pricing changes, channel-mix shifts, and product pivots. The 2024 customers paid different prices, came from different channels, and used a different product than the 2026 customers. Collapsing them into one number throws away every signal a growth team needs.

LTV by cohort holds the acquisition window fixed and watches what each batch of customers is worth as they age. The output is a curve — "the average customer acquired in 2025-09 had paid 18 dollars by M1, 32 by M3, 58 by M6, 91 by M12". That lets you compare cohorts on equal footing, identify when a channel started decaying, and decide whether the Q4 spend pulled in customers worth keeping. Stripe, Notion, Linear, and DoorDash ship some version of this in weekly growth reviews, and it shows up in onsite SQL rounds at Snowflake, Databricks, and Airbnb. This post walks the recipe: schema, runnable Postgres SQL, the channel split, projection for young cohorts, pitfalls, and optimizations.

The output shape

Before writing any SQL, sketch the output. LTV by cohort should land as one row per (cohort_month, months_since_signup) pair with a cumulative_ltv column.

cohort_month | months_since_signup | cohort_size | cumulative_ltv
2025-09      | 0                   | 12450       | 18.40
2025-09      | 1                   | 12450       | 31.95
2025-09      | 2                   | 12450       | 44.10
2025-09      | 3                   | 12450       | 58.20
2025-09      | 12                  | 12450       | 91.05
2025-10      | 0                   | 14200       | 19.10
2025-10      | 1                   | 14200       | 33.60

Each cell is the running sum of revenue from cohort signup through months_since_signup, divided by the cohort's initial size. The denominator stays constant — that is what makes the metric a per-user lifetime number rather than a per-active-user number. A user who churns is still in the denominator; their contribution to the numerator simply stops growing. That is the whole point: it tells you what an acquired customer is worth, not what a still-active customer is worth.

Cumulative LTV in SQL

Assume two tables. users(user_id, created_at, utm_source) has one row per acquired user. orders(user_id, order_date, amount) has one row per successful charge. The recipe has four CTEs and one window function.

WITH cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', created_at)::DATE AS cohort_month
    FROM users
    WHERE created_at >= '2024-01-01'
),
user_revenue AS (
    SELECT
        c.cohort_month,
        c.user_id,
        EXTRACT(YEAR FROM AGE(o.order_date, c.cohort_month)) * 12
            + EXTRACT(MONTH FROM AGE(o.order_date, c.cohort_month)) AS months_since,
        o.amount
    FROM cohorts c
    JOIN orders o USING (user_id)
    WHERE o.order_date >= c.cohort_month
),
monthly_revenue AS (
    SELECT
        cohort_month,
        months_since,
        SUM(amount) AS revenue_at_month
    FROM user_revenue
    GROUP BY cohort_month, months_since
),
cohort_size AS (
    SELECT cohort_month, COUNT(*) AS users FROM cohorts GROUP BY cohort_month
)
SELECT
    mr.cohort_month,
    mr.months_since,
    cs.users,
    SUM(mr.revenue_at_month) OVER (
        PARTITION BY mr.cohort_month
        ORDER BY mr.months_since
    ) / NULLIF(cs.users, 0) AS cumulative_ltv
FROM monthly_revenue mr
JOIN cohort_size cs USING (cohort_month)
ORDER BY mr.cohort_month, mr.months_since;

The cohorts CTE assigns every user one immutable signup month — the cohort identity that must never drift. The user_revenue CTE joins to orders and computes months_since using the year-times-twelve plus month pattern, which is the safest cross-database arithmetic for "months between two dates". The WHERE o.order_date >= c.cohort_month filter drops pre-signup activity that would corrupt the running sum.

The monthly_revenue CTE collapses every order down to one row per (cohort_month, months_since) pair — the grain the window function needs. The cohort_size CTE counts users once per cohort; that denominator does not change as the cohort ages. The final SUM(...) OVER (PARTITION BY cohort_month ORDER BY months_since) is a classic running total: for each row, sum every revenue value from months_since = 0 through the current row. Divide by user count and you have cumulative LTV.

On BigQuery substitute DATE_DIFF(order_date, cohort_month, MONTH). On Snowflake and Redshift use DATEDIFF('month', cohort_month, order_date). Always unit-test the date arithmetic — a December-to-February gap returning two instead of fourteen is a silent rollover bug that survives until a cohort hits M12.

LTV split by channel

LTV by cohort is interesting on its own; LTV by cohort by channel is the report growth leadership actually wants. Add utm_source (or your acquisition-channel column) to the cohort CTE and propagate it.

WITH cohorts AS (
    SELECT
        user_id,
        utm_source,
        DATE_TRUNC('month', created_at)::DATE AS cohort_month
    FROM users
    WHERE created_at >= '2024-01-01'
),
user_revenue AS (
    SELECT
        c.cohort_month,
        c.utm_source,
        EXTRACT(YEAR FROM AGE(o.order_date, c.cohort_month)) * 12
            + EXTRACT(MONTH FROM AGE(o.order_date, c.cohort_month)) AS months_since,
        SUM(o.amount) AS revenue_at_month
    FROM cohorts c
    JOIN orders o USING (user_id)
    WHERE o.order_date >= c.cohort_month
    GROUP BY 1, 2, 3
),
cohort_size AS (
    SELECT cohort_month, utm_source, COUNT(*) AS users
    FROM cohorts
    GROUP BY 1, 2
)
SELECT
    ur.cohort_month,
    ur.utm_source,
    ur.months_since,
    cs.users,
    SUM(ur.revenue_at_month) OVER (
        PARTITION BY ur.cohort_month, ur.utm_source
        ORDER BY ur.months_since
    ) / NULLIF(cs.users, 0) AS cumulative_ltv
FROM user_revenue ur
JOIN cohort_size cs USING (cohort_month, utm_source)
WHERE ur.months_since IN (0, 3, 6, 12)
ORDER BY ur.cohort_month, ur.utm_source, ur.months_since;

The only structural change is widening the PARTITION BY clause to include utm_source, which keeps each channel's running total isolated. The WHERE months_since IN (0, 3, 6, 12) trims output to the four anchor points growth reviews care about — month zero, month three (early traction), month six (mid-cohort), and month twelve (full year). The channel with the largest M12 LTV is the one whose acquisition dollar compounds the hardest.

A common mistake here is comparing channels at M12 on a chart built from cohorts of mixed ages. Limit the comparison to cohorts old enough to have the data point — for an M12 chart, only include cohorts whose signup month is twelve or more months before today.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Projecting LTV for young cohorts

The growth team wants to know what the September cohort is worth. It is October — only one month of data exists. The honest answer is "ask me again in a year". The useful answer is a projection: assume the September cohort follows the same shape as older cohorts, anchor on the data you do have, and scale.

WITH old_cohorts AS (
    SELECT
        months_since,
        AVG(cumulative_ltv) AS mean_ltv
    FROM cohort_ltv_data
    WHERE cohort_month BETWEEN '2024-01-01' AND '2024-12-31'
      AND months_since BETWEEN 0 AND 12
    GROUP BY months_since
),
new_cohort AS (
    SELECT
        cumulative_ltv AS ltv_at_3
    FROM cohort_ltv_data
    WHERE cohort_month = '2026-02-01'
      AND months_since = 3
)
SELECT
    nc.ltv_at_3,
    (SELECT mean_ltv FROM old_cohorts WHERE months_since = 3) AS old_mean_at_3,
    (SELECT mean_ltv FROM old_cohorts WHERE months_since = 12) AS old_mean_at_12,
    nc.ltv_at_3
        / NULLIF((SELECT mean_ltv FROM old_cohorts WHERE months_since = 3), 0)
        * (SELECT mean_ltv FROM old_cohorts WHERE months_since = 12) AS projected_ltv_12
FROM new_cohort nc;

The logic is a ratio scale: compute the ratio of the new cohort's M3 LTV to the old cohorts' average M3 LTV, then multiply by the old cohorts' M12 LTV. If the new cohort tracks 20 percent above the old average at M3, the projection assumes that lift carries to M12. This breaks the moment something structural changes — a price increase, channel-mix shift, or a product launch that retains better. Report the projection with the assumption, never as a standalone number.

A refinement is to average the ratio across months_since IN (1, 2, 3) rather than M3 alone, smoothing over seasonality. For more sophisticated work, fit Pareto/NBD or BG/NBD — but a ratio scale is fine for an interview answer and a first production dashboard.

Common pitfalls

The first pitfall is a flat denominator. A naive analyst writes SUM(revenue) / COUNT(DISTINCT user_id) over a date range and labels it "LTV". That collapses every cohort effect into one average and erases the entire reason for the analysis. Lock the denominator at the cohort's initial size, computed once at the cohort CTE, and never recount downstream. The 2025-01 cohort had 12,450 users; that denominator does not move regardless of churn or refunds.

The second pitfall is letting cohort identity drift. Many LTV queries quietly reassign users to a later cohort by joining on DATE_TRUNC('month', order_date) instead of DATE_TRUNC('month', created_at). The result is a monthly revenue chart dressed up as cohort analysis. Tag every user with their cohort identity in the first CTE and treat that column as immutable. If you recompute the cohort downstream, the query is wrong.

The third pitfall is mixing mean and median without labeling which. A handful of whales can pull mean LTV three to five times above median. Mean is what finance wants for forecasts; median is what product wants for the typical user. Report both with clear labels. Add a PERCENTILE_CONT(0.5) WITHIN GROUP on per-user cumulative revenue alongside the cohort-size division.

The fourth pitfall is comparing immature cohorts to mature ones on one chart. The 2026-03 cohort cannot have M12 in May 2026. If you let it through, the chart shows 2026-03 dropping to zero at M12 while older cohorts show healthy numbers, and leadership reads it as "the March cohort is dead". Render unmeasurable cells as em-dashes and filter to cohorts old enough to have the data point.

The fifth pitfall is treating refunds and chargebacks as zero revenue. A January charge refunded in May should reduce the January cohort's cumulative LTV from May onward. Dropping refunds overstates LTV and finance audits catch it. Use SUM(CASE WHEN status = 'success' THEN amount WHEN status IN ('refunded', 'chargeback') THEN -amount ELSE 0 END) and reuse the rule across every revenue query so LTV, ARPU, NRR, and cohort revenue all agree.

Optimization tips

On a billion-row orders table the cohorts JOIN orders is the bottleneck. The first lever is partitioning on order_date — Snowflake and BigQuery partition natively; on Postgres declare monthly partitions explicitly. The query then reads only partitions after the earliest cohort_month, which on real workloads is the difference between eight seconds and six minutes.

The second lever is a pre-aggregated user_monthly_revenue(user_id, payment_month, revenue, txn_count) table refreshed nightly by a dbt model. The cohort LTV query becomes a small join between the cohort table and this pre-aggregate — two to three orders of magnitude smaller than the raw transactions. The same pre-aggregate powers ARPU, NRR, and cohort revenue.

The third lever is materializing the cohort-by-offset output as a view that refreshes overnight. If the dashboard reloads hundreds of times a day, that view turns a multi-CTE running-sum query into one sequential scan. Confirm with the owner that yesterday's close is acceptable before flipping the switch.

If you want to drill cohort SQL daily, NAILDD is launching with 500+ SQL problems on this exact pattern.

FAQ

How do I compare cohorts of different ages?

Only at a months_since that every cohort in the comparison has actually reached. A 2026-01 and a 2025-01 cohort are comparable at M3 because both have three months of data. They are not comparable at M12 because the 2026-01 cohort has not aged that far. Filter to a months_since value present in every cohort, or render missing cells as em-dashes. A partial cohort appearing as a zero next to a full one reads as "this cohort died", which is almost always wrong.

Should I report mean or median LTV?

Both, labeled clearly. The mean is what finance uses for revenue forecasts and LTV-to-CAC ratios. The median is what product uses to reason about a "typical" customer. In any business with whales, the mean is two to five times the median; presenting only the mean misleads on the modal experience. Add a PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY user_cumulative_revenue) column alongside the cohort-size division.

Do I include refunds in LTV?

In net LTV, yes — with a sign flip so a refund subtracts from cumulative revenue. In gross LTV, no. Pick one per dashboard, document it in the column header, never mix them. Finance usually demands net; growth prefers gross for channel comparisons because it is closer to acquisition spend.

What is the right LTV horizon?

For SaaS with annual contracts, twenty-four months is the floor — renewals are the whole story and you cannot see them before M12. For monthly SaaS, twelve months stabilizes the curve. For consumer apps with weekly engagement, six to twelve months is plenty. Pick the horizon based on the typical customer lifecycle, not based on how much data you happen to have.

Can I compare LTV/CAC at cohort level instead of company level?

Yes, and you should. Cohort-by-channel LTV divided by cohort-by-channel CAC is the cleanest growth ratio — it tells you which acquisition dollar in which month earned its keep. A blended company-level LTV/CAC averages winners and losers and hides channels burning money. Join this LTV result and the matching CAC query on (cohort_month, channel) with identical cohort definitions.

What if my cohort has a product change mid-life?

Define a new cohort at the change date. If you launched a new pricing tier in 2025-09 and the September cohort behaves differently from earlier cohorts, treat them as separate populations rather than averaging across the break. Averaging with a footnote makes the curve impossible to interpret. Split the cohort, label the regime change on the chart, and let the reader see both curves.