May 17, 2026·12 min read

How to calculate active days in SQL

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Contents:

What active days really tells you
The base query and the schema we use
Distributions and L-metrics
Cohorts and monetization
The streak query
Common pitfalls
Optimization tips
Related reading
FAQ

What active days really tells you

Stickiness — DAU divided by MAU — gives you one number, and most of the time that number hides what is actually happening inside your product. Two apps can both post a DAU/MAU of twenty percent. In the first, the same five thousand people open the app every single day for twenty-eight days straight. In the second, twenty-five thousand different people open it once and never return. Stickiness rounds both of those products to the same scalar. Active days per user is the metric that pulls them apart.

Active days counts, for each user, the number of distinct calendar days they performed a meaningful action over some window — typically the trailing twenty-eight days. The distribution of that count is the engagement fingerprint of your product. A heavy left tail with most users at one or two days means you have an acquisition product. A heavy right tail at twenty-five through twenty-eight means you have a habit product. The mean alone tells you almost nothing; the shape is the story.

This metric was popularized by Facebook through the L-series — L28, L21, L14, L7, L1 — where Lx is the fraction of users active on at least x of the last twenty-eight days. Interviewers at Meta, Netflix, Uber, DoorDash, Snowflake, and Notion lean hard on this pattern because it is conceptually simple, operationally important, and easy to write wrong. This guide walks through the SQL recipes you need: the base count, the L-metrics, cohort decay, monetization splits, the streak query that nine out of ten candidates botch, and the mistakes that quietly inflate your numbers in production.

The base query and the schema we use

We will assume the canonical events table — events(user_id, event_at, event_name, platform) — that almost every analytics warehouse exposes. Pick one definition of "active" before you write a line of SQL. In an interview, asking which event qualifies is worth more than any clever query. Backend teams often treat any tracked event as active, while product teams want a meaningful action — a session of at least thirty seconds, or a specific business event like app_open, track_play, or order_placed. Pick one, document it, and use the same filter everywhere downstream.

The base active-days query is one line of real work. For each user, count distinct calendar dates inside the window:

SELECT
    user_id,
    COUNT(DISTINCT DATE(event_at)) AS active_days
FROM events
WHERE event_at >= '2026-04-01'
  AND event_at <  '2026-05-01'
GROUP BY user_id;

This is the building block. Every recipe below wraps a CTE around it and slices it differently. The DISTINCT is load-bearing — drop it and you count events instead of days, so a user with twelve events on one Tuesday returns twelve active days. For a rolling window, swap the predicate to event_at >= NOW() - INTERVAL '28 days'. Rolling windows are the right choice for L-metrics because they cancel weekday seasonality — twenty-eight is exactly four weeks, so every weekday appears the same number of times in the window.

Distributions and L-metrics

Before any aggregate, look at the distribution. The same user_days CTE — user_id and COUNT(DISTINCT DATE(event_at)) over the trailing twenty-eight days — feeds two queries. Group by active_days and COUNT(*) to get a histogram. Plot it. You will see one of three shapes: a steep decay from one day, a U-shape with a second peak at twenty-eight, or a balanced curve. A flat L1 with no right tail means activation works but retention does not. A heavy right tail with a thin middle means you have a power-user core and a churning fringe.

The L-metrics are a single aggregate over that CTE. Lx is the fraction of users with at least x active days:

WITH user_days AS (
    SELECT
        user_id,
        COUNT(DISTINCT DATE(event_at)) AS days_active_28
    FROM events
    WHERE event_at >= NOW() - INTERVAL '28 days'
    GROUP BY user_id
)
SELECT
    AVG(CASE WHEN days_active_28 >= 1  THEN 1.0 ELSE 0 END) AS l1_rate,
    AVG(CASE WHEN days_active_28 >= 7  THEN 1.0 ELSE 0 END) AS l7_rate,
    AVG(CASE WHEN days_active_28 >= 14 THEN 1.0 ELSE 0 END) AS l14_rate,
    AVG(CASE WHEN days_active_28 >= 21 THEN 1.0 ELSE 0 END) AS l21_rate,
    AVG(CASE WHEN days_active_28 >= 28 THEN 1.0 ELSE 0 END) AS l28_rate
FROM user_days;

For most consumer apps an L28 of five to fifteen percent is healthy — that is your power-user core. An L28 above twenty-five percent is exceptional and usually only happens for utility products people open every workday. If your L28 is zero, either tracking is broken or your activity definition is too narrow.

Cohorts and monetization

Active days only becomes useful when you compare cohorts. Build a CTE with each user's first-seen month via DATE_TRUNC('month', MIN(event_at)), join to per-day activity, and divide distinct active dates by distinct cohort users. One trap: cast to NUMERIC before dividing. In Postgres, BigQuery, and many engines, BIGINT / BIGINT is integer division and silently truncates — your "average active days" will be zero for almost every row, and someone on the team will ship that to a Looker tile before anyone notices.

The follow-up your VP of Growth actually wants is the join between active days and revenue:

WITH user_activity AS (
    SELECT user_id, COUNT(DISTINCT DATE(event_at)) AS active_days
    FROM events
    WHERE event_at >= NOW() - INTERVAL '28 days'
    GROUP BY user_id
),
user_revenue AS (
    SELECT user_id, SUM(total) AS revenue
    FROM orders
    WHERE status = 'paid'
      AND created_at >= NOW() - INTERVAL '28 days'
    GROUP BY user_id
)
SELECT
    ua.active_days,
    COUNT(*)                                          AS users,
    AVG(COALESCE(ur.revenue, 0))                      AS avg_revenue_per_user,
    AVG(CASE WHEN ur.revenue > 0 THEN 1.0 ELSE 0 END) AS paid_conversion
FROM user_activity ua
LEFT JOIN user_revenue ur ON ur.user_id = ua.user_id
GROUP BY ua.active_days
ORDER BY ua.active_days;

The expected pattern is monotonic — paid conversion and ARPU rise with active days. A kink in the middle often means a paywall placed wrong; a sharp jump between adjacent buckets often means a habit threshold the product team has been hunting for. Use LEFT JOIN and COALESCE so active non-payers stay in the denominator of paid conversion — drop them and conversion becomes "conversion conditional on paying," which is one hundred percent by construction.

Sharpen SQL for your next interview

500+ SQL problems with worked solutions — joins, window functions, CTEs.

Join the waitlist

The streak query

Streak counting is the canonical SQL interview question on this topic and the place where most candidates fall apart. The trick is the gap-and-island pattern. For each user, collect distinct active dates, assign a sequential row number ordered by date, and subtract the row number from the date. Every consecutive run of dates produces the same difference, so that difference becomes the streak group identifier.

WITH user_days AS (
    SELECT DISTINCT user_id, DATE(event_at) AS day
    FROM events
),
day_rn AS (
    SELECT
        user_id,
        day,
        day - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day)
              * INTERVAL '1 day' AS streak_group
    FROM user_days
),
streaks AS (
    SELECT user_id, streak_group, COUNT(*) AS streak_length
    FROM day_rn
    GROUP BY user_id, streak_group
)
SELECT user_id, MAX(streak_length) AS longest_streak
FROM streaks
GROUP BY user_id;

Walk through it: if a user is active on April 1, 2, 3, and 5, the row numbers ordered by day are 1, 2, 3, 4. Subtracting one, two, three, and four days from the respective dates gives March 31, March 31, March 31, and April 1. The first three days collapse to one group, the fourth becomes its own, and COUNT(*) returns three and one as streak lengths.

For a current streak anchored to today, restrict user_days to the last sixty days, rank descending from the most recent active day, and keep rows where the i-th row from the top is exactly i days behind today. The run stops the first time that equation breaks. For deeper window-function patterns, the SQL window functions interview questions post drills the full set.

Common pitfalls

The single most common mistake is dropping the DISTINCT inside the count. Writing COUNT(DATE(event_at)) instead of COUNT(DISTINCT DATE(event_at)) counts events, not days. A user with twelve events on one Tuesday returns twelve active days, the metric inflates by an order of magnitude, and the L28 rate jumps above one hundred percent. Always cast the date once and wrap it in DISTINCT.

The second trap is time zones. If event_at is stored in UTC and your users are mostly on the US west coast or in Asia, a session that starts at 11pm local can split across two UTC dates and inflate active days. The fix is DATE(event_at AT TIME ZONE 'America/Los_Angeles') — apply the time zone once before truncating. For globally distributed products, pick a company-standard reporting time zone and use it everywhere.

The third issue is unfiltered junk events. Bot traffic, internal QA accounts, and scraping spikes inflate active days and drag down your conversion-by-active-days curve. Filter to known-good event names and exclude internal user IDs. The sanity check: the top ten users by active days should be real users, not your own test fixtures.

The fourth pitfall is mixing window definitions across reports. Twenty-eight days, four weeks, one calendar month, and trailing-thirty are four different windows that produce four different numbers. Pick one — twenty-eight kills weekday seasonality — and use it everywhere. When a PM asks why retention dropped, the last thing you want is to discover two reports were silently using different windows.

Optimization tips

On an events table with billions of rows the bottleneck is the date filter. If your warehouse supports partitioning — BigQuery, Snowflake, Redshift, partitioned Postgres — partition by date and let the planner prune. A query that scans twenty-eight one-day partitions instead of a five-year table runs in seconds instead of hours.

Materialize the per-day-per-user table once. Most engagement queries reduce to SELECT user_id, day FROM user_active_days WHERE day >= X. Computed nightly as an incremental dbt model on Databricks or Snowflake, every downstream query — L-metrics, distributions, streaks, segments — runs against a table orders of magnitude smaller than raw events. For approximate counts the APPROX_COUNT_DISTINCT family — HyperLogLog on BigQuery, APPROX_COUNT_DISTINCT on Snowflake, APPROX_DISTINCT on Presto — is two to ten times faster than exact COUNT(DISTINCT) with error under one percent, which is fine for dashboard tiles even if not for billing.

If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.

FAQ

Should I look at active days or DAU?

They answer different questions. DAU is the operational pulse — how many distinct users showed up today. Active days per user is an engagement-depth metric — over a fixed window, how many days does the average user actually return. DAU goes on the executive dashboard because it is one number that moves visibly. Active days goes on the product team's dashboard because the distribution tells you whether engagement is broad or shallow. The healthy practice is to report both and watch them diverge.

Is L28 of one hundred percent the goal?

No, and chasing it will make you build a worse product. L28 of one hundred percent means every monthly active user is active every single day of the window, which is essentially impossible for a product that respects weekends, holidays, and competing apps. For most B2C consumer products an L28 between five and fifteen percent is excellent. Utility products that people open as part of a workflow — Slack, Linear, Notion at work — can hit twenty to thirty. If you see L28 close to one hundred, suspect tracking duplication before you celebrate.

Does a streak require strictly consecutive days?

Yes. A streak counts only unbroken runs of consecutive days; one missed day resets the counter to zero. Some products soften this in the UI — Duolingo's "streak freeze" is the famous example — but in your analytics warehouse you should compute the strict version first. Storing the strict streak alongside any product-defined leniency prevents your engagement metrics from depending on UI policy.

How often should I recompute active days?

For most teams, once a day during the overnight batch is sufficient — the metric is aggregated over a multi-day window, so intraday updates rarely change the picture. Go faster when an operational dashboard drives same-day decisions like push or re-engagement campaigns; an hourly materialized view is the right compromise. If you already have a sessions table, swap events for sessions and keep the rest identical; both definitions agree as long as a session requires at least one tracked event.