How to calculate DAU in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

What DAU actually means

DAU stands for Daily Active Users — the count of unique users who performed a meaningful action over a single calendar day. The number sits on every product dashboard at Meta, Netflix, Uber, Snap, and DoorDash, and yet most analysts get tripped up the moment somebody asks for it under interview pressure. The headline definition hides three independent choices: what counts as "active", which calendar the day belongs to, and how to handle duplicates inside that day.

This guide walks through the SQL recipes that come up in product-analyst loops at consumer-tech companies — Airbnb, Linear, Stripe, Notion. The patterns are the same on Snowflake, BigQuery, Redshift, or vanilla Postgres. Before any SQL runs, settle the "active" event with whoever is asking. For a social product, "active" usually means opening the main feed. For a marketplace it is a product view. For a mobile game it is a session start. Interviewers love to ask candidates to write a DAU query without specifying activity, and the correct first move is to ask back. We will use app_open throughout this post — substitute your own canonical event when you copy these queries.

The SQL formula

Conceptually, DAU is one line:

DAU(day) = COUNT(DISTINCT user_id) WHERE event happened on `day`

The data lives in an events table. A minimal schema:

events (user_id BIGINT, event_name TEXT, platform TEXT, created_at TIMESTAMPTZ)

Every row is one event. The same user can appear thousands of times in a single day. The job of the DAU query is to collapse those rows into a single count of distinct user IDs per calendar day, after filtering down to the "active" event.

Daily and weekly breakdowns

The basic daily query is the one every interviewer expects you to write in under sixty seconds:

SELECT
    created_at::DATE        AS day,
    COUNT(DISTINCT user_id) AS dau
FROM events
WHERE event_name = 'app_open'
GROUP BY 1
ORDER BY 1;

This is the form you persist in a dau_daily materialized view or daily-snapshot table, because everything else is a derivation of it. Once your events table crosses a few hundred million rows, querying it directly from a dashboard becomes painful, so persist the daily aggregate once per day and let consumers point at the snapshot.

The platform split is the same query with one extra group key. Interviewers ask for it because iOS, Android, and web almost always move differently — an Android-only release blowing up will not show in a single-line DAU chart.

SELECT
    created_at::DATE        AS day,
    platform,
    COUNT(DISTINCT user_id) AS dau
FROM events
WHERE event_name = 'app_open'
GROUP BY 1, 2
ORDER BY 1, 2;

A frequent variant is "what was DAU yesterday?" Use a half-open interval — closed on the lower bound, open on the upper — which is the only safe pattern when created_at is a timestamp rather than a date.

SELECT COUNT(DISTINCT user_id) AS yesterday_dau
FROM events
WHERE event_name = 'app_open'
  AND created_at >= CURRENT_DATE - INTERVAL '1 day'
  AND created_at <  CURRENT_DATE;

Weekly active users (WAU) is the same shape with DATE_TRUNC('week', created_at) instead of casting to date. Watch the week boundary: Snowflake and BigQuery default to ISO weeks starting on Monday, while older Redshift configurations start on Sunday — merging with marketing data on a different week start will mysteriously fail to reconcile.

SELECT
    DATE_TRUNC('week', created_at)::DATE AS week,
    COUNT(DISTINCT user_id)              AS wau
FROM events
WHERE event_name = 'app_open'
GROUP BY 1
ORDER BY 1;

Smoothing the line with a moving average

Raw DAU is noisy. Weekends, holidays, regional outages, push-notification campaigns — they all generate spikes that swamp the actual underlying trend. The fix is a 7-day moving average, which collapses one week of seasonality into a single smooth number because every 7-day window contains exactly one of each weekday.

WITH daily AS (
    SELECT
        created_at::DATE        AS day,
        COUNT(DISTINCT user_id) AS dau
    FROM events
    WHERE event_name = 'app_open'
    GROUP BY 1
)
SELECT
    day,
    dau,
    AVG(dau) OVER (
        ORDER BY day
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS dau_7d_ma
FROM daily
ORDER BY day;

A 28-day moving average smooths harder and removes weekly campaign noise — the right choice for a CEO dashboard. A 3-day moving average reacts to incidents faster — the right choice for oncall.

Rolling MAU and DAU/MAU stickiness

Stickiness is the ratio DAU divided by MAU — what fraction of your monthly users show up on any given day. To compute it for a daily chart, you need a rolling 30-day MAU evaluated at every date, which is harder than it looks because most SQL engines (including Postgres) refuse to let you use COUNT(DISTINCT ...) as a window function. The standard workaround is a correlated subquery — slower than a clean window, but it gets the right answer and runs fine on snapshotted aggregates.

WITH daily AS (
    SELECT
        created_at::DATE        AS day,
        COUNT(DISTINCT user_id) AS dau
    FROM events
    WHERE event_name = 'app_open'
    GROUP BY 1
),
mau AS (
    SELECT
        d.day,
        (
            SELECT COUNT(DISTINCT e.user_id)
            FROM events e
            WHERE e.created_at >= d.day - INTERVAL '29 day'
              AND e.created_at <  d.day + INTERVAL '1 day'
              AND e.event_name = 'app_open'
        ) AS mau_30d
    FROM daily d
)
SELECT
    d.day,
    d.dau,
    m.mau_30d,
    ROUND(d.dau::NUMERIC / NULLIF(m.mau_30d, 0), 3) AS stickiness
FROM daily d
JOIN mau   m USING (day)
ORDER BY d.day;

Reference ranges for stickiness: daily-habit products like messengers sit above 50 percent, frequent-use products like social networks and consumer marketplaces land between 20 and 40 percent, rare-use products like travel booking and tax software live below 20 percent. The single most important number, though, is your own trendline, not the absolute level. In production, persist mau_30d into a daily snapshot table written by a nightly job, so the dashboard joins two cheap tables instead of scanning the raw events twice.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

New vs returning DAU

A flat DAU number cannot tell you whether growth is coming from acquisition or retention. Splitting today's active users into "first time seen today" versus "seen before" makes that obvious instantly, and is one of the most common interview follow-up prompts after the basic DAU query.

WITH user_first AS (
    SELECT user_id, MIN(created_at)::DATE AS first_seen
    FROM events
    GROUP BY user_id
),
daily AS (
    SELECT
        e.created_at::DATE AS day,
        e.user_id,
        uf.first_seen
    FROM events e
    JOIN user_first uf USING (user_id)
    WHERE e.event_name = 'app_open'
)
SELECT
    day,
    COUNT(DISTINCT user_id) FILTER (WHERE day = first_seen) AS new_users,
    COUNT(DISTINCT user_id) FILTER (WHERE day > first_seen) AS returning_users
FROM daily
GROUP BY day
ORDER BY day;

The signal is high. If new_users grows while returning_users is flat, acquisition is working but retention is broken. If returning_users grows while new_users is flat, the product is getting stickier but the top of the funnel is dry. The two together tell you which team to staff up.

A related split is by event type — splitting "viewer DAU" from "purchaser DAU" on a marketplace, because the two numbers routinely move in opposite directions.

SELECT
    created_at::DATE AS day,
    COUNT(DISTINCT user_id) FILTER (WHERE event_name = 'view')     AS view_dau,
    COUNT(DISTINCT user_id) FILTER (WHERE event_name = 'purchase') AS buy_dau,
    COUNT(DISTINCT user_id)                                        AS total_dau
FROM events
GROUP BY 1
ORDER BY 1;

If view DAU spikes but buy DAU is flat, marketing got attention but checkout is leaking. If buy DAU climbs while view DAU is flat, conversion is improving.

Common pitfalls

The most common failure in a DAU query is forgetting to filter the event name. Events tables at every consumer-tech company carry hundreds of distinct event names — server-side health checks, push delivery receipts, background sync pings, internal QA traffic. If you COUNT(DISTINCT user_id) over the raw table without a filter, your DAU inflates by 30 to 60 percent and the trendline becomes whatever your engineering team did to telemetry last week. Always start with an explicit allow-list of events that count as "active".

The second trap is time zones. If your events table stores created_at in UTC and you bucket by created_at::date directly, a user in Los Angeles who opens the app at 9pm local time gets counted in the next UTC day. Across millions of users in dozens of time zones, this smears every weekly cycle. The defensible choices are either casting to a single business time zone, or splitting DAU by region and bucketing each region in its own local time.

The third trap is counting events instead of users. COUNT(*) and COUNT(DISTINCT user_id) are different numbers, and the gap between them is exactly how many times the median user fired the active event. A new analyst sees COUNT(*) on a chart called "DAU" and assumes the metric exploded, when in fact the number of users is flat and a marketing push just triggered more notifications. DAU is always a count of distinct users.

The fourth trap is bots and internal traffic. Synthetic monitoring agents, security scanners, QA accounts, and the engineering team's own tablets all fire the active event. Build an internal_user_ids exclusion list, filter user-agents matching known crawlers, and cap any single user above a sane upper bound of events per day. Apply the same filter symmetrically in MAU, retention, and revenue queries — inconsistent bot filtering is how two charts with the same name start contradicting each other.

The fifth trap is letting "active" mean different things in different reports. The fraud team uses one definition, the growth team uses another, the CEO deck uses a third — and over time the numbers diverge until nobody trusts any of them. Lock the definition into a single SQL view and force every downstream query to join through it.

Optimization tips

The single biggest optimization is to never compute DAU directly from a raw events table on demand. Persist a daily aggregate — a dau_daily(day, platform, country, dau) table written by a nightly job — and let every dashboard hit the snapshot, which is three orders of magnitude smaller than the raw events.

Index created_at on Postgres; partition by created_at::date on BigQuery, Snowflake, or Redshift. Partitioning matters most for the rolling-MAU correlated subquery — without it each scan reads the full table and the query becomes hopeless at scale. With daily partitioning, it reads 30 partitions and finishes in seconds.

COUNT(DISTINCT user_id) is expensive on warehouses because it has to materialize the full distinct set. On BigQuery and Snowflake, switch to APPROX_COUNT_DISTINCT once your team has signed off on a 1 to 2 percent error band — the speedup is typically 5x to 20x. Keep the exact count for board decks and the approximate count for operational dashboards.

If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems built around exactly this kind of interview prompt.

FAQ

What counts as "active" in DAU?

Whatever your PM and analytics lead agree on, written down in a shared definition. For a social product, opening the main feed is the typical anchor. For a marketplace, viewing a product detail page works because it is upstream of every revenue event. For a mobile game, a session start is conventional. The exact choice matters less than the consistency — pick one event, lock it into a SQL view, and force every dashboard to use the same view.

How do you filter bots and internal traffic?

A combination of explicit allow-lists and behavioral signals. The explicit list is internal employee user IDs, known QA accounts, and synthetic monitoring agents — exclude these by ID. The behavioral signals catch the rest: any user firing hundreds of active events in a day is almost certainly a bot, any user-agent matching a known crawler regex is a bot, and any user whose entire session occurs in under a second is a bot. Apply all three in the same SQL view and reuse it.

WAU or the 7-day average of DAU — which one should I report?

Both, on the same dashboard. WAU is unique users over a calendar or trailing week — a user who showed up every day counts once. The 7-day average of DAU is the mean of seven daily distinct counts — that same user counts seven times. They answer different questions: WAU asks "how many distinct humans engaged this week", while the 7-day average smooths the daily-engagement number for trend analysis.

Why does COUNT(DISTINCT) fail as a window function?

Most warehouse engines — Postgres, Snowflake, Redshift, BigQuery — do not support COUNT(DISTINCT ...) in an OVER (...) clause. A distinct count over a sliding window requires maintaining a hash set per window, which is expensive enough that the SQL standard leaves it as an optional feature most engines never implemented. The standard workaround is the correlated subquery used in the rolling-MAU recipe above, or a pre-aggregated daily-active-users table that lets the rolling count operate on already-distinct rows.

How do I handle late-arriving events in DAU?

Late events arrive in the warehouse a day or more after they happened — common with mobile apps that batch telemetry on poor networks. The pragmatic fix is to recompute DAU for the trailing 7 days every night. Treat DAU for days less than 7 days old as preliminary and label it as such on dashboards.