How to calculate crash-free users in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why crash-free users is the stability metric

Crash-free users is the number that decides whether a mobile release keeps rolling out or gets pulled. The benchmark across mobile-heavy companies like Uber, DoorDash, and Snap is around ninety-nine point nine percent for consumer apps, and a drop below that line is the trigger for a hotfix or rollback.

It shows up in analytics interviews because the calculation is deceptively simple and the interpretation is where candidates fail. The metric counts users, not events — one user with ten crashes is one unhappy person, not ten data points. The denominator has to be active users in the same window. And the meaningful comparison is per app version and per platform — mixing iOS with Android or 12.3 with 12.4 hides the regression the release manager needs to see.

The queries run on Postgres with one-line tweaks for Snowflake, BigQuery, and Redshift. The schema assumes app_sessions with user_id, session_date, platform, device_os, app_version, and crash_events with user_id, crash_date, app_version, crash_type — roughly what Firebase Crashlytics and Sentry export to a warehouse.

The formula

crash_free_users = (users_without_crash / total_active_users) * 100

The numerator is the count of distinct active users who did not experience a crash in the window. The denominator is the count of distinct active users in the same window. Multiply by one hundred.

Crash-free users sits one level above crash rate, which counts crash events divided by total sessions. Crash rate punishes one unstable user repeatedly; crash-free users counts them once. For release decisions, crash-free users is the metric that aligns with what users feel. Crash rate stays useful for debugging — a tenfold spike in events without a matching spike in affected users usually means one device firing the same exception in a loop.

The base SQL query

A seven-day window is the standard for stability dashboards. It smooths daily noise enough that a one-off device cluster does not move the headline, and stays short enough that a Monday regression shows up by Friday.

WITH active_users AS (
    SELECT DISTINCT user_id
    FROM app_sessions
    WHERE session_date >= CURRENT_DATE - INTERVAL '7 days'
),
crashed_users AS (
    SELECT DISTINCT user_id
    FROM crash_events
    WHERE crash_date >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT
    (SELECT COUNT(*) FROM active_users)                                       AS active,
    (SELECT COUNT(*) FROM crashed_users)                                      AS crashed,
    (1 - (SELECT COUNT(*) FROM crashed_users)::NUMERIC
         / NULLIF((SELECT COUNT(*) FROM active_users), 0)) * 100              AS crash_free_pct;

The NULLIF guard is required because every warehouse will at some point return an empty active set, and a divide-by-zero on the headline metric removes the dashboard from leadership review. The double DISTINCT user_id is the whole point: a user with one crash and a user with twenty both contribute one to the crashed count.

A healthy result on a consumer app sits above ninety-nine point nine percent. Between ninety-nine and ninety-nine point nine the platform team pays attention but is not alarmed. Below ninety-nine is a stability incident. Below ninety-eight is where store ratings drop and the release in flight gets paused.

Breakdown by app version

The single-number crash-free is the headline. The first follow-up in any release review is how it compares across versions.

WITH crashed AS (
    SELECT DISTINCT user_id, app_version
    FROM crash_events
    WHERE crash_date >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT
    s.app_version,
    s.platform,
    COUNT(DISTINCT s.user_id)                                                 AS active_users,
    COUNT(DISTINCT s.user_id)
        FILTER (WHERE c.user_id IS NOT NULL)                                  AS crashed_users,
    (1 - COUNT(DISTINCT s.user_id)
            FILTER (WHERE c.user_id IS NOT NULL)::NUMERIC
         / NULLIF(COUNT(DISTINCT s.user_id), 0)) * 100                        AS crash_free_pct
FROM app_sessions s
LEFT JOIN crashed c
    ON s.user_id = c.user_id AND s.app_version = c.app_version
WHERE s.session_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY s.app_version, s.platform
HAVING COUNT(DISTINCT s.user_id) >= 500
ORDER BY s.app_version DESC, s.platform;

The HAVING COUNT(DISTINCT s.user_id) >= 500 floor matters. Staged rollouts mean the freshest version often has a few dozen users in the first hours, and a sample that small produces a percentage that swings by entire points with one or two affected devices. A clean regression looks like a step down between two adjacent versions that persists as the rollout reaches half of the install base. A pattern that recovers within a day is usually a cohort effect — early adopters skew toward beta channels.

Joining on both user_id and app_version is the subtle bit. A user who crashed on 12.3 last week and is now on 12.4 should not count as crashed against 12.4. Without the version match the new release inherits the crash history of every upgraded user.

Regression detection against baseline

The release manager wants a one-screen answer to which versions are worse than baseline by enough to act on. The query compares the most recent three days against the prior eleven, on the same versions and platforms, and flags any drop larger than half a percentage point.

WITH version_daily AS (
    SELECT
        app_version,
        platform,
        DATE_TRUNC('day', session_date)::DATE                                 AS day,
        ROUND((1 - COUNT(DISTINCT user_id)
                      FILTER (WHERE crashed)::NUMERIC
                   / NULLIF(COUNT(DISTINCT user_id), 0)) * 100, 3)            AS crash_free_pct
    FROM session_with_crash_flag
    WHERE session_date >= CURRENT_DATE - INTERVAL '14 days'
    GROUP BY app_version, platform, DATE_TRUNC('day', session_date)
)
SELECT
    app_version,
    platform,
    AVG(crash_free_pct)
        FILTER (WHERE day >= CURRENT_DATE - INTERVAL '3 days')                AS recent_cf,
    AVG(crash_free_pct)
        FILTER (WHERE day <  CURRENT_DATE - INTERVAL '3 days')                AS baseline_cf,
    AVG(crash_free_pct)
        FILTER (WHERE day >= CURRENT_DATE - INTERVAL '3 days')
      - AVG(crash_free_pct)
        FILTER (WHERE day <  CURRENT_DATE - INTERVAL '3 days')                AS delta_pp
FROM version_daily
GROUP BY app_version, platform
HAVING ABS(
        AVG(crash_free_pct) FILTER (WHERE day >= CURRENT_DATE - INTERVAL '3 days')
      - AVG(crash_free_pct) FILTER (WHERE day <  CURRENT_DATE - INTERVAL '3 days')
   ) > 0.5
ORDER BY delta_pp;

A delta_pp of minus one and a half on a freshly released version with a healthy install base is a rollback conversation. A drop of three or more is a stability incident. The query uses day-level averages rather than a statistical test, because release teams act on absolute thresholds; bootstrapped confidence intervals belong in the postmortem, not the alert.

The session_with_crash_flag source is a session-level rollup with a boolean crashed per session — most warehouses already maintain that table. If you do not have it, derive it inline with LEFT JOIN crash_events USING (session_id) and crashed IS NOT NULL.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Crash-free by platform and OS

iOS and Android have structurally different crash profiles. iOS tends to crash on memory pressure and concurrency bugs; Android crashes on OEM fragmentation, ANRs, and old WebView versions. Mixing them gives a number right on average and wrong for either platform.

SELECT
    s.platform,
    s.device_os,
    COUNT(DISTINCT s.user_id)                                                 AS active_users,
    COUNT(DISTINCT c.user_id)                                                 AS crashed_users,
    (1 - COUNT(DISTINCT c.user_id)::NUMERIC
         / NULLIF(COUNT(DISTINCT s.user_id), 0)) * 100                        AS crash_free_pct
FROM app_sessions s
LEFT JOIN crash_events c
    ON s.user_id = c.user_id
   AND c.crash_date >= CURRENT_DATE - INTERVAL '7 days'
WHERE s.session_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY s.platform, s.device_os
HAVING COUNT(DISTINCT s.user_id) >= 1000
ORDER BY crash_free_pct ASC;

The first row in the sorted output is usually the OS combination with a fresh major version on it — the latest iOS beta or an Android 15 partial rollout. A gap of a few basis points is noise. A full percentage point on a single OS version is the data the platform team takes to whoever owns the SDK.

Reporting one crash-free number with no platform split is the most common mistake on this metric. The senior version of the dashboard always shows both numbers side by side, with a per-OS-version drill-through one click away.

Common pitfalls

The first pitfall is reporting crash rate instead of crash-free users. Crash rate counts events, so one user crashing twenty times looks identical to twenty users crashing once. A release that ships a tight loop crashing a single device a million times looks catastrophic on crash rate and trivial on crash-free users — trivial is the correct read for a release-blocking decision. Always lead with crash-free users.

The second pitfall is averaging crash-free over a window that mixes versions. A release whose first day has a crash-free of ninety-eight looks acceptable when blended into the prior week's ninety-nine point nine. Compute crash-free per day and per version; never publish a rolling average that crosses a release boundary without the version split visible.

The third pitfall is ignoring Android ANRs. An Application Not Responding event is not strictly a crash, but the user experience is indistinguishable, and Google Play treats ANRs as part of the bad-experience metric that affects store ranking. Decide explicitly whether ANRs are in your numerator and document the choice in the dashboard tooltip.

The fourth pitfall is forgetting the platform split. A global ninety-nine point five can be a healthy iOS at ninety-nine point eight and a sick Android at ninety-nine point one, and the action items differ. Always show platform broken out.

The fifth pitfall is not separating crash type. A native crash, a Java or Kotlin exception, and a JavaScript bridge error in a React Native or Flutter app come from different parts of the codebase and get fixed by different teams. Tagging crashes with a crash_type column turns a generic stability problem into a routable engineering ticket.

Optimization tips

Partition app_sessions and crash_events by date — monthly partitions on Postgres and BigQuery, date clustering on Snowflake. Every query above filters on the last seven or fourteen days, so partition pruning takes scanned data from terabytes to gigabytes without changing a line of SQL.

The second lever is a daily session-version rollup. Materialize a table keyed by (day, app_version, platform, device_os) storing active and crashed user counts. The headline, per-version breakdown, and regression detection all become kilobyte-scale aggregates instead of joins against the raw event stream. On Snowflake or BigQuery this is a scheduled task; on Postgres a MATERIALIZED VIEW with a nightly REFRESH does the same.

The third lever is approximate distinct counts. Crashlytics-scale apps produce billions of session rows, and exact COUNT(DISTINCT user_id) is slow. APPROX_COUNT_DISTINCT on BigQuery and Snowflake, or the hll extension on Postgres, returns the same answer within a fraction of a percent at a fraction of the cost.

If you want to drill SQL questions like this every day, NAILDD is launching with hundreds of analytics SQL problems built around this pattern.

FAQ

What's a healthy crash-free users number?

For consumer mobile apps the benchmark is ninety-nine point nine percent or higher, measured on weekly active users. Games sit around ninety-nine point five because rendering and input paths are more fragile. Enterprise apps with mandatory adoption sit higher, around ninety-nine point nine five. The right number is the one your thirty-day baseline supports, and the meaningful event is a regression from that baseline rather than an absolute miss.

Should background crashes count toward crash-free?

Foreground crashes always count and are what matters for store ratings and support load. Background crashes are ambiguous: on Android they often appear as ANRs and background service terminations, and on iOS the OS routinely kills backgrounded apps for memory reasons that are not bugs. The default for most teams is to count foreground crashes and ANRs in the headline metric and report background terminations as a separate diagnostic.

Crash-free sessions versus crash-free users — which one?

Crash-free users is the stricter metric and the one most teams use for release decisions, because one user with many crashes is one unhappy person whose feedback and uninstall behavior dominate downstream outcomes. Crash-free sessions has higher volume and lower variance, which makes it useful for trend monitoring and for splitting by short-lived dimensions like a feature flag. Most mature dashboards show both numbers.

When should we roll back a release?

The hard triggers are an absolute crash-free below ninety-nine percent on the new version with a meaningful install base, or a regression of more than one percentage point against the prior version on the same platform. Either condition with a few thousand affected users justifies pulling the release. A half-point regression holding for two days usually leads to a hotfix rather than a rollback.

What tools do teams use for this?

Firebase Crashlytics and Sentry are the dominant choices, and both expose crash-free users directly. Bugsnag and Instabug are smaller alternatives. The reason to also compute crash-free in SQL is that the warehouse number is the only one that joins to your other data — revenue, retention, support tickets, feature flag exposure — and the release postmortem is more credible when stability sits in the same query as business metrics.

How does crash-free relate to store ratings?

Closely and with a lag. A drop in crash-free typically shows up in one-star reviews within days and in the rolling average store rating within a week or two, because crashing users are disproportionately likely to leave a one-star review. As a rule of thumb a one-percentage-point drop in crash-free, sustained for a week, costs roughly a tenth of a star — enough to move the store ranking and the conversion rate from the listing page.