How to calculate App Store rating in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Why App Store rating matters

The star number on the App Store and Play Store sits on top of the install funnel like a billboard. Every search ranking, every Today tab feature, every paid acquisition creative routes through it. A move from 3.8 to 4.4 typically lifts organic install conversion by thirty to fifty percent, and the same delta cuts paid CPI. When a Mobile PM at Uber, DoorDash, or Notion asks the analytics team "why did installs drop", the first plot is always the rolling rating split by app version.

The trap is that the average alone is a half-answer. A 4.2 with a normal distribution behaves nothing like a 4.2 with a bimodal one — half five-stars and half one-stars hides a furious minority about to write the review that tanks your search rank. The dashboard that ships with a single AVG(rating) column is the one that gets re-asked the moment a number moves. This post walks through the SQL for star ratings end to end, runnable in Postgres with minor tweaks for Snowflake, BigQuery, or Redshift.

The formula and the data model

The headline metric is a weighted mean: sum of stars times count, divided by the total count. Distribution and dynamics round out the picture.

Average Rating = SUM(rating * count) / SUM(count)
Distribution   = COUNT(*) per star bucket / COUNT(*) total
Recent rating  = same formula filtered to the last N days

When the mean is much higher than the median rating, the distribution is left-skewed — the typical user is happy. When mean and median sit close but the variance is large, the distribution is bimodal and the headline number is hiding a controversy.

For every query below, assume a single ratings fact stitched together from App Store Connect and Google Play feeds, plus a users dimension for cross-joins.

ratings (rating_id, user_id, rating, platform, app_version, created_at)
users   (user_id, registered_at, attribution_channel, country)

rating is an integer between 1 and 5. platform is ios or android. app_version is the build the user was on when they tapped submit.

Seven SQL queries you will actually ship

1. Headline rating for a single window

SELECT
    COUNT(*)                  AS ratings_cnt,
    AVG(rating)::NUMERIC      AS avg_rating,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY rating) AS median_rating
FROM ratings
WHERE created_at >= CURRENT_DATE - INTERVAL '90 days';

A ninety-day window matches what the App Store surfaces as "recent ratings". The ::NUMERIC cast avoids the integer-division round-down that bites every Postgres analyst at least once.

2. Star distribution

The mean is the headline. The distribution is the diagnosis.

SELECT
    rating,
    COUNT(*)                                                    AS ratings_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2)          AS pct
FROM ratings
WHERE created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY rating
ORDER BY rating;

Read the shape on screen. Mostly fives with a thin tail is the goal. A bimodal split with bars at both one and five is a controversy signal — the product is polarizing, and the one-star wall will eventually outrun the five-star drip. A flat distribution centered on three is the mid-funnel-problems shape: users get partway in, find something annoying, and rate the experience as forgettable.

3. Rating by app version

The single most useful slice. Every product team needs to know which build broke things.

SELECT
    app_version,
    COUNT(*)                                                   AS ratings,
    AVG(rating)::NUMERIC                                       AS avg_rating,
    COUNT(*) FILTER (WHERE rating <= 2) * 100.0
        / NULLIF(COUNT(*), 0)                                  AS pct_low,
    COUNT(*) FILTER (WHERE rating >= 4) * 100.0
        / NULLIF(COUNT(*), 0)                                  AS pct_high
FROM ratings
WHERE created_at >= CURRENT_DATE - INTERVAL '60 days'
GROUP BY app_version
ORDER BY app_version DESC;

When pct_low spikes on a single version, freeze the rollout and pull the release notes. Linear and Figma run this query on every release as a smoke test. On Android the per-version slice is in Play Console; on iOS you build it from your own rating submission events.

4. Rating by platform

A persistent iOS-Android gap usually means a UI parity problem or a hardware tail problem.

SELECT
    platform,
    COUNT(*)                          AS ratings,
    AVG(rating)::NUMERIC              AS avg_rating,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY rating) AS median_rating
FROM ratings
WHERE created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY platform;

Android ratings are typically 0.2 to 0.4 stars lower across most consumer apps — wider hardware tail, more crashes, more aggressive Play prompt. The right question is not "why is Android lower" but "is the gap stable or widening".

5. Monthly dynamics with rolling average

WITH monthly AS (
    SELECT
        DATE_TRUNC('month', created_at)::DATE   AS month,
        COUNT(*)                                AS ratings,
        AVG(rating)::NUMERIC                    AS avg_rating
    FROM ratings
    WHERE created_at >= CURRENT_DATE - INTERVAL '24 months'
    GROUP BY 1
)
SELECT
    month,
    ratings,
    avg_rating,
    AVG(avg_rating) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS r3_avg,
    LAG(avg_rating) OVER (ORDER BY month)                                          AS prev_month,
    ROUND(avg_rating - LAG(avg_rating) OVER (ORDER BY month), 3)                   AS delta
FROM monthly
ORDER BY month;

The three-month rolling average dampens the seasonality wobble — holiday rage-rating, summer outage clusters, anything that briefly skews the monthly point. If you want a deeper drill on the windowing patterns here, SQL window functions interview questions walks through LAG and rolling frames with the same shape.

6. Rating tied to acquisition channel

Marketing teams love AOV and CAC charts. The rating-by-channel chart is the one that gets the least airtime and tells the most truth — paid channels often recruit users who never matched the product in the first place.

SELECT
    u.attribution_channel,
    COUNT(*)                          AS ratings,
    AVG(r.rating)::NUMERIC            AS avg_rating,
    COUNT(*) FILTER (WHERE r.rating <= 2) * 100.0
        / NULLIF(COUNT(*), 0)         AS pct_low
FROM ratings r
JOIN users u USING (user_id)
WHERE r.created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY u.attribution_channel
ORDER BY avg_rating DESC;

Organic and referral channels almost always top this chart. When a paid channel sits above organic, audit the attribution model before celebrating — last-touch attribution will happily credit a banner ad for an install that was already coming.

7. Lifetime versus recent rating

The App Store surfaces both. The lifetime number is what new users see by default in many markets; the recent number is what actually predicts the next month.

SELECT
    'lifetime' AS WINDOW,
    COUNT(*)              AS ratings,
    AVG(rating)::NUMERIC  AS avg_rating
FROM ratings

UNION ALL

SELECT
    'recent_90d' AS WINDOW,
    COUNT(*)              AS ratings,
    AVG(rating)::NUMERIC  AS avg_rating
FROM ratings
WHERE created_at >= CURRENT_DATE - INTERVAL '90 days';

When the recent window runs noticeably below the lifetime average, the trajectory is bad and the lifetime number is just a slow-moving cushion. When the recent window runs above lifetime, the product has turned a corner — and the lifetime number will catch up over the next six months as old ratings age out of practical weight.

Common pitfalls

The first trap is confusing the average with the distribution. An average of 4.0 can mean "every user gave a four" or "half the users gave a five and half gave a three" — radically different products with identical headlines. Always ship the distribution alongside the mean, and treat a bimodal shape as a louder signal than the mean itself. The dashboards that win are the ones that show both bars and the running number in the same view.

The second trap is treating lifetime rating as a live metric. Lifetime ratings move slowly because they are dominated by the back catalog of reviews; a real product change shows up first in the recent window and only seeps into lifetime over many months. If you are monitoring product changes against lifetime, you are watching the wrong number. Anchor live monitoring on the recent ninety-day window and use lifetime as a slow lagging benchmark.

The third trap is ignoring the review text. A four-star rating with a constructive comment carries different weight than a four-star with no text — and neither carries the same impact on the wall as a one-star with three paragraphs of detail. Build a parallel pipeline that tags review text by theme — crashes, login issues, pricing complaints, feature requests — and ship the theme distribution alongside the star distribution. The first table tells you what the rating is; the second tells you why.

The fourth trap is sample bias from the in-app prompt. If the rating dialog fires only after a success event, the sample is selected for happy users and the headline number is inflated. Apple's SKStoreReviewController rate-limits prompts to three per year, and the algorithm penalizes apps that gate prompts behind happy-path flows. The right pattern is a soft in-app survey that routes happy users to the App Store dialog and unhappy users to a support intake.

The fifth trap is mistaking a bot wave for an organic move. Competitor downvoting waves and incentivized five-star campaigns leave fingerprints — same-IP clustering, sudden country-specific spikes, ratings from users with zero session minutes the prior week. Flag suspicious cohorts and exclude them from the headline. How to detect anomalies in SQL covers the windowed-z-score pattern that catches most of these.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Optimization tips

Past a few hundred million ratings, three things start to matter. First, partition the ratings table by created_at — monthly partitions are the standard, every query above is naturally bounded by date, and partition pruning collapses the scan to the relevant window. The same applies in Snowflake via clustering keys and in BigQuery via partitioned tables.

Second, materialize the daily aggregate. Most dashboards consume the same three or four roll-ups: daily count, daily average, daily distribution by star bucket. Compute these nightly into a small reporting table and read from that table at dashboard time.

Third, watch the percentile cost. PERCENTILE_CONT is exact and expensive; for trending dashboards, swap to APPROX_PERCENTILE (Snowflake), APPROX_QUANTILES (BigQuery), or tdigest (Postgres). Exact percentiles belong on finance-grade reports; approximate ones are safe for product analytics.

A Python sanity check

When a SQL number looks suspicious — usually because somebody pinged you with "the rating dropped" and the dashboard disagrees with the App Store Connect screenshot — the fastest debugging move is to recompute from pandas and compare.

recent = ratings.loc[ratings["created_at"] >= "2026-02-17", "rating"]
avg_rating = recent.mean()
distribution = recent.value_counts(normalize=True).sort_index()

If pandas and SQL disagree, the bug is almost always in the SQL — a missed timezone shift on the date filter, a join that silently doubled rows, or a status filter that excluded a category nobody documented. The pandas version is also the right place to sketch the rating histogram before deciding whether to ship the mean or the median as the headline.

If you want to drill SQL questions like this daily, NAILDD is launching with hundreds of analytics SQL problems built around exactly this pattern.

FAQ

What rating counts as good?

Across consumer apps, 4.5 and above is top-tier — the band that gets featured and converts at the top of the store. 4.0 to 4.5 is healthy and where most apps with real engineering investment live. Below 4.0 is a serious problem and a clear drag on install conversion; below 3.5 is an emergency that usually correlates with a stability or onboarding regression. The exact band depends on category — utilities skew higher than social and dating, where polarized usage drags ratings down.

Should I report mean or median?

Both, almost always. The mean is the number new users see on the store listing, so it is the right number for store-side optimization. The median is more honest when the distribution is bimodal, and the gap between mean and median is itself a useful diagnostic. The dashboard that ships only the mean misses the controversies; the one that ships only the median misses what the store actually displays.

How much does rating move install conversion?

The rule of thumb across mobile attribution platforms is roughly ten to thirty percent install conversion lift per half-star, with the steepest gains in the 3.5 to 4.5 band. Below 3.5, the listing barely converts at all in many categories. Above 4.5, additional gains compress because the store rendering caps at five stars and most users do not distinguish between 4.6 and 4.8.

How do I get fewer one-star ratings?

Smoothen onboarding so users hit the first valuable moment before the rating prompt has any reason to fire. Drop the crash rate — every crash above the platform median translates directly into one-star rage-rating. Gate the App Store dialog behind a soft in-app survey and route unhappy users to a support intake instead of the store. Track prompt-fire-to-submit conversion as a metric in its own right.

Can I track per-version rating on iOS?

Not natively in the legacy App Store flow — Apple shows lifetime and recent, not per-version. The workaround is to instrument your own rating-submission events: when the user taps submit on the in-app prompt, log a row with the current app_version to your analytics warehouse. The store-side number lags and aggregates, but the warehouse-side number lets you run query three per release with full fidelity.