How to calculate CSAT in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

What CSAT is and why it matters

Picture a Monday morning at Stripe. Your PM pings you on Slack: the support team just rolled out a new chargeback workflow, the VP of CX wants to know whether customer satisfaction held up, and the all-hands is in three hours. You do not have time for a perception study or a quarterly NPS read. You have a csat_surveys table, a deadline, and a SQL editor. CSAT — Customer Satisfaction Score — is the metric that earns its keep in moments like this because it ties a single number to a single recent event the user actually experienced.

CSAT is a transactional metric. A user finishes a support chat, completes a checkout at DoorDash, sits through an onboarding call at Notion, and the system fires off a one-question survey: "How satisfied were you?" on a 1-5 or 1-10 scale. You aggregate the answers and report the share of positive responses. That tight coupling between event and rating is why product, support, and revenue teams all reach for CSAT before any other voice-of-customer measure. It also pays off when teams need to compare two operationally similar things — two agents, two flows, two regions, two A/B variants — at a granularity where NPS would be statistical noise.

The SQL formula

The textbook definition is the share of "positive" responses divided by total responses. On a 1-5 scale, positive usually means 4 or 5. On a 1-10 scale, the common cutoff is 7+, sometimes 9-10 if the org wants to be conservative. Pick a definition, document it once at the warehouse layer, and never let two dashboards disagree on what counts as a hit.

-- Baseline CSAT for the last 30 days
SELECT
    COUNT(*)                                                              AS responses,
    COUNT(*) FILTER (WHERE rating >= 4)                                   AS positive,
    COUNT(*) FILTER (WHERE rating <= 2)                                   AS negative,
    ROUND(
        COUNT(*) FILTER (WHERE rating >= 4)::NUMERIC * 100.0
        / NULLIF(COUNT(*), 0),
        2
    )                                                                     AS csat_pct,
    ROUND(AVG(rating)::NUMERIC, 2)                                        AS avg_rating
FROM csat_surveys
WHERE submitted_at >= CURRENT_DATE - INTERVAL '30 days';

A few things to notice. NULLIF(COUNT(*), 0) prevents a divide-by-zero error on quiet days. The FILTER (WHERE rating >= 4) syntax is Postgres / Snowflake / BigQuery friendly and reads cleaner than SUM(CASE WHEN ... THEN 1 ELSE 0 END). Casting to NUMERIC before the multiply forces decimal math; on integer-only engines you would silently lose precision.

Track negatives in the same query. The gap between "share positive" and "share negative" tells you whether you have a polarized customer base or a flat one. A CSAT of 78% with 5% detractors is a completely different operational story from 78% with 18% detractors, even though the headline number is identical.

Segment cuts that actually move the needle

A single org-wide CSAT number is a vanity metric. The first cut every analyst should ship is by agent or owner, because that is where actionable variance lives. The second is by issue type, region, or product surface, because that is where you find systemic problems.

-- CSAT by support agent, last 30 days
WITH agent_csat AS (
    SELECT
        agent_id,
        COUNT(*)                                                          AS responses,
        ROUND(AVG(rating)::NUMERIC, 2)                                    AS avg_rating,
        ROUND(
            COUNT(*) FILTER (WHERE rating >= 4)::NUMERIC * 100.0
            / NULLIF(COUNT(*), 0),
            2
        )                                                                 AS csat_pct
    FROM csat_surveys
    WHERE submitted_at >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY agent_id
    HAVING COUNT(*) >= 20
)
SELECT *
FROM agent_csat
ORDER BY csat_pct DESC;

The HAVING COUNT(*) >= 20 guard is doing real work. Without a sample floor, the leaderboard fills with new hires who handled two easy refund cases and a long-tenured agent whose one bad week dragged a 200-conversation sample down. Twenty is reasonable for a thirty-day window in mid-volume support; for high-volume queues at Uber or DoorDash, push it to fifty. The cutoff should match the rate at which one conversation swings the percentage by more than two points.

For issue-type and region cuts, the structure is the same with a different GROUP BY and a higher sample floor. A common trick is to pre-join the survey table to a tickets dim so you can cut by ticket_priority, channel (chat / phone / email), or time_to_resolution_bucket in one query.

-- CSAT by issue category and channel
SELECT
    t.issue_category,
    t.channel,
    COUNT(*)                                                              AS responses,
    ROUND(
        COUNT(*) FILTER (WHERE s.rating >= 4)::NUMERIC * 100.0
        / NULLIF(COUNT(*), 0),
        2
    )                                                                     AS csat_pct
FROM csat_surveys s
JOIN tickets t USING (ticket_id)
WHERE s.submitted_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY t.issue_category, t.channel
HAVING COUNT(*) >= 50
ORDER BY csat_pct ASC;

Sorting ascending here is deliberate. You want the floor of the distribution at the top of the screen, because that is where the next CX investment should land.

CSAT vs NPS vs CES

These three metrics get confused in every interview, so it is worth being precise. CSAT measures perception of a specific event on a 1-5 or 1-10 scale, fired immediately after that event. NPS measures general loyalty on a 0-10 scale ("How likely are you to recommend us?") and runs on a slower cadence — quarterly is the norm. CES measures the effort a user spent to get something done on a 1-7 scale and, like CSAT, is transactional. The right metric depends on the question: did this interaction land well (CSAT), do customers still like us as a population (NPS), or was the journey friction-free (CES)?

Mature CX programs report all three on the same dashboard, with CSAT broken out by event type, NPS as a quarterly trend with a sample of at least a few thousand, and CES on high-friction flows like cancellation and onboarding. If an interviewer asks which one matters most, the correct answer is "depends on the decision the metric drives" — then you give an example.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Common pitfalls

The first pitfall is survey response bias. Customers who had a great experience and customers who had a furious one are dramatically more likely to fill out the survey than the silent middle. This skews your CSAT toward both tails and can make a fundamentally lukewarm experience look polarized. The fix is twofold: report response rate alongside CSAT every time, and use weighting on key segments when response rate diverges sharply. If support response rate is 8% but the gold-tier customer segment responds at 22%, your top-line CSAT is quietly a gold-tier CSAT, and you should weight by segment population to get back to a defensible org-wide number.

Another trap is reading CSAT in isolation from operational reality. A refund-call CSAT of 5 from a customer who is canceling their account is not a win — the agent did their job, the customer is leaving, and the headline metric will improve right before churn explodes. CSAT should always be reported alongside the underlying event volume and the downstream behavior. If CSAT climbed three points last month but refund volume doubled, you have a satisfaction-of-leaving problem, not a CX success.

Comparing CSAT across industries is the third common mistake. A 70% CSAT in enterprise SaaS at Snowflake is roughly on benchmark; the same number in food delivery at DoorDash would be alarming, because consumer expectations and survey conventions differ. Always benchmark within your industry and, ideally, within your own historical trend. Cross-industry comparisons make for good press releases and terrible operational decisions.

Pitfall four is letting the "positive" threshold drift. One team counts 4-and-5 as positive, another counts only 5, a third uses top-two-box on a 1-10 scale, and the CFO walks into the board meeting with three different CSAT numbers from three different decks. Lock the definition in the warehouse, version it in code review, and surface the definition in every dashboard footer. If a team needs a stricter cutoff for a specific cut, add a new column — never overwrite the canonical one.

The fifth pitfall is ignoring sample size on small segments. A new product line with eight survey responses in its first week will produce a CSAT that swings twenty points on the next response. Use a HAVING COUNT(*) >= N guard in every cut and, for executive reporting, attach a confidence interval. The math is straightforward: for a proportion p with n responses, the 95% confidence interval is roughly p plus or minus 1.96 * sqrt(p * (1 - p) / n). For n = 30 and p = 0.8, that is a 14-point band — which kills the case for celebrating a two-point lift.

Optimization tips

For warehouses at the scale of a billion-row csat_surveys table, the single biggest win is partitioning by submitted_at at the day or month grain and clustering by agent_id or issue_category depending on which cuts dominate query volume. On Snowflake and BigQuery, this is enough to bring a 30-day CSAT query under a second for most segment cuts.

Materialize the daily CSAT rollups. A csat_daily table with one row per (date, agent_id, channel, issue_category) costs almost nothing to maintain on a streaming or hourly DBT job and lets every downstream dashboard hit a tiny aggregate instead of scanning the full surveys table. The savings compound when you add cohort or trend analysis on top — you avoid re-computing the same window every refresh.

When you join surveys to tickets or users, push the survey-side filter down before the join. The query planner usually figures this out, but on engines without good cardinality estimates the explicit pre-filter in a CTE can shave significant time off larger joins. Same logic applies if you enrich with a customer-tier dim: filter surveys to the date window first, then join.

If you want to drill metric-design SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.

FAQ

What counts as a "good" CSAT score?

Most consumer support orgs treat 75-85% as the healthy operating band, with 90%+ as exceptional and below 65% as a clear signal something is structurally wrong. The exact target depends on industry, channel, and the strictness of your "positive" cutoff. Enterprise SaaS support tends to run higher because the customer base is smaller and more invested; on-demand consumer apps run lower because the population is larger and less forgiving. Never set a CSAT target without first plotting twelve months of internal history and at least one published industry benchmark for context.

When should the CSAT survey fire?

Immediately after the event, ideally within minutes. Delayed surveys collect memory and mood, not perception of the interaction. The exception is for events with a delayed outcome — a refund that posts two days later, an onboarding whose value only shows up in week two — where a same-day survey would miss the actual experience. In those cases, send the survey when the outcome lands, and timestamp both the event and the survey so you can analyze any drift in real time.

Should I use a 5-point or 10-point scale?

A 5-point scale is simpler for respondents and produces a higher response rate, which is why most consumer support orgs default to it. A 10-point scale gives you more granularity and aligns with NPS conventions, which simplifies dashboarding if you report both. The tradeoff is that 10-point scales suffer more from cultural rating bias — some markets cluster at 7-8, others at 9-10 — so you need to interpret the average rating, not just the share-positive number. Pick one, commit to it for at least a year, and never mix scales in the same metric.

CSAT is dropping. What do I do first?

Run the issue-type cut sorted ascending and the agent cut filtered to recent hires, in that order. Most CSAT drops trace either to a specific issue category that just spiked in volume — usually because of a product launch or a billing change — or to a cohort of new agents still climbing the learning curve. If both look stable, pull the verbatim comments on 1-and-2 ratings from the last two weeks and read them; pattern-matching the open text reveals the cause faster than any further numeric slice.

How does CSAT relate to operational metrics like first-response time?

Treat CSAT as the perception layer on top of operational metrics, not a substitute. A dashboard that shows CSAT next to average handle time, first-response time, and resolution rate is far more actionable than CSAT alone, because the operational numbers tell you the why. The classic correlation is that first-response time under one minute drives a measurable CSAT bump on chat support, while resolution rate dominates on phone. Run the regression once on your own data — the elasticities will be different — and use the result to prioritize ops improvements that translate to perception lifts.

Should CSAT include responses where the user only left a comment, no rating?

No. Keep the canonical CSAT calculation strictly on responses with a numeric rating, and report comment-only responses as a separate volume signal. Mixing them either forces you to invent a numeric value for the comment-only row, which biases the metric, or to silently drop them, which biases your response rate. Two separate signals on the same dashboard is the clean answer.