How to calculate CES in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

What CES is and why it matters

Customer Effort Score (CES) is a single-question survey metric that asks the user how hard it was to get something done. The Gartner formulation phrases it as "How much effort did you personally have to put forth to handle your request?" on a 1 to 7 scale, where 1 means very high effort and 7 means very low effort. Some teams flip the scale so that 1 means strongly disagree and 7 means strongly agree with a statement like "The company made it easy for me to handle my issue." Direction matters and burns at least one analyst per quarter, so we will be explicit about it below.

CES has spread across analytics dashboards at Stripe, Notion, Airbnb, and DoorDash for a clear reason. In Gartner's research, CES predicted future loyalty more reliably than CSAT and NPS for support, self-service, and onboarding. Customers do not stay loyal because their last ticket was delightful. They stay because the product did not waste their time. When a billing question takes four messages and a callback to resolve, no amount of empathetic copy fixes the impression that the company is hard to deal with.

The classic scenario that lands CES in your queue: Monday morning, the VP of Support wants to know whether the new chatbot rollout actually reduced effort, broken out by tier and channel, against the pre-launch baseline. You have a ces_surveys table with user_id, ticket_id, score, channel, and submitted_at. Your job is to answer the question without quietly mixing scales, double-counting tickets, or hiding survivorship bias.

The SQL formula

CES is reported two ways depending on the team. The first is the simple mean: average the score across all responses in the window. The second is the top-box rate: the share of respondents who scored 5, 6, or 7 on a 1-7 scale. The top-box version is what executive decks usually want because percentages move more dramatically than averages and are easier to compare against a target.

Both versions belong in the same query. Ship only one and you will get asked for the other within a week. Here is the baseline calculation over a rolling 30-day window in PostgreSQL syntax. The same expression works in Snowflake and Databricks with trivial changes.

SELECT
    COUNT(*)                                            AS responses,
    AVG(score)                                          AS avg_ces,
    COUNT(*) FILTER (WHERE score >= 5)                  AS easy_responses,
    COUNT(*) FILTER (WHERE score >= 5)::NUMERIC
        * 100.0 / NULLIF(COUNT(*), 0)                   AS pct_easy,
    COUNT(*) FILTER (WHERE score <= 2)                  AS hard_responses,
    COUNT(*) FILTER (WHERE score <= 2)::NUMERIC
        * 100.0 / NULLIF(COUNT(*), 0)                   AS pct_hard
FROM ces_surveys
WHERE submitted_at >= CURRENT_DATE - INTERVAL '30 days'
  AND score BETWEEN 1 AND 7;

Three details to call out. The BETWEEN 1 AND 7 filter is not paranoia: survey tools sometimes emit 0 for skipped questions or null for client-side timeouts, and those rows pull your average toward zero. The NULLIF(COUNT(*), 0) guards against empty result sets in early-launch markets. And both top-box and bottom-box percentages live in the same query because the gap between them is the real signal. A 4.8 average with 20 percent in the bottom box is a very different product than 4.8 with 5 percent.

For a trend chart, wrap the same expression in a date bucket. The window function version below computes a 7-day rolling average so the line is readable when daily volume is small.

WITH daily AS (
    SELECT
        DATE_TRUNC('day', submitted_at)::DATE       AS day,
        COUNT(*)                                    AS responses,
        AVG(score)                                  AS avg_ces
    FROM ces_surveys
    WHERE submitted_at >= CURRENT_DATE - INTERVAL '90 days'
      AND score BETWEEN 1 AND 7
    GROUP BY 1
)
SELECT
    day,
    responses,
    avg_ces,
    AVG(avg_ces) OVER (
        ORDER BY day
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    )                                               AS avg_ces_7d
FROM daily
ORDER BY day;

Segment cuts that actually move the needle

A company-wide CES is almost always less useful than the same number broken down by something operationally meaningful. The three cuts that pay for themselves are channel, issue category, and customer tier. Channel tells you whether chat is hiding behind email's good numbers. Issue category tells you which workflows need fixing. Tier tells you whether the worst experience is going to the customers paying the most.

SELECT
    channel,
    COUNT(*)                                            AS responses,
    AVG(score)                                          AS avg_ces,
    COUNT(*) FILTER (WHERE score >= 5)::NUMERIC
        * 100.0 / NULLIF(COUNT(*), 0)                   AS pct_easy,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY score)  AS median_ces
FROM ces_surveys
WHERE submitted_at >= CURRENT_DATE - INTERVAL '30 days'
  AND score BETWEEN 1 AND 7
GROUP BY channel
HAVING COUNT(*) >= 30
ORDER BY avg_ces ASC;

The HAVING COUNT(*) >= 30 is doing work. Without a minimum sample, a brand-new SMS channel with eight responses and one angry customer will look catastrophic, and someone will draft a memo about it before lunch. Thirty is a conservative floor borrowed from the rule of thumb that governs t-tests. The ORDER BY avg_ces ASC puts the worst channels at the top because that is what operators want to see first.

For a tier breakdown, join the survey responses to your customers table on tier.

SELECT
    u.tier,
    COUNT(*)                                            AS responses,
    AVG(s.score)                                        AS avg_ces,
    COUNT(*) FILTER (WHERE s.score >= 5)::NUMERIC
        * 100.0 / NULLIF(COUNT(*), 0)                   AS pct_easy
FROM ces_surveys s
JOIN users u ON u.user_id = s.user_id
WHERE s.submitted_at >= CURRENT_DATE - INTERVAL '30 days'
  AND s.score BETWEEN 1 AND 7
GROUP BY u.tier
HAVING COUNT(*) >= 30
ORDER BY u.tier;

If tier definitions change over time, snapshot the tier at the moment the survey was answered rather than the user's current tier. Joining a slowly changing dimension table with effective_from and effective_to columns is the cleanest pattern. Otherwise you end up explaining why Enterprise CES suddenly improved, and the real answer is "we re-tiered SMB churners up to Enterprise during the consolidation."

CES vs CSAT vs NPS

These three survey metrics overlap enough to confuse stakeholders and differ enough that swapping one for another quietly gives wrong answers. CES asks about effort on a single interaction. CSAT asks about satisfaction with a transaction. NPS asks about overall recommendation likelihood.

Metric When you ask it Scale What it predicts
CES Right after a support ticket, self-service flow, or onboarding step 1 to 7 Transactional loyalty, churn for support-driven dissatisfaction
CSAT After a transaction (purchase, ticket close, feature use) 1 to 5 Short-term sentiment about a specific touchpoint
NPS Quarterly or after meaningful milestones 0 to 10 Word-of-mouth, long-run brand sentiment

The mistake to avoid is reusing CES copy in a CSAT survey or vice versa. A 7 on "how easy was it" is not the same as a 5 on "how satisfied are you." The first rates friction; the second rates outcomes. If a refund is denied but the process was painless, CES is high and CSAT is low. Both signals are real and answer different questions.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Common pitfalls

The most common CES mistake is asking the question at the wrong moment. CES is a post-resolution metric. If the survey fires when a chat opens or a ticket is merely assigned, the response measures whatever was on the user's mind then, not the effort it took to resolve the issue. Gate the survey on a resolution event: ticket status moves to closed, the user clicks "this solved my issue," or seven days pass with no reopen. Any explicit gate beats firing on first contact.

A second pitfall is silently mixing scales. Some teams instrument CES at 1 to 5 to match a CSAT widget, others use 1 to 7 to match Gartner, and a few use a flipped scale where 1 is best. If two formats land in the same table without a version column, your average is meaningless. Add scale_min and scale_max columns at ingest and either normalize at query time or filter to one version per report.

The third trap is sample bias driven by survey timing. Respondents are not a random sample of customers who had support interactions; they are the ones either delighted or furious enough to click. Report response rate alongside CES, and weight by population if you have demographics on respondents versus non-respondents. A CES of 6.2 at a 4 percent response rate is much less reassuring than a 5.5 at 35 percent.

The fourth trap is reading CES without context on issue complexity. A 4 out of 7 on a simple password reset is a problem. A 4 out of 7 on a billing dispute that crossed three teams is fine. Segment by issue category before drawing conclusions. The management value is identifying segments where CES is unexpectedly low for the difficulty of the issue, not segments that are predictably painful.

The fifth trap is treating CES movement as causal evidence on its own. Support quality, product changes, and seasonality all move the metric. Before telling leadership that the new chatbot raised CES, check whether ticket volume changed, whether the issue-type mix shifted, and whether the survey trigger logic was modified in the same release. Pair CES analysis with the segment cuts above and, ideally, a holdout group when shipping a major change.

Optimization tips

Once ces_surveys reaches tens of millions of rows, the queries above slow down without help. The cheapest win is a composite index on (submitted_at, channel) or (submitted_at, score) depending on which segment dominates. The submitted_at >= CURRENT_DATE - INTERVAL '30 days' predicate is sargable against the first column, so the index turns a full scan into a range scan.

On Snowflake or Databricks, partition by submitted_at truncated to month and cluster on channel if channel dashboards dominate. Materialized views pay off once your dashboard refreshes more than once an hour: pre-computing the daily roll-up means the dashboard reads a 90-row table rather than scanning millions of survey rows.

If you join ces_surveys to users repeatedly, denormalize tier and signup_country at write time. Survey rows are append-only and small, so storage cost is negligible and you save the join on every read. In dbt, a daily incremental model maintaining ces_surveys_enriched is the standard pattern.

If you want to drill SQL questions like this every day, NAILDD is launching with 500+ SQL problems across exactly this pattern.

FAQ

What CES score is considered good?

On a 1 to 7 scale where 7 is least effort, above 5 is the baseline for healthy support, above 6 is excellent, and anything sustained below 4 is a leading indicator of churn. The number depends on what you measure. Self-service for simple actions like password resets should land at 6 or higher because the user expects it to be trivial. Complex billing disputes can sit at 4 to 5 without panic because the underlying problem is hard. The right benchmark is your own trend, not an industry average from a vendor whitepaper.

Should I replace CSAT with CES?

Not entirely. CES and CSAT answer different questions. CES is the right metric for support, onboarding, and self-service because effort is the dominant driver of transactional loyalty there. CSAT is still useful for post-purchase satisfaction, feature feedback, and overall product sentiment because effort is not the only thing customers care about. Most mature support analytics teams report both: CES as the headline metric for friction reduction, CSAT as the secondary signal that catches outcome dissatisfaction even when the process was easy.

Does CES work for self-service flows?

Yes, and arguably this is where CES is most valuable. Self-service is the most underinstrumented part of most customer experiences because nobody opens a ticket, so the failure mode is silent. A one-question CES survey after a help center search or billing portal action gives you a quantitative signal on flows where the alternative is guessing from session recordings. Fire the trigger after a terminal action like marking an article helpful or closing a self-serve case, not on page load.

CES is dropping. What do I do first?

Run the channel and issue-category cuts before anything else. A drop in the company-wide average almost always traces to a specific channel or issue type rather than a uniform decline. Pull a sample of the lowest-scoring responses in the worst segment and read the free-text feedback. Most CES drops have a concrete operational cause: a routing change, a stale help article, a vendor outage, or a release that introduced friction.

How often should we measure CES?

Real-time collection, with reporting cadence matched to the decision. Operators reviewing day-to-day support quality want a 7-day rolling average refreshed daily. Leadership reviewing strategic friction trends wants the monthly number with year-over-year comparisons. Quarterly is the wrong reporting cadence for CES because the metric is responsive enough that you miss real signal between reviews.

What sample size do I need before trusting CES?

For a single segment, 30 responses gets you out of pure noise territory and 100 is where you can draw operational conclusions with reasonable confidence. For comparing two groups, like channel A versus B or pre-launch versus post-launch, the practical minimum is 100 responses per group with a difference of at least 0.3 on a 1 to 7 scale. Smaller differences are real but require larger samples to distinguish from noise.