May 18, 2026·12 min read

Median vs mean — when to use which

Q: Can I use the median in A/B tests?

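Yes, but the toolkit changes. Classical t-tests and z-tests compare means, not medians. To compare medians directly, bootstrap the difference in medians. The Mann-Whitney U test is the usual rank-based alternative, though strictly it tests whether one group tends to produce larger values than the other, which matches a median comparison only when the two distributions share the same shape. A common middle ground is to log-transform skewed metrics (revenue, session duration, latency) and run a standard t-test on the transformed values. The transformation pulls the long tail in so the t-test behaves well, but it now compares means of logs, that is, geometric means, not arithmetic means. CUPED and similar variance-reduction techniques assume you are testing means, so picking median as the primary metric usually means re-architecting the experiment.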
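
As a rough illustration of the bootstrap route, the sketch below resamples each arm with replacement and reads a percentile interval off the resampled differences in medians. The function name `bootstrap_median_diff`, the sample data, the resample count, and the fixed seed are all assumptions made for this example, not part of any particular experimentation platform.

```python
import random
import statistics

def bootstrap_median_diff(control, treatment, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for median(treatment) - median(control)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each arm independently, with replacement, at its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(statistics.median(t) - statistics.median(c))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical session durations in seconds; each arm contains one extreme value.
control = [12, 15, 18, 20, 22, 25, 27, 30, 33, 900]
treatment = [32, 35, 38, 40, 42, 45, 47, 50, 53, 1200]

observed = statistics.median(treatment) - statistics.median(control)
ci_low, ci_high = bootstrap_median_diff(control, treatment)
print(f"observed difference in medians: {observed}, interval: ({ci_low}, {ci_high})")
```

An interval that excludes zero is evidence of a shift in the median. A production analysis would likely use more resamples and possibly a bias-corrected interval.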

Q: What is a trimmed mean?

A trimmed mean drops a fixed percentage from each end of the sorted data and averages what remains. A 5 percent trimmed mean removes the bottom 5 and top 5 percent. It is a compromise — more robust than the mean, less than the median — but retains some math properties of an arithmetic mean. Useful when you want one number that resists outliers but still moves with the bulk.
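
A minimal sketch of the idea in plain Python follows. The helper name `trimmed_mean` and the salary figures are made up for illustration, and the rule of rounding the cut down to a whole number of observations is one common convention, not the only one.

```python
import statistics

def trimmed_mean(values, proportion):
    """Drop `proportion` of the observations from each end, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations cut from each tail, rounded down
    return statistics.mean(ordered[k:len(ordered) - k])

# Hypothetical salaries in USD thousands, with one extreme value.
salaries = [60, 70, 80, 90, 100, 110, 120, 150, 200, 1000]

plain = statistics.mean(salaries)        # every value counts, including 1000
trimmed = trimmed_mean(salaries, 0.10)   # drops 60 and 1000, averages the other eight
middle = statistics.median(salaries)     # only the two central values matter
print(plain, trimmed, middle)
```

On this data the trimmed mean lands between the mean and the median, which is the compromise described above.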

Q: Why do news articles quote the average salary instead of the median?

For income, the average is almost always larger than the median because the distribution is right-skewed, so the average makes the more flattering headline. For describing what a typical person earns, the median is the honest choice, which is why the US Bureau of Labor Statistics, levels.fyi, and Glassdoor publish medians.

Q: How do I compute the median in pandas?

`df['column'].median()` returns the column median. `df.groupby('segment')['value'].median()` returns the median per group. For arbitrary percentiles use `df['column'].quantile(0.95)`, or a list: `df['column'].quantile([0.5, 0.9, 0.99])`.
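
A small runnable version of those calls, using a made-up DataFrame. The column names and values are placeholders chosen for the example.

```python
import pandas as pd

# Hypothetical data: order values for two customer segments.
df = pd.DataFrame({
    "segment": ["free", "free", "free", "paid", "paid", "paid"],
    "value":   [10, 20, 300, 40, 50, 60],
})

overall_median = df["value"].median()                   # middle of all six values
per_segment = df.groupby("segment")["value"].median()   # one median per segment
tail = df["value"].quantile([0.5, 0.9])                 # several percentiles at once

print(overall_median)
print(per_segment)
print(tail)
```

Both `median()` and `quantile()` skip missing values by default, so the result describes the non-null rows only.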

Q: What if my data has multiple peaks?

A bimodal distribution is where neither mean nor median tells the whole story: both can land in the valley between two peaks, describing a "typical" value almost no one has. Detect it with a histogram or Hartigan's dip test, then segment. Once each segment is roughly unimodal, the mean or median becomes useful on the subpopulations.
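
The valley effect is easy to reproduce with toy numbers. The two groups below are invented for the example, and the segmentation is assumed to be known in advance, which real data rarely grants.

```python
import statistics

# Hypothetical session lengths in minutes: quick checkers and long-form readers.
quick = [1, 2, 2, 3, 3, 4]
long_form = [56, 57, 57, 58, 58, 59]
combined = quick + long_form

overall_mean = statistics.mean(combined)
overall_median = statistics.median(combined)

# Distance from the overall median to the nearest real observation.
closest = min(abs(x - overall_median) for x in combined)

# After segmenting, each median describes its own group well.
per_segment = {
    "quick": statistics.median(quick),
    "long_form": statistics.median(long_form),
}
print(overall_mean, overall_median, closest, per_segment)
```

Both overall statistics come out at 30 minutes, yet no session is within 26 minutes of that value. The per-segment medians (2.5 and 57.5) sit inside their own peaks.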

Contents:

Why this matters on a Monday morning
What the mean actually measures
What the median actually measures
Side-by-side comparison
Worked examples from product analytics
When to reach for mean
When to reach for median
Median in SQL across dialects
Common pitfalls
Interview answers that actually land
Related reading
FAQ

Why this matters on a Monday morning

A PM at Stripe pings you: "What is the average transaction amount this quarter? Board meeting in an hour." You ship AVG(amount) and report $1,840. An hour later the CFO replies that merchant-success has been reporting a typical ticket of $42. You are both right and both useless — the mean was inflated by seven-figure enterprise wires; merchant-success was eyeballing the median.

This is the median-versus-mean moment, and it shows up in every analyst loop at DoorDash, Uber, Airbnb, Notion, Linear, Snowflake, and Vercel. Interviewers test whether you reach for the right statistic when the distribution is messy, defend the choice to a non-technical stakeholder, and write the SQL across PostgreSQL, BigQuery, and ClickHouse without fumbling syntax. Short version: mean wins on symmetric data or when you care about totals; median wins on skewed or contaminated data — most of product analytics.

What the mean actually measures

The arithmetic mean is the sum of values divided by the count: the balance point of the data. It uses every observation, which is its strength and its weakness in one. Because every value contributes, the mean is the building block for variance, t-tests, and the rest of the parametric machinery. For the same reason, any extreme value drags the mean toward itself, in proportion to how far the outlier sits from the rest.

Team salaries (USD thousands): 80, 90, 100, 110, 120
Mean = (80 + 90 + 100 + 110 + 120) / 5 = 100

Add one big number and the mean lurches.

Team salaries (USD thousands): 80, 90, 100, 110, 500
Mean = (80 + 90 + 100 + 110 + 500) / 5 = 176

The mean jumped from 100 to 176 because of a single observation. Four of five people make 110 or less, but the "average salary" reads 176. Put that on a job-description page and the candidate who joins expecting 176 quits inside the year.

What the median actually measures

The median is the middle value of sorted data. Odd count: the value at position (n+1)/2. Even count: the average of the two middle values. The median ignores how far the extremes sit from the middle — it only cares about ordering.

Salaries: 80, 90, 100, 110, 500
Median = 100 (the third of five)

Salaries: 80, 90, 100, 110
Median = (90 + 100) / 2 = 95

Replace 500 with 50,000 and the median still sits at 100. That is what statisticians mean by robust: anything short of half the data can be arbitrarily wrong and the statistic stays sensible. Robust is the right default whenever you do not control the data-generating process, which in product analytics is always.
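
The same arithmetic can be checked with Python's standard library. This is only a sketch that reuses the salary figures above; the variable names are arbitrary.

```python
import statistics

others = [80, 90, 100, 110]          # the four salaries that never change

results = {}
for top in (120, 500, 50_000):       # the fifth salary, increasingly extreme
    salaries = others + [top]
    results[top] = (statistics.mean(salaries), statistics.median(salaries))

for top, (mean, median) in results.items():
    print(f"top={top:>6}  mean={mean:>6}  median={median}")
```

The mean goes from 100 to 176 to 10,076 as the top salary grows, while the median stays at 100 in all three cases.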

Side-by-side comparison

	Mean	Median
Formula	Sum / count	Middle of sorted values
Outlier sensitivity	High	Low
Skewed distributions	Pulled toward the tail	Stays near the bulk
Math property	Minimizes squared error	Minimizes absolute error
SQL	`AVG(x)`	`PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY x)`
Pandas	`df['x'].mean()`	`df['x'].median()`
Reconstructs totals	Yes (mean × count = sum)	No
Reports a typical value	Only when symmetric	Yes

The line interviewers wait for: mean answers "how much in total per unit"; median answers "what does a typical unit look like." Both are valid, neither replaces the other.
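
The "Math property" row can be verified by brute force. The sketch below tries every whole number from 0 to 600 as a single summary value for the salary data used earlier and keeps the one with the smallest total error under each definition. The search range and function names are assumptions made for the example.

```python
salaries = [80, 90, 100, 110, 500]

def squared_error(c):
    """Total squared distance from candidate c to every salary."""
    return sum((x - c) ** 2 for x in salaries)

def absolute_error(c):
    """Total absolute distance from candidate c to every salary."""
    return sum(abs(x - c) for x in salaries)

candidates = range(0, 601)  # every whole number from 0 to 600
best_squared = min(candidates, key=squared_error)
best_absolute = min(candidates, key=absolute_error)
print(best_squared, best_absolute)
```

The squared-error winner is 176, the mean, and the absolute-error winner is 100, the median. Squaring makes the 500 expensive to miss, so the mean moves toward it. Absolute error charges the same per unit of distance, so the middle value wins.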

Worked examples from product analytics

Average order value vs typical order value

-- Average (mean) vs typical (median) order value
SELECT
    AVG(amount_usd) AS aov,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount_usd) AS median_order
FROM orders WHERE created_at >= DATE '2026-01-01';
-- aov: $234.10  median_order: $89.00

A mean of $234 and a median of $89 differ by more than two and a half times. The gap is not a bug — the distribution has a long right tail. Enterprise contracts pull the mean up; the typical buyer sits near $89. Finance wanting total revenue gets the mean; product wanting to know what most checkout pages look like gets the median.

API latency on a payments endpoint

SELECT
    AVG(latency_ms) AS mean_ms,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY latency_ms) AS p50,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99
FROM request_logs
WHERE endpoint = '/v1/charges' AND ts >= NOW() - INTERVAL '1 hour';
-- mean_ms: 3,180  p50: 780  p95: 2,400  p99: 11,900

The mean of 3,180 ms looks like a five-alarm fire. The p50 of 780 ms says the typical request is healthy. The p99 of 11,900 ms is the real story: one in a hundred requests waits roughly twelve seconds or longer, which is where retries, timeouts, and angry tickets come from. A team monitoring only the mean either panics for the wrong reason or smooths the tail into invisibility. For deeper percentile work see Tail latency percentiles in SQL.

Company-wide compensation

A leadership update says "average compensation is $190,000." The CEO package is $4,800,000 and the engineering median is $172,000. The headline mean is technically true and substantively misleading. Cleaner framing reports both numbers and explains the gap — which is why levels.fyi and Glassdoor publish medians.

When to reach for mean

Use the mean when the distribution is roughly symmetric, when outliers have been audited and accepted, when parametric machinery will consume the number, or when the question is about totals. The mean is the statistic that reconstructs totals: mean(x) * count(x) = sum(x), so finance asking for revenue gets AVG(amount) * COUNT(amount) and the arithmetic closes. Use COUNT(amount) rather than COUNT(*): AVG skips NULLs, so multiplying by a row count that includes them overstates the total.

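import pandas as pd

# df is assumed to be an existing DataFrame with these columns.
# Mean times count reconstructs the total. Use .count(), not len(df):
# .mean() skips NaN, so the count has to skip the same rows.
total_revenue = df['revenue_usd'].mean() * df['revenue_usd'].count()

# Mean is appropriate when the distribution is symmetric
df['adult_height_cm'].mean()  # mean and median nearly identical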

The mean is also the right input to t-tests, ANOVA, linear regression, and most variance-reduction techniques. If your pipeline expects a sample mean — when computing standard errors or running CUPED variance reduction — substituting the median breaks the math.

When to reach for median

Use the median when the distribution is skewed, when outliers are present and you cannot drop them, when the audience is non-technical, or when the metric drives an SLO. Skewed product metrics include order amounts, session durations, time between purchases, latency, and engagement — anything bounded below by zero with no upper bound. The mean lies on these; the median tells you what users actually experience.

# Median for typical session duration
median_session = df['session_seconds'].median()

# Quantiles for the full distribution
percentiles = df['load_time_ms'].quantile([0.25, 0.5, 0.75, 0.95, 0.99])

The median pairs naturally with other percentiles. A common dashboard pattern reports p25, p50, p75, p95, p99 side by side — five numbers describing both bulk and tail. The mean alone gives you neither.

Median in SQL across dialects

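PostgreSQL, Snowflake, Redshift, and Databricks support the ANSI ordered-set aggregate. BigQuery exposes PERCENTILE_CONT only as a window function with a different argument order, and ClickHouse exposes shorter sugar.

-- PostgreSQL, Snowflake, Redshift, Databricks
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount_usd) FROM orders;

-- BigQuery: window function, so collapse the repeated value with DISTINCT
SELECT DISTINCT PERCENTILE_CONT(amount_usd, 0.5) OVER () FROM orders;

-- ClickHouse: median() is an alias of quantile(), which samples and can be approximate on large inputs
SELECT median(amount_usd) FROM orders;
-- ClickHouse: exact, but returns an observed value without interpolating
SELECT quantileExact(0.5)(amount_usd) FROM orders;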

MySQL 8.0 lacks a built-in percentile, so simulate it with window functions. The pattern handles odd and even counts.

-- MySQL 8.0+
SELECT AVG(amount_usd) AS median_amount
FROM (
    SELECT amount_usd,
           ROW_NUMBER() OVER (ORDER BY amount_usd) AS rn,
           COUNT(*) OVER () AS cnt
    FROM orders
) ranked
WHERE rn IN (FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2));

PERCENTILE_CONT interpolates between adjacent values — right for continuous metrics like latency or revenue. PERCENTILE_DISC returns an observed value, useful for ordinal data. The ordered-set aggregate sorts, so it is expensive on huge tables; APPROX_PERCENTILE in Snowflake, APPROX_QUANTILES in BigQuery, and quantileTDigest in ClickHouse return near-correct answers in a fraction of the time.
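
To make the interpolation difference concrete, here is a minimal Python sketch of the two definitions as they are usually stated for ordered-set aggregates. The function names mirror the SQL ones for readability only. NULL handling and engine-specific edge cases are not modeled, so treat it as an approximation of warehouse behavior, not a reference implementation.

```python
import math

def percentile_cont(values, fraction):
    """Interpolate linearly between the two neighbouring sorted values."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)   # zero-based fractional row
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])

def percentile_disc(values, fraction):
    """Return the first observed value whose cumulative share reaches the fraction."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))   # one-based row
    return ordered[rank - 1]

salaries = [80, 90, 100, 110]   # even count, so the two definitions disagree
cont = percentile_cont(salaries, 0.5)
disc = percentile_disc(salaries, 0.5)
print(cont, disc)
```

With four rows, the continuous version returns 95.0, a value nobody earns, while the discrete version returns 90, which is an actual row. With an odd count the two agree at the median.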

Common pitfalls

The first pitfall is reporting the mean of a skewed distribution and treating it as typical. This is how a startup advertises a $234 AOV, recruits sales to chase enterprise accounts, and discovers six months later that more than half of orders were under $90 and the funnel was tuned for the wrong segment. Pair mean with median in any non-statistical report and flag distributions where the two diverge by more than 20 percent.
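
One possible way to automate that flag is sketched below. The helper names are hypothetical, and measuring the gap relative to the median is an assumption, since the 20 percent rule does not say which statistic is the baseline. The check also assumes a non-zero median.

```python
import statistics

def mean_median_gap(values):
    """Gap between mean and median, as a fraction of the median (median must be non-zero)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return (mean - median) / median

def needs_both(values, threshold=0.20):
    """True when mean and median differ by more than the threshold."""
    return abs(mean_median_gap(values)) > threshold

symmetric = [80, 90, 100, 110, 120]
skewed = [80, 90, 100, 110, 500]

print(needs_both(symmetric), needs_both(skewed))
```

A positive gap points to a right tail and a negative gap to a left tail, so the sign is worth surfacing alongside the flag.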

The second pitfall is mixing distributions before averaging. Computing mean session duration across all users on a free-plus-paid product gives a number that is neither the free experience nor the paid experience. Segment first, report per-segment statistics, then compute a weighted overall mean if anyone really wants one number. Same trap across countries, device types, and channels.

The third pitfall is forgetting that the median ignores magnitude. If refunds jump from 1 to 5 percent of transactions, the median refund amount may not move at all while total liability balloons, and the mean refund climbs with it if the new refunds are larger. Robust does not mean omniscient: summarize the bulk with the median, but track totals and means in parallel for anything where magnitude is the business risk.
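
A toy version of the refund scenario, with invented amounts, shows how the median can sit still while the total moves:

```python
import statistics

# Hypothetical refund amounts in USD.
last_week = [20, 25, 30, 35, 40]
this_week = [20, 25, 30, 30, 30, 35, 40, 2_000, 5_000]   # more refunds, two very large

summary = {
    name: {
        "median": statistics.median(rows),
        "mean": statistics.mean(rows),
        "total": sum(rows),
    }
    for name, rows in {"last_week": last_week, "this_week": this_week}.items()
}
print(summary)
```

The median refund is 30 in both weeks, while the total rises from 150 to 7,210. A dashboard showing only the median would report no change.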

The fourth pitfall is computing the median over the wrong unit of analysis. The median of all clicks is not the median CTR per user, and the median of all sessions is not the median user. Aggregate to the user level first, then take the median across users — otherwise you end up with a power-user-weighted statistic dressed as a typical-user statistic. Same bug breaks naive conversion funnel calculations.
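
The difference between the two units of analysis shows up even on a tiny log. The users and durations below are made up, and collapsing each user to their own median is just one reasonable choice of per-user summary.

```python
import statistics
from collections import defaultdict

# Hypothetical session log: (user_id, session_minutes). User "a" is a power user.
sessions = [
    ("a", 30), ("a", 32), ("a", 34), ("a", 36), ("a", 38),
    ("b", 4),
    ("c", 6),
]

# Median over sessions: the power user supplies five of the seven rows.
median_session = statistics.median(minutes for _, minutes in sessions)

# Median over users: collapse to one number per user first, then take the median.
by_user = defaultdict(list)
for user_id, minutes in sessions:
    by_user[user_id].append(minutes)
per_user = {user_id: statistics.median(rows) for user_id, rows in by_user.items()}
median_user = statistics.median(per_user.values())

print(median_session, median_user)
```

The session-level median is 32 minutes, which describes user "a". The user-level median is 6 minutes, which is closer to what two of the three users experience.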

The fifth pitfall is averaging averages. The mean of three monthly averages does not equal the overall mean unless months had equal counts; the median of three medians is not the overall median. Recompute from raw rows when your window or segment changes.
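
A short sketch with unequal month sizes makes the mismatch visible. The order values are invented, and the weighted mean is included to show the one case where pre-aggregated means can be combined safely.

```python
import statistics

# Hypothetical order values by month; January is far busier than the other two.
months = {
    "jan": [10, 10, 10, 10, 10, 10, 10, 10],
    "feb": [100],
    "mar": [40],
}
all_rows = [x for rows in months.values() for x in rows]

mean_of_means = statistics.mean(statistics.mean(rows) for rows in months.values())
overall_mean = statistics.mean(all_rows)

# Weighting each monthly mean by its row count recovers the overall mean.
weighted_mean = sum(statistics.mean(rows) * len(rows) for rows in months.values()) / len(all_rows)

median_of_medians = statistics.median(statistics.median(rows) for rows in months.values())
overall_median = statistics.median(all_rows)

print(mean_of_means, overall_mean, weighted_mean, median_of_medians, overall_median)
```

The unweighted mean of means is 50 against a true mean of 22, and the median of medians is 40 against a true median of 10. Means can be repaired with counts. Medians have no such shortcut, which is why the raw rows are needed.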

Interview answers that actually land

When is the median better than the mean? When the distribution is skewed or contains outliers — salaries, order amounts, latency, session duration. The median reports a typical value that survives a long tail; the mean is pulled toward extremes. Default to median in product analytics.

For e-commerce AOV, mean or median? Both, depending on the question. Finance reconstructing revenue needs the mean (mean × count = revenue). A product team sizing carts needs the median, because that is what most carts look like. Report both side by side.

How are mean and median related under a normal distribution? Equal — along with the mode — because the normal is symmetric and unimodal. A meaningful gap signals skew: mean > median is a long right tail, mean < median is a long left tail. The cheapest skewness diagnostic on a dashboard.

What is robust statistics? A family of methods that stay accurate when assumptions break. The median is the canonical robust statistic. Other tools: trimmed mean, winsorized mean, median absolute deviation, Huber loss, Mann-Whitney. Robust methods trade efficiency under ideal conditions for much better behavior when conditions are messy.

Which statistic monitors API latency? Percentiles, not the mean. p50 for the typical user, p95 and p99 for the tail, p99.9 for one in a thousand. SLOs are written against percentiles because user experience is a distribution, not a point.

If you want to drill these statistics interview questions daily — distribution choice, percentile SQL, robust methods — NAILDD ships a library of analyst interview problems built around this kind of judgment call.

FAQ

What is a trimmed mean?