How to calculate AOV in SQL
Contents:
What AOV is and why analysts get it wrong
AOV — Average Order Value — sits underneath every e-commerce P&L. It is one of three levers that move revenue in any marketplace or DTC business, and the decomposition is so well-worn that you will hear it in the first ten minutes of a Stripe, DoorDash, or Amazon interview: Revenue = Traffic × Conversion Rate × AOV. If revenue moves and you do not know why, the first question is whether the average ticket changed.
The trap is that a naive AVG(total) answers the question wrong as often as right. A single whale with a five-figure order on Black Friday can shove the monthly mean up by thirty percent without a single behavioral change in the funnel. Order-amount distributions are heavy-tailed in almost every category — fashion, grocery, electronics, travel — and the mean drifts with the tail rather than the typical customer. Analysts who only ship the average get embarrassed in the metric review when someone asks for the median and the numbers diverge by a factor of three. This post walks through the SQL end to end, runnable in Postgres with minor tweaks for Snowflake, BigQuery, or Redshift.
Formulas and the data model
Total revenue from completed orders divided by the count of those orders. Median and P95 round out the picture.
AOV = Total Revenue / Total Orders
median = PERCENTILE_CONT(0.5)
P95 = PERCENTILE_CONT(0.95)When the mean is much larger than the median, the distribution is skewed by big orders and the mean is no longer describing the typical customer. Rule of thumb: if mean exceeds median by more than thirty percent, lead with median and treat the mean as the heavy-tail tracker.
For every query below, assume an orders fact and a users dimension. Currency is normalized to USD upstream.
orders (order_id, user_id, total, discount, status, created_at, category)
users (user_id, registered_at, attribution_channel, platform)orders.status includes at least paid, cancelled, and refunded. total is post-discount; gross order value is total + discount.
Eleven SQL queries you will actually ship
1. Base AOV for a single window
SELECT
COUNT(*) AS orders_cnt,
SUM(total)::NUMERIC AS revenue,
AVG(total)::NUMERIC AS aov
FROM orders
WHERE status = 'paid'
AND created_at >= '2026-01-01'
AND created_at < '2026-02-01';The status filter excludes cancellations and refunds. The ::NUMERIC cast prevents the Postgres integer-division trap that rounds AOV toward zero when total is stored as cents.
2. AOV with median and P95
The mean alone is a half-answer. Median and P95 in the same query force every dashboard consumer to see the shape of the distribution.
SELECT
AVG(total)::NUMERIC AS aov_mean,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total) AS aov_median,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total) AS p95
FROM orders
WHERE status = 'paid';If aov_mean runs noticeably above aov_median, the catalog has a heavy tail dragging the average up. On BigQuery, swap PERCENTILE_CONT for APPROX_QUANTILES.
3. AOV by month
The dashboard staple. A clean monthly series tells you whether AOV is trending or oscillating around a stable mean.
SELECT
DATE_TRUNC('month', created_at)::DATE AS month,
COUNT(*) AS orders,
SUM(total)::NUMERIC AS revenue,
AVG(total)::NUMERIC AS aov
FROM orders
WHERE status = 'paid'
GROUP BY 1
ORDER BY 1;Shopify and Stripe dashboards usually overlay a rolling three-month average on top of the raw line to dampen seasonality noise.
4. AOV by category
Mixing categories in one AOV is rarely useful. Grocery and electronics sit on completely different distributions, and the blended mean answers no real question.
SELECT
category,
COUNT(*) AS orders,
AVG(total)::NUMERIC AS aov,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total) AS median_aov
FROM orders
WHERE status = 'paid'
GROUP BY category
ORDER BY aov DESC;Aggregate AOV without a category breakdown is a Simpson's-paradox trap — a shift in mix moves the blended number even when no category changed internally.
5. AOV by acquisition channel
Marketing teams live on this view. Join orders to users and group by channel.
SELECT
u.attribution_channel,
COUNT(*) AS orders,
AVG(o.total)::NUMERIC AS aov
FROM orders o
JOIN users u USING (user_id)
WHERE o.status = 'paid'
GROUP BY u.attribution_channel
ORDER BY aov DESC;Organic and direct traffic usually post a higher AOV than paid acquisition — those users came with intent. If your paid channels post the highest AOV, audit the attribution model before celebrating.
6. AOV for new versus returning buyers
A user's first purchase usually differs from their fifth. Repeat buyers have absorbed the checkout friction and tend to add more items.
WITH user_orders AS (
SELECT
user_id,
total,
created_at,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS order_num
FROM orders
WHERE status = 'paid'
)
SELECT
CASE WHEN order_num = 1 THEN 'new' ELSE 'returning' END AS buyer_type,
COUNT(*) AS orders,
AVG(total)::NUMERIC AS aov
FROM user_orders
GROUP BY 1;In a healthy retention curve, returning AOV is materially higher. When the gap closes or inverts, a promotion is usually pulling in high-intent new buyers.
7. AOV by decile segment
To see whether AOV is driven by the top of the pyramid or the long middle, slice users by lifetime spend into deciles.
WITH user_ltv AS (
SELECT user_id, SUM(total) AS ltv
FROM orders
WHERE status = 'paid'
GROUP BY user_id
),
user_tiers AS (
SELECT
user_id,
ltv,
NTILE(10) OVER (ORDER BY ltv DESC) AS decile
FROM user_ltv
)
SELECT
ut.decile,
COUNT(o.order_id) AS orders,
AVG(o.total)::NUMERIC AS aov
FROM orders o
JOIN user_tiers ut USING (user_id)
WHERE o.status = 'paid'
GROUP BY ut.decile
ORDER BY ut.decile;The top decile usually posts an AOV two to three times the global mean. A flattening curve is a sign that the high-value segment is decaying or being diluted.
8. Cohort AOV
Do users from one acquisition month spend a different average ticket than users acquired six months later?
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(created_at))::DATE AS cohort_month
FROM orders
WHERE status = 'paid'
GROUP BY user_id
)
SELECT
c.cohort_month,
DATE_TRUNC('month', o.created_at)::DATE AS order_month,
COUNT(*) AS orders,
AVG(o.total)::NUMERIC AS aov
FROM cohorts c
JOIN orders o USING (user_id)
WHERE o.status = 'paid'
GROUP BY c.cohort_month, DATE_TRUNC('month', o.created_at)
ORDER BY c.cohort_month, order_month;In a healthy product, AOV climbs as a cohort matures. A flat or declining cohort AOV is a monetization signal worth investigating before the next pricing review.
9. Gross versus net AOV
Discounts sit between catalog price and the cash that arrived.
SELECT
AVG(total)::NUMERIC AS aov_net,
AVG(total + discount)::NUMERIC AS aov_gross,
AVG(discount)::NUMERIC AS avg_discount
FROM orders
WHERE status = 'paid';Gross AOV reflects catalog pricing; net AOV is the version finance signs off on. When the gap widens, somebody on growth is buying revenue with margin.
10. Month-over-month AOV dynamics
LAG pulls the prior month onto the same row so the percent change is a single expression.
WITH monthly AS (
SELECT
DATE_TRUNC('month', created_at)::DATE AS month,
AVG(total)::NUMERIC AS aov
FROM orders
WHERE status = 'paid'
GROUP BY 1
)
SELECT
month,
aov,
LAG(aov) OVER (ORDER BY month) AS prev_month_aov,
ROUND(100.0 * (aov - LAG(aov) OVER (ORDER BY month))
/ NULLIF(LAG(aov) OVER (ORDER BY month), 0), 2) AS mom_pct
FROM monthly
ORDER BY month;The query I reach for when somebody pings on Slack with "did AOV move this month?" — three columns and the answer is on screen before the BI tool finishes loading.
11. Revenue per buyer
AOV alone does not tell you how much each buyer is worth. Combine it with order frequency.
WITH buyer_stats AS (
SELECT
user_id,
COUNT(*) AS orders_per_buyer,
AVG(total)::NUMERIC AS aov_per_buyer,
SUM(total)::NUMERIC AS rev_per_buyer
FROM orders
WHERE status = 'paid'
GROUP BY user_id
)
SELECT
AVG(orders_per_buyer)::NUMERIC AS avg_orders_per_buyer,
AVG(aov_per_buyer)::NUMERIC AS avg_aov,
AVG(rev_per_buyer)::NUMERIC AS avg_revenue_per_buyer
FROM buyer_stats;This is the bridge between AOV and LTV. If AOV is healthy but orders-per-buyer is collapsing, the problem is retention, not ticket size.
Common pitfalls
The first trap is including cancelled or refunded orders. A naive AVG(total) FROM orders pulls in cancellations and inflates AOV whenever cancelled rows carry a non-zero total. The fix is a strict status = 'paid' filter at the bottom of every query. Inspect the status values once and bake the right filter into the analytics layer.
The second trap is confusing AOV with revenue per user. AOV is revenue divided by orders. Revenue per user is revenue divided by buyers; ARPU is revenue divided by the full user base. Three metrics, three questions — dashboards that label them interchangeably end up in a post-mortem.
The third trap is reporting the mean without the median. Order-value distributions are almost always heavy-tailed, and the mean drifts with the tail rather than the typical customer. When mean and median diverge by more than thirty percent, the headline metric should be the median, and the gap itself becomes a tracked number.
The fourth trap is not normalizing currency. International marketplaces book orders in many currencies, and a raw AVG(total) is meaningless across them. Convert to a single base currency at the order's create-time exchange rate and run AOV on the normalized column.
The fifth trap is mixing categories or regions. Grocery, electronics, and apparel sit on completely different distributions. Always produce AOV by the dimensions that drive the variance — category, region, channel — and treat the global number as a tracking metric, not a decision input.
Optimization tips
Past a few hundred million orders, three things start to matter. First, partition the orders table by created_at — monthly partitions are the standard. Every AOV query is naturally range-bounded by date, so partition pruning collapses the scan to a couple of months.
Second, materialize the monthly aggregate. The section 3 query is the perfect nightly materialized view — read constantly, identical across most dashboards, and recomputing from scratch on every load is pure waste. In Snowflake or BigQuery, the same logic lives in a scheduled query writing to a small reporting table.
Third, watch the percentile computation. PERCENTILE_CONT is exact and expensive. For trending dashboards, swap to APPROX_PERCENTILE (Snowflake), APPROX_QUANTILES (BigQuery), or tdigest (Postgres). Exact percentiles belong on finance-grade reports; approximate ones are perfectly safe for product analytics.
A Python sanity check
When a SQL number looks suspicious, the fastest debugging move is to recompute from a pandas dataframe and compare.
paid = orders.loc[orders["status"] == "paid", "total"]
aov_mean = paid.mean()
aov_median = paid.median()If pandas and SQL disagree, the bug is almost always in the SQL — usually a missed join condition, a timezone-shifted date filter, or the integer-division trap. The pandas version is also the right place to sketch the order-value histogram before deciding which percentile belongs on the dashboard.
Related reading
- How to calculate ARPU in SQL — the per-user counterpart to AOV with the same decomposition logic.
- SQL window functions interview questions — the
LAGandNTILEpatterns from the cohort and decile queries, plus a dozen more drills. - A/B testing peeking mistake — what to watch for when AOV is the success metric of an experiment.
If you want to drill SQL questions like this daily, NAILDD is launching with hundreds of analytics SQL problems built around this pattern.
FAQ
AOV or ARPU — which one should I report?
Both, almost always. AOV measures the average ticket of an order and is the right number for pricing, packaging, and basket-size decisions. ARPU measures revenue per user across the entire base and is the right number for unit economics and channel comparisons. A product can post flat ARPU while AOV climbs simply because users order less often, and that trade-off is exactly what a metric review needs to surface.
What is a normal AOV?
It depends entirely on category. Grocery delivery typically lands in the 30–60 USD range, apparel sits closer to 80–150 USD, and consumer electronics easily clears several hundred. The only rule that travels: if your mean exceeds your median by more than thirty percent, the headline number should be the median.
Should I lead with mean or median?
Lead with whichever is more honest given the distribution. If mean and median are within ten percent of each other, the mean is fine. If they diverge significantly, lead with the median and treat the mean as the heavy-tail tracker. Showing only one number and quietly hiding the other is how a metric review turns into a debate about definitions.
Is AOV a weighted average?
Yes. AOV is the equally weighted mean of every paid order — one row, one observation. A buyer with ten small orders contributes ten times as much weight as a buyer with one large order. If the real question is "what is the typical buyer's order size?", compute AVG(total) per user_id first, then average those user-level means.
How does AOV connect to LTV?
Loosely, LTV ≈ AOV × orders_per_buyer × margin / churn. The formula is more useful as a decomposition than a prediction: when LTV moves, walk through the four levers in order. Query 11 bridges AOV and revenue per buyer, the cleanest stepping stone toward LTV.
How do I compare AOV across channels fairly?
Fix the cohort. Comparing channels using all-time users on each side mixes channel quality with cohort age — paid channels usually have younger users than organic, and younger users have not upgraded their basket yet. Anchor every comparison to a fixed registration window so channel quality, not cohort maturity, is what shows up in the chart.