How to calculate Basket Size in SQL

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

What Basket Size is and why retail teams ask about it

Basket Size is the average number of items per order, and it sits one layer below AOV in the standard e-commerce decomposition. The chain analysts memorize before their first DoorDash or Amazon interview is short: AOV = Basket Size × Average Item Price. Move either lever and the average ticket moves, which is why quarterly reviews at Stripe, Uber Eats, or a marketplace inside Meta start with a Basket Size slide before anyone debates pricing.

The trap is that the metric looks trivial to compute and isn't. SUM(quantity) / COUNT(DISTINCT order_id) looks like a one-liner, but what counts as a single item is the hard part. Three units of the same SKU is three items in some categories and one in others. A bundle is one SKU containing five units; the right count depends on whether you are reasoning about logistics, pricing, or recommendations. The query is short; the judgment behind it is what interviewers want to see. The post walks through the SQL end to end, runnable in Postgres with minor tweaks for Snowflake, BigQuery, or Redshift.

Formulas and the data model

Total units sold divided by the count of orders that produced them. That is the headline number; everything else is a slice.

Basket Size = total_items / total_orders
AOV         = Basket Size × Average Item Price

When Basket Size climbs and item price stays flat, customers are adding more units — usually a sign that bundling, recommendations, or free-shipping thresholds are working. When the two move in opposite directions, headline AOV can stay flat while behavior shifts, and the metric review needs both numbers on screen.

For every query below, assume an order-items fact and a couple of dimensions. Currency is normalized to USD upstream.

order_items (order_id, product_id, quantity, unit_price, total, order_date)
orders      (order_id, user_id, status, created_at)
users       (user_id, country, registered_at, segment)
products    (product_id, category, is_bundle, bundle_unit_count)

orders.status includes at least paid, cancelled, and refunded. order_items.quantity is the unit count for a single SKU on a single order; unit_price is per-unit and post-discount. The product dimension carries the bundle flag and unit count so the SKU-versus-item question has a single answer at query time.

Eight SQL queries you will actually ship

1. Base Basket Size for a single window

SELECT
    COUNT(DISTINCT order_id)                AS orders,
    SUM(quantity)                           AS items,
    SUM(quantity)::NUMERIC
        / NULLIF(COUNT(DISTINCT order_id), 0) AS basket_size,
    AVG(unit_price)::NUMERIC                AS avg_item_price
FROM order_items
WHERE order_date >= '2026-01-01'
  AND order_date <  '2026-02-01';

NULLIF guards against divide-by-zero on empty windows; ::NUMERIC saves you from the Postgres integer-division trap that rounds Basket Size toward whole numbers when quantity is stored as an integer.

2. Basket Size by month

The dashboard staple. A clean monthly series tells you whether customers are growing the cart over time or holding flat.

SELECT
    DATE_TRUNC('month', order_date)::DATE     AS month,
    COUNT(DISTINCT order_id)                  AS orders,
    SUM(quantity)                             AS items,
    SUM(quantity)::NUMERIC
        / NULLIF(COUNT(DISTINCT order_id), 0) AS basket_size,
    AVG(unit_price)::NUMERIC                  AS avg_item_price
FROM order_items
WHERE order_date >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY 1
ORDER BY 1;

Overlay a rolling three-month average before showing this to anyone outside analytics — month-to-month Basket Size swings ten to fifteen percent in any seasonal category, and the eye reads noise as signal without smoothing.

3. Basket Size by segment

Mixing countries or customer segments in one Basket Size is almost never the right call. A US grocery cart and a European apparel cart sit on completely different distributions, and the blended mean answers no real question.

SELECT
    u.country,
    COUNT(DISTINCT oi.order_id)                 AS orders,
    SUM(oi.quantity)::NUMERIC
        / NULLIF(COUNT(DISTINCT oi.order_id), 0) AS basket_size,
    AVG(oi.unit_price)::NUMERIC                 AS avg_item_price,
    AVG(oi.total)::NUMERIC                      AS aov
FROM order_items oi
JOIN orders o USING (order_id)
JOIN users  u USING (user_id)
WHERE o.status = 'paid'
  AND o.created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY u.country
ORDER BY basket_size DESC;

If AOV climbs but basket_size is flat, price per unit moved — usually a pricing or mix shift. If basket_size climbs without an AOV move, customers are adding cheaper units, a different problem with a different fix.

4. Basket Size distribution

The mean lies as often as not. A histogram of items-per-order is the artifact that answers "what does a typical cart look like?".

WITH basket_per_order AS (
    SELECT
        order_id,
        SUM(quantity) AS items
    FROM order_items
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY order_id
)
SELECT
    CASE
        WHEN items = 1               THEN '1 item'
        WHEN items = 2               THEN '2 items'
        WHEN items BETWEEN 3 AND 5   THEN '3-5 items'
        WHEN items BETWEEN 6 AND 10  THEN '6-10 items'
        ELSE '10+ items'
    END                                                AS bucket,
    COUNT(*)                                           AS orders,
    ROUND(COUNT(*)::NUMERIC * 100
              / SUM(COUNT(*)) OVER (), 2)              AS pct
FROM basket_per_order
GROUP BY 1
ORDER BY MIN(items);

A healthy apparel store parks most mass at one to two items with a thin tail. A healthy grocery service looks completely different — almost nothing at one item, a fat cluster at ten to twenty, and a tail past forty. Bucket boundaries belong to the category, not the SQL template.

5. Basket Size by order-value decile

To see whether basket growth concentrates at the top of the pyramid or spreads across the middle, slice orders by value into deciles.

WITH per_order AS (
    SELECT
        order_id,
        SUM(quantity)    AS items,
        SUM(total)       AS order_value
    FROM order_items
    WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'
    GROUP BY order_id
),
tiered AS (
    SELECT
        order_id,
        items,
        order_value,
        NTILE(10) OVER (ORDER BY order_value) AS value_decile
    FROM per_order
)
SELECT
    value_decile,
    AVG(items)::NUMERIC          AS avg_basket,
    MIN(order_value)::NUMERIC    AS decile_min_value,
    MAX(order_value)::NUMERIC    AS decile_max_value
FROM tiered
GROUP BY value_decile
ORDER BY value_decile;

Materializing NTILE in a CTE first is what makes this safe in a GROUP BY. Inlining the window function inside a top-level GROUP BY is the classic interview trap — works in some engines, fails in others.

6. Basket composition for recommendations

Headline Basket Size tells you how many items go into a cart. The composition query tells you which categories travel together — the input every recommender starts with.

WITH order_categories AS (
    SELECT DISTINCT
        oi.order_id,
        p.category
    FROM order_items oi
    JOIN products p USING (product_id)
)
SELECT
    a.category                                   AS category_a,
    b.category                                   AS category_b,
    COUNT(*)                                     AS co_purchase_orders
FROM order_categories a
JOIN order_categories b
  ON a.order_id = b.order_id
 AND a.category < b.category
GROUP BY a.category, b.category
ORDER BY co_purchase_orders DESC
LIMIT 20;

The a.category < b.category predicate deduplicates symmetric pairs so the result is the upper triangle of the co-purchase matrix. Top pairs are the bundles a merchandiser tests first; the long tail is where a learned recommender beats hand-crafted rules.

7. Gross versus net Basket Size

Naive Basket Size includes everything that went into a cart. The net version subtracts returns and tells you what the customer actually kept.

WITH per_order AS (
    SELECT
        oi.order_id,
        SUM(oi.quantity)                                       AS gross_items,
        SUM(oi.quantity)
            - COALESCE(SUM(oi.quantity)
                       FILTER (WHERE o.status = 'refunded'), 0) AS net_items
    FROM order_items oi
    JOIN orders o USING (order_id)
    WHERE o.created_at >= CURRENT_DATE - INTERVAL '90 days'
    GROUP BY oi.order_id
)
SELECT
    AVG(gross_items)::NUMERIC AS basket_size_gross,
    AVG(net_items)::NUMERIC   AS basket_size_net,
    AVG(gross_items - net_items)::NUMERIC AS avg_returned_units
FROM per_order;

Apparel and footwear carry a gross-versus-net gap of fifteen to twenty percent. Reporting only the gross number sends finance into the next meeting with the wrong story about cart depth.

8. Basket Size cohort by registration month

Do users acquired in one month put more items into the cart than users acquired six months later?

WITH first_orders AS (
    SELECT
        user_id,
        DATE_TRUNC('month', MIN(created_at))::DATE AS cohort_month
    FROM orders
    WHERE status = 'paid'
    GROUP BY user_id
)
SELECT
    f.cohort_month,
    DATE_TRUNC('month', o.created_at)::DATE           AS order_month,
    COUNT(DISTINCT oi.order_id)                       AS orders,
    SUM(oi.quantity)::NUMERIC
        / NULLIF(COUNT(DISTINCT oi.order_id), 0)      AS basket_size
FROM first_orders f
JOIN orders      o  USING (user_id)
JOIN order_items oi USING (order_id)
WHERE o.status = 'paid'
GROUP BY f.cohort_month, DATE_TRUNC('month', o.created_at)
ORDER BY f.cohort_month, order_month;

In a healthy product, Basket Size climbs as a cohort matures. A flat or declining curve signals that the new acquisition vintage is buying thinner, and the next pricing review starts there.

Common pitfalls

The first trap is the items-versus-SKUs confusion. Three quantities of the same SKU is one SKU but three items. The wrong denominator turns a Basket Size of three into one without any behavioral change. Define both metrics — SUM(quantity) and COUNT(DISTINCT product_id) per order — and publish both alongside each other, with the headline flagged in the dashboard title.

The second trap is leaving returned items in the gross count. Apparel and footwear are the worst offenders: a customer who buys three sizes and keeps one logs as a three-item cart even though one item walked out. Net Basket Size — gross minus refunded units — maps to logistics and unit economics, and the gap is itself a metric worth tracking.

The third trap is mishandling bundles. A bundle SKU containing five units is one SKU but five items on the shelf. Logistics wants units; pricing wants SKUs. Materialize both columns and let the dashboard pick.

The fourth trap is letting promotions inflate the headline. A buy-one-get-one offer doubles items without doubling willingness to pay. If the dashboard only carries gross Basket Size, BOGO weeks look like behavioral wins they are not. Flag promotional orders and track with and without the boost.

The fifth trap is subscription auto-orders. A monthly grocery subscription shipping the same six items every cycle produces a six-item Basket Size with zero variance. Mixing those with one-off purchases blends two different distributions. Split the series and report on each independently.

Sharpen SQL for your next interview
500+ SQL problems with worked solutions — joins, window functions, CTEs.
Join the waitlist

Optimization tips

Past a few hundred million order-items rows, three things start to matter. First, partition order_items by order_date — monthly partitions are the standard. Every Basket Size query is naturally range-bounded by date, so partition pruning collapses the scan to a couple of months.

Second, materialize the per-order rollup. The pattern in seven of the eight queries above — SUM(quantity) grouped by order_id — is the perfect nightly materialized view. In Snowflake or BigQuery, the same logic lives in a scheduled query writing to a small reporting table downstream views select against.

Third, watch the co-purchase query. Self-joining a category table against itself is O(n²) in categories per order. For sparse carts the cost is tame; for grocery baskets carrying twenty distinct categories the row count explodes. Precompute the pair table in a batch job; never run the live self-join in a serving query.

A Python sanity check

When a Basket Size number looks suspicious, the fastest debug move is to recompute from a pandas dataframe and compare.

paid = order_items.merge(
    orders.loc[orders["status"] == "paid", ["order_id"]],
    on="order_id",
)
basket = paid.groupby("order_id")["quantity"].sum()
basket_size_mean = basket.mean()
basket_size_median = basket.median()

If pandas and SQL disagree, the bug is almost always in the SQL — a missed join condition, a timezone-shifted date filter, or the integer-division trap. Pandas is also where you sketch the items-per-order histogram before fixing dashboard bucket boundaries.

To drill SQL like this daily, NAILDD launches with hundreds of analytics SQL problems built around this pattern.

FAQ

Items or SKUs — which one should I report?

Both, almost always. Items — SUM(quantity) — is the count of physical units and is the right denominator for logistics, packaging, and fulfilment. SKUs — COUNT(DISTINCT product_id) — is the count of distinct products and is the right denominator for assortment, merchandising, and recommendations. A cart with three units of the same shirt is a three-item, one-SKU basket; reporting only items makes the customer look more diverse than they are.

What is a normal Basket Size?

It depends entirely on category. Grocery delivery typically lands in the ten-to-thirty range, apparel sits at two to four, and consumer electronics is usually one to two. The rule that travels: track the distribution, not just the mean. A two-item average can hide a bimodal split between one-item impulse buys and five-item planned purchases, and the right response differs for each.

How do I grow Basket Size without burning margin?

Four standard levers: free-shipping thresholds, bundle pricing, recommendation modules at checkout, and tiered bulk discounts. The cheapest is usually the shipping threshold — set it above the current median order value and cart depth shifts within a quarter. Bundles and recommendations follow; bulk discounts belong last.

How should I count bundles?

Decide upfront and document it in the analytics layer. Materialize two columns — items_as_sku and items_as_unit — and let downstream dashboards select against whichever fits. Logistics and pricing teams disagree about the right number more often than not, and a single ambiguous column is what causes the disagreement to surface every quarter.

How does Basket Size connect to AOV and LTV?

Tightly. AOV = Basket Size × Avg Item Price is the decomposition every retail analyst memorizes. Multiply AOV by orders-per-buyer for revenue-per-buyer; fold in margin and churn for a lightweight LTV approximation. When LTV moves, walk the chain — Basket Size, item price, order frequency, margin, churn — and the shifted lever surfaces within minutes.