Binomial distribution
Why analysts care about it
A growth PM at Stripe pings you at 5pm: "Conversion on the new checkout came in at 6.5 percent across 1,000 users today, baseline is 5 percent, can we call it a win?" Two ways this goes. One is you say "looks great, ship it", the lift evaporates within a week, and you spend the next sprint explaining why the metric reverted. The other is you reach for the binomial distribution, compute the probability of 65 or more conversions when the true rate is 5 percent, and answer with a number instead of a vibe.
The binomial is the workhorse behind every conversion rate, click-through rate, open rate, install rate, and approval rate you have ever shipped on a dashboard. Anywhere you have a fixed number of trials and a yes-or-no outcome per trial, the count of yeses follows a binomial. It is also the model underneath chi-square tests on conversion tables, two-proportion z-tests, and the likelihood in a Bayesian A/B test. Master it and a good share of the probability questions at Meta, Amazon, DoorDash, or Snowflake is settled before you sit down.
When the binomial applies
A binomial setup needs four ingredients. First, a fixed number of trials n. If your sample size depends on how the experiment unfolds, you are not in binomial territory and you are likely peeking. Second, exactly two outcomes per trial. Third, independence: the outcome of one trial cannot affect another. Fourth, a constant success probability p across all trials.
These four conditions are also the assumptions you should worry about in the interview answer. Visitors to a checkout are roughly independent if you ignore returning users. Email opens are independent within a campaign but not across campaigns if you bombard the same list daily, because fatigue changes p. Fraud detections look independent until the same device hits your funnel five times in an hour. Whenever the interviewer is being cute, the trap is hidden in one of these four conditions.
Parameters and formula
Write X ~ Binomial(n, p) when X is the random count of successes across n independent trials each with success probability p. The probability of exactly k successes is:
P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
where C(n, k) = n! / (k! * (n - k)!) is the binomial coefficient. Three summary numbers follow. Mean is mu = n * p. Variance is sigma^2 = n * p * (1 - p). Standard deviation is sigma = sqrt(n * p * (1 - p)). Memorize these three.
# 1,000 visitors, 5 percent conversion baseline
n, p = 1000, 0.05
mean = n * p # 50 conversions expected
std = (n * p * (1 - p)) ** 0.5 # roughly 6.9
print(f"Expected conversions: {mean:.1f} +/- {std:.1f}")
# Expected conversions: 50.0 +/- 6.9
That plus-or-minus sets the scale for ordinary noise. If you ran the same 1,000-visitor day a hundred times, roughly two thirds of the days would land between 43 and 57 conversions, and roughly 95 percent between 36 and 64. Sixty-five conversions sits just outside that band, which is where the launch question gets interesting.
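To see that the formula and the three summary numbers agree, here is a from-scratch pmf built with math.comb (Python 3.8+), summed over every possible count for the 1,000-visitor example. It is a sketch for checking intuition; for real work scipy.stats.binom is the better tool.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), straight from the formula."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 1000, 0.05
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

total = sum(probs)                                              # probabilities add to 1
mean = sum(k * pk for k, pk in enumerate(probs))                # matches n * p
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))   # matches n * p * (1 - p)
print(f"sum={total:.6f}  mean={mean:.4f}  var={var:.4f}")
```

Summing over all 1,001 counts is fine at n = 1,000; at much larger n the raw coefficient gets too big for floating point, which is why libraries work in logs.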
Computing it in Python
The scipy stats module gives you the distribution as a frozen object exposing a pmf for exact counts, a cdf for "k or fewer", a survival function for "more than k", a ppf for quantiles, plus the moments and a random sampler.
from scipy import stats
# X ~ Binomial(n=100, p=0.3)
binom = stats.binom(n=100, p=0.3)
binom.pmf(30)  # about 0.0868: exactly 30 successes
binom.cdf(25)  # about 0.163: at most 25 successes
binom.sf(35)  # about 0.116: more than 35, same as 1 - binom.cdf(35)
binom.mean()  # 30.0
binom.var()  # 21.0
binom.std()  # 4.58
binom.ppf(0.05)  # 23.0: 5th percentile, the smallest k with cdf(k) >= 0.05
For simulation rather than analytics, reach for numpy. np.random.binomial(n, p, size) returns size independent draws, each the number of successes in n trials at success rate p.
import numpy as np
# Simulate 1,000 days, each with 500 visitors at 3 percent conversion
daily_purchases = np.random.binomial(n=500, p=0.03, size=1000)
print(f"Mean purchases per day: {daily_purchases.mean():.1f}")
print(f"Standard deviation: {daily_purchases.std():.1f}")
Worked problems
The first problem is the e-commerce checkout. Two hundred visitors land on a page with a 10 percent baseline. The PM wants the probability that exactly 25 of them buy, and that 15 or fewer buy.
from scipy import stats
p_exact = stats.binom.pmf(25, n=200, p=0.10)
print(f"P(X = 25) = {p_exact:.4f}")  # about 0.0444
p_tail = stats.binom.cdf(15, n=200, p=0.10)
print(f"P(X <= 15) = {p_tail:.4f}")  # about 0.143
The second problem is the Stripe-style launch question. Control gets 1,000 users with 50 conversions, treatment gets 1,000 users with 65 conversions. Could 65 happen by chance if the true rate is still 5 percent?
from scipy import stats
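p_value = stats.binom.sf(64, n=1000, p=0.05)  # P(X >= 65), same as 1 - cdf(64)
print(f"p-value = {p_value:.4f}")  # about 0.021
The chance is roughly 2 percent, below the conventional 5 percent threshold, so under H0 that treatment still converts at the 5 percent baseline the data are uncomfortable. Note that this treats 5 percent as a known constant; if the baseline is really the control arm's own 50 out of 1,000, that estimate carries its own noise and a two-proportion comparison is much less conclusive. On the one-sample test alone you would call this directionally significant, then check sample ratio mismatch before shipping.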
The third problem is quality control on a Tesla supplier line. A batch of 50 parts ships with a known 5 percent defect rate. The receiving team wants the probability of seeing zero defects.
from scipy import stats
p_zero_defects = stats.binom.pmf(0, n=50, p=0.05)
print(f"P(0 defects) = {p_zero_defects:.4f}")  # 0.0769
A clean batch happens about 8 percent of the time under the true 5 percent defect rate, rare but not extraordinary.
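For k = 0 the binomial coefficient is 1 and p^0 is 1, so the formula collapses to (1 - p)^n, which is a handy whiteboard shortcut. A short check against scipy, plus the complement for "at least one defect":

```python
from scipy import stats

n, p = 50, 0.05
closed_form = (1 - p) ** n                   # all 50 parts are good
via_scipy = stats.binom.pmf(0, n=n, p=p)
at_least_one = 1 - closed_form               # complement: one or more defects
print(f"{closed_form:.4f} {via_scipy:.4f} {at_least_one:.4f}")
```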
Link to A/B testing
Every conversion-rate A/B test is two binomial random variables side by side, one for control and one for treatment. The test reduces to asking whether the two underlying success probabilities are equal. The choice between a chi-square test, a two-proportion z-test, an exact Fisher test, and a Bayesian beta-binomial model is a choice of how to summarize the comparison.
When sample sizes are large enough that both n * p and n * (1 - p) clear the rule-of-thumb threshold of about 5 in each group, the two-proportion z-test is the default, and in that regime its p-values sit close to those of an exact test. For small samples or extreme p, switch to the exact binomial test or to Fisher's exact test, since the normal approximation can mislead you on the tails.
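To see the tails diverge, here is a sketch comparing a pooled two-proportion z-test with scipy's fisher_exact on a hypothetical small test, 2 of 40 conversions against 8 of 40. The counts are made up for illustration; the first arm has only 2 successes, so the n * p >= 5 rule fails there.

```python
import math
from scipy import stats

def two_prop_z_pvalue(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided pooled two-proportion z-test (normal approximation)."""
    pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    z = (x2 / n2 - x1 / n1) / se
    return 2 * stats.norm.sf(abs(z))

# Hypothetical small test: control 2/40, treatment 8/40
x1, n1, x2, n2 = 2, 40, 8, 40
table = [[x1, n1 - x1], [x2, n2 - x2]]       # rows: arms; columns: success, failure
_, p_fisher = stats.fisher_exact(table, alternative="two-sided")
p_z = two_prop_z_pvalue(x1, n1, x2, n2)
print(f"z-test p = {p_z:.3f}, Fisher exact p = {p_fisher:.3f}")
```

On these numbers the z-test lands below 0.05 while Fisher's exact test does not, which is exactly the small-sample case where the approximation should not be trusted.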
For adjacent interview prep, see p-value explained simply, null hypothesis explained simply, and the A/B testing peeking mistake.
Normal and Poisson approximations
When n is large the binomial looks more and more like a normal distribution. The rule of thumb is that the normal approximation is reasonable once n * p >= 5 and n * (1 - p) >= 5. In that regime you replace Binomial(n, p) with Normal(mu = n * p, sigma^2 = n * p * (1 - p)). Most web-scale conversion tests sit comfortably in that regime, which is why almost every A/B test calculator on the web is, under the hood, a normal approximation to a two-proportion comparison.
from scipy import stats
binom = stats.binom(n=1000, p=0.3)
normal = stats.norm(loc=300, scale=(1000 * 0.3 * 0.7) ** 0.5)
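print(binom.cdf(280))  # about 0.089
print(normal.cdf(280))  # about 0.084, without a continuity correction
print(normal.cdf(280.5))  # about 0.089, with the +0.5 continuity correction
When n is large but p is tiny, the normal approximation breaks down: the counts pile up near zero and the distribution is too skewed for a symmetric bell. The right replacement is the Poisson with lambda = n * p. Rare-event problems, such as crashes per million sessions or fraud per million transactions, live here.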
from scipy import stats
binom = stats.binom(n=10000, p=0.0005)
poisson = stats.poisson(mu=5)
print(binom.pmf(3))  # about 0.1404
print(poisson.pmf(3))  # about 0.1404
Common pitfalls
The first pitfall is confusing the binomial with the geometric distribution. The binomial counts successes across a fixed number of trials, so n is fixed and the count is random. The geometric counts trials until the first success. Mixing these up flips your estimation procedure and shows up immediately when an interviewer asks for the expected value. The fix is to ask, before anything else, which quantity is random and which is fixed.
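A quick way to keep the two straight is to ask scipy for both means with the same p. Note that scipy.stats.geom counts the trial on which the first success happens, so its support starts at 1 and its mean is 1 / p:

```python
from scipy import stats

p = 0.05  # per-visitor conversion probability

# Binomial: the number of trials is fixed (1,000 visitors); the count of conversions is random
conversions = stats.binom(n=1000, p=p)
print(conversions.mean())  # n * p, i.e. 50

# Geometric: the number of visitors until the first conversion is random
# (scipy's geom counts the converting visitor too, so its support starts at 1)
visitors_to_first = stats.geom(p)
print(visitors_to_first.mean())  # 1 / p, i.e. 20
```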
The second pitfall is applying the binomial when trials are not independent. Retargeted ads break independence because the second click is conditional on the first. Subscription renewals break independence because they are correlated via product satisfaction. Sessions from the same user across a week break independence because user effects dominate the trial-level noise. The fix is to aggregate to the user level so each unit of analysis is independent, or use a hierarchical model that captures the within-unit correlation.
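A minimal sketch of aggregating to the user level with pandas. The column names user_id and converted and the ten-row session log are hypothetical, and "converted at least once" is one reasonable user-level outcome, not the only one.

```python
import pandas as pd

# Hypothetical session log: one row per session, several sessions per user
sessions = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "converted": [0, 0, 1, 0, 0, 1, 0, 1, 1, 0],
})

# Session level: treats all 10 sessions as independent trials, which they are not
session_rate = sessions["converted"].mean()

# User level: one Bernoulli trial per user, "did this user convert at least once?"
per_user = sessions.groupby("user_id")["converted"].max()
n_users, n_converters = len(per_user), int(per_user.sum())
user_rate = n_converters / n_users

print(f"session rate {session_rate:.2f} over {len(sessions)} sessions")
print(f"user rate {user_rate:.2f} over {n_users} users")
```

The binomial model, and any z-test built on it, then uses n_users as n and n_converters as the success count.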
The third pitfall is treating p as a constant when it drifts. Open rates fall as a list ages. Conversion rates rise as a funnel improves. Click rates change with creative. When p is non-stationary the binomial-implied variance is wrong, usually too small, and you over-call lifts as significant. The fix is to inspect the empirical variance against n * p * (1 - p) and switch to a beta-binomial or quasi-binomial when the data are overdispersed.
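A sketch of the variance check on simulated data. It assumes, for illustration only, that the daily rate drifts according to a Beta(5, 95) distribution, which keeps the average at 5 percent. scipy.stats.betabinom generates the overdispersed counts; the ratio of empirical variance to n * p * (1 - p) should sit near 1 for a true binomial and well above 1 when p moves around.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_days = 1000, 2000   # 1,000 visitors a day, 2,000 simulated days
mean_p = 0.05

# Constant p: plain binomial counts
binom_counts = rng.binomial(n_trials, mean_p, size=n_days)

# Drifting p: each day's rate drawn from Beta(a, b), whose mean a / (a + b) is 0.05
a, b = 5.0, 95.0
betabinom_counts = stats.betabinom(n_trials, a, b).rvs(size=n_days, random_state=rng)

expected_var = n_trials * mean_p * (1 - mean_p)   # binomial-implied variance, 47.5
ratios = {}
for name, counts in [("binomial", binom_counts), ("beta-binomial", betabinom_counts)]:
    ratios[name] = counts.var(ddof=1) / expected_var
    print(f"{name:14s} variance ratio = {ratios[name]:.2f}")
```

A ratio well above 1 on real data is the signal to move to a beta-binomial or quasi-binomial model.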
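The fourth pitfall is grinding through the exact binomial by hand when n is large enough that an approximation will do. For n = 100,000, computing C(n, k) * p^k * (1 - p)^(n - k) directly overflows or underflows ordinary floating point, while the normal approximation is one line of code. Library routines such as scipy's binom work around this internally, so calling them is fine; the trap is the hand-rolled formula. The fix is the rule of thumb: if n * p >= 5 and n * (1 - p) >= 5, switch to normal; if instead p is tiny and n is large, switch to Poisson with lambda = n * p.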
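The by-hand failure is easy to see: at n = 100,000 and k = 5,000, 0.05**5000 underflows to zero in floating point while C(100000, 5000) has thousands of digits, far past the largest float. One workaround, sketched below, is to work in logs with math.lgamma; the second half compares the normal approximation with scipy's exact cdf at the same n. The helper binom_logpmf is illustrative, not a library function.

```python
import math
from scipy import stats

def binom_logpmf(k: int, n: int, p: float) -> float:
    """log P(X = k), using lgamma(m + 1) = log(m!) so C(n, k) is never formed directly."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log1p(-p)

n, p, k = 100_000, 0.05, 5_000

# Log-space pmf agrees with scipy's implementation
pmf_log_space = math.exp(binom_logpmf(k, n, p))
pmf_scipy = stats.binom.pmf(k, n, p)

# Normal approximation (with continuity correction) vs exact cdf for a lower-tail count
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
cdf_exact = stats.binom.cdf(4_900, n, p)
cdf_normal = stats.norm.cdf(4_900.5, loc=mu, scale=sigma)

print(f"pmf: {pmf_log_space:.6f} vs {pmf_scipy:.6f}")
print(f"cdf: {cdf_exact:.4f} vs {cdf_normal:.4f}")
```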
Interview questions
What is a binomial distribution in one sentence? The distribution of the count of successes across n independent trials each with success probability p. Mean is n * p, variance is n * p * (1 - p).
When does the binomial converge to the normal? When n * p >= 5 and n * (1 - p) >= 5 hold simultaneously. A consequence of the central limit theorem applied to a sum of n Bernoulli random variables. For p around 0.5 the approximation is good from n above 20; for p near 0 or 1 you need n in the hundreds.
A site converts at 3 percent and gets 1,000 visitors. Probability of fewer than 20 conversions? P(X < 20) = stats.binom.cdf(19, 1000, 0.03), which lands around 0.02. About a 2 percent chance under business as usual, so on a real day you would check whether the funnel broke before writing the dip off as noise.
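The off-by-one is the part people miss: "fewer than 20" is cdf(19), not cdf(20). A sketch computing it both exactly and with the normal approximation:

```python
import math
from scipy import stats

n, p = 1000, 0.03

# "Fewer than 20" means 19 or fewer, hence cdf(19)
p_exact = stats.binom.cdf(19, n, p)

# Normal approximation with continuity correction, for comparison
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
p_normal = stats.norm.cdf(19.5, loc=mu, scale=sigma)

print(f"exact {p_exact:.4f}, normal approx {p_normal:.4f}")
```

The normal figure comes out somewhat higher here because the binomial with p = 0.03 is right-skewed, which thins out its left tail.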
Why do A/B tests for conversion use a z-test rather than an exact binomial test? For large samples the binomial is well-approximated by a normal, and the two-proportion z-test is the cleanest closed-form summary. At typical web-scale sample sizes its p-value lands close to the exact test's. The exact test is reserved for small samples where the approximation fails.
A vendor claims a 95 percent accurate fraud signal. Why is that misleading? The vendor is reporting P(signal | fraud). What you care about is P(fraud | signal), which depends on the base rate. With a 0.5 percent base rate and a 5 percent false-positive rate, the posterior is roughly 1 in 11 even after the signal fires. The binomial sets up the count; Bayes theorem flips the conditional.
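The posterior takes a few lines to check. This assumes, as the answer above does, that "95 percent accurate" means a 95 percent detection rate, paired with a 5 percent false-positive rate and a 0.5 percent base rate:

```python
# Bayes' theorem for the vendor claim, using the rates quoted above
base_rate = 0.005        # P(fraud)
sensitivity = 0.95       # P(signal | fraud), the vendor's "95 percent accurate"
false_positive = 0.05    # P(signal | not fraud)

# Total probability that the signal fires, from fraud and from clean traffic
p_signal = sensitivity * base_rate + false_positive * (1 - base_rate)
p_fraud_given_signal = sensitivity * base_rate / p_signal
print(f"P(fraud | signal) = {p_fraud_given_signal:.3f}")  # 0.087, roughly 1 in 11
```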
Related reading
- P-value explained simply
- Null hypothesis explained simply
- Effect size explained simply
- A/B testing peeking mistake
- Bayes theorem explained simply
FAQ
Is the Bernoulli distribution a special case of the binomial?
Yes, the Bernoulli is the binomial with n = 1. One trial, two outcomes, success probability p, count is either zero or one. The binomial is the sum of n independent and identically distributed Bernoulli random variables, and that decomposition is how you derive the mean and variance from scratch in an interview.
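The decomposition X = B_1 + ... + B_n gives the moments in two lines: each Bernoulli has mean p and variance p(1 - p), expectations always add, and variances add because the trials are independent, so E[X] = n * p and Var(X) = n * p * (1 - p). A quick numpy simulation, with arbitrarily chosen n = 50 and p = 0.3, confirms it:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, reps = 50, 0.3, 100_000

# Each row holds n independent Bernoulli(p) trials (True = success)
trials = rng.random((reps, n)) < p
counts = trials.sum(axis=1)   # each row sum is one Binomial(n, p) draw

print(f"simulated mean {counts.mean():.3f} vs n*p = {n * p}")
print(f"simulated var  {counts.var():.3f} vs n*p*(1-p) = {n * p * (1 - p):.3f}")
```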
When should I use the exact binomial test rather than a z-test?
Use the exact test when n is small or when p is close to zero or one, since the normal approximation breaks down on the tails. Use it also when the interviewer specifically asks for the exact answer. A reasonable rule: if n * p or n * (1 - p) drops below 5, you have crossed into exact-test territory.
How do I compute the binomial in SQL?
Most warehouses do not ship a binomial pmf or cdf out of the box, so the practical pattern is to compute counts and proportions in SQL, then run the test in Python on the summary numbers. If you have to stay inside SQL, fall back to the normal approximation with mu = n * p and sigma = sqrt(n * p * (1 - p)) and use a tabulated normal cdf. For a worked SQL approach see how to calculate CTR in SQL.
What if my data are overdispersed?
Overdispersion means the empirical variance is larger than n * p * (1 - p), which usually points to non-independent trials or a non-constant p across observations. The standard fixes are a beta-binomial model, which gives p its own prior distribution, or a quasi-binomial regression with a dispersion parameter that scales the variance. Both retain the binomial spirit while honoring the extra noise.
Can I run a one-sided binomial test?
Yes, when the alternative hypothesis is genuinely directional. A one-sided test has more power than a two-sided test at the same alpha, because the whole rejection region sits in the direction you care about, but only if you commit to the direction before looking at the data. If you peek and then choose the direction, you have spent the alpha twice and the false-positive rate is no longer what the test claims.
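scipy.stats.binomtest (SciPy 1.7 and later) takes the direction through its alternative argument. A sketch on the launch example, 65 conversions out of 1,000 against a 5 percent baseline; in a real analysis the alternative would be fixed in the analysis plan before the data arrive:

```python
from scipy import stats

# Launch example: 65 conversions out of 1,000 against a 5 percent baseline
greater = stats.binomtest(65, n=1000, p=0.05, alternative="greater")
two_sided = stats.binomtest(65, n=1000, p=0.05, alternative="two-sided")

print(f"one-sided p = {greater.pvalue:.4f}")
print(f"two-sided p = {two_sided.pvalue:.4f}")
```

The two-sided p-value is larger than the one-sided one but not exactly double: scipy's two-sided rule adds up outcomes on both sides that are no more likely than the one observed, and this binomial is skewed.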