Binomial distribution explained simply
Why analysts care about the binomial
Almost every binary metric is binomial under the hood. Conversion rate, click-through rate, churn flag, signup rate, fraud flag — all of them are "N trials, each with the same success probability p, count the successes." Knowing how to compute the mean, the variance, and a tail probability covers half the A/B test interview.
The other half is recognizing the shape in messy contexts. A PM at Stripe asks "what's the chance we see at least 120 conversions out of 1,000 visits if true rate is 10 percent?" That's a binomial tail. A Notion data scientist asks "how big a sample do I need to detect a 1-point lift on a 15 percent conversion?" That's a sample-size formula derived from binomial variance. Recruiters at Meta and Airbnb lean on these because the math separates candidates who memorize from candidates who understand.
This post walks the formula, runs the Python, then connects it to the interview questions you'll actually face.
Short explanation
The binomial distribution B(N, p) describes the number of successes in N independent trials, where each trial succeeds with probability p.
A cleaner way to phrase it: take N independent Bernoulli(p) draws — coin flips with bias p — and add them up. The sum is binomial. That's literally the construction. Everything else falls out of it.
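The construction can be checked with a small simulation. This is a minimal sketch using only the standard library; the helper name `binomial_draw` and the seed are illustrative choices, not something from a particular library.

```python
import random

def binomial_draw(n, p, rng):
    """One Binomial(n, p) draw built as a sum of n Bernoulli(p) draws."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [binomial_draw(10, 0.5, rng) for _ in range(20_000)]

sim_mean = sum(draws) / len(draws)
print(sim_mean)                # should land close to N * p = 5
print(min(draws), max(draws))  # every draw is between 0 and 10
```

Each draw is literally a count of biased coin flips, which is why the simulated mean settles near N * p.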
Concrete picture: flip a fair coin 10 times. How many heads do you see? Somewhere between 0 and 10, but most of the mass sits around 5. The shape — symmetric, peaked at Np = 5, thin tails — is Binomial(10, 0.5).
The formula
The probability mass function is:
P(X = k) = C(N, k) * p^k * (1 - p)^(N - k)
Where C(N, k) = N! / (k! * (N - k)!) is the binomial coefficient, the number of ways to choose k successes out of N trials.
Reading the three pieces:
- C(N, k) counts arrangements. Five heads in 10 flips can happen 252 different ways.
- p^k is the probability that the k chosen positions are all successes.
- (1 - p)^(N - k) is the probability the rest are all failures.
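The three pieces map one-to-one onto code. Here is a hedged sketch with the standard library's `math.comb`; the function name `binom_pmf` is just a label chosen for this example.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), written piece by piece."""
    arrangements = comb(n, k)        # C(N, k): ways to place k successes
    successes = p ** k               # the k chosen positions all succeed
    failures = (1 - p) ** (n - k)    # the remaining positions all fail
    return arrangements * successes * failures

print(comb(10, 5))             # 252
print(binom_pmf(5, 10, 0.5))   # 252 / 1024

# The PMF over k = 0..N should sum to 1.
pmf_total = sum(binom_pmf(k, 10, 0.5) for k in range(11))
print(pmf_total)
```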
Key characteristics
- Mean: N * p
- Variance: N * p * (1 - p)
- Standard deviation: sqrt(N * p * (1 - p))
The variance formula matters more than the mean for analytics work. Almost every confidence interval and power calculation for a proportion comes from p(1 - p) / N — that's the variance of the sample proportion, which is the binomial variance divided by N^2.
A worked analytics example
You're a growth analyst at a DoorDash-style marketplace. The product team launches a new checkout flow and asks: "We expect 10 percent conversion. If 1,000 sessions hit the page today, how many orders should we see?"
Expected orders: E[X] = 1000 * 0.1 = 100.
Standard deviation: sqrt(1000 * 0.1 * 0.9) = sqrt(90) ≈ 9.49.
So a "typical day" is roughly 100 plus or minus 9.5 orders, meaning anything between about 90 and 110 sits within roughly one standard deviation of the mean. If the dashboard shows 105 orders, that's not a launch win. If it shows 145, that's 45 orders above the mean, or about 4.7 standard deviations, and worth investigating.
This is the single most useful framing for daily KPI work: convert the metric to a rate, plug into Np(1-p), get a standard deviation, and ignore movements smaller than 2 sigmas. Most "wins" reported in standups die under that lens.
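That sigma check is a few lines of code. A minimal sketch, assuming the expected rate p is known in advance; `binomial_z` is a hypothetical helper name.

```python
from math import sqrt

def binomial_z(observed, n, p):
    """How many binomial standard deviations `observed` sits from N * p."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return (observed - mean) / sd

z_105 = binomial_z(105, 1000, 0.1)  # about 0.53: ordinary noise
z_145 = binomial_z(145, 1000, 0.1)  # about 4.74: worth a closer look
print(round(z_105, 2), round(z_145, 2))
```

Anything with an absolute z below 2 is the kind of movement the paragraph above suggests ignoring.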
Python recipes
scipy.stats.binom covers every common operation you'll need.
from scipy.stats import binom
# P(X = 100) — exactly 100 conversions out of 1000 at p = 0.1
print(binom.pmf(100, 1000, 0.1))
# P(X <= 95) — 95 or fewer conversions
print(binom.cdf(95, 1000, 0.1))
# P(X >= 120) — at least 120 conversions
print(1 - binom.cdf(119, 1000, 0.1))
# Simulate 10,000 days of traffic
samples = binom.rvs(n=1000, p=0.1, size=10000)
print(samples.mean(), samples.std())
The off-by-one trap on the survival function bites people in interviews. P(X >= k) is 1 - cdf(k - 1), not 1 - cdf(k). SciPy also exposes binom.sf(k - 1, n, p) if you want to avoid floating-point error in the deep tail.
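To see the off-by-one concretely, the tail can be rebuilt from the PMF with the standard library alone. This is an illustrative sketch; `prob_at_least` and `prob_at_most` are names made up for the example, not SciPy functions.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def prob_at_most(k, n, p):
    """P(X <= k): the CDF, summing the PMF from 0 through k."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

def prob_at_least(k, n, p):
    """P(X >= k): sum the PMF from k through n, inclusive."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

direct = prob_at_least(120, 1000, 0.1)
correct = 1 - prob_at_most(119, 1000, 0.1)     # matches `direct`
off_by_one = 1 - prob_at_most(120, 1000, 0.1)  # silently drops P(X = 120)
print(direct, correct, off_by_one)
```

The wrong version is too small by exactly P(X = 120), which is easy to miss because the number still looks plausible.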
If you need a confidence interval for an observed proportion, statsmodels gives you the right tool:
from statsmodels.stats.proportion import proportion_confint
# 105 conversions out of 1000 — 95% CI
low, high = proportion_confint(count=105, nobs=1000, method="wilson")
print(low, high)
Wilson beats the textbook normal interval at small N or extreme p and is the default any senior analyst will accept.
Approximations: Normal and Poisson
The binomial PMF gets expensive to compute at large N, and historically analysts leaned on two approximations. They still show up in interviews because they explain when "use the z-test" is safe.
Normal approximation
When N is large and p is not in the tails:
Binomial(N, p) ≈ Normal(Np, sqrt(Np(1 - p)))
The rule of thumb is Np > 5 AND N(1 - p) > 5. Some textbooks use 10 as the threshold, which is the safer choice. This is why a z-test on a proportion works for N = 1000, p = 0.1 (Np = 100, comfortable) but breaks for N = 100, p = 0.01 (Np = 1, far too small).
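A quick way to build intuition is to put the exact tail next to the normal one for both cases. The sketch below uses only the standard library. The continuity correction (evaluating the normal at k - 0.5) is an addition of this example, and `rule_of_thumb_ok` is a hypothetical helper.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def normal_tail(k, n, p):
    """Normal approximation to P(X >= k), with a continuity correction."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return 1 - approx.cdf(k - 0.5)

def rule_of_thumb_ok(n, p, threshold=5):
    return n * p > threshold and n * (1 - p) > threshold

for n, p, k in [(1000, 0.1, 120), (100, 0.01, 3)]:
    print(n, p, rule_of_thumb_ok(n, p), exact_tail(k, n, p), normal_tail(k, n, p))
```

In the first case the two tails should agree closely; in the second the rule of thumb fails and the gap is noticeably larger in relative terms.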
Poisson approximation
When p is small and N is large with moderate Np:
Binomial(N, p) ≈ Poisson(Np)
This shines for rare events: 10,000 transactions with 0.01 percent fraud rate, click-through on a low-engagement banner, server errors at high QPS. The Poisson PMF is cheaper and the parametric assumptions are looser.
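For the fraud example, the two PMFs can be compared directly. A minimal standard-library sketch; the function names are illustrative.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

n, p = 10_000, 0.0001  # 0.01 percent rate, so N * p = 1
for k in range(5):
    print(k, binom_pmf(k, n, p), poisson_pmf(k, n * p))

# Largest pointwise gap over the first few counts.
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(8))
print(max_gap)
```

With p this small the two columns should be nearly indistinguishable, which is the whole appeal of the approximation.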
The dividing line: if p < 0.05 and N > 100, prefer Poisson. If p is in the middle and N is in the hundreds or more, prefer Normal. If N is small (under 30), use the exact binomial — both approximations are unreliable.
How it shows up in A/B testing
Sample size for a proportion test
The standard formula for detecting a minimum effect MDE on a baseline conversion p at power 1 - β and significance α is:
N_per_arm ≈ ((z_{α/2} + z_β)^2 * 2 * p * (1 - p)) / MDE^2
That p * (1 - p) term is the per-trial variance that the binomial variance Np(1 - p) is built from. Everything in proportion-test power analysis traces back to it. If you bump baseline conversion from 5 percent to 50 percent at a fixed absolute MDE, required N grows by a factor of roughly 5 (0.25 / 0.0475 ≈ 5.3) because variance peaks at p = 0.5.
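The formula is short enough to code directly. This is a sketch under stated assumptions: MDE is an absolute difference in rates, the test is two-sided, and the defaults (alpha = 0.05, power = 0.80) are conventional choices picked for the example. `n_per_arm` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p, mde, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # z_beta, with power = 1 - beta
    return ceil((z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / mde ** 2)

# 1-point absolute lift on a 15 percent baseline.
n_15 = n_per_arm(0.15, 0.01)
print(n_15)  # roughly 20,000 per arm

# Same MDE, baseline moved from 5 percent to 50 percent.
ratio = n_per_arm(0.50, 0.01) / n_per_arm(0.05, 0.01)
print(ratio)  # close to 0.25 / 0.0475
```

Halving the MDE quadruples the answer, since MDE enters squared in the denominator.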
Standard error of a difference in proportions
For two arms with conversions p1, p2 and sizes N1, N2:
SE = sqrt(p1 * (1 - p1) / N1 + p2 * (1 - p2) / N2)
This is the denominator of the z-statistic for two proportions. It comes directly from adding the variances of two independent binomials.
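Plugging that standard error into a z-statistic looks like this. A hedged sketch: it uses the unpooled SE exactly as written above (many textbook z-tests pool the two rates instead), and the counts are made-up numbers for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z-statistic and two-sided p-value using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical test: 100 / 1000 conversions in control, 130 / 1000 in treatment.
z, p_value = two_proportion_z(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```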
Chi-square and Fisher's exact
A 2x2 contingency table comparing conversion in two arms is a comparison of two binomial proportions. Chi-square is the large-sample approximation; Fisher's exact is the small-sample version. Both start from the same two-binomial model; Fisher's exact additionally conditions on the table margins, which turns the calculation into a hypergeometric probability.
If you want a deeper interview-grade walkthrough of A/B test mechanics, the A/B testing complete guide covers the full pipeline including sample sizing and stopping rules.
Interview-style problems
Problem 1: tail probability
"Conversion is 5 percent, 100 visits arrive. What's the probability of at least 10 conversions?"
1 - binom.cdf(9, 100, 0.05)
# ≈ 0.0282, about 2.8%
Translate: even if your real rate is exactly 5 percent, seeing at least 10 conversions out of 100 happens roughly 1 day in 35. So a single-day "win" at this scale should not move you.
Problem 2: same proportion, different N
"What's more likely to deviate: 10 conversions out of 100, or 100 out of 1000?"
Same expected proportion of 10 percent. Different variance. The coefficient of variation sqrt(p(1-p)/N) / p falls as 1/sqrt(N), so the smaller sample is roughly sqrt(10) ≈ 3.16 times noisier. The 100-trial version routinely swings between 5 and 15 percent; the 1,000-trial version mostly stays inside 8 to 12.
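The numbers behind that answer take a few lines to reproduce. A minimal sketch; `proportion_sd` is an illustrative name, and the two-sigma bands are a rough guide rather than an exact interval.

```python
from math import sqrt

def proportion_sd(p, n):
    """Standard deviation of the sample proportion k / N."""
    return sqrt(p * (1 - p) / n)

sd_100 = proportion_sd(0.10, 100)    # 0.03
sd_1000 = proportion_sd(0.10, 1000)  # about 0.0095
noise_ratio = sd_100 / sd_1000       # sqrt(10)

for n, sd in [(100, sd_100), (1000, sd_1000)]:
    # Rough two-sigma band around the 10 percent rate.
    print(n, round(0.10 - 2 * sd, 3), round(0.10 + 2 * sd, 3))
print(noise_ratio)
```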
Problem 3: rapid-fire definitions
"What is a binomial?" — Sum of N independent Bernoulli(p) trials.
"When do you use it?" — Any binary metric with a fixed number of trials and constant success probability.
"Mean and variance?" — Np and Np(1 - p).
"When does the normal approximation hold?" — Np > 5 AND N(1 - p) > 5.
"When Poisson?" — Large N, small p, moderate Np.
If you can answer those five questions in under a minute without notes, the recruiter at Linear or Snowflake moves to the next topic.
Common pitfalls
Confusing binomial with Bernoulli is the most common slip. Bernoulli is a single coin flip — one trial, two outcomes. Binomial is a sum of N Bernoullis. When an interviewer asks "is this Bernoulli or binomial?", they're testing whether you locked the count of trials. State N explicitly before answering.
Forgetting the independence assumption causes the worst silent bugs. Binomial requires each trial to be independent of the others with constant p. Cascading funnels violate both: if a user must convert on A to see B, the joint counts are not binomial. The right model is a multinomial or sequential funnel. Treating it as binomial will quietly inflate your effective sample size and shrink your variance.
Applying the Poisson approximation when p is too large produces wrong tail probabilities. At p = 0.05 and N = 100, the Poisson and binomial probabilities disagree by a few percent near the center of the distribution; at p = 0.15 they diverge sharply in both tails. Stick to the rule: Poisson works when the event is rare. If you can compute the exact binomial in scipy, just compute it.
Summing Bernoullis with different p and calling it binomial is the trap that gets advanced candidates. If trial i has its own success probability p_i, the sum is the Poisson-binomial distribution. The mean is still sum(p_i), but the variance is sum(p_i * (1 - p_i)), which is never larger than the binomial variance computed with average p and is strictly smaller whenever the p_i are not all equal. This shows up in heterogeneous user populations and causes mis-specified power calculations.
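The variance gap is easy to see on a toy population. This sketch assumes a made-up two-segment mix (half the users at 2 percent, half at 18 percent); both helper names are hypothetical.

```python
def poisson_binomial_variance(probs):
    """Variance of a sum of independent Bernoulli(p_i) trials."""
    return sum(p * (1 - p) for p in probs)

def binomial_variance_at_mean_p(probs):
    """Variance you would get by pretending every trial has the average p."""
    n = len(probs)
    p_bar = sum(probs) / n
    return n * p_bar * (1 - p_bar)

# Hypothetical population: 500 users at p = 0.02 and 500 users at p = 0.18.
probs = [0.02] * 500 + [0.18] * 500

hetero = poisson_binomial_variance(probs)     # 83.6
pooled = binomial_variance_at_mean_p(probs)   # 90.0
print(hetero, pooled)
```

Both models share the same mean of 100, so the mismatch only shows up when you compute a standard error or a power calculation.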
Using the normal approximation on tiny samples kills exam scores. At N = 20, p = 0.1, the Wald confidence interval routinely extends below zero, which makes no sense for a probability. Use Wilson or Clopper-Pearson (the exact binomial interval) for small N.
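The negative lower bound is easy to reproduce by hand. Below is a sketch of both intervals written out with the standard library, assuming 2 successes in 20 trials as the observed data. The function names are illustrative, and in practice a library routine is the safer choice.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(k, n, conf=0.95):
    """Textbook normal (Wald) interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = k / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def wilson_interval(k, n, conf=0.95):
    """Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = k / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

wald_low, wald_high = wald_interval(2, 20)
wilson_low, wilson_high = wilson_interval(2, 20)
print(wald_low, wald_high)      # lower bound falls below zero
print(wilson_low, wilson_high)  # stays inside [0, 1]
```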
Related reading
- P-value explained simply
- Effect size explained simply
- Power analysis explained simply
- Bayes theorem explained simply
- Null hypothesis explained simply
- How to calculate chi-square test in SQL
FAQ
When is a process not binomial?
Two assumptions must hold: trials are independent, and the success probability p is the same on every trial. Break either one and you're outside binomial territory. Examples: streaks where success this trial raises the chance on the next (positive autocorrelation), or a mix of two user segments with different p (heterogeneity, which collapses into the Poisson-binomial). In practice, with logged product data, the independence assumption is usually fine within a short window but breaks across cohorts or seasonality.
Why is the mean N * p?
Linearity of expectation. Each Bernoulli trial has expected value p. The binomial is the sum of N such trials, so the expected sum is N * p. This works even when the trials are not independent — independence isn't required for the mean, only for the variance. That's why the mean is robust but the variance fails when independence breaks.
What's the connection to the normal distribution?
By the central limit theorem, the sum of many independent and identically distributed random variables converges to a normal distribution. A binomial is exactly such a sum, so Binomial(N, p) converges to Normal(Np, sqrt(Np(1-p))) as N grows. The convergence is fastest when p is near 0.5 and slowest near 0 or 1 — that's why the Np > 5 rule of thumb exists.
How do I pick between exact binomial, normal, and Poisson in practice?
If N is small (under 50) or p is extreme (under 0.05 or over 0.95), use the exact binomial. scipy.stats.binom is fast enough for any N you'll see. If N is large and p is moderate, the normal approximation gives you all the z-test machinery. If p is small and N * p is moderate, Poisson is cheaper. When in doubt, use the exact binomial.
Does the binomial assume a known p?
The classical formulation does. In real analytics, you estimate p_hat = k / N from data and propagate uncertainty. That's where confidence intervals (Wilson, Clopper-Pearson) and Bayesian credible intervals (Beta-Binomial) come in. A Beta prior plus binomial observations gives a Beta posterior, which is why Beta-Binomial conjugacy shows up so much in Bayesian A/B testing and Thompson sampling.
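The conjugate update itself is just addition. A minimal sketch, assuming a uniform Beta(1, 1) prior chosen for illustration; `beta_posterior` and `beta_mean` are hypothetical helper names.

```python
def beta_posterior(prior_a, prior_b, successes, trials):
    """Beta(a, b) prior plus binomial data gives Beta(a + k, b + N - k)."""
    return prior_a + successes, prior_b + (trials - successes)

def beta_mean(a, b):
    return a / (a + b)

# Uniform prior, then 105 conversions observed in 1000 sessions.
post_a, post_b = beta_posterior(1, 1, 105, 1000)
posterior_mean = beta_mean(post_a, post_b)
print(post_a, post_b)   # 106 896
print(posterior_mean)   # slightly above the raw rate 0.105, pulled toward the prior mean 0.5
```

A stronger prior (larger a and b) would pull the posterior mean further from the observed rate.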
Can I use binomial for events that aren't 0/1?
Only by binarizing them first. If the underlying event is "user clicked at least once" then yes — collapse to clicked or not. If you care about the count of clicks per user, that's no longer binomial; use Poisson, negative binomial, or a hurdle model depending on the dispersion. The diagnostic is variance-to-mean ratio: binomial gives 1 - p, Poisson gives exactly 1, negative binomial gives greater than 1.
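The diagnostic is a one-liner once the counts are in a list. A hedged sketch on made-up per-user click counts; `variance_to_mean` is an illustrative name.

```python
from statistics import fmean, pvariance

def variance_to_mean(counts):
    """Dispersion index: population variance divided by the mean."""
    return pvariance(counts) / fmean(counts)

# Hypothetical clicks per user: mostly zeros with a couple of heavy users.
clicks_per_user = [0, 0, 1, 0, 7, 0, 2, 0, 0, 12]
ratio = variance_to_mean(clicks_per_user)
print(ratio)  # well above 1, so overdispersed relative to Poisson
```

A ratio far above 1 like this one points toward a negative binomial or hurdle model rather than Poisson.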