Poisson distribution explained simply
What Poisson actually models
The Poisson distribution is the workhorse for counting things over a fixed window. You reach for it when a PM at Stripe asks "how often will we see more than ten 500 errors per minute next week," or when a growth analyst at DoorDash models order cancellations per restaurant per day. The unifying feature is the same every time: events arrive one at a time, the window is fixed, and you want the probability of seeing exactly k of them.
Formally, Poisson is a discrete distribution with a single parameter, λ (lambda) — the average number of events per interval. If λ = 3, you expect three events on average per window, but any individual window might give you 0, 1, 5, or even 8. The randomness sits on top of a stable mean rate, which is why one number describes the whole shape.
The probability mass function for exactly k events is:
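P(X = k) = (λ^k · e^(-λ)) / k!

Here e ≈ 2.71828 and k! is the factorial. The e^(-λ) factor does not depend on k. It is a normalizing constant that makes the probabilities sum to 1. The shape comes entirely from λ^k / k!: consecutive probabilities satisfy P(X = k) / P(X = k - 1) = λ / k, so they rise while k < λ and fall once k > λ. That ratio is why the mode sits at ⌊λ⌋ (with a tie between λ - 1 and λ when λ is an integer). If you only memorize the formula, "why does the mode sit near λ?" is the question that will trip you up.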
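A minimal sketch that evaluates the formula directly with the standard library and checks that consecutive probabilities have ratio λ/k, which places the mode at ⌊λ⌋. The value λ = 3.5 is an arbitrary illustration; for large k you would work in log space (scipy's pmf already does this) to avoid overflow in λ^k and k!.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), straight from the formula."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.5
probs = [poisson_pmf(k, lam) for k in range(15)]

# P(k) / P(k - 1) should equal lam / k: above 1 while k < lam, below 1 after.
ratios = [probs[k] / probs[k - 1] for k in range(1, 15)]

# The most likely count is the largest k whose ratio is still >= 1.
mode = max(range(15), key=lambda k: probs[k])
print(mode)  # 3, i.e. floor(3.5)
```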
Properties you must know
The headline property is that the mean and the variance both equal λ. This single fact is one of the most-asked items on data-analyst stats screens, and it doubles as a diagnostic: if your empirical variance is far larger than your empirical mean, your data is almost certainly not Poisson.
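The diagnostic can be sketched in a few lines: scipy reports the exact mean and variance of Poisson(λ), and a small helper (hypothetical name `dispersion_index`) computes sample variance divided by sample mean, which should land near 1 for Poisson data. The simulated sample stands in for real counts.

```python
import numpy as np
from scipy import stats

# Exact moments from scipy: mean and variance are both lam.
for lam in (0.5, 3.0, 40.0):
    mean, var = stats.poisson(mu=lam).stats(moments="mv")
    print(lam, float(mean), float(var))

def dispersion_index(counts: np.ndarray) -> float:
    """Sample variance / sample mean; close to 1 for Poisson data, well above 1 if overdispersed."""
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(0)
counts = rng.poisson(lam=3.0, size=5_000)
print(round(dispersion_index(counts), 2))  # near 1.0
```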
Additivity. If X follows Poisson(λ₁) and Y follows Poisson(λ₂) independently, then X + Y follows Poisson(λ₁ + λ₂). If tickets arrive at 5 per hour on average, then over a 3-hour shift the count follows Poisson(15). You can collapse independent Poisson streams from different channels (email, chat, in-app) into one Poisson with the summed rate — useful when reasoning about capacity.
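A quick numerical check of additivity, as a sketch: the distribution of a sum of independent counts is the convolution of their PMFs, so convolving Poisson(λ₁) with Poisson(λ₂) should reproduce Poisson(λ₁ + λ₂). The rates 2.0 and 3.5 are arbitrary illustrative values.

```python
import numpy as np
from scipy import stats

lam1, lam2 = 2.0, 3.5   # illustrative rates for two independent channels
k = np.arange(60)        # support 0..59; entries of the convolution below index 60 are exact

p1 = stats.poisson.pmf(k, lam1)
p2 = stats.poisson.pmf(k, lam2)

# PMF of X + Y for independent X, Y is the convolution of the two PMFs.
p_sum = np.convolve(p1, p2)[: len(k)]

p_direct = stats.poisson.pmf(k, lam1 + lam2)
print(np.max(np.abs(p_sum - p_direct)))  # floating-point noise only
```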
Limit of the binomial. Poisson is the limiting case of Binomial as n grows large and p shrinks toward zero, with the product np = λ held fixed. That is why Poisson shows up wherever there are "rare events across many trials." Each individual user has a tiny probability of clicking your push notification in the next second, but you have millions of users, so the total click count looks Poisson.
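To see the limit numerically, this sketch holds np = λ = 3 fixed and measures the largest pointwise gap between the Binomial(n, λ/n) and Poisson(λ) PMFs as n grows. The specific n values are arbitrary choices.

```python
import numpy as np
from scipy import stats

lam = 3.0
k = np.arange(30)
poisson_pmf = stats.poisson.pmf(k, lam)

# Keep n * p = lam fixed while n grows; the gap to Poisson should shrink.
gaps = {}
for n in (10, 100, 10_000):
    binom_pmf = stats.binom.pmf(k, n, lam / n)
    gaps[n] = float(np.max(np.abs(binom_pmf - poisson_pmf)))
    print(n, f"{gaps[n]:.6f}")
```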
Discreteness. X only takes values 0, 1, 2, 3, … — non-negative integers. You can never observe "2.5 errors per session." This sounds obvious until someone fits a Gaussian to small-count data and reports a probability of negative crashes. It happens in production dashboards more than anyone admits.
Skew at small λ. The skewness of Poisson(λ) is 1/√λ, so for λ less than about 5 the distribution is visibly right-skewed: long tail on the right, hard wall at zero. Once λ exceeds about 20, Poisson(λ) is well approximated by a normal distribution with mean λ and variance λ (standard deviation √λ), and most analysts switch to the normal approximation for confidence intervals.
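A short sketch of how quickly the skew fades: scipy returns the skewness of Poisson(λ), which equals 1/√λ. The λ values are illustrative.

```python
from scipy import stats

# Skewness of Poisson(lam) is 1 / sqrt(lam): strong at small lam, fading as lam grows.
skews = {}
for lam in (1, 5, 20, 100):
    skews[lam] = float(stats.poisson(mu=lam).stats(moments="s"))
    print(lam, round(skews[lam], 3))
```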
When Poisson fits
Poisson is the right model when three conditions roughly hold. Events are independent: one support ticket arriving does not cause another. The average intensity is constant across the window you are measuring, or you are willing to slice the window into pieces where it is roughly constant. Events are rare relative to the number of opportunities: any given user has a tiny chance of doing the thing in any given second, but there are many users and many seconds.
Concrete analytics examples: bugs per release, push notifications clicked per day per cohort, refunds per week per SKU, HTTP 500 errors per minute per service. Inside Netflix or Uber, nearly every count-per-window metric on a status dashboard is implicitly Poisson under the hood — it makes the math for confidence bands and anomaly detection tractable.
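Anomaly bands are one place where that tractability pays off. As a sketch under stated assumptions (a hypothetical baseline of 4 HTTP 500s per minute and a target of roughly one false page per 1,000 quiet minutes), the alert threshold is simply a Poisson quantile; `should_alert` is an illustrative helper, not an existing API.

```python
from scipy import stats

baseline_rate = 4.0       # hypothetical: typical HTTP 500s per minute for one service
false_alarm_rate = 0.001  # hypothetical target: ~1 false page per 1,000 quiet minutes

# Smallest k with P(X <= k) >= 1 - false_alarm_rate; alert only above it.
threshold = int(stats.poisson.ppf(1 - false_alarm_rate, baseline_rate))

def should_alert(observed_count: int) -> bool:
    """Page if this minute's count exceeds the Poisson quantile threshold."""
    return observed_count > threshold

print(threshold)                                    # 11 when baseline_rate = 4.0
print(stats.poisson.sf(threshold, baseline_rate))   # P(X > threshold), below 0.001
print(should_alert(9), should_alert(14))
```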
Common pitfalls
The first pitfall is ignoring overdispersion. When teams pull "purchases per user per month" and treat the count as Poisson, they usually find empirical variance two, five, or ten times the mean. This overdispersion almost always comes from heterogeneity: heavy buyers and light buyers averaged together. The fix is the negative binomial, which adds a dispersion parameter and reduces to Poisson in the limit as that extra dispersion goes to zero. Teams at Airbnb and Notion default to negative binomial for per-user count data for this reason.
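The heterogeneity story can be simulated directly. This sketch assumes each user has their own long-run rate drawn from a gamma distribution (the mean of 2 and shape of 0.8 are hypothetical), then draws Poisson counts per user. The resulting gamma-Poisson mixture is exactly a negative binomial with mean μ and variance μ + μ²/r, which scipy's `nbinom` reproduces with n = r and p = r / (r + μ).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, r = 2.0, 0.8        # hypothetical: mean purchases per user-month, gamma shape
n_users = 200_000

# Each user gets their own rate (heavy vs light buyers), drawn from a gamma with mean mu.
user_rates = rng.gamma(shape=r, scale=mu / r, size=n_users)
purchases = rng.poisson(user_rates)

print(purchases.mean(), purchases.var())  # mean near 2.0, variance near mu + mu**2 / r = 7.0

# The gamma-Poisson mixture is a negative binomial with these parameters.
nb = stats.nbinom(n=r, p=r / (r + mu))
print(nb.mean(), nb.var())                # approximately 2.0 and 7.0
```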
The second pitfall is treating clustered events as independent. When a bug ships at Vercel and triggers a wave of error reports, those events share a common cause. A Poisson model will underestimate the probability of extreme spikes, your alerting will be too quiet during incidents, and post-mortems will be confused. The fix is either to model the clustering directly (Hawkes processes, compound Poisson) or to coarsen the time window so the bursts are absorbed.
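A compound Poisson sketch makes the underestimation concrete. It assumes a hypothetical 0.5 independent root causes per minute, each triggering a geometric burst of error reports with mean 4. It then compares the observed chance of a 15+ report minute against a plain Poisson with the same mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_minutes = 200_000
incident_rate = 0.5   # hypothetical: independent root causes per minute
mean_burst = 4.0      # hypothetical: average error reports triggered per root cause

# Compound Poisson: a Poisson number of incidents per minute, each adding a
# geometric burst of reports (support 1, 2, ...; mean = mean_burst).
incidents = rng.poisson(incident_rate, size=n_minutes)
burst_sizes = rng.geometric(1 / mean_burst, size=incidents.sum())
minute_of_burst = np.repeat(np.arange(n_minutes), incidents)
reports = np.bincount(minute_of_burst, weights=burst_sizes, minlength=n_minutes)

mean_reports = reports.mean()
print(mean_reports, reports.var())  # mean near 2, variance near 14: heavily overdispersed

# Chance of a spike of 15+ reports: simulated vs. a Poisson with the same mean.
spike = 15
observed_tail = (reports >= spike).mean()
poisson_tail = stats.poisson.sf(spike - 1, mean_reports)
print(observed_tail, poisson_tail)  # the Poisson tail is orders of magnitude smaller
```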
The third pitfall is using a constant λ when the rate is obviously time-varying. Order volume at a food-delivery platform is not flat across the day — there is a lunch peak, a dinner peak, and a dead overnight. If you fit one Poisson to the whole day, the model will fit none of it well. Either split into hourly buckets where the rate is roughly constant, or use a non-homogeneous Poisson process where λ becomes λ(t).
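The bucketing fix can be sketched with simulated data. The hourly order profile below is hypothetical (lunch and dinner peaks). Within each hour-of-day bucket, the maximum-likelihood λ is just that bucket's mean count, while pooling the whole day produces a variance far above the mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hourly order rates with a lunch and a dinner peak (orders per hour).
true_hourly_rate = np.array([2, 1, 1, 1, 1, 2, 5, 10, 14, 12, 15, 40,
                             55, 35, 15, 12, 14, 30, 60, 50, 25, 12, 6, 3], dtype=float)

# 90 days of simulated counts: rows are days, columns are hours of the day.
counts = rng.poisson(true_hourly_rate, size=(90, 24))

# One Poisson per hourly bucket: the MLE of each bucket's lambda is its mean count.
hourly_lambda = counts.mean(axis=0)

# A single whole-day lambda hides the daily shape, so the mean/variance check fails.
pooled = counts.ravel()
print(pooled.mean(), pooled.var())               # variance far above the mean
print(np.round(hourly_lambda[[3, 12, 18]], 1))   # near 1, 55, 60
```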
The fourth pitfall is forgetting that Poisson counts events in a window. If your data is "time between events" instead of "number of events per window," you want the exponential, not the Poisson — they are dual but not interchangeable. Plenty of analysts have wasted a Tuesday trying to fit Poisson to a column that should have been modeled with the exponential.
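If the column you have is inter-arrival gaps, a sketch of the right fit looks like this. The gaps are simulated with a hypothetical true rate of 4 per hour. Pinning the location at 0, scipy's exponential fit returns `(loc, scale)`, and the estimated rate is 1 / scale, which equals 1 / mean(gaps).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical column of gaps between consecutive events, in hours (true rate: 4 per hour).
gaps = rng.exponential(scale=1 / 4, size=2_000)

# Fit an exponential with location fixed at 0; scipy returns (loc, scale).
loc, scale = stats.expon.fit(gaps, floc=0)
rate_per_hour = 1 / scale   # the MLE here is 1 / mean(gaps)
print(round(rate_per_hour, 2))  # close to 4
```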
The fifth pitfall is using the normal approximation too aggressively at small λ. For λ = 2, a normal with mean 2 and variance 2 puts about 8% of its mass below zero, which is nonsense for a count. Stick to the exact Poisson PMF and CDF when λ is small; switch to the Gaussian approximation only when λ comfortably exceeds 20.
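A sketch comparing the two regimes: for each λ it builds the approximating normal (mean λ, standard deviation √λ), reports how much of its mass falls below zero, and compares an exact Poisson CDF value with the continuity-corrected normal value. The choice of λ = 2 and λ = 50 is illustrative.

```python
import numpy as np
from scipy import stats

results = {}
for lam in (2, 50):
    normal = stats.norm(loc=lam, scale=np.sqrt(lam))  # normal with variance lam
    mass_below_zero = normal.cdf(0)
    # P(X <= k) at k = lam + 2 * sqrt(lam): exact vs. normal with continuity correction.
    k = int(lam + 2 * np.sqrt(lam))
    exact = stats.poisson.cdf(k, lam)
    approx = normal.cdf(k + 0.5)
    results[lam] = (mass_below_zero, exact, approx)
    print(lam, round(mass_below_zero, 4), round(exact, 4), round(approx, 4))
```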
Link to the exponential distribution
Poisson and the exponential distribution are two views of the same underlying object — the Poisson process. Poisson answers "how many events happened in this window?" The exponential answers "how long until the next event?" If counts per unit time follow Poisson(λ), then inter-arrival times follow Exponential(λ) with mean 1/λ.
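The duality can be checked by simulation, as a sketch: build one long process from exponential gaps with mean 1/λ (λ = 4 per hour here, an arbitrary choice), then count arrivals in one-hour windows. The counts should behave like Poisson(4), including the share of empty hours matching e^(-4).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
rate = 4.0                # events per hour
horizon_hours = 20_000

# Interval view: exponential inter-arrival gaps with mean 1 / rate.
# Draw ~10% extra gaps so the cumulative times safely pass the horizon.
gaps = rng.exponential(scale=1 / rate, size=int(rate * horizon_hours * 1.1))
arrival_times = np.cumsum(gaps)
arrival_times = arrival_times[arrival_times < horizon_hours]

# Count view: number of arrivals in each one-hour window.
counts_per_hour = np.bincount(arrival_times.astype(int), minlength=horizon_hours)

zero_share = np.mean(counts_per_hour == 0)
print(counts_per_hour.mean(), counts_per_hour.var())  # both near 4
print(zero_share, stats.poisson.pmf(0, rate))         # both near e**-4
```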
A worked example: a support queue at Stripe receives 4 tickets per hour. Then λ = 4 per hour, and the mean time between consecutive tickets is 1/λ = 0.25 hours, or 15 minutes. This is a common interview follow-up to Poisson: you are given λ and asked for the expected waiting time, or for the probability that no ticket arrives in the next 30 minutes, which is P(T > 0.5) = e^(-4 · 0.5) = e^(-2) ≈ 0.135, about 13.5%.
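The same numbers in scipy, as a sketch. Note that `stats.expon` is parameterized by scale = 1/λ rather than by the rate itself. The last step checks the answer from the count view: "no ticket in 0.5 hours" is the event that a Poisson(4 · 0.5) count equals zero.

```python
import math
from scipy import stats

rate = 4.0                          # tickets per hour
wait = stats.expon(scale=1 / rate)  # scipy uses scale = 1 / rate

mean_gap_minutes = wait.mean() * 60  # 0.25 hours = 15 minutes
p_quiet_30min = wait.sf(0.5)         # P(T > 0.5 hours)

# Count view: zero tickets in 0.5 hours means a Poisson(rate * 0.5) count of 0.
p_zero_tickets = stats.poisson.pmf(0, rate * 0.5)
closed_form = math.exp(-rate * 0.5)  # e^(-2)

print(mean_gap_minutes, round(p_quiet_30min, 4), round(p_zero_tickets, 4), round(closed_form, 4))
# 15.0 0.1353 0.1353 0.1353
```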
This duality has real operational use. Capacity planning at Uber dispatch, queue-length math at customer support, and SLA modeling at Vercel all lean on swapping between the count view and the interval view. Expect at least one interview question that bounces between the two.
Python recipe with scipy
```python
from scipy import stats

# Average rate: 3 events per window
poisson = stats.poisson(mu=3)

# Probability of exactly 5 events
print(poisson.pmf(5))  # 0.1008

# Probability of at most 2 events: P(X <= 2)
print(poisson.cdf(2))  # 0.4232

# Probability of more than 5 events: P(X > 5)
print(1 - poisson.cdf(5))  # 0.0839 (poisson.sf(5) gives the same, more accurately in the far tail)

# Quantile: smallest k with P(X <= k) >= 0.95
print(poisson.ppf(0.95))  # 6.0

# Sample 10,000 draws (random_state fixed so reruns match)
samples = poisson.rvs(size=10000, random_state=0)
print(f"mean: {samples.mean():.2f}, var: {samples.var():.2f}")
# mean ~ 3.00, var ~ 3.00: they should roughly match for Poisson
```

A practical note: for discrete distributions, use pmf (probability mass function), not pdf. Mixing them up is a classic mistake when you cargo-cult code over from continuous distributions like Normal or Exponential.
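You can also fit Poisson to data directly. For a single global rate, the maximum-likelihood estimate of λ is simply the sample mean. With covariates, statsmodels fits a Poisson regression with sm.GLM(y, X, family=sm.families.Poisson()).fit(), where X should include an intercept column (sm.add_constant). The default link is the log, so the model is log λ = Xβ and the fitted λ varies with covariates like region, hour-of-day, or channel. That is far more useful than a single global rate.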
Interview questions
Interview screens at Snowflake, Databricks, and similar data-heavy companies touch Poisson when stats comes up. The questions are predictable.
"What does the Poisson distribution model?"
The number of independent events occurring at a constant intensity over a fixed window. The parameter λ is the expected number of events per window. Canonical examples: clicks on a banner per day, defects per batch, inbound calls per hour, login attempts per minute. The right answer also names the assumptions — independence, constant rate, rare events — because the next question is usually "and when does it fail?"
"What are the mean and variance of Poisson?"
Both equal λ: E(X) = Var(X) = λ. This is the property to memorize and to drop into the conversation early — it tells the interviewer you actually understand the distribution. Follow up by noting that empirical variance far above the mean is the standard signal for overdispersion, which usually means negative binomial is a better fit.
"How are Poisson and the exponential distribution related?"
They describe the same Poisson process from two angles. If the count of events per unit time follows Poisson(λ), the time between consecutive events follows Exponential(λ) with mean 1/λ. Poisson counts events; the exponential measures gaps. A clean way to phrase it in an interview: "they are dual — same process, different question."
"Give an example where Poisson does not fit count data."
Purchases per user per month: some users buy weekly, others once a year. The variance is far larger than the mean (overdispersion), so Poisson understates the spread and its intervals come out too narrow. Negative binomial is the standard replacement. Another good example is bursty events, such as DDoS attack traffic or viral content spikes, where the independence assumption breaks because one event makes others more likely.
"Support gets 6 tickets per hour on average. What is the probability of exactly 10 tickets in an hour?"
X follows Poisson(6). P(X = 10) = (6^10 · e^(-6)) / 10! ≈ 0.0413, about 4.1%. In Python: stats.poisson(mu=6).pmf(10). A good follow-up to volunteer: "and the probability of more than 10 is 1 - stats.poisson(mu=6).cdf(10) ≈ 0.0426" — it shows you understand the CDF without being asked.
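Both numbers can be reproduced in a few lines, as a sketch: once by evaluating the PMF formula directly, and once through scipy, where `sf(10)` is P(X > 10) = 1 - cdf(10).

```python
import math
from scipy import stats

lam, k = 6, 10

# Direct evaluation of the PMF formula.
p_exact_10 = lam**k * math.exp(-lam) / math.factorial(k)

# The same quantities via scipy; sf(k) is P(X > k) = 1 - cdf(k).
p_scipy_10 = stats.poisson(mu=lam).pmf(k)
p_more_than_10 = stats.poisson(mu=lam).sf(k)

print(round(p_exact_10, 4), round(p_scipy_10, 4), round(p_more_than_10, 4))
# 0.0413 0.0413 0.0426
```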
Related reading
- Binomial distribution guide — the cousin Poisson approximates in the rare-event limit.
- Normal distribution guide — what Poisson approaches for large λ.
- Variance and standard deviation — the spread half of the E(X) = Var(X) = λ property.
FAQ
What is the Poisson distribution in simple terms?
The Poisson distribution tells you the probability of seeing exactly 0, 1, 2, 3, or more events in a fixed window, given an average rate. If a help desk receives 5 emails per hour on average, Poisson tells you how often you will actually see 0, 3, or 10 emails in any given hour. It is the default model for count-per-window data when events are roughly independent and the rate is stable.
Why are the mean and variance of Poisson equal?
It falls out of the algebra of the probability mass function — when you compute E(X) and Var(X) from the Poisson PMF, both sums collapse to λ. Practically, this property is what makes Poisson easy to diagnose: compute the sample mean and sample variance of your count data, and if they are close, Poisson is plausible. If the variance is much larger, switch to negative binomial.
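For readers who want the algebra, here is the standard derivation. Both sums reduce to the Taylor series of e^λ after cancelling factorials, and the variance follows from E(X(X - 1)):

```latex
\begin{aligned}
E(X) &= \sum_{k=0}^{\infty} k \, \frac{\lambda^k e^{-\lambda}}{k!}
      = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!}
      = \lambda e^{-\lambda} e^{\lambda} = \lambda, \\
E\big(X(X-1)\big) &= \sum_{k=0}^{\infty} k(k-1) \, \frac{\lambda^k e^{-\lambda}}{k!}
      = \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!}
      = \lambda^2, \\
\operatorname{Var}(X) &= E\big(X(X-1)\big) + E(X) - \big(E(X)\big)^2
      = \lambda^2 + \lambda - \lambda^2 = \lambda.
\end{aligned}
```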
How is Poisson different from the normal distribution?
Poisson is discrete (only non-negative integers) and right-skewed when λ is small. It has one parameter, λ. The normal distribution is continuous, symmetric, and parameterized by mean μ and standard deviation σ. For λ greater than roughly 20, Poisson(λ) is well approximated by a normal with μ = λ and σ = √λ (variance λ), which is convenient for confidence intervals but breaks down at small λ, where you can get nonsensical negative quantiles.
How do I check whether Poisson fits my data?
Three checks, in increasing order of rigor. First, compare the sample mean and sample variance; they should be roughly equal. Second, plot a histogram of your counts and overlay the theoretical Poisson PMF with λ set to the sample mean; the shapes should match. Third, run a formal chi-square goodness-of-fit test against the theoretical Poisson. If variance is much larger than mean, fit a negative binomial regression instead and compare AIC.
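A sketch of the first and third checks with scipy, using simulated counts as a stand-in for real data. Counts of 8 or more are pooled into one tail bin so expected frequencies stay reasonably large (the cutoff of 8 is an assumption that suits λ near 3). `ddof=1` accounts for λ being estimated from the same data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
counts = rng.poisson(3.0, size=1_000)     # stand-in for your observed counts

lam_hat = counts.mean()                   # MLE of lambda
print(counts.mean(), counts.var(ddof=1))  # check 1: should be roughly equal

# Bins 0..K-1 plus a pooled tail bin ">= K".
K = 8
observed = np.bincount(np.minimum(counts, K), minlength=K + 1)
expected_probs = np.append(stats.poisson.pmf(np.arange(K), lam_hat),
                           stats.poisson.sf(K - 1, lam_hat))
expected = expected_probs * counts.size

# Check 3: chi-square goodness of fit; ddof=1 because lambda was estimated.
chi2, p_value = stats.chisquare(observed, expected, ddof=1)
print(round(chi2, 2), round(p_value, 3))  # a small p-value would argue against Poisson
```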
When should I use negative binomial instead of Poisson?
Use negative binomial whenever the empirical variance is meaningfully larger than the mean (overdispersion), which is most real-world count data once you have heterogeneous units like users, restaurants, or sessions. Negative binomial reduces to Poisson when its dispersion parameter approaches zero, so you lose nothing by defaulting to it for production models. Poisson stays valuable for back-of-envelope work, simple capacity planning, and interview problems where the data is clean by construction.
Can λ change over time in a Poisson model?
Yes — that is called a non-homogeneous Poisson process, where λ becomes λ(t). It is the right model for order volume that peaks at lunch and dinner, or for traffic that spikes during a product launch. In practice analysts either slice the day into roughly-constant buckets and fit a separate Poisson per bucket, or fit a Poisson regression where λ is parameterized by covariates like hour-of-day and day-of-week.
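Simulating a non-homogeneous process is a useful sanity check before fitting one. This sketch uses thinning (often credited to Lewis and Shedler): draw a homogeneous process at a rate `rate_max` that bounds λ(t), then keep each point at time t with probability λ(t) / rate_max. The lunch/dinner intensity curve is hypothetical, and per-hour-of-day averages of the simulated events recover its shape.

```python
import numpy as np

def rate(t_hours: np.ndarray) -> np.ndarray:
    """Hypothetical intensity (orders per hour) with lunch and dinner peaks."""
    hour = t_hours % 24
    lunch = 40 * np.exp(-0.5 * ((hour - 12.5) / 1.0) ** 2)
    dinner = 60 * np.exp(-0.5 * ((hour - 19.0) / 1.5) ** 2)
    return 2 + lunch + dinner

def simulate_nhpp(rate_fn, rate_max: float, horizon: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Thinning: homogeneous process at rate_max, keep each point w.p. rate_fn(t) / rate_max."""
    n_candidates = rng.poisson(rate_max * horizon)
    candidates = np.sort(rng.uniform(0, horizon, size=n_candidates))
    keep = rng.uniform(size=n_candidates) < rate_fn(candidates) / rate_max
    return candidates[keep]

rng = np.random.default_rng(8)
days = 60
# rate_max must bound rate(t) everywhere; this curve peaks near 62.
events = simulate_nhpp(rate, rate_max=65.0, horizon=24.0 * days, rng=rng)

# Average events per hour-of-day bucket, i.e. the per-bucket lambda estimates.
counts_by_hour = np.bincount(events.astype(int) % 24, minlength=24) / days
print(np.round(counts_by_hour[[3, 12, 19]], 1))
```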