Regression discontinuity explained

The scenario you have already seen

A customer with credit score 699 gets the standard offer. The one at 701 gets a promotional bonus, lower APR, and a welcome credit. Both are essentially the same person on the day they applied — similar income, similar history, probably one late utility payment apart. Yet one gets treatment and one does not.

That tiny line at 700 is doing real work. Measure lifetime value on each side and you are no longer comparing self-selected groups; you are comparing customers who are statistically twins, separated only by a bureaucratic rule. That is the point of regression discontinuity design.

You see this pattern more often than people realize. Stripe gates fee tiers on monthly volume. DoorDash sets bonus thresholds at trip counts. Uber promotes drivers at a star rating. Airbnb pushes Superhost status at a review threshold. Each cutoff is a free natural experiment sitting in the warehouse.

What RDD actually is

Regression discontinuity is a causal inference method that estimates the effect of a treatment assigned based on whether a continuous score crosses a cutoff. The score is the running variable, the cutoff is where treatment switches on, and the treatment effect is the size of the jump in the outcome at exactly that cutoff.

The mental model is simple. Plot the outcome on Y and the running variable on X. Fit a smooth curve on each side of the cutoff. If treatment has zero effect, the two curves meet smoothly. If treatment has a real effect, you see a vertical jump at the cutoff, and the size of that jump is your estimate.

What makes RDD compelling is that you do not need to assume anything heroic about why people end up on one side or the other. You only need to assume that everything other than the treatment is smooth across the cutoff. The customer at 700.0 and the one at 699.9 are almost identical on every trait; the only thing that flips is treatment.

A worked example

A fintech offers a cashback bonus to credit card applicants who score 700 or above on its risk model. You want to measure how much that bonus boosts twelve-month lifetime value.

A customer at 699 has no bonus and lands at $200 LTV. One at 701 gets the bonus and lands at $280. The $80 gap, if everything else is smooth at 700, is your causal estimate of the bonus effect. That gap is not the average effect across all customers — it is the local effect for customers right at the threshold, which is exactly the population a product team can argue about lowering the bar for.

Now imagine you naively compared all bonus customers to all non-bonus customers. The bonus group has higher scores, higher income, and lower default risk; they would have earned higher LTV with or without the bonus. The raw difference massively overstates the treatment effect. RDD strips that out by restricting attention to a thin slice on either side of 700, where the confounders have nowhere to hide.
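
The contrast between the naive comparison and the local one can be sketched on simulated data. Everything here is an assumption made for illustration: the data are synthetic, the true bonus effect is fixed at $80, and `local_linear_jump` is a hypothetical helper that fits one straight line on each side of the cutoff with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
CUTOFF = 700
TRUE_EFFECT = 80.0  # assumed bonus effect baked into the simulation

# Hypothetical data: LTV rises with score even without the bonus.
score = rng.uniform(600, 800, n)
treated = score >= CUTOFF
ltv = 200 + 1.5 * (score - CUTOFF) + TRUE_EFFECT * treated + rng.normal(0, 40, n)

# Naive comparison: all bonus customers vs all non-bonus customers.
naive_gap = ltv[treated].mean() - ltv[~treated].mean()


def local_linear_jump(x, y, cutoff, bandwidth):
    """Fit a separate line on each side of the cutoff; return the gap at the cutoff."""
    xc = x - cutoff
    left = (xc < 0) & (xc > -bandwidth)
    right = (xc >= 0) & (xc < bandwidth)
    # np.polyfit returns [slope, intercept]; the intercept is the fit at xc = 0.
    left_at_cutoff = np.polyfit(xc[left], y[left], 1)[1]
    right_at_cutoff = np.polyfit(xc[right], y[right], 1)[1]
    return right_at_cutoff - left_at_cutoff


rdd_gap = local_linear_jump(score, ltv, CUTOFF, bandwidth=30)
print(f"naive gap: {naive_gap:.1f}  RDD gap: {rdd_gap:.1f}  truth: {TRUE_EFFECT:.1f}")
```

In this setup the naive gap absorbs the score trend on top of the bonus, while the local estimate should land near the simulated effect.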

Sharp vs fuzzy

In a sharp design, treatment is a deterministic function of the running variable. Everyone with score 700 or above receives the bonus, everyone below does not, and there is no opt-out. Most algorithmic rules — credit cutoffs, eligibility scores, automated tier promotions — produce sharp designs.

In a fuzzy design, crossing the cutoff changes the probability of treatment but not deterministically. Maybe 80 percent of customers with score 700+ take the bonus offer and 20 percent ignore the email. Or a manual review process overrides the rule for borderline cases. Fuzzy RDD shows up wherever humans sit in the decision loop, or the treatment requires consent.

Fuzzy RDD needs an instrumental variables wrapper. The cutoff itself is the instrument: crossing it shifts the probability of treatment but, conditional on the running variable, should not affect the outcome through any other channel. You estimate two jumps at the cutoff, one in the outcome and one in the probability of treatment, and divide the first by the second to recover the local average treatment effect for compliers, the customers whose treatment status changes because they crossed the line.
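
A minimal sketch of that ratio on synthetic data follows. The take-up rates (80 percent above the cutoff, 5 percent below), the constant $80 effect, and the helper name `jump_at_cutoff` are all assumptions for illustration. The sketch gives a point estimate only, with no standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40_000
CUTOFF = 700
BANDWIDTH = 30
TRUE_EFFECT = 80.0  # assumed effect of actually taking the bonus

score = rng.uniform(600, 800, n)
above = score >= CUTOFF
# Hypothetical take-up: 80% above the cutoff, 5% below (e.g. manual overrides).
took_bonus = rng.random(n) < np.where(above, 0.80, 0.05)
ltv = 200 + 1.5 * (score - CUTOFF) + TRUE_EFFECT * took_bonus + rng.normal(0, 40, n)


def jump_at_cutoff(x, y, cutoff, bandwidth):
    """Gap between separate straight-line fits on each side, evaluated at the cutoff."""
    xc = x - cutoff
    left = (xc < 0) & (xc > -bandwidth)
    right = (xc >= 0) & (xc < bandwidth)
    return np.polyfit(xc[right], y[right], 1)[1] - np.polyfit(xc[left], y[left], 1)[1]


# Jump in the outcome at the cutoff (the "reduced form").
outcome_jump = jump_at_cutoff(score, ltv, CUTOFF, BANDWIDTH)
# Jump in the probability of treatment at the cutoff (the "first stage").
takeup_jump = jump_at_cutoff(score, took_bonus.astype(float), CUTOFF, BANDWIDTH)

late = outcome_jump / takeup_jump
print(f"outcome jump: {outcome_jump:.1f}  take-up jump: {takeup_jump:.2f}  LATE: {late:.1f}")
```

For inference rather than a point estimate, rdrobust accepts a `fuzzy` argument carrying the treatment-received indicator; check the documentation of your installed version for details.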

Assumptions that have to hold

The continuity assumption is the heart of RDD. All factors other than the treatment must vary smoothly through the cutoff. If your bonus rule fires at 700 but you also extend payment terms at 700 because of a separate policy, the jump conflates both effects. Before publishing any RDD result, list every policy that uses the same running variable and confirm none of them step at the same cutoff.
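
One way to probe continuity is to run the same jump regression on things that should not jump. The sketch below uses synthetic data with two hypothetical columns: `income`, which varies smoothly with score, and `terms_days`, a payment-terms policy that steps at 700. The names, the numbers, and the helper `jump_at_cutoff` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
CUTOFF = 700
BANDWIDTH = 30

score = rng.uniform(600, 800, n)
# Smooth pre-treatment covariate: no step at the cutoff.
income = 50_000 + 200 * (score - CUTOFF) + rng.normal(0, 8_000, n)
# A competing policy that also switches at 700 (30-day vs 45-day terms).
terms_days = np.where(score >= CUTOFF, 45.0, 30.0)


def jump_at_cutoff(x, y, cutoff, bandwidth):
    """Coefficient on the above-cutoff indicator in a local linear fit with separate slopes."""
    xc = np.asarray(x, dtype=float) - cutoff
    keep = np.abs(xc) < bandwidth
    xc = xc[keep]
    yk = np.asarray(y, dtype=float)[keep]
    above = (xc >= 0).astype(float)
    # Columns: intercept, treatment indicator, centered score, interaction.
    design = np.column_stack([np.ones_like(xc), above, xc, above * xc])
    coefs, *_ = np.linalg.lstsq(design, yk, rcond=None)
    return coefs[1]


income_jump = jump_at_cutoff(score, income, CUTOFF, BANDWIDTH)
terms_jump = jump_at_cutoff(score, terms_days, CUTOFF, BANDWIDTH)
print(f"income jump: {income_jump:.0f}  payment-terms jump: {terms_jump:.1f} days")
```

A covariate jump near zero is reassuring. A clear step in another policy variable, as with the payment terms here, means the outcome jump cannot be attributed to the bonus alone.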

The no-manipulation assumption is the one that usually fails. If applicants can game the score — by knowing the threshold and submitting an extra document to nudge themselves over — then 700.1 is no longer comparable to 699.9. The 700.1 group is enriched with strategic customers. A density test (McCrary 2008) checks whether the distribution shows an unnatural pile-up just above the cutoff. A clean RDD has a smooth density; a manipulated one has a visible bump.
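
A rough first look at manipulation can be done by counting observations on each side of the cutoff. This is not the McCrary test, which fits local densities; it is a cruder check that treats the count just above the cutoff as a coin flip. The function `side_count_check`, the window width, and the simulated manipulation (30 percent of near-miss applicants pushing themselves over the line) are assumptions for illustration.

```python
import math
import numpy as np


def side_count_check(x, cutoff, width):
    """Compare counts just below vs just above the cutoff.

    With a smooth density and a narrow window, an observation is roughly
    equally likely to fall on either side, so the count above is
    approximately Binomial(n, 0.5).
    """
    xc = np.asarray(x, dtype=float) - cutoff
    n_below = int(((xc >= -width) & (xc < 0)).sum())
    n_above = int(((xc >= 0) & (xc < width)).sum())
    n = n_below + n_above
    z = (n_above - n / 2) / math.sqrt(n / 4)      # normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided
    return n_below, n_above, z, p_value


rng = np.random.default_rng(5)
n = 20_000
CUTOFF = 700
scores = rng.uniform(600, 800, n)

# Hypothetical manipulation: 30% of applicants within 5 points below the cutoff
# manage to add 5 points and land just above it.
near_miss = (scores >= CUTOFF - 5) & (scores < CUTOFF)
bump = near_miss & (rng.random(n) < 0.3)
manipulated = scores.copy()
manipulated[bump] += 5

_, _, clean_z, clean_p = side_count_check(scores, CUTOFF, width=5)
_, _, manip_z, manip_p = side_count_check(manipulated, CUTOFF, width=5)
print(f"clean: z={clean_z:.2f}  manipulated: z={manip_z:.2f}")
```

A large positive z on real data is a reason to run a proper density test and to ask how users could have moved their own score.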

The sufficient density assumption is about power. You need enough observations close to the cutoff to estimate two regressions on a narrow window. If only six customers fall within your bandwidth of the threshold, you cannot do RDD on that cutoff regardless of how clean it looks.
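
Before fitting anything, it can help to count how many observations sit on each side of the cutoff at a few candidate window widths. The helper below is a hypothetical sketch; the scores and widths are made up.

```python
import numpy as np


def counts_near_cutoff(x, cutoff, widths):
    """Return {width: (n_below, n_above)} for observations within each width of the cutoff."""
    xc = np.asarray(x, dtype=float) - cutoff
    result = {}
    for w in widths:
        n_below = int(((xc >= -w) & (xc < 0)).sum())
        n_above = int(((xc >= 0) & (xc < w)).sum())
        result[w] = (n_below, n_above)
    return result


# Tiny made-up sample of scores around a cutoff of 700.
sample_scores = [690, 695, 699, 700, 701, 710, 720]
counts = counts_near_cutoff(sample_scores, cutoff=700, widths=[5, 15])
print(counts)
```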

Python implementation

The minimal implementation is a regression on a window around the cutoff with a treatment indicator and an interaction term. You center the running variable at the cutoff so the coefficient on the treatment indicator is exactly the jump.

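import pandas as pd
import statsmodels.api as sm

# Assumes `data` is a pandas DataFrame with one row per applicant and
# numeric columns 'score' (running variable) and 'ltv' (outcome).
CUTOFF = 700
BANDWIDTH = 30

# Keep only applicants within BANDWIDTH points of the cutoff.
window = data[(data['score'] - CUTOFF).abs() < BANDWIDTH].copy()
window['treatment'] = (window['score'] >= CUTOFF).astype(int)
window['score_centered'] = window['score'] - CUTOFF
# The interaction lets the slope differ above and below the cutoff.
window['interact'] = window['treatment'] * window['score_centered']

X = sm.add_constant(window[['treatment', 'score_centered', 'interact']])
model = sm.OLS(window['ltv'], X).fit(cov_type='HC1')  # heteroskedasticity-robust SEs
print(model.summary())
# The coefficient on 'treatment' is the estimated jump at the cutoff,
# i.e. the local average treatment effect.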

The interaction term lets the slope of the outcome differ on each side of the cutoff. Without it, you would force both sides to share a single slope, which biases the jump estimate whenever the outcome trends differently in the treated and control regions.
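
Written as an equation, the regression looks like this. The symbols are illustrative rather than taken from a particular reference: Y is LTV, X the score, c the cutoff, and D the treatment indicator.

```latex
Y_i = \alpha + \tau D_i + \beta (X_i - c) + \gamma D_i (X_i - c) + \varepsilon_i,
\qquad D_i = \mathbf{1}[X_i \ge c]
```

At X = c both slope terms vanish, so the fitted value is α approaching the cutoff from below and α + τ at the cutoff. The jump is τ, the slope on the left is β, and the slope on the right is β + γ.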

For production analyses, the rdrobust package handles bandwidth selection, bias correction, and robust standard errors in one call:

from rdrobust import rdrobust, rdplot

result = rdrobust(y=data['ltv'], x=data['score'], c=CUTOFF)
print(result)

rdplot(y=data['ltv'], x=data['score'], c=CUTOFF, p=2)

The plot is non-negotiable. Show the binned scatter and fitted lines on either side of the cutoff to anyone reading the result. A jump on a chart is far more persuasive than a coefficient in a regression table, and an absent jump should make you skeptical of any coefficient that claims otherwise.

Choosing the bandwidth

The bandwidth controls how close to the cutoff you look. A wide bandwidth uses more data and shrinks the standard error, but pulls in customers far from the threshold who may differ systematically from the borderline group. A narrow bandwidth keeps the comparison tight but throws away observations and inflates the variance.

The standard approach is to let an algorithm pick it. The Imbens-Kalyanaraman and Calonico-Cattaneo-Titiunik procedures balance bias and variance by minimizing the mean squared error of the treatment estimate. rdrobust applies an MSE-optimal selector of this kind by default; check the documentation of your installed version for the selectors it supports, since the list has changed across releases. The numbers they pick are not magical, but they remove a degree of researcher discretion that reviewers attack.

Always report your estimate at the optimal bandwidth, at half of it, and at double it. If the three numbers are similar, your result is robust to the bandwidth choice. If they swing wildly, the cutoff effect is not as clean as the headline coefficient suggests, and you should say so before someone else does.
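
The half, chosen, and double comparison can be sketched as a short loop. The data here are synthetic, `H_CHOSEN = 20` is a placeholder for whatever bandwidth your selector returns, and `local_linear_jump` is a hypothetical helper using equal weights within the window.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40_000
CUTOFF = 700
TRUE_EFFECT = 80.0  # assumed effect in the simulation

score = rng.uniform(600, 800, n)
ltv = 200 + 1.5 * (score - CUTOFF) + TRUE_EFFECT * (score >= CUTOFF) + rng.normal(0, 40, n)


def local_linear_jump(x, y, cutoff, bandwidth):
    """Return (jump estimate, n below, n above) from straight-line fits on each side."""
    xc = x - cutoff
    left = (xc < 0) & (xc > -bandwidth)
    right = (xc >= 0) & (xc < bandwidth)
    jump = np.polyfit(xc[right], y[right], 1)[1] - np.polyfit(xc[left], y[left], 1)[1]
    return jump, int(left.sum()), int(right.sum())


H_CHOSEN = 20  # placeholder: substitute the algorithm-chosen bandwidth
results = {}
for label, h in [("half", H_CHOSEN / 2), ("chosen", H_CHOSEN), ("double", 2 * H_CHOSEN)]:
    results[label] = local_linear_jump(score, ltv, CUTOFF, h)
    jump, n_below, n_above = results[label]
    print(f"{label:>6} (h={h:g}): jump={jump:.1f}  n_below={n_below}  n_above={n_above}")
```

With rdrobust, the same check can be run by passing each bandwidth through its `h` argument instead of letting it select one.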

When to reach for RDD

RDD is the right tool whenever a sharp rule already exists in your product. The classic settings are credit score cutoffs, eligibility thresholds for promotions, scholarship grade cutoffs, age-based policies, and elections decided by narrow margins. If you can point at a line in a config file or a policy document and say "everyone above this gets treatment X," RDD is on the table.

The alternatives have different fits. A randomized A/B test beats RDD whenever you can run one — randomization is cleaner and generalizes to the whole population, not just the threshold. Difference-in-differences works when the treatment switches on at a moment in time. Propensity score matching is the move when there is no cutoff and you have to lean on covariate balance instead.

The hardest part is usually political, not statistical. Engineering teams keep tweaking the cutoff between releases. Marketing tests a new threshold for a quarter and forgets to document it. The cleanest RDD lives in a domain where the rule has been stable for months and the policy owner will confirm in writing that nothing else was changing at the same threshold during your study window.

Common pitfalls

Ignoring manipulation is the single most damaging mistake. If applicants can see the score before submission and resubmit when they fall short, the density of observations spikes just above the cutoff and your estimate is contaminated by strategic applicants. Always run a density test, plot the histogram on a fine grid, and walk through with the product team how the score is computed and whether users see it before the treatment fires.

Choosing the wrong functional form is the second trap. If the real relationship is curved and you fit a straight line on each side, the slopes pivot to compensate and load that misfit onto the jump at the cutoff. Plot the binned data first, then choose between linear, quadratic, or local polynomial fits based on what you see. A local linear fit within a data-driven bandwidth is the usual default, and it is what rdrobust fits unless told otherwise. A quadratic fit is a useful robustness check, while high-order polynomials fitted over a wide range tend to produce noisy, misleading jumps.
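
The misfit is easy to reproduce on synthetic data. In the sketch below the outcome is linear below the cutoff and flattens above it, a shape chosen purely for illustration, and the same wide window is fitted with a straight line and then with a quadratic on each side. `poly_jump` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40_000
CUTOFF = 700
BANDWIDTH = 50
TRUE_EFFECT = 80.0  # assumed effect in the simulation

score = rng.uniform(600, 800, n)
xc = score - CUTOFF
above = xc >= 0
# Hypothetical curve: linear below the cutoff, concave (flattening) above it.
baseline = 200 + 1.5 * xc - 0.03 * np.where(above, xc ** 2, 0.0)
ltv = baseline + TRUE_EFFECT * above + rng.normal(0, 40, n)


def poly_jump(xc, y, bandwidth, degree):
    """Gap at the cutoff between separate polynomial fits of the given degree."""
    left = (xc < 0) & (xc > -bandwidth)
    right = (xc >= 0) & (xc < bandwidth)
    # np.polyfit lists coefficients from the highest power down,
    # so the last entry is the fitted value at xc = 0.
    right_at_cutoff = np.polyfit(xc[right], y[right], degree)[-1]
    left_at_cutoff = np.polyfit(xc[left], y[left], degree)[-1]
    return right_at_cutoff - left_at_cutoff


linear_jump = poly_jump(xc, ltv, BANDWIDTH, degree=1)
quadratic_jump = poly_jump(xc, ltv, BANDWIDTH, degree=2)
print(f"linear: {linear_jump:.1f}  quadratic: {quadratic_jump:.1f}  truth: {TRUE_EFFECT:.1f}")
```

In this setup the straight-line fit overstates the jump because the line on the right tilts to follow the curve. Shrinking the bandwidth is the other way to reduce that bias, at the cost of more variance.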

Cherry-picking the bandwidth is the third. It is tempting to nudge the window until the p-value crosses below five percent — that is p-hacking dressed in technical language. Pre-register the bandwidth using a published rule, or commit to reporting the estimate at the algorithm-chosen bandwidth and at the two flanking sensitivities. If your story falls apart when you double the window, your story was never robust.

Overstating external validity is the fourth. RDD estimates a local average treatment effect at the cutoff. The effect of a credit bonus on customers right around 700 may be larger or smaller than the effect at 800. When you brief leadership, lead with the population the estimate actually applies to, not the one you wish it did.

On the interview whiteboard

If an interviewer asks you to explain RDD in two minutes, walk them through the credit score example first, then name the assumptions. Resist diving into bandwidth math before the interviewer has anchored on the intuition. Once they have the picture, you can talk about sharp versus fuzzy, the continuity assumption, the manipulation test, and how you would estimate it with a regression and a binned scatter plot.

A common follow-up is the difference between RDD and an A/B test. A/B is better when you can do it; RDD is what you reach for when randomization is impossible. The cutoff itself is not random, but the local comparison at the cutoff is as good as random in the limit, which is enough to recover a causal effect for that local population.

FAQ

Can I do RDD inside an A/B test?

If you already have a clean randomized experiment, you do not need RDD — randomization handles confounding for the whole population, while RDD only handles it at the cutoff. RDD is a substitute for randomization, not a complement. The exception is when an experiment was randomized but the treatment was only delivered to people who also crossed a separate score threshold; then you might combine the two designs.

How many observations do I need near the cutoff?

Rule of thumb: at least a few hundred observations within the optimal bandwidth on each side for a stable estimate of a moderate effect. If the effect is small or the outcome is noisy, you need more. The right diagnostic is to run a power calculation at your expected effect size before you start, not after you read off a non-significant coefficient.
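
A back-of-envelope power calculation can be sketched as below. It is an approximation under stated assumptions, not what rdrobust computes: an equal-weight local linear fit, scores spread evenly inside the bandwidth, the same number of observations and the same noise level on each side, and no bias correction. Under those assumptions each fitted value at the cutoff has variance 4σ²/n, so the jump has variance 8σ²/n. The function names and example numbers are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_rdd_power(effect, sigma, n_per_side, z_crit=1.96):
    """Approximate power of a two-sided test of 'no jump' at the cutoff.

    Assumes an equal-weight local linear fit with evenly spread scores,
    so the standard error of the jump is sigma * sqrt(8 / n_per_side).
    """
    se = sigma * math.sqrt(8.0 / n_per_side)
    shift = abs(effect) / se
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


# Hypothetical scenario: a $10 effect on an outcome with $100 of noise.
for n_per_side in (200, 1_000, 5_000):
    power = approx_rdd_power(effect=10, sigma=100, n_per_side=n_per_side)
    print(f"n per side = {n_per_side:>5}: power ~ {power:.2f}")
```

The point of the exercise is the order of magnitude: a small effect on a noisy outcome can need thousands of observations inside the window, not hundreds.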

What if my running variable is not continuous?

If it is ordinal with many levels, like a score from 0 to 1000 in integer steps, RDD is fine as long as you have observations across many of those levels near the cutoff. If it is coarse, like a 1 to 5 star rating, classic RDD breaks down because the local linear fits collapse to a comparison of two adjacent values. Discrete-running-variable RDD exists but the inferential guarantees are weaker, and in practice many analysts switch to a different design.

How do I explain this to a non-technical stakeholder?

Show the binned scatter plot with the cutoff marked. Walk them through the line on the left, the line on the right, and what the size of the jump means in business units. Then say one sentence about what would have to be true for the result to be wrong: nothing else flips at the cutoff and customers cannot manipulate their score. Stakeholders care about the picture and the story; the regression table is for the appendix.