How to calculate K-factor in SQL
Contents:
What K-factor is and why it matters
K-factor is the quantitative measure of virality: on average, each existing user brings in this many new users. When K is greater than or equal to 1, the product grows on its own without paid acquisition. When K is below 1, the viral loop still acts as a multiplier on paid acquisition, but it does not produce the kind of organic explosion people associate with "going viral." Picture this: it is Monday morning, your growth PM at a place like Notion or Slack pings you on Linear and asks for the current K-factor by share channel and by signup cohort, with a note on whether cycle time is regressing. You have until the standup at noon to put numbers on a dashboard. The query has to be right the first time because growth review meets that afternoon and the head of product will quote whatever number you ship.
In B2C social apps and B2B collaboration tools like Notion, Figma, or the early-stage clones of Slack, K-factor sits at the center of growth interviews. It is also one of the easiest metrics to get wrong. The most common ways analysts at companies like Meta, Airbnb, and DoorDash mess up the calculation are counting invites instead of conversions, ignoring cycle time entirely, dividing by all registered users instead of active users, and failing to filter out self-invites where the same person clicks their own referral link from a second device.
This post walks through the formulas, gives seven SQL recipes you can paste into Snowflake, BigQuery, or Postgres, and lists the traps that real candidates fall into during data analyst interviews. If you want to drill more queries like this, NAILDD is launching with 500 plus SQL problems built around exactly this pattern.
The SQL formula
The headline formula has two factors:
K-factor = invites_per_user x conversion_rateThe first factor is invites_per_user, the average number of invitations each active user sends in the period. The second factor is conversion_rate, the fraction of invitations that actually became new signups. Multiplied together, they give you the average number of new users each existing user produces in the window.
There is a third number that is just as important: cycle time, the elapsed time from when an invitation is sent to when the invited person registers. For the same K, a product with a cycle measured in hours grows an order of magnitude faster than one with a cycle measured in weeks. A K of 1.2 with a 30-day cycle is worse than a K of 0.9 with a one-day cycle, in any window shorter than half a year.
Schema
For the rest of this post assume three tables, the shape you tend to see at Stripe, DoorDash, or any modern consumer app:
users (id, invited_by_user_id, signup_at)
invites (invite_id, sender_user_id, sent_at, converted_user_id, share_channel)
events (user_id, event_name, event_at)The invites table is the source of truth for the loop. converted_user_id is null when the invitation never produced a registration. The events table is what we use to define an active user — typically anyone with at least one event in the last 28 or 30 days.
Seven SQL recipes
1. Baseline K-factor over a window
WITH sent AS (
SELECT
COUNT(*) AS total_invites,
COUNT(converted_user_id) AS converted_invites
FROM invites
WHERE sent_at >= NOW() - INTERVAL '30 days'
),
active_users AS (
SELECT COUNT(DISTINCT user_id) AS active_cnt
FROM events
WHERE event_at >= NOW() - INTERVAL '30 days'
)
SELECT
s.total_invites::NUMERIC / NULLIF(a.active_cnt, 0) AS invites_per_user,
s.converted_invites::NUMERIC / NULLIF(s.total_invites, 0) AS conversion_rate,
(s.total_invites::NUMERIC / NULLIF(a.active_cnt, 0))
* (s.converted_invites::NUMERIC / NULLIF(s.total_invites, 0)) AS k_factor
FROM sent s, active_users a;Use active users as the denominator, not senders. If you divide invites only by users who sent at least one invitation, you measure the virality of the 5 percent power users who already love your product. That number is not useful for growth planning.
2. Invites per active user, weekly
WITH active AS (
SELECT DISTINCT
DATE_TRUNC('week', event_at) AS week,
user_id
FROM events
),
sent AS (
SELECT
DATE_TRUNC('week', sent_at) AS week,
sender_user_id AS user_id,
COUNT(*) AS cnt
FROM invites
GROUP BY 1, 2
)
SELECT
a.week,
COALESCE(SUM(s.cnt), 0)::NUMERIC / COUNT(DISTINCT a.user_id) AS invites_per_active_user
FROM active a
LEFT JOIN sent s USING (week, user_id)
GROUP BY a.week
ORDER BY a.week;3. Invite conversion rate, monthly
SELECT
DATE_TRUNC('month', sent_at)::DATE AS month,
COUNT(*) AS invites_sent,
COUNT(converted_user_id) AS conversions,
100.0 * COUNT(converted_user_id) / NULLIF(COUNT(*), 0) AS conversion_pct
FROM invites
GROUP BY 1
ORDER BY 1;When conversion drops, the cause is almost never the share channel or the copy on the invite landing page. It is almost always the onboarding sequence the invitee sees after they click. The growth team I worked with at one B2B SaaS lost 12 points of conversion the week they shipped a new email verification step.
4. Cycle time
SELECT
AVG(EXTRACT(EPOCH FROM (u.signup_at - i.sent_at)) / 86400) AS avg_cycle_days,
PERCENTILE_CONT(0.5) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (u.signup_at - i.sent_at)) / 86400
) AS median_cycle_days
FROM invites i
JOIN users u ON u.id = i.converted_user_id
WHERE i.converted_user_id IS NOT NULL;Always report median next to mean. A long tail of invites that convert six months later will pull the mean up and mask a healthy short cycle.
5. K-factor by share channel
SELECT
share_channel,
COUNT(*) AS sent,
COUNT(converted_user_id) AS conversions,
100.0 * COUNT(converted_user_id) / NULLIF(COUNT(*), 0) AS conversion_pct
FROM invites
GROUP BY share_channel
ORDER BY conversion_pct DESC;Channels with warm identity (direct message in iMessage, WhatsApp, or a workspace tool where the recipient sees the sender's name and avatar) usually convert better than cold channels (copied link pasted into a tweet). But this is product-dependent — Airbnb's referral team has talked publicly about cases where the cold channel won because the warm channel was shipped with a worse landing page.
6. K-factor by signup cohort
WITH cohorts AS (
SELECT id AS user_id, DATE_TRUNC('month', signup_at)::DATE AS cohort
FROM users
),
cohort_stats AS (
SELECT
c.cohort,
COUNT(DISTINCT c.user_id) AS cohort_size,
COUNT(i.invite_id) AS total_invites,
COUNT(i.converted_user_id) AS conversions
FROM cohorts c
LEFT JOIN invites i ON i.sender_user_id = c.user_id
GROUP BY c.cohort
)
SELECT
cohort,
cohort_size,
total_invites::NUMERIC / NULLIF(cohort_size, 0) AS invites_per_user,
conversions::NUMERIC / NULLIF(total_invites, 0) AS conversion_rate,
(total_invites::NUMERIC / NULLIF(cohort_size, 0))
* (conversions::NUMERIC / NULLIF(total_invites, 0)) AS k_factor
FROM cohort_stats
ORDER BY cohort;Cohort K-factor answers the only question your growth lead really cares about: is the product getting more viral over time, or is the overall K just being inflated by an expanding paid acquisition funnel?
7. What K below 1 actually means
A K of 0.5 does not mean "still kind of viral." It means each user brings half a new user on average, and the invite chain decays geometrically. In steady state with K below 1, the viral loop acts as a multiplier on paid acquisition rather than a standalone growth engine:
total_users ~ paid_users * 1 / (1 - K)So at K equal to 0.5 every paid user eventually pulls in one extra organic user across the full decay tail. Still valuable — it cuts your effective CAC in half — but very different from the runaway exponential growth people imagine when they hear the word "viral."
Common pitfalls
The first pitfall is counting invites sent rather than invites converted. K-factor is defined on the conversion side: sending a referral link costs nothing, registering through it is the actual signal. Analysts who report invites-per-user as K-factor will produce numbers that look two to ten times higher than reality and lose the trust of the growth team the moment someone checks the math.
The second pitfall is ignoring cycle time. Two products with K equal to 1 can be on completely different growth trajectories. One with a one-day cycle doubles its user base every day; one with a 30-day cycle doubles every month. If you report K without cycle time, you are giving the leadership team a number they cannot act on. Always include median cycle days as a sibling metric.
The third pitfall is using the wrong denominator. Dividing total invites by all registered users (including dormant accounts that have not opened the app in six months) systematically understates invites-per-user, because the dormant tail is huge. Always restrict the denominator to users active in the same window the numerator is measuring. If you want a stricter definition, restrict to users who took the product's core action (a message sent, a doc created, an order placed) at least once.
The fourth pitfall is failing to filter self-invites. Power users will invite themselves from a second email address, or run the referral loop to claim a credit. You can catch most of these with simple heuristics: same IP within five minutes, same device fingerprint, identical email domain, or invite that converts in under 60 seconds. At one consumer app I worked on, self-invites accounted for 18 percent of all referrals during a credit promo — that is the difference between K equal to 0.45 and K equal to 0.55.
The fifth pitfall is reporting a bare K-factor in isolation. K equal to 0.4 tells you almost nothing without context. Show it next to invites-per-user, conversion rate, median cycle days, and a cohort trend over the last six months. Otherwise the executive who reads your dashboard will pattern-match on a single number and make a planning decision on noise.
Optimization tips
Pre-aggregate invites and active users into a daily summary table. The recipes above run a full scan over the invites and events tables every time. For products with tens of millions of invitations, materialize a daily roll-up table keyed by date, sender, and channel — your dashboard query then touches a few thousand rows instead of hundreds of millions.
Index invites(sent_at) and invites(converted_user_id) if you are on Postgres. For Snowflake and BigQuery, the usual answer is partitioning: cluster invites on sent_at and events on event_at to make the daily window pruning effective. A 30-day K-factor query on a year of data should not have to scan the other 11 months.
Use APPROX_COUNT_DISTINCT for active-user counts when you do not need exact precision. On Snowflake and BigQuery the approximate function is often 10x faster than COUNT(DISTINCT) and the error is well below the noise floor of K itself.
Be careful with cohorts that have not had time to convert. A signup cohort from last week cannot have a fair K-factor yet, because the invites those users send are still inside their cycle time. Always restrict the cohort window so the youngest cohort has had at least two cycle times to produce conversions.
Related reading
- How to calculate cohort retention in SQL
- How to calculate funnel drop-off in SQL
- SQL window functions interview questions
FAQ
Does K above 1 actually happen in real products?
Rarely, and almost always on short windows during a product's earliest growth phase. The classic examples are messengers and cross-platform social networks during their breakout months — Snapchat, WhatsApp, and TikTok have all had quarters with K above 1. In steady state, most successful consumer and B2B products run with K between 0.3 and 0.8 and lean on paid acquisition to make up the gap. A sustained K above 1 over a year is essentially unheard of outside of network-effect monopolies.
Is K-factor relevant for B2B?
Yes, more than people think. In B2B collaboration tools like Slack, Figma, Notion, and Linear, the viral loop happens when one user invites coworkers into a shared workspace. That coworker invitation is structurally identical to a consumer referral and is one of the main growth drivers for product-led companies. The cycle time tends to be longer (days, not hours) because workspace invitations sit in email inboxes longer than personal messages.
How much does cycle time really matter?
It is the difference between exponential growth that feels fast and exponential growth that feels invisible. At K equal to 1.1 with a cycle of one day, you double your user base in about a week. At K equal to 1.1 with a cycle of 30 days, you double in about seven months. Cycle time is usually easier to improve than K itself, because it is mostly an onboarding and notification problem rather than a product-value problem.
What levers move K-factor most?
Two levers, and they are roughly equally weighted. The first is invites-per-user: more reasons to share, lower friction in the share flow, better defaults on which contacts get suggested. The second is conversion rate: a faster signup path, clear value visible to the invitee before they create an account, and an onboarding flow that delivers the promised value within the first session. Most teams over-invest in the first lever and under-invest in the second.
How do I separate organic K-factor from paid promotion effects?
Tag invites by source. If a user got a paid acquisition credit for referring friends, mark those invites as paid-loop and report two K-factors side by side: organic K (no incentive) and incented K (with credit). Otherwise you cannot tell whether a rising K is real product virality or just a temporary lift from a referral bonus campaign.