How to calculate first-touch attribution in SQL
Contents:
What first-touch attribution actually answers
First-touch attribution is the SQL that credits a conversion to the very first marketing touchpoint a user ever had with the product. When the head of growth at Notion asks "which channel discovered the customers who converted last quarter", that question is a first-touch question and it has a different answer than the last-touch query the paid acquisition team runs every morning. The first paid keyword, the first blog post, the first podcast mention — whichever appeared first in the user's journey gets the credit, and the rest of the path is ignored.
The reason teams care is that long sales cycles starve the top of the funnel of credit. A B2B SaaS journey at Snowflake or Databricks often looks like a blog post in March, a webinar in May, a free trial in June, and a paid contract in August. A last-touch model hands the revenue line to the inside sales rep who hosted the closing demo and reports zero ROI on the content investment that started the path. First-touch flips the lens: the blog post that introduced the company to the buyer takes the win, the demo gets nothing, and the content team can defend its budget. Neither model is wrong on its own — they answer different questions, and any honest growth analyst reports both side by side.
Every query below runs on Postgres with one-line tweaks for Snowflake, BigQuery, or Redshift. The schema assumes two tables: a touchpoints table with user_id, channel, and touched_at, and a conversions table with user_id, amount, and converted_at.
The base first-touch query
The core idea is to rank every touchpoint per user by timestamp, pick the one with rank one, and join it to the conversion table. The constraint that the first touch must occur before the conversion is non-negotiable — a touchpoint logged after the customer converted cannot have influenced the decision, and forgetting that filter is the most common reason first-touch dashboards show inflated revenue against newer channels.
WITH first_touch AS (
SELECT
user_id,
channel,
touched_at,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY touched_at
) AS rn
FROM touchpoints
)
SELECT
ft.channel,
SUM(c.amount) AS attributed_revenue,
COUNT(DISTINCT c.user_id) AS conversions
FROM conversions c
JOIN first_touch ft
ON ft.user_id = c.user_id
AND ft.rn = 1
WHERE ft.touched_at <= c.converted_at
GROUP BY ft.channel
ORDER BY attributed_revenue DESC;Three details earn their place. The ROW_NUMBER window ordered by touched_at ascending picks the chronologically first row per user. The rn = 1 filter strips out every other touchpoint. And the touched_at <= converted_at guard prevents the join from matching a conversion to a touchpoint logged afterward — a defensive condition that catches backfill bugs.
A subtle point hides in the JOIN versus LEFT JOIN choice. An inner join returns only users who both converted and had a touchpoint, which is what most attribution dashboards want. A left join from conversions exposes users who converted with no logged touchpoint — those should be reported in a separate "untracked" bucket. If half the conversions have no first touch, the attribution table is not ready to inform budget decisions.
Revenue and conversions by first-touch channel
The query above gives the all-time view. Most reports want a bounded window — ninety days for paid channels, twelve months for SEO. Filter the touchpoints CTE to the window of interest and aggregate conversions inside the same window so a touchpoint outside the window cannot become a first touch for a conversion inside it.
WITH first_touch AS (
SELECT
user_id,
channel,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY touched_at
) AS rn
FROM touchpoints
WHERE touched_at >= CURRENT_DATE - INTERVAL '90 days'
),
conv AS (
SELECT
user_id,
SUM(amount) AS revenue
FROM conversions
WHERE converted_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY user_id
)
SELECT
ft.channel,
COUNT(c.user_id) AS conversions,
SUM(c.revenue) AS revenue
FROM first_touch ft
LEFT JOIN conv c
ON c.user_id = ft.user_id
WHERE ft.rn = 1
GROUP BY ft.channel
ORDER BY revenue DESC NULLS LAST;The LEFT JOIN direction matters. Starting from first_touch and joining conv includes channels that drove first touches but no conversions yet — a useful signal for top-of-funnel investments still in the pipeline. The NULLS LAST ordering pushes them to the bottom without hiding them.
A practical extension is to add a first_touch_month dimension and trend the channel mix over time. A six-month rolling view of first-touch share by channel is one of the cleanest ways to see whether a content investment is compounding or stalling. Just bucket touched_at to month and add it to the GROUP BY.
First-touch versus last-touch in one query
Sophisticated growth teams report both models side by side. First-touch identifies the channel that opened the door and last-touch identifies the channel that closed the sale, and the spread between them is itself a useful metric. A channel high on first-touch and low on last-touch is a discovery channel. A channel high on last-touch and low on first-touch is a closing channel — invest in it for conversion rate, but do not credit it with creating demand.
WITH first_last AS (
SELECT
user_id,
FIRST_VALUE(channel) OVER (
PARTITION BY user_id
ORDER BY touched_at
) AS first_channel,
FIRST_VALUE(channel) OVER (
PARTITION BY user_id
ORDER BY touched_at DESC
) AS last_channel,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY touched_at
) AS rn
FROM touchpoints
)
SELECT
fl.first_channel,
fl.last_channel,
COUNT(DISTINCT c.user_id) AS conversions,
SUM(c.amount) AS revenue
FROM first_last fl
JOIN conversions c
ON c.user_id = fl.user_id
WHERE fl.rn = 1
GROUP BY fl.first_channel, fl.last_channel
ORDER BY revenue DESC
LIMIT 50;FIRST_VALUE with two different orderings is the trick. The first window orders ascending and returns the earliest channel; the second orders descending and returns the latest. The outer rn = 1 collapses the rows to one per user. On Snowflake or BigQuery a QUALIFY clause replaces the outer filter, but the CTE form runs on every warehouse.
Reading the output is where most teams underuse the query. The interesting rows are the pairs where first and last disagree by a wide margin. Organic search to retargeting display is a discovery-to-close path. Podcast to direct is a brand-to-intent path. Paid social to email is a paid-acquisition-to-lifecycle path. Each pair tells a story about how the funnel actually works, and budget decisions get better when the story is visible.
Common pitfalls
The first pitfall is forgetting the chronological filter on the join. A first-touch query that joins on user_id without enforcing touched_at <= converted_at silently includes touchpoints recorded after the conversion — backfills, late-arriving events, CRM enrichment passes — and inflates the revenue line on every channel touched frequently after the sale. The guard is one line of SQL and prevents an entire class of slow-burn dashboard bugs.
The second pitfall is multi-conversion users. A power user at Stripe who buys monthly has many conversions and only one first touch ever. Naively summing revenue across all of that user's conversions and attributing the full amount to the very first channel inflates that channel beyond what the business cares about. The standard fix is to define first-touch per first conversion only — the touchpoint that introduced the user is credited with introducing them, not with every renewal afterward.
The third pitfall is an unbounded lookback window. A touchpoint from two years ago that finally converts today is technically a first touch, but it is not a useful one for budget decisions. B2C usually caps the lookback at thirty to ninety days; B2B with long sales cycles uses ninety to one hundred eighty. Without a cap, ancient blog posts written by an employee who left two years ago dominate the dashboard.
The fourth pitfall is comparing first-touch to last-touch ranking without naming the model. A junior analyst who reports "blog beats retargeting" using first-touch and the paid lead who reports "retargeting beats blog" using last-touch will fight at every planning meeting until someone shows that both numbers are correct and answer different questions. Always show the spread and label the model on the chart.
The fifth pitfall is same-day touch ambiguity. A user who lands through a paid Google ad in the morning and opens the email newsletter in the afternoon has two touchpoints with the same calendar date. The ORDER BY touched_at window uses the full timestamp, which is what you want — but only if the pipeline captures sub-second precision in a single time zone. Otherwise ties get resolved arbitrarily and the channel ranking becomes unstable from one query run to the next. Cast every timestamp to UTC at the bronze layer.
Optimization tips
First-touch queries scan the entire touchpoints table, and cost is dominated by its size. The biggest win is to partition touchpoints by touched_at — daily partitions in Postgres and BigQuery, date clustering in Snowflake. Every reasonable first-touch report is bounded by a lookback window, so partition pruning takes a billion-row scan down to ninety or one hundred eighty partitions.
The second optimization is to materialize a per-user first-touch table on a schedule. A user's first touch is set at acquisition and only moves if a backfill arrives, so recomputing ROW_NUMBER across the full touchpoints table on every dashboard refresh is wasteful. A nightly batch produces a user_first_touch table keyed on user_id with first_channel, first_touched_at, and any other dimensions worth carrying. Every downstream dashboard joins this thin table to conversions and skips the window function. On a billion-row warehouse this takes a first-touch channel dashboard from forty seconds to under two.
The third optimization is a narrow join type. user_id should be an integer or fixed-length hash, not a varchar — string joins cost ten times what integer joins cost. If the upstream pipeline emits string IDs, hash them to a BIGINT at ingest.
Related reading
- SQL window functions interview questions — the
ROW_NUMBERandFIRST_VALUEpatterns first-touch is built on. - How to calculate install attribution in SQL — last-touch attribution at the install layer, the companion view.
- How to calculate CAC by channel in SQL — what first-touch revenue feeds into for channel-level efficiency.
- How to calculate customer journey in SQL — the multi-touch view between first and last.
- How to calculate conversion window in SQL — how to size the lookback window with data.
If you want to drill attribution and growth SQL questions like this every day, NAILDD is launching with hundreds of analytics SQL problems built around exactly this pattern.
FAQ
When should I use first-touch instead of last-touch?
Use first-touch when the question is "which channel discovered this customer" and last-touch when the question is "which channel closed the sale". First-touch is the right model for top-of-funnel investments — content, SEO, PR, podcasts, paid prospecting — that earn their keep by introducing the product to people who later convert elsewhere. Last-touch is the right model for the closing layer — retargeting, branded search, lifecycle email — that earns its keep by converting people who already know the product. The honest growth dashboard reports both and labels the model on each chart.
What lookback window should I use for first-touch?
Thirty days for B2C apps with short consideration cycles is a defensible default. Ninety days is standard for higher-consideration consumer purchases like travel or financial products. Ninety to one hundred eighty days fits B2B SaaS where the buying journey spans quarters. Measure the actual time-from-first-touch-to-conversion distribution in your own data and pick the window that covers the ninetieth percentile of real journeys.
Does first-touch overcredit organic search and SEO?
Often yes, and that is a real limitation, not a bug to fix in SQL. Most users first encounter a product through a Google search or a direct URL they typed because somebody recommended it, and both bucket as organic in the touchpoints table. The honest move is to report the paid-versus-organic split of first touches and watch that ratio over time. A multi-touch model corrects the imbalance when there is enough deterministic data to support it.
How should I handle UTM parameters in the channel column?
Use utm_source plus utm_medium and normalize the values at ingest with a lookup table — cpc collapses to paid, social and facebook-ads collapse to paid_social. If the touchpoint has no UTM parameters, fall back to the HTTP referrer to detect organic search, organic social, and direct traffic. If none resolve, label the row as direct and report the direct share separately — a rising direct share usually signals UTM tagging hygiene problems, not real brand growth.
Is multi-touch attribution better than first-touch or last-touch?
In principle yes, in practice it depends on data quality. Multi-touch models distribute credit across every touchpoint using a rule — linear gives equal weight, time-decay weights recent touches more heavily, U-shaped weights first and last more than the middle, and data-driven models learn the weights from historical data. Every model assumes the touchpoint stream is complete, which it rarely is on mobile after the ATT prompt. The pragmatic path is to start with first-touch and last-touch as baselines, add linear multi-touch when the data supports it, and skip data-driven attribution until the deterministic match rate is consistently above eighty percent.