May 18, 2026·13 min read

Feature store in the data science interview

Train for your next tech interview

1,500+ real interview questions across engineering, product, design, and data — with worked solutions.

Contents:

Why feature stores show up in DS and MLE loops
What a feature store actually is
Offline vs online store
Train/serve consistency
Tools: Feast vs Tecton vs Hopsworks vs in-house
When you actually need one
Common pitfalls
Related reading
FAQ

Why feature stores show up in DS and MLE loops

If you are interviewing for a senior data scientist or machine learning engineer role at Stripe, DoorDash, Uber, Airbnb, Netflix, or a comparable ML-heavy company, expect the system design round to probe train/serve skew, one of the most common failure modes of production ML systems. The feature store is the architectural answer the industry converged on, and interviewers expect you to discuss it without hand-waving.

The question rarely arrives as "what is a feature store". It arrives as "your model has 0.92 offline AUC but online lift is zero; what do you check first?" or "design a fraud-scoring service that answers in under 50 ms p99 with features over 30 days of history". Both are really the same question: how do you guarantee that the features the model saw in training are the same features it sees in production?

A clean, structured answer signals seniority. Mumbling "we just rerun the SQL" signals you have not run an ML model in production. The surface area is small, the trade-offs are crisp, and the three or four load-bearing concepts fit in a single interview answer.

What a feature store actually is

A feature store is a centralized service for four things: storage of feature values, versioning of feature definitions, time-travel queries that return what a value was at a historical moment, and dual serving, so that the same feature is available both for batch training and for millisecond inference.

Concept diagram — the version interviewers want on the whiteboard:

Raw events ──► Feature pipelines ──► Feature Store ──► Training (batch)
                                            └────────► Serving (real-time)

The store is not a database in the conventional sense: it is a thin layer above one or more databases that enforces a single source of truth for feature definitions. The Postgres or Snowflake table underneath is an implementation detail; what matters is that avg_orders_30d exists in exactly one place and is computed by exactly one piece of code, regardless of who consumes it. A common mistake is to describe a key-value cache and stop there. A pure cache does not solve train/serve skew, because the code that fills the cache and the training pipeline can still compute the value differently.

Offline vs online store

Every feature store has two halves: identical from the API side, completely different on the infrastructure side.

Aspect	Offline store	Online store
Use case	Batch training, backfills, analysis	Real-time inference, online scoring
Typical backend	Snowflake, BigQuery, S3 + Parquet, Delta Lake	Redis, DynamoDB, Cassandra, ScyllaDB
Latency target	seconds to minutes	under 10 ms p99
Storage size	terabytes to petabytes	tens to hundreds of GB
History kept	full, with timestamps	current value only
Read pattern	scan or join, billions of rows	point lookup by entity key
Cost driver	storage + compute on warehouse	RAM and write QPS

Offline answers "what was this user's average order value as of 2025-11-04 14:00 UTC". Online answers "what is it right now, in under ten milliseconds, while the request thread is blocked".

A feature that lives in both stores is kept in sync by a pipeline that periodically materializes the offline computation into the online store. Materialization cadence is itself an interview-worthy decision: hourly is common, every five minutes is achievable, and true streaming materialization with Flink or Kafka Streams is less common and considerably more expensive.
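A single materialization run can be sketched in a few lines. This is a minimal illustration, assuming offline rows arrive as (entity_key, feature_ts, value) tuples and using a plain dict as a stand-in for Redis or DynamoDB; `materialize` and both row layouts are hypothetical, not the API of any particular tool. The job keeps only the newest value at or before the run's cutoff for each key, which is all the online store needs.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical offline row layout: (entity_key, feature_ts, value).
OfflineRow = Tuple[str, datetime, float]
# Hypothetical online layout: entity_key -> (value, feature_ts).
OnlineStore = Dict[str, Tuple[float, datetime]]


def materialize(rows: Iterable[OfflineRow], cutoff: datetime,
                online: OnlineStore) -> int:
    """One materialization run: copy the latest value per entity as of `cutoff`.

    `online` is a plain dict standing in for Redis or DynamoDB.
    Returns the number of keys written.
    """
    latest: OnlineStore = {}
    for key, ts, value in rows:
        if ts > cutoff:
            continue  # rows after the cutoff belong to a later run
        if key not in latest or ts > latest[key][1]:
            latest[key] = (value, ts)
    online.update(latest)  # the online side keeps only the current value per key
    return len(latest)
```

Scheduling this function hourly or every five minutes is the cadence decision described above; the history stays in the offline rows, and only the latest value per key is written out.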

Load-bearing rule: the offline store keeps history so you can reconstruct what the model would have seen; the online store keeps the latest value so the model can be served quickly. If you confuse the two, you either burn money on RAM or you serve stale features.

Train/serve consistency

This is the section the interviewer is actually grading. Get it right and the loop softens; get it wrong and "fundamental understanding of production ML" goes on your debrief in red.

The problem in one sentence: during training you compute features with heavyweight SQL against a warehouse over weeks of history; during serving you need the same feature for one user in single-digit milliseconds. If the two paths use different code, the values drift, and the model that scored AUC 0.91 offline delivers no useful lift online.

Three concrete sources of skew show up in interviews.

Definition skew: the training query and the serving code compute slightly different things. The fix is a single declarative feature definition that both paths consume.

Temporal skew: the training data accidentally includes information from after the prediction timestamp. This is the classic leakage bug, and it is the reason point-in-time joins exist. A point-in-time join asks "what was the feature as of the moment the prediction would have been made", not "as of right now", and not "as of when the label became known", which can be days or weeks later.

Freshness skew: the online value is stale because materialization is late or broken. The fix is staleness monitoring as a first-class SLO, with alarms when online lags offline by more than the agreed budget (for example, 5 to 15 minutes past the scheduled materialization for batch features, under 30 seconds for streaming).
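Freshness skew is the easiest of the three to check automatically. A minimal sketch, assuming the pipeline records a last-successful-materialization timestamp per feature somewhere queryable; the function name, its inputs, and the budgets are hypothetical, and in production the result would feed an alerting system rather than a return value.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_features(last_materialized: Dict[str, datetime],
                   budgets: Dict[str, timedelta],
                   now: datetime) -> List[str]:
    """Return the features whose online copy is older than its staleness budget.

    `last_materialized` maps feature name -> time of the last successful write
    to the online store; `budgets` maps feature name -> maximum allowed lag.
    A feature with no recorded run counts as stale.
    """
    stale = []
    for name, budget in budgets.items():
        ts = last_materialized.get(name)
        if ts is None or now - ts > budget:
            stale.append(name)
    return sorted(stale)
```

Treating a missing timestamp as stale is deliberate: a pipeline that never ran should page someone, not pass silently.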

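A minimal Python sketch of a definition both paths consume (the feature_view decorator and its registry are hypothetical, not a real library API):

from datetime import timedelta

# Hypothetical registry: every consumer, offline or online, reads
# feature definitions from this one place.
FEATURE_REGISTRY = {}

def feature_view(entities, ttl, online=True, offline=True):
    def register(fn):
        FEATURE_REGISTRY[fn.__name__] = dict(
            entities=entities, ttl=ttl, online=online, offline=offline, sql=fn())
        return fn
    return register

@feature_view(entities=["user_id"], ttl=timedelta(days=1), online=True, offline=True)
def avg_orders_30d():
    # Average order amount per user over the 30 days ending at @as_of_ts.
    return """
        SELECT user_id, AVG(amount) AS value
        FROM orders
        WHERE event_ts BETWEEN @as_of_ts - INTERVAL 30 DAY AND @as_of_ts
        GROUP BY user_id
    """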

Offline runs the SQL with @as_of_ts set to each training row's prediction timestamp. Online materializes the same SQL on a schedule, with @as_of_ts set to the run time, and stores the latest result in Redis. Same definition, two engines, one source of truth.
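To make "one definition, two engines" concrete, here is a minimal in-memory sketch in which a Python function stands in for the SQL: orders live in a list instead of a warehouse, and all names (`Order`, `avg_orders_30d`, `offline_training_rows`, `online_materialize`) are hypothetical. Both paths call the same function and differ only in the timestamp they pass, so a training row and an online value computed for the same instant cannot disagree.

```python
from collections import namedtuple
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical order record standing in for a row of the orders table.
Order = namedtuple("Order", ["user_id", "event_ts", "amount"])
WINDOW = timedelta(days=30)


def avg_orders_30d(orders: Iterable[Order], user_id: str,
                   as_of_ts: datetime) -> Optional[float]:
    """The one shared definition: mean order amount over the 30 days ending at
    as_of_ts, both bounds inclusive like SQL BETWEEN. None if no orders."""
    amounts = [o.amount for o in orders
               if o.user_id == user_id
               and as_of_ts - WINDOW <= o.event_ts <= as_of_ts]
    return sum(amounts) / len(amounts) if amounts else None


def offline_training_rows(orders: List[Order],
                          examples: List[Tuple[str, datetime]]):
    """Offline path: evaluate the definition at each example's prediction time."""
    return [(uid, ts, avg_orders_30d(orders, uid, ts)) for uid, ts in examples]


def online_materialize(orders: List[Order], user_ids: List[str],
                       run_ts: datetime) -> Dict[str, Optional[float]]:
    """Online path: evaluate the same definition at the scheduled run time and
    return the values that would be written to the online store."""
    return {uid: avg_orders_30d(orders, uid, run_ts) for uid in user_ids}
```

In a real system the offline path is a warehouse job and the online path writes to Redis, but the property being protected is the same: neither path has its own copy of the window or aggregation logic.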

Tools: Feast vs Tecton vs Hopsworks vs in-house

You will probably be asked which tool you would pick. There is no single right answer, but there are two wrong ones: "I would build it myself" when the scenario justifies an off-the-shelf system, and recommending Tecton for a team running one model.

Tool	License	Strength	Weakness	Best fit
Feast	Open source (Apache 2.0)	Simple, declarative, plug-and-play backends	No compute layer — you bring the pipelines	Mid-size teams with existing warehouse and Redis
Tecton	Commercial	End-to-end: compute + storage + monitoring	Pricey, vendor lock-in	Series-C and up shops without an ML platform team
Hopsworks	Open source + enterprise	Strong on point-in-time correctness, on-prem friendly	Smaller community than Feast	EU shops with data-residency rules
SageMaker Feature Store / Vertex AI Feature Store	Cloud-managed	Tight integration with AWS / GCP ML stack	Hard to leave the cloud later	Teams already all-in on one cloud
In-house Postgres + Redis	Free, your time	Total control, no abstraction tax	You will rebuild point-in-time joins, badly	Startups with fewer than 5 models and a strong engineer

Feast example — what most candidates have actually touched:

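from datetime import timedelta

from feast import BigQuerySource, Entity, FeatureView, Field
from feast.types import Float32

# The entity's join key must match the key column in the source table.
user = Entity(name="user", join_keys=["user_id"])

avg_orders = FeatureView(
    name="user_avg_orders_30d",
    entities=[user],
    ttl=timedelta(days=1),  # point-in-time lookups ignore rows older than this
    schema=[Field(name="value", dtype=Float32)],
    source=BigQuerySource(
        table="metrics.user_avg_orders",  # pre-aggregated upstream, not by Feast
        timestamp_field="event_timestamp",  # assumed timestamp column name
    ),
    online=True,
)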

The same definition is read by training (store.get_historical_features(...), a point-in-time join over history) and by serving (store.get_online_features(...), hitting the online store, Redis in this setup).

Gotcha: Feast does not compute features. The BigQuerySource is a pre-aggregated table somebody else built. Say "Feast handles the SQL aggregations" and interviewers will press you — Tecton handles that, Feast does not.

When you actually need one

A feature store is not free — it adds infrastructure, on-call surface area, and a learning curve. It starts paying off when at least three of these are true: two or more models share features, real-time predictions ship in the product, train/serve skew has already burned you, the ML team crosses five engineers, and the warehouse-to-Redis glue has grown into bespoke scripts nobody trusts.

It is overkill when there is one model, batch inference only, fewer than ten features, or you are pre-revenue and still validating the idea. In those cases a well-named SQL view plus a nightly export to Parquet is the right architecture, full stop.

Common pitfalls

Senior interviewers love these — they separate candidates who read the Feast docs from candidates who got paged at 3am for a stale feature.

The first pitfall is treating the feature store as a model quality tool. It is not. A feature store guarantees that the model sees the same input in training and serving; it says nothing about whether those inputs are predictive. Teams that ship a feature store expecting accuracy to go up will be disappointed. Feature stores reduce skew; they do not improve features.

The second is forgetting TTL on the online store. Without time-to-live, every key written stays in Redis or DynamoDB forever, and within a quarter the store outgrows its memory budget. Set a TTL — typically 24 to 72 hours for user-level features — longer than the materialization cadence but shorter than the natural churn of the entity. This sounds boring until you get paged for "OOM on the feature-serving cluster" at 2am on a Saturday.
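Redis supports expiry natively (for example `SET key value EX seconds`), and DynamoDB offers a TTL attribute, so in practice you configure TTL rather than write it. The mechanism itself is small; this is a minimal in-memory sketch with a hypothetical `TTLStore` class and an injectable clock so the behavior is easy to test.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """In-memory stand-in for an online store with per-key time-to-live.

    Each write stamps an expiry time, so rewriting a key refreshes its TTL.
    Reads of expired keys return None and evict the key; `sweep` evicts all
    expired keys in one pass.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, float]] = {}  # key -> (value, expires_at)

    def put(self, key: str, value: float) -> None:
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[float]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[key]  # lazy eviction on read
            return None
        return value

    def sweep(self) -> int:
        """Evict every expired key; return how many were removed."""
        now = self.clock()
        expired = [k for k, (_, exp) in self._data.items() if now >= exp]
        for k in expired:
            del self._data[k]
        return len(expired)

    def __len__(self) -> int:
        return len(self._data)
```

Entities that keep getting materialized stay alive because every write refreshes their expiry; entities that churn out simply age away, which is exactly what keeps memory bounded.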

The third is skipping point-in-time correctness during training data generation. Many tutorials write the join as JOIN features ON user_id and stop there, silently leaking future information into the training set. The model looks brilliant offline and fails on launch. Always join point-in-time (an AS OF join): match each training row only to feature rows with feature_ts <= prediction_ts AND feature_ts > prediction_ts - ttl, then keep the most recent match per row. Every feature store worth using has a helper for this.
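The same logic outside SQL, as a minimal sketch: a sorted per-entity history plus binary search returns, for each training example, the newest feature value at or before its prediction timestamp and inside the TTL window. The function and argument names are hypothetical; helpers such as Feast's `get_historical_features` do this at warehouse scale.

```python
import bisect
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def point_in_time_join(
    examples: List[Tuple[str, datetime]],
    feature_rows: List[Tuple[str, datetime, float]],
    ttl: timedelta,
) -> List[Optional[float]]:
    """For each (entity, prediction_ts), return the latest feature value with
    prediction_ts - ttl < feature_ts <= prediction_ts, else None."""
    # Group feature rows by entity and sort each history by timestamp.
    history: Dict[str, List[Tuple[datetime, float]]] = {}
    for key, ts, value in feature_rows:
        history.setdefault(key, []).append((ts, value))
    for rows in history.values():
        rows.sort(key=lambda r: r[0])
    timestamps = {k: [ts for ts, _ in rows] for k, rows in history.items()}

    out: List[Optional[float]] = []
    for key, pred_ts in examples:
        ts_list = timestamps.get(key, [])
        # Index of the last feature row at or before the prediction time.
        i = bisect.bisect_right(ts_list, pred_ts) - 1
        if i >= 0 and ts_list[i] > pred_ts - ttl:
            out.append(history[key][i][1])
        else:
            out.append(None)  # no row yet, or the latest row is older than the TTL
    return out
```

Note what the naive JOIN ON user_id would have done here: attach every feature row, including ones written after the prediction, to every training example.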

The fourth is doing heavy computation in the online path. The online store is a lookup, not a compute layer. If a feature requires a 30-day window over a billion rows, the aggregation belongs in the offline materialization job and the result belongs in Redis. Running that SQL at request time will blow your p99 latency budget by orders of magnitude, and in an interview it will sink the system-design round.

The fifth is ignoring backfill before launch. A new feature does not exist historically; if you turn it on for serving and immediately start training on it, you have one day of data and a useless feature. Backfill the offline store across the relevant history, validate the distribution, enable online materialization, then let models consume it. Treat backfill as a deploy step, not an afterthought.
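A minimal backfill sketch, assuming the feature can be evaluated at any historical cutoff by a function like the hypothetical `feature_fn` below (in practice, the same SQL run with a different `@as_of_ts`). It produces one snapshot per day plus a crude per-day null rate and mean to inspect before turning on online materialization; real validation would compare full distributions, not two summary numbers.

```python
from datetime import date, datetime, time, timedelta, timezone
from statistics import mean
from typing import Callable, Dict, List, Optional

Snapshots = Dict[date, Dict[str, Optional[float]]]


def backfill(feature_fn: Callable[[str, datetime], Optional[float]],
             entity_ids: List[str], start: date, end: date) -> Snapshots:
    """Evaluate `feature_fn` at the end of each day in [start, end] (UTC).

    Returns {day: {entity_id: value}}: one historical snapshot per day.
    """
    snapshots: Snapshots = {}
    day = start
    while day <= end:
        as_of = datetime.combine(day, time.max, tzinfo=timezone.utc)
        snapshots[day] = {e: feature_fn(e, as_of) for e in entity_ids}
        day += timedelta(days=1)
    return snapshots


def summarize(snapshots: Snapshots) -> Dict[date, Dict[str, float]]:
    """Per-day null rate and mean, to spot gaps or jumps in the backfill."""
    report = {}
    for day, values in snapshots.items():
        present = [v for v in values.values() if v is not None]
        report[day] = {
            "null_rate": 1 - len(present) / len(values) if values else 1.0,
            "mean": mean(present) if present else float("nan"),
        }
    return report
```

A sudden jump in null rate or mean between adjacent days usually points at a source-table gap or a definition bug, both of which are far cheaper to fix before any model trains on the feature.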

If you want to drill ML system design and feature-engineering questions like this every day, NAILDD is launching with hundreds of DS and MLE interview problems across exactly this pattern.

FAQ

Is Feast production-ready, or should I just use Tecton?

Feast has been in production at Tubi, Robinhood, and a long tail of mid-size ML teams. It is production-ready for the case it targets: a team that already has a warehouse and a key-value store and needs a declarative layer to unify training and serving. It is not an ML platform — you still need Airflow or dbt or Flink to actually produce the feature values. Tecton bundles that compute layer and is the right pick when you do not have a platform team, but you pay for it. The decision is mostly about whether you already own the pipelines.

What is the difference between a feature store and a data warehouse?

A warehouse stores arbitrary data for arbitrary consumers. A feature store stores ML features specifically and adds two things the warehouse does not natively offer: a serving path with millisecond latency, and point-in-time-correct joins that prevent label leakage. Think of the feature store as a thin, opinionated layer above the warehouse. Most feature stores are physically backed by the warehouse for the offline half and by a key-value store for the online half.

Do I need a feature store if I only do batch inference?

Probably not. If every prediction is generated by a nightly job that reads the warehouse and writes a table, you already have a feature store — it is called your warehouse. Adding Feast or Tecton here buys you a declarative catalog and lineage, which is nice for governance but does not solve a real production problem. Wait until you ship a real-time use case before paying the complexity tax.

How does train/serve consistency actually fail in practice?

The most common pattern is two engineers, two implementations. The data scientist writes the training query in SQL against the warehouse. The backend engineer translates that into a Python function that reads recent events from Kafka, aggregates them in memory, and returns the value. The two implementations agree on the happy path but disagree on edge cases — null handling, timezone of the event timestamp, whether refunded transactions count, whether the 30-day window is rolling or anchored to UTC midnight. Each disagreement is a small bias that compounds. A feature store eliminates this by making both paths consume one definition.
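A small illustration of the two-implementation problem, with hypothetical functions standing in for the training SQL and the hand-written serving code. They agree on clean data and diverge as soon as a refund or a missing amount appears. Neither function looks wrong in isolation, which is why the fix is one shared definition rather than more careful transcription.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class OrderEvent:
    user_id: str
    event_ts: datetime
    amount: Optional[float]
    refunded: bool


def training_version(orders: List[OrderEvent], user_id: str, as_of: datetime) -> float:
    # The data scientist's SQL, transcribed: refunds excluded and NULL amounts
    # skipped (as SQL AVG does), over the trailing 30 days.
    vals = [o.amount for o in orders
            if o.user_id == user_id and not o.refunded and o.amount is not None
            and as_of - timedelta(days=30) <= o.event_ts <= as_of]
    return sum(vals) / len(vals) if vals else 0.0


def serving_version(orders: List[OrderEvent], user_id: str, as_of: datetime) -> float:
    # The backend re-implementation: counts refunded orders and treats a
    # missing amount as 0, with the same 30-day window.
    vals = [o.amount or 0.0 for o in orders
            if o.user_id == user_id
            and as_of - timedelta(days=30) <= o.event_ts <= as_of]
    return sum(vals) / len(vals) if vals else 0.0
```

On a user with one clean order both return the same number; add one refund and one null and training sees 40.0 where serving sees about 33.3.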

Are these tools required for the interview, or can I just describe the concept?

You can pass most interviews by describing the concept: dual stores, single definition, point-in-time correctness, materialization. Naming Feast or Tecton signals familiarity but is not required. What is required is answering follow-ups: "what backend for online", "how to detect a stale feature", "the cost trade-off of materializing every five minutes versus hourly". Tools are vocabulary; architecture is the answer.