Iceberg time travel on the Data Engineering interview
Contents:
Why time travel comes up in interviews
When a Data Engineering interviewer at Snowflake, Databricks, or Netflix asks about Apache Iceberg time travel, they are usually checking two things at once: do you understand that Iceberg stores immutable snapshots rather than mutating files in place, and can you reason about which operation — a rollback, a branch, or a tag — is the right tool for a specific business problem. The wrong answer is to recite the syntax. The right answer is to explain when you would reach for each device and what the storage cost looks like.
The reason this topic shows up so often is that lakehouse teams at Stripe, Airbnb, and Uber have to satisfy three simultaneous constraints: regulatory audit windows of 7 years, ML training reproducibility for production models, and fast iteration on staging data. Iceberg time travel addresses all three, but each constraint maps to a different primitive. A candidate who confuses tags (immutable, named, retained on purpose) with branches (mutable, isolated, short-lived) signals that they have only read the marketing page.
Load-bearing trick: every write to an Iceberg table produces a new snapshot. Snapshots are immutable. Tags freeze a snapshot under a name with an explicit retention. Branches let you fork the write history without touching main. Everything else is plumbing.
Snapshots: the foundation
Every INSERT, UPDATE, MERGE, DELETE, or OVERWRITE on an Iceberg table appends a new entry to the table's metadata log and produces a snapshot identified by a 64-bit integer. The snapshot points at a manifest list, which points at manifests, which point at the data files. Because none of these files are rewritten in place, an older snapshot can always be addressed as long as its files have not been garbage-collected by expire_snapshots.
You can inspect snapshots directly through the metadata tables that Iceberg exposes:
SELECT
snapshot_id,
parent_id,
operation,
committed_at,
summary['added-records'] AS added_rows,
summary['deleted-records'] AS deleted_rows
FROM iceberg.events.snapshots
ORDER BY committed_at DESC
LIMIT 20;The operation column is the one interviewers love to ask about. It will be append, overwrite, delete, or replace. An overwrite without a filter rewrites the whole table — easy to do by accident in a CTAS pipeline, and worth flagging in code review.
Time travel queries
There are two canonical ways to read a past snapshot: by snapshot id when you have one in hand from the metadata table, or by timestamp when you only know the wall-clock moment.
-- By snapshot ID (Spark / Trino syntax)
SELECT *
FROM events VERSION AS OF 7283910472618;
-- By timestamp (UTC)
SELECT *
FROM events FOR TIMESTAMP AS OF '2026-04-01 12:00:00';
-- Snowflake-style alternative against an Iceberg table
SELECT *
FROM events AT (TIMESTAMP => '2026-04-01 12:00:00'::TIMESTAMP);The semantics are not "give me the row as it looked then" — they are "give me the table state committed at or before that moment". If three commits landed in the same minute, the timestamp query returns the latest commit at or before your target time, not all three. Candidates often miss this and propose using time travel as a per-row change log; that is the job of the changelog scan API, not time travel.
Branches and tags
Iceberg 1.0 introduced named references: branches and tags. Both point at a snapshot id, but they behave very differently.
A branch is a mutable pointer. You create it from a base snapshot, write to it in isolation, and either merge it back to main via a fast-forward or discard it. This is the lakehouse analogue of a Git feature branch.
ALTER TABLE events CREATE BRANCH dev_2026q2;
-- Write to the branch only
INSERT INTO events.branch_dev_2026q2
SELECT * FROM staging_events WHERE event_date >= '2026-04-01';
-- Read the branch
SELECT count(*) FROM events.branch_dev_2026q2;
-- Fast-forward main to the branch tip when ready
CALL system.fast_forward('iceberg.events', 'main', 'dev_2026q2');A tag is an immutable named snapshot with an explicit retention. Tags are the right primitive for compliance freezes and model-training datasets — anything you must reproduce later by name, not by remembering a 13-digit integer.
ALTER TABLE events CREATE TAG audit_2026_q1
AS OF VERSION 7283910472618
RETAIN 2555 DAYS; -- ~7 years
-- Query the tag like a version
SELECT *
FROM events VERSION AS OF 'audit_2026_q1';Operational table: snapshot vs branch vs tag
| Primitive | Mutability | Typical lifetime | Default retention | Best for | Wrong tool for |
|---|---|---|---|---|---|
| Snapshot | Immutable | Until expire_snapshots |
5 days | Auditing what a write did, debugging a bad commit | Long-term reproducibility (no name, only an id) |
| Branch | Mutable | Hours to weeks | Until dropped | Staging schema changes, isolating backfills, blue/green table swaps | Regulatory freezes — anyone can fast-forward over it |
| Tag | Immutable | Months to years | Configurable, often 1-7 years | ML training set freezes, SOX audit snapshots, named release datasets | Active development — you cannot write to a tag |
| Rollback | One-shot action | N/A | N/A | Recovering from a bad MERGE or OVERWRITE on main |
Routine "undo" — leaves orphaned snapshots that confuse downstream |
The cell that interviewers probe most often is the rollback row. Rollback rewrites the main ref to point at an older snapshot, which means subsequent reads see the older state — but the bad snapshots are still in the metadata log until expired. If a downstream consumer was streaming from snapshot ids, they will see the table appear to "jump backwards" and may produce duplicate rows. The fix is to coordinate rollbacks with downstream owners, not to treat them as a silent operation.
Real applications
Audit and compliance. A regulator asks "what did the customer_transactions table contain at midnight on March 31?". With a tag created at quarter close, the answer is one query. Without a tag, you are pleading with the platform team to recover files that expire_snapshots deleted four days later.
ML training reproducibility. A model trained at Anthropic or OpenAI on a feature table will need to be retrained six months later when the eval suite changes. If the original training set was a tag (features_v1_2026_03), the retrain is deterministic. If it was "whatever was in the table that Tuesday", the retrain is a guess.
Debugging bad pipelines. Your overnight DAG produced 12% fewer rows than yesterday. Query the previous snapshot, diff it against the current one, and you have the offending key set in three minutes. This is faster than parsing Airflow logs and far more precise.
Rollback after a bad MERGE. Someone shipped a MERGE with a WHEN MATCHED THEN UPDATE that overwrote a column with NULL. The Spark or Trino call is short:
CALL iceberg.system.rollback_to_snapshot('db.events', 7283910472618);The data files were never deleted; only the main pointer changes. Reads return to the pre-merge state immediately.
Common pitfalls
The first trap is forgetting that expire_snapshots is what makes time travel finite. By default many engines run snapshot expiration on a 5 day retention. If you assume "Iceberg keeps everything", you will discover at the worst possible moment that the snapshot you needed for an audit was garbage-collected last weekend. The fix is to pin the snapshot with a tag and an explicit RETAIN clause before expiration runs, not after.
A second pitfall is using time travel as a substitute for change data capture. Time travel gives you whole-table states at points in time. If you want a stream of per-row changes between two snapshots, you need the changelog scan API (CALL system.create_changelog_view) or a CDC connector. Candidates who propose "join snapshot A to snapshot B on primary key" are technically correct but will hit a quadratic cost on a billion-row table; the changelog API does this efficiently inside the engine.
A third pitfall is confusing branch isolation with transactional isolation. A branch isolates writes from main, but two writers to the same branch still need optimistic concurrency control. If both fast-forward main to different branch tips, one of them will lose the race and need to retry. This is fine — it is exactly the Git mental model — but candidates sometimes assume branches are serializable by themselves.
A fourth pitfall is tag sprawl. Tags are cheap to create and easy to forget, but each one holds its underlying files alive against expiration. A team that tags every nightly commit "just in case" will quietly multiply storage cost by the number of files those snapshots reference. The discipline is to tag deliberately — quarter closes, model training sets, regulatory checkpoints — and review tags annually for tags whose retention has lapsed.
A fifth pitfall is assuming identical syntax across engines. Trino, Spark, Snowflake, and Athena each spell time travel slightly differently. VERSION AS OF works in Spark and Trino; Snowflake prefers AT (TIMESTAMP => ...). The Iceberg metadata model is the same underneath, but in an interview you should call out the engine you are writing for rather than mix dialects.
Related reading
- Apache Iceberg deep dive — Data Engineering interview
- MERGE and UPSERT — Data Engineering interview
- Airflow backfill — Data Engineering interview
- SQL for Data Engineer interviews
- Materialized views — Data Engineering interview
If you want to drill Data Engineering questions like this every day, NAILDD is launching with a question bank covering Iceberg, Spark, and lakehouse internals at exactly this depth.
FAQ
How long can I actually look back with Iceberg time travel?
As long as your snapshot retention allows. The default expire_snapshots retention in most managed catalogs is 5 days, which means anything older than that is unreachable. If you need to retain specific points indefinitely, create a tag with a RETAIN clause measured in days — for example, RETAIN 2555 DAYS for a 7-year SOX window. Without a tag, "time travel" is really "recent-history travel".
Can I write to a past snapshot directly?
No. Snapshots are immutable. The two ways to "branch off" an older state are to create a branch from that snapshot (CREATE BRANCH ... AS OF VERSION ...), then write to the branch, or to rollback main to the older snapshot and write from there. The second option discards intervening commits, so it is rarely what you want in production.
Does time travel work on partitioned tables the same way?
Yes — partitioning is orthogonal to snapshotting. Each snapshot references a manifest list that knows the partition layout at the moment of that commit, which is also how Iceberg supports schema and partition evolution without rewriting old data. A time travel query against a snapshot before a partition spec change will read with the old partition layout transparently.
What is the storage cost of keeping many snapshots?
Roughly linear in the number of unique data files retained, not in the number of snapshots. Two snapshots that share 99% of their files cost only the 1% delta in extra storage. The risk is small writes producing many small files, where snapshot-level metadata starts to dominate; the mitigation is rewrite_data_files and rewrite_manifests maintenance procedures run on a schedule appropriate to your write volume.
How is this different from Delta Lake time travel?
Conceptually the same primitive — both maintain a metadata log of immutable file lists. The visible differences are syntax (VERSION AS OF in Spark works for both; Delta also has TIMESTAMP AS OF), and that Iceberg's branches and tags were native earlier and have richer first-class catalog support, whereas Delta historically leaned on RESTORE and CLONE to cover similar use cases. For a deeper compare, see the lakehouse format interview material on this site.
Should I expose time travel queries to product analysts?
Carefully. The query syntax is approachable, but interpreting results requires knowing what each snapshot represents — was it a backfill, a hotfix, a routine append? A reasonable pattern is to expose only named tags to non-engineering users (audit_2026_q1, features_v3_train) so they cannot accidentally query a half-written snapshot from the middle of a streaming commit batch.