May 18, 2026·13 min read

dbt Elementary in a DE interview

Contents:

What Elementary actually is
The test-type table interviewers expect
Anomaly detection without rule-writing
Monitors, freshness, and schema change
Alerting that does not page on every flap
Common pitfalls
Related reading
FAQ

What Elementary actually is

Elementary is an open-source dbt package plus an optional cloud service that turns a vanilla dbt project into a data-observability platform. You install it the same way you'd install dbt-utils — one entry in packages.yml, one dbt deps, one dbt run --select elementary — and from that moment every dbt test invocation also writes test metadata, run history, and column-level stats to a set of elementary-prefixed tables in your warehouse. The interview question is rarely "what is Elementary"; it's "how would you catch a silent data quality regression at 3 a.m. before the CFO sees it on the Monday dashboard?" Elementary is the most common answer at Snowflake, Databricks, and BigQuery shops in 2026, and reviewers want to hear you name the moving parts in the right order.

The package ships three things worth memorising before a panel: anomaly tests (volume, freshness, column-level, dimensional), monitors (schema changes, source freshness, model run-time trend), and an alerts CLI that posts to Slack, PagerDuty, MS Teams, or a webhook. The cloud product layers a hosted UI, lineage view, and SLA tracking on top — but the package alone, free, is enough to ace the interview.

# packages.yml
packages:
  - package: elementary-data/elementary
    version: 0.16.2

# dbt_project.yml
models:
  elementary:
    +schema: elementary
    +materialized: incremental

The two-file install matters in interviews because the follow-up is always "what schema does it write to and why incremental?" Answer: a dedicated elementary schema so artifacts don't pollute your marts layer, and incremental materialisation so run-history tables such as dbt_run_results are appended to on each invocation instead of being rebuilt from scratch. A plain table materialisation would wipe the run history on every run. Note that an incremental model is still rebuilt by --full-refresh unless full refresh is disabled for it.

The test-type table interviewers expect

Whiteboard moment. When the panel asks "what test types does dbt support, and what does Elementary add on top," draw this from memory:

| Test type | Layer | Catches | Provided by |
|---|---|---|---|
| `not_null` | dbt core | NULL where forbidden (e.g. `user_id`) | dbt core |
| `unique` | dbt core | Duplicate keys, broken joins downstream | dbt core |
| `accepted_values` | dbt core | Enum drift, e.g. new status `'refunded_pending'` | dbt core |
| `relationships` | dbt core | Orphan FKs, e.g. `orders.user_id` not in `users` | dbt core |
| `freshness` | dbt source | Source table not refreshed in N hours | dbt sources block |
| `volume_anomalies` | Elementary | Row-count drop / spike vs trailing window | Elementary |
| `freshness_anomalies` | Elementary | Late-arriving partition vs typical cadence | Elementary |
| `column_anomalies` | Elementary | NULL-rate, distinct-count, min/max drift | Elementary |
| `dimension_anomalies` | Elementary | Group-by distribution change (e.g. country mix) | Elementary |
| `schema_changes` | Elementary | New, dropped, or retyped column upstream | Elementary |

Load-bearing trick: dbt's four generic tests (not_null, unique, accepted_values, relationships) catch shape violations — things you can declare a rule about. Elementary's anomaly family catches statistical drift — things you can't declare because the threshold is "different from yesterday." Naming this distinction in the interview is the single sentence that separates a mid-level answer from a senior one.

The reason this table lands well is that most candidates list tests as a flat bullet list. Drawing it as a 2D grid with a "layer" column shows you understand that observability is a stack, not a switch. Reviewers at Stripe, Airbnb, and Linear have all asked some variation of this question on the data platform loop in the last two cycles.

Anomaly detection without rule-writing

The headline feature. Traditional data tests are declarative — not_null, unique, accepted_values — and they fail loudly when a rule is violated. They are useless when the rule is "row count should look like it usually does" because nobody writes a rule that says "row count between 880,000 and 1,120,000 on Tuesdays except after a marketing push." Elementary's anomaly tests fit a baseline from the trailing window — default 14 days — and flag points that fall outside a configurable z-score band, default 3.0.

# models/orders/orders.yml
version: 2
models:
  - name: stg_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: created_at
          time_bucket:
            period: day
            count: 1
          training_period:
            period: day
            count: 14
          anomaly_sensitivity: 3
      - elementary.freshness_anomalies:
          timestamp_column: created_at
      # model-level variant; elementary.column_anomalies attaches under a single column's tests
      - elementary.all_columns_anomalies:
          column_anomalies:
            - null_count
            - missing_count
            - distinct_count
            - average
            - sum
          timestamp_column: created_at

What runs under the hood: Elementary computes per-bucket metrics (row count per day, null rate per day, distinct count per day), stores them in the elementary schema's data_monitoring_metrics table, and on every test execution scores the latest bucket against the trailing baseline; the resulting scores surface in elementary.metrics_anomaly_score. A z-score whose absolute value exceeds the sensitivity threshold marks the test as failed, so both drops and spikes are caught.
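The comparison described above can be sketched in plain Python. This is a minimal illustration of the z-score idea, not Elementary's actual SQL; the function names and the row counts are made up for the example.

```python
from statistics import mean, stdev

def z_score(baseline, latest):
    """Z-score of the latest bucket against the trailing baseline buckets."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if latest == mu else float("inf")
    return (latest - mu) / sigma

def is_anomalous(baseline, latest, sensitivity=3.0):
    """Flag the bucket when |z| exceeds the sensitivity threshold."""
    return abs(z_score(baseline, latest)) > sensitivity

# 14 hypothetical daily row counts forming the training window
daily_rows = [1000, 1010, 990, 1005, 995, 1002, 998,
              1001, 1007, 993, 1004, 996, 1003, 999]

drop_flagged = is_anomalous(daily_rows, 700)     # a sharp drop
normal_flagged = is_anomalous(daily_rows, 1006)  # an ordinary day
```

Taking the absolute value is what makes the same threshold catch both a drop and a spike.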

Sanity check: If your volume_anomalies test fires every Monday because weekends are quieter, you don't have a data quality problem — you have a seasonality problem. Set seasonality: day_of_week and Elementary will compare each Monday against prior Mondays, not against the trailing flat average.
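To see why a day-of-week baseline helps, here is a small sketch with invented data: eight weeks of history in which Mondays always carry a heavier load. It compares an ordinary Monday against a flat 14-day window and against prior Mondays only. The data, the window sizes, and the helper names are assumptions for illustration, not Elementary internals.

```python
from datetime import date, timedelta
from statistics import mean, stdev

def z_score(baseline, latest):
    sigma = stdev(baseline)
    return 0.0 if sigma == 0 else (latest - mean(baseline)) / sigma

# Hypothetical history: 8 weeks starting on a Monday. Mondays carry a
# recurring ~3,000-row load; every other day sits near 1,000 rows.
start = date(2026, 3, 2)
history = []
for i in range(56):
    day = start + timedelta(days=i)
    if day.weekday() == 0:
        rows = 3000 + 10 * ((i // 7) % 3)
    else:
        rows = 1000 + 5 * (i % 4)
    history.append((day, rows))

target_day = start + timedelta(days=56)  # the next Monday
target_rows = 3010                       # a perfectly ordinary Monday

# Flat baseline: the trailing 14 days, whatever weekday they fall on.
flat_baseline = [rows for _, rows in history[-14:]]
# Seasonal baseline: only buckets that share the target's weekday.
seasonal_baseline = [rows for day, rows in history
                     if day.weekday() == target_day.weekday()]

flat_z = z_score(flat_baseline, target_rows)
seasonal_z = z_score(seasonal_baseline, target_rows)
```

In this made-up series the ordinary Monday scores above 2 against the flat window, close enough to a threshold of 3 that modest noise can tip it over, while against prior Mondays it scores well under 1.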

Monitors, freshness, and schema change

Monitors are the always-on telemetry that runs alongside your tests. Schema-change detection is the one to memorise because it's the question that gets asked when interviewers want to see if you've shipped this in production. Elementary stores a snapshot of the column list and types each time the schema_changes test runs, and on the next run diffs against the prior snapshot. New columns, dropped columns, and type changes are written to elementary.alerts_schema_changes.

# models/orders/orders.yml
models:
  - name: dim_orders
    tests:
      - elementary.schema_changes:
          config:
            severity: warn        # 'warn' | 'error'

The interesting design choice is severity: warn vs error, which is the standard dbt test severity setting. Warn surfaces the change as an alert but lets the run succeed, which is appropriate when upstream teams ship schema changes weekly and you want awareness, not paging. Error fails the test and, under dbt build, stops downstream models from building, which is appropriate for regulated domains (finance, healthcare) where a silent column rename can rewrite a P&L.
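The snapshot diff behind schema-change detection is simple enough to sketch. The following is an illustrative Python version that compares two column-to-type mappings; the function name and the sample columns are hypothetical, and Elementary itself does this in SQL against its own snapshot tables.

```python
def diff_schema(previous, current):
    """Compare two {column_name: data_type} snapshots of the same table."""
    prev_cols, curr_cols = set(previous), set(current)
    return {
        "added": sorted(curr_cols - prev_cols),
        "removed": sorted(prev_cols - curr_cols),
        "type_changed": sorted(
            col for col in prev_cols & curr_cols
            if previous[col] != current[col]
        ),
    }

# Hypothetical snapshots from two consecutive runs
previous = {"order_id": "NUMBER", "amount": "NUMBER", "status": "VARCHAR"}
current = {"order_id": "NUMBER", "amount": "VARCHAR", "currency": "VARCHAR"}

changes = diff_schema(previous, current)
```

Note that a renamed column shows up as one removal plus one addition, which is why a rename upstream tends to produce two alerts rather than one.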

Source freshness in Elementary is a superset of dbt's built-in freshness: block. Vanilla dbt checks "is the max timestamp in source X newer than N hours ago", a binary pass/fail. Elementary's freshness_anomalies test compares the cadence pattern itself: if a source normally lands every 15 minutes and suddenly starts landing every 45, a binary check with, say, a 4-hour threshold still passes (the data is well under 4 hours old) but the anomaly check fires. The interview phrase that lands here is "freshness as a distribution, not a threshold."
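A short sketch makes the "distribution, not a threshold" contrast concrete. It assumes a source that normally lands roughly every 15 minutes and a 4-hour binary threshold; the timestamps, function names, and threshold are all illustrative rather than taken from Elementary.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

def threshold_check(last_arrival, now, max_age):
    """Binary freshness: passes while the latest data is newer than max_age."""
    return now - last_arrival <= max_age

def cadence_z_score(arrivals, now):
    """Z-score of the current gap against historical gaps, in minutes."""
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(arrivals, arrivals[1:])]
    current_gap = (now - arrivals[-1]).total_seconds() / 60
    sigma = stdev(gaps)
    return 0.0 if sigma == 0 else (current_gap - mean(gaps)) / sigma

# Hypothetical arrivals: roughly every 15 minutes, with a little jitter
arrivals = [datetime(2026, 5, 18, 0, 0)]
for gap_minutes in [14, 15, 16, 15, 14, 16, 15, 15]:
    arrivals.append(arrivals[-1] + timedelta(minutes=gap_minutes))

now = arrivals[-1] + timedelta(minutes=45)  # the source has gone quiet

passes_threshold = threshold_check(arrivals[-1], now, timedelta(hours=4))
gap_z = cadence_z_score(arrivals, now)
```

The binary check is still green because 45 minutes is far below 4 hours, while the gap is many standard deviations away from the usual cadence.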

Alerting that does not page on every flap

The chapter where most candidates lose points. Wiring edr monitor --slack-webhook ... to a Slack channel is easy; designing an alert routing strategy that doesn't burn out the on-call rotation is the senior-level skill. Elementary supports three primitives that you should be ready to combine on the whiteboard:

# models/orders/orders.yml
models:
  - name: fct_revenue_daily
    meta:                                # Elementary reads alert settings from meta
      owner: "@data-platform"
      subscribers: ["@finance-data"]
      alert_fields:
        - description
        - owners
        - tags
      alert_suppression_interval: 24   # hours

The three primitives are ownership (who gets paged), suppression (don't re-page for the same failing test within N hours), and severity tagging (P0 → PagerDuty, P1 → Slack, P2 → daily digest). A grown-up Elementary deployment routes by tags: ['p0'] to PagerDuty via the webhook receiver, tags: ['p1'] to a #data-alerts Slack channel, and everything else to a Monday-morning email digest built from the report that edr report generates.
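The routing and suppression logic is worth being able to write out. Below is a minimal Python sketch of the two ideas, tag-based routing and a per-test suppression interval. The destination names, the `Suppressor` class, and the test id are hypothetical; in a real deployment this behaviour comes from Elementary's alert configuration rather than custom code.

```python
from datetime import datetime, timedelta

# Hypothetical destinations, highest severity first
ROUTES = {"p0": "pagerduty", "p1": "slack:#data-alerts"}
DEFAULT_ROUTE = "monday-digest"

def route(tags):
    """Pick a destination from the highest-severity tag present."""
    for severity in ("p0", "p1"):
        if severity in tags:
            return ROUTES[severity]
    return DEFAULT_ROUTE

class Suppressor:
    """Drops repeat alerts for the same test inside the suppression interval."""

    def __init__(self, interval_hours=24):
        self.interval = timedelta(hours=interval_hours)
        self.last_sent = {}

    def should_send(self, test_id, now):
        last = self.last_sent.get(test_id)
        if last is not None and now - last < self.interval:
            return False  # still inside the quiet window
        self.last_sent[test_id] = now
        return True

suppressor = Suppressor(interval_hours=24)
t0 = datetime(2026, 5, 18, 3, 0)
first = suppressor.should_send("fct_revenue_daily.volume", t0)
repeat = suppressor.should_send("fct_revenue_daily.volume", t0 + timedelta(hours=5))
next_day = suppressor.should_send("fct_revenue_daily.volume", t0 + timedelta(hours=25))
```

The suppressed call deliberately does not reset the clock, so a test that keeps failing pages again once per interval instead of staying silent forever.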

Gotcha: Elementary alerts fire from the CLI, not from inside dbt test. You schedule dbt build in Airflow or dbt Cloud as you always have, then run a separate edr monitor step once it finishes. Forgetting the second step is the single most common reason a candidate's "I shipped Elementary" story falls apart under cross-examination.

Run-time integration in Airflow looks like a two-task DAG: dbt_build (the existing task) followed by edr_monitor, with the second task using trigger_rule='all_done' so alerts fire even if dbt_build partially failed — you want alerts especially when builds fail.

Common pitfalls

The first pitfall is treating Elementary as a replacement for dbt's built-in tests rather than a complement. Anomaly tests need 7 to 14 buckets of history before the baseline stabilises, and during that warm-up they false-positive a lot. Candidates who present Elementary as "we deleted all our not_null tests and switched to anomaly detection" raise a red flag — the right framing is declarative tests for shape, anomaly tests for drift, both running on every build.

A second trap is leaving anomaly_sensitivity at the default 3.0 across the entire project. A z-score of 3 on a stable B2B revenue table flags a real issue; the same threshold on a sparse experiment-event table flags noise three times a day. Tune sensitivity per model — 4.0 or 5.0 for noisy tables, 2.5 for revenue-critical marts — and document the rationale in meta.anomaly_rationale. Reviewers will ask why and you want a one-sentence answer per number.
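One way to get a one-sentence answer per number is to translate each sensitivity into an expected false-alarm rate. The sketch below does that under the assumption that the metric is roughly normally distributed, which is exactly the assumption sparse or bursty tables break; the function names are illustrative.

```python
import math

def false_alarm_rate(sensitivity):
    """P(|z| > sensitivity) for a standard normal variable."""
    return math.erfc(sensitivity / math.sqrt(2))

def expected_false_alarms(sensitivity, buckets):
    """Expected number of spurious flags over a given number of buckets."""
    return false_alarm_rate(sensitivity) * buckets

# Expected spurious alerts per test over a year of daily buckets
per_year = {s: expected_false_alarms(s, 365) for s in (2.5, 3.0, 4.0)}
```

Under the normality assumption a sensitivity of 3 works out to roughly one spurious alert per test per year of daily buckets, 2.5 to several per year, and 4 to almost none. Tables that fire far more often than that are telling you the metric is not close to normal, which is the cue to raise the threshold or change the bucket size.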

The third pitfall is metadata table bloat. Elementary writes to dbt_run_results, model_run_results, dbt_invocations, and metrics_anomaly_score on every invocation. In a project with 800 models running every 15 minutes, the metadata footprint passes 100 GB in a quarter. Set a retention policy with vars: elementary: days_back: 30 in dbt_project.yml and schedule dbt run-operation elementary.cleanup_elementary weekly. Snowflake users should also set +cluster_by: ['detected_at'] on the anomaly tables.
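The scale in that example is easy to sanity-check with arithmetic. The sketch below counts only one run-result row per model per invocation, which is a deliberate simplification (test results and metric rows come on top), treats a quarter as 90 days, and then works out what average row size the 100 GB figure would imply. All names are illustrative.

```python
def model_run_rows(models, runs_per_day, days):
    """Rows written if every model logs one result row per invocation."""
    return models * runs_per_day * days

runs_per_day = 24 * 60 // 15                   # one run every 15 minutes
quarter_rows = model_run_rows(800, runs_per_day, 90)
retained_rows = model_run_rows(800, runs_per_day, 30)

# Average bytes per row that a 100 GB quarter would imply
implied_bytes_per_row = 100 * 10**9 / quarter_rows
```

That is roughly 6.9 million run-result rows per quarter, and a 30-day retention window cuts the steady-state row count to a third of that. The implied average of about 14 KB per row only makes sense if wide columns are stored with each row, so it is worth checking which columns dominate before choosing what to prune.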

The fourth pitfall is firing alerts to a channel nobody owns. A #data-quality channel without a named owner becomes a dead channel within a sprint, and once people mute it the alerts are functionally invisible. Tie every model to an owner in meta, route alerts to that owner's team channel, and require a written acknowledgment in the channel within 24 hours — Elementary's --review flag on edr monitor makes this concrete.

The fifth and most subtle pitfall is trusting anomaly results during backfills. When you backfill 90 days of history into a table, the volume_anomalies test sees a single huge bucket and flags everything afterwards as a drop. Disable anomaly tests during backfills with --exclude tag:elementary or use where: "{{ elementary.edr_cli_run() }}" to scope tests to incremental runs only.

If you want to drill dbt and data-quality interview questions like the test-type table above, NAILDD is launching with 500+ data engineering problems organised by topic and seniority.

FAQ

Is Elementary free or paid?

The open-source dbt package is free under Apache 2.0 and ships every feature covered in this post — anomaly tests, schema-change monitors, the CLI, Slack and webhook alerts. Elementary Cloud is a paid SaaS that hosts the UI, adds column-level lineage across multiple dbt projects, SLA dashboards, and incident management. For the interview, knowing the open-source package end-to-end is enough; reviewers are testing whether you can ship observability, not whether your company writes the cheque.

How is Elementary different from Great Expectations or Monte Carlo?

Great Expectations is a Python-native validation library that lives outside dbt — you write expectation suites against pandas, Spark, or SQL connections, and it integrates with Airflow via operators. Monte Carlo is a closed-source SaaS that infers tests from query logs and lineage without touching your dbt project at all. Elementary sits between them: dbt-native, free, declarative-and-statistical. In an interview, the safe framing is "Great Expectations for arbitrary Python pipelines, Elementary for dbt-centric stacks, Monte Carlo when you want zero-config observability across a heterogeneous stack and you have the budget."

How long does the anomaly baseline take to warm up?

Elementary needs at least 7 buckets of history to compute a z-score and recommends 14 for stable detection. In a daily-bucket setup 14 buckets is two weeks; in an hourly-bucket setup it's 14 hours. During the warm-up window the test will either pass trivially (no baseline to compare against) or false-positive on the first real outlier. Plan to ship Elementary two weeks before you actually need the alerts, and tag the warm-up window with meta: { warming_up: true } so reviewers know to ignore early misfires.

Does Elementary work with BigQuery, Snowflake, Databricks, Redshift, and Postgres?

Yes for the first four — those are the tier-one supported adapters and what most large data orgs run on in 2026. Postgres support exists but is positioned as community-supported; the anomaly SQL uses warehouse-specific window functions and APPROX_COUNT_DISTINCT equivalents that are best on Snowflake, BigQuery, and Databricks. If your interviewer asks about a niche warehouse — DuckDB, ClickHouse, MotherDuck — the honest answer is "check the adapter compatibility matrix, the package targets the big four."

Can Elementary detect data quality issues at the row level, not just the table level?

Not directly. Elementary's tests operate on aggregated metrics — row counts, null rates, distinct counts, mean and standard deviation — bucketed by time. For row-level validation you still use dbt's built-in not_null, unique, and accepted_values tests, or a singular test that joins to a reference table. The senior-level answer in the interview is "row-level shape is dbt, table-level drift is Elementary, and the right test stack uses both layered on the same model."

Should I run Elementary in CI or only in production?

Both, but for different reasons. In CI, run dbt test --select elementary against a staging warehouse on every PR to catch schema-change alerts and obvious test failures before merge. In production, run dbt build + edr monitor on a schedule so anomaly baselines actually have continuous history to fit against. Running anomaly tests only in CI defeats the purpose — the baseline resets on every PR branch and you'd never accumulate the 14 buckets of history that the z-score needs.