May 22, 2026·13 min read

Great Expectations on the Data Engineer interview

Train for your next tech interview

1,500+ real interview questions across engineering, product, design, and data — with worked solutions.

Contents:

Why interviewers care about GE
What Great Expectations actually does
Expectations: the test taxonomy
Suites and checkpoints
Airflow and dbt integration
GE vs dbt tests vs Soda
Common pitfalls
Related reading
FAQ

Why interviewers care about GE

Data quality is no longer a nice-to-have. It is the difference between a dashboard that the CFO trusts and one that quietly poisons every quarterly board deck. When a recruiter for a Snowflake- or Databricks-based data team at a company like Stripe, Airbnb, or DoorDash schedules a data engineering loop, expect at least one round to drill on data quality testing, and Great Expectations is the framework that comes up most often by name.

The reason is structural. dbt tests handle the easy cases — uniqueness, not-null, accepted values — but they live entirely inside dbt's transformation layer. The bronze layer, the API ingest, the file landing zone — none of that is covered by dbt test. Great Expectations fills that gap, and interviewers want to see that you understand why the gap exists, not just that you can recite a list of expectation names.

Load-bearing trick: if you can explain when to use GE instead of dbt tests, and when to use both, you have already separated yourself from most candidates who treat data quality as a checkbox.

What Great Expectations actually does

The core idea is declarative expectations: statements about your data that are checked at ingestion or transformation time. Each expectation is a Python method that returns a structured result whose success flag says whether the check passed.

# validator: a Great Expectations Validator already bound to a batch of data
validator.expect_column_values_to_not_be_null("user_id")
validator.expect_column_values_to_be_unique("order_id")
validator.expect_column_values_to_be_in_set("status", ["pending", "paid", "shipped", "cancelled"])
validator.expect_column_values_to_be_between("amount", 0, 1_000_000)

Each run produces a result object, results are persisted, and Data Docs — auto-generated HTML — give you a browsable validation history. That last part is what makes GE stick in production: when a dashboard breaks at 7am, you can hand a non-engineer a URL and they can see which expectation failed and on what data.
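To make the "structured result" idea concrete, here is a minimal pure-Python sketch of what an expectation hands back. It does not use the Great Expectations library: `ExpectationResult` and both helper functions are hypothetical stand-ins, and the real GE result object carries more fields than shown here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExpectationResult:
    """Hypothetical stand-in for a validation result (not the GE class)."""
    expectation: str
    success: bool
    unexpected_count: int
    unexpected_values: list = field(default_factory=list)


def expect_values_to_not_be_null(rows: list, column: str) -> ExpectationResult:
    # Count rows where the column is missing or None.
    missing = sum(1 for row in rows if row.get(column) is None)
    return ExpectationResult(f"not_null({column})", missing == 0, missing)


def expect_values_to_be_in_set(rows: list, column: str, allowed: set) -> ExpectationResult:
    # Keep the offending values so a report can show exactly what failed.
    bad: list[Any] = [row.get(column) for row in rows if row.get(column) not in allowed]
    return ExpectationResult(f"in_set({column})", not bad, len(bad), bad)


orders = [
    {"user_id": 1, "status": "paid"},
    {"user_id": None, "status": "shipped"},
    {"user_id": 3, "status": "refunded"},
]

null_check = expect_values_to_not_be_null(orders, "user_id")
status_check = expect_values_to_be_in_set(
    orders, "status", {"pending", "paid", "shipped", "cancelled"}
)
print(null_check)
print(status_check)
```

The point of returning a record rather than a bare boolean is that the failing values travel with the verdict, which is what a rendered validation report needs.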

GE supports multiple compute backends — Pandas for small in-memory checks, SQL via SQLAlchemy for warehouses, and Spark for distributed workloads. The same expectation definition runs across all three, which is the second reason it shows up so often in interview prompts about heterogeneous stacks.

The catch is that "the same expectation runs everywhere" is mostly true but not entirely — statistical expectations behave differently on samples vs full tables, and you should be ready to discuss that nuance.
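The sample-versus-full-table nuance is easy to show with plain Python. The sketch below is illustrative only: the data, the bounds, and the `mean_is_between` helper are made up, and it says nothing about how any particular GE backend samples.

```python
from statistics import mean


def mean_is_between(values, low, high):
    """Return True when the mean of values falls inside [low, high]."""
    return low <= mean(values) <= high


# 900 small orders followed by 100 large ones, in arrival order.
amounts = [10.0] * 900 + [100.0] * 100

# Full table: the mean is 19.0, inside the expected band.
full_table_passes = mean_is_between(amounts, 15, 25)

# "First 100 rows" sample: it only sees small orders, so the same check fails.
head_sample_passes = mean_is_between(amounts[:100], 15, 25)

# Every-10th-row sample: it preserves the mix, so the check agrees with the full table.
strided_sample_passes = mean_is_between(amounts[::10], 15, 25)

print(full_table_passes, head_sample_passes, strided_sample_passes)
```

The same expectation gives different verdicts depending on how the batch was drawn, which is why aggregate bounds should be tuned against the batch shape that will run in production.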

Expectations: the test taxonomy

There are several hundred built-in expectations. For an interview you do not need to memorize all of them; you need to know the main categories in the table below and a handful of representative members of each.

Category	Example	When to reach for it
Single-column	`expect_column_values_to_not_be_null`	Schema invariants, basic shape
Single-column regex	`expect_column_values_to_match_regex`	Email, phone, ID formats
Aggregate	`expect_column_mean_to_be_between`	Volume, distribution sanity
Table-level	`expect_table_row_count_to_be_between`	Pipeline-completeness checks
Multi-column	`expect_compound_columns_to_be_unique`	Composite keys, join contracts
Statistical	`expect_column_kl_divergence_to_be_less_than`	Drift detection on key features

The single-column family is the workhorse — not_null, unique, in_set, between, match_regex, value_lengths_to_be_between. These cover roughly 70-80% of production expectations in most teams, and you should be able to write one from memory in the interview.

The aggregate family is what separates GE from cheaper tools — expect_column_mean_to_be_between, expect_column_sum_to_be_between, expect_column_max_to_be_between. These detect the failure mode where every row is individually valid but the table as a whole has drifted — for example, when an upstream ETL silently filters out 30% of rows and the mean shifts.
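A small pure-Python sketch of that failure mode, under made-up data and bounds: every surviving row passes the row-level range check, yet the table-level mean check catches the silent filter. The helper names are hypothetical and only mirror the intent of the GE expectations named above.

```python
from statistics import mean


def all_amounts_in_range(rows, low=0, high=1_000_000):
    # Row-level check: each value is individually valid.
    return all(low <= row["amount"] <= high for row in rows)


def mean_amount_between(rows, low, high):
    # Table-level check: the distribution as a whole looks right.
    return low <= mean(row["amount"] for row in rows) <= high


baseline = [{"amount": 20.0}] * 70 + [{"amount": 200.0}] * 30

# Simulated upstream bug: a filter silently drops the 30 high-value rows.
after_bug = [row for row in baseline if row["amount"] < 100]

baseline_ok = all_amounts_in_range(baseline) and mean_amount_between(baseline, 60, 90)
rows_still_valid = all_amounts_in_range(after_bug)
mean_still_ok = mean_amount_between(after_bug, 60, 90)

print(baseline_ok, rows_still_valid, mean_still_ok)
```

Here the baseline mean is 74 and the post-bug mean is 20, so the row-level suite stays green while the aggregate check fails.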

The statistical family — KL divergence, quantile checks, distribution matches — is what you use for drift detection on ML features. This is where GE pulls ahead of dbt tests, which have no native concept of distributional comparison across runs.
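For intuition about what a KL divergence check measures, here is a minimal sketch on a categorical column. It is not GE's implementation: the direction of the divergence (current relative to baseline), the smoothing constant, and the 0.1 threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def distribution(values, categories, smoothing=1e-9):
    """Relative frequency per category, lightly smoothed to avoid log(0)."""
    counts = Counter(values)
    total = len(values) + smoothing * len(categories)
    return {c: (counts.get(c, 0) + smoothing) / total for c in categories}


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same categories."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p)


categories = ["paid", "pending", "cancelled"]
baseline = ["paid"] * 70 + ["pending"] * 20 + ["cancelled"] * 10
current = ["paid"] * 40 + ["pending"] * 20 + ["cancelled"] * 40

p_baseline = distribution(baseline, categories)
p_current = distribution(current, categories)

no_drift = kl_divergence(p_baseline, p_baseline)
drift = kl_divergence(p_current, p_baseline)
threshold = 0.1  # illustrative value, tune per column

print(f"no drift: {no_drift:.4f}, drift: {drift:.4f}, passes: {drift < threshold}")
```

Identical distributions give a divergence of zero, and the score grows as the category mix moves away from the baseline, which is the property a drift threshold relies on.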

Suites and checkpoints

An Expectation Suite is a named bundle of expectations for one dataset — typically one suite per table or view, sometimes per layer. A Checkpoint is a run configuration that says "run this suite against this batch of data and store the results in this location."

context.run_checkpoint(checkpoint_name="orders_quality")

The pattern most teams converge on is two suites per critical table — orders.warning and orders.error. Warning-level expectations fire alerts but do not stop the pipeline. Error-level expectations halt the DAG and page on-call. This split is worth bringing up unprompted in an interview because it shows you have run GE in production rather than just read the docs.

Sanity check: before adding any expectation to the error suite, ask "is this rule worth waking someone up at 3am?" If the answer is no, it belongs in the warning suite or not at all.
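The warning/error split can be sketched without any framework. In the snippet below, `run_suites`, `PipelineHalted`, and the check names are hypothetical; a real setup would run two GE suites and let the orchestrator do the halting.

```python
class PipelineHalted(Exception):
    """Raised when an error-level check fails and the run must stop."""


def run_suites(rows, warning_suite, error_suite, alert=print):
    """Run both suites; warnings only alert, errors alert and halt."""
    failed_warnings = [name for name, check in warning_suite.items() if not check(rows)]
    failed_errors = [name for name, check in error_suite.items() if not check(rows)]

    for name in failed_warnings:
        alert(f"WARN {name}")
    if failed_errors:
        for name in failed_errors:
            alert(f"PAGE {name}")
        raise PipelineHalted(", ".join(failed_errors))
    return failed_warnings


warning_suite = {
    "amount_under_1m": lambda rows: all(r["amount"] <= 1_000_000 for r in rows),
}
error_suite = {
    "order_id_unique": lambda rows: len({r["order_id"] for r in rows}) == len(rows),
}

alerts = []
orders = [{"order_id": 1, "amount": 50}, {"order_id": 2, "amount": 2_000_000}]

# The oversized amount only warns, so the pipeline keeps going.
failed_warnings = run_suites(orders, warning_suite, error_suite, alert=alerts.append)
print(failed_warnings, alerts)
```

Keeping the two suites as separate objects makes the 3am question explicit: moving a check between dictionaries is a reviewable change in who gets woken up.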

Airflow and dbt integration

Airflow integration is straightforward — the GreatExpectationsOperator runs a checkpoint as a task and fails the task if validation fails.

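# Import path assumes the airflow-provider-great-expectations package.
from great_expectations_provider.operators.great_expectations import GreatExpectationsOperator

ge_check = GreatExpectationsOperator(
    task_id="validate_orders",
    expectation_suite_name="orders.error",  # error suite: a failure should halt the DAG
    data_context_root_dir="/path/to/great_expectations",
    fail_task_on_validation_failure=True,
)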

The flag fail_task_on_validation_failure is the single most-asked detail in interview deep-dives. Set it to True for the error suite, False for the warning suite, and route both to a Slack or PagerDuty alert downstream.

For dbt-native workflows, dbt-expectations is a community package that ports much of the GE expectation library into dbt YAML, with similar semantics, and runs through dbt test.

models:
  - name: orders
    columns:
      - name: amount
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1000000

The trade-off is real: dbt-expectations covers maybe 60-70% of native GE expectations, but it removes the operational overhead of running a second framework. For mature teams with a heavy dbt investment, this often wins.

GE vs dbt tests vs Soda

This comparison is the prompt that comes up in most senior data engineering interviews where DQ is mentioned. Have a clean answer ready.

Dimension	Great Expectations	dbt tests	Soda
Config language	Python + YAML	YAML + SQL	YAML (SodaCL)
Setup complexity	Medium	Low	Low
Custom tests	Harder (Python)	SQL macros	YAML + SQL
Data docs	Auto HTML	Through dbt docs	Soda Cloud
Drift detection	Strong	None	Built-in
Statistical/ML	Strong	Weak	Medium
Hosted option	Self-host or GX Cloud	Self-host or dbt Cloud	Managed by default

The pattern most modern stacks settle on is a layered approach rather than picking one tool. dbt tests cover the simple invariants inside the transformation layer — uniqueness, referential integrity, accepted values. Great Expectations sits on the bronze layer where data lands before transformation, catching schema drift and statistical anomalies before they propagate. Soda is the managed alternative when a team wants cloud-hosted alerting without operating its own GE infrastructure.

In an interview, naming this layering explicitly is worth more than memorizing every expectation in the catalog.

Common pitfalls

The most expensive mistake is testing only the gold layer. By the time data reaches the business-facing mart, it has been through several transformations, and a failed expectation at gold tells you something is wrong but not where. Push the cheap structural tests — not-null, unique, ranges — down to bronze and silver, so failures surface near their root cause. This dramatically reduces mean time to debug, which is what the interviewer is really probing for when they ask "how do you organize tests across layers."

A second trap is expectation overload — teams new to GE often write 500 to 1,000 expectations on a single fact table within the first month, then nobody reads the validation reports because everything is yellow. The discipline is to ask, for each candidate expectation, "what action would I take if this failed?" If the answer is "I would investigate," it belongs in the suite. If the answer is "I would shrug," delete it. A focused 50-expectation suite that everyone trusts beats a 500-expectation suite that nobody reviews.

The third pitfall is disconnecting validation from alerting. A test that fails silently in a logs directory is worse than no test at all, because it creates a false sense of safety. Wire every checkpoint into a notification channel — Slack for warnings, PagerDuty or OpsGenie for errors — and verify the path end-to-end at least quarterly with a deliberate failure.

The fourth trap, and the one most often surfaced in senior interviews, is stale expectations after schema change. When a column is renamed, when a new status value is introduced, when a date column changes timezone — expectations written against the old shape start firing false positives, the team grows numb to alerts, and the next real failure goes unnoticed. The fix is to treat expectation suites as code that ships with schema changes, reviewed in the same PR as the migration.
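One way to enforce "suites ship with schema changes" is a CI check that flags expectations pointing at columns that no longer exist. The sketch below is an assumption-laden illustration: the suite dictionary loosely mirrors the JSON layout of a GE suite, and `find_stale_expectations` and the column set are hypothetical.

```python
def find_stale_expectations(suite, table_columns):
    """Return (expectation_type, column) pairs whose column is not in the table."""
    stale = []
    for expectation in suite["expectations"]:
        column = expectation["kwargs"].get("column")
        # Table-level expectations have no column and are skipped.
        if column is not None and column not in table_columns:
            stale.append((expectation["expectation_type"], column))
    return stale


suite = {
    "expectation_suite_name": "orders.error",
    "expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": "user_id"}},
        {"expectation_type": "expect_column_values_to_be_unique",
         "kwargs": {"column": "order_id"}},
        {"expectation_type": "expect_table_row_count_to_be_between",
         "kwargs": {"min_value": 1}},
    ],
}

# Schema after a migration that renamed user_id to customer_id.
columns_after_migration = {"order_id", "customer_id", "amount", "status"}

stale = find_stale_expectations(suite, columns_after_migration)
print(stale)
```

Failing the pull request when `stale` is non-empty forces the rename and the suite update into the same review.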

The fifth pitfall is fail_task_on_validation_failure=False on critical checks. This is the silent killer — a misconfigured flag that lets bad data flow downstream while validation results dutifully record the failure in Data Docs that nobody reads. Be deliberate about which expectations halt the pipeline and which only warn, and write that policy down so a new engineer cannot accidentally invert it.

Finally, ignoring distributional drift on ML feature tables is a category-level mistake. If your downstream models consume features from a warehouse table, schema-level checks are necessary but not sufficient — a feature whose mean shifts by two standard deviations week-over-week will quietly degrade model performance long before any not-null check fires. Statistical expectations like expect_column_kl_divergence_to_be_less_than exist for exactly this reason; use them on the columns that actually drive predictions.
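The "two standard deviations week-over-week" case can be sketched with the standard library alone. The feature values and the helper name below are made up, and measuring the shift against last week's sample standard deviation is one reasonable convention among several.

```python
from statistics import mean, stdev


def mean_shift_in_stdevs(last_week, this_week):
    """Absolute shift of the mean, in units of last week's standard deviation."""
    baseline_sd = stdev(last_week)
    if baseline_sd == 0:
        raise ValueError("baseline has zero variance; shift is undefined")
    return abs(mean(this_week) - mean(last_week)) / baseline_sd


last_week = [10, 12, 11, 9, 13, 10, 12]
this_week = [14, 15, 13, 16, 14, 15, 14]

# A not-null check sees nothing wrong with this week's data.
no_nulls = all(value is not None for value in this_week)

shift = mean_shift_in_stdevs(last_week, this_week)
drifted = shift > 2.0

print(f"no nulls: {no_nulls}, shift: {shift:.2f} sd, drifted: {drifted}")
```

Both weeks pass every schema-level check, while the mean has moved by roughly 2.4 baseline standard deviations.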


FAQ

Does Great Expectations slow down the pipeline?

It depends on the expectation mix. Lightweight column-level checks like not_null or in_set cost milliseconds on top of the underlying scan, especially when GE pushes the predicate down to the SQL engine. Heavy statistical expectations like KL divergence or quantile checks can take tens of seconds on large tables because they require either a full scan or a representative sample. The practical fix is to run cheap expectations on every batch and reserve expensive ones for a scheduled "deep check" — for example, hourly structural tests and a nightly statistical sweep.
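A minimal sketch of that split, assuming each expectation is tagged with a cost tier. The `cost` field and the run kinds are illustrative conventions, not Great Expectations attributes; in practice the same effect usually comes from keeping two suites and two checkpoints.

```python
# Hypothetical suite entries tagged by how expensive they are to evaluate.
SUITE = [
    {"name": "user_id_not_null", "cost": "cheap"},
    {"name": "status_in_set", "cost": "cheap"},
    {"name": "amount_kl_divergence", "cost": "expensive"},
    {"name": "amount_quantiles", "cost": "expensive"},
]


def select_expectations(suite, run_kind):
    """Pick which checks run: cheap ones hourly, everything in the nightly sweep."""
    if run_kind == "hourly":
        return [e["name"] for e in suite if e["cost"] == "cheap"]
    if run_kind == "nightly":
        return [e["name"] for e in suite]
    raise ValueError(f"unknown run kind: {run_kind!r}")


hourly_checks = select_expectations(SUITE, "hourly")
nightly_checks = select_expectations(SUITE, "nightly")
print(hourly_checks)
print(nightly_checks)
```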

Can I use Great Expectations without Airflow?

Yes. Checkpoints can be triggered from cron, GitHub Actions, a Python script, a CI step, or interactively from a notebook during development. Airflow is the most common production trigger because teams already run it for orchestration, but nothing in GE requires it. For small teams without an orchestrator yet, running checkpoints from GitHub Actions on a schedule is a reasonable starting point.

How do I version expectation suites?

Treat them as code. The suite definition lives in YAML or JSON in your repository, gets reviewed in pull requests alongside schema changes, and goes through the same CI as your application code. The validation results are runtime artifacts and belong in object storage with a retention policy — typically 30 to 90 days for warnings, longer for errors that fed an incident review.

When should I write a custom expectation?

When the built-in catalog cannot express your invariant, and the invariant is important enough that the next on-call engineer must see it labeled by name in Data Docs. Examples: domain-specific business rules ("every paid order has a fulfillment record within 24 hours"), cross-table referential checks that go beyond simple joins, or compliance constraints unique to your industry. Resist writing a custom expectation when a combination of two or three built-ins would do — the maintenance cost is real.
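The fulfillment rule above is a good example of logic that built-ins do not express. The sketch below shows only the rule itself in plain Python. It is not the GE custom-expectation API, and the field names, the `as_of` cutoff, and the result shape are assumptions for illustration.

```python
from datetime import datetime, timedelta


def paid_orders_missing_fulfillment(orders, fulfillments, as_of,
                                    max_delay=timedelta(hours=24)):
    """Flag paid orders with no fulfillment record inside the allowed window."""
    fulfilled_at = {f["order_id"]: f["created_at"] for f in fulfillments}
    unexpected = []
    for order in orders:
        if order["status"] != "paid":
            continue
        deadline = order["paid_at"] + max_delay
        fulfilled = fulfilled_at.get(order["order_id"])
        if fulfilled is None:
            # Only a violation once the deadline has actually passed.
            if as_of > deadline:
                unexpected.append(order["order_id"])
        elif fulfilled > deadline:
            unexpected.append(order["order_id"])
    return {"success": not unexpected, "unexpected_order_ids": unexpected}


jan1 = datetime(2026, 1, 1)
orders = [
    {"order_id": 1, "status": "paid", "paid_at": jan1},
    {"order_id": 2, "status": "paid", "paid_at": jan1},
    {"order_id": 3, "status": "paid", "paid_at": jan1},
    {"order_id": 4, "status": "paid", "paid_at": jan1 + timedelta(hours=36)},
    {"order_id": 5, "status": "pending", "paid_at": jan1},
]
fulfillments = [
    {"order_id": 1, "created_at": jan1 + timedelta(hours=10)},  # on time
    {"order_id": 2, "created_at": jan1 + timedelta(hours=30)},  # too late
]

result = paid_orders_missing_fulfillment(orders, fulfillments, as_of=jan1 + timedelta(hours=48))
print(result)
```

Order 4 is deliberately not flagged: it was paid 12 hours before the cutoff, so its window is still open. Handling that case is the kind of detail that justifies a named custom check over a stack of generic ones.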

How does Great Expectations interact with streaming pipelines?

GE was originally batch-oriented, and that legacy still shows. For streaming, the common pattern is to run GE against micro-batches or windowed snapshots rather than per-event, because the framework's overhead per validation does not amortize well at event-level granularity. For true per-event quality checks in a Kafka or Kinesis pipeline, you usually want a streaming-native solution and use GE as a complementary batch check on the downstream landing table.
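The micro-batch pattern can be sketched in a few lines. This is a framework-free illustration: `micro_batches` and `validate_batch` are hypothetical helpers, the batch size is arbitrary, and in a real pipeline the per-batch step would hand the batch to a GE checkpoint instead.

```python
from itertools import islice


def micro_batches(events, size):
    """Group an event stream into lists of at most `size` events."""
    iterator = iter(events)
    while batch := list(islice(iterator, size)):
        yield batch


def validate_batch(batch):
    """One validation per batch, so the fixed overhead is paid once per window."""
    nulls = sum(1 for event in batch if event.get("user_id") is None)
    return {"rows": len(batch), "success": nulls == 0, "null_user_ids": nulls}


events = [
    {"user_id": 1}, {"user_id": 2}, {"user_id": 3},
    {"user_id": 4}, {"user_id": None}, {"user_id": 6},
    {"user_id": 7},
]

results = [validate_batch(batch) for batch in micro_batches(events, size=3)]
for result in results:
    print(result)
```

Seven events become three validations instead of seven, and the failing window is still narrow enough to locate the bad record.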

Is this official documentation?

No. This article reflects patterns from Great Expectations 0.18 and later, plus dbt-expectations community usage, as seen in production data platforms at mid-to-large companies. Always cross-reference the official GE documentation for current API details, especially around the fluent datasources rewrite.