May 7, 2026·12 min read

Domain events in a systems analyst interview

Coverage:

What a domain event actually is
Naming convention
Event payload shape
Domain vs integration events
Schema versioning
Common pitfalls
Related reading
FAQ

What a domain event actually is

A domain event is a past-tense fact — a record that something meaningful happened inside a bounded context. The canonical trio you will repeat on every interview loop: OrderPlaced, PaymentProcessed, UserRegistered. The verb is in the past for a reason. An event is not a request to do something; it is a statement that something already occurred and the rest of the system has to live with it.

Load-bearing rule: if you can argue with the event ("no, don't place that order"), it's a command, not an event. Events are immutable history.

Four properties anchor the definition and you should rattle them off without thinking: past tense, immutable once published, carrying a timestamp so consumers can order them, and carrying a unique event ID so consumers can deduplicate on retry. Miss any of those and the architect across the table starts looking for the exit.
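
The four properties can be made concrete in a few lines. This is a minimal sketch, assuming Python and a hypothetical `OrderPlaced` class with an `order_id` field; it is one way to encode the rules, not a prescribed implementation.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class OrderPlaced:  # past-tense name: the order has already been placed
    order_id: str  # hypothetical field, for illustration only
    # Unique ID so consumers can deduplicate on retry.
    event_id: str = field(default_factory=lambda: str(uuid4()))
    # UTC timestamp so consumers can order events.
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


event = OrderPlaced(order_id="order_42")

try:
    event.order_id = "order_43"  # an attempt to rewrite history
except FrozenInstanceError:
    print("events are immutable")
```

The frozen dataclass only protects the object in memory. Immutability of the published record still depends on the log or broker never rewriting it.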

The reason systems analysts get this question at Stripe, DoorDash, Airbnb, and any platform that touches money is straightforward — these companies run on event-driven backbones and they need an SA who can sketch a contract on a whiteboard without inventing one on the spot. If you cannot draw a clean event with payload, ID, and timestamp in under two minutes, the interviewer assumes you have never owned a contract in production.

Naming convention

Names are where most candidates leak signal. Three rules cover 90% of the feedback that comes back from debrief.

Past tense, always. OrderCreated, not CreateOrder. The second one is a command — the consumer can still refuse it. The first one is history — the consumer must react.

Specific over generic. UserUpdated is weak because every downstream service has to crack the payload to figure out which field changed and whether it cares. UserEmailChanged lets a consumer subscribe to exactly the slice it needs. Specificity is what keeps the event bus from becoming a giant fan-out tax.

Domain language, not table language. Use the ubiquitous language of the business: InvoiceIssued, ShipmentDispatched, SubscriptionRenewed. Avoid storage-coupled names like RowInserted, OrderTableUpdated, UserDocumentSaved. Storage is an implementation detail. The event is part of the domain contract and should outlive any specific database engine.

Anti-pattern	Why it fails	Better
`UpdateOrder`	Imperative — sounds like a command	`OrderStatusChanged`
`UserUpdated`	Too generic, forces consumers to diff	`UserEmailChanged`
`RowInserted`	Storage-coupled, leaks the DB	`CustomerRegistered`
`OrderEvent`	No verb, no tense, no information	`OrderPlaced`

Event payload shape

The interviewer will hand you a sticky note and ask you to draft the payload for OrderPlaced. Don't freelance. There is a canonical envelope and you should produce it in 30 seconds.

{
  "event_id": "evt_abc123",
  "event_type": "OrderPlaced",
  "event_version": 1,
  "occurred_at": "2026-05-07T12:00:00Z",
  "aggregate_id": "order_42",
  "data": {
    "customer_id": 99,
    "amount": 1500.00,
    "currency": "USD",
    "items": [
      {"sku": "ABC-001", "qty": 2, "price": 750.00}
    ]
  },
  "metadata": {
    "trace_id": "trace_xyz",
    "causation_id": "cmd_place_order_123",
    "correlation_id": "checkout_session_88",
    "schema_version": "1.0.0"
  }
}

Walk through it field by field on the whiteboard. The event_id is a globally unique identifier (a UUID in practice; the example uses a short readable value) and exists for idempotency: consumers store seen IDs for a window (often 24 hours, sometimes 7 days for slower partners) and skip duplicates on retry. The aggregate_id ties the event back to the entity it belongs to, so a consumer can rebuild a projection by replaying every event with aggregate_id = order_42. The causation_id and correlation_id are the pair that lets you trace a chain across services: causation_id identifies the immediate parent (the command or event that directly triggered this one), and correlation_id identifies the original business flow and is copied unchanged to every event in the chain.
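
How the two tracing IDs propagate is easier to see in code. The sketch below assumes Python and two hypothetical helpers, `new_event` and `caused_by`; the `PaymentProcessed` payload and the `payment_7` ID are made up for illustration. It builds envelopes shaped like the OrderPlaced example above.

```python
from datetime import datetime, timezone
from uuid import uuid4


def new_event(event_type, aggregate_id, data, *, causation_id, correlation_id):
    """Build an envelope with the same top-level fields as the example."""
    return {
        "event_id": str(uuid4()),
        "event_type": event_type,
        "event_version": 1,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "aggregate_id": aggregate_id,
        "data": data,
        "metadata": {
            "causation_id": causation_id,
            "correlation_id": correlation_id,
        },
    }


def caused_by(parent, event_type, aggregate_id, data):
    """Create a follow-up event triggered by `parent`."""
    return new_event(
        event_type,
        aggregate_id,
        data,
        # The immediate parent is the event we are reacting to.
        causation_id=parent["event_id"],
        # The business flow ID is carried over unchanged.
        correlation_id=parent["metadata"]["correlation_id"],
    )


placed = new_event(
    "OrderPlaced", "order_42", {"customer_id": 99},
    causation_id="cmd_place_order_123",
    correlation_id="checkout_session_88",
)
paid = caused_by(placed, "PaymentProcessed", "payment_7", {"order_id": "order_42"})
```

Following causation_id links backwards reconstructs the exact chain, while filtering on correlation_id returns every event in the checkout flow in one query.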

Gotcha: never put PII in clear text or stuff a 4 MB attachment into the payload. Reference it with a URL or a content-addressable hash. Events get logged, replayed, archived, and indexed — every copy is a leak surface.

What you keep out is just as important. Don't include the entire denormalized state of the world, don't include passwords or full card numbers, don't include large blobs. The payload should answer the question "what changed?" — not "what is the entire state of the universe?".
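
Referencing a large attachment instead of embedding it might look like the sketch below. It assumes Python, a hypothetical `blob_reference` helper, and a placeholder blob store URL; the actual upload to that store is out of scope here.

```python
import hashlib


def blob_reference(content: bytes, base_url: str) -> dict:
    """Return a small, content-addressed pointer to put in the payload."""
    digest = hashlib.sha256(content).hexdigest()
    return {
        "sha256": digest,            # lets the consumer verify what it fetched
        "size_bytes": len(content),
        "url": f"{base_url}/{digest}",  # hypothetical blob store layout
    }


# Stand-in for a multi-megabyte attachment.
invoice_pdf = b"%PDF-1.7 ... imagine several megabytes here ..."

# The base URL is a placeholder, not a real service.
ref = blob_reference(invoice_pdf, "https://blobs.example.invalid/invoices")
```

The event then carries a reference of a few hundred bytes, and access to the attachment itself can be controlled separately from access to the event log.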

Domain vs integration events

This is the question that separates a junior SA from someone who has actually owned a public API.

Domain events live inside a single bounded context. They are rich — they can carry internal model fields, denormalized projections, even references that only mean something to the team that publishes them. A consumer in the same context is allowed to be coupled to that internal shape because they ship together.

Integration events are the cross-context citizens. They are the public contract that crosses team boundaries and sometimes company boundaries. They are minimal, stable, versioned aggressively, and treated like an API. Renaming a field in an integration event is the same blast radius as renaming a column in a public REST endpoint.

Property	Domain event	Integration event
Audience	Same bounded context	Other contexts, partners
Shape	Rich, denormalized	Minimal, stable
Coupling tolerance	High	Near zero
Versioning	Rolling, internal	Strict semver, contracts
Example	`OrderPlaced` (internal)	`OrderCreatedV1` (partner-facing)

A very common pattern at Stripe, Shopify, and similar platforms is to publish both. The internal OrderPlaced carries everything the order service knows. A thin translator layer maps it to OrderCreatedV1 and pushes that onto a separate public stream. Internal teams get the rich shape, external partners get the stable contract, and a schema change on the internal event never breaks a partner integration.

This is also why you should never let a partner subscribe directly to a domain event topic — the abstraction layer is the only thing that keeps your refactoring options alive.
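
A translator layer can be very small. The following is a sketch in Python, assuming a hypothetical `to_order_created_v1` function; which fields make it into the public contract, and the internal-only `fraud_score` field, are assumptions made for illustration.

```python
def to_order_created_v1(order_placed: dict) -> dict:
    """Map the rich internal OrderPlaced event to the thin public contract."""
    data = order_placed["data"]
    return {
        # Reusing the internal event_id is an assumption; it lets partners
        # deduplicate with the same key the internal consumers use.
        "event_id": order_placed["event_id"],
        "event_type": "OrderCreatedV1",
        "occurred_at": order_placed["occurred_at"],
        "order_id": order_placed["aggregate_id"],
        "amount": data["amount"],
        "currency": data["currency"],
        # Everything not listed here stays inside the bounded context.
    }


internal = {
    "event_id": "evt_abc123",
    "event_type": "OrderPlaced",
    "occurred_at": "2026-05-07T12:00:00Z",
    "aggregate_id": "order_42",
    "data": {
        "customer_id": 99,
        "amount": 1500.00,
        "currency": "USD",
        "fraud_score": 0.02,  # hypothetical internal-only field
    },
}

public = to_order_created_v1(internal)
```

Because the mapping is an explicit allow-list, adding or renaming an internal field changes nothing for partners until someone edits the translator on purpose.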

Schema versioning

Events outlive code. An event published today might be replayed in three years to rebuild a projection. So versioning is not optional; it is the price of admission to event-driven design.

Three strategies cover the field, and an interviewer will expect you to name them in order of cost.

The cheapest strategy is additive backward-compatible changes. Add new optional fields, never remove or rename existing ones. Old consumers ignore the new fields, new consumers use them. This works for 80% of real-world evolution and should be the default.
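
On the consumer side, additive evolution relies on a tolerant reader. Here is a minimal sketch in Python, assuming a hypothetical `coupon_code` field that was added after the first release and a hypothetical `gift_wrap` field this consumer has never heard of.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class OrderPlacedData:
    customer_id: int
    amount: float
    currency: str
    # Added later, so it must be optional with a default.
    coupon_code: Optional[str] = None


def parse_order_placed(data: dict) -> OrderPlacedData:
    """Read only the fields this consumer knows; ignore everything else."""
    known = {f.name for f in fields(OrderPlacedData)}
    return OrderPlacedData(**{k: v for k, v in data.items() if k in known})


# An old event, published before coupon_code existed.
old = parse_order_placed({"customer_id": 99, "amount": 1500.0, "currency": "USD"})

# A newer event carrying a field this consumer does not know about.
new = parse_order_placed({
    "customer_id": 99, "amount": 1500.0, "currency": "USD",
    "coupon_code": "SPRING", "gift_wrap": True,
})
```

Both payloads parse with the same code: the missing optional field falls back to its default, and the unknown field is dropped instead of raising an error.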

The next step up is versioning in the event type itself: OrderPlacedV1 becomes OrderPlacedV2, published in parallel on the bus. Consumers migrate on their own timeline. The translator above can downconvert V2 events into V1 shape for legacy subscribers. The cost is operational — you now run two parallel streams during the migration window.
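
A downconverter for legacy subscribers could look like the sketch below. The V2 shape is an assumption invented for this example: it nests the money fields under a `total` object, while V1 keeps `amount` and `currency` flat. The function name `downconvert_v2_to_v1` is also hypothetical.

```python
def downconvert_v2_to_v1(event_v2: dict) -> dict:
    """Return a V1-shaped copy of a V2 event without mutating the input."""
    data = dict(event_v2["data"])   # shallow copy, so pop() is safe
    total = data.pop("total")       # assumed V2-only nested field
    data["amount"] = total["amount"]
    data["currency"] = total["currency"]
    return {
        **event_v2,
        "event_type": "OrderPlacedV1",
        "event_version": 1,
        "data": data,
    }


v2 = {
    "event_id": "evt_abc123",
    "event_type": "OrderPlacedV2",
    "event_version": 2,
    "aggregate_id": "order_42",
    "data": {"customer_id": 99, "total": {"amount": 1500.0, "currency": "USD"}},
}

v1 = downconvert_v2_to_v1(v2)
```

Keeping the same event_id on both versions means a subscriber that reads both streams during the migration window can still deduplicate.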

The heavyweight option is a schema registry: Avro, Protobuf, or JSON Schema definitions managed by Confluent Schema Registry, AWS Glue Schema Registry, or an equivalent. The registry enforces a configured compatibility mode (backward, forward, or full) when a producer registers a new schema version, so a breaking change is rejected before any event using it hits the wire. This is the right answer at Netflix, Uber, or any shop with hundreds of producers and consumers.
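
To show the kind of check a registry automates, here is a hand-rolled sketch in Python. It is not the API of any real registry, and the schema format (field name mapped to `type` and `required`) is invented for this example. It enforces the strict additive-only rule from the cheapest strategy; real registries apply format-specific rules that differ in detail.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """List the ways `new` violates an additive-only evolution rule."""
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(f"type changed: {name}")
    for name, spec in new.items():
        # A new field is only safe if old events can omit it.
        if name not in old and spec.get("required", False):
            problems.append(f"new required field: {name}")
    return problems


old_schema = {
    "customer_id": {"type": "int", "required": True},
    "amount": {"type": "float", "required": True},
}

# Safe: adds one optional field.
good_schema = {**old_schema, "coupon_code": {"type": "string", "required": False}}

# Unsafe: changes a type, drops a field, adds a required field.
bad_schema = {
    "customer_id": {"type": "string", "required": True},
    "loyalty_tier": {"type": "string", "required": True},
}

print(breaking_changes(old_schema, good_schema))
print(breaking_changes(old_schema, bad_schema))
```

Running a check like this in CI gives a small team most of the safety of a registry before the operational cost of running one is justified.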

Sanity check: once published, an event lives forever, for replay, audit, and compliance. A rename or a field removal has to stay readable for every stored event and every existing consumer, which makes it very hard to do safely. Plan the schema as if you were designing a payment API.

Common pitfalls

The most expensive mistake candidates make is conflating commands and events. They draw a PlaceOrder "event" and the interviewer immediately knows the candidate has not internalized the past-tense rule. A command is an intention — it can be rejected. An event is a fact — it already happened. If your "event" can fail validation downstream, it is a command in disguise and you are about to invent a distributed monolith.
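
The distinction shows up directly in the types. Below is a minimal sketch in Python, assuming a hypothetical `handle` function, a hypothetical `CommandRejected` exception, and a made-up validation rule (the amount must be positive).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:   # command: an intention, imperative name
    order_id: str
    amount: float


@dataclass(frozen=True)
class OrderPlaced:  # event: a fact, past-tense name
    order_id: str
    amount: float


class CommandRejected(Exception):
    """Raised when a command fails validation. Events never raise this."""


def handle(command: PlaceOrder) -> OrderPlaced:
    # All validation happens here, before anything is published.
    if command.amount <= 0:
        raise CommandRejected("amount must be positive")
    # Past this point the order exists, so the event cannot be refused.
    return OrderPlaced(command.order_id, command.amount)


event = handle(PlaceOrder("order_42", 1500.0))
```

The only place a rejection can occur is the command handler. Consumers of OrderPlaced have no rejection path at all, which is exactly what makes it an event.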

The second pitfall is putting state in the payload instead of identity plus delta. A candidate writes an OrderUpdated event and crams the entire order object into it. Now every consumer is coupled to every field. The fix is to pick a tight verb (OrderShippingAddressChanged), include the aggregate ID, and include only what changed plus what a consumer needs to make sense of the change. Specific events with thin payloads beat generic events with fat payloads every time.

Third, candidates skip idempotency. They publish, the broker retries, the consumer applies the event twice, and now the customer has been charged twice or the inventory has been decremented twice. The fix is the event_id and a consumer-side deduplication table. Bake this in from the first event, not after the first incident. Idempotency added later is always more expensive than idempotency designed in.
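
A consumer-side deduplication table can be sketched with the standard library. This example assumes Python with an in-memory SQLite database; the `processed_events` and `inventory` tables and the `handle_order_placed` function are hypothetical names chosen for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
db.execute("INSERT INTO inventory VALUES ('ABC-001', 10)")
db.commit()


def handle_order_placed(event: dict) -> bool:
    """Apply the event at most once; return False on a duplicate delivery."""
    try:
        # One transaction: the dedup record and the side effect
        # commit together or roll back together.
        with db:
            db.execute(
                "INSERT INTO processed_events VALUES (?)", (event["event_id"],)
            )
            for item in event["data"]["items"]:
                db.execute(
                    "UPDATE inventory SET qty = qty - ? WHERE sku = ?",
                    (item["qty"], item["sku"]),
                )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists, so this event was seen before.
        return False


event = {
    "event_id": "evt_abc123",
    "data": {"items": [{"sku": "ABC-001", "qty": 2}]},
}

first = handle_order_placed(event)
second = handle_order_placed(event)  # simulated broker retry
qty = db.execute("SELECT qty FROM inventory WHERE sku = 'ABC-001'").fetchone()[0]
```

Writing the event_id and the inventory change in the same transaction is the important design choice. If they were committed separately, a crash between the two could either lose the update or apply it twice.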

Fourth, leaking storage shape into event names. UserRowUpdated, OrderTableModified, InventoryDocumentSaved: these names tell the world how you persist data. Switch storage engines tomorrow and every name is either wrong or has to be renamed, which breaks consumers. Use domain verbs: UserEmailChanged, OrderStatusTransitioned, InventoryAdjusted.

Fifth, letting partners subscribe to domain events directly. The internal stream is your refactoring playground; the partner stream is your contract. Mix them up and any internal cleanup becomes a public API break. Run a translator layer and a separate integration topic, even if there's currently only one partner.

If you want to drill systems analyst questions like this every day, Naildd is launching with hundreds of SA scenarios across event design, contracts, and distributed patterns.

FAQ

Is this official terminology?

No, this is the consensus from the DDD community and from books like Gregor Hohpe and Bobby Woolf's Enterprise Integration Patterns and Vaughn Vernon's Implementing Domain-Driven Design. Different teams use slightly different vocabulary, and integration events are sometimes called "public events" or "contract events" at companies like Stripe and Shopify. In an interview, define your terms once at the start and the interviewer will accept the dialect you picked.

When should I use domain events vs simple synchronous calls?

Use synchronous calls when the caller needs the answer right now to continue — paying an invoice, validating a coupon, fetching a profile. Use domain events when the caller does not need to wait, when there are multiple interested downstream consumers, or when you want to decouple producer release cycles from consumer release cycles. A good heuristic: if more than two services react to "thing X happened", an event is the cheaper long-term shape.

How do consumers handle out-of-order events?

By including occurred_at and aggregate_id in every payload and ordering on the consumer side per aggregate. Kafka guarantees ordering within a partition, so partitioning by aggregate_id (for example, order_id) makes per-aggregate ordering free. Across aggregates, ordering is not guaranteed and should not be assumed — design consumers to be commutative where possible.
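
Both halves of that answer can be sketched in Python. The `partition_for` function below is a stand-in that uses CRC32, not Kafka's actual partitioner, and `LatestStatusProjection` is a hypothetical consumer with an assumed `status` field in the payload. It also assumes every `occurred_at` uses the same fixed UTC format, so the strings compare in time order.

```python
import zlib


def partition_for(aggregate_id: str, num_partitions: int) -> int:
    """Same key, same partition, so one aggregate's events stay in order."""
    return zlib.crc32(aggregate_id.encode("utf-8")) % num_partitions


class LatestStatusProjection:
    """Keeps the newest status per aggregate and drops stale events."""

    def __init__(self):
        self._latest = {}  # aggregate_id -> (occurred_at, status)

    def apply(self, event: dict) -> bool:
        key = event["aggregate_id"]
        seen = self._latest.get(key)
        if seen is not None and event["occurred_at"] <= seen[0]:
            return False  # older than, or same as, what we already applied
        self._latest[key] = (event["occurred_at"], event["data"]["status"])
        return True

    def status(self, aggregate_id: str) -> str:
        return self._latest[aggregate_id][1]


projection = LatestStatusProjection()
shipped = {"aggregate_id": "order_42", "occurred_at": "2026-05-07T12:05:00Z",
           "data": {"status": "shipped"}}
placed = {"aggregate_id": "order_42", "occurred_at": "2026-05-07T12:00:00Z",
          "data": {"status": "placed"}}

projection.apply(shipped)           # arrives first
late = projection.apply(placed)     # arrives late and is ignored
```

Timestamps from different producers can disagree because of clock skew, so a per-aggregate sequence number assigned by the publisher is a more robust ordering key where one is available.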

Should the event include the full new state or just the delta?

Lean toward delta plus identity. Full snapshots make events fat, leak coupling, and make versioning painful. A consumer that needs the full state can either rehydrate from the aggregate ID or subscribe to a separate state-snapshot stream. Reserve full-state events for cases where rehydration is genuinely expensive — for example, slow partner integrations that cannot make a follow-up call.

How long should we keep event history?

For replay and projection rebuild, at minimum the longest expected consumer downtime window plus a buffer — most teams pick 30 to 90 days. For audit and compliance, often years, in cold storage. For analytics, indefinitely in a data lake. Separate retention policies per use case rather than picking a single number for the whole bus.

What's the difference between an event-sourced system and one that just publishes domain events?

Publishing domain events means the event bus carries notifications, but the source of truth is still a regular database. Event sourcing means the event log is the source of truth: state is rebuilt by replaying events, and there is no separate "current state" table you trust over the log. Event sourcing is a much heavier commitment, gives you a complete audit trail, and is overkill for most CRUD systems. In an interview, distinguish the two explicitly so you don't get caught promising audit guarantees you cannot deliver.
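
Rebuilding state from the log is a fold over the events. Here is a minimal sketch in Python, assuming hypothetical `apply` and `rehydrate` functions and a made-up `OrderShipped` event type with a simplified state shape.

```python
from functools import reduce


def apply(state: dict, event: dict) -> dict:
    """Pure function: previous state plus one event gives the next state."""
    kind = event["event_type"]
    if kind == "OrderPlaced":
        return {"status": "placed", "amount": event["data"]["amount"]}
    if kind == "OrderShippingAddressChanged":
        return {**state, "address": event["data"]["address"]}
    if kind == "OrderShipped":
        return {**state, "status": "shipped"}
    return state  # unknown event types leave the state unchanged


def rehydrate(log: list, aggregate_id: str) -> dict:
    """Replay every event for one aggregate, in log order."""
    own = [e for e in log if e["aggregate_id"] == aggregate_id]
    return reduce(apply, own, {})


log = [
    {"aggregate_id": "order_42", "event_type": "OrderPlaced",
     "data": {"amount": 1500.0}},
    {"aggregate_id": "order_7", "event_type": "OrderPlaced",
     "data": {"amount": 20.0}},
    {"aggregate_id": "order_42", "event_type": "OrderShippingAddressChanged",
     "data": {"address": "1 Main St"}},
    {"aggregate_id": "order_42", "event_type": "OrderShipped", "data": {}},
]

state = rehydrate(log, "order_42")
```

In a system that only publishes domain events, a function like this rebuilds a projection. In an event-sourced system, it is the only way current state exists at all.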