Events and durable execution

When something happens in a Voyant module, other parts of the system often need to know: send a confirmation, sync a CMS, refresh a read model. Voyant handles “something happened” with events, and it is deliberately honest about what events guarantee and what they do not. The short version: events are signaling, not durable execution. The current event bus is in-process and fire-and-forget. When work must survive failure, retry, or run later, it moves onto a job or workflow path. And when you need a durable answer to “who did what, under which authority, and can we undo it,” that is the action ledger, a separate concern from events.

The event envelope

Every event consumer shares one canonical envelope from @voyant-travel/core/events:

await events.emit({
  name: "invoice.settled",
  data: { invoiceId },
  metadata: {
    category: "domain",
    source: "service",
  },
  emittedAt: new Date(),
})

The envelope has four parts: name, data, metadata, and emittedAt. Metadata carries category, source (workflow, service, route, subscriber, system), and correlation or causation identifiers when useful. Use the shared envelope rather than inventing package-local event shapes.
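As a rough illustration of the four-part shape described above, the envelope could be typed as follows. This is a sketch, not the definitions exported by @voyant-travel/core/events: the type names, the makeEvent helper, and the correlationId and causationId field names are assumptions made for the example.

```typescript
// Hypothetical type sketch; the real definitions live in @voyant-travel/core/events.
type EventCategory = "domain" | "internal";
type EventSource = "workflow" | "service" | "route" | "subscriber" | "system";

interface EventMetadata {
  category: EventCategory;
  source: EventSource;
  // Optional tracing identifiers (field names are assumptions).
  correlationId?: string;
  causationId?: string;
}

interface EventEnvelope<TData = unknown> {
  name: string;
  data: TData;
  metadata: EventMetadata;
  emittedAt: Date;
}

// Hypothetical helper that stamps emittedAt so call sites cannot forget it.
function makeEvent<TData>(
  name: string,
  data: TData,
  metadata: EventMetadata,
): EventEnvelope<TData> {
  return { name, data, metadata, emittedAt: new Date() };
}

const settledEvent = makeEvent(
  "invoice.settled",
  { invoiceId: "inv_123" },
  { category: "domain", source: "service" },
);
```

Typing category and source as closed unions means a typo such as "domian" fails at compile time instead of reaching subscribers.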

Domain events versus internal events

The category field is chosen on purpose, because not every event has the same audience.

domain: a business milestone other modules or integrations may reasonably care about. Examples: invoice.settled, booking.documents.sent, product.created.
internal: a process signal useful to subscribers, diagnostics, or automation, but not part of the core business language. Examples: invoice.document.generated, contract.document.generated.

Choosing the category deliberately lets consumers tell whether they are reacting to a business fact or an internal process signal.

The event bus is fire-and-forget

The default EventBus is in-process, and its semantics are explicit:

Handlers run sequentially.
Subscriber errors are caught and logged.
Subscribers do not affect the emitter’s outcome.
Emission does not imply durable delivery or retry.

Do not describe the event bus as a queue, a durable stream, or a reliable delivery mechanism. Callers must not assume durable retries, ordering beyond the implementation’s explicit behavior, dead-letter handling, backpressure, or priority. Those guarantees are not smuggled into the generic EventBus contract.
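To make these semantics concrete, here is a minimal in-process bus that behaves the way the list above describes. It is a sketch written for this page, not the actual EventBus implementation: the class name, the subscribe method, and the injected logger are assumptions.

```typescript
// Minimal sketch of an in-process, fire-and-forget bus (not the real EventBus).
type SketchEvent = { name: string; data: unknown };
type Handler = (event: SketchEvent) => void | Promise<void>;
type ErrorLogger = (message: string, error: unknown) => void;

class SketchEventBus {
  private handlers = new Map<string, Handler[]>();
  private logError: ErrorLogger;

  constructor(logError: ErrorLogger = (message, error) => console.error(message, error)) {
    this.logError = logError;
  }

  subscribe(name: string, handler: Handler): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  async emit(event: SketchEvent): Promise<void> {
    for (const handler of this.handlers.get(event.name) ?? []) {
      try {
        // Sequential: each handler finishes before the next one starts.
        await handler(event);
      } catch (error) {
        // Caught and logged. No retry, no dead-letter queue, and the
        // emitter never sees the failure.
        this.logError(`subscriber for ${event.name} failed`, error);
      }
    }
  }
}
```

Nothing here is persisted: if the process dies between the durable write and the loop, the handlers simply never run, which is why the bus must not be treated as a delivery mechanism.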

This honesty matters because it tells you exactly where a side effect belongs.

Two rules that keep events safe

Emit after the durable state change

An event describes a fact that is already true in durable storage. The pattern is always: write first, then announce.

// 1. Persist the rendition, payment, or delivery row.
const settled = await persistSettlement(input)

// 2. Only now emit the fact.
await events.emit({
  name: "invoice.settled",
  data: { invoiceId: settled.invoice.id },
  metadata: { category: "domain", source: "service" },
})

Real examples from the framework follow this exactly: financeDocumentService creates the invoice rendition before emitting the internal invoice.document.generated; financeSettlementService writes the payment and updates invoice state before emitting the domain invoice.settled; the booking-documents service persists the delivery row before emitting booking.documents.sent. The event is a signal to observers, never the mechanism that makes the thing true.

Subscribers are observers, not the correctness boundary

Subscribers are a good fit for secondary reactions: notifications, follow-up sync, cache invalidation, read-model refresh, diagnostics. They are a poor fit for anything that must succeed before the caller can treat the main operation as complete. The CMS sync plugins are the canonical example: payloadCmsPlugin and sanityCmsPlugin subscribe to the product.created, product.updated, and product.deleted events and catch and log their own failures. A failed content sync is an operational issue, not a reason to invalidate the core product write.

If a side effect is part of the correctness boundary, do not hide it in a fire-and-forget subscriber. Move it to a durable path.
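The observer pattern described above can be sketched as a subscriber that contains its own failures. The names here (makeCmsSyncSubscriber, SyncToCms, ProductEvent) are hypothetical stand-ins and do not reflect the real payloadCmsPlugin or sanityCmsPlugin code.

```typescript
// Hypothetical event shape for the product.* family.
interface ProductEvent {
  name: string;
  data: { productId: string };
}

// Hypothetical stand-in for a CMS client call; not the real plugin API.
type SyncToCms = (productId: string) => Promise<void>;

function makeCmsSyncSubscriber(
  syncToCms: SyncToCms,
  log: (message: string) => void,
): (event: ProductEvent) => Promise<void> {
  return async (event) => {
    try {
      await syncToCms(event.data.productId);
    } catch (error) {
      // A failed sync is an operational issue: record it and move on.
      // The product write that preceded this event stays committed.
      const reason = error instanceof Error ? error.message : String(error);
      log(`cms sync failed for ${event.data.productId}: ${reason}`);
    }
  };
}
```

If losing this sync would be unacceptable, the catch block is the wrong place to fix it; that is the signal to move the work onto a job or workflow.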

When to use durable execution instead

The moment a side effect needs retries, durable execution, delayed execution, explicit job identity or idempotency, or queue-backed isolation from the request path, it leaves the event bus. Voyant already has the right boundary for this:

@voyant-travel/core/orchestration exposes the JobRunner for durable background jobs.
@voyant-travel/core/workflows and the Workflows SDK provide durable, step-based orchestration with retries, sleeps, and resumability.

The split is clean: use events for signaling, use jobs or workflows for durable background work. If a particular event family genuinely needs stronger delivery guarantees later, that family is promoted on its own, with explicit ownership for retries, failure handling, and idempotency, rather than turning the whole event bus into a queue. Event priority is deferred entirely until a real durable queued surface exists for it to mean anything.

This is also how events relate to workflows in practice. A workflow can emit events as it progresses (source: "workflow"), and a subscriber can kick off follow-up reactions, but the durable, retryable part of the work lives inside the workflow’s steps, not inside a subscriber.
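The responsibilities a durable path takes on (explicit job identity, idempotency, bounded retries) can be sketched with a toy runner. This is deliberately an in-memory stand-in and therefore not durable; it does not reflect the JobRunner API in @voyant-travel/core/orchestration, and every name in it is an assumption.

```typescript
// Toy runner showing job identity, idempotency, and bounded retries.
// In-memory only: a real durable runner persists jobs and attempt state.
interface Job {
  name: string;
  idempotencyKey: string;
  payload: { [key: string]: string };
  maxAttempts: number;
}

type JobHandler = (payload: Job["payload"], attempt: number) => Promise<void>;
type JobOutcome = "completed" | "duplicate" | "failed";

class ToyJobRunner {
  private handlers = new Map<string, JobHandler>();
  private completed = new Set<string>();

  register(name: string, handler: JobHandler): void {
    this.handlers.set(name, handler);
  }

  async run(job: Job): Promise<JobOutcome> {
    // Idempotency: a key that already completed is never executed again.
    if (this.completed.has(job.idempotencyKey)) return "duplicate";
    const handler = this.handlers.get(job.name);
    if (!handler) throw new Error(`no handler registered for ${job.name}`);

    for (let attempt = 1; attempt <= job.maxAttempts; attempt++) {
      try {
        await handler(job.payload, attempt);
        this.completed.add(job.idempotencyKey);
        return "completed";
      } catch {
        // Try again until maxAttempts; a real runner would also back off.
      }
    }
    return "failed";
  }
}
```

The contrast with a subscriber is that the caller gets an outcome back and the retry policy is owned by the job, not left to chance.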

The action ledger

Events answer “what happened” for integration and reaction. They are explicitly not an audit trail. When operators need to answer “who did what, why, under which authority, and can we undo or compensate it,” that is the action ledger, a cross-module, actor-centered record of important actions. Voyant keeps four histories separate, and the ledger is the fourth:

Domain state

The real business tables (bookings, invoices, payments). The source of truth.

Domain events

Business and process facts emitted after durable changes. For reaction, not audit.

Workflow journal

Execution history of a durable orchestration: which steps ran, retried, or compensated.

Action ledger

Who initiated an action, which authority allowed it, what changed, and whether it can be reversed.

Roles: attribution and authority

A ledger entry records the principal and the authority, not just the operation. The shared spine carries the smallest stable facts: the action name and kind, status, evaluated risk, the actor type (staff, customer, partner, supplier), the principal type and id (user, API key, agent, workflow, system), the session or API-token id, delegation, the route or tool name, the workflow run and step, correlation and causation ids, idempotency fields, the target, the checked capability, and the authorization source.

This maps directly from the request context you already have: userId, apiTokenId, sessionId, callerType, internal-request markers, and any delegation chain.

A central principle runs through it: AI agents get no special trust. An agent is an ordinary principal with principal_type = "agent", explicit delegated authority, bounded capabilities, and a mandatory ledger record for every sensitive read or mutation. It never inherits a staff session implicitly.
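A small sketch can show what "no special trust" means in practice: an agent is checked against its own bounded capabilities, and every check leaves a record. Apart from principal_type, the field names, status values, and the authorize function are assumptions made for illustration, and the spine is reduced to a handful of fields.

```typescript
// Illustrative types only; apart from principal_type, names and values are assumptions.
type PrincipalType = "user" | "api_key" | "agent" | "workflow" | "system";

interface Principal {
  principal_type: PrincipalType;
  principal_id: string;
  capabilities: string[]; // what this principal itself may do
  delegated_by?: string; // id of the principal that delegated authority, if any
}

interface LedgerSpineEntry {
  action_name: string;
  principal_type: PrincipalType;
  principal_id: string;
  checked_capability: string;
  authorization_source: "direct" | "delegation";
  status: "allowed" | "denied";
}

function authorize(
  principal: Principal,
  actionName: string,
  capability: string,
  ledger: LedgerSpineEntry[],
): boolean {
  // Only the principal's own capabilities count. An agent never picks up
  // the capabilities of the staff member who delegated to it.
  const allowed = principal.capabilities.includes(capability);
  ledger.push({
    action_name: actionName,
    principal_type: principal.principal_type,
    principal_id: principal.principal_id,
    checked_capability: capability,
    authorization_source: principal.delegated_by ? "delegation" : "direct",
    status: allowed ? "allowed" : "denied",
  });
  return allowed;
}
```

Denied attempts are recorded as well as allowed ones, so the ledger can answer what an agent tried to do, not only what it did.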

Reversibility

Reversal is a domain-level concept, never database rollback. Each ledgered mutation declares how it can be undone:

Revert when the old state can safely be restored (a catalog overlay value from overlay history, a draft revision rolled back).
Compensate when the original action had external side effects (cancel an upstream hold, void an unpaid invoice, issue a credit note, refund through the domain cancellation flow).
Irreversible when the action is historical truth (a delivered email, an issued signature, an external capture past the settlement window).

Reversal is tracked as state, not a boolean: reversal_state, reversal_outcome, and the links between an action and the action that reversed it. Partial compensation is normal (a paid cancellation might refund 50% per policy). Ledger entries are append-only, so corrections, reversals, and approvals create new linked entries rather than rewriting history. When something cannot be undone, the operator UI says so and offers follow-up actions rather than pretending there is an undo button.
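The append-only rule and the three reversal kinds can be sketched as a small in-memory ledger in which a reversal is a new entry linked to the original. This is an illustration under assumptions, not the shipped schema: the class, the reverses_entry_id link, and the concrete state values are invented for the example, and reversal_state is derived here from the linked entries.

```typescript
// Illustrative in-memory ledger; field names and state values are assumptions.
type ReversalKind = "revert" | "compensate" | "irreversible";
type ReversalOutcome = "full" | "partial";
type ReversalState = "none" | "partially_reversed" | "reversed";

interface LedgerEntry {
  id: number;
  action_name: string;
  reversal_kind: ReversalKind; // how this action itself can be undone
  reverses_entry_id?: number; // link to the action this entry reverses
  reversal_outcome?: ReversalOutcome;
}

class SketchLedger {
  private entries: LedgerEntry[] = [];

  append(entry: Omit<LedgerEntry, "id">): LedgerEntry {
    const stored: LedgerEntry = { ...entry, id: this.entries.length + 1 };
    this.entries.push(stored); // append-only: existing entries are never edited
    return stored;
  }

  reverse(
    targetId: number,
    reversal: { action_name: string; reversal_kind: ReversalKind },
    outcome: ReversalOutcome,
  ): LedgerEntry {
    const target = this.entries.find((e) => e.id === targetId);
    if (!target) throw new Error(`no ledger entry ${targetId}`);
    if (target.reversal_kind === "irreversible") {
      throw new Error(`${target.action_name} cannot be undone; offer follow-up actions instead`);
    }
    return this.append({ ...reversal, reverses_entry_id: targetId, reversal_outcome: outcome });
  }

  reversalStateOf(id: number): ReversalState {
    const linked = this.entries.filter((e) => e.reverses_entry_id === id);
    if (linked.length === 0) return "none";
    return linked.some((e) => e.reversal_outcome === "full") ? "reversed" : "partially_reversed";
  }

  size(): number {
    return this.entries.length;
  }
}
```

A 50% refund on a paid cancellation would be recorded as a partial outcome: the original entry stays untouched and its state is read from the entries that point back at it.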

Consistency model

The write path depends on the action’s evaluated risk. High-risk and critical mutations write the ledger entry in the same transaction as the domain mutation: if the action cannot be durably recorded, it is not committed. Sensitive reads (PII reveals, credential access, private documents, agent retrieval contexts) usually have no mutation to piggyback on, so they use a standalone synchronous ledger write, and the response withholds the sensitive value until the entry is durable. Low-risk logging can be best-effort, but only when the capability’s policy explicitly allows loss.
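The rules in this paragraph can be sketched as a small decision function plus a guard for sensitive reads. The type names, mode names, and both functions are hypothetical. The fallback for low-risk actions whose policy does not allow loss is also an assumption, since the text only says when best-effort is permitted.

```typescript
// Hypothetical names throughout; a sketch of the rules, not a shipped API.
type LedgerWriteMode = "same_transaction" | "standalone_sync" | "best_effort";

interface ActionProfile {
  kind: "mutation" | "sensitive_read";
  risk: "low" | "high" | "critical";
  policyAllowsLoss: boolean;
}

function chooseWriteMode(profile: ActionProfile): LedgerWriteMode {
  // Sensitive reads have no mutation to piggyback on.
  if (profile.kind === "sensitive_read") return "standalone_sync";
  // High-risk and critical mutations commit together with the ledger entry.
  if (profile.risk === "high" || profile.risk === "critical") return "same_transaction";
  // Best-effort only when the capability's policy explicitly allows loss.
  // Assumption: otherwise fall back to a synchronous write.
  return profile.policyAllowsLoss ? "best_effort" : "standalone_sync";
}

// Withhold the sensitive value until the ledger entry is durable.
async function revealAfterLedgerWrite<T>(
  writeLedgerEntry: () => Promise<void>,
  readSensitiveValue: () => Promise<T>,
): Promise<T> {
  await writeLedgerEntry(); // if this rejects, the value is never read or returned
  return readSensitiveValue();
}
```

Ordering the ledger write before the read means a ledger outage fails the reveal closed rather than leaking an unrecorded value.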

The action ledger is a planning reference and a phased build, not a single shipped table. The durable spine plus committed profile details and payload references is the audit truth; a relay or outbox decouples heavier enrichment and export from the write path without becoming the source of truth.

Review heuristics

When you add or review event-related behavior:

Is the fact already durable?

Emit the event after the state change it describes, not before.

Domain or internal?

Choose the event category so consumers know if it is a business fact or a process signal.

Can subscriber failure be tolerated?

If not, the work is not a subscriber. Move it to a job or workflow.

Does it need retries or scheduling?

Durable, retryable, or delayed side effects belong on JobRunner or a workflow.

Does it need an audit answer?

“Who did this and can we undo it” is the action ledger, not an event.

Next steps

Workflows

Durable, step-based orchestration for retryable background work.

Services

Where events are emitted, after the durable write.

Auth and identity

The actor and principal context the action ledger records.

Glossary

The shared travel vocabulary behind event names.