Idempotency and ordering

Every event-shaped payload in the contract carries a partner-generated correlation_id. UUID v4 or UUID v7 is recommended; v7 sorts by time, which makes operational triage easier.

Idempotency keys

Surface                                       Idempotency key
Realtime calls (executor → planner)           (partner_id, correlation_id)
Confirmation calls (executor → planner)       (partner_id, correlation_id)
Webhook dispatch (planner → executor)         (planner_id, correlation_id)
Poll-mode dispatch acks (executor → planner)  (executor_id, cursor)

A replay returns 200 OK with replay: true where the response schema supports it, and produces no additional state mutation: no extra movement, no extra exception, no duplicate work-status, no duplicate supervisor task, no double dispatch.
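The replay rule above can be illustrated with a minimal receiver sketch. It is not the contract's implementation: the class name, the in-memory dictionary, and the response shape (`status`, `replay`) are assumptions for illustration, and a real receiver would persist its keys durably.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class IdempotentReceiver:
    """Hypothetical in-memory receiver keyed by (partner_id, correlation_id)."""

    apply: Callable[[dict], None]  # the one state mutation per logical event
    seen: Dict[Tuple[str, str], dict] = field(default_factory=dict)

    def handle(self, partner_id: str, payload: dict) -> dict:
        key = (partner_id, payload["correlation_id"])
        if key in self.seen:
            # Replay: return the stored response, flagged, and mutate nothing.
            return {**self.seen[key], "replay": True}
        self.apply(payload)
        response = {"status": 200, "replay": False}
        self.seen[key] = response
        return response


movements = []
receiver = IdempotentReceiver(apply=movements.append)
event = {"correlation_id": "0b9d2c1e-5a34-4f7e-9c11-2f6f0c8a7d10", "qty": 4}
first = receiver.handle("partner-a", event)
second = receiver.handle("partner-a", event)  # retry of the same logical event
```

Here `second` carries `replay: True` and `movements` still holds a single entry, which is the "no additional state mutation" property.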

Retention

The receiving side must retain enough idempotency state to absorb the longest reasonable retry window. Recommended retention:

  • 30 days for realtime and confirmation correlation_ids on the planner.

  • 24 hours for webhook correlation_ids on the executor, matching the webhook DLQ horizon.

Trim older entries on a rolling basis. Retention is tracked per (partner_id, correlation_id) pair, so each entry is small; absolute storage cost is modest even at high throughput.
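Rolling trimming might look like the following sketch. The class and method names are hypothetical, the store is in-memory, and it assumes timestamps passed in never go backwards, so insertion order matches age.

```python
import time
from collections import OrderedDict
from typing import Optional, Tuple

Key = Tuple[str, str]  # (partner_id, correlation_id)


class RetainedKeys:
    """Hypothetical idempotency-key store with a rolling retention window."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._first_seen: "OrderedDict[Key, float]" = OrderedDict()

    def trim(self, now: float) -> None:
        # Drop entries older than the window; the oldest entries come first.
        cutoff = now - self.retention_seconds
        while self._first_seen:
            _, seen_at = next(iter(self._first_seen.items()))
            if seen_at >= cutoff:
                break
            self._first_seen.popitem(last=False)

    def check_and_record(self, key: Key, now: Optional[float] = None) -> bool:
        """Return True if key is a replay inside the retention window."""
        now = time.time() if now is None else now
        self.trim(now)
        if key in self._first_seen:
            return True
        self._first_seen[key] = now
        return False


planner_keys = RetainedKeys(retention_seconds=30 * 24 * 3600)  # 30 days
executor_keys = RetainedKeys(retention_seconds=24 * 3600)      # 24 hours
```

Once an entry ages out, a late retry with the same key is treated as new, which is why the window should cover the longest retry horizon.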

Ordering guarantees

Channel           Ordering
Kafka dispatch    Preserved per warehouse via partition key on warehouse_frn.
Webhook dispatch  Serialized per (planner_id, warehouse_id). The next event is not delivered until the previous one is acknowledged or DLQ’d.
Polling dispatch  Cursor-ordered. Callers must process pages strictly in order and acknowledge only after durable processing.
Realtime          Best-effort. Movements may arrive out of wall-clock order; the planner reconciles via the confirmation path.
Confirmation      Strictly per-window. Two confirmations for the same window key are deduplicated.

Cross-warehouse ordering is not guaranteed in any channel and should not be relied on.
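As an illustration of the webhook row, the sketch below keeps one FIFO lane per (planner_id, warehouse_id) and only advances once the head event is acknowledged or moved to a DLQ. The class, the `deliver` callback, and the `max_attempts` retry budget are assumptions; the contract does not specify how many attempts precede the DLQ.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Lane = Tuple[str, str]  # (planner_id, warehouse_id)


class SerializedDispatcher:
    """Hypothetical per-lane dispatcher with head-of-line blocking."""

    def __init__(self, deliver: Callable[[dict], bool], max_attempts: int = 3) -> None:
        self.deliver = deliver            # returns True when the executor acks
        self.max_attempts = max_attempts  # assumed retry budget before DLQ
        self.lanes: Dict[Lane, Deque[dict]] = defaultdict(deque)
        self.dlq: List[dict] = []

    def enqueue(self, planner_id: str, warehouse_id: str, event: dict) -> None:
        self.lanes[(planner_id, warehouse_id)].append(event)

    def drain(self, lane: Lane) -> None:
        queue = self.lanes[lane]
        while queue:
            event = queue[0]  # nothing behind the head moves until it resolves
            # any() stops at the first successful attempt.
            acked = any(self.deliver(event) for _ in range(self.max_attempts))
            if not acked:
                self.dlq.append(event)
            queue.popleft()
```

Lanes are independent of each other, which matches the statement that cross-warehouse ordering is not guaranteed.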

Replay vs duplicate detection

A replay is intentional: the sender retried after a network failure, so the same (partner_id, correlation_id) reaches the receiver more than once. The contract returns 200 OK with replay: true where the response schema supports it, and applies the event only once.

A duplicate is a sender bug: the same logical event is sent under different correlation_ids. The contract has no way to detect duplicates at the protocol level; it is the sender’s responsibility to use the same correlation_id across all retries of the same logical action.

Common sender mistakes that produce duplicates:

  1. Generating a new UUID on each retry attempt instead of reusing the original.

  2. Restarting the dispatch publisher and re-emitting from a stale offset without remembering already-sent correlation_ids.

  3. Splitting one logical movement into two physical retries with two different correlation_ids when a partial write succeeded.

If you see ledger drift after rollout, audit your correlation_id generation first. The contract’s idempotency surface is robust; bugs almost always originate sender-side.
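A minimal sender-side sketch that avoids the first mistake: the correlation_id is generated once per logical action, outside the retry loop. The function name, the `send` callback, and the choice to retry only on `ConnectionError` are assumptions for illustration; UUID v4 is used because it is available in the Python standard library on all supported versions.

```python
import uuid
from typing import Callable


def send_with_retries(send: Callable[[dict], dict], body: dict, max_attempts: int = 3) -> dict:
    """Hypothetical helper: retry a send while reusing one correlation_id."""
    # Generate the id once per logical action, never inside the loop.
    payload = {**body, "correlation_id": str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload)  # same payload and correlation_id every attempt
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("all attempts failed") from last_error
```

To survive a process restart (the second mistake), the payload, including its correlation_id, would need to be persisted before the first attempt; this sketch does not cover that.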

Cursor advancement (poll mode)

Poll-mode dispatch uses a cursor instead of correlation-id idempotency:

GET  /wes/v1/dispatch/pending?since={cursor}&limit=100
POST /wes/v1/dispatch/ack    { "cursor": "<cursor>" }

The cursor opaquely encodes the planner’s internal commit position. The executor must:

  1. GET /dispatch/pending with the last acked cursor.

  2. Process the returned page durably (the events become facts on the executor side before the ack).

  3. POST /dispatch/ack with the page’s next_cursor.

  4. Repeat until next_cursor is empty or equal to the last acked cursor.

If the executor acks before processing is durable, a crash loses the events in that page, because the planner will not redeliver them. Acking a cursor that is earlier than the last one acked is a no-op: cursors are monotonic.
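The four steps can be sketched as a loop. The `fetch`, `persist`, and `ack` callbacks stand in for the GET, the executor's durable write, and the POST; they and the `events` response field are assumptions, while `next_cursor` comes from the steps above. The sketch also assumes that any page containing events carries a non-empty next_cursor.

```python
from typing import Callable, Optional


def drain_pending(
    fetch: Callable[[Optional[str]], dict],
    persist: Callable[[list], None],
    ack: Callable[[str], None],
    last_acked: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical poll loop: fetch, persist durably, then ack, in that order."""
    while True:
        page = fetch(last_acked)            # step 1: GET with the last acked cursor
        next_cursor = page.get("next_cursor")
        if not next_cursor or next_cursor == last_acked:
            return last_acked               # step 4: nothing new to process
        persist(page["events"])             # step 2: durable before the ack
        ack(next_cursor)                    # step 3: POST the page's next_cursor
        last_acked = next_cursor
```

If `persist` raises, the ack is never sent, so the same page is returned on the next poll rather than being lost.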