Rollout checklist

Use this as an implementation plan when adopting the contract on either side. Each item is independently verifiable. Order matters only where indicated.

Common prerequisites

  1. Decide your role. A given system is a planner or an executor for a given warehouse, never both. Multi-warehouse deployments may play different roles per warehouse.

  2. Pin a contract version. Read the OpenAPI specification for the target version. Pin generated client / server stubs to that exact version.

  3. Provision partner registry records on both sides for every (warehouse, vendor) pair.

  4. Decide environments. Use mTLS in production and an API key in dev / test. Both authentication modes share the same idempotency and ordering semantics, so pre-production behavior is faithful to production.

Planner-side checklist

  1. Choose a dispatch transport per warehouse. Pick one of webhook (default), Kafka, or polling. Use webhook for most third-party executors, Kafka for co-located FG.AI-to-FG.AI deployments, and polling for executors behind strict egress-only firewalls.

  2. Implement the dispatch publisher. For webhook: HMAC signing, exponential-backoff retry, DLQ after 24h. For Kafka: produce to wms-wes.dispatch.*.v1 topics with warehouse_frn partition key. For poll: implement /dispatch/pending and /dispatch/ack.

  3. Implement the realtime server. POST /realtime/movements and /realtime/exceptions are mandatory. /realtime/work-status is recommended (drives the document-detail UI). /realtime/capacity is optional and dashboards-only.

  4. Implement the confirmation server. /confirmation/shift-summary, /confirmation/cycle-count, /confirmation/inventory-snapshot.

  5. Implement the reconciler. Compare realtime sum vs confirmation per (partner_id, warehouse_id, window_start, window_end). Emit wes.window.reconciled.v1 or wes.window.discrepancy.v1.

  6. Implement the supervisor workflow. Three resolution actions (accept-confirmation, accept-realtime, manual-adjustment). Every resolution is attributable to a named supervisor with timestamp and reason in the audit log.

  7. Provision idempotency storage keyed by (partner_id, correlation_id) with 30-day retention.

  8. Wire /health and /capabilities for operational dashboards and partner onboarding.

  9. Verify with replay tests for every inbound endpoint. Confirm that an at-least-once retry of an already-processed request returns replay: true and does not duplicate state.

  10. Document the supervisor runbook before production go-live.
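
As an illustration of planner step 2 (webhook signing, exponential-backoff retry, DLQ after 24h), here is a minimal sketch. The 24-hour cutoff comes from the checklist. Everything else is an assumption for illustration: the signature encoding (hex HMAC-SHA256 over the raw request body), the helper names, and the base and cap values for the backoff. Consult the pinned contract version for the actual signature scheme.

```python
import hashlib
import hmac

DLQ_AFTER_SECONDS = 24 * 60 * 60  # from the checklist: DLQ after 24h


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (the encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def next_delay(attempt: int, base: float = 1.0, cap: float = 3600.0) -> float:
    """Exponential backoff: base * 2**attempt, capped. base and cap are illustrative."""
    return min(cap, base * (2 ** attempt))


def plan_retries(first_attempt_at: float, base: float = 1.0, cap: float = 3600.0) -> list:
    """Return the retry timestamps that fall inside the 24h window.

    Once the next retry would land past the window, the event is
    considered undeliverable and should be moved to the DLQ.
    """
    times = []
    t = first_attempt_at
    attempt = 0
    while True:
        t += next_delay(attempt, base, cap)
        if t - first_attempt_at > DLQ_AFTER_SECONDS:
            break
        times.append(t)
        attempt += 1
    return times
```

A publisher would attach sign_body(secret, body) as the X-FGAI-Signature header on every attempt, including retries, and sign the exact bytes it sends. Signing a re-serialized copy of the payload is a common source of spurious mismatches.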

Executor-side checklist

  1. Implement the dispatch receiver. For webhook: HMAC verification (constant-time compare) and idempotency on (planner_id, correlation_id). For Kafka: consumer with offset commit only after durable processing. For poll: cursor-respecting loop with ack-after-durable.

  2. Implement task expansion. Take a Routing from the dispatch payload, expand into floor tasks, assign workers / robots / equipment.

  3. Implement the realtime publisher. Post PICK, PUT, MOVE, ADJUST, CONSUME, PRODUCE, RECEIVE, SHIP movements as they happen. Batch where appropriate (10–200 per call). Post exceptions and work-status alongside.

  4. Implement the confirmation publisher. Emit a shift-summary at end-of-shift / end-of-wave / scheduled cadence. Cycle counts on schedule or on supervisor request. Inventory snapshots at cutover and major exception events.

  5. Provision idempotency storage keyed by (planner_id, correlation_id) with 24-hour retention for webhook dispatch.

  6. Handle backpressure. Honor Retry-After on 429. Under sustained backpressure, drop realtime posts; confirmation is authoritative.

  7. Verify with replay tests for every inbound dispatch path. Confirm duplicate dispatch produces no duplicate tasks.
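
Executor steps 1, 5, and 7 can be sketched together as follows: constant-time HMAC verification, idempotency on (planner_id, correlation_id) with 24-hour retention, and a duplicate dispatch that produces no duplicate tasks. This is a sketch under stated assumptions, not the contract's reference implementation. The class and method names are hypothetical, the signature is assumed to be hex HMAC-SHA256 over the raw body, and the in-memory dict stands in for durable storage.

```python
import hashlib
import hmac
import time

RETENTION_SECONDS = 24 * 60 * 60  # from the checklist: 24-hour retention


def verify_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), received_hex.encode())


class DispatchReceiver:
    """Hypothetical webhook receiver; a real one persists _seen durably."""

    def __init__(self, secret: bytes, clock=time.time):
        self._secret = secret
        self._clock = clock
        self._seen = {}   # (planner_id, correlation_id) -> first-seen timestamp
        self.tasks = []   # stand-in for expanded floor tasks

    def handle(self, planner_id: str, correlation_id: str,
               body: bytes, signature: str):
        if not verify_signature(self._secret, body, signature):
            return 401, {"error": "signature mismatch"}
        now = self._clock()
        key = (planner_id, correlation_id)
        seen_at = self._seen.get(key)
        if seen_at is not None and now - seen_at < RETENTION_SECONDS:
            # Duplicate delivery: acknowledge, but create no new tasks.
            return 200, {"replay": True}
        self.tasks.append(key)   # stand-in for durable task expansion
        self._seen[key] = now    # record the key only after processing
        return 200, {"replay": False}
```

In a real service the task write and the idempotency record should be committed in the same transaction, so that a crash between the two cannot produce either a lost dispatch or a duplicate one.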

Joint verification before go-live

  Test                                  What to verify
  ------------------------------------  ------------------------------------------------------------
  Cross-warehouse credential rejection  A credential for WH-A is rejected with 403 when posting for WH-B.
  Webhook signature mismatch            A request with a wrong X-FGAI-Signature is rejected with 401.
  Replay returns 200 with replay: true  A second send of the same (partner_id, correlation_id) does not duplicate state.
  Out-of-order dispatch                 Webhook delivery serializes per warehouse; Kafka preserves order via partition key.
  Discrepancy workflow                  A deliberate divergence between realtime sum and confirmation produces a DISCREPANCY and the supervisor UI can resolve it.
  Cycle-count auto-reconcile            A counted quantity within tolerance auto-reconciles; out of tolerance routes to supervisor.
  Late correction                       A correction referencing a prior window updates the audit trail and re-emits the appropriate reconciliation event.
  DLQ on persistent failure             A webhook recipient down for 24h drains its events to the DLQ; supervisors are alerted.
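
To make the discrepancy workflow test concrete, the sketch below shows one way a reconciler could compare the realtime sum against the confirmation per (partner_id, warehouse_id, window_start, window_end). The window key and the two event names come from the checklist. The record shapes are assumptions for illustration: each realtime movement is assumed to be already assigned to a window and to carry a signed quantity, and each confirmation is assumed to carry a single total quantity for its window.

```python
from collections import defaultdict
from decimal import Decimal

RECONCILED = "wes.window.reconciled.v1"
DISCREPANCY = "wes.window.discrepancy.v1"


def window_key(record: dict) -> tuple:
    """Build the reconciliation key named in the checklist."""
    return (record["partner_id"], record["warehouse_id"],
            record["window_start"], record["window_end"])


def reconcile(movements: list, confirmations: list) -> list:
    """Emit one event per confirmed window (record shapes are assumed)."""
    sums = defaultdict(Decimal)  # Decimal() is zero
    for m in movements:
        sums[window_key(m)] += Decimal(str(m["quantity"]))

    events = []
    for c in confirmations:
        key = window_key(c)
        realtime_total = sums.get(key, Decimal("0"))
        confirmed = Decimal(str(c["quantity"]))
        delta = confirmed - realtime_total
        events.append({
            "type": RECONCILED if delta == 0 else DISCREPANCY,
            "window": key,
            "realtime": realtime_total,
            "confirmed": confirmed,
            "delta": delta,
        })
    return events
```

A joint test can then post realtime movements that deliberately sum to a different total than the confirmation and assert that a wes.window.discrepancy.v1 event appears for that window. Decimal is used here so that fractional quantities do not produce false discrepancies from floating-point rounding.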

Common rollout pitfalls

  • Generating a new correlation_id on retry. Every retry of the same logical action must reuse the original correlation_id, or idempotency breaks. Audit your retry layer.

  • Acking poll cursors before durable processing. A crash between ack and persistence means lost events. The planner believes you have them; you do not.

  • Skipping HMAC verification. “We’re behind a VPC anyway” is not an answer — the contract requires it; the audit posture depends on it.

  • Mixing dispatch transports within a warehouse. Using webhook and Kafka for the same warehouse causes ordering ambiguity. Pick one.

  • Mixing confirmation cadences per warehouse. Supervisors cannot compare a shift-close window against a wave-close window. Pick one cadence per warehouse and stick to it.

  • Silent reconciliation overrides. Auto-resolving a DISCREPANCY by always accepting confirmation (or always accepting realtime) destroys audit posture. The supervisor path exists for a reason.

  • Ledger writes from realtime without source = 'WES' tags. Confirmation-time reconciliation needs to know which contributions came from the executor stream. Tag them.
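
The first pitfall, generating a new correlation_id on retry, is usually a retry-layer bug. A minimal sketch of the intended behavior follows: the identifier is generated once per logical action, outside the retry loop, and reused on every attempt. The function name, the send callable, and the choice of ConnectionError as the retryable failure are hypothetical and only illustrate the pattern.

```python
import time
import uuid


def send_with_retry(send, payload, max_attempts: int = 5,
                    base_delay: float = 0.5, sleep=time.sleep):
    """Retry a logical action while reusing a single correlation_id.

    `send` is a hypothetical callable taking (correlation_id, payload).
    """
    # Generated once, before the loop: every retry carries the same value.
    correlation_id = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return send(correlation_id, payload)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

If the action must survive a process restart, the correlation_id should be persisted alongside the pending action before the first attempt, so that a restarted worker resends with the original value rather than minting a new one.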