Rollout checklist
Use this as an implementation plan when adopting the contract on either side. Each item is independently verifiable. Order matters only where indicated.
Common prerequisites
Decide your role. A given system is a planner or an executor for a given warehouse, never both. Multi-warehouse deployments may play different roles per warehouse.
Pin a contract version. Read the OpenAPI for the target version. Pin generated client / server stubs to that exact version.
Provision partner registry records on both sides for every `(warehouse, vendor)` pair.
Decide environments. mTLS in production; API key in dev / test. The two share idempotency and ordering semantics, so pre-production behavior is faithful.
Planner-side checklist
Dispatch transport per warehouse. Pick one of webhook (default), Kafka, or polling. Webhook for most third-party executors; Kafka for co-located FG.AI-to-FG.AI; polling for executors behind strict egress-only firewalls.
Implement the dispatch publisher. For webhook: HMAC signing, exponential-backoff retry, DLQ after 24h. For Kafka: produce to `wms-wes.dispatch.*.v1` topics with `warehouse_frn` partition key. For poll: implement `/dispatch/pending` and `/dispatch/ack`.
Implement the realtime server. `POST /realtime/movements` and `/realtime/exceptions` are mandatory. `/realtime/work-status` is recommended (drives the document-detail UI). `/realtime/capacity` is optional and dashboards-only.
Implement the confirmation server. `/confirmation/shift-summary`, `/confirmation/cycle-count`, `/confirmation/inventory-snapshot`.
Implement the reconciler. Compare realtime sum vs confirmation per `(partner_id, warehouse_id, window_start, window_end)`. Emit `wes.window.reconciled.v1` or `wes.window.discrepancy.v1`.
Implement the supervisor workflow. Three resolution actions (accept-confirmation, accept-realtime, manual-adjustment). Every resolution is attributable to a named supervisor with timestamp and reason in the audit log.
Idempotency storage keyed by `(partner_id, correlation_id)` with 30-day retention.
Wire `/health` and `/capabilities` for operational dashboards and partner onboarding.
Verify with replay tests for every inbound endpoint. Confirm `replay: true` semantics on at-least-once retries.
Document the supervisor runbook before production go-live.
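
The reconciler item above can be sketched as follows. This is a minimal illustration, not the contract's implementation: the `WindowKey` class, the `reconcile` function, the returned dictionary layout, and the `tolerance` parameter are all assumptions made for the example. Only the window key fields and the two event names come from the checklist.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class WindowKey:
    """Reconciliation key from the checklist; timestamp format is an assumption."""
    partner_id: str
    warehouse_id: str
    window_start: str
    window_end: str


def reconcile(key: WindowKey,
              realtime_quantities: Iterable[Decimal],
              confirmed_total: Decimal,
              tolerance: Decimal = Decimal("0")) -> dict:
    """Compare the realtime sum with the confirmed total for one window.

    The payload shape and the tolerance are hypothetical; the event type
    names are the ones listed in the checklist.
    """
    realtime_total = sum(realtime_quantities, Decimal("0"))
    delta = confirmed_total - realtime_total
    if abs(delta) <= tolerance:
        event_type = "wes.window.reconciled.v1"
    else:
        event_type = "wes.window.discrepancy.v1"
    return {
        "type": event_type,
        "key": key,
        "realtime_total": realtime_total,
        "confirmed_total": confirmed_total,
        "delta": delta,
    }
```

Using `Decimal` rather than floats keeps quantity comparisons exact. A discrepancy event would then feed the supervisor workflow rather than being resolved automatically.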
Executor-side checklist
Implement the dispatch receiver. For webhook: HMAC verification (constant-time compare) and idempotency on `(planner_id, correlation_id)`. For Kafka: consumer with offset commit only after durable processing. For poll: cursor-respecting loop with ack-after-durable.
Implement task expansion. Take a `Routing` from the dispatch payload, expand into floor tasks, assign workers / robots / equipment.
Implement the realtime publisher. Post `PICK`, `PUT`, `MOVE`, `ADJUST`, `CONSUME`, `PRODUCE`, `RECEIVE`, `SHIP` movements as they happen. Batch where appropriate (10–200 per call). Post exceptions and work-status alongside.
Implement the confirmation publisher. Emit a shift-summary at end-of-shift / end-of-wave / scheduled cadence. Cycle counts on schedule or on supervisor request. Inventory snapshots at cutover and major exception events.
Idempotency storage keyed by `(planner_id, correlation_id)` with 24-hour retention for webhook dispatch.
Backpressure. Honor `Retry-After` on `429`. Drop realtime under sustained backpressure; confirmation is authoritative.
Verify with replay tests for every inbound dispatch path. Confirm duplicate dispatch produces no duplicate tasks.
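
A webhook dispatch receiver along the lines of the first item might look like the sketch below. It assumes HMAC-SHA256 over the raw request body with a hex-encoded signature; the checklist does not state the algorithm, encoding, or header name, so treat those as placeholders. The `DispatchReceiver` class, its return values, and the in-memory set standing in for durable idempotency storage are hypothetical.

```python
import hashlib
import hmac
from typing import List, Set, Tuple


def signature_matches(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest does not short-circuit on the first differing byte.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class DispatchReceiver:
    """Hypothetical receiver; a real one would persist state durably."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        # Stand-in for idempotency storage keyed by (planner_id, correlation_id).
        self._seen: Set[Tuple[str, str]] = set()
        self.tasks: List[str] = []

    def handle(self, planner_id: str, correlation_id: str,
               body: bytes, signature_hex: str) -> str:
        # Verify before touching any state.
        if not signature_matches(self._secret, body, signature_hex):
            return "rejected"
        key = (planner_id, correlation_id)
        if key in self._seen:
            # Duplicate dispatch: acknowledge, create no new tasks.
            return "duplicate"
        self.tasks.append(f"task-for-{correlation_id}")
        self._seen.add(key)
        return "accepted"
```

A replay test then amounts to sending the same signed body twice and checking that the task count stays at one.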
Joint verification before go-live
| Test | What to verify |
|---|---|
| Cross-warehouse credential rejection | A credential for |
| Webhook signature mismatch | A request with a wrong |
| Replay returns | A second send of the same |
| Out-of-order dispatch | Webhook delivery serializes per warehouse; Kafka preserves order via partition key. |
| Discrepancy workflow | A deliberate divergence between realtime sum and confirmation produces a |
| Cycle-count auto-reconcile | A counted quantity within tolerance auto-reconciles; out of tolerance routes to supervisor. |
| Late correction | A correction referencing a prior window updates the audit trail and re-emits the appropriate reconciliation event. |
| DLQ on persistent failure | A webhook recipient down for 24h drains its events to the DLQ; supervisors are alerted. |
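
The cycle-count row can be exercised with a decision function like the sketch below. It assumes an absolute tolerance in quantity units; the table does not say whether tolerance is absolute or relative, and the function name and return labels are hypothetical.

```python
from decimal import Decimal


def route_cycle_count(expected: Decimal, counted: Decimal,
                      tolerance: Decimal) -> str:
    """Auto-reconcile inside the tolerance, otherwise route to a supervisor.

    Assumes an absolute tolerance; a percentage-based rule would need a
    different comparison.
    """
    if abs(counted - expected) <= tolerance:
        return "auto-reconcile"
    return "supervisor"
```

A joint test would submit one count on each side of the boundary and confirm that only the out-of-tolerance count reaches the supervisor queue.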
Common rollout pitfalls
Generating a new `correlation_id` on retry. Every retry of the same logical action must reuse the original `correlation_id`, or idempotency breaks. Audit your retry layer.
Acking poll cursors before durable processing. A crash between ack and persistence means lost events. The planner believes you have them; you do not.
Skipping HMAC verification. “We’re behind a VPC anyway” is not an answer — the contract requires it; the audit posture depends on it.
Mixing dispatch transports per warehouse. Webhook and Kafka for the same warehouse causes ordering ambiguity. Pick one.
Mixing confirmation cadences per warehouse. Supervisors cannot compare a shift-close window against a wave-close window. Pick one cadence per warehouse and stick to it.
Silent reconciliation overrides. Auto-resolving a `DISCREPANCY` by always accepting confirmation (or always accepting realtime) destroys audit posture. The supervisor path exists for a reason.
Ledger writes from realtime without `source = 'WES'` tags. Confirmation-time reconciliation needs to know which contributions came from the executor stream. Tag them.
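
To make the first pitfall concrete, here is a sketch of a retry loop that mints the `correlation_id` once per logical action and reuses it on every attempt. The `send` callable is a hypothetical transport hook that returns the HTTP status and the `Retry-After` value in seconds, if any. Treating `429` and 5xx responses as retryable, and the backoff parameters, are assumptions rather than contract requirements.

```python
import time
import uuid
from typing import Callable, Optional, Tuple

# Hypothetical transport hook: sends one request and returns
# (HTTP status, Retry-After seconds or None).
Send = Callable[[str, dict], Tuple[int, Optional[float]]]


def post_with_retry(send: Send, payload: dict, max_attempts: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Send one logical action, reusing a single correlation_id across retries."""
    # Minted once, outside the loop: every attempt carries the same id.
    correlation_id = str(uuid.uuid4())
    for attempt in range(max_attempts):
        status, retry_after = send(correlation_id, payload)
        if status < 400:
            return correlation_id
        if status != 429 and status < 500:
            # Other client errors will not succeed on retry (assumption).
            raise RuntimeError(f"non-retryable status {status}")
        if attempt < max_attempts - 1:
            # Prefer the server's Retry-After; otherwise back off exponentially.
            if retry_after is not None:
                delay = retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

The bug to look for in an audit is an id generated inside the loop or inside the transport function, which makes each retry look like a new action to the receiver.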