# State Hub Offline Write Buffer
## Decision
State Hub supports outage buffering through an edge relay with a durable local
outbox, plus central idempotency on replayed writes.

The central service cannot buffer requests that never reach it. Agents should
therefore send writes to a local `statehub-edge` relay when buffering is enabled.
The relay forwards immediately while the upstream API is reachable. If the
upstream is offline, the relay persists queueable write envelopes in a local
SQLite outbox and returns an explicit queued receipt.

Queued receipts are pending evidence, not successful central commits. Operators
must inspect and replay the outbox after recovery.
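
The forward-or-queue behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the relay's actual code: the table schema, the `handle_write` and `open_outbox` names, the receipt fields, and the use of `ConnectionError` to signal an unreachable upstream are all assumptions made for the example.

```python
import json
import sqlite3
import uuid
from typing import Callable


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    """Open the outbox and create its table. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS outbox (
               envelope_id     TEXT PRIMARY KEY,
               method          TEXT NOT NULL,
               path            TEXT NOT NULL,
               body            TEXT NOT NULL,
               idempotency_key TEXT NOT NULL,
               status          TEXT NOT NULL DEFAULT 'queued'
           )"""
    )
    return conn


def handle_write(
    conn: sqlite3.Connection,
    forward: Callable[[str, str, dict, str], dict],
    method: str,
    path: str,
    body: dict,
) -> dict:
    """Forward a write if the upstream is reachable, otherwise queue it."""
    key = str(uuid.uuid4())  # reused on replay so the central side can dedupe
    try:
        response = forward(method, path, body, key)
        return {"status": "forwarded", "response": response}
    except ConnectionError:
        envelope_id = str(uuid.uuid4())
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO outbox "
                "(envelope_id, method, path, body, idempotency_key) "
                "VALUES (?, ?, ?, ?, ?)",
                (envelope_id, method, path, json.dumps(body), key),
            )
        # A queued receipt is pending evidence, not a central commit.
        return {
            "status": "queued",
            "envelope_id": envelope_id,
            "idempotency_key": key,
        }
```

Generating the idempotency key before the first forward attempt means the same key is available for every later replay of that envelope.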
## Defaults
- Relay listen target: operator-selected, recommended `127.0.0.1:18080`.
- Upstream API: `STATEHUB_UPSTREAM_URL`, then `API_BASE`, then `http://127.0.0.1:8000`.
- Outbox path: `STATEHUB_OUTBOX_PATH`, default `~/.statehub/edge-outbox.sqlite3`.
- Central idempotency retention: 14 days.
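
The precedence in this list could be resolved along these lines. The function names are hypothetical, and treating an empty environment variable as unset is an assumption rather than documented behaviour.

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_UPSTREAM = "http://127.0.0.1:8000"
DEFAULT_OUTBOX = "~/.statehub/edge-outbox.sqlite3"


def resolve_upstream(env: Mapping[str, str] = os.environ) -> str:
    """STATEHUB_UPSTREAM_URL wins, then API_BASE, then the built-in default."""
    return (
        env.get("STATEHUB_UPSTREAM_URL")
        or env.get("API_BASE")
        or DEFAULT_UPSTREAM
    )


def resolve_outbox_path(env: Mapping[str, str] = os.environ) -> Path:
    """STATEHUB_OUTBOX_PATH if set, else the default under the home directory."""
    return Path(env.get("STATEHUB_OUTBOX_PATH") or DEFAULT_OUTBOX).expanduser()
```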
## Route Classes
### Append-Only, Queueable
| Method | Path | Notes |
| --- | --- | --- |
| POST | /progress/ | Session-close progress events. |
| POST | /messages/ | Agent coordination messages. |
| PATCH | /messages/{id}/read | Safe only when the message id is already known. |
| POST | /token-events/ | Token accounting events. |
| POST | /token-events/upsert | Source-id based token upsert. |
| POST | /decisions/ | Queue only when the caller does not need the generated id immediately. |

Append-only writes are replayed with an `Idempotency-Key`. An exact duplicate
retry returns the original central response. Reusing a key with a different
request returns HTTP 409.
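
One way to picture the central idempotency rule is the in-memory sketch below. It is not the central service's implementation: the `IdempotencyStore` class, the SHA-256 request fingerprint, and the `(status, body)` response shape are assumptions for illustration, and the 14-day retention window is not modelled.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]


class IdempotencyStore:
    """Remembers the first response seen for each idempotency key."""

    def __init__(self) -> None:
        self._seen: Dict[str, Tuple[str, Response]] = {}

    @staticmethod
    def fingerprint(method: str, path: str, body: dict) -> str:
        # Canonical JSON so that key order does not change the fingerprint.
        canonical = json.dumps(
            [method, path, body], sort_keys=True, separators=(",", ":")
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def apply(
        self,
        key: str,
        method: str,
        path: str,
        body: dict,
        commit: Callable[[dict], Response],
    ) -> Response:
        fp = self.fingerprint(method, path, body)
        if key in self._seen:
            stored_fp, stored_response = self._seen[key]
            if stored_fp == fp:
                # Exact duplicate retry: return the original response.
                return stored_response
            # Same key, different request: refuse with a conflict.
            return (409, {"detail": "idempotency key reused with a different request"})
        response = commit(body)
        self._seen[key] = (fp, response)
        return response
```

With this shape, replaying a queued envelope twice commits it only once, because the second attempt is answered from the stored response.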
### Replace-Style, Queueable With Conflict Checks
| Method | Path | Notes |
| --- | --- | --- |
| PATCH | /tasks/{id} | Task status and metadata updates. |
| POST | /tasks/bulk-status-sync | Ordered batch; future coalescing may decompose by task. |
| PATCH | /decisions/{id} | Decision field update. |
| POST | /decisions/{id}/resolve | Decision resolution. |
| PATCH | /workplans/{id} | Workplan lifecycle/status updates. |
| PATCH | /workstreams/{id} | Legacy alias for workplan update. |

In v1 the relay does not silently overwrite newer central state after a replay
conflict. A 409 response marks the envelope as `conflict` and leaves it available
for operator review.
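
A replay pass that honours this rule might look like the sketch below. The envelope dictionaries, the `send` callback, and the `done` status are hypothetical; only the `queued` and `conflict` states and the 409 behaviour come from this document.

```python
from typing import Callable, Dict, List


def replay(envelopes: List[dict], send: Callable[[dict], int]) -> Dict[str, int]:
    """Replay queued envelopes in order; `send` returns an HTTP status code."""
    counts = {"done": 0, "conflict": 0, "queued": 0}
    for env in envelopes:
        if env["status"] != "queued":
            continue  # conflict and finished envelopes are left untouched
        try:
            code = send(env)
        except ConnectionError as exc:
            # Upstream still unreachable: keep the envelope queued.
            env["last_error"] = str(exc)
            counts["queued"] += 1
            continue
        if code == 409:
            # Never overwrite newer central state; park for operator review.
            env["status"] = "conflict"
            counts["conflict"] += 1
        elif 200 <= code < 300:
            env["status"] = "done"
            counts["done"] += 1
        else:
            env["last_error"] = f"HTTP {code}"
            counts["queued"] += 1
    return counts
```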
### Online-Only In V1
The relay forwards these requests while the upstream is reachable and returns an
explicit HTTP 503 during an outage:
- DELETE endpoints.
- Repository sync/import/ingest endpoints.
- Consistency sweep mutation endpoints.
- Fabric graph exports and external pulls.
- Schema/bootstrap/admin operations.
- Requests with credentials, authorization tokens, attachments, or large opaque
payloads.
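
The three route classes could be expressed as a small lookup like the one below. The class labels (`append_only`, `replace`, `online_only`) and the regular expressions are assumptions derived from the tables above. The sketch classifies by method and path only, so it does not model the credential, attachment, or payload-size exclusions.

```python
import re

# (method, path pattern) pairs taken from the route tables in this document.
APPEND_ONLY = [
    ("POST", r"/progress/"),
    ("POST", r"/messages/"),
    ("PATCH", r"/messages/[^/]+/read"),
    ("POST", r"/token-events/"),
    ("POST", r"/token-events/upsert"),
    ("POST", r"/decisions/"),
]
REPLACE_STYLE = [
    ("PATCH", r"/tasks/[^/]+"),
    ("POST", r"/tasks/bulk-status-sync"),
    ("PATCH", r"/decisions/[^/]+"),
    ("POST", r"/decisions/[^/]+/resolve"),
    ("PATCH", r"/workplans/[^/]+"),
    ("PATCH", r"/workstreams/[^/]+"),
]


def classify(method: str, path: str) -> str:
    """Return a route class label; anything unlisted is online-only."""
    method = method.upper()
    for label, table in (("append_only", APPEND_ONLY), ("replace", REPLACE_STYLE)):
        for route_method, pattern in table:
            if route_method == method and re.fullmatch(pattern, path):
                return label
    return "online_only"
```

Defaulting to `online_only` keeps the allow-list explicit: a new endpoint is not queued during an outage until someone adds it to a table.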
## Non-Secret Outbox Contract
The outbox stores method, path, scrubbed JSON body, route class, source metadata,
idempotency key, retry status, last error, and central response summaries. It
never stores authorization headers, bearer tokens, cookies, API keys, passwords,
or secret-looking JSON fields. Payloads over 64 KiB are rejected.
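
A scrubbing step consistent with this contract might look like the following sketch. The key-name pattern, the choice to drop matching fields rather than mask them, and measuring the size after scrubbing are all assumptions. The pattern anchors on whole key names or suffixes so that accounting fields such as `input_tokens` are kept.

```python
import json
import re
from typing import Any

MAX_PAYLOAD_BYTES = 64 * 1024  # 64 KiB limit from the contract above

# Hypothetical heuristic for "secret-looking" JSON field names.
SECRET_KEY = re.compile(
    r"(^|[_-])(authorization|cookie|password|secret|bearer|api[_-]?key"
    r"|access[_-]?token|refresh[_-]?token|auth[_-]?token)$",
    re.IGNORECASE,
)


class PayloadTooLarge(ValueError):
    """Raised when a scrubbed body exceeds the outbox size limit."""


def scrub(value: Any) -> Any:
    """Recursively drop dictionary entries whose key looks like a secret."""
    if isinstance(value, dict):
        return {
            k: scrub(v) for k, v in value.items() if not SECRET_KEY.search(str(k))
        }
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value


def prepare_body(body: Any) -> str:
    """Scrub the body, serialise it, and enforce the size limit."""
    encoded = json.dumps(scrub(body), separators=(",", ":"))
    size = len(encoded.encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        raise PayloadTooLarge(f"{size} bytes exceeds {MAX_PAYLOAD_BYTES}")
    return encoded
```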
## Operator Commands
    statehub outbox status
    statehub outbox list --status queued
    statehub outbox replay --upstream-url http://127.0.0.1:8000
    statehub outbox export --output /tmp/statehub-outbox.json
    statehub outbox retry ENVELOPE_ID
    statehub outbox cancel ENVELOPE_ID
## Recovery Checklist
1. Confirm the central State Hub API is reachable.
2. Run `statehub outbox status` on each host that may have queued writes.
3. Run `statehub outbox replay` until no due queued envelopes remain.
4. Review conflict envelopes manually.
5. Run `make fix-consistency REPO=state-hub` so file-backed workplan/task state
   remains canonical after replay.
6. Record a progress note with non-secret replay counts.