generated from coulomb/repo-seed
docs(ACTIVITY-WP-0014): rescope T05 to thin client under State Hub beachhead model
Resilience (queue/cache) is handed to custodian/state-hub as a per-machine beachhead; activity-core keeps only idempotent writes + adopt-beachhead-endpoint and retires its bespoke actcore-state-hub-bridge proxy. Proposal sent to state-hub.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -151,7 +151,7 @@ multi-day outage should not flood the triage feed). Update the Railiance runtime
 ConfigMap / bundle, redeploy, and document the run-miss options + per-definition
 guidance in `docs/runbook.md`. Depends on T01 (confirm) and T02 (modes exist).
 
-## Resilient State Hub sinks/resolvers (real incident fix)
+## Keep activity-core thin under the State Hub beachhead model
 
 ```task
 id: ACTIVITY-WP-0014-T05
@@ -160,17 +160,28 @@ priority: high
 state_hub_task_id: "b7e5b877-1b09-421c-a04e-78f785dc00a1"
 ```
 
-T01 proved the 06-22/06-23 silence was **not** a Temporal misfire but a State Hub
-**`Connection refused` at the report sink** (and chronic resolver timeouts) because
-railiance01 reaches State Hub via a reverse tunnel back to the workstation, which
-is asleep at 07:20 Berlin. Misfire policies do not help: the run fires and fails
-the same way. Make activity-core resilient to transient State Hub unavailability:
+**Architecture decision (Bernd, 2026-06-23):** the resilience that this incident
+needs — queuing writes and caching reads while State Hub is unreachable — must
+**not** be a burden carried by client repos. It belongs to State Hub as a
+**per-machine local "beachhead"** (transparent read cache + write outbox, possibly
+with State-Hub federation), owned by custodian/state-hub. It handles all three
+failure modes: network interruption, central State Hub crash, central machine
+down. This is handed off to state-hub (see the coordination message / proposal);
+**do not build client-side queue/cache logic in activity-core.**
 
-- Report sinks should retry with backoff and **not hard-fail the workflow** when
-the only failure is transient State Hub delivery; preserve the generated report
-(working-memory note + a deferred/outbox state-hub-progress) for later flush.
-- Required State Hub context resolvers should retry/backoff and surface a clear,
-single diagnostic rather than a bare `timed out`.
-- Separately (out of this repo): give railiance01 a State Hub endpoint that does
-not depend on the workstation being awake, or run the triage at a time the
-workstation is reliably up. Owner decision needed.
+activity-core's only responsibilities under this model are thin:
+
+- **Idempotent writes (do now, in-repo):** attach a stable idempotency key
+(e.g. `run_id` + `instruction_id` + `event_type`) to every State Hub write so a
+beachhead flush — possibly replayed after an outage — cannot create duplicate
+`daily_triage`/progress events. The report sink already does a read-based dedup
+check (`_progress_exists`); make the guarantee explicit and not dependent on a
+live read.
+- **Adopt the beachhead endpoint (blocked on state-hub):** keep `STATE_HUB_URL`
+pointed at the local beachhead, and **retire the bespoke
+`actcore-state-hub-bridge` proxy** (the inline `hostNetwork` proxy in
+`k8s/railiance/20-runtime.yaml`) once the state-hub-owned beachhead exists — it
+is a primitive precursor of the beachhead and should not be extended here.
+
+Blocked on the state-hub beachhead capability for the second item; the idempotent
+-writes item can proceed independently.
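The idempotent-writes item in the rescoped task can be illustrated with a minimal Python sketch. It assumes the key is derived from `run_id`, `instruction_id`, and `event_type`, as the task text suggests. Everything else is hypothetical and not taken from activity-core or State Hub: the function name `make_idempotency_key`, the SHA-256 encoding, and the `InMemoryHub` class, which only stands in for a receiver that deduplicates on the key.

```python
import hashlib
import json


def make_idempotency_key(run_id: str, instruction_id: str, event_type: str) -> str:
    """Derive a stable key from the fields that identify one logical write."""
    # JSON-encode the parts as a list so that ("a", "bc") and ("ab", "c")
    # can never collapse into the same input string.
    canonical = json.dumps([run_id, instruction_id, event_type], separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryHub:
    """Hypothetical stand-in for a receiver that deduplicates on the key."""

    def __init__(self) -> None:
        self.events: dict[str, dict] = {}

    def write(self, key: str, payload: dict) -> bool:
        # Return True only when the event is stored for the first time.
        if key in self.events:
            return False
        self.events[key] = payload
        return True


hub = InMemoryHub()
key = make_idempotency_key("run-42", "instr-7", "daily_triage")
first = hub.write(key, {"summary": "triage report"})
# A flush that replays the same write after an outage reuses the same key.
replayed = hub.write(key, {"summary": "triage report"})
```

Because the key depends only on the identifying fields, a replayed write carries the same key as the original, so deduplication does not need a live read at the time of the write. Where the key travels (header, body field, or something else) is left open here, since the source does not specify it.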