diff --git a/workplans/STATE-WP-0063-weekend-automation-repair.md b/workplans/STATE-WP-0063-weekend-automation-repair.md
index 3aae56f..43e4ed6 100644
--- a/workplans/STATE-WP-0063-weekend-automation-repair.md
+++ b/workplans/STATE-WP-0063-weekend-automation-repair.md
@@ -115,12 +115,15 @@ Result 2026-06-21: **workstation availability + ops-bridge tunnel reachability**
 to the local State Hub API. Cluster-side `actcore-state-hub-bridge` proxies to
 node-local `127.0.0.1:18000` as designed.
 
+- **Retry fix (2026-06-21):** stale orphan `sshd` remote forward on Railiance01
+  port 18000 blocked new tunnel binds (`ExitOnForwardFailure` → exit 255).
+  `bridge maintenance cleanup state-hub-railiance01 --restart` cleared it.
 
 ## T3 — Restore daily WSJF triage evidence
 
 ```task
 id: STATE-WP-0063-T03
-status: progress
+status: done
 priority: medium
 state_hub_task_id: "4b68b207-a80e-4c71-a246-10035ef69625"
 ```
@@ -134,12 +137,17 @@ Railiance01 and confirm:
 If the schedule was paused, unpause and verify the next 07:20 Europe/Berlin fire
 is armed.
 
-Progress 2026-06-21: Manual trigger via actcore-api returned workflow
-`activity-6fca51fa…:manual-ca469cb5…`. Schedule is armed (next 07:20 Europe/Berlin).
-Report sink `state-hub-progress` is still failing with `[Errno 111] Connection
-refused` while `state-hub-railiance01` tunnel is reconnecting. Re-verify after
-tunnel stabilises (`bridge check state-hub-railiance01` + new `daily_triage`
-progress event).
+Progress 2026-06-21 (first attempt): sink failed while tunnel reconnecting.
+
+Result 2026-06-21 (retry): `bridge maintenance cleanup state-hub-railiance01
+--restart` cleared stale remote `sshd` on port 18000 (orphan forward with many
+CLOSE_WAIT). Tunnel now **connected**; node `curl 127.0.0.1:18000/state/health`
+and worker `actcore-state-hub-bridge:8000/state/health` both return ok. Manual
+canaries succeeded:
+
+- hourly RecentlyOnScope → `recently_on_scope_hourly` at 2026-06-21T17:45:29Z
+- daily WSJF triage → `daily_triage` at 2026-06-21T17:45:46Z + working-memory
+  report under `the-custodian/memory/working/`
 
 ## T4 — Repair ancillary local crons