docs(state-hub): STATE-WP-0063 T03 done — tunnel cleanup restored activity-core

Document stale remote sshd forward on Railiance01 :18000 as root cause of
reconnect loop; T03 verified after bridge maintenance cleanup and manual
canaries for hourly RecentlyOnScope and daily WSJF triage.
2026-06-21 19:47:56 +02:00
parent dff8cfe128
commit 323599f2fc


@@ -115,12 +115,15 @@ Result 2026-06-21:
**workstation availability + ops-bridge tunnel reachability** to the local
State Hub API. Cluster-side `actcore-state-hub-bridge` proxies to node-local
`127.0.0.1:18000` as designed.
- **Retry fix (2026-06-21):** stale orphan `sshd` remote forward on Railiance01
port 18000 blocked new tunnel binds (`ExitOnForwardFailure` → exit 255).
`bridge maintenance cleanup state-hub-railiance01 --restart` cleared it.
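A leftover listener of this kind can be spotted from the node before restarting the tunnel. The following is a minimal sketch, not part of the `bridge` CLI: `port_in_use` is a hypothetical helper, and it assumes that a successful TCP connect to `127.0.0.1:18000` means some process (for example an orphaned `sshd`) still holds the bind.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the connect attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the node, `port_in_use("127.0.0.1", 18000)` returning `True` while the tunnel is reported down would point to a stale forward. The probe does not identify the owning process.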
## T3 — Restore daily WSJF triage evidence
```task
id: STATE-WP-0063-T03
status: progress
status: done
priority: medium
state_hub_task_id: "4b68b207-a80e-4c71-a246-10035ef69625"
```
@@ -134,12 +137,17 @@ Railiance01 and confirm:
If the schedule was paused, unpause and verify the next 07:20 Europe/Berlin
fire is armed.
Progress 2026-06-21: Manual trigger via actcore-api returned workflow
`activity-6fca51fa…:manual-ca469cb5…`. Schedule is armed (next 07:20 Europe/Berlin).
Report sink `state-hub-progress` is still failing with `[Errno 111] Connection
refused` while `state-hub-railiance01` tunnel is reconnecting. Re-verify after
tunnel stabilises (`bridge check state-hub-railiance01` + new `daily_triage`
progress event).
Progress 2026-06-21 (first attempt): sink failed while tunnel reconnecting.
Result 2026-06-21 (retry): `bridge maintenance cleanup state-hub-railiance01
--restart` cleared the stale remote `sshd` listener on port 18000 (an orphaned
forward with many connections in CLOSE_WAIT). Tunnel is now **connected**; node
`curl 127.0.0.1:18000/state/health` and worker
`actcore-state-hub-bridge:8000/state/health` both return ok. Manual canaries
succeeded:
- hourly RecentlyOnScope → `recently_on_scope_hourly` at 2026-06-21T17:45:29Z
- daily WSJF triage → `daily_triage` at 2026-06-21T17:45:46Z + working-memory
report under `the-custodian/memory/working/`
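The two health probes above could be scripted for re-verification. This is a rough sketch under stated assumptions: `health_ok` and `check_all` are hypothetical helpers, the endpoints are assumed to speak plain HTTP, and "ok" is assumed to mean an HTTP 200 response (the source does not specify the response body).

```python
import urllib.error
import urllib.request


def health_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on url answers with HTTP 200 (assumed 'ok')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, DNS failure, timeouts and non-2xx codes.
        return False


def check_all(urls: list[str]) -> dict[str, bool]:
    """Probe each URL once and map it to its health result."""
    return {url: health_ok(url) for url in urls}
```

For example, `check_all(["http://127.0.0.1:18000/state/health"])` run on the node would mirror the `curl` check; the worker-side URL `http://actcore-state-hub-bridge:8000/state/health` only resolves from inside the cluster.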
## T4 — Repair ancillary local crons