Infrastructure Stabilization Pickup Checkpoint
Updated: 2026-06-30
Coordinator workplan: CUST-WP-0051
Purpose
This checkpoint is the restart surface for the infrastructure stabilization metaplan. It consolidates the workplan review, unblock boards, current State Hub registration state, and the next strategic picks.
Use this file first when resuming the lane. Then open the source workplan named in the relevant row and continue from its task state.
Registration State
State Hub active workstreams queried on 2026-06-27:
| Workstream | Current pickup meaning |
|---|---|
| `artifact-store-wp-0007` | Start D7.1/D7.2 assessment and compatibility harness; D7.3 STS vending may route to NetKingdom. |
| `ihub-wp-0022` | Ops-hub evidence intake contract is aligned to live vocabulary; runtime key custody, protected widget lookup, and smoke remain. |
| `cust-wp-0047` | Now-view waits on the ops-hub Inter-Hub evidence lane, not on service inventory collection. |
| `cust-wp-0049` | Bootstrap access helper/runbook is ready; authenticated execution is operator-gated. |
| `cust-wp-0051` | This metaplan is the coordination layer for remaining cross-workplan gates. |
| `activity-wp-0016-llm-output-robustness-trust-boundary` | Repo-side output robustness bundle is prepared; live deploy/smoke proof remains. |
| `three-phoenix-ha-cluster` | HA substrate remains future critical-workload work, not the current State Hub cutover blocker. |
| `rail-ho-wp-0005` | Forgejo production migration is parked behind explicit design, SMTP, backup, runner, and cutover decisions. |
| `net-wp-0020` | OpenBao unseal/token custody remains an operator design and smoke gate. |
| `issue-wp-0003` | issue-core service is healthy; activity-core REST emission wiring remains. |
| `activity-wp-0006` | Calibration waits on the post-WP-0016 live daily-triage smoke and three clean scheduled runs. |
| `cust-wp-0038` | Full State Hub HA migration is deferred until the pragmatic railiance01 path stabilizes. |
| `cust-wp-0025` | FOS bootstrap resumes from identity integration and ops-hub evidence, not the old mega-hub scaffold. |
| `cust-wp-0011` | Active State Hub migration path; next gate is explicit cutover approval. |
Hygiene status:
- `CUST-WP-0045-cutover-runbook` is no longer active; it is a finished runbook record, not an empty active workstream.
- `CUST-WP-0014` is reopened as `backlog`; it is no longer a done workplan with todo task blocks.
- Completed or cancelled tasks no longer carry the stale human-needed flags cleared during this stabilization session.
- `make fix-consistency REPO=the-custodian` still reports pre-existing C-12 orphan-row warnings, but the relevant workplan lifecycle and task states sync.
- `RAIL-BS-WP-0006-staged-promotion-lifecycle` is finished: all seven tasks are done, the workstream is finished in State Hub, and the file frontmatter is `status: finished`.
Blocker Board
No live credential, access, or approval gate is unowned. Do not ask
ops-warden for secret values; use the route catalog, the warden access
assist/proxy surface where the catalog lane allows it, and the owning subsystem.
For credential-related blockers, classify the environment posture and workload
maturity first. Dev/test work can use synthetic contract doubles; production
real-value work needs owner custody, policy gates where applicable, and
non-secret evidence. See docs/ops-warden-secret-posture-review.md.
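
The posture rule above can be read as a small decision function. The following is a minimal sketch only: the environment labels, the `CredentialPlan` type, and `plan_for` are hypothetical names for illustration, and workload maturity is left out because this checkpoint does not enumerate its values. The posture review document remains the authority.

```python
from dataclasses import dataclass

# Hypothetical labels; the checkpoint names the postures only informally.
DEV_TEST = {"dev", "test"}


@dataclass(frozen=True)
class CredentialPlan:
    use_synthetic_double: bool
    requirements: tuple  # human-readable gates that must hold before real values are used


def plan_for(environment: str) -> CredentialPlan:
    """Map an environment posture to the handling described in the checkpoint."""
    env = environment.strip().lower()
    if env in DEV_TEST:
        # Dev/test work can use synthetic contract doubles.
        return CredentialPlan(True, ())
    if env == "production":
        # Production real-value work needs custody, policy gates, and evidence.
        return CredentialPlan(
            False,
            ("owner custody", "policy gates where applicable", "non-secret evidence"),
        )
    # Anything else is unclassified and should be resolved by a human first.
    raise ValueError(f"unclassified environment posture: {environment!r}")
```

Raising on an unknown posture is a deliberate choice in this sketch: an unclassified blocker should stop and be classified, not default to either path.
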
Do not implement ops-warden changes from this Custodian lane. New ops-warden needs should be posted through State Hub as requirements or suggestions for the separate ops-warden worker.
| Gate | Owner/route | Non-secret evidence to collect | Next action |
|---|---|---|---|
| State Hub pragmatic cutover | Custodian operator approval; CUST-WP-0011-T07 | Final dump id/time, row-count comparison, chosen private endpoint, stabilization notes | Approve freeze/final restore and make railiance01 State Hub primary, or leave WSL2 primary explicitly. |
| State Hub fallback retirement | Custodian/operator approval; CUST-WP-0038-T08 | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
| Inter-Hub ops-hub bootstrap | inter-hub-bootstrap-ssh, openbao-api-key, ssh-cert-host-access as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Legacy/fallback only. Prefer Core Hub deployed smoke; run attended Inter-Hub bootstrap only by explicit operator supersede/rollback decision. |
| Ops-hub runtime evidence key | openbao-api-key / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Do not materialize legacy OPS_HUB_KEY until a deployed Core Hub smoke or explicit legacy Inter-Hub smoke is ready to use it. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub daily_triage id, output-valid or partial/quarantine status, working-memory path | Bank the 2026-06-28 / 2026-06-29 / 2026-06-30 clean streak, then have the activity-core owner land/sync the in-flight WP-0016 diagnostics and prove bounded top-N plus graceful-degradation smoke. |
| activity-core to issue-core | route activity-core-issue-sink | actcore-runtime-secret has key, activity-core points to issue-core port 8765, HTTP 201, Gitea issue id | Inject ISSUE_CORE_API_KEY through approved custody, set REST sink env, restart/sync, run safe emission. |
| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
| OpenBao unseal and credential helper | openbao-api-key, railiance-infra-principals, ssh-cert-host-access, key-cape-oidc-login | Policy names, role names, token accessor only, allow/deny smoke | warden-sign lane is verified/banked; broader custody profile and issuer automation remain separate operator-design gates. |
| ops-warden policy gate / warden-sign lane | SECRETS-WP-0004 + FLEX-WP-0007 finished; ops-warden operator posture | decision:032b096c433ad80c, ttl_out_of_bounds, backend vault; no token/role/secret/accessor values | No Custodian action. Keep policy.enabled off until testing/production maturity. |
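
Several rows above ask for evidence such as "runtime key prefix only". As an illustration of what such a record could look like, here is a minimal sketch for the Inter-Hub ops-hub bootstrap row. The function name, field names, and the four-character prefix length are assumptions for this example and are not taken from any existing tool.

```python
def bootstrap_evidence(hub_id, manifest_id, widget_count, runtime_key,
                       smoke_result, prefix_len=4):
    """Build a non-secret evidence record; only a short key prefix is kept."""
    # Refuse to emit a prefix that would be a large share of a short key.
    if len(runtime_key) < 4 * prefix_len:
        raise ValueError("runtime key too short to record a safe prefix")
    return {
        "hub_id": hub_id,
        "manifest_id": manifest_id,
        "widget_count": widget_count,
        "runtime_key_prefix": runtime_key[:prefix_len],  # never the full value
        "smoke_result": smoke_result,
    }
```

The full key is only read inside the function and is never stored in the returned record, so the record can be pasted into a workplan or State Hub note.
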
Daily Automation Evidence
The scheduled daily-triage runner is alive and writing State Hub plus working memory evidence. The current blocker is bounded output-contract adoption and live graceful-degradation proof, not scheduling or sink reachability.
Latest clean scheduled streak:
- 2026-06-28: event `f0d8477e-1db9-4c07-bb8c-d28cbb868abc`, schema-valid daily triage, working memory written.
- 2026-06-29: event `176d2ea7-f0e3-48cd-999b-4ab6055c6a55`, schema-valid daily triage, working memory written.
- 2026-06-30: event `27d695b2-a537-481b-ada6-ca84ec24cd96`, schema-valid daily triage, working memory written.
Latest failed scheduled runs before the clean streak:
- 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed at char 5268, working memory written.
- 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed at char 5246, working memory written.
Bank the three-run calibration streak, but keep the WP-0016 live-proof gate open until the bounded top-N contract and graceful-degradation smoke are proven. The activity-core worktree currently has in-flight uncommitted ACTIVITY-WP-0016 and ACTIVITY-WP-0018/0019 changes, so Custodian should wait for that owner to commit/sync or explicitly hand off before treating those files as source truth. Use the activity-core repo-native automation status surface once it lands; do not use assistant-provided scheduling as operational evidence.
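
The streak check itself is simple enough to sketch. The example below uses the five run dates and outcomes listed above; the function name and the tuple layout are assumptions for illustration, and the sketch does not check that the dates are consecutive calendar days.

```python
from datetime import date

# (run date, schema-valid?) pairs copied from the two lists above.
RUNS = [
    (date(2026, 6, 26), False),
    (date(2026, 6, 27), False),
    (date(2026, 6, 28), True),
    (date(2026, 6, 29), True),
    (date(2026, 6, 30), True),
]

REQUIRED_CLEAN_RUNS = 3  # "three clean scheduled runs"


def trailing_clean_streak(runs):
    """Count consecutive clean runs ending at the most recent run."""
    streak = 0
    for _, clean in sorted(runs, key=lambda r: r[0], reverse=True):
        if not clean:
            break
        streak += 1
    return streak


calibration_streak_met = trailing_clean_streak(RUNS) >= REQUIRED_CLEAN_RUNS
print(trailing_clean_streak(RUNS), calibration_streak_met)  # prints "3 True"
```

A met streak only covers the calibration precondition; the WP-0016 live-proof gate stays open until the bounded top-N and graceful-degradation smokes are recorded.
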
Production Service Summary
| Surface | Stable fact | Remaining gate |
|---|---|---|
| State Hub | Pragmatic railiance01 path has image, manifests, empty deploy, migrations, restored WSL2 data, row-count comparison, and healthy API through CUST-WP-0011-T06. | CUST-WP-0011-T07 cutover approval, then stabilization; HA path stays deferred. |
| Inter-Hub / Core Hub | Public https://hub.coulomb.social/api/v2/hubs exposes ops-hub; CORE-WP-0008 finished the Core Hub API smoke harness, activity-core sink, staging profile, CLI wrappers, UI backlog, and Custodian handoff. | Run deployed Core Hub smoke, staging import, activity-core sink smoke, and readiness summary; keep Haskell Inter-Hub only for migration/rollback proof. |
| ops-hub evidence | CUST-WP-0025-T14 is done with the Core Hub ops evidence contract spec. CUST-WP-0025-T13 through T19 now use Core Hub API/CLI/UI gates; CUST-WP-0047 and CUST-WP-0049 remain legacy/fallback records. | Execute CUST-WP-0025-T16 and T17 (T18 is already complete); close legacy Inter-Hub waits only through deployed Core Hub evidence or explicit supersede decision. |
| issue-core | ArgoCD service is healthy on port 8765; image 0.2.1; ExternalSecret Ready; authenticated smoke created Gitea issue 175. | activity-core still needs ISSUE_CORE_API_KEY, URL port 8765, ISSUE_SINK_TYPE=rest, and a safe emission smoke. |
| Forgejo | Migration inventory/design lane is active but pre-cutover. | Production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, cutover approval. |
| artifact-store | D7.1 is done; D7.2 has an opt-in live MinIO compatibility harness and manual smoke docs. No live secret handoff is recorded. | Run D7.2 against an approved MinIO-compatible endpoint, then route D7.3 STS vending through identity/platform custody before changing credential behavior. |
| secrets-engine | SECRETS-WP-0004 is finished: the scoped warden-sign lane supported the vault-backed policy-gate smoke without exposing token material. SECRETS-WP-0003 remains active for the real whynot-design npm publish pilot. | Finish or park SECRETS-WP-0003 behind Gitea bot/package-token provisioning, OpenBao custody, ops-warden route confirmation, and real package publish evidence. |
| FOS hub | Old NK-WP-0001 Keycloak prerequisite is cancelled; NK-WP-0002 local identity, IAM Profile v0.2, the Core Hub FastAPI IAM Profile integration test, and Core Hub operator UI first screens are done; hub-core extraction/dev-hub work is done; CUST-WP-0025 Phase 3 has been rewritten for Core Hub. | Execute the remaining Core Hub deployed evidence and cutover gates: CUST-WP-0025-T16 and T17. |
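
The issue-core row lists three settings activity-core still needs before a safe emission smoke. A preflight check along the following lines could confirm them without printing the key. This is a sketch under assumptions: `ISSUE_CORE_API_KEY` and `ISSUE_SINK_TYPE` come from the table, but `ISSUE_CORE_URL` is a hypothetical variable name, since the checkpoint does not say which variable carries the URL, and `issue_sink_problems` is not an existing helper.

```python
import os
from urllib.parse import urlsplit


def issue_sink_problems(env=None, url_var="ISSUE_CORE_URL"):
    """Return a list of configuration problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # Only check presence; never log or return the key itself.
    if not env.get("ISSUE_CORE_API_KEY"):
        problems.append("ISSUE_CORE_API_KEY is not set")
    if env.get("ISSUE_SINK_TYPE") != "rest":
        problems.append("ISSUE_SINK_TYPE is not 'rest'")
    try:
        port = urlsplit(env.get(url_var, "")).port
    except ValueError:  # malformed or out-of-range port in the URL
        port = None
    if port != 8765:
        problems.append(f"{url_var} does not point at port 8765")
    return problems
```

Called with no arguments it reads the process environment; passing a plain dict makes it easy to exercise in a dry run before touching the deployed runtime.
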
Next-Pick List
- Execute the remaining rewritten `CUST-WP-0025` Core Hub gates: deployed smoke and activity-core proof (T16) and cutover decision coupling (T17). T03, T14, and T18 are complete as the identity integration template, ops evidence/read-model contract, and operator UI first-screen gates.
- Keep `CUST-WP-0047` and `CUST-WP-0049` as legacy evidence/fallback until Core Hub deployed smoke evidence or an explicit supersede decision closes them.
- Bank the 2026-06-28 / 2026-06-29 / 2026-06-30 clean daily-triage streak for calibration, then have the activity-core owner land/sync the in-flight WP-0016 diagnostics/status work and prove the bounded top-N plus graceful-degradation smoke.
- Complete the issue-core handoff by wiring activity-core to port `8765` with `ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
- Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or record that WSL2 remains primary for the next operating period.
- Run artifact-store D7.2 live MinIO-compatible evidence; Forgejo and storage work can now inherit the finished staged-promotion gates.
- Keep `SECRETS-WP-0003` parked until Gitea bot/package-token provisioning, OpenBao custody, route confirmation, and a coordinated whynot-design version bump are available.
- Keep Forgejo cutover and State Hub HA work parked until their human decision and drill gates are satisfied.
Resume Commands
cd /home/worsch/the-custodian
sed -n '1,260p' workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
sed -n '1,260p' docs/infrastructure-stabilization-pickup-checkpoint.md
sed -n '1,260p' docs/credential-custody-unblock-board.md
After workplan edits, sync from State Hub:
cd /home/worsch/state-hub
make fix-consistency REPO=the-custodian