| id | type | title | domain | repo | status | owner | topic_slug | planning_priority | planning_order | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CUST-WP-0051 | workplan | Infrastructure Stabilization Metaplan | infotech | the-custodian | active | codex | custodian | high | 51 | 2026-06-27 | 2026-06-27 | 21cabc98-3f80-4d00-b3b7-06e2ac2af88f |
CUST-WP-0051 - Infrastructure Stabilization Metaplan
Goal
Drive the registered infrastructure workplans from a scattered blocked state to a stable checkpoint where:
- active blockers have a named owner, route, and next command or decision;
- production credential work uses approved custody paths only;
- daily operational automation has one healthy runner and clean evidence;
- State Hub registration reflects the real file state;
- unfinished strategic work is sequenced into clear follow-on lanes.
This workplan does not replace the child workplans. It is the coordination lane for removing cross-workplan blocks and creating a reliable handoff point.
Review Snapshot
Reviewed on 2026-06-27 from State Hub and the repo workplan files.
Active registered workstreams with open work:
| Workstream | Open state | Main stabilization meaning |
|---|---|---|
| artifact-store-wp-0007 | 5 todo | Object-store compatibility and STS credential vending lane. |
| ihub-wp-0022 | 3 wait, 5 done | Ops-hub evidence intake waits on widget seed/runtime key/smoke. |
| cust-wp-0047 | 1 wait, 6 done | Ops-hub now view waits on Inter-Hub widget activation. |
| cust-wp-0049 | 1 wait, 5 done | Access lane is ready; live bootstrap needs approved admin execution. |
| activity-wp-0016 | 1 wait, 2 progress, 5 todo, 2 done | Daily-triage output robustness needs live deploy/smoke evidence. |
| three-phoenix-ha-cluster | 7 todo | Target HA substrate is planned but not executed. |
| staged-promotion-lifecycle | finished, 7 done | Promotion discipline ready for broad production cutovers. |
| rail-ho-wp-0005 | 11 todo, 1 progress | Forgejo production migration needs human design and cutover decisions. |
| cust-wp-0045-cutover-runbook | 0 tasks | Registered runbook is appearing as an active no-task workstream. |
| net-wp-0020 | 2 wait, 1 todo, 2 done | OpenBao unseal custody models still need operator profile decisions. |
| issue-wp-0003 | 2 progress, 5 done | issue-core deploy is close; finish live wiring and runbook evidence. |
| activity-wp-0006 | 1 wait, 1 todo, 6 done | Three-run calibration waits on the daily-triage live gate. |
| cust-wp-0038 | 8 todo | Full ThreePhoenix State Hub HA migration remains strategic follow-on. |
| cust-wp-0025 | 17 todo, 9 done | FOS hub bootstrap now depends on identity, ops-hub, and fin-hub lanes. |
| cust-wp-0011 | 3 todo, 6 done | Pragmatic State Hub railiance01 migration still needs cutover/stabilize/retire. |
Additional repo-local hygiene issue:
- CUST-WP-0014 has frontmatter status: done but all six task blocks are still todo. Either archive it as superseded or reopen it as a focused State Hub sync-health workplan.
State Hub hygiene issue:
- There are stale needs_human flags on completed or cancelled tasks. These do not all block execution, but they make the operator view noisier and should be cleared or annotated after the source workplans are reconciled.
Dependency Shape
The critical path is:
- Credential and operator-access custody: OpenBao, Inter-Hub operator key, ops-hub runtime key, Forgejo SMTP/cutover approvals, and OpenBao unseal profile decisions.
- Ops evidence and daily automation: Inter-Hub ops-hub records, activity-core daily-triage robustness deployment, schema-valid smoke, then three clean scheduled runs.
- Production substrate and source forge: issue-core GitOps pilot, Forgejo production migration, artifact-store STS, staged promotion, and State Hub migration strategy.
- Federation buildout: identity completion, Core Hub replacement evidence, ops-hub scaffold reset, fin-hub scaffold, and business/runway canon.
Task: Normalize Registry And Workplan Hygiene
id: CUST-WP-0051-T01
status: done
priority: high
state_hub_task_id: "7e83bd50-5ca2-4341-9d18-65512e3f0442"
Clean up the planning substrate before execution work resumes.
Minimum scope:
- Decide whether CUST-WP-0045-cutover-runbook should stay registered as an active workstream or be represented only as a runbook under CUST-WP-0045.
- Resolve CUST-WP-0014: archive as superseded, or reopen and re-scope the six remaining State Hub sync-health tasks.
- Clear or annotate stale needs_human flags on done/cancel tasks after source workplans confirm they are no longer live gates.
- Run State Hub consistency after file changes.
Done when the active workstream list no longer contains no-task runbooks or contradictory done-with-todo files, and the human-needed view shows only live human gates.
Progress 2026-06-27:
- CUST-WP-0045-cutover-runbook now has status: finished; State Hub no longer lists it as an active workstream.
- CUST-WP-0014 is reopened as backlog with its task detail preserved, so it is no longer a contradictory done-with-todo file or an active queue item.
- make fix-consistency REPO=the-custodian passed with pre-existing C-12 warnings and synced the lifecycle changes into State Hub.
Completed 2026-06-27: cleared 15 stale needs_human flags from tasks that
were already done or cancel, leaving live todo/progress/wait human
gates untouched. T01 is complete.
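The two contradictions this task removes (a closed workplan with open tasks, and an active workstream with no tasks) can be expressed as a small check. This is a minimal sketch, not the State Hub consistency tooling: the `Workplan` class, the status sets, and `hygiene_issues` are hypothetical names, and the status vocabulary is assumed from the values used in this workplan.

```python
from dataclasses import dataclass, field

# Assumed vocabularies, taken from the status values that appear in this workplan.
CLOSED_WORKPLAN_STATUSES = {"done", "finished"}
OPEN_TASK_STATUSES = {"todo", "progress", "wait"}


@dataclass
class Workplan:
    """Hypothetical in-memory view of one workplan file."""
    workplan_id: str
    status: str
    task_statuses: list[str] = field(default_factory=list)


def hygiene_issues(plan: Workplan) -> list[str]:
    """Return the hygiene contradictions found in a single workplan."""
    issues = []
    open_tasks = [s for s in plan.task_statuses if s in OPEN_TASK_STATUSES]
    if plan.status in CLOSED_WORKPLAN_STATUSES and open_tasks:
        issues.append("closed workplan still has open tasks")
    if plan.status == "active" and not plan.task_statuses:
        issues.append("active workplan has no tasks")
    return issues


# The CUST-WP-0014 state described in the review snapshot: done, six todo tasks.
print(hygiene_issues(Workplan("CUST-WP-0014", "done", ["todo"] * 6)))
```

Under these assumptions, a backlog workplan with todo tasks reports no issues, which matches the resolution chosen for CUST-WP-0014.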
Task: Establish One Credential-Custody Unblock Board
id: CUST-WP-0051-T02
status: done
priority: high
state_hub_task_id: "312bde29-4370-4352-b5a3-00a8c4fe2059"
Collect the live operator-access decisions in one non-secret board.
Inputs:
- CUST-WP-0049-T06: Inter-Hub admin access or deployment-side bootstrap path.
- IHUB-WP-0022-T04: ops-hub runtime OPS_HUB_KEY custody.
- NET-WP-0020: OpenBao unseal custody and SSH automation profile.
- RAIL-HO-WP-0005: Forgejo hostname, SMTP, runner, backup, cutover, rollback, and retirement decisions.
Rules:
- Do not put secrets in Git, State Hub, workplans, or chat.
- Use warden route find / warden route show before requesting credentials.
- Treat ops-warden as SSH certificate authority only, not as a secret store.
Done when each human/operator gate has an owner, approved route, expected execution host, non-secret evidence target, and fallback decision.
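The completion rule above amounts to a required-field check per gate. The sketch below is illustrative only: the `Gate` record and `missing_fields` helper are hypothetical and do not describe the format of the unblock board, and every value stored in such a record is assumed to be a non-secret reference.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Gate:
    """Hypothetical non-secret record for one human/operator gate."""
    gate_id: str
    owner: Optional[str] = None
    approved_route: Optional[str] = None
    execution_host: Optional[str] = None
    evidence_target: Optional[str] = None
    fallback_decision: Optional[str] = None


def missing_fields(gate: Gate) -> list[str]:
    """List the required fields that are still empty for this gate."""
    return [
        f.name
        for f in fields(gate)
        if f.name != "gate_id" and not getattr(gate, f.name)
    ]


# A gate with only an owner recorded is not yet board-complete.
gate = Gate("CUST-WP-0049-T06", owner="operator")
print(missing_fields(gate))
```

A board built this way would count as done only when `missing_fields` returns an empty list for every gate.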
Completed 2026-06-27: added docs/credential-custody-unblock-board.md with
route records, live gate owners, expected execution hosts, non-secret evidence
targets, fallback decisions, and pickup order. Route lookup was verified through
/home/worsch/ops-warden using uv run warden route show ... --json because
the globally installed warden lacks the route subcommand.
Refined 2026-06-27: added docs/ops-warden-secret-posture-review.md and updated
the unblock board/checkpoint to consume ops-warden's warden access assist
boundary plus WARDEN-WP-0015 environment-posture/workload-maturity triage. This
turns vague IT-security blockers into dev/test doubles, owner-routed production
custody gates, or real maturity/posture violations.
Task: Close The Ops-Hub Inter-Hub Evidence Lane
id: CUST-WP-0051-T03
status: progress
priority: high
state_hub_task_id: "d6c3a39e-629d-47e4-b589-9e1a0273d9fa"
Finish the linked ops-hub activation chain:
- Execute CUST-WP-0049-T06 using the approved access route.
- Close CUST-WP-0047-T05 by proving ops-hub widgets exist and accept evidence events.
- Unblock IHUB-WP-0022 by provisioning the runtime key through the approved secret path and running the end-to-end evidence submission smoke.
Done when ops inventory probes and activity-core evidence can land in Inter-Hub without manual SQL or secret exposure.
Progress 2026-06-27:
- Added docs/ops-hub-interhub-evidence-lane-status.md with non-secret public probe evidence. Production Inter-Hub has an ops-hub row and the ops-hub seed vocabulary is visible on public registry endpoints.
- Protected widget, manifest, and hub-registry surfaces correctly require authentication; no runtime-key smoke was attempted.
- New blocker surfaced: the older IHUB-WP-0022 activity-core mapping contract names event types, policy scope, aggregate widget refs, and widget types that do not match the live ops-hub seed vocabulary. Align that contract before an attended bootstrap/runtime-key smoke, or the operator key may still hit manifest/schema failures.
Progress 2026-06-27 contract alignment:
- Updated /home/worsch/inter-hub contract docs for IHUB-WP-0022 to target the live ops-hub seed vocabulary. Old ops-service-observed and ops-inventory-drift names are transition aliases, ops-access-path-checked is deferred to fallback until supported, and payload examples now post only live manifest event types.
- Ran make fix-consistency REPO=inter-hub; it passed with pre-existing C-12 warnings and synced the IHUB-WP-0022 description drift into State Hub.
- Remaining T03 gate is authenticated widget lookup, any missing backup/risk seed widget, runtime key custody, and protected submission smoke.
Progress 2026-06-27 Core Hub pivot:
- Created CUST-WP-0052 to drive the reframe from old Inter-Hub production bootstrap toward Core Hub-owned replacement implementation.
- Treat remaining Inter-Hub evidence as legacy compatibility or fallback evidence. Do not spend new design work on Haskell Inter-Hub unless it is needed for migration proof or rollback.
- Next implementation lane should be Core Hub API first, CLI second, and web UI third, with whynot-design used for the rebuilt UI where practical.
Task: Stabilize Daily-Triage Automation
id: CUST-WP-0051-T04
status: progress
priority: high
state_hub_task_id: "42810d3b-5557-4efd-871b-65bef7c19e0e"
Finish the activity-core daily-triage reliability lane.
Sequence:
- Deploy the activity-wp-0016 robustness bundle: bounded prompt/schema, per-item parsing, quarantine lane, and producer guardrails.
- Run a schema-valid live daily-triage smoke on railiance01.
- Collect three clean scheduled runs with matching activity-core, State Hub, and working-memory evidence.
- Close activity-wp-0006 calibration and decide the fate of the CUST-WP-0045 cutover runbook registration.
Done when there is exactly one trusted daily triage runner and the fallback state is documented.
Progress 2026-06-27:
- Added docs/daily-triage-stabilization-status.md with the current evidence chain. The 2026-06-24 and 2026-06-25 scheduled runs were schema-valid; the 2026-06-26 and 2026-06-27 runs reached State Hub and working memory but failed output validation around char 5.2k.
- Current primary blocker is no longer a silent schedule or State Hub sink outage. The live runner still needs the ACTIVITY-WP-0016 code/schema bundle and Railiance runtime prompt changes so malformed tails degrade to quarantined partial output.
- Pickup sequence: deploy WP-0016 code/schema together, update the runtime prompt bundle for bounded top-N/per-item framing/token headroom, run a live railiance01 smoke, then restart the three-clean-run gate.
- Normalized ACTIVITY-WP-0016 source task status in activity-core: T04 is done and T05 is progress, matching its own progress notes.
- Updated activity-core daily-triage source notes: ACTIVITY-WP-0010-T02 is now done, T03/T04 point at the post-WP-0016 live smoke and three-run gate, and ACTIVITY-WP-0006-T03 records the 2026-06-27 validation failure.
- Cleared the stale human-needed flag from the completed bridge/config task and moved live intervention notes onto the deploy/smoke/calibration gate.
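The three-clean-run gate can be read as a streak counter that resets on any failed run. The sketch below assumes that reading; `ScheduledRun`, `clean_streak`, and `gate_passed` are hypothetical names, and the "matching activity-core, State Hub, and working-memory evidence" condition is collapsed into a single `schema_valid` flag for brevity. The sample data restates the four runs recorded in the progress notes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScheduledRun:
    """Hypothetical summary of one scheduled daily-triage run."""
    run_date: date
    schema_valid: bool


def clean_streak(runs: list[ScheduledRun]) -> int:
    """Count consecutive schema-valid runs, ending at the most recent run."""
    streak = 0
    for run in sorted(runs, key=lambda r: r.run_date, reverse=True):
        if not run.schema_valid:
            break
        streak += 1
    return streak


def gate_passed(runs: list[ScheduledRun], required: int = 3) -> bool:
    """The gate passes only when the latest `required` runs are all clean."""
    return clean_streak(runs) >= required


# Runs as recorded above: two valid runs followed by two validation failures.
runs = [
    ScheduledRun(date(2026, 6, 24), True),
    ScheduledRun(date(2026, 6, 25), True),
    ScheduledRun(date(2026, 6, 26), False),
    ScheduledRun(date(2026, 6, 27), False),
]
print(clean_streak(runs), gate_passed(runs))
```

Counting from the newest run backwards is what makes the gate restart after a failure: the two earlier valid runs do not carry over.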
Task: Finish Near-Term Production Service Lanes
id: CUST-WP-0051-T05
status: progress
priority: medium
state_hub_task_id: "2083f0e4-e037-48bf-8069-f31e8db2fd95"
Move near-complete service workstreams to done before starting larger migrations.
Priority order:
- issue-wp-0003: finish activity-core wiring and end-to-end GitOps runbook.
- rail-ho-wp-0005: resolve Forgejo production decisions, email recovery, and cutover approval gates.
- artifact-store-wp-0007: complete MinIO compatibility and STS credential vending assessment if it is required by backup, registry, or app lanes.
- staged-promotion-lifecycle: make production promotion gates explicit before further cluster/source-forge cutovers.
Done when each lane is either finished or parked with a precise dependency and no ambiguous human-needed state.
Progress 2026-06-27:
- Added docs/near-term-production-service-lanes-status.md with a lane board for issue-core, Forgejo, artifact-store, and staged promotion.
- issue-core is the immediate near-done lane: the service itself is healthy, but activity-core still points at port 8010 and ISSUE_SINK_TYPE=null. Do not flip it to REST until ISSUE_CORE_API_KEY is injected into activity-core's runtime secret via route activity-core-issue-sink.
- Forgejo remains parked behind explicit production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, and cutover approval.
- artifact-store and staged promotion are executable planning/build lanes: artifact-store D7.1/D7.2 remains open; staged-promotion T02 is now complete before broad production source-forge migration work.
Progress 2026-06-27 staged promotion:
- Completed RAIL-BS-WP-0006-T02 in /home/worsch/railiance-cluster. Added docs/app-toml-contract.md, schemas/railiance-app.schema.json, and examples/railiance/app.toml, defining the repository-local railiance/app.toml declaration for identity, ownership, source/artifact policy, platform dependencies, secret references without plaintext values, observability, stage commands/checks/evidence, canary/promotion modes, rollback, and human approval gates.
- make fix-consistency REPO=railiance-cluster passed with pre-existing C-12 warnings and synced the T02 status into State Hub.
- T02 through T07 are complete; the staged-promotion lifecycle is finished.
Progress 2026-06-27 staged promotion T03:
- Completed RAIL-BS-WP-0006-T03 in /home/worsch/railiance-cluster. Added docs/overlay-repo-pattern.md, tools/create_railiance_overlay_repo.sh, and the bin/railiance create-overlay dispatcher entry. The scaffold writes a separate overlay repo with railiance/upstream.toml, schema-valid railiance/app.toml, stage values, a thin Helm chart, Stage 1 test script, rollback runbook, and promotion notes without cloning upstream code or handling secrets.
- Verified the generated Forgejo overlay sample against schemas/railiance-app.schema.json; generated Stage 1 script ran with Helm skipped because Helm is unavailable in this environment.
- make fix-consistency REPO=railiance-cluster passed with pre-existing C-12 warnings and synced the T03 status into State Hub.
Progress 2026-06-27 staged promotion T04:
- Completed RAIL-BS-WP-0006-T04 in /home/worsch/railiance-cluster. Added tools/cmd/railiance-run, the bin/railiance run dispatcher entry, and docs/railiance-run-command.md. The command reads railiance/app.toml, runs Stage 1 commands and local checks, and emits a railiance.run-result.v1 JSON result with command references and scrubbed HTTP URLs rather than command logs, stdout/stderr, or secret-bearing URL details.
- Updated generated overlays so a Forgejo overlay completes Stage 1 locally: stage1-script is required, local-health is optional when no local service is running, and Helm rendering remains optional when Helm is unavailable.
- Verified a fresh generated Forgejo overlay against schemas/railiance-app.schema.json and bin/railiance run; the smoke passed with one command, two checks, and zero required failures.
- make fix-consistency REPO=railiance-cluster passed with pre-existing C-12 warnings and synced the T04 status into State Hub.
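One plausible reading of "scrubbed HTTP URLs" is that credentials, query strings, and fragments are dropped before a URL is written into the result. The sketch below illustrates that reading only; it is not the railiance-run implementation, `scrub_url` is a hypothetical helper, and the actual scrubbing rules are not stated in this workplan.

```python
from urllib.parse import urlsplit, urlunsplit


def scrub_url(url: str) -> str:
    """Keep scheme, host, port, and path; drop userinfo, query, and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # urlsplit strips the brackets from IPv6 literals; restore them.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


# Example with placeholder credentials and token; example.test is a reserved name.
print(scrub_url("https://user:pw@example.test:8443/health?token=abc#frag"))
```

Rebuilding the URL from its safe components, rather than deleting known secret patterns, means any unexpected secret-bearing part is excluded by default.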
Progress 2026-06-27 staged promotion T05:
- Completed RAIL-BS-WP-0006-T05 in /home/worsch/railiance-cluster. Generated overlays now include a Stage 2 canary Helm template with stable/canary release identities, isolated ingress by default, optional Traefik weighted routing, Prometheus annotations, HTTP probes, conservative resource limits, rollback-safe Stage 2/Stage 3 values, and tests/stage2-template.sh.
- Verified a fresh generated Forgejo overlay with schema validation, tests/stage1.sh, tests/stage2-template.sh, and bin/railiance run. Helm rendering was skipped because Helm is unavailable in this environment.
- make fix-consistency REPO=railiance-cluster passed with pre-existing C-12 warnings and synced the T05 status into State Hub.
Progress 2026-06-27 staged promotion T06:
- Completed RAIL-BS-WP-0006-T06 in /home/worsch/railiance-cluster. Added tools/cmd/railiance-stage2 plus bin/railiance deploy and bin/railiance observe dispatch. Both commands default to non-mutating plans; apply/live modes fail closed on missing prerequisites.
- Verified a fresh generated Forgejo overlay with schema validation, tests/stage1.sh, tests/stage2-template.sh, Stage 2 deploy plan, Stage 2 observe plan, and blocked apply without approval/Helm.
- make fix-consistency REPO=railiance-cluster passed with pre-existing C-12 warnings and synced the T06 status into State Hub.
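The "plan by default, fail closed on apply" behaviour can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the railiance-stage2 code: `Prerequisites`, `PrerequisiteError`, and `stage2_deploy` are hypothetical names, and the two prerequisites shown (approval and Helm) are the ones the verification notes mention.

```python
from dataclasses import dataclass


@dataclass
class Prerequisites:
    """Hypothetical prerequisite flags; both default to 'not satisfied'."""
    approval_recorded: bool = False
    helm_available: bool = False


class PrerequisiteError(RuntimeError):
    """Raised when apply mode is requested without all prerequisites."""


def stage2_deploy(prereqs: Prerequisites, apply: bool = False) -> str:
    """Return 'plan' unless apply is requested and every prerequisite holds."""
    if not apply:
        return "plan"  # non-mutating default
    missing = [name for name, ok in vars(prereqs).items() if not ok]
    if missing:
        # Fail closed: refuse to mutate anything rather than proceed partially.
        raise PrerequisiteError("apply blocked, missing: " + ", ".join(missing))
    return "apply"


print(stage2_deploy(Prerequisites()))  # plan
```

Defaulting every prerequisite to `False` means a caller that forgets to supply evidence gets a blocked apply instead of an accidental deployment.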
Progress 2026-06-27 staged promotion T07 and finish:
- Completed RAIL-BS-WP-0006-T07 in /home/worsch/railiance-cluster. Added tools/cmd/railiance-stage3, bin/railiance promote, bin/railiance rollback, and docs/promote-rollback-onboarding.md. Generated overlays now declare promote/rollback plan commands.
- Verified a fresh generated Forgejo overlay through Stage 1 run, Stage 2 deploy/observe plans, Stage 3 promote/rollback plans, and blocked apply paths for missing approval/Helm/revision evidence.
- Marked RAIL-BS-WP-0006 status: finished; make fix-consistency REPO=railiance-cluster synced the finished workstream with only pre-existing C-12 orphan-row warnings.
Task: Decide State Hub Migration Strategy
id: CUST-WP-0051-T06
status: progress
priority: high
state_hub_task_id: "0ac3763f-eac0-4773-9be8-cb0a7979e444"
Choose and execute the State Hub stabilization path.
Decision:
- If the pragmatic railiance01 service is enough for the next operating period, finish CUST-WP-0011: cut over the MCP config, observe the stabilization window, then retire or retain the WSL2 fallback by explicit decision.
- If HA is now required, promote CUST-WP-0038 and the ThreePhoenix HA cluster lane: readiness, storage/database strategy, HA API behavior, failover drill, restore drill, and endpoint/runbook update.
Done when the active State Hub path is singular, tested, and documented, and the alternate path is either cancelled, deferred, or explicitly retained as a future workplan.
Progress 2026-06-27:
- Added docs/state-hub-migration-strategy-status.md and selected the pragmatic CUST-WP-0011 railiance01 path as the singular active State Hub stabilization lane.
- CUST-WP-0011 is already through T01-T06: image pushed, cluster manifests defined, empty deploy healthy, migrations run, WSL2 data restored, row counts compared, and cluster API health/summary verified.
- Next gate is CUST-WP-0011-T07: explicit approval to freeze WSL2 writes, restore the final dump, compare again, and redirect MCP/private access to the cluster endpoint.
- CUST-WP-0038 and RAIL-BS-WP-0007 remain deferred HA lanes until the pragmatic path stabilizes and ThreePhoenix storage/database strategy is current.
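The "compare again" step after the final restore is a per-table row-count comparison between the frozen WSL2 source and the cluster target. A minimal sketch, assuming the counts have already been collected into dictionaries; `row_count_diff` and the table names in the example are hypothetical and not taken from the State Hub schema.

```python
from typing import Optional


def row_count_diff(
    source: dict[str, int], target: dict[str, int]
) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Return {table: (source_count, target_count)} for every mismatch.

    A table present on only one side is reported with None for the other side.
    """
    diffs = {}
    for table in sorted(set(source) | set(target)):
        s, t = source.get(table), target.get(table)
        if s != t:
            diffs[table] = (s, t)
    return diffs


# Placeholder table names and counts, for illustration only.
source_counts = {"tasks": 120, "workstreams": 15}
target_counts = {"tasks": 118, "workstreams": 15}
print(row_count_diff(source_counts, target_counts))
```

An empty result is the condition for proceeding to the endpoint redirect; any entry means the restore should be investigated before cutover.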
Task: Sequence FOS Hub Bootstrap To Completion
id: CUST-WP-0051-T07
status: progress
priority: medium
state_hub_task_id: "27b6828a-9e87-4135-a036-bce760c3057c"
Use the stabilized substrate to finish CUST-WP-0025 without reviving the
mega-hub pattern.
Recommended order:
- Finish identity foundations: NK-WP-0001, NK-WP-0002, then the IAM profile integration test.
- Create the standalone ops-hub repo from hub-core and ingest the inventory artifacts from CUST-WP-0047.
- Add ops models, MCP tools, Railiance integration, dev-hub coupling, dashboard, and MCP registration.
- Only then start the fin-hub/business-model tasks.
Done when CUST-WP-0025 has no open foundational identity or ops-hub tasks and
fin-hub work is either started on a stable scaffold or deliberately deferred.
Progress 2026-06-27:
- Added docs/fos-hub-bootstrap-sequence-status.md with the current sequence.
- Corrected the identity foundation baseline in CUST-WP-0025: the old NK-WP-0001 Keycloak task is cancelled as superseded, NK-WP-0002 local identity is done, and the remaining identity gate is the IAM Profile v0.2 FastAPI integration test.
- Current ops-hub reality is extension-first: ops-hub exists, OPS-WP-0001 is finished, and OPS-WP-0002 waits on authenticated Inter-Hub bootstrap/runtime-key evidence. Reconcile CUST-WP-0025-T13-T19 after the first governed ops event lands.
- Fin-hub/business tasks remain deliberately deferred until identity integration and ops-hub extension evidence are proven.
Progress 2026-06-27 Core Hub reset:
- CUST-WP-0052 now owns the reset criteria.
- CUST-WP-0025-T13 through T19 should not be executed literally as the old standalone ops-hub scaffold until Core Hub replacement evidence is good enough and the tasks are rewritten.
- Core Hub is promising enough to stop expanding the Inter-Hub-first path: local ops-hub bootstrap compatibility and /console visual checks exist, but staging import, deployed dual-run smokes, and cutover evidence are still open.
Task: Create The Stable Pickup Checkpoint
id: CUST-WP-0051-T08
status: done
priority: high
state_hub_task_id: "2cc0a127-a749-4228-962e-f8c9b693a1b3"
Close this metaplan by creating an operator-friendly checkpoint.
Minimum contents:
- active workstream list with zero stale runbooks and zero contradictory task states;
- blocker board showing no unowned credential, access, or approval gates;
- daily automation evidence from the latest successful scheduled run;
- production service status summary for State Hub, Inter-Hub, ops-hub evidence, issue-core, Forgejo, and artifact-store;
- explicit next-pick list for remaining strategic tasks.
Done when a future agent can start from the checkpoint and choose the next workplan without reconstructing this review.
Completed 2026-06-27: added
docs/infrastructure-stabilization-pickup-checkpoint.md with the live active
workstream list, named blocker board, latest daily-triage evidence, production
service status summary, and next-pick sequence. This closes the handoff surface
for future agents while the child workplans remain the execution source of
truth.