---
id: CUST-WP-0051
type: workplan
title: "Infrastructure Stabilization Metaplan"
domain: infotech
repo: the-custodian
status: active
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 51
created: "2026-06-27"
updated: "2026-07-02"
state_hub_workstream_id: "21cabc98-3f80-4d00-b3b7-06e2ac2af88f"
---

# CUST-WP-0051 - Infrastructure Stabilization Metaplan

## Goal

Drive the registered infrastructure workplans from a scattered blocked state to a stable checkpoint where:

- active blockers have a named owner, route, and next command or decision;
- production credential work uses approved custody paths only;
- daily operational automation has one healthy runner and clean evidence;
- State Hub registration reflects the real file state;
- unfinished strategic work is sequenced into clear follow-on lanes.

This workplan does not replace the child workplans. It is the coordination lane for removing cross-workplan blocks and creating a reliable handoff point.

## Review Snapshot

Reviewed on 2026-06-27 from State Hub and the repo workplan files.

Active registered workstreams with open work:

| Workstream | Open state | Main stabilization meaning |
| --- | --- | --- |
| artifact-store-wp-0007 | 5 todo | Object-store compatibility and STS credential vending lane. |
| ihub-wp-0022 | 3 wait, 5 done | Ops-hub evidence intake waits on widget seed/runtime key/smoke. |
| cust-wp-0047 | 1 wait, 6 done | Ops-hub now view waits on Inter-Hub widget activation. |
| cust-wp-0049 | 1 wait, 5 done | Access lane is ready; live bootstrap needs approved admin execution. |
| activity-wp-0016 | 1 wait, 2 progress, 5 todo, 2 done | Daily-triage output robustness needs live deploy/smoke evidence. |
| three-phoenix-ha-cluster | 7 todo | Target HA substrate is planned but not executed. |
| staged-promotion-lifecycle | finished, 7 done | Promotion discipline is ready for broad production cutovers. |
| rail-ho-wp-0005 | 11 todo, 1 progress | Forgejo production migration needs human design and cutover decisions. |
| cust-wp-0045-cutover-runbook | 0 tasks | Registered runbook appears as an active no-task workstream. |
| net-wp-0020 | 2 wait, 1 todo, 2 done | OpenBao unseal custody models still need operator profile decisions. |
| issue-wp-0003 | 2 progress, 5 done | issue-core deploy is close; finish live wiring and runbook evidence. |
| activity-wp-0006 | 1 wait, 1 todo, 6 done | Three-run calibration waits on the daily-triage live gate. |
| cust-wp-0038 | 8 todo | Full ThreePhoenix State Hub HA migration remains a strategic follow-on. |
| cust-wp-0025 | 17 todo, 9 done | FOS hub bootstrap now depends on identity, ops-hub, and fin-hub lanes. |
| cust-wp-0011 | 3 todo, 6 done | Pragmatic State Hub railiance01 migration still needs cutover/stabilize/retire. |

Additional repo-local hygiene issue:

- `CUST-WP-0014` has frontmatter `status: done` but all six task blocks are still `todo`. Either archive it as superseded or reopen it as a focused State Hub sync-health workplan.

State Hub hygiene issue:

- There are stale `needs_human` flags on completed or cancelled tasks. These do not all block execution, but they make the operator view noisier and should be cleared or annotated after the source workplans are reconciled.

## Review Refresh 2026-07-02

Re-reviewed against the live State Hub active-workplan list.

Changes since the 2026-06-30 refinements:

- New platform credential lanes registered in `/home/worsch/railiance-platform` that formalize the T02 custody board into executable workplans: `RAILIANCE-WP-0005` (credential request and lease broker; T07 wait, T09/T10 progress), `RAILIANCE-WP-0008` (OpenBao approved automation delegation; T03 progress), `RAILIANCE-WP-0009` (issue-core runtime ingestion key lane, `CCR-2026-0002`; T01/T02 done, T03–T07 wait on CCR approval/apply), and `RAILIANCE-WP-0010` (llm-connect OpenRouter provider key lane, `CCR-2026-0003`; T01 progress, T03–T07 wait).
- New cluster deploy lanes in `/home/worsch/railiance-cluster`: `RAIL-BS-WP-0008` packaged the ACTIVITY-WP-0016 robustness deploy as `make deploy-activity-core-triage-robustness` with coupled schema/executor gating, runtime-Instruction contract checks (top-7 bound, NDJSON framing, `max_tokens` ≥ 1800), post-deploy triage trigger, and State Hub evidence polling; `RAIL-BS-WP-0009` packaged the ACTIVITY-WP-0012-T05 no-restart admin-sync smoke as `make admin-sync-smoke`. Both await live execution on railiance01; this is now the single next action for T04.
- The T05 issue-core REST flip remains gated on `RAILIANCE-WP-0009` T03–T05 (apply/provision/verify); T01 evidence shows the `issue-core/issue-core-runtime` ExternalSecret is already `Ready=True` and synced, so the remaining gap is CCR approval plus non-secret verification, not missing plumbing.

## Dependency Shape

The critical path is:

1. Credential and operator-access custody: OpenBao, Inter-Hub operator key, ops-hub runtime key, Forgejo SMTP/cutover approvals, and OpenBao unseal profile decisions.
2. Ops evidence and daily automation: Inter-Hub ops-hub records, activity-core daily-triage robustness deployment, schema-valid smoke, then three clean scheduled runs.
3. Production substrate and source forge: issue-core GitOps pilot, Forgejo production migration, artifact-store STS, staged promotion, and State Hub migration strategy.
4. Federation buildout: identity completion, Core Hub replacement evidence, ops-hub scaffold reset, fin-hub scaffold, and business/runway canon.

## Task: Normalize Registry And Workplan Hygiene

```task
id: CUST-WP-0051-T01
status: done
priority: high
state_hub_task_id: "7e83bd50-5ca2-4341-9d18-65512e3f0442"
```

Clean up the planning substrate before execution work resumes.

Minimum scope:

- Decide whether `CUST-WP-0045-cutover-runbook` should stay registered as an active workstream or be represented only as a runbook under `CUST-WP-0045`.
- Resolve `CUST-WP-0014`: archive as superseded, or reopen and re-scope the six remaining State Hub sync-health tasks.
- Clear or annotate stale `needs_human` flags on done/cancel tasks after source workplans confirm they are no longer live gates.
- Run State Hub consistency after file changes.

Done when the active workstream list no longer contains no-task runbooks or contradictory done-with-todo files, and the human-needed view shows only live human gates.

Progress 2026-06-27:

- `CUST-WP-0045-cutover-runbook` now has `status: finished`; State Hub no longer lists it as an active workstream.
- `CUST-WP-0014` is reopened as `backlog` with its task detail preserved, so it is no longer a contradictory done-with-todo file or an active queue item.
- `make fix-consistency REPO=the-custodian` passed with pre-existing C-12 warnings and synced the lifecycle changes into State Hub.

Completed 2026-06-27: cleared 15 stale `needs_human` flags from tasks that were already `done` or `cancel`, leaving live `todo`/`progress`/`wait` human gates untouched. T01 is complete.

## Task: Establish One Credential-Custody Unblock Board

```task
id: CUST-WP-0051-T02
status: done
priority: high
state_hub_task_id: "312bde29-4370-4352-b5a3-00a8c4fe2059"
```

Collect the live operator-access decisions in one non-secret board.

Inputs:

- `CUST-WP-0049-T06`: Inter-Hub admin access or deployment-side bootstrap path.
- `IHUB-WP-0022-T04`: ops-hub runtime `OPS_HUB_KEY` custody.
- `NET-WP-0020`: OpenBao unseal custody and SSH automation profile.
- `RAIL-HO-WP-0005`: Forgejo hostname, SMTP, runner, backup, cutover, rollback, and retirement decisions.

Rules:

- Do not put secrets in Git, State Hub, workplans, or chat.
- Use `warden route find` / `warden route show` before requesting credentials.
- Treat ops-warden as an SSH certificate authority only, not as a secret store.

Done when each human/operator gate has an owner, approved route, expected execution host, non-secret evidence target, and fallback decision.

Completed 2026-06-27: added `docs/credential-custody-unblock-board.md` with route records, live gate owners, expected execution hosts, non-secret evidence targets, fallback decisions, and pickup order. Route lookup was verified through `/home/worsch/ops-warden` using `uv run warden route show ... --json` because the globally installed `warden` lacks the `route` subcommand.

Refined 2026-06-27: added `docs/ops-warden-secret-posture-review.md` and updated the unblock board/checkpoint to consume ops-warden's `warden access` assist boundary plus WARDEN-WP-0015 environment-posture/workload-maturity triage. This turns vague IT-security blockers into dev/test doubles, owner-routed production custody gates, or real maturity/posture violations.

Refined 2026-06-30: closed the adjacent ops-warden policy-gate support lanes without changing ops-warden itself. `/home/worsch/flex-auth` `FLEX-WP-0007` finished at commit `339c35e`, and `/home/worsch/secrets-engine` `SECRETS-WP-0004` finished at commit `e0ab1b8`. Non-secret evidence records the deployed flex-auth runtime, `decision:032b096c433ad80c`, `ttl_out_of_bounds`, backend `vault`, and the scoped `warden-sign` OpenBao lane. `policy.enabled` remains intentionally off until testing/production maturity, so this gate is verified and banked rather than live-enforced.
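
A minimal sketch of the board's completeness rule, assuming each gate is held as a simple record. The `CustodyGate` type, its field names, and the example row values are hypothetical illustrations of the five required columns, not the layout of `docs/credential-custody-unblock-board.md`; values are route names and host labels only, never secrets.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one row of the unblock board; values are labels, never secrets.
@dataclass
class CustodyGate:
    gate_id: str
    owner: Optional[str] = None
    approved_route: Optional[str] = None
    execution_host: Optional[str] = None
    evidence_target: Optional[str] = None
    fallback_decision: Optional[str] = None


# The five columns the T02 done condition requires for every gate.
REQUIRED_FIELDS = (
    "owner",
    "approved_route",
    "execution_host",
    "evidence_target",
    "fallback_decision",
)


def missing_fields(gate: CustodyGate) -> list[str]:
    """Return the required fields that are unset or blank for one gate."""
    return [name for name in REQUIRED_FIELDS if not (getattr(gate, name) or "").strip()]


def board_gaps(gates: list[CustodyGate]) -> dict[str, list[str]]:
    """Map gate id to its missing fields, listing only incomplete gates."""
    gaps = {}
    for gate in gates:
        missing = missing_fields(gate)
        if missing:
            gaps[gate.gate_id] = missing
    return gaps


if __name__ == "__main__":
    # Hypothetical row values for illustration only.
    board = [
        CustodyGate("CUST-WP-0049-T06", owner="operator", approved_route="example-route",
                    execution_host="example-host", evidence_target="State Hub event id",
                    fallback_decision="defer"),
        CustodyGate("NET-WP-0020", owner="operator"),
    ]
    print(board_gaps(board))
```

A gate listed in the result still fails the T02 done condition; an empty result means every row names an owner, route, host, evidence target, and fallback.
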
## Task: Close The Ops-Hub Inter-Hub Evidence Lane

```task
id: CUST-WP-0051-T03
status: progress
priority: high
state_hub_task_id: "d6c3a39e-629d-47e4-b589-9e1a0273d9fa"
```

Finish the linked ops-hub activation chain:

- Execute `CUST-WP-0049-T06` using the approved access route.
- Close `CUST-WP-0047-T05` by proving ops-hub widgets exist and accept evidence events.
- Unblock `IHUB-WP-0022` by provisioning the runtime key through the approved secret path and running the end-to-end evidence submission smoke.

Done when ops inventory probes and activity-core evidence can land in Inter-Hub without manual SQL or secret exposure.

Progress 2026-06-27:

- Added `docs/ops-hub-interhub-evidence-lane-status.md` with non-secret public probe evidence. Production Inter-Hub has an `ops-hub` row and the ops-hub seed vocabulary is visible on public registry endpoints.
- Protected widget, manifest, and hub-registry surfaces correctly require authentication; no runtime-key smoke was attempted.
- New blocker surfaced: the older `IHUB-WP-0022` activity-core mapping contract names event types, policy scope, aggregate widget refs, and widget types that do not match the live ops-hub seed vocabulary. Align that contract before an attended bootstrap/runtime-key smoke, or the operator key may still hit manifest/schema failures.

Progress 2026-06-27 contract alignment:

- Updated `/home/worsch/inter-hub` contract docs for `IHUB-WP-0022` to target the live ops-hub seed vocabulary. The old `ops-service-observed` and `ops-inventory-drift` names are transition aliases, `ops-access-path-checked` is deferred to fallback until supported, and payload examples now post only live manifest event types.
- Ran `make fix-consistency REPO=inter-hub`; it passed with pre-existing C-12 warnings and synced the IHUB-WP-0022 description drift into State Hub.
- Remaining T03 gates are authenticated widget lookup, any missing backup/risk seed widget, runtime key custody, and protected submission smoke.
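
The alignment rule above (old names accepted as transition aliases, `ops-access-path-checked` held for fallback, only live manifest event types posted) can be sketched as a small routing function. This is an illustration under assumptions: the live ops-hub event type names are not listed in this workplan, so the function takes the live vocabulary and alias map as inputs, and the `route_event_type` name, its return values, and the example vocabulary are hypothetical.

```python
def route_event_type(
    event_type: str,
    live_types: set[str],
    aliases: dict[str, str],
    deferred: set[str],
) -> tuple[str, str]:
    """Decide how one event type is handled before any submission to the ops-hub.

    Returns (action, event_type) where action is "post", "fallback", or "reject".
    """
    if event_type in live_types:
        return ("post", event_type)
    if event_type in aliases:
        target = aliases[event_type]
        # A transition alias only helps if it resolves to a type the live manifest accepts.
        if target in live_types:
            return ("post", target)
        return ("reject", event_type)
    if event_type in deferred:
        # Not supported by the live manifest yet; keep it on the fallback path.
        return ("fallback", event_type)
    return ("reject", event_type)


# Hypothetical vocabulary: the real live ops-hub event type names are not listed here.
LIVE = {"ops-service-status", "ops-inventory-change"}
ALIASES = {
    "ops-service-observed": "ops-service-status",
    "ops-inventory-drift": "ops-inventory-change",
}
DEFERRED = {"ops-access-path-checked"}

if __name__ == "__main__":
    for name in ("ops-service-observed", "ops-access-path-checked", "ops-unknown"):
        print(name, route_event_type(name, LIVE, ALIASES, DEFERRED))
```

Running every planned event type through a check like this before an attended runtime-key smoke surfaces manifest/schema mismatches locally, without sending anything to Inter-Hub.
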

Progress 2026-06-27 Core Hub pivot:

- Created `CUST-WP-0052` to drive the reframe from old Inter-Hub production bootstrap toward Core Hub-owned replacement implementation.
- Treat remaining Inter-Hub evidence as legacy compatibility or fallback evidence. Do not spend new design work on Haskell Inter-Hub unless it is needed for migration proof or rollback.
- The next implementation lane should be Core Hub API first, CLI second, and web UI third, with whynot-design used for the rebuilt UI where practical.

## Task: Stabilize Daily-Triage Automation

```task
id: CUST-WP-0051-T04
status: progress
priority: high
state_hub_task_id: "42810d3b-5557-4efd-871b-65bef7c19e0e"
```

Finish the activity-core daily-triage reliability lane.

Sequence:

1. Deploy the `activity-wp-0016` robustness bundle: bounded prompt/schema, per-item parsing, quarantine lane, and producer guardrails.
2. Run a schema-valid live daily-triage smoke on railiance01.
3. Collect three clean scheduled runs with matching activity-core, State Hub, and working-memory evidence.
4. Close `activity-wp-0006` calibration and decide the fate of the `CUST-WP-0045` cutover runbook registration.

Done when there is exactly one trusted daily-triage runner and the fallback state is documented.

Progress 2026-06-27:

- Added `docs/daily-triage-stabilization-status.md` with the current evidence chain. The 2026-06-24 and 2026-06-25 scheduled runs were schema-valid; the 2026-06-26 and 2026-06-27 runs reached State Hub and working memory but failed output validation around character 5.2k.
- The current primary blocker is no longer a silent schedule or State Hub sink outage. The live runner still needs the `ACTIVITY-WP-0016` code/schema bundle and Railiance runtime prompt changes so malformed tails degrade to quarantined partial output.
- Pickup sequence: deploy WP-0016 code/schema together, update the runtime prompt bundle for bounded top-N/per-item framing/token headroom, run a live railiance01 smoke, then restart the three-clean-run gate.
- Normalized ACTIVITY-WP-0016 source task status in activity-core: T04 is done and T05 is in progress, matching its own progress notes.
- Updated activity-core daily-triage source notes: ACTIVITY-WP-0010-T02 is now done, T03/T04 point at the post-WP-0016 live smoke and three-run gate, and ACTIVITY-WP-0006-T03 records the 2026-06-27 validation failure.
- Cleared the stale human-needed flag from the completed bridge/config task and moved live intervention notes onto the deploy/smoke/calibration gate.

Progress 2026-06-30 daily-triage recheck:

- State Hub now shows three consecutive schema-valid scheduled `daily_triage` events after the malformed 2026-06-26 and 2026-06-27 outputs: 2026-06-28 `f0d8477e-1db9-4c07-bb8c-d28cbb868abc`, 2026-06-29 `176d2ea7-f0e3-48cd-999b-4ab6055c6a55`, and 2026-06-30 `27d695b2-a537-481b-ada6-ca84ec24cd96`; all wrote working memory.
- This banks the scheduling/sink/schema-validity streak for `ACTIVITY-WP-0006-T03` calibration feedback, but not the full WP-0016 live-proof gate, because the reports still emit 10 recommendations instead of the bounded top-N contract.
- `/home/worsch/activity-core` currently has in-flight uncommitted changes for ACTIVITY-WP-0016 diagnostics and new ACTIVITY-WP-0018/0019 automation-status/inventory workplans. Custodian should not overwrite or commit that worktree; the next clean handoff is for the activity-core owner to commit/sync or explicitly hand it off, then use the repo-native automation status surface as evidence.

Progress 2026-07-02 deploy prep:

- Executed the preparable half of `RAIL-BS-WP-0008`: activity-core runtime Instruction now satisfies the T02 contract in the repo bundle (activity-core commit `7612112`: bounded top-7 phrasing on one line, NDJSON-style per-item framing compatible with the WP-0016 recovery parser, `max_tokens` 1800), and `activity-core:railiance01-prod` was rebuilt locally from that commit.
- Live transfer/deploy to railiance01 is blocked by agent permission policy (production remote writes need explicit operator authorization), and per-read production log access is likewise gated, so `RAIL-BS-WP-0008-T03` (raw llm-connect response for the 2026-06-26 run) is also operator-owned.
- Found that `railiance01:~/activity-core` has no `.git`; the deploy script's revision gate requires git metadata. This is noted in the workplan for the operator.
- Advanced `NET-WP-0020-T02` (OpenBao SOPS-held init/unseal automation) with a gated helper and Make targets in net-kingdom; see that workplan for detail.
- Refreshed `docs/infrastructure-stabilization-pickup-checkpoint.md` with an "Operator Pickups Ready Now" list of five one-command/one-decision items.

Progress 2026-07-02 live deploy (operator-authorized):

- Bernd granted railiance01 deployment authorization; the ACTIVITY-WP-0016 robustness bundle is now deployed and live-proven. RAIL-BS-WP-0008, RAIL-BS-WP-0009, and ACTIVITY-WP-0016 are all `finished`.
- Evidence: the daily-triage trigger produced a clean schema-valid report with exactly 7 ranked recommendations (State Hub event `24d2d321`, `output_validated=true`); the no-restart admin-sync smoke passed with stable worker identity (event `4caa288d`). ACTIVITY-WP-0012-T05 is closed.
- The raw 2026-06-26 llm-connect response proved unrecoverable (stateless pod, no log retention); ACTIVITY-WP-0016-T01 was cancelled on that basis.
- Remaining T04 gate: three clean *scheduled* runs (07:20 daily) starting 2026-07-03, then close ACTIVITY-WP-0006 calibration. This is a wait-for-the-calendar gate, not an action gate.

## Task: Finish Near-Term Production Service Lanes

```task
id: CUST-WP-0051-T05
status: progress
priority: medium
state_hub_task_id: "2083f0e4-e037-48bf-8069-f31e8db2fd95"
```

Move near-complete service workstreams to done before starting larger migrations.

Priority order:

- `issue-wp-0003`: finish activity-core wiring and end-to-end GitOps runbook.
- `rail-ho-wp-0005`: resolve Forgejo production decisions, email recovery, and cutover approval gates.
- `artifact-store-wp-0007`: complete the MinIO compatibility and STS credential-vending assessment if it is required by backup, registry, or app lanes.
- `secrets-wp-0003`: finish or explicitly park the whynot-design real npm publish pilot behind Gitea bot, OpenBao provisioning, route confirmation, and real package publish evidence.
- `staged-promotion-lifecycle`: make production promotion gates explicit before further cluster/source-forge cutovers.

Done when each lane is either finished or parked with a precise dependency and no ambiguous human-needed state.

Progress 2026-06-27:

- Added `docs/near-term-production-service-lanes-status.md` with a lane board for issue-core, Forgejo, artifact-store, and staged promotion.
- issue-core is the immediate near-done lane: the service itself is healthy, but activity-core still points at port `8010` with `ISSUE_SINK_TYPE=null`. Do not flip it to REST until `ISSUE_CORE_API_KEY` is injected into activity-core's runtime secret via route `activity-core-issue-sink`.
- Forgejo remains parked behind explicit production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, and cutover approval.
- artifact-store and staged promotion are executable planning/build lanes: artifact-store D7.1/D7.2 remain open; staged-promotion T02 is now complete, ahead of broad production source-forge migration work.

Progress 2026-06-27 artifact-store D7.1/D7.2:

- Advanced `/home/worsch/artifact-store` `ARTIFACT-STORE-WP-0007`: D7.1 is done with `docs/minio-compatibility-landscape-2026-06-27.md`, deciding to pursue a compatibility profile instead of a direct MaxIO server fork.
- D7.2 is now `progress` with an opt-in live MinIO compatibility pytest harness (`tests/integration/test_storage_s3_minio.py`), `make test-minio`, and manual smoke docs in `docs/OPERATOR.md`.
- Verified artifact-store with `make test` (`110 passed, 2 skipped`), targeted Ruff checks for the new harness, direct harness execution (`2 skipped` without endpoint variables), and `git diff --check`. Repo-wide `make lint` still reports pre-existing Ruff format drift in seven untouched files.
- Remaining artifact-store gate is live evidence: run D7.2 against an approved MinIO-compatible endpoint with non-secret health, round-trip, and multipart output. D7.3 STS vending remains identity/platform-routed work.

Progress 2026-06-27 staged promotion:

- Completed `RAIL-BS-WP-0006-T02` in `/home/worsch/railiance-cluster`. Added `docs/app-toml-contract.md`, `schemas/railiance-app.schema.json`, and `examples/railiance/app.toml`, defining the repository-local `railiance/app.toml` declaration for identity, ownership, source/artifact policy, platform dependencies, secret references without plaintext values, observability, stage commands/checks/evidence, canary/promotion modes, rollback, and human approval gates.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing C-12 warnings and synced the T02 status into State Hub.
- T02 through T07 are complete; the staged-promotion lifecycle is finished.

Progress 2026-06-27 staged promotion T03:

- Completed `RAIL-BS-WP-0006-T03` in `/home/worsch/railiance-cluster`. Added `docs/overlay-repo-pattern.md`, `tools/create_railiance_overlay_repo.sh`, and the `bin/railiance create-overlay` dispatcher entry. The scaffold writes a separate overlay repo with `railiance/upstream.toml`, schema-valid `railiance/app.toml`, stage values, a thin Helm chart, Stage 1 test script, rollback runbook, and promotion notes without cloning upstream code or handling secrets.
- Verified the generated Forgejo overlay sample against `schemas/railiance-app.schema.json`; the generated Stage 1 script ran with Helm skipped because Helm is unavailable in this environment.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing C-12 warnings and synced the T03 status into State Hub.

Progress 2026-06-27 staged promotion T04:

- Completed `RAIL-BS-WP-0006-T04` in `/home/worsch/railiance-cluster`. Added `tools/cmd/railiance-run`, the `bin/railiance run` dispatcher entry, and `docs/railiance-run-command.md`. The command reads `railiance/app.toml`, runs Stage 1 commands and local checks, and emits a `railiance.run-result.v1` JSON result with command references and scrubbed HTTP URLs rather than command logs, stdout/stderr, or secret-bearing URL details.
- Updated generated overlays so a Forgejo overlay completes Stage 1 locally: `stage1-script` is required, `local-health` is optional when no local service is running, and Helm rendering remains optional when Helm is unavailable.
- Verified a fresh generated Forgejo overlay against `schemas/railiance-app.schema.json` and `bin/railiance run`; the smoke passed with one command, two checks, and zero required failures.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing C-12 warnings and synced the T04 status into State Hub.

Progress 2026-06-27 staged promotion T05:

- Completed `RAIL-BS-WP-0006-T05` in `/home/worsch/railiance-cluster`. Generated overlays now include a Stage 2 canary Helm template with stable/canary release identities, isolated ingress by default, optional Traefik weighted routing, Prometheus annotations, HTTP probes, conservative resource limits, rollback-safe Stage 2/Stage 3 values, and `tests/stage2-template.sh`.
- Verified a fresh generated Forgejo overlay with schema validation, `tests/stage1.sh`, `tests/stage2-template.sh`, and `bin/railiance run`. Helm rendering was skipped because Helm is unavailable in this environment.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing C-12 warnings and synced the T05 status into State Hub.
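
A minimal sketch of the URL-scrubbing idea behind the `railiance.run-result.v1` output, using only the Python standard library. The result keys (`schema`, `checks`, `required_failures`) and the `scrub_url`/`run_result` helpers are assumptions for illustration; the actual `railiance-run` output format may differ.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def scrub_url(url: str) -> str:
    """Keep scheme, host, port, and path; drop userinfo, query string, and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals, which .hostname returns bare
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def run_result(checks: list[dict]) -> str:
    """Serialize check outcomes with scrubbed URLs; raw stdout/stderr is never included."""
    entries = []
    for check in checks:
        entry = {"name": check["name"], "required": check["required"], "ok": check["ok"]}
        if "url" in check:
            entry["url"] = scrub_url(check["url"])
        entries.append(entry)
    failures = sum(1 for e in entries if e["required"] and not e["ok"])
    result = {
        "schema": "railiance.run-result.v1",  # schema id from the workplan; keys are assumed
        "checks": entries,
        "required_failures": failures,
    }
    return json.dumps(result, sort_keys=True)


if __name__ == "__main__":
    print(scrub_url("https://user:secret@forge.example.test:8443/api/healthz?token=abc#frag"))
    # prints https://forge.example.test:8443/api/healthz
```

Keeping scheme, host, port, and path is enough to identify which endpoint a check probed, while dropping userinfo, query strings, and fragments removes the usual places where credentials leak into URLs.
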

Progress 2026-06-27 staged promotion T06:

- Completed `RAIL-BS-WP-0006-T06` in `/home/worsch/railiance-cluster`. Added `tools/cmd/railiance-stage2` plus `bin/railiance deploy` and `bin/railiance observe` dispatch. Both commands default to non-mutating plans; apply/live modes fail closed on missing prerequisites.
- Verified a fresh generated Forgejo overlay with schema validation, `tests/stage1.sh`, `tests/stage2-template.sh`, Stage 2 deploy plan, Stage 2 observe plan, and blocked apply without approval/Helm.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing C-12 warnings and synced the T06 status into State Hub.

Progress 2026-06-27 staged promotion T07 and finish:

- Completed `RAIL-BS-WP-0006-T07` in `/home/worsch/railiance-cluster`. Added `tools/cmd/railiance-stage3`, `bin/railiance promote`, `bin/railiance rollback`, and `docs/promote-rollback-onboarding.md`. Generated overlays now declare promote/rollback plan commands.
- Verified a fresh generated Forgejo overlay through Stage 1 run, Stage 2 deploy/observe plans, Stage 3 promote/rollback plans, and blocked apply paths for missing approval/Helm/revision evidence.
- Marked `RAIL-BS-WP-0006` `status: finished`; `make fix-consistency REPO=railiance-cluster` synced the finished workstream with only pre-existing C-12 orphan-row warnings.

Progress 2026-06-30 policy-gate support closeout:

- Closed `/home/worsch/flex-auth` `FLEX-WP-0007` from ops-warden's non-secret production smoke handoff. The deployed runtime at `127.0.0.1:18090` was used from CoulombCore, allow produced `decision:032b096c433ad80c`, and excessive TTL was denied with `ttl_out_of_bounds`.
- Closed `/home/worsch/secrets-engine` `SECRETS-WP-0004` from the same evidence: the scoped `warden-sign` OpenBao policy/AppRole lane was applied and used for the vault-backed smoke. No token, role id, secret id, accessor, or raw smoke log was recorded in Git or State Hub.
- This removes the `warden-sign` / `FLEX-WP-0007` blocker from CUST-WP-0051. The remaining production credential lanes are different gates: `SECRETS-WP-0003` real npm publish, activity-core -> issue-core, artifact-store live MinIO/STS evidence, and Forgejo migration credentials.

Progress 2026-07-02 artifact-store lane finished:

- `ARTIFACT-STORE-WP-0007` is `finished`. D7.2 closed with a deterministic local MinIO fixture (`make test-minio-local`) and a live compatibility pass; D7.3 delivered `docs/sts-credential-vending-assessment.md` (key-cape first issuer, Authelia rejected, no production object-storage consumers live yet); D7.4 landed STS temporary-credential support (session token config/env ref, per-client re-resolution of file refs, live MinIO `AssumeRole` proof through tests and CLI health/verify); D7.5 routed the NetKingdom vending-service follow-up (hub message `b57b3403`) and the open-cmis-tck `reports/cmis-summary.md` producer gap (hub message `e5ba736d`), and closed the decision: MinIO compatibility profile + STS, MaxIO fork deferred indefinitely.
- The T05 artifact-store bullet is complete. Remaining T05 lanes: issue-core REST flip (waits on `CCR-2026-0002` approval + key injection), Forgejo (human design decisions), `SECRETS-WP-0003` (parked behind provisioning and publish approval).

## Task: Decide State Hub Migration Strategy

```task
id: CUST-WP-0051-T06
status: progress
priority: high
state_hub_task_id: "0ac3763f-eac0-4773-9be8-cb0a7979e444"
```

Choose and execute the State Hub stabilization path.

Decision:

- If the pragmatic railiance01 service is enough for the next operating period, finish `CUST-WP-0011`: cut over the MCP config, observe the stabilization window, then retire or retain the WSL2 fallback by explicit decision.
- If HA is now required, promote `CUST-WP-0038` and the ThreePhoenix HA cluster lane: readiness, storage/database strategy, HA API behavior, failover drill, restore drill, and endpoint/runbook update.
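
For the pragmatic path, the `CUST-WP-0011-T07` gate (freeze WSL2 writes, restore the final dump, compare row counts again, then redirect MCP/private access) reduces to an exact comparison. A minimal sketch, assuming row counts have already been collected per table from both sides; the helper names, table names, and counts are hypothetical.

```python
def compare_row_counts(source: dict[str, int], target: dict[str, int]) -> dict[str, tuple]:
    """Return {table: (source_count, target_count)} for every table whose counts differ.

    A table present on only one side is reported with None for the missing count.
    """
    mismatches = {}
    for table in sorted(source.keys() | target.keys()):
        src, dst = source.get(table), target.get(table)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches


def ready_to_redirect(source: dict[str, int], target: dict[str, int], writes_frozen: bool) -> bool:
    """Allow the MCP/private-access redirect only after a write freeze and an exact match."""
    return writes_frozen and not compare_row_counts(source, target)


if __name__ == "__main__":
    # Hypothetical table names and counts, for illustration only.
    wsl2 = {"workstreams": 120, "tasks": 900, "events": 15000}
    cluster = {"workstreams": 120, "tasks": 900, "events": 14990}
    print(compare_row_counts(wsl2, cluster))  # {'events': (15000, 14990)}
    print(ready_to_redirect(wsl2, cluster, writes_frozen=True))  # False
```

Requiring `writes_frozen` first matters because counts taken while WSL2 still accepts writes can match by coincidence and drift immediately afterwards.
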

Done when the active State Hub path is singular, tested, and documented, and the alternate path is either cancelled, deferred, or explicitly retained as a future workplan.

Progress 2026-06-27:

- Added `docs/state-hub-migration-strategy-status.md` and selected the pragmatic `CUST-WP-0011` railiance01 path as the singular active State Hub stabilization lane.
- `CUST-WP-0011` is already through T01-T06: image pushed, cluster manifests defined, empty deploy healthy, migrations run, WSL2 data restored, row counts compared, and cluster API health/summary verified.
- Next gate is `CUST-WP-0011-T07`: explicit approval to freeze WSL2 writes, restore the final dump, compare again, and redirect MCP/private access to the cluster endpoint.
- `CUST-WP-0038` and `RAIL-BS-WP-0007` remain deferred HA lanes until the pragmatic path stabilizes and ThreePhoenix storage/database strategy is current.

## Task: Sequence FOS Hub Bootstrap To Completion

```task
id: CUST-WP-0051-T07
status: progress
priority: medium
state_hub_task_id: "27b6828a-9e87-4135-a036-bce760c3057c"
```

Use the stabilized substrate to finish `CUST-WP-0025` without reviving the mega-hub pattern.

Recommended order:

1. Keep the completed `CUST-WP-0025-T03` IAM Profile verifier/test as the template for Core Hub auth consumers and future production issuer wiring.
2. Execute the remaining rewritten Core Hub Phase 3 lane: deployed Core Hub smoke, activity-core Core Hub sink smoke, and migration/cutover readiness; the whynot-aligned first UI screens are now closed as `CUST-WP-0025-T18`.
3. Keep `CUST-WP-0047-T05` and `CUST-WP-0049-T06` as legacy/fallback Inter-Hub records until deployed Core Hub evidence or an explicit supersede decision closes them.
4. Start fin-hub/business-model tasks only after identity and Core Hub ops-hub evidence are proven enough to demonstrate the multi-hub pattern.

Done when `CUST-WP-0025` has no open foundational identity or ops-hub tasks and fin-hub work is either started on a stable Core Hub pattern or deliberately deferred with a dated condition.

Progress 2026-06-27:

- Added `docs/fos-hub-bootstrap-sequence-status.md` with the current sequence.
- Corrected the identity foundation baseline in `CUST-WP-0025`: the old `NK-WP-0001` Keycloak task is cancelled as superseded, `NK-WP-0002` local identity is done, and the remaining identity gate is the IAM Profile v0.2 FastAPI integration test.
- Current ops-hub reality is Core Hub replacement-first: `CORE-WP-0008` finished the API smoke harness, activity-core sink, staging profile, CLI wrappers, UI rebuild backlog, and Custodian handoff. `CUST-WP-0025-T13`-`T19` have been rewritten away from the obsolete standalone scaffold.
- Fin-hub/business tasks remain deliberately deferred until identity integration and ops-hub extension evidence are proven.

Progress 2026-06-30 Core Hub T16 route refinement:

- Rechecked the Core Hub replacement lane after the daily-triage checkpoint. Core Hub is clean and its remaining open gates are deployed evidence, activity-core sink smoke, staging import, dual-run/cutover readiness, and explicit Haskell retirement approval.
- `warden route find` for the Core Hub staging operator/runtime token resolves to OpenBao-owned `openbao-api-key`, with `key-cape-oidc-login` for interactive auth and `ops-bridge-tunnel` for private endpoint access when needed. This is not an ops-warden secret request; ops-warden only routes or assists eligible lanes as the caller.
- Next Core Hub proof requires `CORE_HUB_BASE_URL`, approved operator/runtime token custody, activity-core widget mapping, then deployed-smoke plus activity-core sink-smoke evidence with non-secret ids, prefixes, counts, statuses, and containment booleans only.

Progress 2026-06-27 Core Hub reset:

- `CUST-WP-0052` completed the Phase 3 reset. `CUST-WP-0025-T13` through `T19` now point at Core Hub-owned API evidence, CLI parity, deployed smoke/cutover gates, whynot-aligned UI, and cancellation of immediate standalone ops-hub MCP registration.
- Core Hub is now the preferred replacement lane, but staging import, deployed dual-run smokes, cutover evidence, and Haskell retirement approval remain open.

Progress 2026-06-27 Core Hub ops evidence contract:

- Completed `CUST-WP-0025-T14` by adding Core Hub spec `/home/worsch/core-hub/docs/specs/ops-evidence-contract.md` and linking it from the Core Hub specs index.
- The spec defines API resources, non-secret evidence fields, event vocabulary, service-inventory-to-widget/event mapping, readiness-summary inputs, and read-model gaps to close before UI expansion or cutover claims.
- T07 sequencing now keeps `T16` and `T17` open; T14 no longer blocks the Core Hub replacement lane.

Progress 2026-06-27 CUST-WP-0052 closeout:

- `CUST-WP-0052` is finished. It closed the Core Hub reframe, rewrote `CUST-WP-0025-T13` through `T19`, aligned the build/release lane with HelixForge/Railiance Forge practice, and posted non-secret State Hub requirements to `railiance-apps` and `railiance-forge`.
- The remaining T07 gates are execution gates, not sequencing ambiguity: `T16` (deployed evidence) and `T17` (cutover) are still waiting. `T14` is complete as the ops evidence contract definition gate.

Progress 2026-06-27 IAM Profile integration:

- Completed `CUST-WP-0025-T03` by adding Core Hub's reusable IAM Profile verifier/dependency and a FastAPI fixture integration test covering OIDC discovery, JWKS, authorization-code + PKCE token issuance, protected endpoint access, required IAM Profile claims, missing-token rejection, wrong-audience rejection, and production rejection of local-development issuers.
- Remaining T07 gates are now `CUST-WP-0025-T16` and `T17`; identity no longer blocks the Core Hub replacement lane.
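
The rejection cases exercised by the IAM Profile integration test can be sketched as claim checks on an already signature-verified token payload. This is not Core Hub's verifier: JWKS signature validation and OIDC discovery are out of scope here, and the `check_claims` helper, the local-issuer host list, the `core-hub` audience, and the required-claim names in the example are assumptions.

```python
from urllib.parse import urlsplit

# Hosts treated as local-development issuers (an assumption for this sketch).
LOCAL_ISSUER_HOSTS = {"localhost", "127.0.0.1", "::1"}


class ClaimError(ValueError):
    """Raised when verified token claims fail a policy check."""


def check_claims(claims: dict, *, audience: str, required: tuple[str, ...], production: bool) -> None:
    """Apply audience, issuer, and required-claim rules to a signature-verified payload."""
    if not claims:
        raise ClaimError("missing token")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise ClaimError("wrong audience")
    issuer_host = urlsplit(str(claims.get("iss") or "")).hostname or ""
    if not issuer_host:
        raise ClaimError("missing issuer")
    is_local = issuer_host in LOCAL_ISSUER_HOSTS or issuer_host.endswith(".localhost")
    if production and is_local:
        raise ClaimError("local-development issuer rejected in production")
    missing = [name for name in required if name not in claims]
    if missing:
        raise ClaimError(f"missing claims: {missing}")


if __name__ == "__main__":
    token = {"iss": "http://localhost:8080/realms/dev", "aud": "core-hub", "sub": "u1"}
    try:
        check_claims(token, audience="core-hub", required=("sub",), production=True)
    except ClaimError as err:
        print(err)  # local-development issuer rejected in production
```

Rejecting local-development issuers only when `production=True` keeps the same checks usable against a local fixture issuer in tests.
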

Progress 2026-06-27 Core Hub operator UI first screens:

- Completed `CUST-WP-0025-T18` from Core Hub evidence: `CORE-WP-0006` is finished with the protected `/console` prototype and `CORE-WP-0008-T06` extracted the compact rebuild backlog in `/home/worsch/core-hub/docs/specs/operator-ui-rebuild-backlog.md`.
- Fresh Core Hub verification passed with `make visual-check`, covering desktop/mobile screenshots, protected-route behavior, no-overlap, horizontal-overflow, PNG validation, and full-key non-disclosure.
- Remaining T07 execution gates are now `CUST-WP-0025-T16` deployed evidence and `T17` cutover decision coupling; both still require staging/runtime custody or migration evidence.

Progress 2026-07-02 Core Hub staging deployed:

- The T07 Core Hub replacement lane got its first deployed proof: staging on CoulombCore with the full smoke evidence chain (see `CUST-WP-0025-T16` progress and `CORE-WP-0004-T03`, now done). Two defects were fixed en route: missing `psycopg2-binary` in the runtime image and the apps-pg NetworkPolicy label requirement (`railiance.io/postgres-client=apps-pg`).
- Remaining Core Hub gates: activity-core sink smoke against staging (runtime token + widget mapping), staging import (`CORE-WP-0005-T02`), dual-run/cutover readiness, and Haskell retirement approval.

## Task: Create The Stable Pickup Checkpoint

```task
id: CUST-WP-0051-T08
status: done
priority: high
state_hub_task_id: "2cc0a127-a749-4228-962e-f8c9b693a1b3"
```

Close this metaplan by creating an operator-friendly checkpoint.

Minimum contents:

- active workstream list with zero stale runbooks and zero contradictory task states;
- blocker board showing no unowned credential, access, or approval gates;
- daily automation evidence from the latest successful scheduled run;
- production service status summary for State Hub, Inter-Hub, ops-hub evidence, issue-core, Forgejo, and artifact-store;
- explicit next-pick list for remaining strategic tasks.
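
The first checkpoint bullet (zero stale runbooks, zero contradictory task states) can be checked mechanically from workplan files. A minimal sketch, assuming the frontmatter and `task` block layout used in this workplan; the set of open statuses and the finding messages are illustrative, and this is not the `make fix-consistency` implementation.

```python
import re

# Three backticks, built at runtime so no source line starts a Markdown fence.
FENCE = "`" * 3
OPEN_TASK_STATUSES = {"todo", "progress", "wait"}
CLOSED_PLAN_STATUSES = {"done", "finished"}

FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.S)
TASK_BLOCK_RE = re.compile(rf"^{FENCE}task\n(.*?)^{FENCE}", re.S | re.M)
STATUS_RE = re.compile(r"^status:\s*(\S+)", re.M)


def hygiene_findings(text: str) -> list[str]:
    """Flag closed plans that still hold open tasks, and active plans with no task blocks."""
    plan_status = None
    frontmatter = FRONTMATTER_RE.search(text)
    if frontmatter:
        match = STATUS_RE.search(frontmatter.group(1))
        plan_status = match.group(1) if match else None

    task_statuses = []
    for block in TASK_BLOCK_RE.findall(text):
        match = STATUS_RE.search(block)
        if match:
            task_statuses.append(match.group(1))

    findings = []
    open_count = sum(1 for status in task_statuses if status in OPEN_TASK_STATUSES)
    if plan_status in CLOSED_PLAN_STATUSES and open_count:
        findings.append(f"status {plan_status} but {open_count} open task(s)")
    if plan_status == "active" and not task_statuses:
        findings.append("active workstream with no task blocks")
    return findings
```

Under these assumptions, a file shaped like the original `CUST-WP-0014` (frontmatter `status: done`, six `todo` task blocks) would be reported as `status done but 6 open task(s)`, and a runbook file with `status: active` and no task blocks would get the second finding.
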

Done when a future agent can start from the checkpoint and choose the next workplan without reconstructing this review.

Completed 2026-06-27: added `docs/infrastructure-stabilization-pickup-checkpoint.md` with the live active workstream list, named blocker board, latest daily-triage evidence, production service status summary, and next-pick sequence. This closes the handoff surface for future agents while the child workplans remain the execution source of truth.