---
id: CUST-WP-0051
type: workplan
title: "Infrastructure Stabilization Metaplan"
domain: infotech
repo: the-custodian
status: active
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 51
created: "2026-06-27"
updated: "2026-07-02"
state_hub_workstream_id: "21cabc98-3f80-4d00-b3b7-06e2ac2af88f"
---

# CUST-WP-0051 - Infrastructure Stabilization Metaplan

## Goal

Drive the registered infrastructure workplans from a scattered blocked state to
a stable checkpoint where:

- active blockers have a named owner, route, and next command or decision;
- production credential work uses approved custody paths only;
- daily operational automation has one healthy runner and clean evidence;
- State Hub registration reflects the real file state;
- unfinished strategic work is sequenced into clear follow-on lanes.

This workplan does not replace the child workplans. It is the coordination lane
for removing cross-workplan blocks and creating a reliable handoff point.

## Review Snapshot

Reviewed on 2026-06-27 from State Hub and the repo workplan files.

Active registered workstreams with open work:

| Workstream | Open state | Main stabilization meaning |
| --- | --- | --- |
| artifact-store-wp-0007 | 5 todo | Object-store compatibility and STS credential vending lane. |
| ihub-wp-0022 | 3 wait, 5 done | Ops-hub evidence intake waits on widget seed, runtime key, and smoke. |
| cust-wp-0047 | 1 wait, 6 done | Ops-hub now view waits on Inter-Hub widget activation. |
| cust-wp-0049 | 1 wait, 5 done | Access lane is ready; live bootstrap needs approved admin execution. |
| activity-wp-0016 | 1 wait, 2 progress, 5 todo, 2 done | Daily-triage output robustness needs live deploy/smoke evidence. |
| three-phoenix-ha-cluster | 7 todo | Target HA substrate is planned but not executed. |
| staged-promotion-lifecycle | finished, 7 done | Promotion discipline is ready for broad production cutovers. |
| rail-ho-wp-0005 | 11 todo, 1 progress | Forgejo production migration needs human design and cutover decisions. |
| cust-wp-0045-cutover-runbook | 0 tasks | Registered runbook appears as an active no-task workstream. |
| net-wp-0020 | 2 wait, 1 todo, 2 done | OpenBao unseal custody models still need operator profile decisions. |
| issue-wp-0003 | 2 progress, 5 done | issue-core deploy is close; finish live wiring and runbook evidence. |
| activity-wp-0006 | 1 wait, 1 todo, 6 done | Three-run calibration waits on the daily-triage live gate. |
| cust-wp-0038 | 8 todo | Full ThreePhoenix State Hub HA migration remains a strategic follow-on. |
| cust-wp-0025 | 17 todo, 9 done | FOS hub bootstrap now depends on identity, ops-hub, and fin-hub lanes. |
| cust-wp-0011 | 3 todo, 6 done | Pragmatic State Hub railiance01 migration still needs cutover/stabilize/retire. |

Additional repo-local hygiene issue:

- `CUST-WP-0014` has frontmatter `status: done` but all six task blocks are
  still `todo`. Either archive it as superseded, or reopen it as a focused
  State Hub sync-health workplan.

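The done-with-todo contradiction is mechanical enough to catch before a consistency run. A minimal Python sketch, assuming the frontmatter and the `task` fenced blocks use plain `key: value` lines as this workplan does; the function names and the state sets are illustrative, not an existing Custodian or State Hub tool.

```python
import re
from typing import List, Optional

FENCE = "`" * 3  # triple backtick, spelled out so this sketch nests cleanly in Markdown
OPEN_TASK_STATES = {"todo", "progress", "wait"}
CLOSED_PLAN_STATES = {"done", "finished"}


def frontmatter_status(text: str) -> Optional[str]:
    """Return the `status:` value from the leading frontmatter, if any."""
    match = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    if match is None:
        return None
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "status":
            return value.strip()
    return None


def task_statuses(text: str) -> List[str]:
    """Return the `status:` value of every task block, in document order."""
    block = re.compile(FENCE + r"task\n(.*?)\n" + FENCE, re.DOTALL)
    statuses = []
    for body in block.findall(text):
        for line in body.splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "status":
                statuses.append(value.strip())
    return statuses


def is_contradictory(text: str) -> bool:
    """True when the plan claims to be closed but still has open task blocks."""
    if frontmatter_status(text) not in CLOSED_PLAN_STATES:
        return False
    return any(status in OPEN_TASK_STATES for status in task_statuses(text))
```

Run over every workplan file, a check like this would flag `CUST-WP-0014` (frontmatter `done`, six `todo` task blocks) while leaving a `backlog` plan with open tasks alone.
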
State Hub hygiene issue:

- There are stale `needs_human` flags on completed or cancelled tasks. These do
  not all block execution, but they make the operator view noisier and should be
  cleared or annotated after the source workplans are reconciled.

## Review Refresh 2026-07-02

Re-reviewed against the live State Hub active-workplan list. Changes since the
2026-06-30 refinements:

- New platform credential lanes registered in `/home/worsch/railiance-platform`
  formalize the T02 custody board into executable workplans:
  `RAILIANCE-WP-0005` (credential request and lease broker; T07 wait, T09/T10
  progress), `RAILIANCE-WP-0008` (OpenBao approved automation delegation; T03
  progress), `RAILIANCE-WP-0009` (issue-core runtime ingestion key lane,
  `CCR-2026-0002`; T01/T02 done, T03–T07 wait on CCR approval/apply), and
  `RAILIANCE-WP-0010` (llm-connect OpenRouter provider key lane,
  `CCR-2026-0003`; T01 progress, T03–T07 wait).
- New cluster deploy lanes in `/home/worsch/railiance-cluster`:
  `RAIL-BS-WP-0008` packaged the ACTIVITY-WP-0016 robustness deploy as
  `make deploy-activity-core-triage-robustness`, with coupled schema/executor
  gating, runtime-Instruction contract checks (top-7 bound, NDJSON framing,
  `max_tokens` ≥ 1800), a post-deploy triage trigger, and State Hub evidence
  polling. `RAIL-BS-WP-0009` packaged the ACTIVITY-WP-0012-T05 no-restart
  admin-sync smoke as `make admin-sync-smoke`. Both await live execution on
  railiance01; this is now the single next action for T04.
- The T05 issue-core REST flip remains gated on `RAILIANCE-WP-0009` T03–T05
  (apply/provision/verify). T01 evidence shows the
  `issue-core/issue-core-runtime` ExternalSecret is already `Ready=True` and
  synced, so the remaining gap is CCR approval plus non-secret verification,
  not missing plumbing.

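The RAIL-BS-WP-0008 contract checks can be pictured as a small validator. This sketch assumes the runtime Instruction is available as a dict with hypothetical `top_n`, `framing`, and `max_tokens` keys; the real bundle format is not described here, so treat the key names as placeholders.

```python
from typing import Any, Dict, List

MAX_TOP_N = 7          # bounded top-7 recommendations
MIN_MAX_TOKENS = 1800  # token headroom floor


def contract_violations(instruction: Dict[str, Any]) -> List[str]:
    """Return readable violations; an empty list means the contract holds.

    The keys `top_n`, `framing`, and `max_tokens` are hypothetical stand-ins
    for wherever the real runtime Instruction stores these settings.
    """
    problems = []
    top_n = instruction.get("top_n")
    if not isinstance(top_n, int) or not 1 <= top_n <= MAX_TOP_N:
        problems.append(f"top_n must be an int in 1..{MAX_TOP_N}, got {top_n!r}")
    if instruction.get("framing") != "ndjson":
        problems.append("framing must be ndjson (one JSON object per line)")
    max_tokens = instruction.get("max_tokens")
    if not isinstance(max_tokens, int) or max_tokens < MIN_MAX_TOKENS:
        problems.append(f"max_tokens must be >= {MIN_MAX_TOKENS}, got {max_tokens!r}")
    return problems
```

Running a check of this kind before the post-deploy triage trigger would stop a non-compliant Instruction from reaching railiance01 at all.
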
## Dependency Shape

The critical path is:

1. Credential and operator-access custody:
   OpenBao, Inter-Hub operator key, ops-hub runtime key, Forgejo SMTP/cutover
   approvals, and OpenBao unseal profile decisions.
2. Ops evidence and daily automation:
   Inter-Hub ops-hub records, activity-core daily-triage robustness deployment,
   schema-valid smoke, then three clean scheduled runs.
3. Production substrate and source forge:
   issue-core GitOps pilot, Forgejo production migration, artifact-store STS,
   staged promotion, and State Hub migration strategy.
4. Federation buildout:
   identity completion, Core Hub replacement evidence, ops-hub scaffold reset,
   fin-hub scaffold, and business/runway canon.

## Task: Normalize Registry And Workplan Hygiene

```task
id: CUST-WP-0051-T01
status: done
priority: high
state_hub_task_id: "7e83bd50-5ca2-4341-9d18-65512e3f0442"
```

Clean up the planning substrate before execution work resumes.

Minimum scope:

- Decide whether `CUST-WP-0045-cutover-runbook` should stay registered as an
  active workstream or be represented only as a runbook under `CUST-WP-0045`.
- Resolve `CUST-WP-0014`: archive as superseded, or reopen and re-scope the six
  remaining State Hub sync-health tasks.
- Clear or annotate stale `needs_human` flags on done/cancel tasks after source
  workplans confirm they are no longer live gates.
- Run State Hub consistency after file changes.

Done when the active workstream list no longer contains no-task runbooks or
contradictory done-with-todo files, and the human-needed view shows only live
human gates.

Progress 2026-06-27:

- `CUST-WP-0045-cutover-runbook` now has `status: finished`; State Hub no
  longer lists it as an active workstream.
- `CUST-WP-0014` is reopened as `backlog` with its task detail preserved, so it
  is no longer a contradictory done-with-todo file or an active queue item.
- `make fix-consistency REPO=the-custodian` passed with pre-existing C-12
  warnings and synced the lifecycle changes into State Hub.

Completed 2026-06-27: cleared 15 stale `needs_human` flags from tasks that
were already `done` or `cancel`, leaving live `todo`/`progress`/`wait` human
gates untouched. T01 is complete.

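Clearing stale `needs_human` flags is a filter over task status. A sketch, assuming tasks can be exported from State Hub as (id, status, needs_human) records; `TaskFlag` and `partition_flags` are hypothetical names and no State Hub API is called.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

CLOSED_STATES = {"done", "cancel"}
LIVE_STATES = {"todo", "progress", "wait"}


@dataclass(frozen=True)
class TaskFlag:
    task_id: str
    status: str
    needs_human: bool


def partition_flags(tasks: Iterable[TaskFlag]) -> Tuple[List[str], List[str]]:
    """Split flagged tasks into (stale, live) id lists.

    stale: needs_human is set on a closed task, so it is safe to clear.
    live:  needs_human is set on an open task, so it must stay untouched.
    Unflagged tasks are ignored.
    """
    stale: List[str] = []
    live: List[str] = []
    for task in tasks:
        if not task.needs_human:
            continue
        if task.status in CLOSED_STATES:
            stale.append(task.task_id)
        elif task.status in LIVE_STATES:
            live.append(task.task_id)
    return stale, live
```

Only the `stale` list is a candidate for clearing; the `live` list is what the human-needed view should still show.
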
## Task: Establish One Credential-Custody Unblock Board

```task
id: CUST-WP-0051-T02
status: done
priority: high
state_hub_task_id: "312bde29-4370-4352-b5a3-00a8c4fe2059"
```

Collect the live operator-access decisions in one non-secret board.

Inputs:

- `CUST-WP-0049-T06`: Inter-Hub admin access or deployment-side bootstrap path.
- `IHUB-WP-0022-T04`: ops-hub runtime `OPS_HUB_KEY` custody.
- `NET-WP-0020`: OpenBao unseal custody and SSH automation profile.
- `RAIL-HO-WP-0005`: Forgejo hostname, SMTP, runner, backup, cutover, rollback,
  and retirement decisions.

Rules:

- Do not put secrets in Git, State Hub, workplans, or chat.
- Use `warden route find` / `warden route show` before requesting credentials.
- Treat ops-warden as an SSH certificate authority only, not as a secret store.

Done when each human/operator gate has an owner, approved route, expected
execution host, non-secret evidence target, and fallback decision.

Completed 2026-06-27: added `docs/credential-custody-unblock-board.md` with
route records, live gate owners, expected execution hosts, non-secret evidence
targets, fallback decisions, and pickup order. Route lookup was verified through
`/home/worsch/ops-warden` using `uv run warden route show ... --json` because
the globally installed `warden` lacks the `route` subcommand.

Refined 2026-06-27: added `docs/ops-warden-secret-posture-review.md` and updated
the unblock board/checkpoint to consume ops-warden's `warden access` assist
boundary plus WARDEN-WP-0015 environment-posture/workload-maturity triage. This
reclassifies vague IT-security blockers as dev/test doubles, owner-routed
production custody gates, or genuine maturity/posture violations.

Refined 2026-06-30: closed the adjacent ops-warden policy-gate support lanes
without changing ops-warden itself. `/home/worsch/flex-auth` `FLEX-WP-0007`
finished at commit `339c35e`, and `/home/worsch/secrets-engine`
`SECRETS-WP-0004` finished at commit `e0ab1b8`. Non-secret evidence records the
deployed flex-auth runtime, `decision:032b096c433ad80c`,
`ttl_out_of_bounds`, backend `vault`, and the scoped `warden-sign` OpenBao lane.
`policy.enabled` remains intentionally off until testing/production maturity, so
this gate is verified and banked rather than live-enforced.

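The T02 done condition amounts to five required non-secret fields per gate. A sketch of that completeness check with a hypothetical `GateRecord` shape; the real board is the Markdown file `docs/credential-custody-unblock-board.md`, and none of these fields should ever hold a secret value.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class GateRecord:
    """One human/operator gate on the unblock board (non-secret fields only)."""
    gate: str
    owner: Optional[str] = None
    approved_route: Optional[str] = None
    execution_host: Optional[str] = None
    evidence_target: Optional[str] = None
    fallback_decision: Optional[str] = None


def missing_fields(record: GateRecord) -> List[str]:
    """Names of the required board fields that are still empty, in field order."""
    return [
        f.name
        for f in fields(record)
        if f.name != "gate" and not getattr(record, f.name)
    ]
```

A gate would count as ready for pickup only when `missing_fields` returns an empty list.
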
## Task: Close The Ops-Hub Inter-Hub Evidence Lane

```task
id: CUST-WP-0051-T03
status: progress
priority: high
state_hub_task_id: "d6c3a39e-629d-47e4-b589-9e1a0273d9fa"
```

Finish the linked ops-hub activation chain:

- Execute `CUST-WP-0049-T06` using the approved access route.
- Close `CUST-WP-0047-T05` by proving ops-hub widgets exist and accept evidence
  events.
- Unblock `IHUB-WP-0022` by provisioning the runtime key through the approved
  secret path and running the end-to-end evidence submission smoke.

Done when ops inventory probes and activity-core evidence can land in Inter-Hub
without manual SQL or secret exposure.

Progress 2026-06-27:

- Added `docs/ops-hub-interhub-evidence-lane-status.md` with non-secret public
  probe evidence. Production Inter-Hub has an `ops-hub` row, and the ops-hub
  seed vocabulary is visible on public registry endpoints.
- Protected widget, manifest, and hub-registry surfaces correctly require
  authentication; no runtime-key smoke was attempted.
- New blocker surfaced: the older `IHUB-WP-0022` activity-core mapping contract
  names event types, policy scope, aggregate widget refs, and widget types that
  do not match the live ops-hub seed vocabulary. Align that contract before an
  attended bootstrap/runtime-key smoke, or the operator key may still hit
  manifest/schema failures.

Progress 2026-06-27 contract alignment:

- Updated `/home/worsch/inter-hub` contract docs for `IHUB-WP-0022` to target
  the live ops-hub seed vocabulary. The old `ops-service-observed` and
  `ops-inventory-drift` names are transition aliases, `ops-access-path-checked`
  is deferred to fallback until supported, and payload examples now post only
  live manifest event types.
- Ran `make fix-consistency REPO=inter-hub`; it passed with pre-existing C-12
  warnings and synced the IHUB-WP-0022 description drift into State Hub.
- The remaining T03 gates are authenticated widget lookup, any missing
  backup/risk seed widget, runtime-key custody, and the protected submission
  smoke.

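The contract alignment sorts each event type into live, transition alias, deferred, or unknown. A sketch of that classification; the live manifest vocabulary and the alias targets are not listed in this workplan, so they are passed in as parameters rather than guessed, and `classify_event_type` is a hypothetical name.

```python
from typing import Dict, Set


def classify_event_type(
    name: str,
    live_types: Set[str],
    aliases: Dict[str, str],
    deferred: Set[str],
) -> str:
    """Classify an event type name against the live ops-hub manifest.

    Returns "live", "alias:<target>", "deferred", or "unknown". An alias only
    counts if its target is itself a live type.
    """
    if name in live_types:
        return "live"
    if name in aliases and aliases[name] in live_types:
        return f"alias:{aliases[name]}"
    if name in deferred:
        return "deferred"
    return "unknown"
```

Payload builders would then post only names classified as `live` (or the resolved alias target) and send `deferred` or `unknown` names down the fallback path.
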
Progress 2026-06-27 Core Hub pivot:

- Created `CUST-WP-0052` to drive the reframe from old Inter-Hub production
  bootstrap toward a Core Hub-owned replacement implementation.
- Treat remaining Inter-Hub evidence as legacy compatibility or fallback
  evidence. Do not spend new design work on Haskell Inter-Hub unless it is
  needed for migration proof or rollback.
- The next implementation lane should be Core Hub API first, CLI second, and
  web UI third, with whynot-design used for the rebuilt UI where practical.

## Task: Stabilize Daily-Triage Automation

```task
id: CUST-WP-0051-T04
status: progress
priority: high
state_hub_task_id: "42810d3b-5557-4efd-871b-65bef7c19e0e"
```

Finish the activity-core daily-triage reliability lane.

Sequence:

1. Deploy the `activity-wp-0016` robustness bundle: bounded prompt/schema,
   per-item parsing, quarantine lane, and producer guardrails.
2. Run a schema-valid live daily-triage smoke on railiance01.
3. Collect three clean scheduled runs with matching activity-core, State Hub,
   and working-memory evidence.
4. Close `activity-wp-0006` calibration and decide the fate of the
   `CUST-WP-0045` cutover runbook registration.

Done when there is exactly one trusted daily-triage runner and the fallback
state is documented.

Progress 2026-06-27:

- Added `docs/daily-triage-stabilization-status.md` with the current evidence
  chain. The 2026-06-24 and 2026-06-25 scheduled runs were schema-valid; the
  2026-06-26 and 2026-06-27 runs reached State Hub and working memory but failed
  output validation at around character 5.2k.
- The primary blocker is no longer a silent schedule or State Hub sink outage.
  The live runner still needs the `ACTIVITY-WP-0016` code/schema bundle and the
  Railiance runtime prompt changes so that malformed tails degrade to
  quarantined partial output.
- Pickup sequence: deploy the WP-0016 code and schema together, update the
  runtime prompt bundle for bounded top-N, per-item framing, and token headroom,
  run a live railiance01 smoke, then restart the three-clean-run gate.
- Normalized the ACTIVITY-WP-0016 source task status in activity-core: T04 is
  done and T05 is progress, matching its own progress notes.
- Updated the activity-core daily-triage source notes: ACTIVITY-WP-0010-T02 is
  now done, T03/T04 point at the post-WP-0016 live smoke and three-run gate,
  and ACTIVITY-WP-0006-T03 records the 2026-06-27 validation failure.
- Cleared the stale human-needed flag from the completed bridge/config task and
  moved live intervention notes onto the deploy/smoke/calibration gate.

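The quarantine behaviour WP-0016 aims for, where a malformed tail degrades to partial output instead of failing the whole report, can be sketched as line-by-line NDJSON parsing. This is an illustration assuming one JSON object per line and a top-7 bound; `parse_triage_items` is a hypothetical name, not the activity-core recovery parser. As an assumed policy, lines past the bound are quarantined rather than silently dropped.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_triage_items(raw: str, top_n: int = 7) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Parse NDJSON triage output one line at a time.

    Valid JSON objects are kept up to `top_n`. Lines that fail to parse (for
    example a truncated tail), non-object values, and objects past the bound
    go to the quarantine list instead of failing the whole report.
    """
    items: List[Dict[str, Any]] = []
    quarantined: List[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            quarantined.append(line)
            continue
        if isinstance(item, dict) and len(items) < top_n:
            items.append(item)
        else:
            quarantined.append(line)
    return items, quarantined
```

Because each recommendation sits on its own line, a report cut off partway through its third item still yields the first two as usable output.
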
Progress 2026-06-30 daily-triage recheck:

- State Hub now shows three consecutive schema-valid scheduled `daily_triage`
  events after the malformed 2026-06-26 and 2026-06-27 outputs:
  2026-06-28 `f0d8477e-1db9-4c07-bb8c-d28cbb868abc`, 2026-06-29
  `176d2ea7-f0e3-48cd-999b-4ab6055c6a55`, and 2026-06-30
  `27d695b2-a537-481b-ada6-ca84ec24cd96`; all wrote working memory.
- This banks the scheduling/sink/schema-validity streak for
  `ACTIVITY-WP-0006-T03` calibration feedback, but not the full WP-0016
  live-proof gate, because the reports still emit 10 recommendations instead of
  honouring the bounded top-N contract.
- `/home/worsch/activity-core` currently has in-flight uncommitted changes for
  ACTIVITY-WP-0016 diagnostics and new ACTIVITY-WP-0018/0019
  automation-status/inventory workplans. Custodian should not overwrite or
  commit that worktree; the next clean handoff is for the activity-core owner to
  commit/sync or explicitly hand it off, then use the repo-native automation
  status surface as evidence.

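The difference between the banked schema-validity streak and the full WP-0016 gate can be made explicit in a gate check. A sketch with a hypothetical `TriageRun` record; it treats a run as clean only when it is schema-valid, wrote working memory, and stays within the top-N bound.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TriageRun:
    day: str              # ISO date of the scheduled run
    schema_valid: bool    # output passed schema validation
    wrote_memory: bool    # working-memory record was written
    recommendations: int  # number of recommendations emitted


def clean_streak(runs: List[TriageRun], top_n: int = 7) -> int:
    """Length of the trailing streak of fully clean runs (runs ordered oldest first)."""
    streak = 0
    for run in reversed(runs):
        clean = run.schema_valid and run.wrote_memory and run.recommendations <= top_n
        if not clean:
            break
        streak += 1
    return streak


def gate_passed(runs: List[TriageRun], required: int = 3, top_n: int = 7) -> bool:
    """True once the newest `required` runs are all clean."""
    return clean_streak(runs, top_n) >= required
```

Fed with the 2026-06-28 to 2026-06-30 runs, which are schema-valid but emit 10 recommendations each, `clean_streak` returns 0 under the top-7 bound, matching the note that the full live-proof gate is not yet met.
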
Progress 2026-07-02 deploy prep:

- Executed the preparable half of `RAIL-BS-WP-0008`: the activity-core runtime
  Instruction now satisfies the T02 contract in the repo bundle (activity-core
  commit `7612112`: bounded top-7 phrasing on one line, NDJSON-style per-item
  framing compatible with the WP-0016 recovery parser, `max_tokens` 1800), and
  `activity-core:railiance01-prod` was rebuilt locally from that commit.
- Live transfer/deploy to railiance01 is blocked by agent permission policy
  (production remote writes need explicit operator authorization), and
  per-read access to production logs is likewise gated, so
  `RAIL-BS-WP-0008-T03` (raw llm-connect response for the 2026-06-26 run) is
  also operator-owned.
- Found that `railiance01:~/activity-core` has no `.git`, while the deploy
  script's revision gate requires git metadata; this is noted in the workplan
  for the operator.
- Advanced `NET-WP-0020-T02` (OpenBao SOPS-held init/unseal automation) with a
  gated helper and Make targets in net-kingdom; see that workplan for detail.
- Refreshed `docs/infrastructure-stabilization-pickup-checkpoint.md` with an
  "Operator Pickups Ready Now" list of five one-command/one-decision items.

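The missing `.git` on railiance01 matters because a revision gate should fail closed when it cannot prove which commit is checked out. A sketch of such a gate; `deployed_revision`, `require_revision`, and `RevisionGateError` are hypothetical names, not the deploy script's actual code, and the only git command used is `git rev-parse HEAD`.

```python
import subprocess
from pathlib import Path


class RevisionGateError(RuntimeError):
    """Raised when the deploy checkout cannot prove which revision it is."""


def deployed_revision(checkout: Path) -> str:
    """Return the HEAD commit of `checkout`, failing closed without git metadata."""
    if not (checkout / ".git").exists():
        raise RevisionGateError(f"{checkout} has no .git; cannot verify revision")
    result = subprocess.run(
        ["git", "-C", str(checkout), "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        raise RevisionGateError(result.stderr.strip() or "git rev-parse failed")
    return result.stdout.strip()


def require_revision(checkout: Path, expected_prefix: str) -> str:
    """Fail closed unless the checkout's HEAD starts with the expected commit prefix."""
    revision = deployed_revision(checkout)
    if not revision.startswith(expected_prefix):
        raise RevisionGateError(f"expected {expected_prefix}, found {revision[:12]}")
    return revision
```

For example, `require_revision(Path.home() / "activity-core", "7612112")` would raise on railiance01 until the checkout carries git metadata, instead of deploying an unverified tree.
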
## Task: Finish Near-Term Production Service Lanes

```task
id: CUST-WP-0051-T05
status: progress
priority: medium
state_hub_task_id: "2083f0e4-e037-48bf-8069-f31e8db2fd95"
```

Move near-complete service workstreams to done before starting larger migrations.

Priority order:

- `issue-wp-0003`: finish activity-core wiring and the end-to-end GitOps runbook.
- `rail-ho-wp-0005`: resolve Forgejo production decisions, email recovery, and
  cutover approval gates.
- `artifact-store-wp-0007`: complete the MinIO compatibility and STS credential
  vending assessment if it is required by backup, registry, or app lanes.
- `secrets-wp-0003`: finish or explicitly park the whynot-design real npm
  publish pilot behind Gitea bot, OpenBao provisioning, route confirmation, and
  real package publish evidence.
- `staged-promotion-lifecycle`: make production promotion gates explicit before
  further cluster/source-forge cutovers.

Done when each lane is either finished or parked with a precise dependency and
no ambiguous human-needed state.

Progress 2026-06-27:

- Added `docs/near-term-production-service-lanes-status.md` with a lane board
  for issue-core, Forgejo, artifact-store, and staged promotion.
- issue-core is the immediate near-done lane: the service itself is healthy, but
  activity-core still points at port `8010` and runs with
  `ISSUE_SINK_TYPE=null`. Do not flip it to REST until `ISSUE_CORE_API_KEY` is
  injected into activity-core's runtime secret via route
  `activity-core-issue-sink`.
- Forgejo remains parked behind explicit production design decisions, SMTP/email
  recovery, package registry, Actions, backup/restore, migration drill, and
  cutover approval.
- artifact-store and staged promotion are executable planning/build lanes:
  artifact-store D7.1/D7.2 remains open; staged-promotion T02 is now complete
  ahead of broad production source-forge migration work.

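The issue-core ordering rule (inject the key, then flip the sink) can be enforced at startup. A sketch, assuming activity-core reads `ISSUE_SINK_TYPE` and `ISSUE_CORE_API_KEY` from its environment and that only `null` and `rest` matter here; `choose_issue_sink` and `SinkConfigError` are hypothetical. The error message never echoes the key.

```python
from typing import Mapping


class SinkConfigError(RuntimeError):
    """Raised when REST is requested but the issue-core key is not injected."""


def choose_issue_sink(env: Mapping[str, str]) -> str:
    """Return "rest" only when REST is requested and a key is present.

    Any value other than "rest" (including unset) maps to "null" in this
    sketch. Requesting REST without a key fails closed instead of starting a
    sink that would only produce auth errors.
    """
    requested = env.get("ISSUE_SINK_TYPE", "null").strip().lower()
    if requested != "rest":
        return "null"
    if not env.get("ISSUE_CORE_API_KEY", "").strip():
        raise SinkConfigError("ISSUE_SINK_TYPE=rest but ISSUE_CORE_API_KEY is not set")
    return "rest"
```

A check shaped like this keeps the flip order honest: the route delivers the key first, and only then does changing `ISSUE_SINK_TYPE` take effect.
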
Progress 2026-06-27 artifact-store D7.1/D7.2:

- Advanced `/home/worsch/artifact-store` `ARTIFACT-STORE-WP-0007`: D7.1 is
  done with `docs/minio-compatibility-landscape-2026-06-27.md`, which decides to
  pursue a compatibility profile instead of a direct MaxIO server fork.
- D7.2 is now `progress` with an opt-in live MinIO compatibility pytest harness
  (`tests/integration/test_storage_s3_minio.py`), `make test-minio`, and manual
  smoke docs in `docs/OPERATOR.md`.
- Verified artifact-store with `make test` (`110 passed, 2 skipped`), targeted
  Ruff checks for the new harness, direct harness execution (`2 skipped` without
  endpoint variables), and `git diff --check`. Repo-wide `make lint` still
  reports pre-existing Ruff format drift in seven untouched files.
- The remaining artifact-store gate is live evidence: run D7.2 against an
  approved MinIO-compatible endpoint with non-secret health, round-trip, and
  multipart output. D7.3 STS vending remains identity/platform-routed work.

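The opt-in shape of the D7.2 harness, skipping cleanly without endpoint variables, is a standard test pattern. The real harness uses pytest; this sketch swaps in the standard library's `unittest` so it is self-contained, and the `MINIO_COMPAT_ENDPOINT` variable name is hypothetical. The one live check hits MinIO's unauthenticated `/minio/health/live` liveness endpoint; the round-trip check is left as an explicit skip because it needs an S3 client and approved credentials.

```python
import os
import unittest
import urllib.request

# Hypothetical variable name; the real harness documents its own in docs/OPERATOR.md.
ENDPOINT_VAR = "MINIO_COMPAT_ENDPOINT"
ENDPOINT = os.environ.get(ENDPOINT_VAR, "").rstrip("/")


@unittest.skipUnless(ENDPOINT, f"{ENDPOINT_VAR} not set; live MinIO checks are opt-in")
class LiveMinioCompatibility(unittest.TestCase):
    def test_liveness_endpoint(self) -> None:
        # MinIO serves an unauthenticated liveness probe at /minio/health/live.
        with urllib.request.urlopen(f"{ENDPOINT}/minio/health/live", timeout=5) as resp:
            self.assertEqual(resp.status, 200)

    def test_round_trip(self) -> None:
        # A real round-trip/multipart check needs an S3 client and approved
        # credentials, so this sketch keeps secret handling out entirely.
        self.skipTest("round-trip requires an S3 client and approved credentials")


if __name__ == "__main__":
    unittest.main()
```

With the variable unset, both tests are reported as skipped, the same two-skip shape recorded above for the real harness.
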
Progress 2026-06-27 staged promotion:

- Completed `RAIL-BS-WP-0006-T02` in `/home/worsch/railiance-cluster`.
  Added `docs/app-toml-contract.md`, `schemas/railiance-app.schema.json`,
  and `examples/railiance/app.toml`, defining the repository-local
  `railiance/app.toml` declaration for identity, ownership, source/artifact
  policy, platform dependencies, secret references without plaintext values,
  observability, stage commands/checks/evidence, canary/promotion modes,
  rollback, and human approval gates.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing
  C-12 warnings and synced the T02 status into State Hub.
- T02 through T07 are complete; the staged-promotion lifecycle is finished.

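The `railiance/app.toml` rule of secret references without plaintext values can be checked mechanically. A sketch assuming a hypothetical `[secrets.NAME]` table layout with a `ref` key; the authoritative shape is `docs/app-toml-contract.md` plus `schemas/railiance-app.schema.json`, which this does not reproduce. Loading uses `tomllib`, part of the standard library from Python 3.11.

```python
from typing import Any, Dict, List

# Hypothetical key names that would indicate an inline secret value.
FORBIDDEN_SECRET_KEYS = {"value", "plaintext", "password", "token"}


def secret_reference_problems(app: Dict[str, Any]) -> List[str]:
    """Report secret entries that lack a `ref` or carry inline values."""
    problems = []
    for name, entry in app.get("secrets", {}).items():
        if not isinstance(entry, dict):
            problems.append(f"secrets.{name}: expected a table with a ref")
            continue
        if "ref" not in entry:
            problems.append(f"secrets.{name}: missing ref")
        leaked = sorted(FORBIDDEN_SECRET_KEYS.intersection(entry))
        if leaked:
            problems.append(f"secrets.{name}: inline keys not allowed: {', '.join(leaked)}")
    return problems


def load_app_toml(path: str) -> Dict[str, Any]:
    """Parse an app.toml file (requires Python 3.11+ for tomllib)."""
    import tomllib

    with open(path, "rb") as handle:
        return tomllib.load(handle)
```

Schema validation would still be the primary gate; a targeted check like this just gives a clearer message for the one rule that must never slip.
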
Progress 2026-06-27 staged promotion T03:

- Completed `RAIL-BS-WP-0006-T03` in `/home/worsch/railiance-cluster`.
  Added `docs/overlay-repo-pattern.md`,
  `tools/create_railiance_overlay_repo.sh`, and the
  `bin/railiance create-overlay` dispatcher entry. The scaffold writes a
  separate overlay repo with `railiance/upstream.toml`, a schema-valid
  `railiance/app.toml`, stage values, a thin Helm chart, a Stage 1 test script,
  a rollback runbook, and promotion notes, without cloning upstream code or
  handling secrets.
- Verified the generated Forgejo overlay sample against
  `schemas/railiance-app.schema.json`; the generated Stage 1 script ran with
  Helm skipped because Helm is unavailable in this environment.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing
  C-12 warnings and synced the T03 status into State Hub.

Progress 2026-06-27 staged promotion T04:

- Completed `RAIL-BS-WP-0006-T04` in `/home/worsch/railiance-cluster`.
  Added `tools/cmd/railiance-run`, the `bin/railiance run` dispatcher entry,
  and `docs/railiance-run-command.md`. The command reads `railiance/app.toml`,
  runs Stage 1 commands and local checks, and emits a
  `railiance.run-result.v1` JSON result containing command references and
  scrubbed HTTP URLs rather than command logs, stdout/stderr, or
  secret-bearing URL details.
- Updated generated overlays so a Forgejo overlay completes Stage 1 locally:
  `stage1-script` is required, `local-health` is optional when no local service
  is running, and Helm rendering remains optional when Helm is unavailable.
- Verified a fresh generated Forgejo overlay against
  `schemas/railiance-app.schema.json` and `bin/railiance run`; the smoke passed
  with one command, two checks, and zero required failures.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing
  C-12 warnings and synced the T04 status into State Hub.

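Scrubbing HTTP URLs before they reach a `railiance.run-result.v1` record means dropping every part that can carry credentials: userinfo, query string, and fragment. A minimal sketch of that idea with `urllib.parse`; `scrub_url` is a hypothetical helper, not the `railiance-run` implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def scrub_url(url: str) -> str:
    """Keep scheme, host, port, and path; drop userinfo, query, and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # re-bracket IPv6 literals such as ::1
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port else host
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))
```

For example, `https://user:pw@forgejo.example:3000/api/healthz?token=abc` becomes `https://forgejo.example:3000/api/healthz`, which is still enough to say which endpoint a check hit.
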
Progress 2026-06-27 staged promotion T05:

- Completed `RAIL-BS-WP-0006-T05` in `/home/worsch/railiance-cluster`.
  Generated overlays now include a Stage 2 canary Helm template with
  stable/canary release identities, isolated ingress by default, optional
  Traefik weighted routing, Prometheus annotations, HTTP probes, conservative
  resource limits, rollback-safe Stage 2/Stage 3 values, and
  `tests/stage2-template.sh`.
- Verified a fresh generated Forgejo overlay with schema validation,
  `tests/stage1.sh`, `tests/stage2-template.sh`, and `bin/railiance run`.
  Helm rendering was skipped because Helm is unavailable in this environment.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing
  C-12 warnings and synced the T05 status into State Hub.

Progress 2026-06-27 staged promotion T06:

- Completed `RAIL-BS-WP-0006-T06` in `/home/worsch/railiance-cluster`.
  Added `tools/cmd/railiance-stage2` plus `bin/railiance deploy` and
  `bin/railiance observe` dispatch. Both commands default to non-mutating
  plans; apply/live modes fail closed on missing prerequisites.
- Verified a fresh generated Forgejo overlay with schema validation,
  `tests/stage1.sh`, `tests/stage2-template.sh`, the Stage 2 deploy plan, the
  Stage 2 observe plan, and a blocked apply without approval/Helm.
- `make fix-consistency REPO=railiance-cluster` passed with pre-existing
  C-12 warnings and synced the T06 status into State Hub.

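The Stage 2 commands' plan-by-default, fail-closed behaviour reduces to a small decision function. A sketch with hypothetical names (`Stage2Request`, `apply_blockers`, `decide_mode`); it checks only the two prerequisites named for T06, an approval reference and a Helm binary located with `shutil.which`.

```python
import shutil
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage2Request:
    apply: bool = False                 # plan-only unless explicitly requested
    approval_ref: Optional[str] = None  # human approval record, if any


def apply_blockers(req: Stage2Request, helm_path: Optional[str]) -> List[str]:
    """Prerequisites missing for a mutating run; empty means apply may proceed."""
    blockers = []
    if not req.approval_ref:
        blockers.append("missing approval")
    if helm_path is None:
        blockers.append("helm not found on PATH")
    return blockers


def decide_mode(req: Stage2Request) -> str:
    """Return "plan" by default; "apply" only when every prerequisite is met."""
    if not req.apply:
        return "plan"
    blockers = apply_blockers(req, shutil.which("helm"))
    if blockers:
        raise PermissionError("apply blocked: " + ", ".join(blockers))
    return "apply"
```

Keeping the blocker list separate from the decision makes the refusal reason reportable as non-secret evidence.
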
Progress 2026-06-27 staged promotion T07 and finish:

- Completed `RAIL-BS-WP-0006-T07` in `/home/worsch/railiance-cluster`.
  Added `tools/cmd/railiance-stage3`, `bin/railiance promote`,
  `bin/railiance rollback`, and `docs/promote-rollback-onboarding.md`.
  Generated overlays now declare promote/rollback plan commands.
- Verified a fresh generated Forgejo overlay through the Stage 1 run, Stage 2
  deploy/observe plans, Stage 3 promote/rollback plans, and blocked apply paths
  for missing approval/Helm/revision evidence.
- Marked `RAIL-BS-WP-0006` `status: finished`;
  `make fix-consistency REPO=railiance-cluster` synced the finished workstream
  with only pre-existing C-12 orphan-row warnings.

Progress 2026-06-30 policy-gate support closeout:

- Closed `/home/worsch/flex-auth` `FLEX-WP-0007` from ops-warden's non-secret
  production smoke handoff. The deployed runtime at `127.0.0.1:18090` was used
  from CoulombCore; an allow produced `decision:032b096c433ad80c`, and an
  excessive TTL was denied with `ttl_out_of_bounds`.
- Closed `/home/worsch/secrets-engine` `SECRETS-WP-0004` from the same evidence:
  the scoped `warden-sign` OpenBao policy/AppRole lane was applied and used for
  the vault-backed smoke. No token, role id, secret id, accessor, or raw smoke
  log was recorded in Git or State Hub.
- This removes the `warden-sign` / `FLEX-WP-0007` blocker from CUST-WP-0051.
  The remaining production credential lanes are different gates:
  `SECRETS-WP-0003` real npm publish, activity-core -> issue-core,
  artifact-store live MinIO/STS evidence, and Forgejo migration credentials.

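The `ttl_out_of_bounds` denial follows a simple bounds check. A sketch of that decision shape; the reason string comes from the recorded evidence, but the bounds, the `Decision` type, and `check_ttl` are hypothetical and are not flex-auth's policy or API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds for illustration; the deployed policy owns the real ones.
MIN_TTL_SECONDS = 60
MAX_TTL_SECONDS = 8 * 60 * 60


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: Optional[str] = None


def check_ttl(requested_seconds: int) -> Decision:
    """Deny certificate requests whose TTL falls outside the configured window."""
    if not MIN_TTL_SECONDS <= requested_seconds <= MAX_TTL_SECONDS:
        return Decision(allow=False, reason="ttl_out_of_bounds")
    return Decision(allow=True)
```

Returning a machine-readable reason rather than a bare boolean is what lets a denial be banked as non-secret evidence.
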
Progress 2026-07-02 artifact-store lane finished:

- `ARTIFACT-STORE-WP-0007` is `finished`. D7.2 closed with a deterministic
  local MinIO fixture (`make test-minio-local`) and a live compatibility pass.
  D7.3 delivered `docs/sts-credential-vending-assessment.md` (key-cape as first
  issuer, Authelia rejected, no production object-storage consumers live yet).
  D7.4 landed STS temporary-credential support (session token config/env ref,
  per-client re-resolution of file refs, and live MinIO `AssumeRole` proof
  through tests and CLI health/verify). D7.5 routed the NetKingdom
  vending-service follow-up (hub message `b57b3403`) and the open-cmis-tck
  `reports/cmis-summary.md` producer gap (hub message `e5ba736d`), and closed
  the decision: MinIO compatibility profile plus STS, with the MaxIO fork
  deferred indefinitely.
- The T05 artifact-store bullet is complete. Remaining T05 lanes: the
  issue-core REST flip (waits on `CCR-2026-0002` approval plus key injection),
  Forgejo (human design decisions), and `SECRETS-WP-0003` (parked behind
  provisioning and publish approval).

## Task: Decide State Hub Migration Strategy

```task
id: CUST-WP-0051-T06
status: progress
priority: high
state_hub_task_id: "0ac3763f-eac0-4773-9be8-cb0a7979e444"
```

Choose and execute the State Hub stabilization path.

Decision:

- If the pragmatic railiance01 service is enough for the next operating period,
  finish `CUST-WP-0011`: cut over the MCP config, observe the stabilization
  window, then retire or retain the WSL2 fallback by explicit decision.
- If HA is now required, promote `CUST-WP-0038` and the ThreePhoenix HA cluster
  lane: readiness, storage/database strategy, HA API behavior, failover drill,
  restore drill, and endpoint/runbook update.

Done when the active State Hub path is singular, tested, and documented, and
the alternate path is either cancelled, deferred, or explicitly retained as a
future workplan.

Progress 2026-06-27:

- Added `docs/state-hub-migration-strategy-status.md` and selected
  the pragmatic `CUST-WP-0011` railiance01 path as the singular active
  State Hub stabilization lane.
- `CUST-WP-0011` is already through T01-T06: image pushed, cluster
  manifests defined, empty deploy healthy, migrations run, WSL2 data restored,
  row counts compared, and cluster API health/summary verified.
- The next gate is `CUST-WP-0011-T07`: explicit approval to freeze WSL2
  writes, restore the final dump, compare again, and redirect MCP/private access
  to the cluster endpoint.
- `CUST-WP-0038` and `RAIL-BS-WP-0007` remain deferred HA lanes until the
  pragmatic path stabilizes and the ThreePhoenix storage/database strategy is
  current.

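The T07 "compare again" step after the final restore is a per-table row-count diff. A sketch, assuming the counts have already been collected from the WSL2 and cluster databases into dicts keyed by table name; `row_count_mismatches` is a hypothetical name.

```python
from typing import Dict, List


def row_count_mismatches(source: Dict[str, int], target: Dict[str, int]) -> List[str]:
    """Describe tables whose counts differ or that exist on only one side."""
    problems = []
    for table in sorted(source.keys() | target.keys()):
        if table not in target:
            problems.append(f"{table}: missing on target")
        elif table not in source:
            problems.append(f"{table}: unexpected on target")
        elif source[table] != target[table]:
            problems.append(f"{table}: source={source[table]} target={target[table]}")
    return problems
```

An empty result is the precondition for redirecting MCP/private access; any entry should block the cutover while WSL2 writes are still frozen.
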
## Task: Sequence FOS Hub Bootstrap To Completion
|
||
|
||
```task
|
||
id: CUST-WP-0051-T07
|
||
status: progress
|
||
priority: medium
|
||
state_hub_task_id: "27b6828a-9e87-4135-a036-bce760c3057c"
|
||
```
|
||
|
||
Use the stabilized substrate to finish `CUST-WP-0025` without reviving the
|
||
mega-hub pattern.
|
||
|
||
Recommended order:
|
||
|
||
1. Keep the completed `CUST-WP-0025-T03` IAM Profile verifier/test as the
|
||
template for Core Hub auth consumers and future production issuer wiring.
|
||
2. Execute the remaining rewritten Core Hub Phase 3 lane: deployed Core Hub
|
||
smoke, activity-core Core Hub sink smoke, and migration/cutover readiness;
|
||
the whynot-aligned first UI screens are now closed as `CUST-WP-0025-T18`.
|
||
3. Keep `CUST-WP-0047-T05` and `CUST-WP-0049-T06` as legacy/fallback Inter-Hub
|
||
records until deployed Core Hub evidence or an explicit supersede decision
|
||
closes them.
|
||
4. Start fin-hub/business-model tasks only after identity and Core Hub ops-hub
|
||
evidence are proven enough to demonstrate the multi-hub pattern.
|
||
|
||
Done when `CUST-WP-0025` has no open foundational identity or ops-hub tasks and
|
||
fin-hub work is either started on a stable Core Hub pattern or deliberately
|
||
deferred with a dated condition.
|
||
|
||
Progress 2026-06-27:
|
||
|
||
- Added `docs/fos-hub-bootstrap-sequence-status.md` with the current sequence.
|
||
- Corrected the identity foundation baseline in `CUST-WP-0025`: the old
|
||
`NK-WP-0001` Keycloak task is cancelled as superseded, `NK-WP-0002` local
|
||
identity is done, and the remaining identity gate is the IAM Profile v0.2
|
||
FastAPI integration test.
|
||
- Current ops-hub reality is Core Hub replacement-first: `CORE-WP-0008`
|
||
finished the API smoke harness, activity-core sink, staging profile, CLI
|
||
wrappers, UI rebuild backlog, and Custodian handoff. `CUST-WP-0025-T13`-`T19`
|
||
have been rewritten away from the obsolete standalone scaffold.
|
||
- Fin-hub/business tasks remain deliberately deferred until identity integration
|
||
and ops-hub extension evidence are proven.
|
||
|
||
Progress 2026-06-30 Core Hub T16 route refinement:
|
||
|
||
- Rechecked the Core Hub replacement lane after the daily-triage checkpoint.
|
||
Core Hub is clean. Its remaining open gates are deployed evidence,
activity-core sink smoke, staging import, dual-run/cutover readiness, and
explicit Haskell retirement approval.
- Running `warden route find` for the Core Hub staging operator/runtime token
resolves to the OpenBao-owned `openbao-api-key`. It also returns
`key-cape-oidc-login` for interactive auth and `ops-bridge-tunnel` for private
endpoint access when needed. This is not an ops-warden secret request;
ops-warden only routes or assists eligible lanes as the caller.
- The next Core Hub proof requires `CORE_HUB_BASE_URL`, approved
operator/runtime token custody, and activity-core widget mapping. After that,
deployed-smoke and activity-core sink-smoke evidence is needed. That evidence
may record only non-secret IDs, prefixes, counts, statuses, and containment
booleans.
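The rule limiting smoke evidence to non-secret identifiers, prefixes, counts, statuses, and containment booleans can be enforced before anything is posted. Below is a minimal Python sketch. The field names and the eight-character prefix cap are assumptions for illustration; they are not taken from the Core Hub evidence contract.

```python
from typing import Any, Dict, List

# Hypothetical allowlist: field name -> exact value type permitted.
ALLOWED_FIELDS = {
    "run_id": str,             # non-secret identifier
    "token_prefix": str,       # short prefix only, never the full token
    "widget_count": int,
    "event_count": int,
    "status": str,
    "secret_contained": bool,  # containment boolean
}
MAX_PREFIX_LEN = 8  # assumed cap on how much of a credential may be shown

def evidence_problems(record: Dict[str, Any]) -> List[str]:
    """Return reasons the record is not safe to post; an empty list means OK."""
    problems = []
    for key, value in record.items():
        expected = ALLOWED_FIELDS.get(key)
        if expected is None:
            problems.append(f"field not allowed: {key}")
        # Exact type check: bool is a subclass of int, so isinstance()
        # would let True pass as a count.
        elif type(value) is not expected:
            problems.append(f"wrong type for {key}: {type(value).__name__}")
        elif key.endswith("_prefix") and len(value) > MAX_PREFIX_LEN:
            problems.append(f"{key} longer than {MAX_PREFIX_LEN} chars")
    return problems
```

An allowlist is used rather than a denylist. This means any new field is rejected until someone explicitly decides it is non-secret.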

Progress 2026-06-27 Core Hub reset:

- `CUST-WP-0052` completed the Phase 3 reset. `CUST-WP-0025-T13` through
`T19` now point at Core Hub-owned API evidence, CLI parity, deployed
smoke/cutover gates, whynot-aligned UI, and cancellation of immediate
standalone ops-hub MCP registration.
- Core Hub is now the preferred replacement lane, but staging import, deployed
dual-run smokes, cutover evidence, and Haskell retirement approval remain
open.

Progress 2026-06-27 Core Hub ops evidence contract:

- Completed `CUST-WP-0025-T14` by adding Core Hub spec
`/home/worsch/core-hub/docs/specs/ops-evidence-contract.md` and linking it
from the Core Hub specs index.
- The spec defines API resources, non-secret evidence fields, event vocabulary,
service-inventory-to-widget/event mapping, readiness-summary inputs, and
read-model gaps to close before UI expansion or cutover claims.
- T07 sequencing now keeps `T16` and `T17` open; `T14` no longer blocks the
Core Hub replacement lane.

Progress 2026-06-27 CUST-WP-0052 closeout:

- `CUST-WP-0052` is finished. It closed the Core Hub reframe, rewrote
`CUST-WP-0025-T13` through `T19`, aligned the build/release lane with
HelixForge/Railiance Forge practice, and posted non-secret State Hub
requirements to `railiance-apps` and `railiance-forge`.
- The remaining T07 gates are execution gates, not sequencing ambiguity:
`T16` (deployed evidence) and `T17` (cutover) are waiting. `T14` is complete
as the ops evidence contract definition gate.

Progress 2026-06-27 IAM Profile integration:

- Completed `CUST-WP-0025-T03` by adding Core Hub's reusable IAM Profile
verifier/dependency and a FastAPI fixture integration test covering OIDC
discovery, JWKS, authorization-code + PKCE token issuance, protected endpoint
access, required IAM Profile claims, missing-token rejection, wrong-audience
rejection, and production rejection of local-development issuers.
- Remaining T07 gates are now `CUST-WP-0025-T16` and `T17`; identity no longer
blocks the Core Hub replacement lane.
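The claim checks in the IAM Profile integration test can be expressed against a claim set whose signature has already been verified. Below is a minimal Python sketch; it is not Core Hub's verifier. The required-claim set uses standard JWT registered claims as a stand-in for the IAM Profile v0.2 claims. Treating `localhost`-style issuer hosts as local-development issuers is also an assumption.

```python
from typing import Any, Dict
from urllib.parse import urlsplit

# Stand-in for the IAM Profile claim set (assumed; the real profile may require more).
REQUIRED_CLAIMS = {"iss", "sub", "aud", "exp"}
# Assumed markers of a local-development issuer.
LOCAL_ISSUER_HOSTS = {"localhost", "127.0.0.1", "::1"}

class ClaimError(Exception):
    """Raised when a token's claims must be rejected."""

def check_claims(claims: Dict[str, Any], expected_audience: str, production: bool) -> None:
    # Presence check only; signature and expiry are assumed verified upstream.
    missing = REQUIRED_CLAIMS - set(claims)
    if missing:
        raise ClaimError(f"missing claims: {sorted(missing)}")
    # "aud" may be a single string or a list of audiences.
    aud = claims["aud"]
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if expected_audience not in audiences:
        raise ClaimError("wrong audience")
    host = urlsplit(claims["iss"]).hostname or ""
    if production and (host in LOCAL_ISSUER_HOSTS or host.endswith(".localhost")):
        raise ClaimError("local-development issuer rejected in production")
```

With `production=False`, the same local issuer passes. This mirrors the split between local-development identity and production rejection described above.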

Progress 2026-06-27 Core Hub operator UI first screens:

- Completed `CUST-WP-0025-T18` from Core Hub evidence: `CORE-WP-0006` is
finished with the protected `/console` prototype, and `CORE-WP-0008-T06`
extracted the compact rebuild backlog in
`/home/worsch/core-hub/docs/specs/operator-ui-rebuild-backlog.md`.
- Fresh Core Hub verification passed with `make visual-check`, covering
desktop/mobile screenshots, protected-route behavior, no-overlap,
horizontal-overflow, PNG validation, and full-key non-disclosure.
- Remaining T07 execution gates are now `CUST-WP-0025-T16` deployed evidence and
`T17` cutover decision coupling; both still require staging/runtime custody or
migration evidence.

## Task: Create The Stable Pickup Checkpoint

```task
id: CUST-WP-0051-T08
status: done
priority: high
state_hub_task_id: "2cc0a127-a749-4228-962e-f8c9b693a1b3"
```

Close this metaplan by creating an operator-friendly checkpoint.

Minimum contents:

- active workstream list with zero stale runbooks and zero contradictory task
states;
- blocker board showing no unowned credential, access, or approval gates;
- daily automation evidence from the latest successful scheduled run;
- production service status summary for State Hub, Inter-Hub, ops-hub evidence,
issue-core, Forgejo, and artifact-store;
- explicit next-pick list for remaining strategic tasks.
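The minimum contents read like a checklist, so a checkpoint can be validated mechanically before it is published. Below is a minimal Python sketch. The record shapes, the `daily_run_succeeded` flag, and the string status values are assumptions for illustration; they do not describe the actual format of the checkpoint document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Services the checkpoint's status summary must cover.
REQUIRED_SERVICES = {
    "State Hub", "Inter-Hub", "ops-hub evidence",
    "issue-core", "Forgejo", "artifact-store",
}

@dataclass
class Workstream:
    name: str
    stale_runbooks: int = 0
    contradictory_task_states: int = 0

@dataclass
class Blocker:
    gate: str                    # a credential, access, or approval gate
    owner: Optional[str] = None  # None means the gate is unowned

@dataclass
class Checkpoint:
    workstreams: List[Workstream] = field(default_factory=list)
    blockers: List[Blocker] = field(default_factory=list)
    # True only when evidence comes from the latest successful scheduled run.
    daily_run_succeeded: bool = False
    service_status: Dict[str, str] = field(default_factory=dict)
    next_picks: List[str] = field(default_factory=list)

def checkpoint_gaps(cp: Checkpoint) -> List[str]:
    """List every way the checkpoint misses its minimum contents."""
    gaps = []
    for ws in cp.workstreams:
        if ws.stale_runbooks or ws.contradictory_task_states:
            gaps.append(f"{ws.name}: stale runbooks or contradictory task states")
    gaps += [f"unowned gate: {b.gate}" for b in cp.blockers if not b.owner]
    if not cp.daily_run_succeeded:
        gaps.append("no evidence from a successful scheduled daily run")
    missing = sorted(REQUIRED_SERVICES - set(cp.service_status))
    gaps += [f"missing service status: {s}" for s in missing]
    if not cp.next_picks:
        gaps.append("empty next-pick list")
    return gaps
```

An empty result means the checkpoint meets the minimum contents listed for this task. Any other result names the specific gap, so the handoff can be fixed before a future agent relies on it.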

Done when a future agent can start from the checkpoint and choose the next
workplan without reconstructing this review.

Completed 2026-06-27: added
`docs/infrastructure-stabilization-pickup-checkpoint.md` with the live active
workstream list, named blocker board, latest daily-triage evidence, production
service status summary, and next-pick sequence. This closes the handoff surface
for future agents while the child workplans remain the execution source of
truth.