Add infrastructure stabilization checkpoint

2026-06-27 09:58:52 +02:00
parent 1aec581919
commit aa81d712e1
13 changed files with 925 additions and 5 deletions

@@ -0,0 +1,67 @@
# Credential Custody Unblock Board
Created: 2026-06-27
Owner: the-custodian coordination; credential ownership stays with the owning repos.
## Purpose
This board collects the live credential and operator-access gates that block the
infrastructure stabilization plan. It records routes and non-secret evidence
only. It is not a secret store, approval record, or substitute for the owning
repo runbooks.
## Rules
- Do not put secrets in Git, State Hub, workplans, shell history, or chat.
- Use the current ops-warden source CLI for routing if the installed `warden`
lacks `route` commands: `cd /home/worsch/ops-warden && uv run warden route ...`.
- `ops-warden` executes SSH certificate issuance only. It does not vend API
keys, OpenBao tokens, SMTP passwords, OIDC logins, or database credentials.
- OpenBao/API credentials route to `railiance-platform`; interactive identity
routes to `key-cape`; tunnels route to `ops-bridge`; host principal and
force-command deployment routes to `railiance-infra`.
- Evidence may include ids, prefixes, counts, decision ids, HTTP status, and
smoke pass/fail. It must not include credential values.
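The evidence rule can be illustrated with an allowlist filter. This is a minimal sketch and not part of ops-warden or State Hub: the field names, the four-character prefix length, and the helpers `key_prefix` and `build_evidence` are all hypothetical.

```python
# Hypothetical allowlist of non-secret evidence fields (illustrative names only).
ALLOWED_EVIDENCE_FIELDS = {
    "hub_id", "manifest_id", "widget_count", "key_prefix",
    "decision_id", "http_status", "smoke_result",
}


def key_prefix(value: str, length: int = 4) -> str:
    """Return only a short identifying prefix of a credential, never the value."""
    return value[:length] + "..."


def build_evidence(record: dict) -> dict:
    """Copy an evidence record, failing loudly on any non-allowlisted field."""
    unexpected = set(record) - ALLOWED_EVIDENCE_FIELDS
    if unexpected:
        raise ValueError(f"non-allowlisted evidence fields: {sorted(unexpected)}")
    return dict(record)
```

Rejecting unknown fields instead of silently dropping them makes an accidental credential field visible at the point where the evidence is assembled.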
## Route Records
| Route id | Owner | Scope | Warden executes? | Reference |
| --- | --- | --- | --- | --- |
| `openbao-api-key` | `railiance-platform` | API keys, DB credentials, provider tokens, OpenBao KV/dynamic leases | No | `wiki/CredentialRouting.md#routing-table` |
| `inter-hub-bootstrap-ssh` | `ops-warden` + `railiance-infra` | Inter-Hub bootstrap SSH envelope and force-command pattern | No | `wiki/InterHubBootstrapAccessLane.md#worker-checklist` |
| `ssh-cert-host-access` | `ops-warden` | Short-lived SSH cert signing for host reachability | Yes | `wiki/AccessRouting.md#issue-vs-route` |
| `railiance-infra-principals` | `railiance-infra` | Host SSH principal files and force-command deployment | No | `wiki/CredentialRouting.md#routing-table` |
| `key-cape-oidc-login` | `key-cape` | Interactive login, OIDC, MFA, JWT/authentication | No | `wiki/CredentialRouting.md#quick-decision-tree` |
| `ops-bridge-tunnel` | `ops-bridge` | SSH tunnels and port forwards | No | `wiki/playbooks/ops-bridge-tunnel-cert.md#migration-checklist` |
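As a reading aid, the route table can be mirrored as a small lookup. This is only a sketch of the table above, not the ops-warden route catalog format; `ROUTES` and `handoff_target` are hypothetical names.

```python
# Mirror of the Route Records table: route id -> (owner, warden_executes).
ROUTES = {
    "openbao-api-key": ("railiance-platform", False),
    "inter-hub-bootstrap-ssh": ("ops-warden + railiance-infra", False),
    "ssh-cert-host-access": ("ops-warden", True),
    "railiance-infra-principals": ("railiance-infra", False),
    "key-cape-oidc-login": ("key-cape", False),
    "ops-bridge-tunnel": ("ops-bridge", False),
}


def handoff_target(route_id: str) -> str:
    """Say who acts on a route; raises KeyError for an unknown route id."""
    owner, warden_executes = ROUTES[route_id]
    if warden_executes:
        return "ops-warden executes this route directly"
    return f"hand off to {owner}; ops-warden only routes"
```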
## Live Gates
| Gate | Blocking work | Owner and route | Expected execution host | Non-secret evidence | Fallback decision | Next action | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inter-Hub ops-hub bootstrap | `CUST-WP-0049-T06`, unblocks `CUST-WP-0047-T05` | `inter-hub-bootstrap-ssh` for the envelope; `openbao-api-key` for operator/runtime key custody; `ssh-cert-host-access` only for cert signing if remote execution is used | Local workstation with `IHUB_OPERATOR_KEY_FILE`, or trusted host with railiance-infra force-command wrapper | Hub id, manifest id, widget count, runtime key prefix only, bootstrap smoke result, State Hub progress id | Prefer API helper. Use deployment-side migration/bootstrap only by explicit operator approval. Manual SQL remains last-resort and must be recorded as an exception. | Operator materializes Inter-Hub operator key through approved custody, runs the ops-hub helper, stores generated runtime key outside Git, removes temp files. | Ready for operator handoff |
| Ops-hub runtime evidence key | `IHUB-WP-0022-T04`, then `IHUB-WP-0022-T07` | `openbao-api-key` owned by `railiance-platform` / OpenBao | Operator workstation, OpenBao UI/CLI session, or trusted cluster job; not a Codex-visible shell with printed values | OpenBao path/version or populated key count only, token exchange HTTP status, evidence submission smoke id | Attended one-time key file is acceptable only long enough to store in OpenBao and remove; no chat or State Hub transfer. | Store/provide `OPS_HUB_KEY` via OpenBao path, then run Inter-Hub submission smoke. | Waiting on operator custody |
| OpenBao unseal and token automation | `NET-WP-0020`, related OpenBao token-grant and policy-gate blockers | `openbao-api-key` for OpenBao issuer/token paths; `railiance-infra-principals` for host policy; `ssh-cert-host-access` for cert signing; `key-cape-oidc-login` for login/MFA | OpenBao operator terminal, cluster-admin context, or trusted railiance-infra deployment path | Policy names, role names, token accessor only, decision ids, allow/deny smoke result | Keep attended ceremony path until auto-unseal/profile is explicitly approved. Do not invent `warden secret` or paste `VAULT_TOKEN`. | Decide custody profile, apply narrow policy/role through approved issuer path, rerun smoke with non-secret evidence. | Needs operator design/approval |
| Forgejo production migration | `RAIL-HO-WP-0005` T02/T06/T11/T12 | `openbao-api-key` for SMTP/package/provider credentials; `key-cape-oidc-login` for login/MFA; `ops-bridge-tunnel` or `ssh-cert-host-access` only for host reachability | Forgejo admin/browser session, railiance01 trusted host, or approved GitOps/deployment path | Decision record id, hostname/exposure choice, SMTP sender/domain alignment, password-reset smoke, backup/restore drill id, package pull smoke, cutover approval id | Keep Gitea as read-only rollback until stabilization passes; do not retire legacy Gitea without explicit approval. | Resolve production choices, store SMTP credentials through OpenBao, run recovery and migration drills, then request cutover approval. | Needs human production decisions |
## Route Lookup Commands
```bash
cd /home/worsch/ops-warden
uv run warden route show openbao-api-key --json
uv run warden route show inter-hub-bootstrap-ssh --json
uv run warden route show ssh-cert-host-access --json
uv run warden route show railiance-infra-principals --json
uv run warden route show key-cape-oidc-login --json
uv run warden route show ops-bridge-tunnel --json
```
## Pickup Order
1. Inter-Hub ops-hub bootstrap, because it unlocks both the now-view and the
activity-core evidence lane.
2. Ops-hub runtime evidence key, because it is the immediate smoke gate after
bootstrap.
3. OpenBao custody profile, because several credential-helper and policy-gate
blockers collapse once a narrow issuer path exists.
4. Forgejo production decisions, because those require human design approval
before execution can be responsibly automated.

@@ -0,0 +1,68 @@
# Daily-Triage Stabilization Status
Updated: 2026-06-27
## Purpose
Track the current daily-triage blocker chain for `CUST-WP-0051-T04` without
duplicating the source activity-core workplans.
## Current Evidence
State Hub `daily_triage` progress shows the scheduled activity-core runner is
alive and can write both State Hub progress and working-memory notes.
Recent scheduled run evidence:
| Date | State Hub event | Result |
| --- | --- | --- |
| 2026-06-24 | `8b4c16ee-ac47-4581-b3ee-a23fc1f682e6` | schema-valid daily triage, working memory written |
| 2026-06-25 | `cbba6bc0-14cb-492b-ab23-74b9349326c8` | schema-valid daily triage, working memory written |
| 2026-06-26 | `97fd20a0-eee0-45ea-8290-6d91874e1515` | validation failed at char 5268, working memory written |
| 2026-06-27 | `c5ab50a8-404b-4e30-849f-841b059ace65` | validation failed at char 5246, working memory written |
The 2026-06-26 and 2026-06-27 failures both stem from overlong, malformed JSON
responses from `daily-triage-report`. They are neither missed schedules nor
silent sink failures.
## Current Blocker
The old `ACTIVITY-WP-0010` State Hub bridge note is partially superseded by the
newer evidence: scheduled runs are reaching State Hub and the working-memory
sink. The current primary blocker is that the live activity-core runtime still
uses an output path that can discard the whole report when the model emits a
malformed tail.
`ACTIVITY-WP-0016` has the repo-side mitigation:
- strict bounded report schema;
- item-granular recovery and quarantine;
- producer guardrails and ADR-004;
- regression tests for the 2026-06-26 failure shape.
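Item-granular recovery can be sketched as follows. This is not the `ACTIVITY-WP-0016` implementation; it only illustrates the idea under the assumption that the report is a JSON object holding an array under a hypothetical `items` key. The helper `recover_items` is likewise hypothetical.

```python
import json


def recover_items(raw: str, key: str = "items"):
    """Return (valid_items, quarantined_tail) from a possibly malformed report.

    Items are decoded one at a time, so a malformed tail only costs the
    items after the last complete one instead of the whole report.
    """
    decoder = json.JSONDecoder()
    start = raw.find(f'"{key}"')
    bracket = raw.find("[", start) if start != -1 else -1
    if bracket == -1:
        return [], raw  # no item array found: quarantine everything
    pos = bracket + 1
    items = []
    while True:
        # Skip separators between items.
        while pos < len(raw) and raw[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(raw) or raw[pos] == "]":
            return items, ""
        try:
            item, pos = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            return items, raw[pos:]  # keep the bad tail for inspection
        items.append(item)
```

A report cut off in the middle of its third item then yields the first two items plus the raw tail for quarantine, rather than failing validation as a whole.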
The remaining gate is the live deployment/smoke path:
1. Deploy the WP-0016 code and schema together.
2. Update the Railiance runtime prompt bundle with bounded top-N instructions,
per-item framing, value vocabularies, and sufficient `max_tokens` headroom.
3. Run a live daily-triage smoke on railiance01 and confirm malformed-tail
output degrades to partial valid output with quarantined items.
4. Resume the three-clean-scheduled-run gate for `ACTIVITY-WP-0006-T03` and
`ACTIVITY-WP-0010-T04`.
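The three-clean-scheduled-run gate can be expressed as a small check over the run evidence. This sketch assumes the gate means the three most recent scheduled runs must all be schema-valid; that reading, the `ScheduledRun` type, and `clean_run_gate` are assumptions, not the activity-core definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledRun:
    date: str          # ISO date, so string order equals time order
    event_id: str
    schema_valid: bool


def clean_run_gate(runs, required: int = 3) -> bool:
    """True when the most recent `required` runs exist and are all schema-valid."""
    recent = sorted(runs, key=lambda run: run.date)[-required:]
    return len(recent) == required and all(run.schema_valid for run in recent)


# Run evidence copied from the table above.
RUNS = [
    ScheduledRun("2026-06-24", "8b4c16ee-ac47-4581-b3ee-a23fc1f682e6", True),
    ScheduledRun("2026-06-25", "cbba6bc0-14cb-492b-ab23-74b9349326c8", True),
    ScheduledRun("2026-06-26", "97fd20a0-eee0-45ea-8290-6d91874e1515", False),
    ScheduledRun("2026-06-27", "c5ab50a8-404b-4e30-849f-841b059ace65", False),
]
```

With the four recorded runs the gate is not satisfied, which matches the decision to resume it only after the live smoke passes.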
## Hygiene Note
The State Hub task index currently shows stale duplicate tasks for
`ACTIVITY-WP-0016` in addition to the source-file task records. Before relying
on activity-core task counts for triage ranking, run activity-core consistency
sync and prune or reconcile any stale generated task rows that are no longer
linked from the workplan file.
2026-06-27 status-normalization: ACTIVITY-WP-0016 source task blocks now
match the progress notes for T04 (done) and T05 (progress). Remaining hygiene is
to remove or reconcile stale duplicate task rows from the State Hub index.
2026-06-27 gate cleanup: ACTIVITY-WP-0010-T02 is now done because scheduled
runner evidence proves the State Hub sink and working-memory path are reachable.
The live human-needed notes now sit on the post-deployment smoke, WP-0016 live
proof, and three-clean-run calibration tasks.

@@ -0,0 +1,33 @@
# FOS Hub Bootstrap Sequence Status
Updated: 2026-06-27
## Purpose
Track `CUST-WP-0051-T07`: sequence `CUST-WP-0025` so FOS Hub bootstrap can resume from current repo reality rather than the older mega-hub/Keycloak assumptions.
## Current Decision
Do not restart FOS bootstrap at the old `NK-WP-0001` Keycloak path. That workplan is archived and superseded. The active identity baseline is:
- `NK-WP-0002` local identity: complete; usable for bootstrap/dev OIDC.
- `NK-WP-0012` IAM Profile v0.2: finished; canonical NetKingdom-owned profile and conformance suite.
- KeyCape/Authelia/LLDAP stack from the superseding NetKingdom path: current lightweight identity mode.
- `NK-WP-0011` expanded-mode Keycloak: proposed enterprise federation lane, not a blocker for ops-hub bootstrap.
## Sequence Board
| Area | Current state | Pickup action |
| --- | --- | --- |
| Identity | Old `CUST-WP-0025-T01` pointed at archived `NK-WP-0001`; local identity and IAM Profile v0.2 are done. | Keep T01 cancelled, T02 done, and make T03 the remaining identity gate: a protected FastAPI fixture using IAM Profile v0.2 against local-identity or KeyCape. |
| Hub extraction/dev-hub | `CUST-WP-0025-T05` through `T12` are done: hub-core exists, State Hub imports hub-core, and MCP naming moved to dev-hub. | Treat Phase 2 as complete. Do not spend pickup energy here unless consistency drift appears. |
| Ops hub | The `ops-hub` repo exists as an Inter-Hub Operations extension. `OPS-WP-0001` is finished; `OPS-WP-0002` has T01-T03 done and waits on authenticated bootstrap/runtime key. | Finish the Inter-Hub evidence lane first: align the activity-core mapping with the live ops vocabulary, run attended bootstrap, store runtime key by approved route, then send the first governed ops event. |
| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` still describe a standalone hub-core/FastAPI/MCP scaffold. Current implementation direction is Inter-Hub extension-first. | Reconcile these tasks after the Inter-Hub evidence lane closes: either rewrite them to extension-owned implementation tasks or explicitly defer the standalone hub-core service. |
| Fin hub/business | `CUST-WP-0025-T20`-`T26` are all todo and depend on a proven multi-hub pattern. | Defer until ops-hub has a working first signal and the identity integration gate is proven. |
## Stable Pickup Order
1. Close the identity drift: T01 cancelled, T02 done, T03 remains as the one real identity integration test.
2. Finish `CUST-WP-0051-T03` / ops-hub Inter-Hub evidence alignment before expanding ops-hub models/tools.
3. Reconcile `CUST-WP-0025-T13`-`T19` against `OPS-WP-0002` once the first ops event lands.
4. Start fin-hub/business work only after ops-hub proves the extension pattern end-to-end.

@@ -0,0 +1,128 @@
# Infrastructure Stabilization Pickup Checkpoint
Updated: 2026-06-27
Coordinator workplan: `CUST-WP-0051`
## Purpose
This checkpoint is the restart surface for the infrastructure stabilization
metaplan. It consolidates the workplan review, unblock boards, current State
Hub registration state, and the next strategic picks.
Use this file first when resuming the lane. Then open the source workplan named
in the relevant row and continue from its task state.
## Registration State
State Hub active workstreams queried on 2026-06-27:
| Workstream | Current pickup meaning |
| --- | --- |
| `artifact-store-wp-0007` | Start D7.1/D7.2 assessment and compatibility harness; D7.3 STS vending may route to NetKingdom. |
| `ihub-wp-0022` | Ops-hub evidence intake contract is aligned to live vocabulary; runtime key custody, protected widget lookup, and smoke remain. |
| `cust-wp-0047` | Now-view waits on the ops-hub Inter-Hub evidence lane, not on service inventory collection. |
| `cust-wp-0049` | Bootstrap access helper/runbook is ready; authenticated execution is operator-gated. |
| `cust-wp-0051` | This metaplan is the coordination layer for remaining cross-workplan gates. |
| `activity-wp-0016-llm-output-robustness-trust-boundary` | Repo-side output robustness bundle is prepared; live deploy/smoke proof remains. |
| `three-phoenix-ha-cluster` | HA substrate remains future critical-workload work, not the current State Hub cutover blocker. |
| `staged-promotion-lifecycle` | Start T02 to make promotion gates concrete before broad production migrations. |
| `rail-ho-wp-0005` | Forgejo production migration is parked behind explicit design, SMTP, backup, runner, and cutover decisions. |
| `net-wp-0020` | OpenBao unseal/token custody remains an operator design and smoke gate. |
| `issue-wp-0003` | issue-core service is healthy; activity-core REST emission wiring remains. |
| `activity-wp-0006` | Calibration waits on the post-WP-0016 live daily-triage smoke and three clean scheduled runs. |
| `cust-wp-0038` | Full State Hub HA migration is deferred until the pragmatic railiance01 path stabilizes. |
| `cust-wp-0025` | FOS bootstrap resumes from identity integration and ops-hub evidence, not the old mega-hub scaffold. |
| `cust-wp-0011` | Active State Hub migration path; next gate is explicit cutover approval. |
Hygiene status:
- `CUST-WP-0045-cutover-runbook` is no longer active; it is a finished runbook
record, not an empty active workstream.
- `CUST-WP-0014` is reopened as `backlog`; it is no longer a done workplan with
todo task blocks.
- The stale human-needed flags on completed or cancelled tasks were cleared
during this stabilization session.
- `make fix-consistency REPO=the-custodian` still reports pre-existing C-12
orphan-row warnings, but the relevant workplan lifecycle and task states sync.
## Blocker Board
Every live credential, access, or approval gate has an owner. Do not ask
`ops-warden` for secret values; use the route catalog and the owning subsystem.
| Gate | Owner/route | Non-secret evidence to collect | Next action |
| --- | --- | --- | --- |
| State Hub pragmatic cutover | Custodian operator approval; `CUST-WP-0011-T07` | Final dump id/time, row-count comparison, chosen private endpoint, stabilization notes | Approve freeze/final restore and make railiance01 State Hub primary, or leave WSL2 primary explicitly. |
| State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
| Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Use the aligned live-vocabulary mapping, then run attended bootstrap and protected widget lookup. |
| Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Store/provide `OPS_HUB_KEY` outside Git and run the protected evidence smoke. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Deploy WP-0016 code/schema and bounded runtime prompt bundle, then run railiance01 smoke. |
| activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set REST sink env, restart/sync, run safe emission. |
| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
| OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | Approve custody profile and apply narrow issuer policies before live helper smokes. |
## Daily Automation Evidence
The scheduled daily-triage runner is alive and writing State Hub plus working
memory evidence. The current blocker is output validation, not scheduling or
sink reachability.
Latest clean scheduled run:
- 2026-06-25: State Hub event `cbba6bc0-14cb-492b-ab23-74b9349326c8`,
schema-valid daily triage, working memory written.
Latest failed scheduled runs:
- 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed
at char 5268, working memory written.
- 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed
at char 5246, working memory written.
Resume from `docs/daily-triage-stabilization-status.md` and
`ACTIVITY-WP-0016` before restarting the three-clean-run gate.
## Production Service Summary
| Surface | Stable fact | Remaining gate |
| --- | --- | --- |
| State Hub | Pragmatic railiance01 path has image, manifests, empty deploy, migrations, restored WSL2 data, row-count comparison, and healthy API through `CUST-WP-0011-T06`. | `CUST-WP-0011-T07` cutover approval, then stabilization; HA path stays deferred. |
| Inter-Hub | Public `https://hub.coulomb.social/api/v2/hubs` exposes `ops-hub`; public registry vocabulary is visible; Inter-Hub contract docs now target the live seed vocabulary. | Protected widget lookup, runtime key custody, and authenticated event smoke remain. |
| ops-hub evidence | `ops-hub` exists as the Inter-Hub Operations extension; `OPS-WP-0001` finished; `OPS-WP-0002` has early seed tasks done. | Attended bootstrap, runtime key custody, protected widget/event smoke. |
| issue-core | ArgoCD service is healthy on port `8765`; image `0.2.1`; ExternalSecret Ready; authenticated smoke created Gitea issue `175`. | activity-core still needs `ISSUE_CORE_API_KEY`, URL port `8765`, `ISSUE_SINK_TYPE=rest`, and a safe emission smoke. |
| Forgejo | Migration inventory/design lane is active but pre-cutover. | Production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, cutover approval. |
| artifact-store | Workplan is active with all tasks open and no current live secret handoff recorded. | Start D7.1 fork/object-store landscape and D7.2 compatibility harness. |
| FOS hub | Old NK-WP-0001 Keycloak prerequisite is cancelled; NK-WP-0002 local identity and IAM Profile v0.2 are done; hub-core extraction/dev-hub work is done. | Keep `CUST-WP-0025-T03` as the identity integration test, then reconcile old ops-hub scaffold tasks after first Inter-Hub ops event lands. |
## Next-Pick List
1. Run the attended Inter-Hub ops-hub bootstrap with the aligned live-vocabulary
mapping, confirm protected widget ids, and seed any missing backup/risk target widgets.
2. Store/confirm `OPS_HUB_KEY` through approved custody and run the protected
widget/hub-registry/event smoke.
3. Deploy the activity-core WP-0016 code/schema and bounded runtime prompt
bundle, then run the railiance01 daily-triage smoke.
4. Complete the issue-core handoff by wiring activity-core to port `8765` with
`ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or
record that WSL2 remains primary for the next operating period.
6. Start staged-promotion T02 and artifact-store D7.1/D7.2 so Forgejo and
storage work inherit clear production promotion gates.
7. Keep Forgejo cutover and State Hub HA work parked until their human decision
and drill gates are satisfied.
## Resume Commands
```bash
cd /home/worsch/the-custodian
sed -n '1,260p' workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
sed -n '1,260p' docs/infrastructure-stabilization-pickup-checkpoint.md
sed -n '1,260p' docs/credential-custody-unblock-board.md
```
After workplan edits, run the consistency sync from the State Hub repo:
```bash
cd /home/worsch/state-hub
make fix-consistency REPO=the-custodian
```

@@ -0,0 +1,48 @@
# Near-Term Production Service Lanes Status
Updated: 2026-06-27
## Purpose
Track `CUST-WP-0051-T05`: finish or park near-term production service lanes
before starting larger migrations.
## Lane Board
| Lane | Current state | Next action |
| --- | --- | --- |
| `issue-wp-0003` | issue-core is live through ArgoCD; image `0.2.1`, Service port `8765`, ExternalSecret Ready, authenticated smoke created Gitea issue `175`. | Do not flip activity-core blindly. First inject `ISSUE_CORE_API_KEY` into `actcore-runtime-secret` through route `activity-core-issue-sink`; then set activity-core `ISSUE_CORE_URL` to port `8765`, set `ISSUE_SINK_TYPE=rest`, restart/sync, and run one safe emission smoke. |
| `rail-ho-wp-0005` | Forgejo migration remains pre-implementation. Inventory is in progress; production decisions, SMTP/email recovery, cutover, and legacy retirement are human-gated. | Resolve T02 production decisions first, then build the disposable Forgejo probe. Do not start production cutover before promotion lifecycle, email recovery, package registry, Actions, backup/restore, and migration drill pass. |
| `artifact-store-wp-0007` | All tasks are still `todo`; no live secret gate is currently recorded. | Start with D7.1 fork/object-store landscape and D7.2 compatibility harness. Route D7.3 STS credential vending to NetKingdom if implementation belongs outside artifact-store. |
| `staged-promotion-lifecycle` | Lifecycle spec is done; schema/tooling/canary/promotion tasks are still `todo`. | Start T02 `railiance/app.toml` contract, then use issue-core/Forgejo as reference consumers for Stage 1/2/3 promotion gates. |
## Credential And Operator Routing
`activity-core -> issue-core` REST emission uses route catalog id
`activity-core-issue-sink`.
Route lookup on 2026-06-27:
- owner: `activity-core + issue-core`
- ops-warden executes: no
- status: active
- next action: follow `ops-warden/wiki/playbooks/activity-core-issue-sink.md#worker-checklist`
No secret value was read or written. The required non-secret evidence is:
- `actcore-runtime-secret` has an `ISSUE_CORE_API_KEY` data key;
- activity-core worker consumes `ISSUE_CORE_URL=http://issue-core.issue-core.svc.cluster.local:8765`;
- `ISSUE_SINK_TYPE=rest`;
- one known-safe activity-core emission returns issue-core HTTP 201 and creates
a Gitea issue.
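The evidence list can be turned into a gap check that works on non-secret facts only. This is a sketch under stated assumptions: `issue_sink_gaps` is a hypothetical helper, and it receives the secret's data key names, never the values.

```python
from urllib.parse import urlsplit


def issue_sink_gaps(env: dict, secret_data_keys: set, smoke_status=None) -> list:
    """Return the evidence items still missing for the activity-core issue sink."""
    gaps = []
    if "ISSUE_CORE_API_KEY" not in secret_data_keys:
        gaps.append("actcore-runtime-secret lacks an ISSUE_CORE_API_KEY data key")
    if urlsplit(env.get("ISSUE_CORE_URL", "")).port != 8765:
        gaps.append("ISSUE_CORE_URL does not target port 8765")
    if env.get("ISSUE_SINK_TYPE") != "rest":
        gaps.append("ISSUE_SINK_TYPE is not rest")
    if smoke_status != 201:
        gaps.append("safe emission smoke has not returned HTTP 201")
    return gaps
```

An empty result means all four evidence items are present; the Gitea issue id from the smoke would still be recorded separately.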
## Pickup Order
1. Close the issue-core handoff gate because the service is already healthy and
only activity-core live emission remains.
2. Start staged-promotion T02 so Forgejo has a repeatable promotion contract
before production cutover work accelerates.
3. Run artifact-store D7.1/D7.2 as an assessment/build harness lane, with D7.3
routed to NetKingdom if STS vending is not artifact-store-owned.
4. Keep Forgejo production cutover parked behind explicit T02 decisions and the
staged-promotion/backup/email/package/action gates.

@@ -0,0 +1,120 @@
# Ops Hub Inter-Hub Evidence Lane Status
Date: 2026-06-27
Workplan: `CUST-WP-0051-T03`
Related tasks: `CUST-WP-0047-T05`, `CUST-WP-0049-T06`, `IHUB-WP-0022-T03/T04/T07`
## Summary
The evidence lane is partially live but not ready to close.
Production Inter-Hub already exposes the public ops-hub bootstrap surface and
has an `ops-hub` row plus the ops-hub seed vocabulary. The remaining blockers
are:
1. authenticated bootstrap/runtime-key execution is still operator-gated;
2. protected widget and hub-registry reads cannot be verified without the
ops-hub runtime key;
3. the older `IHUB-WP-0022` activity-core mapping contract does not match the
currently live ops-hub seed vocabulary.
No secret values were requested, read, printed, or stored during this probe.
## Public Probe Evidence
Base URL: `https://hub.coulomb.social`
| Probe | Result |
| --- | --- |
| `GET /api/v2/hubs` | HTTP `200`; contains `ops-hub` |
| `GET /api/v2/openapi.json` | HTTP `200`; includes `/hubs`, `/hub-capability-manifests`, `/api-consumers`, `/policy-scopes` |
| `GET /api/v2/widgets` | HTTP `401`, protected as expected |
| `GET /api/v2/hub-registry` | HTTP `401`, protected as expected |
| `GET /api/v2/widget-types` | HTTP `200`; 14 ops widget types visible |
| `GET /api/v2/event-types` | HTTP `200`; 15 ops event types visible |
| `GET /api/v2/annotation-categories` | HTTP `200`; 10 ops annotation categories visible |
| `GET /api/v2/policy-scopes` | HTTP `200`; 7 ops policy scopes visible |
| `GET /api/v2/hub-capability-manifests?hubId=<ops-hub-id>` | HTTP `401`, protected as expected |
Observed public ops-hub id: `4f6e4cf7-6a96-4ff2-8a37-08c9f9e405d2`.
The existing `ops-hub/scripts/interhub-gate-probe.py` exits nonzero because it
still expects unauthenticated `/api/v2/hubs` to return `401`. The live contract
returns `200` for public hub discovery and `401` for protected surfaces such as
`/api/v2/widgets` and `/api/v2/hub-registry`.
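The live contract described above can be captured as an expectation table. This is not the content of `interhub-gate-probe.py`; it is a hedged sketch of what an updated check could compare against, with `EXPECTED_STATUS` and `contract_mismatches` as hypothetical names. The `hub-capability-manifests` probe is omitted because its query needs the hub id.

```python
# Unauthenticated status codes observed in the public probe table.
EXPECTED_STATUS = {
    "/api/v2/hubs": 200,            # public hub discovery
    "/api/v2/openapi.json": 200,
    "/api/v2/widgets": 401,         # protected
    "/api/v2/hub-registry": 401,    # protected
    "/api/v2/widget-types": 200,
    "/api/v2/event-types": 200,
    "/api/v2/annotation-categories": 200,
    "/api/v2/policy-scopes": 200,
}


def contract_mismatches(observed: dict) -> list:
    """List every path whose observed status differs from the live contract."""
    return [
        f"{path}: expected {want}, got {observed.get(path)}"
        for path, want in EXPECTED_STATUS.items()
        if observed.get(path) != want
    ]
```

Feeding in the older assumption that `/api/v2/hubs` returns `401` produces exactly one mismatch, which is the failure the existing probe reports.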
## Live Ops Vocabulary
The live public registry matches `ops-hub/seeds/ops-hub-manifest.draft.json`:
- widget types: `ops-environment`, `ops-host`, `ops-cluster`, `ops-service`,
`ops-service-catalog`, `ops-endpoint`, `ops-release`, `ops-backup-set`,
`ops-secret-set`, `ops-runbook`, `ops-incident`, `ops-readiness-gate`,
`ops-migration-wave`, `ops-risk`;
- event types: `ops-inventory-registered`, `ops-inventory-updated`,
`ops-service-discovered`, `ops-health-checked`, `ops-release-observed`,
`ops-endpoint-verified`, `ops-backup-verified`, `ops-restore-tested`,
`ops-runbook-executed`, `ops-drift-detected`, `ops-risk-raised`,
`ops-risk-accepted`, `ops-readiness-gate-updated`,
`ops-migration-gate-passed`, `ops-migration-gate-failed`;
- policy scopes: `ops-local`, `ops-transitional-prod`, `ops-production`,
`ops-threephoenix`, `ops-registry`, `ops-secrets`,
`ops-backup-retention`.
## Contract Mismatch
`inter-hub/docs/contracts/ops-hub-activity-core-mapping.md` and
`ops-hub-activity-core-event-payloads.md` still describe the early
activity-core proposal:
| Contract name | Live seed status | Recommended action |
| --- | --- | --- |
| `ops-service-observed` | Not in live event registry | Rename to `ops-service-discovered`, or add an explicit alias event in the ops-hub manifest. |
| `ops-endpoint-verified` | Live | Keep. |
| `ops-access-path-checked` | Not in live event registry; no `ops-access-path` widget type in seed | Either add access-path vocabulary/widgets, or defer access-path submissions and keep State Hub fallback. |
| `ops-backup-verified` | Live | Keep, but map to `ops-backup-set` widget type. |
| `ops-inventory-drift` | Not in live event registry | Rename to `ops-drift-detected`, or add an explicit alias event. |
| `ops-evidence` policy scope | Not in live policy scopes | Use an existing ops scope or add `ops-evidence` to the manifest and activate it. |
| aggregate refs such as `ops:service:aggregate` | Not in `ops-hub/seeds/ops-hub-widgets.seed.json` | Seed aggregate intake widgets or change mapping to the existing entity/readiness widgets. |
| widget types such as `ops-service-card` | Not in live widget types | Use live widget types like `ops-service`, `ops-endpoint`, `ops-backup-set`, and `ops-readiness-gate`. |
## 2026-06-27 Contract Alignment
The Inter-Hub contract docs were revised in `/home/worsch/inter-hub` to target
the live ops-hub seed vocabulary:
- `ops-service-observed` is now a transition alias for
`ops-service-discovered`.
- `ops-inventory-drift` is now a transition alias for `ops-drift-detected`.
- `ops-access-path-checked` is explicitly deferred to State Hub fallback until
ops-hub adds access-path vocabulary or a readiness/risk mapping decision.
- The old `ops-evidence` policy scope is replaced by declared live scopes such
as `ops-production`, `ops-registry`, and `ops-backup-retention`.
- Payload examples now post only live manifest event types.
This removes the known contract-drift blocker before the attended bootstrap.
The remaining gates are authenticated widget lookup, seeding any missing
backup/risk widget, runtime key custody, and the protected event submission smoke.
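The alias and deferral rules can be sketched as a resolver over the live event vocabulary. The names `LIVE_EVENT_TYPES`, `TRANSITION_ALIASES`, `DEFERRED_TO_STATE_HUB`, and `resolve_event_type` are hypothetical; the values come from the vocabulary and alignment notes in this document, not from the Inter-Hub code.

```python
LIVE_EVENT_TYPES = {
    "ops-inventory-registered", "ops-inventory-updated", "ops-service-discovered",
    "ops-health-checked", "ops-release-observed", "ops-endpoint-verified",
    "ops-backup-verified", "ops-restore-tested", "ops-runbook-executed",
    "ops-drift-detected", "ops-risk-raised", "ops-risk-accepted",
    "ops-readiness-gate-updated", "ops-migration-gate-passed",
    "ops-migration-gate-failed",
}

# Early contract names kept as transition aliases for live event types.
TRANSITION_ALIASES = {
    "ops-service-observed": "ops-service-discovered",
    "ops-inventory-drift": "ops-drift-detected",
}

# Events that stay on the State Hub fallback until ops-hub adds vocabulary.
DEFERRED_TO_STATE_HUB = {"ops-access-path-checked"}


def resolve_event_type(name: str):
    """Return the live event type, or None when the event is deferred."""
    if name in DEFERRED_TO_STATE_HUB:
        return None
    resolved = TRANSITION_ALIASES.get(name, name)
    if resolved not in LIVE_EVENT_TYPES:
        raise ValueError(f"{name!r} is not in the live ops-hub event registry")
    return resolved
```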
## Current Closure State
`CUST-WP-0049-T06` remains `wait`: the helper and runbook are ready, but an
approved authenticated execution lane is still required.
`CUST-WP-0047-T05` remains `wait`: the ops-hub row and vocabulary are visible,
but seeded widgets and event acceptance cannot be proven without the protected
runtime path.
`IHUB-WP-0022-T03/T04/T07` remain gated: the contract docs now target the live
ops-hub seed vocabulary, so the end-to-end smoke waits on runtime key custody,
protected widget lookup, and any missing backup/risk seed widgets.
## Next Pick
1. Use the aligned live-vocabulary contract for the attended
`CUST-WP-0049-T06` bootstrap.
2. Confirm protected widget ids and seed any missing backup/risk target widgets
required by the mapping.
3. Store or confirm `OPS_HUB_KEY` through OpenBao, then run the protected
widget/hub-registry/event smoke.

@@ -0,0 +1,34 @@
# State Hub Migration Strategy Status
Updated: 2026-06-27
## Decision
Use `CUST-WP-0011` as the active State Hub stabilization path.
Keep `CUST-WP-0038` and `RAIL-BS-WP-0007` as deferred HA/ThreePhoenix follow-up lanes.
Rationale: the pragmatic railiance01 deployment has already completed image
publish, cluster manifests, empty deploy, migrations, WSL2 data restore, row-count
comparison, and cluster API health checks. The remaining work is cutover and
stabilization, not initial buildout.
## Current State
| Path | State | Next action |
| --- | --- | --- |
| `CUST-WP-0011` pragmatic railiance01 | T01-T06 done. Cluster State Hub has verified restored WSL2 data and healthy API. | T07: get explicit approval to freeze WSL2 writes, restore final dump, compare again, and redirect private access/MCP to the cluster endpoint. |
| `CUST-WP-0038` full HA State Hub | Entry criteria depend on completing or superseding CUST-WP-0011 and passing stabilization. All implementation tasks are still todo. | Defer until cluster-hosted State Hub proves stable and ThreePhoenix storage/database strategy is current. |
| `RAIL-BS-WP-0007` ThreePhoenix HA cluster | All phases are todo. | Treat as substrate work for future critical workloads and HA State Hub, not as a blocker for pragmatic cutover. |
## Human Gates
- `CUST-WP-0011-T07`: explicit approval required before freezing WSL2 writes and making the cluster State Hub primary.
- `CUST-WP-0038-T08`: explicit approval required before retiring WSL2 fallback after HA failover and restore drills.
## Stable Pickup Path
1. Reconfirm current WSL2 backup and take final pre-cutover dump.
2. Restore final dump into railiance01 State Hub and compare counts again.
3. Redirect the active private access path: either keep local `127.0.0.1:8000` and move it to an ops-bridge/SSH tunnel, or set MCP `API_BASE` to the private cluster endpoint.
4. Run stabilization with WSL2 retained as fallback.
5. Document the operating model and leave final retirement to a later explicit decision or HA workplan.
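The count comparison in steps 1 and 2 can be sketched as a small check. This is a minimal illustration, assuming row counts have already been collected per table from both databases. The table names, the sample numbers, and the `compare_row_counts` helper are hypothetical and are not part of the State Hub tooling.

```python
from typing import Dict, List, Tuple


def compare_row_counts(
    source: Dict[str, int], target: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (table, source_count, target_count) for every mismatch.

    A table missing on one side is reported with a count of -1 on that side.
    """
    mismatches = []
    for table in sorted(set(source) | set(target)):
        src = source.get(table, -1)
        dst = target.get(table, -1)
        if src != dst:
            mismatches.append((table, src, dst))
    return mismatches


# Hypothetical counts; real values would come from the WSL2 dump and the
# railiance01 database after the restore.
wsl2_counts = {"workstreams": 120, "tasks": 940, "events": 15000}
cluster_counts = {"workstreams": 120, "tasks": 938, "events": 15000}

for table, src, dst in compare_row_counts(wsl2_counts, cluster_counts):
    print(f"{table}: wsl2={src} cluster={dst}")
```

Under these assumptions, an empty result would be the natural signal to continue with the redirect in step 3, while any mismatch would send the operator back to the restore.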


@@ -4,14 +4,19 @@ type: workplan
title: Repo Sync Automation & Gitea Inventory
domain: infotech
repo: the-custodian
status: done
status: backlog
state_hub_workstream_id: 27ea80bd-76bf-44a7-b0ed-e09748d5390b
created: 2026-03-16
updated: 2026-03-16
updated: 2026-06-27
---
# CUST-WP-0014 — Repo Sync Automation & Gitea Inventory
2026-06-27 stabilization note: this workplan was previously marked `done` even
though all task blocks remained `todo`. It has been reopened as `backlog` so the
State Hub read model reflects the actual remaining sync-health work without
adding it to the current execution queue.
## Problem
When a repo agent completes work and commits, the state-hub does not automatically


@@ -8,7 +8,7 @@ status: active
owner: custodian
topic_slug: custodian
created: "2026-03-20"
updated: "2026-06-22"
updated: "2026-06-27"
state_hub_workstream_id: "293a74fe-a85a-4ad6-8933-23d52a72fe8b"
---
@@ -57,7 +57,7 @@ OAS gives the viable infrastructure.
```task
id: CUST-WP-0025-T01
status: todo
status: cancel
priority: high
state_hub_task_id: "f55078b6-7fa3-49ab-be30-37db622d64c9"
```
@@ -68,11 +68,13 @@ foundation for all hubs and services.
Cross-reference: net-kingdom NK-WP-0001.
2026-06-27 sequencing update: cancelled as an obsolete prerequisite. `NK-WP-0001` is archived and superseded by the KeyCape/Authelia/LLDAP lightweight stack, `NK-WP-0012` IAM Profile v0.2, and the proposed `NK-WP-0011` expanded-mode Keycloak federation lane. FOS bootstrap should not wait for this old Keycloak path.
### T02 — Complete NK-WP-0002: Local identity bootstrap
```task
id: CUST-WP-0025-T02
status: todo
status: done
priority: high
state_hub_task_id: "0d7792f7-5695-4e1a-9726-b9661d5e7108"
```
@@ -83,6 +85,8 @@ development of hub services without cluster dependency.
Cross-reference: net-kingdom NK-WP-0002.
2026-06-27 sequencing update: marked done. `NK-WP-0002` is complete: local-identity file store, Keycloak export, minimal localhost OIDC provider, permissions hardening, audit log, and docs are all delivered.
### T03 — IAM Profile integration test
```task
@@ -100,6 +104,8 @@ Write a minimal test service + integration test that:
This test becomes the template for hub-core auth middleware.
2026-06-27 sequencing update: this remains the real open identity gate, but it should target the current NetKingdom IAM Profile v0.2 contract and either local-identity or KeyCape lightweight issuer, not the archived `NK-WP-0001` Keycloak path.
### T04 — Canon standard: IAM Profile specification
```task


@@ -3,7 +3,9 @@ id: CUST-WP-0045-cutover-runbook
type: runbook
title: "CUST-WP-0045 T06 cutover — exact command sequence"
parent_workplan: CUST-WP-0045
status: finished
created: "2026-06-01"
updated: "2026-06-27"
state_hub_workstream_id: "4ebc847b-4a2c-4ce2-9fd2-62d3071eed96"
domain: infotech
---


@@ -155,6 +155,14 @@ remaining live-execution blocker.
Done when the ops-hub widgets exist and can accept `ops-endpoint-verified` or
equivalent ops evidence events.
2026-06-27 non-secret probe: production Inter-Hub publicly lists the `ops-hub`
hub row (`4f6e4cf7-6a96-4ff2-8a37-08c9f9e405d2`) and the ops-hub seed
vocabulary. Protected `/api/v2/widgets` and `/api/v2/hub-registry` return
HTTP `401` without a runtime key, so widget presence and event acceptance still
require the approved operator/runtime-key lane. The activity-core mapping
contract also needs reconciliation with the live ops-hub seed vocabulary before
smoke closure.
## Task: Build First Ops-Hub Service Catalog View
```task


@@ -180,6 +180,12 @@ Done when the ops-hub Inter-Hub records exist in production, the generated
runtime key is stored outside Git, and non-secret validation evidence is logged
to State Hub.
2026-06-27 non-secret probe: production Inter-Hub already has the `ops-hub` row
and public ops vocabulary. The live blocker remains authenticated execution and
runtime-key custody, plus a contract-alignment issue between `IHUB-WP-0022`
activity-core mapping docs and the live ops-hub seed vocabulary. Do not spend
operator key time on the smoke until the vocabulary/mapping direction is chosen.
## Acceptance Criteria
- The repeatable access lane is documented in the owning repos.


@@ -0,0 +1,395 @@
---
id: CUST-WP-0051
type: workplan
title: "Infrastructure Stabilization Metaplan"
domain: infotech
repo: the-custodian
status: active
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 51
created: "2026-06-27"
updated: "2026-06-27"
state_hub_workstream_id: "21cabc98-3f80-4d00-b3b7-06e2ac2af88f"
---
# CUST-WP-0051 - Infrastructure Stabilization Metaplan
## Goal
Drive the registered infrastructure workplans from a scattered blocked state to
a stable checkpoint where:
- active blockers have a named owner, route, and next command or decision;
- production credential work uses approved custody paths only;
- daily operational automation has one healthy runner and clean evidence;
- State Hub registration reflects the real file state;
- unfinished strategic work is sequenced into clear follow-on lanes.
This workplan does not replace the child workplans. It is the coordination lane
for removing cross-workplan blocks and creating a reliable handoff point.
## Review Snapshot
Reviewed on 2026-06-27 from State Hub and the repo workplan files.
Active registered workstreams with open work:
| Workstream | Open state | Main stabilization meaning |
| --- | --- | --- |
| artifact-store-wp-0007 | 5 todo | Object-store compatibility and STS credential vending lane. |
| ihub-wp-0022 | 3 wait, 5 done | Ops-hub evidence intake waits on widget seed/runtime key/smoke. |
| cust-wp-0047 | 1 wait, 6 done | Ops-hub now view waits on Inter-Hub widget activation. |
| cust-wp-0049 | 1 wait, 5 done | Access lane is ready; live bootstrap needs approved admin execution. |
| activity-wp-0016 | 1 wait, 2 progress, 5 todo, 2 done | Daily-triage output robustness needs live deploy/smoke evidence. |
| three-phoenix-ha-cluster | 7 todo | Target HA substrate is planned but not executed. |
| staged-promotion-lifecycle | 6 todo, 1 done | Promotion discipline needed before broad production cutovers. |
| rail-ho-wp-0005 | 11 todo, 1 progress | Forgejo production migration needs human design and cutover decisions. |
| cust-wp-0045-cutover-runbook | 0 tasks | Registered runbook appears as an active workstream with no tasks. |
| net-wp-0020 | 2 wait, 1 todo, 2 done | OpenBao unseal custody models still need operator profile decisions. |
| issue-wp-0003 | 2 progress, 5 done | issue-core deploy is close; finish live wiring and runbook evidence. |
| activity-wp-0006 | 1 wait, 1 todo, 6 done | Three-run calibration waits on the daily-triage live gate. |
| cust-wp-0038 | 8 todo | Full ThreePhoenix State Hub HA migration remains strategic follow-on. |
| cust-wp-0025 | 17 todo, 9 done | FOS hub bootstrap now depends on identity, ops-hub, and fin-hub lanes. |
| cust-wp-0011 | 3 todo, 6 done | Pragmatic State Hub railiance01 migration still needs cutover/stabilize/retire. |
Additional repo-local hygiene issue:
- `CUST-WP-0014` has frontmatter `status: done` but all six task blocks are
still `todo`. Either archive it as superseded or reopen it as a focused
State Hub sync-health workplan.
State Hub hygiene issue:
- There are stale `needs_human` flags on completed or cancelled tasks. These do
not all block execution, but they make the operator view noisier and should be
cleared or annotated after the source workplans are reconciled.
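The stale-flag condition described above can be expressed as a simple filter. This is a sketch only: it assumes each task is available as a record with `id`, `status`, and `needs_human` fields, which is an assumed shape and not a documented State Hub schema. The task ids are hypothetical.

```python
from typing import Dict, List

# Statuses this workplan treats as closed.
CLOSED_STATUSES = {"done", "cancel"}


def stale_human_flags(tasks: List[Dict]) -> List[str]:
    """Return ids of closed tasks that still carry a needs_human flag."""
    return [
        task["id"]
        for task in tasks
        if task.get("needs_human") and task.get("status") in CLOSED_STATUSES
    ]


# Hypothetical task records for illustration.
tasks = [
    {"id": "EX-WP-0001-T01", "status": "done", "needs_human": True},
    {"id": "EX-WP-0001-T02", "status": "wait", "needs_human": True},
    {"id": "EX-WP-0001-T03", "status": "cancel", "needs_human": True},
    {"id": "EX-WP-0001-T04", "status": "done", "needs_human": False},
]

print(stale_human_flags(tasks))
```

Flags on `todo`, `progress`, and `wait` tasks are left alone because they may still be live human gates.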
## Dependency Shape
The critical path is:
1. Credential and operator-access custody:
OpenBao, Inter-Hub operator key, ops-hub runtime key, Forgejo SMTP/cutover
approvals, and OpenBao unseal profile decisions.
2. Ops evidence and daily automation:
Inter-Hub ops-hub records, activity-core daily-triage robustness deployment,
schema-valid smoke, then three clean scheduled runs.
3. Production substrate and source forge:
issue-core GitOps pilot, Forgejo production migration, artifact-store STS,
staged promotion, and State Hub migration strategy.
4. Federation buildout:
identity completion, ops-hub scaffold, ops-hub MCP registration, fin-hub
scaffold, and business/runway canon.
## Task: Normalize Registry And Workplan Hygiene
```task
id: CUST-WP-0051-T01
status: done
priority: high
state_hub_task_id: "7e83bd50-5ca2-4341-9d18-65512e3f0442"
```
Clean up the planning substrate before execution work resumes.
Minimum scope:
- Decide whether `CUST-WP-0045-cutover-runbook` should stay registered as an
active workstream or be represented only as a runbook under `CUST-WP-0045`.
- Resolve `CUST-WP-0014`: archive as superseded, or reopen and re-scope the six
remaining State Hub sync-health tasks.
- Clear or annotate stale `needs_human` flags on done/cancel tasks after source
workplans confirm they are no longer live gates.
- Run State Hub consistency after file changes.
Done when the active workstream list no longer contains no-task runbooks or
contradictory done-with-todo files, and the human-needed view shows only live
human gates.
Progress 2026-06-27:
- `CUST-WP-0045-cutover-runbook` now has `status: finished`; State Hub no
longer lists it as an active workstream.
- `CUST-WP-0014` is reopened as `backlog` with its task detail preserved, so it
is no longer a contradictory done-with-todo file or an active queue item.
- `make fix-consistency REPO=the-custodian` passed with pre-existing C-12
warnings and synced the lifecycle changes into State Hub.
Completed 2026-06-27: cleared 15 stale `needs_human` flags from tasks that
were already `done` or `cancel`, leaving live `todo`/`progress`/`wait` human
gates untouched. T01 is complete.
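The done-with-todo contradiction resolved in this task could be detected mechanically. The sketch below assumes workplan files use `---` frontmatter with a `status:` line and fenced `task` blocks with their own `status:` line, as the excerpts in this commit do. The helper names and the sample workplan are hypothetical, and this is not the logic behind `make fix-consistency`.

```python
import re
from typing import List

FENCE = "`" * 3  # built indirectly so the example can sit inside a fenced block
TASK_BLOCK = re.compile(FENCE + r"task\n(.*?)" + FENCE, re.DOTALL)
STATUS_LINE = re.compile(r"^status:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
CLOSED = {"done", "cancel"}


def frontmatter_status(text: str) -> str:
    """Return the status value from the leading frontmatter, or '' if absent."""
    parts = text.split("---", 2)
    if len(parts) < 3 or parts[0].strip():
        return ""
    match = STATUS_LINE.search(parts[1])
    return match.group(1).strip('"') if match else ""


def open_task_statuses(text: str) -> List[str]:
    """Return the status of every task block that is not closed."""
    statuses = []
    for body in TASK_BLOCK.findall(text):
        match = STATUS_LINE.search(body)
        if match and match.group(1).strip('"') not in CLOSED:
            statuses.append(match.group(1).strip('"'))
    return statuses


def is_contradictory(text: str) -> bool:
    """True when the workplan says done but still has open task blocks."""
    return frontmatter_status(text) == "done" and bool(open_task_statuses(text))


# Hypothetical workplan content for illustration.
sample = (
    "---\nid: EXAMPLE-WP-0001\nstatus: done\n---\n"
    "# Example workplan\n"
    f"{FENCE}task\nid: EXAMPLE-WP-0001-T01\nstatus: todo\n{FENCE}\n"
)
print(is_contradictory(sample))
```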
## Task: Establish One Credential-Custody Unblock Board
```task
id: CUST-WP-0051-T02
status: done
priority: high
state_hub_task_id: "312bde29-4370-4352-b5a3-00a8c4fe2059"
```
Collect the live operator-access decisions in one non-secret board.
Inputs:
- `CUST-WP-0049-T06`: Inter-Hub admin access or deployment-side bootstrap path.
- `IHUB-WP-0022-T04`: ops-hub runtime `OPS_HUB_KEY` custody.
- `NET-WP-0020`: OpenBao unseal custody and SSH automation profile.
- `RAIL-HO-WP-0005`: Forgejo hostname, SMTP, runner, backup, cutover, rollback,
and retirement decisions.
Rules:
- Do not put secrets in Git, State Hub, workplans, or chat.
- Use `warden route find` / `warden route show` before requesting credentials.
- Treat ops-warden as SSH certificate authority only, not as a secret store.
Done when each human/operator gate has an owner, approved route, expected
execution host, non-secret evidence target, and fallback decision.
Completed 2026-06-27: added `docs/credential-custody-unblock-board.md` with
route records, live gate owners, expected execution hosts, non-secret evidence
targets, fallback decisions, and pickup order. Route lookup was verified through
`/home/worsch/ops-warden` using `uv run warden route show ... --json` because
the globally installed `warden` lacks the `route` subcommand.
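The "done when" condition for this task amounts to a completeness check per gate. A minimal sketch follows, assuming each gate is captured as a record with the five attributes named above. The `GateRecord` type and the sample values are illustrative assumptions, not the board's actual format, and no field holds a credential value.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class GateRecord:
    gate_id: str
    owner: str
    route: str
    execution_host: str
    evidence_target: str
    fallback: str


def missing_fields(record: GateRecord) -> List[str]:
    """Return the names of attributes that are still empty for a gate."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Illustrative values only; the owner and route shown here are assumptions.
gate = GateRecord(
    gate_id="IHUB-WP-0022-T04",
    owner="railiance-platform",
    route="openbao-api-key",
    execution_host="",  # not yet decided in this example
    evidence_target="HTTP status and smoke pass/fail",
    fallback="defer smoke until key custody is confirmed",
)

print(missing_fields(gate))
```

A gate with any missing field would not yet meet the completion criterion stated for T02.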
## Task: Close The Ops-Hub Inter-Hub Evidence Lane
```task
id: CUST-WP-0051-T03
status: progress
priority: high
state_hub_task_id: "d6c3a39e-629d-47e4-b589-9e1a0273d9fa"
```
Finish the linked ops-hub activation chain:
- Execute `CUST-WP-0049-T06` using the approved access route.
- Close `CUST-WP-0047-T05` by proving ops-hub widgets exist and accept evidence
events.
- Unblock `IHUB-WP-0022` by provisioning the runtime key through the approved
secret path and running the end-to-end evidence submission smoke.
Done when ops inventory probes and activity-core evidence can land in Inter-Hub
without manual SQL or secret exposure.
Progress 2026-06-27:
- Added `docs/ops-hub-interhub-evidence-lane-status.md` with non-secret public
probe evidence. Production Inter-Hub has an `ops-hub` row and the ops-hub seed
vocabulary is visible on public registry endpoints.
- Protected widget, manifest, and hub-registry surfaces correctly require
authentication; no runtime-key smoke was attempted.
- New blocker surfaced: the older `IHUB-WP-0022` activity-core mapping contract
names event types, policy scope, aggregate widget refs, and widget types that
do not match the live ops-hub seed vocabulary. Align that contract before an
attended bootstrap/runtime-key smoke, or the operator key may still hit
manifest/schema failures.
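The non-secret probe evidence above reduces to recording HTTP status codes from unauthenticated requests. The sketch below shows one way to turn those codes into a verdict per path. It performs no network calls; the `/api/v2/widgets` and `/api/v2/hub-registry` paths and their `401` results come from the probe notes, while the third path and the helper names are hypothetical.

```python
from typing import Dict


def classify_probe(status: int) -> str:
    """Map an HTTP status from an unauthenticated request to a probe verdict."""
    if status in (401, 403):
        return "protected"
    if 200 <= status < 300:
        return "public"
    return "unexpected"


def summarize(observed: Dict[str, int]) -> Dict[str, str]:
    """Build a non-secret evidence record: path to verdict, no headers or bodies."""
    return {path: classify_probe(status) for path, status in observed.items()}


observed = {
    "/api/v2/widgets": 401,
    "/api/v2/hub-registry": 401,
    "/example/public-registry": 200,  # hypothetical public endpoint
}

print(summarize(observed))
```

Keeping only paths and verdicts matches the evidence rule that status codes and pass/fail may be logged but credential values may not.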
Progress 2026-06-27 contract alignment:
- Updated `/home/worsch/inter-hub` contract docs for `IHUB-WP-0022` to target
the live ops-hub seed vocabulary. Old `ops-service-observed` and
`ops-inventory-drift` names are transition aliases, `ops-access-path-checked`
is deferred to fallback until supported, and payload examples now post only
live manifest event types.
- Ran `make fix-consistency REPO=inter-hub`; it passed with pre-existing C-12
warnings and synced the IHUB-WP-0022 description drift into State Hub.
- Remaining T03 gates are authenticated widget lookup, any missing backup/risk
seed widget, runtime key custody, and protected submission smoke.
## Task: Stabilize Daily-Triage Automation
```task
id: CUST-WP-0051-T04
status: progress
priority: high
state_hub_task_id: "42810d3b-5557-4efd-871b-65bef7c19e0e"
```
Finish the activity-core daily-triage reliability lane.
Sequence:
1. Deploy the `activity-wp-0016` robustness bundle: bounded prompt/schema,
per-item parsing, quarantine lane, and producer guardrails.
2. Run a schema-valid live daily-triage smoke on railiance01.
3. Collect three clean scheduled runs with matching activity-core, State Hub,
and working-memory evidence.
4. Close `activity-wp-0006` calibration and decide the fate of the
`CUST-WP-0045` cutover runbook registration.
Done when there is exactly one trusted daily triage runner and the fallback
state is documented.
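The per-item parsing and quarantine lane in step 1 can be illustrated as follows. This is a sketch under stated assumptions: it supposes the model output is framed as one JSON object per line and that each item needs an `id` field. The real `activity-wp-0016` framing and schema are not described here, so both are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_items(raw: str) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Parse line-framed items, quarantining anything malformed.

    Returns (accepted_items, quarantined_lines) so that one bad item,
    such as a truncated tail, does not invalidate the whole output.
    """
    accepted: List[Dict[str, Any]] = []
    quarantined: List[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            quarantined.append(line)
            continue
        if isinstance(item, dict) and "id" in item:
            accepted.append(item)
        else:
            quarantined.append(line)
    return accepted, quarantined


# Hypothetical output whose last item was cut off mid-object.
raw_output = '{"id": "a", "rank": 1}\n{"id": "b", "rank": 2}\n{"id": "c", "ra'

accepted, quarantined = parse_items(raw_output)
print(len(accepted), len(quarantined))
```

With this framing, a malformed tail costs one item and the valid items still reach the sink.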
Progress 2026-06-27:
- Added `docs/daily-triage-stabilization-status.md` with the current evidence
chain. The 2026-06-24 and 2026-06-25 scheduled runs were schema-valid; the
2026-06-26 and 2026-06-27 runs reached State Hub and working memory but failed
output validation at around character 5.2k.
- Current primary blocker is no longer a silent schedule or State Hub sink
outage. The live runner still needs the `ACTIVITY-WP-0016` code/schema bundle
and Railiance runtime prompt changes so malformed tails degrade to quarantined
partial output.
- Pickup sequence: deploy WP-0016 code/schema together, update the runtime
prompt bundle for bounded top-N/per-item framing/token headroom, run a live
railiance01 smoke, then restart the three-clean-run gate.
- Normalized ACTIVITY-WP-0016 source task status in activity-core: T04 is `done`
and T05 is `progress`, matching its own progress notes.
- Updated activity-core daily-triage source notes: ACTIVITY-WP-0010-T02 is
now done, T03/T04 point at the post-WP-0016 live smoke and three-run gate,
and ACTIVITY-WP-0006-T03 records the 2026-06-27 validation failure.
- Cleared the stale human-needed flag from the completed bridge/config task and
moved live intervention notes onto the deploy/smoke/calibration gate.
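The three-clean-run gate can be read as a count of schema-valid runs at the end of the run history. The sketch below assumes the gate requires three consecutive clean runs and resets on any failure, which is an interpretation of "restart the three-clean-run gate" and not a documented rule. The run results are the ones recorded in the progress notes above; the helper name is hypothetical.

```python
from typing import List, Tuple

REQUIRED_CLEAN_RUNS = 3  # assumption: the gate needs three consecutive clean runs


def trailing_clean_runs(runs: List[Tuple[str, bool]]) -> int:
    """Count schema-valid runs at the end of a date-ordered history."""
    count = 0
    for _date, schema_valid in reversed(runs):
        if not schema_valid:
            break
        count += 1
    return count


# (date, schema_valid) as recorded in the 2026-06-27 progress notes.
runs = [
    ("2026-06-24", True),
    ("2026-06-25", True),
    ("2026-06-26", False),
    ("2026-06-27", False),
]

clean = trailing_clean_runs(runs)
print(clean, clean >= REQUIRED_CLEAN_RUNS)
```

Under this reading the earlier valid runs do not count toward the gate, because the two most recent runs failed validation.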
## Task: Finish Near-Term Production Service Lanes
```task
id: CUST-WP-0051-T05
status: progress
priority: medium
state_hub_task_id: "2083f0e4-e037-48bf-8069-f31e8db2fd95"
```
Move near-complete service workstreams to done before starting larger migrations.
Priority order:
- `issue-wp-0003`: finish activity-core wiring and end-to-end GitOps runbook.
- `rail-ho-wp-0005`: resolve Forgejo production decisions, email recovery, and
cutover approval gates.
- `artifact-store-wp-0007`: complete MinIO compatibility and STS credential
vending assessment if it is required by backup, registry, or app lanes.
- `staged-promotion-lifecycle`: make production promotion gates explicit before
further cluster/source-forge cutovers.
Done when each lane is either finished or parked with a precise dependency and
no ambiguous human-needed state.
Progress 2026-06-27:
- Added `docs/near-term-production-service-lanes-status.md` with a lane board
for issue-core, Forgejo, artifact-store, and staged promotion.
- issue-core is the immediate near-done lane: the service itself is healthy, but
activity-core still points at port `8010` with `ISSUE_SINK_TYPE=null`. Do not
flip it to REST until `ISSUE_CORE_API_KEY` is injected into activity-core's
runtime secret via route `activity-core-issue-sink`.
- Forgejo remains parked behind explicit production design decisions, SMTP/email
recovery, package registry, Actions, backup/restore, migration drill, and
cutover approval.
- artifact-store and staged promotion are executable planning/build lanes:
start artifact-store D7.1/D7.2 and staged-promotion T02 before broad
production source-forge migration work.
## Task: Decide State Hub Migration Strategy
```task
id: CUST-WP-0051-T06
status: progress
priority: high
state_hub_task_id: "0ac3763f-eac0-4773-9be8-cb0a7979e444"
```
Choose and execute the State Hub stabilization path.
Decision:
- If the pragmatic railiance01 service is enough for the next operating period,
finish `CUST-WP-0011`: cut over the MCP config, observe the stabilization window,
then retire or retain WSL2 fallback by explicit decision.
- If HA is now required, promote `CUST-WP-0038` and the ThreePhoenix HA cluster
lane: readiness, storage/database strategy, HA API behavior, failover drill,
restore drill, and endpoint/runbook update.
Done when the active State Hub path is singular, tested, and documented, and
the alternate path is either cancelled, deferred, or explicitly retained as a
future workplan.
Progress 2026-06-27:
- Added `docs/state-hub-migration-strategy-status.md` and selected
the pragmatic `CUST-WP-0011` railiance01 path as the singular active
State Hub stabilization lane.
- `CUST-WP-0011` is already through T01-T06: image pushed, cluster
manifests defined, empty deploy healthy, migrations run, WSL2 data restored,
row counts compared, and cluster API health/summary verified.
- Next gate is `CUST-WP-0011-T07`: explicit approval to freeze WSL2
writes, restore the final dump, compare again, and redirect MCP/private access
to the cluster endpoint.
- `CUST-WP-0038` and `RAIL-BS-WP-0007` remain deferred HA
lanes until the pragmatic path stabilizes and ThreePhoenix storage/database
strategy is current.
## Task: Sequence FOS Hub Bootstrap To Completion
```task
id: CUST-WP-0051-T07
status: progress
priority: medium
state_hub_task_id: "27b6828a-9e87-4135-a036-bce760c3057c"
```
Use the stabilized substrate to finish `CUST-WP-0025` without reviving the
mega-hub pattern.
Recommended order:
1. Finish identity foundations: NK-WP-0001, NK-WP-0002, then the IAM profile
integration test.
2. Create the standalone ops-hub repo from hub-core and ingest the inventory
artifacts from `CUST-WP-0047`.
3. Add ops models, MCP tools, Railiance integration, dev-hub coupling, dashboard,
and MCP registration.
4. Only then start the fin-hub/business-model tasks.
Done when `CUST-WP-0025` has no open foundational identity or ops-hub tasks and
fin-hub work is either started on a stable scaffold or deliberately deferred.
Progress 2026-06-27:
- Added `docs/fos-hub-bootstrap-sequence-status.md` with the current sequence.
- Corrected the identity foundation baseline in `CUST-WP-0025`: the old
`NK-WP-0001` Keycloak task is cancelled as superseded, `NK-WP-0002` local
identity is done, and the remaining identity gate is the IAM Profile v0.2
FastAPI integration test.
- Current ops-hub reality is extension-first: `ops-hub` exists,
`OPS-WP-0001` is finished, and `OPS-WP-0002` waits on authenticated
Inter-Hub bootstrap/runtime-key evidence. Reconcile `CUST-WP-0025-T13`-`T19`
after the first governed ops event lands.
- Fin-hub/business tasks remain deliberately deferred until identity integration
and ops-hub extension evidence are proven.
## Task: Create The Stable Pickup Checkpoint
```task
id: CUST-WP-0051-T08
status: done
priority: high
state_hub_task_id: "2cc0a127-a749-4228-962e-f8c9b693a1b3"
```
Close this metaplan by creating an operator-friendly checkpoint.
Minimum contents:
- active workstream list with zero stale runbooks and zero contradictory task
states;
- blocker board showing no unowned credential, access, or approval gates;
- daily automation evidence from the latest successful scheduled run;
- production service status summary for State Hub, Inter-Hub, ops-hub evidence,
issue-core, Forgejo, and artifact-store;
- explicit next-pick list for remaining strategic tasks.
Done when a future agent can start from the checkpoint and choose the next
workplan without reconstructing this review.
Completed 2026-06-27: added
`docs/infrastructure-stabilization-pickup-checkpoint.md` with the live active
workstream list, named blocker board, latest daily-triage evidence, production
service status summary, and next-pick sequence. This closes the handoff surface
for future agents while the child workplans remain the execution source of
truth.