Add infrastructure stabilization checkpoint

2026-06-27 09:58:52 +02:00
parent 1aec581919
commit aa81d712e1
13 changed files with 925 additions and 5 deletions


@@ -0,0 +1,67 @@
# Credential Custody Unblock Board
Created: 2026-06-27
Owner: the-custodian coordination; credential owners remain with their owning repos.
## Purpose
This board collects the live credential and operator-access gates that block the
infrastructure stabilization plan. It records routes and non-secret evidence
only. It is not a secret store, approval record, or substitute for the owning
repo runbooks.
## Rules
- Do not put secrets in Git, State Hub, workplans, shell history, or chat.
- Use the current ops-warden source CLI for routing if the installed `warden`
lacks `route` commands: `cd /home/worsch/ops-warden && uv run warden route ...`.
- `ops-warden` executes SSH certificate issuance only. It does not vend API
keys, OpenBao tokens, SMTP passwords, OIDC logins, or database credentials.
- OpenBao/API credentials route to `railiance-platform`; interactive identity
routes to `key-cape`; tunnels route to `ops-bridge`; host principal and
force-command deployment routes to `railiance-infra`.
- Evidence may include ids, prefixes, counts, decision ids, HTTP status, and
smoke pass/fail. It must not include credential values.
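The evidence rule above could be checked mechanically before a record is written anywhere. The following is a minimal sketch, not the board's actual tooling: the field names in `ALLOWED_FIELDS`, the `SUSPICIOUS` patterns, and the `check_evidence` helper are all hypothetical and would need to be tuned to the real evidence shape.

```python
import re

# Hypothetical allow-list of evidence fields, loosely following the rule above.
ALLOWED_FIELDS = {"id", "prefix", "count", "decision_id", "http_status", "smoke"}

# Hypothetical patterns that suggest a credential value slipped into a field.
SUSPICIOUS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)\b(password|token|secret)\s*[=:]\s*\S+"),
]


def check_evidence(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks safe."""
    problems = []
    for field, value in record.items():
        if field not in ALLOWED_FIELDS:
            problems.append(f"field not allowed: {field}")
        if any(p.search(str(value)) for p in SUSPICIOUS):
            problems.append(f"possible credential value in: {field}")
    return problems
```

A pattern check like this only catches obvious leaks; it does not replace the rule that credential values never enter the record in the first place.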
## Route Records
| Route id | Owner | Scope | Warden executes? | Reference |
| --- | --- | --- | --- | --- |
| `openbao-api-key` | `railiance-platform` | API keys, DB credentials, provider tokens, OpenBao KV/dynamic leases | No | `wiki/CredentialRouting.md#routing-table` |
| `inter-hub-bootstrap-ssh` | `ops-warden` + `railiance-infra` | Inter-Hub bootstrap SSH envelope and force-command pattern | No | `wiki/InterHubBootstrapAccessLane.md#worker-checklist` |
| `ssh-cert-host-access` | `ops-warden` | Short-lived SSH cert signing for host reachability | Yes | `wiki/AccessRouting.md#issue-vs-route` |
| `railiance-infra-principals` | `railiance-infra` | Host SSH principal files and force-command deployment | No | `wiki/CredentialRouting.md#routing-table` |
| `key-cape-oidc-login` | `key-cape` | Interactive login, OIDC, MFA, JWT/authentication | No | `wiki/CredentialRouting.md#quick-decision-tree` |
| `ops-bridge-tunnel` | `ops-bridge` | SSH tunnels and port forwards | No | `wiki/playbooks/ops-bridge-tunnel-cert.md#migration-checklist` |
## Live Gates
| Gate | Blocking work | Owner and route | Expected execution host | Non-secret evidence | Fallback decision | Next action | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inter-Hub ops-hub bootstrap | `CUST-WP-0049-T06`, unblocks `CUST-WP-0047-T05` | `inter-hub-bootstrap-ssh` for the envelope; `openbao-api-key` for operator/runtime key custody; `ssh-cert-host-access` only for cert signing if remote execution is used | Local workstation with `IHUB_OPERATOR_KEY_FILE`, or trusted host with railiance-infra force-command wrapper | Hub id, manifest id, widget count, runtime key prefix only, bootstrap smoke result, State Hub progress id | Prefer API helper. Use deployment-side migration/bootstrap only by explicit operator approval. Manual SQL remains last-resort and must be recorded as an exception. | Operator materializes Inter-Hub operator key through approved custody, runs the ops-hub helper, stores generated runtime key outside Git, removes temp files. | Ready for operator handoff |
| Ops-hub runtime evidence key | `IHUB-WP-0022-T04`, then `IHUB-WP-0022-T07` | `openbao-api-key` owned by `railiance-platform` / OpenBao | Operator workstation, OpenBao UI/CLI session, or trusted cluster job; not a Codex-visible shell with printed values | OpenBao path/version or populated key count only, token exchange HTTP status, evidence submission smoke id | Attended one-time key file is acceptable only long enough to store in OpenBao and remove; no chat or State Hub transfer. | Store/provide `OPS_HUB_KEY` via OpenBao path, then run Inter-Hub submission smoke. | Waiting on operator custody |
| OpenBao unseal and token automation | `NET-WP-0020`, related OpenBao token-grant and policy-gate blockers | `openbao-api-key` for OpenBao issuer/token paths; `railiance-infra-principals` for host policy; `ssh-cert-host-access` for cert signing; `key-cape-oidc-login` for login/MFA | OpenBao operator terminal, cluster-admin context, or trusted railiance-infra deployment path | Policy names, role names, token accessor only, decision ids, allow/deny smoke result | Keep attended ceremony path until auto-unseal/profile is explicitly approved. Do not invent `warden secret` or paste `VAULT_TOKEN`. | Decide custody profile, apply narrow policy/role through approved issuer path, rerun smoke with non-secret evidence. | Needs operator design/approval |
| Forgejo production migration | `RAIL-HO-WP-0005` T02/T06/T11/T12 | `openbao-api-key` for SMTP/package/provider credentials; `key-cape-oidc-login` for login/MFA; `ops-bridge-tunnel` or `ssh-cert-host-access` only for host reachability | Forgejo admin/browser session, railiance01 trusted host, or approved GitOps/deployment path | Decision record id, hostname/exposure choice, SMTP sender/domain alignment, password-reset smoke, backup/restore drill id, package pull smoke, cutover approval id | Keep Gitea as read-only rollback until stabilization passes; do not retire legacy Gitea without explicit approval. | Resolve production choices, store SMTP credentials through OpenBao, run recovery and migration drills, then request cutover approval. | Needs human production decisions |
## Route Lookup Commands
```bash
cd /home/worsch/ops-warden
uv run warden route show openbao-api-key --json
uv run warden route show inter-hub-bootstrap-ssh --json
uv run warden route show ssh-cert-host-access --json
uv run warden route show railiance-infra-principals --json
uv run warden route show key-cape-oidc-login --json
uv run warden route show ops-bridge-tunnel --json
```
## Pickup Order
1. Inter-Hub ops-hub bootstrap, because it unlocks both the now-view and the
activity-core evidence lane.
2. Ops-hub runtime evidence key, because it is the immediate smoke gate after
bootstrap.
3. OpenBao custody profile, because several credential-helper and policy-gate
blockers collapse once a narrow issuer path exists.
4. Forgejo production decisions, because those require human design approval
before execution can be responsibly automated.


@@ -0,0 +1,68 @@
# Daily-Triage Stabilization Status
Updated: 2026-06-27
## Purpose
Track the current daily-triage blocker chain for `CUST-WP-0051-T04` without
duplicating the source activity-core workplans.
## Current Evidence
State Hub `daily_triage` progress shows the scheduled activity-core runner is
alive and can write both State Hub progress and working-memory notes.
Recent scheduled run evidence:
| Date | State Hub event | Result |
| --- | --- | --- |
| 2026-06-24 | `8b4c16ee-ac47-4581-b3ee-a23fc1f682e6` | schema-valid daily triage, working memory written |
| 2026-06-25 | `cbba6bc0-14cb-492b-ab23-74b9349326c8` | schema-valid daily triage, working memory written |
| 2026-06-26 | `97fd20a0-eee0-45ea-8290-6d91874e1515` | validation failed at char 5268, working memory written |
| 2026-06-27 | `c5ab50a8-404b-4e30-849f-841b059ace65` | validation failed at char 5246, working memory written |
The 2026-06-26 and 2026-06-27 failures are both overlong malformed JSON
responses from `daily-triage-report`. They are not missed schedules and they are
not silent sink failures.
## Current Blocker
The old `ACTIVITY-WP-0010` State Hub bridge note is partially superseded by the
newer evidence: scheduled runs are reaching State Hub and the working-memory
sink. The current primary blocker is that the live activity-core runtime still
uses an output path that can discard the whole report when the model emits a
malformed tail.
`ACTIVITY-WP-0016` has the repo-side mitigation:
- strict bounded report schema;
- item-granular recovery and quarantine;
- producer guardrails and ADR-004;
- regression tests for the 2026-06-26 failure shape.
The remaining gate is the live deployment/smoke path:
1. Deploy the WP-0016 code and schema together.
2. Update the Railiance runtime prompt bundle with bounded top-N instructions,
per-item framing, value vocabularies, and sufficient `max_tokens` headroom.
3. Run a live daily-triage smoke on railiance01 and confirm malformed-tail
output degrades to partial valid output with quarantined items.
4. Resume the three-clean-scheduled-run gate for `ACTIVITY-WP-0006-T03` and
`ACTIVITY-WP-0010-T04`.
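To illustrate what "degrades to partial valid output with quarantined items" can mean, here is a minimal sketch of item-granular recovery. It is not the `ACTIVITY-WP-0016` implementation: the `recover_items` helper and the `items` array key are assumptions made for illustration. It decodes array items one at a time with the standard library's `json.JSONDecoder.raw_decode` and quarantines everything from the first undecodable item onward.

```python
import json


def recover_items(raw: str, array_key: str = "items"):
    """Decode array items one by one; quarantine whatever follows the first bad item."""
    decoder = json.JSONDecoder()
    marker = raw.find(f'"{array_key}"')
    start = raw.find("[", marker) if marker != -1 else -1
    if start == -1:
        return [], raw  # nothing recoverable; quarantine the whole response
    pos = start + 1
    items = []
    while True:
        # Skip whitespace and separators between items.
        while pos < len(raw) and raw[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(raw) or raw[pos] == "]":
            return items, ""  # array closed (or input ended) with no bad item
        try:
            item, pos = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            return items, raw[pos:]  # malformed tail goes to quarantine
        items.append(item)
```

With this approach a response whose last item is cut off still yields the earlier items, and the tail is kept for inspection instead of discarding the whole report. The sketch locates the array with a plain string search, so it would be misled by an earlier string value that happens to contain the key name.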
## Hygiene Note
The State Hub task index currently shows stale duplicate tasks for
`ACTIVITY-WP-0016` in addition to the source-file task records. Before relying
on activity-core task counts for triage ranking, run activity-core consistency
sync and prune or reconcile any stale generated task rows that are no longer
linked from the workplan file.
2026-06-27 status-normalization: ACTIVITY-WP-0016 source task blocks now
match the progress notes for T04 (done) and T05 (progress). Remaining hygiene is
to remove or reconcile stale duplicate task rows from the State Hub index.
2026-06-27 gate cleanup: ACTIVITY-WP-0010-T02 is now done because scheduled
runner evidence proves the State Hub sink and working-memory path are reachable.
The live human-needed notes now sit on the post-deployment smoke, WP-0016 live
proof, and three-clean-run calibration tasks.


@@ -0,0 +1,33 @@
# FOS Hub Bootstrap Sequence Status
Updated: 2026-06-27
## Purpose
Track `CUST-WP-0051-T07`: sequence `CUST-WP-0025` so FOS Hub bootstrap can resume from current repo reality rather than the older mega-hub/Keycloak assumptions.
## Current Decision
Do not restart FOS bootstrap at the old `NK-WP-0001` Keycloak path. That workplan is archived and superseded. The active identity baseline is:
- `NK-WP-0002` local identity: complete; usable for bootstrap/dev OIDC.
- `NK-WP-0012` IAM Profile v0.2: finished; canonical NetKingdom-owned profile and conformance suite.
- KeyCape/Authelia/LLDAP stack from the superseding NetKingdom path: current lightweight identity mode.
- `NK-WP-0011` expanded-mode Keycloak: proposed enterprise federation lane, not a blocker for ops-hub bootstrap.
## Sequence Board
| Area | Current state | Pickup action |
| --- | --- | --- |
| Identity | Old `CUST-WP-0025-T01` pointed at archived `NK-WP-0001`; local identity and IAM Profile v0.2 are done. | Keep T01 cancelled, T02 done, and make T03 the remaining identity gate: a protected FastAPI fixture using IAM Profile v0.2 against local-identity or KeyCape. |
| Hub extraction/dev-hub | `CUST-WP-0025-T05` through `T12` are done: hub-core exists, State Hub imports hub-core, and MCP naming moved to dev-hub. | Treat Phase 2 as complete. Do not spend pickup energy here unless consistency drift appears. |
| Ops hub | The `ops-hub` repo exists as an Inter-Hub Operations extension. `OPS-WP-0001` is finished; `OPS-WP-0002` has T01-T03 done and waits on authenticated bootstrap/runtime key. | Finish the Inter-Hub evidence lane first: align the activity-core mapping with the live ops vocabulary, run attended bootstrap, store runtime key by approved route, then send the first governed ops event. |
| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` still describe a standalone hub-core/FastAPI/MCP scaffold. Current implementation direction is Inter-Hub extension-first. | Reconcile these tasks after the Inter-Hub evidence lane closes: either rewrite them to extension-owned implementation tasks or explicitly defer the standalone hub-core service. |
| Fin hub/business | `CUST-WP-0025-T20`-`T26` are all todo and depend on a proven multi-hub pattern. | Defer until ops-hub has a working first signal and the identity integration gate is proven. |
## Stable Pickup Order
1. Close the identity drift: T01 cancelled, T02 done, T03 remains as the one real identity integration test.
2. Finish `CUST-WP-0051-T03` / ops-hub Inter-Hub evidence alignment before expanding ops-hub models/tools.
3. Reconcile `CUST-WP-0025-T13`-`T19` against `OPS-WP-0002` once the first ops event lands.
4. Start fin-hub/business work only after ops-hub proves the extension pattern end-to-end.


@@ -0,0 +1,128 @@
# Infrastructure Stabilization Pickup Checkpoint
Updated: 2026-06-27
Coordinator workplan: `CUST-WP-0051`
## Purpose
This checkpoint is the restart surface for the infrastructure stabilization
metaplan. It consolidates the workplan review, unblock boards, current State
Hub registration state, and the next strategic picks.
Use this file first when resuming the lane. Then open the source workplan named
in the relevant row and continue from its task state.
## Registration State
State Hub active workstreams queried on 2026-06-27:
| Workstream | Current pickup meaning |
| --- | --- |
| `artifact-store-wp-0007` | Start D7.1/D7.2 assessment and compatibility harness; D7.3 STS vending may route to NetKingdom. |
| `ihub-wp-0022` | Ops-hub evidence intake contract is aligned to live vocabulary; runtime key custody, protected widget lookup, and smoke remain. |
| `cust-wp-0047` | Now-view waits on the ops-hub Inter-Hub evidence lane, not on service inventory collection. |
| `cust-wp-0049` | Bootstrap access helper/runbook is ready; authenticated execution is operator-gated. |
| `cust-wp-0051` | This metaplan is the coordination layer for remaining cross-workplan gates. |
| `activity-wp-0016-llm-output-robustness-trust-boundary` | Repo-side output robustness bundle is prepared; live deploy/smoke proof remains. |
| `three-phoenix-ha-cluster` | HA substrate remains future critical-workload work, not the current State Hub cutover blocker. |
| `staged-promotion-lifecycle` | Start T02 to make promotion gates concrete before broad production migrations. |
| `rail-ho-wp-0005` | Forgejo production migration is parked behind explicit design, SMTP, backup, runner, and cutover decisions. |
| `net-wp-0020` | OpenBao unseal/token custody remains an operator design and smoke gate. |
| `issue-wp-0003` | issue-core service is healthy; activity-core REST emission wiring remains. |
| `activity-wp-0006` | Calibration waits on the post-WP-0016 live daily-triage smoke and three clean scheduled runs. |
| `cust-wp-0038` | Full State Hub HA migration is deferred until the pragmatic railiance01 path stabilizes. |
| `cust-wp-0025` | FOS bootstrap resumes from identity integration and ops-hub evidence, not the old mega-hub scaffold. |
| `cust-wp-0011` | Active State Hub migration path; next gate is explicit cutover approval. |
Hygiene status:
- `CUST-WP-0045-cutover-runbook` is no longer active; it is a finished runbook
record, not an empty active workstream.
- `CUST-WP-0014` is reopened as `backlog`; it is no longer a done workplan with
todo task blocks.
- Completed or cancelled tasks no longer carry stale human-needed flags; those
flags were cleared during this stabilization session.
- `make fix-consistency REPO=the-custodian` still reports pre-existing C-12
orphan-row warnings, but the relevant workplan lifecycle and task states sync.
## Blocker Board
Every live credential, access, and approval gate has an owner. Do not ask
`ops-warden` for secret values; use the route catalog and the owning subsystem.
| Gate | Owner/route | Non-secret evidence to collect | Next action |
| --- | --- | --- | --- |
| State Hub pragmatic cutover | Custodian operator approval; `CUST-WP-0011-T07` | Final dump id/time, row-count comparison, chosen private endpoint, stabilization notes | Approve freeze/final restore and make railiance01 State Hub primary, or leave WSL2 primary explicitly. |
| State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
| Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Use the aligned live-vocabulary mapping, then run attended bootstrap and protected widget lookup. |
| Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Store/provide `OPS_HUB_KEY` outside Git and run the protected evidence smoke. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Deploy WP-0016 code/schema and bounded runtime prompt bundle, then run railiance01 smoke. |
| activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set REST sink env, restart/sync, run safe emission. |
| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
| OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | Approve custody profile and apply narrow issuer policies before live helper smokes. |
## Daily Automation Evidence
The scheduled daily-triage runner is alive and writing State Hub plus working
memory evidence. The current blocker is output validation, not scheduling or
sink reachability.
Latest clean scheduled run:
- 2026-06-25: State Hub event `cbba6bc0-14cb-492b-ab23-74b9349326c8`,
schema-valid daily triage, working memory written.
Latest failed scheduled runs:
- 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed
at char 5268, working memory written.
- 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed
at char 5246, working memory written.
Resume from `docs/daily-triage-stabilization-status.md` and
`ACTIVITY-WP-0016` before restarting the three-clean-run gate.
## Production Service Summary
| Surface | Stable fact | Remaining gate |
| --- | --- | --- |
| State Hub | Pragmatic railiance01 path has image, manifests, empty deploy, migrations, restored WSL2 data, row-count comparison, and healthy API through `CUST-WP-0011-T06`. | `CUST-WP-0011-T07` cutover approval, then stabilization; HA path stays deferred. |
| Inter-Hub | Public `https://hub.coulomb.social/api/v2/hubs` exposes `ops-hub`; public registry vocabulary is visible; Inter-Hub contract docs now target the live seed vocabulary. | Protected widget lookup, runtime key custody, and authenticated event smoke remain. |
| ops-hub evidence | `ops-hub` exists as the Inter-Hub Operations extension; `OPS-WP-0001` finished; `OPS-WP-0002` has early seed tasks done. | Attended bootstrap, runtime key custody, protected widget/event smoke. |
| issue-core | ArgoCD service is healthy on port `8765`; image `0.2.1`; ExternalSecret Ready; authenticated smoke created Gitea issue `175`. | activity-core still needs `ISSUE_CORE_API_KEY`, URL port `8765`, `ISSUE_SINK_TYPE=rest`, and a safe emission smoke. |
| Forgejo | Migration inventory/design lane is active but pre-cutover. | Production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, cutover approval. |
| artifact-store | Workplan is active with all tasks open and no current live secret handoff recorded. | Start D7.1 fork/object-store landscape and D7.2 compatibility harness. |
| FOS hub | Old NK-WP-0001 Keycloak prerequisite is cancelled; NK-WP-0002 local identity and IAM Profile v0.2 are done; hub-core extraction/dev-hub work is done. | Keep `CUST-WP-0025-T03` as the identity integration test, then reconcile old ops-hub scaffold tasks after first Inter-Hub ops event lands. |
## Next-Pick List
1. Run the attended Inter-Hub ops-hub bootstrap with the aligned live-vocabulary
mapping, confirm protected widget ids, and seed any missing backup/risk target widgets.
2. Store/confirm `OPS_HUB_KEY` through approved custody and run the protected
widget/hub-registry/event smoke.
3. Deploy the activity-core WP-0016 code/schema and bounded runtime prompt
bundle, then run the railiance01 daily-triage smoke.
4. Complete the issue-core handoff by wiring activity-core to port `8765` with
`ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or
record that WSL2 remains primary for the next operating period.
6. Start staged-promotion T02 and artifact-store D7.1/D7.2 so Forgejo and
storage work inherit clear production promotion gates.
7. Keep Forgejo cutover and State Hub HA work parked until their human decision
and drill gates are satisfied.
## Resume Commands
```bash
cd /home/worsch/the-custodian
sed -n '1,260p' workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
sed -n '1,260p' docs/infrastructure-stabilization-pickup-checkpoint.md
sed -n '1,260p' docs/credential-custody-unblock-board.md
```
After workplan edits, sync from State Hub:
```bash
cd /home/worsch/state-hub
make fix-consistency REPO=the-custodian
```


@@ -0,0 +1,48 @@
# Near-Term Production Service Lanes Status
Updated: 2026-06-27
## Purpose
Track `CUST-WP-0051-T05`: finish or park near-term production service lanes
before starting larger migrations.
## Lane Board
| Lane | Current state | Next action |
| --- | --- | --- |
| `issue-wp-0003` | issue-core is live through ArgoCD; image `0.2.1`, Service port `8765`, ExternalSecret Ready, authenticated smoke created Gitea issue `175`. | Do not flip activity-core blindly. First inject `ISSUE_CORE_API_KEY` into `actcore-runtime-secret` through route `activity-core-issue-sink`; then set activity-core `ISSUE_CORE_URL` to port `8765`, set `ISSUE_SINK_TYPE=rest`, restart/sync, and run one safe emission smoke. |
| `rail-ho-wp-0005` | Forgejo migration remains pre-implementation. Inventory is in progress; production decisions, SMTP/email recovery, cutover, and legacy retirement are human-gated. | Resolve T02 production decisions first, then build the disposable Forgejo probe. Do not start production cutover before promotion lifecycle, email recovery, package registry, Actions, backup/restore, and migration drill pass. |
| `artifact-store-wp-0007` | All tasks are still `todo`; no live secret gate is currently recorded. | Start with D7.1 fork/object-store landscape and D7.2 compatibility harness. Route D7.3 STS credential vending to NetKingdom if implementation belongs outside artifact-store. |
| `staged-promotion-lifecycle` | Lifecycle spec is done; schema/tooling/canary/promotion tasks are still `todo`. | Start T02 `railiance/app.toml` contract, then use issue-core/Forgejo as reference consumers for Stage 1/2/3 promotion gates. |
## Credential And Operator Routing
`activity-core -> issue-core` REST emission uses route catalog id
`activity-core-issue-sink`.
Route lookup on 2026-06-27:
- owner: `activity-core + issue-core`
- ops-warden executes: no
- status: active
- next action: follow `ops-warden/wiki/playbooks/activity-core-issue-sink.md#worker-checklist`
No secret value was read or written. The required non-secret evidence is:
- `actcore-runtime-secret` has an `ISSUE_CORE_API_KEY` data key;
- activity-core worker consumes `ISSUE_CORE_URL=http://issue-core.issue-core.svc.cluster.local:8765`;
- `ISSUE_SINK_TYPE=rest`;
- one known-safe activity-core emission returns issue-core HTTP 201 and creates
a Gitea issue.
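The configuration part of this evidence list can be checked without reading any secret value. The following is a minimal sketch under stated assumptions: `check_issue_sink_env` is a hypothetical helper, and it assumes the worker's settings are available as a plain dictionary. It only tests that `ISSUE_CORE_API_KEY` is present and never returns or logs its value.

```python
from urllib.parse import urlparse


def check_issue_sink_env(env: dict) -> list:
    """Return a list of configuration problems; an empty list means the env looks ready."""
    problems = []
    url = urlparse(env.get("ISSUE_CORE_URL", ""))
    if url.port != 8765:
        problems.append("ISSUE_CORE_URL does not point at port 8765")
    if env.get("ISSUE_SINK_TYPE") != "rest":
        problems.append("ISSUE_SINK_TYPE is not 'rest'")
    # Only presence is checked; the key value is never returned or logged.
    if not env.get("ISSUE_CORE_API_KEY"):
        problems.append("ISSUE_CORE_API_KEY is not set")
    return problems
```

The HTTP 201 and Gitea issue evidence still has to come from the live emission smoke; this check covers only the static settings.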
## Pickup Order
1. Close the issue-core handoff gate because the service is already healthy and
only activity-core live emission remains.
2. Start staged-promotion T02 so Forgejo has a repeatable promotion contract
before production cutover work accelerates.
3. Run artifact-store D7.1/D7.2 as an assessment/build harness lane, with D7.3
routed to NetKingdom if STS vending is not artifact-store-owned.
4. Keep Forgejo production cutover parked behind explicit T02 decisions and the
staged-promotion/backup/email/package/action gates.


@@ -0,0 +1,120 @@
# Ops Hub Inter-Hub Evidence Lane Status
Date: 2026-06-27
Workplan: `CUST-WP-0051-T03`
Related tasks: `CUST-WP-0047-T05`, `CUST-WP-0049-T06`, `IHUB-WP-0022-T03/T04/T07`
## Summary
The evidence lane is partially live but not ready to close.
Production Inter-Hub already exposes the public ops-hub bootstrap surface and
has an `ops-hub` row plus the ops-hub seed vocabulary. The remaining blockers
are:
1. authenticated bootstrap/runtime-key execution is still operator-gated;
2. protected widget and hub-registry reads cannot be verified without the
ops-hub runtime key;
3. the older `IHUB-WP-0022` activity-core mapping contract did not match the
currently live ops-hub seed vocabulary at probe time (see the 2026-06-27
contract alignment section).
No secret values were requested, read, printed, or stored during this probe.
## Public Probe Evidence
Base URL: `https://hub.coulomb.social`
| Probe | Result |
| --- | --- |
| `GET /api/v2/hubs` | HTTP `200`; contains `ops-hub` |
| `GET /api/v2/openapi.json` | HTTP `200`; includes `/hubs`, `/hub-capability-manifests`, `/api-consumers`, `/policy-scopes` |
| `GET /api/v2/widgets` | HTTP `401`, protected as expected |
| `GET /api/v2/hub-registry` | HTTP `401`, protected as expected |
| `GET /api/v2/widget-types` | HTTP `200`; 14 ops widget types visible |
| `GET /api/v2/event-types` | HTTP `200`; 15 ops event types visible |
| `GET /api/v2/annotation-categories` | HTTP `200`; 10 ops annotation categories visible |
| `GET /api/v2/policy-scopes` | HTTP `200`; 7 ops policy scopes visible |
| `GET /api/v2/hub-capability-manifests?hubId=<ops-hub-id>` | HTTP `401`, protected as expected |
Observed public ops-hub id: `4f6e4cf7-6a96-4ff2-8a37-08c9f9e405d2`.
The existing `ops-hub/scripts/interhub-gate-probe.py` exits nonzero because it
still expects unauthenticated `/api/v2/hubs` to return `401`. The live contract
returns `200` for public hub discovery and `401` for protected surfaces such as
`/api/v2/widgets` and `/api/v2/hub-registry`.
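One way to keep a gate probe in step with the live contract is to hold the expected statuses in a single table. This is a minimal sketch and not the content of `interhub-gate-probe.py`: `EXPECTED_STATUS` is copied from the probe table above, and `probe_mismatches` is a hypothetical helper that takes already-collected status codes, so no network call is made here.

```python
# Expected unauthenticated status per path, taken from the probe table above.
EXPECTED_STATUS = {
    "/api/v2/hubs": 200,
    "/api/v2/openapi.json": 200,
    "/api/v2/widgets": 401,
    "/api/v2/hub-registry": 401,
    "/api/v2/widget-types": 200,
    "/api/v2/event-types": 200,
    "/api/v2/annotation-categories": 200,
    "/api/v2/policy-scopes": 200,
}


def probe_mismatches(observed: dict) -> dict:
    """Return {path: (expected, observed)} for every path that deviates."""
    return {
        path: (expected, observed.get(path))
        for path, expected in EXPECTED_STATUS.items()
        if observed.get(path) != expected
    }
```

A probe built this way would exit nonzero only when the returned dictionary is non-empty, which makes a changed expectation such as public hub discovery a one-line table edit.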
## Live Ops Vocabulary
The live public registry matches `ops-hub/seeds/ops-hub-manifest.draft.json`:
- widget types: `ops-environment`, `ops-host`, `ops-cluster`, `ops-service`,
`ops-service-catalog`, `ops-endpoint`, `ops-release`, `ops-backup-set`,
`ops-secret-set`, `ops-runbook`, `ops-incident`, `ops-readiness-gate`,
`ops-migration-wave`, `ops-risk`;
- event types: `ops-inventory-registered`, `ops-inventory-updated`,
`ops-service-discovered`, `ops-health-checked`, `ops-release-observed`,
`ops-endpoint-verified`, `ops-backup-verified`, `ops-restore-tested`,
`ops-runbook-executed`, `ops-drift-detected`, `ops-risk-raised`,
`ops-risk-accepted`, `ops-readiness-gate-updated`,
`ops-migration-gate-passed`, `ops-migration-gate-failed`;
- policy scopes: `ops-local`, `ops-transitional-prod`, `ops-production`,
`ops-threephoenix`, `ops-registry`, `ops-secrets`,
`ops-backup-retention`.
## Contract Mismatch
`inter-hub/docs/contracts/ops-hub-activity-core-mapping.md` and
`ops-hub-activity-core-event-payloads.md` still describe the early
activity-core proposal:
| Contract name | Live seed status | Recommended action |
| --- | --- | --- |
| `ops-service-observed` | Not in live event registry | Rename to `ops-service-discovered`, or add an explicit alias event in the ops-hub manifest. |
| `ops-endpoint-verified` | Live | Keep. |
| `ops-access-path-checked` | Not in live event registry; no `ops-access-path` widget type in seed | Either add access-path vocabulary/widgets, or defer access-path submissions and keep State Hub fallback. |
| `ops-backup-verified` | Live | Keep, but map to `ops-backup-set` widget type. |
| `ops-inventory-drift` | Not in live event registry | Rename to `ops-drift-detected`, or add an explicit alias event. |
| `ops-evidence` policy scope | Not in live policy scopes | Use an existing ops scope or add `ops-evidence` to the manifest and activate it. |
| aggregate refs such as `ops:service:aggregate` | Not in `ops-hub/seeds/ops-hub-widgets.seed.json` | Seed aggregate intake widgets or change mapping to the existing entity/readiness widgets. |
| widget types such as `ops-service-card` | Not in live widget types | Use live widget types like `ops-service`, `ops-endpoint`, `ops-backup-set`, and `ops-readiness-gate`. |
## 2026-06-27 Contract Alignment
The Inter-Hub contract docs were revised in `/home/worsch/inter-hub` to target
the live ops-hub seed vocabulary:
- `ops-service-observed` is now a transition alias for
`ops-service-discovered`.
- `ops-inventory-drift` is now a transition alias for `ops-drift-detected`.
- `ops-access-path-checked` is explicitly deferred to State Hub fallback until
ops-hub adds access-path vocabulary or a readiness/risk mapping decision.
- The old `ops-evidence` policy scope is replaced by declared live scopes such
as `ops-production`, `ops-registry`, and `ops-backup-retention`.
- Payload examples now post only live manifest event types.
This removes the known contract-drift blocker before the attended bootstrap.
The remaining gate is authenticated widget lookup, any missing backup/risk seed
widget, runtime key custody, and protected event submission smoke.
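The alias and deferral decisions can be expressed as a small lookup. This is a minimal sketch, not the Inter-Hub contract code: the event type names come from the vocabulary and alignment notes in this document, while `resolve_event_type` and the `"inter-hub"` / `"state-hub-fallback"` labels are hypothetical names chosen for illustration.

```python
# Live event types copied from the "Live Ops Vocabulary" list.
LIVE_EVENT_TYPES = {
    "ops-inventory-registered", "ops-inventory-updated",
    "ops-service-discovered", "ops-health-checked", "ops-release-observed",
    "ops-endpoint-verified", "ops-backup-verified", "ops-restore-tested",
    "ops-runbook-executed", "ops-drift-detected", "ops-risk-raised",
    "ops-risk-accepted", "ops-readiness-gate-updated",
    "ops-migration-gate-passed", "ops-migration-gate-failed",
}

# Transition aliases described in the alignment notes.
ALIASES = {
    "ops-service-observed": "ops-service-discovered",
    "ops-inventory-drift": "ops-drift-detected",
}

# Event types deferred to the State Hub fallback.
DEFERRED = {"ops-access-path-checked"}


def resolve_event_type(name: str):
    """Return (sink, event_type); raise ValueError for names outside the contract."""
    if name in DEFERRED:
        return ("state-hub-fallback", name)
    live = ALIASES.get(name, name)
    if live in LIVE_EVENT_TYPES:
        return ("inter-hub", live)
    raise ValueError(f"unknown event type: {name}")
```

For example, `resolve_event_type("ops-inventory-drift")` returns `("inter-hub", "ops-drift-detected")`, while an unlisted name raises instead of being posted.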
## Current Closure State
`CUST-WP-0049-T06` remains `wait`: the helper and runbook are ready, but an
approved authenticated execution lane is still required.
`CUST-WP-0047-T05` remains `wait`: the ops-hub row and vocabulary are visible,
but seeded widgets and event acceptance cannot be proven without the protected
runtime path.
`IHUB-WP-0022-T03/T04/T07` remain gated: the activity-core mapping contract now
targets the live ops-hub seed vocabulary, but the end-to-end smoke still needs
protected widget lookup, any missing backup/risk seed widgets, and runtime key
custody.
## Next Pick
1. Use the aligned live-vocabulary contract for the attended
`CUST-WP-0049-T06` bootstrap.
2. Confirm protected widget ids and seed any missing backup/risk target widgets
required by the mapping.
3. Store or confirm `OPS_HUB_KEY` through OpenBao, then run the protected
widget/hub-registry/event smoke.


@@ -0,0 +1,34 @@
# State Hub Migration Strategy Status
Updated: 2026-06-27
## Decision
Use `CUST-WP-0011` as the active State Hub stabilization path.
Keep `CUST-WP-0038` and `RAIL-BS-WP-0007` as deferred HA/ThreePhoenix follow-up lanes.
Rationale: the pragmatic railiance01 deployment has already completed image
publish, cluster manifests, empty deploy, migrations, WSL2 data restore, row-count
comparison, and cluster API health checks. The remaining work is cutover and
stabilization, not initial buildout.
## Current State
| Path | State | Next action |
| --- | --- | --- |
| `CUST-WP-0011` pragmatic railiance01 | T01-T06 done. Cluster State Hub has verified restored WSL2 data and healthy API. | T07: get explicit approval to freeze WSL2 writes, restore final dump, compare again, and redirect private access/MCP to the cluster endpoint. |
| `CUST-WP-0038` full HA State Hub | Entry criteria depend on completing or superseding CUST-WP-0011 and passing stabilization. All implementation tasks are still todo. | Defer until cluster-hosted State Hub proves stable and ThreePhoenix storage/database strategy is current. |
| `RAIL-BS-WP-0007` ThreePhoenix HA cluster | All phases are todo. | Treat as substrate work for future critical workloads and HA State Hub, not as a blocker for pragmatic cutover. |
## Human Gates
- `CUST-WP-0011-T07`: explicit approval required before freezing WSL2 writes and making the cluster State Hub primary.
- `CUST-WP-0038-T08`: explicit approval required before retiring WSL2 fallback after HA failover and restore drills.
## Stable Pickup Path
1. Reconfirm current WSL2 backup and take final pre-cutover dump.
2. Restore final dump into railiance01 State Hub and compare counts again.
3. Redirect the active private access path: either keep local `127.0.0.1:8000` and move it to an ops-bridge/SSH tunnel, or set MCP `API_BASE` to the private cluster endpoint.
4. Run stabilization with WSL2 retained as fallback.
5. Document the operating model and leave final retirement to a later explicit decision or HA workplan.
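The "compare counts again" step in this path amounts to a per-table diff of row counts. The following is a minimal sketch, assuming the counts have already been collected from each side into plain dictionaries; `compare_row_counts` and the table names in the usage note are hypothetical and are not taken from the State Hub schema.

```python
def compare_row_counts(source: dict, target: dict) -> dict:
    """Return {table: (source_count, target_count)} for tables whose counts differ."""
    tables = sorted(set(source) | set(target))
    return {
        table: (source.get(table), target.get(table))
        for table in tables
        if source.get(table) != target.get(table)
    }
```

An empty result, for instance from `compare_row_counts({"tasks": 10}, {"tasks": 10})`, is the non-secret evidence that the restored data matches; a table missing on one side shows up with `None` for that side.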