From aa81d712e1bf06ff7da1f3d9f6fe244441c60688 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Sat, 27 Jun 2026 09:58:52 +0200
Subject: [PATCH] Add infrastructure stabilization checkpoint

---
 docs/credential-custody-unblock-board.md      |  67 +++
 docs/daily-triage-stabilization-status.md     |  68 +++
 docs/fos-hub-bootstrap-sequence-status.md     |  33 ++
 ...ructure-stabilization-pickup-checkpoint.md | 128 ++++++
 ...ar-term-production-service-lanes-status.md |  48 +++
 docs/ops-hub-interhub-evidence-lane-status.md | 120 ++++++
 docs/state-hub-migration-strategy-status.md   |  34 ++
 .../CUST-WP-0014-repo-sync-automation.md      |   9 +-
 workplans/CUST-WP-0025-fos-hub-bootstrap.md   |  12 +-
 workplans/CUST-WP-0045-cutover-runbook.md     |   2 +
 ...0047-ops-hub-service-inventory-now-view.md |   8 +
 ...-WP-0049-interhub-bootstrap-access-lane.md |   6 +
 ...1-infrastructure-stabilization-metaplan.md | 395 ++++++++++++++++++
 13 files changed, 925 insertions(+), 5 deletions(-)
 create mode 100644 docs/credential-custody-unblock-board.md
 create mode 100644 docs/daily-triage-stabilization-status.md
 create mode 100644 docs/fos-hub-bootstrap-sequence-status.md
 create mode 100644 docs/infrastructure-stabilization-pickup-checkpoint.md
 create mode 100644 docs/near-term-production-service-lanes-status.md
 create mode 100644 docs/ops-hub-interhub-evidence-lane-status.md
 create mode 100644 docs/state-hub-migration-strategy-status.md
 create mode 100644 workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md

diff --git a/docs/credential-custody-unblock-board.md b/docs/credential-custody-unblock-board.md
new file mode 100644
index 0000000..98edd9e
--- /dev/null
+++ b/docs/credential-custody-unblock-board.md
@@ -0,0 +1,67 @@
+# Credential Custody Unblock Board
+
+Created: 2026-06-27
+Owner: the-custodian coordination; credential owners remain with their owning repos.
+
+## Purpose
+
+This board collects the live credential and operator-access gates that block the
+infrastructure stabilization plan. It records routes and non-secret evidence
+only. It is not a secret store, approval record, or substitute for the owning
+repo runbooks.
+
+## Rules
+
+- Do not put secrets in Git, State Hub, workplans, shell history, or chat.
+- Use the current ops-warden source CLI for routing if the installed `warden`
+  lacks `route` commands: `cd /home/worsch/ops-warden && uv run warden route ...`.
+- `ops-warden` executes SSH certificate issuance only. It does not vend API
+  keys, OpenBao tokens, SMTP passwords, OIDC logins, or database credentials.
+- OpenBao/API credentials route to `railiance-platform`; interactive identity
+  routes to `key-cape`; tunnels route to `ops-bridge`; host principal and
+  force-command deployment routes to `railiance-infra`.
+- Evidence may include ids, prefixes, counts, decision ids, HTTP status, and
+  smoke pass/fail. It must not include credential values.
+
+## Route Records
+
+| Route id | Owner | Scope | Warden executes? | Reference |
+| --- | --- | --- | --- | --- |
+| `openbao-api-key` | `railiance-platform` | API keys, DB credentials, provider tokens, OpenBao KV/dynamic leases | No | `wiki/CredentialRouting.md#routing-table` |
+| `inter-hub-bootstrap-ssh` | `ops-warden` + `railiance-infra` | Inter-Hub bootstrap SSH envelope and force-command pattern | No | `wiki/InterHubBootstrapAccessLane.md#worker-checklist` |
+| `ssh-cert-host-access` | `ops-warden` | Short-lived SSH cert signing for host reachability | Yes | `wiki/AccessRouting.md#issue-vs-route` |
+| `railiance-infra-principals` | `railiance-infra` | Host SSH principal files and force-command deployment | No | `wiki/CredentialRouting.md#routing-table` |
+| `key-cape-oidc-login` | `key-cape` | Interactive login, OIDC, MFA, JWT/authentication | No | `wiki/CredentialRouting.md#quick-decision-tree` |
+| `ops-bridge-tunnel` | `ops-bridge` | SSH tunnels and port forwards | No | `wiki/playbooks/ops-bridge-tunnel-cert.md#migration-checklist` |
+
+## Live Gates
+
+| Gate | Blocking work | Owner and route | Expected execution host | Non-secret evidence | Fallback decision | Next action | Status |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| Inter-Hub ops-hub bootstrap | `CUST-WP-0049-T06`, unblocks `CUST-WP-0047-T05` | `inter-hub-bootstrap-ssh` for the envelope; `openbao-api-key` for operator/runtime key custody; `ssh-cert-host-access` only for cert signing if remote execution is used | Local workstation with `IHUB_OPERATOR_KEY_FILE`, or trusted host with railiance-infra force-command wrapper | Hub id, manifest id, widget count, runtime key prefix only, bootstrap smoke result, State Hub progress id | Prefer API helper. Use deployment-side migration/bootstrap only by explicit operator approval. Manual SQL remains last-resort and must be recorded as an exception. | Operator materializes Inter-Hub operator key through approved custody, runs the ops-hub helper, stores generated runtime key outside Git, removes temp files. | Ready for operator handoff |
+| Ops-hub runtime evidence key | `IHUB-WP-0022-T04`, then `IHUB-WP-0022-T07` | `openbao-api-key` owned by `railiance-platform` / OpenBao | Operator workstation, OpenBao UI/CLI session, or trusted cluster job; not a Codex-visible shell with printed values | OpenBao path/version or populated key count only, token exchange HTTP status, evidence submission smoke id | Attended one-time key file is acceptable only long enough to store in OpenBao and remove; no chat or State Hub transfer. | Store/provide `OPS_HUB_KEY` via OpenBao path, then run Inter-Hub submission smoke. | Waiting on operator custody |
+| OpenBao unseal and token automation | `NET-WP-0020`, related OpenBao token-grant and policy-gate blockers | `openbao-api-key` for OpenBao issuer/token paths; `railiance-infra-principals` for host policy; `ssh-cert-host-access` for cert signing; `key-cape-oidc-login` for login/MFA | OpenBao operator terminal, cluster-admin context, or trusted railiance-infra deployment path | Policy names, role names, token accessor only, decision ids, allow/deny smoke result | Keep attended ceremony path until auto-unseal/profile is explicitly approved. Do not invent `warden secret` or paste `VAULT_TOKEN`. | Decide custody profile, apply narrow policy/role through approved issuer path, rerun smoke with non-secret evidence. | Needs operator design/approval |
+| Forgejo production migration | `RAIL-HO-WP-0005` T02/T06/T11/T12 | `openbao-api-key` for SMTP/package/provider credentials; `key-cape-oidc-login` for login/MFA; `ops-bridge-tunnel` or `ssh-cert-host-access` only for host reachability | Forgejo admin/browser session, railiance01 trusted host, or approved GitOps/deployment path | Decision record id, hostname/exposure choice, SMTP sender/domain alignment, password-reset smoke, backup/restore drill id, package pull smoke, cutover approval id | Keep Gitea as read-only rollback until stabilization passes; do not retire legacy Gitea without explicit approval. | Resolve production choices, store SMTP credentials through OpenBao, run recovery and migration drills, then request cutover approval. | Needs human production decisions |
+
+## Route Lookup Commands
+
+```bash
+cd /home/worsch/ops-warden
+uv run warden route show openbao-api-key --json
+uv run warden route show inter-hub-bootstrap-ssh --json
+uv run warden route show ssh-cert-host-access --json
+uv run warden route show railiance-infra-principals --json
+uv run warden route show key-cape-oidc-login --json
+uv run warden route show ops-bridge-tunnel --json
+```
+
+## Pickup Order
+
+1. Inter-Hub ops-hub bootstrap, because it unlocks both the now-view and the
+   activity-core evidence lane.
+2. Ops-hub runtime evidence key, because it is the immediate smoke gate after
+   bootstrap.
+3. OpenBao custody profile, because several credential-helper and policy-gate
+   blockers collapse once a narrow issuer path exists.
+4. Forgejo production decisions, because those require human design approval
+   before execution can be responsibly automated.
diff --git a/docs/daily-triage-stabilization-status.md b/docs/daily-triage-stabilization-status.md
new file mode 100644
index 0000000..9f05e2a
--- /dev/null
+++ b/docs/daily-triage-stabilization-status.md
@@ -0,0 +1,68 @@
+# Daily-Triage Stabilization Status
+
+Updated: 2026-06-27
+
+## Purpose
+
+Track the current daily-triage blocker chain for `CUST-WP-0051-T04` without
+duplicating the source activity-core workplans.
+
+## Current Evidence
+
+State Hub `daily_triage` progress shows the scheduled activity-core runner is
+alive and can write both State Hub progress and working-memory notes.
+
+Recent scheduled run evidence:
+
+| Date | State Hub event | Result |
+| --- | --- | --- |
+| 2026-06-24 | `8b4c16ee-ac47-4581-b3ee-a23fc1f682e6` | schema-valid daily triage, working memory written |
+| 2026-06-25 | `cbba6bc0-14cb-492b-ab23-74b9349326c8` | schema-valid daily triage, working memory written |
+| 2026-06-26 | `97fd20a0-eee0-45ea-8290-6d91874e1515` | validation failed at char 5268, working memory written |
+| 2026-06-27 | `c5ab50a8-404b-4e30-849f-841b059ace65` | validation failed at char 5246, working memory written |
+
+The 2026-06-26 and 2026-06-27 failures are both overlong malformed JSON
+responses from `daily-triage-report`. They are not missed schedules and they are
+not silent sink failures.
+
+## Current Blocker
+
+The old `ACTIVITY-WP-0010` State Hub bridge note is partially superseded by the
+newer evidence: scheduled runs are reaching State Hub and the working-memory
+sink. The current primary blocker is that the live activity-core runtime still
+uses an output path that can discard the whole report when the model emits a
+malformed tail.
+
+`ACTIVITY-WP-0016` has the repo-side mitigation:
+
+- strict bounded report schema;
+- item-granular recovery and quarantine;
+- producer guardrails and ADR-004;
+- regression tests for the 2026-06-26 failure shape.
+
+The remaining gate is the live deployment/smoke path:
+
+1. Deploy the WP-0016 code and schema together.
+2. Update the Railiance runtime prompt bundle with bounded top-N instructions,
+   per-item framing, value vocabularies, and sufficient `max_tokens` headroom.
+3. Run a live daily-triage smoke on railiance01 and confirm malformed-tail
+   output degrades to partial valid output with quarantined items.
+4. Resume the three-clean-scheduled-run gate for `ACTIVITY-WP-0006-T03` and
+   `ACTIVITY-WP-0010-T04`.
+
+## Hygiene Note
+
+The State Hub task index currently shows stale duplicate tasks for
+`ACTIVITY-WP-0016` in addition to the source-file task records. Before relying
+on activity-core task counts for triage ranking, run activity-core consistency
+sync and prune or reconcile any stale generated task rows that are no longer
+linked from the workplan file.
+
+2026-06-27 status-normalization: ACTIVITY-WP-0016 source task blocks now
+match the progress notes for T04 (done) and T05 (progress). Remaining hygiene is
+to remove or reconcile stale duplicate task rows from the State Hub index.
+
+2026-06-27 gate cleanup: ACTIVITY-WP-0010-T02 is now done because scheduled
+runner evidence proves the State Hub sink and working-memory path are reachable.
+The live human-needed notes now sit on the post-deployment smoke, WP-0016 live
+proof, and three-clean-run calibration tasks.
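Reviewer sketch (not part of the patch): the daily-triage status doc says `ACTIVITY-WP-0016` adds item-granular recovery so that a malformed tail degrades to partial valid output instead of discarding the whole report. The Python below is a minimal illustration of that idea only. The function name `recover_items`, the top-level `items` array, and the assumption that every item is a JSON object are all hypothetical; the actual activity-core schema and recovery code are not shown in this patch.

```python
import json


def recover_items(raw: str, array_key: str = "items"):
    """Return (valid_items, quarantined_tail) from possibly truncated JSON.

    Hypothetical report shape: {"items": [ {...}, {...}, ... ]}.
    Items are decoded one at a time, so a malformed tail only costs
    the items after the last complete one.
    """
    decoder = json.JSONDecoder()
    key_pos = raw.find(f'"{array_key}"')
    if key_pos == -1:
        return [], raw  # no array found: quarantine everything
    pos = raw.find("[", key_pos)
    if pos == -1:
        return [], raw
    pos += 1  # step past the opening bracket

    items = []
    while True:
        # raw_decode does not skip separators, so do it by hand.
        while pos < len(raw) and raw[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(raw) or raw[pos] == "]":
            return items, ""  # array ended (or input ended) cleanly
        try:
            item, end = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            return items, raw[pos:]  # quarantine the malformed tail
        items.append(item)
        pos = end
```

Under these assumptions, a response cut off inside its third item would yield the first two items plus a quarantined tail, which is the degradation the live smoke in step 3 is meant to confirm.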
diff --git a/docs/fos-hub-bootstrap-sequence-status.md b/docs/fos-hub-bootstrap-sequence-status.md
new file mode 100644
index 0000000..f359ec3
--- /dev/null
+++ b/docs/fos-hub-bootstrap-sequence-status.md
@@ -0,0 +1,33 @@
+# FOS Hub Bootstrap Sequence Status
+
+Updated: 2026-06-27
+
+## Purpose
+
+Track `CUST-WP-0051-T07`: sequence `CUST-WP-0025` so FOS Hub bootstrap can resume from current repo reality rather than the older mega-hub/Keycloak assumptions.
+
+## Current Decision
+
+Do not restart FOS bootstrap at the old `NK-WP-0001` Keycloak path. That workplan is archived and superseded. The active identity baseline is:
+
+- `NK-WP-0002` local identity: complete; usable for bootstrap/dev OIDC.
+- `NK-WP-0012` IAM Profile v0.2: finished; canonical NetKingdom-owned profile and conformance suite.
+- KeyCape/Authelia/LLDAP stack from the superseding NetKingdom path: current lightweight identity mode.
+- `NK-WP-0011` expanded-mode Keycloak: proposed enterprise federation lane, not a blocker for ops-hub bootstrap.
+
+## Sequence Board
+
+| Area | Current state | Pickup action |
+| --- | --- | --- |
+| Identity | Old `CUST-WP-0025-T01` pointed at archived `NK-WP-0001`; local identity and IAM Profile v0.2 are done. | Keep T01 cancelled, T02 done, and make T03 the remaining identity gate: a protected FastAPI fixture using IAM Profile v0.2 against local-identity or KeyCape. |
+| Hub extraction/dev-hub | `CUST-WP-0025-T05` through `T12` are done: hub-core exists, State Hub imports hub-core, and MCP naming moved to dev-hub. | Treat Phase 2 as complete. Do not spend pickup energy here unless consistency drift appears. |
+| Ops hub | The `ops-hub` repo exists as an Inter-Hub Operations extension. `OPS-WP-0001` is finished; `OPS-WP-0002` has T01-T03 done and waits on authenticated bootstrap/runtime key. | Finish the Inter-Hub evidence lane first: align the activity-core mapping with the live ops vocabulary, run attended bootstrap, store runtime key by approved route, then send the first governed ops event. |
+| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` still describe a standalone hub-core/FastAPI/MCP scaffold. Current implementation direction is Inter-Hub extension-first. | Reconcile these tasks after the Inter-Hub evidence lane closes: either rewrite them to extension-owned implementation tasks or explicitly defer the standalone hub-core service. |
+| Fin hub/business | `CUST-WP-0025-T20`-`T26` are all todo and depend on a proven multi-hub pattern. | Defer until ops-hub has a working first signal and the identity integration gate is proven. |
+
+## Stable Pickup Order
+
+1. Close the identity drift: T01 cancelled, T02 done, T03 remains as the one real identity integration test.
+2. Finish `CUST-WP-0051-T03` / ops-hub Inter-Hub evidence alignment before expanding ops-hub models/tools.
+3. Reconcile `CUST-WP-0025-T13`-`T19` against `OPS-WP-0002` once the first ops event lands.
+4. Start fin-hub/business work only after ops-hub proves the extension pattern end-to-end.
diff --git a/docs/infrastructure-stabilization-pickup-checkpoint.md b/docs/infrastructure-stabilization-pickup-checkpoint.md
new file mode 100644
index 0000000..96928b2
--- /dev/null
+++ b/docs/infrastructure-stabilization-pickup-checkpoint.md
@@ -0,0 +1,128 @@
+# Infrastructure Stabilization Pickup Checkpoint
+
+Updated: 2026-06-27
+Coordinator workplan: `CUST-WP-0051`
+
+## Purpose
+
+This checkpoint is the restart surface for the infrastructure stabilization
+metaplan. It consolidates the workplan review, unblock boards, current State
+Hub registration state, and the next strategic picks.
+
+Use this file first when resuming the lane. Then open the source workplan named
+in the relevant row and continue from its task state.
+
+## Registration State
+
+State Hub active workstreams queried on 2026-06-27:
+
+| Workstream | Current pickup meaning |
+| --- | --- |
+| `artifact-store-wp-0007` | Start D7.1/D7.2 assessment and compatibility harness; D7.3 STS vending may route to NetKingdom. |
+| `ihub-wp-0022` | Ops-hub evidence intake contract is aligned to live vocabulary; runtime key custody, protected widget lookup, and smoke remain. |
+| `cust-wp-0047` | Now-view waits on the ops-hub Inter-Hub evidence lane, not on service inventory collection. |
+| `cust-wp-0049` | Bootstrap access helper/runbook is ready; authenticated execution is operator-gated. |
+| `cust-wp-0051` | This metaplan is the coordination layer for remaining cross-workplan gates. |
+| `activity-wp-0016-llm-output-robustness-trust-boundary` | Repo-side output robustness bundle is prepared; live deploy/smoke proof remains. |
+| `three-phoenix-ha-cluster` | HA substrate remains future critical-workload work, not the current State Hub cutover blocker. |
+| `staged-promotion-lifecycle` | Start T02 to make promotion gates concrete before broad production migrations. |
+| `rail-ho-wp-0005` | Forgejo production migration is parked behind explicit design, SMTP, backup, runner, and cutover decisions. |
+| `net-wp-0020` | OpenBao unseal/token custody remains an operator design and smoke gate. |
+| `issue-wp-0003` | issue-core service is healthy; activity-core REST emission wiring remains. |
+| `activity-wp-0006` | Calibration waits on the post-WP-0016 live daily-triage smoke and three clean scheduled runs. |
+| `cust-wp-0038` | Full State Hub HA migration is deferred until the pragmatic railiance01 path stabilizes. |
+| `cust-wp-0025` | FOS bootstrap resumes from identity integration and ops-hub evidence, not the old mega-hub scaffold. |
+| `cust-wp-0011` | Active State Hub migration path; next gate is explicit cutover approval. |
+
+Hygiene status:
+
+- `CUST-WP-0045-cutover-runbook` is no longer active; it is a finished runbook
+  record, not an empty active workstream.
+- `CUST-WP-0014` is reopened as `backlog`; it is no longer a done workplan with
+  todo task blocks.
+- Completed or cancelled tasks no longer carry the stale human-needed flags
+  cleared during this stabilization session.
+- `make fix-consistency REPO=the-custodian` still reports pre-existing C-12
+  orphan-row warnings, but the relevant workplan lifecycle and task states sync.
+
+## Blocker Board
+
+No live credential, access, or approval gate is unowned. Do not ask
+`ops-warden` for secret values; use the route catalog and the owning subsystem.
+
+| Gate | Owner/route | Non-secret evidence to collect | Next action |
+| --- | --- | --- | --- |
+| State Hub pragmatic cutover | Custodian operator approval; `CUST-WP-0011-T07` | Final dump id/time, row-count comparison, chosen private endpoint, stabilization notes | Approve freeze/final restore and make railiance01 State Hub primary, or leave WSL2 primary explicitly. |
+| State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
+| Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Use the aligned live-vocabulary mapping, then run attended bootstrap and protected widget lookup. |
+| Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Store/provide `OPS_HUB_KEY` outside Git and run the protected evidence smoke. |
+| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Deploy WP-0016 code/schema and bounded runtime prompt bundle, then run railiance01 smoke. |
+| activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set REST sink env, restart/sync, run safe emission. |
+| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
+| OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | Approve custody profile and apply narrow issuer policies before live helper smokes. |
+
+## Daily Automation Evidence
+
+The scheduled daily-triage runner is alive and writing State Hub plus working
+memory evidence. The current blocker is output validation, not scheduling or
+sink reachability.
+
+Latest clean scheduled run:
+
+- 2026-06-25: State Hub event `cbba6bc0-14cb-492b-ab23-74b9349326c8`,
+  schema-valid daily triage, working memory written.
+
+Latest failed scheduled runs:
+
+- 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed
+  at char 5268, working memory written.
+- 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed
+  at char 5246, working memory written.
+
+Resume from `docs/daily-triage-stabilization-status.md` and
+`ACTIVITY-WP-0016` before restarting the three-clean-run gate.
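Reviewer sketch (not part of the patch): the three-clean-run gate can be read as a count over the run evidence listed above. The Python below assumes the gate means three consecutive schema-valid runs counted back from the most recent one; that reading, the `Run` record, and the function name are assumptions for illustration, since the patch does not define the gate formally. The event ids and results are the ones recorded in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    date: str          # ISO date, so string order equals time order
    event_id: str      # State Hub event id from the evidence list
    schema_valid: bool


def trailing_clean_streak(runs):
    """Count schema-valid runs backwards from the newest run."""
    streak = 0
    for run in sorted(runs, key=lambda r: r.date, reverse=True):
        if not run.schema_valid:
            break
        streak += 1
    return streak


# Evidence copied from the Daily Automation Evidence section.
RUNS = [
    Run("2026-06-25", "cbba6bc0-14cb-492b-ab23-74b9349326c8", True),
    Run("2026-06-26", "97fd20a0-eee0-45ea-8290-6d91874e1515", False),
    Run("2026-06-27", "c5ab50a8-404b-4e30-849f-841b059ace65", False),
]

REQUIRED_CLEAN_RUNS = 3  # assumed meaning of "three-clean-run gate"
gate_met = trailing_clean_streak(RUNS) >= REQUIRED_CLEAN_RUNS
```

With the two most recent runs failing validation, the streak under this reading is zero, which is consistent with the doc's instruction to resume from `ACTIVITY-WP-0016` before restarting the gate.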
+
+## Production Service Summary
+
+| Surface | Stable fact | Remaining gate |
+| --- | --- | --- |
+| State Hub | Pragmatic railiance01 path has image, manifests, empty deploy, migrations, restored WSL2 data, row-count comparison, and healthy API through `CUST-WP-0011-T06`. | `CUST-WP-0011-T07` cutover approval, then stabilization; HA path stays deferred. |
+| Inter-Hub | Public `https://hub.coulomb.social/api/v2/hubs` exposes `ops-hub`; public registry vocabulary is visible; Inter-Hub contract docs now target the live seed vocabulary. | Protected widget lookup, runtime key custody, and authenticated event smoke remain. |
+| ops-hub evidence | `ops-hub` exists as the Inter-Hub Operations extension; `OPS-WP-0001` finished; `OPS-WP-0002` has early seed tasks done. | Attended bootstrap, runtime key custody, protected widget/event smoke. |
+| issue-core | ArgoCD service is healthy on port `8765`; image `0.2.1`; ExternalSecret Ready; authenticated smoke created Gitea issue `175`. | activity-core still needs `ISSUE_CORE_API_KEY`, URL port `8765`, `ISSUE_SINK_TYPE=rest`, and a safe emission smoke. |
+| Forgejo | Migration inventory/design lane is active but pre-cutover. | Production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, cutover approval. |
+| artifact-store | Workplan is active with all tasks open and no current live secret handoff recorded. | Start D7.1 fork/object-store landscape and D7.2 compatibility harness. |
+| FOS hub | Old NK-WP-0001 Keycloak prerequisite is cancelled; NK-WP-0002 local identity and IAM Profile v0.2 are done; hub-core extraction/dev-hub work is done. | Keep `CUST-WP-0025-T03` as the identity integration test, then reconcile old ops-hub scaffold tasks after first Inter-Hub ops event lands. |
+
+## Next-Pick List
+
+1. Run the attended Inter-Hub ops-hub bootstrap with the aligned live-vocabulary
+   mapping, confirm protected widget ids, and seed any missing backup/risk target widgets.
+2. Store/confirm `OPS_HUB_KEY` through approved custody and run the protected
+   widget/hub-registry/event smoke.
+3. Deploy the activity-core WP-0016 code/schema and bounded runtime prompt
+   bundle, then run the railiance01 daily-triage smoke.
+4. Complete the issue-core handoff by wiring activity-core to port `8765` with
+   `ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
+5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or
+   record that WSL2 remains primary for the next operating period.
+6. Start staged-promotion T02 and artifact-store D7.1/D7.2 so Forgejo and
+   storage work inherit clear production promotion gates.
+7. Keep Forgejo cutover and State Hub HA work parked until their human decision
+   and drill gates are satisfied.
+
+## Resume Commands
+
+```bash
+cd /home/worsch/the-custodian
+sed -n '1,260p' workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
+sed -n '1,260p' docs/infrastructure-stabilization-pickup-checkpoint.md
+sed -n '1,260p' docs/credential-custody-unblock-board.md
+```
+
+After workplan edits, sync from State Hub:
+
+```bash
+cd /home/worsch/state-hub
+make fix-consistency REPO=the-custodian
+```
diff --git a/docs/near-term-production-service-lanes-status.md b/docs/near-term-production-service-lanes-status.md
new file mode 100644
index 0000000..60ea41f
--- /dev/null
+++ b/docs/near-term-production-service-lanes-status.md
@@ -0,0 +1,48 @@
+# Near-Term Production Service Lanes Status
+
+Updated: 2026-06-27
+
+## Purpose
+
+Track `CUST-WP-0051-T05`: finish or park near-term production service lanes
+before starting larger migrations.
+
+## Lane Board
+
+| Lane | Current state | Next action |
+| --- | --- | --- |
+| `issue-wp-0003` | issue-core is live through ArgoCD; image `0.2.1`, Service port `8765`, ExternalSecret Ready, authenticated smoke created Gitea issue `175`. | Do not flip activity-core blindly. First inject `ISSUE_CORE_API_KEY` into `actcore-runtime-secret` through route `activity-core-issue-sink`; then set activity-core `ISSUE_CORE_URL` to port `8765`, set `ISSUE_SINK_TYPE=rest`, restart/sync, and run one safe emission smoke. |
+| `rail-ho-wp-0005` | Forgejo migration remains pre-implementation. Inventory is in progress; production decisions, SMTP/email recovery, cutover, and legacy retirement are human-gated. | Resolve T02 production decisions first, then build the disposable Forgejo probe. Do not start production cutover before promotion lifecycle, email recovery, package registry, Actions, backup/restore, and migration drill pass. |
+| `artifact-store-wp-0007` | All tasks are still `todo`; no live secret gate is currently recorded. | Start with D7.1 fork/object-store landscape and D7.2 compatibility harness. Route D7.3 STS credential vending to NetKingdom if implementation belongs outside artifact-store. |
+| `staged-promotion-lifecycle` | Lifecycle spec is done; schema/tooling/canary/promotion tasks are still `todo`. | Start T02 `railiance/app.toml` contract, then use issue-core/Forgejo as reference consumers for Stage 1/2/3 promotion gates. |
+
+## Credential And Operator Routing
+
+`activity-core -> issue-core` REST emission uses route catalog id
+`activity-core-issue-sink`.
+
+Route lookup on 2026-06-27:
+
+- owner: `activity-core + issue-core`
+- ops-warden executes: no
+- status: active
+- next action: follow `ops-warden/wiki/playbooks/activity-core-issue-sink.md#worker-checklist`
+
+No secret value was read or written. The required non-secret evidence is:
+
+- `actcore-runtime-secret` has an `ISSUE_CORE_API_KEY` data key;
+- activity-core worker consumes `ISSUE_CORE_URL=http://issue-core.issue-core.svc.cluster.local:8765`;
+- `ISSUE_SINK_TYPE=rest`;
+- one known-safe activity-core emission returns issue-core HTTP 201 and creates
+  a Gitea issue.
+
+## Pickup Order
+
+1. Close the issue-core handoff gate because the service is already healthy and
+   only activity-core live emission remains.
+2. Start staged-promotion T02 so Forgejo has a repeatable promotion contract
+   before production cutover work accelerates.
+3. Run artifact-store D7.1/D7.2 as an assessment/build harness lane, with D7.3
+   routed to NetKingdom if STS vending is not artifact-store-owned.
+4. Keep Forgejo production cutover parked behind explicit T02 decisions and the
+   staged-promotion/backup/email/package/action gates.
diff --git a/docs/ops-hub-interhub-evidence-lane-status.md b/docs/ops-hub-interhub-evidence-lane-status.md
new file mode 100644
index 0000000..70ecd66
--- /dev/null
+++ b/docs/ops-hub-interhub-evidence-lane-status.md
@@ -0,0 +1,120 @@
+# Ops Hub Inter-Hub Evidence Lane Status
+
+Date: 2026-06-27
+Workplan: `CUST-WP-0051-T03`
+Related tasks: `CUST-WP-0047-T05`, `CUST-WP-0049-T06`, `IHUB-WP-0022-T03/T04/T07`
+
+## Summary
+
+The evidence lane is partially live but not ready to close.
+
+Production Inter-Hub already exposes the public ops-hub bootstrap surface and
+has an `ops-hub` row plus the ops-hub seed vocabulary. The remaining blockers
+are:
+
+1. authenticated bootstrap/runtime-key execution is still operator-gated;
+2. protected widget and hub-registry reads cannot be verified without the
+   ops-hub runtime key;
+3. the older `IHUB-WP-0022` activity-core mapping contract does not match the
+   currently live ops-hub seed vocabulary.
+
+No secret values were requested, read, printed, or stored during this probe.
+
+## Public Probe Evidence
+
+Base URL: `https://hub.coulomb.social`
+
+| Probe | Result |
+| --- | --- |
+| `GET /api/v2/hubs` | HTTP `200`; contains `ops-hub` |
+| `GET /api/v2/openapi.json` | HTTP `200`; includes `/hubs`, `/hub-capability-manifests`, `/api-consumers`, `/policy-scopes` |
+| `GET /api/v2/widgets` | HTTP `401`, protected as expected |
+| `GET /api/v2/hub-registry` | HTTP `401`, protected as expected |
+| `GET /api/v2/widget-types` | HTTP `200`; 14 ops widget types visible |
+| `GET /api/v2/event-types` | HTTP `200`; 15 ops event types visible |
+| `GET /api/v2/annotation-categories` | HTTP `200`; 10 ops annotation categories visible |
+| `GET /api/v2/policy-scopes` | HTTP `200`; 7 ops policy scopes visible |
+| `GET /api/v2/hub-capability-manifests?hubId=` | HTTP `401`, protected as expected |
+
+Observed public ops-hub id: `4f6e4cf7-6a96-4ff2-8a37-08c9f9e405d2`.
+
+The existing `ops-hub/scripts/interhub-gate-probe.py` exits nonzero because it
+still expects unauthenticated `/api/v2/hubs` to return `401`. The live contract
+returns `200` for public hub discovery and `401` for protected surfaces such as
+`/api/v2/widgets` and `/api/v2/hub-registry`.
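Reviewer sketch (not part of the patch): the probe table above amounts to an expected-status map, and the stale `/api/v2/hubs` expectation in `interhub-gate-probe.py` is one entry in it. The Python below shows how such a comparison might look. It is not the content of the real script, which this patch does not include; `EXPECTED_STATUS` and `probe_mismatches` are hypothetical names, and fetching the statuses is deliberately left out so the sketch needs no network access or credentials.

```python
# Expected unauthenticated HTTP status per path, copied from the
# Public Probe Evidence table (the manifests probe used "?hubId=").
EXPECTED_STATUS = {
    "/api/v2/hubs": 200,                      # public hub discovery
    "/api/v2/openapi.json": 200,
    "/api/v2/widgets": 401,                   # protected
    "/api/v2/hub-registry": 401,              # protected
    "/api/v2/widget-types": 200,
    "/api/v2/event-types": 200,
    "/api/v2/annotation-categories": 200,
    "/api/v2/policy-scopes": 200,
    "/api/v2/hub-capability-manifests": 401,  # protected
}


def probe_mismatches(observed: dict[str, int]) -> list[str]:
    """List every path whose observed status differs from the table.

    A missing path is reported too, with `None` as the observed value.
    """
    problems = []
    for path, want in EXPECTED_STATUS.items():
        got = observed.get(path)
        if got != want:
            problems.append(f"{path}: expected {want}, got {got}")
    return problems
```

A caller would collect the observed statuses however it likes and treat an empty list as a pass; keeping the expectations in one table makes a contract change such as the `/api/v2/hubs` one a single-line update.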
+
+## Live Ops Vocabulary
+
+The live public registry matches `ops-hub/seeds/ops-hub-manifest.draft.json`:
+
+- widget types: `ops-environment`, `ops-host`, `ops-cluster`, `ops-service`,
+  `ops-service-catalog`, `ops-endpoint`, `ops-release`, `ops-backup-set`,
+  `ops-secret-set`, `ops-runbook`, `ops-incident`, `ops-readiness-gate`,
+  `ops-migration-wave`, `ops-risk`;
+- event types: `ops-inventory-registered`, `ops-inventory-updated`,
+  `ops-service-discovered`, `ops-health-checked`, `ops-release-observed`,
+  `ops-endpoint-verified`, `ops-backup-verified`, `ops-restore-tested`,
+  `ops-runbook-executed`, `ops-drift-detected`, `ops-risk-raised`,
+  `ops-risk-accepted`, `ops-readiness-gate-updated`,
+  `ops-migration-gate-passed`, `ops-migration-gate-failed`;
+- policy scopes: `ops-local`, `ops-transitional-prod`, `ops-production`,
+  `ops-threephoenix`, `ops-registry`, `ops-secrets`,
+  `ops-backup-retention`.
+
+## Contract Mismatch
+
+`inter-hub/docs/contracts/ops-hub-activity-core-mapping.md` and
+`ops-hub-activity-core-event-payloads.md` still describe the early
+activity-core proposal:
+
+| Contract name | Live seed status | Recommended action |
+| --- | --- | --- |
+| `ops-service-observed` | Not in live event registry | Rename to `ops-service-discovered`, or add an explicit alias event in the ops-hub manifest. |
+| `ops-endpoint-verified` | Live | Keep. |
+| `ops-access-path-checked` | Not in live event registry; no `ops-access-path` widget type in seed | Either add access-path vocabulary/widgets, or defer access-path submissions and keep State Hub fallback. |
+| `ops-backup-verified` | Live | Keep, but map to `ops-backup-set` widget type. |
+| `ops-inventory-drift` | Not in live event registry | Rename to `ops-drift-detected`, or add an explicit alias event. |
+| `ops-evidence` policy scope | Not in live policy scopes | Use an existing ops scope or add `ops-evidence` to the manifest and activate it. |
+| aggregate refs such as `ops:service:aggregate` | Not in `ops-hub/seeds/ops-hub-widgets.seed.json` | Seed aggregate intake widgets or change mapping to the existing entity/readiness widgets. |
+| widget types such as `ops-service-card` | Not in live widget types | Use live widget types like `ops-service`, `ops-endpoint`, `ops-backup-set`, and `ops-readiness-gate`. |
+
+
+## 2026-06-27 Contract Alignment
+
+The Inter-Hub contract docs were revised in `/home/worsch/inter-hub` to target
+the live ops-hub seed vocabulary:
+
+- `ops-service-observed` is now a transition alias for
+  `ops-service-discovered`.
+- `ops-inventory-drift` is now a transition alias for `ops-drift-detected`.
+- `ops-access-path-checked` is explicitly deferred to State Hub fallback until
+  ops-hub adds access-path vocabulary or a readiness/risk mapping decision.
+- The old `ops-evidence` policy scope is replaced by declared live scopes such
+  as `ops-production`, `ops-registry`, and `ops-backup-retention`.
+- Payload examples now post only live manifest event types.
+
+This removes the known contract-drift blocker before the attended bootstrap.
+The remaining gate is authenticated widget lookup, any missing backup/risk seed
+widget, runtime key custody, and protected event submission smoke.
+
+## Current Closure State
+
+`CUST-WP-0049-T06` remains `wait`: the helper and runbook are ready, but an
+approved authenticated execution lane is still required.
+
+`CUST-WP-0047-T05` remains `wait`: the ops-hub row and vocabulary are visible,
+but seeded widgets and event acceptance cannot be proven without the protected
+runtime path.
+
+`IHUB-WP-0022-T03/T04/T07` remain gated: before an end-to-end smoke, reconcile
+the activity-core mapping contract to the live ops-hub seed vocabulary or add
+the missing aliases/aggregate widgets to the manifest.
+
+## Next Pick
+
+1. Use the aligned live-vocabulary contract for the attended
+   `CUST-WP-0049-T06` bootstrap.
+2. Confirm protected widget ids and seed any missing backup/risk target widgets
+   required by the mapping.
+3. Store or confirm `OPS_HUB_KEY` through OpenBao, then run the protected
+   widget/hub-registry/event smoke.
diff --git a/docs/state-hub-migration-strategy-status.md b/docs/state-hub-migration-strategy-status.md
new file mode 100644
index 0000000..c72dd9d
--- /dev/null
+++ b/docs/state-hub-migration-strategy-status.md
@@ -0,0 +1,34 @@
+# State Hub Migration Strategy Status
+
+Updated: 2026-06-27
+
+## Decision
+
+Use `CUST-WP-0011` as the active State Hub stabilization path.
+Keep `CUST-WP-0038` and `RAIL-BS-WP-0007` as deferred HA/ThreePhoenix follow-up lanes.
+
+Rationale: the pragmatic railiance01 deployment has already completed image
+publish, cluster manifests, empty deploy, migrations, WSL2 data restore, row-count
+comparison, and cluster API health checks. The remaining work is cutover and
+stabilization, not initial buildout.
+
+## Current State
+
+| Path | State | Next action |
+| --- | --- | --- |
+| `CUST-WP-0011` pragmatic railiance01 | T01-T06 done. Cluster State Hub has verified restored WSL2 data and healthy API. | T07: get explicit approval to freeze WSL2 writes, restore final dump, compare again, and redirect private access/MCP to the cluster endpoint. |
+| `CUST-WP-0038` full HA State Hub | Entry criteria depend on completing or superseding CUST-WP-0011 and passing stabilization. All implementation tasks are still todo. | Defer until cluster-hosted State Hub proves stable and ThreePhoenix storage/database strategy is current. |
+| `RAIL-BS-WP-0007` ThreePhoenix HA cluster | All phases are todo. | Treat as substrate work for future critical workloads and HA State Hub, not as a blocker for pragmatic cutover. |
+
+## Human Gates
+
+- `CUST-WP-0011-T07`: explicit approval required before freezing WSL2 writes and making the cluster State Hub primary.
+- `CUST-WP-0038-T08`: explicit approval required before retiring WSL2 fallback after HA failover and restore drills. + +## Stable Pickup Path + +1. Reconfirm current WSL2 backup and take final pre-cutover dump. +2. Restore final dump into railiance01 State Hub and compare counts again. +3. Redirect the active private access path: either keep local `127.0.0.1:8000` and move it to an ops-bridge/SSH tunnel, or set MCP `API_BASE` to the private cluster endpoint. +4. Run stabilization with WSL2 retained as fallback. +5. Document the operating model and leave final retirement to a later explicit decision or HA workplan. diff --git a/workplans/CUST-WP-0014-repo-sync-automation.md b/workplans/CUST-WP-0014-repo-sync-automation.md index a6e46d1..502a18a 100644 --- a/workplans/CUST-WP-0014-repo-sync-automation.md +++ b/workplans/CUST-WP-0014-repo-sync-automation.md @@ -4,14 +4,19 @@ type: workplan title: Repo Sync Automation & Gitea Inventory domain: infotech repo: the-custodian -status: done +status: backlog state_hub_workstream_id: 27ea80bd-76bf-44a7-b0ed-e09748d5390b created: 2026-03-16 -updated: 2026-03-16 +updated: 2026-06-27 --- # CUST-WP-0014 — Repo Sync Automation & Gitea Inventory +2026-06-27 stabilization note: this workplan was previously marked `done` even +though all task blocks remained `todo`. It has been reopened as `backlog` so the +State Hub read model reflects the actual remaining sync-health work without +adding it to the current execution queue. 
+ ## Problem When a repo agent completes work and commits, the state-hub does not automatically diff --git a/workplans/CUST-WP-0025-fos-hub-bootstrap.md b/workplans/CUST-WP-0025-fos-hub-bootstrap.md index 684151c..67be8b2 100644 --- a/workplans/CUST-WP-0025-fos-hub-bootstrap.md +++ b/workplans/CUST-WP-0025-fos-hub-bootstrap.md @@ -8,7 +8,7 @@ status: active owner: custodian topic_slug: custodian created: "2026-03-20" -updated: "2026-06-22" +updated: "2026-06-27" state_hub_workstream_id: "293a74fe-a85a-4ad6-8933-23d52a72fe8b" --- @@ -57,7 +57,7 @@ OAS gives the viable infrastructure. ```task id: CUST-WP-0025-T01 -status: todo +status: cancel priority: high state_hub_task_id: "f55078b6-7fa3-49ab-be30-37db622d64c9" ``` @@ -68,11 +68,13 @@ foundation for all hubs and services. Cross-reference: net-kingdom NK-WP-0001. +2026-06-27 sequencing update: cancelled as an obsolete prerequisite. `NK-WP-0001` is archived and superseded by the KeyCape/Authelia/LLDAP lightweight stack, `NK-WP-0012` IAM Profile v0.2, and the proposed `NK-WP-0011` expanded-mode Keycloak federation lane. FOS bootstrap should not wait for this old Keycloak path. + ### T02 — Complete NK-WP-0002: Local identity bootstrap ```task id: CUST-WP-0025-T02 -status: todo +status: done priority: high state_hub_task_id: "0d7792f7-5695-4e1a-9726-b9661d5e7108" ``` @@ -83,6 +85,8 @@ development of hub services without cluster dependency. Cross-reference: net-kingdom NK-WP-0002. +2026-06-27 sequencing update: marked done. `NK-WP-0002` is complete: local-identity file store, Keycloak export, minimal localhost OIDC provider, permissions hardening, audit log, and docs are all delivered. + ### T03 — IAM Profile integration test ```task @@ -100,6 +104,8 @@ Write a minimal test service + integration test that: This test becomes the template for hub-core auth middleware. 
+2026-06-27 sequencing update: this remains the real open identity gate, but it should target the current NetKingdom IAM Profile v0.2 contract and either local-identity or KeyCape lightweight issuer, not the archived `NK-WP-0001` Keycloak path. + ### T04 — Canon standard: IAM Profile specification ```task diff --git a/workplans/CUST-WP-0045-cutover-runbook.md b/workplans/CUST-WP-0045-cutover-runbook.md index 386bf1b..f7039b2 100644 --- a/workplans/CUST-WP-0045-cutover-runbook.md +++ b/workplans/CUST-WP-0045-cutover-runbook.md @@ -3,7 +3,9 @@ id: CUST-WP-0045-cutover-runbook type: runbook title: "CUST-WP-0045 T06 cutover — exact command sequence" parent_workplan: CUST-WP-0045 +status: finished created: "2026-06-01" +updated: "2026-06-27" state_hub_workstream_id: "4ebc847b-4a2c-4ce2-9fd2-62d3071eed96" domain: infotech --- diff --git a/workplans/CUST-WP-0047-ops-hub-service-inventory-now-view.md b/workplans/CUST-WP-0047-ops-hub-service-inventory-now-view.md index a144779..0ddb604 100644 --- a/workplans/CUST-WP-0047-ops-hub-service-inventory-now-view.md +++ b/workplans/CUST-WP-0047-ops-hub-service-inventory-now-view.md @@ -155,6 +155,14 @@ remaining live-execution blocker. Done when the ops-hub widgets exist and can accept `ops-endpoint-verified` or equivalent ops evidence events. +2026-06-27 non-secret probe: production Inter-Hub publicly lists the `ops-hub` +hub row (`4f6e4cf7-6a96-4ff2-8a37-08c9f9e405d2`) and the ops-hub seed +vocabulary. Protected `/api/v2/widgets` and `/api/v2/hub-registry` return +HTTP `401` without a runtime key, so widget presence and event acceptance still +require the approved operator/runtime-key lane. The activity-core mapping +contract also needs reconciliation with the live ops-hub seed vocabulary before +smoke closure. 
+ ## Task: Build First Ops-Hub Service Catalog View ```task diff --git a/workplans/CUST-WP-0049-interhub-bootstrap-access-lane.md b/workplans/CUST-WP-0049-interhub-bootstrap-access-lane.md index 38371d0..5072ea6 100644 --- a/workplans/CUST-WP-0049-interhub-bootstrap-access-lane.md +++ b/workplans/CUST-WP-0049-interhub-bootstrap-access-lane.md @@ -180,6 +180,12 @@ Done when the ops-hub Inter-Hub records exist in production, the generated runtime key is stored outside Git, and non-secret validation evidence is logged to State Hub. +2026-06-27 non-secret probe: production Inter-Hub already has the `ops-hub` row +and public ops vocabulary. The live blocker remains authenticated execution and +runtime-key custody, plus a contract-alignment issue between `IHUB-WP-0022` +activity-core mapping docs and the live ops-hub seed vocabulary. Do not spend +operator key time on the smoke until the vocabulary/mapping direction is chosen. + ## Acceptance Criteria - The repeatable access lane is documented in the owning repos. 
diff --git a/workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md b/workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md new file mode 100644 index 0000000..68a0133 --- /dev/null +++ b/workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md @@ -0,0 +1,395 @@ +--- +id: CUST-WP-0051 +type: workplan +title: "Infrastructure Stabilization Metaplan" +domain: infotech +repo: the-custodian +status: active +owner: codex +topic_slug: custodian +planning_priority: high +planning_order: 51 +created: "2026-06-27" +updated: "2026-06-27" +state_hub_workstream_id: "21cabc98-3f80-4d00-b3b7-06e2ac2af88f" +--- + +# CUST-WP-0051 - Infrastructure Stabilization Metaplan + +## Goal + +Drive the registered infrastructure workplans from a scattered blocked state to +a stable checkpoint where: + +- active blockers have a named owner, route, and next command or decision; +- production credential work uses approved custody paths only; +- daily operational automation has one healthy runner and clean evidence; +- State Hub registration reflects the real file state; +- unfinished strategic work is sequenced into clear follow-on lanes. + +This workplan does not replace the child workplans. It is the coordination lane +for removing cross-workplan blocks and creating a reliable handoff point. + +## Review Snapshot + +Reviewed on 2026-06-27 from State Hub and the repo workplan files. + +Active registered workstreams with open work: + +| Workstream | Open state | Main stabilization meaning | +| --- | --- | --- | +| artifact-store-wp-0007 | 5 todo | Object-store compatibility and STS credential vending lane. | +| ihub-wp-0022 | 3 wait, 5 done | Ops-hub evidence intake waits on widget seed/runtime key/smoke. | +| cust-wp-0047 | 1 wait, 6 done | Ops-hub now view waits on Inter-Hub widget activation. | +| cust-wp-0049 | 1 wait, 5 done | Access lane is ready; live bootstrap needs approved admin execution. 
| +| activity-wp-0016 | 1 wait, 2 progress, 5 todo, 2 done | Daily-triage output robustness needs live deploy/smoke evidence. | +| three-phoenix-ha-cluster | 7 todo | Target HA substrate is planned but not executed. | +| staged-promotion-lifecycle | 6 todo, 1 done | Promotion discipline needed before broad production cutovers. | +| rail-ho-wp-0005 | 11 todo, 1 progress | Forgejo production migration needs human design and cutover decisions. | +| cust-wp-0045-cutover-runbook | 0 tasks | Registered runbook is appearing as an active no-task workstream. | +| net-wp-0020 | 2 wait, 1 todo, 2 done | OpenBao unseal custody models still need operator profile decisions. | +| issue-wp-0003 | 2 progress, 5 done | issue-core deploy is close; finish live wiring and runbook evidence. | +| activity-wp-0006 | 1 wait, 1 todo, 6 done | Three-run calibration waits on the daily-triage live gate. | +| cust-wp-0038 | 8 todo | Full ThreePhoenix State Hub HA migration remains strategic follow-on. | +| cust-wp-0025 | 17 todo, 9 done | FOS hub bootstrap now depends on identity, ops-hub, and fin-hub lanes. | +| cust-wp-0011 | 3 todo, 6 done | Pragmatic State Hub railiance01 migration still needs cutover/stabilize/retire. | + +Additional repo-local hygiene issue: + +- `CUST-WP-0014` has frontmatter `status: done` but all six task blocks are + still `todo`. Treat it as either superseded and archive it, or reopen it as a + focused State Hub sync-health workplan. + +State Hub hygiene issue: + +- There are stale `needs_human` flags on completed or cancelled tasks. These do + not all block execution, but they make the operator view noisier and should be + cleared or annotated after the source workplans are reconciled. + +## Dependency Shape + +The critical path is: + +1. Credential and operator-access custody: + OpenBao, Inter-Hub operator key, ops-hub runtime key, Forgejo SMTP/cutover + approvals, and OpenBao unseal profile decisions. +2. 
Ops evidence and daily automation: + Inter-Hub ops-hub records, activity-core daily-triage robustness deployment, + schema-valid smoke, then three clean scheduled runs. +3. Production substrate and source forge: + issue-core GitOps pilot, Forgejo production migration, artifact-store STS, + staged promotion, and State Hub migration strategy. +4. Federation buildout: + identity completion, ops-hub scaffold, ops-hub MCP registration, fin-hub + scaffold, and business/runway canon. + +## Task: Normalize Registry And Workplan Hygiene + +```task +id: CUST-WP-0051-T01 +status: done +priority: high +state_hub_task_id: "7e83bd50-5ca2-4341-9d18-65512e3f0442" +``` + +Clean up the planning substrate before execution work resumes. + +Minimum scope: + +- Decide whether `CUST-WP-0045-cutover-runbook` should stay registered as an + active workstream or be represented only as a runbook under `CUST-WP-0045`. +- Resolve `CUST-WP-0014`: archive as superseded, or reopen and re-scope the six + remaining State Hub sync-health tasks. +- Clear or annotate stale `needs_human` flags on done/cancel tasks after source + workplans confirm they are no longer live gates. +- Run State Hub consistency after file changes. + +Done when the active workstream list no longer contains no-task runbooks or +contradictory done-with-todo files, and the human-needed view shows only live +human gates. + +Progress 2026-06-27: + +- `CUST-WP-0045-cutover-runbook` now has `status: finished`; State Hub no + longer lists it as an active workstream. +- `CUST-WP-0014` is reopened as `backlog` with its task detail preserved, so it + is no longer a contradictory done-with-todo file or an active queue item. +- `make fix-consistency REPO=the-custodian` passed with pre-existing C-12 + warnings and synced the lifecycle changes into State Hub. + +Completed 2026-06-27: cleared 15 stale `needs_human` flags from tasks that +were already `done` or `cancel`, leaving live `todo`/`progress`/`wait` human +gates untouched. 
T01 is complete. + +## Task: Establish One Credential-Custody Unblock Board + +```task +id: CUST-WP-0051-T02 +status: done +priority: high +state_hub_task_id: "312bde29-4370-4352-b5a3-00a8c4fe2059" +``` + +Collect the live operator-access decisions in one non-secret board. + +Inputs: + +- `CUST-WP-0049-T06`: Inter-Hub admin access or deployment-side bootstrap path. +- `IHUB-WP-0022-T04`: ops-hub runtime `OPS_HUB_KEY` custody. +- `NET-WP-0020`: OpenBao unseal custody and SSH automation profile. +- `RAIL-HO-WP-0005`: Forgejo hostname, SMTP, runner, backup, cutover, rollback, + and retirement decisions. + +Rules: + +- Do not put secrets in Git, State Hub, workplans, or chat. +- Use `warden route find` / `warden route show` before requesting credentials. +- Treat ops-warden as SSH certificate authority only, not as a secret store. + +Done when each human/operator gate has an owner, approved route, expected +execution host, non-secret evidence target, and fallback decision. + +Completed 2026-06-27: added `docs/credential-custody-unblock-board.md` with +route records, live gate owners, expected execution hosts, non-secret evidence +targets, fallback decisions, and pickup order. Route lookup was verified through +`/home/worsch/ops-warden` using `uv run warden route show ... --json` because +the globally installed `warden` lacks the `route` subcommand. + +## Task: Close The Ops-Hub Inter-Hub Evidence Lane + +```task +id: CUST-WP-0051-T03 +status: progress +priority: high +state_hub_task_id: "d6c3a39e-629d-47e4-b589-9e1a0273d9fa" +``` + +Finish the linked ops-hub activation chain: + +- Execute `CUST-WP-0049-T06` using the approved access route. +- Close `CUST-WP-0047-T05` by proving ops-hub widgets exist and accept evidence + events. +- Unblock `IHUB-WP-0022` by provisioning the runtime key through the approved + secret path and running the end-to-end evidence submission smoke. 
+ +Done when ops inventory probes and activity-core evidence can land in Inter-Hub +without manual SQL or secret exposure. + +Progress 2026-06-27: + +- Added `docs/ops-hub-interhub-evidence-lane-status.md` with non-secret public + probe evidence. Production Inter-Hub has an `ops-hub` row and the ops-hub seed + vocabulary is visible on public registry endpoints. +- Protected widget, manifest, and hub-registry surfaces correctly require + authentication; no runtime-key smoke was attempted. +- New blocker surfaced: the older `IHUB-WP-0022` activity-core mapping contract + names event types, policy scope, aggregate widget refs, and widget types that + do not match the live ops-hub seed vocabulary. Align that contract before an + attended bootstrap/runtime-key smoke, or the operator key may still hit + manifest/schema failures. + +Progress 2026-06-27 contract alignment: + +- Updated `/home/worsch/inter-hub` contract docs for `IHUB-WP-0022` to target + the live ops-hub seed vocabulary. Old `ops-service-observed` and + `ops-inventory-drift` names are transition aliases, `ops-access-path-checked` + is deferred to fallback until supported, and payload examples now post only + live manifest event types. +- Ran `make fix-consistency REPO=inter-hub`; it passed with pre-existing C-12 + warnings and synced the IHUB-WP-0022 description drift into State Hub. +- Remaining T03 gate is authenticated widget lookup, any missing backup/risk + seed widget, runtime key custody, and protected submission smoke. + +## Task: Stabilize Daily-Triage Automation + +```task +id: CUST-WP-0051-T04 +status: progress +priority: high +state_hub_task_id: "42810d3b-5557-4efd-871b-65bef7c19e0e" +``` + +Finish the activity-core daily-triage reliability lane. + +Sequence: + +1. Deploy the `activity-wp-0016` robustness bundle: bounded prompt/schema, + per-item parsing, quarantine lane, and producer guardrails. +2. Run a schema-valid live daily-triage smoke on railiance01. +3. 
Collect three clean scheduled runs with matching activity-core, State Hub, + and working-memory evidence. +4. Close `activity-wp-0006` calibration and decide the fate of the + `CUST-WP-0045` cutover runbook registration. + +Done when there is exactly one trusted daily triage runner and the fallback +state is documented. + +Progress 2026-06-27: + +- Added `docs/daily-triage-stabilization-status.md` with the current evidence + chain. The 2026-06-24 and 2026-06-25 scheduled runs were schema-valid; the + 2026-06-26 and 2026-06-27 runs reached State Hub and working memory but failed + output validation around char 5.2k. +- Current primary blocker is no longer a silent schedule or State Hub sink + outage. The live runner still needs the `ACTIVITY-WP-0016` code/schema bundle + and Railiance runtime prompt changes so malformed tails degrade to quarantined + partial output. +- Pickup sequence: deploy WP-0016 code/schema together, update the runtime + prompt bundle for bounded top-N/per-item framing/token headroom, run a live + railiance01 smoke, then restart the three-clean-run gate. +- Normalized ACTIVITY-WP-0016 source task status in activity-core: T04 is done + and T05 is progress, matching its own progress notes. +- Updated activity-core daily-triage source notes: ACTIVITY-WP-0010-T02 is + now done, T03/T04 point at the post-WP-0016 live smoke and three-run gate, + and ACTIVITY-WP-0006-T03 records the 2026-06-27 validation failure. +- Cleared the stale human-needed flag from the completed bridge/config task and + moved live intervention notes onto the deploy/smoke/calibration gate. + +## Task: Finish Near-Term Production Service Lanes + +```task +id: CUST-WP-0051-T05 +status: progress +priority: medium +state_hub_task_id: "2083f0e4-e037-48bf-8069-f31e8db2fd95" +``` + +Move near-complete service workstreams to done before starting larger migrations. + +Priority order: + +- `issue-wp-0003`: finish activity-core wiring and end-to-end GitOps runbook. 
+- `rail-ho-wp-0005`: resolve Forgejo production decisions, email recovery, and + cutover approval gates. +- `artifact-store-wp-0007`: complete MinIO compatibility and STS credential + vending assessment if it is required by backup, registry, or app lanes. +- `staged-promotion-lifecycle`: make production promotion gates explicit before + further cluster/source-forge cutovers. + +Done when each lane is either finished or parked with a precise dependency and +no ambiguous human-needed state. + +Progress 2026-06-27: + +- Added `docs/near-term-production-service-lanes-status.md` with a lane board + for issue-core, Forgejo, artifact-store, and staged promotion. +- issue-core is the immediate near-done lane: the service itself is healthy, but + activity-core still points at port `8010` and `ISSUE_SINK_TYPE=null`. Do not + flip it to REST until `ISSUE_CORE_API_KEY` is injected into activity-core's + runtime secret via route `activity-core-issue-sink`. +- Forgejo remains parked behind explicit production design decisions, SMTP/email + recovery, package registry, Actions, backup/restore, migration drill, and + cutover approval. +- artifact-store and staged promotion are executable planning/build lanes: + start artifact-store D7.1/D7.2 and staged-promotion T02 before broad + production source-forge migration work. + +## Task: Decide State Hub Migration Strategy + +```task +id: CUST-WP-0051-T06 +status: progress +priority: high +state_hub_task_id: "0ac3763f-eac0-4773-9be8-cb0a7979e444" +``` + +Choose and execute the State Hub stabilization path. + +Decision: + +- If pragmatic railiance01 service is enough for the next operating period, + finish `CUST-WP-0011`: cutover MCP config, observe the stabilization window, + then retire or retain WSL2 fallback by explicit decision. +- If HA is now required, promote `CUST-WP-0038` and the ThreePhoenix HA cluster + lane: readiness, storage/database strategy, HA API behavior, failover drill, + restore drill, and endpoint/runbook update. 
+ +Done when the active State Hub path is singular, tested, and documented, and +the alternate path is either cancelled, deferred, or explicitly retained as a +future workplan. + +Progress 2026-06-27: + +- Added `docs/state-hub-migration-strategy-status.md` and selected + the pragmatic `CUST-WP-0011` railiance01 path as the singular active + State Hub stabilization lane. +- `CUST-WP-0011` is already through T01-T06: image pushed, cluster + manifests defined, empty deploy healthy, migrations run, WSL2 data restored, + row counts compared, and cluster API health/summary verified. +- Next gate is `CUST-WP-0011-T07`: explicit approval to freeze WSL2 + writes, restore the final dump, compare again, and redirect MCP/private access + to the cluster endpoint. +- `CUST-WP-0038` and `RAIL-BS-WP-0007` remain deferred HA + lanes until the pragmatic path stabilizes and ThreePhoenix storage/database + strategy is current. + +## Task: Sequence FOS Hub Bootstrap To Completion + +```task +id: CUST-WP-0051-T07 +status: progress +priority: medium +state_hub_task_id: "27b6828a-9e87-4135-a036-bce760c3057c" +``` + +Use the stabilized substrate to finish `CUST-WP-0025` without reviving the +mega-hub pattern. + +Recommended order: + +1. Finish identity foundations: NK-WP-0001, NK-WP-0002, then the IAM profile + integration test. +2. Create the standalone ops-hub repo from hub-core and ingest the inventory + artifacts from `CUST-WP-0047`. +3. Add ops models, MCP tools, Railiance integration, dev-hub coupling, dashboard, + and MCP registration. +4. Only then start the fin-hub/business-model tasks. + +Done when `CUST-WP-0025` has no open foundational identity or ops-hub tasks and +fin-hub work is either started on a stable scaffold or deliberately deferred. + +Progress 2026-06-27: + +- Added `docs/fos-hub-bootstrap-sequence-status.md` with the current sequence. 
+- Corrected the identity foundation baseline in `CUST-WP-0025`: the old + `NK-WP-0001` Keycloak task is cancelled as superseded, `NK-WP-0002` local + identity is done, and the remaining identity gate is the IAM Profile v0.2 + FastAPI integration test. +- Current ops-hub reality is extension-first: `ops-hub` exists, + `OPS-WP-0001` is finished, and `OPS-WP-0002` waits on authenticated + Inter-Hub bootstrap/runtime-key evidence. Reconcile `CUST-WP-0025-T13`-`T19` + after the first governed ops event lands. +- Fin-hub/business tasks remain deliberately deferred until identity integration + and ops-hub extension evidence are proven. + +## Task: Create The Stable Pickup Checkpoint + +```task +id: CUST-WP-0051-T08 +status: done +priority: high +state_hub_task_id: "2cc0a127-a749-4228-962e-f8c9b693a1b3" +``` + +Close this metaplan by creating an operator-friendly checkpoint. + +Minimum contents: + +- active workstream list with zero stale runbooks and zero contradictory task + states; +- blocker board showing no unowned credential, access, or approval gates; +- daily automation evidence from the latest successful scheduled run; +- production service status summary for State Hub, Inter-Hub, ops-hub evidence, + issue-core, Forgejo, and artifact-store; +- explicit next-pick list for remaining strategic tasks. + +Done when a future agent can start from the checkpoint and choose the next +workplan without reconstructing this review. + + +Completed 2026-06-27: added +`docs/infrastructure-stabilization-pickup-checkpoint.md` with the live active +workstream list, named blocker board, latest daily-triage evidence, production +service status summary, and next-pick sequence. This closes the handoff surface +for future agents while the child workplans remain the execution source of +truth.