Add infrastructure stabilization checkpoint
docs/credential-custody-unblock-board.md
@@ -0,0 +1,67 @@
# Credential Custody Unblock Board

Created: 2026-06-27
Owner: the-custodian coordination; credential owners remain with their owning repos.

## Purpose

This board collects the live credential and operator-access gates that block the
infrastructure stabilization plan. It records routes and non-secret evidence
only. It is not a secret store, an approval record, or a substitute for the
owning repos' runbooks.

## Rules

- Do not put secrets in Git, State Hub, workplans, shell history, or chat.
- If the installed `warden` lacks `route` commands, use the current ops-warden
  source CLI for routing: `cd /home/worsch/ops-warden && uv run warden route ...`.
- `ops-warden` executes SSH certificate issuance only. It does not vend API
  keys, OpenBao tokens, SMTP passwords, OIDC logins, or database credentials.
- OpenBao/API credentials route to `railiance-platform`; interactive identity
  routes to `key-cape`; tunnels route to `ops-bridge`; host principals and
  force-command deployment route to `railiance-infra`.
- Evidence may include ids, prefixes, counts, decision ids, HTTP status codes,
  and smoke pass/fail results. It must not include credential values.

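The routing rules above can be sketched as a small lookup from credential kind to route id. This is a hypothetical helper, not part of `warden`: the credential-kind names are illustrative assumptions, and `uv run warden route show <id> --json` remains the authoritative source for each route record.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a credential kind to a route id from the
# Route Records table. Kind names are illustrative, not a warden CLI.
route_for() {
  case "$1" in
    api-key|db-credential|provider-token|openbao-token|smtp-password)
      echo "openbao-api-key" ;;
    oidc-login|mfa|jwt)
      echo "key-cape-oidc-login" ;;
    tunnel|port-forward)
      echo "ops-bridge-tunnel" ;;
    host-principal|force-command)
      echo "railiance-infra-principals" ;;
    ssh-cert)
      # The only kind ops-warden itself executes.
      echo "ssh-cert-host-access" ;;
    *)
      echo "unknown credential kind: $1" >&2
      return 1 ;;
  esac
}
```

For example, `uv run warden route show "$(route_for smtp-password)" --json` would look up the `openbao-api-key` record. Unknown kinds fail loudly instead of guessing a route.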
## Route Records

| Route id | Owner | Scope | Warden executes? | Reference |
| --- | --- | --- | --- | --- |
| `openbao-api-key` | `railiance-platform` | API keys, DB credentials, provider tokens, OpenBao KV/dynamic leases | No | `wiki/CredentialRouting.md#routing-table` |
| `inter-hub-bootstrap-ssh` | `ops-warden` + `railiance-infra` | Inter-Hub bootstrap SSH envelope and force-command pattern | No | `wiki/InterHubBootstrapAccessLane.md#worker-checklist` |
| `ssh-cert-host-access` | `ops-warden` | Short-lived SSH cert signing for host reachability | Yes | `wiki/AccessRouting.md#issue-vs-route` |
| `railiance-infra-principals` | `railiance-infra` | Host SSH principal files and force-command deployment | No | `wiki/CredentialRouting.md#routing-table` |
| `key-cape-oidc-login` | `key-cape` | Interactive login, OIDC, MFA, JWT/authentication | No | `wiki/CredentialRouting.md#quick-decision-tree` |
| `ops-bridge-tunnel` | `ops-bridge` | SSH tunnels and port forwards | No | `wiki/playbooks/ops-bridge-tunnel-cert.md#migration-checklist` |

## Live Gates

| Gate | Blocking work | Owner and route | Expected execution host | Non-secret evidence | Fallback decision | Next action | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inter-Hub ops-hub bootstrap | `CUST-WP-0049-T06`, unblocks `CUST-WP-0047-T05` | `inter-hub-bootstrap-ssh` for the envelope; `openbao-api-key` for operator/runtime key custody; `ssh-cert-host-access` only for cert signing if remote execution is used | Local workstation with `IHUB_OPERATOR_KEY_FILE`, or trusted host with railiance-infra force-command wrapper | Hub id, manifest id, widget count, runtime key prefix only, bootstrap smoke result, State Hub progress id | Prefer the API helper. Use deployment-side migration/bootstrap only with explicit operator approval. Manual SQL remains a last resort and must be recorded as an exception. | Operator materializes the Inter-Hub operator key through approved custody, runs the ops-hub helper, stores the generated runtime key outside Git, and removes temp files. | Ready for operator handoff |
| Ops-hub runtime evidence key | `IHUB-WP-0022-T04`, then `IHUB-WP-0022-T07` | `openbao-api-key` owned by `railiance-platform` / OpenBao | Operator workstation, OpenBao UI/CLI session, or trusted cluster job; not a Codex-visible shell with printed values | OpenBao path/version or populated key count only, token exchange HTTP status, evidence submission smoke id | An attended one-time key file is acceptable only long enough to store the key in OpenBao and remove the file; no transfer through chat or State Hub. | Store/provide `OPS_HUB_KEY` via an OpenBao path, then run the Inter-Hub submission smoke. | Waiting on operator custody |
| OpenBao unseal and token automation | `NET-WP-0020`, related OpenBao token-grant and policy-gate blockers | `openbao-api-key` for OpenBao issuer/token paths; `railiance-infra-principals` for host policy; `ssh-cert-host-access` for cert signing; `key-cape-oidc-login` for login/MFA | OpenBao operator terminal, cluster-admin context, or trusted railiance-infra deployment path | Policy names, role names, token accessor only, decision ids, allow/deny smoke result | Keep the attended ceremony path until auto-unseal/profile is explicitly approved. Do not invent `warden secret` or paste `VAULT_TOKEN`. | Decide the custody profile, apply a narrow policy/role through the approved issuer path, and rerun the smoke with non-secret evidence. | Needs operator design/approval |
| Forgejo production migration | `RAIL-HO-WP-0005` T02/T06/T11/T12 | `openbao-api-key` for SMTP/package/provider credentials; `key-cape-oidc-login` for login/MFA; `ops-bridge-tunnel` or `ssh-cert-host-access` only for host reachability | Forgejo admin/browser session, railiance01 trusted host, or approved GitOps/deployment path | Decision record id, hostname/exposure choice, SMTP sender/domain alignment, password-reset smoke, backup/restore drill id, package pull smoke, cutover approval id | Keep Gitea as read-only rollback until stabilization passes; do not retire legacy Gitea without explicit approval. | Resolve production choices, store SMTP credentials through OpenBao, run recovery and migration drills, then request cutover approval. | Needs human production decisions |

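The ops-hub runtime key fallback (an attended one-time key file, stored in OpenBao and then removed) can be sketched as follows. This is a minimal sketch, assuming the OpenBao `bao` CLI follows the Vault CLI convention where `kv put path KEY=@file` reads the value from a file; the KV path is supplied by the operator and is not defined here, and `BAO_CLI` is a hypothetical override used only for testing. `shred -u` is best-effort and may not guarantee erasure on journaling or copy-on-write filesystems.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: move an attended one-time key file into OpenBao, then
# delete the file. The KV path is operator-supplied; the value is never echoed.
store_key_file() {
  local file="$1" kv_path="$2" bao="${BAO_CLI:-bao}"  # BAO_CLI: test override
  if [[ ! -s "$file" ]]; then
    echo "missing or empty key file: $file" >&2
    return 1
  fi
  # KEY=@FILE asks the CLI to read the value from disk, so the value never
  # appears in argv, shell history, or this function's output.
  if "$bao" kv put "$kv_path" "OPS_HUB_KEY=@${file}" >/dev/null; then
    shred -u "$file" 2>/dev/null || rm -f "$file"   # best-effort removal
    echo "stored OPS_HUB_KEY at $kv_path; removed $file"
  else
    echo "store failed; key file kept for retry: $file" >&2
    return 1
  fi
}
```

The file is removed only after a successful write, so a failed store leaves the operator able to retry without regenerating the key.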
## Route Lookup Commands

```bash
cd /home/worsch/ops-warden
uv run warden route show openbao-api-key --json
uv run warden route show inter-hub-bootstrap-ssh --json
uv run warden route show ssh-cert-host-access --json
uv run warden route show railiance-infra-principals --json
uv run warden route show key-cape-oidc-login --json
uv run warden route show ops-bridge-tunnel --json
```

## Pickup Order

1. Inter-Hub ops-hub bootstrap, because it unlocks both the now-view and the
   activity-core evidence lane.
2. Ops-hub runtime evidence key, because it is the immediate smoke gate after
   bootstrap.
3. OpenBao custody profile, because several credential-helper and policy-gate
   blockers collapse once a narrow issuer path exists.
4. Forgejo production decisions, because those require human design approval
   before execution can be responsibly automated.

docs/daily-triage-stabilization-status.md
@@ -0,0 +1,68 @@
# Daily-Triage Stabilization Status

Updated: 2026-06-27

## Purpose

Track the current daily-triage blocker chain for `CUST-WP-0051-T04` without
duplicating the source activity-core workplans.

## Current Evidence

State Hub `daily_triage` progress shows that the scheduled activity-core runner
is alive and can write both State Hub progress and working-memory notes.

Recent scheduled run evidence:

| Date | State Hub event | Result |
| --- | --- | --- |
| 2026-06-24 | `8b4c16ee-ac47-4581-b3ee-a23fc1f682e6` | schema-valid daily triage, working memory written |
| 2026-06-25 | `cbba6bc0-14cb-492b-ab23-74b9349326c8` | schema-valid daily triage, working memory written |
| 2026-06-26 | `97fd20a0-eee0-45ea-8290-6d91874e1515` | validation failed at char 5268, working memory written |
| 2026-06-27 | `c5ab50a8-404b-4e30-849f-841b059ace65` | validation failed at char 5246, working memory written |

The 2026-06-26 and 2026-06-27 failures are both overlong, malformed JSON
responses from `daily-triage-report`. They are not missed schedules, and they
are not silent sink failures.

## Current Blocker

The old `ACTIVITY-WP-0010` State Hub bridge note is partially superseded by the
newer evidence: scheduled runs are reaching State Hub and the working-memory
sink. The current primary blocker is that the live activity-core runtime still
uses an output path that can discard the whole report when the model emits a
malformed tail.

`ACTIVITY-WP-0016` has the repo-side mitigation:

- strict bounded report schema;
- item-granular recovery and quarantine;
- producer guardrails and ADR-004;
- regression tests for the 2026-06-26 failure shape.

The remaining gate is the live deployment/smoke path:

1. Deploy the WP-0016 code and schema together.
2. Update the Railiance runtime prompt bundle with bounded top-N instructions,
   per-item framing, value vocabularies, and sufficient `max_tokens` headroom.
3. Run a live daily-triage smoke on railiance01 and confirm that malformed-tail
   output degrades to partial valid output with quarantined items.
4. Resume the three-clean-scheduled-run gate for `ACTIVITY-WP-0006-T03` and
   `ACTIVITY-WP-0010-T04`.

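Progress toward the three-clean-scheduled-run gate can be tracked with a small streak counter. This sketch assumes the gate means three consecutive clean runs and that runs are exported as hypothetical `DATE STATUS` lines, oldest first. Whether partial output with quarantined items counts as clean is not decided in this document, so the sketch treats only `valid` as clean and any other status resets the streak.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: count the trailing streak of clean scheduled runs.
# Input: "DATE STATUS" lines, oldest first. Only STATUS "valid" counts as
# clean (an assumption); any other status resets the streak to zero.
clean_streak() {
  local streak=0 run_date status
  while read -r run_date status || [[ -n "$run_date" ]]; do
    if [[ -z "$run_date" ]]; then
      continue            # skip blank lines
    fi
    if [[ "$status" == "valid" ]]; then
      streak=$((streak + 1))
    else
      streak=0
    fi
  done
  echo "$streak"
}

# Succeed when the most recent N runs (default 3) were all clean.
gate_passed() {
  local needed="${1:-3}" streak
  streak="$(clean_streak)"
  (( streak >= needed ))
}
```

Fed the four runs from the evidence table (valid, valid, failed, failed), `clean_streak` prints `0`, so the gate restarts from the first clean run after the WP-0016 deployment.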
## Hygiene Note

The State Hub task index currently shows stale duplicate tasks for
`ACTIVITY-WP-0016` in addition to the source-file task records. Before relying
on activity-core task counts for triage ranking, run activity-core consistency
sync and prune or reconcile any stale generated task rows that are no longer
linked from the workplan file.

2026-06-27 status-normalization: ACTIVITY-WP-0016 source task blocks now
match the progress notes for T04 (done) and T05 (progress). Remaining hygiene is
to remove or reconcile stale duplicate task rows from the State Hub index.

2026-06-27 gate cleanup: ACTIVITY-WP-0010-T02 is now done because scheduled
runner evidence proves the State Hub sink and working-memory path are reachable.
The live human-needed notes now sit on the post-deployment smoke, WP-0016 live
proof, and three-clean-run calibration tasks.

docs/fos-hub-bootstrap-sequence-status.md
@@ -0,0 +1,33 @@
# FOS Hub Bootstrap Sequence Status

Updated: 2026-06-27

## Purpose

Track `CUST-WP-0051-T07`: sequence `CUST-WP-0025` so FOS Hub bootstrap can resume from current repo reality rather than the older mega-hub/Keycloak assumptions.

## Current Decision

Do not restart FOS bootstrap at the old `NK-WP-0001` Keycloak path. That workplan is archived and superseded. The active identity baseline is:

- `NK-WP-0002` local identity: complete; usable for bootstrap/dev OIDC.
- `NK-WP-0012` IAM Profile v0.2: finished; canonical NetKingdom-owned profile and conformance suite.
- KeyCape/Authelia/LLDAP stack from the superseding NetKingdom path: current lightweight identity mode.
- `NK-WP-0011` expanded-mode Keycloak: proposed enterprise federation lane, not a blocker for ops-hub bootstrap.

## Sequence Board

| Area | Current state | Pickup action |
| --- | --- | --- |
| Identity | Old `CUST-WP-0025-T01` pointed at archived `NK-WP-0001`; local identity and IAM Profile v0.2 are done. | Keep T01 cancelled, T02 done, and make T03 the remaining identity gate: a protected FastAPI fixture using IAM Profile v0.2 against local-identity or KeyCape. |
| Hub extraction/dev-hub | `CUST-WP-0025-T05` through `T12` are done: hub-core exists, State Hub imports hub-core, and MCP naming moved to dev-hub. | Treat Phase 2 as complete. Do not spend pickup energy here unless consistency drift appears. |
| Ops hub | The `ops-hub` repo exists as an Inter-Hub Operations extension. `OPS-WP-0001` is finished; `OPS-WP-0002` has T01-T03 done and waits on authenticated bootstrap/runtime key. | Finish the Inter-Hub evidence lane first: align the activity-core mapping with the live ops vocabulary, run attended bootstrap, store runtime key by approved route, then send the first governed ops event. |
| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` still describe a standalone hub-core/FastAPI/MCP scaffold. Current implementation direction is Inter-Hub extension-first. | Reconcile these tasks after the Inter-Hub evidence lane closes: either rewrite them to extension-owned implementation tasks or explicitly defer the standalone hub-core service. |
| Fin hub/business | `CUST-WP-0025-T20`-`T26` are all todo and depend on a proven multi-hub pattern. | Defer until ops-hub has a working first signal and the identity integration gate is proven. |

## Stable Pickup Order

1. Close the identity drift: T01 cancelled, T02 done, T03 remains as the one real identity integration test.
2. Finish `CUST-WP-0051-T03` / ops-hub Inter-Hub evidence alignment before expanding ops-hub models/tools.
3. Reconcile `CUST-WP-0025-T13`-`T19` against `OPS-WP-0002` once the first ops event lands.
4. Start fin-hub/business work only after ops-hub proves the extension pattern end-to-end.

docs/infrastructure-stabilization-pickup-checkpoint.md
@@ -0,0 +1,128 @@
# Infrastructure Stabilization Pickup Checkpoint

Updated: 2026-06-27
Coordinator workplan: `CUST-WP-0051`

## Purpose

This checkpoint is the restart surface for the infrastructure stabilization
metaplan. It consolidates the workplan review, unblock boards, current State
Hub registration state, and the next strategic picks.

Use this file first when resuming the lane. Then open the source workplan named
in the relevant row and continue from its task state.

## Registration State

State Hub active workstreams queried on 2026-06-27:

| Workstream | Current pickup meaning |
| --- | --- |
| `artifact-store-wp-0007` | Start D7.1/D7.2 assessment and compatibility harness; D7.3 STS vending may route to NetKingdom. |
| `ihub-wp-0022` | Ops-hub evidence intake contract is aligned to live vocabulary; runtime key custody, protected widget lookup, and smoke remain. |
| `cust-wp-0047` | Now-view waits on the ops-hub Inter-Hub evidence lane, not on service inventory collection. |
| `cust-wp-0049` | Bootstrap access helper/runbook is ready; authenticated execution is operator-gated. |
| `cust-wp-0051` | This metaplan is the coordination layer for remaining cross-workplan gates. |
| `activity-wp-0016-llm-output-robustness-trust-boundary` | Repo-side output robustness bundle is prepared; live deploy/smoke proof remains. |
| `three-phoenix-ha-cluster` | HA substrate remains future critical-workload work, not the current State Hub cutover blocker. |
| `staged-promotion-lifecycle` | Start T02 to make promotion gates concrete before broad production migrations. |
| `rail-ho-wp-0005` | Forgejo production migration is parked behind explicit design, SMTP, backup, runner, and cutover decisions. |
| `net-wp-0020` | OpenBao unseal/token custody remains an operator design and smoke gate. |
| `issue-wp-0003` | issue-core service is healthy; activity-core REST emission wiring remains. |
| `activity-wp-0006` | Calibration waits on the post-WP-0016 live daily-triage smoke and three clean scheduled runs. |
| `cust-wp-0038` | Full State Hub HA migration is deferred until the pragmatic railiance01 path stabilizes. |
| `cust-wp-0025` | FOS bootstrap resumes from identity integration and ops-hub evidence, not the old mega-hub scaffold. |
| `cust-wp-0011` | Active State Hub migration path; next gate is explicit cutover approval. |

Hygiene status:

- `CUST-WP-0045-cutover-runbook` is no longer active; it is a finished runbook
  record, not an empty active workstream.
- `CUST-WP-0014` is reopened as `backlog`; it is no longer a done workplan with
  todo task blocks.
- Stale human-needed flags on completed or cancelled tasks were cleared during
  this stabilization session.
- `make fix-consistency REPO=the-custodian` still reports pre-existing C-12
  orphan-row warnings, but the relevant workplan lifecycle and task states now
  sync.

## Blocker Board

No live credential, access, or approval gate is unowned. Do not ask
`ops-warden` for secret values; use the route catalog and the owning subsystem.

| Gate | Owner/route | Non-secret evidence to collect | Next action |
| --- | --- | --- | --- |
| State Hub pragmatic cutover | Custodian operator approval; `CUST-WP-0011-T07` | Final dump id/time, row-count comparison, chosen private endpoint, stabilization notes | Approve freeze/final restore and make railiance01 State Hub primary, or explicitly keep WSL2 primary. |
| State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
| Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Use the aligned live-vocabulary mapping, then run attended bootstrap and protected widget lookup. |
| Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Store/provide `OPS_HUB_KEY` outside Git and run the protected evidence smoke. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Deploy WP-0016 code/schema and bounded runtime prompt bundle, then run railiance01 smoke. |
| activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has the `ISSUE_CORE_API_KEY` data key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set REST sink env, restart/sync, run safe emission. |
| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
| OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | Approve custody profile and apply narrow issuer policies before live helper smokes. |

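The row-count comparison listed as State Hub cutover evidence could be produced along these lines. This sketch assumes per-table counts have already been captured from the WSL2 source and the railiance01 target into hypothetical `table count` files; it prints only table names and counts, reports tables that are missing or different on the target, and ignores tables that exist only on the target.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: compare "table count" files from source and target.
# Prints one MISMATCH line per differing table, then a summary line.
compare_row_counts() {
  local src="$1" dst="$2" table count mismatches=0
  local -A want=() got=()
  while read -r table count; do
    if [[ -n "$table" ]]; then want["$table"]="$count"; fi
  done < "$src"
  while read -r table count; do
    if [[ -n "$table" ]]; then got["$table"]="$count"; fi
  done < "$dst"
  for table in "${!want[@]}"; do
    # Tables absent from the target are reported as "missing".
    if [[ "${got[$table]:-missing}" != "${want[$table]}" ]]; then
      echo "MISMATCH $table source=${want[$table]} target=${got[$table]:-missing}"
      mismatches=$((mismatches + 1))
    fi
  done
  echo "tables=${#want[@]} mismatches=$mismatches"
  (( mismatches == 0 ))
}
```

The summary line (table count and mismatch count) is the kind of non-secret evidence the board asks for; it contains no row data.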
## Daily Automation Evidence

The scheduled daily-triage runner is alive and writing State Hub plus working
memory evidence. The current blocker is output validation, not scheduling or
sink reachability.

Latest clean scheduled run:

- 2026-06-25: State Hub event `cbba6bc0-14cb-492b-ab23-74b9349326c8`,
  schema-valid daily triage, working memory written.

Latest failed scheduled runs:

- 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed
  at char 5268, working memory written.
- 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed
  at char 5246, working memory written.

Resume from `docs/daily-triage-stabilization-status.md` and
`ACTIVITY-WP-0016` before restarting the three-clean-run gate.

## Production Service Summary

| Surface | Stable fact | Remaining gate |
| --- | --- | --- |
| State Hub | Pragmatic railiance01 path has image, manifests, empty deploy, migrations, restored WSL2 data, row-count comparison, and healthy API through `CUST-WP-0011-T06`. | `CUST-WP-0011-T07` cutover approval, then stabilization; HA path stays deferred. |
| Inter-Hub | Public `https://hub.coulomb.social/api/v2/hubs` exposes `ops-hub`; public registry vocabulary is visible; Inter-Hub contract docs now target the live seed vocabulary. | Protected widget lookup, runtime key custody, and authenticated event smoke remain. |
| ops-hub evidence | `ops-hub` exists as the Inter-Hub Operations extension; `OPS-WP-0001` finished; `OPS-WP-0002` has early seed tasks done. | Attended bootstrap, runtime key custody, protected widget/event smoke. |
| issue-core | ArgoCD service is healthy on port `8765`; image `0.2.1`; ExternalSecret Ready; authenticated smoke created Gitea issue `175`. | activity-core still needs `ISSUE_CORE_API_KEY`, URL port `8765`, `ISSUE_SINK_TYPE=rest`, and a safe emission smoke. |
| Forgejo | Migration inventory/design lane is active but pre-cutover. | Production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, cutover approval. |
| artifact-store | Workplan is active with all tasks open and no current live secret handoff recorded. | Start D7.1 fork/object-store landscape and D7.2 compatibility harness. |
| FOS hub | Old NK-WP-0001 Keycloak prerequisite is cancelled; NK-WP-0002 local identity and IAM Profile v0.2 are done; hub-core extraction/dev-hub work is done. | Keep `CUST-WP-0025-T03` as the identity integration test, then reconcile old ops-hub scaffold tasks after first Inter-Hub ops event lands. |

## Next-Pick List

1. Run the attended Inter-Hub ops-hub bootstrap with the aligned live-vocabulary
   mapping, confirm protected widget ids, and seed any missing backup/risk target widgets.
2. Store/confirm `OPS_HUB_KEY` through approved custody and run the protected
   widget/hub-registry/event smoke.
3. Deploy the activity-core WP-0016 code/schema and bounded runtime prompt
   bundle, then run the railiance01 daily-triage smoke.
4. Complete the issue-core handoff by wiring activity-core to port `8765` with
   `ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or
   record that WSL2 remains primary for the next operating period.
6. Start staged-promotion T02 and artifact-store D7.1/D7.2 so Forgejo and
   storage work inherit clear production promotion gates.
7. Keep Forgejo cutover and State Hub HA work parked until their human decision
   and drill gates are satisfied.

## Resume Commands

```bash
cd /home/worsch/the-custodian
sed -n '1,260p' workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
sed -n '1,260p' docs/infrastructure-stabilization-pickup-checkpoint.md
sed -n '1,260p' docs/credential-custody-unblock-board.md
```

After workplan edits, run the consistency sync from the State Hub repo:

```bash
cd /home/worsch/state-hub
make fix-consistency REPO=the-custodian
```

docs/near-term-production-service-lanes-status.md
@@ -0,0 +1,48 @@
# Near-Term Production Service Lanes Status

Updated: 2026-06-27

## Purpose

Track `CUST-WP-0051-T05`: finish or park near-term production service lanes
before starting larger migrations.

## Lane Board

| Lane | Current state | Next action |
| --- | --- | --- |
| `issue-wp-0003` | issue-core is live through ArgoCD; image `0.2.1`, Service port `8765`, ExternalSecret Ready, authenticated smoke created Gitea issue `175`. | Do not flip activity-core blindly. First inject `ISSUE_CORE_API_KEY` into `actcore-runtime-secret` through route `activity-core-issue-sink`; then set activity-core `ISSUE_CORE_URL` to port `8765`, set `ISSUE_SINK_TYPE=rest`, restart/sync, and run one safe emission smoke. |
| `rail-ho-wp-0005` | Forgejo migration remains pre-implementation. Inventory is in progress; production decisions, SMTP/email recovery, cutover, and legacy retirement are human-gated. | Resolve T02 production decisions first, then build the disposable Forgejo probe. Do not start production cutover before promotion lifecycle, email recovery, package registry, Actions, backup/restore, and migration drill pass. |
| `artifact-store-wp-0007` | All tasks are still `todo`; no live secret gate is currently recorded. | Start with D7.1 fork/object-store landscape and D7.2 compatibility harness. Route D7.3 STS credential vending to NetKingdom if implementation belongs outside artifact-store. |
| `staged-promotion-lifecycle` | Lifecycle spec is done; schema/tooling/canary/promotion tasks are still `todo`. | Start T02 `railiance/app.toml` contract, then use issue-core/Forgejo as reference consumers for Stage 1/2/3 promotion gates. |

## Credential And Operator Routing

`activity-core -> issue-core` REST emission uses route catalog id
`activity-core-issue-sink`.

Route lookup on 2026-06-27:

- owner: `activity-core + issue-core`
- ops-warden executes: no
- status: active
- next action: follow `ops-warden/wiki/playbooks/activity-core-issue-sink.md#worker-checklist`

No secret value was read or written. The required non-secret evidence is:

- `actcore-runtime-secret` has an `ISSUE_CORE_API_KEY` data key;
- activity-core worker consumes `ISSUE_CORE_URL=http://issue-core.issue-core.svc.cluster.local:8765`;
- `ISSUE_SINK_TYPE=rest`;
- one known-safe activity-core emission returns issue-core HTTP 201 and creates
  a Gitea issue.

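The first three evidence items can be checked without reading the key value. A minimal sketch, assuming `kubectl` access to the cluster; the activity-core namespace and deployment names below are hypothetical placeholders, and the env check assumes both variables are set directly on the container spec (values injected via `envFrom` or config map references may not appear as `NAME=VALUE` lines in `kubectl set env --list` output).

```bash
#!/usr/bin/env bash
# Hypothetical sketch: non-secret evidence checks for the activity-core ->
# issue-core handoff. Namespace and deployment names are placeholders.
ACTCORE_NS="${ACTCORE_NS:-activity-core}"                 # assumed namespace
ACTCORE_DEPLOY="${ACTCORE_DEPLOY:-activity-core-worker}"  # assumed deployment
EXPECTED_URL="http://issue-core.issue-core.svc.cluster.local:8765"

# Print only the data keys of a secret; the go-template never emits values.
secret_keys() {
  kubectl get secret "$1" -n "$ACTCORE_NS" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Read NAME=VALUE lines on stdin (e.g. from `kubectl set env --list`) and
# succeed only if the issue-core URL and the REST sink type are both set.
check_sink_env() {
  local name value url="" sink=""
  while IFS='=' read -r name value || [[ -n "$name" ]]; do
    case "$name" in
      ISSUE_CORE_URL) url="$value" ;;
      ISSUE_SINK_TYPE) sink="$value" ;;
    esac
  done
  [[ "$url" == "$EXPECTED_URL" && "$sink" == "rest" ]]
}

# Live usage against the cluster (not executed by this sketch):
#   secret_keys actcore-runtime-secret | grep -qx ISSUE_CORE_API_KEY
#   kubectl set env "deployment/$ACTCORE_DEPLOY" -n "$ACTCORE_NS" --list \
#     | check_sink_env
```

The fourth item, the HTTP 201 emission smoke, depends on activity-core's own emission path and is not sketched here.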
## Pickup Order

1. Close the issue-core handoff gate because the service is already healthy and
   only activity-core live emission remains.
2. Start staged-promotion T02 so Forgejo has a repeatable promotion contract
   before production cutover work accelerates.
3. Run artifact-store D7.1/D7.2 as an assessment/build harness lane, with D7.3
   routed to NetKingdom if STS vending is not artifact-store-owned.
4. Keep Forgejo production cutover parked behind explicit T02 decisions and the
   staged-promotion/backup/email/package/action gates.

docs/ops-hub-interhub-evidence-lane-status.md
@@ -0,0 +1,120 @@
# Ops Hub Inter-Hub Evidence Lane Status

Date: 2026-06-27
Workplan: `CUST-WP-0051-T03`
Related tasks: `CUST-WP-0047-T05`, `CUST-WP-0049-T06`, `IHUB-WP-0022-T03/T04/T07`

## Summary

The evidence lane is partially live but not ready to close.

Production Inter-Hub already exposes the public ops-hub bootstrap surface and
has an `ops-hub` row plus the ops-hub seed vocabulary. The remaining blockers
are:

1. authenticated bootstrap/runtime-key execution is still operator-gated;
2. protected widget and hub-registry reads cannot be verified without the
   ops-hub runtime key;
3. the older `IHUB-WP-0022` activity-core mapping contract does not match the
   currently live ops-hub seed vocabulary.

No secret values were requested, read, printed, or stored during this probe.

## Public Probe Evidence

Base URL: `https://hub.coulomb.social`

| Probe | Result |
| --- | --- |
| `GET /api/v2/hubs` | HTTP `200`; contains `ops-hub` |
| `GET /api/v2/openapi.json` | HTTP `200`; includes `/hubs`, `/hub-capability-manifests`, `/api-consumers`, `/policy-scopes` |
| `GET /api/v2/widgets` | HTTP `401`; protected as expected |
| `GET /api/v2/hub-registry` | HTTP `401`; protected as expected |
| `GET /api/v2/widget-types` | HTTP `200`; 14 ops widget types visible |
| `GET /api/v2/event-types` | HTTP `200`; 15 ops event types visible |
| `GET /api/v2/annotation-categories` | HTTP `200`; 10 ops annotation categories visible |
| `GET /api/v2/policy-scopes` | HTTP `200`; 7 ops policy scopes visible |
| `GET /api/v2/hub-capability-manifests?hubId=<ops-hub-id>` | HTTP `401`; protected as expected |

Observed public ops-hub id: `4f6e4cf7-6a96-4ff2-8a37-08c9f9e405d2`.

The existing `ops-hub/scripts/interhub-gate-probe.py` exits nonzero because it
still expects unauthenticated `/api/v2/hubs` to return `401`. The live contract
returns `200` for public hub discovery and `401` for protected surfaces such as
`/api/v2/widgets` and `/api/v2/hub-registry`.

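The corrected expectations can be captured in a small probe. This is a sketch of the live contract recorded in the table above (public discovery and registry vocabulary return `200`; protected surfaces return `401` without a key), not a patch to `interhub-gate-probe.py`; the function names are hypothetical, and only `probe` touches the network.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: expected unauthenticated status codes for the live
# Inter-Hub contract. Public discovery/vocabulary -> 200, protected -> 401.
BASE_URL="${BASE_URL:-https://hub.coulomb.social}"

expected_status() {
  case "$1" in
    /api/v2/hubs|/api/v2/openapi.json|/api/v2/widget-types) echo 200 ;;
    /api/v2/event-types|/api/v2/annotation-categories|/api/v2/policy-scopes) echo 200 ;;
    /api/v2/widgets|/api/v2/hub-registry) echo 401 ;;
    /api/v2/hub-capability-manifests*) echo 401 ;;   # any query string
    *) return 1 ;;
  esac
}

# Compare an observed status code with the expected one.
compare_status() {
  local path="$1" want="$2" got="$3"
  if [[ "$got" == "$want" ]]; then
    echo "OK $path $got"
  else
    echo "FAIL $path expected=$want got=$got"
    return 1
  fi
}

# Unauthenticated live probe of one path (requires network access).
probe() {
  local path="$1" want got
  want="$(expected_status "$path")" || { echo "UNKNOWN $path"; return 1; }
  got="$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL$path")"
  compare_status "$path" "$want" "$got"
}
```

A live run might loop over the probed paths, for example `for p in /api/v2/hubs /api/v2/widgets /api/v2/hub-registry; do probe "$p"; done`; no credentials are sent, so it only confirms the public/protected split.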
## Live Ops Vocabulary

The live public registry matches `ops-hub/seeds/ops-hub-manifest.draft.json`:

- widget types: `ops-environment`, `ops-host`, `ops-cluster`, `ops-service`,
  `ops-service-catalog`, `ops-endpoint`, `ops-release`, `ops-backup-set`,
  `ops-secret-set`, `ops-runbook`, `ops-incident`, `ops-readiness-gate`,
  `ops-migration-wave`, `ops-risk`;
- event types: `ops-inventory-registered`, `ops-inventory-updated`,
  `ops-service-discovered`, `ops-health-checked`, `ops-release-observed`,
  `ops-endpoint-verified`, `ops-backup-verified`, `ops-restore-tested`,
  `ops-runbook-executed`, `ops-drift-detected`, `ops-risk-raised`,
  `ops-risk-accepted`, `ops-readiness-gate-updated`,
  `ops-migration-gate-passed`, `ops-migration-gate-failed`;
- policy scopes: `ops-local`, `ops-transitional-prod`, `ops-production`,
  `ops-threephoenix`, `ops-registry`, `ops-secrets`,
  `ops-backup-retention`.

## Contract Mismatch

`inter-hub/docs/contracts/ops-hub-activity-core-mapping.md` and
`ops-hub-activity-core-event-payloads.md` still describe the early
activity-core proposal:

| Contract name | Live seed status | Recommended action |
| --- | --- | --- |
| `ops-service-observed` | Not in live event registry | Rename to `ops-service-discovered`, or add an explicit alias event in the ops-hub manifest. |
| `ops-endpoint-verified` | Live | Keep. |
| `ops-access-path-checked` | Not in live event registry; no `ops-access-path` widget type in seed | Either add access-path vocabulary/widgets, or defer access-path submissions and keep State Hub fallback. |
| `ops-backup-verified` | Live | Keep, but map to `ops-backup-set` widget type. |
| `ops-inventory-drift` | Not in live event registry | Rename to `ops-drift-detected`, or add an explicit alias event. |
| `ops-evidence` policy scope | Not in live policy scopes | Use an existing ops scope or add `ops-evidence` to the manifest and activate it. |
| aggregate refs such as `ops:service:aggregate` | Not in `ops-hub/seeds/ops-hub-widgets.seed.json` | Seed aggregate intake widgets or change mapping to the existing entity/readiness widgets. |
| widget types such as `ops-service-card` | Not in live widget types | Use live widget types like `ops-service`, `ops-endpoint`, `ops-backup-set`, and `ops-readiness-gate`. |

## 2026-06-27 Contract Alignment

The Inter-Hub contract docs were revised in `/home/worsch/inter-hub` to target
the live ops-hub seed vocabulary:

- `ops-service-observed` is now a transition alias for
  `ops-service-discovered`.
- `ops-inventory-drift` is now a transition alias for `ops-drift-detected`.
- `ops-access-path-checked` is explicitly deferred to the State Hub fallback
  until ops-hub adds access-path vocabulary or a readiness/risk mapping is
  decided.
- The old `ops-evidence` policy scope is replaced by declared live scopes such
  as `ops-production`, `ops-registry`, and `ops-backup-retention`.
- Payload examples now post only live manifest event types.

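The alignment rules above amount to a small routing step before submission. A sketch under stated assumptions: `TRANSITION_ALIASES`, `route_event`, `check_scope`, and the `"ops-hub"`/`"state-hub"` target labels are hypothetical. The alias pairs and the State Hub deferral come from the list; `ops-evidence` is rejected rather than remapped because the list names several possible replacement scopes, not one.

```python
from typing import Tuple

# Transition aliases from the 2026-06-27 contract alignment.
TRANSITION_ALIASES = {
    "ops-service-observed": "ops-service-discovered",
    "ops-inventory-drift": "ops-drift-detected",
}

# Deferred until ops-hub adds access-path vocabulary or a mapping is decided.
DEFERRED_TO_STATE_HUB = frozenset({"ops-access-path-checked"})

# Retired scope: the payload author must pick a declared live scope instead.
RETIRED_SCOPES = frozenset({"ops-evidence"})


def route_event(event_type: str) -> Tuple[str, str]:
    """Return (target, event_type): deferred events go to the State Hub
    fallback unchanged; everything else goes to ops-hub with aliases resolved."""
    if event_type in DEFERRED_TO_STATE_HUB:
        return ("state-hub", event_type)
    return ("ops-hub", TRANSITION_ALIASES.get(event_type, event_type))


def check_scope(scope: str) -> str:
    """Reject retired scopes instead of guessing a replacement."""
    if scope in RETIRED_SCOPES:
        raise ValueError(
            f"{scope} is retired; use a declared live scope such as "
            "ops-production, ops-registry, or ops-backup-retention"
        )
    return scope


if __name__ == "__main__":
    print(route_event("ops-service-observed"))     # ('ops-hub', 'ops-service-discovered')
    print(route_event("ops-access-path-checked"))  # ('state-hub', 'ops-access-path-checked')
    print(check_scope("ops-registry"))             # ops-registry
```

`route_event` passes unknown names through unchanged; it only resolves aliases and deferrals, so it still needs a separate check against the live event registry.
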
This removes the known contract-drift blocker before the attended bootstrap.
The remaining gates are authenticated widget lookup, any missing backup/risk
seed widgets, runtime key custody, and the protected event-submission smoke.

## Current Closure State

`CUST-WP-0049-T06` remains `wait`: the helper and runbook are ready, but an
approved authenticated execution lane is still required.

`CUST-WP-0047-T05` remains `wait`: the ops-hub row and vocabulary are visible,
but seeded widgets and event acceptance cannot be proven without the protected
runtime path.

`IHUB-WP-0022-T03/T04/T07` remain gated: before an end-to-end smoke, reconcile
the activity-core mapping contract to the live ops-hub seed vocabulary or add
the missing aliases/aggregate widgets to the manifest.

## Next Pick

1. Use the aligned live-vocabulary contract for the attended
   `CUST-WP-0049-T06` bootstrap.
2. Confirm protected widget ids and seed any missing backup/risk target widgets
   required by the mapping.
3. Store or confirm `OPS_HUB_KEY` through OpenBao, then run the protected
   widget/hub-registry/event smoke.

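Step 3 produces smoke evidence, which should carry no credential values. A minimal sketch of serializing smoke results (check name, HTTP status, pass/fail) with a guard against the key leaking into the record. It assumes the approved lane exposes `OPS_HUB_KEY` as an environment variable after retrieving it from OpenBao; `SmokeResult`, `evidence_record`, and the check names are hypothetical, and the demo results are illustrative, not real smoke output.

```python
import json
import os
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SmokeResult:
    check: str        # hypothetical check name, e.g. "widget-lookup"
    http_status: int  # the status code is kept; response bodies are not
    passed: bool


def evidence_record(results: List[SmokeResult], secret: Optional[str]) -> str:
    """Serialize non-secret smoke evidence as JSON.

    Raises RuntimeError if the secret value appears in the serialized text.
    This is a best-effort substring guard, not a general secret scanner.
    """
    payload = {
        "checks": [asdict(r) for r in results],
        # An empty run counts as not passed rather than vacuously passed.
        "all_passed": bool(results) and all(r.passed for r in results),
        # Record only whether a key was available, never the key itself.
        "key_present": bool(secret),
    }
    text = json.dumps(payload, sort_keys=True)
    if secret and secret in text:
        raise RuntimeError("credential value would appear in evidence; refusing to emit")
    return text


if __name__ == "__main__":
    # Assumption: the approved lane exports OPS_HUB_KEY after fetching it from OpenBao.
    key = os.environ.get("OPS_HUB_KEY")
    # Illustrative results only; a real run would fill these from the smoke calls.
    demo = [SmokeResult("widget-lookup", 200, True), SmokeResult("event-submit", 200, True)]
    print(evidence_record(demo, key))
```

Failing closed when the key shows up anywhere in the output keeps a mislabeled field from silently turning evidence into a secret leak.
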
34
docs/state-hub-migration-strategy-status.md
Normal file
@@ -0,0 +1,34 @@
# State Hub Migration Strategy Status

Updated: 2026-06-27

## Decision

Use `CUST-WP-0011` as the active State Hub stabilization path.
Keep `CUST-WP-0038` and `RAIL-BS-WP-0007` as deferred HA/ThreePhoenix follow-up lanes.

Rationale: the pragmatic railiance01 deployment has already completed image
publishing, cluster manifests, an empty deploy, migrations, the WSL2 data restore,
row-count comparison, and cluster API health checks. The remaining work is
cutover and stabilization, not initial buildout.

## Current State

| Path | State | Next action |
| --- | --- | --- |
| `CUST-WP-0011` pragmatic railiance01 | T01-T06 done. The cluster State Hub has verified restored WSL2 data and a healthy API. | T07: get explicit approval to freeze WSL2 writes, restore the final dump, compare again, and redirect private access/MCP to the cluster endpoint. |
| `CUST-WP-0038` full HA State Hub | Entry criteria depend on completing or superseding `CUST-WP-0011` and passing stabilization. All implementation tasks are still todo. | Defer until the cluster-hosted State Hub proves stable and the ThreePhoenix storage/database strategy is current. |
| `RAIL-BS-WP-0007` ThreePhoenix HA cluster | All phases are todo. | Treat as substrate work for future critical workloads and HA State Hub, not as a blocker for pragmatic cutover. |

## Human Gates

- `CUST-WP-0011-T07`: explicit approval required before freezing WSL2 writes and making the cluster State Hub primary.
- `CUST-WP-0038-T08`: explicit approval required before retiring the WSL2 fallback after HA failover and restore drills.

## Stable Pickup Path

1. Reconfirm the current WSL2 backup and take the final pre-cutover dump.
2. Restore the final dump into the railiance01 State Hub and compare row counts again.
3. Redirect the active private access path: either keep the local `127.0.0.1:8000` endpoint and serve it through an ops-bridge/SSH tunnel, or set the MCP `API_BASE` to the private cluster endpoint.
4. Run stabilization with WSL2 retained as fallback.
5. Document the operating model and leave final retirement to a later explicit decision or HA workplan.
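
Steps 1 and 2 end in a row-count comparison between the WSL2 source and the railiance01 restore. A sketch of the comparison itself, assuming per-table counts have already been collected from each side into plain dicts (how they are gathered is out of scope here); `compare_row_counts` is a hypothetical helper, and the table names and counts in the example are made up.

```python
from typing import Dict, Optional, Tuple

Counts = Dict[str, int]
Diff = Dict[str, Tuple[Optional[int], Optional[int]]]


def compare_row_counts(source: Counts, target: Counts) -> Diff:
    """Return {table: (source_count, target_count)} for every table whose counts differ.

    A table present on only one side is reported with None for the missing side.
    An empty result means every table has the same row count on both sides.
    """
    diffs: Diff = {}
    for table in sorted(source.keys() | target.keys()):
        src, dst = source.get(table), target.get(table)
        if src != dst:
            diffs[table] = (src, dst)
    return diffs


if __name__ == "__main__":
    # Made-up tables and counts, for illustration only.
    wsl2 = {"tasks": 5, "workplans": 3, "events": 7}
    cluster = {"tasks": 5, "workplans": 3, "events": 6}
    print(compare_row_counts(wsl2, cluster))  # {'events': (7, 6)}
```

Matching counts are necessary but not sufficient evidence of an identical restore; they catch missing or partial tables, not changed row contents.
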