Add infrastructure stabilization checkpoint
docs/credential-custody-unblock-board.md
# Credential Custody Unblock Board

Created: 2026-06-27

Owner: the-custodian coordination; credential owners remain with their owning repos.

## Purpose

This board collects the live credential and operator-access gates that block the
infrastructure stabilization plan. It records routes and non-secret evidence
only. It is not a secret store, an approval record, or a substitute for the
owning repo runbooks.

## Rules

- Do not put secrets in Git, State Hub, workplans, shell history, or chat.
- Use the current ops-warden source CLI for routing if the installed `warden`
  lacks `route` commands: `cd /home/worsch/ops-warden && uv run warden route ...`.
- `ops-warden` executes SSH certificate issuance only. It does not vend API
  keys, OpenBao tokens, SMTP passwords, OIDC logins, or database credentials.
- OpenBao/API credentials route to `railiance-platform`; interactive identity
  routes to `key-cape`; tunnels route to `ops-bridge`; host principal and
  force-command deployment routes to `railiance-infra`.
- Evidence may include ids, prefixes, counts, decision ids, HTTP status, and
  smoke pass/fail. It must not include credential values.

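The rules above are easier to keep with a mechanical check before evidence notes are committed. The following is a minimal sketch, assuming evidence is kept in plain-text files; the `check_evidence` function and its pattern are hypothetical, cover only credential variable names that appear on this board, and will miss secrets that are not written as `NAME=value`.

```bash
#!/usr/bin/env bash
# Hypothetical guard: refuse evidence text that assigns a value to a known
# credential variable. The names come from this board; the regex is an
# assumption and is not exhaustive.
SECRET_ASSIGNMENT_RE='(OPS_HUB_KEY|VAULT_TOKEN|ISSUE_CORE_API_KEY)=[^[:space:]`]+'

# check_evidence FILE: return 0 if FILE looks safe, 1 if it seems to hold a value.
check_evidence() {
  local file="$1"
  # grep -E uses extended regex; -q suppresses output and exits 0 on a match.
  if grep -Eq "$SECRET_ASSIGNMENT_RE" "$file"; then
    echo "refusing $file: possible credential value" >&2
    return 1
  fi
  return 0
}
```

Mentions such as "store `OPS_HUB_KEY` via OpenBao" pass, because no value follows an `=`.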
## Route Records

| Route id | Owner | Scope | Warden executes? | Reference |
| --- | --- | --- | --- | --- |
| `openbao-api-key` | `railiance-platform` | API keys, DB credentials, provider tokens, OpenBao KV/dynamic leases | No | `wiki/CredentialRouting.md#routing-table` |
| `inter-hub-bootstrap-ssh` | `ops-warden` + `railiance-infra` | Inter-Hub bootstrap SSH envelope and force-command pattern | No | `wiki/InterHubBootstrapAccessLane.md#worker-checklist` |
| `ssh-cert-host-access` | `ops-warden` | Short-lived SSH cert signing for host reachability | Yes | `wiki/AccessRouting.md#issue-vs-route` |
| `railiance-infra-principals` | `railiance-infra` | Host SSH principal files and force-command deployment | No | `wiki/CredentialRouting.md#routing-table` |
| `key-cape-oidc-login` | `key-cape` | Interactive login, OIDC, MFA, JWT/authentication | No | `wiki/CredentialRouting.md#quick-decision-tree` |
| `ops-bridge-tunnel` | `ops-bridge` | SSH tunnels and port forwards | No | `wiki/playbooks/ops-bridge-tunnel-cert.md#migration-checklist` |

## Live Gates

| Gate | Blocking work | Owner and route | Expected execution host | Non-secret evidence | Fallback decision | Next action | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inter-Hub ops-hub bootstrap | `CUST-WP-0049-T06`, unblocks `CUST-WP-0047-T05` | `inter-hub-bootstrap-ssh` for the envelope; `openbao-api-key` for operator/runtime key custody; `ssh-cert-host-access` only for cert signing if remote execution is used | Local workstation with `IHUB_OPERATOR_KEY_FILE`, or trusted host with railiance-infra force-command wrapper | Hub id, manifest id, widget count, runtime key prefix only, bootstrap smoke result, State Hub progress id | Prefer the API helper. Use deployment-side migration/bootstrap only with explicit operator approval. Manual SQL remains the last resort and must be recorded as an exception. | The operator materializes the Inter-Hub operator key through approved custody, runs the ops-hub helper, stores the generated runtime key outside Git, and removes temp files. | Ready for operator handoff |
| Ops-hub runtime evidence key | `IHUB-WP-0022-T04`, then `IHUB-WP-0022-T07` | `openbao-api-key` owned by `railiance-platform` / OpenBao | Operator workstation, OpenBao UI/CLI session, or trusted cluster job; not a Codex-visible shell with printed values | OpenBao path/version or populated key count only, token exchange HTTP status, evidence submission smoke id | An attended one-time key file is acceptable only long enough to store the key in OpenBao and then remove the file; never transfer it through chat or State Hub. | Store/provide `OPS_HUB_KEY` via the OpenBao path, then run the Inter-Hub submission smoke. | Waiting on operator custody |
| OpenBao unseal and token automation | `NET-WP-0020`, related OpenBao token-grant and policy-gate blockers | `openbao-api-key` for OpenBao issuer/token paths; `railiance-infra-principals` for host policy; `ssh-cert-host-access` for cert signing; `key-cape-oidc-login` for login/MFA | OpenBao operator terminal, cluster-admin context, or trusted railiance-infra deployment path | Policy names, role names, token accessor only, decision ids, allow/deny smoke result | Keep the attended ceremony path until auto-unseal/profile is explicitly approved. Do not invent `warden secret` or paste `VAULT_TOKEN`. | Decide the custody profile, apply a narrow policy/role through the approved issuer path, and rerun the smoke with non-secret evidence. | Needs operator design/approval |
| Forgejo production migration | `RAIL-HO-WP-0005` T02/T06/T11/T12 | `openbao-api-key` for SMTP/package/provider credentials; `key-cape-oidc-login` for login/MFA; `ops-bridge-tunnel` or `ssh-cert-host-access` only for host reachability | Forgejo admin/browser session, railiance01 trusted host, or approved GitOps/deployment path | Decision record id, hostname/exposure choice, SMTP sender/domain alignment, password-reset smoke, backup/restore drill id, package pull smoke, cutover approval id | Keep Gitea as a read-only rollback until stabilization passes; do not retire legacy Gitea without explicit approval. | Resolve production choices, store SMTP credentials through OpenBao, run recovery and migration drills, then request cutover approval. | Needs human production decisions |

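Several gates accept a runtime key prefix as evidence. A minimal sketch of recording only a prefix, assuming the key sits alone on the first line of an operator-held file; the `key_prefix` helper and its default length of 8 characters are hypothetical choices, not part of any owning repo's tooling.

```bash
# key_prefix FILE [N]: print the first N (default 8) characters of the key in
# FILE, never the full value. Hypothetical helper; the default N is an assumption.
key_prefix() {
  local file="$1" n="${2:-8}" key=""
  [ -r "$file" ] || return 1
  # IFS= and -r keep the line verbatim; the || branch still accepts a final
  # line that has no trailing newline.
  IFS= read -r key < "$file" || [ -n "$key" ] || return 1
  # Substring expansion: offset 0, length n.
  printf '%s\n' "${key:0:n}"
}
```

Run it only in an operator shell. The full key never reaches stdout, but the file itself still has to be removed under the fallback rules above.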
## Route Lookup Commands

```bash
cd /home/worsch/ops-warden
uv run warden route show openbao-api-key --json
uv run warden route show inter-hub-bootstrap-ssh --json
uv run warden route show ssh-cert-host-access --json
uv run warden route show railiance-infra-principals --json
uv run warden route show key-cape-oidc-login --json
uv run warden route show ops-bridge-tunnel --json
```

## Pickup Order

1. Inter-Hub ops-hub bootstrap, because it unlocks both the now-view and the
   activity-core evidence lane.
2. Ops-hub runtime evidence key, because it is the immediate smoke gate after
   bootstrap.
3. OpenBao custody profile, because several credential-helper and policy-gate
   blockers collapse once a narrow issuer path exists.
4. Forgejo production decisions, because those require human design approval
   before execution can be responsibly automated.

docs/daily-triage-stabilization-status.md
# Daily-Triage Stabilization Status

Updated: 2026-06-27

## Purpose

Track the current daily-triage blocker chain for `CUST-WP-0051-T04` without
duplicating the source activity-core workplans.

## Current Evidence

State Hub `daily_triage` progress shows that the scheduled activity-core runner
is alive and can write both State Hub progress and working-memory notes.

Recent scheduled run evidence:

| Date | State Hub event | Result |
| --- | --- | --- |
| 2026-06-24 | `8b4c16ee-ac47-4581-b3ee-a23fc1f682e6` | schema-valid daily triage, working memory written |
| 2026-06-25 | `cbba6bc0-14cb-492b-ab23-74b9349326c8` | schema-valid daily triage, working memory written |
| 2026-06-26 | `97fd20a0-eee0-45ea-8290-6d91874e1515` | validation failed at char 5268, working memory written |
| 2026-06-27 | `c5ab50a8-404b-4e30-849f-841b059ace65` | validation failed at char 5246, working memory written |

The 2026-06-26 and 2026-06-27 failures are both overlong, malformed JSON
responses from `daily-triage-report`. They are not missed schedules, and they
are not silent sink failures.

## Current Blocker

The old `ACTIVITY-WP-0010` State Hub bridge note is partially superseded by the
newer evidence: scheduled runs are reaching State Hub and the working-memory
sink. The current primary blocker is that the live activity-core runtime still
uses an output path that can discard the whole report when the model emits a
malformed tail.

`ACTIVITY-WP-0016` has the repo-side mitigation:

- strict bounded report schema;
- item-granular recovery and quarantine;
- producer guardrails and ADR-004;
- regression tests for the 2026-06-26 failure shape.

The remaining gate is the live deployment/smoke path:

1. Deploy the WP-0016 code and schema together.
2. Update the Railiance runtime prompt bundle with bounded top-N instructions,
   per-item framing, value vocabularies, and sufficient `max_tokens` headroom.
3. Run a live daily-triage smoke on railiance01 and confirm that malformed-tail
   output degrades to partial valid output with quarantined items.
4. Resume the three-clean-scheduled-run gate for `ACTIVITY-WP-0006-T03` and
   `ACTIVITY-WP-0010-T04`.

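The three-clean-scheduled-run gate in step 4 can be stated as a streak count. This sketch assumes each scheduled run is summarized as `valid` or `failed` in chronological order; those labels are hypothetical summaries of the State Hub `daily_triage` events, not an existing field. Whether a partial/quarantined run counts as clean is left open here; the sketch counts only `valid`.

```bash
# trailing_clean_runs RESULT...: count the most recent consecutive "valid"
# results, given results in chronological order. Labels are hypothetical.
trailing_clean_runs() {
  local count=0 result
  for result in "$@"; do
    if [ "$result" = "valid" ]; then
      count=$((count + 1))
    else
      count=0   # any failure resets the streak
    fi
  done
  echo "$count"
}

# three_clean_gate RESULT...: succeed only once three consecutive clean runs exist.
three_clean_gate() {
  [ "$(trailing_clean_runs "$@")" -ge 3 ]
}

# Runs from 2026-06-24 through 2026-06-27, as recorded in the evidence table.
trailing_clean_runs valid valid failed failed
```

With those four runs the streak is `0`, so the gate stays closed until three consecutive clean runs follow the WP-0016 deployment.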
## Hygiene Note

The State Hub task index currently shows stale duplicate tasks for
`ACTIVITY-WP-0016` in addition to the source-file task records. Before relying
on activity-core task counts for triage ranking, run activity-core consistency
sync and prune or reconcile any stale generated task rows that are no longer
linked from the workplan file.

2026-06-27 status normalization: `ACTIVITY-WP-0016` source task blocks now
match the progress notes for T04 (done) and T05 (progress). The remaining
hygiene step is to remove or reconcile stale duplicate task rows in the State
Hub index.

2026-06-27 gate cleanup: `ACTIVITY-WP-0010-T02` is now done because scheduled
runner evidence proves the State Hub sink and working-memory path are reachable.
The live human-needed notes now sit on the post-deployment smoke, WP-0016 live
proof, and three-clean-run calibration tasks.

docs/fos-hub-bootstrap-sequence-status.md
# FOS Hub Bootstrap Sequence Status

Updated: 2026-06-27

## Purpose

Track `CUST-WP-0051-T07`: sequence `CUST-WP-0025` so FOS Hub bootstrap can resume from current repo reality rather than the older mega-hub/Keycloak assumptions.

## Current Decision

Do not restart FOS bootstrap at the old `NK-WP-0001` Keycloak path. That workplan is archived and superseded. The active identity baseline is:

- `NK-WP-0002` local identity: complete; usable for bootstrap/dev OIDC.
- `NK-WP-0012` IAM Profile v0.2: finished; canonical NetKingdom-owned profile and conformance suite.
- KeyCape/Authelia/LLDAP stack from the superseding NetKingdom path: current lightweight identity mode.
- `NK-WP-0011` expanded-mode Keycloak: proposed enterprise federation lane, not a blocker for ops-hub bootstrap.

## Sequence Board

| Area | Current state | Pickup action |
| --- | --- | --- |
| Identity | Old `CUST-WP-0025-T01` pointed at archived `NK-WP-0001`; local identity and IAM Profile v0.2 are done. | Keep T01 cancelled and T02 done, and make T03 the remaining identity gate: a protected FastAPI fixture using IAM Profile v0.2 against local-identity or KeyCape. |
| Hub extraction/dev-hub | `CUST-WP-0025-T05` through `T12` are done: hub-core exists, State Hub imports hub-core, and MCP naming moved to dev-hub. | Treat Phase 2 as complete. Do not spend pickup energy here unless consistency drift appears. |
| Ops hub | The `ops-hub` repo exists as an Inter-Hub Operations extension. `OPS-WP-0001` is finished; `OPS-WP-0002` has T01-T03 done and waits on authenticated bootstrap/runtime key. | Finish the Inter-Hub evidence lane first: align the activity-core mapping with the live ops vocabulary, run the attended bootstrap, store the runtime key by an approved route, then send the first governed ops event. |
| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` still describe a standalone hub-core/FastAPI/MCP scaffold. Current implementation direction is Inter-Hub extension-first. | Reconcile these tasks after the Inter-Hub evidence lane closes: either rewrite them as extension-owned implementation tasks or explicitly defer the standalone hub-core service. |
| Fin hub/business | `CUST-WP-0025-T20`-`T26` are all todo and depend on a proven multi-hub pattern. | Defer until ops-hub has a working first signal and the identity integration gate is proven. |

## Stable Pickup Order

1. Close the identity drift: T01 cancelled, T02 done, T03 remains as the one real identity integration test.
2. Finish `CUST-WP-0051-T03` / ops-hub Inter-Hub evidence alignment before expanding ops-hub models/tools.
3. Reconcile `CUST-WP-0025-T13`-`T19` against `OPS-WP-0002` once the first ops event lands.
4. Start fin-hub/business work only after ops-hub proves the extension pattern end-to-end.

docs/infrastructure-stabilization-pickup-checkpoint.md
# Infrastructure Stabilization Pickup Checkpoint

Updated: 2026-06-27

Coordinator workplan: `CUST-WP-0051`

## Purpose

This checkpoint is the restart surface for the infrastructure stabilization
metaplan. It consolidates the workplan review, unblock boards, current State
Hub registration state, and the next strategic picks.

Use this file first when resuming the lane. Then open the source workplan named
in the relevant row and continue from its task state.

## Registration State

State Hub active workstreams queried on 2026-06-27:

| Workstream | Current pickup meaning |
| --- | --- |
| `artifact-store-wp-0007` | Start D7.1/D7.2 assessment and compatibility harness; D7.3 STS vending may route to NetKingdom. |
| `ihub-wp-0022` | Ops-hub evidence intake contract is aligned to live vocabulary; runtime key custody, protected widget lookup, and smoke remain. |
| `cust-wp-0047` | Now-view waits on the ops-hub Inter-Hub evidence lane, not on service inventory collection. |
| `cust-wp-0049` | Bootstrap access helper/runbook is ready; authenticated execution is operator-gated. |
| `cust-wp-0051` | This metaplan is the coordination layer for remaining cross-workplan gates. |
| `activity-wp-0016-llm-output-robustness-trust-boundary` | Repo-side output robustness bundle is prepared; live deploy/smoke proof remains. |
| `three-phoenix-ha-cluster` | HA substrate remains future critical-workload work, not the current State Hub cutover blocker. |
| `staged-promotion-lifecycle` | Start T02 to make promotion gates concrete before broad production migrations. |
| `rail-ho-wp-0005` | Forgejo production migration is parked behind explicit design, SMTP, backup, runner, and cutover decisions. |
| `net-wp-0020` | OpenBao unseal/token custody remains an operator design and smoke gate. |
| `issue-wp-0003` | issue-core service is healthy; activity-core REST emission wiring remains. |
| `activity-wp-0006` | Calibration waits on the post-WP-0016 live daily-triage smoke and three clean scheduled runs. |
| `cust-wp-0038` | Full State Hub HA migration is deferred until the pragmatic railiance01 path stabilizes. |
| `cust-wp-0025` | FOS bootstrap resumes from identity integration and ops-hub evidence, not the old mega-hub scaffold. |
| `cust-wp-0011` | Active State Hub migration path; next gate is explicit cutover approval. |

Hygiene status:

- `CUST-WP-0045-cutover-runbook` is no longer active; it is a finished runbook
  record, not an empty active workstream.
- `CUST-WP-0014` is reopened as `backlog`; it is no longer a done workplan with
  todo task blocks.
- Completed or cancelled tasks no longer carry the stale human-needed flags
  cleared during this stabilization session.
- `make fix-consistency REPO=the-custodian` still reports pre-existing C-12
  orphan-row warnings, but the relevant workplan lifecycle and task states sync.

## Blocker Board

No live credential, access, or approval gate is unowned. Do not ask
`ops-warden` for secret values; use the route catalog and the owning subsystem.

| Gate | Owner/route | Non-secret evidence to collect | Next action |
| --- | --- | --- | --- |
| State Hub pragmatic cutover | Custodian operator approval; `CUST-WP-0011-T07` | Final dump id/time, row-count comparison, chosen private endpoint, stabilization notes | Approve freeze/final restore and make railiance01 State Hub primary, or explicitly leave WSL2 primary. |
| State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire the WSL2 fallback early. |
| Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Use the aligned live-vocabulary mapping, then run the attended bootstrap and protected widget lookup. |
| Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Store/provide `OPS_HUB_KEY` outside Git and run the protected evidence smoke. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Deploy WP-0016 code/schema and the bounded runtime prompt bundle, then run the railiance01 smoke. |
| activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has the `ISSUE_CORE_API_KEY` data key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set the REST sink env, restart/sync, and run a safe emission. |
| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
| OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | Approve the custody profile and apply narrow issuer policies before live helper smokes. |

## Daily Automation Evidence

The scheduled daily-triage runner is alive and writing State Hub plus working
memory evidence. The current blocker is output validation, not scheduling or
sink reachability.

Latest clean scheduled run:

- 2026-06-25: State Hub event `cbba6bc0-14cb-492b-ab23-74b9349326c8`,
  schema-valid daily triage, working memory written.

Latest failed scheduled runs:

- 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed
  at char 5268, working memory written.
- 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed
  at char 5246, working memory written.

Resume from `docs/daily-triage-stabilization-status.md` and
`ACTIVITY-WP-0016` before restarting the three-clean-run gate.

## Production Service Summary

| Surface | Stable fact | Remaining gate |
| --- | --- | --- |
| State Hub | Pragmatic railiance01 path has image, manifests, empty deploy, migrations, restored WSL2 data, row-count comparison, and healthy API through `CUST-WP-0011-T06`. | `CUST-WP-0011-T07` cutover approval, then stabilization; HA path stays deferred. |
| Inter-Hub | Public `https://hub.coulomb.social/api/v2/hubs` exposes `ops-hub`; public registry vocabulary is visible; Inter-Hub contract docs now target the live seed vocabulary. | Protected widget lookup, runtime key custody, and authenticated event smoke remain. |
| ops-hub evidence | `ops-hub` exists as the Inter-Hub Operations extension; `OPS-WP-0001` finished; `OPS-WP-0002` has early seed tasks done. | Attended bootstrap, runtime key custody, protected widget/event smoke. |
| issue-core | ArgoCD service is healthy on port `8765`; image `0.2.1`; ExternalSecret Ready; authenticated smoke created Gitea issue `175`. | activity-core still needs `ISSUE_CORE_API_KEY`, URL port `8765`, `ISSUE_SINK_TYPE=rest`, and a safe emission smoke. |
| Forgejo | Migration inventory/design lane is active but pre-cutover. | Production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, cutover approval. |
| artifact-store | Workplan is active with all tasks open and no current live secret handoff recorded. | Start D7.1 fork/object-store landscape and D7.2 compatibility harness. |
| FOS hub | Old NK-WP-0001 Keycloak prerequisite is cancelled; NK-WP-0002 local identity and IAM Profile v0.2 are done; hub-core extraction/dev-hub work is done. | Keep `CUST-WP-0025-T03` as the identity integration test, then reconcile old ops-hub scaffold tasks after the first Inter-Hub ops event lands. |

## Next-Pick List

1. Run the attended Inter-Hub ops-hub bootstrap with the aligned live-vocabulary
   mapping, confirm protected widget ids, and seed any missing backup/risk target widgets.
2. Store/confirm `OPS_HUB_KEY` through approved custody and run the protected
   widget/hub-registry/event smoke.
3. Deploy the activity-core WP-0016 code/schema and bounded runtime prompt
   bundle, then run the railiance01 daily-triage smoke.
4. Complete the issue-core handoff by wiring activity-core to port `8765` with
   `ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or
   record that WSL2 remains primary for the next operating period.
6. Start staged-promotion T02 and artifact-store D7.1/D7.2 so Forgejo and
   storage work inherit clear production promotion gates.
7. Keep Forgejo cutover and State Hub HA work parked until their human decision
   and drill gates are satisfied.

## Resume Commands

```bash
cd /home/worsch/the-custodian
sed -n '1,260p' workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
sed -n '1,260p' docs/infrastructure-stabilization-pickup-checkpoint.md
sed -n '1,260p' docs/credential-custody-unblock-board.md
```

After workplan edits, sync from State Hub:

```bash
cd /home/worsch/state-hub
make fix-consistency REPO=the-custodian
```

docs/near-term-production-service-lanes-status.md
# Near-Term Production Service Lanes Status

Updated: 2026-06-27

## Purpose

Track `CUST-WP-0051-T05`: finish or park near-term production service lanes
before starting larger migrations.

## Lane Board

| Lane | Current state | Next action |
| --- | --- | --- |
| `issue-wp-0003` | issue-core is live through ArgoCD; image `0.2.1`, Service port `8765`, ExternalSecret Ready, authenticated smoke created Gitea issue `175`. | Do not flip activity-core blindly. First inject `ISSUE_CORE_API_KEY` into `actcore-runtime-secret` through route `activity-core-issue-sink`; then set activity-core `ISSUE_CORE_URL` to port `8765`, set `ISSUE_SINK_TYPE=rest`, restart/sync, and run one safe emission smoke. |
| `rail-ho-wp-0005` | Forgejo migration remains pre-implementation. Inventory is in progress; production decisions, SMTP/email recovery, cutover, and legacy retirement are human-gated. | Resolve T02 production decisions first, then build the disposable Forgejo probe. Do not start production cutover before promotion lifecycle, email recovery, package registry, Actions, backup/restore, and migration drill pass. |
| `artifact-store-wp-0007` | All tasks are still `todo`; no live secret gate is currently recorded. | Start with D7.1 fork/object-store landscape and D7.2 compatibility harness. Route D7.3 STS credential vending to NetKingdom if implementation belongs outside artifact-store. |
| `staged-promotion-lifecycle` | Lifecycle spec is done; schema/tooling/canary/promotion tasks are still `todo`. | Start T02 `railiance/app.toml` contract, then use issue-core/Forgejo as reference consumers for Stage 1/2/3 promotion gates. |

## Credential And Operator Routing

`activity-core -> issue-core` REST emission uses route catalog id
`activity-core-issue-sink`.

Route lookup on 2026-06-27:

- owner: `activity-core + issue-core`
- ops-warden executes: no
- status: active
- next action: follow `ops-warden/wiki/playbooks/activity-core-issue-sink.md#worker-checklist`

No secret value was read or written. The required non-secret evidence is:

- `actcore-runtime-secret` has an `ISSUE_CORE_API_KEY` data key;
- activity-core worker consumes `ISSUE_CORE_URL=http://issue-core.issue-core.svc.cluster.local:8765`;
- `ISSUE_SINK_TYPE=rest`;
- one known-safe activity-core emission returns issue-core HTTP 201 and creates
  a Gitea issue.

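The first three evidence items can be checked without touching the key value. A minimal sketch, assuming the URL and sink type are read from the worker's rendered configuration and that secret data-key names (not values) can be listed; `check_issue_sink_env` and `has_data_key` are hypothetical helpers, and the namespace of `actcore-runtime-secret` is not recorded on this page.

```bash
# check_issue_sink_env URL SINK_TYPE: verify the non-secret handoff settings.
# Expected values come from the evidence list above.
check_issue_sink_env() {
  local url="$1" sink_type="$2"
  local expected_url="http://issue-core.issue-core.svc.cluster.local:8765"
  if [ "$url" != "$expected_url" ]; then
    echo "ISSUE_CORE_URL mismatch: $url" >&2
    return 1
  fi
  if [ "$sink_type" != "rest" ]; then
    echo "ISSUE_SINK_TYPE must be rest, got: $sink_type" >&2
    return 1
  fi
}

# has_data_key NAME: succeed if NAME is a whole line on stdin, where stdin is
# a list of secret data-key names only (never values).
has_data_key() {
  grep -qxF -- "$1"
}

# One way to list key names without values (add -n <namespace> as needed):
#   kubectl get secret actcore-runtime-secret \
#     -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
#     | has_data_key ISSUE_CORE_API_KEY
```

The HTTP 201 emission check still needs a live, known-safe event and is left to the playbook.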
## Pickup Order

1. Close the issue-core handoff gate because the service is already healthy and
   only activity-core live emission remains.
2. Start staged-promotion T02 so Forgejo has a repeatable promotion contract
   before production cutover work accelerates.
3. Run artifact-store D7.1/D7.2 as an assessment/build harness lane, with D7.3
   routed to NetKingdom if STS vending is not artifact-store-owned.
4. Keep Forgejo production cutover parked behind explicit T02 decisions and the
   staged-promotion/backup/email/package/action gates.

docs/ops-hub-interhub-evidence-lane-status.md
# Ops Hub Inter-Hub Evidence Lane Status

Date: 2026-06-27

Workplan: `CUST-WP-0051-T03`

Related tasks: `CUST-WP-0047-T05`, `CUST-WP-0049-T06`, `IHUB-WP-0022-T03/T04/T07`

## Summary

The evidence lane is partially live but not ready to close.

Production Inter-Hub already exposes the public ops-hub bootstrap surface and
has an `ops-hub` row plus the ops-hub seed vocabulary. The remaining blockers
are:

1. authenticated bootstrap/runtime-key execution is still operator-gated;
2. protected widget and hub-registry reads cannot be verified without the
   ops-hub runtime key;
3. the older `IHUB-WP-0022` activity-core mapping contract does not match the
   currently live ops-hub seed vocabulary.

No secret values were requested, read, printed, or stored during this probe.

## Public Probe Evidence

Base URL: `https://hub.coulomb.social`

| Probe | Result |
| --- | --- |
| `GET /api/v2/hubs` | HTTP `200`; contains `ops-hub` |
| `GET /api/v2/openapi.json` | HTTP `200`; includes `/hubs`, `/hub-capability-manifests`, `/api-consumers`, `/policy-scopes` |
| `GET /api/v2/widgets` | HTTP `401`, protected as expected |
| `GET /api/v2/hub-registry` | HTTP `401`, protected as expected |
| `GET /api/v2/widget-types` | HTTP `200`; 14 ops widget types visible |
| `GET /api/v2/event-types` | HTTP `200`; 15 ops event types visible |
| `GET /api/v2/annotation-categories` | HTTP `200`; 10 ops annotation categories visible |
| `GET /api/v2/policy-scopes` | HTTP `200`; 7 ops policy scopes visible |
| `GET /api/v2/hub-capability-manifests?hubId=<ops-hub-id>` | HTTP `401`, protected as expected |

Observed public ops-hub id: `4f6e4cf7-6a96-4ff2-8a37-08c9f9e405d2`.

The existing `ops-hub/scripts/interhub-gate-probe.py` exits nonzero because it
still expects unauthenticated `/api/v2/hubs` to return `401`. The live contract
returns `200` for public hub discovery and `401` for protected surfaces such as
`/api/v2/widgets` and `/api/v2/hub-registry`.

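The probe failure is a mismatch in expected status codes. A minimal sketch of expectations that follow the live contract recorded in the probe table above; `expected_status` and `check_probe` are hypothetical helpers and do not reproduce the logic of `ops-hub/scripts/interhub-gate-probe.py`.

```bash
# expected_status PATH: unauthenticated HTTP status the live contract returns,
# per the recorded probe table. Unknown paths return failure.
expected_status() {
  case "$1" in
    /api/v2/hubs|/api/v2/openapi.json|/api/v2/widget-types|/api/v2/event-types|/api/v2/annotation-categories|/api/v2/policy-scopes)
      echo 200 ;;  # public discovery and registry vocabulary
    /api/v2/widgets|/api/v2/hub-registry|/api/v2/hub-capability-manifests*)
      echo 401 ;;  # protected surfaces (glob also matches ?hubId=... queries)
    *)
      return 1 ;;  # not covered by the recorded probe
  esac
}

# check_probe PATH OBSERVED: 0 if OBSERVED matches, 1 if not, 2 if PATH is unknown.
check_probe() {
  local want
  want="$(expected_status "$1")" || return 2
  [ "$2" = "$want" ]
}

# Live usage (needs network access to the base URL above):
#   status=$(curl -s -o /dev/null -w '%{http_code}' "https://hub.coulomb.social/api/v2/hubs")
#   check_probe /api/v2/hubs "$status"
```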
## Live Ops Vocabulary

The live public registry matches `ops-hub/seeds/ops-hub-manifest.draft.json`:

- widget types: `ops-environment`, `ops-host`, `ops-cluster`, `ops-service`,
  `ops-service-catalog`, `ops-endpoint`, `ops-release`, `ops-backup-set`,
  `ops-secret-set`, `ops-runbook`, `ops-incident`, `ops-readiness-gate`,
  `ops-migration-wave`, `ops-risk`;
- event types: `ops-inventory-registered`, `ops-inventory-updated`,
  `ops-service-discovered`, `ops-health-checked`, `ops-release-observed`,
  `ops-endpoint-verified`, `ops-backup-verified`, `ops-restore-tested`,
  `ops-runbook-executed`, `ops-drift-detected`, `ops-risk-raised`,
  `ops-risk-accepted`, `ops-readiness-gate-updated`,
  `ops-migration-gate-passed`, `ops-migration-gate-failed`;
- policy scopes: `ops-local`, `ops-transitional-prod`, `ops-production`,
  `ops-threephoenix`, `ops-registry`, `ops-secrets`,
  `ops-backup-retention`.

## Contract Mismatch

`inter-hub/docs/contracts/ops-hub-activity-core-mapping.md` and
`ops-hub-activity-core-event-payloads.md` still describe the early
activity-core proposal:

| Contract name | Live seed status | Recommended action |
| --- | --- | --- |
| `ops-service-observed` | Not in live event registry | Rename to `ops-service-discovered`, or add an explicit alias event in the ops-hub manifest. |
| `ops-endpoint-verified` | Live | Keep. |
| `ops-access-path-checked` | Not in live event registry; no `ops-access-path` widget type in seed | Either add access-path vocabulary/widgets, or defer access-path submissions and keep State Hub fallback. |
| `ops-backup-verified` | Live | Keep, but map to `ops-backup-set` widget type. |
| `ops-inventory-drift` | Not in live event registry | Rename to `ops-drift-detected`, or add an explicit alias event. |
| `ops-evidence` policy scope | Not in live policy scopes | Use an existing ops scope or add `ops-evidence` to the manifest and activate it. |
| aggregate refs such as `ops:service:aggregate` | Not in `ops-hub/seeds/ops-hub-widgets.seed.json` | Seed aggregate intake widgets or change mapping to the existing entity/readiness widgets. |
| widget types such as `ops-service-card` | Not in live widget types | Use live widget types like `ops-service`, `ops-endpoint`, `ops-backup-set`, and `ops-readiness-gate`. |


## 2026-06-27 Contract Alignment

The Inter-Hub contract docs were revised in `/home/worsch/inter-hub` to target
the live ops-hub seed vocabulary:

- `ops-service-observed` is now a transition alias for
  `ops-service-discovered`.
- `ops-inventory-drift` is now a transition alias for `ops-drift-detected`.
- `ops-access-path-checked` is explicitly deferred to State Hub fallback until
  ops-hub adds access-path vocabulary or a readiness/risk mapping decision.
- The old `ops-evidence` policy scope is replaced by declared live scopes such
  as `ops-production`, `ops-registry`, and `ops-backup-retention`.
- Payload examples now post only live manifest event types.

This removes the known contract-drift blocker before the attended bootstrap.
The remaining gates are authenticated widget lookup, any missing backup/risk
seed widget, runtime key custody, and the protected event submission smoke.
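
One way to honor these aliases on the producer side is to translate names before posting, so that payloads carry only live manifest event types. This sketch assumes the alias table is exactly the two transition aliases listed above. `route_event` and its `"inter-hub"` / `"state-hub-fallback"` labels are hypothetical.

```python
# Transition aliases from the 2026-06-27 alignment: old name -> live event type.
TRANSITION_ALIASES = {
    "ops-service-observed": "ops-service-discovered",
    "ops-inventory-drift": "ops-drift-detected",
}
# Events kept on the State Hub fallback until ops-hub supports them.
DEFERRED_TO_STATE_HUB = frozenset({"ops-access-path-checked"})


def route_event(event_type):
    """Return ("inter-hub", live_type) or ("state-hub-fallback", event_type)."""
    if event_type in DEFERRED_TO_STATE_HUB:
        return ("state-hub-fallback", event_type)
    # Unknown names pass through unchanged; manifest validation still decides.
    return ("inter-hub", TRANSITION_ALIASES.get(event_type, event_type))


print(route_event("ops-inventory-drift"))      # ('inter-hub', 'ops-drift-detected')
print(route_event("ops-access-path-checked"))  # ('state-hub-fallback', 'ops-access-path-checked')
```

Keeping the alias map in the producer lets older callers keep working during the transition. Any manifest validation on the Inter-Hub side remains the final check.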

## Current Closure State

`CUST-WP-0049-T06` remains `wait`: the helper and runbook are ready, but an
approved authenticated execution lane is still required.

`CUST-WP-0047-T05` remains `wait`: the ops-hub row and vocabulary are visible,
but seeded widgets and event acceptance cannot be proven without the protected
runtime path.

`IHUB-WP-0022-T03/T04/T07` remain gated: before an end-to-end smoke, reconcile
the activity-core mapping contract to the live ops-hub seed vocabulary or add
the missing aliases/aggregate widgets to the manifest.

## Next Pick

1. Use the aligned live-vocabulary contract for the attended
   `CUST-WP-0049-T06` bootstrap.
2. Confirm protected widget ids and seed any missing backup/risk target widgets
   required by the mapping.
3. Store or confirm `OPS_HUB_KEY` through OpenBao, then run the protected
   widget/hub-registry/event smoke.
34
docs/state-hub-migration-strategy-status.md
Normal file
@@ -0,0 +1,34 @@
# State Hub Migration Strategy Status

Updated: 2026-06-27

## Decision

Use `CUST-WP-0011` as the active State Hub stabilization path.
Keep `CUST-WP-0038` and `RAIL-BS-WP-0007` as deferred HA/ThreePhoenix follow-up lanes.

Rationale: the pragmatic railiance01 deployment has already completed image
publish, cluster manifests, empty deploy, migrations, WSL2 data restore, row-count
comparison, and cluster API health checks. The remaining work is cutover and
stabilization, not initial buildout.

## Current State

| Path | State | Next action |
| --- | --- | --- |
| `CUST-WP-0011` pragmatic railiance01 | T01-T06 done. Cluster State Hub has verified restored WSL2 data and healthy API. | T07: get explicit approval to freeze WSL2 writes, restore final dump, compare again, and redirect private access/MCP to the cluster endpoint. |
| `CUST-WP-0038` full HA State Hub | Entry criteria depend on completing or superseding CUST-WP-0011 and passing stabilization. All implementation tasks are still todo. | Defer until cluster-hosted State Hub proves stable and ThreePhoenix storage/database strategy is current. |
| `RAIL-BS-WP-0007` ThreePhoenix HA cluster | All phases are todo. | Treat as substrate work for future critical workloads and HA State Hub, not as a blocker for pragmatic cutover. |

## Human Gates

- `CUST-WP-0011-T07`: explicit approval required before freezing WSL2 writes and making the cluster State Hub primary.
- `CUST-WP-0038-T08`: explicit approval required before retiring WSL2 fallback after HA failover and restore drills.

## Stable Pickup Path

1. Reconfirm the current WSL2 backup and take the final pre-cutover dump.
2. Restore the final dump into the railiance01 State Hub and compare row counts again.
3. Redirect the active private access path: either keep local `127.0.0.1:8000` and move it to an ops-bridge/SSH tunnel, or set MCP `API_BASE` to the private cluster endpoint.
4. Run stabilization with WSL2 retained as fallback.
5. Document the operating model and leave final retirement to a later explicit decision or HA workplan.
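
Steps 1 and 2 depend on a row-count comparison between the WSL2 source and the railiance01 restore. This minimal sketch assumes per-table counts have already been collected from each database, for example with one `SELECT count(*)` per table, and passed in as dicts. The function name, table names, and counts below are illustrative only.

```python
def compare_row_counts(source, restored):
    """Return (table, source_count, restored_count) for every table that differs.

    A table present on only one side is reported with None for the missing
    count, so a dropped or unexpected table also fails the comparison.
    """
    mismatches = []
    for table in sorted(set(source) | set(restored)):
        src, dst = source.get(table), restored.get(table)
        if src != dst:
            mismatches.append((table, src, dst))
    return mismatches


# Illustrative counts only; real values come from the WSL2 and cluster databases.
wsl2 = {"tasks": 1200, "workstreams": 51, "events": 9800}
cluster = {"tasks": 1200, "workstreams": 51, "events": 9795}
print(compare_row_counts(wsl2, cluster))  # [('events', 9800, 9795)]
```

Only an empty mismatch list should let the T07 cutover proceed, and even then only after the explicit approval listed under Human Gates.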

@@ -4,14 +4,19 @@ type: workplan
title: Repo Sync Automation & Gitea Inventory
domain: infotech
repo: the-custodian
status: done
status: backlog
state_hub_workstream_id: 27ea80bd-76bf-44a7-b0ed-e09748d5390b
created: 2026-03-16
updated: 2026-03-16
updated: 2026-06-27
---

# CUST-WP-0014 — Repo Sync Automation & Gitea Inventory

2026-06-27 stabilization note: this workplan was previously marked `done` even
though all task blocks remained `todo`. It has been reopened as `backlog` so the
State Hub read model reflects the actual remaining sync-health work without
adding it to the current execution queue.
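
The done-with-todo contradiction described in this note can be caught mechanically. This is a minimal sketch. It assumes workplans keep a `status:` line inside the leading `---` frontmatter and inside each `task` fence, as in these files. It is a simple line scan, not a YAML parser. `is_contradictory` and the one-task sample are hypothetical.

```python
import re

FENCE = "`" * 3  # built indirectly so this sketch can sit inside a Markdown fence
OPEN_STATES = frozenset({"todo", "progress", "wait"})
STATUS_LINE = re.compile(r"status:\s*(\S+)")


def frontmatter_status(text):
    """Return the status value from the leading `---` frontmatter block, if any."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        match = STATUS_LINE.match(line)
        if match:
            return match.group(1)
    return None


def task_statuses(text):
    """Collect the status values found inside task fences."""
    statuses, in_task = [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == FENCE + "task":
            in_task = True
        elif stripped == FENCE:
            in_task = False
        elif in_task:
            match = STATUS_LINE.match(stripped)
            if match:
                statuses.append(match.group(1))
    return statuses


def is_contradictory(text):
    """True when the frontmatter says done but a task block is still open."""
    return frontmatter_status(text) == "done" and any(
        status in OPEN_STATES for status in task_statuses(text)
    )


sample = "\n".join([
    "---", "id: CUST-WP-0014", "status: done", "---", "",
    FENCE + "task", "id: CUST-WP-0014-T01", "status: todo", FENCE,
])
print(is_contradictory(sample))  # True
```

Reopening the file as `backlog`, as done here, makes the check return `False`, because it only fires on a `done` frontmatter status.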

## Problem

When a repo agent completes work and commits, the state-hub does not automatically

@@ -8,7 +8,7 @@ status: active
owner: custodian
topic_slug: custodian
created: "2026-03-20"
updated: "2026-06-22"
updated: "2026-06-27"
state_hub_workstream_id: "293a74fe-a85a-4ad6-8933-23d52a72fe8b"
---

@@ -57,7 +57,7 @@ OAS gives the viable infrastructure.

```task
id: CUST-WP-0025-T01
status: todo
status: cancel
priority: high
state_hub_task_id: "f55078b6-7fa3-49ab-be30-37db622d64c9"
```
@@ -68,11 +68,13 @@ foundation for all hubs and services.

Cross-reference: net-kingdom NK-WP-0001.

2026-06-27 sequencing update: cancelled as an obsolete prerequisite. `NK-WP-0001` is archived and superseded by the KeyCape/Authelia/LLDAP lightweight stack, `NK-WP-0012` IAM Profile v0.2, and the proposed `NK-WP-0011` expanded-mode Keycloak federation lane. FOS bootstrap should not wait for this old Keycloak path.

### T02 — Complete NK-WP-0002: Local identity bootstrap

```task
id: CUST-WP-0025-T02
status: todo
status: done
priority: high
state_hub_task_id: "0d7792f7-5695-4e1a-9726-b9661d5e7108"
```
@@ -83,6 +85,8 @@ development of hub services without cluster dependency.

Cross-reference: net-kingdom NK-WP-0002.

2026-06-27 sequencing update: marked done. `NK-WP-0002` is complete: local-identity file store, Keycloak export, minimal localhost OIDC provider, permissions hardening, audit log, and docs are all delivered.

### T03 — IAM Profile integration test

```task
@@ -100,6 +104,8 @@ Write a minimal test service + integration test that:

This test becomes the template for hub-core auth middleware.

2026-06-27 sequencing update: this remains the real open identity gate, but it should target the current NetKingdom IAM Profile v0.2 contract and either local-identity or KeyCape lightweight issuer, not the archived `NK-WP-0001` Keycloak path.

### T04 — Canon standard: IAM Profile specification

```task

@@ -3,7 +3,9 @@ id: CUST-WP-0045-cutover-runbook
type: runbook
title: "CUST-WP-0045 T06 cutover — exact command sequence"
parent_workplan: CUST-WP-0045
status: finished
created: "2026-06-01"
updated: "2026-06-27"
state_hub_workstream_id: "4ebc847b-4a2c-4ce2-9fd2-62d3071eed96"
domain: infotech
---

@@ -155,6 +155,14 @@ remaining live-execution blocker.
Done when the ops-hub widgets exist and can accept `ops-endpoint-verified` or
equivalent ops evidence events.

2026-06-27 non-secret probe: production Inter-Hub publicly lists the `ops-hub`
hub row (`4f6e4cf7-6a96-4ff2-8a37-08c9f9e405d2`) and the ops-hub seed
vocabulary. Protected `/api/v2/widgets` and `/api/v2/hub-registry` return
HTTP `401` without a runtime key, so widget presence and event acceptance still
require the approved operator/runtime-key lane. The activity-core mapping
contract also needs reconciliation with the live ops-hub seed vocabulary before
smoke closure.
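
The probe above records only an HTTP status, which is the kind of non-secret evidence the custody rules allow. This is a minimal sketch of such an unauthenticated probe using the Python standard library. The base URL is a placeholder you must supply. Only the two protected paths come from this note, and the evidence labels are hypothetical. No credentials are sent and no response body is read.

```python
import urllib.error
import urllib.request

PROTECTED_PATHS = ("/api/v2/widgets", "/api/v2/hub-registry")


def probe(base_url, path, timeout=10.0):
    """GET base_url + path with no credentials and return only the HTTP status."""
    request = urllib.request.Request(base_url + path, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status  # body is never read
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx statuses arrive as HTTPError


def interpret_status(status):
    """Map an unauthenticated probe status to a non-secret evidence label."""
    if status == 401:
        return "protected: auth required (expected)"
    if 200 <= status < 300:
        return "UNEXPECTED: readable without a runtime key"
    return f"inconclusive: HTTP {status}"
```

From an approved host, calling `interpret_status(probe(base_url, path))` for each entry in `PROTECTED_PATHS` yields a label that can be logged to State Hub. A `401` confirms that the surface is protected. It says nothing about whether the widgets exist, which still requires the runtime key.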

## Task: Build First Ops-Hub Service Catalog View

```task

@@ -180,6 +180,12 @@ Done when the ops-hub Inter-Hub records exist in production, the generated
runtime key is stored outside Git, and non-secret validation evidence is logged
to State Hub.

2026-06-27 non-secret probe: production Inter-Hub already has the `ops-hub` row
and public ops vocabulary. The live blocker remains authenticated execution and
runtime-key custody, plus a contract-alignment issue between `IHUB-WP-0022`
activity-core mapping docs and the live ops-hub seed vocabulary. Do not spend
operator key time on the smoke until the vocabulary/mapping direction is chosen.

## Acceptance Criteria

- The repeatable access lane is documented in the owning repos.
395
workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
Normal file
@@ -0,0 +1,395 @@
---
id: CUST-WP-0051
type: workplan
title: "Infrastructure Stabilization Metaplan"
domain: infotech
repo: the-custodian
status: active
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 51
created: "2026-06-27"
updated: "2026-06-27"
state_hub_workstream_id: "21cabc98-3f80-4d00-b3b7-06e2ac2af88f"
---

# CUST-WP-0051 - Infrastructure Stabilization Metaplan

## Goal

Drive the registered infrastructure workplans from a scattered blocked state to
a stable checkpoint where:

- active blockers have a named owner, route, and next command or decision;
- production credential work uses approved custody paths only;
- daily operational automation has one healthy runner and clean evidence;
- State Hub registration reflects the real file state;
- unfinished strategic work is sequenced into clear follow-on lanes.

This workplan does not replace the child workplans. It is the coordination lane
for removing cross-workplan blocks and creating a reliable handoff point.

## Review Snapshot

Reviewed on 2026-06-27 from State Hub and the repo workplan files.

Active registered workstreams with open work:

| Workstream | Open state | Main stabilization meaning |
| --- | --- | --- |
| artifact-store-wp-0007 | 5 todo | Object-store compatibility and STS credential vending lane. |
| ihub-wp-0022 | 3 wait, 5 done | Ops-hub evidence intake waits on widget seed/runtime key/smoke. |
| cust-wp-0047 | 1 wait, 6 done | Ops-hub now view waits on Inter-Hub widget activation. |
| cust-wp-0049 | 1 wait, 5 done | Access lane is ready; live bootstrap needs approved admin execution. |
| activity-wp-0016 | 1 wait, 2 progress, 5 todo, 2 done | Daily-triage output robustness needs live deploy/smoke evidence. |
| three-phoenix-ha-cluster | 7 todo | Target HA substrate is planned but not executed. |
| staged-promotion-lifecycle | 6 todo, 1 done | Promotion discipline is needed before broad production cutovers. |
| rail-ho-wp-0005 | 11 todo, 1 progress | Forgejo production migration needs human design and cutover decisions. |
| cust-wp-0045-cutover-runbook | 0 tasks | Registered runbook appears as an active no-task workstream. |
| net-wp-0020 | 2 wait, 1 todo, 2 done | OpenBao unseal custody models still need operator profile decisions. |
| issue-wp-0003 | 2 progress, 5 done | issue-core deploy is close; finish live wiring and runbook evidence. |
| activity-wp-0006 | 1 wait, 1 todo, 6 done | Three-run calibration waits on the daily-triage live gate. |
| cust-wp-0038 | 8 todo | Full ThreePhoenix State Hub HA migration remains strategic follow-on. |
| cust-wp-0025 | 17 todo, 9 done | FOS hub bootstrap now depends on identity, ops-hub, and fin-hub lanes. |
| cust-wp-0011 | 3 todo, 6 done | Pragmatic State Hub railiance01 migration still needs cutover/stabilize/retire. |
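
The "Open state" column follows a regular `<count> <state>` pattern, so this review can be regenerated or checked by script. This sketch assumes exactly the cell formats in the table above, including the literal `0 tasks`. The function names and hygiene labels are illustrative, not State Hub output.

```python
import re
from collections import Counter

OPEN = ("todo", "progress", "wait")


def parse_open_state(cell):
    """Parse a cell such as "1 wait, 2 progress, 5 todo, 2 done" into a Counter."""
    counts = Counter()
    for count, state in re.findall(r"(\d+)\s+([a-z]+)", cell):
        counts[state] += int(count)
    return counts


def hygiene_flag(cell):
    """Return a short, illustrative review label for one workstream row."""
    counts = parse_open_state(cell)
    total = sum(n for state, n in counts.items() if state != "tasks")
    if total == 0:
        return "no-task workstream: finish or re-home"
    if not any(counts[state] for state in OPEN):
        return "no open tasks: candidate to close"
    return "open: " + ", ".join(f"{counts[s]} {s}" for s in OPEN if counts[s])


print(hygiene_flag("0 tasks"))              # no-task workstream: finish or re-home
print(hygiene_flag("11 todo, 1 progress"))  # open: 11 todo, 1 progress
```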

Additional repo-local hygiene issue:

- `CUST-WP-0014` has frontmatter `status: done` but all six task blocks are
  still `todo`. Either archive it as superseded, or reopen it as a focused
  State Hub sync-health workplan.

State Hub hygiene issue:

- There are stale `needs_human` flags on completed or cancelled tasks. These do
  not all block execution, but they make the operator view noisier and should be
  cleared or annotated after the source workplans are reconciled.

## Dependency Shape

The critical path is:

1. Credential and operator-access custody:
   OpenBao, Inter-Hub operator key, ops-hub runtime key, Forgejo SMTP/cutover
   approvals, and OpenBao unseal profile decisions.
2. Ops evidence and daily automation:
   Inter-Hub ops-hub records, activity-core daily-triage robustness deployment,
   schema-valid smoke, then three clean scheduled runs.
3. Production substrate and source forge:
   issue-core GitOps pilot, Forgejo production migration, artifact-store STS,
   staged promotion, and State Hub migration strategy.
4. Federation buildout:
   identity completion, ops-hub scaffold, ops-hub MCP registration, fin-hub
   scaffold, and business/runway canon.

## Task: Normalize Registry And Workplan Hygiene

```task
id: CUST-WP-0051-T01
status: done
priority: high
state_hub_task_id: "7e83bd50-5ca2-4341-9d18-65512e3f0442"
```

Clean up the planning substrate before execution work resumes.

Minimum scope:

- Decide whether `CUST-WP-0045-cutover-runbook` should stay registered as an
  active workstream or be represented only as a runbook under `CUST-WP-0045`.
- Resolve `CUST-WP-0014`: archive as superseded, or reopen and re-scope the six
  remaining State Hub sync-health tasks.
- Clear or annotate stale `needs_human` flags on done/cancel tasks after source
  workplans confirm they are no longer live gates.
- Run the State Hub consistency check after file changes.

Done when the active workstream list no longer contains no-task runbooks or
contradictory done-with-todo files, and the human-needed view shows only live
human gates.

Progress 2026-06-27:

- `CUST-WP-0045-cutover-runbook` now has `status: finished`; State Hub no
  longer lists it as an active workstream.
- `CUST-WP-0014` is reopened as `backlog` with its task detail preserved, so it
  is no longer a contradictory done-with-todo file or an active queue item.
- `make fix-consistency REPO=the-custodian` passed with pre-existing C-12
  warnings and synced the lifecycle changes into State Hub.

Completed 2026-06-27: cleared 15 stale `needs_human` flags from tasks that
were already `done` or `cancel`, leaving live `todo`/`progress`/`wait` human
gates untouched. T01 is complete.
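
The T01 cleanup rule is a simple filter: clear `needs_human` only where the task is already `done` or `cancel`. This sketch assumes tasks are plain dicts with `id`, `status`, and `needs_human` keys. The real State Hub task schema and API are not shown here, and the task ids are hypothetical.

```python
CLOSED = frozenset({"done", "cancel"})


def stale_needs_human(tasks):
    """Return ids of closed tasks that still carry needs_human=True.

    Tasks in todo/progress/wait are never selected, so live human gates stay
    untouched.
    """
    return [t["id"] for t in tasks if t.get("needs_human") and t["status"] in CLOSED]


# Hypothetical task records for illustration.
tasks = [
    {"id": "A", "status": "done", "needs_human": True},
    {"id": "B", "status": "wait", "needs_human": True},
    {"id": "C", "status": "cancel", "needs_human": False},
    {"id": "D", "status": "cancel", "needs_human": True},
]
print(stale_needs_human(tasks))  # ['A', 'D']
```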

## Task: Establish One Credential-Custody Unblock Board

```task
id: CUST-WP-0051-T02
status: done
priority: high
state_hub_task_id: "312bde29-4370-4352-b5a3-00a8c4fe2059"
```

Collect the live operator-access decisions in one non-secret board.

Inputs:

- `CUST-WP-0049-T06`: Inter-Hub admin access or deployment-side bootstrap path.
- `IHUB-WP-0022-T04`: ops-hub runtime `OPS_HUB_KEY` custody.
- `NET-WP-0020`: OpenBao unseal custody and SSH automation profile.
- `RAIL-HO-WP-0005`: Forgejo hostname, SMTP, runner, backup, cutover, rollback,
  and retirement decisions.

Rules:

- Do not put secrets in Git, State Hub, workplans, or chat.
- Use `warden route find` / `warden route show` before requesting credentials.
- Treat ops-warden as an SSH certificate authority only, not as a secret store.
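
The routing rules on the credential custody unblock board can be mirrored as a lookup table. This makes the "warden executes only SSH certificates" boundary explicit. The sketch is illustrative only. The request-kind keys are hypothetical labels, the owner repos come from the board's routing rules, and the authoritative lookup remains `warden route find` / `warden route show`.

```python
# (owner repo, does ops-warden itself execute the request?)
ROUTE_OWNERS = {
    "ssh-certificate": ("ops-warden", True),
    "openbao-api-credential": ("railiance-platform", False),
    "interactive-identity": ("key-cape", False),
    "tunnel": ("ops-bridge", False),
    "host-principal-deployment": ("railiance-infra", False),
}


def route_for(kind):
    """Return (owner, warden_executes) for a request kind, or raise LookupError."""
    try:
        return ROUTE_OWNERS[kind]
    except KeyError:
        # An unrouted kind is a custody gap to record on the board, not a
        # reason to improvise a credential path.
        raise LookupError(f"no approved route for {kind!r}") from None


print(route_for("openbao-api-credential"))  # ('railiance-platform', False)
```

Raising on an unknown kind, rather than falling back to a default owner, keeps an unrouted request visible as a custody gap.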

Done when each human/operator gate has an owner, approved route, expected
execution host, non-secret evidence target, and fallback decision.

Completed 2026-06-27: added `docs/credential-custody-unblock-board.md` with
route records, live gate owners, expected execution hosts, non-secret evidence
targets, fallback decisions, and pickup order. Route lookup was verified through
`/home/worsch/ops-warden` using `uv run warden route show ... --json` because
the globally installed `warden` lacks the `route` subcommand.

## Task: Close The Ops-Hub Inter-Hub Evidence Lane

```task
id: CUST-WP-0051-T03
status: progress
priority: high
state_hub_task_id: "d6c3a39e-629d-47e4-b589-9e1a0273d9fa"
```

Finish the linked ops-hub activation chain:

- Execute `CUST-WP-0049-T06` using the approved access route.
- Close `CUST-WP-0047-T05` by proving ops-hub widgets exist and accept evidence
  events.
- Unblock `IHUB-WP-0022` by provisioning the runtime key through the approved
  secret path and running the end-to-end evidence submission smoke.

Done when ops inventory probes and activity-core evidence can land in Inter-Hub
without manual SQL or secret exposure.

Progress 2026-06-27:

- Added `docs/ops-hub-interhub-evidence-lane-status.md` with non-secret public
  probe evidence. Production Inter-Hub has an `ops-hub` row, and the ops-hub seed
  vocabulary is visible on public registry endpoints.
- Protected widget, manifest, and hub-registry surfaces correctly require
  authentication; no runtime-key smoke was attempted.
- New blocker surfaced: the older `IHUB-WP-0022` activity-core mapping contract
  names event types, policy scope, aggregate widget refs, and widget types that
  do not match the live ops-hub seed vocabulary. Align that contract before an
  attended bootstrap/runtime-key smoke, or the operator key may still hit
  manifest/schema failures.

Progress 2026-06-27 contract alignment:

- Updated `/home/worsch/inter-hub` contract docs for `IHUB-WP-0022` to target
  the live ops-hub seed vocabulary. Old `ops-service-observed` and
  `ops-inventory-drift` names are transition aliases, `ops-access-path-checked`
  is deferred to fallback until supported, and payload examples now post only
  live manifest event types.
- Ran `make fix-consistency REPO=inter-hub`; it passed with pre-existing C-12
  warnings and synced the IHUB-WP-0022 description drift into State Hub.
- The remaining T03 gates are authenticated widget lookup, any missing
  backup/risk seed widget, runtime key custody, and the protected submission
  smoke.

## Task: Stabilize Daily-Triage Automation

```task
id: CUST-WP-0051-T04
status: progress
priority: high
state_hub_task_id: "42810d3b-5557-4efd-871b-65bef7c19e0e"
```

Finish the activity-core daily-triage reliability lane.

Sequence:

1. Deploy the `activity-wp-0016` robustness bundle: bounded prompt/schema,
   per-item parsing, quarantine lane, and producer guardrails.
2. Run a schema-valid live daily-triage smoke on railiance01.
3. Collect three clean scheduled runs with matching activity-core, State Hub,
   and working-memory evidence.
4. Close `activity-wp-0006` calibration and decide the fate of the
   `CUST-WP-0045` cutover runbook registration.
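
The per-item parsing and quarantine lane in step 1 can be illustrated with a small parser. It accepts well-formed items and quarantines the rest instead of failing the whole run. This sketch assumes JSON Lines output and a hypothetical two-key item schema (`id`, `action`). The actual `activity-wp-0016` schema and prompt framing are not shown here.

```python
import json

REQUIRED_KEYS = ("id", "action")  # hypothetical minimal item schema


def parse_items(raw_output):
    """Split model output into (accepted items, quarantined raw lines).

    Each non-blank line is parsed on its own, so a malformed line is
    quarantined without discarding the well-formed items around it.
    """
    accepted, quarantined = [], []
    for line in raw_output.splitlines():
        if not line.strip():
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            quarantined.append(line)  # e.g. an object cut off mid-way
            continue
        if isinstance(item, dict) and all(key in item for key in REQUIRED_KEYS):
            accepted.append(item)
        else:
            quarantined.append(line)  # valid JSON, wrong shape
    return accepted, quarantined


raw = '{"id": 1, "action": "close"}\n{"id": 2, "action": "snooze"}\n{"id": 3, "act'
ok, bad = parse_items(raw)
print(len(ok), len(bad))  # 2 1
```

A truncated final object, the kind of malformed tail that would otherwise fail validation for the whole output, lands in the quarantine list while the earlier items survive.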

Done when there is exactly one trusted daily triage runner and the fallback
state is documented.

Progress 2026-06-27:

- Added `docs/daily-triage-stabilization-status.md` with the current evidence
  chain. The 2026-06-24 and 2026-06-25 scheduled runs were schema-valid; the
  2026-06-26 and 2026-06-27 runs reached State Hub and working memory but failed
  output validation at around character 5.2k of the output.
- The current primary blocker is no longer a silent schedule or State Hub sink
  outage. The live runner still needs the `ACTIVITY-WP-0016` code/schema bundle
  and Railiance runtime prompt changes so malformed tails degrade to quarantined
  partial output.
- Pickup sequence: deploy WP-0016 code/schema together, update the runtime
  prompt bundle for bounded top-N/per-item framing/token headroom, run a live
  railiance01 smoke, then restart the three-clean-run gate.
- Normalized ACTIVITY-WP-0016 source task status in activity-core: T04 is done
  and T05 is in progress, matching its own progress notes.
- Updated activity-core daily-triage source notes: ACTIVITY-WP-0010-T02 is
  now done, T03/T04 point at the post-WP-0016 live smoke and three-run gate,
  and ACTIVITY-WP-0006-T03 records the 2026-06-27 validation failure.
- Cleared the stale human-needed flag from the completed bridge/config task and
  moved live intervention notes onto the deploy/smoke/calibration gate.
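
The three-clean-run gate is a sliding-window check over the latest scheduled runs. This sketch assumes each run is summarized as booleans for schema validity and the three evidence sinks named in the sequence above. The key names are hypothetical. The sample history only mirrors the pattern in the progress notes: two valid runs, then two validation failures.

```python
EVIDENCE_KEYS = ("schema_valid", "activity_core", "state_hub", "working_memory")


def is_clean(run):
    """A run is clean only if every evidence check passed."""
    return all(run.get(key, False) for key in EVIDENCE_KEYS)


def three_clean_runs(runs):
    """True when the three most recent runs (list is oldest first) are all clean.

    Any failed run inside the window blocks the gate until three newer clean
    runs push it out.
    """
    recent = runs[-3:]
    return len(recent) == 3 and all(is_clean(run) for run in recent)


clean = dict.fromkeys(EVIDENCE_KEYS, True)
failed = {**clean, "schema_valid": False}  # reached the sinks, failed validation
history = [clean, clean, failed, failed]    # illustrative 06-24 .. 06-27 pattern
print(three_clean_runs(history))                # False
print(three_clean_runs(history + [clean] * 3))  # True
```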

## Task: Finish Near-Term Production Service Lanes

```task
id: CUST-WP-0051-T05
status: progress
priority: medium
state_hub_task_id: "2083f0e4-e037-48bf-8069-f31e8db2fd95"
```

Move near-complete service workstreams to done before starting larger migrations.

Priority order:

- `issue-wp-0003`: finish activity-core wiring and end-to-end GitOps runbook.
- `rail-ho-wp-0005`: resolve Forgejo production decisions, email recovery, and
  cutover approval gates.
- `artifact-store-wp-0007`: complete MinIO compatibility and STS credential
  vending assessment if it is required by backup, registry, or app lanes.
- `staged-promotion-lifecycle`: make production promotion gates explicit before
  further cluster/source-forge cutovers.

Done when each lane is either finished or parked with a precise dependency and
no ambiguous human-needed state.

Progress 2026-06-27:

- Added `docs/near-term-production-service-lanes-status.md` with a lane board
  for issue-core, Forgejo, artifact-store, and staged promotion.
- issue-core is the immediate near-done lane: the service itself is healthy, but
  activity-core still points at port `8010` and `ISSUE_SINK_TYPE=null`. Do not
  flip it to REST until `ISSUE_CORE_API_KEY` is injected into activity-core's
  runtime secret via route `activity-core-issue-sink`.
- Forgejo remains parked behind explicit production design decisions, SMTP/email
  recovery, package registry, Actions, backup/restore, migration drill, and
  cutover approval.
- artifact-store and staged promotion are executable planning/build lanes:
  start artifact-store D7.1/D7.2 and staged-promotion T02 before broad
  production source-forge migration work.
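
The issue-core rule above (do not flip to REST before the key is injected) can be enforced as a preflight. This sketch assumes the settings are read from environment variables with the names used in this note. `"rest"` is a hypothetical value for the REST sink. The function reports only whether the key is set, never its value.

```python
import os


def issue_sink_preflight(env=None):
    """Return (ok, non-secret reason) for switching ISSUE_SINK_TYPE to REST."""
    env = os.environ if env is None else env
    # Only presence is checked; the key value is never logged or returned.
    if not env.get("ISSUE_CORE_API_KEY"):
        return False, "ISSUE_CORE_API_KEY not set; inject it via route activity-core-issue-sink"
    if env.get("ISSUE_SINK_TYPE") == "rest":
        return True, "ISSUE_SINK_TYPE is already rest"
    return True, "key present; ISSUE_SINK_TYPE can be set to rest"


print(issue_sink_preflight({"ISSUE_SINK_TYPE": "null"}))
# (False, 'ISSUE_CORE_API_KEY not set; inject it via route activity-core-issue-sink')
```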

## Task: Decide State Hub Migration Strategy

```task
id: CUST-WP-0051-T06
status: progress
priority: high
state_hub_task_id: "0ac3763f-eac0-4773-9be8-cb0a7979e444"
```

Choose and execute the State Hub stabilization path.

Decision:

- If the pragmatic railiance01 service is enough for the next operating period,
  finish `CUST-WP-0011`: cut over the MCP config, observe the stabilization
  window, then retire or retain the WSL2 fallback by explicit decision.
- If HA is now required, promote `CUST-WP-0038` and the ThreePhoenix HA cluster
  lane: readiness, storage/database strategy, HA API behavior, failover drill,
  restore drill, and endpoint/runbook update.

Done when the active State Hub path is singular, tested, and documented, and
the alternate path is either cancelled, deferred, or explicitly retained as a
future workplan.

Progress 2026-06-27:

- Added `docs/state-hub-migration-strategy-status.md` and selected
  the pragmatic `CUST-WP-0011` railiance01 path as the singular active
  State Hub stabilization lane.
- `CUST-WP-0011` is already through T01-T06: image pushed, cluster
  manifests defined, empty deploy healthy, migrations run, WSL2 data restored,
  row counts compared, and cluster API health/summary verified.
- The next gate is `CUST-WP-0011-T07`: explicit approval to freeze WSL2
  writes, restore the final dump, compare again, and redirect MCP/private access
  to the cluster endpoint.
- `CUST-WP-0038` and `RAIL-BS-WP-0007` remain deferred HA
  lanes until the pragmatic path stabilizes and ThreePhoenix storage/database
  strategy is current.

## Task: Sequence FOS Hub Bootstrap To Completion

```task
id: CUST-WP-0051-T07
status: progress
priority: medium
state_hub_task_id: "27b6828a-9e87-4135-a036-bce760c3057c"
```

Use the stabilized substrate to finish `CUST-WP-0025` without reviving the
mega-hub pattern.

Recommended order:

1. Finish identity foundations: NK-WP-0001, NK-WP-0002, then the IAM profile
   integration test.
2. Create the standalone ops-hub repo from hub-core and ingest the inventory
   artifacts from `CUST-WP-0047`.
3. Add ops models, MCP tools, Railiance integration, dev-hub coupling, dashboard,
   and MCP registration.
4. Only then start the fin-hub/business-model tasks.
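
The recommended order is a dependency chain. It can be encoded as a graph so that an accidental cycle is caught rather than silently accepted. This sketch uses the standard library's `graphlib` (Python 3.9+). The node names are shorthand labels for the numbered steps, not task ids. `NK-WP-0001` is left out because this task's progress notes record it as cancelled and superseded.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish first.
DEPENDS_ON = {
    "nk-wp-0002-local-identity": set(),
    "iam-profile-integration-test": {"nk-wp-0002-local-identity"},
    "ops-hub-repo-and-inventory": {"iam-profile-integration-test"},
    "ops-models-mcp-dashboard": {"ops-hub-repo-and-inventory"},
    "fin-hub-business-model": {"ops-models-mcp-dashboard"},
}

order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

If a later edit made an earlier step depend on a later one, `static_order()` would raise `graphlib.CycleError` instead of producing an order.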

Done when `CUST-WP-0025` has no open foundational identity or ops-hub tasks and
fin-hub work is either started on a stable scaffold or deliberately deferred.

Progress 2026-06-27:

- Added `docs/fos-hub-bootstrap-sequence-status.md` with the current sequence.
- Corrected the identity foundation baseline in `CUST-WP-0025`: the old
  `NK-WP-0001` Keycloak task is cancelled as superseded, `NK-WP-0002` local
  identity is done, and the remaining identity gate is the IAM Profile v0.2
  FastAPI integration test.
- The current ops-hub reality is extension-first: `ops-hub` exists,
  `OPS-WP-0001` is finished, and `OPS-WP-0002` waits on authenticated
  Inter-Hub bootstrap/runtime-key evidence. Reconcile `CUST-WP-0025-T13`
  through `T19` after the first governed ops event lands.
- Fin-hub/business tasks remain deliberately deferred until identity integration
  and ops-hub extension evidence are proven.

## Task: Create The Stable Pickup Checkpoint

```task
id: CUST-WP-0051-T08
status: done
priority: high
state_hub_task_id: "2cc0a127-a749-4228-962e-f8c9b693a1b3"
```

Close this metaplan by creating an operator-friendly checkpoint.

Minimum contents:

- active workstream list with zero stale runbooks and zero contradictory task
  states;
- blocker board showing no unowned credential, access, or approval gates;
- daily automation evidence from the latest successful scheduled run;
- production service status summary for State Hub, Inter-Hub, ops-hub evidence,
  issue-core, Forgejo, and artifact-store;
- explicit next-pick list for remaining strategic tasks.

Done when a future agent can start from the checkpoint and choose the next
workplan without reconstructing this review.

Completed 2026-06-27: added
`docs/infrastructure-stabilization-pickup-checkpoint.md` with the live active
workstream list, named blocker board, latest daily-triage evidence, production
service status summary, and next-pick sequence. This closes the handoff surface
for future agents while the child workplans remain the execution source of
truth.