# Infrastructure Stabilization Pickup Checkpoint

Updated: 2026-06-27

Coordinator workplan: `CUST-WP-0051`

## Purpose

This checkpoint is the restart surface for the infrastructure stabilization
metaplan. It consolidates the workplan review, unblock boards, current State
Hub registration state, and the next strategic picks.

Use this file first when resuming the lane. Then open the source workplan named
in the relevant row and continue from its task state.

## Registration State

State Hub active workstreams queried on 2026-06-27:

| Workstream | Current pickup meaning |
| --- | --- |
| `artifact-store-wp-0007` | Start D7.1/D7.2 assessment and compatibility harness; D7.3 STS vending may route to NetKingdom. |
| `ihub-wp-0022` | Ops-hub evidence intake contract is aligned to live vocabulary; runtime key custody, protected widget lookup, and smoke remain. |
| `cust-wp-0047` | Now-view waits on the ops-hub Inter-Hub evidence lane, not on service inventory collection. |
| `cust-wp-0049` | Bootstrap access helper/runbook is ready; authenticated execution is operator-gated. |
| `cust-wp-0051` | This metaplan is the coordination layer for remaining cross-workplan gates. |
| `activity-wp-0016-llm-output-robustness-trust-boundary` | Repo-side output robustness bundle is prepared; live deploy/smoke proof remains. |
| `three-phoenix-ha-cluster` | HA substrate remains future critical-workload work, not the current State Hub cutover blocker. |
| `rail-ho-wp-0005` | Forgejo production migration is parked behind explicit design, SMTP, backup, runner, and cutover decisions. |
| `net-wp-0020` | OpenBao unseal/token custody remains an operator design and smoke gate. |
| `issue-wp-0003` | issue-core service is healthy; activity-core REST emission wiring remains. |
| `activity-wp-0006` | Calibration waits on the post-WP-0016 live daily-triage smoke and three clean scheduled runs. |
| `cust-wp-0038` | Full State Hub HA migration is deferred until the pragmatic railiance01 path stabilizes. |
| `cust-wp-0025` | FOS bootstrap resumes from identity integration and ops-hub evidence, not the old mega-hub scaffold. |
| `cust-wp-0011` | Active State Hub migration path; next gate is explicit cutover approval. |

Hygiene status:

- `CUST-WP-0045-cutover-runbook` is no longer active; it is a finished runbook
  record, not an empty active workstream.
- `CUST-WP-0014` is reopened as `backlog`; it is no longer a done workplan with
  todo task blocks.
- The stale human-needed flags on completed or cancelled tasks were cleared
  during this stabilization session.
- `make fix-consistency REPO=the-custodian` still reports pre-existing C-12
  orphan-row warnings, but the relevant workplan lifecycle and task states are
  in sync.
- `RAIL-BS-WP-0006-staged-promotion-lifecycle` is finished: all seven tasks
  are done, the workstream is finished in State Hub, and the file frontmatter
  is `status: finished`.

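The last hygiene item checks that a file's frontmatter agrees with its State Hub lifecycle. A minimal sketch for reading the file side of that comparison, assuming workplan files open with a YAML frontmatter block delimited by `---` lines and carry a top-level `status:` key. The function name `frontmatter_status` is hypothetical, and it does not strip quotes around the value.

```bash
# Hypothetical helper: print the top-level `status:` value from a file's YAML
# frontmatter. Assumes the file starts with a `---` line and the block ends
# at the next `---` line; prints nothing if there is no frontmatter.
frontmatter_status() {
  awk '
    NR == 1 { if ($0 != "---") exit; next }   # no frontmatter block
    $0 == "---" { exit }                      # end of frontmatter
    /^status:/ {
      sub(/^status:[ \t]*/, "")               # drop the key, keep the value
      print
      exit
    }
  ' "$1"
}

# Usage: compare the result with the lifecycle State Hub shows.
# frontmatter_status workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
```

Only the frontmatter block is scanned, so a `status:` line later in the body is ignored.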
## Blocker Board

No live credential, access, or approval gate is unowned. Do not ask
`ops-warden` for secret values; use the route catalog, the `warden access`
assist/proxy surface where the catalog lane allows it, and the owning subsystem.

For credential-related blockers, classify the environment posture and workload
maturity first. Dev/test work can use synthetic contract doubles; production
real-value work needs owner custody, policy gates where applicable, and
non-secret evidence. See `docs/ops-warden-secret-posture-review.md`.

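The posture rule above can be sketched as a small decision function. This is a hypothetical illustration, not an ops-warden interface: the labels `dev`, `test`, and `prod` are assumed names for environment posture, and workload maturity is left out because this checkpoint does not enumerate its levels.

```bash
# Hypothetical helper: map an environment posture label to the credential
# handling route described on the Blocker Board. The labels are assumptions.
credential_route() {
  case "$1" in
    dev|test)
      echo "synthetic contract doubles" ;;
    prod)
      echo "owner custody; policy gates where applicable; non-secret evidence only" ;;
    *)
      # Fail closed: unknown postures must be classified before any work.
      echo "unclassified: classify posture and maturity first" >&2
      return 1 ;;
  esac
}
```

Failing closed on unknown labels mirrors the board's rule that classification comes before any credential work.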
Do not implement ops-warden changes from this Custodian lane. New ops-warden
needs should be posted through State Hub as requirements or suggestions for the
separate ops-warden worker.

| Gate | Owner/route | Non-secret evidence to collect | Next action |
| --- | --- | --- | --- |
| State Hub pragmatic cutover | Custodian operator approval; `CUST-WP-0011-T07` | Final dump id/time, row-count comparison, chosen private endpoint, stabilization notes | Approve freeze/final restore and make railiance01 State Hub primary, or leave WSL2 primary explicitly. |
| State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
| Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Legacy/fallback only. Prefer Core Hub deployed smoke; run attended Inter-Hub bootstrap only by explicit operator supersede/rollback decision. |
| Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Do not materialize legacy `OPS_HUB_KEY` until a deployed Core Hub smoke or explicit legacy Inter-Hub smoke is ready to use it. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Deploy WP-0016 code/schema and bounded runtime prompt bundle, then run railiance01 smoke. |
| activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set REST sink env, restart/sync, run safe emission. |
| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
| OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | Approve custody profile and apply narrow issuer policies before live helper smokes. |

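For the `activity-core to issue-core` row, a preflight check can confirm the REST sink settings before the safe emission smoke while keeping the evidence non-secret. A minimal sketch: `ISSUE_SINK_TYPE`, `ISSUE_CORE_API_KEY`, and port `8765` come from this checkpoint, but `ISSUE_CORE_URL` is an assumed name for the variable holding the issue-core base URL, and `issue_sink_preflight` is hypothetical. It reports only whether the key is set, never its value.

```bash
# Hypothetical preflight for the activity-core -> issue-core REST sink.
# ISSUE_SINK_TYPE, ISSUE_CORE_API_KEY, and port 8765 come from this
# checkpoint; ISSUE_CORE_URL is an ASSUMED variable name for the base URL.
# Prints non-secret evidence only and returns 1 if any check fails.
issue_sink_preflight() {
  local failed=0 hostport

  if [ "${ISSUE_SINK_TYPE:-}" = "rest" ]; then
    echo "ISSUE_SINK_TYPE: rest"
  else
    echo "ISSUE_SINK_TYPE: '${ISSUE_SINK_TYPE:-unset}', expected rest"
    failed=1
  fi

  # Report presence only; the key value is never printed.
  if [ -n "${ISSUE_CORE_API_KEY:-}" ]; then
    echo "ISSUE_CORE_API_KEY: set"
  else
    echo "ISSUE_CORE_API_KEY: missing"
    failed=1
  fi

  # Reduce the URL to host:port by dropping the scheme and any path.
  hostport=${ISSUE_CORE_URL:-}
  hostport=${hostport#*://}
  hostport=${hostport%%/*}
  case "$hostport" in
    *:8765) echo "ISSUE_CORE_URL port: 8765" ;;
    *)      echo "ISSUE_CORE_URL: '${ISSUE_CORE_URL:-unset}' does not use port 8765"
            failed=1 ;;
  esac

  return "$failed"
}
```

The printed lines are safe to paste into State Hub as evidence; the HTTP 201 and Gitea issue id still come from the emission smoke itself.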
## Daily Automation Evidence

The scheduled daily-triage runner is alive and writing evidence to both State
Hub and working memory. The current blocker is output validation, not
scheduling or sink reachability.

Latest clean scheduled run:

- 2026-06-25: State Hub event `cbba6bc0-14cb-492b-ab23-74b9349326c8`,
  schema-valid daily triage, working memory written.

Latest failed scheduled runs:

- 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed
  at char 5268, working memory written.
- 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed
  at char 5246, working memory written.

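When picking up the validation failures, the reported offset can be used to inspect the failing region of the saved output. A minimal sketch, assuming the output is available as a local file (its path is not recorded here) and that `at char N` is a 0-based offset; Python's `json` module, for example, reports decode positions as a 0-based `(char N)`. The helper slices bytes, so it lines up with character offsets only for ASCII output, and `show_offset_context` is a hypothetical name.

```bash
# Hypothetical helper: print the bytes around a 0-based offset in a file,
# up to `width` bytes on each side (default 40). Byte-based, so it matches
# character offsets only for ASCII content.
show_offset_context() {
  local file=$1 offset=$2 width=${3:-40}
  local start=$(( offset > width ? offset - width : 0 ))
  # tail -c +K starts at the K-th byte (1-based), i.e. 0-based offset K-1.
  tail -c +"$(( start + 1 ))" "$file" | head -c "$(( 2 * width ))"
  echo
}

# Usage (OUTPUT_FILE is a placeholder for the saved runtime output):
# show_offset_context "$OUTPUT_FILE" 5268 60
```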
Resume from `docs/daily-triage-stabilization-status.md` and
`ACTIVITY-WP-0016` before restarting the three-clean-run gate.

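The three-clean-run gate amounts to counting the unbroken streak of clean scheduled runs at the end of the run history. A minimal sketch, assuming runs are listed oldest first as `date status` lines and that only runs after the WP-0016 deploy are fed in, since the gate restarts then. The `clean`/`failed` labels and the `clean_run_streak` and `gate_met` names are hypothetical.

```bash
# Hypothetical helper: read "date status" lines (oldest first) on stdin and
# print how many consecutive runs at the end have status "clean".
clean_run_streak() {
  awk '$2 == "clean" { n++; next } { n = 0 } END { print n + 0 }'
}

# Gate check on stdin: true only with at least three consecutive clean runs.
gate_met() {
  [ "$(clean_run_streak)" -ge 3 ]
}

# Example with the runs recorded above:
printf '%s\n' '2026-06-25 clean' '2026-06-26 failed' '2026-06-27 failed' \
  | clean_run_streak   # prints 0
```

Any failed run resets the streak, which matches restarting the gate rather than counting clean runs in total.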
## Production Service Summary

| Surface | Stable fact | Remaining gate |
| --- | --- | --- |
| State Hub | Pragmatic railiance01 path has image, manifests, empty deploy, migrations, restored WSL2 data, row-count comparison, and healthy API through `CUST-WP-0011-T06`. | `CUST-WP-0011-T07` cutover approval, then stabilization; HA path stays deferred. |
| Inter-Hub / Core Hub | Public `https://hub.coulomb.social/api/v2/hubs` exposes `ops-hub`; `CORE-WP-0008` finished the Core Hub API smoke harness, activity-core sink, staging profile, CLI wrappers, UI backlog, and Custodian handoff. | Run deployed Core Hub smoke, staging import, activity-core sink smoke, and readiness summary; keep Haskell Inter-Hub only for migration/rollback proof. |
| ops-hub evidence | `CUST-WP-0025-T14` is done (Core Hub ops evidence contract spec). `CUST-WP-0025-T13` through `T19` now use Core Hub API/CLI/UI gates; `CUST-WP-0047` and `CUST-WP-0049` remain legacy/fallback records. | Execute `CUST-WP-0025-T16` and `T17`; close legacy Inter-Hub waits only through deployed Core Hub evidence or an explicit supersede decision. |
| issue-core | ArgoCD service is healthy on port `8765`; image `0.2.1`; ExternalSecret Ready; authenticated smoke created Gitea issue `175`. | activity-core still needs `ISSUE_CORE_API_KEY`, URL port `8765`, `ISSUE_SINK_TYPE=rest`, and a safe emission smoke. |
| Forgejo | Migration inventory/design lane is active but pre-cutover. | Production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, cutover approval. |
| artifact-store | D7.1 is done; D7.2 has an opt-in live MinIO compatibility harness and manual smoke docs. No live secret handoff is recorded. | Run D7.2 against an approved MinIO-compatible endpoint, then route D7.3 STS vending through identity/platform custody before changing credential behavior. |
| FOS hub | Old NK-WP-0001 Keycloak prerequisite is cancelled; NK-WP-0002 local identity, IAM Profile v0.2, the Core Hub FastAPI IAM Profile integration test, and Core Hub operator UI first screens are done; hub-core extraction/dev-hub work is done; CUST-WP-0025 Phase 3 has been rewritten for Core Hub. | Execute the remaining Core Hub deployed evidence and cutover gates: `CUST-WP-0025-T16` and `T17`. |

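The State Hub rows cite a row-count comparison between the WSL2 data and the railiance01 restore as cutover evidence. A minimal sketch of that comparison, assuming each side has been exported to a file of `table count` lines (for example from per-table `SELECT count(*)` queries); the export layout and the `compare_row_counts` name are hypothetical. It prints only mismatched or missing tables and returns non-zero if any exist.

```bash
# Hypothetical helper: compare two "table count" files (e.g. WSL2 source vs
# railiance01 restore). Prints "table count_a count_b" for every mismatch,
# using MISSING for tables absent on one side; returns 1 on any mismatch.
compare_row_counts() {
  local a b status
  a=$(mktemp)
  b=$(mktemp)
  # join needs both inputs sorted on the join field in the same collation.
  LC_ALL=C sort -k1,1 "$1" > "$a"
  LC_ALL=C sort -k1,1 "$2" > "$b"
  LC_ALL=C join -a 1 -a 2 -e MISSING -o 0,1.2,2.2 "$a" "$b" \
    | awk '$2 != $3 { print; bad = 1 } END { exit bad }'
  status=$?
  rm -f "$a" "$b"
  return "$status"
}

# Usage (file names are placeholders):
# compare_row_counts wsl2-counts.txt railiance01-counts.txt
```

Run it on exports captured at the same freeze point; empty output with exit status 0 is the kind of row-count evidence listed for `CUST-WP-0011-T07`.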
## Next-Pick List

1. Execute the remaining rewritten `CUST-WP-0025` Core Hub gates: deployed
   smoke and activity-core proof (`T16`) and cutover decision coupling (`T17`).
   `T03`, `T14`, and `T18` are already complete as the identity integration
   template, the ops evidence/read-model contract, and the operator UI
   first-screen gates.
2. Keep `CUST-WP-0047` and `CUST-WP-0049` as legacy evidence/fallback until
   Core Hub deployed smoke evidence or an explicit supersede decision closes
   them.
3. Deploy the activity-core WP-0016 code/schema and bounded runtime prompt
   bundle, then run the railiance01 daily-triage smoke.
4. Complete the issue-core handoff by wiring activity-core to port `8765` with
   `ISSUE_SINK_TYPE=rest` and running one known-safe emission smoke.
5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or
   record that WSL2 remains primary for the next operating period.
6. Collect artifact-store D7.2 live evidence against an approved
   MinIO-compatible endpoint; Forgejo and storage work can now inherit the
   finished staged-promotion gates.
7. Keep Forgejo cutover and State Hub HA work parked until their human decision
   and drill gates are satisfied.

## Resume Commands

```bash
cd /home/worsch/the-custodian
# Re-read the metaplan, this checkpoint, and the credential unblock board.
sed -n '1,260p' workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
sed -n '1,260p' docs/infrastructure-stabilization-pickup-checkpoint.md
sed -n '1,260p' docs/credential-custody-unblock-board.md
```

After workplan edits, sync from State Hub:

```bash
cd /home/worsch/state-hub
# Syncs workplan lifecycle and task states; the pre-existing C-12 orphan-row
# warnings are expected.
make fix-consistency REPO=the-custodian
```