# Infrastructure Stabilization Pickup Checkpoint

Updated: 2026-06-27
Coordinator workplan: `CUST-WP-0051`

## Purpose

This checkpoint is the restart surface for the infrastructure stabilization
metaplan. It consolidates the workplan review, unblock boards, current State
Hub registration state, and the next strategic picks.

Use this file first when resuming the lane. Then open the source workplan named
in the relevant row and continue from its task state.

## Registration State

State Hub active workstreams queried on 2026-06-27:

| Workstream | Current pickup meaning |
| --- | --- |
| `artifact-store-wp-0007` | Start D7.1/D7.2 assessment and compatibility harness; D7.3 STS vending may route to NetKingdom. |
| `ihub-wp-0022` | Ops-hub evidence intake contract is aligned to live vocabulary; runtime key custody, protected widget lookup, and smoke remain. |
| `cust-wp-0047` | Now-view waits on the ops-hub Inter-Hub evidence lane, not on service inventory collection. |
| `cust-wp-0049` | Bootstrap access helper/runbook is ready; authenticated execution is operator-gated. |
| `cust-wp-0051` | This metaplan is the coordination layer for remaining cross-workplan gates. |
| `activity-wp-0016-llm-output-robustness-trust-boundary` | Repo-side output robustness bundle is prepared; live deploy/smoke proof remains. |
| `three-phoenix-ha-cluster` | HA substrate remains future critical-workload work, not the current State Hub cutover blocker. |
| `staged-promotion-lifecycle` | T02 `railiance/app.toml` contract, T03 overlay repo pattern/script, T04 Stage 1 runner, T05 canary template, and T06 deploy/observe tooling are done; continue with T07 promote/rollback/onboarding before broad production migrations. |
| `rail-ho-wp-0005` | Forgejo production migration is parked behind explicit design, SMTP, backup, runner, and cutover decisions. |
| `net-wp-0020` | OpenBao unseal/token custody remains an operator design and smoke gate. |
| `issue-wp-0003` | issue-core service is healthy; activity-core REST emission wiring remains. |
| `activity-wp-0006` | Calibration waits on the post-WP-0016 live daily-triage smoke and three clean scheduled runs. |
| `cust-wp-0038` | Full State Hub HA migration is deferred until the pragmatic railiance01 path stabilizes. |
| `cust-wp-0025` | FOS bootstrap resumes from identity integration and ops-hub evidence, not the old mega-hub scaffold. |
| `cust-wp-0011` | Active State Hub migration path; next gate is explicit cutover approval. |

Hygiene status:

- `CUST-WP-0045-cutover-runbook` is no longer active; it is a finished runbook
  record, not an empty active workstream.
- `CUST-WP-0014` is reopened as `backlog`; it is no longer a done workplan with
  todo task blocks.
- Completed or cancelled tasks no longer carry stale human-needed flags; these
  were cleared during this stabilization session.
- `make fix-consistency REPO=the-custodian` still reports pre-existing C-12
  orphan-row warnings, but the relevant workplan lifecycle and task states sync.

## Blocker Board

No live credential, access, or approval gate is unowned. Do not ask
`ops-warden` for secret values; use the route catalog and the owning subsystem.

| Gate | Owner/route | Non-secret evidence to collect | Next action |
| --- | --- | --- | --- |
| State Hub pragmatic cutover | Custodian operator approval; `CUST-WP-0011-T07` | Final dump id/time, row-count comparison, chosen private endpoint, stabilization notes | Approve freeze/final restore and make railiance01 State Hub primary, or leave WSL2 primary explicitly. |
| State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
| Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Use the aligned live-vocabulary mapping, then run attended bootstrap and protected widget lookup. |
| Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Store/provide `OPS_HUB_KEY` outside Git and run the protected evidence smoke. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Deploy WP-0016 code/schema and bounded runtime prompt bundle, then run railiance01 smoke. |
| activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set REST sink env, restart/sync, run safe emission. |
| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
| OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | Approve custody profile and apply narrow issuer policies before live helper smokes. |
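
Several evidence columns above allow only a fragment of a secret, such as a
runtime key prefix. A minimal sketch of a redaction helper for evidence notes
follows. The function name `key_prefix` and the default length of 6 characters
are assumptions for illustration, not an existing tool in this repo.

```bash
# Hypothetical helper: print only the leading characters of a value so the
# full secret never lands in notes or logs.
# Usage: key_prefix VALUE [LENGTH]   (LENGTH defaults to 6, an assumed default)
key_prefix() {
  local value="$1"
  local length="${2:-6}"
  # cut keeps characters 1..LENGTH and drops the rest of the value.
  printf '%s\n' "$value" | cut -c "1-${length}"
}
```

For example, `key_prefix "$OPS_HUB_KEY"` would print just the leading
characters for the evidence column, leaving the full value in custody.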

## Daily Automation Evidence

The scheduled daily-triage runner is alive and writing State Hub plus working
memory evidence. The current blocker is output validation, not scheduling or
sink reachability.

Latest clean scheduled run:

- 2026-06-25: State Hub event `cbba6bc0-14cb-492b-ab23-74b9349326c8`,
  schema-valid daily triage, working memory written.

Latest failed scheduled runs:

- 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed
  at char 5268, working memory written.
- 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed
  at char 5246, working memory written.

Resume from `docs/daily-triage-stabilization-status.md` and
`ACTIVITY-WP-0016` before restarting the three-clean-run gate.
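
The three-clean-run gate can be checked mechanically once run outcomes are
listed one per line. The sketch below assumes a hypothetical log format of
`DATE STATUS`, oldest first, with `clean` marking a schema-valid run. Neither
the format nor the function name `clean_streak` comes from activity-core.

```bash
# Hypothetical helper: read "DATE STATUS" lines (oldest first) on stdin and
# print how many consecutive clean runs end the log.
clean_streak() {
  awk '
    $2 == "clean" { streak++; next }   # extend the current streak
    { streak = 0 }                     # any other status resets it
    END { print streak + 0 }           # + 0 prints 0 for empty input
  '
}

# The three runs recorded in this checkpoint: one clean, then two failures.
printf '%s\n' \
  '2026-06-25 clean' \
  '2026-06-26 failed' \
  '2026-06-27 failed' | clean_streak   # prints 0
```

Under these assumptions the gate would pass when
`[ "$(clean_streak < runs.log)" -ge 3 ]` succeeds, where `runs.log` is a
placeholder file name.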

## Production Service Summary

| Surface | Stable fact | Remaining gate |
| --- | --- | --- |
| State Hub | Pragmatic railiance01 path has image, manifests, empty deploy, migrations, restored WSL2 data, row-count comparison, and healthy API through `CUST-WP-0011-T06`. | `CUST-WP-0011-T07` cutover approval, then stabilization; HA path stays deferred. |
| Inter-Hub | Public `https://hub.coulomb.social/api/v2/hubs` exposes `ops-hub`; public registry vocabulary is visible; Inter-Hub contract docs now target the live seed vocabulary. | Protected widget lookup, runtime key custody, and authenticated event smoke remain. |
| ops-hub evidence | `ops-hub` exists as the Inter-Hub Operations extension; `OPS-WP-0001` finished; `OPS-WP-0002` has early seed tasks done. | Attended bootstrap, runtime key custody, protected widget/event smoke. |
| issue-core | ArgoCD service is healthy on port `8765`; image `0.2.1`; ExternalSecret Ready; authenticated smoke created Gitea issue `175`. | activity-core still needs `ISSUE_CORE_API_KEY`, URL port `8765`, `ISSUE_SINK_TYPE=rest`, and a safe emission smoke. |
| Forgejo | Migration inventory/design lane is active but pre-cutover. | Production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, cutover approval. |
| artifact-store | Workplan is active with all tasks open and no current live secret handoff recorded. | Start D7.1 fork/object-store landscape and D7.2 compatibility harness. |
| FOS hub | Old NK-WP-0001 Keycloak prerequisite is cancelled; NK-WP-0002 local identity and IAM Profile v0.2 are done; hub-core extraction/dev-hub work is done. | Keep `CUST-WP-0025-T03` as the identity integration test, then reconcile old ops-hub scaffold tasks after first Inter-Hub ops event lands. |
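
The row-count comparison cited for the State Hub cutover can be expressed as a
small check over two count listings. This is a sketch under assumptions: each
file holds `table_name row_count` lines exported from the WSL2 source and the
railiance01 target, and the function name `compare_row_counts` is hypothetical.

```bash
# Hypothetical helper: compare two "table_name row_count" listings.
# Prints every difference and returns non-zero if any table disagrees.
# Usage: compare_row_counts SOURCE_FILE TARGET_FILE
compare_row_counts() {
  local src="$1"
  local dst="$2"

  # An empty source listing would make the two-file awk idiom misread input.
  if [ ! -s "$src" ]; then
    echo "source listing is empty or missing: $src" >&2
    return 2
  fi

  awk '
    NR == FNR { src[$1] = $2; next }          # first file: remember counts
    {
      seen[$1] = 1
      if (!($1 in src)) {
        print "only in target: " $1; bad = 1
      } else if (src[$1] != $2) {
        print "mismatch " $1 ": source=" src[$1] " target=" $2; bad = 1
      }
    }
    END {
      for (t in src) if (!(t in seen)) { print "only in source: " t; bad = 1 }
      exit bad + 0
    }
  ' "$src" "$dst"
}
```

A zero exit status with no output means both listings agree, and the printed
lines otherwise can go into the cutover evidence as-is since they contain no
secret values.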

## Next-Pick List

1. Run the attended Inter-Hub ops-hub bootstrap with the aligned live-vocabulary
   mapping, confirm protected widget ids, and seed any missing backup/risk target widgets.
2. Store/confirm `OPS_HUB_KEY` through approved custody and run the protected
   widget/hub-registry/event smoke.
3. Deploy the activity-core WP-0016 code/schema and bounded runtime prompt
   bundle, then run the railiance01 daily-triage smoke.
4. Complete the issue-core handoff by wiring activity-core to port `8765` with
   `ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or
   record that WSL2 remains primary for the next operating period.
6. Continue staged-promotion T07 and start artifact-store D7.1/D7.2
   so Forgejo and storage work inherit clear production promotion gates.
7. Keep Forgejo cutover and State Hub HA work parked until their human decision
   and drill gates are satisfied.
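
Before the emission smoke in item 4, a preflight can confirm the activity-core
environment without printing any secret. The sketch below checks the settings
named in this checkpoint. `ISSUE_CORE_URL` is an assumed variable name for the
issue-core endpoint, and `issue_sink_preflight` is a hypothetical helper.

```bash
# Hypothetical preflight for the activity-core to issue-core REST sink.
# Reports presence and shape only; the API key value is never echoed.
issue_sink_preflight() {
  local rc=0

  if [ -n "${ISSUE_CORE_API_KEY:-}" ]; then
    echo "ok   ISSUE_CORE_API_KEY is set"
  else
    echo "FAIL ISSUE_CORE_API_KEY is empty or unset"
    rc=1
  fi

  if [ "${ISSUE_SINK_TYPE:-}" = "rest" ]; then
    echo "ok   ISSUE_SINK_TYPE=rest"
  else
    echo "FAIL ISSUE_SINK_TYPE is '${ISSUE_SINK_TYPE:-}', expected 'rest'"
    rc=1
  fi

  # ISSUE_CORE_URL is an assumed name; adjust to the real setting.
  case "${ISSUE_CORE_URL:-}" in
    *:8765 | *:8765/*) echo "ok   ISSUE_CORE_URL targets port 8765" ;;
    *) echo "FAIL ISSUE_CORE_URL does not target port 8765"; rc=1 ;;
  esac

  return "$rc"
}
```

Run in the same environment that activity-core receives, a non-zero return
indicates that the safe emission smoke should wait until the failing line is
resolved.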

## Resume Commands

```bash
cd /home/worsch/the-custodian
sed -n '1,260p' workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
sed -n '1,260p' docs/infrastructure-stabilization-pickup-checkpoint.md
sed -n '1,260p' docs/credential-custody-unblock-board.md
```

After workplan edits, sync from State Hub:

```bash
cd /home/worsch/state-hub
make fix-consistency REPO=the-custodian
```