Rewrite FOS ops hub phase for Core Hub

2026-06-27 21:59:20 +02:00
parent 8c4d9b952b
commit befb35c056
5 changed files with 158 additions and 90 deletions


@@ -72,8 +72,10 @@ Inter-Hub bootstrap or rollback validation.
 - T19: cancel or defer ops-hub MCP server registration until post-cutover
   demand proves it is needed.
-This is enough to rewrite `CUST-WP-0025` safely, but not enough to declare Core
-Hub production cutover complete.
+2026-06-27 follow-up: `CUST-WP-0025-T13` through `T19` have now been
+rewritten around this recommendation. The rewrite is enough to stop the obsolete
+standalone ops-hub scaffold sequence, but not enough to declare Core Hub
+production cutover complete.
 ## Remaining Gates


@@ -21,14 +21,14 @@ Do not restart FOS bootstrap at the old `NK-WP-0001` Keycloak path. That workpla
 | --- | --- | --- |
 | Identity | Old `CUST-WP-0025-T01` pointed at archived `NK-WP-0001`; local identity and IAM Profile v0.2 are done. | Keep T01 cancelled, T02 done, and make T03 the remaining identity gate: a protected FastAPI fixture using IAM Profile v0.2 against local-identity or KeyCape. |
 | Hub extraction/dev-hub | `CUST-WP-0025-T05` through `T12` are done: hub-core exists, State Hub imports hub-core, and MCP naming moved to dev-hub. | Treat Phase 2 as complete. Do not spend pickup energy here unless consistency drift appears. |
-| Ops hub | The old `ops-hub`/Inter-Hub extension path has useful seed evidence, but Core Hub now has the credible replacement platform: local `/api/v2` compatibility, ops-hub bootstrap smoke, protected persistence-backed resources, and `/console` visual checks. | Create or update the Core Hub API-first continuation workplan. Treat Haskell Inter-Hub as legacy compatibility or rollback evidence. |
-| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` still describe a standalone hub-core/FastAPI/MCP scaffold. Current implementation direction is Core Hub replacement-first, not Inter-Hub extension-first. | Reconcile these tasks after Core Hub has a deployed compatibility/evidence smoke: rewrite them to Core Hub-owned API/CLI/UI tasks or explicitly defer/cancel the old standalone scaffold. |
+| Ops hub | Core Hub is now the replacement platform: `CORE-WP-0008` finished the API smoke harness, activity-core sink, staging profile, CLI wrappers, UI rebuild backlog, and Custodian handoff. Live deployed smokes and cutover evidence are still open. | Continue through Core Hub deployed evidence, migration import, activity-core smoke, and cutover gates. Treat Haskell Inter-Hub as legacy compatibility or rollback evidence. |
+| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` have been rewritten around Core Hub API evidence, CLI parity, deployed smoke/cutover gates, whynot-aligned UI, and cancellation of immediate standalone ops-hub MCP registration. | Execute the remaining wait/todo gates in the rewritten Phase 3. Do not resume the obsolete standalone ops-hub scaffold sequence. |
 | Fin hub/business | `CUST-WP-0025-T20`-`T26` are all todo and depend on a proven multi-hub pattern. | Defer until ops-hub has a working first signal and the identity integration gate is proven. |
 ## Stable Pickup Order
 1. Close the identity drift: T01 cancelled, T02 done, T03 remains as the one real identity integration test.
-2. Use `CUST-WP-0052` to open or update the Core Hub API-first continuation lane.
-3. Keep `CUST-WP-0047`/`CUST-WP-0049` as legacy evidence/fallback until Core Hub smoke evidence or an explicit supersede decision closes them.
-4. Rewrite `CUST-WP-0025-T13`-`T19` after Core Hub proves the replacement path.
+2. Use the finished `CORE-WP-0008` evidence lane and `CUST-WP-0052` reset notes as the Core Hub replacement baseline.
+3. Keep `CUST-WP-0047`/`CUST-WP-0049` as legacy evidence/fallback until Core Hub deployed smoke evidence or an explicit supersede decision closes them.
+4. Execute rewritten `CUST-WP-0025-T14`, `T16`, `T17`, and `T18` in API/CLI/UI order.
 5. Start fin-hub/business work only after ops-hub proves the Core Hub pattern end-to-end.


@@ -350,15 +350,15 @@ few repos (unrelated to dev-hub rename); no new automation errors introduced.
 **Goal**: Runtime operations coordination per FOS §7.3.
 **Depends on**: Phase 2 (hub_core available), Phase 1 (identity for service auth).
-**Repo**: ops-hub (new standalone repo, registered under custodian domain)
+**Repo**: core-hub for replacement runtime; the-custodian for coordination; standalone ops-hub is deferred until post-cutover need is proven.
 **Inventory-first implementation slice (2026-06-05):** `CUST-WP-0047`
-carves out the minimum useful part of T14/T16/T18 before the full standalone
-`ops-hub` scaffold exists: a repo-owned service inventory contract, an initial
+carves out the minimum useful part of T14/T16/T18 before the replacement runtime
+is fully proven: a repo-owned service inventory contract, an initial
 service/location/evidence seed, and the handoff path for Inter-Hub widgets and
-activity-core probes. The T13-T19 tasks below remain the long-term ops-hub
-implementation; the inventory slice produces input artifacts that the eventual
-ops-hub repo can ingest rather than replace.
+activity-core probes. After the Core Hub reset, these artifacts feed Core Hub
+ops evidence first; a separate ops-hub repo should ingest them only if a
+post-cutover service boundary is proven useful.
 **Inter-Hub bootstrap access lane (2026-06-17):** `CUST-WP-0049` extracts the
 repeatable authenticated bootstrap routine needed to finish ops-hub production
@@ -367,29 +367,37 @@ ops-warden owns the short-lived SSH certificate envelope, and operator secret
 custody remains outside Git.
 **Core Hub reset (2026-06-27):** `CUST-WP-0052` supersedes the Inter-Hub-first
-implementation direction for future work. The old T13-T19 standalone ops-hub
-scaffold should not be executed literally until it is rewritten around Core Hub:
-API-first replacement contracts, CLI helpers second, and a rebuilt whynot-aligned
-operator UI third. Keep this phase active as a coordination record, not as a
-mandate to expand Haskell Inter-Hub.
+implementation direction for future work. T13-T19 below have been rewritten
+around Core Hub: API-first replacement contracts, CLI helpers second, deployed
+evidence and cutover gates, and a rebuilt whynot-aligned operator UI third. Keep
+Haskell Inter-Hub as legacy compatibility or rollback evidence, not the preferred
+implementation target.
-### T13 — Create ops-hub repo from hub-core scaffold
+### T13 — Open Core Hub replacement lane
 ```task
 id: CUST-WP-0025-T13
-status: todo
-priority: medium
+status: done
+priority: high
 state_hub_task_id: "2c6d1429-a67a-4f66-84d1-cb32ffdb890f"
 ```
-Create `ops-hub` repo with:
-- pyproject.toml depending on hub-core
-- FastAPI app factory inheriting hub-core base
-- MCP server extending hub-core base server
-- Alembic setup with hub-core core migrations + ops-specific
-- Register as managed repo under custodian domain
-### T14 — Ops-specific models
+Replace the old immediate standalone `ops-hub` repo scaffold with a Core
+Hub-owned replacement lane.
+The replacement lane must keep the FOS intent of runtime operations
+coordination while using the current implementation order:
+- Core Hub API resources and compatibility/evidence smokes first;
+- thin operator CLI wrappers second;
+- web UI rebuild third, after API/CLI parity is stable.
+Completed 2026-06-27: Core Hub workplan `CORE-WP-0008` finished as the
+API-first execution counterpart, and Custodian recorded the replacement evidence
+handoff in `docs/core-hub-replacement-evidence.md`. This task is complete as a
+reframe/open-lane task; it does not claim production cutover is complete.
+### T14 — Define Core Hub ops evidence contract and read-model gaps
 ```task
 id: CUST-WP-0025-T14
@@ -398,91 +406,143 @@ priority: medium
 state_hub_task_id: "0e811e9b-23a5-49f9-979e-cd1c5dcd937f"
 ```
-Define SQLAlchemy models for:
-- **Service**: name, namespace, health_status, last_seen, endpoints
-- **Incident**: severity, status (open/investigating/mitigated/resolved), timeline
-- **Runbook**: service_id, trigger_conditions, steps, last_executed
-- **AccessPath**: type (ssh/k8s/http), target, auth_method, status
-- **OperationalDebt**: category, severity, location, owner
-- **ChangeRecord**: what changed, when, by whom, rollback_path
-### T15 — Ops-specific MCP tools
+Define the Core Hub-owned operations evidence contract that replaces the old
+standalone ops-specific model list.
+The contract should reconcile:
+- `CUST-WP-0047` service inventory and current evidence vocabulary;
+- Core Hub hubs, manifests, widgets, API consumers, and interaction events;
+- activity-core probe metadata and `core-hub-interaction-event` sink output;
+- migration runs, deployment records, outcome signals, and cutover evidence;
+- non-secret custody rules for key prefixes, hashes, routes, and evidence ids.
+Known Core Hub API/read-model gaps to resolve before UI expansion:
+- a protected migration-run read route such as `/api/v2/migration-runs`;
+- non-deferred deployment/outcome evidence routes where needed;
+- a mapping from service inventory ids to Core Hub widgets/events.
+Done when Core Hub has a workplan or spec that names the API resources, record
+shape, evidence event vocabulary, and migration path from the existing
+Custodian inventory artifacts.
+### T15 — Core Hub operator CLI parity
 ```task
 id: CUST-WP-0025-T15
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "3fdd1f61-4c8e-4614-898b-df7a9aa4a514"
 ```
-Implement ops-domain MCP tools:
-- Service registry: register_service, list_services, get_service_health
-- Health probes: probe_service, get_cluster_health, get_storage_health
-- Incident lifecycle: create_incident, update_incident, resolve_incident
-- Runbook: get_runbook, execute_runbook_step
-- Access: list_access_paths, check_access_path
-### T16 — Railiance infrastructure integration
+Replace the old MCP-first ops tool plan with API and CLI parity first.
+Required CLI surface:
+- deployed Core Hub smoke evidence;
+- ops-hub bootstrap/status checks;
+- migration bundle validate/import;
+- cutover readiness summary from non-secret evidence reports.
+Completed 2026-06-27: `CORE-WP-0008-T05` added `make operator-cli` and
+`scripts/core_hub_cli.py` with wrappers around the same Core Hub API behavior
+used by tests and smokes. Any MCP surface should consume these proven APIs later
+rather than becoming the first implementation path.
+### T16 — Deployed ops evidence and activity-core smokes
 ```task
 id: CUST-WP-0025-T16
-status: todo
-priority: medium
+status: wait
+priority: high
 state_hub_task_id: "702849c5-b253-4ede-afa7-0ab4f81e49a5"
 ```
-Connect ops-hub to railiance infrastructure observability:
-- k3s cluster health via kubectl/API
-- Longhorn storage status and replication state
-- Certificate expiry tracking (cert-manager)
-- Backup status (S2 integrated backup)
-- SSH tunnel health (ops-bridge)
-### T17 — Cross-hub protocol: ops-hub to dev-hub
+Run the production-like Core Hub evidence smokes that replace the old direct
+Railiance infrastructure integration task.
+Minimum evidence:
+- `make deployed-smoke` or `make operator-cli CLI_ARGS="deployed-smoke ..."`
+  against a real Core Hub staging URL;
+- deployed activity-core Core Hub sink smoke with approved runtime token and
+  widget mapping;
+- non-secret report fields only: run id, hub/manifest/API-consumer ids,
+  key prefixes, widget/event ids, counts, statuses, and containment booleans;
+- State Hub progress note linking the evidence and naming any remaining gates.
+Blocked until an approved `CORE_HUB_BASE_URL`, operator/runtime token custody
+path, and activity-core widget mapping are available. This task can close or
+supersede `CUST-WP-0047-T05` and `CUST-WP-0049-T06` only after deployed Core
+Hub evidence exists or an explicit supersede decision is recorded.
+### T17 — Core Hub, dev-hub, and cutover decision coupling
 ```task
 id: CUST-WP-0025-T17
-status: todo
+status: wait
 priority: medium
 state_hub_task_id: "b99a3ed8-440b-4e28-88f5-495de7276f66"
 ```
-Implement FOS §9.2.5 event coupling:
-- Deployment events in dev-hub → change signals in ops-hub
-- Incident events in ops-hub → blocker signals in dev-hub
-- Shared event vocabulary (canonical event_types)
-- HTTP-based event forwarding (keep it simple; upgrade to NATS later if needed)
-### T18 — Ops Hub "now view" dashboard
+Replace the old ops-hub-to-dev-hub protocol task with Core Hub replacement
+coupling and cutover decision records.
+Minimum scope:
+- Core Hub readiness summary from deployed smoke, migration import,
+  activity-core sink, and optional legacy Inter-Hub reference evidence;
+- State Hub progress/decision records that state whether legacy Inter-Hub
+  fallback remains required;
+- compatibility notes for consumers that still expect Inter-Hub `/api/v2`;
+- rollback and Haskell retirement gates kept explicit.
+Blocked until `CORE-WP-0005` staging import, dual-run smokes, and cutover
+readiness evidence exist. Do not unblock `CORE-WP-0007` Haskell retirement from
+local-only evidence.
+### T18 — Core Hub operator UI first screens
 ```task
 id: CUST-WP-0025-T18
 status: todo
-priority: low
+priority: medium
 state_hub_task_id: "5b6cea8b-3982-49be-bacf-7269a3d2104e"
 ```
-Observable Framework dashboard for ops-hub:
-- Service status grid (green/amber/red)
-- Active incidents timeline
-- Access path map
-- Storage and certificate health
-- Recent change log
-### T19 — Register ops-hub as MCP server
+Replace the old Observable Framework dashboard task with the Core Hub operator
+UI rebuild backlog.
+Initial UI work should implement only the first operator-critical screens:
+- readiness overview;
+- registry explorer;
+- evidence stream;
+- migration/cutover state;
+- action-required gates;
+- access metadata as a support panel, not a broad expansion area.
+Use whynot-design tokens/components wherever practical and preserve
+`make visual-check` style desktop/mobile, no-overlap, text-overflow, protected
+route, and non-secret assertions. Start implementation from Core Hub
+`docs/specs/operator-ui-rebuild-backlog.md`, not from old Inter-Hub screens.
+### T19 — Ops-hub MCP server registration decision
 ```task
 id: CUST-WP-0025-T19
-status: todo
+status: cancel
 priority: medium
 state_hub_task_id: "f033c80e-4ebb-49cf-8987-20c9b2ff4c13"
 ```
-Register ops-hub MCP server:
-- Port 8002 (dev-hub on 8001, ops-hub on 8002)
-- Update global `~/.claude/CLAUDE.md` with ops-hub registration
-- Update session protocol: domain repos that touch infrastructure should
-  call both `get_domain_summary()` (dev-hub) and ops-hub orientation
+Cancel the old immediate registration of a standalone `ops-hub` MCP server.
+The preferred replacement path is Core Hub API first and operator CLI second.
+Register a separate ops-hub MCP server only if post-cutover usage proves that a
+separate service boundary is still useful. Until then, State Hub progress and
+Core Hub API/CLI evidence are the coordination surfaces.
 ## Phase 4 — Business Model & Fin Hub


@@ -456,21 +456,22 @@ Progress 2026-06-27:
   `NK-WP-0001` Keycloak task is cancelled as superseded, `NK-WP-0002` local
   identity is done, and the remaining identity gate is the IAM Profile v0.2
   FastAPI integration test.
-- Current ops-hub reality is extension-first: `ops-hub` exists,
-  `OPS-WP-0001` is finished, and `OPS-WP-0002` waits on authenticated
-  Inter-Hub bootstrap/runtime-key evidence. Reconcile `CUST-WP-0025-T13`-`T19`
-  after the first governed ops event lands.
+- Current ops-hub reality is Core Hub replacement-first: `CORE-WP-0008`
+  finished the API smoke harness, activity-core sink, staging profile, CLI
+  wrappers, UI rebuild backlog, and Custodian handoff. `CUST-WP-0025-T13`-`T19`
+  have been rewritten away from the obsolete standalone scaffold.
 - Fin-hub/business tasks remain deliberately deferred until identity integration
   and ops-hub extension evidence are proven.
 Progress 2026-06-27 Core Hub reset:
-- `CUST-WP-0052` now owns the reset criteria. `CUST-WP-0025-T13` through
-  `T19` should not be executed literally as the old standalone ops-hub scaffold
-  until Core Hub replacement evidence is good enough and the tasks are rewritten.
-- Core Hub is promising enough to stop expanding the Inter-Hub-first path:
-  local ops-hub bootstrap compatibility and `/console` visual checks exist, but
-  staging import, deployed dual-run smokes, and cutover evidence are still open.
+- `CUST-WP-0052` completed the Phase 3 reset. `CUST-WP-0025-T13` through
+  `T19` now point at Core Hub-owned API evidence, CLI parity, deployed
+  smoke/cutover gates, whynot-aligned UI, and cancellation of immediate
+  standalone ops-hub MCP registration.
+- Core Hub is now the preferred replacement lane, but staging import, deployed
+  dual-run smokes, cutover evidence, and Haskell retirement approval remain
+  open.
 ## Task: Create The Stable Pickup Checkpoint


@@ -158,7 +158,7 @@ and gated UI rebuild criteria.
 ```task
 id: CUST-WP-0052-T04
-status: todo
+status: done
 priority: high
 state_hub_task_id: "04c9c807-68d0-4750-bd72-a484730cd55d"
 ```
@@ -181,9 +181,14 @@ points future agents at the obsolete mega-hub/Inter-Hub scaffold sequence.
 summarizes the Core Hub replacement proof from `CORE-WP-0008-T02` through
 `T06`, records why `CUST-WP-0047-T05` and `CUST-WP-0049-T06` should remain
 legacy/fallback wait tasks for now, and gives rewrite guidance for
-`CUST-WP-0025-T13` through `T19`. The actual `CUST-WP-0025` rewrite is still
-open because no live deployed Core Hub smoke ids/counts or cutover proof exist
-yet.
+`CUST-WP-0025-T13` through `T19`.
+Completed 2026-06-27: rewrote `CUST-WP-0025-T13` through `T19` around Core
+Hub-owned API evidence, operator CLI parity, deployed smoke/cutover gates, and
+the whynot-aligned Core Hub UI backlog. The rewrite marks the old immediate
+standalone ops-hub MCP registration as cancelled, keeps deployed evidence and
+cutover tasks waiting on real staging/runtime proof, and does not claim Haskell
+retirement is unblocked.
 ## Task: Align Helixforge Build And Environment Practices
@@ -265,8 +270,8 @@ workplan notes, not buried in chat.
 - CUST-WP-0051, CUST-WP-0047, and CUST-WP-0049 point toward Core Hub replacement
   instead of further Inter-Hub expansion.
-- CUST-WP-0025 has a clear reset gate and no one resumes the old standalone
-  ops-hub scaffold until it is rewritten.
+- CUST-WP-0025 Phase 3 has been rewritten so no one resumes the old
+  standalone ops-hub scaffold sequence.
 - The next implementation lane is API first, CLI second, web UI third.
 - UI rebuild expectations name whynot-design and operator-priority views.
 - External ops-warden needs are routed through State Hub requirements, not