Rewrite FOS ops hub phase for Core Hub

2026-06-27 21:59:20 +02:00
parent 8c4d9b952b
commit befb35c056
5 changed files with 158 additions and 90 deletions


@@ -72,8 +72,10 @@ Inter-Hub bootstrap or rollback validation.
- T19: cancel or defer ops-hub MCP server registration until post-cutover
demand proves it is needed.
This is enough to rewrite `CUST-WP-0025` safely, but not enough to declare Core
Hub production cutover complete.
2026-06-27 follow-up: `CUST-WP-0025-T13` through `T19` have now been
rewritten around this recommendation. The rewrite is enough to stop the obsolete
standalone ops-hub scaffold sequence, but not enough to declare Core Hub
production cutover complete.
## Remaining Gates


@@ -21,14 +21,14 @@ Do not restart FOS bootstrap at the old `NK-WP-0001` Keycloak path. That workpla
| --- | --- | --- |
| Identity | Old `CUST-WP-0025-T01` pointed at archived `NK-WP-0001`; local identity and IAM Profile v0.2 are done. | Keep T01 cancelled, T02 done, and make T03 the remaining identity gate: a protected FastAPI fixture using IAM Profile v0.2 against local-identity or KeyCape. |
| Hub extraction/dev-hub | `CUST-WP-0025-T05` through `T12` are done: hub-core exists, State Hub imports hub-core, and MCP naming moved to dev-hub. | Treat Phase 2 as complete. Do not spend pickup energy here unless consistency drift appears. |
| Ops hub | The old `ops-hub`/Inter-Hub extension path has useful seed evidence, but Core Hub now has the credible replacement platform: local `/api/v2` compatibility, ops-hub bootstrap smoke, protected persistence-backed resources, and `/console` visual checks. | Create or update the Core Hub API-first continuation workplan. Treat Haskell Inter-Hub as legacy compatibility or rollback evidence. |
| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` still describe a standalone hub-core/FastAPI/MCP scaffold. Current implementation direction is Core Hub replacement-first, not Inter-Hub extension-first. | Reconcile these tasks after Core Hub has a deployed compatibility/evidence smoke: rewrite them to Core Hub-owned API/CLI/UI tasks or explicitly defer/cancel the old standalone scaffold. |
| Ops hub | Core Hub is now the replacement platform: `CORE-WP-0008` finished the API smoke harness, activity-core sink, staging profile, CLI wrappers, UI rebuild backlog, and Custodian handoff. Live deployed smokes and cutover evidence are still open. | Continue through Core Hub deployed evidence, migration import, activity-core smoke, and cutover gates. Treat Haskell Inter-Hub as legacy compatibility or rollback evidence. |
| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` have been rewritten around Core Hub API evidence, CLI parity, deployed smoke/cutover gates, whynot-aligned UI, and cancellation of immediate standalone ops-hub MCP registration. | Execute the remaining wait/todo gates in the rewritten Phase 3. Do not resume the obsolete standalone ops-hub scaffold sequence. |
| Fin hub/business | `CUST-WP-0025-T20`-`T26` are all todo and depend on a proven multi-hub pattern. | Defer until ops-hub has a working first signal and the identity integration gate is proven. |
## Stable Pickup Order
1. Close the identity drift: T01 cancelled, T02 done, T03 remains as the one real identity integration test.
2. Use `CUST-WP-0052` to open or update the Core Hub API-first continuation lane.
3. Keep `CUST-WP-0047`/`CUST-WP-0049` as legacy evidence/fallback until Core Hub smoke evidence or an explicit supersede decision closes them.
4. Rewrite `CUST-WP-0025-T13`-`T19` after Core Hub proves the replacement path.
2. Use the finished `CORE-WP-0008` evidence lane and `CUST-WP-0052` reset notes as the Core Hub replacement baseline.
3. Keep `CUST-WP-0047`/`CUST-WP-0049` as legacy evidence/fallback until Core Hub deployed smoke evidence or an explicit supersede decision closes them.
4. Execute rewritten `CUST-WP-0025-T14`, `T16`, `T17`, and `T18` in API/CLI/UI order.
5. Start fin-hub/business work only after ops-hub proves the Core Hub pattern end-to-end.


@@ -350,15 +350,15 @@ few repos (unrelated to dev-hub rename); no new automation errors introduced.
**Goal**: Runtime operations coordination per FOS §7.3.
**Depends on**: Phase 2 (hub_core available), Phase 1 (identity for service auth).
**Repo**: ops-hub (new standalone repo, registered under custodian domain)
**Repo**: core-hub for replacement runtime; the-custodian for coordination; standalone ops-hub is deferred until post-cutover need is proven.
**Inventory-first implementation slice (2026-06-05):** `CUST-WP-0047`
carves out the minimum useful part of T14/T16/T18 before the full standalone
`ops-hub` scaffold exists: a repo-owned service inventory contract, an initial
carves out the minimum useful part of T14/T16/T18 before the replacement runtime
is fully proven: a repo-owned service inventory contract, an initial
service/location/evidence seed, and the handoff path for Inter-Hub widgets and
activity-core probes. The T13-T19 tasks below remain the long-term ops-hub
implementation; the inventory slice produces input artifacts that the eventual
ops-hub repo can ingest rather than replace.
activity-core probes. After the Core Hub reset, these artifacts feed Core Hub
ops evidence first; a separate ops-hub repo should ingest them only if a
post-cutover service boundary is proven useful.
**Inter-Hub bootstrap access lane (2026-06-17):** `CUST-WP-0049` extracts the
repeatable authenticated bootstrap routine needed to finish ops-hub production
@@ -367,29 +367,37 @@ ops-warden owns the short-lived SSH certificate envelope, and operator secret
custody remains outside Git.
**Core Hub reset (2026-06-27):** `CUST-WP-0052` supersedes the Inter-Hub-first
implementation direction for future work. The old T13-T19 standalone ops-hub
scaffold should not be executed literally until it is rewritten around Core Hub:
API-first replacement contracts, CLI helpers second, and a rebuilt whynot-aligned
operator UI third. Keep this phase active as a coordination record, not as a
mandate to expand Haskell Inter-Hub.
implementation direction for future work. T13-T19 below have been rewritten
around Core Hub: API-first replacement contracts, CLI helpers second, deployed
evidence and cutover gates, and a rebuilt whynot-aligned operator UI third. Keep
Haskell Inter-Hub as legacy compatibility or rollback evidence, not the preferred
implementation target.
### T13 — Create ops-hub repo from hub-core scaffold
### T13 — Open Core Hub replacement lane
```task
id: CUST-WP-0025-T13
status: todo
priority: medium
status: done
priority: high
state_hub_task_id: "2c6d1429-a67a-4f66-84d1-cb32ffdb890f"
```
Create `ops-hub` repo with:
- pyproject.toml depending on hub-core
- FastAPI app factory inheriting hub-core base
- MCP server extending hub-core base server
- Alembic setup with hub-core core migrations + ops-specific
- Register as managed repo under custodian domain
Replace the old immediate standalone `ops-hub` repo scaffold with a Core
Hub-owned replacement lane.
### T14 — Ops-specific models
The replacement lane must keep the FOS intent of runtime operations
coordination while using the current implementation order:
- Core Hub API resources and compatibility/evidence smokes first;
- thin operator CLI wrappers second;
- web UI rebuild third, after API/CLI parity is stable.
Completed 2026-06-27: Core Hub workplan `CORE-WP-0008` finished as the
API-first execution counterpart, and Custodian recorded the replacement evidence
handoff in `docs/core-hub-replacement-evidence.md`. This task is complete as a
reframe/open-lane task; it does not claim production cutover is complete.
### T14 — Define Core Hub ops evidence contract and read-model gaps
```task
id: CUST-WP-0025-T14
@@ -398,91 +406,143 @@ priority: medium
state_hub_task_id: "0e811e9b-23a5-49f9-979e-cd1c5dcd937f"
```
Define SQLAlchemy models for:
- **Service**: name, namespace, health_status, last_seen, endpoints
- **Incident**: severity, status (open/investigating/mitigated/resolved), timeline
- **Runbook**: service_id, trigger_conditions, steps, last_executed
- **AccessPath**: type (ssh/k8s/http), target, auth_method, status
- **OperationalDebt**: category, severity, location, owner
- **ChangeRecord**: what changed, when, by whom, rollback_path
Define the Core Hub-owned operations evidence contract that replaces the old
standalone ops-specific model list.
### T15 — Ops-specific MCP tools
The contract should reconcile:
- `CUST-WP-0047` service inventory and current evidence vocabulary;
- Core Hub hubs, manifests, widgets, API consumers, and interaction events;
- activity-core probe metadata and `core-hub-interaction-event` sink output;
- migration runs, deployment records, outcome signals, and cutover evidence;
- non-secret custody rules for key prefixes, hashes, routes, and evidence ids.
Known Core Hub API/read-model gaps to resolve before UI expansion:
- a protected migration-run read route such as `/api/v2/migration-runs`;
- non-deferred deployment/outcome evidence routes where needed;
- a mapping from service inventory ids to Core Hub widgets/events.
Done when Core Hub has a workplan or spec that names the API resources, record
shape, evidence event vocabulary, and migration path from the existing
Custodian inventory artifacts.
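As an illustration of the custody rule above (key prefixes and hashes instead of raw keys), an evidence record could look roughly like the sketch below. The class name, field names, and the 8-character prefix length are assumptions for illustration only; they are not taken from Core Hub or `CUST-WP-0047`.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical non-secret evidence record; all field names are assumptions."""

    evidence_id: str
    service_inventory_id: str  # id from the service inventory seed
    widget_id: str             # Core Hub widget the inventory id maps to
    event_type: str
    route: str
    key_prefix: str            # short prefix only, never the full key
    key_sha256: str            # hex digest, lets two reports be compared


def build_evidence_record(evidence_id, service_inventory_id, widget_id,
                          event_type, route, raw_key, prefix_len=8):
    """Build a record that keeps only the prefix and SHA-256 hash of raw_key."""
    digest = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    return EvidenceRecord(
        evidence_id=evidence_id,
        service_inventory_id=service_inventory_id,
        widget_id=widget_id,
        event_type=event_type,
        route=route,
        key_prefix=raw_key[:prefix_len],
        key_sha256=digest,
    )


if __name__ == "__main__":
    record = build_evidence_record(
        "ev-0001", "svc-example", "widget-example",
        "probe.ok", "/api/v2/migration-runs", "chk_live_SECRETVALUE",
    )
    # The serialized record carries the prefix and hash but not the raw key.
    print(asdict(record))
```

Storing the hash next to the prefix lets a reviewer confirm that two evidence reports refer to the same key without either report containing the key itself.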
### T15 — Core Hub operator CLI parity
```task
id: CUST-WP-0025-T15
status: todo
status: done
priority: medium
state_hub_task_id: "3fdd1f61-4c8e-4614-898b-df7a9aa4a514"
```
Implement ops-domain MCP tools:
- Service registry: register_service, list_services, get_service_health
- Health probes: probe_service, get_cluster_health, get_storage_health
- Incident lifecycle: create_incident, update_incident, resolve_incident
- Runbook: get_runbook, execute_runbook_step
- Access: list_access_paths, check_access_path
Replace the old MCP-first ops tool plan with API and CLI parity first.
### T16 — Railiance infrastructure integration
Required CLI surface:
- deployed Core Hub smoke evidence;
- ops-hub bootstrap/status checks;
- migration bundle validate/import;
- cutover readiness summary from non-secret evidence reports.
Completed 2026-06-27: `CORE-WP-0008-T05` added `make operator-cli` and
`scripts/core_hub_cli.py` with wrappers around the same Core Hub API behavior
used by tests and smokes. Any MCP surface should consume these proven APIs later
rather than becoming the first implementation path.
### T16 — Deployed ops evidence and activity-core smokes
```task
id: CUST-WP-0025-T16
status: todo
priority: medium
status: wait
priority: high
state_hub_task_id: "702849c5-b253-4ede-afa7-0ab4f81e49a5"
```
Connect ops-hub to railiance infrastructure observability:
- k3s cluster health via kubectl/API
- Longhorn storage status and replication state
- Certificate expiry tracking (cert-manager)
- Backup status (S2 integrated backup)
- SSH tunnel health (ops-bridge)
Run the production-like Core Hub evidence smokes that replace the old direct
Railiance infrastructure integration task.
### T17 — Cross-hub protocol: ops-hub to dev-hub
Minimum evidence:
- `make deployed-smoke` or `make operator-cli CLI_ARGS="deployed-smoke ..."`
against a real Core Hub staging URL;
- deployed activity-core Core Hub sink smoke with approved runtime token and
widget mapping;
- non-secret report fields only: run id, hub/manifest/API-consumer ids,
key prefixes, widget/event ids, counts, statuses, and containment booleans;
- State Hub progress note linking the evidence and naming any remaining gates.
Blocked until an approved `CORE_HUB_BASE_URL`, operator/runtime token custody
path, and activity-core widget mapping are available. This task can close or
supersede `CUST-WP-0047-T05` and `CUST-WP-0049-T06` only after deployed Core
Hub evidence exists or an explicit supersede decision is recorded.
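The "non-secret report fields only" rule above can be read as an allowlist applied before a smoke report is written or linked from State Hub. A minimal sketch, assuming hypothetical field names (the real report keys are not specified here):

```python
# Hypothetical allowlist derived from the minimum-evidence list; the exact
# key names are assumptions, not the real Core Hub report schema.
ALLOWED_REPORT_FIELDS = frozenset({
    "run_id",
    "hub_id",
    "manifest_id",
    "api_consumer_id",
    "key_prefixes",
    "widget_ids",
    "event_ids",
    "counts",
    "statuses",
    "containment",
})


def non_secret_report(raw_report):
    """Return a copy of raw_report restricted to allowlisted fields.

    Names of dropped fields are recorded (without their values) so a
    reviewer can see that something was withheld.
    """
    dropped = sorted(set(raw_report) - ALLOWED_REPORT_FIELDS)
    report = {k: raw_report[k] for k in sorted(raw_report)
              if k in ALLOWED_REPORT_FIELDS}
    report["dropped_fields"] = dropped
    return report


if __name__ == "__main__":
    raw = {
        "run_id": "run-001",
        "statuses": {"deployed_smoke": "pass"},
        "containment": True,
        "runtime_token": "do-not-publish",
    }
    print(non_secret_report(raw))
```

An allowlist fails closed: a new field added to the smoke output stays out of the published report until someone decides it is non-secret.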
### T17 — Core Hub, dev-hub, and cutover decision coupling
```task
id: CUST-WP-0025-T17
status: todo
status: wait
priority: medium
state_hub_task_id: "b99a3ed8-440b-4e28-88f5-495de7276f66"
```
Implement FOS §9.2.5 event coupling:
- Deployment events in dev-hub → change signals in ops-hub
- Incident events in ops-hub → blocker signals in dev-hub
- Shared event vocabulary (canonical event_types)
- HTTP-based event forwarding (keep it simple; upgrade to NATS later if needed)
Replace the old ops-hub-to-dev-hub protocol task with Core Hub replacement
coupling and cutover decision records.
### T18 — Ops Hub "now view" dashboard
Minimum scope:
- Core Hub readiness summary from deployed smoke, migration import,
activity-core sink, and optional legacy Inter-Hub reference evidence;
- State Hub progress/decision records that state whether legacy Inter-Hub
fallback remains required;
- compatibility notes for consumers that still expect Inter-Hub `/api/v2`;
- rollback and Haskell retirement gates kept explicit.
Blocked until `CORE-WP-0005` staging import, dual-run smokes, and cutover
readiness evidence exist. Do not unblock `CORE-WP-0007` Haskell retirement from
local-only evidence.
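The blocking rule above (deployed evidence required, local-only evidence never sufficient) could be expressed as a small readiness check. This is a sketch under assumptions: the `Gate` type and the gate names are illustrative and do not describe an existing Core Hub or State Hub structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """Hypothetical cutover gate; names and fields are assumptions."""

    name: str
    evidence_present: bool
    deployed: bool  # False means the evidence is local-only


def cutover_readiness(gates):
    """Return (ready, blocking_gate_names).

    A gate counts only when evidence exists and comes from a deployed
    environment; local-only evidence leaves the gate blocking.
    """
    blocking = [g.name for g in gates
                if not (g.evidence_present and g.deployed)]
    return (not blocking, blocking)


if __name__ == "__main__":
    gates = [
        Gate("staging-import", evidence_present=True, deployed=True),
        Gate("dual-run-smoke", evidence_present=True, deployed=False),
        Gate("activity-core-sink", evidence_present=False, deployed=False),
    ]
    ready, blocking = cutover_readiness(gates)
    print(ready, blocking)
```

Returning the blocking gate names alongside the boolean keeps the remaining gates explicit in any progress note built from the result.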
### T18 — Core Hub operator UI first screens
```task
id: CUST-WP-0025-T18
status: todo
priority: low
priority: medium
state_hub_task_id: "5b6cea8b-3982-49be-bacf-7269a3d2104e"
```
Observable Framework dashboard for ops-hub:
- Service status grid (green/amber/red)
- Active incidents timeline
- Access path map
- Storage and certificate health
- Recent change log
Replace the old Observable Framework dashboard task with the Core Hub operator
UI rebuild backlog.
### T19 — Register ops-hub as MCP server
Initial UI work should implement only the first operator-critical screens:
- readiness overview;
- registry explorer;
- evidence stream;
- migration/cutover state;
- action-required gates;
- access metadata as a support panel, not a broad expansion area.
Use whynot-design tokens/components wherever practical and preserve
`make visual-check` style desktop/mobile, no-overlap, text-overflow, protected
route, and non-secret assertions. Start implementation from Core Hub
`docs/specs/operator-ui-rebuild-backlog.md`, not from old Inter-Hub screens.
### T19 — Ops-hub MCP server registration decision
```task
id: CUST-WP-0025-T19
status: todo
status: cancel
priority: medium
state_hub_task_id: "f033c80e-4ebb-49cf-8987-20c9b2ff4c13"
```
Register ops-hub MCP server:
- Port 8002 (dev-hub on 8001, ops-hub on 8002)
- Update global `~/.claude/CLAUDE.md` with ops-hub registration
- Update session protocol: domain repos that touch infrastructure should
call both `get_domain_summary()` (dev-hub) and ops-hub orientation
Cancel the old immediate registration of a standalone `ops-hub` MCP server.
The preferred replacement path is Core Hub API first and operator CLI second.
Register a separate ops-hub MCP server only if post-cutover usage proves that a
separate service boundary is still useful. Until then, State Hub progress and
Core Hub API/CLI evidence are the coordination surfaces.
## Phase 4 — Business Model & Fin Hub


@@ -456,21 +456,22 @@ Progress 2026-06-27:
`NK-WP-0001` Keycloak task is cancelled as superseded, `NK-WP-0002` local
identity is done, and the remaining identity gate is the IAM Profile v0.2
FastAPI integration test.
- Current ops-hub reality is extension-first: `ops-hub` exists,
`OPS-WP-0001` is finished, and `OPS-WP-0002` waits on authenticated
Inter-Hub bootstrap/runtime-key evidence. Reconcile `CUST-WP-0025-T13`-`T19`
after the first governed ops event lands.
- Current ops-hub reality is Core Hub replacement-first: `CORE-WP-0008`
finished the API smoke harness, activity-core sink, staging profile, CLI
wrappers, UI rebuild backlog, and Custodian handoff. `CUST-WP-0025-T13`-`T19`
have been rewritten away from the obsolete standalone scaffold.
- Fin-hub/business tasks remain deliberately deferred until identity integration
and ops-hub extension evidence are proven.
Progress 2026-06-27 Core Hub reset:
- `CUST-WP-0052` now owns the reset criteria. `CUST-WP-0025-T13` through
`T19` should not be executed literally as the old standalone ops-hub scaffold
until Core Hub replacement evidence is good enough and the tasks are rewritten.
- Core Hub is promising enough to stop expanding the Inter-Hub-first path:
local ops-hub bootstrap compatibility and `/console` visual checks exist, but
staging import, deployed dual-run smokes, and cutover evidence are still open.
- `CUST-WP-0052` completed the Phase 3 reset. `CUST-WP-0025-T13` through
`T19` now point at Core Hub-owned API evidence, CLI parity, deployed
smoke/cutover gates, whynot-aligned UI, and cancellation of immediate
standalone ops-hub MCP registration.
- Core Hub is now the preferred replacement lane, but staging import, deployed
dual-run smokes, cutover evidence, and Haskell retirement approval remain
open.
## Task: Create The Stable Pickup Checkpoint


@@ -158,7 +158,7 @@ and gated UI rebuild criteria.
```task
id: CUST-WP-0052-T04
status: todo
status: done
priority: high
state_hub_task_id: "04c9c807-68d0-4750-bd72-a484730cd55d"
```
@@ -181,9 +181,14 @@ points future agents at the obsolete mega-hub/Inter-Hub scaffold sequence.
summarizes the Core Hub replacement proof from `CORE-WP-0008-T02` through
`T06`, records why `CUST-WP-0047-T05` and `CUST-WP-0049-T06` should remain
legacy/fallback wait tasks for now, and gives rewrite guidance for
`CUST-WP-0025-T13` through `T19`. The actual `CUST-WP-0025` rewrite is still
open because no live deployed Core Hub smoke ids/counts or cutover proof exist
yet.
`CUST-WP-0025-T13` through `T19`.
Completed 2026-06-27: rewrote `CUST-WP-0025-T13` through `T19` around Core
Hub-owned API evidence, operator CLI parity, deployed smoke/cutover gates, and
the whynot-aligned Core Hub UI backlog. The rewrite marks the old immediate
standalone ops-hub MCP registration as cancelled, keeps deployed evidence and
cutover tasks waiting on real staging/runtime proof, and does not claim Haskell
retirement is unblocked.
## Task: Align Helixforge Build And Environment Practices
@@ -265,8 +270,8 @@ workplan notes, not buried in chat.
- CUST-WP-0051, CUST-WP-0047, and CUST-WP-0049 point toward Core Hub replacement
instead of further Inter-Hub expansion.
- CUST-WP-0025 has a clear reset gate and no one resumes the old standalone
ops-hub scaffold until it is rewritten.
- CUST-WP-0025 Phase 3 has been rewritten so no one resumes the old
standalone ops-hub scaffold sequence.
- The next implementation lane is API first, CLI second, web UI third.
- UI rebuild expectations name whynot-design and operator-priority views.
- External ops-warden needs are routed through State Hub requirements, not