the-custodian/workplans/CUST-WP-0025-fos-hub-bootstrap.md
tegwick 0777e5b2f0 feat: add FOS/credential standards, big-picture guidance, and CUST-WP-0025 workplan
- canon/standards/credential-management_v0.1.md: single root-of-trust credential hierarchy standard
- canon/standards/federated-organization-standard_v1.0.md: FOS reference architecture (VSM-based)
- wiki/BigPictureGuidance.md: integration guidance for OAS + FOS orthogonal layers
- workplans/CUST-WP-0025-fos-hub-bootstrap.md: 4-phase plan (identity, hub-core extraction, ops-hub, fin-hub)
- state-hub/Makefile: treat exit 2 (warnings-only) as success in check-consistency targets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 23:48:13 +01:00

---
id: CUST-WP-0025
type: workplan
title: "FOS Hub Bootstrap — Identity, Hub Extraction, Ops Hub, Fin Hub"
domain: custodian
repo: the-custodian
status: active
owner: custodian
topic_slug: custodian
created: "2026-03-20"
updated: "2026-03-20"
state_hub_workstream_id: "293a74fe-a85a-4ad6-8933-23d52a72fe8b"
---
# FOS Hub Bootstrap — Identity, Hub Extraction, Ops Hub, Fin Hub
## Goal
Progress the Custodian from FOS maturity Level 1 (Single-Hub Emergence) toward
Level 3 (Core Federation) by:
1. Finalizing shared identity infrastructure (NetKingdom SSO)
2. Extracting a generic reusable hub-core package from state-hub
3. Renaming state-hub to dev-hub and transitioning all repos
4. Creating the ops-hub for runtime operations coordination
5. Building the fin-hub with railiance-as-a-service as first monetization path
## Context
The state-hub has matured through 24 completed workplans (62 workstreams, 573 tasks)
but remains a monolithic single hub mixing dev-coordination, governance, and generic
infrastructure. Per FOS §13.1, this risks becoming the "Mega-Hub" anti-pattern.
Two standards govern the architecture:
- **FOS** (Federated Organization Standard): organizational recursion via domain hubs
- **OAS** (Orthogonal Architecture Standard): compute substrate via 6 dimensions
Together they form a complete cybernetic stack: FOS gives the viable organization,
OAS gives the viable infrastructure.
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Hub-core packaging | Separate pip-installable package | Clean separation, versioned independently, each hub depends via uv |
| Phase sequencing | Parallel start (Phase 1 + 2) | Identity and extraction run concurrently; auth bolted on later |
| Ops Hub location | New standalone repo | FOS separation principle — each hub independently deployable |
| First monetization | Railiance-as-a-service | Package OAS infra stack as managed/consultancy for EU SMEs |
## Phase 1 — Identity Infrastructure
**Goal**: Finalize the user-identity infrastructure so that all future hubs share one SSO plane.
**Repos**: net-kingdom, railiance-cluster, railiance-platform
**Runs in parallel with Phase 2.**
### T01 — Complete NK-WP-0001: Keycloak + privacyIDEA on k3s
```task
id: CUST-WP-0025-T01
status: todo
priority: high
state_hub_task_id: "f55078b6-7fa3-49ab-be30-37db622d64c9"
```
Complete the SSO/MFA platform deployment. Keycloak as OIDC provider with
privacyIDEA for MFA, running on the k3s cluster. This is the identity
foundation for all hubs and services.
Cross-reference: net-kingdom NK-WP-0001.
### T02 — Complete NK-WP-0002: Local identity bootstrap
```task
id: CUST-WP-0025-T02
status: todo
priority: high
state_hub_task_id: "0d7792f7-5695-4e1a-9726-b9661d5e7108"
```
Implement a lightweight file-based OIDC server for dev/sandbox/bootstrap
scenarios where the full Keycloak cluster is unavailable. Enables local
development of hub services without cluster dependency.
Cross-reference: net-kingdom NK-WP-0002.
### T03 — IAM Profile integration test
```task
id: CUST-WP-0025-T03
status: todo
priority: medium
state_hub_task_id: "e9894ac9-add3-45a6-9893-ea67c6e5e260"
```
Prove a FastAPI service can authenticate via NetKingdom OIDC end-to-end.
Write a minimal test service + integration test that:
- Obtains a token via OIDC/PKCE flow
- Calls a protected endpoint
- Validates token claims (sub, roles, expiry)
This test becomes the template for hub-core auth middleware.
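A minimal sketch of two pieces such a test would need, using only the standard library: generating a PKCE verifier/challenge pair with the S256 method from RFC 7636, and checking the three claims listed above. The function names `make_pkce_pair` and `validate_claims` are hypothetical, and a top-level `roles` claim is an assumption (the actual claim layout depends on how the identity provider is configured). Signature verification against the provider's keys is deliberately left out and would be required in a real test.

```python
import base64
import hashlib
import secrets
import time
from typing import Optional


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method (RFC 7636)."""
    verifier = secrets.token_urlsafe(64)  # 86 characters, inside the 43-128 range
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url without padding, as the S256 transformation requires.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def validate_claims(claims: dict, required_role: str,
                    now: Optional[float] = None) -> list[str]:
    """Return a list of problems; an empty list means the claims are acceptable.

    Assumes the token signature has already been verified elsewhere.
    """
    now = time.time() if now is None else now
    problems = []
    if not claims.get("sub"):
        problems.append("missing sub")
    # Assumption: roles arrive as a top-level list claim named "roles".
    if required_role not in claims.get("roles", []):
        problems.append(f"missing role {required_role}")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("token expired or exp missing")
    return problems
```

Returning a list of problems rather than a boolean keeps failed integration runs readable, since every violated claim is reported at once.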
### T04 — Canon standard: IAM Profile specification
```task
id: CUST-WP-0025-T04
status: todo
priority: medium
state_hub_task_id: "69acc880-394b-478a-94f0-476c9cbc1bc6"
```
Document the OIDC contract as `canon/standards/iam-profile_v0.1.md`:
- Discovery endpoint structure
- Required claims and scopes
- Token lifecycle (access + refresh)
- Hub-to-hub service account pattern
- Human override / emergency access
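As an illustration of the first bullet, the sketch below checks a discovery document for the provider metadata fields that OpenID Connect Discovery 1.0 marks as required for a provider supporting the authorization code flow. The helper names and the example issuer are hypothetical; which additional fields the IAM Profile mandates is for the standard itself to decide.

```python
# Provider metadata fields required by OpenID Connect Discovery 1.0
# (token_endpoint is required unless only the implicit flow is used).
REQUIRED_METADATA = (
    "issuer",
    "authorization_endpoint",
    "token_endpoint",
    "jwks_uri",
    "response_types_supported",
    "subject_types_supported",
    "id_token_signing_alg_values_supported",
)


def discovery_url(issuer: str) -> str:
    """Build the well-known discovery URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def missing_metadata(document: dict) -> list[str]:
    """Return the required fields that are absent from a discovery document."""
    return [key for key in REQUIRED_METADATA if key not in document]


# Hypothetical issuer, for illustration only.
print(discovery_url("https://sso.example.org/realms/demo/"))
```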
## Phase 2 — Hub Extraction & Dev Hub Rename
**Goal**: Extract generic hub-core package; rename state-hub to dev-hub.
**Repos**: the-custodian (extraction), hub-core (new repo)
**Runs in parallel with Phase 1.**
### Extraction Boundary
**Generic hub-core (~17 MCP tools, ~6 models, ~6 routers):**
- Models: Domain, AgentMessage, CapabilityCatalog, CapabilityRequest, ManagedRepo, TPSC*, ProgressEvent (generic event_types)
- Routers: domains, repos, messages, capability_requests, tpsc, policy
- MCP tools: orientation, messaging, capability routing, repo management, TPSC/GDPR, DoI
**Dev-hub-specific (~51 MCP tools, ~12 models):**
- Topics, workstreams, tasks, decisions, dependencies, EP/TD, contributions, SBOM, goals, DoI cache, kaizen agents, consistency checker
### T05 — Create hub-core package
```task
id: CUST-WP-0025-T05
status: todo
priority: high
state_hub_task_id: "04bf480c-8847-4a89-a4f2-e7c5fc51088d"
```
Create `hub-core` as a standalone repo with `pyproject.toml` (uv-managed).
Extract from state-hub:
- Generic SQLAlchemy models (Domain, AgentMessage, CapabilityCatalog, CapabilityRequest, ManagedRepo, TPSC*, ProgressEvent)
- Generic Pydantic schemas
- Generic FastAPI routers (domains, repos, messages, capability_requests, tpsc, policy)
- Alembic migration templates for core schema
- Shared utilities (slug resolution, pagination, trailing-slash normalization)
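The three shared utilities named in the last bullet could look roughly like the sketch below. This is not the existing state-hub code: the function names, the "UUID or slug" interpretation of slug resolution, and the pagination envelope are all assumptions made for illustration.

```python
import math
import uuid


def resolve_slug(ref: str, ids_by_slug: dict[str, str]) -> str:
    """Accept either a UUID string or a slug and return the UUID string."""
    try:
        return str(uuid.UUID(ref))  # already an ID: return it in canonical form
    except ValueError:
        pass
    try:
        return ids_by_slug[ref.strip().lower()]
    except KeyError:
        raise LookupError(f"unknown slug: {ref!r}") from None


def paginate(items: list, page: int = 1, page_size: int = 50) -> dict:
    """Slice a list into one page plus the metadata a client needs to continue."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return {
        "items": items[start:start + page_size],
        "page": page,
        "page_size": page_size,
        "total": len(items),
        "pages": math.ceil(len(items) / page_size),
    }


def normalize_path(path: str) -> str:
    """Strip trailing slashes so '/domains/' and '/domains' match; keep root '/'."""
    return path.rstrip("/") or "/"
```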
### T06 — Hub-core FastMCP base server
```task
id: CUST-WP-0025-T06
status: todo
priority: high
state_hub_task_id: "6b49d94a-b1ea-4507-a8a3-e27c1a918491"
```
Add a base MCP server class to hub-core that provides the ~17 generic tools:
- Orientation: get_state_summary, get_domain_summary, list_domains
- Messaging: send_message, get_messages, mark_message_read, reply_to_message
- Capability routing: register_capability, list_capabilities, request_capability, accept_capability_request, update_capability_request_status, list_capability_requests, get_capability_request
- Repo management: register_repo, update_repo_path, list_domain_repos
- TPSC/GDPR: register_service, list_services, ingest_tpsc_tool, get_gdpr_report
- DoI: check_repo_doi, get_doi_summary
Domain-specific hubs inherit and add their own tools.
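The inheritance pattern can be sketched in plain Python as follows. This deliberately does not use the FastMCP API; `HubCoreServer`, `OpsHubServer`, the `tool` registry, and the placeholder return values are hypothetical stand-ins that only show how a domain hub would extend the generic tool set.

```python
from typing import Callable


class HubCoreServer:
    """Hypothetical base class: collects tools in a name -> callable registry."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tools: dict[str, Callable[..., object]] = {}
        self.register_tools()

    def tool(self, func: Callable[..., object]) -> Callable[..., object]:
        """Register a tool under its function name; reject duplicates."""
        if func.__name__ in self.tools:
            raise ValueError(f"duplicate tool name: {func.__name__}")
        self.tools[func.__name__] = func
        return func

    def register_tools(self) -> None:
        """Register the generic tools; subclasses call super() and add more."""
        def list_domains() -> list[str]:
            return ["custodian"]  # placeholder data, not a real query

        self.tool(list_domains)


class OpsHubServer(HubCoreServer):
    """Hypothetical domain hub: inherits the generic tools and adds its own."""

    def register_tools(self) -> None:
        super().register_tools()

        def get_cluster_health() -> str:
            return "unknown"  # placeholder, no cluster is contacted

        self.tool(get_cluster_health)


ops = OpsHubServer("ops-hub")
print(sorted(ops.tools))
```

The duplicate-name check is worth keeping in whatever form the real base class takes: the generic TPSC tools and the ops-hub service registry in T15 both list `register_service` and `list_services`, so one side will need a different name or a namespace.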
### T07 — FOS §10 risk and alert tools
```task
id: CUST-WP-0025-T07
status: todo
priority: medium
state_hub_task_id: "5a54af24-f7cb-451f-874f-66bd6979ab07"
```
Add `get_risks()` and `get_alerts()` to hub-core, formalizing existing
ProgressEvent patterns. Define canonical event_type values:
- `risk_surfaced`, `risk_mitigated`, `risk_escalated`
- `alert_raised`, `alert_acknowledged`, `alert_resolved`
This completes the FOS §10 cross-hub contract.
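A minimal sketch of how the canonical values and the two query functions might fit together, using dataclasses in place of the SQLAlchemy models. Treating `risk_mitigated` and `alert_resolved` as closing states, the `subject` field, and the "latest event wins" rule are assumptions, not part of FOS §10 as quoted here.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(str, Enum):
    RISK_SURFACED = "risk_surfaced"
    RISK_MITIGATED = "risk_mitigated"
    RISK_ESCALATED = "risk_escalated"
    ALERT_RAISED = "alert_raised"
    ALERT_ACKNOWLEDGED = "alert_acknowledged"
    ALERT_RESOLVED = "alert_resolved"


# Assumption: these two types close a risk or an alert.
TERMINAL = {EventType.RISK_MITIGATED, EventType.ALERT_RESOLVED}


@dataclass(frozen=True)
class ProgressEvent:
    subject: str           # hypothetical key tying events to one risk or alert
    event_type: EventType


def _latest_open(events: list[ProgressEvent], prefix: str) -> dict[str, EventType]:
    """Map each subject to its latest event of one family, dropping closed ones."""
    latest: dict[str, EventType] = {}
    for event in events:  # events are assumed to be in chronological order
        if event.event_type.value.startswith(prefix):
            latest[event.subject] = event.event_type
    return {s: t for s, t in latest.items() if t not in TERMINAL}


def get_risks(events: list[ProgressEvent]) -> dict[str, EventType]:
    return _latest_open(events, "risk_")


def get_alerts(events: list[ProgressEvent]) -> dict[str, EventType]:
    return _latest_open(events, "alert_")
```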
### T08 — Refactor state-hub to import from hub-core
```task
id: CUST-WP-0025-T08
status: todo
priority: high
state_hub_task_id: "daf1d8ac-b55a-4692-b359-2671ddf6fc8a"
```
Refactor the state-hub codebase:
- Replace generic models/routers/schemas with imports from hub-core
- Keep dev-specific code (topics, workstreams, tasks, decisions, etc.) in state-hub
- Ensure all existing tests pass with the new import structure
- Update pyproject.toml to depend on hub-core
### T09 — Rename MCP server state-hub to dev-hub
```task
id: CUST-WP-0025-T09
status: todo
priority: high
state_hub_task_id: "2148a804-7d6a-4e26-b1a8-08da24929c88"
```
Rename across all integration points:
- `state-hub/mcp_server/server.py`: name="state-hub" → "dev-hub"
- `~/.claude/CLAUDE.md`: 3 locations (registration commands, references)
- `state-hub/scripts/register_project.sh`: validation checks
- `state-hub/scripts/patch_mcp_cwd.py`: config checks
- `state-hub/custodian_cli.py`: config checks
- `state-hub/scripts/project_rules/session-protocol.template`: template text
- `state-hub/api/main.py`: service metadata response
### T10 — MCP config migration script
```task
id: CUST-WP-0025-T10
status: todo
priority: medium
state_hub_task_id: "5953f129-089d-4d90-bbe5-f86da4eac1bf"
```
Create `state-hub/scripts/migrate_mcp_config.py` that:
- Reads `~/.claude.json`
- Renames `mcpServers["state-hub"]` to `mcpServers["dev-hub"]`
- Preserves all other settings
- Backs up original file before writing
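The four requirements above can be sketched as a single function. This is an illustrative outline rather than the final script: the `.bak` suffix, the refusal to overwrite an existing `dev-hub` entry, and the function name are assumptions, and only the top-level `mcpServers` key named in the task is handled.

```python
import json
import shutil
from pathlib import Path


def migrate_mcp_config(config_path: Path, old: str = "state-hub",
                       new: str = "dev-hub") -> bool:
    """Rename one mcpServers key in place; return True if the file was changed."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.get("mcpServers", {})
    if old not in servers:
        return False  # nothing to do, or already migrated
    if new in servers:
        raise ValueError(f"both {old!r} and {new!r} exist; resolve manually")

    # Back up the original file before writing anything.
    backup = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup)

    # Rebuild the mapping so the renamed entry keeps its position;
    # every other key in the file is left untouched.
    config["mcpServers"] = {
        (new if key == old else key): value for key, value in servers.items()
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    target = Path.home() / ".claude.json"
    if target.exists():
        print("migrated" if migrate_mcp_config(target) else "no change needed")
```

Returning `False` when the old key is absent makes the script safe to run repeatedly, which helps when several machines are migrated at different times.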
### T11 — Regenerate domain repo rule files
```task
id: CUST-WP-0025-T11
status: todo
priority: medium
state_hub_task_id: "7b41766b-f97f-4e9f-9f3c-c0937edb355f"
```
After template update, regenerate `.claude/rules/session-protocol.md` for
all registered domain repos:
- railiance-infra, railiance-cluster, railiance-platform
- railiance-enablement, railiance-apps
- net-kingdom, markitect, coulomb.social
- personhood, foerster-capabilities
### T12 — Full test suite and consistency check
```task
id: CUST-WP-0025-T12
status: todo
priority: high
state_hub_task_id: "e55ae544-3cea-485e-80d5-a9696ef97b96"
```
Gate: all of the following must pass before Phase 2 is considered complete:
- `cd state-hub && make test` — full test suite
- `make fix-consistency REPO=the-custodian` — workplan ↔ DB sync
- `make check-consistency-all` — all registered repos
- Manual smoke test: start the dev-hub MCP server and run `get_domain_summary()` from a domain repo
## Phase 3 — Ops Hub
**Goal**: Runtime operations coordination per FOS §7.3.
**Depends on**: Phase 2 (hub-core available), Phase 1 (identity for service auth).
**Repo**: ops-hub (new standalone repo, registered under custodian domain)
### T13 — Create ops-hub repo from hub-core scaffold
```task
id: CUST-WP-0025-T13
status: todo
priority: medium
state_hub_task_id: "2c6d1429-a67a-4f66-84d1-cb32ffdb890f"
```
Create `ops-hub` repo with:
- pyproject.toml depending on hub-core
- FastAPI app factory inheriting hub-core base
- MCP server extending hub-core base server
- Alembic setup combining hub-core's core migrations with ops-specific migrations
- Register as managed repo under custodian domain
### T14 — Ops-specific models
```task
id: CUST-WP-0025-T14
status: todo
priority: medium
state_hub_task_id: "0e811e9b-23a5-49f9-979e-cd1c5dcd937f"
```
Define SQLAlchemy models for:
- **Service**: name, namespace, health_status, last_seen, endpoints
- **Incident**: severity, status (open/investigating/mitigated/resolved), timeline
- **Runbook**: service_id, trigger_conditions, steps, last_executed
- **AccessPath**: type (ssh/k8s/http), target, auth_method, status
- **OperationalDebt**: category, severity, location, owner
- **ChangeRecord**: what changed, when, by whom, rollback_path
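To make the Incident lifecycle concrete, here is a sketch that uses a dataclass in place of the eventual SQLAlchemy model. The four status values come from the list above; the allowed transitions (forward-only, with a mitigated incident allowed back into investigation) and the tuple-based timeline are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class IncidentStatus(str, Enum):
    OPEN = "open"
    INVESTIGATING = "investigating"
    MITIGATED = "mitigated"
    RESOLVED = "resolved"


# Assumption: which status changes are legal; resolved is final.
ALLOWED = {
    IncidentStatus.OPEN: {IncidentStatus.INVESTIGATING, IncidentStatus.MITIGATED,
                          IncidentStatus.RESOLVED},
    IncidentStatus.INVESTIGATING: {IncidentStatus.MITIGATED, IncidentStatus.RESOLVED},
    IncidentStatus.MITIGATED: {IncidentStatus.INVESTIGATING, IncidentStatus.RESOLVED},
    IncidentStatus.RESOLVED: set(),
}


@dataclass
class Incident:
    severity: str
    status: IncidentStatus = IncidentStatus.OPEN
    timeline: list = field(default_factory=list)  # (timestamp, status) tuples

    def transition(self, new_status: IncidentStatus,
                   at: Optional[datetime] = None) -> None:
        """Move to a new status and record the change on the timeline."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        self.status = new_status
        self.timeline.append((at or datetime.now(timezone.utc), new_status))
```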
### T15 — Ops-specific MCP tools
```task
id: CUST-WP-0025-T15
status: todo
priority: medium
state_hub_task_id: "3fdd1f61-4c8e-4614-898b-df7a9aa4a514"
```
Implement ops-domain MCP tools:
- Service registry: register_service, list_services, get_service_health
- Health probes: probe_service, get_cluster_health, get_storage_health
- Incident lifecycle: create_incident, update_incident, resolve_incident
- Runbook: get_runbook, execute_runbook_step
- Access: list_access_paths, check_access_path
### T16 — Railiance infrastructure integration
```task
id: CUST-WP-0025-T16
status: todo
priority: medium
state_hub_task_id: "702849c5-b253-4ede-afa7-0ab4f81e49a5"
```
Connect ops-hub to railiance infrastructure observability:
- k3s cluster health via kubectl/API
- Longhorn storage status and replication state
- Certificate expiry tracking (cert-manager)
- Backup status (S2 integrated backup)
- SSH tunnel health (ops-bridge)
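For the certificate expiry item, the classification step might look like the sketch below. It assumes the expiry timestamp has already been read from the cluster; the function name and the 30-day and 7-day thresholds are placeholders, not values from the workplan.

```python
from datetime import datetime, timezone


def cert_status(not_after: datetime, now: datetime,
                amber_days: int = 30, red_days: int = 7) -> str:
    """Classify a certificate by the days left before it expires."""
    remaining_days = (not_after - now).total_seconds() / 86400
    if remaining_days <= red_days:
        return "red"    # expired or about to expire
    if remaining_days <= amber_days:
        return "amber"  # renewal should already be in progress
    return "green"


now = datetime(2026, 3, 20, tzinfo=timezone.utc)
print(cert_status(datetime(2026, 4, 1, tzinfo=timezone.utc), now))
```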
### T17 — Cross-hub protocol: ops-hub to dev-hub
```task
id: CUST-WP-0025-T17
status: todo
priority: medium
state_hub_task_id: "b99a3ed8-440b-4e28-88f5-495de7276f66"
```
Implement FOS §9.2.5 event coupling:
- Deployment events in dev-hub → change signals in ops-hub
- Incident events in ops-hub → blocker signals in dev-hub
- Shared event vocabulary (canonical event_types)
- HTTP-based event forwarding (keep it simple; upgrade to NATS later if needed)
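The coupling rules above amount to a small translation table applied before forwarding. The sketch below shows only that translation step; the event names, the envelope fields, and `route_event` are illustrative assumptions and are not taken from FOS §9.2.5. The HTTP call itself is omitted.

```python
from typing import Optional

# Assumption: illustrative event names; the shared vocabulary is still to be defined.
COUPLING = {
    ("dev-hub", "deployment_completed"): ("ops-hub", "change_signal"),
    ("ops-hub", "incident_opened"): ("dev-hub", "blocker_signal"),
}


def route_event(source_hub: str, event_type: str, payload: dict) -> Optional[dict]:
    """Translate a local event into a forwarding envelope, or None if uncoupled."""
    target = COUPLING.get((source_hub, event_type))
    if target is None:
        return None  # the event stays local to its hub
    target_hub, target_type = target
    return {
        "target": target_hub,
        "event_type": target_type,
        "source": source_hub,
        "source_event_type": event_type,
        "payload": payload,
    }
```

Keeping the table separate from the transport means the forwarding mechanism can change later without touching the vocabulary.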
### T18 — Ops Hub "now view" dashboard
```task
id: CUST-WP-0025-T18
status: todo
priority: low
state_hub_task_id: "5b6cea8b-3982-49be-bacf-7269a3d2104e"
```
Observable Framework dashboard for ops-hub:
- Service status grid (green/amber/red)
- Active incidents timeline
- Access path map
- Storage and certificate health
- Recent change log
### T19 — Register ops-hub as MCP server
```task
id: CUST-WP-0025-T19
status: todo
priority: medium
state_hub_task_id: "f033c80e-4ebb-49cf-8987-20c9b2ff4c13"
```
Register ops-hub MCP server:
- Serve on port 8002 (dev-hub stays on 8001)
- Update global `~/.claude/CLAUDE.md` with ops-hub registration
- Update session protocol: domain repos that touch infrastructure should
  call both `get_domain_summary()` (dev-hub) and the ops-hub orientation tool
## Phase 4 — Business Model & Fin Hub
**Goal**: First monetization via railiance-as-a-service + resource viability hub.
**Depends on**: Phase 3 (multi-hub pattern proven).
### T20 — Business model canvas: railiance-as-a-service
```task
id: CUST-WP-0025-T20
status: todo
priority: medium
state_hub_task_id: "55db0560-2733-481d-adba-b72c3839ba45"
```
Define the offering:
- Target: EU SMEs needing sovereign, GDPR-compliant DevOps infrastructure
- Core: managed k3s cluster + observability + GitOps + backup
- Differentiator: VSM-based organizational architecture, not just infra
- Pricing tiers: self-hosted (open-source), managed, fully operated
- Document as `canon/projects/railiance/business-model-canvas_v0.1.md`
### T21 — Canon: Bootstrap Protocol document
```task
id: CUST-WP-0025-T21
status: todo
priority: medium
state_hub_task_id: "ce54d3fc-140e-49be-a181-779abc434d4e"
```
Address FOS blindspot #2 (bootstrapping & initial capital):
- Seed funding strategy and minimum viable budget
- MVP scope definition (what must exist before first customer)
- First 3 mandated roles: Constitutional Steward, Technical Operator, Financial Allocator
- Revenue threshold for role formalization
- Document as `canon/constitution/bootstrap-protocol_v0.1.md`
### T22 — Create fin-hub repo from hub-core scaffold
```task
id: CUST-WP-0025-T22
status: todo
priority: low
state_hub_task_id: "670757d8-305d-4736-9056-e79a150114b1"
```
Create `fin-hub` repo with same scaffold pattern as ops-hub.
Register under custodian domain.
### T23 — Fin-specific models
```task
id: CUST-WP-0025-T23
status: todo
priority: low
state_hub_task_id: "8ebffb3f-0dbb-4672-b4e9-928992c41cf4"
```
Define SQLAlchemy models for:
- **Budget**: domain, period, allocated, committed, spent
- **Commitment**: type (subscription/contract/salary), amount, cadence, start/end
- **BurnRate**: domain, period, actual_spend, projected_spend
- **RunwayProjection**: current_balance, monthly_burn, months_remaining, alert_threshold
- **TokenSpend**: provider (anthropic/openai), model, tokens_in, tokens_out, cost, session_id
### T24 — Fin-hub implementation: cost tracking + runway
```task
id: CUST-WP-0025-T24
status: todo
priority: low
state_hub_task_id: "405f81d3-dec5-4154-a1b8-a3af344a0cc4"
```
Implement:
- Cloud cost ingestion (manual CSV import initially, OpenCost integration later)
- Anthropic API token spend tracking (parse billing exports)
- HostEurope server cost tracking
- Runway calculator with burn-rate projection
- Budget alerts when projected runway drops below threshold
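The runway calculation and the alert condition reduce to a few lines of arithmetic. In this sketch the projection is simply the mean of recent monthly spend, which is an assumption; the function names are hypothetical, and the parameter names follow the RunwayProjection and BurnRate fields from T23.

```python
from typing import Optional


def projected_burn(actual_spend: list[float]) -> float:
    """Assumption: project monthly burn as the mean of recent monthly spend."""
    if not actual_spend:
        raise ValueError("need at least one month of spend")
    return sum(actual_spend) / len(actual_spend)


def months_remaining(current_balance: float, monthly_burn: float) -> Optional[float]:
    """Runway in months; None means no net burn, so the runway is unbounded."""
    if monthly_burn <= 0:
        return None
    return max(current_balance, 0.0) / monthly_burn


def runway_alert(current_balance: float, actual_spend: list[float],
                 alert_threshold: float) -> bool:
    """True when the projected runway drops below the threshold (in months)."""
    runway = months_remaining(current_balance, projected_burn(actual_spend))
    return runway is not None and runway < alert_threshold


# Illustrative figures only: 12000 in the bank, three months of spend.
print(months_remaining(12000, projected_burn([1000, 2000, 3000])))
```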
### T25 — Cross-hub coupling: fin-hub connections
```task
id: CUST-WP-0025-T25
status: todo
priority: low
state_hub_task_id: "90a41790-7290-4145-b89f-88bf491d7652"
```
Implement FOS §9 cross-hub coupling:
- fin→dev: resource pressure signals (budget alerts surface in dev-hub)
- fin→ops: infrastructure cost attribution (per-service cost view)
- fin→canon: viability alerts (runway below threshold escalates to System 5)
### T26 — Pricing and packaging: railiance-as-a-service MVP
```task
id: CUST-WP-0025-T26
status: todo
priority: low
state_hub_task_id: "e17ef269-e349-44cc-ab14-6c57b43199b1"
```
Concrete pricing and go-to-market deliverables:
- Define 3 tiers with feature matrix
- Create landing page content
- Define onboarding workflow (customer → provisioned k3s + monitoring)
- Legal: GmbH implications, liability, SLA framework
- First customer acquisition strategy