From 0777e5b2f0681d49564e46eebd883b5281629c8f Mon Sep 17 00:00:00 2001 From: tegwick Date: Fri, 20 Mar 2026 23:48:13 +0100 Subject: [PATCH] feat: add FOS/credential standards, big-picture guidance, and CUST-WP-0025 workplan - canon/standards/credential-management_v0.1.md: single root-of-trust credential hierarchy standard - canon/standards/federated-organization-standard_v1.0.md: FOS reference architecture (VSM-based) - wiki/BigPictureGuidance.md: integration guidance for OAS + FOS orthogonal layers - workplans/CUST-WP-0025-fos-hub-bootstrap.md: 4-phase plan (identity, hub-core extraction, ops-hub, fin-hub) - state-hub/Makefile: treat exit 2 (warnings-only) as success in check-consistency targets Co-Authored-By: Claude Sonnet 4.6 --- canon/standards/credential-management_v0.1.md | 300 ++++++ .../federated-organization-standard_v1.0.md | 977 ++++++++++++++++++ state-hub/Makefile | 14 +- wiki/BigPictureGuidance.md | 118 +++ workplans/CUST-WP-0025-fos-hub-bootstrap.md | 492 +++++++++ 5 files changed, 1897 insertions(+), 4 deletions(-) create mode 100644 canon/standards/credential-management_v0.1.md create mode 100644 canon/standards/federated-organization-standard_v1.0.md create mode 100644 wiki/BigPictureGuidance.md create mode 100644 workplans/CUST-WP-0025-fos-hub-bootstrap.md diff --git a/canon/standards/credential-management_v0.1.md b/canon/standards/credential-management_v0.1.md new file mode 100644 index 0000000..09d795a --- /dev/null +++ b/canon/standards/credential-management_v0.1.md @@ -0,0 +1,300 @@ +--- +title: "Credential Management Standard" +version: "0.1" +status: "Draft Standard" +domain: custodian +scope: all-domains +created: "2026-03-20" +--- + +# Credential Management Standard + +**Version:** 0.1 +**Status:** Draft Standard +**Scope:** All domains and repositories in the federated organization + +--- + +## 1. 
Purpose + +This standard defines how credentials, secrets, and key material are +managed across all systems — from a developer workstation with no +infrastructure, to a fully operational Kubernetes cluster. + +The core principle is a **single root of trust**: one operator keypair +anchors all credential storage and encryption. Every secret can be +traced back to that root. No secret lives outside this hierarchy. + +--- + +## 2. Trust Hierarchy + +``` +Operator passphrase (human memory only — never stored anywhere) + │ + └── age keypair (~/.config/sops/age/key.txt — one per operator) + │ + ├── SOPS encryption (GitOps secrets in all repos) + │ └── secrets/**/*.sops.yaml — encrypted at rest in git + │ + ├── Ops bundle (age-encrypted tar — offsite backup) + │ └── ops-bundle-.tar.age + │ └── all service secrets at point-in-time + │ + └── KeePassXC (pre-cluster primary credential store) + │ └── master password = operator passphrase (or derived) + │ + ├── Infrastructure credentials + │ ├── SSH keys (server access) + │ ├── API tokens (Gitea, HostEurope, Hetzner) + │ └── Cloud credentials + │ + ├── Service secrets (per-domain groups) + │ ├── net-kingdom/privacyidea/ + │ ├── net-kingdom/lldap/ + │ ├── net-kingdom/authelia/ + │ ├── net-kingdom/keycape/ + │ └── railiance/postgres/ + │ + └── Vault root token (in-cluster phase, stored here) + └── HashiCorp Vault + └── External Secrets Operator (ESO) + └── K8s Secrets → pods +``` + +--- + +## 3. Phases + +### Phase 0 — Pre-cluster (bootstrap) + +**Used when:** No Kubernetes cluster is available. Local development, +initial server provisioning, CI bootstrap. + +**Tools:** age keypair + KeePassXC + ops bundle + +**Flow:** +1. Generate service secrets with a `gen-secrets.sh` script +2. Copy each secret manually into KeePassXC (under the appropriate group) +3. Encrypt a point-in-time ops bundle: `pack-bundle.sh ` +4. Store the ops bundle offsite (separate physical location from KeePassXC) +5. 
Shred the plaintext secrets directory: `find secrets/ -type f -exec shred -u {} \;` +6. When deploying to k8s, read each secret from KeePassXC and inject via + `create-secrets.sh` scripts that produce K8s Secrets + +**Invariant:** Plaintext secrets MUST NOT persist on disk after being +stored in KeePassXC. The only durable forms are: KeePassXC + ops bundle. + +--- + +### Phase 1 — GitOps secrets (SOPS) + +**Used when:** Secrets need to live alongside infrastructure code in git. +All repos with infrastructure manifests use this pattern. + +**Tools:** SOPS + age + +**Configuration (`.sops.yaml` in repo root):** +```yaml +creation_rules: + - path_regex: secrets/.*$ + age: >- + + - path_regex: .*\.sops\.yaml$ + age: >- + +``` + +**Multi-operator:** When a second operator joins, add their age public key +as an additional recipient and re-encrypt all secrets with `sops updatekeys`. +Both keys can decrypt independently — no single point of failure. + +**Invariant:** The age private key is NEVER committed to git. The public +key is committed (in `.sops.yaml` and `keys/age.pub`). Encrypted values +in git are safe to store and review. + +--- + +### Phase 2 — In-cluster (HashiCorp Vault) + +**Used when:** Kubernetes cluster is operational and stable. + +**Tools:** HashiCorp Vault + External Secrets Operator (ESO) + +**Why ESO over Vault Agent Injector:** ESO produces standard K8s Secrets, +which are compatible with plain Helm charts and do not require pod +annotation changes. Decision D4 (net-kingdom DECISIONS.md). + +**Flow:** +1. Bootstrap Vault with the root token stored in KeePassXC +2. Enable Kubernetes auth method (`vault auth enable kubernetes`) +3. Create per-service policies with least-privilege access +4. Migrate each service secret from KeePassXC into Vault +5. Deploy ESO `SecretStore` pointing to Vault +6. Replace `create-secrets.sh` calls with `ExternalSecret` manifests +7. 
ESO reconciles Vault secrets into K8s Secrets automatically + +**KeePassXC post-cluster:** Remains the source of truth for: +- The Vault root/unseal keys (emergency only) +- Dev/sandbox systems that do not connect to in-cluster Vault +- New secrets before they are migrated into Vault + +--- + +## 4. KeePassXC Group Structure + +All service secrets are organized under a standardized group hierarchy: + +``` +KeePassXC root +├── Infrastructure +│ ├── SSH Keys +│ │ └── (private key as attachment, public key as note) +│ ├── API Tokens +│ │ ├── gitea-admin +│ │ ├── hosteurope-api +│ │ └── hetzner-api +│ └── Cloud Credentials +│ └── +│ +├── net-kingdom +│ ├── privacyidea +│ │ ├── PI_SECRET_KEY +│ │ ├── PI_PEPPER +│ │ ├── PI_DB_PASSWORD +│ │ ├── pi-admin (password + totp-seed) +│ │ ├── trigger-admin (password + API token) +│ │ └── enckey (attachment: enckey file + audit keypair) +│ ├── lldap +│ │ ├── LLDAP_JWT_SECRET +│ │ └── LLDAP_LDAP_USER_PASS +│ ├── authelia +│ │ ├── AUTHELIA_JWT_SECRET +│ │ ├── AUTHELIA_SESSION_SECRET +│ │ ├── AUTHELIA_STORAGE_ENCRYPTION_KEY +│ │ ├── AUTHELIA_OIDC_HMAC_SECRET +│ │ └── AUTHELIA_KEYCAPE_CLIENT_SECRET +│ └── keycape +│ ├── RSA signing key (attachment: private + public PEM) +│ └── PI_ADMIN_TOKEN +│ +├── railiance +│ ├── postgres +│ │ └── PG_ROOT_PASSWORD +│ └── sops-age +│ └── age private key (attachment: key.txt) +│ +└── vault + ├── root-token + └── unseal-keys (attachment: unseal-keys.txt, gpg-encrypted) +``` + +--- + +## 5. Age Keypair Management + +**One keypair per operator.** The same key is used for: +- SOPS encryption across all repos +- Ops bundle encryption + +**Generate:** +```bash +age-keygen -o ~/.config/sops/age/key.txt +# output: Public key: age1... +``` + +**Add to repos:** Copy the public key into `.sops.yaml` of each repo and +into `keys/age.pub`. Commit both. + +**Back up:** The private key file MUST be stored in KeePassXC as an +attachment under `railiance/sops-age/age private key`. 
The KeePassXC +database is the disaster recovery path for the age private key. + +**Rotation:** If the private key is compromised, generate a new keypair, +replace the old public key with the new one in every `.sops.yaml`, then run +`sops updatekeys` on all encrypted files so the old key is dropped as a +recipient. Rotate the secret values as well: anything the old key could +decrypt must be treated as exposed. + +--- + +## 6. Ops Bundle + +The ops bundle is a point-in-time snapshot of all service secrets, +encrypted with age and stored offsite. + +**Create:** +```bash +bash gen-secrets.sh ./secrets # generates all secrets as env files +# ... enter each into KeePassXC ... +bash pack-bundle.sh ./secrets # → ops-bundle-.tar.age +find secrets/ -type f -exec shred -u {} \; # shred plaintext +``` + +**Restore:** +```bash +age -d -i ~/.config/sops/age/key.txt -o secrets.tar ops-bundle-.tar.age +tar xf secrets.tar +# re-run create-secrets.sh scripts from restored env files +``` + +**Frequency:** Create a new ops bundle: +- Before any major cluster operation (migration, upgrade, rekey) +- After adding or rotating any service secret +- At least once per quarter + +--- + +## 7. Prohibited Patterns + +These are hard violations regardless of context: + +| Pattern | Why prohibited | +|---------|----------------| +| Plaintext secrets committed to git | Unrecoverable leak | +| Secrets in environment variables in shell history | ~/.bash_history exposure | +| Sharing secrets via chat, email, or issue trackers | Uncontrolled propagation | +| Using the same password for multiple services | Single-point compromise | +| Storing age private key only on a single machine | Catastrophic loss on disk failure | +| Hardcoded secrets in application code or Helm values | Accidental publishing | + +--- + +## 8. Multi-operator Extension + +When a second operator needs access: + +1. They generate their own age keypair (`age-keygen`) +2. Share only the **public key** (never the private key) +3. Primary operator adds it to `.sops.yaml` in all repos +4. 
Primary operator runs `sops updatekeys ` on all encrypted files +5. Both operators can now encrypt and decrypt independently +6. Share KeePassXC database via an encrypted channel (never plaintext) + — the other operator opens it with their own master password after import + +--- + +## 9. Vault Migration Checklist + +When the cluster is stable enough to operate Vault: + +- [ ] Deploy Vault via Helm with HA mode (3 replicas minimum) +- [ ] Store root token and unseal keys in KeePassXC (vault/ group) +- [ ] Enable Kubernetes auth method +- [ ] Create per-service Vault policies (least privilege) +- [ ] Deploy ESO `ClusterSecretStore` pointing to Vault +- [ ] For each service: create `ExternalSecret` manifest, verify K8s Secret + reconciles correctly, then delete the manually-created K8s Secret +- [ ] Verify ESO auto-rotation works (reduce TTL to 1h, confirm rotation) +- [ ] Remove `create-secrets.sh` scripts from deployment runbooks +- [ ] Update this standard to Phase 2 operational status + +--- + +## 10. 
Summary + +| Situation | Tool | Source of truth | +|-----------|------|----------------| +| No cluster, local dev | KeePassXC + create-secrets.sh | KeePassXC | +| GitOps secrets in repo | SOPS + age | Git (ciphertext) | +| Cluster operational | Vault + ESO | Vault (KeePassXC holds root) | +| Disaster recovery | Ops bundle (age) | Offsite encrypted archive | +| Multi-operator | SOPS multi-recipient | Each operator's age keypair | diff --git a/canon/standards/federated-organization-standard_v1.0.md b/canon/standards/federated-organization-standard_v1.0.md new file mode 100644 index 0000000..00e74c4 --- /dev/null +++ b/canon/standards/federated-organization-standard_v1.0.md @@ -0,0 +1,977 @@ +FederatedOrganizationStandard + +*Building blocks for scalable organizations* + +# FederatedOrganisationStandard (FOS) + +*A reference architecture standard for viable, scalable organizations composed of autonomous domains, coordinated through hubs and governed through explicit recursion* + +**Version:** 1.0 +**Status:** Draft Reference Standard + +--- + +# 1. Purpose + +The **FederatedOrganisationStandard (FOS)** defines an organizational architecture for building and operating a scalable entity — or a collection of entities — through a **federated system of domains and hubs**. + +It is intended for organizations that: + +* combine humans and artificial agents +* operate across multiple management domains +* require strong separation of concerns +* want sovereignty, auditability, and rebuildability +* need to scale recursively from projects to companies to foundation-like umbrella structures + +The standard introduces the concept of a **federated organization**: + +> A viable organization composed of semi-autonomous operational domains, each coordinated through a domain hub, and aligned through shared policy, escalation, and identity structures. 
+ +The standard provides: + +* a conceptual model +* a VSM framing +* a core hub set for scalable organizations +* separation-of-concerns rules +* a cross-hub coupling model +* a recursion model for long-term organizational growth + +--- + +# 2. Core Concept + +## 2.1 Federated Organization + +A **federated organization** is an organization in which: + +* operational variety is handled locally where possible +* coordination is provided through explicit hubs +* authority is bounded and visible +* domain-specific systems remain autonomous +* global coherence is achieved through policy, escalation, and shared protocols rather than through monolithic control + +This is not a flat platform, and it is not a centralized command stack. + +It is an architecture in which: + +* domains remain responsible for their own reality +* hubs reduce coordination cost +* higher-order governance constrains without micromanaging +* the same pattern can recur across multiple levels of organizational scale + +--- + +## 2.2 Hub + +A **hub** is: + +> A domain-specific coordination and orientation layer that makes the state, tensions, requests, and responsibilities of a domain visible and actionable without collapsing that domain into centralized authority. + +A hub is not primarily a source of truth. +A hub is primarily a **derived coordination surface**. + +--- + +## 2.3 Federation + +Within this standard, **federation** means: + +* multiple domains +* multiple hubs +* explicit boundaries +* structured coupling +* recursive viability + +Federation does not imply weak structure. +It implies **bounded autonomy plus disciplined coordination**. + +--- + +# 3. Why This Standard Exists + +Organizations that grow across technical, operational, financial, legal, and strategic concerns tend to fail in one of two ways: + +## 3.1 Centralized Mud + +Everything is routed through one giant management layer, one dashboard, one database, one leadership abstraction, or one agent layer. 
+ +This creates: + +* overloaded coordination channels +* mixed time horizons +* authority confusion +* brittle central systems +* low adaptability + +--- + +## 3.2 Fragmented Drift + +Each domain builds its own world without shared coupling structures. + +This creates: + +* invisible tensions +* misaligned incentives +* cross-domain blockers +* duplicated capabilities +* late escalation of risk + +--- + +## 3.3 FOS Response + +FOS exists to establish a middle path: + +* autonomy where possible +* coordination where necessary +* escalation where required +* identity where non-negotiable + +--- + +# 4. VSM Framing + +The **FederatedOrganisationStandard** is explicitly informed by the logic of the **Viable System Model (VSM)**. + +It assumes that a viable organization requires distinct but connected functions for: + +* operations +* coordination +* internal control +* audit / direct inspection +* intelligence / adaptation +* identity / policy + +FOS uses VSM not as a rigid org chart, but as an architectural framing for keeping complexity manageable. + +--- + +## 4.1 VSM Systems in FOS + +### System 1 — Operations + +These are the units that actually do work. + +Examples: + +* software development domains +* infrastructure operations +* finance administration +* legal/governance operations +* customer-facing service domains +* product teams +* business units + +System 1 units should absorb as much variety locally as they can. + +--- + +### System 2 — Coordination + +This is the layer that reduces oscillation and friction across operational units. 
+ +In FOS, hubs provide much of this coordination through: + +* shared summaries +* request routing +* dependency visibility +* inboxes and message flows +* standard rituals for orientation and handoff + +--- + +### System 3 — Internal Control + +This is the layer concerned with: + +* current performance +* resource use +* compliance with internal expectations +* operational coherence + +In FOS, each domain hub typically includes System 3 functions relevant to its domain. + +--- + +### System 3* — Audit / Direct Inspection + +This is the probing, checking, validating function that bypasses polished reporting when necessary. + +Examples in FOS: + +* consistency checks +* force refresh +* direct probes +* posture validation +* raw-state inspection +* anomaly review + +--- + +### System 4 — Intelligence / Adaptation + +This is the outward- and forward-facing function. + +It handles: + +* future architecture +* emerging risks +* adaptation +* opportunity sensing +* long-term redesign +* environmental shifts + +System 4 may be partially implemented within hubs, but should not be collapsed into day-to-day control. + +--- + +### System 5 — Identity / Policy + +This is the function that answers: + +* who are we +* what must remain true +* what are our non-delegable boundaries +* what may never be optimized away + +In FOS, this role is anchored through a **Canon Hub** or equivalent constitutional layer. + +--- + +# 5. Design Principles + +## 5.1 Explicit Separation of Concerns + +Each hub MUST be domain-specific. + +A hub MUST NOT simultaneously serve as: + +* development hub +* operations hub +* finance hub +* security hub +* constitutional governance hub + +Blending these domains leads to mixed incentives and architectural confusion. + +--- + +## 5.2 Derived-State Preference + +A hub SHOULD be a derived coordination system wherever possible. 
+ +That means: + +* source artefacts remain authoritative elsewhere +* hub data is computed, indexed, summarized, routed, or logged +* deleting and rebuilding the hub should not destroy organizational truth + +--- + +## 5.3 Bounded Authority + +Every hub MUST define: + +* what it can observe +* what it can derive +* what it can recommend +* what it can route +* what it can decide +* what it must escalate + +--- + +## 5.4 Recursive Viability + +The same organizational pattern should work at multiple levels: + +* repo or subsystem +* domain +* operating entity +* umbrella entity +* foundation structure + +Each level should be viable in its own right. + +--- + +## 5.5 Informational Coupling Without Structural Fusion + +Domains should exchange information through explicit protocols, not by collapsing into one giant shared state model. + +This is the core of federation. + +--- + +## 5.6 Sovereignty by Default + +The organization should retain operational control over its own coordination systems. + +FOS therefore favors: + +* local-first systems +* open interfaces +* inspectable stores +* append-only histories +* explicit exports + +--- + +# 6. Core Organizational Primitive: The Hub + +## 6.1 Definition + +A hub is: + +> A domain-specific, bounded coordination layer that exposes the present state, requests, tensions, and responsibilities of a domain in a way that humans and agents can act upon. + +--- + +## 6.2 Minimal Hub Responsibilities + +Every hub MUST provide: + +* orientation +* coordination +* escalation +* event traceability +* bounded interfaces +* domain summaries + +--- + +## 6.3 Minimal Hub Constraints + +Every hub MUST avoid: + +* becoming the sole source of truth without justification +* hidden authority +* invisible side effects +* silent irreversible automation +* uncontrolled cross-domain sprawl + +--- + +# 7. Core Hub Set for a Scalable Organization + +FOS defines a **core set of hubs** that together support a scalable organization. 
+ +Not every organization must implement all of them immediately, but the standard treats them as the canonical target set. + +--- + +## 7.1 Canon Hub + +**Role:** Identity, policy, constitutional boundaries +**Dominant VSM Role:** System 5 + +### Purpose + +The Canon Hub defines the stable normative frame of the organization. + +It answers: + +* what the organization is for +* which roles and mandates exist +* what agents may not do autonomously +* how authority is delegated +* what hard boundaries constrain all lower domains + +### Typical Sources + +* constitution +* policy documents +* foundational ADRs +* role charters +* mandate definitions +* delegation matrices + +### Typical Derived Views + +* policy map +* authority map +* escalation destinations +* unresolved governance questions +* conflicts between policies or mandates + +### Separation Rule + +The Canon Hub MUST remain sparse, stable, and deliberately slower-moving than operational hubs. + +--- + +## 7.2 Dev Hub + +**Role:** Software production coordination +**Dominant VSM Roles:** System 2, 3, 3* + +### Purpose + +The Dev Hub coordinates software design and implementation work across repositories, workstreams, and coding agents. + +It answers: + +* what are we building +* what is blocked +* what changed +* what capabilities exist +* what should happen next in development + +### Typical Sources + +* repositories +* workplans +* scope files +* ADRs +* dependency manifests +* capability declarations + +### Typical Derived Views + +* workstream summaries +* blocker maps +* dependency graphs +* capability catalogs +* decision boards +* development health indicators + +### Separation Rule + +The Dev Hub MUST NOT become a runtime operations dashboard or security control plane. + +--- + +## 7.3 Ops Hub + +**Role:** Runtime operations coordination +**Dominant VSM Roles:** System 2, 3, 3* + +### Purpose + +The Ops Hub coordinates the running system. 
+ +It answers: + +* what is running +* what is degraded +* what needs intervention now +* which access paths exist +* where operational risk is accumulating + +### Typical Sources + +* monitoring systems +* runtime topology +* host or cluster state +* alerts +* operational runbooks +* change records +* backup and certificate metadata +* access bridge definitions + +### Typical Derived Views + +* now view +* incident board +* resilience posture +* access map +* operational debt view +* capacity risk view + +### Separation Rule + +The Ops Hub MUST NOT become the owner of infrastructure intent; desired state belongs in infra repositories or equivalent canonical systems. + +--- + +## 7.4 Sec Hub + +**Role:** Trust, control, and security posture +**Dominant VSM Roles:** System 3, 3*, and strong coupling to System 5 + +### Purpose + +The Sec Hub governs the trust structure of the organization. + +It answers: + +* what is trusted +* what is exposed +* which controls are present or missing +* where exceptions are aging +* where access or identity posture is drifting + +### Typical Sources + +* IAM systems +* audit logs +* policy baselines +* vulnerability data +* exception registers +* certificate and secret metadata +* control definitions + +### Typical Derived Views + +* control coverage +* exposure map +* exception aging +* privileged-path map +* trust posture summary + +### Separation Rule + +The Sec Hub MUST remain distinct from the Ops Hub even when tightly coupled to it. + +Ops may observe and execute; Sec governs and constrains. + +--- + +## 7.5 Fin Hub + +**Role:** Resource viability and allocation +**Dominant VSM Roles:** System 3 and 4 + +### Purpose + +The Fin Hub governs resource viability. 
+ +It answers: + +* what can we afford +* what is committed +* where burn is rising +* what resource tensions exist across domains +* which initiatives threaten long-term viability + +### Typical Sources + +* budget artifacts +* accounting exports +* cloud cost data +* resource allocation plans +* obligations and commitments +* investment or reserve tracking + +### Typical Derived Views + +* runway view +* burn by domain +* committed vs flexible spend +* allocation conflicts +* capital tension alerts + +### Separation Rule + +The Fin Hub MUST NOT be replaced by ad hoc development or operations heuristics when the organization reaches meaningful scale. + +--- + +## 7.6 Optional Domain Hubs + +As the organization grows, additional hubs may appear, such as: + +* Legal Hub +* People Hub +* Portfolio Hub +* Research Hub +* Partnership Hub +* Venture Hub + +These should only be introduced when the domain is stable and distinct enough to justify its own coordination surface. + +--- + +# 8. Separation of Concerns + +## 8.1 Why Separation Matters + +Different domains have: + +* different time horizons +* different failure modes +* different kinds of authority +* different acceptable risk profiles +* different source artefacts + +When these are mixed, the organization loses clarity. + +--- + +## 8.2 Time-Horizon Separation + +A useful default reading is: + +* Canon Hub: years +* Fin Hub: quarters to years +* Dev Hub: days to months +* Ops Hub: seconds to weeks +* Sec Hub: minutes to quarters, depending on the issue + +A hub that mixes radically different horizons will tend toward overload. + +--- + +## 8.3 Responsibility Separation + +A practical shorthand: + +* **Canon Hub** asks: what must remain true? +* **Dev Hub** asks: what should be built? +* **Ops Hub** asks: what must be kept running? +* **Sec Hub** asks: what must be trusted or contained? +* **Fin Hub** asks: what remains viable? 
+ +--- + +## 8.4 Source-of-Truth Separation + +Canonical artefacts should live where they belong: + +* code and workplans in repos +* runtime intent in infra repos or control-plane definitions +* trust policy in security/policy artefacts +* financial truth in accounting or finance systems +* constitutional truth in governance canon + +Hubs summarize, route, and coordinate across these. + +--- + +# 9. Cross-Hub Coupling Model + +## 9.1 Principle + +Hubs should be **loosely coupled but informationally rich**. + +This means: + +* each hub remains structurally separate +* hubs exchange messages, requests, summaries, risks, and escalations +* hubs do not require one giant shared mutable database to cooperate + +--- + +## 9.2 Coupling Modes + +FOS recognizes five primary coupling modes. + +### 9.2.1 Summary Coupling + +One hub provides a compact domain summary to another or to a higher recursion level. + +Example: + +* Ops Hub reports system readiness to Canon or entity-level governance +* Fin Hub reports budget pressure to leadership + +--- + +### 9.2.2 Request Coupling + +One hub requests capability, support, or action from another. + +Example: + +* Dev Hub requests infrastructure provisioning from Ops Hub +* Ops Hub requests a code fix from Dev Hub +* Sec Hub requests remediation from Ops Hub or Dev Hub + +--- + +### 9.2.3 Risk Coupling + +One hub surfaces a risk that another hub must absorb or act on. + +Example: + +* Sec Hub surfaces identity drift to Ops Hub +* Fin Hub surfaces budget pressure to Dev Hub +* Ops Hub surfaces resilience risk to Canon or entity management + +--- + +### 9.2.4 Escalation Coupling + +A domain issue exceeds local authority or capacity and is explicitly escalated. 
+ +Example: + +* Sec Hub escalates a policy breach to Canon Hub +* Ops Hub escalates a risk involving major spend to Fin Hub +* Dev Hub escalates unresolved architectural conflict to System 4 functions + +--- + +### 9.2.5 Event Coupling + +Hubs share relevant append-only events to preserve cross-domain traceability. + +Example: + +* a deployment event in Dev becomes a change signal in Ops +* an access exception in Sec becomes an operational constraint in Ops + +--- + +## 9.3 Coupling Rules + +Cross-hub coupling MUST obey the following rules: + +### Rule 1: No hidden dependencies + +If one hub depends on another, the dependency should be visible. + +### Rule 2: No authority smuggling + +A hub must not use messaging to silently take over another hub’s mandate. + +### Rule 3: No unbounded chatter + +Coupling should reduce, not amplify, organizational noise. + +### Rule 4: Summaries upward, detail locally + +Higher levels should receive compressed meaning, not raw variety dumps. + +### Rule 5: Hard boundaries remain hard + +Cross-hub coordination must not bypass constitutional, security, or financial constraints. + +--- + +# 10. Standard Cross-Hub Contract + +To support federation, all hubs SHOULD expose a minimal common contract. + +## 10.1 Required Generic Functions + +### Orientation + +* `get_domain_summary()` + +### Messaging + +* `get_messages()` +* `send_message()` + +### Risks and Alerts + +* `get_risks()` +* `get_alerts()` + +### Coordination + +* `request_capability()` +* `record_event()` + +### Escalation + +* `escalate_issue()` + +--- + +## 10.2 Domain-Specific Extensions + +Beyond the shared contract, each hub SHOULD expose domain-specific functions. + +Examples: + +* Dev Hub: workstreams, capabilities, decisions +* Ops Hub: services, incidents, runbooks, access paths +* Sec Hub: controls, exposures, exceptions +* Fin Hub: budgets, commitments, runway +* Canon Hub: mandates, policies, delegation rules + +--- + +# 11. 
Organizational Recursion + +## 11.1 Recursion Principle + +A federated organization may exist at multiple levels simultaneously. + +Examples: + +* a repo as a micro-domain +* a hub as a domain-level coordinator +* an operating company as a viable entity +* a foundation or family structure as a higher-order viable system +* a portfolio of ventures as a still higher recursion layer + +The same viability logic should hold at each level. + +--- + +## 11.2 Canonical Recursion Levels + +### L0 — Subsystem / Repo / Service + +A bounded working unit. + +### L1 — Domain Hub + +Dev, Ops, Sec, Fin, etc. + +### L2 — Operating Entity + +The company or core operating body. + +### L3 — Umbrella Governance Entity + +Foundation, holding structure, or multi-venture umbrella. + +--- + +## 11.3 Escalation Across Recursion Levels + +Escalation should occur when: + +* local authority is insufficient +* risk exceeds local tolerance +* policy conflict cannot be resolved locally +* resource conflict crosses domain boundaries +* strategic redesign is required + +The higher recursion level should receive a summary plus context, not raw noise. + +--- + +# 12. Role Logic in a Federated Organization + +FOS does not prescribe a rigid human org chart, but it does define role logic. + +## 12.1 Every Domain Needs at Least + +* an operational role +* a coordination role +* an analytical or review role + +These may be human, artificial, or hybrid. + +--- + +## 12.2 Higher-Order Roles + +As the organization grows, meta-roles may emerge, such as: + +* chief technical operator +* security steward +* portfolio strategist +* constitutional steward +* financial allocator + +These roles should coordinate across hubs, not erase them. + +--- + +# 13. Anti-Patterns + +## 13.1 The Mega-Hub + +One hub for everything. + +This destroys separation of concerns and creates central mud. + +--- + +## 13.2 The Silent Empire + +A hub accumulates hidden authority and becomes de facto sovereign. 
+ +This undermines explicit governance. + +--- + +## 13.3 Domain Collapse + +Development, operations, security, and finance are treated as one blended management problem. + +This guarantees confusion. + +--- + +## 13.4 Connector Spaghetti + +Cross-hub integration grows ad hoc without standard contracts. + +This creates invisible fragility. + +--- + +## 13.5 Upward Variety Flooding + +Higher-order governance is flooded with low-level events and raw detail. + +This breaks recursion. + +--- + +# 14. Maturity Model + +## Level 0 — Unstructured + +No explicit hub logic, ad hoc coordination. + +## Level 1 — Single-Hub Emergence + +One hub exists, but boundaries are still mixed. + +## Level 2 — Domain Hub Clarity + +At least Dev and Ops are distinct. + +## Level 3 — Core Federation + +Canon, Dev, Ops, Sec, and Fin are conceptually separated and partially operational. + +## Level 4 — Protocolized Coupling + +Cross-hub messaging, requests, risks, and escalations follow standard contracts. + +## Level 5 — Recursive Federation + +Multiple entities or ventures operate under shared constitutional logic while retaining local autonomy. + +--- + +# 15. Minimal Reference Architecture + +A minimal scalable FOS architecture contains: + +* one **Canon Hub** +* one **Dev Hub** +* one **Ops Hub** +* one **Sec Hub** +* one **Fin Hub** +* one **common cross-hub protocol** +* one **shared event and escalation vocabulary** +* one **explicit recursion model** + +Not all must be implemented at once, but the organization should know where it is heading. + +--- + +# 16. Key Insight + +> A scalable organization is not built by centralizing everything. +> It is built by creating viable domains with clean boundaries, then coupling them through explicit hubs, shared protocols, and constitutional constraints. + +--- + +# 17. Closing Statement + +The **FederatedOrganisationStandard** defines an organization as a federation of viable domains rather than as a monolithic machine. 
+ +Its core commitments are: + +* autonomy with accountability +* coordination without collapse +* policy without micromanagement +* recursion without chaos +* visibility without loss of sovereignty + +It provides a path by which a single evolving project can grow into a multi-domain, multi-entity, foundation-compatible structure without losing clarity of identity or operational coherence. + +xxx diff --git a/state-hub/Makefile b/state-hub/Makefile index 1bd7a55..e30b07c 100644 --- a/state-hub/Makefile +++ b/state-hub/Makefile @@ -235,26 +235,32 @@ validate-adr: uv run python scripts/validate_repo_adr.py "$(REPO)" $(if $(DOMAIN),--domain "$(DOMAIN)",) ## Check a single repo for ADR-001 consistency: make check-consistency REPO=the-custodian [REPO_PATH=/override] +## Exit 0 = clean, exit 2 = warnings only (treated as success), exit 1 = failures check-consistency: @test -n "$(REPO)" || (echo "ERROR: REPO is required. Usage: make check-consistency REPO="; exit 1) uv run python scripts/consistency_check.py --repo "$(REPO)" \ $(if $(API_BASE),--api-base "$(API_BASE)",) \ - $(if $(REPO_PATH),--repo-path "$(REPO_PATH)",) + $(if $(REPO_PATH),--repo-path "$(REPO_PATH)",); \ + e=$$?; [ $$e -eq 2 ] && exit 0 || exit $$e ## Check and auto-fix a single repo: make fix-consistency REPO=the-custodian [REPO_PATH=/override] +## Exit 0 = clean, exit 2 = warnings only (treated as success), exit 1 = failures fix-consistency: @test -n "$(REPO)" || (echo "ERROR: REPO is required. 
Usage: make fix-consistency REPO="; exit 1) uv run python scripts/consistency_check.py --repo "$(REPO)" --fix \ $(if $(API_BASE),--api-base "$(API_BASE)",) \ $(if $(REPO_PATH),--repo-path "$(REPO_PATH)",) + $(if $(REPO_PATH),--repo-path "$(REPO_PATH)",); \ + e=$$?; [ $$e -eq 2 ] && exit 0 || exit $$e ## Check all registered repos for ADR-001 consistency check-consistency-all: - uv run python scripts/consistency_check.py --all $(if $(API_BASE),--api-base "$(API_BASE)",) + uv run python scripts/consistency_check.py --all $(if $(API_BASE),--api-base "$(API_BASE)",); \ + e=$$?; [ $$e -eq 2 ] && exit 0 || exit $$e ## Check and auto-fix all registered repos fix-consistency-all: - uv run python scripts/consistency_check.py --all --fix $(if $(API_BASE),--api-base "$(API_BASE)",) + uv run python scripts/consistency_check.py --all --fix $(if $(API_BASE),--api-base "$(API_BASE)",); \ + e=$$?; [ $$e -eq 2 ] && exit 0 || exit $$e ## Cancel open tasks belonging to completed/archived workstreams. ## Safe to run at any time; also suitable for a daily cron job. diff --git a/wiki/BigPictureGuidance.md b/wiki/BigPictureGuidance.md new file mode 100644 index 0000000..82c7f7e --- /dev/null +++ b/wiki/BigPictureGuidance.md @@ -0,0 +1,118 @@ +We will set up a federated autonomous organization and IT infrastructure based on two VSM-based +standards according to the following guidance about how to integrate them and address conflicts +and blind spots. + +This document is the current Big Picture Guidance for the Custodian and should inform creation +and priority of how to spend time, money and tokens bootstrapping the infrastructure efficiently. + +**Integration Guidance: Orthogonal Architecture Standard (OAS) + Federated Organisation Standard (FOS)** + +Note: The standards are available under canon/standards/ + +Both standards are **explicitly built on the same foundation** — Stafford Beer’s Viable System Model (VSM).
This is not a coincidence; it is the single strongest reason they integrate cleanly. OAS applies VSM to **compute systems** (Kubernetes/cloud-native infra), while FOS applies the identical VSM logic to **organizational structure** (including humans + AI agents). Together they form a complete “cybernetic stack”: FOS gives you the viable *organization*, OAS gives you the viable *infrastructure* that the organization actually runs on. + +### 1. How to Integrate Them (Recommended Architecture) + +Treat the two standards as **orthogonal layers of one recursive system**: + +- **FOS = System-of-Systems layer** (organizational recursion L0–L3) +- **OAS = Compute substrate layer** (technical recursion inside every hub and every domain) + +**Core mapping (one-to-one VSM alignment)** + +| VSM Role | FOS Hub(s) | OAS Dimension(s) Used to Realize It | Concrete Implementation Pattern | +|----------------|--------------------------------|------------------------------------------------------|---------------------------------| +| System 5 | Canon Hub | Plane P2 + Capability C1 + Quality Q7 + Intelligence I5 | GitOps repo + policy-as-code + AI agents that *must* route through control plane | +| System 4 | Fin Hub + parts of Dev Hub | Intelligence I4 + Capability C4 + Stack S4 | Adaptive autoscaling + AI config assistants + cost-optimization loops | +| System 3 / 3* | Sec Hub + Ops Hub | Plane P2/P3 + Quality (all Q) + Stack S2/S3 | Kubernetes control plane + policy engines + observability + audit probes | +| System 2 | Dev Hub + Ops Hub cross-talk | Logic L3 + Plane P2 + Relation “governs”/“observes” | Workflow engines + GitOps controllers + cross-hub protocol | +| System 1 | All operational domains + workloads | Stack S5 + Logic L2/L4 + Capability C3/C5 | Actual pods/services/APIs that deliver value | + +**Practical bootstrap sequence (start small, stay viable at every level)** + +1. 
**Day 0–30: Canon Hub first** + Create a single sparse Git repository (or lightweight Notion/Obsidian + Git sync). This becomes your System 5. + Declare: mandates, delegation matrix, non-delegable boundaries, and the rule “*All AI actions MUST route through the control plane*” (directly from OAS I5 + FOS 5.6). + +2. **Day 30–60: Seed the four core hubs as derived-state services** + - Dev Hub → Backstage or custom portal on top of your repos + ADRs + capability catalog (OAS Logic L1 + Capability C1–C3). + - Ops Hub → ArgoCD + Prometheus + custom “now view” dashboard (OAS Stack S1–S3 + Plane P2). + - Sec Hub → OPA/Gatekeeper + Vault + Trivy + exception register (OAS Quality Q1 + Plane P2). + - Fin Hub → OpenCost + Git-based budget manifests + runway calculator (OAS Quality Q6 + Intelligence I4). + + All four hubs expose the **minimal cross-hub contract** from FOS §10 (get_domain_summary, request_capability, escalate_issue, etc.). Implement it once as a small Kubernetes operator or simple REST + NATS/event bus. + +3. **Ongoing: Make every hub itself a recursive OAS system** + Each hub runs on its own namespace/cluster slice. Apply the full OAS dimensions inside it: + - Its workloads = OAS Stack S5 + - Its internal control = OAS Plane P2 + - Its AI agents = OAS Intelligence I5 (always governed) + This satisfies FOS recursion principle automatically. + +4. **AI Agent Integration (the autonomous part)** + - All agents live in the **Intelligence dimension** of OAS. + - They are treated as **System 1 units** inside the relevant FOS domain (e.g., coding agents in Dev Hub). + - They *never* bypass the control plane (OAS P3 + FOS 10.1). + - Canon Hub policies (e.g., “no agent may spend >€500 without escalation”) are enforced by OAS Quality Q7 + Sec Hub. + +### 2. Conflicts to Mitigate (there are only two real ones) + +**Conflict 1: Centralised Control Plane vs. Derived-State Hubs** +OAS insists “all changes go through control plane”. 
+FOS insists “hubs must remain derived, rebuildable, not sole source of truth”. + +**Mitigation (already built into both standards):** +- Make the control plane itself **derived** (GitOps + reconciliation loops). +- Source of truth = canonical Git repos + policy documents. +- Hubs only index, summarise, and route. +- Delete/rebuild any hub → it re-derives everything from source (FOS 5.2 + OAS Plane P2). + +**Conflict 2: Time-horizon mixing** +Ops Hub wants seconds-to-weeks view; Fin Hub wants quarters-to-years; Canon wants “forever”. + +**Mitigation:** +- Enforce FOS §8.2 time-horizon separation at the data model level (different retention, different dashboards, different escalation SLAs). +- Use OAS Quality Q7 governance policies to forbid a single dashboard that mixes horizons. + +No other structural conflicts exist — the standards were clearly designed to interlock. + +### 3. Blindspots to Address (these are the real gaps) + +1. **Legal / Regulatory Reality (FOS mentions Legal Hub as optional — make it mandatory in EU)** + You are in Germany. Add a lightweight Legal Hub (or at least Canon Hub section) for DSGVO, AI Act, GmbH law, tax, etc. OAS does not mention compliance jurisdiction — you must layer it on top of Quality Q1 + Canon Hub. + +2. **Bootstrapping & Initial Capital** + Neither standard tells you how to fund the first €10k or hire the first human. + → Create a one-time “Bootstrap Protocol” document in Canon Hub that defines seed funding, MVP scope, and first three mandated roles (Constitutional Steward + Technical Operator + Financial Allocator). + +3. **Human–AI Handover & Psychological Safety** + FOS talks about “humans and artificial agents” but gives no guidance on when a human must override an agent or how to handle agent mistakes that affect people. + → Add explicit “Human Override” policy in Canon Hub + OAS Intelligence I5 rule: every agent action must be auditable and reversible by a human in <5 min. + +4. 
**External Interfaces & Customers** + Both standards are inward-focused. + → Explicitly model “Customer Capability” in OAS Capability C5 and expose it through a public-facing API gateway (OAS Stack S5) that is still governed by the same control plane. + +5. **Exit / Dissolution / Succession** + No standard covers “what happens if the founder disappears or the org needs to be wound down”. + → Canon Hub must contain a “Sunset Clause” and delegation-of-dissolution rules. + +6. **Concrete Tooling & Cost Control** + The standards are abstract. A practical minimal stack that satisfies both at once (all open-source, EU-hostable): + - Kubernetes (k3s or Talos) + ArgoCD + Crossplane (for OAS Stack) + - Backstage (Dev Hub) + - Grafana + Loki + Tempo + OpenCost (Ops + Fin) + - OPA + Kyverno + Vault (Sec) + - NATS/JetStream or Temporal for cross-hub protocol + - Ollama / local LLM agents routed through control plane (Intelligence) + +### Recommended Next Steps (30-day plan) + +Week 1: Create Canon Hub repo + write the 10 non-delegable policies (including “AI must go through control plane”). +Week 2: Spin up minimal Kubernetes + ArgoCD + the four hubs as simple operators/portals. +Week 3: Implement the cross-hub contract (5 generic functions). +Week 4: Seed first operational workload + first AI agent (coding assistant) and prove it cannot bypass governance. + +If you follow this integration path you will have a **genuinely autonomous, viable, auditable, rebuildable organization + infrastructure** that grows recursively without collapsing into central mud or fragmented drift — exactly what both standards promise. + +Both documents are still “Draft Standard” (v1.0). Treat them as living artefacts: version them inside the Canon Hub and evolve them together as your system learns. That is the ultimate recursive move. 
diff --git a/workplans/CUST-WP-0025-fos-hub-bootstrap.md b/workplans/CUST-WP-0025-fos-hub-bootstrap.md new file mode 100644 index 0000000..567b62a --- /dev/null +++ b/workplans/CUST-WP-0025-fos-hub-bootstrap.md @@ -0,0 +1,492 @@ +--- +id: CUST-WP-0025 +type: workplan +title: "FOS Hub Bootstrap — Identity, Hub Extraction, Ops Hub, Fin Hub" +domain: custodian +repo: the-custodian +status: active +owner: custodian +topic_slug: custodian +created: "2026-03-20" +updated: "2026-03-20" +state_hub_workstream_id: "293a74fe-a85a-4ad6-8933-23d52a72fe8b" +--- + +# FOS Hub Bootstrap — Identity, Hub Extraction, Ops Hub, Fin Hub + +## Goal + +Progress the Custodian from FOS maturity Level 1 (Single-Hub Emergence) toward +Level 3 (Core Federation) by: + +1. Finalizing shared identity infrastructure (NetKingdom SSO) +2. Extracting a generic reusable hub-core package from state-hub +3. Renaming state-hub to dev-hub and transitioning all repos +4. Creating the ops-hub for runtime operations coordination +5. Building the fin-hub with railiance-as-a-service as first monetization path + +## Context + +The state-hub has matured through 24 completed workplans (62 workstreams, 573 tasks) +but remains a monolithic single hub mixing dev-coordination, governance, and generic +infrastructure. Per FOS §13.1, this risks becoming the "Mega-Hub" anti-pattern. + +Two standards govern the architecture: +- **FOS** (Federated Organisation Standard): organizational recursion via domain hubs +- **OAS** (Orthogonal Architecture Standard): compute substrate via 6 dimensions + +Together they form a complete cybernetic stack: FOS gives the viable organization, +OAS gives the viable infrastructure. 
+ +## Key Decisions + +| Decision | Choice | Rationale | +|----------|--------|-----------| +| Hub-core packaging | Separate pip-installable package | Clean separation, versioned independently, each hub depends via uv | +| Phase sequencing | Parallel start (Phase 1 + 2) | Identity and extraction run concurrently; auth bolted on later | +| Ops Hub location | New standalone repo | FOS separation principle — each hub independently deployable | +| First monetization | Railiance-as-a-service | Package OAS infra stack as managed/consultancy for EU SMEs | + +## Phase 1 — Identity Infrastructure + +**Goal**: Finalized user-id infrastructure so all future hubs share one SSO plane. +**Repos**: net-kingdom, railiance-cluster, railiance-platform +**Runs in parallel with Phase 2.** + +### T01 — Complete NK-WP-0001: Keycloak + privacyIDEA on k3s + +```task +id: CUST-WP-0025-T01 +status: todo +priority: high +state_hub_task_id: "f55078b6-7fa3-49ab-be30-37db622d64c9" +``` + +Complete the SSO/MFA platform deployment. Keycloak as OIDC provider with +privacyIDEA for MFA, running on the k3s cluster. This is the identity +foundation for all hubs and services. + +Cross-reference: net-kingdom NK-WP-0001. + +### T02 — Complete NK-WP-0002: Local identity bootstrap + +```task +id: CUST-WP-0025-T02 +status: todo +priority: high +state_hub_task_id: "0d7792f7-5695-4e1a-9726-b9661d5e7108" +``` + +Implement lightweight file-based OIDC server for dev/sandbox/bootstrap +scenarios where the full Keycloak cluster is unavailable. Enables local +development of hub services without cluster dependency. + +Cross-reference: net-kingdom NK-WP-0002. + +### T03 — IAM Profile integration test + +```task +id: CUST-WP-0025-T03 +status: todo +priority: medium +state_hub_task_id: "e9894ac9-add3-45a6-9893-ea67c6e5e260" +``` + +Prove a FastAPI service can authenticate via NetKingdom OIDC end-to-end. 
+Write a minimal test service + integration test that: +- Obtains a token via OIDC/PKCE flow +- Calls a protected endpoint +- Validates token claims (sub, roles, expiry) + +This test becomes the template for hub-core auth middleware. + +### T04 — Canon standard: IAM Profile specification + +```task +id: CUST-WP-0025-T04 +status: todo +priority: medium +state_hub_task_id: "69acc880-394b-478a-94f0-476c9cbc1bc6" +``` + +Document the OIDC contract as `canon/standards/iam-profile_v0.1.md`: +- Discovery endpoint structure +- Required claims and scopes +- Token lifecycle (access + refresh) +- Hub-to-hub service account pattern +- Human override / emergency access + +## Phase 2 — Hub Extraction & Dev Hub Rename + +**Goal**: Extract generic hub-core package; rename state-hub to dev-hub. +**Repo**: the-custodian (extraction), hub-core (new repo) +**Runs in parallel with Phase 1.** + +### Extraction Boundary + +**Generic hub-core (~17 MCP tools, ~6 models, ~6 routers):** +- Models: Domain, AgentMessage, CapabilityCatalog, CapabilityRequest, ManagedRepo, TPSC*, ProgressEvent (generic event_types) +- Routers: domains, repos, messages, capability_requests, tpsc, policy +- MCP tools: orientation, messaging, capability routing, repo management, TPSC/GDPR, DoI + +**Dev-hub-specific (~51 MCP tools, ~12 models):** +- Topics, workstreams, tasks, decisions, dependencies, EP/TD, contributions, SBOM, goals, DoI cache, kaizen agents, consistency checker + +### T05 — Create hub-core package + +```task +id: CUST-WP-0025-T05 +status: todo +priority: high +state_hub_task_id: "04bf480c-8847-4a89-a4f2-e7c5fc51088d" +``` + +Create `hub-core` as a standalone repo with `pyproject.toml` (uv-managed). 
+Extract from state-hub: +- Generic SQLAlchemy models (Domain, AgentMessage, CapabilityCatalog, CapabilityRequest, ManagedRepo, TPSC*, ProgressEvent) +- Generic Pydantic schemas +- Generic FastAPI routers (domains, repos, messages, capability_requests, tpsc, policy) +- Alembic migration templates for core schema +- Shared utilities (slug resolution, pagination, trailing-slash normalization) + +### T06 — Hub-core FastMCP base server + +```task +id: CUST-WP-0025-T06 +status: todo +priority: high +state_hub_task_id: "6b49d94a-b1ea-4507-a8a3-e27c1a918491" +``` + +Add a base MCP server class to hub-core that provides the ~17 generic tools: +- Orientation: get_state_summary, get_domain_summary, list_domains +- Messaging: send_message, get_messages, mark_message_read, reply_to_message +- Capability routing: register_capability, list_capabilities, request_capability, accept_capability_request, update_capability_request_status, list_capability_requests, get_capability_request +- Repo management: register_repo, update_repo_path, list_domain_repos +- TPSC/GDPR: register_service, list_services, ingest_tpsc_tool, get_gdpr_report +- DoI: check_repo_doi, get_doi_summary + +Domain-specific hubs inherit and add their own tools. + +### T07 — FOS §10 risk and alert tools + +```task +id: CUST-WP-0025-T07 +status: todo +priority: medium +state_hub_task_id: "5a54af24-f7cb-451f-874f-66bd6979ab07" +``` + +Add `get_risks()` and `get_alerts()` to hub-core, formalizing existing +ProgressEvent patterns. Define canonical event_type values: +- `risk_surfaced`, `risk_mitigated`, `risk_escalated` +- `alert_raised`, `alert_acknowledged`, `alert_resolved` + +This completes the FOS §10 cross-hub contract. 
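
The canonical `event_type` values listed in T07 could be pinned down as an enum, with `get_risks()` and `get_alerts()` derived from the event stream. The following is only a sketch under stated assumptions: the `ProgressEvent` shape here (with `subject_id` and `sequence` fields) is a hypothetical stand-in for the real SQLAlchemy model, and the rule "a risk is open until its latest event is `risk_mitigated`, an alert is open until `alert_resolved`" is an assumed interpretation that the workplan does not spell out.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(str, Enum):
    # Canonical values taken from T07.
    RISK_SURFACED = "risk_surfaced"
    RISK_MITIGATED = "risk_mitigated"
    RISK_ESCALATED = "risk_escalated"
    ALERT_RAISED = "alert_raised"
    ALERT_ACKNOWLEDGED = "alert_acknowledged"
    ALERT_RESOLVED = "alert_resolved"


RISK_TYPES = {EventType.RISK_SURFACED, EventType.RISK_MITIGATED, EventType.RISK_ESCALATED}
ALERT_TYPES = {EventType.ALERT_RAISED, EventType.ALERT_ACKNOWLEDGED, EventType.ALERT_RESOLVED}


@dataclass(frozen=True)
class ProgressEvent:
    # Hypothetical minimal shape, not the real hub-core model.
    subject_id: str        # assumed identifier of the risk or alert
    event_type: EventType
    sequence: int          # assumed monotonic ordering key


def _latest_per_subject(events):
    """Map each subject_id to its most recent event (highest sequence)."""
    latest = {}
    for event in sorted(events, key=lambda e: e.sequence):
        latest[event.subject_id] = event
    return latest


def get_risks(events):
    """Return ids of risks whose latest event is not 'risk_mitigated' (assumed rule)."""
    latest = _latest_per_subject(e for e in events if e.event_type in RISK_TYPES)
    return sorted(s for s, e in latest.items() if e.event_type is not EventType.RISK_MITIGATED)


def get_alerts(events):
    """Return ids of alerts whose latest event is not 'alert_resolved' (assumed rule)."""
    latest = _latest_per_subject(e for e in events if e.event_type in ALERT_TYPES)
    return sorted(s for s, e in latest.items() if e.event_type is not EventType.ALERT_RESOLVED)
```

Deriving open risks and alerts from the event log, rather than storing a separate status column, would keep the hub rebuildable from its sources, which matches the derived-state idea in the guidance document; whether hub-core takes that route is an open design choice.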
+ +### T08 — Refactor state-hub to import from hub-core + +```task +id: CUST-WP-0025-T08 +status: todo +priority: high +state_hub_task_id: "daf1d8ac-b55a-4692-b359-2671ddf6fc8a" +``` + +Refactor the state-hub codebase: +- Replace generic models/routers/schemas with imports from hub-core +- Keep dev-specific code (topics, workstreams, tasks, decisions, etc.) in state-hub +- Ensure all existing tests pass with the new import structure +- Update pyproject.toml to depend on hub-core + +### T09 — Rename MCP server state-hub to dev-hub + +```task +id: CUST-WP-0025-T09 +status: todo +priority: high +state_hub_task_id: "2148a804-7d6a-4e26-b1a8-08da24929c88" +``` + +Rename across all integration points: +- `state-hub/mcp_server/server.py`: name="state-hub" → "dev-hub" +- `~/.claude/CLAUDE.md`: 3 locations (registration commands, references) +- `state-hub/scripts/register_project.sh`: validation checks +- `state-hub/scripts/patch_mcp_cwd.py`: config checks +- `state-hub/custodian_cli.py`: config checks +- `state-hub/scripts/project_rules/session-protocol.template`: template text +- `state-hub/api/main.py`: service metadata response + +### T10 — MCP config migration script + +```task +id: CUST-WP-0025-T10 +status: todo +priority: medium +state_hub_task_id: "5953f129-089d-4d90-bbe5-f86da4eac1bf" +``` + +Create `state-hub/scripts/migrate_mcp_config.py` that: +- Reads `~/.claude.json` +- Renames `mcpServers["state-hub"]` to `mcpServers["dev-hub"]` +- Preserves all other settings +- Backs up original file before writing + +### T11 — Regenerate domain repo rule files + +```task +id: CUST-WP-0025-T11 +status: todo +priority: medium +state_hub_task_id: "7b41766b-f97f-4e9f-9f3c-c0937edb355f" +``` + +After template update, regenerate `.claude/rules/session-protocol.md` for +all registered domain repos: +- railiance-infra, railiance-cluster, railiance-platform +- railiance-enablement, railiance-apps +- net-kingdom, markitect, coulomb.social +- personhood, foerster-capabilities + +### 
T12 — Full test suite and consistency check + +```task +id: CUST-WP-0025-T12 +status: todo +priority: high +state_hub_task_id: "e55ae544-3cea-485e-80d5-a9696ef97b96" +``` + +Gate: all of the following must pass before Phase 2 is considered complete: +- `cd state-hub && make test` — full test suite +- `make fix-consistency REPO=the-custodian` — workplan ↔ DB sync +- `make check-consistency-all` — all registered repos +- Manual smoke test: start dev-hub MCP server, run get_domain_summary from a domain repo + +## Phase 3 — Ops Hub + +**Goal**: Runtime operations coordination per FOS §7.3. +**Depends on**: Phase 2 (hub_core available), Phase 1 (identity for service auth). +**Repo**: ops-hub (new standalone repo, registered under custodian domain) + +### T13 — Create ops-hub repo from hub-core scaffold + +```task +id: CUST-WP-0025-T13 +status: todo +priority: medium +state_hub_task_id: "2c6d1429-a67a-4f66-84d1-cb32ffdb890f" +``` + +Create `ops-hub` repo with: +- pyproject.toml depending on hub-core +- FastAPI app factory inheriting hub-core base +- MCP server extending hub-core base server +- Alembic setup with hub-core core migrations + ops-specific +- Register as managed repo under custodian domain + +### T14 — Ops-specific models + +```task +id: CUST-WP-0025-T14 +status: todo +priority: medium +state_hub_task_id: "0e811e9b-23a5-49f9-979e-cd1c5dcd937f" +``` + +Define SQLAlchemy models for: +- **Service**: name, namespace, health_status, last_seen, endpoints +- **Incident**: severity, status (open/investigating/mitigated/resolved), timeline +- **Runbook**: service_id, trigger_conditions, steps, last_executed +- **AccessPath**: type (ssh/k8s/http), target, auth_method, status +- **OperationalDebt**: category, severity, location, owner +- **ChangeRecord**: what changed, when, by whom, rollback_path + +### T15 — Ops-specific MCP tools + +```task +id: CUST-WP-0025-T15 +status: todo +priority: medium +state_hub_task_id: "3fdd1f61-4c8e-4614-898b-df7a9aa4a514" +``` + 
+Implement ops-domain MCP tools: +- Service registry: register_service, list_services, get_service_health +- Health probes: probe_service, get_cluster_health, get_storage_health +- Incident lifecycle: create_incident, update_incident, resolve_incident +- Runbook: get_runbook, execute_runbook_step +- Access: list_access_paths, check_access_path + +### T16 — Railiance infrastructure integration + +```task +id: CUST-WP-0025-T16 +status: todo +priority: medium +state_hub_task_id: "702849c5-b253-4ede-afa7-0ab4f81e49a5" +``` + +Connect ops-hub to railiance infrastructure observability: +- k3s cluster health via kubectl/API +- Longhorn storage status and replication state +- Certificate expiry tracking (cert-manager) +- Backup status (S2 integrated backup) +- SSH tunnel health (ops-bridge) + +### T17 — Cross-hub protocol: ops-hub to dev-hub + +```task +id: CUST-WP-0025-T17 +status: todo +priority: medium +state_hub_task_id: "b99a3ed8-440b-4e28-88f5-495de7276f66" +``` + +Implement FOS §9.2.5 event coupling: +- Deployment events in dev-hub → change signals in ops-hub +- Incident events in ops-hub → blocker signals in dev-hub +- Shared event vocabulary (canonical event_types) +- HTTP-based event forwarding (keep it simple; upgrade to NATS later if needed) + +### T18 — Ops Hub "now view" dashboard + +```task +id: CUST-WP-0025-T18 +status: todo +priority: low +state_hub_task_id: "5b6cea8b-3982-49be-bacf-7269a3d2104e" +``` + +Observable Framework dashboard for ops-hub: +- Service status grid (green/amber/red) +- Active incidents timeline +- Access path map +- Storage and certificate health +- Recent change log + +### T19 — Register ops-hub as MCP server + +```task +id: CUST-WP-0025-T19 +status: todo +priority: medium +state_hub_task_id: "f033c80e-4ebb-49cf-8987-20c9b2ff4c13" +``` + +Register ops-hub MCP server: +- Port 8002 (dev-hub on 8001, ops-hub on 8002) +- Update global `~/.claude/CLAUDE.md` with ops-hub registration +- Update session protocol: domain repos that touch 
infrastructure should + call both `get_domain_summary()` (dev-hub) and ops-hub orientation + +## Phase 4 — Business Model & Fin Hub + +**Goal**: First monetization via railiance-as-a-service + resource viability hub. +**Depends on**: Phase 3 (multi-hub pattern proven). + +### T20 — Business model canvas: railiance-as-a-service + +```task +id: CUST-WP-0025-T20 +status: todo +priority: medium +state_hub_task_id: "55db0560-2733-481d-adba-b72c3839ba45" +``` + +Define the offering: +- Target: EU SMEs needing sovereign, GDPR-compliant DevOps infrastructure +- Core: managed k3s cluster + observability + GitOps + backup +- Differentiator: VSM-based organizational architecture, not just infra +- Pricing tiers: self-hosted (open-source), managed, fully operated +- Document as `canon/projects/railiance/business-model-canvas_v0.1.md` + +### T21 — Canon: Bootstrap Protocol document + +```task +id: CUST-WP-0025-T21 +status: todo +priority: medium +state_hub_task_id: "ce54d3fc-140e-49be-a181-779abc434d4e" +``` + +Address FOS blindspot #2 (bootstrapping & initial capital): +- Seed funding strategy and minimum viable budget +- MVP scope definition (what must exist before first customer) +- First 3 mandated roles: Constitutional Steward, Technical Operator, Financial Allocator +- Revenue threshold for role formalization +- Document as `canon/constitution/bootstrap-protocol_v0.1.md` + +### T22 — Create fin-hub repo from hub-core scaffold + +```task +id: CUST-WP-0025-T22 +status: todo +priority: low +state_hub_task_id: "670757d8-305d-4736-9056-e79a150114b1" +``` + +Create `fin-hub` repo with same scaffold pattern as ops-hub. +Register under custodian domain. 
+ +### T23 — Fin-specific models + +```task +id: CUST-WP-0025-T23 +status: todo +priority: low +state_hub_task_id: "8ebffb3f-0dbb-4672-b4e9-928992c41cf4" +``` + +Define SQLAlchemy models for: +- **Budget**: domain, period, allocated, committed, spent +- **Commitment**: type (subscription/contract/salary), amount, cadence, start/end +- **BurnRate**: domain, period, actual_spend, projected_spend +- **RunwayProjection**: current_balance, monthly_burn, months_remaining, alert_threshold +- **TokenSpend**: provider (anthropic/openai), model, tokens_in, tokens_out, cost, session_id + +### T24 — Fin-hub implementation: cost tracking + runway + +```task +id: CUST-WP-0025-T24 +status: todo +priority: low +state_hub_task_id: "405f81d3-dec5-4154-a1b8-a3af344a0cc4" +``` + +Implement: +- Cloud cost ingestion (manual CSV import initially, OpenCost integration later) +- Anthropic API token spend tracking (parse billing exports) +- HostEurope server cost tracking +- Runway calculator with burn-rate projection +- Budget alerts when projected runway drops below threshold + +### T25 — Cross-hub coupling: fin-hub connections + +```task +id: CUST-WP-0025-T25 +status: todo +priority: low +state_hub_task_id: "90a41790-7290-4145-b89f-88bf491d7652" +``` + +Implement FOS §9 cross-hub coupling: +- fin→dev: resource pressure signals (budget alerts surface in dev-hub) +- fin→ops: infrastructure cost attribution (per-service cost view) +- fin→canon: viability alerts (runway below threshold escalates to System 5) + +### T26 — Pricing and packaging: railiance-as-a-service MVP + +```task +id: CUST-WP-0025-T26 +status: todo +priority: low +state_hub_task_id: "e17ef269-e349-44cc-ab14-6c57b43199b1" +``` + +Concrete pricing: +- Define 3 tiers with feature matrix +- Create landing page content +- Define onboarding workflow (customer → provisioned k3s + monitoring) +- Legal: GmbH implications, liability, SLA framework +- First customer acquisition strategy
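
For the runway calculator named in T24, the `RunwayProjection` fields from T23 suggest a very small core computation. The sketch below is an illustration only: it assumes `alert_threshold` is expressed in months of runway and that a non-positive burn means "no finite runway", neither of which the workplan specifies, and it uses a plain dataclass instead of the planned SQLAlchemy model.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class RunwayProjection:
    # Field names follow T23; the dataclass itself is a hypothetical stand-in.
    current_balance: Decimal
    monthly_burn: Decimal
    alert_threshold: Decimal  # assumption: threshold in months of runway

    @property
    def months_remaining(self) -> Optional[Decimal]:
        """Balance divided by burn; None when burn is zero or negative."""
        if self.monthly_burn <= 0:
            return None
        return self.current_balance / self.monthly_burn

    @property
    def alert(self) -> bool:
        """True when the projected runway is below the threshold."""
        months = self.months_remaining
        return months is not None and months < self.alert_threshold
```

`Decimal` is used here so that currency amounts are not subject to binary floating-point rounding; a `True` value of `alert` would be the natural trigger for the fin→canon viability escalation described in T25.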