feat: add FOS/credential standards, big-picture guidance, and CUST-WP-0025 workplan

- canon/standards/credential-management_v0.1.md: single root-of-trust credential hierarchy standard
- canon/standards/federated-organization-standard_v1.0.md: FOS reference architecture (VSM-based)
- wiki/BigPictureGuidance.md: integration guidance for OAS + FOS orthogonal layers
- workplans/CUST-WP-0025-fos-hub-bootstrap.md: 4-phase plan (identity, hub-core extraction, ops-hub, fin-hub)
- state-hub/Makefile: treat exit 2 (warnings-only) as success in check-consistency targets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 23:48:13 +01:00
parent cbad0dc958
commit 0777e5b2f0
5 changed files with 1897 additions and 4 deletions


@@ -0,0 +1,300 @@
---
title: "Credential Management Standard"
version: "0.1"
status: "Draft Standard"
domain: custodian
scope: all-domains
created: "2026-03-20"
---
# Credential Management Standard
**Version:** 0.1
**Status:** Draft Standard
**Scope:** All domains and repositories in the federated organization
---
## 1. Purpose
This standard defines how credentials, secrets, and key material are
managed across all systems — from a developer workstation with no
infrastructure, to a fully operational Kubernetes cluster.
The core principle is a **single root of trust**: one operator keypair
anchors all credential storage and encryption. Every secret can be
traced back to that root. No secret lives outside this hierarchy.
---
## 2. Trust Hierarchy
```
Operator passphrase (human memory only, never stored anywhere)
└── age keypair (~/.config/sops/age/key.txt, one per operator)
    ├── SOPS encryption (GitOps secrets in all repos)
    │   └── secrets/**/*.sops.yaml (encrypted at rest in git)
    ├── Ops bundle (age-encrypted tar, offsite backup)
    │   └── ops-bundle-<date>.tar.age
    │       └── all service secrets at point-in-time
    └── KeePassXC (pre-cluster primary credential store)
        │   └── master password = operator passphrase (or derived)
        ├── Infrastructure credentials
        │   ├── SSH keys (server access)
        │   ├── API tokens (Gitea, HostEurope, Hetzner)
        │   └── Cloud credentials
        ├── Service secrets (per-domain groups)
        │   ├── net-kingdom/privacyidea/
        │   ├── net-kingdom/lldap/
        │   ├── net-kingdom/authelia/
        │   ├── net-kingdom/keycape/
        │   └── railiance/postgres/
        └── Vault root token (in-cluster phase, stored here)
            └── HashiCorp Vault
                └── External Secrets Operator (ESO)
                    └── K8s Secrets → pods
```
---
## 3. Phases
### Phase 0 — Pre-cluster (bootstrap)
**Used when:** No Kubernetes cluster is available. Local development,
initial server provisioning, CI bootstrap.
**Tools:** age keypair + KeePassXC + ops bundle
**Flow:**
1. Generate service secrets with a `gen-secrets.sh` script
2. Copy each secret manually into KeePassXC (under the appropriate group)
3. Encrypt a point-in-time ops bundle: `pack-bundle.sh <secrets-dir> <age-pub-key>`
4. Store the ops bundle offsite (separate physical location from KeePassXC)
5. Shred the plaintext secrets directory: `find secrets/ -type f -exec shred -u {} \;`
6. When deploying to k8s, read each secret from KeePassXC and inject via
`create-secrets.sh` scripts that produce K8s Secrets
**Invariant:** Plaintext secrets MUST NOT persist on disk after being
stored in KeePassXC. The only durable forms are: KeePassXC + ops bundle.
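This invariant can be checked mechanically before leaving the workstation. The following is a minimal sketch, assuming that any file left under the secrets directory without an encrypted suffix counts as plaintext; the function name `find_plaintext_leftovers` and the suffix list are illustrative and not part of this standard.

```python
from pathlib import Path

# Assumption: files with these suffixes are encrypted artefacts and may stay on disk.
ENCRYPTED_SUFFIXES = (".age", ".sops.yaml")


def find_plaintext_leftovers(secrets_dir: str) -> list:
    """Return relative paths of files under secrets_dir that look like plaintext."""
    root = Path(secrets_dir)
    if not root.is_dir():
        # Nothing on disk means nothing left to shred.
        return []
    leftovers = [
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and not p.name.endswith(ENCRYPTED_SUFFIXES)
    ]
    return sorted(leftovers)
```

An empty result after step 5 suggests the shred completed; any returned path should be stored in KeePassXC (if not already) and then shredded.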
---
### Phase 1 — GitOps secrets (SOPS)
**Used when:** Secrets need to live alongside infrastructure code in git.
All repos with infrastructure manifests use this pattern.
**Tools:** SOPS + age
**Configuration (`.sops.yaml` in repo root):**
```yaml
creation_rules:
  - path_regex: secrets/.*$
    age: >-
      <operator-age-public-key>
  - path_regex: .*\.sops\.yaml$
    age: >-
      <operator-age-public-key>
```
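To see which files these rules would cover, the two patterns can be tried against sample paths. This is a rough sketch, assuming SOPS applies the first rule whose `path_regex` matches somewhere in the file path (approximated here with Python's `re.search`); the helper `matching_rule` is hypothetical.

```python
import re
from typing import Optional

# The two patterns from the .sops.yaml above, in rule order.
PATH_REGEXES = [r"secrets/.*$", r".*\.sops\.yaml$"]


def matching_rule(path: str) -> Optional[int]:
    """Return the index of the first pattern that matches path, or None."""
    for index, pattern in enumerate(PATH_REGEXES):
        if re.search(pattern, path):
            return index
    return None
```

Under these assumptions `secrets/db/password.env` falls under the first rule, `deploy/app.sops.yaml` under the second, and `deploy/values.yaml` under neither, so it would not be encrypted by default.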
**Multi-operator:** When a second operator joins, add their age public key
as an additional recipient and re-encrypt all secrets with `sops updatekeys`.
Both keys can decrypt independently — no single point of failure.
**Invariant:** The age private key is NEVER committed to git. The public
key is committed (in `.sops.yaml` and `keys/age.pub`). Encrypted values
in git are safe to store and review.
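A pre-commit style guard can help enforce this invariant. The sketch below assumes age private keys are serialized with the `AGE-SECRET-KEY-1` prefix, which is the form `age-keygen` writes; the function name is illustrative, and wiring it into an actual git hook is left out.

```python
import re

# age-keygen writes private keys as uppercase Bech32 strings with this prefix.
AGE_SECRET_PATTERN = re.compile(r"AGE-SECRET-KEY-1[0-9A-Z]+")


def contains_age_private_key(text: str) -> bool:
    """Return True if text appears to contain an age private key."""
    return AGE_SECRET_PATTERN.search(text) is not None
```

Public keys (`age1...`) do not match, so `.sops.yaml` and `keys/age.pub` pass the check.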
---
### Phase 2 — In-cluster (HashiCorp Vault)
**Used when:** Kubernetes cluster is operational and stable.
**Tools:** HashiCorp Vault + External Secrets Operator (ESO)
**Why ESO over Vault Agent Injector:** ESO produces standard K8s Secrets,
which are compatible with plain Helm charts and do not require pod
annotation changes. Decision D4 (net-kingdom DECISIONS.md).
**Flow:**
1. Bootstrap Vault with the root token stored in KeePassXC
2. Enable Kubernetes auth method (`vault auth enable kubernetes`)
3. Create per-service policies with least-privilege access
4. Migrate each service secret from KeePassXC into Vault
5. Deploy ESO `SecretStore` pointing to Vault
6. Replace `create-secrets.sh` calls with `ExternalSecret` manifests
7. ESO reconciles secrets from Vault into K8s Secrets automatically
**KeePassXC post-cluster:** Remains the source of truth for:
- The Vault root/unseal keys (emergency only)
- Dev/sandbox systems that do not connect to in-cluster Vault
- New secrets before they are migrated into Vault
---
## 4. KeePassXC Group Structure
All service secrets are organized under a standardized group hierarchy:
```
KeePassXC root
├── Infrastructure
│   ├── SSH Keys
│   │   └── <hostname> (private key as attachment, public key as note)
│   ├── API Tokens
│   │   ├── gitea-admin
│   │   ├── hosteurope-api
│   │   └── hetzner-api
│   └── Cloud Credentials
│       └── <provider>
├── net-kingdom
│   ├── privacyidea
│   │   ├── PI_SECRET_KEY
│   │   ├── PI_PEPPER
│   │   ├── PI_DB_PASSWORD
│   │   ├── pi-admin (password + totp-seed)
│   │   ├── trigger-admin (password + API token)
│   │   └── enckey (attachment: enckey file + audit keypair)
│   ├── lldap
│   │   ├── LLDAP_JWT_SECRET
│   │   └── LLDAP_LDAP_USER_PASS
│   ├── authelia
│   │   ├── AUTHELIA_JWT_SECRET
│   │   ├── AUTHELIA_SESSION_SECRET
│   │   ├── AUTHELIA_STORAGE_ENCRYPTION_KEY
│   │   ├── AUTHELIA_OIDC_HMAC_SECRET
│   │   └── AUTHELIA_KEYCAPE_CLIENT_SECRET
│   └── keycape
│       ├── RSA signing key (attachment: private + public PEM)
│       └── PI_ADMIN_TOKEN
├── railiance
│   ├── postgres
│   │   └── PG_ROOT_PASSWORD
│   └── sops-age
│       └── age private key (attachment: key.txt)
└── vault
    ├── root-token
    └── unseal-keys (attachment: unseal-keys.txt, gpg-encrypted)
```
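Entries in this hierarchy are referred to elsewhere by slash-separated paths such as `railiance/sops-age/age private key`. As an illustration, the sketch below flattens a small excerpt of the tree into such paths; the `GROUPS` dictionary and the `entry_paths` helper are assumptions made for the example, not a KeePassXC API.

```python
# A small excerpt of the hierarchy above; list items are entry names.
GROUPS = {
    "net-kingdom": {
        "lldap": ["LLDAP_JWT_SECRET", "LLDAP_LDAP_USER_PASS"],
    },
    "railiance": {
        "postgres": ["PG_ROOT_PASSWORD"],
        "sops-age": ["age private key"],
    },
}


def entry_paths(tree, prefix=""):
    """Yield slash-separated paths for every entry in a nested group tree."""
    if isinstance(tree, list):
        for name in tree:
            yield f"{prefix}{name}"
        return
    for group, subtree in tree.items():
        yield from entry_paths(subtree, f"{prefix}{group}/")
```

A flat list like this can be compared against the entries a deployment expects, to spot secrets that were generated but never filed.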
---
## 5. Age Keypair Management
**One keypair per operator.** The same key is used for:
- SOPS encryption across all repos
- Ops bundle encryption
**Generate:**
```bash
age-keygen -o ~/.config/sops/age/key.txt
# output: Public key: age1...
```
**Add to repos:** Copy the public key into `.sops.yaml` of each repo and
into `keys/age.pub`. Commit both.
**Back up:** The private key file MUST be stored in KeePassXC as an
attachment under `railiance/sops-age/age private key`. The KeePassXC
database is the disaster recovery path for the age private key.
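**Rotation:** If the private key is compromised, generate a new keypair,
replace the old public key with the new one in every `.sops.yaml`, and run
`sops updatekeys` on each encrypted file so the old key is no longer a
recipient. Ciphertext already in git history stays decryptable with the
compromised key, so also rotate the underlying service secrets.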
---
## 6. Ops Bundle
The ops bundle is a point-in-time snapshot of all service secrets,
encrypted with age and stored offsite.
**Create:**
```bash
bash gen-secrets.sh ./secrets # generates all secrets as env files
# ... enter each into KeePassXC ...
bash pack-bundle.sh ./secrets <age-pub-key> # → ops-bundle-<date>.tar.age
find secrets/ -type f -exec shred -u {} \; # shred plaintext
```
**Restore:**
```bash
age -d -i ~/.config/sops/age/key.txt -o secrets.tar ops-bundle-<date>.tar.age
tar xf secrets.tar
# re-run create-secrets.sh scripts from restored env files
```
**Frequency:** Create a new ops bundle:
- Before any major cluster operation (migration, upgrade, rekey)
- After adding or rotating any service secret
- At least once per quarter
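The quarterly rule lends itself to a simple staleness check. This is a sketch under two assumptions that the standard does not state: the `<date>` in the bundle name is ISO formatted (`YYYY-MM-DD`), and a quarter is approximated as 92 days. Both helper names are hypothetical.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Assumption: <date> in the bundle name is ISO formatted (YYYY-MM-DD).
BUNDLE_NAME = re.compile(r"^ops-bundle-(\d{4}-\d{2}-\d{2})\.tar\.age$")
MAX_AGE = timedelta(days=92)  # assumption: roughly one quarter


def bundle_date(filename: str) -> Optional[date]:
    """Parse the creation date from a bundle filename, or None if unparseable."""
    match = BUNDLE_NAME.match(filename)
    if not match:
        return None
    try:
        return date.fromisoformat(match.group(1))
    except ValueError:
        return None


def bundle_is_stale(filename: str, today: date) -> bool:
    """True if the bundle is older than MAX_AGE or its name cannot be parsed."""
    created = bundle_date(filename)
    return created is None or today - created > MAX_AGE
```

Treating an unparseable name as stale is a deliberately conservative choice: it prompts a fresh bundle rather than trusting an archive of unknown age.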
---
## 7. Prohibited Patterns
These are hard violations regardless of context:
| Pattern | Why prohibited |
|---------|----------------|
| Plaintext secrets committed to git | Unrecoverable leak |
| Secrets typed into shell commands (inline env vars, arguments) | Persist in shell history such as ~/.bash_history |
| Sharing secrets via chat, email, or issue trackers | Uncontrolled propagation |
| Using the same password for multiple services | Single-point compromise |
| Storing age private key only on a single machine | Catastrophic loss on disk failure |
| Hardcoded secrets in application code or Helm values | Accidental publishing |
---
## 8. Multi-operator Extension
When a second operator needs access:
1. They generate their own age keypair (`age-keygen`)
2. Share only the **public key** (never the private key)
3. Primary operator adds it to `.sops.yaml` in all repos
4. Primary operator runs `sops updatekeys <file>` on all encrypted files
5. Both operators can now encrypt and decrypt independently
6. Share KeePassXC database via an encrypted channel (never plaintext)
— the other operator opens it with their own master password after import
---
## 9. Vault Migration Checklist
When the cluster is stable enough to operate Vault:
- [ ] Deploy Vault via Helm with HA mode (3 replicas minimum)
- [ ] Store root token and unseal keys in KeePassXC (vault/ group)
- [ ] Enable Kubernetes auth method
- [ ] Create per-service Vault policies (least privilege)
- [ ] Deploy ESO `ClusterSecretStore` pointing to Vault
- [ ] For each service: create `ExternalSecret` manifest, verify K8s Secret
reconciles correctly, then delete the manually-created K8s Secret
- [ ] Verify ESO auto-rotation works (reduce TTL to 1h, confirm rotation)
- [ ] Remove `create-secrets.sh` scripts from deployment runbooks
- [ ] Update this standard to Phase 2 operational status
---
## 10. Summary
| Situation | Tool | Source of truth |
|-----------|------|----------------|
| No cluster, local dev | KeePassXC + create-secrets.sh | KeePassXC |
| GitOps secrets in repo | SOPS + age | Git (ciphertext) |
| Cluster operational | Vault + ESO | Vault (KeePassXC holds root) |
| Disaster recovery | Ops bundle (age) | Offsite encrypted archive |
| Multi-operator | SOPS multi-recipient | Each operator's age keypair |


@@ -0,0 +1,977 @@
FederatedOrganisationStandard
*Building blocks for scalable organizations*
# FederatedOrganisationStandard (FOS)
*A reference architecture standard for viable, scalable organizations composed of autonomous domains, coordinated through hubs and governed through explicit recursion*
**Version:** 1.0
**Status:** Draft Reference Standard
---
# 1. Purpose
The **FederatedOrganisationStandard (FOS)** defines an organizational architecture for building and operating a scalable entity — or a collection of entities — through a **federated system of domains and hubs**.
It is intended for organizations that:
* combine humans and artificial agents
* operate across multiple management domains
* require strong separation of concerns
* want sovereignty, auditability, and rebuildability
* need to scale recursively from projects to companies to foundation-like umbrella structures
The standard introduces the concept of a **federated organization**:
> A viable organization composed of semi-autonomous operational domains, each coordinated through a domain hub, and aligned through shared policy, escalation, and identity structures.
The standard provides:
* a conceptual model
* a VSM framing
* a core hub set for scalable organizations
* separation-of-concerns rules
* a cross-hub coupling model
* a recursion model for long-term organizational growth
---
# 2. Core Concept
## 2.1 Federated Organization
A **federated organization** is an organization in which:
* operational variety is handled locally where possible
* coordination is provided through explicit hubs
* authority is bounded and visible
* domain-specific systems remain autonomous
* global coherence is achieved through policy, escalation, and shared protocols rather than through monolithic control
This is not a flat platform, and it is not a centralized command stack.
It is an architecture in which:
* domains remain responsible for their own reality
* hubs reduce coordination cost
* higher-order governance constrains without micromanaging
* the same pattern can recur across multiple levels of organizational scale
---
## 2.2 Hub
A **hub** is:
> A domain-specific coordination and orientation layer that makes the state, tensions, requests, and responsibilities of a domain visible and actionable without collapsing that domain into centralized authority.
A hub is not primarily a source of truth.
A hub is primarily a **derived coordination surface**.
---
## 2.3 Federation
Within this standard, **federation** means:
* multiple domains
* multiple hubs
* explicit boundaries
* structured coupling
* recursive viability
Federation does not imply weak structure.
It implies **bounded autonomy plus disciplined coordination**.
---
# 3. Why This Standard Exists
Organizations that grow across technical, operational, financial, legal, and strategic concerns tend to fail in one of two ways:
## 3.1 Centralized Mud
Everything is routed through one giant management layer, one dashboard, one database, one leadership abstraction, or one agent layer.
This creates:
* overloaded coordination channels
* mixed time horizons
* authority confusion
* brittle central systems
* low adaptability
---
## 3.2 Fragmented Drift
Each domain builds its own world without shared coupling structures.
This creates:
* invisible tensions
* misaligned incentives
* cross-domain blockers
* duplicated capabilities
* late escalation of risk
---
## 3.3 FOS Response
FOS exists to establish a middle path:
* autonomy where possible
* coordination where necessary
* escalation where required
* identity where non-negotiable
---
# 4. VSM Framing
The **FederatedOrganisationStandard** is explicitly informed by the logic of the **Viable System Model (VSM)**.
It assumes that a viable organization requires distinct but connected functions for:
* operations
* coordination
* internal control
* audit / direct inspection
* intelligence / adaptation
* identity / policy
FOS uses VSM not as a rigid org chart, but as an architectural framing for keeping complexity manageable.
---
## 4.1 VSM Systems in FOS
### System 1 — Operations
These are the units that actually do work.
Examples:
* software development domains
* infrastructure operations
* finance administration
* legal/governance operations
* customer-facing service domains
* product teams
* business units
System 1 units should absorb as much variety locally as they can.
---
### System 2 — Coordination
This is the layer that reduces oscillation and friction across operational units.
In FOS, hubs provide much of this coordination through:
* shared summaries
* request routing
* dependency visibility
* inboxes and message flows
* standard rituals for orientation and handoff
---
### System 3 — Internal Control
This is the layer concerned with:
* current performance
* resource use
* compliance with internal expectations
* operational coherence
In FOS, each domain hub typically includes System 3 functions relevant to its domain.
---
### System 3* — Audit / Direct Inspection
This is the probing, checking, validating function that bypasses polished reporting when necessary.
Examples in FOS:
* consistency checks
* force refresh
* direct probes
* posture validation
* raw-state inspection
* anomaly review
---
### System 4 — Intelligence / Adaptation
This is the outward- and forward-facing function.
It handles:
* future architecture
* emerging risks
* adaptation
* opportunity sensing
* long-term redesign
* environmental shifts
System 4 may be partially implemented within hubs, but should not be collapsed into day-to-day control.
---
### System 5 — Identity / Policy
This is the function that answers:
* who are we
* what must remain true
* what are our non-delegable boundaries
* what may never be optimized away
In FOS, this role is anchored through a **Canon Hub** or equivalent constitutional layer.
---
# 5. Design Principles
## 5.1 Explicit Separation of Concerns
Each hub MUST be domain-specific.
A hub MUST NOT simultaneously serve as:
* development hub
* operations hub
* finance hub
* security hub
* constitutional governance hub
Blending these domains leads to mixed incentives and architectural confusion.
---
## 5.2 Derived-State Preference
A hub SHOULD be a derived coordination system wherever possible.
That means:
* source artefacts remain authoritative elsewhere
* hub data is computed, indexed, summarized, routed, or logged
* deleting and rebuilding the hub should not destroy organizational truth
---
## 5.3 Bounded Authority
Every hub MUST define:
* what it can observe
* what it can derive
* what it can recommend
* what it can route
* what it can decide
* what it must escalate
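One way to make these six declarations checkable is to record them per hub and flag the ones still missing. The following is only a sketch: the `AuthorityDeclaration` class, its field names, and the representation of scopes as lists of strings are assumptions for illustration, not something FOS prescribes.

```python
from dataclasses import dataclass, field

# The six things every hub must define, taken from the list above.
AUTHORITY_LEVELS = ("observe", "derive", "recommend", "route", "decide", "escalate")


@dataclass
class AuthorityDeclaration:
    hub: str
    # Maps an authority level to the scopes declared for it (hypothetical shape).
    scopes: dict = field(default_factory=dict)

    def missing_levels(self) -> list:
        """Return the authority levels this hub has not yet declared."""
        return [level for level in AUTHORITY_LEVELS if level not in self.scopes]

    def is_complete(self) -> bool:
        return not self.missing_levels()
```

A hub whose declaration is incomplete has, by this reading, unbounded authority in the undeclared areas, which is what the rule is meant to prevent.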
---
## 5.4 Recursive Viability
The same organizational pattern should work at multiple levels:
* repo or subsystem
* domain
* operating entity
* umbrella entity
* foundation structure
Each level should be viable in its own right.
---
## 5.5 Informational Coupling Without Structural Fusion
Domains should exchange information through explicit protocols, not by collapsing into one giant shared state model.
This is the core of federation.
---
## 5.6 Sovereignty by Default
The organization should retain operational control over its own coordination systems.
FOS therefore favors:
* local-first systems
* open interfaces
* inspectable stores
* append-only histories
* explicit exports
---
# 6. Core Organizational Primitive: The Hub
## 6.1 Definition
A hub is:
> A domain-specific, bounded coordination layer that exposes the present state, requests, tensions, and responsibilities of a domain in a way that humans and agents can act upon.
---
## 6.2 Minimal Hub Responsibilities
Every hub MUST provide:
* orientation
* coordination
* escalation
* event traceability
* bounded interfaces
* domain summaries
---
## 6.3 Minimal Hub Constraints
Every hub MUST avoid:
* becoming the sole source of truth without justification
* hidden authority
* invisible side effects
* silent irreversible automation
* uncontrolled cross-domain sprawl
---
# 7. Core Hub Set for a Scalable Organization
FOS defines a **core set of hubs** that together support a scalable organization.
Not every organization must implement all of them immediately, but the standard treats them as the canonical target set.
---
## 7.1 Canon Hub
**Role:** Identity, policy, constitutional boundaries
**Dominant VSM Role:** System 5
### Purpose
The Canon Hub defines the stable normative frame of the organization.
It answers:
* what the organization is for
* which roles and mandates exist
* what agents may not do autonomously
* how authority is delegated
* what hard boundaries constrain all lower domains
### Typical Sources
* constitution
* policy documents
* foundational ADRs
* role charters
* mandate definitions
* delegation matrices
### Typical Derived Views
* policy map
* authority map
* escalation destinations
* unresolved governance questions
* conflicts between policies or mandates
### Separation Rule
The Canon Hub MUST remain sparse, stable, and deliberately slower-moving than operational hubs.
---
## 7.2 Dev Hub
**Role:** Software production coordination
**Dominant VSM Roles:** System 2, 3, 3*
### Purpose
The Dev Hub coordinates software design and implementation work across repositories, workstreams, and coding agents.
It answers:
* what are we building
* what is blocked
* what changed
* what capabilities exist
* what should happen next in development
### Typical Sources
* repositories
* workplans
* scope files
* ADRs
* dependency manifests
* capability declarations
### Typical Derived Views
* workstream summaries
* blocker maps
* dependency graphs
* capability catalogs
* decision boards
* development health indicators
### Separation Rule
The Dev Hub MUST NOT become a runtime operations dashboard or security control plane.
---
## 7.3 Ops Hub
**Role:** Runtime operations coordination
**Dominant VSM Roles:** System 2, 3, 3*
### Purpose
The Ops Hub coordinates the running system.
It answers:
* what is running
* what is degraded
* what needs intervention now
* which access paths exist
* where operational risk is accumulating
### Typical Sources
* monitoring systems
* runtime topology
* host or cluster state
* alerts
* operational runbooks
* change records
* backup and certificate metadata
* access bridge definitions
### Typical Derived Views
* now view
* incident board
* resilience posture
* access map
* operational debt view
* capacity risk view
### Separation Rule
The Ops Hub MUST NOT become the owner of infrastructure intent; desired state belongs in infra repositories or equivalent canonical systems.
---
## 7.4 Sec Hub
**Role:** Trust, control, and security posture
**Dominant VSM Roles:** System 3, 3*, and strong coupling to System 5
### Purpose
The Sec Hub governs the trust structure of the organization.
It answers:
* what is trusted
* what is exposed
* which controls are present or missing
* where exceptions are aging
* where access or identity posture is drifting
### Typical Sources
* IAM systems
* audit logs
* policy baselines
* vulnerability data
* exception registers
* certificate and secret metadata
* control definitions
### Typical Derived Views
* control coverage
* exposure map
* exception aging
* privileged-path map
* trust posture summary
### Separation Rule
The Sec Hub MUST remain distinct from the Ops Hub even when tightly coupled to it.
Ops may observe and execute; Sec governs and constrains.
---
## 7.5 Fin Hub
**Role:** Resource viability and allocation
**Dominant VSM Roles:** System 3 and 4
### Purpose
The Fin Hub governs resource viability.
It answers:
* what can we afford
* what is committed
* where burn is rising
* what resource tensions exist across domains
* which initiatives threaten long-term viability
### Typical Sources
* budget artifacts
* accounting exports
* cloud cost data
* resource allocation plans
* obligations and commitments
* investment or reserve tracking
### Typical Derived Views
* runway view
* burn by domain
* committed vs flexible spend
* allocation conflicts
* capital tension alerts
### Separation Rule
The Fin Hub MUST NOT be replaced by ad hoc development or operations heuristics when the organization reaches meaningful scale.
---
## 7.6 Optional Domain Hubs
As the organization grows, additional hubs may appear, such as:
* Legal Hub
* People Hub
* Portfolio Hub
* Research Hub
* Partnership Hub
* Venture Hub
These should only be introduced when the domain is stable and distinct enough to justify its own coordination surface.
---
# 8. Separation of Concerns
## 8.1 Why Separation Matters
Different domains have:
* different time horizons
* different failure modes
* different kinds of authority
* different acceptable risk profiles
* different source artefacts
When these are mixed, the organization loses clarity.
---
## 8.2 Time-Horizon Separation
A useful default reading is:
* Canon Hub: years
* Fin Hub: quarters to years
* Dev Hub: days to months
* Ops Hub: seconds to weeks
* Sec Hub: minutes to quarters, depending on the issue
A hub that mixes radically different horizons will tend toward overload.
---
## 8.3 Responsibility Separation
A practical shorthand:
* **Canon Hub** asks: what must remain true?
* **Dev Hub** asks: what should be built?
* **Ops Hub** asks: what must be kept running?
* **Sec Hub** asks: what must be trusted or contained?
* **Fin Hub** asks: what remains viable?
---
## 8.4 Source-of-Truth Separation
Canonical artefacts should live where they belong:
* code and workplans in repos
* runtime intent in infra repos or control-plane definitions
* trust policy in security/policy artefacts
* financial truth in accounting or finance systems
* constitutional truth in governance canon
Hubs summarize, route, and coordinate across these.
---
# 9. Cross-Hub Coupling Model
## 9.1 Principle
Hubs should be **loosely coupled but informationally rich**.
This means:
* each hub remains structurally separate
* hubs exchange messages, requests, summaries, risks, and escalations
* hubs do not require one giant shared mutable database to cooperate
---
## 9.2 Coupling Modes
FOS recognizes five primary coupling modes.
### 9.2.1 Summary Coupling
One hub provides a compact domain summary to another or to a higher recursion level.
Example:
* Ops Hub reports system readiness to Canon or entity-level governance
* Fin Hub reports budget pressure to leadership
---
### 9.2.2 Request Coupling
One hub requests capability, support, or action from another.
Example:
* Dev Hub requests infrastructure provisioning from Ops Hub
* Ops Hub requests a code fix from Dev Hub
* Sec Hub requests remediation from Ops Hub or Dev Hub
---
### 9.2.3 Risk Coupling
One hub surfaces a risk that another hub must absorb or act on.
Example:
* Sec Hub surfaces identity drift to Ops Hub
* Fin Hub surfaces budget pressure to Dev Hub
* Ops Hub surfaces resilience risk to Canon or entity management
---
### 9.2.4 Escalation Coupling
A domain issue exceeds local authority or capacity and is explicitly escalated.
Example:
* Sec Hub escalates a policy breach to Canon Hub
* Ops Hub escalates a risk involving major spend to Fin Hub
* Dev Hub escalates unresolved architectural conflict to System 4 functions
---
### 9.2.5 Event Coupling
Hubs share relevant append-only events to preserve cross-domain traceability.
Example:
* a deployment event in Dev becomes a change signal in Ops
* an access exception in Sec becomes an operational constraint in Ops
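The five coupling modes can be made explicit in a message envelope so that every cross-hub exchange declares which mode it uses. This is a minimal sketch; the `CrossHubMessage` type and its fields are hypothetical, and FOS does not define a wire format.

```python
from dataclasses import dataclass
from enum import Enum


class CouplingMode(Enum):
    """The five primary coupling modes from section 9.2."""
    SUMMARY = "summary"
    REQUEST = "request"
    RISK = "risk"
    ESCALATION = "escalation"
    EVENT = "event"


@dataclass(frozen=True)
class CrossHubMessage:
    # Frozen so that a message cannot be altered after it has been sent.
    sender: str
    recipient: str
    mode: CouplingMode
    payload: str
```

For example, `CrossHubMessage("dev", "ops", CouplingMode.REQUEST, "provision staging database")` models the Dev-to-Ops provisioning request mentioned under request coupling.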
---
## 9.3 Coupling Rules
Cross-hub coupling MUST obey the following rules:
### Rule 1: No hidden dependencies
If one hub depends on another, the dependency should be visible.
### Rule 2: No authority smuggling
A hub must not use messaging to silently take over another hub's mandate.
### Rule 3: No unbounded chatter
Coupling should reduce, not amplify, organizational noise.
### Rule 4: Summaries upward, detail locally
Higher levels should receive compressed meaning, not raw variety dumps.
### Rule 5: Hard boundaries remain hard
Cross-hub coordination must not bypass constitutional, security, or financial constraints.
---
# 10. Standard Cross-Hub Contract
To support federation, all hubs SHOULD expose a minimal common contract.
## 10.1 Required Generic Functions
### Orientation
* `get_domain_summary()`
### Messaging
* `get_messages()`
* `send_message()`
### Risks and Alerts
* `get_risks()`
* `get_alerts()`
### Coordination
* `request_capability()`
* `record_event()`
### Escalation
* `escalate_issue()`
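The eight functions above could be captured as an abstract interface that each hub implements. The standard names the functions but not their signatures, so every parameter and return type in this sketch is an assumption, as is the helper `missing_functions`.

```python
from abc import ABC, abstractmethod
from typing import Any


class HubContract(ABC):
    """Generic functions from 10.1. Parameters and return types are assumptions."""

    @abstractmethod
    def get_domain_summary(self) -> dict: ...

    @abstractmethod
    def get_messages(self) -> list: ...

    @abstractmethod
    def send_message(self, recipient: str, body: str) -> None: ...

    @abstractmethod
    def get_risks(self) -> list: ...

    @abstractmethod
    def get_alerts(self) -> list: ...

    @abstractmethod
    def request_capability(self, capability: str) -> Any: ...

    @abstractmethod
    def record_event(self, event: dict) -> None: ...

    @abstractmethod
    def escalate_issue(self, issue: str, target: str) -> None: ...


def missing_functions(cls: type) -> set:
    """Names from the contract that cls has not implemented yet."""
    return set(getattr(cls, "__abstractmethods__", frozenset()))
```

Because the base class is abstract, a hub that omits any of the functions cannot be instantiated, and `missing_functions` reports what is still open during incremental rollout.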
---
## 10.2 Domain-Specific Extensions
Beyond the shared contract, each hub SHOULD expose domain-specific functions.
Examples:
* Dev Hub: workstreams, capabilities, decisions
* Ops Hub: services, incidents, runbooks, access paths
* Sec Hub: controls, exposures, exceptions
* Fin Hub: budgets, commitments, runway
* Canon Hub: mandates, policies, delegation rules
---
# 11. Organizational Recursion
## 11.1 Recursion Principle
A federated organization may exist at multiple levels simultaneously.
Examples:
* a repo as a micro-domain
* a hub as a domain-level coordinator
* an operating company as a viable entity
* a foundation or family structure as a higher-order viable system
* a portfolio of ventures as a still higher recursion layer
The same viability logic should hold at each level.
---
## 11.2 Canonical Recursion Levels
### L0 — Subsystem / Repo / Service
A bounded working unit.
### L1 — Domain Hub
Dev, Ops, Sec, Fin, etc.
### L2 — Operating Entity
The company or core operating body.
### L3 — Umbrella Governance Entity
Foundation, holding structure, or multi-venture umbrella.
---
## 11.3 Escalation Across Recursion Levels
Escalation should occur when:
* local authority is insufficient
* risk exceeds local tolerance
* policy conflict cannot be resolved locally
* resource conflict crosses domain boundaries
* strategic redesign is required
The higher recursion level should receive a summary plus context, not raw noise.
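The escalation triggers can be expressed as a small decision helper that also produces the compact reason list a higher level would receive. This is a sketch only: the `IssueAssessment` type and its flag names are invented for the example, one flag per trigger listed above.

```python
from dataclasses import dataclass


@dataclass
class IssueAssessment:
    # One flag per escalation trigger from section 11.3 (names are hypothetical).
    authority_insufficient: bool = False
    risk_exceeds_tolerance: bool = False
    unresolved_policy_conflict: bool = False
    cross_domain_resource_conflict: bool = False
    strategic_redesign_required: bool = False


def escalation_reasons(issue: IssueAssessment) -> list:
    """Return the names of all triggers that apply, in declaration order."""
    return [name for name, flagged in vars(issue).items() if flagged]


def should_escalate(issue: IssueAssessment) -> bool:
    """Escalate as soon as at least one trigger applies."""
    return bool(escalation_reasons(issue))
```

Returning the reasons rather than a bare boolean keeps the escalation a summary plus context instead of raw detail.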
---
# 12. Role Logic in a Federated Organization
FOS does not prescribe a rigid human org chart, but it does define role logic.
## 12.1 Every Domain Needs at Least
* an operational role
* a coordination role
* an analytical or review role
These may be human, artificial, or hybrid.
---
## 12.2 Higher-Order Roles
As the organization grows, meta-roles may emerge, such as:
* chief technical operator
* security steward
* portfolio strategist
* constitutional steward
* financial allocator
These roles should coordinate across hubs, not erase them.
---
# 13. Anti-Patterns
## 13.1 The Mega-Hub
One hub for everything.
This destroys separation of concerns and creates central mud.
---
## 13.2 The Silent Empire
A hub accumulates hidden authority and becomes de facto sovereign.
This undermines explicit governance.
---
## 13.3 Domain Collapse
Development, operations, security, and finance are treated as one blended management problem.
This guarantees confusion.
---
## 13.4 Connector Spaghetti
Cross-hub integration grows ad hoc without standard contracts.
This creates invisible fragility.
---
## 13.5 Upward Variety Flooding
Higher-order governance is flooded with low-level events and raw detail.
This breaks recursion.
---
# 14. Maturity Model
## Level 0 — Unstructured
No explicit hub logic, ad hoc coordination.
## Level 1 — Single-Hub Emergence
One hub exists, but boundaries are still mixed.
## Level 2 — Domain Hub Clarity
At least Dev and Ops are distinct.
## Level 3 — Core Federation
Canon, Dev, Ops, Sec, and Fin are conceptually separated and partially operational.
## Level 4 — Protocolized Coupling
Cross-hub messaging, requests, risks, and escalations follow standard contracts.
## Level 5 — Recursive Federation
Multiple entities or ventures operate under shared constitutional logic while retaining local autonomy.
---
# 15. Minimal Reference Architecture
A minimal scalable FOS architecture contains:
* one **Canon Hub**
* one **Dev Hub**
* one **Ops Hub**
* one **Sec Hub**
* one **Fin Hub**
* one **common cross-hub protocol**
* one **shared event and escalation vocabulary**
* one **explicit recursion model**
Not all must be implemented at once, but the organization should know where it is heading.
---
# 16. Key Insight
> A scalable organization is not built by centralizing everything.
> It is built by creating viable domains with clean boundaries, then coupling them through explicit hubs, shared protocols, and constitutional constraints.
---
# 17. Closing Statement
The **FederatedOrganisationStandard** defines an organization as a federation of viable domains rather than as a monolithic machine.
Its core commitments are:
* autonomy with accountability
* coordination without collapse
* policy without micromanagement
* recursion without chaos
* visibility without loss of sovereignty
It provides a path by which a single evolving project can grow into a multi-domain, multi-entity, foundation-compatible structure without losing clarity of identity or operational coherence.


@@ -235,26 +235,32 @@ validate-adr:
 	uv run python scripts/validate_repo_adr.py "$(REPO)" $(if $(DOMAIN),--domain "$(DOMAIN)",)
 ## Check a single repo for ADR-001 consistency: make check-consistency REPO=the-custodian [REPO_PATH=/override]
+## Exit 0 = clean, exit 2 = warnings only (treated as success), exit 1 = failures
 check-consistency:
 	@test -n "$(REPO)" || (echo "ERROR: REPO is required. Usage: make check-consistency REPO=<slug>"; exit 1)
 	uv run python scripts/consistency_check.py --repo "$(REPO)" \
 		$(if $(API_BASE),--api-base "$(API_BASE)",) \
-		$(if $(REPO_PATH),--repo-path "$(REPO_PATH)",)
+		$(if $(REPO_PATH),--repo-path "$(REPO_PATH)",); \
+	e=$$?; [ $$e -eq 2 ] && exit 0 || exit $$e
 ## Check and auto-fix a single repo: make fix-consistency REPO=the-custodian [REPO_PATH=/override]
+## Exit 0 = clean, exit 2 = warnings only (treated as success), exit 1 = failures
 fix-consistency:
 	@test -n "$(REPO)" || (echo "ERROR: REPO is required. Usage: make fix-consistency REPO=<slug>"; exit 1)
 	uv run python scripts/consistency_check.py --repo "$(REPO)" --fix \
 		$(if $(API_BASE),--api-base "$(API_BASE)",) \
-		$(if $(REPO_PATH),--repo-path "$(REPO_PATH)",)
+		$(if $(REPO_PATH),--repo-path "$(REPO_PATH)",); \
+	e=$$?; [ $$e -eq 2 ] && exit 0 || exit $$e
 ## Check all registered repos for ADR-001 consistency
 check-consistency-all:
-	uv run python scripts/consistency_check.py --all $(if $(API_BASE),--api-base "$(API_BASE)",)
+	uv run python scripts/consistency_check.py --all $(if $(API_BASE),--api-base "$(API_BASE)",); \
+	e=$$?; [ $$e -eq 2 ] && exit 0 || exit $$e
 ## Check and auto-fix all registered repos
 fix-consistency-all:
-	uv run python scripts/consistency_check.py --all --fix $(if $(API_BASE),--api-base "$(API_BASE)",)
+	uv run python scripts/consistency_check.py --all --fix $(if $(API_BASE),--api-base "$(API_BASE)",); \
+	e=$$?; [ $$e -eq 2 ] && exit 0 || exit $$e
 ## Cancel open tasks belonging to completed/archived workstreams.
 ## Safe to run at any time; also suitable for a daily cron job.

wiki/BigPictureGuidance.md Normal file

@@ -0,0 +1,118 @@
We will set up a federated autonomous organization and IT infrastructure based on two VSM-based
standards, following the guidance below on how to integrate them and address conflicts
and blind spots.
This document is the current Big Picture Guidance for the Custodian. It should inform how time,
money, and tokens are prioritized and spent while bootstrapping the infrastructure efficiently.
**Integration Guidance: Orthogonal Architecture Standard (OAS) + Federated Organisation Standard (FOS)**
Note: The standards are available under canon/standards/
Both standards are **explicitly built on the same foundation**: Stafford Beer's Viable System Model (VSM). This is not a coincidence; it is the single strongest reason they integrate cleanly. OAS applies VSM to **compute systems** (Kubernetes/cloud-native infra), while FOS applies the identical VSM logic to **organizational structure** (including humans + AI agents). Together they form a complete “cybernetic stack”: FOS gives you the viable *organization*, OAS gives you the viable *infrastructure* that the organization actually runs on.
### 1. How to Integrate Them (Recommended Architecture)
Treat the two standards as **orthogonal layers of one recursive system**:
- **FOS = System-of-Systems layer** (organizational recursion L0–L3)
- **OAS = Compute substrate layer** (technical recursion inside every hub and every domain)
**Core mapping (one-to-one VSM alignment)**
| VSM Role | FOS Hub(s) | OAS Dimension(s) Used to Realize It | Concrete Implementation Pattern |
|----------------|--------------------------------|------------------------------------------------------|---------------------------------|
| System 5 | Canon Hub | Plane P2 + Capability C1 + Quality Q7 + Intelligence I5 | GitOps repo + policy-as-code + AI agents that *must* route through control plane |
| System 4 | Fin Hub + parts of Dev Hub | Intelligence I4 + Capability C4 + Stack S4 | Adaptive autoscaling + AI config assistants + cost-optimization loops |
| System 3 / 3* | Sec Hub + Ops Hub | Plane P2/P3 + Quality (all Q) + Stack S2/S3 | Kubernetes control plane + policy engines + observability + audit probes |
| System 2 | Dev Hub + Ops Hub cross-talk | Logic L3 + Plane P2 + Relation “governs”/“observes” | Workflow engines + GitOps controllers + cross-hub protocol |
| System 1 | All operational domains + workloads | Stack S5 + Logic L2/L4 + Capability C3/C5 | Actual pods/services/APIs that deliver value |
**Practical bootstrap sequence (start small, stay viable at every level)**
1. **Day 0–30: Canon Hub first**
Create a single sparse Git repository (or lightweight Notion/Obsidian + Git sync). This becomes your System 5.
Declare: mandates, delegation matrix, non-delegable boundaries, and the rule “*All AI actions MUST route through the control plane*” (directly from OAS I5 + FOS 5.6).
2. **Day 30–60: Seed the four core hubs as derived-state services**
- Dev Hub → Backstage or custom portal on top of your repos + ADRs + capability catalog (OAS Logic L1 + Capability C1–C3).
- Ops Hub → ArgoCD + Prometheus + custom “now view” dashboard (OAS Stack S1–S3 + Plane P2).
- Sec Hub → OPA/Gatekeeper + Vault + Trivy + exception register (OAS Quality Q1 + Plane P2).
- Fin Hub → OpenCost + Git-based budget manifests + runway calculator (OAS Quality Q6 + Intelligence I4).
All four hubs expose the **minimal cross-hub contract** from FOS §10 (get_domain_summary, request_capability, escalate_issue, etc.). Implement it once as a small Kubernetes operator or simple REST + NATS/event bus.
3. **Ongoing: Make every hub itself a recursive OAS system**
Each hub runs on its own namespace/cluster slice. Apply the full OAS dimensions inside it:
- Its workloads = OAS Stack S5
- Its internal control = OAS Plane P2
- Its AI agents = OAS Intelligence I5 (always governed)
This satisfies FOS recursion principle automatically.
4. **AI Agent Integration (the autonomous part)**
- All agents live in the **Intelligence dimension** of OAS.
- They are treated as **System 1 units** inside the relevant FOS domain (e.g., coding agents in Dev Hub).
- They *never* bypass the control plane (OAS P3 + FOS 10.1).
- Canon Hub policies (e.g., “no agent may spend >€500 without escalation”) are enforced by OAS Quality Q7 + Sec Hub.
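The example spending policy above can be sketched as a control-plane gate. This is only an illustration: `AgentAction`, `evaluate_action`, and the `"allow"`/`"escalate"` verdicts are hypothetical names, and a real deployment would presumably express the rule in the policy engine (e.g. OPA) rather than in application code.

```python
from dataclasses import dataclass

# Limit taken from the example policy in the text; everything else is hypothetical.
SPEND_LIMIT_EUR = 500.0


@dataclass(frozen=True)
class AgentAction:
    """A proposed agent action as it would arrive at the control plane."""
    agent_id: str
    description: str
    cost_eur: float


def evaluate_action(action: AgentAction) -> str:
    """Return "escalate" when the action exceeds the spend limit, else "allow"."""
    if action.cost_eur > SPEND_LIMIT_EUR:
        return "escalate"
    return "allow"


small = AgentAction("coding-agent-1", "buy API credits", 120.0)
large = AgentAction("coding-agent-1", "reserve GPU capacity", 900.0)
print(evaluate_action(small), evaluate_action(large))
```

The point of the sketch is the routing: the agent never executes the spend itself, it submits an action and acts only on the verdict.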
### 2. Conflicts to Mitigate (there are only two real ones)
**Conflict 1: Centralised Control Plane vs. Derived-State Hubs**
OAS insists “all changes go through control plane”.
FOS insists “hubs must remain derived, rebuildable, not sole source of truth”.
**Mitigation (already built into both standards):**
- Make the control plane itself **derived** (GitOps + reconciliation loops).
- Source of truth = canonical Git repos + policy documents.
- Hubs only index, summarise, and route.
- Delete/rebuild any hub → it re-derives everything from source (FOS 5.2 + OAS Plane P2).
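A minimal sketch of the "derived, rebuildable" property, assuming the source of truth can be read as a flat list of records. The record shape, the domain names, and `derive_hub_index` are illustrative assumptions, not part of either standard.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical canonical records, standing in for what lives in the Git repos.
SOURCE_OF_TRUTH = [
    {"domain": "custodian", "repo": "the-custodian"},
    {"domain": "railiance", "repo": "railiance-platform"},
    {"domain": "railiance", "repo": "railiance-cluster"},
]


def derive_hub_index(records: List[dict]) -> Dict[str, List[str]]:
    """Build the hub's view (domain -> sorted repos) purely from source records."""
    index = defaultdict(list)
    for record in records:
        index[record["domain"]].append(record["repo"])
    return {domain: sorted(repos) for domain, repos in index.items()}


hub_state = derive_hub_index(SOURCE_OF_TRUTH)
hub_state = None  # simulate deleting the hub
rebuilt = derive_hub_index(SOURCE_OF_TRUTH)  # nothing is lost: the hub held no unique state
```

Because the function is pure, rebuilding always yields the same index for the same source, which is the property the mitigation relies on.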
**Conflict 2: Time-horizon mixing**
Ops Hub wants seconds-to-weeks view; Fin Hub wants quarters-to-years; Canon wants “forever”.
**Mitigation:**
- Enforce FOS §8.2 time-horizon separation at the data model level (different retention, different dashboards, different escalation SLAs).
- Use OAS Quality Q7 governance policies to forbid a single dashboard that mixes horizons.
No other structural conflicts exist — the standards were clearly designed to interlock.
### 3. Blindspots to Address (these are the real gaps)
1. **Legal / Regulatory Reality (FOS mentions Legal Hub as optional; make it mandatory in the EU)**
You are in Germany. Add a lightweight Legal Hub (or at least a Canon Hub section) for GDPR (DSGVO), the AI Act, GmbH law, tax, etc. OAS does not mention compliance jurisdiction, so you must layer it on top of Quality Q1 + Canon Hub.
2. **Bootstrapping & Initial Capital**
Neither standard tells you how to fund the first €10k or hire the first human.
→ Create a one-time “Bootstrap Protocol” document in Canon Hub that defines seed funding, MVP scope, and first three mandated roles (Constitutional Steward + Technical Operator + Financial Allocator).
3. **Human–AI Handover & Psychological Safety**
FOS talks about “humans and artificial agents” but gives no guidance on when a human must override an agent or how to handle agent mistakes that affect people.
→ Add explicit “Human Override” policy in Canon Hub + OAS Intelligence I5 rule: every agent action must be auditable and reversible by a human in <5 min.
4. **External Interfaces & Customers**
Both standards are inward-focused.
→ Explicitly model “Customer Capability” in OAS Capability C5 and expose it through a public-facing API gateway (OAS Stack S5) that is still governed by the same control plane.
5. **Exit / Dissolution / Succession**
No standard covers “what happens if the founder disappears or the org needs to be wound down”.
→ Canon Hub must contain a “Sunset Clause” and delegation-of-dissolution rules.
6. **Concrete Tooling & Cost Control**
The standards are abstract. A practical minimal stack that satisfies both at once (all open-source, EU-hostable):
- Kubernetes (k3s or Talos) + ArgoCD + Crossplane (for OAS Stack)
- Backstage (Dev Hub)
- Grafana + Loki + Tempo + OpenCost (Ops + Fin)
- OPA + Kyverno + Vault (Sec)
- NATS/JetStream or Temporal for cross-hub protocol
- Ollama / local LLM agents routed through control plane (Intelligence)
### Recommended Next Steps (30-day plan)
Week 1: Create Canon Hub repo + write the 10 non-delegable policies (including “AI must go through control plane”).
Week 2: Spin up minimal Kubernetes + ArgoCD + the four hubs as simple operators/portals.
Week 3: Implement the cross-hub contract (5 generic functions).
Week 4: Seed first operational workload + first AI agent (coding assistant) and prove it cannot bypass governance.
If you follow this integration path you will have a **genuinely autonomous, viable, auditable, rebuildable organization + infrastructure** that grows recursively without collapsing into central mud or fragmented drift — exactly what both standards promise.
Both documents are still “Draft Standard” (v1.0). Treat them as living artefacts: version them inside the Canon Hub and evolve them together as your system learns. That is the ultimate recursive move.


@@ -0,0 +1,492 @@
---
id: CUST-WP-0025
type: workplan
title: "FOS Hub Bootstrap — Identity, Hub Extraction, Ops Hub, Fin Hub"
domain: custodian
repo: the-custodian
status: active
owner: custodian
topic_slug: custodian
created: "2026-03-20"
updated: "2026-03-20"
state_hub_workstream_id: "293a74fe-a85a-4ad6-8933-23d52a72fe8b"
---
# FOS Hub Bootstrap — Identity, Hub Extraction, Ops Hub, Fin Hub
## Goal
Progress the Custodian from FOS maturity Level 1 (Single-Hub Emergence) toward
Level 3 (Core Federation) by:
1. Finalizing shared identity infrastructure (NetKingdom SSO)
2. Extracting a generic reusable hub-core package from state-hub
3. Renaming state-hub to dev-hub and transitioning all repos
4. Creating the ops-hub for runtime operations coordination
5. Building the fin-hub with railiance-as-a-service as first monetization path
## Context
The state-hub has matured through 24 completed workplans (62 workstreams, 573 tasks)
but remains a monolithic single hub mixing dev-coordination, governance, and generic
infrastructure. Per FOS §13.1, this risks becoming the "Mega-Hub" anti-pattern.
Two standards govern the architecture:
- **FOS** (Federated Organisation Standard): organizational recursion via domain hubs
- **OAS** (Orthogonal Architecture Standard): compute substrate via 6 dimensions
Together they form a complete cybernetic stack: FOS gives the viable organization,
OAS gives the viable infrastructure.
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Hub-core packaging | Separate pip-installable package | Clean separation, versioned independently, each hub depends via uv |
| Phase sequencing | Parallel start (Phase 1 + 2) | Identity and extraction run concurrently; auth bolted on later |
| Ops Hub location | New standalone repo | FOS separation principle — each hub independently deployable |
| First monetization | Railiance-as-a-service | Package the OAS infra stack as a managed service or consultancy offering for EU SMEs |
## Phase 1 — Identity Infrastructure
**Goal**: Finalized user-id infrastructure so all future hubs share one SSO plane.
**Repos**: net-kingdom, railiance-cluster, railiance-platform
**Runs in parallel with Phase 2.**
### T01 — Complete NK-WP-0001: Keycloak + privacyIDEA on k3s
```task
id: CUST-WP-0025-T01
status: todo
priority: high
state_hub_task_id: "f55078b6-7fa3-49ab-be30-37db622d64c9"
```
Complete the SSO/MFA platform deployment. Keycloak as OIDC provider with
privacyIDEA for MFA, running on the k3s cluster. This is the identity
foundation for all hubs and services.
Cross-reference: net-kingdom NK-WP-0001.
### T02 — Complete NK-WP-0002: Local identity bootstrap
```task
id: CUST-WP-0025-T02
status: todo
priority: high
state_hub_task_id: "0d7792f7-5695-4e1a-9726-b9661d5e7108"
```
Implement a lightweight file-based OIDC server for dev/sandbox/bootstrap
scenarios where the full Keycloak cluster is unavailable. Enables local
development of hub services without cluster dependency.
Cross-reference: net-kingdom NK-WP-0002.
### T03 — IAM Profile integration test
```task
id: CUST-WP-0025-T03
status: todo
priority: medium
state_hub_task_id: "e9894ac9-add3-45a6-9893-ea67c6e5e260"
```
Prove a FastAPI service can authenticate via NetKingdom OIDC end-to-end.
Write a minimal test service + integration test that:
- Obtains a token via OIDC/PKCE flow
- Calls a protected endpoint
- Validates token claims (sub, roles, expiry)
This test becomes the template for hub-core auth middleware.
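A rough sketch of the claim-validation step only, assuming the token signature has already been verified and the payload decoded into a dict. The top-level `roles` claim, the `hub-user` role name, and `validate_claims` are assumptions for illustration; the actual claim layout is to be fixed by the IAM Profile in T04.

```python
import time
from typing import List, Optional

REQUIRED_ROLE = "hub-user"  # hypothetical role name


def validate_claims(claims: dict,
                    required_role: str = REQUIRED_ROLE,
                    now: Optional[float] = None) -> List[str]:
    """Check sub, roles, and expiry; an empty result means the claims pass."""
    now = time.time() if now is None else now
    problems = []
    if not claims.get("sub"):
        problems.append("missing sub")
    if required_role not in claims.get("roles", []):
        problems.append(f"missing role {required_role}")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)):
        problems.append("missing exp")
    elif exp <= now:
        problems.append("token expired")
    return problems
```

Returning a list of problems rather than a boolean keeps integration-test failures readable; a middleware version would likely map a non-empty list to a 401/403 response.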
### T04 — Canon standard: IAM Profile specification
```task
id: CUST-WP-0025-T04
status: todo
priority: medium
state_hub_task_id: "69acc880-394b-478a-94f0-476c9cbc1bc6"
```
Document the OIDC contract as `canon/standards/iam-profile_v0.1.md`:
- Discovery endpoint structure
- Required claims and scopes
- Token lifecycle (access + refresh)
- Hub-to-hub service account pattern
- Human override / emergency access
## Phase 2 — Hub Extraction & Dev Hub Rename
**Goal**: Extract generic hub-core package; rename state-hub to dev-hub.
**Repo**: the-custodian (extraction), hub-core (new repo)
**Runs in parallel with Phase 1.**
### Extraction Boundary
**Generic hub-core (~17 MCP tools, ~6 models, ~6 routers):**
- Models: Domain, AgentMessage, CapabilityCatalog, CapabilityRequest, ManagedRepo, TPSC*, ProgressEvent (generic event_types)
- Routers: domains, repos, messages, capability_requests, tpsc, policy
- MCP tools: orientation, messaging, capability routing, repo management, TPSC/GDPR, DoI
**Dev-hub-specific (~51 MCP tools, ~12 models):**
- Topics, workstreams, tasks, decisions, dependencies, EP/TD, contributions, SBOM, goals, DoI cache, kaizen agents, consistency checker
### T05 — Create hub-core package
```task
id: CUST-WP-0025-T05
status: todo
priority: high
state_hub_task_id: "04bf480c-8847-4a89-a4f2-e7c5fc51088d"
```
Create `hub-core` as a standalone repo with `pyproject.toml` (uv-managed).
Extract from state-hub:
- Generic SQLAlchemy models (Domain, AgentMessage, CapabilityCatalog, CapabilityRequest, ManagedRepo, TPSC*, ProgressEvent)
- Generic Pydantic schemas
- Generic FastAPI routers (domains, repos, messages, capability_requests, tpsc, policy)
- Alembic migration templates for core schema
- Shared utilities (slug resolution, pagination, trailing-slash normalization)
### T06 — Hub-core FastMCP base server
```task
id: CUST-WP-0025-T06
status: todo
priority: high
state_hub_task_id: "6b49d94a-b1ea-4507-a8a3-e27c1a918491"
```
Add a base MCP server class to hub-core that provides the ~17 generic tools:
- Orientation: get_state_summary, get_domain_summary, list_domains
- Messaging: send_message, get_messages, mark_message_read, reply_to_message
- Capability routing: register_capability, list_capabilities, request_capability, accept_capability_request, update_capability_request_status, list_capability_requests, get_capability_request
- Repo management: register_repo, update_repo_path, list_domain_repos
- TPSC/GDPR: register_service, list_services, ingest_tpsc_tool, get_gdpr_report
- DoI: check_repo_doi, get_doi_summary
Domain-specific hubs inherit and add their own tools.
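The inheritance pattern could look roughly like this. It is a plain-Python stand-in, not the FastMCP API: `BaseHubServer`, its `tool` registry, and the placeholder return values are all hypothetical.

```python
from typing import Callable, Dict


class BaseHubServer:
    """Hypothetical hub-core base: registers the generic tools once."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tools: Dict[str, Callable[..., object]] = {}
        self.register_tools()

    def tool(self, fn: Callable[..., object]) -> Callable[..., object]:
        # Register a callable under its own name.
        self.tools[fn.__name__] = fn
        return fn

    def register_tools(self) -> None:
        self.tool(self.list_domains)

    def list_domains(self) -> list:
        return ["custodian"]  # placeholder data


class OpsHubServer(BaseHubServer):
    """A domain-specific hub keeps the generic tools and adds its own."""

    def register_tools(self) -> None:
        super().register_tools()
        self.tool(self.get_cluster_health)

    def get_cluster_health(self) -> dict:
        return {"status": "unknown"}  # placeholder data


ops = OpsHubServer("ops-hub")
print(sorted(ops.tools))
```

Overriding `register_tools` and calling `super()` first is one simple way to guarantee every hub exposes the shared contract before adding domain tools.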
### T07 — FOS §10 risk and alert tools
```task
id: CUST-WP-0025-T07
status: todo
priority: medium
state_hub_task_id: "5a54af24-f7cb-451f-874f-66bd6979ab07"
```
Add `get_risks()` and `get_alerts()` to hub-core, formalizing existing
ProgressEvent patterns. Define canonical event_type values:
- `risk_surfaced`, `risk_mitigated`, `risk_escalated`
- `alert_raised`, `alert_acknowledged`, `alert_resolved`
This completes the FOS §10 cross-hub contract.
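A minimal sketch of the two tools as filters over progress events. The `ProgressEvent` dataclass here is a simplified stand-in for the real model, and the function signatures are assumptions; only the six event_type strings come from the task.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Canonical event_type values from this task.
RISK_EVENT_TYPES = {"risk_surfaced", "risk_mitigated", "risk_escalated"}
ALERT_EVENT_TYPES = {"alert_raised", "alert_acknowledged", "alert_resolved"}


@dataclass(frozen=True)
class ProgressEvent:
    """Simplified stand-in for the hub-core ProgressEvent model."""
    event_type: str
    summary: str


def get_risks(events: Iterable[ProgressEvent]) -> List[ProgressEvent]:
    """Return only the events whose type is in the risk vocabulary."""
    return [e for e in events if e.event_type in RISK_EVENT_TYPES]


def get_alerts(events: Iterable[ProgressEvent]) -> List[ProgressEvent]:
    """Return only the events whose type is in the alert vocabulary."""
    return [e for e in events if e.event_type in ALERT_EVENT_TYPES]


events = [
    ProgressEvent("risk_surfaced", "single operator keypair"),
    ProgressEvent("alert_raised", "certificate expires in 7 days"),
    ProgressEvent("task_completed", "unrelated event"),
]
```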
### T08 — Refactor state-hub to import from hub-core
```task
id: CUST-WP-0025-T08
status: todo
priority: high
state_hub_task_id: "daf1d8ac-b55a-4692-b359-2671ddf6fc8a"
```
Refactor the state-hub codebase:
- Replace generic models/routers/schemas with imports from hub-core
- Keep dev-specific code (topics, workstreams, tasks, decisions, etc.) in state-hub
- Ensure all existing tests pass with the new import structure
- Update pyproject.toml to depend on hub-core
### T09 — Rename MCP server state-hub to dev-hub
```task
id: CUST-WP-0025-T09
status: todo
priority: high
state_hub_task_id: "2148a804-7d6a-4e26-b1a8-08da24929c88"
```
Rename across all integration points:
- `state-hub/mcp_server/server.py`: name="state-hub" → "dev-hub"
- `~/.claude/CLAUDE.md`: 3 locations (registration commands, references)
- `state-hub/scripts/register_project.sh`: validation checks
- `state-hub/scripts/patch_mcp_cwd.py`: config checks
- `state-hub/custodian_cli.py`: config checks
- `state-hub/scripts/project_rules/session-protocol.template`: template text
- `state-hub/api/main.py`: service metadata response
### T10 — MCP config migration script
```task
id: CUST-WP-0025-T10
status: todo
priority: medium
state_hub_task_id: "5953f129-089d-4d90-bbe5-f86da4eac1bf"
```
Create `state-hub/scripts/migrate_mcp_config.py` that:
- Reads `~/.claude.json`
- Renames `mcpServers["state-hub"]` to `mcpServers["dev-hub"]`
- Preserves all other settings
- Backs up original file before writing
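One possible shape for the script, assuming `mcpServers` is a top-level key of the JSON file. The `.bak` suffix, the function name, and the refusal to overwrite an existing `dev-hub` entry are choices made for this sketch, not requirements from the task.

```python
import json
import shutil
from pathlib import Path


def migrate_mcp_config(config_path: Path,
                       old: str = "state-hub",
                       new: str = "dev-hub") -> bool:
    """Rename mcpServers[old] to mcpServers[new]; return True if the file changed."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.get("mcpServers", {})
    if old not in servers:
        return False  # nothing to migrate, file left untouched
    if new in servers:
        raise ValueError(f"refusing to overwrite existing {new!r} entry")
    # Back up the original file before writing anything.
    backup = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup)
    # Rebuild the mapping so the renamed entry keeps its position.
    config["mcpServers"] = {(new if key == old else key): value
                            for key, value in servers.items()}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Typical use would be `migrate_mcp_config(Path.home() / ".claude.json")`; running it a second time is a no-op because the old key is gone.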
### T11 — Regenerate domain repo rule files
```task
id: CUST-WP-0025-T11
status: todo
priority: medium
state_hub_task_id: "7b41766b-f97f-4e9f-9f3c-c0937edb355f"
```
After template update, regenerate `.claude/rules/session-protocol.md` for
all registered domain repos:
- railiance-infra, railiance-cluster, railiance-platform
- railiance-enablement, railiance-apps
- net-kingdom, markitect, coulomb.social
- personhood, foerster-capabilities
### T12 — Full test suite and consistency check
```task
id: CUST-WP-0025-T12
status: todo
priority: high
state_hub_task_id: "e55ae544-3cea-485e-80d5-a9696ef97b96"
```
Gate: all of the following must pass before Phase 2 is considered complete:
- `cd state-hub && make test` — full test suite
- `make fix-consistency REPO=the-custodian` — workplan ↔ DB sync
- `make check-consistency-all` — all registered repos
- Manual smoke test: start dev-hub MCP server, run get_domain_summary from a domain repo
## Phase 3 — Ops Hub
**Goal**: Runtime operations coordination per FOS §7.3.
**Depends on**: Phase 2 (hub-core available), Phase 1 (identity for service auth).
**Repo**: ops-hub (new standalone repo, registered under custodian domain)
### T13 — Create ops-hub repo from hub-core scaffold
```task
id: CUST-WP-0025-T13
status: todo
priority: medium
state_hub_task_id: "2c6d1429-a67a-4f66-84d1-cb32ffdb890f"
```
Create `ops-hub` repo with:
- pyproject.toml depending on hub-core
- FastAPI app factory inheriting hub-core base
- MCP server extending hub-core base server
- Alembic setup with hub-core core migrations + ops-specific
- Register as managed repo under custodian domain
### T14 — Ops-specific models
```task
id: CUST-WP-0025-T14
status: todo
priority: medium
state_hub_task_id: "0e811e9b-23a5-49f9-979e-cd1c5dcd937f"
```
Define SQLAlchemy models for:
- **Service**: name, namespace, health_status, last_seen, endpoints
- **Incident**: severity, status (open/investigating/mitigated/resolved), timeline
- **Runbook**: service_id, trigger_conditions, steps, last_executed
- **AccessPath**: type (ssh/k8s/http), target, auth_method, status
- **OperationalDebt**: category, severity, location, owner
- **ChangeRecord**: what changed, when, by whom, rollback_path
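As an illustration of the Incident lifecycle, here is a sketch using a plain dataclass in place of a SQLAlchemy model. The four status values come from the list above; the allowed transitions and the `transition` method are assumptions for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# Assumed transition rules; the workplan only names the four statuses.
ALLOWED_TRANSITIONS = {
    "open": {"investigating", "resolved"},
    "investigating": {"mitigated", "resolved"},
    "mitigated": {"investigating", "resolved"},
    "resolved": set(),
}


@dataclass
class Incident:
    severity: str
    status: str = "open"
    # Each timeline entry records when a status was entered.
    timeline: List[Tuple[datetime, str]] = field(default_factory=list)

    def transition(self, new_status: str) -> None:
        """Move to new_status if allowed, appending the change to the timeline."""
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status
        self.timeline.append((datetime.now(timezone.utc), new_status))
```

Keeping the transition table as data makes it easy to reuse in the `update_incident` and `resolve_incident` tools planned in T15.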
### T15 — Ops-specific MCP tools
```task
id: CUST-WP-0025-T15
status: todo
priority: medium
state_hub_task_id: "3fdd1f61-4c8e-4614-898b-df7a9aa4a514"
```
Implement ops-domain MCP tools:
- Service registry: register_service, list_services, get_service_health
- Health probes: probe_service, get_cluster_health, get_storage_health
- Incident lifecycle: create_incident, update_incident, resolve_incident
- Runbook: get_runbook, execute_runbook_step
- Access: list_access_paths, check_access_path
### T16 — Railiance infrastructure integration
```task
id: CUST-WP-0025-T16
status: todo
priority: medium
state_hub_task_id: "702849c5-b253-4ede-afa7-0ab4f81e49a5"
```
Connect ops-hub to railiance infrastructure observability:
- k3s cluster health via kubectl/API
- Longhorn storage status and replication state
- Certificate expiry tracking (cert-manager)
- Backup status (S2 integrated backup)
- SSH tunnel health (ops-bridge)
### T17 — Cross-hub protocol: ops-hub to dev-hub
```task
id: CUST-WP-0025-T17
status: todo
priority: medium
state_hub_task_id: "b99a3ed8-440b-4e28-88f5-495de7276f66"
```
Implement FOS §9.2.5 event coupling:
- Deployment events in dev-hub → change signals in ops-hub
- Incident events in ops-hub → blocker signals in dev-hub
- Shared event vocabulary (canonical event_types)
- HTTP-based event forwarding (keep it simple; upgrade to NATS later if needed)
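The coupling can be sketched as a small translation table applied before forwarding. The event and signal names (`deployment_completed`, `change_signal`, and so on) and the envelope layout are hypothetical; the shared vocabulary itself is a deliverable of this task. HTTP delivery is deliberately left out.

```python
from typing import Optional

# Hypothetical vocabulary: (source hub, event_type) -> (target hub, signal type).
COUPLING = {
    ("dev-hub", "deployment_completed"): ("ops-hub", "change_signal"),
    ("ops-hub", "incident_opened"): ("dev-hub", "blocker_signal"),
}


def translate_event(source_hub: str, event_type: str, payload: dict) -> Optional[dict]:
    """Map a source event to the envelope to forward, or None if it is not coupled."""
    target = COUPLING.get((source_hub, event_type))
    if target is None:
        return None
    target_hub, signal_type = target
    return {
        "target_hub": target_hub,
        "event_type": signal_type,
        "origin": {"hub": source_hub, "event_type": event_type},
        "payload": payload,
    }


signal = translate_event("dev-hub", "deployment_completed", {"repo": "ops-hub"})
```

Carrying the `origin` in the envelope lets the receiving hub trace a signal back to its source event without sharing a database.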
### T18 — Ops Hub "now view" dashboard
```task
id: CUST-WP-0025-T18
status: todo
priority: low
state_hub_task_id: "5b6cea8b-3982-49be-bacf-7269a3d2104e"
```
Observable Framework dashboard for ops-hub:
- Service status grid (green/amber/red)
- Active incidents timeline
- Access path map
- Storage and certificate health
- Recent change log
### T19 — Register ops-hub as MCP server
```task
id: CUST-WP-0025-T19
status: todo
priority: medium
state_hub_task_id: "f033c80e-4ebb-49cf-8987-20c9b2ff4c13"
```
Register ops-hub MCP server:
- Port 8002 (dev-hub on 8001, ops-hub on 8002)
- Update global `~/.claude/CLAUDE.md` with ops-hub registration
- Update session protocol: domain repos that touch infrastructure should
call both `get_domain_summary()` (dev-hub) and ops-hub orientation
## Phase 4 — Business Model & Fin Hub
**Goal**: First monetization via railiance-as-a-service + resource viability hub.
**Depends on**: Phase 3 (multi-hub pattern proven).
### T20 — Business model canvas: railiance-as-a-service
```task
id: CUST-WP-0025-T20
status: todo
priority: medium
state_hub_task_id: "55db0560-2733-481d-adba-b72c3839ba45"
```
Define the offering:
- Target: EU SMEs needing sovereign, GDPR-compliant DevOps infrastructure
- Core: managed k3s cluster + observability + GitOps + backup
- Differentiator: VSM-based organizational architecture, not just infra
- Pricing tiers: self-hosted (open-source), managed, fully operated
- Document as `canon/projects/railiance/business-model-canvas_v0.1.md`
### T21 — Canon: Bootstrap Protocol document
```task
id: CUST-WP-0025-T21
status: todo
priority: medium
state_hub_task_id: "ce54d3fc-140e-49be-a181-779abc434d4e"
```
Address FOS blindspot #2 (bootstrapping & initial capital):
- Seed funding strategy and minimum viable budget
- MVP scope definition (what must exist before first customer)
- First 3 mandated roles: Constitutional Steward, Technical Operator, Financial Allocator
- Revenue threshold for role formalization
- Document as `canon/constitution/bootstrap-protocol_v0.1.md`
### T22 — Create fin-hub repo from hub-core scaffold
```task
id: CUST-WP-0025-T22
status: todo
priority: low
state_hub_task_id: "670757d8-305d-4736-9056-e79a150114b1"
```
Create `fin-hub` repo with same scaffold pattern as ops-hub.
Register under custodian domain.
### T23 — Fin-specific models
```task
id: CUST-WP-0025-T23
status: todo
priority: low
state_hub_task_id: "8ebffb3f-0dbb-4672-b4e9-928992c41cf4"
```
Define SQLAlchemy models for:
- **Budget**: domain, period, allocated, committed, spent
- **Commitment**: type (subscription/contract/salary), amount, cadence, start/end
- **BurnRate**: domain, period, actual_spend, projected_spend
- **RunwayProjection**: current_balance, monthly_burn, months_remaining, alert_threshold
- **TokenSpend**: provider (anthropic/openai), model, tokens_in, tokens_out, cost, session_id
### T24 — Fin-hub implementation: cost tracking + runway
```task
id: CUST-WP-0025-T24
status: todo
priority: low
state_hub_task_id: "405f81d3-dec5-4154-a1b8-a3af344a0cc4"
```
Implement:
- Cloud cost ingestion (manual CSV import initially, OpenCost integration later)
- Anthropic API token spend tracking (parse billing exports)
- HostEurope server cost tracking
- Runway calculator with burn-rate projection
- Budget alerts when projected runway drops below threshold
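The runway arithmetic is simple enough to sketch directly. Field names follow the RunwayProjection model from T23; using the mean of the last three months as the projected burn is an assumption made for this example, not a decision recorded in the workplan.

```python
import math
from statistics import mean
from typing import Sequence


def projected_monthly_burn(actual_spend: Sequence[float], window: int = 3) -> float:
    """Project burn as the mean of the most recent `window` months of spend."""
    recent = list(actual_spend)[-window:]
    return mean(recent) if recent else 0.0


def months_remaining(current_balance: float, monthly_burn: float) -> float:
    """Runway in months; infinite when nothing is being spent."""
    if monthly_burn <= 0:
        return math.inf
    return current_balance / monthly_burn


def runway_alert(current_balance: float, monthly_burn: float,
                 alert_threshold: float) -> bool:
    """True when the projected runway drops below the threshold (in months)."""
    return months_remaining(current_balance, monthly_burn) < alert_threshold
```

For example, a balance of 12000 with a monthly burn of 1500 gives 12000 / 1500 = 8 months, which would not trigger an alert at a 6-month threshold.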
### T25 — Cross-hub coupling: fin-hub connections
```task
id: CUST-WP-0025-T25
status: todo
priority: low
state_hub_task_id: "90a41790-7290-4145-b89f-88bf491d7652"
```
Implement FOS §9 cross-hub coupling:
- fin→dev: resource pressure signals (budget alerts surface in dev-hub)
- fin→ops: infrastructure cost attribution (per-service cost view)
- fin→canon: viability alerts (runway below threshold escalates to System 5)
### T26 — Pricing and packaging: railiance-as-a-service MVP
```task
id: CUST-WP-0025-T26
status: todo
priority: low
state_hub_task_id: "e17ef269-e349-44cc-ab14-6c57b43199b1"
```
Concrete pricing:
- Define 3 tiers with feature matrix
- Create landing page content
- Define onboarding workflow (customer → provisioned k3s + monitoring)
- Legal: GmbH implications, liability, SLA framework
- First customer acquisition strategy