# Operational Architecture — NetKingdom / Railiance OAS
**Version:** 0.1
**Date:** 2026-03-31
**Status:** Adopted — working document

---

## 1. Governing Principle

> **The governor must not run on the governed.**

The management plane and the application domain are operationally independent. Neither may have a hard runtime dependency on the other. Identity federation is the one permitted soft coupling, and it runs in one direction only: the application domain may optionally trust the management plane IdP; the management plane trusts nothing in the cluster.

---

## 2. Two-Domain Model

### 2.1 Management Plane

| Attribute | Value |
|-----------|-------|
| Host type | Dedicated NixOS VPS (e.g. Hetzner CX22 or equivalent) |
| Provisioning | `nixos-anywhere` called from Terraform S1 (NixOS module added to existing S1 patterns) |
| Runtime | systemd services under NixOS — no container orchestrator |
| Config management | Declarative `configuration.nix`; atomic rollbacks via NixOS generations |
| Secrets | `agenix` (NixOS-native, age-encrypted secrets in config repo) |

**Workloads hosted on the management plane:**

| Service | Role |
|---------|------|
| `the-custodian` (FastAPI + PostgreSQL) | State hub — decisions, workstreams, progress events |
| `inter-hub` (IHP/Haskell) | Interaction Hub Framework — governed interaction substrate |
| All domain hub instances (dev-hub, ops-hub, fin-hub, …) | Hub instances built on the inter-hub framework |
| LLDAP (management users only) | Authoritative directory for operator accounts |
| Authelia | SSO/OIDC for management-plane services |
| ops-bridge | Management traffic entry point; not a governed workload itself |

**What does NOT run here:**

- Application workloads (markitect, kaizen-agentic, coulomb.social, activity-core, …)
- The cluster-resident key-cape identity stack
- Any service whose availability depends on cluster health

### 2.2 Application Domain

| Attribute | Value |
|-----------|-------|
| Host(s) | COULOMBCORE + RAILIANCE01 |
| Orchestration | k3s (Railiance OAS S1–S5: Terraform/Ansible → cnpg → ArgoCD/Helm) |
| Config management | GitOps via ArgoCD |
| Secrets | SOPS/age (existing cluster pattern) |

**Workloads hosted in the cluster:**

| Service | Role |
|---------|------|
| `key-cape` | Application-domain IdP: Authelia + LLDAP + privacyIDEA (SSO/MFA/OIDC) |
| `markitect` | Application workload |
| `kaizen-agentic` | Application workload |
| `coulomb.social` | Application workload |
| `activity-core` | Application workload |
| cnpg PostgreSQL | Cluster-resident databases |
| cert-manager / ACME | TLS for `*.coulomb.social` |

**Status note (as of 2026-03-31):** key-cape stack (Authelia + LLDAP + privacyIDEA) is deployed and validated on RAILIANCE01 (NK-WP-0003 T01–T08 complete). T09 (backup, DR, monitoring) is the remaining task.

---

## 3. Identity and Security Architecture

### 3.1 Stack Placement

```
Management Plane (NixOS)              Application Domain (k3s)
─────────────────────────             ──────────────────────────────────
LLDAP ◀── operator accounts only      key-cape:
Authelia ── OIDC for mgmt services      ├─ LLDAP (application users)
                                        ├─ Authelia (SSO, OIDC broker)
optional upstream trust ──────▶         └─ privacyIDEA (MFA)
                                      (cluster Authelia may pull
                                       mgmt LLDAP as upstream)
```

### 3.2 Federation Direction

| Rule | Detail |
|------|--------|
| Management → Application | Management plane LLDAP can be registered as an upstream LDAP source in cluster Authelia, so operator accounts get cluster SSO without maintaining two passwords. This is **optional** and the cluster degrades gracefully if the management plane is unreachable. |
| Application → Management | **Never.** Management-plane services authenticate against the local LLDAP/Authelia only. |

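
The federation rule can be expressed as a small check over trust edges, which may be useful as an architecture fitness test. The following is a minimal sketch, not part of the adopted tooling: the `TrustEdge` type, the `violations` function, and the service names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

MANAGEMENT = "management"
APPLICATION = "application"


@dataclass(frozen=True)
class TrustEdge:
    """The consumer authenticates against (trusts) the provider."""
    consumer: str
    consumer_domain: str
    provider: str
    provider_domain: str
    optional: bool = False  # True if the consumer degrades gracefully without it


def violations(edges):
    """Return a message for every edge that breaks the federation rule."""
    problems = []
    for e in edges:
        if e.consumer_domain == MANAGEMENT and e.provider_domain == APPLICATION:
            # Application -> Management trust is never allowed.
            problems.append(f"{e.consumer} must not trust {e.provider}")
        elif (e.consumer_domain == APPLICATION
              and e.provider_domain == MANAGEMENT
              and not e.optional):
            # Management -> Application federation must stay a soft coupling.
            problems.append(f"{e.consumer} -> {e.provider} must be optional")
    return problems


# Hypothetical edge names, chosen to mirror the table above.
allowed = [
    TrustEdge("mgmt-authelia", MANAGEMENT, "mgmt-lldap", MANAGEMENT),
    TrustEdge("cluster-authelia", APPLICATION, "mgmt-lldap", MANAGEMENT, optional=True),
]
forbidden = [TrustEdge("inter-hub", MANAGEMENT, "key-cape", APPLICATION)]
```

Here `violations(allowed)` returns an empty list, while `violations(forbidden)` reports the single edge in which a management-plane service would trust the cluster IdP.
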
### 3.3 Identity Lifecycle Phases

**Phase 1 — Management-plane IdP, federated outward (current target)**

- Management LLDAP is authoritative for all operator accounts
- Cluster Authelia federates management LLDAP as upstream for operator SSO
- Application-only users (if any) have direct accounts in cluster LLDAP
- Simple and low-overhead; suits a small operator team and a small application user population

**Phase 2 — Full application-domain IdP, management users bridged in**

- Triggered when the application user population warrants independent governance
- Cluster LLDAP becomes authoritative for application users
- Management users are federated into the cluster (not the reverse)
- Management plane remains fully independent — cluster IdP outage does not affect management operations
- Migration path is clean because the coupling direction never reverses

### 3.4 Secrets Management

| Domain | Tool | Rationale |
|--------|------|-----------|
| Management plane | `agenix` | NixOS-native; age-encrypted secrets declared alongside `configuration.nix`; same age key material as SOPS |
| Application domain | SOPS/age | Already established in cluster; ArgoCD + Helm secrets operator integration in place |
| Bridging | Shared age key material | Both tools use age — operator key material can overlap; no second key infrastructure needed |

---

## 4. Operational Boundaries and Failure Modes

### 4.1 Failure Independence

| Failure scenario | Management plane impact | Application domain impact |
|-----------------|------------------------|--------------------------|
| Cluster down | None — management plane unaffected | Application workloads down |
| Management plane down | Governance tooling unavailable | Application workloads continue; SSO may degrade for operator accounts if federation is configured (Phase 1 only) |
| key-cape down | None | Application SSO down; management-plane auth unaffected |
| Management LLDAP down | Management SSO down | Application SSO degrades for operator accounts (if Phase 1 federation is configured); application users unaffected |

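
The failure table can be reasoned about as a dependency graph in which hard dependencies propagate an outage and soft dependencies only cause degradation. The sketch below is illustrative and assumes a deliberately reduced service inventory; the service names, the `SERVICES` mapping, and both functions are hypothetical and do not describe deployed tooling.

```python
# Hypothetical, reduced inventory: "hard" deps take a service down,
# "soft" deps only degrade it (Phase 1 federation is a soft dependency).
SERVICES = {
    "mgmt-lldap":    {"domain": "management",  "hard": set(),          "soft": set()},
    "mgmt-authelia": {"domain": "management",  "hard": {"mgmt-lldap"}, "soft": set()},
    "key-cape":      {"domain": "application", "hard": set(),          "soft": {"mgmt-lldap"}},
}


def outage(failed):
    """Return (down, degraded) service sets for an initial set of failures."""
    down = set(failed)
    changed = True
    while changed:  # propagate along hard dependencies until stable
        changed = False
        for name, svc in SERVICES.items():
            if name not in down and svc["hard"] & down:
                down.add(name)
                changed = True
    degraded = {
        name for name, svc in SERVICES.items()
        if name not in down and svc["soft"] & down
    }
    return down, degraded


def crosses_domains(failed):
    """True if a failure takes down a service outside the failing domain(s)."""
    down, _ = outage(failed)
    failed_domains = {SERVICES[name]["domain"] for name in failed}
    return any(SERVICES[name]["domain"] not in failed_domains for name in down)
```

With this inventory, `outage({"mgmt-lldap"})` marks `mgmt-authelia` as down and `key-cape` as degraded, matching the last row of the table, and `crosses_domains` stays false for any single-domain failure as long as no hard dependency crosses the boundary.
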
### 4.2 Network Topology

- Management plane has no ingress dependency on the cluster
- ops-bridge on the management plane provides the entry point for operator traffic to management services
- Domain hubs (inter-hub instances) communicate with the cluster only via defined capability interfaces — no cluster-internal network access required

---

## 5. Hub and Framework Placement

Inter-hub and all domain hub instances (dev-hub, ops-hub, fin-hub, etc.) run on the management plane, not as cluster workloads. This is a deliberate departure from Option A/C:

- Hub instances are IHP/Haskell — their natural runtime is NixOS + systemd
- IHP containerisation is non-trivial (Nix OCI build); NixOS systemd is the design target
- Hubs govern cluster workloads — they must remain available when the cluster is disrupted
- All hub instances share the same operational paradigm: NixOS configuration, `agenix` secrets, systemd service units

Domain hubs communicate with cluster workloads exclusively through:
- Registered capability interfaces (state-hub capability registry)
- HTTPS endpoints (no cluster-internal DNS or service mesh access)

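
The HTTPS-only rule could be enforced when a capability endpoint is registered. The following is a minimal sketch under stated assumptions: the function name and the example hostnames are hypothetical, and the suffix list reflects the default Kubernetes cluster domain (`cluster.local`), which a given cluster may override.

```python
from urllib.parse import urlsplit

# Suffixes that resolve only through cluster-internal DNS
# (assumes the default Kubernetes cluster domain).
CLUSTER_INTERNAL_SUFFIXES = (".svc", ".cluster.local")


def is_allowed_endpoint(url):
    """Accept only HTTPS URLs whose host is resolvable outside the cluster."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or not host:
        return False
    if "." not in host:
        # A bare service name only resolves inside the cluster.
        return False
    return not host.endswith(CLUSTER_INTERNAL_SUFFIXES)
```

For example, `is_allowed_endpoint("https://markitect.coulomb.social/api")` is accepted, while a plain-HTTP URL or a host such as `markitect.default.svc.cluster.local` is rejected.
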
---

## 6. Provisioning Sequence

```
S0  Workstation (current state)
    ├─ custodian running locally
    └─ inter-hub in development

S1  Provision management plane host
    ├─ Terraform null_resource → nixos-anywhere → NixOS install
    ├─ configuration.nix from inter-hub repo
    └─ agenix secrets bootstrapped from operator workstation

S2  Migrate custodian to management plane
    └─ PostgreSQL → management plane (local, NixOS-managed)

S3  Deploy inter-hub + hub instances to management plane
    └─ systemd services, Authelia + LLDAP for management SSO

S4  Complete key-cape NK-WP-0003 T09 (backup, DR, monitoring)
    └─ key-cape fully operational in cluster

S5  Configure identity federation (Phase 1)
    └─ Cluster Authelia registers management LLDAP as upstream

S6  Domain hubs connect to cluster workloads
    └─ Capability registrations, HTTPS interface contracts
```

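
If the sequence is tracked programmatically, a simple gate can refuse to start a stage before its predecessors are complete. This sketch assumes the stages are strictly linear, which the listing above suggests but does not state explicitly; the function names are hypothetical.

```python
STAGES = ["S0", "S1", "S2", "S3", "S4", "S5", "S6"]


def can_start(stage, completed):
    """A stage may start only when every earlier stage is complete."""
    index = STAGES.index(stage)  # raises ValueError for an unknown stage
    return all(earlier in completed for earlier in STAGES[:index])


def next_stage(completed):
    """Return the first stage not yet completed, or None when all are done."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None
```

For example, with `{"S0", "S1", "S2", "S3"}` completed, `next_stage` returns `"S4"` and `can_start("S5", ...)` is false, since federation depends on key-cape being fully operational.
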
---

## 7. Open Decisions

| ID | Question | Owner | Status |
|----|----------|-------|--------|
| OA-D01 | Management plane host sizing and provider (Hetzner CX22 vs other) | Bernd | Open |
| OA-D02 | Authelia version and config parity between management plane and key-cape | Bernd | Open |
| OA-D03 | agenix key bootstrapping — which operator keys are age recipients on management plane | Bernd | Open |
| OA-D04 | Trigger condition for Phase 2 identity migration (application user threshold or organisational event) | Bernd | Open |
| OA-D05 | ops-bridge: reverse proxy (Caddy/nginx) or dedicated ingress component on management plane | Bernd | Open |

---

## 8. Relationship to Existing Specifications

| Document | Relationship |
|----------|-------------|
| `specs/InteractionHubFrameworkSpecification_v0.2.md` | IHF spec — defines hub phases; hub placement in this architecture implements IHF Phases 9–12 deployment targets |
| `SCOPE.md` | Situational guide for inter-hub development; this document governs where inter-hub runs |
| NK-WP-0003 (state-hub) | Active workplan for key-cape cluster deployment; its remaining task T09 is completed in provisioning step S4 (Section 6) |
| Railiance OAS S1–S5 | Application domain provisioning patterns; NixOS management plane adds a NixOS module to S1 without replacing it |

---

## 9. Architecture Diagram

```
┌──────────────────────────────────────────────────────────────────────────────┐
│  MANAGEMENT PLANE (NixOS VPS)                                                │
│                                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────────────┐      │
│  │  custodian  │  │  inter-hub  │  │  domain hubs                     │      │
│  │  state-hub  │  │  (IHP/Hs)   │  │  dev-hub / ops-hub / fin-hub …   │      │
│  └─────────────┘  └─────────────┘  └──────────────────────────────────┘      │
│                                                                              │
│  ┌─────────────────────────────────────────────────┐                         │
│  │  Identity (management users only)               │                         │
│  │  LLDAP ──▶ Authelia (OIDC)                      │                         │
│  └────────────────────────┬────────────────────────┘                         │
│                           │  optional upstream trust                         │
│                           ▼                                                  │
└───────────────────────────┼──────────────────────────────────────────────────┘
                            │
        ┌───────────────────┼──────────────────────────────────────────────┐
        │  APPLICATION DOMAIN (k3s — COULOMBCORE + RAILIANCE01)            │
        │                   │                                              │
        │   ┌───────────────▼────────────────────────────────┐             │
        │   │  key-cape (Authelia + LLDAP + privacyIDEA)     │             │
        │   │  application IdP — *.coulomb.social            │             │
        │   └────────────────────────────────────────────────┘             │
        │                                                                  │
        │   ┌────────────┐  ┌────────────────┐  ┌───────────────────┐      │
        │   │ markitect  │  │ kaizen-agentic │  │  coulomb.social   │      │
        │   └────────────┘  └────────────────┘  └───────────────────┘      │
        │                                                                  │
        └──────────────────────────────────────────────────────────────────┘
```

---

*This document is a living specification. Decisions recorded in OA-D01–D05 should be resolved in state-hub as they close, and this document updated accordingly.*