# Operational Architecture — NetKingdom / Railiance OAS

**Version:** 0.1  
**Date:** 2026-03-31  
**Status:** Adopted — working document

---

## 1. Governing Principle

> **The governor must not run on the governed.**

The management plane and the application domain are operationally independent. Neither may have a hard runtime dependency on the other. Identity federation is the one permitted soft coupling, and it runs in one direction only: the application domain may optionally trust the management plane IdP; the management plane trusts nothing in the cluster.

---

## 2. Two-Domain Model

### 2.1 Management Plane

| Attribute | Value |
|-----------|-------|
| Host type | Dedicated NixOS VPS (e.g. Hetzner CX22 or equivalent) |
| Provisioning | `nixos-anywhere` called from Terraform S1 (NixOS module added to existing S1 patterns) |
| Runtime | systemd services under NixOS — no container orchestrator |
| Config management | Declarative `configuration.nix`; atomic rollbacks via NixOS generations |
| Secrets | `agenix` (NixOS-native, age-encrypted secrets in config repo) |

**Workloads hosted on the management plane:**

| Service | Role |
|---------|------|
| `the-custodian` (FastAPI + PostgreSQL) | State hub — decisions, workstreams, progress events |
| `inter-hub` (IHP/Haskell) | Interaction Hub Framework — governed interaction substrate |
| All domain hub instances (dev-hub, ops-hub, fin-hub, …) | Hub instances built on the inter-hub framework |
| LLDAP (management users only) | Authoritative directory for operator accounts |
| Authelia | SSO/OIDC for management-plane services |
| ops-bridge | Management traffic entry point; not a governed workload itself |

**What does NOT run here:**

- Application workloads (markitect, kaizen-agentic, coulomb.social, activity-core, …)
- The cluster-resident key-cape identity stack
- Any service whose availability must depend on cluster health

### 2.2 Application Domain

| Attribute | Value |
|-----------|-------|
| Host(s) | COULOMBCORE + RAILIANCE01 |
| Orchestration | k3s (Railiance OAS S1–S5: Terraform/Ansible → cnpg → ArgoCD/Helm) |
| Config management | GitOps via ArgoCD |
| Secrets | SOPS/age (existing cluster pattern) |

**Workloads hosted in the cluster:**

| Service | Role |
|---------|------|
| `key-cape` | Application-domain IdP: Authelia + LLDAP + privacyIDEA (SSO/MFA/OIDC) |
| `markitect` | Application workload |
| `kaizen-agentic` | Application workload |
| `coulomb.social` | Application workload |
| `activity-core` | Application workload |
| cnpg PostgreSQL | Cluster-resident databases |
| cert-manager / ACME | TLS for `*.coulomb.social` |

**Status note (as of 2026-03-31):** key-cape stack (Authelia + LLDAP + privacyIDEA) is deployed and validated on RAILIANCE01 (NK-WP-0003 T01–T08 complete). T09 (backup, DR, monitoring) is the remaining task.

---

## 3. Identity and Security Architecture

### 3.1 Stack Placement

```
Management Plane (NixOS)              Application Domain (k3s)
─────────────────────────             ──────────────────────────────────
LLDAP  ◀── operator accounts only     key-cape:
Authelia ── OIDC for mgmt services      ├─ LLDAP (application users)
                                        ├─ Authelia (SSO, OIDC broker)
optional upstream trust ──────▶         └─ privacyIDEA (MFA)
(cluster Authelia may pull
 mgmt LLDAP as upstream)
```

### 3.2 Federation Direction

| Rule | Detail |
|------|--------|
| Management → Application | Management plane LLDAP can be registered as an upstream LDAP source in cluster Authelia, so operator accounts get cluster SSO without maintaining two passwords. This is **optional** and the cluster degrades gracefully if the management plane is unreachable. |
| Application → Management | **Never.** Management-plane services authenticate against the local LLDAP/Authelia only. |

### 3.3 Identity Lifecycle Phases

**Phase 1 — Management-plane IdP, federated outward (current target)**

- Management LLDAP is authoritative for all operator accounts
- Cluster Authelia federates management LLDAP as upstream for operator SSO
- Application-only users (if any) have direct accounts in cluster LLDAP
- Simple, low overhead, suitable for small operator team + small application user population

**Phase 2 — Full application-domain IdP, management users bridged in**

- Triggered when application user population warrants independent governance
- Cluster LLDAP becomes authoritative for application users
- Management users are federated into the cluster (not the reverse)
- Management plane remains fully independent — cluster IdP outage does not affect management operations
- Migration path is clean because the coupling direction never reverses

### 3.4 Secrets Management

| Domain | Tool | Rationale |
|--------|------|-----------|
| Management plane | `agenix` | NixOS-native; age-encrypted secrets declared alongside `configuration.nix`; same age key material as SOPS |
| Application domain | SOPS/age | Already established in cluster; ArgoCD + Helm secrets operator integration in place |
| Bridging | Shared age key material | Both tools use age — operator key material can overlap; no second key infrastructure needed |

---

## 4. Operational Boundaries and Failure Modes

### 4.1 Failure Independence

| Failure scenario | Management plane impact | Application domain impact |
|-----------------|------------------------|--------------------------|
| Cluster down | None — management plane unaffected | Application workloads down |
| Management plane down | Governance tooling unavailable | Application workloads continue; SSO may degrade for operator accounts if federation configured (Phase 1 only) |
| key-cape down | None | Application SSO down; management-plane auth unaffected |
| Management LLDAP down | Management SSO down | Application SSO degrades for operator accounts (if Phase 1 federation); application users unaffected |

### 4.2 Network Topology

- Management plane has no ingress dependency on the cluster
- ops-bridge on the management plane provides the entry point for operator traffic to management services
- Domain hubs (inter-hub instances) communicate with the cluster only via defined capability interfaces — no cluster-internal network access required

---

## 5. Hub and Framework Placement

Inter-hub and all domain hub instances (dev-hub, ops-hub, fin-hub, etc.) run on the management plane, not as cluster workloads. This is a deliberate departure from Option A/C:

- Hub instances are IHP/Haskell — their natural runtime is NixOS + systemd
- IHP containerisation is non-trivial (Nix OCI build); NixOS systemd is the design target
- Hubs govern cluster workloads — they must remain available when the cluster is disrupted
- All hub instances share the same operational paradigm: NixOS configuration, `agenix` secrets, systemd service units

Domain hubs communicate with cluster workloads exclusively through:

- Registered capability interfaces (state-hub capability registry)
- HTTPS endpoints (no cluster-internal DNS or service mesh access)

---

## 6. Provisioning Sequence

```
S0  Workstation (current state)
    └─ custodian running locally
    └─ inter-hub in development

S1  Provision management plane host
    ├─ Terraform null_resource → nixos-anywhere → NixOS install
    ├─ configuration.nix from inter-hub repo
    └─ agenix secrets bootstrapped from operator workstation

S2  Migrate custodian to management plane
    └─ PostgreSQL → management plane (local, NixOS-managed)

S3  Deploy inter-hub + hub instances to management plane
    └─ systemd services, Authelia + LLDAP for management SSO

S4  Complete key-cape NK-WP-0003 T09 (backup, DR, monitoring)
    └─ key-cape fully operational in cluster

S5  Configure identity federation (Phase 1)
    └─ Cluster Authelia registers management LLDAP as upstream

S6  Domain hubs connect to cluster workloads
    └─ Capability registrations, HTTPS interface contracts
```

---

## 7. Open Decisions

| ID | Question | Owner | Status |
|----|----------|-------|--------|
| OA-D01 | Management plane host sizing and provider (Hetzner CX22 vs other) | Bernd | Open |
| OA-D02 | Authelia version and config parity between management plane and key-cape | Bernd | Open |
| OA-D03 | agenix key bootstrapping — which operator keys are age recipients on management plane | Bernd | Open |
| OA-D04 | Trigger condition for Phase 2 identity migration (application user threshold or organisational event) | Bernd | Open |
| OA-D05 | ops-bridge: reverse proxy (Caddy/nginx) or dedicated ingress component on management plane | Bernd | Open |

---

## 8. Relationship to Existing Specifications

| Document | Relationship |
|----------|-------------|
| `specs/InteractionHubFrameworkSpecification_v0.2.md` | IHF spec — defines hub phases; hub placement in this architecture implements IHF Phases 9–12 deployment targets |
| `SCOPE.md` | Situational guide for inter-hub development; this document governs where inter-hub runs |
| NK-WP-0003 (state-hub) | Active workplan for key-cape cluster deployment — T09 is a prerequisite for S4 above |
| Railiance OAS S1–S5 | Application domain provisioning patterns; NixOS management plane adds a NixOS module to S1 without replacing it |

---

## 9. Architecture Diagram

```
┌──────────────────────────────────────────────────────────────────────────────┐
│  MANAGEMENT PLANE (NixOS VPS)                                                │
│                                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────────────┐      │
│  │  custodian  │  │  inter-hub  │  │  domain hubs                     │      │
│  │  state-hub  │  │  (IHP/Hs)   │  │  dev-hub / ops-hub / fin-hub …   │      │
│  └─────────────┘  └─────────────┘  └──────────────────────────────────┘      │
│                                                                              │
│  ┌─────────────────────────────────────────────────┐                         │
│  │  Identity (management users only)               │                         │
│  │  LLDAP ──▶ Authelia (OIDC)                      │                         │
│  └────────────────────────┬────────────────────────┘                         │
│                           │ optional upstream trust                          │
│                           ▼                                                  │
└───────────────────────────┼──────────────────────────────────────────────────┘
                            │
        ┌───────────────────┼──────────────────────────────────────────────┐
        │  APPLICATION DOMAIN (k3s — COULOMBCORE + RAILIANCE01)            │
        │                   │                                              │
        │   ┌───────────────▼────────────────────────────────┐             │
        │   │  key-cape (Authelia + LLDAP + privacyIDEA)     │             │
        │   │  application IdP — *.coulomb.social            │             │
        │   └────────────────────────────────────────────────┘             │
        │                                                                  │
        │   ┌────────────┐  ┌────────────────┐  ┌───────────────────┐      │
        │   │ markitect  │  │ kaizen-agentic │  │ coulomb.social    │      │
        │   └────────────┘  └────────────────┘  └───────────────────┘      │
        │                                                                  │
        └──────────────────────────────────────────────────────────────────┘
```

---

*This document is a living specification. Decisions recorded in OA-D01–D05 should be resolved in state-hub as they close, and this document updated accordingly.*
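
The coupling rules in sections 1 and 3.2 can be checked mechanically. The following is a minimal sketch, not part of any existing tooling: it assumes dependencies are declared by hand, and the names `Dependency`, `violations`, `PLACEMENT`, the `identity-federation` purpose tag, and the labels `mgmt-lldap` and `mgmt-authelia` are all hypothetical. It models only trust and runtime dependencies; the capability-interface calls described in section 5 are out of scope.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    MANAGEMENT = "management"
    APPLICATION = "application"


class Coupling(Enum):
    HARD = "hard"  # consumer cannot run without the provider
    SOFT = "soft"  # consumer degrades gracefully without the provider


@dataclass(frozen=True)
class Dependency:
    consumer: str
    provider: str
    coupling: Coupling
    purpose: str = ""


# Placement follows sections 2.1 and 2.2. "mgmt-lldap" and "mgmt-authelia"
# are hypothetical labels for the management-plane LLDAP and Authelia.
PLACEMENT = {
    "the-custodian": Domain.MANAGEMENT,
    "inter-hub": Domain.MANAGEMENT,
    "mgmt-lldap": Domain.MANAGEMENT,
    "mgmt-authelia": Domain.MANAGEMENT,
    "key-cape": Domain.APPLICATION,
    "markitect": Domain.APPLICATION,
}

FEDERATION = "identity-federation"  # hypothetical purpose tag


def violations(deps):
    """Return one message per dependency that breaks the coupling rules."""
    problems = []
    for dep in deps:
        src = PLACEMENT[dep.consumer]
        dst = PLACEMENT[dep.provider]
        if src is dst:
            continue  # dependencies inside one domain are unrestricted
        edge = f"{dep.consumer} -> {dep.provider}"
        if dep.coupling is Coupling.HARD:
            problems.append(f"{edge}: hard cross-domain dependency")
        elif src is Domain.MANAGEMENT:
            problems.append(f"{edge}: management plane must not trust the cluster")
        elif dep.purpose != FEDERATION:
            problems.append(f"{edge}: only identity federation may cross domains")
    return problems


if __name__ == "__main__":
    declared = [
        # Phase 1 federation: cluster IdP optionally trusts management LLDAP.
        Dependency("key-cape", "mgmt-lldap", Coupling.SOFT, FEDERATION),
        # Forbidden direction: management-plane auth relying on the cluster IdP.
        Dependency("mgmt-authelia", "key-cape", Coupling.SOFT, FEDERATION),
    ]
    for problem in violations(declared):
        print(problem)
```

Run as a script, the sketch accepts the first declared dependency and reports the second. A check of this shape could run in CI against a hand-maintained dependency list, so that a change introducing a reverse coupling is caught before deployment.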