feat(WP-0009): IHF GAAF Compliance Foundation — type registries, extension manifests, architectural contracts

Implements IHUB-WP-0009: closes four GAAF-2026 gaps before domain hub work begins.

- TypeRegistry helper + controllers/views (hub_kind, hub_capability_manifest)
- HubCapabilityManifest entity with validation and registry linkage
- ARCHITECTURE-LAYERS.md + CI-enforced boundary contracts
- Alembic migration 1743724800, fitness tests (Test/Architecture/)
- GAAF spec, Operational Architecture spec, domain hub extension guide
- Updates to CLAUDE.md, SCOPE.md, Schema.sql, Routes, FrontController, Types

state_hub_sync: pending (tunnel was STALE at completion time; run fix-consistency)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 21:17:39 +00:00
parent 1a7732d7da
commit b5d73aa18b
47 changed files with 4855 additions and 104 deletions
--- a/specs/GoodSoftwareArchitectureFramework_2026.md
+++ b/specs/GoodSoftwareArchitectureFramework_2026.md
@@ -0,0 +1,234 @@
+GoodApplicationArchitecture2026
+
+*A guideline for building good software systems*
+
+**Good Application Architecture Framework 2026 (GAAF-2026)**  
+**Standards Document**  
+**Version 1.0 – 31 March 2026**
+
+### 1. Introduction
+The **Good Application Architecture Framework 2026 (GAAF-2026)** is a system-theoretic standard for designing, reviewing, and continuously improving software repositories, frameworks, and products. It separates different kinds of change into distinct layers so that rigidity protects stability, malleability enables product learning, extensibility supports controlled growth, and bounded variability keeps operational risk under control.
+
+**One-line doctrine**  
+Freeze the core, evolve the function, bound the customization, constrain the configuration, and govern all change through explicit contracts.
+
+GAAF-2026 turns architecture from an implicit art into a repeatable, measurable, enforceable control system. It is deliberately practical: every concept has an associated artifact, checklist, or automated fitness function that both humans and coding agents can apply immediately. It is designed for immediate adoption in any codebase (monorepo, framework, SaaS, open-source library) and scales across entire organizations.
+
+### 2. Core Concept
+GAAF-2026 views a software system as a **cybernetic control system** for managing change. It evaluates every architectural decision across five orthogonal dimensions:
+
+| Dimension     | Purpose                              |
+|---------------|--------------------------------------|
+| **Layer**     | Where the change lives               |
+| **Contract**  | How the change is constrained        |
+| **Lifecycle** | When the change is allowed           |
+| **Validation**| How correctness is ensured           |
+| **Failure Mode** | What happens when things break    |
+
+This five-dimensional lens prevents layering from collapsing over time.
+
+### 3. Layer Model (Final Form)
+
+| Layer                  | Rigidity     | Role                                      | Contract Type          | Lifecycle States                  | Defined Failure Mode                          | Primary Success Metric                     |
+|------------------------|--------------|-------------------------------------------|------------------------|-----------------------------------|-----------------------------------------------|--------------------------------------------|
+| **Core**               | High (frozen)| Domain-agnostic primitives & invariants   | Strong (versioned, immutable after v1) | Distilled only (rare promotion)  | Fail-fast, never undefined behaviour         | Replaceable only at major version boundaries |
+| **Functional**         | Medium       | Value-realization modules                 | Medium (evolvable, versioned) | Experimental → Beta → Stable → Deprecated | Graceful degradation                        | Demand-driven, independently shippable     |
+| **Customization**      | Low          | Vendor/operator-controlled adaptation     | Adaptive (migration-aware) | Versioned & migratable           | Isolated per tenant/customer                 | Zero manual upgrade intervention           |
+| **Configuration**      | Very Low     | User-controlled declarative state         | Schema (runtime-validated) | Dynamic but bounded              | Reject invalid state BEFORE execution        | Zero production incidents from bad config  |
+| **Extensions** (aspect)| Cross-cutting| Externally supplied Functional modules    | Negotiated (manifest + capability) | Full lifecycle governed         | Sandboxed (must not crash host)              | Full compatibility matrix coverage         |
+
+**Dependency rule (strict)**:  
+Core ← Functional ← Customization ← Configuration  
+Extensions plug into Core or Functional only via contracts.
+
+### 4. Contract System (First-Class Artifact)
+Every compliant repository **MUST** contain a top-level folder:
+
+```
+/contracts/
+  core/
+  functional/
+  customization/
+  config/
+  extensions/
+```
+
+A **Contract** is a versioned artifact that defines for any public surface:
+- Interface
+- Invariants (what must always hold)
+- Compatibility rules
+- Validation rules
+
+Contract types per layer are listed in the table above.
+
+### 5. Architectural Laws (Hard Review Criteria)
+1. Change must occur in the highest appropriate layer.  
+2. Lower layers define contracts; upper layers consume them (downward dependencies only).  
+3. The more rigid the layer, the stronger the interface discipline.  
+4. Variability must be explicit (who, what, guarantees, validation, upgrade path).  
+5. Customer-specific value must not poison product evolution.  
+6. Configuration must never become a second programming language by accident.  
+7. Extensions must use seams, not surgery.  
+8. **Enforcement Law**: All rules above must be automatically verified by architectural fitness functions in CI.
+
+### 6. Evolution Model
+**Promotion path** (rare, bottom-up only)  
+Experiment (Functional) → Stable Functional → Core  
+
+**Extraction path**  
+Functional → Extension (external ownership)  
+
+**Decay path**  
+Functional → Deprecated → Removed  
+
+**Core rule**: Core is never designed top-down; it is distilled from proven Functional patterns that have demonstrated multi-use value.
+
+### 7. Failure Model (Per-Layer Semantics)
+Every contract must explicitly document the failure behaviour for its layer (see table in §3).
+
+### 8. Validation & Architectural Fitness Functions
+Every repository **MUST** implement automated checks:
+- Import / dependency graph validation (no upward dependencies)
+- Core breaking-change detection
+- Config schema validation before any execution
+- Extension manifest + lifecycle hook presence
+- Layer boundary lint rules
+- Demand-signal / cost-justification check for Functional and Customization changes
+
+### 9. Reusable 7-Phase Workplan
+**Phase 0** – Scope & Inventory  
+**Phase 1** – Boundary & Contract Extraction  
+**Phase 2** – Refactoring by Relocation  
+**Phase 3** – Dependency Enforcement & Fitness Functions  
+**Phase 4** – Validation Architecture + Failure Testing  
+**Phase 5** – Governance & Release Discipline  
+**Phase 6** – Scorecard & Continuous Improvement  
+
+**Required living artifact** in every repository:  
+`ARCHITECTURE-LAYERS.md` (see template in §12).
+
+### 10. Scorecard
+**Scoring scale** (0–5)  
+0 = absent / actively harmful  
+1 = weak / ad-hoc  
+2 = partial / inconsistent  
+3 = adequate / workable  
+4 = strong / disciplined  
+5 = excellent / exemplary  
+
+**Default weighting** (long-term systems):  
+Core 30 % | Functional 20 % | Customization 15 % | Configuration 10 % | Extensions 10 % | Cross-layer 15 %
+
+**Core criteria (C1–C9)**  
+C1. Minimality C2. Orthogonality C3. Stability C4. Correctness confidence C5. Performance fitness C6. Scope completeness C7. Domain neutrality C8. Contract clarity C9. Invariant definition  
+
+**Functional criteria (F1–F8)**  
+F1. Module isolation F2. Value efficiency F3. Maturity labeling completeness F4. Reuse of core F5. Coupling discipline F6. Change velocity fitness F7. Third-party readiness F8. Demand-signal discipline  
+
+**Customization criteria (U1–U8)**  
+U1. Boundary clarity U2. Upgrade safety U3. Contract discipline U4. Migration reliability U5. Quality control U6. Tenant isolation U7. Operational predictability U8. Cost justification  
+
+**Configuration criteria (G1–G7)**  
+G1. Schema discipline G2. Validation strength G3. Safety of defaults G4. Role & permission control G5. Auditability G6. Rollback & recovery G7. State-space boundedness  
+
+**Extensions criteria (E1–E7)**  
+E1. Registration quality E2. Contract clarity E3. Isolation guarantees E4. Testability E5. Version compatibility E6. Domain packaging fitness E7. Developer experience  
+
+**Cross-layer criteria (X1–X8)**  
+X1. Layer clarity X2. Dependency rule compliance X3. Change placement X4. Interface governance X5. Architectural test coverage X6. Operational maintainability X7. Long-term evolvability X8. Failure containment & economic alignment  
+
+**Interpretation**  
+≥ 4.5 = Exemplary 3.5–4.4 = Strong 2.5–3.4 = Usable but vulnerable ≤ 2.4 = Needs restructuring
+
+### 11. Economic Alignment (Value-Driven Evolution)
+- Functional modules require an explicit **demand signal**.  
+- Customization requires **per-instance cost justification**.  
+- Core changes require **proven multi-use / reuse benefit** across domains.  
+
+This ensures architecture directly supports business economics.
+
+### 12. Practical Artifacts & Templates
+
+#### 12.1 ARCHITECTURE-LAYERS.md Template
+```markdown
+# ARCHITECTURE-LAYERS.md
+**Framework:** GAAF-2026  
+**Last reviewed:** YYYY-MM-DD  
+**Weighted scorecard:** XX % (see scorecard.xlsx)  
+**Repository purpose:** …  
+**Layer map:** Core: … | Functional: … | …  
+**Decisions log:** …  
+**Next review:** YYYY-MM-DD
+```
+
+#### 12.2 Standard Review Output Template
+**Repository**  
+Name: …  
+Purpose: …  
+Maturity: …  
+Review date: …  
+
+**Layer map**  
+- Core: …  
+- Functional: …  
+- etc.
+
+**Major findings**  
+Strengths / Violations / Risks / Fast wins / Strategic refactors  
+
+**Scores** (per section + weighted total)  
+
+**Priority actions** (P1–P3)  
+
+**Migration concerns**  
+
+**Decision** (Keep / Refine / Refactor / Re-architect)
+
+#### 12.3 Good-Signs / Bad-Signs Heuristics (Quick Checklist for Humans & Agents)
+**Good signs**  
+- Core is small and boring  
+- Modules are easy to add or remove  
+- Customer logic lives outside product code  
+- Config has strong validation  
+- Extension seams are explicit and registered  
+- Upgrades require zero heroics  
+
+**Bad signs**  
+- Core changes every month  
+- Features bypass core contracts  
+- Customers implemented as branches in code  
+- Config contains arbitrary expressions  
+- Plugins patch internal state  
+- Releases need manual per-customer repair  
+
+#### 12.4 Example Optimization Backlog Categories
+- **Core backlog**: shrink surface, remove domain leakage, formalize invariants  
+- **Functional backlog**: split coupled modules, mark maturity, eliminate core duplication  
+- **Customization backlog**: replace forks with rules/workflows, add manifest & migration engine  
+- **Configuration backlog**: add typed schemas, guardrails, audit log  
+- **Extension backlog**: define registration API, lifecycle, compatibility matrix, test kit
+
+### 13. Compliance Definition
+A repository is **GAAF-2026 compliant** if and only if it satisfies **all** of the following:
+1. Layers are separated as defined.
+2. Explicit contracts exist in `/contracts/`.
+3. Strict downward dependency direction is enforced.
+4. Lifecycle states are declared and respected.
+5. Upgradeability is guaranteed via bounded customization.
+6. All user-controlled variability is validated.
+7. Extensibility uses registered, contract-based mechanisms.
+8. Failure is contained within defined per-layer boundaries.
+9. Compliance is continuously measured via scorecard and fitness functions.
+
+### 14. Adoption & Next Steps
+- **For humans**: Use the workplan every major release or when scorecard < 3.5.  
+- **For agents**: Feed this entire document + the `ARCHITECTURE-LAYERS.md` into any coding or review prompt.  
+- **Automation**: Implement the fitness functions listed in §8 as the first CI jobs.  
+- **Repository starter kit**: Create the `/contracts/` folder and `ARCHITECTURE-LAYERS.md` on day one.
+
+This document is the single source of truth for GAAF-2026. It is intentionally self-contained, versioned, and ready for inclusion in every repository, Dev Hub, or organizational standard library.  
+
+**Approved for use across all systems.**  
+**Next scheduled framework review: March 2027.**
+
+xxx
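Review note (not part of the commit): the strict dependency rule in §3 of the GAAF spec and the import-graph check listed in §8 could be expressed as a fitness function roughly as follows. This is a minimal sketch under stated assumptions, not the checks shipped in Test/Architecture/. The layer names come from the spec; `LAYER_RANK`, `EXTENSION_TARGETS`, `find_violations`, and the edge-list input are hypothetical, and the rule that host layers never import extension code directly is an assumption the spec does not state explicitly.

```python
from typing import Dict, List, Tuple

# Rank per the spec's rule "Core <- Functional <- Customization <- Configuration":
# a module may depend only on its own layer or on a layer with a lower rank.
LAYER_RANK: Dict[str, int] = {
    "core": 0,
    "functional": 1,
    "customization": 2,
    "configuration": 3,
}

# Extensions are an aspect: per the spec they plug into Core or Functional only.
EXTENSION_TARGETS = {"core", "functional"}


def find_violations(
    layer_of: Dict[str, str],
    imports: List[Tuple[str, str]],
) -> List[str]:
    """Return one message per import edge that breaks the dependency rule.

    layer_of maps a module name to its layer; imports is a list of
    (importer, imported) edges. Both inputs are hypothetical: a real check
    would derive them from the repository's import graph.
    """
    violations = []
    for importer, imported in imports:
        src, dst = layer_of[importer], layer_of[imported]
        if src == "extensions":
            ok = dst in EXTENSION_TARGETS
        elif dst == "extensions":
            # Assumption: host layers reach extensions via registration,
            # never by importing extension code directly.
            ok = False
        else:
            ok = LAYER_RANK[dst] <= LAYER_RANK[src]
        if not ok:
            violations.append(
                f"{importer} ({src}) must not import {imported} ({dst})"
            )
    return violations
```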
--- a/specs/InteractionHubFrameworkSpecification_v0.2.md
+++ b/specs/InteractionHubFrameworkSpecification_v0.2.md
@@ -121,13 +121,59 @@ Phase 8 established federated governance within a single deployment. Phase 9
 exposes that governance state as a stable, versioned, authenticated REST API and
 ships consumer SDKs that make integration a day's work rather than a project.

+### GAAF Foundation Prerequisite
+
+> **IHUB-WP-0009 (GAAF Compliance Foundation) must be complete before Phase 9
+> begins.**
+
+Phase 9 generates an OpenAPI 3.1 specification that documents all IHF API
+fields. Three of those fields — `widget_type`, `event_type`, and `category` —
+are type discriminators. If they are documented as arbitrary `string` values,
+the API contract is immediately incorrect: consumers will invent values that
+diverge from the IHF vocabulary, breaking cross-hub aggregation and federation.
+
+IHUB-WP-0009 establishes the four type registries that enumerate these fields.
+Phase 9 must read from those registries to generate correct `enum` arrays in
+the OpenAPI spec. Building Phase 9 first and retrofitting enums later is a
+breaking API change.
+
+**Specific GAAF dependencies for Phase 9 implementation:**
+
+1. **Type registry enumerations in OpenAPI** — The spec generator must query
+   `widget_type_registry`, `event_type_registry`, and
+   `annotation_category_registry` to produce `enum` arrays for the
+   corresponding fields. The generated spec must NOT document these as
+   unconstrained `string`.
+
+2. **ApiConsumer linked to HubCapabilityManifest** — A `domain` hub
+   authenticating as an API consumer is identified by its active
+   `HubCapabilityManifest`. The `ApiConsumer` record should carry a
+   `hub_capability_manifest_id` FK (nullable — non-hub consumers such as
+   third-party tools authenticate without a manifest). When a manifested
+   consumer submits an event, the `event_type` is validated against both the
+   global `event_type_registry` and the manifest's `declared_event_types`.
+
+3. **OAuth scope alignment with registered vocabulary** — OAuth scopes should
+   include hub-specific scope claims (`hub:{slug}:write`) that the token
+   exchange validates against the hub's active manifest. A consumer without a
+   manifest can only write framework-level event types; hub-owned types require
+   the corresponding hub scope.
+
+4. **Contract file reference** — The OpenAPI spec must reference
+   `/contracts/functional/interaction-reporting-v1.md` as its human-readable
+   companion. The generated spec is derived data; the contract file is
+   authoritative intent.
+
 ### Scope

 * Versioned REST API (`/api/v2/`) for all core IHF artifact types
-* OpenAPI 3.1 specification generated from the live schema
+* OpenAPI 3.1 specification generated from the live schema, with type registry
+  enumerations for all type discriminator fields
 * Authentication: OAuth 2.0 client credentials flow (superseding per-hub Bearer tokens)
-* API key management UI for external consumers
-* Consumer SDKs: TypeScript/Node, Python
+* API key management UI for external consumers; domain hub consumers linked to
+  their active HubCapabilityManifest
+* Consumer SDKs: TypeScript/Node, Python (type-safe enums generated from
+  type registries)
 * Webhook delivery for interaction events, candidate creation, and decision records
 * API usage dashboard: request counts, error rates, consumer identity
 * Rate limiting and quota management per consumer
@@ -137,24 +183,33 @@ ships consumer SDKs that make integration a day's work rather than a project.
 * External systems can read widget registry, interaction events, annotations,
  requirement candidates, decisions, deployments, and outcome signals
 * External systems can submit interaction events and annotations via the API
+* Domain hub consumers submitting hub-owned event types require a matching
+  active HubCapabilityManifest
 * Downstream hubs can subscribe to governance events via webhooks
-* SDK consumers get type-safe access to IHF contracts without reading the spec
+* SDK consumers get type-safe access to IHF contracts without reading the spec;
+  SDK enum types are generated from the live type registries
 * API consumers are tracked, quotaed, and auditable

 ### Exit Criteria

 * All core IHF artifact types are readable via `/api/v2/`
 * Interaction events and annotations are writable via `/api/v2/`
-* OpenAPI spec is generated and accurate
-* TypeScript SDK and Python SDK published (as static files or packages)
+* OpenAPI spec is generated and accurate; `widget_type`, `event_type`, and
+  `category` fields carry `enum` arrays derived from the type registries
+* TypeScript SDK and Python SDK published (as static files or packages); both
+  export typed enums for widget types and event types
 * Webhook delivery confirmed for at least two event types
 * API usage dashboard renders correctly
 * OAuth token flow works end-to-end
+* Submission of an unregistered `event_type` returns HTTP 422 with a
+  registry-referenced error message

 ### Data Artifacts Introduced

 `ApiConsumer`, `ApiKey`, `WebhookSubscription`, `WebhookDelivery`

+Schema additions: `api_consumers.hub_capability_manifest_id` (FK, nullable)
+
 ---

 ## Phase 10 — Hub Registry and Widget Marketplace
@@ -166,45 +221,105 @@ configurations across deployments. Phase 9 made the IHF externally consumable.
 Phase 10 makes it composable: hubs and widgets can be discovered, rated,
 adopted, and evolved as shared platform assets.

+### GAAF Foundation Integration
+
+> **Phase 10's Hub Registry IS the `HubCapabilityManifest` table, extended with
+> a public-facing discovery UI.** It is not a separate data store. IHUB-WP-0009
+> must be complete before Phase 10 begins.
+
+The Hub Registry in Phase 10 is the public-facing projection of the capability
+manifests introduced in IHUB-WP-0009. Every registered hub already has an
+active `HubCapabilityManifest` that declares its widget types, event types,
+annotation categories, and policy vocabulary. Phase 10 adds the browsability,
+pattern publishing, and adoption mechanics on top of that existing foundation.
+
+**Specific GAAF integration points for Phase 10 implementation:**
+
+1. **Hub Registry = active HubCapabilityManifest + HubHealthSnapshot** — The
+   hub registry view is a join of `hub_capability_manifests` (status=active),
+   `hub_health_snapshots` (latest), and `hubs`. No new hub registry table is
+   required. The data already exists; Phase 10 adds the discovery UI.
+
+2. **Widget patterns reference registered types** — A `WidgetPattern` record
+   must declare a `widget_type` that exists in `widget_type_registry`. When
+   publishing a pattern, if the `widget_type` is owned by another hub, the
+   pattern is cross-hub and requires that hub's acknowledgement (or uses a
+   framework-level type). This prevents patterns from encoding unregistered
+   vocabulary.
+
+3. **Pattern adoption triggers manifest update** — When a hub adopts a
+   `WidgetPattern`, if the pattern's `widget_type` is not in the adopting
+   hub's `declared_widget_types`, the adopting hub's manifest is updated to
+   include it (in draft amendment mode). The hub operator must re-activate
+   the amended manifest. This ensures the adopting hub's type vocabulary stays
+   coherent with its actual widget usage.
+
+4. **Governance templates reference registered categories** — A
+   `GovernanceTemplate` for requirement categories must reference entries in
+   `annotation_category_registry`. Template cloning adds any new categories
+   to the cloning hub's manifest (draft amendment).
+
+5. **Hub registry GAAF compliance score** — The hub registry should display
+   each hub's GAAF compliance indicator: whether it has an active manifest,
+   how many registered types it owns, and whether the architecture fitness
+   functions report any violations. This makes GAAF compliance visible as a
+   platform-level metric.
+
 ### Scope

-* Hub registry: a catalog of registered hubs with public metadata, capability
-  declarations, and health summaries
-* Widget pattern library: reusable widget definitions that can be instantiated
-  into any hub
-* Governance template library: requirement distillation and decision templates
-  that can be cloned across hubs
+* Hub registry: a catalog of registered hubs built on `HubCapabilityManifest`
+  + `HubHealthSnapshot`, with public metadata, declared vocabulary, and health
+  summaries
+* Widget pattern library: reusable widget definitions tied to registered types
+  from `widget_type_registry`
+* Governance template library: requirement distillation and decision templates,
+  tied to registered annotation categories
 * Widget ratings and adoption tracking: which widgets are in use where, with
  aggregated friction scores across deployments
 * Pattern versioning: widget patterns have explicit versions; hubs can pin or
  follow-latest
+* Pattern adoption with manifest amendment workflow: adoption updates the
+  adopting hub's capability manifest when new types are introduced
 * Marketplace dashboard: browse, search, and adopt patterns

 ### Functional Capabilities

-* Hub operators can publish a widget pattern to the shared library
-* Hub operators can adopt a published pattern into their hub
+* Hub operators can publish a widget pattern to the shared library; pattern
+  widget type must be in `widget_type_registry`
+* Hub operators can adopt a published pattern into their hub; adoption
+  triggers a manifest amendment if new types are introduced
 * Governance templates (requirement categories, decision checklists) can be
-  cloned across hubs
+  cloned across hubs; cloning amends the cloning hub's manifest for new
+  categories
 * Widget adoption across hubs is tracked for aggregate friction and outcome
  analysis
 * Pattern authors receive friction and outcome feedback from all adopter hubs
  (opt-in anonymised)
+* Hub registry shows each hub's active capability manifest summary and GAAF
+  compliance status

 ### Exit Criteria

-* Hub registry renders all registered hubs with capability metadata
-* Widget pattern library lists published patterns with version history
-* A pattern can be published from one hub and adopted into another
+* Hub registry renders all registered hubs with their active capability
+  manifest declared vocabulary and current health score
+* Widget pattern library lists published patterns with version history; each
+  pattern's widget type is linked to its registry entry
+* A pattern can be published from one hub and adopted into another; adoption
+  triggers a manifest amendment draft when new types are introduced
 * Adoption tracking shows which hubs use which patterns
-* Governance template cloning works end-to-end
+* Governance template cloning works end-to-end; new categories appear in
+  the adopting hub's manifest amendment
 * Marketplace dashboard renders search and browse
+* Hub registry GAAF compliance indicator renders correctly for all hubs

 ### Data Artifacts Introduced

 `WidgetPattern`, `WidgetPatternVersion`, `PatternAdoption`, `GovernanceTemplate`,
 `GovernanceTemplateClone`

+Note: No `HubRegistry` table — the hub registry is a view over existing
+`hub_capability_manifests`, `hub_health_snapshots`, and `hubs` tables.
+
 ---

 ## Phase 11 — Advanced AI Federation
@@ -339,12 +454,14 @@ merely a record-keeping one.
 ## 7. Dependency Graph (Phases 9–12)

 ```
-Phase 8 (Federated) ──→ Phase 9 (External API)
-                              │
-                              ▼
-                        Phase 10 (Marketplace)
-                              │
-Phase 7 (Observability) ──→ Phase 11 (AI Federation)
+Phase 8 (Federated) ──→ IHUB-WP-0009 (GAAF Foundation) ──→ Phase 9 (External API)
+                              │                                      │
+                              │  type registries, manifests,         ▼
+                              │  contracts, fitness fns        Phase 10 (Marketplace)
+                              │                                      │
+                              └──────────────────────────────────────┤
+                                                                      │
+Phase 7 (Observability) ──→ Phase 11 (AI Federation) ←───────────────┘
 Phase 5 (Agent Assist)  ──┘       │
                                  ▼
                        Phase 12 (Platform Memory)
@@ -352,9 +469,18 @@ Phase 5 (Agent Assist)  ──┘       │
 Phase 4 (Outcomes) ───────────┘
 ```

+- **IHUB-WP-0009 (GAAF Compliance Foundation) is a prerequisite for Phase 9
+  and Phase 10.** It establishes the type registries, HubCapabilityManifest,
+  `/contracts/` directory, and architectural fitness functions that both phases
+  depend on. Phase 9 cannot generate a correct OpenAPI specification without
+  the type registries. Phase 10 cannot build its Hub Registry without the
+  manifest schema.
 - Phase 9 requires Phase 8 (stable federated schema, OAuth replaces per-hub
-  Bearer tokens)
- Phase 10 requires Phase 9 (marketplace API is built on v2 API surface)
+  Bearer tokens) and IHUB-WP-0009 (type registry enumerations, manifest-linked
+  API consumers)
+- Phase 10 requires Phase 9 (marketplace API is built on v2 API surface) and
+  IHUB-WP-0009 (Hub Registry = HubCapabilityManifest + discovery UI; widget
+  patterns reference type registry entries)
 - Phase 11 requires Phase 5 (agent model) and Phase 7 (observability signals
  needed for model routing and performance tracking)
 - Phase 12 requires Phase 4 (outcome signals), Phase 7 (friction/health
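Review note (not part of the commit): the Phase 9 prerequisite text says a submitted `event_type` is validated against the global `event_type_registry` and, for manifested consumers, against the manifest's `declared_event_types`, and that an unregistered value returns HTTP 422. A minimal sketch of that decision follows. The `Manifest` class, the registry representation (type name mapped to an owning hub slug, or `None` for framework-level types), the function name, and the use of 403 for the non-422 rejections are all assumptions made for illustration; the spec only fixes the 422 case.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Manifest:
    """Stand-in for an active HubCapabilityManifest (fields are hypothetical)."""
    hub_slug: str
    declared_event_types: Set[str]


def validate_event_type(
    event_type: str,
    registry: Dict[str, Optional[str]],
    manifest: Optional[Manifest],
) -> Optional[Tuple[int, str]]:
    """Return None if the submission is accepted, else (http_status, message).

    registry maps each registered event type to its owning hub slug,
    or to None for framework-level types (an assumed representation).
    """
    # Unregistered vocabulary is rejected with 422, as in the exit criteria.
    if event_type not in registry:
        return 422, f"event_type '{event_type}' is not in event_type_registry"

    if manifest is None:
        # Consumers without a manifest may write framework-level types only.
        if registry[event_type] is not None:
            # 403 is an assumption; the spec does not name this status.
            return 403, f"event_type '{event_type}' is hub-owned; a manifest is required"
        return None

    # Manifested consumers must also have declared the type.
    if event_type not in manifest.declared_event_types:
        return 403, (
            f"event_type '{event_type}' is not declared by the manifest "
            f"of hub '{manifest.hub_slug}'"
        )
    return None
```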
--- a/specs/OperationalArchitecture_v0.1.md
+++ b/specs/OperationalArchitecture_v0.1.md
@@ -0,0 +1,242 @@
+# Operational Architecture — NetKingdom / Railiance OAS
+**Version:** 0.1
+**Date:** 2026-03-31
+**Status:** Adopted — working document
+
+---
+
+## 1. Governing Principle
+
+> **The governor must not run on the governed.**
+
+The management plane and the application domain are operationally independent. Neither may have a hard runtime dependency on the other. Identity federation is the one permitted soft coupling, and it runs in one direction only: the application domain may optionally trust the management plane IdP; the management plane trusts nothing in the cluster.
+
+---
+
+## 2. Two-Domain Model
+
+### 2.1 Management Plane
+
+| Attribute | Value |
+|-----------|-------|
+| Host type | Dedicated NixOS VPS (e.g. Hetzner CX22 or equivalent) |
+| Provisioning | `nixos-anywhere` called from Terraform S1 (NixOS module added to existing S1 patterns) |
+| Runtime | systemd services under NixOS — no container orchestrator |
+| Config management | Declarative `configuration.nix`; atomic rollbacks via NixOS generations |
+| Secrets | `agenix` (NixOS-native, age-encrypted secrets in config repo) |
+
+**Workloads hosted on the management plane:**
+
+| Service | Role |
+|---------|------|
+| `the-custodian` (FastAPI + PostgreSQL) | State hub — decisions, workstreams, progress events |
+| `inter-hub` (IHP/Haskell) | Interaction Hub Framework — governed interaction substrate |
+| All domain hub instances (dev-hub, ops-hub, fin-hub, …) | Hub instances built on the inter-hub framework |
+| LLDAP (management users only) | Authoritative directory for operator accounts |
+| Authelia | SSO/OIDC for management-plane services |
+| ops-bridge | Management traffic entry point; not a governed workload itself |
+
+**What does NOT run here:**
+
+- Application workloads (markitect, kaizen-agentic, coulomb.social, activity-core, …)
+- The cluster-resident key-cape identity stack
+- Any service whose availability must depend on cluster health
+
+### 2.2 Application Domain
+
+| Attribute | Value |
+|-----------|-------|
+| Host(s) | COULOMBCORE + RAILIANCE01 |
+| Orchestration | k3s (Railiance OAS S1–S5: Terraform/Ansible → cnpg → ArgoCD/Helm) |
+| Config management | GitOps via ArgoCD |
+| Secrets | SOPS/age (existing cluster pattern) |
+
+**Workloads hosted in the cluster:**
+
+| Service | Role |
+|---------|------|
+| `key-cape` | Application-domain IdP: Authelia + LLDAP + privacyIDEA (SSO/MFA/OIDC) |
+| `markitect` | Application workload |
+| `kaizen-agentic` | Application workload |
+| `coulomb.social` | Application workload |
+| `activity-core` | Application workload |
+| cnpg PostgreSQL | Cluster-resident databases |
+| cert-manager / ACME | TLS for `*.coulomb.social` |
+
+**Status note (as of 2026-03-31):** key-cape stack (Authelia + LLDAP + privacyIDEA) is deployed and validated on RAILIANCE01 (NK-WP-0003 T01–T08 complete). T09 (backup, DR, monitoring) is the remaining task.
+
+---
+
+## 3. Identity and Security Architecture
+
+### 3.1 Stack Placement
+
+```
+Management Plane (NixOS)              Application Domain (k3s)
+─────────────────────────             ──────────────────────────────────
+LLDAP  ◀── operator accounts only     key-cape:
+Authelia ── OIDC for mgmt services      ├─ LLDAP  (application users)
+                                        ├─ Authelia  (SSO, OIDC broker)
+        optional upstream trust ──────▶ └─ privacyIDEA  (MFA)
+        (cluster Authelia may pull
+         mgmt LLDAP as upstream)
+```
+
+### 3.2 Federation Direction
+
+| Rule | Detail |
+|------|--------|
+| Management → Application | Management plane LLDAP can be registered as an upstream LDAP source in cluster Authelia, so operator accounts get cluster SSO without maintaining two passwords. This is **optional** and the cluster degrades gracefully if the management plane is unreachable. |
+| Application → Management | **Never.** Management-plane services authenticate against the local LLDAP/Authelia only. |
+
+### 3.3 Identity Lifecycle Phases
+
+**Phase 1 — Management-plane IdP, federated outward (current target)**
+
+- Management LLDAP is authoritative for all operator accounts
+- Cluster Authelia federates management LLDAP as upstream for operator SSO
+- Application-only users (if any) have direct accounts in cluster LLDAP
+- Simple, low overhead, suitable for small operator team + small application user population
+
+**Phase 2 — Full application-domain IdP, management users bridged in**
+
+- Triggered when application user population warrants independent governance
+- Cluster LLDAP becomes authoritative for application users
+- Management users are federated into the cluster (not the reverse)
+- Management plane remains fully independent — cluster IdP outage does not affect management operations
+- Migration path is clean because the coupling direction never reverses
+
+### 3.4 Secrets Management
+
+| Domain | Tool | Rationale |
+|--------|------|-----------|
+| Management plane | `agenix` | NixOS-native; age-encrypted secrets declared alongside `configuration.nix`; same age key material as SOPS |
+| Application domain | SOPS/age | Already established in cluster; ArgoCD + Helm secrets operator integration in place |
+| Bridging | Shared age key material | Both tools use age — operator key material can overlap; no second key infrastructure needed |
+
+---
+
+## 4. Operational Boundaries and Failure Modes
+
+### 4.1 Failure Independence
+
+| Failure scenario | Management plane impact | Application domain impact |
+|-----------------|------------------------|--------------------------|
+| Cluster down | None — management plane unaffected | Application workloads down |
+| Management plane down | Governance tooling unavailable | Application workloads continue; SSO may degrade for operator accounts if federation is configured (Phase 1 only) |
+| key-cape down | None | Application SSO down; management-plane auth unaffected |
+| Management LLDAP down | Management SSO down | Application SSO degrades for operator accounts (if Phase 1 federation); application users unaffected |
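+
+The independence property in this table (no application-side failure affects the management plane) could be kept honest with a small fitness check. The following is a sketch only: the dictionary is a hand transcription of the table, and `FAILURE_MATRIX`, `APPLICATION_SIDE`, and `independence_violations` are hypothetical names.
+
+```python
+# (management-plane impact, application-domain impact) per failure scenario,
+# transcribed by hand from the table in section 4.1.
+FAILURE_MATRIX = {
+    "cluster": ("none", "application workloads down"),
+    "management-plane": (
+        "governance tooling unavailable",
+        "workloads continue; operator SSO may degrade",
+    ),
+    "key-cape": ("none", "application SSO down"),
+    "management-lldap": (
+        "management SSO down",
+        "operator SSO degrades; application users unaffected",
+    ),
+}
+
+# Scenarios that originate in the application domain.
+APPLICATION_SIDE = ("cluster", "key-cape")
+
+
+def independence_violations(matrix: dict) -> list:
+    """List application-side failures that have any management-plane impact."""
+    return [name for name in APPLICATION_SIDE if matrix[name][0] != "none"]
+```
+
+An empty result from `independence_violations(FAILURE_MATRIX)` means the documented failure modes still respect the one-way dependency.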
+
+### 4.2 Network Topology
+
+- Management plane has no ingress dependency on the cluster
+- ops-bridge on the management plane provides the entry point for operator traffic to management services
+- Domain hubs (inter-hub instances) communicate with the cluster only via defined capability interfaces — no cluster-internal network access required
+
+---
+
+## 5. Hub and Framework Placement
+
+Inter-hub and all domain hub instances (dev-hub, ops-hub, fin-hub, etc.) run on the management plane, not as cluster workloads. This is a deliberate departure from Option A/C:
+
+- Hub instances are IHP/Haskell — their natural runtime is NixOS + systemd
+- IHP containerisation is non-trivial (Nix OCI build); NixOS systemd is the design target
+- Hubs govern cluster workloads — they must remain available when the cluster is disrupted
+- All hub instances share the same operational paradigm: NixOS configuration, `agenix` secrets, systemd service units
+
+Domain hubs communicate with cluster workloads exclusively through:
+- Registered capability interfaces (state-hub capability registry)
+- HTTPS endpoints (no cluster-internal DNS or service mesh access)
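+
+The "HTTPS only, no cluster-internal DNS" constraint can be sketched as an endpoint validator that a hub might run when a capability is registered. This is an illustrative assumption, not the state-hub registry's actual validation: `is_permitted_endpoint` and `INTERNAL_SUFFIXES` are hypothetical names, and the suffixes are the Kubernetes default cluster DNS names.
+
+```python
+from urllib.parse import urlsplit
+
+# Default Kubernetes in-cluster DNS suffixes (assumes the default cluster domain).
+INTERNAL_SUFFIXES = (".svc", ".cluster.local")
+
+
+def is_permitted_endpoint(url: str) -> bool:
+    """Accept only HTTPS URLs whose host is not a cluster-internal name."""
+    parts = urlsplit(url)
+    host = (parts.hostname or "").lower()
+    if parts.scheme != "https" or not host:
+        return False
+    if "." not in host:
+        # Single-label names such as "keycape" only resolve inside the cluster.
+        return False
+    return not host.endswith(INTERNAL_SUFFIXES)
+```
+
+For example, a hypothetical `https://example.coulomb.social/capabilities` passes, while `https://keycape.auth.svc.cluster.local` and any plain `http://` URL are rejected.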
+
+---
+
+## 6. Provisioning Sequence
+
+```
+S0  Workstation (current state)
+    └─ custodian running locally
+    └─ inter-hub in development
+
+S1  Provision management plane host
+    ├─ Terraform null_resource → nixos-anywhere → NixOS install
+    ├─ configuration.nix from inter-hub repo
+    └─ agenix secrets bootstrapped from operator workstation
+
+S2  Migrate custodian to management plane
+    └─ PostgreSQL → management plane (local, NixOS-managed)
+
+S3  Deploy inter-hub + hub instances to management plane
+    └─ systemd services, Authelia + LLDAP for management SSO
+
+S4  Complete key-cape NK-WP-0003 T09 (backup, DR, monitoring)
+    └─ key-cape fully operational in cluster
+
+S5  Configure identity federation (Phase 1)
+    └─ Cluster Authelia registers management LLDAP as upstream
+
+S6  Domain hubs connect to cluster workloads
+    └─ Capability registrations, HTTPS interface contracts
+```
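+
+The sequence above can be tracked with a simple gate that refuses to start a step before its predecessors are done. This sketch assumes the steps are strictly linear, which the listing implies but does not state; `SEQUENCE`, `can_start`, and `next_step` are hypothetical names.
+
+```python
+from typing import Optional, Set
+
+# Step identifiers in the order given in the provisioning sequence.
+SEQUENCE = ["S0", "S1", "S2", "S3", "S4", "S5", "S6"]
+
+
+def can_start(step: str, completed: Set[str]) -> bool:
+    """A step may start only when every earlier step is complete (assumed linear)."""
+    index = SEQUENCE.index(step)  # raises ValueError for an unknown step
+    return all(earlier in completed for earlier in SEQUENCE[:index])
+
+
+def next_step(completed: Set[str]) -> Optional[str]:
+    """Return the first step that is not yet complete, or None when all are done."""
+    for step in SEQUENCE:
+        if step not in completed:
+            return step
+    return None
+```
+
+With `completed = {"S0", "S1", "S2", "S3"}`, `next_step` returns `"S4"` and `can_start("S5", completed)` is `False`, so identity federation waits until key-cape is fully operational.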
+
+---
+
+## 7. Open Decisions
+
+| ID | Question | Owner | Status |
+|----|----------|-------|--------|
+| OA-D01 | Management plane host sizing and provider (Hetzner CX22 vs other) | Bernd | Open |
+| OA-D02 | Authelia version and config parity between management plane and key-cape | Bernd | Open |
+| OA-D03 | agenix key bootstrapping — which operator keys are age recipients on management plane | Bernd | Open |
+| OA-D04 | Trigger condition for Phase 2 identity migration (application user threshold or organisational event) | Bernd | Open |
+| OA-D05 | ops-bridge: reverse proxy (Caddy/nginx) or dedicated ingress component on management plane | Bernd | Open |
+
+---
+
+## 8. Relationship to Existing Specifications
+
+| Document | Relationship |
+|----------|-------------|
+| `specs/InteractionHubFrameworkSpecification_v0.2.md` | IHF spec — defines hub phases; hub placement in this architecture implements IHF Phases 9–12 deployment targets |
+| `SCOPE.md` | Situational guide for inter-hub development; this document governs where inter-hub runs |
+| NK-WP-0003 (state-hub) | Active workplan for key-cape cluster deployment — T09 is a prerequisite for S4 above |
+| Railiance OAS S1–S5 | Application domain provisioning patterns; NixOS management plane adds a NixOS module to S1 without replacing it |
+
+---
+
+## 9. Architecture Diagram
+
+```
+┌──────────────────────────────────────────────────────────────────────────────┐
+│  MANAGEMENT PLANE  (NixOS VPS)                                               │
+│                                                                              │
+│   ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────────────┐   │
+│   │  custodian  │  │  inter-hub  │  │  domain hubs                     │   │
+│   │  state-hub  │  │  (IHP/Hs)   │  │  dev-hub / ops-hub / fin-hub … │   │
+│   └─────────────┘  └─────────────┘  └──────────────────────────────────┘   │
+│                                                                              │
+│   ┌─────────────────────────────────────────────────┐                       │
+│   │  Identity (management users only)               │                       │
+│   │  LLDAP  ──▶  Authelia (OIDC)                   │                       │
+│   └────────────────────────┬────────────────────────┘                       │
+│                            │ optional upstream trust                        │
+│                            ▼                                                │
+└────────────────────────────┼────────────────────────────────────────────────┘
+                             │
+         ┌───────────────────┼──────────────────────────────────────────────┐
+         │  APPLICATION DOMAIN  (k3s — COULOMBCORE + RAILIANCE01)           │
+         │                   │                                              │
+         │   ┌───────────────▼────────────────────────────────┐            │
+         │   │  key-cape  (Authelia + LLDAP + privacyIDEA)   │            │
+         │   │  application IdP — *.coulomb.social            │            │
+         │   └────────────────────────────────────────────────┘            │
+         │                                                                  │
+         │   ┌────────────┐  ┌────────────────┐  ┌───────────────────┐    │
+         │   │ markitect  │  │ kaizen-agentic  │  │  coulomb.social   │    │
+         │   └────────────┘  └────────────────┘  └───────────────────┘    │
+         │                                                                  │
+         └──────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+*This document is a living specification. Decisions recorded in OA-D01–D05 should be resolved in state-hub as they close, and this document should be updated accordingly.*