feat(WP-0009): IHF GAAF Compliance Foundation — type registries, extension manifests, architectural contracts

Implements IHUB-WP-0009: closes four GAAF-2026 gaps before domain hub work begins.
- TypeRegistry helper + controllers/views (hub_kind, hub_capability_manifest)
- HubCapabilityManifest entity with validation and registry linkage
- ARCHITECTURE-LAYERS.md + CI-enforced boundary contracts
- Alembic migration 1743724800, fitness tests (Test/Architecture/)
- GAAF spec, Operational Architecture spec, domain hub extension guide
- Updates to CLAUDE.md, SCOPE.md, Schema.sql, Routes, FrontController, Types

state_hub_sync: pending (tunnel was STALE at completion time; run fix-consistency)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 21:17:39 +00:00
parent 1a7732d7da
commit b5d73aa18b
47 changed files with 4855 additions and 104 deletions


@@ -0,0 +1,234 @@
GoodApplicationArchitecture2026
*A guideline for building good software systems*
**Good Application Architecture Framework 2026 (GAAF-2026)**
**Standards Document**
**Version 1.0, 31 March 2026**
### 1. Introduction
The **Good Application Architecture Framework 2026 (GAAF-2026)** is a system-theoretic standard for designing, reviewing, and continuously improving software repositories, frameworks, and products. It separates different kinds of change into distinct layers so that rigidity protects stability, malleability enables product learning, extensibility supports controlled growth, and bounded variability keeps operational risk under control.
**One-line doctrine**
Freeze the core, evolve the function, bound the customization, constrain the configuration, and govern all change through explicit contracts.
GAAF-2026 turns architecture from an implicit art into a repeatable, measurable, enforceable control system. It is deliberately practical: every concept has an associated artifact, checklist, or automated fitness function that both humans and coding agents can apply immediately. It is designed for immediate adoption in any codebase (monorepo, framework, SaaS, open-source library) and scales across entire organizations.
### 2. Core Concept
GAAF-2026 views a software system as a **cybernetic control system** for managing change. It evaluates every architectural decision across five orthogonal dimensions:
| Dimension | Purpose |
|---------------|--------------------------------------|
| **Layer** | Where the change lives |
| **Contract** | How the change is constrained |
| **Lifecycle** | When the change is allowed |
| **Validation**| How correctness is ensured |
| **Failure Mode** | What happens when things break |
Evaluating every decision along all five dimensions keeps layer boundaries from eroding over time.
### 3. Layer Model (Final Form)
| Layer | Rigidity | Role | Contract Type | Lifecycle States | Defined Failure Mode | Primary Success Metric |
|------------------------|--------------|-------------------------------------------|------------------------|-----------------------------------|-----------------------------------------------|--------------------------------------------|
| **Core** | High (frozen)| Domain-agnostic primitives & invariants | Strong (versioned, immutable after v1) | Distilled only (rare promotion) | Fail-fast, never undefined behaviour | Replaceable only at major version boundaries |
| **Functional** | Medium | Value-realization modules | Medium (evolvable, versioned) | Experimental → Beta → Stable → Deprecated | Graceful degradation | Demand-driven, independently shippable |
| **Customization** | Low | Vendor/operator-controlled adaptation | Adaptive (migration-aware) | Versioned & migratable | Isolated per tenant/customer | Zero manual upgrade intervention |
| **Configuration** | Very Low | User-controlled declarative state | Schema (runtime-validated) | Dynamic but bounded | Reject invalid state BEFORE execution | Zero production incidents from bad config |
| **Extensions** (aspect)| Cross-cutting| Externally supplied Functional modules | Negotiated (manifest + capability) | Full lifecycle governed | Sandboxed (must not crash host) | Full compatibility matrix coverage |
**Dependency rule (strict)**:
Core ← Functional ← Customization ← Configuration
Extensions plug into Core or Functional only via contracts.
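The dependency rule can be checked mechanically. The following is a minimal sketch, not a prescribed implementation: it assumes the import graph has already been reduced to pairs of layer names (how modules map to layers is repository-specific), and the names `LAYER_ORDER` and `find_upward_dependencies` are illustrative. Extensions are left out because they attach through contracts rather than through the linear order.

```python
# Layers ordered from most rigid to most volatile, following
# Core <- Functional <- Customization <- Configuration.
LAYER_ORDER = ["core", "functional", "customization", "configuration"]
RANK = {name: index for index, name in enumerate(LAYER_ORDER)}


def find_upward_dependencies(edges):
    """Return the (importer, imported) pairs that break the dependency rule.

    `edges` is an iterable of (importer_layer, imported_layer) pairs.
    A layer may depend on itself or on a more rigid layer, never on a
    more volatile one.
    """
    return [(src, dst) for src, dst in edges if RANK[dst] > RANK[src]]


# Hypothetical edges extracted from an import graph.
edges = [
    ("functional", "core"),              # allowed: depends on a more rigid layer
    ("configuration", "customization"),  # allowed
    ("core", "functional"),              # violation: Core reaches into Functional
]
violations = find_upward_dependencies(edges)
```

A CI job could fail the build whenever `violations` is non-empty, which is one way to satisfy the enforcement requirement for this rule.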
### 4. Contract System (First-Class Artifact)
Every compliant repository **MUST** contain a top-level folder:
```
/contracts/
core/
functional/
customization/
config/
extensions/
```
A **Contract** is a versioned artifact that defines for any public surface:
- Interface
- Invariants (what must always hold)
- Compatibility rules
- Validation rules
Contract types per layer are listed in the layer table in §3.
### 5. Architectural Laws (Hard Review Criteria)
1. Change must occur in the highest appropriate layer.
2. Lower layers define contracts; upper layers consume them (downward dependencies only).
3. The more rigid the layer, the stronger the interface discipline.
4. Variability must be explicit (who, what, guarantees, validation, upgrade path).
5. Customer-specific value must not poison product evolution.
6. Configuration must never become a second programming language by accident.
7. Extensions must use seams, not surgery.
8. **Enforcement Law**: All rules above must be automatically verified by architectural fitness functions in CI.
### 6. Evolution Model
**Promotion path** (rare, bottom-up only)
Experiment (Functional) → Stable Functional → Core
**Extraction path**
Functional → Extension (external ownership)
**Decay path**
Functional → Deprecated → Removed
**Core rule**: Core is never designed top-down; it is distilled from proven Functional patterns that have demonstrated multi-use value.
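The three paths can be read as a small state machine. The sketch below is one possible encoding, under stated assumptions: the standard names the paths but not every edge, so the transition table (for example, that only a Stable module may be promoted to Core or extracted as an Extension) is an interpretation, and the names `Stage`, `ALLOWED`, and `can_transition` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    EXPERIMENTAL = "experimental"
    BETA = "beta"
    STABLE = "stable"
    DEPRECATED = "deprecated"
    REMOVED = "removed"
    CORE = "core"
    EXTENSION = "extension"


# Assumed transition table derived from the promotion, extraction,
# and decay paths. Core and Removed have no outgoing edges here.
ALLOWED = {
    Stage.EXPERIMENTAL: {Stage.BETA, Stage.DEPRECATED},
    Stage.BETA: {Stage.STABLE, Stage.DEPRECATED},
    Stage.STABLE: {Stage.CORE, Stage.EXTENSION, Stage.DEPRECATED},
    Stage.DEPRECATED: {Stage.REMOVED},
}


def can_transition(current, target):
    """True if moving from `current` to `target` follows one of the paths."""
    return target in ALLOWED.get(current, set())
```

Under this table an Experimental module cannot jump straight to Core, which matches the rule that Core is distilled from proven Functional patterns.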
### 7. Failure Model (Per-Layer Semantics)
Every contract must explicitly document the failure behaviour for its layer (see table in §3).
### 8. Validation & Architectural Fitness Functions
Every repository **MUST** implement automated checks:
- Import / dependency graph validation (no upward dependencies)
- Core breaking-change detection
- Config schema validation before any execution
- Extension manifest + lifecycle hook presence
- Layer boundary lint rules
- Demand-signal / cost-justification check for Functional and Customization changes
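As an illustration of "config schema validation before any execution", the following sketch validates a configuration dictionary and only then hands it to the code that uses it. The schema keys (`max_retries`, `log_level`), their bounds, and the helper names are hypothetical; a real repository would likely use its own schema tooling.

```python
class ConfigError(ValueError):
    """Raised when configuration fails validation; nothing has run yet."""


# Hypothetical schema: key -> (expected type, predicate on the value).
SCHEMA = {
    "max_retries": (int, lambda v: 0 <= v <= 10),
    "log_level": (str, lambda v: v in {"debug", "info", "warning", "error"}),
}


def validate_config(raw):
    """Return `raw` unchanged if it satisfies SCHEMA, else raise ConfigError."""
    errors = [f"unknown key: {key}" for key in raw if key not in SCHEMA]
    for key, (expected_type, is_in_bounds) in SCHEMA.items():
        if key not in raw:
            errors.append(f"missing key: {key}")
        elif type(raw[key]) is not expected_type:  # strict: bool is not accepted as int
            errors.append(f"{key}: expected {expected_type.__name__}")
        elif not is_in_bounds(raw[key]):
            errors.append(f"{key}: value out of bounds")
    if errors:
        raise ConfigError("; ".join(errors))
    return raw


def run_with_config(raw, action):
    """Validate first, so `action` is never called with invalid state."""
    return action(validate_config(raw))
```

Because validation happens inside `run_with_config` before `action` is invoked, an invalid configuration is rejected without any side effects from the action.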
### 9. Reusable 7-Phase Workplan
**Phase 0** Scope & Inventory
**Phase 1** Boundary & Contract Extraction
**Phase 2** Refactoring by Relocation
**Phase 3** Dependency Enforcement & Fitness Functions
**Phase 4** Validation Architecture + Failure Testing
**Phase 5** Governance & Release Discipline
**Phase 6** Scorecard & Continuous Improvement
**Required living artifact** in every repository:
`ARCHITECTURE-LAYERS.md` (see template in §12).
### 10. Scorecard
**Scoring scale** (0-5)
0 = absent / actively harmful
1 = weak / ad-hoc
2 = partial / inconsistent
3 = adequate / workable
4 = strong / disciplined
5 = excellent / exemplary
**Default weighting** (long-term systems):
Core 30 % | Functional 20 % | Customization 15 % | Configuration 10 % | Extensions 10 % | Cross-layer 15 %
**Core criteria (C1-C9)**
C1. Minimality
C2. Orthogonality
C3. Stability
C4. Correctness confidence
C5. Performance fitness
C6. Scope completeness
C7. Domain neutrality
C8. Contract clarity
C9. Invariant definition
**Functional criteria (F1-F8)**
F1. Module isolation
F2. Value efficiency
F3. Maturity labeling completeness
F4. Reuse of core
F5. Coupling discipline
F6. Change velocity fitness
F7. Third-party readiness
F8. Demand-signal discipline
**Customization criteria (U1-U8)**
U1. Boundary clarity
U2. Upgrade safety
U3. Contract discipline
U4. Migration reliability
U5. Quality control
U6. Tenant isolation
U7. Operational predictability
U8. Cost justification
**Configuration criteria (G1-G7)**
G1. Schema discipline
G2. Validation strength
G3. Safety of defaults
G4. Role & permission control
G5. Auditability
G6. Rollback & recovery
G7. State-space boundedness
**Extensions criteria (E1-E7)**
E1. Registration quality
E2. Contract clarity
E3. Isolation guarantees
E4. Testability
E5. Version compatibility
E6. Domain packaging fitness
E7. Developer experience
**Cross-layer criteria (X1-X8)**
X1. Layer clarity
X2. Dependency rule compliance
X3. Change placement
X4. Interface governance
X5. Architectural test coverage
X6. Operational maintainability
X7. Long-term evolvability
X8. Failure containment & economic alignment
**Interpretation**
≥ 4.5 = Exemplary
3.5-4.4 = Strong
2.5-3.4 = Usable but vulnerable
≤ 2.4 = Needs restructuring
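A worked example of the weighted total may help. The sketch below uses the default weights and the interpretation bands as stated; everything else is an assumption: the standard does not say how criterion scores roll up into a section score (the example takes section scores as given), and rounding the total to one decimal before banding is an interpretation of the one-decimal band limits. The function names and sample scores are hypothetical.

```python
# Default weights in percent (long-term systems).
WEIGHTS = {
    "core": 30,
    "functional": 20,
    "customization": 15,
    "configuration": 10,
    "extensions": 10,
    "cross_layer": 15,
}


def weighted_total(section_scores):
    """Combine per-section scores (0-5) into one weighted total (0-5)."""
    missing = set(WEIGHTS) - set(section_scores)
    if missing:
        raise KeyError(f"missing section scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * section_scores[name] for name in WEIGHTS) / 100


def interpret(total):
    """Map a weighted total to its band (assumes rounding to one decimal)."""
    rounded = round(total, 1)
    if rounded >= 4.5:
        return "Exemplary"
    if rounded >= 3.5:
        return "Strong"
    if rounded >= 2.5:
        return "Usable but vulnerable"
    return "Needs restructuring"


# Hypothetical review result.
scores = {
    "core": 4,
    "functional": 3,
    "customization": 3,
    "configuration": 4,
    "extensions": 2,
    "cross_layer": 3,
}
total = weighted_total(scores)  # (120 + 60 + 45 + 40 + 20 + 45) / 100
band = interpret(total)
```

With these sample scores the weighted sum is 330 / 100, which falls in the "Usable but vulnerable" band even though Core scores 4, because the weaker sections pull the total down.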
### 11. Economic Alignment (Value-Driven Evolution)
- Functional modules require an explicit **demand signal**.
- Customization requires **per-instance cost justification**.
- Core changes require **proven multi-use / reuse benefit** across domains.
This ensures architecture directly supports business economics.
### 12. Practical Artifacts & Templates
#### 12.1 ARCHITECTURE-LAYERS.md Template
```markdown
# ARCHITECTURE-LAYERS.md
**Framework:** GAAF-2026
**Last reviewed:** YYYY-MM-DD
**Weighted scorecard:** XX % (see scorecard.xlsx)
**Repository purpose:**
**Layer map:** Core: … | Functional: … | …
**Decisions log:**
**Next review:** YYYY-MM-DD
```
#### 12.2 Standard Review Output Template
**Repository**
Name: …
Purpose: …
Maturity: …
Review date: …
**Layer map**
- Core: …
- Functional: …
- etc.
**Major findings**
Strengths / Violations / Risks / Fast wins / Strategic refactors
**Scores** (per section + weighted total)
**Priority actions** (P1-P3)
**Migration concerns**
**Decision** (Keep / Refine / Refactor / Re-architect)
#### 12.3 Good-Signs / Bad-Signs Heuristics (Quick Checklist for Humans & Agents)
**Good signs**
- Core is small and boring
- Modules are easy to add or remove
- Customer logic lives outside product code
- Config has strong validation
- Extension seams are explicit and registered
- Upgrades require zero heroics
**Bad signs**
- Core changes every month
- Features bypass core contracts
- Customers implemented as branches in code
- Config contains arbitrary expressions
- Plugins patch internal state
- Releases need manual per-customer repair
#### 12.4 Example Optimization Backlog Categories
- **Core backlog**: shrink surface, remove domain leakage, formalize invariants
- **Functional backlog**: split coupled modules, mark maturity, eliminate core duplication
- **Customization backlog**: replace forks with rules/workflows, add manifest & migration engine
- **Configuration backlog**: add typed schemas, guardrails, audit log
- **Extension backlog**: define registration API, lifecycle, compatibility matrix, test kit
### 13. Compliance Definition
A repository is **GAAF-2026 compliant** if and only if it satisfies **all** of the following:
1. Layers are separated as defined.
2. Explicit contracts exist in `/contracts/`.
3. Strict downward dependency direction is enforced.
4. Lifecycle states are declared and respected.
5. Upgradeability is guaranteed via bounded customization.
6. All user-controlled variability is validated.
7. Extensibility uses registered, contract-based mechanisms.
8. Failure is contained within defined per-layer boundaries.
9. Compliance is continuously measured via scorecard and fitness functions.
### 14. Adoption & Next Steps
- **For humans**: Run the workplan at every major release, or whenever the weighted scorecard total falls below 3.5.
- **For agents**: Feed this entire document + the `ARCHITECTURE-LAYERS.md` into any coding or review prompt.
- **Automation**: Implement the fitness functions listed in §8 as the first CI jobs.
- **Repository starter kit**: Create the `/contracts/` folder and `ARCHITECTURE-LAYERS.md` on day one.
This document is the single source of truth for GAAF-2026. It is intentionally self-contained, versioned, and ready for inclusion in every repository, Dev Hub, or organizational standard library.
**Approved for use across all systems.**
**Next scheduled framework review: March 2027.**


@@ -121,13 +121,59 @@ Phase 8 established federated governance within a single deployment. Phase 9
exposes that governance state as a stable, versioned, authenticated REST API and
ships consumer SDKs that make integration a day's work rather than a project.
### GAAF Foundation Prerequisite
> **IHUB-WP-0009 (GAAF Compliance Foundation) must be complete before Phase 9
> begins.**
Phase 9 generates an OpenAPI 3.1 specification that documents all IHF API
fields. Three of those fields (`widget_type`, `event_type`, and `category`)
are type discriminators. If they are documented as arbitrary `string` values,
the API contract is immediately incorrect: consumers will invent values that
diverge from the IHF vocabulary, breaking cross-hub aggregation and federation.
IHUB-WP-0009 establishes the four type registries that enumerate these fields.
Phase 9 must read from those registries to generate correct `enum` arrays in
the OpenAPI spec. Building Phase 9 first and retrofitting enums later is a
breaking API change.
**Specific GAAF dependencies for Phase 9 implementation:**
1. **Type registry enumerations in OpenAPI** — The spec generator must query
`widget_type_registry`, `event_type_registry`, and
`annotation_category_registry` to produce `enum` arrays for the
corresponding fields. The generated spec must NOT document these as
unconstrained `string`.
2. **ApiConsumer linked to HubCapabilityManifest** — A `domain` hub
authenticating as an API consumer is identified by its active
`HubCapabilityManifest`. The `ApiConsumer` record should carry a
`hub_capability_manifest_id` FK (nullable — non-hub consumers such as
third-party tools authenticate without a manifest). When a manifested
consumer submits an event, the `event_type` is validated against both the
global `event_type_registry` and the manifest's `declared_event_types`.
3. **OAuth scope alignment with registered vocabulary** — OAuth scopes should
include hub-specific scope claims (`hub:{slug}:write`) that the token
exchange validates against the hub's active manifest. A consumer without a
manifest can only write framework-level event types; hub-owned types require
the corresponding hub scope.
4. **Contract file reference** — The OpenAPI spec must reference
`/contracts/functional/interaction-reporting-v1.md` as its human-readable
companion. The generated spec is derived data; the contract file is
authoritative intent.
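The first dependency can be illustrated with a short sketch of how registry contents might be turned into OpenAPI `enum` schemas. This is an assumption-laden example, not the spec generator itself: it treats each registry as a plain iterable of values, the function names are hypothetical, and the sample values (`form`, `chart`, and so on) are invented for illustration.

```python
# Field-to-registry mapping taken from item 1 above.
FIELD_TO_REGISTRY = {
    "widget_type": "widget_type_registry",
    "event_type": "event_type_registry",
    "category": "annotation_category_registry",
}


def enum_schema(values):
    """Build a string schema constrained to the registered values."""
    unique = sorted(set(values))
    if not unique:
        # Refuse to fall back to an unconstrained string.
        raise ValueError("registry is empty; cannot emit an enum")
    return {"type": "string", "enum": unique}


def discriminator_schemas(registries):
    """`registries` maps registry name -> iterable of registered values."""
    return {
        field: enum_schema(registries[name])
        for field, name in FIELD_TO_REGISTRY.items()
    }


# Hypothetical registry contents.
registries = {
    "widget_type_registry": ["form", "chart", "form"],
    "event_type_registry": ["submit", "click"],
    "annotation_category_registry": ["friction"],
}
schemas = discriminator_schemas(registries)
```

Raising on an empty registry, rather than emitting a bare `string`, keeps the generated spec from silently violating the requirement that these fields are never unconstrained.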
### Scope
* Versioned REST API (`/api/v2/`) for all core IHF artifact types
* OpenAPI 3.1 specification generated from the live schema
* OpenAPI 3.1 specification generated from the live schema, with type registry
enumerations for all type discriminator fields
* Authentication: OAuth 2.0 client credentials flow (superseding per-hub Bearer tokens)
* API key management UI for external consumers
* Consumer SDKs: TypeScript/Node, Python
* API key management UI for external consumers; domain hub consumers linked to
their active HubCapabilityManifest
* Consumer SDKs: TypeScript/Node, Python (type-safe enums generated from
type registries)
* Webhook delivery for interaction events, candidate creation, and decision records
* API usage dashboard: request counts, error rates, consumer identity
* Rate limiting and quota management per consumer
@@ -137,24 +183,33 @@ ships consumer SDKs that make integration a day's work rather than a project.
* External systems can read widget registry, interaction events, annotations,
requirement candidates, decisions, deployments, and outcome signals
* External systems can submit interaction events and annotations via the API
* Domain hub consumers submitting hub-owned event types require a matching
active HubCapabilityManifest
* Downstream hubs can subscribe to governance events via webhooks
* SDK consumers get type-safe access to IHF contracts without reading the spec
* SDK consumers get type-safe access to IHF contracts without reading the spec;
SDK enum types are generated from the live type registries
* API consumers are tracked, quotaed, and auditable
### Exit Criteria
* All core IHF artifact types are readable via `/api/v2/`
* Interaction events and annotations are writable via `/api/v2/`
* OpenAPI spec is generated and accurate
* TypeScript SDK and Python SDK published (as static files or packages)
* OpenAPI spec is generated and accurate; `widget_type`, `event_type`, and
`category` fields carry `enum` arrays derived from the type registries
* TypeScript SDK and Python SDK published (as static files or packages); both
export typed enums for widget types and event types
* Webhook delivery confirmed for at least two event types
* API usage dashboard renders correctly
* OAuth token flow works end-to-end
* Submission of an unregistered `event_type` returns HTTP 422 with a
registry-referenced error message
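The last criterion could be exercised with a check along these lines. It is a sketch under assumptions: registries and manifest declarations are modelled as plain sets, the function name and error payload shape are hypothetical, and the status code for a registered type missing from the consumer's manifest is not fixed by the criteria (422 is reused here for simplicity).

```python
def check_event_type(event_type, event_type_registry, declared_event_types=None):
    """Return None if the event type is accepted, else an error payload.

    `declared_event_types` is None for consumers without a manifest.
    """
    if event_type not in event_type_registry:
        return {
            "status": 422,
            "detail": f"event_type {event_type!r} is not registered in event_type_registry",
        }
    if declared_event_types is not None and event_type not in declared_event_types:
        # Assumed status; only the unregistered case is specified as 422.
        return {
            "status": 422,
            "detail": f"event_type {event_type!r} is not in the manifest's declared_event_types",
        }
    return None


# Hypothetical vocabulary.
registry = {"submit", "click", "deploy"}
manifest_types = {"submit", "click"}

ok = check_event_type("submit", registry, manifest_types)
unregistered = check_event_type("explode", registry, manifest_types)
undeclared = check_event_type("deploy", registry, manifest_types)
```

The error detail names the registry that rejected the value, which is one reading of "registry-referenced error message".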
### Data Artifacts Introduced
`ApiConsumer`, `ApiKey`, `WebhookSubscription`, `WebhookDelivery`
Schema additions: `api_consumers.hub_capability_manifest_id` (FK, nullable)
---
## Phase 10 — Hub Registry and Widget Marketplace
@@ -166,45 +221,105 @@ configurations across deployments. Phase 9 made the IHF externally consumable.
Phase 10 makes it composable: hubs and widgets can be discovered, rated,
adopted, and evolved as shared platform assets.
### GAAF Foundation Integration
> **Phase 10's Hub Registry IS the `HubCapabilityManifest` table, extended with
> a public-facing discovery UI.** It is not a separate data store. IHUB-WP-0009
> must be complete before Phase 10 begins.
The Hub Registry in Phase 10 is the public-facing projection of the capability
manifests introduced in IHUB-WP-0009. Every registered hub already has an
active `HubCapabilityManifest` that declares its widget types, event types,
annotation categories, and policy vocabulary. Phase 10 adds the browsability,
pattern publishing, and adoption mechanics on top of that existing foundation.
**Specific GAAF integration points for Phase 10 implementation:**
1. **Hub Registry = active HubCapabilityManifest + HubHealthSnapshot** — The
hub registry view is a join of `hub_capability_manifests` (status=active),
`hub_health_snapshots` (latest), and `hubs`. No new hub registry table is
required. The data already exists; Phase 10 adds the discovery UI.
2. **Widget patterns reference registered types** — A `WidgetPattern` record
must declare a `widget_type` that exists in `widget_type_registry`. When
publishing a pattern, if the `widget_type` is owned by another hub, the
pattern is cross-hub and requires that hub's acknowledgement (or uses a
framework-level type). This prevents patterns from encoding unregistered
vocabulary.
3. **Pattern adoption triggers manifest update** — When a hub adopts a
`WidgetPattern`, if the pattern's `widget_type` is not in the adopting
hub's `declared_widget_types`, the adopting hub's manifest is updated to
include it (in draft amendment mode). The hub operator must re-activate
the amended manifest. This ensures the adopting hub's type vocabulary stays
coherent with its actual widget usage.
4. **Governance templates reference registered categories** — A
`GovernanceTemplate` for requirement categories must reference entries in
`annotation_category_registry`. Template cloning adds any new categories
to the cloning hub's manifest (draft amendment).
5. **Hub registry GAAF compliance score** — The hub registry should display
each hub's GAAF compliance indicator: whether it has an active manifest,
how many registered types it owns, and whether the architecture fitness
functions report any violations. This makes GAAF compliance visible as a
platform-level metric.
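Integration points 2 and 3 can be sketched together: adoption is rejected for unregistered widget types, and otherwise produces a draft amendment only when the adopting hub's vocabulary actually changes. The `Manifest` dataclass, its fields beyond `declared_widget_types`, and the function name are hypothetical; the real manifest schema is defined by IHUB-WP-0009.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Manifest:
    hub_slug: str
    status: str  # "active" or "draft"
    declared_widget_types: frozenset


def draft_amendment_for_adoption(manifest, pattern_widget_type, widget_type_registry):
    """Return a draft amendment, or None if the manifest already covers the type.

    The active manifest is never mutated; the operator re-activates the draft.
    """
    if pattern_widget_type not in widget_type_registry:
        raise ValueError(f"widget_type {pattern_widget_type!r} is not in widget_type_registry")
    if pattern_widget_type in manifest.declared_widget_types:
        return None
    return replace(
        manifest,
        status="draft",
        declared_widget_types=manifest.declared_widget_types | {pattern_widget_type},
    )


# Hypothetical data.
registry = {"form", "chart"}
active = Manifest("ops-hub", "active", frozenset({"form"}))
amendment = draft_amendment_for_adoption(active, "chart", registry)
```

Using a frozen dataclass means the active manifest stays untouched while the draft exists, which mirrors the requirement that the operator must explicitly re-activate the amended manifest.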
### Scope
* Hub registry: a catalog of registered hubs with public metadata, capability
declarations, and health summaries
* Widget pattern library: reusable widget definitions that can be instantiated
into any hub
* Governance template library: requirement distillation and decision templates
that can be cloned across hubs
* Hub registry: a catalog of registered hubs built on `HubCapabilityManifest`
+ `HubHealthSnapshot`, with public metadata, declared vocabulary, and health
summaries
* Widget pattern library: reusable widget definitions tied to registered types
from `widget_type_registry`
* Governance template library: requirement distillation and decision templates,
tied to registered annotation categories
* Widget ratings and adoption tracking: which widgets are in use where, with
aggregated friction scores across deployments
* Pattern versioning: widget patterns have explicit versions; hubs can pin or
follow-latest
* Pattern adoption with manifest amendment workflow: adoption updates the
adopting hub's capability manifest when new types are introduced
* Marketplace dashboard: browse, search, and adopt patterns
### Functional Capabilities
* Hub operators can publish a widget pattern to the shared library
* Hub operators can adopt a published pattern into their hub
* Hub operators can publish a widget pattern to the shared library; pattern
widget type must be in `widget_type_registry`
* Hub operators can adopt a published pattern into their hub; adoption
triggers a manifest amendment if new types are introduced
* Governance templates (requirement categories, decision checklists) can be
cloned across hubs
cloned across hubs; cloning amends the cloning hub's manifest for new
categories
* Widget adoption across hubs is tracked for aggregate friction and outcome
analysis
* Pattern authors receive friction and outcome feedback from all adopter hubs
(opt-in anonymised)
* Hub registry shows each hub's active capability manifest summary and GAAF
compliance status
### Exit Criteria
* Hub registry renders all registered hubs with capability metadata
* Widget pattern library lists published patterns with version history
* A pattern can be published from one hub and adopted into another
* Hub registry renders all registered hubs with their active capability
manifest declared vocabulary and current health score
* Widget pattern library lists published patterns with version history; each
pattern's widget type is linked to its registry entry
* A pattern can be published from one hub and adopted into another; adoption
triggers a manifest amendment draft when new types are introduced
* Adoption tracking shows which hubs use which patterns
* Governance template cloning works end-to-end
* Governance template cloning works end-to-end; new categories appear in
the adopting hub's manifest amendment
* Marketplace dashboard renders search and browse
* Hub registry GAAF compliance indicator renders correctly for all hubs
### Data Artifacts Introduced
`WidgetPattern`, `WidgetPatternVersion`, `PatternAdoption`, `GovernanceTemplate`,
`GovernanceTemplateClone`
Note: No `HubRegistry` table — the hub registry is a view over existing
`hub_capability_manifests`, `hub_health_snapshots`, and `hubs` tables.
---
## Phase 11 — Advanced AI Federation
@@ -339,12 +454,14 @@ merely a record-keeping one.
## 7. Dependency Graph (Phases 9-12)
```
Phase 8 (Federated) ──→ Phase 9 (External API)
Phase 10 (Marketplace)
Phase 7 (Observability) ──→ Phase 11 (AI Federation)
Phase 8 (Federated) ──→ IHUB-WP-0009 (GAAF Foundation) ──→ Phase 9 (External API)
│ type registries, manifests,
│ contracts, fitness fns Phase 10 (Marketplace)
└──────────────────────────────────────┤
Phase 7 (Observability) ──→ Phase 11 (AI Federation) ←───────────────┘
Phase 5 (Agent Assist) ──┘ │
Phase 12 (Platform Memory)
@@ -352,9 +469,18 @@ Phase 5 (Agent Assist) ──┘ │
Phase 4 (Outcomes) ───────────┘
```
- **IHUB-WP-0009 (GAAF Compliance Foundation) is a prerequisite for Phase 9
and Phase 10.** It establishes the type registries, HubCapabilityManifest,
`/contracts/` directory, and architectural fitness functions that both phases
depend on. Phase 9 cannot generate a correct OpenAPI specification without
the type registries. Phase 10 cannot build its Hub Registry without the
manifest schema.
- Phase 9 requires Phase 8 (stable federated schema, OAuth replaces per-hub
Bearer tokens)
- Phase 10 requires Phase 9 (marketplace API is built on v2 API surface)
Bearer tokens) and IHUB-WP-0009 (type registry enumerations, manifest-linked
API consumers)
- Phase 10 requires Phase 9 (marketplace API is built on v2 API surface) and
IHUB-WP-0009 (Hub Registry = HubCapabilityManifest + discovery UI; widget
patterns reference type registry entries)
- Phase 11 requires Phase 5 (agent model) and Phase 7 (observability signals
needed for model routing and performance tracking)
- Phase 12 requires Phase 4 (outcome signals), Phase 7 (friction/health


@@ -0,0 +1,242 @@
# Operational Architecture — NetKingdom / Railiance OAS
**Version:** 0.1
**Date:** 2026-03-31
**Status:** Adopted — working document
---
## 1. Governing Principle
> **The governor must not run on the governed.**
The management plane and the application domain are operationally independent. Neither may have a hard runtime dependency on the other. Identity federation is the one permitted soft coupling, and it runs in one direction only: the application domain may optionally trust the management plane IdP; the management plane trusts nothing in the cluster.
---
## 2. Two-Domain Model
### 2.1 Management Plane
| Attribute | Value |
|-----------|-------|
| Host type | Dedicated NixOS VPS (e.g. Hetzner CX22 or equivalent) |
| Provisioning | `nixos-anywhere` called from Terraform S1 (NixOS module added to existing S1 patterns) |
| Runtime | systemd services under NixOS — no container orchestrator |
| Config management | Declarative `configuration.nix`; atomic rollbacks via NixOS generations |
| Secrets | `agenix` (NixOS-native, age-encrypted secrets in config repo) |
**Workloads hosted on the management plane:**
| Service | Role |
|---------|------|
| `the-custodian` (FastAPI + PostgreSQL) | State hub — decisions, workstreams, progress events |
| `inter-hub` (IHP/Haskell) | Interaction Hub Framework — governed interaction substrate |
| All domain hub instances (dev-hub, ops-hub, fin-hub, …) | Hub instances built on the inter-hub framework |
| LLDAP (management users only) | Authoritative directory for operator accounts |
| Authelia | SSO/OIDC for management-plane services |
| ops-bridge | Management traffic entry point; not a governed workload itself |
**What does NOT run here:**
- Application workloads (markitect, kaizen-agentic, coulomb.social, activity-core, …)
- The cluster-resident key-cape identity stack
- Any service whose availability must depend on cluster health
### 2.2 Application Domain
| Attribute | Value |
|-----------|-------|
| Host(s) | COULOMBCORE + RAILIANCE01 |
| Orchestration | k3s (Railiance OAS S1-S5: Terraform/Ansible → cnpg → ArgoCD/Helm) |
| Config management | GitOps via ArgoCD |
| Secrets | SOPS/age (existing cluster pattern) |
**Workloads hosted in the cluster:**
| Service | Role |
|---------|------|
| `key-cape` | Application-domain IdP: Authelia + LLDAP + privacyIDEA (SSO/MFA/OIDC) |
| `markitect` | Application workload |
| `kaizen-agentic` | Application workload |
| `coulomb.social` | Application workload |
| `activity-core` | Application workload |
| cnpg PostgreSQL | Cluster-resident databases |
| cert-manager / ACME | TLS for `*.coulomb.social` |
**Status note (as of 2026-03-31):** key-cape stack (Authelia + LLDAP + privacyIDEA) is deployed and validated on RAILIANCE01 (NK-WP-0003 T01-T08 complete). T09 (backup, DR, monitoring) is the remaining task.
---
## 3. Identity and Security Architecture
### 3.1 Stack Placement
```
Management Plane (NixOS) Application Domain (k3s)
───────────────────────── ──────────────────────────────────
LLDAP ◀── operator accounts only key-cape:
Authelia ── OIDC for mgmt services ├─ LLDAP (application users)
├─ Authelia (SSO, OIDC broker)
optional upstream trust ──────▶ └─ privacyIDEA (MFA)
(cluster Authelia may pull
mgmt LLDAP as upstream)
```
### 3.2 Federation Direction
| Rule | Detail |
|------|--------|
| Management → Application | Management plane LLDAP can be registered as an upstream LDAP source in cluster Authelia, so operator accounts get cluster SSO without maintaining two passwords. This is **optional** and the cluster degrades gracefully if the management plane is unreachable. |
| Application → Management | **Never.** Management-plane services authenticate against the local LLDAP/Authelia only. |
### 3.3 Identity Lifecycle Phases
**Phase 1 — Management-plane IdP, federated outward (current target)**
- Management LLDAP is authoritative for all operator accounts
- Cluster Authelia federates management LLDAP as upstream for operator SSO
- Application-only users (if any) have direct accounts in cluster LLDAP
- Simple, low overhead, suitable for small operator team + small application user population
**Phase 2 — Full application-domain IdP, management users bridged in**
- Triggered when application user population warrants independent governance
- Cluster LLDAP becomes authoritative for application users
- Management users are federated into the cluster (not the reverse)
- Management plane remains fully independent — cluster IdP outage does not affect management operations
- Migration path is clean because the coupling direction never reverses
### 3.4 Secrets Management
| Domain | Tool | Rationale |
|--------|------|-----------|
| Management plane | `agenix` | NixOS-native; age-encrypted secrets declared alongside `configuration.nix`; same age key material as SOPS |
| Application domain | SOPS/age | Already established in cluster; ArgoCD + Helm secrets operator integration in place |
| Bridging | Shared age key material | Both tools use age — operator key material can overlap; no second key infrastructure needed |
---
## 4. Operational Boundaries and Failure Modes
### 4.1 Failure Independence
| Failure scenario | Management plane impact | Application domain impact |
|-----------------|------------------------|--------------------------|
| Cluster down | None — management plane unaffected | Application workloads down |
| Management plane down | Governance tooling unavailable | Application workloads continue; SSO may degrade for operator accounts if federation configured (Phase 1 only) |
| key-cape down | None | Application SSO down; management-plane auth unaffected |
| Management LLDAP down | Management SSO down | Application SSO degrades for operator accounts (if Phase 1 federation); application users unaffected |
### 4.2 Network Topology
- Management plane has no ingress dependency on the cluster
- ops-bridge on the management plane provides the entry point for operator traffic to management services
- Domain hubs (inter-hub instances) communicate with the cluster only via defined capability interfaces — no cluster-internal network access required
---
## 5. Hub and Framework Placement
Inter-hub and all domain hub instances (dev-hub, ops-hub, fin-hub, etc.) run on the management plane, not as cluster workloads. This is a deliberate departure from Option A/C:
- Hub instances are IHP/Haskell — their natural runtime is NixOS + systemd
- IHP containerisation is non-trivial (Nix OCI build); NixOS systemd is the design target
- Hubs govern cluster workloads — they must remain available when the cluster is disrupted
- All hub instances share the same operational paradigm: NixOS configuration, `agenix` secrets, systemd service units
Domain hubs communicate with cluster workloads exclusively through:
- Registered capability interfaces (state-hub capability registry)
- HTTPS endpoints (no cluster-internal DNS or service mesh access)
---
## 6. Provisioning Sequence
```
S0 Workstation (current state)
└─ custodian running locally
└─ inter-hub in development
S1 Provision management plane host
├─ Terraform null_resource → nixos-anywhere → NixOS install
├─ configuration.nix from inter-hub repo
└─ agenix secrets bootstrapped from operator workstation
S2 Migrate custodian to management plane
└─ PostgreSQL → management plane (local, NixOS-managed)
S3 Deploy inter-hub + hub instances to management plane
└─ systemd services, Authelia + LLDAP for management SSO
S4 Complete key-cape NK-WP-0003 T09 (backup, DR, monitoring)
└─ key-cape fully operational in cluster
S5 Configure identity federation (Phase 1)
└─ Cluster Authelia registers management LLDAP as upstream
S6 Domain hubs connect to cluster workloads
└─ Capability registrations, HTTPS interface contracts
```
---
## 7. Open Decisions
| ID | Question | Owner | Status |
|----|----------|-------|--------|
| OA-D01 | Management plane host sizing and provider (Hetzner CX22 vs other) | Bernd | Open |
| OA-D02 | Authelia version and config parity between management plane and key-cape | Bernd | Open |
| OA-D03 | agenix key bootstrapping — which operator keys are age recipients on management plane | Bernd | Open |
| OA-D04 | Trigger condition for Phase 2 identity migration (application user threshold or organisational event) | Bernd | Open |
| OA-D05 | ops-bridge: reverse proxy (Caddy/nginx) or dedicated ingress component on management plane | Bernd | Open |
---
## 8. Relationship to Existing Specifications
| Document | Relationship |
|----------|-------------|
| `specs/InteractionHubFrameworkSpecification_v0.2.md` | IHF spec: defines hub phases; hub placement in this architecture implements IHF Phases 9-12 deployment targets |
| `SCOPE.md` | Situational guide for inter-hub development; this document governs where inter-hub runs |
| NK-WP-0003 (state-hub) | Active workplan for key-cape cluster deployment — T09 is a prerequisite for S4 above |
| Railiance OAS S1-S5 | Application domain provisioning patterns; NixOS management plane adds a NixOS module to S1 without replacing it |
---
## 9. Architecture Diagram
```
┌──────────────────────────────────────────────────────────────────────────────┐
│ MANAGEMENT PLANE (NixOS VPS) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────────────────────────┐ │
│ │ custodian │ │ inter-hub │ │ domain hubs │ │
│ │ state-hub │ │ (IHP/Hs) │ │ dev-hub / ops-hub / fin-hub … │ │
│ └─────────────┘ └─────────────┘ └──────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Identity (management users only) │ │
│ │ LLDAP ──▶ Authelia (OIDC) │ │
│ └────────────────────────┬────────────────────────┘ │
│ │ optional upstream trust │
│ ▼ │
└────────────────────────────┼────────────────────────────────────────────────┘
┌───────────────────┼──────────────────────────────────────────────┐
│ APPLICATION DOMAIN (k3s — COULOMBCORE + RAILIANCE01) │
│ │ │
│ ┌───────────────▼────────────────────────────────┐ │
│ │ key-cape (Authelia + LLDAP + privacyIDEA) │ │
│ │ application IdP — *.coulomb.social │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ ┌────────────┐ ┌────────────────┐ ┌───────────────────┐ │
│ │ markitect │ │ kaizen-agentic │ │ coulomb.social │ │
│ └────────────┘ └────────────────┘ └───────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
```
---
*This document is a living specification. Decisions recorded in OA-D01 to OA-D05 should be resolved in state-hub as they close, and this document updated accordingly.*