diff --git a/docs/adr/ADR-0006-recursive-multi-tenant-identity-authorization.md b/docs/adr/ADR-0006-recursive-multi-tenant-identity-authorization.md
new file mode 100644
index 0000000..a2a32d3
--- /dev/null
+++ b/docs/adr/ADR-0006-recursive-multi-tenant-identity-authorization.md
@@ -0,0 +1,96 @@
+# ADR-0006 - Recursive Multi-Tenant Identity and Authorization Architecture
+
+**Status:** Accepted
+**Date:** 2026-05-17
+**Deciders:** Bernd Worsch
+
+## Context
+
+The Coulomb platform is being built from the same repositories and
+services that will later support other use cases. This creates a
+recursive architecture problem: Coulomb needs to use the shared identity,
+security, policy, and deployment capabilities, while those capabilities
+are themselves part of the infrastructure being built.
+
+If this recursion is left implicit, the first internal use case can drift
+into being treated as the platform root of trust. That would make future
+multi-tenant use harder, blur operational authority, and make secure
+bootstrap/recovery decisions harder to reason about.
+
+NetKingdom already owns identity and security architecture concerns.
+key-cape provides a lightweight IAM implementation of the NetKingdom IAM
+Profile. Keycloak remains the expanded production IAM option. privacyIDEA
+is relevant for MFA/token lifecycle. flex-auth is emerging as the
+canonical authorization control plane and practical reference
+implementation of CARING authorization semantics. Topaz is the most
+likely first delegated authorization runtime behind flex-auth.
+
+## Decision
+
+We will document and implement the platform security architecture as a
+recursive multi-tenant architecture with three explicit planes:
+
+- **Bootstrap plane** - establishes the first trusted runtime and recovery
+  authority before normal platform services exist.
+- **Platform control plane** - operates shared identity, MFA, secrets,
+  authorization, policy, audit, and explanation services.
+- **Tenant plane** - runs Coulomb and future workloads under scoped tenant
+  authority.
+
+Coulomb will be treated as the first internal/reference tenant, not as the
+platform root of trust.
+
+NetKingdom will own the canonical security architecture and standards.
+Railiance will own deployment layering and orchestration boundaries.
+flex-auth will own the canonical authorization interface and CARING-based
+policy/decision model. Topaz will be the first delegated PDP runtime,
+with other authorization engines treated as adapters where useful.
+
+## Consequences
+
+- Architecture documentation must separate platform-root authority from
+  tenant administration, even for Coulomb.
+- Workplans for identity, authorization, and bootstrapping must include
+  explicit tenant and control-plane boundaries.
+- Bootstrap design must include trust-state transitions and recovery
+  procedures rather than assuming the final IAM service already exists.
+- flex-auth should model tenants, platform resources, CARING descriptors,
+  decision envelopes, and runtime adapters in a provider-neutral way.
+- key-cape and Keycloak should be treated as implementations of the IAM
+  Profile, not as the canonical source of resource authorization
+  semantics.
+- A future orchestration repo may be useful, but only to coordinate safe
+  sequencing across Railiance and NetKingdom capabilities. It must not
+  bypass Railiance stack ownership.
+
+## Alternatives Considered
+
+### Treat Coulomb As The Platform Root
+
+This is simpler during early development but creates long-term coupling
+between one internal use case and the shared platform. It makes later
+multi-tenant operation and secure bootstrap harder.
+
+### Put All Security Semantics Into Keycloak
+
+Keycloak is useful for expanded IAM and can provide authorization
+features, but making it the canonical model would make lightweight mode
+and future authorization backends harder to support. The preferred model
+keeps identity provider concerns separate from canonical authorization
+semantics.
+
+### Create An Orchestration Repo Immediately
+
+A dedicated orchestration repo may become appropriate. Creating it before
+we define trust states and repo boundaries would risk encoding accidental
+sequence logic too early. The immediate step is to document the state
+machine and update workplans.
+
+## Follow-Up
+
+- Refine bootstrapping around explicit trust-state transitions.
+- Add tenant/control-plane language to flex-auth authorization workplans.
+- Define the first production Topaz integration boundary for flex-auth.
+- Decide when key-cape is sufficient and when Keycloak expanded mode is
+  required.
+- Decide what, if anything, should live in a future orchestration repo.
diff --git a/docs/platform-identity-security-architecture.md b/docs/platform-identity-security-architecture.md
new file mode 100644
index 0000000..2f4a635
--- /dev/null
+++ b/docs/platform-identity-security-architecture.md
@@ -0,0 +1,272 @@
+# Platform Identity and Security Architecture
+
+Status: draft architecture baseline for NetKingdom/Railiance/Coulomb
+Date: 2026-05-17
+
+## Purpose
+
+This document captures the production-oriented identity, authorization,
+MFA, credential, and bootstrap architecture for the platform we are
+building. It deliberately treats Coulomb as the first internal tenant and
+reference workload, not as the platform itself.
+
+The architecture must be recursive: the same platform that protects
+future tenants also protects the services and repositories used to build
+and operate the platform. That recursion is useful, but it is also where
+many security designs accidentally collapse into self-administering root
+power. This document exists to prevent that.
+
+## Core Model
+
+```text
+Bootstrap plane
+  establishes initial trust before normal platform services exist
+
+Platform control plane
+  operates identity, MFA, secrets, policy, audit, and authorization
+
+Tenant planes
+  run Coulomb and future customer/project/domain workloads
+```
+
+Coulomb is the first internal tenant. It is also the reference tenant that
+helps validate the platform. It must not become the platform root of
+trust merely because it is first.
+
+## Planes
+
+### Bootstrap Plane
+
+The bootstrap plane exists before the full platform is alive. It owns the
+minimal authority needed to create and recover the control plane.
+
+Responsibilities:
+
+- host provisioning and hardening
+- root age/SOPS material and emergency bundles
+- initial cluster access
+- initial identity service deployment
+- initial secret injection
+- break-glass recovery
+- transition to managed runtime authority
+
+Owned primarily by `railiance-infra`, `railiance-cluster`, and the
+credential bootstrap work in `net-kingdom`.
+
+### Platform Control Plane
+
+The platform control plane owns shared security services.
+
+Responsibilities:
+
+- NetKingdom IAM Profile
+- lightweight identity mode through key-cape
+- expanded identity mode through Keycloak
+- MFA/token lifecycle through privacyIDEA where applicable
+- canonical authorization through flex-auth
+- delegated authorization runtime through Topaz first, with other PDPs as
+  adapters
+- audit and explanation records
+- platform service secrets and rotation
+
+Owned conceptually by `net-kingdom`; deployed through the Railiance stack.
+
+### Tenant Plane
+
+Tenant planes are where workloads live. Coulomb is tenant zero and the
+reference tenant; later tenants may be projects, customers, domains,
+sandboxes, or isolated deployments.
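The following minimal Python sketch shows how the three planes could bound each other's authority. It assumes a rank-based rule read from the plane descriptions: a plane may administer itself and the plane directly beneath it, and tenant authority stays confined to its own tenant. `Plane` and `may_administer` are hypothetical names, not part of any NetKingdom or Railiance API.

```python
from enum import IntEnum
from typing import Optional


class Plane(IntEnum):
    """Hypothetical ranking of the planes; a lower value is closer to the root of trust."""
    BOOTSTRAP = 0
    PLATFORM_CONTROL = 1
    TENANT = 2


def may_administer(actor_plane: Plane, actor_tenant: Optional[str],
                   target_plane: Plane, target_tenant: Optional[str]) -> bool:
    """Assumed rule: authority reaches at most one plane downward and never upward."""
    if actor_plane is Plane.TENANT:
        # Tenant administrators manage only their own tenant, never the control plane.
        return (target_plane is Plane.TENANT
                and actor_tenant is not None
                and actor_tenant == target_tenant)
    # Bootstrap may create/recover the control plane; the control plane serves tenants.
    return target_plane - actor_plane in (0, 1)


if __name__ == "__main__":
    # Being the first tenant gives Coulomb no control-plane authority.
    print(may_administer(Plane.TENANT, "tenant:coulomb", Plane.PLATFORM_CONTROL, None))  # False
    print(may_administer(Plane.PLATFORM_CONTROL, None, Plane.TENANT, "tenant:coulomb"))  # True
```

Under this assumed rule, bootstrap authority never reaches tenant planes directly. That is one way to read its "minimal authority" role; the real policy may draw the line differently.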
+
+Responsibilities:
+
+- protected services and repositories
+- tenant-owned resources
+- tenant-specific groups, policies, and service accounts
+- local enforcement of authorization decisions
+- workload audit events and diagnostics
+
+Tenant administrators may manage their tenant resources. They must not be
+able to alter platform root trust, global identity configuration,
+platform break-glass material, or the policy pipeline that governs the
+platform itself.
+
+## Component Responsibilities
+
+| Component | Primary role | Must not become |
+| --- | --- | --- |
+| `net-kingdom` | canonical security architecture, IAM Profile, SSO/MFA, credential bootstrap decisions | a deployment repo for every stack layer |
+| `key-cape` | lightweight IAM implementation of the NetKingdom IAM Profile | a general-purpose IAM platform or authorization engine |
+| Keycloak | expanded-mode IAM and optional Keycloak Authorization Services adapter | the canonical model for all platform authorization |
+| privacyIDEA | MFA/token authority, especially in lightweight/key-cape mode | a policy decision point for application resources |
+| `flex-auth` | authorization control plane, CARING descriptors, policy packages, decision envelopes, audit/explain | an identity provider or backend-specific wrapper |
+| Topaz | first delegated authorization runtime/PDP for flex-auth | the platform control plane or identity provider |
+| Railiance repos | converged infrastructure, cluster, platform services, enablement, and app deployment | the source of security policy semantics |
+
+## Identity Path
+
+```text
+Human/service/agent principal
+  |
+  v
+NetKingdom IAM Profile
+  |
+  +-- lightweight mode: key-cape
+  |     Authelia + LLDAP + privacyIDEA
+  |
+  +-- expanded mode: Keycloak
+        Keycloak + LDAP/Entra federation + MFA integration
+```
+
+Applications depend on the IAM Profile, not on the concrete provider.
+key-cape is the lightweight profile implementation. Keycloak is the
+expanded-mode profile implementation. privacyIDEA provides MFA/token
+capabilities where the deployment mode uses it.
+
+Identity answers: who is this actor, how was the actor authenticated,
+what coarse claims are asserted, and what assurance evidence exists?
+
+Identity does not make the final resource-specific authorization decision.
+
+## Authorization Path
+
+```text
+Identity claims from IAM Profile
+  |
+  v
+flex-auth
+  resource registry
+  policy packages
+  CARING descriptors
+  decision/audit/explain envelope
+  |
+  +-- standalone evaluator
+  +-- Topaz delegated PDP
+  +-- optional Keycloak AuthZ adapter
+  +-- future OpenFGA/SpiceDB/OPA/Cedar adapters
+  |
+  v
+Protected service enforcement
+```
+
+Authorization answers: may this actor perform this action on this
+resource in this context, and what explanation/audit/CARING metadata
+supports that answer?
+
+Protected services enforce decisions locally. flex-auth is the canonical
+policy and decision boundary; delegated PDPs are runtime implementations
+behind it.
+
+## Recursive Trust Rule
+
+Normal tenant administration must never be sufficient to alter the
+platform root of trust.
+
+This applies even when the tenant is Coulomb. Coulomb can be a tenant and
+a reference workload, but platform-root actions require platform control
+plane authority and appropriate bootstrap/break-glass safeguards.
+
+Examples of platform-root actions:
+
+- changing IAM Profile semantics
+- rotating root bootstrap keys
+- changing break-glass access
+- changing global MFA requirements
+- activating authorization policy that governs platform administration
+- changing flex-auth/Topaz policy import pipelines
+- changing audit retention or tamper-evidence settings
+
+## Tenant Model
+
+Every protected resource should belong to a tenant or to the platform
+control plane.
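The Authorization Path and the Recursive Trust Rule fit together roughly as sketched below. A guardrail for platform-root actions sits in front of any delegated PDP, and roles are keyed by tenant so that tenant membership and platform membership stay distinct. The action ids, role strings, `Decision` envelope, `StandaloneEvaluator` and `decide` are all hypothetical illustrations. None of them is flex-auth's or Topaz's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol

# Hypothetical action ids mirroring the "Examples of platform-root actions" list.
PLATFORM_ROOT_ACTIONS = frozenset({
    "iam-profile:change-semantics",
    "bootstrap-keys:rotate",
    "break-glass:change-access",
    "mfa:change-global-requirements",
    "platform-policy:activate",
    "policy-import-pipeline:change",
    "audit:change-retention",
})


@dataclass
class Subject:
    id: str
    # Roles keyed by tenant id; platform membership lives under "tenant:platform",
    # so being an admin in "tenant:coulomb" says nothing about platform authority.
    roles: Dict[str, str] = field(default_factory=dict)


@dataclass
class Decision:
    """Hypothetical decision envelope: outcome, deciding component, explanation."""
    allowed: bool
    decided_by: str
    explanation: str


class PolicyDecisionPoint(Protocol):
    """Any delegated runtime behind the boundary (standalone evaluator, Topaz adapter, ...)."""
    name: str

    def evaluate(self, subject: Subject, action: str, resource_tenant: str) -> bool: ...


class StandaloneEvaluator:
    """Toy evaluator: admins and operators may act inside a tenant they belong to."""
    name = "standalone"

    def evaluate(self, subject: Subject, action: str, resource_tenant: str) -> bool:
        return subject.roles.get(resource_tenant) in {"admin", "operator"}


def decide(pdp: PolicyDecisionPoint, subject: Subject, action: str,
           resource_tenant: str) -> Decision:
    # The guardrail runs before delegation, so no tenant policy loaded into a PDP
    # can grant a platform-root action, not even inside tenant:coulomb.
    if action in PLATFORM_ROOT_ACTIONS and subject.roles.get("tenant:platform") != "operator":
        return Decision(False, "guardrail",
                        f"{action} requires platform control plane authority")
    allowed = pdp.evaluate(subject, action, resource_tenant)
    verdict = "permit" if allowed else "deny"
    return Decision(allowed, pdp.name,
                    f"{pdp.name} returned {verdict} for {action} on {resource_tenant}")


if __name__ == "__main__":
    alice = Subject("alice", {"tenant:coulomb": "admin"})  # Coulomb tenant admin only
    pdp = StandaloneEvaluator()
    print(decide(pdp, alice, "service:deploy", "tenant:coulomb").allowed)         # True
    print(decide(pdp, alice, "bootstrap-keys:rotate", "tenant:coulomb").allowed)  # False
```

The guardrail is a fixed check in front of the PDP rather than one more policy inside it. That is one way to keep Topaz, or any later adapter, from becoming the place where root authority is decided. The real flex-auth boundary may enforce this differently.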
+
+Suggested identifiers:
+
+```text
+tenant:platform   # platform control plane resources
+tenant:coulomb    # first internal/reference tenant
+tenant:sandbox:   # sandbox tenants
+tenant:customer:  # future customer tenants
+```
+
+Tenant membership and platform membership are distinct. A subject may be
+an administrator in `tenant:coulomb` without being a platform operator.
+
+CARING descriptors should explicitly identify scope and tenant when the
+access is tenant-scoped. Platform-scoped descriptors should be rare,
+audited, and usually condition-bound.
+
+## Bootstrap To Runtime Transition
+
+Production setup should move through explicit trust states:
+
+1. **Bare host trust** - provisioned and verified by Railiance infra.
+2. **Cluster trust** - Kubernetes runtime exists and is verified.
+3. **Secret trust** - age/SOPS and emergency bundles are established.
+4. **Bootstrap identity trust** - local/bootstrap identity is functional
+   enough to install full identity services.
+5. **Runtime identity trust** - key-cape or Keycloak becomes the normal
+   IAM Profile issuer.
+6. **Runtime authorization trust** - flex-auth and Topaz are initialized
+   with platform and tenant policies.
+7. **Tenant onboarding trust** - Coulomb and later tenants register
+   resources and receive scoped authority.
+
+Each transition needs a verification check and a rollback/recovery path.
+
+## Production Topology
+
+For an initial production-capable Coulomb deployment:
+
+```text
+railiance-infra
+  host baseline, SSH, age keys, emergency material
+
+railiance-cluster
+  Kubernetes, ingress, cert-manager, network policy
+
+railiance-platform
+  PostgreSQL, object storage, secret management
+  key-cape or Keycloak
+  privacyIDEA where used
+  flex-auth
+  Topaz
+
+railiance-apps
+  Coulomb services as tenant:coulomb workloads
+```
+
+`net-kingdom` owns the architecture and standards. Railiance owns the
+converged deployment layers. Component repos own their implementation
+contracts.
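The trust states in Bootstrap To Runtime Transition can be read as a linear state machine. Each step advances only when its verification check passes, and a failed check runs that step's rollback and leaves the platform in the last verified state. The sketch below assumes that reading. `TrustState`, `TrustStateMachine` and the `NONE` starting state are hypothetical names, and the checks in the usage example are stand-ins for real probes.

```python
from enum import IntEnum
from typing import Callable, Dict


class TrustState(IntEnum):
    """Trust states in the documented order; NONE is an assumed starting point."""
    NONE = 0
    BARE_HOST = 1
    CLUSTER = 2
    SECRET = 3
    BOOTSTRAP_IDENTITY = 4
    RUNTIME_IDENTITY = 5
    RUNTIME_AUTHORIZATION = 6
    TENANT_ONBOARDING = 7


class TrustStateMachine:
    """Advances one trust state at a time; every target state needs a check and a rollback."""

    def __init__(self, verify: Dict[TrustState, Callable[[], bool]],
                 rollback: Dict[TrustState, Callable[[], None]]) -> None:
        targets = [s for s in TrustState if s is not TrustState.NONE]
        missing = [s.name for s in targets if s not in verify or s not in rollback]
        if missing:
            raise ValueError(f"transitions without check/rollback: {missing}")
        self.state = TrustState.NONE
        self._verify = verify
        self._rollback = rollback

    def advance(self) -> TrustState:
        if self.state is TrustState.TENANT_ONBOARDING:
            raise RuntimeError("already in the final trust state")
        target = TrustState(self.state + 1)
        if self._verify[target]():
            self.state = target        # verified: the new trust state is established
        else:
            self._rollback[target]()   # failed: undo partial work, stay in current state
        return self.state


if __name__ == "__main__":
    log = []
    targets = [s for s in TrustState if s is not TrustState.NONE]
    checks = {s: (lambda: True) for s in targets}
    checks[TrustState.RUNTIME_IDENTITY] = lambda: False  # simulate a failed IAM issuer check
    undo = {s: (lambda s=s: log.append(f"rollback {s.name}")) for s in targets}
    machine = TrustStateMachine(checks, undo)
    for _ in range(5):
        machine.advance()
    print(machine.state.name, log)  # BOOTSTRAP_IDENTITY ['rollback RUNTIME_IDENTITY']
```

The constructor refuses any transition that lacks a check or a rollback. This mirrors the requirement that every transition has both, instead of leaving that to convention.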
+
+## Orchestration Implication
+
+A future orchestration repo may be justified, but only after the state
+machine is clear. It should not own resources directly. It should own
+safe sequencing across repos.
+
+Possible responsibilities:
+
+- verify Railiance preconditions
+- initialize credential bootstrap
+- deploy or validate identity services
+- deploy or validate flex-auth and Topaz
+- run IAM Profile conformance checks
+- run authorization conformance checks
+- produce a platform security readiness report
+
+This orchestration layer should build on Railiance capabilities rather
+than bypassing the Railiance stack boundaries.
+
+## Open Questions
+
+- Where is the durable audit log stored for platform-root decisions?
+- Which actions require dual control or human confirmation?
+- How is break-glass use recorded when normal identity is unavailable?
+- Which tenant metadata is required before a service can register
+  resources with flex-auth?
+- When does the platform switch from key-cape lightweight mode to
+  Keycloak expanded mode?
+- Does Topaz run centrally for the platform, per tenant, or per service
+  for the first production deployment?
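The "safe sequencing" and "readiness report" responsibilities listed under Orchestration Implication could look roughly like this minimal sketch. It assumes checks run in a fixed order, stop at the first failure, and count a crashing check as a failure. `CheckResult`, `run_readiness` and `render_report` are hypothetical names. The check bodies in the usage example are placeholders, not real Railiance or NetKingdom probes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A check is (name, probe); the probe returns (passed, detail).
Check = Tuple[str, Callable[[], Tuple[bool, str]]]


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def run_readiness(checks: List[Check]) -> List[CheckResult]:
    """Run checks in order and stop at the first failure (safe sequencing)."""
    results: List[CheckResult] = []
    for name, check in checks:
        try:
            passed, detail = check()
        except Exception as exc:  # a crashing check is a failure, never a pass
            passed, detail = False, f"check raised {exc!r}"
        results.append(CheckResult(name, passed, detail))
        if not passed:
            break
    return results


def render_report(results: List[CheckResult], expected: int) -> str:
    """Plain-text readiness report; ready only if every expected check ran and passed."""
    lines = [f"{'PASS' if r.passed else 'FAIL'} {r.name}: {r.detail}" for r in results]
    ready = len(results) == expected and all(r.passed for r in results)
    lines.append(f"platform security ready: {ready}")
    return "\n".join(lines)


if __name__ == "__main__":
    checks: List[Check] = [
        ("verify Railiance preconditions", lambda: (True, "hosts and cluster verified")),
        ("run IAM Profile conformance checks", lambda: (True, "issuer conforms")),
        ("run authorization conformance checks", lambda: (False, "Topaz not reachable")),
        ("audit sink", lambda: (True, "writable")),
    ]
    print(render_report(run_readiness(checks), expected=len(checks)))
```

Stopping at the first failure keeps later steps from running on unverified preconditions. The cost is that one run does not list every problem, which may or may not suit a real readiness tool.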
diff --git a/workplans/NK-WP-0006-recursive-platform-identity-security-architecture.md b/workplans/NK-WP-0006-recursive-platform-identity-security-architecture.md
new file mode 100644
index 0000000..2b05ced
--- /dev/null
+++ b/workplans/NK-WP-0006-recursive-platform-identity-security-architecture.md
@@ -0,0 +1,137 @@
+---
+id: NK-WP-0006
+type: workplan
+title: Recursive platform identity and security architecture
+domain: identity-security
+repo: net-kingdom
+status: proposed
+owner: Bernd Worsch
+topic_slug: recursive-platform-identity-security
+created: 2026-05-17
+updated: 2026-05-17
+depends_on:
+  - NK-WP-0001
+  - NK-WP-0004
+  - NK-WP-0005
+---
+
+# NK-WP-0006 - Recursive Platform Identity and Security Architecture
+
+## Goal
+
+Make the platform identity and security architecture explicit enough that
+Coulomb can be onboarded as the first internal/reference tenant without
+accidentally becoming the platform root of trust.
+
+The workplan turns the recursive insight into operational structure:
+bootstrap plane, platform control plane, tenant plane, IAM Profile,
+flex-auth authorization, Topaz runtime, privacyIDEA MFA/token handling,
+and safe orchestration boundaries.
+
+## Context
+
+The current platform work is both building the Coulomb infrastructure and
+creating reusable infrastructure for later use cases. That means Coulomb
+is tenant zero, the reference tenant, inside its own future platform. This
+is a useful design pressure, but only if the tenant/control-plane
+separation is made explicit.
+
+NetKingdom owns the canonical identity and security architecture.
+Railiance owns deployment layering. flex-auth provides the practical
+reference implementation for CARING authorization semantics. key-cape and
+Keycloak implement the IAM Profile in different operating modes.
+
+## Scope
+
+In scope:
+
+- document the three-plane architecture
+- define platform-root versus tenant authority
+- define how NetKingdom, key-cape, Keycloak, privacyIDEA, flex-auth,
+  Topaz, and Railiance relate
+- define bootstrap-to-runtime trust states
+- update related workplans and ADRs when implementation details become
+  concrete
+- identify whether a dedicated orchestration repo is justified
+
+Out of scope:
+
+- implementing flex-auth adapters
+- deploying Keycloak, key-cape, privacyIDEA, Topaz, or Railiance services
+- designing customer-specific tenant policy
+- replacing existing Railiance layer ownership
+
+## Tasks
+
+```task
+id: NK-WP-0006-T1
+status: done
+priority: high
+```
+Document the recursive multi-tenant identity/security architecture in
+`docs/platform-identity-security-architecture.md`.
+
+```task
+id: NK-WP-0006-T2
+status: done
+priority: high
+```
+Record the architecture decision in an ADR so later repo work can point
+to a stable decision.
+
+```task
+id: NK-WP-0006-T3
+status: pending
+priority: high
+```
+Review flex-auth workplans and add tenant/control-plane implications:
+CARING descriptors, policy packages, decision envelopes, Topaz adapter
+scope, audit/explain records, and platform-root guardrails.
+
+```task
+id: NK-WP-0006-T4
+status: pending
+priority: high
+```
+Review NetKingdom credential/bootstrap workplans and add explicit trust
+state transitions: bare host, cluster, secrets, bootstrap identity,
+runtime identity, runtime authorization, tenant onboarding.
+
+```task
+id: NK-WP-0006-T5
+status: pending
+priority: medium
+```
+Map the first Coulomb tenant onboarding path: identity claims, tenant id,
+resource registration, policy package import, Topaz initialization, and
+audit readiness.
+
+```task
+id: NK-WP-0006-T6
+status: pending
+priority: medium
+```
+Decide whether orchestration should stay as Railiance playbooks or become
+a dedicated repo. Capture the decision as an ADR before implementation.
+
+```task
+id: NK-WP-0006-T7
+status: pending
+priority: medium
+```
+Define production readiness checks for the security platform: MFA state,
+secret rotation state, flex-auth policy state, Topaz health, audit sink,
+and break-glass verification.
+
+## Acceptance Criteria
+
+- Architecture docs distinguish bootstrap plane, platform control plane,
+  and tenant plane.
+- Coulomb is represented as tenant zero (the reference tenant), not as the
+  platform root.
+- The roles of NetKingdom, key-cape, Keycloak, privacyIDEA, flex-auth,
+  Topaz, and Railiance are clear.
+- Follow-up workplans identify where flex-auth and bootstrap work need to
+  adapt.
+- Any future orchestration repo is justified by an ADR before it is
+  created.