Document recursive platform security architecture

2026-05-17 12:18:29 +02:00
parent 88fdb89e7d
commit 64a112f70c
3 changed files with 505 additions and 0 deletions

@@ -0,0 +1,96 @@
# ADR-0006 - Recursive Multi-Tenant Identity and Authorization Architecture
**Status:** Accepted
**Date:** 2026-05-17
**Deciders:** Bernd Worsch
## Context
The Coulomb platform is being built from the same repositories and
services that will later support other use cases. This creates a
recursive architecture problem: Coulomb needs to use the shared identity,
security, policy, and deployment capabilities, while those capabilities
are themselves part of the infrastructure being built.
If this recursion is left implicit, the first internal use case can drift
into being treated as the platform root of trust. That would complicate
future multi-tenant use, blur operational authority, and make secure
bootstrap and recovery decisions harder to reason about.
NetKingdom already owns identity and security architecture concerns.
key-cape provides a lightweight IAM implementation of the NetKingdom IAM
Profile. Keycloak remains the expanded production IAM option. privacyIDEA
is relevant for MFA/token lifecycle. flex-auth is emerging as the
canonical authorization control plane and practical reference
implementation of CARING authorization semantics. Topaz is the most
likely first delegated authorization runtime behind flex-auth.
## Decision
We will document and implement the platform security architecture as a
recursive multi-tenant architecture with three explicit planes:
- **Bootstrap plane** - establishes the first trusted runtime and recovery
authority before normal platform services exist.
- **Platform control plane** - operates shared identity, MFA, secrets,
authorization, policy, audit, and explanation services.
- **Tenant plane** - runs Coulomb and future workloads under scoped tenant
authority.

Coulomb will be treated as the first internal/reference tenant, not as the
platform root of trust.
NetKingdom will own the canonical security architecture and standards.
Railiance will own deployment layering and orchestration boundaries.
flex-auth will own the canonical authorization interface and CARING-based
policy/decision model. Topaz will be the first delegated PDP runtime,
with other authorization engines treated as adapters where useful.
## Consequences
- Architecture documentation must separate platform-root authority from
tenant administration, even for Coulomb.
- Workplans for identity, authorization, and bootstrapping must include
explicit tenant and control-plane boundaries.
- Bootstrap design must include trust-state transitions and recovery
procedures rather than assuming the final IAM service already exists.
- flex-auth should model tenants, platform resources, CARING descriptors,
decision envelopes, and runtime adapters in a provider-neutral way.
- key-cape and Keycloak should be treated as implementations of the IAM
Profile, not as the canonical source of resource authorization
semantics.
- A future orchestration repo may be useful, but only to coordinate safe
sequencing across Railiance and NetKingdom capabilities. It must not
bypass Railiance stack ownership.
## Alternatives Considered
### Treat Coulomb As The Platform Root
This is simpler during early development but creates long-term coupling
between one internal use case and the shared platform. It makes later
multi-tenant operation and secure bootstrap harder.
### Put All Security Semantics Into Keycloak
Keycloak is useful for expanded IAM and can provide authorization
features, but making it the canonical model would make lightweight mode
and future authorization backends harder to support. The preferred model
keeps identity provider concerns separate from canonical authorization
semantics.
### Create An Orchestration Repo Immediately
A dedicated orchestration repo may become appropriate. Creating it before
we define trust states and repo boundaries would risk encoding accidental
sequencing logic too early. The immediate step is to document the state
machine and update workplans.
## Follow-Up
- Refine bootstrapping around explicit trust-state transitions.
- Add tenant/control-plane language to flex-auth authorization workplans.
- Define the first production Topaz integration boundary for flex-auth.
- Decide when key-cape is sufficient and when Keycloak expanded mode is
required.
- Decide what, if anything, should live in a future orchestration repo.

@@ -0,0 +1,272 @@
# Platform Identity and Security Architecture
Status: draft architecture baseline for NetKingdom/Railiance/Coulomb
Date: 2026-05-17
## Purpose
This document captures the production-oriented identity, authorization,
MFA, credential, and bootstrap architecture for the platform we are
building. It deliberately treats Coulomb as the first internal tenant and
reference workload, not as the platform itself.
The architecture must be recursive: the same platform that protects
future tenants also protects the services and repositories used to build
and operate the platform. That recursion is useful, but it is also where
many security designs accidentally collapse into self-administering root
power. This document exists to prevent that.
## Core Model
```text
Bootstrap plane
  establishes initial trust before normal platform services exist

Platform control plane
  operates identity, MFA, secrets, policy, audit, and authorization

Tenant planes
  run Coulomb and future customer/project/domain workloads
```
Coulomb is the first internal tenant. It is also the reference tenant that
helps validate the platform. It must not become the platform root of
trust merely because it is first.
## Planes
### Bootstrap Plane
The bootstrap plane exists before the full platform is alive. It owns the
minimal authority needed to create and recover the control plane.
Responsibilities:
- host provisioning and hardening
- root age/SOPS material and emergency bundles
- initial cluster access
- initial identity service deployment
- initial secret injection
- break-glass recovery
- transition to managed runtime authority

Owned primarily by `railiance-infra`, `railiance-cluster`, and the
credential bootstrap work in `net-kingdom`.
### Platform Control Plane
The platform control plane owns shared security services.
Responsibilities:
- NetKingdom IAM Profile
- lightweight identity mode through key-cape
- expanded identity mode through Keycloak
- MFA/token lifecycle through privacyIDEA where applicable
- canonical authorization through flex-auth
- delegated authorization runtime through Topaz first, with other PDPs as
adapters
- audit and explanation records
- platform service secrets and rotation

Owned conceptually by `net-kingdom`; deployed through the Railiance stack.
### Tenant Plane
Tenant planes are where workloads live. Coulomb is tenant zero, the
reference tenant; later tenants may be projects, customers, domains,
sandboxes, or isolated deployments.
Responsibilities:
- protected services and repositories
- tenant-owned resources
- tenant-specific groups, policies, and service accounts
- local enforcement of authorization decisions
- workload audit events and diagnostics

Tenant administrators may manage their tenant resources. They must not be
able to alter platform root trust, global identity configuration,
platform break-glass material, or the policy pipeline that governs the
platform itself.
## Component Responsibilities
| Component | Primary role | Must not become |
| --- | --- | --- |
| `net-kingdom` | canonical security architecture, IAM Profile, SSO/MFA, credential bootstrap decisions | a deployment repo for every stack layer |
| `key-cape` | lightweight IAM implementation of the NetKingdom IAM Profile | a general-purpose IAM platform or authorization engine |
| Keycloak | expanded-mode IAM and optional Keycloak Authorization Services adapter | the canonical model for all platform authorization |
| privacyIDEA | MFA/token authority, especially in lightweight/key-cape mode | a policy decision point for application resources |
| `flex-auth` | authorization control plane, CARING descriptors, policy packages, decision envelopes, audit/explain | an identity provider or backend-specific wrapper |
| Topaz | first delegated authorization runtime/PDP for flex-auth | the platform control plane or identity provider |
| Railiance repos | converged infrastructure, cluster, platform services, enablement, and app deployment | the source of security policy semantics |
## Identity Path
```text
Human/service/agent principal
        |
        v
NetKingdom IAM Profile
        |
        +-- lightweight mode: key-cape
        |     Authelia + LLDAP + privacyIDEA
        |
        +-- expanded mode: Keycloak
              Keycloak + LDAP/Entra federation + MFA integration
```
Applications depend on the IAM Profile, not on the concrete provider.
key-cape is the lightweight profile implementation. Keycloak is the
expanded-mode profile implementation. privacyIDEA provides MFA/token
capabilities where the deployment mode uses it.
Identity answers: who is this actor, how was the actor authenticated,
what coarse claims are asserted, and what assurance evidence exists?
Identity does not decide final resource-specific authorization.
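
As a rough illustration of "applications depend on the IAM Profile, not on the concrete provider", the sketch below shows one way an application could code against a provider-neutral interface. Everything named here (`IdentityAssertion`, `IamProfileIssuer`, `StubLightweightIssuer`, the field names, and the demo token) is hypothetical and is not the actual IAM Profile schema or a key-cape/Keycloak API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class IdentityAssertion:
    """Hypothetical shape of what identity answers; not the real profile schema."""
    subject: str                   # who the actor is
    auth_methods: tuple[str, ...]  # how the actor was authenticated
    claims: frozenset[str]         # coarse claims only, no resource permissions
    assurance: str                 # assurance evidence, e.g. "mfa"


class IamProfileIssuer(Protocol):
    """Provider-neutral interface that applications would depend on (assumed)."""

    def verify(self, token: str) -> IdentityAssertion: ...


class StubLightweightIssuer:
    """Stand-in for a lightweight-mode issuer; an expanded-mode issuer would
    expose the same method."""

    def verify(self, token: str) -> IdentityAssertion:
        if token != "demo-token":
            raise PermissionError("unknown token")
        return IdentityAssertion(
            subject="user:alice",
            auth_methods=("password", "totp"),
            claims=frozenset({"staff"}),
            assurance="mfa",
        )


def whoami(issuer: IamProfileIssuer, token: str) -> str:
    # The application sees only the profile interface, never the provider.
    assertion = issuer.verify(token)
    return f"{assertion.subject} via {'+'.join(assertion.auth_methods)}"
```

Swapping the stub for a different issuer class would not require changes to `whoami`, which is the property the profile boundary is meant to preserve.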
## Authorization Path
```text
Identity claims from IAM Profile
        |
        v
flex-auth
  resource registry
  policy packages
  CARING descriptors
  decision/audit/explain envelope
        |
        +-- standalone evaluator
        +-- Topaz delegated PDP
        +-- optional Keycloak AuthZ adapter
        +-- future OpenFGA/SpiceDB/OPA/Cedar adapters
        |
        v
Protected service enforcement
```
Authorization answers: may this actor perform this action on this
resource in this context, and what explanation/audit/CARING metadata
supports that answer?
Protected services enforce decisions locally. flex-auth is the canonical
policy and decision boundary; delegated PDPs are runtime implementations
behind it.
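
The boundary described above can be sketched as a single decision function that wraps whichever runtime sits behind it. This is a minimal sketch under stated assumptions: `DecisionRequest`, `DecisionEnvelope`, `PolicyDecisionPoint`, and `StandaloneEvaluator` are hypothetical names, the envelope fields are illustrative, and no real flex-auth or Topaz API is shown.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DecisionRequest:
    subject: str
    action: str
    resource: str
    tenant: str


@dataclass(frozen=True)
class DecisionEnvelope:
    allowed: bool
    explanation: str  # would feed audit/explain records
    evaluator: str    # which runtime produced the decision


class PolicyDecisionPoint(Protocol):
    """Assumed adapter interface; a delegated PDP adapter would implement it."""
    name: str

    def evaluate(self, request: DecisionRequest) -> bool: ...


class StandaloneEvaluator:
    """Toy in-process evaluator backed by an explicit grant set."""
    name = "standalone"

    def __init__(self, grants: set[tuple[str, str, str]]):
        self._grants = grants

    def evaluate(self, request: DecisionRequest) -> bool:
        key = (request.subject, request.action, request.resource)
        return key in self._grants


def decide(pdp: PolicyDecisionPoint, request: DecisionRequest) -> DecisionEnvelope:
    # Callers always receive the same envelope, whichever runtime answered.
    allowed = pdp.evaluate(request)
    verdict = "allow" if allowed else "deny"
    explanation = (
        f"{verdict}: {request.subject} {request.action} "
        f"{request.resource} in {request.tenant}"
    )
    return DecisionEnvelope(allowed, explanation, pdp.name)
```

A protected service would call `decide(...)` and enforce `allowed` locally, keeping the explanation for its own audit events.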
## Recursive Trust Rule
Normal tenant administration must never be sufficient to alter the
platform root of trust.
This applies even when the tenant is Coulomb. Coulomb can be a tenant and
a reference workload, but platform-root actions require platform control
plane authority and appropriate bootstrap/break-glass safeguards.
Examples of platform-root actions:
- changing IAM Profile semantics
- rotating root bootstrap keys
- changing break-glass access
- changing global MFA requirements
- activating authorization policy that governs platform administration
- changing flex-auth/Topaz policy import pipelines
- changing audit retention or tamper-evidence settings
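
The rule can be expressed as a guard that ignores tenant administration whenever the action is platform-root. The sketch below is illustrative only: the action names are invented labels for the examples listed above, `Principal` and `may_perform` are hypothetical, and the bootstrap/break-glass safeguards (for instance dual control) are deliberately left out.

```python
from dataclasses import dataclass

# Illustrative labels for the example actions above; not a canonical list.
PLATFORM_ROOT_ACTIONS = frozenset({
    "iam-profile.change-semantics",
    "bootstrap-keys.rotate",
    "break-glass.change-access",
    "mfa.change-global-requirements",
    "policy.activate-platform-admin",
    "policy-pipeline.change-import",
    "audit.change-retention",
})


@dataclass(frozen=True)
class Principal:
    subject: str
    tenant_admin_of: frozenset[str] = frozenset()
    platform_operator: bool = False


def may_perform(principal: Principal, action: str, tenant: str) -> bool:
    """Deny platform-root actions to anyone holding only tenant administration."""
    if action in PLATFORM_ROOT_ACTIONS:
        # Tenant admin rights are ignored here, even for tenant:coulomb.
        return principal.platform_operator
    # Ordinary actions are allowed only inside a tenant the principal administers.
    return tenant in principal.tenant_admin_of
```

In this sketch a Coulomb administrator can manage Coulomb resources but cannot rotate bootstrap keys, which matches the intent that being first does not confer root authority.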
## Tenant Model
Every protected resource should belong to a tenant or to the platform
control plane.
Suggested identifiers:
```text
tenant:platform # platform control plane resources
tenant:coulomb # first internal/reference tenant
tenant:sandbox:<name> # sandbox tenants
tenant:customer:<name> # future customer tenants
```
Tenant membership and platform membership are distinct. A subject may be
an administrator in `tenant:coulomb` without being a platform operator.
CARING descriptors should explicitly identify scope and tenant when the
access is tenant-scoped. Platform-scoped descriptors should be rare,
audited, and usually condition-bound.
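
To make the suggested identifiers concrete, here is a small parsing sketch. The identifier shapes come from the list above; the validation rules, the `TenantId` type, and `parse_tenant_id` are assumptions made for illustration, not a defined flex-auth contract.

```python
from dataclasses import dataclass
from typing import Optional

# Kinds taken from the suggested identifiers; the strictness is an assumption.
FIXED_TENANTS = {"platform", "coulomb"}
NAMED_KINDS = {"sandbox", "customer"}


@dataclass(frozen=True)
class TenantId:
    kind: str
    name: Optional[str] = None

    @property
    def is_platform(self) -> bool:
        # Platform scope is distinct from every tenant, including Coulomb.
        return self.kind == "platform"

    def __str__(self) -> str:
        suffix = f":{self.name}" if self.name else ""
        return f"tenant:{self.kind}{suffix}"


def parse_tenant_id(raw: str) -> TenantId:
    parts = raw.split(":")
    if parts[0] != "tenant":
        raise ValueError(f"not a tenant identifier: {raw!r}")
    if len(parts) == 2 and parts[1] in FIXED_TENANTS:
        return TenantId(parts[1])
    if len(parts) == 3 and parts[1] in NAMED_KINDS and parts[2]:
        return TenantId(parts[1], parts[2])
    raise ValueError(f"unrecognized tenant identifier: {raw!r}")
```

Rejecting unknown shapes early keeps a resource from being registered without a clear owner, in line with the rule that every protected resource belongs to a tenant or to the platform control plane.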
## Bootstrap To Runtime Transition
Production setup should move through explicit trust states:
1. **Bare host trust** - provisioned and verified by Railiance infra.
2. **Cluster trust** - Kubernetes runtime exists and is verified.
3. **Secret trust** - age/SOPS and emergency bundles are established.
4. **Bootstrap identity trust** - local/bootstrap identity can operate
enough to install full identity services.
5. **Runtime identity trust** - key-cape or Keycloak becomes the normal
IAM Profile issuer.
6. **Runtime authorization trust** - flex-auth and Topaz are initialized
with platform and tenant policies.
7. **Tenant onboarding trust** - Coulomb and later tenants register
resources and receive scoped authority.

Each transition needs a verification check and a rollback/recovery path.
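
The trust states above can be modelled as an ordered state machine in which a transition happens only when its verification check passes. This is a sketch, not the planned implementation: `TrustState`, `advance`, and the extra `NONE` starting state are hypothetical, and rollback is represented only by the state staying unchanged.

```python
from enum import IntEnum
from typing import Callable, Dict


class TrustState(IntEnum):
    NONE = 0  # assumed starting point before any trust is established
    BARE_HOST = 1
    CLUSTER = 2
    SECRET = 3
    BOOTSTRAP_IDENTITY = 4
    RUNTIME_IDENTITY = 5
    RUNTIME_AUTHORIZATION = 6
    TENANT_ONBOARDING = 7


Check = Callable[[], bool]


def advance(current: TrustState, checks: Dict[TrustState, Check]) -> TrustState:
    """Move to the next state only if its verification check passes.

    A failed or missing check leaves the state unchanged, so the caller can
    run the rollback/recovery path for that transition.
    """
    if current == TrustState.TENANT_ONBOARDING:
        return current  # already at the final state
    target = TrustState(current + 1)
    check = checks.get(target)
    if check is None or not check():
        return current
    return target
```

Treating a missing check as a failure is a deliberate choice in this sketch: a state without a defined verification cannot be entered by accident.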
## Production Topology
For an initial production-capable Coulomb deployment:
```text
railiance-infra
  host baseline, SSH, age keys, emergency material
railiance-cluster
  Kubernetes, ingress, cert-manager, network policy
railiance-platform
  PostgreSQL, object storage, secret management
  key-cape or Keycloak
  privacyIDEA where used
  flex-auth
  Topaz
railiance-apps
  Coulomb services as tenant:coulomb workloads
```
`net-kingdom` owns the architecture and standards. Railiance owns the
converged deployment layers. Component repos own their implementation
contracts.
## Orchestration Implication
A future orchestration repo may be justified, but only after the state
machine is clear. It should not own resources directly. It should own
safe sequencing across repos.
Possible responsibilities:
- verify Railiance preconditions
- initialize credential bootstrap
- deploy or validate identity services
- deploy or validate flex-auth and Topaz
- run IAM Profile conformance checks
- run authorization conformance checks
- produce a platform security readiness report

This orchestration layer should build on Railiance capabilities rather
than bypassing the Railiance stack boundaries.
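
"Safe sequencing" could look like an ordered list of checks that stops at the first failure and reports what was skipped. The sketch below assumes each responsibility can be reduced to a callable returning a boolean; `run_sequence`, `StepResult`, and the step names are hypothetical and do not describe existing Railiance tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


@dataclass(frozen=True)
class StepResult:
    name: str
    status: str  # "passed", "failed", or "skipped"


def run_sequence(steps: List[Step]) -> List[StepResult]:
    """Run steps in order; after the first failure, mark the rest as skipped."""
    results: List[StepResult] = []
    failed = False
    for name, check in steps:
        if failed:
            # Later steps depend on earlier ones, so they are not attempted.
            results.append(StepResult(name, "skipped"))
            continue
        ok = check()
        results.append(StepResult(name, "passed" if ok else "failed"))
        failed = not ok
    return results


def is_ready(results: List[StepResult]) -> bool:
    # Ready only when every step in the report passed.
    return all(r.status == "passed" for r in results)
```

The returned list doubles as a simple readiness report: it records which step failed and which steps were never attempted, without the sequencer owning any of the underlying resources.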
## Open Questions
- Where is the durable audit log stored for platform-root decisions?
- Which actions require dual control or human confirmation?
- How is break-glass use recorded when normal identity is unavailable?
- Which tenant metadata is required before a service can register
resources with flex-auth?
- When does the platform switch from key-cape lightweight mode to
Keycloak expanded mode?
- For the first production deployment, does Topaz run centrally for the
platform, per tenant, or per service?

@@ -0,0 +1,137 @@
---
id: NK-WP-0006
type: workplan
title: Recursive platform identity and security architecture
domain: identity-security
repo: net-kingdom
status: proposed
owner: Bernd Worsch
topic_slug: recursive-platform-identity-security
created: 2026-05-17
updated: 2026-05-17
depends_on:
- NK-WP-0001
- NK-WP-0004
- NK-WP-0005
---
# NK-WP-0006 - Recursive Platform Identity and Security Architecture
## Goal
Make the platform identity and security architecture explicit enough that
Coulomb can be onboarded as the first internal/reference tenant without
accidentally becoming the platform root of trust.
The workplan turns the recursive insight into operational structure:
bootstrap plane, platform control plane, tenant plane, IAM Profile,
flex-auth authorization, Topaz runtime, privacyIDEA MFA/token handling,
and safe orchestration boundaries.
## Context
The current platform work is both building the Coulomb infrastructure and
creating reusable infrastructure for later use cases. That means Coulomb
is tenant zero, the reference tenant, inside its own future platform. This
is a useful design pressure, but only if the tenant/control-plane
separation is made explicit.
NetKingdom owns the canonical identity and security architecture.
Railiance owns deployment layering. flex-auth provides the practical
reference implementation for CARING authorization semantics. key-cape and
Keycloak implement the IAM Profile in lightweight and expanded operating
modes, respectively.
## Scope
In scope:
- document the three-plane architecture
- define platform-root versus tenant authority
- define how NetKingdom, key-cape, Keycloak, privacyIDEA, flex-auth,
Topaz, and Railiance relate
- define bootstrap-to-runtime trust states
- update related workplans and ADRs when implementation details become
concrete
- identify whether a dedicated orchestration repo is justified
Out of scope:
- implementing flex-auth adapters
- deploying Keycloak, key-cape, privacyIDEA, Topaz, or Railiance services
- designing customer-specific tenant policy
- replacing existing Railiance layer ownership
## Tasks
```task
id: NK-WP-0006-T1
status: done
priority: high
```
Document the recursive multi-tenant identity/security architecture in
`docs/platform-identity-security-architecture.md`.
```task
id: NK-WP-0006-T2
status: done
priority: high
```
Record the architecture decision in an ADR so later repo work can point
to a stable decision.
```task
id: NK-WP-0006-T3
status: pending
priority: high
```
Review flex-auth workplans and add tenant/control-plane implications:
CARING descriptors, policy packages, decision envelopes, Topaz adapter
scope, audit/explain records, and platform-root guardrails.
```task
id: NK-WP-0006-T4
status: pending
priority: high
```
Review NetKingdom credential/bootstrap workplans and add explicit trust
state transitions: bare host, cluster, secrets, bootstrap identity,
runtime identity, runtime authorization, tenant onboarding.
```task
id: NK-WP-0006-T5
status: pending
priority: medium
```
Map the first Coulomb tenant onboarding path: identity claims, tenant id,
resource registration, policy package import, Topaz initialization, and
audit readiness.
```task
id: NK-WP-0006-T6
status: pending
priority: medium
```
Decide whether orchestration should stay as Railiance playbooks or become
a dedicated repo. Capture the decision as an ADR before implementation.
```task
id: NK-WP-0006-T7
status: pending
priority: medium
```
Define production readiness checks for the security platform: MFA state,
secret rotation state, flex-auth policy state, Topaz health, audit sink,
and break-glass verification.
## Acceptance Criteria
- Architecture docs distinguish bootstrap plane, platform control plane,
and tenant plane.
- Coulomb is represented as tenant zero, the reference tenant, not as the
platform root.
- The roles of NetKingdom, key-cape, Keycloak, privacyIDEA, flex-auth,
Topaz, and Railiance are clear.
- Follow-up workplans identify where flex-auth and bootstrap work need to
adapt.
- Any future orchestration repo is justified by an ADR before it is
created.