generated from coulomb/repo-seed
Document recursive platform security architecture
@@ -0,0 +1,96 @@
# ADR-0006 - Recursive Multi-Tenant Identity and Authorization Architecture

**Status:** Accepted
**Date:** 2026-05-17
**Deciders:** Bernd Worsch

## Context

The Coulomb platform is being built from the same repositories and
services that will later support other use cases. This creates a
recursive architecture problem: Coulomb needs to use the shared identity,
security, policy, and deployment capabilities, while those capabilities
are themselves part of the infrastructure being built.

If this recursion is left implicit, the first internal use case can drift
into being treated as the platform root of trust. That would complicate
future multi-tenant use, blur operational authority, and make secure
bootstrap/recovery decisions harder to reason about.

NetKingdom already owns identity and security architecture concerns.
key-cape provides a lightweight IAM implementation of the NetKingdom IAM
Profile. Keycloak remains the expanded production IAM option. privacyIDEA
is relevant for MFA/token lifecycle. flex-auth is emerging as the
canonical authorization control plane and practical reference
implementation of CARING authorization semantics. Topaz is the most
likely first delegated authorization runtime behind flex-auth.

## Decision

We will document and implement the platform security architecture as a
recursive multi-tenant architecture with three explicit planes:

- **Bootstrap plane** - establishes the first trusted runtime and recovery
  authority before normal platform services exist.
- **Platform control plane** - operates shared identity, MFA, secrets,
  authorization, policy, audit, and explanation services.
- **Tenant plane** - runs Coulomb and future workloads under scoped tenant
  authority.

Coulomb will be treated as the first internal/reference tenant, not as the
platform root of trust.

NetKingdom will own the canonical security architecture and standards.
Railiance will own deployment layering and orchestration boundaries.
flex-auth will own the canonical authorization interface and CARING-based
policy/decision model. Topaz will be the first delegated PDP runtime,
with other authorization engines treated as adapters where useful.

## Consequences

- Architecture documentation must separate platform-root authority from
  tenant administration, even for Coulomb.
- Workplans for identity, authorization, and bootstrapping must include
  explicit tenant and control-plane boundaries.
- Bootstrap design must include trust-state transitions and recovery
  procedures rather than assuming the final IAM service already exists.
- flex-auth should model tenants, platform resources, CARING descriptors,
  decision envelopes, and runtime adapters in a provider-neutral way.
- key-cape and Keycloak should be treated as implementations of the IAM
  Profile, not as the canonical source of resource authorization
  semantics.
- A future orchestration repo may be useful, but only to coordinate safe
  sequencing across Railiance and NetKingdom capabilities. It must not
  bypass Railiance stack ownership.

## Alternatives Considered

### Treat Coulomb As The Platform Root

This is simpler during early development but creates long-term coupling
between one internal use case and the shared platform. It makes later
multi-tenant operation and secure bootstrap harder.

### Put All Security Semantics Into Keycloak

Keycloak is useful for expanded IAM and can provide authorization
features, but making it the canonical model would make lightweight mode
and future authorization backends harder to support. The preferred model
keeps identity provider concerns separate from canonical authorization
semantics.

### Create An Orchestration Repo Immediately

A dedicated orchestration repo may become appropriate. Creating it before
we define trust states and repo boundaries would risk encoding accidental
sequencing logic too early. The immediate step is to document the state
machine and update workplans.

## Follow-Up

- Refine bootstrapping around explicit trust-state transitions.
- Add tenant/control-plane language to flex-auth authorization workplans.
- Define the first production Topaz integration boundary for flex-auth.
- Decide when key-cape is sufficient and when Keycloak expanded mode is
  required.
- Decide what, if anything, should live in a future orchestration repo.

docs/platform-identity-security-architecture.md
@@ -0,0 +1,272 @@
# Platform Identity and Security Architecture

Status: draft architecture baseline for NetKingdom/Railiance/Coulomb
Date: 2026-05-17

## Purpose

This document captures the production-oriented identity, authorization,
MFA, credential, and bootstrap architecture for the platform we are
building. It deliberately treats Coulomb as the first internal tenant and
reference workload, not as the platform itself.

The architecture must be recursive: the same platform that protects
future tenants also protects the services and repositories used to build
and operate the platform. That recursion is useful, but it is also where
many security designs accidentally collapse into self-administering root
power. This document exists to prevent that.

## Core Model

```text
Bootstrap plane
  establishes initial trust before normal platform services exist

Platform control plane
  operates identity, MFA, secrets, policy, audit, and authorization

Tenant planes
  run Coulomb and future customer/project/domain workloads
```

Coulomb is the first internal tenant. It is also the reference tenant that
helps validate the platform. It must not become the platform root of
trust merely because it is first.

## Planes

### Bootstrap Plane

The bootstrap plane exists before the full platform is alive. It owns the
minimal authority needed to create and recover the control plane.

Responsibilities:

- host provisioning and hardening
- root age/SOPS material and emergency bundles
- initial cluster access
- initial identity service deployment
- initial secret injection
- break-glass recovery
- transition to managed runtime authority

Owned primarily by `railiance-infra`, `railiance-cluster`, and the
credential bootstrap work in `net-kingdom`.

### Platform Control Plane

The platform control plane owns shared security services.

Responsibilities:

- NetKingdom IAM Profile
- lightweight identity mode through key-cape
- expanded identity mode through Keycloak
- MFA/token lifecycle through privacyIDEA where applicable
- canonical authorization through flex-auth
- delegated authorization runtime through Topaz first, with other PDPs as
  adapters
- audit and explanation records
- platform service secrets and rotation

Owned conceptually by `net-kingdom`; deployed through the Railiance stack.

### Tenant Plane

Tenant planes are where workloads live. Coulomb is tenant zero and the
reference tenant; later tenants may be projects, customers, domains,
sandboxes, or isolated deployments.

Responsibilities:

- protected services and repositories
- tenant-owned resources
- tenant-specific groups, policies, and service accounts
- local enforcement of authorization decisions
- workload audit events and diagnostics

Tenant administrators may manage their tenant resources. They must not be
able to alter platform root trust, global identity configuration,
platform break-glass material, or the policy pipeline that governs the
platform itself.

## Component Responsibilities

| Component | Primary role | Must not become |
| --- | --- | --- |
| `net-kingdom` | canonical security architecture, IAM Profile, SSO/MFA, credential bootstrap decisions | a deployment repo for every stack layer |
| `key-cape` | lightweight IAM implementation of the NetKingdom IAM Profile | a general-purpose IAM platform or authorization engine |
| Keycloak | expanded-mode IAM and optional Keycloak Authorization Services adapter | the canonical model for all platform authorization |
| privacyIDEA | MFA/token authority, especially in lightweight/key-cape mode | a policy decision point for application resources |
| `flex-auth` | authorization control plane, CARING descriptors, policy packages, decision envelopes, audit/explain | an identity provider or backend-specific wrapper |
| Topaz | first delegated authorization runtime/PDP for flex-auth | the platform control plane or identity provider |
| Railiance repos | converged infrastructure, cluster, platform services, enablement, and app deployment | the source of security policy semantics |

## Identity Path

```text
Human/service/agent principal
        |
        v
NetKingdom IAM Profile
        |
        +-- lightweight mode: key-cape
        |     Authelia + LLDAP + privacyIDEA
        |
        +-- expanded mode: Keycloak
              Keycloak + LDAP/Entra federation + MFA integration
```

Applications depend on the IAM Profile, not on the concrete provider.
key-cape is the lightweight profile implementation. Keycloak is the
expanded-mode profile implementation. privacyIDEA provides MFA/token
capabilities where the deployment mode uses it.

Identity answers: who is this actor, how was the actor authenticated,
what coarse claims are asserted, and what assurance evidence exists?

Identity does not answer final resource-specific authorization.

## Authorization Path

```text
Identity claims from IAM Profile
        |
        v
flex-auth
  resource registry
  policy packages
  CARING descriptors
  decision/audit/explain envelope
        |
        +-- standalone evaluator
        +-- Topaz delegated PDP
        +-- optional Keycloak AuthZ adapter
        +-- future OpenFGA/SpiceDB/OPA/Cedar adapters
        |
        v
Protected service enforcement
```

Authorization answers: may this actor perform this action on this
resource in this context, and what explanation/audit/CARING metadata
supports that answer?

Protected services enforce decisions locally. flex-auth is the canonical
policy and decision boundary; delegated PDPs are runtime implementations
behind it.

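The boundary described above can be illustrated with a minimal sketch. It assumes a provider-neutral evaluator interface and a simple decision envelope; `DecisionRequest`, `DecisionEnvelope`, `Evaluator`, `StaticEvaluator`, and `enforce` are hypothetical names for illustration and are not taken from flex-auth or Topaz.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class DecisionRequest:
    """Hypothetical request shape: who wants to do what, on which resource, in which tenant."""
    subject: str
    action: str
    resource: str
    tenant: str


@dataclass(frozen=True)
class DecisionEnvelope:
    """Hypothetical envelope: the decision plus the metadata needed to explain and audit it."""
    allowed: bool
    reason: str
    evaluated_by: str


class Evaluator(Protocol):
    """Anything that can answer a request: a standalone evaluator or a delegated PDP adapter."""
    name: str

    def evaluate(self, request: DecisionRequest) -> DecisionEnvelope: ...


class StaticEvaluator:
    """Stand-in for a standalone evaluator: allows only explicitly granted requests."""
    name = "standalone"

    def __init__(self, grants: Iterable[DecisionRequest]) -> None:
        self._grants = set(grants)

    def evaluate(self, request: DecisionRequest) -> DecisionEnvelope:
        allowed = request in self._grants
        reason = "explicit grant" if allowed else "no matching grant"
        return DecisionEnvelope(allowed, reason, self.name)


def enforce(evaluator: Evaluator, request: DecisionRequest) -> DecisionEnvelope:
    """Local enforcement in a protected service: raise unless the decision allows the action."""
    envelope = evaluator.evaluate(request)
    if not envelope.allowed:
        raise PermissionError(envelope.reason)
    return envelope
```

Because the protected service only sees `Evaluator`, swapping the standalone evaluator for a delegated PDP adapter would not change the enforcement code in this sketch.
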
## Recursive Trust Rule

Normal tenant administration must never be sufficient to alter the
platform root of trust.

This applies even when the tenant is Coulomb. Coulomb can be a tenant and
a reference workload, but platform-root actions require platform control
plane authority and appropriate bootstrap/break-glass safeguards.

Examples of platform-root actions:

- changing IAM Profile semantics
- rotating root bootstrap keys
- changing break-glass access
- changing global MFA requirements
- activating authorization policy that governs platform administration
- changing flex-auth/Topaz policy import pipelines
- changing audit retention or tamper-evidence settings

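The rule can be expressed as a guard, sketched here under stated assumptions: the action names are hypothetical labels for the examples listed above, `Subject` and `may_perform` are illustrative names, and additional safeguards such as dual control or break-glass procedures are deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical action identifiers, one per platform-root example in the list above.
PLATFORM_ROOT_ACTIONS = frozenset({
    "iam_profile.change_semantics",
    "bootstrap_keys.rotate",
    "break_glass.change_access",
    "mfa.change_global_requirements",
    "platform_policy.activate",
    "policy_pipeline.change",
    "audit.change_retention",
})


@dataclass(frozen=True)
class Subject:
    """Illustrative subject: tenant admin roles and platform operator status are separate."""
    name: str
    tenant_admin_of: frozenset = frozenset()
    platform_operator: bool = False


def may_perform(subject: Subject, action: str, tenant: str) -> bool:
    """Platform-root actions need platform authority; tenant admin rights never suffice."""
    if action in PLATFORM_ROOT_ACTIONS:
        return subject.platform_operator
    # Assumption: any other action is tenant-scoped and needs admin rights in that tenant.
    return tenant in subject.tenant_admin_of
```

In this sketch a Coulomb administrator can manage `tenant:coulomb` resources but is refused `bootstrap_keys.rotate`, which matches the rule that being the first tenant grants no root authority.
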
## Tenant Model

Every protected resource should belong to a tenant or to the platform
control plane.

Suggested identifiers:

```text
tenant:platform          # platform control plane resources
tenant:coulomb           # first internal/reference tenant
tenant:sandbox:<name>    # sandbox tenants
tenant:customer:<name>   # future customer tenants
```

Tenant membership and platform membership are distinct. A subject may be
an administrator in `tenant:coulomb` without being a platform operator.

CARING descriptors should explicitly identify scope and tenant when the
access is tenant-scoped. Platform-scoped descriptors should be rare,
audited, and usually condition-bound.

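A small parser shows how the suggested identifiers could be validated before a resource is registered. This is a sketch only: `TenantId` and `parse_tenant_id` are hypothetical names, and the accepted kinds are limited to the four forms suggested above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only the identifier forms suggested above are valid.
SINGLETON_KINDS = {"platform", "coulomb"}   # tenant:<kind>
NAMED_KINDS = {"sandbox", "customer"}       # tenant:<kind>:<name>


@dataclass(frozen=True)
class TenantId:
    kind: str
    name: Optional[str] = None

    @property
    def is_platform(self) -> bool:
        """True only for the platform control plane scope."""
        return self.kind == "platform"

    def __str__(self) -> str:
        suffix = f":{self.name}" if self.name else ""
        return f"tenant:{self.kind}{suffix}"


def parse_tenant_id(raw: str) -> TenantId:
    """Parse a tenant identifier, rejecting anything outside the suggested forms."""
    parts = raw.split(":")
    if parts[0] == "tenant":
        if len(parts) == 2 and parts[1] in SINGLETON_KINDS:
            return TenantId(parts[1])
        if len(parts) == 3 and parts[1] in NAMED_KINDS and parts[2]:
            return TenantId(parts[1], parts[2])
    raise ValueError(f"unrecognized tenant identifier: {raw!r}")
```

Rejecting unknown forms by default keeps a mistyped identifier from silently creating a new scope.
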
## Bootstrap To Runtime Transition

Production setup should move through explicit trust states:

1. **Bare host trust** - provisioned and verified by Railiance infra.
2. **Cluster trust** - Kubernetes runtime exists and is verified.
3. **Secret trust** - age/SOPS and emergency bundles are established.
4. **Bootstrap identity trust** - local/bootstrap identity can operate
   enough to install full identity services.
5. **Runtime identity trust** - key-cape or Keycloak becomes the normal
   IAM Profile issuer.
6. **Runtime authorization trust** - flex-auth and Topaz are initialized
   with platform and tenant policies.
7. **Tenant onboarding trust** - Coulomb and later tenants register
   resources and receive scoped authority.

Each transition needs a verification check and a rollback/recovery path.

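The trust states can be sketched as an ordered state machine in which a state is only entered when its verification check passes. The names `TrustState` and `advance`, and the extra `NONE` starting state, are assumptions made for illustration; rollback/recovery paths are not modeled here.

```python
from enum import IntEnum
from typing import Callable, Dict


class TrustState(IntEnum):
    NONE = 0                   # assumed starting point, before any trust exists
    BARE_HOST = 1
    CLUSTER = 2
    SECRET = 3
    BOOTSTRAP_IDENTITY = 4
    RUNTIME_IDENTITY = 5
    RUNTIME_AUTHORIZATION = 6
    TENANT_ONBOARDING = 7


Check = Callable[[], bool]


def advance(current: TrustState, checks: Dict[TrustState, Check]) -> TrustState:
    """Move forward one state at a time; stop at the first missing or failing check."""
    state = current
    while state < TrustState.TENANT_ONBOARDING:
        target = TrustState(state + 1)
        check = checks.get(target)
        if check is None or not check():
            break  # no state is ever skipped or entered unverified
        state = target
    return state
```

Treating a missing check as a failure is a deliberate choice in this sketch: a transition without a verification check should block progress rather than pass silently.
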
## Production Topology

For an initial production-capable Coulomb deployment:

```text
railiance-infra
  host baseline, SSH, age keys, emergency material

railiance-cluster
  Kubernetes, ingress, cert-manager, network policy

railiance-platform
  PostgreSQL, object storage, secret management
  key-cape or Keycloak
  privacyIDEA where used
  flex-auth
  Topaz

railiance-apps
  Coulomb services as tenant:coulomb workloads
```

`net-kingdom` owns the architecture and standards. Railiance owns the
converged deployment layers. Component repos own their implementation
contracts.

## Orchestration Implication

A future orchestration repo may be justified, but only after the state
machine is clear. It should not own resources directly. It should own
safe sequencing across repos.

Possible responsibilities:

- verify Railiance preconditions
- initialize credential bootstrap
- deploy or validate identity services
- deploy or validate flex-auth and Topaz
- run IAM Profile conformance checks
- run authorization conformance checks
- produce a platform security readiness report

This orchestration layer should build on Railiance capabilities rather
than bypassing the Railiance stack boundaries.

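Safe sequencing with a readiness report might look like the following sketch. It assumes each responsibility is exposed as a named check that returns a boolean; `readiness_report` and the step names in the usage note are hypothetical and do not describe an existing Railiance interface.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def readiness_report(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order; after the first failure, later steps are reported as skipped."""
    report: List[Tuple[str, str]] = []
    blocked = False
    for name, check in steps:
        if blocked:
            report.append((name, "skipped"))
        elif check():
            report.append((name, "ok"))
        else:
            report.append((name, "failed"))
            blocked = True
    return report
```

Called with steps such as `("railiance preconditions", check_fn)` followed by `("credential bootstrap", other_fn)`, the function only sequences and reports. The checks themselves would remain owned by the repos that implement them, in line with the requirement that orchestration does not own resources directly.
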
## Open Questions

- Where is the durable audit log stored for platform-root decisions?
- Which actions require dual control or human confirmation?
- How is break-glass use recorded when normal identity is unavailable?
- Which tenant metadata is required before a service can register
  resources with flex-auth?
- When does the platform switch from key-cape lightweight mode to
  Keycloak expanded mode?
- Does Topaz run centrally for the platform, per tenant, or per service
  for the first production deployment?

@@ -0,0 +1,137 @@
---
id: NK-WP-0006
type: workplan
title: Recursive platform identity and security architecture
domain: identity-security
repo: net-kingdom
status: proposed
owner: Bernd Worsch
topic_slug: recursive-platform-identity-security
created: 2026-05-17
updated: 2026-05-17
depends_on:
  - NK-WP-0001
  - NK-WP-0004
  - NK-WP-0005
---

# NK-WP-0006 - Recursive Platform Identity and Security Architecture

## Goal

Make the platform identity and security architecture explicit enough that
Coulomb can be onboarded as the first internal/reference tenant without
accidentally becoming the platform root of trust.

The workplan turns the recursive insight into operational structure:
bootstrap plane, platform control plane, tenant plane, IAM Profile,
flex-auth authorization, Topaz runtime, privacyIDEA MFA/token handling,
and safe orchestration boundaries.

## Context

The current platform work is both building the Coulomb infrastructure and
creating reusable infrastructure for later use cases. That means Coulomb
is tenant zero, the reference tenant, inside its own future platform.
This is a useful design pressure, but only if the tenant/control-plane
separation is made explicit.

NetKingdom owns the canonical identity and security architecture.
Railiance owns deployment layering. flex-auth provides the practical
reference implementation for CARING authorization semantics. key-cape and
Keycloak implement the IAM Profile in different operating modes.

## Scope

In scope:

- document the three-plane architecture
- define platform-root versus tenant authority
- define how NetKingdom, key-cape, Keycloak, privacyIDEA, flex-auth,
  Topaz, and Railiance relate
- define bootstrap-to-runtime trust states
- update related workplans and ADRs when implementation details become
  concrete
- identify whether a dedicated orchestration repo is justified

Out of scope:

- implementing flex-auth adapters
- deploying Keycloak, key-cape, privacyIDEA, Topaz, or Railiance services
- designing customer-specific tenant policy
- replacing existing Railiance layer ownership

## Tasks

```task
id: NK-WP-0006-T1
status: done
priority: high
```
Document the recursive multi-tenant identity/security architecture in
`docs/platform-identity-security-architecture.md`.

```task
id: NK-WP-0006-T2
status: done
priority: high
```
Record the architecture decision in an ADR so later repo work can point
to a stable decision.

```task
id: NK-WP-0006-T3
status: pending
priority: high
```
Review flex-auth workplans and add tenant/control-plane implications:
CARING descriptors, policy packages, decision envelopes, Topaz adapter
scope, audit/explain records, and platform-root guardrails.

```task
id: NK-WP-0006-T4
status: pending
priority: high
```
Review NetKingdom credential/bootstrap workplans and add explicit trust
state transitions: bare host, cluster, secrets, bootstrap identity,
runtime identity, runtime authorization, tenant onboarding.

```task
id: NK-WP-0006-T5
status: pending
priority: medium
```
Map the first Coulomb tenant onboarding path: identity claims, tenant id,
resource registration, policy package import, Topaz initialization, and
audit readiness.

```task
id: NK-WP-0006-T6
status: pending
priority: medium
```
Decide whether orchestration should stay as Railiance playbooks or become
a dedicated repo. Capture the decision as an ADR before implementation.

```task
id: NK-WP-0006-T7
status: pending
priority: medium
```
Define production readiness checks for the security platform: MFA state,
secret rotation state, flex-auth policy state, Topaz health, audit sink,
and break-glass verification.

## Acceptance Criteria

- Architecture docs distinguish bootstrap plane, platform control plane,
  and tenant plane.
- Coulomb is represented as tenant zero/reference tenant, not platform
  root.
- The roles of NetKingdom, key-cape, Keycloak, privacyIDEA, flex-auth,
  Topaz, and Railiance are clear.
- Follow-up workplans identify where flex-auth and bootstrap work need to
  adapt.
- Any future orchestration repo is justified by an ADR before it is
  created.