# Platform Identity and Security Architecture

Status: draft architecture baseline for NetKingdom/Railiance/Coulomb

Date: 2026-05-17

## Purpose

This document captures the production-oriented identity, authorization,
MFA, credential, and bootstrap architecture for the platform we are
building. It deliberately treats Coulomb as the first internal tenant and
reference workload, not as the platform itself.

The architecture must be recursive: the same platform that protects
future tenants also protects the services and repositories used to build
and operate the platform. That recursion is useful, but it is also where
many security designs accidentally collapse into self-administering root
power. This document exists to prevent that.

## Core Model

```text
Bootstrap plane
  establishes initial trust before normal platform services exist

Platform control plane
  operates identity, MFA, secrets, policy, audit, and authorization

Tenant planes
  run Coulomb and future customer/project/domain workloads
```

Coulomb is the first internal tenant. It is also the reference tenant that
helps validate the platform. It must not become the platform root of
trust merely because it is first.

## Planes

### Bootstrap Plane

The bootstrap plane exists before the full platform is alive. It owns the
minimal authority needed to create and recover the control plane.

Responsibilities:

- host provisioning and hardening
- root age/SOPS material and emergency bundles
- initial cluster access
- initial identity service deployment
- initial secret injection
- break-glass recovery
- transition to managed runtime authority

Owned primarily by `railiance-infra`, `railiance-cluster`, and the
credential bootstrap work in `net-kingdom`.

### Platform Control Plane

The platform control plane owns shared security services.

Responsibilities:

- NetKingdom IAM Profile
- lightweight identity mode through key-cape
- expanded identity mode through Keycloak
- MFA/token lifecycle through privacyIDEA where applicable
- canonical authorization through flex-auth
- delegated authorization runtime through Topaz first, with other PDPs as
  adapters
- audit and explanation records
- platform service secrets and rotation

Owned conceptually by `net-kingdom`; deployed through the Railiance stack.

### Tenant Plane

Tenant planes are where workloads live. Coulomb is tenant zero, the
reference tenant; later tenants may be projects, customers, domains,
sandboxes, or isolated deployments.

Responsibilities:

- protected services and repositories
- tenant-owned resources
- tenant-specific groups, policies, and service accounts
- local enforcement of authorization decisions
- workload audit events and diagnostics

Tenant administrators may manage their tenant resources. They must not be
able to alter platform root trust, global identity configuration,
platform break-glass material, or the policy pipeline that governs the
platform itself.

## Component Responsibilities

| Component | Primary role | Must not become |
| --- | --- | --- |
| `net-kingdom` | canonical security architecture, IAM Profile, SSO/MFA, credential bootstrap decisions | a deployment repo for every stack layer |
| `key-cape` | lightweight IAM implementation of the NetKingdom IAM Profile | a general-purpose IAM platform or authorization engine |
| Keycloak | expanded-mode IAM and optional Keycloak Authorization Services adapter | the canonical model for all platform authorization |
| privacyIDEA | MFA/token authority, especially in lightweight/key-cape mode | a policy decision point for application resources |
| `flex-auth` | authorization control plane, CARING descriptors, policy packages, decision envelopes, audit/explain | an identity provider or backend-specific wrapper |
| Topaz | first delegated authorization runtime/PDP for flex-auth | the platform control plane or identity provider |
| Railiance repos | converged infrastructure, cluster, platform services, enablement, and app deployment | the source of security policy semantics |

## Identity Path

```text
Human/service/agent principal
        |
        v
NetKingdom IAM Profile
        |
        +-- lightweight mode: key-cape
        |     Authelia + LLDAP + privacyIDEA
        |
        +-- expanded mode: Keycloak
              Keycloak + LDAP/Entra federation + MFA integration
```

Applications depend on the IAM Profile, not on the concrete provider.
key-cape is the lightweight profile implementation; Keycloak is the
expanded-mode profile implementation. privacyIDEA provides MFA/token
capabilities in deployment modes that use it.

Identity answers: who is this actor, how was the actor authenticated,
what coarse claims are asserted, and what assurance evidence exists?

Identity does not answer final resource-specific authorization.
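The provider-neutral contract can be sketched in Python. This is a minimal illustration under stated assumptions. `ProfileIdentity`, `IamProfileProvider`, `FakeKeyCape`, `FakeKeycloak`, and `current_subject` are hypothetical names, not APIs of key-cape or Keycloak. The fakes skip all real token validation. The identity fields mirror the questions identity answers, and nothing in the sketch decides resource access.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class ProfileIdentity:
    """Hypothetical provider-neutral identity handed to applications."""
    subject: str                    # who is this actor?
    principal_type: str             # "human", "service", or "agent"
    authn_methods: tuple[str, ...]  # how was the actor authenticated?
    claims: dict = field(default_factory=dict)  # coarse claims only, e.g. groups


class IamProfileProvider(Protocol):
    """Hypothetical IAM Profile contract that both modes would satisfy."""

    def authenticate(self, token: str) -> ProfileIdentity: ...


class FakeKeyCape:
    """Stand-in for lightweight mode; performs no real token validation."""

    def __init__(self, sessions: dict[str, str]) -> None:
        self._sessions = sessions  # token -> subject, pre-seeded for the sketch

    def authenticate(self, token: str) -> ProfileIdentity:
        return ProfileIdentity(self._sessions[token], "human",
                               ("password", "otp"), {"groups": ("coulomb-admins",)})


class FakeKeycloak:
    """Stand-in for expanded mode; performs no real OIDC validation."""

    def __init__(self, sessions: dict[str, str]) -> None:
        self._sessions = sessions  # token -> subject, pre-seeded for the sketch

    def authenticate(self, token: str) -> ProfileIdentity:
        return ProfileIdentity(self._sessions[token], "human",
                               ("password", "webauthn"), {"groups": ("coulomb-admins",)})


def current_subject(provider: IamProfileProvider, token: str) -> str:
    # Application code depends only on the profile contract, never on a provider class.
    return provider.authenticate(token).subject
```

In this sketch, switching from lightweight mode to expanded mode changes only which provider object gets constructed. The application code that calls `current_subject` stays the same.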
## Authorization Path

```text
Identity claims from IAM Profile
        |
        v
flex-auth
  resource registry
  policy packages
  CARING descriptors
  decision/audit/explain envelope
        |
        +-- standalone evaluator
        +-- Topaz delegated PDP
        +-- optional Keycloak AuthZ adapter
        +-- future OpenFGA/SpiceDB/OPA/Cedar adapters
        |
        v
Protected service enforcement
```

Authorization answers: may this actor perform this action on this
resource in this context, and what explanation/audit/CARING metadata
supports that answer?

Protected services enforce decisions locally. flex-auth is the canonical
policy and decision boundary; delegated PDPs are runtime implementations
behind it.
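The split between a canonical boundary and swappable PDPs can be sketched as follows. This is a minimal sketch under assumptions. `AuthzRequest`, `DecisionEnvelope`, `PdpAdapter`, `StandaloneEvaluator`, `FlexAuthBoundary`, and `enforce` are hypothetical and do not reflect the real interfaces of flex-auth or Topaz. In this model, a Topaz or Keycloak AuthZ adapter would simply be another class with the same `evaluate` shape.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class AuthzRequest:
    """May this subject perform this action on this resource in this context?"""
    subject: str
    action: str
    resource: str  # hypothetical naming, e.g. "tenant:coulomb/repo/seed"
    context: dict = field(default_factory=dict)


@dataclass(frozen=True)
class DecisionEnvelope:
    """Hypothetical envelope: the decision plus which PDP made it and why."""
    allowed: bool
    pdp: str
    explanation: str


class PdpAdapter(Protocol):
    """Shape a delegated runtime (standalone, Topaz, ...) would implement."""
    name: str

    def evaluate(self, request: AuthzRequest) -> tuple[bool, str]: ...


class StandaloneEvaluator:
    """Toy PDP: allows exactly the (subject, action, resource) triples it was given."""
    name = "standalone"

    def __init__(self, grants: set[tuple[str, str, str]]) -> None:
        self._grants = grants

    def evaluate(self, request: AuthzRequest) -> tuple[bool, str]:
        key = (request.subject, request.action, request.resource)
        if key in self._grants:
            return True, f"grant matched: {key}"
        return False, "no matching grant"


class FlexAuthBoundary:
    """Canonical decision boundary; the PDP behind it is swappable."""

    def __init__(self, pdp: PdpAdapter, audit_log: list) -> None:
        self._pdp = pdp
        self._audit_log = audit_log

    def decide(self, request: AuthzRequest) -> DecisionEnvelope:
        allowed, why = self._pdp.evaluate(request)
        envelope = DecisionEnvelope(allowed, self._pdp.name, why)
        self._audit_log.append((request, envelope))  # every decision is recorded
        return envelope


def enforce(boundary: FlexAuthBoundary, request: AuthzRequest) -> None:
    # The protected service asks the boundary, then enforces the answer locally.
    envelope = boundary.decide(request)
    if not envelope.allowed:
        raise PermissionError(envelope.explanation)
```

The protected service only talks to the boundary. The boundary records every envelope, allowed or denied, so replacing the PDP does not change how audit and enforcement work.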
## Recursive Trust Rule

Normal tenant administration must never be sufficient to alter the
platform root of trust.

This applies even when the tenant is Coulomb. Coulomb can be a tenant and
a reference workload, but platform-root actions require platform control
plane authority and appropriate bootstrap/break-glass safeguards.

Examples of platform-root actions:

- changing IAM Profile semantics
- rotating root bootstrap keys
- changing break-glass access
- changing global MFA requirements
- activating authorization policy that governs platform administration
- changing flex-auth/Topaz policy import pipelines
- changing audit retention or tamper-evidence settings
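The rule can be expressed as a guard that checks the action before it checks tenant roles. This is a sketch under assumptions. The action strings, `Principal`, `may_perform`, and the single `safeguards_passed` flag are hypothetical. That flag stands in for whatever bootstrap or break-glass safeguards are eventually chosen, which remain open questions. The sketch also deliberately gives platform operators no implicit tenant rights. That choice is not taken from this document.

```python
from dataclasses import dataclass

# Platform-root actions from the list above; the action names are hypothetical.
PLATFORM_ROOT_ACTIONS = frozenset({
    "iam_profile.change_semantics",
    "bootstrap.rotate_root_keys",
    "break_glass.change_access",
    "mfa.change_global_requirements",
    "authz.activate_platform_admin_policy",
    "authz.change_policy_import_pipeline",
    "audit.change_retention_or_tamper_evidence",
})


@dataclass(frozen=True)
class Principal:
    subject: str
    platform_operator: bool = False                # platform control plane authority
    tenant_admin_of: frozenset[str] = frozenset()  # e.g. {"tenant:coulomb"}


def may_perform(principal: Principal, action: str, tenant: str,
                safeguards_passed: bool = False) -> bool:
    """Return True if the principal may perform `action` within `tenant`."""
    if action in PLATFORM_ROOT_ACTIONS:
        # Tenant administration never suffices, not even for tenant:coulomb.
        # `safeguards_passed` stands in for bootstrap/break-glass safeguards.
        return principal.platform_operator and safeguards_passed
    # Ordinary actions are scoped to the tenant being administered.
    return tenant in principal.tenant_admin_of
```

Because platform-root actions are checked first, adding a role to `tenant_admin_of` can never unlock them. Only platform control plane authority combined with the safeguards can do that.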
## Tenant Model

Every protected resource should belong to a tenant or to the platform
control plane.

Suggested identifiers:

```text
tenant:platform           # platform control plane resources
tenant:coulomb            # first internal/reference tenant
tenant:sandbox:<name>     # sandbox tenants
tenant:customer:<name>    # future customer tenants
```

Tenant membership and platform membership are distinct. A subject may be
an administrator in `tenant:coulomb` without being a platform operator.

CARING descriptors should explicitly identify scope and tenant when the
access is tenant-scoped. Platform-scoped descriptors should be rare,
audited, and usually condition-bound.
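A registry could reject resources whose tenant identifier falls outside the suggested scheme. The following is a minimal parser under assumptions. The allowed characters in `<name>` (lowercase letters, digits, hyphens) are a hypothetical choice, and `TenantId` and `parse_tenant` are illustrative names, not an existing flex-auth API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar: fixed tenants, or kind:name with a lowercase DNS-style name.
_TENANT_RE = re.compile(
    r"tenant:(?:(?P<fixed>platform|coulomb)"
    r"|(?P<kind>sandbox|customer):(?P<name>[a-z0-9][a-z0-9-]*))"
)


@dataclass(frozen=True)
class TenantId:
    kind: str                   # "platform", "coulomb", "sandbox", or "customer"
    name: Optional[str] = None  # only sandbox and customer tenants carry a name

    @property
    def is_platform(self) -> bool:
        return self.kind == "platform"


def parse_tenant(identifier: str) -> TenantId:
    """Parse a tenant identifier, rejecting anything outside the suggested scheme."""
    match = _TENANT_RE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"unrecognized tenant identifier: {identifier!r}")
    if match.group("fixed"):
        return TenantId(match.group("fixed"))
    return TenantId(match.group("kind"), match.group("name"))
```

The `is_platform` flag gives a registry one place to route platform-scoped resources to the stricter audit path, while every other resource stays tied to a named tenant.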
## Bootstrap To Runtime Transition

Production setup should move through explicit trust states:

1. **Bare host trust** - provisioned and verified by Railiance infra.
2. **Cluster trust** - Kubernetes runtime exists and is verified.
3. **Secret trust** - age/SOPS and emergency bundles are established.
4. **Bootstrap identity trust** - local/bootstrap identity can operate
   enough to install full identity services.
5. **Runtime identity trust** - key-cape or Keycloak becomes the normal
   IAM Profile issuer.
6. **Runtime authorization trust** - flex-auth and Topaz are initialized
   with platform and tenant policies.
7. **Tenant onboarding trust** - Coulomb and later tenants register
   resources and receive scoped authority.

Each transition needs a verification check and a rollback/recovery path.
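The ordered states, each with a verification check and a rollback path, map naturally onto a small state machine. This is a sketch under assumptions. The `UNTRUSTED` starting state, the `(apply, verify, rollback)` triple, and the functions `advance` and `run` are hypothetical, and real steps would call Railiance tooling rather than plain callables.

```python
from enum import IntEnum
from typing import Callable, Dict, Tuple


class TrustState(IntEnum):
    UNTRUSTED = 0  # hypothetical starting point before any step has run
    BARE_HOST = 1
    CLUSTER = 2
    SECRET = 3
    BOOTSTRAP_IDENTITY = 4
    RUNTIME_IDENTITY = 5
    RUNTIME_AUTHORIZATION = 6
    TENANT_ONBOARDING = 7


# Each target state has (apply, verify, rollback) callables.
Step = Tuple[Callable[[], None], Callable[[], bool], Callable[[], None]]


def advance(current: TrustState, steps: Dict[TrustState, Step]) -> TrustState:
    """Attempt one transition; on failed verification roll back and stay put."""
    if current is TrustState.TENANT_ONBOARDING:
        return current
    target = TrustState(current + 1)
    apply, verify, rollback = steps[target]
    apply()
    if verify():
        return target
    rollback()
    return current


def run(steps: Dict[TrustState, Step]) -> TrustState:
    """Walk the states in order, stopping at the first failed verification."""
    state = TrustState.UNTRUSTED
    while state is not TrustState.TENANT_ONBOARDING:
        next_state = advance(state, steps)
        if next_state is state:
            break  # leave recovery to operators instead of skipping ahead
        state = next_state
    return state
```

The sketch never skips a state. If verification fails, it rolls back that one step and reports the last verified state, which gives recovery a known starting point.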
## Production Topology

For an initial production-capable Coulomb deployment:

```text
railiance-infra
  host baseline, SSH, age keys, emergency material

railiance-cluster
  Kubernetes, ingress, cert-manager, network policy

railiance-platform
  PostgreSQL, object storage, secret management
  key-cape or Keycloak
  privacyIDEA where used
  flex-auth
  Topaz

railiance-apps
  Coulomb services as tenant:coulomb workloads
```

`net-kingdom` owns the architecture and standards. Railiance owns the
converged deployment layers. Component repos own their implementation
contracts.

## Orchestration Implication

A future orchestration repo may be justified, but only after the state
machine is clear. It should not own resources directly. It should own
safe sequencing across repos.

Possible responsibilities:

- verify Railiance preconditions
- initialize credential bootstrap
- deploy or validate identity services
- deploy or validate flex-auth and Topaz
- run IAM Profile conformance checks
- run authorization conformance checks
- produce a platform security readiness report

This orchestration layer should build on Railiance capabilities rather
than bypassing the Railiance stack boundaries.
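A readiness report that sequences checks without owning any resources could look like the following. This is a sketch under assumptions. `readiness_report`, the check names, and the report shape are hypothetical. In practice, each check would call into the relevant Railiance or component repo instead of returning a constant.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]  # (check name, read-only probe)


def readiness_report(checks: List[Check]) -> dict:
    """Run checks in order; once one fails, later checks are skipped, not run."""
    results = []
    blocked = False
    for name, check in checks:
        if blocked:
            results.append((name, "skipped"))
            continue
        passed = check()
        results.append((name, "pass" if passed else "fail"))
        blocked = not passed
    return {"ready": not blocked, "results": results}
```

Marking later checks as skipped, rather than running them, reflects the sequencing concern. For example, authorization conformance means little before IAM Profile conformance has passed.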
## Open Questions

- Where is the durable audit log stored for platform-root decisions?
- Which actions require dual control or human confirmation?
- How is break-glass use recorded when normal identity is unavailable?
- Which tenant metadata is required before a service can register
  resources with flex-auth?
- When does the platform switch from key-cape lightweight mode to
  Keycloak expanded mode?
- Does Topaz run centrally for the platform, per tenant, or per service
  for the first production deployment?