# Platform Identity and Security Architecture

Status: draft architecture baseline for NetKingdom/Railiance/Coulomb

Date: 2026-05-17

## Purpose

This document captures the production-oriented identity, authorization,
MFA, credential, and bootstrap architecture for the platform we are
building. It deliberately treats Coulomb as the first internal tenant and
reference workload, not as the platform itself.

The architecture must be recursive: the same platform that protects
future tenants also protects the services and repositories used to build
and operate the platform. That recursion is useful, but it is also where
many security designs accidentally collapse into self-administering root
power. This document exists to prevent that.

## Core Model

```text
Bootstrap plane
  establishes initial trust before normal platform services exist

Platform control plane
  operates identity, MFA, secrets, policy, audit, and authorization

Tenant planes
  run Coulomb and future customer/project/domain workloads
```

Coulomb is the first internal tenant. It is also the reference tenant that
helps validate the platform. It must not become the platform root of
trust merely because it is first.

## Planes

### Bootstrap Plane

The bootstrap plane exists before the full platform is alive. It owns the
minimal authority needed to create and recover the control plane.

Responsibilities:

- host provisioning and hardening
- root age/SOPS material and emergency bundles
- initial cluster access
- initial identity service deployment
- initial secret injection
- break-glass recovery
- transition to managed runtime authority

Owned primarily by `railiance-infra`, `railiance-cluster`, and the
credential bootstrap work in `net-kingdom`.

### Platform Control Plane

The platform control plane owns shared security services.

Responsibilities:

- NetKingdom IAM Profile
- lightweight identity mode through key-cape
- expanded identity mode through Keycloak
- MFA/token lifecycle through privacyIDEA where applicable
- canonical authorization through flex-auth
- delegated authorization runtime through Topaz first, with other PDPs as
  adapters
- runtime secret authority through OpenBao
- audit and explanation records
- platform service secrets, dynamic credentials, leases, and rotation

Owned conceptually by `net-kingdom`; deployed through the Railiance stack.

### Tenant Plane

Tenant planes are where workloads live. Coulomb is tenant zero, the
reference tenant; later tenants may be projects, customers, domains,
sandboxes, or isolated deployments.

Responsibilities:

- protected services and repositories
- tenant-owned resources
- tenant-specific groups, policies, and service accounts
- local enforcement of authorization decisions
- workload audit events and diagnostics

Tenant administrators may manage their own tenant's resources. They must
not be able to alter platform root trust, global identity configuration,
platform break-glass material, or the policy pipeline that governs the
platform itself.

## Component Responsibilities

| Component | Primary role | Must not become |
| --- | --- | --- |
| `net-kingdom` | canonical security architecture, IAM Profile, SSO/MFA, credential bootstrap decisions | a deployment repo for every stack layer |
| `key-cape` | lightweight IAM implementation of the NetKingdom IAM Profile | a general-purpose IAM platform or authorization engine |
| Keycloak | expanded-mode IAM and optional Keycloak Authorization Services adapter | the canonical model for all platform authorization |
| privacyIDEA | MFA/token authority, especially in lightweight/key-cape mode | a policy decision point for application resources |
| OpenBao | runtime platform secrets service, dynamic credential broker, lease/revocation point, and audit source for secret access | the bootstrap root of trust or an application-specific configuration store |
| `flex-auth` | authorization control plane, CARING descriptors, policy packages, decision envelopes, audit/explain | an identity provider or backend-specific wrapper |
| Topaz | first delegated authorization runtime/PDP for flex-auth | the platform control plane or identity provider |
| Railiance repos | converged infrastructure, cluster, platform services, enablement, and app deployment | the source of security policy semantics |

## Identity Path

```text
Human/service/agent principal
  |
  v
NetKingdom IAM Profile
  |
  +-- lightweight mode: key-cape
  |     Authelia + LLDAP + privacyIDEA
  |
  +-- expanded mode: Keycloak
        Keycloak + LDAP/Entra federation + MFA integration
```

Applications depend on the IAM Profile, not on the concrete provider.
key-cape is the lightweight profile implementation. Keycloak is the
expanded-mode profile implementation. privacyIDEA provides MFA/token
capabilities where the deployment mode uses it.

Identity answers: who is this actor, how was the actor authenticated,
what coarse claims are asserted, and what assurance evidence exists?

Identity does not make the final, resource-specific authorization decision.

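Here is a minimal Python sketch of that separation. It rests on some assumptions. `IdentityAssertion`, `IAMProfileIssuer`, `LightweightIssuer`, and `ExpandedIssuer` are hypothetical names, not key-cape or Keycloak interfaces, and the claim values are placeholders. The sketch shows application code depending only on a profile-shaped assertion. That assertion records who, how, coarse claims, and assurance, and it has no field for a resource decision.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class IdentityAssertion:
    """Hypothetical IAM Profile assertion: who, how, coarse claims, assurance."""
    subject: str                    # who is this actor
    principal_type: str             # "human", "service", or "agent"
    auth_methods: tuple[str, ...]   # how the actor was authenticated
    claims: dict = field(default_factory=dict)  # coarse claims only
    assurance: str = "single-factor"            # assurance evidence summary
    # Deliberately no "allowed" field: resource authorization is not identity's job.


def _assurance(methods: tuple[str, ...]) -> str:
    # Toy rule for illustration: two or more distinct methods count as MFA.
    return "mfa" if len(set(methods)) > 1 else "single-factor"


class IAMProfileIssuer(Protocol):
    def authenticate(self, username: str, methods: tuple[str, ...]) -> IdentityAssertion: ...


class LightweightIssuer:
    """Stand-in for lightweight mode (key-cape); not its real interface."""

    def authenticate(self, username: str, methods: tuple[str, ...]) -> IdentityAssertion:
        return IdentityAssertion(f"user:{username}", "human", methods,
                                 {"groups": "coulomb-admins"}, _assurance(methods))


class ExpandedIssuer:
    """Stand-in for expanded mode (Keycloak); not its real interface."""

    def authenticate(self, username: str, methods: tuple[str, ...]) -> IdentityAssertion:
        return IdentityAssertion(f"user:{username}", "human", methods,
                                 {"groups": "coulomb-admins", "federation": "entra"},
                                 _assurance(methods))


def application_login(issuer: IAMProfileIssuer, username: str,
                      methods: tuple[str, ...]) -> IdentityAssertion:
    # Application code sees only the profile-shaped assertion, never the provider.
    return issuer.authenticate(username, methods)
```

Swapping `LightweightIssuer` for `ExpandedIssuer` changes nothing in `application_login`. That is the property the profile abstraction is meant to provide.
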
## Authorization Path

```text
Identity claims from IAM Profile
  |
  v
flex-auth
  resource registry
  policy packages
  CARING descriptors
  decision/audit/explain envelope
  |
  +-- standalone evaluator
  +-- Topaz delegated PDP
  +-- optional Keycloak AuthZ adapter
  +-- future OpenFGA/SpiceDB/OPA/Cedar adapters
  |
  v
Protected service enforcement
```

Authorization answers: may this actor perform this action on this
resource in this context, and what explanation/audit/CARING metadata
supports that answer?

Protected services enforce decisions locally. flex-auth is the canonical
policy and decision boundary; delegated PDPs are runtime implementations
behind it.

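The same boundary can be sketched in Python. `DecisionRequest`, `DecisionEnvelope`, `PDPAdapter`, `StandaloneEvaluator`, and `FlexAuthBoundary` are hypothetical illustrations, not the flex-auth or Topaz API. The `caring` field is only a placeholder for CARING metadata, and the resource string format is made up. The sketch shows three things: callers go through one decision boundary that records every decision, the PDP behind it is swappable, and the protected service enforces deny-by-default locally.

```python
import uuid
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class DecisionRequest:
    subject: str       # taken from IAM Profile claims
    action: str
    resource: str      # illustrative format, e.g. "tenant:coulomb/repo/seed"
    context: dict = field(default_factory=dict)


@dataclass(frozen=True)
class DecisionEnvelope:
    allowed: bool
    explanation: str
    audit_id: str
    backend: str
    caring: dict = field(default_factory=dict)  # placeholder for CARING metadata


class PDPAdapter(Protocol):
    name: str
    def evaluate(self, req: DecisionRequest) -> tuple[bool, str]: ...


class StandaloneEvaluator:
    """Toy grant table; a deployment might delegate to a Topaz adapter instead."""
    name = "standalone"

    def __init__(self, grants: set[tuple[str, str, str]]):
        self.grants = grants

    def evaluate(self, req: DecisionRequest) -> tuple[bool, str]:
        if (req.subject, req.action, req.resource) in self.grants:
            return True, "matched explicit grant"
        return False, "no matching grant (default deny)"


class FlexAuthBoundary:
    """Single decision boundary: callers never talk to the PDP directly."""

    def __init__(self, pdp: PDPAdapter):
        self.pdp = pdp
        self.audit_log: list[DecisionEnvelope] = []

    def decide(self, req: DecisionRequest) -> DecisionEnvelope:
        allowed, why = self.pdp.evaluate(req)
        env = DecisionEnvelope(allowed, why, str(uuid.uuid4()), self.pdp.name)
        self.audit_log.append(env)   # record every decision, allow or deny
        return env


def enforce(boundary: FlexAuthBoundary, req: DecisionRequest) -> str:
    # Local enforcement inside the protected service.
    env = boundary.decide(req)
    if not env.allowed:
        raise PermissionError(f"denied: {env.explanation} (audit {env.audit_id})")
    return "ok"
```

You could replace `StandaloneEvaluator` with a Topaz-backed adapter without touching `enforce` or the envelope shape. That is one reading of "delegated PDPs are runtime implementations behind it."
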
## Secret And Credential Path

```text
Bootstrap SOPS/age material
  |
  v
OpenBao platform secrets service
  KV v2 platform configuration
  dynamic database credentials
  Kubernetes auth / workload identity
  future object-storage credential brokering
  audit devices and lease/revocation records
  |
  +-- direct OpenBao clients
  +-- External Secrets Operator / synced Kubernetes Secrets
  +-- CSI-mounted secrets where appropriate
  |
  v
Platform and tenant workloads
```

SOPS/age remains the bootstrap and Git-at-rest protection mechanism. It
can create the initial cluster secrets and emergency recovery bundles, but
it should not become the long-lived runtime authority for every workload
secret.

OpenBao is the runtime platform secrets service once the control plane is
alive. It owns secret leases, revocation, audit, dynamic credentials, and
workload-facing secret delivery patterns. Workloads should receive scoped
secrets or short-lived credentials, not platform-root material. Tenant
administrators may manage tenant-scoped secrets through approved policy
paths; they must not gain access to OpenBao root tokens, unseal keys,
platform mounts, or global secret engine configuration.

OpenBao does not replace identity or authorization. NetKingdom IAM
identifies actors and workloads; flex-auth decides whether a credential
or secret request is allowed; OpenBao stores, issues, audits, and revokes
the resulting secret material.

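Here is a sketch of the division of labor in that paragraph. It makes three assumptions. An in-memory `FakeSecretBroker` stands in for OpenBao; it does not use or imitate the OpenBao HTTP API. A caller-supplied `authorize` callable stands in for flex-auth. The `tenants/<tenant>/...` path layout is hypothetical. The point is the order of operations: identify the workload, then make an authorization decision, then issue a scoped, expiring, audited, revocable credential.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Lease:
    lease_id: str
    username: str
    password: str
    expires_at: float
    revoked: bool = False


class FakeSecretBroker:
    """In-memory stand-in for OpenBao's responsibilities: issue, audit, revoke.

    This is not the OpenBao API; it only models who does what.
    """

    def __init__(self) -> None:
        self.leases: dict[str, Lease] = {}
        self.audit: list[tuple[str, str, str]] = []  # (event, actor, target)

    def issue(self, subject: str, path: str, ttl_s: int) -> Lease:
        lease = Lease(
            lease_id=secrets.token_hex(8),
            username=f"v-{secrets.token_hex(4)}",   # per-lease, short-lived user
            password=secrets.token_urlsafe(24),
            expires_at=time.time() + ttl_s,
        )
        self.leases[lease.lease_id] = lease
        self.audit.append(("issue", subject, path))
        return lease

    def revoke(self, lease_id: str) -> None:
        self.leases[lease_id].revoked = True
        self.audit.append(("revoke", "broker", lease_id))


def request_tenant_credential(
    workload_subject: str,                        # assumed to come from the IAM Profile
    tenant: str,
    path: str,
    authorize: Callable[[str, str, str], bool],   # stands in for a flex-auth decision
    broker: FakeSecretBroker,
    ttl_s: int = 900,
) -> Lease:
    """Identity, then authorization, then scoped secret issuance."""
    if not path.startswith(f"tenants/{tenant}/"):
        # Tenant workloads stay inside their own (hypothetical) path prefix.
        raise PermissionError(f"{path!r} is outside tenant scope")
    if not authorize(workload_subject, "issue", path):
        raise PermissionError("denied by authorization")
    return broker.issue(workload_subject, path, ttl_s)
```

In a real deployment the authorization step and the issuance step belong to different services. They are collapsed into one function here only to make the ordering visible.
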
## Recursive Trust Rule

Normal tenant administration must never be sufficient to alter the
platform root of trust.

This applies even when the tenant is Coulomb. Coulomb can be a tenant and
a reference workload, but platform-root actions require platform control
plane authority and appropriate bootstrap/break-glass safeguards.

Examples of platform-root actions:

- changing IAM Profile semantics
- rotating root bootstrap keys
- changing break-glass access
- changing global MFA requirements
- activating authorization policy that governs platform administration
- changing flex-auth/Topaz policy import pipelines
- changing OpenBao root tokens, unseal policy, platform mounts, or global
  auth methods
- changing audit retention or tamper-evidence settings

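One way to express the rule is as a guard. This is a sketch only. The action identifiers loosely mirror the examples above, and `Subject`, the `(scope, role)` bindings, and the `second_approver` parameter are hypothetical. Dual control appears here purely as an example safeguard; which actions actually require it is still an open question in this document.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical identifiers loosely mirroring the platform-root examples.
PLATFORM_ROOT_ACTIONS = frozenset({
    "iam_profile.change_semantics",
    "bootstrap.rotate_root_keys",
    "break_glass.change_access",
    "mfa.change_global_requirements",
    "authz.activate_platform_admin_policy",
    "authz.change_policy_import_pipeline",
    "openbao.change_root_unseal_mounts_or_auth",
    "audit.change_retention_or_tamper_evidence",
})


@dataclass(frozen=True)
class Subject:
    id: str
    # Role bindings as (scope, role) pairs, e.g. ("tenant:coulomb", "admin").
    roles: frozenset = field(default_factory=frozenset)


def may_perform(subject: Subject, action: str, scope: str,
                second_approver: Optional[str] = None) -> bool:
    """Return True if `subject` may perform `action` in `scope`."""
    if action in PLATFORM_ROOT_ACTIONS:
        # Tenant roles never count here, including admin of tenant:coulomb.
        if ("tenant:platform", "operator") not in subject.roles:
            return False
        # Example safeguard (assumption): a distinct second approver.
        return second_approver is not None and second_approver != subject.id
    # Ordinary administration: admin of exactly that tenant scope.
    return (scope, "admin") in subject.roles
```
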
## Tenant Model

Every protected resource should belong to a tenant or to the platform
control plane.

Suggested identifiers:

```text
tenant:platform          # platform control plane resources
tenant:coulomb           # first internal/reference tenant
tenant:sandbox:<name>    # sandbox tenants
tenant:customer:<name>   # future customer tenants
```

Tenant membership and platform membership are distinct. A subject may be
an administrator in `tenant:coulomb` without being a platform operator.

CARING descriptors should explicitly identify scope and tenant when the
access is tenant-scoped. Platform-scoped descriptors should be rare,
audited, and usually condition-bound.

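Here is a sketch of how the suggested identifiers and descriptor rules might be checked mechanically. The regular expression assumes lowercase alphanumeric-and-hyphen `<name>` segments. `AccessDescriptor` is a hypothetical shape, not the actual CARING descriptor schema. Platform-scoped descriptors are only *usually* condition-bound, so a missing condition is returned as a warning rather than an error.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Grammar for the suggested identifiers; the <name> charset is an assumption.
_TENANT_RE = re.compile(r"^tenant:(platform|coulomb|(sandbox|customer):([a-z0-9][a-z0-9-]*))$")


def parse_tenant(tenant_id: str) -> tuple[str, Optional[str]]:
    """Return (kind, name) for a tenant identifier, or raise ValueError."""
    m = _TENANT_RE.match(tenant_id)
    if not m:
        raise ValueError(f"unrecognized tenant identifier: {tenant_id!r}")
    if m.group(2):                      # sandbox:<name> or customer:<name>
        return m.group(2), m.group(3)
    return m.group(1), None             # platform or coulomb


@dataclass(frozen=True)
class AccessDescriptor:
    """Hypothetical fields; not the actual CARING descriptor schema."""
    scope: str                          # "tenant" or "platform"
    tenant: Optional[str] = None
    conditions: tuple[str, ...] = ()
    audited: bool = False


def check_descriptor(d: AccessDescriptor) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a descriptor under the tenant rules."""
    errors: list[str] = []
    warnings: list[str] = []
    if d.scope == "tenant":
        if d.tenant is None:
            errors.append("tenant-scoped descriptor must name its tenant")
        else:
            parse_tenant(d.tenant)      # raises ValueError on a malformed id
    elif d.scope == "platform":
        if not d.audited:
            errors.append("platform-scoped descriptor must be audited")
        if not d.conditions:
            warnings.append("platform-scoped descriptor has no conditions")
    else:
        errors.append(f"unknown scope {d.scope!r}")
    return errors, warnings
```
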
## Bootstrap To Runtime Transition

Production setup should move through explicit trust states:

1. **Bare host trust** - provisioned and verified by Railiance infra.
2. **Cluster trust** - Kubernetes runtime exists and is verified.
3. **Bootstrap secret trust** - age/SOPS and emergency bundles are
   established.
4. **Bootstrap identity trust** - local/bootstrap identity can operate
   enough to install full identity services.
5. **Runtime secret trust** - OpenBao is deployed, initialized, unsealed,
   audited, backed up, and ready to issue scoped secrets.
6. **Runtime identity trust** - key-cape or Keycloak becomes the normal
   IAM Profile issuer.
7. **Runtime authorization trust** - flex-auth and Topaz are initialized
   with platform and tenant policies.
8. **Tenant onboarding trust** - Coulomb and later tenants register
   resources and receive scoped authority.

Each transition needs a verification check and a rollback/recovery path.

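Eight ordered states, each with a verification check and a rollback/recovery path, suggest a small state machine. This sketch assumes each step is supplied as hypothetical `apply`, `verify`, and `rollback` callables. It enforces the order above. It rolls back when applying a step raises or when verification does not pass.

```python
from typing import Callable

# The eight trust states above, in the order they must be reached.
TRUST_STATES = [
    "bare_host",
    "cluster",
    "bootstrap_secret",
    "bootstrap_identity",
    "runtime_secret",
    "runtime_identity",
    "runtime_authorization",
    "tenant_onboarding",
]


class TransitionError(RuntimeError):
    """Raised when a transition is out of order or fails verification."""


def advance(
    reached: list[str],
    target: str,
    apply: Callable[[], None],
    verify: Callable[[], bool],
    rollback: Callable[[], None],
) -> list[str]:
    """Enter exactly the next trust state, or roll back and raise."""
    nxt = TRUST_STATES[len(reached)] if len(reached) < len(TRUST_STATES) else None
    if target != nxt:
        raise TransitionError(f"cannot enter {target!r}; next state is {nxt!r}")
    try:
        apply()
        ok = verify()
    except Exception:
        rollback()          # recovery path if the step itself fails
        raise
    if not ok:
        rollback()          # recovery path if the verification check fails
        raise TransitionError(f"verification failed for {target!r}; rolled back")
    return reached + [target]
```

Keeping the order in one list turns a skipped state into an explicit error rather than an accident. Skipping OpenBao and going straight to runtime identity is one example.
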
## Production Topology

For an initial production-capable Coulomb deployment:

```text
railiance-infra
  host baseline, SSH, age keys, emergency material

railiance-cluster
  Kubernetes, ingress, cert-manager, network policy

railiance-platform
  OpenBao, PostgreSQL, object storage, platform service secret delivery
  key-cape or Keycloak
  privacyIDEA where used
  flex-auth
  Topaz

railiance-apps
  Coulomb services as tenant:coulomb workloads
```

`net-kingdom` owns the architecture and standards. Railiance owns the
converged deployment layers. Component repos own their implementation
contracts.

## Orchestration Implication

A future orchestration repo may be justified, but only after the state
machine is clear. It should not own resources directly. It should own
safe sequencing across repos.

Possible responsibilities:

- verify Railiance preconditions
- initialize credential bootstrap
- deploy or validate identity services
- deploy or validate flex-auth and Topaz
- run IAM Profile conformance checks
- run authorization conformance checks
- produce a platform security readiness report

This orchestration layer should build on Railiance capabilities rather
than bypassing the Railiance stack boundaries.

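Here is a sketch of that "sequence, don't own" stance. It assumes each responsibility is exposed as a hypothetical read-only probe that returns `(passed, detail)`. The orchestrator only runs probes in order and marks later checks as skipped after the first failure. It then produces a readiness summary. The check names in the test are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

Probe = Callable[[], tuple[bool, str]]


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def readiness_report(checks: list[tuple[str, Probe]]) -> dict:
    """Run read-only probes in order and summarize platform readiness.

    Probes only observe; nothing here creates or mutates resources. Once a
    probe fails, later probes are recorded as skipped, since their
    preconditions no longer hold.
    """
    results: list[CheckResult] = []
    blocked = False
    for name, probe in checks:
        if blocked:
            results.append(CheckResult(name, False, "skipped: earlier check failed"))
            continue
        passed, detail = probe()
        results.append(CheckResult(name, passed, detail))
        blocked = not passed
    return {
        # An empty check list is not evidence of readiness.
        "ready": bool(results) and all(r.passed for r in results),
        "results": results,
    }
```
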
## Open Questions

- Where is the durable audit log stored for platform-root decisions?
- Where are OpenBao audit logs durably shipped, and how are they included
  in tamper-evidence and restore drills?
- Which actions require dual control or human confirmation?
- How is break-glass use recorded when normal identity is unavailable?
- Which workloads consume OpenBao directly, via External Secrets Operator,
  or via CSI-mounted secrets?
- Which tenant metadata is required before a service can register
  resources with flex-auth?
- When does the platform switch from key-cape lightweight mode to
  Keycloak expanded mode?
- Does Topaz run centrally for the platform, per tenant, or per service
  for the first production deployment?