# Platform Identity and Security Architecture

Status: implemented architecture baseline for NetKingdom/Railiance/Coulomb

Date: 2026-05-18

## Purpose

This document captures the production-oriented identity, authorization,
MFA, credential, and bootstrap architecture for the platform we are
building. It deliberately treats Coulomb as the first internal tenant and
reference workload, not as the platform itself.

The architecture must be recursive: the same platform that protects
future tenants also protects the services and repositories used to build
and operate the platform. That recursion is useful, but it is also where
many security designs accidentally collapse into self-administering root
power. This document exists to prevent that.

## Core Model

```text
Bootstrap plane
  establishes initial trust before normal platform services exist

Platform control plane
  operates identity, MFA, secrets, policy, audit, and authorization

Tenant planes
  run Coulomb and future customer/project/domain workloads
```

Coulomb is the first internal tenant. It is also the reference tenant that
helps validate the platform. It must not become the platform root of
trust merely because it is first.

## Planes

### Bootstrap Plane

The bootstrap plane exists before the full platform is alive. It owns the
minimal authority needed to create and recover the control plane.

Responsibilities:

- host provisioning and hardening
- root age/SOPS material and emergency bundles
- initial cluster access
- initial identity service deployment
- initial secret injection
- break-glass recovery
- transition to managed runtime authority

Owned primarily by `railiance-infra`, `railiance-cluster`, and the
credential bootstrap work in `net-kingdom`.

### Platform Control Plane

The platform control plane owns shared security services.

Responsibilities:

- NetKingdom IAM Profile
- lightweight identity mode through key-cape
- expanded identity mode through Keycloak
- MFA/token lifecycle through privacyIDEA where applicable
- canonical authorization through flex-auth
- delegated authorization runtime through Topaz first, with other PDPs as
  adapters
- runtime secret authority through OpenBao
- audit and explanation records
- platform service secrets, dynamic credentials, leases, and rotation

Owned conceptually by `net-kingdom`; deployed through the Railiance stack.

### Tenant Plane

Tenant planes are where workloads live. Coulomb is tenant zero, the
reference tenant; later tenants may be projects, customers, domains,
sandboxes, or isolated deployments.

Responsibilities:

- protected services and repositories
- tenant-owned resources
- tenant-specific groups, policies, and service accounts
- local enforcement of authorization decisions
- workload audit events and diagnostics

Tenant administrators may manage their tenant resources. They must not be
able to alter platform root trust, global identity configuration,
platform break-glass material, or the policy pipeline that governs the
platform itself.

## Component Responsibilities

| Component | Primary role | Must not become |
| --- | --- | --- |
| `net-kingdom` | canonical security architecture, IAM Profile, SSO/MFA, credential bootstrap decisions | a deployment repo for every stack layer |
| `key-cape` | lightweight IAM implementation of the NetKingdom IAM Profile | a general-purpose IAM platform or authorization engine |
| Keycloak | expanded-mode IAM and optional Keycloak Authorization Services adapter | the canonical model for all platform authorization |
| privacyIDEA | MFA/token authority, especially in lightweight/key-cape mode | a policy decision point for application resources |
| OpenBao | runtime platform secrets service, dynamic credential broker, lease/revocation point, and audit source for secret access | the bootstrap root of trust or an application-specific configuration store |
| `flex-auth` | authorization control plane, CARING descriptors, policy packages, decision envelopes, audit/explain | an identity provider or backend-specific wrapper |
| Topaz | first delegated authorization runtime/PDP for flex-auth | the platform control plane or identity provider |
| Railiance repos | converged infrastructure, cluster, platform services, enablement, and app deployment | the source of security policy semantics |

## Identity Path

```text
Human/service/agent principal
    |
    v
NetKingdom IAM Profile
    |
    +-- lightweight mode: key-cape
    |     Authelia + LLDAP + privacyIDEA
    |
    +-- expanded mode: Keycloak
          Keycloak + LDAP/Entra federation + MFA integration
```

Applications depend on the IAM Profile, not on the concrete provider.
key-cape is the lightweight profile implementation. Keycloak is the
expanded-mode profile implementation. privacyIDEA provides MFA/token
capabilities where the deployment mode uses it.

Identity answers: who is this actor, how was the actor authenticated,
what coarse claims are asserted, and what assurance evidence exists?

Identity does not make the final, resource-specific authorization
decision; that belongs to the authorization path.

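A minimal Python sketch of what "depend on the IAM Profile, not the provider" can mean in application code. `ProfileClaims`, `accept_identity`, and the issuer URLs are hypothetical names, not part of key-cape, Keycloak, or any published IAM Profile API, and token signature verification is assumed to have happened upstream.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass(frozen=True)
class ProfileClaims:
    """Hypothetical normalized view of the claims an IAM Profile token asserts."""

    subject: str
    issuer: str
    audience: str
    tenant: str
    groups: FrozenSet[str] = field(default_factory=frozenset)
    assurance: FrozenSet[str] = field(default_factory=frozenset)  # e.g. {"mfa"}


class IdentityError(Exception):
    """Raised when claims do not come from a trusted IAM Profile issuer."""


def accept_identity(claims: ProfileClaims, trusted_issuers: Set[str], audience: str) -> ProfileClaims:
    """Accept claims from any configured IAM Profile issuer.

    The application only checks that the issuer is one the deployment trusts;
    it never branches on whether that issuer is key-cape or Keycloak. Token
    signatures are assumed to be verified before this point, and no resource
    authorization is decided here.
    """
    if claims.issuer not in trusted_issuers:
        raise IdentityError(f"untrusted issuer: {claims.issuer}")
    if claims.audience != audience:
        raise IdentityError(f"unexpected audience: {claims.audience}")
    return claims


# Hypothetical issuer URLs: switching modes changes configuration, not code.
LIGHTWEIGHT_ISSUERS = {"https://id.example.internal/key-cape"}
EXPANDED_ISSUERS = {"https://id.example.internal/realms/platform"}
```

Moving from lightweight to expanded mode then changes only which issuer set is configured; the application's code path and claim shape stay the same.
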
## Authorization Path

```text
Identity claims from IAM Profile
    |
    v
flex-auth
    resource registry
    policy packages
    CARING descriptors
    decision/audit/explain envelope
    |
    +-- standalone evaluator
    +-- Topaz delegated PDP
    +-- optional Keycloak AuthZ adapter
    +-- future OpenFGA/SpiceDB/OPA/Cedar adapters
    |
    v
Protected service enforcement
```

Authorization answers: may this actor perform this action on this
resource in this context, and what explanation/audit/CARING metadata
supports that answer?

Protected services enforce decisions locally. flex-auth is the canonical
policy and decision boundary; delegated PDPs are runtime implementations
behind it.

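The boundary can be sketched as one decision shape produced regardless of which adapter evaluates the request, with local enforcement in the protected service. The sketch assumes fail-closed behavior (the readiness checks for the Topaz runtime call for it) and uses toy classes: `StandaloneEvaluator`, `UnreachablePDP`, and `decide` are hypothetical and do not model the real flex-auth or Topaz APIs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AuthzRequest:
    subject: str
    tenant: str
    action: str
    resource: str


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str
    evaluator: str


class StandaloneEvaluator:
    """Toy evaluator: allows exactly the (subject, action, resource) grants it holds."""

    name = "standalone"

    def __init__(self, grants: FrozenSet[Tuple[str, str, str]]):
        self.grants = grants

    def evaluate(self, request: AuthzRequest) -> bool:
        return (request.subject, request.action, request.resource) in self.grants


class UnreachablePDP:
    """Stands in for a delegated PDP (for example Topaz) that cannot be reached."""

    name = "topaz"

    def evaluate(self, request: AuthzRequest) -> bool:
        raise ConnectionError("delegated PDP unreachable")


def decide(adapter, request: AuthzRequest) -> Decision:
    """Return the same Decision shape whichever adapter runs.

    Any adapter error becomes a deny (fail closed).
    """
    try:
        allowed = bool(adapter.evaluate(request))
    except Exception as exc:
        return Decision(False, f"pdp error: {exc}", adapter.name)
    return Decision(allowed, "policy allow" if allowed else "policy deny", adapter.name)


def enforce(decision: Decision) -> None:
    """Local enforcement inside the protected service."""
    if not decision.allowed:
        raise PermissionError(decision.reason)
```
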
## Secret And Credential Path

```text
Bootstrap SOPS/age material
    |
    v
OpenBao platform secrets service
    KV v2 platform configuration
    dynamic database credentials
    Kubernetes auth / workload identity
    future object-storage credential brokering
    audit devices and lease/revocation records
    |
    +-- direct OpenBao clients
    +-- External Secrets Operator / synced Kubernetes Secrets
    +-- CSI-mounted secrets where appropriate
    |
    v
Platform and tenant workloads
```

SOPS/age remains the bootstrap and Git-at-rest protection mechanism. It
can create the initial cluster secrets and emergency recovery bundles, but
it should not become the long-lived runtime authority for every workload
secret.

OpenBao is the runtime platform secrets service once the control plane is
alive. It owns secret leases, revocation, audit, dynamic credentials, and
workload-facing secret delivery patterns. Workloads should receive scoped
secrets or short-lived credentials, not platform-root material. Tenant
administrators may manage tenant-scoped secrets through approved policy
paths; they must not gain access to OpenBao root tokens, unseal keys,
platform mounts, or global secret engine configuration.

OpenBao does not replace identity or authorization. NetKingdom IAM
identifies actors and workloads; flex-auth decides whether a credential
or secret request is allowed; OpenBao stores, issues, audits, and revokes
the resulting secret material.

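The division of labor in the last paragraph (identity, then flex-auth, then OpenBao) can be illustrated with a toy broker that holds leases and an audit trail but refuses to act as a decision point. `InMemorySecretBroker` is a hypothetical stand-in written for this sketch; it does not use or imitate the OpenBao HTTP API, and the secret paths are invented.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Lease:
    lease_id: str
    path: str
    value: str
    expires_at: float


@dataclass
class InMemorySecretBroker:
    """Toy stand-in for the role OpenBao plays; this is not the OpenBao API.

    It stores, issues, leases, audits, and revokes, but never decides whether
    a request is allowed: the caller passes in the decision flex-auth made.
    """

    audit: List[Tuple[str, ...]] = field(default_factory=list)
    leases: Dict[str, Lease] = field(default_factory=dict)

    def issue(self, subject: str, path: str, allowed: bool, ttl_seconds: int,
              now: Optional[float] = None) -> Lease:
        now = time.time() if now is None else now
        if not allowed:
            # Denied requests are still audited.
            self.audit.append(("deny", subject, path))
            raise PermissionError(f"no allow decision for {path}")
        lease = Lease(secrets.token_hex(8), path, secrets.token_urlsafe(16), now + ttl_seconds)
        self.leases[lease.lease_id] = lease
        self.audit.append(("issue", subject, path, lease.lease_id))
        return lease

    def is_live(self, lease_id: str, now: Optional[float] = None) -> bool:
        """A lease is live until it expires or is revoked."""
        now = time.time() if now is None else now
        lease = self.leases.get(lease_id)
        return lease is not None and now < lease.expires_at

    def revoke(self, lease_id: str) -> None:
        self.leases.pop(lease_id, None)
        self.audit.append(("revoke", lease_id))
```
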
## Recursive Trust Rule

Normal tenant administration must never be sufficient to alter the
platform root of trust.

This applies even when the tenant is Coulomb. Coulomb can be a tenant and
a reference workload, but platform-root actions require platform control
plane authority and appropriate bootstrap/break-glass safeguards.

Examples of platform-root actions:

- changing IAM Profile semantics
- rotating root bootstrap keys
- changing break-glass access
- changing global MFA requirements
- activating authorization policy that governs platform administration
- changing flex-auth/Topaz policy import pipelines
- changing OpenBao root tokens, unseal policy, platform mounts, or global
  auth methods
- changing audit retention or tamper-evidence settings

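A guard for this rule might look like the following Python sketch. The action names, the `operator`/`admin` role names, and the MFA requirement for platform-root actions are assumptions chosen for illustration; the point is that tenant roles, including an admin role in `tenant:coulomb`, are never consulted for platform-root actions.

```python
from typing import Mapping, Set

# Hypothetical action names for the platform-root actions listed above.
PLATFORM_ROOT_ACTIONS = frozenset({
    "iam-profile:change-semantics",
    "bootstrap:rotate-root-keys",
    "break-glass:change-access",
    "mfa:change-global-requirements",
    "policy:activate-platform-admin-policy",
    "policy:change-import-pipeline",
    "openbao:change-root-unseal-mounts-or-auth",
    "audit:change-retention-or-tamper-evidence",
})


def may_perform(roles: Mapping[str, Set[str]], assurance: Set[str],
                tenant: str, action: str) -> bool:
    """Return True if the subject may perform `action` in `tenant`.

    `roles` maps a tenant id to the subject's roles in that tenant, which
    keeps tenant membership and platform membership separate.
    """
    if action in PLATFORM_ROOT_ACTIONS:
        # Tenant roles are ignored entirely here, whatever `tenant` says;
        # only platform operators with MFA assurance qualify (assumed safeguard).
        return "operator" in roles.get("tenant:platform", set()) and "mfa" in assurance
    return "admin" in roles.get(tenant, set())
```
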
## Tenant Model

Every protected resource should belong to a tenant or to the platform
control plane.

Suggested identifiers:

```text
tenant:platform          # platform control plane resources
tenant:coulomb           # first internal/reference tenant
tenant:sandbox:<name>    # sandbox tenants
tenant:customer:<name>   # future customer tenants
```

Tenant membership and platform membership are distinct. A subject may be
an administrator in `tenant:coulomb` without being a platform operator.

CARING descriptors should explicitly identify scope and tenant when the
access is tenant-scoped. Platform-scoped descriptors should be rare,
audited, and usually condition-bound.

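A sketch of parsing the suggested identifiers and checking descriptor scope. The `<name>` character rule and the descriptor fields (`scope`, `tenant`, `conditions`) are assumptions, since CARING descriptor structure is not defined here; the sketch also takes the stricter line of rejecting every platform-scoped descriptor without conditions, where the text says "usually".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed naming rule for <name>; the document does not define one.
_NAME = r"[a-z0-9][a-z0-9-]*"
_TENANT_RE = re.compile(rf"tenant:(?:(platform|coulomb)|(sandbox|customer):({_NAME}))")


@dataclass(frozen=True)
class TenantId:
    kind: str            # "platform", "coulomb", "sandbox", or "customer"
    name: Optional[str]  # only set for sandbox and customer tenants

    @property
    def is_platform(self) -> bool:
        return self.kind == "platform"

    def __str__(self) -> str:
        suffix = f":{self.name}" if self.name else ""
        return f"tenant:{self.kind}{suffix}"


def parse_tenant(value: str) -> TenantId:
    """Parse one of the suggested tenant identifiers, rejecting anything else."""
    match = _TENANT_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized tenant id: {value!r}")
    fixed_kind, named_kind, name = match.groups()
    if fixed_kind:
        return TenantId(fixed_kind, None)
    return TenantId(named_kind, name)


def check_descriptor(descriptor: dict) -> None:
    """Validate scope/tenant metadata on a hypothetical CARING descriptor dict."""
    scope = descriptor["scope"]
    tenant = parse_tenant(descriptor["tenant"])
    if scope == "tenant":
        if tenant.is_platform:
            raise ValueError("tenant-scoped access must name a non-platform tenant")
    elif scope == "platform":
        if not tenant.is_platform:
            raise ValueError("platform-scoped access must use tenant:platform")
        if not descriptor.get("conditions"):
            raise ValueError("platform-scoped access must be condition-bound")
    else:
        raise ValueError(f"unknown scope: {scope!r}")
```
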
## Bootstrap To Runtime Transition

Production setup should move through explicit trust states:

1. **Bare host trust** - provisioned and verified by Railiance infra.
2. **Cluster trust** - Kubernetes runtime exists and is verified.
3. **Bootstrap secret trust** - age/SOPS and emergency bundles are
   established.
4. **Bootstrap identity trust** - local/bootstrap identity can operate
   enough to install full identity services.
5. **Runtime secret trust** - OpenBao is deployed, initialized, unsealed,
   audited, backed up, and ready to issue scoped secrets.
6. **Runtime identity trust** - key-cape or Keycloak becomes the normal
   IAM Profile issuer.
7. **Runtime authorization trust** - flex-auth and Topaz are initialized
   with platform and tenant policies.
8. **Tenant onboarding trust** - Coulomb and later tenants register
   resources and receive scoped authority.

Each transition needs a verification check and a rollback/recovery path.

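The trust states can be modeled as an ordered state machine in which each step advances only after its verification check passes and otherwise runs its recovery path. This is a sketch under assumptions: `TrustMachine`, `Transition`, and the callables are hypothetical, and real checks would call Railiance playbooks and service health endpoints rather than Python lambdas.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Dict, List


class TrustState(IntEnum):
    """The eight trust states, in the order they must be reached."""

    BARE_HOST = 1
    CLUSTER = 2
    BOOTSTRAP_SECRET = 3
    BOOTSTRAP_IDENTITY = 4
    RUNTIME_SECRET = 5
    RUNTIME_IDENTITY = 6
    RUNTIME_AUTHORIZATION = 7
    TENANT_ONBOARDING = 8


@dataclass
class Transition:
    verify: Callable[[], bool]   # readiness check for the target state
    recover: Callable[[], None]  # rollback/recovery path if the check fails


@dataclass
class TrustMachine:
    # Keyed by target state; BARE_HOST is the assumed, already-verified start.
    transitions: Dict[TrustState, Transition]
    state: TrustState = TrustState.BARE_HOST
    log: List[str] = field(default_factory=list)

    def advance(self) -> bool:
        """Try to move one state forward; return True only if it succeeded."""
        if self.state == TrustState.TENANT_ONBOARDING:
            return False
        target = TrustState(self.state + 1)
        step = self.transitions[target]
        if step.verify():
            self.state = target
            self.log.append(f"entered {target.name}")
            return True
        step.recover()
        self.log.append(f"check failed for {target.name}; recovery run")
        return False
```

A caller would loop `while machine.advance(): pass`, which stops at the first failing check and leaves the platform in the last verified trust state.
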
## Production Topology

For an initial production-capable Coulomb deployment:

```text
railiance-infra
  host baseline, SSH, age keys, emergency material

railiance-cluster
  Kubernetes, ingress, cert-manager, network policy

railiance-platform
  OpenBao, PostgreSQL, object storage, platform service secret delivery
  key-cape or Keycloak
  privacyIDEA where used
  flex-auth
  Topaz

railiance-apps
  Coulomb services as tenant:coulomb workloads
```

`net-kingdom` owns the architecture and standards. Railiance owns the
converged deployment layers. Component repos own their implementation
contracts.

## Orchestration Implication

A future orchestration repo may be justified, but only after the
trust-state machine described above is clear. It should not own resources
directly; it should own safe sequencing across repos.

Possible responsibilities:

- verify Railiance preconditions
- initialize credential bootstrap
- deploy or validate identity services
- deploy or validate flex-auth and Topaz
- run IAM Profile conformance checks
- run authorization conformance checks
- produce a platform security readiness report

This orchestration layer should build on Railiance capabilities rather
than bypass the Railiance stack boundaries.

ADR-0007 records the current decision: keep orchestration in Railiance
playbooks for now, with NetKingdom defining the trust-state model,
readiness checks, OpenBao boundaries, and security semantics.

## flex-auth And Topaz Implications

flex-auth work must preserve the recursive boundary between platform
control-plane resources and tenant resources.

Required implications:

- CARING descriptors must include scope and tenant metadata for
  tenant-scoped access, and must mark rare platform-scoped access
  explicitly.
- Policy packages must distinguish `tenant:platform` policy from
  tenant-local packages such as `tenant:coulomb`.
- Decision envelopes must carry subject, issuer, audience, tenant,
  protected-system id, resource, action, requested TTL where relevant,
  assurance evidence, obligations, deny reasons, and audit correlation
  ids.
- Topaz is a delegated PDP runtime behind flex-auth. It must not become
  the canonical policy model, identity provider, or platform control
  plane.
- Audit and explain records must be durable enough to reconstruct why a
  platform-root, secret, credential, or tenant-administration decision was
  allowed or denied.
- Platform-root guardrails must deny tenant administrators the ability to
  alter IAM Profile semantics, OpenBao platform mounts/auth methods,
  flex-auth policy import pipelines, Topaz runtime configuration, or
  platform audit retention.

OpenBao secret access and dynamic credential requests follow the same
authorization rule: identity proves the actor or workload, flex-auth
decides whether the request is permitted, and OpenBao stores, issues,
leases, audits, and revokes the secret material.

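The decision-envelope requirement can be made concrete as a record that refuses to exist without its identifying fields. Field names mirror the list of envelope contents above, but `DecisionEnvelope`, its field layout, and the rule that every deny carries a reason are illustrative assumptions, not the flex-auth schema.

```python
import dataclasses
import uuid
from dataclasses import dataclass, field
from typing import Optional, Tuple

_REQUIRED = ("subject", "issuer", "audience", "tenant",
             "protected_system_id", "resource", "action")


@dataclass(frozen=True)
class DecisionEnvelope:
    subject: str
    issuer: str
    audience: str
    tenant: str
    protected_system_id: str
    resource: str
    action: str
    allowed: bool
    requested_ttl_seconds: Optional[int] = None  # only for secret/credential requests
    assurance: Tuple[str, ...] = ()
    obligations: Tuple[str, ...] = ()
    deny_reasons: Tuple[str, ...] = ()
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self) -> None:
        # Refuse to build an envelope that could not be audited or explained.
        for name in _REQUIRED:
            if not getattr(self, name):
                raise ValueError(f"decision envelope is missing {name}")
        if not self.allowed and not self.deny_reasons:
            raise ValueError("a deny must carry at least one deny reason")

    def to_audit_record(self) -> dict:
        """Plain dict suitable for shipping to an audit sink."""
        return dataclasses.asdict(self)
```
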
## Coulomb Tenant Onboarding Path

The first Coulomb tenant onboarding path should be repeatable before it
becomes automated:

1. Register `tenant:coulomb` as a tenant distinct from
   `tenant:platform`.
2. Map Coulomb human, service, and agent principals to IAM Profile claims
   with issuer, audience, subject, group, tenant, and assurance evidence.
3. Register Coulomb protected systems and resources in flex-auth with
   stable protected-system ids.
4. Import tenant-scoped policy packages and CARING descriptors for
   Coulomb resources.
5. Initialize the delegated PDP runtime, starting with Topaz, using only
   the policy packages approved for the tenant and platform boundary.
6. Provision Coulomb workload secret paths, Kubernetes auth roles, or
   delivery mechanisms in OpenBao without granting access to platform
   mounts, unseal/recovery material, or global auth configuration.
7. Run audit readiness checks before admitting production traffic:
   identity issuance, flex-auth decision envelope, Topaz health,
   OpenBao audit event, workload enforcement event, and correlation id.

The onboarding path is complete when a Coulomb workload can authenticate,
receive a scoped authorization decision, obtain only the allowed secret or
short-lived credential, enforce the decision locally, and produce an
auditable record without receiving platform-root authority.

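Step 7 can be checked mechanically by requiring an event from each audit source under a single correlation id. The source labels, the event dictionary shape, and `onboarding_gaps` are hypothetical; the sketch treats Topaz health as a separate probe because a health check need not carry a request correlation id.

```python
from typing import Iterable, List, Mapping

# Hypothetical source labels for the audit events named in step 7.
REQUIRED_SOURCES = ("identity", "flex-auth", "openbao", "workload")


def onboarding_gaps(events: Iterable[Mapping[str, str]], correlation_id: str,
                    topaz_healthy: bool) -> List[str]:
    """List what is missing before production traffic; an empty list means ready."""
    # Only events tagged with this request's correlation id count.
    seen = {e["source"] for e in events if e.get("correlation_id") == correlation_id}
    gaps = [f"no {source} audit event for correlation id {correlation_id}"
            for source in REQUIRED_SOURCES if source not in seen]
    if not topaz_healthy:
        gaps.append("topaz health check failed")
    return gaps
```
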
## Production Readiness Checks

Before the security platform is production-ready, each trust state and
supporting area needs an explicit check:

| Area | Readiness check |
| --- | --- |
| MFA and identity | key-cape or Keycloak issues IAM Profile-compatible tokens; privacyIDEA or the selected MFA provider enforces required assurance for privileged actions |
| Bootstrap and recovery | age/SOPS material, emergency bundle, and break-glass credentials are present, tested, and separated from tenant administration |
| OpenBao runtime secrets | OpenBao is initialized, unsealed or auto-unsealed by the approved mechanism, backed up, audited, and using scoped auth methods and mounts |
| Secret rotation | service, database, OpenBao-issued, and break-glass rotation paths have documented blast radius and verification steps |
| flex-auth policy state | platform and tenant policy packages are versioned, reviewable, imported, and explainable |
| Topaz runtime | delegated PDP health, data freshness, policy load status, and fail-closed behavior are verified |
| Tenant onboarding | `tenant:coulomb` resources, claims, policies, OpenBao paths, and audit correlation are registered and tested |
| Audit sink | identity, flex-auth, Topaz, OpenBao, Kubernetes, and workload audit records land in durable storage with restore/drill coverage |
| Break-glass | emergency access works when normal identity is unavailable and produces a post-event review record |

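A readiness report over the areas in this table could be assembled as below. `readiness_report` and `render` are hypothetical helpers; the sketch assumes each area's check is exposed as a zero-argument callable, counts an exception as a failure rather than a skip, and treats an empty set of checks as not ready.

```python
from typing import Callable, Dict, Tuple


def readiness_report(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, Dict[str, str]]:
    """Run every area check; the platform is ready only if all of them pass."""
    results: Dict[str, str] = {}
    for area, check in checks.items():
        try:
            results[area] = "pass" if check() else "fail"
        except Exception as exc:  # an erroring check is a failure, not a skip
            results[area] = f"error: {exc}"
    ready = bool(results) and all(status == "pass" for status in results.values())
    return ready, results


def render(results: Dict[str, str]) -> str:
    """Render results as a Markdown table in the same style as this document."""
    lines = ["| Area | Result |", "| --- | --- |"]
    lines += [f"| {area} | {status} |" for area, status in results.items()]
    return "\n".join(lines)
```
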
## Open Questions

- Where is the durable audit log stored for platform-root decisions?
- Where are OpenBao audit logs durably shipped, and how are they included
  in tamper-evidence and restore drills?
- Which actions require dual control or human confirmation?
- How is break-glass use recorded when normal identity is unavailable?
- Which workloads consume OpenBao directly, via External Secrets Operator,
  or via CSI-mounted secrets?
- Which tenant metadata is required before a service can register
  resources with flex-auth?
- When does the platform switch from key-cape lightweight mode to
  Keycloak expanded mode?
- Does Topaz run centrally for the platform, per tenant, or per service
  for the first production deployment?