coulomb/net-kingdom

tegwick 1d0b0e7330 openbao king credential bootstrapping

2026-05-24 09:26:02 +02:00

Platform Identity and Security Architecture

Status: implemented architecture baseline for NetKingdom/Railiance/Coulomb

Date: 2026-05-24

Purpose

This document captures the production-oriented identity, authorization, MFA, credential, and bootstrap architecture for the platform we are building. It deliberately treats Coulomb as the first internal tenant and reference workload, not as the platform itself.

The architecture must be recursive: the same platform that protects future tenants also protects the services and repositories used to build and operate the platform. That recursion is useful, but it is also where many security designs accidentally collapse into self-administering root power. This document exists to prevent that.

Core Model

Bootstrap plane
  establishes initial trust before normal platform services exist

Platform control plane
  operates identity, MFA, secrets, policy, audit, and authorization

Tenant planes
  run Coulomb and future customer/project/domain workloads

Coulomb is the first internal tenant. It is also the reference tenant that helps validate the platform. It must not become the platform root of trust merely because it is first.

Planes

Bootstrap Plane

The bootstrap plane exists before the full platform is alive. It owns the minimal authority needed to create and recover the control plane.

Responsibilities:

host provisioning and hardening
root age/SOPS material and emergency bundles
initial cluster access
initial identity service deployment
initial secret injection
break-glass recovery
transition to managed runtime authority

Owned primarily by railiance-infra, railiance-cluster, and the credential bootstrap work in net-kingdom.

Platform Control Plane

The platform control plane owns shared security services.

Responsibilities:

NetKingdom IAM Profile
lightweight identity mode through key-cape
expanded identity mode through Keycloak
MFA/token lifecycle through privacyIDEA where applicable
canonical authorization through flex-auth
delegated authorization runtime through Topaz first, with other PDPs as adapters
runtime secret authority through OpenBao
audit and explanation records
platform service secrets, dynamic credentials, leases, and rotation

Owned conceptually by net-kingdom; deployed through the Railiance stack.

Tenant Plane

Tenant planes are where workloads live. Coulomb is tenant zero/reference tenant; later tenants may be projects, customers, domains, sandboxes, or isolated deployments.

Responsibilities:

protected services and repositories
tenant-owned resources
tenant-specific groups, policies, and service accounts
local enforcement of authorization decisions
workload audit events and diagnostics

Tenant administrators may manage their tenant resources. They must not be able to alter platform root trust, global identity configuration, platform break-glass material, or the policy pipeline that governs the platform itself.

Component Responsibilities

Component	Primary role	Must not become
`net-kingdom`	canonical security architecture, IAM Profile, SSO/MFA, credential bootstrap decisions	a deployment repo for every stack layer
`key-cape`	lightweight IAM implementation of the NetKingdom IAM Profile	a general-purpose IAM platform or authorization engine
Keycloak	expanded-mode IAM and optional Keycloak Authorization Services adapter	the canonical model for all platform authorization
privacyIDEA	MFA/token authority, especially in lightweight/key-cape mode	a policy decision point for application resources
OpenBao	runtime platform secrets service, dynamic credential broker, lease/revocation point, and audit source for secret access	the bootstrap root of trust or an application-specific configuration store
`flex-auth`	authorization control plane, CARING descriptors, policy packages, decision envelopes, audit/explain	an identity provider or backend-specific wrapper
Topaz	first delegated authorization runtime/PDP for flex-auth	the platform control plane or identity provider
Railiance repos	converged infrastructure, cluster, platform services, enablement, and app deployment	the source of security policy semantics

Identity Path

Human/service/agent principal
        |
        v
NetKingdom IAM Profile
        |
        +-- lightweight mode: key-cape
        |       Authelia + LLDAP + privacyIDEA
        |
        +-- expanded mode: Keycloak
                Keycloak + LDAP/Entra federation + MFA integration

Applications depend on the IAM Profile, not on the concrete provider. key-cape is the lightweight profile implementation. Keycloak is the expanded-mode profile implementation. privacyIDEA provides MFA/token capabilities where the deployment mode uses it.

The canonical profile is NetKingdom IAM Profile v0.2 (canon/standards/iam-profile_v0.2.md). It requires explicit tenant, principal_type, groups, roles, scope/scp, and assurance claims so flex-auth receives normalized identity input regardless of whether key-cape or Keycloak issued the token.
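
A minimal sketch of how a consumer might check a decoded token payload for the claims listed above. The claim names come from this paragraph; the authoritative schema is the profile document itself, so treat `REQUIRED_CLAIMS` and the function name as illustrative assumptions rather than the conformance tool's behavior.

```python
from typing import Any, Mapping

# Claim names as listed in the paragraph above; the exact token schema is
# defined in canon/standards/iam-profile_v0.2.md and may differ in detail.
REQUIRED_CLAIMS = ("tenant", "principal_type", "groups", "roles", "assurance")
SCOPE_CLAIMS = ("scope", "scp")  # either spelling satisfies the scope requirement


def missing_profile_claims(claims: Mapping[str, Any]) -> list[str]:
    """Return the names of required claims absent from a decoded token payload."""
    missing = [name for name in REQUIRED_CLAIMS if name not in claims]
    if not any(name in claims for name in SCOPE_CLAIMS):
        missing.append("scope/scp")
    return missing
```

A token is acceptable as flex-auth input only when the returned list is empty, regardless of which provider issued it.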

The choice between lightweight and expanded mode is capability-driven, not scale-driven. key-cape comfortably serves large internal user populations; expanded-mode Keycloak is introduced when a capability is required that the lightweight stack does not provide — chiefly inbound enterprise federation and SAML brokering (Entra ID, Active Directory, generic SAML IdPs), complex multi-realm topologies, or delegated admin. A deployment climbs to expanded mode because it needs that capability, not because it has more users. The lower resource and operational footprint of the lightweight stack is a consequence of this rule, not the trigger for it. See Capability Progression below.
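
The capability-driven rule can be sketched as a small selection function. The capability labels below are hypothetical strings chosen for this example, not identifiers from the repositories; the point is only that user count never influences the result.

```python
# Capabilities that, per the text above, call for expanded-mode Keycloak.
# The string labels are illustrative assumptions.
EXPANDED_ONLY = frozenset({
    "enterprise_federation",
    "saml_brokering",
    "multi_realm",
    "delegated_admin",
})


def select_identity_mode(required_capabilities: set[str], user_count: int = 0) -> str:
    """Pick the IAM mode from required capabilities; user_count is deliberately ignored."""
    del user_count  # scale is not a trigger for expanded mode
    return "expanded" if required_capabilities & EXPANDED_ONLY else "lightweight"
```

Under these assumptions, `select_identity_mode({"sso"}, user_count=50000)` stays in lightweight mode, while any deployment that needs `"saml_brokering"` moves to expanded mode.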

Identity answers: who is this actor, how was the actor authenticated, what coarse claims are asserted, and what assurance evidence exists?

Identity does not answer the final, resource-specific authorization question.

Authorization Path

Identity claims from IAM Profile
        |
        v
flex-auth
  resource registry
  policy packages
  CARING descriptors
  decision/audit/explain envelope
        |
        +-- standalone evaluator
        +-- Topaz delegated PDP
        +-- optional Keycloak AuthZ adapter
        +-- future OpenFGA/SpiceDB/OPA/Cedar adapters
        |
        v
Protected service enforcement

Authorization answers: may this actor perform this action on this resource in this context, and what explanation/audit/CARING metadata supports that answer?

Protected services enforce decisions locally. flex-auth is the canonical policy and decision boundary; delegated PDPs are runtime implementations behind it.
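
As an illustration of local enforcement, the sketch below wraps a call to the decision boundary and denies when no decision can be obtained. The `Decision` type and the `decide` callable are hypothetical stand-ins, not the flex-auth API, and the fail-closed default is an assumption borrowed from the Topaz readiness check later in this document.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Decision:
    """Hypothetical, reduced decision shape for this sketch."""
    allowed: bool
    deny_reasons: tuple[str, ...] = ()


def enforce(decide: Callable[[], Decision]) -> Decision:
    """Ask the decision boundary; deny if it cannot produce an answer."""
    try:
        return decide()
    except Exception as exc:
        # Fail closed: an unreachable or failing PDP never results in access.
        reason = f"pdp_unavailable: {type(exc).__name__}"
        return Decision(allowed=False, deny_reasons=(reason,))
```

The protected service then acts only on `Decision.allowed` and records the deny reasons alongside its own enforcement event.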

Secret And Credential Path

Bootstrap SOPS/age material
        |
        v
OpenBao platform secrets service
  KV v2 platform configuration
  dynamic database credentials
  Kubernetes auth / workload identity
  future object-storage credential brokering
  audit devices and lease/revocation records
        |
        +-- direct OpenBao clients
        +-- External Secrets Operator / synced Kubernetes Secrets
        +-- CSI-mounted secrets where appropriate
        |
        v
Platform and tenant workloads

SOPS/age remains the bootstrap and Git-at-rest protection mechanism. It can create the initial cluster secrets and emergency recovery bundles, but it should not become the long-lived runtime authority for every workload secret.

OpenBao is the runtime platform secrets service once the control plane is alive. It owns secret leases, revocation, audit, dynamic credentials, and workload-facing secret delivery patterns. Workloads should receive scoped secrets or short-lived credentials, not platform-root material. Tenant administrators may manage tenant-scoped secrets through approved policy paths; they must not gain access to OpenBao root tokens, unseal keys, platform mounts, or global secret engine configuration.

OpenBao does not replace identity or authorization. NetKingdom IAM identifies actors and workloads; flex-auth decides whether a credential or secret request is allowed; OpenBao stores, issues, audits, and revokes the resulting secret material.
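
The division of labor in the preceding paragraph can be sketched as a three-step request path. The `authorize` and `issue` callables are placeholders for flex-auth and OpenBao respectively; neither reflects a real client API, and the function name is an assumption made for this example.

```python
from typing import Callable, Mapping, Optional


def request_secret(
    token_claims: Optional[Mapping[str, object]],
    path: str,
    authorize: Callable[[Mapping[str, object], str], bool],
    issue: Callable[[str], str],
) -> str:
    """Identity first, authorization second, secret issuance last."""
    if not token_claims:                    # identity: is there an authenticated actor?
        raise PermissionError("unauthenticated")
    if not authorize(token_claims, path):   # flex-auth stand-in: is the request allowed?
        raise PermissionError("denied by policy")
    return issue(path)                      # OpenBao stand-in: issue the secret material
```

The ordering matters: the issuing step is never reached unless both earlier steps succeed, so the secret store does not have to make identity or policy decisions itself.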

Platform Root Custody

Platform root authority is an accountable custody role, not a tenant admin role and not a Git account secret. docs/platform-root-custody.md records tegwick / bernd.worsch@gmail.com as the initial setup operator and contact, not as the long-term platform root of trust.

The actual root-of-trust target is a separate king credential: a dedicated, rarely used platform-root identity independent from day-to-day Gitea and email accounts. Email may receive notifications, but Git, Gitea, State Hub, chat, tickets, shell history, and email must never store or transfer unseal keys, root tokens, private keys, OTP seeds, recovery codes, or screenshots of secret output.

Production-ready custody should move toward independent escrow, preferably two-of-three human or institutional recovery control. Temporary single-operator king custody is allowed only as a pre-production bootstrap posture with second-factor protection, encrypted offline storage, and a low-friction upgrade path to additional custodians.
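
A two-of-three recovery control reduces to a quorum check over distinct, recognized custodians. The sketch below shows only that counting rule; the custodian identifiers are hypothetical, and real escrow would also need authenticated approvals and an audit trail.

```python
def quorum_met(approvals: set[str], custodians: frozenset[str], threshold: int = 2) -> bool:
    """True when at least `threshold` distinct recognized custodians approved."""
    # Approvals from anyone outside the custodian set are ignored.
    return len(approvals & custodians) >= threshold
```

For example, with custodians `{"custodian-a", "custodian-b", "custodian-c"}`, a single approval is insufficient and an approval from an unknown party does not count toward the threshold.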

The normal admin path should become NetKingdom IAM claims mapped to scoped OpenBao policies. The initial OpenBao root token remains a bootstrap or break-glass artifact and must not become the standing operator credential. The platform must also reset or rotate bootstrap-era credentials and access paths before live workloads rely on it.

Recursive Trust Rule

Normal tenant administration must never be sufficient to alter the platform root of trust.

This applies even when the tenant is Coulomb. Coulomb can be a tenant and a reference workload, but platform-root actions require platform control plane authority and appropriate bootstrap/break-glass safeguards.

Examples of platform-root actions:

changing IAM Profile semantics
rotating root bootstrap keys
changing break-glass access
changing global MFA requirements
activating authorization policy that governs platform administration
changing flex-auth/Topaz policy import pipelines
changing OpenBao root tokens, unseal policy, platform mounts, or global auth methods
changing audit retention or tamper-evidence settings
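
The rule and the examples above can be sketched as a guardrail that runs before normal policy evaluation. The action labels and the `platform-operator` role name are assumptions made for this example; the sketch also covers only the control-plane authority requirement, not the additional bootstrap or break-glass safeguards.

```python
# Illustrative labels for the platform-root actions listed above.
PLATFORM_ROOT_ACTIONS = frozenset({
    "iam_profile.change_semantics",
    "bootstrap_keys.rotate",
    "break_glass.change",
    "mfa.change_global_requirements",
    "platform_policy.activate",
    "policy_import_pipeline.change",
    "openbao.change_root_config",
    "audit.change_retention",
})


def passes_root_guardrail(action: str, roles_by_tenant: dict[str, set[str]]) -> bool:
    """Reject platform-root actions unless the subject holds platform authority."""
    if action not in PLATFORM_ROOT_ACTIONS:
        return True  # guardrail does not apply; normal policy decides
    # Roles held in any other tenant, including tenant:coulomb, are irrelevant here.
    return "platform-operator" in roles_by_tenant.get("tenant:platform", set())
```

A subject who is an administrator only in `tenant:coulomb` therefore fails the guardrail for every action in the set, however broad that tenant role is.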

Tenant Model

Every protected resource should belong to a tenant or to the platform control plane.

Suggested identifiers:

tenant:platform          # platform control plane resources
tenant:coulomb           # first internal/reference tenant
tenant:sandbox:<name>    # sandbox tenants
tenant:customer:<name>   # future customer tenants

Tenant membership and platform membership are distinct. A subject may be an administrator in tenant:coulomb without being a platform operator.

CARING descriptors should explicitly identify scope and tenant when the access is tenant-scoped. Platform-scoped descriptors should be rare, audited, and usually condition-bound.
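
A small parser for the suggested identifiers might look like the following. It accepts only the four forms shown above; the `TenantId` type and the strictness of the validation are assumptions for this sketch, since the document calls the identifiers suggestions rather than a fixed grammar.

```python
from typing import NamedTuple, Optional


class TenantId(NamedTuple):
    kind: str             # "platform", "coulomb", "sandbox", or "customer"
    name: Optional[str]   # set only for sandbox and customer tenants


def parse_tenant(value: str) -> TenantId:
    """Parse a tenant identifier such as 'tenant:sandbox:demo'."""
    parts = value.split(":")
    if parts[0] == "tenant":
        if len(parts) == 2 and parts[1] in ("platform", "coulomb"):
            return TenantId(parts[1], None)
        if len(parts) == 3 and parts[1] in ("sandbox", "customer") and parts[2]:
            return TenantId(parts[1], parts[2])
    raise ValueError(f"not a recognized tenant identifier: {value!r}")
```

Keeping `platform` as its own kind makes it straightforward to distinguish platform-scoped resources from tenant-scoped ones before any policy is evaluated.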

Bootstrap To Runtime Transition

Production setup should move through explicit trust states:

1. Bare host trust - provisioned and verified by Railiance infra.
2. Cluster trust - Kubernetes runtime exists and is verified.
3. Bootstrap secret trust - age/SOPS and emergency bundles are established.
4. Bootstrap identity trust - local/bootstrap identity can operate enough to install full identity services.
5. Runtime secret trust - OpenBao is deployed, initialized, unsealed, audited, backed up, and ready to issue scoped secrets.
6. Runtime identity trust - key-cape or Keycloak becomes the normal IAM Profile issuer.
7. Runtime authorization trust - flex-auth and Topaz are initialized with platform and tenant policies.
8. Tenant onboarding trust - Coulomb and later tenants register resources and receive scoped authority.

Each transition needs a verification check and a rollback/recovery path.
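
The trust states form a strictly ordered sequence, which can be sketched as a small state machine that advances only when the next state's verification check passes. The enum names and the `advance` helper are illustrative assumptions; rollback and recovery paths are out of scope for this sketch.

```python
from enum import IntEnum
from typing import Callable, Mapping, Optional


class TrustState(IntEnum):
    BARE_HOST = 1
    CLUSTER = 2
    BOOTSTRAP_SECRET = 3
    BOOTSTRAP_IDENTITY = 4
    RUNTIME_SECRET = 5
    RUNTIME_IDENTITY = 6
    RUNTIME_AUTHORIZATION = 7
    TENANT_ONBOARDING = 8


def advance(
    current: Optional[TrustState],
    checks: Mapping[TrustState, Callable[[], bool]],
) -> Optional[TrustState]:
    """Move to the next trust state only if its verification check passes."""
    if current is TrustState.TENANT_ONBOARDING:
        return current  # already at the final state
    nxt = TrustState.BARE_HOST if current is None else TrustState(current + 1)
    check = checks.get(nxt)
    # A missing check counts as a failed check: no unverified transitions.
    return nxt if check is not None and check() else current
```

Because states cannot be skipped, runtime authorization trust can never be claimed before runtime secret and identity trust have been verified.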

Production Topology

For an initial production-capable Coulomb deployment:

railiance-infra
  host baseline, SSH, age keys, emergency material

railiance-cluster
  Kubernetes, ingress, cert-manager, network policy

railiance-platform
  OpenBao, PostgreSQL, object storage, platform service secret delivery
  key-cape or Keycloak
  privacyIDEA where used
  flex-auth
  Topaz

railiance-apps
  Coulomb services as tenant:coulomb workloads

net-kingdom owns the architecture and standards. Railiance owns the converged deployment layers. Component repos own their implementation contracts.

Orchestration Implication

A future orchestration repo may be justified, but only after the state machine is clear. It should not own resources directly. It should own safe sequencing across repos.

Possible responsibilities:

verify Railiance preconditions
initialize credential bootstrap
deploy or validate identity services
deploy or validate flex-auth and Topaz
run IAM Profile conformance checks
run authorization conformance checks
produce a platform security readiness report

This orchestration layer should build on Railiance capabilities rather than bypassing the Railiance stack boundaries.

ADR-0007 records the current decision: keep orchestration in Railiance playbooks for now, with NetKingdom defining the trust-state model, readiness checks, OpenBao boundaries, and security semantics.

The playbook interface for that split is the NetKingdom Playbook Capability Contract (canon/standards/playbook-capability-contract_v0.1.md). Railiance playbooks publish declarations beside the playbooks; NetKingdom validates and consumes those declarations to select capabilities, parametrize allowed inputs, and assemble responsibility/trust-state views without taking over execution.

flex-auth And Topaz Implications

flex-auth work must preserve the recursive boundary between platform control-plane resources and tenant resources.

Required implications:

CARING descriptors must include scope and tenant metadata for tenant-scoped access, and must mark rare platform-scoped access explicitly.
Policy packages must distinguish tenant:platform policy from tenant-local packages such as tenant:coulomb.
Decision envelopes must carry subject, issuer, audience, tenant, principal type, groups, roles, scopes, protected-system id, resource, action, requested TTL where relevant, assurance evidence, obligations, deny reasons, and audit correlation ids. Subject, issuer, audience, tenant, principal type, groups, roles, scopes, and assurance come from the IAM Profile v0.2 token contract rather than provider-specific session state.
Topaz is a delegated PDP runtime behind flex-auth. It must not become the canonical policy model, identity provider, or platform control plane.
Audit and explain records must be durable enough to reconstruct why a platform-root, secret, credential, or tenant-administration decision was allowed or denied.
Platform-root guardrails must deny tenant administrators the ability to alter IAM Profile semantics, OpenBao platform mounts/auth methods, flex-auth policy import pipelines, Topaz runtime configuration, or platform audit retention.

OpenBao secret access and dynamic credential requests follow the same authorization rule: identity proves the actor or workload, flex-auth decides whether the request is permitted, and OpenBao stores, issues, leases, audits, and revokes the secret material.
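
To illustrate the requirement that identity fields come from the token contract rather than provider session state, the sketch below builds the identity portion of a decision envelope from decoded claims alone. `sub`, `iss`, and `aud` are the standard JWT claim names; the remaining claim names follow the IAM Profile description earlier in this document, and the output field names are assumptions for this example.

```python
from typing import Any, Mapping


def identity_fields(claims: Mapping[str, Any]) -> dict[str, Any]:
    """Copy the identity portion of a decision envelope from token claims only."""
    raw_scopes = claims.get("scope", claims.get("scp", []))
    # "scope" is conventionally a space-delimited string; "scp" is often a list.
    scopes = raw_scopes.split() if isinstance(raw_scopes, str) else list(raw_scopes)
    return {
        "subject": claims["sub"],
        "issuer": claims["iss"],
        "audience": claims["aud"],
        "tenant": claims["tenant"],
        "principal_type": claims["principal_type"],
        "groups": list(claims["groups"]),
        "roles": list(claims["roles"]),
        "scopes": scopes,
        "assurance": claims["assurance"],
    }
```

A missing claim raises `KeyError` instead of being defaulted, so an envelope is never assembled from incomplete identity input.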

Coulomb Tenant Onboarding Path

The first Coulomb tenant onboarding path should be repeatable before it becomes automated:

1. Register tenant:coulomb as a tenant distinct from tenant:platform.
2. Map Coulomb human, service, and agent principals to IAM Profile claims with issuer, audience, subject, group, tenant, and assurance evidence.
3. Register Coulomb protected systems and resources in flex-auth with stable protected-system ids.
4. Import tenant-scoped policy packages and CARING descriptors for Coulomb resources.
5. Initialize the delegated PDP runtime, starting with Topaz, using only the policy packages approved for the tenant and platform boundary.
6. Provision Coulomb workload secret paths, Kubernetes auth roles, or delivery mechanisms in OpenBao without granting access to platform mounts, unseal/recovery material, or global auth configuration.
7. Run audit readiness checks before admitting production traffic: identity issuance, flex-auth decision envelope, Topaz health, OpenBao audit event, workload enforcement event, and correlation id.

The onboarding path is complete when a Coulomb workload can authenticate, receive a scoped authorization decision, obtain only the allowed secret or short-lived credential, enforce the decision locally, and produce an auditable record without receiving platform-root authority.
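
The audit readiness check in the final onboarding step can be sketched as a correlation test: one request should leave a record in each audit source under the same correlation id. The source labels and the event shape are assumptions for this example; Topaz health is a separate probe and is not modeled here.

```python
from typing import Iterable, Mapping

# Illustrative labels for the audit sources named in the onboarding steps.
REQUIRED_SOURCES = frozenset({"identity", "flex-auth", "openbao", "workload"})


def missing_audit_sources(
    events: Iterable[Mapping[str, str]], correlation_id: str
) -> set[str]:
    """Return the audit sources that have no event for this correlation id."""
    seen = {e.get("source") for e in events if e.get("correlation_id") == correlation_id}
    return set(REQUIRED_SOURCES - seen)
```

An empty result means the request can be reconstructed end to end; any returned source names identify where the audit trail is broken.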

Capability Progression (Start Small → Enterprise)

NetKingdom is designed so an IT landscape can be brought up from nothing and hardened one capability at a time, with no structural rework when the next capability is added. Every tier is usable on its own and every tier issues or consumes the same NetKingdom IAM Profile, so adding a capability extends the system rather than replacing it.

The progression is capability-keyed: you climb a tier when you need the capability it adds, never because of user count.

Tier	Capability added	Components added	You move here when…
C0 — Bootstrap identity	A local OIDC issuer + secret bootstrap so things can start safely before the platform exists	local-identity (NK-WP-0002), SOPS/age + agent bootstrap (NK-WP-0004/0005)	you have nothing yet and need dev/test/sandbox identity
C1 — Lightweight SSO	Single-factor OIDC SSO over an internal directory	key-cape: Authelia + LLDAP	you want real SSO for internal users/services
C2a — 2FA (light)	Second factor without a new heavy service	Authelia built-in TOTP / WebAuthn	you need 2FA but not enterprise token lifecycle
C2b — Token authority	Hardware tokens, many token types, self-service enrollment, token lifecycle	privacyIDEA	you need an enterprise-grade MFA/token authority
C3 — Runtime secrets	Dynamic, scoped, leased, audited secrets beyond bootstrap	OpenBao (NK-WP-0006)	workloads need runtime secrets, not just bootstrap material
C4 — Fine-grained authZ	Policy-as-code decisions beyond coarse SSO claims	flex-auth + Topaz PDP (ADR-0006)	identity alone can no longer answer "may this actor do this?"
C5 — Enterprise federation	Inbound Entra ID / AD / SAML brokering, multi-tenant realms	expanded-mode Keycloak (NK-WP-0011)	identities originate in an external enterprise IdP
C6 — Self-optimizing	Audit feedback loops, drift surfacing, continuous adaptation	central audit sink + kaizen loops	the platform should improve and verify itself continuously

Two properties make this safe rather than just sequential:

Usable at every tier. C1 is a working SSO platform; you are never forced to reach C5 to get value.
No structural breaks. Because every tier targets the IAM Profile contract, 2FA (C2), runtime secrets (C3), fine-grained authorization (C4), and federation (C5) are additive. Applications keep targeting the same Profile; the implementation behind it grows.

2FA illustrates the principle precisely: if you do not need a second factor, C2 is simply absent — the C1 stack runs without privacyIDEA. When you do, C2a (Authelia's built-in TOTP/WebAuthn) is the light option and C2b (privacyIDEA) is the enterprise token-authority option. Neither requires re-architecting C1.

The intent is turn-key: NetKingdom selects, places, and orchestrates the components for the chosen tier set so the landscape reaches ready-to-run state — like building a house to handover condition — and can be extended to the next tier later without demolition.
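
The additive property can be made concrete with a lookup from tiers to the components they add, following the table above. The component labels are shortened for the example, and the helper name is an assumption; this is a sketch of the selection idea, not the NetKingdom orchestration logic.

```python
# Components added per tier, abbreviated from the progression table above.
TIER_COMPONENTS = {
    "C0": ("local-identity", "SOPS/age + agent bootstrap"),
    "C1": ("Authelia", "LLDAP"),
    "C2a": ("Authelia TOTP/WebAuthn",),
    "C2b": ("privacyIDEA",),
    "C3": ("OpenBao",),
    "C4": ("flex-auth", "Topaz"),
    "C5": ("Keycloak (expanded mode)",),
    "C6": ("central audit sink", "kaizen loops"),
}


def components_for(tiers: list[str]) -> list[str]:
    """Return the de-duplicated components for a chosen set of tiers, in tier order."""
    selected: list[str] = []
    for tier in sorted(set(tiers)):
        for component in TIER_COMPONENTS[tier]:  # KeyError on an unknown tier
            if component not in selected:
                selected.append(component)
    return selected
```

Adding a tier to the input can only extend the result, which mirrors the claim that moving to the next tier extends the landscape without replacing what is already deployed.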

Production Readiness Checks

Before the security platform is production-ready, each trust state needs an explicit check:

Area	Readiness check
Platform root custody	setup operator, dedicated king credential, second factor, recovery storage, escrow posture, and root-token disposition are recorded without storing secret values
MFA and identity	key-cape or Keycloak issues IAM Profile v0.2-compatible tokens and passes `tools/iam-profile-conformance/`; privacyIDEA or the selected MFA provider enforces required assurance for privileged actions
Bootstrap and recovery	age/SOPS material, emergency bundle, and break-glass credentials are present, tested, and separated from tenant administration
OpenBao runtime secrets	OpenBao is initialized, unsealed or auto-unsealed by the approved mechanism, backed up, audited, and using scoped auth methods and mounts
Secret rotation	service, database, OpenBao-issued, and break-glass rotation paths have documented blast radius and verification steps
flex-auth policy state	platform and tenant policy packages are versioned, reviewable, imported, and explainable
Topaz runtime	delegated PDP health, data freshness, policy load status, and fail-closed behavior are verified
Tenant onboarding	`tenant:coulomb` resources, claims, policies, OpenBao paths, and audit correlation are registered and tested
Audit sink	identity, flex-auth, Topaz, OpenBao, Kubernetes, and workload audit records land in durable storage with restore/drill coverage
Break-glass	emergency access works when normal identity is unavailable and produces a post-event review record
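
A readiness report over the areas in this table can be sketched as a simple aggregation in which an unreported area counts as failing. The area names are taken from the table; the report shape and function name are assumptions for this example.

```python
from typing import Mapping

AREAS = (
    "Platform root custody", "MFA and identity", "Bootstrap and recovery",
    "OpenBao runtime secrets", "Secret rotation", "flex-auth policy state",
    "Topaz runtime", "Tenant onboarding", "Audit sink", "Break-glass",
)


def readiness_report(results: Mapping[str, bool]) -> dict:
    """Summarize per-area check results; areas without a result count as failing."""
    failing = [area for area in AREAS if not results.get(area, False)]
    return {"ready": not failing, "failing": failing}
```

Treating a missing result as a failure keeps the report conservative: the platform is reported ready only when every area has been checked and has passed.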

Open Questions

Where is the durable audit log stored for platform-root decisions?
Where are OpenBao audit logs durably shipped, and how are they included in tamper-evidence and restore drills?
Which actions require dual control or human confirmation?
How is break-glass use recorded when normal identity is unavailable?
Which workloads consume OpenBao directly, via External Secrets Operator, or via CSI-mounted secrets?
Which tenant metadata is required before a service can register resources with flex-auth?
What precise per-tenant trigger and dual-issuer coexistence rule should NK-WP-0011-T1 use for Keycloak expanded mode?
Does Topaz run centrally for the platform, per tenant, or per service for the first production deployment?
