---
id: NK-WP-0011
type: workplan
title: "Enterprise Federation & SAML — Expanded-Mode Keycloak Identity Broker"
domain: netkingdom
repo: net-kingdom
status: proposed
owner: worsch
topic_slug: netkingdom
created: "2026-05-20"
updated: "2026-05-20"
state_hub_workstream_id: a44beef8-c18b-4ae7-b7fe-a178cc4fcdf0
depends_on:
  - NK-WP-0003
  - NK-WP-0004
  - NK-WP-0006
supersedes_tasks:
  - NK-WP-0001-T05
  - NK-WP-0001-T06
  - NK-WP-0001-T07
  - NK-WP-0001-T08
---

# NK-WP-0011 — Enterprise Federation & SAML (Expanded-Mode Keycloak)

> Extracted from NK-WP-0001 (T05–T08, the deferred Keycloak path) and
> refined against where net-kingdom actually stands today: a deployed
> KeyCape lightweight stack, an OpenBao runtime-secret authority, and a
> recursive platform/tenant authorization model. This is **expanded
> identity mode** in the architecture (`docs/platform-identity-security-architecture.md`).

## Goal

Stand up **Keycloak as an identity broker** that federates upstream
enterprise identity providers (Microsoft Entra ID / Azure AD via OIDC,
on-prem Active Directory via LDAP, and generic SAML 2.0 IdPs) and issues
**NetKingdom IAM Profile-conformant** tokens downstream, without
displacing flex-auth as the authorization decision point or breaking the
recursive platform/tenant boundary.

This is the answer to the long-standing open question
*"when does the platform switch from key-cape lightweight mode to Keycloak
expanded mode?"*: expanded mode exists **specifically** to onboard
identities that originate in an external enterprise IdP, which the
lightweight Authelia + LLDAP stack cannot broker.

## Why this is not just "resume NK-WP-0001"

NK-WP-0001 assumed a greenfield: bootstrap Vault, build PostgreSQL, treat
Keycloak as the internal user store. None of those assumptions hold now:

| NK-WP-0001 assumption | Current reality | Effect on this plan |
|---|---|---|
| HashiCorp Vault, bootstrapped from KeePassXC | **OpenBao** is the runtime secret authority (NK-WP-0006); SOPS/age + agent bootstrap exist (NK-WP-0004/0005) | Keycloak DB + admin secrets come from OpenBao via ESO; no new vault bootstrap |
| PostgreSQL built from scratch | CloudNativePG running on RAILIANCE01 (NK-WP-0003) | Add `keycloak_db` to the existing operator, reuse backup pattern |
| Keycloak is the internal source of truth (D2 hybrid) | KeyCape lightweight stack is the *deployed* IAM Profile issuer | Keycloak is a **broker/federation front-end**, not the primary user store |
| Authorization via Keycloak Authorization Services | flex-auth + Topaz is the canonical PDP (ADR-0006) | Keycloak AuthZ Services is at most an optional adapter, never canonical |
| Single-tenant Coulomb deployment | Recursive `tenant:platform` vs `tenant:coulomb` model (NK-WP-0006) | Realm-per-tenant; tenant admins must not receive platform-root |
| MFA solely via privacyIDEA provider JAR | privacyIDEA deployed *and* upstream IdPs carry their own MFA | MFA assurance source becomes a decision, not a default |

## Architecture

```text
          Enterprise IdPs (upstream)
 Entra ID (OIDC)   AD (LDAP)    SAML 2.0 IdP
        │              │              │
        └──────────────┼──────────────┘
                       ▼
                 [ Keycloak ]   expanded-mode broker
                       │  realm-per-tenant; IAM Profile issuer
                       │  secrets ← OpenBao (ESO)
                       │  MFA ← privacyIDEA *or* upstream assurance
                       ▼
   NetKingdom IAM Profile token (OIDC/PKCE)
                       │
                       ├──► applications (depend on the Profile, not the provider)
                       └──► flex-auth / Topaz ── authorization decision (PDP)

   coexists with: KeyCape lightweight issuer (id.coulomb.social)
```

Keycloak answers identity (who, how authenticated, coarse claims,
assurance). It does **not** answer resource authorization; that stays in
flex-auth (ADR-0006). It does not store runtime secrets; those stay in
OpenBao.

## Scope

In scope:

- decision record for expanded-mode adoption: trigger, federation
  topology (broker vs SAML SP), realm-per-tenant model, and coexistence
  with the KeyCape lightweight issuer
- custom Keycloak image (privacyIDEA provider JAR if MFA is delegated to
  privacyIDEA) and Helm deployment on RAILIANCE01
- upstream federation: Entra ID (OIDC), AD (LDAP), generic SAML 2.0 IdP
- claim mapping to the NetKingdom IAM Profile (issuer, audience, subject,
  groups, tenant, assurance evidence) and IAM Profile conformance checks
- MFA / assurance source decision and enforcement of step-up for
  privileged actions
- recursive tenancy: realm-per-tenant, platform-root guardrails, and the
  flex-auth/Topaz authorization boundary
- backups, DR, break-glass, monitoring, and audit shipping for the broker

Out of scope:

- replacing flex-auth/Topaz with Keycloak Authorization Services
- migrating the deployed lightweight stack off KeyCape (coexistence only)
- application-side OIDC client code (apps target the IAM Profile spec)
- deploying OpenBao itself (Railiance platform); it is consumed, not built
- tenant-specific federation policy for tenants beyond `tenant:platform`
  and `tenant:coulomb`

## Tasks

```task
id: NK-WP-0011-T1
state_hub_task_id: 934f9223-2b6f-4d01-b49b-406b5b98b6e4
status: todo
priority: high
```

**Decision record — expanded-mode adoption & federation topology.** Write
an ADR (ADR-0009) capturing: the concrete trigger for switching a tenant
from lightweight to expanded mode; whether Keycloak acts as an OIDC
identity broker, a SAML service provider, or both; the realm-per-tenant
mapping onto `tenant:platform` / `tenant:coulomb`; how the Keycloak issuer
coexists with the KeyCape issuer (`id.coulomb.social`) so applications
still target one IAM Profile contract; and the canonical hostname/issuer
for the broker. Resolve or supersede D2 from NK-WP-0001.
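
The per-tenant selection rule the ADR has to pin down can be pictured as a small lookup from tenant to identity mode and issuer. The sketch below is illustrative only: the `broker.example.invalid` hostname, the mode labels, and the tenant-to-mode assignments are placeholders (the real values are outputs of this task), the `https` scheme on the KeyCape issuer is assumed, and the `/realms/<name>` path assumes Keycloak's default issuer layout.

```python
from dataclasses import dataclass

# Assumption: placeholder hostname; the canonical broker issuer is decided in T1.
BROKER_BASE = "https://broker.example.invalid"
# From the plan: the KeyCape lightweight issuer (the https scheme is assumed).
KEYCAPE_ISSUER = "https://id.coulomb.social"


@dataclass(frozen=True)
class TenantIdentity:
    tenant: str  # e.g. "tenant:coulomb"
    mode: str    # "lightweight" or "expanded" (hypothetical labels)
    realm: str   # Keycloak realm name, used only in expanded mode


def issuer_for(entry: TenantIdentity) -> str:
    """Return the issuer URL an application should trust for this tenant."""
    if entry.mode == "expanded":
        # Assumes Keycloak's default issuer layout: <base>/realms/<realm>.
        return f"{BROKER_BASE}/realms/{entry.realm}"
    if entry.mode == "lightweight":
        return KEYCAPE_ISSUER
    raise ValueError(f"unknown identity mode: {entry.mode!r}")


# Hypothetical assignments, for illustration only.
REGISTRY = {
    "tenant:platform": TenantIdentity("tenant:platform", "lightweight", "platform"),
    "tenant:coulomb": TenantIdentity("tenant:coulomb", "expanded", "coulomb"),
}
```

Whatever shape the ADR settles on, keeping the rule a pure function of the tenant makes it easy to document per tenant and to reuse in the T6 conformance runs.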

```task
id: NK-WP-0011-T2
state_hub_task_id: 7b514cda-41b3-492f-8731-5a131422059d
status: todo
priority: high
```

**PostgreSQL `keycloak_db` on the existing operator.** Add a `keycloak`
database and role to the CloudNativePG instance from NK-WP-0003 (do not
deploy a new database). Source credentials from OpenBao via ESO into a K8s
Secret. Confirm the existing backup schedule covers the new database and
run a restore drill for `keycloak_db` specifically.

```task
id: NK-WP-0011-T3
state_hub_task_id: 8c29602e-ab9b-446a-8fee-5e2d8fbcb100
status: todo
priority: high
```

**Deploy expanded-mode Keycloak.** Build a custom image
(`kc.sh build`, privacyIDEA provider JAR included only if T5 delegates MFA
to privacyIDEA). Deploy via plain Helm on RAILIANCE01 behind Traefik +
cert-manager at the issuer hostname from T1. Admin bootstrap secret and DB
secret come from OpenBao/ESO: never typed, never in git. Configure hostname
strictness and proxy headers for Traefik. Keep realm import
GitOps-friendly (realm JSON/CR in git).

```task
id: NK-WP-0011-T4
state_hub_task_id: d62d7683-24b1-458c-9fd6-96e576b52a64
status: todo
priority: high
```

**Upstream federation.** Configure identity brokering for Entra ID (OIDC),
on-prem AD (LDAP user federation), and a generic SAML 2.0 IdP. Map each
source's claims/attributes into the NetKingdom IAM Profile shape: issuer,
audience, subject, groups, **tenant**, and assurance evidence. Define the
attribute/claim mappers and group→role mapping. Verify a federated login
end-to-end for at least the Entra ID path.
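
In Keycloak the mapping itself would be configured with identity-provider and protocol mappers. The Python below only models the intended input and output so the mapping can be reviewed and tested on paper. The output key names, the group-to-role table, and the choice to set `tenant` from the realm rather than from upstream data are all assumptions for illustration; the authoritative claim names live in the IAM Profile spec.

```python
from typing import Any

# Assumption: hypothetical group -> role table; the real mapping is a T4 output.
GROUP_TO_ROLE = {"grp-platform-ops": "operator", "grp-staff": "member"}


def to_profile_claims(upstream: dict[str, Any], *, issuer: str,
                      audience: str, tenant: str) -> dict[str, Any]:
    """Normalize upstream IdP claims into an IAM Profile-like claim set.

    The output keys are illustrative, not the canonical Profile names.
    """
    if not upstream.get("sub"):
        raise ValueError("upstream token has no subject")
    groups = sorted(set(upstream.get("groups", [])))
    roles = sorted({GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE})
    return {
        "iss": issuer,      # the broker, never the upstream IdP
        "aud": audience,
        "sub": str(upstream["sub"]),
        "groups": groups,
        "roles": roles,
        "tenant": tenant,   # assumption: fixed by the realm, not by upstream data
        "assurance": {"amr": list(upstream.get("amr", []))},
    }
```

A table of upstream fixtures (one per IdP type) run through a function like this gives a reviewable record of what each mapper is expected to produce before the end-to-end Entra ID login is attempted.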

```task
id: NK-WP-0011-T5
state_hub_task_id: 85319768-5d81-460d-89dc-8de76b63e0dc
status: todo
priority: medium
```

**MFA / assurance source.** Decide and implement the assurance model: MFA
enforced by privacyIDEA (via the Keycloak provider JAR + a "privacyIDEA
Browser" flow, carried over from NK-WP-0001) versus trusting upstream IdP
MFA (e.g. Entra Conditional Access) and reflecting it as assurance
evidence in the token. Require step-up for admin console and
platform-root-sensitive clients. Ensure assurance evidence is carried in
the IAM Profile token so flex-auth can gate privileged actions on it.
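
Whichever source is chosen, the token needs to say both how the user authenticated and who vouches for it. A minimal sketch of that evidence and of the step-up rule follows. It assumes authentication methods are expressed with RFC 8176 `amr` values; the `level` and `source` field names and the set of methods counted as strong are hypothetical, not taken from the IAM Profile.

```python
from typing import Iterable

# Assumption: methods use RFC 8176 "amr" values; which ones count as strong
# is a policy choice for T5, not something this sketch decides.
STRONG_METHODS = frozenset({"mfa", "otp", "hwk"})


def assurance_evidence(amr: Iterable[str], source: str) -> dict:
    """Summarize how the user authenticated and which system vouches for it."""
    methods = sorted(set(amr))
    strong = any(m in STRONG_METHODS for m in methods)
    return {
        "level": "mfa" if strong else "single-factor",
        "source": source,  # e.g. "privacyidea" or "upstream" (hypothetical labels)
        "amr": methods,
    }


def needs_step_up(evidence: dict, client_is_privileged: bool) -> bool:
    """Privileged clients require MFA-level evidence; other clients do not."""
    return client_is_privileged and evidence["level"] != "mfa"
```

Recording the `source` alongside the level keeps the two options in this task distinguishable downstream, so flex-auth could, for example, accept upstream MFA for ordinary clients but insist on privacyIDEA for platform-root-sensitive ones.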

```task
id: NK-WP-0011-T6
state_hub_task_id: 53801dc2-7bdb-4fa3-9fa0-c1450ef1003b
status: todo
priority: high
```

**IAM Profile conformance & downstream coexistence.** Run IAM Profile
conformance checks against the Keycloak issuer (discovery document, PKCE,
token/claim shape, JWKS, userinfo). Verify an application configured for
the IAM Profile can authenticate against either the KeyCape or the
Keycloak issuer per the T1 selection rule. Use the canonical
`canon/standards/iam-profile_v0.2.md` contract and the executable suite in
`tools/iam-profile-conformance/`. Document per-tenant issuer selection.
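
To give a feel for the discovery and PKCE part of these checks, here is a small sketch that inspects an already-fetched OpenID Connect discovery document. It is not the suite in `tools/iam-profile-conformance/`, whose contents are not reproduced here; the field names are the standard OIDC discovery metadata keys, while the list of fields treated as required is an assumption.

```python
# Assumption: which metadata fields the IAM Profile requires; the real list
# is defined by the canonical contract, not by this sketch.
REQUIRED_FIELDS = (
    "issuer",
    "authorization_endpoint",
    "token_endpoint",
    "jwks_uri",
    "userinfo_endpoint",
)


def check_discovery(doc: dict, expected_issuer: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not doc.get(name)]
    if doc.get("issuer") and doc["issuer"] != expected_issuer:
        problems.append("issuer does not match the expected issuer")
    if "S256" not in doc.get("code_challenge_methods_supported", []):
        problems.append("PKCE S256 is not advertised")
    return problems
```

Running the same function against both the KeyCape and the Keycloak discovery documents is one cheap way to confirm that applications see the same contract from either issuer.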

```task
id: NK-WP-0011-T7
state_hub_task_id: 981f79bd-63ee-4da0-8d7d-9af8e468715e
status: todo
priority: high
```

**Recursive tenancy & authorization boundary.** Implement realm-per-tenant
with platform-root guardrails: tenant admins manage only their realm and
must not be able to alter IAM Profile semantics, the platform realm,
federation trust, OpenBao platform mounts, or audit retention (per the
flex-auth/Topaz implications in the architecture doc). Confirm flex-auth +
Topaz remains the PDP; if a Keycloak Authorization Services adapter is
used at all, document it as a delegated, non-canonical adapter.
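
The guardrail list above can be written down as a tiny test oracle, useful for deciding what the acceptance tests should assert. This is not a policy engine and does not replace flex-auth/Topaz as the PDP. The resource labels are hypothetical names for the items listed in the task, and the rule that platform admins also stay out of tenant realms is an assumption the plan does not settle.

```python
# From the plan: things a tenant admin must never be able to alter.
# The string labels themselves are hypothetical.
PLATFORM_ONLY = frozenset({
    "iam-profile-semantics",
    "platform-realm",
    "federation-trust",
    "openbao-platform-mounts",
    "audit-retention",
})
PLATFORM_TENANT = "tenant:platform"


def guardrail_allows(admin_tenant: str, target_tenant: str, resource: str) -> bool:
    """Coarse guardrail: True if the admin may touch the resource at all."""
    if resource in PLATFORM_ONLY:
        return admin_tenant == PLATFORM_TENANT
    # Assumption: everything else is realm-scoped, so admins stay in their tenant.
    return admin_tenant == target_tenant
```

Each `False` result corresponds to a negative test worth running against the deployed realms, for example a `tenant:coulomb` admin attempting to change federation trust.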

```task
id: NK-WP-0011-T8
state_hub_task_id: 13634760-7817-40c0-b7db-5e0f4196dbf0
status: todo
priority: medium
```

**Backups, DR, break-glass, monitoring, audit.** Realm exports to git; DB
backup + restore drill (T2); break-glass admin path disabled-by-default
with alerting on use; Prometheus/Grafana for auth success/failure, MFA
latency, federation errors. Ship Keycloak events to the durable platform
audit sink alongside flex-auth/Topaz/OpenBao records, with correlation
ids, satisfying the "Audit sink" and "Break-glass" rows of the
production-readiness checklist.
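
For the audit-shipping part, the key requirement is that every Keycloak event reaches the sink with a correlation id that can be joined against flex-auth/Topaz/OpenBao records. A minimal sketch of such an envelope is below. The envelope layout is hypothetical, and the input field names (`type`, `realmId`, `clientId`, `userId`, `error`, `time`) are assumed to match Keycloak's event JSON; verify them against the deployed version.

```python
import json
import uuid
from typing import Optional


def to_audit_record(event: dict, correlation_id: Optional[str] = None) -> str:
    """Wrap a Keycloak-style event in a JSON audit record with a correlation id."""
    record = {
        "source": "keycloak",
        # Reuse the caller's id when one exists so records can be joined;
        # otherwise mint a fresh one.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "type": event["type"],
        "realm": event.get("realmId"),
        "client": event.get("clientId"),
        "user": event.get("userId"),
        "error": event.get("error"),
        "time_ms": event.get("time"),
    }
    return json.dumps(record, sort_keys=True)
```

Break-glass logins would flow through the same path, which lets the "alerting on use" requirement be expressed as a rule on the audit stream rather than as a separate mechanism.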

## Acceptance Criteria

- An ADR records the expanded-mode trigger, federation topology,
  realm-per-tenant model, and KeyCape/Keycloak issuer coexistence.
- A federated user from at least one enterprise IdP (Entra ID) can log in
  and receive an IAM Profile-conformant token with tenant + assurance
  claims.
- Keycloak secrets originate from OpenBao; none are bootstrapped from
  KeePassXC or committed to git.
- flex-auth + Topaz remains the authorization decision point; Keycloak is
  not the canonical policy engine.
- Tenant admins cannot cross the platform-root boundary.
- Keycloak audit events land in the durable platform audit sink with
  correlation ids, and a DR/break-glass drill has passed.

## Open Questions / Dependencies on Other Repos

- **key-cape**: does coexistence require KeyCape changes, or can both
  issuers serve the same IAM Profile unchanged? (EP-NK-001 federation
  extension point.)
- **flex-auth**: confirmed claim/decision-envelope contract for tenant +
  assurance evidence sourced from a federated token.
- **railiance-platform**: OpenBao must expose a Keycloak auth role / ESO
  path before T3; unseal/break-glass story must be ready.
- **IAM Profile spec**: resolved by NK-WP-0012. T6 consumes
  `canon/standards/iam-profile_v0.2.md` and
  `tools/iam-profile-conformance/`.