---
id: NK-WP-0011
type: workplan
title: "Enterprise Federation & SAML — Expanded-Mode Keycloak Identity Broker"
domain: netkingdom
repo: net-kingdom
status: proposed
owner: worsch
topic_slug: netkingdom
created: "2026-05-20"
updated: "2026-05-20"
state_hub_workstream_id: a44beef8-c18b-4ae7-b7fe-a178cc4fcdf0
depends_on:
- NK-WP-0003
- NK-WP-0004
- NK-WP-0006
supersedes_tasks:
- NK-WP-0001-T05
- NK-WP-0001-T06
- NK-WP-0001-T07
- NK-WP-0001-T08
---
# NK-WP-0011 — Enterprise Federation & SAML (Expanded-Mode Keycloak)
> Extracted from NK-WP-0001 (T05–T08, the deferred Keycloak path) and
> refined against where net-kingdom actually stands today: a deployed
> KeyCape lightweight stack, an OpenBao runtime-secret authority, and a
> recursive platform/tenant authorization model. This is **expanded
> identity mode** in the architecture (`docs/platform-identity-security-architecture.md`).
## Goal
Stand up **Keycloak as an identity broker** that federates upstream
enterprise identity providers (Microsoft Entra ID / Azure AD via OIDC,
on-prem Active Directory via LDAP, and generic SAML 2.0 IdPs) and issues
**NetKingdom IAM Profile-conformant** tokens downstream — without
displacing flex-auth as the authorization decision point or breaking the
recursive platform/tenant boundary.
This is the answer to the long-standing open question
*"when does the platform switch from key-cape lightweight mode to Keycloak
expanded mode?"* — expanded mode exists **specifically** to onboard
identities that originate in an external enterprise IdP, which the
lightweight Authelia + LLDAP stack cannot broker.
## Why this is not just "resume NK-WP-0001"
NK-WP-0001 assumed a greenfield: bootstrap Vault, build PostgreSQL, treat
Keycloak as the internal user store. None of those assumptions hold now:
| NK-WP-0001 assumption | Current reality | Effect on this plan |
|---|---|---|
| HashiCorp Vault, bootstrapped from KeePassXC | **OpenBao** is the runtime secret authority (NK-WP-0006); SOPS/age + agent bootstrap exist (NK-WP-0004/0005) | Keycloak DB + admin secrets come from OpenBao via ESO; no new vault bootstrap |
| PostgreSQL built from scratch | CloudNativePG running on RAILIANCE01 (NK-WP-0003) | Add `keycloak_db` to the existing operator, reuse backup pattern |
| Keycloak is the internal source of truth (D2 hybrid) | KeyCape lightweight stack is the *deployed* IAM Profile issuer | Keycloak is a **broker/federation front-end**, not the primary user store |
| Authorization via Keycloak Authorization Services | flex-auth + Topaz is the canonical PDP (ADR-0006) | Keycloak AuthZ Services is at most an optional adapter, never canonical |
| Single-tenant Coulomb deployment | Recursive `tenant:platform` vs `tenant:coulomb` model (NK-WP-0006) | Realm-per-tenant; tenant admins must not receive platform-root |
| MFA solely via privacyIDEA provider JAR | privacyIDEA deployed *and* upstream IdPs carry their own MFA | MFA assurance source becomes a decision, not a default |
## Architecture
```text
            Enterprise IdPs (upstream)
  Entra ID (OIDC)    AD (LDAP)    SAML 2.0 IdP
        │               │               │
        └───────────────┼───────────────┘
                        │
                  [ Keycloak ]   expanded-mode broker
                        │   realm-per-tenant; IAM Profile issuer
                        │   secrets ← OpenBao (ESO)
                        │   MFA ← privacyIDEA *or* upstream assurance
                        │
        NetKingdom IAM Profile token (OIDC/PKCE)
                        ├──► applications (depend on the Profile, not the provider)
                        └──► flex-auth / Topaz ── authorization decision (PDP)

  coexists with: KeyCape lightweight issuer (id.coulomb.social)
```
Keycloak answers identity (who, how authenticated, coarse claims,
assurance). It does **not** answer resource authorization — that stays in
flex-auth (ADR-0006). It does not store runtime secrets — those stay in
OpenBao.
## Scope
In scope:
- decision record for expanded-mode adoption: trigger, federation
topology (broker vs SAML SP), realm-per-tenant model, and coexistence
with the KeyCape lightweight issuer
- custom Keycloak image (privacyIDEA provider JAR if MFA is delegated to
privacyIDEA) and Helm deployment on RAILIANCE01
- upstream federation: Entra ID (OIDC), AD (LDAP), generic SAML 2.0 IdP
- claim mapping to the NetKingdom IAM Profile (issuer, audience, subject,
groups, tenant, assurance evidence) and IAM Profile conformance checks
- MFA / assurance source decision and enforcement of step-up for
privileged actions
- recursive tenancy: realm-per-tenant, platform-root guardrails, and the
flex-auth/Topaz authorization boundary
- backups, DR, break-glass, monitoring, and audit shipping for the broker
Out of scope:
- replacing flex-auth/Topaz with Keycloak Authorization Services
- migrating the deployed lightweight stack off KeyCape (coexistence only)
- application-side OIDC client code (apps target the IAM Profile spec)
- deploying OpenBao itself (Railiance platform) — consumed, not built
- tenant-specific federation policy for tenants beyond `tenant:platform`
and `tenant:coulomb`
## Tasks
```task
id: NK-WP-0011-T1
state_hub_task_id: 934f9223-2b6f-4d01-b49b-406b5b98b6e4
status: todo
priority: high
```
**Decision record — expanded-mode adoption & federation topology.** Write
an ADR (ADR-0009) capturing: the concrete trigger for switching a tenant
from lightweight to expanded mode; whether Keycloak acts as an OIDC
identity broker, a SAML service provider, or both; the realm-per-tenant
mapping onto `tenant:platform` / `tenant:coulomb`; how the Keycloak issuer
coexists with the KeyCape issuer (`id.coulomb.social`) so applications
still target one IAM Profile contract; and the canonical hostname/issuer
for the broker. Resolve or supersede D2 from NK-WP-0001.
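
One possible shape for the per-tenant issuer selection rule is sketched below, purely as an illustration of what the ADR has to pin down. The broker hostname, the `https` scheme on the KeyCape issuer, and the tenant-to-mode table are all assumptions; the ADR decides the real values.

```python
# KeyCape hostname is taken from this workplan; the https scheme is assumed.
KEYCAPE_ISSUER = "https://id.coulomb.social"
# Placeholder only: T1 decides the canonical broker hostname/issuer.
BROKER_ISSUER = "https://broker.example.invalid"

# Hypothetical mode table; which tenant runs in which mode is a T1 decision.
TENANT_MODE = {
    "tenant:platform": "lightweight",
    "tenant:coulomb": "expanded",
}


def issuer_for(tenant: str) -> str:
    """Return the issuer an application should trust for the given tenant."""
    # Unknown tenants raise KeyError instead of silently falling back.
    mode = TENANT_MODE[tenant]
    return BROKER_ISSUER if mode == "expanded" else KEYCAPE_ISSUER


print(issuer_for("tenant:coulomb"))
```

Failing loudly on an unknown tenant is a deliberate choice in this sketch: a silent default would hide a missing entry in the selection table.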
```task
id: NK-WP-0011-T2
state_hub_task_id: 7b514cda-41b3-492f-8731-5a131422059d
status: todo
priority: high
```
**PostgreSQL `keycloak_db` on the existing operator.** Add the `keycloak_db`
database and its role to the CloudNativePG instance from NK-WP-0003 (do not
deploy a new database cluster). Source credentials from OpenBao via ESO into a K8s
Secret. Confirm the existing backup schedule covers the new database and
run a restore drill for `keycloak_db` specifically.
```task
id: NK-WP-0011-T3
state_hub_task_id: 8c29602e-ab9b-446a-8fee-5e2d8fbcb100
status: todo
priority: high
```
**Deploy expanded-mode Keycloak.** Build a custom image
(`kc.sh build`, privacyIDEA provider JAR included only if T5 delegates MFA
to privacyIDEA). Deploy via plain Helm on RAILIANCE01 behind Traefik +
cert-manager at the issuer hostname from T1. Admin bootstrap secret and DB
secret come from OpenBao/ESO — never typed, never in git. Hostname
strictness + proxy headers configured for Traefik. Realm import is
GitOps-friendly (realm JSON/CR in git).
```task
id: NK-WP-0011-T4
state_hub_task_id: d62d7683-24b1-458c-9fd6-96e576b52a64
status: todo
priority: high
```
**Upstream federation.** Configure identity brokering for Entra ID (OIDC),
on-prem AD (LDAP user federation), and a generic SAML 2.0 IdP. Map each
source's claims/attributes into the NetKingdom IAM Profile shape: issuer,
audience, subject, groups, **tenant**, and assurance evidence. Define the
attribute/claim mappers and group→role mapping. Verify a federated login
end-to-end for at least the Entra ID path.
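
The mapping described above could be sketched as follows. This is plain Python for illustration, not Keycloak mapper configuration. The claim names `tenant`, `groups`, `roles`, and `assurance`, the upstream attribute names, and the `GROUP_TO_ROLE` table are assumptions; the authoritative claim shape is whatever the IAM Profile contract specifies.

```python
from dataclasses import dataclass, field

# Hypothetical group-to-role table; real mappings are defined per realm in T4.
GROUP_TO_ROLE = {
    "entra-platform-admins": "platform-admin",
    "entra-coulomb-users": "tenant-user",
}


@dataclass
class ProfileClaims:
    """Assumed shape of the IAM Profile claims named in this workplan."""
    iss: str
    aud: str
    sub: str
    tenant: str
    groups: list = field(default_factory=list)
    roles: list = field(default_factory=list)
    assurance: dict = field(default_factory=dict)


def map_upstream(upstream: dict, *, issuer: str, audience: str, tenant: str) -> ProfileClaims:
    """Map upstream IdP attributes into the assumed Profile shape."""
    groups = sorted(upstream.get("groups", []))
    # Groups without a mapping are kept as groups but never become roles.
    roles = sorted({GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE})
    return ProfileClaims(
        iss=issuer,
        aud=audience,
        # Prefix the subject with its source so ids from different IdPs cannot collide.
        sub=f"{upstream['source']}:{upstream['id']}",
        tenant=tenant,
        groups=groups,
        roles=roles,
        assurance={"mfa": bool(upstream.get("mfa")), "source": upstream["source"]},
    )


claims = map_upstream(
    {"source": "entra", "id": "u-123", "groups": ["entra-coulomb-users", "other"], "mfa": True},
    issuer="https://broker.example.invalid/realms/coulomb",  # placeholder issuer
    audience="example-app",
    tenant="tenant:coulomb",
)
print(claims)
```

Note that `tenant` is passed in by the caller rather than read from the upstream attributes. In this sketch the tenant follows from the realm that handled the login, so an upstream IdP cannot assert a tenant for itself.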
```task
id: NK-WP-0011-T5
state_hub_task_id: 85319768-5d81-460d-89dc-8de76b63e0dc
status: todo
priority: medium
```
**MFA / assurance source.** Decide and implement the assurance model: MFA
enforced by privacyIDEA (via the Keycloak provider JAR + a "privacyIDEA
Browser" flow, carried over from NK-WP-0001) versus trusting upstream IdP
MFA (e.g. Entra Conditional Access) and reflecting it as assurance
evidence in the token. Require step-up for admin console and
platform-root-sensitive clients. Ensure assurance evidence is carried in
the IAM Profile token so flex-auth can gate privileged actions on it.
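
The step-up requirement could be expressed as a check like the one below. It is a minimal sketch of the kind of rule a policy layer might apply, not flex-auth's actual interface. The client ids and the `mfa` key inside the assurance evidence are hypothetical names.

```python
# Hypothetical set of clients that require step-up; the real list comes from T5.
STEP_UP_CLIENTS = {"admin-console", "platform-root"}


def requires_step_up(client_id: str) -> bool:
    """True when the client is privileged enough to demand MFA evidence."""
    return client_id in STEP_UP_CLIENTS


def may_proceed(client_id: str, assurance: dict) -> bool:
    """Allow privileged clients only when the token carries MFA evidence."""
    if not requires_step_up(client_id):
        return True
    # Evidence may originate from privacyIDEA or from an upstream IdP; either
    # counts here only if the token explicitly records that MFA was performed.
    return assurance.get("mfa") is True


print(may_proceed("admin-console", {"mfa": True, "source": "entra"}))
```

The strict `is True` comparison means missing or malformed evidence is treated as no MFA, so the check fails closed.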
```task
id: NK-WP-0011-T6
state_hub_task_id: 53801dc2-7bdb-4fa3-9fa0-c1450ef1003b
status: todo
priority: high
```
**IAM Profile conformance & downstream coexistence.** Run IAM Profile
conformance checks against the Keycloak issuer (discovery document, PKCE,
token/claim shape, JWKS, userinfo). Verify an application configured for
the IAM Profile can authenticate against either the KeyCape or the
Keycloak issuer per the T1 selection rule. Use the canonical
`canon/standards/iam-profile_v0.2.md` contract and the executable suite in
`tools/iam-profile-conformance/`. Document per-tenant issuer selection.
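
To make the discovery and PKCE checks concrete, here is a small sketch of what such a check might look like. It is not the suite in `tools/iam-profile-conformance/`, whose contents are not shown in this workplan. The field names are the standard OIDC discovery / RFC 8414 metadata fields; the issuer URL and endpoint paths in the sample document are placeholders.

```python
REQUIRED_FIELDS = (
    "issuer",
    "authorization_endpoint",
    "token_endpoint",
    "jwks_uri",
    "userinfo_endpoint",
)


def check_discovery(doc: dict, expected_issuer: str) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not doc.get(name)]
    if doc.get("issuer") and doc["issuer"] != expected_issuer:
        problems.append("issuer does not match the expected value")
    if "S256" not in doc.get("code_challenge_methods_supported", []):
        problems.append("PKCE S256 is not advertised")
    return problems


# Placeholder issuer and paths; the real values come from the deployed broker.
ISSUER = "https://broker.example.invalid/realms/coulomb"
sample = {
    "issuer": ISSUER,
    "authorization_endpoint": f"{ISSUER}/auth",
    "token_endpoint": f"{ISSUER}/token",
    "jwks_uri": f"{ISSUER}/certs",
    "userinfo_endpoint": f"{ISSUER}/userinfo",
    "code_challenge_methods_supported": ["plain", "S256"],
}
print(check_discovery(sample, ISSUER))
```

Running the same function against both the KeyCape and the Keycloak discovery documents is one way to show that an application sees the same contract from either issuer.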
```task
id: NK-WP-0011-T7
state_hub_task_id: 981f79bd-63ee-4da0-8d7d-9af8e468715e
status: todo
priority: high
```
**Recursive tenancy & authorization boundary.** Implement realm-per-tenant
with platform-root guardrails: tenant admins manage only their realm and
must not be able to alter IAM Profile semantics, the platform realm,
federation trust, OpenBao platform mounts, or audit retention (per the
flex-auth/Topaz implications in the architecture doc). Confirm flex-auth +
Topaz remains the PDP; if a Keycloak Authorization Services adapter is
used at all, document it as a delegated, non-canonical adapter.
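
The guardrail can be stated as a small rule, sketched below as a test oracle rather than an enforcement point; the actual decision stays with flex-auth + Topaz. The area labels, the assumption that a realm is identified by its tenant name, and the assumption that `tenant:platform` admins hold platform-root are all hypothetical.

```python
# Areas a tenant admin must never be able to change, taken from the task text.
PLATFORM_ONLY = {
    "iam-profile-semantics",
    "platform-realm",
    "federation-trust",
    "openbao-platform-mounts",
    "audit-retention",
}


def admin_may_change(admin_tenant: str, target_realm: str, area: str) -> bool:
    """Expected outcome for an admin trying to change `area` in `target_realm`."""
    if admin_tenant == "tenant:platform":
        # Assumption: platform admins hold platform-root.
        return True
    if area in PLATFORM_ONLY:
        return False
    # Tenant admins manage only their own realm.
    return target_realm == admin_tenant


print(admin_may_change("tenant:coulomb", "tenant:coulomb", "federation-trust"))
```

A table of such expected outcomes is useful when verifying T7: each row becomes a negative or positive test against the deployed realms.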
```task
id: NK-WP-0011-T8
state_hub_task_id: 13634760-7817-40c0-b7db-5e0f4196dbf0
status: todo
priority: medium
```
**Backups, DR, break-glass, monitoring, audit.** Realm exports to git; DB
backup + restore drill (T2); break-glass admin path disabled-by-default
with alerting on use; Prometheus/Grafana for auth success/failure, MFA
latency, federation errors. Ship Keycloak events to the durable platform
audit sink alongside flex-auth/Topaz/OpenBao records, with correlation
ids — satisfying the "Audit sink" and "Break-glass" rows of the
production-readiness checklist.
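
The correlation-id requirement could look like the sketch below: records from different components share one id so they can be joined in the audit sink. The record fields and the `audit_record` helper are hypothetical; the real envelope is whatever the platform audit sink defines.

```python
import json
import uuid
from datetime import datetime, timezone


def audit_record(source: str, event_type: str, correlation_id: str, details: dict) -> str:
    """Serialize one audit event as a JSON line (assumed envelope)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "event_type": event_type,
        "correlation_id": correlation_id,
        "details": details,
    }
    return json.dumps(record, sort_keys=True)


# One id is minted per request and reused by every component that handles it.
correlation_id = str(uuid.uuid4())
login = audit_record("keycloak", "LOGIN", correlation_id, {"realm": "coulomb"})
decision = audit_record("flex-auth", "DECISION", correlation_id, {"allowed": True})
print(login)
print(decision)
```

With a shared id, a login in the broker and the authorization decision that followed it can be retrieved together during an incident review or a break-glass audit.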
## Acceptance Criteria
- An ADR records the expanded-mode trigger, federation topology,
realm-per-tenant model, and KeyCape/Keycloak issuer coexistence.
- A federated user from at least one enterprise IdP (Entra ID) can log in
and receive an IAM Profile-conformant token with tenant + assurance
claims.
- Keycloak secrets originate from OpenBao; none are bootstrapped from
KeePassXC or committed to git.
- flex-auth + Topaz remains the authorization decision point; Keycloak is
not the canonical policy engine.
- Tenant admins cannot cross the platform-root boundary.
- Keycloak audit events land in the durable platform audit sink with
correlation ids, and a DR/break-glass drill has passed.
## Open Questions / Dependencies on Other Repos
- **key-cape**: does coexistence require KeyCape changes, or can both
issuers serve the same IAM Profile unchanged? (EP-NK-001 federation
extension point.)
- **flex-auth**: needs a confirmed claim/decision-envelope contract for tenant +
assurance evidence sourced from a federated token.
- **railiance-platform**: OpenBao must expose a Keycloak auth role / ESO
path before T3; unseal/break-glass story must be ready.
- **IAM Profile spec**: resolved by NK-WP-0012. T6 consumes
`canon/standards/iam-profile_v0.2.md` and
`tools/iam-profile-conformance/`.