Cancel NK-WP-0001-T04; extract Keycloak federation into NK-WP-0011

NK-WP-0001-T04 (privacyIDEA, Keycloak path) -> cancelled, superseded by
NK-WP-0003-T04 in the deployed KeyCape stack. T05-T08 (Keycloak SSO,
realm/MFA flow, user mgmt, DR) -> cancelled and migrated to NK-WP-0011.

NK-WP-0011 reframes the deferred Keycloak work as expanded-mode enterprise
federation: Keycloak as an identity broker for Entra ID / AD / SAML that
issues IAM Profile-conformant tokens, refined against the current stack
(OpenBao runtime secrets, CloudNativePG, flex-auth/Topaz PDP, recursive
platform/tenant model) rather than the original greenfield assumptions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 23:48:51 +02:00
parent 2037df49bc
commit ab79a32eba
2 changed files with 265 additions and 10 deletions


@@ -21,11 +21,12 @@ superseded_by: NK-WP-0003
> - T01 (secret bootstrap) → replaced by NK-WP-0004 + NK-WP-0005
> - T02 (K8s foundations) → done, reused by NK-WP-0003
> - T03 (PostgreSQL) → done, reused by NK-WP-0003
-> - T04 (privacyIDEA) → superseded by NK-WP-0003-T04
-> - T05-T08 (Keycloak) → deferred indefinitely; revisit if/when Keycloak
-> is needed for enterprise federation or SAML requirements
+> - T04 (privacyIDEA) → cancelled; superseded by NK-WP-0003-T04
+> - T05-T08 (Keycloak) → extracted into **NK-WP-0011** (enterprise
+> federation / SAML, expanded-mode Keycloak). No longer tracked here.
>
-> **Active work: see NK-WP-0003.**
+> **Active work: see NK-WP-0003 (deployed stack) and NK-WP-0011
+> (enterprise federation).**
## Summary
@@ -213,9 +214,9 @@ restore drill passed.
```task
id: NK-WP-0001-T04
state_hub_task_id: 6ad1296a-a488-4031-b665-f77030e971ed
-status: in_progress
+status: cancelled
priority: high
-note: Manifests committed (pvc, configmap, deployment, middleware, ingress). Scripts: create-secrets.sh, enckey-bootstrap.sh, bootstrap-admin.sh. verify-t04.sh. Domain pink.coulomb.social (CP-NK-002/003). Pending: apply to live cluster, run enckey-bootstrap.sh, bootstrap-admin.sh.
+note: Cancelled 2026-05-20. privacyIDEA deployment superseded by NK-WP-0003-T04 (privacyIDEA now runs in the live KeyCape stack on RAILIANCE01). This Keycloak-path variant is no longer pursued.
```
Deploy privacyIDEA via `gpappsoft/privacyidea` Helm chart (Artifact Hub) or
@@ -261,8 +262,9 @@ pi-admin enrolled with MFA, trigger-admin created, rate-limiting active.
```task
id: NK-WP-0001-T05
state_hub_task_id: b9f73aa6-9035-4643-9905-64e73a29b298
-status: todo
+status: cancelled
priority: high
+note: Migrated to NK-WP-0011 (enterprise federation / SAML). Refined there against the deployed KeyCape stack and the OpenBao/flex-auth architecture.
```
Build a **custom Keycloak image** that includes the privacyIDEA Provider JAR:
@@ -294,8 +296,9 @@ custom image with privacyIDEA JAR deployed and verified.
```task
id: NK-WP-0001-T06
state_hub_task_id: 3b6379a4-a27b-4d25-82be-bc600879f036
-status: todo
+status: cancelled
priority: medium
+note: Migrated to NK-WP-0011 (enterprise federation / SAML).
```
In Keycloak:
@@ -327,8 +330,9 @@ modes handled gracefully.
```task
id: NK-WP-0001-T07
state_hub_task_id: c7cf902a-b480-4545-a536-293070945206
-status: todo
+status: cancelled
priority: medium
+note: Migrated to NK-WP-0011 (enterprise federation / SAML).
```
**Decision D2 applies:** identity source of truth is Keycloak-internal with
@@ -369,8 +373,9 @@ audit logs flowing, Keycloak resolver configured.
```task
id: NK-WP-0001-T08
state_hub_task_id: 9cbd1d89-b5bf-491e-9d16-b1c7d57076fb
-status: todo
+status: cancelled
priority: medium
+note: Migrated to NK-WP-0011 (enterprise federation / SAML).
```
**Backups:**


@@ -0,0 +1,250 @@
---
id: NK-WP-0011
type: workplan
title: "Enterprise Federation & SAML — Expanded-Mode Keycloak Identity Broker"
domain: netkingdom
repo: net-kingdom
status: proposed
owner: worsch
topic_slug: netkingdom
created: "2026-05-20"
updated: "2026-05-20"
state_hub_workstream_id: TBD
depends_on:
- NK-WP-0003
- NK-WP-0004
- NK-WP-0006
supersedes_tasks:
- NK-WP-0001-T05
- NK-WP-0001-T06
- NK-WP-0001-T07
- NK-WP-0001-T08
---
# NK-WP-0011 — Enterprise Federation & SAML (Expanded-Mode Keycloak)
> Extracted from NK-WP-0001 (T05-T08, the deferred Keycloak path) and
> refined against where net-kingdom actually stands today: a deployed
> KeyCape lightweight stack, an OpenBao runtime-secret authority, and a
> recursive platform/tenant authorization model. This is **expanded
> identity mode** in the architecture (`docs/platform-identity-security-architecture.md`).
## Goal
Stand up **Keycloak as an identity broker** that federates upstream
enterprise identity providers (Microsoft Entra ID / Azure AD via OIDC,
on-prem Active Directory via LDAP, and generic SAML 2.0 IdPs) and issues
**NetKingdom IAM Profile-conformant** tokens downstream — without
displacing flex-auth as the authorization decision point or breaking the
recursive platform/tenant boundary.
This is the answer to the long-standing open question
*"when does the platform switch from key-cape lightweight mode to Keycloak
expanded mode?"* — expanded mode exists **specifically** to onboard
identities that originate in an external enterprise IdP, which the
lightweight Authelia + LLDAP stack cannot broker.
## Why this is not just "resume NK-WP-0001"
NK-WP-0001 assumed a greenfield deployment: bootstrap Vault, build
PostgreSQL, treat Keycloak as the internal user store. None of those
assumptions hold now:
| NK-WP-0001 assumption | Current reality | Effect on this plan |
|---|---|---|
| HashiCorp Vault, bootstrapped from KeePassXC | **OpenBao** is the runtime secret authority (NK-WP-0006); SOPS/age + agent bootstrap exist (NK-WP-0004/0005) | Keycloak DB + admin secrets come from OpenBao via ESO; no new vault bootstrap |
| PostgreSQL built from scratch | CloudNativePG running on RAILIANCE01 (NK-WP-0003) | Add `keycloak_db` to the existing operator, reuse backup pattern |
| Keycloak is the internal source of truth (D2 hybrid) | KeyCape lightweight stack is the *deployed* IAM Profile issuer | Keycloak is a **broker/federation front-end**, not the primary user store |
| Authorization via Keycloak Authorization Services | flex-auth + Topaz is the canonical PDP (ADR-0006) | Keycloak AuthZ Services is at most an optional adapter, never canonical |
| Single-tenant Coulomb deployment | Recursive `tenant:platform` vs `tenant:coulomb` model (NK-WP-0006) | Realm-per-tenant; tenant admins must not receive platform-root |
| MFA solely via privacyIDEA provider JAR | privacyIDEA deployed *and* upstream IdPs carry their own MFA | MFA assurance source becomes a decision, not a default |
## Architecture
```text
Enterprise IdPs (upstream)
Entra ID (OIDC) AD (LDAP) SAML 2.0 IdP
│ │ │
└──────────────┼──────────────┘
[ Keycloak ] expanded-mode broker
│ realm-per-tenant; IAM Profile issuer
│ secrets ← OpenBao (ESO)
│ MFA ← privacyIDEA *or* upstream assurance
NetKingdom IAM Profile token (OIDC/PKCE)
├──► applications (depend on the Profile, not the provider)
└──► flex-auth / Topaz ── authorization decision (PDP)
coexists with: KeyCape lightweight issuer (id.coulomb.social)
```
Keycloak answers identity (who, how authenticated, coarse claims,
assurance). It does **not** answer resource authorization — that stays in
flex-auth (ADR-0006). It does not store runtime secrets — those stay in
OpenBao.
## Scope
In scope:
- decision record for expanded-mode adoption: trigger, federation
topology (broker vs SAML SP), realm-per-tenant model, and coexistence
with the KeyCape lightweight issuer
- custom Keycloak image (privacyIDEA provider JAR if MFA is delegated to
privacyIDEA) and Helm deployment on RAILIANCE01
- upstream federation: Entra ID (OIDC), AD (LDAP), generic SAML 2.0 IdP
- claim mapping to the NetKingdom IAM Profile (issuer, audience, subject,
groups, tenant, assurance evidence) and IAM Profile conformance checks
- MFA / assurance source decision and enforcement of step-up for
privileged actions
- recursive tenancy: realm-per-tenant, platform-root guardrails, and the
flex-auth/Topaz authorization boundary
- backups, DR, break-glass, monitoring, and audit shipping for the broker
Out of scope:
- replacing flex-auth/Topaz with Keycloak Authorization Services
- migrating the deployed lightweight stack off KeyCape (coexistence only)
- application-side OIDC client code (apps target the IAM Profile spec)
- deploying OpenBao itself (Railiance platform) — consumed, not built
- tenant-specific federation policy for tenants beyond `tenant:platform`
and `tenant:coulomb`
## Tasks
```task
id: NK-WP-0011-T1
status: todo
priority: high
```
**Decision record — expanded-mode adoption & federation topology.** Write
an ADR (ADR-0009) capturing: the concrete trigger for switching a tenant
from lightweight to expanded mode; whether Keycloak acts as an OIDC
identity broker, a SAML service provider, or both; the realm-per-tenant
mapping onto `tenant:platform` / `tenant:coulomb`; how the Keycloak issuer
coexists with the KeyCape issuer (`id.coulomb.social`) so applications
still target one IAM Profile contract; and the canonical hostname/issuer
for the broker. Resolve or supersede D2 from NK-WP-0001.
```task
id: NK-WP-0011-T2
status: todo
priority: high
```
**PostgreSQL `keycloak_db` on the existing operator.** Add the
`keycloak_db` database and a dedicated role to the CloudNativePG instance
from NK-WP-0003 (do not deploy a new PostgreSQL cluster). Source
credentials from OpenBao via ESO into a K8s Secret. Confirm the existing
backup schedule covers the new database and run a restore drill for
`keycloak_db` specifically.
```task
id: NK-WP-0011-T3
status: todo
priority: high
```
**Deploy expanded-mode Keycloak.** Build a custom image
(`kc.sh build`, privacyIDEA provider JAR included only if T5 delegates MFA
to privacyIDEA). Deploy via plain Helm on RAILIANCE01 behind Traefik +
cert-manager at the issuer hostname from T1. Admin bootstrap secret and DB
secret come from OpenBao/ESO — never typed, never in git. Hostname
strictness + proxy headers configured for Traefik. Realm import is
GitOps-friendly (realm JSON/CR in git).
```task
id: NK-WP-0011-T4
status: todo
priority: high
```
**Upstream federation.** Configure identity brokering for Entra ID (OIDC),
on-prem AD (LDAP user federation), and a generic SAML 2.0 IdP. Map each
source's claims/attributes into the NetKingdom IAM Profile shape: issuer,
audience, subject, groups, **tenant**, and assurance evidence. Define the
attribute/claim mappers and group→role mapping. Verify a federated login
end-to-end for at least the Entra ID path.
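As a rough illustration of the mapping step, the sketch below normalizes one upstream claim set into the six Profile fields listed above. Everything concrete in it is an assumption made for the example: the `ProfileClaims` field names, the `GROUP_ROLE_MAP` entries, the `source:` prefix on subjects, and the `broker.example` issuer. In the deployed broker this logic would live in Keycloak mappers rather than in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileClaims:
    """Hypothetical IAM Profile claim set; field names are illustrative."""
    iss: str
    aud: str
    sub: str
    groups: tuple
    tenant: str
    assurance: dict


# Hypothetical group-to-role table keyed by "<source>:<upstream group>".
GROUP_ROLE_MAP = {
    "entra:NetKingdom-Admins": "tenant-admin",
    "entra:NetKingdom-Users": "member",
}


def map_upstream_claims(upstream: dict, *, issuer: str, audience: str,
                        tenant: str, source: str) -> ProfileClaims:
    """Normalize one upstream claim set into the Profile shape."""
    if not upstream.get("sub"):
        raise ValueError("upstream token has no subject")
    # Only groups with an explicit mapping survive; unknown groups are dropped.
    roles = sorted({
        GROUP_ROLE_MAP[f"{source}:{group}"]
        for group in upstream.get("groups", [])
        if f"{source}:{group}" in GROUP_ROLE_MAP
    })
    return ProfileClaims(
        iss=issuer,
        aud=audience,
        # Prefix the subject so ids from different IdPs cannot collide.
        sub=f"{source}:{upstream['sub']}",
        groups=tuple(roles),
        tenant=tenant,
        assurance={"source": source, "amr": list(upstream.get("amr", []))},
    )
```

Dropping unmapped groups by default is a deliberate choice in this sketch: an upstream directory should not be able to mint downstream roles simply by naming a group.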
```task
id: NK-WP-0011-T5
status: todo
priority: medium
```
**MFA / assurance source.** Decide and implement the assurance model: MFA
enforced by privacyIDEA (via the Keycloak provider JAR + a "privacyIDEA
Browser" flow, carried over from NK-WP-0001) versus trusting upstream IdP
MFA (e.g. Entra Conditional Access) and reflecting it as assurance
evidence in the token. Require step-up for admin console and
platform-root-sensitive clients. Ensure assurance evidence is carried in
the IAM Profile token so flex-auth can gate privileged actions on it.
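One way to picture the step-up rule is as a comparison between the assurance carried in the token and the assurance a client requires. The sketch below assumes the evidence arrives as an `amr` claim containing the standard `mfa` value; the level names, the ranking, and the client ids in `PRIVILEGED_CLIENTS` are hypothetical placeholders until T5 fixes the real model.

```python
# Hypothetical assurance vocabulary; the real one is a T5 decision.
ASSURANCE_RANK = {"single-factor": 1, "mfa": 2}

# Hypothetical client ids treated as privileged for this example.
PRIVILEGED_CLIENTS = frozenset({"admin-console", "platform-root-cli"})


def assurance_level(claims: dict) -> str:
    """Derive a coarse assurance level from the token's amr claim."""
    return "mfa" if "mfa" in claims.get("amr", []) else "single-factor"


def requires_step_up(claims: dict, client_id: str) -> bool:
    """Return True when the token's assurance is below what the client needs."""
    required = "mfa" if client_id in PRIVILEGED_CLIENTS else "single-factor"
    return ASSURANCE_RANK[assurance_level(claims)] < ASSURANCE_RANK[required]
```

The same comparison works whether the MFA was performed by privacyIDEA or by the upstream IdP, provided the broker reflects it consistently in the token.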
```task
id: NK-WP-0011-T6
status: todo
priority: high
```
**IAM Profile conformance & downstream coexistence.** Run IAM Profile
conformance checks against the Keycloak issuer (discovery document, PKCE,
token/claim shape, JWKS, userinfo). Verify an application configured for
the IAM Profile can authenticate against either the KeyCape or the
Keycloak issuer per the T1 selection rule. Document per-tenant issuer
selection.
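A minimal sketch of the discovery-document part of such a check follows. It operates on an already-fetched and parsed discovery document and uses the standard OIDC discovery and OAuth metadata field names. The set of required fields is an assumption for illustration, since the IAM Profile spec itself defines what conformance means.

```python
# Endpoints assumed to be mandatory for this example.
REQUIRED_ENDPOINTS = (
    "authorization_endpoint",
    "token_endpoint",
    "jwks_uri",
    "userinfo_endpoint",
)


def check_discovery(doc: dict, expected_issuer: str) -> list:
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    if doc.get("issuer") != expected_issuer:
        problems.append(f"issuer mismatch: {doc.get('issuer')!r}")
    for key in REQUIRED_ENDPOINTS:
        value = doc.get(key, "")
        if not isinstance(value, str) or not value.startswith("https://"):
            problems.append(f"{key} missing or not https")
    # PKCE support is advertised through this metadata field.
    if "S256" not in doc.get("code_challenge_methods_supported", []):
        problems.append("PKCE S256 not advertised")
    if "code" not in doc.get("response_types_supported", []):
        problems.append("authorization code flow not advertised")
    return problems
```

Running the same function against both the KeyCape and the Keycloak discovery documents gives a cheap first signal on coexistence before token and claim shapes are compared.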
```task
id: NK-WP-0011-T7
status: todo
priority: high
```
**Recursive tenancy & authorization boundary.** Implement realm-per-tenant
with platform-root guardrails: tenant admins manage only their realm and
must not be able to alter IAM Profile semantics, the platform realm,
federation trust, OpenBao platform mounts, or audit retention (per the
flex-auth/Topaz implications in the architecture doc). Confirm flex-auth +
Topaz remains the PDP; if a Keycloak Authorization Services adapter is
used at all, document it as a delegated, non-canonical adapter.
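The guardrail can be stated as a small invariant, sketched below as a plain function that could back a regression test. It is not a substitute for the flex-auth/Topaz decision. The resource names in `PLATFORM_GUARDED` paraphrase the list above and are hypothetical identifiers, and the question of platform-root acting inside tenant realms is deliberately left out.

```python
PLATFORM_TENANT = "tenant:platform"

# Hypothetical identifiers for the platform-guarded items named above.
PLATFORM_GUARDED = frozenset({
    "iam-profile-semantics",
    "federation-trust",
    "openbao-platform-mounts",
    "audit-retention",
})


def admin_change_allowed(actor_tenant: str, platform_root: bool,
                         target_tenant: str, resource: str) -> bool:
    """Check one admin change against the platform-root boundary."""
    if resource in PLATFORM_GUARDED or target_tenant == PLATFORM_TENANT:
        # Guarded resources and the platform realm need platform-root.
        return platform_root and actor_tenant == PLATFORM_TENANT
    # Everything else: admins manage only their own realm.
    return actor_tenant == target_tenant
```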
```task
id: NK-WP-0011-T8
status: todo
priority: medium
```
**Backups, DR, break-glass, monitoring, audit.** Realm exports to git; DB
backup + restore drill (T2); break-glass admin path disabled-by-default
with alerting on use; Prometheus/Grafana for auth success/failure, MFA
latency, federation errors. Ship Keycloak events to the durable platform
audit sink alongside flex-auth/Topaz/OpenBao records, with correlation
ids — satisfying the "Audit sink" and "Break-glass" rows of the
production-readiness checklist.
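To make the correlation requirement concrete, the sketch below wraps one broker event in a common audit envelope. The envelope keys are hypothetical, and the input keys (`type`, `realmId`, `userId`, `error`) are assumed here to match the Keycloak event representation; verify them against the deployed Keycloak version before relying on them.

```python
import json


def to_audit_record(event: dict, correlation_id: str) -> str:
    """Serialize one broker event into a hypothetical audit envelope."""
    record = {
        "source": "keycloak",
        "correlation_id": correlation_id,
        "event_type": event.get("type", "UNKNOWN"),
        "realm": event.get("realmId"),
        "subject": event.get("userId"),
        # Treat any populated error field as a failed outcome.
        "outcome": "failure" if event.get("error") else "success",
    }
    # Sorted keys keep the output stable for diffing and hashing.
    return json.dumps(record, sort_keys=True)
```

Passing the correlation id in from the caller, rather than generating it here, is what lets the same id appear on the matching flex-auth, Topaz, and OpenBao records.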
## Acceptance Criteria
- An ADR records the expanded-mode trigger, federation topology,
realm-per-tenant model, and KeyCape/Keycloak issuer coexistence.
- A federated user from at least one enterprise IdP (Entra ID) can log in
and receive an IAM Profile-conformant token with tenant + assurance
claims.
- Keycloak secrets originate from OpenBao; none are bootstrapped from
KeePassXC or committed to git.
- flex-auth + Topaz remains the authorization decision point; Keycloak is
not the canonical policy engine.
- Tenant admins cannot cross the platform-root boundary.
- Keycloak audit events land in the durable platform audit sink with
correlation ids, and a DR/break-glass drill has passed.
## Open Questions / Dependencies on Other Repos
- **key-cape**: does coexistence require KeyCape changes, or can both
issuers serve the same IAM Profile unchanged? (EP-NK-001 federation
extension point.)
- **flex-auth**: needs a confirmed claim/decision-envelope contract for
tenant + assurance evidence sourced from a federated token.
- **railiance-platform**: OpenBao must expose a Keycloak auth role / ESO
path before T3; unseal/break-glass story must be ready.
- **IAM Profile spec**: must be versioned and have an executable
conformance check before T6 can pass (see "Missing" below).