net-kingdom/workplans/NK-WP-0011-enterprise-federation-saml.md
tegwick ab79a32eba Cancel NK-WP-0001-T04; extract Keycloak federation into NK-WP-0011
NK-WP-0001-T04 (privacyIDEA, Keycloak path) -> cancelled, superseded by
NK-WP-0003-T04 in the deployed KeyCape stack. T05-T08 (Keycloak SSO,
realm/MFA flow, user mgmt, DR) -> cancelled and migrated to NK-WP-0011.

NK-WP-0011 reframes the deferred Keycloak work as expanded-mode enterprise
federation: Keycloak as an identity broker for Entra ID / AD / SAML that
issues IAM Profile-conformant tokens, refined against the current stack
(OpenBao runtime secrets, CloudNativePG, flex-auth/Topaz PDP, recursive
platform/tenant model) rather than the original greenfield assumptions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 23:48:51 +02:00


id: NK-WP-0011
type: workplan
title: Enterprise Federation & SAML — Expanded-Mode Keycloak Identity Broker
domain: netkingdom
repo: net-kingdom
status: proposed
owner: worsch
topic_slug: netkingdom
created: 2026-05-20
updated: 2026-05-20
state_hub_workstream_id: TBD
depends_on: NK-WP-0003, NK-WP-0004, NK-WP-0006
supersedes_tasks: NK-WP-0001-T05, NK-WP-0001-T06, NK-WP-0001-T07, NK-WP-0001-T08

NK-WP-0011 — Enterprise Federation & SAML (Expanded-Mode Keycloak)

Extracted from NK-WP-0001 (T05-T08, the deferred Keycloak path) and refined against where net-kingdom actually stands today: a deployed KeyCape lightweight stack, an OpenBao runtime-secret authority, and a recursive platform/tenant authorization model. This plan implements the expanded identity mode defined in the architecture (docs/platform-identity-security-architecture.md).

Goal

Stand up Keycloak as an identity broker that federates upstream enterprise identity providers (Microsoft Entra ID / Azure AD via OIDC, on-prem Active Directory via LDAP, and generic SAML 2.0 IdPs) and issues NetKingdom IAM Profile-conformant tokens downstream — without displacing flex-auth as the authorization decision point or breaking the recursive platform/tenant boundary.

This is the answer to the long-standing open question "when does the platform switch from key-cape lightweight mode to Keycloak expanded mode?" — expanded mode exists specifically to onboard identities that originate in an external enterprise IdP, which the lightweight Authelia + LLDAP stack cannot broker.

Why this is not just "resume NK-WP-0001"

NK-WP-0001 assumed a greenfield: bootstrap Vault, build PostgreSQL, treat Keycloak as the internal user store. None of those assumptions hold now:

| NK-WP-0001 assumption | Current reality | Effect on this plan |
|---|---|---|
| HashiCorp Vault, bootstrapped from KeePassXC | OpenBao is the runtime secret authority (NK-WP-0006); SOPS/age + agent bootstrap exist (NK-WP-0004/0005) | Keycloak DB + admin secrets come from OpenBao via ESO; no new vault bootstrap |
| PostgreSQL built from scratch | CloudNativePG running on RAILIANCE01 (NK-WP-0003) | Add keycloak_db to the existing operator, reuse backup pattern |
| Keycloak is the internal source of truth (D2 hybrid) | KeyCape lightweight stack is the deployed IAM Profile issuer | Keycloak is a broker/federation front-end, not the primary user store |
| Authorization via Keycloak Authorization Services | flex-auth + Topaz is the canonical PDP (ADR-0006) | Keycloak AuthZ Services is at most an optional adapter, never canonical |
| Single-tenant Coulomb deployment | Recursive tenant:platform vs tenant:coulomb model (NK-WP-0006) | Realm-per-tenant; tenant admins must not receive platform-root |
| MFA solely via privacyIDEA provider JAR | privacyIDEA deployed and upstream IdPs carry their own MFA | MFA assurance source becomes a decision, not a default |

Architecture

        Enterprise IdPs (upstream)
   Entra ID (OIDC)   AD (LDAP)   SAML 2.0 IdP
        │              │              │
        └──────────────┼──────────────┘
                       ▼
                 [ Keycloak ]  expanded-mode broker
                       │   realm-per-tenant; IAM Profile issuer
                       │   secrets ← OpenBao (ESO)
                       │   MFA ← privacyIDEA *or* upstream assurance
                       ▼
        NetKingdom IAM Profile token (OIDC/PKCE)
                       │
                       ├──► applications (depend on the Profile, not the provider)
                       └──► flex-auth / Topaz  ── authorization decision (PDP)

   coexists with: KeyCape lightweight issuer (id.coulomb.social)

Keycloak answers identity (who, how authenticated, coarse claims, assurance). It does not answer resource authorization — that stays in flex-auth (ADR-0006). It does not store runtime secrets — those stay in OpenBao.

Scope

In scope:

  • decision record for expanded-mode adoption: trigger, federation topology (broker vs SAML SP), realm-per-tenant model, and coexistence with the KeyCape lightweight issuer
  • custom Keycloak image (privacyIDEA provider JAR if MFA is delegated to privacyIDEA) and Helm deployment on RAILIANCE01
  • upstream federation: Entra ID (OIDC), AD (LDAP), generic SAML 2.0 IdP
  • claim mapping to the NetKingdom IAM Profile (issuer, audience, subject, groups, tenant, assurance evidence) and IAM Profile conformance checks
  • MFA / assurance source decision and enforcement of step-up for privileged actions
  • recursive tenancy: realm-per-tenant, platform-root guardrails, and the flex-auth/Topaz authorization boundary
  • backups, DR, break-glass, monitoring, and audit shipping for the broker

Out of scope:

  • replacing flex-auth/Topaz with Keycloak Authorization Services
  • migrating the deployed lightweight stack off KeyCape (coexistence only)
  • application-side OIDC client code (apps target the IAM Profile spec)
  • deploying OpenBao itself (Railiance platform) — consumed, not built
  • tenant-specific federation policy for tenants beyond tenant:platform and tenant:coulomb

Tasks

id: NK-WP-0011-T1
status: todo
priority: high

Decision record — expanded-mode adoption & federation topology. Write an ADR (ADR-0009) capturing: the concrete trigger for switching a tenant from lightweight to expanded mode; whether Keycloak acts as an OIDC identity broker, a SAML service provider, or both; the realm-per-tenant mapping onto tenant:platform / tenant:coulomb; how the Keycloak issuer coexists with the KeyCape issuer (id.coulomb.social) so applications still target one IAM Profile contract; and the canonical hostname/issuer for the broker. Resolve or supersede D2 from NK-WP-0001.
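
The coexistence rule T1 has to settle can be pictured as a per-tenant lookup: every tenant resolves to exactly one issuer, so applications keep targeting the IAM Profile rather than a provider. A minimal sketch, assuming the trigger is "the tenant onboards identities from at least one external enterprise IdP" (one reading of the open question above). `BROKER_HOST`, `TenantIdentityConfig`, and the helper names are hypothetical; the exact KeyCape issuer string is assumed.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical broker hostname; the real one is a T1 decision.
BROKER_HOST = "broker.example.invalid"
# KeyCape lightweight issuer named in this plan (exact issuer string assumed).
KEYCAPE_ISSUER = "https://id.coulomb.social"


@dataclass(frozen=True)
class TenantIdentityConfig:
    tenant: str                          # e.g. "tenant:coulomb"
    realm: str                           # Keycloak realm for this tenant (realm-per-tenant)
    upstream_idps: Tuple[str, ...] = ()  # e.g. ("entra-id",); empty if none


def identity_mode(cfg: TenantIdentityConfig) -> str:
    """Assumed trigger: expanded mode iff the tenant onboards external enterprise identities."""
    return "expanded" if cfg.upstream_idps else "lightweight"


def issuer_for(cfg: TenantIdentityConfig) -> str:
    """The single issuer an IAM Profile client should trust for this tenant."""
    if identity_mode(cfg) == "expanded":
        # By default, Keycloak (Quarkus distribution) issuers look like https://<host>/realms/<realm>.
        return f"https://{BROKER_HOST}/realms/{cfg.realm}"
    return KEYCAPE_ISSUER


if __name__ == "__main__":
    coulomb = TenantIdentityConfig("tenant:coulomb", "coulomb", ("entra-id",))
    print(identity_mode(coulomb), issuer_for(coulomb))
```

Keeping the rule a pure function of tenant configuration means the same table can drive both application configuration and the T6 coexistence test.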

id: NK-WP-0011-T2
status: todo
priority: high

PostgreSQL keycloak_db on the existing operator. Add a keycloak database and role to the CloudNativePG instance from NK-WP-0003 (do not deploy a new database). Source credentials from OpenBao via ESO into a K8s Secret. Confirm the existing backup schedule covers the new database and run a restore drill for keycloak_db specifically.
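
One way to keep T2 declarative is to emit a CloudNativePG `Database` resource (available in recent CNPG releases, 1.25+) against the existing cluster, plus an ESO `ExternalSecret` that materialises the credentials from OpenBao. A sketch under stated assumptions: the cluster name, ClusterSecretStore name, and OpenBao key path are hypothetical, and the ESO `apiVersion` should match what the installed ESO serves. The login role itself would be declared under the existing Cluster's `spec.managed.roles` with `passwordSecret` pointing at the same Secret (not shown).

```python
import json

# Hypothetical names: adjust to the NK-WP-0003 cluster and the OpenBao-backed store.
CNPG_CLUSTER = "pg-main"
SECRET_STORE = "openbao"
OPENBAO_KEY = "keycloak/db"
DB_NAME = "keycloak_db"
DB_OWNER = "keycloak"
CREDS_SECRET = "keycloak-db-credentials"


def cnpg_database() -> dict:
    """Declarative CNPG Database on the existing cluster (no new database deployment)."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Database",
        "metadata": {"name": "keycloak-db"},
        "spec": {"name": DB_NAME, "owner": DB_OWNER, "cluster": {"name": CNPG_CLUSTER}},
    }


def eso_db_credentials() -> dict:
    """ExternalSecret that copies username/password from OpenBao into a K8s Secret."""
    def ref(prop: str) -> dict:
        return {"secretKey": prop, "remoteRef": {"key": OPENBAO_KEY, "property": prop}}

    return {
        "apiVersion": "external-secrets.io/v1beta1",  # check the version your ESO serves
        "kind": "ExternalSecret",
        "metadata": {"name": CREDS_SECRET},
        "spec": {
            "refreshInterval": "1h",
            "secretStoreRef": {"kind": "ClusterSecretStore", "name": SECRET_STORE},
            "target": {"name": CREDS_SECRET},
            "data": [ref("username"), ref("password")],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
    manifests = {"apiVersion": "v1", "kind": "List",
                 "items": [cnpg_database(), eso_db_credentials()]}
    print(json.dumps(manifests, indent=2))
```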

id: NK-WP-0011-T3
status: todo
priority: high

Deploy expanded-mode Keycloak. Build a custom image (kc.sh build, privacyIDEA provider JAR included only if T5 delegates MFA to privacyIDEA). Deploy via plain Helm on RAILIANCE01 behind Traefik + cert-manager at the issuer hostname from T1. Admin bootstrap secret and DB secret come from OpenBao/ESO — never typed, never in git. Hostname strictness + proxy headers configured for Traefik. Realm import is GitOps-friendly (realm JSON/CR in git).
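
The runtime settings T3 names (database, hostname strictness, proxy headers behind Traefik, bootstrap admin) map onto Keycloak's `KC_*` environment options. A sketch assuming Keycloak 26-era option names and TLS terminated at Traefik; the secret names and broker host are hypothetical. Every credential is a `secretKeyRef` to an ESO-managed Secret, so nothing is typed or committed.

```python
BROKER_HOST = "broker.example.invalid"       # hypothetical, decided in T1
CNPG_CLUSTER = "pg-main"                     # hypothetical NK-WP-0003 cluster name
DB_SECRET = "keycloak-db-credentials"        # ESO target from T2
ADMIN_SECRET = "keycloak-bootstrap-admin"    # hypothetical ESO target for the admin bootstrap


def secret_env(name: str, secret: str, key: str) -> dict:
    """Container env entry read from a Kubernetes Secret, never an inline value."""
    return {"name": name, "valueFrom": {"secretKeyRef": {"name": secret, "key": key}}}


def keycloak_env() -> list:
    return [
        {"name": "KC_DB", "value": "postgres"},
        # CloudNativePG exposes the primary through the <cluster>-rw Service.
        {"name": "KC_DB_URL", "value": f"jdbc:postgresql://{CNPG_CLUSTER}-rw:5432/keycloak_db"},
        secret_env("KC_DB_USERNAME", DB_SECRET, "username"),
        secret_env("KC_DB_PASSWORD", DB_SECRET, "password"),
        # Full public URL so issuer and redirect URLs do not depend on request headers.
        {"name": "KC_HOSTNAME", "value": f"https://{BROKER_HOST}"},
        {"name": "KC_HOSTNAME_STRICT", "value": "true"},
        # Keycloak then honours X-Forwarded-* headers, so it must only be reachable via Traefik.
        {"name": "KC_PROXY_HEADERS", "value": "xforwarded"},
        # TLS terminates at Traefik; Keycloak serves plain HTTP inside the cluster.
        {"name": "KC_HTTP_ENABLED", "value": "true"},
        # Temporary bootstrap admin, sourced from OpenBao via ESO.
        secret_env("KC_BOOTSTRAP_ADMIN_USERNAME", ADMIN_SECRET, "username"),
        secret_env("KC_BOOTSTRAP_ADMIN_PASSWORD", ADMIN_SECRET, "password"),
    ]
```

Build-time options (features, provider JARs such as the privacyIDEA one) still belong in the `kc.sh build` step of the custom image, not in this runtime list.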

id: NK-WP-0011-T4
status: todo
priority: high

Upstream federation. Configure identity brokering for Entra ID (OIDC), on-prem AD (LDAP user federation), and a generic SAML 2.0 IdP. Map each source's claims/attributes into the NetKingdom IAM Profile shape: issuer, audience, subject, groups, tenant, and assurance evidence. Define the attribute/claim mappers and group→role mapping. Verify a federated login end-to-end for at least the Entra ID path.
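
In Keycloak the mapping itself lives in identity-provider mappers and client protocol mappers; a small reference function describing the *expected* IAM Profile output is useful as an oracle for the T4 end-to-end check. A sketch for the Entra ID path, assuming hypothetical claim names `tenant`, `idp`, and `idp_sub`, and made-up group object IDs. Only `tid`, `oid`, `groups`, and `amr` are real Entra claim names.

```python
# Hypothetical Entra group object IDs mapped to NetKingdom group names.
GROUP_MAP = {
    "11111111-1111-1111-1111-111111111111": "platform-admins",
    "22222222-2222-2222-2222-222222222222": "coulomb-users",
}


def map_entra_claims(upstream: dict, *, issuer: str, audience: str,
                     tenant: str, broker_user_id: str) -> dict:
    """Expected IAM Profile claims for a user brokered from Entra ID (illustrative shape)."""
    # Unknown upstream groups are dropped rather than passed through.
    groups = sorted({GROUP_MAP[g] for g in upstream.get("groups", []) if g in GROUP_MAP})
    return {
        "iss": issuer,              # the broker realm, never the upstream IdP
        "aud": audience,
        "sub": broker_user_id,      # Keycloak's own user id for the brokered account
        "idp": "entra-id",
        # Entra's 'sub' is pairwise per application, so tid + oid is kept as the upstream key.
        "idp_sub": f"{upstream['tid']}:{upstream['oid']}",
        "groups": groups,
        "tenant": tenant,
        # Upstream authentication methods are carried as evidence, not reinterpreted here.
        "amr": list(upstream.get("amr", [])),
    }
```

Dropping unmapped groups keeps the group→role mapping an explicit allowlist, so a new upstream group cannot silently gain a NetKingdom role.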

id: NK-WP-0011-T5
status: todo
priority: medium

MFA / assurance source. Decide and implement the assurance model: MFA enforced by privacyIDEA (via the Keycloak provider JAR + a "privacyIDEA Browser" flow, carried over from NK-WP-0001) versus trusting upstream IdP MFA (e.g. Entra Conditional Access) and reflecting it as assurance evidence in the token. Require step-up for admin console and platform-root-sensitive clients. Ensure assurance evidence is carried in the IAM Profile token so flex-auth can gate privileged actions on it.
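
Whichever assurance source T5 picks, flex-auth needs a predicate it can evaluate on the token. A sketch of such a step-up check, assuming the token carries either an `acr` value from a hypothetical NetKingdom set or the RFC 8176 `amr` value `mfa`, plus `auth_time`; the 300-second freshness window is an assumed policy value, not a decided one.

```python
import time
from typing import Optional

# Hypothetical ACR values the broker would emit for MFA-backed logins.
STRONG_ACRS = {"nk:mfa", "nk:mfa:phishing-resistant"}
# RFC 8176 'amr' value for multi-factor authentication.
STRONG_AMRS = {"mfa"}
MAX_STEP_UP_AGE_S = 300  # assumed freshness window for privileged actions


def satisfies_step_up(claims: dict, now: Optional[float] = None) -> bool:
    """True if the token shows strong authentication that happened recently enough."""
    now = time.time() if now is None else now
    strong = (claims.get("acr") in STRONG_ACRS
              or bool(STRONG_AMRS & set(claims.get("amr", []))))
    auth_time = claims.get("auth_time")
    fresh = isinstance(auth_time, (int, float)) and 0 <= now - auth_time <= MAX_STEP_UP_AGE_S
    return strong and fresh
```

Requiring freshness as well as strength is what makes this a step-up check: an MFA login from the morning does not by itself unlock the admin console hours later.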

id: NK-WP-0011-T6
status: todo
priority: high

IAM Profile conformance & downstream coexistence. Run IAM Profile conformance checks against the Keycloak issuer (discovery document, PKCE, token/claim shape, JWKS, userinfo). Verify an application configured for the IAM Profile can authenticate against either the KeyCape or the Keycloak issuer per the T1 selection rule. Document per-tenant issuer selection.
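
The discovery-document part of the conformance check can run identically against both issuers. A minimal sketch using only the standard library: the field list follows OpenID Connect Discovery, with `userinfo_endpoint` treated as required because T6 checks it (core Discovery only recommends it), and `code_challenge_methods_supported` checked for `S256` to cover PKCE. Token/claim shape and JWKS checks are out of scope for this fragment.

```python
import json
import urllib.request

REQUIRED = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri",
            "userinfo_endpoint", "response_types_supported",
            "subject_types_supported", "id_token_signing_alg_values_supported")


def discovery_problems(doc: dict, expected_issuer: str) -> list:
    """Return human-readable conformance problems; an empty list means the document passes."""
    problems = [f"missing {k}" for k in REQUIRED if k not in doc]
    # Discovery requires the advertised issuer to match the issuer the client expects exactly.
    if doc.get("issuer") != expected_issuer:
        problems.append(f"issuer mismatch: {doc.get('issuer')!r} != {expected_issuer!r}")
    if "S256" not in doc.get("code_challenge_methods_supported", []):
        problems.append("PKCE S256 not advertised")
    if "code" not in doc.get("response_types_supported", []):
        problems.append("authorization code flow not supported")
    for k in ("authorization_endpoint", "token_endpoint", "jwks_uri", "userinfo_endpoint"):
        if k in doc and not str(doc[k]).startswith("https://"):
            problems.append(f"{k} is not https")
    return problems


def fetch_discovery(issuer: str) -> dict:
    url = issuer.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Usage: `discovery_problems(fetch_discovery(iss), iss)` for the KeyCape issuer and for each Keycloak realm issuer; the check passes only if both return an empty list.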

id: NK-WP-0011-T7
status: todo
priority: high

Recursive tenancy & authorization boundary. Implement realm-per-tenant with platform-root guardrails: tenant admins manage only their realm and must not be able to alter IAM Profile semantics, the platform realm, federation trust, OpenBao platform mounts, or audit retention (per the flex-auth/Topaz implications in the architecture doc). Confirm flex-auth + Topaz remains the PDP; if a Keycloak Authorization Services adapter is used at all, document it as a delegated, non-canonical adapter.
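
The platform-root guardrail is the kind of rule that would live in flex-auth/Topaz policy; written as a plain function it is easy to review and to reuse as test cases. An illustrative sketch, not the policy itself: the realm name `platform`, the resource identifiers, and `Actor` are hypothetical, and the protected set mirrors the list in this task.

```python
from dataclasses import dataclass

PLATFORM_REALM = "platform"  # hypothetical realm backing tenant:platform
# Resources reserved for platform-root, per the T7 guardrail list.
PLATFORM_ROOT_ONLY = frozenset({
    "iam_profile_semantics", "federation_trust", "openbao_platform_mount", "audit_retention",
})


@dataclass(frozen=True)
class Actor:
    tenant: str                 # e.g. "tenant:coulomb"
    realm: str                  # realm this actor administers
    platform_root: bool = False


def may_administer(actor: Actor, target_realm: str, resource: str) -> bool:
    """Tenant admins act only inside their own realm and never on platform-root resources."""
    if actor.platform_root:
        return True
    if resource in PLATFORM_ROOT_ONLY or target_realm == PLATFORM_REALM:
        return False
    return target_realm == actor.realm
```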

id: NK-WP-0011-T8
status: todo
priority: medium

Backups, DR, break-glass, monitoring, audit. Realm exports to git; DB backup + restore drill (T2); break-glass admin path disabled-by-default with alerting on use; Prometheus/Grafana for auth success/failure, MFA latency, federation errors. Ship Keycloak events to the durable platform audit sink alongside flex-auth/Topaz/OpenBao records, with correlation ids — satisfying the "Audit sink" and "Break-glass" rows of the production-readiness checklist.
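
Shipping Keycloak events to the shared audit sink needs a normalisation step so they line up with flex-auth/Topaz/OpenBao records. A sketch over the fields of a Keycloak event (`time` in epoch milliseconds, `type`, `realmId`, `clientId`, `userId`, `ipAddress`, `error`, `details`). Keycloak events carry no cross-system correlation id by default, so reusing a `correlation_id` from `details` is an assumption about upstream propagation; the break-glass account name is hypothetical.

```python
import uuid
from datetime import datetime, timezone

BREAK_GLASS_USERS = {"break-glass-admin"}  # hypothetical disabled-by-default account


def to_audit_record(event: dict) -> dict:
    """Normalise a Keycloak event into a platform audit record."""
    details = event.get("details") or {}
    # Reuse a correlation id if some upstream component placed one in details (assumed key),
    # otherwise mint a fresh one so every record is still joinable.
    correlation_id = details.get("correlation_id") or str(uuid.uuid4())
    return {
        "source": "keycloak",
        "correlation_id": correlation_id,
        "timestamp": datetime.fromtimestamp(event["time"] / 1000, tz=timezone.utc).isoformat(),
        "action": event["type"],
        "outcome": "failure" if event.get("error") else "success",
        "realm": event.get("realmId"),
        "client": event.get("clientId"),
        "subject": event.get("userId"),
        "ip": event.get("ipAddress"),
        # Any break-glass use should raise an alert downstream.
        "break_glass": details.get("username") in BREAK_GLASS_USERS,
    }
```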

Acceptance Criteria

  • An ADR records the expanded-mode trigger, federation topology, realm-per-tenant model, and KeyCape/Keycloak issuer coexistence.
  • A federated user from at least one enterprise IdP (Entra ID) can log in and receive an IAM Profile-conformant token with tenant + assurance claims.
  • Keycloak secrets originate from OpenBao; none are bootstrapped from KeePassXC or committed to git.
  • flex-auth + Topaz remains the authorization decision point; Keycloak is not the canonical policy engine.
  • Tenant admins cannot cross the platform-root boundary.
  • Keycloak audit events land in the durable platform audit sink with correlation ids, and a DR/break-glass drill has passed.

Open Questions / Dependencies on Other Repos

  • key-cape: does coexistence require KeyCape changes, or can both issuers serve the same IAM Profile unchanged? (EP-NK-001 federation extension point.)
  • flex-auth: confirmed claim/decision-envelope contract for tenant + assurance evidence sourced from a federated token.
  • railiance-platform: OpenBao must expose a Keycloak auth role / ESO path before T3; unseal/break-glass story must be ready.
  • IAM Profile spec: must be versioned and have an executable conformance check before T6 can pass (see "Missing" below).