net-kingdom/workplans/NK-WP-0011-enterprise-federation-saml.md

id: NK-WP-0011
type: workplan
title: Enterprise Federation & SAML — Expanded-Mode Keycloak Identity Broker
domain: netkingdom
repo: net-kingdom
status: proposed
owner: worsch
topic_slug: netkingdom
created: 2026-05-20
updated: 2026-05-20
state_hub_workstream_id: a44beef8-c18b-4ae7-b7fe-a178cc4fcdf0
depends_on: NK-WP-0003, NK-WP-0004, NK-WP-0006
supersedes_tasks: NK-WP-0001-T05, NK-WP-0001-T06, NK-WP-0001-T07, NK-WP-0001-T08

NK-WP-0011 — Enterprise Federation & SAML (Expanded-Mode Keycloak)

Extracted from NK-WP-0001 (T05 through T08, the deferred Keycloak path) and refined against where net-kingdom actually stands today: a deployed KeyCape lightweight stack, an OpenBao runtime-secret authority, and a recursive platform/tenant authorization model. This plan covers the expanded identity mode described in the architecture document (docs/platform-identity-security-architecture.md).

Goal

Stand up Keycloak as an identity broker that federates upstream enterprise identity providers (Microsoft Entra ID / Azure AD via OIDC, on-prem Active Directory via LDAP, and generic SAML 2.0 IdPs) and issues NetKingdom IAM Profile-conformant tokens downstream — without displacing flex-auth as the authorization decision point or breaking the recursive platform/tenant boundary.

This is the answer to the long-standing open question "when does the platform switch from key-cape lightweight mode to Keycloak expanded mode?" — expanded mode exists specifically to onboard identities that originate in an external enterprise IdP, which the lightweight Authelia + LLDAP stack cannot broker.

Why this is not just "resume NK-WP-0001"

NK-WP-0001 assumed a greenfield: bootstrap Vault, build PostgreSQL, treat Keycloak as the internal user store. None of those assumptions hold now:

| NK-WP-0001 assumption | Current reality | Effect on this plan |
|---|---|---|
| HashiCorp Vault, bootstrapped from KeePassXC | OpenBao is the runtime secret authority (NK-WP-0006); SOPS/age + agent bootstrap exist (NK-WP-0004/0005) | Keycloak DB + admin secrets come from OpenBao via ESO; no new vault bootstrap |
| PostgreSQL built from scratch | CloudNativePG running on RAILIANCE01 (NK-WP-0003) | Add keycloak_db to the existing operator, reuse backup pattern |
| Keycloak is the internal source of truth (D2 hybrid) | KeyCape lightweight stack is the deployed IAM Profile issuer | Keycloak is a broker/federation front-end, not the primary user store |
| Authorization via Keycloak Authorization Services | flex-auth + Topaz is the canonical PDP (ADR-0006) | Keycloak AuthZ Services is at most an optional adapter, never canonical |
| Single-tenant Coulomb deployment | Recursive tenant:platform vs tenant:coulomb model (NK-WP-0006) | Realm-per-tenant; tenant admins must not receive platform-root |
| MFA solely via privacyIDEA provider JAR | privacyIDEA deployed and upstream IdPs carry their own MFA | MFA assurance source becomes a decision, not a default |

Architecture

        Enterprise IdPs (upstream)
   Entra ID (OIDC)   AD (LDAP)   SAML 2.0 IdP
        │              │              │
        └──────────────┼──────────────┘
                       ▼
                 [ Keycloak ]  expanded-mode broker
                       │   realm-per-tenant; IAM Profile issuer
                       │   secrets ← OpenBao (ESO)
                       │   MFA ← privacyIDEA *or* upstream assurance
                       ▼
        NetKingdom IAM Profile token (OIDC/PKCE)
                       │
                       ├──► applications (depend on the Profile, not the provider)
                       └──► flex-auth / Topaz  ── authorization decision (PDP)

   coexists with: KeyCape lightweight issuer (id.coulomb.social)

Keycloak answers identity (who, how authenticated, coarse claims, assurance). It does not answer resource authorization — that stays in flex-auth (ADR-0006). It does not store runtime secrets — those stay in OpenBao.

Scope

In scope:

  • decision record for expanded-mode adoption: trigger, federation topology (broker vs SAML SP), realm-per-tenant model, and coexistence with the KeyCape lightweight issuer
  • custom Keycloak image (privacyIDEA provider JAR if MFA is delegated to privacyIDEA) and Helm deployment on RAILIANCE01
  • upstream federation: Entra ID (OIDC), AD (LDAP), generic SAML 2.0 IdP
  • claim mapping to the NetKingdom IAM Profile (issuer, audience, subject, groups, tenant, assurance evidence) and IAM Profile conformance checks
  • MFA / assurance source decision and enforcement of step-up for privileged actions
  • recursive tenancy: realm-per-tenant, platform-root guardrails, and the flex-auth/Topaz authorization boundary
  • backups, DR, break-glass, monitoring, and audit shipping for the broker

Out of scope:

  • replacing flex-auth/Topaz with Keycloak Authorization Services
  • migrating the deployed lightweight stack off KeyCape (coexistence only)
  • application-side OIDC client code (apps target the IAM Profile spec)
  • deploying OpenBao itself (Railiance platform) — consumed, not built
  • tenant-specific federation policy for tenants beyond tenant:platform and tenant:coulomb

Tasks

id: NK-WP-0011-T1
state_hub_task_id: 934f9223-2b6f-4d01-b49b-406b5b98b6e4
status: todo
priority: high

Decision record — expanded-mode adoption & federation topology. Write an ADR (ADR-0009) capturing: the concrete trigger for switching a tenant from lightweight to expanded mode; whether Keycloak acts as an OIDC identity broker, a SAML service provider, or both; the realm-per-tenant mapping onto tenant:platform / tenant:coulomb; how the Keycloak issuer coexists with the KeyCape issuer (id.coulomb.social) so applications still target one IAM Profile contract; and the canonical hostname/issuer for the broker. Resolve or supersede D2 from NK-WP-0001.

id: NK-WP-0011-T2
state_hub_task_id: 7b514cda-41b3-492f-8731-5a131422059d
status: todo
priority: high

PostgreSQL keycloak_db on the existing operator. Add a keycloak database and role to the CloudNativePG instance from NK-WP-0003 (do not deploy a new database). Source credentials from OpenBao via ESO into a K8s Secret. Confirm the existing backup schedule covers the new database and run a restore drill for keycloak_db specifically.

id: NK-WP-0011-T3
state_hub_task_id: 8c29602e-ab9b-446a-8fee-5e2d8fbcb100
status: todo
priority: high

Deploy expanded-mode Keycloak. Build a custom image (kc.sh build, privacyIDEA provider JAR included only if T5 delegates MFA to privacyIDEA). Deploy via plain Helm on RAILIANCE01 behind Traefik + cert-manager at the issuer hostname from T1. Admin bootstrap secret and DB secret come from OpenBao/ESO — never typed, never in git. Hostname strictness + proxy headers configured for Traefik. Realm import is GitOps-friendly (realm JSON/CR in git).

id: NK-WP-0011-T4
state_hub_task_id: d62d7683-24b1-458c-9fd6-96e576b52a64
status: todo
priority: high

Upstream federation. Configure identity brokering for Entra ID (OIDC), on-prem AD (LDAP user federation), and a generic SAML 2.0 IdP. Map each source's claims/attributes into the NetKingdom IAM Profile shape: issuer, audience, subject, groups, tenant, and assurance evidence. Define the attribute/claim mappers and group→role mapping. Verify a federated login end-to-end for at least the Entra ID path.
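
The claim mapping in T4 could be sketched as a pure function that normalizes an upstream token payload into the six Profile fields. This is a minimal illustration, not the Keycloak mapper configuration itself: the `ProfileClaims` type, the `GROUP_MAP` table, the tenant-prefixed subject, and the layout of the `assurance` field are all assumptions made for the example. Only `sub`, `groups`, and `amr` are taken to exist on the upstream payload.

```python
from dataclasses import dataclass, field

# Hypothetical table: upstream group identifier -> Profile group name.
# Real mappings would be defined in realm configuration, not in code.
GROUP_MAP = {
    "00000000-0000-0000-0000-000000000001": "platform-admins",
    "00000000-0000-0000-0000-000000000002": "members",
}


@dataclass(frozen=True)
class ProfileClaims:
    """Hypothetical container for the six IAM Profile fields named in T4."""
    issuer: str
    audience: str
    subject: str
    groups: tuple
    tenant: str
    assurance: dict = field(default_factory=dict)


def map_upstream_claims(upstream: dict, *, broker_issuer: str,
                        audience: str, tenant: str) -> ProfileClaims:
    """Normalize an upstream OIDC payload into the assumed Profile shape."""
    # Unknown upstream groups are dropped rather than passed through.
    groups = sorted({GROUP_MAP[g] for g in upstream.get("groups", [])
                     if g in GROUP_MAP})
    # 'amr' lists the authentication methods the upstream IdP reports.
    amr = list(upstream.get("amr", []))
    return ProfileClaims(
        issuer=broker_issuer,                     # the broker, not the upstream IdP
        audience=audience,
        subject=f"{tenant}:{upstream['sub']}",    # assumed subject scheme
        groups=tuple(groups),
        tenant=tenant,
        assurance={"source": "upstream", "amr": amr, "mfa": "mfa" in amr},
    )
```

Keeping the mapping free of I/O means the same expectations can be replayed against each upstream source (Entra ID, AD, SAML) once their real attribute names are known.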

id: NK-WP-0011-T5
state_hub_task_id: 85319768-5d81-460d-89dc-8de76b63e0dc
status: todo
priority: medium

MFA / assurance source. Decide and implement the assurance model: MFA enforced by privacyIDEA (via the Keycloak provider JAR + a "privacyIDEA Browser" flow, carried over from NK-WP-0001) versus trusting upstream IdP MFA (e.g. Entra Conditional Access) and reflecting it as assurance evidence in the token. Require step-up for admin console and platform-root-sensitive clients. Ensure assurance evidence is carried in the IAM Profile token so flex-auth can gate privileged actions on it.
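
As an illustration of how a consumer might gate privileged actions on assurance evidence, the sketch below decides whether step-up is required. It assumes the token exposes the standard OIDC `amr` and `auth_time` claims; whether the IAM Profile carries assurance evidence in exactly that form is not settled here, and the 15-minute freshness window is a made-up value.

```python
import time
from typing import Optional

# Hypothetical policy value: privileged actions need MFA evidence this fresh.
MAX_MFA_AGE_SECONDS = 900


def needs_step_up(claims: dict, *, privileged: bool,
                  now: Optional[float] = None) -> bool:
    """Return True when the caller should be sent back for MFA."""
    if not privileged:
        return False
    current = time.time() if now is None else now
    amr = claims.get("amr", [])
    auth_time = claims.get("auth_time")
    # Missing evidence is treated the same as no MFA at all.
    if "mfa" not in amr or auth_time is None:
        return True
    # Stale MFA evidence also triggers step-up.
    return (current - auth_time) > MAX_MFA_AGE_SECONDS
```

The same function applies whichever assurance source T5 selects, as long as both privacyIDEA-enforced and upstream-trusted logins are reflected into the same claims.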

id: NK-WP-0011-T6
state_hub_task_id: 53801dc2-7bdb-4fa3-9fa0-c1450ef1003b
status: todo
priority: high

IAM Profile conformance & downstream coexistence. Run IAM Profile conformance checks against the Keycloak issuer (discovery document, PKCE, token/claim shape, JWKS, userinfo). Verify an application configured for the IAM Profile can authenticate against either the KeyCape or the Keycloak issuer per the T1 selection rule. Use the canonical canon/standards/iam-profile_v0.2.md contract and the executable suite in tools/iam-profile-conformance/. Document per-tenant issuer selection.
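
To make the discovery-document part of T6 concrete, here is a minimal sketch of the kind of check involved. It is not the suite in tools/iam-profile-conformance/ and does not claim to reproduce the v0.2 contract; the function name and the exact set of required fields are assumptions, using metadata names from OpenID Connect Discovery and RFC 8414.

```python
# Metadata fields the sketch treats as mandatory (an assumption for illustration).
REQUIRED_FIELDS = (
    "issuer",
    "authorization_endpoint",
    "token_endpoint",
    "jwks_uri",
    "userinfo_endpoint",
)


def check_discovery(doc: dict, expected_issuer: str) -> list:
    """Return a list of human-readable problems; an empty list means pass."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not doc.get(name):
            problems.append(f"missing field: {name}")
    # The advertised issuer must match the configured one exactly.
    if doc.get("issuer") and doc["issuer"] != expected_issuer:
        problems.append("issuer mismatch")
    # PKCE support is advertised through code_challenge_methods_supported.
    if "S256" not in doc.get("code_challenge_methods_supported", []):
        problems.append("PKCE S256 not advertised")
    # The authorization code flow must be offered.
    if "code" not in doc.get("response_types_supported", []):
        problems.append("authorization code flow not advertised")
    return problems
```

Running the same check against both the KeyCape and the Keycloak discovery documents gives a quick first signal on whether an application can switch issuers without changes.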

id: NK-WP-0011-T7
state_hub_task_id: 981f79bd-63ee-4da0-8d7d-9af8e468715e
status: todo
priority: high

Recursive tenancy & authorization boundary. Implement realm-per-tenant with platform-root guardrails: tenant admins manage only their realm and must not be able to alter IAM Profile semantics, the platform realm, federation trust, OpenBao platform mounts, or audit retention (per the flex-auth/Topaz implications in the architecture doc). Confirm flex-auth + Topaz remains the PDP; if a Keycloak Authorization Services adapter is used at all, document it as a delegated, non-canonical adapter.
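
The guardrails in T7 can be written down as expectations that a verification drill asserts. The sketch below is a test oracle only, not an enforcement point: flex-auth + Topaz remains the PDP. The resource labels are hypothetical names for the five protected areas listed above, and the function signature is an assumption.

```python
# Hypothetical labels for the areas tenant admins must never alter.
PLATFORM_ONLY = frozenset({
    "iam-profile-semantics",
    "platform-realm",
    "federation-trust",
    "openbao-platform-mounts",
    "audit-retention",
})


def expected_decision(actor_tenant: str, actor_is_platform_root: bool,
                      target_realm: str, resource: str) -> bool:
    """Expected allow/deny outcome for an administrative change."""
    if actor_is_platform_root:
        return True
    # Platform-owned areas are off limits regardless of realm.
    if resource in PLATFORM_ONLY:
        return False
    # Tenant admins manage only their own realm.
    return target_realm == actor_tenant
```

A drill could iterate over tenants and resources, issue the real administrative request, and compare the observed outcome with `expected_decision`.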

id: NK-WP-0011-T8
state_hub_task_id: 13634760-7817-40c0-b7db-5e0f4196dbf0
status: todo
priority: medium

Backups, DR, break-glass, monitoring, audit. Realm exports to git; DB backup + restore drill (T2); break-glass admin path disabled-by-default with alerting on use; Prometheus/Grafana for auth success/failure, MFA latency, federation errors. Ship Keycloak events to the durable platform audit sink alongside flex-auth/Topaz/OpenBao records, with correlation ids — satisfying the "Audit sink" and "Break-glass" rows of the production-readiness checklist.
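
For the audit-shipping part of T8, a sketch of wrapping a Keycloak event into a record with a correlation id might look like the following. The output record layout is hypothetical, since the platform audit sink schema is not defined in this plan. The input field names (`time` in milliseconds, `type`, `realmId`, `userId`, `error`) are assumed to follow Keycloak's event representation and should be verified against the deployed version.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def to_audit_record(event: dict, correlation_id: Optional[str] = None) -> str:
    """Serialize one Keycloak event as a JSON audit record."""
    record = {
        "source": "keycloak",
        # Reuse an incoming correlation id when present, else mint one.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        # Event time is assumed to be epoch milliseconds.
        "occurred_at": datetime.fromtimestamp(
            event["time"] / 1000, tz=timezone.utc
        ).isoformat(),
        "realm": event.get("realmId"),
        "event_type": event.get("type"),
        "subject": event.get("userId"),
        "outcome": "failure" if event.get("error") else "success",
    }
    return json.dumps(record, sort_keys=True)
```

Passing the correlation id through from the originating request is what lets Keycloak records be joined with flex-auth, Topaz, and OpenBao records in the sink.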

Acceptance Criteria

  • An ADR records the expanded-mode trigger, federation topology, realm-per-tenant model, and KeyCape/Keycloak issuer coexistence.
  • A federated user from at least one enterprise IdP (Entra ID) can log in and receive an IAM Profile-conformant token with tenant + assurance claims.
  • Keycloak secrets originate from OpenBao; none are bootstrapped from KeePassXC or committed to git.
  • flex-auth + Topaz remains the authorization decision point; Keycloak is not the canonical policy engine.
  • Tenant admins cannot cross the platform-root boundary.
  • Keycloak audit events land in the durable platform audit sink with correlation ids, and a DR/break-glass drill has passed.

Open Questions / Dependencies on Other Repos

  • key-cape: does coexistence require KeyCape changes, or can both issuers serve the same IAM Profile unchanged? (EP-NK-001 federation extension point.)
  • flex-auth: requires a confirmed claim/decision-envelope contract for tenant + assurance evidence sourced from a federated token.
  • railiance-platform: OpenBao must expose a Keycloak auth role / ESO path before T3; unseal/break-glass story must be ready.
  • IAM Profile spec: resolved by NK-WP-0012. T6 consumes canon/standards/iam-profile_v0.2.md and tools/iam-profile-conformance/.