diff --git a/workplans/NK-WP-0001-sso-mfa-platform.md b/workplans/NK-WP-0001-sso-mfa-platform.md
index ce1bea1..6845e1c 100644
--- a/workplans/NK-WP-0001-sso-mfa-platform.md
+++ b/workplans/NK-WP-0001-sso-mfa-platform.md
@@ -21,11 +21,12 @@ superseded_by: NK-WP-0003
 > - T01 (secret bootstrap) → replaced by NK-WP-0004 + NK-WP-0005
 > - T02 (K8s foundations) → done, reused by NK-WP-0003
 > - T03 (PostgreSQL) → done, reused by NK-WP-0003
-> - T04 (privacyIDEA) → superseded by NK-WP-0003-T04
-> - T05–T08 (Keycloak) → deferred indefinitely; revisit if/when Keycloak
->   is needed for enterprise federation or SAML requirements
+> - T04 (privacyIDEA) → cancelled; superseded by NK-WP-0003-T04
+> - T05–T08 (Keycloak) → extracted into **NK-WP-0011** (enterprise
+>   federation / SAML, expanded-mode Keycloak). No longer tracked here.
 >
-> **Active work: see NK-WP-0003.**
+> **Active work: see NK-WP-0003 (deployed stack) and NK-WP-0011
+> (enterprise federation).**
 
 ## Summary
 
@@ -213,9 +214,9 @@ restore drill passed.
 ```task
 id: NK-WP-0001-T04
 state_hub_task_id: 6ad1296a-a488-4031-b665-f77030e971ed
-status: in_progress
+status: cancelled
 priority: high
-note: Manifests committed (pvc, configmap, deployment, middleware, ingress). Scripts: create-secrets.sh, enckey-bootstrap.sh, bootstrap-admin.sh. verify-t04.sh. Domain pink.coulomb.social (CP-NK-002/003). Pending: apply to live cluster, run enckey-bootstrap.sh, bootstrap-admin.sh.
+note: Cancelled 2026-05-20. privacyIDEA deployment superseded by NK-WP-0003-T04 (privacyIDEA now runs in the live KeyCape stack on RAILIANCE01). This Keycloak-path variant is no longer pursued.
 ```
 
 Deploy privacyIDEA via `gpappsoft/privacyidea` Helm chart (Artifact Hub) or
@@ -261,8 +262,9 @@ pi-admin enrolled with MFA, trigger-admin created, rate-limiting active.
 ```task
 id: NK-WP-0001-T05
 state_hub_task_id: b9f73aa6-9035-4643-9905-64e73a29b298
-status: todo
+status: cancelled
 priority: high
+note: Migrated to NK-WP-0011 (enterprise federation / SAML). Refined there against the deployed KeyCape stack and the OpenBao/flex-auth architecture.
 ```
 
 Build a **custom Keycloak image** that includes the privacyIDEA Provider JAR:
@@ -294,8 +296,9 @@ custom image with privacyIDEA JAR deployed and verified.
 ```task
 id: NK-WP-0001-T06
 state_hub_task_id: 3b6379a4-a27b-4d25-82be-bc600879f036
-status: todo
+status: cancelled
 priority: medium
+note: Migrated to NK-WP-0011 (enterprise federation / SAML).
 ```
 
 In Keycloak:
@@ -327,8 +330,9 @@ modes handled gracefully.
 ```task
 id: NK-WP-0001-T07
 state_hub_task_id: c7cf902a-b480-4545-a536-293070945206
-status: todo
+status: cancelled
 priority: medium
+note: Migrated to NK-WP-0011 (enterprise federation / SAML).
 ```
 
 **Decision D2 applies:** identity source of truth is Keycloak-internal with
@@ -369,8 +373,9 @@ audit logs flowing, Keycloak resolver configured.
 ```task
 id: NK-WP-0001-T08
 state_hub_task_id: 9cbd1d89-b5bf-491e-9d16-b1c7d57076fb
-status: todo
+status: cancelled
 priority: medium
+note: Migrated to NK-WP-0011 (enterprise federation / SAML).
 ```
 
 **Backups:**
diff --git a/workplans/NK-WP-0011-enterprise-federation-saml.md b/workplans/NK-WP-0011-enterprise-federation-saml.md
new file mode 100644
index 0000000..b0e75d5
--- /dev/null
+++ b/workplans/NK-WP-0011-enterprise-federation-saml.md
@@ -0,0 +1,250 @@
+---
+id: NK-WP-0011
+type: workplan
+title: "Enterprise Federation & SAML — Expanded-Mode Keycloak Identity Broker"
+domain: netkingdom
+repo: net-kingdom
+status: proposed
+owner: worsch
+topic_slug: netkingdom
+created: "2026-05-20"
+updated: "2026-05-20"
+state_hub_workstream_id: TBD
+depends_on:
+  - NK-WP-0003
+  - NK-WP-0004
+  - NK-WP-0006
+supersedes_tasks:
+  - NK-WP-0001-T05
+  - NK-WP-0001-T06
+  - NK-WP-0001-T07
+  - NK-WP-0001-T08
+---
+
+# NK-WP-0011 — Enterprise Federation & SAML (Expanded-Mode Keycloak)
+
+> Extracted from NK-WP-0001 (T05–T08, the deferred Keycloak path) and
+> refined against where net-kingdom actually stands today: a deployed
+> KeyCape lightweight stack, an OpenBao runtime-secret authority, and a
+> recursive platform/tenant authorization model. This is **expanded
+> identity mode** in the architecture (`docs/platform-identity-security-architecture.md`).
+
+## Goal
+
+Stand up **Keycloak as an identity broker** that federates upstream
+enterprise identity providers (Microsoft Entra ID / Azure AD via OIDC,
+on-prem Active Directory via LDAP, and generic SAML 2.0 IdPs) and issues
+**NetKingdom IAM Profile-conformant** tokens downstream — without
+displacing flex-auth as the authorization decision point or breaking the
+recursive platform/tenant boundary.
+
+This is the answer to the long-standing open question
+*"when does the platform switch from key-cape lightweight mode to Keycloak
+expanded mode?"* — expanded mode exists **specifically** to onboard
+identities that originate in an external enterprise IdP, which the
+lightweight Authelia + LLDAP stack cannot broker.
+
+## Why this is not just "resume NK-WP-0001"
+
+NK-WP-0001 assumed a greenfield: bootstrap Vault, build PostgreSQL, treat
+Keycloak as the internal user store. None of those assumptions hold now:
+
+| NK-WP-0001 assumption | Current reality | Effect on this plan |
+|---|---|---|
+| HashiCorp Vault, bootstrapped from KeePassXC | **OpenBao** is the runtime secret authority (NK-WP-0006); SOPS/age + agent bootstrap exist (NK-WP-0004/0005) | Keycloak DB + admin secrets come from OpenBao via ESO; no new vault bootstrap |
+| PostgreSQL built from scratch | CloudNativePG running on RAILIANCE01 (NK-WP-0003) | Add `keycloak_db` to the existing operator, reuse backup pattern |
+| Keycloak is the internal source of truth (D2 hybrid) | KeyCape lightweight stack is the *deployed* IAM Profile issuer | Keycloak is a **broker/federation front-end**, not the primary user store |
+| Authorization via Keycloak Authorization Services | flex-auth + Topaz is the canonical PDP (ADR-0006) | Keycloak AuthZ Services is at most an optional adapter, never canonical |
+| Single-tenant Coulomb deployment | Recursive `tenant:platform` vs `tenant:coulomb` model (NK-WP-0006) | Realm-per-tenant; tenant admins must not receive platform-root |
+| MFA solely via privacyIDEA provider JAR | privacyIDEA deployed *and* upstream IdPs carry their own MFA | MFA assurance source becomes a decision, not a default |
+
+## Architecture
+
+```text
+          Enterprise IdPs (upstream)
+ Entra ID (OIDC)   AD (LDAP)    SAML 2.0 IdP
+        │              │              │
+        └──────────────┼──────────────┘
+                       ▼
+                 [ Keycloak ]   expanded-mode broker
+                       │   realm-per-tenant; IAM Profile issuer
+                       │   secrets ← OpenBao (ESO)
+                       │   MFA ← privacyIDEA *or* upstream assurance
+                       ▼
+   NetKingdom IAM Profile token (OIDC/PKCE)
+                       │
+                       ├──► applications (depend on the Profile, not the provider)
+                       └──► flex-auth / Topaz ── authorization decision (PDP)
+
+  coexists with: KeyCape lightweight issuer (id.coulomb.social)
+```
+
+Keycloak answers identity (who, how authenticated, coarse claims,
+assurance). It does **not** answer resource authorization — that stays in
+flex-auth (ADR-0006). It does not store runtime secrets — those stay in
+OpenBao.
+
+## Scope
+
+In scope:
+
+- decision record for expanded-mode adoption: trigger, federation
+  topology (broker vs SAML SP), realm-per-tenant model, and coexistence
+  with the KeyCape lightweight issuer
+- custom Keycloak image (privacyIDEA provider JAR if MFA is delegated to
+  privacyIDEA) and Helm deployment on RAILIANCE01
+- upstream federation: Entra ID (OIDC), AD (LDAP), generic SAML 2.0 IdP
+- claim mapping to the NetKingdom IAM Profile (issuer, audience, subject,
+  groups, tenant, assurance evidence) and IAM Profile conformance checks
+- MFA / assurance source decision and enforcement of step-up for
+  privileged actions
+- recursive tenancy: realm-per-tenant, platform-root guardrails, and the
+  flex-auth/Topaz authorization boundary
+- backups, DR, break-glass, monitoring, and audit shipping for the broker
+
+Out of scope:
+
+- replacing flex-auth/Topaz with Keycloak Authorization Services
+- migrating the deployed lightweight stack off KeyCape (coexistence only)
+- application-side OIDC client code (apps target the IAM Profile spec)
+- deploying OpenBao itself (Railiance platform) — consumed, not built
+- tenant-specific federation policy for tenants beyond `tenant:platform`
+  and `tenant:coulomb`
+
+## Tasks
+
+```task
+id: NK-WP-0011-T1
+status: todo
+priority: high
+```
+
+**Decision record — expanded-mode adoption & federation topology.** Write
+an ADR (ADR-0009) capturing: the concrete trigger for switching a tenant
+from lightweight to expanded mode; whether Keycloak acts as an OIDC
+identity broker, a SAML service provider, or both; the realm-per-tenant
+mapping onto `tenant:platform` / `tenant:coulomb`; how the Keycloak issuer
+coexists with the KeyCape issuer (`id.coulomb.social`) so applications
+still target one IAM Profile contract; and the canonical hostname/issuer
+for the broker. Resolve or supersede D2 from NK-WP-0001.
+
+```task
+id: NK-WP-0011-T2
+status: todo
+priority: high
+```
+
+**PostgreSQL `keycloak_db` on the existing operator.** Add a `keycloak`
+database and role to the CloudNativePG instance from NK-WP-0003 (do not
+deploy a new database). Source credentials from OpenBao via ESO into a K8s
+Secret. Confirm the existing backup schedule covers the new database and
+run a restore drill for `keycloak_db` specifically.
+
+```task
+id: NK-WP-0011-T3
+status: todo
+priority: high
+```
+
+**Deploy expanded-mode Keycloak.** Build a custom image
+(`kc.sh build`, privacyIDEA provider JAR included only if T5 delegates MFA
+to privacyIDEA). Deploy via plain Helm on RAILIANCE01 behind Traefik +
+cert-manager at the issuer hostname from T1. Admin bootstrap secret and DB
+secret come from OpenBao/ESO — never typed, never in git. Hostname
+strictness + proxy headers configured for Traefik. Realm import is
+GitOps-friendly (realm JSON/CR in git).
+
+```task
+id: NK-WP-0011-T4
+status: todo
+priority: high
+```
+
+**Upstream federation.** Configure identity brokering for Entra ID (OIDC),
+on-prem AD (LDAP user federation), and a generic SAML 2.0 IdP. Map each
+source's claims/attributes into the NetKingdom IAM Profile shape: issuer,
+audience, subject, groups, **tenant**, and assurance evidence. Define the
+attribute/claim mappers and group→role mapping. Verify a federated login
+end-to-end for at least the Entra ID path.
+
+```task
+id: NK-WP-0011-T5
+status: todo
+priority: medium
+```
+
+**MFA / assurance source.** Decide and implement the assurance model: MFA
+enforced by privacyIDEA (via the Keycloak provider JAR + a "privacyIDEA
+Browser" flow, carried over from NK-WP-0001) versus trusting upstream IdP
+MFA (e.g. Entra Conditional Access) and reflecting it as assurance
+evidence in the token. Require step-up for admin console and
+platform-root-sensitive clients. Ensure assurance evidence is carried in
+the IAM Profile token so flex-auth can gate privileged actions on it.
+
+```task
+id: NK-WP-0011-T6
+status: todo
+priority: high
+```
+
+**IAM Profile conformance & downstream coexistence.** Run IAM Profile
+conformance checks against the Keycloak issuer (discovery document, PKCE,
+token/claim shape, JWKS, userinfo). Verify an application configured for
+the IAM Profile can authenticate against either the KeyCape or the
+Keycloak issuer per the T1 selection rule. Document per-tenant issuer
+selection.
+
+```task
+id: NK-WP-0011-T7
+status: todo
+priority: high
+```
+
+**Recursive tenancy & authorization boundary.** Implement realm-per-tenant
+with platform-root guardrails: tenant admins manage only their realm and
+must not be able to alter IAM Profile semantics, the platform realm,
+federation trust, OpenBao platform mounts, or audit retention (per the
+flex-auth/Topaz implications in the architecture doc). Confirm flex-auth +
+Topaz remains the PDP; if a Keycloak Authorization Services adapter is
+used at all, document it as a delegated, non-canonical adapter.
+
+```task
+id: NK-WP-0011-T8
+status: todo
+priority: medium
+```
+
+**Backups, DR, break-glass, monitoring, audit.** Realm exports to git; DB
+backup + restore drill (T2); break-glass admin path disabled-by-default
+with alerting on use; Prometheus/Grafana for auth success/failure, MFA
+latency, federation errors. Ship Keycloak events to the durable platform
+audit sink alongside flex-auth/Topaz/OpenBao records, with correlation
+ids — satisfying the "Audit sink" and "Break-glass" rows of the
+production-readiness checklist.
+
+## Acceptance Criteria
+
+- An ADR records the expanded-mode trigger, federation topology,
+  realm-per-tenant model, and KeyCape/Keycloak issuer coexistence.
+- A federated user from at least one enterprise IdP (Entra ID) can log in
+  and receive an IAM Profile-conformant token with tenant + assurance
+  claims.
+- Keycloak secrets originate from OpenBao; none are bootstrapped from
+  KeePassXC or committed to git.
+- flex-auth + Topaz remains the authorization decision point; Keycloak is
+  not the canonical policy engine.
+- Tenant admins cannot cross the platform-root boundary.
+- Keycloak audit events land in the durable platform audit sink with
+  correlation ids, and a DR/break-glass drill has passed.
+
+## Open Questions / Dependencies on Other Repos
+
+- **key-cape**: does coexistence require KeyCape changes, or can both
+  issuers serve the same IAM Profile unchanged? (EP-NK-001 federation
+  extension point.)
+- **flex-auth**: confirmed claim/decision-envelope contract for tenant +
+  assurance evidence sourced from a federated token.
+- **railiance-platform**: OpenBao must expose a Keycloak auth role / ESO
+  path before T3; unseal/break-glass story must be ready.
+- **IAM Profile spec**: must be versioned and have an executable
+  conformance check before T6 can pass (see "Missing" below).
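
Review note: the last open question asks for an executable IAM Profile conformance check before T6 can pass. A minimal sketch of what such a check might look like follows. It is not part of this change and not the project's actual check. The discovery keys are the standard OIDC Discovery / RFC 8414 metadata names; the claim names `groups`, `tenant`, and `acr`, and the function names, are placeholders, since the IAM Profile spec is not shown in this diff. Signature validation against the JWKS is deliberately left out.

```python
"""Sketch of an IAM Profile conformance check; claim names are assumptions."""

# Standard OIDC Discovery / RFC 8414 metadata keys covering the T6 items
# (discovery document, JWKS, userinfo).
REQUIRED_DISCOVERY_KEYS = (
    "issuer",
    "authorization_endpoint",
    "token_endpoint",
    "jwks_uri",
    "userinfo_endpoint",
)

# ASSUMPTION: "groups", "tenant" and "acr" stand in for whatever names the
# IAM Profile spec assigns to groups, tenant and assurance evidence.
REQUIRED_CLAIMS = ("iss", "aud", "sub", "groups", "tenant", "acr")


def check_discovery(doc: dict, expected_issuer: str) -> list[str]:
    """Return a list of problems found in an issuer's discovery document."""
    problems = [
        f"missing discovery key: {key}"
        for key in REQUIRED_DISCOVERY_KEYS
        if key not in doc
    ]
    if doc.get("issuer") != expected_issuer:
        problems.append("issuer mismatch")
    # PKCE support is advertised through this RFC 8414 metadata field.
    if "S256" not in doc.get("code_challenge_methods_supported", []):
        problems.append("PKCE S256 not advertised")
    return problems


def check_claims(
    claims: dict, expected_issuer: str, allowed_tenants: set[str]
) -> list[str]:
    """Return a list of problems found in already-decoded token claims."""
    problems = [
        f"missing claim: {name}" for name in REQUIRED_CLAIMS if name not in claims
    ]
    if claims.get("iss") != expected_issuer:
        problems.append("iss does not match issuer")
    if "tenant" in claims and claims["tenant"] not in allowed_tenants:
        problems.append(f"unknown tenant: {claims['tenant']}")
    return problems
```

Running the same two functions against the KeyCape issuer and the Keycloak issuer, with `allowed_tenants={"tenant:platform", "tenant:coulomb"}`, would give T6 a comparable pass/fail signal for both; an empty list means no problems were found.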