---
id: NK-WP-0001
type: workplan
title: "SSO & MFA Platform — Keycloak + privacyIDEA on Kubernetes"
domain: netkingdom
status: active
owner: worsch
topic_slug: netkingdom
state_hub_workstream_id: 39263c4b-ef70-4053-b782-350834b7e1be
created: "2026-02-28"
updated: "2026-02-28"
---

# SSO & MFA Platform — Keycloak + privacyIDEA on Kubernetes

## Summary

Deploy a hardened SSO and MFA platform on Kubernetes: Keycloak as the OIDC/SAML identity provider, privacyIDEA as the MFA/token engine, integrated via the privacyIDEA Keycloak Provider. This is the foundational security layer for the net-kingdom DevSecOps platform.

## Context

Synthesised from two AI protoplans (wiki/WorkplanOneChatgpt.md and wiki/WorkplanOneGrok.md). Both sources converge on the same architecture; this plan picks the most concrete and production-aligned choices from each:

- **Single-credential bootstrap** (Grok) — one master secret unlocks the vault; all other credentials are vault-managed and never typed manually.
- **Phase structure** (ChatGPT) — eight sequential phases reducing blast radius at each step.
- **Tooling choices** (both) — Keycloak Operator or codecentric Helm, gpappsoft privacyIDEA Helm, CloudNativePG for PostgreSQL, cert-manager for TLS, Traefik as ingress (K3s native, aligned with Railiance).
- **Custom Keycloak image** (both) — JAR baked into image via `kc.sh build` rather than `kubectl cp`; clean GitOps pattern.

## Architecture

```
           Internet
               │ TLS (cert-manager / Let's Encrypt)
        ┌──────┴──────┐
        │   Traefik   │  (K3s native ingress)
        └──┬───────┬──┘
           │       │
      keycloak.…  pi.…  pi-account.…
           │       │       │
    ┌──────┘  ┌────┘       │
    ▼         ▼            │
[Keycloak] [privacyIDEA]◄──┘ (self-service portal)
    │         │
    └────┬────┘
         ▼
   [PostgreSQL]  (CloudNativePG, namespace: databases)
         │
[Vault / K8s Secrets]  ← single credential unlocks
```

**Namespaces:** `sso` (Keycloak), `mfa` (privacyIDEA), `databases`

**Integration:** Keycloak runs the browser login flow; privacyIDEA provides MFA via the privacyIDEA Keycloak Provider JAR (baked into custom image).

## Dependencies

- Depends on: `railiance/three-phoenix-ha-cluster` — full production deployment targets the ThreePhoenix K3s HA cluster. Development/staging can proceed on a single-node k3s instance.
- Depends on: `railiance/phase-0-operational-baseline` — cert-manager, TLS, backup strategy must be operational before going live.

## Tasks

### T01 — Phase 0: Vault & secret bootstrap (single-credential principle)

```task
id: NK-WP-0001-T01
state_hub_task_id: 7992528c-d533-44e5-bcce-f92aaa2b75b2
status: todo
priority: critical
```

Create the vault (KeePassXC .kdbx or self-hosted Bitwarden; HashiCorp Vault for later production hardening). Generate and store all secrets inside the vault — never typed again:

- privacyIDEA: `SECRET_KEY` (64+ chars), `PI_PEPPER` (32+ chars), `PI_ENCFILE` content (`pi-manage create_enckey`).
- PostgreSQL: root + `keycloak` + `privacyidea` user passwords.
- Keycloak: admin bootstrap secret + DB password.
- TLS: ACME account key (if not delegated fully to cert-manager).
- Break-glass: admin credentials + offline recovery OTP seed.

Export an age-encrypted ops bundle (encrypted tar of all secret YAML manifests). Enable K8s encryption-at-rest. Confirm secret injection strategy: External Secrets Operator + Vault backend, or sops/age for GitOps.

**Done when:** vault created, all secrets generated, encrypted ops bundle exported and stored offsite.
Secret injection strategy decided.

---

### T02 — Phase 1: K8s foundations (namespaces, NetworkPolicies, cert-manager)

```task
id: NK-WP-0001-T02
state_hub_task_id: 721ca6b2-0cf4-4008-a966-87b1563550fa
status: todo
priority: high
```

Create namespaces: `sso`, `mfa`, `databases`.

Verify cert-manager is installed and functional on the K3s cluster (Traefik ingress).

Define and apply NetworkPolicies to prevent lateral movement:

- Only ingress controller reaches Keycloak/privacyIDEA service ports.
- Only Keycloak pods call the privacyIDEA API.
- Only app pods/ingress reach Keycloak.
- DB pods reachable only from `sso` and `mfa` namespaces.

Verify StorageClass for PVCs.

**Done when:** namespaces exist, NetworkPolicies applied and tested (verify denied paths), cert-manager issues a test certificate.

---

### T03 — Phase 2: PostgreSQL deployment (Keycloak + privacyIDEA DBs)

```task
id: NK-WP-0001-T03
state_hub_task_id: 7fa60004-deb2-4db5-a470-f95dda07f6ab
status: todo
priority: high
```

Deploy PostgreSQL via CloudNativePG operator (preferred: aligns with ThreePhoenix HA posture) or Bitnami Helm chart as fallback.

Create:

- Database `keycloak_db`, user `keycloak`
- Database `privacyidea_db`, user `privacyidea`

Store DB credentials as K8s Secrets (or ExternalSecrets from vault).

Configure automated DB backups to object storage (S3 or MinIO). **Run a restore drill before proceeding** — a failed restore later is a critical blocker.

**Done when:** both DBs live, credentials in K8s Secrets, backup running, restore drill passed.

---

### T04 — Phase 3: Deploy privacyIDEA (MFA core)

```task
id: NK-WP-0001-T04
state_hub_task_id: 6ad1296a-a488-4031-b665-f77030e971ed
status: todo
priority: high
```

Deploy privacyIDEA via `gpappsoft/privacyidea` Helm chart (Artifact Hub) or custom manifests (Deployment + Service + Ingress + PVC + Secrets).

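
Whichever deployment route is chosen, the T02 rules that limit who may reach privacyIDEA apply to these pods. The following is a minimal NetworkPolicy sketch of those rules, not a tested manifest. The pod labels and the container port are assumptions for illustration and need to match whatever the chart or manifests actually set. The Traefik selector assumes the Traefik bundled with K3s, running in `kube-system`.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: privacyidea-ingress          # hypothetical name
  namespace: mfa
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: privacyidea   # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Keycloak pods in the sso namespace (namespace AND pod label must match)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: sso
          podSelector:
            matchLabels:
              app.kubernetes.io/name: keycloak   # assumed pod label
        # Traefik ingress controller (assumed to run in kube-system)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik    # assumed pod label
      ports:
        - protocol: TCP
          port: 8080                 # assumed privacyIDEA container port
```

A `namespaceSelector` and `podSelector` inside the same `from` entry are combined with AND, while separate entries are OR-ed, so only the two listed sources are admitted. Testing a denied path (for example, a throwaway pod in `databases` trying to reach the service) confirms the policy is enforced.
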
Key Helm values:

```yaml
database:
  password:
privacyidea:
  config:
    SECRET_KEY:
    PI_PEPPER:
  encfile:
    enabled: true
    existingSecret: privacyidea-secrets
    key: PI_ENCFILE
ingress:
  enabled: true
  hostname: pi.yourdomain.com
  tls: true
```

Create K8s Secrets: `privacyidea-config`, `privacyidea-enckey`, `privacyidea-auditkeys`.

Configure Ingress + TLS. Add rate-limiting and WAF rules at Traefik level.

**Bootstrap (single-credential moment):**

1. `kubectl exec` into pod, run `pi-manage admin add pi-admin` — password comes from vault (only time a password is typed).
2. Immediately enroll MFA for `pi-admin` (TOTP or hardware token).
3. Create `trigger-admin` with `triggerchallenge` right only.
4. Apply policies: WebUI restricted to VPN/office IPs; MFA required for all admin actions.

**Done when:** privacyIDEA reachable at pi.yourdomain.com with valid TLS, pi-admin enrolled with MFA, trigger-admin created, rate-limiting active.

---

### T05 — Phase 4: Deploy Keycloak (SSO core)

```task
id: NK-WP-0001-T05
state_hub_task_id: b9f73aa6-9035-4643-9905-64e73a29b298
status: todo
priority: high
```

Build a **custom Keycloak image** that includes the privacyIDEA Provider JAR:

```dockerfile
FROM quay.io/keycloak/keycloak:
COPY PrivacyIDEA-Provider.jar /opt/keycloak/providers/
RUN /opt/keycloak/bin/kc.sh build
```

Deploy via official Keycloak Operator (CRD-based) or codecentric KeycloakX Helm chart.

Configure:

- DB: `keycloak_db` (credentials from K8s Secret)
- Ingress + TLS: `keycloak.yourdomain.com` (Traefik + cert-manager)
- Hostname strictness + proxy mode (Traefik forward headers)
- Metrics/logging (Prometheus annotations)
- Admin bootstrap secret from vault
- Realm import strategy: GitOps-friendly (realm JSON in git or CR)

**Done when:** Keycloak reachable with valid TLS, admin console accessible, custom image with privacyIDEA JAR deployed and verified.

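
If the Keycloak Operator route is taken, the configuration items listed for T05 map onto a `Keycloak` custom resource roughly as sketched below. This is an illustration, not a tested manifest. The image reference, the Secret name `keycloak-db-secret`, and the database host are hypothetical, and the field names should be checked against the installed operator version.

```yaml
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak
  namespace: sso
spec:
  instances: 1
  # Hypothetical reference to the custom image containing the privacyIDEA Provider JAR
  image: registry.example.com/netkingdom/keycloak-privacyidea:TAG
  db:
    vendor: postgres
    host: pg-rw.databases.svc        # hypothetical PostgreSQL Service name
    database: keycloak_db
    usernameSecret:
      name: keycloak-db-secret       # hypothetical Secret holding DB credentials
      key: username
    passwordSecret:
      name: keycloak-db-secret
      key: password
  hostname:
    hostname: keycloak.yourdomain.com
  proxy:
    headers: xforwarded              # trust X-Forwarded-* headers set by Traefik
  http:
    httpEnabled: true                # assumes TLS terminates at Traefik
  ingress:
    enabled: false                   # assumes the Traefik Ingress is managed separately
```

When a custom image is set, the operator is documented to treat it as already built, so build-time options such as the database vendor would need to be fixed when `kc.sh build` runs in the image.
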

---

### T06 — Phase 5: Realm config & MFA authentication flow

```task
id: NK-WP-0001-T06
state_hub_task_id: 3b6379a4-a27b-4d25-82be-bc600879f036
status: todo
priority: medium
```

In Keycloak:

1. Create/configure realm; set identity source of truth (Keycloak internal users recommended for initial deployment; LDAP/AD or Entra as extension).
2. Create Authentication Flow "privacyIDEA Browser":
   - Add privacyIDEA execution step (REQUIRED)
   - Config: privacyIDEA URL = `https://pi.yourdomain.com`, service account = `trigger-admin` (secret from K8s Secret)
   - Optional: bypass group (break-glass) with strict restrictions + alerts
3. Set this flow as the default browser flow.
4. Require MFA step-up for admin console and sensitive OIDC clients.

Test:

- Normal user: password → MFA OTP → session established
- Admin console: MFA required
- Failure modes: wrong OTP, token missing, privacyIDEA unreachable
- Break-glass: bypass works, alert fires

**Done when:** end-to-end auth works for normal and admin paths, all failure modes handled gracefully.

---

### T07 — Phase 6: User management, policies & self-service portal

```task
id: NK-WP-0001-T07
state_hub_task_id: c7cf902a-b480-4545-a536-293070945206
status: todo
priority: medium
```

Decide and implement identity source of truth (Keycloak internal → privacyIDEA Keycloak resolver, or LDAP/AD shared). The privacyIDEA 3.12+ Keycloak user resolver simplifies alignment.

Define policies in privacyIDEA:

- Allowed token types: TOTP, hardware (YubiKey), passkey
- Enrollment rules (who can self-enroll, which token types)
- Admin rights separation: super-admin vs. helpdesk-admin

Enable self-service portal at `pi-account.yourdomain.com` for user token enrollment/replacement.

Configure auditing and log shipping: privacyIDEA audit logs + Keycloak events → centralized logging (ELK/Loki or equivalent).

Token lifecycle policies: enrollment, revocation, re-enrollment on device loss.

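
The self-service host named in this task could be exposed with an Ingress along the lines below. This is a sketch under stated assumptions: the portal is assumed to be served by the same privacyIDEA Service as the admin UI, and the Service name, port, ClusterIssuer name, and TLS Secret name are all hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: privacyidea-selfservice      # hypothetical name
  namespace: mfa
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # hypothetical ClusterIssuer
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - pi-account.yourdomain.com
      secretName: pi-account-tls     # hypothetical; populated by cert-manager
  rules:
    - host: pi-account.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: privacyidea    # assumed Service name
                port:
                  number: 8080       # assumed Service port
```

Keeping the self-service host separate from `pi.yourdomain.com` allows the admin WebUI to stay restricted to VPN/office IPs while users can still reach enrollment.
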

**Done when:** policies documented and applied, self-service portal live, audit logs flowing.

---

### T08 — Phase 7: Backups, DR, break-glass & monitoring

```task
id: NK-WP-0001-T08
state_hub_task_id: 9cbd1d89-b5bf-491e-9d16-b1c7d57076fb
status: todo
priority: medium
```

**Backups:**

- DB backups: Keycloak + privacyIDEA (Velero or CloudNativePG scheduled backup to S3/MinIO). Test restore.
- privacyIDEA encryption/audit key Secrets: encrypted export, versioned.
- Keycloak realm exports: stored as JSON in git (GitOps-friendly).

**Disaster recovery drill** (mandatory before production):

1. Restore DB + keys into a fresh namespace.
2. Verify token validation still works — this catches key/secret mistakes.

**Break-glass procedure:**

- Disabled-by-default Keycloak admin path or group exemption.
- Break-glass credentials stored offline + vault. Alert (PagerDuty/webhook) on every use.

**Monitoring:**

- Prometheus scraping Keycloak + privacyIDEA metrics.
- Grafana dashboards: auth success/failure rates, MFA challenge latency, token count by type.
- Alert: privacyIDEA unreachable (blocks all logins).

**Final validation:**

- All external traffic: Ingress + HSTS + strict TLS.
- NetworkPolicies verified (no unintended open paths).
- End-to-end: app → Keycloak → privacyIDEA OTP → SSO session established.

**Done when:** DR drill passed, monitoring live, break-glass procedure documented and tested, HSTS and NetworkPolicies verified.

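
The "privacyIDEA unreachable" alert from the monitoring list could be expressed as a Prometheus alerting rule like the sketch below. It assumes the Prometheus Operator is installed (the plan does not say so) and that a scrape job named `privacyidea` exists, for example a probe of the privacyIDEA API. The job name, threshold duration, and severity label are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: privacyidea-availability     # hypothetical name
  namespace: mfa
spec:
  groups:
    - name: privacyidea.rules
      rules:
        - alert: PrivacyIDEAUnreachable
          # Fires when no healthy target exists: either the scrape fails
          # or the target has disappeared entirely. Job name is an assumption.
          expr: absent(up{job="privacyidea"} == 1)
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "privacyIDEA is unreachable; MFA logins through Keycloak will fail"
```

Using `absent(... == 1)` rather than `up == 0` also covers the case where the target vanishes from service discovery, which a plain `up == 0` expression would miss.
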

---

## Deliverables Checklist

- [ ] Vault created; all secrets generated and encrypted ops bundle exported
- [ ] `sso`, `mfa`, `databases` namespaces + NetworkPolicies deployed
- [ ] TLS everywhere via cert-manager (Traefik ingress)
- [ ] PostgreSQL live; both DBs created; backup + restore tested
- [ ] privacyIDEA running at `pi.yourdomain.com`; pi-admin MFA enrolled; trigger-admin created with least-privilege rights
- [ ] Keycloak running from custom image including privacyIDEA Provider JAR
- [ ] Keycloak "privacyIDEA Browser" flow enforced as default
- [ ] Realm exported to git; admin secret from vault
- [ ] Self-service portal live; token lifecycle policies defined
- [ ] DR drill passed; monitoring live; break-glass documented and tested

## Open Questions / Extension Points

- **Vault backend**: KeePassXC (simple) vs HashiCorp Vault in-cluster (rotation, audit trail). Start with KeePassXC; upgrade to Vault when ThreePhoenix cluster is stable.
- **Identity source of truth**: Keycloak-internal vs LDAP/AD/Entra. Decision needed before T07.
- **GitOps tooling**: ArgoCD or Flux for declarative Helm management? Aligns with Railiance staged-promotion-lifecycle workstream.
- **Cluster target**: Development on single-node k3s; production on ThreePhoenix (3-node HA). Workplan covers both; HA-specific steps noted where they diverge.