net-kingdom/workplans/NK-WP-0001-sso-mfa-platform.md
tegwick 004a8d6e6b Add CLAUDE.md, wiki protoplans, and NK-WP-0001 workplan
Initialises the net-kingdom project structure:
- README.md: updated title and description
- CLAUDE.md: project instructions and State Hub integration config
- wiki/: three reference docs (NetKingdom overview, ChatGPT and Grok
  protoplans for the SSO/MFA platform)
- workplans/NK-WP-0001-sso-mfa-platform.md: combined workplan (8 phases,
  8 tasks) synthesised from the two protoplans; registered in the
  Custodian State Hub (workstream 39263c4b)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 17:21:51 +01:00


id: NK-WP-0001
type: workplan
title: SSO & MFA Platform — Keycloak + privacyIDEA on Kubernetes
domain: netkingdom
status: active
owner: worsch
topic_slug: netkingdom
state_hub_workstream_id: 39263c4b-ef70-4053-b782-350834b7e1be
created: 2026-02-28
updated: 2026-02-28

SSO & MFA Platform — Keycloak + privacyIDEA on Kubernetes

Summary

Deploy a hardened SSO and MFA platform on Kubernetes: Keycloak as the OIDC/SAML identity provider, privacyIDEA as the MFA/token engine, integrated via the privacyIDEA Keycloak Provider. This is the foundational security layer for the net-kingdom DevSecOps platform.

Context

Synthesised from two AI protoplans (wiki/WorkplanOneChatgpt.md and wiki/WorkplanOneGrok.md). Both sources converge on the same architecture; this plan picks the most concrete and production-aligned choices from each:

  • Single-credential bootstrap (Grok) — one master secret unlocks the vault; all other credentials are vault-managed and never typed manually.
  • Phase structure (ChatGPT) — eight sequential phases reducing blast radius at each step.
  • Tooling choices (both) — Keycloak Operator or codecentric Helm, gpappsoft privacyIDEA Helm, CloudNativePG for PostgreSQL, cert-manager for TLS, Traefik as ingress (K3s native, aligned with Railiance).
  • Custom Keycloak image (both) — JAR baked into image via kc.sh build rather than kubectl cp; clean GitOps pattern.

Architecture

                  Internet
                     │ TLS (cert-manager / Let's Encrypt)
              ┌──────┴──────┐
              │   Traefik   │  (K3s native ingress)
              └──┬───────┬──┘
                 │       │
         keycloak.…  pi.…   pi-account.…
                 │       │         │
          ┌──────┘  ┌────┘         │
          ▼         ▼              │
      [Keycloak]  [privacyIDEA]◄──┘  (self-service portal)
          │         │
          └────┬────┘
               ▼
          [PostgreSQL]  (CloudNativePG, namespace: databases)
               │
          [Vault / K8s Secrets]  ← single credential unlocks

Namespaces: sso (Keycloak), mfa (privacyIDEA), databases (PostgreSQL)

Integration: Keycloak runs the browser login flow; privacyIDEA provides MFA via the privacyIDEA Keycloak Provider JAR (baked into custom image).

Dependencies

  • Depends on: railiance/three-phoenix-ha-cluster — full production deployment targets the ThreePhoenix K3s HA cluster. Development/staging can proceed on a single-node k3s instance.
  • Depends on: railiance/phase-0-operational-baseline — cert-manager, TLS, backup strategy must be operational before going live.

Tasks

T01 — Phase 0: Vault & secret bootstrap (single-credential principle)

id: NK-WP-0001-T01
state_hub_task_id: 7992528c-d533-44e5-bcce-f92aaa2b75b2
status: todo
priority: critical

Create the vault (KeePassXC .kdbx or self-hosted Bitwarden; HashiCorp Vault for later production hardening). Generate all secrets and store them in the vault so that none has to be typed by hand again:

  • privacyIDEA: SECRET_KEY (64+ chars), PI_PEPPER (32+ chars), PI_ENCFILE content (pi-manage create_enckey).
  • PostgreSQL: root + keycloak + privacyidea user passwords.
  • Keycloak: admin bootstrap secret + DB password.
  • TLS: ACME account key (if not delegated fully to cert-manager).
  • Break-glass: admin credentials + offline recovery OTP seed.

Export an age-encrypted ops bundle (encrypted tar of all secret YAML manifests). Enable K8s encryption-at-rest. Confirm secret injection strategy: External Secrets Operator + Vault backend, or sops/age for GitOps.

Done when: vault created, all secrets generated, encrypted ops bundle exported and stored offsite. Secret injection strategy decided.
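
If the sops/age option is chosen, the encryption rule could look roughly like the .sops.yaml below. This is a minimal sketch, not a decided configuration: the secrets/ directory layout and the recipient placeholder are assumptions that do not appear in the plan.

```yaml
# .sops.yaml at the repository root (hypothetical layout; adjust paths to the real repo)
creation_rules:
  # Match Secret manifests kept under secrets/ (assumed directory name)
  - path_regex: secrets/.*\.yaml$
    # Encrypt only the payload fields so metadata stays readable in git diffs
    encrypted_regex: ^(data|stringData)$
    # Public age recipient; the matching private key stays in the vault
    age: "<age-public-key>"
```

With a rule like this, only the values under data and stringData are encrypted, so reviewers can still see which Secret a change touches without being able to read it.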


T02 — Phase 1: K8s foundations (namespaces, NetworkPolicies, cert-manager)

id: NK-WP-0001-T02
state_hub_task_id: 721ca6b2-0cf4-4008-a966-87b1563550fa
status: todo
priority: high

Create namespaces: sso, mfa, databases. Verify cert-manager is installed and functional on the K3s cluster (Traefik ingress). Define and apply NetworkPolicies to prevent lateral movement:

  • From outside the cluster, only the ingress controller reaches the Keycloak and privacyIDEA service ports.
  • Inside the cluster, only Keycloak pods call the privacyIDEA API.
  • Only app pods and the ingress controller reach Keycloak.
  • DB pods reachable only from sso and mfa namespaces.

Verify StorageClass for PVCs.

Done when: namespaces exist, NetworkPolicies applied and tested (verify denied paths), cert-manager issues a test certificate.
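
The privacyIDEA rules above could be expressed as a default-deny policy plus one allow rule. The following is a sketch under assumptions: the pod labels, the container port 8080, and Traefik running in kube-system (the K3s default) are not stated in the plan and should be checked against the deployed charts.

```yaml
# Deny all ingress to every pod in the mfa namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: mfa
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow only Keycloak pods (namespace sso) and Traefik to reach privacyIDEA
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-keycloak-and-ingress
  namespace: mfa
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: privacyidea    # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        # namespaceSelector and podSelector in one entry are ANDed together
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: sso
          podSelector:
            matchLabels:
              app.kubernetes.io/name: keycloak   # assumed pod label
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik    # assumed pod label
      ports:
        - protocol: TCP
          port: 8080                             # assumed container port
```

Testing the denied paths then means confirming that a pod in any other namespace cannot open a connection to that port.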


T03 — Phase 2: PostgreSQL deployment (Keycloak + privacyIDEA DBs)

id: NK-WP-0001-T03
state_hub_task_id: 7fa60004-deb2-4db5-a470-f95dda07f6ab
status: todo
priority: high

Deploy PostgreSQL via CloudNativePG operator (preferred: aligns with ThreePhoenix HA posture) or Bitnami Helm chart as fallback. Create:

  • Database keycloak_db, user keycloak
  • Database privacyidea_db, user privacyidea

Store DB credentials as K8s Secrets (or ExternalSecrets from vault). Configure automated DB backups to object storage (S3 or MinIO). Run a restore drill before proceeding — a failed restore later is a critical blocker.

Done when: both DBs live, credentials in K8s Secrets, backup running, restore drill passed.
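
A CloudNativePG Cluster for this task might look like the sketch below. The cluster name, Secret names, storage size, bucket, and endpoint are hypothetical. Note that initdb creates a single database, so privacyidea_db and its user would have to be added by a separate step that is not shown here. Depending on the operator version, object-store backups may also be configured through a plugin rather than the barmanObjectStore section.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: sso-postgres                 # hypothetical cluster name
  namespace: databases
spec:
  instances: 3                       # 1 is sufficient on a single-node k3s dev instance
  storage:
    size: 10Gi                       # assumed size
  bootstrap:
    initdb:
      database: keycloak_db
      owner: keycloak
      secret:
        name: keycloak-db-credentials   # assumed basic-auth Secret (username/password)
  backup:
    retentionPolicy: "30d"           # assumed retention
    barmanObjectStore:
      destinationPath: s3://nk-pg-backups/sso-postgres   # hypothetical bucket
      endpointURL: https://minio.example.internal        # hypothetical MinIO endpoint
      s3Credentials:
        accessKeyId:
          name: pg-backup-s3         # assumed Secret holding the object-store keys
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: pg-backup-s3
          key: ACCESS_SECRET_KEY
```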


T04 — Phase 3: Deploy privacyIDEA (MFA core)

id: NK-WP-0001-T04
state_hub_task_id: 6ad1296a-a488-4031-b665-f77030e971ed
status: todo
priority: high

Deploy privacyIDEA via gpappsoft/privacyidea Helm chart (Artifact Hub) or custom manifests (Deployment + Service + Ingress + PVC + Secrets). Key Helm values:

database:
  password: <from-vault>
privacyidea:
  config:
    SECRET_KEY: <from-vault>
    PI_PEPPER: <from-vault>
  encfile:
    enabled: true
    existingSecret: privacyidea-secrets
    key: PI_ENCFILE
  ingress:
    enabled: true
    hostname: pi.yourdomain.com
    tls: true

Create K8s Secrets: privacyidea-config, privacyidea-enckey, privacyidea-auditkeys. Configure Ingress + TLS. Add rate-limiting and WAF rules at Traefik level.

Bootstrap (single-credential moment):

  1. kubectl exec into the pod and run pi-manage admin add pi-admin; the password comes from the vault (the only time a password is typed).
  2. Immediately enroll MFA for pi-admin (TOTP or hardware token).
  3. Create trigger-admin with triggerchallenge right only.
  4. Apply policies: WebUI restricted to VPN/office IPs; MFA required for all admin actions.

Done when: privacyIDEA reachable at pi.yourdomain.com with valid TLS, pi-admin enrolled with MFA, trigger-admin created, rate-limiting active.
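
Rate-limiting at Traefik level could be sketched with a Middleware resource such as the one below. The name and the numeric limits are illustrative assumptions, not tuned values. Older Traefik releases use the traefik.containo.us/v1alpha1 API group instead.

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: pi-ratelimit        # hypothetical name
  namespace: mfa
spec:
  rateLimit:
    average: 20             # assumed: sustained requests per period, per source
    period: 1s
    burst: 40               # assumed: short bursts allowed above the average
# To attach it to a standard Ingress, the annotation takes the form
#   traefik.ingress.kubernetes.io/router.middlewares: mfa-pi-ratelimit@kubernetescrd
```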


T05 — Phase 4: Deploy Keycloak (SSO core)

id: NK-WP-0001-T05
state_hub_task_id: b9f73aa6-9035-4643-9905-64e73a29b298
status: todo
priority: high

Build a custom Keycloak image that includes the privacyIDEA Provider JAR:

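# Base image; pin <version> explicitly rather than using a floating tag.
FROM quay.io/keycloak/keycloak:<version>
# Add the privacyIDEA provider JAR to Keycloak's providers directory.
COPY PrivacyIDEA-Provider.jar /opt/keycloak/providers/
# Run the build step so the provider is registered in the image.
RUN /opt/keycloak/bin/kc.sh build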

Deploy via official Keycloak Operator (CRD-based) or codecentric KeycloakX Helm chart. Configure:

  • DB: keycloak_db (credentials from K8s Secret)
  • Ingress + TLS: keycloak.yourdomain.com (Traefik + cert-manager)
  • Hostname strictness + proxy mode (Traefik forward headers)
  • Metrics/logging (Prometheus annotations)
  • Admin bootstrap secret from vault
  • Realm import strategy: GitOps-friendly (realm JSON in git or CR)

Done when: Keycloak reachable with valid TLS, admin console accessible, custom image with privacyIDEA JAR deployed and verified.
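
If the Keycloak Operator route is taken, the configuration list above maps onto a Keycloak custom resource roughly as follows. This is a sketch under assumptions: the image registry path, the Secret name, and the database host are hypothetical, and TLS is assumed to terminate at Traefik. Because the operator treats a custom image as pre-built, build-time options such as the database vendor may need to be set during the image build as well.

```yaml
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak
  namespace: sso
spec:
  instances: 2                                # assumed replica count
  image: registry.example.com/net-kingdom/keycloak-privacyidea:<version>   # hypothetical registry path
  db:
    vendor: postgres
    host: sso-postgres-rw.databases.svc       # hypothetical DB service name
    database: keycloak_db
    usernameSecret:
      name: keycloak-db-credentials           # assumed Secret, must exist in the sso namespace
      key: username
    passwordSecret:
      name: keycloak-db-credentials
      key: password
  hostname:
    hostname: keycloak.yourdomain.com
  http:
    httpEnabled: true                         # plain HTTP inside the cluster; Traefik terminates TLS
  proxy:
    headers: xforwarded                       # trust X-Forwarded-* headers set by Traefik
  ingress:
    enabled: false                            # route assumed to be managed separately via Traefik
```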


T06 — Phase 5: Realm config & MFA authentication flow

id: NK-WP-0001-T06
state_hub_task_id: 3b6379a4-a27b-4d25-82be-bc600879f036
status: todo
priority: medium

In Keycloak:

  1. Create/configure realm; set identity source of truth (Keycloak internal users recommended for initial deployment; LDAP/AD or Entra as extension).
  2. Create Authentication Flow "privacyIDEA Browser":
    • Add privacyIDEA execution step (REQUIRED)
    • Config: privacyIDEA URL = https://pi.yourdomain.com, service account = trigger-admin (secret from K8s Secret)
    • Optional: bypass group (break-glass) with strict restrictions + alerts
  3. Set this flow as the default browser flow.
  4. Require MFA step-up for admin console and sensitive OIDC clients.

Test:

  • Normal user: password → MFA OTP → session established
  • Admin console: MFA required
  • Failure modes: wrong OTP, token missing, privacyIDEA unreachable
  • Break-glass: bypass works, alert fires

Done when: end-to-end auth works for normal and admin paths, all failure modes handled gracefully.
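
For step 1, a realm can be kept in git as a KeycloakRealmImport resource when the Keycloak Operator is used. The sketch below only creates the realm and turns on event recording; the realm name is an assumption, and the "privacyIDEA Browser" flow is deliberately left out because its definition depends on the provider's authenticator ID and config keys, which the plan does not specify.

```yaml
apiVersion: k8s.keycloak.org/v2alpha1
kind: KeycloakRealmImport
metadata:
  name: netkingdom-realm      # hypothetical resource name
  namespace: sso
spec:
  keycloakCRName: keycloak    # must match the name of the Keycloak CR
  realm:
    realm: netkingdom         # assumed realm name
    enabled: true
    sslRequired: external
    bruteForceProtected: true
    eventsEnabled: true       # record login events for later log shipping
    adminEventsEnabled: true  # record admin console changes
```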


T07 — Phase 6: User management, policies & self-service portal

id: NK-WP-0001-T07
state_hub_task_id: c7cf902a-b480-4545-a536-293070945206
status: todo
priority: medium

Decide and implement identity source of truth (Keycloak internal → privacyIDEA Keycloak resolver, or LDAP/AD shared). The privacyIDEA 3.12+ Keycloak user resolver simplifies alignment.

Define policies in privacyIDEA:

  • Allowed token types: TOTP, hardware (YubiKey), passkey
  • Enrollment rules (who can self-enroll, which token types)
  • Admin rights separation: super-admin vs. helpdesk-admin

Enable self-service portal at pi-account.yourdomain.com for user token enrollment/replacement.

Configure auditing and log shipping: privacyIDEA audit logs + Keycloak events → centralized logging (ELK/Loki or equivalent). Token lifecycle policies: enrollment, revocation, re-enrollment on device loss.

Done when: policies documented and applied, self-service portal live, audit logs flowing.
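
Exposing the self-service portal could be done with a standard Ingress like the sketch below. The ClusterIssuer name, the Service name, the port, and the TLS Secret name are assumptions; the plan does not state how the portal host is routed to the privacyIDEA service.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: privacyidea-selfservice                          # hypothetical name
  namespace: mfa
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod     # hypothetical ClusterIssuer
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - pi-account.yourdomain.com
      secretName: pi-account-tls                         # cert-manager writes the certificate here
  rules:
    - host: pi-account.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: privacyidea                        # assumed Service name
                port:
                  number: 8080                           # assumed Service port
```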


T08 — Phase 7: Backups, DR, break-glass & monitoring

id: NK-WP-0001-T08
state_hub_task_id: 9cbd1d89-b5bf-491e-9d16-b1c7d57076fb
status: todo
priority: medium

Backups:

  • DB backups: Keycloak + privacyIDEA (Velero or CloudNativePG scheduled backup to S3/MinIO). Test restore.
  • privacyIDEA encryption/audit key Secrets: encrypted export, versioned.
  • Keycloak realm exports: stored as JSON in git (GitOps-friendly).
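
For the CloudNativePG option, a scheduled backup might be declared as in the sketch below. The resource and cluster names are hypothetical, and the schedule is an illustrative choice. CloudNativePG uses a six-field cron expression whose first field is seconds.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: sso-postgres-nightly   # hypothetical name
  namespace: databases
spec:
  schedule: "0 0 2 * * *"      # six fields (seconds first): every day at 02:00:00
  backupOwnerReference: self   # backups are owned by this ScheduledBackup object
  cluster:
    name: sso-postgres         # hypothetical CloudNativePG cluster name
```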

Disaster recovery drill (mandatory before production):

  1. Restore DB + keys into a fresh namespace.
  2. Verify token validation still works — this catches key/secret mistakes.

Break-glass procedure:

  • Disabled-by-default Keycloak admin path or group exemption.
  • Break-glass credentials stored offline and in the vault. Alert (PagerDuty/webhook) on every use.

Monitoring:

  • Prometheus scraping Keycloak + privacyIDEA metrics.
  • Grafana dashboards: auth success/failure rates, MFA challenge latency, token count by type.
  • Alert: privacyIDEA unreachable (blocks all logins).
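
The unreachable alert could be sketched as a PrometheusRule, assuming the Prometheus Operator is installed and a blackbox exporter probes the privacyIDEA URL. Both are assumptions, as are the job label, the two-minute window, and the severity label.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: privacyidea-availability   # hypothetical name
  namespace: mfa
spec:
  groups:
    - name: privacyidea
      rules:
        - alert: PrivacyIDEAUnreachable
          # probe_success is the blackbox exporter's result metric (assumed probe setup)
          expr: 'probe_success{job="blackbox", instance="https://pi.yourdomain.com"} == 0'
          for: 2m                  # assumed: tolerate brief blips before paging
          labels:
            severity: critical
          annotations:
            summary: privacyIDEA is unreachable; MFA logins will fail
```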

Final validation:

  • All external traffic: Ingress + HSTS + strict TLS.
  • NetworkPolicies verified (no unintended open paths).
  • End-to-end: app → Keycloak → privacyIDEA OTP → SSO session established.

Done when: DR drill passed, monitoring live, break-glass procedure documented and tested, HSTS and NetworkPolicies verified.


Deliverables Checklist

  • Vault created; all secrets generated and encrypted ops bundle exported
  • sso, mfa, databases namespaces + NetworkPolicies deployed
  • TLS everywhere via cert-manager (Traefik ingress)
  • PostgreSQL live; both DBs created; backup + restore tested
  • privacyIDEA running at pi.yourdomain.com; pi-admin MFA enrolled; trigger-admin created with least-privilege rights
  • Keycloak running from custom image including privacyIDEA Provider JAR
  • Keycloak "privacyIDEA Browser" flow enforced as default
  • Realm exported to git; admin secret from vault
  • Self-service portal live; token lifecycle policies defined
  • DR drill passed; monitoring live; break-glass documented and tested

Open Questions / Extension Points

  • Vault backend: KeePassXC (simple) vs HashiCorp Vault in-cluster (rotation, audit trail). Start with KeePassXC; upgrade to Vault when ThreePhoenix cluster is stable.
  • Identity source of truth: Keycloak-internal vs LDAP/AD/Entra. Decision needed before T07.
  • GitOps tooling: ArgoCD or Flux for declarative Helm management? Aligns with Railiance staged-promotion-lifecycle workstream.
  • Cluster target: Development on single-node k3s; production on ThreePhoenix (3-node HA). Workplan covers both; HA-specific steps noted where they diverge.