Set NK-WP-0001 status to canonical 'archived' (was non-canonical 'deferred', which the hub rejected). Backfill NK-WP-0011 workstream and task ids from State Hub registration. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| id | type | title | domain | status | owner | topic_slug | state_hub_workstream_id | created | updated | superseded_by |
|---|---|---|---|---|---|---|---|---|---|---|
| NK-WP-0001 | workplan | SSO & MFA Platform — Keycloak + privacyIDEA on Kubernetes | netkingdom | archived | worsch | netkingdom | 39263c4b-ef70-4053-b782-350834b7e1be | 2026-02-28 | 2026-03-21 | NK-WP-0003 |
SSO & MFA Platform — Keycloak + privacyIDEA on Kubernetes
Status: DEFERRED (2026-03-21). The Keycloak path has been superseded by the KeyCape + Authelia + LLDAP stack (NK-WP-0003). Keycloak is out of scope for the current deployment.
- T01 (secret bootstrap) → replaced by NK-WP-0004 + NK-WP-0005
- T02 (K8s foundations) → done, reused by NK-WP-0003
- T03 (PostgreSQL) → done, reused by NK-WP-0003
- T04 (privacyIDEA) → cancelled; superseded by NK-WP-0003-T04
- T05–T08 (Keycloak) → extracted into NK-WP-0011 (enterprise federation / SAML, expanded-mode Keycloak). No longer tracked here.
Active work: see NK-WP-0003 (deployed stack) and NK-WP-0011 (enterprise federation).
Summary
Deploy a hardened SSO and MFA platform on Kubernetes: Keycloak as the
OIDC/SAML identity provider, privacyIDEA as the MFA/token engine,
integrated via the privacyIDEA Keycloak Provider. Deferred — see NK-WP-0003.
This workplan is retained as a reference for the Keycloak-based architecture decisions (D1–D5) and for the T01–T03 infrastructure that was built and remains in use.
Context
Synthesised from two AI protoplans (wiki/WorkplanOneChatgpt.md and wiki/WorkplanOneGrok.md). Both sources converge on the same architecture; this plan picks the most concrete and production-aligned choices from each:
- Single-credential bootstrap (Grok) — one master secret unlocks the vault; all other credentials are vault-managed and never typed manually.
- Phase structure (ChatGPT) — eight sequential phases reducing blast radius at each step.
- Tooling choices (both) — Keycloak Operator or codecentric Helm, gpappsoft privacyIDEA Helm, CloudNativePG for PostgreSQL, cert-manager for TLS, Traefik as ingress (K3s native, aligned with Railiance).
- Custom Keycloak image (both): JAR baked into image via kc.sh build rather than kubectl cp; clean GitOps pattern.
Decisions
All five decisions for this workstream are resolved. D1–D3 were decided on
2026-03-01 by Tegwick; full rationale is in DECISIONS.md. D4 and D5 were
resolved subsequently and are recorded in State Hub (see Open Questions).
| ID | Decision | Status | Outcome / Notes |
|---|---|---|---|
| D1 | Vault backend | Resolved | KeePassXC pre-cluster → HashiCorp Vault in-cluster. |
| D2 | Identity source of truth | Resolved | Hybrid: Keycloak-internal + LDAP/Entra for enterprise tier. File-based bootstrap user store → Local Identity (NK-WP-0002). |
| D3 | GitOps tooling | Resolved | Plain Helm first, upgrade to Flux when warranted. AI-first philosophy (TDD, API-first, MCP, CLI; UI separate repos) — ecosystem ADR requested from custodian. |
| D4 | Secret injection: ESO vs Vault Agent Injector | Resolved | ESO. GitOps-aligned; standard K8s Secrets consumable by plain Helm. Monitor dynamic-secret gaps; revisit if needed. |
| D5 | File-based bootstrap user store | Resolved | Implement in-repo as local-identity. Staged workplan: NK-WP-0002. See docs/LocalIdentity.md. |
Architecture
               Internet
                   │  TLS (cert-manager / Let's Encrypt)
            ┌──────┴──────┐
            │   Traefik   │  (K3s native ingress)
            └──┬───────┬──┘
               │       │
          keycloak.…  pi.…  pi-account.…
               │       │     │
        ┌──────┘  ┌────┘     │
        ▼         ▼          │
  [Keycloak] [privacyIDEA]◄──┘  (self-service portal)
        │         │
        └────┬────┘
             ▼
       [PostgreSQL]  (CloudNativePG, namespace: databases)
             │
       [HashiCorp Vault]  ← single credential unlocks (in-cluster)
       [KeePassXC]        ← pre-cluster bootstrap / dev/test/sandbox
Namespaces: sso (Keycloak), mfa (privacyIDEA), databases
Integration: Keycloak runs the browser login flow; privacyIDEA provides MFA via the privacyIDEA Keycloak Provider JAR (baked into custom image).
Dependencies
- Depends on: railiance/three-phoenix-ha-cluster. Full production deployment targets the ThreePhoenix K3s HA cluster. Development/staging can proceed on a single-node k3s instance.
- Depends on: railiance/phase-0-operational-baseline. cert-manager, TLS, and the backup strategy must be operational before going live.
Tasks
T01 — Phase 0: Vault & secret bootstrap (single-credential principle)
id: NK-WP-0001-T01
state_hub_task_id: 7992528c-d533-44e5-bcce-f92aaa2b75b2
status: done
priority: critical
commit_0a: c576188
note: Phase 0a complete (gen-secrets.sh, pack-bundle.sh, README). Phase 0b (Vault in-cluster) follows T02 cluster deployment.
Decision D1 applies: Two-phase vault strategy.
Phase 0a — Pre-cluster KeePassXC bootstrap (do this first, before K8s):
Create a KeePassXC .kdbx database as the initial secret store. Keep the
KeePassXC master password in a personal password manager. Generate and store
all bootstrap secrets inside KeePassXC:
- privacyIDEA: SECRET_KEY (64+ chars), PI_PEPPER (32+ chars), PI_ENCFILE content (pi-manage create_enckey).
- PostgreSQL: root + keycloak + privacyidea user passwords.
- Keycloak: admin bootstrap secret + DB password.
- TLS: ACME account key (if not delegated fully to cert-manager).
- Break-glass: admin credentials + offline recovery OTP seed.
Export an age-encrypted ops bundle (encrypted tar of all secret YAML manifests). Store offsite.
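As a rough illustration of the length requirements listed above, the following sketch generates random values with Python's standard `secrets` module. It is not the repository's gen-secrets.sh: the function names, the alphanumeric alphabet, the dictionary keys other than SECRET_KEY and PI_PEPPER, and the 32-character length for the database and admin passwords are all assumptions made for this example. PI_ENCFILE is not covered here, since it is produced by pi-manage create_enckey.

```python
import secrets
import string

# Assumption: an alphanumeric alphabet avoids quoting problems in YAML and shell.
ALPHABET = string.ascii_letters + string.digits


def random_secret(length: int) -> str:
    """Return a cryptographically random alphanumeric string of `length` chars."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def generate_bootstrap_secrets() -> dict[str, str]:
    """Generate bootstrap values; only the first two lengths come from the plan."""
    return {
        "SECRET_KEY": random_secret(64),               # privacyIDEA: 64+ chars
        "PI_PEPPER": random_secret(32),                # privacyIDEA: 32+ chars
        "postgres_root_password": random_secret(32),   # hypothetical key and length
        "postgres_keycloak_password": random_secret(32),
        "postgres_privacyidea_password": random_secret(32),
        "keycloak_admin_bootstrap": random_secret(32),
    }


if __name__ == "__main__":
    for name, value in generate_bootstrap_secrets().items():
        # Print only the lengths, never the secret values themselves.
        print(f"{name}: {len(value)} chars")
```

The generated values would then be stored in the KeePassXC database by hand or by tooling; this sketch deliberately does not write them anywhere.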
Phase 0b — HashiCorp Vault in-cluster (after T02, once K3s is running):
Deploy HashiCorp Vault in the cluster (Helm chart). Migrate secrets from KeePassXC into Vault. Enable K8s encryption-at-rest. Deploy External Secrets Operator (ESO) — decided D4: ESO reconciles Vault secrets into standard K8s Secrets, compatible with plain Helm charts without Vault-specific annotations. KeePassXC remains the source of truth for dev/test/sandbox systems that do not connect to the cluster Vault.
Done when: KeePassXC created and all secrets generated (0a). Vault deployed in-cluster, secrets migrated, ESO operational and injecting secrets into at least one test workload (0b). Encrypted ops bundle exported and stored offsite.
T02 — Phase 1: K8s foundations (namespaces, NetworkPolicies, cert-manager)
id: NK-WP-0001-T02
state_hub_task_id: 721ca6b2-0cf4-4008-a966-87b1563550fa
status: done
priority: high
commit: ee794a6
note: Manifests committed. Apply with sso-mfa/k8s/README.md apply order; verify-t02.sh checks done-criteria.
Prerequisite: T01 Phase 0a (KeePassXC bootstrap) must be complete — all secrets generated and encrypted ops bundle exported before cluster work begins.
Create namespaces: sso, mfa, databases. Verify cert-manager is
installed and functional on the K3s cluster (Traefik ingress). Define and
apply NetworkPolicies to prevent lateral movement:
- Only ingress controller reaches Keycloak/privacyIDEA service ports.
- Only Keycloak pods call the privacyIDEA API.
- Only app pods/ingress reach Keycloak.
- DB pods reachable only from sso and mfa namespaces.
Verify StorageClass for PVCs.
Done when: namespaces exist, NetworkPolicies applied and tested (verify denied paths), cert-manager issues a test certificate.
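The "verify denied paths" criterion can be made concrete by writing the NetworkPolicy rules of this task as an allow-list and deriving every path that a connectivity test should find blocked. The sketch below is one possible reading of the four rules, not the logic of verify-t02.sh: the component labels are hypothetical names rather than real pod selectors, and it assumes that both the ingress controller and application pods may reach Keycloak.

```python
# Allowed (source, destination) pairs derived from the four rules in this task.
# Component names are hypothetical labels, not real pod or namespace selectors.
ALLOWED_PATHS = frozenset({
    ("ingress", "keycloak"),      # ingress controller reaches Keycloak
    ("ingress", "privacyidea"),   # ingress controller reaches privacyIDEA
    ("app", "keycloak"),          # app pods reach Keycloak
    ("keycloak", "privacyidea"),  # only Keycloak calls the privacyIDEA API
    ("keycloak", "postgres"),     # DB reachable from the sso namespace
    ("privacyidea", "postgres"),  # DB reachable from the mfa namespace
})

COMPONENTS = ("ingress", "app", "keycloak", "privacyidea", "postgres")


def is_allowed(source: str, destination: str) -> bool:
    """Return True if the policy set is meant to permit this connection."""
    return (source, destination) in ALLOWED_PATHS


def denied_paths() -> list[tuple[str, str]]:
    """List every distinct pair that a connectivity test should find blocked."""
    return [
        (src, dst)
        for src in COMPONENTS
        for dst in COMPONENTS
        if src != dst and not is_allowed(src, dst)
    ]


if __name__ == "__main__":
    for src, dst in denied_paths():
        print(f"expect DENY: {src} -> {dst}")
```

Each printed pair corresponds to one negative test, for example an attempted connection from an application pod to the database service that should time out.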
T03 — Phase 2: PostgreSQL deployment (Keycloak + privacyIDEA DBs)
id: NK-WP-0001-T03
state_hub_task_id: 7fa60004-deb2-4db5-a470-f95dda07f6ab
status: done
priority: high
commit: TBD
note: Manifests committed. Restore drill required before marking fully done in production.
Deploy PostgreSQL via CloudNativePG operator (preferred: aligns with ThreePhoenix HA posture) or Bitnami Helm chart as fallback. Create:
- Database keycloak_db, user keycloak
- Database privacyidea_db, user privacyidea
Store DB credentials as K8s Secrets injected from Vault (T01 Phase 0b must be complete, or use placeholder K8s Secrets until Vault is live). Configure automated DB backups to object storage (S3 or MinIO). Run a restore drill before proceeding — a failed restore later is a critical blocker.
Done when: both DBs live, credentials in K8s Secrets, backup running, restore drill passed.
T04 — Phase 3: Deploy privacyIDEA (MFA core)
id: NK-WP-0001-T04
state_hub_task_id: 6ad1296a-a488-4031-b665-f77030e971ed
status: cancelled
priority: high
note: Cancelled 2026-05-20. privacyIDEA deployment superseded by NK-WP-0003-T04 (privacyIDEA now runs in the live KeyCape stack on RAILIANCE01). This Keycloak-path variant is no longer pursued.
Deploy privacyIDEA via gpappsoft/privacyidea Helm chart (Artifact Hub) or
custom manifests (Deployment + Service + Ingress + PVC + Secrets). Key
Helm values:
database:
  password: <from-vault>
privacyidea:
  config:
    SECRET_KEY: <from-vault>
    PI_PEPPER: <from-vault>
  encfile:
    enabled: true
    existingSecret: privacyidea-secrets
    key: PI_ENCFILE
ingress:
  enabled: true
  hostname: pink.coulomb.social
  tls: true
Create K8s Secrets: privacyidea-config, privacyidea-enckey,
privacyidea-auditkeys. Configure Ingress + TLS. Add rate-limiting and
WAF rules at Traefik level.
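The `<from-vault>` markers in the Helm values above are placeholders, so a deployment should not proceed while any of them is still present in the values that are actually rendered. A minimal sketch of such a pre-deploy check follows. The function name is hypothetical, and the nesting of the example dictionary is an assumption for illustration that may not match the chart's real schema.

```python
from typing import Any, Iterator

PLACEHOLDER = "<from-vault>"


def unresolved_placeholders(values: Any, path: str = "") -> Iterator[str]:
    """Yield dotted paths whose value is still the literal placeholder."""
    if isinstance(values, dict):
        for key, child in values.items():
            child_path = f"{path}.{key}" if path else str(key)
            yield from unresolved_placeholders(child, child_path)
    elif isinstance(values, list):
        for index, child in enumerate(values):
            yield from unresolved_placeholders(child, f"{path}[{index}]")
    elif values == PLACEHOLDER:
        yield path


# Example values: one secret already resolved, two still placeholders.
example_values = {
    "database": {"password": "<from-vault>"},
    "privacyidea": {
        "config": {"SECRET_KEY": "already-injected", "PI_PEPPER": "<from-vault>"},
    },
    "ingress": {"enabled": True, "hostname": "pink.coulomb.social", "tls": True},
}

if __name__ == "__main__":
    for dotted_path in unresolved_placeholders(example_values):
        print(f"unresolved: {dotted_path}")
```

A check of this kind could run in CI before the chart is installed and fail the pipeline when the generator yields anything.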
Bootstrap (single-credential moment):
- kubectl exec into pod, run pi-manage admin add pi-admin; the password comes from vault (the only time a password is typed).
- Immediately enroll MFA for pi-admin (TOTP or hardware token).
- Create trigger-admin with triggerchallenge right only.
- Apply policies: WebUI restricted to VPN/office IPs; MFA required for all admin actions.
Done when: privacyIDEA reachable at pink.coulomb.social with valid TLS, pi-admin enrolled with MFA, trigger-admin created, rate-limiting active.
T05 — Phase 4: Deploy Keycloak (SSO core)
id: NK-WP-0001-T05
state_hub_task_id: b9f73aa6-9035-4643-9905-64e73a29b298
status: cancelled
priority: high
note: Migrated to NK-WP-0011 (enterprise federation / SAML). Refined there against the deployed KeyCape stack and the OpenBao/flex-auth architecture.
Build a custom Keycloak image that includes the privacyIDEA Provider JAR:
FROM quay.io/keycloak/keycloak:<version>
COPY PrivacyIDEA-Provider.jar /opt/keycloak/providers/
RUN /opt/keycloak/bin/kc.sh build
Deploy via plain Helm chart (official Keycloak Operator CRD-based or codecentric KeycloakX Helm chart; decision D3: plain Helm first, Flux later). Configure:
- DB: keycloak_db (credentials from Vault / K8s Secret)
- Ingress + TLS: keycloak.yourdomain.com (Traefik + cert-manager)
- Hostname strictness + proxy mode (Traefik forward headers)
- Metrics/logging (Prometheus annotations)
- Admin bootstrap secret from vault
- Realm import strategy: GitOps-friendly (realm JSON in git or CR)
Done when: Keycloak reachable with valid TLS, admin console accessible, custom image with privacyIDEA JAR deployed and verified.
T06 — Phase 5: Realm config & MFA authentication flow
id: NK-WP-0001-T06
state_hub_task_id: 3b6379a4-a27b-4d25-82be-bc600879f036
status: cancelled
priority: medium
note: Migrated to NK-WP-0011 (enterprise federation / SAML).
In Keycloak:
- Create/configure realm. Decision D2 applies: identity source of truth is Keycloak-internal users. LDAP/AD and Entra federation is deferred to the enterprise tier (not in scope for this workplan phase).
- Create Authentication Flow "privacyIDEA Browser":
  - Add privacyIDEA execution step (REQUIRED)
  - Config: privacyIDEA URL = https://pink.coulomb.social, service account = trigger-admin (secret from K8s Secret)
  - Optional: bypass group (break-glass) with strict restrictions + alerts
- Set this flow as the default browser flow.
- Require MFA step-up for admin console and sensitive OIDC clients.
Test:
- Normal user: password → MFA OTP → session established
- Admin console: MFA required
- Failure modes: wrong OTP, token missing, privacyIDEA unreachable
- Break-glass: bypass works, alert fires
Done when: end-to-end auth works for normal and admin paths, all failure modes handled gracefully.
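The test cases above can be read as a small decision table. The following sketch models the expected outcomes only; it is not how Keycloak or the privacyIDEA provider implement the flow. The type and function names are hypothetical, and it assumes fail-closed behaviour when privacyIDEA is unreachable and that break-glass use is the only event in this model that raises an alert.

```python
from dataclasses import dataclass
from enum import Enum


class Mfa(Enum):
    """Possible results of the privacyIDEA step in the browser flow."""
    OK = "otp accepted"
    WRONG_OTP = "wrong otp"
    NO_TOKEN = "token missing"
    UNREACHABLE = "privacyIDEA unreachable"


@dataclass(frozen=True)
class Outcome:
    session: bool  # True if an SSO session is established
    alert: bool    # True if an alert must fire
    reason: str


def browser_login(password_ok: bool, mfa: Mfa, break_glass: bool = False) -> Outcome:
    """Return the expected outcome for one login attempt."""
    if not password_ok:
        return Outcome(False, False, "invalid password")
    if break_glass:
        # Bypass group: the MFA step is skipped, but every use raises an alert.
        return Outcome(True, True, "break-glass bypass")
    if mfa is Mfa.OK:
        return Outcome(True, False, "password + OTP accepted")
    # Fail closed: wrong OTP, missing token, and outages all deny the login.
    return Outcome(False, False, mfa.value)


if __name__ == "__main__":
    for result in Mfa:
        print(result.name, browser_login(True, result))
```

Each row of the test list maps to one call, for example `browser_login(True, Mfa.UNREACHABLE)` for the outage case and `browser_login(True, Mfa.UNREACHABLE, break_glass=True)` for the bypass.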
T07 — Phase 6: User management, policies & self-service portal
id: NK-WP-0001-T07
state_hub_task_id: c7cf902a-b480-4545-a536-293070945206
status: cancelled
priority: medium
note: Migrated to NK-WP-0011 (enterprise federation / SAML).
Decision D2 applies: identity source of truth is Keycloak-internal with the privacyIDEA Keycloak resolver. Implement (not decide):
- Configure privacyIDEA 3.12+ Keycloak user resolver to align Keycloak users with privacyIDEA token ownership.
- LDAP/Entra federation: out of scope for this phase. Registered as extension point EP-NK-001 (State Hub) for future enterprise-tier work.
Define policies in privacyIDEA:
- Allowed token types: TOTP, hardware (YubiKey), passkey
- Enrollment rules (who can self-enroll, which token types)
- Admin rights separation: super-admin vs. helpdesk-admin
Enable self-service portal at pink-account.coulomb.social for user token
enrollment/replacement.
Configure auditing and log shipping: privacyIDEA audit logs + Keycloak events → centralized logging (ELK/Loki or equivalent). Token lifecycle policies: enrollment, revocation, re-enrollment on device loss.
Bootstrap user management (D2 + D5 — Local Identity):
The pre-Keycloak user store is implemented as the local-identity capability.
See NK-WP-0002 and
docs/LocalIdentity.md. NK-WP-0002 Stage 2
produces Keycloak-compatible user exports (local-identity export --all)
that feed the realm bulk-import during T06. Once T06 is operational,
that instance should be explicitly migrated off Local Identity.
Done when: policies documented and applied, self-service portal live, audit logs flowing, Keycloak resolver configured.
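To illustrate the hand-over from Local Identity to Keycloak described above, the sketch below maps user records to entries for a realm import file. The output format of local-identity export --all is not specified in this workplan, so the input record shape (`username`, `email`, `disabled`) and both function names are hypothetical. The output keys `username`, `email`, `enabled`, and `emailVerified` follow Keycloak's user representation.

```python
import json


def to_keycloak_user(record: dict) -> dict:
    """Map one hypothetical Local Identity record to a Keycloak user entry."""
    user = {
        "username": record["username"],
        "enabled": not record.get("disabled", False),
        "emailVerified": False,  # assumption: addresses are re-verified after import
    }
    if record.get("email"):
        user["email"] = record["email"]
    return user


def build_realm_import(realm: str, records: list[dict]) -> str:
    """Return a JSON document with the realm name and the mapped users."""
    payload = {"realm": realm, "users": [to_keycloak_user(r) for r in records]}
    return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    sample = [
        {"username": "alice", "email": "alice@example.org"},
        {"username": "bob", "disabled": True},
    ]
    print(build_realm_import("example-realm", sample))
```

Credentials are deliberately left out of the mapping; how passwords and token ownership carry over depends on the export format and on the privacyIDEA resolver configuration.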
T08 — Phase 7: Backups, DR, break-glass & monitoring
id: NK-WP-0001-T08
state_hub_task_id: 9cbd1d89-b5bf-491e-9d16-b1c7d57076fb
status: cancelled
priority: medium
note: Migrated to NK-WP-0011 (enterprise federation / SAML).
Backups:
- DB backups: Keycloak + privacyIDEA (Velero or CloudNativePG scheduled backup to S3/MinIO). Test restore.
- privacyIDEA encryption/audit key Secrets: encrypted export, versioned.
- Keycloak realm exports: stored as JSON in git (GitOps-friendly).
- Vault unseal keys and root token: offline copy in KeePassXC.
Disaster recovery drill (mandatory before production):
- Restore DB + keys into a fresh namespace.
- Verify token validation still works — this catches key/secret mistakes.
Break-glass procedure:
- Disabled-by-default Keycloak admin path or group exemption.
- Break-glass credentials stored offline + vault. Alert (PagerDuty/webhook) on every use.
Monitoring:
- Prometheus scraping Keycloak + privacyIDEA metrics.
- Grafana dashboards: auth success/failure rates, MFA challenge latency, token count by type.
- Alert: privacyIDEA unreachable (blocks all logins).
Final validation:
- All external traffic: Ingress + HSTS + strict TLS.
- NetworkPolicies verified (no unintended open paths).
- End-to-end: app → Keycloak → privacyIDEA OTP → SSO session established.
Done when: DR drill passed, monitoring live, break-glass procedure documented and tested, HSTS and NetworkPolicies verified.
Deliverables Checklist
- KeePassXC vault created; all secrets generated and encrypted ops bundle exported
- HashiCorp Vault deployed in-cluster; secrets migrated from KeePassXC
- Secret injection strategy chosen and operational (ESO + Vault or Vault Agent)
- sso, mfa, databases namespaces + NetworkPolicies deployed
- TLS everywhere via cert-manager (Traefik ingress)
- PostgreSQL live; both DBs created; backup + restore tested
- privacyIDEA running at pink.coulomb.social; pi-admin MFA enrolled; trigger-admin created with least-privilege rights
- Keycloak running from custom image including privacyIDEA Provider JAR
- Keycloak "privacyIDEA Browser" flow enforced as default
- Realm exported to git; admin secret from vault
- Self-service portal live; token lifecycle policies defined
- DR drill passed; monitoring live; break-glass documented and tested
Open Questions
All five decisions are now resolved. See DECISIONS.md for D1–D3 rationale and
State Hub decisions aca69951 (D4) and d74e2b11 (D5) for the full records.
The following items remain open:
| Artefact | Item | Status |
|---|---|---|
| Task 007415ef → [repo:custodian] | Create ecosystem ADR for AI-first principles (D3) | Open; custodian to action |
| EP-NK-001 (513a7644) | LDAP/AD/Entra federation | Open; enterprise tier |