---
id: NK-WP-0001
type: workplan
title: "SSO & MFA Platform — Keycloak + privacyIDEA on Kubernetes"
domain: netkingdom
status: deferred
owner: worsch
topic_slug: netkingdom
state_hub_workstream_id: 39263c4b-ef70-4053-b782-350834b7e1be
created: "2026-02-28"
updated: "2026-03-21"
superseded_by: NK-WP-0003
---

# SSO & MFA Platform — Keycloak + privacyIDEA on Kubernetes

> **Status: DEFERRED (2026-03-21)**
> The Keycloak path has been superseded by the KeyCape + Authelia + LLDAP
> stack (NK-WP-0003). Keycloak is out of scope for the current deployment.
>
> - T01 (secret bootstrap) → replaced by NK-WP-0004 + NK-WP-0005
> - T02 (K8s foundations) → done, reused by NK-WP-0003
> - T03 (PostgreSQL) → done, reused by NK-WP-0003
> - T04 (privacyIDEA) → superseded by NK-WP-0003-T04
> - T05–T08 (Keycloak) → deferred indefinitely; revisit if/when Keycloak
>   is needed for enterprise federation or SAML requirements
>
> **Active work: see NK-WP-0003.**

## Summary

~~Deploy a hardened SSO and MFA platform on Kubernetes: Keycloak as the
OIDC/SAML identity provider, privacyIDEA as the MFA/token engine,
integrated via the privacyIDEA Keycloak Provider.~~ Deferred — see NK-WP-0003.

This workplan is retained as a reference for the Keycloak-based architecture
decisions (D1–D5) and for the T01–T03 infrastructure that was built and
remains in use.

## Context

Synthesised from two AI protoplans (wiki/WorkplanOneChatgpt.md and
wiki/WorkplanOneGrok.md). Both sources converge on the same architecture;
this plan picks the most concrete and production-aligned choices from each:

- **Single-credential bootstrap** (Grok) — one master secret unlocks the
  vault; all other credentials are vault-managed and never typed manually.
- **Phase structure** (ChatGPT) — eight sequential phases reducing blast
  radius at each step.
- **Tooling choices** (both) — Keycloak Operator or codecentric Helm,
  gpappsoft privacyIDEA Helm, CloudNativePG for PostgreSQL, cert-manager
  for TLS, Traefik as ingress (K3s native, aligned with Railiance).
- **Custom Keycloak image** (both) — JAR baked into image via `kc.sh build`
  rather than `kubectl cp`; clean GitOps pattern.

## Decisions

All five decisions for this workstream are resolved. D1–D3 were decided on
2026-03-01 by Tegwick; full rationale is in `DECISIONS.md`. D4 and D5 were
resolved afterwards; their full records are State Hub decisions `aca69951`
(D4) and `d74e2b11` (D5).

| ID | Decision | Status | Outcome / Notes |
|----|----------|--------|-----------------|
| D1 | Vault backend | **Resolved** | KeePassXC pre-cluster → HashiCorp Vault in-cluster. |
| D2 | Identity source of truth | **Resolved** | Hybrid: Keycloak-internal + LDAP/Entra for enterprise tier. File-based bootstrap user store → Local Identity (NK-WP-0002). |
| D3 | GitOps tooling | **Resolved** | Plain Helm first, upgrade to Flux when warranted. AI-first philosophy (TDD, API-first, MCP, CLI; UI separate repos) — ecosystem ADR requested from custodian. |
| D4 | Secret injection: ESO vs Vault Agent Injector | **Resolved** | **ESO.** GitOps-aligned; standard K8s Secrets consumable by plain Helm. Monitor dynamic-secret gaps; revisit if needed. |
| D5 | File-based bootstrap user store | **Resolved** | **Implement in-repo as `local-identity`.** Staged workplan: NK-WP-0002. See `docs/LocalIdentity.md`. |

## Architecture

```
                 Internet
                     │  TLS (cert-manager / Let's Encrypt)
              ┌──────┴──────┐
              │   Traefik   │  (K3s native ingress)
              └──┬───────┬──┘
                 │       │
            keycloak.…  pi.…  pi-account.…
                 │       │       │
          ┌──────┘  ┌────┘       │
          ▼         ▼            │
      [Keycloak] [privacyIDEA]◄──┘  (self-service portal)
           │         │
           └────┬────┘
                ▼
          [PostgreSQL]  (CloudNativePG, namespace: databases)
                │
        [HashiCorp Vault]  ← single credential unlocks (in-cluster)
        [KeePassXC]        ← pre-cluster bootstrap / dev/test/sandbox
```

**Namespaces:** `sso` (Keycloak), `mfa` (privacyIDEA), `databases`

**Integration:** Keycloak runs the browser login flow; privacyIDEA provides
MFA via the privacyIDEA Keycloak Provider JAR (baked into custom image).

## Dependencies

- Depends on: `railiance/three-phoenix-ha-cluster` — full production
  deployment targets the ThreePhoenix K3s HA cluster. Development/staging
  can proceed on a single-node k3s instance.
- Depends on: `railiance/phase-0-operational-baseline` — cert-manager, TLS,
  backup strategy must be operational before going live.

## Tasks

### T01 — Phase 0: Vault & secret bootstrap (single-credential principle)

```task
id: NK-WP-0001-T01
state_hub_task_id: 7992528c-d533-44e5-bcce-f92aaa2b75b2
status: done
priority: critical
commit_0a: c576188
note: Phase 0a complete (gen-secrets.sh, pack-bundle.sh, README). Phase 0b (Vault in-cluster) follows T02 cluster deployment.
```

**Decision D1 applies:** Two-phase vault strategy.

**Phase 0a — Pre-cluster KeePassXC bootstrap (do this first, before K8s):**

Create a KeePassXC `.kdbx` database as the initial secret store. Keep the
KeePassXC master password in a personal password manager. Generate and store
all bootstrap secrets inside KeePassXC:

- privacyIDEA: `SECRET_KEY` (64+ chars), `PI_PEPPER` (32+ chars),
  `PI_ENCFILE` content (`pi-manage create_enckey`).
- PostgreSQL: root + `keycloak` + `privacyidea` user passwords.
- Keycloak: admin bootstrap secret + DB password.
- TLS: ACME account key (if not delegated fully to cert-manager).
- Break-glass: admin credentials + offline recovery OTP seed.

Export an age-encrypted ops bundle (encrypted tar of all secret YAML
manifests). Store offsite.

**Phase 0b — HashiCorp Vault in-cluster (after T02, once K3s is running):**

Deploy HashiCorp Vault in the cluster (Helm chart). Migrate secrets from
KeePassXC into Vault. Enable K8s encryption-at-rest. Deploy External Secrets
Operator (ESO) — **decided D4**: ESO reconciles Vault secrets into standard
K8s Secrets, compatible with plain Helm charts without Vault-specific
annotations. KeePassXC remains the source of truth for dev/test/sandbox
systems that do not connect to the cluster Vault.

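To illustrate the D4 choice, the manifest below sketches how ESO might reconcile one Vault entry into a standard K8s Secret. It is a minimal sketch under stated assumptions, not a manifest from this repo: the store name `vault-backend`, the Vault path `sso-mfa/privacyidea`, and the `v1beta1` API version are hypothetical and should be checked against the installed ESO release.

```yaml
# Sketch only: store name, Vault path and API version are assumptions.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: privacyidea-secrets
  namespace: mfa
spec:
  refreshInterval: 1h              # how often ESO re-reads Vault
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend            # hypothetical store pointing at the in-cluster Vault
  target:
    name: privacyidea-secrets      # resulting K8s Secret, consumable by plain Helm
  data:
    - secretKey: SECRET_KEY
      remoteRef:
        key: sso-mfa/privacyidea   # hypothetical Vault path
        property: SECRET_KEY
    - secretKey: PI_PEPPER
      remoteRef:
        key: sso-mfa/privacyidea
        property: PI_PEPPER
```

The workload only ever sees an ordinary Secret, which is why this approach needs no Vault-specific annotations on the consuming chart.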
**Done when:** KeePassXC created and all secrets generated (0a). Vault
deployed in-cluster, secrets migrated, ESO operational and injecting secrets
into at least one test workload (0b). Encrypted ops bundle exported and
stored offsite.

---

### T02 — Phase 1: K8s foundations (namespaces, NetworkPolicies, cert-manager)

```task
id: NK-WP-0001-T02
state_hub_task_id: 721ca6b2-0cf4-4008-a966-87b1563550fa
status: done
priority: high
commit: ee794a6
note: Manifests committed. Apply with sso-mfa/k8s/README.md apply order; verify-t02.sh checks done-criteria.
```

**Prerequisite:** T01 Phase 0a (KeePassXC bootstrap) must be complete — all
secrets generated and encrypted ops bundle exported before cluster work begins.

Create namespaces: `sso`, `mfa`, `databases`. Verify cert-manager is
installed and functional on the K3s cluster (Traefik ingress). Define and
apply NetworkPolicies to prevent lateral movement:

- Only ingress controller reaches Keycloak/privacyIDEA service ports.
- Only Keycloak pods call the privacyIDEA API.
- Only app pods/ingress reach Keycloak.
- DB pods reachable only from `sso` and `mfa` namespaces.

Verify StorageClass for PVCs.

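The last NetworkPolicy rule above (DB pods reachable only from `sso` and `mfa`) could be expressed roughly as follows. This is a sketch, not the committed manifest: the policy name is hypothetical, port 5432 assumes the PostgreSQL default, and the selector relies on the `kubernetes.io/metadata.name` label that recent Kubernetes versions set on every namespace.

```yaml
# Sketch only: policy name and port are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-db-from-sso-mfa      # hypothetical name
  namespace: databases
spec:
  podSelector: {}                  # applies to every pod in the databases namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: sso
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: mfa
      ports:
        - protocol: TCP
          port: 5432               # assumed PostgreSQL port
```

Because the empty `podSelector` covers all pods in `databases`, any other traffic those pods need (for example health checks from a database operator running in another namespace) would require an additional allow rule.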
**Done when:** namespaces exist, NetworkPolicies applied and tested (verify
denied paths), cert-manager issues a test certificate.

---

### T03 — Phase 2: PostgreSQL deployment (Keycloak + privacyIDEA DBs)

```task
id: NK-WP-0001-T03
state_hub_task_id: 7fa60004-deb2-4db5-a470-f95dda07f6ab
status: done
priority: high
commit: TBD
note: Manifests committed. Restore drill required before marking fully done in production.
```

Deploy PostgreSQL via CloudNativePG operator (preferred: aligns with
ThreePhoenix HA posture) or Bitnami Helm chart as fallback. Create:

- Database `keycloak_db`, user `keycloak`
- Database `privacyidea_db`, user `privacyidea`

Store DB credentials as K8s Secrets injected from Vault (T01 Phase 0b must
be complete, or use placeholder K8s Secrets until Vault is live).
Configure automated DB backups to object storage (S3 or MinIO).
**Run a restore drill before proceeding** — a failed restore later is a
critical blocker.

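A CloudNativePG deployment with scheduled object-store backups might look roughly like the sketch below. All resource names, the bucket path, the MinIO endpoint, and the sizing values are hypothetical, and the in-tree `barmanObjectStore` section assumes a CloudNativePG release that still supports it (newer releases move object-store backups to a plugin, so check the installed version).

```yaml
# Sketch only: names, sizes, bucket and endpoint are assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: sso-mfa-pg                      # hypothetical cluster name
  namespace: databases
spec:
  instances: 3
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: keycloak_db
      owner: keycloak
      secret:
        name: keycloak-db-credentials   # hypothetical basic-auth Secret (username/password)
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://pg-backups/sso-mfa          # hypothetical bucket
      endpointURL: http://minio.example.internal:9000   # hypothetical MinIO endpoint
      s3Credentials:
        accessKeyId:
          name: backup-s3-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-s3-credentials
          key: ACCESS_SECRET_KEY
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: sso-mfa-pg-daily
  namespace: databases
spec:
  schedule: "0 0 2 * * *"               # six-field cron, seconds first: daily at 02:00
  cluster:
    name: sso-mfa-pg
```

The `initdb` bootstrap creates only one database and owner, so `privacyidea_db` and its user are not covered here; they would need a separate bootstrap step or their own Cluster.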
**Done when:** both DBs live, credentials in K8s Secrets, backup running,
restore drill passed.

---

### T04 — Phase 3: Deploy privacyIDEA (MFA core)

```task
id: NK-WP-0001-T04
state_hub_task_id: 6ad1296a-a488-4031-b665-f77030e971ed
status: in_progress
priority: high
note: Manifests committed (pvc, configmap, deployment, middleware, ingress). Scripts: create-secrets.sh, enckey-bootstrap.sh, bootstrap-admin.sh. verify-t04.sh. Domain pink.coulomb.social (CP-NK-002/003). Pending: apply to live cluster, run enckey-bootstrap.sh, bootstrap-admin.sh.
```

Deploy privacyIDEA via `gpappsoft/privacyidea` Helm chart (Artifact Hub) or
custom manifests (Deployment + Service + Ingress + PVC + Secrets). Key
Helm values:

```yaml
database:
  password: <from-vault>
privacyidea:
  config:
    SECRET_KEY: <from-vault>
    PI_PEPPER: <from-vault>
  encfile:
    enabled: true
    existingSecret: privacyidea-secrets
    key: PI_ENCFILE
ingress:
  enabled: true
  hostname: pink.coulomb.social
  tls: true
```

Create K8s Secrets: `privacyidea-config`, `privacyidea-enckey`,
`privacyidea-auditkeys`. Configure Ingress + TLS. Add rate-limiting and
WAF rules at Traefik level.

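Rate limiting at the Traefik level is typically done with a `Middleware` resource. The sketch below is illustrative only and is not the middleware manifest committed for this task: the name and the limits are hypothetical, and the `traefik.io/v1alpha1` API group assumes a recent Traefik release (older releases use `traefik.containo.us/v1alpha1`).

```yaml
# Sketch only: name, limits and API group are assumptions.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: pi-ratelimit               # hypothetical name
  namespace: mfa
spec:
  rateLimit:
    average: 20                    # sustained requests per second per client source
    burst: 40                      # short bursts allowed above the average
# Attach it to the privacyIDEA Ingress with the annotation:
#   traefik.ingress.kubernetes.io/router.middlewares: mfa-pi-ratelimit@kubernetescrd
```

WAF rules are not covered by this sketch; they would need a separate component or plugin in front of privacyIDEA.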
**Bootstrap (single-credential moment):**
1. `kubectl exec` into pod, run `pi-manage admin add pi-admin` — password
   comes from vault (only time a password is typed).
2. Immediately enroll MFA for `pi-admin` (TOTP or hardware token).
3. Create `trigger-admin` with `triggerchallenge` right only.
4. Apply policies: WebUI restricted to VPN/office IPs; MFA required for
   all admin actions.

**Done when:** privacyIDEA reachable at pink.coulomb.social with valid TLS,
pi-admin enrolled with MFA, trigger-admin created, rate-limiting active.

---

### T05 — Phase 4: Deploy Keycloak (SSO core)

```task
id: NK-WP-0001-T05
state_hub_task_id: b9f73aa6-9035-4643-9905-64e73a29b298
status: todo
priority: high
```

Build a **custom Keycloak image** that includes the privacyIDEA Provider JAR:

```dockerfile
FROM quay.io/keycloak/keycloak:<version>
COPY PrivacyIDEA-Provider.jar /opt/keycloak/providers/
RUN /opt/keycloak/bin/kc.sh build
```

Deploy via plain Helm chart (official Keycloak Operator CRD-based or
codecentric KeycloakX Helm chart; **decision D3: plain Helm first, Flux
later**). Configure:

- DB: `keycloak_db` (credentials from Vault / K8s Secret)
- Ingress + TLS: `keycloak.yourdomain.com` (Traefik + cert-manager)
- Hostname strictness + proxy mode (Traefik forward headers)
- Metrics/logging (Prometheus annotations)
- Admin bootstrap secret from vault
- Realm import strategy: GitOps-friendly (realm JSON in git or CR)

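If the Operator route were chosen, the database, hostname, and proxy settings listed above might map onto a `Keycloak` custom resource along these lines. This is a sketch under assumptions, not a tested manifest: the image reference, the database host, and the Secret names are hypothetical, and the `proxy.headers` field assumes a reasonably recent Keycloak Operator version.

```yaml
# Sketch only: image, DB host and Secret names are assumptions.
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak
  namespace: sso
spec:
  instances: 1
  image: registry.example.com/keycloak-privacyidea:<version>  # hypothetical custom image
  db:
    vendor: postgres
    host: sso-mfa-pg-rw.databases.svc     # hypothetical PostgreSQL service
    database: keycloak_db
    usernameSecret:
      name: keycloak-db-credentials       # hypothetical Secret
      key: username
    passwordSecret:
      name: keycloak-db-credentials
      key: password
  http:
    httpEnabled: true                     # TLS is terminated at Traefik
  hostname:
    hostname: keycloak.yourdomain.com
  proxy:
    headers: xforwarded                   # trust X-Forwarded-* headers set by Traefik
  ingress:
    enabled: false                        # Ingress managed separately (Traefik + cert-manager)
```

Trusting forwarded headers is only safe if Traefik sets and overwrites them on every request, which ties this setting to the NetworkPolicy rule that only the ingress controller may reach Keycloak.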
**Done when:** Keycloak reachable with valid TLS, admin console accessible,
custom image with privacyIDEA JAR deployed and verified.

---

### T06 — Phase 5: Realm config & MFA authentication flow

```task
id: NK-WP-0001-T06
state_hub_task_id: 3b6379a4-a27b-4d25-82be-bc600879f036
status: todo
priority: medium
```

In Keycloak:

1. Create/configure realm. **Decision D2 applies:** identity source of truth
   is Keycloak-internal users. LDAP/AD and Entra federation is deferred to
   the enterprise tier (not in scope for this workplan phase).
2. Create Authentication Flow "privacyIDEA Browser":
   - Add privacyIDEA execution step (REQUIRED)
   - Config: privacyIDEA URL = `https://pink.coulomb.social`, service account
     = `trigger-admin` (secret from K8s Secret)
   - Optional: bypass group (break-glass) with strict restrictions + alerts
3. Set this flow as the default browser flow.
4. Require MFA step-up for admin console and sensitive OIDC clients.

Test:
- Normal user: password → MFA OTP → session established
- Admin console: MFA required
- Failure modes: wrong OTP, token missing, privacyIDEA unreachable
- Break-glass: bypass works, alert fires

**Done when:** end-to-end auth works for normal and admin paths, all failure
modes handled gracefully.

---

### T07 — Phase 6: User management, policies & self-service portal

```task
id: NK-WP-0001-T07
state_hub_task_id: c7cf902a-b480-4545-a536-293070945206
status: todo
priority: medium
```

**Decision D2 applies:** identity source of truth is Keycloak-internal with
the privacyIDEA Keycloak resolver. Implement (not decide):

- Configure privacyIDEA 3.12+ Keycloak user resolver to align Keycloak
  users with privacyIDEA token ownership.
- LDAP/Entra federation: out of scope for this phase. Registered as
  extension point EP-NK-001 (State Hub) for future enterprise-tier work.

Define policies in privacyIDEA:
- Allowed token types: TOTP, hardware (YubiKey), passkey
- Enrollment rules (who can self-enroll, which token types)
- Admin rights separation: super-admin vs. helpdesk-admin

Enable self-service portal at `pink-account.coulomb.social` for user token
enrollment/replacement.

Configure auditing and log shipping: privacyIDEA audit logs + Keycloak
events → centralized logging (ELK/Loki or equivalent). Token lifecycle
policies: enrollment, revocation, re-enrollment on device loss.

**Bootstrap user management (D2 + D5 — Local Identity):**
The pre-Keycloak user store is implemented as the `local-identity` capability.
See [NK-WP-0002](NK-WP-0002-local-identity.md) and
[docs/LocalIdentity.md](../docs/LocalIdentity.md). NK-WP-0002 Stage 2
produces Keycloak-compatible user exports (`local-identity export --all`)
that feed the realm bulk-import during T06. Once T06 is operational, that
instance should be explicitly migrated off Local Identity.

**Done when:** policies documented and applied, self-service portal live,
audit logs flowing, Keycloak resolver configured.

---

### T08 — Phase 7: Backups, DR, break-glass & monitoring

```task
id: NK-WP-0001-T08
state_hub_task_id: 9cbd1d89-b5bf-491e-9d16-b1c7d57076fb
status: todo
priority: medium
```

**Backups:**
- DB backups: Keycloak + privacyIDEA (Velero or CloudNativePG scheduled
  backup to S3/MinIO). Test restore.
- privacyIDEA encryption/audit key Secrets: encrypted export, versioned.
- Keycloak realm exports: stored as JSON in git (GitOps-friendly).
- Vault unseal keys and root token: offline copy in KeePassXC.

**Disaster recovery drill** (mandatory before production):
1. Restore DB + keys into a fresh namespace.
2. Verify token validation still works — this catches key/secret mistakes.

**Break-glass procedure:**
- Disabled-by-default Keycloak admin path or group exemption.
- Break-glass credentials stored offline + vault. Alert (PagerDuty/webhook)
  on every use.

**Monitoring:**
- Prometheus scraping Keycloak + privacyIDEA metrics.
- Grafana dashboards: auth success/failure rates, MFA challenge latency,
  token count by type.
- Alert: privacyIDEA unreachable (blocks all logins).

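The "privacyIDEA unreachable" alert above could be sketched as a `PrometheusRule`, assuming the Prometheus Operator is installed and privacyIDEA is scraped under a job label. The job name `privacyidea`, the rule name, and the two-minute window are hypothetical choices, not values from this workplan.

```yaml
# Sketch only: assumes the Prometheus Operator CRDs and a scrape job named "privacyidea".
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: privacyidea-availability   # hypothetical name
  namespace: mfa
spec:
  groups:
    - name: privacyidea.rules
      rules:
        - alert: PrivacyIDEAUnreachable
          # Fires when the target is down or has vanished from service discovery.
          expr: up{job="privacyidea"} == 0 or absent(up{job="privacyidea"})
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: privacyIDEA is unreachable; MFA logins are blocked
```

The `absent()` clause covers the case where the scrape target disappears entirely, which a plain `up == 0` check would miss.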
**Final validation:**
- All external traffic: Ingress + HSTS + strict TLS.
- NetworkPolicies verified (no unintended open paths).
- End-to-end: app → Keycloak → privacyIDEA OTP → SSO session established.

**Done when:** DR drill passed, monitoring live, break-glass procedure
documented and tested, HSTS and NetworkPolicies verified.

---

## Deliverables Checklist

- [ ] KeePassXC vault created; all secrets generated and encrypted ops bundle exported
- [ ] HashiCorp Vault deployed in-cluster; secrets migrated from KeePassXC
- [ ] Secret injection operational via ESO + Vault (D4)
- [ ] `sso`, `mfa`, `databases` namespaces + NetworkPolicies deployed
- [ ] TLS everywhere via cert-manager (Traefik ingress)
- [ ] PostgreSQL live; both DBs created; backup + restore tested
- [ ] privacyIDEA running at `pink.coulomb.social`; pi-admin MFA enrolled;
      trigger-admin created with least-privilege rights
- [ ] Keycloak running from custom image including privacyIDEA Provider JAR
- [ ] Keycloak "privacyIDEA Browser" flow enforced as default
- [ ] Realm exported to git; admin secret from vault
- [ ] Self-service portal live; token lifecycle policies defined
- [ ] DR drill passed; monitoring live; break-glass documented and tested

## Open Questions

All five decisions are now resolved. See `DECISIONS.md` for the D1–D3
rationale and State Hub decisions `aca69951` (D4) and `d74e2b11` (D5) for
the full records.

| Artefact | Item | Status |
|----------|------|--------|
| Task `007415ef` → [repo:custodian] | Create ecosystem ADR for AI-first principles (D3) | Open; custodian to action |
| EP-NK-001 (`513a7644`) | LDAP/AD/Entra federation | Open; enterprise tier |