Files
net-kingdom/workplans/NK-WP-0003-keycape-privacyidea-cluster-deployment.md

391 lines
14 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
id: NK-WP-0003
type: workplan
title: "KeyCape + privacyIDEA Stack — Cluster Deployment"
domain: netkingdom
repo: net-kingdom
status: completed
owner: custodian
topic_slug: netkingdom
created: "2026-03-20"
updated: "2026-05-02"
state_hub_workstream_id: "f24cefd4-a09b-4fa1-9b25-94bf783b425e"
---
# KeyCape + privacyIDEA Stack — Cluster Deployment
## Goal
Deploy the full NetKingdom identity stack on the live k3s cluster without
Keycloak. KeyCape (v0.1, complete) is the OIDC orchestration layer; it
binds LLDAP (directory), Authelia (auth sessions), and privacyIDEA (MFA).
NK-WP-0001 was scoped around Keycloak and is deferred. This workplan
covers everything needed to reach a production-ready identity plane.
## Target cluster
**RAILIANCE01**`92.205.62.239` — k3s v1.35.1+k3s1, clean baseline.
Kubeconfig: `~/.kube/config-railiance01`
> Note: T02T07 were previously completed on CoulombCore (92.205.130.254) by
> mistake. CoulombCore is the old management host (Gitea/OCI registry only) and
> should not be touched. All SSO stack work targets RAILIANCE01 exclusively.
## Pre-conditions
- [x] k3s cluster healthy on RAILIANCE01 — v1.35.1+k3s1, node Ready ✓
- [x] kubeconfig available at `~/.kube/config-railiance01`
- [x] All manifests committed — net-kingdom `sso-mfa/k8s/`
- [x] KeyCape v0.1 complete — KEY-WP-0001 ✓
- [x] SOPS + age integrated into net-kingdom — NK-WP-0004 ✓
- [x] Agent-driven credential bootstrap ready — NK-WP-0005 ✓ (run `make creds-agent-init`)
## Architecture
```
Internet → Traefik (RAILIANCE01 k3s) → cert-manager TLS
├── auth.coulomb.social → Authelia
├── pink.coulomb.social → privacyIDEA portal
├── pink-account.coulomb.social → privacyIDEA account self-service
└── id.coulomb.social → KeyCape (OIDC)
KeyCape ──► Authelia (session, password)
──► LLDAP (directory, user lookup)
──► privacyIDEA (MFA challenges via trigger-admin token)
privacyIDEA ──► PostgreSQL (privacyidea_db via CloudNativePG)
LLDAP ──► SQLite (PVC)
Authelia ──► SQLite (PVC)
KeyCape image pulled from CoulombCore OCI registry: 92.205.130.254:32166
(insecure HTTP NodePort — requires registries.yaml on RAILIANCE01)
```
## Tasks
### T01 — Credential setup
```task
id: NK-WP-0003-T01
status: done
priority: high
state_hub_task_id: "6a22e17e-5854-4f8b-b419-9dc86d490357"
note: Credential foundation exists (NK-WP-0004 + NK-WP-0005). Secrets encrypted in
secrets.enc/. Before T02, run `make creds-agent-init` with KUBECONFIG pointing
to RAILIANCE01 to inject all secrets into the new cluster.
```
~~Net-kingdom currently uses a manual KeePassXC + age-bundle approach~~
Completed via NK-WP-0004 + NK-WP-0005. The credential foundation is in place:
- SOPS + age integrated — `~/.config/sops/age/keys.txt`, `.sops.yaml`, git hook
- Agent bootstrap: `make creds-agent-init` runs the full flow autonomously
- Credential standard: `canon/standards/credential-management_v0.2.md`
To bootstrap credentials into the RAILIANCE01 cluster before T02T09, run:
```bash
export KUBECONFIG=~/.kube/config-railiance01
make creds-agent-init
```
This generates all secrets, encrypts to `secrets.enc/`, injects into the
cluster, and delivers the emergency bundle. No KeePassXC steps required.
### T02 — Apply cluster foundations
```task
id: NK-WP-0003-T02
status: done
priority: high
state_hub_task_id: "a14e3a6b-18ee-4172-8a47-bd531f21e55a"
note: Done 2026-03-25 on RAILIANCE01. Namespaces, NetworkPolicies, cert-manager, ClusterIssuers,
insecure registry for CoulombCore OCI all applied and verified.
Known gotcha: added allow-traefik-to-acme-solver NetworkPolicy to sso + mfa namespaces
(default-deny-all blocked ACME HTTP-01 solver pods from receiving Traefik traffic).
```
Apply the K8s infrastructure foundations. All manifests already committed.
```bash
export KUBECONFIG=~/.kube/config-railiance01
kubectl apply -f sso-mfa/k8s/namespaces/
kubectl apply -f sso-mfa/k8s/network-policies/
kubectl apply -f sso-mfa/k8s/cert-manager/
```
Also configure the insecure OCI registry on RAILIANCE01 so k3s can pull the KeyCape image:
```bash
ssh tegwick@92.205.62.239 "sudo tee /etc/rancher/k3s/registries.yaml" <<'EOF'
mirrors:
"92.205.130.254:32166":
endpoint:
- "http://92.205.130.254:32166"
EOF
ssh tegwick@92.205.62.239 "sudo systemctl restart k3s"
```
Verify: `bash sso-mfa/k8s/verify-t02.sh`
Expected: namespaces `sso`, `mfa`, `databases` exist; NetworkPolicies applied;
cert-manager pods Running.
### T03 — Deploy PostgreSQL (CloudNativePG)
```task
id: NK-WP-0003-T03
status: done
priority: high
state_hub_task_id: "19e375d0-66bd-4cf0-9c2d-59d5c0d5989e"
note: Done 2026-03-25 on RAILIANCE01. CNPG operator + net-kingdom-pg cluster running,
privacyidea_db + role created. Verified via verify-t03.sh (8/8 PASS, 2 WARN for
superuser secret + scheduled backup — both expected at this stage).
```
Deploy the shared database cluster:
```bash
export KUBECONFIG=~/.kube/config-railiance01
kubectl apply -f sso-mfa/k8s/postgres/
```
Wait for cluster to be `Ready`, then verify: `bash sso-mfa/k8s/verify-t03.sh`
**Note**: Do not proceed to T04 until the CloudNativePG cluster is fully
healthy. Migration jobs will fail on a partially-started cluster.
### T04 — Deploy privacyIDEA
```task
id: NK-WP-0003-T04
status: done
priority: high
state_hub_task_id: "9c9c1ec9-0cf5-4546-a83e-d74dbf3b27af"
note: Done 2026-03-25 on RAILIANCE01. privacyIDEA pod Running, TLS certs issued,
enckey + audit keys bootstrapped (privacyidea-enckey + privacyidea-auditkeys Secrets created),
pi-admin + trigger-admin created, trigger-admin-rights policy created via REST API.
DEFERRED: pi-admin TOTP enrollment requires an admin realm (SQLresolver pointing to PI's
internal admin table) — pi-manage has no enroll command, WebUI token enrollment only works
for resolver-backed users. Admin MFA is production hardening; pi-admin auth works
password-only for now. Track as T09 hardening item.
```
Run credential bootstrap (injects privacyIDEA secrets + creates pi-admin/trigger-admin):
```bash
export KUBECONFIG=~/.kube/config-railiance01
make creds-agent-init
```
**Remaining manual step:**
Once `pink.coulomb.social` resolves to `92.205.62.239` and TLS cert is issued:
1. Log in to https://pink.coulomb.social as `pi-admin`
2. Enroll MFA for `pi-admin` (TOTP)
3. Verify/create trigger-admin policy: Policies → trigger-admin-rights
(Scope: admin, Action: triggerchallenge, AdminUser: trigger-admin)
### T05 — Deploy LLDAP
```task
id: NK-WP-0003-T05
status: done
priority: high
state_hub_task_id: "82fc90f7-8eb4-4718-b02a-dfd5fa39e5bc"
note: Done 2026-03-25 on RAILIANCE01. LLDAP pod Running, TLS cert issued (lldap.coulomb.social),
groups net-kingdom-users (id=4) + net-kingdom-admins (id=5) created via direct GraphQL.
bootstrap-users.sh has a bash set -e / json parse bug (workaround: direct curl).
```
Deploy LLDAP into the `sso` namespace:
```bash
export KUBECONFIG=~/.kube/config-railiance01
cd sso-mfa/k8s/lldap
bash create-secrets.sh
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml
kubectl apply -f middleware.yaml
bash bootstrap-users.sh # creates base OU structure + initial admin user
```
Verify pod Running and LDAP bind works on `ldap.coulomb.social`.
### T06 — Deploy Authelia
```task
id: NK-WP-0003-T06
status: done
priority: high
state_hub_task_id: "3a28ff10-fbfa-443b-a64d-bbfe6153c544"
note: Done 2026-03-25 on RAILIANCE01. Authelia pod Running (1 restart on init, normal),
TLS cert issued (auth.coulomb.social), health endpoint returns {"status":"OK"}.
```
Deploy Authelia into the `sso` namespace:
```bash
export KUBECONFIG=~/.kube/config-railiance01
cd sso-mfa/k8s/authelia
bash create-secrets.sh
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml
```
Verify: `bash sso-mfa/k8s/verify-t05.sh` (covers LLDAP + Authelia together)
### T07 — Deploy KeyCape
```task
id: NK-WP-0003-T07
status: done
priority: high
state_hub_task_id: "496a97c9-3e2a-486e-ba62-18449868c6cf"
note: Done 2026-03-25 on RAILIANCE01. KeyCape pod Running, TLS cert issued (kc.coulomb.social),
OIDC discovery endpoint live at https://kc.coulomb.social/.well-known/openid-configuration.
PI admin token refreshed via create-pi-token.sh (old token was from CoulombCore).
keycape-pi-token K8s Secret created in sso namespace.
```
Deploy KeyCape into the `sso` namespace:
```bash
export KUBECONFIG=~/.kube/config-railiance01
cd sso-mfa/k8s/keycape
bash create-secrets.sh # includes privacyIDEA trigger-admin token
bash create-pi-token.sh # registers KeyCape as a privacyIDEA application
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml
kubectl apply -f middleware.yaml
```
Verify: OIDC discovery endpoint reachable at
`https://id.coulomb.social/.well-known/openid-configuration`
### T08 — End-to-end authentication test
```task
id: NK-WP-0003-T08
status: done
priority: high
state_hub_task_id: "0fba3392-c916-43fd-a2c1-24ce39481043"
note: Completed 2026-03-25. All 3 test packages pass (migration, negative, profile).
Go 1.22.10 found at ~/go/bin/go. DNS resolves to 92.205.62.239 (all 4 subdomains).
Tests run with: cd src && ~/go/bin/go test ./tests/... -v
Results: ok keycape/tests/migration, ok keycape/tests/negative, ok keycape/tests/profile
Note: tests use httptest.Server + mocks — no live cluster connection required.
Test user provisioned: testuser / test.user@coulomb.social
TOTP serial TOTP00007147, seed KVQLHEJCTKCI3K7G2UIF54QUE5BNLBAQ
Validated: auth PASS via privacyIDEA /validate/check.
pi-admin TOTP deferred to T09 hardening.
```
Prove the full auth flow works:
1. OIDC discovery resolves at `id.coulomb.social`
2. Authelia password auth succeeds for a test user
3. privacyIDEA TOTP challenge issued and accepted
4. KeyCape issues a valid access token
5. Token introspection returns expected claims (sub, groups, email)
Use the KeyCape acceptance test suite:
```bash
cd "$(git rev-parse --show-toplevel)/../key-cape"
go test ./tests/... -run TestProfileBaseline -v
```
### T08a — Create Cloudflare DNS A records
```task
id: NK-WP-0003-T08a
status: done
priority: high
state_hub_task_id: "c614f839-61c4-41f6-bfeb-b3f9525a7625"
note: Done — all 5 A records (kc, auth, pink, pink-account, lldap) resolve to 92.205.62.239
via @8.8.8.8. Confirmed 2026-03-25.
```
Create 5 A records in Cloudflare DNS, **proxy disabled (DNS-only / orange cloud OFF)**,
all pointing to `92.205.62.239` (RAILIANCE01 — where k3s/Traefik runs):
| Subdomain | Type | Value |
|-----------|------|-------|
| `kc.coulomb.social` | A | `92.205.62.239` |
| `auth.coulomb.social` | A | `92.205.62.239` |
| `pink.coulomb.social` | A | `92.205.62.239` |
| `pink-account.coulomb.social` | A | `92.205.62.239` |
| `lldap.coulomb.social` | A | `92.205.62.239` |
HTTP-01 ACME challenges require direct origin reachability — Cloudflare proxy blocks this.
Once DNS propagates, cert-manager's pending challenges will auto-resolve and TLS
certs will be issued for all ingresses.
Verify: `dig +short kc.coulomb.social @8.8.8.8``92.205.62.239`
### T08b — Install Go on RAILIANCE01
```task
id: NK-WP-0003-T08b
status: done
priority: high
state_hub_task_id: "fdfe595a-f5a8-466a-82e9-7cc2ad8e5c3e"
note: Go 1.22.10 already installed at ~/go/bin/go (workstation). Tests ran from workstation.
Also: Go v1.25.6 present on RAILIANCE01 via k3s.
```
Go is already installed on RAILIANCE01 via k3s (v1.25.6). No action needed.
Verify: `ssh tegwick@92.205.62.239 "go version"`
### T09 — Backup, DR, and monitoring
```task
id: NK-WP-0003-T09
status: done
priority: medium
state_hub_task_id: "a82751d8-4de8-4668-8568-8dc140a6322b"
note: Done 2026-05-02 consolidation. Backup CronJobs are live on RAILIANCE01 and
have recent successful runs for LLDAP, Authelia, and privacyIDEA. PVC backup
files exist for LLDAP and privacyIDEA enckey; Authelia job logs confirm
/data/backups/authelia.backup.2026-05-02. Break-glass and emergency bundle
state are confirmed in creds-state.yaml. DEFERRED to platform hardening:
CNPG object-store backup (requires MinIO/S3) and Prometheus scraping
(requires kube-prometheus-stack / monitoring CRDs).
```
Consolidation evidence (2026-05-02, RAILIANCE01):
- `lldap-backup`, `authelia-backup`, and `privacyidea-backup` CronJobs exist
and have recent successful runs.
- Latest job logs confirm:
- LLDAP: `/data/backups/users.backup.2026-05-02`
- Authelia: `/data/backups/authelia.backup.2026-05-02`
- privacyIDEA: `/data/backups/enckey.backup.2026-05-02`
- LLDAP PVC contains daily `users.backup.*` files through 2026-05-02.
- privacyIDEA PVC contains daily `enckey.backup.*` files through 2026-05-02.
- `creds-state.yaml` confirms:
- `ops_bundle_created: true`
- `emergency_bundle_delivered: true`
- `bootstrap_complete: true`
- DR runbook is present at `sso-mfa/k8s/backup/DR-RUNBOOK.md`.
- NetworkPolicies include default-deny and backup API egress allowance.
Deferred platform-hardening items:
- CNPG PostgreSQL object-store backup: CNPG is healthy, but no
`ScheduledBackup` resource is installed on RAILIANCE01. This requires a
MinIO/S3 target and should be tracked with the platform backup work rather
than blocking this SSO/MFA deployment workplan.
- Prometheus scraping: monitoring CRDs are not installed on RAILIANCE01
(`servicemonitor` resource type is absent). This requires a
kube-prometheus-stack deployment and should be tracked with cluster
observability work.
## Done criteria
- [x] Credentials: `bootstrap_complete: true` in `creds-state.yaml` (NK-WP-0005)
- [x] verify-t08.sh: PASS=15, FAIL=0 (WARNs are manual offsite confirmation only)
- [x] KeyCape acceptance test suite passes
- [x] DB restore drill completed (LLDAP SQLite — 2 users, all tables verified)
- [x] Emergency bundle delivered and stored in personal password manager (`creds-state.yaml`)
- [x] Ops bundle created and location recorded (`creds-state.yaml`)
- [x] privacyIDEA enckey backed up on PVC (/etc/privacyidea/backups/enckey.backup.*)
- [x] Monitoring/CNPG object-store backups explicitly deferred to platform hardening