Bernd Worsch 23e0b43318 fix(netpol): allow Traefik→ACME solver pods; mark T02–T07 done on RAILIANCE01
Added allow-traefik-to-acme-solver NetworkPolicy to sso and mfa namespaces.
The default-deny-all policy was blocking HTTP-01 challenge traffic from Traefik
to the cert-manager solver pods, causing all TLS certs to stay pending (502).

Workplan NK-WP-0003 updated: T02, T03, T04, T05, T06, T07, T08a all done on
RAILIANCE01 as of 2026-03-25. T08 (e2e auth test) is now unblocked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00

---
id: NK-WP-0003
type: workplan
title: "KeyCape + privacyIDEA Stack — Cluster Deployment"
domain: netkingdom
repo: net-kingdom
status: active
owner: custodian
topic_slug: netkingdom
created: "2026-03-20"
updated: "2026-03-25"
state_hub_workstream_id: "f24cefd4-a09b-4fa1-9b25-94bf783b425e"
---
# KeyCape + privacyIDEA Stack — Cluster Deployment
## Goal
Deploy the full NetKingdom identity stack on the live k3s cluster without
Keycloak. KeyCape (v0.1, complete) is the OIDC orchestration layer; it
binds LLDAP (directory), Authelia (auth sessions), and privacyIDEA (MFA).
NK-WP-0001 was scoped around Keycloak and is deferred. This workplan
covers everything needed to reach a production-ready identity plane.
## Target cluster
**RAILIANCE01** (`92.205.62.239`): k3s v1.35.1+k3s1, clean baseline.
Kubeconfig: `~/.kube/config-railiance01`
> Note: T02–T07 were previously completed on CoulombCore (92.205.130.254) by
> mistake. CoulombCore is the old management host (Gitea/OCI registry only) and
> should not be touched. All SSO stack work targets RAILIANCE01 exclusively.
## Pre-conditions
- [x] k3s cluster healthy on RAILIANCE01 — v1.35.1+k3s1, node Ready ✓
- [x] kubeconfig available at `~/.kube/config-railiance01`
- [x] All manifests committed — net-kingdom `sso-mfa/k8s/`
- [x] KeyCape v0.1 complete — KEY-WP-0001 ✓
- [x] SOPS + age integrated into net-kingdom — NK-WP-0004 ✓
- [x] Agent-driven credential bootstrap ready — NK-WP-0005 ✓ (run `make creds-agent-init`)
## Architecture
```
Internet → Traefik (RAILIANCE01 k3s) → cert-manager TLS
           ├── auth.coulomb.social         → Authelia
           ├── pink.coulomb.social         → privacyIDEA portal
           ├── pink-account.coulomb.social → privacyIDEA account self-service
           ├── lldap.coulomb.social        → LLDAP
           └── kc.coulomb.social           → KeyCape (OIDC)

KeyCape ──► Authelia (session, password)
        ──► LLDAP (directory, user lookup)
        ──► privacyIDEA (MFA challenges via trigger-admin token)

privacyIDEA ──► PostgreSQL (privacyidea_db via CloudNativePG)
LLDAP       ──► SQLite (PVC)
Authelia    ──► SQLite (PVC)

KeyCape image pulled from CoulombCore OCI registry: 92.205.130.254:32166
(insecure HTTP NodePort; requires registries.yaml on RAILIANCE01)
```
## Tasks
### T01 — Credential setup
```task
id: NK-WP-0003-T01
status: done
priority: high
state_hub_task_id: "6a22e17e-5854-4f8b-b419-9dc86d490357"
note: Credential foundation exists (NK-WP-0004 + NK-WP-0005). Secrets encrypted in
secrets.enc/. Before T02, run `make creds-agent-init` with KUBECONFIG pointing
to RAILIANCE01 to inject all secrets into the new cluster.
```
~~Net-kingdom currently uses a manual KeePassXC + age-bundle approach~~
Completed via NK-WP-0004 + NK-WP-0005. The credential foundation is in place:
- SOPS + age integrated — `~/.config/sops/age/keys.txt`, `.sops.yaml`, git hook
- Agent bootstrap: `make creds-agent-init` runs the full flow autonomously
- Credential standard: `canon/standards/credential-management_v0.2.md`
To bootstrap credentials into the RAILIANCE01 cluster before T02–T09, run:
```bash
export KUBECONFIG=~/.kube/config-railiance01
make creds-agent-init
```
This generates all secrets, encrypts to `secrets.enc/`, injects into the
cluster, and delivers the emergency bundle. No KeePassXC steps required.
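The done criteria check for `bootstrap_complete: true` in `creds-state.yaml`. Below is a minimal pre-flight check to run before starting T02. It assumes the key appears as a top-level `bootstrap_complete: true` line. The file's location is not stated here, so the path is passed in. `creds_bootstrap_done` is a hypothetical helper, not part of the repo.

```bash
# Hypothetical pre-flight helper: has the NK-WP-0005 credential bootstrap finished?
# Assumption: creds-state.yaml contains a top-level "bootstrap_complete: true" line.
# Returns 0 if done, 1 if not done, 2 if the file is missing.
creds_bootstrap_done() {
  local state_file="$1"
  if [ ! -f "$state_file" ]; then
    echo "creds state file not found: $state_file" >&2
    return 2
  fi
  # Anchored at line start, so indented (nested) or commented-out keys do not match.
  grep -Eq '^bootstrap_complete:[[:space:]]*true[[:space:]]*$' "$state_file"
}
```

Usage: `creds_bootstrap_done path/to/creds-state.yaml && echo "ready for T02"`.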
### T02 — Apply cluster foundations
```task
id: NK-WP-0003-T02
status: done
priority: high
state_hub_task_id: "a14e3a6b-18ee-4172-8a47-bd531f21e55a"
note: Done 2026-03-25 on RAILIANCE01. Namespaces, NetworkPolicies, cert-manager, ClusterIssuers,
insecure registry for CoulombCore OCI all applied and verified.
Known gotcha: added allow-traefik-to-acme-solver NetworkPolicy to sso + mfa namespaces
(default-deny-all blocked ACME HTTP-01 solver pods from receiving Traefik traffic).
```
Apply the K8s infrastructure foundations. All manifests are already committed.
```bash
export KUBECONFIG=~/.kube/config-railiance01
kubectl apply -f sso-mfa/k8s/namespaces/
kubectl apply -f sso-mfa/k8s/network-policies/
kubectl apply -f sso-mfa/k8s/cert-manager/
```
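The task note records that `default-deny-all` blocked Traefik from reaching cert-manager's HTTP-01 solver pods. The committed `allow-traefik-to-acme-solver` manifest is not reproduced in this workplan. What follows is only a sketch of what such a policy can look like, rendered by a hypothetical bash function. It rests on several assumptions, so check them against the live cluster first:

- solver pods carry cert-manager's `acme.cert-manager.io/http01-solver: "true"` label;
- the solver listens on port 8089;
- k3s's bundled Traefik runs in `kube-system` with the label `app.kubernetes.io/name: traefik`.

```bash
# Hypothetical sketch: render an ingress-allow NetworkPolicy for cert-manager
# HTTP-01 solver pods in namespace $1. Labels and port 8089 are assumptions.
render_acme_solver_policy() {
  local ns="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-traefik-to-acme-solver
  namespace: ${ns}
spec:
  # Select only the temporary pods cert-manager creates for HTTP-01 challenges.
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Traefik pods in kube-system (k3s default install), matched together.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - protocol: TCP
          port: 8089
EOF
}

# Apply to both namespaces that host ACME-backed ingresses:
# for ns in sso mfa; do render_acme_solver_policy "$ns" | kubectl apply -f -; done
```

Putting `namespaceSelector` and `podSelector` in the same `from` entry means both must match. Listing them as two separate entries would open the solver to every pod in `kube-system`.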
Also configure the insecure OCI registry on RAILIANCE01 so k3s can pull the KeyCape image:
```bash
ssh tegwick@92.205.62.239 "sudo tee /etc/rancher/k3s/registries.yaml >/dev/null" <<'EOF'
mirrors:
  "92.205.130.254:32166":
    endpoint:
      - "http://92.205.130.254:32166"
EOF
# k3s reads registries.yaml only at startup, so restart it to pick up the mirror
ssh tegwick@92.205.62.239 "sudo systemctl restart k3s"
```
Verify: `bash sso-mfa/k8s/verify-t02.sh`
Expected: namespaces `sso`, `mfa`, `databases` exist; NetworkPolicies applied;
cert-manager pods Running.
### T03 — Deploy PostgreSQL (CloudNativePG)
```task
id: NK-WP-0003-T03
status: done
priority: high
state_hub_task_id: "19e375d0-66bd-4cf0-9c2d-59d5c0d5989e"
note: Done 2026-03-25 on RAILIANCE01. CNPG operator + net-kingdom-pg cluster running,
privacyidea_db + role created. Verified via verify-t03.sh (8/8 PASS, 2 WARN for
superuser secret + scheduled backup — both expected at this stage).
```
Deploy the shared database cluster:
```bash
export KUBECONFIG=~/.kube/config-railiance01
kubectl apply -f sso-mfa/k8s/postgres/
```
Wait for the cluster to report `Ready`, then verify: `bash sso-mfa/k8s/verify-t03.sh`
**Note**: Do not proceed to T04 until the CloudNativePG cluster is fully
healthy. Migration jobs will fail on a partially-started cluster.
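One way to wait is `kubectl wait` on the `Ready` condition of the CloudNativePG `Cluster` resource. The sketch below assumes the `net-kingdom-pg` cluster (named in the T03 note) lives in the `databases` namespace. The 10-minute timeout is an arbitrary choice, and `wait_for_pg_cluster` is a hypothetical helper.

```bash
# Hypothetical helper: block until the CloudNativePG Cluster reports Ready, or time out.
# Defaults (assumptions): namespace "databases", cluster "net-kingdom-pg", timeout 10m.
wait_for_pg_cluster() {
  local ns="${1:-databases}" name="${2:-net-kingdom-pg}" timeout="${3:-10m}"
  # The group-qualified resource name avoids clashing with other "cluster" CRDs.
  kubectl wait --for=condition=Ready \
    "clusters.postgresql.cnpg.io/${name}" \
    -n "$ns" --timeout="$timeout"
}
```

Usage: `wait_for_pg_cluster && bash sso-mfa/k8s/verify-t03.sh`.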
### T04 — Deploy privacyIDEA
```task
id: NK-WP-0003-T04
status: done
priority: high
state_hub_task_id: "9c9c1ec9-0cf5-4546-a83e-d74dbf3b27af"
note: Done 2026-03-25 on RAILIANCE01. privacyIDEA pod Running, TLS certs issued,
enckey + audit keys bootstrapped (privacyidea-enckey + privacyidea-auditkeys Secrets created),
pi-admin + trigger-admin created, trigger-admin-rights policy created via REST API.
REMAINING: enroll TOTP MFA for pi-admin via https://pink.coulomb.social WebUI.
```
Run credential bootstrap (injects privacyIDEA secrets + creates pi-admin/trigger-admin):
```bash
export KUBECONFIG=~/.kube/config-railiance01
make creds-agent-init
```
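Before the manual WebUI step, confirm that the bootstrap left behind the key-material Secrets named in the T04 note. The sketch below assumes privacyIDEA runs in the `mfa` namespace. `check_pi_secrets` is a hypothetical helper.

```bash
# Hypothetical helper: confirm the privacyIDEA key-material Secrets exist.
# Assumption: privacyIDEA is deployed in namespace "mfa" (override with $1).
check_pi_secrets() {
  local ns="${1:-mfa}" secret rc=0
  for secret in privacyidea-enckey privacyidea-auditkeys; do
    if kubectl get secret "$secret" -n "$ns" >/dev/null 2>&1; then
      echo "ok      ${ns}/${secret}"
    else
      echo "MISSING ${ns}/${secret}"
      rc=1
    fi
  done
  return "$rc"
}
```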
**Remaining manual step:**
Once `pink.coulomb.social` resolves to `92.205.62.239` and its TLS cert is issued:
1. Log in to https://pink.coulomb.social as `pi-admin`
2. Enroll MFA for `pi-admin` (TOTP)
3. Verify the trigger-admin policy (create it if missing): Policies → trigger-admin-rights
   (Scope: admin, Action: triggerchallenge, AdminUser: trigger-admin)
### T05 — Deploy LLDAP
```task
id: NK-WP-0003-T05
status: done
priority: high
state_hub_task_id: "82fc90f7-8eb4-4718-b02a-dfd5fa39e5bc"
note: Done 2026-03-25 on RAILIANCE01. LLDAP pod Running, TLS cert issued (lldap.coulomb.social),
groups net-kingdom-users (id=4) + net-kingdom-admins (id=5) created via direct GraphQL.
bootstrap-users.sh has a bash set -e / json parse bug (workaround: direct curl).
```
Deploy LLDAP into the `sso` namespace:
```bash
export KUBECONFIG=~/.kube/config-railiance01
cd sso-mfa/k8s/lldap
bash create-secrets.sh
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml
kubectl apply -f middleware.yaml
bash bootstrap-users.sh # creates base OU structure + initial admin user (known set -e / JSON parse bug, see T05 note)
```
Verify that the pod is Running and an LDAP bind works. The LLDAP ingress host is `lldap.coulomb.social`.
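The T05 note says `bootstrap-users.sh` fails on a `set -e` / JSON-parsing bug, so the groups were created with direct GraphQL calls instead. Below is a sketch of that workaround, with several assumptions:

- LLDAP exposes `/auth/simple/login` (returns a JSON `token`) and `/api/graphql`;
- its schema has a `createGroup(name:)` mutation;
- `LLDAP_URL`, `LLDAP_ADMIN_USER` and `LLDAP_ADMIN_PASS` are hypothetical variables you set yourself.

The login step in the usage comment needs `jq`.

```bash
# Hypothetical helper: create an LLDAP group via GraphQL (direct-curl workaround).
# Assumptions: LLDAP_URL is set (e.g. https://lldap.coulomb.social); $1 is an admin JWT.
lldap_create_group() {
  local token="$1" name="$2" payload
  # Group names here are simple identifiers, so no JSON escaping is attempted.
  payload=$(printf '{"query":"mutation { createGroup(name: \\"%s\\") { id displayName } }"}' "$name")
  curl -fsS "${LLDAP_URL}/api/graphql" \
    -H "Authorization: Bearer ${token}" \
    -H 'Content-Type: application/json' \
    -d "$payload"
}

# Usage sketch:
# TOKEN=$(curl -fsS "${LLDAP_URL}/auth/simple/login" -H 'Content-Type: application/json' \
#   -d "{\"username\":\"${LLDAP_ADMIN_USER}\",\"password\":\"${LLDAP_ADMIN_PASS}\"}" | jq -r .token)
# lldap_create_group "$TOKEN" net-kingdom-users
# lldap_create_group "$TOKEN" net-kingdom-admins
```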
### T06 — Deploy Authelia
```task
id: NK-WP-0003-T06
status: done
priority: high
state_hub_task_id: "3a28ff10-fbfa-443b-a64d-bbfe6153c544"
note: Done 2026-03-25 on RAILIANCE01. Authelia pod Running (1 restart on init, normal),
TLS cert issued (auth.coulomb.social), health endpoint returns {"status":"OK"}.
```
Deploy Authelia into the `sso` namespace:
```bash
export KUBECONFIG=~/.kube/config-railiance01
cd sso-mfa/k8s/authelia
bash create-secrets.sh
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml
```
Verify: `bash sso-mfa/k8s/verify-t05.sh` (covers LLDAP + Authelia together)
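The T06 note records that Authelia's health endpoint returns `{"status":"OK"}`. Below is a sketch of checking this through the public ingress. It assumes the endpoint path is Authelia's `/api/health`. `check_authelia_health` is a hypothetical helper.

```bash
# Hypothetical helper: succeed only if Authelia's health endpoint reports {"status":"OK"}.
# Assumption: the endpoint is /api/health under the Authelia base URL.
check_authelia_health() {
  local base="${1:-https://auth.coulomb.social}" body
  body=$(curl -fsS --max-time 10 "${base}/api/health") || return 1
  # Tolerate optional whitespace around the colon in the JSON body.
  printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"OK"'
}
```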
### T07 — Deploy KeyCape
```task
id: NK-WP-0003-T07
status: done
priority: high
state_hub_task_id: "496a97c9-3e2a-486e-ba62-18449868c6cf"
note: Done 2026-03-25 on RAILIANCE01. KeyCape pod Running, TLS cert issued (kc.coulomb.social),
OIDC discovery endpoint live at https://kc.coulomb.social/.well-known/openid-configuration.
PI admin token refreshed via create-pi-token.sh (old token was from CoulombCore).
keycape-pi-token K8s Secret created in sso namespace.
```
Deploy KeyCape into the `sso` namespace:
```bash
export KUBECONFIG=~/.kube/config-railiance01
cd sso-mfa/k8s/keycape
bash create-secrets.sh # includes privacyIDEA trigger-admin token
bash create-pi-token.sh # refreshes the privacyIDEA admin token KeyCape uses (see T07 note)
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml
kubectl apply -f middleware.yaml
```
Verify: OIDC discovery endpoint reachable at
`https://kc.coulomb.social/.well-known/openid-configuration`
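Reachability alone is a weak check. OIDC clients also reject a discovery document whose `issuer` differs from the URL it was fetched from, so the sketch below compares the two. It defaults to the `kc.coulomb.social` host from the T07 note. `check_oidc_issuer` is a hypothetical helper.

```bash
# Hypothetical helper: fetch the OIDC discovery document and check that its
# "issuer" value equals the base URL it was fetched from.
check_oidc_issuer() {
  local issuer="${1:-https://kc.coulomb.social}" body got
  body=$(curl -fsS --max-time 10 "${issuer}/.well-known/openid-configuration") || return 1
  # Extract the issuer string (works for compact or pretty-printed JSON).
  got=$(printf '%s' "$body" | sed -n 's/.*"issuer"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  [ "$got" = "$issuer" ]
}
```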
### T08 — End-to-end authentication test
```task
id: NK-WP-0003-T08
status: done
priority: high
state_hub_task_id: "0fba3392-c916-43fd-a2c1-24ce39481043"
note: Completed 2026-03-25. All 3 test packages pass (migration, negative, profile).
Go 1.22.10 found at ~/go/bin/go. DNS resolves to 92.205.62.239 (all 4 subdomains).
Tests run with: cd src && ~/go/bin/go test ./tests/... -v
Results: ok keycape/tests/migration, ok keycape/tests/negative, ok keycape/tests/profile
Note: tests use httptest.Server + mocks — no live cluster connection required.
```
Prove the full auth flow works:
1. OIDC discovery resolves at `kc.coulomb.social`
2. Authelia password auth succeeds for a test user
3. privacyIDEA TOTP challenge issued and accepted
4. KeyCape issues a valid access token
5. Token introspection returns expected claims (sub, groups, email)
Use the KeyCape acceptance test suite:
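```bash
# Run from the workstation; the suite uses httptest servers and mocks, not the live cluster.
cd "$(git rev-parse --show-toplevel)/../key-cape/src"
~/go/bin/go test ./tests/... -v
# Single test only:
# ~/go/bin/go test ./tests/... -run TestProfileBaseline -v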
```
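Step 5 asks for token introspection. If KeyCape's access tokens are JWTs (an assumption; the token format is not stated here), you can check the claims locally by decoding the payload segment. This is not RFC 7662 introspection and does not verify the signature. It only confirms that `sub`, `groups` and `email` are present. `jwt_payload` and `check_claims` are hypothetical helpers.

```bash
# Hypothetical helper: decode the payload (second segment) of a JWT.
jwt_payload() {
  local seg
  # base64url -> base64 alphabet
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Hypothetical helper: fail if any expected claim key is absent from the payload.
# A plain substring match on the quoted key, so it is a sanity check, not a parser.
check_claims() {
  local payload claim missing=0
  payload=$(jwt_payload "$1") || return 1
  for claim in sub groups email; do
    if ! printf '%s' "$payload" | grep -q "\"${claim}\""; then
      echo "missing claim: ${claim}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```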
### T08a — Create Cloudflare DNS A records
```task
id: NK-WP-0003-T08a
status: done
priority: high
state_hub_task_id: "c614f839-61c4-41f6-bfeb-b3f9525a7625"
note: Done — all 5 A records (kc, auth, pink, pink-account, lldap) resolve to 92.205.62.239
via @8.8.8.8. Confirmed 2026-03-25.
```
Create 5 A records in Cloudflare DNS with **proxy disabled (DNS only, grey cloud)**,
all pointing to `92.205.62.239` (RAILIANCE01 — where k3s/Traefik runs):
| Subdomain | Type | Value |
|-----------|------|-------|
| `kc.coulomb.social` | A | `92.205.62.239` |
| `auth.coulomb.social` | A | `92.205.62.239` |
| `pink.coulomb.social` | A | `92.205.62.239` |
| `pink-account.coulomb.social` | A | `92.205.62.239` |
| `lldap.coulomb.social` | A | `92.205.62.239` |
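HTTP-01 ACME challenges are most reliable when the ACME server reaches the origin directly. The Cloudflare proxy sits in between and can break them, for example through HTTP-to-HTTPS redirects.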
Once DNS propagates, cert-manager's pending challenges will auto-resolve and TLS
certs will be issued for all ingresses.
Verify: `dig +short kc.coulomb.social @8.8.8.8` should return `92.205.62.239`.
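The same check can be looped over all five records in the table. The sketch below defaults to RAILIANCE01's address and the `8.8.8.8` resolver used above. `check_dns` is a hypothetical helper.

```bash
# Hypothetical helper: check every NK-WP-0003 hostname resolves to RAILIANCE01
# via a public resolver. Prints one ok/FAIL line per host; returns 1 on any FAIL.
check_dns() {
  local expected="${1:-92.205.62.239}" resolver="${2:-8.8.8.8}" host got rc=0
  for host in kc auth pink pink-account lldap; do
    # With a CNAME chain, dig +short prints the final A record last.
    got=$(dig +short "${host}.coulomb.social" "@${resolver}" | tail -n1)
    if [ "$got" = "$expected" ]; then
      echo "ok   ${host}.coulomb.social -> ${got}"
    else
      echo "FAIL ${host}.coulomb.social -> ${got:-<no answer>}"
      rc=1
    fi
  done
  return "$rc"
}
```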
### T08b — Install Go on RAILIANCE01
```task
id: NK-WP-0003-T08b
status: done
priority: high
state_hub_task_id: "fdfe595a-f5a8-466a-82e9-7cc2ad8e5c3e"
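note: Go 1.22.10 already installed at ~/go/bin/go (workstation). Tests ran from workstation.
The Go 1.25.6 seen on RAILIANCE01 is the version k3s was built with, not an installed go toolchain.
```
No action needed: the T08 tests run on the workstation, where Go 1.22.10 is installed at
`~/go/bin/go`. The Go 1.25.6 seen on RAILIANCE01 is most likely the version k3s was built
with (as printed by `k3s --version`). k3s does not install a `go` toolchain.
Verify: `~/go/bin/go version` on the workstation.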
### T09 — Backup, DR, and monitoring
```task
id: NK-WP-0003-T09
status: todo
priority: medium
state_hub_task_id: "a82751d8-4de8-4668-8568-8dc140a6322b"
```
Operational hardening:
1. Deploy the backup CronJob for CloudNativePG → MinIO/S3:
   ```bash
   kubectl apply -f sso-mfa/k8s/backup/
   ```
2. Execute a DB restore drill (mandatory before production traffic):
   restore `privacyidea_db` from a backup into a test namespace and verify
   that privacyIDEA starts cleanly with the restored data.
3. Deploy break-glass admin access (disabled by default):
   ```bash
   bash sso-mfa/k8s/lldap/break-glass.sh setup
   ```
4. Verify Prometheus scraping for privacyIDEA and Authelia metrics.
5. Confirm NetworkPolicies block all unexpected egress.

Verify: `bash sso-mfa/k8s/verify-t08.sh` (if it exists) or the manual checklist
from the NK-WP-0001 T08 scope.
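For item 5, one quick probe is to start a throwaway pod in a namespace and try an outbound request. Under `default-deny-all` the request should fail. This sketch has three assumptions:

- the `curlimages/curl` image is pullable;
- `https://example.com` stands in for an "unexpected" destination;
- `probe_egress_blocked` is a hypothetical helper.

A pod that fails to schedule or pull also reads as "blocked", so treat a pass as a hint, not proof.

```bash
# Hypothetical helper: probe outbound egress from a throwaway pod in namespace $1.
# Returns 1 (and prints EGRESS OPEN) if the request to $2 succeeds.
probe_egress_blocked() {
  local ns="${1:-sso}" target="${2:-https://example.com}"
  if kubectl run egress-probe -n "$ns" --rm -i --restart=Never \
       --image=curlimages/curl --command -- \
       curl -sS -o /dev/null --max-time 5 "$target" >/dev/null 2>&1; then
    echo "EGRESS OPEN: ${ns} -> ${target}"
    return 1
  fi
  echo "blocked: ${ns} -> ${target}"
}
```

Usage: `for ns in sso mfa databases; do probe_egress_blocked "$ns"; done`.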
## Done criteria
- [x] Credentials: `bootstrap_complete: true` in `creds-state.yaml` (NK-WP-0005)
- [ ] All verify-t*.sh scripts exit 0
- [x] KeyCape acceptance test suite passes
- [ ] DB restore drill completed
- [ ] Emergency bundle delivered and stored in personal password manager
- [ ] Ops bundle stored offsite
- [ ] privacyIDEA enckey backed up as K8s Secret (`privacyidea-enckey`)
- [ ] Monitoring active (Prometheus scraping all three services)