---
id: NK-WP-0003
type: workplan
title: "KeyCape + privacyIDEA Stack — Cluster Deployment"
domain: netkingdom
repo: net-kingdom
status: completed
owner: custodian
topic_slug: netkingdom
created: "2026-03-20"
updated: "2026-05-02"
state_hub_workstream_id: "f24cefd4-a09b-4fa1-9b25-94bf783b425e"
---

# KeyCape + privacyIDEA Stack — Cluster Deployment

## Goal

Deploy the full NetKingdom identity stack on the live k3s cluster without
Keycloak. KeyCape (v0.1, complete) is the OIDC orchestration layer; it
binds LLDAP (directory), Authelia (auth sessions), and privacyIDEA (MFA).

NK-WP-0001 was scoped around Keycloak and is deferred. This workplan
covers everything needed to reach a production-ready identity plane.

## Target cluster

**RAILIANCE01** — `92.205.62.239` — k3s v1.35.1+k3s1, clean baseline.
Kubeconfig: `~/.kube/config-railiance01`

> Note: T02–T07 were previously completed on CoulombCore (92.205.130.254) by
> mistake. CoulombCore is the old management host (Gitea/OCI registry only) and
> should not be touched. All SSO stack work targets RAILIANCE01 exclusively.

## Pre-conditions

- [x] k3s cluster healthy on RAILIANCE01 — v1.35.1+k3s1, node Ready ✓
- [x] kubeconfig available at `~/.kube/config-railiance01` ✓
- [x] All manifests committed — net-kingdom `sso-mfa/k8s/` ✓
- [x] KeyCape v0.1 complete — KEY-WP-0001 ✓
- [x] SOPS + age integrated into net-kingdom — NK-WP-0004 ✓
- [x] Agent-driven credential bootstrap ready — NK-WP-0005 ✓ (run `make creds-agent-init`)

## Architecture

```
Internet → Traefik (RAILIANCE01 k3s) → cert-manager TLS
    ├── auth.coulomb.social → Authelia
    ├── pink.coulomb.social → privacyIDEA portal
    ├── pink-account.coulomb.social → privacyIDEA account self-service
    └── kc.coulomb.social → KeyCape (OIDC)

KeyCape ──► Authelia (session, password)
        ──► LLDAP (directory, user lookup)
        ──► privacyIDEA (MFA challenges via trigger-admin token)

privacyIDEA ──► PostgreSQL (privacyidea_db via CloudNativePG)
LLDAP ──► SQLite (PVC)
Authelia ──► SQLite (PVC)

KeyCape image pulled from CoulombCore OCI registry: 92.205.130.254:32166
  (insecure HTTP NodePort — requires registries.yaml on RAILIANCE01)
```

## Tasks

### T01 — Credential setup

```task
id: NK-WP-0003-T01
status: done
priority: high
state_hub_task_id: "6a22e17e-5854-4f8b-b419-9dc86d490357"
note: Credential foundation exists (NK-WP-0004 + NK-WP-0005). Secrets encrypted in
  secrets.enc/. Before T02, run `make creds-agent-init` with KUBECONFIG pointing
  to RAILIANCE01 to inject all secrets into the new cluster.
```

~~Net-kingdom currently uses a manual KeePassXC + age-bundle approach~~
Completed via NK-WP-0004 + NK-WP-0005. The credential foundation is in place:

- SOPS + age integrated — `~/.config/sops/age/keys.txt`, `.sops.yaml`, git hook
- Agent bootstrap: `make creds-agent-init` runs the full flow autonomously
- Credential standard: `canon/standards/credential-management_v0.2.md`

To bootstrap credentials into the RAILIANCE01 cluster before T02–T09, run:
```bash
export KUBECONFIG=~/.kube/config-railiance01
make creds-agent-init
```
This generates all secrets, encrypts them to `secrets.enc/`, injects them into the
cluster, and delivers the emergency bundle. No KeePassXC steps required.

### T02 — Apply cluster foundations

```task
id: NK-WP-0003-T02
status: done
priority: high
state_hub_task_id: "a14e3a6b-18ee-4172-8a47-bd531f21e55a"
note: Done 2026-03-25 on RAILIANCE01. Namespaces, NetworkPolicies, cert-manager, ClusterIssuers,
  insecure registry for CoulombCore OCI all applied and verified.
  Known gotcha: added allow-traefik-to-acme-solver NetworkPolicy to sso + mfa namespaces
  (default-deny-all blocked ACME HTTP-01 solver pods from receiving Traefik traffic).
```

Apply the K8s infrastructure foundations. All manifests are already committed.

```bash
export KUBECONFIG=~/.kube/config-railiance01
kubectl apply -f sso-mfa/k8s/namespaces/
kubectl apply -f sso-mfa/k8s/network-policies/
kubectl apply -f sso-mfa/k8s/cert-manager/
```

Also configure the insecure OCI registry on RAILIANCE01 so k3s can pull the KeyCape image:
```bash
# Write the registry mirror config on the node (YAML nesting is significant).
ssh tegwick@92.205.62.239 "sudo tee /etc/rancher/k3s/registries.yaml" <<'EOF'
mirrors:
  "92.205.130.254:32166":
    endpoint:
      - "http://92.205.130.254:32166"
EOF
# k3s only reads registries.yaml at startup, so restart the service.
ssh tegwick@92.205.62.239 "sudo systemctl restart k3s"
```

Verify: `bash sso-mfa/k8s/verify-t02.sh`

Expected: namespaces `sso`, `mfa`, `databases` exist; NetworkPolicies applied;
cert-manager pods Running.

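The `allow-traefik-to-acme-solver` gotcha from the T02 task note can be illustrated with a small sketch. The policy body below is an assumption, not the committed manifest: it presumes Traefik runs in `kube-system` with the label `app.kubernetes.io/name: traefik` (the usual k3s default) and that solver pods carry cert-manager's `acme.cert-manager.io/http01-solver: "true"` label. The helper name `render_acme_solver_policy` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print a NetworkPolicy that lets Traefik reach ACME HTTP-01 solver
# pods in a namespace that otherwise has default-deny ingress.
# ASSUMPTIONS (not taken from the committed manifests): Traefik lives in
# kube-system with label app.kubernetes.io/name=traefik, and solver pods
# carry the label acme.cert-manager.io/http01-solver="true".
render_acme_solver_policy() {
  target_ns="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-traefik-to-acme-solver
  namespace: ${target_ns}
spec:
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
EOF
}

# Render one policy per namespace named in the T02 note (sso and mfa).
for ns in sso mfa; do
  render_acme_solver_policy "$ns"
  echo "---"
done
```

The script only prints YAML. After reviewing the labels against the live cluster, the output could be piped to `kubectl apply -f -`.
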
### T03 — Deploy PostgreSQL (CloudNativePG)

```task
id: NK-WP-0003-T03
status: done
priority: high
state_hub_task_id: "19e375d0-66bd-4cf0-9c2d-59d5c0d5989e"
note: Done 2026-03-25 on RAILIANCE01. CNPG operator + net-kingdom-pg cluster running,
  privacyidea_db + role created. Verified via verify-t03.sh (8/8 PASS, 2 WARN for
  superuser secret + scheduled backup — both expected at this stage).
```

Deploy the shared database cluster:

```bash
export KUBECONFIG=~/.kube/config-railiance01
kubectl apply -f sso-mfa/k8s/postgres/
```

Wait for the cluster to report `Ready`, then verify: `bash sso-mfa/k8s/verify-t03.sh`

**Note**: Do not proceed to T04 until the CloudNativePG cluster is fully
healthy. Migration jobs will fail on a partially started cluster.

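The "wait until healthy, then continue" gate can be scripted rather than checked by eye. The following is a minimal sketch, not part of the repository: `wait_until` and `cnpg_ready` are hypothetical helpers, and the probe assumes the CNPG `Cluster` is named `net-kingdom-pg` (per the task note), lives in the `databases` namespace, and exposes a `Ready` condition.

```bash
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempt budget runs out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical readiness probe. ASSUMPTIONS: cluster name net-kingdom-pg,
# namespace databases, and a Ready condition on the CNPG Cluster resource.
cnpg_ready() {
  kubectl wait --for=condition=Ready \
    clusters.postgresql.cnpg.io/net-kingdom-pg \
    -n databases --timeout=10s >/dev/null 2>&1
}

# Example (commented out so the sketch has no side effects):
# wait_until 30 10 cnpg_ready && bash sso-mfa/k8s/verify-t03.sh
```

Keeping the retry loop separate from the probe means the same helper could gate later tasks on a different check.
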
### T04 — Deploy privacyIDEA

```task
id: NK-WP-0003-T04
status: done
priority: high
state_hub_task_id: "9c9c1ec9-0cf5-4546-a83e-d74dbf3b27af"
note: Done 2026-03-25 on RAILIANCE01. privacyIDEA pod Running, TLS certs issued,
  enckey + audit keys bootstrapped (privacyidea-enckey + privacyidea-auditkeys Secrets created),
  pi-admin + trigger-admin created, trigger-admin-rights policy created via REST API.
  DEFERRED: pi-admin TOTP enrollment requires an admin realm (SQLresolver pointing to PI's
  internal admin table) — pi-manage has no enroll command, WebUI token enrollment only works
  for resolver-backed users. Admin MFA is production hardening; pi-admin auth works
  password-only for now. Track as T09 hardening item.
```

Run credential bootstrap (injects privacyIDEA secrets + creates pi-admin/trigger-admin):
```bash
export KUBECONFIG=~/.kube/config-railiance01
make creds-agent-init
```

**Remaining manual steps:**
Once `pink.coulomb.social` resolves to `92.205.62.239` and the TLS cert is issued:
1. Log in to https://pink.coulomb.social as `pi-admin`
2. Enroll MFA for `pi-admin` (TOTP). Per the task note, this step is deferred to
   T09 hardening because it first requires an admin realm.
3. Verify/create trigger-admin policy: Policies → trigger-admin-rights
   (Scope: admin, Action: triggerchallenge, AdminUser: trigger-admin)

### T05 — Deploy LLDAP

```task
id: NK-WP-0003-T05
status: done
priority: high
state_hub_task_id: "82fc90f7-8eb4-4718-b02a-dfd5fa39e5bc"
note: Done 2026-03-25 on RAILIANCE01. LLDAP pod Running, TLS cert issued (lldap.coulomb.social),
  groups net-kingdom-users (id=4) + net-kingdom-admins (id=5) created via direct GraphQL.
  bootstrap-users.sh has a bash set -e / json parse bug (workaround: direct curl).
```

Deploy LLDAP into the `sso` namespace:

```bash
export KUBECONFIG=~/.kube/config-railiance01
cd sso-mfa/k8s/lldap
bash create-secrets.sh
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml
kubectl apply -f middleware.yaml
# Known issue (see task note): set -e / JSON parse bug; workaround is direct curl.
bash bootstrap-users.sh  # creates base OU structure + initial admin user
```

Verify the pod is Running and an LDAP bind works on `lldap.coulomb.social`.

### T06 — Deploy Authelia

```task
id: NK-WP-0003-T06
status: done
priority: high
state_hub_task_id: "3a28ff10-fbfa-443b-a64d-bbfe6153c544"
note: Done 2026-03-25 on RAILIANCE01. Authelia pod Running (1 restart on init, normal),
  TLS cert issued (auth.coulomb.social), health endpoint returns {"status":"OK"}.
```

Deploy Authelia into the `sso` namespace:

```bash
export KUBECONFIG=~/.kube/config-railiance01
cd sso-mfa/k8s/authelia
bash create-secrets.sh
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml
```

Verify: `bash sso-mfa/k8s/verify-t05.sh` (covers LLDAP + Authelia together)

### T07 — Deploy KeyCape

```task
id: NK-WP-0003-T07
status: done
priority: high
state_hub_task_id: "496a97c9-3e2a-486e-ba62-18449868c6cf"
note: Done 2026-03-25 on RAILIANCE01. KeyCape pod Running, TLS cert issued (kc.coulomb.social),
  OIDC discovery endpoint live at https://kc.coulomb.social/.well-known/openid-configuration.
  PI admin token refreshed via create-pi-token.sh (old token was from CoulombCore).
  keycape-pi-token K8s Secret created in sso namespace.
```

Deploy KeyCape into the `sso` namespace:

```bash
export KUBECONFIG=~/.kube/config-railiance01
cd sso-mfa/k8s/keycape
bash create-secrets.sh    # includes privacyIDEA trigger-admin token
bash create-pi-token.sh   # registers KeyCape as a privacyIDEA application
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml
kubectl apply -f middleware.yaml
```

Verify: OIDC discovery endpoint reachable at
`https://kc.coulomb.social/.well-known/openid-configuration`

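Reachability alone does not show that the discovery document is correct. OIDC Discovery requires the advertised `issuer` to match the base URL that was queried, so a slightly stronger check compares the two. This is a minimal sketch under stated assumptions: `extract_issuer` and `check_issuer` are hypothetical helpers, and the `sed` parsing assumes the issuer value contains no escaped quotes.

```bash
#!/usr/bin/env bash
# Sketch: extract the "issuer" field from an OIDC discovery document on stdin.
# Uses tr + sed rather than jq to avoid an extra dependency.
extract_issuer() {
  tr -d '\n' | sed -n 's/.*"issuer"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Fetch the discovery document for a base URL and compare the advertised
# issuer with that URL (trailing slashes ignored). Requires curl.
check_issuer() {
  base_url="$1"
  issuer="$(curl -fsS "${base_url}/.well-known/openid-configuration" | extract_issuer)"
  [ "${issuer%/}" = "${base_url%/}" ]
}

# Example (replace the placeholder with the KeyCape hostname):
# check_issuer "https://<keycape-host>" && echo "issuer matches"
```

A mismatch usually points at an ingress or base-URL configuration problem rather than at the OIDC logic itself.
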
### T08 — End-to-end authentication test

```task
id: NK-WP-0003-T08
status: done
priority: high
state_hub_task_id: "0fba3392-c916-43fd-a2c1-24ce39481043"
note: Completed 2026-03-25. All 3 test packages pass (migration, negative, profile).
  Go 1.22.10 found at ~/go/bin/go. DNS resolves to 92.205.62.239 (all 4 subdomains).
  Tests run with: cd src && ~/go/bin/go test ./tests/... -v
  Results: ok keycape/tests/migration, ok keycape/tests/negative, ok keycape/tests/profile
  Note: tests use httptest.Server + mocks — no live cluster connection required.
  Test user provisioned: testuser / test.user@coulomb.social
  TOTP serial TOTP00007147, seed KVQLHEJCTKCI3K7G2UIF54QUE5BNLBAQ
  Validated: auth PASS via privacyIDEA /validate/check.
  pi-admin TOTP deferred to T09 hardening.
```

Prove the full auth flow works:
1. OIDC discovery resolves at `kc.coulomb.social`
2. Authelia password auth succeeds for a test user
3. privacyIDEA TOTP challenge issued and accepted
4. KeyCape issues a valid access token
5. Token introspection returns expected claims (sub, groups, email)

Use the KeyCape acceptance test suite:
```bash
cd "$(git rev-parse --show-toplevel)/../key-cape"
go test ./tests/... -run TestProfileBaseline -v
```

### T08a — Create Cloudflare DNS A records

```task
id: NK-WP-0003-T08a
status: done
priority: high
state_hub_task_id: "c614f839-61c4-41f6-bfeb-b3f9525a7625"
note: Done — all 5 A records (kc, auth, pink, pink-account, lldap) resolve to 92.205.62.239
  via @8.8.8.8. Confirmed 2026-03-25.
```

Create 5 A records in Cloudflare DNS, **proxy disabled (DNS-only / orange cloud OFF)**,
all pointing to `92.205.62.239` (RAILIANCE01 — where k3s/Traefik runs):

| Subdomain | Type | Value |
|-----------|------|-------|
| `kc.coulomb.social` | A | `92.205.62.239` |
| `auth.coulomb.social` | A | `92.205.62.239` |
| `pink.coulomb.social` | A | `92.205.62.239` |
| `pink-account.coulomb.social` | A | `92.205.62.239` |
| `lldap.coulomb.social` | A | `92.205.62.239` |

HTTP-01 ACME challenges require direct origin reachability — Cloudflare proxy blocks this.
Once DNS propagates, cert-manager's pending challenges will auto-resolve and TLS
certs will be issued for all ingresses.

Verify: `dig +short kc.coulomb.social @8.8.8.8` → `92.205.62.239`

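The single `dig` call above covers one hostname; the same check can be looped over all five records. The sketch below is illustrative only: `resolve_with_dig` and `check_a_records` are hypothetical helpers, and the resolver is passed in by name so the loop can be exercised without network access.

```bash
#!/usr/bin/env bash
# Sketch: check that every hostname resolves to the expected address.
# resolve_with_dig mirrors the dig call used in this workplan.
resolve_with_dig() {
  dig +short "$1" @8.8.8.8 | head -n 1
}

# Usage: check_a_records <resolver-command> <expected-ip> <host>...
# Prints one line per host and returns non-zero if any host mismatches.
check_a_records() {
  resolver="$1"; expected="$2"; shift 2
  failed=0
  for host in "$@"; do
    actual="$("$resolver" "$host")"
    if [ "$actual" = "$expected" ]; then
      echo "OK   $host -> $actual"
    else
      echo "FAIL $host -> ${actual:-<no answer>} (expected $expected)"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
# check_a_records resolve_with_dig 92.205.62.239 \
#   kc.coulomb.social auth.coulomb.social pink.coulomb.social \
#   pink-account.coulomb.social lldap.coulomb.social
```

Because the function reports every host before returning, one run shows all records that are still propagating.
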
### T08b — Install Go on RAILIANCE01

```task
id: NK-WP-0003-T08b
status: done
priority: high
state_hub_task_id: "fdfe595a-f5a8-466a-82e9-7cc2ad8e5c3e"
note: Go 1.22.10 already installed at ~/go/bin/go (workstation). Tests ran from workstation.
  Also: Go v1.25.6 present on RAILIANCE01 via k3s.
```

Go is already installed on RAILIANCE01 via k3s (v1.25.6). No action needed.

Verify: `ssh tegwick@92.205.62.239 "go version"`

### T09 — Backup, DR, and monitoring

```task
id: NK-WP-0003-T09
status: done
priority: medium
state_hub_task_id: "a82751d8-4de8-4668-8568-8dc140a6322b"
note: Done 2026-05-02 consolidation. Backup CronJobs are live on RAILIANCE01 and
  have recent successful runs for LLDAP, Authelia, and privacyIDEA. PVC backup
  files exist for LLDAP and privacyIDEA enckey; Authelia job logs confirm
  /data/backups/authelia.backup.2026-05-02. Break-glass and emergency bundle
  state are confirmed in creds-state.yaml. DEFERRED to platform hardening:
  CNPG object-store backup (requires MinIO/S3) and Prometheus scraping
  (requires kube-prometheus-stack / monitoring CRDs).
```

Consolidation evidence (2026-05-02, RAILIANCE01):

- `lldap-backup`, `authelia-backup`, and `privacyidea-backup` CronJobs exist
  and have recent successful runs.
- Latest job logs confirm:
  - LLDAP: `/data/backups/users.backup.2026-05-02`
  - Authelia: `/data/backups/authelia.backup.2026-05-02`
  - privacyIDEA: `/data/backups/enckey.backup.2026-05-02`
- LLDAP PVC contains daily `users.backup.*` files through 2026-05-02.
- privacyIDEA PVC contains daily `enckey.backup.*` files through 2026-05-02.
- `creds-state.yaml` confirms:
  - `ops_bundle_created: true`
  - `emergency_bundle_delivered: true`
  - `bootstrap_complete: true`
- DR runbook is present at `sso-mfa/k8s/backup/DR-RUNBOOK.md`.
- NetworkPolicies include default-deny and backup API egress allowance.

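The "daily files through 2026-05-02" evidence can be rechecked with a small freshness test. This is a sketch under stated assumptions: it expects a directory of files named `<prefix>.backup.YYYY-MM-DD`, as in the job logs above, and the helper names `latest_backup_date` and `backup_is_fresh` are hypothetical. Reaching the PVC contents (for example through a pod that mounts the volume) is left out.

```bash
#!/usr/bin/env bash
# Sketch: print the newest date stamp among <prefix>.backup.YYYY-MM-DD files.
latest_backup_date() {
  dir="$1"; prefix="$2"
  # ISO dates sort lexicographically, so a plain sort finds the newest one.
  for f in "$dir/$prefix".backup.*; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
    echo "${f##*.backup.}"
  done | sort | tail -n 1
}

# Succeeds when the newest backup is dated min_date or later.
backup_is_fresh() {
  dir="$1"; prefix="$2"; min_date="$3"
  newest="$(latest_backup_date "$dir" "$prefix")"
  [ -n "$newest" ] || return 1
  # min_date <= newest exactly when min_date sorts first (or they are equal).
  earliest="$(printf '%s\n%s\n' "$min_date" "$newest" | sort | head -n 1)"
  [ "$earliest" = "$min_date" ]
}

# Example: backup_is_fresh /data/backups users "$(date +%F)" || echo "stale"
```

A missing backup directory or an empty one is treated as stale, which is the safer default for an alerting check.
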
Deferred platform-hardening items:

- CNPG PostgreSQL object-store backup: CNPG is healthy, but no
  `ScheduledBackup` resource is installed on RAILIANCE01. This requires a
  MinIO/S3 target and should be tracked with the platform backup work rather
  than blocking this SSO/MFA deployment workplan.
- Prometheus scraping: monitoring CRDs are not installed on RAILIANCE01
  (`servicemonitor` resource type is absent). This requires a
  kube-prometheus-stack deployment and should be tracked with cluster
  observability work.

## Done criteria

- [x] Credentials: `bootstrap_complete: true` in `creds-state.yaml` (NK-WP-0005)
- [x] verify-t08.sh: PASS=15, FAIL=0 (WARNs are manual offsite confirmation only)
- [x] KeyCape acceptance test suite passes
- [x] DB restore drill completed (LLDAP SQLite — 2 users, all tables verified)
- [x] Emergency bundle delivered and stored in personal password manager (`creds-state.yaml`)
- [x] Ops bundle created and location recorded (`creds-state.yaml`)
- [x] privacyIDEA enckey backed up on PVC (/etc/privacyidea/backups/enckey.backup.*)
- [x] Monitoring/CNPG object-store backups explicitly deferred to platform hardening

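The `creds-state.yaml` criteria above can be rechecked mechanically. The sketch below assumes the flags appear as flat `key: true` lines, as quoted in this workplan; a nested layout would need a real YAML parser instead of `grep`. The helper name `check_creds_state` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: confirm that each named flag is set to true in a state file.
# Usage: check_creds_state <file> <key>...
# Prints every flag that is missing or not true; returns non-zero if any is.
check_creds_state() {
  file="$1"; shift
  missing=0
  for key in "$@"; do
    if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*true[[:space:]]*$" "$file"; then
      echo "not confirmed: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_creds_state creds-state.yaml \
#   ops_bundle_created emergency_bundle_delivered bootstrap_complete
```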