diff --git a/sso-mfa/k8s/network-policies/netpol-mfa.yaml b/sso-mfa/k8s/network-policies/netpol-mfa.yaml
index 6429700..f3f81f1 100644
--- a/sso-mfa/k8s/network-policies/netpol-mfa.yaml
+++ b/sso-mfa/k8s/network-policies/netpol-mfa.yaml
@@ -90,6 +90,30 @@ spec:
         - port: 5432
           protocol: TCP
 ---
+# ── Traefik → ACME HTTP-01 solver pods :8089 ─────────────────────────────────
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-traefik-to-acme-solver
+  namespace: mfa
+spec:
+  podSelector:
+    matchLabels:
+      acme.cert-manager.io/http01-solver: "true"
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: kube-system
+          podSelector:
+            matchLabels:
+              app.kubernetes.io/name: traefik
+      ports:
+        - port: 8089
+          protocol: TCP
+---
 # ── Allow egress DNS (all pods) ──────────────────────────────────────────────
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
diff --git a/sso-mfa/k8s/network-policies/netpol-sso.yaml b/sso-mfa/k8s/network-policies/netpol-sso.yaml
index 30b5408..bf0fde5 100644
--- a/sso-mfa/k8s/network-policies/netpol-sso.yaml
+++ b/sso-mfa/k8s/network-policies/netpol-sso.yaml
@@ -244,6 +244,32 @@ spec:
         - port: 3890
           protocol: TCP
 ---
+# ── Traefik → ACME HTTP-01 solver pods :8089 ─────────────────────────────────
+# cert-manager creates temporary solver pods during TLS cert issuance.
+# default-deny-all blocks Traefik from reaching them; this policy allows it.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-traefik-to-acme-solver
+  namespace: sso
+spec:
+  podSelector:
+    matchLabels:
+      acme.cert-manager.io/http01-solver: "true"
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: kube-system
+          podSelector:
+            matchLabels:
+              app.kubernetes.io/name: traefik
+      ports:
+        - port: 8089
+          protocol: TCP
+---
 # ── Allow egress DNS (all pods) ──────────────────────────────────────────────
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
diff --git a/workplans/NK-WP-0003-keycape-privacyidea-cluster-deployment.md b/workplans/NK-WP-0003-keycape-privacyidea-cluster-deployment.md
index d809e02..5f7b458 100644
--- a/workplans/NK-WP-0003-keycape-privacyidea-cluster-deployment.md
+++ b/workplans/NK-WP-0003-keycape-privacyidea-cluster-deployment.md
@@ -8,7 +8,7 @@ status: active
 owner: custodian
 topic_slug: netkingdom
 created: "2026-03-20"
-updated: "2026-03-20"
+updated: "2026-03-25"
 state_hub_workstream_id: "f24cefd4-a09b-4fa1-9b25-94bf783b425e"
 ---

@@ -23,10 +23,19 @@ binds LLDAP (directory), Authelia (auth sessions), and privacyIDEA (MFA).
 NK-WP-0001 was scoped around Keycloak and is deferred. This workplan
 covers everything needed to reach a production-ready identity plane.

+## Target cluster
+
+**RAILIANCE01** — `92.205.62.239` — k3s v1.35.1+k3s1, clean baseline.
+Kubeconfig: `~/.kube/config-railiance01`
+
+> Note: T02–T07 were previously completed on CoulombCore (92.205.130.254) by
+> mistake. CoulombCore is the old management host (Gitea/OCI registry only) and
+> should not be touched. All SSO stack work targets RAILIANCE01 exclusively.
+
 ## Pre-conditions

-- [x] k3s cluster healthy — RAIL-BS-WP-0002 ✓
-- [x] kubeconfig available at `~/.kube/config-hosteurope` — RAIL-BS-WP-0005 ✓
+- [x] k3s cluster healthy on RAILIANCE01 — v1.35.1+k3s1, node Ready ✓
+- [x] kubeconfig available at `~/.kube/config-railiance01` ✓
 - [x] All manifests committed — net-kingdom `sso-mfa/k8s/` ✓
 - [x] KeyCape v0.1 complete — KEY-WP-0001 ✓
 - [x] SOPS + age integrated into net-kingdom — NK-WP-0004 ✓
@@ -35,18 +44,22 @@ covers everything needed to reach a production-ready identity plane.
 ## Architecture

 ```
-Internet → Traefik (k3s) → cert-manager TLS
-              ├── auth.coulomb.social → Authelia
-              ├── pink.coulomb.social → privacyIDEA portal
-              └── id.coulomb.social → KeyCape (OIDC)
+Internet → Traefik (RAILIANCE01 k3s) → cert-manager TLS
+              ├── auth.coulomb.social → Authelia
+              ├── pink.coulomb.social → privacyIDEA portal
+              ├── pink-account.coulomb.social → privacyIDEA account self-service
+              └── id.coulomb.social → KeyCape (OIDC)

 KeyCape ──► Authelia (session, password)
         ──► LLDAP (directory, user lookup)
         ──► privacyIDEA (MFA challenges via trigger-admin token)

 privacyIDEA ──► PostgreSQL (privacyidea_db via CloudNativePG)
-LLDAP ──► PostgreSQL (lldap_db via CloudNativePG)
-Authelia ──► PostgreSQL (authelia_db via CloudNativePG)
+LLDAP ──► SQLite (PVC)
+Authelia ──► SQLite (PVC)
+
+KeyCape image pulled from CoulombCore OCI registry: 92.205.130.254:32166
+(insecure HTTP NodePort — requires registries.yaml on RAILIANCE01)
 ```

 ## Tasks
@@ -58,10 +71,9 @@ id: NK-WP-0003-T01
 status: done
 priority: high
 state_hub_task_id: "6a22e17e-5854-4f8b-b419-9dc86d490357"
-note: Superseded by NK-WP-0004 (credential foundation) and NK-WP-0005 (agent bootstrap).
-  Run `make creds-agent-init` to execute fully automated bootstrap.
-  The manual KeePassXC approach described here is retired — see
-  canon/standards/credential-management_v0.2.md for the current model.
+note: Credential foundation exists (NK-WP-0004 + NK-WP-0005). Secrets encrypted in
+  secrets.enc/. Before T02, run `make creds-agent-init` with KUBECONFIG pointing
+  to RAILIANCE01 to inject all secrets into the new cluster.
 ```

 ~~Net-kingdom currently uses a manual KeePassXC + age-bundle approach~~
@@ -71,8 +83,9 @@ Completed via NK-WP-0004 + NK-WP-0005. The credential foundation is in place:
 - Agent bootstrap: `make creds-agent-init` runs the full flow autonomously
 - Credential standard: `canon/standards/credential-management_v0.2.md`

-To bootstrap credentials before T02–T09, run:
+To bootstrap credentials into the RAILIANCE01 cluster before T02–T09, run:
 ```bash
+export KUBECONFIG=~/.kube/config-railiance01
 make creds-agent-init
 ```
 This generates all secrets, encrypts to `secrets.enc/`, injects into the
@@ -85,20 +98,32 @@ id: NK-WP-0003-T02
 status: done
 priority: high
 state_hub_task_id: "a14e3a6b-18ee-4172-8a47-bd531f21e55a"
-note: Verified 2026-03-21 — all namespaces, NetworkPolicies, cert-manager, and ClusterIssuers
-  already applied (35h+ ago). verify-t02.sh 22/22 passed. Fixed stale keycloak→keycape
-  check in verify script.
+note: Done 2026-03-25 on RAILIANCE01. Namespaces, NetworkPolicies, cert-manager, ClusterIssuers,
+  insecure registry for CoulombCore OCI all applied and verified.
+  Known gotcha: added allow-traefik-to-acme-solver NetworkPolicy to sso + mfa namespaces
+  (default-deny-all blocked ACME HTTP-01 solver pods from receiving Traefik traffic).
 ```

 Apply the K8s infrastructure foundations. All manifests already committed.

 ```bash
-export KUBECONFIG=~/.kube/config-hosteurope
+export KUBECONFIG=~/.kube/config-railiance01
 kubectl apply -f sso-mfa/k8s/namespaces/
 kubectl apply -f sso-mfa/k8s/network-policies/
 kubectl apply -f sso-mfa/k8s/cert-manager/
 ```

+Also configure the insecure OCI registry on RAILIANCE01 so k3s can pull the KeyCape image:
+```bash
+ssh tegwick@92.205.62.239 "sudo tee /etc/rancher/k3s/registries.yaml" <<'EOF'
+mirrors:
+  "92.205.130.254:32166":
+    endpoint:
+      - "http://92.205.130.254:32166"
+EOF
+ssh tegwick@92.205.62.239 "sudo systemctl restart k3s"
+```
+
 Verify: `bash sso-mfa/k8s/verify-t02.sh`

 Expected: namespaces `sso`, `mfa`, `databases` exist; NetworkPolicies applied;
@@ -111,17 +136,15 @@ id: NK-WP-0003-T03
 status: done
 priority: high
 state_hub_task_id: "19e375d0-66bd-4cf0-9c2d-59d5c0d5989e"
-note: Verified 2026-03-21 — CNPG cluster net-kingdom-pg healthy (1/1 Ready), privacyidea_db exists.
-  LLDAP and Authelia use SQLite (PVC), no additional PG databases needed.
-  verify-t03.sh: 8 PASS, 2 WARN (superuser secret + backup — both expected at this stage).
+note: Done 2026-03-25 on RAILIANCE01. CNPG operator + net-kingdom-pg cluster running,
+  privacyidea_db + role created. Verified via verify-t03.sh (8/8 PASS, 2 WARN for
+  superuser secret + scheduled backup — both expected at this stage).
 ```

-Deploy the shared database cluster with three databases:
-- `privacyidea_db` — privacyIDEA
-- `lldap_db` — LLDAP
-- `authelia_db` — Authelia
+Deploy the shared database cluster:

 ```bash
+export KUBECONFIG=~/.kube/config-railiance01
 kubectl apply -f sso-mfa/k8s/postgres/
 ```

@@ -137,23 +160,20 @@ id: NK-WP-0003-T04
 status: done
 priority: high
 state_hub_task_id: "9c9c1ec9-0cf5-4546-a83e-d74dbf3b27af"
-note: Completed 2026-03-21 via make creds-agent-init (NK-WP-0005).
-  Pod Running (ghcr.io/gpappsoft/privacyidea-docker:3.12.2, port 8080).
-  enckey + audit keys extracted to K8s Secrets privacyidea-enckey/auditkeys.
-  pi-admin and trigger-admin created. keycape-pi-token Secret in sso namespace.
-  Remaining: TLS cert for pink.coulomb.social (ACME solver pods visible — T02 cert-manager needed).
-  trigger-admin policy must be set manually via WebUI once pink.coulomb.social resolves.
+note: Done 2026-03-25 on RAILIANCE01. privacyIDEA pod Running, TLS certs issued,
+  enckey + audit keys bootstrapped (privacyidea-enckey + privacyidea-auditkeys Secrets created),
+  pi-admin + trigger-admin created, trigger-admin-rights policy created via REST API.
+  REMAINING: enroll TOTP MFA for pi-admin via https://pink.coulomb.social WebUI.
 ```

-Completed via `make creds-agent-init`. All Steps 1–4 were automated by the agent bootstrap.
-
-**Image fixes applied (2026-03-21):**
-- `privacyidea/otpserver:3.12.2` → `ghcr.io/gpappsoft/privacyidea-docker:3.12.2` (port 8080)
-- `PRIVACYIDEA_CONFIGFILE`, `PI_ADDRESS`, `PI_PORT` env vars added
-- Readiness probe changed to `tcpSocket` (`/token/` returns 401 for unauthenticated GET)
+Run credential bootstrap (injects privacyIDEA secrets + creates pi-admin/trigger-admin):
+```bash
+export KUBECONFIG=~/.kube/config-railiance01
+make creds-agent-init
+```

 **Remaining manual step:**
-Once `pink.coulomb.social` resolves to the cluster IP and TLS cert is issued:
+Once `pink.coulomb.social` resolves to `92.205.62.239` and TLS cert is issued:
 1. Log in to https://pink.coulomb.social as `pi-admin`
 2. Enroll MFA for `pi-admin` (TOTP)
 3. Verify/create trigger-admin policy: Policies → trigger-admin-rights
@@ -166,15 +186,15 @@ id: NK-WP-0003-T05
 status: done
 priority: high
 state_hub_task_id: "82fc90f7-8eb4-4718-b02a-dfd5fa39e5bc"
-note: Deployed 2026-03-21. securityContext fix: removed runAsNonRoot/runAsUser (lldap image
-  initialises as root). Pod 1/1 Running. Groups net-kingdom-users + net-kingdom-admins created
-  via API (plaintext secrets dir cleaned up by agent; used K8s secret directly).
-  ACME solver running for lldap.coulomb.social.
+note: Done 2026-03-25 on RAILIANCE01. LLDAP pod Running, TLS cert issued (lldap.coulomb.social),
+  groups net-kingdom-users (id=4) + net-kingdom-admins (id=5) created via direct GraphQL.
+  bootstrap-users.sh has a bash set -e / json parse bug (workaround: direct curl).
 ```

-Deploy LLDAP into the `sso` namespace.
+Deploy LLDAP into the `sso` namespace:

 ```bash
+export KUBECONFIG=~/.kube/config-railiance01
 cd sso-mfa/k8s/lldap
 bash create-secrets.sh
 kubectl apply -f deployment.yaml
@@ -192,15 +212,14 @@ id: NK-WP-0003-T06
 status: done
 priority: high
 state_hub_task_id: "3a28ff10-fbfa-443b-a64d-bbfe6153c544"
-note: Deployed 2026-03-21. Two config fixes: (1) users_filter changed uid→{username_attribute}={input};
-  (2) OIDC client secret moved from unsupported env var to inline bcrypt hash in configmap
-  (4.38 does not support CLIENTS_0_SECRET_FILE indexed env vars). Pod 1/1 Running,
-  "Startup complete". Remaining deprecation warnings are auto-mapped and non-fatal.
+note: Done 2026-03-25 on RAILIANCE01. Authelia pod Running (1 restart on init, normal),
+  TLS cert issued (auth.coulomb.social), health endpoint returns {"status":"OK"}.
 ```

-Deploy Authelia into the `sso` namespace.
+Deploy Authelia into the `sso` namespace:

 ```bash
+export KUBECONFIG=~/.kube/config-railiance01
 cd sso-mfa/k8s/authelia
 bash create-secrets.sh
 kubectl apply -f configmap.yaml
@@ -217,22 +236,16 @@ id: NK-WP-0003-T07
 status: done
 priority: high
 state_hub_task_id: "496a97c9-3e2a-486e-ba62-18449868c6cf"
-note: Completed 2026-03-22. KEY-WP-0002 delivered image to Gitea OCI registry
-  (92.205.130.254:32166/coulomb/key-cape:latest). Three issues fixed:
-  1. deployment.yaml image ref updated to Gitea registry (correct namespace: coulomb)
-  2. k3s hosts.toml fixed: server endpoint must be http:// for plain-HTTP Gitea NodePort
-     (k3s generated https:// by default → "http: server gave HTTP response to HTTPS client")
-  3. keycape-config clients: [] → added demo-app client (required for startup + T08 tests)
-  Pod 1/1 Running; /healthz OK; OIDC discovery live.
-  Note: hosts.toml at /var/lib/rancher/k3s/agent/etc/containerd/certs.d/92.205.130.254:32166/
-  is generated from /etc/rancher/k3s/registries.yaml — will revert on k3s restart.
-  Permanent fix: registries.yaml mirror config generates HTTPS server by default;
-  need to manually maintain hosts.toml or find k3s config that forces HTTP server.
+note: Done 2026-03-25 on RAILIANCE01. KeyCape pod Running, TLS cert issued (kc.coulomb.social),
+  OIDC discovery endpoint live at https://kc.coulomb.social/.well-known/openid-configuration.
+  PI admin token refreshed via create-pi-token.sh (old token was from CoulombCore).
+  keycape-pi-token K8s Secret created in sso namespace.
 ```

-Deploy KeyCape into the `sso` namespace.
+Deploy KeyCape into the `sso` namespace:

 ```bash
+export KUBECONFIG=~/.kube/config-railiance01
 cd sso-mfa/k8s/keycape
 bash create-secrets.sh          # includes privacyIDEA trigger-admin token
 bash create-pi-token.sh         # registers KeyCape as a privacyIDEA application
@@ -278,47 +291,41 @@ id: NK-WP-0003-T08a
 status: done
 priority: high
 state_hub_task_id: "c614f839-61c4-41f6-bfeb-b3f9525a7625"
-note: DNS resolves 2026-03-25 — all 4 subdomains resolve to 92.205.62.239 via 8.8.8.8.
-  (IP differs from workplan spec of 92.205.130.254 — cluster IP may have changed.)
+note: Done — all 5 A records (kc, auth, pink, pink-account, lldap) resolve to 92.205.62.239
+  via @8.8.8.8. Confirmed 2026-03-25.
 ```

-Create 4 A records in Cloudflare DNS, **proxy disabled (DNS-only / orange cloud OFF)**,
-all pointing to `92.205.130.254`:
+Create 5 A records in Cloudflare DNS, **proxy disabled (DNS-only / orange cloud OFF)**,
+all pointing to `92.205.62.239` (RAILIANCE01 — where k3s/Traefik runs):

 | Subdomain | Type | Value |
 |-----------|------|-------|
-| `kc.coulomb.social` | A | `92.205.130.254` |
-| `auth.coulomb.social` | A | `92.205.130.254` |
-| `pink.coulomb.social` | A | `92.205.130.254` |
-| `lldap.coulomb.social` | A | `92.205.130.254` |
+| `kc.coulomb.social` | A | `92.205.62.239` |
+| `auth.coulomb.social` | A | `92.205.62.239` |
+| `pink.coulomb.social` | A | `92.205.62.239` |
+| `pink-account.coulomb.social` | A | `92.205.62.239` |
+| `lldap.coulomb.social` | A | `92.205.62.239` |

 HTTP-01 ACME challenges require direct origin reachability — Cloudflare proxy blocks this.
-Once DNS propagates, cert-manager's three pending challenges will auto-resolve and TLS
-certs will be issued for all four ingresses.
+Once DNS propagates, cert-manager's pending challenges will auto-resolve and TLS
+certs will be issued for all ingresses.

-Verify: `dig +short kc.coulomb.social @8.8.8.8` → `92.205.130.254`
+Verify: `dig +short kc.coulomb.social @8.8.8.8` → `92.205.62.239`

-### T08b — Install Go on CoulombCore
+### T08b — Install Go on RAILIANCE01

 ```task
 id: NK-WP-0003-T08b
 status: done
 priority: high
 state_hub_task_id: "fdfe595a-f5a8-466a-82e9-7cc2ad8e5c3e"
-note: Go 1.22.10 already installed at ~/go/bin/go. Tests run successfully against go 1.23 module.
+note: Go 1.22.10 already installed at ~/go/bin/go (workstation). Tests ran from workstation.
+  Also: Go v1.25.6 present on RAILIANCE01 via k3s.
 ```

-Go is not installed on CoulombCore. Required for the KeyCape acceptance test suite (T08).
+Go is already installed on RAILIANCE01 via k3s (v1.25.6). No action needed.

-```bash
-wget https://go.dev/dl/go1.22.5.linux-amd64.tar.gz
-sudo tar -C /usr/local -xzf go1.22.5.linux-amd64.tar.gz
-echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
-source ~/.bashrc
-go version  # should print go1.22.5
-```
-
-Verify: `cd ~/key-cape/src && go test ./tests/... -run TestProfileBaseline -v`
+Verify: `ssh tegwick@92.205.62.239 "go version"`

 ### T09 — Backup, DR, and monitoring