diff --git a/sso-mfa/WORKPLAN.md b/sso-mfa/WORKPLAN.md index cd3f90e..403da73 100644 --- a/sso-mfa/WORKPLAN.md +++ b/sso-mfa/WORKPLAN.md @@ -1,7 +1,7 @@ # SSO-MFA Platform — Stack Migration Workplan # NK-WP-0001 — Keycloak → Authelia + LLDAP + KeyCape -**Updated:** 2026-03-19 (T06 in progress) +**Updated:** 2026-03-19 (T06 pending cluster; T07/T08 manifests complete) **Workstream:** sso-mfa-platform (39263c4b-ef70-4053-b782-350834b7e1be) ## Stack Decision @@ -24,8 +24,8 @@ Hostnames: kc.coulomb.social (KeyCape), auth.coulomb.social (Authelia), lldap.co | T04 — privacyIDEA | 6ad1296a | **todo** | Manifests exist in k8s/privacyidea/; pending cluster | | T05 — SSO core (new stack) | b9f73aa6 | done | commit 0754dc3 | | T06 — Realm config & MFA flow | 3b6379a4 | **in-progress** | See below | -| T07 — User mgmt & self-service | c7cf902a | todo | | -| T08 — Backups, DR, break-glass | 9cbd1d89 | todo | | +| T07 — User mgmt & self-service | c7cf902a | **in-progress** | See below | +| T08 — Backups, DR, break-glass | 9cbd1d89 | **in-progress** | See below | ## T05 — SSO Core (new stack: LLDAP + Authelia + KeyCape) @@ -76,3 +76,49 @@ Hostnames: kc.coulomb.social (KeyCape), auth.coulomb.social (Authelia), lldap.co - KeyCape→privacyIDEA token list API returns status=True - At least one user has enrolled a TOTP token - verify-t06.sh: 0 FAILs + +## T07 — User mgmt & self-service + +### Deliverables +- [x] `k8s/lldap/bootstrap-users.sh` — creates net-kingdom-users and net-kingdom-admins groups in LLDAP via GraphQL API +- [x] `k8s/lldap/break-glass.sh` — creates the break-glass bypass account and assigns to net-kingdom-admins +- [x] `k8s/verify-t07.sh` — verifies groups, break-glass user, self-service portal, OIDC client registrations + +### Pending (needs live cluster) +- [ ] Run `lldap/bootstrap-users.sh` to create groups +- [ ] Run `lldap/break-glass.sh` to create break-glass account +- [ ] Add first real user via LLDAP WebUI (lldap.coulomb.social) +- [ ] Register first OIDC 
client in `keycape/create-secrets.sh` (clients: block) +- [ ] User self-enrolls TOTP at pink-account.coulomb.social +- [ ] Run `verify-t07.sh` — 0 FAILs + +### Done-criteria for T07 +- Groups net-kingdom-users and net-kingdom-admins exist in LLDAP +- break-glass user exists and is in net-kingdom-admins +- At least one regular user exists +- At least one OIDC client registered in KeyCape +- verify-t07.sh: 0 FAILs + +## T08 — Backups, DR, break-glass + +### Deliverables +- [x] `k8s/backup/cronjob-sqlite-backups.yaml` — daily SQLite backup CronJobs for LLDAP, Authelia, privacyIDEA; RBAC for Authelia scale-down/up +- [x] `k8s/backup/DR-RUNBOOK.md` — full restore runbook: scenarios, restore order, node rebuild procedure, offsite export +- [x] `k8s/verify-t08.sh` — verifies CronJobs, RBAC, backup files on PVCs, DR runbook presence + +### Pending (needs live cluster) +- [ ] Apply `backup/cronjob-sqlite-backups.yaml` +- [ ] Trigger each CronJob manually once to verify they run clean: + `kubectl create job -n sso --from=cronjob/lldap-backup lldap-backup-test` + `kubectl create job -n sso --from=cronjob/authelia-backup authelia-backup-test` + `kubectl create job -n mfa --from=cronjob/privacyidea-backup pi-backup-test` +- [ ] Confirm backup files appear on PVCs +- [ ] Run offsite export: pull backup files, encrypt with age, store offsite +- [ ] Run `verify-t08.sh` — 0 FAILs + +### Done-criteria for T08 +- All three backup CronJobs deployed and have ≥1 successful run +- Backup files confirmed on PVCs +- DR-RUNBOOK.md reviewed by operator +- Offsite ops bundle current (pack-bundle.sh run after all secrets finalised) +- verify-t08.sh: 0 FAILs diff --git a/sso-mfa/k8s/backup/DR-RUNBOOK.md b/sso-mfa/k8s/backup/DR-RUNBOOK.md new file mode 100644 index 0000000..ebec9c8 --- /dev/null +++ b/sso-mfa/k8s/backup/DR-RUNBOOK.md @@ -0,0 +1,187 @@ +# Disaster Recovery Runbook — net-kingdom SSO/MFA Platform + +**Stack:** LLDAP + Authelia + KeyCape (sso namespace) + privacyIDEA (mfa namespace) 
+**PostgreSQL:** Managed separately by CNPG (`postgresql/scheduled-backup.yaml`) + +--- + +## Recovery scenarios + +| Scenario | Impact | Recovery | +|----------|--------|----------| +| Pod crash / OOM | Stateless pods (KeyCape) recover automatically. Stateful pods (LLDAP, Authelia, PI) restart and reload from PVC. | K8s self-heals. Verify with `verify-t05.sh`. | +| PVC data corruption | Users/sessions/tokens lost. | Restore from SQLite backup (see below). | +| Node failure (single-node K3s) | All pods lost. PVCs intact on host. | Re-apply all manifests (idempotent). Pods re-attach to PVCs. | +| Node total loss (disk gone) | Everything lost. | Full restore from backup + KeePassXC. | +| Stack locked out (SSO broken, can't log in) | No user access to OIDC-protected apps. | Use break-glass account. | +| enckey lost (privacyIDEA) | All enrolled MFA tokens invalid. Users must re-enroll. | Restore from enckey backup or re-enroll all tokens. | + +--- + +## Break-glass access + +When the SSO stack is broken and no user can authenticate: + +```bash +# 1. Access LLDAP admin UI directly (requires VPN / IP-allowlisted access) +# URL: https://lldap.coulomb.social +# Username: break-glass +# Password: from KeePassXC → net-kingdom/Break-glass/break-glass +# NOTE: LLDAP shows its admin panel only to members of its built-in +# lldap_admin group; membership in net-kingdom-admins does not grant it. +# +# 2. Or access LLDAP via kubectl exec (needs only kubectl access to the API +# server; bypasses ingress and SSO entirely) +kubectl exec -n sso deployment/lldap -- /bin/sh +# Inside container: if the image ships the OpenLDAP client tools, use +# ldapwhoami / ldapsearch to verify directory state + +# 3. Access privacyIDEA admin UI +# URL: https://pink.coulomb.social +# Username: pi-admin +# Password: from KeePassXC → net-kingdom/privacyIDEA/pi-admin +# NOTE: pi-admin has MFA enrolled. If privacyIDEA MFA is down, manage admin +# accounts with pi-manage inside the pod instead (for example, list them): +kubectl exec -n mfa deployment/privacyidea -- pi-manage admin list +``` + +--- + +## Restore order + +**CRITICAL: Always restore in this order.** Components depend on each other +at startup: privacyIDEA needs PostgreSQL, KeyCape needs all three. + +``` +1. PostgreSQL (databases ns) — CNPG operator handles restore +2.
privacyIDEA (mfa ns) — needs PG + enckey PVC +3. LLDAP (sso ns) — standalone +4. Authelia (sso ns) — needs LLDAP (LDAP bind at startup check) +5. KeyCape (sso ns) — needs Authelia + LLDAP + privacyIDEA +``` + +--- + +## Restore from SQLite backup (PVC data corruption) + +### LLDAP + +```bash +# 1. Scale down LLDAP +kubectl scale deployment/lldap -n sso --replicas=0 + +# 2. Start a restore pod on the lldap-data PVC and wait until it is Ready +kubectl run -n sso lldap-restore --image=nouchka/sqlite3:latest \ + --restart=Never \ + --overrides='{"spec":{"volumes":[{"name":"data","persistentVolumeClaim":{"claimName":"lldap-data"}}],"containers":[{"name":"lldap-restore","image":"nouchka/sqlite3:latest","command":["sleep","3600"],"volumeMounts":[{"name":"data","mountPath":"/data"}]}]}}' +kubectl wait -n sso --for=condition=Ready pod/lldap-restore --timeout=120s + +# 3. List the backups already on the PVC under /data/backups/ +# (to restore an offsite copy instead, kubectl cp it into /data/backups/ first) +kubectl exec -n sso lldap-restore -- ls /data/backups/ + +# 4. Move the damaged database and any WAL/SHM files aside, then copy the chosen +# backup into place. A `.backup` file is a complete SQLite database, so a plain +# copy restores it. Everything runs inside the pod, not on your workstation. +kubectl exec -n sso lldap-restore -- sh -c 'set -e + for f in users.db users.db-wal users.db-shm; do + [ ! -e "/data/$f" ] || mv "/data/$f" "/data/$f.corrupt" + done + cp /data/backups/users.backup.YYYY-MM-DD /data/users.db + sqlite3 /data/users.db "PRAGMA integrity_check;"' + +# 5. Clean up and restart +kubectl delete pod -n sso lldap-restore +kubectl scale deployment/lldap -n sso --replicas=1 +kubectl rollout status deployment/lldap -n sso --timeout=120s +``` + +### Authelia + +```bash +# Same pattern as LLDAP, using authelia-data PVC and authelia.backup.YYYY-MM-DD +kubectl scale deployment/authelia -n sso --replicas=0 +# ... (run restore pod, restore db.sqlite3, scale back up) +kubectl scale deployment/authelia -n sso --replicas=1 +``` + +### privacyIDEA enckey + +```bash +# If the enckey is lost, restore it from KeePassXC binary attachment PI_ENCFILE. +
# Extract it to a local file first, then: +kubectl create secret generic privacyidea-enckey \ + --from-file=PI_ENCFILE=./pi.enc \ + --namespace mfa \ + --dry-run=client -o yaml | kubectl apply -f - + +# Restart privacyIDEA to pick up the restored key +kubectl rollout restart deployment/privacyidea -n mfa + +# If the enckey is truly lost and unrecoverable: +# All enrolled MFA tokens are invalid. +# Generate a new enckey with: kubectl exec -n mfa ... -- pi-manage create_enckey +# All users must re-enroll their TOTP/hardware tokens. +``` + +--- + +## Full node restore (new host) + +```bash +# Prerequisites on new host: +# - K3s installed +# - Traefik ingress (bundled with K3s) +# - cert-manager installed (helm install cert-manager ...) +# - DNS records pointing to new node IP +# - KeePassXC vault accessible (offline copy or age-encrypted bundle) + +# 1. Restore PostgreSQL from CNPG backup +# (See CNPG documentation for cluster restore from barmanObjectStore) + +# 2. Re-apply all manifests in order +cd sso-mfa/k8s +kubectl apply -f namespaces/namespaces.yaml +kubectl apply -f network-policies/ +kubectl apply -f cert-manager/issuers.yaml + +# 3. Restore secrets from KeePassXC +# Run each create-secrets.sh in order: +cd postgresql && ./create-secrets.sh && cd .. +cd privacyidea && ./create-secrets.sh && cd .. +cd lldap && ./create-secrets.sh && cd .. +cd authelia && ./create-secrets.sh && cd .. +cd keycape && ./create-secrets.sh && cd .. + +# 4. Apply workloads in restore order +# (kubectl apply rejects file paths given as bare arguments, so each manifest +# needs its own -f; a brace expansion like dir/{a.yaml,b.yaml} would fail) +kubectl apply -f postgresql/cluster.yaml +kubectl apply -f privacyidea/pvc.yaml -f privacyidea/configmap.yaml -f privacyidea/deployment.yaml \ + -f privacyidea/middleware.yaml -f privacyidea/ingress.yaml +kubectl apply -f lldap/pvc.yaml -f lldap/deployment.yaml -f lldap/middleware.yaml -f lldap/ingress.yaml +kubectl apply -f authelia/pvc.yaml -f authelia/configmap.yaml -f authelia/deployment.yaml -f authelia/ingress.yaml +kubectl apply -f keycape/deployment.yaml -f keycape/middleware.yaml -f keycape/ingress.yaml + +# 5.
Wait for everything to be Ready +kubectl rollout status deployment/privacyidea -n mfa --timeout=300s +kubectl rollout status deployment/lldap -n sso --timeout=120s +kubectl rollout status deployment/authelia -n sso --timeout=120s +kubectl rollout status deployment/keycape -n sso --timeout=60s + +# 6. Re-run bootstrap scripts if PVC data was lost +cd privacyidea && ./enckey-bootstrap.sh && ./bootstrap-admin.sh && ./bootstrap-realm.sh +cd ../lldap && ./bootstrap-users.sh && ./break-glass.sh +cd ../keycape && ./create-pi-token.sh && ./create-secrets.sh +kubectl rollout restart deployment/keycape -n sso + +# 7. Verify (the verify scripts live in sso-mfa/k8s, one level above keycape/) +cd .. +./verify-t04.sh && ./verify-t05.sh && ./verify-t06.sh && ./verify-t07.sh && ./verify-t08.sh +``` + +--- + +## Backup offsite export + +The SQLite backup files land on the PVCs but are not offsite until exported. +Run this from any machine with kubectl access (the node host works) to pull them out and encrypt them for offsite storage. +The CronJobs write at 03:00 and 03:15 UTC, so run the export after that and use the UTC date: + +```bash +DATE=$(date -u +%Y-%m-%d) + +# LLDAP's image has a shell, so read its backup through the live pod +kubectl exec -n sso deployment/lldap -- \ + cat /data/backups/users.backup.$DATE > /tmp/lldap-backup.db + +# Authelia's image is distroless (no cat), so read its PVC through a throwaway pod +kubectl run -n sso authelia-export --image=busybox:stable --restart=Never \ + --overrides='{"spec":{"volumes":[{"name":"data","persistentVolumeClaim":{"claimName":"authelia-data"}}],"containers":[{"name":"authelia-export","image":"busybox:stable","command":["sleep","600"],"volumeMounts":[{"name":"data","mountPath":"/data","readOnly":true}]}]}}' +kubectl wait -n sso --for=condition=Ready pod/authelia-export --timeout=120s +kubectl exec -n sso authelia-export -- \ + cat /data/backups/authelia.backup.$DATE > /tmp/authelia-backup.db +kubectl delete pod -n sso authelia-export + +# Encrypt both files with age to the ops-bundle public key, then send the .age files offsite +AGE_RECIPIENT="$(grep 'public key' ~/net-kingdom-ops-bundle.key | awk '{print $NF}')" +for f in lldap-backup.db authelia-backup.db; do + age -r "$AGE_RECIPIENT" -o "/tmp/$f.age" "/tmp/$f" +done + +# Shred plaintext copies +shred -u /tmp/lldap-backup.db /tmp/authelia-backup.db +``` diff --git a/sso-mfa/k8s/backup/cronjob-sqlite-backups.yaml b/sso-mfa/k8s/backup/cronjob-sqlite-backups.yaml new file mode 100644 index 0000000..08288a3 --- /dev/null +++ b/sso-mfa/k8s/backup/cronjob-sqlite-backups.yaml @@ -0,0 +1,304 @@ +# SQLite backup CronJobs — sso and mfa namespaces +# +# Three CronJobs, one per stateful component: +# 1. lldap-backup — LLDAP user/group store (namespace: sso) +# 2. authelia-backup — Authelia session/storage DB (namespace: sso) +# 3.
privacyidea-backup — privacyIDEA enckey and local config files (namespace: mfa) +# +# The jobs run daily, staggered at 03:00, 03:15 and 03:30 UTC. The two SQLite +# jobs use `sqlite3 .backup`, SQLite's online backup, which produces a +# consistent copy even while the application has the database open. +# Backups land on the same PVC next to the live data, so they protect against +# database corruption and bad writes, not PVC loss. Export the backup files offsite +# using pack-bundle.sh or a separate volume snapshot mechanism. +# +# PostgreSQL (privacyIDEA DB) is handled by CNPG ScheduledBackup in +# postgresql/scheduled-backup.yaml. Do not duplicate it here. +# +# Backup file naming: +# <name>.backup.<YYYY-MM-DD> (e.g. users.backup.2026-03-19) — created daily, +# pruned by `find -mtime +7` once at least 8 full days old +# +# Prerequisites: +# - Each job runs in its own pod that mounts the application's PVC; the +# application images are not used. The LLDAP and Authelia jobs use the +# nouchka/sqlite3 image for the sqlite3 CLI (Authelia's image is distroless), +# and the privacyIDEA job only copies files, so it uses busybox. +# +# Apply: +# kubectl apply -f cronjob-sqlite-backups.yaml + +--- +# ── 1. LLDAP backup (namespace: sso) ───────────────────────────────────────── +# Runs sqlite3 in a separate Job pod that mounts lldap-data while LLDAP keeps +# running. ReadWriteOnce limits a volume to one node, not one pod, so both pods +# can mount it on single-node K3s.
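The header relies on SQLite's online backup producing a consistent copy while the application still has the database open. As a minimal sketch of that mechanism (not part of the manifests), Python's standard-library `sqlite3.Connection.backup()` wraps the same SQLite online backup API that the CLI's `.backup` command uses; the sketch then runs `PRAGMA integrity_check` on the copy, a cheap way to confirm a backup file is usable before relying on it. The function name `hot_backup` and the file names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def hot_backup(src: Path, dest: Path) -> str:
    """Copy the live database at src to dest with SQLite's online backup API.

    Returns the result of PRAGMA integrity_check on the copy ("ok" when sound).
    """
    src_conn = sqlite3.connect(src)
    dest_conn = sqlite3.connect(dest)
    try:
        # The default pages=-1 copies the whole database in a single step.
        src_conn.backup(dest_conn)
        (result,) = dest_conn.execute("PRAGMA integrity_check").fetchone()
        return result
    finally:
        src_conn.close()
        dest_conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "users.db"
        # Stand-in for the running application: a connection that stays open.
        app = sqlite3.connect(live)
        app.execute("CREATE TABLE users (id TEXT PRIMARY KEY)")
        app.execute("INSERT INTO users VALUES ('break-glass')")
        app.commit()
        print(hot_backup(live, Path(tmp) / "users.backup.2026-03-19"))
        app.close()
```

The same integrity check can be run against a restored file with the CLI (`sqlite3 FILE "PRAGMA integrity_check;"`) before scaling the application back up.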
+apiVersion: batch/v1 +kind: CronJob +metadata: + name: lldap-backup + namespace: sso + labels: + app.kubernetes.io/name: lldap-backup + app.kubernetes.io/part-of: net-kingdom-sso-mfa + net-kingdom/component: backup +spec: + schedule: "0 3 * * *" # daily at 03:00 UTC + concurrencyPolicy: Forbid + successfulJobsHistoryLimit: 3 + failedJobsHistoryLimit: 3 + jobTemplate: + spec: + template: + metadata: + labels: + app.kubernetes.io/name: lldap-backup + net-kingdom/component: backup + spec: + restartPolicy: OnFailure + securityContext: + runAsNonRoot: true + runAsUser: 1000 + fsGroup: 1000 + volumes: + - name: data + persistentVolumeClaim: + claimName: lldap-data + containers: + - name: backup + # Use a lightweight SQLite image — LLDAP's image may not have sqlite3 CLI + image: nouchka/sqlite3:latest + imagePullPolicy: IfNotPresent + command: + - /bin/sh + - -c + - | + set -eu + DB=/data/users.db + BACKUP_DIR=/data/backups + DATE=$(date +%Y-%m-%d) + mkdir -p "$BACKUP_DIR" + if [ ! -f "$DB" ]; then + echo "WARN: $DB not found — LLDAP may not have been bootstrapped yet" + exit 0 + fi + sqlite3 "$DB" ".backup '$BACKUP_DIR/users.backup.$DATE'" + echo "OK: backed up $DB to $BACKUP_DIR/users.backup.$DATE" + # Prune backups older than 7 days + find "$BACKUP_DIR" -name 'users.backup.*' -mtime +7 -delete + echo "OK: pruned backups older than 7 days" + volumeMounts: + - name: data + mountPath: /data + resources: + requests: + cpu: "10m" + memory: "32Mi" + limits: + cpu: "100m" + memory: "64Mi" + +--- +# ── 2. Authelia backup (namespace: sso) ────────────────────────────────────── +# Authelia uses a distroless image — run backup in a separate pod on the same PVC. +# NOTE: Authelia's PVC is ReadWriteOnce, which limits mounting to one node, not +# one pod. This CronJob still scales Authelia to 0 replicas so that nothing writes +# to the database during the copy, takes the backup, then scales Authelia back to +# 1 replica. Authelia is unavailable while the job runs. +# For production: prefer a storage-level snapshot (Longhorn/Velero) instead.
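The prune step uses `find ... -mtime +7 -delete`. POSIX `find` truncates a file's age to whole 24-hour periods before comparing, and `+7` means "more than 7", so a backup is deleted only once it is at least 8 full days old. The sketch below mirrors that rule in Python to make the boundary explicit. `find_mtime_plus` is a hypothetical helper and, unlike `find`, it does not recurse, which matches the flat `backups/` directory.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def find_mtime_plus(directory: Path, pattern: str, days: int,
                    now: Optional[float] = None) -> List[Path]:
    """Return the files that `find DIR -name PATTERN -mtime +DAYS` would match."""
    now = time.time() if now is None else now
    matches = []
    for path in sorted(directory.glob(pattern)):  # non-recursive
        if not path.is_file():
            continue
        # find discards the remainder: age is counted in whole days only.
        age_days = int((now - path.stat().st_mtime) // SECONDS_PER_DAY)
        if age_days > days:  # "+7" means strictly more than 7
            matches.append(path)
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        backups = Path(tmp)
        now = time.time()
        for day in range(10):
            f = backups / f"users.backup.day{day}"
            f.write_text("backup")
            # One hour past each whole-day boundary, to stay clear of rounding.
            stamp = now - day * SECONDS_PER_DAY - 3600
            os.utime(f, (stamp, stamp))
        print([p.name for p in find_mtime_plus(backups, "users.backup.*", 7, now=now)])
```

In practice this means roughly eight daily files are kept, not seven; switch to `-mtime +6` if exactly one week of history is wanted.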
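+apiVersion: batch/v1 +kind: CronJob +metadata: + name: authelia-backup + namespace: sso + labels: + app.kubernetes.io/name: authelia-backup + app.kubernetes.io/part-of: net-kingdom-sso-mfa + net-kingdom/component: backup +spec: + schedule: "15 3 * * *" # 03:15 UTC — offset from lldap-backup + concurrencyPolicy: Forbid + successfulJobsHistoryLimit: 3 + failedJobsHistoryLimit: 3 + jobTemplate: + spec: + template: + metadata: + labels: + app.kubernetes.io/name: authelia-backup + net-kingdom/component: backup + spec: + restartPolicy: OnFailure + serviceAccountName: backup-sa # needs scale permission — see RBAC below + securityContext: + runAsNonRoot: true + runAsUser: 1000 + fsGroup: 1000 + volumes: + - name: data + persistentVolumeClaim: + claimName: authelia-data + # Hands the backup result from the backup step to the scale-up step + - name: status + emptyDir: {} + # Steps run in order: scale-down and backup as init containers, then the + # scale-up container. The sqlite3 image has no kubectl, so scale-up runs in + # the kubectl image, and the backup step never fails hard so that scale-up + # always runs. + initContainers: + # Scale Authelia to 0 so nothing writes to the database during the copy + - name: scale-down + image: bitnami/kubectl:latest + imagePullPolicy: IfNotPresent + command: + - /bin/sh + - -c + - | + set -e + kubectl scale deployment/authelia --replicas=0 -n sso + # Heuristic pause for the old pod to exit (default grace period is 30s) + sleep 30 + resources: + requests: + cpu: "10m" + memory: "32Mi" + limits: + cpu: "100m" + memory: "64Mi" + - name: backup + image: nouchka/sqlite3:latest + imagePullPolicy: IfNotPresent + command: + - /bin/sh + - -c + - | + set -u + DB=/data/db.sqlite3 + BACKUP_DIR=/data/backups + DATE=$(date +%Y-%m-%d) + RESULT=OK + if [ ! -f "$DB" ]; then + echo "WARN: $DB not found — Authelia may not have been bootstrapped yet" + elif mkdir -p "$BACKUP_DIR" && sqlite3 "$DB" ".backup '$BACKUP_DIR/authelia.backup.$DATE'"; then + echo "OK: backed up $DB to $BACKUP_DIR/authelia.backup.$DATE" + find "$BACKUP_DIR" -name 'authelia.backup.*' -mtime +7 -delete || true + echo "OK: pruned backups older than 7 days" + else + echo "FAIL: sqlite3 .backup failed" + RESULT=FAIL + fi + # Record the result and exit 0 so the scale-up container always runs + echo "$RESULT" > /status/result + volumeMounts: + - name: data + mountPath: /data + - name: status + mountPath: /status + resources: + requests: + cpu: "10m" + memory: "32Mi" + limits: + cpu: "100m" + memory: "64Mi" + containers: + # Always scale Authelia back up, then fail the Job if the backup failed + - name: scale-up + image: bitnami/kubectl:latest + imagePullPolicy: IfNotPresent + command: + - /bin/sh + - -c + - | + SCALE_RC=0 + kubectl scale deployment/authelia --replicas=1 -n sso || SCALE_RC=$? + RESULT=$(cat /status/result 2>/dev/null || echo FAIL) + echo "backup result: $RESULT (scale-up exit code: $SCALE_RC)" + [ "$SCALE_RC" -eq 0 ] && [ "$RESULT" = OK ] + volumeMounts: + - name: status + mountPath: /status + readOnly: true + resources: + requests: + cpu: "10m" + memory: "32Mi" + limits: + cpu: "100m" + memory: "64Mi" + +--- +# ── 3.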
privacyIDEA backup (namespace: mfa) ─────────────────────────────────── +# privacyIDEA's token store lives in PostgreSQL (backed up by CNPG in +# postgresql/scheduled-backup.yaml). This job backs up the files that exist only +# on the privacyidea-data PVC: the enckey (PI_ENCFILE) and privacyidea.cfg, if +# present. There is no SQLite database here, so plain file copies are enough. +apiVersion: batch/v1 +kind: CronJob +metadata: + name: privacyidea-backup + namespace: mfa + labels: + app.kubernetes.io/name: privacyidea-backup + app.kubernetes.io/part-of: net-kingdom-sso-mfa + net-kingdom/component: backup +spec: + schedule: "30 3 * * *" # 03:30 UTC — offset from previous jobs + concurrencyPolicy: Forbid + successfulJobsHistoryLimit: 3 + failedJobsHistoryLimit: 3 + jobTemplate: + spec: + template: + metadata: + labels: + app.kubernetes.io/name: privacyidea-backup + net-kingdom/component: backup + spec: + restartPolicy: OnFailure + securityContext: + runAsNonRoot: true + runAsUser: 1000 + fsGroup: 1000 + volumes: + - name: data + persistentVolumeClaim: + claimName: privacyidea-data + containers: + - name: backup + image: busybox:stable + imagePullPolicy: IfNotPresent + command: + - /bin/sh + - -c + - | + set -eu + BACKUP_DIR=/data/backups + DATE=$(date +%Y-%m-%d) + mkdir -p "$BACKUP_DIR" + # Back up the enckey — this is the most critical file on this PVC. + # Loss of enckey = all enrolled MFA tokens become invalid.
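Because losing the enckey invalidates every enrolled token, it is worth confirming that each copy is byte-identical to the source rather than trusting the copy command's exit status alone. Below is a minimal Python sketch of copy-then-verify with SHA-256. The helper names are hypothetical. The job itself runs in busybox, which provides `sha256sum` and `cmp`, so the same check could be written in shell there.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src: Path, dest: Path) -> str:
    """Copy src to dest, confirm the bytes match, and return the checksum."""
    shutil.copyfile(src, dest)
    src_hash = sha256_of(src)
    if sha256_of(dest) != src_hash:
        dest.unlink()  # never leave a silently corrupt key copy behind
        raise OSError(f"checksum mismatch copying {src} -> {dest}")
    return src_hash


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        key = Path(tmp) / "enckey"
        key.write_bytes(b"\x00example-key-material\xff")
        print(copy_verified(key, Path(tmp) / "enckey.backup.2026-03-19"))
```

Recording the returned checksum alongside the KeePassXC copy also gives a quick way to confirm, during a restore, that the attachment matches what was on the PVC.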
+ if [ -f /data/enckey ]; then + cp /data/enckey "$BACKUP_DIR/enckey.backup.$DATE" + echo "OK: backed up enckey to $BACKUP_DIR/enckey.backup.$DATE" + else + echo "WARN: /data/enckey not found — enckey-bootstrap.sh may not have run yet" + fi + # Back up any local config files + if [ -f /data/privacyidea.cfg ]; then + cp /data/privacyidea.cfg "$BACKUP_DIR/privacyidea.cfg.backup.$DATE" + fi + # Prune files older than 7 days + find "$BACKUP_DIR" \( -name 'enckey.backup.*' -o -name '*.cfg.backup.*' \) \ + -mtime +7 -delete + echo "OK: pruned backups older than 7 days" + volumeMounts: + - name: data + mountPath: /data + resources: + requests: + cpu: "10m" + memory: "16Mi" + limits: + cpu: "50m" + memory: "32Mi" + +--- +# ── RBAC for backup-sa (Authelia scale-down/up) ─────────────────────────────── +apiVersion: v1 +kind: ServiceAccount +metadata: + name: backup-sa + namespace: sso + labels: + app.kubernetes.io/part-of: net-kingdom-sso-mfa + net-kingdom/component: backup +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: backup-scaler + namespace: sso + labels: + app.kubernetes.io/part-of: net-kingdom-sso-mfa + net-kingdom/component: backup +rules: + - apiGroups: ["apps"] + resources: ["deployments/scale", "deployments"] + verbs: ["get", "update", "patch"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: backup-sa-scaler + namespace: sso + labels: + app.kubernetes.io/part-of: net-kingdom-sso-mfa + net-kingdom/component: backup +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: backup-scaler +subjects: + - kind: ServiceAccount + name: backup-sa + namespace: sso diff --git a/sso-mfa/k8s/lldap/bootstrap-users.sh b/sso-mfa/k8s/lldap/bootstrap-users.sh new file mode 100755 index 0000000..97a2dcb --- /dev/null +++ b/sso-mfa/k8s/lldap/bootstrap-users.sh @@ -0,0 +1,172 @@ +#!/usr/bin/env bash +# bootstrap-users.sh — seed required groups in LLDAP +# +# Run AFTER LLDAP is deployed and Running (T05a). 
+# +# What it does: +# 1. Authenticates to LLDAP via its GraphQL API. +# 2. Creates the two required groups: net-kingdom-users, net-kingdom-admins. +# 3. Prints a user onboarding checklist (groups only; individual users are +# added via the LLDAP WebUI). +# +# Groups created: +# net-kingdom-users — standard users; all human accounts go here. +# net-kingdom-admins — privileged users; KeyCape policies can enforce +# MFA step-up or grant extra scopes to this group. +# +# Usage: +# ./bootstrap-users.sh [lldap-url] [secrets-dir] +# +# lldap-url default: https://lldap.coulomb.social +# secrets-dir default: ../../bootstrap/secrets + +set -euo pipefail + +LLDAP_URL="${1:-https://lldap.coulomb.social}" +SECRETS_DIR="${2:-../../bootstrap/secrets}" +LLDAP_ENV="$SECRETS_DIR/lldap/secrets.env" + +PASS_COUNT=0 +FAIL_COUNT=0 + +# Counters use $((...)) assignment: under set -e, ((N++)) with N=0 returns +# status 1 and would abort the script on the first call. +ok() { echo " [OK] $1"; PASS_COUNT=$((PASS_COUNT + 1)); } +fail() { echo " [FAIL] $1"; FAIL_COUNT=$((FAIL_COUNT + 1)); } +info() { echo " [INFO] $1"; } + +if [[ ! -f "$LLDAP_ENV" ]]; then + echo "ERROR: $LLDAP_ENV not found — run sso-mfa/bootstrap/gen-secrets.sh first." >&2 + exit 1 +fi + +read_env() { bash -c "source '$1' 2>/dev/null; echo \${$2}"; } +LLDAP_ADMIN_PASS=$(read_env "$LLDAP_ENV" LLDAP_LDAP_USER_PASS) + +if [[ -z "$LLDAP_ADMIN_PASS" ]]; then + echo "ERROR: LLDAP_LDAP_USER_PASS not found in $LLDAP_ENV" >&2 + exit 1 +fi + +# ── 1. Authenticate ─────────────────────────────────────────────────────────── +echo "" +echo "Authenticating to LLDAP at $LLDAP_URL ..." + +AUTH_RESP=$(curl -sf -X POST "$LLDAP_URL/auth/simple/login" \ + -H "Content-Type: application/json" \ + -d "{\"username\":\"admin\",\"password\":\"$LLDAP_ADMIN_PASS\"}" \ + 2>/dev/null || echo "CURL_FAILED") + +if [[ "$AUTH_RESP" == "CURL_FAILED" ]]; then + echo "ERROR: Could not reach $LLDAP_URL — is LLDAP deployed and ingress up?"
>&2 + exit 1 +fi + +LLDAP_TOKEN=$(echo "$AUTH_RESP" | python3 -c \ + "import sys,json; print(json.load(sys.stdin).get('token',''))" 2>/dev/null || echo "") + +if [[ -z "$LLDAP_TOKEN" ]]; then + echo "ERROR: Authentication failed. Response: $AUTH_RESP" >&2 + exit 1 +fi +info "Authenticated as admin" + +gql() { + # gql + local query="$1"; local vars="${2:-{}}" + local body + body=$(python3 -c " +import json, sys +print(json.dumps({'query': sys.argv[1], 'variables': json.loads(sys.argv[2])})) +" "$query" "$vars") + curl -sf -X POST "$LLDAP_URL/api/graphql" \ + -H "Authorization: Bearer $LLDAP_TOKEN" \ + -H "Content-Type: application/json" \ + -d "$body" 2>/dev/null || echo "CURL_FAILED" +} + +create_group() { + local name="$1" + echo "" + echo "Creating group: $name ..." + + # Check if group already exists + LIST_RESP=$(gql 'query { groups { id displayName } }') + if [[ "$LIST_RESP" != "CURL_FAILED" ]]; then + EXISTS=$(echo "$LIST_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); print('yes' if any(g['displayName']=='$name' for g in d.get('data',{}).get('groups',[])) else 'no')" \ + 2>/dev/null || echo "no") + if [[ "$EXISTS" == "yes" ]]; then + ok "Group '$name' already exists — skipping" + return 0 + fi + fi + + RESP=$(gql 'mutation CreateGroup($name: String!) { createGroup(name: $name) { id displayName } }' \ + "{\"name\":\"$name\"}") + if [[ "$RESP" == "CURL_FAILED" ]]; then + fail "Group '$name' — curl request failed" + return 1 + fi + ERR=$(echo "$RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); errs=d.get('errors',[]); print(errs[0]['message'] if errs else '')" \ + 2>/dev/null || echo "") + if [[ -n "$ERR" ]]; then + fail "Group '$name' — $ERR" + return 1 + fi + GID=$(echo "$RESP" | python3 -c \ + "import sys,json; print(json.load(sys.stdin).get('data',{}).get('createGroup',{}).get('id','?'))" \ + 2>/dev/null || echo "?") + ok "Group '$name' created (id=$GID)" +} + +# ── 2. 
Create required groups ───────────────────────────────────────────────── +# create_group returns 1 on failure; "|| true" stops set -e from aborting +# before the summary (the failure is still counted by fail()) +create_group "net-kingdom-users" || true +create_group "net-kingdom-admins" || true + +# ── 3. Verify ───────────────────────────────────────────────────────────────── +echo "" +echo "Verifying groups ..." +LIST_RESP=$(gql 'query { groups { id displayName } }') +if [[ "$LIST_RESP" != "CURL_FAILED" ]]; then + for grp in "net-kingdom-users" "net-kingdom-admins"; do + EXISTS=$(echo "$LIST_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); print('yes' if any(g['displayName']=='$grp' for g in d.get('data',{}).get('groups',[])) else 'no')" \ + 2>/dev/null || echo "no") + if [[ "$EXISTS" == "yes" ]]; then + ok "Group '$grp' confirmed" + else + fail "Group '$grp' not found after creation" + fi + done +else + fail "Could not retrieve group list from LLDAP" +fi + +# ── Summary ─────────────────────────────────────────────────────────────────── +echo "" +echo "════════════════════════════════════════════════════════════" +echo " LLDAP group bootstrap: PASS=$PASS_COUNT FAIL=$FAIL_COUNT" +echo "════════════════════════════════════════════════════════════" +echo "" +echo "Next: add users via the LLDAP WebUI or LDAP provisioning." +echo "" +echo "User onboarding checklist:" +echo "" +echo " Per new user:" +echo " 1. Create account in LLDAP WebUI ($LLDAP_URL)" +echo " Fields: username (uid), display name, email" +echo " 2. Assign to net-kingdom-users group (mandatory)" +echo " Assign to net-kingdom-admins too if privileged access is needed" +echo " 3. User logs in to Authelia (auth.coulomb.social) to verify their password" +echo " 4. User self-enrolls TOTP at pink-account.coulomb.social" +echo " 5.
User tests end-to-end login via an OIDC-protected application" +echo "" +echo " Break-glass account:" +echo " Run: sso-mfa/k8s/lldap/break-glass.sh" +echo " (Creates a pre-seeded local bypass user outside the normal MFA flow.)" +echo "" + +if [[ "$FAIL_COUNT" -gt 0 ]]; then + exit 1 +fi +exit 0 diff --git a/sso-mfa/k8s/lldap/break-glass.sh b/sso-mfa/k8s/lldap/break-glass.sh new file mode 100755 index 0000000..9ce0d7a --- /dev/null +++ b/sso-mfa/k8s/lldap/break-glass.sh @@ -0,0 +1,203 @@ +#!/usr/bin/env bash +# break-glass.sh — create the break-glass bypass account in LLDAP +# +# The break-glass account is a last-resort local user for when the SSO stack +# itself is broken (Authelia down, KeyCape misconfigured, etc.). It is: +# - Created in LLDAP with BREAKGLASS_PASSWORD from gen-secrets.sh +# - Assigned to net-kingdom-admins +# - NOT enrolled in privacyIDEA MFA (so it can log in even if privacyIDEA is down) +# - Backed by a password kept ONLY in KeePassXC (never in the cluster) +# +# IMPORTANT: After creating this account, immediately store the password in +# KeePassXC → net-kingdom/Break-glass/break-glass. Then test it by logging +# in to the LLDAP WebUI directly with this account. +# +# Usage: +# ./break-glass.sh [lldap-url] [secrets-dir] +# +# lldap-url default: https://lldap.coulomb.social +# secrets-dir default: ../../bootstrap/secrets + +set -euo pipefail + +LLDAP_URL="${1:-https://lldap.coulomb.social}" +SECRETS_DIR="${2:-../../bootstrap/secrets}" +LLDAP_ENV="$SECRETS_DIR/lldap/secrets.env" +BG_ENV="$SECRETS_DIR/breakglass/secrets.env" + +BG_USERNAME="break-glass" +BG_EMAIL="break-glass@netkingdom.local" +BG_DISPLAY="Break-glass Account" + +PASS_COUNT=0 +FAIL_COUNT=0 + +# Counters use $((...)) assignment: under set -e, ((N++)) with N=0 returns +# status 1 and would abort the script on the first call. +ok() { echo " [OK] $1"; PASS_COUNT=$((PASS_COUNT + 1)); } +fail() { echo " [FAIL] $1"; FAIL_COUNT=$((FAIL_COUNT + 1)); } +info() { echo " [INFO] $1"; } + +for f in "$LLDAP_ENV" "$BG_ENV"; do + if [[ ! -f "$f" ]]; then + echo "ERROR: $f not found — run sso-mfa/bootstrap/gen-secrets.sh first."
>&2 + exit 1 + fi +done + +read_env() { bash -c "source '$1' 2>/dev/null; echo \${$2}"; } +LLDAP_ADMIN_PASS=$(read_env "$LLDAP_ENV" LLDAP_LDAP_USER_PASS) +BG_PASSWORD=$(read_env "$BG_ENV" BREAKGLASS_PASSWORD) + +if [[ -z "$LLDAP_ADMIN_PASS" || -z "$BG_PASSWORD" ]]; then + echo "ERROR: could not read LLDAP_LDAP_USER_PASS or BREAKGLASS_PASSWORD" >&2 + exit 1 +fi + +# ── Authenticate ────────────────────────────────────────────────────────────── +echo "" +echo "Authenticating to LLDAP at $LLDAP_URL ..." +AUTH_RESP=$(curl -sf -X POST "$LLDAP_URL/auth/simple/login" \ + -H "Content-Type: application/json" \ + -d "{\"username\":\"admin\",\"password\":\"$LLDAP_ADMIN_PASS\"}" \ + 2>/dev/null || echo "CURL_FAILED") + +if [[ "$AUTH_RESP" == "CURL_FAILED" ]]; then + echo "ERROR: Could not reach $LLDAP_URL" >&2 + exit 1 +fi +LLDAP_TOKEN=$(echo "$AUTH_RESP" | python3 -c \ + "import sys,json; print(json.load(sys.stdin).get('token',''))" 2>/dev/null || echo "") +if [[ -z "$LLDAP_TOKEN" ]]; then + echo "ERROR: LLDAP authentication failed" >&2 + exit 1 +fi +info "Authenticated as admin" + +gql() { + local query="$1"; local vars="${2:-{}}" + local body + body=$(python3 -c " +import json, sys +print(json.dumps({'query': sys.argv[1], 'variables': json.loads(sys.argv[2])})) +" "$query" "$vars") + curl -sf -X POST "$LLDAP_URL/api/graphql" \ + -H "Authorization: Bearer $LLDAP_TOKEN" \ + -H "Content-Type: application/json" \ + -d "$body" 2>/dev/null || echo "CURL_FAILED" +} + +# ── Check if user already exists ────────────────────────────────────────────── +echo "" +echo "Checking if break-glass user exists ..." 
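Both LLDAP scripts build each GraphQL request body with an embedded `python3 -c` call and later read `errors[0].message` from the reply. A minimal sketch of both halves as plain Python functions follows, assuming replies use the standard GraphQL JSON shape (`data` and/or `errors`). Unlike the inline one-liners, `first_error` reports an unparseable reply, such as the scripts' `CURL_FAILED` sentinel, as an error rather than as success, and `group_id` compares the group name as data instead of splicing `'$name'` into Python source. All three function names are hypothetical.

```python
import json
from typing import Any, Optional


def graphql_body(query: str, variables: Optional[dict] = None) -> str:
    """Serialise a request the way the scripts' gql() helper does."""
    return json.dumps({"query": query, "variables": variables or {}})


def first_error(response_text: str) -> Optional[str]:
    """Return the first GraphQL error message, or None when the call succeeded.

    A reply that is not a JSON object (for example the CURL_FAILED sentinel)
    is reported as an error instead of being mistaken for success.
    """
    try:
        payload: Any = json.loads(response_text)
    except json.JSONDecodeError:
        return f"non-JSON response: {response_text[:80]!r}"
    if not isinstance(payload, dict):
        return "unexpected response shape"
    errors = payload.get("errors")
    if isinstance(errors, list) and errors:
        first = errors[0]
        message = first.get("message") if isinstance(first, dict) else None
        return str(message or "unknown GraphQL error")
    return None


def group_id(response_text: str, display_name: str) -> Optional[int]:
    """Find a group's id in a `groups { id displayName }` reply."""
    payload = json.loads(response_text)
    for group in (payload.get("data") or {}).get("groups", []):
        if group.get("displayName") == display_name:
            return group["id"]
    return None
```

In the shell scripts, the same two fixes amount to giving each error-extraction pipeline a non-empty fallback message and passing names to `python3 -c` through `sys.argv`.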
+LIST_RESP=$(gql 'query { users { id displayName email } }') +if [[ "$LIST_RESP" != "CURL_FAILED" ]]; then + EXISTS=$(echo "$LIST_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); print('yes' if any(u['id']=='$BG_USERNAME' for u in d.get('data',{}).get('users',[])) else 'no')" \ + 2>/dev/null || echo "no") + if [[ "$EXISTS" == "yes" ]]; then + info "User '$BG_USERNAME' already exists — skipping creation" + ok "User '$BG_USERNAME' exists" + else + # ── Create user ─────────────────────────────────────────────────────── + echo "Creating user '$BG_USERNAME' ..." + CREATE_VARS=$(python3 -c " +import json, sys +v = { + 'user': { + 'id': '$BG_USERNAME', + 'email': '$BG_EMAIL', + 'displayName': '$BG_DISPLAY', + 'firstName': 'Break', + 'lastName': 'Glass' + } +} +print(json.dumps(v)) +") + RESP=$(gql 'mutation CreateUser($user: CreateUserInput!) { createUser(user: $user) { id creationDate } }' \ + "$CREATE_VARS") + # A non-JSON reply (e.g. CURL_FAILED) must count as an error, not as success + ERR=$(echo "$RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); errs=d.get('errors',[]); print(errs[0]['message'] if errs else '')" \ + 2>/dev/null || echo "no valid JSON response ($RESP)") + if [[ -n "$ERR" ]]; then + fail "Create user '$BG_USERNAME' — $ERR" + else + ok "User '$BG_USERNAME' created" + fi + fi +else + fail "Could not query user list from LLDAP" +fi + +# ── Set password ────────────────────────────────────────────────────────────── +# LLDAP does not take a password in createUser, so it is set in a separate step. +# NOTE: LLDAP normally sets passwords through its OPAQUE registration protocol +# (the lldap_set_password tool shipped with LLDAP does this). Confirm that the +# changeUserPassword mutation used below exists in your LLDAP version's GraphQL +# schema before relying on this step. +echo "" +echo "Setting password for '$BG_USERNAME' ..." +PW_VARS=$(python3 -c " +import json, sys +print(json.dumps({'userId': '$BG_USERNAME', 'password': sys.argv[1]})) +" "$BG_PASSWORD") +PW_RESP=$(gql 'mutation SetPassword($userId: String!, $password: String!)
{ changeUserPassword(userId: $userId, password: $password) }' \ + "$PW_VARS") +PW_ERR=$(echo "$PW_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); errs=d.get('errors',[]); print(errs[0]['message'] if errs else '')" \ + 2>/dev/null || echo "no valid JSON response ($PW_RESP)") +if [[ -n "$PW_ERR" ]]; then + fail "Set password — $PW_ERR" +else + ok "Password set for '$BG_USERNAME'" +fi + +# ── Add to net-kingdom-admins ───────────────────────────────────────────────── +echo "" +echo "Adding '$BG_USERNAME' to net-kingdom-admins group ..." + +# Find the net-kingdom-admins group ID +GROUPS_RESP=$(gql 'query { groups { id displayName } }') +ADMIN_GID=$(echo "$GROUPS_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); gs=[g for g in d.get('data',{}).get('groups',[]) if g['displayName']=='net-kingdom-admins']; print(gs[0]['id'] if gs else '')" \ + 2>/dev/null || echo "") + +if [[ -z "$ADMIN_GID" ]]; then + fail "Group 'net-kingdom-admins' not found — run bootstrap-users.sh first" +else + ADD_VARS="{\"userId\":\"$BG_USERNAME\",\"groupId\":$ADMIN_GID}" + # addUserToGroup returns an object (Success), so the mutation must select a field + ADD_RESP=$(gql 'mutation AddToGroup($userId: String!, $groupId: Int!) { addUserToGroup(userId: $userId, groupId: $groupId) { ok } }' \ + "$ADD_VARS") + ADD_ERR=$(echo "$ADD_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); errs=d.get('errors',[]); print(errs[0]['message'] if errs else '')" \ + 2>/dev/null || echo "no valid JSON response ($ADD_RESP)") + if [[ -n "$ADD_ERR" && "$ADD_ERR" != *"already"* ]]; then + fail "Add to group — $ADD_ERR" + else + ok "'$BG_USERNAME' is in net-kingdom-admins" + fi +fi + +# ── Summary ─────────────────────────────────────────────────────────────────── +echo "" +echo "════════════════════════════════════════════════════════════" +echo " Break-glass bootstrap: PASS=$PASS_COUNT FAIL=$FAIL_COUNT" +echo "════════════════════════════════════════════════════════════" +echo "" +echo "CRITICAL — do these steps NOW:" +echo "" +echo " 1.
Store the break-glass password in KeePassXC:" +echo " Group: net-kingdom/Break-glass" +echo " Entry: break-glass → username='$BG_USERNAME' password=<BREAKGLASS_PASSWORD>" +echo "" +echo " 2. Test the account (LLDAP WebUI login):" +echo " $LLDAP_URL" +echo " Login as '$BG_USERNAME' with BREAKGLASS_PASSWORD" +echo " Confirm you can see the admin panel." +echo "" +echo " 3. Do NOT enroll MFA for '$BG_USERNAME' in privacyIDEA." +echo " This account must remain usable when privacyIDEA is unavailable." +echo " Its sole authentication factor is the password stored in KeePassXC." +echo "" +echo " 4. Review the DR restore sequence:" +echo " See sso-mfa/k8s/backup/DR-RUNBOOK.md" +echo "" + +if [[ "$FAIL_COUNT" -gt 0 ]]; then + exit 1 +fi +exit 0 diff --git a/sso-mfa/k8s/verify-t07.sh b/sso-mfa/k8s/verify-t07.sh new file mode 100755 index 0000000..5ed7259 --- /dev/null +++ b/sso-mfa/k8s/verify-t07.sh @@ -0,0 +1,199 @@ +#!/usr/bin/env bash +# verify-t07.sh — verify NK-WP-0001-T07 done-criteria +# +# Checks user management and self-service readiness. +# +# Sections: +# 1. LLDAP group: net-kingdom-users exists +# 2. LLDAP group: net-kingdom-admins exists +# 3. At least one non-admin user exists in LLDAP +# 4. Break-glass user exists and is in net-kingdom-admins +# 5. privacyIDEA self-service portal reachable +# 6.
KeyCape config has at least one OIDC client registered +# +# Usage: +# chmod +x verify-t07.sh +# ./verify-t07.sh [lldap-url] [secrets-dir] +# +# lldap-url default: https://lldap.coulomb.social +# secrets-dir default: ../bootstrap/secrets + +set -euo pipefail + +LLDAP_URL="${1:-https://lldap.coulomb.social}" +SECRETS_DIR="${2:-../bootstrap/secrets}" +LLDAP_ENV="$SECRETS_DIR/lldap/secrets.env" +SSO_NAMESPACE="sso" + +PASS=0 +FAIL=0 +WARN=0 + +pass() { echo " [PASS] $1"; PASS=$((PASS + 1)); } +fail() { echo " [FAIL] $1"; FAIL=$((FAIL + 1)); } +warn() { echo " [WARN] $1"; WARN=$((WARN + 1)); } + +section() { echo ""; echo "── $1 ──────────────────────────────────────"; } + +# ── Authenticate to LLDAP ───────────────────────────────────────────────────── +LLDAP_TOKEN="" +if [[ -f "$LLDAP_ENV" ]]; then + read_env() { bash -c "source '$1' 2>/dev/null; echo \${$2}"; } + LLDAP_ADMIN_PASS=$(read_env "$LLDAP_ENV" LLDAP_LDAP_USER_PASS) + if [[ -n "$LLDAP_ADMIN_PASS" ]]; then + AUTH_RESP=$(curl -sf -X POST "$LLDAP_URL/auth/simple/login" \ + -H "Content-Type: application/json" \ + -d "{\"username\":\"admin\",\"password\":\"$LLDAP_ADMIN_PASS\"}" \ + 2>/dev/null || echo "CURL_FAILED") + if [[ "$AUTH_RESP" != "CURL_FAILED" ]]; then + LLDAP_TOKEN=$(echo "$AUTH_RESP" | python3 -c \ + "import sys,json; print(json.load(sys.stdin).get('token',''))" 2>/dev/null || echo "") + fi + fi +fi + +gql() { + if [[ -z "$LLDAP_TOKEN" ]]; then echo "NO_TOKEN"; return; fi + local query="$1"; local vars="${2:-{}}" + local body + body=$(python3 -c " +import json, sys +print(json.dumps({'query': sys.argv[1], 'variables': json.loads(sys.argv[2])})) +" "$query" "$vars") + curl -sf -X POST "$LLDAP_URL/api/graphql" \ + -H "Authorization: Bearer $LLDAP_TOKEN" \ + -H "Content-Type: application/json" \ + -d "$body" 2>/dev/null || echo "CURL_FAILED" +} + +GROUPS_RESP=$(gql 'query { groups { id displayName members { id } } }') + +# ── 1. net-kingdom-users group ─────────────────────────────────────────────── +section "1.
LLDAP group: net-kingdom-users" +if [[ "$GROUPS_RESP" == "NO_TOKEN" ]]; then + warn "Skipping — could not authenticate to LLDAP at $LLDAP_URL" +elif [[ "$GROUPS_RESP" == "CURL_FAILED" ]]; then + fail "Could not query LLDAP groups — is LLDAP up?" +else + EXISTS=$(echo "$GROUPS_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); print('yes' if any(g['displayName']=='net-kingdom-users' for g in d.get('data',{}).get('groups',[])) else 'no')" \ + 2>/dev/null || echo "no") + if [[ "$EXISTS" == "yes" ]]; then + pass "Group 'net-kingdom-users' exists" + else + fail "Group 'net-kingdom-users' not found — run lldap/bootstrap-users.sh" + fi +fi + +# ── 2. net-kingdom-admins group ────────────────────────────────────────────── +section "2. LLDAP group: net-kingdom-admins" +if [[ "$GROUPS_RESP" == "NO_TOKEN" || "$GROUPS_RESP" == "CURL_FAILED" ]]; then + warn "Skipping — see section 1" +else + EXISTS=$(echo "$GROUPS_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); print('yes' if any(g['displayName']=='net-kingdom-admins' for g in d.get('data',{}).get('groups',[])) else 'no')" \ + 2>/dev/null || echo "no") + if [[ "$EXISTS" == "yes" ]]; then + pass "Group 'net-kingdom-admins' exists" + else + fail "Group 'net-kingdom-admins' not found — run lldap/bootstrap-users.sh" + fi +fi + +# ── 3. At least one non-admin user ─────────────────────────────────────────── +section "3. 
At least one regular user in LLDAP" +USERS_RESP=$(gql 'query { users { id displayName email } }') +if [[ "$USERS_RESP" == "NO_TOKEN" || "$USERS_RESP" == "CURL_FAILED" ]]; then + warn "Skipping — could not query users" +else + USERS=$(echo "$USERS_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); us=[u['id'] for u in d.get('data',{}).get('users',[]) if u['id'] not in ('admin','break-glass')]; print(len(us))" \ + 2>/dev/null || echo "0") + if [[ "$USERS" -gt 0 ]]; then + pass "$USERS regular user(s) exist in LLDAP" + else + warn "No regular users found — add users via the LLDAP WebUI ($LLDAP_URL) or provisioning script" + fi +fi + +# ── 4. Break-glass user ─────────────────────────────────────────────────────── +section "4. Break-glass account" +if [[ "$USERS_RESP" == "NO_TOKEN" || "$USERS_RESP" == "CURL_FAILED" ]]; then + warn "Skipping — could not query users" +else + BG_EXISTS=$(echo "$USERS_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); print('yes' if any(u['id']=='break-glass' for u in d.get('data',{}).get('users',[])) else 'no')" \ + 2>/dev/null || echo "no") + if [[ "$BG_EXISTS" == "yes" ]]; then + pass "User 'break-glass' exists in LLDAP" + # Check group membership + BG_IN_ADMINS=$(echo "$GROUPS_RESP" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); gs=[g for g in d.get('data',{}).get('groups',[]) if g['displayName']=='net-kingdom-admins']; print('yes' if gs and any(m['id']=='break-glass' for m in gs[0].get('members',[])) else 'no')" \ + 2>/dev/null || echo "no") + if [[ "$BG_IN_ADMINS" == "yes" ]]; then + pass "'break-glass' is in net-kingdom-admins group" + else + warn "'break-glass' is not in net-kingdom-admins — run lldap/break-glass.sh" + fi + else + fail "User 'break-glass' not found — run lldap/break-glass.sh" + fi +fi + +# ── 5. Self-service portal ──────────────────────────────────────────────────── +section "5. 
privacyIDEA self-service portal (pink-account.coulomb.social)" +PORTAL_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \ + "https://pink-account.coulomb.social" 2>/dev/null) || PORTAL_STATUS="000" +if [[ "$PORTAL_STATUS" == "200" || "$PORTAL_STATUS" == "302" ]]; then + pass "Self-service portal reachable (HTTP $PORTAL_STATUS)" +elif [[ "$PORTAL_STATUS" == "000" ]]; then + warn "Self-service portal not reachable — check DNS and ingress" +else + warn "Self-service portal returned HTTP $PORTAL_STATUS" +fi + +# ── 6. KeyCape OIDC client registrations ───────────────────────────────────── +section "6. KeyCape OIDC client registrations" +KC_POD=$(kubectl get pod -n "$SSO_NAMESPACE" \ + -l app.kubernetes.io/name=keycape \ + --field-selector=status.phase=Running \ + -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") +if [[ -n "$KC_POD" ]]; then + DISCOVERY=$(kubectl exec -n "$SSO_NAMESPACE" "$KC_POD" -- \ + wget -qO- "http://localhost:8080/.well-known/openid-configuration" 2>/dev/null || echo "") + if [[ -n "$DISCOVERY" ]]; then + pass "KeyCape OIDC discovery endpoint accessible" + # Check config for registered clients (KeyCape config in keycape-config Secret) + CONFIG=$(kubectl get secret keycape-config -n "$SSO_NAMESPACE" \ + -o jsonpath='{.data.config\.yaml}' 2>/dev/null | base64 -d 2>/dev/null || echo "") + CLIENT_COUNT=$(echo "$CONFIG" | python3 -c \ + "import sys; import yaml; cfg=yaml.safe_load(sys.stdin.read()); print(len(cfg.get('clients',[])))" \ + 2>/dev/null || echo "0") + if [[ "$CLIENT_COUNT" -gt 0 ]]; then + pass "$CLIENT_COUNT OIDC client(s) registered in KeyCape" + else + warn "No OIDC clients found (or the keycape-config Secret could not be read or parsed) — add clients to keycape/create-secrets.sh and re-run it" + fi + else + warn "Skipping client check — KeyCape not reachable in-cluster" + fi +else + warn "Skipping OIDC client check — no running KeyCape pod" +fi + +# ── Summary ─────────────────────────────────────────────────────────────────── +echo "" +echo
"════════════════════════════════════════════════════════════" +echo " T07 verification: PASS=$PASS WARN=$WARN FAIL=$FAIL" +echo "════════════════════════════════════════════════════════════" + +if [[ "$FAIL" -gt 0 ]]; then + echo " Result: INCOMPLETE — resolve FAIL items before marking T07 done" + exit 1 +elif [[ "$WARN" -gt 0 ]]; then + echo " Result: PARTIAL — required structure is in place; review WARN items" + exit 0 +else + echo " Result: COMPLETE — T07 done-criteria met; proceed to T08 (Backups, DR, break-glass)" + exit 0 +fi diff --git a/sso-mfa/k8s/verify-t08.sh b/sso-mfa/k8s/verify-t08.sh new file mode 100755 index 0000000..dff4d61 --- /dev/null +++ b/sso-mfa/k8s/verify-t08.sh @@ -0,0 +1,174 @@ +#!/usr/bin/env bash +# verify-t08.sh — verify NK-WP-0001-T08 done-criteria +# +# Checks backups, DR readiness, and break-glass account. +# +# Sections: +# 1. Backup CronJobs exist (lldap-backup, authelia-backup, privacyidea-backup) +# 2. backup-sa ServiceAccount and RBAC exist +# 3. lldap-backup has run successfully at least once +# 4. authelia-backup has run successfully at least once +# 5. privacyidea-backup has run successfully at least once +# 6. privacyIDEA enckey backup exists on PVC +# 7. LLDAP SQLite backup exists on PVC +# 8. DR-RUNBOOK.md present in repo +# 9. 
KeePassXC ops bundle (pack-bundle.sh) — manual confirmation required +# +# Usage: +# chmod +x verify-t08.sh +# ./verify-t08.sh + +set -euo pipefail + +SSO_NAMESPACE="sso" +MFA_NAMESPACE="mfa" +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +PASS=0 +FAIL=0 +WARN=0 + +pass() { echo " [PASS] $1"; PASS=$((PASS + 1)); } +fail() { echo " [FAIL] $1"; FAIL=$((FAIL + 1)); } +warn() { echo " [WARN] $1"; WARN=$((WARN + 1)); } + +section() { echo ""; echo "── $1 ──────────────────────────────────────"; } + +check_cronjob() { + local name="$1"; local ns="$2" + if kubectl get cronjob "$name" -n "$ns" &>/dev/null; then + pass "CronJob $name exists (namespace: $ns)" + local schedule + schedule=$(kubectl get cronjob "$name" -n "$ns" \ + -o jsonpath='{.spec.schedule}' 2>/dev/null || echo "?") + echo " Schedule: $schedule" + else + fail "CronJob $name not found in namespace $ns — apply backup/cronjob-sqlite-backups.yaml" + fi +} + +check_last_job() { + local cronjob="$1"; local ns="$2" + # Most recent Job carrying a controller-uid label (not scoped to this CronJob; LAST_JOB is not used below) + LAST_JOB=$(kubectl get job -n "$ns" \ + -l "batch.kubernetes.io/controller-uid" \ + --sort-by=.metadata.creationTimestamp \ + -o jsonpath='{.items[-1].metadata.name}' 2>/dev/null || echo "") + # Collect .status.succeeded of every Job whose first ownerReference is this CronJob + SUCCEEDED=$(kubectl get job -n "$ns" \ + -o jsonpath="{.items[?(@.metadata.ownerReferences[0].name==\"$cronjob\")].status.succeeded}" \ + 2>/dev/null || echo "") + if [[ "$SUCCEEDED" == *"1"* ]]; then + pass "CronJob $cronjob has at least one successful run" + else + warn "CronJob $cronjob has no successful runs yet — trigger manually to test:" + warn " kubectl create job -n $ns --from=cronjob/$cronjob ${cronjob}-manual-test" + fi +} + +# ── 1. Backup CronJobs ──────────────────────────────────────────────────────── +section "1.
Backup CronJobs" +check_cronjob "lldap-backup" "$SSO_NAMESPACE" +check_cronjob "authelia-backup" "$SSO_NAMESPACE" +check_cronjob "privacyidea-backup" "$MFA_NAMESPACE" + +# ── 2. RBAC ─────────────────────────────────────────────────────────────────── +section "2. Backup ServiceAccount and RBAC (namespace: $SSO_NAMESPACE)" +if kubectl get serviceaccount backup-sa -n "$SSO_NAMESPACE" &>/dev/null; then + pass "ServiceAccount backup-sa exists" +else + fail "ServiceAccount backup-sa not found — apply backup/cronjob-sqlite-backups.yaml" +fi +if kubectl get role backup-scaler -n "$SSO_NAMESPACE" &>/dev/null; then + pass "Role backup-scaler exists" +else + fail "Role backup-scaler not found" +fi +if kubectl get rolebinding backup-sa-scaler -n "$SSO_NAMESPACE" &>/dev/null; then + pass "RoleBinding backup-sa-scaler exists" +else + fail "RoleBinding backup-sa-scaler not found" +fi + +# ── 3–5. CronJob run history ────────────────────────────────────────────────── +section "3. lldap-backup run history" +check_last_job "lldap-backup" "$SSO_NAMESPACE" + +section "4. authelia-backup run history" +check_last_job "authelia-backup" "$SSO_NAMESPACE" + +section "5. privacyidea-backup run history" +check_last_job "privacyidea-backup" "$MFA_NAMESPACE" + +# ── 6. privacyIDEA enckey backup on PVC ────────────────────────────────────── +section "6. 
privacyIDEA enckey backup on PVC" +PI_POD=$(kubectl get pod -n "$MFA_NAMESPACE" \ + -l app.kubernetes.io/name=privacyidea \ + --field-selector=status.phase=Running \ + -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") +if [[ -n "$PI_POD" ]]; then + BACKUP_COUNT=$(kubectl exec -n "$MFA_NAMESPACE" "$PI_POD" -- \ + sh -c 'ls /data/backups/enckey.backup.* 2>/dev/null | wc -l' 2>/dev/null || echo "0") + BACKUP_COUNT="${BACKUP_COUNT// /}" + if [[ "$BACKUP_COUNT" -gt 0 ]]; then + pass "privacyIDEA enckey backups found on PVC ($BACKUP_COUNT file(s))" + else + warn "No enckey backup files on PVC yet — trigger privacyidea-backup CronJob to create one" + warn " kubectl create job -n $MFA_NAMESPACE --from=cronjob/privacyidea-backup pi-backup-test" + fi +else + warn "Skipping enckey backup check — no running privacyIDEA pod" +fi + +# ── 7. LLDAP SQLite backup on PVC ──────────────────────────────────────────── +section "7. LLDAP SQLite backup on PVC" +LLDAP_POD=$(kubectl get pod -n "$SSO_NAMESPACE" \ + -l app.kubernetes.io/name=lldap \ + --field-selector=status.phase=Running \ + -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") +if [[ -n "$LLDAP_POD" ]]; then + BACKUP_COUNT=$(kubectl exec -n "$SSO_NAMESPACE" "$LLDAP_POD" -- \ + sh -c 'ls /data/backups/users.backup.* 2>/dev/null | wc -l' 2>/dev/null || echo "0") + BACKUP_COUNT="${BACKUP_COUNT// /}" + if [[ "$BACKUP_COUNT" -gt 0 ]]; then + pass "LLDAP SQLite backups found on PVC ($BACKUP_COUNT file(s))" + else + warn "No LLDAP backup files on PVC yet — trigger lldap-backup CronJob to create one" + warn " kubectl create job -n $SSO_NAMESPACE --from=cronjob/lldap-backup lldap-backup-test" + fi +else + warn "Skipping LLDAP backup check — no running LLDAP pod" +fi + +# ── 8. DR runbook present ───────────────────────────────────────────────────── +section "8. 
DR runbook" +RUNBOOK="$SCRIPT_DIR/backup/DR-RUNBOOK.md" +if [[ -f "$RUNBOOK" ]]; then + pass "DR-RUNBOOK.md present at $RUNBOOK" +else + fail "DR-RUNBOOK.md not found — it should be at sso-mfa/k8s/backup/DR-RUNBOOK.md" +fi + +# ── 9. Offsite backup (manual confirmation) ─────────────────────────────────── +section "9. Offsite backup (manual)" +warn "Cannot verify offsite backup automatically — confirm manually:" +warn " - pack-bundle.sh has been run with current secrets" +warn " - ops-bundle.tar.age stored in a separate physical location" +warn " - age decryption key stored separately (NOT in the same location as the bundle)" + +# ── Summary ─────────────────────────────────────────────────────────────────── +echo "" +echo "════════════════════════════════════════════════════════════" +echo " T08 verification: PASS=$PASS WARN=$WARN FAIL=$FAIL" +echo "════════════════════════════════════════════════════════════" + +if [[ "$FAIL" -gt 0 ]]; then + echo " Result: INCOMPLETE — resolve FAIL items before marking T08 done" + exit 1 +elif [[ "$WARN" -gt 0 ]]; then + echo " Result: PARTIAL — structure is in place; resolve WARN items (trigger CronJobs)" + exit 0 +else + echo " Result: COMPLETE — T08 done-criteria met; SSO/MFA platform workplan complete!" + exit 0 +fi
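
Counter helpers in scripts that run under `set -euo pipefail` need care. A bare `((PASS++))` evaluates to the counter's old value, so while the counter is still 0 the arithmetic command returns status 1 and `set -e` stops the script at the first check. The sketch below (plain bash, no cluster needed) reuses the pass/warn/fail helper names from the verify scripts and increments with an assignment, which always returns status 0:

```shell
#!/usr/bin/env bash
# Sketch: pass/warn/fail counters that are safe under `set -euo pipefail`.
set -euo pipefail

PASS=0
FAIL=0
WARN=0

# A plain assignment returns status 0, unlike ((PASS++)), which returns 1
# when the value before the increment is 0 and would trip `set -e`.
pass() { echo "  [PASS] $1"; PASS=$((PASS + 1)); }
fail() { echo "  [FAIL] $1"; FAIL=$((FAIL + 1)); }
warn() { echo "  [WARN] $1"; WARN=$((WARN + 1)); }

pass "first check"    # with ((PASS++)) the script would exit here
pass "second check"
warn "optional item missing"

echo "PASS=$PASS WARN=$WARN FAIL=$FAIL"
```

Run as-is, it prints two `[PASS]` lines, one `[WARN]` line and `PASS=2 WARN=1 FAIL=0`. Swapping `PASS=$((PASS + 1))` for `((PASS++))` makes it exit with status 1 right after the first `[PASS]` line; `((++PASS))` or `((PASS++)) || true` are equivalent fixes.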
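
Capturing an HTTP status with `curl -o /dev/null -w "%{http_code}"` inside `$(...)` and putting a `|| echo "000"` fallback inside the same substitution can double the value: when no response arrives, curl already writes `000` and then exits non-zero, so the fallback appends a second `000` and a comparison against `"000"` never matches (with `-f`, HTTP error responses can be affected the same way). The sketch below uses a hypothetical `fake_curl` stub that mimics the connection-failure case (prints `000`, returns 7) so it runs without a network, and contrasts the two forms:

```shell
#!/usr/bin/env bash
# Sketch: why `$(curl ... || echo "000")` can yield "000000".
# fake_curl is a hypothetical stand-in for
#   curl -s -o /dev/null -w "%{http_code}" <url>
# when the connection fails: it prints "000" and exits 7.
set -euo pipefail

fake_curl() { printf '000'; return 7; }

# Fallback inside the substitution: both outputs end up in the variable.
BROKEN=$(fake_curl || echo "000")

# Fallback outside the substitution: it runs only if the substitution fails.
FIXED=$(fake_curl) || FIXED="000"

echo "broken=$BROKEN fixed=$FIXED"
```

In a real probe the second form also leaves genuine HTTP codes intact: without `-f`, curl exits 0 on a 404, so the variable simply holds `404`.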