feat(sso-mfa): T07/T08 user mgmt, backups, DR & break-glass (NK-WP-0001-T07/T08)

T07 — User management & self-service:
- k8s/lldap/bootstrap-users.sh: creates net-kingdom-users and net-kingdom-admins
  groups in LLDAP via GraphQL API; idempotent.
- k8s/lldap/break-glass.sh: creates break-glass bypass account in LLDAP,
  sets BREAKGLASS_PASSWORD, assigns to net-kingdom-admins.
- k8s/verify-t07.sh: 6 checks — groups, break-glass, self-service portal,
  KeyCape OIDC client registrations.

T08 — Backups, DR, break-glass:
- k8s/backup/cronjob-sqlite-backups.yaml: daily CronJobs for LLDAP SQLite,
  Authelia SQLite (with scale-down/up RBAC), and privacyIDEA enckey backup.
  7-day retention, 03:00/03:15/03:30 UTC staggered schedule.
- k8s/backup/DR-RUNBOOK.md: full restore runbook — scenarios, restore order,
  LLDAP/Authelia/PI SQLite restore procedure, full node rebuild sequence,
  offsite age-encrypted export.
- k8s/verify-t08.sh: 9 checks — CronJobs, RBAC, run history, backup files
  on PVCs, DR runbook presence, offsite backup (manual confirmation).
- WORKPLAN.md: T07/T08 sections with done-criteria added.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 09:17:03 +00:00
parent 69e900ddb1
commit 6c062e1295
7 changed files with 1288 additions and 3 deletions


@@ -1,7 +1,7 @@
# SSO-MFA Platform — Stack Migration Workplan
# NK-WP-0001 — Keycloak → Authelia + LLDAP + KeyCape
-**Updated:** 2026-03-19 (T06 in progress)
+**Updated:** 2026-03-19 (T06 pending cluster; T07/T08 manifests complete)
**Workstream:** sso-mfa-platform (39263c4b-ef70-4053-b782-350834b7e1be)
## Stack Decision
@@ -24,8 +24,8 @@ Hostnames: kc.coulomb.social (KeyCape), auth.coulomb.social (Authelia), lldap.co
| T04 — privacyIDEA | 6ad1296a | **todo** | Manifests exist in k8s/privacyidea/; pending cluster |
| T05 — SSO core (new stack) | b9f73aa6 | done | commit 0754dc3 |
| T06 — Realm config & MFA flow | 3b6379a4 | **in-progress** | See below |
-| T07 — User mgmt & self-service | c7cf902a | todo | |
-| T08 — Backups, DR, break-glass | 9cbd1d89 | todo | |
+| T07 — User mgmt & self-service | c7cf902a | **in-progress** | See below |
+| T08 — Backups, DR, break-glass | 9cbd1d89 | **in-progress** | See below |
## T05 — SSO Core (new stack: LLDAP + Authelia + KeyCape)
@@ -76,3 +76,49 @@ Hostnames: kc.coulomb.social (KeyCape), auth.coulomb.social (Authelia), lldap.co
- KeyCape→privacyIDEA token list API returns status=True
- At least one user has enrolled a TOTP token
- verify-t06.sh: 0 FAILs
## T07 — User mgmt & self-service
### Deliverables
- [x] `k8s/lldap/bootstrap-users.sh` — creates net-kingdom-users and net-kingdom-admins groups in LLDAP via GraphQL API
- [x] `k8s/lldap/break-glass.sh` — creates the break-glass bypass account and assigns to net-kingdom-admins
- [x] `k8s/verify-t07.sh` — verifies groups, break-glass user, self-service portal, OIDC client registrations
### Pending (needs live cluster)
- [ ] Run `lldap/bootstrap-users.sh` to create groups
- [ ] Run `lldap/break-glass.sh` to create break-glass account
- [ ] Add first real user via LLDAP WebUI (lldap.coulomb.social)
- [ ] Register first OIDC client in `keycape/create-secrets.sh` (clients: block)
- [ ] User self-enrolls TOTP at pink-account.coulomb.social
- [ ] Run `verify-t07.sh` — 0 FAILs
### Done-criteria for T07
- Groups net-kingdom-users and net-kingdom-admins exist in LLDAP
- break-glass user exists and is in net-kingdom-admins
- At least one regular user exists
- At least one OIDC client registered in KeyCape
- verify-t07.sh: 0 FAILs
## T08 — Backups, DR, break-glass
### Deliverables
- [x] `k8s/backup/cronjob-sqlite-backups.yaml` — daily SQLite backup CronJobs for LLDAP, Authelia, privacyIDEA; RBAC for Authelia scale-down/up
- [x] `k8s/backup/DR-RUNBOOK.md` — full restore runbook: scenarios, restore order, node rebuild procedure, offsite export
- [x] `k8s/verify-t08.sh` — verifies CronJobs, RBAC, backup files on PVCs, DR runbook presence
### Pending (needs live cluster)
- [ ] Apply `backup/cronjob-sqlite-backups.yaml`
- [ ] Trigger each CronJob manually once to verify that it runs cleanly:
`kubectl create job -n sso --from=cronjob/lldap-backup lldap-backup-test`
`kubectl create job -n sso --from=cronjob/authelia-backup authelia-backup-test`
`kubectl create job -n mfa --from=cronjob/privacyidea-backup pi-backup-test`
- [ ] Confirm backup files appear on PVCs
- [ ] Run offsite export: pull backup files, encrypt with age, store offsite
- [ ] Run `verify-t08.sh` — 0 FAILs
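
The manual trigger step above could be wrapped in a small helper. This is a minimal sketch and not part of the repository: the function names, the 300s timeout, and the `KUBECTL` override (for dry runs) are assumptions made for illustration, while the CronJob names, namespaces, and test job names are taken from the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one CronJob as a one-off Job and wait for it.
# Set KUBECTL="echo kubectl" to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

run_backup_once() {
  local ns="$1" cronjob="$2"
  local job="${3:-${cronjob}-test}"
  $KUBECTL create job -n "$ns" --from="cronjob/${cronjob}" "$job" || return 1
  # Block until the Job reports the Complete condition, or time out.
  $KUBECTL wait -n "$ns" --for=condition=complete "job/${job}" --timeout=300s || return 1
  # Show what the backup script printed (OK / WARN lines).
  $KUBECTL logs -n "$ns" "job/${job}"
}

run_all_backups_once() {
  run_backup_once sso lldap-backup &&
  run_backup_once sso authelia-backup &&
  run_backup_once mfa privacyidea-backup pi-backup-test
}
```

Calling `run_all_backups_once` stops at the first job that fails or times out, so a broken backup is noticed before the next one starts. The test Jobs are not deleted automatically.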
### Done-criteria for T08
- All three backup CronJobs deployed and have ≥1 successful run
- Backup files confirmed on PVCs
- DR-RUNBOOK.md reviewed by operator
- Offsite ops bundle current (pack-bundle.sh run after all secrets finalised)
- verify-t08.sh: 0 FAILs


@@ -0,0 +1,187 @@
# Disaster Recovery Runbook — net-kingdom SSO/MFA Platform
**Stack:** LLDAP + Authelia + KeyCape (sso namespace) + privacyIDEA (mfa namespace)
**PostgreSQL:** Managed separately by CNPG (`postgresql/scheduled-backup.yaml`)
---
## Recovery scenarios
| Scenario | Impact | Recovery |
|----------|--------|----------|
| Pod crash / OOM | Stateless pods (KeyCape) recover automatically. Stateful pods (LLDAP, Authelia, PI) restart and reload from PVC. | K8s self-heals. Verify with `verify-t05.sh`. |
| PVC data corruption | Users/sessions/tokens lost. | Restore from SQLite backup (see below). |
| Node failure (single-node K3s) | All pods lost. PVCs intact on host. | Re-apply all manifests (idempotent). Pods re-attach to PVCs. |
| Node total loss (disk gone) | Everything lost. | Full restore from backup + KeePassXC. |
| Stack locked out (SSO broken, can't log in) | No user access to OIDC-protected apps. | Use break-glass account. |
| enckey lost (privacyIDEA) | All enrolled MFA tokens invalid. Users must re-enroll. | Restore from enckey backup or re-enroll all tokens. |
---
## Break-glass access
When the SSO stack is broken and no user can authenticate:
```bash
# 1. Access LLDAP admin UI directly (requires VPN / IP-allowlisted access)
# URL: https://lldap.coulomb.social
# Username: break-glass
# Password: from KeePassXC → net-kingdom/Break-glass/break-glass
#
# 2. Or open a shell in the LLDAP pod via kubectl exec (no ingress required)
kubectl exec -it -n sso deployment/lldap -- /bin/sh
# Inside container: use ldapwhoami / ldapsearch to verify directory state
# 3. Access privacyIDEA admin UI
# URL: https://pink.coulomb.social
# Username: pi-admin
# Password: from KeePassXC → net-kingdom/privacyIDEA/pi-admin
# NOTE: pi-admin has MFA enrolled — if privacyIDEA MFA is down, use:
kubectl exec -n mfa deployment/privacyidea -- pi-manage admin list
```
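
To confirm that the break-glass password still works without opening a browser, the login endpoint used by the bootstrap scripts (`/auth/simple/login`) can be called directly. The sketch below is illustrative only: `json_escape` and `lldap_login_ok` are hypothetical helper names, the escaping covers only backslashes and double quotes, and it assumes a successful response contains a `token` field, as the bootstrap scripts expect.

```bash
# Hypothetical helpers, not part of the repository.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}   # escape backslashes first
  s=${s//\"/\\\"}   # then double quotes
  printf '%s' "$s"
}

lldap_login_ok() {
  local url="$1" user="$2" pass="$3" body resp
  body=$(printf '{"username":"%s","password":"%s"}' \
    "$(json_escape "$user")" "$(json_escape "$pass")")
  resp=$(curl -sf -X POST "$url/auth/simple/login" \
    -H "Content-Type: application/json" -d "$body") || return 1
  # A successful login response is assumed to carry a "token" field.
  case "$resp" in
    *'"token"'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage might look like `lldap_login_ok https://lldap.coulomb.social break-glass "$BG_PASSWORD" && echo "break-glass login OK"`, with the password read from KeePassXC into the variable rather than typed on the command line.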
---
## Restore order
**CRITICAL: Always restore in this order.** Components depend on each other
at startup: privacyIDEA needs PostgreSQL, Authelia needs LLDAP, and KeyCape
needs Authelia, LLDAP, and privacyIDEA.
```
1. PostgreSQL (databases ns) — CNPG operator handles restore
2. privacyIDEA (mfa ns) — needs PG + enckey PVC
3. LLDAP (sso ns) — standalone
4. Authelia (sso ns) — needs LLDAP (LDAP bind at startup check)
5. KeyCape (sso ns) — needs Authelia + LLDAP + privacyIDEA
```
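
The ordering can also be enforced mechanically by waiting for each Deployment before moving on. This is a sketch under assumptions: the helper name, the `RESTORE_ORDER` variable, and the uniform 300s timeout are illustrative, and PostgreSQL is left out because CNPG manages it as a Cluster resource rather than a Deployment.

```bash
# Hypothetical helper: wait for each workload in the documented restore order.
# Set KUBECTL=true to exercise the loop without a cluster.
KUBECTL="${KUBECTL:-kubectl}"
RESTORE_ORDER="mfa/privacyidea sso/lldap sso/authelia sso/keycape"

wait_in_restore_order() {
  local item ns name
  for item in $RESTORE_ORDER; do
    ns="${item%%/*}"     # text before the slash: namespace
    name="${item##*/}"   # text after the slash: deployment name
    echo "Waiting for deployment/${name} in namespace ${ns} ..."
    $KUBECTL rollout status "deployment/${name}" -n "$ns" --timeout=300s || {
      echo "ERROR: ${name} is not ready; later components depend on it" >&2
      return 1
    }
  done
}
```

Stopping at the first component that fails to become ready avoids chasing follow-on errors in the components that depend on it.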
---
## Restore from SQLite backup (PVC data corruption)
### LLDAP
```bash
# 1. Scale down LLDAP
kubectl scale deployment/lldap -n sso --replicas=0
# 2. Start a restore pod on the lldap-data PVC
kubectl run -n sso lldap-restore --image=nouchka/sqlite3:latest \
--restart=Never \
--overrides='{"spec":{"volumes":[{"name":"data","persistentVolumeClaim":{"claimName":"lldap-data"}}],"containers":[{"name":"lldap-restore","image":"nouchka/sqlite3:latest","command":["sleep","3600"],"volumeMounts":[{"name":"data","mountPath":"/data"}]}]}}'
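# 3. List the backups already on the PVC under /data/backups/
kubectl exec -n sso lldap-restore -- ls /data/backups/
# 4. Restore from the chosen backup. The backup is a complete SQLite database
#    file (written by `.backup`), so move the damaged DB aside, drop stale
#    WAL/SHM files, and copy the backup into place. Everything runs inside the
#    pod; a local pipe would run the second sqlite3 on your workstation.
kubectl exec -n sso lldap-restore -- sh -c \
  'mv /data/users.db /data/users.db.corrupt 2>/dev/null; rm -f /data/users.db-wal /data/users.db-shm; cp -p /data/backups/users.backup.YYYY-MM-DD /data/users.db'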
# 5. Clean up and restart
kubectl delete pod -n sso lldap-restore
kubectl scale deployment/lldap -n sso --replicas=1
kubectl rollout status deployment/lldap -n sso --timeout=120s
```
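
Before step 4 it can be worth checking that the chosen backup is a readable SQLite database. A minimal sketch, meant to run inside the restore pod where `sqlite3` is available; the function name `check_backup` is hypothetical:

```bash
# Hypothetical pre-restore check: refuse a backup that is missing, empty,
# or fails SQLite's own integrity check.
check_backup() {
  local backup="$1" result
  if [ ! -s "$backup" ]; then
    echo "ERROR: $backup is missing or empty" >&2
    return 1
  fi
  # PRAGMA integrity_check prints the single line "ok" for a healthy database.
  result=$(sqlite3 "$backup" "PRAGMA integrity_check;") || return 1
  if [ "$result" != "ok" ]; then
    echo "ERROR: integrity check failed for $backup: $result" >&2
    return 1
  fi
  echo "OK: $backup passed the integrity check"
}
```

If the newest backup fails the check, try the previous day's file; the CronJob keeps roughly a week of copies.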
### Authelia
```bash
# Same pattern as LLDAP, using authelia-data PVC and authelia.backup.YYYY-MM-DD
kubectl scale deployment/authelia -n sso --replicas=0
# ... (run restore pod, restore db.sqlite3, scale back up)
kubectl scale deployment/authelia -n sso --replicas=1
```
### privacyIDEA enckey
```bash
# If the enckey is lost, restore it from KeePassXC binary attachment PI_ENCFILE.
# Extract it to a local file first, then:
kubectl create secret generic privacyidea-enckey \
--from-file=PI_ENCFILE=./pi.enc \
--namespace mfa \
--dry-run=client -o yaml | kubectl apply -f -
# Restart privacyIDEA to pick up the restored key
kubectl rollout restart deployment/privacyidea -n mfa
# If the enckey is truly lost and unrecoverable:
# All enrolled MFA tokens are invalid.
# Generate a new enckey with: kubectl exec -n mfa ... -- pi-manage create_enckey
# All users must re-enroll their TOTP/hardware tokens.
```
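
When more than one copy of the enckey is available (for example the KeePassXC attachment and a file from the PVC backup), comparing their hashes before restarting privacyIDEA confirms that they are the same key. This is a sketch that assumes GNU `sha256sum` is installed; the function name and the file paths in the usage line are illustrative.

```bash
# Hypothetical check: succeed only if both files exist and hash identically.
enckey_matches() {
  local a="$1" b="$2" ha hb
  ha=$(sha256sum < "$a" 2>/dev/null | awk '{print $1}')
  hb=$(sha256sum < "$b" 2>/dev/null | awk '{print $1}')
  # An unreadable file yields an empty hash, which must never count as a match.
  [ -n "$ha" ] && [ -n "$hb" ] && [ "$ha" = "$hb" ]
}
```

Example: `enckey_matches ./pi.enc ./enckey.backup.YYYY-MM-DD && echo "same key"`. A mismatch means one of the copies is stale, and restoring it would leave enrolled tokens undecryptable.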
---
## Full node restore (new host)
```bash
# Prerequisites on new host:
# - K3s installed
# - Traefik ingress (bundled with K3s)
# - cert-manager installed (helm install cert-manager ...)
# - DNS records pointing to new node IP
# - KeePassXC vault accessible (offline copy or age-encrypted bundle)
# 1. Restore PostgreSQL from CNPG backup
# (See CNPG documentation for cluster restore from barmanObjectStore)
# 2. Re-apply all manifests in order
cd sso-mfa/k8s
kubectl apply -f namespaces/namespaces.yaml
kubectl apply -f network-policies/
kubectl apply -f cert-manager/issuers.yaml
# 3. Restore secrets from KeePassXC
# Run each create-secrets.sh in order:
cd postgresql && ./create-secrets.sh && cd ..
cd privacyidea && ./create-secrets.sh && cd ..
cd lldap && ./create-secrets.sh && cd ..
cd authelia && ./create-secrets.sh && cd ..
cd keycape && ./create-secrets.sh && cd ..
# 4. Apply workloads in restore order
kubectl apply -f postgresql/cluster.yaml
# Apply the files one at a time: brace expansion after a single -f would pass
# the extra paths as positional arguments, which kubectl apply rejects.
for f in privacyidea/{pvc,configmap,deployment,middleware,ingress}.yaml \
         lldap/{pvc,deployment,middleware,ingress}.yaml \
         authelia/{pvc,configmap,deployment,ingress}.yaml \
         keycape/{deployment,middleware,ingress}.yaml; do
  kubectl apply -f "$f"
done
# 5. Wait for everything to be Ready
kubectl rollout status deployment/privacyidea -n mfa --timeout=300s
kubectl rollout status deployment/lldap -n sso --timeout=120s
kubectl rollout status deployment/authelia -n sso --timeout=120s
kubectl rollout status deployment/keycape -n sso --timeout=60s
# 6. Re-run bootstrap scripts if PVC data was lost
cd privacyidea && ./enckey-bootstrap.sh && ./bootstrap-admin.sh && ./bootstrap-realm.sh
cd ../lldap && ./bootstrap-users.sh && ./break-glass.sh
cd ../keycape && ./create-pi-token.sh && ./create-secrets.sh
kubectl rollout restart deployment/keycape -n sso
# 7. Verify (the verify scripts live in k8s/, so leave keycape/ first)
cd ..
./verify-t04.sh && ./verify-t05.sh && ./verify-t06.sh && ./verify-t07.sh && ./verify-t08.sh
```
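
A quick pre-flight check on the new host catches missing command-line tools before the restore starts. A minimal sketch; the function name is hypothetical, and the tool list in the usage line (kubectl, helm, age) is inferred from the commands this runbook uses.

```bash
# Hypothetical pre-flight helper: print the tools that are not on PATH.
missing_tools() {
  local tool missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Strip the leading space; output is empty when everything is present.
  echo "${missing# }"
}
```

Example: `m=$(missing_tools kubectl helm age); [ -z "$m" ] || echo "Install first: $m"`.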
---
## Backup offsite export
The SQLite backup files land on the PVCs but are not offsite until exported.
Run this on the node host to pull them out and encrypt for offsite storage:
```bash
# Pull backup files from pods
kubectl exec -n sso deployment/lldap -- \
cat /data/backups/users.backup.$(date +%Y-%m-%d) > /tmp/lldap-backup.db
kubectl exec -n sso deployment/authelia -- \
cat /data/backups/authelia.backup.$(date +%Y-%m-%d) > /tmp/authelia-backup.db
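# Encrypt both files with age (same key as the ops bundle), then send offsite
RECIPIENT=$(grep 'public key' ~/net-kingdom-ops-bundle.key | awk '{print $NF}')
age -r "$RECIPIENT" -o /tmp/lldap-backup.db.age /tmp/lldap-backup.db
age -r "$RECIPIENT" -o /tmp/authelia-backup.db.age /tmp/authelia-backup.db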
# Shred plaintext copies
shred -u /tmp/lldap-backup.db /tmp/authelia-backup.db
```
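
The commands above assume that a backup with today's date exists, which may not hold if the export runs before 03:00 UTC or the host clock is not in UTC. One option is to pick the newest file from the directory listing instead. A sketch under that assumption; the function name is hypothetical:

```bash
# Hypothetical helper: read file names on stdin and print the newest one
# matching <prefix>.<YYYY-MM-DD>. ISO dates sort correctly as plain strings.
newest_backup_name() {
  local prefix="$1"
  grep -E "^${prefix}\.[0-9]{4}-[0-9]{2}-[0-9]{2}$" | sort | tail -n 1
}
```

Example: `name=$(kubectl exec -n sso deployment/lldap -- ls /data/backups/ | newest_backup_name users.backup)`, then use `/data/backups/$name` in place of the dated path. The output is empty when no backup exists, so check for that before continuing.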


@@ -0,0 +1,304 @@
# SQLite backup CronJobs — sso and mfa namespaces
#
# Three CronJobs, one per stateful component:
# 1. lldap-backup: LLDAP user/group store, SQLite (namespace: sso)
# 2. authelia-backup: Authelia session/storage DB, SQLite (namespace: sso)
# 3. privacyidea-backup: privacyIDEA enckey and local config files (namespace: mfa)
#
# The jobs run daily on a staggered schedule: 03:00, 03:15, and 03:30 UTC.
# The two SQLite jobs use `sqlite3 .backup`, which produces a consistent copy
# even while the database is in use.
# Backups land on the same PVC next to the live data, so they protect against
# pod failure or database corruption, not against loss of the PVC. Export the
# backup files offsite using pack-bundle.sh or a separate volume snapshot
# mechanism.
#
# PostgreSQL (privacyIDEA DB) is handled by CNPG ScheduledBackup in
# postgresql/scheduled-backup.yaml. Do not duplicate it here.
#
# Backup file naming:
# <db>.backup.<YYYY-MM-DD> — created daily, pruned after 7 days
#
# Prerequisites:
# - The sqlite3 CLI is not assumed to exist in the application images
# (Authelia's distroless image does NOT include it), so both SQLite backups
# run in separate Job pods that use the nouchka/sqlite3 image and mount the
# application's PVC.
#
# Apply:
# kubectl apply -f cronjob-sqlite-backups.yaml
---
# ── 1. LLDAP backup (namespace: sso) ─────────────────────────────────────────
# The backup runs in a separate Job pod (nouchka/sqlite3 image) that mounts
# the same PVC as the live LLDAP pod.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: lldap-backup
  namespace: sso
  labels:
    app.kubernetes.io/name: lldap-backup
    app.kubernetes.io/part-of: net-kingdom-sso-mfa
    net-kingdom/component: backup
spec:
  schedule: "0 3 * * *" # daily at 03:00 UTC
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app.kubernetes.io/name: lldap-backup
            net-kingdom/component: backup
        spec:
          restartPolicy: OnFailure
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            fsGroup: 1000
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: lldap-data
          containers:
            - name: backup
              # Use a lightweight SQLite image; LLDAP's image may not have the sqlite3 CLI
              image: nouchka/sqlite3:latest
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
                - |
                  set -eu
                  DB=/data/users.db
                  BACKUP_DIR=/data/backups
                  DATE=$(date +%Y-%m-%d)
                  mkdir -p "$BACKUP_DIR"
                  if [ ! -f "$DB" ]; then
                    echo "WARN: $DB not found - LLDAP may not have been bootstrapped yet"
                    exit 0
                  fi
                  sqlite3 "$DB" ".backup '$BACKUP_DIR/users.backup.$DATE'"
                  echo "OK: backed up $DB to $BACKUP_DIR/users.backup.$DATE"
                  # Prune backups older than 7 days
                  find "$BACKUP_DIR" -name 'users.backup.*' -mtime +7 -delete
                  echo "OK: pruned backups older than 7 days"
              volumeMounts:
                - name: data
                  mountPath: /data
              resources:
                requests:
                  cpu: "10m"
                  memory: "32Mi"
                limits:
                  cpu: "100m"
                  memory: "64Mi"
---
# ── 2. Authelia backup (namespace: sso) ──────────────────────────────────────
# Authelia uses a distroless image — run backup in a separate pod on the same PVC.
# NOTE: Authelia uses ReadWriteOnce PVC. The backup pod and Authelia pod cannot
# both mount it simultaneously on most K3s setups. This CronJob scales Authelia
# to 0 replicas, takes the backup, then restores the replica count.
# For production: prefer a storage-level snapshot (Longhorn/Velero) instead.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: authelia-backup
  namespace: sso
  labels:
    app.kubernetes.io/name: authelia-backup
    app.kubernetes.io/part-of: net-kingdom-sso-mfa
    net-kingdom/component: backup
spec:
  schedule: "15 3 * * *" # 03:15 UTC, offset from lldap-backup
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app.kubernetes.io/name: authelia-backup
            net-kingdom/component: backup
        spec:
          restartPolicy: OnFailure
          serviceAccountName: backup-sa # needs scale permission; see RBAC below
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            fsGroup: 1000
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: authelia-data
          initContainers:
            # Scale Authelia to 0 to release the PVC before the backup runs
            - name: scale-down
              image: bitnami/kubectl:latest
              imagePullPolicy: IfNotPresent
              command:
                - kubectl
                - scale
                - deployment/authelia
                - --replicas=0
                - -n
                - sso
              resources:
                requests:
                  cpu: "10m"
                  memory: "32Mi"
                limits:
                  cpu: "100m"
                  memory: "64Mi"
            # The backup runs as a second init container and always exits 0, so
            # the scale-up container below runs even when the backup fails.
            # The sqlite3 image ships no kubectl, so the scale-up cannot be
            # done from this script. Check this container's logs for ERROR lines.
            - name: backup
              image: nouchka/sqlite3:latest
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
                - |
                  set -u
                  DB=/data/db.sqlite3
                  BACKUP_DIR=/data/backups
                  DATE=$(date +%Y-%m-%d)
                  mkdir -p "$BACKUP_DIR"
                  if [ ! -f "$DB" ]; then
                    echo "WARN: $DB not found - Authelia may not have been bootstrapped yet"
                  elif sqlite3 "$DB" ".backup '$BACKUP_DIR/authelia.backup.$DATE'"; then
                    echo "OK: backed up $DB to $BACKUP_DIR/authelia.backup.$DATE"
                    find "$BACKUP_DIR" -name 'authelia.backup.*' -mtime +7 -delete
                    echo "OK: pruned backups older than 7 days"
                  else
                    echo "ERROR: backup of $DB failed" >&2
                  fi
                  exit 0
              volumeMounts:
                - name: data
                  mountPath: /data
              resources:
                requests:
                  cpu: "10m"
                  memory: "32Mi"
                limits:
                  cpu: "100m"
                  memory: "64Mi"
          containers:
            # Always scale Authelia back up, even if the backup step reported an error
            - name: scale-up
              image: bitnami/kubectl:latest
              imagePullPolicy: IfNotPresent
              command:
                - kubectl
                - scale
                - deployment/authelia
                - --replicas=1
                - -n
                - sso
              resources:
                requests:
                  cpu: "10m"
                  memory: "32Mi"
                limits:
                  cpu: "100m"
                  memory: "64Mi"
---
# ── 3. privacyIDEA backup (namespace: mfa) ───────────────────────────────────
# The main privacyIDEA database is PostgreSQL and is backed up by CNPG.
# This job backs up only what lives on the PVC: the enckey (PI_ENCFILE,
# the token encryption key) and any local config files.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: privacyidea-backup
  namespace: mfa
  labels:
    app.kubernetes.io/name: privacyidea-backup
    app.kubernetes.io/part-of: net-kingdom-sso-mfa
    net-kingdom/component: backup
spec:
  schedule: "30 3 * * *" # 03:30 UTC, offset from previous jobs
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app.kubernetes.io/name: privacyidea-backup
            net-kingdom/component: backup
        spec:
          restartPolicy: OnFailure
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            fsGroup: 1000
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: privacyidea-data
          containers:
            - name: backup
              image: busybox:stable
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
                - |
                  set -eu
                  BACKUP_DIR=/data/backups
                  DATE=$(date +%Y-%m-%d)
                  mkdir -p "$BACKUP_DIR"
                  # Back up the enckey. This is the most critical file on this PVC:
                  # loss of the enckey means all enrolled MFA tokens become invalid.
                  if [ -f /data/enckey ]; then
                    cp /data/enckey "$BACKUP_DIR/enckey.backup.$DATE"
                    echo "OK: backed up enckey to $BACKUP_DIR/enckey.backup.$DATE"
                  else
                    echo "WARN: /data/enckey not found - enckey-bootstrap.sh may not have run yet"
                  fi
                  # Back up any local config files
                  if [ -f /data/privacyidea.cfg ]; then
                    cp /data/privacyidea.cfg "$BACKUP_DIR/privacyidea.cfg.backup.$DATE"
                  fi
                  # Prune files older than 7 days
                  find "$BACKUP_DIR" \( -name 'enckey.backup.*' -o -name '*.cfg.backup.*' \) \
                    -mtime +7 -delete
                  echo "OK: pruned backups older than 7 days"
              volumeMounts:
                - name: data
                  mountPath: /data
              resources:
                requests:
                  cpu: "10m"
                  memory: "16Mi"
                limits:
                  cpu: "50m"
                  memory: "32Mi"
---
# ── RBAC for backup-sa (Authelia scale-down/up) ───────────────────────────────
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backup-sa
  namespace: sso
  labels:
    app.kubernetes.io/part-of: net-kingdom-sso-mfa
    net-kingdom/component: backup
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backup-scaler
  namespace: sso
  labels:
    app.kubernetes.io/part-of: net-kingdom-sso-mfa
    net-kingdom/component: backup
rules:
  - apiGroups: ["apps"]
    resources: ["deployments/scale", "deployments"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backup-sa-scaler
  namespace: sso
  labels:
    app.kubernetes.io/part-of: net-kingdom-sso-mfa
    net-kingdom/component: backup
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: backup-scaler
subjects:
  - kind: ServiceAccount
    name: backup-sa
    namespace: sso


@@ -0,0 +1,172 @@
#!/usr/bin/env bash
# bootstrap-users.sh — seed required groups in LLDAP
#
# Run AFTER LLDAP is deployed and Running (T05a).
#
# What it does:
# 1. Authenticates to LLDAP via its GraphQL API.
# 2. Creates the two required groups: net-kingdom-users, net-kingdom-admins.
# 3. Prints a user onboarding checklist (this script creates groups only;
# individual users are added via the LLDAP WebUI).
#
# Groups created:
# net-kingdom-users — standard users; all human accounts go here.
# net-kingdom-admins — privileged users; KeyCape policies can enforce
# MFA step-up or grant extra scopes to this group.
#
# Usage:
# ./bootstrap-users.sh [lldap-url] [secrets-dir]
#
# <lldap-url> default: https://lldap.coulomb.social
# <secrets-dir> default: ../../bootstrap/secrets
set -euo pipefail
LLDAP_URL="${1:-https://lldap.coulomb.social}"
SECRETS_DIR="${2:-../../bootstrap/secrets}"
LLDAP_ENV="$SECRETS_DIR/lldap/secrets.env"
PASS_COUNT=0
FAIL_COUNT=0
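# Plain assignment instead of ((VAR++)): the arithmetic form returns status 1
# when VAR is 0, which aborts the script under `set -e` on the first call.
ok() { echo " [OK] $1"; PASS_COUNT=$((PASS_COUNT + 1)); }
fail() { echo " [FAIL] $1"; FAIL_COUNT=$((FAIL_COUNT + 1)); }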
info() { echo " [INFO] $1"; }
if [[ ! -f "$LLDAP_ENV" ]]; then
echo "ERROR: $LLDAP_ENV not found — run sso-mfa/bootstrap/gen-secrets.sh first." >&2
exit 1
fi
read_env() { bash -c "source '$1' 2>/dev/null; echo \${$2}"; }
LLDAP_ADMIN_PASS=$(read_env "$LLDAP_ENV" LLDAP_LDAP_USER_PASS)
if [[ -z "$LLDAP_ADMIN_PASS" ]]; then
echo "ERROR: LLDAP_LDAP_USER_PASS not found in $LLDAP_ENV" >&2
exit 1
fi
# ── 1. Authenticate ───────────────────────────────────────────────────────────
echo ""
echo "Authenticating to LLDAP at $LLDAP_URL ..."
AUTH_RESP=$(curl -sf -X POST "$LLDAP_URL/auth/simple/login" \
-H "Content-Type: application/json" \
-d "{\"username\":\"admin\",\"password\":\"$LLDAP_ADMIN_PASS\"}" \
2>/dev/null || echo "CURL_FAILED")
if [[ "$AUTH_RESP" == "CURL_FAILED" ]]; then
echo "ERROR: Could not reach $LLDAP_URL — is LLDAP deployed and ingress up?" >&2
exit 1
fi
LLDAP_TOKEN=$(echo "$AUTH_RESP" | python3 -c \
"import sys,json; print(json.load(sys.stdin).get('token',''))" 2>/dev/null || echo "")
if [[ -z "$LLDAP_TOKEN" ]]; then
echo "ERROR: Authentication failed. Response: $AUTH_RESP" >&2
exit 1
fi
info "Authenticated as admin"
gql() {
# gql <query> <variables-json>
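local query="$1"
# "${2:-{}}" ends the expansion at the first "}" and appends a stray "}" whenever
# $2 is set, which corrupts the variables JSON; apply the default separately.
local vars="${2:-}"
[[ -n "$vars" ]] || vars='{}'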
local body
body=$(python3 -c "
import json, sys
print(json.dumps({'query': sys.argv[1], 'variables': json.loads(sys.argv[2])}))
" "$query" "$vars")
curl -sf -X POST "$LLDAP_URL/api/graphql" \
-H "Authorization: Bearer $LLDAP_TOKEN" \
-H "Content-Type: application/json" \
-d "$body" 2>/dev/null || echo "CURL_FAILED"
}
create_group() {
local name="$1"
echo ""
echo "Creating group: $name ..."
# Check if group already exists
LIST_RESP=$(gql 'query { groups { id displayName } }')
if [[ "$LIST_RESP" != "CURL_FAILED" ]]; then
EXISTS=$(echo "$LIST_RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('yes' if any(g['displayName']=='$name' for g in d.get('data',{}).get('groups',[])) else 'no')" \
2>/dev/null || echo "no")
if [[ "$EXISTS" == "yes" ]]; then
ok "Group '$name' already exists — skipping"
return 0
fi
fi
RESP=$(gql 'mutation CreateGroup($name: String!) { createGroup(name: $name) { id displayName } }' \
"{\"name\":\"$name\"}")
if [[ "$RESP" == "CURL_FAILED" ]]; then
fail "Group '$name' — curl request failed"
return 1
fi
ERR=$(echo "$RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); errs=d.get('errors',[]); print(errs[0]['message'] if errs else '')" \
2>/dev/null || echo "")
if [[ -n "$ERR" ]]; then
fail "Group '$name' — $ERR"
return 1
fi
GID=$(echo "$RESP" | python3 -c \
"import sys,json; print(json.load(sys.stdin).get('data',{}).get('createGroup',{}).get('id','?'))" \
2>/dev/null || echo "?")
ok "Group '$name' created (id=$GID)"
}
# ── 2. Create required groups ─────────────────────────────────────────────────
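# "|| true" keeps `set -e` from aborting before the verification and summary;
# failures are already counted by fail().
create_group "net-kingdom-users" || true
create_group "net-kingdom-admins" || true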
# ── 3. Verify ─────────────────────────────────────────────────────────────────
echo ""
echo "Verifying groups ..."
LIST_RESP=$(gql 'query { groups { id displayName } }')
if [[ "$LIST_RESP" != "CURL_FAILED" ]]; then
for grp in "net-kingdom-users" "net-kingdom-admins"; do
EXISTS=$(echo "$LIST_RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('yes' if any(g['displayName']=='$grp' for g in d.get('data',{}).get('groups',[])) else 'no')" \
2>/dev/null || echo "no")
if [[ "$EXISTS" == "yes" ]]; then
ok "Group '$grp' confirmed"
else
fail "Group '$grp' not found after creation"
fi
done
else
fail "Could not retrieve group list from LLDAP"
fi
# ── Summary ───────────────────────────────────────────────────────────────────
echo ""
echo "════════════════════════════════════════════════════════════"
echo " LLDAP group bootstrap: PASS=$PASS_COUNT FAIL=$FAIL_COUNT"
echo "════════════════════════════════════════════════════════════"
echo ""
echo "Next: add users via the LLDAP WebUI or LDAP provisioning."
echo ""
echo "User onboarding checklist:"
echo ""
echo " Per new user:"
echo " 1. Create account in LLDAP WebUI ($LLDAP_URL)"
echo " Fields: username (uid), display name, email"
echo " 2. Assign to net-kingdom-users group (mandatory)"
echo " Assign to net-kingdom-admins too if privileged access is needed"
echo " 3. User logs in to Authelia (auth.coulomb.social) to verify their password"
echo " 4. User self-enrolls TOTP at pink-account.coulomb.social"
echo " 5. User tests end-to-end login via an OIDC-protected application"
echo ""
echo " Break-glass account:"
echo " Run: sso-mfa/k8s/lldap/break-glass.sh"
echo " (Creates a pre-seeded local bypass user outside the normal MFA flow.)"
echo ""
if [[ "$FAIL_COUNT" -gt 0 ]]; then
exit 1
fi
exit 0

sso-mfa/k8s/lldap/break-glass.sh Executable file

@@ -0,0 +1,203 @@
#!/usr/bin/env bash
# break-glass.sh — create the break-glass bypass account in LLDAP
#
# The break-glass account is a last-resort local user for when the SSO stack
# itself is broken (Authelia down, KeyCape misconfigured, etc.). It is:
# - Created in LLDAP with BREAKGLASS_PASSWORD from gen-secrets.sh
# - Assigned to net-kingdom-admins
# - NOT enrolled in privacyIDEA MFA (so it can log in even if privacyIDEA is down)
# - Protected by a password that is stored ONLY in KeePassXC (never in the cluster)
#
# IMPORTANT: After creating this account, immediately store the password in
# KeePassXC → net-kingdom/Break-glass/break-glass. Then test it by logging
# in to the LLDAP WebUI directly with this account.
#
# Usage:
# ./break-glass.sh [lldap-url] [secrets-dir]
#
# <lldap-url> default: https://lldap.coulomb.social
# <secrets-dir> default: ../../bootstrap/secrets
set -euo pipefail
LLDAP_URL="${1:-https://lldap.coulomb.social}"
SECRETS_DIR="${2:-../../bootstrap/secrets}"
LLDAP_ENV="$SECRETS_DIR/lldap/secrets.env"
BG_ENV="$SECRETS_DIR/breakglass/secrets.env"
BG_USERNAME="break-glass"
BG_EMAIL="break-glass@netkingdom.local"
BG_DISPLAY="Break-glass Account"
PASS_COUNT=0
FAIL_COUNT=0
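# Plain assignment instead of ((VAR++)): the arithmetic form returns status 1
# when VAR is 0, which aborts the script under `set -e` on the first call.
ok() { echo " [OK] $1"; PASS_COUNT=$((PASS_COUNT + 1)); }
fail() { echo " [FAIL] $1"; FAIL_COUNT=$((FAIL_COUNT + 1)); }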
info() { echo " [INFO] $1"; }
for f in "$LLDAP_ENV" "$BG_ENV"; do
if [[ ! -f "$f" ]]; then
echo "ERROR: $f not found — run sso-mfa/bootstrap/gen-secrets.sh first." >&2
exit 1
fi
done
read_env() { bash -c "source '$1' 2>/dev/null; echo \${$2}"; }
LLDAP_ADMIN_PASS=$(read_env "$LLDAP_ENV" LLDAP_LDAP_USER_PASS)
BG_PASSWORD=$(read_env "$BG_ENV" BREAKGLASS_PASSWORD)
if [[ -z "$LLDAP_ADMIN_PASS" || -z "$BG_PASSWORD" ]]; then
echo "ERROR: could not read LLDAP_LDAP_USER_PASS or BREAKGLASS_PASSWORD" >&2
exit 1
fi
# ── Authenticate ──────────────────────────────────────────────────────────────
echo ""
echo "Authenticating to LLDAP at $LLDAP_URL ..."
AUTH_RESP=$(curl -sf -X POST "$LLDAP_URL/auth/simple/login" \
-H "Content-Type: application/json" \
-d "{\"username\":\"admin\",\"password\":\"$LLDAP_ADMIN_PASS\"}" \
2>/dev/null || echo "CURL_FAILED")
if [[ "$AUTH_RESP" == "CURL_FAILED" ]]; then
echo "ERROR: Could not reach $LLDAP_URL" >&2
exit 1
fi
LLDAP_TOKEN=$(echo "$AUTH_RESP" | python3 -c \
"import sys,json; print(json.load(sys.stdin).get('token',''))" 2>/dev/null || echo "")
if [[ -z "$LLDAP_TOKEN" ]]; then
echo "ERROR: LLDAP authentication failed" >&2
exit 1
fi
info "Authenticated as admin"
gql() {
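local query="$1"
# "${2:-{}}" ends the expansion at the first "}" and appends a stray "}" whenever
# $2 is set, which corrupts the variables JSON; apply the default separately.
local vars="${2:-}"
[[ -n "$vars" ]] || vars='{}'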
local body
body=$(python3 -c "
import json, sys
print(json.dumps({'query': sys.argv[1], 'variables': json.loads(sys.argv[2])}))
" "$query" "$vars")
curl -sf -X POST "$LLDAP_URL/api/graphql" \
-H "Authorization: Bearer $LLDAP_TOKEN" \
-H "Content-Type: application/json" \
-d "$body" 2>/dev/null || echo "CURL_FAILED"
}
# ── Check if user already exists ──────────────────────────────────────────────
echo ""
echo "Checking if break-glass user exists ..."
LIST_RESP=$(gql 'query { users { id displayName email } }')
if [[ "$LIST_RESP" != "CURL_FAILED" ]]; then
EXISTS=$(echo "$LIST_RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('yes' if any(u['id']=='$BG_USERNAME' for u in d.get('data',{}).get('users',[])) else 'no')" \
2>/dev/null || echo "no")
if [[ "$EXISTS" == "yes" ]]; then
info "User '$BG_USERNAME' already exists — skipping creation"
ok "User '$BG_USERNAME' exists"
else
# ── Create user ───────────────────────────────────────────────────────
echo "Creating user '$BG_USERNAME' ..."
CREATE_VARS=$(python3 -c "
import json, sys
v = {
'user': {
'id': '$BG_USERNAME',
'email': '$BG_EMAIL',
'displayName': '$BG_DISPLAY',
'firstName': 'Break',
'lastName': 'Glass'
}
}
print(json.dumps(v))
")
RESP=$(gql 'mutation CreateUser($user: CreateUserInput!) { createUser(user: $user) { id creationDate } }' \
"$CREATE_VARS")
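if [[ "$RESP" == "CURL_FAILED" ]]; then
  # A failed request has no parsable error message and would otherwise be reported as success.
  fail "Create user '$BG_USERNAME': request to LLDAP failed"
else
  ERR=$(echo "$RESP" | python3 -c \
  "import sys,json; d=json.load(sys.stdin); errs=d.get('errors',[]); print(errs[0]['message'] if errs else '')" \
  2>/dev/null || echo "")
  if [[ -n "$ERR" ]]; then
    fail "Create user '$BG_USERNAME': $ERR"
  else
    ok "User '$BG_USERNAME' created"
  fi
fi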
fi
else
fail "Could not query user list from LLDAP"
fi
# ── Set password ──────────────────────────────────────────────────────────────
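# LLDAP requires a separate API call to set the password after user creation.
# NOTE: this assumes the deployed LLDAP version exposes a changeUserPassword
# mutation. If the call is rejected, the lldap_set_password tool that ships
# with LLDAP is the alternative.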
echo ""
echo "Setting password for '$BG_USERNAME' ..."
PW_VARS=$(python3 -c "
import json, sys
print(json.dumps({'userId': '$BG_USERNAME', 'password': sys.argv[1]}))
" "$BG_PASSWORD")
PW_RESP=$(gql 'mutation SetPassword($userId: String!, $password: String!) { changeUserPassword(userId: $userId, password: $password) }' \
"$PW_VARS")
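if [[ "$PW_RESP" == "CURL_FAILED" ]]; then
  # A failed request has no parsable error message and would otherwise be reported as success.
  fail "Set password: request to LLDAP failed"
else
  PW_ERR=$(echo "$PW_RESP" | python3 -c \
  "import sys,json; d=json.load(sys.stdin); errs=d.get('errors',[]); print(errs[0]['message'] if errs else '')" \
  2>/dev/null || echo "")
  if [[ -n "$PW_ERR" ]]; then
    fail "Set password: $PW_ERR"
  else
    ok "Password set for '$BG_USERNAME'"
  fi
fi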
# ── Add to net-kingdom-admins ─────────────────────────────────────────────────
echo ""
echo "Adding '$BG_USERNAME' to net-kingdom-admins group ..."
# Find the net-kingdom-admins group ID
GROUPS_RESP=$(gql 'query { groups { id displayName } }')
ADMIN_GID=$(echo "$GROUPS_RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); gs=[g for g in d.get('data',{}).get('groups',[]) if g['displayName']=='net-kingdom-admins']; print(gs[0]['id'] if gs else '')" \
2>/dev/null || echo "")
if [[ -z "$ADMIN_GID" ]]; then
fail "Group 'net-kingdom-admins' not found — run bootstrap-users.sh first"
else
ADD_VARS="{\"userId\":\"$BG_USERNAME\",\"groupId\":$ADMIN_GID}"
ADD_RESP=$(gql 'mutation AddToGroup($userId: String!, $groupId: Int!) { addUserToGroup(userId: $userId, groupId: $groupId) }' \
"$ADD_VARS")
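if [[ "$ADD_RESP" == "CURL_FAILED" ]]; then
  # A failed request has no parsable error message and would otherwise be reported as success.
  fail "Add to group: request to LLDAP failed"
else
  ADD_ERR=$(echo "$ADD_RESP" | python3 -c \
  "import sys,json; d=json.load(sys.stdin); errs=d.get('errors',[]); print(errs[0]['message'] if errs else '')" \
  2>/dev/null || echo "")
  if [[ -n "$ADD_ERR" && "$ADD_ERR" != *"already"* ]]; then
    fail "Add to group: $ADD_ERR"
  else
    ok "'$BG_USERNAME' is in net-kingdom-admins"
  fi
fi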
fi
# ── Summary ───────────────────────────────────────────────────────────────────
echo ""
echo "════════════════════════════════════════════════════════════"
echo " Break-glass bootstrap: PASS=$PASS_COUNT FAIL=$FAIL_COUNT"
echo "════════════════════════════════════════════════════════════"
echo ""
echo "CRITICAL — do these steps NOW:"
echo ""
echo " 1. Store the break-glass password in KeePassXC:"
echo " Group: net-kingdom/Break-glass"
echo " Entry: break-glass → username='$BG_USERNAME' password=<from gen-secrets.sh>"
echo ""
echo " 2. Test the account (LLDAP WebUI login):"
echo " $LLDAP_URL"
echo " Login as '$BG_USERNAME' with BREAKGLASS_PASSWORD"
echo " Confirm you can see the admin panel."
echo ""
echo " 3. Do NOT enroll MFA for '$BG_USERNAME' in privacyIDEA."
echo " This account must remain usable when privacyIDEA is unavailable."
echo " Its sole authentication factor is the password stored in KeePassXC."
echo ""
echo " 4. Document the DR restore sequence:"
echo " See sso-mfa/k8s/backup/DR-RUNBOOK.md"
echo ""
if [[ "$FAIL_COUNT" -gt 0 ]]; then
exit 1
fi
exit 0

199
sso-mfa/k8s/verify-t07.sh Executable file

@@ -0,0 +1,199 @@
#!/usr/bin/env bash
# verify-t07.sh — verify NK-WP-0001-T07 done-criteria
#
# Checks user management and self-service readiness.
#
# Sections:
# 1. LLDAP group: net-kingdom-users exists
# 2. LLDAP group: net-kingdom-admins exists
# 3. At least one non-admin user exists in LLDAP
# 4. Break-glass user exists and is in net-kingdom-admins
# 5. privacyIDEA self-service portal reachable
# 6. KeyCape config has at least one OIDC client registered
#
# Usage:
# chmod +x verify-t07.sh
# ./verify-t07.sh [lldap-url] [secrets-dir]
#
# <lldap-url> default: https://lldap.coulomb.social
# <secrets-dir> default: ../bootstrap/secrets
set -euo pipefail
LLDAP_URL="${1:-https://lldap.coulomb.social}"
SECRETS_DIR="${2:-../bootstrap/secrets}"
LLDAP_ENV="$SECRETS_DIR/lldap/secrets.env"
SSO_NAMESPACE="sso"
PASS=0
FAIL=0
WARN=0
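# Use arithmetic assignment rather than ((VAR++)): the post-increment form
# returns status 1 when VAR is 0, which aborts the script under set -e.
pass() { echo " [PASS] $1"; PASS=$((PASS + 1)); }
fail() { echo " [FAIL] $1"; FAIL=$((FAIL + 1)); }
warn() { echo " [WARN] $1"; WARN=$((WARN + 1)); }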
section() { echo ""; echo "── $1 ──────────────────────────────────────"; }
# ── Authenticate to LLDAP ─────────────────────────────────────────────────────
LLDAP_TOKEN=""
if [[ -f "$LLDAP_ENV" ]]; then
read_env() { bash -c "source '$1' 2>/dev/null; echo \${$2}"; }
LLDAP_ADMIN_PASS=$(read_env "$LLDAP_ENV" LLDAP_LDAP_USER_PASS)
if [[ -n "$LLDAP_ADMIN_PASS" ]]; then
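# Build the login body with json.dumps so quotes or backslashes in the password stay valid JSON.
LOGIN_BODY=$(python3 -c \
"import sys,json; print(json.dumps({'username': 'admin', 'password': sys.argv[1]}))" \
"$LLDAP_ADMIN_PASS")
AUTH_RESP=$(curl -sf -X POST "$LLDAP_URL/auth/simple/login" \
-H "Content-Type: application/json" \
-d "$LOGIN_BODY" \
2>/dev/null || echo "CURL_FAILED")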
if [[ "$AUTH_RESP" != "CURL_FAILED" ]]; then
LLDAP_TOKEN=$(echo "$AUTH_RESP" | python3 -c \
"import sys,json; print(json.load(sys.stdin).get('token',''))" 2>/dev/null || echo "")
fi
fi
fi
gql() {
if [[ -z "$LLDAP_TOKEN" ]]; then echo "NO_TOKEN"; return; fi
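local query="$1"; local vars="${2:-}"
# Default to an empty JSON object. Writing "${2:-{}}" would append a stray "}" whenever $2 is set.
[[ -n "$vars" ]] || vars='{}'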
local body
body=$(python3 -c "
import json, sys
print(json.dumps({'query': sys.argv[1], 'variables': json.loads(sys.argv[2])}))
" "$query" "$vars")
curl -sf -X POST "$LLDAP_URL/api/graphql" \
-H "Authorization: Bearer $LLDAP_TOKEN" \
-H "Content-Type: application/json" \
-d "$body" 2>/dev/null || echo "CURL_FAILED"
}
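# Sketch (not part of the original commit): the group checks in sections 1, 2
# and 4 each repeat a similar python3 one-liner. A hypothetical helper such as
# group_exists could factor that out; the name and interface are assumptions.
# It reads a GraphQL "groups" response on stdin and prints "yes" or "no".
group_exists() {
  local name="$1"
  python3 -c '
import json, sys
try:
    groups = (json.load(sys.stdin).get("data") or {}).get("groups") or []
except ValueError:
    groups = []
print("yes" if any(g.get("displayName") == sys.argv[1] for g in groups) else "no")
' "$name"
}
# Possible usage: EXISTS=$(echo "$GROUPS_RESP" | group_exists net-kingdom-users)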
GROUPS_RESP=$(gql 'query { groups { id displayName members { id } } }')
# ── 1. net-kingdom-users group ───────────────────────────────────────────────
section "1. LLDAP group: net-kingdom-users"
if [[ "$GROUPS_RESP" == "NO_TOKEN" ]]; then
warn "Skipping — could not authenticate to LLDAP at $LLDAP_URL"
elif [[ "$GROUPS_RESP" == "CURL_FAILED" ]]; then
fail "Could not query LLDAP groups — is LLDAP up?"
else
EXISTS=$(echo "$GROUPS_RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('yes' if any(g['displayName']=='net-kingdom-users' for g in d.get('data',{}).get('groups',[])) else 'no')" \
2>/dev/null || echo "no")
if [[ "$EXISTS" == "yes" ]]; then
pass "Group 'net-kingdom-users' exists"
else
fail "Group 'net-kingdom-users' not found — run lldap/bootstrap-users.sh"
fi
fi
# ── 2. net-kingdom-admins group ──────────────────────────────────────────────
section "2. LLDAP group: net-kingdom-admins"
if [[ "$GROUPS_RESP" == "NO_TOKEN" || "$GROUPS_RESP" == "CURL_FAILED" ]]; then
warn "Skipping — see section 1"
else
EXISTS=$(echo "$GROUPS_RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('yes' if any(g['displayName']=='net-kingdom-admins' for g in d.get('data',{}).get('groups',[])) else 'no')" \
2>/dev/null || echo "no")
if [[ "$EXISTS" == "yes" ]]; then
pass "Group 'net-kingdom-admins' exists"
else
fail "Group 'net-kingdom-admins' not found — run lldap/bootstrap-users.sh"
fi
fi
# ── 3. At least one non-admin user ───────────────────────────────────────────
section "3. At least one regular user in LLDAP"
USERS_RESP=$(gql 'query { users { id displayName email } }')
if [[ "$USERS_RESP" == "NO_TOKEN" || "$USERS_RESP" == "CURL_FAILED" ]]; then
warn "Skipping — could not query users"
else
USERS=$(echo "$USERS_RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); us=[u['id'] for u in d.get('data',{}).get('users',[]) if u['id'] not in ('admin','break-glass')]; print(len(us))" \
2>/dev/null || echo "0")
if [[ "$USERS" -gt 0 ]]; then
pass "$USERS regular user(s) exist in LLDAP"
else
warn "No regular users found — add users via the LLDAP WebUI ($LLDAP_URL) or provisioning script"
fi
fi
# ── 4. Break-glass user ───────────────────────────────────────────────────────
section "4. Break-glass account"
if [[ "$USERS_RESP" == "NO_TOKEN" || "$USERS_RESP" == "CURL_FAILED" ]]; then
warn "Skipping — could not query users"
else
BG_EXISTS=$(echo "$USERS_RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('yes' if any(u['id']=='break-glass' for u in d.get('data',{}).get('users',[])) else 'no')" \
2>/dev/null || echo "no")
if [[ "$BG_EXISTS" == "yes" ]]; then
pass "User 'break-glass' exists in LLDAP"
# Check group membership
BG_IN_ADMINS=$(echo "$GROUPS_RESP" | python3 -c \
"import sys,json; d=json.load(sys.stdin); gs=[g for g in d.get('data',{}).get('groups',[]) if g['displayName']=='net-kingdom-admins']; print('yes' if gs and any(m['id']=='break-glass' for m in gs[0].get('members',[])) else 'no')" \
2>/dev/null || echo "no")
if [[ "$BG_IN_ADMINS" == "yes" ]]; then
pass "'break-glass' is in net-kingdom-admins group"
else
warn "'break-glass' is not in net-kingdom-admins — run lldap/break-glass.sh"
fi
else
fail "User 'break-glass' not found — run lldap/break-glass.sh"
fi
fi
# ── 5. Self-service portal ────────────────────────────────────────────────────
section "5. privacyIDEA self-service portal (pink-account.coulomb.social)"
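# No -f here: curl prints the status via -w even when it fails, so combining
# -f with an "|| echo 000" fallback would append a second code to the value.
PORTAL_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
"https://pink-account.coulomb.social" 2>/dev/null || true)
PORTAL_STATUS="${PORTAL_STATUS:-000}"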
if [[ "$PORTAL_STATUS" == "200" || "$PORTAL_STATUS" == "302" ]]; then
pass "Self-service portal reachable (HTTP $PORTAL_STATUS)"
elif [[ "$PORTAL_STATUS" == "000" ]]; then
warn "Self-service portal not reachable — check DNS and ingress"
else
warn "Self-service portal returned HTTP $PORTAL_STATUS"
fi
# ── 6. KeyCape OIDC client registrations ─────────────────────────────────────
section "6. KeyCape OIDC client registrations"
KC_POD=$(kubectl get pod -n "$SSO_NAMESPACE" \
-l app.kubernetes.io/name=keycape \
--field-selector=status.phase=Running \
-o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "")
if [[ -n "$KC_POD" ]]; then
DISCOVERY=$(kubectl exec -n "$SSO_NAMESPACE" "$KC_POD" -- \
wget -qO- "http://localhost:8080/.well-known/openid-configuration" 2>/dev/null || echo "")
if [[ -n "$DISCOVERY" ]]; then
pass "KeyCape OIDC discovery endpoint accessible"
# Check config for registered clients (KeyCape config in keycape-config Secret)
CONFIG=$(kubectl get secret keycape-config -n "$SSO_NAMESPACE" \
-o jsonpath='{.data.config\.yaml}' 2>/dev/null | base64 -d 2>/dev/null || echo "")
CLIENT_COUNT=$(echo "$CONFIG" | python3 -c \
"import sys; import yaml; cfg=yaml.safe_load(sys.stdin.read()); print(len(cfg.get('clients',[])))" \
2>/dev/null || echo "0")
if [[ "$CLIENT_COUNT" -gt 0 ]]; then
pass "$CLIENT_COUNT OIDC client(s) registered in KeyCape"
else
warn "No OIDC clients registered — add clients to keycape/create-secrets.sh and re-run it"
fi
else
warn "Skipping client check — KeyCape not reachable in-cluster"
fi
else
warn "Skipping OIDC client check — no running KeyCape pod"
fi
# ── Summary ───────────────────────────────────────────────────────────────────
echo ""
echo "════════════════════════════════════════════════════════════"
echo " T07 verification: PASS=$PASS WARN=$WARN FAIL=$FAIL"
echo "════════════════════════════════════════════════════════════"
if [[ "$FAIL" -gt 0 ]]; then
echo " Result: INCOMPLETE — resolve FAIL items before marking T07 done"
exit 1
elif [[ "$WARN" -gt 0 ]]; then
echo " Result: PARTIAL — required structure is in place; review WARN items"
exit 0
else
echo " Result: COMPLETE — T07 done-criteria met; proceed to T08 (Backups, DR, break-glass)"
exit 0
fi

174
sso-mfa/k8s/verify-t08.sh Executable file

@@ -0,0 +1,174 @@
#!/usr/bin/env bash
# verify-t08.sh — verify NK-WP-0001-T08 done-criteria
#
# Checks backups, DR readiness, and break-glass account.
#
# Sections:
# 1. Backup CronJobs exist (lldap-backup, authelia-backup, privacyidea-backup)
# 2. backup-sa ServiceAccount and RBAC exist
# 3. lldap-backup has run successfully at least once
# 4. authelia-backup has run successfully at least once
# 5. privacyidea-backup has run successfully at least once
# 6. privacyIDEA enckey backup exists on PVC
# 7. LLDAP SQLite backup exists on PVC
# 8. DR-RUNBOOK.md present in repo
# 9. KeePassXC ops bundle (pack-bundle.sh) — manual confirmation required
#
# Usage:
# chmod +x verify-t08.sh
# ./verify-t08.sh
set -euo pipefail
SSO_NAMESPACE="sso"
MFA_NAMESPACE="mfa"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PASS=0
FAIL=0
WARN=0
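# Use arithmetic assignment rather than ((VAR++)): the post-increment form
# returns status 1 when VAR is 0, which aborts the script under set -e.
pass() { echo " [PASS] $1"; PASS=$((PASS + 1)); }
fail() { echo " [FAIL] $1"; FAIL=$((FAIL + 1)); }
warn() { echo " [WARN] $1"; WARN=$((WARN + 1)); }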
section() { echo ""; echo "── $1 ──────────────────────────────────────"; }
check_cronjob() {
local name="$1"; local ns="$2"
if kubectl get cronjob "$name" -n "$ns" &>/dev/null; then
pass "CronJob $name exists (namespace: $ns)"
local schedule
schedule=$(kubectl get cronjob "$name" -n "$ns" \
-o jsonpath='{.spec.schedule}' 2>/dev/null || echo "?")
pass " Schedule: $schedule"
else
fail "CronJob $name not found in namespace $ns — apply backup/cronjob-sqlite-backups.yaml"
fi
}
check_last_job() {
local cronjob="$1"; local ns="$2"
# A run counts as successful when a Job owned by this CronJob reports
# status.succeeded of 1.
local succeeded
succeeded=$(kubectl get job -n "$ns" \
-o jsonpath="{.items[?(@.metadata.ownerReferences[0].name==\"$cronjob\")].status.succeeded}" \
2>/dev/null || echo "")
if [[ "$succeeded" == *"1"* ]]; then
pass "CronJob $cronjob has at least one successful run"
else
warn "CronJob $cronjob has no successful runs yet — trigger manually to test:"
warn " kubectl create job -n $ns --from=cronjob/$cronjob ${cronjob}-manual-test"
fi
}
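# Sketch (not part of the original commit): an alternative to scanning Jobs is
# to read the CronJob's own status. This assumes the cluster's batch/v1 CronJob
# status exposes lastSuccessfulTime; last_success_time is a hypothetical helper
# name. It prints the timestamp, or nothing if no run has succeeded yet.
last_success_time() {
  local cronjob="$1" ns="$2"
  kubectl get cronjob "$cronjob" -n "$ns" \
    -o jsonpath='{.status.lastSuccessfulTime}' 2>/dev/null || true
}
# Possible usage: [[ -n "$(last_success_time lldap-backup "$SSO_NAMESPACE")" ]] && pass "lldap-backup has succeeded"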
# ── 1. Backup CronJobs ────────────────────────────────────────────────────────
section "1. Backup CronJobs"
check_cronjob "lldap-backup" "$SSO_NAMESPACE"
check_cronjob "authelia-backup" "$SSO_NAMESPACE"
check_cronjob "privacyidea-backup" "$MFA_NAMESPACE"
# ── 2. RBAC ───────────────────────────────────────────────────────────────────
section "2. Backup ServiceAccount and RBAC (namespace: $SSO_NAMESPACE)"
if kubectl get serviceaccount backup-sa -n "$SSO_NAMESPACE" &>/dev/null; then
pass "ServiceAccount backup-sa exists"
else
fail "ServiceAccount backup-sa not found — apply backup/cronjob-sqlite-backups.yaml"
fi
if kubectl get role backup-scaler -n "$SSO_NAMESPACE" &>/dev/null; then
pass "Role backup-scaler exists"
else
fail "Role backup-scaler not found"
fi
if kubectl get rolebinding backup-sa-scaler -n "$SSO_NAMESPACE" &>/dev/null; then
pass "RoleBinding backup-sa-scaler exists"
else
fail "RoleBinding backup-sa-scaler not found"
fi
# ── 3-5. CronJob run history ─────────────────────────────────────────────────
section "3. lldap-backup run history"
check_last_job "lldap-backup" "$SSO_NAMESPACE"
section "4. authelia-backup run history"
check_last_job "authelia-backup" "$SSO_NAMESPACE"
section "5. privacyidea-backup run history"
check_last_job "privacyidea-backup" "$MFA_NAMESPACE"
# ── 6. privacyIDEA enckey backup on PVC ──────────────────────────────────────
section "6. privacyIDEA enckey backup on PVC"
PI_POD=$(kubectl get pod -n "$MFA_NAMESPACE" \
-l app.kubernetes.io/name=privacyidea \
--field-selector=status.phase=Running \
-o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "")
if [[ -n "$PI_POD" ]]; then
BACKUP_COUNT=$(kubectl exec -n "$MFA_NAMESPACE" "$PI_POD" -- \
sh -c 'ls /data/backups/enckey.backup.* 2>/dev/null | wc -l' 2>/dev/null || echo "0")
BACKUP_COUNT="${BACKUP_COUNT// /}"
if [[ "$BACKUP_COUNT" -gt 0 ]]; then
pass "privacyIDEA enckey backups found on PVC ($BACKUP_COUNT file(s))"
else
warn "No enckey backup files on PVC yet — trigger privacyidea-backup CronJob to create one"
warn " kubectl create job -n $MFA_NAMESPACE --from=cronjob/privacyidea-backup pi-backup-test"
fi
else
warn "Skipping enckey backup check — no running privacyIDEA pod"
fi
# ── 7. LLDAP SQLite backup on PVC ────────────────────────────────────────────
section "7. LLDAP SQLite backup on PVC"
LLDAP_POD=$(kubectl get pod -n "$SSO_NAMESPACE" \
-l app.kubernetes.io/name=lldap \
--field-selector=status.phase=Running \
-o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "")
if [[ -n "$LLDAP_POD" ]]; then
BACKUP_COUNT=$(kubectl exec -n "$SSO_NAMESPACE" "$LLDAP_POD" -- \
sh -c 'ls /data/backups/users.backup.* 2>/dev/null | wc -l' 2>/dev/null || echo "0")
BACKUP_COUNT="${BACKUP_COUNT// /}"
if [[ "$BACKUP_COUNT" -gt 0 ]]; then
pass "LLDAP SQLite backups found on PVC ($BACKUP_COUNT file(s))"
else
warn "No LLDAP backup files on PVC yet — trigger lldap-backup CronJob to create one"
warn " kubectl create job -n $SSO_NAMESPACE --from=cronjob/lldap-backup lldap-backup-test"
fi
else
warn "Skipping LLDAP backup check — no running LLDAP pod"
fi
# ── 8. DR runbook present ─────────────────────────────────────────────────────
section "8. DR runbook"
RUNBOOK="$SCRIPT_DIR/backup/DR-RUNBOOK.md"
if [[ -f "$RUNBOOK" ]]; then
pass "DR-RUNBOOK.md present at $RUNBOOK"
else
fail "DR-RUNBOOK.md not found — it should be at sso-mfa/k8s/backup/DR-RUNBOOK.md"
fi
# ── 9. Offsite backup (manual confirmation) ───────────────────────────────────
section "9. Offsite backup (manual)"
warn "Cannot verify offsite backup automatically — confirm manually:"
warn " - pack-bundle.sh has been run with current secrets"
warn " - ops-bundle.tar.age stored in a separate physical location"
warn " - age decryption key stored separately (NOT in the same location as the bundle)"
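# Sketch (not part of the original commit): if a copy of the bundle is also
# kept on the operator's machine, its age could be checked locally. The helper
# name, the path argument, and the 30-day default are all assumptions.
bundle_is_fresh() {
  local path="$1" max_days="${2:-30}"
  # find prints the path only if it is a regular file modified within max_days.
  [[ -n "$(find "$path" -maxdepth 0 -type f -mtime "-$max_days" 2>/dev/null)" ]]
}
# Possible usage: bundle_is_fresh ./ops-bundle.tar.age 30 && pass "Local bundle copy is recent"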
# ── Summary ───────────────────────────────────────────────────────────────────
echo ""
echo "════════════════════════════════════════════════════════════"
echo " T08 verification: PASS=$PASS WARN=$WARN FAIL=$FAIL"
echo "════════════════════════════════════════════════════════════"
if [[ "$FAIL" -gt 0 ]]; then
echo " Result: INCOMPLETE — resolve FAIL items before marking T08 done"
exit 1
elif [[ "$WARN" -gt 0 ]]; then
echo " Result: PARTIAL — structure is in place; resolve WARN items (trigger CronJobs)"
exit 0
else
echo " Result: COMPLETE — T08 done-criteria met; SSO/MFA platform workplan complete!"
exit 0
fi