Commit Graph

84 Commits

Author SHA1 Message Date
576cf0d95b Local Identity OICD bootstrap 2026-05-02 16:58:44 +02:00
d8fea09de7 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-01:
  - update .custodian-brief.md for net-kingdom
2026-05-01 23:20:21 +02:00
f172f50f95 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-01:
  - update .custodian-brief.md for net-kingdom
2026-05-01 23:05:42 +02:00
d13a2b9b39 Scope update from repo-scoping refactor 2026-05-01 12:28:04 +02:00
69763056fa chore(session): read .custodian-brief.md before MCP call in session init
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 17:48:52 +01:00
4942ee1bba chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-26:
  - update .custodian-brief.md for net-kingdom
2026-03-26 17:47:51 +01:00
8612e6b8a2 Decision for KeyCape Implementation Language Go 2026-03-26 09:21:17 +01:00
c054241a5c feat(t09): backup, break-glass, DR drill — NK-WP-0003-T09 done
- Apply SQLite backup CronJobs (LLDAP, Authelia, privacyIDEA) — all verified running
- Fix authelia-backup: remove scale-down/up dance; concurrent local-path PVC mount
  works on single-node k3s, sqlite3 .backup is safe for concurrent access
- Fix privacyidea-backup: add supplementalGroups: [999] so uid=1000 can read enckey
- Add allow-backup-to-kube-api NetworkPolicy (backup pod → 10.43.0.1:443)
- Create break-glass LLDAP account (net-kingdom-admins); fix ((PASS++)) set-e trap
- SQLite restore drill: LLDAP backup valid (2 users, all tables)
- verify-t08.sh: PASS=15, FAIL=0; fix counter bug + enckey PVC path (/etc/privacyidea)
- Update DR-RUNBOOK.md Authelia restore procedure
- T09 deferred: CNPG backup (needs MinIO/S3), Prometheus (needs kube-prometheus-stack)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 23:56:40 +00:00
4c47c9035f chore(workplan): NK-WP-0003 T04+T08 — testuser provisioned, pi-admin TOTP deferred
testuser fully provisioned in LLDAP + privacyIDEA (TOTP00007147 validated).
pi-admin TOTP deferred: requires admin realm setup (SQLresolver), pi-manage
has no enroll command, WebUI only works for resolver-backed users.
T08 unblocked — proceed to KeyCape acceptance tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:51 +00:00
331eeaf378 fix(lldap): fix gql() brace bug + use LDAP for password setting
Three fixes:
1. gql() default vars '${2:-{}}' — bash parsed first '}' as closing the
   parameter expansion, appending a stray '}' to every caller's vars.
   Fixed by storing '{}' in a local variable first.
2. make_vars() — add VAR_INT_KEYS support so groupId is emitted as a
   JSON integer (Int!) rather than a string, matching LLDAP's schema.
3. Password setting — LLDAP has no GraphQL mutation for admin password
   reset. Replace the broken resetUserPasswordFromAdmin mutation with
   an RFC 3062 LDAP Password Modify operation via kubectl port-forward
   to the in-cluster LLDAP service, using ldap3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00
3a76774dec feat(lldap): add --test flag to create-user.sh for auto-derived passwords
--test derives the password from the display name (spaces → hyphens, append -Pwd),
e.g. "Test User" → "Test-User-Pwd". Skips the interactive prompt.
Useful for provisioning test accounts in a non-interactive flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00
ca69f6bb73 fix(lldap): use env vars in create-user.sh to avoid shell injection
Pass GraphQL query/variables and group names via environment variables
to python3 instead of shell argument interpolation. Prevents breakage
when display names, emails, or passwords contain quotes or spaces.

Also adds --admin flag support and interactive password prompt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00
e802fe3a9d feat(lldap): add create-user.sh for user provisioning
Creates a user in LLDAP via GraphQL, adds them to net-kingdom-users,
optionally net-kingdom-admins (--admin flag), and sets a password interactively.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00
35fa3a5767 fix(privacyidea): create pi-admin-all-rights policy in bootstrap-admin.sh
Once any admin policy exists, PI enforces it for all admins. Without an
explicit policy, pi-admin is locked out of the REST API after trigger-admin-rights
is created. Add pi-admin-all-rights (scope=admin, action=*) via pi-manage
(in-pod) as step 5, before the REST-based trigger-admin-rights step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00
afbf968c76 fix(privacyidea): bootstrap-realm scope fixes + netpol for PI→LLDAP
bootstrap-realm.sh:
- Remove Content-Type header from GET requests (Werkzeug 3.x BadRequest fix)
- Fix resolver type check — result path is result.value.<name>.type, not .data
- Fix self-enrollment policy scope: 'user' not 'enrollment' (PI 3.12)

NetworkPolicies:
- allow-egress-to-lldap (mfa ns): privacyIDEA → LLDAP :3890
- allow-privacyidea-to-lldap (sso ns): ingress from mfa/privacyIDEA → LLDAP :3890

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00
88bbd585fd fix(privacyidea): rename realm netkingdom → coulomb in bootstrap-realm.sh
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00
c0e330ee4e fix(privacyidea): disable response signing + raise rate limit to unblock login
PI_NO_RESPONSE_SIGN=True works around Werkzeug 3.x crash where request.json
raises BadRequest on GET requests with empty bodies (sign_response path).

Rate limit raised from 20/5 to 200/100 req/min to allow the AngularJS UI's
burst of ~50 parallel static asset requests on each page load without being
throttled by Traefik. TODO: split tight /auth+/validate vs loose /static limits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00
23e0b43318 fix(netpol): allow Traefik→ACME solver pods; mark T02–T07 done on RAILIANCE01
Added allow-traefik-to-acme-solver NetworkPolicy to sso and mfa namespaces.
The default-deny-all policy was blocking HTTP-01 challenge traffic from Traefik
to the cert-manager solver pods, causing all TLS certs to stay pending (502).

Workplan NK-WP-0003 updated: T02, T03, T04, T05, T06, T07, T08a all done on
RAILIANCE01 as of 2026-03-25. T08 (e2e auth test) is now unblocked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:49:26 +00:00
df09dd42f4 feat(close): mark NK-WP-0003 T08/T08a/T08b done — acceptance tests passing
All 3 KeyCape test packages pass (migration, negative, profile).
DNS resolves for all 4 subdomains; Go 1.22.10 available at ~/go/bin/go.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:52:11 +01:00
eebaa4fc81 chore(workplan): add T08a (DNS records) and T08b (Go install) tasks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 00:40:40 +00:00
d1fd73e7ed chore(workplan): NK-WP-0003-T08 blocked — DNS records + Go missing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 00:36:56 +00:00
c8c6efbc55 chore(workplan): NK-WP-0003-T07 done — KeyCape running
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 00:32:45 +00:00
880f89bf98 fix(keycape): NK-WP-0003-T07 — fix deployment image + add demo-app client
- deployment.yaml: image → 92.205.130.254:32166/coulomb/key-cape:latest
  (Gitea OCI registry, delivered by KEY-WP-0002; imagePullPolicy: Always)
- k3s insecure registry hosts.toml: fixed server endpoint to http:// so
  containerd does not attempt HTTPS against the plain-HTTP Gitea NodePort
- create-secrets.sh: add demo-app OIDC client (required for KeyCape to
  start; also needed for T08 acceptance tests)
- keycape-config Secret updated in-place (no re-bootstrap needed)

KeyCape pod 1/1 Running; /healthz OK; OIDC discovery live at
https://kc.coulomb.social/.well-known/openid-configuration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 00:30:58 +00:00
d0629e7f20 chore(workplan): NK-WP-0003-T07 blocked — awaiting GHCR image from key-cape
Deployment applied; pod in ImagePullBackOff. Secrets already correct.
Capability request 0e0aefd7 filed; key-cape must publish ghcr.io image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 22:24:32 +00:00
f2f07871eb fix(sso-mfa): commit T02–T06 fixes and workplan status updates
- authelia: users_filter uid→{username_attribute}, OIDC client secret
  moved from env var to inline bcrypt hash in configmap (4.38 limitation)
- authelia: remove unsupported CLIENTS_0_SECRET_FILE env var
- lldap: drop runAsNonRoot/runAsUser (image init requires root)
- verify-t02: keycloak→keycape NetworkPolicy check rename
- workplan: T02/T03/T05/T06 marked done with notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 20:25:03 +00:00
a60f4fc834 chore(workplan): NK-WP-0003-T04 done — privacyIDEA deployed and bootstrapped
Pod Running with correct image and config. enckey, audit keys, pi-admin,
trigger-admin all created via agent bootstrap (NK-WP-0005).
Remaining: TLS cert + trigger-admin policy via WebUI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 12:13:52 +00:00
59ba9e6fe1 fix(creds-bootstrap): harden agent bootstrap for non-interactive execution
- creds-bootstrap-agent.sh: skip Phase 3 if all secrets already applied
  (avoids CNPG SSL connection drops from repeated reconciliation)
- creds-bootstrap-agent.sh: wait for rollout to complete after restart
  before running enckey/admin bootstrap (fixes race with old pod)
- creds-bootstrap-agent.sh: only restart privacyIDEA when Phase 3 ran
- create-pi-token.sh: use env-var + retry for token fetch (no heredoc
  stdin; handles transient 500 from idle connection pool)
- create-pi-token.sh: create keycape-pi-token K8s Secret after fetching
- creds-verify.sh: map keycape-pi-token to secrets_applied.keycape
  (not pi_admin_created, which caused spurious Phase 5 re-runs)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 12:11:13 +00:00
56036cd4be chore(creds): bootstrap complete [agent NK-WP-0005] 2026-03-21 12:10:53 +00:00
329f086743 chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 12:09:53 +00:00
aa0edabb81 chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 11:43:02 +00:00
49ded70e4b chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 11:37:13 +00:00
eb7c21a00c chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 11:34:50 +00:00
152c2aac72 chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 11:24:38 +00:00
daeeb863cb chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 11:22:53 +00:00
ca853a84b7 chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 10:59:13 +00:00
5d032db3dd chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 10:51:47 +00:00
0ea05b302c chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 10:50:22 +00:00
963a4700ef chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 10:44:37 +00:00
33b9b93dba chore(creds): encrypted secrets [agent NK-WP-0005] 2026-03-21 10:42:39 +00:00
f227dfbd3d fix(privacyidea): add PI_ADDRESS/PI_PORT; switch readiness probe to tcpSocket
gpappsoft entrypoint requires PI_ADDRESS and PI_PORT env vars to build
the gunicorn bind argument. Without them the container crashes immediately.

/token/ returns 401 for unauthenticated GET requests so the httpGet
readiness probe was permanently failing. Switch to tcpSocket to match
the startup and liveness probes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 10:41:13 +00:00
9587d14803 fix(privacyidea): override PRIVACYIDEA_CONFIGFILE to use mounted pi.cfg
gpappsoft image sets PRIVACYIDEA_CONFIGFILE=/privacyidea/etc/pi.cfg
internally, causing it to ignore our mounted configmap at
/etc/privacyidea/pi.cfg and fall back to SQLite.

Override the env var so the entrypoint reads our pi.cfg, which points
to PostgreSQL via PI_SQLALCHEMY_DATABASE_URI from the secret.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 09:43:35 +00:00
bececac7b8 fix(privacyidea): correct image to ghcr.io/gpappsoft, port 5001→8080
privacyidea/privacyidea:3.12 and privacyidea/otpserver:3.12.2 do not
exist on Docker Hub. Correct image is ghcr.io/gpappsoft/privacyidea-docker:3.12.2
which listens on port 8080.

Update all port references: deployment, service, ingress, netpol-mfa,
netpol-sso (keycape→privacyIDEA egress rule).

Also: creds-bootstrap-agent.sh — restart privacyIDEA deployment after
applying new secrets so the pod picks up updated env vars.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 09:37:38 +00:00
bcae4bc6dd fix(workplans): portable key-cape path in NK-WP-0003-T08; add /creds-init skill
- NK-WP-0003 T08: replace hardcoded /home/worsch/key-cape with
  $(git rev-parse --show-toplevel)/../key-cape so acceptance tests
  run correctly on any machine
- NK-WP-0005 T04: create .claude/commands/creds-init.md — the
  autonomous credential bootstrap skill (reads creds-state.yaml,
  resumes from current phase, honours emergency bundle gate)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 10:01:14 +01:00
0670e17b42 chore(workplans): revise workplans post NK-WP-0005
NK-WP-0005: mark all tasks done, status → done
NK-WP-0003: T01 marked done (NK-WP-0004/0005 complete); pre-conditions
  updated; done criteria reflect agent-bootstrap model (no KeePassXC)
NK-WP-0001: status → deferred; T05-T08 (Keycloak) deferred indefinitely;
  superseded_by: NK-WP-0003 added

Active work path is now NK-WP-0003 T02-T09.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 08:47:44 +00:00
95656f2324 feat(creds): NK-WP-0005 — agent-driven credential bootstrap
Implements all 7 tasks from NK-WP-0005:

T01: creds-state.yaml → schema_version: 2, agent_mode: true
     Replaces keepass_confirmed with emergency_bundle_delivered,
     adds phase tracking fields for fully automated flow.

T02: creds-bootstrap-agent.sh — single entrypoint for autonomous
     bootstrap. 10 phases, idempotent re-runs via state file.
     Only human touchpoint: emergency bundle confirmation gate.

T03: emergency-bundle.sh — assembles and displays emergency bundle
     (age key + break-glass passwords + ops bundle location).
     Writes temp file, shreds on confirmation, clears screen.
     Supports --reprint for re-delivery.

T04: ~/.claude/commands/creds-init.md — /creds-init skill replaces
     /creds-bootstrap. Fully autonomous execution via the agent.

T05: Makefile — creds-agent-init, creds-agent-status,
     creds-emergency-reprint targets.

T06: creds-rotate.sh — --non-interactive flag for agent-driven
     rotation. Auto-confirms all gates; tracks last_rotated_<key>
     in creds-state.yaml. LLDAP web UI step prints warning in
     non-interactive mode.

T07: canon/standards/credential-management_v0.2.md — updated
     standard: KeePassXC removed from operational path, agent
     bootstrap as Phase 0, emergency bundle section, prohibited
     patterns updated.

Also: creds-status.sh handles both schema v1 (legacy) and v2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 08:38:52 +00:00
8db000e5f0 feat(workplan): NK-WP-0005 — agent-driven credential bootstrap
Replaces the human-as-operator model from NK-WP-0004 with full agent
automation. Agent generates, encrypts (SOPS), injects into cluster,
and delivers a single emergency bundle (age key + break-glass passwords).
Human only stores that bundle in their personal password manager.

KeePassXC removed from operational path. creds-state.yaml redesigned
with agent_mode and emergency_bundle_delivered gate. Standard to be
updated to v0.2 (T07).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 09:25:36 +01:00
b4a3a5966f chore(consistency): NK-WP-0004 complete — correct regressed task statuses
All 7 tasks were implemented in c10d7d2 but fix-consistency on the
workstation reverted all statuses to todo (CUST-WP-0026 regression bug).
Corrected all task blocks to status: done and workplan status to done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 09:19:37 +01:00
6fdfe8a669 chore 2026-03-21 00:15:58 +00:00
01c8a07f3a fix(sso-mfa): NK-WP-0003-T04 — correct privacyIDEA image and port
privacyidea/privacyidea:3.12 does not exist on Docker Hub.
Correct image: privacyidea/otpserver:3.12.2 (port 5001).

Updated files:
- deployment.yaml: image, containerPort, probes, service port
- ingress.yaml: backend service port
- netpol-mfa.yaml: ingress port + keycloak → keycape label
- netpol-sso.yaml: KeyCape egress port to privacyIDEA

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 23:54:18 +00:00
c10d7d2f8a feat(creds): implement NK-WP-0004 Credential Management Foundation
- .sops.yaml + keys/age.pub: SOPS age encryption for all secrets/ paths
- .gitignore: broad secrets/ catch-all (any depth)
- .githooks/pre-commit: blocks unencrypted secrets/, *.env outside bootstrap/,
  and known plaintext patterns (PI_SECRET_KEY=, LLDAP_JWT_SECRET=, etc.)
- Makefile: full credential lifecycle (creds-init/generate/bundle/apply/verify/
  status/rotate) + SOPS helpers (sops-setup/edit/encrypt/decrypt/rotate/check-secrets)
  + hooks/hooks-test
- creds-apply.sh: runs create-secrets.sh in dependency order (postgresql → lldap →
  authelia → privacyidea), skips keycape with printed instructions, updates state
- creds-verify.sh: checks all K8s secrets exist, updates creds-state.yaml
- creds-status.sh: human-readable state table from creds-state.yaml
- creds-rotate.sh: guided rotation for all 9 secret types with impact descriptions
  and atomic multi-component update sequences
- creds-state.yaml: committable state file tracking generation, bundle, KeePassXC
  confirmation, per-component apply status, enckey and pi-admin bootstrap flags

NK-WP-0003-T01 unblocked. /creds-bootstrap skill registered separately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 23:39:35 +00:00