NET-WP-0018 T09: Assess scratch-rebuild risk and define rehearsal plan

- Created docs/security-bootstrap-rebuild-risk-and-rehearsal.md (risk table 12+ items classified likelihood/impact/etc; UE adapters #1 HIGH per assessment 7 gaps; non-destructive rehearsal plan: scripted dry (creds-init --dry + 0019 orchestrator + make/console validate + T07/T08) then ns-isolated then parallel; rollback via cleanup/lock-offboard; prove via validators/evidence/status/tests; recs + coverage gaps documented; refs T02/T03/T05/T07/T08/0019/assessment/contract) - Updated workplan T09 status:done + detailed 2026-06-04 completion note (reviews all prior T0x + live console/evidence/metadata exercised in T09; pragmatic infra used; 9/9 closes 0018) - Frontmatter updated date - No destructive; all per session protocol + pragmatic audit (file source per ADR-001) - T07 tests + T08 validate-keycape + validate-onboarding-dry-run exercised OK as part of review Refs: workstream 800f9f16-..., task a9e60fd5-...; will POST /progress/ + fix-consistency
2026-06-04 00:50:18 +02:00
parent 9a3228489e
commit 875e50d573
2 changed files with 166 additions and 2 deletions
--- a/docs/security-bootstrap-rebuild-risk-and-rehearsal.md
+++ b/docs/security-bootstrap-rebuild-risk-and-rehearsal.md
@@ -0,0 +1,156 @@
+# Security Bootstrap Scratch-Rebuild Risk Assessment And Rehearsal Plan
+
+**Status:** complete (NET-WP-0018-T09)
+**Date:** 2026-06-03
+**Workplan:** NET-WP-0018 Bootstrap Automation And Rebuild Readiness
+**Related:** T02 (runtime architecture), T03 (retrospective + gap matrix), T05 (smooth bootstrap guide), T07 (tests), T08 (validators in UI state), NET-WP-0019 (dry-run polish), docs/user-engine-netkingdom-integration-assessment.md (7 gaps + boundary contract), canon/standards/user-engine-boundary-contract_v0.1.md, .local/security-bootstrap.json (S6 flags), tools/security-bootstrap-console/ (status/gates/validate), creds-init skill, sso-mfa/k8s/lldap/dry-run-nonroot-user.sh + Makefile targets.
+
+**Pragmatic note (per 0018 Coordination):** This assessment was produced using the pragmatic auditing infra agreed for 0018 impl: State Hub /progress/ + task linkage, dated notes in workplan file (source per ADR-001), git commits, console evidence/validators + .local metadata, /tmp evidence from 0019 runs, T07/T08 automated checks, and direct review of T02/T03/T05 artifacts. No production Audit Core required; this record + prior events enable exact retrospect/optimization review. (audit_core_bootstrap_risk_accepted:true in metadata with 2026-07-02 review.)
+
+## Executive Summary
+
+Post 0017 (S6 Reopen under custody) + 0019 (T06-adjacent dry-run automation) + 0018 T02–T08, the NetKingdom IAM/security bootstrap is repeatable via a documented control surface (console + make + 0019 orchestrator + smooth-guide + evidence validators) with many gates now computed from live checks (T08) or durable non-secret metadata (S6 flags: platform_reopened, cleanup_complete, oidc_login_verified, king kit, root revoked, restore passed, etc.).
+
+**Current live posture (console status + metadata + validators exercised in T09):**
+- Stage: S6 - Reopen under custody
+- "KeyCape OpenBao client deployed" gate: done (T08: computed via verify-openbao-client.sh where possible; also source def ready)
+- onboarding-dry-run evidence validates OK (12+ exact bools: actor_class=user, no platform-root grant, lldap_identity_verified, keycape_oidc_claims_verified, no_secret_material_recorded, effective preview, lock_offboard_result, post clean, etc.)
+- T07 pytest: 8/8 pass (templates, runbooks incl. "User lifecycle dry-run (T06)", audit_core_posture, source commands, validate fns, cross-checks)
+- All major bootstrap steps have non-secret evidence paths + validators + make targets
+
+**Rebuild risk is bounded but not zero.** A full scratch rebuild is **not a goal** of 0018 and was never performed. The first bootstrap succeeded with heavy interactive diagnosis; subsequent automation (console, 0019 /tmp-orchestrator + k8s-fallback + evidence discipline, T08 live validators, T05 consolidated guide) reduces rediscovery. Residual risks center on:
+- UE integration gaps (adapters missing → full governed user facts / claims_enrichment / audit correlation not yet in the IAM-orchestration rehearsal surface)
+- Fragile attended elements still present for drift/repair/init/custody (human gates explicit and required)
+- Operator state (.local) and token/expiry handling
+- Lack of exercised parallel-cluster or ns-isolated full end-to-end rehearsal to date (current is "as live" + dry-run fixtures)
+
+**Recommendation:** Rehearse via **scripted dry-run + namespace-isolated** modes first (using existing 0019 tooling + creds-init + smooth-guide + all validate-* + T07/T08). Classify "UE integration" as **HIGH** risk item until net-kingdom-specific adapters (IdentityClaimsAdapter etc.) + NK wiring + updated dry-run exercising UE projection land (per assessment + contract). No live destructive scratch rebuild. Current pragmatic bootstrap (creds-init automated path, direct LLDAP/Keycloak for platform users per contract allowance for bootstrap, evidence-proven lifecycle) is the proven repeatable base; rehearsal proves *that contract*. Use T09 + T02/T03 as the record for when adapters are ready (drive integration tests from 0018/0019).
+
+## Risk Classification
+
+Risks compiled from T02 (as-deployed + UE section + rebuild notes), T03 (9 bumps + matrix), T05 (guide + blocked conditions), T08 (current computed gates), assessment (7 gaps), live metadata/console (S6 + T08 validator success), 0019 evidence samples. Format: Likelihood (low/med/high), Impact (on rebuild success/time/correctness), Detection (how known today), Mitigation (current automation), Remaining Human Interaction, Priority for post-0018.
+
+| Risk Area | Likelihood | Impact | Detection | Mitigation (current) | Remaining Human | Priority / Notes |
+|-----------|------------|--------|-----------|----------------------|-----------------|------------------|
+| Missing UE platform adapters (IdentityClaimsAdapter from KeyCape claims, AuthorizationCheckPort to flex-auth, SecretProvider OpenBao, EventOutbox, AuditWriter, MembershipFactExporter, ApplicationBinding) | High | High (full UE-backed user facts, claims_enrichment, memberships with owning_system, audit correlation, projections not available in rebuild; drift between LLDAP groups and UE Membership) | T02 arch UE section, assessment gaps 1/2/4/6/7, T03 matrix UE row, console claims helpers still direct LLDAP, no UE deployment in NK flows yet | 0019 dry-run proves *IAM-lifecycle contract* (onboard/lock/offboard via LLDAP/KeyCape with evidence 12+ bools + actor checks + no root grant); NK orchestrates boundaries per contract; direct LLDAP allowed for bootstrap per assessment/contract | Implement adapters (primarily UE per contract) + NK/key-cape wiring for claims_enrichment (adapter-owned, UE never tokens/hard-dep); update dry-run to exercise UE projection + seed externally_provisioned for platform users; joint review | **HIGH (T09 classify)**. Biggest blocker to "UE as canonical for user-domain facts". Rehearsal today limited to IAM-orchestration surface. Mitigation: follow assessment recs (stub adapter, use 0018/0019 as testbed). Do not claim full UE rebuild readiness. |
+| Scratch loses current S6 / bootstrap evidence / .local state / k8s secrets | Med | High (current platform_reopened, cleanup_complete, oidc_verified, restore_passed, custody, audit_core_risk_accepted flags + OpenBao config + user state would need full re-ceremony) | .local/security-bootstrap.json (many true), console status S6, T02/T03 note live state, k8s secrets in sso ns | None (by design: scratch = reset). creds-init provides automated cred re-gen path. Smooth guide + validators allow re-proof. | Full re-run of S1–S6 with evidence at each step; human custody for init/unseal/root disposition/restore drill/approval | High. Non-goal of 0018 to trigger. Rehearsal must be non-destructive (dry/ns-isolated) so live S6 evidence preserved for production handoff. |
+| Realm/repair drift (privacyIDEA for LLDAP users/MFA) still requires attended repair | Med | Med (MFA self-enroll or admin broken on rebuild) | Console "privacyIDEA realm repair" runbook, T03 bump #1, T02 flows, validate-t02 partial | Runbook + repair-realm-live.sh; some validate | Attended apply/verify on live cluster; no fully declarative gate yet | Med (T08 target for realm health validator). |
+| OIDC callback / KeyCape client registration drift | Med | Med (login fails for console/OpenBao admin path) | T03 bump #2, T02 clients/redirects, T08 "KeyCape OpenBao client deployed" gate + source def (verify script) | Non-secret client in create-secrets.sh; T08 validator (source + live via script); console preflight | Manual apply/restart KeyCape on drift; verify via OIDC | Med (T08 already computes deployed). Good model for other client gates. |
+| Claims enrichment path still direct LLDAP (post-adapter drift risk) | High (until adapters) | High (violates contract: adapter-owned cache/freshness/fail; UE not in token path) | Assessment gap 6, T02 UE section, 0019 claims helper still infers from LLDAP groups, keycape OIDC config | 0019 dry-run + claims verify (groups + T01 role binding, warns on root/admins) exercises current direct path as "IAM contract proof" | Route via claims_enrichment adapter + cache when UE deployed; production adapters must reject local/loopback | High. Rehearsal today correctly exercises direct (bootstrap allowed). Future: update orchestrator to prefer UE projection when present. |
+| Token expiry / revocation / short-lived helper tokens | Med | Med (leaked paths or expired admin access during rebuild) | T03 bump #5, T02 token flows, console revoke runbooks | Runbook + helpers (no plaintext on CLI, accessor/self revoke); 0019 cleanup | Some cases still interactive (pod TTY for protected tokens); no fully non-interactive in all dry paths | Med (T08). 0019 lock-offboard + evidence "post clean" mitigates for test users. |
+| Operator-state / .local metadata drift or tamper | Low (now) | Med (flags out of sync with reality; stage stuck) | Console status/validate/metadata flows, T03 bump #6, S6 flags set | .local updated only via console (non-secret); validate-*; T08 compute from validators where possible (e.g. keycape) | Manual edit possible (file); multi-op sync not cluster-durable yet | Med (T08 pattern). Pragmatic record (State Hub progress) is cross-check. |
+| Audit correlation (pragmatic vs production) | High (gap) | Med (no shared IDs across UE/flex/platform sinks for rebuild forensics) | Assessment gap 7, T03 matrix audit row, T02 "Pragmatic Audit Paths" section, metadata audit_core_* | Pragmatic working: local-identity/audit.py TSV (mode 600), OpenBao PVC+mock, State Hub /progress/ + task_ids (used for 0018 itself), console evidence; audit_core_bootstrap_risk_accepted=true (owner/review 2026-07-02) | Production tenant-aware durable Audit Core sink + UE AuditWriter/OutboxEvent + correlation bundle (per contract) | High (T03/T09). Do not block 0018. Rehearsal uses pragmatic (progress events with workstream/task). |
+| Secret taint / hygiene in user lifecycle or cred bootstrap | Low (post-0019) | High (if occurs) | 0019 evidence "no_secret_material_recorded", validator require_fields + extra checks, T03 bump #7, console SECRET_EVIDENCE_MARKERS | 0019 orchestrator: /tmp WORKSPACE + trap EXIT rm -rf; k8s fallback (never writes persistent bootstrap/secrets for --test); creds-init uses SOPS/age + k8s injection; evidence discipline | Emergency bundle human gate (creds-init); password-safe for king | Low. Excellent model. Rehearsal must always use --dry or orchestrator paths. |
+| Restore drill / escrow / custody not re-proven on every rebuild | Med | High (recovery untrusted) | T03 matrix, T02 custody/restore notes, console validate-t02 + evidence gates, custody-roster | restore_drill_passed + post-unseal verified in metadata; signed roster; T07/T08 coverage | Attended ceremony for init/unseal/rotate/restore proof; human custody approval (temporary-single-king) | Med. Rehearsal plan must include isolated restore proof (non-destructive snapshot/restore in temp). |
+| Full cluster / parallel rebuild never exercised end-to-end | High | Med (unknown integration surprises at scale: DNS, ingress, ns isolation, ESO, operators) | T02 operational assumptions, T03 "rebuild risk" row, no parallel cluster in history | Scripted dry + 0019 ns/k8s-fallback paths; creds-init; smooth-guide sequence | Provision isolated env (temp ns, dev cluster, or parallel); run full guide + all validate + S6 reopen; manual for cluster bringup | High (T09). Recommend ns-isolated first (leverages existing k8s access + fallback); full parallel when hardware available. |
+| Membership / group / "net-kingdom" tenant semantics not synced to UE | Med | Med (coarse IAM groups vs UE Membership owning semantics; "platform-root" special case) | Assessment gaps 2/4, T02, T03 UE row, contract sync envelope ("owner wins", freshness, externally_provisioned) | 0019 dry-run exercises LLDAP groups as IAM facts (net-kingdom-users only for non-root); no UE Membership yet | Explicit bootstrap-to-governed transition rules (IAM groups seed externally_provisioned in UE; LLDAP remains auth source); decide platform users stay IAM-only or dual | Med. Rehearsal limited to IAM side today. |
+| Application onboarding "Application" concept overload | Low | Low | Assessment gap 3, contract: UE owns Application+Binding records (separate from KeyCape OIDC client) | NK/key-cape does OIDC client/secret; UE binding is separate record | Keep strict (no merge in automation/scripts) | Low. |
+
+**Summary classification:** 3–4 HIGH (UE adapters/integration #1, scratch state loss, claims path until fixed, cluster rehearsal unexercised, audit correlation). Most are known, documented, and have clear mitigation paths (adapters + wiring; non-destructive rehearsal modes; pragmatic audit accepted temporarily). No unknown unknowns from first bootstrap remain untracked.
+
+## Rehearsal Plan (Non-Destructive, Evidence-Proven, Reversible)
+
+**Guiding principles (from 0018 non-goals + T05 + 0019 + contract):**
+- Never destructive on live (no full teardown as part of 0018/ T09).
+- Prefer scripted dry-run (local, no k8s mutation or minimal) and namespace-isolated (temp ns or scoped resources) over parallel cluster.
+- Every step produces/consumes non-secret evidence matching templates/validators (exact bools, actor_class != king for non-root, no secret markers, effective_access before save).
+- Use existing control surface: creds-init skill (automated cred entrypoint + --dry-run), 0019 orchestrator (dry-run-nonroot-user.sh with /tmp+trap+k8s-fallback+cleanup), make security-bootstrap-* targets, console subcmds (onboarding-dry-run, validate-*, status, lifecycle-guide), T07 pytest + T08 computed validators (keycape + onboarding + t02 etc.).
+- Rollback always possible via lock-offboard (GraphQL removeUserFromGroup + deleteUser) or --cleanup-only + evidence "post_dry_run_disposition: clean".
+- Prove via: console status (all gates "done" or explicit "human"), make validate-* (return 0), T07 tests pass, evidence files validate, .local flags updated only via approved paths, State Hub progress events.
+- Human custody gates remain explicit (init, root disposition, emergency bundle, custody approval, S6 reopen). Do not hide them.
+- Rehearsal limited to IAM-orchestration contract today (direct LLDAP/Keycloak for platform users + test non-root); UE projection exercised only after adapters.
+
+**Recommended sequence / modes (lowest risk first):**
+
+1. **Scripted local dry-run (zero cluster impact, repeatable in CI or laptop):**
+   - `creds-init --dry-run` (or make creds-*-dry if exposed) — validates pre-flights, would-gen secrets, no write.
+   - Follow smooth-bootstrap-guide.md steps 1–6 (prereqs, king kit via console, privacyIDEA, LLDAP king+ MFA, KeyCape, OpenBao preflight) using console commands + validate-kit / validate-t02 etc. (no init).
+   - Step 7 user lifecycle: `make security-bootstrap-onboarding-dry-run SUBJECT=t09-rehearsal EMAIL=... DISPLAY=...` (internally calls orchestrator --actor user --scope none; produces /tmp/.../evidence.json with all 12+ bools; auto cleanup by default).
+   - `make security-bootstrap-validate-onboarding-dry-run` (or direct console validate-onboarding-dry-run --evidence ... ) — must pass.
+   - `make security-bootstrap-validate-keycape-client` (T08) + other validate-* .
+   - `make security-bootstrap-console-test security-bootstrap-scripts-syntax` (T07).
+   - `python3 tools/security-bootstrap-console/security_bootstrap_console.py --metadata .local/security-bootstrap.json status` — expect S6 (or appropriate) with T08 gates computed.
+   - Optional: `make security-bootstrap-lifecycle-cleanup-dryrun-users PATTERN=t09-*` (explicit rollback).
+   - Capture: /tmp evidence + console output + git status (no secret changes) + State Hub /progress/ event.
+   - Exit criteria: all validators 0, tests green, evidence matches template + bools true, no plaintext in bootstrap/secrets, no residual test users/groups.
+
+2. **Namespace-isolated or k8s-fallback rehearsal (uses live cluster access but scoped):**
+   - Same as above but target a temporary namespace or use KUBECTL with context that only sees test resources (if operators allow).
+   - Use 0019 k8s-fallback paths in create-user.sh (extracts LLDAP_ADMIN_PASS from k8s secret into /tmp only; never persistent bootstrap/secrets).
+   - For full S1–S6 elements that mutate (e.g. KeyCape client apply, realm repair): run against test KeyCape config or accept attended apply + immediate verify + cleanup.
+   - Include isolated restore drill (snapshot OpenBao audit PVC or test mount, encrypt to custodian age, restore proof in temp, re-verify post-unseal per T02/T03).
+   - Exercise claims verification + lock/offboard + post-clean evidence.
+   - Prove: same validators + console "KeyCape OpenBao client deployed" (computed) + S6 flags re-settable via approve/metadata.
+   - Rollback: explicit cleanup targets + GraphQL removes (as in 0019 script).
+
+3. **Parallel cluster / full isolated env rehearsal (when available):**
+   - Provision separate k3s/ThreePhoenix dev cluster or ns with full operators (CNPG, ESO, cert-manager, ingress, etc. per T02 assumptions).
+   - Run creds-init (full, with emergency bundle human gate).
+   - Follow smooth-bootstrap-guide.md end-to-end (S1–S6), producing evidence at each step.
+   - Use console status + all make validate-* + T07/T08 tests after each phase.
+   - Exercise full user lifecycle dry (0019) + platform reopen (set platform_reopened, cleanup_complete).
+   - Include full restore drill + custody roster sign + audit_core flags.
+   - Verify T08 live validators (keycape + future realm/LLDAP/Authelia/OpenBao OIDC/State Hub sync).
+   - Handoff: produce handover checklist; log State Hub progress.
+   - This is the closest to "scratch" without touching production.
+
+4. **Full live scratch (only after isolated rehearsal passes + explicit human approval):**
+   - Not recommended or scheduled. Would require destroying current S6 state (k8s secrets, OpenBao, .local flags, LLDAP users beyond platform-root?).
+   - If ever: same as #3 but on the target cluster; accept that all prior evidence must be re-generated and S6 re-approved.
+   - Use T09 doc + T02/T03 as the "what must be proven again" checklist.
+   - Per non-goals: do not perform as part of this workplan.
+
+**Rollback / hygiene in all modes:**
+- 0019: --no-lockoffboard / --keep-user for inspection; default + --cleanup-only for clean slate.
+- GraphQL: removeUserFromGroup + deleteUser (see create-user.sh + dry-run script).
+- /tmp workspaces auto-clean via trap.
+- No commit of plaintext (git hooks + pre-commit in creds).
+- Evidence always records "post_dry_run_disposition: Test subject fully removed..."
+
+**How to prove a rehearsal "passed" (T09 acceptance for future):**
+- console status reports expected stage + all critical gates "done" (T08 computed where possible) or explicit human.
+- All relevant `make security-bootstrap-validate-*` and `security-bootstrap-console-test` exit 0.
+- Evidence JSON files exist, validate structurally + contain required fields + bools (e.g. lldap_identity_verified: true, no_secret... : true, actor_class: "user", groups limited, lock_offboard_result clean).
+- .local/security-bootstrap.json updated only via console paths; S6 flags (or equivalent) set.
+- State Hub has /progress/ events with workstream/task linkage for the rehearsal run.
+- No secret material in Git, /tmp left behind, or console output.
+- For UE future: when adapters present, a rehearsal run must also show claims_enrichment projection exercised (via 0019 claims hook updated or new UE dry target) and Membership facts with owning semantics.
+
+**Current coverage gaps in rehearsal (documented, not hidden):**
+- UE adapters not present → cannot yet rehearse "UE projection as source of truth for non-root users in dry-run".
+- Full parallel cluster not yet provisioned in this env → ns/scripted is the exercised path.
+- Some drift repairs (realm) still attended → rehearsal will surface them again (good).
+- Audit Core production sink not ready → pragmatic correlation only.
+
+## Recommendations
+
+- Treat this doc + T02 (arch + UE gaps) + T03 (matrix + bumps) + T05 (guide) as the "rebuild bible". Update them when adapters land or new bumps found in rehearsal.
+- Implement net-kingdom-specific adapters in user-engine (start with IdentityClaimsAdapter + claims_enrichment wiring in key-cape/net-kingdom OIDC paths) — highest leverage follow-up from assessment/T09.
+- Extend T08 pattern: add more computed validators (realm health, LLDAP membership for bootstrap groups, Authelia route, OpenBao OIDC config proof, State Hub sync check) so console "proves itself" more.
+- Drive 0019-style dry-run updates from T09: once adapters exist, add UE-exercising path (e.g. claims hook calls UE projection, verifies Membership facts) while keeping IAM direct as bootstrap fallback.
+- Schedule a "T09 rehearsal drill" (scripted + ns) after any major change (new KeyCape image, OpenBao policy, 0017 hardening); record evidence + progress event.
+- For production readiness handoff (0017): require a successful isolated rehearsal + all T09 HIGH risks mitigated or explicitly accepted.
+- Continue pragmatic tracking: every rehearsal or real rebuild logs to State Hub with task linkage; feed new bumps to retrospective.
+- Refresh user-engine brief and consider whether orchestration polish (0019) stays in netkingdom or spawns USER-WP entries (per assessment rec).
+- No change to repo boundary: NK orchestrates; UE owns domain facts/impl.
+
+## References
+
+- T02: docs/NetkingdomRuntimeArchitecture.md (UE Integration Points and Known Gaps, Pragmatic Audit Paths, rebuild notes, 0019 additions, S6, concrete hosts/NS/trust)
+- T03: docs/security-bootstrap-retrospective.md (exec summary, 9 bumps, full gap matrix incl. UE/audit/rebuild rows, recs explicitly calling out T09 classify + scripted/namespace first)
+- T05: docs/smooth-bootstrap-guide.md (Step 7/8 user lifecycle + reopen using 0019; "Rehearse rebuild per T09 (scripted/namespace first; use 0019 dry-run as model)"; evidence table; pragmatic note)
+- T07: tools/security-bootstrap-console/tests/test_security_bootstrap_console.py (8 tests incl. dry-run template fields + 0019 checks, runbook payloads has T06 entry, validate fns, audit_core_posture, templates crosscheck)
+- T08: tools/security-bootstrap-console/security_bootstrap_console.py (keycape_openbao_client_deployed() + build_gates or, validate-keycape-client subcmd, make target; evidence_validator_gate + require_evidence_fields for onboarding/lifecycle/cleanup; print_status with T05 guide block)
+- 0019: workplans/NET-WP-0019-..., sso-mfa/k8s/lldap/dry-run-nonroot-user.sh (orchestrator, /tmp+trap, k8s-fallback, populate+validate, lock/offboard, cleanup-only), create-user.sh, claims helpers, Makefile targets (onboarding-dry-run, validate-*, cleanup)
+- Assessment: docs/user-engine-netkingdom-integration-assessment.md (full 1–7 gaps, no intent conflicts, boundary contract crossrefs, recs for 0018 T07/T08/T09 to drive integration)
+- Contract: canon/standards/user-engine-boundary-contract_v0.1.md (source-of-truth matrix, membership envelope, "owner wins", claims_enrichment adapter-owned, audit correlation bundle, bootstrap allowance for direct IAM)
+- Live artifacts: .local/security-bootstrap.json (S6 + audit_core_bootstrap_risk_accepted + many done flags), console status (T08 gates, actions incl. #17 validate-keycape, 0019 dry-run), /tmp/*-dry-run/evidence.json (validated samples), State Hub workstream 800f9f16-... + task a9e60fd5-... + prior /progress/
+- Creds: .claude/commands/creds-init.md (automated SOPS/age/k8s injection + --dry-run + emergency bundle human gate + state-hub log)
+- Other: SCOPE.md (UE integration in Getting Oriented + backlinks), docs/responsibility-map.md (UE orchestrated by NK), platform-identity-security-architecture.md
+
+**T09 completion closes NET-WP-0018 (9/9 tasks).** Brief will reflect after fix-consistency. All prior tasks (T02 arch, T03 retrospective, T04 boundary via assessment, T05 guide, T06 UI align, T07 tests, T08 validators) directly informed the risks and rehearsal design. Pragmatic infra enabled safe, reviewable implementation without blocking on full Audit Core.
+
+Next: operator may create follow-ups for UE adapters (in user-engine primarily), more T08 validators, scheduled T09 rehearsal drill, or production hardening under 0017. Archive this workplan per convention after final sync.
+
+---
+*Created during T09 implementation using file-first + pragmatic tracking (workplan note + progress events + git + console/evidence review). No secrets touched. All actions non-destructive.*
--- a/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md
+++ b/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md
@@ -8,7 +8,7 @@ status: active
 owner: codex
 topic_slug: netkingdom
 created: "2026-06-01"
-updated: "2026-06-03"
+updated: "2026-06-04"
 depends_on:
  - NET-WP-0015
  - NET-WP-0017
@@ -397,7 +397,7 @@ once adapters land (e.g. claims_enrichment projection).

 ```task
 id: NET-WP-0018-T09
-status: todo
+status: done
 priority: high
 state_hub_task_id: "a9e60fd5-fac6-46e9-bc63-b2979cca548e"
 ```
@@ -431,6 +431,14 @@ shows S6 reopen with many flags true, but adapter gaps remain). From assessment:
  automated cred bootstrap entrypoint for rehearsal. No live destructive rebuild
  as non-goal.

+**2026-06-04 (T09 complete):** Started T09 (last high-prio task; 8/9 in brief). Using pragmatic tracking (todo, file notes, will POST /progress/ with task_id, git, console/evidence review, T07/T08 run). Reviewed all prior: T02 NetkingdomRuntimeArchitecture.md (specific-as-deployed incl. full UE 7 gaps section + pragmatic audit + 0019 + rebuild notes), T03 retrospective.md (9 bumps + gap matrix with UE/audit/rebuild rows high for T09 + explicit rec "T09 classify UE risk + rehearsal scripted/namespace first"), T05 smooth-bootstrap-guide.md (consolidated sequence + Step 7/8 refs 0019 dry-run + "Rehearse rebuild per T09 (scripted/namespace first; use 0019 as model)"), T07 tests (8 pytest covering templates/0019 dry-run bools/runbooks/validators + syntax), T08 (keycape_openbao_client_deployed() live via verify script + or into build_gates + subcmd/make; "prove itself through same validations UI shows"), live console status (S6, T08 gate "done (computed via verify...)", action #17 validate-keycape-client, 0019 dry-run actions), .local/metadata (platform_reopened, cleanup_complete, audit_core_bootstrap_risk_accepted:true + review 2026-07-02, many oidc/openbao flags true), /tmp onboarding evidence (validated OK with 12+ exact bools: actor_class=user, no_secret_material_recorded, lldap_identity_verified, keycape_oidc_claims_verified, effective_access_summary, lock_offboard_result clean, prevents root etc.), assessment.md (full 7 gaps: #1 missing adapters biggest, bootstrap users vs UE, claims drift, membership, governance, audit correlation, etc.; no intent conflicts; recs for 0018 to classify + drive integration tests), boundary contract, creds-init skill (automated SOPS/age/k8s + --dry-run + human emergency bundle gate), 0019 orchestrator (dry-run-nonroot-user.sh: /tmp+trap, k8s-fallback, --test, claims, lock/offboard, cleanup-only, evidence populate+validate), Makefile (security-bootstrap-onboarding-dry-run + validate-* + console-test + scripts-syntax + validate-keycape-client all first-class; bootstrap target lists them), T08 verify script.
+
+Created docs/security-bootstrap-rebuild-risk-and-rehearsal.md (exec summary with live posture from T09 exercise of validators/tests; full risk table ~12 areas classified by likelihood/impact/detection/mitigation/remaining-human/priority — UE adapters HIGH #1, scratch state loss HIGH, claims path HIGH until fixed, cluster rehearsal unexercised HIGH, audit correlation HIGH, etc.; detailed non-destructive rehearsal plan: 1. scripted local dry-run (creds-init --dry + make security-bootstrap-onboarding-dry-run + all validate-* + T07 pytest + console status prove + /tmp evidence), 2. ns-isolated/k8s-fallback (orchestrator k8s extract + scoped; isolated restore drill), 3. parallel cluster (full guide + S6 re-proof when avail), 4. live scratch only post-rehearsal + approval (non-goal); rollback via 0019 cleanup/GraphQL; prove criteria (validators 0, tests green, evidence bools, no taint, progress events); current coverage gaps documented; recs: prioritize adapters per assessment, extend T08 validators, drive future dry-run UE exercise from 0018/0019, schedule T09 drills, use as rebuild bible + 0017 handoff gate). 
+
+Updated T09 in workplan to done + this note (refs new doc + all cross + pragmatic infra used for T09 itself + that 9/9 closes 0018). No destructive actions. All review via tools + direct execution of tests/validate (passed). File-first per ADR-001.
+
+Pragmatic: this note + will POST /progress/ (task_id a9e60fd5-... + workstream), git commit, make fix-consistency (expect brief 9/9, C-10 etc.), verify. T09 fulfills the "assess resulting + define rehearsal" + T03 rec + smooth-guide callout + assessment recs. Brief will show 9/9 once synced.
+
 ## Acceptance Criteria

 - `NET-WP-0015` is closed, archived, or explicitly reconciled with remaining