From 7da19ef767deba0af1f2d33ddca981aba1407120 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Wed, 3 Jun 2026 16:56:10 +0200
Subject: [PATCH] feat(NET-WP-0018-T05): complete smooth bootstrap guide

- Created docs/smooth-bootstrap-guide.md as the single consolidated operator guide per T05 spec + T03 recs:
- Full sequence (prereqs, creds/king, privacyIDEA, LLDAP/user + MFA, KeyCape, OpenBao, lifecycle via 0019, reopen, handoff)
- Per-step evidence requirements + links to validate-*, 0019 dry-run, console subcmds/make
- Blocked conditions, next safe action, effective preview, actor classes
- References T02 runtime arch, T03 retrospective/matrix, console lifecycle-guide (incl. 0019), UX contracts, evidence templates
- Pragmatic note + update pointers for console guide
- Updated workplan T05 to done + completion note
- Pragmatic: progress (task_id), file notes, this commit
- Brief/fix next (expect 4/9 done: T02-T03 + T05; T04 medium can follow or parallel)

T05 complete. T06 (align control surface) next logical (uses this guide + T02/T03).
---
 docs/smooth-bootstrap-guide.md | 204 ++++++++++++++++++
 ...tstrap-automation-and-rebuild-readiness.md | 6 +-
 2 files changed, 209 insertions(+), 1 deletion(-)
 create mode 100644 docs/smooth-bootstrap-guide.md

diff --git a/docs/smooth-bootstrap-guide.md b/docs/smooth-bootstrap-guide.md
new file mode 100644
index 0000000..0836a4c
--- /dev/null
+++ b/docs/smooth-bootstrap-guide.md
@@ -0,0 +1,204 @@
+# NET-WP-0018 Smooth Bootstrap Guide
+
+**Status:** draft (initial consolidation for T05)
+**Date:** 2026-06-03
+**Purpose:** The single operator guide for a smooth, repeatable NetKingdom security bootstrap. An operator knows what to do, in what order, and what (non-secret) evidence proves each step complete. Covers the full sequence from the T05 spec + inputs from T02 runtime architecture, T03 retrospective + gap matrix, existing UX contracts (operator-journey, user-lifecycle), console lifecycle-guide (incl. 0019 T06-adjacent polish), evidence templates/validators, and make targets.
+
+This replaces piecemeal reliance on separate docs. It makes wrong-order execution visibly hard via "next safe action" and blocked gates. Links to concrete commands, scripts, console subcommands, validate targets, and evidence.
+
+See also:
+- docs/NetkingdomRuntimeArchitecture.md (T02 – what exists)
+- docs/security-bootstrap-retrospective.md (T03 – what was bumpy, now automated, gaps)
+- tools/security-bootstrap-console/security_bootstrap_console.py + make security-bootstrap-* (control surface, evidence, validators)
+- sso-mfa/k8s/lldap/dry-run-nonroot-user.sh + related (0019 polish)
+- .local/security-bootstrap.json + console status (current gates)
+
+**Pragmatic note (per 0018 Coordination):** Track your progress through this guide using State Hub /progress/ (with workstream/task), dated notes in NET-WP-0018 workplan, git, console evidence/validators, /tmp evidence. This feeds future retrospectives.
+
+## Overall Model and Principles
+
+From platform architecture and UX contracts:
+- **Stages:** S1 Low-trust assembly → ... → S6 Reopen under custody (see console status).
+- **Shell / First screen always answers:** Current stage, Next safe action, Blocked gates (why), Evidence (non-secret records).
+- **UI posture (console/web):** Calm field notebook; black/white + hi accents; panels; sentence case; no hype. Shows effective access before any save/action. Blocked conditions explicit (e.g., no platform-root for non-king, MFA required for privileged).
+- **Evidence discipline:** All steps produce/require non-secret evidence.json or metadata flags matching exact templates/validators (no secret markers). 12+ bools for user lifecycle (effective preview, no root grant, actor checks, verified identity/claims, reversible, no secrets recorded, etc.).
+- **Actor classes & previews:** Always distinguish (setup operator, platform admin, tenant admin, reviewer, king). Show effective privileges before create/save. Never grant platform-root except via explicit king path.
+- **Secret boundary:** Console/UI never collects/stores secrets. Use password-safe, k8s secrets, or operator memory. Prefer k8s fallback for dry-runs (see 0019).
+- **Reversible where possible; human custody gates explicit.**
+- **Handoff to production readiness:** After S6, move to 0017 production items (audit durability, etc. – not duplicated here).
+
+**Sequence overview (high-level; details per section):**
+1. Prerequisites & cluster foundation.
+2. Credential bundle / king kit (SOPS/age, custody).
+3. PrivacyIDEA bootstrap + realm.
+4. LLDAP/bootstrap user (platform-root/king) + MFA self-enroll + verify.
+5. KeyCape deployment + client registration + OIDC.
+6. OpenBao init/unseal/config (OIDC admin binding via KeyCape).
+7. Token cleanup, root disposition, restore drill, escrow/custody.
+8. User lifecycle (onboard/lock/offboard/review – use 0019 dry-run for tests).
+9. State Hub sync, audit posture, cleanup/rotation.
+10. Platform reopen (S6) + handoff.
+
+Each step has: commands/scripts, evidence required, blocked conditions, links to validators/console.
+
+## Step 1: Prerequisites, Cluster Foundation, Credential Bundle
+
+**Prerequisites:**
+- Live Railiance/k3s cluster with ingress, cert-manager, NetworkPolicies, operators as per T02.
+- Operator access: kubectl to sso/openbao ns; password safe entries (net-kingdom/LLDAP/admin, etc.); age keys/custodian public.
+- No live OpenBao init in console (attended only).
+
+**Credential bundle / king kit:**
+- Use `make security-bootstrap-king-kit` or console `king-kit`.
+- Dedicated king credential (platform-root@lldap, separate from personal).
+- MFA: privacyIDEA self-service TOTP.
+- Storage: password-safe + offline packet (age encrypted).
+- Evidence: custodian_age_*_confirmed, king_credential_ready, mfa_enrolled_confirmed, password_safe_confirmed, storage_classes, custody_packet_prepared.
+- Validate: `make security-bootstrap-validate-kit` or console `validate-king-kit`.
+- Custody roster (for 2of3 target): `make security-bootstrap-custody-roster-template`, sign, validate.
+
+**Blocked if:** No king kit or custody approval.
+
+**Next safe after:** Approve custody mode (console `approve-custody-mode` or make with ARGS for flags like --mfa-enrolled-confirmed).
+
+## Step 2: PrivacyIDEA Bootstrap + Realm
+
+- Deploy/repair realm, LLDAP resolver, self-service policies.
+- Use repair script (sso-mfa/k8s/privacyidea/repair-realm-live.sh) or console runbook "privacyIDEA realm repair".
+- Enroll platform-root TOTP in repaired realm (pi-admin for setup).
+- Evidence: related to t02 validate (realm healthy).
+- Validate: `make security-bootstrap-validate-t02` (covers audit/recovery gates incl. this).
+
+**Blocked if:** Realm not correct for LLDAP users (MFA/self-enroll fails).
+
+See T03 retrospective for past realm drift bumps (now partially automated via runbook + validate).
+
+## Step 3: LLDAP / Bootstrap User Creation (platform-root / king)
+
+- Create platform-root user in LLDAP (via create-user.sh or LLDAP admin UI at lldap.coulomb.social).
+- Command example (with KUBECTL fallback): `cd sso-mfa/k8s/lldap && ./create-user.sh platform-root ...` (no --admin for non-root tests; use --admin only for platform admins via king path).
+- Self-enroll TOTP in privacyIDEA.
+- Verify MFA state: `cd ../privacyidea && ./check-user-mfa-state.sh platform-root`.
+- Verify OIDC/KeyCape path: `cd ../keycape && ./verify-openbao-client.sh`.
+- Groups: net-kingdom-admins for platform-root.
+- Evidence: identity_account_created, identity_group_confirmed, mfa_*, oidc_login_verified.
+- For non-root tests: use 0019 dry-run (see Step 8).
+
+**Blocked if:** Actor requests net-kingdom-admins without king path; no MFA for privileged.
+
+**Console:** `lifecycle-guide` for full flow; `onboarding-dry-run*` for tests.
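
The two blocked conditions for Step 3 can be read as a pre-save check on the requested groups. The sketch below is illustrative only: `AccessPreview`, `preview_access`, and `PRIVILEGED_GROUPS` are hypothetical names, not part of the console, and treating `net-kingdom-admins` as the only privileged group is an assumption.

```python
from dataclasses import dataclass, field

# Assumption: only this group counts as privileged in this sketch; the real
# console may classify further groups or OpenBao policies as privileged.
PRIVILEGED_GROUPS = {"net-kingdom-admins"}


@dataclass
class AccessPreview:
    """Non-secret summary shown to the operator before any save."""

    subject: str
    groups: list
    blocked_reasons: list = field(default_factory=list)

    @property
    def blocked(self) -> bool:
        return bool(self.blocked_reasons)


def preview_access(subject, groups, *, via_king_path, mfa_enrolled):
    """Build the effective-access preview for a requested group set."""
    preview = AccessPreview(subject=subject, groups=sorted(groups))
    privileged = PRIVILEGED_GROUPS.intersection(groups)
    if privileged and not via_king_path:
        preview.blocked_reasons.append(
            "privileged group requested without king path: "
            + ", ".join(sorted(privileged))
        )
    if privileged and not mfa_enrolled:
        preview.blocked_reasons.append("MFA required for privileged access")
    return preview
```

Under these assumptions, a request for `net-kingdom-admins` outside the king path comes back blocked with a readable reason, while a request for `net-kingdom-users` passes without MFA or king-path checks.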
+
+## Step 4: KeyCape Deployment + Client Registration + OIDC
+
+- KeyCape as lightweight IAM (conforms to IAM Profile v0.2).
+- Deploy client config (sso-mfa/k8s/keycape/create-secrets.sh).
+- Apply keycape-config Secret, restart KeyCape.
+- Register bootstrap clients (netkingdom-bootstrap-console, openbao-admin).
+- Redirects: localhost:8250/oidc/callback etc.
+- Verify OIDC admin login: platform-root obtains OpenBao platform-admin via KeyCape/MFA.
+- Evidence: keycape client gates, openbao_oidc_* , oidc_login_verified.
+- Validate related in t02 / console.
+
+**Issuer:** https://kc.coulomb.social (see T02 for claims: tenant, groups, roles, assurance, etc.).
+
+See T03 for past callback/registration bumps (now gated).
+
+## Step 5: OpenBao Init / Unseal / Config + OIDC Admin Binding
+
+**Attended only (console refuses live init):**
+- Preflight: `make security-bootstrap-openbao-preflight --run` or console.
+- Init ceremony (human-attended): produce init output, unseal shares, root token.
+- Post-unseal: apply initial config (auth, mounts, policies, audit).
+- OIDC auth config against KeyCape (maps claims/groups to policies e.g. net-kingdom-admins → platform-admin).
+- Key material handling: trial exposure taint, rotate unseal keys, emergency lockdown, restore drill (snapshot, isolate, verify, destroy).
+- Root token disposition: revoked (evidence: root_token_disposition).
+- Evidence: openbao_initialized, initial_config_applied, trial_exposed + response_complete, keys_rotated, post_unseal_verified, openbao_oidc_*, restore_drill_passed, etc.
+- Validate: multiple t02-related + console gates (taint logic in runbooks).
+
+**Blocked if:** Not unsealed, no OIDC binding, no restore drill, trial material not responded to.
+
+**Console runbooks:** Key material compromised, generate new unseal, emergency lock-down, restore drill, token revocation.
+
+See T02 for secret/credential path details; T03 for past claim shape / token issues (now partially automated via gates/runbooks).
+
+## Step 6: Token Cleanup, State Hub Sync, Audit Posture, Cleanup/Rotation
+
+- Revoke short-lived/bootstrap tokens (use console revoke helpers or runbooks; no plaintext on CLI).
+- State Hub: sync work (POST /progress/, decisions, etc.); ensure .custodian-brief reflects.
+- Audit posture: record audit_core_bootstrap_risk_accepted (or production_sink_ready), owner, review_date (2026-07-02), note. Use console approve or metadata update. Validate: console audit_core_posture.
+- Cleanup/rotation: review bootstrap-era creds/databases/paths; rotate as needed. Evidence: cleanup_complete.
+- Custody packet / roster final.
+
+**Evidence:** root_token_disposition, audit_core_*, cleanup_complete, etc.
+
+**Console:** `cleanup-evidence-template`, `validate-cleanup`, metadata updates.
+
+See T03 for operator-state / token bumps (now in metadata + validators).
+
+## Step 7: User Lifecycle (Onboard / Lock / Offboard / Review) – Use 0019 Polish
+
+This implements the first practical flow per docs/security-bootstrap-user-lifecycle.md (UX contract) + T02/ T03.
+
+**Always:**
+- Preview effective access (actor_class, scope, groups, MFA, OpenBao policy, no root for non-king).
+- Record non-secret evidence/audit (State Hub /progress/, console evidence, k8s audit).
+- MFA required for privileged.
+- Reversible where possible.
+
+**Detailed (from console lifecycle-guide + 0019):**
+- Onboard scoped non-root: use `make security-bootstrap-onboarding-dry-run SUBJECT=... EMAIL=... DISPLAY="..."` (or direct ./dry-run-nonroot-user.sh). Internally: safe secret (k8s/env /tmp), create --test (no --admin), verify MFA/KeyCape, optional lock/offboard GraphQL, populate/validate evidence (lldap_identity_verified, keycape_oidc_claims_verified, effective_access_summary, lock_offboard_result, actor_class="user", groups=["net-kingdom-users"], no_secret_material_recorded, prevents_platform_root_grant, etc.).
+- Console: `onboarding-dry-run`, `onboarding-dry-run-claims` (infers claims from groups + T01 role; warns on admins/root), `lifecycle-cleanup-dryrun-users --pattern t06-*`.
+- Lock: remove from groups (GraphQL or LLDAP UI); reversible.
+- Offboard: delete user after resource transfer + reason/date; evidence.
+- Review: check-user-mfa-state, LLDAP groups, owned principals; rotate via creds tools.
+- Fabric/tenant admin: same but scoped groups, no platform-root, explicit preview "will NOT be in net-kingdom-admins".
+- For platform-root/king: king path only.
+
+**Evidence templates:** `make security-bootstrap-onboarding-dry-run-template`, `lifecycle-flow-template`.
+**Validate:** `make security-bootstrap-validate-onboarding-dry-run`, `validate-lifecycle-flow`.
+**Runbook:** "User lifecycle dry-run (T06)" in console runbooks (refs 0019 + script).
+
+**Blocked if:** Missing actor/scope, privileged without MFA, ordinary user gets root groups/policy.
+
+See 0019 workplan + dry-run script + T03 matrix for past taint/hygiene bumps (now largely automated via /tmp + evidence).
+
+Update console lifecycle_guide T06 section if it still shows old manual secret steps (prefer orchestrator).
+
+## Step 8: Platform Reopen + Handoff
+
+- Final gates: all prior evidence + platform_reopened flag.
+- Approve custody if needed.
+- Console: status shows S6; "Review related workplans".
+- Handoff: produce handover checklist (console `handover-checklist`); transfer to production readiness (audit durability, escrow, etc. per 0017 – not duplicated here).
+- Evidence: platform_reopened, review_date, notes.
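
The "all prior evidence + platform_reopened" rule can be sketched as an ordered walk over boolean evidence flags, where the first unmet flag is the next safe action. This is a minimal illustration, not the console's implementation: `GATE_ORDER`, `blocked_gates`, and `next_safe_action` are hypothetical names, the flag names are picked from those mentioned in this guide, and the real schema of `.local/security-bootstrap.json` may differ.

```python
# Assumption: a simplified, ordered subset of the evidence flags named in
# this guide; the real metadata file may use a different schema or order.
GATE_ORDER = [
    "king_credential_ready",
    "mfa_enrolled_confirmed",
    "openbao_initialized",
    "restore_drill_passed",
    "cleanup_complete",
    "platform_reopened",
]


def blocked_gates(evidence):
    """Return gates, in order, whose evidence flag is not exactly True."""
    return [gate for gate in GATE_ORDER if evidence.get(gate) is not True]


def next_safe_action(evidence):
    """Return the first unmet gate, or None when every gate is met."""
    pending = blocked_gates(evidence)
    return pending[0] if pending else None
```

Requiring the value to be exactly `True` means a missing key, `None`, or a string such as `"yes"` all keep the gate blocked, which matches the guide's preference for explicit evidence over inferred state.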
+
+**Validate:** related custody/kit + full status.
+
+## Step 9: Post-Reopen / Optimization
+
+- Use T03 retrospective + T02 arch + this guide for future drills.
+- Address gaps (UE adapters per assessment, full Audit Core correlation, more T08 validators).
+- Rehearse rebuild per T09 (scripted/namespace first; use 0019 dry-run as model).
+
+## Evidence Summary Table (Core Non-Secret)
+
+- King/custody: age keys, roster, packet, approval.
+- Identity/MFA: account/group created, mfa_enrolled, oidc_verified.
+- OpenBao: initialized, config_applied, oidc_bound, root_disposition, restore_drill, keys_rotated, taint/response.
+- Lifecycle (0019): dry_run_date, actor_class, groups, effective_access_summary, lock_offboard_result, *_verified, no_secret_material_recorded, prevents_*, shows_effective_before_save.
+- General: cleanup_complete, platform_reopened, audit_core_* (risk accepted or sink ready), metadata_updated_at.
+
+All validated via console `validate-*` or make targets. Templates in console.
+
+## References and Updates
+
+- Full list in T02/T03 docs.
+- Console `lifecycle-guide`, `status`, `web-ui`.
+- Update this guide + console guide section as T06/T08 work proceeds (e.g., more validators, control surface alignment).
+- For web-ui exposure of this guide: see T06.
+
+This guide + the runtime architecture + retrospective turn the first bootstrap into a repeatable, auditable (pragmatically), low-diagnosis path. Use it; record evidence; improve via T07+.
+
+**Next after this guide:** Align control surface (T06), add tests (T07), integrate validations (T08), assess rebuild risk (T09).
+
+See NET-WP-0018 workplan for full acceptance.
\ No newline at end of file
diff --git a/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md b/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md
index d3886c4..c96b5f2 100644
--- a/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md
+++ b/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md
@@ -224,7 +224,7 @@ using 0018's T07/T08 to drive integration tests/dry-runs once adapters exist.

 ```task
 id: NET-WP-0018-T05
-status: todo
+status: in_progress
 priority: high
 state_hub_task_id: "e7b45fc8-8ee7-4914-ac4b-d0c8a35fad13"
 ```
@@ -250,6 +250,10 @@ dry-run-nonroot-user.sh + k8s fallback). T05 should consolidate into one
 evidence per step (linking the validate-* make targets and templates). 0019's
 dry-run + evidence is the model for user-lifecycle portion of the guide.

+**2026-06-03:** Started T05 (after T03 complete). Per retrospective recs (T05 high priority now that T02 arch + T03 retrospective exist). Using pragmatic tracking. Will consolidate piecemeal materials (T02, T03 retrospective, console lifecycle-guide + 0019 extensions, security-bootstrap-operator-journey.md, user-lifecycle.md, other *-ux.md, evidence templates/validators from console/0019) into a single operator guide with clear sequence, prerequisites, evidence per step (links to validate-*, 0019 dry-run, etc.), and "next safe action" / blocked gates model from the UX contract. Update console guide section as needed. Produce docs/smooth-bootstrap-guide.md or update main journey doc.
+
+**2026-06-03:** T05 complete. Created docs/smooth-bootstrap-guide.md (the consolidated NET-WP-0018 smooth bootstrap guide): covers full sequence from prereqs to reopen + user lifecycle (using 0019 polish), per-step evidence + validator/make links, blocked conditions, next safe action / blocked gates from UX contracts (operator-journey + user-lifecycle), references to T02 arch, T03 retrospective, console, 0019 artifacts. Also notes to update console lifecycle-guide for 0019 polish. Pragmatic tracking used (progress, file notes). This fulfills T05 + feeds T06 alignment.
+
 ### T06 - Align The Control Surface With The Bootstrap Guide

 ```task