diff --git a/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md b/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md
new file mode 100644
index 0000000..c38a7a4
--- /dev/null
+++ b/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md
@@ -0,0 +1,260 @@
+---
+id: NET-WP-0018
+type: workplan
+title: "Bootstrap Automation And Rebuild Readiness"
+domain: netkingdom
+repo: net-kingdom
+status: ready
+owner: codex
+topic_slug: netkingdom
+created: "2026-06-01"
+updated: "2026-06-01"
+depends_on:
+  - NET-WP-0015
+  - NET-WP-0017
+state_hub_workstream_id: "800f9f16-bc44-4bbf-a771-58a630a3b698"
+---
+
+# NET-WP-0018 - Bootstrap Automation And Rebuild Readiness
+
+## Goal
+
+Turn the first successful NetKingdom security bootstrap into a repeatable,
+well-bounded, highly automated setup path that can survive an infrastructure
+reset with minimal interactive diagnosis.
+
+The first run proved that the stack can work: LLDAP, Authelia, privacyIDEA,
+KeyCape, OpenBao, the local bootstrap control surface, and State Hub now form a
+working identity and security bootstrap path. It also proved that the system is
+still too easy to derail: realm drift, callback bridging, LLDAP lookup
+assumptions, OpenBao claim shape, token expiry, and operator-state persistence
+all required interactive repair. This workplan converts those lessons into
+architecture documentation, bootstrap sequencing, validation coverage, UI
+automation, and a clear scratch-rebuild risk assessment.
+
+## Strategy
+
+Proceed in layers:
+
+1. close or explicitly hand off the remaining `NET-WP-0015` bootstrap gates;
+2. document the runtime architecture that now actually exists;
+3. write down the bootstrap retrospective and automation gaps;
+4. clarify repository boundaries so future fixes land in the right place;
+5. produce a sequence guide for a smooth rebuild;
+6. improve the control-surface UI so it follows that guide;
+7. add tests and validations for every guided bootstrap section; and
+8. assess the residual risk of rebuilding NetKingdom from scratch.
+
+This is not a request to immediately destroy and rebuild the live stack. A
+scratch rebuild should come only after the guide, validations, and risk review
+say which interactions remain genuinely unavoidable.
+
+## Coordination Notes
+
+- Avoid duplicating `NET-WP-0017`: audit durability, escrow, user onboarding,
+  and hardening remain there unless this workplan explicitly turns them into
+  bootstrap-guide or validation work.
+- Keep the bootstrap UI a control surface, not a secret collector. It may run
+  safe checks, generate commands, and store non-secret evidence, but it must not
+  store passwords, OTP seeds, OpenBao tokens, unseal shares, or recovery codes.
+- Prefer validation helpers that are usable both by the UI and by CI or
+  operator command lines.
+- Treat interactive prompts as an explicit design boundary: automate everything
+  that can be automated safely, and document why each remaining human action is
+  required.
+
+## Tasks
+
+### T01 - Close Or Hand Off NET-WP-0015 Remaining Gates
+
+```task
+id: NET-WP-0018-T01
+status: todo
+priority: high
+state_hub_task_id: "7ff22629-838b-41df-9feb-bb36c5d57cc1"
+```
+
+Review `NET-WP-0015` now that `platform-root` can obtain OpenBao
+`platform-admin` through KeyCape/MFA. Close any gates that are truly complete,
+and explicitly move unfinished production-readiness work to `NET-WP-0017` or
+this workplan when it no longer belongs in the bootstrap ceremony plan.
+
+Done when `NET-WP-0015` is either finished and ready to archive, or its
+remaining tasks have precise owners, target workplans, and non-duplicative
+acceptance criteria.
+
+### T02 - Document The Runtime Architecture
+
+```task
+id: NET-WP-0018-T02
+status: todo
+priority: high
+state_hub_task_id: "121ee797-e3f5-4d3e-9baa-cfa8c92f8a66"
+```
+
+Create `docs/NetkingdomRuntimeArchitecture.md` documenting what now exists:
+identity stores, MFA realms, KeyCape OIDC flow, Authelia handoff, OpenBao OIDC
+admin path, bootstrap UI state, State Hub relation, live DNS/routes, trust
+boundaries, token flows, and operational assumptions.
+
+The document should explain the working system as deployed, not an idealized
+future architecture. It should be specific enough to guide a scratch rebuild
+without requiring the operator to rediscover the same integration details.
+
+### T03 - Produce A Bootstrap Retrospective And Automation Gap Matrix
+
+```task
+id: NET-WP-0018-T03
+status: todo
+priority: high
+state_hub_task_id: "1a3c4261-4133-4021-bd53-ea3dc77021a0"
+```
+
+Assess how the first bootstrap went. Capture each bump encountered, the root
+cause, how it was diagnosed, whether it is now automated, and what remains as a
+manual step or fragile assumption.
+
+Recommended output: `docs/security-bootstrap-retrospective.md` with a gap
+matrix covering state persistence, privacyIDEA realm repair, KeyCape image
+delivery, OIDC callbacks, OpenBao claim mapping, token revocation, audit,
+escrow, and rebuild verification.
+
+### T04 - Review Repository Intent And Scope Boundaries
+
+```task
+id: NET-WP-0018-T04
+status: todo
+priority: medium
+state_hub_task_id: "9c286579-b7bc-46ae-9789-801b2b27b26d"
+```
+
+Review `INTENT.md`, `SCOPE.md`, and equivalent boundary documents across the
+associated repositories involved in the bootstrap. At minimum consider
+`net-kingdom`, `key-cape`, `railiance-platform`, `state-hub`/custodian, and any
+repo that owns OpenBao deployment, image delivery, identity runtime, or
+bootstrap automation.
+
+Update the boundary documents or create follow-up workplans where ownership is
+unclear. The result should answer: where should a bug fix live, where should a
+runbook live, where should validation live, and which repo owns live
+deployment state.
+
+### T05 - Create The Smooth Bootstrap Guide
+
+```task
+id: NET-WP-0018-T05
+status: todo
+priority: high
+state_hub_task_id: "e7b45fc8-8ee7-4914-ac4b-d0c8a35fad13"
+```
+
+Create or update the NetKingdom bootstrap guide so an operator knows what to
+do, in what order, and what evidence proves each step is complete.
+
+The guide should cover prerequisites, credential bundle creation, cluster
+foundation checks, privacyIDEA bootstrap, LLDAP/bootstrap user creation,
+KeyCape deployment and client registration, OpenBao init/unseal/configuration,
+OIDC admin binding, token cleanup, State Hub sync, and handoff to production
+readiness.
+
+### T06 - Align The Control Surface With The Bootstrap Guide
+
+```task
+id: NET-WP-0018-T06
+status: todo
+priority: high
+state_hub_task_id: "9bba26b3-b1be-4e58-a18b-a0533683d63b"
+```
+
+Review the local security bootstrap UI against the guide. Improve the
+automation grade where safe: replace passive checkboxes with safe validators,
+convert fragile copy-paste sequences into scripts, persist non-secret progress
+durably, expose repair routines for known drift cases, and keep manual steps
+clear when human custody or secret handling is required.
+
+Done when the UI guides the same sequence as the bootstrap guide and makes
+wrong-order execution visibly hard.
+
+### T07 - Add Automated Tests For Bootstrap UI Sections And Runbooks
+
+```task
+id: NET-WP-0018-T07
+status: todo
+priority: high
+state_hub_task_id: "c412d9e0-a2ca-4849-b6ee-bd4450b5a4a5"
+```
+
+For each task section and runbook exposed in the control surface, add automated
+tests that validate the implementation contract.
+
+Use a layered approach:
+
+- static/unit tests for UI payload generation and command card presence;
+- shell/Python syntax tests for generated helper scripts;
+- dry-run or fixture tests for validators and state transitions; and
+- live-cluster checks gated behind explicit operator environment variables.
+
+Done when every visible bootstrap section has at least one automated test that
+would fail if the section disappears, emits the wrong command, or reports an
+impossible state.
+
+### T08 - Integrate Validations Into The UI State Model
+
+```task
+id: NET-WP-0018-T08
+status: todo
+priority: high
+state_hub_task_id: "32f05fb1-269c-421c-ae34-57d2ceb7e47a"
+```
+
+Make the current setup prove itself through the same validations the UI shows.
+Where possible, compute `ok`, `fail`, `err`, or `nil` from validators rather
+than relying only on manual confirmation.
+
+Important targets include KeyCape client config, privacyIDEA realm/resolver,
+LLDAP user/group membership, Authelia/KeyCape route health, OpenBao OIDC auth
+config, token policy proof, audit status, restore evidence, and State Hub sync.
+
+Done when the UI can distinguish success, failure, error, and unknown states
+for the critical bootstrap gates and the live setup satisfies those checks.
+
+### T09 - Assess Scratch-Rebuild Risk And Define A Rehearsal Plan
+
+```task
+id: NET-WP-0018-T09
+status: todo
+priority: high
+state_hub_task_id: "a9e60fd5-fac6-46e9-bc63-b2979cca548e"
+```
+
+Review the resulting architecture, guide, automation, tests, and live
+validation coverage. Produce a risk assessment for restarting the NetKingdom
+infrastructure from scratch.
+
+The assessment should classify each risk by likelihood, impact, detection
+method, mitigation, and remaining human interaction. It should also recommend
+whether the next rebuild should be a full teardown, an isolated parallel
+cluster rehearsal, a namespace-level rehearsal, or a scripted dry run.
+
+## Acceptance Criteria
+
+- `NET-WP-0015` is closed, archived, or explicitly reconciled with remaining
+  work owned elsewhere.
+- `docs/NetkingdomRuntimeArchitecture.md` documents the real deployed runtime.
+- A bootstrap retrospective and automation gap matrix exists.
+- Associated repository boundaries are reviewed and updated or tracked with
+  follow-up work.
+- A smooth bootstrap guide describes the intended sequence and evidence.
+- The control surface follows the guide and uses safe automation wherever
+  appropriate.
+- Every bootstrap UI section and runbook has automated coverage.
+- The live setup passes the integrated validations or reports actionable
+  failures.
+- A scratch-rebuild risk assessment recommends the next rehearsal strategy.
+
+## Non-Goals
+
+- Do not perform a destructive live rebuild as part of this workplan.
+- Do not move secret material into Git, State Hub, or the bootstrap UI.
+- Do not hide remaining human custody decisions behind automation.
+- Do not collapse repository ownership boundaries merely for convenience.
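The four gate states named in T08 (`ok`, `fail`, `err`, `nil`) and the environment-variable gating of live checks from T07 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the control surface's actual implementation: `GateState`, `evaluate_gate`, and the `NETKINGDOM_LIVE_CHECKS` variable are hypothetical names, and treating a live check that the operator has not enabled as `nil` (unknown) is an assumed convention.

```python
import os
from enum import Enum
from typing import Callable


class GateState(Enum):
    """Hypothetical four-state result for a bootstrap gate."""
    OK = "ok"      # validator ran and the gate is satisfied
    FAIL = "fail"  # validator ran and the gate is not satisfied
    ERR = "err"    # validator could not complete (raised an exception)
    NIL = "nil"    # validator was not run, so the state is unknown


# Hypothetical opt-in variable; the workplan only says live checks are
# "gated behind explicit operator environment variables".
LIVE_CHECKS_ENV = "NETKINGDOM_LIVE_CHECKS"


def evaluate_gate(validator: Callable[[], bool], live: bool = False) -> GateState:
    """Map one validator run onto a GateState.

    A live-cluster validator is skipped (NIL) unless the operator has
    explicitly set LIVE_CHECKS_ENV to "1".
    """
    if live and os.environ.get(LIVE_CHECKS_ENV) != "1":
        return GateState.NIL
    try:
        return GateState.OK if validator() else GateState.FAIL
    except Exception:
        # The check itself broke, which is different from a clean failure.
        return GateState.ERR


if __name__ == "__main__":
    # Fixture-style validators standing in for real checks.
    print(evaluate_gate(lambda: True).value)
    print(evaluate_gate(lambda: False).value)
```

Keeping `err` separate from `fail` lets the UI tell "the gate is not met" apart from "the check could not run", and the same function can be called from the UI, CI, or an operator command line.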