---
id: NET-WP-0018
type: workplan
title: "Bootstrap Automation And Rebuild Readiness"
domain: netkingdom
repo: net-kingdom
status: active
owner: codex
topic_slug: netkingdom
created: "2026-06-01"
updated: "2026-06-01"
depends_on:
  - NET-WP-0015
  - NET-WP-0017
state_hub_workstream_id: "800f9f16-bc44-4bbf-a771-58a630a3b698"
---

# NET-WP-0018 - Bootstrap Automation And Rebuild Readiness

## Goal

Turn the first successful NetKingdom security bootstrap into a repeatable, well-bounded, highly automated setup path that can survive an infrastructure reset with minimal interactive diagnosis.

The first run proved that the stack can work: LLDAP, Authelia, privacyIDEA, KeyCape, OpenBao, the local bootstrap control surface, and State Hub now form a working identity and security bootstrap path. It also proved that the system is still too easy to derail: realm drift, callback bridging, LLDAP lookup assumptions, OpenBao claim shape, token expiry, and operator-state persistence all required interactive repair.

This workplan converts those lessons into architecture documentation, bootstrap sequencing, validation coverage, UI automation, and a clear scratch-rebuild risk assessment.

## Strategy

Proceed in layers:

1. close or explicitly hand off the remaining `NET-WP-0015` bootstrap gates;
2. document the runtime architecture that now actually exists;
3. write down the bootstrap retrospective and automation gaps;
4. clarify repository boundaries so future fixes land in the right place;
5. produce a sequence guide for a smooth rebuild;
6. improve the control-surface UI so it follows that guide;
7. add tests and validations for every guided bootstrap section; and
8. assess the residual risk of rebuilding NetKingdom from scratch.

This is not a request to immediately destroy and rebuild the live stack. A scratch rebuild should come only after the guide, validations, and risk review say which interactions remain genuinely unavoidable.
## Coordination Notes

- Avoid duplicating `NET-WP-0017`: audit durability, escrow, user onboarding, and hardening remain there unless this workplan explicitly turns them into bootstrap-guide or validation work.
- Keep the bootstrap UI a control surface, not a secret collector. It may run safe checks, generate commands, and store non-secret evidence, but it must not store passwords, OTP seeds, OpenBao tokens, unseal shares, or recovery codes.
- Prefer validation helpers that are usable both by the UI and by CI or operator command lines.
- Treat interactive prompts as an explicit design boundary: automate everything that can be automated safely, and document why each remaining human action is required.

## Tasks

### T01 - Close Or Hand Off NET-WP-0015 Remaining Gates

```task
id: NET-WP-0018-T01
status: done
priority: high
state_hub_task_id: "7ff22629-838b-41df-9feb-bb36c5d57cc1"
```

Review `NET-WP-0015` now that `platform-root` can obtain OpenBao `platform-admin` through KeyCape/MFA. Close any gates that are truly complete, and explicitly move unfinished production-readiness work to `NET-WP-0017` or this workplan when it no longer belongs in the bootstrap ceremony plan.

Done when `NET-WP-0015` is either finished and ready to archive, or its remaining tasks have precise owners, target workplans, and non-duplicative acceptance criteria.

**2026-06-01:** Completed. `NET-WP-0015` was scope-closed as finished after the OpenBao admin bridge was proven through KeyCape/MFA. Its remaining production-readiness concerns were reconciled into `NET-WP-0017`: T02 owns audit, restore, emergency drill evidence, and escrow; T03/T04 own bootstrap path retirement and credential reset/rotation; T07 owns final archive review. `NET-WP-0018` now continues with architecture documentation, retrospective, guide, UI automation, validations, and rebuild-risk assessment.
### T02 - Document The Runtime Architecture

```task
id: NET-WP-0018-T02
status: todo
priority: high
state_hub_task_id: "121ee797-e3f5-4d3e-9baa-cfa8c92f8a66"
```

Create `docs/NetkingdomRuntimeArchitecture.md` documenting what now exists: identity stores, MFA realms, KeyCape OIDC flow, Authelia handoff, OpenBao OIDC admin path, bootstrap UI state, State Hub relation, live DNS/routes, trust boundaries, token flows, and operational assumptions.

The document should explain the working system as deployed, not an idealized future architecture. It should be specific enough to guide a scratch rebuild without requiring the operator to rediscover the same integration details.

### T03 - Produce A Bootstrap Retrospective And Automation Gap Matrix

```task
id: NET-WP-0018-T03
status: todo
priority: high
state_hub_task_id: "1a3c4261-4133-4021-bd53-ea3dc77021a0"
```

Assess how the first bootstrap went. Capture each bump encountered, the root cause, how it was diagnosed, whether it is now automated, and what remains as a manual step or fragile assumption.

Recommended output: `docs/security-bootstrap-retrospective.md` with a gap matrix covering state persistence, privacyIDEA realm repair, KeyCape image delivery, OIDC callbacks, OpenBao claim mapping, token revocation, audit, escrow, and rebuild verification.

### T04 - Review Repository Intent And Scope Boundaries

```task
id: NET-WP-0018-T04
status: todo
priority: medium
state_hub_task_id: "9c286579-b7bc-46ae-9789-801b2b27b26d"
```

Review `INTENT.md`, `SCOPE.md`, and equivalent boundary documents across the associated repositories involved in the bootstrap. At minimum consider `net-kingdom`, `key-cape`, `railiance-platform`, `state-hub`/custodian, and any repo that owns OpenBao deployment, image delivery, identity runtime, or bootstrap automation. Update the boundary documents or create follow-up workplans where ownership is unclear.
The result should answer: where should a bug fix live, where should a runbook live, where should validation live, and which repo owns live deployment state.

### T05 - Create The Smooth Bootstrap Guide

```task
id: NET-WP-0018-T05
status: todo
priority: high
state_hub_task_id: "e7b45fc8-8ee7-4914-ac4b-d0c8a35fad13"
```

Create or update the NetKingdom bootstrap guide so an operator knows what to do, in what order, and what evidence proves each step is complete.

The guide should cover prerequisites, credential bundle creation, cluster foundation checks, privacyIDEA bootstrap, LLDAP/bootstrap user creation, KeyCape deployment and client registration, OpenBao init/unseal/configuration, OIDC admin binding, token cleanup, State Hub sync, and handoff to production readiness.

### T06 - Align The Control Surface With The Bootstrap Guide

```task
id: NET-WP-0018-T06
status: todo
priority: high
state_hub_task_id: "9bba26b3-b1be-4e58-a18b-a0533683d63b"
```

Review the local security bootstrap UI against the guide. Improve the automation grade where safe: replace passive checkboxes with safe validators, convert fragile copy-paste sequences into scripts, persist non-secret progress durably, expose repair routines for known drift cases, and keep manual steps clear when human custody or secret handling is required.

Done when the UI guides the same sequence as the bootstrap guide and makes wrong-order execution visibly hard.

### T07 - Add Automated Tests For Bootstrap UI Sections And Runbooks

```task
id: NET-WP-0018-T07
status: todo
priority: high
state_hub_task_id: "c412d9e0-a2ca-4849-b6ee-bd4450b5a4a5"
```

For each task section and runbook exposed in the control surface, add automated tests that validate the implementation contract.
Use a layered approach:

- static/unit tests for UI payload generation and command card presence;
- shell/Python syntax tests for generated helper scripts;
- dry-run or fixture tests for validators and state transitions; and
- live-cluster checks gated behind explicit operator environment variables.

Done when every visible bootstrap section has at least one automated test that would fail if the section disappears, emits the wrong command, or reports an impossible state.

### T08 - Integrate Validations Into The UI State Model

```task
id: NET-WP-0018-T08
status: todo
priority: high
state_hub_task_id: "32f05fb1-269c-421c-ae34-57d2ceb7e47a"
```

Make the current setup prove itself through the same validations the UI shows. Where possible, compute `ok`, `fail`, `err`, or `nil` from validators rather than relying only on manual confirmation.

Important targets include KeyCape client config, privacyIDEA realm/resolver, LLDAP user/group membership, Authelia/KeyCape route health, OpenBao OIDC auth config, token policy proof, audit status, restore evidence, and State Hub sync.

Done when the UI can distinguish success, failure, error, and unknown states for the critical bootstrap gates and the live setup satisfies those checks.

### T09 - Assess Scratch-Rebuild Risk And Define A Rehearsal Plan

```task
id: NET-WP-0018-T09
status: todo
priority: high
state_hub_task_id: "a9e60fd5-fac6-46e9-bc63-b2979cca548e"
```

Review the resulting architecture, guide, automation, tests, and live validation coverage. Produce a risk assessment for restarting the NetKingdom infrastructure from scratch.

The assessment should classify each risk by likelihood, impact, detection method, mitigation, and remaining human interaction. It should also recommend whether the next rebuild should be a full teardown, an isolated parallel cluster rehearsal, a namespace-level rehearsal, or a scripted dry run.
## Acceptance Criteria

- `NET-WP-0015` is closed, archived, or explicitly reconciled with remaining work owned elsewhere.
- `docs/NetkingdomRuntimeArchitecture.md` documents the real deployed runtime.
- A bootstrap retrospective and automation gap matrix exists.
- Associated repository boundaries are reviewed and updated or tracked with follow-up work.
- A smooth bootstrap guide describes the intended sequence and evidence.
- The control surface follows the guide and uses safe automation wherever appropriate.
- Every bootstrap UI section and runbook has automated coverage.
- The live setup passes the integrated validations or reports actionable failures.
- A scratch-rebuild risk assessment recommends the next rehearsal strategy.

## Non-Goals

- Do not perform a destructive live rebuild as part of this workplan.
- Do not move secret material into Git, State Hub, or the bootstrap UI.
- Do not hide remaining human custody decisions behind automation.
- Do not collapse repository ownership boundaries merely for convenience.