net-kingdom/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md

id: NET-WP-0018
type: workplan
title: Bootstrap Automation And Rebuild Readiness
domain: netkingdom
repo: net-kingdom
status: active
owner: codex
topic_slug: netkingdom
created: 2026-06-01
updated: 2026-06-01
depends_on: NET-WP-0015, NET-WP-0017
state_hub_workstream_id: "800f9f16-bc44-4bbf-a771-58a630a3b698"

NET-WP-0018 - Bootstrap Automation And Rebuild Readiness

Goal

Turn the first successful NetKingdom security bootstrap into a repeatable, well-bounded, highly automated setup path that can survive an infrastructure reset with minimal interactive diagnosis.

The first run proved that the stack can work: LLDAP, Authelia, privacyIDEA, KeyCape, OpenBao, the local bootstrap control surface, and State Hub now form a working identity and security bootstrap path. It also proved that the system is still too easy to derail: realm drift, callback bridging, LLDAP lookup assumptions, OpenBao claim shape, token expiry, and operator-state persistence all required interactive repair. This workplan converts those lessons into architecture documentation, bootstrap sequencing, validation coverage, UI automation, and a clear scratch-rebuild risk assessment.

Strategy

Proceed in layers:

  1. close or explicitly hand off the remaining NET-WP-0015 bootstrap gates;
  2. document the runtime architecture that now actually exists;
  3. write down the bootstrap retrospective and automation gaps;
  4. clarify repository boundaries so future fixes land in the right place;
  5. produce a sequence guide for a smooth rebuild;
  6. improve the control-surface UI so it follows that guide;
  7. add tests and validations for every guided bootstrap section; and
  8. assess the residual risk of rebuilding NetKingdom from scratch.

This is not a request to immediately destroy and rebuild the live stack. A scratch rebuild should come only after the guide, validations, and risk review say which interactions remain genuinely unavoidable.

Coordination Notes

  • Avoid duplicating NET-WP-0017: audit durability, escrow, user onboarding, and hardening remain there unless this workplan explicitly turns them into bootstrap-guide or validation work.
  • Keep the bootstrap UI a control surface, not a secret collector. It may run safe checks, generate commands, and store non-secret evidence, but it must not store passwords, OTP seeds, OpenBao tokens, unseal shares, or recovery codes.
  • Prefer validation helpers that are usable both by the UI and by CI or operator command lines.
  • Treat interactive prompts as an explicit design boundary: automate everything that can be automated safely, and document why each remaining human action is required.
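One way to enforce the "control surface, not a secret collector" rule is to route every evidence write through a guard. The sketch below makes several assumptions. The key fragments and the `store_evidence` helper are hypothetical, not an existing NetKingdom API. The guard also inspects only key names, so it complements, rather than replaces, review of what the UI actually collects.

```python
from typing import Any

# Hypothetical denylist: key-name fragments that suggest secret material.
# Matching is deliberately conservative, so a harmless key such as
# "token_revoked" is also rejected and must be renamed (e.g. "cleanup_done").
SECRET_KEY_FRAGMENTS = ("password", "otp", "seed", "token", "unseal", "recovery", "secret")


def find_secret_like_keys(record: Any, path: str = "") -> list[str]:
    """Return dotted paths of every key whose name looks secret-bearing."""
    hits: list[str] = []
    if isinstance(record, dict):
        for key, value in record.items():
            key_path = f"{path}.{key}" if path else str(key)
            if any(fragment in str(key).lower() for fragment in SECRET_KEY_FRAGMENTS):
                hits.append(key_path)
            hits.extend(find_secret_like_keys(value, key_path))
    elif isinstance(record, list):
        for index, item in enumerate(record):
            hits.extend(find_secret_like_keys(item, f"{path}[{index}]"))
    return hits


def store_evidence(record: dict[str, Any]) -> dict[str, Any]:
    """Gate before persisting: refuse any record with secret-like keys."""
    hits = find_secret_like_keys(record)
    if hits:
        raise ValueError(f"refusing to store secret-like fields: {', '.join(hits)}")
    # A real implementation would write the record to durable storage here.
    return record
```

The same function can run inside the UI and in a CI check over stored evidence files, which matches the preference for dual-use validation helpers.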

Tasks

T01 - Close Or Hand Off NET-WP-0015 Remaining Gates

id: NET-WP-0018-T01
status: done
priority: high
state_hub_task_id: "7ff22629-838b-41df-9feb-bb36c5d57cc1"

Review NET-WP-0015 now that platform-root can obtain OpenBao platform-admin through KeyCape/MFA. Close any gates that are truly complete, and explicitly move unfinished production-readiness work to NET-WP-0017 or this workplan when it no longer belongs in the bootstrap ceremony plan.

Done when NET-WP-0015 is either finished and ready to archive, or its remaining tasks have precise owners, target workplans, and non-duplicative acceptance criteria.

2026-06-01: Completed. NET-WP-0015 was scope-closed as finished after the OpenBao admin bridge was proven through KeyCape/MFA. Its remaining production-readiness concerns were reconciled into NET-WP-0017: T02 owns audit, restore, emergency drill evidence, and escrow; T03/T04 own bootstrap path retirement and credential reset/rotation; T07 owns final archive review. NET-WP-0018 now continues with architecture documentation, retrospective, guide, UI automation, validations, and rebuild-risk assessment.

T02 - Document The Runtime Architecture

id: NET-WP-0018-T02
status: todo
priority: high
state_hub_task_id: "121ee797-e3f5-4d3e-9baa-cfa8c92f8a66"

Create docs/NetkingdomRuntimeArchitecture.md documenting what now exists: identity stores, MFA realms, KeyCape OIDC flow, Authelia handoff, OpenBao OIDC admin path, bootstrap UI state, State Hub relation, live DNS/routes, trust boundaries, token flows, and operational assumptions.

The document should explain the working system as deployed, not an idealized future architecture. It should be specific enough to guide a scratch rebuild without requiring the operator to rediscover the same integration details.
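To keep the architecture document from silently dropping a required area, a small coverage check could run in CI. This is a minimal sketch that assumes each topic from the task text appears verbatim in the document, for example as a heading. The topic phrasing and the `check_architecture_doc` helper are illustrative.

```python
from pathlib import Path

# Topics copied from the T02 task description. Matching is a plain
# case-insensitive substring search, so each phrase is assumed to appear
# verbatim somewhere in the document (for example as a heading).
REQUIRED_TOPICS = (
    "identity stores",
    "MFA realms",
    "KeyCape OIDC flow",
    "Authelia handoff",
    "OpenBao OIDC admin path",
    "bootstrap UI state",
    "State Hub",
    "DNS",
    "routes",
    "trust boundaries",
    "token flows",
    "operational assumptions",
)


def missing_topics(text: str) -> list[str]:
    """Return the required topics that do not appear in the document text."""
    lowered = text.lower()
    return [topic for topic in REQUIRED_TOPICS if topic.lower() not in lowered]


def check_architecture_doc(path: Path = Path("docs/NetkingdomRuntimeArchitecture.md")) -> list[str]:
    """Read the document and report which required topics it omits."""
    return missing_topics(path.read_text(encoding="utf-8"))
```

A substring check only proves that a topic is mentioned, not that it is described accurately. It catches omissions cheaply and leaves accuracy to review.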

T03 - Produce A Bootstrap Retrospective And Automation Gap Matrix

id: NET-WP-0018-T03
status: todo
priority: high
state_hub_task_id: "1a3c4261-4133-4021-bd53-ea3dc77021a0"

Assess how the first bootstrap went. Capture each bump encountered, the root cause, how it was diagnosed, whether it is now automated, and what remains as a manual step or fragile assumption.

Recommended output: docs/security-bootstrap-retrospective.md with a gap matrix covering state persistence, privacyIDEA realm repair, KeyCape image delivery, OIDC callbacks, OpenBao claim mapping, token revocation, audit, escrow, and rebuild verification.
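If the gap matrix is kept as structured data and rendered into the retrospective, it can also be checked. This sketch assumes one row per area named above. The `GapRow` fields mirror the columns described in T03, and the rendering as a Markdown table is an illustrative choice.

```python
from dataclasses import dataclass

# Areas named in T03; the matrix is expected to have a row for each.
REQUIRED_AREAS = (
    "state persistence",
    "privacyIDEA realm repair",
    "KeyCape image delivery",
    "OIDC callbacks",
    "OpenBao claim mapping",
    "token revocation",
    "audit",
    "escrow",
    "rebuild verification",
)


@dataclass(frozen=True)
class GapRow:
    area: str
    root_cause: str
    diagnosis: str
    automated: bool
    remaining_manual_step: str = ""  # empty when nothing manual remains


def missing_areas(rows: list[GapRow]) -> list[str]:
    """Required areas that have no row yet."""
    covered = {row.area for row in rows}
    return [area for area in REQUIRED_AREAS if area not in covered]


def open_gaps(rows: list[GapRow]) -> list[str]:
    """Areas that are not automated or still carry a manual step."""
    return [row.area for row in rows if not row.automated or row.remaining_manual_step]


def render_gap_matrix(rows: list[GapRow]) -> str:
    """Render the rows as a Markdown table for the retrospective document."""
    lines = [
        "| Area | Root cause | Diagnosis | Automated | Remaining manual step |",
        "|---|---|---|---|---|",
    ]
    for row in rows:
        automated = "yes" if row.automated else "no"
        manual = row.remaining_manual_step or "-"
        lines.append(f"| {row.area} | {row.root_cause} | {row.diagnosis} | {automated} | {manual} |")
    return "\n".join(lines)
```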

T04 - Review Repository Intent And Scope Boundaries

id: NET-WP-0018-T04
status: todo
priority: medium
state_hub_task_id: "9c286579-b7bc-46ae-9789-801b2b27b26d"

Review INTENT.md, SCOPE.md, and equivalent boundary documents across the associated repositories involved in the bootstrap. At minimum consider net-kingdom, key-cape, railiance-platform, state-hub/custodian, and any repo that owns OpenBao deployment, image delivery, identity runtime, or bootstrap automation.

Update the boundary documents or create follow-up workplans where ownership is unclear. The result should answer: where should a bug fix live, where should a runbook live, where should validation live, and which repo owns live deployment state.

T05 - Create The Smooth Bootstrap Guide

id: NET-WP-0018-T05
status: todo
priority: high
state_hub_task_id: "e7b45fc8-8ee7-4914-ac4b-d0c8a35fad13"

Create or update the NetKingdom bootstrap guide so an operator knows what to do, in what order, and what evidence proves each step is complete.

The guide should cover prerequisites, credential bundle creation, cluster foundation checks, privacyIDEA bootstrap, LLDAP/bootstrap user creation, KeyCape deployment and client registration, OpenBao init/unseal/configuration, OIDC admin binding, token cleanup, State Hub sync, and handoff to production readiness.
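If the guide's steps are also kept as data, a lint can confirm three things: the order matches the sequence above, every step names its completion evidence, and every manual step states why a human is needed. The step slugs and the `GuideStep` shape below are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical step slugs mirroring the sequence listed in T05.
GUIDE_ORDER = (
    "prerequisites",
    "credential-bundle",
    "cluster-foundation",
    "privacyidea-bootstrap",
    "lldap-bootstrap-user",
    "keycape-deploy-and-client",
    "openbao-init-unseal-config",
    "oidc-admin-binding",
    "token-cleanup",
    "state-hub-sync",
    "production-handoff",
)


@dataclass(frozen=True)
class GuideStep:
    slug: str
    evidence: str            # what proves the step is complete
    manual: bool = False
    manual_reason: str = ""  # why a human must act; required when manual


def lint_guide(steps: list[GuideStep]) -> list[str]:
    """Return problems: wrong order, missing evidence, or unexplained manual steps."""
    problems: list[str] = []
    slugs = tuple(step.slug for step in steps)
    if slugs != GUIDE_ORDER:
        problems.append(f"steps {list(slugs)} do not match the expected order")
    for step in steps:
        if not step.evidence.strip():
            problems.append(f"{step.slug}: no completion evidence")
        if step.manual and not step.manual_reason.strip():
            problems.append(f"{step.slug}: manual step without a stated reason")
    return problems
```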

T06 - Align The Control Surface With The Bootstrap Guide

id: NET-WP-0018-T06
status: todo
priority: high
state_hub_task_id: "9bba26b3-b1be-4e58-a18b-a0533683d63b"

Review the local security bootstrap UI against the guide. Improve the automation grade where safe: replace passive checkboxes with safe validators, convert fragile copy-paste sequences into scripts, persist non-secret progress durably, expose repair routines for known drift cases, and keep manual steps clear when human custody or secret handling is required.

Done when the UI guides the same sequence as the bootstrap guide and makes wrong-order execution visibly hard.
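One way to make wrong-order execution visibly hard is to refuse to mark a step complete while earlier steps are unfinished. This sketch assumes hypothetical step slugs and treats the guide as strictly linear. If some steps can safely run in parallel, an explicit dependency map would replace the tuple.

```python
# Hypothetical step slugs in guide order. The guide is treated as strictly
# linear here; a dependency map would be needed for parallel-safe steps.
ORDER = (
    "prerequisites",
    "credential-bundle",
    "cluster-foundation",
    "privacyidea-bootstrap",
    "lldap-bootstrap-user",
    "keycape-deploy-and-client",
    "openbao-init-unseal-config",
    "oidc-admin-binding",
    "token-cleanup",
    "state-hub-sync",
    "production-handoff",
)


class OutOfOrderError(RuntimeError):
    """Raised when a step is marked done before its predecessors."""


def blocking_steps(step: str, completed: set[str]) -> list[str]:
    """Earlier steps not yet complete (tuple.index raises ValueError for unknown slugs)."""
    index = ORDER.index(step)
    return [earlier for earlier in ORDER[:index] if earlier not in completed]


def mark_done(step: str, completed: set[str]) -> set[str]:
    """Return a new completed set, refusing wrong-order completion."""
    blockers = blocking_steps(step, completed)
    if blockers:
        raise OutOfOrderError(f"{step} is blocked by: {', '.join(blockers)}")
    return completed | {step}
```

The UI could call `blocking_steps` to show the blocked steps and why they are blocked, rather than silently hiding them.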

T07 - Add Automated Tests For Bootstrap UI Sections And Runbooks

id: NET-WP-0018-T07
status: todo
priority: high
state_hub_task_id: "c412d9e0-a2ca-4849-b6ee-bd4450b5a4a5"

For each task section and runbook exposed in the control surface, add automated tests that validate the implementation contract.

Use a layered approach:

  • static/unit tests for UI payload generation and command card presence;
  • shell/Python syntax tests for generated helper scripts;
  • dry-run or fixture tests for validators and state transitions; and
  • live-cluster checks gated behind explicit operator environment variables.

Done when every visible bootstrap section has at least one automated test that would fail if the section disappears, emits the wrong command, or reports an impossible state.
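The layered approach can be sketched with the standard `unittest` module. `build_sections` below is a hypothetical stand-in for the control surface's payload builder. The section ids, the script path, and the `NETKINGDOM_LIVE_CHECKS` variable are illustrative assumptions. Only `bao auth list` is a real OpenBao CLI command.

```python
import os
import shutil
import unittest


def build_sections() -> list[dict[str, str]]:
    """Hypothetical stand-in for the UI payload builder."""
    return [
        {"id": "openbao-oidc", "command": "bao auth list", "helper": "import sys\nsys.exit(0)\n"},
        {
            "id": "privacyidea-realm",
            "command": "./scripts/check-privacyidea-realm.sh",  # hypothetical script
            "helper": "print('realm check placeholder')\n",
        },
    ]


# Each expected section maps to the command prefix it must emit.
EXPECTED_COMMANDS = {
    "openbao-oidc": "bao auth list",
    "privacyidea-realm": "./scripts/check-privacyidea-realm.sh",
}

# Live checks run only when the operator opts in (hypothetical variable name).
LIVE = os.environ.get("NETKINGDOM_LIVE_CHECKS") == "1"


class BootstrapSectionTests(unittest.TestCase):
    def test_every_expected_section_is_present(self):
        ids = {section["id"] for section in build_sections()}
        self.assertEqual(set(EXPECTED_COMMANDS) - ids, set())

    def test_each_section_emits_the_expected_command(self):
        for section in build_sections():
            with self.subTest(section=section["id"]):
                expected = EXPECTED_COMMANDS[section["id"]]
                self.assertTrue(section["command"].startswith(expected))

    def test_generated_python_helpers_compile(self):
        for section in build_sections():
            with self.subTest(section=section["id"]):
                # compile() raises SyntaxError on a broken helper without running it.
                compile(section["helper"], f"<{section['id']}>", "exec")

    @unittest.skipUnless(LIVE, "set NETKINGDOM_LIVE_CHECKS=1 to run live-cluster checks")
    def test_live_prerequisites(self):
        # Placeholder for read-only probes against the live cluster.
        self.assertIsNotNone(shutil.which("kubectl"))
```

Run it with `python -m unittest`. The live test is skipped unless the operator sets the variable explicitly.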

Note (NET-WP-0019 polish): Include tests for the user-lifecycle dry-run (T06 from NET-WP-0017/NET-WP-0019): the orchestrator script, the onboarding-dry-run console command, claims verification (T05), the cleanup helper, and the evidence validators. See the NET-WP-0019 workplan and sso-mfa/k8s/lldap/dry-run-nonroot-user.sh. This note cross-links the T06-adjacent polish into the NET-WP-0018 automation goals.

See also docs/user-engine-netkingdom-integration-assessment.md for the broader intent and scope fit, the gaps (especially adapters), and recommendations.

T08 - Integrate Validations Into The UI State Model

id: NET-WP-0018-T08
status: todo
priority: high
state_hub_task_id: "32f05fb1-269c-421c-ae34-57d2ceb7e47a"

Make the current setup prove itself through the same validations the UI shows. Where possible, compute ok, fail, err, or nil from validators rather than relying only on manual confirmation.
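The four states can be derived mechanically from a validator's outcome. This sketch assumes a convention for validators: return True when the check passed, False when it ran and found a problem, and None when it could not decide. Raising an exception means the check itself broke. The gate names and the CLI are hypothetical. The CLI is included so CI and operators can run the same checks the UI shows.

```python
import argparse
import json
from enum import Enum
from typing import Callable, Optional

# A validator returns True (passed), False (ran and found a problem), or
# None (could not decide); raising means the check itself broke.
Validator = Callable[[], Optional[bool]]


class GateStatus(str, Enum):
    OK = "ok"
    FAIL = "fail"
    ERR = "err"
    NIL = "nil"


def evaluate(validator: Optional[Validator]) -> GateStatus:
    """Map one validator outcome onto the four UI states."""
    if validator is None:  # no automated check exists yet
        return GateStatus.NIL
    try:
        result = validator()
    except Exception:  # deliberately broad: any crash is reported as err, not fail
        return GateStatus.ERR
    if result is None:
        return GateStatus.NIL
    return GateStatus.OK if result else GateStatus.FAIL


# Hypothetical registry; real entries would query KeyCape, privacyIDEA, OpenBao, etc.
GATES: dict[str, Optional[Validator]] = {
    "keycape-client-config": lambda: True,  # placeholder validator
    "openbao-audit-status": None,           # not automated yet
}


def report(gates: dict[str, Optional[Validator]]) -> dict[str, str]:
    return {name: evaluate(check).value for name, check in gates.items()}


def main(argv: Optional[list[str]] = None) -> int:
    """CLI entry point so CI and operators run the same checks as the UI."""
    parser = argparse.ArgumentParser(description="Run bootstrap gate validators.")
    parser.add_argument("--gate", action="append", choices=sorted(GATES),
                        help="limit the run to one gate (repeatable)")
    args = parser.parse_args(argv)
    names = args.gate or list(GATES)
    results = report({name: GATES[name] for name in names})
    print(json.dumps(results, indent=2, sort_keys=True))
    # Only an all-ok run exits 0; nil counts as not yet proven.
    return 0 if all(status == GateStatus.OK.value for status in results.values()) else 1
```

Keeping `err` separate from `fail` lets the UI tell "the check could not run" apart from "the check ran and the setup is wrong". Those two outcomes usually call for different repairs. A script would typically end with `raise SystemExit(main())` under a `__main__` guard.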

Important targets include KeyCape client config, privacyIDEA realm/resolver, LLDAP user/group membership, Authelia/KeyCape route health, OpenBao OIDC auth config, token policy proof, audit status, restore evidence, and State Hub sync.
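As an example of one of these targets, a validator for the OpenBao OIDC auth config could compare a role's settings against what the bootstrap expects. This sketch makes two assumptions. First, OpenBao keeps Vault's JWT/OIDC role fields (`allowed_redirect_uris`, `user_claim`, `groups_claim`, `token_policies`). Second, `bao read -format=json` wraps the role in a `data` key. The mount path, role name, and expected claim names are hypothetical. The `platform-admin` policy name used in practice would need to match the live policy.

```python
import json
import subprocess


def read_oidc_role(role: str, mount: str = "oidc") -> dict:
    """Read an OIDC auth role with the bao CLI (assumes an authenticated session)."""
    completed = subprocess.run(
        ["bao", "read", "-format=json", f"auth/{mount}/role/{role}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)["data"]


def role_problems(data: dict, *, callback: str, user_claim: str,
                  groups_claim: str, policy: str) -> list[str]:
    """Compare a role's config with the values the bootstrap expects."""
    problems: list[str] = []
    if callback not in (data.get("allowed_redirect_uris") or []):
        problems.append(f"callback {callback!r} missing from allowed_redirect_uris")
    if data.get("user_claim") != user_claim:
        problems.append(f"user_claim is {data.get('user_claim')!r}, expected {user_claim!r}")
    if data.get("groups_claim") != groups_claim:
        problems.append(f"groups_claim is {data.get('groups_claim')!r}, expected {groups_claim!r}")
    if policy not in (data.get("token_policies") or []):
        problems.append(f"policy {policy!r} missing from token_policies")
    return problems
```

Splitting the CLI read from the comparison keeps `role_problems` testable against fixtures without a live OpenBao.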

Done when the UI can distinguish success, failure, error, and unknown states for the critical bootstrap gates and the live setup satisfies those checks.

T09 - Assess Scratch-Rebuild Risk And Define A Rehearsal Plan

id: NET-WP-0018-T09
status: todo
priority: high
state_hub_task_id: "a9e60fd5-fac6-46e9-bc63-b2979cca548e"

Review the resulting architecture, guide, automation, tests, and live validation coverage. Produce a risk assessment for restarting the NetKingdom infrastructure from scratch.

The assessment should classify each risk by likelihood, impact, detection method, mitigation, and remaining human interaction. It should also recommend whether the next rebuild should be a full teardown, an isolated parallel cluster rehearsal, a namespace-level rehearsal, or a scripted dry run.
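The risk classification can be captured as records with one field per dimension named above. This sketch assumes a three-point likelihood and impact scale and a simple likelihood × impact score for ordering. The assessment may well choose a different scale or weighting.

```python
from dataclasses import dataclass

# Assumed three-point scale; the assessment may choose a different one.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class RebuildRisk:
    name: str
    likelihood: str
    impact: str
    detection: str
    mitigation: str
    human_interaction: str = ""  # remaining manual step, empty if none

    def __post_init__(self) -> None:
        for field_name in ("likelihood", "impact"):
            if getattr(self, field_name) not in LEVELS:
                raise ValueError(f"{self.name}: {field_name} must be one of {sorted(LEVELS)}")

    @property
    def score(self) -> int:
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def prioritized(risks: list[RebuildRisk]) -> list[RebuildRisk]:
    """Highest likelihood x impact first; ties broken by name for stable output."""
    return sorted(risks, key=lambda risk: (-risk.score, risk.name))


def incomplete(risks: list[RebuildRisk]) -> list[str]:
    """Risks that still lack a detection method or a mitigation."""
    return [r.name for r in risks if not r.detection.strip() or not r.mitigation.strip()]


def still_manual(risks: list[RebuildRisk]) -> list[str]:
    """Risks whose handling still requires a human action."""
    return [r.name for r in risks if r.human_interaction.strip()]
```

The output of `incomplete` and `still_manual` feeds directly into the rehearsal recommendation. The more high-score risks that remain undetected or manual, the stronger the case for an isolated rehearsal over a full teardown.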

Acceptance Criteria

  • NET-WP-0015 is closed, archived, or explicitly reconciled with remaining work owned elsewhere.
  • docs/NetkingdomRuntimeArchitecture.md documents the real deployed runtime.
  • A bootstrap retrospective and automation gap matrix exists.
  • Associated repository boundaries are reviewed and updated or tracked with follow-up work.
  • A smooth bootstrap guide describes the intended sequence and evidence.
  • The control surface follows the guide and uses safe automation wherever appropriate.
  • Every bootstrap UI section and runbook has automated coverage.
  • The live setup passes the integrated validations or reports actionable failures.
  • A scratch-rebuild risk assessment recommends the next rehearsal strategy.

Non-Goals

  • Do not perform a destructive live rebuild as part of this workplan.
  • Do not move secret material into Git, State Hub, or the bootstrap UI.
  • Do not hide remaining human custody decisions behind automation.
  • Do not collapse repository ownership boundaries merely for convenience.