Add bootstrap rebuild readiness workplan

2026-06-01 21:48:48 +02:00
parent 155507eeb7
commit 8382a11e8e

@@ -0,0 +1,260 @@
---
id: NET-WP-0018
type: workplan
title: "Bootstrap Automation And Rebuild Readiness"
domain: netkingdom
repo: net-kingdom
status: ready
owner: codex
topic_slug: netkingdom
created: "2026-06-01"
updated: "2026-06-01"
depends_on:
- NET-WP-0015
- NET-WP-0017
state_hub_workstream_id: "800f9f16-bc44-4bbf-a771-58a630a3b698"
---
# NET-WP-0018 - Bootstrap Automation And Rebuild Readiness
## Goal
Turn the first successful NetKingdom security bootstrap into a repeatable,
well-bounded, highly automated setup path that can survive an infrastructure
reset with minimal interactive diagnosis.

The first run proved that the stack can work: LLDAP, Authelia, privacyIDEA,
KeyCape, OpenBao, the local bootstrap control surface, and State Hub now form a
working identity and security bootstrap path. It also proved that the system is
still too easy to derail: realm drift, callback bridging, LLDAP lookup
assumptions, OpenBao claim shape, token expiry, and operator-state persistence
all required interactive repair. This workplan converts those lessons into
architecture documentation, bootstrap sequencing, validation coverage, UI
automation, and a clear scratch-rebuild risk assessment.
## Strategy
Proceed in layers:
1. close or explicitly hand off the remaining `NET-WP-0015` bootstrap gates;
2. document the runtime architecture that now actually exists;
3. write down the bootstrap retrospective and automation gaps;
4. clarify repository boundaries so future fixes land in the right place;
5. produce a sequence guide for a smooth rebuild;
6. improve the control-surface UI so it follows that guide;
7. add tests and validations for every guided bootstrap section; and
8. assess the residual risk of rebuilding NetKingdom from scratch.

This is not a request to immediately destroy and rebuild the live stack. A
scratch rebuild should come only after the guide, validations, and risk review
identify which interactions remain genuinely unavoidable.
## Coordination Notes
- Avoid duplicating `NET-WP-0017`: audit durability, escrow, user onboarding,
and hardening remain there unless this workplan explicitly turns them into
bootstrap-guide or validation work.
- Keep the bootstrap UI a control surface, not a secret collector. It may run
safe checks, generate commands, and store non-secret evidence, but it must not
store passwords, OTP seeds, OpenBao tokens, unseal shares, or recovery codes.
- Prefer validation helpers that are usable both by the UI and by CI or
operator command lines.
- Treat interactive prompts as an explicit design boundary: automate everything
that can be automated safely, and document why each remaining human action is
required.
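
The "control surface, not a secret collector" rule could be enforced at the point where the UI persists evidence. The following is a minimal sketch under stated assumptions: the function name, the evidence dictionary shape, and the deny-list fragments are all hypothetical and not taken from the existing bootstrap UI.

```python
# Hypothetical deny-list of key-name fragments; the real list would be owned
# by the bootstrap UI and reviewed against the secret types named above.
SECRET_KEY_FRAGMENTS = ("password", "otp_seed", "token", "unseal", "recovery_code")


def reject_secret_fields(evidence: dict) -> dict:
    """Return the evidence unchanged, or raise if any key looks secret-bearing.

    This is a name-based check only. It cannot detect a secret stored under an
    innocent-looking key, so it complements review rather than replacing it.
    """
    offending = sorted(
        key
        for key in evidence
        if any(fragment in key.lower() for fragment in SECRET_KEY_FRAGMENTS)
    )
    if offending:
        raise ValueError(f"refusing to persist secret-like fields: {offending}")
    return evidence
```

The check is deliberately conservative: a non-secret field whose name contains `token` would also be rejected, which pushes evidence names toward unambiguous wording.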
## Tasks
### T01 - Close Or Hand Off NET-WP-0015 Remaining Gates
```task
id: NET-WP-0018-T01
status: todo
priority: high
state_hub_task_id: "7ff22629-838b-41df-9feb-bb36c5d57cc1"
```
Review `NET-WP-0015` now that `platform-root` can obtain OpenBao
`platform-admin` through KeyCape/MFA. Close any gates that are truly complete,
and explicitly move unfinished production-readiness work to `NET-WP-0017` or
this workplan when it no longer belongs in the bootstrap ceremony plan.

Done when `NET-WP-0015` is either finished and ready to archive, or its
remaining tasks have precise owners, target workplans, and non-duplicative
acceptance criteria.
### T02 - Document The Runtime Architecture
```task
id: NET-WP-0018-T02
status: todo
priority: high
state_hub_task_id: "121ee797-e3f5-4d3e-9baa-cfa8c92f8a66"
```
Create `docs/NetkingdomRuntimeArchitecture.md` documenting what now exists:
identity stores, MFA realms, KeyCape OIDC flow, Authelia handoff, OpenBao OIDC
admin path, bootstrap UI state, State Hub relation, live DNS/routes, trust
boundaries, token flows, and operational assumptions.

The document should explain the working system as deployed, not an idealized
future architecture. It should be specific enough to guide a scratch rebuild
without requiring the operator to rediscover the same integration details.
### T03 - Produce A Bootstrap Retrospective And Automation Gap Matrix
```task
id: NET-WP-0018-T03
status: todo
priority: high
state_hub_task_id: "1a3c4261-4133-4021-bd53-ea3dc77021a0"
```
Assess how the first bootstrap went. For each problem encountered, capture the
root cause, how it was diagnosed, whether it is now automated, and what remains
a manual step or fragile assumption.

Recommended output: `docs/security-bootstrap-retrospective.md` with a gap
matrix covering state persistence, privacyIDEA realm repair, KeyCape image
delivery, OIDC callbacks, OpenBao claim mapping, token revocation, audit,
escrow, and rebuild verification.
### T04 - Review Repository Intent And Scope Boundaries
```task
id: NET-WP-0018-T04
status: todo
priority: medium
state_hub_task_id: "9c286579-b7bc-46ae-9789-801b2b27b26d"
```
Review `INTENT.md`, `SCOPE.md`, and equivalent boundary documents across the
associated repositories involved in the bootstrap. At minimum consider
`net-kingdom`, `key-cape`, `railiance-platform`, `state-hub`/custodian, and any
repo that owns OpenBao deployment, image delivery, identity runtime, or
bootstrap automation.

Update the boundary documents or create follow-up workplans where ownership is
unclear. The result should answer: where should a bug fix live, where should a
runbook live, where should validation live, and which repo owns live
deployment state.
### T05 - Create The Smooth Bootstrap Guide
```task
id: NET-WP-0018-T05
status: todo
priority: high
state_hub_task_id: "e7b45fc8-8ee7-4914-ac4b-d0c8a35fad13"
```
Create or update the NetKingdom bootstrap guide so an operator knows what to
do, in what order, and what evidence proves each step is complete.

The guide should cover prerequisites, credential bundle creation, cluster
foundation checks, privacyIDEA bootstrap, LLDAP/bootstrap user creation,
KeyCape deployment and client registration, OpenBao init/unseal/configuration,
OIDC admin binding, token cleanup, State Hub sync, and handoff to production
readiness.
### T06 - Align The Control Surface With The Bootstrap Guide
```task
id: NET-WP-0018-T06
status: todo
priority: high
state_hub_task_id: "9bba26b3-b1be-4e58-a18b-a0533683d63b"
```
Review the local security bootstrap UI against the guide. Raise the level of
automation where it is safe to do so: replace passive checkboxes with safe
validators, convert fragile copy-paste sequences into scripts, persist
non-secret progress durably, expose repair routines for known drift cases, and
keep manual steps clear when human custody or secret handling is required.

Done when the UI guides the same sequence as the bootstrap guide and makes
wrong-order execution visibly hard.
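
One way to make wrong-order execution visibly hard is to derive the runnable steps from declared prerequisites. The sketch below assumes a simple linear dependency chain; the step names and the edges between them are hypothetical illustrations loosely based on the guide topics in T05, not the confirmed bootstrap order.

```python
from typing import Dict, List, Set

# Hypothetical steps and prerequisites, for illustration only.
PREREQUISITES: Dict[str, List[str]] = {
    "privacyidea_bootstrap": [],
    "lldap_users": ["privacyidea_bootstrap"],
    "keycape_deploy": ["lldap_users"],
    "openbao_init": ["keycape_deploy"],
    "oidc_admin_binding": ["openbao_init"],
}


def runnable_steps(completed: Set[str]) -> List[str]:
    """Steps that are not yet done and whose prerequisites are all done."""
    return [
        step
        for step, needs in PREREQUISITES.items()
        if step not in completed and all(need in completed for need in needs)
    ]


def blocked_by(step: str, completed: Set[str]) -> List[str]:
    """Prerequisites still missing for a step; empty means it may run."""
    return [need for need in PREREQUISITES[step] if need not in completed]
```

A UI built this way can disable a section and show the output of `blocked_by` as the reason, instead of relying on the operator to remember the sequence.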
### T07 - Add Automated Tests For Bootstrap UI Sections And Runbooks
```task
id: NET-WP-0018-T07
status: todo
priority: high
state_hub_task_id: "c412d9e0-a2ca-4849-b6ee-bd4450b5a4a5"
```
For each task section and runbook exposed in the control surface, add automated
tests that validate the implementation contract.

Use a layered approach:
- static/unit tests for UI payload generation and command card presence;
- shell/Python syntax tests for generated helper scripts;
- dry-run or fixture tests for validators and state transitions; and
- live-cluster checks gated behind explicit operator environment variables.

Done when every visible bootstrap section has at least one automated test that
would fail if the section disappears, emits the wrong command, or reports an
impossible state.
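
The layered approach above might look roughly like this for the static layer and the gated live layer. This is a sketch only: `build_sections`, the section identifiers, the placeholder commands, and the `NETKINGDOM_LIVE_CHECKS` variable are assumptions made for illustration, not names from the existing control surface.

```python
import os
import unittest

# Hypothetical section ids the control surface is expected to expose.
EXPECTED_SECTION_IDS = {"privacyidea", "keycape", "openbao"}

# Hypothetical opt-in switch for checks that touch a live cluster.
LIVE_CHECKS_ENABLED = os.environ.get("NETKINGDOM_LIVE_CHECKS") == "1"


def build_sections() -> list:
    """Stand-in for the UI payload generator; commands are placeholders."""
    return [
        {"id": "privacyidea", "command": "echo check-privacyidea"},
        {"id": "keycape", "command": "echo check-keycape"},
        {"id": "openbao", "command": "echo check-openbao"},
    ]


class SectionContractTest(unittest.TestCase):
    def test_every_expected_section_is_present(self):
        # Fails if a section disappears or an unexpected one is added.
        ids = {section["id"] for section in build_sections()}
        self.assertEqual(ids, EXPECTED_SECTION_IDS)

    def test_every_section_emits_a_command(self):
        # Fails if a command card is empty or whitespace only.
        for section in build_sections():
            self.assertTrue(section["command"].strip(), section["id"])

    @unittest.skipUnless(LIVE_CHECKS_ENABLED, "set NETKINGDOM_LIVE_CHECKS=1 to run")
    def test_live_cluster_check(self):
        self.skipTest("placeholder: a real live-cluster check would go here")
```

Run with `python -m unittest` against the module that contains the class. Keeping the live layer behind an explicit variable means CI runs stay safe by default, while an operator can opt in on a machine with cluster access.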
### T08 - Integrate Validations Into The UI State Model
```task
id: NET-WP-0018-T08
status: todo
priority: high
state_hub_task_id: "32f05fb1-269c-421c-ae34-57d2ceb7e47a"
```
Have the live setup verify itself using the same validations the UI displays.
Where possible, compute `ok` (success), `fail` (failure), `err` (error), or
`nil` (unknown) from validators rather than relying only on manual
confirmation.

Important targets include KeyCape client config, privacyIDEA realm/resolver,
LLDAP user/group membership, Authelia/KeyCape route health, OpenBao OIDC auth
config, token policy proof, audit status, restore evidence, and State Hub sync.

Done when the UI can distinguish success, failure, error, and unknown states
for the critical bootstrap gates and the live setup satisfies those checks.
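
The four states can be computed from a validator with a small wrapper. This is a minimal sketch, assuming each validator is a zero-argument callable that returns a boolean; `GateStatus` and `evaluate_gate` are hypothetical names, and treating `nil` as "no validator available" is one possible reading of the unknown state.

```python
from enum import Enum
from typing import Callable, Optional


class GateStatus(Enum):
    OK = "ok"      # validator ran and the gate is satisfied
    FAIL = "fail"  # validator ran and the gate is not satisfied
    ERR = "err"    # validator could not complete
    NIL = "nil"    # no validator result is available: state unknown


def evaluate_gate(validator: Optional[Callable[[], bool]]) -> GateStatus:
    """Map a validator outcome onto the four UI states."""
    if validator is None:
        return GateStatus.NIL
    try:
        passed = validator()
    except Exception:
        # A crash or unreachable dependency is an error, not a failed gate.
        return GateStatus.ERR
    return GateStatus.OK if passed else GateStatus.FAIL
```

Separating `err` from `fail` matters for diagnosis: an unreachable service should not be reported as a misconfigured one.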
### T09 - Assess Scratch-Rebuild Risk And Define A Rehearsal Plan
```task
id: NET-WP-0018-T09
status: todo
priority: high
state_hub_task_id: "a9e60fd5-fac6-46e9-bc63-b2979cca548e"
```
Review the resulting architecture, guide, automation, tests, and live
validation coverage. Produce a risk assessment for restarting the NetKingdom
infrastructure from scratch.

The assessment should classify each risk by likelihood, impact, detection
method, mitigation, and remaining human interaction. It should also recommend
whether the next rebuild should be a full teardown, an isolated parallel
cluster rehearsal, a namespace-level rehearsal, or a scripted dry run.
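
The classification fields listed above map naturally onto a small risk-register record. The sketch below is illustrative only: the three-level scale and the likelihood-times-impact score are assumptions, not a method prescribed by this workplan, and `RebuildRisk` and `rank` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Assumed three-level scale; the assessment may choose a different one.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class RebuildRisk:
    name: str
    likelihood: str         # one of LEVELS
    impact: str             # one of LEVELS
    detection: str          # how the problem would be noticed
    mitigation: str         # what reduces likelihood or impact
    human_interaction: str  # what a person must still do

    def score(self) -> int:
        """Simple likelihood x impact product, used only for ordering."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def rank(risks: List[RebuildRisk]) -> List[RebuildRisk]:
    """Highest-scoring risks first; ties keep their input order."""
    return sorted(risks, key=lambda risk: risk.score(), reverse=True)
```

A ranked register gives the rehearsal recommendation something concrete to point at: the top entries indicate which failure modes the chosen rehearsal style needs to exercise.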
## Acceptance Criteria
- `NET-WP-0015` is closed, archived, or explicitly reconciled with remaining
work owned elsewhere.
- `docs/NetkingdomRuntimeArchitecture.md` documents the real deployed runtime.
- A bootstrap retrospective and automation gap matrix exists.
- Associated repository boundaries are reviewed and updated or tracked with
follow-up work.
- A smooth bootstrap guide describes the intended sequence and evidence.
- The control surface follows the guide and uses safe automation wherever
appropriate.
- Every bootstrap UI section and runbook has automated coverage.
- The live setup passes the integrated validations or reports actionable
failures.
- A scratch-rebuild risk assessment recommends the next rehearsal strategy.
## Non-Goals
- Do not perform a destructive live rebuild as part of this workplan.
- Do not move secret material into Git, State Hub, or the bootstrap UI.
- Do not hide remaining human custody decisions behind automation.
- Do not collapse repository ownership boundaries merely for convenience.