Add bootstrap rebuild readiness workplan
@@ -0,0 +1,260 @@
---
id: NET-WP-0018
type: workplan
title: "Bootstrap Automation And Rebuild Readiness"
domain: netkingdom
repo: net-kingdom
status: ready
owner: codex
topic_slug: netkingdom
created: "2026-06-01"
updated: "2026-06-01"
depends_on:
  - NET-WP-0015
  - NET-WP-0017
state_hub_workstream_id: "800f9f16-bc44-4bbf-a771-58a630a3b698"
---

# NET-WP-0018 - Bootstrap Automation And Rebuild Readiness

## Goal

Turn the first successful NetKingdom security bootstrap into a repeatable,
well-bounded, highly automated setup path that can survive an infrastructure
reset with minimal interactive diagnosis.

The first run proved that the stack can work: LLDAP, Authelia, privacyIDEA,
KeyCape, OpenBao, the local bootstrap control surface, and State Hub now form a
working identity and security bootstrap path. It also proved that the system is
still too easy to derail: realm drift, callback bridging, LLDAP lookup
assumptions, OpenBao claim shape, token expiry, and operator-state persistence
all required interactive repair. This workplan converts those lessons into
architecture documentation, bootstrap sequencing, validation coverage, UI
automation, and a clear scratch-rebuild risk assessment.

## Strategy

Proceed in layers:

1. close or explicitly hand off the remaining `NET-WP-0015` bootstrap gates;
2. document the runtime architecture that now actually exists;
3. write down the bootstrap retrospective and automation gaps;
4. clarify repository boundaries so future fixes land in the right place;
5. produce a sequence guide for a smooth rebuild;
6. improve the control-surface UI so it follows that guide;
7. add tests and validations for every guided bootstrap section; and
8. assess the residual risk of rebuilding NetKingdom from scratch.

This is not a request to immediately destroy and rebuild the live stack. A
scratch rebuild should come only after the guide, validations, and risk review
say which interactions remain genuinely unavoidable.

## Coordination Notes

- Avoid duplicating `NET-WP-0017`: audit durability, escrow, user onboarding,
  and hardening remain there unless this workplan explicitly turns them into
  bootstrap-guide or validation work.
- Keep the bootstrap UI a control surface, not a secret collector. It may run
  safe checks, generate commands, and store non-secret evidence, but it must not
  store passwords, OTP seeds, OpenBao tokens, unseal shares, or recovery codes.
- Prefer validation helpers that are usable both by the UI and by CI or
  operator command lines.
- Treat interactive prompts as an explicit design boundary: automate everything
  that can be automated safely, and document why each remaining human action is
  required.
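
The "control surface, not a secret collector" rule could be enforced by a guard in front of whatever persists UI state. The following is a minimal sketch under stated assumptions: the function names, the key fragments in the deny-list, and the shape of the state dictionary are all hypothetical and do not describe the existing implementation.

```python
"""Hypothetical guard that rejects secret-looking keys before UI state is persisted."""
import json

# Assumed deny-list derived from the rule above; a real list would need review.
FORBIDDEN_KEY_FRAGMENTS = ("password", "otp_seed", "token", "unseal", "recovery_code")


def find_secret_keys(state, path=""):
    """Return dotted paths of keys whose names look like secret material."""
    hits = []
    if isinstance(state, dict):
        for key, value in state.items():
            child = f"{path}.{key}" if path else str(key)
            if any(fragment in str(key).lower() for fragment in FORBIDDEN_KEY_FRAGMENTS):
                hits.append(child)
            hits.extend(find_secret_keys(value, child))
    elif isinstance(state, list):
        for index, item in enumerate(state):
            hits.extend(find_secret_keys(item, f"{path}[{index}]"))
    return hits


def serialize_state(state):
    """Serialize non-secret evidence, refusing anything that names a secret."""
    hits = find_secret_keys(state)
    if hits:
        raise ValueError(f"refusing to persist secret-like keys: {hits}")
    return json.dumps(state, sort_keys=True)
```

Matching on key names is deliberately blunt: it would also reject a harmless key such as `token_revoked`, so a real guard would likely need an explicit allow-list alongside it.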

## Tasks

### T01 - Close Or Hand Off NET-WP-0015 Remaining Gates

```task
id: NET-WP-0018-T01
status: todo
priority: high
state_hub_task_id: "7ff22629-838b-41df-9feb-bb36c5d57cc1"
```

Review `NET-WP-0015` now that `platform-root` can obtain OpenBao
`platform-admin` through KeyCape/MFA. Close any gates that are truly complete,
and explicitly move unfinished production-readiness work to `NET-WP-0017` or
this workplan when it no longer belongs in the bootstrap ceremony plan.

Done when `NET-WP-0015` is either finished and ready to archive, or its
remaining tasks have precise owners, target workplans, and non-duplicative
acceptance criteria.

### T02 - Document The Runtime Architecture

```task
id: NET-WP-0018-T02
status: todo
priority: high
state_hub_task_id: "121ee797-e3f5-4d3e-9baa-cfa8c92f8a66"
```

Create `docs/NetkingdomRuntimeArchitecture.md` documenting what now exists:
identity stores, MFA realms, KeyCape OIDC flow, Authelia handoff, OpenBao OIDC
admin path, bootstrap UI state, State Hub relation, live DNS/routes, trust
boundaries, token flows, and operational assumptions.

The document should explain the working system as deployed, not an idealized
future architecture. It should be specific enough to guide a scratch rebuild
without requiring the operator to rediscover the same integration details.

### T03 - Produce A Bootstrap Retrospective And Automation Gap Matrix

```task
id: NET-WP-0018-T03
status: todo
priority: high
state_hub_task_id: "1a3c4261-4133-4021-bd53-ea3dc77021a0"
```

Assess how the first bootstrap went. For each problem encountered, capture the
root cause, how it was diagnosed, whether it is now automated, and what remains
as a manual step or fragile assumption.

Recommended output: `docs/security-bootstrap-retrospective.md` with a gap
matrix covering state persistence, privacyIDEA realm repair, KeyCape image
delivery, OIDC callbacks, OpenBao claim mapping, token revocation, audit,
escrow, and rebuild verification.

### T04 - Review Repository Intent And Scope Boundaries

```task
id: NET-WP-0018-T04
status: todo
priority: medium
state_hub_task_id: "9c286579-b7bc-46ae-9789-801b2b27b26d"
```

Review `INTENT.md`, `SCOPE.md`, and equivalent boundary documents across the
associated repositories involved in the bootstrap. At minimum consider
`net-kingdom`, `key-cape`, `railiance-platform`, `state-hub`/custodian, and any
repo that owns OpenBao deployment, image delivery, identity runtime, or
bootstrap automation.

Update the boundary documents or create follow-up workplans where ownership is
unclear. The result should answer: where should a bug fix live, where should a
runbook live, where should validation live, and which repo owns live
deployment state.

### T05 - Create The Smooth Bootstrap Guide

```task
id: NET-WP-0018-T05
status: todo
priority: high
state_hub_task_id: "e7b45fc8-8ee7-4914-ac4b-d0c8a35fad13"
```

Create or update the NetKingdom bootstrap guide so an operator knows what to
do, in what order, and what evidence proves each step is complete.

The guide should cover prerequisites, credential bundle creation, cluster
foundation checks, privacyIDEA bootstrap, LLDAP/bootstrap user creation,
KeyCape deployment and client registration, OpenBao init/unseal/configuration,
OIDC admin binding, token cleanup, State Hub sync, and handoff to production
readiness.

### T06 - Align The Control Surface With The Bootstrap Guide

```task
id: NET-WP-0018-T06
status: todo
priority: high
state_hub_task_id: "9bba26b3-b1be-4e58-a18b-a0533683d63b"
```

Review the local security bootstrap UI against the guide. Raise the level of
automation where it is safe to do so: replace passive checkboxes with safe
validators, convert fragile copy-paste sequences into scripts, persist
non-secret progress durably, expose repair routines for known drift cases, and
keep manual steps clear when human custody or secret handling is required.

Done when the UI guides the same sequence as the bootstrap guide and makes
wrong-order execution visibly hard.
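
One way to make wrong-order execution hard is to let the UI offer a step only once its prerequisites are recorded as complete. The sketch below illustrates that idea; the step names and the dependency chain are assumptions loosely based on the guide outline in T05, not an existing data model.

```python
"""Hypothetical ordering gate: a step is runnable only when its prerequisites are done."""

# Assumed step names and prerequisites, loosely following the guide's outline.
PREREQUISITES = {
    "privacyidea_bootstrap": [],
    "lldap_bootstrap_user": ["privacyidea_bootstrap"],
    "keycape_deploy": ["lldap_bootstrap_user"],
    "openbao_init": ["keycape_deploy"],
    "oidc_admin_binding": ["openbao_init"],
}


def blocked_by(step, completed):
    """Return the prerequisites of `step` that are not yet completed."""
    return [dep for dep in PREREQUISITES[step] if dep not in completed]


def runnable_steps(completed):
    """Return steps that are not done and have every prerequisite done, in guide order."""
    return [
        step for step in PREREQUISITES
        if step not in completed and not blocked_by(step, completed)
    ]
```

With nothing completed, `runnable_steps(set())` offers only the first step. A UI built on this could grey out every other card and use `blocked_by` to say which earlier step is missing.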

### T07 - Add Automated Tests For Bootstrap UI Sections And Runbooks

```task
id: NET-WP-0018-T07
status: todo
priority: high
state_hub_task_id: "c412d9e0-a2ca-4849-b6ee-bd4450b5a4a5"
```

For each task section and runbook exposed in the control surface, add automated
tests that validate the implementation contract.

Use a layered approach:

- static/unit tests for UI payload generation and command card presence;
- shell/Python syntax tests for generated helper scripts;
- dry-run or fixture tests for validators and state transitions; and
- live-cluster checks gated behind explicit operator environment variables.

Done when every visible bootstrap section has at least one automated test that
would fail if the section disappears, emits the wrong command, or reports an
impossible state.
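
The layering could look roughly like the following with Python's standard `unittest` module: a fixture-level test that always runs, and a live-cluster class that is skipped unless the operator opts in. This is a sketch only; the variable name `NETKINGDOM_LIVE_CHECKS`, the `render_command_card` stand-in, and the `./bootstrap.sh` command string are hypothetical placeholders.

```python
"""Hypothetical test layout: fixture tests always run, live checks need an opt-in."""
import os
import unittest

# Assumed variable name; the workplan only requires that the gate be explicit.
LIVE_FLAG = "NETKINGDOM_LIVE_CHECKS"


def live_checks_enabled(environ=None):
    """Return True only when the operator has explicitly set the flag to "1"."""
    environ = os.environ if environ is None else environ
    return environ.get(LIVE_FLAG) == "1"


def render_command_card(section):
    """Stand-in for UI payload generation; the real generator lives in the UI code."""
    return {"section": section, "command": f"./bootstrap.sh check {section}"}


class CommandCardTest(unittest.TestCase):
    def test_card_names_its_section(self):
        card = render_command_card("openbao")
        self.assertEqual(card["section"], "openbao")
        self.assertIn("openbao", card["command"])


@unittest.skipUnless(live_checks_enabled(), f"set {LIVE_FLAG}=1 to run live checks")
class LiveClusterTest(unittest.TestCase):
    def test_placeholder(self):
        # A real check would query the cluster here; this sketch only shows the gate.
        self.assertTrue(live_checks_enabled())
```

Because the gate is evaluated when the module is imported, a plain test run reports the live class as skipped rather than silently passing it, which keeps the missing coverage visible in CI output.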

### T08 - Integrate Validations Into The UI State Model

```task
id: NET-WP-0018-T08
status: todo
priority: high
state_hub_task_id: "32f05fb1-269c-421c-ae34-57d2ceb7e47a"
```

Make the current setup prove itself through the same validations the UI shows.
Where possible, compute `ok`, `fail`, `err`, or `nil` from validators rather
than relying only on manual confirmation.

Important targets include KeyCape client config, privacyIDEA realm/resolver,
LLDAP user/group membership, Authelia/KeyCape route health, OpenBao OIDC auth
config, token policy proof, audit status, restore evidence, and State Hub sync.

Done when the UI can distinguish success, failure, error, and unknown states
for the critical bootstrap gates and the live setup satisfies those checks.
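
The four states can be derived mechanically from a validator's outcome. The sketch below assumes a simple contract that the workplan does not specify: a validator is a zero-argument callable returning `True`, `False`, or `None`, and any exception means the check itself could not complete. The names `GateState` and `evaluate` are hypothetical.

```python
"""Hypothetical four-state gate result: ok, fail, err, or nil (unknown)."""
from enum import Enum


class GateState(Enum):
    OK = "ok"      # validator ran and the check passed
    FAIL = "fail"  # validator ran and the check did not pass
    ERR = "err"    # validator could not complete (for example, an exception)
    NIL = "nil"    # no validator result is available yet


def evaluate(validator):
    """Run a zero-argument validator and map its outcome to a GateState.

    Assumed contract: the validator returns True, False, or None (unknown).
    """
    if validator is None:
        return GateState.NIL
    try:
        outcome = validator()
    except Exception:
        return GateState.ERR
    if outcome is None:
        return GateState.NIL
    return GateState.OK if outcome else GateState.FAIL
```

Keeping `err` separate from `fail` matters for the operator: a `fail` points at the bootstrap gate, while an `err` points at the validator or its access to the cluster.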

### T09 - Assess Scratch-Rebuild Risk And Define A Rehearsal Plan

```task
id: NET-WP-0018-T09
status: todo
priority: high
state_hub_task_id: "a9e60fd5-fac6-46e9-bc63-b2979cca548e"
```

Review the resulting architecture, guide, automation, tests, and live
validation coverage. Produce a risk assessment for restarting the NetKingdom
infrastructure from scratch.

The assessment should classify each risk by likelihood, impact, detection
method, mitigation, and remaining human interaction. It should also recommend
whether the next rebuild should be a full teardown, an isolated parallel
cluster rehearsal, a namespace-level rehearsal, or a scripted dry run.
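
If the assessment is kept in a structured form, each entry could carry the five dimensions named above. The sketch below is one possible shape; the three-level scale and the likelihood-times-impact score are assumptions chosen for illustration, not a method prescribed by this workplan.

```python
"""Hypothetical risk register entry using the five dimensions named above."""
from dataclasses import dataclass

# Assumed three-level scale; the workplan does not define one.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: str         # "low", "medium", or "high"
    impact: str             # "low", "medium", or "high"
    detection: str          # how the problem would be noticed
    mitigation: str         # what reduces likelihood or impact
    human_interaction: str  # what still needs an operator

    def score(self):
        """Assumed ranking heuristic: likelihood level times impact level."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def rank(risks):
    """Return risks ordered from highest to lowest score (stable for ties)."""
    return sorted(risks, key=lambda risk: risk.score(), reverse=True)
```

A ranked list like this would only order the discussion; the choice between teardown, parallel rehearsal, namespace rehearsal, and dry run remains a judgment call.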

## Acceptance Criteria

- `NET-WP-0015` is closed, archived, or explicitly reconciled with remaining
  work owned elsewhere.
- `docs/NetkingdomRuntimeArchitecture.md` documents the real deployed runtime.
- A bootstrap retrospective and automation gap matrix exists.
- Associated repository boundaries are reviewed and updated or tracked with
  follow-up work.
- A smooth bootstrap guide describes the intended sequence and evidence.
- The control surface follows the guide and uses safe automation wherever
  appropriate.
- Every bootstrap UI section and runbook has automated coverage.
- The live setup passes the integrated validations or reports actionable
  failures.
- A scratch-rebuild risk assessment recommends the next rehearsal strategy.

## Non-Goals

- Do not perform a destructive live rebuild as part of this workplan.
- Do not move secret material into Git, State Hub, or the bootstrap UI.
- Do not hide remaining human custody decisions behind automation.
- Do not collapse repository ownership boundaries merely for convenience.