docs: persist OpenBao/SSH/bootstrap state assessment in history

Capture live vs greenfield tracks, unseal custody models, console S6
interpretation, repo ownership, and ordered next actions before NET-WP-0020 T5.
2026-06-18 01:01:50 +02:00
parent f625dd0681
commit 6336c28626
2 changed files with 252 additions and 0 deletions


@@ -0,0 +1,251 @@
# OpenBao, SSH, and Bootstrap Custody — State Assessment
**Date:** 2026-06-17
**Author:** codex (with operator session evidence)
**Purpose:** Persist current state, concepts, and navigation map so security
setup work does not lose context while implementing NET-WP-0020 T5 and related
automation.
**Repos:** `net-kingdom`, `railiance-platform`, `railiance-infra`, `ops-warden`
---
## 1. Executive summary
NetKingdom's **first platform bootstrap is complete** (console stage **S6**,
OpenBao live at `https://bao.coulomb.social`). **SSH certificate infrastructure
via OpenBao is not started:** no `ssh/` secrets engine, hosts still on **legacy
static-key SSH** that predates OpenBao and ops-warden.
We adopted an **automation-first custody strategy** for *future* greenfield
rebuilds (`sops-held-automation`), while **blocking** the unimplemented production
models (`attended-ceremony`, `auto-unseal-transit`) in the security bootstrap
console. Selecting a model does **not** re-initialize the live cluster.
**Next implementation slice (after this assessment):** NET-WP-0020 **T5**:
declarative OpenBao SSH engine plus railiance-infra host CA trust. Implement on
the **live** cluster first, then prove the full unattended chain on a greenfield
3-node cluster.
---
## 2. Current state (verified 2026-06-17)
### 2.1 NetKingdom security bootstrap (operator workstation)
| Item | State |
| --- | --- |
| Metadata | `net-kingdom/.local/security-bootstrap.json` |
| Console stage | **S6 — Reopen under custody** |
| King custody | Approved (`temporary-single-king` or equivalent) |
| OpenBao unseal custody model | **`sops-held-automation`** selected (2026-06-17) |
| OpenBao init | **`openbao_initialized: true`** (from **first attended bootstrap**) |
| All bootstrap gates | **done** (preflight, OIDC, restore drill, platform reopen) |
| Plaintext bootstrap secrets | **absent** (good) |
| Encrypted bundle | `sso-mfa/bootstrap/secrets.enc` (11 files) |
**Interpretation:** Selecting `sops-held-automation` records the **preferred
model for the next rebuild**. The init ceremony gate shows **done** because
OpenBao was already initialized manually in NET-WP-0015–0017, not because
SOPS-held automation ran on this cluster.
### 2.2 OpenBao platform (Railiance01 / production endpoint)
| Check | Result |
| --- | --- |
| `/v1/sys/health` | initialized, unsealed, v2.5.4+ |
| UI login | `netkingdom` / `platform-admin` (KeyCape OIDC) — works |
| **`ssh/` secrets engine** | **Not enabled** (operator confirmed) |
| `platform/operators/ops-warden` KV | **Not required** for SSH signing |
Evidence: `ops-warden/history/2026-06-17-openbao-production-verify.md`
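The health check in the table can be repeated from a shell. The sketch below is illustrative only: `health_state` is a hypothetical helper, and the status codes assume the default `sys/health` behaviour (no query-string overrides) that OpenBao inherits from the Vault-compatible API.

```bash
# Map the HTTP status of GET /v1/sys/health to a readable state.
# Assumption: default sys/health status codes, no query overrides.
health_state() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Usage against the endpoint named in this assessment (needs network access):
#   code=$(curl -s -o /dev/null -w '%{http_code}' https://bao.coulomb.social/v1/sys/health)
#   health_state "$code"
```

The health endpoint says nothing about mounted engines. Confirming that `ssh/` is absent still requires an authenticated listing of secrets engines, which is what the operator confirmation above stands in for.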
### 2.3 ops-warden workstation
| Item | State |
| --- | --- |
| `~/.config/warden/warden.yaml` | Present (`backend: vault`, `bao.coulomb.social`) |
| `~/.config/warden/inventory.yaml` | Present (seed actors) |
| Test keypair | `~/.ssh/agt-state-hub-bridge_ed25519` created |
| `warden sign` against production | **Blocked** — no SSH engine |
| WP-0008 T2 | **wait** — SSH engine + host trust |
| Policy gate (WP-0007) | Shipped, `policy.enabled: false` default |
### 2.4 SSH infrastructure lineage
```text
Legacy (today on hosts)          Target (not built)
────────────────────────         ──────────────────
Static keys / authorized_keys    OpenSSH CA + short-lived certs
CA key on disk (if any)          OpenBao ssh/ engine CA
Predates OpenBao                 ops-warden warden sign
                                 railiance-infra principals + TrustedUserCAKeys
```
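To make the target column concrete, here is a dry-run sketch of the commands that would typically stand up an SSH CA engine. It only prints the plan. `plan_ssh_engine` is a hypothetical helper, the role name, allowed users, and TTL are placeholders rather than T5 decisions, and it assumes the `bao` CLI accepts the usual Vault-compatible SSH engine parameters.

```bash
# Print (do not run) the commands that would enable an SSH CA engine.
# Assumption: role, users and ttl are illustrative placeholders, not T5 values.
plan_ssh_engine() {
  local role="$1" users="$2" ttl="$3"
  printf '%s\n' \
    "bao secrets enable -path=ssh ssh" \
    "bao write ssh/config/ca generate_signing_key=true" \
    "bao write ssh/roles/${role} key_type=ca allow_user_certificates=true allowed_users=${users} ttl=${ttl}"
}

# Example: plan_ssh_engine demo-role agt-state-hub-bridge 30m
```

Printing the plan instead of executing it keeps the sketch safe to run on a workstation that already holds a token for the live cluster.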
---
## 3. Core concepts (do not conflate)
### 3.1 Two custody dimensions
| Dimension | Field / doc | What it governs |
| --- | --- | --- |
| **King / platform recovery custody** | `custody_mode` in metadata | Who holds recovery authority (single king vs 2-of-3) |
| **OpenBao init/unseal execution** | `openbao_unseal_custody_model` | *How* init/unseal runs (automation vs attended vs KMS) |
Both are valid and orthogonal. See `docs/openbao-unseal-custody-models.md`.
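A quick way to keep the two dimensions apart is to print them side by side from the metadata file. The sketch below assumes both fields are top-level string values in flat JSON, which this assessment does not confirm. The helper names are hypothetical, and a real JSON parser would be more robust than `sed`.

```bash
# Extract a top-level string field from a flat JSON file without a JSON parser.
# Assumption: "key": "value" pairs with no escaped quotes or nesting.
json_string_field() {
  local file="$1" key="$2"
  sed -n 's/.*"'"$key"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1
}

# Report the two custody dimensions side by side.
show_custody() {
  local file="$1"
  echo "recovery custody: $(json_string_field "$file" custody_mode)"
  echo "unseal execution: $(json_string_field "$file" openbao_unseal_custody_model)"
}

# Example: show_custody .local/security-bootstrap.json
```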
### 3.2 Three unseal custody models (init/unseal execution)
| Model ID | Status | Use |
| --- | --- | --- |
| `sops-held-automation` | **Implemented** (console) | Default for **greenfield fast test cycles**; entry: `creds-bootstrap-agent.sh` |
| `attended-ceremony` | **Planned** (blocked in console) | Production trust; matches **first bootstrap** already performed |
| `auto-unseal-transit` | **Planned** (blocked in console) | HA rebuilds without manual unseal |
**Development strategy (agreed 2026-06-17):**
1. Max automation first → prove SSH engine + host CA + `warden sign` loops
2. Add attended ceremony gates for production profiles
3. Add auto-unseal for ThreePhoenix HA
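The selection rule implied by the status column can be sketched as a small guard. This is not the console's implementation: `select_unseal_model` and its messages and return codes are hypothetical, and only the three model IDs and their statuses come from the table above.

```bash
# Accept only the model the console currently allows; explain anything else.
# Assumption: messages and return codes are illustrative, not console output.
select_unseal_model() {
  case "$1" in
    sops-held-automation)
      echo "selected: $1" ;;
    attended-ceremony|auto-unseal-transit)
      echo "blocked: $1 is planned, not implemented" >&2
      return 1 ;;
    *)
      echo "unknown model: $1" >&2
      return 2 ;;
  esac
}
```

Distinct return codes for "planned" and "unknown" let a caller tell a typo apart from a model that exists in canon but is not yet selectable.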
### 3.3 Two operational tracks
```text
Track A — LIVE cluster (Railiance01 today)
• OpenBao: up, attended init done
• Gap: enable ssh/ engine + host CA trust
• Work: NET-WP-0020 T5, ops-warden WP-0008 T2 verify
• Do NOT re-run init; do NOT require platform KV secret for warden
Track B — GREENFIELD 3-node (future automation proof)
• Clean Linux + root SSH on 3 machines
• S1 infra → S2 k3s HA → S3 OpenBao deploy
• sops-held-automation → creds-bootstrap-agent init/unseal (T2)
• T5 SSH engine + host CA → warden sign smoke
• Use separate metadata e.g. .local/security-bootstrap-greenfield.json
```
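The track rules above reduce to two checks: which metadata file a track uses, and whether init may run at all. A minimal sketch follows, with hypothetical function names and the track labels `live` and `greenfield` chosen here for illustration. The two metadata paths are the ones named in this assessment.

```bash
# Metadata path per track (paths as named in this assessment).
metadata_for_track() {
  case "$1" in
    live)       echo ".local/security-bootstrap.json" ;;
    greenfield) echo ".local/security-bootstrap-greenfield.json" ;;
    *)          echo "unknown track: $1" >&2; return 1 ;;
  esac
}

# Init may run only on greenfield, and only while OpenBao is uninitialized.
init_allowed() {
  local track="$1" initialized="$2"
  [ "$track" = "greenfield" ] && [ "$initialized" = "false" ]
}

# Example: init_allowed live true || echo "skip init on the live cluster"
```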
### 3.4 What does NOT help SSH signing
| Action | Why irrelevant |
| --- | --- |
| Create `platform/operators/ops-warden` KV secret | KV stores secrets; warden calls **`ssh/sign/<role>`** API |
| Browser UI login alone | Does not set `VAULT_TOKEN` for CLI/`warden` |
| Re-selecting custody model on S6 metadata | Records preference only; does not enable `ssh/` engine |
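For contrast, the sketch below shows roughly what a signing call needs: a public key, a token, and a role on the `ssh/` mount, with no KV secret involved. `ssh_sign_body` is a hypothetical helper, `ROLE` is a placeholder, and the `.pub` path assumes the public half sits next to the test key. How `warden sign` builds its request internally is not confirmed here.

```bash
# Build the JSON body for a POST to the SSH engine's sign endpoint.
# Assumption: the key line and principals contain no double quotes or backslashes.
ssh_sign_body() {
  local pubkey_file="$1" principals="$2" ttl="$3"
  local pubkey
  pubkey=$(tr -d '\n' < "$pubkey_file")
  printf '{"public_key":"%s","valid_principals":"%s","ttl":"%s"}\n' \
    "$pubkey" "$principals" "$ttl"
}

# Usage (needs a token allowed to use the role; ROLE is a placeholder):
#   ssh_sign_body ~/.ssh/agt-state-hub-bridge_ed25519.pub agt-state-hub-bridge 30m |
#     curl -s -H "X-Vault-Token: $VAULT_TOKEN" -d @- \
#       https://bao.coulomb.social/v1/ssh/sign/ROLE
```

Until the engine is enabled, a request like this has no mount to reach, which is consistent with `warden sign` being blocked today.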
---
## 4. Repo ownership (NetKingdom map)
| Concern | Owner | Artifact |
| --- | --- | --- |
| Bootstrap orchestration & custody canon | **net-kingdom** | console, smooth-bootstrap-guide, NET-WP-0020 |
| OpenBao deploy + post-unseal config | **railiance-platform** | `openbao-deploy`, `openbao-configure-initial` |
| OpenBao SSH engine enable + roles | **railiance-platform** (T5) | `openbao-configure-ssh` (planned) |
| Host `TrustedUserCAKeys` + principals | **railiance-infra** (T5) | `bootstrap-ssh-ca` (planned) |
| Sign CLI + inventory + audit log | **ops-warden** | `warden sign`, WP-0007 policy gate |
| flex-auth pre-sign policies | **flex-auth** | WP-0008 T5 (later) |
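On the railiance-infra side, host trust comes down to two `sshd_config` directives. The sketch below renders such a fragment. `render_sshd_ca_fragment` is a hypothetical helper and both default paths are illustrative, not taken from the planned `bootstrap-ssh-ca` artifact.

```bash
# Render an sshd_config fragment that trusts a user CA and maps principals per account.
# Assumption: both default paths are illustrative, not railiance-infra conventions.
render_sshd_ca_fragment() {
  local ca_file="${1:-/etc/ssh/trusted-user-ca-keys.pem}"
  local principals_dir="${2:-/etc/ssh/auth_principals}"
  printf 'TrustedUserCAKeys %s\n' "$ca_file"
  printf 'AuthorizedPrincipalsFile %s/%%u\n' "$principals_dir"
}

# Example: render_sshd_ca_fragment > /tmp/50-user-ca.conf
```

Before reloading the daemon, a fragment like this would normally be validated with `sshd -t`. The CA public key it points to can only be published once the `ssh/` engine has a signing key.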
---
## 5. Workplan map (active strands)
| ID | Repo | Focus | Status |
| --- | --- | --- | --- |
| **NET-WP-0020** | net-kingdom | Unseal custody models + SSH automation path | T1 done; **T5 next** |
| **WARDEN-WP-0008** | ops-warden | Production `warden sign` evidence | T2 wait on T5 |
| **RAIL-BS-WP-0007** | railiance-cluster | ThreePhoenix 3-node HA | Prerequisite for Track B at scale |
| NET-WP-0018 | net-kingdom | Smooth bootstrap guide | S6 reached on live bootstrap |
---
## 6. Console commands reference (operator session)
```bash
cd ~/net-kingdom
make security-bootstrap-openbao-unseal-custody-models
make security-bootstrap-select-openbao-unseal-custody-model MODEL=sops-held-automation
make security-bootstrap-console
```
**Observed (2026-06-17):** All gates `done`, stage S6, unseal model gate
`done` with automation entry `sso-mfa/bootstrap/creds-bootstrap-agent.sh`, init
ceremony `done` (historical init). Next safe action: *Review related workplans*
— expected for completed bootstrap, not an error.
**Greenfield preview (when T2 exists):**
```bash
export METADATA=.local/security-bootstrap-greenfield.json
make security-bootstrap-metadata-init METADATA="$METADATA"
make security-bootstrap-select-openbao-unseal-custody-model \
  MODEL=sops-held-automation METADATA="$METADATA"
make security-bootstrap-console METADATA="$METADATA"
# Expect lower stage, init gate status "automation"
```
---
## 7. Automation chain (target end state)
```text
[3 nodes root SSH]
→ railiance-infra S1 baseline
→ railiance-cluster S2 k3s HA
→ railiance-platform openbao-deploy
→ net-kingdom creds-bootstrap-agent (sops-held init/unseal) [T2]
→ railiance-platform openbao-configure-initial [exists]
→ railiance-platform openbao-configure-ssh [T5 — next]
→ railiance-infra bootstrap-ssh-ca (CA pubkey + principals) [T5]
→ ops-warden warden sign smoke [WP-0008 T2]
→ (later) flex-auth policy.enabled [WP-0008 T5]
```
On **Track A (live):** skip init/unseal steps; start at **openbao-configure-ssh**.
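The track-dependent entry point can be sketched as a step list with an offset. `chain_steps` is a hypothetical helper: the step names are copied from the chain above, and the `live` label is chosen here for illustration.

```bash
# Print the chain steps for a track; the live track starts at openbao-configure-ssh.
chain_steps() {
  local steps=(
    "railiance-infra S1 baseline"
    "railiance-cluster S2 k3s HA"
    "railiance-platform openbao-deploy"
    "net-kingdom creds-bootstrap-agent"
    "railiance-platform openbao-configure-initial"
    "railiance-platform openbao-configure-ssh"
    "railiance-infra bootstrap-ssh-ca"
    "ops-warden warden sign smoke"
  )
  local start=0
  if [ "$1" = "live" ]; then
    start=5   # skip baseline, deploy, init/unseal and initial configuration
  fi
  printf '%s\n' "${steps[@]:$start}"
}

# Example: chain_steps live
```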
---
## 8. Credential management note (ops-warden)
Operator feedback: manual `ssh-keygen` for WP-0008 T2 is acceptable for first
sign proof but insufficient long-term. ops-warden should eventually document or
automate actor key lifecycle (`warden issue`, credential roster, rotation).
**Deferred** until T5 + T2 sign path succeeds.
---
## 9. Decisions log
| Date | Decision |
| --- | --- |
| 2026-06-17 | All three unseal custody models are canon; start automation-first |
| 2026-06-17 | Console blocks planned models with hints; only `sops-held-automation` selectable |
| 2026-06-17 | Live cluster uses Track A; greenfield uses Track B + separate metadata |
| 2026-06-17 | No `platform/operators/ops-warden` KV for SSH signing bootstrap |
| 2026-06-17 | Implement T5 on live OpenBao before greenfield full loop |
---
## 10. Next actions (ordered)
1. ~~Persist this assessment~~ (this file)
2. **NET-WP-0020 T5**: `openbao-apply-ssh-engine.sh` + railiance-infra host CA role
3. **WP-0008 T2**: `warden sign` smoke + append `openbao-production-verify.md`
4. **NET-WP-0020 T2**: wire `creds-bootstrap-agent.sh` for greenfield init/unseal
5. **NET-WP-0020 T3/T4**: unlock attended + auto-unseal console paths
---
## 11. Related files
| Path | Role |
| --- | --- |
| `docs/openbao-unseal-custody-models.md` | Unseal custody canon |
| `docs/smooth-bootstrap-guide.md` | Step 5 unseal model table |
| `workplans/NET-WP-0020-openbao-unseal-custody-and-ssh-automation.md` | Active workplan |
| `ops-warden/history/2026-06-17-openbao-production-verify.md` | Health + SSH engine gap |
| `ops-warden/workplans/WARDEN-WP-0008-*.md` | Production sign verification |
| `railiance-platform/docs/openbao.md` | Deploy + attended ceremony |
| `ops-warden/wiki/OpenBaoSshEngineChecklist.md` | Role TTL + verify procedure |
| `ops-warden/history/2026-06-17-post-wp0007-reassessment.md` | ops-warden completeness |


@@ -90,5 +90,6 @@ priority: high
## See also
- `history/2026-06-17-openbao-ssh-custody-and-bootstrap-assessment.md` — state + concepts (read before T5)
- `ops-warden/workplans/WARDEN-WP-0008-production-ssh-path-and-stewardship-closeout.md`
- `railiance-platform/docs/openbao.md`