net-kingdom/history/2026-06-17-openbao-ssh-custody-and-bootstrap-assessment.md
tegwick 5a5eb482d4 docs(NET-WP-0020): T5 automation ready; operator apply is next gate
Update workplan T5 to progress and assessment next-actions for live cluster
apply before WP-0008 warden sign smoke.
2026-06-18 01:06:43 +02:00

# OpenBao, SSH, and Bootstrap Custody — State Assessment
**Date:** 2026-06-17
**Author:** codex (with operator session evidence)
**Purpose:** Persist current state, concepts, and navigation map so security
setup work does not lose context while implementing NET-WP-0020 T5 and related
automation.
**Repos:** `net-kingdom`, `railiance-platform`, `railiance-infra`, `ops-warden`
---
## 1. Executive summary
NetKingdom's **first platform bootstrap is complete** (console stage **S6**,
OpenBao live at `https://bao.coulomb.social`). **SSH certificate infrastructure
via OpenBao is not started:** no `ssh/` secrets engine, hosts still on **legacy
static-key SSH** that predates OpenBao and ops-warden.
We adopted an **automation-first custody strategy** for *future* greenfield
rebuilds (`sops-held-automation`), while **blocking** unimplemented production
models (`attended-ceremony`, `auto-unseal-transit`) in the security bootstrap
console. Selecting a model does **not** re-initialize the live cluster.
**Next implementation slice (after this assessment):** NET-WP-0020 **T5**:
declarative OpenBao SSH engine plus railiance-infra host CA trust, applied to the
**live** cluster first, then a proof of the full unattended chain on the
greenfield 3-node cluster.
---
## 2. Current state (verified 2026-06-17)
### 2.1 NetKingdom security bootstrap (operator workstation)
| Item | State |
| --- | --- |
| Metadata | `net-kingdom/.local/security-bootstrap.json` |
| Console stage | **S6 — Reopen under custody** |
| King custody | Approved (`temporary-single-king` or equivalent) |
| OpenBao unseal custody model | **`sops-held-automation`** selected (2026-06-17) |
| OpenBao init | **`openbao_initialized: true`** (from **first attended bootstrap**) |
| All bootstrap gates | **done** (preflight, OIDC, restore drill, platform reopen) |
| Plaintext bootstrap secrets | **absent** (good) |
| Encrypted bundle | `sso-mfa/bootstrap/secrets.enc` (11 files) |
**Interpretation:** Selecting `sops-held-automation` records the **preferred
model for the next rebuild**. The init ceremony gate shows **done** because
OpenBao was already initialized manually in NET-WP-0015–0017, not because
SOPS-held automation ran on this cluster.
### 2.2 OpenBao platform (Railiance01 / production endpoint)
| Check | Result |
| --- | --- |
| `/v1/sys/health` | initialized, unsealed, v2.5.4+ |
| UI login | `netkingdom` / `platform-admin` (KeyCape OIDC) — works |
| **`ssh/` secrets engine** | **Not enabled** (operator confirmed) |
| `platform/operators/ops-warden` KV | **Not required** for SSH signing |
Evidence: `ops-warden/history/2026-06-17-openbao-production-verify.md`
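
The health row above can be re-checked from a shell. The following is a minimal sketch, assuming `curl` is available and that `/v1/sys/health` uses its default status codes (200 active, 429 standby, 501 not initialized, 503 sealed). The function names are illustrative and are not part of any repo listed here.

```bash
# Sketch: map the HTTP status of /v1/sys/health to a readable state.
# Assumes the endpoint's default status codes; function names are hypothetical.
BAO_ADDR="${BAO_ADDR:-https://bao.coulomb.social}"

health_state_from_code() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown (HTTP $1)" ;;
  esac
}

check_health() {
  # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$BAO_ADDR/v1/sys/health")
  health_state_from_code "$code"
}
```

Running `check_health` against the live endpoint would be expected to report the initialized, unsealed state recorded in the table, as long as that state still holds.
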
### 2.3 ops-warden workstation
| Item | State |
| --- | --- |
| `~/.config/warden/warden.yaml` | Present (`backend: vault`, `bao.coulomb.social`) |
| `~/.config/warden/inventory.yaml` | Present (seed actors) |
| Test keypair | `~/.ssh/agt-state-hub-bridge_ed25519` created |
| `warden sign` against production | **Blocked** — no SSH engine |
| WP-0008 T2 | **wait** — SSH engine + host trust |
| Policy gate (WP-0007) | Shipped, `policy.enabled: false` default |
### 2.4 SSH infrastructure lineage
```text
Legacy (today on hosts)          Target (not built)
────────────────────────         ──────────────────
Static keys / authorized_keys    OpenSSH CA + short-lived certs
CA key on disk (if any)          OpenBao ssh/ engine CA
Predates OpenBao                 ops-warden warden sign
                                 railiance-infra principals + TrustedUserCAKeys
```
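
For the target column, each host has to trust the CA held by the `ssh/` engine. The sketch below shows one way that trust could be expressed. It assumes the engine is mounted at `ssh/` and exposes its CA public key at `/v1/ssh/public_key`. The file paths and function names are hypothetical, and this is not the content of the planned `bootstrap-ssh-ca` artifact.

```bash
# Sketch: fetch the engine's CA public key and render the sshd directives
# that make a host trust it. Paths and function names are assumptions.
BAO_ADDR="${BAO_ADDR:-https://bao.coulomb.social}"
CA_FILE="${CA_FILE:-/etc/ssh/trusted-user-ca-keys.pem}"
PRINCIPALS_DIR="${PRINCIPALS_DIR:-/etc/ssh/auth_principals}"

fetch_ca_pubkey() {
  # Public key endpoint of the SSH engine mounted at ssh/
  curl -fsS "$BAO_ADDR/v1/ssh/public_key"
}

render_sshd_dropin() {
  # Prints sshd_config directives; %u expands to the login user name.
  printf 'TrustedUserCAKeys %s\n' "$CA_FILE"
  printf 'AuthorizedPrincipalsFile %s/%%u\n' "$PRINCIPALS_DIR"
}
```

On a host, `fetch_ca_pubkey > "$CA_FILE"` followed by writing the output of `render_sshd_dropin` into the sshd configuration, then `sshd -t` and a reload, would complete the trust side. Legacy `authorized_keys` access is untouched by these directives, so both paths can coexist during the transition.
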
---
## 3. Core concepts (do not conflate)
### 3.1 Two custody dimensions
| Dimension | Field / doc | What it governs |
| --- | --- | --- |
| **King / platform recovery custody** | `custody_mode` in metadata | Who holds recovery authority (single king vs 2-of-3) |
| **OpenBao init/unseal execution** | `openbao_unseal_custody_model` | *How* init/unseal runs (automation vs attended vs KMS) |
Both are valid and orthogonal. See `docs/openbao-unseal-custody-models.md`.
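
To see both dimensions side by side, the two fields can be read from the bootstrap metadata. This is a rough sketch, assuming both fields are plain string values in the JSON file; the real layout may differ. The helper names are hypothetical.

```bash
# Sketch: print the two custody dimensions from bootstrap metadata.
# Assumes both fields are string values; helper names are hypothetical.
json_string_field() {
  # $1 = file, $2 = field name; crude sed extraction, not a full JSON parser
  sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

show_custody() {
  file="${1:-.local/security-bootstrap.json}"
  echo "recovery custody:     $(json_string_field "$file" custody_mode)"
  echo "unseal custody model: $(json_string_field "$file" openbao_unseal_custody_model)"
}
```

Calling `show_custody` from the `net-kingdom` checkout prints one line per dimension, which makes it harder to mistake a change in one for a change in the other.
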
### 3.2 Three unseal custody models (init/unseal execution)
| Model ID | Status | Use |
| --- | --- | --- |
| `sops-held-automation` | **Implemented** (console) | Default for **greenfield fast test cycles**; entry: `creds-bootstrap-agent.sh` |
| `attended-ceremony` | **Planned** (blocked in console) | Production trust; matches **first bootstrap** already performed |
| `auto-unseal-transit` | **Planned** (blocked in console) | HA rebuilds without manual unseal |
**Development strategy (agreed 2026-06-17):**
1. Max automation first → prove SSH engine + host CA + `warden sign` loops
2. Add attended ceremony gates for production profiles
3. Add auto-unseal for ThreePhoenix HA
### 3.3 Two operational tracks
```text
Track A — LIVE cluster (Railiance01 today)
• OpenBao: up, attended init done
• Gap: enable ssh/ engine + host CA trust
• Work: NET-WP-0020 T5, ops-warden WP-0008 T2 verify
• Do NOT re-run init; do NOT require platform KV secret for warden
Track B — GREENFIELD 3-node (future automation proof)
• Clean Linux + root SSH on 3 machines
• S1 infra → S2 k3s HA → S3 OpenBao deploy
• sops-held-automation → creds-bootstrap-agent init/unseal (T2)
• T5 SSH engine + host CA → warden sign smoke
• Use separate metadata e.g. .local/security-bootstrap-greenfield.json
```
### 3.4 What does NOT help SSH signing
| Action | Why irrelevant |
| --- | --- |
| Create `platform/operators/ops-warden` KV secret | KV stores secrets; warden calls **`ssh/sign/<role>`** API |
| Browser UI login alone | Does not set `VAULT_TOKEN` for CLI/`warden` |
| Re-selecting custody model on S6 metadata | Records preference only; does not enable `ssh/` engine |
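
By contrast, the call that matters is a write to `ssh/sign/<role>` with a CLI token in the environment. The sketch below is a manual equivalent using the `bao` CLI. The role, key path, and principal are caller-supplied placeholders, and nothing here is taken from warden's implementation.

```bash
# Sketch: manual equivalent of the ssh/sign/<role> call.
# Requires a CLI token in the environment and an enabled ssh/ engine.
# Role and principal are placeholders supplied by the caller.
sign_pubkey() {
  # $1 = role, $2 = public key path (*.pub), $3 = principal
  [ $# -eq 3 ] || { echo "usage: sign_pubkey ROLE PUBKEY PRINCIPAL" >&2; return 2; }
  # Write the certificate next to the key as <key>-cert.pub, where OpenSSH looks for it.
  bao write -field=signed_key "ssh/sign/$1" \
    "public_key=@$2" "valid_principals=$3" > "${2%.pub}-cert.pub"
}
```

Until the `ssh/` engine is enabled this call fails regardless of what is stored in KV, which is the point of the table above.
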
---
## 4. Repo ownership (NetKingdom map)
| Concern | Owner | Artifact |
| --- | --- | --- |
| Bootstrap orchestration & custody canon | **net-kingdom** | console, smooth-bootstrap-guide, NET-WP-0020 |
| OpenBao deploy + post-unseal config | **railiance-platform** | `openbao-deploy`, `openbao-configure-initial` |
| OpenBao SSH engine enable + roles | **railiance-platform** (T5) | `openbao-configure-ssh` (planned) |
| Host `TrustedUserCAKeys` + principals | **railiance-infra** (T5) | `bootstrap-ssh-ca` (planned) |
| Sign CLI + inventory + audit log | **ops-warden** | `warden sign`, WP-0007 policy gate |
| flex-auth pre-sign policies | **flex-auth** | WP-0008 T5 (later) |
---
## 5. Workplan map (active strands)
| ID | Repo | Focus | Status |
| --- | --- | --- | --- |
| **NET-WP-0020** | net-kingdom | Unseal custody models + SSH automation path | T1 done; **T5 next** |
| **WARDEN-WP-0008** | ops-warden | Production `warden sign` evidence | T2 wait on T5 |
| **RAIL-BS-WP-0007** | railiance-cluster | ThreePhoenix 3-node HA | Prerequisite for Track B at scale |
| NET-WP-0018 | net-kingdom | Smooth bootstrap guide | S6 reached on live bootstrap |
---
## 6. Console commands reference (operator session)
```bash
cd ~/net-kingdom
make security-bootstrap-openbao-unseal-custody-models
make security-bootstrap-select-openbao-unseal-custody-model MODEL=sops-held-automation
make security-bootstrap-console
```
**Observed (2026-06-17):** All gates `done`, stage S6, unseal model gate
`done` with automation entry `sso-mfa/bootstrap/creds-bootstrap-agent.sh`, init
ceremony `done` (historical init). Next safe action: *Review related workplans*
— expected for completed bootstrap, not an error.
**Greenfield preview (when T2 exists):**
```bash
export METADATA=.local/security-bootstrap-greenfield.json
make security-bootstrap-metadata-init METADATA="$METADATA"
make security-bootstrap-select-openbao-unseal-custody-model \
  MODEL=sops-held-automation METADATA="$METADATA"
make security-bootstrap-console METADATA="$METADATA"
# Expect lower stage, init gate status "automation"
```
---
## 7. Automation chain (target end state)
```text
[3 nodes root SSH]
→ railiance-infra S1 baseline
→ railiance-cluster S2 k3s HA
→ railiance-platform openbao-deploy
→ net-kingdom creds-bootstrap-agent (sops-held init/unseal) [T2]
→ railiance-platform openbao-configure-initial [exists]
→ railiance-platform openbao-configure-ssh [T5 — scripted; operator apply pending]
→ railiance-infra bootstrap-ssh-ca (CA pubkey + principals) [T5]
→ ops-warden warden sign smoke [WP-0008 T2]
→ (later) flex-auth policy.enabled [WP-0008 T5]
```
On **Track A (live):** skip init/unseal steps; start at **openbao-configure-ssh**.
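
The track-dependent entry point can be sketched as a small helper that lists which chain steps apply. The step labels are copied from the chain above, with the later flex-auth step left out. The function name is hypothetical, and the helper only prints steps; it does not run them.

```bash
# Sketch: list the chain steps that apply to a track.
# Track A (live) starts at openbao-configure-ssh; Track B runs the full chain.
chain_steps() {
  case "${1:-}" in
    A) first="railiance-platform openbao-configure-ssh" ;;
    B) first="railiance-infra S1 baseline" ;;
    *) echo "usage: chain_steps A|B" >&2; return 2 ;;
  esac
  started=0
  while IFS= read -r step; do
    if [ "$step" = "$first" ]; then started=1; fi
    if [ "$started" -eq 1 ]; then echo "$step"; fi
  done <<'EOF'
railiance-infra S1 baseline
railiance-cluster S2 k3s HA
railiance-platform openbao-deploy
net-kingdom creds-bootstrap-agent
railiance-platform openbao-configure-initial
railiance-platform openbao-configure-ssh
railiance-infra bootstrap-ssh-ca
ops-warden warden sign smoke
EOF
}
```

`chain_steps A` prints only the last three steps, which keeps init and unseal out of reach on the live cluster.
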
---
## 8. Credential management note (ops-warden)
Operator feedback: manual `ssh-keygen` for WP-0008 T2 is acceptable for first
sign proof but insufficient long-term. ops-warden should eventually document or
automate actor key lifecycle (`warden issue`, credential roster, rotation).
**Deferred** until T5 + T2 sign path succeeds.
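
Until that lifecycle exists, the manual step can at least be made repeatable. The sketch below creates an actor keypair only when it is missing. The function name is hypothetical, and the empty passphrase (`-N ''`) is an assumption suited to an unattended test actor, not a recommendation for operator keys.

```bash
# Sketch: create an actor keypair if it does not exist yet.
# The empty passphrase is an assumption for an unattended test actor.
ensure_actor_key() {
  # $1 = private key path, $2 = key comment (actor name)
  if [ -f "$1" ]; then
    echo "exists: $1"
  else
    ssh-keygen -q -t ed25519 -N '' -C "$2" -f "$1" && echo "created: $1"
  fi
}
```

For the test actor above this would be `ensure_actor_key ~/.ssh/agt-state-hub-bridge_ed25519 agt-state-hub-bridge`. Once a certificate has been issued, `ssh-keygen -L -f <cert>` shows its principals and validity window.
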
---
## 9. Decisions log
| Date | Decision |
| --- | --- |
| 2026-06-17 | All three unseal custody models are canon; start automation-first |
| 2026-06-17 | Console blocks planned models with hints; only `sops-held-automation` selectable |
| 2026-06-17 | Live cluster uses Track A; greenfield uses Track B + separate metadata |
| 2026-06-17 | No `platform/operators/ops-warden` KV for SSH signing bootstrap |
| 2026-06-17 | Implement T5 on live OpenBao before greenfield full loop |
---
## 10. Next actions (ordered)
1. ~~Persist this assessment~~ (this file)
2. ~~**NET-WP-0020 T5** — automation artifacts in railiance-platform + railiance-infra~~ (2026-06-18)
3. **Operator apply:** `make openbao-configure-ssh`, then `make bootstrap-ssh-ca` (Track A)
4. **WP-0008 T2:** `warden sign` smoke, then append the result to `openbao-production-verify.md`
5. **NET-WP-0020 T2** — wire `creds-bootstrap-agent.sh` for greenfield init/unseal
6. **NET-WP-0020 T3/T4** — unlock attended + auto-unseal console paths
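
Between steps 3 and 4 it helps to confirm that the engine actually exists before attempting the sign smoke. This is a minimal sketch, assuming an authenticated `bao` CLI and the JSON output of `bao secrets list -format=json`, which is keyed by mount path. The function names are hypothetical.

```bash
# Sketch: gate the sign smoke on the ssh/ mount being present.
# Reads JSON keyed by mount path, as printed by `bao secrets list -format=json`.
has_ssh_mount() {
  grep -q '"ssh/"[[:space:]]*:'
}

ssh_engine_gate() {
  if bao secrets list -format=json | has_ssh_mount; then
    echo "ssh/ enabled: proceed to warden sign smoke"
  else
    echo "ssh/ missing: run make openbao-configure-ssh first" >&2
    return 1
  fi
}
```

A non-zero exit from `ssh_engine_gate` means step 3 has not taken effect yet, so a failing `warden sign` at that point says nothing about ops-warden itself.
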
---
## 11. Related files
| Path | Role |
| --- | --- |
| `docs/openbao-unseal-custody-models.md` | Unseal custody canon |
| `docs/smooth-bootstrap-guide.md` | Step 5 unseal model table |
| `workplans/NET-WP-0020-openbao-unseal-custody-and-ssh-automation.md` | Active workplan |
| `ops-warden/history/2026-06-17-openbao-production-verify.md` | Health + SSH engine gap |
| `ops-warden/workplans/WARDEN-WP-0008-*.md` | Production sign verification |
| `railiance-platform/docs/openbao.md` | Deploy + attended ceremony |
| `ops-warden/wiki/OpenBaoSshEngineChecklist.md` | Role TTL + verify procedure |
| `ops-warden/history/2026-06-17-post-wp0007-reassessment.md` | ops-warden completeness |