OpenBao, SSH, and Bootstrap Custody — State Assessment

Date: 2026-06-17
Author: codex (with operator session evidence)
Purpose: Persist current state, concepts, and navigation map so security setup work does not lose context while implementing NET-WP-0020 T5 and related automation.

Repos: net-kingdom, railiance-platform, railiance-infra, ops-warden


1. Executive summary

NetKingdom's first platform bootstrap is complete (console stage S6, OpenBao live at https://bao.coulomb.social). SSH certificate infrastructure via OpenBao has not been started: the ssh/ secrets engine is not enabled, and hosts still use legacy static-key SSH that predates OpenBao and ops-warden.

We adopted an automation-first custody strategy for future greenfield rebuilds (sops-held-automation), while blocking the unimplemented production models (attended-ceremony, auto-unseal-transit) in the security bootstrap console. Selecting this model does not re-initialize the live cluster.

Next implementation slice (after this assessment): NET-WP-0020 T5 — declarative OpenBao SSH engine + railiance-infra host CA trust — on the live cluster first, then prove full unattended chain on greenfield 3-node.


2. Current state (verified 2026-06-17)

2.1 NetKingdom security bootstrap (operator workstation)

| Item | State |
| --- | --- |
| Metadata | net-kingdom/.local/security-bootstrap.json |
| Console stage | S6 (Reopen under custody) |
| King custody | Approved (temporary-single-king or equivalent) |
| OpenBao unseal custody model | sops-held-automation selected (2026-06-17) |
| OpenBao init | openbao_initialized: true (from first attended bootstrap) |
| All bootstrap gates | done (preflight, OIDC, restore drill, platform reopen) |
| Plaintext bootstrap secrets | absent (good) |
| Encrypted bundle | sso-mfa/bootstrap/secrets.enc (11 files) |

Interpretation: Selecting sops-held-automation records the preferred model for the next rebuild. The init ceremony gate shows done because OpenBao was already initialized manually under NET-WP-0015–0017, not because SOPS-held automation ran on this cluster.

2.2 OpenBao platform (Railiance01 / production endpoint)

| Check | Result |
| --- | --- |
| /v1/sys/health | initialized, unsealed, v2.5.4+ |
| UI login | netkingdom / platform-admin (KeyCape OIDC): works |
| ssh/ secrets engine | Not enabled (operator confirmed) |
| platform/operators/ops-warden KV | Not required for SSH signing |

Evidence: ops-warden/history/2026-06-17-openbao-production-verify.md
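
The health row above can be re-checked from a script. The following is a minimal sketch, assuming OpenBao keeps the Vault-compatible `/v1/sys/health` response fields `initialized`, `sealed`, and `version`; the helper names `summarize_health` and `fetch_health` are hypothetical and not part of any repo listed here.

```python
import json
import urllib.request

BAO_ADDR = "https://bao.coulomb.social"  # endpoint named in this assessment


def summarize_health(payload: dict) -> str:
    """Reduce a sys/health payload to the three facts tracked in section 2.2."""
    initialized = "initialized" if payload.get("initialized") else "not initialized"
    # A missing "sealed" field is treated as sealed, the conservative reading.
    sealed = "sealed" if payload.get("sealed", True) else "unsealed"
    version = payload.get("version", "unknown")
    return f"{initialized}, {sealed}, version {version}"


def fetch_health(addr: str = BAO_ADDR) -> dict:
    """GET /v1/sys/health (assumed Vault-compatible; no token required).

    Sealed or uninitialized nodes answer with a non-200 status, which
    urlopen raises as urllib.error.HTTPError.
    """
    with urllib.request.urlopen(f"{addr}/v1/sys/health", timeout=10) as resp:
        return json.load(resp)
```

Calling `summarize_health(fetch_health())` against the live endpoint should reproduce the first table row. This endpoint says nothing about the ssh/ engine; listing mounts needs an authenticated call.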

2.3 ops-warden workstation

| Item | State |
| --- | --- |
| ~/.config/warden/warden.yaml | Present (backend: vault, bao.coulomb.social) |
| ~/.config/warden/inventory.yaml | Present (seed actors) |
| Test keypair | ~/.ssh/agt-state-hub-bridge_ed25519 created |
| warden sign against production | Blocked: no SSH engine |
| WP-0008 T2 | Waiting on SSH engine + host trust |
| Policy gate (WP-0007) | Shipped, policy.enabled: false default |

2.4 SSH infrastructure lineage

Legacy (today on hosts)          Target (not built)
────────────────────────         ──────────────────
Static keys / authorized_keys    OpenSSH CA + short-lived certs
CA key on disk (if any)          OpenBao ssh/ engine CA
Predates OpenBao                 ops-warden warden sign
                                 railiance-infra principals + TrustedUserCAKeys
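
On the host side, the target column comes down to two sshd_config directives. The sketch below renders them, assuming hypothetical file locations; only the directive names `TrustedUserCAKeys` and `AuthorizedPrincipalsFile` are standard OpenSSH, and the planned bootstrap-ssh-ca target may lay things out differently.

```python
def render_sshd_ca_trust(ca_pubkey_path: str, principals_dir: str) -> str:
    """Render the sshd_config lines that make a host trust an SSH user CA.

    Both paths are illustrative assumptions, not taken from railiance-infra.
    """
    lines = [
        # Public key of the CA whose signed user certificates sshd accepts.
        f"TrustedUserCAKeys {ca_pubkey_path}",
        # One file per local user (%u) listing the principals allowed to log in.
        f"AuthorizedPrincipalsFile {principals_dir}/%u",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sshd_ca_trust("/etc/ssh/trusted-user-ca-keys.pem",
                               "/etc/ssh/auth_principals"), end="")
```

Once hosts carry these lines, a certificate signed by the OpenBao ssh/ engine is accepted only if one of its principals appears in the target user's principals file.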

3. Core concepts (do not conflate)

3.1 Two custody dimensions

| Dimension | Field / doc | What it governs |
| --- | --- | --- |
| King / platform recovery custody | custody_mode in metadata | Who holds recovery authority (single king vs 2-of-3) |
| OpenBao init/unseal execution | openbao_unseal_custody_model | How init/unseal runs (automation vs attended vs KMS) |

Both are valid and orthogonal. See docs/openbao-unseal-custody-models.md.
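
To keep the two dimensions apart when reading the metadata file, a script can report them side by side. This is a minimal sketch, assuming `custody_mode` and `openbao_unseal_custody_model` are top-level keys of the metadata JSON (the layout is not confirmed here); the helper names are hypothetical.

```python
import json
from pathlib import Path


def custody_dimensions(metadata: dict) -> dict:
    """Return the two custody fields side by side; neither implies the other."""
    return {
        "recovery_custody": metadata.get("custody_mode"),
        "unseal_execution": metadata.get("openbao_unseal_custody_model"),
    }


def load_metadata(path: str = ".local/security-bootstrap.json") -> dict:
    """Read the bootstrap metadata; the path is relative to the net-kingdom checkout."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Illustrative values only, not a dump of the real metadata file.
    sample = {
        "custody_mode": "temporary-single-king",
        "openbao_unseal_custody_model": "sops-held-automation",
    }
    print(custody_dimensions(sample))
```

Changing one field leaves the other untouched, which is the sense in which the dimensions are orthogonal.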

3.2 Three unseal custody models (init/unseal execution)

| Model ID | Status | Use |
| --- | --- | --- |
| sops-held-automation | Implemented (console) | Default for greenfield fast test cycles; entry: creds-bootstrap-agent.sh |
| attended-ceremony | Planned (blocked in console) | Production trust; matches first bootstrap already performed |
| auto-unseal-transit | Planned (blocked in console) | HA rebuilds without manual unseal |

Development strategy (agreed 2026-06-17):

  1. Max automation first → prove SSH engine + host CA + warden sign loops
  2. Add attended ceremony gates for production profiles
  3. Add auto-unseal for ThreePhoenix HA
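
The selection gate implied by the status column can be sketched as follows. This is an illustration of the rule "only implemented models are selectable", not the console's actual code; `select_model` and the `MODELS` mapping are hypothetical names.

```python
# Model statuses as recorded in this assessment (2026-06-17).
MODELS = {
    "sops-held-automation": "implemented",
    "attended-ceremony": "planned",
    "auto-unseal-transit": "planned",
}


def select_model(model_id: str) -> str:
    """Accept a model only if it is known and implemented; otherwise raise."""
    status = MODELS.get(model_id)
    if status is None:
        raise ValueError(f"unknown unseal custody model: {model_id}")
    if status != "implemented":
        raise ValueError(f"{model_id} is {status} and blocked in the console")
    return model_id


if __name__ == "__main__":
    print(select_model("sops-held-automation"))
```

Unlocking attended-ceremony or auto-unseal-transit later would then be a one-line status change in the mapping, matching steps 2 and 3 of the strategy.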

3.3 Two operational tracks

Track A — LIVE cluster (Railiance01 today)
  • OpenBao: up, attended init done
  • Gap: enable ssh/ engine + host CA trust
  • Work: NET-WP-0020 T5, ops-warden WP-0008 T2 verify
  • Do NOT re-run init; do NOT require platform KV secret for warden

Track B — GREENFIELD 3-node (future automation proof)
  • Clean Linux + root SSH on 3 machines
  • S1 infra → S2 k3s HA → S3 OpenBao deploy
  • sops-held-automation → creds-bootstrap-agent init/unseal (T2)
  • T5 SSH engine + host CA → warden sign smoke
  • Use separate metadata e.g. .local/security-bootstrap-greenfield.json

3.4 What does NOT help SSH signing

| Action | Why irrelevant |
| --- | --- |
| Create platform/operators/ops-warden KV secret | KV stores secrets; warden calls `ssh/sign/<role>` API |
| Browser UI login alone | Does not set VAULT_TOKEN for CLI/warden |
| Re-selecting custody model on S6 metadata | Records preference only; does not enable ssh/ engine |
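
The first row is the key point: signing is an API call against the SSH engine, not a KV read. Below is a minimal sketch of that request, assuming OpenBao keeps the Vault-compatible `POST /v1/ssh/sign/<role>` endpoint and `X-Vault-Token` header. The role name, principal, and TTL are placeholders, and `build_sign_request` is a hypothetical helper, not part of ops-warden.

```python
import json
import urllib.request


def build_sign_request(addr: str, role: str, public_key: str,
                       principal: str, token: str,
                       ttl: str = "30m") -> urllib.request.Request:
    """Build (but do not send) a user-certificate signing request."""
    body = json.dumps({
        "public_key": public_key,        # contents of the actor's .pub file
        "valid_principals": principal,   # must be allowed by the role
        "cert_type": "user",
        "ttl": ttl,                      # placeholder; the role may cap it
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{addr}/v1/ssh/sign/{role}",
        data=body,
        method="POST",
        # The token comes from a CLI login (VAULT_TOKEN), not the browser UI.
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_sign_request("https://bao.coulomb.social", "example-role",
                             "ssh-ed25519 AAAA... example", "deploy", "example-token")
    print(req.get_method(), req.full_url)
```

Under the same compatibility assumption, the signed certificate would come back in the response's `data.signed_key` field. Until the ssh/ engine is enabled, this path does not exist on the live cluster, which is why the sign step is blocked.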

4. Repo ownership (NetKingdom map)

| Concern | Owner | Artifact |
| --- | --- | --- |
| Bootstrap orchestration & custody canon | net-kingdom | console, smooth-bootstrap-guide, NET-WP-0020 |
| OpenBao deploy + post-unseal config | railiance-platform | openbao-deploy, openbao-configure-initial |
| OpenBao SSH engine enable + roles | railiance-platform (T5) | openbao-configure-ssh (planned) |
| Host TrustedUserCAKeys + principals | railiance-infra (T5) | bootstrap-ssh-ca (planned) |
| Sign CLI + inventory + audit log | ops-warden | warden sign, WP-0007 policy gate |
| flex-auth pre-sign policies | flex-auth | WP-0008 T5 (later) |

5. Workplan map (active strands)

| ID | Repo | Focus | Status |
| --- | --- | --- | --- |
| NET-WP-0020 | net-kingdom | Unseal custody models + SSH automation path | T1 done; T5 next |
| WARDEN-WP-0008 | ops-warden | Production warden sign evidence | T2 wait on T5 |
| RAIL-BS-WP-0007 | railiance-cluster | ThreePhoenix 3-node HA | Prerequisite for Track B at scale |
| NET-WP-0018 | net-kingdom | Smooth bootstrap guide | S6 reached on live bootstrap |

6. Console commands reference (operator session)

cd ~/net-kingdom

make security-bootstrap-openbao-unseal-custody-models
make security-bootstrap-select-openbao-unseal-custody-model MODEL=sops-held-automation
make security-bootstrap-console

Observed (2026-06-17): all gates done, stage S6, unseal model gate done with automation entry sso-mfa/bootstrap/creds-bootstrap-agent.sh, init ceremony done (historical init). The console reports the next safe action as "Review related workplans", which is expected for a completed bootstrap and is not an error.

Greenfield preview (when T2 exists):

export METADATA=.local/security-bootstrap-greenfield.json
make security-bootstrap-metadata-init METADATA="$METADATA"
make security-bootstrap-select-openbao-unseal-custody-model \
  MODEL=sops-held-automation METADATA="$METADATA"
make security-bootstrap-console METADATA="$METADATA"
# Expect lower stage, init gate status "automation"

7. Automation chain (target end state)

[3 nodes root SSH]
    → railiance-infra S1 baseline
    → railiance-cluster S2 k3s HA
    → railiance-platform openbao-deploy
    → net-kingdom creds-bootstrap-agent (sops-held init/unseal)     [T2]
    → railiance-platform openbao-configure-initial                  [exists]
    → railiance-platform openbao-configure-ssh                       [T5 — scripted; operator apply pending]
    → railiance-infra bootstrap-ssh-ca (CA pubkey + principals)     [T5]
    → ops-warden warden sign smoke                                  [WP-0008 T2]
    → (later) flex-auth policy.enabled                              [WP-0008 T5]

On Track A (live): skip init/unseal steps; start at openbao-configure-ssh.
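
The difference between the two tracks can be expressed as a slice of the same chain. This is a sketch for orientation only, assuming the step labels from the diagram above; `steps_for` is a hypothetical helper and the later flex-auth step is left out.

```python
# Ordered chain from the diagram above, as "repo target" labels.
CHAIN = [
    "railiance-infra S1 baseline",
    "railiance-cluster S2 k3s HA",
    "railiance-platform openbao-deploy",
    "net-kingdom creds-bootstrap-agent",
    "railiance-platform openbao-configure-initial",
    "railiance-platform openbao-configure-ssh",
    "railiance-infra bootstrap-ssh-ca",
    "ops-warden warden sign smoke",
]

TRACK_A_START = "railiance-platform openbao-configure-ssh"


def steps_for(track: str) -> list:
    """Track B runs the whole chain; Track A starts at the SSH engine step."""
    if track == "B":
        return list(CHAIN)
    if track == "A":
        return CHAIN[CHAIN.index(TRACK_A_START):]
    raise ValueError(f"unknown track: {track}")


if __name__ == "__main__":
    for step in steps_for("A"):
        print(step)
```

On Track A this yields three steps, none of which touch init or unseal, consistent with the rule that the live cluster must not be re-initialized.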


8. Credential management note (ops-warden)

Operator feedback: manual ssh-keygen for WP-0008 T2 is acceptable for first sign proof but insufficient long-term. ops-warden should eventually document or automate actor key lifecycle (warden issue, credential roster, rotation). Deferred until T5 + T2 sign path succeeds.


9. Decisions log

| Date | Decision |
| --- | --- |
| 2026-06-17 | All three unseal custody models are canon; start automation-first |
| 2026-06-17 | Console blocks planned models with hints; only sops-held-automation selectable |
| 2026-06-17 | Live cluster uses Track A; greenfield uses Track B + separate metadata |
| 2026-06-17 | No platform/operators/ops-warden KV for SSH signing bootstrap |
| 2026-06-17 | Implement T5 on live OpenBao before greenfield full loop |

10. Next actions (ordered)

  1. Persist this assessment (this file)
  2. NET-WP-0020 T5: automation artifacts in railiance-platform + railiance-infra (2026-06-18)
  3. Operator apply: make openbao-configure-ssh, then make bootstrap-ssh-ca (Track A)
  4. WP-0008 T2: warden sign smoke + append openbao-production-verify.md
  5. NET-WP-0020 T2: wire creds-bootstrap-agent.sh for greenfield init/unseal
  6. NET-WP-0020 T3/T4: unlock attended + auto-unseal console paths

| Path | Role |
| --- | --- |
| docs/openbao-unseal-custody-models.md | Unseal custody canon |
| docs/smooth-bootstrap-guide.md | Step 5 unseal model table |
| workplans/NET-WP-0020-openbao-unseal-custody-and-ssh-automation.md | Active workplan |
| ops-warden/history/2026-06-17-openbao-production-verify.md | Health + SSH engine gap |
| ops-warden/workplans/WARDEN-WP-0008-*.md | Production sign verification |
| railiance-platform/docs/openbao.md | Deploy + attended ceremony |
| ops-warden/wiki/OpenBaoSshEngineChecklist.md | Role TTL + verify procedure |
| ops-warden/history/2026-06-17-post-wp0007-reassessment.md | ops-warden completeness |