ops-warden/wiki/playbooks/ops-bridge-tunnel-cert.md
tegwick 8bbd22285e feat(WARDEN-WP-0016): ops-bridge cert_command readiness gate + handoff
Close ops-warden's side of the last Partial INTENT criterion (ops-bridge integrates
via a stable cert_command). The migration playbook and contract already existed; what
was missing was an automated readiness gate before touching tunnel config.

T1 — scripts/check_tunnel_cert_readiness.py: read-only preflight that asserts the
cert_command path is ready without signing — config/backend, actor inventory + TTL
within type max, pubkey exists/parses/not-private, principals present, and optional
host-principal deployment (mirrors check_principals_drift). Exit 0/1/2.

T2 — opt-in --sign-smoke: runs the cert_command against the local backend and validates
identity/principals/TTL of the emitted cert; refuses a vault backend. Window measured
from the cert's own valid_from->valid_before so it's timezone-robust (fixes a CEST
off-by-2h artifact). integration-marked test + a vault-refusal unit test.

T3 — playbook now leads with Step 0 readiness gate; ops-bridge handoff message sent.
T4 — SCOPE INTENT row: Partial -> Pilot-ready; known-gaps + SSH-lane list updated.

9 unit + 1 integration test, 209 default passing, lint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 19:50:28 +02:00


# ops-bridge Tunnel — cert_command Migration
Date: 2026-06-24
Workplan: WARDEN-WP-0013 T3
Catalog: `ops-bridge-tunnel`
Migrate an ops-bridge tunnel from **static SSH keys** to **short-lived warden-signed
certificates** via the `cert_command` contract (`wiki/CertCommandInterface.md`).
ops-warden documents the migration; **ops-bridge** owns tunnel config changes.

---
## Step 0 — Readiness gate (run this first)
Before editing any tunnel config, run the read-only readiness gate (WARDEN-WP-0016).
It confirms that ops-warden's side is in place (actor inventory, TTL, public key, and
optionally host principals) **without signing anything**:
```bash
python scripts/check_tunnel_cert_readiness.py \
  --actor agt-state-hub-bridge \
  --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub \
  --config ~/.config/warden/warden.yaml \
  --infra ~/railiance-infra/ansible/inventory/ssh_principals.yaml
```
Exit codes: 0 = ready, 1 = a check failed (fix it before proceeding), 2 = bad input.
The Prerequisites and Migration checklist sections below describe, in human-readable
form, what the gate verifies. To also prove the `cert_command` contract end to end
against a **local** backend (this issues a throwaway cert and validates its identity,
principals, and TTL), add `--sign-smoke` with a local `warden.yaml`.
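The exit-code contract above can be wrapped for use in automation. The following is a minimal sketch, not part of ops-warden: `describe_exit` and `run_gate` are hypothetical helper names, and only the three documented exit codes are assumed.

```python
import subprocess
import sys

# Meanings of the readiness gate's exit codes, as documented in this playbook.
EXIT_MEANINGS = {
    0: "ready",
    1: "check failed: fix before proceeding",
    2: "bad input",
}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning (hypothetical helper)."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(argv: list[str]) -> int:
    """Run the gate command given as argv and report what its exit code means."""
    result = subprocess.run(argv, check=False)
    print(f"readiness gate: {describe_exit(result.returncode)}", file=sys.stderr)
    return result.returncode
```

A caller would pass the full command from the example above as `argv` and stop the migration on any non-zero return value.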

---
## Prerequisites
- [ ] Actor registered in `~/.config/warden/inventory.yaml` (see `wiki/ActorInventoryPatterns.md`)
- [ ] Actor keypair on disk (`ssh_key` private, `.pub` for signing)
- [ ] Production `warden.yaml` with `backend: vault` and a valid, scoped `VAULT_TOKEN`
- [ ] Host trusts warden/OpenBao CA (`railiance-infra` `bootstrap-ssh-ca`)
- [ ] Host principal allows the actor's principals (`railiance-infra` `ssh_principals.yaml`)
---
## Pilot tunnel: `agt-state-hub-bridge`
| Field | Value |
| --- | --- |
| Actor | `agt-state-hub-bridge` |
| Type | `agt` |
| Principals | `agt-task-bridge` |
| TTL | 24 h |
| Private key | `~/.ssh/agt-state-hub-bridge_ed25519` |
| Public key | `~/.ssh/agt-state-hub-bridge_ed25519.pub` |
| cert_command | `warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub` |
### Pre-migration smoke (operator workstation)
```bash
export VAULT_TOKEN="<scoped-warden-sign-token>" # never commit or paste in chat
warden status agt-state-hub-bridge
warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub | head -1
```
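Confirm that both commands exit 0 and that the certificate line starts with
`ssh-ed25519-cert-v01@openssh.com`. Because the second command pipes into `head`, its
reported exit status is that of `head`, so check the output line as well as the status.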
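The prefix check on the certificate line can also be scripted. This is a small sketch under the assumption that `warden sign` writes a single OpenSSH certificate line to stdout, as described in this playbook; `is_ed25519_cert_line` is a hypothetical helper name.

```python
# Key type string that an OpenSSH Ed25519 user certificate line starts with.
CERT_KEY_TYPE = "ssh-ed25519-cert-v01@openssh.com"


def is_ed25519_cert_line(line: str) -> bool:
    """Return True if the first field is the Ed25519 cert key type and a blob follows."""
    fields = line.split()
    return len(fields) >= 2 and fields[0] == CERT_KEY_TYPE
```

Comparing the first whitespace-separated field is slightly stricter than a plain prefix match, because it also rejects a line that carries the key type but no key blob.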

---
## Migration checklist
### 1. Inventory and signing path
- [ ] Actor exists: `warden inventory list` shows `agt-state-hub-bridge`
- [ ] `warden sign` succeeds with the production OpenBao backend
- [ ] `signatures.log` records the sign (`~/.local/state/warden/signatures.log`)
### 2. ops-bridge tunnel config
Edit `~/.config/bridge/tunnels.yaml` (the ops-bridge repo owns the schema; the example below is illustrative):
```yaml
tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: agt-state-hub-bridge
    ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
    actor: agt-state-hub-bridge
    cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"
```
- [ ] `cert_command` uses the **public** key path (warden reads pubkey, writes cert to stdout)
- [ ] `ssh_user` matches the certificate identity / host expectation
- [ ] Remove or disable static-key-only fallback once cert path is verified
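The public-key item in the checklist above lends itself to a quick automated check. The sketch below assumes a tunnel entry has already been loaded into a dict with the keys shown in the example; `check_tunnel_entry` is a hypothetical helper, and the `.pub` suffix test is a naming convention taken from this playbook, not something ops-bridge is known to enforce.

```python
import shlex


def check_tunnel_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one tunnel entry (empty list = OK)."""
    problems = []
    argv = shlex.split(entry.get("cert_command", ""))
    if not argv:
        return ["cert_command is missing or empty"]

    # Locate the value that follows --pubkey, if any.
    if "--pubkey" not in argv or argv.index("--pubkey") + 1 >= len(argv):
        problems.append("cert_command has no --pubkey argument")
    else:
        pubkey = argv[argv.index("--pubkey") + 1]
        if not pubkey.endswith(".pub"):
            problems.append(f"--pubkey should point at a public key, got {pubkey}")

    # ssh_key is the private half and should not be the .pub file.
    if str(entry.get("ssh_key", "")).endswith(".pub"):
        problems.append("ssh_key should be the private key, not the .pub file")
    return problems
```

For the pilot entry shown above the function returns an empty list; swapping the private key path into `cert_command` produces one problem.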
### 3. Host-side verification
- [ ] Principal `agt-task-bridge` present in `railiance-infra` `ssh_principals.yaml` for target host
- [ ] Run `scripts/check_principals_drift.py` if the inventory's `hosts` section documents allowed principals
### 4. Tunnel smoke
```bash
# ops-bridge (from ops-bridge repo)
bridge status state-hub-coulombcore
bridge up state-hub-coulombcore
```
- [ ] Tunnel establishes without a static certificate file on disk
- [ ] Re-run `bridge up` after the cert TTL expires; `cert_command` should re-issue the cert automatically
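To judge when the 24 h TTL has elapsed, the validity window can be read from the certificate itself. The sketch below assumes the text output of `ssh-keygen -L -f <cert>`, which prints a line of the form `Valid: from <start> to <end>` with timestamps in local time; `cert_window` is a hypothetical helper. Because both timestamps share the same zone, their difference does not depend on the operator's timezone (a DST change inside the window would be an exception).

```python
import re
from datetime import datetime, timedelta

# Matches the bounded validity line printed by `ssh-keygen -L` (assumed format).
VALID_RE = re.compile(r"Valid:\s+from\s+(\S+)\s+to\s+(\S+)")


def cert_window(keygen_output: str) -> timedelta:
    """Return valid_before minus valid_from, parsed from ssh-keygen -L output."""
    match = VALID_RE.search(keygen_output)
    if match is None:
        # Also raised for "Valid: forever", which has no bounded window.
        raise ValueError("no bounded 'Valid: from ... to ...' line found")
    start, end = (datetime.strptime(s, "%Y-%m-%dT%H:%M:%S") for s in match.groups())
    return end - start
```

Comparing the result with `timedelta(hours=24)` gives a simple check that an issued cert matches the pilot table's TTL.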
### 5. Policy gate (optional, after FLEX-WP-0007)
When `policy.enabled: true` is set, confirm that `signatures.log` records a
`policy_decision_id` for tunnel-driven signs. See `wiki/PolicyGatedSigning.md`.

---
## Rollback
Keep the static key path until the `cert_command` smoke passes. To roll back:
1. Remove `cert_command` from tunnel config
2. Restore prior static-key or `CertificateFile` workflow
3. Document rollback in ops-bridge session notes (not in git secrets)
---
## Static-key tunnels (legacy)
Tunnels using `agt-claude-*` or other long-lived keys are **out of scope** for this
pilot. Migrate them per tunnel when the ops-bridge owner prioritizes them.

---
## See also
- `wiki/CertCommandInterface.md`
- `wiki/OpsWardenConfig.md` — cert_command example
- `wiki/playbooks/operator-openbao-token-hygiene.md`
- `warden route show ops-bridge-tunnel --json`