Files
ops-warden/wiki/playbooks/ops-bridge-tunnel-cert.md
tegwick d6088e4e16 Implement WP-0022 audit trail and WP-0023 INTENT–SCOPE closeout
Add unified metadata-only audit.jsonl with secret-material guard, instrument
sign/access/worker paths, and expose warden activity CLI. Surface broker hint
when VAULT_TOKEN is unset, refresh INTENT/SCOPE docs, and add production
integration checklists plus catalog lane promotion playbook.
2026-07-01 23:32:38 +02:00

169 lines
5.8 KiB
Markdown

# ops-bridge Tunnel — cert_command Migration
Date: 2026-06-24
Workplan: WARDEN-WP-0013 T3
Catalog: `ops-bridge-tunnel`
Migrate an ops-bridge tunnel from **static SSH keys** to **short-lived warden-signed
certificates** via the `cert_command` contract (`wiki/CertCommandInterface.md`).
ops-warden documents the migration; **ops-bridge** owns tunnel config changes.
---
## Step 0 — Readiness gate (run this first)
Before editing any tunnel config, run the read-only readiness gate (WARDEN-WP-0016).
It confirms ops-warden's side is set — actor inventory, TTL, public key, and (optionally)
host principals — **without signing anything**:
```bash
python scripts/check_tunnel_cert_readiness.py \
--actor agt-state-hub-bridge \
--pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub \
--config ~/.config/warden/warden.yaml \
--infra ~/railiance-infra/ansible/inventory/ssh_principals.yaml
```
Exit 0 = ready, 1 = a check failed (fix before proceeding), 2 = bad input. The
Prerequisites and Migration checklist below are the human-readable backing for what the
gate verifies. To additionally prove the `cert_command` contract end to end against a
**local** backend (issues a throwaway cert, validates identity/principals/TTL), add
`--sign-smoke` with a local `warden.yaml`.
---
## Prerequisites
- [ ] Actor registered in `~/.config/warden/inventory.yaml` (see `wiki/ActorInventoryPatterns.md`)
- [ ] Actor keypair on disk (`ssh_key` private, `.pub` for signing)
- [ ] Production `warden.yaml` with `backend: vault` and valid scoped `VAULT_TOKEN`
- [ ] Host trusts warden/OpenBao CA (`railiance-infra` `bootstrap-ssh-ca`)
- [ ] Host principal allows the actor's principals (`railiance-infra` `ssh_principals.yaml`)
---
## Pilot tunnel: `agt-state-hub-bridge`
| Field | Value |
| --- | --- |
| Actor | `agt-state-hub-bridge` |
| Type | `agt` |
| Principals | `agt-task-bridge` |
| TTL | 24 h |
| Private key | `~/.ssh/agt-state-hub-bridge_ed25519` |
| Public key | `~/.ssh/agt-state-hub-bridge_ed25519.pub` |
| cert_command | `warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub` |
### Pre-migration smoke (operator workstation)
```bash
export VAULT_TOKEN="<scoped-warden-sign-token>" # never commit or paste in chat
warden status agt-state-hub-bridge
warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub | head -1
```
Confirm exit 0 and cert line starts with `ssh-ed25519-cert-v01@openssh.com`.
---
## Migration checklist
### 1. Inventory and signing path
- [ ] Actor exists: `warden inventory list` shows `agt-state-hub-bridge`
- [ ] `warden sign` succeeds with production OpenBao backend
- [ ] `signatures.log` records the sign (`~/.local/state/warden/signatures.log`)
### 2. ops-bridge tunnel config
Edit `~/.config/bridge/tunnels.yaml` (ops-bridge repo owns schema; example below):
```yaml
tunnels:
state-hub-coulombcore:
host: coulombcore
remote_port: 8001
local_port: 8000
ssh_user: agt-state-hub-bridge
ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
actor: agt-state-hub-bridge
cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"
```
- [ ] `cert_command` uses the **public** key path (warden reads pubkey, writes cert to stdout)
- [ ] `ssh_user` matches the certificate identity / host expectation
- [ ] Remove or disable static-key-only fallback once cert path is verified
### 3. Host-side verification
- [ ] Principal `agt-task-bridge` present in `railiance-infra` `ssh_principals.yaml` for target host
- [ ] Run `scripts/check_principals_drift.py` if inventory `hosts` section documents allowed principals
### 4. Tunnel smoke
```bash
# ops-bridge (from ops-bridge repo)
bridge status state-hub-coulombcore
bridge up state-hub-coulombcore
```
- [ ] Tunnel establishes without static cert file on disk
- [ ] Re-run `bridge up` after cert TTL expires — `cert_command` re-issues automatically
### 5. Policy gate (optional, after FLEX-WP-0007)
When `policy.enabled: true`, confirm `signatures.log` includes `policy_decision_id`
on tunnel-driven signs. See `wiki/PolicyGatedSigning.md`.
---
## Rollback
Keep the static key path until cert_command smoke passes. To roll back:
1. Remove `cert_command` from tunnel config
2. Restore prior static-key or `CertificateFile` workflow
3. Document rollback in ops-bridge session notes (not in git secrets)
---
## Static-key tunnels (legacy)
Tunnels using `agt-claude-*` or other long-lived keys are **out of scope** for this
pilot. Migrate per-tunnel when ops-bridge owner prioritizes them.
---
## Live cutover evidence template
When ops-bridge completes the pilot cutover, record **non-secret** evidence only.
Post a State Hub progress note or save under `history/` with these fields:
| Field | Example / instruction |
| --- | --- |
| Tunnel id | `state-hub-coulombcore` |
| Actor | `agt-state-hub-bridge` |
| Readiness gate | `check_tunnel_cert_readiness.py` exit code + date |
| First `bridge up` success | ISO timestamp (tunnel established) |
| First warden-signed connection | ISO timestamp from `signatures.log` or `warden activity --kind sign` |
| `cert_command` in use | yes / no |
| Rollback tested | yes / no — static-key path still available until verified |
| Operator | human handle or agent id |
| Cross-links | ops-bridge session notes, this playbook |
**Do not** include `VAULT_TOKEN`, private keys, cert bodies, or host passwords in
evidence. Use `warden activity --days 1 --kind sign --json` for sign metadata.
Coordination: message `ops-bridge` on State Hub with pointer to this template when
starting cutover (WARDEN-WP-0023).
---
## See also
- `wiki/CertCommandInterface.md`
- `wiki/OpsWardenConfig.md` — cert_command example
- `wiki/playbooks/operator-openbao-token-hygiene.md`
- `wiki/AuditTrail.md` — query recent signs via `warden activity`
- `warden route show ops-bridge-tunnel --json`