ops-warden/wiki/playbooks/ops-bridge-tunnel-cert.md
tegwick 8bbd22285e feat(WARDEN-WP-0016): ops-bridge cert_command readiness gate + handoff
Close ops-warden's side of the last Partial INTENT criterion (ops-bridge integrates
via a stable cert_command). The migration playbook and contract already existed; what
was missing was an automated readiness gate before touching tunnel config.

T1 — scripts/check_tunnel_cert_readiness.py: read-only preflight that asserts the
cert_command path is ready without signing — config/backend, actor inventory + TTL
within type max, pubkey exists/parses/not-private, principals present, and optional
host-principal deployment (mirrors check_principals_drift). Exit 0/1/2.

T2 — opt-in --sign-smoke: runs the cert_command against the local backend and validates
identity/principals/TTL of the emitted cert; refuses a vault backend. Window measured
from the cert's own valid_from->valid_before so it's timezone-robust (fixes a CEST
off-by-2h artifact). integration-marked test + a vault-refusal unit test.

T3 — playbook now leads with Step 0 readiness gate; ops-bridge handoff message sent.
T4 — SCOPE INTENT row: Partial -> Pilot-ready; known-gaps + SSH-lane list updated.

9 unit + 1 integration test, 209 default passing, lint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 19:50:28 +02:00


ops-bridge Tunnel — cert_command Migration

Date: 2026-06-24
Workplan: WARDEN-WP-0013 T3
Catalog: ops-bridge-tunnel

Migrate an ops-bridge tunnel from static SSH keys to short-lived warden-signed certificates via the cert_command contract (wiki/CertCommandInterface.md).

ops-warden documents the migration; ops-bridge owns tunnel config changes.


Step 0 — Readiness gate (run this first)

Before editing any tunnel config, run the read-only readiness gate (WARDEN-WP-0016). It confirms that ops-warden's side is ready (actor inventory, TTL within the actor type's maximum, public key, and optionally host principals) without signing anything:

```bash
python scripts/check_tunnel_cert_readiness.py \
  --actor agt-state-hub-bridge \
  --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub \
  --config ~/.config/warden/warden.yaml \
  --infra ~/railiance-infra/ansible/inventory/ssh_principals.yaml
```

Exit codes: 0 = ready, 1 = a check failed (fix it before proceeding), 2 = bad input. The Prerequisites and Migration checklist below are the human-readable backing for what the gate verifies. To additionally prove the cert_command contract end to end, add --sign-smoke: it issues a throwaway cert against a local backend and validates its identity, principals, and TTL. --sign-smoke refuses a vault backend, so point --config at a warden.yaml that uses the local backend.
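The public-key part of the gate (key exists, parses, and is not a private key) can be approximated in a few lines of stdlib Python. This is a sketch under stated assumptions, not the code in scripts/check_tunnel_cert_readiness.py; check_pubkey_text is a hypothetical name. It relies on the OpenSSH public-key format, where the base64 body starts with a length-prefixed copy of the key type named at the start of the line.

```python
import base64
import binascii
import struct


def check_pubkey_text(text: str) -> list[str]:
    """Return problems with the contents of an OpenSSH .pub file (empty list = OK).

    Hypothetical helper approximating the gate's "parses / not private" checks;
    reading the file (the "exists" check) is left to the caller.
    """
    text = text.strip()
    if text.startswith("-----BEGIN"):
        # Private keys are armored with a BEGIN line; .pub files never are.
        return ["looks like a private key, not a .pub file"]
    parts = text.split()
    if len(parts) < 2:
        return ["expected '<type> <base64> [comment]'"]
    key_type, body = parts[0], parts[1]
    try:
        blob = base64.b64decode(body, validate=True)
    except binascii.Error:
        return ["key body is not valid base64"]
    if len(blob) < 4:
        return ["key blob too short"]
    # Blob layout: uint32 big-endian length, then the key type string.
    (n,) = struct.unpack(">I", blob[:4])
    if len(blob) < 4 + n:
        return ["key blob truncated"]
    embedded = blob[4:4 + n].decode("ascii", errors="replace")
    if embedded != key_type:
        return [f"type mismatch: line says {key_type!r}, blob says {embedded!r}"]
    return []
```

A caller would pass it the result of reading the expanded --pubkey path; a healthy ed25519 .pub yields an empty list.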


Prerequisites

  • Actor registered in ~/.config/warden/inventory.yaml (see wiki/ActorInventoryPatterns.md)
  • Actor keypair on disk: the private key referenced by ssh_key, plus the matching .pub that warden signs
  • Production warden.yaml with backend: vault and a valid, scoped VAULT_TOKEN
  • Target host trusts the warden/OpenBao CA (railiance-infra bootstrap-ssh-ca)
  • Target host's principals configuration allows the actor's principals (railiance-infra ssh_principals.yaml)
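Two of these prerequisites, the keypair on disk and a scoped VAULT_TOKEN, can be spot-checked locally before anything else runs. A minimal sketch, assuming the pilot paths; check_local_prereqs is a hypothetical helper, not part of ops-warden. It reports only whether the token is set, never its value, and flags private keys that group or others can access, because OpenSSH refuses to use such keys.

```python
import os
import stat
from pathlib import Path


def check_local_prereqs(private_key: str, env: dict[str, str]) -> list[str]:
    """Hypothetical spot-check of the keypair and VAULT_TOKEN prerequisites.

    Returns human-readable problems; an empty list means these checks pass.
    Inventory, CA trust, and host principals are not checked here.
    """
    problems = []
    priv = Path(private_key).expanduser()
    pub = priv.with_name(priv.name + ".pub")
    if not priv.is_file():
        problems.append(f"missing private key: {priv}")
    elif stat.S_IMODE(priv.stat().st_mode) & 0o077:
        # OpenSSH ignores private keys with any group/other permission bits.
        problems.append(f"private key permissions too open: {priv}")
    if not pub.is_file():
        problems.append(f"missing public key: {pub}")
    if not env.get("VAULT_TOKEN"):
        # Report presence only; never echo the token itself.
        problems.append("VAULT_TOKEN is not set")
    return problems


if __name__ == "__main__":
    for problem in check_local_prereqs("~/.ssh/agt-state-hub-bridge_ed25519", dict(os.environ)):
        print("FAIL:", problem)
```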

Pilot tunnel: agt-state-hub-bridge

| Field | Value |
|---|---|
| Actor | agt-state-hub-bridge |
| Type | agt |
| Principals | agt-task-bridge |
| TTL | 24 h |
| Private key | ~/.ssh/agt-state-hub-bridge_ed25519 |
| Public key | ~/.ssh/agt-state-hub-bridge_ed25519.pub |
| cert_command | warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub |

Pre-migration smoke (operator workstation)

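```bash
export VAULT_TOKEN="<scoped-warden-sign-token>"   # never commit or paste in chat
warden status agt-state-hub-bridge
# Capture the cert first so $? is warden sign's exit status, not head's
cert="$(warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub)"; echo "exit=$?"
printf '%s\n' "$cert" | head -n 1
```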

Confirm that warden sign exits 0 and that the certificate line starts with ssh-ed25519-cert-v01@openssh.com.
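Checking the first token only proves that a cert came back. To also check identity, principals, and TTL (the properties --sign-smoke validates), save the cert to a file and list it with ssh-keygen -L -f. The sketch below parses that text output; the function names are hypothetical, it assumes warden sets the key ID to the actor name, and it assumes the `Valid: from <ts> to <ts>` layout with timestamps like 2026-06-27T19:50:28, which may differ across OpenSSH versions. Because both timestamps come from the cert and are printed in the same zone, their difference does not depend on the operator's timezone (a DST switch inside the window would still shift it by an hour). The small tolerance allows for signers that backdate valid_from to absorb clock skew.

```python
import re
from datetime import datetime, timedelta

VALID_RE = re.compile(r"Valid: from (\S+) to (\S+)")
KEY_ID_RE = re.compile(r'Key ID: "([^"]*)"')


def parse_cert_listing(text: str) -> dict:
    """Extract key ID, principals and validity window from `ssh-keygen -L` output."""
    key_id = KEY_ID_RE.search(text)
    valid = VALID_RE.search(text)
    if not key_id or not valid:
        raise ValueError("unrecognised ssh-keygen -L output")
    principals, in_principals = [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Principals:"):
            in_principals = True
            continue
        if in_principals:
            # Principals are listed one per line until the next "Field:" header.
            if not stripped or stripped.endswith(":") or ": " in stripped:
                break
            principals.append(stripped)
    return {
        "key_id": key_id.group(1),
        "principals": principals,
        "valid_from": datetime.fromisoformat(valid.group(1)),
        "valid_before": datetime.fromisoformat(valid.group(2)),
    }


def check_cert(info: dict, identity: str, principals: list[str], ttl: timedelta,
               tolerance: timedelta = timedelta(minutes=5)) -> list[str]:
    """Compare a parsed cert against the expected actor identity, principals and TTL."""
    problems = []
    if info["key_id"] != identity:
        problems.append(f"key ID {info['key_id']!r} != {identity!r}")
    missing = set(principals) - set(info["principals"])
    if missing:
        problems.append(f"missing principals: {sorted(missing)}")
    # Window measured from the cert's own valid_from -> valid_before.
    window = info["valid_before"] - info["valid_from"]
    if abs(window - ttl) > tolerance:
        problems.append(f"validity window {window} differs from TTL {ttl}")
    return problems
```

For the pilot, check_cert(info, "agt-state-hub-bridge", ["agt-task-bridge"], timedelta(hours=24)) should return an empty list.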


Migration checklist

1. Inventory and signing path

  • Actor exists: warden inventory list shows agt-state-hub-bridge
  • warden sign succeeds against the production OpenBao backend
  • signatures.log (~/.local/state/warden/signatures.log) records the signing event

2. ops-bridge tunnel config

Edit ~/.config/bridge/tunnels.yaml. The ops-bridge repo owns the schema; the example below is illustrative:

```yaml
tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: agt-state-hub-bridge
    ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
    actor: agt-state-hub-bridge
    cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"
```
  • cert_command passes the public key path: warden reads the public key and writes the certificate to stdout
  • ssh_user matches the remote account the host expects for this certificate identity
  • Remove or disable any static-key-only fallback once the cert path is verified
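Because the contract is "read the public key path, print the certificate on stdout", a consumer only has to run the command, check the output, and hand the cert to ssh. The sketch below shows one way to do that; it is illustrative, not ops-bridge's implementation, and fetch_cert, write_cert, and ssh_auth_args are hypothetical names. It runs cert_command without a shell, so it expands `~` itself, and it names the cert `<key>-cert.pub` (the file ssh looks for next to an identity) while also passing it explicitly with `-o CertificateFile=`.

```python
import os
import shlex
import subprocess
from pathlib import Path


def fetch_cert(cert_command: str, timeout: float = 30.0) -> str:
    """Run cert_command without a shell and return the certificate line it prints.

    Raises subprocess.CalledProcessError on a non-zero exit and RuntimeError
    if stdout does not start with an OpenSSH certificate type.
    """
    # No shell is involved, so "~" would not be expanded; do it per argument.
    argv = [os.path.expanduser(arg) for arg in shlex.split(cert_command)]
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout, check=True)
    lines = result.stdout.strip().splitlines()
    cert = lines[0] if lines else ""
    if not cert.split(" ", 1)[0].endswith("-cert-v01@openssh.com"):
        raise RuntimeError("cert_command did not print an OpenSSH certificate")
    return cert


def write_cert(cert: str, private_key: str, cert_dir: Path) -> Path:
    """Write the cert as <key-name>-cert.pub inside cert_dir (e.g. a temp dir)."""
    path = Path(cert_dir) / (Path(private_key).expanduser().name + "-cert.pub")
    path.write_text(cert + "\n")
    return path


def ssh_auth_args(private_key: str, cert_path: Path) -> list[str]:
    """ssh options pairing the private key with the freshly issued cert."""
    return ["-i", os.path.expanduser(private_key), "-o", f"CertificateFile={cert_path}"]
```

Writing into a temporary directory that the caller deletes after connecting keeps the short-lived cert from lingering as a static file on disk.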

3. Host-side verification

  • Principal agt-task-bridge is present in railiance-infra ssh_principals.yaml for the target host
  • Run scripts/check_principals_drift.py if the inventory's hosts section documents allowed principals (the Step 0 gate's --infra check mirrors it)
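At its core the host-side check is a set comparison: every principal an actor needs must appear among the principals deployed for its target host. A minimal sketch on already-loaded data; the host-to-principals dict shapes and the principals_drift name are hypothetical and do not reflect the actual ssh_principals.yaml schema or the logic of scripts/check_principals_drift.py.

```python
def principals_drift(required: dict[str, set[str]],
                     deployed: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per host, the required principals missing from the deployed set.

    required: host -> principals that actors targeting the host need (hypothetical shape).
    deployed: host -> principals configured on the host (hypothetical shape).
    Hosts absent from `deployed` count as having no principals at all.
    """
    drift = {}
    for host, needed in required.items():
        missing = needed - deployed.get(host, set())
        if missing:
            drift[host] = missing
    return drift


if __name__ == "__main__":
    required = {"coulombcore": {"agt-task-bridge"}}
    deployed = {"coulombcore": {"agt-task-bridge"}}
    print(principals_drift(required, deployed))  # prints {} (no drift)
```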

4. Tunnel smoke

```bash
# ops-bridge (run from the ops-bridge repo)
bridge status state-hub-coulombcore
bridge up state-hub-coulombcore
```
  • The tunnel establishes without a static cert file on disk
  • After the cert TTL expires, re-run bridge up and confirm that cert_command re-issues a cert automatically

5. Policy gate (optional, after FLEX-WP-0007)

When policy.enabled is true, confirm that signatures.log entries for tunnel-driven signs include a policy_decision_id. See wiki/PolicyGatedSigning.md.
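To spot tunnel-driven signs that lack a policy decision, scan signatures.log. The log format is not specified in this playbook, so this sketch assumes, hypothetically, one JSON object per line with actor and policy_decision_id keys; adjust the field names to whatever the log actually records. signs_missing_policy_id is a hypothetical name.

```python
import json
from pathlib import Path


def signs_missing_policy_id(log_text: str, actor: str) -> list[int]:
    """Return 1-based line numbers of `actor` entries lacking a policy_decision_id.

    Assumes (hypothetically) one JSON object per line; non-JSON lines are skipped.
    """
    missing = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if (isinstance(entry, dict) and entry.get("actor") == actor
                and not entry.get("policy_decision_id")):
            missing.append(lineno)
    return missing


if __name__ == "__main__":
    log = Path("~/.local/state/warden/signatures.log").expanduser()
    if log.is_file():
        print(signs_missing_policy_id(log.read_text(), "agt-state-hub-bridge"))
```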


Rollback

Keep the static key path until the cert_command smoke test passes. To roll back:

  1. Remove cert_command from tunnel config
  2. Restore prior static-key or CertificateFile workflow
  3. Document the rollback in ops-bridge session notes (keep secrets out of git)

Static-key tunnels (legacy)

Tunnels using agt-claude-* or other long-lived keys are out of scope for this pilot. Migrate them one tunnel at a time as the ops-bridge owner prioritizes them.


See also

  • wiki/CertCommandInterface.md
  • wiki/OpsWardenConfig.md — cert_command example
  • wiki/playbooks/operator-openbao-token-hygiene.md
  • warden route show ops-bridge-tunnel --json