ops-warden/wiki/playbooks/ops-bridge-tunnel-cert.md
tegwick d6088e4e16 Implement WP-0022 audit trail and WP-0023 INTENT–SCOPE closeout
Add unified metadata-only audit.jsonl with secret-material guard, instrument
sign/access/worker paths, and expose warden activity CLI. Surface broker hint
when VAULT_TOKEN is unset, refresh INTENT/SCOPE docs, and add production
integration checklists plus catalog lane promotion playbook.
2026-07-01 23:32:38 +02:00

ops-bridge Tunnel — cert_command Migration

Date: 2026-06-24
Workplan: WARDEN-WP-0013 T3
Catalog: ops-bridge-tunnel

Migrate an ops-bridge tunnel from static SSH keys to short-lived warden-signed certificates via the cert_command contract (wiki/CertCommandInterface.md).

ops-warden documents the migration; ops-bridge owns tunnel config changes.


Step 0 — Readiness gate (run this first)

Before editing any tunnel config, run the read-only readiness gate (WARDEN-WP-0016). It confirms ops-warden's side is set — actor inventory, TTL, public key, and (optionally) host principals — without signing anything:

python scripts/check_tunnel_cert_readiness.py \
  --actor agt-state-hub-bridge \
  --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub \
  --config ~/.config/warden/warden.yaml \
  --infra ~/railiance-infra/ansible/inventory/ssh_principals.yaml

Exit 0 = ready, 1 = a check failed (fix before proceeding), 2 = bad input. The Prerequisites and Migration checklist below are the human-readable backing for what the gate verifies. To additionally prove the cert_command contract end to end against a local backend (issues a throwaway cert, validates identity/principals/TTL), add --sign-smoke with a local warden.yaml.
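The exit-code contract above can be wrapped for use in automation. The following is a minimal sketch, not part of ops-warden: the names `describe_exit` and `run_gate` are hypothetical, and the only thing taken from this playbook is the meaning of exit codes 0, 1, and 2.

```python
import subprocess

# Exit-code meanings as stated in this playbook.
EXIT_MEANINGS = {
    0: "ready",
    1: "a check failed; fix before proceeding",
    2: "bad input",
}


def describe_exit(code: int) -> str:
    """Translate a readiness-gate exit code into a short verdict."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(argv: list[str]) -> int:
    """Run the given gate command, print the verdict, return the exit code.

    argv is the full command line, e.g.
    ["python", "scripts/check_tunnel_cert_readiness.py", "--actor", ...].
    """
    result = subprocess.run(argv, check=False)
    print(f"readiness gate: {describe_exit(result.returncode)}")
    return result.returncode
```

A caller would stop the migration on any non-zero return value rather than continue to the config edit.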


Prerequisites

  • Actor registered in ~/.config/warden/inventory.yaml (see wiki/ActorInventoryPatterns.md)
  • Actor keypair on disk (private key for ssh_key, .pub for signing)
  • Production warden.yaml with backend: vault and a valid, scoped VAULT_TOKEN
  • Host trusts the warden/OpenBao CA (railiance-infra bootstrap-ssh-ca)
  • Target host's principal mapping allows the actor's principals (railiance-infra ssh_principals.yaml)

Pilot tunnel: agt-state-hub-bridge

| Field | Value |
|---|---|
| Actor | agt-state-hub-bridge |
| Type | agt |
| Principals | agt-task-bridge |
| TTL | 24 h |
| Private key | ~/.ssh/agt-state-hub-bridge_ed25519 |
| Public key | ~/.ssh/agt-state-hub-bridge_ed25519.pub |
| cert_command | warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub |

Pre-migration smoke (operator workstation)

export VAULT_TOKEN="<scoped-warden-sign-token>"   # never commit or paste in chat
warden status agt-state-hub-bridge
warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub | head -1

Confirm exit 0 and that the certificate line starts with ssh-ed25519-cert-v01@openssh.com.
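One caveat with the smoke command: in a shell pipeline such as `warden sign ... | head -1`, the reported exit status is that of `head` unless `pipefail` is set, so a failed sign can still look like exit 0. Checking the captured output directly avoids that. The sketch below assumes the sign output follows the usual OpenSSH certificate line layout (key type, then base64 blob, then optional comment); `first_line_is_ed25519_cert` is a hypothetical helper name.

```python
CERT_TYPE = "ssh-ed25519-cert-v01@openssh.com"


def first_line_is_ed25519_cert(output: str) -> bool:
    """Return True if the first line looks like an Ed25519 SSH certificate.

    Only the shape is checked (key type followed by a blob field);
    the certificate is not parsed or cryptographically verified.
    """
    lines = output.splitlines()
    if not lines:
        return False
    fields = lines[0].split()
    return len(fields) >= 2 and fields[0] == CERT_TYPE
```

An empty or error-only output returns False, which is the case the `| head -1` pipeline can hide.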


Migration checklist

1. Inventory and signing path

  • Actor exists: warden inventory list shows agt-state-hub-bridge
  • warden sign succeeds with production OpenBao backend
  • signatures.log records the sign (~/.local/state/warden/signatures.log)

2. ops-bridge tunnel config

Edit ~/.config/bridge/tunnels.yaml (ops-bridge repo owns schema; example below):

tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: agt-state-hub-bridge
    ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
    actor: agt-state-hub-bridge
    cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"

  • cert_command points at the public key path (warden reads the public key and writes the certificate to stdout)
  • ssh_user matches the certificate identity and what the host expects
  • Remove or disable the static-key-only fallback once the cert path is verified
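The config checks in this step can be expressed as a small lint over one tunnel entry. This is a sketch under assumptions, not ops-bridge's validation: `check_tunnel_entry` is a hypothetical helper, the entry is a plain dict as it would look after loading tunnels.yaml, and the rule that the public key path equals `ssh_key` plus `.pub` is a convention taken from the pilot values rather than a documented requirement.

```python
import shlex


def check_tunnel_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one tunnel entry (empty if none)."""
    cert_command = entry.get("cert_command")
    if not cert_command:
        return ["cert_command is missing"]

    problems = []
    args = shlex.split(cert_command)

    # cert_command must pass the PUBLIC key, never the private key.
    if "--pubkey" not in args or args.index("--pubkey") == len(args) - 1:
        problems.append("cert_command has no --pubkey value")
    else:
        pubkey = args[args.index("--pubkey") + 1]
        if not pubkey.endswith(".pub"):
            problems.append("--pubkey does not point at a .pub file")
        ssh_key = entry.get("ssh_key")
        # Assumed convention: public key sits next to the private key.
        if ssh_key and pubkey != ssh_key + ".pub":
            problems.append("--pubkey does not match ssh_key + '.pub'")

    # The actor named in the entry should be the one being signed.
    actor = entry.get("actor")
    if actor and actor not in args:
        problems.append("cert_command does not sign for the entry's actor")

    return problems
```

Run against the pilot entry shown above, the function returns an empty list; pointing `--pubkey` at the private key path yields two problems.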

3. Host-side verification

  • Principal agt-task-bridge present in railiance-infra ssh_principals.yaml for target host
  • Run scripts/check_principals_drift.py if the inventory's hosts section documents allowed principals

4. Tunnel smoke

# ops-bridge (from ops-bridge repo)
bridge status state-hub-coulombcore
bridge up state-hub-coulombcore

  • Tunnel establishes without a static cert file on disk
  • Re-run bridge up after the cert TTL expires; cert_command re-issues automatically
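To plan the re-run in the second check, it helps to know when a certificate issued at a given time passes its TTL. The sketch below assumes expiry is simply issue time plus the pilot's 24 h TTL; a real certificate carries its own validity window, which may differ slightly (for example through clock-skew allowances), so treat this as an approximation. `cert_expired` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Pilot TTL from the table in this playbook.
PILOT_TTL = timedelta(hours=24)


def cert_expired(issued_at: datetime, now: datetime,
                 ttl: timedelta = PILOT_TTL) -> bool:
    """Return True once `now` has reached issued_at + ttl."""
    return now >= issued_at + ttl


# Example: a cert issued at midnight UTC is still valid 23 hours later.
issued = datetime(2026, 6, 24, 0, 0, tzinfo=timezone.utc)
still_valid = not cert_expired(issued, issued + timedelta(hours=23))
```

Using timezone-aware timestamps on both sides avoids comparing local time against UTC log entries.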

5. Policy gate (optional, after FLEX-WP-0007)

When policy.enabled: true, confirm signatures.log includes policy_decision_id on tunnel-driven signs. See wiki/PolicyGatedSigning.md.


Rollback

Keep the static key path until cert_command smoke passes. To roll back:

  1. Remove cert_command from tunnel config
  2. Restore prior static-key or CertificateFile workflow
  3. Document rollback in ops-bridge session notes (not in git secrets)

Static-key tunnels (legacy)

Tunnels using agt-claude-* or other long-lived keys are out of scope for this pilot. Migrate them per tunnel when the ops-bridge owner prioritizes them.


Live cutover evidence template

When ops-bridge completes the pilot cutover, record non-secret evidence only. Post a State Hub progress note or save under history/ with these fields:

| Field | Example / instruction |
|---|---|
| Tunnel id | state-hub-coulombcore |
| Actor | agt-state-hub-bridge |
| Readiness gate | check_tunnel_cert_readiness.py exit code + date |
| First bridge up success | ISO timestamp (tunnel established) |
| First warden-signed connection | ISO timestamp from signatures.log or warden activity --kind sign |
| cert_command in use | yes / no |
| Rollback tested | yes / no; static-key path still available until verified |
| Operator | human handle or agent id |
| Cross-links | ops-bridge session notes, this playbook |

Do not include VAULT_TOKEN, private keys, cert bodies, or host passwords in evidence. Use warden activity --days 1 --kind sign --json for sign metadata.
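Before posting an evidence record, a quick scan for secret-looking material can catch accidental pastes. This is a heuristic sketch, not warden's own secret-material guard: `find_secret_like` and the pattern list are hypothetical, the patterns cover only a few obvious shapes (PEM-style private key headers, SSH certificate lines, `VAULT_TOKEN=` assignments), and a clean result does not prove the record is safe.

```python
import re

# Heuristic patterns only; not an exhaustive secret detector.
FORBIDDEN_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"ssh-[a-z0-9-]+-cert-v01@openssh\.com\s+\S+"),
    re.compile(r"VAULT_TOKEN\s*="),
]


def find_secret_like(record: dict) -> list[str]:
    """Return the names of fields whose values match a forbidden pattern."""
    hits = []
    for field, value in record.items():
        text = str(value)
        if any(pattern.search(text) for pattern in FORBIDDEN_PATTERNS):
            hits.append(field)
    return hits


# Example record using field names from the template above.
evidence = {
    "Tunnel id": "state-hub-coulombcore",
    "Actor": "agt-state-hub-bridge",
    "cert_command in use": "yes",
    "Rollback tested": "no",
}
flagged = find_secret_like(evidence)
```

If `flagged` is non-empty, remove the offending values and re-check before posting the note.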

Coordination: message ops-bridge on State Hub with a pointer to this template when starting cutover (WARDEN-WP-0023).


See also

  • wiki/CertCommandInterface.md
  • wiki/OpsWardenConfig.md — cert_command example
  • wiki/playbooks/operator-openbao-token-hygiene.md
  • wiki/AuditTrail.md — query recent signs via warden activity
  • warden route show ops-bridge-tunnel --json