
cert_command Interface

Version: 1.0
Date: 2026-03-28
Purpose: Define the contract between OpsWarden (the issuer) and callers such as ops-bridge (the consumer) for just-in-time SSH certificate acquisition.


Overview

cert_command is a shell string that a caller executes to obtain a short-lived, CA-signed SSH certificate for a named actor. The caller passes the cert to the SSH process alongside the actor's private key.

This interface is intentionally tool-agnostic: the caller (ops-bridge, a script, a CI pipeline) does not need to know whether the CA is a local file or HashiCorp Vault. Any command that writes a cert to stdout and exits 0 satisfies the contract.


Contract

Invocation

warden sign <actor-name> --pubkey <path/to/actor.pub>

Or any equivalent shell command:

vault write -field=signed_key ssh/sign/agt-role public_key=@/tmp/key.pub
ssh-keygen -s /path/to/ca -I agt-test -n agt-task -V +24h /tmp/key.pub && cat /tmp/key-cert.pub

Success (exit 0)

  • Stdout: certificate text only, on a single line starting with the certificate type, e.g.:
    ssh-ed25519-cert-v01@openssh.com AAAA...
  • Stderr: ignored by the caller (warden may print warnings there)
  • Side effect (warden only): warden also writes the cert to ~/.local/state/warden/<actor>-cert.pub for use by warden status and warden scorecard; other cert_command implementations need not do this

Failure (exit non-zero)

  • Exit code: any non-zero value
  • Stdout: ignored
  • Stderr: passed through to caller logs / audit detail field
  • Caller behaviour: treat as a transient error; apply reconnect backoff and retry
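A minimal Python sketch of how a caller might apply the success and failure rules above. `CertCommandError` and `run_cert_command` are hypothetical names, not ops-bridge's actual code, and the timeout is an added assumption (a hung signer is treated like a non-zero exit). The stdout check only verifies the shape the contract describes (one line whose first field ends in `-cert-v01@openssh.com`); it does not verify the CA signature.

```python
import subprocess


class CertCommandError(RuntimeError):
    """Hypothetical error: cert_command failed or printed something that is not a cert."""

    def __init__(self, returncode: int, stderr: str) -> None:
        super().__init__(f"cert_command failed (exit {returncode}): {stderr.strip()}")
        self.returncode = returncode
        self.stderr = stderr  # destined for caller logs / the audit detail field


def run_cert_command(cert_command: str, timeout: float = 30.0) -> str:
    """Run cert_command through the shell and return the single certificate line."""
    try:
        proc = subprocess.run(
            cert_command,
            shell=True,           # the contract defines cert_command as a shell string
            capture_output=True,  # stdout carries the cert; stderr is diagnostics only
            text=True,
            timeout=timeout,      # assumption: a hung command counts as a failure
        )
    except subprocess.TimeoutExpired:
        raise CertCommandError(-1, f"cert_command timed out after {timeout}s") from None

    if proc.returncode != 0:
        # Failure: ignore stdout, surface stderr, let the caller apply backoff.
        raise CertCommandError(proc.returncode, proc.stderr)

    lines = [line for line in proc.stdout.splitlines() if line.strip()]
    if len(lines) != 1 or not lines[0].split()[0].endswith("-cert-v01@openssh.com"):
        # Exit 0 but stdout is not a single cert line: treat it as a failure too.
        raise CertCommandError(proc.returncode, "stdout is not a single SSH certificate line")
    return lines[0]
```

For example, `run_cert_command("warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub")` returns the cert line on success; because the string goes through the shell, the `~` is expanded there.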

Caller Responsibilities (ops-bridge)

  1. Run cert_command via subprocess.run(shell=True) before each SSH subprocess launch
  2. Write stdout to the cert file in the state dir, with mode 600: ~/.local/state/bridge/<tunnel>-cert.pub
  3. Add -i <cert_path> after -i <key_path> in the ssh command
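  4. Parse the output of ssh-keygen -L -f <cert_path> to extract the Key ID → log it as cert_identity in the audit record
  5. Read the expiry from the Valid: line of the same output (the time after "to", i.e. the cert's valid-before time) → schedule a pre-emptive cert refresh ~5 min before expiry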
  6. On cert_command failure: log BRIDGE_DISCONNECTED with stderr; apply backoff
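Steps 4 and 5 can be sketched as plain text parsing of the `ssh-keygen -L` listing. For a bounded cert OpenSSH prints `Valid: from <start> to <end>` (other forms are `forever`, `after <start>` and `before <end>`), with timestamps as `YYYY-MM-DDTHH:MM:SS` in local time; that format is an assumption worth re-checking against the OpenSSH version you deploy. The function names and the sample listing (abbreviated) are illustrative only.

```python
import re
import subprocess
from datetime import datetime, timedelta
from typing import Optional, Tuple

REFRESH_MARGIN = timedelta(minutes=5)  # "5 min before expiry" per the TTL guidelines
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"      # assumed ssh-keygen -L timestamp format (local time)


def list_cert(cert_path: str) -> str:
    """Return the human-readable listing printed by `ssh-keygen -L -f <cert_path>`."""
    return subprocess.run(
        ["ssh-keygen", "-L", "-f", cert_path],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_cert_listing(listing: str) -> Tuple[str, Optional[datetime]]:
    """Extract (Key ID, valid-before time); the time is None if the cert never expires."""
    key_id = re.search(r'^\s*Key ID: "(.*)"\s*$', listing, re.MULTILINE)
    valid = re.search(r"^\s*Valid: (.+?)\s*$", listing, re.MULTILINE)
    if key_id is None or valid is None:
        raise ValueError("unrecognised ssh-keygen -L output")
    # Only "from <start> to <end>" and "before <end>" carry an expiry time.
    end = re.search(r"\b(?:to|before) (\S+)$", valid.group(1))
    valid_before = datetime.strptime(end.group(1), TIME_FORMAT) if end else None
    return key_id.group(1), valid_before


def refresh_time(valid_before: Optional[datetime]) -> Optional[datetime]:
    """When to re-run cert_command pre-emptively; None means there is no expiry to track."""
    return None if valid_before is None else valid_before - REFRESH_MARGIN


# Abbreviated, illustrative listing (not captured from a real cert).
SAMPLE = """/tmp/key-cert.pub:
        Type: ssh-ed25519-cert-v01@openssh.com user certificate
        Key ID: "agt-test"
        Serial: 0
        Valid: from 2026-03-28T00:45:00 to 2026-03-29T00:45:00
"""

if __name__ == "__main__":
    cert_identity, valid_before = parse_cert_listing(SAMPLE)
    print(cert_identity, refresh_time(valid_before))
```

Since the listing is in local time, the refresh time should be compared against a local-time clock (`datetime.now()`), not UTC.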

What the Caller Must NOT Do

  • Cache or reuse a cert across reconnects (always re-run cert_command per reconnect)
  • Write the cert to disk with world-readable permissions (use mode 600)
  • Ignore a non-zero exit from cert_command (must treat as failure, trigger backoff)
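A sketch of writing a freshly issued cert so that it is never world-readable, assuming the `~/.local/state/bridge/<tunnel>-cert.pub` layout from step 2. `tempfile.mkstemp` creates its file readable and writable only by the owner, and `os.replace` renames it over the previous cert atomically, so ssh never sees a half-written file. `write_cert` is a hypothetical helper name.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def write_cert(state_dir: Path, tunnel: str, cert_line: str) -> Path:
    """Atomically write a freshly issued cert to <state_dir>/<tunnel>-cert.pub with mode 600."""
    # mode=0o700 only takes effect if the directory is created here.
    state_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    target = state_dir / f"{tunnel}-cert.pub"
    # mkstemp creates the file with permissions 0600 (owner read/write only).
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, prefix=f".{tunnel}-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(cert_line.rstrip("\n") + "\n")
        # Atomic rename: the previous reconnect's cert is replaced, never reused.
        os.replace(tmp_path, target)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    return target
```

A caller would invoke it once per reconnect with the line just returned by cert_command, e.g. `write_cert(Path("~/.local/state/bridge").expanduser(), "state-hub-coulombcore", cert_line)`.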

Example: ops-bridge tunnels.yaml

tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: agt-state-hub-bridge
    ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
    actor: agt-state-hub-bridge
    # cert_command is optional. When absent, ssh_key is used directly (static key mode).
    cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"

TTL Guidelines (AccessManagementDirective §2)

| Actor type | Max TTL | Pre-emptive refresh |
|------------|---------|---------------------|
| adm        | 48 h    | 5 min before expiry |
| agt        | 24 h    | 5 min before expiry |
| atm        | 8 h     | 5 min before expiry |

ops-bridge enforces the refresh schedule. OpsWarden enforces the max TTL at signing time.
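OpsWarden's enforcement code is not part of this document, so the following is only a sketch of what signing-time enforcement could look like. It assumes the actor type is the name prefix before the first `-` (as in `agt-state-hub-bridge`) and that an over-long request is capped rather than rejected; `MAX_TTL` and `validity_arg` are hypothetical names. The result is a relative `ssh-keygen -V` value such as `+1440m` (valid from now until now plus the TTL).

```python
from datetime import timedelta

# Max TTL per actor type, from the table above.
MAX_TTL = {
    "adm": timedelta(hours=48),
    "agt": timedelta(hours=24),
    "atm": timedelta(hours=8),
}


def validity_arg(actor: str, requested: timedelta) -> str:
    """Return an ssh-keygen -V argument capped at the actor type's max TTL.

    Assumption: the actor type is the part of the actor name before the first '-'.
    """
    actor_type = actor.split("-", 1)[0]
    if actor_type not in MAX_TTL:
        raise ValueError(f"unknown actor type for {actor!r}")
    if requested <= timedelta(0):
        raise ValueError("requested TTL must be positive")
    ttl = min(requested, MAX_TTL[actor_type])          # cap at the directive's maximum
    minutes = max(1, int(ttl.total_seconds() // 60))   # ssh-keygen accepts e.g. "+90m"
    return f"+{minutes}m"
```

For example, `validity_arg("agt-test", timedelta(hours=48))` returns `"+1440m"`: the request is capped at the 24 h agt maximum.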


Backward Compatibility

Callers that do not set cert_command continue to use the static key (ssh_key) with no TTL, cert logic, or refresh. The two modes are fully independent.
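A sketch of how a caller might choose between the two modes when building the identity options of the ssh command line. It assumes each tunnel entry from `tunnels.yaml` has already been loaded into a dict, and it covers only the identity options (forwarding options are left out); `identity_args` is a hypothetical name.

```python
import os
from typing import List, Mapping, Optional


def identity_args(tunnel: Mapping[str, str], cert_path: Optional[str]) -> List[str]:
    """Return the ssh identity options for one tunnel entry.

    Static key mode (no cert_command): only the private key is passed.
    Cert mode: the cert written for this reconnect follows the key.
    """
    key_path = os.path.expanduser(tunnel["ssh_key"])
    if not tunnel.get("cert_command"):
        return ["-i", key_path]               # static key mode: no TTL, no refresh
    if cert_path is None:
        raise ValueError("cert mode needs the cert path from this reconnect's cert_command run")
    return ["-i", key_path, "-i", cert_path]  # cert mode: -i <cert_path> after -i <key_path>
```

Keeping the branch on the presence of `cert_command` in one place makes the two modes easy to keep independent. OpenSSH also accepts a certificate via `-o CertificateFile=<cert_path>`, which some setups may find more explicit than a second `-i`.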