ops-warden/wiki/OpsWardenConfig.md
tegwick 8e9383a33a feat: opt-in flex-auth policy gate and OpenBao verify (WP-0007)
Add policy.py client that calls flex-auth /v1/check before sign/issue when
policy.enabled is true. Record policy_decision_id in signatures.log. Default
off preserves existing inventory-only behavior. Document production OpenBao
health probe and update config/wiki references.
2026-06-17 08:37:14 +02:00

OpsWarden Configuration Reference

Config file: ~/.config/warden/warden.yaml (override with WARDEN_CONFIG env var)
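
The lookup order above can be sketched as follows. This is a minimal illustration, not warden's actual loader: the function name `resolve_config_path` is hypothetical, and treating an empty WARDEN_CONFIG as unset is an assumption.

```python
import os
from pathlib import Path

# Default location documented above.
DEFAULT_CONFIG = "~/.config/warden/warden.yaml"


def resolve_config_path(env=None):
    """Return the config path, preferring WARDEN_CONFIG when set and non-empty.

    `env` defaults to os.environ; passing a dict keeps the function testable.
    Treating an empty value as "unset" is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("WARDEN_CONFIG") or DEFAULT_CONFIG
    return Path(raw).expanduser()


if __name__ == "__main__":
    print(resolve_config_path())
```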


Backend overview

| Backend | Config value | Use when |
| --- | --- | --- |
| Local CA | backend: local | Labs, CI, air-gapped dev, hosts without platform secrets access |
| Platform CA | backend: vault | Production and shared ops environments |

Platform standard: Railiance S3 uses OpenBao as the runtime platform secrets service (RAIL-PL-WP-0002 in railiance-platform). OpenBao exposes a Vault-compatible HTTP API, so ops-warden keeps the config keys backend: vault and the vault: block — no separate OpenBao backend name is required. The same config works against OpenBao or HashiCorp Vault if you point vault.addr at either service.

ops-warden signs SSH certificates only. It does not deploy OpenBao, manage unseal keys, or store long-lived API secrets. Cluster bootstrap and custody live in railiance-platform and NetKingdom docs.


Local backend (lab / offline)

# Uses ssh-keygen -s with a CA private key on disk.
backend: local

# Path to the CA private key. Keep this file mode 600 and never commit it.
ca_key: ~/.ssh/ops-ca-user

inventory_path: ~/.config/warden/inventory.yaml
state_dir: ~/.local/state/warden

# Optional flex-auth gate (default off — see wiki/PolicyGatedSigning.md)
policy:
  enabled: false
  flex_auth_url: http://127.0.0.1:8080
  fail_closed: true

Bootstrapping the local CA key

# Generate CA keypair once (offline, secure location)
ssh-keygen -t ed25519 -f ~/.ssh/ops-ca-user -C "Ops SSH User CA (2026)" -N ""
chmod 600 ~/.ssh/ops-ca-user
chmod 644 ~/.ssh/ops-ca-user.pub

# Distribute ops-ca-user.pub to every host:
#   TrustedUserCAKeys /etc/ssh/ca/ca_user.pub  (in sshd_config)
# See railiance-infra bootstrap-ssh-ca.yml playbook.

OpenBao / Vault-compatible backend (production)

Use this backend against the platform OpenBao instance or any other SSH secrets engine that implements the Vault signing API (POST /v1/<mount>/sign/<role>).
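
For orientation, a request against that signing endpoint could be assembled roughly as below. This sketch only builds the request and sends nothing. The helper name `build_sign_request` and the example values are hypothetical. The body fields (`public_key`, `valid_principals`, `ttl`, `cert_type`) are parameters of the Vault SSH sign API, but which of them warden actually sends is an assumption.

```python
import json
import urllib.request


def build_sign_request(addr, mount, role, public_key, principals, ttl, token):
    """Build (but do not send) a POST /v1/<mount>/sign/<role> request.

    Hypothetical helper: it mirrors the documented endpoint shape and is
    not taken from warden's source.
    """
    url = f"{addr.rstrip('/')}/v1/{mount}/sign/{role}"
    body = {
        "public_key": public_key,                   # contents of the .pub file
        "valid_principals": ",".join(principals),   # comma-separated list
        "ttl": ttl,                                 # e.g. "24h"
        "cert_type": "user",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_sign_request(
        "https://bao.example.test", "ssh", "agt-role",
        "ssh-ed25519 AAAA... example", ["agt-task-bridge"], "24h", "example-token",
    )
    print(req.get_method(), req.full_url)
```

On success, a Vault-compatible engine returns the certificate in the `data.signed_key` field of the JSON response.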

Example — Railiance01 (browser / operator workstation)

backend: vault

vault:
  # OpenBao UI/API (KeyCape OIDC). Prefer short-lived tokens from policy, not root.
  addr: https://bao.coulomb.social

  mount: ssh

  role_map:
    adm: adm-role
    agt: agt-role
    atm: atm-role

  # OpenBao accepts the same X-Vault-Token header name as Vault.
  token_env: VAULT_TOKEN

inventory_path: ~/.config/warden/inventory.yaml
state_dir: ~/.local/state/warden

# Enable after flex-auth ssh-certificate policies are deployed:
# policy:
#   enabled: true
#   flex_auth_url: http://flex-auth.flex-auth.svc.cluster.local:8080
#   fail_closed: true

Example — in-cluster caller (pod or trusted host)

backend: vault

vault:
  addr: http://openbao.openbao.svc.cluster.local:8200
  mount: ssh
  role_map:
    adm: adm-role
    agt: agt-role
    atm: atm-role
  token_env: VAULT_TOKEN

Choose the addr that matches where warden runs: operators on a laptop use the external HTTPS endpoint; workloads inside the cluster use the internal service URL. See railiance-platform/docs/openbao.md for deployment and access paths.

Authentication

Export a token with permission to sign against the mapped roles:

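# After OIDC login or policy-issued token (OpenBao CLI)
export VAULT_TOKEN="<short-lived-token>"

# Or HashiCorp Vault CLI against a Vault-compatible endpoint.
# vault login stores the token in the CLI token helper, so export it for warden:
vault login
export VAULT_TOKEN="$(vault print token)"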

warden reads the token from the env var named in vault.token_env (default VAULT_TOKEN). OpenBao accepts the same X-Vault-Token header as Vault, so you do not need a separate BAO_TOKEN unless you set token_env to that name.
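
The token lookup can be pictured with the short sketch below. The function name `read_token` and the error message are hypothetical. Only the token_env key and its VAULT_TOKEN default come from this page.

```python
import os


def read_token(vault_cfg, env=None):
    """Read the API token from the env var named by vault.token_env.

    `vault_cfg` is the parsed `vault:` block as a dict. Failing fast on a
    missing or blank token is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    var_name = vault_cfg.get("token_env", "VAULT_TOKEN")
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; export a short-lived token first")
    return token


if __name__ == "__main__":
    # Example with an explicit environment instead of the real one.
    print(read_token({"token_env": "BAO_TOKEN"}, {"BAO_TOKEN": "example-token"}))
```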

If the vault backend fails, warden sign suggests falling back to --backend local. Use that fallback only for lab recovery, never as a production substitute.

SSH secrets engine setup (OpenBao)

Run once per environment after OpenBao is initialized and unsealed. Adjust TTL limits to match ActorType policy in wiki/AccessManagementDirective.md (adm 48 h, agt 24 h, atm 8 h).

# OpenBao CLI (bao) — preferred on Railiance
bao secrets enable ssh

bao write ssh/roles/agt-role \
    key_type=ca \
    allowed_users="*" \
    allow_user_certificates=true \
    default_user="agt" \
    ttl=24h max_ttl=24h

bao write ssh/roles/adm-role \
    key_type=ca \
    allowed_users="*" \
    allow_user_certificates=true \
    default_user="adm" \
    ttl=48h max_ttl=48h

bao write ssh/roles/atm-role \
    key_type=ca \
    allowed_users="*" \
    allow_user_certificates=true \
    default_user="atm" \
    ttl=8h max_ttl=8h

HashiCorp Vault uses the same paths with the vault CLI:

vault secrets enable ssh
vault write ssh/roles/agt-role key_type=ca ...  # same role parameters

The mount path defaults to ssh; override it with vault.mount in warden.yaml if your engine is mounted elsewhere.

Platform references

| Topic | Location |
| --- | --- |
| OpenBao deploy, unseal, OIDC admin | railiance-platform/docs/openbao.md |
| Host CA trust and principals | railiance-infra Ansible playbooks |
| Signing contract for callers | wiki/CertCommandInterface.md |

Principals inventory (inventory.yaml)

actors:
  # Actor name must carry the prefix matching its type:
  #   adm-*  for adm, agt-*  for agt, atm-*  for atm
  agt-state-hub-bridge:
    type: agt
    # Principals embedded in the cert; matched against /etc/ssh/auth_principals/%u
    principals:
      - agt-task-bridge
    # Certificate TTL in hours. Defaults: adm=48, agt=24, atm=8
    ttl_hours: 24
    description: "ops-bridge tunnel agent for state-hub"

  adm-bernd:
    type: adm
    principals:
      - adm-full
    ttl_hours: 48

  atm-backup-daily:
    type: atm
    principals:
      - atm-backup-daily
    ttl_hours: 8
    description: "nightly backup automation"

hosts:
  # Optional: documents which principals are allowed on each host.
  # Not enforced by warden; used for reference and future tooling.
  coulombcore:
    allowed_principals:
      agt:
        - agt-task-bridge
      atm:
        - atm-backup-daily
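
The naming and TTL rules in the comments above can be checked with a few lines of code. This is a sketch under assumptions: `check_actor` is a hypothetical name, the input is the already-parsed YAML as plain dicts, and rejecting an empty principals list is an assumption rather than documented warden behavior.

```python
# Default certificate TTLs in hours per actor type, as documented above.
DEFAULT_TTL_HOURS = {"adm": 48, "agt": 24, "atm": 8}


def check_actor(name, actor):
    """Validate one inventory entry and return its effective TTL in hours."""
    actor_type = actor.get("type")
    if actor_type not in DEFAULT_TTL_HOURS:
        raise ValueError(f"{name}: unknown actor type {actor_type!r}")
    # The actor name must carry the prefix matching its type (adm-, agt-, atm-).
    if not name.startswith(actor_type + "-"):
        raise ValueError(f"{name}: name must start with '{actor_type}-'")
    # Assumption: an actor without principals cannot receive a usable cert.
    if not actor.get("principals"):
        raise ValueError(f"{name}: at least one principal is required")
    return actor.get("ttl_hours", DEFAULT_TTL_HOURS[actor_type])


if __name__ == "__main__":
    entry = {"type": "agt", "principals": ["agt-task-bridge"]}
    print(check_actor("agt-state-hub-bridge", entry))
```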

Policy gate (flex-auth, opt-in)

When policy.enabled: true, warden sign and warden issue call flex-auth POST /v1/check before signing. A deny decision blocks issuance, and so does an unreachable flex-auth service when fail_closed: true. For allowed decisions, warden records the policy_decision_id in signatures.log.

policy:
  enabled: false                    # default — no behavior change
  flex_auth_url: http://127.0.0.1:8080
  fail_closed: true                 # deny when flex-auth unreachable
  tenant: tenant:platform
  subject_env: WARDEN_POLICY_SUBJECT
  system: ops-warden

Full request shape and rollout notes: wiki/PolicyGatedSigning.md.
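
The gate's decision logic can be sketched as below, with the HTTP call injected as a callable so the flow is visible without a running flex-auth. Everything here is illustrative: `policy_gate` is a hypothetical name, and the response keys `allow` and `decision_id` are assumptions (the real shape is in wiki/PolicyGatedSigning.md). The sketch also assumes that fail_closed defaults to true and that fail_closed: false lets signing proceed when the service is unreachable.

```python
def policy_gate(policy, check):
    """Return (allowed, decision_id) for one sign/issue attempt.

    `policy` is the parsed `policy:` block. `check` is a zero-argument
    callable that performs POST /v1/check and returns a dict; it should
    raise OSError (urllib.error.URLError is a subclass) when the service
    is unreachable.
    """
    if not policy.get("enabled", False):
        # Gate is off: inventory-only behavior, nothing to record.
        return True, None
    try:
        decision = check()
    except OSError:
        # Unreachable service: block when failing closed, otherwise proceed.
        return (not policy.get("fail_closed", True)), None
    # Hypothetical response keys; see the lead-in.
    return bool(decision.get("allow")), decision.get("decision_id")


if __name__ == "__main__":
    def allow_stub():
        return {"allow": True, "decision_id": "example-decision"}

    print(policy_gate({"enabled": True, "fail_closed": True}, allow_stub))
```

Injecting `check` keeps the fail-closed branch easy to exercise in tests by passing a callable that raises.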


Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| WARDEN_CONFIG | ~/.config/warden/warden.yaml | Config file path |
| VAULT_TOKEN | | API token for backend: vault (OpenBao or Vault; name configurable via vault.token_env) |
| WARDEN_POLICY_SUBJECT | | IAM subject id for flex-auth checks (when policy.enabled) |

cert_command integration with ops-bridge

Add cert_command to a tunnel in ~/.config/bridge/tunnels.yaml:

tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: agt-state-hub-bridge
    ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
    actor: agt-state-hub-bridge
    cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"

ops-bridge runs cert_command before each SSH launch, captures stdout as the cert, and passes it alongside the private key via ssh -i <key> -i <cert>. See wiki/CertCommandInterface.md for the full contract.
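
The caller side of that contract might look roughly like the sketch below. It is not ops-bridge's implementation: `fetch_cert` and `ssh_argv` are hypothetical names, and running the command without a shell (with `~` expanded per argument) is an assumption.

```python
import os
import shlex
import subprocess
import sys


def fetch_cert(cert_command):
    """Run cert_command and return its stdout as the certificate text."""
    # Assumption: no shell is involved, so expand a leading "~" ourselves.
    argv = [os.path.expanduser(arg) for arg in shlex.split(cert_command)]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    cert = result.stdout.strip()
    if not cert:
        raise RuntimeError("cert_command produced no certificate on stdout")
    return cert


def ssh_argv(key_path, cert_path, target):
    """Build the ssh invocation that passes the key and the cert via -i."""
    return ["ssh", "-i", key_path, "-i", cert_path, target]


if __name__ == "__main__":
    # Stand-in command that prints a fake cert line instead of calling warden.
    demo = shlex.quote(sys.executable) + " -c " + shlex.quote("print('example-cert')")
    print(fetch_cert(demo))
    print(ssh_argv("~/.ssh/id_ed25519", "/tmp/id_ed25519-cert.pub", "user@host"))
```

Because `check=True` raises on a non-zero exit status, a failed signing attempt stops the tunnel launch instead of continuing with an empty certificate.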