ops-warden/wiki/OpsWardenConfig.md
tegwick 8e9383a33a feat: opt-in flex-auth policy gate and OpenBao verify (WP-0007)
Add policy.py client that calls flex-auth /v1/check before sign/issue when
policy.enabled is true. Record policy_decision_id in signatures.log. Default
off preserves existing inventory-only behavior. Document production OpenBao
health probe and update config/wiki references.
2026-06-17 08:37:14 +02:00


# OpsWarden Configuration Reference
Config file: `~/.config/warden/warden.yaml` (override with `WARDEN_CONFIG` env var)
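
The lookup order described above can be sketched roughly as follows. This is only an illustration: `resolve_config_path` is a hypothetical helper name, and treating an empty `WARDEN_CONFIG` the same as an unset one is an assumption, not documented ops-warden behavior.

```python
import os
from pathlib import Path

# Default location taken from this page.
DEFAULT_CONFIG = "~/.config/warden/warden.yaml"


def resolve_config_path(environ=None):
    """Return the config path, preferring WARDEN_CONFIG when it is set.

    Hypothetical helper; an empty WARDEN_CONFIG falls back to the default
    (assumption).
    """
    env = os.environ if environ is None else environ
    raw = env.get("WARDEN_CONFIG") or DEFAULT_CONFIG
    # Expand a leading "~" so the result is usable with open().
    return Path(raw).expanduser()
```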

---
## Backend overview
| Backend | Config value | Use when |
|---------|--------------|----------|
| Local CA | `backend: local` | Labs, CI, air-gapped dev, hosts without platform secrets access |
| Platform CA | `backend: vault` | Production and shared ops environments |

**Platform standard:** Railiance S3 uses [OpenBao](https://openbao.org/) as the
runtime platform secrets service (`RAIL-PL-WP-0002` in `railiance-platform`).
OpenBao exposes a **Vault-compatible HTTP API**, so ops-warden keeps the config
keys `backend: vault` and the `vault:` block — no separate OpenBao backend name
is required. The same config works against OpenBao or HashiCorp Vault if you point
`vault.addr` at either service.
ops-warden signs SSH certificates only. It does **not** deploy OpenBao, manage
unseal keys, or store long-lived API secrets. Cluster bootstrap and custody live
in `railiance-platform` and NetKingdom docs.

---
## Local backend (lab / offline)
```yaml
# Uses ssh-keygen -s with a CA private key on disk.
backend: local
# Path to the CA private key. Keep this file mode 600 and never commit it.
ca_key: ~/.ssh/ops-ca-user
inventory_path: ~/.config/warden/inventory.yaml
state_dir: ~/.local/state/warden
# Optional flex-auth gate (default off — see wiki/PolicyGatedSigning.md)
policy:
  enabled: false
  flex_auth_url: http://127.0.0.1:8080
  fail_closed: true
```
### Bootstrapping the local CA key
```bash
# Generate CA keypair once (offline, secure location)
ssh-keygen -t ed25519 -f ~/.ssh/ops-ca-user -C "Ops SSH User CA (2026)" -N ""
chmod 600 ~/.ssh/ops-ca-user
chmod 644 ~/.ssh/ops-ca-user.pub
# Distribute ops-ca-user.pub to every host:
# TrustedUserCAKeys /etc/ssh/ca/ca_user.pub (in sshd_config)
# See railiance-infra bootstrap-ssh-ca.yml playbook.
```
---
## OpenBao / Vault-compatible backend (production)
Use this backend against the platform OpenBao instance or any other SSH secrets
engine that implements the Vault signing API (`POST /v1/<mount>/sign/<role>`).
### Example — Railiance01 (browser / operator workstation)
```yaml
backend: vault
vault:
  # OpenBao UI/API (KeyCape OIDC). Prefer short-lived tokens from policy, not root.
  addr: https://bao.coulomb.social
  mount: ssh
  role_map:
    adm: adm-role
    agt: agt-role
    atm: atm-role
  # OpenBao accepts the same X-Vault-Token header name as Vault.
  token_env: VAULT_TOKEN
inventory_path: ~/.config/warden/inventory.yaml
state_dir: ~/.local/state/warden
# Enable after flex-auth ssh-certificate policies are deployed:
# policy:
#   enabled: true
#   flex_auth_url: http://flex-auth.flex-auth.svc.cluster.local:8080
#   fail_closed: true
```
### Example — in-cluster caller (pod or trusted host)
```yaml
backend: vault
vault:
  addr: http://openbao.openbao.svc.cluster.local:8200
  mount: ssh
  role_map:
    adm: adm-role
    agt: agt-role
    atm: atm-role
  token_env: VAULT_TOKEN
```
Choose the `addr` that matches where `warden` runs: operators on a laptop use
the external HTTPS endpoint; workloads inside the cluster use the internal
service URL. See `railiance-platform/docs/openbao.md` for deployment and access
paths.
### Authentication
Export a token with permission to sign against the mapped roles:
```bash
# After OIDC login or policy-issued token (OpenBao CLI)
export VAULT_TOKEN="<short-lived-token>"
# Or HashiCorp Vault CLI against a Vault-compatible endpoint
vault login
```
`warden` reads the token from the env var named in `vault.token_env` (default
`VAULT_TOKEN`). OpenBao uses the same header; you do not need a separate
`BAO_TOKEN` unless you configure `token_env` that way.
On failure, `warden sign` suggests falling back to `--backend local` only for
lab recovery — not as a production substitute.
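
To make the token handling concrete, here is a minimal sketch that builds (but does not send) a signing request. The function name `build_sign_request` and the shape of the `vault_cfg` dict are hypothetical. The endpoint path and `X-Vault-Token` header come from this page; the body fields (`public_key`, `valid_principals`, `ttl`, `cert_type`) follow the Vault SSH secrets engine sign API. How ops-warden itself assembles the request is an assumption.

```python
import json
import os
import urllib.request


def build_sign_request(vault_cfg, actor_type, public_key, principals,
                       ttl_hours, environ=None):
    """Build a POST /v1/<mount>/sign/<role> request without sending it.

    Hypothetical helper; vault_cfg mirrors the `vault:` block of warden.yaml.
    """
    env = os.environ if environ is None else environ
    # The env var name is configurable via vault.token_env.
    token_env = vault_cfg.get("token_env", "VAULT_TOKEN")
    token = env.get(token_env)
    if not token:
        raise RuntimeError(f"{token_env} is not set; export a token first")

    mount = vault_cfg.get("mount", "ssh")
    role = vault_cfg["role_map"][actor_type]  # KeyError if the type is unmapped
    url = f"{vault_cfg['addr'].rstrip('/')}/v1/{mount}/sign/{role}"
    body = {
        "public_key": public_key,
        "valid_principals": ",".join(principals),
        "ttl": f"{ttl_hours}h",
        "cert_type": "user",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. The sketch stops short of that so it can be inspected without a live OpenBao endpoint.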
### SSH secrets engine setup (OpenBao)
Run once per environment after OpenBao is initialized and unsealed. Adjust TTL
limits to match `ActorType` policy in `wiki/AccessManagementDirective.md`
(adm 48 h, agt 24 h, atm 8 h).
```bash
# OpenBao CLI (bao) — preferred on Railiance
bao secrets enable ssh
bao write ssh/roles/agt-role \
    key_type=ca \
    allowed_users="*" \
    allow_user_certificates=true \
    default_user="agt" \
    ttl=24h max_ttl=24h
bao write ssh/roles/adm-role \
    key_type=ca \
    allowed_users="*" \
    allow_user_certificates=true \
    default_user="adm" \
    ttl=48h max_ttl=48h
bao write ssh/roles/atm-role \
    key_type=ca \
    allowed_users="*" \
    allow_user_certificates=true \
    default_user="atm" \
    ttl=8h max_ttl=8h
```
HashiCorp Vault uses the same paths with the `vault` CLI:
```bash
vault secrets enable ssh
vault write ssh/roles/agt-role key_type=ca ... # same role parameters
```
Mount path defaults to `ssh`; override with `vault.mount` in `warden.yaml` if
your engine lives elsewhere.
### Platform references
| Topic | Location |
|-------|----------|
| OpenBao deploy, unseal, OIDC admin | `railiance-platform/docs/openbao.md` |
| Host CA trust and principals | `railiance-infra` Ansible playbooks |
| Signing contract for callers | `wiki/CertCommandInterface.md` |
---
## Principals inventory (`inventory.yaml`)
```yaml
actors:
  # Actor name must carry the prefix matching its type:
  # adm-* for adm, agt-* for agt, atm-* for atm
  agt-state-hub-bridge:
    type: agt
    # Principals embedded in the cert; matched against /etc/ssh/auth_principals/%u
    principals:
      - agt-task-bridge
    # Certificate TTL in hours. Defaults: adm=48, agt=24, atm=8
    ttl_hours: 24
    description: "ops-bridge tunnel agent for state-hub"
  adm-bernd:
    type: adm
    principals:
      - adm-full
    ttl_hours: 48
  atm-backup-daily:
    type: atm
    principals:
      - atm-backup-daily
    ttl_hours: 8
    description: "nightly backup automation"
hosts:
  # Optional: documents which principals are allowed on each host.
  # Not enforced by warden; used for reference and future tooling.
  coulombcore:
    allowed_principals:
      agt:
        - agt-task-bridge
      atm:
        - atm-backup-daily
```
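
The naming and TTL rules in the comments above could be checked along these lines. `check_actor` is a hypothetical helper operating on an already-parsed inventory entry; requiring at least one principal is an assumption, and this is not ops-warden's actual validation code.

```python
# Default TTLs per actor type, as stated in the inventory comments.
DEFAULT_TTL_HOURS = {"adm": 48, "agt": 24, "atm": 8}


def check_actor(name, entry):
    """Return (ttl_hours, problems) for one entry under `actors:`.

    Hypothetical helper: ttl_hours is None when the type is unknown,
    and problems is a list of human-readable messages.
    """
    actor_type = entry.get("type")
    if actor_type not in DEFAULT_TTL_HOURS:
        return None, [f"{name}: unknown type {actor_type!r}"]

    problems = []
    # Actor name must carry the prefix matching its type.
    if not name.startswith(f"{actor_type}-"):
        problems.append(f"{name}: name must start with '{actor_type}-'")
    # Assumption: an actor without principals is treated as an error.
    if not entry.get("principals"):
        problems.append(f"{name}: at least one principal is required")

    # Fall back to the per-type default when ttl_hours is omitted.
    ttl = entry.get("ttl_hours", DEFAULT_TTL_HOURS[actor_type])
    return ttl, problems
```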
---
## Policy gate (flex-auth, opt-in)
When `policy.enabled: true`, `warden sign` and `warden issue` call flex-auth
`POST /v1/check` before signing. A deny decision blocks issuance, as does an
unreachable flex-auth service when `fail_closed: true`. For allowed decisions,
`policy_decision_id` is recorded in `signatures.log`.
```yaml
policy:
  enabled: false        # default: no behavior change
  flex_auth_url: http://127.0.0.1:8080
  fail_closed: true     # deny when flex-auth unreachable
  tenant: tenant:platform
  subject_env: WARDEN_POLICY_SUBJECT
  system: ops-warden
```
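
The interaction between `enabled`, `fail_closed`, and the flex-auth result can be summarized in a small decision function. This is a sketch only: `may_issue` and the outcome labels `"allow"`, `"deny"`, and `"unreachable"` are hypothetical, and the behavior with `fail_closed: false` (signing proceeds when flex-auth cannot be reached) is an assumption inferred from the option name rather than something this page states.

```python
def may_issue(enabled, fail_closed, outcome):
    """Decide whether signing may proceed.

    outcome is one of "allow", "deny", "unreachable" (hypothetical labels
    for the result of the flex-auth /v1/check call).
    """
    if not enabled:
        return True  # gate off: existing inventory-only behavior
    if outcome == "allow":
        return True
    if outcome == "deny":
        return False
    if outcome == "unreachable":
        # Assumption: with fail_closed false, an unreachable service
        # does not block issuance.
        return not fail_closed
    raise ValueError(f"unknown outcome: {outcome!r}")
```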
Full request shape and rollout notes: `wiki/PolicyGatedSigning.md`.

---
## Environment variables
| Variable | Default | Description |
|----------|---------|-------------|
| `WARDEN_CONFIG` | `~/.config/warden/warden.yaml` | Config file path |
| `VAULT_TOKEN` | — | API token for `backend: vault` (OpenBao or Vault; name configurable via `vault.token_env`) |
| `WARDEN_POLICY_SUBJECT` | — | IAM subject id for flex-auth checks (when `policy.enabled`) |
---
## cert_command integration with ops-bridge
Add `cert_command` to a tunnel in `~/.config/bridge/tunnels.yaml`:
```yaml
tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: agt-state-hub-bridge
    ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
    actor: agt-state-hub-bridge
    cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"
```
`ops-bridge` runs `cert_command` before each SSH launch, captures stdout as the cert,
and passes it alongside the private key via `ssh -i <key> -i <cert>`.
See `wiki/CertCommandInterface.md` for the full contract.
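
For callers other than `ops-bridge`, the "run the command, capture stdout as the cert" step might look like the sketch below. `fetch_certificate` is a hypothetical name, and the timeout, the `~` expansion, and the empty-output check are assumptions; the authoritative contract is in `wiki/CertCommandInterface.md`.

```python
import os
import shlex
import subprocess


def fetch_certificate(cert_command, timeout=30):
    """Run cert_command and return its stdout (the certificate text).

    Hypothetical helper. Raises CalledProcessError on a non-zero exit and
    RuntimeError when the command prints nothing.
    """
    # No shell is involved, so expand a leading "~" in each argument here.
    argv = [os.path.expanduser(arg) for arg in shlex.split(cert_command)]
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=True
    )
    cert = result.stdout.strip()
    if not cert:
        raise RuntimeError("cert_command produced no output")
    return cert
```

Running the command without a shell avoids quoting surprises in `tunnels.yaml`, at the cost of not supporting pipes or redirects inside `cert_command`.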