Policy-Gated SSH Signing
Date: 2026-06-23
Status: implemented (opt-in) — WARDEN-WP-0007; policy package confirmed FLEX-WP-0006
By default warden sign authorizes via inventory allow-list and TTL policy
only. When policy.enabled: true in warden.yaml, ops-warden calls flex-auth
before signing and records the decision id in signatures.log.
Flow
    warden sign <actor> --pubkey <path>
            |
            v
    Load actor from inventory (type, principals, ttl)
            |
            v
    policy.enabled?
        no  -> skip
        yes -> flex-auth POST /v1/check
            |
            +-- DENY / unreachable (fail_closed) -> CAError
            |
            v ALLOW
    CABackend.sign() (local or OpenBao SSH engine)
            |
            v
    Append signatures.log (+ policy_decision_id when set)
The same gate runs for warden issue (local backend only).
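The flow above can be sketched roughly as follows. This is a minimal illustration, not the ops-warden implementation: `gated_sign`, `Decision`, and the `check` / `sign` callables are hypothetical stand-ins for the flex-auth call and `CABackend.sign()`, and a plain list stands in for signatures.log.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CAError(Exception):
    """Raised when a sign request is refused (name taken from the flow above)."""


@dataclass
class Decision:
    effect: str                        # "allow" or "deny"
    decision_id: Optional[str] = None  # id returned by flex-auth on allow
    reason: Optional[str] = None       # deny reason surfaced to the CLI


def gated_sign(
    actor: dict,
    policy_enabled: bool,
    check: Callable[[dict], Decision],  # hypothetical wrapper around POST /v1/check
    sign: Callable[[dict], str],        # hypothetical wrapper around CABackend.sign()
    log: list,                          # stand-in for signatures.log
) -> str:
    decision_id = None
    if policy_enabled:
        decision = check(actor)
        if decision.effect != "allow":
            # Deny: stop before the CA backend is ever touched.
            raise CAError(decision.reason or "denied by policy")
        decision_id = decision.decision_id

    cert = sign(actor)

    # Record the decision id only when the gate actually ran.
    entry = {"actor": actor["name"]}
    if decision_id is not None:
        entry["policy_decision_id"] = decision_id
    log.append(entry)
    return cert
```

The point of the ordering is that a deny never reaches the signing backend, and a log entry without `policy_decision_id` indicates a sign made with the gate disabled.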
flex-auth request shape
| Field | Source |
|---|---|
| subject.id | WARDEN_POLICY_SUBJECT env var, or actor name |
| subject.type | Actor type (adm / agt / atm) |
| tenant | policy.tenant (default tenant:platform) |
| resource.id | ssh-cert:actor/<actor-name> |
| resource.type | ssh-certificate |
| action | sign |
| context.principals | From inventory |
| context.actor_type | adm / agt / atm |
| context.pubkey_fingerprint | SHA256 of pubkey text |
| context.ttl_hours | Requested TTL |
flex-auth must return effect: allow and an id (or request_id) on allow.
Deny responses include a reason surfaced in the CLI error.
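As an illustration of the request and response contract, the sketch below builds a payload from the fields in the table and interprets a reply. The function names are hypothetical, and two details are assumptions rather than confirmed behaviour: that the dotted field names map to nested JSON objects, and that the fingerprint is the hex SHA-256 digest of the UTF-8 pubkey text.

```python
import hashlib


def build_check_request(actor_name, actor_type, principals, ttl_hours,
                        pubkey_text, subject_id=None, tenant="tenant:platform"):
    """Build a /v1/check payload (nesting of dotted fields is an assumption)."""
    # Assumption: hex SHA-256 over the UTF-8 pubkey text; the real encoding
    # (hex vs. base64, key blob vs. text) is not specified in this document.
    fingerprint = hashlib.sha256(pubkey_text.encode("utf-8")).hexdigest()
    return {
        "subject": {"id": subject_id or actor_name, "type": actor_type},
        "tenant": tenant,
        "resource": {
            "id": f"ssh-cert:actor/{actor_name}",
            "type": "ssh-certificate",
        },
        "action": "sign",
        "context": {
            "principals": list(principals),
            "actor_type": actor_type,
            "pubkey_fingerprint": fingerprint,
            "ttl_hours": ttl_hours,
        },
    }


def parse_check_response(body):
    """Return (allowed, decision_id, reason) from a decoded response body."""
    allowed = body.get("effect") == "allow"
    decision_id = body.get("id") or body.get("request_id")
    if allowed and not decision_id:
        # An allow without an id breaks the contract, so treat it as a refusal.
        return False, None, "allow response missing id"
    return allowed, decision_id, body.get("reason")
```

Treating an allow without an id as a refusal is one conservative reading of "must return effect: allow and an id"; the actual client may handle that case differently.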
Configuration
# warden.yaml: policy gate (opt-in, default off)
policy:
  enabled: false
  flex_auth_url: http://127.0.0.1:8080
  fail_closed: true
  tenant: tenant:platform
  subject_env: WARDEN_POLICY_SUBJECT
  system: ops-warden
| Key | Default | Description |
|---|---|---|
| enabled | false | When true, call flex-auth before every sign/issue |
| flex_auth_url | http://127.0.0.1:8080 | flex-auth base URL |
| fail_closed | true | Deny sign when flex-auth is unreachable or returns HTTP error |
| tenant | tenant:platform | Tenant sent in subject and resource |
| subject_env | WARDEN_POLICY_SUBJECT | Env var for IAM subject id override |
| system | ops-warden | Resource system identifier |
Set WARDEN_POLICY_SUBJECT to the caller's IAM profile sub when available.
If unset, the actor name is used as subject id.
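The defaults and the subject fallback described above could be modelled as in the following sketch. `PolicyConfig` and `resolve_subject` are hypothetical names chosen for illustration, and silently ignoring unknown keys is an assumption about how the loader behaves.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PolicyConfig:
    # Defaults mirror the table above.
    enabled: bool = False
    flex_auth_url: str = "http://127.0.0.1:8080"
    fail_closed: bool = True
    tenant: str = "tenant:platform"
    subject_env: str = "WARDEN_POLICY_SUBJECT"
    system: str = "ops-warden"

    @classmethod
    def from_mapping(cls, raw):
        """Build from the parsed `policy:` mapping; missing keys keep defaults."""
        known = {f.name for f in fields(cls)}
        # Assumption: unknown keys are ignored rather than rejected.
        return cls(**{k: v for k, v in (raw or {}).items() if k in known})


def resolve_subject(cfg, actor_name, env=None):
    """Use the env override when set and non-empty, else the actor name."""
    env = os.environ if env is None else env
    return env.get(cfg.subject_env) or actor_name
```

For example, `resolve_subject(PolicyConfig(), "adm-alice", env={})` falls back to `"adm-alice"`, while an environment that sets `WARDEN_POLICY_SUBJECT` to a non-empty value yields that value instead.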
Versioning
| Version | Gate | Status |
|---|---|---|
| v1 | Inventory + TTL max | Shipped |
| v2 | flex-auth opt-in via policy.enabled | Shipped (WP-0007) |
| v2.1 | Identity claims required for adm signs | Planned |
| v3 | Tenant-scoped policies per tenant:* | Planned |
What stays in inventory
- Actor registration (name, type, default principals, default TTL)
- Host reference documentation
- Scorecard local checks
flex-auth decides whether this sign request is allowed now; inventory defines what the actor is allowed to request.
flex-auth policy package (FLEX-WP-0006)
flex-auth owns the ssh-certificate / sign policy package. ops-warden consumes
it via POST /v1/check when policy.enabled: true.
Handoff (canonical): ~/flex-auth/docs/ops-warden-policy-gate-handoff.md
| Asset | flex-auth path |
|---|---|
| Policy package | examples/ops-warden/policy_package.md |
| Allow/deny fixtures | examples/ops-warden/policy_fixtures.yaml |
| Registry snapshot | examples/ops-warden/registry_snapshot.json |
| Subject manifest | examples/ops-warden/subject_manifest.yaml |
| Resource manifest | examples/ops-warden/resource_manifest.yaml |
Tenant and subject bindings
| Field | Value |
|---|---|
| Tenant | tenant:platform (policy.tenant) |
| Resource system | ops-warden (policy.system) |
| Resource type | ssh-certificate |
| Action | sign |
| Resource id | ssh-cert:actor/<actor-name> |

| Actor type | Example flex-auth subject | ops-warden inventory name pattern |
|---|---|---|
| adm | platform-steward | adm-* |
| agt | ci-deploy-agent | agt-* |
| atm | backup-automation | atm-* |
Subject id sent to flex-auth: WARDEN_POLICY_SUBJECT when set, otherwise the
inventory actor name. flex-auth may also allow iam:<actor-name> when listed in
allowed_subjects on the resource.
Principals and TTL: Taken from the sign request (inventory defaults). flex-auth
denies when principals are empty/disallowed or TTL exceeds max_ttl_hours on the
registered resource.
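To make the deny conditions concrete, here is a small sketch of the kind of check the policy package performs on principals and TTL. It is not flex-auth code: the function is hypothetical, and `allowed_principals` is an assumed field name on the registered resource (only `max_ttl_hours` appears in this document).

```python
def evaluate_sign_request(resource, principals, ttl_hours):
    """Return (effect, reason) for a sign request; illustration only."""
    # Assumption: the registered resource lists its permitted principals
    # under a key named "allowed_principals".
    allowed = set(resource.get("allowed_principals", []))

    if not principals:
        return "deny", "no principals requested"

    disallowed = sorted(set(principals) - allowed)
    if disallowed:
        return "deny", "disallowed principals: " + ", ".join(disallowed)

    max_ttl = resource["max_ttl_hours"]
    if ttl_hours > max_ttl:
        return "deny", f"ttl_hours {ttl_hours} exceeds max_ttl_hours {max_ttl}"

    return "allow", None
```

A request for exactly `max_ttl_hours` is allowed in this sketch; whether the real policy treats the limit as inclusive is not stated here.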
Fixture coverage (flex-auth)
Allow: fixture:ops-warden-adm-sign-allow, fixture:ops-warden-agt-sign-allow,
fixture:ops-warden-atm-sign-allow.
Deny: fixture:ops-warden-unknown-subject-deny,
fixture:ops-warden-actor-type-mismatch-deny, fixture:ops-warden-ttl-above-max-deny,
fixture:ops-warden-disallowed-principal-deny,
fixture:ops-warden-missing-fingerprint-deny.
Local smoke
# flex-auth (from ~/flex-auth)
flex-auth serve --addr 127.0.0.1:8080 \
  --registry examples/ops-warden/registry_snapshot.json \
  --policy examples/ops-warden/policy_package.md \
  --log /tmp/flex-auth-ops-warden-decisions.jsonl
# warden.yaml — policy.enabled: true, flex_auth_url pointing at flex-auth
# Use an actor registered in the flex-auth registry (example fixtures use
# template names; production needs a registry slice for real inventory actors).
Local end-to-end evidence: history/2026-06-23-flex-auth-policy-gate-local-smoke.md.
Production registry from inventory
Build a flex-auth registry snapshot that mirrors inventory.yaml actors:
python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \
  -o registry/flex-auth/production_registry_snapshot.json
flex-auth load-registry --file registry/flex-auth/production_registry_snapshot.json
Re-run after adding or changing actors. Deploy the snapshot to the production
flex-auth runtime together with ~/flex-auth/examples/ops-warden/policy_package.md.
Smoke (non-secret):
./scripts/policy_gate_production_smoke.sh
# OpenBao-backed when VAULT_TOKEN is valid:
SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh
Evidence: history/2026-06-23-flex-auth-policy-gate-production-smoke.md.
Production rollout
Keep policy.enabled: false until flex-auth is reachable at policy.flex_auth_url.
With fail_closed: true, an unreachable flex-auth blocks all signs.
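The effect of fail_closed on an unreachable flex-auth can be sketched with the standard library alone. `check_policy` is a hypothetical helper, and the fail-open branch (returning `None` so the caller signs without a decision id) is an assumption about what `fail_closed: false` would mean; this document only specifies the fail-closed behaviour.

```python
import json
import urllib.error
import urllib.request


class CAError(Exception):
    """Raised when signing must be refused."""


def check_policy(base_url, payload, fail_closed=True, timeout=5.0,
                 opener=urllib.request.urlopen):
    """POST payload to <base_url>/v1/check and return the decoded JSON body."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/check",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # URLError also covers HTTPError (non-2xx); ValueError covers bad JSON.
        if fail_closed:
            raise CAError(f"flex-auth unavailable, refusing to sign: {exc}") from exc
        # Assumption: fail-open means "no decision", left to the caller.
        return None
```

The `opener` parameter exists only so the behaviour can be exercised without a running flex-auth, for instance by passing a function that raises `urllib.error.URLError`.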
Operator checklist
| Step | Owner | Action |
|---|---|---|
| 1 | flex-auth | Deploy runtime; confirm curl <flex_auth_url>/healthz → 200 (FLEX-WP-0007) |
| 2 | flex-auth | Load production registry + policy package (~/flex-auth/examples/ops-warden/) |
| 3 | ops-warden | Regenerate registry from inventory: scripts/build_flex_auth_registry.py |
| 4 | ops-warden | Local smoke: ./scripts/policy_gate_production_smoke.sh |
| 5 | operator | Vault smoke: SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh (valid VAULT_TOKEN) |
| 6 | operator | Set policy.flex_auth_url in ~/.config/warden/warden.yaml |
| 7 | operator | Set policy.enabled: true; keep fail_closed: true |
| 8 | operator | Allow smoke: warden sign <actor> — signatures.log has policy_decision_id |
| 9 | operator | Deny smoke: e.g. --ttl above max — CLI shows flex-auth reason, no cert |
Cross-repo references:
- ~/flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md
- history/2026-06-23-flex-auth-production-pickup-suggestion.md
- history/2026-06-23-flex-auth-policy-gate-production-smoke.md
Summary
- Deploy the flex-auth registry and policy package to the production flex-auth runtime, not only the example fixtures.
- Set policy.flex_auth_url to the production flex-auth base URL.
- Enable policy.enabled: true only after steps 1–5 pass.
- Keep fail_closed: true unless an explicit break-glass procedure exists.
- Smoke allow and deny paths; preserve non-secret evidence only.
See also
- wiki/OpsWardenConfig.md (full config reference)
- wiki/CredentialRouting.md
- ~/flex-auth/docs/ops-warden-policy-gate-handoff.md (flex-auth handoff)
- flex-auth/INTENT.md
- net-kingdom/docs/platform-identity-security-architecture.md