# Policy-Gated SSH Signing
Date: 2026-06-23

Status: **implemented (opt-in)**, WARDEN-WP-0007; policy package confirmed in FLEX-WP-0006.

By default, `warden sign` authorizes via the **inventory allow-list** and TTL policy
only. When `policy.enabled: true` is set in `warden.yaml`, ops-warden calls flex-auth
before signing and records the decision id in `signatures.log`.

---
## Flow
```text
warden sign <actor> --pubkey <path>
    |
    v
Load actor from inventory (type, principals, ttl)
    |
    v
policy.enabled?
    no  -> skip
    yes -> flex-auth POST /v1/check
               |
               +-- DENY / unreachable (fail_closed) -> CAError
               |
               v ALLOW
CABackend.sign()  (local or OpenBao SSH engine)
    |
    v
Append signatures.log (+ policy_decision_id when set)
```
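The decision order in the diagram can be expressed as a short Python sketch. It is illustrative only: `gated_sign`, `check_policy`, `backend_sign`, `append_log` and `PolicyUnreachable` are hypothetical names rather than ops-warden's real internals, the inventory lookup is assumed to have happened already, and the fail-open branch for `fail_closed: false` is an assumption this page does not spell out. What it shows is that the CA backend is invoked only after an explicit allow, and that a decision id is logged only when the gate actually ran.

```python
from typing import Callable, Optional, Tuple


class CAError(Exception):
    """Signing refused (policy deny, or flex-auth unreachable while fail_closed)."""


class PolicyUnreachable(Exception):
    """Hypothetical: flex-auth could not be reached or returned an HTTP error."""


# Hypothetical signature: check_policy() -> (allowed, decision_id, reason).
PolicyCheck = Callable[[], Tuple[bool, Optional[str], Optional[str]]]


def gated_sign(
    policy_enabled: bool,
    fail_closed: bool,
    check_policy: PolicyCheck,
    backend_sign: Callable[[], str],
    append_log: Callable[[dict], None],
) -> str:
    """Run the optional policy gate, then sign and log (sketch only)."""
    decision_id: Optional[str] = None
    if policy_enabled:
        try:
            allowed, decision_id, reason = check_policy()
        except PolicyUnreachable as exc:
            if fail_closed:
                raise CAError(f"flex-auth unreachable: {exc}") from exc
            # Assumption: with fail_closed: false, an unreachable flex-auth
            # falls back to the inventory + TTL checks alone.
            allowed, decision_id, reason = True, None, None
        if not allowed:
            raise CAError(f"flex-auth denied sign: {reason}")

    cert = backend_sign()  # local CA or OpenBao SSH engine
    entry = {"cert": cert}
    if decision_id:
        entry["policy_decision_id"] = decision_id
    append_log(entry)
    return cert
```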
The same gate also runs for `warden issue` (local backend only).

---
## flex-auth request shape
| Field | Source |
| --- | --- |
| `subject.id` | `WARDEN_POLICY_SUBJECT` env var, or actor name |
| `subject.type` | Actor type (`adm` / `agt` / `atm`) |
| `tenant` | `policy.tenant` (default `tenant:platform`) |
| `resource.id` | `ssh-cert:actor/<actor-name>` |
| `resource.type` | `ssh-certificate` |
| `action` | `sign` |
| `context.principals` | From inventory |
| `context.actor_type` | adm \| agt \| atm |
| `context.pubkey_fingerprint` | SHA256 of pubkey text |
| `context.ttl_hours` | Requested TTL |

On allow, flex-auth must return `effect: allow` and an `id` (or `request_id`),
which ops-warden records as `policy_decision_id`. Deny responses include a
`reason` that is surfaced in the CLI error.
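A request body following the table, plus the allow/deny handling just described, might be built as below. This is a sketch under stated assumptions: the dotted field names are taken to mean nested JSON objects (`subject.id` becomes `{"subject": {"id": ...}}`), `resource.system` carries `policy.system`, and the fingerprint is a hex SHA-256 digest of the public key text; ops-warden's actual encoding may differ. `build_check_request`, `parse_check_response` and `pubkey_fingerprint` are hypothetical helpers, and treating an allow without any id as a deny is a defensive choice made here.

```python
import hashlib
from typing import List, Optional, Tuple


def pubkey_fingerprint(pubkey_text: str) -> str:
    """Assumed encoding: hex SHA-256 of the public key text as read from disk."""
    return hashlib.sha256(pubkey_text.encode("utf-8")).hexdigest()


def build_check_request(subject_id: str, actor_name: str, actor_type: str,
                        principals: List[str], ttl_hours: int, pubkey_text: str,
                        tenant: str = "tenant:platform",
                        system: str = "ops-warden") -> dict:
    """Body for POST /v1/check; subject_id is WARDEN_POLICY_SUBJECT or the actor name."""
    return {
        "subject": {"id": subject_id, "type": actor_type},
        "tenant": tenant,
        "resource": {
            "id": f"ssh-cert:actor/{actor_name}",
            "type": "ssh-certificate",
            "system": system,  # assumed placement of policy.system
        },
        "action": "sign",
        "context": {
            "principals": list(principals),
            "actor_type": actor_type,  # adm | agt | atm
            "pubkey_fingerprint": pubkey_fingerprint(pubkey_text),
            "ttl_hours": ttl_hours,
        },
    }


def parse_check_response(resp: dict) -> Tuple[bool, Optional[str], Optional[str]]:
    """Return (allowed, decision_id, reason) from a flex-auth reply."""
    if resp.get("effect") == "allow":
        decision_id = resp.get("id") or resp.get("request_id")
        if not decision_id:
            # Defensive assumption: an allow without a decision id is not trusted.
            return False, None, "allow response missing decision id"
        return True, decision_id, None
    return False, None, resp.get("reason", "denied")
```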

---
## Configuration
```yaml
# warden.yaml: policy gate (opt-in, default off)
policy:
  enabled: false
  flex_auth_url: http://127.0.0.1:8080
  fail_closed: true
  tenant: tenant:platform
  subject_env: WARDEN_POLICY_SUBJECT
  system: ops-warden
```
| Key | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | When `true`, call flex-auth before every sign/issue |
| `flex_auth_url` | `http://127.0.0.1:8080` | flex-auth base URL |
| `fail_closed` | `true` | Deny signing when flex-auth is unreachable or returns an HTTP error |
| `tenant` | `tenant:platform` | Tenant sent in subject and resource |
| `subject_env` | `WARDEN_POLICY_SUBJECT` | Env var for IAM subject id override |
| `system` | `ops-warden` | Resource system identifier |

Set `WARDEN_POLICY_SUBJECT` to the caller's IAM profile `sub` when available.
If it is unset, the actor name is used as the subject id.
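Applying these defaults and resolving the subject id can be sketched as follows, assuming `warden.yaml` has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `PolicyConfig`, `load_policy_config` and `resolve_subject` are hypothetical names; rejecting unknown keys, and treating an empty environment variable as unset, are illustrative choices rather than documented ops-warden behavior.

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class PolicyConfig:
    # Defaults copied from the configuration table.
    enabled: bool = False
    flex_auth_url: str = "http://127.0.0.1:8080"
    fail_closed: bool = True
    tenant: str = "tenant:platform"
    subject_env: str = "WARDEN_POLICY_SUBJECT"
    system: str = "ops-warden"


def load_policy_config(warden_cfg: Mapping[str, Any]) -> PolicyConfig:
    """Build a PolicyConfig from an already-parsed warden.yaml mapping."""
    raw = warden_cfg.get("policy") or {}
    unknown = sorted(set(raw) - {f.name for f in fields(PolicyConfig)})
    if unknown:
        # Illustrative choice: fail loudly on typos such as "enable".
        raise ValueError(f"unknown policy keys: {unknown}")
    return PolicyConfig(**raw)


def resolve_subject(cfg: PolicyConfig, actor_name: str,
                    environ: Mapping[str, str] = os.environ) -> str:
    """Env var named by subject_env if set and non-empty, else the actor name."""
    return environ.get(cfg.subject_env) or actor_name
```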

---
## Versioning
| Version | Gate | Status |
| --- | --- | --- |
| **v1** | Inventory + TTL max | Shipped |
| **v2** | flex-auth opt-in via `policy.enabled` | Shipped (WP-0007) |
| **v2.1** | Identity claims required for `adm` signs | Planned |
| **v3** | Tenant-scoped policies per `tenant:*` | Planned |
---
## What stays in inventory
- Actor registration (name, type, default principals, default TTL)
- Host reference documentation
- Scorecard local checks

flex-auth decides **whether this sign request is allowed now**; the inventory
defines **what the actor is allowed to request**.

---
## flex-auth policy package (FLEX-WP-0006)
flex-auth owns the `ssh-certificate` / `sign` policy package. ops-warden consumes
it via `POST /v1/check` when `policy.enabled: true`.

**Handoff (canonical):** `~/flex-auth/docs/ops-warden-policy-gate-handoff.md`

| Asset | flex-auth path |
| --- | --- |
| Policy package | `examples/ops-warden/policy_package.md` |
| Allow/deny fixtures | `examples/ops-warden/policy_fixtures.yaml` |
| Registry snapshot | `examples/ops-warden/registry_snapshot.json` |
| Subject manifest | `examples/ops-warden/subject_manifest.yaml` |
| Resource manifest | `examples/ops-warden/resource_manifest.yaml` |
### Tenant and subject bindings
| Field | Value |
| --- | --- |
| Tenant | `tenant:platform` (`policy.tenant`) |
| Resource system | `ops-warden` (`policy.system`) |
| Resource type | `ssh-certificate` |
| Action | `sign` |
| Resource id | `ssh-cert:actor/<actor-name>` |

| Actor type | Example flex-auth subject | ops-warden inventory name pattern |
| --- | --- | --- |
| `adm` | `platform-steward` | `adm-*` |
| `agt` | `ci-deploy-agent` | `agt-*` |
| `atm` | `backup-automation` | `atm-*` |

**Subject id sent to flex-auth:** `WARDEN_POLICY_SUBJECT` when set, otherwise the
inventory actor name. flex-auth may also allow `iam:<actor-name>` when it is listed in
`allowed_subjects` on the resource.

**Principals and TTL:** Taken from the sign request (inventory defaults). flex-auth
denies when principals are empty or disallowed, or when the TTL exceeds `max_ttl_hours`
on the registered resource.
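The deny conditions above, together with the deny fixtures listed under fixture coverage, can be mirrored as a local pre-check to predict flex-auth's answer while debugging. This is a hypothetical sketch, not flex-auth's evaluator: only `allowed_subjects` and `max_ttl_hours` are named on this page, while the resource keys `actor_type` and `allowed_principals`, the check order, and the reason strings are assumptions.

```python
from typing import Any, List, Mapping, Optional


def precheck(resource: Mapping[str, Any], subject_id: str, actor_type: str,
             principals: List[str], ttl_hours: int,
             fingerprint: Optional[str]) -> Optional[str]:
    """Return a deny reason, or None when no mirrored rule denies the request."""
    # Subject must be listed on the resource (actor name or e.g. iam:<actor-name>).
    if subject_id not in resource.get("allowed_subjects", []):
        return "unknown subject"
    # Assumed key: the actor type the resource was registered for.
    expected_type = resource.get("actor_type")
    if expected_type is not None and expected_type != actor_type:
        return "actor type mismatch"
    # Principals must be non-empty and all allowed (assumed key name).
    allowed = set(resource.get("allowed_principals", []))
    if not principals or not set(principals) <= allowed:
        return "empty or disallowed principals"
    if ttl_hours > resource["max_ttl_hours"]:
        return "ttl above max"
    if not fingerprint:
        return "missing pubkey fingerprint"
    return None
```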
### Fixture coverage (flex-auth)
- Allow: `fixture:ops-warden-adm-sign-allow`, `fixture:ops-warden-agt-sign-allow`,
  `fixture:ops-warden-atm-sign-allow`.
- Deny: `fixture:ops-warden-unknown-subject-deny`,
  `fixture:ops-warden-actor-type-mismatch-deny`, `fixture:ops-warden-ttl-above-max-deny`,
  `fixture:ops-warden-disallowed-principal-deny`,
  `fixture:ops-warden-missing-fingerprint-deny`.
### Local smoke
```bash
# flex-auth (from ~/flex-auth)
flex-auth serve --addr 127.0.0.1:8080 \
  --registry examples/ops-warden/registry_snapshot.json \
  --policy examples/ops-warden/policy_package.md \
  --log /tmp/flex-auth-ops-warden-decisions.jsonl

# warden.yaml: policy.enabled: true, flex_auth_url pointing at flex-auth.
# Use an actor registered in the flex-auth registry (example fixtures use
# template names; production needs a registry slice for real inventory actors).
```
Local end-to-end evidence: `history/2026-06-23-flex-auth-policy-gate-local-smoke.md`.
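With the local flex-auth above running, the same check can be exercised from Python using only the standard library. A minimal sketch, assuming flex-auth exchanges JSON on `POST /v1/check` and reports a deny as a normal 2xx reply carrying `effect` and `reason`; `post_check` and `PolicyUnreachable` are hypothetical names. Transport failures, HTTP error statuses and non-JSON bodies all raise `PolicyUnreachable`, which a caller with `fail_closed: true` would turn into a refused sign.

```python
import json
import urllib.request
from typing import Any, Dict


class PolicyUnreachable(Exception):
    """flex-auth could not be reached, answered with an HTTP error, or sent non-JSON."""


def post_check(flex_auth_url: str, body: Dict[str, Any],
               timeout: float = 5.0) -> Dict[str, Any]:
    """POST body as JSON to <flex_auth_url>/v1/check and return the decoded reply."""
    req = urllib.request.Request(
        flex_auth_url.rstrip("/") + "/v1/check",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # URLError, HTTPError, refused connections and timeouts are OSError
        # subclasses; json.JSONDecodeError is a ValueError.
        raise PolicyUnreachable(str(exc)) from exc
```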
### Production registry from inventory
Build a flex-auth registry snapshot that mirrors `inventory.yaml` actors:
```bash
python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \
  -o registry/flex-auth/production_registry_snapshot.json
flex-auth load-registry --file registry/flex-auth/production_registry_snapshot.json
```
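The mapping from inventory actors to registry entries might look roughly like this. It is a hypothetical sketch, not `scripts/build_flex_auth_registry.py`: the inventory shape (`type`, `principals`, `ttl_hours` per actor), the output keys, listing both `<actor>` and `iam:<actor>` as allowed subjects, and reusing the inventory TTL as `max_ttl_hours` are all assumptions; the authoritative snapshot schema is the one in flex-auth's `examples/ops-warden/registry_snapshot.json`. Sorting actors and keys keeps regenerated snapshots diff-friendly when the script is re-run after inventory changes.

```python
import json
from typing import Any, Dict, List, Mapping


def build_resources(actors: Mapping[str, Mapping[str, Any]],
                    system: str = "ops-warden") -> List[Dict[str, Any]]:
    """Map inventory actors to illustrative flex-auth resource entries."""
    resources = []
    for name in sorted(actors):  # stable order => stable snapshot diffs
        actor = actors[name]
        resources.append({
            "id": f"ssh-cert:actor/{name}",
            "type": "ssh-certificate",
            "system": system,
            "actor_type": actor["type"],                      # assumed key
            "allowed_subjects": [name, f"iam:{name}"],        # assumed policy
            "allowed_principals": list(actor["principals"]),  # assumed key
            "max_ttl_hours": actor["ttl_hours"],              # assumed: TTL as cap
        })
    return resources


def snapshot_json(actors: Mapping[str, Mapping[str, Any]],
                  system: str = "ops-warden") -> str:
    """Serialize deterministically (sorted keys, fixed indent)."""
    return json.dumps({"resources": build_resources(actors, system)},
                      indent=2, sort_keys=True)


if __name__ == "__main__":
    inventory = {"agt-ci": {"type": "agt", "principals": ["deploy"], "ttl_hours": 8}}
    print(snapshot_json(inventory))
```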
Re-run after adding or changing actors. Deploy the snapshot to the production
flex-auth runtime together with `~/flex-auth/examples/ops-warden/policy_package.md`.

Smoke (non-secret):
```bash
./scripts/policy_gate_production_smoke.sh
# OpenBao-backed when VAULT_TOKEN is valid:
SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh
```
Evidence: `history/2026-06-23-flex-auth-policy-gate-production-smoke.md`.

---
## Production rollout
**Keep `policy.enabled: false` until flex-auth is reachable** at `policy.flex_auth_url`.
With `fail_closed: true`, an unreachable flex-auth blocks all signs.
### Operator checklist
| Step | Owner | Action |
| --- | --- | --- |
| 1 | flex-auth | Deploy runtime; confirm `curl <flex_auth_url>/healthz` → 200 (**FLEX-WP-0007**) |
| 2 | flex-auth | Load production registry + policy package (`~/flex-auth/examples/ops-warden/`) |
| 3 | ops-warden | Regenerate registry from inventory: `scripts/build_flex_auth_registry.py` |
| 4 | ops-warden | Local smoke: `./scripts/policy_gate_production_smoke.sh` |
| 5 | operator | Vault smoke: `SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh` (valid `VAULT_TOKEN`) |
| 6 | operator | Set `policy.flex_auth_url` in `~/.config/warden/warden.yaml` |
| 7 | operator | Set `policy.enabled: true`; keep `fail_closed: true` |
| 8 | operator | Allow smoke: `warden sign <actor>` → `signatures.log` has `policy_decision_id` |
| 9 | operator | Deny smoke: e.g. `--ttl` above max; CLI shows flex-auth `reason`, no cert issued |

Cross-repo references:
- `~/flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md`
- `history/2026-06-23-flex-auth-production-pickup-suggestion.md`
- `history/2026-06-23-flex-auth-policy-gate-production-smoke.md`
### Summary
1. Deploy the flex-auth registry and policy package to the production flex-auth
runtime — **not** only the example fixtures.
2. Set `policy.flex_auth_url` to the production flex-auth base URL.
3. Enable `policy.enabled: true` only after steps 1–5 of the operator checklist pass.
4. Keep `fail_closed: true` unless an explicit break-glass procedure exists.
5. Smoke allow and deny paths; preserve non-secret evidence only.
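Parts of the checklist (the `/healthz` probe in step 1 and the `fail_closed` setting in step 7) can be automated as a pre-flight before flipping `policy.enabled`. A minimal standard-library sketch, assuming the `policy` block is available as a dict; `healthz_ok` and `rollout_blockers` are hypothetical helpers, and a pre-flight like this complements the allow and deny smoke runs rather than replacing them.

```python
import urllib.request
from typing import Any, List, Mapping


def healthz_ok(flex_auth_url: str, timeout: float = 5.0) -> bool:
    """True when GET <flex_auth_url>/healthz answers HTTP 200 (checklist step 1)."""
    try:
        url = flex_auth_url.rstrip("/") + "/healthz"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError/HTTPError, refused connection, timeout
        return False


def rollout_blockers(policy: Mapping[str, Any], timeout: float = 5.0) -> List[str]:
    """Reasons not to set policy.enabled: true yet; an empty list means go."""
    blockers = []
    url = policy.get("flex_auth_url", "http://127.0.0.1:8080").rstrip("/")
    if not policy.get("fail_closed", True):
        blockers.append("fail_closed is false; keep it true unless break-glass exists")
    if not healthz_ok(url, timeout):
        blockers.append(f"{url}/healthz did not return 200")
    return blockers
```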
---
## See also
- `wiki/OpsWardenConfig.md` — full config reference
- `wiki/CredentialRouting.md`
- `~/flex-auth/docs/ops-warden-policy-gate-handoff.md` — flex-auth handoff
- `flex-auth/INTENT.md`
- `net-kingdom/docs/platform-identity-security-architecture.md`