flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md
tegwick 339c35e876
Close ops-warden policy gate deployment
2026-06-30 00:52:56 +02:00


---
id: FLEX-WP-0007
type: workplan
title: "Ops-Warden Policy Gate Production Deployment"
domain: infotech
repo: flex-auth
status: finished
owner: codex
topic_slug: flex-auth
planning_priority: P0
planning_order: 70
depends_on_workplans:
- FLEX-WP-0006
related_workplans:
- WARDEN-WP-0009
created: "2026-06-23"
updated: "2026-06-30"
state_hub_workstream_id: "358ce697-2611-4fe9-89ab-63e86ceb00fa"
---
# FLEX-WP-0007: Ops-Warden Policy Gate Production Deployment
## Purpose
Deploy flex-auth as a reachable production runtime for ops-warden's opt-in SSH
signing policy gate, load a production registry aligned with real inventory
actors, and record joint smoke evidence so operators can set
`policy.enabled: true` in `warden.yaml` when the ecosystem maturity stage calls
for live enforcement.
Review update: repo-side production readiness is now tracked separately from
operator-only work. This repo publishes the production fixture, tests, runtime
command, and sync contract. The stable-URL deployment and the OpenBao smoke were
completed through the operator tunnel and a scoped warden-sign OpenBao lane. The
final `policy.enabled` production flip is explicitly deferred until the
ecosystem reaches testing/production maturity.
## Background
ops-warden finished WARDEN-WP-0009 on the caller side: local and
production-registry smoke passed, and the production registry generator exists.
The remaining risk is operational rather than a question of policy shape: warden
workstations need a reachable flex-auth URL and a passing vault-backed joint
smoke before the gate can be banked for later enforcement.
Production registry artifacts:
- flex-auth fixture: examples/ops-warden/production_registry_snapshot.json
- ops-warden source artifact: ~/ops-warden/registry/flex-auth/production_registry_snapshot.json
- ops-warden generator: ~/ops-warden/scripts/build_flex_auth_registry.py
## Ownership Boundary
| Concern | Owner |
| --- | --- |
| Policy package and PDP decision | flex-auth |
| Actor inventory and TTL/principal defaults | ops-warden |
| SSH CA and OpenBao signing | ops-warden |
| Production registry content for SSH actors | Joint: ops-warden generates, flex-auth hosts |
| `policy.enabled` flip | ops-warden operator after flex-auth is reachable |

No SSH private keys, OpenBao tokens, or other secrets belong in fixtures, docs,
State Hub messages, or smoke evidence.
## T1 - Deploy production flex-auth runtime
```task
id: FLEX-WP-0007-T01
status: done
priority: high
state_hub_task_id: "727573fc-86a3-4f5a-abd7-40b0ccb01e68"
```
Deploy `flex-auth serve` (or an equivalent runtime) to a stable URL reachable
from workstations that run `warden sign`.
- [x] Choose the preferred target: the in-cluster Service at http://flex-auth.flex-auth.svc.cluster.local:8080 when reachable; otherwise an approved operator tunnel or ingress with the same base path
- [x] Document canonical `policy.flex_auth_url` selection in docs/ops-warden-registry-sync.md
- [x] Document the healthz pre-flight: `GET /healthz` returns HTTP 200
- [x] Add service test coverage for `/healthz`
- [x] Deploy the operator tunnel as flex-auth-coulombcore and confirm `POST /v1/check` is reachable from CoulombCore

Acceptance: the operator runs `curl <flex_auth_url>/healthz` from the warden
workstation and receives HTTP 200. Verified from CoulombCore on 2026-06-24 with
`flex_auth_url` set to http://127.0.0.1:18090.
## T2 - Load production registry and verify real actors
```task
id: FLEX-WP-0007-T02
status: done
priority: high
state_hub_task_id: "6ec1e00c-4a3a-475b-aefb-af3961de7070"
```
Load the production registry snapshot derived from ops-warden inventory, not
only the template actors in examples/ops-warden/registry_snapshot.json.
- [x] Add examples/ops-warden/production_registry_snapshot.json from the ops-warden generated artifact
- [x] Document regenerate and load procedure in docs/ops-warden-registry-sync.md
- [x] Verify allow for agt-state-hub-bridge / sign
- [x] Verify deny for ttl_out_of_bounds
- [x] Verify deny for unregistered actors with unknown_actor_resource
- [x] Add CI tests using production actor names: agt-state-hub-bridge, agt-codex-interhub-bootstrap, adm-example, atm-backup-daily
Acceptance: local flex-auth coverage allows agt-state-hub-bridge without
ops-warden-local registry patching. Deployed runtime verification remains part
of T1.
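
The allow and deny cases above can be illustrated with a simplified decision function. This is a hypothetical sketch, not flex-auth's policy package. The `ActorEntry` fields, the TTL values, the positive-TTL lower bound, and the `action_not_permitted` reason are all assumptions, and the real registry snapshot schema may differ. The sketch checks registration before TTL because the TTL bound comes from the actor's own entry.

```go
package main

import "fmt"

// ActorEntry is a hypothetical, simplified view of one registry actor;
// the real production_registry_snapshot.json schema may differ.
type ActorEntry struct {
	Actions       map[string]bool
	MaxTTLSeconds int
}

// Decision mirrors the allow/deny outcomes exercised by the T2 tests.
type Decision struct {
	Allow  bool
	Reason string
}

// check denies unregistered actors with unknown_actor_resource and TTLs
// outside the actor's bounds with ttl_out_of_bounds; otherwise it allows.
func check(registry map[string]ActorEntry, actor, action string, ttlSeconds int) Decision {
	entry, ok := registry[actor]
	if !ok {
		return Decision{Reason: "unknown_actor_resource"}
	}
	if !entry.Actions[action] {
		// Hypothetical reason code; the real policy package may use another.
		return Decision{Reason: "action_not_permitted"}
	}
	// Assumed bounds: a TTL must be positive and no larger than the maximum.
	if ttlSeconds <= 0 || ttlSeconds > entry.MaxTTLSeconds {
		return Decision{Reason: "ttl_out_of_bounds"}
	}
	return Decision{Allow: true}
}

func main() {
	// Hypothetical registry values for illustration only.
	registry := map[string]ActorEntry{
		"agt-state-hub-bridge": {Actions: map[string]bool{"sign": true}, MaxTTLSeconds: 3600},
	}
	cases := []struct {
		actor string
		ttl   int
	}{
		{"agt-state-hub-bridge", 600},
		{"agt-state-hub-bridge", 86400},
		{"agt-not-registered", 600},
	}
	for _, c := range cases {
		d := check(registry, c.actor, "sign", c.ttl)
		fmt.Printf("%s ttl=%d allow=%v reason=%s\n", c.actor, c.ttl, d.Allow, d.Reason)
	}
}
```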
## T3 - Publish registry sync contract with ops-warden
```task
id: FLEX-WP-0007-T03
status: done
priority: medium
state_hub_task_id: "afa09ec3-516c-433d-87a7-330cb79845a8"
```
Document the two-repo workflow to follow when inventory or policy boundaries change.
- [x] Publish docs/ops-warden-registry-sync.md
- [x] Cover ops-warden ownership of actor names, actor types, principals, and TTL defaults
- [x] Cover flex-auth ownership of hosted registry, relationships, and policy package evaluation
- [x] Document trigger: inventory add/change -> regenerate snapshot -> flex-auth reload
- [x] Cross-link from docs/ops-warden-policy-gate-handoff.md
- [x] Confirm ops-warden wiki/PolicyGatedSigning.md already points to the flex-auth handoff; flex-auth now points back from the sync runbook
Acceptance: a new agt-* actor addition has an unambiguous procedure across both
repos.
## T4 - Joint OpenBao + policy gate production smoke
```task
id: FLEX-WP-0007-T04
status: done
priority: medium
state_hub_task_id: "32a96f1c-e0e8-4e27-baa6-7b8c445cf7a1"
```
Coordinate with ops-warden for vault-backed signing through the deployed
flex-auth runtime.
- [x] flex-auth deployed with production registry via operator tunnel, completing T1
- [x] policy.flex_auth_url validated against deployed URL http://127.0.0.1:18090 on CoulombCore; `policy.enabled` intentionally remains off until testing/production maturity
- [x] Scoped warden-sign OpenBao lane available for the smoke; no token value recorded here
- [x] Allow smoke: `warden sign agt-state-hub-bridge` recorded backend `vault` and policy_decision_id `decision:032b096c433ad80c`
- [x] Deny smoke: TTL above registry max was denied by flex-auth before OpenBao with reason `ttl_out_of_bounds`
- [x] Record non-secret evidence: decision ids, reasons, actor names only
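
The evidence rule above can be checked mechanically before anything is posted to State Hub. In this hypothetical sketch, the decision id pattern is inferred from the two ids in this workplan (`decision:` followed by 16 lowercase hex digits). It is not taken from a flex-auth specification. The secret check is a deliberately crude heuristic that only catches PEM-style private key text. The backend field is included because the recorded smoke evidence names backend `vault`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// decisionIDPattern is inferred from ids recorded in this workplan
// (for example decision:032b096c433ad80c); it is not a published format.
var decisionIDPattern = regexp.MustCompile(`^decision:[0-9a-f]{16}$`)

// SmokeEvidence carries only the non-secret fields the workplan allows.
type SmokeEvidence struct {
	Actor      string // e.g. agt-state-hub-bridge
	Backend    string // e.g. vault
	DecisionID string // may be empty when a deny result recorded no id
	Reason     string // e.g. ttl_out_of_bounds
}

// looksSecret is a crude illustrative heuristic that only flags PEM-style
// private key text; it is not a substitute for review.
func looksSecret(s string) bool {
	return strings.Contains(s, "PRIVATE KEY")
}

// Validate checks for secret-looking fields first so that error messages
// never echo a suspicious value, then checks the decision id shape.
func (e SmokeEvidence) Validate() error {
	fields := map[string]string{
		"actor":       e.Actor,
		"backend":     e.Backend,
		"decision_id": e.DecisionID,
		"reason":      e.Reason,
	}
	for name, value := range fields {
		if looksSecret(value) {
			return fmt.Errorf("field %s looks like secret material; value withheld", name)
		}
	}
	if e.DecisionID != "" && !decisionIDPattern.MatchString(e.DecisionID) {
		return fmt.Errorf("decision id does not match %s", decisionIDPattern)
	}
	return nil
}

func main() {
	allow := SmokeEvidence{Actor: "agt-state-hub-bridge", Backend: "vault", DecisionID: "decision:032b096c433ad80c"}
	deny := SmokeEvidence{Actor: "agt-state-hub-bridge", Reason: "ttl_out_of_bounds"}
	for _, e := range []SmokeEvidence{allow, deny} {
		fmt.Printf("%+v valid=%v\n", e, e.Validate() == nil)
	}
}
```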
Closed on 2026-06-30 based on non-secret smoke evidence from ops-warden received
on 2026-06-29. The operator deliberately keeps `policy.enabled` off for now
because the ecosystem is still at the build/pre-testing stage; the gate is
verified and banked for later live enforcement rather than forced into premature
production rigor.

Smoke runner (run only when the OpenBao token is valid):
`SMOKE_VAULT=1 ~/ops-warden/scripts/policy_gate_production_smoke.sh`
## T5 - IAM subject binding for production
```task
id: FLEX-WP-0007-T05
status: done
priority: low
state_hub_task_id: "65dc3c59-1e4b-4335-b6a0-db492ea9b2b5"
```
Clarify how WARDEN_POLICY_SUBJECT maps to flex-auth allowed_subjects in
production.
- [x] Document the production default: the actor name is the `subject.id` unless `WARDEN_POLICY_SUBJECT` supplies the IAM subject
- [x] Confirm the production registry's `allowed_subjects` includes `iam:<actor>` entries
- [x] Add test coverage for the `iam:agt-state-hub-bridge` allow path
Acceptance: documented subject-id strategy; no ops-warden special-casing is
required beyond existing policy behavior.
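
The subject-id strategy can be sketched as a small resolver. It follows the documented default: the actor name is used unless `WARDEN_POLICY_SUBJECT` is set. For illustration only, the sketch assumes that an empty value counts as unset. It also assumes that `allowed_subjects` lists both the bare actor name and `iam:<actor>`. The function names `resolveSubject` and `subjectAllowed` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// resolveSubject returns the subject id sent to flex-auth: the value of
// WARDEN_POLICY_SUBJECT when it is set and non-empty, otherwise the actor name.
// Treating an empty value as unset is an assumption of this sketch.
func resolveSubject(actor string, lookupEnv func(string) (string, bool)) string {
	if v, ok := lookupEnv("WARDEN_POLICY_SUBJECT"); ok && v != "" {
		return v
	}
	return actor
}

// subjectAllowed reports whether subject appears in the actor's allowed_subjects.
func subjectAllowed(subject string, allowedSubjects []string) bool {
	for _, s := range allowedSubjects {
		if s == subject {
			return true
		}
	}
	return false
}

func main() {
	actor := "agt-state-hub-bridge"
	// Hypothetical allowed_subjects: assumes both the bare actor name and iam:<actor>.
	allowedSubjects := []string{actor, "iam:" + actor}
	subject := resolveSubject(actor, os.LookupEnv)
	fmt.Printf("subject.id=%s allowed=%v\n", subject, subjectAllowed(subject, allowedSubjects))
}
```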
## Exit Criteria
- flex-auth production runtime reachable from CoulombCore warden path: done via flex-auth-coulombcore operator tunnel
- Production registry loaded and real inventory actors covered locally: done
- Registry sync contract published and cross-linked: done
- Joint vault-backed smoke evidence recorded: done, decision:032b096c433ad80c
- ops-warden operator has the repo-side artifacts needed to set policy.enabled: true later, when maturity posture calls for live enforcement
## Implementation Notes
2026-06-23 repo-side implementation:
- Added examples/ops-warden/production_registry_snapshot.json from the ops-warden generated production registry artifact.
- Added Go coverage for production actor allows, IAM subject allow, ttl_out_of_bounds, unknown_actor_resource, production registry counts, and /healthz.
- Published docs/ops-warden-registry-sync.md and cross-linked it from the handoff and examples docs.
Closeout note:
- The OpenBao-backed smoke passed through ops-warden with the scoped warden-sign lane.
- The `policy.enabled` flip is intentionally deferred by operator/maturity decision, not treated as an open repo-side blocker.
- After workplan file changes, run `make fix-consistency REPO=flex-auth` from `~/state-hub` to mirror these statuses into State Hub.

2026-06-24 operator tunnel update:
- Built /tmp/flex-auth and started the production registry runtime on local 127.0.0.1:18090.
- Added the local ops-bridge tunnel flex-auth-coulombcore, forwarding CoulombCore 127.0.0.1:18090 to the local runtime.
- Verified remote health from CoulombCore: GET /healthz returned HTTP 200.
- Verified that a remote POST /v1/check from CoulombCore allowed agt-state-hub-bridge with decision:873c6c682a52bebc.
- VAULT_TOKEN was absent at that point, so the OpenBao-backed smoke stayed blocked on an operator credential refresh.

2026-06-30 closeout from ops-warden smoke handoff:
- Mode: `FLEX_AUTH_EXTERNAL` against deployed runtime `127.0.0.1:18090` via the CoulombCore operator path.
- Allow: `warden sign agt-state-hub-bridge` returned policy_decision_id `decision:032b096c433ad80c`.
- Deny: `--ttl 999` was rejected with `ttl_out_of_bounds` before OpenBao signing.
- Vault-backed allow: backend `vault` produced the same policy_decision_id through the scoped warden-sign OpenBao lane.
- Operator decision: keep `policy.enabled` off during build-stage/pre-testing and flip it later when the ecosystem reaches the appropriate maturity posture.
## See Also
- docs/ops-warden-policy-gate-handoff.md
- docs/ops-warden-registry-sync.md
- workplans/FLEX-WP-0006-ops-warden-ssh-signing-policy-gate.md
- ~/ops-warden/wiki/PolicyGatedSigning.md
- ~/ops-warden/workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md
- ~/ops-warden/history/2026-06-23-flex-auth-production-pickup-suggestion.md