FLEX-WP-0007: production registry fixture, tests, and sync runbook

Add production_registry_snapshot.json from ops-warden inventory with CI
coverage for real actors, IAM subject binding, ttl_out_of_bounds, and
unknown_actor_resource. Extend serve contract tests with /healthz and
publish the registry sync contract for operator deployment.
2026-06-24 14:52:35 +02:00
parent fae0f00a69
commit 941501c590
7 changed files with 981 additions and 3 deletions

@@ -0,0 +1,211 @@
---
id: FLEX-WP-0007
type: workplan
title: "Ops-Warden Policy Gate Production Deployment"
domain: infotech
repo: flex-auth
status: blocked
owner: codex
topic_slug: flex-auth
planning_priority: P0
planning_order: 70
depends_on_workplans:
- FLEX-WP-0006
related_workplans:
- WARDEN-WP-0009
created: "2026-06-23"
updated: "2026-06-24"
state_hub_workstream_id: "358ce697-2611-4fe9-89ab-63e86ceb00fa"
---
# FLEX-WP-0007: Ops-Warden Policy Gate Production Deployment
## Purpose
Deploy flex-auth as a reachable production runtime for ops-warden's opt-in SSH
signing policy gate, load a production registry aligned with real inventory
actors, and complete joint smoke evidence so operators can set policy.enabled:
true in warden.yaml.
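Review update: repo-side production readiness is now separated from
operator-only work. flex-auth can publish the production fixture, tests,
runtime command, and sync contract in this repo. When this review was written,
the stable URL deployment and the OpenBao smoke were both blocked because they
need NetKingdom reachability and a refreshed scoped VAULT_TOKEN. The operator
tunnel recorded under T1 has since made the runtime reachable from CoulombCore;
the OpenBao smoke (T4) still waits on the token.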
## Background
ops-warden finished WARDEN-WP-0009 on the caller side: local and
production-registry smoke passed, and the production registry generator exists.
The remaining risk is operational, not policy shape: warden workstations need a
reachable flex-auth URL, and the vault-backed joint smoke needs a valid scoped
VAULT_TOKEN.
Production registry artifacts:
- flex-auth fixture: examples/ops-warden/production_registry_snapshot.json
- ops-warden source artifact: ~/ops-warden/registry/flex-auth/production_registry_snapshot.json
- ops-warden generator: ~/ops-warden/scripts/build_flex_auth_registry.py
## Ownership Boundary
| Concern | Owner |
| --- | --- |
| Policy package and PDP decision | flex-auth |
| Actor inventory and TTL/principal defaults | ops-warden |
| SSH CA and OpenBao signing | ops-warden |
| Production registry content for SSH actors | Joint: ops-warden generates, flex-auth hosts |
| policy.enabled flip | ops-warden operator after flex-auth is reachable |
No SSH private keys, OpenBao tokens, or other secrets belong in fixtures, docs,
State Hub messages, or smoke evidence.
## T1 - Deploy production flex-auth runtime
```task
id: FLEX-WP-0007-T01
status: done
priority: high
state_hub_task_id: "727573fc-86a3-4f5a-abd7-40b0ccb01e68"
```
Deploy flex-auth serve, or equivalent, to a stable URL reachable from
workstations that run warden sign.
- [x] Choose preferred target: in-cluster Service at http://flex-auth.flex-auth.svc.cluster.local:8080 when reachable; otherwise approved operator tunnel or ingress with the same base path
- [x] Document canonical policy.flex_auth_url selection in docs/ops-warden-registry-sync.md
- [x] Document healthz pre-flight: GET /healthz returns HTTP 200
- [x] Add service test coverage for /healthz
- [x] Operator tunnel deployed as flex-auth-coulombcore and confirmed POST /v1/check is reachable from CoulombCore
Acceptance: operator runs curl <flex_auth_url>/healthz from the warden
workstation and receives HTTP 200. Verified from CoulombCore on 2026-06-24 with
flex_auth_url http://127.0.0.1:18090.
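The healthz pre-flight above can also be scripted rather than run by hand. The following is a minimal Go sketch, not code from this repo: `checkHealthz` and `newStubServer` are hypothetical names, and the stub server only stands in for a reachable flex-auth so the example runs without network access. In real use the base URL would be the operator's policy.flex_auth_url.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// checkHealthz issues GET <baseURL>/healthz and returns nil only when the
// response status is HTTP 200, matching the T1 acceptance check.
func checkHealthz(client *http.Client, baseURL string) error {
	resp, err := client.Get(strings.TrimRight(baseURL, "/") + "/healthz")
	if err != nil {
		return fmt.Errorf("healthz request failed: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("healthz returned HTTP %d, want 200", resp.StatusCode)
	}
	return nil
}

// newStubServer is a hypothetical stand-in for flex-auth. It answers
// /healthz with the given status so the sketch is self-contained.
func newStubServer(status int) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newStubServer(http.StatusOK)
	defer srv.Close()

	// A short timeout keeps an unreachable tunnel from hanging the pre-flight.
	client := &http.Client{Timeout: 5 * time.Second}
	if err := checkHealthz(client, srv.URL); err != nil {
		fmt.Println("pre-flight failed:", err)
		return
	}
	fmt.Println("pre-flight ok")
}
```

Treating any non-200 status as a failure keeps the check aligned with the acceptance wording; a redirect or 5xx from a tunnel should block the policy.enabled flip just like a connection error.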
## T2 - Load production registry and verify real actors
```task
id: FLEX-WP-0007-T02
status: done
priority: high
state_hub_task_id: "6ec1e00c-4a3a-475b-aefb-af3961de7070"
```
Load the production registry snapshot derived from ops-warden inventory, not
only the template actors in examples/ops-warden/registry_snapshot.json.
- [x] Add examples/ops-warden/production_registry_snapshot.json from the ops-warden generated artifact
- [x] Document regenerate and load procedure in docs/ops-warden-registry-sync.md
- [x] Verify allow for agt-state-hub-bridge / sign
- [x] Verify deny for ttl_out_of_bounds
- [x] Verify deny for unregistered actors with unknown_actor_resource
- [x] Add CI tests using production actor names: agt-state-hub-bridge, agt-codex-interhub-bootstrap, adm-example, atm-backup-daily
Acceptance: local flex-auth coverage allows agt-state-hub-bridge without
ops-warden-local registry patching. Deployed runtime verification remains part
of T1.
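To make the three verified outcomes concrete, here is a toy Go sketch of the decision shape. It is an illustration under stated assumptions, not flex-auth's policy package: `registryEntry`, `decision`, and `check` are hypothetical, the TTL limits are invented for the example, and the real registry snapshot carries more fields than a maximum TTL. Only the actor names and the two deny reasons come from this workplan.

```go
package main

import "fmt"

// registryEntry is a hypothetical, simplified registry record. The real
// production_registry_snapshot.json schema is not reproduced here.
type registryEntry struct {
	MaxTTLSeconds int
}

// decision mirrors the allow/deny-with-reason outcomes named in T2.
type decision struct {
	Allow  bool
	Reason string
}

// check denies unregistered actors first, then TTLs above the registry
// maximum, and allows everything else.
func check(registry map[string]registryEntry, actor string, ttlSeconds int) decision {
	entry, ok := registry[actor]
	if !ok {
		return decision{Allow: false, Reason: "unknown_actor_resource"}
	}
	if ttlSeconds > entry.MaxTTLSeconds {
		return decision{Allow: false, Reason: "ttl_out_of_bounds"}
	}
	return decision{Allow: true}
}

func main() {
	// TTL limits below are illustrative values, not the production defaults.
	registry := map[string]registryEntry{
		"agt-state-hub-bridge": {MaxTTLSeconds: 3600},
		"atm-backup-daily":     {MaxTTLSeconds: 900},
	}
	cases := []struct {
		actor string
		ttl   int
	}{
		{"agt-state-hub-bridge", 600},  // registered, TTL within bounds
		{"atm-backup-daily", 86400},    // registered, TTL above the maximum
		{"agt-not-in-inventory", 600},  // hypothetical unregistered actor
	}
	for _, c := range cases {
		d := check(registry, c.actor, c.ttl)
		fmt.Printf("actor=%s ttl=%d allow=%v reason=%q\n", c.actor, c.ttl, d.Allow, d.Reason)
	}
}
```

Checking registration before TTL means an unknown actor is reported as unknown_actor_resource even when its requested TTL is also excessive, which keeps the two deny reasons distinguishable in tests.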
## T3 - Publish registry sync contract with ops-warden
```task
id: FLEX-WP-0007-T03
status: done
priority: medium
state_hub_task_id: "afa09ec3-516c-433d-87a7-330cb79845a8"
```
Document the two-repo workflow when inventory or policy boundaries change.
- [x] Publish docs/ops-warden-registry-sync.md
- [x] Cover ops-warden ownership of actor names, actor types, principals, and TTL defaults
- [x] Cover flex-auth ownership of hosted registry, relationships, and policy package evaluation
- [x] Document trigger: inventory add/change -> regenerate snapshot -> flex-auth reload
- [x] Cross-link from docs/ops-warden-policy-gate-handoff.md
- [x] Confirm ops-warden wiki/PolicyGatedSigning.md already points to the flex-auth handoff; flex-auth now points back from the sync runbook
Acceptance: a new agt-* actor addition has an unambiguous procedure across both
repos.
## T4 - Joint OpenBao + policy gate production smoke
```task
id: FLEX-WP-0007-T04
status: wait
priority: medium
state_hub_task_id: "32a96f1c-e0e8-4e27-baa6-7b8c445cf7a1"
```
Coordinate with ops-warden for vault-backed signing through the deployed
flex-auth runtime.
- [x] flex-auth deployed with production registry via operator tunnel, completing T1
- [ ] ops-warden policy.enabled: true and policy.flex_auth_url points to deployed URL http://127.0.0.1:18090 on CoulombCore
- [ ] Valid scoped VAULT_TOKEN with warden-sign policy, operator-provided
- [ ] Allow smoke: warden sign agt-state-hub-bridge records backend vault and policy_decision_id
- [ ] Deny smoke: TTL above registry max is denied by flex-auth before OpenBao
- [ ] Record non-secret evidence: decision ids, reasons, actor names only
Blocked on: scoped VAULT_TOKEN refresh. Previous ops-warden session returned
HTTP 403 on 2026-06-23; no VAULT_TOKEN is present in this session.
Smoke runner when token is valid:
SMOKE_VAULT=1 ~/ops-warden/scripts/policy_gate_production_smoke.sh
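One way to keep smoke evidence non-secret is to give the record a fixed shape that has no field where a token or key could land. The Go sketch below assumes a hypothetical `smokeEvidence` struct and JSON field names; it is not the format used by the smoke script. The sample decision id is the one recorded for the 2026-06-24 tunnel check in this workplan, not a vault-backed smoke result.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// smokeEvidence is a hypothetical record limited to the three kinds of
// data T4 permits: actor name, decision id, and reason.
type smokeEvidence struct {
	Actor      string `json:"actor"`
	DecisionID string `json:"policy_decision_id"`
	Reason     string `json:"reason,omitempty"`
}

// encodeEvidence renders the record as a single JSON line suitable for a
// workplan note or State Hub message.
func encodeEvidence(e smokeEvidence) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := encodeEvidence(smokeEvidence{
		Actor:      "agt-state-hub-bridge",
		DecisionID: "decision:873c6c682a52bebc",
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(line)
}
```

Because the struct has no free-form field, adding a new kind of evidence requires a deliberate schema change, which is a natural review point for the no-secrets rule.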
## T5 - IAM subject binding for production
```task
id: FLEX-WP-0007-T05
status: done
priority: low
state_hub_task_id: "65dc3c59-1e4b-4335-b6a0-db492ea9b2b5"
```
Clarify how WARDEN_POLICY_SUBJECT maps to flex-auth allowed_subjects in
production.
- [x] Document production default: actor name as subject.id unless WARDEN_POLICY_SUBJECT supplies the IAM subject
- [x] Confirm production registry allowed_subjects includes iam:<actor> entries
- [x] Add test coverage for iam:agt-state-hub-bridge allow path
Acceptance: documented subject-id strategy; no ops-warden special-casing is
required beyond existing policy behavior.
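The subject-id strategy can be sketched in a few lines of Go. This is an illustration of the documented default, not ops-warden or flex-auth code: `resolveSubject` and `subjectAllowed` are hypothetical helpers, and listing the bare actor name alongside the iam:<actor> entry in allowed_subjects is an assumption made so both paths can be shown.

```go
package main

import (
	"fmt"
	"os"
)

// resolveSubject returns the override when WARDEN_POLICY_SUBJECT supplies
// one, and otherwise falls back to the actor name as subject.id.
func resolveSubject(actor, override string) string {
	if override != "" {
		return override
	}
	return actor
}

// subjectAllowed reports whether subject appears in the allowed list.
func subjectAllowed(allowed []string, subject string) bool {
	for _, s := range allowed {
		if s == subject {
			return true
		}
	}
	return false
}

func main() {
	actor := "agt-state-hub-bridge"
	// Assumed allowed_subjects content: the bare actor name plus iam:<actor>.
	allowed := []string{actor, "iam:" + actor}

	subject := resolveSubject(actor, os.Getenv("WARDEN_POLICY_SUBJECT"))
	fmt.Printf("subject=%s allowed=%v\n", subject, subjectAllowed(allowed, subject))
}
```

Running it with WARDEN_POLICY_SUBJECT unset exercises the default path; exporting WARDEN_POLICY_SUBJECT=iam:agt-state-hub-bridge exercises the IAM subject path covered by the T5 test.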
## Exit Criteria
- flex-auth production runtime reachable from CoulombCore warden path: done via flex-auth-coulombcore operator tunnel
- Production registry loaded and real inventory actors covered locally: done
- Registry sync contract published and cross-linked: done
- Joint vault-backed smoke evidence recorded, or T4 explicitly waits on token: T4 waits on scoped VAULT_TOKEN
- ops-warden operator has the repo-side artifacts needed to set policy.enabled: true after the stable URL and token are ready
## Implementation Notes
2026-06-23 repo-side implementation:
- Added examples/ops-warden/production_registry_snapshot.json from the ops-warden generated production registry artifact.
- Added Go coverage for production actor allows, IAM subject allow, ttl_out_of_bounds, unknown_actor_resource, production registry counts, and /healthz.
- Published docs/ops-warden-registry-sync.md and cross-linked it from the handoff and examples docs.
Remaining blocked work:
- Operator refreshes scoped VAULT_TOKEN and reruns the OpenBao-backed smoke.
- After workplan file changes, run make fix-consistency REPO=flex-auth from ~/state-hub to mirror these statuses into State Hub.
2026-06-24 operator tunnel update:
- Built /tmp/flex-auth and started the production registry runtime on local 127.0.0.1:18090.
- Added local ops-bridge tunnel flex-auth-coulombcore, forwarding CoulombCore 127.0.0.1:18090 to the local runtime.
- Verified remote health from CoulombCore: GET /healthz returned HTTP 200.
- Verified remote POST /v1/check from CoulombCore allowed agt-state-hub-bridge with decision:873c6c682a52bebc.
- VAULT_TOKEN is absent, so OpenBao-backed smoke remains blocked on operator credential refresh.
## See Also
- docs/ops-warden-policy-gate-handoff.md
- docs/ops-warden-registry-sync.md
- workplans/FLEX-WP-0006-ops-warden-ssh-signing-policy-gate.md
- ~/ops-warden/wiki/PolicyGatedSigning.md
- ~/ops-warden/workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md
- ~/ops-warden/history/2026-06-23-flex-auth-production-pickup-suggestion.md