flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md
tegwick 339c35e876
Close ops-warden policy gate deployment
2026-06-30 00:52:56 +02:00


id: FLEX-WP-0007
type: workplan
title: Ops-Warden Policy Gate Production Deployment
domain: infotech
repo: flex-auth
status: finished
owner: codex
topic_slug: flex-auth
planning_priority: P0
planning_order: 70
depends_on_workplans: FLEX-WP-0006
related_workplans: WARDEN-WP-0009
created: 2026-06-23
updated: 2026-06-30
state_hub_workstream_id: 358ce697-2611-4fe9-89ab-63e86ceb00fa

FLEX-WP-0007: Ops-Warden Policy Gate Production Deployment

Purpose

Deploy flex-auth as a reachable production runtime for ops-warden's opt-in SSH signing policy gate, load a production registry aligned with real inventory actors, and complete joint smoke evidence so operators can set policy.enabled: true in warden.yaml when the ecosystem maturity stage calls for live enforcement.

Review update: repo-side production readiness is now separated from operator-only work. flex-auth can publish the production fixture, tests, runtime command, and sync contract in this repo. The actual stable URL deployment and OpenBao smoke were completed through the operator tunnel and a scoped warden-sign OpenBao lane. The final policy.enabled production flip is explicitly deferred until the ecosystem reaches testing/production maturity.

Background

ops-warden finished WARDEN-WP-0009 on the caller side: local and production-registry smoke passed, and the production registry generator exists. The remaining risk is operational, not policy shape: warden workstations need a reachable flex-auth URL and a vault-backed joint smoke before the gate can be banked for later enforcement.

Production registry artifacts:

  • flex-auth fixture: examples/ops-warden/production_registry_snapshot.json
  • ops-warden source artifact: ~/ops-warden/registry/flex-auth/production_registry_snapshot.json
  • ops-warden generator: ~/ops-warden/scripts/build_flex_auth_registry.py

Ownership Boundary

| Concern | Owner |
| --- | --- |
| Policy package and PDP decision | flex-auth |
| Actor inventory and TTL/principal defaults | ops-warden |
| SSH CA and OpenBao signing | ops-warden |
| Production registry content for SSH actors | Joint: ops-warden generates, flex-auth hosts |
| policy.enabled flip | ops-warden operator after flex-auth is reachable |

No SSH private keys, OpenBao tokens, or other secrets belong in fixtures, docs, State Hub messages, or smoke evidence.

T1 - Deploy production flex-auth runtime

id: FLEX-WP-0007-T01
status: done
priority: high
state_hub_task_id: "727573fc-86a3-4f5a-abd7-40b0ccb01e68"

Deploy flex-auth serve, or equivalent, to a stable URL reachable from workstations that run warden sign.

  • Choose preferred target: in-cluster Service at http://flex-auth.flex-auth.svc.cluster.local:8080 when reachable; otherwise an approved operator tunnel or ingress with the same base path
  • Document canonical policy.flex_auth_url selection in docs/ops-warden-registry-sync.md
  • Document healthz pre-flight: GET /healthz returns HTTP 200
  • Add service test coverage for /healthz
  • Operator tunnel deployed as flex-auth-coulombcore and confirmed POST /v1/check is reachable from CoulombCore

Acceptance: operator runs curl <flex_auth_url>/healthz from the warden workstation and receives HTTP 200. Verified from CoulombCore on 2026-06-24 with flex_auth_url http://127.0.0.1:18090.
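
The healthz pre-flight can also be scripted. The following is a minimal sketch, not code from flex-auth or ops-warden: the function name healthz_ok and the 5-second timeout are assumptions made for illustration, and only the /healthz path and the HTTP 200 expectation come from this workplan.

```python
import urllib.error
import urllib.request


def healthz_ok(base_url: str, timeout: float = 5.0) -> bool:
    """Return True only if GET <base_url>/healthz answers HTTP 200.

    healthz_ok and the default timeout are illustrative assumptions.
    """
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures, and the
        # HTTPError that urlopen raises for 4xx/5xx responses.
        return False
```

On the warden workstation, a call such as healthz_ok("http://127.0.0.1:18090") would mirror the curl check, with the argument replaced by whatever policy.flex_auth_url is canonical for that host.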

T2 - Load production registry and verify real actors

id: FLEX-WP-0007-T02
status: done
priority: high
state_hub_task_id: "6ec1e00c-4a3a-475b-aefb-af3961de7070"

Load the production registry snapshot derived from ops-warden inventory, not only the template actors in examples/ops-warden/registry_snapshot.json.

  • Add examples/ops-warden/production_registry_snapshot.json from the ops-warden generated artifact
  • Document regenerate and load procedure in docs/ops-warden-registry-sync.md
  • Verify allow for agt-state-hub-bridge / sign
  • Verify deny for ttl_out_of_bounds
  • Verify deny for unregistered actors with unknown_actor_resource
  • Add CI tests using production actor names: agt-state-hub-bridge, agt-codex-interhub-bootstrap, adm-example, atm-backup-daily

Acceptance: local flex-auth coverage allows agt-state-hub-bridge without ops-warden-local registry patching. Deployed runtime verification remains part of T1.
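
The three outcomes verified above (allow, ttl_out_of_bounds, unknown_actor_resource) can be illustrated with a small decision sketch. This is not the flex-auth policy package: the registry layout, the max_ttl field, its value, and the check function are hypothetical stand-ins, since the real snapshot schema is defined by examples/ops-warden/production_registry_snapshot.json and is not reproduced in this workplan. Only the actor name and the two deny reasons are taken from the source.

```python
from typing import Optional, Tuple

# Hypothetical registry shape and TTL ceiling, for illustration only.
REGISTRY = {
    "agt-state-hub-bridge": {"max_ttl": 480},
}


def check(actor: str, ttl: int) -> Tuple[bool, Optional[str]]:
    """Return (allowed, deny_reason) for a sign request.

    The TTL is compared in the same unit as the registry ceiling.
    """
    entry = REGISTRY.get(actor)
    if entry is None:
        # Actor is not present in the loaded registry.
        return False, "unknown_actor_resource"
    if ttl <= 0 or ttl > entry["max_ttl"]:
        # Requested TTL falls outside the registry bounds.
        return False, "ttl_out_of_bounds"
    return True, None
```

The ordering reflects one reasonable design choice: an unregistered actor is rejected before its TTL is considered, so the deny reason always names the first failing condition.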

T3 - Publish registry sync contract with ops-warden

id: FLEX-WP-0007-T03
status: done
priority: medium
state_hub_task_id: "afa09ec3-516c-433d-87a7-330cb79845a8"

Document the two-repo workflow when inventory or policy boundaries change.

  • Publish docs/ops-warden-registry-sync.md
  • Cover ops-warden ownership of actor names, actor types, principals, and TTL defaults
  • Cover flex-auth ownership of hosted registry, relationships, and policy package evaluation
  • Document trigger: inventory add/change -> regenerate snapshot -> flex-auth reload
  • Cross-link from docs/ops-warden-policy-gate-handoff.md
  • Confirm ops-warden wiki/PolicyGatedSigning.md already points to the flex-auth handoff; flex-auth now points back from the sync runbook

Acceptance: a new agt-* actor addition has an unambiguous procedure across both repos.
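
One way to notice that the regenerate step has not yet been followed by a flex-auth update is to compare the ops-warden source artifact with the flex-auth fixture. The sketch below assumes the two files are meant to carry identical JSON content, which this workplan implies but does not state outright; the function names are hypothetical and this check is not part of either repo.

```python
import hashlib
import json
from pathlib import Path


def snapshot_digest(path: Path) -> str:
    """SHA-256 of a JSON file, independent of key order and whitespace."""
    data = json.loads(path.read_text(encoding="utf-8"))
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshots_in_sync(source: Path, fixture: Path) -> bool:
    """True when both snapshots hold the same JSON content."""
    return snapshot_digest(source) == snapshot_digest(fixture)
```

A caller would pass the ops-warden artifact path and the flex-auth fixture path listed under Production registry artifacts; a False result would suggest the snapshot needs to be copied over and the runtime reloaded.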

T4 - Joint OpenBao + policy gate production smoke

id: FLEX-WP-0007-T04
status: done
priority: medium
state_hub_task_id: "32a96f1c-e0e8-4e27-baa6-7b8c445cf7a1"

Coordinate with ops-warden for vault-backed signing through the deployed flex-auth runtime.

  • flex-auth deployed with production registry via operator tunnel, completing T1
  • policy.flex_auth_url validated against deployed URL http://127.0.0.1:18090 on CoulombCore; policy.enabled intentionally remains off until testing/production maturity
  • Scoped warden-sign OpenBao lane available for the smoke; no token value recorded here
  • Allow smoke: warden sign agt-state-hub-bridge recorded backend vault and policy_decision_id decision:032b096c433ad80c
  • Deny smoke: TTL above registry max was denied by flex-auth before OpenBao with reason ttl_out_of_bounds
  • Record non-secret evidence: decision ids, reasons, actor names only

Closed on 2026-06-30 from ops-warden non-secret smoke evidence received 2026-06-29. The operator deliberately keeps policy.enabled off for now because the ecosystem is still in the build stage (pre-testing). The gate is verified and banked for later live enforcement rather than enabled prematurely.
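
The rule that evidence carries decision ids, reasons, and actor names only can be enforced with an allowlist rather than by removing known secrets. The sketch below is illustrative: the field names and the redact_evidence helper are assumptions, not the format the smoke script actually emits.

```python
# Hypothetical field names; only these three kinds of value may be recorded.
ALLOWED_EVIDENCE_FIELDS = ("actor", "policy_decision_id", "reason")


def redact_evidence(record: dict) -> dict:
    """Keep only allowlisted, non-secret fields from a smoke result."""
    return {k: record[k] for k in ALLOWED_EVIDENCE_FIELDS if k in record}
```

An allowlist fails closed: a field added to the smoke output later is dropped from evidence unless someone deliberately adds it to the tuple.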

Smoke runner, for use when the OpenBao token is valid:

SMOKE_VAULT=1 ~/ops-warden/scripts/policy_gate_production_smoke.sh

T5 - IAM subject binding for production

id: FLEX-WP-0007-T05
status: done
priority: low
state_hub_task_id: "65dc3c59-1e4b-4335-b6a0-db492ea9b2b5"

Clarify how WARDEN_POLICY_SUBJECT maps to flex-auth allowed_subjects in production.

  • Document production default: actor name as subject.id unless WARDEN_POLICY_SUBJECT supplies the IAM subject
  • Confirm production registry allowed_subjects includes iam: entries
  • Add test coverage for iam:agt-state-hub-bridge allow path

Acceptance: documented subject-id strategy; no ops-warden special-casing is required beyond existing policy behavior.
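
The documented default can be sketched as a single resolution function. This is a hedged illustration rather than ops-warden's implementation: resolve_subject_id is a hypothetical name, and treating an empty or whitespace-only WARDEN_POLICY_SUBJECT as unset is an assumption the workplan does not confirm.

```python
import os
from typing import Mapping, Optional


def resolve_subject_id(actor: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the subject.id to send: the IAM override if set, else the actor name."""
    env = os.environ if env is None else env
    # Assumption: a blank override is treated the same as an absent one.
    override = env.get("WARDEN_POLICY_SUBJECT", "").strip()
    return override or actor
```

With no override, resolve_subject_id("agt-state-hub-bridge", {}) yields the actor name; with WARDEN_POLICY_SUBJECT set to iam:agt-state-hub-bridge it yields that IAM subject, which the production registry's allowed_subjects would then need to include.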

Exit Criteria

  • flex-auth production runtime reachable from CoulombCore warden path: done via flex-auth-coulombcore operator tunnel
  • Production registry loaded and real inventory actors covered locally: done
  • Registry sync contract published and cross-linked: done
  • Joint vault-backed smoke evidence recorded: done, decision:032b096c433ad80c
  • ops-warden operator has the repo-side artifacts needed to set policy.enabled: true later, when maturity posture calls for live enforcement

Implementation Notes

2026-06-23 repo-side implementation:

  • Added examples/ops-warden/production_registry_snapshot.json from the ops-warden generated production registry artifact.
  • Added Go coverage for production actor allows, IAM subject allow, ttl_out_of_bounds, unknown_actor_resource, production registry counts, and /healthz.
  • Published docs/ops-warden-registry-sync.md and cross-linked it from the handoff and examples docs.

Closeout note:

  • The OpenBao-backed smoke passed through ops-warden with the scoped warden-sign lane.
  • The policy.enabled flip is intentionally deferred by operator/maturity decision, not treated as an open repo-side blocker.
  • After workplan file changes, run make fix-consistency REPO=flex-auth from ~/state-hub to mirror these statuses into State Hub.

See Also

  • docs/ops-warden-policy-gate-handoff.md
  • docs/ops-warden-registry-sync.md
  • workplans/FLEX-WP-0006-ops-warden-ssh-signing-policy-gate.md
  • ~/ops-warden/wiki/PolicyGatedSigning.md
  • ~/ops-warden/workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md
  • ~/ops-warden/history/2026-06-23-flex-auth-production-pickup-suggestion.md

2026-06-24 operator tunnel update:

  • Built /tmp/flex-auth and started the production registry runtime on local 127.0.0.1:18090.
  • Added local ops-bridge tunnel flex-auth-coulombcore, forwarding CoulombCore 127.0.0.1:18090 to the local runtime.
  • Verified remote health from CoulombCore: GET /healthz returned HTTP 200.
  • Verified remote POST /v1/check from CoulombCore allowed agt-state-hub-bridge with decision:873c6c682a52bebc.
  • At the time of this update, VAULT_TOKEN was absent, so the OpenBao-backed smoke remained blocked on operator credential refresh; this was resolved in the 2026-06-30 closeout.

2026-06-30 closeout from ops-warden smoke handoff:

  • Mode: FLEX_AUTH_EXTERNAL against deployed runtime 127.0.0.1:18090 via the CoulombCore operator path.
  • Allow: warden sign agt-state-hub-bridge returned policy_decision_id decision:032b096c433ad80c.
  • Deny: --ttl 999 was rejected with ttl_out_of_bounds before OpenBao signing.
  • Vault-backed allow: backend vault produced the same policy_decision_id through the scoped warden-sign OpenBao lane.
  • Operator decision: keep policy.enabled off during build-stage/pre-testing and flip it later when the ecosystem reaches the appropriate maturity posture.