flex-auth Pickup Suggestion — Ops-Warden Policy Gate Production
Date: 2026-06-23
From: ops-warden (WARDEN-WP-0009 finished)
For: flex-auth owner
Prior delivery: FLEX-WP-0006 (policy package, template registry, handoff doc)
Summary
ops-warden closed WARDEN-WP-0009. The caller side (policy.enabled,
POST /v1/check, policy_decision_id in signatures.log) is verified.
flex-auth policy authoring for the gate contract is done.
What remains is the flex-auth production runtime and registry operations, so that
operators can set policy.enabled: true on workstations running warden sign
without ad hoc local flex-auth serve workarounds.
What ops-warden already proved
| Evidence | Location |
|---|---|
| Template registry + policy smoke | history/2026-06-23-flex-auth-policy-gate-local-smoke.md |
| Production inventory registry smoke | history/2026-06-23-flex-auth-policy-gate-production-smoke.md |
| Production registry artifact | registry/flex-auth/production_registry_snapshot.json |
| Registry generator | scripts/build_flex_auth_registry.py |
| Joint smoke runner | scripts/policy_gate_production_smoke.sh |
Production-registry smoke (real actor agt-state-hub-bridge):
- Allow: policy_decision_id: decision:032b096c433ad80c
- Deny: ttl_out_of_bounds with fail_closed: true
OpenBao-backed signing through the policy gate is not yet joint-verified: the scoped
VAULT_TOKEN returned HTTP 403 in this session (resolving that is an ops-warden operator task).
Gaps flex-auth should pick up
1. Production runtime deployment (P0)
Problem: No reachable flex-auth endpoint from the operator workstation.
Probe from WSL: flex-auth.flex-auth.svc.cluster.local:8080 does not resolve,
and nothing is listening on 127.0.0.1:8080. ops-warden cannot set policy.enabled: true
with fail_closed: true until flex-auth is up.
Suggestion for flex-auth:
- Deploy flex-auth serve (or equivalent) to a stable production URL reachable from machines that run warden sign.
- Document the canonical URL for policy.flex_auth_url (cluster DNS, tunnel, or ingress, whichever matches NetKingdom operator access patterns).
- Expose GET /healthz (already in code) in runbooks; ops-warden operators will use it as a pre-flight check before enabling the gate.
Acceptance: Operator can curl <flex_auth_url>/healthz from the warden
workstation and get HTTP 200.
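The acceptance check above can also be scripted as a pre-flight gate. This is a minimal standard-library sketch. The function name healthz_ok is a hypothetical name used only here, and operators may simply keep using curl.

```python
import urllib.error
import urllib.request


def healthz_ok(flex_auth_url: str, timeout: float = 5.0) -> bool:
    """Return True only if GET <flex_auth_url>/healthz answers HTTP 200."""
    url = flex_auth_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Any of these means "not ready":
        # - DNS failure, refused connection, or timeout
        # - a 4xx/5xx status (HTTPError is a URLError subclass)
        # - a malformed URL
        return False
```

An operator would call healthz_ok with the documented policy.flex_auth_url and flip policy.enabled only when it returns True. The function treats every failure mode as "not ready" rather than distinguishing them, because the only decision it feeds is whether enabling the gate is safe.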
2. Load production registry, not only template fixtures (P0)
Problem: examples/ops-warden/registry_snapshot.json uses template
actors (platform-steward, ci-deploy-agent, backup-automation). Production
inventory uses different names (agt-state-hub-bridge, etc.). With policy.enabled: true,
sign requests for actors missing from the registry are denied (unknown_actor_resource).
Suggestion for flex-auth:
- Adopt ops-warden's production registry snapshot as the initial production
  load target, or ingest equivalent manifests under
  examples/ops-warden/generated from real inventory.
- Document operator steps:

  ```bash
  # ops-warden (regenerate when inventory changes)
  python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \
    -o registry/flex-auth/production_registry_snapshot.json

  # flex-auth (load into runtime)
  flex-auth load-registry --file <path-to-production_registry_snapshot.json>
  flex-auth serve --registry <snapshot> --policy examples/ops-warden/policy_package.md ...
  ```

- Add fixture or integration tests using production actor names
  (agt-state-hub-bridge, adm-example, atm-backup-daily) so CI catches registry drift.
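The CI drift check suggested above could start as small as the following Python sketch. It assumes, hypothetically, that the snapshot is JSON with a top-level actors list whose entries carry a name field. The real shape emitted by scripts/build_flex_auth_registry.py may differ. registry_drift and check_snapshot_file are illustrative names, not existing tooling.

```python
import json
from pathlib import Path

# Production actor names quoted in this handoff; extend as inventory grows.
EXPECTED_ACTORS = {"agt-state-hub-bridge", "adm-example", "atm-backup-daily"}


def registry_actor_names(snapshot: dict) -> set:
    # ASSUMPTION: snapshot shape {"actors": [{"name": "..."}, ...]}.
    # Adjust this accessor to the real production_registry_snapshot.json schema.
    return {entry["name"] for entry in snapshot.get("actors", [])}


def registry_drift(snapshot: dict, expected: set) -> tuple:
    """Return (missing, unexpected) actor names relative to `expected`."""
    present = registry_actor_names(snapshot)
    return expected - present, present - expected


def check_snapshot_file(path: str, expected: set = EXPECTED_ACTORS) -> int:
    """Print a drift report; return a CI exit code (1 if any actor is missing)."""
    snapshot = json.loads(Path(path).read_text(encoding="utf-8"))
    missing, unexpected = registry_drift(snapshot, expected)
    for name in sorted(missing):
        # A missing actor would be denied at sign time (unknown_actor_resource).
        print(f"MISSING {name}")
    for name in sorted(unexpected):
        print(f"EXTRA   {name}")
    return 1 if missing else 0
```

Only missing actors fail the check. Extra entries, such as leftover template actors, are reported but tolerated, since they do not block signing.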
Acceptance: POST /v1/check allows agt-state-hub-bridge / sign against
the deployed production registry without ops-warden-local registry patching.
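The acceptance probe above can be driven by a small client. This sketch assumes a JSON request with subject, action, resource and context fields, and a JSON response carrying allowed and decision_id. Apart from subject.id, those field names are hypothetical and should be confirmed against the flex-auth /v1/check contract. gate_allows treats any error as a deny, in the spirit of fail_closed: true.

```python
import http.client
import json
import urllib.error
import urllib.request


def check_sign(flex_auth_url: str, subject_id: str, actor: str,
               ttl_hours: int, timeout: float = 5.0) -> dict:
    """POST one sign decision request to <flex_auth_url>/v1/check.

    HYPOTHETICAL request and response shapes: confirm the field names
    against the flex-auth /v1/check contract before relying on them.
    """
    body = {
        "subject": {"id": subject_id},
        "action": "sign",
        "resource": {"id": actor},
        "context": {"ttl_hours": ttl_hours},
    }
    req = urllib.request.Request(
        flex_auth_url.rstrip("/") + "/v1/check",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def gate_allows(flex_auth_url: str, subject_id: str, actor: str,
                ttl_hours: int) -> bool:
    """Fail closed: transport errors or malformed answers count as deny."""
    try:
        decision = check_sign(flex_auth_url, subject_id, actor, ttl_hours)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False
    return isinstance(decision, dict) and decision.get("allowed") is True
```

A joint acceptance run would make two calls. The first expects True for agt-state-hub-bridge with a normal TTL. The second expects False for an over-limit TTL.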
3. Registry sync contract (P1)
Problem: ops-warden owns inventory.yaml; flex-auth owns the authorization
registry. Today sync is manual: regenerate the JSON snapshot, then reload flex-auth.
Suggestion for flex-auth:
- Publish a short sync contract doc:
  - ops-warden owns: actor names, types, principals, TTL defaults
  - flex-auth owns: allowed_subjects, max_ttl_hours, relationships, policy package
  - Trigger: inventory add/change → regenerate snapshot → flex-auth reload
- Optional later: a flex-auth validate target for ops-warden-generated snapshots, or an HTTP reload endpoint for registry updates without restart.
Acceptance: Documented two-repo workflow; no ambiguity on who updates what
when a new agt-* actor is added.
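The ownership split in the sync contract can be expressed as data, so that a snapshot diff tells reviewers which repo has to act. In this minimal sketch, the field keys are hypothetical stand-ins for the categories listed above, and owners_for_change is an illustrative name.

```python
# HYPOTHETICAL field keys standing in for the ownership split listed above;
# rename them to match the real snapshot schema.
OWNERS = {
    "ops-warden": {"actor_name", "actor_type", "principals", "ttl_default"},
    "flex-auth": {"allowed_subjects", "max_ttl_hours", "relationships",
                  "policy_package"},
}


def owners_for_change(changed_fields):
    """Group changed registry fields by the repo that must make the change.

    Fields not covered by the contract land under "unassigned", so a reviewer
    has to settle ownership explicitly instead of guessing.
    """
    result = {}
    for field in changed_fields:
        owner = next((repo for repo, fields in OWNERS.items() if field in fields),
                     "unassigned")
        result.setdefault(owner, set()).add(field)
    return result
```

Consider a new agt-* actor that touches actor_name, principals, ttl_default, allowed_subjects and max_ttl_hours. The function shows that both repos have work: ops-warden owns the inventory fields, and flex-auth owns the authorization fields.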
4. Joint production smoke with OpenBao (P1)
Problem: Policy gate smokes so far used backend: local or a locally run flex-auth. The full
production path is warden sign → flex-auth → OpenBao SSH engine.
Suggestion for flex-auth:
- Coordinate one joint smoke session with ops-warden once both of these hold:
  - flex-auth is deployed with the production registry
  - ops-warden has policy.enabled: true and a valid VAULT_TOKEN
- Allow: warden sign agt-state-hub-bridge → signatures.log has backend: vault and a policy_decision_id
- Deny: e.g. --ttl above the max → flex-auth denies before any OpenBao call
- Record non-secret evidence (decision ids, reasons, actor names only).
Acceptance: Shared history entry or flex-auth handoff update with vault-backed evidence mirroring ops-warden's local smoke format.
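Evidence gathering for the joint smoke can use an allow-list, so that only non-secret fields ever leave the workstation. This sketch assumes, hypothetically, that signatures.log is JSON lines with actor, backend, policy_decision_id and reason keys. The real log format and key names should be checked before use.

```python
import json

# Only these keys are copied into shared evidence (non-secret by design).
# HYPOTHETICAL key names; match them to the real signatures.log schema.
EVIDENCE_KEYS = ("actor", "backend", "policy_decision_id", "reason")


def evidence_records(log_lines, actor):
    """Yield allow-listed fields from JSON-lines log entries for one actor."""
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("actor") == actor:
            yield {k: entry[k] for k in EVIDENCE_KEYS if k in entry}


def is_vault_backed_allow(record):
    """Allow-path acceptance: signed via OpenBao and carries a decision id."""
    return record.get("backend") == "vault" and bool(record.get("policy_decision_id"))
```

An allow-list is used instead of a block-list, so that a new sensitive field added to the log later is excluded by default.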
5. IAM subject binding in production (P2)
Problem: The policy accepts subject.id equal to either the actor name or iam:<actor>. In production,
WARDEN_POLICY_SUBJECT may instead be set from the key-cape/IAM profile sub.
Suggestion for flex-auth:
- Confirm that the production registry's allowed_subjects covers the expected IAM subs for each actor (or document that actor-name fallback is the production default until IAM mapping is wired).
- Add one fixture for WARDEN_POLICY_SUBJECT / iam:agt-state-hub-bridge if that path is intended in prod.
Acceptance: Documented subject-id strategy for SSH sign gate in production.
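A documented subject-id strategy could be pinned down with a helper like this one. The sketch assumes that a non-empty WARDEN_POLICY_SUBJECT takes precedence and that the actor name is the fallback. That matches the production default described above, but warden's actual resolution order should be confirmed. All function names are hypothetical.

```python
import os
from typing import Iterable, Mapping, Optional


def resolve_subject_id(actor: str, env: Optional[Mapping[str, str]] = None) -> str:
    """ASSUMED order: a non-empty WARDEN_POLICY_SUBJECT wins, else the actor name."""
    env = os.environ if env is None else env
    return env.get("WARDEN_POLICY_SUBJECT") or actor


def default_allowed_subjects(actor: str) -> set:
    """The two subject.id forms the policy accepts today: <actor> and iam:<actor>."""
    return {actor, f"iam:{actor}"}


def subject_permitted(actor: str,
                      allowed_subjects: Optional[Iterable[str]] = None,
                      env: Optional[Mapping[str, str]] = None) -> bool:
    """Would the resolved subject.id match the actor's allowed_subjects entry?"""
    if allowed_subjects is None:
        allowed = default_allowed_subjects(actor)
    else:
        allowed = set(allowed_subjects)
    return resolve_subject_id(actor, env) in allowed
```

Production may map IAM subs that are neither the actor name nor iam:<actor>. In that case the registry's allowed_subjects must list those subs explicitly. Otherwise the actor-name fallback keeps working unchanged.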
Proposed flex-auth workplan (draft)
Title: FLEX-WP-0007 — Ops-Warden Policy Gate Production Deployment
Priority: P0
Depends on: FLEX-WP-0006, ops-warden WARDEN-WP-0009 (finished)
| Task | Summary |
|---|---|
| T1 | Deploy flex-auth runtime; document production flex_auth_url + /healthz |
| T2 | Load production registry snapshot; verify allow/deny for real inventory actors |
| T3 | Publish registry sync contract with ops-warden (inventory.yaml → snapshot) |
| T4 | Joint OpenBao + policy gate smoke with ops-warden (non-secret evidence) |
| T5 | IAM subject binding notes / fixtures for WARDEN_POLICY_SUBJECT (if needed) |
Ownership boundary (unchanged)
| Concern | Owner |
|---|---|
| Policy package + PDP decision | flex-auth |
| Actor inventory + TTL/principal defaults | ops-warden |
| SSH CA / OpenBao signing | ops-warden |
| Production registry content for SSH actors | joint — ops-warden generates from inventory; flex-auth hosts and evaluates |
| policy.enabled flip | ops-warden operator (after flex-auth is reachable) |
References
| Doc | Repo |
|---|---|
| docs/ops-warden-policy-gate-handoff.md | flex-auth |
| workplans/FLEX-WP-0006-ops-warden-ssh-signing-policy-gate.md | flex-auth |
| wiki/PolicyGatedSigning.md | ops-warden |
| workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md | ops-warden |
| registry/flex-auth/production_registry_snapshot.json | ops-warden |