ops-warden/history/2026-06-23-flex-auth-production-pickup-suggestion.md
tegwick 90007c2cda feat: close WP-0009/WP-0013 production integration stewardship strand
Ship flex-auth policy gate registry and smoke evidence, archive WP-0009
through WP-0013, and add integration docs: ops-bridge cert_command
migration playbook, operator OpenBao token hygiene, principals drift
check script, and 2026-06-24 INTENT/SCOPE gap analysis.
2026-06-24 12:44:32 +02:00


flex-auth Pickup Suggestion — Ops-Warden Policy Gate Production

Date: 2026-06-23
From: ops-warden (WARDEN-WP-0009 finished)
For: flex-auth owner
Prior delivery: FLEX-WP-0006 (policy package, template registry, handoff doc)


Summary

ops-warden closed WARDEN-WP-0009. The caller side (policy.enabled, POST /v1/check, policy_decision_id in signatures.log) is verified. flex-auth policy authoring for the gate contract is done.

What remains is flex-auth production runtime + registry operations so operators can set policy.enabled: true on workstations running warden sign without local flex-auth serve hacks.


What ops-warden already proved

| Evidence | Location |
| --- | --- |
| Template registry + policy smoke | history/2026-06-23-flex-auth-policy-gate-local-smoke.md |
| Production inventory registry smoke | history/2026-06-23-flex-auth-policy-gate-production-smoke.md |
| Production registry artifact | registry/flex-auth/production_registry_snapshot.json |
| Registry generator | scripts/build_flex_auth_registry.py |
| Joint smoke runner | scripts/policy_gate_production_smoke.sh |

Production-registry smoke (real actor agt-state-hub-bridge):

  • Allow: policy_decision_id: decision:032b096c433ad80c
  • Deny: ttl_out_of_bounds with fail_closed: true

OpenBao-backed sign + policy gate is not yet joint-verified — scoped VAULT_TOKEN returned HTTP 403 in this session (ops-warden operator task).


Gaps flex-auth should pick up

1. Production runtime deployment (P0)

Problem: The operator workstation cannot reach any flex-auth endpoint. Probed from WSL, flex-auth.flex-auth.svc.cluster.local:8080 does not resolve and no flex-auth instance is running on 127.0.0.1:8080. ops-warden cannot set policy.enabled: true with fail_closed: true until flex-auth is up.

Suggestion for flex-auth:

  • Deploy flex-auth serve (or equivalent) to a stable production URL reachable from machines that run warden sign.
  • Document the canonical URL for policy.flex_auth_url (cluster DNS, tunnel, or ingress — whichever matches NetKingdom operator access patterns).
  • Expose GET /healthz (already in code) in runbooks; ops-warden operators will use it as a pre-flight before enabling the gate.

Acceptance: Operator can curl <flex_auth_url>/healthz from the warden workstation and get HTTP 200.
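
The pre-flight described above could be scripted roughly as follows. This is a minimal sketch, not part of either repo: the function name `healthz_ok` and the 3-second timeout are assumptions, and only the `/healthz` path and the HTTP 200 expectation come from this document.

```python
import urllib.error
import urllib.request


def healthz_ok(flex_auth_url: str, timeout: float = 3.0) -> bool:
    """Return True only if GET <flex_auth_url>/healthz answers HTTP 200.

    Hypothetical helper: the name and the timeout default are illustrative.
    """
    url = flex_auth_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, HTTP error status,
        # or a string that is not a usable URL
        return False
```

An operator wrapper might refuse to set policy.enabled: true whenever `healthz_ok(policy_flex_auth_url)` returns False, which matches the fail-closed concern in the problem statement.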


2. Load production registry, not only template fixtures (P0)

Problem: examples/ops-warden/registry_snapshot.json uses template actors (platform-steward, ci-deploy-agent, backup-automation). Production inventory uses different names (agt-state-hub-bridge, etc.). Signing with policy.enabled: true denies unregistered actors (unknown_actor_resource).

Suggestion for flex-auth:

  • Adopt ops-warden's production registry snapshot as the initial production load target, or ingest equivalent manifests under examples/ops-warden/ generated from real inventory.
  • Document operator steps:
    # ops-warden (regenerate when inventory changes)
    python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \
      -o registry/flex-auth/production_registry_snapshot.json
    
    # flex-auth (load into runtime)
    flex-auth load-registry --file <path-to-production_registry_snapshot.json>
    flex-auth serve --registry <snapshot> --policy examples/ops-warden/policy_package.md ...
    
  • Add fixture or integration tests using production actor names (agt-state-hub-bridge, adm-example, atm-backup-daily) so CI catches registry drift.

Acceptance: POST /v1/check allows agt-state-hub-bridge / sign against the deployed production registry without ops-warden-local registry patching.
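
A CI drift check along the lines suggested above might look like this. It is a sketch under assumptions: how actor names are read out of inventory.yaml and the registry snapshot is not specified here, so the hypothetical `registry_drift` helper takes plain name lists rather than parsing either file.

```python
from typing import Iterable, NamedTuple


class Drift(NamedTuple):
    missing_from_registry: list[str]  # in inventory, absent from the snapshot
    stale_in_registry: list[str]      # in the snapshot, absent from inventory


def registry_drift(inventory_actors: Iterable[str],
                   registry_actors: Iterable[str]) -> Drift:
    """Compare two sets of actor names and report both directions of drift."""
    inventory, registry = set(inventory_actors), set(registry_actors)
    return Drift(sorted(inventory - registry), sorted(registry - inventory))


# Production names vs. template names, both taken from this document.
drift = registry_drift(
    ["agt-state-hub-bridge", "adm-example", "atm-backup-daily"],
    ["platform-steward", "ci-deploy-agent", "backup-automation"],
)
```

With the template fixture loaded, every production actor lands in `missing_from_registry`, which is the situation that produces unknown_actor_resource denials. A CI job could fail whenever either list is non-empty.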


3. Registry sync contract (P1)

Problem: ops-warden owns inventory.yaml; flex-auth owns authorization registry. Today sync is manual: regenerate JSON, reload flex-auth.

Suggestion for flex-auth:

  • Publish a short sync contract doc:
    • ops-warden owns: actor names, types, principals, TTL defaults
    • flex-auth owns: allowed_subjects, max_ttl_hours, relationships, policy package
    • Trigger: inventory add/change → regenerate snapshot → flex-auth reload
  • Optional later: flex-auth validate target for ops-warden-generated snapshots; or HTTP reload endpoint for registry updates without restart.

Acceptance: Documented two-repo workflow; no ambiguity on who updates what when a new agt-* actor is added.
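
One way to make the "inventory add/change" trigger mechanical is to compare a stored digest of inventory.yaml against the current file. This is only a sketch: the helper names and the idea of a sidecar digest file are assumptions, not something either repo is known to implement.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, hex encoded."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_regeneration(inventory: Path, digest_file: Path) -> bool:
    """True if the inventory changed since the digest was last recorded."""
    if not digest_file.exists():
        return True  # never recorded: treat as changed
    return digest_file.read_text().strip() != file_digest(inventory)


def record_digest(inventory: Path, digest_file: Path) -> None:
    """Store the current digest after a successful regenerate + reload."""
    digest_file.write_text(file_digest(inventory) + "\n")
```

The intended order would be: check `needs_regeneration`, regenerate the snapshot, reload flex-auth, and only then call `record_digest`, so a failed reload is retried on the next run.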


4. Joint production smoke with OpenBao (P1)

Problem: Policy gate smoke used backend: local or local flex-auth. Full production path is warden sign → flex-auth → OpenBao SSH engine.

Suggestion for flex-auth:

  • Coordinate one joint smoke session with ops-warden once:
    • flex-auth deployed with production registry
    • ops-warden policy.enabled: true, valid VAULT_TOKEN
    • Allow: warden sign agt-state-hub-bridge → signatures.log has backend: vault and policy_decision_id
    • Deny: e.g. --ttl above max → flex-auth deny before OpenBao call
  • Record non-secret evidence (decision ids, reasons, actor names only).

Acceptance: Shared history entry or flex-auth handoff update with vault-backed evidence mirroring ops-warden's local smoke format.
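
The ordering the joint smoke is meant to prove (policy check first, signer only on allow, evidence limited to non-secret fields) can be sketched as below. Everything here is illustrative: `Decision`, `gated_sign`, the stub callables, the example decision ids, and the 8-hour limit are assumptions, and the real flex-auth response shape is not described in this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    allow: bool
    decision_id: str
    reason: Optional[str] = None


class PolicyDenied(Exception):
    """Raised with the non-secret evidence dict when the gate denies."""


def gated_sign(actor: str, ttl_hours: int,
               check: Callable[[str, int], Decision],
               sign: Callable[[str, int], str]) -> tuple[str, dict]:
    """Ask the policy gate first; call the signer only on allow."""
    decision = check(actor, ttl_hours)
    # Evidence holds decision id, reason, and actor name only.
    evidence = {"actor": actor,
                "policy_decision_id": decision.decision_id,
                "reason": decision.reason}
    if not decision.allow:
        raise PolicyDenied(evidence)
    return sign(actor, ttl_hours), evidence


# Stubs standing in for flex-auth and the OpenBao SSH engine.
sign_calls: list[tuple[str, int]] = []


def stub_check(actor: str, ttl_hours: int) -> Decision:
    if ttl_hours > 8:  # hypothetical max_ttl_hours
        return Decision(False, "decision:deny-example", "ttl_out_of_bounds")
    return Decision(True, "decision:allow-example")


def stub_sign(actor: str, ttl_hours: int) -> str:
    sign_calls.append((actor, ttl_hours))
    return "signed-cert-placeholder"
```

In a deny case the signer stub is never invoked, which is the property the "flex-auth deny before OpenBao call" step is checking.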


5. IAM subject binding in production (P2)

Problem: Policy allows subject.id = actor name or iam:<actor>. Production may set WARDEN_POLICY_SUBJECT from key-cape/IAM profile sub.

Suggestion for flex-auth:

  • Confirm production registry allowed_subjects covers expected IAM subs for each actor (or document that actor-name fallback is the production default until IAM mapping is wired).
  • Add one fixture for WARDEN_POLICY_SUBJECT / iam:agt-state-hub-bridge if that path is intended in prod.

Acceptance: Documented subject-id strategy for SSH sign gate in production.
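
The subject-id rule stated in the problem (actor name or iam:<actor>, with WARDEN_POLICY_SUBJECT as an optional override) can be written down as a small sketch. The function names are hypothetical, and treating an empty WARDEN_POLICY_SUBJECT as unset is an assumption.

```python
import os
from typing import Mapping


def resolve_subject(actor: str, env: Mapping[str, str] = os.environ) -> str:
    """Use WARDEN_POLICY_SUBJECT when set and non-empty, else the actor name."""
    return env.get("WARDEN_POLICY_SUBJECT") or actor


def subject_matches_actor(subject_id: str, actor: str) -> bool:
    """subject.id must be the actor name itself or iam:<actor>."""
    return subject_id in (actor, f"iam:{actor}")
```

A fixture for the IAM path could then assert that `resolve_subject("agt-state-hub-bridge", {"WARDEN_POLICY_SUBJECT": "iam:agt-state-hub-bridge"})` yields a subject that `subject_matches_actor` accepts.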


Proposed flex-auth workplan (draft)

Title: FLEX-WP-0007 — Ops-Warden Policy Gate Production Deployment
Priority: P0
Depends on: FLEX-WP-0006, ops-warden WARDEN-WP-0009 (finished)

| Task | Summary |
| --- | --- |
| T1 | Deploy flex-auth runtime; document production flex_auth_url + /healthz |
| T2 | Load production registry snapshot; verify allow/deny for real inventory actors |
| T3 | Publish registry sync contract with ops-warden (inventory.yaml → snapshot) |
| T4 | Joint OpenBao + policy gate smoke with ops-warden (non-secret evidence) |
| T5 | IAM subject binding notes / fixtures for WARDEN_POLICY_SUBJECT (if needed) |

Ownership boundary (unchanged)

| Concern | Owner |
| --- | --- |
| Policy package + PDP decision | flex-auth |
| Actor inventory + TTL/principal defaults | ops-warden |
| SSH CA / OpenBao signing | ops-warden |
| Production registry content for SSH actors | joint: ops-warden generates from inventory; flex-auth hosts and evaluates |
| policy.enabled flip | ops-warden operator (after flex-auth reachable) |

References

| Doc | Repo |
| --- | --- |
| docs/ops-warden-policy-gate-handoff.md | flex-auth |
| workplans/FLEX-WP-0006-ops-warden-ssh-signing-policy-gate.md | flex-auth |
| wiki/PolicyGatedSigning.md | ops-warden |
| workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md | ops-warden |
| registry/flex-auth/production_registry_snapshot.json | ops-warden |