# Ops-Warden Registry Sync

Date: 2026-06-23
Workplan: FLEX-WP-0007

This is the flex-auth side of the production policy gate runbook for ops-warden
SSH signing. ops-warden owns actor inventory and generated registry content;
flex-auth hosts that registry, evaluates the policy package, and returns the
decision envelope used by `warden sign`.

## Production Runtime Target

Use the NetKingdom operator-reachable service URL as the canonical
`policy.flex_auth_url`. The preferred target is an in-cluster flex-auth Service
fronted by the existing operator access path:

    http://flex-auth.flex-auth.svc.cluster.local:8080

If cluster DNS is not reachable from the workstation that runs `warden sign`, use
an approved operator tunnel or ingress URL with the same base path semantics. Do
not turn on `policy.enabled` with `fail_closed` true until this pre-flight succeeds
from the same workstation:

    curl -fsS <policy.flex_auth_url>/healthz

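The pre-flight above can also be scripted. The following is a minimal sketch, not part of flex-auth or ops-warden: the function name `healthz_ok` and the 5-second default timeout are assumptions, and it only mirrors what `curl -fsS` checks (a reachable endpoint and a success status).

```python
import urllib.error
import urllib.request


def healthz_ok(base_url: str, timeout: float = 5.0) -> bool:
    """Return True only if GET <base_url>/healthz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, an HTTP error status,
        # or a malformed URL all count as a failed pre-flight.
        return False
```

Call it with the same value you intend to set as `policy.flex_auth_url`, from the workstation that runs `warden sign`, and leave `policy.enabled` off while it returns `False`.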
Start the runtime with the production registry snapshot and the ops-warden
policy package:

    flex-auth serve \
      --addr 0.0.0.0:8080 \
      --registry examples/ops-warden/production_registry_snapshot.json \
      --policy examples/ops-warden/policy_package.md \
      --log /var/log/flex-auth/ops-warden-decisions.jsonl

The checked-in production snapshot is a non-secret fixture and initial load
target. Regenerate it from ops-warden inventory whenever actors, principals, or
TTL defaults change.

## Current Operator Tunnel

As of 2026-06-24, the reachable operator-tunnel URL for CoulombCore is:

    http://127.0.0.1:18090

The tunnel name is `flex-auth-coulombcore`. It forwards CoulombCore
`127.0.0.1:18090` to the local flex-auth runtime on `127.0.0.1:18090`. Verified
checks from CoulombCore:

- `GET /healthz` returned HTTP 200.
- `POST /v1/check` for `agt-state-hub-bridge` returned `allow` with `decision:873c6c682a52bebc`.

This is an operator tunnel pattern, not a substitute for a future in-cluster
Service if flex-auth should run inside the cluster.

## Ownership Contract

| Concern | Owner | Notes |
| --- | --- | --- |
| Actor names and actor types | ops-warden | `inventory.yaml` defines `adm`, `agt`, and `atm` actors. |
| Default principals and TTLs | ops-warden | Used by `warden sign` and by generated registry attributes. |
| Registry hosting and reload | flex-auth | Runtime serves the generated snapshot and evaluates it with the policy package. |
| Policy package semantics | flex-auth | `examples/ops-warden/policy_package.md` owns allow and deny reasons. |
| OpenBao SSH signing | ops-warden | flex-auth never receives SSH private keys or Vault tokens. |
| Production `policy.enabled` flip | ops-warden operator | Only after healthz and allow/deny smoke pass. |

## Sync Procedure

1. In ops-warden, update the managed inventory source or `~/.config/warden/inventory.yaml`.
2. Regenerate the flex-auth snapshot from ops-warden:

       python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml -o registry/flex-auth/production_registry_snapshot.json

3. Validate the generated file before handoff:

       flex-auth load-registry --file registry/flex-auth/production_registry_snapshot.json

4. Copy or promote the snapshot to the flex-auth runtime. For repo-level drift
   coverage, update `examples/ops-warden/production_registry_snapshot.json` when
   the intended production fixture changes.
5. Restart or reload the flex-auth runtime with the new snapshot.
6. From the workstation that runs `warden sign`, verify:

       curl -fsS <policy.flex_auth_url>/healthz

7. Run one allow smoke and one deny smoke. Record only non-secret evidence:
   actor name, decision id, effect, reason, backend, and whether a certificate
   was issued.

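The evidence fields listed in the last step could be captured in a small record so that nothing beyond those six values is written down. This is an illustrative sketch only: the class name `SmokeEvidence`, the field names, and the JSON-lines output are assumptions, not an ops-warden or flex-auth format, and the placeholder strings stand in for values taken from a real smoke run.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SmokeEvidence:
    """Non-secret fields from one smoke run (field names are assumptions)."""
    actor: str
    decision_id: str
    effect: str  # "allow" or "deny"
    reason: str
    backend: str
    certificate_issued: bool


def to_jsonl(evidence: SmokeEvidence) -> str:
    """Serialize one evidence record as a single JSON line."""
    return json.dumps(asdict(evidence), sort_keys=True)


# Placeholder values; fill them in from the actual smoke output.
record = SmokeEvidence(
    actor="agt-state-hub-bridge",
    decision_id="<decision id from the smoke run>",
    effect="deny",
    reason="ttl_out_of_bounds",
    backend="<signing backend>",
    certificate_issued=False,
)
print(to_jsonl(record))
```

Because the record type has no field for tokens, keys, or certificate contents, secrets cannot end up in the evidence log by accident.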
## Current Production Fixture

The initial fixture mirrors ops-warden production inventory as of 2026-06-23.
It registers:

| Actor | Type | Principal | Max TTL hours | Allowed subjects |
| --- | --- | --- | --- | --- |
| adm-example | adm | adm-full | 48 | adm-example, iam:adm-example |
| agt-codex-interhub-bootstrap | agt | agt-interhub-bootstrap | 2 | agt-codex-interhub-bootstrap, iam:agt-codex-interhub-bootstrap |
| agt-state-hub-bridge | agt | agt-task-bridge | 24 | agt-state-hub-bridge, iam:agt-state-hub-bridge |
| atm-backup-daily | atm | atm-backup-daily | 8 | atm-backup-daily, iam:atm-backup-daily |

The IAM subject form is intended for `WARDEN_POLICY_SUBJECT`. If that environment
variable is unset, ops-warden sends the actor name and the same policy path
continues to work.

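To make the subject fallback and the TTL limits concrete, here is a rough sketch of how a request could be evaluated against the fixture table. It is not the policy package itself: the function names, the treatment of an empty `WARDEN_POLICY_SUBJECT` as unset, the lower TTL bound of zero, the order of the checks, and every reason name other than `signing_policy_matched` and `ttl_out_of_bounds` are assumptions made for illustration.

```python
import os

# Mirrors the fixture table: actor -> (max TTL hours, allowed subjects).
FIXTURE = {
    "adm-example": (48, {"adm-example", "iam:adm-example"}),
    "agt-codex-interhub-bootstrap": (
        2,
        {"agt-codex-interhub-bootstrap", "iam:agt-codex-interhub-bootstrap"},
    ),
    "agt-state-hub-bridge": (
        24,
        {"agt-state-hub-bridge", "iam:agt-state-hub-bridge"},
    ),
    "atm-backup-daily": (8, {"atm-backup-daily", "iam:atm-backup-daily"}),
}


def resolve_subject(actor, environ=None):
    """Use WARDEN_POLICY_SUBJECT when set, otherwise fall back to the actor name."""
    env = os.environ if environ is None else environ
    # Assumption: an empty value is treated the same as an unset variable.
    return env.get("WARDEN_POLICY_SUBJECT") or actor


def check(actor, subject, ttl_hours):
    """Return (effect, reason) for a signing request against the fixture."""
    entry = FIXTURE.get(actor)
    if entry is None:
        return ("deny", "unknown_actor_resource")  # assumed trigger for this reason
    max_ttl, allowed_subjects = entry
    if subject not in allowed_subjects:
        return ("deny", "subject_not_allowed")  # hypothetical reason name
    if not 0 < ttl_hours <= max_ttl:
        return ("deny", "ttl_out_of_bounds")
    return ("allow", "signing_policy_matched")
```

Either subject form listed in the table passes the subject check, which is why the policy path keeps working whether or not the environment variable is set.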
## Smoke Expectations

Allow path:

    warden sign agt-state-hub-bridge

Expected non-secret evidence: decision effect `allow`, reason
`signing_policy_matched`, `signatures.log` includes `policy_decision_id`.

Deny path:

    warden sign agt-state-hub-bridge --ttl 999

Expected non-secret evidence: effect `deny`, reason `ttl_out_of_bounds`, no
certificate issued. With `fail_closed` true, unreachable flex-auth must also block
signing.

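The fail-closed requirement can be pictured as a thin gate around the policy call. The sketch below is hypothetical and does not reflect how ops-warden is implemented: `gate_signing` and the `(effect, reason)` tuple are made-up names, and letting signing proceed when `fail_closed` is false and flex-auth is unreachable is an assumption, since the runbook only specifies the `fail_closed` true case.

```python
from typing import Callable, Tuple

Decision = Tuple[str, str]  # (effect, reason)


def gate_signing(check: Callable[[], Decision], fail_closed: bool) -> bool:
    """Return True if signing may proceed, False if it must be blocked."""
    try:
        effect, _reason = check()
    except OSError:
        # flex-auth unreachable: connection refused, timeout, or DNS failure.
        # Fail closed blocks signing; fail open (assumed) lets it through.
        return not fail_closed
    # A reachable flex-auth is authoritative: only "allow" permits signing.
    return effect == "allow"
```

With `fail_closed` true, both an explicit `deny` and an unreachable service end the same way: no certificate is issued.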
OpenBao-backed signing remains an operator smoke because it requires a scoped
`VAULT_TOKEN`. The previous session returned HTTP 403 on 2026-06-23; retry with:

    SMOKE_VAULT=1 ~/ops-warden/scripts/policy_gate_production_smoke.sh

## References

- `docs/ops-warden-policy-gate-handoff.md`
- `examples/ops-warden/production_registry_snapshot.json`
- `~/ops-warden/wiki/PolicyGatedSigning.md`
- `~/ops-warden/history/2026-06-23-flex-auth-policy-gate-production-smoke.md`