feat: close WP-0009/WP-0013 production integration stewardship strand

Ship flex-auth policy gate registry and smoke evidence, archive WP-0009
through WP-0013, and add integration docs: ops-bridge cert_command
migration playbook, operator OpenBao token hygiene, principals drift
check script, and 2026-06-24 INTENT/SCOPE gap analysis.
2026-06-24 12:44:32 +02:00
parent 1778b169da
commit 90007c2cda
24 changed files with 2192 additions and 121 deletions


@@ -15,21 +15,26 @@ aligned with NetKingdom canon.
---
## Where we are (2026-06-18)
## Where we are (2026-06-24)
ops-warden **issues short-lived SSH certificates and routes every other credential
need to the subsystem that owns it.** SSH signing is **production-verified** on
Railiance OpenBao (`warden sign` against `https://bao.coulomb.social`, host CA trust
deployed). The routing material — `wiki/AccessRouting.md`, the credential routing
wiki, NetKingdom security map, a machine-readable pointer catalog
(`registry/routing/catalog.yaml`, WARDEN-WP-0010), and the `warden route`
lookup CLI over it (`list`/`show`/`find`, WARDEN-WP-0011) — is operational. The opt-in
flex-auth pre-sign gate is **coded but off in production** until flex-auth publishes
`ssh-certificate` policies (WARDEN-WP-0009).
deployed).
**Access routing** is shipped: `wiki/AccessRouting.md`, credential routing wiki,
NetKingdom security map, machine-readable pointer catalog
(`registry/routing/catalog.yaml`, WP-0010), and `warden route` lookup CLI
(`list`/`show`/`find`, `--json`, WP-0011).
**Policy gate** is shipped on the caller side (WP-0007) with production registry
and smoke evidence (WP-0009 archived). flex-auth published the `ssh-certificate`
policy package (FLEX-WP-0006). `policy.enabled` remains **false** in production
until flex-auth is deployed to a reachable URL (flex-auth FLEX-WP-0007).
**INTENT alignment:** SSH issuance mission met in production. Remaining distance
is integration breadth (ops-bridge `cert_command` on live tunnels), authorization
depth (flex-auth), and operator hygiene — not missing signing code.
is integration breadth (ops-bridge `cert_command` on live tunnels), flex-auth
runtime deployment (not ops-warden code), and operator hygiene.
### Issue vs route
@@ -47,7 +52,9 @@ ops-warden executes exactly one lane and points at the owner for the rest.
Full role and boundary: `wiki/AccessRouting.md`. The catalog is a **pointer layer**:
it never restates an owner's procedure (authored `steps` exist only for the SSH lane).
Full gap analysis: `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`
Gap analysis: `history/2026-06-24-intent-scope-gap-analysis.md` (current);
`history/2026-06-18-post-wp0008-intent-scope-reassessment.md` (SSH lane);
`history/2026-06-18-access-routing-intent-shift-assessment.md` (routing charter).
---
@@ -66,8 +73,8 @@ Full gap analysis: `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`
| Dimension | Level | Meaning today |
| --- | --- | --- |
| D5 | Discovery | Routing wiki + security map + pointer catalog + NK canon cross-links |
| A4 | Availability | CLI + opt-in policy gate + `warden route` lookup over the machine-readable catalog (`list`/`show`/`find`, `--json` for agents) |
| C4 | Completeness | SSH lane prod-verified; flex-auth policies external |
| A4 | Availability | CLI + `warden route` + opt-in policy gate + agent `--json` lookup |
| C4 | Completeness | SSH lane prod-verified; policy gate + registry smoke shipped; prod flip waits flex-auth deploy |
| R3 | Reliability | Live OpenBao sign evidence on Railiance |
---
@@ -75,9 +82,9 @@ Full gap analysis: `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`
## Core Idea
**Today:** implements the SSH certificate lane from `wiki/AccessManagementDirective.md`
§§1–5: CA signing, actor inventory, TTL policy, cert-side scorecard, and the
`cert_command` interface for ops-bridge. Production path uses OpenBao SSH engine
(`backend: vault`).
§§1–5: CA signing, actor inventory, TTL policy, cert-side scorecard, optional
flex-auth pre-sign gate, and the `cert_command` interface for ops-bridge. Production
path uses OpenBao SSH engine (`backend: vault`).
**Direction (INTENT):** issue short-lived SSH certificates and route dev workers to
key-cape, flex-auth, OpenBao, ops-bridge, and railiance components for everything
@@ -96,6 +103,10 @@ for the rest.
- `cert_command`: `warden sign <actor> --pubkey <path>` → cert on stdout
- TTL enforcement per `ActorType` (`adm` 48 h, `agt` 24 h, `atm` 8 h)
- `warden status`, cleanup, scorecard, signatures log
- Opt-in flex-auth policy gate (`policy.enabled`, `policy_decision_id` in log)
- Production flex-auth registry builder (`scripts/build_flex_auth_registry.py`,
`registry/flex-auth/production_registry_snapshot.json`)
- Policy gate smoke runner (`scripts/policy_gate_production_smoke.sh`)
- `warden route` lookup CLI (`list`/`show`/`find`, `--json`) over the pointer catalog
- `warden issue` and `ops-ssh-wrapper` (local backend; vault uses sign-only)
- Runbooks for OpenBao config and Inter-Hub bootstrap SSH envelope
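The per-`ActorType` TTL ceilings listed above can be illustrated with a small sketch. It assumes the actor type is read from the name prefix (`adm-`, `agt-`, `atm-`), as the inventory names suggest. It also assumes that an absent `--ttl` falls back to the type ceiling. `ActorType`, `MAX_TTL_HOURS` and `effective_ttl` are hypothetical names, not ops-warden's code. Per-actor inventory limits that are tighter than the type ceiling are not modelled here.

```python
from enum import Enum
from typing import Optional


class ActorType(Enum):
    """Actor classes from the SSH actor model; values are the name prefixes."""
    ADM = "adm"
    AGT = "agt"
    ATM = "atm"


# Maximum certificate lifetime per actor type, in hours (values from SCOPE).
MAX_TTL_HOURS = {ActorType.ADM: 48, ActorType.AGT: 24, ActorType.ATM: 8}


def actor_type(actor: str) -> ActorType:
    """Derive the actor type from a name such as 'agt-state-hub-bridge'."""
    prefix, sep, rest = actor.partition("-")
    if not sep or not rest:
        raise ValueError(f"actor name must look like <type>-<name>: {actor!r}")
    return ActorType(prefix)  # raises ValueError for an unknown prefix


def effective_ttl(actor: str, requested_hours: Optional[int] = None) -> int:
    """Return the TTL to sign with: the type ceiling, or a shorter request."""
    cap = MAX_TTL_HOURS[actor_type(actor)]
    if requested_hours is None:
        return cap  # assumption: no --ttl means "use the ceiling"
    if not 0 < requested_hours <= cap:
        raise ValueError(f"ttl {requested_hours}h outside (0, {cap}] for {actor}")
    return requested_hours
```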
@@ -105,38 +116,38 @@ for the rest.
- NetKingdom security routing guidance — which subsystem owns which credential type
- Wiki and config references aligned with OpenBao-first platform standard
- Capability registry entry for SSH certificate issuance
- Routing pointer catalog (`registry/routing/catalog.yaml`)
- Keeping ops access patterns consistent with `net-kingdom` platform architecture
### Shipped workplans
### Shipped workplans (archived)
| WP | Focus |
| --- | --- |
| WP-0001–0005 | Initial CLI, quality, hygiene, OpenBao docs, hub sync |
| WP-0006 | Credential routing, security map, inventory patterns, OpenBao checklist |
| WP-0007 | Opt-in flex-auth policy gate (`policy.enabled`) |
| WP-0008 | Production sign verification, stewardship closeout, archive hygiene |
| WP-0010 | "Issue SSH, route the rest" wording + `wiki/AccessRouting.md` + pointer catalog |
| WP-0011 | `warden route` lookup CLI (`list`/`show`/`find`) over the pointer catalog (A3 → A4) |
| WP-0009 | flex-auth registry + policy smoke; pickup brief for FLEX-WP-0007 |
| WP-0010 | Access routing charter + pointer catalog |
| WP-0011 | `warden route` lookup CLI |
| WP-0013 | Production integration closeout — cert_command playbook, token hygiene, principals drift |
### Active / wait
### Active / ready
| WP | Status | Focus |
| --- | --- | --- |
| **WP-0009** | `blocked` | flex-auth `ssh-certificate` policies + `policy.enabled` production smoke |
| **WP-0012** | `backlog` | Routing scenario playbooks (draft until owner paths ship) |
| **WP-0012** | `ready` | Routing scenario playbooks (catalog + wiki expansion) |
### Known gaps (not yet workplanned)
### Known gaps (not ops-warden workplans)
| Gap | Owner | Notes |
| --- | --- | --- |
| ops-bridge `cert_command` on live tunnels | ops-bridge | Tunnels use `agt-claude-*` static keys today |
| Operator token hygiene | Operator | Prefer OIDC + `warden-sign`; retire root from shell profile |
| Principals sync warden ↔ railiance-infra | ops-warden + infra | `inventory.yaml` hosts vs `ssh_principals.yaml` |
| flex-auth production runtime + registry deploy | flex-auth | **FLEX-WP-0007** — unblocks `policy.enabled: true` |
| Vault-backed policy gate joint smoke | flex-auth + operator | Needs valid scoped `VAULT_TOKEN` |
| ops-bridge `cert_command` on live tunnels | ops-bridge | Playbook shipped (`wiki/playbooks/ops-bridge-tunnel-cert.md`); pilot pending |
| Principals sync warden ↔ railiance-infra | ops-warden + infra | `scripts/check_principals_drift.py` — operator runs periodically |
| NK-WP-0009 joint SSH tutorial | net-kingdom | Parallel coordination track |
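The principals-drift gap compares two per-host views: what `inventory.yaml` expects and what `ssh_principals.yaml` deploys. The comparison can be sketched as a per-host set difference. This is not `scripts/check_principals_drift.py`. It assumes both files have already been parsed into `host → set of principals` mappings, because their YAML layouts are not shown here. `principals_drift` and the host names in the test are hypothetical.

```python
from typing import Dict, Set


def principals_drift(
    warden: Dict[str, Set[str]], infra: Dict[str, Set[str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Compare principals per host and return only hosts that disagree.

    'missing_on_host': principals the warden inventory expects but
    railiance-infra does not deploy (signed certs would be rejected there).
    'unexpected_on_host': principals deployed on the host that no
    inventory actor uses (candidates for cleanup).
    """
    report: Dict[str, Dict[str, Set[str]]] = {}
    for host in sorted(warden.keys() | infra.keys()):
        expected = warden.get(host, set())
        deployed = infra.get(host, set())
        missing, extra = expected - deployed, deployed - expected
        if missing or extra:
            report[host] = {"missing_on_host": missing, "unexpected_on_host": extra}
    return report
```

An empty report means the two repos agree. Running the check periodically, as the table suggests, catches drift before a sign succeeds but the host refuses the principal.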
The integration-closeout strand (ops-bridge tunnel migration, token runbook) from
reassessment §6 is not yet workplanned; WARDEN-WP-0010 was used for the access-routing
charter instead. Open a new WP when tunnel migration becomes priority.
---
## Out of Scope
@@ -145,6 +156,7 @@ charter instead. Open a new WP when tunnel migration becomes priority.
with flex-auth policy where required; ops-warden documents paths only
- Identity / OIDC / MFA → key-cape, Keycloak
- Authorization policy decisions → flex-auth
- flex-auth runtime deployment → flex-auth (`FLEX-WP-0007`)
- Tunnel lifecycle → `ops-bridge`
- Host principal deployment → `railiance-infra`
- OpenBao / Vault cluster deployment → `railiance-platform`
@@ -157,10 +169,12 @@ charter instead. Open a new WP when tunnel migration becomes priority.
- Issuing or refreshing an **SSH cert** for `adm`/`agt`/`atm`
- A dev worker needs to know **where to get credentials** in the NetKingdom stack
- An agent needs **`warden route find`** instead of re-deriving routing from wiki prose
- `ops-bridge` needs a `cert_command` for a tunnel
- Adding actors to the principals inventory
- Adding actors to the principals inventory (regenerate flex-auth registry snapshot)
- Inter-Hub or bootstrap tasks need a **short-lived agent SSH envelope**
- Checking cert-side compliance (scorecard)
- Enabling or testing the opt-in flex-auth policy gate
---
@@ -177,9 +191,12 @@ charter instead. Open a new WP when tunnel migration becomes priority.
- **SSH CLI:** v0.1.0 — local + OpenBao backends
- **Production sign:** verified 2026-06-18 (`history/2026-06-17-openbao-production-verify.md`)
- **Policy gate:** shipped, `policy.enabled: false` in prod until WP-0009
- **Active workplan:** WP-0009 (wait — flex-auth)
- **Latest assessment:** `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`
- **Access routing:** WP-0010 + WP-0011 shipped (`warden route`, pointer catalog)
- **Policy gate:** caller shipped (WP-0007); registry + smoke complete (WP-0009 archived).
`policy.enabled: false` until flex-auth is reachable (`FLEX-WP-0007`)
- **Ready work:** WP-0012 (routing playbooks)
- **Integration docs:** cert_command migration, token hygiene, principals drift (`wiki/playbooks/`)
- **Latest assessment:** `history/2026-06-24-intent-scope-gap-analysis.md`
---
@@ -195,7 +212,8 @@ key-cape / Keycloak identity claims
```
Upstream: OpenBao SSH engine (production) or local CA (labs). Actor inventory in
operator config or Git-tracked patterns.
operator config or Git-tracked patterns. flex-auth registry snapshot derived from
inventory when policy gate is enabled.
Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operators.
@@ -207,6 +225,7 @@ Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operato
- `cert_command`: shell command returning a cert on stdout
- `inventory.yaml`: actor → principals + TTL registry
- `LocalCA` / `VaultCA`: signing backends (`backend: local` | `vault`)
- Pointer catalog: `registry/routing/catalog.yaml` — subsystem ownership lookup only
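The `cert_command` interface only promises a certificate on stdout, for example `warden sign <actor> --pubkey <path>`. A consumer such as a tunnel manager might use it as in the sketch below. It reflects a consumer-side assumption, not ops-bridge's implementation, and `fetch_cert` is a hypothetical name. The sketch writes the output to `<key>-cert.pub`, which is where OpenSSH looks for a certificate matching an identity file.

```python
import shlex
import subprocess
from pathlib import Path


def fetch_cert(cert_command: str, key_path: Path, timeout: float = 30.0) -> Path:
    """Run a cert_command and store its stdout next to the private key."""
    # No shell: the command string is split into argv, so '~' is not expanded.
    result = subprocess.run(
        shlex.split(cert_command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # non-zero exit (e.g. policy deny) raises CalledProcessError
    )
    cert = result.stdout.strip()
    # An OpenSSH certificate line starts with a type like
    # 'ssh-ed25519-cert-v01@openssh.com'.
    key_type = cert.split(" ", 1)[0]
    if not key_type.endswith("-cert-v01@openssh.com"):
        raise ValueError(f"cert_command did not return an OpenSSH certificate: {key_type!r}")
    cert_path = key_path.with_name(key_path.name + "-cert.pub")
    cert_path.write_text(cert + "\n")
    return cert_path
```

Pass absolute paths in the command string, because no shell expands `~` here. A non-zero exit from `warden sign`, such as a `fail_closed` policy deny, surfaces as an exception instead of an empty certificate file.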
---
@@ -218,7 +237,7 @@ Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operato
| `ops-bridge` | Primary cert_command consumer |
| `railiance-infra` | Host-side SSH principals and hardening |
| `railiance-platform` | OpenBao deployment and platform secrets |
| `flex-auth` | Authorization; opt-in pre-sign policy gate (`policy.enabled`) |
| `flex-auth` | Authorization; policy package shipped (FLEX-WP-0006); runtime deploy FLEX-WP-0007 |
| `key-cape` | Identity / IAM Profile lightweight mode |
| `state-hub` | Workstream registry |
@@ -243,14 +262,17 @@ keywords: [ssh, certificate, ca, credential, warden, ops-warden, pki, openbao, v
| --- | --- |
| `INTENT.md` | Why ops-warden exists and where it is going |
| `SCOPE.md` | What is implemented today (this file) |
| `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` | Latest INTENT ↔ SCOPE gap analysis |
| `wiki/AccessRouting.md` | What ops-warden issues vs routes (role and boundary) |
| `wiki/CredentialRouting.md` | Which subsystem for each credential need |
| `registry/routing/catalog.yaml` | Machine-readable routing pointer catalog |
| `wiki/NetKingdomSecurityMap.md` | Platform security component map |
| `examples/warden.production.example.yaml` | Production warden.yaml template |
| `wiki/PolicyGatedSigning.md` | flex-auth opt-in gate + registry rollout |
| `wiki/AccessManagementDirective.md` | SSH actor model |
| `wiki/OpsWardenConfig.md` | warden.yaml and OpenBao |
| `wiki/CertCommandInterface.md` | cert_command contract |
| `wiki/PolicyGatedSigning.md` | flex-auth opt-in gate |
| `history/2026-06-24-intent-scope-gap-analysis.md` | Current gap analysis + WP-0013 |
| `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` | SSH lane gap analysis |
| `history/2026-06-18-access-routing-intent-shift-assessment.md` | Routing charter decision |
| `history/2026-06-23-flex-auth-policy-gate-production-smoke.md` | Policy gate smoke evidence |
| `net-kingdom/docs/platform-identity-security-architecture.md` | Platform security canon |


@@ -15,10 +15,12 @@ vault:
inventory_path: ~/.config/warden/inventory.yaml
state_dir: ~/.local/state/warden
# Opt-in flex-auth gate — keep false until ssh-certificate policies exist
# Opt-in flex-auth gate — enable only when flex-auth is reachable at flex_auth_url.
# Registry: registry/flex-auth/production_registry_snapshot.json (build from inventory).
# See wiki/PolicyGatedSigning.md (operator checklist) and wiki/playbooks/operator-openbao-token-hygiene.md
policy:
enabled: false
flex_auth_url: http://127.0.0.1:8080
flex_auth_url: http://flex-auth.flex-auth.svc.cluster.local:8080
fail_closed: true
tenant: tenant:platform
subject_env: WARDEN_POLICY_SUBJECT


@@ -0,0 +1,70 @@
# flex-auth Policy Gate — Local Smoke (WARDEN-WP-0009)
**Date:** 2026-06-23
**Workplan:** WARDEN-WP-0009 T01 closeout + T02 local smoke
**flex-auth delivery:** FLEX-WP-0006 (`docs/ops-warden-policy-gate-handoff.md`)
---
## Unblock
flex-auth published the `ssh-certificate` / `sign` policy package and ops-warden
handoff on 2026-06-23. WARDEN-WP-0009 T01 is complete; T02 local smoke results follow.
Production enablement still requires deploying a **production registry slice**
with real inventory actors (see `wiki/PolicyGatedSigning.md`).
---
## flex-auth assets confirmed
| Asset | Path (flex-auth repo) |
| --- | --- |
| Policy package | `examples/ops-warden/policy_package.md` |
| Fixtures | `examples/ops-warden/policy_fixtures.yaml` |
| Registry snapshot | `examples/ops-warden/registry_snapshot.json` |
| Handoff | `docs/ops-warden-policy-gate-handoff.md` |
Example registry actors (`platform-steward`, `ci-deploy-agent`, `backup-automation`)
are **templates**. Production actors such as `agt-state-hub-bridge` must be
registered in the deployed flex-auth registry before `policy.enabled: true`.
---
## Local smoke (ops-warden + flex-auth)
**Setup:** `backend: local`, `policy.enabled: true`, `fail_closed: true`,
flex-auth `serve` with ops-warden policy package and a smoke registry that adds
`agt-policy-smoke` (ops-warden naming-compliant clone of the `agt` fixture).
### Allow path
| Check | Result |
| --- | --- |
| `warden sign agt-policy-smoke` | Pass (exit 0) |
| `signatures.log` `policy_decision_id` | `decision:78bc882eca883f29` |
| `signatures.log` `backend` | `local` |
### Deny path (`fail_closed: true`)
| Check | Result |
| --- | --- |
| `warden sign agt-state-hub-bridge` (not in flex-auth registry) | Fail (exit 1) |
| CLI reason surfaced | `unknown_actor_resource` |
| Cert issued | No |
---
## Production remaining (T02)
1. Deploy flex-auth registry + policy package to production flex-auth runtime.
2. Register production inventory actors (`agt-state-hub-bridge`, `adm-*`, `atm-*`).
3. Set `policy.flex_auth_url` and `policy.enabled: true` in production `warden.yaml`.
4. Repeat allow/deny smoke against OpenBao-backed `warden sign`; capture
`policy_decision_id` in `signatures.log` (non-secret evidence only).
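Step 4 asks for `policy_decision_id` to be captured from `signatures.log` as non-secret evidence. The log's on-disk format is not shown in this document. The sketch below **assumes** one JSON object per line with `actor`, `backend` and `policy_decision_id` keys. Adjust the parser if the real format differs. `latest_policy_evidence` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Optional

# Only these fields are copied into evidence; never certificates or tokens.
EVIDENCE_KEYS = ("actor", "backend", "policy_decision_id")


def latest_policy_evidence(log_path: Path, actor: str) -> Optional[dict]:
    """Return non-secret evidence from the newest log entry for `actor`.

    ASSUMPTION: signatures.log holds one JSON object per line.
    """
    latest = None
    for line in log_path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("actor") == actor:
            latest = entry  # later lines are newer, so keep overwriting
    if latest is None:
        return None
    return {k: latest.get(k) for k in EVIDENCE_KEYS}
```

For the OpenBao-backed repeat, the evidence should show `backend` set to `vault` and a non-empty `policy_decision_id`.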
---
## See also
- `wiki/PolicyGatedSigning.md` — bindings, rollout, handoff link
- `workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md`


@@ -0,0 +1,99 @@
# flex-auth Policy Gate — Production Registry Smoke (WARDEN-WP-0009 T02)
**Date:** 2026-06-23
**Workplan:** WARDEN-WP-0009 T02
**Operator:** codex (non-secret evidence only)
---
## Production registry slice
Built from `~/.config/warden/inventory.yaml` (matches `examples/inventory.seed.yaml`):
| Artifact | Path |
| --- | --- |
| Registry snapshot | `registry/flex-auth/production_registry_snapshot.json` |
| Generator | `scripts/build_flex_auth_registry.py` |
| Smoke runner | `scripts/policy_gate_production_smoke.sh` |
`flex-auth load-registry` validation: **4 actors**, 3 groups, 4 relationships.
Registered actors:
| Actor | Type | max_ttl_hours | Principals |
| --- | --- | --- | --- |
| `agt-state-hub-bridge` | agt | 24 | `agt-task-bridge` |
| `agt-codex-interhub-bootstrap` | agt | 2 | `agt-interhub-bootstrap` |
| `adm-example` | adm | 48 | `adm-full` |
| `atm-backup-daily` | atm | 8 | `atm-backup-daily` |
Regenerate after inventory changes:
```bash
python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \
-o registry/flex-auth/production_registry_snapshot.json
```
Deploy the snapshot to the production flex-auth runtime (`flex-auth serve` or
future in-cluster deployment). Policy package path:
`~/flex-auth/examples/ops-warden/policy_package.md`.
---
## Smoke results (production inventory + registry)
flex-auth served locally with the production registry; `warden sign` used real
inventory actors and `policy.enabled: true`.
### Allow path — `agt-state-hub-bridge`
| Check | Result |
| --- | --- |
| `warden sign agt-state-hub-bridge` | Pass (exit 0) |
| `signatures.log` `policy_decision_id` | `decision:032b096c433ad80c` |
| `signatures.log` `actor` | `agt-state-hub-bridge` |
### Deny path — TTL above registry max (`fail_closed: true`)
| Check | Result |
| --- | --- |
| `warden sign agt-state-hub-bridge --ttl 999` | Fail (exit 1) |
| flex-auth reason | `ttl_out_of_bounds` |
| Cert issued | No |
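The two deny reasons seen in these smoke runs follow a simple rule over the registered actors table. An actor missing from the registry yields `unknown_actor_resource`. A TTL above the actor's `max_ttl_hours` yields `ttl_out_of_bounds`. The sketch below only predicts those outcomes, for example to pre-check a smoke plan. It is not flex-auth's policy evaluation, which also considers subjects, relationships and the policy package. `expected_denial` is a hypothetical name.

```python
from typing import Optional

# max_ttl_hours per registered actor, copied from the registry table above.
REGISTRY_MAX_TTL = {
    "agt-state-hub-bridge": 24,
    "agt-codex-interhub-bootstrap": 2,
    "adm-example": 48,
    "atm-backup-daily": 8,
}


def expected_denial(actor: str, ttl_hours: int) -> Optional[str]:
    """Predict the deny reason observed in smoke runs, or None if allowed."""
    max_ttl = REGISTRY_MAX_TTL.get(actor)
    if max_ttl is None:
        return "unknown_actor_resource"
    if ttl_hours > max_ttl:
        return "ttl_out_of_bounds"
    return None
```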
---
## OpenBao-backed smoke (operator follow-up)
Attempted `backend: vault` against `https://bao.coulomb.social` with
`policy.enabled: true`. **Blocked:** `VAULT_TOKEN` in session returned HTTP 403
(`permission denied`). Baseline `warden sign` without policy gate fails the same
way — token refresh required before vault-backed policy smoke.
When a scoped `warden-sign` token is available:
```bash
export VAULT_TOKEN="<scoped-token>" # never commit or paste in chat
SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh
```
Then enable production `warden.yaml`:
```yaml
policy:
enabled: true
flex_auth_url: http://flex-auth.flex-auth.svc.cluster.local:8080 # or reachable URL
fail_closed: true
```
Keep `policy.enabled: false` until flex-auth is reachable at `flex_auth_url` from
the workstation running `warden sign`: with `fail_closed: true`, every sign is
blocked while flex-auth is down.
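The `fail_closed` behaviour described above can be read as a small decision table. The sketch covers the documented cases: gate off, allow, deny, and PDP unreachable with `fail_closed: true`. The open branch for `fail_closed: false` is an **assumption** about the non-default setting. `gate` and `GateResult` are hypothetical names, not ops-warden's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    reason: str


def gate(policy_enabled: bool, fail_closed: bool,
         decision: Optional[bool], deny_reason: str = "") -> GateResult:
    """Decide whether signing may proceed.

    `decision` is flex-auth's verdict (True = allow, False = deny), or None
    when flex-auth could not be reached at all.
    """
    if not policy_enabled:
        # Gate off: signing proceeds without consulting flex-auth.
        return GateResult(True, "policy gate disabled")
    if decision is None:
        # Unreachable PDP: fail_closed refuses every sign request.
        if fail_closed:
            return GateResult(False, "flex-auth unreachable (fail closed)")
        return GateResult(True, "flex-auth unreachable (fail open, assumed)")
    if decision:
        return GateResult(True, "allowed by flex-auth")
    return GateResult(False, deny_reason or "denied by flex-auth")
```

The first `decision is None` branch shows why flipping `policy.enabled` before flex-auth is reachable would halt all SSH issuance.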
---
## See also
- `history/2026-06-23-flex-auth-policy-gate-local-smoke.md` — template registry smoke
- `wiki/PolicyGatedSigning.md` — rollout sequence
- `~/flex-auth/docs/ops-warden-policy-gate-handoff.md`


@@ -0,0 +1,189 @@
# flex-auth Pickup Suggestion — Ops-Warden Policy Gate Production
**Date:** 2026-06-23
**From:** ops-warden (`WARDEN-WP-0009` finished)
**For:** flex-auth owner
**Prior delivery:** `FLEX-WP-0006` (policy package, template registry, handoff doc)
---
## Summary
ops-warden closed **WARDEN-WP-0009**. The caller side (`policy.enabled`,
`POST /v1/check`, `policy_decision_id` in `signatures.log`) is verified.
flex-auth **policy authoring** for the gate contract is done.
What remains is **flex-auth production runtime + registry operations** so
operators can set `policy.enabled: true` on workstations running `warden sign`
without local `flex-auth serve` hacks.
---
## What ops-warden already proved
| Evidence | Location |
| --- | --- |
| Template registry + policy smoke | `history/2026-06-23-flex-auth-policy-gate-local-smoke.md` |
| Production inventory registry smoke | `history/2026-06-23-flex-auth-policy-gate-production-smoke.md` |
| Production registry artifact | `registry/flex-auth/production_registry_snapshot.json` |
| Registry generator | `scripts/build_flex_auth_registry.py` |
| Joint smoke runner | `scripts/policy_gate_production_smoke.sh` |
Production-registry allow smoke (real actor `agt-state-hub-bridge`):
- `policy_decision_id: decision:032b096c433ad80c`
- Deny: `ttl_out_of_bounds` with `fail_closed: true`
OpenBao-backed sign + policy gate is **not yet joint-verified** — scoped
`VAULT_TOKEN` returned HTTP 403 in this session (ops-warden operator task).
---
## Gaps flex-auth should pick up
### 1. Production runtime deployment (P0)
**Problem:** No reachable flex-auth endpoint from the operator workstation.
Probe from WSL: `flex-auth.flex-auth.svc.cluster.local:8080` does not resolve;
nothing is listening on `127.0.0.1:8080`. ops-warden cannot enable `policy.enabled`
with `fail_closed: true` until flex-auth is up.
**Suggestion for flex-auth:**
- Deploy `flex-auth serve` (or equivalent) to a **stable production URL**
reachable from machines that run `warden sign`.
- Document the canonical URL for `policy.flex_auth_url` (cluster DNS, tunnel,
or ingress — whichever matches NetKingdom operator access patterns).
- Expose **`GET /healthz`** (already in code) in runbooks; ops-warden operators
will use it as a pre-flight before enabling the gate.
**Acceptance:** Operator can `curl <flex_auth_url>/healthz` from the warden
workstation and get HTTP 200.
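The acceptance check above (`curl <flex_auth_url>/healthz` returns HTTP 200) can also be scripted as a pre-flight before anyone flips `policy.enabled`. The sketch uses only the Python standard library. `flex_auth_ready` is a hypothetical helper. The only flex-auth behaviour it assumes is the `/healthz` endpoint named in this section.

```python
import urllib.error
import urllib.request


def flex_auth_ready(flex_auth_url: str, timeout: float = 5.0) -> bool:
    """Pre-flight: True only if GET <flex_auth_url>/healthz answers HTTP 200."""
    url = flex_auth_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers DNS failures and refused connections; its subclass
        # HTTPError covers 4xx/5xx. OSError covers timeouts; ValueError a bad URL.
        return False
```

For example, `flex_auth_ready("http://flex-auth.flex-auth.svc.cluster.local:8080")` would return `False` from the WSL probe described above, because the name does not resolve.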
---
### 2. Load production registry, not only template fixtures (P0)
**Problem:** `examples/ops-warden/registry_snapshot.json` uses **template**
actors (`platform-steward`, `ci-deploy-agent`, `backup-automation`). Production
inventory uses **different names** (`agt-state-hub-bridge`, etc.). Signing with
`policy.enabled: true` denies unregistered actors (`unknown_actor_resource`).
**Suggestion for flex-auth:**
- Adopt ops-warden's production registry snapshot as the **initial production
load target**, or ingest equivalent manifests under `examples/ops-warden/`
generated from real inventory.
- Document operator steps:
```bash
# ops-warden (regenerate when inventory changes)
python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \
-o registry/flex-auth/production_registry_snapshot.json
# flex-auth (load into runtime)
flex-auth load-registry --file <path-to-production_registry_snapshot.json>
flex-auth serve --registry <snapshot> --policy examples/ops-warden/policy_package.md ...
```
- Add **fixture or integration tests** using production actor names
(`agt-state-hub-bridge`, `adm-example`, `atm-backup-daily`) so CI catches
registry drift.
**Acceptance:** `POST /v1/check` allows `agt-state-hub-bridge` / `sign` against
the deployed production registry without ops-warden-local registry patching.
---
### 3. Registry sync contract (P1)
**Problem:** ops-warden owns `inventory.yaml`; flex-auth owns authorization
registry. Today sync is manual: regenerate JSON, reload flex-auth.
**Suggestion for flex-auth:**
- Publish a short **sync contract** doc:
- **ops-warden owns:** actor names, types, principals, TTL defaults
- **flex-auth owns:** `allowed_subjects`, `max_ttl_hours`, relationships,
policy package
- **Trigger:** inventory add/change → regenerate snapshot → flex-auth reload
- Optional later: `flex-auth validate` target for ops-warden-generated snapshots;
or HTTP reload endpoint for registry updates without restart.
**Acceptance:** Documented two-repo workflow; no ambiguity on who updates what
when a new `agt-*` actor is added.
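The trigger "inventory add/change → regenerate snapshot → flex-auth reload" can be detected mechanically. The snapshot excerpt in this commit lists actor resources under `resource_manifests[].resources[]` with ids such as `ssh-cert:actor/adm-example`. The sketch compares those ids with inventory actor names. It assumes every registered actor has exactly such a resource. `snapshot_actors` and `registry_sync_gaps` are hypothetical names, not part of either repo.

```python
import json
from typing import Dict, Iterable, Set

ACTOR_PREFIX = "ssh-cert:actor/"


def snapshot_actors(snapshot: dict) -> Set[str]:
    """Collect actor names from 'ssh-cert:actor/<name>' resource ids."""
    actors: Set[str] = set()
    for manifest in snapshot.get("resource_manifests", []):
        for resource in manifest.get("resources", []):
            rid = resource.get("id", "")
            if rid.startswith(ACTOR_PREFIX):
                actors.add(rid[len(ACTOR_PREFIX):])
    return actors


def registry_sync_gaps(snapshot_json: str, inventory_actors: Iterable[str]) -> Dict[str, Set[str]]:
    """Report actors that call for a snapshot regenerate + flex-auth reload."""
    in_registry = snapshot_actors(json.loads(snapshot_json))
    in_inventory = set(inventory_actors)
    return {
        # These would be denied with unknown_actor_resource once the gate is on.
        "not_in_registry": in_inventory - in_registry,
        # Registered but no longer in inventory: stale authorization entries.
        "stale_in_registry": in_registry - in_inventory,
    }
```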
---
### 4. Joint production smoke with OpenBao (P1)
**Problem:** Policy gate smoke used `backend: local` or local flex-auth. Full
production path is `warden sign` → flex-auth → OpenBao SSH engine.
**Suggestion for flex-auth:**
- Coordinate one **joint smoke session** with ops-warden once:
- flex-auth deployed with production registry
- ops-warden `policy.enabled: true`, valid `VAULT_TOKEN`
- Allow: `warden sign agt-state-hub-bridge` → `signatures.log` has
`backend: vault` and `policy_decision_id`
- Deny: e.g. `--ttl` above max → flex-auth deny before OpenBao call
- Record non-secret evidence (decision ids, reasons, actor names only).
**Acceptance:** Shared history entry or flex-auth handoff update with vault-backed
evidence mirroring ops-warden's local smoke format.
---
### 5. IAM subject binding in production (P2)
**Problem:** Policy allows `subject.id` = actor name or `iam:<actor>`. Production
may set `WARDEN_POLICY_SUBJECT` from key-cape/IAM profile `sub`.
**Suggestion for flex-auth:**
- Confirm production registry `allowed_subjects` covers expected IAM subs for
each actor (or document that actor-name fallback is the production default
until IAM mapping is wired).
- Add one fixture for `WARDEN_POLICY_SUBJECT` / `iam:agt-state-hub-bridge` if
that path is intended in prod.
**Acceptance:** Documented subject-id strategy for SSH sign gate in production.
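The two subject forms the policy accepts (`<actor>` or `iam:<actor>`) and the `WARDEN_POLICY_SUBJECT` override can be sketched as follows. It **assumes** the actor-name fallback applies whenever the variable is unset or empty, which is one of the options this section asks flex-auth to confirm. The real policy evaluates `allowed_subjects` from the registry rather than this fixed rule. `policy_subject` and `subject_allowed` are hypothetical names.

```python
import os
from typing import Mapping, Optional


def policy_subject(actor: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Subject id to send: WARDEN_POLICY_SUBJECT if set, else the actor name."""
    env = os.environ if env is None else env
    return env.get("WARDEN_POLICY_SUBJECT") or actor


def subject_allowed(actor: str, subject: str) -> bool:
    """Accept only the two documented forms: '<actor>' or 'iam:<actor>'."""
    return subject in (actor, f"iam:{actor}")
```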
---
## Proposed flex-auth workplan (draft)
**Title:** `FLEX-WP-0007 — Ops-Warden Policy Gate Production Deployment`
**Priority:** P0
**Depends on:** `FLEX-WP-0006`, ops-warden `WARDEN-WP-0009` (finished)
| Task | Summary |
| --- | --- |
| T1 | Deploy flex-auth runtime; document production `flex_auth_url` + `/healthz` |
| T2 | Load production registry snapshot; verify allow/deny for real inventory actors |
| T3 | Publish registry sync contract with ops-warden (`inventory.yaml` → snapshot) |
| T4 | Joint OpenBao + policy gate smoke with ops-warden (non-secret evidence) |
| T5 | IAM subject binding notes / fixtures for `WARDEN_POLICY_SUBJECT` (if needed) |
---
## Ownership boundary (unchanged)
| Concern | Owner |
| --- | --- |
| Policy package + PDP decision | flex-auth |
| Actor inventory + TTL/principal defaults | ops-warden |
| SSH CA / OpenBao signing | ops-warden |
| Production registry **content** for SSH actors | joint — ops-warden generates from inventory; flex-auth hosts and evaluates |
| `policy.enabled` flip | ops-warden operator (after flex-auth reachable) |
---
## References
| Doc | Repo |
| --- | --- |
| `docs/ops-warden-policy-gate-handoff.md` | flex-auth |
| `workplans/FLEX-WP-0006-ops-warden-ssh-signing-policy-gate.md` | flex-auth |
| `wiki/PolicyGatedSigning.md` | ops-warden |
| `workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md` | ops-warden |
| `registry/flex-auth/production_registry_snapshot.json` | ops-warden |


@@ -0,0 +1,127 @@
# INTENT ↔ SCOPE Gap Analysis — Post WP-0009 / WP-0011
**Date:** 2026-06-24
**Author:** codex
**Trigger:** WARDEN-WP-0009 archived; WP-0010/0011 done; policy gate + routing shipped.
**Prior assessments:** `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`,
`history/2026-06-18-access-routing-intent-shift-assessment.md`
---
## 1. Executive summary
ops-warden is a **production-capable SSH CA** with **structured credential routing**
(`warden route`) and a **shipped, opt-in flex-auth policy gate** (registry + smoke
complete; production flip waits on flex-auth runtime deployment).
INTENT's SSH issuance mission is **met in production**. The largest remaining INTENT
gap is **ops-bridge consumer integration**: the `cert_command` contract exists but live
tunnels still use static keys. Secondary gaps are **operator hygiene**, **inventory ↔
infra principals alignment**, **routing playbook depth** (WP-0012), and **cross-repo
coordination** (flex-auth FLEX-WP-0007, net-kingdom NK-WP-0009).
**Vector movement:** `D5 / A4 / C4 / R3` → **`D5 / A4 / C4 / R3`** (unchanged levels;
policy-gate readiness improves C4 substance without changing the label until prod flip)
| Dimension | Was | Now | Notes |
| --- | --- | --- | --- |
| Discovery | D5 | D5 | Catalog + `warden route` + wiki |
| Availability | A4 | A4 | Routing CLI shipped (WP-0011) |
| Completeness | C4 | C4 | Policy registry smoke done; prod `policy.enabled` off |
| Reliability | R3 | R3 | OpenBao sign verified; cert_command not on live tunnels |
---
## 2. Deliverables since 2026-06-18
| Workplan | Deliverable | Status |
| --- | --- | --- |
| WP-0009 | flex-auth policy package confirmed; production registry + smoke | Archived |
| WP-0010 | Access routing charter + pointer catalog | Archived 2026-06-24 |
| WP-0011 | `warden route` CLI + catalog tests | Archived 2026-06-24 |
| WP-0013 | Production integration closeout (playbooks, drift, archive) | Finished 2026-06-24 |
| FLEX-WP-0006 | flex-auth policy package + handoff | flex-auth finished |
| FLEX-WP-0007 | flex-auth production deploy (draft) | flex-auth proposed |
---
## 3. INTENT success criteria
| # | Criterion | Status | Evidence / gap |
| --- | --- | --- | --- |
| 1 | Worker knows which subsystem for each credential type | **Met** | `warden route`, catalog, wikis |
| 2 | SSH access short-lived, inventoried, audited | **Met (prod)** | OpenBao sign + `signatures.log` |
| 3 | ops-bridge integrates via stable `cert_command` | **Partial** | Contract shipped; tunnels static-key |
| 4 | NetKingdom evolution reflected in docs | **Met** | NK cross-links, routing charter |
| 5 | Non-SSH secrets stay out of ops-warden | **Met** | Pointer layer only |
**Score: 4 met, 1 partial** — partial is ops-bridge production adoption.
---
## 4. INTENT mission pillars
| Pillar | Status | Gap |
| --- | --- | --- |
| 1. Know NetKingdom security model | Strong | — |
| 2. Route workers to correct subsystem | Strong | WP-0012 playbooks deepen scenarios |
| 3. Align runbooks with canon | Strong | Reassessment + archive hygiene due |
| 4. Issue short-lived SSH certs | **Production** | — |
| 5. Audit SSH signing | Strong | Policy `policy_decision_id` when gate on |
---
## 5. Remaining gaps (prioritized)
| Prio | Gap | Owner | ops-warden action | Track |
| --- | --- | --- | --- | --- |
| **P1** | ops-bridge `cert_command` on production tunnels | ops-bridge + ops-warden | Migration playbook + pilot evidence | **WARDEN-WP-0013** T3 |
| **P2** | Operator token hygiene (root → scoped `warden-sign`) | Operator + ops-warden | Runbook in wiki | **WARDEN-WP-0013** T4 |
| **P3** | Principals drift (inventory ↔ railiance-infra) | ops-warden + infra | Drift check doc/script | **WARDEN-WP-0013** T5 |
| **P4** | Routing scenario playbooks incomplete | ops-warden | Expand catalog + wiki playbooks | **WARDEN-WP-0012** (ready) |
| **P5** | flex-auth production runtime | flex-auth | Coordinate; operator flip checklist | **FLEX-WP-0007** + WP-0013 T6 |
| **P6** | Vault-backed policy gate joint smoke | flex-auth + operator | Run when `VAULT_TOKEN` valid | FLEX-WP-0007 T4 |
| **P7** | Archive hygiene (WP-0010, WP-0011) | ops-warden | Move to `workplans/archived/` | **WARDEN-WP-0013** T2 |
| **P8** | NK-WP-0009 joint SSH tutorial | net-kingdom | Coordinate only | Parallel |
| **P9** | Policy v2.1 identity claims for `adm` | ops-warden + flex-auth | Design only | Future |
---
## 6. Workplan recommendation
**WARDEN-WP-0013 — Production Integration & Stewardship Closeout** (new):
- T1: This reassessment + SCOPE refresh
- T2: Archive WP-0010 and WP-0011
- T3: ops-bridge `cert_command` migration playbook (pilot `agt-state-hub-bridge`)
- T4: Operator OpenBao token hygiene runbook
- T5: Principals inventory drift check
- T6: Policy gate production enablement checklist (coordinate FLEX-WP-0007)
**WARDEN-WP-0012 — Routing Scenario Playbooks** (promote `backlog` → `ready`):
- Dependencies WP-0010/0011 shipped; start when bandwidth allows
- Complements WP-0013 (routing depth vs SSH integration closeout)
**Out of scope for new ops-warden WPs:**
- flex-auth runtime deployment (FLEX-WP-0007)
- ops-bridge tunnel config changes (ops-bridge executes; ops-warden documents)
---
## 7. Maturity target (post WP-0013 + WP-0012)
| Dimension | Target | Unlock |
| --- | --- | --- |
| C4 → C4+ | cert_command pilot documented | WP-0013 T3 |
| R3 → R4 | Live tunnel uses warden-signed cert | ops-bridge + WP-0013 evidence |
| D5 | More active catalog playbooks | WP-0012 |
---
## See also
- `workplans/WARDEN-WP-0013-production-integration-and-stewardship-closeout.md`
- `workplans/WARDEN-WP-0012-routing-scenario-playbooks.md`
- `SCOPE.md`


@@ -0,0 +1,33 @@
# ops-bridge cert_command Pilot — Coordination Note
**Date:** 2026-06-24
**Workplan:** WARDEN-WP-0013 T3
**Playbook:** `wiki/playbooks/ops-bridge-tunnel-cert.md`
## Status
ops-warden shipped the migration playbook and upgraded catalog entry `ops-bridge-tunnel`.
The pilot tunnel for actor **`agt-state-hub-bridge`** is documented with its key paths and
`cert_command` string.
**Execution owner:** ops-bridge (tunnel config in `~/.config/bridge/tunnels.yaml`).
## Request to ops-bridge
Apply `cert_command` to the `state-hub-coulombcore` tunnel per the playbook migration
checklist. ops-warden will record smoke evidence in `history/` when the pilot completes
(non-secret: tunnel up/down, cert re-issue after TTL).
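For the "cert re-issue after TTL" evidence, one non-secret signal is the validity window that `ssh-keygen -L -f <cert>` prints for the tunnel certificate before and after expiry. The sketch below parses that line and compares two captures. It assumes the operator saves the `ssh-keygen -L` text output (which contains no private key material); `cert_validity` and `was_reissued` are hypothetical helpers, not part of ops-warden or ops-bridge.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# `ssh-keygen -L -f <cert>` prints a line such as:
#         Valid: from 2026-06-24T10:00:00 to 2026-06-25T10:00:00
# (or "Valid: forever" for certificates without an expiry).
_VALID_RE = re.compile(r"Valid: from (\S+) to (\S+)")


def cert_validity(keygen_output: str) -> Optional[Tuple[datetime, datetime]]:
    """Return (valid_after, valid_before), or None when no bounded window is printed."""
    match = _VALID_RE.search(keygen_output)
    if match is None:
        return None
    start, end = (datetime.fromisoformat(value) for value in match.groups())
    return start, end


def was_reissued(before_output: str, after_output: str) -> bool:
    """True when the later capture shows a certificate whose window starts later."""
    before = cert_validity(before_output)
    after = cert_validity(after_output)
    return before is not None and after is not None and after[0] > before[0]
```

Only timestamps are compared, so the recorded evidence stays non-secret.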
## Pre-requisites (operator)
- Scoped `VAULT_TOKEN` for production OpenBao sign (`wiki/playbooks/operator-openbao-token-hygiene.md`)
- `warden sign agt-state-hub-bridge` succeeds before tunnel config change
## Evidence pending
| Check | Status |
| --- | --- |
| Playbook on file | Done |
| Catalog `wiki_ref` | Done |
| ops-bridge tunnel config updated | Pending |
| `bridge up` smoke | Pending |


@@ -0,0 +1,450 @@
{
"systems": [
{
"id": "ops-warden",
"name": "Ops Warden",
"resource_types": [
{
"name": "ssh-certificate",
"scope_level": "Resource",
"planes": [
"Identity",
"Secret",
"Audit"
],
"metadata": {
"description": "Short-lived SSH certificate signing request."
}
}
],
"actions": [
{
"name": "sign",
"capabilities": [
"Use",
"Operate",
"Audit"
],
"planes": [
"Identity",
"Secret",
"Audit"
],
"exposure_modes": [
"Metadata"
],
"metadata": {
"required_context": [
"principals",
"actor_type",
"pubkey_fingerprint",
"ttl_hours"
]
}
}
],
"caring_profiles": [
"caring-0.4.0-rc2"
],
"metadata": {
"flex_auth_contract": "protected-system-v0",
"ops_warden_policy_gate": "v2",
"policy_enabled_config": "policy.enabled",
"tenant": "tenant:platform"
}
}
],
"resource_manifests": [
{
"id": "ops-warden-ssh-certificates",
"system": "ops-warden",
"resources": [
{
"id": "ssh-cert:actor/adm-example",
"type": "ssh-certificate",
"labels": [
"ssh-signing",
"adm"
],
"trust_zone": "platform",
"owner": "team:platform-security",
"attributes": {
"actor_id": "adm-example",
"actor_type": "adm",
"allowed_subjects": [
"adm-example",
"iam:adm-example"
],
"allowed_principals": [
"adm-full"
],
"max_ttl_hours": 48
}
},
{
"id": "ssh-cert:actor/agt-codex-interhub-bootstrap",
"type": "ssh-certificate",
"labels": [
"ssh-signing",
"agt"
],
"trust_zone": "platform",
"owner": "team:platform-security",
"attributes": {
"actor_id": "agt-codex-interhub-bootstrap",
"actor_type": "agt",
"allowed_subjects": [
"agt-codex-interhub-bootstrap",
"iam:agt-codex-interhub-bootstrap"
],
"allowed_principals": [
"agt-interhub-bootstrap"
],
"max_ttl_hours": 2
}
},
{
"id": "ssh-cert:actor/agt-state-hub-bridge",
"type": "ssh-certificate",
"labels": [
"ssh-signing",
"agt"
],
"trust_zone": "platform",
"owner": "team:platform-security",
"attributes": {
"actor_id": "agt-state-hub-bridge",
"actor_type": "agt",
"allowed_subjects": [
"agt-state-hub-bridge",
"iam:agt-state-hub-bridge"
],
"allowed_principals": [
"agt-task-bridge"
],
"max_ttl_hours": 24
}
},
{
"id": "ssh-cert:actor/atm-backup-daily",
"type": "ssh-certificate",
"labels": [
"ssh-signing",
"atm"
],
"trust_zone": "platform",
"owner": "team:platform-security",
"attributes": {
"actor_id": "atm-backup-daily",
"actor_type": "atm",
"allowed_subjects": [
"atm-backup-daily",
"iam:atm-backup-daily"
],
"allowed_principals": [
"atm-backup-daily"
],
"max_ttl_hours": 8
}
}
],
"actions": [
"sign"
],
"caring_profile": "caring-0.4.0-rc2",
"metadata": {
"flex_auth_contract": "resource-registration-v0",
"tenant": "tenant:platform"
}
}
],
"tenants": [
{
"id": "tenant:platform",
"name": "Platform Tenant"
}
],
"subjects": [
{
"id": "adm-example",
"type": "Agent",
"display_name": "Example human operator \u2014 replace with per-person adm-* actors",
"organization_relation": "ServiceProvider",
"roles": [
"Operator"
],
"groups": [
"group:ops-warden-admins"
],
"tenant": "tenant:platform",
"metadata": {
"actor_type": "adm"
}
},
{
"id": "agt-codex-interhub-bootstrap",
"type": "Agent",
"display_name": "Short-lived agent access for attended Inter-Hub bootstrap",
"organization_relation": "ServiceProvider",
"roles": [
"Operator"
],
"groups": [
"group:ops-warden-agents"
],
"tenant": "tenant:platform",
"metadata": {
"actor_type": "agt"
}
},
{
"id": "agt-state-hub-bridge",
"type": "Agent",
"display_name": "ops-bridge tunnel agent for state-hub",
"organization_relation": "ServiceProvider",
"roles": [
"Operator"
],
"groups": [
"group:ops-warden-agents"
],
"tenant": "tenant:platform",
"metadata": {
"actor_type": "agt"
}
},
{
"id": "atm-backup-daily",
"type": "Automation",
"display_name": "Example nightly automation actor",
"organization_relation": "ServiceProvider",
"roles": [
"Operator"
],
"groups": [
"group:ops-warden-automations"
],
"tenant": "tenant:platform",
"metadata": {
"actor_type": "atm"
}
}
],
"groups": [
{
"id": "group:ops-warden-admins",
"display_name": "Ops Warden Admins",
"members": [
"adm-example"
],
"tenant": "tenant:platform"
},
{
"id": "group:ops-warden-agents",
"display_name": "Ops Warden Agents",
"members": [
"agt-codex-interhub-bootstrap",
"agt-state-hub-bridge"
],
"tenant": "tenant:platform"
},
{
"id": "group:ops-warden-automations",
"display_name": "Ops Warden Automations",
"members": [
"atm-backup-daily"
],
"tenant": "tenant:platform"
}
],
"relationships": [
{
"id": "rel:adm-example-sign-adm-example",
"system": "ops-warden",
"subject": "group:ops-warden-admins",
"relation": "signer",
"object": "ssh-cert:actor/adm-example",
"tenant": "tenant:platform",
"conditions": [
"TimeLimited",
"Logged"
],
"caring": {
"id": "descriptor:ops-warden-adm-signer",
"profile": "caring-0.4.0-rc2",
"subject_type": "Group",
"organization_relation": "ServiceProvider",
"canonical_role": "Operator",
"scope": {
"level": "Resource",
"id": "ssh-cert:actor/adm-example",
"tenant": "tenant:platform",
"resource": "ssh-cert:actor/adm-example"
},
"planes": [
"Identity",
"Secret",
"Audit"
],
"capabilities": [
"Use",
"Operate",
"Audit"
],
"exposure_modes": [
"Metadata"
],
"conditions": [
"TimeLimited",
"Logged"
],
"restrictions": [
"PrivilegeEscalationBlocked",
"SecretAccessBlocked"
],
"access_path": "mediated"
}
},
{
"id": "rel:agt-codex-interhub-bootstrap-sign-agt-codex-interhub-bootstrap",
"system": "ops-warden",
"subject": "group:ops-warden-agents",
"relation": "signer",
"object": "ssh-cert:actor/agt-codex-interhub-bootstrap",
"tenant": "tenant:platform",
"conditions": [
"TimeLimited",
"Logged"
],
"caring": {
"id": "descriptor:ops-warden-agt-signer",
"profile": "caring-0.4.0-rc2",
"subject_type": "Group",
"organization_relation": "ServiceProvider",
"canonical_role": "Operator",
"scope": {
"level": "Resource",
"id": "ssh-cert:actor/agt-codex-interhub-bootstrap",
"tenant": "tenant:platform",
"resource": "ssh-cert:actor/agt-codex-interhub-bootstrap"
},
"planes": [
"Identity",
"Secret",
"Audit"
],
"capabilities": [
"Use",
"Operate",
"Audit"
],
"exposure_modes": [
"Metadata"
],
"conditions": [
"TimeLimited",
"Logged"
],
"restrictions": [
"PrivilegeEscalationBlocked",
"SecretAccessBlocked"
],
"access_path": "mediated"
}
},
{
"id": "rel:agt-state-hub-bridge-sign-agt-state-hub-bridge",
"system": "ops-warden",
"subject": "group:ops-warden-agents",
"relation": "signer",
"object": "ssh-cert:actor/agt-state-hub-bridge",
"tenant": "tenant:platform",
"conditions": [
"TimeLimited",
"Logged"
],
"caring": {
"id": "descriptor:ops-warden-agt-signer",
"profile": "caring-0.4.0-rc2",
"subject_type": "Group",
"organization_relation": "ServiceProvider",
"canonical_role": "Operator",
"scope": {
"level": "Resource",
"id": "ssh-cert:actor/agt-state-hub-bridge",
"tenant": "tenant:platform",
"resource": "ssh-cert:actor/agt-state-hub-bridge"
},
"planes": [
"Identity",
"Secret",
"Audit"
],
"capabilities": [
"Use",
"Operate",
"Audit"
],
"exposure_modes": [
"Metadata"
],
"conditions": [
"TimeLimited",
"Logged"
],
"restrictions": [
"PrivilegeEscalationBlocked",
"SecretAccessBlocked"
],
"access_path": "mediated"
}
},
{
"id": "rel:atm-backup-daily-sign-atm-backup-daily",
"system": "ops-warden",
"subject": "group:ops-warden-automations",
"relation": "signer",
"object": "ssh-cert:actor/atm-backup-daily",
"tenant": "tenant:platform",
"conditions": [
"TimeLimited",
"Logged"
],
"caring": {
"id": "descriptor:ops-warden-atm-signer",
"profile": "caring-0.4.0-rc2",
"subject_type": "Group",
"organization_relation": "ServiceProvider",
"canonical_role": "Operator",
"scope": {
"level": "Resource",
"id": "ssh-cert:actor/atm-backup-daily",
"tenant": "tenant:platform",
"resource": "ssh-cert:actor/atm-backup-daily"
},
"planes": [
"Identity",
"Secret",
"Audit"
],
"capabilities": [
"Use",
"Operate",
"Audit"
],
"exposure_modes": [
"Metadata"
],
"conditions": [
"TimeLimited",
"Logged"
],
"restrictions": [
"PrivilegeEscalationBlocked",
"SecretAccessBlocked"
],
"access_path": "mediated"
}
}
]
}


@@ -83,13 +83,13 @@ entries:
- id: ops-bridge-tunnel
title: SSH tunnel or port forward
need_keywords: [tunnel, port, forward, bridge, ops-bridge, reverse, transport, ssh-tunnel]
need_keywords: [tunnel, port, forward, bridge, ops-bridge, reverse, transport, ssh-tunnel, cert_command]
owner_repo: ops-bridge
subsystem: ops-bridge
warden_executes: false
wiki_ref: wiki/CredentialRouting.md#routing-table
wiki_ref: wiki/playbooks/ops-bridge-tunnel-cert.md#migration-checklist
canon_ref: net-kingdom/docs/platform-identity-security-architecture.md#operational-ssh-path
reviewed: "2026-06-18"
reviewed: "2026-06-24"
status: active
- id: railiance-infra-principals


@@ -0,0 +1,199 @@
#!/usr/bin/env python3
"""Build a flex-auth registry snapshot from ops-warden inventory.yaml.

Usage:
    python scripts/build_flex_auth_registry.py inventory.yaml -o registry/flex-auth/production_registry_snapshot.json
    flex-auth load-registry --file registry/flex-auth/production_registry_snapshot.json
"""

from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import Any

import yaml

# Inventory actor type -> flex-auth group that holds the "signer" relationship.
GROUP_BY_TYPE = {
    "adm": "group:ops-warden-admins",
    "agt": "group:ops-warden-agents",
    "atm": "group:ops-warden-automations",
}
# Inventory actor type -> flex-auth subject type.
SUBJECT_TYPE_BY_ACTOR = {
    "adm": "Agent",
    "agt": "Agent",
    "atm": "Automation",
}
# Inventory actor type -> caring descriptor id attached to the relationship.
DESCRIPTOR_BY_TYPE = {
    "adm": "descriptor:ops-warden-adm-signer",
    "agt": "descriptor:ops-warden-agt-signer",
    "atm": "descriptor:ops-warden-atm-signer",
}


def _caring_descriptor(actor_type: str, resource_id: str) -> dict[str, Any]:
    """Return the caring descriptor for one actor's signer relationship."""
    return {
        "id": DESCRIPTOR_BY_TYPE[actor_type],
        "profile": "caring-0.4.0-rc2",
        "subject_type": "Group",
        "organization_relation": "ServiceProvider",
        "canonical_role": "Operator",
        "scope": {
            "level": "Resource",
            "id": resource_id,
            "tenant": "tenant:platform",
            "resource": resource_id,
        },
        "planes": ["Identity", "Secret", "Audit"],
        "capabilities": ["Use", "Operate", "Audit"],
        "exposure_modes": ["Metadata"],
        "conditions": ["TimeLimited", "Logged"],
        "restrictions": ["PrivilegeEscalationBlocked", "SecretAccessBlocked"],
        "access_path": "mediated",
    }


def build_registry(inventory: dict[str, Any]) -> dict[str, Any]:
    """Map each inventory actor to a flex-auth resource, subject, group membership and relationship."""
    actors: dict[str, Any] = inventory.get("actors") or {}
    resources: list[dict[str, Any]] = []
    subjects: list[dict[str, Any]] = []
    groups: dict[str, list[str]] = {gid: [] for gid in GROUP_BY_TYPE.values()}
    relationships: list[dict[str, Any]] = []
    for name, entry in sorted(actors.items()):
        actor_type = str(entry["type"])
        principals = list(entry.get("principals") or [])
        ttl_hours = int(entry.get("ttl_hours") or 24)
        resource_id = f"ssh-cert:actor/{name}"
        group_id = GROUP_BY_TYPE[actor_type]
        resources.append(
            {
                "id": resource_id,
                "type": "ssh-certificate",
                "labels": ["ssh-signing", actor_type],
                "trust_zone": "platform",
                "owner": "team:platform-security",
                "attributes": {
                    "actor_id": name,
                    "actor_type": actor_type,
                    "allowed_subjects": [name, f"iam:{name}"],
                    "allowed_principals": principals,
                    "max_ttl_hours": ttl_hours,
                },
            }
        )
        subjects.append(
            {
                "id": name,
                "type": SUBJECT_TYPE_BY_ACTOR[actor_type],
                "display_name": entry.get("description") or name,
                "organization_relation": "ServiceProvider",
                "roles": ["Operator"],
                "groups": [group_id],
                "tenant": "tenant:platform",
                "metadata": {"actor_type": actor_type},
            }
        )
        groups[group_id].append(name)
        relationships.append(
            {
                "id": f"rel:{name}-sign-{name}",
                "system": "ops-warden",
                "subject": group_id,
                "relation": "signer",
                "object": resource_id,
                "tenant": "tenant:platform",
                "conditions": ["TimeLimited", "Logged"],
                "caring": _caring_descriptor(actor_type, resource_id),
            }
        )
    # Emit only groups that ended up with members.
    group_records = [
        {
            "id": gid,
            "display_name": gid.replace("group:", "").replace("-", " ").title(),
            "members": members,
            "tenant": "tenant:platform",
        }
        for gid, members in groups.items()
        if members
    ]
    return {
        "systems": [
            {
                "id": "ops-warden",
                "name": "Ops Warden",
                "resource_types": [
                    {
                        "name": "ssh-certificate",
                        "scope_level": "Resource",
                        "planes": ["Identity", "Secret", "Audit"],
                        "metadata": {
                            "description": "Short-lived SSH certificate signing request."
                        },
                    }
                ],
                "actions": [
                    {
                        "name": "sign",
                        "capabilities": ["Use", "Operate", "Audit"],
                        "planes": ["Identity", "Secret", "Audit"],
                        "exposure_modes": ["Metadata"],
                        "metadata": {
                            "required_context": [
                                "principals",
                                "actor_type",
                                "pubkey_fingerprint",
                                "ttl_hours",
                            ]
                        },
                    }
                ],
                "caring_profiles": ["caring-0.4.0-rc2"],
                "metadata": {
                    "flex_auth_contract": "protected-system-v0",
                    "ops_warden_policy_gate": "v2",
                    "policy_enabled_config": "policy.enabled",
                    "tenant": "tenant:platform",
                },
            }
        ],
        "resource_manifests": [
            {
                "id": "ops-warden-ssh-certificates",
                "system": "ops-warden",
                "resources": resources,
                "actions": ["sign"],
                "caring_profile": "caring-0.4.0-rc2",
                "metadata": {
                    "flex_auth_contract": "resource-registration-v0",
                    "tenant": "tenant:platform",
                },
            }
        ],
        "tenants": [{"id": "tenant:platform", "name": "Platform Tenant"}],
        "subjects": subjects,
        "groups": group_records,
        "relationships": relationships,
    }


def main() -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("inventory", type=Path, help="ops-warden inventory.yaml")
    parser.add_argument("-o", "--output", type=Path, required=True)
    args = parser.parse_args()
    inventory = yaml.safe_load(args.inventory.read_text()) or {}
    registry = build_registry(inventory)
    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(json.dumps(registry, indent=2) + "\n")
    print(f"Wrote {args.output} ({len(registry['subjects'])} actors)")


if __name__ == "__main__":
    main()


@@ -0,0 +1,103 @@
#!/usr/bin/env python3
"""Compare warden inventory host principals with railiance-infra ssh_principals.yaml.

Usage:
    python scripts/check_principals_drift.py \\
        --inventory ~/.config/warden/inventory.yaml \\
        --infra ~/railiance-infra/ansible/inventory/ssh_principals.yaml

Exit 0 when inventory hosts and infra agree (actor principals missing from
inventory hosts only warn); exit 1 on host/infra drift; exit 2 when an input
file is missing. No secrets printed.
"""

from __future__ import annotations

import argparse
import sys
from pathlib import Path
from typing import Any

import yaml


def _inventory_host_principals(inventory: dict[str, Any]) -> set[str]:
    """Principals that inventory hosts accept, across all actor types."""
    principals: set[str] = set()
    hosts = inventory.get("hosts") or {}
    for host_entry in hosts.values():
        allowed = host_entry.get("allowed_principals") or {}
        for principal_list in allowed.values():
            principals.update(principal_list)
    return principals


def _infra_principals(infra: dict[str, Any]) -> set[str]:
    """Principals deployed by railiance-infra, across all hosts and users."""
    principals: set[str] = set()
    for host_data in (infra.get("ssh_principals") or {}).values():
        for user_principals in (host_data.get("users") or {}).values():
            principals.update(user_principals)
    return principals


def _actor_principals(inventory: dict[str, Any]) -> set[str]:
    """Principals that inventory actors request in their certificates."""
    principals: set[str] = set()
    for entry in (inventory.get("actors") or {}).values():
        principals.update(entry.get("principals") or [])
    return principals


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--inventory",
        type=Path,
        default=Path.home() / ".config/warden/inventory.yaml",
    )
    parser.add_argument(
        "--infra",
        type=Path,
        default=Path.home() / "railiance-infra/ansible/inventory/ssh_principals.yaml",
    )
    args = parser.parse_args()
    if not args.inventory.exists():
        print(f"inventory not found: {args.inventory}", file=sys.stderr)
        return 2
    if not args.infra.exists():
        print(f"infra principals not found: {args.infra}", file=sys.stderr)
        return 2
    inventory = yaml.safe_load(args.inventory.read_text()) or {}
    infra = yaml.safe_load(args.infra.read_text()) or {}
    host_principals = _inventory_host_principals(inventory)
    infra_principals = _infra_principals(infra)
    actor_principals = _actor_principals(inventory)
    only_inventory = sorted(host_principals - infra_principals)
    only_infra = sorted(infra_principals - host_principals)
    actors_not_on_hosts = sorted(actor_principals - host_principals)
    # Only host/infra disagreement counts as drift. Actor principals missing from
    # inventory hosts are reported as a warning and do not fail the check.
    drift = bool(only_inventory or only_infra)
    print(f"inventory hosts principals ({len(host_principals)}): {', '.join(sorted(host_principals)) or '(none)'}")
    print(f"infra deployed principals ({len(infra_principals)}): {', '.join(sorted(infra_principals)) or '(none)'}")
    print(f"inventory actor principals ({len(actor_principals)}): {', '.join(sorted(actor_principals)) or '(none)'}")
    if only_inventory:
        print("\nDRIFT: in inventory hosts but not infra:", ", ".join(only_inventory))
    if only_infra:
        print("DRIFT: in infra but not inventory hosts:", ", ".join(only_infra))
    if actors_not_on_hosts:
        print("WARN: actor principals not listed under any inventory host:", ", ".join(actors_not_on_hosts))
    if not drift and not actors_not_on_hosts:
        print("\nOK — no host/infra principal drift")
        return 0
    if drift:
        print("\nRegenerate flex-auth registry after inventory changes:")
        print("  python scripts/build_flex_auth_registry.py <inventory> -o registry/flex-auth/production_registry_snapshot.json")
        return 1
    print("\nOK — host/infra aligned (actor/host warning only)")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@@ -0,0 +1,105 @@
#!/usr/bin/env bash
# Production policy-gate smoke for WARDEN-WP-0009 T02.
#
# Validates flex-auth registry (from inventory), allow/deny paths through
# warden sign, and optionally OpenBao-backed signing when VAULT_TOKEN works.
#
# Usage:
# ./scripts/policy_gate_production_smoke.sh
# INVENTORY=~/.config/warden/inventory.yaml ./scripts/policy_gate_production_smoke.sh
# SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh # also test backend: vault
set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
INVENTORY="${INVENTORY:-$HOME/.config/warden/inventory.yaml}"
REGISTRY="$ROOT/registry/flex-auth/production_registry_snapshot.json"
POLICY="${FLEX_AUTH_POLICY:-$HOME/flex-auth/examples/ops-warden/policy_package.md}"
FLEX_AUTH_BIN="${FLEX_AUTH_BIN:-/tmp/flex-auth}"
ADDR="${FLEX_AUTH_ADDR:-127.0.0.1:18090}"
PUBKEY="${PUBKEY:-$HOME/.ssh/agt-state-hub-bridge_ed25519.pub}"
ACTOR="${ACTOR:-agt-state-hub-bridge}"
SMOKE_DIR="$(mktemp -d /tmp/warden-prod-policy-smoke-XXXXXX)"
cleanup() {
  if [[ -n "${FA_PID:-}" ]] && kill -0 "$FA_PID" 2>/dev/null; then
    kill "$FA_PID" 2>/dev/null || true
    wait "$FA_PID" 2>/dev/null || true
  fi
}
trap cleanup EXIT
echo "==> Building registry from $INVENTORY"
uv run --directory "$ROOT" python scripts/build_flex_auth_registry.py \
"$INVENTORY" -o "$REGISTRY"
"$FLEX_AUTH_BIN" load-registry --file "$REGISTRY" >/dev/null
echo "==> Starting flex-auth on $ADDR"
"$FLEX_AUTH_BIN" serve \
--addr "$ADDR" \
--registry "$REGISTRY" \
--policy "$POLICY" \
--log "$SMOKE_DIR/flex-auth-decisions.jsonl" &
FA_PID=$!
sleep 0.6
ssh-keygen -t ed25519 -f "$SMOKE_DIR/ca_key" -N "" -q
cat >"$SMOKE_DIR/warden.yaml" <<EOF
backend: local
ca_key: $SMOKE_DIR/ca_key
state_dir: $SMOKE_DIR/state
inventory_path: $INVENTORY
policy:
  enabled: true
  flex_auth_url: http://$ADDR
  fail_closed: true
  tenant: tenant:platform
  system: ops-warden
EOF
export WARDEN_CONFIG="$SMOKE_DIR/warden.yaml"
echo "==> Allow path: warden sign $ACTOR"
uv run --directory "$ROOT" warden sign "$ACTOR" --pubkey "$PUBKEY" >/dev/null
ALLOW_LINE="$(tail -1 "$SMOKE_DIR/state/signatures.log")"
python3 -c "import json,sys; e=json.loads(sys.argv[1]); assert e.get('policy_decision_id'), e; print('policy_decision_id:', e['policy_decision_id'])" "$ALLOW_LINE"
echo "==> Deny path: ttl above max"
set +e
DENY_OUT="$(uv run --directory "$ROOT" warden sign "$ACTOR" --pubkey "$PUBKEY" --ttl 999 2>&1)"
DENY_RC=$?
set -e
if [[ "$DENY_RC" -ne 1 ]]; then
  echo "expected deny exit 1, got $DENY_RC" >&2
  exit 1
fi
if ! grep -q "ttl_out_of_bounds" <<<"$DENY_OUT"; then
  echo "expected flex-auth deny reason ttl_out_of_bounds in warden output" >&2
  exit 1
fi
if [[ "${SMOKE_VAULT:-0}" == "1" ]]; then
  echo "==> Vault-backed allow (requires scoped VAULT_TOKEN)"
  cat >"$SMOKE_DIR/warden-vault.yaml" <<EOF
backend: vault
vault:
  addr: https://bao.coulomb.social
  mount: ssh
  role_map:
    adm: adm-role
    agt: agt-role
    atm: atm-role
  token_env: VAULT_TOKEN
inventory_path: $INVENTORY
state_dir: $SMOKE_DIR/state-vault
policy:
  enabled: true
  flex_auth_url: http://$ADDR
  fail_closed: true
  tenant: tenant:platform
  system: ops-warden
EOF
  export WARDEN_CONFIG="$SMOKE_DIR/warden-vault.yaml"
  uv run --directory "$ROOT" warden sign "$ACTOR" --pubkey "$PUBKEY" >/dev/null
  VAULT_LINE="$(tail -1 "$SMOKE_DIR/state-vault/signatures.log")"
  python3 -c "import json,sys; e=json.loads(sys.argv[1]); assert e.get('backend')=='vault' and e.get('policy_decision_id'); print('vault policy_decision_id:', e['policy_decision_id'])" "$VAULT_LINE"
fi
echo "OK — production registry policy gate smoke passed"


@@ -0,0 +1,34 @@
"""Tests for scripts/build_flex_auth_registry.py."""
import json
import subprocess
import sys
from pathlib import Path

import yaml

ROOT = Path(__file__).resolve().parents[1]
SCRIPT = ROOT / "scripts" / "build_flex_auth_registry.py"
INVENTORY = ROOT / "examples" / "inventory.seed.yaml"


def test_build_registry_from_inventory_seed(tmp_path):
    out = tmp_path / "registry.json"
    subprocess.run(
        [sys.executable, str(SCRIPT), str(INVENTORY), "-o", str(out)],
        check=True,
        cwd=ROOT,
    )
    registry = json.loads(out.read_text())
    actors = yaml.safe_load(INVENTORY.read_text())["actors"]
    assert len(registry["subjects"]) == len(actors)
    assert len(registry["resource_manifests"][0]["resources"]) == len(actors)
    bridge = next(
        r
        for r in registry["resource_manifests"][0]["resources"]
        if r["id"] == "ssh-cert:actor/agt-state-hub-bridge"
    )
    assert bridge["attributes"]["actor_type"] == "agt"
    assert bridge["attributes"]["max_ttl_hours"] == 24
    assert "agt-task-bridge" in bridge["attributes"]["allowed_principals"]


@@ -0,0 +1,48 @@
"""Tests for scripts/check_principals_drift.py."""
import subprocess
import sys
from pathlib import Path

import yaml

ROOT = Path(__file__).resolve().parents[1]
SCRIPT = ROOT / "scripts" / "check_principals_drift.py"


def test_no_drift_when_aligned(tmp_path):
    inv = tmp_path / "inventory.yaml"
    infra = tmp_path / "ssh_principals.yaml"
    inv.write_text(yaml.dump({
        "actors": {"agt-test": {"type": "agt", "principals": ["agt-task-bridge"], "ttl_hours": 24}},
        "hosts": {"host1": {"allowed_principals": {"agt": ["agt-task-bridge"]}}},
    }))
    infra.write_text(yaml.dump({
        "ssh_principals": {"Host1": {"users": {"user1": ["agt-task-bridge"]}}},
    }))
    result = subprocess.run(
        [sys.executable, str(SCRIPT), "--inventory", str(inv), "--infra", str(infra)],
        cwd=ROOT,
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0
    assert "OK" in result.stdout


def test_drift_detected(tmp_path):
    inv = tmp_path / "inventory.yaml"
    infra = tmp_path / "ssh_principals.yaml"
    inv.write_text(yaml.dump({
        "hosts": {"host1": {"allowed_principals": {"agt": ["agt-missing"]}}},
    }))
    infra.write_text(yaml.dump({
        "ssh_principals": {"Host1": {"users": {"user1": ["agt-other"]}}},
    }))
    result = subprocess.run(
        [sys.executable, str(SCRIPT), "--inventory", str(inv), "--infra", str(infra)],
        cwd=ROOT,
        capture_output=True,
        text=True,
    )
    assert result.returncode == 1
    assert "DRIFT" in result.stdout


@@ -128,6 +128,9 @@ vault login
`VAULT_TOKEN`). OpenBao uses the same header; you do not need a separate
`BAO_TOKEN` unless you configure `token_env` that way.
See `wiki/playbooks/operator-openbao-token-hygiene.md` for scoped `warden-sign`
tokens, OIDC routing, and HTTP 403 recovery.
On failure, `warden sign` suggests falling back to `--backend local` only for
lab recovery — not as a production substitute.
@@ -272,4 +275,5 @@ tunnels:
`ops-bridge` runs `cert_command` before each SSH launch, captures stdout as the cert,
and passes it alongside the private key via `ssh -i <key> -i <cert>`.
See `wiki/CertCommandInterface.md` for the full contract.
See `wiki/CertCommandInterface.md` for the full contract and
`wiki/playbooks/ops-bridge-tunnel-cert.md` for static-key → cert_command migration.
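The `cert_command` flow described above (run the command, take stdout as the certificate, pass key and certificate as two `-i` flags) can be sketched in Python. This is an illustration, not ops-bridge's implementation: the function name, the `tunnel-cert.pub` file name, and splitting the configured `cert_command` string with `shlex` are assumptions. `wiki/CertCommandInterface.md` remains the contract.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def ssh_argv_with_cert(cert_command: str, key_path: str, destination: str, cert_dir: Path) -> List[str]:
    """Run cert_command, store its stdout as a certificate file, and build the ssh argv."""
    # check=True raises CalledProcessError if signing fails, so no stale cert is used.
    result = subprocess.run(shlex.split(cert_command), check=True, capture_output=True, text=True)
    cert = result.stdout.strip()
    if not cert:
        raise RuntimeError("cert_command printed no certificate on stdout")
    cert_path = cert_dir / "tunnel-cert.pub"  # hypothetical file name
    cert_path.write_text(cert + "\n")
    # Private key first, then the freshly issued certificate.
    return ["ssh", "-i", key_path, "-i", str(cert_path), destination]
```

Because the command runs before every launch, each reconnect after the TTL picks up a freshly signed certificate.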


@@ -1,7 +1,7 @@
# Policy-Gated SSH Signing
Date: 2026-06-17
Status: **implemented (opt-in)** — WARDEN-WP-0007
Date: 2026-06-23
Status: **implemented (opt-in)** — WARDEN-WP-0007; policy package confirmed FLEX-WP-0006
By default `warden sign` authorizes via **inventory allow-list** and TTL policy
only. When `policy.enabled: true` in `warden.yaml`, ops-warden calls flex-auth
@@ -104,12 +104,129 @@ defines **what the actor is allowed to request**.
---
## flex-auth policy package (FLEX-WP-0006)
flex-auth owns the `ssh-certificate` / `sign` policy package. ops-warden consumes
it via `POST /v1/check` when `policy.enabled: true`.
**Handoff (canonical):** `~/flex-auth/docs/ops-warden-policy-gate-handoff.md`
| Asset | flex-auth path |
| --- | --- |
| Policy package | `examples/ops-warden/policy_package.md` |
| Allow/deny fixtures | `examples/ops-warden/policy_fixtures.yaml` |
| Registry snapshot | `examples/ops-warden/registry_snapshot.json` |
| Subject manifest | `examples/ops-warden/subject_manifest.yaml` |
| Resource manifest | `examples/ops-warden/resource_manifest.yaml` |
### Tenant and subject bindings
| Field | Value |
| --- | --- |
| Tenant | `tenant:platform` (`policy.tenant`) |
| Resource system | `ops-warden` (`policy.system`) |
| Resource type | `ssh-certificate` |
| Action | `sign` |
| Resource id | `ssh-cert:actor/<actor-name>` |
| Actor type | Example flex-auth subject | ops-warden inventory name pattern |
| --- | --- | --- |
| `adm` | `platform-steward` | `adm-*` |
| `agt` | `ci-deploy-agent` | `agt-*` |
| `atm` | `backup-automation` | `atm-*` |
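The naming convention in the table can be checked mechanically before a registry is rebuilt. A minimal sketch, assuming inventory names follow the `adm-*` / `agt-*` / `atm-*` prefixes. The group ids mirror `GROUP_BY_TYPE` in `scripts/build_flex_auth_registry.py`, while `check_actor` and `resource_id` are hypothetical helpers, not part of ops-warden.

```python
from typing import List

# Mirrors GROUP_BY_TYPE in scripts/build_flex_auth_registry.py.
GROUP_BY_TYPE = {
    "adm": "group:ops-warden-admins",
    "agt": "group:ops-warden-agents",
    "atm": "group:ops-warden-automations",
}


def resource_id(actor_name: str) -> str:
    """flex-auth resource id for an inventory actor."""
    return f"ssh-cert:actor/{actor_name}"


def check_actor(actor_name: str, declared_type: str) -> List[str]:
    """Return naming problems for one inventory actor (empty list when consistent)."""
    problems = []
    if declared_type not in GROUP_BY_TYPE:
        problems.append(f"{actor_name}: unknown actor type {declared_type!r}")
    prefix = actor_name.split("-", 1)[0]
    if prefix != declared_type:
        problems.append(f"{actor_name}: name prefix {prefix!r} does not match type {declared_type!r}")
    return problems
```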
**Subject id sent to flex-auth:** `WARDEN_POLICY_SUBJECT` when set, otherwise the
inventory actor name. flex-auth may also allow `iam:<actor-name>` when listed in
`allowed_subjects` on the resource.
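The subject selection and matching rule above, as a small sketch. Treating an empty `WARDEN_POLICY_SUBJECT` as unset is an assumption, and both functions are hypothetical illustrations of the documented behavior, not ops-warden or flex-auth code.

```python
import os
from typing import List, Mapping


def policy_subject(actor_name: str, env: Mapping[str, str] = os.environ) -> str:
    """WARDEN_POLICY_SUBJECT when set and non-empty, otherwise the inventory actor name."""
    return env.get("WARDEN_POLICY_SUBJECT") or actor_name


def subject_allowed(subject: str, allowed_subjects: List[str]) -> bool:
    """True when the subject appears in the resource's allowed_subjects list."""
    return subject in allowed_subjects
```

With the registry built by `scripts/build_flex_auth_registry.py`, `allowed_subjects` holds both `<actor-name>` and `iam:<actor-name>`, so either form of subject matches.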
**Principals and TTL:** Taken from the sign request (inventory defaults). flex-auth
denies when principals are empty/disallowed or TTL exceeds `max_ttl_hours` on the
registered resource.
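To make the two deny conditions concrete, here is a sketch that applies them to a registered resource's `attributes`. It illustrates the rules and is not flex-auth's policy engine. The check order is an assumption, and so are all reason strings except `ttl_out_of_bounds`, which is the reason the production smoke script expects.

```python
from typing import Any, Mapping, Sequence, Tuple


def check_principals_and_ttl(
    principals: Sequence[str], ttl_hours: int, attributes: Mapping[str, Any]
) -> Tuple[bool, str]:
    """Return (allowed, reason) for the principal and TTL rules on one resource."""
    allowed = set(attributes.get("allowed_principals") or [])
    if not principals:
        return False, "principals_empty"  # hypothetical reason string
    if not set(principals) <= allowed:
        return False, "principal_not_allowed"  # hypothetical reason string
    if ttl_hours > int(attributes["max_ttl_hours"]):
        return False, "ttl_out_of_bounds"
    return True, "allow"  # hypothetical reason string
```

For example, `agt-state-hub-bridge` is registered with `allowed_principals: [agt-task-bridge]` and `max_ttl_hours: 24`, so a 999-hour request is denied on TTL.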
### Fixture coverage (flex-auth)
Allow: `fixture:ops-warden-adm-sign-allow`, `fixture:ops-warden-agt-sign-allow`,
`fixture:ops-warden-atm-sign-allow`.
Deny: `fixture:ops-warden-unknown-subject-deny`,
`fixture:ops-warden-actor-type-mismatch-deny`, `fixture:ops-warden-ttl-above-max-deny`,
`fixture:ops-warden-disallowed-principal-deny`,
`fixture:ops-warden-missing-fingerprint-deny`.
### Local smoke
```bash
# flex-auth (from ~/flex-auth)
flex-auth serve --addr 127.0.0.1:8080 \
--registry examples/ops-warden/registry_snapshot.json \
--policy examples/ops-warden/policy_package.md \
--log /tmp/flex-auth-ops-warden-decisions.jsonl
# warden.yaml — policy.enabled: true, flex_auth_url pointing at flex-auth
# Use an actor registered in the flex-auth registry (example fixtures use
# template names; production needs a registry slice for real inventory actors).
```
Local end-to-end evidence: `history/2026-06-23-flex-auth-policy-gate-local-smoke.md`.
### Production registry from inventory
Build a flex-auth registry snapshot that mirrors `inventory.yaml` actors:
```bash
python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \
-o registry/flex-auth/production_registry_snapshot.json
flex-auth load-registry --file registry/flex-auth/production_registry_snapshot.json
```
Re-run after adding or changing actors. Deploy the snapshot to the production
flex-auth runtime together with `~/flex-auth/examples/ops-warden/policy_package.md`.
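Before deploying, you can confirm that a snapshot still mirrors the inventory, for example after loading both with `yaml.safe_load` and `json.load`. The sketch compares only the fields the build script derives from the inventory; `registry_drift` is a hypothetical helper, not part of ops-warden.

```python
from typing import Any, Dict, List


def registry_drift(inventory: Dict[str, Any], registry: Dict[str, Any]) -> List[str]:
    """List differences between inventory actors and a built flex-auth snapshot."""
    actors = inventory.get("actors") or {}
    subject_ids = {s["id"] for s in registry.get("subjects", [])}
    attrs_by_actor = {
        r["attributes"]["actor_id"]: r["attributes"]
        for manifest in registry.get("resource_manifests", [])
        for r in manifest.get("resources", [])
    }
    problems = [f"actor not in registry: {n}" for n in sorted(set(actors) - subject_ids)]
    problems += [f"registry subject not in inventory: {n}" for n in sorted(subject_ids - set(actors))]
    for name, entry in sorted(actors.items()):
        attrs = attrs_by_actor.get(name)
        if attrs is None:
            problems.append(f"no resource for actor: {name}")
            continue
        if sorted(entry.get("principals") or []) != sorted(attrs["allowed_principals"]):
            problems.append(f"principals differ for {name}")
        # Same 24h default that build_flex_auth_registry.py applies.
        if int(entry.get("ttl_hours") or 24) != attrs["max_ttl_hours"]:
            problems.append(f"max_ttl_hours differs for {name}")
    return problems
```

An empty list means the snapshot is current. Anything else means the build script should be re-run.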
Smoke (non-secret):
```bash
./scripts/policy_gate_production_smoke.sh
# OpenBao-backed when VAULT_TOKEN is valid:
SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh
```
Evidence: `history/2026-06-23-flex-auth-policy-gate-production-smoke.md`.
---
## Production rollout
1. Deploy flex-auth policies for resource type `ssh-certificate`.
2. Enable `policy.enabled: true` in production `warden.yaml`.
3. Keep `fail_closed: true` unless an explicit break-glass procedure exists.
4. Verify `signatures.log` entries include `policy_decision_id`.
**Keep `policy.enabled: false` until flex-auth is reachable** at `policy.flex_auth_url`.
With `fail_closed: true`, an unreachable flex-auth blocks all signs.
### Operator checklist
| Step | Owner | Action |
| --- | --- | --- |
| 1 | flex-auth | Deploy runtime; confirm `curl <flex_auth_url>/healthz` → 200 (**FLEX-WP-0007**) |
| 2 | flex-auth | Load production registry + policy package (`~/flex-auth/examples/ops-warden/`) |
| 3 | ops-warden | Regenerate registry from inventory: `scripts/build_flex_auth_registry.py` |
| 4 | ops-warden | Local smoke: `./scripts/policy_gate_production_smoke.sh` |
| 5 | operator | Vault smoke: `SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh` (valid `VAULT_TOKEN`) |
| 6 | operator | Set `policy.flex_auth_url` in `~/.config/warden/warden.yaml` |
| 7 | operator | Set `policy.enabled: true`; keep `fail_closed: true` |
| 8 | operator | Allow smoke: `warden sign <actor>` → `signatures.log` has `policy_decision_id` |
| 9 | operator | Deny smoke: e.g. `--ttl` above max — CLI shows flex-auth `reason`, no cert |
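Step 8 reads the last JSON line of `signatures.log`, as `scripts/policy_gate_production_smoke.sh` does inline. Here is the same check as a reusable function. The log lives under `state_dir` from `warden.yaml`, and the function name is hypothetical.

```python
import json
from pathlib import Path


def last_policy_decision_id(log_path: Path) -> str:
    """Return policy_decision_id from the newest signatures.log entry, or raise ValueError."""
    lines = [line for line in log_path.read_text().splitlines() if line.strip()]
    if not lines:
        raise ValueError("signatures.log is empty")
    entry = json.loads(lines[-1])
    decision_id = entry.get("policy_decision_id")
    if not decision_id:
        raise ValueError("last signature entry has no policy_decision_id")
    return decision_id
```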
Cross-repo references:
- `~/flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md`
- `history/2026-06-23-flex-auth-production-pickup-suggestion.md`
- `history/2026-06-23-flex-auth-policy-gate-production-smoke.md`
### Summary
1. Deploy the flex-auth registry and policy package to the production flex-auth
runtime — **not** only the example fixtures.
2. Set `policy.flex_auth_url` to the production flex-auth base URL.
3. Enable `policy.enabled: true` only after operator checklist steps 1-5 pass.
4. Keep `fail_closed: true` unless an explicit break-glass procedure exists.
5. Smoke allow and deny paths; preserve non-secret evidence only.
---
@@ -117,5 +234,6 @@ defines **what the actor is allowed to request**.
- `wiki/OpsWardenConfig.md` — full config reference
- `wiki/CredentialRouting.md`
- `~/flex-auth/docs/ops-warden-policy-gate-handoff.md` — flex-auth handoff
- `flex-auth/INTENT.md`
- `net-kingdom/docs/platform-identity-security-architecture.md`


@@ -0,0 +1,105 @@
# Operator OpenBao Token Hygiene
Date: 2026-06-24
Workplan: WARDEN-WP-0013 T4
Daily `warden sign` against production OpenBao requires a **scoped** API token in
`VAULT_TOKEN` — not the cluster root token.
---
## Rules
| Rule | Rationale |
| --- | --- |
| Never commit `VAULT_TOKEN` | Tokens are secrets |
| Never paste tokens in chat, State Hub, or workplans | Same |
| Do not use root token for daily `warden sign` | Break-glass only |
| Prefer short-lived tokens | Limit blast radius |
| Refresh on HTTP 403 | Token expired or policy mismatch |
---
## Scoped token for warden
Production signing needs permission to call the SSH engine sign endpoint for the
roles mapped in `warden.yaml` (`adm-role`, `agt-role`, `atm-role`).
Illustrative policy shape (create in OpenBao policy admin — adjust names to match
your cluster):
```hcl
# warden-sign — least privilege for ops-warden CLI
path "ssh/sign/agt-role" {
capabilities = ["create", "update"]
}
path "ssh/sign/adm-role" {
capabilities = ["create", "update"]
}
path "ssh/sign/atm-role" {
capabilities = ["create", "update"]
}
```
Issue a token bound to `warden-sign` (operator procedure in `railiance-platform` /
OpenBao admin runbooks).
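The policy paths follow `<mount>/sign/<role>` for each role in the `warden.yaml` role map. Generating the policy text from that same mapping keeps the two in step when a role is added. A sketch: `warden_sign_policy` is hypothetical, and its output still goes through your OpenBao policy admin procedure.

```python
from typing import Mapping


def warden_sign_policy(role_map: Mapping[str, str], mount: str = "ssh") -> str:
    """Render a least-privilege HCL policy granting sign on each mapped SSH role."""
    blocks = [
        f'path "{mount}/sign/{role_map[actor_type]}" {{\n'
        '  capabilities = ["create", "update"]\n'
        "}"
        for actor_type in sorted(role_map)
    ]
    return "# warden-sign: least privilege for ops-warden CLI\n" + "\n".join(blocks) + "\n"
```

Called with `{"adm": "adm-role", "agt": "agt-role", "atm": "atm-role"}`, it yields the same three paths as the illustrative policy above, in `adm`, `agt`, `atm` order.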
---
## Session pattern
```bash
# Set for current shell only — do not add to ~/.bashrc with a literal token
export VAULT_TOKEN="<scoped-token>"
warden status agt-state-hub-bridge
warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub
```
`warden` reads the env var named in `vault.token_env` (default `VAULT_TOKEN`).
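The following sketch shows how that lookup works. It mirrors the behavior described above and the `Vault token not found` symptom in the troubleshooting table. It is not ops-warden's code; `resolve_vault_token` and `TokenNotFound` are hypothetical names.

```python
import os
from collections.abc import Mapping
from typing import Any, Optional


class TokenNotFound(RuntimeError):
    """Raised when the configured token variable is unset or empty."""


def resolve_vault_token(
    config: Mapping[str, Any], environ: Optional[Mapping[str, str]] = None
) -> str:
    """Return the token from the env var named in vault.token_env (default VAULT_TOKEN)."""
    vault_cfg = config.get("vault") or {}
    env_name = vault_cfg.get("token_env") or "VAULT_TOKEN"
    env = os.environ if environ is None else environ
    token = env.get(env_name, "").strip()
    if not token:
        # Name the variable, never the value, so the message is safe to share.
        raise TokenNotFound(f"Vault token not found: export {env_name} with a scoped token")
    return token
```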
---
## OIDC / interactive login
For human operators, prefer platform OIDC login that yields a short-lived OpenBao
token instead of copying long-lived secrets.
| Need | Route to |
| --- | --- |
| Interactive login, OIDC, MFA | key-cape / Keycloak — `warden route show key-cape-oidc-login` |
ops-warden does not implement login; it documents the route only.
---
## Troubleshooting
| Symptom | Likely cause | Action |
| --- | --- | --- |
| `Vault token not found` | `VAULT_TOKEN` unset | Export scoped token |
| `HTTP 403` / `permission denied` | Expired token or insufficient policy | Re-issue `warden-sign` token |
| `Signing failed` + connection error | Wrong `vault.addr` or network | Check `warden.yaml`, tunnel/VPN |
| Error suggests `--backend local` | OpenBao unreachable | Fix connectivity; the `local` backend is lab-only |
After fixing token issues, re-run:
```bash
warden sign <actor> --pubkey <path>
```
---
## Root token (break-glass only)
Cluster root tokens bypass all policy. Use one only for one-time engine setup
(`wiki/OpenBaoSshEngineChecklist.md` § One-time SSH engine setup), then unset it
and keep it out of any daily shell profile.
---
## See also
- `wiki/OpenBaoSshEngineChecklist.md`
- `wiki/OpsWardenConfig.md` — Authentication section
- `examples/warden.production.example.yaml`

@@ -0,0 +1,121 @@
# ops-bridge Tunnel — cert_command Migration
Date: 2026-06-24
Workplan: WARDEN-WP-0013 T3
Catalog: `ops-bridge-tunnel`
Migrate an ops-bridge tunnel from **static SSH keys** to **short-lived warden-signed
certificates** via the `cert_command` contract (`wiki/CertCommandInterface.md`).
ops-warden documents the migration; **ops-bridge** owns tunnel config changes.
---
## Prerequisites
- [ ] Actor registered in `~/.config/warden/inventory.yaml` (see `wiki/ActorInventoryPatterns.md`)
- [ ] Actor keypair on disk (`ssh_key` private, `.pub` for signing)
- [ ] Production `warden.yaml` with `backend: vault` and valid scoped `VAULT_TOKEN`
- [ ] Host trusts warden/OpenBao CA (`railiance-infra` `bootstrap-ssh-ca`)
- [ ] Host principals configuration allows the actor's principals (`railiance-infra` `ssh_principals.yaml`)
---
## Pilot tunnel: `agt-state-hub-bridge`
| Field | Value |
| --- | --- |
| Actor | `agt-state-hub-bridge` |
| Type | `agt` |
| Principals | `agt-task-bridge` |
| TTL | 24 h |
| Private key | `~/.ssh/agt-state-hub-bridge_ed25519` |
| Public key | `~/.ssh/agt-state-hub-bridge_ed25519.pub` |
| cert_command | `warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub` |
### Pre-migration smoke (operator workstation)
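```bash
export VAULT_TOKEN="<scoped-warden-sign-token>" # never commit or paste in chat
warden status agt-state-hub-bridge
# pipefail makes the pipeline report warden's exit status instead of head's
( set -o pipefail
  warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub | head -1 )
echo "exit status: $?"
```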
Confirm that the exit status is 0 and that the certificate line starts with `ssh-ed25519-cert-v01@openssh.com`.
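You can also script the same smoke so that both the exit status and the certificate type are checked explicitly. This is a minimal sketch that uses only the `warden sign` invocation shown above. `smoke_sign` and `looks_like_ed25519_cert` are hypothetical helpers, and the check inspects only the key-type field of the first stdout line.

```python
import subprocess
from pathlib import Path

CERT_TYPE = "ssh-ed25519-cert-v01@openssh.com"


def looks_like_ed25519_cert(line: str) -> bool:
    """True if the line's key-type field marks an Ed25519 OpenSSH certificate."""
    fields = line.split()
    return len(fields) >= 2 and fields[0] == CERT_TYPE


def smoke_sign(actor: str, pubkey: Path) -> str:
    """Run `warden sign`, check exit status and cert type, return the first stdout line."""
    result = subprocess.run(
        ["warden", "sign", actor, "--pubkey", str(pubkey)],
        capture_output=True,
        text=True,
        check=False,  # inspect the return code ourselves to produce a clearer error
    )
    if result.returncode != 0:
        raise RuntimeError(f"warden sign exited {result.returncode}: {result.stderr.strip()}")
    lines = result.stdout.splitlines()
    first = lines[0] if lines else ""
    if not looks_like_ed25519_cert(first):
        raise RuntimeError(f"first stdout line is not a {CERT_TYPE} certificate")
    return first
```

For the pilot, call `smoke_sign("agt-state-hub-bridge", Path.home() / ".ssh" / "agt-state-hub-bridge_ed25519.pub")` with `VAULT_TOKEN` already exported in the same shell.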
---
## Migration checklist
### 1. Inventory and signing path
- [ ] Actor exists: `warden inventory list` shows `agt-state-hub-bridge`
- [ ] `warden sign` succeeds with production OpenBao backend
- [ ] `signatures.log` records the sign (`~/.local/state/warden/signatures.log`)
### 2. ops-bridge tunnel config
Edit `~/.config/bridge/tunnels.yaml` (the ops-bridge repo owns the schema; illustrative example below):
```yaml
tunnels:
state-hub-coulombcore:
host: coulombcore
remote_port: 8001
local_port: 8000
ssh_user: agt-state-hub-bridge
ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
actor: agt-state-hub-bridge
cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"
```
- [ ] `cert_command` uses the **public** key path (warden reads pubkey, writes cert to stdout)
- [ ] `ssh_user` matches the certificate identity / host expectation
- [ ] Remove or disable static-key-only fallback once cert path is verified
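The first two checklist items can be checked mechanically against a tunnel entry. This is a minimal sketch. It assumes the entry has already been parsed from `tunnels.yaml` into a dict with the keys shown in the example above; ops-bridge owns the real schema, and `check_tunnel_entry` is hypothetical. Treating `ssh_user != actor` as a warning is an assumption based on the pilot, where the two are equal.

```python
import shlex
from typing import Any

REQUIRED_KEYS = ("ssh_user", "ssh_key", "actor", "cert_command")


def check_tunnel_entry(name: str, entry: dict[str, Any]) -> list[str]:
    """Return checklist problems for one tunnels.yaml entry (empty list means OK)."""
    problems = [f"{name}: missing {key}" for key in REQUIRED_KEYS if not entry.get(key)]
    if problems:
        return problems
    actor, ssh_key = entry["actor"], entry["ssh_key"]
    argv = shlex.split(entry["cert_command"])  # keeps "~" literal, matching ssh_key as written
    if argv[:3] != ["warden", "sign", actor]:
        problems.append(f"{name}: cert_command should start with 'warden sign {actor}'")
    # The checklist requires the public key path here, never the private key.
    pubkeys = [argv[i + 1] for i, arg in enumerate(argv[:-1]) if arg == "--pubkey"]
    if pubkeys != [ssh_key + ".pub"]:
        problems.append(f"{name}: --pubkey should be {ssh_key}.pub")
    if entry["ssh_user"] != actor:
        problems.append(f"{name}: ssh_user differs from actor; confirm the host expects it")
    return problems
```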
### 3. Host-side verification
- [ ] Principal `agt-task-bridge` present in `railiance-infra` `ssh_principals.yaml` for target host
- [ ] Run `scripts/check_principals_drift.py` if the inventory `hosts` section documents allowed principals
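A drift check comes down to a per-host set difference. This is a minimal sketch. It assumes both sources have already been loaded into `{host: [principal, ...]}` mappings; the real inventory `hosts` and `ssh_principals.yaml` layouts may differ. This is not the code in `scripts/check_principals_drift.py`, and `principals_drift` is hypothetical. The output contains only host and principal names, in line with the no-secrets rule.

```python
from collections.abc import Iterable, Mapping


def principals_drift(
    inventory_hosts: Mapping[str, Iterable[str]],
    deployed: Mapping[str, Iterable[str]],
) -> dict[str, dict[str, list[str]]]:
    """Per host, report principals present on only one side; hosts with no drift are omitted."""
    drift: dict[str, dict[str, list[str]]] = {}
    for host in sorted(set(inventory_hosts) | set(deployed)):
        want = set(inventory_hosts.get(host, ()))
        have = set(deployed.get(host, ()))
        if want != have:
            drift[host] = {
                "missing_on_host": sorted(want - have),  # documented but not deployed
                "not_in_inventory": sorted(have - want),  # deployed but not documented
            }
    return drift
```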
### 4. Tunnel smoke
```bash
# ops-bridge (from ops-bridge repo)
bridge status state-hub-coulombcore
bridge up state-hub-coulombcore
```
- [ ] Tunnel establishes without a static cert file on disk
- [ ] After the cert TTL expires, re-run `bridge up` and confirm `cert_command` re-issues the certificate automatically
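To time the re-run, read the certificate's validity window with `ssh-keygen -L -f <cert>`, for example on a certificate captured from a manual `warden sign` run. This is a minimal sketch that parses the `Valid: from ... to ...` line OpenSSH prints. `cert_valid_before` and `needs_reissue` are hypothetical helpers, and this page does not document whether ops-bridge checks expiry this way.

```python
import re
from datetime import datetime
from typing import Optional

# Matches the validity line of `ssh-keygen -L`, e.g. "Valid: from 2026-06-24T12:00:00 to ..."
_VALID_RE = re.compile(r"^\s*Valid: from (\S+) to (\S+)\s*$")


def cert_valid_before(keygen_listing: str) -> Optional[datetime]:
    """Return the end of the validity window, or None when it never expires."""
    for line in keygen_listing.splitlines():
        if line.strip() == "Valid: forever":
            return None
        match = _VALID_RE.match(line)
        if match:
            end = match.group(2)
            return None if end == "forever" else datetime.fromisoformat(end)
    raise ValueError("no 'Valid: from ... to ...' line in ssh-keygen -L output")


def needs_reissue(keygen_listing: str, now: Optional[datetime] = None) -> bool:
    """True once the certificate has expired, so the next `bridge up` needs a fresh cert."""
    valid_before = cert_valid_before(keygen_listing)
    if valid_before is None:
        return False
    # ssh-keygen prints naive local times, so compare against naive local "now".
    current = now if now is not None else datetime.now()
    return current >= valid_before
```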
### 5. Policy gate (optional, after FLEX-WP-0007)
When `policy.enabled: true`, confirm `signatures.log` includes `policy_decision_id`
on tunnel-driven signs. See `wiki/PolicyGatedSigning.md`.
---
## Rollback
Keep the static key path until cert_command smoke passes. To roll back:
1. Remove `cert_command` from tunnel config
2. Restore prior static-key or `CertificateFile` workflow
3. Document rollback in ops-bridge session notes (not in git secrets)
---
## Static-key tunnels (legacy)
Tunnels using `agt-claude-*` or other long-lived keys are **out of scope** for this
pilot. Migrate them one tunnel at a time as the ops-bridge owner prioritizes them.
---
## See also
- `wiki/CertCommandInterface.md`
- `wiki/OpsWardenConfig.md` — cert_command example
- `wiki/playbooks/operator-openbao-token-hygiene.md`
- `warden route show ops-bridge-tunnel --json`

@@ -1,65 +0,0 @@
---
id: WARDEN-WP-0009
type: workplan
title: "flex-auth Policy Gate Production Readiness"
domain: infotech
repo: ops-warden
status: blocked
owner: codex
topic_slug: custodian
planning_priority: low
planning_order: 9
created: "2026-06-18"
updated: "2026-06-18"
state_hub_workstream_id: "9213b262-e2f5-480e-a5bc-56635d5eb4c9"
---
# WARDEN-WP-0009 — flex-auth Policy Gate Production Readiness
**Scope:** Enable and verify the opt-in flex-auth pre-sign gate (`policy.enabled`)
in production after flex-auth publishes `ssh-certificate` resource policies.
**Out of scope:** flex-auth policy package authoring (flex-auth owner); OpenBao SSH
engine and host CA (complete — NET-WP-0020 T5 / WP-0008 T2).
**Spun out from:** WARDEN-WP-0008 T5 (2026-06-18 closeout).
---
## Tasks
### T1 — flex-auth policy package confirmation
```task
id: WARDEN-WP-0009-T01
status: wait
priority: medium
state_hub_task_id: "f988ed2e-0f63-4e89-abc4-183a7f23ddc2"
```
- [ ] Confirm flex-auth policies for resource type `ssh-certificate` exist
- [ ] Document tenant/subject bindings for `adm` / `agt` / `atm` sign paths
- [ ] Coordinate with flex-auth owner on deny/allow test fixtures
**Blocked until:** flex-auth publishes ssh-certificate policies.
### T2 — Production enablement and smoke
```task
id: WARDEN-WP-0009-T02
status: wait
priority: medium
state_hub_task_id: "9d0fabc2-10ef-426d-a3d2-d4970d377029"
```
- [ ] Document operator steps to set `policy.enabled: true` (see `wiki/PolicyGatedSigning.md`)
- [ ] Smoke test allow path — `signatures.log` includes `policy_decision_id`
- [ ] Smoke test deny path with `fail_closed: true` (non-secret evidence)
---
## See also
- `wiki/PolicyGatedSigning.md` — gate flow and config (shipped WP-0007)
- `examples/warden.production.example.yaml` (`policy.enabled: false` by default)
- `history/2026-06-17-openbao-production-verify.md` — production sign evidence

@@ -4,13 +4,13 @@ type: workplan
title: "Routing Scenario Playbooks"
domain: infotech
repo: ops-warden
status: backlog
status: ready
owner: codex
topic_slug: custodian
planning_priority: medium
planning_order: 12
created: "2026-06-18"
updated: "2026-06-18"
updated: "2026-06-24"
state_hub_workstream_id: "a7e712a0-02f8-4f83-944e-6b207e77bc4c"
---
@@ -27,7 +27,7 @@ owner's procedure inside the catalog.
**Depends on:** WARDEN-WP-0010 (charter + catalog schema), WARDEN-WP-0011 (routing CLI).
**Status:** `backlog` — start after WP-0010 T3 and WP-0011 T2 ship.
**Status:** `ready`: WP-0010 and WP-0011 have shipped; runs in parallel with the WP-0013 integration closeout.
---

@@ -0,0 +1,95 @@
---
id: WARDEN-WP-0009
type: workplan
title: "flex-auth Policy Gate Production Readiness"
domain: infotech
repo: ops-warden
status: archived
owner: codex
topic_slug: custodian
planning_priority: low
planning_order: 9
created: "2026-06-18"
updated: "2026-06-23"
state_hub_workstream_id: "9213b262-e2f5-480e-a5bc-56635d5eb4c9"
---
# WARDEN-WP-0009 — flex-auth Policy Gate Production Readiness
**Scope:** Enable and verify the opt-in flex-auth pre-sign gate (`policy.enabled`)
in production after flex-auth publishes `ssh-certificate` resource policies.
**Out of scope:** flex-auth policy package authoring (flex-auth owner — delivered
FLEX-WP-0006 2026-06-23); OpenBao SSH engine and host CA (complete — NET-WP-0020
T5 / WP-0008 T2); in-cluster flex-auth deployment (continued in flex-auth
`FLEX-WP-0007`).
**Spun out from:** WARDEN-WP-0008 T5 (2026-06-18 closeout).
---
## Tasks
### T1 — flex-auth policy package confirmation
```task
id: WARDEN-WP-0009-T01
status: done
priority: medium
state_hub_task_id: "f988ed2e-0f63-4e89-abc4-183a7f23ddc2"
```
- [x] Confirm flex-auth policies for resource type `ssh-certificate` exist
- [x] Document tenant/subject bindings for `adm` / `agt` / `atm` sign paths
- [x] Coordinate with flex-auth owner on deny/allow test fixtures
### T2 — Production enablement and smoke
```task
id: WARDEN-WP-0009-T02
status: done
priority: medium
state_hub_task_id: "9d0fabc2-10ef-426d-a3d2-d4970d377029"
```
- [x] Document operator steps to set `policy.enabled: true` (see `wiki/PolicyGatedSigning.md`)
- [x] Local smoke — allow/deny paths with `policy_decision_id` / `ttl_out_of_bounds`
- [x] Production registry slice from inventory (`registry/flex-auth/production_registry_snapshot.json`)
- [x] Production registry smoke — allow `agt-state-hub-bridge` (`decision:032b096c433ad80c`)
- [x] Production registry smoke — deny `--ttl 999` (`ttl_out_of_bounds`)
---
## Deliverables
| Artifact | Path |
| --- | --- |
| Registry builder | `scripts/build_flex_auth_registry.py` |
| Production registry | `registry/flex-auth/production_registry_snapshot.json` |
| Smoke runner | `scripts/policy_gate_production_smoke.sh` |
| Local smoke evidence | `history/2026-06-23-flex-auth-policy-gate-local-smoke.md` |
| Production smoke evidence | `history/2026-06-23-flex-auth-policy-gate-production-smoke.md` |
| flex-auth pickup brief | `history/2026-06-23-flex-auth-production-pickup-suggestion.md` |
---
## Closeout (2026-06-23)
T1 and T2 are complete. ops-warden caller side and production-registry smoke verified.
Production `policy.enabled: true` flip deferred until flex-auth runtime is
reachable — tracked in flex-auth `FLEX-WP-0007`, not this workplan.
**Operator follow-up (FLEX-WP-0007):**
- Deploy registry + policy package to in-cluster flex-auth; set `policy.flex_auth_url`
- Refresh scoped `VAULT_TOKEN` and run `SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh`
- Set `policy.enabled: true` in `~/.config/warden/warden.yaml` when flex-auth is reachable
---
## See also
- `wiki/PolicyGatedSigning.md`
- `~/flex-auth/docs/ops-warden-policy-gate-handoff.md`
- `~/flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md`
- `examples/warden.production.example.yaml`

@@ -4,13 +4,13 @@ type: workplan
title: "Access Routing — Charter and Pointer Catalog"
domain: infotech
repo: ops-warden
status: done
status: archived
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 10
created: "2026-06-18"
updated: "2026-06-18"
updated: "2026-06-24"
state_hub_workstream_id: "e93de9fd-0192-4d02-bb7c-5e859fb76b9b"
---
@@ -169,3 +169,8 @@ state_hub_task_id: "3335a689-922c-4319-98d0-4263ab13790b"
- `history/2026-06-18-access-routing-intent-shift-assessment.md` — decision record
- `WARDEN-WP-0011` — routing CLI
- `WARDEN-WP-0012` — scenario playbook expansion (backlog)
---
## Closeout (2026-06-24)
Archived during WARDEN-WP-0013 T2. All tasks complete.

@@ -4,13 +4,13 @@ type: workplan
title: "Routing Lookup CLI"
domain: infotech
repo: ops-warden
status: done
status: archived
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 11
created: "2026-06-18"
updated: "2026-06-18"
updated: "2026-06-24"
state_hub_workstream_id: "0a520f8e-01b4-48f1-9af3-2f3f69fd0672"
---
@@ -154,3 +154,8 @@ state_hub_task_id: "bf848375-eca7-4116-bb1d-fb7df6395c70"
- `WARDEN-WP-0010` — charter and catalog schema
- `WARDEN-WP-0012` — expanded per-scenario playbooks
- `history/2026-06-17-intent-scope-assessment.md` — prior `warden guide` proposal (P4)
---
## Closeout (2026-06-24)
Archived during WARDEN-WP-0013 T2. All tasks complete.

@@ -0,0 +1,202 @@
---
id: WARDEN-WP-0013
type: workplan
title: "Production Integration & Stewardship Closeout"
domain: infotech
repo: ops-warden
status: archived
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 13
depends_on_workplans:
- WARDEN-WP-0008
- WARDEN-WP-0009
- WARDEN-WP-0010
- WARDEN-WP-0011
related_workplans:
- WARDEN-WP-0012
- FLEX-WP-0007
created: "2026-06-24"
updated: "2026-06-24"
state_hub_workstream_id: "4678c41a-c1d0-48cd-9988-4ea0380e8258"
---
# WARDEN-WP-0013 — Production Integration & Stewardship Closeout
## Purpose
Close the remaining **ops-warden-owned** gaps after policy gate and routing shipped:
refresh INTENT/SCOPE canon, archive finished workplans, document ops-bridge
`cert_command` migration, operator OpenBao token hygiene, principals drift checks,
and the policy-gate production flip checklist.
This workplan addresses the deferred **Production SSH Integration Closeout** strand
from `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` §6, updated for
post-WP-0009 state.
**Gap analysis:** `history/2026-06-24-intent-scope-gap-analysis.md`
## Scope
- Post-WP-0009 reassessment and SCOPE alignment
- Archive hygiene for WP-0010 and WP-0011
- ops-bridge `cert_command` migration documentation (pilot `agt-state-hub-bridge`)
- Operator runbook for scoped OpenBao tokens (no root in `VAULT_TOKEN`)
- Principals drift check between warden inventory and railiance-infra
- Policy gate production enablement checklist (coordinate FLEX-WP-0007)
## Out of scope
- flex-auth runtime deployment (flex-auth **FLEX-WP-0007**)
- ops-bridge tunnel config changes in the ops-bridge repo (coordinate only)
- Routing scenario playbook expansion (**WARDEN-WP-0012** — parallel track)
- OpenBao cluster deploy, flex-auth policy authoring, NK-WP-0009 tutorial
- Implementing secret vending or foreign API proxies
## Ownership boundary
| Concern | Owner |
| --- | --- |
| cert_command migration playbook | ops-warden (doc); ops-bridge (tunnel config) |
| OpenBao token hygiene runbook | ops-warden (doc); operator (execution) |
| Principals drift | ops-warden (check doc/script); railiance-infra (host deploy) |
| `policy.enabled: true` flip | operator (after FLEX-WP-0007) |
---
## T1 — Post-gap reassessment and SCOPE refresh
```task
id: WARDEN-WP-0013-T01
status: done
priority: high
state_hub_task_id: "de46f9a2-bf11-4651-a23c-430c63f396c8"
```
- [x] Write `history/2026-06-24-intent-scope-gap-analysis.md`
- [x] Update `SCOPE.md` active workplan table (WP-0013, WP-0012 ready)
- [x] Note maturity vector and partial INTENT criterion (ops-bridge) in SCOPE
**Acceptance:** Gap analysis on file; SCOPE reflects 2026-06-24 repo state.
---
## T2 — Archive hygiene (WP-0010, WP-0011)
```task
id: WARDEN-WP-0013-T02
status: done
priority: medium
state_hub_task_id: "1b35321d-63ad-40da-a1aa-0b66190a0733"
```
- [x] Move `WARDEN-WP-0010-access-routing-charter.md` to
`workplans/archived/260624-WARDEN-WP-0010-access-routing-charter.md`
- [x] Move `WARDEN-WP-0011-routing-guide-cli.md` to
`workplans/archived/260624-WARDEN-WP-0011-routing-guide-cli.md`
- [x] Set frontmatter `status: archived` on both; add closeout notes
- [x] Operator runs `make fix-consistency REPO=ops-warden` from `~/state-hub`
**Acceptance:** Only WP-0012 (ready) and WP-0013 (active when started) remain in
`workplans/` root; hub synced.
---
## T3 — ops-bridge cert_command migration playbook
```task
id: WARDEN-WP-0013-T03
status: done
priority: high
state_hub_task_id: "ad8588b2-9ae9-4f94-bd77-8025851a38f5"
```
- [x] Write `wiki/playbooks/ops-bridge-tunnel-cert.md` — static-key → `cert_command`
migration checklist for tunnel configs
- [x] Document pilot tunnel `agt-state-hub-bridge`: actor, pubkey path, cert_command
string, inventory prerequisites
- [x] Upgrade catalog entry `ops-bridge-tunnel` `wiki_ref` to the new playbook
- [x] Coordinate with ops-bridge owner for pilot tunnel config change (State Hub message)
- [ ] Record non-secret smoke evidence when pilot completes (`history/` entry — pending ops-bridge)
**Acceptance:** Playbook exists; catalog points at it; pilot steps documented even
if ops-bridge execution is pending.
**Unlocks:** INTENT success criterion #3 moves from partial toward met.
---
## T4 — Operator OpenBao token hygiene runbook
```task
id: WARDEN-WP-0013-T04
status: done
priority: medium
state_hub_task_id: "5cb35829-32eb-4d59-97a1-f4d92ce8e239"
```
- [x] Add `wiki/playbooks/operator-openbao-token-hygiene.md` covering scoped tokens,
`VAULT_TOKEN` session pattern, OIDC route, HTTP 403 recovery
- [x] Cross-link from `wiki/OpsWardenConfig.md` and production example yaml
**Acceptance:** Operator can follow runbook without asking ops-warden for token values.
---
## T5 — Principals inventory drift check
```task
id: WARDEN-WP-0013-T05
status: done
priority: medium
state_hub_task_id: "4025cd32-89f8-42c3-b1e8-eaf78497d91f"
```
- [x] `scripts/check_principals_drift.py` compares inventory `hosts` vs
`railiance-infra/ansible/inventory/ssh_principals.yaml`
- [x] Script notes flex-auth registry regeneration via `build_flex_auth_registry.py`
- [x] Tests in `tests/test_principals_drift.py`
**Acceptance:** Drift check runnable or documented; no secret material in script output.
---
## T6 — Policy gate production enablement checklist
```task
id: WARDEN-WP-0013-T06
status: done
priority: medium
state_hub_task_id: "51663f65-79cb-4108-87c8-9721f9476259"
```
- [x] Operator checklist in `wiki/PolicyGatedSigning.md` § Production rollout
- [x] Cross-link FLEX-WP-0007 and pickup brief
- [x] Explicit: keep `policy.enabled: false` until flex-auth reachable
**Acceptance:** Operator checklist is sequential and references cross-repo owners;
no ops-warden code changes required for flex-auth deploy.
---
## Exit criteria
- Gap analysis and SCOPE current
- WP-0010 and WP-0011 archived
- ops-bridge cert_command playbook + catalog upgrade
- Operator token hygiene runbook
- Principals drift procedure
- Policy gate production flip checklist (coordinate FLEX-WP-0007)
## Parallel track
**WARDEN-WP-0012** (routing scenario playbooks) — promoted to `ready`; start when
P1 integration doc bandwidth allows or in parallel if staffed.
## See also
- `history/2026-06-24-intent-scope-gap-analysis.md`
- `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`
- `wiki/CertCommandInterface.md`
- `~/flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md`