From 90007c2cdad5759faa3938f99f4d183e66943d6f Mon Sep 17 00:00:00 2001 From: tegwick Date: Wed, 24 Jun 2026 12:44:32 +0200 Subject: [PATCH] feat: close WP-0009/WP-0013 production integration stewardship strand Ship flex-auth policy gate registry and smoke evidence, archive WP-0009 through WP-0013, and add integration docs: ops-bridge cert_command migration playbook, operator OpenBao token hygiene, principals drift check script, and 2026-06-24 INTENT/SCOPE gap analysis. --- SCOPE.md | 96 ++-- examples/warden.production.example.yaml | 6 +- ...06-23-flex-auth-policy-gate-local-smoke.md | 70 +++ ...-flex-auth-policy-gate-production-smoke.md | 99 ++++ ...-flex-auth-production-pickup-suggestion.md | 189 ++++++++ .../2026-06-24-intent-scope-gap-analysis.md | 127 +++++ ...-bridge-cert-command-pilot-coordination.md | 33 ++ .../production_registry_snapshot.json | 450 ++++++++++++++++++ registry/routing/catalog.yaml | 6 +- scripts/build_flex_auth_registry.py | 199 ++++++++ scripts/check_principals_drift.py | 103 ++++ scripts/policy_gate_production_smoke.sh | 105 ++++ tests/test_flex_auth_registry.py | 34 ++ tests/test_principals_drift.py | 48 ++ wiki/OpsWardenConfig.md | 6 +- wiki/PolicyGatedSigning.md | 130 ++++- .../operator-openbao-token-hygiene.md | 105 ++++ wiki/playbooks/ops-bridge-tunnel-cert.md | 121 +++++ ...P-0009-flex-auth-policy-gate-production.md | 65 --- ...RDEN-WP-0012-routing-scenario-playbooks.md | 6 +- ...P-0009-flex-auth-policy-gate-production.md | 95 ++++ ...-WARDEN-WP-0010-access-routing-charter.md} | 9 +- ...60624-WARDEN-WP-0011-routing-guide-cli.md} | 9 +- ...on-integration-and-stewardship-closeout.md | 202 ++++++++ 24 files changed, 2192 insertions(+), 121 deletions(-) create mode 100644 history/2026-06-23-flex-auth-policy-gate-local-smoke.md create mode 100644 history/2026-06-23-flex-auth-policy-gate-production-smoke.md create mode 100644 history/2026-06-23-flex-auth-production-pickup-suggestion.md create mode 100644 
history/2026-06-24-intent-scope-gap-analysis.md create mode 100644 history/2026-06-24-ops-bridge-cert-command-pilot-coordination.md create mode 100644 registry/flex-auth/production_registry_snapshot.json create mode 100644 scripts/build_flex_auth_registry.py create mode 100644 scripts/check_principals_drift.py create mode 100755 scripts/policy_gate_production_smoke.sh create mode 100644 tests/test_flex_auth_registry.py create mode 100644 tests/test_principals_drift.py create mode 100644 wiki/playbooks/operator-openbao-token-hygiene.md create mode 100644 wiki/playbooks/ops-bridge-tunnel-cert.md delete mode 100644 workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md create mode 100644 workplans/archived/260623-WARDEN-WP-0009-flex-auth-policy-gate-production.md rename workplans/{WARDEN-WP-0010-access-routing-charter.md => archived/260624-WARDEN-WP-0010-access-routing-charter.md} (98%) rename workplans/{WARDEN-WP-0011-routing-guide-cli.md => archived/260624-WARDEN-WP-0011-routing-guide-cli.md} (97%) create mode 100644 workplans/archived/260624-WARDEN-WP-0013-production-integration-and-stewardship-closeout.md diff --git a/SCOPE.md b/SCOPE.md index 1b55349..3b723b7 100644 --- a/SCOPE.md +++ b/SCOPE.md @@ -15,21 +15,26 @@ aligned with NetKingdom canon. --- -## Where we are (2026-06-18) +## Where we are (2026-06-24) ops-warden **issues short-lived SSH certificates and routes every other credential need to the subsystem that owns it.** SSH signing is **production-verified** on Railiance OpenBao (`warden sign` against `https://bao.coulomb.social`, host CA trust -deployed). The routing material — `wiki/AccessRouting.md`, the credential routing -wiki, NetKingdom security map, a machine-readable pointer catalog -(`registry/routing/catalog.yaml`, WARDEN-WP-0010), and the `warden route` -lookup CLI over it (`list`/`show`/`find`, WARDEN-WP-0011) — is operational. 
The opt-in -flex-auth pre-sign gate is **coded but off in production** until flex-auth publishes -`ssh-certificate` policies (WARDEN-WP-0009). +deployed). + +**Access routing** is shipped: `wiki/AccessRouting.md`, credential routing wiki, +NetKingdom security map, machine-readable pointer catalog +(`registry/routing/catalog.yaml`, WP-0010), and `warden route` lookup CLI +(`list`/`show`/`find`, `--json`, WP-0011). + +**Policy gate** is shipped on the caller side (WP-0007) with production registry +and smoke evidence (WP-0009 archived). flex-auth published the `ssh-certificate` +policy package (FLEX-WP-0006). `policy.enabled` remains **false** in production +until flex-auth is deployed to a reachable URL (flex-auth FLEX-WP-0007). **INTENT alignment:** SSH issuance mission met in production. Remaining distance -is integration breadth (ops-bridge `cert_command` on live tunnels), authorization -depth (flex-auth), and operator hygiene — not missing signing code. +is integration breadth (ops-bridge `cert_command` on live tunnels), flex-auth +runtime deployment (not ops-warden code), and operator hygiene. ### Issue vs route @@ -47,7 +52,9 @@ ops-warden executes exactly one lane and points at the owner for the rest. Full role and boundary: `wiki/AccessRouting.md`. The catalog is a **pointer layer** — it never restates an owner's procedure (authored `steps` exist only for the SSH lane). -Full gap analysis: `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` +Gap analysis: `history/2026-06-24-intent-scope-gap-analysis.md` (current); +`history/2026-06-18-post-wp0008-intent-scope-reassessment.md` (SSH lane); +`history/2026-06-18-access-routing-intent-shift-assessment.md` (routing charter). 
--- @@ -66,8 +73,8 @@ Full gap analysis: `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` | Dimension | Level | Meaning today | | --- | --- | --- | | D5 | Discovery | Routing wiki + security map + pointer catalog + NK canon cross-links | -| A4 | Availability | CLI + opt-in policy gate + `warden route` lookup over the machine-readable catalog (`list`/`show`/`find`, `--json` for agents) | -| C4 | Completeness | SSH lane prod-verified; flex-auth policies external | +| A4 | Availability | CLI + `warden route` + opt-in policy gate + agent `--json` lookup | +| C4 | Completeness | SSH lane prod-verified; policy gate + registry smoke shipped; prod flip waits flex-auth deploy | | R3 | Reliability | Live OpenBao sign evidence on Railiance | --- @@ -75,9 +82,9 @@ Full gap analysis: `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` ## Core Idea **Today:** implements the SSH certificate lane from `wiki/AccessManagementDirective.md` -§§1–5 — CA signing, actor inventory, TTL policy, cert-side scorecard, and the -`cert_command` interface for ops-bridge. Production path uses OpenBao SSH engine -(`backend: vault`). +§§1–5 — CA signing, actor inventory, TTL policy, cert-side scorecard, optional +flex-auth pre-sign gate, and the `cert_command` interface for ops-bridge. Production +path uses OpenBao SSH engine (`backend: vault`). **Direction (INTENT):** issue short-lived SSH certificates and route dev workers to key-cape, flex-auth, OpenBao, ops-bridge, and railiance components for everything @@ -96,6 +103,10 @@ for the rest. 
- `cert_command`: `warden sign --pubkey ` → cert on stdout - TTL enforcement per `ActorType` (`adm` 48 h, `agt` 24 h, `atm` 8 h) - `warden status`, cleanup, scorecard, signatures log +- Opt-in flex-auth policy gate (`policy.enabled`, `policy_decision_id` in log) +- Production flex-auth registry builder (`scripts/build_flex_auth_registry.py`, + `registry/flex-auth/production_registry_snapshot.json`) +- Policy gate smoke runner (`scripts/policy_gate_production_smoke.sh`) - `warden route` lookup CLI (`list`/`show`/`find`, `--json`) over the pointer catalog - `warden issue` and `ops-ssh-wrapper` (local backend; vault uses sign-only) - Runbooks for OpenBao config and Inter-Hub bootstrap SSH envelope @@ -105,38 +116,38 @@ for the rest. - NetKingdom security routing guidance — which subsystem owns which credential type - Wiki and config references aligned with OpenBao-first platform standard - Capability registry entry for SSH certificate issuance +- Routing pointer catalog (`registry/routing/catalog.yaml`) - Keeping ops access patterns consistent with `net-kingdom` platform architecture -### Shipped workplans +### Shipped workplans (archived) | WP | Focus | | --- | --- | +| WP-0001–0005 | Initial CLI, quality, hygiene, OpenBao docs, hub sync | | WP-0006 | Credential routing, security map, inventory patterns, OpenBao checklist | | WP-0007 | Opt-in flex-auth policy gate (`policy.enabled`) | | WP-0008 | Production sign verification, stewardship closeout, archive hygiene | -| WP-0010 | "Issue SSH, route the rest" wording + `wiki/AccessRouting.md` + pointer catalog | -| WP-0011 | `warden route` lookup CLI (`list`/`show`/`find`) over the pointer catalog (A3 → A4) | +| WP-0009 | flex-auth registry + policy smoke; pickup brief for FLEX-WP-0007 | +| WP-0010 | Access routing charter + pointer catalog | +| WP-0011 | `warden route` lookup CLI | +| WP-0013 | Production integration closeout — cert_command playbook, token hygiene, principals drift | -### Active / wait +### Active / 
ready | WP | Status | Focus | | --- | --- | --- | -| **WP-0009** | `blocked` | flex-auth `ssh-certificate` policies + `policy.enabled` production smoke | -| **WP-0012** | `backlog` | Routing scenario playbooks (draft until owner paths ship) | +| **WP-0012** | `ready` | Routing scenario playbooks (catalog + wiki expansion) | -### Known gaps (not yet workplanned) +### Known gaps (not ops-warden workplans) | Gap | Owner | Notes | | --- | --- | --- | -| ops-bridge `cert_command` on live tunnels | ops-bridge | Tunnels use `agt-claude-*` static keys today | -| Operator token hygiene | Operator | Prefer OIDC + `warden-sign`; retire root from shell profile | -| Principals sync warden ↔ railiance-infra | ops-warden + infra | `inventory.yaml` hosts vs `ssh_principals.yaml` | +| flex-auth production runtime + registry deploy | flex-auth | **FLEX-WP-0007** — unblocks `policy.enabled: true` | +| Vault-backed policy gate joint smoke | flex-auth + operator | Needs valid scoped `VAULT_TOKEN` | +| ops-bridge `cert_command` on live tunnels | ops-bridge | Playbook shipped (`wiki/playbooks/ops-bridge-tunnel-cert.md`); pilot pending | +| Principals sync warden ↔ railiance-infra | ops-warden + infra | `scripts/check_principals_drift.py` — operator runs periodically | | NK-WP-0009 joint SSH tutorial | net-kingdom | Parallel coordination track | -The integration-closeout strand (ops-bridge tunnel migration, token runbook) from -reassessment §6 is not yet workplanned; WARDEN-WP-0010 was used for the access-routing -charter instead. Open a new WP when tunnel migration becomes priority. - --- ## Out of Scope @@ -145,6 +156,7 @@ charter instead. Open a new WP when tunnel migration becomes priority. 
with flex-auth policy where required; ops-warden documents paths only - Identity / OIDC / MFA → key-cape, Keycloak - Authorization policy decisions → flex-auth +- flex-auth runtime deployment → flex-auth (`FLEX-WP-0007`) - Tunnel lifecycle → `ops-bridge` - Host principal deployment → `railiance-infra` - OpenBao / Vault cluster deployment → `railiance-platform` @@ -157,10 +169,12 @@ charter instead. Open a new WP when tunnel migration becomes priority. - Issuing or refreshing an **SSH cert** for `adm`/`agt`/`atm` - A dev worker needs to know **where to get credentials** in the NetKingdom stack +- An agent needs **`warden route find`** instead of re-deriving routing from wiki prose - `ops-bridge` needs a `cert_command` for a tunnel -- Adding actors to the principals inventory +- Adding actors to the principals inventory (regenerate flex-auth registry snapshot) - Inter-Hub or bootstrap tasks need a **short-lived agent SSH envelope** - Checking cert-side compliance (scorecard) +- Enabling or testing the opt-in flex-auth policy gate --- @@ -177,9 +191,12 @@ charter instead. Open a new WP when tunnel migration becomes priority. - **SSH CLI:** v0.1.0 — local + OpenBao backends - **Production sign:** verified 2026-06-18 (`history/2026-06-17-openbao-production-verify.md`) -- **Policy gate:** shipped, `policy.enabled: false` in prod until WP-0009 -- **Active workplan:** WP-0009 (wait — flex-auth) -- **Latest assessment:** `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` +- **Access routing:** WP-0010 + WP-0011 shipped (`warden route`, pointer catalog) +- **Policy gate:** caller shipped (WP-0007); registry + smoke complete (WP-0009 archived). 
+ `policy.enabled: false` until flex-auth reachable (`FLEX-WP-0007`) +- **Ready work:** WP-0012 (routing playbooks) +- **Integration docs:** cert_command migration, token hygiene, principals drift (`wiki/playbooks/`) +- **Latest assessment:** `history/2026-06-24-intent-scope-gap-analysis.md` --- @@ -195,7 +212,8 @@ key-cape / Keycloak identity claims ``` Upstream: OpenBao SSH engine (production) or local CA (labs). Actor inventory in -operator config or Git-tracked patterns. +operator config or Git-tracked patterns. flex-auth registry snapshot derived from +inventory when policy gate is enabled. Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operators. @@ -207,6 +225,7 @@ Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operato - `cert_command`: shell command returning a cert on stdout - `inventory.yaml`: actor → principals + TTL registry - `LocalCA` / `VaultCA`: signing backends (`backend: local` | `vault`) +- Pointer catalog: `registry/routing/catalog.yaml` — subsystem ownership lookup only --- @@ -218,7 +237,7 @@ Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operato | `ops-bridge` | Primary cert_command consumer | | `railiance-infra` | Host-side SSH principals and hardening | | `railiance-platform` | OpenBao deployment and platform secrets | -| `flex-auth` | Authorization; opt-in pre-sign policy gate (`policy.enabled`) | +| `flex-auth` | Authorization; policy package shipped (FLEX-WP-0006); runtime deploy FLEX-WP-0007 | | `key-cape` | Identity / IAM Profile lightweight mode | | `state-hub` | Workstream registry | @@ -243,14 +262,17 @@ keywords: [ssh, certificate, ca, credential, warden, ops-warden, pki, openbao, v | --- | --- | | `INTENT.md` | Why ops-warden exists and where it is going | | `SCOPE.md` | What is implemented today (this file) | -| `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` | Latest INTENT ↔ SCOPE gap analysis | | `wiki/AccessRouting.md` | What 
ops-warden issues vs routes (role and boundary) | | `wiki/CredentialRouting.md` | Which subsystem for each credential need | | `registry/routing/catalog.yaml` | Machine-readable routing pointer catalog | | `wiki/NetKingdomSecurityMap.md` | Platform security component map | | `examples/warden.production.example.yaml` | Production warden.yaml template | +| `wiki/PolicyGatedSigning.md` | flex-auth opt-in gate + registry rollout | | `wiki/AccessManagementDirective.md` | SSH actor model | | `wiki/OpsWardenConfig.md` | warden.yaml and OpenBao | | `wiki/CertCommandInterface.md` | cert_command contract | -| `wiki/PolicyGatedSigning.md` | flex-auth opt-in gate | +| `history/2026-06-24-intent-scope-gap-analysis.md` | Current gap analysis + WP-0013 | +| `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` | SSH lane gap analysis | +| `history/2026-06-18-access-routing-intent-shift-assessment.md` | Routing charter decision | +| `history/2026-06-23-flex-auth-policy-gate-production-smoke.md` | Policy gate smoke evidence | | `net-kingdom/docs/platform-identity-security-architecture.md` | Platform security canon | \ No newline at end of file diff --git a/examples/warden.production.example.yaml b/examples/warden.production.example.yaml index 80a9fbc..3231321 100644 --- a/examples/warden.production.example.yaml +++ b/examples/warden.production.example.yaml @@ -15,10 +15,12 @@ vault: inventory_path: ~/.config/warden/inventory.yaml state_dir: ~/.local/state/warden -# Opt-in flex-auth gate — keep false until ssh-certificate policies exist +# Opt-in flex-auth gate — enable only when flex-auth is reachable at flex_auth_url. +# Registry: registry/flex-auth/production_registry_snapshot.json (build from inventory). 
+# See wiki/PolicyGatedSigning.md (operator checklist) and wiki/playbooks/operator-openbao-token-hygiene.md policy: enabled: false - flex_auth_url: http://127.0.0.1:8080 + flex_auth_url: http://flex-auth.flex-auth.svc.cluster.local:8080 fail_closed: true tenant: tenant:platform subject_env: WARDEN_POLICY_SUBJECT diff --git a/history/2026-06-23-flex-auth-policy-gate-local-smoke.md b/history/2026-06-23-flex-auth-policy-gate-local-smoke.md new file mode 100644 index 0000000..f24cc76 --- /dev/null +++ b/history/2026-06-23-flex-auth-policy-gate-local-smoke.md @@ -0,0 +1,70 @@ +# flex-auth Policy Gate — Local Smoke (WARDEN-WP-0009) + +**Date:** 2026-06-23 +**Workplan:** WARDEN-WP-0009 T01 closeout + T02 local smoke +**flex-auth delivery:** FLEX-WP-0006 (`docs/ops-warden-policy-gate-handoff.md`) + +--- + +## Unblock + +flex-auth published the `ssh-certificate` / `sign` policy package and ops-warden +handoff on 2026-06-23. WARDEN-WP-0009 T01 is complete; T2 local smoke below. +Production enablement still requires deploying a **production registry slice** +with real inventory actors (see `wiki/PolicyGatedSigning.md`). + +--- + +## flex-auth assets confirmed + +| Asset | Path (flex-auth repo) | +| --- | --- | +| Policy package | `examples/ops-warden/policy_package.md` | +| Fixtures | `examples/ops-warden/policy_fixtures.yaml` | +| Registry snapshot | `examples/ops-warden/registry_snapshot.json` | +| Handoff | `docs/ops-warden-policy-gate-handoff.md` | + +Example registry actors (`platform-steward`, `ci-deploy-agent`, `backup-automation`) +are **templates**. Production actors such as `agt-state-hub-bridge` must be +registered in the deployed flex-auth registry before `policy.enabled: true`. + +--- + +## Local smoke (ops-warden + flex-auth) + +**Setup:** `backend: local`, `policy.enabled: true`, `fail_closed: true`, +flex-auth `serve` with ops-warden policy package and a smoke registry that adds +`agt-policy-smoke` (ops-warden naming-compliant clone of the `agt` fixture). 
+ +### Allow path + +| Check | Result | +| --- | --- | +| `warden sign agt-policy-smoke` | Pass (exit 0) | +| `signatures.log` `policy_decision_id` | `decision:78bc882eca883f29` | +| `signatures.log` `backend` | `local` | + +### Deny path (`fail_closed: true`) + +| Check | Result | +| --- | --- | +| `warden sign agt-state-hub-bridge` (not in flex-auth registry) | Fail (exit 1) | +| CLI reason surfaced | `unknown_actor_resource` | +| Cert issued | No | + +--- + +## Production remaining (T2) + +1. Deploy flex-auth registry + policy package to production flex-auth runtime. +2. Register production inventory actors (`agt-state-hub-bridge`, `adm-*`, `atm-*`). +3. Set `policy.flex_auth_url` and `policy.enabled: true` in production `warden.yaml`. +4. Repeat allow/deny smoke against OpenBao-backed `warden sign`; capture + `policy_decision_id` in `signatures.log` (non-secret evidence only). + +--- + +## See also + +- `wiki/PolicyGatedSigning.md` — bindings, rollout, handoff link +- `workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md` \ No newline at end of file diff --git a/history/2026-06-23-flex-auth-policy-gate-production-smoke.md b/history/2026-06-23-flex-auth-policy-gate-production-smoke.md new file mode 100644 index 0000000..d47258c --- /dev/null +++ b/history/2026-06-23-flex-auth-policy-gate-production-smoke.md @@ -0,0 +1,99 @@ +# flex-auth Policy Gate — Production Registry Smoke (WARDEN-WP-0009 T02) + +**Date:** 2026-06-23 +**Workplan:** WARDEN-WP-0009 T02 +**Operator:** codex (non-secret evidence only) + +--- + +## Production registry slice + +Built from `~/.config/warden/inventory.yaml` (matches `examples/inventory.seed.yaml`): + +| Artifact | Path | +| --- | --- | +| Registry snapshot | `registry/flex-auth/production_registry_snapshot.json` | +| Generator | `scripts/build_flex_auth_registry.py` | +| Smoke runner | `scripts/policy_gate_production_smoke.sh` | + +`flex-auth load-registry` validation: **4 actors**, 3 groups, 4 relationships. 
+ +Registered actors: + +| Actor | Type | max_ttl_hours | Principals | +| --- | --- | --- | --- | +| `agt-state-hub-bridge` | agt | 24 | `agt-task-bridge` | +| `agt-codex-interhub-bootstrap` | agt | 2 | `agt-interhub-bootstrap` | +| `adm-example` | adm | 48 | `adm-full` | +| `atm-backup-daily` | atm | 8 | `atm-backup-daily` | + +Regenerate after inventory changes: + +```bash +python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \ + -o registry/flex-auth/production_registry_snapshot.json +``` + +Deploy the snapshot to the production flex-auth runtime (`flex-auth serve` or +future in-cluster deployment). Policy package path: +`~/flex-auth/examples/ops-warden/policy_package.md`. + +--- + +## Smoke results (production inventory + registry) + +flex-auth served locally with the production registry; `warden sign` used real +inventory actors and `policy.enabled: true`. + +### Allow path — `agt-state-hub-bridge` + +| Check | Result | +| --- | --- | +| `warden sign agt-state-hub-bridge` | Pass (exit 0) | +| `signatures.log` `policy_decision_id` | `decision:032b096c433ad80c` | +| `signatures.log` `actor` | `agt-state-hub-bridge` | + +### Deny path — TTL above registry max (`fail_closed: true`) + +| Check | Result | +| --- | --- | +| `warden sign agt-state-hub-bridge --ttl 999` | Fail (exit 1) | +| flex-auth reason | `ttl_out_of_bounds` | +| Cert issued | No | + +--- + +## OpenBao-backed smoke (operator follow-up) + +Attempted `backend: vault` against `https://bao.coulomb.social` with +`policy.enabled: true`. **Blocked:** `VAULT_TOKEN` in session returned HTTP 403 +(`permission denied`). Baseline `warden sign` without policy gate fails the same +way — token refresh required before vault-backed policy smoke. 
+ +When a scoped `warden-sign` token is available: + +```bash +export VAULT_TOKEN="" # never commit or paste in chat +SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh +``` + +Then enable production `warden.yaml`: + +```yaml +policy: + enabled: true + flex_auth_url: http://flex-auth.flex-auth.svc.cluster.local:8080 # or reachable URL + fail_closed: true +``` + +Keep `policy.enabled: false` until flex-auth is reachable at `flex_auth_url` from +the workstation running `warden sign` — `fail_closed: true` blocks all signs when +flex-auth is down. + +--- + +## See also + +- `history/2026-06-23-flex-auth-policy-gate-local-smoke.md` — template registry smoke +- `wiki/PolicyGatedSigning.md` — rollout sequence +- `~/flex-auth/docs/ops-warden-policy-gate-handoff.md` \ No newline at end of file diff --git a/history/2026-06-23-flex-auth-production-pickup-suggestion.md b/history/2026-06-23-flex-auth-production-pickup-suggestion.md new file mode 100644 index 0000000..2456db9 --- /dev/null +++ b/history/2026-06-23-flex-auth-production-pickup-suggestion.md @@ -0,0 +1,189 @@ +# flex-auth Pickup Suggestion — Ops-Warden Policy Gate Production + +**Date:** 2026-06-23 +**From:** ops-warden (`WARDEN-WP-0009` finished) +**For:** flex-auth owner +**Prior delivery:** `FLEX-WP-0006` (policy package, template registry, handoff doc) + +--- + +## Summary + +ops-warden closed **WARDEN-WP-0009**. The caller side (`policy.enabled`, +`POST /v1/check`, `policy_decision_id` in `signatures.log`) is verified. +flex-auth **policy authoring** for the gate contract is done. + +What remains is **flex-auth production runtime + registry operations** so +operators can set `policy.enabled: true` on workstations running `warden sign` +without local `flex-auth serve` hacks. 
+ +--- + +## What ops-warden already proved + +| Evidence | Location | +| --- | --- | +| Template registry + policy smoke | `history/2026-06-23-flex-auth-policy-gate-local-smoke.md` | +| Production inventory registry smoke | `history/2026-06-23-flex-auth-policy-gate-production-smoke.md` | +| Production registry artifact | `registry/flex-auth/production_registry_snapshot.json` | +| Registry generator | `scripts/build_flex_auth_registry.py` | +| Joint smoke runner | `scripts/policy_gate_production_smoke.sh` | + +Production-registry allow smoke (real actor `agt-state-hub-bridge`): + +- `policy_decision_id: decision:032b096c433ad80c` +- Deny: `ttl_out_of_bounds` with `fail_closed: true` + +OpenBao-backed sign + policy gate is **not yet joint-verified** — scoped +`VAULT_TOKEN` returned HTTP 403 in this session (ops-warden operator task). + +--- + +## Gaps flex-auth should pick up + +### 1. Production runtime deployment (P0) + +**Problem:** No reachable flex-auth endpoint from the operator workstation. +Probe from WSL: `flex-auth.flex-auth.svc.cluster.local:8080` does not resolve; +`127.0.0.1:8080` is not running. ops-warden cannot enable `policy.enabled` +with `fail_closed: true` until flex-auth is up. + +**Suggestion for flex-auth:** + +- Deploy `flex-auth serve` (or equivalent) to a **stable production URL** + reachable from machines that run `warden sign`. +- Document the canonical URL for `policy.flex_auth_url` (cluster DNS, tunnel, + or ingress — whichever matches NetKingdom operator access patterns). +- Expose **`GET /healthz`** (already in code) in runbooks; ops-warden operators + will use it as a pre-flight before enabling the gate. + +**Acceptance:** Operator can `curl /healthz` from the warden +workstation and get HTTP 200. + +--- + +### 2. Load production registry, not only template fixtures (P0) + +**Problem:** `examples/ops-warden/registry_snapshot.json` uses **template** +actors (`platform-steward`, `ci-deploy-agent`, `backup-automation`). 
Production +inventory uses **different names** (`agt-state-hub-bridge`, etc.). Signing with +`policy.enabled: true` denies unregistered actors (`unknown_actor_resource`). + +**Suggestion for flex-auth:** + +- Adopt ops-warden's production registry snapshot as the **initial production + load target**, or ingest equivalent manifests under `examples/ops-warden/` + generated from real inventory. +- Document operator steps: + ```bash + # ops-warden (regenerate when inventory changes) + python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \ + -o registry/flex-auth/production_registry_snapshot.json + + # flex-auth (load into runtime) + flex-auth load-registry --file + flex-auth serve --registry --policy examples/ops-warden/policy_package.md ... + ``` +- Add **fixture or integration tests** using production actor names + (`agt-state-hub-bridge`, `adm-example`, `atm-backup-daily`) so CI catches + registry drift. + +**Acceptance:** `POST /v1/check` allows `agt-state-hub-bridge` / `sign` against +the deployed production registry without ops-warden-local registry patching. + +--- + +### 3. Registry sync contract (P1) + +**Problem:** ops-warden owns `inventory.yaml`; flex-auth owns authorization +registry. Today sync is manual: regenerate JSON, reload flex-auth. + +**Suggestion for flex-auth:** + +- Publish a short **sync contract** doc: + - **ops-warden owns:** actor names, types, principals, TTL defaults + - **flex-auth owns:** `allowed_subjects`, `max_ttl_hours`, relationships, + policy package + - **Trigger:** inventory add/change → regenerate snapshot → flex-auth reload +- Optional later: `flex-auth validate` target for ops-warden-generated snapshots; + or HTTP reload endpoint for registry updates without restart. + +**Acceptance:** Documented two-repo workflow; no ambiguity on who updates what +when a new `agt-*` actor is added. + +--- + +### 4. 
Joint production smoke with OpenBao (P1) + +**Problem:** Policy gate smoke used `backend: local` or local flex-auth. Full +production path is `warden sign` → flex-auth → OpenBao SSH engine. + +**Suggestion for flex-auth:** + +- Coordinate one **joint smoke session** with ops-warden once: + - flex-auth deployed with production registry + - ops-warden `policy.enabled: true`, valid `VAULT_TOKEN` + - Allow: `warden sign agt-state-hub-bridge` → `signatures.log` has + `backend: vault` and `policy_decision_id` + - Deny: e.g. `--ttl` above max → flex-auth deny before OpenBao call +- Record non-secret evidence (decision ids, reasons, actor names only). + +**Acceptance:** Shared history entry or flex-auth handoff update with vault-backed +evidence mirroring ops-warden's local smoke format. + +--- + +### 5. IAM subject binding in production (P2) + +**Problem:** Policy allows `subject.id` = actor name or `iam:`. Production +may set `WARDEN_POLICY_SUBJECT` from key-cape/IAM profile `sub`. + +**Suggestion for flex-auth:** + +- Confirm production registry `allowed_subjects` covers expected IAM subs for + each actor (or document that actor-name fallback is the production default + until IAM mapping is wired). +- Add one fixture for `WARDEN_POLICY_SUBJECT` / `iam:agt-state-hub-bridge` if + that path is intended in prod. + +**Acceptance:** Documented subject-id strategy for SSH sign gate in production. 
+ +--- + +## Proposed flex-auth workplan (draft) + +**Title:** `FLEX-WP-0007 — Ops-Warden Policy Gate Production Deployment` +**Priority:** P0 +**Depends on:** `FLEX-WP-0006`, ops-warden `WARDEN-WP-0009` (finished) + +| Task | Summary | +| --- | --- | +| T1 | Deploy flex-auth runtime; document production `flex_auth_url` + `/healthz` | +| T2 | Load production registry snapshot; verify allow/deny for real inventory actors | +| T3 | Publish registry sync contract with ops-warden (`inventory.yaml` → snapshot) | +| T4 | Joint OpenBao + policy gate smoke with ops-warden (non-secret evidence) | +| T5 | IAM subject binding notes / fixtures for `WARDEN_POLICY_SUBJECT` (if needed) | + +--- + +## Ownership boundary (unchanged) + +| Concern | Owner | +| --- | --- | +| Policy package + PDP decision | flex-auth | +| Actor inventory + TTL/principal defaults | ops-warden | +| SSH CA / OpenBao signing | ops-warden | +| Production registry **content** for SSH actors | joint — ops-warden generates from inventory; flex-auth hosts and evaluates | +| `policy.enabled` flip | ops-warden operator (after flex-auth reachable) | + +--- + +## References + +| Doc | Repo | +| --- | --- | +| `docs/ops-warden-policy-gate-handoff.md` | flex-auth | +| `workplans/FLEX-WP-0006-ops-warden-ssh-signing-policy-gate.md` | flex-auth | +| `wiki/PolicyGatedSigning.md` | ops-warden | +| `workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md` | ops-warden | +| `registry/flex-auth/production_registry_snapshot.json` | ops-warden | \ No newline at end of file diff --git a/history/2026-06-24-intent-scope-gap-analysis.md b/history/2026-06-24-intent-scope-gap-analysis.md new file mode 100644 index 0000000..983e07d --- /dev/null +++ b/history/2026-06-24-intent-scope-gap-analysis.md @@ -0,0 +1,127 @@ +# INTENT ↔ SCOPE Gap Analysis — Post WP-0009 / WP-0011 + +**Date:** 2026-06-24 +**Author:** codex +**Trigger:** WARDEN-WP-0009 archived; WP-0010/0011 done; policy gate + routing shipped. 
+**Prior assessments:** `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`, +`history/2026-06-18-access-routing-intent-shift-assessment.md` + +--- + +## 1. Executive summary + +ops-warden is a **production-capable SSH CA** with **structured credential routing** +(`warden route`) and a **shipped, opt-in flex-auth policy gate** (registry + smoke +complete; production flip waits flex-auth runtime deploy). + +INTENT's SSH issuance mission is **met in production**. The largest remaining INTENT +gap is **ops-bridge consumer integration** — `cert_command` contract exists but live +tunnels still use static keys. Secondary gaps are **operator hygiene**, **inventory ↔ +infra principals alignment**, **routing playbook depth** (WP-0012), and **cross-repo +coordination** (flex-auth FLEX-WP-0007, net-kingdom NK-WP-0009). + +**Vector movement:** `D5 / A4 / C4 / R3` → **`D5 / A4 / C4 / R3`** (unchanged level; +policy-gate readiness improves C4 substance without changing the label until prod flip) + +| Dimension | Was | Now | Notes | +| --- | --- | --- | --- | +| Discovery | D5 | D5 | Catalog + `warden route` + wiki | +| Availability | A4 | A4 | Routing CLI shipped (WP-0011) | +| Completeness | C4 | C4 | Policy registry smoke done; prod `policy.enabled` off | +| Reliability | R3 | R3 | OpenBao sign verified; cert_command not on live tunnels | + +--- + +## 2. Deliverables since 2026-06-18 + +| Workplan | Deliverable | Status | +| --- | --- | --- | +| WP-0009 | flex-auth policy package confirmed; production registry + smoke | Archived | +| WP-0010 | Access routing charter + pointer catalog | Archived 2026-06-24 | +| WP-0011 | `warden route` CLI + catalog tests | Archived 2026-06-24 | +| WP-0013 | Production integration closeout (playbooks, drift, archive) | Finished 2026-06-24 | +| FLEX-WP-0006 | flex-auth policy package + handoff | flex-auth finished | +| FLEX-WP-0007 | flex-auth production deploy (draft) | flex-auth proposed | + +--- + +## 3. 
INTENT success criteria + +| # | Criterion | Status | Evidence / gap | +| --- | --- | --- | --- | +| 1 | Worker knows which subsystem for each credential type | **Met** | `warden route`, catalog, wikis | +| 2 | SSH access short-lived, inventoried, audited | **Met (prod)** | OpenBao sign + `signatures.log` | +| 3 | ops-bridge integrates via stable `cert_command` | **Partial** | Contract shipped; tunnels static-key | +| 4 | NetKingdom evolution reflected in docs | **Met** | NK cross-links, routing charter | +| 5 | Non-SSH secrets stay out of ops-warden | **Met** | Pointer layer only | + +**Score: 4 met, 1 partial** — partial is ops-bridge production adoption. + +--- + +## 4. INTENT mission pillars + +| Pillar | Status | Gap | +| --- | --- | --- | +| 1. Know NetKingdom security model | Strong | — | +| 2. Route workers to correct subsystem | Strong | WP-0012 playbooks deepen scenarios | +| 3. Align runbooks with canon | Strong | Reassessment + archive hygiene due | +| 4. Issue short-lived SSH certs | **Production** | — | +| 5. Audit SSH signing | Strong | Policy `policy_decision_id` when gate on | + +--- + +## 5. 
Remaining gaps (prioritized) + +| Prio | Gap | Owner | ops-warden action | Track | +| --- | --- | --- | --- | --- | +| **P1** | ops-bridge `cert_command` on production tunnels | ops-bridge + ops-warden | Migration playbook + pilot evidence | **WARDEN-WP-0013** T3 | +| **P2** | Operator token hygiene (root → scoped `warden-sign`) | Operator + ops-warden | Runbook in wiki | **WARDEN-WP-0013** T4 | +| **P3** | Principals drift (inventory ↔ railiance-infra) | ops-warden + infra | Drift check doc/script | **WARDEN-WP-0013** T5 | +| **P4** | Routing scenario playbooks incomplete | ops-warden | Expand catalog + wiki playbooks | **WARDEN-WP-0012** (ready) | +| **P5** | flex-auth production runtime | flex-auth | Coordinate; operator flip checklist | **FLEX-WP-0007** + WP-0013 T6 | +| **P6** | Vault-backed policy gate joint smoke | flex-auth + operator | Run when `VAULT_TOKEN` valid | FLEX-WP-0007 T4 | +| **P7** | Archive hygiene (WP-0010, WP-0011) | ops-warden | Move to `workplans/archived/` | **WARDEN-WP-0013** T2 | +| **P8** | NK-WP-0009 joint SSH tutorial | net-kingdom | Coordinate only | Parallel | +| **P9** | Policy v2.1 identity claims for `adm` | ops-warden + flex-auth | Design only | Future | + +--- + +## 6. 
Workplan recommendation + +**WARDEN-WP-0013 — Production Integration & Stewardship Closeout** (new): + +- T1: This reassessment + SCOPE refresh +- T2: Archive WP-0010 and WP-0011 +- T3: ops-bridge `cert_command` migration playbook (pilot `agt-state-hub-bridge`) +- T4: Operator OpenBao token hygiene runbook +- T5: Principals inventory drift check +- T6: Policy gate production enablement checklist (coordinate FLEX-WP-0007) + +**WARDEN-WP-0012 — Routing Scenario Playbooks** (promote `backlog` → `ready`): + +- Dependencies WP-0010/0011 shipped; start when bandwidth allows +- Complements WP-0013 (routing depth vs SSH integration closeout) + +**Out of scope for new ops-warden WPs:** + +- flex-auth runtime deployment (FLEX-WP-0007) +- ops-bridge tunnel config changes (ops-bridge executes; ops-warden documents) + +--- + +## 7. Maturity target (post WP-0013 + WP-0012) + +| Dimension | Target | Unlock | +| --- | --- | --- | +| C4 → C4+ | cert_command pilot documented | WP-0013 T3 | +| R3 → R4 | Live tunnel uses warden-signed cert | ops-bridge + WP-0013 evidence | +| D5 | More active catalog playbooks | WP-0012 | + +--- + +## See also + +- `workplans/WARDEN-WP-0013-production-integration-and-stewardship-closeout.md` +- `workplans/WARDEN-WP-0012-routing-scenario-playbooks.md` +- `SCOPE.md` \ No newline at end of file diff --git a/history/2026-06-24-ops-bridge-cert-command-pilot-coordination.md b/history/2026-06-24-ops-bridge-cert-command-pilot-coordination.md new file mode 100644 index 0000000..484585c --- /dev/null +++ b/history/2026-06-24-ops-bridge-cert-command-pilot-coordination.md @@ -0,0 +1,33 @@ +# ops-bridge cert_command Pilot — Coordination Note + +**Date:** 2026-06-24 +**Workplan:** WARDEN-WP-0013 T3 +**Playbook:** `wiki/playbooks/ops-bridge-tunnel-cert.md` + +## Status + +ops-warden shipped the migration playbook and upgraded catalog entry `ops-bridge-tunnel`. +Pilot tunnel **`agt-state-hub-bridge`** is documented with actor, key paths, and +`cert_command` string. 
+ +**Execution owner:** ops-bridge (tunnel config in `~/.config/bridge/tunnels.yaml`). + +## Request to ops-bridge + +Apply `cert_command` to the `state-hub-coulombcore` tunnel per the playbook migration +checklist. ops-warden will record smoke evidence in `history/` when the pilot completes +(non-secret: tunnel up/down, cert re-issue after TTL). + +## Pre-requisites (operator) + +- Scoped `VAULT_TOKEN` for production OpenBao sign (`wiki/playbooks/operator-openbao-token-hygiene.md`) +- `warden sign agt-state-hub-bridge` succeeds before tunnel config change + +## Evidence pending + +| Check | Status | +| --- | --- | +| Playbook on file | Done | +| Catalog `wiki_ref` | Done | +| ops-bridge tunnel config updated | Pending | +| `bridge up` smoke | Pending | \ No newline at end of file diff --git a/registry/flex-auth/production_registry_snapshot.json b/registry/flex-auth/production_registry_snapshot.json new file mode 100644 index 0000000..3110228 --- /dev/null +++ b/registry/flex-auth/production_registry_snapshot.json @@ -0,0 +1,450 @@ +{ + "systems": [ + { + "id": "ops-warden", + "name": "Ops Warden", + "resource_types": [ + { + "name": "ssh-certificate", + "scope_level": "Resource", + "planes": [ + "Identity", + "Secret", + "Audit" + ], + "metadata": { + "description": "Short-lived SSH certificate signing request." 
+ } + } + ], + "actions": [ + { + "name": "sign", + "capabilities": [ + "Use", + "Operate", + "Audit" + ], + "planes": [ + "Identity", + "Secret", + "Audit" + ], + "exposure_modes": [ + "Metadata" + ], + "metadata": { + "required_context": [ + "principals", + "actor_type", + "pubkey_fingerprint", + "ttl_hours" + ] + } + } + ], + "caring_profiles": [ + "caring-0.4.0-rc2" + ], + "metadata": { + "flex_auth_contract": "protected-system-v0", + "ops_warden_policy_gate": "v2", + "policy_enabled_config": "policy.enabled", + "tenant": "tenant:platform" + } + } + ], + "resource_manifests": [ + { + "id": "ops-warden-ssh-certificates", + "system": "ops-warden", + "resources": [ + { + "id": "ssh-cert:actor/adm-example", + "type": "ssh-certificate", + "labels": [ + "ssh-signing", + "adm" + ], + "trust_zone": "platform", + "owner": "team:platform-security", + "attributes": { + "actor_id": "adm-example", + "actor_type": "adm", + "allowed_subjects": [ + "adm-example", + "iam:adm-example" + ], + "allowed_principals": [ + "adm-full" + ], + "max_ttl_hours": 48 + } + }, + { + "id": "ssh-cert:actor/agt-codex-interhub-bootstrap", + "type": "ssh-certificate", + "labels": [ + "ssh-signing", + "agt" + ], + "trust_zone": "platform", + "owner": "team:platform-security", + "attributes": { + "actor_id": "agt-codex-interhub-bootstrap", + "actor_type": "agt", + "allowed_subjects": [ + "agt-codex-interhub-bootstrap", + "iam:agt-codex-interhub-bootstrap" + ], + "allowed_principals": [ + "agt-interhub-bootstrap" + ], + "max_ttl_hours": 2 + } + }, + { + "id": "ssh-cert:actor/agt-state-hub-bridge", + "type": "ssh-certificate", + "labels": [ + "ssh-signing", + "agt" + ], + "trust_zone": "platform", + "owner": "team:platform-security", + "attributes": { + "actor_id": "agt-state-hub-bridge", + "actor_type": "agt", + "allowed_subjects": [ + "agt-state-hub-bridge", + "iam:agt-state-hub-bridge" + ], + "allowed_principals": [ + "agt-task-bridge" + ], + "max_ttl_hours": 24 + } + }, + { + "id": 
"ssh-cert:actor/atm-backup-daily", + "type": "ssh-certificate", + "labels": [ + "ssh-signing", + "atm" + ], + "trust_zone": "platform", + "owner": "team:platform-security", + "attributes": { + "actor_id": "atm-backup-daily", + "actor_type": "atm", + "allowed_subjects": [ + "atm-backup-daily", + "iam:atm-backup-daily" + ], + "allowed_principals": [ + "atm-backup-daily" + ], + "max_ttl_hours": 8 + } + } + ], + "actions": [ + "sign" + ], + "caring_profile": "caring-0.4.0-rc2", + "metadata": { + "flex_auth_contract": "resource-registration-v0", + "tenant": "tenant:platform" + } + } + ], + "tenants": [ + { + "id": "tenant:platform", + "name": "Platform Tenant" + } + ], + "subjects": [ + { + "id": "adm-example", + "type": "Agent", + "display_name": "Example human operator \u2014 replace with per-person adm-* actors", + "organization_relation": "ServiceProvider", + "roles": [ + "Operator" + ], + "groups": [ + "group:ops-warden-admins" + ], + "tenant": "tenant:platform", + "metadata": { + "actor_type": "adm" + } + }, + { + "id": "agt-codex-interhub-bootstrap", + "type": "Agent", + "display_name": "Short-lived agent access for attended Inter-Hub bootstrap", + "organization_relation": "ServiceProvider", + "roles": [ + "Operator" + ], + "groups": [ + "group:ops-warden-agents" + ], + "tenant": "tenant:platform", + "metadata": { + "actor_type": "agt" + } + }, + { + "id": "agt-state-hub-bridge", + "type": "Agent", + "display_name": "ops-bridge tunnel agent for state-hub", + "organization_relation": "ServiceProvider", + "roles": [ + "Operator" + ], + "groups": [ + "group:ops-warden-agents" + ], + "tenant": "tenant:platform", + "metadata": { + "actor_type": "agt" + } + }, + { + "id": "atm-backup-daily", + "type": "Automation", + "display_name": "Example nightly automation actor", + "organization_relation": "ServiceProvider", + "roles": [ + "Operator" + ], + "groups": [ + "group:ops-warden-automations" + ], + "tenant": "tenant:platform", + "metadata": { + "actor_type": "atm" + } + 
} + ], + "groups": [ + { + "id": "group:ops-warden-admins", + "display_name": "Ops Warden Admins", + "members": [ + "adm-example" + ], + "tenant": "tenant:platform" + }, + { + "id": "group:ops-warden-agents", + "display_name": "Ops Warden Agents", + "members": [ + "agt-codex-interhub-bootstrap", + "agt-state-hub-bridge" + ], + "tenant": "tenant:platform" + }, + { + "id": "group:ops-warden-automations", + "display_name": "Ops Warden Automations", + "members": [ + "atm-backup-daily" + ], + "tenant": "tenant:platform" + } + ], + "relationships": [ + { + "id": "rel:adm-example-sign-adm-example", + "system": "ops-warden", + "subject": "group:ops-warden-admins", + "relation": "signer", + "object": "ssh-cert:actor/adm-example", + "tenant": "tenant:platform", + "conditions": [ + "TimeLimited", + "Logged" + ], + "caring": { + "id": "descriptor:ops-warden-adm-signer", + "profile": "caring-0.4.0-rc2", + "subject_type": "Group", + "organization_relation": "ServiceProvider", + "canonical_role": "Operator", + "scope": { + "level": "Resource", + "id": "ssh-cert:actor/adm-example", + "tenant": "tenant:platform", + "resource": "ssh-cert:actor/adm-example" + }, + "planes": [ + "Identity", + "Secret", + "Audit" + ], + "capabilities": [ + "Use", + "Operate", + "Audit" + ], + "exposure_modes": [ + "Metadata" + ], + "conditions": [ + "TimeLimited", + "Logged" + ], + "restrictions": [ + "PrivilegeEscalationBlocked", + "SecretAccessBlocked" + ], + "access_path": "mediated" + } + }, + { + "id": "rel:agt-codex-interhub-bootstrap-sign-agt-codex-interhub-bootstrap", + "system": "ops-warden", + "subject": "group:ops-warden-agents", + "relation": "signer", + "object": "ssh-cert:actor/agt-codex-interhub-bootstrap", + "tenant": "tenant:platform", + "conditions": [ + "TimeLimited", + "Logged" + ], + "caring": { + "id": "descriptor:ops-warden-agt-signer", + "profile": "caring-0.4.0-rc2", + "subject_type": "Group", + "organization_relation": "ServiceProvider", + "canonical_role": "Operator", + 
"scope": { + "level": "Resource", + "id": "ssh-cert:actor/agt-codex-interhub-bootstrap", + "tenant": "tenant:platform", + "resource": "ssh-cert:actor/agt-codex-interhub-bootstrap" + }, + "planes": [ + "Identity", + "Secret", + "Audit" + ], + "capabilities": [ + "Use", + "Operate", + "Audit" + ], + "exposure_modes": [ + "Metadata" + ], + "conditions": [ + "TimeLimited", + "Logged" + ], + "restrictions": [ + "PrivilegeEscalationBlocked", + "SecretAccessBlocked" + ], + "access_path": "mediated" + } + }, + { + "id": "rel:agt-state-hub-bridge-sign-agt-state-hub-bridge", + "system": "ops-warden", + "subject": "group:ops-warden-agents", + "relation": "signer", + "object": "ssh-cert:actor/agt-state-hub-bridge", + "tenant": "tenant:platform", + "conditions": [ + "TimeLimited", + "Logged" + ], + "caring": { + "id": "descriptor:ops-warden-agt-signer", + "profile": "caring-0.4.0-rc2", + "subject_type": "Group", + "organization_relation": "ServiceProvider", + "canonical_role": "Operator", + "scope": { + "level": "Resource", + "id": "ssh-cert:actor/agt-state-hub-bridge", + "tenant": "tenant:platform", + "resource": "ssh-cert:actor/agt-state-hub-bridge" + }, + "planes": [ + "Identity", + "Secret", + "Audit" + ], + "capabilities": [ + "Use", + "Operate", + "Audit" + ], + "exposure_modes": [ + "Metadata" + ], + "conditions": [ + "TimeLimited", + "Logged" + ], + "restrictions": [ + "PrivilegeEscalationBlocked", + "SecretAccessBlocked" + ], + "access_path": "mediated" + } + }, + { + "id": "rel:atm-backup-daily-sign-atm-backup-daily", + "system": "ops-warden", + "subject": "group:ops-warden-automations", + "relation": "signer", + "object": "ssh-cert:actor/atm-backup-daily", + "tenant": "tenant:platform", + "conditions": [ + "TimeLimited", + "Logged" + ], + "caring": { + "id": "descriptor:ops-warden-atm-signer", + "profile": "caring-0.4.0-rc2", + "subject_type": "Group", + "organization_relation": "ServiceProvider", + "canonical_role": "Operator", + "scope": { + "level": "Resource", + 
"id": "ssh-cert:actor/atm-backup-daily", + "tenant": "tenant:platform", + "resource": "ssh-cert:actor/atm-backup-daily" + }, + "planes": [ + "Identity", + "Secret", + "Audit" + ], + "capabilities": [ + "Use", + "Operate", + "Audit" + ], + "exposure_modes": [ + "Metadata" + ], + "conditions": [ + "TimeLimited", + "Logged" + ], + "restrictions": [ + "PrivilegeEscalationBlocked", + "SecretAccessBlocked" + ], + "access_path": "mediated" + } + } + ] +} diff --git a/registry/routing/catalog.yaml b/registry/routing/catalog.yaml index 69ac651..f7a29e8 100644 --- a/registry/routing/catalog.yaml +++ b/registry/routing/catalog.yaml @@ -83,13 +83,13 @@ entries: - id: ops-bridge-tunnel title: SSH tunnel or port forward - need_keywords: [tunnel, port, forward, bridge, ops-bridge, reverse, transport, ssh-tunnel] + need_keywords: [tunnel, port, forward, bridge, ops-bridge, reverse, transport, ssh-tunnel, cert_command] owner_repo: ops-bridge subsystem: ops-bridge warden_executes: false - wiki_ref: wiki/CredentialRouting.md#routing-table + wiki_ref: wiki/playbooks/ops-bridge-tunnel-cert.md#migration-checklist canon_ref: net-kingdom/docs/platform-identity-security-architecture.md#operational-ssh-path - reviewed: "2026-06-18" + reviewed: "2026-06-24" status: active - id: railiance-infra-principals diff --git a/scripts/build_flex_auth_registry.py b/scripts/build_flex_auth_registry.py new file mode 100644 index 0000000..b1e4ffb --- /dev/null +++ b/scripts/build_flex_auth_registry.py @@ -0,0 +1,199 @@ +#!/usr/bin/env python3 +"""Build a flex-auth registry snapshot from ops-warden inventory.yaml. 
+ +Usage: + python scripts/build_flex_auth_registry.py inventory.yaml -o registry/flex-auth/production_registry_snapshot.json + flex-auth load-registry --file registry/flex-auth/production_registry_snapshot.json +""" +from __future__ import annotations + +import argparse +import json +from pathlib import Path +from typing import Any + +import yaml + +GROUP_BY_TYPE = { + "adm": "group:ops-warden-admins", + "agt": "group:ops-warden-agents", + "atm": "group:ops-warden-automations", +} + +SUBJECT_TYPE_BY_ACTOR = { + "adm": "Agent", + "agt": "Agent", + "atm": "Automation", +} + +DESCRIPTOR_BY_TYPE = { + "adm": "descriptor:ops-warden-adm-signer", + "agt": "descriptor:ops-warden-agt-signer", + "atm": "descriptor:ops-warden-atm-signer", +} + + +def _caring_descriptor(actor_type: str, resource_id: str) -> dict[str, Any]: + return { + "id": DESCRIPTOR_BY_TYPE[actor_type], + "profile": "caring-0.4.0-rc2", + "subject_type": "Group", + "organization_relation": "ServiceProvider", + "canonical_role": "Operator", + "scope": { + "level": "Resource", + "id": resource_id, + "tenant": "tenant:platform", + "resource": resource_id, + }, + "planes": ["Identity", "Secret", "Audit"], + "capabilities": ["Use", "Operate", "Audit"], + "exposure_modes": ["Metadata"], + "conditions": ["TimeLimited", "Logged"], + "restrictions": ["PrivilegeEscalationBlocked", "SecretAccessBlocked"], + "access_path": "mediated", + } + + +def build_registry(inventory: dict[str, Any]) -> dict[str, Any]: + actors: dict[str, Any] = inventory.get("actors") or {} + resources: list[dict[str, Any]] = [] + subjects: list[dict[str, Any]] = [] + groups: dict[str, list[str]] = {gid: [] for gid in GROUP_BY_TYPE.values()} + relationships: list[dict[str, Any]] = [] + + for name, entry in sorted(actors.items()): + actor_type = str(entry["type"]) + principals = list(entry.get("principals") or []) + ttl_hours = int(entry.get("ttl_hours") or 24) + resource_id = f"ssh-cert:actor/{name}" + group_id = GROUP_BY_TYPE[actor_type] + + 
resources.append( + { + "id": resource_id, + "type": "ssh-certificate", + "labels": ["ssh-signing", actor_type], + "trust_zone": "platform", + "owner": "team:platform-security", + "attributes": { + "actor_id": name, + "actor_type": actor_type, + "allowed_subjects": [name, f"iam:{name}"], + "allowed_principals": principals, + "max_ttl_hours": ttl_hours, + }, + } + ) + subjects.append( + { + "id": name, + "type": SUBJECT_TYPE_BY_ACTOR[actor_type], + "display_name": entry.get("description") or name, + "organization_relation": "ServiceProvider", + "roles": ["Operator"], + "groups": [group_id], + "tenant": "tenant:platform", + "metadata": {"actor_type": actor_type}, + } + ) + groups[group_id].append(name) + relationships.append( + { + "id": f"rel:{name}-sign-{name}", + "system": "ops-warden", + "subject": group_id, + "relation": "signer", + "object": resource_id, + "tenant": "tenant:platform", + "conditions": ["TimeLimited", "Logged"], + "caring": _caring_descriptor(actor_type, resource_id), + } + ) + + group_records = [ + { + "id": gid, + "display_name": gid.replace("group:", "").replace("-", " ").title(), + "members": members, + "tenant": "tenant:platform", + } + for gid, members in groups.items() + if members + ] + + return { + "systems": [ + { + "id": "ops-warden", + "name": "Ops Warden", + "resource_types": [ + { + "name": "ssh-certificate", + "scope_level": "Resource", + "planes": ["Identity", "Secret", "Audit"], + "metadata": { + "description": "Short-lived SSH certificate signing request." 
+ }, + } + ], + "actions": [ + { + "name": "sign", + "capabilities": ["Use", "Operate", "Audit"], + "planes": ["Identity", "Secret", "Audit"], + "exposure_modes": ["Metadata"], + "metadata": { + "required_context": [ + "principals", + "actor_type", + "pubkey_fingerprint", + "ttl_hours", + ] + }, + } + ], + "caring_profiles": ["caring-0.4.0-rc2"], + "metadata": { + "flex_auth_contract": "protected-system-v0", + "ops_warden_policy_gate": "v2", + "policy_enabled_config": "policy.enabled", + "tenant": "tenant:platform", + }, + } + ], + "resource_manifests": [ + { + "id": "ops-warden-ssh-certificates", + "system": "ops-warden", + "resources": resources, + "actions": ["sign"], + "caring_profile": "caring-0.4.0-rc2", + "metadata": { + "flex_auth_contract": "resource-registration-v0", + "tenant": "tenant:platform", + }, + } + ], + "tenants": [{"id": "tenant:platform", "name": "Platform Tenant"}], + "subjects": subjects, + "groups": group_records, + "relationships": relationships, + } + + +def main() -> None: + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument("inventory", type=Path, help="ops-warden inventory.yaml") + parser.add_argument("-o", "--output", type=Path, required=True) + args = parser.parse_args() + + inventory = yaml.safe_load(args.inventory.read_text()) or {} + registry = build_registry(inventory) + args.output.parent.mkdir(parents=True, exist_ok=True) + args.output.write_text(json.dumps(registry, indent=2) + "\n") + print(f"Wrote {args.output} ({len(registry['subjects'])} actors)") + + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/scripts/check_principals_drift.py b/scripts/check_principals_drift.py new file mode 100644 index 0000000..e69a4a2 --- /dev/null +++ b/scripts/check_principals_drift.py @@ -0,0 +1,103 @@ +#!/usr/bin/env python3 +"""Compare warden inventory host principals with railiance-infra ssh_principals.yaml. 
+ +Usage: + python scripts/check_principals_drift.py \\ + --inventory ~/.config/warden/inventory.yaml \\ + --infra ~/railiance-infra/ansible/inventory/ssh_principals.yaml + +Exit 0 when no drift; exit 1 when principals differ. No secrets printed. +""" +from __future__ import annotations + +import argparse +import sys +from pathlib import Path +from typing import Any + +import yaml + + +def _inventory_host_principals(inventory: dict[str, Any]) -> set[str]: + principals: set[str] = set() + hosts = inventory.get("hosts") or {} + for host_entry in hosts.values(): + allowed = host_entry.get("allowed_principals") or {} + for principal_list in allowed.values(): + principals.update(principal_list) + return principals + + +def _infra_principals(infra: dict[str, Any]) -> set[str]: + principals: set[str] = set() + for host_data in (infra.get("ssh_principals") or {}).values(): + for user_principals in (host_data.get("users") or {}).values(): + principals.update(user_principals) + return principals + + +def _actor_principals(inventory: dict[str, Any]) -> set[str]: + principals: set[str] = set() + for entry in (inventory.get("actors") or {}).values(): + principals.update(entry.get("principals") or []) + return principals + + +def main() -> int: + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + "--inventory", + type=Path, + default=Path.home() / ".config/warden/inventory.yaml", + ) + parser.add_argument( + "--infra", + type=Path, + default=Path.home() / "railiance-infra/ansible/inventory/ssh_principals.yaml", + ) + args = parser.parse_args() + + if not args.inventory.exists(): + print(f"inventory not found: {args.inventory}", file=sys.stderr) + return 2 + if not args.infra.exists(): + print(f"infra principals not found: {args.infra}", file=sys.stderr) + return 2 + + inventory = yaml.safe_load(args.inventory.read_text()) or {} + infra = yaml.safe_load(args.infra.read_text()) or {} + + host_principals = _inventory_host_principals(inventory) + 
+ infra_principals = _infra_principals(infra) + actor_principals = _actor_principals(inventory) + + only_inventory = sorted(host_principals - infra_principals) + only_infra = sorted(infra_principals - host_principals) + actors_not_on_hosts = sorted(actor_principals - host_principals) + + # Only host/infra mismatches count as drift; actor/host gaps are warnings. + drift = bool(only_inventory or only_infra) + + print(f"inventory hosts principals ({len(host_principals)}): {', '.join(sorted(host_principals)) or '(none)'}") + print(f"infra deployed principals ({len(infra_principals)}): {', '.join(sorted(infra_principals)) or '(none)'}") + print(f"inventory actor principals ({len(actor_principals)}): {', '.join(sorted(actor_principals)) or '(none)'}") + + if only_inventory: + print("\nDRIFT: in inventory hosts but not infra:", ", ".join(only_inventory)) + if only_infra: + print("DRIFT: in infra but not inventory hosts:", ", ".join(only_infra)) + if actors_not_on_hosts: + print("WARN: actor principals not listed under any inventory host:", ", ".join(actors_not_on_hosts)) + + if not drift and not actors_not_on_hosts: + print("\nOK — no host/infra principal drift") + return 0 + if drift: + print("\nRegenerate flex-auth registry after inventory changes:") + print(f" python scripts/build_flex_auth_registry.py {args.inventory} -o registry/flex-auth/production_registry_snapshot.json") + return 1 + print("\nOK — host/infra aligned (actor/host warning only)") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) \ No newline at end of file diff --git a/scripts/policy_gate_production_smoke.sh b/scripts/policy_gate_production_smoke.sh new file mode 100755 index 0000000..8633e5e --- /dev/null +++ b/scripts/policy_gate_production_smoke.sh @@ -0,0 +1,105 @@ +#!/usr/bin/env bash +# Production policy-gate smoke for WARDEN-WP-0009 T02. +# +# Validates flex-auth registry (from inventory), allow/deny paths through +# warden sign, and optionally OpenBao-backed signing when VAULT_TOKEN works.
+# +# Usage: +# ./scripts/policy_gate_production_smoke.sh +# INVENTORY=~/.config/warden/inventory.yaml ./scripts/policy_gate_production_smoke.sh +# SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh # also test backend: vault +set -euo pipefail + +ROOT="$(cd "$(dirname "$0")/.." && pwd)" +INVENTORY="${INVENTORY:-$HOME/.config/warden/inventory.yaml}" +REGISTRY="$ROOT/registry/flex-auth/production_registry_snapshot.json" +POLICY="${FLEX_AUTH_POLICY:-$HOME/flex-auth/examples/ops-warden/policy_package.md}" +FLEX_AUTH_BIN="${FLEX_AUTH_BIN:-/tmp/flex-auth}" +ADDR="${FLEX_AUTH_ADDR:-127.0.0.1:18090}" +PUBKEY="${PUBKEY:-$HOME/.ssh/agt-state-hub-bridge_ed25519.pub}" +ACTOR="${ACTOR:-agt-state-hub-bridge}" +SMOKE_DIR="$(mktemp -d /tmp/warden-prod-policy-smoke-XXXXXX)" + +cleanup() { + if [[ -n "${FA_PID:-}" ]] && kill -0 "$FA_PID" 2>/dev/null; then + kill "$FA_PID" 2>/dev/null || true + wait "$FA_PID" 2>/dev/null || true + fi +} +trap cleanup EXIT + +echo "==> Building registry from $INVENTORY" +uv run --directory "$ROOT" python scripts/build_flex_auth_registry.py \ + "$INVENTORY" -o "$REGISTRY" +"$FLEX_AUTH_BIN" load-registry --file "$REGISTRY" >/dev/null + +echo "==> Starting flex-auth on $ADDR" +"$FLEX_AUTH_BIN" serve \ + --addr "$ADDR" \ + --registry "$REGISTRY" \ + --policy "$POLICY" \ + --log "$SMOKE_DIR/flex-auth-decisions.jsonl" & +FA_PID=$! +sleep 0.6 + +ssh-keygen -t ed25519 -f "$SMOKE_DIR/ca_key" -N "" -q + +cat >"$SMOKE_DIR/warden.yaml" < Allow path: warden sign $ACTOR" +uv run --directory "$ROOT" warden sign "$ACTOR" --pubkey "$PUBKEY" >/dev/null +ALLOW_LINE="$(tail -1 "$SMOKE_DIR/state/signatures.log")" +python3 -c "import json,sys; e=json.loads(sys.argv[1]); assert e.get('policy_decision_id'), e; print('policy_decision_id:', e['policy_decision_id'])" "$ALLOW_LINE" + +echo "==> Deny path: ttl above max" +set +e +DENY_OUT="$(uv run --directory "$ROOT" warden sign "$ACTOR" --pubkey "$PUBKEY" --ttl 999 2>&1)" +DENY_RC=$? 
+set -e +if [[ "$DENY_RC" -ne 1 ]]; then + echo "expected deny exit 1, got $DENY_RC" >&2 + exit 1 +fi +echo "$DENY_OUT" | grep -q "ttl_out_of_bounds" + +if [[ "${SMOKE_VAULT:-0}" == "1" ]]; then + echo "==> Vault-backed allow (requires scoped VAULT_TOKEN)" + cat >"$SMOKE_DIR/warden-vault.yaml" </dev/null + VAULT_LINE="$(tail -1 "$SMOKE_DIR/state-vault/signatures.log")" + python3 -c "import json,sys; e=json.loads(sys.argv[1]); assert e.get('backend')=='vault' and e.get('policy_decision_id'); print('vault policy_decision_id:', e['policy_decision_id'])" "$VAULT_LINE" +fi + +echo "OK — production registry policy gate smoke passed" \ No newline at end of file diff --git a/tests/test_flex_auth_registry.py b/tests/test_flex_auth_registry.py new file mode 100644 index 0000000..f109d87 --- /dev/null +++ b/tests/test_flex_auth_registry.py @@ -0,0 +1,34 @@ +"""Tests for scripts/build_flex_auth_registry.py.""" +import json +import subprocess +import sys +from pathlib import Path + +import yaml + +ROOT = Path(__file__).resolve().parents[1] +SCRIPT = ROOT / "scripts" / "build_flex_auth_registry.py" +INVENTORY = ROOT / "examples" / "inventory.seed.yaml" + + +def test_build_registry_from_inventory_seed(tmp_path): + out = tmp_path / "registry.json" + subprocess.run( + [sys.executable, str(SCRIPT), str(INVENTORY), "-o", str(out)], + check=True, + cwd=ROOT, + ) + registry = json.loads(out.read_text()) + actors = yaml.safe_load(INVENTORY.read_text())["actors"] + + assert len(registry["subjects"]) == len(actors) + assert len(registry["resource_manifests"][0]["resources"]) == len(actors) + + bridge = next( + r + for r in registry["resource_manifests"][0]["resources"] + if r["id"] == "ssh-cert:actor/agt-state-hub-bridge" + ) + assert bridge["attributes"]["actor_type"] == "agt" + assert bridge["attributes"]["max_ttl_hours"] == 24 + assert "agt-task-bridge" in bridge["attributes"]["allowed_principals"] \ No newline at end of file diff --git a/tests/test_principals_drift.py 
b/tests/test_principals_drift.py new file mode 100644 index 0000000..7fb33d0 --- /dev/null +++ b/tests/test_principals_drift.py @@ -0,0 +1,48 @@ +"""Tests for scripts/check_principals_drift.py.""" +import subprocess +import sys +from pathlib import Path + +import yaml + +ROOT = Path(__file__).resolve().parents[1] +SCRIPT = ROOT / "scripts" / "check_principals_drift.py" + + +def test_no_drift_when_aligned(tmp_path): + inv = tmp_path / "inventory.yaml" + infra = tmp_path / "ssh_principals.yaml" + inv.write_text(yaml.dump({ + "actors": {"agt-test": {"type": "agt", "principals": ["agt-task-bridge"], "ttl_hours": 24}}, + "hosts": {"host1": {"allowed_principals": {"agt": ["agt-task-bridge"]}}}, + })) + infra.write_text(yaml.dump({ + "ssh_principals": {"Host1": {"users": {"user1": ["agt-task-bridge"]}}}, + })) + result = subprocess.run( + [sys.executable, str(SCRIPT), "--inventory", str(inv), "--infra", str(infra)], + cwd=ROOT, + capture_output=True, + text=True, + ) + assert result.returncode == 0 + assert "OK" in result.stdout + + +def test_drift_detected(tmp_path): + inv = tmp_path / "inventory.yaml" + infra = tmp_path / "ssh_principals.yaml" + inv.write_text(yaml.dump({ + "hosts": {"host1": {"allowed_principals": {"agt": ["agt-missing"]}}}, + })) + infra.write_text(yaml.dump({ + "ssh_principals": {"Host1": {"users": {"user1": ["agt-other"]}}}, + })) + result = subprocess.run( + [sys.executable, str(SCRIPT), "--inventory", str(inv), "--infra", str(infra)], + cwd=ROOT, + capture_output=True, + text=True, + ) + assert result.returncode == 1 + assert "DRIFT" in result.stdout \ No newline at end of file diff --git a/wiki/OpsWardenConfig.md b/wiki/OpsWardenConfig.md index 40e37b7..2696211 100644 --- a/wiki/OpsWardenConfig.md +++ b/wiki/OpsWardenConfig.md @@ -128,6 +128,9 @@ vault login `VAULT_TOKEN`). OpenBao uses the same header; you do not need a separate `BAO_TOKEN` unless you configure `token_env` that way. 
+See `wiki/playbooks/operator-openbao-token-hygiene.md` for scoped `warden-sign` +tokens, OIDC routing, and HTTP 403 recovery. + On failure, `warden sign` suggests falling back to `--backend local` only for lab recovery — not as a production substitute. @@ -272,4 +275,5 @@ tunnels: `ops-bridge` runs `cert_command` before each SSH launch, captures stdout as the cert, and passes it alongside the private key via `ssh -i -i `. -See `wiki/CertCommandInterface.md` for the full contract. \ No newline at end of file +See `wiki/CertCommandInterface.md` for the full contract and +`wiki/playbooks/ops-bridge-tunnel-cert.md` for static-key → cert_command migration. \ No newline at end of file diff --git a/wiki/PolicyGatedSigning.md b/wiki/PolicyGatedSigning.md index cbfa1e2..bfe6a83 100644 --- a/wiki/PolicyGatedSigning.md +++ b/wiki/PolicyGatedSigning.md @@ -1,7 +1,7 @@ # Policy-Gated SSH Signing -Date: 2026-06-17 -Status: **implemented (opt-in)** — WARDEN-WP-0007 +Date: 2026-06-23 +Status: **implemented (opt-in)** — WARDEN-WP-0007; policy package confirmed FLEX-WP-0006 By default `warden sign` authorizes via **inventory allow-list** and TTL policy only. When `policy.enabled: true` in `warden.yaml`, ops-warden calls flex-auth @@ -104,12 +104,129 @@ defines **what the actor is allowed to request**. --- +## flex-auth policy package (FLEX-WP-0006) + +flex-auth owns the `ssh-certificate` / `sign` policy package. ops-warden consumes +it via `POST /v1/check` when `policy.enabled: true`. 
+ +**Handoff (canonical):** `~/flex-auth/docs/ops-warden-policy-gate-handoff.md` + +| Asset | flex-auth path | +| --- | --- | +| Policy package | `examples/ops-warden/policy_package.md` | +| Allow/deny fixtures | `examples/ops-warden/policy_fixtures.yaml` | +| Registry snapshot | `examples/ops-warden/registry_snapshot.json` | +| Subject manifest | `examples/ops-warden/subject_manifest.yaml` | +| Resource manifest | `examples/ops-warden/resource_manifest.yaml` | + +### Tenant and subject bindings + +| Field | Value | +| --- | --- | +| Tenant | `tenant:platform` (`policy.tenant`) | +| Resource system | `ops-warden` (`policy.system`) | +| Resource type | `ssh-certificate` | +| Action | `sign` | +| Resource id | `ssh-cert:actor/` | + +| Actor type | Example flex-auth subject | ops-warden inventory name pattern | +| --- | --- | --- | +| `adm` | `platform-steward` | `adm-*` | +| `agt` | `ci-deploy-agent` | `agt-*` | +| `atm` | `backup-automation` | `atm-*` | + +**Subject id sent to flex-auth:** `WARDEN_POLICY_SUBJECT` when set, otherwise the +inventory actor name. flex-auth may also allow `iam:` when listed in +`allowed_subjects` on the resource. + +**Principals and TTL:** Taken from the sign request (inventory defaults). flex-auth +denies when principals are empty/disallowed or TTL exceeds `max_ttl_hours` on the +registered resource. + +### Fixture coverage (flex-auth) + +Allow: `fixture:ops-warden-adm-sign-allow`, `fixture:ops-warden-agt-sign-allow`, +`fixture:ops-warden-atm-sign-allow`. + +Deny: `fixture:ops-warden-unknown-subject-deny`, +`fixture:ops-warden-actor-type-mismatch-deny`, `fixture:ops-warden-ttl-above-max-deny`, +`fixture:ops-warden-disallowed-principal-deny`, +`fixture:ops-warden-missing-fingerprint-deny`. 
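The subject and principals/TTL rules under "Tenant and subject bindings" can be sketched as below. This is a minimal illustration, not flex-auth's actual implementation: `RegisteredResource`, `resolve_subject`, `deny_reason`, and the reason labels `principals_empty` and `principal_disallowed` are hypothetical names. Only `ttl_out_of_bounds` appears in the recorded smoke evidence. Treating an empty `WARDEN_POLICY_SUBJECT` as unset is also an assumption.

```python
import os
from dataclasses import dataclass


@dataclass
class RegisteredResource:
    """Hypothetical view of one registered ssh-certificate resource."""
    allowed_principals: set
    max_ttl_hours: int


def resolve_subject(actor_name, env=None):
    """Return WARDEN_POLICY_SUBJECT when set, otherwise the inventory actor name."""
    env = os.environ if env is None else env
    # Assumption: an empty value counts as "not set".
    return env.get("WARDEN_POLICY_SUBJECT") or actor_name


def deny_reason(resource, principals, ttl_hours):
    """Return a deny reason, or None when the request passes these checks."""
    if not principals:
        return "principals_empty"        # hypothetical label
    if not set(principals) <= resource.allowed_principals:
        return "principal_disallowed"    # hypothetical label
    if ttl_hours > resource.max_ttl_hours:
        return "ttl_out_of_bounds"       # reason seen in the smoke evidence
    return None
```

The deny fixtures listed above (unknown subject, actor type mismatch, missing fingerprint) cover further checks that this sketch does not model.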
+ +### Local smoke + +```bash +# flex-auth (from ~/flex-auth) +flex-auth serve --addr 127.0.0.1:8080 \ + --registry examples/ops-warden/registry_snapshot.json \ + --policy examples/ops-warden/policy_package.md \ + --log /tmp/flex-auth-ops-warden-decisions.jsonl + +# warden.yaml — policy.enabled: true, flex_auth_url pointing at flex-auth +# Use an actor registered in the flex-auth registry (example fixtures use +# template names; production needs a registry slice for real inventory actors). +``` + +Local end-to-end evidence: `history/2026-06-23-flex-auth-policy-gate-local-smoke.md`. + +### Production registry from inventory + +Build a flex-auth registry snapshot that mirrors `inventory.yaml` actors: + +```bash +python scripts/build_flex_auth_registry.py ~/.config/warden/inventory.yaml \ + -o registry/flex-auth/production_registry_snapshot.json +flex-auth load-registry --file registry/flex-auth/production_registry_snapshot.json +``` + +Re-run after adding or changing actors. Deploy the snapshot to the production +flex-auth runtime together with `~/flex-auth/examples/ops-warden/policy_package.md`. + +Smoke (non-secret): + +```bash +./scripts/policy_gate_production_smoke.sh +# OpenBao-backed when VAULT_TOKEN is valid: +SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh +``` + +Evidence: `history/2026-06-23-flex-auth-policy-gate-production-smoke.md`. + +--- + ## Production rollout -1. Deploy flex-auth policies for resource type `ssh-certificate`. -2. Enable `policy.enabled: true` in production `warden.yaml`. -3. Keep `fail_closed: true` unless an explicit break-glass procedure exists. -4. Verify `signatures.log` entries include `policy_decision_id`. +**Keep `policy.enabled: false` until flex-auth is reachable** at `policy.flex_auth_url`: +with `fail_closed: true`, an unreachable flex-auth blocks all signs. 
+ +### Operator checklist + +| Step | Owner | Action | +| --- | --- | --- | +| 1 | flex-auth | Deploy runtime; confirm `curl /healthz` → 200 (**FLEX-WP-0007**) | +| 2 | flex-auth | Load production registry + policy package (`~/flex-auth/examples/ops-warden/`) | +| 3 | ops-warden | Regenerate registry from inventory: `scripts/build_flex_auth_registry.py` | +| 4 | ops-warden | Local smoke: `./scripts/policy_gate_production_smoke.sh` | +| 5 | operator | Vault smoke: `SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh` (valid `VAULT_TOKEN`) | +| 6 | operator | Set `policy.flex_auth_url` in `~/.config/warden/warden.yaml` | +| 7 | operator | Set `policy.enabled: true`; keep `fail_closed: true` | +| 8 | operator | Allow smoke: `warden sign ` — `signatures.log` has `policy_decision_id` | +| 9 | operator | Deny smoke: e.g. `--ttl` above max — CLI shows flex-auth `reason`, no cert | + +Cross-repo references: + +- `~/flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md` +- `history/2026-06-23-flex-auth-production-pickup-suggestion.md` +- `history/2026-06-23-flex-auth-policy-gate-production-smoke.md` + +### Summary + +1. Deploy the flex-auth registry and policy package to the production flex-auth + runtime — **not** only the example fixtures. +2. Set `policy.flex_auth_url` to the production flex-auth base URL. +3. Set `policy.enabled: true` only after operator checklist steps 1–5 pass. +4. Keep `fail_closed: true` unless an explicit break-glass procedure exists. +5. Smoke allow and deny paths; preserve non-secret evidence only. --- @@ -117,5 +234,6 @@ defines **what the actor is allowed to request**. 
- `wiki/OpsWardenConfig.md` — full config reference - `wiki/CredentialRouting.md` +- `~/flex-auth/docs/ops-warden-policy-gate-handoff.md` — flex-auth handoff - `flex-auth/INTENT.md` - `net-kingdom/docs/platform-identity-security-architecture.md` \ No newline at end of file diff --git a/wiki/playbooks/operator-openbao-token-hygiene.md b/wiki/playbooks/operator-openbao-token-hygiene.md new file mode 100644 index 0000000..c821c48 --- /dev/null +++ b/wiki/playbooks/operator-openbao-token-hygiene.md @@ -0,0 +1,105 @@ +# Operator OpenBao Token Hygiene + +Date: 2026-06-24 +Workplan: WARDEN-WP-0013 T4 + +Daily `warden sign` against production OpenBao requires a **scoped** API token in +`VAULT_TOKEN` — not the cluster root token. + +--- + +## Rules + +| Rule | Rationale | +| --- | --- | +| Never commit `VAULT_TOKEN` | Tokens are secrets | +| Never paste tokens in chat, State Hub, or workplans | Same | +| Do not use root token for daily `warden sign` | Break-glass only | +| Prefer short-lived tokens | Limit blast radius | +| Refresh on HTTP 403 | Token expired or policy mismatch | + +--- + +## Scoped token for warden + +Production signing needs permission to call the SSH engine sign endpoint for the +roles mapped in `warden.yaml` (`adm-role`, `agt-role`, `atm-role`). + +Illustrative policy shape (create in OpenBao policy admin — adjust names to match +your cluster): + +```hcl +# warden-sign — least privilege for ops-warden CLI +path "ssh/sign/agt-role" { + capabilities = ["create", "update"] +} +path "ssh/sign/adm-role" { + capabilities = ["create", "update"] +} +path "ssh/sign/atm-role" { + capabilities = ["create", "update"] +} +``` + +Issue a token bound to `warden-sign` (operator procedure in `railiance-platform` / +OpenBao admin runbooks). 
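The illustrative `warden-sign` policy above repeats one stanza per role, so it can be generated from the role list rather than hand-edited. A minimal sketch follows, assuming the SSH engine is mounted at `ssh/` as in the example. `render_sign_policy` is a hypothetical helper, not part of ops-warden or OpenBao.

```python
def render_sign_policy(roles, mount="ssh"):
    """Render one least-privilege HCL path stanza per signing role."""
    blocks = []
    for role in roles:
        blocks.append(
            f'path "{mount}/sign/{role}" {{\n'
            '  capabilities = ["create", "update"]\n'
            '}'
        )
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Role names taken from the warden.yaml mapping described above.
    print(render_sign_policy(["agt-role", "adm-role", "atm-role"]), end="")
```

Generating the stanzas keeps the policy aligned with the roles in `warden.yaml` when a role is added or renamed. Adjust `mount` and the role names to match your cluster.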
+ +--- + +## Session pattern + +```bash +# Set for current shell only — do not add to ~/.bashrc with a literal token +export VAULT_TOKEN="" + +warden status agt-state-hub-bridge +warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub +``` + +`warden` reads the env var named in `vault.token_env` (default `VAULT_TOKEN`). + +--- + +## OIDC / interactive login + +For human operators, prefer platform OIDC login that yields a short-lived OpenBao +token instead of copying long-lived secrets. + +| Need | Route to | +| --- | --- | +| Interactive login, OIDC, MFA | key-cape / Keycloak — `warden route show key-cape-oidc-login` | + +ops-warden does not implement login; it documents the route only. + +--- + +## Troubleshooting + +| Symptom | Likely cause | Action | +| --- | --- | --- | +| `Vault token not found` | `VAULT_TOKEN` unset | Export scoped token | +| `HTTP 403` / `permission denied` | Expired token or insufficient policy | Re-issue `warden-sign` token | +| `Signing failed` + connection error | Wrong `vault.addr` or network | Check `warden.yaml`, tunnel/VPN | +| Suggest `--backend local` | OpenBao unreachable | Fix connectivity; local is lab-only | + +After fixing token issues, re-run: + +```bash +warden sign --pubkey +``` + +--- + +## Root token (break-glass only) + +Cluster root tokens bypass all policy. Use only for one-time engine setup +(`wiki/OpenBaoSshEngineChecklist.md` § One-time SSH engine setup), then revoke +from daily shell profile. 
+ +--- + +## See also + +- `wiki/OpenBaoSshEngineChecklist.md` +- `wiki/OpsWardenConfig.md` — Authentication section +- `examples/warden.production.example.yaml` \ No newline at end of file diff --git a/wiki/playbooks/ops-bridge-tunnel-cert.md b/wiki/playbooks/ops-bridge-tunnel-cert.md new file mode 100644 index 0000000..97ef26c --- /dev/null +++ b/wiki/playbooks/ops-bridge-tunnel-cert.md @@ -0,0 +1,121 @@ +# ops-bridge Tunnel — cert_command Migration + +Date: 2026-06-24 +Workplan: WARDEN-WP-0013 T3 +Catalog: `ops-bridge-tunnel` + +Migrate an ops-bridge tunnel from **static SSH keys** to **short-lived warden-signed +certificates** via the `cert_command` contract (`wiki/CertCommandInterface.md`). + +ops-warden documents the migration; **ops-bridge** owns tunnel config changes. + +--- + +## Prerequisites + +- [ ] Actor registered in `~/.config/warden/inventory.yaml` (see `wiki/ActorInventoryPatterns.md`) +- [ ] Actor keypair on disk (`ssh_key` private, `.pub` for signing) +- [ ] Production `warden.yaml` with `backend: vault` and valid scoped `VAULT_TOKEN` +- [ ] Host trusts warden/OpenBao CA (`railiance-infra` `bootstrap-ssh-ca`) +- [ ] Host principal allows the actor's principals (`railiance-infra` `ssh_principals.yaml`) + +--- + +## Pilot tunnel: `agt-state-hub-bridge` + +| Field | Value | +| --- | --- | +| Actor | `agt-state-hub-bridge` | +| Type | `agt` | +| Principals | `agt-task-bridge` | +| TTL | 24 h | +| Private key | `~/.ssh/agt-state-hub-bridge_ed25519` | +| Public key | `~/.ssh/agt-state-hub-bridge_ed25519.pub` | +| cert_command | `warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub` | + +### Pre-migration smoke (operator workstation) + +```bash +export VAULT_TOKEN="" # never commit or paste in chat +warden status agt-state-hub-bridge +warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub | head -1 +``` + +Confirm exit 0 and cert line starts with `ssh-ed25519-cert-v01@openssh.com`. 
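The pre-migration smoke above (exit 0, cert line starting with `ssh-ed25519-cert-v01@openssh.com`) can also be checked programmatically. This sketch uses only the `warden sign <actor> --pubkey <path>` invocation shown in this playbook. `is_ed25519_cert_line` and `smoke_sign` are hypothetical helper names, and a valid scoped `VAULT_TOKEN` is assumed to be exported in the environment.

```python
import os
import subprocess

EXPECTED_CERT_TYPE = "ssh-ed25519-cert-v01@openssh.com"


def is_ed25519_cert_line(line):
    """True when the line looks like '<cert-type> <base64> [comment]'."""
    fields = line.split()
    return len(fields) >= 2 and fields[0] == EXPECTED_CERT_TYPE


def smoke_sign(actor, pubkey_path):
    """Run `warden sign` and check the exit code and the first stdout line."""
    result = subprocess.run(
        ["warden", "sign", actor, "--pubkey", os.path.expanduser(pubkey_path)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return False
    lines = result.stdout.splitlines()
    return bool(lines) and is_ed25519_cert_line(lines[0])
```

For the pilot, `smoke_sign("agt-state-hub-bridge", "~/.ssh/agt-state-hub-bridge_ed25519.pub")` mirrors the shell commands. The helper never prints the certificate or the token, so its result is safe to record as non-secret evidence.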
+ +--- + +## Migration checklist + +### 1. Inventory and signing path + +- [ ] Actor exists: `warden inventory list` shows `agt-state-hub-bridge` +- [ ] `warden sign` succeeds with production OpenBao backend +- [ ] `signatures.log` records the sign (`~/.local/state/warden/signatures.log`) + +### 2. ops-bridge tunnel config + +Edit `~/.config/bridge/tunnels.yaml` (ops-bridge repo owns schema; example below): + +```yaml +tunnels: + state-hub-coulombcore: + host: coulombcore + remote_port: 8001 + local_port: 8000 + ssh_user: agt-state-hub-bridge + ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519 + actor: agt-state-hub-bridge + cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub" +``` + +- [ ] `cert_command` uses the **public** key path (warden reads pubkey, writes cert to stdout) +- [ ] `ssh_user` matches the certificate identity / host expectation +- [ ] Remove or disable static-key-only fallback once cert path is verified + +### 3. Host-side verification + +- [ ] Principal `agt-task-bridge` present in `railiance-infra` `ssh_principals.yaml` for target host +- [ ] Run `scripts/check_principals_drift.py` if inventory `hosts` section documents allowed principals + +### 4. Tunnel smoke + +```bash +# ops-bridge (from ops-bridge repo) +bridge status state-hub-coulombcore +bridge up state-hub-coulombcore +``` + +- [ ] Tunnel establishes without static cert file on disk +- [ ] Re-run `bridge up` after cert TTL expires — `cert_command` re-issues automatically + +### 5. Policy gate (optional, after FLEX-WP-0007) + +When `policy.enabled: true`, confirm `signatures.log` includes `policy_decision_id` +on tunnel-driven signs. See `wiki/PolicyGatedSigning.md`. + +--- + +## Rollback + +Keep the static key path until cert_command smoke passes. To roll back: + +1. Remove `cert_command` from tunnel config +2. Restore prior static-key or `CertificateFile` workflow +3. 
Document rollback in ops-bridge session notes (not in git secrets) + +--- + +## Static-key tunnels (legacy) + +Tunnels using `agt-claude-*` or other long-lived keys are **out of scope** for this +pilot. Migrate per-tunnel when ops-bridge owner prioritizes them. + +--- + +## See also + +- `wiki/CertCommandInterface.md` +- `wiki/OpsWardenConfig.md` — cert_command example +- `wiki/playbooks/operator-openbao-token-hygiene.md` +- `warden route show ops-bridge-tunnel --json` \ No newline at end of file diff --git a/workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md b/workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md deleted file mode 100644 index 385fd07..0000000 --- a/workplans/WARDEN-WP-0009-flex-auth-policy-gate-production.md +++ /dev/null @@ -1,65 +0,0 @@ ---- -id: WARDEN-WP-0009 -type: workplan -title: "flex-auth Policy Gate Production Readiness" -domain: infotech -repo: ops-warden -status: blocked -owner: codex -topic_slug: custodian -planning_priority: low -planning_order: 9 -created: "2026-06-18" -updated: "2026-06-18" -state_hub_workstream_id: "9213b262-e2f5-480e-a5bc-56635d5eb4c9" ---- - -# WARDEN-WP-0009 — flex-auth Policy Gate Production Readiness - -**Scope:** Enable and verify the opt-in flex-auth pre-sign gate (`policy.enabled`) -in production after flex-auth publishes `ssh-certificate` resource policies. - -**Out of scope:** flex-auth policy package authoring (flex-auth owner); OpenBao SSH -engine and host CA (complete — NET-WP-0020 T5 / WP-0008 T2). - -**Spun out from:** WARDEN-WP-0008 T5 (2026-06-18 closeout). 
- ---- - -## Tasks - -### T1 — flex-auth policy package confirmation - -```task -id: WARDEN-WP-0009-T01 -status: wait -priority: medium -state_hub_task_id: "f988ed2e-0f63-4e89-abc4-183a7f23ddc2" -``` - -- [ ] Confirm flex-auth policies for resource type `ssh-certificate` exist -- [ ] Document tenant/subject bindings for `adm` / `agt` / `atm` sign paths -- [ ] Coordinate with flex-auth owner on deny/allow test fixtures - -**Blocked until:** flex-auth publishes ssh-certificate policies. - -### T2 — Production enablement and smoke - -```task -id: WARDEN-WP-0009-T02 -status: wait -priority: medium -state_hub_task_id: "9d0fabc2-10ef-426d-a3d2-d4970d377029" -``` - -- [ ] Document operator steps to set `policy.enabled: true` (see `wiki/PolicyGatedSigning.md`) -- [ ] Smoke test allow path — `signatures.log` includes `policy_decision_id` -- [ ] Smoke test deny path with `fail_closed: true` (non-secret evidence) - ---- - -## See also - -- `wiki/PolicyGatedSigning.md` — gate flow and config (shipped WP-0007) -- `examples/warden.production.example.yaml` — `policy.enabled: false` default -- `history/2026-06-17-openbao-production-verify.md` — production sign evidence \ No newline at end of file diff --git a/workplans/WARDEN-WP-0012-routing-scenario-playbooks.md b/workplans/WARDEN-WP-0012-routing-scenario-playbooks.md index 012bc9a..6253f42 100644 --- a/workplans/WARDEN-WP-0012-routing-scenario-playbooks.md +++ b/workplans/WARDEN-WP-0012-routing-scenario-playbooks.md @@ -4,13 +4,13 @@ type: workplan title: "Routing Scenario Playbooks" domain: infotech repo: ops-warden -status: backlog +status: ready owner: codex topic_slug: custodian planning_priority: medium planning_order: 12 created: "2026-06-18" -updated: "2026-06-18" +updated: "2026-06-24" state_hub_workstream_id: "a7e712a0-02f8-4f83-944e-6b207e77bc4c" --- @@ -27,7 +27,7 @@ owner's procedure inside the catalog. **Depends on:** WARDEN-WP-0010 (charter + catalog schema), WARDEN-WP-0011 (routing CLI). 
-**Status:** `backlog` — start after WP-0010 T3 and WP-0011 T2 ship. +**Status:** `ready` — WP-0010 and WP-0011 shipped; parallel to WP-0013 integration closeout. --- diff --git a/workplans/archived/260623-WARDEN-WP-0009-flex-auth-policy-gate-production.md b/workplans/archived/260623-WARDEN-WP-0009-flex-auth-policy-gate-production.md new file mode 100644 index 0000000..94292a5 --- /dev/null +++ b/workplans/archived/260623-WARDEN-WP-0009-flex-auth-policy-gate-production.md @@ -0,0 +1,95 @@ +--- +id: WARDEN-WP-0009 +type: workplan +title: "flex-auth Policy Gate Production Readiness" +domain: infotech +repo: ops-warden +status: archived +owner: codex +topic_slug: custodian +planning_priority: low +planning_order: 9 +created: "2026-06-18" +updated: "2026-06-23" +state_hub_workstream_id: "9213b262-e2f5-480e-a5bc-56635d5eb4c9" +--- + +# WARDEN-WP-0009 — flex-auth Policy Gate Production Readiness + +**Scope:** Enable and verify the opt-in flex-auth pre-sign gate (`policy.enabled`) +in production after flex-auth publishes `ssh-certificate` resource policies. + +**Out of scope:** flex-auth policy package authoring (flex-auth owner — delivered +FLEX-WP-0006 2026-06-23); OpenBao SSH engine and host CA (complete — NET-WP-0020 +T5 / WP-0008 T2); in-cluster flex-auth deployment (continued in flex-auth +`FLEX-WP-0007`). + +**Spun out from:** WARDEN-WP-0008 T5 (2026-06-18 closeout). 
+ +--- + +## Tasks + +### T1 — flex-auth policy package confirmation + +```task +id: WARDEN-WP-0009-T01 +status: done +priority: medium +state_hub_task_id: "f988ed2e-0f63-4e89-abc4-183a7f23ddc2" +``` + +- [x] Confirm flex-auth policies for resource type `ssh-certificate` exist +- [x] Document tenant/subject bindings for `adm` / `agt` / `atm` sign paths +- [x] Coordinate with flex-auth owner on deny/allow test fixtures + +### T2 — Production enablement and smoke + +```task +id: WARDEN-WP-0009-T02 +status: done +priority: medium +state_hub_task_id: "9d0fabc2-10ef-426d-a3d2-d4970d377029" +``` + +- [x] Document operator steps to set `policy.enabled: true` (see `wiki/PolicyGatedSigning.md`) +- [x] Local smoke — allow/deny paths with `policy_decision_id` / `ttl_out_of_bounds` +- [x] Production registry slice from inventory (`registry/flex-auth/production_registry_snapshot.json`) +- [x] Production registry smoke — allow `agt-state-hub-bridge` (`decision:032b096c433ad80c`) +- [x] Production registry smoke — deny `--ttl 999` (`ttl_out_of_bounds`) + +--- + +## Deliverables + +| Artifact | Path | +| --- | --- | +| Registry builder | `scripts/build_flex_auth_registry.py` | +| Production registry | `registry/flex-auth/production_registry_snapshot.json` | +| Smoke runner | `scripts/policy_gate_production_smoke.sh` | +| Local smoke evidence | `history/2026-06-23-flex-auth-policy-gate-local-smoke.md` | +| Production smoke evidence | `history/2026-06-23-flex-auth-policy-gate-production-smoke.md` | +| flex-auth pickup brief | `history/2026-06-23-flex-auth-production-pickup-suggestion.md` | + +--- + +## Closeout (2026-06-23) + +T1–T2 complete. ops-warden caller side and production-registry smoke verified. +Production `policy.enabled: true` flip deferred until flex-auth runtime is +reachable — tracked in flex-auth `FLEX-WP-0007`, not this workplan. 
+ +**Operator follow-up (FLEX-WP-0007):** + +- Deploy registry + policy package to in-cluster flex-auth; set `policy.flex_auth_url` +- Refresh scoped `VAULT_TOKEN` and run `SMOKE_VAULT=1 ./scripts/policy_gate_production_smoke.sh` +- Set `policy.enabled: true` in `~/.config/warden/warden.yaml` when flex-auth is reachable + +--- + +## See also + +- `wiki/PolicyGatedSigning.md` +- `~/flex-auth/docs/ops-warden-policy-gate-handoff.md` +- `~/flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md` +- `examples/warden.production.example.yaml` \ No newline at end of file diff --git a/workplans/WARDEN-WP-0010-access-routing-charter.md b/workplans/archived/260624-WARDEN-WP-0010-access-routing-charter.md similarity index 98% rename from workplans/WARDEN-WP-0010-access-routing-charter.md rename to workplans/archived/260624-WARDEN-WP-0010-access-routing-charter.md index d6cbbaf..cab86ae 100644 --- a/workplans/WARDEN-WP-0010-access-routing-charter.md +++ b/workplans/archived/260624-WARDEN-WP-0010-access-routing-charter.md @@ -4,13 +4,13 @@ type: workplan title: "Access Routing — Charter and Pointer Catalog" domain: infotech repo: ops-warden -status: done +status: archived owner: codex topic_slug: custodian planning_priority: high planning_order: 10 created: "2026-06-18" -updated: "2026-06-18" +updated: "2026-06-24" state_hub_workstream_id: "e93de9fd-0192-4d02-bb7c-5e859fb76b9b" --- @@ -169,3 +169,8 @@ state_hub_task_id: "3335a689-922c-4319-98d0-4263ab13790b" - `history/2026-06-18-access-routing-intent-shift-assessment.md` — decision record - `WARDEN-WP-0011` — routing CLI - `WARDEN-WP-0012` — scenario playbook expansion (backlog) +--- + +## Closeout (2026-06-24) + +Archived during WARDEN-WP-0013 T2. All tasks complete. 
diff --git a/workplans/WARDEN-WP-0011-routing-guide-cli.md b/workplans/archived/260624-WARDEN-WP-0011-routing-guide-cli.md similarity index 97% rename from workplans/WARDEN-WP-0011-routing-guide-cli.md rename to workplans/archived/260624-WARDEN-WP-0011-routing-guide-cli.md index 1d9c6b9..e08c63d 100644 --- a/workplans/WARDEN-WP-0011-routing-guide-cli.md +++ b/workplans/archived/260624-WARDEN-WP-0011-routing-guide-cli.md @@ -4,13 +4,13 @@ type: workplan title: "Routing Lookup CLI" domain: infotech repo: ops-warden -status: done +status: archived owner: codex topic_slug: custodian planning_priority: high planning_order: 11 created: "2026-06-18" -updated: "2026-06-18" +updated: "2026-06-24" state_hub_workstream_id: "0a520f8e-01b4-48f1-9af3-2f3f69fd0672" --- @@ -154,3 +154,8 @@ state_hub_task_id: "bf848375-eca7-4116-bb1d-fb7df6395c70" - `WARDEN-WP-0010` — charter and catalog schema - `WARDEN-WP-0012` — expanded per-scenario playbooks - `history/2026-06-17-intent-scope-assessment.md` — prior `warden guide` proposal (P4) +--- + +## Closeout (2026-06-24) + +Archived during WARDEN-WP-0013 T2. All tasks complete. 
diff --git a/workplans/archived/260624-WARDEN-WP-0013-production-integration-and-stewardship-closeout.md b/workplans/archived/260624-WARDEN-WP-0013-production-integration-and-stewardship-closeout.md new file mode 100644 index 0000000..5d5eedf --- /dev/null +++ b/workplans/archived/260624-WARDEN-WP-0013-production-integration-and-stewardship-closeout.md @@ -0,0 +1,202 @@ +--- +id: WARDEN-WP-0013 +type: workplan +title: "Production Integration & Stewardship Closeout" +domain: infotech +repo: ops-warden +status: archived +owner: codex +topic_slug: custodian +planning_priority: high +planning_order: 13 +depends_on_workplans: + - WARDEN-WP-0008 + - WARDEN-WP-0009 + - WARDEN-WP-0010 + - WARDEN-WP-0011 +related_workplans: + - WARDEN-WP-0012 + - FLEX-WP-0007 +created: "2026-06-24" +updated: "2026-06-24" +state_hub_workstream_id: "4678c41a-c1d0-48cd-9988-4ea0380e8258" +--- + +# WARDEN-WP-0013 — Production Integration & Stewardship Closeout + +## Purpose + +Close the remaining **ops-warden-owned** gaps after policy gate and routing shipped: +refresh INTENT/SCOPE canon, archive finished workplans, document ops-bridge +`cert_command` migration, operator OpenBao token hygiene, principals drift checks, +and the policy-gate production flip checklist. + +This workplan addresses the deferred **Production SSH Integration Closeout** strand +from `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` §6, updated for +post-WP-0009 state. 
+ +**Gap analysis:** `history/2026-06-24-intent-scope-gap-analysis.md` + +## Scope + +- Post-WP-0009 reassessment and SCOPE alignment +- Archive hygiene for WP-0010 and WP-0011 +- ops-bridge `cert_command` migration documentation (pilot `agt-state-hub-bridge`) +- Operator runbook for scoped OpenBao tokens (no root in `VAULT_TOKEN`) +- Principals drift check between warden inventory and railiance-infra +- Policy gate production enablement checklist (coordinate FLEX-WP-0007) + +## Out of scope + +- flex-auth runtime deployment (flex-auth **FLEX-WP-0007**) +- ops-bridge tunnel config changes in the ops-bridge repo (coordinate only) +- Routing scenario playbook expansion (**WARDEN-WP-0012** — parallel track) +- OpenBao cluster deploy, flex-auth policy authoring, NK-WP-0009 tutorial +- Implementing secret vending or foreign API proxies + +## Ownership boundary + +| Concern | Owner | +| --- | --- | +| cert_command migration playbook | ops-warden (doc); ops-bridge (tunnel config) | +| OpenBao token hygiene runbook | ops-warden (doc); operator (execution) | +| Principals drift | ops-warden (check doc/script); railiance-infra (host deploy) | +| `policy.enabled: true` flip | operator (after FLEX-WP-0007) | + +--- + +## T1 — Post-gap reassessment and SCOPE refresh + +```task +id: WARDEN-WP-0013-T01 +status: done +priority: high +state_hub_task_id: "de46f9a2-bf11-4651-a23c-430c63f396c8" +``` + +- [x] Write `history/2026-06-24-intent-scope-gap-analysis.md` +- [x] Update `SCOPE.md` active workplan table (WP-0013, WP-0012 ready) +- [x] Note maturity vector and partial INTENT criterion (ops-bridge) in SCOPE + +**Acceptance:** Gap analysis on file; SCOPE reflects 2026-06-24 repo state. 
+ +--- + +## T2 — Archive hygiene (WP-0010, WP-0011) + +```task +id: WARDEN-WP-0013-T02 +status: done +priority: medium +state_hub_task_id: "1b35321d-63ad-40da-a1aa-0b66190a0733" +``` + +- [x] Move `WARDEN-WP-0010-access-routing-charter.md` to + `workplans/archived/260624-WARDEN-WP-0010-access-routing-charter.md` +- [x] Move `WARDEN-WP-0011-routing-guide-cli.md` to + `workplans/archived/260624-WARDEN-WP-0011-routing-guide-cli.md` +- [x] Set frontmatter `status: archived` on both; add closeout notes +- [x] Operator runs `make fix-consistency REPO=ops-warden` from `~/state-hub` + +**Acceptance:** Only WP-0012 (ready) and WP-0013 (active when started) remain in +`workplans/` root; hub synced. + +--- + +## T3 — ops-bridge cert_command migration playbook + +```task +id: WARDEN-WP-0013-T03 +status: done +priority: high +state_hub_task_id: "ad8588b2-9ae9-4f94-bd77-8025851a38f5" +``` + +- [x] Write `wiki/playbooks/ops-bridge-tunnel-cert.md` — static-key → `cert_command` + migration checklist for tunnel configs +- [x] Document pilot tunnel `agt-state-hub-bridge`: actor, pubkey path, cert_command + string, inventory prerequisites +- [x] Upgrade catalog entry `ops-bridge-tunnel` `wiki_ref` to the new playbook +- [x] Coordinate with ops-bridge owner for pilot tunnel config change (State Hub message) +- [ ] Record non-secret smoke evidence when pilot completes (`history/` entry — pending ops-bridge) + +**Acceptance:** Playbook exists; catalog points at it; pilot steps documented even +if ops-bridge execution is pending. + +**Unlocks:** INTENT success criterion #3 moves from partial toward met. 
+ +--- + +## T4 — Operator OpenBao token hygiene runbook + +```task +id: WARDEN-WP-0013-T04 +status: done +priority: medium +state_hub_task_id: "5cb35829-32eb-4d59-97a1-f4d92ce8e239" +``` + +- [x] Add `wiki/playbooks/operator-openbao-token-hygiene.md` covering scoped tokens, + `VAULT_TOKEN` session pattern, OIDC route, HTTP 403 recovery +- [x] Cross-link from `wiki/OpsWardenConfig.md` and production example yaml + +**Acceptance:** Operator can follow runbook without asking ops-warden for token values. + +--- + +## T5 — Principals inventory drift check + +```task +id: WARDEN-WP-0013-T05 +status: done +priority: medium +state_hub_task_id: "4025cd32-89f8-42c3-b1e8-eaf78497d91f" +``` + +- [x] `scripts/check_principals_drift.py` compares inventory `hosts` vs + `railiance-infra/ansible/inventory/ssh_principals.yaml` +- [x] Script notes flex-auth registry regeneration via `build_flex_auth_registry.py` +- [x] Tests in `tests/test_principals_drift.py` + +**Acceptance:** Drift check runnable or documented; no secret material in script output. + +--- + +## T6 — Policy gate production enablement checklist + +```task +id: WARDEN-WP-0013-T06 +status: done +priority: medium +state_hub_task_id: "51663f65-79cb-4108-87c8-9721f9476259" +``` + +- [x] Operator checklist in `wiki/PolicyGatedSigning.md` § Production rollout +- [x] Cross-link FLEX-WP-0007 and pickup brief +- [x] Explicit: keep `policy.enabled: false` until flex-auth reachable + +**Acceptance:** Operator checklist is sequential and references cross-repo owners; +no ops-warden code changes required for flex-auth deploy. 
+ +--- + +## Exit criteria + +- Gap analysis and SCOPE current +- WP-0010 and WP-0011 archived +- ops-bridge cert_command playbook + catalog upgrade +- Operator token hygiene runbook +- Principals drift procedure +- Policy gate production flip checklist (coordinate FLEX-WP-0007) + +## Parallel track + +**WARDEN-WP-0012** (routing scenario playbooks) — promoted to `ready`; start when +P1 integration doc bandwidth allows or in parallel if staffed. + +## See also + +- `history/2026-06-24-intent-scope-gap-analysis.md` +- `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` +- `wiki/CertCommandInterface.md` +- `~/flex-auth/workplans/FLEX-WP-0007-ops-warden-policy-gate-production-deployment.md` \ No newline at end of file