feat: close WP-0009/WP-0013 production integration stewardship strand

Ship flex-auth policy gate registry and smoke evidence, archive WP-0009
through WP-0013, and add integration docs: ops-bridge cert_command
migration playbook, operator OpenBao token hygiene, principals drift
check script, and 2026-06-24 INTENT/SCOPE gap analysis.
2026-06-24 12:44:32 +02:00
parent 1778b169da
commit 90007c2cda
24 changed files with 2192 additions and 121 deletions


@@ -15,21 +15,26 @@ aligned with NetKingdom canon.
---
## Where we are (2026-06-18)
## Where we are (2026-06-24)
ops-warden **issues short-lived SSH certificates and routes every other credential
need to the subsystem that owns it.** SSH signing is **production-verified** on
Railiance OpenBao (`warden sign` against `https://bao.coulomb.social`, host CA trust
deployed). The routing material — `wiki/AccessRouting.md`, the credential routing
wiki, NetKingdom security map, a machine-readable pointer catalog
(`registry/routing/catalog.yaml`, WARDEN-WP-0010), and the `warden route`
lookup CLI over it (`list`/`show`/`find`, WARDEN-WP-0011) — is operational. The opt-in
flex-auth pre-sign gate is **coded but off in production** until flex-auth publishes
`ssh-certificate` policies (WARDEN-WP-0009).
deployed).
**Access routing** is shipped: `wiki/AccessRouting.md`, credential routing wiki,
NetKingdom security map, machine-readable pointer catalog
(`registry/routing/catalog.yaml`, WP-0010), and `warden route` lookup CLI
(`list`/`show`/`find`, `--json`, WP-0011).
**Policy gate** is shipped on the caller side (WP-0007) with production registry
and smoke evidence (WP-0009 archived). flex-auth published the `ssh-certificate`
policy package (FLEX-WP-0006). `policy.enabled` remains **false** in production
until flex-auth is deployed to a reachable URL (flex-auth FLEX-WP-0007).
**INTENT alignment:** SSH issuance mission met in production. Remaining distance
is integration breadth (ops-bridge `cert_command` on live tunnels), authorization
depth (flex-auth), and operator hygiene — not missing signing code.
is integration breadth (ops-bridge `cert_command` on live tunnels), flex-auth
runtime deployment (not ops-warden code), and operator hygiene.
### Issue vs route
@@ -47,7 +52,9 @@ ops-warden executes exactly one lane and points at the owner for the rest.
Full role and boundary: `wiki/AccessRouting.md`. The catalog is a **pointer layer**:
it never restates an owner's procedure (authored `steps` exist only for the SSH lane).
Full gap analysis: `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`
Gap analysis: `history/2026-06-24-intent-scope-gap-analysis.md` (current);
`history/2026-06-18-post-wp0008-intent-scope-reassessment.md` (SSH lane);
`history/2026-06-18-access-routing-intent-shift-assessment.md` (routing charter).
---
@@ -66,8 +73,8 @@ Full gap analysis: `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`
| Dimension | Level | Meaning today |
| --- | --- | --- |
| D5 | Discovery | Routing wiki + security map + pointer catalog + NK canon cross-links |
| A4 | Availability | CLI + opt-in policy gate + `warden route` lookup over the machine-readable catalog (`list`/`show`/`find`, `--json` for agents) |
| C4 | Completeness | SSH lane prod-verified; flex-auth policies external |
| A4 | Availability | CLI + `warden route` + opt-in policy gate + agent `--json` lookup |
| C4 | Completeness | SSH lane prod-verified; policy gate + registry smoke shipped; prod flip waits on flex-auth deploy |
| R3 | Reliability | Live OpenBao sign evidence on Railiance |
---
@@ -75,9 +82,9 @@ Full gap analysis: `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`
## Core Idea
**Today:** implements the SSH certificate lane from `wiki/AccessManagementDirective.md`
§§1–5: CA signing, actor inventory, TTL policy, cert-side scorecard, and the
`cert_command` interface for ops-bridge. Production path uses OpenBao SSH engine
(`backend: vault`).
§§1–5: CA signing, actor inventory, TTL policy, cert-side scorecard, optional
flex-auth pre-sign gate, and the `cert_command` interface for ops-bridge. Production
path uses OpenBao SSH engine (`backend: vault`).
**Direction (INTENT):** issue short-lived SSH certificates and route dev workers to
key-cape, flex-auth, OpenBao, ops-bridge, and railiance components for everything
@@ -96,6 +103,10 @@ for the rest.
- `cert_command`: `warden sign <actor> --pubkey <path>` → cert on stdout
- TTL enforcement per `ActorType` (`adm` 48 h, `agt` 24 h, `atm` 8 h)
- `warden status`, cleanup, scorecard, signatures log
- Opt-in flex-auth policy gate (`policy.enabled`, `policy_decision_id` in log)
- Production flex-auth registry builder (`scripts/build_flex_auth_registry.py`,
`registry/flex-auth/production_registry_snapshot.json`)
- Policy gate smoke runner (`scripts/policy_gate_production_smoke.sh`)
- `warden route` lookup CLI (`list`/`show`/`find`, `--json`) over the pointer catalog
- `warden issue` and `ops-ssh-wrapper` (local backend; vault uses sign-only)
- Runbooks for OpenBao config and Inter-Hub bootstrap SSH envelope
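
The per-`ActorType` TTL limits listed above can be illustrated with a minimal sketch. It is not the ops-warden implementation: the helper names are hypothetical, the `<type>-<name>` actor naming is an assumption inferred from names such as `agt-claude-*`, and clamping an over-long request (rather than rejecting it) is also an assumption.

```python
from datetime import timedelta

# Maximum certificate lifetimes per actor type, as listed in this document.
MAX_TTL = {
    "adm": timedelta(hours=48),
    "agt": timedelta(hours=24),
    "atm": timedelta(hours=8),
}


def actor_type(actor: str) -> str:
    """Return the actor type, assuming names look like '<type>-<name>' (hypothetical convention)."""
    prefix = actor.split("-", 1)[0]
    if prefix not in MAX_TTL:
        raise ValueError(f"unknown actor type for {actor!r}")
    return prefix


def effective_ttl(actor: str, requested: timedelta) -> timedelta:
    """Clamp a requested TTL to the maximum allowed for the actor's type (assumed behaviour)."""
    if requested <= timedelta(0):
        raise ValueError("requested TTL must be positive")
    return min(requested, MAX_TTL[actor_type(actor)])
```

Under these assumptions, an `agt` actor asking for 72 hours would receive a 24-hour certificate, while an `atm` actor asking for 2 hours would keep its shorter request.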
@@ -105,38 +116,38 @@ for the rest.
- NetKingdom security routing guidance — which subsystem owns which credential type
- Wiki and config references aligned with OpenBao-first platform standard
- Capability registry entry for SSH certificate issuance
- Routing pointer catalog (`registry/routing/catalog.yaml`)
- Keeping ops access patterns consistent with `net-kingdom` platform architecture
### Shipped workplans
### Shipped workplans (archived)
| WP | Focus |
| --- | --- |
| WP-0001–0005 | Initial CLI, quality, hygiene, OpenBao docs, hub sync |
| WP-0006 | Credential routing, security map, inventory patterns, OpenBao checklist |
| WP-0007 | Opt-in flex-auth policy gate (`policy.enabled`) |
| WP-0008 | Production sign verification, stewardship closeout, archive hygiene |
| WP-0010 | "Issue SSH, route the rest" wording + `wiki/AccessRouting.md` + pointer catalog |
| WP-0011 | `warden route` lookup CLI (`list`/`show`/`find`) over the pointer catalog (A3 → A4) |
| WP-0009 | flex-auth registry + policy smoke; pickup brief for FLEX-WP-0007 |
| WP-0010 | Access routing charter + pointer catalog |
| WP-0011 | `warden route` lookup CLI |
| WP-0013 | Production integration closeout — cert_command playbook, token hygiene, principals drift |
### Active / wait
### Active / ready
| WP | Status | Focus |
| --- | --- | --- |
| **WP-0009** | `blocked` | flex-auth `ssh-certificate` policies + `policy.enabled` production smoke |
| **WP-0012** | `backlog` | Routing scenario playbooks (draft until owner paths ship) |
| **WP-0012** | `ready` | Routing scenario playbooks (catalog + wiki expansion) |
### Known gaps (not yet workplanned)
### Known gaps (not ops-warden workplans)
| Gap | Owner | Notes |
| --- | --- | --- |
| ops-bridge `cert_command` on live tunnels | ops-bridge | Tunnels use `agt-claude-*` static keys today |
| Operator token hygiene | Operator | Prefer OIDC + `warden-sign`; retire root from shell profile |
| Principals sync warden ↔ railiance-infra | ops-warden + infra | `inventory.yaml` hosts vs `ssh_principals.yaml` |
| flex-auth production runtime + registry deploy | flex-auth | **FLEX-WP-0007** — unblocks `policy.enabled: true` |
| Vault-backed policy gate joint smoke | flex-auth + operator | Needs valid scoped `VAULT_TOKEN` |
| ops-bridge `cert_command` on live tunnels | ops-bridge | Playbook shipped (`wiki/playbooks/ops-bridge-tunnel-cert.md`); pilot pending |
| Principals sync warden ↔ railiance-infra | ops-warden + infra | `scripts/check_principals_drift.py` — operator runs periodically |
| NK-WP-0009 joint SSH tutorial | net-kingdom | Parallel coordination track |
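
The principals drift row above compares what the warden inventory expects with what is deployed on hosts. The sketch below shows one way such a comparison could work on already-parsed data. It does not describe `scripts/check_principals_drift.py`: the function name, the host-to-principals mapping shape, and the result keys are all assumptions, since the layouts of `inventory.yaml` and `ssh_principals.yaml` are not shown here.

```python
def principals_drift(
    inventory: dict[str, set[str]],
    deployed: dict[str, set[str]],
) -> dict[str, dict[str, set[str]]]:
    """Report, per host, principals that differ between inventory and deployment.

    Both arguments map a host name to its set of principals (assumed shape).
    Hosts whose principal sets match are omitted from the result.
    """
    drift: dict[str, dict[str, set[str]]] = {}
    for host in sorted(set(inventory) | set(deployed)):
        want = inventory.get(host, set())
        have = deployed.get(host, set())
        if want != have:
            drift[host] = {
                "missing_on_host": want - have,      # in inventory, not deployed
                "unexpected_on_host": have - want,   # deployed, not in inventory
            }
    return drift
```

An empty result would mean the two sides agree; any entry names the host and the direction of the mismatch, which is enough to decide whether the inventory or the host needs updating.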
The integration-closeout strand (ops-bridge tunnel migration, token runbook) from
reassessment §6 is not yet workplanned; WARDEN-WP-0010 was used for the access-routing
charter instead. Open a new WP when tunnel migration becomes priority.
---
## Out of Scope
@@ -145,6 +156,7 @@ charter instead. Open a new WP when tunnel migration becomes priority.
with flex-auth policy where required; ops-warden documents paths only
- Identity / OIDC / MFA → key-cape, Keycloak
- Authorization policy decisions → flex-auth
- flex-auth runtime deployment → flex-auth (`FLEX-WP-0007`)
- Tunnel lifecycle → `ops-bridge`
- Host principal deployment → `railiance-infra`
- OpenBao / Vault cluster deployment → `railiance-platform`
@@ -157,10 +169,12 @@ charter instead. Open a new WP when tunnel migration becomes priority.
- Issuing or refreshing an **SSH cert** for `adm`/`agt`/`atm`
- A dev worker needs to know **where to get credentials** in the NetKingdom stack
- An agent needs **`warden route find`** instead of re-deriving routing from wiki prose
- `ops-bridge` needs a `cert_command` for a tunnel
- Adding actors to the principals inventory
- Adding actors to the principals inventory (regenerate flex-auth registry snapshot)
- Inter-Hub or bootstrap tasks need a **short-lived agent SSH envelope**
- Checking cert-side compliance (scorecard)
- Enabling or testing the opt-in flex-auth policy gate
---
@@ -177,9 +191,12 @@ charter instead. Open a new WP when tunnel migration becomes priority.
- **SSH CLI:** v0.1.0 — local + OpenBao backends
- **Production sign:** verified 2026-06-18 (`history/2026-06-17-openbao-production-verify.md`)
- **Policy gate:** shipped, `policy.enabled: false` in prod until WP-0009
- **Active workplan:** WP-0009 (wait — flex-auth)
- **Latest assessment:** `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`
- **Access routing:** WP-0010 + WP-0011 shipped (`warden route`, pointer catalog)
- **Policy gate:** caller shipped (WP-0007); registry + smoke complete (WP-0009 archived).
`policy.enabled: false` until flex-auth reachable (`FLEX-WP-0007`)
- **Ready work:** WP-0012 (routing playbooks)
- **Integration docs:** cert_command migration, token hygiene, principals drift (`wiki/playbooks/`)
- **Latest assessment:** `history/2026-06-24-intent-scope-gap-analysis.md`
---
@@ -195,7 +212,8 @@ key-cape / Keycloak identity claims
```
Upstream: OpenBao SSH engine (production) or local CA (labs). Actor inventory in
operator config or Git-tracked patterns.
operator config or Git-tracked patterns. flex-auth registry snapshot derived from
inventory when policy gate is enabled.
Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operators.
@@ -207,6 +225,7 @@ Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operato
- `cert_command`: shell command returning a cert on stdout
- `inventory.yaml`: actor → principals + TTL registry
- `LocalCA` / `VaultCA`: signing backends (`backend: local` | `vault`)
- Pointer catalog: `registry/routing/catalog.yaml` — subsystem ownership lookup only
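
The `cert_command` contract above (a shell command that returns a cert on stdout) can be sketched from the consumer side. This is an illustration only, not ops-bridge code: `fetch_cert` and `FAKE_SIGNER` are hypothetical names, and the check on the key-type field is an assumed sanity test based on the OpenSSH certificate type suffix.

```python
import subprocess
import sys

# Stand-in signer for local experiments (hypothetical); a real caller would pass
# something shaped like: ["warden", "sign", "<actor>", "--pubkey", "<path>"].
FAKE_SIGNER = [
    sys.executable,
    "-c",
    "print('ssh-ed25519-cert-v01@openssh.com AAAAexample agt-example')",
]


def fetch_cert(cert_command: list[str], timeout: float = 30.0) -> str:
    """Run a cert_command and return the certificate it prints on stdout."""
    result = subprocess.run(
        cert_command, capture_output=True, text=True, timeout=timeout, check=False
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"cert_command failed ({result.returncode}): {result.stderr.strip()}"
        )
    cert = result.stdout.strip()
    # OpenSSH certificate key types end in "-cert-v01@openssh.com".
    if "-cert-v01@openssh.com" not in cert.split(" ", 1)[0]:
        raise ValueError("stdout does not look like an OpenSSH certificate")
    return cert
```

Keeping the contract to "exit code plus stdout" means a consumer does not need to know which signing backend (`local` or `vault`) produced the certificate.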
---
@@ -218,7 +237,7 @@ Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operato
| `ops-bridge` | Primary cert_command consumer |
| `railiance-infra` | Host-side SSH principals and hardening |
| `railiance-platform` | OpenBao deployment and platform secrets |
| `flex-auth` | Authorization; opt-in pre-sign policy gate (`policy.enabled`) |
| `flex-auth` | Authorization; policy package shipped (FLEX-WP-0006); runtime deploy FLEX-WP-0007 |
| `key-cape` | Identity / IAM Profile lightweight mode |
| `state-hub` | Workstream registry |
@@ -243,14 +262,17 @@ keywords: [ssh, certificate, ca, credential, warden, ops-warden, pki, openbao, v
| --- | --- |
| `INTENT.md` | Why ops-warden exists and where it is going |
| `SCOPE.md` | What is implemented today (this file) |
| `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` | Latest INTENT ↔ SCOPE gap analysis |
| `wiki/AccessRouting.md` | What ops-warden issues vs routes (role and boundary) |
| `wiki/CredentialRouting.md` | Which subsystem for each credential need |
| `registry/routing/catalog.yaml` | Machine-readable routing pointer catalog |
| `wiki/NetKingdomSecurityMap.md` | Platform security component map |
| `examples/warden.production.example.yaml` | Production warden.yaml template |
| `wiki/PolicyGatedSigning.md` | flex-auth opt-in gate + registry rollout |
| `wiki/AccessManagementDirective.md` | SSH actor model |
| `wiki/OpsWardenConfig.md` | warden.yaml and OpenBao |
| `wiki/CertCommandInterface.md` | cert_command contract |
| `wiki/PolicyGatedSigning.md` | flex-auth opt-in gate |
| `history/2026-06-24-intent-scope-gap-analysis.md` | Current gap analysis + WP-0013 |
| `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` | SSH lane gap analysis |
| `history/2026-06-18-access-routing-intent-shift-assessment.md` | Routing charter decision |
| `history/2026-06-23-flex-auth-policy-gate-production-smoke.md` | Policy gate smoke evidence |
| `net-kingdom/docs/platform-identity-security-architecture.md` | Platform security canon |