diff --git a/INTENT.md b/INTENT.md index 4112aea..506063c 100644 --- a/INTENT.md +++ b/INTENT.md @@ -40,18 +40,23 @@ short-lived certificate lane** it owns. > *Where we are going.* -ops-warden aims to become the **operational access desk** for the ops fleet: +ops-warden **issues short-lived SSH certificates and routes every other credential +need to the subsystem that owns it.** It is not a desk that wraps the platform; it +owns one lane and points at the rest: 1. **Know** the NetKingdom security model — identity, authorization, secrets, SSH access, tunnels, bootstrap custody, and tenant/platform boundaries. 2. **Route** workers to the correct subsystem for each credential type instead - of becoming a universal secret vending machine. + of becoming a universal secret vending machine — through the wiki and a + machine-readable routing catalog that *points at* the owner's docs rather than + restating them. 3. **Align** runbooks, wiki, inventory patterns, and scorecard checks with NetKingdom canon as the platform evolves (OpenBao-first, flex-auth policy, key-cape IAM Profile, railiance deployment layers). 4. **Issue** short-lived SSH certificates for `adm` / `agt` / `atm` actors when host or ops reachability requires the SSH lane — via `warden sign`, - `cert_command`, and `ops-ssh-wrapper`. + `cert_command`, and `ops-ssh-wrapper`. This is the **only** lane ops-warden + executes. 5. **Audit** SSH signing operations and cert-side compliance so gatekeeping is observable, not tribal knowledge. @@ -151,7 +156,7 @@ Every successful SSH sign is auditable (`signatures.log`). Compliance checks Development worker needs access | v -ops-warden (steward / desk) +ops-warden (issue SSH; route the rest) | +-- SSH host / ops reachability? ----> warden sign / cert_command | @@ -164,9 +169,9 @@ ops-warden (steward / desk) +-- Tunnel only? --------------------> ops-bridge + cert_command ``` -Today the **steward desk** is primarily documentation, runbooks, and the -implemented SSH CLI. 
Routing automation and policy-gated issuance are intentional -follow-ups, not current promises. +Today the steward role is primarily documentation, runbooks, and the implemented +SSH CLI. The machine-readable routing catalog and `warden route` lookup, plus +policy-gated issuance, are intentional follow-ups, not current promises. --- @@ -207,6 +212,8 @@ ops-warden is succeeding when: - Replacing OpenBao, flex-auth, key-cape, or railiance deployment ownership - Storing Inter-Hub, LLM provider, or other long-lived API keys - Host-side SSH configuration deployment +- **Duplicating or restating another subsystem's procedure** — routing material + points at the owner's docs; it does not fork them - SSO / Teleport at scale (trigger per Access Management Directive §6.2) --- diff --git a/history/2026-06-18-access-routing-intent-shift-assessment.md b/history/2026-06-18-access-routing-intent-shift-assessment.md new file mode 100644 index 0000000..b66520a --- /dev/null +++ b/history/2026-06-18-access-routing-intent-shift-assessment.md @@ -0,0 +1,105 @@ +# Decision Record — Sharpen "steward" into "issue SSH, route the rest" + +**Date:** 2026-06-18 +**Author:** codex +**Status:** Accepted. Feeds WARDEN-WP-0010 T1. +**Supersedes:** the earlier "operations security coach" draft (rejected — see below). + +--- + +## 1. The decision + +Keep ops-warden's mission exactly as it is in production and sharpen only the +wording: **ops-warden issues short-lived SSH certificates and routes every other +credential need to the subsystem that owns it.** Add a small machine-readable +routing catalog and a `warden route` lookup CLI so agents stop re-deriving routing +from wiki prose. + +This is **wording plus a thin lookup surface**, not a new security lane. SSH +issuance stays the only thing ops-warden executes. 
+ +| | Before | After | +| --- | --- | --- | +| Framing | "operational access steward / desk" | "issues SSH certs; routes the rest to its owner" | +| Non-SSH creds | document paths in wiki | same wiki + structured catalog pointing at it | +| Lookup | grep the wiki | `warden route find/show` | +| Foreign APIs | not owned | explicitly not proxied or restated | + +Maturity moves **Availability A3 → A4** (structured lookup for agents). Completeness +and Reliability for the SSH lane are unchanged — nothing here ships new signing code. + +--- + +## 2. Why not "coach" + +An earlier draft framed this as an "operations security coach." Rejected: + +- **Overpromises.** What is built is a routing directory — lookup, not pedagogy. + "Coach" implies teaching and an ongoing relationship the CLI does not deliver, + which feeds the "agent stops at the lookup and never learns the subsystem" + failure mode. +- **Generic / collision-prone** across other custodian domains. +- **No new metaphor needed.** "Steward who issues SSH and routes the rest" is + already accurate and harder to misread as a wrapping service. + +Command verb is `warden route` (concrete), not `warden coach`. + +--- + +## 3. The double-source-of-truth trap, and how we avoid it + +A routing catalog risks becoming a hand-maintained fork of net-kingdom's +responsibility map. A stale-but-authoritative-looking catalog is **worse** than +wiki prose, because an agent trusts structured output and will not second-guess it. + +**Rule (binding on WP-0010 T3 / enforced by WP-0011 T5):** the catalog is a +*pointer layer*. For any subsystem ops-warden does not own, an entry carries only +identifiers + `owner_repo` + `wiki_ref` (in-repo authoritative section) + +`canon_ref` (upstream net-kingdom doc) — **no restated procedure**. Procedure is +authored in exactly one place per need: the wiki section it points to. ops-warden +authors `steps` for exactly one lane — SSH issuance — because it owns it. 
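The pointer rule only holds if a `wiki_ref` actually lands on a real section. A minimal sketch of such a check follows, assuming a `wiki_ref` has the form `path#anchor` and that anchors follow a GitHub-style heading slug; neither is confirmed by the catalog schema. The names `heading_slugs` and `anchor_resolves` are hypothetical.

```python
import re


def heading_slugs(markdown_text: str) -> set[str]:
    """Collect heading slugs from a Markdown page.

    Assumption: lowercase, punctuation dropped, spaces turned into hyphens.
    Limitation: lines starting with '#' inside fenced code also count.
    """
    slugs = set()
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if not match:
            continue
        text = re.sub(r"[^\w\- ]", "", match.group(1).lower())
        slugs.add(text.replace(" ", "-"))
    return slugs


def anchor_resolves(wiki_ref: str, pages: dict[str, str]) -> bool:
    """True when wiki_ref names a known page and an existing heading anchor.

    `pages` maps an in-repo path to the page text (hypothetical input shape).
    A wiki_ref without an anchor is treated as unresolved.
    """
    path, _, anchor = wiki_ref.partition("#")
    if path not in pages or not anchor:
        return False
    return anchor in heading_slugs(pages[path])


pages = {"wiki/CredentialRouting.md": "# Credential Routing\n\n## OpenBao API key\n"}
print(anchor_resolves("wiki/CredentialRouting.md#openbao-api-key", pages))
```

Passing page text in, rather than reading files inside the function, keeps the check testable without a checkout of the wiki.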
+ +This is enforced structurally, not by process: a CI test fails any non-SSH entry +that carries a `steps` block, and checks every `wiki_ref` anchor resolves. We do +not rely on a quarterly human review to catch drift. + +--- + +## 4. Other tightenings applied + +- **Dropped `warden coach check`.** Highest scope-creep risk, thin value (`warden + status` already covers SSH local preconditions). SSH precondition hints fold into + `warden route show` instead. +- **No agent-visible stubs for unshipped paths.** Scenarios whose owning repo has + not shipped a real path stay `status: draft` and are hidden from default + lookup (WP-0012 anti-stale rule). + +--- + +## 5. Guardrails (non-negotiable) + +1. **One execution lane** — only SSH cert issuance in ops-warden code. +2. **No secret material** in catalog, CLI output, logs, or history. +3. **No foreign API wrappers** — beyond the existing opt-in SSH pre-sign gate. +4. **No restated procedure** for subsystems ops-warden does not own — pointers only. +5. **Canon supremacy** — wiki tracks net-kingdom; ops-warden never overrides it. + +--- + +## 6. Failing signals (watch for these) + +- Feature requests cluster on `warden secret` / `warden bao` / `warden login`. +- A catalog entry grows a `steps` block for a non-SSH subsystem. +- `wiki_ref` anchors rot without CI failure. +- Operators bypass OpenBao "because warden is easier" — but warden cannot help. + +--- + +## 7. 
References + +- `INTENT.md`, `SCOPE.md` — pre-update wording +- `workplans/WARDEN-WP-0010-access-routing-charter.md` +- `workplans/WARDEN-WP-0011-routing-guide-cli.md` +- `workplans/WARDEN-WP-0012-routing-scenario-playbooks.md` +- `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` — prior gap analysis +- `wiki/CredentialRouting.md`, `wiki/NetKingdomSecurityMap.md` diff --git a/workplans/WARDEN-WP-0010-access-routing-charter.md b/workplans/WARDEN-WP-0010-access-routing-charter.md new file mode 100644 index 0000000..69349bd --- /dev/null +++ b/workplans/WARDEN-WP-0010-access-routing-charter.md @@ -0,0 +1,162 @@ +--- +id: WARDEN-WP-0010 +type: workplan +title: "Access Routing — Charter and Pointer Catalog" +domain: custodian +repo: ops-warden +status: ready +owner: codex +topic_slug: custodian +planning_priority: high +planning_order: 10 +created: "2026-06-18" +updated: "2026-06-18" +--- + +# WARDEN-WP-0010 — Access Routing — Charter and Pointer Catalog + +**Scope:** Sharpen the existing steward framing so it cannot be misread as a desk +API that wraps every subsystem. ops-warden **issues SSH certificates** and +**points workers to the owning subsystem** for everything else. This workplan +updates INTENT/SCOPE wording and adds a machine-readable routing catalog that is +a **pointer layer**, not a second copy of NetKingdom canon. + +**Not a new security lane.** This is wording + a thin lookup surface. SSH issuance +remains the only thing ops-warden executes. Maturity moves Availability A3 → A4 +(structured lookup for agents); Completeness and Reliability for SSH are unchanged. + +**Out of scope:** Secret-vending, OIDC, policy PDP, tunnel, or host-hardening code +in this repo; flex-auth policy packages (WARDEN-WP-0009); any universal broker. + +**Depends on:** WARDEN-WP-0006 stewardship canon (routing wiki, security map) — shipped. + +**Feeds:** WARDEN-WP-0011 (routing CLI over the catalog). + +--- + +## Principles (target) + +1. 
**Point, don't proxy** — Name the owner and the doc; do not wrap a foreign API + unless the answer is an SSH certificate. +2. **Direct interaction** — Workers (humans, agents, CI, operators) call OpenBao, + key-cape, flex-auth, ops-bridge, and railiance repos themselves. +3. **One source of truth** — Routing procedure for non-SSH needs lives in the wiki + (aligned to net-kingdom canon) and upstream canon, **not** restated in the + catalog. The catalog carries identifiers and pointers only. ops-warden authors + procedure for exactly one lane: SSH certificate issuance, which it owns. +4. **Same truth, two shapes** — Humans read the wiki; agents read the catalog. The + catalog references wiki sections by anchor so they cannot drift apart. + +--- + +## No-double-source rule (binding on T3) + +The catalog must not contain step-by-step procedure for any subsystem ops-warden +does not own. For non-SSH scenarios an entry carries: + +- `owner_repo`, `subsystem` — who to talk to +- `wiki_ref` — anchor into an in-repo wiki section (the authoritative restatement) +- `canon_ref` — upstream net-kingdom doc the wiki section tracks +- `need_keywords`, `title`, `id` — lookup metadata +- `warden_executes: false` + +Only `warden_executes: true` (SSH) entries may carry an authored `steps` block and +the `cert_command` pattern — because that is the lane ops-warden owns. A CI test +(WP-0011 T5) enforces this structurally: non-SSH entries with a `steps` block fail. + +--- + +## Tasks + +### T1 — INTENT and SCOPE wording + +```task +id: WARDEN-WP-0010-T01 +status: todo +priority: high +``` + +- [ ] `INTENT.md` — keep "operational access steward"; replace the "operational + access **desk**" phrasing with plain "issues SSH certs and routes everything + else to its owner." Drop any metaphor that implies a wrapping service. +- [ ] `SCOPE.md` — state the A3 → A4 move plainly: "structured routing lookup for + agents; execution unchanged." Add the coach-free "issue vs route" table. 
+- [ ] Non-goals: add "duplicating or restating another subsystem's procedure." +- [ ] Cross-link this workplan from the assessment note. + +### T2 — Routing-role wiki page + +```task +id: WARDEN-WP-0010-T02 +status: todo +priority: high +``` + +- [ ] Create `wiki/AccessRouting.md` — what ops-warden answers (where + who owns + it), what it executes (SSH only), anti-patterns (no `warden secret`, + `warden login`, `warden policy`), and audience notes. +- [ ] Include the **issue-vs-route** matrix (subsystem × ops-warden role × who acts). +- [ ] Link from README, `CredentialRouting.md`, `NetKingdomSecurityMap.md`. + +### T3 — Pointer catalog schema + seed + +```task +id: WARDEN-WP-0010-T03 +status: todo +priority: high +``` + +- [ ] Define `registry/routing/catalog.yaml` per the **No-double-source rule** above: + `id`, `title`, `need_keywords`, `owner_repo`, `subsystem`, `warden_executes`, + `wiki_ref`, `canon_ref`, `reviewed` (date), `status` (active|draft); plus + `steps` + `cert_command` **only** when `warden_executes: true`. +- [ ] Seed from existing WP-0006 scenarios: SSH cert (executes), OpenBao API key, + flex-auth policy, key-cape OIDC, ops-bridge tunnel, railiance-infra principals. +- [ ] Add `issue-core-ingestion-api-key` as `status: draft` (owner path TBD by + railiance-platform) — draft entries are not surfaced by default lookup. + +### T4 — Routing index in CredentialRouting.md + +```task +id: WARDEN-WP-0010-T04 +status: todo +priority: medium +``` + +- [ ] Add a playbook index table to `wiki/CredentialRouting.md` keyed to catalog `id`. +- [ ] Add "what ops-warden answers vs what the worker does next on the owner system" + examples — without restating the owner's procedure. +- [ ] Refresh the duplicate-interface anti-examples section. 
+ +### T5 — Registry and repo-boundary alignment + +```task +id: WARDEN-WP-0010-T05 +status: todo +priority: medium +``` + +- [ ] Update `registry/capabilities/capability.security.ssh-certificate-issuance.md` + — note routing lookup in discovery; target availability notes the routing CLI. +- [ ] Update `.claude/rules/repo-boundary.md` and `AGENTS.md` one-liner (no new + metaphor — "issues SSH certs; routes other credential needs to their owner"). +- [ ] Extend the existing capability entry rather than minting a second capability. + +--- + +## Acceptance + +- A reader of INTENT + `wiki/AccessRouting.md` understands ops-warden **issues** SSH + certs and **routes** everything else, with no implication it proxies any API. +- `registry/routing/catalog.yaml` exists with ≥6 active scenarios; every non-SSH + entry has `wiki_ref` + `canon_ref` and **no** authored `steps`. +- No new secret-storage or foreign-API code. + +--- + +## See also + +- `INTENT.md` · `SCOPE.md` +- `history/2026-06-18-access-routing-intent-shift-assessment.md` — decision record +- `WARDEN-WP-0011` — routing CLI +- `WARDEN-WP-0012` — scenario playbook expansion (backlog) diff --git a/workplans/WARDEN-WP-0011-routing-guide-cli.md b/workplans/WARDEN-WP-0011-routing-guide-cli.md new file mode 100644 index 0000000..9cd82c6 --- /dev/null +++ b/workplans/WARDEN-WP-0011-routing-guide-cli.md @@ -0,0 +1,150 @@ +--- +id: WARDEN-WP-0011 +type: workplan +title: "Routing Lookup CLI" +domain: custodian +repo: ops-warden +status: ready +owner: codex +topic_slug: custodian +planning_priority: high +planning_order: 11 +created: "2026-06-18" +updated: "2026-06-18" +--- + +# WARDEN-WP-0011 — Routing Lookup CLI + +**Scope:** A `warden route` command group that reads the pointer catalog and tells +a worker which subsystem owns a need, what the prerequisites are, and which +wiki/canon doc to follow **on that system**. ops-warden does not call OpenBao, +flex-auth, or key-cape on the worker's behalf. 
+ +**Out of scope:** HTTP API; live probes against any subsystem; secret generation or +retrieval; a separate health/precondition command (see "Dropped" below); replacing +subsystem CLIs. + +**Depends on:** WARDEN-WP-0010 T3 (catalog schema + seed). + +**Unlocks:** Agents run `warden route show --json` instead of re-deriving +routing from wiki prose each session. + +--- + +## Target CLI + +```text +warden route list [--json] [--tag ] +warden route show [--json] +warden route find [--json] # keyword match against need_keywords +``` + +`list`/`find` show only `status: active` entries by default (`--all` includes draft). + +### Behaviour + +| Command | Does | Does not | +| --- | --- | --- | +| `list` / `show` | Return owner, wiki/canon pointers, `warden_executes`, anti-patterns | Return secret material | +| `find` | Rank scenarios by keyword overlap | Invoke any external API | + +When `warden_executes: true` (SSH), `show` appends the catalog's authored `steps` +and the `warden sign` / `cert_command` pattern, plus a local precondition hint +("actor in inventory? backend configured? run `warden status`"). For all other +scenarios `show` ends with **"next action on `` — see ``"** +and never implies warden performed anything. + +### Dropped: separate `check` command + +The earlier draft had `warden coach check`. Cut. For SSH, `warden status` already +covers local preconditions; duplicating it invites scope creep toward probing +foreign subsystems. SSH precondition hints live inside `show` instead. + +--- + +## Tasks + +### T1 — Catalog loader and models + +```task +id: WARDEN-WP-0011-T01 +status: todo +priority: high +``` + +- [ ] Add `src/warden/routing/` package: `models.py`, `catalog.py`. +- [ ] Load and validate `registry/routing/catalog.yaml`. +- [ ] Enforce the no-double-source rule: non-SSH entries with a `steps` block are a + validation error. Clear errors for missing file, schema violations, dup `id`. 
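The T1 validation rules can be sketched as below. This is an illustration under stated assumptions, not the shipped loader: entries are taken as already-parsed dicts (YAML parsing is left out), and `CatalogError`, `REQUIRED_KEYS`, `require_catalog_file`, and `validate_entries` are hypothetical names.

```python
from pathlib import Path


class CatalogError(ValueError):
    """Raised for any problem that should stop the catalog from loading."""


# Assumed subset of the schema fields; the real required set may differ.
REQUIRED_KEYS = ("id", "title", "owner_repo", "subsystem", "warden_executes", "status")


def require_catalog_file(path: Path) -> Path:
    """Fail with a clear message when the catalog file is missing."""
    if not path.is_file():
        raise CatalogError(f"routing catalog not found: {path}")
    return path


def validate_entries(entries: list[dict]) -> list[dict]:
    """Check schema keys, duplicate ids, status values, and the steps rule."""
    seen: set[str] = set()
    for index, entry in enumerate(entries):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise CatalogError(f"entry #{index}: missing keys {missing}")
        if entry["id"] in seen:
            raise CatalogError(f"duplicate id: {entry['id']}")
        seen.add(entry["id"])
        if entry["status"] not in ("active", "draft"):
            raise CatalogError(f"{entry['id']}: unknown status {entry['status']!r}")
        # No-double-source rule: only entries warden executes may carry steps.
        if not entry["warden_executes"] and "steps" in entry:
            raise CatalogError(f"{entry['id']}: steps block on a non-SSH entry")
    return entries
```

Raising on the first problem keeps the error message specific; collecting all violations before raising would be a reasonable alternative for CI output.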
+ +### T2 — `warden route list` and `show` + +```task +id: WARDEN-WP-0011-T02 +status: todo +priority: high +``` + +- [ ] Register `route` Typer sub-app on the main CLI. +- [ ] `list` — Rich table + `--json` array of summaries; active-only unless `--all`. +- [ ] `show` — owner, prerequisites, pointers (`wiki_ref`, `canon_ref`), + `warden_executes`, anti-patterns; SSH entries also append `steps` + cert pattern. +- [ ] Exit 1 with a `find` hint when `show` id is unknown. + +### T3 — `warden route find` + +```task +id: WARDEN-WP-0011-T03 +status: todo +priority: high +``` + +- [ ] Tokenize query; match against `need_keywords`, `title`, `id`. +- [ ] Rank, show top matches (default 5); `--json` for agents. +- [ ] Fixtures: "issue core api key", "ssh tunnel", "openrouter key". + +### T4 — Tests + +```task +id: WARDEN-WP-0011-T04 +status: todo +priority: high +``` + +- [ ] `tests/test_routing.py` — catalog load, no-double-source validation rejects a + non-SSH `steps` block, find ranking, show JSON shape, SSH `show` includes cert + pattern. +- [ ] No integration test requires a live subsystem. + +### T5 — Doc consistency + drift guard + +```task +id: WARDEN-WP-0011-T05 +status: todo +priority: high +``` + +- [ ] CI/test: every `wiki_ref` anchor resolves to an existing in-repo wiki section; + every entry has a `reviewed` date. +- [ ] `wiki/AccessRouting.md` — CLI section with agent-oriented examples. +- [ ] README — `warden route --help` quick reference. +- [ ] Bump SCOPE availability note A3 → A4 on ship. + +--- + +## Acceptance + +- `uv run warden route find "issue core api key"` returns the draft scenario only + with `--all`, and never a generated key. +- `uv run warden route show ssh-cert-host-access --json` includes + `warden_executes: true` and the cert_command pattern. +- A non-SSH catalog entry carrying a `steps` block fails `test_routing.py`. +- `uv run pytest tests/test_routing.py` passes with no live-subsystem dependency. 
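The `find` behaviour in T3 (tokenize, match against `need_keywords`, `title`, `id`, rank, top 5) could look roughly like the sketch below. The scoring is a plain token-overlap count, which is an assumption; the workplan does not fix a ranking formula. `tokenize` and `rank` are hypothetical names, and the entry titles in the example are invented for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank(query: str, entries: list[dict], limit: int = 5) -> list[tuple[int, str]]:
    """Return (score, id) pairs, best match first, ties broken by id."""
    query_tokens = tokenize(query)
    scored = []
    for entry in entries:
        haystack = " ".join([entry["id"], entry["title"], *entry["need_keywords"]])
        score = len(query_tokens & tokenize(haystack))
        if score > 0:
            scored.append((score, entry["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:limit]


entries = [
    {"id": "ssh-cert-host-access", "title": "SSH certificate for host access",
     "need_keywords": ["ssh", "cert", "host"]},
    {"id": "ops-bridge-tunnel-cert", "title": "Tunnel via ops-bridge",
     "need_keywords": ["tunnel", "ssh", "bridge"]},
    {"id": "human-oidc-login", "title": "Human OIDC login",
     "need_keywords": ["oidc", "login"]},
]
print(rank("ssh tunnel", entries))
```

The sketch ranks whatever list it is given; filtering to `status: active` before ranking is assumed to happen in the caller.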
+ +--- + +## See also + +- `WARDEN-WP-0010` — charter and catalog schema +- `WARDEN-WP-0012` — expanded per-scenario playbooks +- `history/2026-06-17-intent-scope-assessment.md` — prior `warden guide` proposal (P4) diff --git a/workplans/WARDEN-WP-0012-routing-scenario-playbooks.md b/workplans/WARDEN-WP-0012-routing-scenario-playbooks.md new file mode 100644 index 0000000..7aee23e --- /dev/null +++ b/workplans/WARDEN-WP-0012-routing-scenario-playbooks.md @@ -0,0 +1,135 @@ +--- +id: WARDEN-WP-0012 +type: workplan +title: "Routing Scenario Playbooks" +domain: custodian +repo: ops-warden +status: backlog +owner: codex +topic_slug: custodian +planning_priority: medium +planning_order: 12 +created: "2026-06-18" +updated: "2026-06-18" +--- + +# WARDEN-WP-0012 — Routing Scenario Playbooks + +**Scope:** Grow the routing catalog and wiki playbooks for high-frequency NetKingdom +access scenarios. Each wiki playbook restates **what the worker does on the owning +system** and tracks an upstream canon doc; the catalog only points at it. ops-warden +authors procedure only for the SSH lane. + +**Out of scope:** Implementing custody in ops-warden; creating OpenBao paths in +railiance-platform (coordinate only); authoring flex-auth policy; restating an +owner's procedure inside the catalog. + +**Depends on:** WARDEN-WP-0010 (charter + catalog schema), WARDEN-WP-0011 (routing CLI). + +**Status:** `backlog` — start after WP-0010 T3 and WP-0011 T2 ship. + +--- + +## Anti-stale rule + +A scenario is added to the catalog as `status: active` **only when its owning repo's +path actually exists** and a `wiki_ref` is written. Until then it stays `status: +draft` and is hidden from default `warden route find`/`list`. We do not seed +agent-visible entries for paths that owners have not shipped — a confident-looking +pointer to a non-existent path is worse than no entry. 
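A minimal sketch of the anti-stale rule as two small predicates, under assumptions: entries are plain dicts, and whether the owner's path has shipped is supplied by a human or a separate check as the hypothetical `owner_path_shipped` flag. `visible` and `can_promote` are illustrative names, not part of the planned CLI.

```python
def visible(entries: list[dict], include_draft: bool = False) -> list[dict]:
    """Default lookup shows active entries only; include_draft mirrors --all."""
    return [e for e in entries if include_draft or e.get("status") == "active"]


def can_promote(entry: dict, owner_path_shipped: bool) -> bool:
    """A draft may become active only with a shipped owner path and a wiki_ref."""
    return owner_path_shipped and bool(entry.get("wiki_ref"))


catalog = [
    {"id": "ssh-cert-host-access", "status": "active"},
    {"id": "issue-core-ingestion-api-key", "status": "draft"},
]
print([e["id"] for e in visible(catalog)])
```

Keeping promotion as an explicit predicate makes the gate reviewable in one place instead of being implied by whoever edits the `status` field.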
+ +--- + +## Scenario backlog + +| Catalog id | Routing focus | Executing owner | Gate | +| --- | --- | --- | --- | +| `issue-core-ingestion-api-key` | OpenBao KV path, K8s injection, rotation | railiance-platform + issue-core | path exists | +| `activity-core-issue-sink` | `ISSUE_CORE_URL` + consumer key custody | activity-core + issue-core | path exists | +| `inter-hub-bootstrap-ssh` | SSH envelope + on-host wrapper reads OpenBao | ops-warden SSH + railiance-infra | ready (SSH lane) | +| `openrouter-llm-connect` | OpenBao → K8s Secret in activity-core | railiance-platform | path exists | +| `object-storage-sts` | NK-WP-0007 vending path | net-kingdom + flex-auth + OpenBao | canon exists | +| `ops-bridge-tunnel-cert` | cert_command vs static-key migration | ops-bridge | coordinate | +| `human-oidc-login` | key-cape / Keycloak IAM Profile | key-cape | canon exists | +| `flex-auth-resource-check` | Policy decision before sensitive action | flex-auth | canon exists | +| `host-principal-deploy` | auth_principals sync | railiance-infra | canon exists | + +--- + +## Tasks + +### T1 — issue-core ingestion key playbook + +```task +id: WARDEN-WP-0012-T01 +status: todo +priority: high +``` + +- [ ] Coordinate with railiance-platform to canonicalize the OpenBao path first. +- [ ] Then write `wiki/playbooks/issue-core-ingestion-api-key.md` (prerequisites, + ESO pattern, rotation, privileged-read policy) and promote the catalog entry + from `draft` to `active` with a `wiki_ref`. + +### T2 — Inter-Hub and bootstrap lanes + +```task +id: WARDEN-WP-0012-T02 +status: todo +priority: medium +``` + +- [ ] Align `wiki/InterHubBootstrapAccessLane.md` with the catalog id. +- [ ] Document attended vs unattended bootstrap branches. +- [ ] Cross-link flex-auth and OpenBao expectations (pointers, not restated steps). 
+ +### T3 — ops-bridge tunnel migration + +```task +id: WARDEN-WP-0012-T03 +status: todo +priority: medium +``` + +- [ ] Playbook: static-key → `cert_command` migration checklist. +- [ ] Pilot tunnel notes (`agt-state-hub-bridge`) — coordinate with ops-bridge. + +### T4 — Platform secret scenarios (LLM, STS, DB) + +```task +id: WARDEN-WP-0012-T04 +status: todo +priority: low +``` + +- [ ] Playbooks for OpenRouter, object-storage STS, DB dynamic creds. +- [ ] Each ends with an owner-repo action; no warden secret code; pointers to canon. + +### T5 — Drift review cadence + +```task +id: WARDEN-WP-0012-T05 +status: todo +priority: low +``` + +- [ ] Document a review cadence against net-kingdom canon. +- [ ] `warden route list --stale` keyed off the `reviewed:` date field. +- [ ] Process note in `wiki/AccessRouting.md`. + +--- + +## Acceptance + +- Every active catalog entry has a `wiki_ref` to an existing section; no active entry + points at a path its owner has not shipped (those stay `draft`). +- `warden route find` resolves common agent queries without wiki grep. +- Playbooks and catalog contain no secret material — only owners, pointers, checklists. + +--- + +## See also + +- `WARDEN-WP-0010`, `WARDEN-WP-0011` +- `wiki/CredentialRouting.md` +- `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`