docs(WP-0010): rewire INTENT to "issue SSH, route the rest"; add access-routing plan

Drop the "operational access desk" framing (and the rejected "coach"
metaphor) for plain language: ops-warden issues short-lived SSH certs and
routes every other credential need to its owner. SSH is the only lane it
executes.

Adds WARDEN-WP-0010/0011/0012 with a pointer-layer routing catalog that
points at owner docs rather than restating them, enforced structurally
(non-SSH entries carrying a `steps` block fail CI). Drops the scope-creep-prone
`check` command; scenarios for unshipped paths stay hidden as drafts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 20:07:01 +02:00
parent 41da950e1a
commit dcfcc4b20a
5 changed files with 566 additions and 7 deletions

@@ -40,18 +40,23 @@ short-lived certificate lane** it owns.
> *Where we are going.*
-ops-warden aims to become the **operational access desk** for the ops fleet:
+ops-warden **issues short-lived SSH certificates and routes every other credential
+need to the subsystem that owns it.** It is not a desk that wraps the platform; it
+owns one lane and points at the rest:
1. **Know** the NetKingdom security model — identity, authorization, secrets,
SSH access, tunnels, bootstrap custody, and tenant/platform boundaries.
2. **Route** workers to the correct subsystem for each credential type instead
-of becoming a universal secret vending machine.
+of becoming a universal secret vending machine — through the wiki and a
+machine-readable routing catalog that *points at* the owner's docs rather than
+restating them.
3. **Align** runbooks, wiki, inventory patterns, and scorecard checks with
NetKingdom canon as the platform evolves (OpenBao-first, flex-auth policy,
key-cape IAM Profile, railiance deployment layers).
4. **Issue** short-lived SSH certificates for `adm` / `agt` / `atm` actors when
host or ops reachability requires the SSH lane — via `warden sign`,
-`cert_command`, and `ops-ssh-wrapper`.
+`cert_command`, and `ops-ssh-wrapper`. This is the **only** lane ops-warden
+executes.
5. **Audit** SSH signing operations and cert-side compliance so gatekeeping is
observable, not tribal knowledge.
@@ -151,7 +156,7 @@ Every successful SSH sign is auditable (`signatures.log`). Compliance checks
Development worker needs access
|
v
-ops-warden (steward / desk)
+ops-warden (issue SSH; route the rest)
|
+-- SSH host / ops reachability? ----> warden sign / cert_command
|
@@ -164,9 +169,9 @@ ops-warden (steward / desk)
+-- Tunnel only? --------------------> ops-bridge + cert_command
```
-Today the **steward desk** is primarily documentation, runbooks, and the
-implemented SSH CLI. Routing automation and policy-gated issuance are intentional
-follow-ups, not current promises.
+Today the steward role is primarily documentation, runbooks, and the implemented
+SSH CLI. The machine-readable routing catalog and `warden route` lookup, plus
+policy-gated issuance, are intentional follow-ups, not current promises.
---
@@ -207,6 +212,8 @@ ops-warden is succeeding when:
- Replacing OpenBao, flex-auth, key-cape, or railiance deployment ownership
- Storing Inter-Hub, LLM provider, or other long-lived API keys
- Host-side SSH configuration deployment
+- **Duplicating or restating another subsystem's procedure** — routing material
+points at the owner's docs; it does not fork them
- SSO / Teleport at scale (trigger per Access Management Directive §6.2)
---

@@ -0,0 +1,105 @@
# Decision Record — Sharpen "steward" into "issue SSH, route the rest"
**Date:** 2026-06-18
**Author:** codex
**Status:** Accepted. Feeds WARDEN-WP-0010 T1.
**Supersedes:** the earlier "operations security coach" draft (rejected — see below).
---
## 1. The decision
Keep ops-warden's mission as it is in production and sharpen its wording:
**ops-warden issues short-lived SSH certificates and routes every other
credential need to the subsystem that owns it.** Alongside the wording, add a
small machine-readable routing catalog and a `warden route` lookup CLI so agents
stop re-deriving routing from wiki prose.
This is **wording plus a thin lookup surface**, not a new security lane. SSH
issuance stays the only thing ops-warden executes.
| | Before | After |
| --- | --- | --- |
| Framing | "operational access steward / desk" | "issues SSH certs; routes the rest to its owner" |
| Non-SSH creds | document paths in wiki | same wiki + structured catalog pointing at it |
| Lookup | grep the wiki | `warden route find/show` |
| Foreign APIs | not owned | explicitly not proxied or restated |
Maturity moves **Availability A3 → A4** (structured lookup for agents). Completeness
and Reliability for the SSH lane are unchanged — nothing here ships new signing code.
---
## 2. Why not "coach"
An earlier draft framed this as an "operations security coach." Rejected:
- **Overpromises.** What is planned is a routing directory: lookup, not pedagogy.
"Coach" implies teaching and an ongoing relationship the CLI does not deliver,
which invites the "agent stops at the lookup and never learns the subsystem"
failure mode.
- **Generic and collision-prone.** The same label could just as easily name a role in other custodian domains.
- **No new metaphor needed.** "Steward who issues SSH and routes the rest" is
already accurate and harder to misread as a wrapping service.
Command verb is `warden route` (concrete), not `warden coach`.
---
## 3. The double-source-of-truth trap, and how we avoid it
A routing catalog risks becoming a hand-maintained fork of net-kingdom's
responsibility map. A stale-but-authoritative-looking catalog is **worse** than
wiki prose, because an agent trusts structured output and will not second-guess it.
**Rule (binding on WP-0010 T3 / enforced by WP-0011 T5):** the catalog is a
*pointer layer*. For any subsystem ops-warden does not own, an entry carries only
identifiers + `owner_repo` + `wiki_ref` (in-repo authoritative section) +
`canon_ref` (upstream net-kingdom doc) — **no restated procedure**. Procedure is
authored in exactly one place per need: the wiki section it points to. ops-warden
authors `steps` for exactly one lane — SSH issuance — because it owns it.
This is enforced structurally, not by process: a CI test fails any non-SSH entry
that carries a `steps` block, and checks that every `wiki_ref` anchor resolves. We
do not rely on a quarterly human review to catch drift.
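A minimal sketch of the structural check, assuming entries arrive as plain dicts from a YAML loader. `double_source_violations` and `OWNER_ONLY_FIELDS` are hypothetical names, not shipped ops-warden code; the rule itself (only `warden_executes: true` entries may carry `steps` or a `cert_command` pattern) is the one WARDEN-WP-0010 states.

```python
"""Sketch of the structural no-double-source check (hypothetical helper).

Catalog entries are plain dicts, as a YAML loader would return them. Only
entries with warden_executes: true (the SSH lane) may carry authored procedure.
"""

# Fields that hold authored procedure; only the SSH lane may carry them.
OWNER_ONLY_FIELDS = ("steps", "cert_command")


def double_source_violations(entries: list) -> list:
    """Return one message per non-SSH entry that carries authored procedure."""
    errors = []
    for entry in entries:
        # A missing flag counts as non-SSH, so the rule fails closed.
        if entry.get("warden_executes") is True:
            continue
        for field in OWNER_ONLY_FIELDS:
            if field in entry:
                entry_id = entry.get("id", "<no id>")
                errors.append(f"{entry_id}: non-SSH entry must not carry '{field}'")
    return errors
```

A CI test would then assert that the function returns an empty list for the loaded catalog.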
---
## 4. Other tightenings applied
- **Dropped `warden coach check`.** Highest scope-creep risk, thin value (`warden
status` already covers SSH local preconditions). SSH precondition hints fold into
`warden route show` instead.
- **No agent-visible stubs for unshipped paths.** Scenarios whose owning repo has
not shipped a real path stay `status: draft` and are hidden from default
lookup (WP-0012 anti-stale rule).
---
## 5. Guardrails (non-negotiable)
1. **One execution lane** — only SSH cert issuance in ops-warden code.
2. **No secret material** in catalog, CLI output, logs, or history.
3. **No foreign API wrappers** — beyond the existing opt-in SSH pre-sign gate.
4. **No restated procedure** for subsystems ops-warden does not own — pointers only.
5. **Canon supremacy** — wiki tracks net-kingdom; ops-warden never overrides it.
---
## 6. Failing signals (watch for these)
- Feature requests cluster on `warden secret` / `warden bao` / `warden login`.
- A catalog entry grows a `steps` block for a non-SSH subsystem.
- `wiki_ref` anchors rot without CI failure.
- Operators route around OpenBao "because warden is easier", even though warden cannot issue non-SSH credentials.
---
## 7. References
- `INTENT.md`, `SCOPE.md` — pre-update wording
- `workplans/WARDEN-WP-0010-access-routing-charter.md`
- `workplans/WARDEN-WP-0011-routing-guide-cli.md`
- `workplans/WARDEN-WP-0012-routing-scenario-playbooks.md`
- `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` — prior gap analysis
- `wiki/CredentialRouting.md`, `wiki/NetKingdomSecurityMap.md`

@@ -0,0 +1,162 @@
---
id: WARDEN-WP-0010
type: workplan
title: "Access Routing — Charter and Pointer Catalog"
domain: custodian
repo: ops-warden
status: ready
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 10
created: "2026-06-18"
updated: "2026-06-18"
---
# WARDEN-WP-0010 — Access Routing — Charter and Pointer Catalog
**Scope:** Sharpen the existing steward framing so it cannot be misread as a desk
API that wraps every subsystem. ops-warden **issues SSH certificates** and
**points workers to the owning subsystem** for everything else. This workplan
updates INTENT/SCOPE wording and adds a machine-readable routing catalog that is
a **pointer layer**, not a second copy of NetKingdom canon.
**Not a new security lane.** This is wording + a thin lookup surface. SSH issuance
remains the only thing ops-warden executes. Maturity moves Availability A3 → A4
(structured lookup for agents); Completeness and Reliability for SSH are unchanged.
**Out of scope:** Secret-vending, OIDC, policy PDP, tunnel, or host-hardening code
in this repo; flex-auth policy packages (WARDEN-WP-0009); any universal broker.
**Depends on:** WARDEN-WP-0006 stewardship canon (routing wiki, security map) — shipped.
**Feeds:** WARDEN-WP-0011 (routing CLI over the catalog).
---
## Principles (target)
1. **Point, don't proxy** — Name the owner and the doc; do not wrap a foreign API
unless the answer is an SSH certificate.
2. **Direct interaction** — Workers (humans, agents, CI, operators) call OpenBao,
key-cape, flex-auth, ops-bridge, and railiance repos themselves.
3. **One source of truth** — Routing procedure for non-SSH needs lives in the wiki
(aligned to net-kingdom canon) and upstream canon, **not** restated in the
catalog. The catalog carries identifiers and pointers only. ops-warden authors
procedure for exactly one lane: SSH certificate issuance, which it owns.
4. **Same truth, two shapes** — Humans read the wiki; agents read the catalog. The
catalog references wiki sections by anchor instead of copying them, so there is no
second copy of the procedure to drift; CI catches anchors that stop resolving.
---
## No-double-source rule (binding on T3)
The catalog must not contain step-by-step procedure for any subsystem ops-warden
does not own. For non-SSH scenarios an entry carries:
- `owner_repo`, `subsystem` — who to talk to
- `wiki_ref` — anchor into an in-repo wiki section (the authoritative restatement)
- `canon_ref` — upstream net-kingdom doc the wiki section tracks
- `need_keywords`, `title`, `id` — lookup metadata
- `warden_executes: false`
Only `warden_executes: true` (SSH) entries may carry an authored `steps` block and
the `cert_command` pattern — because that is the lane ops-warden owns. A CI test
(WP-0011 T5) enforces this structurally: non-SSH entries with a `steps` block fail.
---
## Tasks
### T1 — INTENT and SCOPE wording
```task
id: WARDEN-WP-0010-T01
status: todo
priority: high
```
- [ ] `INTENT.md` — keep "operational access steward"; replace the "operational
access **desk**" phrasing with plain "issues SSH certs and routes everything
else to its owner." Drop any metaphor that implies a wrapping service.
- [ ] `SCOPE.md` — state the A3 → A4 move plainly: "structured routing lookup for
agents; execution unchanged." Add the "issue vs route" table, with no coach framing.
- [ ] Non-goals: add "duplicating or restating another subsystem's procedure."
- [ ] Cross-link this workplan from the assessment note.
### T2 — Routing-role wiki page
```task
id: WARDEN-WP-0010-T02
status: todo
priority: high
```
- [ ] Create `wiki/AccessRouting.md` — what ops-warden answers (where + who owns
it), what it executes (SSH only), anti-patterns (no `warden secret`,
`warden login`, `warden policy`), and audience notes.
- [ ] Include the **issue-vs-route** matrix (subsystem × ops-warden role × who acts).
- [ ] Link from README, `CredentialRouting.md`, `NetKingdomSecurityMap.md`.
### T3 — Pointer catalog schema + seed
```task
id: WARDEN-WP-0010-T03
status: todo
priority: high
```
- [ ] Define `registry/routing/catalog.yaml` per the **No-double-source rule** above:
`id`, `title`, `need_keywords`, `owner_repo`, `subsystem`, `warden_executes`,
`wiki_ref`, `canon_ref`, `reviewed` (date), `status` (active|draft); plus
`steps` + `cert_command` **only** when `warden_executes: true`.
- [ ] Seed from existing WP-0006 scenarios: SSH cert (executes), OpenBao API key,
flex-auth policy, key-cape OIDC, ops-bridge tunnel, railiance-infra principals.
- [ ] Add `issue-core-ingestion-api-key` as `status: draft` (owner path TBD by
railiance-platform) — draft entries are not surfaced by default lookup.
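One way T3's schema could be typed, as a sketch: the field names come from the list above, while `CatalogEntry`, `parse_entry`, and the choice to let drafts omit `wiki_ref`/`canon_ref` are assumptions (inferred from WP-0012, where a draft is promoted to active once its `wiki_ref` exists).

```python
"""Sketch of a typed entry for registry/routing/catalog.yaml (hypothetical model)."""
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Fields every entry needs, active or draft.
ALWAYS_REQUIRED = ("id", "title", "need_keywords", "owner_repo", "subsystem",
                   "warden_executes", "reviewed", "status")
STATUSES = {"active", "draft"}


@dataclass(frozen=True)
class CatalogEntry:
    id: str
    title: str
    need_keywords: tuple
    owner_repo: str
    subsystem: str
    warden_executes: bool
    reviewed: date
    status: str
    wiki_ref: Optional[str] = None
    canon_ref: Optional[str] = None
    steps: tuple = ()
    cert_command: Optional[str] = None


def parse_entry(raw: dict) -> CatalogEntry:
    """Validate one parsed YAML mapping and return a CatalogEntry."""
    entry_id = raw.get("id", "<no id>")
    missing = [name for name in ALWAYS_REQUIRED if name not in raw]
    if missing:
        raise ValueError(f"{entry_id}: missing field(s) {', '.join(missing)}")
    if raw["status"] not in STATUSES:
        raise ValueError(f"{entry_id}: status must be one of {sorted(STATUSES)}")
    executes = raw["warden_executes"]
    if not isinstance(executes, bool):
        raise ValueError(f"{entry_id}: warden_executes must be true or false")
    if not executes and ("steps" in raw or "cert_command" in raw):
        raise ValueError(f"{entry_id}: only SSH entries may carry steps/cert_command")
    if raw["status"] == "active":
        # Active entries must point somewhere; non-SSH ones also need upstream canon.
        if not raw.get("wiki_ref"):
            raise ValueError(f"{entry_id}: active entries need a wiki_ref")
        if not executes and not raw.get("canon_ref"):
            raise ValueError(f"{entry_id}: active non-SSH entries need a canon_ref")
    reviewed = raw["reviewed"]
    # PyYAML turns an unquoted 2026-06-18 into a date; a quoted one stays a str.
    if isinstance(reviewed, str):
        reviewed = date.fromisoformat(reviewed)
    return CatalogEntry(
        id=raw["id"],
        title=raw["title"],
        need_keywords=tuple(raw["need_keywords"]),
        owner_repo=raw["owner_repo"],
        subsystem=raw["subsystem"],
        warden_executes=executes,
        reviewed=reviewed,
        status=raw["status"],
        wiki_ref=raw.get("wiki_ref"),
        canon_ref=raw.get("canon_ref"),
        steps=tuple(raw.get("steps", ())),
        cert_command=raw.get("cert_command"),
    )
```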
### T4 — Routing index in CredentialRouting.md
```task
id: WARDEN-WP-0010-T04
status: todo
priority: medium
```
- [ ] Add a playbook index table to `wiki/CredentialRouting.md` keyed to catalog `id`.
- [ ] Add "what ops-warden answers vs what the worker does next on the owner system"
examples — without restating the owner's procedure.
- [ ] Refresh the duplicate-interface anti-examples section.
### T5 — Registry and repo-boundary alignment
```task
id: WARDEN-WP-0010-T05
status: todo
priority: medium
```
- [ ] Update `registry/capabilities/capability.security.ssh-certificate-issuance.md`
to mention the routing lookup under discovery and the routing CLI in the target-availability notes.
- [ ] Update `.claude/rules/repo-boundary.md` and `AGENTS.md` one-liner (no new
metaphor — "issues SSH certs; routes other credential needs to their owner").
- [ ] Extend the existing capability entry rather than minting a second capability.
---
## Acceptance
- A reader of INTENT + `wiki/AccessRouting.md` understands ops-warden **issues** SSH
certs and **routes** everything else, with no implication it proxies any API.
- `registry/routing/catalog.yaml` exists with ≥6 active scenarios; every non-SSH
entry has `wiki_ref` + `canon_ref` and **no** authored `steps`.
- No new secret-storage or foreign-API code.
---
## See also
- `INTENT.md` · `SCOPE.md`
- `history/2026-06-18-access-routing-intent-shift-assessment.md` — decision record
- `WARDEN-WP-0011` — routing CLI
- `WARDEN-WP-0012` — scenario playbook expansion (backlog)

@@ -0,0 +1,150 @@
---
id: WARDEN-WP-0011
type: workplan
title: "Routing Lookup CLI"
domain: custodian
repo: ops-warden
status: ready
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 11
created: "2026-06-18"
updated: "2026-06-18"
---
# WARDEN-WP-0011 — Routing Lookup CLI
**Scope:** A `warden route` command group that reads the pointer catalog and tells
a worker which subsystem owns a need, what the prerequisites are, and which
wiki/canon doc to follow **on that system**. ops-warden does not call OpenBao,
flex-auth, or key-cape on the worker's behalf.
**Out of scope:** HTTP API; live probes against any subsystem; secret generation or
retrieval; a separate health/precondition command (see "Dropped" below); replacing
subsystem CLIs.
**Depends on:** WARDEN-WP-0010 T3 (catalog schema + seed).
**Unlocks:** Agents run `warden route show <id> --json` instead of re-deriving
routing from wiki prose each session.
---
## Target CLI
```text
warden route list [--json] [--all] [--tag <tag>]
warden route show <id> [--json]
warden route find <query> [--json] [--all]   # keyword match against need_keywords, title, id
```
`list`/`find` show only `status: active` entries by default (`--all` includes draft).
### Behaviour
| Command | Does | Does not |
| --- | --- | --- |
| `list` / `show` | Return owner, wiki/canon pointers, `warden_executes`, anti-patterns | Return secret material |
| `find` | Rank scenarios by keyword overlap | Invoke any external API |
When `warden_executes: true` (SSH), `show` appends the catalog's authored `steps`
and the `warden sign` / `cert_command` pattern, plus a local precondition hint
("actor in inventory? backend configured? run `warden status`"). For all other
scenarios `show` ends with **"next action on `<owner_repo>` — see `<wiki_ref>`"**
and never implies warden performed anything.
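A sketch of how `show` might assemble that output, assuming entries are plain dicts with the T3 field names. `render_show` is hypothetical, the exact wording of each line is illustrative, and prerequisites and anti-patterns are left out because the catalog schema does not yet name fields for them.

```python
"""Sketch of how `warden route show` could compose its text output (hypothetical helper)."""

SSH_PRECONDITION_HINT = (
    "preconditions: actor in inventory? backend configured? run `warden status`"
)


def render_show(entry: dict) -> str:
    """Render one catalog entry; only SSH entries get steps and a cert pattern."""
    lines = [
        f"{entry['id']}: {entry['title']}",
        f"owner: {entry['owner_repo']} ({entry['subsystem']})",
        f"wiki_ref: {entry.get('wiki_ref') or '-'}",
        f"canon_ref: {entry.get('canon_ref') or '-'}",
        f"warden_executes: {str(entry['warden_executes']).lower()}",
    ]
    if entry["warden_executes"]:
        # The one lane ops-warden owns: authored steps, cert pattern, local hint.
        lines.append("steps:")
        for number, step in enumerate(entry.get("steps", []), start=1):
            lines.append(f"  {number}. {step}")
        lines.append(f"cert_command: {entry.get('cert_command') or '-'}")
        lines.append(SSH_PRECONDITION_HINT)
    else:
        # Everything else ends by handing the worker to the owner system.
        lines.append(
            f"next action on {entry['owner_repo']}: see {entry.get('wiki_ref') or '-'}"
        )
    return "\n".join(lines)
```

Keeping the SSH branch and the pointer branch visibly separate makes it hard for a later change to leak procedure into non-SSH output.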
### Dropped: separate `check` command
The earlier draft had `warden coach check`. Cut. For SSH, `warden status` already
covers local preconditions; duplicating it invites scope creep toward probing
foreign subsystems. SSH precondition hints live inside `show` instead.
---
## Tasks
### T1 — Catalog loader and models
```task
id: WARDEN-WP-0011-T01
status: todo
priority: high
```
- [ ] Add `src/warden/routing/` package: `models.py`, `catalog.py`.
- [ ] Load and validate `registry/routing/catalog.yaml`.
- [ ] Enforce the no-double-source rule: a non-SSH entry with a `steps` block is a
validation error. Give clear errors for a missing file, schema violations, and duplicate `id`s.
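A sketch of the loader-level errors T1 asks for, with YAML parsing (for example PyYAML's `yaml.safe_load`) assumed to happen between the two helpers. `CatalogError`, `read_catalog_text`, and `check_unique_ids` are hypothetical names; collecting every problem before raising is a design choice so one run reports all of them.

```python
"""Sketch of loader-level errors for the routing catalog (hypothetical helpers)."""
from pathlib import Path


class CatalogError(Exception):
    """A catalog problem, with a message meant for the person running the CLI."""


def read_catalog_text(path: Path) -> str:
    """Read the raw catalog, failing with a clear message if it is absent."""
    if not path.is_file():
        raise CatalogError(f"routing catalog not found: {path}")
    return path.read_text(encoding="utf-8")


def check_unique_ids(entries: list) -> None:
    """Raise CatalogError listing every missing or duplicate id at once."""
    first_seen = {}
    problems = []
    for index, entry in enumerate(entries):
        entry_id = entry.get("id")
        if not entry_id:
            problems.append(f"entry #{index}: missing 'id'")
        elif entry_id in first_seen:
            problems.append(
                f"duplicate id '{entry_id}' (entries #{first_seen[entry_id]} and #{index})"
            )
        else:
            first_seen[entry_id] = index
    if problems:
        raise CatalogError("; ".join(problems))
```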
### T2 — `warden route list` and `show`
```task
id: WARDEN-WP-0011-T02
status: todo
priority: high
```
- [ ] Register `route` Typer sub-app on the main CLI.
- [ ] `list` — Rich table + `--json` array of summaries; active-only unless `--all`.
- [ ] `show` — owner, prerequisites, pointers (`wiki_ref`, `canon_ref`),
`warden_executes`, anti-patterns; SSH entries also append `steps` + cert pattern.
- [ ] Exit 1 with a `find` hint when `show` id is unknown.
### T3 — `warden route find`
```task
id: WARDEN-WP-0011-T03
status: todo
priority: high
```
- [ ] Tokenize query; match against `need_keywords`, `title`, `id`.
- [ ] Rank, show top matches (default 5); `--json` for agents.
- [ ] Fixtures: "issue core api key", "ssh tunnel", "openrouter key".
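A minimal sketch of `find` ranking, assuming plain-dict entries. The tokenizer, the 2:1 weighting of `need_keywords` hits over `title`/`id` hits, and the id tie-break are assumptions, not a specified algorithm; `include_drafts` stands in for `--all`.

```python
"""Sketch of `warden route find` ranking by keyword overlap (hypothetical helper)."""
import re


def tokenize(text: str) -> set:
    """Lower-case alphanumeric tokens; 'issue-core' becomes {'issue', 'core'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find(entries: list, query: str, limit: int = 5, include_drafts: bool = False) -> list:
    """Return up to `limit` entry ids, best match first."""
    query_tokens = tokenize(query)
    scored = []
    for entry in entries:
        # Drafts stay hidden unless the caller asked for --all.
        if entry.get("status") != "active" and not include_drafts:
            continue
        keyword_tokens = set()
        for keyword in entry.get("need_keywords", []):
            keyword_tokens |= tokenize(keyword)
        other_tokens = tokenize(entry.get("title", "")) | tokenize(entry["id"])
        # Assumed weighting: a need_keywords hit counts double a title/id hit.
        score = 2 * len(query_tokens & keyword_tokens) + len(query_tokens & other_tokens)
        if score > 0:
            scored.append((score, entry["id"]))
    # Highest score first; ties broken by id so output is stable for agents.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entry_id for _, entry_id in scored[:limit]]
```

With a draft `issue-core-ingestion-api-key` entry in the catalog, `find(entries, "issue core api key")` skips it and only surfaces it with `include_drafts=True`, which is the behaviour the acceptance criteria ask for.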
### T4 — Tests
```task
id: WARDEN-WP-0011-T04
status: todo
priority: high
```
- [ ] `tests/test_routing.py` — catalog load, no-double-source validation rejects a
non-SSH `steps` block, find ranking, show JSON shape, SSH `show` includes cert
pattern.
- [ ] No integration test requires a live subsystem.
### T5 — Doc consistency + drift guard
```task
id: WARDEN-WP-0011-T05
status: todo
priority: high
```
- [ ] CI/test: every `wiki_ref` anchor resolves to an existing in-repo wiki section;
every entry has a `reviewed` date.
- [ ] `wiki/AccessRouting.md` — CLI section with agent-oriented examples.
- [ ] README — `warden route --help` quick reference.
- [ ] Bump SCOPE availability note A3 → A4 on ship.
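A sketch of the T5 drift guard, assuming `wiki_ref` values take the form `wiki/File.md#anchor` and the wiki renderer uses GitHub-style heading slugs (other hosts can slug headings differently, so check against the renderer actually in use). All function names are hypothetical.

```python
"""Sketch of the `wiki_ref` drift guard (hypothetical helpers)."""
import re
from pathlib import Path

# An ATX heading: 1-6 '#', whitespace, text, optional closing '#'s.
HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*#*\s*$")


def slugify(heading: str) -> str:
    """GitHub-style slug: lower-case, drop punctuation, spaces to hyphens."""
    slug = re.sub(r"[^\w\- ]", "", heading.strip().lower())
    return slug.replace(" ", "-")


def anchors_in(markdown: str) -> set:
    """Collect heading anchors, skipping fenced code and suffixing duplicates."""
    anchors, counts, in_fence = set(), {}, False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        match = HEADING.match(line)
        if in_fence or not match:
            continue
        base = slugify(match.group(1))
        seen = counts.get(base, 0)
        anchors.add(base if seen == 0 else f"{base}-{seen}")
        counts[base] = seen + 1
    return anchors


def unresolved_wiki_refs(entries: list, repo_root: Path) -> list:
    """One message per entry lacking a reviewed date or a resolvable wiki_ref."""
    errors, cache = [], {}
    for entry in entries:
        if "reviewed" not in entry:
            errors.append(f"{entry['id']}: missing 'reviewed' date")
        ref = entry.get("wiki_ref")
        if not ref:
            continue  # drafts may not have a wiki_ref yet
        path_part, _, anchor = ref.partition("#")
        if path_part not in cache:
            wiki_file = repo_root / path_part
            cache[path_part] = (
                anchors_in(wiki_file.read_text(encoding="utf-8"))
                if wiki_file.is_file() else None
            )
        anchors = cache[path_part]
        if anchors is None:
            errors.append(f"{entry['id']}: {path_part} does not exist")
        elif anchor and anchor not in anchors:
            errors.append(f"{entry['id']}: no section '#{anchor}' in {path_part}")
    return errors
```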
---
## Acceptance
- `uv run warden route find "issue core api key"` surfaces the draft scenario only
when `--all` is passed, and never returns a generated key.
- `uv run warden route show ssh-cert-host-access --json` includes
`warden_executes: true` and the cert_command pattern.
- A non-SSH catalog entry carrying a `steps` block fails `test_routing.py`.
- `uv run pytest tests/test_routing.py` passes with no live-subsystem dependency.
---
## See also
- `WARDEN-WP-0010` — charter and catalog schema
- `WARDEN-WP-0012` — expanded per-scenario playbooks
- `history/2026-06-17-intent-scope-assessment.md` — prior `warden guide` proposal (P4)

@@ -0,0 +1,135 @@
---
id: WARDEN-WP-0012
type: workplan
title: "Routing Scenario Playbooks"
domain: custodian
repo: ops-warden
status: backlog
owner: codex
topic_slug: custodian
planning_priority: medium
planning_order: 12
created: "2026-06-18"
updated: "2026-06-18"
---
# WARDEN-WP-0012 — Routing Scenario Playbooks
**Scope:** Grow the routing catalog and wiki playbooks for high-frequency NetKingdom
access scenarios. Each wiki playbook restates **what the worker does on the owning
system** and tracks an upstream canon doc; the catalog only points at it. ops-warden
authors procedure only for the SSH lane.
**Out of scope:** Implementing custody in ops-warden; creating OpenBao paths in
railiance-platform (coordinate only); authoring flex-auth policy; restating an
owner's procedure inside the catalog.
**Depends on:** WARDEN-WP-0010 (charter + catalog schema), WARDEN-WP-0011 (routing CLI).
**Status:** `backlog` — start after WP-0010 T3 and WP-0011 T2 ship.
---
## Anti-stale rule
A scenario is added to the catalog as `status: active` **only when its owning repo's
path actually exists** and a `wiki_ref` is written. Until then it stays `status:
draft` and is hidden from default `warden route find`/`list`. We do not seed
agent-visible entries for paths that owners have not shipped — a confident-looking
pointer to a non-existent path is worse than no entry.
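The rule reduces to a small gate, sketched here with a hypothetical `promote` helper. Whether the owner's path exists is passed in as `owner_path_shipped`, an assumption that a person (or a check in the owning repo) confirms it; ops-warden itself does not probe foreign subsystems.

```python
"""Sketch of the anti-stale promotion gate (hypothetical helper)."""


def promote(entry: dict, owner_path_shipped: bool) -> dict:
    """Return a copy of a draft entry marked active, or raise if a gate is unmet.

    owner_path_shipped is supplied by whoever confirmed the owner repo's path
    exists; this helper does not check foreign subsystems itself.
    """
    blockers = []
    if not owner_path_shipped:
        blockers.append(f"{entry['owner_repo']} has not shipped the path")
    if not entry.get("wiki_ref"):
        blockers.append("no wiki_ref written")
    if blockers:
        raise ValueError(f"{entry['id']} stays draft: " + "; ".join(blockers))
    # Copy rather than mutate, so a failed review never half-promotes an entry.
    return {**entry, "status": "active"}
```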
---
## Scenario backlog
| Catalog id | Routing focus | Executing owner | Gate |
| --- | --- | --- | --- |
| `issue-core-ingestion-api-key` | OpenBao KV path, K8s injection, rotation | railiance-platform + issue-core | path exists |
| `activity-core-issue-sink` | `ISSUE_CORE_URL` + consumer key custody | activity-core + issue-core | path exists |
| `inter-hub-bootstrap-ssh` | SSH envelope + on-host wrapper reads OpenBao | ops-warden SSH + railiance-infra | ready (SSH lane) |
| `openrouter-llm-connect` | OpenBao → K8s Secret in activity-core | railiance-platform | path exists |
| `object-storage-sts` | NK-WP-0007 vending path | net-kingdom + flex-auth + OpenBao | canon exists |
| `ops-bridge-tunnel-cert` | cert_command vs static-key migration | ops-bridge | coordinate |
| `human-oidc-login` | key-cape / Keycloak IAM Profile | key-cape | canon exists |
| `flex-auth-resource-check` | Policy decision before sensitive action | flex-auth | canon exists |
| `host-principal-deploy` | auth_principals sync | railiance-infra | canon exists |
---
## Tasks
### T1 — issue-core ingestion key playbook
```task
id: WARDEN-WP-0012-T01
status: todo
priority: high
```
- [ ] Coordinate with railiance-platform to canonicalize the OpenBao path first.
- [ ] Then write `wiki/playbooks/issue-core-ingestion-api-key.md` (prerequisites,
ESO pattern, rotation, privileged-read policy) and promote the catalog entry
from `draft` to `active` with a `wiki_ref`.
### T2 — Inter-Hub and bootstrap lanes
```task
id: WARDEN-WP-0012-T02
status: todo
priority: medium
```
- [ ] Align `wiki/InterHubBootstrapAccessLane.md` with the catalog id.
- [ ] Document attended vs unattended bootstrap branches.
- [ ] Cross-link flex-auth and OpenBao expectations (pointers, not restated steps).
### T3 — ops-bridge tunnel migration
```task
id: WARDEN-WP-0012-T03
status: todo
priority: medium
```
- [ ] Playbook: static-key → `cert_command` migration checklist.
- [ ] Pilot tunnel notes (`agt-state-hub-bridge`) — coordinate with ops-bridge.
### T4 — Platform secret scenarios (LLM, STS, DB)
```task
id: WARDEN-WP-0012-T04
status: todo
priority: low
```
- [ ] Playbooks for OpenRouter, object-storage STS, DB dynamic creds.
- [ ] Each ends with an owner-repo action; no warden secret code; pointers to canon.
### T5 — Drift review cadence
```task
id: WARDEN-WP-0012-T05
status: todo
priority: low
```
- [ ] Document a review cadence against net-kingdom canon.
- [ ] `warden route list --stale` keyed off the `reviewed:` date field.
- [ ] Process note in `wiki/AccessRouting.md`.
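A sketch of the `--stale` filter, assuming `reviewed` holds an ISO date string (or a `date`, as PyYAML yields for unquoted values). The 90-day default and the name `stale_ids` are placeholders until T5 documents the actual cadence.

```python
"""Sketch of `warden route list --stale` keyed off `reviewed:` (hypothetical helper)."""
from datetime import date, timedelta


def stale_ids(entries: list, today: date, max_age_days: int = 90) -> list:
    """Ids whose `reviewed` date is missing or older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for entry in entries:
        reviewed = entry.get("reviewed")
        if isinstance(reviewed, str):
            reviewed = date.fromisoformat(reviewed)
        # A missing date counts as stale: the entry has never been reviewed.
        if reviewed is None or reviewed < cutoff:
            stale.append(entry["id"])
    return sorted(stale)
```

Passing `today` in explicitly keeps the filter deterministic in tests.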
---
## Acceptance
- Every active catalog entry has a `wiki_ref` to an existing section; no active entry
points at a path its owner has not shipped (those stay `draft`).
- `warden route find` resolves common agent queries without wiki grep.
- Playbooks and catalog contain no secret material — only owners, pointers, checklists.
---
## See also
- `WARDEN-WP-0010`, `WARDEN-WP-0011`
- `wiki/CredentialRouting.md`
- `history/2026-06-18-post-wp0008-intent-scope-reassessment.md`