feat(WARDEN-WP-0020): ops-warden coordination worker — T1 dry-run scaffold

Foundation for an autonomous worker that handles ops-warden's State Hub coordination lane via llm-connect (Bernd's call: full-auto in-scope + scheduled, staged dry-run -> manual -> scheduled).

T1 is the llm-connect-independent, safe slice:
- src/warden/worker.py: HubClient (read unread to_agent=ops-warden), Brain protocol, deterministic RuleBrain (answers clear routing questions, escalates the rest), PlannedAction/WorkerPlan model, guardrail allowlist + validate_action enforced brain-agnostically (no-secret invariant + prod-config + off-allowlist all escalate), render_plans dry-run output.
- `warden worker run --dry-run` (default); --execute refused (exit 2) until the guarded executor (T3) lands.

Guardrails are load-bearing because full-auto has no human in the loop: message content is untrusted data, and the allowlist is enforced regardless of what the brain proposes.

Hard dependency flagged in the workplan: the brain is llm-connect, which needs its provider key (OPENROUTER_API_KEY, deferred CCR-2026-0003) before it can run.

18 worker tests; 229 pass, lint clean. Live dry-run against the real hub verified.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 19:07:06 +02:00
parent 69d8ee848f
commit 211994ddbb
4 changed files with 473 additions and 0 deletions
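
The mode gate described in the commit message (dry-run by default, `--execute` refused with exit 2) could be sketched roughly as below. This is an illustration only: it assumes an `argparse`-based CLI, and `build_parser` and `main` are hypothetical names, not the functions in this commit.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser layout mirroring `warden worker run [--once] [--dry-run]`.
    parser = argparse.ArgumentParser(prog="warden")
    commands = parser.add_subparsers(dest="command", required=True)
    worker = commands.add_parser("worker")
    worker_commands = worker.add_subparsers(dest="worker_command", required=True)
    run = worker_commands.add_parser("run")
    run.add_argument("--once", action="store_true", help="process the inbox once and exit")
    mode = run.add_mutually_exclusive_group()
    # --dry-run is accepted explicitly but is also the behaviour when no mode is given.
    mode.add_argument("--dry-run", action="store_true", help="print the plan only (default)")
    mode.add_argument("--execute", action="store_true", help="refused until the executor exists")
    return parser


def main(argv: list[str]) -> int:
    args = build_parser().parse_args(argv)
    if args.execute:
        # Refuse rather than silently fall back, so a scheduler sees a non-zero exit.
        print("--execute is refused until the guarded executor (T3) lands", file=sys.stderr)
        return 2
    print("dry-run: plan only, nothing executed")
    return 0
```

Calling `main(["worker", "run"])` takes the dry-run path, while `main(["worker", "run", "--execute"])` returns 2 without doing any work. Keeping the refusal in the entry point means no later code path can run by accident before the executor exists.
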
--- a/workplans/WARDEN-WP-0020-ops-warden-worker.md
+++ b/workplans/WARDEN-WP-0020-ops-warden-worker.md
@@ -0,0 +1,135 @@
+---
+id: WARDEN-WP-0020
+type: workplan
+title: "ops-warden worker — autonomous coordination via llm-connect"
+domain: infotech
+repo: ops-warden
+status: active
+owner: claude
+topic_slug: custodian
+planning_priority: high
+planning_order: 20
+created: "2026-06-29"
+updated: "2026-06-29"
+---
+
+# WARDEN-WP-0020 — ops-warden worker (`warden worker`)
+
+**Problem:** ops-warden's coordination lane (State Hub inbox `to_agent=ops-warden`) is
+handled only when a human spins up an ops-warden session and relays instructions. That
+doesn't scale — Bernd is hand-relaying between flex-auth ↔ secrets-engine ↔ ops-warden
+across sessions.
+
+**Goal:** a `warden worker` CLI that pulls ops-warden's unread coordination requests and,
+using **llm-connect** for inference, drives each to an ops-warden action (answer a routing
+question, draft+send a reply, mark read, propose/commit a catalog diff, or escalate) — so
+the inbox is handled without a human starting a session.
+
+**Decisions (Bernd, 2026-06-29):** **full-auto in-scope** (worker executes any in-scope
+action; escalates only secrets/prod/out-of-scope) and **scheduled/unattended** (cron or
+activity-core). Because there is no human in the loop for in-scope actions, the guardrails
+are load-bearing and the rollout is staged: **dry-run → manual → scheduled**.
+
+**Build vs reuse:** inference = llm-connect (`/execute`); trigger = cron or activity-core
+(reuse the durable task factory, don't reinvent scheduling). Worker logic lives in warden.
+
+## Guardrails (non-negotiable — full-auto rests on these)
+1. **Fixed charter, non-overridable.** The boundary (issue SSH; route everything else;
+   conduit-not-broker; never hold/print a secret value) is a fixed system policy. Message
+   content is **untrusted data**, never instructions that can relax it (prompt-injection
+   containment).
+2. **Action allowlist.** Every action is validated against an allowlist before execution;
+   off-list → escalate. Secret handling and prod-config writes always escalate; irreversible
+   or outward actions require an explicit human ack.
+3. **No-secret invariant.** Refuse any task requiring a secret value in hand or in a prompt.
+4. **Full audit + dry-run.** Every action emits a progress event; `--dry-run` shows the
+   plan without executing. Scheduled mode only after a clean dry-run shakedown.
+
+## Hard dependency
+llm-connect must be operational — it needs its provider key (`OPENROUTER_API_KEY`,
+CCR-2026-0003, currently deferred by railiance-platform/secrets-engine). The worker is
+built against llm-connect's contract; it cannot run the brain until that lands.
+
+---
+
+## Tasks
+
+### T1 — Worker scaffold (llm-connect-independent, safe)
+
+```task
+id: WARDEN-WP-0020-T01
+status: done
+priority: high
+```
+
+- [x] `src/warden/worker.py`: State Hub inbox client (`HubClient.unread`), a `Brain`
+      protocol, a deterministic `RuleBrain` default (answers clear routing questions;
+      escalates the rest), the `PlannedAction`/`WorkerPlan` model, the guardrail allowlist +
+      `validate_action` (enforced brain-agnostically in `build_plans`), and a `render_plans`
+      dry-run renderer (plan only, no execution).
+- [x] `warden worker run [--once] [--dry-run]` CLI; `--dry-run` is the default and
+      `--execute` is refused (exit 2) until the guarded executor lands (T3).
+- [x] `tests/test_worker.py` (RuleBrain routing/secret/prod/unknown, guardrail downgrades a
+      reckless brain on secret/prod, off-allowlist rejection, render, CLI). 18 cases.
+- [x] Live dry-run against the real hub verified — read the inbox and produced a guardrailed
+      plan (it surfaced secrets-engine's OIDC-role reply, demonstrating the value).
+
+### T2 — llm-connect brain
+
+```task
+id: WARDEN-WP-0020-T02
+status: todo
+priority: high
+```
+
+- [ ] `LlmConnectBrain`: POST to llm-connect `/execute` with the fixed charter system
+      policy + the message as untrusted data; parse a structured action plan. Configurable
+      `llm_connect_url`. Blocked on llm-connect's API contract and on llm-connect being operational.
+
+### T3 — Action dispatch + guardrails (full-auto in-scope)
+
+```task
+id: WARDEN-WP-0020-T03
+status: todo
+priority: high
+```
+
+- [ ] Execute in-scope actions: `warden route/access` answers, drafted replies, mark-read,
+      catalog/playbook diffs (commit + sync). Enforce the allowlist + no-secret invariant in
+      code; per-action progress-event audit; escalation path to a human queue.
+
+### T4 — Scheduled trigger
+
+```task
+id: WARDEN-WP-0020-T04
+status: todo
+priority: medium
+```
+
+- [ ] Wire cron or activity-core to `warden worker run --once`. Ships **disabled**; enabled
+      only after a clean dry-run shakedown. Concurrency guard (no overlapping runs).
+
+### T5 — Docs / SCOPE / INTENT
+
+```task
+id: WARDEN-WP-0020-T05
+status: todo
+priority: medium
+```
+
+- [ ] Record the scope expansion: ops-warden gains an autonomous coordination worker.
+      Document the guardrails as a security-model statement; update SCOPE/INTENT.
+
+---
+
+## Acceptance
+
+- `warden worker run --dry-run` reads the real inbox and prints a guardrailed plan.
+- Full-auto execution runs only in-scope, allowlisted actions; secrets/prod/out-of-scope
+  escalate; every action is audited. No secret value ever enters a prompt, log, or commit.
+- Scheduled mode is enabled only after a dry-run shakedown.
+
+## See also
+
+- llm-connect (inference), activity-core (durable trigger), kaizen-agentic (personas)
+- `.claude/rules/credential-routing.md` (the boundary the worker enforces)
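
The allowlist and no-secret guardrails from the workplan above can be illustrated with a minimal sketch of a brain-agnostic validator. Everything here is an assumption for illustration: the action kind names, the `reason` field, and the keyword patterns are hypothetical and deliberately crude, and this is not the `validate_action` shipped in `src/warden/worker.py`.

```python
import re
from dataclasses import dataclass

# Hypothetical action kinds, loosely following the actions listed in the workplan's Goal.
ALLOWED_KINDS = frozenset(
    {"answer_routing", "send_reply", "mark_read", "propose_catalog_diff", "escalate"}
)

# Crude placeholder heuristics; a real check would need to be far more careful.
SECRET_HINT = re.compile(r"secret|password|token|api[ _-]?key|private[ _-]?key", re.IGNORECASE)
PROD_HINT = re.compile(r"prod(uction)?[ _-]?config", re.IGNORECASE)


@dataclass(frozen=True)
class PlannedAction:
    kind: str
    summary: str
    reason: str = ""


def escalate(reason: str) -> PlannedAction:
    return PlannedAction(kind="escalate", summary="hand off to a human", reason=reason)


def validate_action(action: PlannedAction, message_body: str) -> PlannedAction:
    """Return the action if it passes the guardrails, otherwise an escalation.

    The message body is treated as untrusted data: it can only make the
    outcome stricter, never relax the allowlist.
    """
    if action.kind == "escalate":
        return action
    if action.kind not in ALLOWED_KINDS:
        return escalate(f"off-allowlist: {action.kind}")
    text = f"{message_body}\n{action.summary}"
    if SECRET_HINT.search(text):
        return escalate("no-secret invariant")
    if PROD_HINT.search(text):
        return escalate("prod-config")
    return action
```

Because the check runs on whatever the brain proposes, a reckless or prompt-injected brain is downgraded the same way as a rule-based one. For example, `validate_action(PlannedAction("delete_repo", "cleanup"), "hi")` yields an `escalate` action with an off-allowlist reason.
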