feat(WARDEN-WP-0020): conservative triage tier as the --execute default (Option A)

Per Bernd's call: the guardrails prevent security harm but not LLM content errors, so the
worker should triage and draft rather than auto-send until reply quality is proven. This
matches the build-stage/recoverability posture.

run_conservative triages NEW messages into a digest for human review
(state_dir/worker-digest.md) with drafted replies, posts ONE progress note, tracks seen
message ids (schedule-safe dedup), sends NOTHING to other agents, and marks nothing as read.
`warden worker run --execute` now runs this conservative tier; `--full-auto` opts into the
auto-send path.
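
For readers without the diff at hand, the conservative flow described above might look
roughly like the sketch below. It is not the code from this commit: `Message`,
`run_conservative_sketch`, the `seen-ids.json` file name, the `draft_reply` and `post_note`
callbacks, and the choice to append to the digest are all assumptions made for illustration.
Only `worker-digest.md` and the overall behaviour (draft, dedup, one note, no sends) come
from the message above.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Message:
    # Hypothetical message shape; the real inbox model may differ.
    id: str
    sender: str
    body: str


def run_conservative_sketch(messages, state_dir, draft_reply, post_note):
    """Triage unseen messages into a digest. Sends nothing, marks nothing read."""
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)

    # Assumed storage for seen ids; persisting them makes repeated scheduled runs idempotent.
    seen_path = state_dir / "seen-ids.json"
    seen = set(json.loads(seen_path.read_text())) if seen_path.exists() else set()

    new = [m for m in messages if m.id not in seen]
    if not new:
        return []

    sections = [
        f"## {m.id} from {m.sender}\n\n{m.body}\n\nDraft reply:\n\n{draft_reply(m)}\n"
        for m in new
    ]
    # Assumption: the digest is appended to, so earlier drafts stay available for review.
    with (state_dir / "worker-digest.md").open("a", encoding="utf-8") as fh:
        fh.write("\n".join(sections))

    seen.update(m.id for m in new)
    seen_path.write_text(json.dumps(sorted(seen)))

    # Exactly one progress note per run, regardless of how many messages were triaged.
    post_note(f"triaged {len(new)} new message(s) into worker-digest.md")
    return [m.id for m in new]
```

Running the sketch twice over the same inbox drafts each message once and posts a single
note, which is the property that makes this tier safe to put on a schedule.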

Live-verified with the LLM brain on the real inbox: it produced a high-quality draft reply
to a secrets-engine coordination message and correctly flagged the llm-connect custody
request as NEEDS YOU. Conservative mode is safe to schedule (T4). 244 tests pass, lint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 00:38:36 +02:00
parent a55b3b7735
commit d0261ebb52
4 changed files with 137 additions and 12 deletions

@@ -120,8 +120,15 @@ state_hub_task_id: "3a71965e-42d5-4258-9761-aced804c88e7"
per-message audit summary. Tests in `tests/test_worker.py` (route_answer reply+mark,
reply-with/without-body, escalated skip, catalog-diff left-for-human, progress_note,
failure-without-crash). 243 pass, lint clean.
- Note: first **live** `--execute` shakedown is the operator's (staged rollout: dry-run →
manual → scheduled); T4 wraps it on a schedule.
- [x] **Conservative tier is now the `--execute` default (Bernd's Option A, 2026-06-30):**
`run_conservative` triages NEW messages into a reviewed digest (`worker-digest.md`)
with drafted replies, posts ONE progress note, tracks seen ids (schedule-safe dedup),
and sends **nothing** to other agents / marks nothing read. `--full-auto` opts into the
auto-send path. Live-verified with the LLM brain: produced a high-quality draft reply
to secrets-engine and flagged the llm-connect request as NEEDS YOU. 244 tests.
Rationale: the guardrails prevent *security* harm but not LLM *content* errors, so replies
stay drafts-for-approval until quality is proven — matches the build-stage/recoverability
posture. Conservative mode is safe to schedule (T4).
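
The flag behaviour described in the item above could be wired up along these lines. This is
a minimal sketch, not the actual `warden` CLI code: `resolve_mode` is a hypothetical helper,
and two points are assumptions rather than facts from this change, namely that omitting both
flags means a dry run and that `--full-auto` takes effect on its own without `--execute`.

```python
import argparse


def resolve_mode(argv):
    """Map CLI flags to a worker tier (hypothetical helper for illustration)."""
    parser = argparse.ArgumentParser(prog="warden worker run")
    parser.add_argument("--execute", action="store_true",
                        help="run the conservative triage-and-draft tier")
    parser.add_argument("--full-auto", action="store_true",
                        help="opt into the auto-send path")
    args = parser.parse_args(argv)

    if args.full_auto:
        return "full-auto"      # explicit opt-in to sending replies
    if args.execute:
        return "conservative"   # default executing tier: drafts only
    return "dry-run"            # assumption: no flags means no side effects


print(resolve_mode(["--execute"]))
```

Keeping the auto-send path behind its own flag means a scheduled `--execute` job cannot
start sending replies by accident.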
### T4 — Scheduled trigger