Record daily triage schema canary blocker

2026-05-21 03:19:27 +02:00
parent ed6a13c8d7
commit a28deec772
3 changed files with 114 additions and 6 deletions
--- a/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md
+++ b/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md
@@ -10,7 +10,7 @@ topic_slug: custodian
 planning_priority: high
 planning_order: 45
 created: "2026-05-19"
-updated: "2026-05-19"
+updated: "2026-05-21"
 state_hub_workstream_id: "d9d9a3ec-f736-4041-beac-bb92c7ad314e"
 ---

@@ -421,6 +421,81 @@ Verification:
  `PYTHONPATH=. uv run pytest -q`:
  173 passed

+## Implementation Notes - 2026-05-21
+
+T06 remains in progress; no cutover was performed and the Codex automation must
+remain the fallback runner. The daily triage ActivityDefinition is still
+`enabled: false`.
+
+Real llm-connect canary attempt 1 reached the activity-core workflow but failed
+before report persistence:
+
+- workflow id:
+  `activity-6fca51fa-387a-4fd0-bc4e-d62c29eb859a:manual-d0317873-5e09-4849-a57a-6edff7fada2c`
+- Temporal status: `COMPLETED`
+- activity-core run id: `9b8486b5-0495-5d3f-8b7b-dc078a7c097b`
+- worker evidence: llm-connect returned HTTP 200 twice, but activity-core
+  rejected the instruction output as invalid JSON
+- persistence evidence: no working-memory note and no State Hub
+  `daily_triage` progress event were written
+
+Diagnosis showed that server-mode llm-connect was resolving the older
+`/usr/bin/claude` CLI instead of the working user install at
+`/home/worsch/.local/bin/claude`. A direct llm-connect probe through the older
+CLI returned the literal content `Execution error`, while the user install could
+return raw JSON. Restarting llm-connect with the user CLI path made a small
+probe return `{"ok": true}` through the HTTP boundary.
+
+Real llm-connect canary attempt 2 used the working Claude CLI path but still did
+not produce a persisted report:
+
+- workflow id:
+  `activity-6fca51fa-387a-4fd0-bc4e-d62c29eb859a:manual-2de56ad6-0f82-48f0-8184-f357bd22f658`
+- Temporal status: `COMPLETED`
+- activity-core run id: `953a1f46-e57b-58e1-b4a2-2e41e804a972`
+- worker evidence: first llm-connect call returned HTTP 200, then activity-core
+  retried because the output was not schema-valid JSON; the retry returned
+  HTTP 500
+- persistence evidence: no working-memory note and no State Hub
+  `daily_triage` progress event were written
+
+The follow-up fix keeps the existing activity-core/llm-connect boundary:
+
+- activity-core now loads an instruction's existing `output_schema` and forwards
+  that schema to llm-connect as `model_params.json_schema`
+- llm-connect's Claude Code adapter now prefers
+  `LLM_CONNECT_CLAUDE_CLI_PATH`, `CLAUDE_CLI_PATH`, or the user-local
+  `/home/worsch/.local/bin/claude` before falling back to `claude`
+- llm-connect's Claude Code adapter maps `model_params.json_schema` to the
+  native Claude CLI `--json-schema` option
+- the Custodian ActivityDefinition now points at the domain-owned absolute
+  schema path `/home/worsch/the-custodian/schemas/daily-triage-report.json`
+  and asks for JSON-only output as a fallback
+
+The patched schema probe could not be completed because the local Claude Code
+session limit was reached; the CLI reported:
+`You've hit your session limit · resets 3:40am (Europe/Berlin)`.
+
+The next T06 step, once the limit resets or llm-connect routes this profile to
+another approved provider, is to rerun the manual trigger with the patched
+schema path and verify all three evidence surfaces before pausing Codex or
+enabling the activity-core schedule.
+
+Verification:
+
+- activity-core focused executor tests:
+  `uv run pytest tests/rules/test_executor.py -q`:
+  22 passed
+- llm-connect focused Claude Code/factory tests:
+  `PYTHONPATH=. uv run pytest tests/test_claude_code.py tests/test_factory.py -q`:
+  18 passed
+- activity-core full suite:
+  `uv run pytest -q`:
+  115 passed, 1 skipped
+- llm-connect full suite:
+  `PYTHONPATH=. uv run pytest -q`:
+  175 passed
+
 ## Acceptance Criteria

 - The daily State Hub WSJF triage runs from activity-core, not Codex app cron.