Close CUST-WP-0045-T06 — canary green on 2026-06-02

The daily-triage workflow completed end-to-end with all three evidence surfaces (working-memory note, State Hub daily_triage event, ActivityRun row) referencing the same run_id f9b97749. Backend: llm-connect against OpenRouter anthropic/claude-sonnet-4, 12.85s end-to-end. Add Implementation Notes - 2026-06-02 capturing the bug chain found and fixed today (five llm-connect commits, two activity-core commits), the backend choice and its consequences for the next scheduled run, and an explicit carve-out: the operational cutover step (pause Codex, flip enabled: true, sync schedules) is intentionally deferred to operator action and remains a prerequisite for T08. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 15:50:35 +02:00
parent 1874ab52bb
commit 80309950bc
1 changed files with 96 additions and 2 deletions
--- a/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md
+++ b/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md
@@ -10,7 +10,7 @@ topic_slug: custodian
 planning_priority: high
 planning_order: 45
 created: "2026-05-19"
-updated: "2026-06-01"
+updated: "2026-06-02"
 state_hub_workstream_id: "d9d9a3ec-f736-4041-beac-bb92c7ad314e"
 ---

@@ -237,7 +237,7 @@ paused Temporal schedule while disabled.

 ```task
 id: CUST-WP-0045-T06
-status: in_progress
+status: done
 priority: high
 depends_on: [CUST-WP-0045-T05]
 state_hub_task_id: "545162d7-0198-4519-a30b-06e88c6db915"
@@ -556,6 +556,100 @@ T07 and T08 remain `todo`. The cutover runbook overlaps with T07's evidence
 queries but does not cover T07's steady-state "did it run today?" framing or
 the missed-run policy documentation, so T07 is not satisfied by this session.

+## Implementation Notes - 2026-06-02
+
+T06 is `done`. The patched canary completed end-to-end against a real LLM and
+produced all three evidence surfaces in one run.
+
+**Canary run**
+
+- workflow id:
+  `activity-6fca51fa-387a-4fd0-bc4e-d62c29eb859a:manual-14811b78-f9fd-4f60-9304-50fdd863960a`
+- activity-core run id: `f9b97749-c1d0-5746-ab18-89932bef47c1`
+- Temporal status: `COMPLETED`, runtime `12.85s`
+- working-memory note:
+  `memory/working/daily-triage-2026-06-02-f9b97749.md` (2,675 bytes,
+  9 recommendations covering work-next/needs-human/revisit/split/park,
+  including a correct self-aware recommendation on `cust-wp-0045` itself
+  and a `park` on `adhoc-2026-06-01`)
+- State Hub progress event: `935244fa-b438-488c-a11a-42e1a84e3d59`
+  (`event_type: daily_triage`, full report nested under `detail.report`)
+
+**Backend used for the canary**
+
+The canary ran via llm-connect against OpenRouter `anthropic/claude-sonnet-4`,
+not the local Claude Code CLI. The choice was operational: the local CLI
+backend shares quota with any active Claude Code session (so it could not be
+driven from inside one), and the JSON schema mode that the local CLI exposes
+hit a series of envelope-shape and routing bugs that were patched but never
+fully verified end-to-end on a long prompt. The next operator-scheduled run
+will inherit whatever llm-connect is currently configured with.
+
+**Bug chain found and fixed during T06**
+
+llm-connect:
+
+- `9de0f49` — Claude Code adapter passes `--output-format json` whenever
+  `--json-schema` is set, then unwraps the CLI envelope. Without this,
+  `--print` returned conversational text on stdout and the structured
+  payload went to a sidecar channel the adapter never read.
+- `435da49` — envelope unwrap prefers any field whose value parses as JSON
+  and skips telemetry keys (`type`, `usage`, `total_cost_usd`, …) so a prose
+  preamble in `result` no longer wins over the structured payload elsewhere
+  in the envelope.
+- `cd4551c` — OpenRouter adapter translates `model_params.json_schema` to
+  the OpenAI Chat Completions `response_format` wrapper and drops Claude /
+  llm-connect-specific keys (`reasoning_effort`, `max_depth`). The previous
+  naive `payload.update(config.model_params)` triggered an OpenRouter 400.
+- `583ab57` — OpenRouter `response_format.json_schema.strict` defaults to
+  `False` because most real schemas do not satisfy OpenAI strict mode
+  (`additionalProperties: false` everywhere, all properties required).
+- `1b01f0e` — OpenRouter adapter honours an explicit constructor `--model`
+  even when that value equals the adapter's hardcoded default. Without this
+  fix, `--model anthropic/claude-sonnet-4` silently fell through to
+  `RunConfig.model_name` (which defaults to `"gpt-4"`), routing every call
+  to OpenAI's gpt-4 — which does not accept `response_format: json_schema`
+  and 400'd. This bug masqueraded as the strict-mode and translation
+  problems above for hours.
+
+activity-core:
+
+- `c79d098` — `_ACTIVITY_TIMEOUT` is now `ACTIVITY_TIMEOUT_SECONDS` env-
+  configurable (default 900s). Two racing 5-minute timeouts had been
+  killing the worker side just before llm-connect could write the
+  response, surfacing as `BrokenPipeError` server-side.
+- `4b4e162` — `instruction_output_error` warning now includes a 2KB raw
+  output preview so the next failure of this shape is one grep away
+  from diagnosis instead of requiring code edits.
+
+**Operational cutover — NOT done in this session**
+
+T06's original done-criteria included activity-core being "the only enabled
+runner" and "the first scheduled run has completed successfully." The
+technical canary half is complete and proven; the operational toggle remains
+explicitly deferred to operator action:
+
+1. Pause the Codex Desktop automation `daily-state-hub-wsjf-triage`.
+2. Edit `activity-definitions/daily-statehub-wsjf-triage.md` to set
+   `enabled: true`.
+3. Run `make sync-all && make sync-schedules` in activity-core to unpause
+   the Temporal schedule.
+4. Watch the next 07:20 Europe/Berlin tick; verify all three evidence
+   surfaces persist on a scheduled (not manual) run.
+
+This is a prerequisite for **T08** (three daily canaries against the new
+runner for CUST-WP-0044 calibration). Until the operator performs the
+cutover, the Codex automation remains the active runner and the
+activity-core schedule stays paused.
+
+**Verification**
+
+- activity-core full suite: `uv run pytest -q` → 120 passed, 1 skipped
+- llm-connect full suite: `PYTHONPATH=. uv run pytest -q` → 179 passed
+- Live canary against State Hub on 2026-06-02 at 12:52 UTC → three evidence
+  surfaces, identical run_id across all four (file, progress event,
+  ActivityRun row, Temporal result)
+
 ## Acceptance Criteria

 - The daily State Hub WSJF triage runs from activity-core, not Codex app cron.