From 80309950bc0c77fc53963051f3482ceb0a62028f Mon Sep 17 00:00:00 2001 From: tegwick Date: Tue, 2 Jun 2026 15:50:35 +0200 Subject: [PATCH] =?UTF-8?q?Close=20CUST-WP-0045-T06=20=E2=80=94=20canary?= =?UTF-8?q?=20green=20on=202026-06-02?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The daily-triage workflow completed end-to-end with all three evidence surfaces (working-memory note, State Hub daily_triage event, ActivityRun row) referencing the same run_id f9b97749. Backend: llm-connect against OpenRouter anthropic/claude-sonnet-4, 12.85s end-to-end. Add Implementation Notes - 2026-06-02 capturing the bug chain found and fixed today (five llm-connect commits, two activity-core commits), the backend choice and its consequences for the next scheduled run, and an explicit carve-out: the operational cutover step (pause Codex, flip enabled: true, sync schedules) is intentionally deferred to operator action and remains a prerequisite for T08. Co-Authored-By: Claude Opus 4.7 --- ...-0045-activity-core-daily-triage-runner.md | 98 ++++++++++++++++++- 1 file changed, 96 insertions(+), 2 deletions(-) diff --git a/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md b/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md index ca0db9a..92558f0 100644 --- a/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md +++ b/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md @@ -10,7 +10,7 @@ topic_slug: custodian planning_priority: high planning_order: 45 created: "2026-05-19" -updated: "2026-06-01" +updated: "2026-06-02" state_hub_workstream_id: "d9d9a3ec-f736-4041-beac-bb92c7ad314e" --- @@ -237,7 +237,7 @@ paused Temporal schedule while disabled. ```task id: CUST-WP-0045-T06 -status: in_progress +status: done priority: high depends_on: [CUST-WP-0045-T05] state_hub_task_id: "545162d7-0198-4519-a30b-06e88c6db915" @@ -556,6 +556,100 @@ T07 and T08 remain `todo`. The cutover runbook overlaps with T07's evidence queries but does not cover T07's steady-state "did it run today?" framing or the missed-run policy documentation, so T07 is not satisfied by this session. +## Implementation Notes - 2026-06-02 + +T06 is `done`. The patched canary completed end-to-end against a real LLM and +produced all three evidence surfaces in one run. + +**Canary run** + +- workflow id: + `activity-6fca51fa-387a-4fd0-bc4e-d62c29eb859a:manual-14811b78-f9fd-4f60-9304-50fdd863960a` +- activity-core run id: `f9b97749-c1d0-5746-ab18-89932bef47c1` +- Temporal status: `COMPLETED`, runtime `12.85s` +- working-memory note: + `memory/working/daily-triage-2026-06-02-f9b97749.md` (2,675 bytes, + 9 recommendations covering work-next/needs-human/revisit/split/park, + including a correct self-aware recommendation on `cust-wp-0045` itself + and a `park` on `adhoc-2026-06-01`) +- State Hub progress event: `935244fa-b438-488c-a11a-42e1a84e3d59` + (`event_type: daily_triage`, full report nested under `detail.report`) + +**Backend used for the canary** + +The canary ran via llm-connect against OpenRouter `anthropic/claude-sonnet-4`, +not the local Claude Code CLI. The choice was operational: the local CLI +backend shares quota with any active Claude Code session (so it could not be +driven from inside one), and the JSON schema mode that the local CLI exposes +hit a series of envelope-shape and routing bugs that were patched but never +fully verified end-to-end on a long prompt. The next operator-scheduled run +will inherit whatever llm-connect is currently configured with. + +**Bug chain found and fixed during T06** + +llm-connect: + +- `9de0f49` — Claude Code adapter passes `--output-format json` whenever + `--json-schema` is set, then unwraps the CLI envelope. Without this, + `--print` returned conversational text on stdout and the structured + payload went to a sidecar channel the adapter never read. +- `435da49` — envelope unwrap prefers any field whose value parses as JSON + and skips telemetry keys (`type`, `usage`, `total_cost_usd`, …) so a prose + preamble in `result` no longer wins over the structured payload elsewhere + in the envelope. +- `cd4551c` — OpenRouter adapter translates `model_params.json_schema` to + the OpenAI Chat Completions `response_format` wrapper and drops Claude / + llm-connect-specific keys (`reasoning_effort`, `max_depth`). The previous + naive `payload.update(config.model_params)` triggered an OpenRouter 400. +- `583ab57` — OpenRouter `response_format.json_schema.strict` defaults to + `False` because most real schemas do not satisfy OpenAI strict mode + (`additionalProperties: false` everywhere, all properties required). +- `1b01f0e` — OpenRouter adapter honours an explicit constructor `--model` + even when that value equals the adapter's hardcoded default. Without this + fix, `--model anthropic/claude-sonnet-4` silently fell through to + `RunConfig.model_name` (which defaults to `"gpt-4"`), routing every call + to OpenAI's gpt-4 — which does not accept `response_format: json_schema` + and 400'd. This bug masqueraded as the strict-mode and translation + problems above for hours. + +activity-core: + +- `c79d098` — `_ACTIVITY_TIMEOUT` is now `ACTIVITY_TIMEOUT_SECONDS` env- + configurable (default 900s). Two racing 5-minute timeouts had been + killing the worker side just before llm-connect could write the + response, surfacing as `BrokenPipeError` server-side. +- `4b4e162` — `instruction_output_error` warning now includes a 2KB raw + output preview so the next failure of this shape is one grep away + from diagnosis instead of requiring code edits. + +**Operational cutover — NOT done in this session** + +T06's original done-criteria included activity-core being "the only enabled +runner" and "the first scheduled run has completed successfully." The +technical canary half is complete and proven; the operational toggle remains +explicitly deferred to operator action: + +1. Pause the Codex Desktop automation `daily-state-hub-wsjf-triage`. +2. Edit `activity-definitions/daily-statehub-wsjf-triage.md` to set + `enabled: true`. +3. Run `make sync-all && make sync-schedules` in activity-core to unpause + the Temporal schedule. +4. Watch the next 07:20 Europe/Berlin tick; verify all three evidence + surfaces persist on a scheduled (not manual) run. + +This is a prerequisite for **T08** (three daily canaries against the new +runner for CUST-WP-0044 calibration). Until the operator performs the +cutover, the Codex automation remains the active runner and the +activity-core schedule stays paused. + +**Verification** + +- activity-core full suite: `uv run pytest -q` → 120 passed, 1 skipped +- llm-connect full suite: `PYTHONPATH=. uv run pytest -q` → 179 passed +- Live canary against State Hub on 2026-06-02 at 12:52 UTC → three evidence + surfaces, identical run_id across all four (file, progress event, + ActivityRun row, Temporal result) + ## Acceptance Criteria - The daily State Hub WSJF triage runs from activity-core, not Codex app cron.