Close CUST-WP-0045-T06 — canary green on 2026-06-02

The daily-triage workflow completed end-to-end with all three evidence
surfaces (working-memory note, State Hub daily_triage event, ActivityRun
row) referencing the same run_id f9b97749. Backend: llm-connect against
OpenRouter anthropic/claude-sonnet-4, 12.85s end-to-end.

Add Implementation Notes - 2026-06-02 capturing the bug chain found and
fixed today (five llm-connect commits, two activity-core commits), the
backend choice and its consequences for the next scheduled run, and an
explicit carve-out: the operational cutover step (pause Codex, flip
enabled: true, sync schedules) is intentionally deferred to operator
action and remains a prerequisite for T08.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-06-02 15:50:35 +02:00
parent 1874ab52bb
commit 80309950bc

View File

@@ -10,7 +10,7 @@ topic_slug: custodian
planning_priority: high
planning_order: 45
created: "2026-05-19"
updated: "2026-06-01"
updated: "2026-06-02"
state_hub_workstream_id: "d9d9a3ec-f736-4041-beac-bb92c7ad314e"
---
@@ -237,7 +237,7 @@ paused Temporal schedule while disabled.
```task
id: CUST-WP-0045-T06
status: in_progress
status: done
priority: high
depends_on: [CUST-WP-0045-T05]
state_hub_task_id: "545162d7-0198-4519-a30b-06e88c6db915"
@@ -556,6 +556,100 @@ T07 and T08 remain `todo`. The cutover runbook overlaps with T07's evidence
queries but does not cover T07's steady-state "did it run today?" framing or
the missed-run policy documentation, so T07 is not satisfied by this session.
## Implementation Notes - 2026-06-02
T06 is `done`. The patched canary completed end-to-end against a real LLM and
produced all three evidence surfaces in one run.
**Canary run**
- workflow id:
`activity-6fca51fa-387a-4fd0-bc4e-d62c29eb859a:manual-14811b78-f9fd-4f60-9304-50fdd863960a`
- activity-core run id: `f9b97749-c1d0-5746-ab18-89932bef47c1`
- Temporal status: `COMPLETED`, runtime `12.85s`
- working-memory note:
`memory/working/daily-triage-2026-06-02-f9b97749.md` (2,675 bytes,
9 recommendations covering work-next/needs-human/revisit/split/park,
including a correct self-aware recommendation on `cust-wp-0045` itself
and a `park` on `adhoc-2026-06-01`)
- State Hub progress event: `935244fa-b438-488c-a11a-42e1a84e3d59`
(`event_type: daily_triage`, full report nested under `detail.report`)
**Backend used for the canary**
The canary ran via llm-connect against OpenRouter `anthropic/claude-sonnet-4`,
not the local Claude Code CLI. The choice was operational: the local CLI
backend shares quota with any active Claude Code session (so it could not be
driven from inside one), and the JSON schema mode that the local CLI exposes
hit a series of envelope-shape and routing bugs that were patched but never
fully verified end-to-end on a long prompt. The next operator-scheduled run
will inherit whatever llm-connect is currently configured with.
**Bug chain found and fixed during T06**
llm-connect:
- `9de0f49` — Claude Code adapter passes `--output-format json` whenever
`--json-schema` is set, then unwraps the CLI envelope. Without this,
`--print` returned conversational text on stdout and the structured
payload went to a sidecar channel the adapter never read.
- `435da49` — envelope unwrap prefers any field whose value parses as JSON
and skips telemetry keys (`type`, `usage`, `total_cost_usd`, …) so a prose
preamble in `result` no longer wins over the structured payload elsewhere
in the envelope.
- `cd4551c` — OpenRouter adapter translates `model_params.json_schema` to
the OpenAI Chat Completions `response_format` wrapper and drops Claude /
llm-connect-specific keys (`reasoning_effort`, `max_depth`). The previous
naive `payload.update(config.model_params)` triggered an OpenRouter 400.
- `583ab57` — OpenRouter `response_format.json_schema.strict` defaults to
`False` because most real schemas do not satisfy OpenAI strict mode
(`additionalProperties: false` everywhere, all properties required).
- `1b01f0e` — OpenRouter adapter honours an explicit constructor `--model`
even when that value equals the adapter's hardcoded default. Without this
fix, `--model anthropic/claude-sonnet-4` silently fell through to
`RunConfig.model_name` (which defaults to `"gpt-4"`), routing every call
to OpenAI's gpt-4 — which does not accept `response_format: json_schema`
and 400'd. This bug masqueraded as the strict-mode and translation
problems above for hours.
activity-core:
- `c79d098` — `_ACTIVITY_TIMEOUT` is now `ACTIVITY_TIMEOUT_SECONDS` env-
configurable (default 900s). Two racing 5-minute timeouts had been
killing the worker side just before llm-connect could write the
response, surfacing as `BrokenPipeError` server-side.
- `4b4e162` — `instruction_output_error` warning now includes a 2KB raw
output preview so the next failure of this shape is one grep away
from diagnosis instead of requiring code edits.
**Operational cutover — NOT done in this session**
T06's original done-criteria included activity-core being "the only enabled
runner" and "the first scheduled run has completed successfully." The
technical canary half is complete and proven; the operational toggle remains
explicitly deferred to operator action:
1. Pause the Codex Desktop automation `daily-state-hub-wsjf-triage`.
2. Edit `activity-definitions/daily-statehub-wsjf-triage.md` to set
`enabled: true`.
3. Run `make sync-all && make sync-schedules` in activity-core to unpause
the Temporal schedule.
4. Watch the next 07:20 Europe/Berlin tick; verify all three evidence
surfaces persist on a scheduled (not manual) run.
This is a prerequisite for **T08** (three daily canaries against the new
runner for CUST-WP-0044 calibration). Until the operator performs the
cutover, the Codex automation remains the active runner and the
activity-core schedule stays paused.
**Verification**
- activity-core full suite: `uv run pytest -q` → 120 passed, 1 skipped
- llm-connect full suite: `PYTHONPATH=. uv run pytest -q` → 179 passed
- Live canary against State Hub on 2026-06-02 at 12:52 UTC → three evidence
surfaces, identical run_id across all four (file, progress event,
ActivityRun row, Temporal result)
## Acceptance Criteria
- The daily State Hub WSJF triage runs from activity-core, not Codex app cron.