Close CUST-WP-0045-T06 — canary green on 2026-06-02
The daily-triage workflow completed end-to-end with all three evidence surfaces (working-memory note, State Hub daily_triage event, ActivityRun row) referencing the same run_id f9b97749. Backend: llm-connect against OpenRouter anthropic/claude-sonnet-4, 12.85s end-to-end. Add Implementation Notes - 2026-06-02 capturing the bug chain found and fixed today (five llm-connect commits, two activity-core commits), the backend choice and its consequences for the next scheduled run, and an explicit carve-out: the operational cutover step (pause Codex, flip enabled: true, sync schedules) is intentionally deferred to operator action and remains a prerequisite for T08. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -10,7 +10,7 @@ topic_slug: custodian
|
||||
planning_priority: high
|
||||
planning_order: 45
|
||||
created: "2026-05-19"
|
||||
updated: "2026-06-01"
|
||||
updated: "2026-06-02"
|
||||
state_hub_workstream_id: "d9d9a3ec-f736-4041-beac-bb92c7ad314e"
|
||||
---
|
||||
|
||||
@@ -237,7 +237,7 @@ paused Temporal schedule while disabled.
|
||||
|
||||
```task
|
||||
id: CUST-WP-0045-T06
|
||||
status: in_progress
|
||||
status: done
|
||||
priority: high
|
||||
depends_on: [CUST-WP-0045-T05]
|
||||
state_hub_task_id: "545162d7-0198-4519-a30b-06e88c6db915"
|
||||
@@ -556,6 +556,100 @@ T07 and T08 remain `todo`. The cutover runbook overlaps with T07's evidence
|
||||
queries but does not cover T07's steady-state "did it run today?" framing or
|
||||
the missed-run policy documentation, so T07 is not satisfied by this session.
|
||||
|
||||
## Implementation Notes - 2026-06-02
|
||||
|
||||
T06 is `done`. The patched canary completed end-to-end against a real LLM and
|
||||
produced all three evidence surfaces in one run.
|
||||
|
||||
**Canary run**
|
||||
|
||||
- workflow id:
|
||||
`activity-6fca51fa-387a-4fd0-bc4e-d62c29eb859a:manual-14811b78-f9fd-4f60-9304-50fdd863960a`
|
||||
- activity-core run id: `f9b97749-c1d0-5746-ab18-89932bef47c1`
|
||||
- Temporal status: `COMPLETED`, runtime `12.85s`
|
||||
- working-memory note:
|
||||
`memory/working/daily-triage-2026-06-02-f9b97749.md` (2,675 bytes,
|
||||
9 recommendations covering work-next/needs-human/revisit/split/park,
|
||||
including a correct self-aware recommendation on `cust-wp-0045` itself
|
||||
and a `park` on `adhoc-2026-06-01`)
|
||||
- State Hub progress event: `935244fa-b438-488c-a11a-42e1a84e3d59`
|
||||
(`event_type: daily_triage`, full report nested under `detail.report`)
|
||||
|
||||
**Backend used for the canary**
|
||||
|
||||
The canary ran via llm-connect against OpenRouter `anthropic/claude-sonnet-4`,
|
||||
not the local Claude Code CLI. The choice was operational: the local CLI
|
||||
backend shares quota with any active Claude Code session (so it could not be
|
||||
driven from inside one), and the JSON schema mode that the local CLI exposes
|
||||
hit a series of envelope-shape and routing bugs that were patched but never
|
||||
fully verified end-to-end on a long prompt. The next operator-scheduled run
|
||||
will inherit whatever llm-connect is currently configured with.
|
||||
|
||||
**Bug chain found and fixed during T06**
|
||||
|
||||
llm-connect:
|
||||
|
||||
- `9de0f49` — Claude Code adapter passes `--output-format json` whenever
|
||||
`--json-schema` is set, then unwraps the CLI envelope. Without this,
|
||||
`--print` returned conversational text on stdout and the structured
|
||||
payload went to a sidecar channel the adapter never read.
|
||||
- `435da49` — envelope unwrap prefers any field whose value parses as JSON
|
||||
and skips telemetry keys (`type`, `usage`, `total_cost_usd`, …) so a prose
|
||||
preamble in `result` no longer wins over the structured payload elsewhere
|
||||
in the envelope.
|
||||
- `cd4551c` — OpenRouter adapter translates `model_params.json_schema` to
|
||||
the OpenAI Chat Completions `response_format` wrapper and drops Claude /
|
||||
llm-connect-specific keys (`reasoning_effort`, `max_depth`). The previous
|
||||
naive `payload.update(config.model_params)` triggered an OpenRouter 400.
|
||||
- `583ab57` — OpenRouter `response_format.json_schema.strict` defaults to
|
||||
`False` because most real schemas do not satisfy OpenAI strict mode
|
||||
(`additionalProperties: false` everywhere, all properties required).
|
||||
- `1b01f0e` — OpenRouter adapter honours an explicit constructor `--model`
|
||||
even when that value equals the adapter's hardcoded default. Without this
|
||||
fix, `--model anthropic/claude-sonnet-4` silently fell through to
|
||||
`RunConfig.model_name` (which defaults to `"gpt-4"`), routing every call
|
||||
to OpenAI's gpt-4 — which does not accept `response_format: json_schema`
|
||||
and 400'd. This bug masqueraded as the strict-mode and translation
|
||||
problems above for hours.
|
||||
|
||||
activity-core:
|
||||
|
||||
- `c79d098` — `_ACTIVITY_TIMEOUT` is now `ACTIVITY_TIMEOUT_SECONDS` env-
|
||||
configurable (default 900s). Two racing 5-minute timeouts had been
|
||||
killing the worker side just before llm-connect could write the
|
||||
response, surfacing as `BrokenPipeError` server-side.
|
||||
- `4b4e162` — `instruction_output_error` warning now includes a 2KB raw
|
||||
output preview so the next failure of this shape is one grep away
|
||||
from diagnosis instead of requiring code edits.
|
||||
|
||||
**Operational cutover — NOT done in this session**
|
||||
|
||||
T06's original done-criteria included activity-core being "the only enabled
|
||||
runner" and "the first scheduled run has completed successfully." The
|
||||
technical canary half is complete and proven; the operational toggle remains
|
||||
explicitly deferred to operator action:
|
||||
|
||||
1. Pause the Codex Desktop automation `daily-state-hub-wsjf-triage`.
|
||||
2. Edit `activity-definitions/daily-statehub-wsjf-triage.md` to set
|
||||
`enabled: true`.
|
||||
3. Run `make sync-all && make sync-schedules` in activity-core to unpause
|
||||
the Temporal schedule.
|
||||
4. Watch the next 07:20 Europe/Berlin tick; verify all three evidence
|
||||
surfaces persist on a scheduled (not manual) run.
|
||||
|
||||
This is a prerequisite for **T08** (three daily canaries against the new
|
||||
runner for CUST-WP-0044 calibration). Until the operator performs the
|
||||
cutover, the Codex automation remains the active runner and the
|
||||
activity-core schedule stays paused.
|
||||
|
||||
**Verification**
|
||||
|
||||
- activity-core full suite: `uv run pytest -q` → 120 passed, 1 skipped
|
||||
- llm-connect full suite: `PYTHONPATH=. uv run pytest -q` → 179 passed
|
||||
- Live canary against State Hub on 2026-06-02 at 12:52 UTC → three evidence
|
||||
surfaces, identical run_id across all four (file, progress event,
|
||||
ActivityRun row, Temporal result)
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- The daily State Hub WSJF triage runs from activity-core, not Codex app cron.
|
||||
|
||||
Reference in New Issue
Block a user