Capture llm-connect lessons from CUST-WP-0045 canary as ADHOC-2026-06-02
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled

The 2026-06-02 daily-triage canary debugging session uncovered five real
bugs (commits 9de0f49, 435da49, cd4551c, 583ab57, 1b01f0e), mostly because
llm-connect has no way to see what payload the adapter sent or what the
provider returned. Capture the six structural improvements that would
collapse the next diagnosis of this shape from half a day to minutes:

  T01 — LLM_CONNECT_DEBUG envelope mode for /execute responses
  T02 — ThreadingHTTPServer drop-in replacement for stdlib HTTPServer
  T03 — Per-call audit log + replay CLI (LLM_CONNECT_AUDIT_DIR)
  T04 — Apply param-translation contract to OpenAI and Gemini adapters
  T05 — Provider-agnostic structured-output smoke test in CI
  T06 — Document the model_params translation contract for adapter authors

All six registered in the State Hub under workstream
adhoc-llmc-2026-06-02 (1c936c91-79c7-427d-ab37-9052e8a61cda).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-06-02 15:55:42 +02:00
parent 1b01f0edf4
commit 79c899b694

View File

@@ -0,0 +1,212 @@
---
id: ADHOC-2026-06-02
type: workplan
title: "Ad hoc — llm-connect lessons from CUST-WP-0045 canary"
domain: custodian
repo: llm-connect
status: ready
owner: custodian
topic_slug: custodian
created: "2026-06-02"
updated: "2026-06-02"
state_hub_workstream_id: "1c936c91-79c7-427d-ab37-9052e8a61cda"
---
# ADHOC-2026-06-02 — llm-connect lessons from CUST-WP-0045 canary
Captured at the close of CUST-WP-0045 T06. The daily-triage canary uncovered
a sequence of bugs that took the better part of a day to diagnose, mostly
because llm-connect has no way to see what the underlying provider returned
or what payload the adapter sent. Five commits landed today:
- `9de0f49` — Claude Code adapter passes `--output-format json` and unwraps
- `435da49` — envelope unwrap prefers JSON-bearing fields over prose
- `cd4551c` — OpenRouter adapter translates `json_schema``response_format`
and drops Claude-specific keys
- `583ab57` — OpenRouter `response_format.json_schema.strict` defaults to `False`
- `1b01f0e` — OpenRouter honours explicit `--model` even when it equals the
hardcoded default
This adhoc captures the structural improvements that would have collapsed the
diagnosis loop from half a day to minutes. Each task is sized as an opportunistic
improvement; the larger ones explicitly call out when to promote into a
workplan.
## Tasks
### T01 - Debug envelope mode for /execute responses
```task
id: ADHOC-2026-06-02-T01
status: todo
priority: medium
state_hub_task_id: "69626e9e-29f1-40f6-8cd2-d38a7e802293"
```
When the `LLM_CONNECT_DEBUG=1` env var is set (or `?debug=1` is added to the
`/execute` request), include the raw provider response and the adapter's
constructed payload alongside the normal `content` field. Today's diagnosis
required two iterations on the Claude Code envelope shape and one on the
OpenRouter adapter routing — each required modifying source, restarting
llm-connect, re-running, observing, modifying again. A debug envelope mode
would have given the diagnostic data in one curl.
Proposed response shape under debug mode:
```json
{
"content": "<unchanged>",
"model": "<unchanged>",
"usage": {...},
"metadata": {...},
"debug": {
"provider_request": {"url": "...", "payload": {...}, "headers_redacted": {...}},
"provider_response": {"status": 200, "body": {...}},
"adapter_transformations": [
{"step": "merge_model_params", "before": {...}, "after": {...}},
{"step": "unwrap_cli_envelope", "before": "...", "after": "..."}
]
}
}
```
Done when `LLM_CONNECT_DEBUG=1 curl /execute …` returns the raw round-trip in
the response without changing the default response shape, and a test pins the
debug field is omitted in normal mode.
### T02 - Replace stdlib HTTPServer with ThreadingHTTPServer
```task
id: ADHOC-2026-06-02-T02
status: todo
priority: low
state_hub_task_id: "e2b1be30-71f7-4497-9b10-b0f24d37beba"
```
`llm_connect/server.py` uses `http.server.HTTPServer`, which is single-threaded.
Concurrent requests queue behind whatever subprocess or HTTPS call is in flight.
That made today's diagnosis worse: a smoke probe sent while the canary's real
LLM call was running just hung until the canary finished. With Claude CLI calls
taking 60-90s each, queuing is painful.
`http.server.ThreadingHTTPServer` is a drop-in replacement (subclass of
`HTTPServer` + `ThreadingMixIn`). One-line change. Document that adapters must
be thread-safe (the subprocess and HTTPS adapters already are).
Done when concurrent `/execute` calls run in parallel and a regression test
asserts two requests with measurable backend latency complete in roughly the
max of their individual latencies, not the sum.
### T03 - Per-call audit log for replay
```task
id: ADHOC-2026-06-02-T03
status: todo
priority: medium
state_hub_task_id: "da4821f0-a876-44ce-9dc3-f3fc67732d0f"
```
When `LLM_CONNECT_AUDIT_DIR=/path` is set, record every `/execute` call as a
JSON file named `{timestamp}-{response_id}.json` containing `{prompt, config,
provider_request, provider_response, parsed_content, latency_seconds}`. A
sibling `python -m llm_connect.replay <file>` command re-runs the parser /
unwrapper / validation against the recorded provider response without making a
new provider call.
Saves quota during diagnosis (today we burned several Sonnet calls just to
re-observe the envelope shape) and gives downstream consumers like
activity-core's instruction executor a "what did the model actually return"
artifact when validation fails.
Promote to a workplan if anyone picks it up — schema for the audit record
needs a small design pass (retention, redaction of API keys in
`provider_request.headers`, file size cap), and the CLI side has its own
ergonomics.
### T04 - Apply param-translation contract to OpenAI and Gemini adapters
```task
id: ADHOC-2026-06-02-T04
status: todo
priority: medium
state_hub_task_id: "f8a033e6-22ac-4700-b8d2-43a5d76a3751"
```
Today's `cd4551c` added a `_merge_model_params` helper in the OpenRouter
adapter that whitelists OpenAI Chat Completions fields, translates
`json_schema` to `response_format`, and drops Claude/llm-connect-specific
keys. The same naive `payload.update(config.model_params)` is latent in
`llm_connect/openai.py` and `llm_connect/gemini.py` — neither has been
exercised against an activity-core-style payload yet, but they will 400
the moment they are.
Extract `_merge_model_params` to a shared module (`llm_connect/_payload.py`?)
and use it from all three adapters. Gemini's accepted top-level fields differ
(`generationConfig` wrapper, `safetySettings`, etc.), so the helper either
needs adapter-specific whitelists or a per-adapter translation table. The
shape choice is small but real.
Done when a parametrised test sends the activity-core daily-triage
`model_params` shape (`reasoning_effort`, `max_depth`, `json_schema`) through
each adapter and asserts the resulting payload is provider-accepted (no
forbidden top-level fields, schema in the right wrapper).
### T05 - Provider-agnostic structured-output smoke test in CI
```task
id: ADHOC-2026-06-02-T05
status: todo
priority: medium
state_hub_task_id: "5d53dbb4-b374-45fe-b81c-ff0b222ca74f"
```
CI today does not exercise the round-trip from `model_params.json_schema`
through any real provider. The `1b01f0e` model-routing bug — where
`--model anthropic/claude-sonnet-4` silently became `gpt-4` because of an
inverted comparison — only surfaced through the canary, because none of the
adapter tests check the actual `payload["model"]` value end-to-end.
Add a test (gated on each provider's API key being present in CI env) that:
1. Starts the server with `--provider {p} --model {m}` for each registered
adapter.
2. POSTs a fixed prompt with `model_params.json_schema` requiring
`{summary, recommendations}` (mirroring activity-core's canary shape).
3. Asserts the response parses cleanly against the schema.
4. Asserts `response.model` matches what was configured at startup, not the
`RunConfig.model_name` default.
Without API keys present, the test mocks the provider HTTP layer and asserts
the *payload* matches expectations. That catches routing/translation bugs
without provider cost.
Done when a smoke probe shaped like today's CUST-WP-0045 canary would have
failed CI on `1b01f0e` (the routing bug) and on `cd4551c` (the translation
bug) before either was merged.
### T06 - Document required model_params translation in adapter README
```task
id: ADHOC-2026-06-02-T06
status: todo
priority: low
state_hub_task_id: "33fcb951-d7ab-4d3c-8d67-9eebd986c711"
```
Adapter authors and callers both need to know:
- which `model_params` keys are pass-through to the underlying provider
- which are translated (`json_schema``response_format` for OpenAI-shaped
providers, → tool-use for native Anthropic, etc.)
- which are dropped (Claude-only `reasoning_effort`, llm-connect's
`max_depth`)
- the strict-mode tradeoff and how to opt back in
Today's bug chain happened partly because the activity-core team assumed
`model_params` was a pass-through and the llm-connect team assumed callers
would only send OpenAI-valid fields. Codify the contract in
`docs/adapter-model-params.md` (new file) or extend the existing
`ARCHITECTURE-LAYERS.md`.
Done when a new adapter author can read the doc and know what their
`_merge_model_params` implementation must support.