generated from coulomb/repo-seed
Capture llm-connect lessons from CUST-WP-0045 canary as ADHOC-2026-06-02
The 2026-06-02 daily-triage canary debugging session uncovered five real bugs (commits9de0f49,435da49,cd4551c,583ab57,1b01f0e), mostly because llm-connect has no way to see what payload the adapter sent or what the provider returned. Capture the six structural improvements that would collapse the next diagnosis of this shape from half a day to minutes: T01 — LLM_CONNECT_DEBUG envelope mode for /execute responses T02 — ThreadingHTTPServer drop-in replacement for stdlib HTTPServer T03 — Per-call audit log + replay CLI (LLM_CONNECT_AUDIT_DIR) T04 — Apply param-translation contract to OpenAI and Gemini adapters T05 — Provider-agnostic structured-output smoke test in CI T06 — Document the model_params translation contract for adapter authors All six registered in the State Hub under workstream adhoc-llmc-2026-06-02 (1c936c91-79c7-427d-ab37-9052e8a61cda). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
212
workplans/ADHOC-2026-06-02.md
Normal file
212
workplans/ADHOC-2026-06-02.md
Normal file
@@ -0,0 +1,212 @@
|
||||
---
|
||||
id: ADHOC-2026-06-02
|
||||
type: workplan
|
||||
title: "Ad hoc — llm-connect lessons from CUST-WP-0045 canary"
|
||||
domain: custodian
|
||||
repo: llm-connect
|
||||
status: ready
|
||||
owner: custodian
|
||||
topic_slug: custodian
|
||||
created: "2026-06-02"
|
||||
updated: "2026-06-02"
|
||||
state_hub_workstream_id: "1c936c91-79c7-427d-ab37-9052e8a61cda"
|
||||
---
|
||||
|
||||
# ADHOC-2026-06-02 — llm-connect lessons from CUST-WP-0045 canary
|
||||
|
||||
Captured at the close of CUST-WP-0045 T06. The daily-triage canary uncovered
|
||||
a sequence of bugs that took the better part of a day to diagnose, mostly
|
||||
because llm-connect has no way to see what the underlying provider returned
|
||||
or what payload the adapter sent. Five commits landed today:
|
||||
|
||||
- `9de0f49` — Claude Code adapter passes `--output-format json` and unwraps
|
||||
- `435da49` — envelope unwrap prefers JSON-bearing fields over prose
|
||||
- `cd4551c` — OpenRouter adapter translates `json_schema` → `response_format`
|
||||
and drops Claude-specific keys
|
||||
- `583ab57` — OpenRouter `response_format.json_schema.strict` defaults to `False`
|
||||
- `1b01f0e` — OpenRouter honours explicit `--model` even when it equals the
|
||||
hardcoded default
|
||||
|
||||
This adhoc captures the structural improvements that would have collapsed the
|
||||
diagnosis loop from half a day to minutes. Each task is sized as an opportunistic
|
||||
improvement; the larger ones explicitly call out when to promote into a
|
||||
workplan.
|
||||
|
||||
## Tasks
|
||||
|
||||
### T01 - Debug envelope mode for /execute responses
|
||||
|
||||
```task
|
||||
id: ADHOC-2026-06-02-T01
|
||||
status: todo
|
||||
priority: medium
|
||||
state_hub_task_id: "69626e9e-29f1-40f6-8cd2-d38a7e802293"
|
||||
```
|
||||
|
||||
When the `LLM_CONNECT_DEBUG=1` env var is set (or `?debug=1` is added to the
|
||||
`/execute` request), include the raw provider response and the adapter's
|
||||
constructed payload alongside the normal `content` field. Today's diagnosis
|
||||
required two iterations on the Claude Code envelope shape and one on the
|
||||
OpenRouter adapter routing — each required modifying source, restarting
|
||||
llm-connect, re-running, observing, modifying again. A debug envelope mode
|
||||
would have given the diagnostic data in one curl.
|
||||
|
||||
Proposed response shape under debug mode:
|
||||
|
||||
```json
|
||||
{
|
||||
"content": "<unchanged>",
|
||||
"model": "<unchanged>",
|
||||
"usage": {...},
|
||||
"metadata": {...},
|
||||
"debug": {
|
||||
"provider_request": {"url": "...", "payload": {...}, "headers_redacted": {...}},
|
||||
"provider_response": {"status": 200, "body": {...}},
|
||||
"adapter_transformations": [
|
||||
{"step": "merge_model_params", "before": {...}, "after": {...}},
|
||||
{"step": "unwrap_cli_envelope", "before": "...", "after": "..."}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Done when `LLM_CONNECT_DEBUG=1 curl /execute …` returns the raw round-trip in
|
||||
the response without changing the default response shape, and a test pins the
|
||||
debug field is omitted in normal mode.
|
||||
|
||||
### T02 - Replace stdlib HTTPServer with ThreadingHTTPServer
|
||||
|
||||
```task
|
||||
id: ADHOC-2026-06-02-T02
|
||||
status: todo
|
||||
priority: low
|
||||
state_hub_task_id: "e2b1be30-71f7-4497-9b10-b0f24d37beba"
|
||||
```
|
||||
|
||||
`llm_connect/server.py` uses `http.server.HTTPServer`, which is single-threaded.
|
||||
Concurrent requests queue behind whatever subprocess or HTTPS call is in flight.
|
||||
That made today's diagnosis worse: a smoke probe sent while the canary's real
|
||||
LLM call was running just hung until the canary finished. With Claude CLI calls
|
||||
taking 60-90s each, queuing is painful.
|
||||
|
||||
`http.server.ThreadingHTTPServer` is a drop-in replacement (subclass of
|
||||
`HTTPServer` + `ThreadingMixIn`). One-line change. Document that adapters must
|
||||
be thread-safe (the subprocess and HTTPS adapters already are).
|
||||
|
||||
Done when concurrent `/execute` calls run in parallel and a regression test
|
||||
asserts two requests with measurable backend latency complete in roughly the
|
||||
max of their individual latencies, not the sum.
|
||||
|
||||
### T03 - Per-call audit log for replay
|
||||
|
||||
```task
|
||||
id: ADHOC-2026-06-02-T03
|
||||
status: todo
|
||||
priority: medium
|
||||
state_hub_task_id: "da4821f0-a876-44ce-9dc3-f3fc67732d0f"
|
||||
```
|
||||
|
||||
When `LLM_CONNECT_AUDIT_DIR=/path` is set, record every `/execute` call as a
|
||||
JSON file named `{timestamp}-{response_id}.json` containing `{prompt, config,
|
||||
provider_request, provider_response, parsed_content, latency_seconds}`. A
|
||||
sibling `python -m llm_connect.replay <file>` command re-runs the parser /
|
||||
unwrapper / validation against the recorded provider response without making a
|
||||
new provider call.
|
||||
|
||||
Saves quota during diagnosis (today we burned several Sonnet calls just to
|
||||
re-observe the envelope shape) and gives downstream consumers like
|
||||
activity-core's instruction executor a "what did the model actually return"
|
||||
artifact when validation fails.
|
||||
|
||||
Promote to a workplan if anyone picks it up — schema for the audit record
|
||||
needs a small design pass (retention, redaction of API keys in
|
||||
`provider_request.headers`, file size cap), and the CLI side has its own
|
||||
ergonomics.
|
||||
|
||||
### T04 - Apply param-translation contract to OpenAI and Gemini adapters
|
||||
|
||||
```task
|
||||
id: ADHOC-2026-06-02-T04
|
||||
status: todo
|
||||
priority: medium
|
||||
state_hub_task_id: "f8a033e6-22ac-4700-b8d2-43a5d76a3751"
|
||||
```
|
||||
|
||||
Today's `cd4551c` added a `_merge_model_params` helper in the OpenRouter
|
||||
adapter that whitelists OpenAI Chat Completions fields, translates
|
||||
`json_schema` to `response_format`, and drops Claude/llm-connect-specific
|
||||
keys. The same naive `payload.update(config.model_params)` is latent in
|
||||
`llm_connect/openai.py` and `llm_connect/gemini.py` — neither has been
|
||||
exercised against an activity-core-style payload yet, but they will 400
|
||||
the moment they are.
|
||||
|
||||
Extract `_merge_model_params` to a shared module (`llm_connect/_payload.py`?)
|
||||
and use it from all three adapters. Gemini's accepted top-level fields differ
|
||||
(`generationConfig` wrapper, `safetySettings`, etc.), so the helper either
|
||||
needs adapter-specific whitelists or a per-adapter translation table. The
|
||||
shape choice is small but real.
|
||||
|
||||
Done when a parametrised test sends the activity-core daily-triage
|
||||
`model_params` shape (`reasoning_effort`, `max_depth`, `json_schema`) through
|
||||
each adapter and asserts the resulting payload is provider-accepted (no
|
||||
forbidden top-level fields, schema in the right wrapper).
|
||||
|
||||
### T05 - Provider-agnostic structured-output smoke test in CI
|
||||
|
||||
```task
|
||||
id: ADHOC-2026-06-02-T05
|
||||
status: todo
|
||||
priority: medium
|
||||
state_hub_task_id: "5d53dbb4-b374-45fe-b81c-ff0b222ca74f"
|
||||
```
|
||||
|
||||
CI today does not exercise the round-trip from `model_params.json_schema`
|
||||
through any real provider. The `1b01f0e` model-routing bug — where
|
||||
`--model anthropic/claude-sonnet-4` silently became `gpt-4` because of an
|
||||
inverted comparison — only surfaced through the canary, because none of the
|
||||
adapter tests check the actual `payload["model"]` value end-to-end.
|
||||
|
||||
Add a test (gated on each provider's API key being present in CI env) that:
|
||||
|
||||
1. Starts the server with `--provider {p} --model {m}` for each registered
|
||||
adapter.
|
||||
2. POSTs a fixed prompt with `model_params.json_schema` requiring
|
||||
`{summary, recommendations}` (mirroring activity-core's canary shape).
|
||||
3. Asserts the response parses cleanly against the schema.
|
||||
4. Asserts `response.model` matches what was configured at startup, not the
|
||||
`RunConfig.model_name` default.
|
||||
|
||||
Without API keys present, the test mocks the provider HTTP layer and asserts
|
||||
the *payload* matches expectations. That catches routing/translation bugs
|
||||
without provider cost.
|
||||
|
||||
Done when a smoke probe shaped like today's CUST-WP-0045 canary would have
|
||||
failed CI on `1b01f0e` (the routing bug) and on `cd4551c` (the translation
|
||||
bug) before either was merged.
|
||||
|
||||
### T06 - Document required model_params translation in adapter README
|
||||
|
||||
```task
|
||||
id: ADHOC-2026-06-02-T06
|
||||
status: todo
|
||||
priority: low
|
||||
state_hub_task_id: "33fcb951-d7ab-4d3c-8d67-9eebd986c711"
|
||||
```
|
||||
|
||||
Adapter authors and callers both need to know:
|
||||
|
||||
- which `model_params` keys are pass-through to the underlying provider
|
||||
- which are translated (`json_schema` → `response_format` for OpenAI-shaped
|
||||
providers, → tool-use for native Anthropic, etc.)
|
||||
- which are dropped (Claude-only `reasoning_effort`, llm-connect's
|
||||
`max_depth`)
|
||||
- the strict-mode tradeoff and how to opt back in
|
||||
|
||||
Today's bug chain happened partly because the activity-core team assumed
|
||||
`model_params` was a pass-through and the llm-connect team assumed callers
|
||||
would only send OpenAI-valid fields. Codify the contract in
|
||||
`docs/adapter-model-params.md` (new file) or extend the existing
|
||||
`ARCHITECTURE-LAYERS.md`.
|
||||
|
||||
Done when a new adapter author can read the doc and know what their
|
||||
`_merge_model_params` implementation must support.
|
||||
Reference in New Issue
Block a user