9.0 KiB
id, type, title, domain, repo, status, owner, topic_slug, created, updated, state_hub_workstream_id
| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|
| ADHOC-2026-06-02 | workplan | Ad hoc — llm-connect lessons from CUST-WP-0045 canary | custodian | llm-connect | finished | custodian | custodian | 2026-06-02 | 2026-06-03 | 1c936c91-79c7-427d-ab37-9052e8a61cda |
ADHOC-2026-06-02 — llm-connect lessons from CUST-WP-0045 canary
Captured at the close of CUST-WP-0045 T06. The daily-triage canary uncovered a sequence of bugs that took the better part of a day to diagnose, mostly because llm-connect has no way to see what the underlying provider returned or what payload the adapter sent. Five commits landed today:
9de0f49— Claude Code adapter passes--output-format jsonand unwraps435da49— envelope unwrap prefers JSON-bearing fields over prosecd4551c— OpenRouter adapter translatesjson_schema→response_formatand drops Claude-specific keys583ab57— OpenRouterresponse_format.json_schema.strictdefaults toFalse1b01f0e— OpenRouter honours explicit--modeleven when it equals the hardcoded default
This adhoc captures the structural improvements that would have collapsed the diagnosis loop from half a day to minutes. Each task is sized as an opportunistic improvement; the larger ones explicitly call out when to promote into a workplan.
Tasks
T01 - Debug envelope mode for /execute responses
id: ADHOC-2026-06-02-T01
status: done
priority: medium
state_hub_task_id: "69626e9e-29f1-40f6-8cd2-d38a7e802293"
When the LLM_CONNECT_DEBUG=1 env var is set (or ?debug=1 is added to the
/execute request), include the raw provider response and the adapter's
constructed payload alongside the normal content field. Today's diagnosis
required two iterations on the Claude Code envelope shape and one on the
OpenRouter adapter routing — each required modifying source, restarting
llm-connect, re-running, observing, modifying again. A debug envelope mode
would have given the diagnostic data in one curl.
Proposed response shape under debug mode:
{
"content": "<unchanged>",
"model": "<unchanged>",
"usage": {...},
"metadata": {...},
"debug": {
"provider_request": {"url": "...", "payload": {...}, "headers_redacted": {...}},
"provider_response": {"status": 200, "body": {...}},
"adapter_transformations": [
{"step": "merge_model_params", "before": {...}, "after": {...}},
{"step": "unwrap_cli_envelope", "before": "...", "after": "..."}
]
}
}
Done when LLM_CONNECT_DEBUG=1 curl /execute … returns the raw round-trip in
the response without changing the default response shape, and a test pins the
debug field is omitted in normal mode.
T02 - Replace stdlib HTTPServer with ThreadingHTTPServer
id: ADHOC-2026-06-02-T02
status: done
priority: low
state_hub_task_id: "e2b1be30-71f7-4497-9b10-b0f24d37beba"
llm_connect/server.py uses http.server.HTTPServer, which is single-threaded.
Concurrent requests queue behind whatever subprocess or HTTPS call is in flight.
That made today's diagnosis worse: a smoke probe sent while the canary's real
LLM call was running just hung until the canary finished. With Claude CLI calls
taking 60-90s each, queuing is painful.
http.server.ThreadingHTTPServer is a drop-in replacement (subclass of
HTTPServer + ThreadingMixIn). One-line change. Document that adapters must
be thread-safe (the subprocess and HTTPS adapters already are).
Done when concurrent /execute calls run in parallel and a regression test
asserts two requests with measurable backend latency complete in roughly the
max of their individual latencies, not the sum.
T03 - Per-call audit log for replay
id: ADHOC-2026-06-02-T03
status: done
priority: medium
state_hub_task_id: "da4821f0-a876-44ce-9dc3-f3fc67732d0f"
When LLM_CONNECT_AUDIT_DIR=/path is set, record every /execute call as a
JSON file named {timestamp}-{response_id}.json containing {prompt, config, provider_request, provider_response, parsed_content, latency_seconds}. A
sibling python -m llm_connect.replay <file> command re-runs the parser /
unwrapper / validation against the recorded provider response without making a
new provider call.
Saves quota during diagnosis (today we burned several Sonnet calls just to re-observe the envelope shape) and gives downstream consumers like activity-core's instruction executor a "what did the model actually return" artifact when validation fails.
Promote to a workplan if anyone picks it up — schema for the audit record
needs a small design pass (retention, redaction of API keys in
provider_request.headers, file size cap), and the CLI side has its own
ergonomics.
T04 - Apply param-translation contract to OpenAI and Gemini adapters
id: ADHOC-2026-06-02-T04
status: done
priority: medium
state_hub_task_id: "f8a033e6-22ac-4700-b8d2-43a5d76a3751"
Today's cd4551c added a _merge_model_params helper in the OpenRouter
adapter that whitelists OpenAI Chat Completions fields, translates
json_schema to response_format, and drops Claude/llm-connect-specific
keys. The same naive payload.update(config.model_params) is latent in
llm_connect/openai.py and llm_connect/gemini.py — neither has been
exercised against an activity-core-style payload yet, but they will 400
the moment they are.
Extract _merge_model_params to a shared module (llm_connect/_payload.py?)
and use it from all three adapters. Gemini's accepted top-level fields differ
(generationConfig wrapper, safetySettings, etc.), so the helper either
needs adapter-specific whitelists or a per-adapter translation table. The
shape choice is small but real.
Done when a parametrised test sends the activity-core daily-triage
model_params shape (reasoning_effort, max_depth, json_schema) through
each adapter and asserts the resulting payload is provider-accepted (no
forbidden top-level fields, schema in the right wrapper).
T05 - Provider-agnostic structured-output smoke test in CI
id: ADHOC-2026-06-02-T05
status: done
priority: medium
state_hub_task_id: "5d53dbb4-b374-45fe-b81c-ff0b222ca74f"
CI today does not exercise the round-trip from model_params.json_schema
through any real provider. The 1b01f0e model-routing bug — where
--model anthropic/claude-sonnet-4 silently became gpt-4 because of an
inverted comparison — only surfaced through the canary, because none of the
adapter tests check the actual payload["model"] value end-to-end.
Add a test (gated on each provider's API key being present in CI env) that:
- Starts the server with
--provider {p} --model {m}for each registered adapter. - POSTs a fixed prompt with
model_params.json_schemarequiring{summary, recommendations}(mirroring activity-core's canary shape). - Asserts the response parses cleanly against the schema.
- Asserts
response.modelmatches what was configured at startup, not theRunConfig.model_namedefault.
Without API keys present, the test mocks the provider HTTP layer and asserts the payload matches expectations. That catches routing/translation bugs without provider cost.
Done when a smoke probe shaped like today's CUST-WP-0045 canary would have
failed CI on 1b01f0e (the routing bug) and on cd4551c (the translation
bug) before either was merged.
T06 - Document required model_params translation in adapter README
id: ADHOC-2026-06-02-T06
status: done
priority: low
state_hub_task_id: "33fcb951-d7ab-4d3c-8d67-9eebd986c711"
Adapter authors and callers both need to know:
- which
model_paramskeys are pass-through to the underlying provider - which are translated (
json_schema→response_formatfor OpenAI-shaped providers, → tool-use for native Anthropic, etc.) - which are dropped (Claude-only
reasoning_effort, llm-connect'smax_depth) - the strict-mode tradeoff and how to opt back in
Today's bug chain happened partly because the activity-core team assumed
model_params was a pass-through and the llm-connect team assumed callers
would only send OpenAI-valid fields. Codify the contract in
docs/adapter-model-params.md (new file) or extend the existing
ARCHITECTURE-LAYERS.md.
Done when a new adapter author can read the doc and know what their
_merge_model_params implementation must support.
Implementation Notes
Completed on 2026-06-03:
- Added opt-in
/executedebug envelopes viaLLM_CONNECT_DEBUG=1or?debug=1, with redacted provider request/response capture and adapter transformation records. - Switched serve mode to
ThreadingHTTPServerand added a concurrency regression test. - Added
LLM_CONNECT_AUDIT_DIRper-call audit records pluspython -m llm_connect.replayfor parser/unwrapper replay. - Extracted shared OpenAI-compatible and Gemini payload translation helpers and wired OpenRouter, OpenAI, and Gemini through them.
- Added CI-safe structured-output smoke tests that mock provider HTTP calls and assert model routing plus payload shape.
- Documented the adapter
model_paramscontract indocs/adapter-model-params.md.