diff --git a/workplans/ADHOC-2026-06-02.md b/workplans/ADHOC-2026-06-02.md new file mode 100644 index 0000000..9304348 --- /dev/null +++ b/workplans/ADHOC-2026-06-02.md @@ -0,0 +1,212 @@ +--- +id: ADHOC-2026-06-02 +type: workplan +title: "Ad hoc — llm-connect lessons from CUST-WP-0045 canary" +domain: custodian +repo: llm-connect +status: ready +owner: custodian +topic_slug: custodian +created: "2026-06-02" +updated: "2026-06-02" +state_hub_workstream_id: "1c936c91-79c7-427d-ab37-9052e8a61cda" +--- + +# ADHOC-2026-06-02 — llm-connect lessons from CUST-WP-0045 canary + +Captured at the close of CUST-WP-0045 T06. The daily-triage canary uncovered +a sequence of bugs that took the better part of a day to diagnose, mostly +because llm-connect has no way to see what the underlying provider returned +or what payload the adapter sent. Five commits landed today: + +- `9de0f49` — Claude Code adapter passes `--output-format json` and unwraps +- `435da49` — envelope unwrap prefers JSON-bearing fields over prose +- `cd4551c` — OpenRouter adapter translates `json_schema` → `response_format` + and drops Claude-specific keys +- `583ab57` — OpenRouter `response_format.json_schema.strict` defaults to `False` +- `1b01f0e` — OpenRouter honours explicit `--model` even when it equals the + hardcoded default + +This adhoc captures the structural improvements that would have collapsed the +diagnosis loop from half a day to minutes. Each task is sized as an opportunistic +improvement; the larger ones explicitly call out when to promote into a +workplan. + +## Tasks + +### T01 - Debug envelope mode for /execute responses + +```task +id: ADHOC-2026-06-02-T01 +status: todo +priority: medium +state_hub_task_id: "69626e9e-29f1-40f6-8cd2-d38a7e802293" +``` + +When the `LLM_CONNECT_DEBUG=1` env var is set (or `?debug=1` is added to the +`/execute` request), include the raw provider response and the adapter's +constructed payload alongside the normal `content` field. Today's diagnosis +required two iterations on the Claude Code envelope shape and one on the +OpenRouter adapter routing — each required modifying source, restarting +llm-connect, re-running, observing, modifying again. A debug envelope mode +would have given the diagnostic data in one curl. + +Proposed response shape under debug mode: + +```json +{ + "content": "", + "model": "", + "usage": {...}, + "metadata": {...}, + "debug": { + "provider_request": {"url": "...", "payload": {...}, "headers_redacted": {...}}, + "provider_response": {"status": 200, "body": {...}}, + "adapter_transformations": [ + {"step": "merge_model_params", "before": {...}, "after": {...}}, + {"step": "unwrap_cli_envelope", "before": "...", "after": "..."} + ] + } +} +``` + +Done when `LLM_CONNECT_DEBUG=1 curl /execute …` returns the raw round-trip in +the response without changing the default response shape, and a test pins the +debug field is omitted in normal mode. + +### T02 - Replace stdlib HTTPServer with ThreadingHTTPServer + +```task +id: ADHOC-2026-06-02-T02 +status: todo +priority: low +state_hub_task_id: "e2b1be30-71f7-4497-9b10-b0f24d37beba" +``` + +`llm_connect/server.py` uses `http.server.HTTPServer`, which is single-threaded. +Concurrent requests queue behind whatever subprocess or HTTPS call is in flight. +That made today's diagnosis worse: a smoke probe sent while the canary's real +LLM call was running just hung until the canary finished. With Claude CLI calls +taking 60-90s each, queuing is painful. + +`http.server.ThreadingHTTPServer` is a drop-in replacement (subclass of +`HTTPServer` + `ThreadingMixIn`). One-line change. Document that adapters must +be thread-safe (the subprocess and HTTPS adapters already are). + +Done when concurrent `/execute` calls run in parallel and a regression test +asserts two requests with measurable backend latency complete in roughly the +max of their individual latencies, not the sum. + +### T03 - Per-call audit log for replay + +```task +id: ADHOC-2026-06-02-T03 +status: todo +priority: medium +state_hub_task_id: "da4821f0-a876-44ce-9dc3-f3fc67732d0f" +``` + +When `LLM_CONNECT_AUDIT_DIR=/path` is set, record every `/execute` call as a +JSON file named `{timestamp}-{response_id}.json` containing `{prompt, config, +provider_request, provider_response, parsed_content, latency_seconds}`. A +sibling `python -m llm_connect.replay ` command re-runs the parser / +unwrapper / validation against the recorded provider response without making a +new provider call. + +Saves quota during diagnosis (today we burned several Sonnet calls just to +re-observe the envelope shape) and gives downstream consumers like +activity-core's instruction executor a "what did the model actually return" +artifact when validation fails. + +Promote to a workplan if anyone picks it up — schema for the audit record +needs a small design pass (retention, redaction of API keys in +`provider_request.headers`, file size cap), and the CLI side has its own +ergonomics. + +### T04 - Apply param-translation contract to OpenAI and Gemini adapters + +```task +id: ADHOC-2026-06-02-T04 +status: todo +priority: medium +state_hub_task_id: "f8a033e6-22ac-4700-b8d2-43a5d76a3751" +``` + +Today's `cd4551c` added a `_merge_model_params` helper in the OpenRouter +adapter that whitelists OpenAI Chat Completions fields, translates +`json_schema` to `response_format`, and drops Claude/llm-connect-specific +keys. The same naive `payload.update(config.model_params)` is latent in +`llm_connect/openai.py` and `llm_connect/gemini.py` — neither has been +exercised against an activity-core-style payload yet, but they will 400 +the moment they are. + +Extract `_merge_model_params` to a shared module (`llm_connect/_payload.py`?) +and use it from all three adapters. Gemini's accepted top-level fields differ +(`generationConfig` wrapper, `safetySettings`, etc.), so the helper either +needs adapter-specific whitelists or a per-adapter translation table. The +shape choice is small but real. + +Done when a parametrised test sends the activity-core daily-triage +`model_params` shape (`reasoning_effort`, `max_depth`, `json_schema`) through +each adapter and asserts the resulting payload is provider-accepted (no +forbidden top-level fields, schema in the right wrapper). + +### T05 - Provider-agnostic structured-output smoke test in CI + +```task +id: ADHOC-2026-06-02-T05 +status: todo +priority: medium +state_hub_task_id: "5d53dbb4-b374-45fe-b81c-ff0b222ca74f" +``` + +CI today does not exercise the round-trip from `model_params.json_schema` +through any real provider. The `1b01f0e` model-routing bug — where +`--model anthropic/claude-sonnet-4` silently became `gpt-4` because of an +inverted comparison — only surfaced through the canary, because none of the +adapter tests check the actual `payload["model"]` value end-to-end. + +Add a test (gated on each provider's API key being present in CI env) that: + +1. Starts the server with `--provider {p} --model {m}` for each registered + adapter. +2. POSTs a fixed prompt with `model_params.json_schema` requiring + `{summary, recommendations}` (mirroring activity-core's canary shape). +3. Asserts the response parses cleanly against the schema. +4. Asserts `response.model` matches what was configured at startup, not the + `RunConfig.model_name` default. + +Without API keys present, the test mocks the provider HTTP layer and asserts +the *payload* matches expectations. That catches routing/translation bugs +without provider cost. + +Done when a smoke probe shaped like today's CUST-WP-0045 canary would have +failed CI on `1b01f0e` (the routing bug) and on `cd4551c` (the translation +bug) before either was merged. + +### T06 - Document required model_params translation in adapter README + +```task +id: ADHOC-2026-06-02-T06 +status: todo +priority: low +state_hub_task_id: "33fcb951-d7ab-4d3c-8d67-9eebd986c711" +``` + +Adapter authors and callers both need to know: + +- which `model_params` keys are pass-through to the underlying provider +- which are translated (`json_schema` → `response_format` for OpenAI-shaped + providers, → tool-use for native Anthropic, etc.) +- which are dropped (Claude-only `reasoning_effort`, llm-connect's + `max_depth`) +- the strict-mode tradeoff and how to opt back in + +Today's bug chain happened partly because the activity-core team assumed +`model_params` was a pass-through and the llm-connect team assumed callers +would only send OpenAI-valid fields. Codify the contract in +`docs/adapter-model-params.md` (new file) or extend the existing +`ARCHITECTURE-LAYERS.md`. + +Done when a new adapter author can read the doc and know what their +`_merge_model_params` implementation must support.