generated from coulomb/repo-seed
231 lines
9.0 KiB
Markdown
231 lines
9.0 KiB
Markdown
---
|
|
id: ADHOC-2026-06-02
|
|
type: workplan
|
|
title: "Ad hoc — llm-connect lessons from CUST-WP-0045 canary"
|
|
domain: custodian
|
|
repo: llm-connect
|
|
status: finished
|
|
owner: custodian
|
|
topic_slug: custodian
|
|
created: "2026-06-02"
|
|
updated: "2026-06-03"
|
|
state_hub_workstream_id: "1c936c91-79c7-427d-ab37-9052e8a61cda"
|
|
---
|
|
|
|
# ADHOC-2026-06-02 — llm-connect lessons from CUST-WP-0045 canary
|
|
|
|
Captured at the close of CUST-WP-0045 T06. The daily-triage canary uncovered
|
|
a sequence of bugs that took the better part of a day to diagnose, mostly
|
|
because llm-connect has no way to see what the underlying provider returned
|
|
or what payload the adapter sent. Five commits landed today:
|
|
|
|
- `9de0f49` — Claude Code adapter passes `--output-format json` and unwraps
|
|
- `435da49` — envelope unwrap prefers JSON-bearing fields over prose
|
|
- `cd4551c` — OpenRouter adapter translates `json_schema` → `response_format`
|
|
and drops Claude-specific keys
|
|
- `583ab57` — OpenRouter `response_format.json_schema.strict` defaults to `False`
|
|
- `1b01f0e` — OpenRouter honours explicit `--model` even when it equals the
|
|
hardcoded default
|
|
|
|
This adhoc captures the structural improvements that would have collapsed the
|
|
diagnosis loop from half a day to minutes. Each task is sized as an opportunistic
|
|
improvement; the larger ones explicitly call out when to promote into a
|
|
workplan.
|
|
|
|
## Tasks
|
|
|
|
### T01 - Debug envelope mode for /execute responses
|
|
|
|
```task
|
|
id: ADHOC-2026-06-02-T01
|
|
status: done
|
|
priority: medium
|
|
state_hub_task_id: "69626e9e-29f1-40f6-8cd2-d38a7e802293"
|
|
```
|
|
|
|
When the `LLM_CONNECT_DEBUG=1` env var is set (or `?debug=1` is added to the
|
|
`/execute` request), include the raw provider response and the adapter's
|
|
constructed payload alongside the normal `content` field. Today's diagnosis
|
|
required two iterations on the Claude Code envelope shape and one on the
|
|
OpenRouter adapter routing — each required modifying source, restarting
|
|
llm-connect, re-running, observing, modifying again. A debug envelope mode
|
|
would have given the diagnostic data in one curl.
|
|
|
|
Proposed response shape under debug mode:
|
|
|
|
```json
|
|
{
|
|
"content": "<unchanged>",
|
|
"model": "<unchanged>",
|
|
"usage": {...},
|
|
"metadata": {...},
|
|
"debug": {
|
|
"provider_request": {"url": "...", "payload": {...}, "headers_redacted": {...}},
|
|
"provider_response": {"status": 200, "body": {...}},
|
|
"adapter_transformations": [
|
|
{"step": "merge_model_params", "before": {...}, "after": {...}},
|
|
{"step": "unwrap_cli_envelope", "before": "...", "after": "..."}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
Done when `LLM_CONNECT_DEBUG=1 curl /execute …` returns the raw round-trip in
|
|
the response without changing the default response shape, and a test pins the
|
|
debug field is omitted in normal mode.
|
|
|
|
### T02 - Replace stdlib HTTPServer with ThreadingHTTPServer
|
|
|
|
```task
|
|
id: ADHOC-2026-06-02-T02
|
|
status: done
|
|
priority: low
|
|
state_hub_task_id: "e2b1be30-71f7-4497-9b10-b0f24d37beba"
|
|
```
|
|
|
|
`llm_connect/server.py` uses `http.server.HTTPServer`, which is single-threaded.
|
|
Concurrent requests queue behind whatever subprocess or HTTPS call is in flight.
|
|
That made today's diagnosis worse: a smoke probe sent while the canary's real
|
|
LLM call was running just hung until the canary finished. With Claude CLI calls
|
|
taking 60-90s each, queuing is painful.
|
|
|
|
`http.server.ThreadingHTTPServer` is a drop-in replacement (subclass of
|
|
`HTTPServer` + `ThreadingMixIn`). One-line change. Document that adapters must
|
|
be thread-safe (the subprocess and HTTPS adapters already are).
|
|
|
|
Done when concurrent `/execute` calls run in parallel and a regression test
|
|
asserts two requests with measurable backend latency complete in roughly the
|
|
max of their individual latencies, not the sum.
|
|
|
|
### T03 - Per-call audit log for replay
|
|
|
|
```task
|
|
id: ADHOC-2026-06-02-T03
|
|
status: done
|
|
priority: medium
|
|
state_hub_task_id: "da4821f0-a876-44ce-9dc3-f3fc67732d0f"
|
|
```
|
|
|
|
When `LLM_CONNECT_AUDIT_DIR=/path` is set, record every `/execute` call as a
|
|
JSON file named `{timestamp}-{response_id}.json` containing `{prompt, config,
|
|
provider_request, provider_response, parsed_content, latency_seconds}`. A
|
|
sibling `python -m llm_connect.replay <file>` command re-runs the parser /
|
|
unwrapper / validation against the recorded provider response without making a
|
|
new provider call.
|
|
|
|
Saves quota during diagnosis (today we burned several Sonnet calls just to
|
|
re-observe the envelope shape) and gives downstream consumers like
|
|
activity-core's instruction executor a "what did the model actually return"
|
|
artifact when validation fails.
|
|
|
|
Promote to a workplan if anyone picks it up — schema for the audit record
|
|
needs a small design pass (retention, redaction of API keys in
|
|
`provider_request.headers`, file size cap), and the CLI side has its own
|
|
ergonomics.
|
|
|
|
### T04 - Apply param-translation contract to OpenAI and Gemini adapters
|
|
|
|
```task
|
|
id: ADHOC-2026-06-02-T04
|
|
status: done
|
|
priority: medium
|
|
state_hub_task_id: "f8a033e6-22ac-4700-b8d2-43a5d76a3751"
|
|
```
|
|
|
|
Today's `cd4551c` added a `_merge_model_params` helper in the OpenRouter
|
|
adapter that whitelists OpenAI Chat Completions fields, translates
|
|
`json_schema` to `response_format`, and drops Claude/llm-connect-specific
|
|
keys. The same naive `payload.update(config.model_params)` is latent in
|
|
`llm_connect/openai.py` and `llm_connect/gemini.py` — neither has been
|
|
exercised against an activity-core-style payload yet, but they will 400
|
|
the moment they are.
|
|
|
|
Extract `_merge_model_params` to a shared module (`llm_connect/_payload.py`?)
|
|
and use it from all three adapters. Gemini's accepted top-level fields differ
|
|
(`generationConfig` wrapper, `safetySettings`, etc.), so the helper either
|
|
needs adapter-specific whitelists or a per-adapter translation table. The
|
|
shape choice is small but real.
|
|
|
|
Done when a parametrised test sends the activity-core daily-triage
|
|
`model_params` shape (`reasoning_effort`, `max_depth`, `json_schema`) through
|
|
each adapter and asserts the resulting payload is provider-accepted (no
|
|
forbidden top-level fields, schema in the right wrapper).
|
|
|
|
### T05 - Provider-agnostic structured-output smoke test in CI
|
|
|
|
```task
|
|
id: ADHOC-2026-06-02-T05
|
|
status: done
|
|
priority: medium
|
|
state_hub_task_id: "5d53dbb4-b374-45fe-b81c-ff0b222ca74f"
|
|
```
|
|
|
|
CI today does not exercise the round-trip from `model_params.json_schema`
|
|
through any real provider. The `1b01f0e` model-routing bug — where
|
|
`--model anthropic/claude-sonnet-4` silently became `gpt-4` because of an
|
|
inverted comparison — only surfaced through the canary, because none of the
|
|
adapter tests check the actual `payload["model"]` value end-to-end.
|
|
|
|
Add a test (gated on each provider's API key being present in CI env) that:
|
|
|
|
1. Starts the server with `--provider {p} --model {m}` for each registered
|
|
adapter.
|
|
2. POSTs a fixed prompt with `model_params.json_schema` requiring
|
|
`{summary, recommendations}` (mirroring activity-core's canary shape).
|
|
3. Asserts the response parses cleanly against the schema.
|
|
4. Asserts `response.model` matches what was configured at startup, not the
|
|
`RunConfig.model_name` default.
|
|
|
|
Without API keys present, the test mocks the provider HTTP layer and asserts
|
|
the *payload* matches expectations. That catches routing/translation bugs
|
|
without provider cost.
|
|
|
|
Done when a smoke probe shaped like today's CUST-WP-0045 canary would have
|
|
failed CI on `1b01f0e` (the routing bug) and on `cd4551c` (the translation
|
|
bug) before either was merged.
|
|
|
|
### T06 - Document required model_params translation in adapter README
|
|
|
|
```task
|
|
id: ADHOC-2026-06-02-T06
|
|
status: done
|
|
priority: low
|
|
state_hub_task_id: "33fcb951-d7ab-4d3c-8d67-9eebd986c711"
|
|
```
|
|
|
|
Adapter authors and callers both need to know:
|
|
|
|
- which `model_params` keys are pass-through to the underlying provider
|
|
- which are translated (`json_schema` → `response_format` for OpenAI-shaped
|
|
providers, → tool-use for native Anthropic, etc.)
|
|
- which are dropped (Claude-only `reasoning_effort`, llm-connect's
|
|
`max_depth`)
|
|
- the strict-mode tradeoff and how to opt back in
|
|
|
|
Today's bug chain happened partly because the activity-core team assumed
|
|
`model_params` was a pass-through and the llm-connect team assumed callers
|
|
would only send OpenAI-valid fields. Codify the contract in
|
|
`docs/adapter-model-params.md` (new file) or extend the existing
|
|
`ARCHITECTURE-LAYERS.md`.
|
|
|
|
Done when a new adapter author can read the doc and know what their
|
|
`_merge_model_params` implementation must support.
|
|
|
|
## Implementation Notes
|
|
|
|
Completed on 2026-06-03:
|
|
|
|
- Added opt-in `/execute` debug envelopes via `LLM_CONNECT_DEBUG=1` or
|
|
`?debug=1`, with redacted provider request/response capture and adapter
|
|
transformation records.
|
|
- Switched serve mode to `ThreadingHTTPServer` and added a concurrency
|
|
regression test.
|
|
- Added `LLM_CONNECT_AUDIT_DIR` per-call audit records plus
|
|
`python -m llm_connect.replay` for parser/unwrapper replay.
|
|
- Extracted shared OpenAI-compatible and Gemini payload translation helpers
|
|
and wired OpenRouter, OpenAI, and Gemini through them.
|
|
- Added CI-safe structured-output smoke tests that mock provider HTTP calls
|
|
and assert model routing plus payload shape.
|
|
- Documented the adapter `model_params` contract in
|
|
`docs/adapter-model-params.md`.
|