coulomb/llm-connect

Fork 0

generated from coulomb/repo-seed

Files

tegwick 24f4c09d42

CI / test (3.10) (push) Has been cancelled

Details

CI / test (3.11) (push) Has been cancelled

Details

CI / test (3.12) (push) Has been cancelled

Details

Implement llm-connect ADHOC diagnostics

2026-06-03 11:56:21 +02:00

9.0 KiB

Raw Blame History

id, type, title, domain, repo, status, owner, topic_slug, created, updated, state_hub_workstream_id

id	type	title	domain	repo	status	owner	topic_slug	created	updated	state_hub_workstream_id
ADHOC-2026-06-02	workplan	Ad hoc — llm-connect lessons from CUST-WP-0045 canary	custodian	llm-connect	finished	custodian	custodian	2026-06-02	2026-06-03	1c936c91-79c7-427d-ab37-9052e8a61cda

ADHOC-2026-06-02 — llm-connect lessons from CUST-WP-0045 canary

Captured at the close of CUST-WP-0045 T06. The daily-triage canary uncovered a sequence of bugs that took the better part of a day to diagnose, mostly because llm-connect has no way to see what the underlying provider returned or what payload the adapter sent. Five commits landed today:

9de0f49 — Claude Code adapter passes --output-format json and unwraps
435da49 — envelope unwrap prefers JSON-bearing fields over prose
cd4551c — OpenRouter adapter translates json_schema → response_format and drops Claude-specific keys
583ab57 — OpenRouter response_format.json_schema.strict defaults to False
1b01f0e — OpenRouter honours explicit --model even when it equals the hardcoded default

This adhoc captures the structural improvements that would have collapsed the diagnosis loop from half a day to minutes. Each task is sized as an opportunistic improvement; the larger ones explicitly call out when to promote into a workplan.

Tasks

T01 - Debug envelope mode for /execute responses

id: ADHOC-2026-06-02-T01
status: done
priority: medium
state_hub_task_id: "69626e9e-29f1-40f6-8cd2-d38a7e802293"

When the LLM_CONNECT_DEBUG=1 env var is set (or ?debug=1 is added to the /execute request), include the raw provider response and the adapter's constructed payload alongside the normal content field. Today's diagnosis required two iterations on the Claude Code envelope shape and one on the OpenRouter adapter routing — each required modifying source, restarting llm-connect, re-running, observing, modifying again. A debug envelope mode would have given the diagnostic data in one curl.

Proposed response shape under debug mode:

{
  "content": "<unchanged>",
  "model": "<unchanged>",
  "usage": {...},
  "metadata": {...},
  "debug": {
    "provider_request": {"url": "...", "payload": {...}, "headers_redacted": {...}},
    "provider_response": {"status": 200, "body": {...}},
    "adapter_transformations": [
      {"step": "merge_model_params", "before": {...}, "after": {...}},
      {"step": "unwrap_cli_envelope", "before": "...", "after": "..."}
    ]
  }
}

Done when LLM_CONNECT_DEBUG=1 curl /execute … returns the raw round-trip in the response without changing the default response shape, and a test pins the debug field is omitted in normal mode.

T02 - Replace stdlib HTTPServer with ThreadingHTTPServer

id: ADHOC-2026-06-02-T02
status: done
priority: low
state_hub_task_id: "e2b1be30-71f7-4497-9b10-b0f24d37beba"

llm_connect/server.py uses http.server.HTTPServer, which is single-threaded. Concurrent requests queue behind whatever subprocess or HTTPS call is in flight. That made today's diagnosis worse: a smoke probe sent while the canary's real LLM call was running just hung until the canary finished. With Claude CLI calls taking 60-90s each, queuing is painful.

http.server.ThreadingHTTPServer is a drop-in replacement (subclass of HTTPServer + ThreadingMixIn). One-line change. Document that adapters must be thread-safe (the subprocess and HTTPS adapters already are).

Done when concurrent /execute calls run in parallel and a regression test asserts two requests with measurable backend latency complete in roughly the max of their individual latencies, not the sum.

T03 - Per-call audit log for replay

id: ADHOC-2026-06-02-T03
status: done
priority: medium
state_hub_task_id: "da4821f0-a876-44ce-9dc3-f3fc67732d0f"

When LLM_CONNECT_AUDIT_DIR=/path is set, record every /execute call as a JSON file named {timestamp}-{response_id}.json containing {prompt, config, provider_request, provider_response, parsed_content, latency_seconds}. A sibling python -m llm_connect.replay <file> command re-runs the parser / unwrapper / validation against the recorded provider response without making a new provider call.

Saves quota during diagnosis (today we burned several Sonnet calls just to re-observe the envelope shape) and gives downstream consumers like activity-core's instruction executor a "what did the model actually return" artifact when validation fails.

Promote to a workplan if anyone picks it up — schema for the audit record needs a small design pass (retention, redaction of API keys in provider_request.headers, file size cap), and the CLI side has its own ergonomics.

T04 - Apply param-translation contract to OpenAI and Gemini adapters

id: ADHOC-2026-06-02-T04
status: done
priority: medium
state_hub_task_id: "f8a033e6-22ac-4700-b8d2-43a5d76a3751"

Today's cd4551c added a _merge_model_params helper in the OpenRouter adapter that whitelists OpenAI Chat Completions fields, translates json_schema to response_format, and drops Claude/llm-connect-specific keys. The same naive payload.update(config.model_params) is latent in llm_connect/openai.py and llm_connect/gemini.py — neither has been exercised against an activity-core-style payload yet, but they will 400 the moment they are.

Extract _merge_model_params to a shared module (llm_connect/_payload.py?) and use it from all three adapters. Gemini's accepted top-level fields differ (generationConfig wrapper, safetySettings, etc.), so the helper either needs adapter-specific whitelists or a per-adapter translation table. The shape choice is small but real.

Done when a parametrised test sends the activity-core daily-triage model_params shape (reasoning_effort, max_depth, json_schema) through each adapter and asserts the resulting payload is provider-accepted (no forbidden top-level fields, schema in the right wrapper).

T05 - Provider-agnostic structured-output smoke test in CI

id: ADHOC-2026-06-02-T05
status: done
priority: medium
state_hub_task_id: "5d53dbb4-b374-45fe-b81c-ff0b222ca74f"

CI today does not exercise the round-trip from model_params.json_schema through any real provider. The 1b01f0e model-routing bug — where --model anthropic/claude-sonnet-4 silently became gpt-4 because of an inverted comparison — only surfaced through the canary, because none of the adapter tests check the actual payload["model"] value end-to-end.

Add a test (gated on each provider's API key being present in CI env) that:

Starts the server with --provider {p} --model {m} for each registered adapter.
POSTs a fixed prompt with model_params.json_schema requiring {summary, recommendations} (mirroring activity-core's canary shape).
Asserts the response parses cleanly against the schema.
Asserts response.model matches what was configured at startup, not the RunConfig.model_name default.

Without API keys present, the test mocks the provider HTTP layer and asserts the payload matches expectations. That catches routing/translation bugs without provider cost.

Done when a smoke probe shaped like today's CUST-WP-0045 canary would have failed CI on 1b01f0e (the routing bug) and on cd4551c (the translation bug) before either was merged.

T06 - Document required model_params translation in adapter README

id: ADHOC-2026-06-02-T06
status: done
priority: low
state_hub_task_id: "33fcb951-d7ab-4d3c-8d67-9eebd986c711"

Adapter authors and callers both need to know:

which model_params keys are pass-through to the underlying provider
which are translated (json_schema → response_format for OpenAI-shaped providers, → tool-use for native Anthropic, etc.)
which are dropped (Claude-only reasoning_effort, llm-connect's max_depth)
the strict-mode tradeoff and how to opt back in

Today's bug chain happened partly because the activity-core team assumed model_params was a pass-through and the llm-connect team assumed callers would only send OpenAI-valid fields. Codify the contract in docs/adapter-model-params.md (new file) or extend the existing ARCHITECTURE-LAYERS.md.

Done when a new adapter author can read the doc and know what their _merge_model_params implementation must support.

Implementation Notes

Completed on 2026-06-03:

Added opt-in /execute debug envelopes via LLM_CONNECT_DEBUG=1 or ?debug=1, with redacted provider request/response capture and adapter transformation records.
Switched serve mode to ThreadingHTTPServer and added a concurrency regression test.
Added LLM_CONNECT_AUDIT_DIR per-call audit records plus python -m llm_connect.replay for parser/unwrapper replay.
Extracted shared OpenAI-compatible and Gemini payload translation helpers and wired OpenRouter, OpenAI, and Gemini through them.
Added CI-safe structured-output smoke tests that mock provider HTTP calls and assert model routing plus payload shape.
Documented the adapter model_params contract in docs/adapter-model-params.md.

9.0 KiB Raw Blame History

ADHOC-2026-06-02 — llm-connect lessons from CUST-WP-0045 canary

Tasks

T01 - Debug envelope mode for /execute responses

T02 - Replace stdlib HTTPServer with ThreadingHTTPServer

T03 - Per-call audit log for replay

T04 - Apply param-translation contract to OpenAI and Gemini adapters

T05 - Provider-agnostic structured-output smoke test in CI

T06 - Document required model_params translation in adapter README

Implementation Notes

9.0 KiB

Raw Blame History