Implement llm-connect ADHOC diagnostics

2026-06-03 11:56:21 +02:00
parent 79c899b694
commit 24f4c09d42
17 changed files with 1618 additions and 611 deletions
--- a/docs/adapter-model-params.md
+++ b/docs/adapter-model-params.md
@@ -0,0 +1,102 @@
+# Adapter `model_params` contract
+
+`RunConfig.model_params` is a portability layer, not a blind escape hatch into
+the provider payload. Adapters must translate the shared keys they understand,
+pass through only provider-valid keys, and drop provider-specific keys that
+would make another provider reject the request.
+
+## Shared structured output
+
+Callers may request structured output with:
+
+```python
+RunConfig(
+    model_params={
+        "json_schema": {
+            "type": "object",
+            "properties": {
+                "summary": {"type": "string"},
+                "recommendations": {"type": "array", "items": {"type": "string"}},
+            },
+            "required": ["summary", "recommendations"],
+        }
+    }
+)
+```
+
+Adapters translate that key into the provider's native shape:
+
+| Adapter | Translation |
+|---|---|
+| OpenAI | `response_format = {"type": "json_schema", "json_schema": ...}` |
+| OpenRouter | Same OpenAI-compatible `response_format` wrapper |
+| Gemini | `generationConfig.responseMimeType = "application/json"` and `generationConfig.responseSchema = ...` |
+| Claude Code CLI | `--json-schema <schema>` plus `--output-format json`, then envelope unwrap |
+
+OpenAI-compatible adapters default `json_schema.strict` to `False`. Strict mode
+requires schemas to meet provider-specific constraints such as
+`additionalProperties: false` on object nodes and complete `required` lists.
+Callers that need strict behavior can pass an explicit provider-native
+`response_format` in `model_params`.
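+
+As a rough illustration of the translation table above, the sketch below maps
+a shared `json_schema` value to OpenAI-style and Gemini-style request
+fragments. The function names and the `"response"` schema name are
+hypothetical and not part of llm-connect; the inner `name`/`schema`/`strict`
+layout follows the OpenAI Chat Completions `response_format` shape.
+
+```python
+from typing import Any
+
+
+def to_openai_response_format(schema: dict[str, Any], strict: bool = False) -> dict[str, Any]:
+    """Wrap a JSON Schema in an OpenAI-compatible response_format value."""
+    # "response" is a placeholder schema name (an assumption for this sketch).
+    return {
+        "type": "json_schema",
+        "json_schema": {"name": "response", "schema": schema, "strict": strict},
+    }
+
+
+def to_gemini_generation_config(schema: dict[str, Any]) -> dict[str, Any]:
+    """Build the generationConfig fields Gemini uses for JSON output."""
+    return {"responseMimeType": "application/json", "responseSchema": schema}
+
+
+schema = {"type": "object", "properties": {"summary": {"type": "string"}}}
+openai_fragment = to_openai_response_format(schema)  # strict stays False
+gemini_fragment = to_gemini_generation_config(schema)
+```
+
+Keeping `strict` as an explicit argument mirrors the default described above:
+a caller who wants strict validation has to opt in deliberately.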
+
+## Pass-through keys
+
+OpenAI and OpenRouter pass through known Chat Completions fields:
+
+`top_p`, `n`, `stream`, `stop`, `presence_penalty`, `frequency_penalty`,
+`logit_bias`, `user`, `seed`, `tools`, `tool_choice`, `response_format`,
+`logprobs`, `top_logprobs`, and `parallel_tool_calls`.
+
+Gemini passes through valid `generateContent` top-level fields:
+
+`safetySettings`, `tools`, `toolConfig`, `systemInstruction`, and
+`cachedContent`.
+
+Gemini also accepts generation config fields directly or via snake_case aliases:
+
+`candidateCount`, `candidate_count`, `stopSequences`, `stop_sequences`,
+`maxOutputTokens`, `max_output_tokens`, `temperature`, `topP`, `top_p`, `topK`,
+`top_k`, `responseMimeType`, `response_mime_type`, `responseSchema`, and
+`response_schema`.
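+
+One way to handle the aliases is a small lookup table, sketched below under
+the assumption that aliases are resolved before the request is built.
+`GEMINI_ALIASES` and `build_generation_config` are hypothetical names, and the
+precedence when both spellings of a field are supplied is not specified here;
+in this sketch the later key wins.
+
+```python
+from typing import Any
+
+# snake_case aliases from the list above, mapped to Gemini's camelCase names.
+GEMINI_ALIASES = {
+    "candidate_count": "candidateCount",
+    "stop_sequences": "stopSequences",
+    "max_output_tokens": "maxOutputTokens",
+    "top_p": "topP",
+    "top_k": "topK",
+    "response_mime_type": "responseMimeType",
+    "response_schema": "responseSchema",
+}
+# "temperature" has the same spelling in both styles, so it needs no alias.
+GEMINI_GENERATION_FIELDS = set(GEMINI_ALIASES.values()) | {"temperature"}
+
+
+def build_generation_config(model_params: dict[str, Any]) -> dict[str, Any]:
+    """Collect generation config fields, normalizing aliases to camelCase."""
+    config: dict[str, Any] = {}
+    for key, value in model_params.items():
+        native = GEMINI_ALIASES.get(key, key)
+        if native in GEMINI_GENERATION_FIELDS:
+            config[native] = value
+    return config
+
+
+# "seed" is not a generation config field in the list above, so it is left out.
+config = build_generation_config({"max_output_tokens": 256, "topK": 40, "seed": 7})
+```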
+
+## Dropped keys
+
+Adapters must drop keys that are meaningful to another adapter or to
+llm-connect itself but invalid for the target provider. The current shared drop
+set includes:
+
+`reasoning_effort`, `max_depth`, `claude_cli_path`, and raw `json_schema` after
+translation.
+
+Unknown keys are ignored by default rather than forwarded to the provider. This
+keeps activity-specific configs from causing provider HTTP 400 errors when a
+caller switches providers.
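+
+A minimal sketch of this filtering for an OpenAI-compatible adapter, using an
+allowlist built from the pass-through keys above. `filter_openai_params` is a
+hypothetical helper, and the sketch assumes `json_schema` has already been
+translated into `response_format` before filtering.
+
+```python
+from typing import Any
+
+# Pass-through keys listed in this document for OpenAI and OpenRouter.
+OPENAI_PASS_THROUGH = {
+    "top_p", "n", "stream", "stop", "presence_penalty", "frequency_penalty",
+    "logit_bias", "user", "seed", "tools", "tool_choice", "response_format",
+    "logprobs", "top_logprobs", "parallel_tool_calls",
+}
+
+
+def filter_openai_params(model_params: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
+    """Split model_params into forwarded keys and the names of dropped keys."""
+    kept: dict[str, Any] = {}
+    dropped: list[str] = []
+    for key, value in model_params.items():
+        if key in OPENAI_PASS_THROUGH:
+            kept[key] = value
+        else:
+            # Covers the shared drop set and any unknown key alike.
+            dropped.append(key)
+    return kept, sorted(dropped)
+
+
+kept, dropped = filter_openai_params({"top_p": 0.9, "max_depth": 3, "topK": 40})
+```
+
+Returning the dropped names alongside the kept payload makes it easy to record
+what was removed without sending it to the provider.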
+
+## Diagnostics and replay
+
+Server mode supports opt-in diagnostics for `/execute`:
+
+```bash
+LLM_CONNECT_DEBUG=1 python -m llm_connect.server --provider openrouter
+curl 'http://127.0.0.1:8080/execute?debug=1' -d '{"prompt":"hi"}'
+```
+
+Debug responses include a `debug` field with the redacted provider request, raw
+provider response body, and adapter transformations such as `merge_model_params`
+or `unwrap_cli_envelope`. Normal responses omit `debug`.
+
+Set `LLM_CONNECT_AUDIT_DIR=/path/to/audit` to write one JSON audit record per
+`/execute` call. Audit records include the prompt, config, redacted provider
+request, provider response, parsed content, and latency. To re-run parsing
+from a saved record without calling the provider again:
+
+```bash
+python -m llm_connect.replay /path/to/audit/record.json --json
+```
+
+## Server concurrency
+
+`llm_connect.server.LLMServer` uses `ThreadingHTTPServer`. Adapter instances
+used in server mode must be safe to call concurrently. The bundled HTTP and
+subprocess adapters keep per-call state local; custom adapters should avoid
+mutating shared instance attributes during `execute_prompt` unless they use
+their own locks.
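+
+The sketch below shows the locking pattern for a custom adapter that does keep
+shared state. `CountingAdapter` and its single-argument `execute_prompt`
+signature are hypothetical; the real adapter interface may take more
+arguments.
+
+```python
+import threading
+
+
+class CountingAdapter:
+    """Toy adapter that counts calls and guards the counter with a lock."""
+
+    def __init__(self) -> None:
+        self.calls = 0
+        self._lock = threading.Lock()
+
+    def execute_prompt(self, prompt: str) -> str:
+        # Per-call state stays in local variables, so it needs no locking.
+        response = prompt.upper()
+        # Shared instance state is only mutated while holding the lock.
+        with self._lock:
+            self.calls += 1
+        return response
+
+
+adapter = CountingAdapter()
+threads = [
+    threading.Thread(target=adapter.execute_prompt, args=("hi",))
+    for _ in range(8)
+]
+for thread in threads:
+    thread.start()
+for thread in threads:
+    thread.join()
+```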