Adapter `model_params` contract

RunConfig.model_params is a portability layer, not an escape hatch for arbitrary provider payloads. Adapters must translate the shared keys they understand, pass through only keys that are valid for their provider, and drop provider-specific keys that would make another provider reject the request.

Shared structured output

Callers may request structured output with:

RunConfig(
    model_params={
        "json_schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "recommendations": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["summary", "recommendations"],
        }
    }
)

Adapters translate that key into the provider's native shape:

Adapter	Translation
OpenAI	`response_format = {"type": "json_schema", "json_schema": ...}`
OpenRouter	Same OpenAI-compatible `response_format` wrapper
Gemini	`generationConfig.responseMimeType = "application/json"` and `generationConfig.responseSchema = ...`
Claude Code CLI	`--json-schema <schema>` plus `--output-format json`, then envelope unwrap
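
As an illustration of the table above, the two HTTP translations could be sketched as below. The function names are hypothetical and not part of llm-connect. The `name` field is an assumption: the contract only shows `"json_schema": ...`, while OpenAI's structured-output format expects a schema name alongside the schema, so a placeholder is used here.

```python
from typing import Any, Dict


def to_openai_response_format(schema: Dict[str, Any], name: str = "response") -> Dict[str, Any]:
    """Wrap a shared json_schema value in an OpenAI-compatible response_format.

    `name` is an assumed placeholder; the contract does not say which name
    the real adapters send.
    """
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": schema,
            "strict": False,  # non-strict, matching the documented adapter default
        },
    }


def to_gemini_generation_config(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Build the generationConfig fragment Gemini needs for structured output."""
    return {
        "responseMimeType": "application/json",
        "responseSchema": schema,
    }


schema = {"type": "object", "properties": {"summary": {"type": "string"}}}
openai_fragment = to_openai_response_format(schema)
gemini_fragment = to_gemini_generation_config(schema)
```

Both helpers return new dictionaries and leave the caller's schema object untouched, so the same RunConfig can be reused across providers.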

OpenAI-compatible adapters default json_schema.strict to False. Strict mode requires schemas to meet provider-specific constraints such as additionalProperties: false on object nodes and complete required lists. Callers that need strict behavior can pass an explicit provider-native response_format in model_params.
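
For callers who do want strict mode, a schema usually has to be tightened first. The helper below is a minimal sketch, not part of llm-connect: `make_strict` is a hypothetical name, and it only walks `properties` and `items`, so schemas using `anyOf`, `$defs`, or similar keywords would need more work. Note that it marks every property as required, which changes the meaning of schemas that rely on optional fields.

```python
import copy
from typing import Any, Dict


def make_strict(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `schema` where every object node forbids extra
    properties and lists all of its properties as required."""
    out = copy.deepcopy(schema)  # never mutate the caller's schema

    def visit(node: Any) -> None:
        if not isinstance(node, dict):
            return
        if node.get("type") == "object":
            props = node.get("properties", {})
            node["additionalProperties"] = False
            node["required"] = list(props)
            for child in props.values():
                visit(child)
        elif node.get("type") == "array":
            visit(node.get("items"))

    visit(out)
    return out


loose = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "recommendations": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary"],
}

# Explicit provider-native response_format, as the contract allows.
# The "report" name is an illustrative placeholder.
model_params = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "report", "schema": make_strict(loose), "strict": True},
    }
}
```

Because `response_format` is a pass-through key for OpenAI-compatible adapters, a payload like this reaches the provider as written, and other adapters are expected to drop it.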

Pass-through keys

OpenAI and OpenRouter pass through known Chat Completions fields:

top_p, n, stream, stop, presence_penalty, frequency_penalty, logit_bias, user, seed, tools, tool_choice, response_format, logprobs, top_logprobs, and parallel_tool_calls.

Gemini passes through valid generateContent top-level fields:

safetySettings, tools, toolConfig, systemInstruction, and cachedContent.

Gemini also accepts generation config fields directly or via snake-case aliases:

candidateCount, candidate_count, stopSequences, stop_sequences, maxOutputTokens, max_output_tokens, temperature, topP, top_p, topK, top_k, responseMimeType, response_mime_type, responseSchema, and response_schema.
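
One way the Gemini key handling above might look is sketched here. The constant and function names are hypothetical, and the behavior when a caller supplies both a camelCase key and its snake-case alias (the later key wins) is an assumption; the contract does not specify a precedence.

```python
from typing import Any, Dict

# Top-level generateContent fields listed in this contract.
GEMINI_TOP_LEVEL = frozenset(
    {"safetySettings", "tools", "toolConfig", "systemInstruction", "cachedContent"}
)

# Snake-case aliases mapped to the camelCase generation config fields.
GEMINI_ALIASES = {
    "candidate_count": "candidateCount",
    "stop_sequences": "stopSequences",
    "max_output_tokens": "maxOutputTokens",
    "top_p": "topP",
    "top_k": "topK",
    "response_mime_type": "responseMimeType",
    "response_schema": "responseSchema",
}

# "temperature" has no alias because both spellings are identical.
GEMINI_GENERATION_FIELDS = frozenset(GEMINI_ALIASES.values()) | {"temperature"}


def build_gemini_body(model_params: Dict[str, Any]) -> Dict[str, Any]:
    """Split model_params into top-level fields and generationConfig,
    silently dropping anything Gemini would not accept."""
    body: Dict[str, Any] = {}
    generation_config: Dict[str, Any] = {}
    for key, value in model_params.items():
        if key in GEMINI_TOP_LEVEL:
            body[key] = value
            continue
        canonical = GEMINI_ALIASES.get(key, key)
        if canonical in GEMINI_GENERATION_FIELDS:
            generation_config[canonical] = value
    if generation_config:
        body["generationConfig"] = generation_config
    return body


gemini_body = build_gemini_body(
    {"top_p": 0.9, "maxOutputTokens": 256, "tools": [], "reasoning_effort": "high"}
)
```

Here `top_p` lands in `generationConfig` as `topP`, `tools` stays at the top level, and `reasoning_effort` is dropped.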

Dropped keys

Adapters must drop keys that are meaningful to another adapter or to llm-connect itself but invalid for the target provider. The current shared drop set includes:

reasoning_effort, max_depth, claude_cli_path, and the raw json_schema key once it has been translated.

Unknown keys are dropped by default rather than forwarded to the provider. This keeps activity-specific configs from causing provider HTTP 400 errors when a caller switches providers.
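
The drop behavior can be approximated with an allow-list, sketched below for the OpenAI-compatible adapters. The names `OPENAI_PASS_THROUGH` and `split_openai_params` are hypothetical, and the sketch leaves out the `json_schema` translation step; it only shows that cross-provider and unknown keys never reach the request body.

```python
from typing import Any, Dict, List, Tuple

# Chat Completions fields this contract lists as pass-through.
OPENAI_PASS_THROUGH = frozenset({
    "top_p", "n", "stream", "stop", "presence_penalty", "frequency_penalty",
    "logit_bias", "user", "seed", "tools", "tool_choice", "response_format",
    "logprobs", "top_logprobs", "parallel_tool_calls",
})


def split_openai_params(model_params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (kept, dropped): the keys safe to send, and the sorted names
    of everything filtered out, which is useful for diagnostics."""
    kept = {k: v for k, v in model_params.items() if k in OPENAI_PASS_THROUGH}
    dropped = sorted(k for k in model_params if k not in OPENAI_PASS_THROUGH)
    return kept, dropped


kept, dropped = split_openai_params(
    {"top_p": 0.5, "reasoning_effort": "high", "claude_cli_path": "/x", "custom_flag": 1}
)
```

An allow-list makes the shared drop set implicit: anything not named as valid for the provider is removed, whether it belongs to another adapter or is simply unknown.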

Diagnostics and replay

Server mode supports opt-in diagnostics for /execute:

LLM_CONNECT_DEBUG=1 python -m llm_connect.server --provider openrouter
curl 'http://127.0.0.1:8080/execute?debug=1' -d '{"prompt":"hi"}'

Debug responses include a debug field with the redacted provider request, raw provider response body, and adapter transformations such as merge_model_params or unwrap_cli_envelope. Normal responses omit debug.

Set LLM_CONNECT_AUDIT_DIR=/path/to/audit to write one JSON audit record per /execute call. Audit records include the prompt, config, redacted provider request, provider response, parsed content, and latency. Re-run parsing without another provider call with:

python -m llm_connect.replay /path/to/audit/record.json --json

Server concurrency

llm_connect.server.LLMServer uses ThreadingHTTPServer. Adapter instances used in server mode must be safe to call concurrently. The bundled HTTP and subprocess adapters keep per-call state local; custom adapters should avoid mutating shared instance attributes during execute_prompt unless they use their own locks.
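
A custom adapter that does need shared state might guard it as in the sketch below. The class is hypothetical, and the `execute_prompt(prompt, config=None)` signature is an assumption, since this document names the method but not its parameters.

```python
import threading
from typing import Any, Optional


class CountingAdapter:
    """Toy adapter that tracks how many prompts it has served."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.calls = 0  # shared across server threads, so guard every update

    def execute_prompt(self, prompt: str, config: Optional[Any] = None) -> str:
        # Per-call state stays in local variables, never on self.
        request = {"prompt": prompt, "config": config}

        # Only the shared counter update needs the lock.
        with self._lock:
            self.calls += 1

        # A real adapter would call the provider here; this one just echoes.
        return "echo: " + request["prompt"]


adapter = CountingAdapter()
workers = [
    threading.Thread(target=lambda: [adapter.execute_prompt("hi") for _ in range(100)])
    for _ in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Keeping the lock scope limited to the counter update means slow provider calls from different threads do not serialize behind each other.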
