Adapter model_params contract
RunConfig.model_params is a portability layer, not an escape hatch for sending
arbitrary provider payloads. Adapters must translate the shared keys they
understand, pass through only keys that are valid for their own provider, and
drop keys that belong to another provider and would cause the request to be rejected.
Shared structured output
Callers may request structured output with:
RunConfig(
    model_params={
        "json_schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "recommendations": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["summary", "recommendations"],
        }
    }
)
Adapters translate that key into the provider's native shape:
| Adapter | Translation |
|---|---|
| OpenAI | response_format = {"type": "json_schema", "json_schema": ...} |
| OpenRouter | Same OpenAI-compatible response_format wrapper |
| Gemini | generationConfig.responseMimeType = "application/json" and generationConfig.responseSchema = ... |
| Claude Code CLI | --json-schema <schema> plus --output-format json, then envelope unwrap |
OpenAI-compatible adapters default json_schema.strict to False. Strict mode
requires schemas to meet provider-specific constraints such as
additionalProperties: false on object nodes and complete required lists.
Callers that need strict behavior can pass an explicit provider-native
response_format in model_params.
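The OpenAI-compatible translation described above could look roughly like the sketch below. It is illustrative only: the function name `translate_json_schema`, the default schema name `"response"`, and the rule that an explicit `response_format` takes precedence over `json_schema` are assumptions, not confirmed llm-connect behavior.

```python
from typing import Any


def translate_json_schema(
    model_params: dict[str, Any],
    schema_name: str = "response",  # hypothetical default; OpenAI requires some name
) -> dict[str, Any]:
    """Return a copy of model_params with json_schema wrapped as response_format."""
    params = dict(model_params)  # never mutate the caller's config
    schema = params.pop("json_schema", None)

    # Assumption: a caller-supplied provider-native response_format wins,
    # which is how a caller would opt in to strict mode.
    if schema is not None and "response_format" not in params:
        params["response_format"] = {
            "type": "json_schema",
            "json_schema": {
                "name": schema_name,
                "schema": schema,
                "strict": False,  # documented default for OpenAI-compatible adapters
            },
        }
    return params
```

A caller who needs strict validation would skip `json_schema` and pass the full `response_format` dictionary, with `"strict": True` and a schema that satisfies the provider's constraints.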
Pass-through keys
OpenAI and OpenRouter pass through known Chat Completions fields:
top_p, n, stream, stop, presence_penalty, frequency_penalty,
logit_bias, user, seed, tools, tool_choice, response_format,
logprobs, top_logprobs, and parallel_tool_calls.
Gemini passes through valid generateContent top-level fields:
safetySettings, tools, toolConfig, systemInstruction, and
cachedContent.
Gemini also accepts generation config fields directly or via snake-case aliases:
candidateCount, candidate_count, stopSequences, stop_sequences,
maxOutputTokens, max_output_tokens, temperature, topP, top_p, topK,
top_k, responseMimeType, response_mime_type, responseSchema, and
response_schema.
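As a rough illustration of the Gemini rules above, the following sketch normalizes snake-case aliases to their camelCase names, nests generation fields under `generationConfig`, and keeps the valid top-level fields. The function name `build_gemini_body` and the exact merge order are assumptions; the field names come from the lists above.

```python
from typing import Any

# snake_case alias -> native generationConfig field
GENERATION_CONFIG_ALIASES = {
    "candidate_count": "candidateCount",
    "stop_sequences": "stopSequences",
    "max_output_tokens": "maxOutputTokens",
    "top_p": "topP",
    "top_k": "topK",
    "response_mime_type": "responseMimeType",
    "response_schema": "responseSchema",
}
GENERATION_CONFIG_FIELDS = set(GENERATION_CONFIG_ALIASES.values()) | {"temperature"}
TOP_LEVEL_FIELDS = {
    "safetySettings", "tools", "toolConfig", "systemInstruction", "cachedContent",
}


def build_gemini_body(model_params: dict[str, Any]) -> dict[str, Any]:
    """Build the model_params portion of a generateContent request body."""
    body: dict[str, Any] = {}
    generation_config: dict[str, Any] = {}

    for key, value in model_params.items():
        canonical = GENERATION_CONFIG_ALIASES.get(key, key)
        if canonical in GENERATION_CONFIG_FIELDS:
            generation_config[canonical] = value
        elif key in TOP_LEVEL_FIELDS:
            body[key] = value
        # Anything else (other providers' keys, unknown keys) is left out.

    # Shared structured-output key, translated as in the table above.
    schema = model_params.get("json_schema")
    if schema is not None:
        generation_config["responseMimeType"] = "application/json"
        generation_config["responseSchema"] = schema

    if generation_config:
        body["generationConfig"] = generation_config
    return body
```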
Dropped keys
Adapters must drop keys that are meaningful to another adapter or to llm-connect itself but invalid for the target provider. The current shared drop set includes:
reasoning_effort, max_depth, claude_cli_path, and the raw json_schema key once
it has been translated.
Unknown keys are ignored by default. This keeps activity-specific configs from causing provider HTTP 400 errors when a caller switches providers.
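A minimal sketch of how an OpenAI-compatible adapter might apply these rules is shown below. The helper name `split_openai_params` and its three-part return value are hypothetical and exist only to make the distinction between passed-through, dropped, and ignored keys visible; the key lists are taken from this document.

```python
from typing import Any

OPENAI_PASSTHROUGH = frozenset({
    "top_p", "n", "stream", "stop", "presence_penalty", "frequency_penalty",
    "logit_bias", "user", "seed", "tools", "tool_choice", "response_format",
    "logprobs", "top_logprobs", "parallel_tool_calls",
})
# json_schema is listed here on the assumption that translation already ran.
SHARED_DROP = frozenset({"reasoning_effort", "max_depth", "claude_cli_path", "json_schema"})


def split_openai_params(
    model_params: dict[str, Any],
) -> tuple[dict[str, Any], list[str], list[str]]:
    """Return (kept params, dropped shared keys, ignored unknown keys)."""
    kept: dict[str, Any] = {}
    dropped: list[str] = []
    ignored: list[str] = []
    for key, value in model_params.items():
        if key in OPENAI_PASSTHROUGH:
            kept[key] = value
        elif key in SHARED_DROP:
            dropped.append(key)   # known to llm-connect, invalid for this provider
        else:
            ignored.append(key)   # unknown here, e.g. a Gemini-only field
    return kept, sorted(dropped), sorted(ignored)
```

With this shape, a config written for Gemini that contains `safetySettings` can be sent through an OpenAI-compatible adapter without the provider seeing the unsupported field.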
Diagnostics and replay
Server mode supports opt-in diagnostics for /execute:
LLM_CONNECT_DEBUG=1 python -m llm_connect.server --provider openrouter
curl 'http://127.0.0.1:8080/execute?debug=1' -d '{"prompt":"hi"}'
Debug responses include a debug field with the redacted provider request, raw
provider response body, and adapter transformations such as merge_model_params
or unwrap_cli_envelope. Normal responses omit debug.
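The same debug request can be issued from Python. The sketch below uses only the standard library and assumes the server from the example above is listening on 127.0.0.1:8080. The helper names and the `Content-Type: application/json` header are assumptions; the curl example does not set that header.

```python
import json
import urllib.request
from typing import Any, Optional


def build_debug_request(
    prompt: str, base_url: str = "http://127.0.0.1:8080"
) -> urllib.request.Request:
    """Build a POST to /execute with diagnostics enabled via ?debug=1."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/execute?debug=1",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, see lead-in
        method="POST",
    )


def execute_with_debug(
    prompt: str, base_url: str = "http://127.0.0.1:8080"
) -> tuple[dict[str, Any], Optional[Any]]:
    """Send the request and return (full response, debug field or None)."""
    request = build_debug_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Normal responses omit "debug", so .get() returns None for them.
    return payload, payload.get("debug")
```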
Set LLM_CONNECT_AUDIT_DIR=/path/to/audit to write one JSON audit record per
/execute call. Audit records include the prompt, config, redacted provider
request, provider response, parsed content, and latency. Re-run parsing without
another provider call with:
python -m llm_connect.replay /path/to/audit/record.json --json
Server concurrency
llm_connect.server.LLMServer uses ThreadingHTTPServer. Adapter instances
used in server mode must be safe to call concurrently. The bundled HTTP and
subprocess adapters keep per-call state local; custom adapters should avoid
mutating shared instance attributes during execute_prompt unless they use
their own locks.
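To illustrate the concurrency guidance, here is a hedged sketch of a custom adapter that keeps per-call state in local variables and guards its one shared attribute with a lock. `CountingAdapter`, the single-argument `execute_prompt` signature, and the upper-casing stand-in for a provider call are all hypothetical; the real adapter interface may differ.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CountingAdapter:
    """Hypothetical adapter that counts how many prompts it has served."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.calls = 0  # shared across threads, so guarded by _lock

    def execute_prompt(self, prompt: str) -> str:
        # Per-call state stays in local variables and is never stored on self.
        response = prompt.upper()  # stand-in for the real provider call

        # Only the shared counter update needs the lock; keep the critical
        # section small so concurrent requests are not serialized.
        with self._lock:
            self.calls += 1
        return response


def run_concurrently(adapter: CountingAdapter, prompts: list[str], workers: int = 8) -> list[str]:
    """Call the adapter from several threads, as a threaded server would."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(adapter.execute_prompt, prompts))
```

Without the lock, `self.calls += 1` is a read-modify-write that can lose updates when two requests interleave.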