Implement llm-connect ADHOC diagnostics
docs/adapter-model-params.md
@@ -0,0 +1,102 @@
# Adapter `model_params` contract

`RunConfig.model_params` is a portability layer, not a blind provider payload
escape hatch. Adapters must translate the shared keys they understand, pass
through only provider-valid keys, and drop provider-specific keys that would
make another provider reject the request.

## Shared structured output

Callers may request structured output with:

```python
RunConfig(
    model_params={
        "json_schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "recommendations": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["summary", "recommendations"],
        }
    }
)
```

Adapters translate that key into the provider's native shape:

| Adapter | Translation |
|---|---|
| OpenAI | `response_format = {"type": "json_schema", "json_schema": ...}` |
| OpenRouter | Same OpenAI-compatible `response_format` wrapper |
| Gemini | `generationConfig.responseMimeType = "application/json"` and `generationConfig.responseSchema = ...` |
| Claude Code CLI | `--json-schema <schema>` plus `--output-format json`, then envelope unwrap |
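
As a rough illustration of the table above, the three translations could be sketched as below. The function names, the `"response"` schema name inside the OpenAI wrapper, and the choice to pass the schema to the CLI as a JSON string are assumptions made for this example; they are not taken from llm-connect's code.

```python
import json
from typing import Any


def to_openai_response_format(schema: dict[str, Any], strict: bool = False) -> dict[str, Any]:
    """Wrap a JSON Schema in the OpenAI-compatible response_format envelope."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "response",  # hypothetical name; the wrapper needs one
            "schema": schema,
            "strict": strict,
        },
    }


def to_gemini_generation_config(schema: dict[str, Any]) -> dict[str, Any]:
    """Build the generationConfig fragment listed for Gemini in the table."""
    return {
        "responseMimeType": "application/json",
        "responseSchema": schema,
    }


def to_claude_cli_args(schema: dict[str, Any]) -> list[str]:
    """Build the CLI flags from the table (schema serialization is assumed)."""
    return ["--json-schema", json.dumps(schema), "--output-format", "json"]
```

Each helper takes the same caller-supplied schema, which is the point of the shared key: the caller writes `json_schema` once and the adapter decides where it goes.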

OpenAI-compatible adapters default `json_schema.strict` to `False`. Strict mode
requires schemas to meet provider-specific constraints such as
`additionalProperties: false` on object nodes and complete `required` lists.
Callers that need strict behavior can pass an explicit provider-native
`response_format` in `model_params`.
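
Before opting into strict mode, a caller may want to check a schema against the two constraints named above. The following is a minimal sketch: `strict_violations` is a hypothetical helper, not part of llm-connect, and it covers only those two constraints, not every rule a provider may enforce.

```python
from typing import Any


def strict_violations(schema: dict[str, Any], path: str = "$") -> list[str]:
    """List object nodes that break the two strict-mode constraints above."""
    problems: list[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        # Constraint 1: object nodes must set additionalProperties to false.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        # Constraint 2: every declared property must appear in `required`.
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: required is missing {missing}")
        for name, sub in props.items():
            problems.extend(strict_violations(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_violations(schema["items"], f"{path}[]"))
    return problems


strict_schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "recommendations": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "recommendations"],
    "additionalProperties": False,
}

# Explicit provider-native response_format, passed through model_params.
# The "summary" name is an illustrative choice.
model_params = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "summary", "schema": strict_schema, "strict": True},
    }
}
```

The schema from the structured-output example earlier in this document would fail the first check, because it does not set `additionalProperties`.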

## Pass-through keys

OpenAI and OpenRouter pass through known Chat Completions fields:

`top_p`, `n`, `stream`, `stop`, `presence_penalty`, `frequency_penalty`,
`logit_bias`, `user`, `seed`, `tools`, `tool_choice`, `response_format`,
`logprobs`, `top_logprobs`, and `parallel_tool_calls`.

Gemini passes through valid `generateContent` top-level fields:

`safetySettings`, `tools`, `toolConfig`, `systemInstruction`, and
`cachedContent`.

Gemini also accepts generation config fields directly or via snake-case aliases:

`candidateCount`, `candidate_count`, `stopSequences`, `stop_sequences`,
`maxOutputTokens`, `max_output_tokens`, `temperature`, `topP`, `top_p`, `topK`,
`top_k`, `responseMimeType`, `response_mime_type`, `responseSchema`, and
`response_schema`.
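
The alias handling for Gemini generation config could be sketched as follows. The constant and function names are hypothetical, and the behavior when a caller supplies both spellings of the same field (here, whichever comes later in the dict wins) is an assumption, not documented behavior.

```python
from typing import Any

# snake_case alias -> native camelCase field, per the list above.
GEMINI_GENERATION_ALIASES = {
    "candidate_count": "candidateCount",
    "stop_sequences": "stopSequences",
    "max_output_tokens": "maxOutputTokens",
    "top_p": "topP",
    "top_k": "topK",
    "response_mime_type": "responseMimeType",
    "response_schema": "responseSchema",
}

# `temperature` has the same spelling in both styles.
GEMINI_GENERATION_FIELDS = set(GEMINI_GENERATION_ALIASES.values()) | {"temperature"}


def build_generation_config(model_params: dict[str, Any]) -> dict[str, Any]:
    """Collect generation config fields, normalizing aliases to native names."""
    config: dict[str, Any] = {}
    for key, value in model_params.items():
        native = GEMINI_GENERATION_ALIASES.get(key, key)
        if native in GEMINI_GENERATION_FIELDS:
            config[native] = value
    return config
```

For example, `build_generation_config({"top_p": 0.9, "maxOutputTokens": 256})` yields a dict with `topP` and `maxOutputTokens`, and any key outside the list is left out of the generation config.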

## Dropped keys

Adapters must drop keys that are meaningful to another adapter or to
llm-connect itself but invalid for the target provider. The current shared drop
set includes:

`reasoning_effort`, `max_depth`, `claude_cli_path`, and raw `json_schema` after
translation.

Unknown keys are ignored by default. This keeps activity-specific configs from
causing provider HTTP 400 errors when a caller switches providers.
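
Taken together, the pass-through, drop, and ignore rules for an OpenAI-compatible adapter amount to an allowlist filter. The sketch below uses hypothetical names and returns the dropped and ignored keys as well, purely so the three outcomes are visible; it does not perform the `json_schema` translation, which would happen before this step.

```python
from typing import Any

OPENAI_PASS_THROUGH = frozenset({
    "top_p", "n", "stream", "stop", "presence_penalty", "frequency_penalty",
    "logit_bias", "user", "seed", "tools", "tool_choice", "response_format",
    "logprobs", "top_logprobs", "parallel_tool_calls",
})

SHARED_DROP = frozenset({"reasoning_effort", "max_depth", "claude_cli_path", "json_schema"})


def partition_openai_params(
    model_params: dict[str, Any],
) -> tuple[dict[str, Any], list[str], list[str]]:
    """Split model_params into (kept payload, dropped keys, ignored unknown keys)."""
    kept: dict[str, Any] = {}
    dropped: list[str] = []
    ignored: list[str] = []
    for key, value in model_params.items():
        if key in OPENAI_PASS_THROUGH:
            kept[key] = value
        elif key in SHARED_DROP:
            dropped.append(key)
        else:
            ignored.append(key)  # unknown keys never reach the provider
    return kept, dropped, ignored
```

Only the first element would be merged into the provider request, so a config written for one provider cannot leak keys that another provider rejects.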

## Diagnostics and replay

Server mode supports opt-in diagnostics for `/execute`:

```bash
LLM_CONNECT_DEBUG=1 python -m llm_connect.server --provider openrouter
curl 'http://127.0.0.1:8080/execute?debug=1' -d '{"prompt":"hi"}'
```

Debug responses include a `debug` field with the redacted provider request, raw
provider response body, and adapter transformations such as `merge_model_params`
or `unwrap_cli_envelope`. Normal responses omit `debug`.

Set `LLM_CONNECT_AUDIT_DIR=/path/to/audit` to write one JSON audit record per
`/execute` call. Audit records include the prompt, config, redacted provider
request, provider response, parsed content, and latency. Re-run parsing without
another provider call with:

```bash
python -m llm_connect.replay /path/to/audit/record.json --json
```
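
To pick a record to replay, it can help to list what the audit directory contains. The sketch below assumes that records are `.json` files stored directly in the audit directory; the file naming and the exact key names inside each record are not specified here, so the helper only loads each file and leaves inspection to the caller. `load_audit_records` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any


def load_audit_records(audit_dir: str) -> list[tuple[Path, dict[str, Any]]]:
    """Load every *.json file in audit_dir, sorted by file name."""
    records: list[tuple[Path, dict[str, Any]]] = []
    for path in sorted(Path(audit_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records.append((path, json.load(handle)))
    return records
```

Printing `sorted(record)` for each loaded record shows the top-level keys actually present, and the corresponding path can then be passed to `llm_connect.replay`.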

## Server concurrency

`llm_connect.server.LLMServer` uses `ThreadingHTTPServer`. Adapter instances
used in server mode must be safe to call concurrently. The bundled HTTP and
subprocess adapters keep per-call state local; custom adapters should avoid
mutating shared instance attributes during `execute_prompt` unless they use
their own locks.
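
A custom adapter that does need shared state can guard it with its own lock, as in this minimal sketch. `CountingAdapter` and the single-argument `execute_prompt` signature are hypothetical stand-ins, not llm-connect's actual adapter interface.

```python
import threading


class CountingAdapter:
    """Hypothetical adapter that counts calls across server threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.calls = 0  # shared instance state, guarded by self._lock

    def execute_prompt(self, prompt: str) -> str:
        # Per-call state stays in local variables, so it needs no lock.
        result = prompt.upper()
        # The shared counter is only touched while holding the lock.
        with self._lock:
            self.calls += 1
        return result
```

Holding the lock only around the counter update keeps the slow part of a real adapter (the provider call) outside the critical section, so concurrent requests are not serialized.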