Implement llm-connect ADHOC diagnostics
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled

2026-06-03 11:56:21 +02:00
parent 79c899b694
commit 24f4c09d42
17 changed files with 1618 additions and 611 deletions


@@ -32,6 +32,9 @@ Maturity states: **Experimental → Beta → Stable → Deprecated**
| `gemini.py` | `GeminiAdapter` — Google Generative Language API | Beta |
| `openrouter.py` | `OpenRouterAdapter` — OpenAI-compatible multi-model routing | Beta |
| `claude_code.py` | `ClaudeCodeAdapter` - `claude --print` subprocess | Beta |
| `_payload.py` | Shared adapter payload translation for `RunConfig.model_params` | Beta |
| `_diagnostics.py` | Opt-in per-call diagnostics capture for server debug and audit modes | Beta |
| `replay.py` | Audit replay parser CLI (`python -m llm_connect.replay`) | Beta |
| `embedding_adapter.py` | `EmbeddingAdapter` ABC | Beta |
| `embedding_openai.py` | `OpenAICompatibleEmbeddingAdapter` | Beta |
| `embedding_cache.py` | `EmbeddingCache` — disk-backed embedding cache | Beta |


@@ -73,15 +73,15 @@ config = RunConfig(
)
```

| Field | Default | Description |
|---|---|---|
| `model_name` | `"gpt-4"` | Model identifier (adapter may override) |
| `temperature` | `0.7` | Sampling temperature |
| `max_tokens` | `2000` | Maximum output tokens |
| `model_params` | `{}` | Portable extras translated by each adapter; see `docs/adapter-model-params.md` |
| `max_depth` | `3` | Max nesting depth for recursive calls |
| `skip_if_exists` | `True` | Skip if identical input hash already processed |
| `timeout_seconds` | `300` | Request timeout |

### `LLMResponse`
@@ -92,8 +92,24 @@ response = adapter.execute_prompt(prompt, config)
print(response.content) # generated text
print(response.model) # model actually used
print(response.usage) # {"prompt_tokens": …, "completion_tokens": …, "total_tokens": …}
print(response.finish_reason) # "stop", "length", etc.
```
## Server diagnostics

Server mode can include a debug envelope without changing normal responses:

```bash
LLM_CONNECT_DEBUG=1 python -m llm_connect.server --provider openrouter
curl 'http://127.0.0.1:8080/execute?debug=1' -d '{"prompt":"hi"}'
```

Set `LLM_CONNECT_AUDIT_DIR=/path/to/audit` to write per-call replay records,
then re-parse any one of them without making another provider call:

```bash
python -m llm_connect.replay /path/to/audit/record.json --json
```

## Writing your own adapter


@@ -0,0 +1,102 @@
# Adapter `model_params` contract

`RunConfig.model_params` is a portability layer, not a blind provider payload
escape hatch. Adapters must translate the shared keys they understand, pass
through only provider-valid keys, and drop provider-specific keys that would
make another provider reject the request.

## Shared structured output

Callers may request structured output with:

```python
RunConfig(
    model_params={
        "json_schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "recommendations": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["summary", "recommendations"],
        }
    }
)
```

Adapters translate that key into the provider's native shape:

| Adapter | Translation |
|---|---|
| OpenAI | `response_format = {"type": "json_schema", "json_schema": ...}` |
| OpenRouter | Same OpenAI-compatible `response_format` wrapper |
| Gemini | `generationConfig.responseMimeType = "application/json"` and `generationConfig.responseSchema = ...` |
| Claude Code CLI | `--json-schema <schema>` plus `--output-format json`, then envelope unwrap |

OpenAI-compatible adapters set `json_schema.strict` to `False` in the
`response_format` wrapper they build. Strict mode requires schemas to meet
provider-specific constraints such as `additionalProperties: false` on object
nodes and complete `required` lists. Callers that need strict behavior can pass
an explicit provider-native `response_format` in `model_params`; when present,
it takes precedence over `json_schema`.
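The two strict-mode constraints named above can be applied mechanically to simple schemas before building a provider-native `response_format`. The sketch below is illustrative only: `tighten_for_strict` and `strict_response_format` are hypothetical helpers, not part of llm-connect, and they handle nested `properties` and array `items` but not combinators such as `anyOf` or `$ref`. Providers may impose further strict-mode rules that this does not check.

```python
import copy
from typing import Any


def tighten_for_strict(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of *schema* with strict-mode object constraints applied."""
    result = copy.deepcopy(schema)  # never mutate the caller's schema
    _tighten(result)
    return result


def _tighten(node: Any) -> None:
    if not isinstance(node, dict):
        return
    if node.get("type") == "object":
        props = node.get("properties", {})
        # Strict mode: every property listed as required, no extra keys allowed.
        node["required"] = list(props)
        node["additionalProperties"] = False
        for child in props.values():
            _tighten(child)
    items = node.get("items")
    if items is not None:
        _tighten(items)


def strict_response_format(schema: dict[str, Any], name: str = "structured_output") -> dict[str, Any]:
    # Same wrapper shape the adapters build by default, but with strict enabled.
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": tighten_for_strict(schema), "strict": True},
    }


schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "recommendations": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary"],
}
model_params = {"response_format": strict_response_format(schema)}
```

Listing every property in `required` makes previously optional fields mandatory, so a schema that relied on optional keys may need rethinking before it is sent in strict mode.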
## Pass-through keys

OpenAI and OpenRouter pass through known Chat Completions fields:
`top_p`, `n`, `stream`, `stop`, `presence_penalty`, `frequency_penalty`,
`logit_bias`, `user`, `seed`, `tools`, `tool_choice`, `response_format`,
`logprobs`, `top_logprobs`, and `parallel_tool_calls`.

Gemini passes through valid `generateContent` top-level fields:
`safetySettings`, `tools`, `toolConfig`, `systemInstruction`, and
`cachedContent`.

Gemini also accepts the following fields, under their native camelCase names
or via snake-case aliases, and places them inside `generationConfig`:
`candidateCount`, `candidate_count`, `stopSequences`, `stop_sequences`,
`maxOutputTokens`, `max_output_tokens`, `temperature`, `topP`, `top_p`, `topK`,
`top_k`, `responseMimeType`, `response_mime_type`, `responseSchema`, and
`response_schema`.
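The Gemini rules above amount to a routing decision per key. This standalone sketch (the name `to_gemini_payload` is hypothetical; in llm-connect the logic lives in `merge_gemini_model_params`) builds a fresh `generateContent` body: top-level fields go to the payload root, camelCase or snake-case generation settings land in `generationConfig`, `json_schema` becomes JSON mode, and every other key is ignored. It leaves out the explicit `generationConfig` merge that the real function also performs.

```python
from typing import Any

TOP_LEVEL = {"safetySettings", "tools", "toolConfig", "systemInstruction", "cachedContent"}
GEN_CONFIG = {
    "candidateCount", "stopSequences", "maxOutputTokens", "temperature",
    "topP", "topK", "responseMimeType", "responseSchema",
}
ALIASES = {
    "candidate_count": "candidateCount",
    "stop_sequences": "stopSequences",
    "max_output_tokens": "maxOutputTokens",
    "top_p": "topP",
    "top_k": "topK",
    "response_mime_type": "responseMimeType",
    "response_schema": "responseSchema",
}


def to_gemini_payload(contents: list[dict[str, Any]], model_params: dict[str, Any]) -> dict[str, Any]:
    payload: dict[str, Any] = {"contents": contents, "generationConfig": {}}
    gen = payload["generationConfig"]
    schema = model_params.get("json_schema")
    if isinstance(schema, dict):
        # Shared structured-output key mapped onto Gemini's native JSON mode.
        gen["responseMimeType"] = "application/json"
        gen["responseSchema"] = schema
    for key, value in model_params.items():
        if key in TOP_LEVEL:
            payload[key] = value
            continue
        gemini_key = ALIASES.get(key, key)  # snake_case alias -> camelCase field
        if gemini_key in GEN_CONFIG:
            gen[gemini_key] = value
        # Anything else (json_schema, reasoning_effort, unknown keys) is not copied.
    return payload


payload = to_gemini_payload(
    [{"role": "user", "parts": [{"text": "hi"}]}],
    {"top_k": 40, "stop_sequences": ["END"], "safetySettings": [], "reasoning_effort": "high"},
)
```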
## Dropped keys

Adapters must drop keys that are meaningful to another adapter or to
llm-connect itself but invalid for the target provider. The current shared drop
set includes `reasoning_effort`, `max_depth`, `claude_cli_path`, and the raw
`json_schema` key after translation.

Unknown keys are ignored by default. This keeps activity-specific configs from
causing provider HTTP 400 errors when a caller switches providers.
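On the OpenAI-compatible side the same rules reduce to an allow-list. In this sketch, `filter_openai_params` is a hypothetical helper, the two sets are copied from the lists above, and `json_schema` is assumed to have already been translated into `response_format`. It also reports why each skipped key was skipped, which can help when a value silently fails to reach the provider.

```python
from typing import Any

PASSTHROUGH = frozenset(
    {
        "top_p", "n", "stream", "stop", "presence_penalty", "frequency_penalty",
        "logit_bias", "user", "seed", "tools", "tool_choice", "response_format",
        "logprobs", "top_logprobs", "parallel_tool_calls",
    }
)
DROPPED = frozenset({"reasoning_effort", "max_depth", "claude_cli_path", "json_schema"})


def filter_openai_params(model_params: dict[str, Any]) -> tuple[dict[str, Any], dict[str, str]]:
    """Split *model_params* into forwarded fields and skipped keys with a reason."""
    forwarded: dict[str, Any] = {}
    skipped: dict[str, str] = {}
    for key, value in model_params.items():
        if key in PASSTHROUGH:
            forwarded[key] = value
        elif key in DROPPED:
            skipped[key] = "dropped"  # meaningful elsewhere, invalid for this provider
        else:
            skipped[key] = "unknown"  # ignored rather than risking an HTTP 400
    return forwarded, skipped


forwarded, skipped = filter_openai_params({"seed": 7, "reasoning_effort": "high", "topK": 40})
```

An allow-list rather than a deny-list is what keeps unknown keys out of the request body, which matches the goal stated above of avoiding HTTP 400 responses after a provider switch.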
## Diagnostics and replay
Server mode supports opt-in diagnostics for `/execute`:
```bash
LLM_CONNECT_DEBUG=1 python -m llm_connect.server --provider openrouter
curl 'http://127.0.0.1:8080/execute?debug=1' -d '{"prompt":"hi"}'
```
Debug responses include a `debug` field with the redacted provider request, raw
provider response body, and adapter transformations such as `merge_model_params`
or `unwrap_cli_envelope`. Normal responses omit `debug`.

Set `LLM_CONNECT_AUDIT_DIR=/path/to/audit` to write one JSON audit record per
`/execute` call. Audit records include the prompt, config, redacted provider
request, provider response, parsed content, and latency. To re-run parsing on a
record without making another provider call:
```bash
python -m llm_connect.replay /path/to/audit/record.json --json
```
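The same debug request can be issued from Python using only the standard library. This is a sketch under stated assumptions: the `/execute` path, the `debug=1` query, and the `{"prompt": ...}` body come from the example above, while the helper names, the default base URL, the 60-second timeout, and the expectation that the server replies with a JSON object are illustrative. Unlike the `curl` example, it sets an explicit JSON `Content-Type`.

```python
import json
import urllib.request
from typing import Any


def build_execute_request(
    prompt: str, base_url: str = "http://127.0.0.1:8080", debug: bool = True
) -> urllib.request.Request:
    # Same request as the curl example: POST a JSON body to /execute.
    url = f"{base_url}/execute" + ("?debug=1" if debug else "")
    body = json.dumps({"prompt": prompt}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def execute_with_debug(prompt: str, **kwargs: Any) -> dict[str, Any]:
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(build_execute_request(prompt, **kwargs), timeout=60) as resp:
        return json.loads(resp.read().decode())


req = build_execute_request("hi")
```

Against a running server, `execute_with_debug("hi").get("debug")` would hold the redacted request, raw response body, and transformation list; with `debug=False` the key is expected to be absent.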
## Server concurrency
`llm_connect.server.LLMServer` uses `ThreadingHTTPServer`. Adapter instances
used in server mode must be safe to call concurrently. The bundled HTTP and
subprocess adapters keep per-call state local; custom adapters should avoid
mutating shared instance attributes during `execute_prompt` unless they use
their own locks.
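The concurrency rule can be illustrated without llm-connect. `EchoAdapter` below is a hypothetical stand-in that does not subclass the real `LLMAdapter`: per-call data stays in local variables, and the only shared attribute, a call counter, is updated under a `threading.Lock`. A `ThreadPoolExecutor` stands in for `ThreadingHTTPServer`, which likewise handles each request on its own thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class EchoAdapter:
    """Hypothetical adapter that is safe to share across server threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls = 0  # shared across threads, so only touched under the lock

    def execute_prompt(self, prompt: str) -> str:
        # Per-call state lives in locals, never in instance attributes.
        content = prompt.upper()
        with self._lock:
            self._calls += 1
        return content

    @property
    def calls(self) -> int:
        with self._lock:
            return self._calls


adapter = EchoAdapter()
prompts = [f"prompt {i}" for i in range(200)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(adapter.execute_prompt, prompts))
```

Writing something like `self._last_prompt = prompt` inside `execute_prompt` is the pattern the section warns against: two overlapping requests could read each other's value.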

llm_connect/_diagnostics.py

@@ -0,0 +1,153 @@
"""Per-call diagnostics capture for server debug and audit modes."""

from __future__ import annotations

import copy
import json
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Any, Iterator, Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

_SECRET_QUERY_KEYS = {"key", "api_key", "apikey", "access_token", "token"}
_SECRET_HEADER_TOKENS = ("authorization", "api-key", "apikey", "token", "secret", "key")


@dataclass
class Diagnostics:
    """Captured provider request/response details for one logical LLM call."""

    provider_request: dict[str, Any] | None = None
    provider_response: dict[str, Any] | None = None
    adapter_transformations: list[dict[str, Any]] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        return {
            "provider_request": self.provider_request,
            "provider_response": self.provider_response,
            "adapter_transformations": self.adapter_transformations,
        }


_CURRENT: ContextVar[Diagnostics | None] = ContextVar(
    "llm_connect_diagnostics",
    default=None,
)


@contextmanager
def capture_diagnostics(enabled: bool = True) -> Iterator[Diagnostics | None]:
    """Capture diagnostics within this context when *enabled* is true."""
    if not enabled:
        yield None
        return
    diagnostics = Diagnostics()
    token = _CURRENT.set(diagnostics)
    try:
        yield diagnostics
    finally:
        _CURRENT.reset(token)


def diagnostics_enabled() -> bool:
    return _CURRENT.get() is not None


def current_diagnostics() -> Diagnostics | None:
    return _CURRENT.get()


def record_provider_request(
    *,
    url: str | None = None,
    payload: Any | None = None,
    headers: Mapping[str, Any] | None = None,
    command: list[str] | None = None,
) -> None:
    diagnostics = _CURRENT.get()
    if diagnostics is None:
        return
    request: dict[str, Any] = {}
    if url is not None:
        request["url"] = redact_url(url)
    if payload is not None:
        request["payload"] = json_safe(payload)
    if headers is not None:
        request["headers_redacted"] = redact_headers(headers)
    if command is not None:
        request["command"] = list(command)
    diagnostics.provider_request = request


def record_provider_response(*, status: int | None = None, body: Any | None = None) -> None:
    diagnostics = _CURRENT.get()
    if diagnostics is None:
        return
    response: dict[str, Any] = {}
    if status is not None:
        response["status"] = status
    if body is not None:
        response["body"] = json_safe(body)
    diagnostics.provider_response = response


def record_adapter_transformation(step: str, before: Any, after: Any) -> None:
    diagnostics = _CURRENT.get()
    if diagnostics is None:
        return
    diagnostics.adapter_transformations.append(
        {
            "step": step,
            "before": json_safe(before),
            "after": json_safe(after),
        }
    )


def json_safe(value: Any) -> Any:
    """Return a detached snapshot of *value* without mutating it.

    Values that survive a JSON round trip are returned as that JSON copy.
    Otherwise the fallback is a deep copy (which may not be JSON-serializable),
    and finally ``repr(value)``.
    """
    try:
        return json.loads(json.dumps(value))
    except (TypeError, ValueError):
        try:
            return copy.deepcopy(value)
        except Exception:
            return repr(value)


def redact_headers(headers: Mapping[str, Any]) -> dict[str, Any]:
    redacted: dict[str, Any] = {}
    for key, value in headers.items():
        lowered = str(key).lower()
        if any(token in lowered for token in _SECRET_HEADER_TOKENS):
            redacted[str(key)] = _redact_header_value(value)
        else:
            redacted[str(key)] = json_safe(value)
    return redacted


def redact_url(url: str) -> str:
    parts = urlsplit(url)
    query = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in _SECRET_QUERY_KEYS:
            query.append((key, "<redacted>"))
        else:
            query.append((key, value))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


def _redact_header_value(value: Any) -> str:
    text = str(value)
    if " " in text:
        scheme = text.split(" ", 1)[0]
        return f"{scheme} <redacted>"
    return "<redacted>"


@@ -1,86 +1,101 @@
"""
Thin synchronous HTTP helper built on :mod:`urllib.request`.
Translates HTTP errors into typed :mod:`llm_connect.exceptions`.
"""

import json
import urllib.error
import urllib.request
from typing import Any, Dict, Optional

from llm_connect._diagnostics import record_provider_request, record_provider_response
from llm_connect.exceptions import (
    LLMAPIError,
    LLMRateLimitError,
    LLMTimeoutError,
)


def post_json(
    url: str,
    payload: Dict[str, Any],
    headers: Optional[Dict[str, str]] = None,
    timeout: int = 300,
) -> Dict[str, Any]:
    """POST *payload* as JSON and return the parsed response body.

    Raises:
        LLMRateLimitError: on HTTP 429
        LLMAPIError: on other non-2xx responses
        LLMTimeoutError: on socket / read timeout
    """
    record_provider_request(url=url, payload=payload, headers=headers or {})
    data = json.dumps(payload).encode()
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )

    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read().decode()
            try:
                parsed = json.loads(body)
                record_provider_response(status=resp.status, body=parsed)
                return parsed
            except json.JSONDecodeError as exc:
                record_provider_response(status=resp.status, body=body)
                preview = body[:300].replace("\n", "\\n")
                raise LLMAPIError(
                    f"Invalid JSON response from {url}: {exc} - body preview: {preview!r}",
                    cause=exc,
                ) from exc
    except urllib.error.HTTPError as exc:
        body = ""
        try:
            body = exc.read().decode()
        except Exception:
            pass
        record_provider_response(status=exc.code, body=_json_or_text(body))

        if exc.code == 429:
            raise LLMRateLimitError(
                f"Rate limited (429) from {url}",
                status_code=429,
                response_body=body,
                cause=exc,
            ) from exc

        raise LLMAPIError(
            f"HTTP {exc.code} from {url}",
            status_code=exc.code,
            response_body=body,
            cause=exc,
        ) from exc
    except urllib.error.URLError as exc:
        record_provider_response(body={"error": str(exc.reason)})
        if "timed out" in str(exc.reason):
            raise LLMTimeoutError(
                f"Request to {url} timed out after {timeout}s",
                cause=exc,
            ) from exc
        raise LLMAPIError(
            f"URL error for {url}: {exc.reason}",
            cause=exc,
        ) from exc
    except TimeoutError as exc:
        record_provider_response(body={"error": "timeout"})
        raise LLMTimeoutError(
            f"Request to {url} timed out after {timeout}s",
            cause=exc,
        ) from exc


def _json_or_text(body: str) -> Any:
    try:
        return json.loads(body)
    except (TypeError, ValueError):
        return body
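`post_json` maps transport failures onto three exception types. That decision table can be restated as a pure function for testing; `classify_failure` and its string return values are hypothetical and only mirror the `except` branches above. `HTTPError` is a subclass of `URLError`, which is why it has to be checked first, matching the order of the `except` clauses.

```python
import urllib.error
from typing import Optional


def classify_failure(exc: BaseException) -> Optional[str]:
    """Name the llm-connect exception post_json would raise for *exc*."""
    # HTTPError subclasses URLError, so it has to be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        return "LLMRateLimitError" if exc.code == 429 else "LLMAPIError"
    if isinstance(exc, urllib.error.URLError):
        return "LLMTimeoutError" if "timed out" in str(exc.reason) else "LLMAPIError"
    if isinstance(exc, TimeoutError):
        return "LLMTimeoutError"
    return None  # anything else is not caught by post_json


rate_limited = urllib.error.HTTPError("https://example.invalid", 429, "Too Many Requests", None, None)
```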

llm_connect/_payload.py

@@ -0,0 +1,154 @@
"""Provider payload helpers for translating ``RunConfig.model_params``."""

from __future__ import annotations

import json
from typing import Any

from llm_connect._diagnostics import (
    diagnostics_enabled,
    json_safe,
    record_adapter_transformation,
)

# OpenAI Chat Completions fields that map straight through from model_params.
# Anything not in this set is provider-specific and must be either translated
# or dropped. Blind merges are deliberately avoided because OpenAI-compatible
# providers commonly reject unknown top-level fields with HTTP 400.
OPENAI_CHAT_PASSTHROUGH_FIELDS = frozenset(
    {
        "top_p",
        "n",
        "stream",
        "stop",
        "presence_penalty",
        "frequency_penalty",
        "logit_bias",
        "user",
        "seed",
        "tools",
        "tool_choice",
        "response_format",
        "logprobs",
        "top_logprobs",
        "parallel_tool_calls",
    }
)

DROPPED_NON_OPENAI_FIELDS = frozenset(
    {
        "reasoning_effort",
        "max_depth",
        "claude_cli_path",
        "json_schema",
    }
)

GEMINI_TOP_LEVEL_FIELDS = frozenset(
    {
        "safetySettings",
        "tools",
        "toolConfig",
        "systemInstruction",
        "cachedContent",
    }
)

GEMINI_GENERATION_CONFIG_FIELDS = frozenset(
    {
        "candidateCount",
        "stopSequences",
        "maxOutputTokens",
        "temperature",
        "topP",
        "topK",
        "responseMimeType",
        "responseSchema",
    }
)

GEMINI_GENERATION_CONFIG_ALIASES = {
    "candidate_count": "candidateCount",
    "stop_sequences": "stopSequences",
    "max_output_tokens": "maxOutputTokens",
    "top_p": "topP",
    "top_k": "topK",
    "response_mime_type": "responseMimeType",
    "response_schema": "responseSchema",
}


def merge_openai_chat_model_params(payload: dict[str, Any], model_params: dict[str, Any]) -> None:
    """Merge model_params into an OpenAI Chat Completions-style payload.

    Translates ``json_schema`` to ``response_format``, passes known OpenAI
    fields through, and drops Claude/llm-connect-only knobs.
    """
    before = json_safe(payload) if diagnostics_enabled() else None
    schema = _coerce_json_schema(model_params.get("json_schema"))
    caller_response_format = model_params.get("response_format")
    if schema is not None and caller_response_format is None and "response_format" not in payload:
        payload["response_format"] = {
            "type": "json_schema",
            "json_schema": {
                "name": "structured_output",
                "schema": schema,
                "strict": False,
            },
        }
    for key, value in model_params.items():
        if key in DROPPED_NON_OPENAI_FIELDS:
            continue
        if key in OPENAI_CHAT_PASSTHROUGH_FIELDS:
            payload[key] = value
    if before is not None:
        record_adapter_transformation("merge_model_params.openai_chat", before, payload)


def merge_gemini_model_params(payload: dict[str, Any], model_params: dict[str, Any]) -> None:
    """Merge model_params into a Gemini ``generateContent`` payload."""
    before = json_safe(payload) if diagnostics_enabled() else None
    generation_config = payload.setdefault("generationConfig", {})
    schema = _coerce_json_schema(model_params.get("json_schema"))
    if schema is not None and "responseSchema" not in generation_config:
        generation_config["responseMimeType"] = "application/json"
        generation_config["responseSchema"] = schema
    explicit_generation_config = model_params.get("generationConfig")
    if isinstance(explicit_generation_config, dict):
        generation_config.update(explicit_generation_config)
    for key, value in model_params.items():
        if key in {"json_schema", "generationConfig", "reasoning_effort", "max_depth"}:
            continue
        if key in GEMINI_TOP_LEVEL_FIELDS:
            payload[key] = value
            continue
        gemini_key = GEMINI_GENERATION_CONFIG_ALIASES.get(key, key)
        if gemini_key in GEMINI_GENERATION_CONFIG_FIELDS:
            generation_config[gemini_key] = value
    if before is not None:
        record_adapter_transformation("merge_model_params.gemini", before, payload)


def _coerce_json_schema(schema: Any) -> dict[str, Any] | None:
    if isinstance(schema, str):
        try:
            schema = json.loads(schema)
        except (TypeError, ValueError):
            return None
    if isinstance(schema, dict):
        return schema
    return None


@@ -1,277 +1,289 @@
"""
Claude Code CLI adapter - runs the ``claude`` CLI as a subprocess.
"""

import asyncio
import json
import os
import subprocess
from pathlib import Path
from typing import Optional

from llm_connect._diagnostics import (
    record_adapter_transformation,
    record_provider_request,
    record_provider_response,
)
from llm_connect._token_estimator import estimate_tokens
from llm_connect.adapter import LLMAdapter
from llm_connect.config import LLMConfig
from llm_connect.exceptions import LLMSubprocessError, LLMTimeoutError
from llm_connect.models import LLMResponse, RunConfig


class ClaudeCodeAdapter(LLMAdapter):
    """LLM adapter that shells out to the ``claude`` CLI with ``--print``.

    The compiled prompt is piped via stdin to avoid shell argument length
    limits. Compiled prompts can exceed 30 KB.
    """

    def __init__(
        self,
        cli_path: Optional[str] = None,
        model: Optional[str] = None,
        config: Optional[LLMConfig] = None,
    ):
        self._config = config or LLMConfig(provider="claude-code")
        self._cli_path = cli_path or self._resolve_cli_path()
        self._model = model

    # LLMAdapter interface

    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
        self._preflight_budget(config)
        cmd = self._build_command(config)
        timeout = config.timeout_seconds or self._config.timeout_seconds
        record_provider_request(command=cmd, payload={"stdin": prompt})

        try:
            result = subprocess.run(
                cmd,
                input=prompt,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired as exc:
            raise LLMTimeoutError(
                f"claude CLI timed out after {timeout}s",
                cause=exc,
            ) from exc

        record_provider_response(
            status=result.returncode,
            body={"stdout": result.stdout, "stderr": result.stderr},
        )
        if result.returncode != 0:
            raise LLMSubprocessError(
                f"claude CLI exited with code {result.returncode}",
                return_code=result.returncode,
                stderr=result.stderr,
            )

        content = _unwrap_cli_json_envelope(result.stdout, config)
        prompt_tokens = estimate_tokens(prompt)
        completion_tokens = estimate_tokens(content)

        response = LLMResponse(
            content=content,
            model=self._model or "claude-code-cli",
            usage={
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            },
            finish_reason="stop",
            metadata={
                "provider": "claude-code",
                "cli_path": self._cli_path,
            },
        )
        self._consume_budget(config, response)
        return response

    async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
        """Native async implementation using asyncio.create_subprocess_exec."""
        self._preflight_budget(config)
        cmd = self._build_command(config)

        timeout = config.timeout_seconds or self._config.timeout_seconds
        record_provider_request(command=cmd, payload={"stdin": prompt})

        try:
            proc = await asyncio.create_subprocess_exec(
                *cmd,
                stdin=asyncio.subprocess.PIPE,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
            stdout_bytes, stderr_bytes = await asyncio.wait_for(
                proc.communicate(input=prompt.encode()),
                timeout=timeout,
            )
        except asyncio.TimeoutError as exc:
            raise LLMTimeoutError(
                f"claude CLI timed out after {timeout}s",
                cause=exc,
            ) from exc

        stdout = stdout_bytes.decode()
        stderr = stderr_bytes.decode()
        record_provider_response(
            status=proc.returncode,
            body={"stdout": stdout, "stderr": stderr},
        )
        if proc.returncode != 0:
            raise LLMSubprocessError(
                f"claude CLI exited with code {proc.returncode}",
                return_code=proc.returncode,
                stderr=stderr,
            )

        content = _unwrap_cli_json_envelope(stdout, config)
        prompt_tokens = estimate_tokens(prompt)
        completion_tokens = estimate_tokens(content)

        response = LLMResponse(
            content=content,
            model=self._model or "claude-code-cli",
            usage={
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            },
            finish_reason="stop",
            metadata={
                "provider": "claude-code",
                "cli_path": self._cli_path,
                "async": True,
            },
        )
        self._consume_budget(config, response)
        return response

    def validate_config(self, config: RunConfig) -> bool:
        try:
            result = subprocess.run(
                [self._cli_path, "--version"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            return result.returncode == 0
        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
            return False

    def _build_command(self, config: RunConfig) -> list[str]:
        cmd = [self._cli_path, "--print"]
        if self._model:
            cmd.extend(["--model", self._model])

        json_schema = _json_schema_arg(config)
        if json_schema:
            cmd.extend(["--json-schema", json_schema])
            # With --json-schema alone the CLI prints conversational text on
            # stdout while the structured payload ships on a sidecar channel
            # callers cannot reach. --output-format json forces the structured
            # response (wrapped in an envelope) onto stdout.
            cmd.extend(["--output-format", "json"])
        return cmd

    def _resolve_cli_path(self) -> str:
        configured = (
            os.environ.get("LLM_CONNECT_CLAUDE_CLI_PATH")
            or os.environ.get("CLAUDE_CLI_PATH")
            or self._config.claude_cli_path
        )
        if configured and configured != "claude":
            return configured

        local_cli = Path.home() / ".local" / "bin" / "claude"
        if local_cli.exists():
            return str(local_cli)
        return configured or "claude"


def _json_schema_arg(config: RunConfig) -> str | None:
    schema = (config.model_params or {}).get("json_schema")
    if not schema:
        return None
    if isinstance(schema, str):
        return schema
    if isinstance(schema, dict):
        return json.dumps(schema, separators=(",", ":"))
    return None


# Envelope field names Claude Code's --output-format json is known to use for
# the model's primary textual response. Used as a fallback when no field carries
# a JSON-parseable payload, such as plain prose generation.
_ENVELOPE_TEXT_FIELDS = ("result", "result_text", "content", "text", "output")


def _unwrap_cli_json_envelope(stdout: str, config: RunConfig) -> str:
    """Extract the model's payload from Claude CLI's --output-format json envelope.

    Only runs when --json-schema was set. Other callers keep the raw stdout
    behavior unchanged.
    """
    if not _json_schema_arg(config):
        return stdout
    text = stdout.strip()
    if not text:
        return stdout
    try:
        envelope = json.loads(text)
    except json.JSONDecodeError:
        return stdout
    if not isinstance(envelope, dict):
        return stdout

    json_payload = _find_json_payload(envelope)
    if json_payload is not None:
        return _record_unwrap(stdout, json_payload)

    for key in _ENVELOPE_TEXT_FIELDS:
        value = envelope.get(key)
        if isinstance(value, str):
            return _record_unwrap(stdout, value)
        if isinstance(value, (dict, list)):
            return _record_unwrap(stdout, json.dumps(value))

    return stdout
def _find_json_payload(envelope: dict) -> str | None:
"""Return the first envelope value that represents valid JSON."""
def _find_json_payload(envelope: dict) -> str | None: for key, value in envelope.items():
"""Return the first envelope value that represents valid JSON. if key in _ENVELOPE_METADATA_KEYS:
continue
Insertion order is preserved by Python dicts, so this prefers fields the if isinstance(value, (dict, list)):
CLI lists earliest in its envelope. Skips obvious metadata keys (cost, return json.dumps(value)
usage, timing) so we never accidentally pick a numeric or telemetry value. if isinstance(value, str):
""" stripped = value.strip()
for key, value in envelope.items(): if stripped.startswith(("{", "[")):
if key in _ENVELOPE_METADATA_KEYS: try:
continue json.loads(stripped)
if isinstance(value, (dict, list)): except json.JSONDecodeError:
return json.dumps(value) continue
if isinstance(value, str): return stripped
stripped = value.strip() return None
if stripped.startswith(("{", "[")):
try:
json.loads(stripped) # Envelope keys that carry telemetry, never the model payload.
except json.JSONDecodeError: _ENVELOPE_METADATA_KEYS = frozenset(
continue {
return stripped "type",
return None "subtype",
"model",
"usage",
# Envelope keys that carry telemetry, never the model payload. "total_cost_usd",
_ENVELOPE_METADATA_KEYS = frozenset({ "cost_usd",
"type", "subtype", "model", "usage", "total_cost_usd", "cost_usd", "duration_ms",
"duration_ms", "duration_api_ms", "num_turns", "session_id", "duration_api_ms",
"is_error", "stop_reason", "permission_denials", "uuid", "num_turns",
}) "session_id",
"is_error",
"stop_reason",
"permission_denials",
"uuid",
}
)
def _record_unwrap(stdout: str, content: str) -> str:
if content != stdout:
record_adapter_transformation("unwrap_cli_envelope", stdout, content)
return content
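The envelope handling in this hunk checks fields in a fixed order:
1. Any non-metadata field that holds valid JSON.
2. Failing that, the known text fields.
3. Failing both, stdout unchanged.

The standalone sketch below reproduces that order for illustration only. `unwrap_envelope`, `find_json_payload`, `METADATA_KEYS` and `TEXT_FIELDS` are hypothetical names with trimmed key sets. The sketch also leaves out the `_json_schema_arg` gate and the `_record_unwrap` diagnostics hook.

```python
import json
from typing import Optional

# Trimmed, illustrative copies of the adapter's key sets (not the real constants).
METADATA_KEYS = frozenset({"type", "subtype", "model", "usage", "session_id", "is_error"})
TEXT_FIELDS = ("result", "result_text", "content", "text", "output")


def find_json_payload(envelope: dict) -> Optional[str]:
    """Return the first non-metadata value that is, or encodes, valid JSON."""
    for key, value in envelope.items():  # dicts keep the CLI's field order
        if key in METADATA_KEYS:
            continue
        if isinstance(value, (dict, list)):
            return json.dumps(value)
        if isinstance(value, str):
            stripped = value.strip()
            if stripped.startswith(("{", "[")):
                try:
                    json.loads(stripped)
                except json.JSONDecodeError:
                    continue
                return stripped
    return None


def unwrap_envelope(stdout: str) -> str:
    """Extract the model payload from a CLI JSON envelope, else return stdout."""
    try:
        envelope = json.loads(stdout.strip())
    except json.JSONDecodeError:
        return stdout
    if not isinstance(envelope, dict):
        return stdout
    payload = find_json_payload(envelope)
    if payload is not None:
        return payload
    for key in TEXT_FIELDS:  # prose fallback when no field carries JSON
        value = envelope.get(key)
        if isinstance(value, str):
            return value
    return stdout


# Prose preamble in "result", schema-enforced JSON in a separate field.
raw = json.dumps({"type": "result", "result": "Here you go:",
                  "structured_result": '{"summary": "ok"}'})
print(unwrap_envelope(raw))  # {"summary": "ok"}
```

Because Python dicts preserve insertion order, the first JSON-bearing field the CLI emits wins. The old `_find_json_payload` docstring relies on exactly this property.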


@@ -9,6 +9,7 @@ from llm_connect.adapter import LLMAdapter
 from llm_connect.models import RunConfig, LLMResponse
 from llm_connect.config import resolve_api_key, find_project_root
 from llm_connect._http import post_json
+from llm_connect._payload import merge_gemini_model_params
 from llm_connect.exceptions import LLMConfigurationError

 _DEFAULT_MODEL = "gemini-2.5-flash"
@@ -74,6 +75,8 @@ class GeminiAdapter(LLMAdapter):
                 "maxOutputTokens": config.max_tokens,
             },
         }
+        if config.model_params:
+            merge_gemini_model_params(payload, config.model_params)

         url = f"{_API_BASE}/models/{model}:generateContent?key={self._api_key}"
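The tests in `tests/test_payload.py` pin down what `merge_gemini_model_params` must do with the activity-core parameters:
- `json_schema` becomes `generationConfig.responseMimeType` plus `responseSchema`.
- `top_p` becomes `topP`.
- Keys such as `reasoning_effort` and `max_depth` never reach the request.

Here is a minimal sketch under those constraints. `merge_gemini_params` is a hypothetical helper, not the `_payload.py` function. Its key map is illustrative: only `top_p` to `topP` is confirmed by the tests, and the `top_k` to `topK` entry is an assumption.

```python
import json
from typing import Any, Dict

# Illustrative snake_case -> generationConfig map. Only top_p -> topP is
# confirmed by tests/test_payload.py; top_k -> topK is an assumption.
GEMINI_FIELD_MAP = {"top_p": "topP", "top_k": "topK"}


def merge_gemini_params(payload: Dict[str, Any], model_params: Dict[str, Any]) -> None:
    """Fold model_params into payload["generationConfig"]; drop everything else."""
    generation_config = payload.setdefault("generationConfig", {})
    schema = model_params.get("json_schema")
    if isinstance(schema, str):  # also accept a JSON-encoded schema
        try:
            schema = json.loads(schema)
        except ValueError:
            schema = None
    if isinstance(schema, dict):
        generation_config["responseMimeType"] = "application/json"
        generation_config["responseSchema"] = schema
    for key, value in model_params.items():
        target = GEMINI_FIELD_MAP.get(key)
        if target is not None:
            generation_config[target] = value
    # reasoning_effort, max_depth, json_schema and unknown keys are never copied.


payload = {"contents": [], "generationConfig": {"temperature": 0.2, "maxOutputTokens": 200}}
merge_gemini_params(payload, {"reasoning_effort": "medium", "max_depth": 4,
                              "json_schema": {"type": "object"}, "top_p": 0.8})
print(payload["generationConfig"])
```

One general Gemini API caveat applies here, not something this commit addresses. `responseSchema` accepts an OpenAPI-style subset of JSON Schema, so some keywords in application schemas may still be rejected.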


@@ -9,6 +9,7 @@ from llm_connect.adapter import LLMAdapter
 from llm_connect.models import RunConfig, LLMResponse
 from llm_connect.config import resolve_api_key, find_project_root
 from llm_connect._http import post_json
+from llm_connect._payload import merge_openai_chat_model_params
 from llm_connect.exceptions import (
     LLMConfigurationError,
     LLMAPIError,
@@ -65,6 +66,8 @@ class OpenAIAdapter(LLMAdapter):
             "temperature": config.temperature,
             "max_tokens": config.max_tokens,
         }
+        if config.model_params:
+            merge_openai_chat_model_params(payload, config.model_params)

         headers = {
             "Authorization": f"Bearer {self._api_key}",

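Two sources describe the OpenAI Chat Completions translation: the `_merge_model_params` helper removed from the OpenRouter adapter, and the new tests in `tests/test_payload.py`. The rules are:
- Whitelisted keys pass through.
- `json_schema` is wrapped into a non-strict `response_format`.
- An explicit `response_format` wins over the translation.
- Everything else is dropped, so the provider never sees an unknown top-level field.

Here is a minimal sketch of that policy. `merge_openai_chat_params` and the trimmed `PASSTHROUGH` set are hypothetical; this is not the shared `_payload.py` helper.

```python
import json
from typing import Any, Dict

# Trimmed, illustrative copy of the OpenAI Chat Completions pass-through set.
PASSTHROUGH = frozenset({
    "top_p", "stop", "seed", "presence_penalty", "frequency_penalty",
    "tools", "tool_choice", "response_format",
})


def merge_openai_chat_params(payload: Dict[str, Any], model_params: Dict[str, Any]) -> None:
    """Translate json_schema, copy whitelisted keys, drop everything else."""
    schema = model_params.get("json_schema")
    if isinstance(schema, str):
        try:
            schema = json.loads(schema)
        except ValueError:
            schema = None
    if isinstance(schema, dict) and "response_format" not in payload:
        payload["response_format"] = {
            "type": "json_schema",
            # Strict mode would also require additionalProperties=false and
            # every property listed in "required", which most schemas lack.
            "json_schema": {"name": "structured_output", "schema": schema, "strict": False},
        }
    for key, value in model_params.items():
        if key in PASSTHROUGH:
            # Runs after the translation, so an explicit response_format wins.
            payload[key] = value
        # Anything else (reasoning_effort, max_depth, json_schema, typos) is dropped.


payload = {"model": "gpt-4.1-mini", "messages": []}
merge_openai_chat_params(payload, {"reasoning_effort": "medium", "max_depth": 4,
                                   "json_schema": {"type": "object"}, "top_p": 0.8})
print(sorted(payload))  # ['messages', 'model', 'response_format', 'top_p']
```

Dropping unknown keys, rather than forwarding them, trades silent loss of a misspelled parameter for never triggering the HTTP 400 that OpenRouter returns on unknown top-level fields.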

@@ -1,221 +1,151 @@
""" """
OpenRouter adapter calls the OpenAI-compatible chat completions API. OpenRouter adapter - calls the OpenAI-compatible chat completions API.
""" """
import time import time
from typing import Optional, Dict, Any from typing import Any, Dict, Optional
from llm_connect.adapter import LLMAdapter from llm_connect._http import post_json
from llm_connect.models import RunConfig, LLMResponse from llm_connect._payload import merge_openai_chat_model_params
from llm_connect.config import LLMConfig, resolve_api_key, find_project_root from llm_connect.adapter import LLMAdapter
from llm_connect._http import post_json from llm_connect.config import LLMConfig, find_project_root, resolve_api_key
from llm_connect.exceptions import ( from llm_connect.exceptions import LLMAPIError, LLMRateLimitError
LLMConfigurationError, from llm_connect.models import LLMResponse, RunConfig
LLMAPIError,
LLMRateLimitError, _DEFAULT_MODEL = "anthropic/claude-sonnet-4"
)
_DEFAULT_MODEL = "anthropic/claude-sonnet-4" class OpenRouterAdapter(LLMAdapter):
"""LLM adapter that calls the OpenRouter chat completions endpoint.
class OpenRouterAdapter(LLMAdapter): Constructor args override values from *config*; *config* overrides
"""LLM adapter that calls the OpenRouter chat completions endpoint. global defaults. The model used for a given call is resolved as:
``constructor model > RunConfig.model_name > default``.
Constructor args override values from *config*; *config* overrides """
global defaults. The model used for a given call is resolved as:
``constructor model > RunConfig.model_name > default``. def __init__(
""" self,
model: Optional[str] = None,
def __init__( api_key: Optional[str] = None,
self, api_base: Optional[str] = None,
model: Optional[str] = None, config: Optional[LLMConfig] = None,
api_key: Optional[str] = None, system_prompt: Optional[str] = None,
api_base: Optional[str] = None, extra_headers: Optional[Dict[str, str]] = None,
config: Optional[LLMConfig] = None, max_retries: Optional[int] = None,
system_prompt: Optional[str] = None, ):
extra_headers: Optional[Dict[str, str]] = None, self._config = config or LLMConfig()
max_retries: Optional[int] = None, # Track whether the model was explicitly supplied (constructor or
): # LLMConfig). Comparing self._model to _DEFAULT_MODEL is not enough:
self._config = config or LLMConfig() # callers who pass --model anthropic/claude-sonnet-4 happen to match
# Track whether the model was explicitly supplied (constructor or # the default and would otherwise be misrouted to RunConfig.model_name
# LLMConfig). Comparing self._model to _DEFAULT_MODEL is not enough — # (which defaults to "gpt-4", quietly sending every call to OpenAI's
# callers who pass --model anthropic/claude-sonnet-4 happen to match # gpt-4 model, which is what broke the activity-core CUST-WP-0045
# the default and would otherwise be misrouted to RunConfig.model_name # canary on 2026-06-02).
# (which defaults to "gpt-4" — quietly sending every call to OpenAI's self._explicit_model = model is not None or self._config.model is not None
# gpt-4 model, which is what broke the activity-core CUST-WP-0045 self._model = model or self._config.model or _DEFAULT_MODEL
# canary on 2026-06-02). self._api_base = (api_base or self._config.api_base).rstrip("/")
self._explicit_model = model is not None or self._config.model is not None self._system_prompt = system_prompt
self._model = model or self._config.model or _DEFAULT_MODEL self._extra_headers = extra_headers or {}
self._api_base = (api_base or self._config.api_base).rstrip("/") self._max_retries = max_retries if max_retries is not None else self._config.max_retries
self._system_prompt = system_prompt
self._extra_headers = extra_headers or {} root = find_project_root()
self._max_retries = max_retries if max_retries is not None else self._config.max_retries key_file_paths = [root / "apikey-openrouter.txt"] if root else []
self._api_key = resolve_api_key(
# Resolve API key explicit=api_key or self._config.api_key,
root = find_project_root() env_var="OPENROUTER_API_KEY",
key_file_paths = [root / "apikey-openrouter.txt"] if root else [] key_file_paths=key_file_paths,
self._api_key = resolve_api_key( )
explicit=api_key or self._config.api_key,
env_var="OPENROUTER_API_KEY", # LLMAdapter interface
key_file_paths=key_file_paths,
) def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
self._preflight_budget(config)
# ── LLMAdapter interface ──────────────────────────────────────── # Explicit constructor/LLMConfig model wins; only fall back to the
# per-call RunConfig.model_name when the adapter was not told what to
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: # use. RunConfig.model_name defaults to "gpt-4", so falling back
self._preflight_budget(config) # unconditionally would silently misroute callers.
# Explicit constructor/LLMConfig model wins; only fall back to the if self._explicit_model:
# per-call RunConfig.model_name when the adapter wasn't told what to model = self._model
# use. RunConfig.model_name defaults to "gpt-4", so falling back else:
# unconditionally would silently misroute callers. model = config.model_name or self._model
if self._explicit_model:
model = self._model messages: list[Dict[str, str]] = []
else: if self._system_prompt:
model = config.model_name or self._model messages.append({"role": "system", "content": self._system_prompt})
messages.append({"role": "user", "content": prompt})
messages: list[Dict[str, str]] = []
if self._system_prompt: payload: Dict[str, Any] = {
messages.append({"role": "system", "content": self._system_prompt}) "model": model,
messages.append({"role": "user", "content": prompt}) "messages": messages,
"temperature": config.temperature,
payload: Dict[str, Any] = { "max_tokens": config.max_tokens,
"model": model, }
"messages": messages, if config.model_params:
"temperature": config.temperature, merge_openai_chat_model_params(payload, config.model_params)
"max_tokens": config.max_tokens,
} headers = {
if config.model_params: "Authorization": f"Bearer {self._api_key}",
_merge_model_params(payload, config.model_params) **self._extra_headers,
}
headers = { url = f"{self._api_base}/chat/completions"
"Authorization": f"Bearer {self._api_key}",
**self._extra_headers, start = time.time()
} data = self._post_with_retries(url, payload, headers, config.timeout_seconds)
url = f"{self._api_base}/chat/completions" latency = time.time() - start
start = time.time() choice = data.get("choices", [{}])[0]
data = self._post_with_retries(url, payload, headers, config.timeout_seconds) content = choice.get("message", {}).get("content", "")
latency = time.time() - start finish_reason = choice.get("finish_reason", "stop")
usage = data.get("usage", {})
# Parse response
choice = data.get("choices", [{}])[0] response = LLMResponse(
content = choice.get("message", {}).get("content", "") content=content,
finish_reason = choice.get("finish_reason", "stop") model=data.get("model", model),
usage = data.get("usage", {}) usage={
"prompt_tokens": usage.get("prompt_tokens", 0),
response = LLMResponse( "completion_tokens": usage.get("completion_tokens", 0),
content=content, "total_tokens": usage.get("total_tokens", 0),
model=data.get("model", model), },
usage={ finish_reason=finish_reason,
"prompt_tokens": usage.get("prompt_tokens", 0), metadata={
"completion_tokens": usage.get("completion_tokens", 0), "provider": "openrouter",
"total_tokens": usage.get("total_tokens", 0), "latency_seconds": round(latency, 3),
}, "response_id": data.get("id", ""),
finish_reason=finish_reason, },
metadata={ )
"provider": "openrouter", self._consume_budget(config, response)
"latency_seconds": round(latency, 3), return response
"response_id": data.get("id", ""),
}, def validate_config(self, config: RunConfig) -> bool:
) if not self._api_key:
self._consume_budget(config, response) return False
return response if not (self._model or config.model_name):
return False
def validate_config(self, config: RunConfig) -> bool: if not (0.0 <= config.temperature <= 2.0):
if not self._api_key: return False
return False return True
if not (self._model or config.model_name):
return False # Internals
if not (0.0 <= config.temperature <= 2.0):
return False def _post_with_retries(
return True self,
url: str,
# ── Internals ─────────────────────────────────────────────────── payload: Dict[str, Any],
headers: Dict[str, str],
def _post_with_retries( timeout: int,
self, ) -> Dict[str, Any]:
url: str, last_exc: Optional[Exception] = None
payload: Dict[str, Any], for attempt in range(self._max_retries + 1):
headers: Dict[str, str], try:
timeout: int, return post_json(url, payload, headers, timeout=timeout)
) -> Dict[str, Any]: except LLMRateLimitError as exc:
last_exc: Optional[Exception] = None last_exc = exc
for attempt in range(self._max_retries + 1): if attempt < self._max_retries:
try: time.sleep(2 ** attempt)
return post_json(url, payload, headers, timeout=timeout) except LLMAPIError as exc:
except LLMRateLimitError as exc: if exc.status_code >= 500 and attempt < self._max_retries:
last_exc = exc last_exc = exc
if attempt < self._max_retries: time.sleep(2 ** attempt)
time.sleep(2 ** attempt) else:
except LLMAPIError as exc: raise
if exc.status_code >= 500 and attempt < self._max_retries: raise last_exc # type: ignore[misc]
last_exc = exc
time.sleep(2 ** attempt)
else:
raise
raise last_exc # type: ignore[misc]
# OpenAI Chat Completions fields that map straight through from model_params.
# Anything not in this set is provider-specific and must be either translated
# or dropped — we never blind-merge into the payload, because OpenRouter
# rejects unknown top-level fields with HTTP 400.
_OPENAI_PASSTHROUGH_FIELDS = frozenset({
"top_p", "n", "stream", "stop", "presence_penalty",
"frequency_penalty", "logit_bias", "user", "seed",
"tools", "tool_choice", "response_format",
"logprobs", "top_logprobs", "parallel_tool_calls",
})
# Provider-specific model_params keys that have no OpenAI Chat Completions
# equivalent and must be silently dropped to keep payloads valid.
_DROPPED_NON_OPENAI_FIELDS = frozenset({
"reasoning_effort", # Claude CLI / Anthropic-specific
"max_depth", # llm-connect's own depth knob
"claude_cli_path", # adapter wiring leak
"json_schema", # translated below into response_format
})
def _merge_model_params(payload: Dict[str, Any], model_params: Dict[str, Any]) -> None:
"""Merge RunConfig.model_params into an OpenAI Chat Completions payload.
Pass-through whitelisted OpenAI keys, translate json_schema into the
proper response_format wrapper, drop known provider-specific fields,
and ignore anything else rather than letting it through and triggering
a 400 from OpenRouter (the failure mode that hit CUST-WP-0045 on
2026-06-02 — reasoning_effort and a top-level json_schema were merged
into the body and the API rejected both).
"""
schema = model_params.get("json_schema")
if schema is not None and "response_format" not in payload:
if isinstance(schema, str):
try:
import json as _json
schema = _json.loads(schema)
except (ValueError, TypeError):
schema = None
if isinstance(schema, dict):
# strict=False: OpenAI's strict mode requires additionalProperties
# to be false on every object and every property in the required
# list. Most application-supplied schemas are not written that
# way (the activity-core daily-triage schema, for example, has
# neither). With strict=False, OpenRouter still honours the
# schema as a soft constraint and the model's output remains
# structured. Callers can opt back into strict by including
# `strict: true` themselves in a custom `response_format`.
payload["response_format"] = {
"type": "json_schema",
"json_schema": {
"name": "structured_output",
"schema": schema,
"strict": False,
},
}
for key, value in model_params.items():
if key in _DROPPED_NON_OPENAI_FIELDS:
continue
if key in _OPENAI_PASSTHROUGH_FIELDS:
payload[key] = value
# else: silently drop unknown keys rather than risk a 400.
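The comments in this OpenRouter diff describe the fix for the CUST-WP-0045 misrouting. The adapter now records whether a model was supplied at all, instead of comparing the resolved model against `_DEFAULT_MODEL`.

Here is a minimal sketch contrasting the two rules. `ModelResolver`, `naive_resolve` and `RunConfigStub` are hypothetical stand-ins. `RunConfigStub.model_name` defaults to `"gpt-4"`, as `RunConfig.model_name` does.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MODEL = "anthropic/claude-sonnet-4"


@dataclass
class RunConfigStub:
    """Hypothetical stand-in for RunConfig; only model_name matters here."""
    model_name: Optional[str] = "gpt-4"  # RunConfig's documented default


class ModelResolver:
    def __init__(self, model: Optional[str] = None, config_model: Optional[str] = None):
        # Remember *whether* a model was supplied. Comparing the resolved value
        # with DEFAULT_MODEL cannot tell "asked for the default" from "said nothing".
        self.explicit = model is not None or config_model is not None
        self.model = model or config_model or DEFAULT_MODEL

    def resolve(self, run_config: RunConfigStub) -> str:
        if self.explicit:
            return self.model
        return run_config.model_name or self.model


def naive_resolve(adapter_model: str, run_config: RunConfigStub) -> str:
    # Buggy variant: treats "equals the default" as "not configured".
    if adapter_model != DEFAULT_MODEL:
        return adapter_model
    return run_config.model_name or adapter_model


print(ModelResolver(model=DEFAULT_MODEL).resolve(RunConfigStub()))  # anthropic/claude-sonnet-4
print(naive_resolve(DEFAULT_MODEL, RunConfigStub()))  # gpt-4 (the misroute)
```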

llm_connect/replay.py (new file, 121 lines)

@@ -0,0 +1,121 @@
+"""Replay llm-connect audit records without making provider calls."""
+
+from __future__ import annotations
+
+import argparse
+import json
+from pathlib import Path
+from typing import Any
+
+from llm_connect.claude_code import _unwrap_cli_json_envelope
+from llm_connect.models import RunConfig
+
+
+def parse_audit_record(record: dict[str, Any]) -> dict[str, Any]:
+    """Parse the recorded provider response and compare it to saved content."""
+    config = RunConfig.from_dict(record.get("config", {}))
+    provider = record.get("provider") or _infer_provider(record)
+    provider_response = record.get("provider_response") or {}
+    body = provider_response.get("body")
+    parsed_content = _parse_provider_response(provider, body, config)
+    recorded_content = record.get("parsed_content")
+    schema_check = _check_structured_output(parsed_content, config.model_params.get("json_schema"))
+    return {
+        "provider": provider,
+        "parsed_content": parsed_content,
+        "matches_recorded_content": parsed_content == recorded_content,
+        "structured_output": schema_check,
+    }
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = argparse.ArgumentParser(
+        prog="python -m llm_connect.replay",
+        description="Replay parsing for a llm-connect audit JSON file.",
+    )
+    parser.add_argument("audit_file", help="Path to an audit JSON file")
+    parser.add_argument("--json", action="store_true", help="Print the full replay report")
+    args = parser.parse_args(argv)
+
+    record = json.loads(Path(args.audit_file).read_text(encoding="utf-8"))
+    report = parse_audit_record(record)
+    if args.json:
+        print(json.dumps(report, indent=2, sort_keys=True))
+    else:
+        print(report["parsed_content"])
+
+
+def _parse_provider_response(provider: str | None, body: Any, config: RunConfig) -> str:
+    if provider in {"openai", "openrouter"}:
+        if isinstance(body, dict):
+            choice = (body.get("choices") or [{}])[0]
+            return choice.get("message", {}).get("content", "")
+        return ""
+    if provider == "gemini":
+        if isinstance(body, dict):
+            candidates = body.get("candidates") or []
+            if not candidates:
+                return ""
+            parts = candidates[0].get("content", {}).get("parts", [])
+            return "".join(part.get("text", "") for part in parts)
+        return ""
+    if provider == "claude-code":
+        if isinstance(body, dict):
+            return _unwrap_cli_json_envelope(body.get("stdout", ""), config)
+        return ""
+    if isinstance(body, str):
+        return body
+    if body is None:
+        return ""
+    return json.dumps(body)
+
+
+def _infer_provider(record: dict[str, Any]) -> str | None:
+    request = record.get("provider_request") or {}
+    url = request.get("url", "")
+    if "openrouter.ai" in url:
+        return "openrouter"
+    if "api.openai.com" in url:
+        return "openai"
+    if "generativelanguage.googleapis.com" in url:
+        return "gemini"
+    if request.get("command"):
+        return "claude-code"
+    return None
+
+
+def _check_structured_output(content: str, schema: Any) -> dict[str, Any]:
+    if not schema:
+        return {"checked": False}
+    if isinstance(schema, str):
+        try:
+            schema = json.loads(schema)
+        except ValueError as exc:
+            return {"checked": True, "valid": False, "error": f"invalid schema JSON: {exc}"}
+    if not isinstance(schema, dict):
+        return {"checked": True, "valid": False, "error": "schema must be an object"}
+
+    try:
+        parsed = json.loads(content)
+    except ValueError as exc:
+        return {"checked": True, "valid": False, "error": f"invalid output JSON: {exc}"}
+
+    missing = []
+    if schema.get("type") == "object":
+        if not isinstance(parsed, dict):
+            return {"checked": True, "valid": False, "error": "output is not an object"}
+        for key in schema.get("required", []):
+            if key not in parsed:
+                missing.append(key)
+    if missing:
+        return {"checked": True, "valid": False, "missing_required": missing}
+    return {"checked": True, "valid": True}
+
+
+if __name__ == "__main__":
+    main()
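`python -m llm_connect.replay` handles one audit file at a time. When `LLM_CONNECT_AUDIT_DIR` collects many calls, a loop over the directory can flag records whose saved `parsed_content` no longer matches what the parser produces, for example after an adapter's parsing changes.

The sketch below is hypothetical. `replay_dir` and `parse_openai_style` are not part of llm_connect, and only the OpenAI-style extraction rule is inlined. It reads the `provider_response.body` and `parsed_content` fields that the server's `_write_audit_record` writes.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def parse_openai_style(body: Any) -> str:
    """Same extraction rule the replay module applies to openai/openrouter bodies."""
    if not isinstance(body, dict):
        return ""
    choice = (body.get("choices") or [{}])[0]
    return choice.get("message", {}).get("content", "")


def replay_dir(audit_dir: Path) -> List[Dict[str, Any]]:
    """Re-parse every audit record in a directory and flag content drift."""
    report = []
    for path in sorted(audit_dir.glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        body = (record.get("provider_response") or {}).get("body")
        parsed = parse_openai_style(body)
        report.append({"file": path.name, "matches": parsed == record.get("parsed_content")})
    return report


with tempfile.TemporaryDirectory() as tmp:
    audit_dir = Path(tmp)
    body = {"choices": [{"message": {"content": "hi"}}]}
    # a.json saved the content the parser produces; b.json saved something else.
    for name, saved in [("a.json", "hi"), ("b.json", "stale")]:
        record = {
            "provider": "openrouter",
            "provider_response": {"status": 200, "body": body},
            "parsed_content": saved,
        }
        (audit_dir / name).write_text(json.dumps(record), encoding="utf-8")
    results = replay_dir(audit_dir)

print(results)
```

Inside a real checkout, the inlined parser would presumably be replaced by `parse_audit_record(record)["matches_recorded_content"]`. That function covers every provider branch, including the Claude Code envelope.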


@@ -21,13 +21,21 @@ Usage (CLI)::
 """

 import argparse
+import datetime as _dt
 import json
+import os
+import re
 import threading
-from http.server import BaseHTTPRequestHandler, HTTPServer
+import time
+import uuid
+from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+from pathlib import Path
 from typing import Optional
+from urllib.parse import parse_qs, urlsplit

+from llm_connect._diagnostics import capture_diagnostics
 from llm_connect.adapter import LLMAdapter
-from llm_connect.models import RunConfig
+from llm_connect.models import LLMResponse, RunConfig


 class _Handler(BaseHTTPRequestHandler):
@@ -39,7 +47,8 @@ class _Handler(BaseHTTPRequestHandler):
     # ── GET ────────────────────────────────────────────────────────

     def do_GET(self):
-        if self.path == "/health":
+        parsed = urlsplit(self.path)
+        if parsed.path == "/health":
             self._respond(200, {"status": "ok"})
         else:
             self._respond(404, {"error": "not found"})
@@ -47,10 +56,13 @@ class _Handler(BaseHTTPRequestHandler):
     # ── POST ───────────────────────────────────────────────────────

     def do_POST(self):
-        if self.path != "/execute":
+        parsed = urlsplit(self.path)
+        if parsed.path != "/execute":
             self._respond(404, {"error": "not found"})
             return

+        debug_enabled = _debug_requested(parsed.query)
+        audit_dir = os.environ.get("LLM_CONNECT_AUDIT_DIR")
         length = int(self.headers.get("Content-Length", 0))
         raw = self.rfile.read(length)
         try:
@@ -70,9 +82,19 @@ class _Handler(BaseHTTPRequestHandler):
             return

         config = RunConfig.from_dict(cfg)
+        start = time.time()
+        diagnostics_enabled = debug_enabled or bool(audit_dir)
         try:
-            response = self.server.adapter.execute_prompt(prompt, config)  # type: ignore[attr-defined]
-            self._respond(200, response.to_dict())
+            with capture_diagnostics(diagnostics_enabled) as diagnostics:
+                response = self.server.adapter.execute_prompt(prompt, config)  # type: ignore[attr-defined]
+            latency = time.time() - start
+            body = response.to_dict()
+            debug = diagnostics.to_dict() if diagnostics is not None else None
+            if debug_enabled and debug is not None:
+                body["debug"] = debug
+            if audit_dir:
+                _write_audit_record(audit_dir, prompt, config, response, debug, latency)
+            self._respond(200, body)
         except Exception as exc:
             self._respond(500, {"error": str(exc)})
@@ -102,7 +124,7 @@ class LLMServer:
         host: str = "127.0.0.1",
         port: int = 8080,
     ) -> None:
-        self._httpd = HTTPServer((host, port), _Handler)
+        self._httpd = ThreadingHTTPServer((host, port), _Handler)
         self._httpd.adapter = adapter  # type: ignore[attr-defined]
         self._thread: Optional[threading.Thread] = None
@@ -138,6 +160,55 @@ def _build_adapter(provider: str, model: Optional[str]) -> LLMAdapter:
     return create_adapter(provider, model=model)


+def _debug_requested(query: str) -> bool:
+    env = os.environ.get("LLM_CONNECT_DEBUG", "")
+    if _truthy(env):
+        return True
+    values = parse_qs(query).get("debug", [])
+    return any(_truthy(value) for value in values)
+
+
+def _truthy(value: str) -> bool:
+    return value.strip().lower() in {"1", "true", "yes", "on"}
+
+
+def _write_audit_record(
+    audit_dir: str,
+    prompt: str,
+    config: RunConfig,
+    response: LLMResponse,
+    debug: dict | None,
+    latency_seconds: float,
+) -> None:
+    target_dir = Path(audit_dir)
+    target_dir.mkdir(parents=True, exist_ok=True)
+    now = _dt.datetime.now(_dt.timezone.utc)
+    response_id = str(response.metadata.get("response_id") or uuid.uuid4().hex)
+    filename = f"{now.strftime('%Y%m%dT%H%M%S%fZ')}-{_safe_filename(response_id)}.json"
+    diagnostics = debug or {}
+    record = {
+        "timestamp": now.isoformat().replace("+00:00", "Z"),
+        "prompt": prompt,
+        "config": config.to_dict(),
+        "provider": response.metadata.get("provider"),
+        "provider_request": diagnostics.get("provider_request"),
+        "provider_response": diagnostics.get("provider_response"),
+        "adapter_transformations": diagnostics.get("adapter_transformations", []),
+        "parsed_content": response.content,
+        "latency_seconds": round(latency_seconds, 3),
+        "response": response.to_dict(),
+    }
+    (target_dir / filename).write_text(
+        json.dumps(record, indent=2, sort_keys=True),
+        encoding="utf-8",
+    )
+
+
+def _safe_filename(value: str) -> str:
+    return re.sub(r"[^A-Za-z0-9_.-]+", "-", value).strip("-") or "response"
+
+
 def main(argv=None) -> None:
     parser = argparse.ArgumentParser(
         prog="python -m llm_connect.server",
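Debug output is opt-in in two ways: a truthy `LLM_CONNECT_DEBUG` environment variable, or a truthy `?debug=` query value. This is also why routing now compares `urlsplit(self.path).path`. The old `self.path != "/execute"` check would have answered `/execute?debug=1` with a 404.

Here is a standalone sketch of that decision. `debug_requested` and `truthy` are hypothetical helpers, and unlike the server's `_debug_requested`, this one takes the full request path rather than just the query.

```python
import os
from typing import Mapping
from urllib.parse import parse_qs, urlsplit

TRUTHY = {"1", "true", "yes", "on"}


def truthy(value: str) -> bool:
    return value.strip().lower() in TRUTHY


def debug_requested(request_path: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Debug is on when LLM_CONNECT_DEBUG is truthy or any ?debug= value is."""
    if truthy(environ.get("LLM_CONNECT_DEBUG", "")):
        return True
    query = urlsplit(request_path).query
    return any(truthy(value) for value in parse_qs(query).get("debug", []))


# Routing should compare only the path component, so "?debug=1" still reaches /execute.
print(urlsplit("/execute?debug=1").path)  # /execute
print(debug_requested("/execute?debug=1", environ={}))  # True
print(debug_requested("/execute?debug=off", environ={}))  # False
print(debug_requested("/execute", environ={"LLM_CONNECT_DEBUG": "yes"}))  # True
```

Audit mode (`LLM_CONNECT_AUDIT_DIR`) turns on diagnostics capture without adding `debug` to the response body. `test_audit_dir_records_replayable_call` asserts exactly that behavior.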

tests/test_payload.py (new file, 81 lines)

@@ -0,0 +1,81 @@
+from llm_connect._payload import merge_gemini_model_params, merge_openai_chat_model_params
+
+STRUCTURED_SCHEMA = {
+    "type": "object",
+    "properties": {
+        "summary": {"type": "string"},
+        "recommendations": {"type": "array", "items": {"type": "string"}},
+    },
+    "required": ["summary", "recommendations"],
+}
+
+ACTIVITY_CORE_MODEL_PARAMS = {
+    "reasoning_effort": "medium",
+    "max_depth": 4,
+    "json_schema": STRUCTURED_SCHEMA,
+    "top_p": 0.8,
+}
+
+
+def test_openai_chat_model_params_translate_activity_core_shape():
+    payload = {
+        "model": "gpt-4.1-mini",
+        "messages": [{"role": "user", "content": "triage"}],
+        "temperature": 0.2,
+        "max_tokens": 200,
+    }
+
+    merge_openai_chat_model_params(payload, ACTIVITY_CORE_MODEL_PARAMS)
+
+    assert payload["response_format"] == {
+        "type": "json_schema",
+        "json_schema": {
+            "name": "structured_output",
+            "schema": STRUCTURED_SCHEMA,
+            "strict": False,
+        },
+    }
+    assert payload["top_p"] == 0.8
+    assert "reasoning_effort" not in payload
+    assert "max_depth" not in payload
+    assert "json_schema" not in payload
+
+
+def test_openai_chat_model_params_preserve_explicit_response_format():
+    explicit = {
+        "type": "json_schema",
+        "json_schema": {
+            "name": "custom",
+            "schema": STRUCTURED_SCHEMA,
+            "strict": True,
+        },
+    }
+    payload = {"model": "gpt-4.1-mini", "messages": []}
+
+    merge_openai_chat_model_params(
+        payload,
+        {"json_schema": STRUCTURED_SCHEMA, "response_format": explicit},
+    )
+
+    assert payload["response_format"] == explicit
+
+
+def test_gemini_model_params_translate_activity_core_shape():
+    payload = {
+        "contents": [{"role": "user", "parts": [{"text": "triage"}]}],
+        "generationConfig": {
+            "temperature": 0.2,
+            "maxOutputTokens": 200,
+        },
+    }
+
+    merge_gemini_model_params(payload, ACTIVITY_CORE_MODEL_PARAMS)
+
+    assert payload["generationConfig"]["responseMimeType"] == "application/json"
+    assert payload["generationConfig"]["responseSchema"] == STRUCTURED_SCHEMA
+    assert payload["generationConfig"]["topP"] == 0.8
+    assert "reasoning_effort" not in payload
+    assert "max_depth" not in payload
+    assert "json_schema" not in payload

tests/test_replay.py (new file, 62 lines)

@@ -0,0 +1,62 @@
+from llm_connect.replay import parse_audit_record
+
+STRUCTURED_SCHEMA = {
+    "type": "object",
+    "properties": {
+        "summary": {"type": "string"},
+        "recommendations": {"type": "array", "items": {"type": "string"}},
+    },
+    "required": ["summary", "recommendations"],
+}
+
+
+def test_replay_parses_openai_style_provider_response():
+    record = {
+        "provider": "openrouter",
+        "config": {"model_params": {"json_schema": STRUCTURED_SCHEMA}},
+        "provider_response": {
+            "status": 200,
+            "body": {
+                "choices": [
+                    {
+                        "message": {
+                            "content": '{"summary":"ok","recommendations":[]}'
+                        }
+                    }
+                ]
+            },
+        },
+        "parsed_content": '{"summary":"ok","recommendations":[]}',
+    }
+
+    report = parse_audit_record(record)
+
+    assert report["parsed_content"] == '{"summary":"ok","recommendations":[]}'
+    assert report["matches_recorded_content"] is True
+    assert report["structured_output"] == {"checked": True, "valid": True}
+
+
+def test_replay_reuses_claude_code_envelope_unwrapper():
+    record = {
+        "provider": "claude-code",
+        "config": {"model_params": {"json_schema": STRUCTURED_SCHEMA}},
+        "provider_response": {
+            "status": 0,
+            "body": {
+                "stdout": (
+                    '{"type":"result","result":"prose",'
+                    '"structured_result":"{\\"summary\\":\\"ok\\",'
+                    '\\"recommendations\\":[]}"}'
+                ),
+                "stderr": "",
+            },
+        },
+        "parsed_content": '{"summary":"ok","recommendations":[]}',
+    }
+
+    report = parse_audit_record(record)
+
+    assert report["parsed_content"] == '{"summary":"ok","recommendations":[]}'
+    assert report["matches_recorded_content"] is True
+    assert report["structured_output"] == {"checked": True, "valid": True}
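Both replay tests expect `structured_output == {"checked": True, "valid": True}`. That verdict comes from `_check_structured_output`, which only confirms two things:
- The output parses as JSON.
- A top-level object contains the schema's `required` keys.

The sketch below mirrors that shallow rule in a hypothetical `check_required_keys`, without the `checked` flag, to make its limits visible.

```python
import json
from typing import Any, Dict


def check_required_keys(content: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow check: the output parses as JSON and has the top-level required keys."""
    try:
        parsed = json.loads(content)
    except ValueError as exc:
        return {"valid": False, "error": f"invalid output JSON: {exc}"}
    if schema.get("type") == "object":
        if not isinstance(parsed, dict):
            return {"valid": False, "error": "output is not an object"}
        missing = [key for key in schema.get("required", []) if key not in parsed]
        if missing:
            return {"valid": False, "missing_required": missing}
    # Property types, nested objects and array items are NOT inspected.
    return {"valid": True}


schema = {
    "type": "object",
    "properties": {"summary": {"type": "string"}},
    "required": ["summary", "recommendations"],
}
print(check_required_keys('{"summary": "ok", "recommendations": []}', schema))
print(check_required_keys('{"summary": "ok"}', schema))
# Wrong value types still pass, because only key presence is checked.
print(check_required_keys('{"summary": 1, "recommendations": "x"}', schema))
```

A stricter replay check would need a full JSON Schema validator, such as `jsonschema.validate` from the third-party `jsonschema` package. The source does not say whether replay is meant to take on that dependency.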


@@ -2,14 +2,22 @@
 Tests for LLMServer HTTP serve mode (FR-1).
 """

+import threading
+import time
+from concurrent.futures import ThreadPoolExecutor
 import json
 import urllib.error
 import urllib.request

 import pytest

+from llm_connect._diagnostics import (
+    record_adapter_transformation,
+    record_provider_request,
+    record_provider_response,
+)
 from llm_connect.adapter import MockLLMAdapter, ErrorLLMAdapter
-from llm_connect.models import RunConfig
+from llm_connect.models import LLMResponse, RunConfig
 from llm_connect.server import LLMServer
@@ -45,6 +53,35 @@ def _post(url: str, body: dict) -> tuple[int, dict]:
         return exc.code, json.loads(exc.read())


+class DiagnosticLLMAdapter(MockLLMAdapter):
+    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        record_provider_request(
+            url="https://provider.example/v1/chat",
+            payload={"prompt": prompt, "model": config.model_name},
+            headers={"Authorization": "Bearer secret-token"},
+        )
+        response = super().execute_prompt(prompt, config)
+        response.metadata["provider"] = "diagnostic"
+        response.metadata["response_id"] = "diag-response"
+        record_provider_response(status=200, body={"id": "diag-response", "content": response.content})
+        record_adapter_transformation(
+            "diagnostic_transform",
+            {"before": prompt},
+            {"after": response.content},
+        )
+        return response
+
+
+class BarrierLLMAdapter(MockLLMAdapter):
+    def __init__(self):
+        super().__init__(mock_response="parallel")
+        self._barrier = threading.Barrier(2)
+
+    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        self._barrier.wait(timeout=2.0)
+        return super().execute_prompt(prompt, config)
+
+
 class TestHealth:
     def test_health_returns_200(self, server):
         status, body = _get(f"http://127.0.0.1:{server.port}/health")
@@ -65,6 +102,7 @@ class TestExecute:
         assert status == 200
         assert body["content"] == "hello world"
         assert body["finish_reason"] == "stop"
+        assert "debug" not in body

     def test_response_includes_usage(self, server):
         status, body = _post(
@@ -150,3 +188,86 @@ class TestExecute:
) )
assert status == 400 assert status == 400
assert "config" in body["error"] assert "config" in body["error"]

    def test_debug_query_returns_diagnostics(self):
        s = LLMServer(adapter=DiagnosticLLMAdapter(mock_response="debug body"), port=0)
        s.start()
        try:
            status, body = _post(
                f"http://127.0.0.1:{s.port}/execute?debug=1",
                {"prompt": "inspect", "config": {"model_name": "diagnostic-model"}},
            )
        finally:
            s.stop()
        assert status == 200
        assert body["content"] == "debug body"
        debug = body["debug"]
        assert debug["provider_request"]["payload"] == {
            "prompt": "inspect",
            "model": "diagnostic-model",
        }
        assert debug["provider_request"]["headers_redacted"]["Authorization"] == "Bearer <redacted>"
        assert debug["provider_response"]["status"] == 200
        assert debug["adapter_transformations"][0]["step"] == "diagnostic_transform"

    def test_debug_env_returns_diagnostics(self, monkeypatch):
        monkeypatch.setenv("LLM_CONNECT_DEBUG", "1")
        s = LLMServer(adapter=DiagnosticLLMAdapter(mock_response="debug body"), port=0)
        s.start()
        try:
            status, body = _post(
                f"http://127.0.0.1:{s.port}/execute",
                {"prompt": "inspect"},
            )
        finally:
            s.stop()
        assert status == 200
        assert "debug" in body

    def test_audit_dir_records_replayable_call(self, monkeypatch, tmp_path):
        monkeypatch.setenv("LLM_CONNECT_AUDIT_DIR", str(tmp_path))
        s = LLMServer(adapter=DiagnosticLLMAdapter(mock_response="audit body"), port=0)
        s.start()
        try:
            status, body = _post(
                f"http://127.0.0.1:{s.port}/execute",
                {"prompt": "audit me", "config": {"model_name": "audit-model"}},
            )
        finally:
            s.stop()
        assert status == 200
        assert "debug" not in body
        files = list(tmp_path.glob("*.json"))
        assert len(files) == 1
        record = json.loads(files[0].read_text(encoding="utf-8"))
        assert record["prompt"] == "audit me"
        assert record["config"]["model_name"] == "audit-model"
        assert record["parsed_content"] == "audit body"
        assert record["provider_request"]["headers_redacted"]["Authorization"] == "Bearer <redacted>"
        assert record["provider_response"]["body"]["id"] == "diag-response"
        assert record["latency_seconds"] >= 0

    def test_execute_requests_run_concurrently(self):
        s = LLMServer(adapter=BarrierLLMAdapter(), port=0)
        s.start()
        try:
            start = time.monotonic()
            with ThreadPoolExecutor(max_workers=2) as pool:
                futures = [
                    pool.submit(
                        _post,
                        f"http://127.0.0.1:{s.port}/execute",
                        {"prompt": f"request {idx}"},
                    )
                    for idx in range(2)
                ]
                results = [future.result(timeout=3.0) for future in futures]
            elapsed = time.monotonic() - start
        finally:
            s.stop()
        assert [status for status, _body in results] == [200, 200]
        assert elapsed < 1.5

@@ -0,0 +1,142 @@
import json

from llm_connect.gemini import GeminiAdapter
from llm_connect.models import RunConfig
from llm_connect.openai import OpenAIAdapter
from llm_connect.openrouter import OpenRouterAdapter


STRUCTURED_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "recommendations": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "recommendations"],
}

SMOKE_CONFIG = RunConfig(
    model_name="gpt-4",
    temperature=0.1,
    max_tokens=300,
    model_params={
        "reasoning_effort": "medium",
        "max_depth": 3,
        "json_schema": STRUCTURED_SCHEMA,
    },
)


def test_openrouter_structured_output_payload_and_model_routing(monkeypatch):
    captured: dict[str, object] = {}

    def fake_post_json(url, payload, headers=None, timeout=300):  # noqa: ANN001
        captured["url"] = url
        captured["payload"] = payload
        captured["headers"] = headers
        captured["timeout"] = timeout
        return {
            "id": "or-response",
            "model": payload["model"],
            "choices": [
                {
                    "message": {
                        "content": json.dumps(
                            {"summary": "ok", "recommendations": ["keep payload clean"]}
                        )
                    },
                    "finish_reason": "stop",
                }
            ],
            "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
        }

    monkeypatch.setattr("llm_connect.openrouter.post_json", fake_post_json)
    adapter = OpenRouterAdapter(
        model="anthropic/claude-sonnet-4",
        api_key="or-test",
        api_base="https://openrouter.example/api/v1",
    )
    response = adapter.execute_prompt("Return JSON.", SMOKE_CONFIG)
    payload = captured["payload"]
    assert response.model == "anthropic/claude-sonnet-4"
    assert payload["model"] == "anthropic/claude-sonnet-4"
    assert payload["response_format"]["json_schema"]["schema"] == STRUCTURED_SCHEMA
    assert payload["response_format"]["json_schema"]["strict"] is False
    assert "reasoning_effort" not in payload
    assert "max_depth" not in payload
    assert "json_schema" not in payload


def test_openai_structured_output_payload(monkeypatch):
    captured: dict[str, object] = {}

    def fake_post_json(url, payload, headers=None, timeout=300):  # noqa: ANN001
        captured["payload"] = payload
        return {
            "id": "oa-response",
            "model": payload["model"],
            "choices": [
                {
                    "message": {
                        "content": json.dumps({"summary": "ok", "recommendations": []})
                    },
                    "finish_reason": "stop",
                }
            ],
            "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
        }

    monkeypatch.setattr("llm_connect.openai.post_json", fake_post_json)
    adapter = OpenAIAdapter(model="gpt-4.1-mini", api_key="sk-test")
    response = adapter.execute_prompt("Return JSON.", SMOKE_CONFIG)
    payload = captured["payload"]
    assert response.model == "gpt-4.1-mini"
    assert payload["model"] == "gpt-4.1-mini"
    assert payload["response_format"]["json_schema"]["schema"] == STRUCTURED_SCHEMA
    assert "reasoning_effort" not in payload
    assert "max_depth" not in payload
    assert "json_schema" not in payload


def test_gemini_structured_output_payload(monkeypatch):
    captured: dict[str, object] = {}

    def fake_post_json(url, payload, headers=None, timeout=300):  # noqa: ANN001
        captured["url"] = url
        captured["payload"] = payload
        return {
            "candidates": [
                {
                    "content": {
                        "parts": [
                            {"text": json.dumps({"summary": "ok", "recommendations": []})}
                        ]
                    },
                    "finishReason": "STOP",
                }
            ],
            "usageMetadata": {
                "promptTokenCount": 1,
                "candidatesTokenCount": 2,
                "totalTokenCount": 3,
            },
        }

    monkeypatch.setattr("llm_connect.gemini.post_json", fake_post_json)
    adapter = GeminiAdapter(model="gemini-2.5-flash", api_key="gemini-test")
    response = adapter.execute_prompt("Return JSON.", SMOKE_CONFIG)
    payload = captured["payload"]
    assert response.model == "gemini-2.5-flash"
    assert payload["generationConfig"]["responseMimeType"] == "application/json"
    assert payload["generationConfig"]["responseSchema"] == STRUCTURED_SCHEMA
    assert "reasoning_effort" not in payload
    assert "max_depth" not in payload
    assert "json_schema" not in payload

@@ -4,11 +4,11 @@ type: workplan
title: "Ad hoc — llm-connect lessons from CUST-WP-0045 canary"
domain: custodian
repo: llm-connect
-status: ready
+status: finished
owner: custodian
topic_slug: custodian
created: "2026-06-02"
-updated: "2026-06-02"
+updated: "2026-06-03"
state_hub_workstream_id: "1c936c91-79c7-427d-ab37-9052e8a61cda"
---
@@ -38,7 +38,7 @@ workplan.
```task
id: ADHOC-2026-06-02-T01
-status: todo
+status: done
priority: medium
state_hub_task_id: "69626e9e-29f1-40f6-8cd2-d38a7e802293"
```
@@ -78,7 +78,7 @@ debug field is omitted in normal mode.
```task
id: ADHOC-2026-06-02-T02
-status: todo
+status: done
priority: low
state_hub_task_id: "e2b1be30-71f7-4497-9b10-b0f24d37beba"
```
@@ -101,7 +101,7 @@ max of their individual latencies, not the sum.
```task
id: ADHOC-2026-06-02-T03
-status: todo
+status: done
priority: medium
state_hub_task_id: "da4821f0-a876-44ce-9dc3-f3fc67732d0f"
```
@@ -127,7 +127,7 @@ ergonomics.
```task
id: ADHOC-2026-06-02-T04
-status: todo
+status: done
priority: medium
state_hub_task_id: "f8a033e6-22ac-4700-b8d2-43a5d76a3751"
```
@@ -155,7 +155,7 @@ forbidden top-level fields, schema in the right wrapper).
```task
id: ADHOC-2026-06-02-T05
-status: todo
+status: done
priority: medium
state_hub_task_id: "5d53dbb4-b374-45fe-b81c-ff0b222ca74f"
```
@@ -188,7 +188,7 @@ bug) before either was merged.
```task
id: ADHOC-2026-06-02-T06
-status: todo
+status: done
priority: low
state_hub_task_id: "33fcb951-d7ab-4d3c-8d67-9eebd986c711"
```
@@ -210,3 +210,21 @@ would only send OpenAI-valid fields. Codify the contract in
Done when a new adapter author can read the doc and know what their
`_merge_model_params` implementation must support.
## Implementation Notes
Completed on 2026-06-03:
- Added opt-in `/execute` debug envelopes via `LLM_CONNECT_DEBUG=1` or
`?debug=1`, with redacted provider request/response capture and adapter
transformation records.
- Switched serve mode to `ThreadingHTTPServer` and added a concurrency
regression test.
- Added `LLM_CONNECT_AUDIT_DIR` per-call audit records plus
`python -m llm_connect.replay` for parser/unwrapper replay.
- Extracted shared OpenAI-compatible and Gemini payload translation helpers
and wired OpenRouter, OpenAI, and Gemini through them.
- Added CI-safe structured-output smoke tests that mock provider HTTP calls
and assert model routing plus payload shape.
- Documented the adapter `model_params` contract in
`docs/adapter-model-params.md`.
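The first note above covers two server-side behaviors: switching on the debug envelope, and redacting secrets before they reach that envelope. The sketch below illustrates both. It is not the code in `_diagnostics.py` or the server, neither of which this commit view shows. The names `debug_enabled`, `redact_headers`, `SENSITIVE_HEADERS` and `TRUTHY` are hypothetical, and so is treating values other than `1` as "on". The sketch does mirror what the tests above assert. Either `LLM_CONNECT_DEBUG=1` or a `?debug=1` query string enables diagnostics, and an `Authorization: Bearer secret-token` header is recorded as `Bearer <redacted>`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from urllib.parse import parse_qs, urlsplit

# Hypothetical: the header names treated as secrets and the flag values treated as "on".
SENSITIVE_HEADERS = {"authorization", "x-api-key", "api-key"}
TRUTHY = {"1", "true", "yes", "on"}


def debug_enabled(request_path: str, environ: Mapping[str, str] | None = None) -> bool:
    """Return True if LLM_CONNECT_DEBUG or the ?debug query parameter is truthy."""
    env = os.environ if environ is None else environ
    if env.get("LLM_CONNECT_DEBUG", "").strip().lower() in TRUTHY:
        return True
    # parse_qs maps each query key to a list of values, e.g. {"debug": ["1"]}.
    query = parse_qs(urlsplit(request_path).query)
    return any(value.strip().lower() in TRUTHY for value in query.get("debug", []))


def redact_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Copy headers, masking secret values but keeping an auth scheme such as "Bearer"."""
    redacted: dict[str, str] = {}
    for name, value in headers.items():
        if name.lower() not in SENSITIVE_HEADERS:
            redacted[name] = value
            continue
        # "Bearer secret-token" partitions into ("Bearer", " ", "secret-token").
        scheme, sep, _secret = value.partition(" ")
        redacted[name] = f"{scheme} <redacted>" if sep else "<redacted>"
    return redacted


if __name__ == "__main__":
    print(debug_enabled("/execute?debug=1", environ={}))  # True
    print(redact_headers({"Authorization": "Bearer secret-token"}))  # {'Authorization': 'Bearer <redacted>'}
```

The sketch keeps the scheme (`Bearer`) and masks only the credential. That leaves enough in a debug envelope or audit record to show which auth style an adapter used, without exposing the token itself.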