generated from coulomb/repo-seed
Implement llm-connect ADHOC diagnostics
@@ -32,6 +32,9 @@ Maturity states: **Experimental → Beta → Stable → Deprecated**
 | `gemini.py` | `GeminiAdapter` — Google Generative Language API | Beta |
 | `openrouter.py` | `OpenRouterAdapter` — OpenAI-compatible multi-model routing | Beta |
 | `claude_code.py` | `ClaudeCodeAdapter` — `claude --print` subprocess | Beta |
+| `_payload.py` | Shared adapter payload translation for `RunConfig.model_params` | Beta |
+| `_diagnostics.py` | Opt-in per-call diagnostics capture for server debug and audit modes | Beta |
+| `replay.py` | Audit replay parser CLI (`python -m llm_connect.replay`) | Beta |
 | `embedding_adapter.py` | `EmbeddingAdapter` ABC | Beta |
 | `embedding_openai.py` | `OpenAICompatibleEmbeddingAdapter` | Beta |
 | `embedding_cache.py` | `EmbeddingCache` — disk-backed embedding cache | Beta |
38
README.md
@@ -73,15 +73,15 @@ config = RunConfig(
 )
 ```

 | Field | Default | Description |
 |---|---|---|
 | `model_name` | `"gpt-4"` | Model identifier (adapter may override) |
 | `temperature` | `0.7` | Sampling temperature |
 | `max_tokens` | `2000` | Maximum output tokens |
-| `model_params` | `{}` | Extra provider-specific parameters |
+| `model_params` | `{}` | Portable extras translated by each adapter; see `docs/adapter-model-params.md` |
 | `max_depth` | `3` | Max nesting depth for recursive calls |
 | `skip_if_exists` | `True` | Skip if identical input hash already processed |
 | `timeout_seconds` | `300` | Request timeout |

 ### `LLMResponse`

@@ -92,8 +92,24 @@ response = adapter.execute_prompt(prompt, config)
 print(response.content) # generated text
 print(response.model) # model actually used
 print(response.usage) # {"prompt_tokens": …, "completion_tokens": …, "total_tokens": …}
 print(response.finish_reason) # "stop", "length", etc.
 ```

+## Server diagnostics
+
+Serve mode can include a debug envelope without changing normal responses:
+
+```bash
+LLM_CONNECT_DEBUG=1 python -m llm_connect.server --provider openrouter
+curl 'http://127.0.0.1:8080/execute?debug=1' -d '{"prompt":"hi"}'
+```
+
+Set `LLM_CONNECT_AUDIT_DIR=/path/to/audit` to write per-call replay records,
+then parse one without another provider call:
+
+```bash
+python -m llm_connect.replay /path/to/audit/record.json --json
+```
+
 ## Writing your own adapter

102
docs/adapter-model-params.md
Normal file
@@ -0,0 +1,102 @@
# Adapter `model_params` contract

`RunConfig.model_params` is a portability layer, not a blind provider payload
escape hatch. Adapters must translate the shared keys they understand, pass
through only provider-valid keys, and drop provider-specific keys that would
make another provider reject the request.

## Shared structured output

Callers may request structured output with:

```python
RunConfig(
    model_params={
        "json_schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "recommendations": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["summary", "recommendations"],
        }
    }
)
```

Adapters translate that key into the provider's native shape:

| Adapter | Translation |
|---|---|
| OpenAI | `response_format = {"type": "json_schema", "json_schema": ...}` |
| OpenRouter | Same OpenAI-compatible `response_format` wrapper |
| Gemini | `generationConfig.responseMimeType = "application/json"` and `generationConfig.responseSchema = ...` |
| Claude Code CLI | `--json-schema <schema>` plus `--output-format json`, then envelope unwrap |

OpenAI-compatible adapters default `json_schema.strict` to `False`. Strict mode
requires schemas to meet provider-specific constraints such as
`additionalProperties: false` on object nodes and complete `required` lists.
Callers that need strict behavior can pass an explicit provider-native
`response_format` in `model_params`.
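
As a rough sketch of that explicit route, a caller could build the provider-native wrapper by hand and place it under `response_format`. The helper name `strict_response_format` and the default schema name are illustrative assumptions, not part of llm-connect; the wrapper shape follows the OpenAI-style `response_format` described in this section.

```python
from typing import Any


def strict_response_format(
    schema: dict[str, Any], name: str = "structured_output"
) -> dict[str, Any]:
    """Wrap a JSON Schema in an OpenAI-style strict response_format.

    Hypothetical helper for illustration only.
    """
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


schema = {
    "type": "object",
    "properties": {"summary": {"type": "string"}},
    "required": ["summary"],  # strict mode expects a complete required list
    "additionalProperties": False,  # and this flag on object nodes
}

# This dict would then be passed as RunConfig(model_params=model_params).
model_params = {"response_format": strict_response_format(schema)}
```

Because the caller supplies `response_format` directly, the schema has to satisfy the provider's strict-mode constraints itself; nothing in this sketch validates it.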

## Pass-through keys

OpenAI and OpenRouter pass through known Chat Completions fields:

`top_p`, `n`, `stream`, `stop`, `presence_penalty`, `frequency_penalty`,
`logit_bias`, `user`, `seed`, `tools`, `tool_choice`, `response_format`,
`logprobs`, `top_logprobs`, and `parallel_tool_calls`.

Gemini passes through valid `generateContent` top-level fields:

`safetySettings`, `tools`, `toolConfig`, `systemInstruction`, and
`cachedContent`.

Gemini also accepts generation config fields directly or via snake-case aliases:

`candidateCount`, `candidate_count`, `stopSequences`, `stop_sequences`,
`maxOutputTokens`, `max_output_tokens`, `temperature`, `topP`, `top_p`, `topK`,
`top_k`, `responseMimeType`, `response_mime_type`, `responseSchema`, and
`response_schema`.
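
The alias handling can be pictured with a small standalone sketch. The field and alias names are taken from the lists above; the function `to_generation_config` is a simplified illustration written for this document, not the adapter's actual merge routine.

```python
from typing import Any

# Field names and aliases as listed above.
GENERATION_CONFIG_FIELDS = frozenset(
    {
        "candidateCount",
        "stopSequences",
        "maxOutputTokens",
        "temperature",
        "topP",
        "topK",
        "responseMimeType",
        "responseSchema",
    }
)
ALIASES = {
    "candidate_count": "candidateCount",
    "stop_sequences": "stopSequences",
    "max_output_tokens": "maxOutputTokens",
    "top_p": "topP",
    "top_k": "topK",
    "response_mime_type": "responseMimeType",
    "response_schema": "responseSchema",
}


def to_generation_config(model_params: dict[str, Any]) -> dict[str, Any]:
    """Collect generation config values, resolving snake-case aliases."""
    config: dict[str, Any] = {}
    for key, value in model_params.items():
        gemini_key = ALIASES.get(key, key)  # fall back to the key itself
        if gemini_key in GENERATION_CONFIG_FIELDS:
            config[gemini_key] = value
    return config


# "seed" is not a generation config field here, so it is left out.
result = to_generation_config({"top_p": 0.9, "maxOutputTokens": 256, "seed": 7})
```

Here `result` holds `topP` and `maxOutputTokens` only, which is the behavior the alias list implies for a caller who reuses an OpenAI-flavored config with Gemini.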

## Dropped keys

Adapters must drop keys that are meaningful to another adapter or to
llm-connect itself but invalid for the target provider. The current shared drop
set includes:

`reasoning_effort`, `max_depth`, `claude_cli_path`, and raw `json_schema` after
translation.

Unknown keys are ignored by default. This keeps activity-specific configs from
causing provider HTTP 400 errors when a caller switches providers.
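
A minimal sketch of the drop-and-ignore rule for an OpenAI-style payload follows. The key sets come from this document; `filter_openai_params` and its return shape are assumptions made for illustration, and `json_schema` translation is deliberately omitted.

```python
from typing import Any

# Pass-through and drop sets as listed in this document.
PASSTHROUGH = frozenset(
    {
        "top_p", "n", "stream", "stop", "presence_penalty", "frequency_penalty",
        "logit_bias", "user", "seed", "tools", "tool_choice", "response_format",
        "logprobs", "top_logprobs", "parallel_tool_calls",
    }
)
DROPPED = frozenset({"reasoning_effort", "max_depth", "claude_cli_path", "json_schema"})


def filter_openai_params(model_params: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return (keys safe to send, keys left out of the payload)."""
    kept: dict[str, Any] = {}
    skipped: list[str] = []
    for key, value in model_params.items():
        if key in DROPPED:
            skipped.append(key)  # meaningful elsewhere, invalid for this provider
        elif key in PASSTHROUGH:
            kept[key] = value
        else:
            skipped.append(key)  # unknown keys are ignored rather than forwarded
    return kept, skipped


kept, skipped = filter_openai_params(
    {"top_p": 0.5, "reasoning_effort": "high", "my_activity_flag": True}
)
```

Returning the skipped keys is a choice made for this sketch so a caller can log what was left out; the document itself only states that such keys never reach the provider.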

## Diagnostics and replay

Server mode supports opt-in diagnostics for `/execute`:

```bash
LLM_CONNECT_DEBUG=1 python -m llm_connect.server --provider openrouter
curl 'http://127.0.0.1:8080/execute?debug=1' -d '{"prompt":"hi"}'
```

Debug responses include a `debug` field with the redacted provider request, raw
provider response body, and adapter transformations such as `merge_model_params`
or `unwrap_cli_envelope`. Normal responses omit `debug`.

Set `LLM_CONNECT_AUDIT_DIR=/path/to/audit` to write one JSON audit record per
`/execute` call. Audit records include the prompt, config, redacted provider
request, provider response, parsed content, and latency. Re-run parsing without
another provider call with:

```bash
python -m llm_connect.replay /path/to/audit/record.json --json
```

## Server concurrency

`llm_connect.server.LLMServer` uses `ThreadingHTTPServer`. Adapter instances
used in server mode must be safe to call concurrently. The bundled HTTP and
subprocess adapters keep per-call state local; custom adapters should avoid
mutating shared instance attributes during `execute_prompt` unless they use
their own locks.
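
One way to follow that advice is sketched below. `CountingAdapter` is a hypothetical stand-in that mimics the `execute_prompt` method name; a real adapter would subclass `LLMAdapter` and return an `LLMResponse`, which this self-contained example does not attempt.

```python
import threading


class CountingAdapter:
    """Hypothetical adapter-like class that guards shared state with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.calls = 0  # shared across threads, so only touched under the lock

    def execute_prompt(self, prompt: str, config: object = None) -> str:
        response = prompt.upper()  # per-call state stays in local variables
        with self._lock:
            self.calls += 1
        return response


adapter = CountingAdapter()


def worker() -> None:
    for _ in range(100):
        adapter.execute_prompt("hi")


# Simulate concurrent request handlers, as a threaded server would create.
threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The lock covers only the shared counter, so concurrent calls still do their own work in parallel and serialize just the brief update.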
153
llm_connect/_diagnostics.py
Normal file
@@ -0,0 +1,153 @@
"""Per-call diagnostics capture for server debug and audit modes."""

from __future__ import annotations

import copy
import json
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Any, Iterator, Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


_SECRET_QUERY_KEYS = {"key", "api_key", "apikey", "access_token", "token"}
_SECRET_HEADER_TOKENS = ("authorization", "api-key", "apikey", "token", "secret", "key")


@dataclass
class Diagnostics:
    """Captured provider request/response details for one logical LLM call."""

    provider_request: dict[str, Any] | None = None
    provider_response: dict[str, Any] | None = None
    adapter_transformations: list[dict[str, Any]] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        return {
            "provider_request": self.provider_request,
            "provider_response": self.provider_response,
            "adapter_transformations": self.adapter_transformations,
        }


_CURRENT: ContextVar[Diagnostics | None] = ContextVar(
    "llm_connect_diagnostics",
    default=None,
)


@contextmanager
def capture_diagnostics(enabled: bool = True) -> Iterator[Diagnostics | None]:
    """Capture diagnostics within this context when *enabled* is true."""

    if not enabled:
        yield None
        return

    diagnostics = Diagnostics()
    token = _CURRENT.set(diagnostics)
    try:
        yield diagnostics
    finally:
        _CURRENT.reset(token)


def diagnostics_enabled() -> bool:
    return _CURRENT.get() is not None


def current_diagnostics() -> Diagnostics | None:
    return _CURRENT.get()


def record_provider_request(
    *,
    url: str | None = None,
    payload: Any | None = None,
    headers: Mapping[str, Any] | None = None,
    command: list[str] | None = None,
) -> None:
    diagnostics = _CURRENT.get()
    if diagnostics is None:
        return

    request: dict[str, Any] = {}
    if url is not None:
        request["url"] = redact_url(url)
    if payload is not None:
        request["payload"] = json_safe(payload)
    if headers is not None:
        request["headers_redacted"] = redact_headers(headers)
    if command is not None:
        request["command"] = list(command)
    diagnostics.provider_request = request


def record_provider_response(*, status: int | None = None, body: Any | None = None) -> None:
    diagnostics = _CURRENT.get()
    if diagnostics is None:
        return

    response: dict[str, Any] = {}
    if status is not None:
        response["status"] = status
    if body is not None:
        response["body"] = json_safe(body)
    diagnostics.provider_response = response


def record_adapter_transformation(step: str, before: Any, after: Any) -> None:
    diagnostics = _CURRENT.get()
    if diagnostics is None:
        return

    diagnostics.adapter_transformations.append(
        {
            "step": step,
            "before": json_safe(before),
            "after": json_safe(after),
        }
    )


def json_safe(value: Any) -> Any:
    """Return a JSON-serializable snapshot of *value* without mutating it."""

    try:
        return json.loads(json.dumps(value))
    except (TypeError, ValueError):
        try:
            return copy.deepcopy(value)
        except Exception:
            return repr(value)


def redact_headers(headers: Mapping[str, Any]) -> dict[str, Any]:
    redacted: dict[str, Any] = {}
    for key, value in headers.items():
        lowered = str(key).lower()
        if any(token in lowered for token in _SECRET_HEADER_TOKENS):
            redacted[str(key)] = _redact_header_value(value)
        else:
            redacted[str(key)] = json_safe(value)
    return redacted


def redact_url(url: str) -> str:
    parts = urlsplit(url)
    query = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in _SECRET_QUERY_KEYS:
            query.append((key, "<redacted>"))
        else:
            query.append((key, value))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


def _redact_header_value(value: Any) -> str:
    text = str(value)
    if " " in text:
        scheme = text.split(" ", 1)[0]
        return f"{scheme} <redacted>"
    return "<redacted>"
@@ -1,86 +1,101 @@
|
|||||||
"""
|
"""
|
||||||
Thin synchronous HTTP helper built on :mod:`urllib.request`.
|
Thin synchronous HTTP helper built on :mod:`urllib.request`.
|
||||||
|
|
||||||
Translates HTTP errors into typed :mod:`markitect.llm.exceptions`.
|
Translates HTTP errors into typed :mod:`markitect.llm.exceptions`.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import json
|
import json
|
||||||
import urllib.request
|
import urllib.error
|
||||||
import urllib.error
|
import urllib.request
|
||||||
from typing import Dict, Any, Optional
|
from typing import Any, Dict, Optional
|
||||||
|
|
||||||
from llm_connect.exceptions import (
|
from llm_connect._diagnostics import record_provider_request, record_provider_response
|
||||||
LLMAPIError,
|
from llm_connect.exceptions import (
|
||||||
LLMRateLimitError,
|
LLMAPIError,
|
||||||
LLMTimeoutError,
|
LLMRateLimitError,
|
||||||
)
|
LLMTimeoutError,
|
||||||
|
)
|
||||||
|
|
||||||
def post_json(
|
|
||||||
url: str,
|
def post_json(
|
||||||
payload: Dict[str, Any],
|
url: str,
|
||||||
headers: Optional[Dict[str, str]] = None,
|
payload: Dict[str, Any],
|
||||||
timeout: int = 300,
|
headers: Optional[Dict[str, str]] = None,
|
||||||
) -> Dict[str, Any]:
|
timeout: int = 300,
|
||||||
"""POST *payload* as JSON and return the parsed response body.
|
) -> Dict[str, Any]:
|
||||||
|
"""POST *payload* as JSON and return the parsed response body.
|
||||||
Raises:
|
|
||||||
LLMRateLimitError: on HTTP 429
|
Raises:
|
||||||
LLMAPIError: on other non-2xx responses
|
LLMRateLimitError: on HTTP 429
|
||||||
LLMTimeoutError: on socket / read timeout
|
LLMAPIError: on other non-2xx responses
|
||||||
"""
|
LLMTimeoutError: on socket / read timeout
|
||||||
data = json.dumps(payload).encode()
|
"""
|
||||||
req = urllib.request.Request(
|
record_provider_request(url=url, payload=payload, headers=headers or {})
|
||||||
url,
|
data = json.dumps(payload).encode()
|
||||||
data=data,
|
req = urllib.request.Request(
|
||||||
headers={"Content-Type": "application/json", **(headers or {})},
|
url,
|
||||||
method="POST",
|
data=data,
|
||||||
)
|
headers={"Content-Type": "application/json", **(headers or {})},
|
||||||
|
method="POST",
|
||||||
try:
|
)
|
||||||
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
|
||||||
body = resp.read().decode()
|
try:
|
||||||
try:
|
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
||||||
return json.loads(body)
|
body = resp.read().decode()
|
||||||
except json.JSONDecodeError as exc:
|
try:
|
||||||
preview = body[:300].replace("\n", "\\n")
|
parsed = json.loads(body)
|
||||||
raise LLMAPIError(
|
record_provider_response(status=resp.status, body=parsed)
|
||||||
f"Invalid JSON response from {url}: {exc} — body preview: {preview!r}",
|
return parsed
|
||||||
cause=exc,
|
except json.JSONDecodeError as exc:
|
||||||
) from exc
|
record_provider_response(status=resp.status, body=body)
|
||||||
except urllib.error.HTTPError as exc:
|
preview = body[:300].replace("\n", "\\n")
|
||||||
body = ""
|
raise LLMAPIError(
|
||||||
try:
|
f"Invalid JSON response from {url}: {exc} - body preview: {preview!r}",
|
||||||
body = exc.read().decode()
|
cause=exc,
|
||||||
except Exception:
|
) from exc
|
||||||
pass
|
except urllib.error.HTTPError as exc:
|
||||||
|
body = ""
|
||||||
if exc.code == 429:
|
try:
|
||||||
raise LLMRateLimitError(
|
body = exc.read().decode()
|
||||||
f"Rate limited (429) from {url}",
|
except Exception:
|
||||||
status_code=429,
|
pass
|
||||||
response_body=body,
|
record_provider_response(status=exc.code, body=_json_or_text(body))
|
||||||
cause=exc,
|
|
||||||
) from exc
|
if exc.code == 429:
|
||||||
|
raise LLMRateLimitError(
|
||||||
raise LLMAPIError(
|
f"Rate limited (429) from {url}",
|
||||||
f"HTTP {exc.code} from {url}",
|
status_code=429,
|
||||||
status_code=exc.code,
|
response_body=body,
|
||||||
response_body=body,
|
cause=exc,
|
||||||
cause=exc,
|
) from exc
|
||||||
) from exc
|
|
||||||
except urllib.error.URLError as exc:
|
raise LLMAPIError(
|
||||||
if "timed out" in str(exc.reason):
|
f"HTTP {exc.code} from {url}",
|
||||||
raise LLMTimeoutError(
|
status_code=exc.code,
|
||||||
f"Request to {url} timed out after {timeout}s",
|
response_body=body,
|
||||||
cause=exc,
|
cause=exc,
|
||||||
) from exc
|
) from exc
|
||||||
raise LLMAPIError(
|
except urllib.error.URLError as exc:
|
||||||
f"URL error for {url}: {exc.reason}",
|
record_provider_response(body={"error": str(exc.reason)})
|
||||||
cause=exc,
|
if "timed out" in str(exc.reason):
|
||||||
) from exc
|
raise LLMTimeoutError(
|
||||||
except TimeoutError as exc:
|
f"Request to {url} timed out after {timeout}s",
|
||||||
raise LLMTimeoutError(
|
cause=exc,
|
||||||
f"Request to {url} timed out after {timeout}s",
|
) from exc
|
||||||
cause=exc,
|
raise LLMAPIError(
|
||||||
) from exc
|
f"URL error for {url}: {exc.reason}",
|
||||||
|
cause=exc,
|
||||||
|
) from exc
|
||||||
|
except TimeoutError as exc:
|
||||||
|
record_provider_response(body={"error": "timeout"})
|
||||||
|
raise LLMTimeoutError(
|
||||||
|
f"Request to {url} timed out after {timeout}s",
|
||||||
|
cause=exc,
|
||||||
|
) from exc
|
||||||
|
|
||||||
|
|
||||||
|
def _json_or_text(body: str) -> Any:
|
||||||
|
try:
|
||||||
|
return json.loads(body)
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
return body
|
||||||
|
|||||||
154
llm_connect/_payload.py
Normal file
@@ -0,0 +1,154 @@
"""Provider payload helpers for translating ``RunConfig.model_params``."""

from __future__ import annotations

import json
from typing import Any

from llm_connect._diagnostics import (
    diagnostics_enabled,
    json_safe,
    record_adapter_transformation,
)


# OpenAI Chat Completions fields that map straight through from model_params.
# Anything not in this set is provider-specific and must be either translated
# or dropped. Blind merges are deliberately avoided because OpenAI-compatible
# providers commonly reject unknown top-level fields with HTTP 400.
OPENAI_CHAT_PASSTHROUGH_FIELDS = frozenset(
    {
        "top_p",
        "n",
        "stream",
        "stop",
        "presence_penalty",
        "frequency_penalty",
        "logit_bias",
        "user",
        "seed",
        "tools",
        "tool_choice",
        "response_format",
        "logprobs",
        "top_logprobs",
        "parallel_tool_calls",
    }
)


DROPPED_NON_OPENAI_FIELDS = frozenset(
    {
        "reasoning_effort",
        "max_depth",
        "claude_cli_path",
        "json_schema",
    }
)


GEMINI_TOP_LEVEL_FIELDS = frozenset(
    {
        "safetySettings",
        "tools",
        "toolConfig",
        "systemInstruction",
        "cachedContent",
    }
)


GEMINI_GENERATION_CONFIG_FIELDS = frozenset(
    {
        "candidateCount",
        "stopSequences",
        "maxOutputTokens",
        "temperature",
        "topP",
        "topK",
        "responseMimeType",
        "responseSchema",
    }
)


GEMINI_GENERATION_CONFIG_ALIASES = {
    "candidate_count": "candidateCount",
    "stop_sequences": "stopSequences",
    "max_output_tokens": "maxOutputTokens",
    "top_p": "topP",
    "top_k": "topK",
    "response_mime_type": "responseMimeType",
    "response_schema": "responseSchema",
}


def merge_openai_chat_model_params(payload: dict[str, Any], model_params: dict[str, Any]) -> None:
    """Merge model_params into an OpenAI Chat Completions-style payload.

    Translates ``json_schema`` to ``response_format``, passes known OpenAI
    fields through, and drops Claude/llm-connect-only knobs.
    """

    before = json_safe(payload) if diagnostics_enabled() else None

    schema = _coerce_json_schema(model_params.get("json_schema"))
    caller_response_format = model_params.get("response_format")
    if schema is not None and caller_response_format is None and "response_format" not in payload:
        payload["response_format"] = {
            "type": "json_schema",
            "json_schema": {
                "name": "structured_output",
                "schema": schema,
                "strict": False,
            },
        }

    for key, value in model_params.items():
        if key in DROPPED_NON_OPENAI_FIELDS:
            continue
        if key in OPENAI_CHAT_PASSTHROUGH_FIELDS:
            payload[key] = value

    if before is not None:
        record_adapter_transformation("merge_model_params.openai_chat", before, payload)


def merge_gemini_model_params(payload: dict[str, Any], model_params: dict[str, Any]) -> None:
    """Merge model_params into a Gemini ``generateContent`` payload."""

    before = json_safe(payload) if diagnostics_enabled() else None
    generation_config = payload.setdefault("generationConfig", {})

    schema = _coerce_json_schema(model_params.get("json_schema"))
    if schema is not None and "responseSchema" not in generation_config:
        generation_config["responseMimeType"] = "application/json"
        generation_config["responseSchema"] = schema

    explicit_generation_config = model_params.get("generationConfig")
    if isinstance(explicit_generation_config, dict):
        generation_config.update(explicit_generation_config)

    for key, value in model_params.items():
        if key in {"json_schema", "generationConfig", "reasoning_effort", "max_depth"}:
            continue
        if key in GEMINI_TOP_LEVEL_FIELDS:
            payload[key] = value
            continue
        gemini_key = GEMINI_GENERATION_CONFIG_ALIASES.get(key, key)
        if gemini_key in GEMINI_GENERATION_CONFIG_FIELDS:
            generation_config[gemini_key] = value

    if before is not None:
        record_adapter_transformation("merge_model_params.gemini", before, payload)


def _coerce_json_schema(schema: Any) -> dict[str, Any] | None:
    if isinstance(schema, str):
        try:
            schema = json.loads(schema)
        except (TypeError, ValueError):
            return None
    if isinstance(schema, dict):
        return schema
    return None
@@ -1,277 +1,289 @@
|
|||||||
"""
|
"""
|
||||||
Claude Code CLI adapter — runs the ``claude`` CLI as a subprocess.
|
Claude Code CLI adapter - runs the ``claude`` CLI as a subprocess.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import asyncio
|
import asyncio
|
||||||
import json
|
import json
|
||||||
import os
|
import os
|
||||||
import subprocess
|
import subprocess
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Optional
|
from typing import Optional
|
||||||
|
|
||||||
from llm_connect.adapter import LLMAdapter
|
from llm_connect._diagnostics import (
|
||||||
from llm_connect.models import RunConfig, LLMResponse
|
record_adapter_transformation,
|
||||||
from llm_connect.config import LLMConfig
|
record_provider_request,
|
||||||
from llm_connect._token_estimator import estimate_tokens
|
record_provider_response,
|
||||||
from llm_connect.exceptions import (
|
)
|
||||||
LLMSubprocessError,
|
from llm_connect._token_estimator import estimate_tokens
|
||||||
LLMTimeoutError,
|
from llm_connect.adapter import LLMAdapter
|
||||||
)
|
from llm_connect.config import LLMConfig
|
||||||
|
from llm_connect.exceptions import LLMSubprocessError, LLMTimeoutError
|
||||||
|
from llm_connect.models import LLMResponse, RunConfig
|
||||||
class ClaudeCodeAdapter(LLMAdapter):
|
|
||||||
"""LLM adapter that shells out to the ``claude`` CLI with ``--print``.
|
|
||||||
|
class ClaudeCodeAdapter(LLMAdapter):
|
||||||
The compiled prompt is piped via **stdin** to avoid shell argument
|
"""LLM adapter that shells out to the ``claude`` CLI with ``--print``.
|
||||||
length limits (compiled prompts can exceed 30 KB).
|
|
||||||
"""
|
The compiled prompt is piped via stdin to avoid shell argument length
|
||||||
|
limits. Compiled prompts can exceed 30 KB.
|
||||||
def __init__(
|
"""
|
||||||
self,
|
|
||||||
cli_path: Optional[str] = None,
|
def __init__(
|
||||||
model: Optional[str] = None,
|
self,
|
||||||
config: Optional[LLMConfig] = None,
|
cli_path: Optional[str] = None,
|
||||||
):
|
model: Optional[str] = None,
|
||||||
self._config = config or LLMConfig(provider="claude-code")
|
config: Optional[LLMConfig] = None,
|
||||||
self._cli_path = cli_path or self._resolve_cli_path()
|
):
|
||||||
self._model = model
|
self._config = config or LLMConfig(provider="claude-code")
|
||||||
|
self._cli_path = cli_path or self._resolve_cli_path()
|
||||||
# ── LLMAdapter interface ────────────────────────────────────────
|
self._model = model
|
||||||
|
|
||||||
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
|
# LLMAdapter interface
|
||||||
self._preflight_budget(config)
|
|
||||||
cmd = self._build_command(config)
|
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
|
||||||
|
self._preflight_budget(config)
|
||||||
timeout = config.timeout_seconds or self._config.timeout_seconds
|
cmd = self._build_command(config)
|
||||||
|
|
||||||
try:
|
timeout = config.timeout_seconds or self._config.timeout_seconds
|
||||||
result = subprocess.run(
|
record_provider_request(command=cmd, payload={"stdin": prompt})
|
||||||
cmd,
|
|
||||||
input=prompt,
|
try:
|
||||||
capture_output=True,
|
result = subprocess.run(
|
||||||
text=True,
|
cmd,
|
||||||
timeout=timeout,
|
input=prompt,
|
||||||
)
|
capture_output=True,
|
||||||
except subprocess.TimeoutExpired as exc:
|
text=True,
|
||||||
raise LLMTimeoutError(
|
timeout=timeout,
|
||||||
f"claude CLI timed out after {timeout}s",
|
)
|
||||||
cause=exc,
|
except subprocess.TimeoutExpired as exc:
|
||||||
) from exc
|
raise LLMTimeoutError(
|
||||||
|
f"claude CLI timed out after {timeout}s",
|
||||||
if result.returncode != 0:
|
cause=exc,
|
||||||
raise LLMSubprocessError(
|
) from exc
|
||||||
f"claude CLI exited with code {result.returncode}",
|
|
||||||
return_code=result.returncode,
|
record_provider_response(
|
||||||
stderr=result.stderr,
|
status=result.returncode,
|
||||||
)
|
body={"stdout": result.stdout, "stderr": result.stderr},
|
||||||
|
)
|
||||||
content = _unwrap_cli_json_envelope(result.stdout, config)
|
if result.returncode != 0:
|
||||||
prompt_tokens = estimate_tokens(prompt)
|
raise LLMSubprocessError(
|
||||||
completion_tokens = estimate_tokens(content)
|
f"claude CLI exited with code {result.returncode}",
|
||||||
|
return_code=result.returncode,
|
||||||
response = LLMResponse(
|
stderr=result.stderr,
|
||||||
content=content,
|
)
|
||||||
model=self._model or "claude-code-cli",
|
|
||||||
usage={
|
content = _unwrap_cli_json_envelope(result.stdout, config)
|
||||||
"prompt_tokens": prompt_tokens,
|
prompt_tokens = estimate_tokens(prompt)
|
||||||
"completion_tokens": completion_tokens,
|
completion_tokens = estimate_tokens(content)
|
||||||
"total_tokens": prompt_tokens + completion_tokens,
|
|
||||||
},
|
response = LLMResponse(
|
||||||
finish_reason="stop",
|
content=content,
|
||||||
metadata={
|
model=self._model or "claude-code-cli",
|
||||||
"provider": "claude-code",
|
usage={
|
||||||
"cli_path": self._cli_path,
|
"prompt_tokens": prompt_tokens,
|
||||||
},
|
"completion_tokens": completion_tokens,
|
||||||
)
|
"total_tokens": prompt_tokens + completion_tokens,
|
||||||
self._consume_budget(config, response)
|
},
|
||||||
return response
|
finish_reason="stop",
|
||||||
|
metadata={
|
||||||
async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
|
"provider": "claude-code",
|
||||||
"""Native async implementation using asyncio.create_subprocess_exec."""
|
"cli_path": self._cli_path,
|
||||||
self._preflight_budget(config)
|
},
|
||||||
cmd = self._build_command(config)
|
)
|
||||||
|
self._consume_budget(config, response)
|
||||||
timeout = config.timeout_seconds or self._config.timeout_seconds
|
return response
|
||||||
|
|
||||||
try:
|
async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
|
||||||
proc = await asyncio.create_subprocess_exec(
|
"""Native async implementation using asyncio.create_subprocess_exec."""
|
||||||
*cmd,
|
self._preflight_budget(config)
|
||||||
stdin=asyncio.subprocess.PIPE,
|
cmd = self._build_command(config)
|
||||||
stdout=asyncio.subprocess.PIPE,
|
|
||||||
stderr=asyncio.subprocess.PIPE,
|
timeout = config.timeout_seconds or self._config.timeout_seconds
|
||||||
)
|
record_provider_request(command=cmd, payload={"stdin": prompt})
|
||||||
stdout_bytes, stderr_bytes = await asyncio.wait_for(
|
|
||||||
proc.communicate(input=prompt.encode()),
|
try:
|
||||||
timeout=timeout,
|
proc = await asyncio.create_subprocess_exec(
|
||||||
)
|
*cmd,
|
||||||
except asyncio.TimeoutError as exc:
|
stdin=asyncio.subprocess.PIPE,
|
||||||
raise LLMTimeoutError(
|
stdout=asyncio.subprocess.PIPE,
|
||||||
f"claude CLI timed out after {timeout}s",
|
stderr=asyncio.subprocess.PIPE,
|
||||||
cause=exc,
|
)
|
||||||
) from exc
|
stdout_bytes, stderr_bytes = await asyncio.wait_for(
|
||||||
|
proc.communicate(input=prompt.encode()),
|
||||||
if proc.returncode != 0:
|
timeout=timeout,
|
||||||
raise LLMSubprocessError(
|
)
|
||||||
f"claude CLI exited with code {proc.returncode}",
|
except asyncio.TimeoutError as exc:
|
||||||
return_code=proc.returncode,
|
raise LLMTimeoutError(
|
||||||
stderr=stderr_bytes.decode(),
|
f"claude CLI timed out after {timeout}s",
|
||||||
)
|
cause=exc,
|
||||||
|
) from exc
|
||||||
content = _unwrap_cli_json_envelope(stdout_bytes.decode(), config)
|
|
||||||
prompt_tokens = estimate_tokens(prompt)
|
stdout = stdout_bytes.decode()
|
||||||
completion_tokens = estimate_tokens(content)
|
stderr = stderr_bytes.decode()
|
||||||
|
record_provider_response(
|
||||||
response = LLMResponse(
|
status=proc.returncode,
|
||||||
content=content,
|
body={"stdout": stdout, "stderr": stderr},
|
||||||
model=self._model or "claude-code-cli",
|
)
|
||||||
usage={
|
if proc.returncode != 0:
|
||||||
"prompt_tokens": prompt_tokens,
|
raise LLMSubprocessError(
|
||||||
"completion_tokens": completion_tokens,
|
f"claude CLI exited with code {proc.returncode}",
|
||||||
"total_tokens": prompt_tokens + completion_tokens,
|
return_code=proc.returncode,
|
||||||
},
|
stderr=stderr,
|
||||||
finish_reason="stop",
|
)
|
||||||
metadata={
|
|
||||||
"provider": "claude-code",
|
content = _unwrap_cli_json_envelope(stdout, config)
|
||||||
"cli_path": self._cli_path,
|
prompt_tokens = estimate_tokens(prompt)
|
||||||
"async": True,
|
completion_tokens = estimate_tokens(content)
|
||||||
},
|
|
||||||
)
|
response = LLMResponse(
|
||||||
self._consume_budget(config, response)
|
content=content,
|
||||||
return response
|
model=self._model or "claude-code-cli",
|
||||||
|
usage={
|
||||||
def validate_config(self, config: RunConfig) -> bool:
|
"prompt_tokens": prompt_tokens,
|
||||||
try:
|
"completion_tokens": completion_tokens,
|
||||||
result = subprocess.run(
|
"total_tokens": prompt_tokens + completion_tokens,
|
||||||
[self._cli_path, "--version"],
|
},
|
||||||
capture_output=True,
|
finish_reason="stop",
|
||||||
text=True,
|
metadata={
|
||||||
timeout=10,
|
"provider": "claude-code",
|
||||||
)
|
"cli_path": self._cli_path,
|
||||||
return result.returncode == 0
|
"async": True,
|
||||||
except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
|
},
|
||||||
return False
|
)
|
||||||
|
self._consume_budget(config, response)
|
||||||
def _build_command(self, config: RunConfig) -> list[str]:
|
return response
|
||||||
cmd = [self._cli_path, "--print"]
|
|
||||||
if self._model:
|
def validate_config(self, config: RunConfig) -> bool:
|
||||||
cmd.extend(["--model", self._model])
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
json_schema = _json_schema_arg(config)
|
[self._cli_path, "--version"],
|
||||||
if json_schema:
|
capture_output=True,
|
||||||
cmd.extend(["--json-schema", json_schema])
|
text=True,
|
||||||
# With --json-schema alone the CLI prints conversational text on
|
timeout=10,
|
||||||
# stdout while the structured payload ships on a sidecar channel
|
)
|
||||||
# callers cannot reach. --output-format json forces the structured
|
return result.returncode == 0
|
||||||
# response (wrapped in an envelope) onto stdout.
|
except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
|
||||||
cmd.extend(["--output-format", "json"])
|
return False
|
||||||
return cmd
|
|
||||||
|
def _build_command(self, config: RunConfig) -> list[str]:
|
||||||
def _resolve_cli_path(self) -> str:
|
cmd = [self._cli_path, "--print"]
|
||||||
configured = (
|
if self._model:
|
||||||
os.environ.get("LLM_CONNECT_CLAUDE_CLI_PATH")
|
cmd.extend(["--model", self._model])
|
||||||
or os.environ.get("CLAUDE_CLI_PATH")
|
|
||||||
or self._config.claude_cli_path
|
json_schema = _json_schema_arg(config)
|
||||||
)
|
if json_schema:
|
||||||
if configured and configured != "claude":
|
cmd.extend(["--json-schema", json_schema])
|
||||||
return configured
|
# With --json-schema alone the CLI prints conversational text on
|
||||||
|
# stdout while the structured payload ships on a sidecar channel
|
||||||
local_cli = Path.home() / ".local" / "bin" / "claude"
|
# callers cannot reach. --output-format json forces the structured
|
||||||
if local_cli.exists():
|
# response (wrapped in an envelope) onto stdout.
|
||||||
return str(local_cli)
|
cmd.extend(["--output-format", "json"])
|
||||||
return configured or "claude"
|
return cmd
|
||||||
|
|
||||||
|
def _resolve_cli_path(self) -> str:
|
||||||
def _json_schema_arg(config: RunConfig) -> str | None:
|
configured = (
|
||||||
schema = (config.model_params or {}).get("json_schema")
|
os.environ.get("LLM_CONNECT_CLAUDE_CLI_PATH")
|
||||||
if not schema:
|
or os.environ.get("CLAUDE_CLI_PATH")
|
||||||
return None
|
or self._config.claude_cli_path
|
||||||
if isinstance(schema, str):
|
)
|
||||||
return schema
|
if configured and configured != "claude":
|
||||||
if isinstance(schema, dict):
|
return configured
|
||||||
return json.dumps(schema, separators=(",", ":"))
|
|
||||||
return None
|
local_cli = Path.home() / ".local" / "bin" / "claude"
|
||||||
|
if local_cli.exists():
|
||||||
|
return str(local_cli)
|
||||||
# Envelope field names Claude Code's `--output-format json` is known to use
|
return configured or "claude"
|
||||||
# for the model's primary textual response. Used as a fall-back when no field
|
|
||||||
# carries a JSON-parseable payload (e.g. plain prose generation).
|
|
||||||
_ENVELOPE_TEXT_FIELDS = ("result", "result_text", "content", "text", "output")
|
def _json_schema_arg(config: RunConfig) -> str | None:
|
||||||
|
schema = (config.model_params or {}).get("json_schema")
|
||||||
|
if not schema:
|
||||||
def _unwrap_cli_json_envelope(stdout: str, config: RunConfig) -> str:
|
return None
|
||||||
"""Extract the model's payload from Claude CLI's --output-format json envelope.
|
if isinstance(schema, str):
|
||||||
|
return schema
|
||||||
Only runs when --json-schema was set (the only code path that adds
|
if isinstance(schema, dict):
|
||||||
--output-format json to the CLI invocation). Other callers keep the raw
|
return json.dumps(schema, separators=(",", ":"))
|
||||||
stdout behavior unchanged.
|
return None
|
||||||
|
|
||||||
Strategy: when --json-schema is set the caller wants JSON back, so prefer
|
|
||||||
any envelope field whose value is itself valid JSON (dict, list, or a
|
# Envelope field names Claude Code's --output-format json is known to use for
|
||||||
string that parses as JSON). This handles two observed envelope shapes:
|
# the model's primary textual response. Used as a fallback when no field carries
|
||||||
|
# a JSON-parseable payload, such as plain prose generation.
|
||||||
1. Short prompts where the model emits the structured payload directly
|
_ENVELOPE_TEXT_FIELDS = ("result", "result_text", "content", "text", "output")
|
||||||
in the `result` field as a JSON-encoded string.
|
|
||||||
2. Longer prompts where the model emits a conversational preamble in
|
|
||||||
`result` and the schema-enforced JSON in a separate field (the exact
|
def _unwrap_cli_json_envelope(stdout: str, config: RunConfig) -> str:
|
||||||
field name varies across CLI versions).
|
"""Extract the model's payload from Claude CLI's --output-format json envelope.
|
||||||
|
|
||||||
Fall back to the first text field only when no JSON-bearing field exists,
|
Only runs when --json-schema was set. Other callers keep the raw stdout
|
||||||
so non-schema callers via this code path still see the model's prose.
|
behavior unchanged.
|
||||||
Surface the raw envelope as a last resort so the operator can see what
|
"""
|
||||||
shape arrived and extend the strategy.
|
if not _json_schema_arg(config):
|
||||||
"""
|
return stdout
|
||||||
if not _json_schema_arg(config):
|
text = stdout.strip()
|
||||||
return stdout
|
if not text:
|
||||||
text = stdout.strip()
|
return stdout
|
||||||
if not text:
|
try:
|
||||||
return stdout
|
envelope = json.loads(text)
|
||||||
try:
|
except json.JSONDecodeError:
|
||||||
envelope = json.loads(text)
|
return stdout
|
||||||
except json.JSONDecodeError:
|
if not isinstance(envelope, dict):
|
||||||
return stdout
|
return stdout
|
||||||
if not isinstance(envelope, dict):
|
|
||||||
return stdout
|
json_payload = _find_json_payload(envelope)
|
||||||
|
if json_payload is not None:
|
||||||
json_payload = _find_json_payload(envelope)
|
return _record_unwrap(stdout, json_payload)
|
||||||
if json_payload is not None:
|
|
||||||
return json_payload
|
for key in _ENVELOPE_TEXT_FIELDS:
|
||||||
|
value = envelope.get(key)
|
||||||
for key in _ENVELOPE_TEXT_FIELDS:
|
if isinstance(value, str):
|
||||||
value = envelope.get(key)
|
return _record_unwrap(stdout, value)
|
||||||
if isinstance(value, str):
|
if isinstance(value, (dict, list)):
|
||||||
return value
|
return _record_unwrap(stdout, json.dumps(value))
|
||||||
if isinstance(value, (dict, list)):
|
|
||||||
return json.dumps(value)
|
return stdout
|
||||||
|
|
||||||
return stdout
|
|
||||||
|
def _find_json_payload(envelope: dict) -> str | None:
|
||||||
|
"""Return the first envelope value that represents valid JSON."""
|
||||||
def _find_json_payload(envelope: dict) -> str | None:
|
for key, value in envelope.items():
|
||||||
"""Return the first envelope value that represents valid JSON.
|
if key in _ENVELOPE_METADATA_KEYS:
|
||||||
|
continue
|
||||||
Insertion order is preserved by Python dicts, so this prefers fields the
|
if isinstance(value, (dict, list)):
|
||||||
CLI lists earliest in its envelope. Skips obvious metadata keys (cost,
|
return json.dumps(value)
|
||||||
usage, timing) so we never accidentally pick a numeric or telemetry value.
|
if isinstance(value, str):
|
||||||
"""
|
stripped = value.strip()
|
||||||
for key, value in envelope.items():
|
if stripped.startswith(("{", "[")):
|
||||||
if key in _ENVELOPE_METADATA_KEYS:
|
try:
|
||||||
continue
|
json.loads(stripped)
|
||||||
if isinstance(value, (dict, list)):
|
except json.JSONDecodeError:
|
||||||
return json.dumps(value)
|
continue
|
||||||
if isinstance(value, str):
|
return stripped
|
||||||
stripped = value.strip()
|
return None
|
||||||
if stripped.startswith(("{", "[")):
|
|
||||||
try:
|
|
||||||
json.loads(stripped)
|
# Envelope keys that carry telemetry, never the model payload.
|
||||||
except json.JSONDecodeError:
|
_ENVELOPE_METADATA_KEYS = frozenset(
|
||||||
continue
|
{
|
||||||
return stripped
|
"type",
|
||||||
return None
|
"subtype",
|
||||||
|
"model",
|
||||||
|
"usage",
|
||||||
# Envelope keys that carry telemetry, never the model payload.
|
"total_cost_usd",
|
||||||
_ENVELOPE_METADATA_KEYS = frozenset({
|
"cost_usd",
|
||||||
"type", "subtype", "model", "usage", "total_cost_usd", "cost_usd",
|
"duration_ms",
|
||||||
"duration_ms", "duration_api_ms", "num_turns", "session_id",
|
"duration_api_ms",
|
||||||
"is_error", "stop_reason", "permission_denials", "uuid",
|
"num_turns",
|
||||||
})
|
"session_id",
|
||||||
|
"is_error",
|
||||||
|
"stop_reason",
|
||||||
|
"permission_denials",
|
||||||
|
"uuid",
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _record_unwrap(stdout: str, content: str) -> str:
|
||||||
|
if content != stdout:
|
||||||
|
record_adapter_transformation("unwrap_cli_envelope", stdout, content)
|
||||||
|
return content
|
||||||
|
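The unwrap strategy described in the `_unwrap_cli_json_envelope` docstring above can be sketched in isolation as follows. This is a minimal illustration, not the adapter's code: the names `TEXT_FIELDS`, `METADATA_KEYS`, `find_json_payload`, and `unwrap` are stand-ins, the metadata set is abbreviated, and the `structured_output` field name in the usage note is hypothetical (the diff states the real field name varies across CLI versions).

```python
import json
from typing import Optional

# Stand-ins for _ENVELOPE_TEXT_FIELDS and an abbreviated _ENVELOPE_METADATA_KEYS.
TEXT_FIELDS = ("result", "result_text", "content", "text", "output")
METADATA_KEYS = frozenset({"type", "subtype", "model", "usage", "session_id", "is_error"})


def find_json_payload(envelope: dict) -> Optional[str]:
    """Return the first non-metadata value that is, or parses as, JSON."""
    for key, value in envelope.items():
        if key in METADATA_KEYS:
            continue
        if isinstance(value, (dict, list)):
            return json.dumps(value)
        if isinstance(value, str):
            stripped = value.strip()
            if stripped.startswith(("{", "[")):
                try:
                    json.loads(stripped)
                except json.JSONDecodeError:
                    continue
                return stripped
    return None


def unwrap(stdout: str) -> str:
    """Prefer a JSON-bearing field, then a known text field, then raw stdout."""
    try:
        envelope = json.loads(stdout.strip())
    except json.JSONDecodeError:
        return stdout
    if not isinstance(envelope, dict):
        return stdout
    payload = find_json_payload(envelope)
    if payload is not None:
        return payload
    for key in TEXT_FIELDS:
        value = envelope.get(key)
        if isinstance(value, str):
            return value
    return stdout
```

With an envelope such as `{"type": "result", "result": "Here is the data.", "structured_output": {"ok": true}}`, the sketch skips the prose in `result` and returns the JSON from the later field; when no field carries JSON, it returns the prose from `result`; and output that is not a JSON object passes through unchanged.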
|||||||
@@ -9,6 +9,7 @@ from llm_connect.adapter import LLMAdapter
|
|||||||
from llm_connect.models import RunConfig, LLMResponse
|
from llm_connect.models import RunConfig, LLMResponse
|
||||||
from llm_connect.config import resolve_api_key, find_project_root
|
from llm_connect.config import resolve_api_key, find_project_root
|
||||||
from llm_connect._http import post_json
|
from llm_connect._http import post_json
|
||||||
|
from llm_connect._payload import merge_gemini_model_params
|
||||||
from llm_connect.exceptions import LLMConfigurationError
|
from llm_connect.exceptions import LLMConfigurationError
|
||||||
|
|
||||||
_DEFAULT_MODEL = "gemini-2.5-flash"
|
_DEFAULT_MODEL = "gemini-2.5-flash"
|
||||||
@@ -74,6 +75,8 @@ class GeminiAdapter(LLMAdapter):
|
|||||||
"maxOutputTokens": config.max_tokens,
|
"maxOutputTokens": config.max_tokens,
|
||||||
},
|
},
|
||||||
}
|
}
|
||||||
|
if config.model_params:
|
||||||
|
merge_gemini_model_params(payload, config.model_params)
|
||||||
|
|
||||||
url = f"{_API_BASE}/models/{model}:generateContent?key={self._api_key}"
|
url = f"{_API_BASE}/models/{model}:generateContent?key={self._api_key}"
|
||||||
|
|
||||||
|
|||||||
@@ -9,6 +9,7 @@ from llm_connect.adapter import LLMAdapter
|
|||||||
from llm_connect.models import RunConfig, LLMResponse
|
from llm_connect.models import RunConfig, LLMResponse
|
||||||
from llm_connect.config import resolve_api_key, find_project_root
|
from llm_connect.config import resolve_api_key, find_project_root
|
||||||
from llm_connect._http import post_json
|
from llm_connect._http import post_json
|
||||||
|
from llm_connect._payload import merge_openai_chat_model_params
|
||||||
from llm_connect.exceptions import (
|
from llm_connect.exceptions import (
|
||||||
LLMConfigurationError,
|
LLMConfigurationError,
|
||||||
LLMAPIError,
|
LLMAPIError,
|
||||||
@@ -65,6 +66,8 @@ class OpenAIAdapter(LLMAdapter):
|
|||||||
"temperature": config.temperature,
|
"temperature": config.temperature,
|
||||||
"max_tokens": config.max_tokens,
|
"max_tokens": config.max_tokens,
|
||||||
}
|
}
|
||||||
|
if config.model_params:
|
||||||
|
merge_openai_chat_model_params(payload, config.model_params)
|
||||||
|
|
||||||
headers = {
|
headers = {
|
||||||
"Authorization": f"Bearer {self._api_key}",
|
"Authorization": f"Bearer {self._api_key}",
|
||||||
|
|||||||
@@ -1,221 +1,151 @@
|
|||||||
"""
|
"""
|
||||||
OpenRouter adapter — calls the OpenAI-compatible chat completions API.
|
OpenRouter adapter - calls the OpenAI-compatible chat completions API.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import time
|
import time
|
||||||
from typing import Optional, Dict, Any
|
from typing import Any, Dict, Optional
|
||||||
|
|
||||||
from llm_connect.adapter import LLMAdapter
|
from llm_connect._http import post_json
|
||||||
from llm_connect.models import RunConfig, LLMResponse
|
from llm_connect._payload import merge_openai_chat_model_params
|
||||||
from llm_connect.config import LLMConfig, resolve_api_key, find_project_root
|
from llm_connect.adapter import LLMAdapter
|
||||||
from llm_connect._http import post_json
|
from llm_connect.config import LLMConfig, find_project_root, resolve_api_key
|
||||||
from llm_connect.exceptions import (
|
from llm_connect.exceptions import LLMAPIError, LLMRateLimitError
|
||||||
LLMConfigurationError,
|
from llm_connect.models import LLMResponse, RunConfig
|
||||||
LLMAPIError,
|
|
||||||
LLMRateLimitError,
|
_DEFAULT_MODEL = "anthropic/claude-sonnet-4"
|
||||||
)
|
|
||||||
|
|
||||||
_DEFAULT_MODEL = "anthropic/claude-sonnet-4"
|
class OpenRouterAdapter(LLMAdapter):
|
||||||
|
"""LLM adapter that calls the OpenRouter chat completions endpoint.
|
||||||
|
|
||||||
class OpenRouterAdapter(LLMAdapter):
|
Constructor args override values from *config*; *config* overrides
|
||||||
"""LLM adapter that calls the OpenRouter chat completions endpoint.
|
global defaults. The model used for a given call is resolved as:
|
||||||
|
``constructor model > RunConfig.model_name > default``.
|
||||||
Constructor args override values from *config*; *config* overrides
|
"""
|
||||||
global defaults. The model used for a given call is resolved as:
|
|
||||||
``constructor model > RunConfig.model_name > default``.
|
def __init__(
|
||||||
"""
|
self,
|
||||||
|
model: Optional[str] = None,
|
||||||
def __init__(
|
api_key: Optional[str] = None,
|
||||||
self,
|
api_base: Optional[str] = None,
|
||||||
model: Optional[str] = None,
|
config: Optional[LLMConfig] = None,
|
||||||
api_key: Optional[str] = None,
|
system_prompt: Optional[str] = None,
|
||||||
api_base: Optional[str] = None,
|
extra_headers: Optional[Dict[str, str]] = None,
|
||||||
config: Optional[LLMConfig] = None,
|
max_retries: Optional[int] = None,
|
||||||
system_prompt: Optional[str] = None,
|
):
|
||||||
extra_headers: Optional[Dict[str, str]] = None,
|
self._config = config or LLMConfig()
|
||||||
max_retries: Optional[int] = None,
|
# Track whether the model was explicitly supplied (constructor or
|
||||||
):
|
# LLMConfig). Comparing self._model to _DEFAULT_MODEL is not enough:
|
||||||
self._config = config or LLMConfig()
|
# callers who pass --model anthropic/claude-sonnet-4 happen to match
|
||||||
# Track whether the model was explicitly supplied (constructor or
|
# the default and would otherwise be misrouted to RunConfig.model_name
|
||||||
# LLMConfig). Comparing self._model to _DEFAULT_MODEL is not enough —
|
# (which defaults to "gpt-4", quietly sending every call to OpenAI's
|
||||||
# callers who pass --model anthropic/claude-sonnet-4 happen to match
|
# gpt-4 model, which is what broke the activity-core CUST-WP-0045
|
||||||
# the default and would otherwise be misrouted to RunConfig.model_name
|
# canary on 2026-06-02).
|
||||||
# (which defaults to "gpt-4" — quietly sending every call to OpenAI's
|
self._explicit_model = model is not None or self._config.model is not None
|
||||||
# gpt-4 model, which is what broke the activity-core CUST-WP-0045
|
self._model = model or self._config.model or _DEFAULT_MODEL
|
||||||
# canary on 2026-06-02).
|
self._api_base = (api_base or self._config.api_base).rstrip("/")
|
||||||
self._explicit_model = model is not None or self._config.model is not None
|
self._system_prompt = system_prompt
|
||||||
self._model = model or self._config.model or _DEFAULT_MODEL
|
self._extra_headers = extra_headers or {}
|
||||||
self._api_base = (api_base or self._config.api_base).rstrip("/")
|
self._max_retries = max_retries if max_retries is not None else self._config.max_retries
|
||||||
self._system_prompt = system_prompt
|
|
||||||
self._extra_headers = extra_headers or {}
|
root = find_project_root()
|
||||||
self._max_retries = max_retries if max_retries is not None else self._config.max_retries
|
key_file_paths = [root / "apikey-openrouter.txt"] if root else []
|
||||||
|
self._api_key = resolve_api_key(
|
||||||
# Resolve API key
|
explicit=api_key or self._config.api_key,
|
||||||
root = find_project_root()
|
env_var="OPENROUTER_API_KEY",
|
||||||
key_file_paths = [root / "apikey-openrouter.txt"] if root else []
|
key_file_paths=key_file_paths,
|
||||||
self._api_key = resolve_api_key(
|
)
|
||||||
explicit=api_key or self._config.api_key,
|
|
||||||
env_var="OPENROUTER_API_KEY",
|
# LLMAdapter interface
|
||||||
key_file_paths=key_file_paths,
|
|
||||||
)
|
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
|
||||||
|
self._preflight_budget(config)
|
||||||
# ── LLMAdapter interface ────────────────────────────────────────
|
# Explicit constructor/LLMConfig model wins; only fall back to the
|
||||||
|
# per-call RunConfig.model_name when the adapter was not told what to
|
||||||
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
|
# use. RunConfig.model_name defaults to "gpt-4", so falling back
|
||||||
self._preflight_budget(config)
|
# unconditionally would silently misroute callers.
|
||||||
# Explicit constructor/LLMConfig model wins; only fall back to the
|
if self._explicit_model:
|
||||||
# per-call RunConfig.model_name when the adapter wasn't told what to
|
model = self._model
|
||||||
# use. RunConfig.model_name defaults to "gpt-4", so falling back
|
else:
|
||||||
# unconditionally would silently misroute callers.
|
model = config.model_name or self._model
|
||||||
if self._explicit_model:
|
|
||||||
model = self._model
|
messages: list[Dict[str, str]] = []
|
||||||
else:
|
if self._system_prompt:
|
||||||
model = config.model_name or self._model
|
messages.append({"role": "system", "content": self._system_prompt})
|
||||||
|
messages.append({"role": "user", "content": prompt})
|
||||||
messages: list[Dict[str, str]] = []
|
|
||||||
if self._system_prompt:
|
payload: Dict[str, Any] = {
|
||||||
messages.append({"role": "system", "content": self._system_prompt})
|
"model": model,
|
||||||
messages.append({"role": "user", "content": prompt})
|
"messages": messages,
|
||||||
|
"temperature": config.temperature,
|
||||||
payload: Dict[str, Any] = {
|
"max_tokens": config.max_tokens,
|
||||||
"model": model,
|
}
|
||||||
"messages": messages,
|
if config.model_params:
|
||||||
"temperature": config.temperature,
|
merge_openai_chat_model_params(payload, config.model_params)
|
||||||
"max_tokens": config.max_tokens,
|
|
||||||
}
|
headers = {
|
||||||
if config.model_params:
|
"Authorization": f"Bearer {self._api_key}",
|
||||||
_merge_model_params(payload, config.model_params)
|
**self._extra_headers,
|
||||||
|
}
|
||||||
headers = {
|
url = f"{self._api_base}/chat/completions"
|
||||||
"Authorization": f"Bearer {self._api_key}",
|
|
||||||
**self._extra_headers,
|
start = time.time()
|
||||||
}
|
data = self._post_with_retries(url, payload, headers, config.timeout_seconds)
|
||||||
url = f"{self._api_base}/chat/completions"
|
latency = time.time() - start
|
||||||
|
|
||||||
start = time.time()
|
choice = data.get("choices", [{}])[0]
|
||||||
data = self._post_with_retries(url, payload, headers, config.timeout_seconds)
|
content = choice.get("message", {}).get("content", "")
|
||||||
latency = time.time() - start
|
finish_reason = choice.get("finish_reason", "stop")
|
||||||
|
usage = data.get("usage", {})
|
||||||
# Parse response
|
|
||||||
choice = data.get("choices", [{}])[0]
|
response = LLMResponse(
|
||||||
content = choice.get("message", {}).get("content", "")
|
content=content,
|
||||||
finish_reason = choice.get("finish_reason", "stop")
|
model=data.get("model", model),
|
||||||
usage = data.get("usage", {})
|
usage={
|
||||||
|
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||||
response = LLMResponse(
|
"completion_tokens": usage.get("completion_tokens", 0),
|
||||||
content=content,
|
"total_tokens": usage.get("total_tokens", 0),
|
||||||
model=data.get("model", model),
|
},
|
||||||
usage={
|
finish_reason=finish_reason,
|
||||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
metadata={
|
||||||
"completion_tokens": usage.get("completion_tokens", 0),
|
"provider": "openrouter",
|
||||||
"total_tokens": usage.get("total_tokens", 0),
|
"latency_seconds": round(latency, 3),
|
||||||
},
|
"response_id": data.get("id", ""),
|
||||||
finish_reason=finish_reason,
|
},
|
||||||
metadata={
|
)
|
||||||
"provider": "openrouter",
|
self._consume_budget(config, response)
|
||||||
"latency_seconds": round(latency, 3),
|
return response
|
||||||
"response_id": data.get("id", ""),
|
|
||||||
},
|
def validate_config(self, config: RunConfig) -> bool:
|
||||||
)
|
if not self._api_key:
|
||||||
self._consume_budget(config, response)
|
return False
|
||||||
return response
|
if not (self._model or config.model_name):
|
||||||
|
return False
|
||||||
def validate_config(self, config: RunConfig) -> bool:
|
if not (0.0 <= config.temperature <= 2.0):
|
||||||
if not self._api_key:
|
return False
|
||||||
return False
|
return True
|
||||||
if not (self._model or config.model_name):
|
|
||||||
return False
|
# Internals
|
||||||
if not (0.0 <= config.temperature <= 2.0):
|
|
||||||
return False
|
def _post_with_retries(
|
||||||
return True
|
self,
|
||||||
|
url: str,
|
||||||
# ── Internals ───────────────────────────────────────────────────
|
payload: Dict[str, Any],
|
||||||
|
headers: Dict[str, str],
|
||||||
def _post_with_retries(
|
timeout: int,
|
||||||
self,
|
) -> Dict[str, Any]:
|
||||||
url: str,
|
last_exc: Optional[Exception] = None
|
||||||
payload: Dict[str, Any],
|
for attempt in range(self._max_retries + 1):
|
||||||
headers: Dict[str, str],
|
try:
|
||||||
timeout: int,
|
return post_json(url, payload, headers, timeout=timeout)
|
||||||
) -> Dict[str, Any]:
|
except LLMRateLimitError as exc:
|
||||||
last_exc: Optional[Exception] = None
|
last_exc = exc
|
||||||
for attempt in range(self._max_retries + 1):
|
if attempt < self._max_retries:
|
||||||
try:
|
time.sleep(2 ** attempt)
|
||||||
return post_json(url, payload, headers, timeout=timeout)
|
except LLMAPIError as exc:
|
||||||
except LLMRateLimitError as exc:
|
if exc.status_code >= 500 and attempt < self._max_retries:
|
||||||
last_exc = exc
|
last_exc = exc
|
||||||
if attempt < self._max_retries:
|
time.sleep(2 ** attempt)
|
||||||
time.sleep(2 ** attempt)
|
else:
|
||||||
except LLMAPIError as exc:
|
raise
|
||||||
if exc.status_code >= 500 and attempt < self._max_retries:
|
raise last_exc # type: ignore[misc]
|
||||||
last_exc = exc
|
|
||||||
time.sleep(2 ** attempt)
|
|
||||||
else:
|
|
||||||
raise
|
|
||||||
raise last_exc # type: ignore[misc]
|
|
||||||
|
|
||||||
|
|
||||||
# OpenAI Chat Completions fields that map straight through from model_params.
|
|
||||||
# Anything not in this set is provider-specific and must be either translated
|
|
||||||
# or dropped — we never blind-merge into the payload, because OpenRouter
|
|
||||||
# rejects unknown top-level fields with HTTP 400.
|
|
||||||
_OPENAI_PASSTHROUGH_FIELDS = frozenset({
|
|
||||||
"top_p", "n", "stream", "stop", "presence_penalty",
|
|
||||||
"frequency_penalty", "logit_bias", "user", "seed",
|
|
||||||
"tools", "tool_choice", "response_format",
|
|
||||||
"logprobs", "top_logprobs", "parallel_tool_calls",
|
|
||||||
})
|
|
||||||
|
|
||||||
# Provider-specific model_params keys that have no OpenAI Chat Completions
|
|
||||||
# equivalent and must be silently dropped to keep payloads valid.
|
|
||||||
_DROPPED_NON_OPENAI_FIELDS = frozenset({
|
|
||||||
"reasoning_effort", # Claude CLI / Anthropic-specific
|
|
||||||
"max_depth", # llm-connect's own depth knob
|
|
||||||
"claude_cli_path", # adapter wiring leak
|
|
||||||
"json_schema", # translated below into response_format
|
|
||||||
})
|
|
||||||
|
|
||||||
|
|
||||||
def _merge_model_params(payload: Dict[str, Any], model_params: Dict[str, Any]) -> None:
|
|
||||||
"""Merge RunConfig.model_params into an OpenAI Chat Completions payload.
|
|
||||||
|
|
||||||
Pass-through whitelisted OpenAI keys, translate json_schema into the
|
|
||||||
proper response_format wrapper, drop known provider-specific fields,
|
|
||||||
and ignore anything else rather than letting it through and triggering
|
|
||||||
a 400 from OpenRouter (the failure mode that hit CUST-WP-0045 on
|
|
||||||
2026-06-02 — reasoning_effort and a top-level json_schema were merged
|
|
||||||
into the body and the API rejected both).
|
|
||||||
"""
|
|
||||||
schema = model_params.get("json_schema")
|
|
||||||
if schema is not None and "response_format" not in payload:
|
|
||||||
if isinstance(schema, str):
|
|
||||||
try:
|
|
||||||
import json as _json
|
|
||||||
schema = _json.loads(schema)
|
|
||||||
except (ValueError, TypeError):
|
|
||||||
schema = None
|
|
||||||
if isinstance(schema, dict):
|
|
||||||
# strict=False: OpenAI's strict mode requires additionalProperties
|
|
||||||
# to be false on every object and every property in the required
|
|
||||||
# list. Most application-supplied schemas are not written that
|
|
||||||
# way (the activity-core daily-triage schema, for example, has
|
|
||||||
# neither). With strict=False, OpenRouter still honours the
|
|
||||||
# schema as a soft constraint and the model's output remains
|
|
||||||
# structured. Callers can opt back into strict by including
|
|
||||||
# `strict: true` themselves in a custom `response_format`.
|
|
||||||
payload["response_format"] = {
|
|
||||||
"type": "json_schema",
|
|
||||||
"json_schema": {
|
|
||||||
"name": "structured_output",
|
|
||||||
"schema": schema,
|
|
||||||
"strict": False,
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
for key, value in model_params.items():
|
|
||||||
if key in _DROPPED_NON_OPENAI_FIELDS:
|
|
||||||
continue
|
|
||||||
if key in _OPENAI_PASSTHROUGH_FIELDS:
|
|
||||||
payload[key] = value
|
|
||||||
# else: silently drop unknown keys rather than risk a 400.
|
|
||||||
|
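The removed `_merge_model_params` helper above follows an allow-list pattern: pass known OpenAI fields through, translate `json_schema` into a `response_format` wrapper, and drop everything else. The sketch below reproduces that pattern in a self-contained form. It is based only on the removed code; the replacement `merge_openai_chat_model_params` in `_payload.py` is not shown in this diff and may differ. The name `merge_params` and the shortened `PASSTHROUGH` set are illustrative.

```python
import json
from typing import Any, Dict

# Abbreviated allow-list; the removed helper lists more Chat Completions fields.
PASSTHROUGH = frozenset({"top_p", "stop", "seed", "tools", "response_format"})


def merge_params(payload: Dict[str, Any], model_params: Dict[str, Any]) -> None:
    """Merge model_params into payload without leaking unknown top-level keys."""
    schema = model_params.get("json_schema")
    if schema is not None and "response_format" not in payload:
        if isinstance(schema, str):
            try:
                schema = json.loads(schema)
            except (ValueError, TypeError):
                schema = None  # unparseable schema strings are ignored
        if isinstance(schema, dict):
            payload["response_format"] = {
                "type": "json_schema",
                "json_schema": {
                    "name": "structured_output",
                    "schema": schema,
                    "strict": False,
                },
            }

    for key, value in model_params.items():
        if key in PASSTHROUGH:
            payload[key] = value
        # Anything else, including json_schema itself, is dropped.


payload: Dict[str, Any] = {"model": "example/model", "messages": []}
merge_params(
    payload,
    {"json_schema": {"type": "object"}, "reasoning_effort": "high", "top_p": 0.9},
)
```

After the call, `payload` gains `top_p` and a `response_format` wrapper, while `reasoning_effort` and the top-level `json_schema` never reach the request body. Because `response_format` is itself on the allow-list, a caller who supplies one explicitly overrides the wrapper generated from `json_schema`.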
|||||||
121
llm_connect/replay.py
Normal file
@@ -0,0 +1,121 @@
|
|||||||
|
"""Replay llm-connect audit records without making provider calls."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
from llm_connect.claude_code import _unwrap_cli_json_envelope
|
||||||
|
from llm_connect.models import RunConfig
|
||||||
|
|
||||||
|
|
||||||
|
def parse_audit_record(record: dict[str, Any]) -> dict[str, Any]:
|
||||||
|
"""Parse the recorded provider response and compare it to saved content."""
|
||||||
|
|
||||||
|
config = RunConfig.from_dict(record.get("config", {}))
|
||||||
|
provider = record.get("provider") or _infer_provider(record)
|
||||||
|
provider_response = record.get("provider_response") or {}
|
||||||
|
body = provider_response.get("body")
|
||||||
|
parsed_content = _parse_provider_response(provider, body, config)
|
||||||
|
recorded_content = record.get("parsed_content")
|
||||||
|
schema_check = _check_structured_output(parsed_content, (config.model_params or {}).get("json_schema"))
|
||||||
|
|
||||||
|
return {
|
||||||
|
"provider": provider,
|
||||||
|
"parsed_content": parsed_content,
|
||||||
|
"matches_recorded_content": parsed_content == recorded_content,
|
||||||
|
"structured_output": schema_check,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def main(argv: list[str] | None = None) -> None:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
prog="python -m llm_connect.replay",
|
||||||
|
description="Replay parsing for a llm-connect audit JSON file.",
|
||||||
|
)
|
||||||
|
parser.add_argument("audit_file", help="Path to an audit JSON file")
|
||||||
|
parser.add_argument("--json", action="store_true", help="Print the full replay report")
|
||||||
|
args = parser.parse_args(argv)
|
||||||
|
|
||||||
|
record = json.loads(Path(args.audit_file).read_text(encoding="utf-8"))
|
||||||
|
report = parse_audit_record(record)
|
||||||
|
if args.json:
|
||||||
|
print(json.dumps(report, indent=2, sort_keys=True))
|
||||||
|
else:
|
||||||
|
print(report["parsed_content"])
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_provider_response(provider: str | None, body: Any, config: RunConfig) -> str:
|
||||||
|
if provider in {"openai", "openrouter"}:
|
||||||
|
if isinstance(body, dict):
|
||||||
|
choice = (body.get("choices") or [{}])[0]
|
||||||
|
return choice.get("message", {}).get("content", "")
|
||||||
|
return ""
|
||||||
|
|
||||||
|
if provider == "gemini":
|
||||||
|
if isinstance(body, dict):
|
||||||
|
candidates = body.get("candidates") or []
|
||||||
|
if not candidates:
|
||||||
|
return ""
|
||||||
|
parts = candidates[0].get("content", {}).get("parts", [])
|
||||||
|
return "".join(part.get("text", "") for part in parts)
|
||||||
|
return ""
|
||||||
|
|
||||||
|
if provider == "claude-code":
|
||||||
|
if isinstance(body, dict):
|
||||||
|
return _unwrap_cli_json_envelope(body.get("stdout", ""), config)
|
||||||
|
return ""
|
||||||
|
|
||||||
|
if isinstance(body, str):
|
||||||
|
return body
|
||||||
|
if body is None:
|
||||||
|
return ""
|
||||||
|
return json.dumps(body)
|
||||||
|
|
||||||
|
|
||||||
|
def _infer_provider(record: dict[str, Any]) -> str | None:
    request = record.get("provider_request") or {}
    url = request.get("url", "")
    if "openrouter.ai" in url:
        return "openrouter"
    if "api.openai.com" in url:
        return "openai"
    if "generativelanguage.googleapis.com" in url:
        return "gemini"
    if request.get("command"):
        return "claude-code"
    return None
|
||||||
|
|
||||||
|
|
||||||
|
def _check_structured_output(content: str, schema: Any) -> dict[str, Any]:
    if not schema:
        return {"checked": False}
    if isinstance(schema, str):
        try:
            schema = json.loads(schema)
        except ValueError as exc:
            return {"checked": True, "valid": False, "error": f"invalid schema JSON: {exc}"}
    if not isinstance(schema, dict):
        return {"checked": True, "valid": False, "error": "schema must be an object"}

    try:
        parsed = json.loads(content)
    except ValueError as exc:
        return {"checked": True, "valid": False, "error": f"invalid output JSON: {exc}"}

    missing = []
    if schema.get("type") == "object":
        if not isinstance(parsed, dict):
            return {"checked": True, "valid": False, "error": "output is not an object"}
        for key in schema.get("required", []):
            if key not in parsed:
                missing.append(key)
    if missing:
        return {"checked": True, "valid": False, "missing_required": missing}
    return {"checked": True, "valid": True}
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
    main()
|
||||||
@@ -21,13 +21,21 @@ Usage (CLI)::
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
import argparse
|
import argparse
|
||||||
|
import datetime as _dt
|
||||||
import json
|
import json
|
||||||
|
import os
|
||||||
|
import re
|
||||||
import threading
|
import threading
|
||||||
from http.server import BaseHTTPRequestHandler, HTTPServer
|
import time
|
||||||
|
import uuid
|
||||||
|
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
|
||||||
|
from pathlib import Path
|
||||||
from typing import Optional
|
from typing import Optional
|
||||||
|
from urllib.parse import parse_qs, urlsplit
|
||||||
|
|
||||||
|
from llm_connect._diagnostics import capture_diagnostics
|
||||||
from llm_connect.adapter import LLMAdapter
|
from llm_connect.adapter import LLMAdapter
|
||||||
from llm_connect.models import RunConfig
|
from llm_connect.models import LLMResponse, RunConfig
|
||||||
|
|
||||||
|
|
||||||
class _Handler(BaseHTTPRequestHandler):
|
class _Handler(BaseHTTPRequestHandler):
|
||||||
@@ -39,7 +47,8 @@ class _Handler(BaseHTTPRequestHandler):
|
|||||||
# ── GET ────────────────────────────────────────────────────────
|
# ── GET ────────────────────────────────────────────────────────
|
||||||
|
|
||||||
def do_GET(self):
|
def do_GET(self):
|
||||||
if self.path == "/health":
|
parsed = urlsplit(self.path)
|
||||||
|
if parsed.path == "/health":
|
||||||
self._respond(200, {"status": "ok"})
|
self._respond(200, {"status": "ok"})
|
||||||
else:
|
else:
|
||||||
self._respond(404, {"error": "not found"})
|
self._respond(404, {"error": "not found"})
|
||||||
@@ -47,10 +56,13 @@ class _Handler(BaseHTTPRequestHandler):
|
|||||||
# ── POST ───────────────────────────────────────────────────────
|
# ── POST ───────────────────────────────────────────────────────
|
||||||
|
|
||||||
def do_POST(self):
|
def do_POST(self):
|
||||||
if self.path != "/execute":
|
parsed = urlsplit(self.path)
|
||||||
|
if parsed.path != "/execute":
|
||||||
self._respond(404, {"error": "not found"})
|
self._respond(404, {"error": "not found"})
|
||||||
return
|
return
|
||||||
|
|
||||||
|
debug_enabled = _debug_requested(parsed.query)
|
||||||
|
audit_dir = os.environ.get("LLM_CONNECT_AUDIT_DIR")
|
||||||
length = int(self.headers.get("Content-Length", 0))
|
length = int(self.headers.get("Content-Length", 0))
|
||||||
raw = self.rfile.read(length)
|
raw = self.rfile.read(length)
|
||||||
try:
|
try:
|
||||||
@@ -70,9 +82,19 @@ class _Handler(BaseHTTPRequestHandler):
|
|||||||
return
|
return
|
||||||
config = RunConfig.from_dict(cfg)
|
config = RunConfig.from_dict(cfg)
|
||||||
|
|
||||||
|
start = time.time()
|
||||||
|
diagnostics_enabled = debug_enabled or bool(audit_dir)
|
||||||
try:
|
try:
|
||||||
response = self.server.adapter.execute_prompt(prompt, config) # type: ignore[attr-defined]
|
with capture_diagnostics(diagnostics_enabled) as diagnostics:
|
||||||
self._respond(200, response.to_dict())
|
response = self.server.adapter.execute_prompt(prompt, config) # type: ignore[attr-defined]
|
||||||
|
latency = time.time() - start
|
||||||
|
body = response.to_dict()
|
||||||
|
debug = diagnostics.to_dict() if diagnostics is not None else None
|
||||||
|
if debug_enabled and debug is not None:
|
||||||
|
body["debug"] = debug
|
||||||
|
if audit_dir:
|
||||||
|
_write_audit_record(audit_dir, prompt, config, response, debug, latency)
|
||||||
|
self._respond(200, body)
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
self._respond(500, {"error": str(exc)})
|
self._respond(500, {"error": str(exc)})
|
||||||
|
|
||||||
@@ -102,7 +124,7 @@ class LLMServer:
|
|||||||
host: str = "127.0.0.1",
|
host: str = "127.0.0.1",
|
||||||
port: int = 8080,
|
port: int = 8080,
|
||||||
) -> None:
|
) -> None:
|
||||||
self._httpd = HTTPServer((host, port), _Handler)
|
self._httpd = ThreadingHTTPServer((host, port), _Handler)
|
||||||
self._httpd.adapter = adapter # type: ignore[attr-defined]
|
self._httpd.adapter = adapter # type: ignore[attr-defined]
|
||||||
self._thread: Optional[threading.Thread] = None
|
self._thread: Optional[threading.Thread] = None
|
||||||
|
|
||||||
@@ -138,6 +160,55 @@ def _build_adapter(provider: str, model: Optional[str]) -> LLMAdapter:
|
|||||||
return create_adapter(provider, model=model)
|
return create_adapter(provider, model=model)
|
||||||
|
|
||||||
|
|
||||||
|
def _debug_requested(query: str) -> bool:
    env = os.environ.get("LLM_CONNECT_DEBUG", "")
    if _truthy(env):
        return True
    values = parse_qs(query).get("debug", [])
    return any(_truthy(value) for value in values)


def _truthy(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}
|
||||||
|
|
||||||
|
|
||||||
|
def _write_audit_record(
    audit_dir: str,
    prompt: str,
    config: RunConfig,
    response: LLMResponse,
    debug: dict | None,
    latency_seconds: float,
) -> None:
    target_dir = Path(audit_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    now = _dt.datetime.now(_dt.timezone.utc)
    response_id = str(response.metadata.get("response_id") or uuid.uuid4().hex)
    filename = f"{now.strftime('%Y%m%dT%H%M%S%fZ')}-{_safe_filename(response_id)}.json"
    diagnostics = debug or {}
    record = {
        "timestamp": now.isoformat().replace("+00:00", "Z"),
        "prompt": prompt,
        "config": config.to_dict(),
        "provider": response.metadata.get("provider"),
        "provider_request": diagnostics.get("provider_request"),
        "provider_response": diagnostics.get("provider_response"),
        "adapter_transformations": diagnostics.get("adapter_transformations", []),
        "parsed_content": response.content,
        "latency_seconds": round(latency_seconds, 3),
        "response": response.to_dict(),
    }
    (target_dir / filename).write_text(
        json.dumps(record, indent=2, sort_keys=True),
        encoding="utf-8",
    )


def _safe_filename(value: str) -> str:
    return re.sub(r"[^A-Za-z0-9_.-]+", "-", value).strip("-") or "response"
|
||||||
|
|
||||||
|
|
||||||
def main(argv=None) -> None:
|
def main(argv=None) -> None:
|
||||||
parser = argparse.ArgumentParser(
|
parser = argparse.ArgumentParser(
|
||||||
prog="python -m llm_connect.server",
|
prog="python -m llm_connect.server",
|
||||||
|
|||||||
81
tests/test_payload.py
Normal file
@@ -0,0 +1,81 @@
|
|||||||
|
from llm_connect._payload import merge_gemini_model_params, merge_openai_chat_model_params


STRUCTURED_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "recommendations": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "recommendations"],
}


ACTIVITY_CORE_MODEL_PARAMS = {
    "reasoning_effort": "medium",
    "max_depth": 4,
    "json_schema": STRUCTURED_SCHEMA,
    "top_p": 0.8,
}


def test_openai_chat_model_params_translate_activity_core_shape():
    payload = {
        "model": "gpt-4.1-mini",
        "messages": [{"role": "user", "content": "triage"}],
        "temperature": 0.2,
        "max_tokens": 200,
    }

    merge_openai_chat_model_params(payload, ACTIVITY_CORE_MODEL_PARAMS)

    assert payload["response_format"] == {
        "type": "json_schema",
        "json_schema": {
            "name": "structured_output",
            "schema": STRUCTURED_SCHEMA,
            "strict": False,
        },
    }
    assert payload["top_p"] == 0.8
    assert "reasoning_effort" not in payload
    assert "max_depth" not in payload
    assert "json_schema" not in payload


def test_openai_chat_model_params_preserve_explicit_response_format():
    explicit = {
        "type": "json_schema",
        "json_schema": {
            "name": "custom",
            "schema": STRUCTURED_SCHEMA,
            "strict": True,
        },
    }
    payload = {"model": "gpt-4.1-mini", "messages": []}

    merge_openai_chat_model_params(
        payload,
        {"json_schema": STRUCTURED_SCHEMA, "response_format": explicit},
    )

    assert payload["response_format"] == explicit


def test_gemini_model_params_translate_activity_core_shape():
    payload = {
        "contents": [{"role": "user", "parts": [{"text": "triage"}]}],
        "generationConfig": {
            "temperature": 0.2,
            "maxOutputTokens": 200,
        },
    }

    merge_gemini_model_params(payload, ACTIVITY_CORE_MODEL_PARAMS)

    assert payload["generationConfig"]["responseMimeType"] == "application/json"
    assert payload["generationConfig"]["responseSchema"] == STRUCTURED_SCHEMA
    assert payload["generationConfig"]["topP"] == 0.8
    assert "reasoning_effort" not in payload
    assert "max_depth" not in payload
    assert "json_schema" not in payload
|
||||||
62
tests/test_replay.py
Normal file
@@ -0,0 +1,62 @@
|
|||||||
|
from llm_connect.replay import parse_audit_record


STRUCTURED_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "recommendations": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "recommendations"],
}


def test_replay_parses_openai_style_provider_response():
    record = {
        "provider": "openrouter",
        "config": {"model_params": {"json_schema": STRUCTURED_SCHEMA}},
        "provider_response": {
            "status": 200,
            "body": {
                "choices": [
                    {
                        "message": {
                            "content": '{"summary":"ok","recommendations":[]}'
                        }
                    }
                ]
            },
        },
        "parsed_content": '{"summary":"ok","recommendations":[]}',
    }

    report = parse_audit_record(record)

    assert report["parsed_content"] == '{"summary":"ok","recommendations":[]}'
    assert report["matches_recorded_content"] is True
    assert report["structured_output"] == {"checked": True, "valid": True}


def test_replay_reuses_claude_code_envelope_unwrapper():
    record = {
        "provider": "claude-code",
        "config": {"model_params": {"json_schema": STRUCTURED_SCHEMA}},
        "provider_response": {
            "status": 0,
            "body": {
                "stdout": (
                    '{"type":"result","result":"prose",'
                    '"structured_result":"{\\"summary\\":\\"ok\\",'
                    '\\"recommendations\\":[]}"}'
                ),
                "stderr": "",
            },
        },
        "parsed_content": '{"summary":"ok","recommendations":[]}',
    }

    report = parse_audit_record(record)

    assert report["parsed_content"] == '{"summary":"ok","recommendations":[]}'
    assert report["matches_recorded_content"] is True
    assert report["structured_output"] == {"checked": True, "valid": True}
|
||||||
@@ -2,14 +2,22 @@
|
|||||||
Tests for LLMServer HTTP serve mode (FR-1).
|
Tests for LLMServer HTTP serve mode (FR-1).
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
from concurrent.futures import ThreadPoolExecutor
|
||||||
import json
|
import json
|
||||||
import urllib.error
|
import urllib.error
|
||||||
import urllib.request
|
import urllib.request
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
|
from llm_connect._diagnostics import (
|
||||||
|
record_adapter_transformation,
|
||||||
|
record_provider_request,
|
||||||
|
record_provider_response,
|
||||||
|
)
|
||||||
from llm_connect.adapter import MockLLMAdapter, ErrorLLMAdapter
|
from llm_connect.adapter import MockLLMAdapter, ErrorLLMAdapter
|
||||||
from llm_connect.models import RunConfig
|
from llm_connect.models import LLMResponse, RunConfig
|
||||||
from llm_connect.server import LLMServer
|
from llm_connect.server import LLMServer
|
||||||
|
|
||||||
|
|
||||||
@@ -45,6 +53,35 @@ def _post(url: str, body: dict) -> tuple[int, dict]:
|
|||||||
return exc.code, json.loads(exc.read())
|
return exc.code, json.loads(exc.read())
|
||||||
|
|
||||||
|
|
||||||
|
class DiagnosticLLMAdapter(MockLLMAdapter):
|
||||||
|
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
|
||||||
|
record_provider_request(
|
||||||
|
url="https://provider.example/v1/chat",
|
||||||
|
payload={"prompt": prompt, "model": config.model_name},
|
||||||
|
headers={"Authorization": "Bearer secret-token"},
|
||||||
|
)
|
||||||
|
response = super().execute_prompt(prompt, config)
|
||||||
|
response.metadata["provider"] = "diagnostic"
|
||||||
|
response.metadata["response_id"] = "diag-response"
|
||||||
|
record_provider_response(status=200, body={"id": "diag-response", "content": response.content})
|
||||||
|
record_adapter_transformation(
|
||||||
|
"diagnostic_transform",
|
||||||
|
{"before": prompt},
|
||||||
|
{"after": response.content},
|
||||||
|
)
|
||||||
|
return response
|
||||||
|
|
||||||
|
|
||||||
|
class BarrierLLMAdapter(MockLLMAdapter):
|
||||||
|
def __init__(self):
|
||||||
|
super().__init__(mock_response="parallel")
|
||||||
|
self._barrier = threading.Barrier(2)
|
||||||
|
|
||||||
|
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
|
||||||
|
self._barrier.wait(timeout=2.0)
|
||||||
|
return super().execute_prompt(prompt, config)
|
||||||
|
|
||||||
|
|
||||||
class TestHealth:
|
class TestHealth:
|
||||||
def test_health_returns_200(self, server):
|
def test_health_returns_200(self, server):
|
||||||
status, body = _get(f"http://127.0.0.1:{server.port}/health")
|
status, body = _get(f"http://127.0.0.1:{server.port}/health")
|
||||||
@@ -65,6 +102,7 @@ class TestExecute:
|
|||||||
assert status == 200
|
assert status == 200
|
||||||
assert body["content"] == "hello world"
|
assert body["content"] == "hello world"
|
||||||
assert body["finish_reason"] == "stop"
|
assert body["finish_reason"] == "stop"
|
||||||
|
assert "debug" not in body
|
||||||
|
|
||||||
def test_response_includes_usage(self, server):
|
def test_response_includes_usage(self, server):
|
||||||
status, body = _post(
|
status, body = _post(
|
||||||
@@ -150,3 +188,86 @@ class TestExecute:
|
|||||||
)
|
)
|
||||||
assert status == 400
|
assert status == 400
|
||||||
assert "config" in body["error"]
|
assert "config" in body["error"]
|
||||||
|
|
||||||
|
def test_debug_query_returns_diagnostics(self):
|
||||||
|
s = LLMServer(adapter=DiagnosticLLMAdapter(mock_response="debug body"), port=0)
|
||||||
|
s.start()
|
||||||
|
try:
|
||||||
|
status, body = _post(
|
||||||
|
f"http://127.0.0.1:{s.port}/execute?debug=1",
|
||||||
|
{"prompt": "inspect", "config": {"model_name": "diagnostic-model"}},
|
||||||
|
)
|
||||||
|
finally:
|
||||||
|
s.stop()
|
||||||
|
|
||||||
|
assert status == 200
|
||||||
|
assert body["content"] == "debug body"
|
||||||
|
debug = body["debug"]
|
||||||
|
assert debug["provider_request"]["payload"] == {
|
||||||
|
"prompt": "inspect",
|
||||||
|
"model": "diagnostic-model",
|
||||||
|
}
|
||||||
|
assert debug["provider_request"]["headers_redacted"]["Authorization"] == "Bearer <redacted>"
|
||||||
|
assert debug["provider_response"]["status"] == 200
|
||||||
|
assert debug["adapter_transformations"][0]["step"] == "diagnostic_transform"
|
||||||
|
|
||||||
|
def test_debug_env_returns_diagnostics(self, monkeypatch):
|
||||||
|
monkeypatch.setenv("LLM_CONNECT_DEBUG", "1")
|
||||||
|
s = LLMServer(adapter=DiagnosticLLMAdapter(mock_response="debug body"), port=0)
|
||||||
|
s.start()
|
||||||
|
try:
|
||||||
|
status, body = _post(
|
||||||
|
f"http://127.0.0.1:{s.port}/execute",
|
||||||
|
{"prompt": "inspect"},
|
||||||
|
)
|
||||||
|
finally:
|
||||||
|
s.stop()
|
||||||
|
|
||||||
|
assert status == 200
|
||||||
|
assert "debug" in body
|
||||||
|
|
||||||
|
def test_audit_dir_records_replayable_call(self, monkeypatch, tmp_path):
|
||||||
|
monkeypatch.setenv("LLM_CONNECT_AUDIT_DIR", str(tmp_path))
|
||||||
|
s = LLMServer(adapter=DiagnosticLLMAdapter(mock_response="audit body"), port=0)
|
||||||
|
s.start()
|
||||||
|
try:
|
||||||
|
status, body = _post(
|
||||||
|
f"http://127.0.0.1:{s.port}/execute",
|
||||||
|
{"prompt": "audit me", "config": {"model_name": "audit-model"}},
|
||||||
|
)
|
||||||
|
finally:
|
||||||
|
s.stop()
|
||||||
|
|
||||||
|
assert status == 200
|
||||||
|
assert "debug" not in body
|
||||||
|
files = list(tmp_path.glob("*.json"))
|
||||||
|
assert len(files) == 1
|
||||||
|
record = json.loads(files[0].read_text(encoding="utf-8"))
|
||||||
|
assert record["prompt"] == "audit me"
|
||||||
|
assert record["config"]["model_name"] == "audit-model"
|
||||||
|
assert record["parsed_content"] == "audit body"
|
||||||
|
assert record["provider_request"]["headers_redacted"]["Authorization"] == "Bearer <redacted>"
|
||||||
|
assert record["provider_response"]["body"]["id"] == "diag-response"
|
||||||
|
assert record["latency_seconds"] >= 0
|
||||||
|
|
||||||
|
def test_execute_requests_run_concurrently(self):
|
||||||
|
s = LLMServer(adapter=BarrierLLMAdapter(), port=0)
|
||||||
|
s.start()
|
||||||
|
try:
|
||||||
|
start = time.monotonic()
|
||||||
|
with ThreadPoolExecutor(max_workers=2) as pool:
|
||||||
|
futures = [
|
||||||
|
pool.submit(
|
||||||
|
_post,
|
||||||
|
f"http://127.0.0.1:{s.port}/execute",
|
||||||
|
{"prompt": f"request {idx}"},
|
||||||
|
)
|
||||||
|
for idx in range(2)
|
||||||
|
]
|
||||||
|
results = [future.result(timeout=3.0) for future in futures]
|
||||||
|
elapsed = time.monotonic() - start
|
||||||
|
finally:
|
||||||
|
s.stop()
|
||||||
|
|
||||||
|
assert [status for status, _body in results] == [200, 200]
|
||||||
|
assert elapsed < 1.5
|
||||||
|
|||||||
142
tests/test_structured_output_smoke.py
Normal file
@@ -0,0 +1,142 @@
|
|||||||
|
import json

from llm_connect.gemini import GeminiAdapter
from llm_connect.models import RunConfig
from llm_connect.openai import OpenAIAdapter
from llm_connect.openrouter import OpenRouterAdapter


STRUCTURED_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "recommendations": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "recommendations"],
}


SMOKE_CONFIG = RunConfig(
    model_name="gpt-4",
    temperature=0.1,
    max_tokens=300,
    model_params={
        "reasoning_effort": "medium",
        "max_depth": 3,
        "json_schema": STRUCTURED_SCHEMA,
    },
)


def test_openrouter_structured_output_payload_and_model_routing(monkeypatch):
    captured: dict[str, object] = {}

    def fake_post_json(url, payload, headers=None, timeout=300):  # noqa: ANN001
        captured["url"] = url
        captured["payload"] = payload
        captured["headers"] = headers
        captured["timeout"] = timeout
        return {
            "id": "or-response",
            "model": payload["model"],
            "choices": [
                {
                    "message": {
                        "content": json.dumps(
                            {"summary": "ok", "recommendations": ["keep payload clean"]}
                        )
                    },
                    "finish_reason": "stop",
                }
            ],
            "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
        }

    monkeypatch.setattr("llm_connect.openrouter.post_json", fake_post_json)
    adapter = OpenRouterAdapter(
        model="anthropic/claude-sonnet-4",
        api_key="or-test",
        api_base="https://openrouter.example/api/v1",
    )

    response = adapter.execute_prompt("Return JSON.", SMOKE_CONFIG)
    payload = captured["payload"]

    assert response.model == "anthropic/claude-sonnet-4"
    assert payload["model"] == "anthropic/claude-sonnet-4"
    assert payload["response_format"]["json_schema"]["schema"] == STRUCTURED_SCHEMA
    assert payload["response_format"]["json_schema"]["strict"] is False
    assert "reasoning_effort" not in payload
    assert "max_depth" not in payload
    assert "json_schema" not in payload


def test_openai_structured_output_payload(monkeypatch):
    captured: dict[str, object] = {}

    def fake_post_json(url, payload, headers=None, timeout=300):  # noqa: ANN001
        captured["payload"] = payload
        return {
            "id": "oa-response",
            "model": payload["model"],
            "choices": [
                {
                    "message": {
                        "content": json.dumps({"summary": "ok", "recommendations": []})
                    },
                    "finish_reason": "stop",
                }
            ],
            "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
        }

    monkeypatch.setattr("llm_connect.openai.post_json", fake_post_json)
    adapter = OpenAIAdapter(model="gpt-4.1-mini", api_key="sk-test")

    response = adapter.execute_prompt("Return JSON.", SMOKE_CONFIG)
    payload = captured["payload"]

    assert response.model == "gpt-4.1-mini"
    assert payload["model"] == "gpt-4.1-mini"
    assert payload["response_format"]["json_schema"]["schema"] == STRUCTURED_SCHEMA
    assert "reasoning_effort" not in payload
    assert "max_depth" not in payload
    assert "json_schema" not in payload


def test_gemini_structured_output_payload(monkeypatch):
    captured: dict[str, object] = {}

    def fake_post_json(url, payload, headers=None, timeout=300):  # noqa: ANN001
        captured["url"] = url
        captured["payload"] = payload
        return {
            "candidates": [
                {
                    "content": {
                        "parts": [
                            {"text": json.dumps({"summary": "ok", "recommendations": []})}
                        ]
                    },
                    "finishReason": "STOP",
                }
            ],
            "usageMetadata": {
                "promptTokenCount": 1,
                "candidatesTokenCount": 2,
                "totalTokenCount": 3,
            },
        }

    monkeypatch.setattr("llm_connect.gemini.post_json", fake_post_json)
    adapter = GeminiAdapter(model="gemini-2.5-flash", api_key="gemini-test")

    response = adapter.execute_prompt("Return JSON.", SMOKE_CONFIG)
    payload = captured["payload"]

    assert response.model == "gemini-2.5-flash"
    assert payload["generationConfig"]["responseMimeType"] == "application/json"
    assert payload["generationConfig"]["responseSchema"] == STRUCTURED_SCHEMA
    assert "reasoning_effort" not in payload
    assert "max_depth" not in payload
    assert "json_schema" not in payload
|
||||||
@@ -4,11 +4,11 @@ type: workplan
|
|||||||
title: "Ad hoc — llm-connect lessons from CUST-WP-0045 canary"
|
title: "Ad hoc — llm-connect lessons from CUST-WP-0045 canary"
|
||||||
domain: custodian
|
domain: custodian
|
||||||
repo: llm-connect
|
repo: llm-connect
|
||||||
status: ready
|
status: finished
|
||||||
owner: custodian
|
owner: custodian
|
||||||
topic_slug: custodian
|
topic_slug: custodian
|
||||||
created: "2026-06-02"
|
created: "2026-06-02"
|
||||||
updated: "2026-06-02"
|
updated: "2026-06-03"
|
||||||
state_hub_workstream_id: "1c936c91-79c7-427d-ab37-9052e8a61cda"
|
state_hub_workstream_id: "1c936c91-79c7-427d-ab37-9052e8a61cda"
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -38,7 +38,7 @@ workplan.
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: ADHOC-2026-06-02-T01
|
id: ADHOC-2026-06-02-T01
|
||||||
status: todo
|
status: done
|
||||||
priority: medium
|
priority: medium
|
||||||
state_hub_task_id: "69626e9e-29f1-40f6-8cd2-d38a7e802293"
|
state_hub_task_id: "69626e9e-29f1-40f6-8cd2-d38a7e802293"
|
||||||
```
|
```
|
||||||
@@ -78,7 +78,7 @@ debug field is omitted in normal mode.
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: ADHOC-2026-06-02-T02
|
id: ADHOC-2026-06-02-T02
|
||||||
status: todo
|
status: done
|
||||||
priority: low
|
priority: low
|
||||||
state_hub_task_id: "e2b1be30-71f7-4497-9b10-b0f24d37beba"
|
state_hub_task_id: "e2b1be30-71f7-4497-9b10-b0f24d37beba"
|
||||||
```
|
```
|
||||||
@@ -101,7 +101,7 @@ max of their individual latencies, not the sum.
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: ADHOC-2026-06-02-T03
|
id: ADHOC-2026-06-02-T03
|
||||||
status: todo
|
status: done
|
||||||
priority: medium
|
priority: medium
|
||||||
state_hub_task_id: "da4821f0-a876-44ce-9dc3-f3fc67732d0f"
|
state_hub_task_id: "da4821f0-a876-44ce-9dc3-f3fc67732d0f"
|
||||||
```
|
```
|
||||||
@@ -127,7 +127,7 @@ ergonomics.
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: ADHOC-2026-06-02-T04
|
id: ADHOC-2026-06-02-T04
|
||||||
status: todo
|
status: done
|
||||||
priority: medium
|
priority: medium
|
||||||
state_hub_task_id: "f8a033e6-22ac-4700-b8d2-43a5d76a3751"
|
state_hub_task_id: "f8a033e6-22ac-4700-b8d2-43a5d76a3751"
|
||||||
```
|
```
|
||||||
@@ -155,7 +155,7 @@ forbidden top-level fields, schema in the right wrapper).
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: ADHOC-2026-06-02-T05
|
id: ADHOC-2026-06-02-T05
|
||||||
status: todo
|
status: done
|
||||||
priority: medium
|
priority: medium
|
||||||
state_hub_task_id: "5d53dbb4-b374-45fe-b81c-ff0b222ca74f"
|
state_hub_task_id: "5d53dbb4-b374-45fe-b81c-ff0b222ca74f"
|
||||||
```
|
```
|
||||||
@@ -188,7 +188,7 @@ bug) before either was merged.
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: ADHOC-2026-06-02-T06
|
id: ADHOC-2026-06-02-T06
|
||||||
status: todo
|
status: done
|
||||||
priority: low
|
priority: low
|
||||||
state_hub_task_id: "33fcb951-d7ab-4d3c-8d67-9eebd986c711"
|
state_hub_task_id: "33fcb951-d7ab-4d3c-8d67-9eebd986c711"
|
||||||
```
|
```
|
||||||
@@ -210,3 +210,21 @@ would only send OpenAI-valid fields. Codify the contract in
|
|||||||
|
|
||||||
Done when a new adapter author can read the doc and know what their
|
Done when a new adapter author can read the doc and know what their
|
||||||
`_merge_model_params` implementation must support.
|
`_merge_model_params` implementation must support.
|
||||||
|
|
||||||
|
## Implementation Notes

Completed on 2026-06-03:

- Added opt-in `/execute` debug envelopes via `LLM_CONNECT_DEBUG=1` or
  `?debug=1`, with redacted provider request/response capture and adapter
  transformation records.
- Switched serve mode to `ThreadingHTTPServer` and added a concurrency
  regression test.
- Added `LLM_CONNECT_AUDIT_DIR` per-call audit records plus
  `python -m llm_connect.replay` for parser/unwrapper replay.
- Extracted shared OpenAI-compatible and Gemini payload translation helpers
  and wired OpenRouter, OpenAI, and Gemini through them.
- Added CI-safe structured-output smoke tests that mock provider HTTP calls
  and assert model routing plus payload shape.
- Documented the adapter `model_params` contract in
  `docs/adapter-model-params.md`.
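
The OpenAI-compatible half of the `model_params` contract can be sketched as follows. This is a minimal illustration of the behaviour that `tests/test_payload.py` asserts, not the code in `_payload.py`: the function name `merge_openai_style_params` and the `PASS_THROUGH_KEYS` allowlist are hypothetical, and the assumption that only `top_p` is forwarded verbatim is inferred from the test fixtures rather than from the real helper.

```python
from copy import deepcopy
from typing import Any

# Hypothetical allowlist (assumption): caller keys that are valid top-level
# fields of an OpenAI-style chat payload and may be forwarded unchanged.
PASS_THROUGH_KEYS = {"top_p"}


def merge_openai_style_params(
    payload: dict[str, Any], model_params: dict[str, Any]
) -> dict[str, Any]:
    """Fold caller-facing model_params into an OpenAI-style payload, in place."""
    # Forward only allowlisted keys; anything else (reasoning_effort,
    # max_depth, ...) is dropped so it never reaches the provider.
    for key in PASS_THROUGH_KEYS:
        if key in model_params:
            payload[key] = model_params[key]

    if "response_format" in model_params:
        # An explicit response_format always wins over a bare json_schema.
        payload["response_format"] = deepcopy(model_params["response_format"])
    elif "json_schema" in model_params:
        # Wrap the bare schema in the response_format envelope; it must not
        # appear as a top-level "json_schema" field.
        payload["response_format"] = {
            "type": "json_schema",
            "json_schema": {
                "name": "structured_output",
                "schema": deepcopy(model_params["json_schema"]),
                "strict": False,
            },
        }
    return payload
```

Using an allowlist rather than a blocklist means an unknown caller key is dropped by default, which matches the "no forbidden top-level fields" checks in the smoke tests. A Gemini counterpart would presumably follow the same pattern but write into `generationConfig` (`responseMimeType`, `responseSchema`, `topP`).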
|
||||||
|
|||||||