Contract: HTTP Serve Mode
layer: Functional
maturity: Beta
module: llm_connect.server
since: WP-0003
Purpose
Expose any LLMAdapter as a lightweight HTTP service. Intended for
local/inter-process use; not hardened for public internet exposure.
API endpoints
GET /health
Liveness probe.
Response 200
```json
{"status": "ok"}
```
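A caller that launches the server as a subprocess usually needs to wait until it is ready. The sketch below polls GET /health with the stdlib urllib client until it sees {"status": "ok"}. The helper name wait_until_healthy and its timeout and interval values are assumptions for illustration. They are not part of llm_connect.

```python
import json
import time
import urllib.request


def wait_until_healthy(base_url: str, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll GET <base_url>/health until it answers 200 {"status": "ok"} or timeout elapses.

    Hypothetical helper; not part of llm_connect.server.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval + 1) as resp:
                payload = json.load(resp)
                if resp.status == 200 and isinstance(payload, dict) and payload.get("status") == "ok":
                    return True
        except (OSError, ValueError):
            # OSError covers URLError/HTTPError and refused connections while the
            # server is still starting; ValueError covers a non-JSON body.
            pass
        time.sleep(interval)
    return False
```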
POST /execute
Execute a prompt through the configured adapter.
Request body (JSON)
| Field | Type | Required | Description |
|---|---|---|---|
| prompt | string | yes | Prompt text |
| config | object | no | RunConfig overrides (see below) |
config sub-fields (all optional, defaults match RunConfig defaults):
| Field | Type | Default |
|---|---|---|
| model_name | string | "gpt-4" |
| temperature | float | 0.7 |
| max_tokens | int | 2000 |
| timeout_seconds | int | 300 |
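Here is a minimal client sketch for POST /execute. It assumes only the request shape documented above. The names build_execute_body, execute, and ExecuteError are hypothetical. Two behaviours are choices made in this sketch, not documented server behaviour: rejecting unknown config keys and rejecting empty prompts on the client side. Fields left unset are omitted, so the server applies its RunConfig defaults.

```python
import json
import urllib.error
import urllib.request
from typing import Any

# Optional sub-fields of "config" listed in this contract.
CONFIG_FIELDS = frozenset({"model_name", "temperature", "max_tokens", "timeout_seconds"})


class ExecuteError(Exception):
    """Hypothetical client exception carrying the server's structured error body."""

    def __init__(self, status: int, body: dict) -> None:
        super().__init__(f"HTTP {status} {body.get('error', 'unknown')}: {body.get('message', '')}")
        self.status = status
        self.body = body


def build_execute_body(prompt: str, **config: Any) -> dict:
    """Build a POST /execute body; unset config fields fall back to server-side defaults."""
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt must be a non-empty string")  # client-side choice
    unknown = set(config) - CONFIG_FIELDS
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    body: dict = {"prompt": prompt}
    # Send only the overrides the caller actually set.
    overrides = {k: v for k, v in config.items() if v is not None}
    if overrides:
        body["config"] = overrides
    return body


def execute(base_url: str, body: dict, timeout: float = 310.0) -> dict:
    """POST body to <base_url>/execute and return the decoded LLMResponse dict."""
    req = urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Error responses carry a structured JSON body ("error", "message", ...).
        try:
            detail = json.loads(err.read() or b"{}")
        except ValueError:
            detail = {}
        raise ExecuteError(err.code, detail if isinstance(detail, dict) else {}) from err
```

A typical call would be execute(base_url, build_execute_body("Summarise this diff", temperature=0.2)). The 310-second client timeout sits slightly above the default timeout_seconds of 300. Assuming the server enforces that limit, a provider timeout then arrives as a 504 before the client gives up. Raise the client timeout if you override timeout_seconds.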
Response 200 (shape of LLMResponse.to_dict())
```json
{
  "content": "...",
  "model": "...",
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
  "finish_reason": "stop",
  "metadata": {}
}
```
Error responses
| HTTP | Condition |
|---|---|
| 400 | Missing prompt field or invalid JSON body |
| 404 | Unknown path |
| 429 | Provider rate limit |
| 500 | Configuration or adapter failure |
| 502 | Provider API / transport failure |
| 504 | Provider timeout |
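The statuses 429, 502, and 504 describe provider-side conditions that may clear on their own. By contrast, 400, 404, and 500 point to a bad request or a configuration problem that a retry will not fix. A client-side retry wrapper along those lines might look like the following. The function name, attempt count, and backoff schedule are all assumptions. Each retried /execute call may also be billed again by the provider.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# Statuses from the error table treated as transient (a client-side policy choice).
RETRYABLE_STATUSES = frozenset({429, 502, 504})


def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying with exponential backoff on retryable HTTP statuses.

    Hypothetical helper; fn is expected to raise urllib.error.HTTPError on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE_STATUSES or attempt == attempts - 1:
                raise
            # Delays of base_delay * 1, 2, 4, ... plus small jitter to avoid lockstep retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10))
    raise AssertionError("unreachable")
```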
Server error bodies are structured and must not expose provider credentials:
```json
{
  "error": "provider_api_error",
  "message": "HTTP 500 from https://provider.example/v1?key=<redacted>",
  "type": "LLMAPIError",
  "provider_status": 500
}
```
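One way to honour the no-credentials rule is to redact secret-looking query parameters from exception messages before building the error body. This sketch assumes secrets travel as key, api_key, token, or access_token query parameters. It also assumes provider_status is included only when known. Both redact and error_body are hypothetical helpers, not the llm_connect.server API.

```python
import re
from typing import Optional

# Query parameters treated as secrets (an assumed list; extend per provider).
_SECRET_PARAM = re.compile(r"([?&](?:api_key|key|access_token|token)=)[^&#\s]*", re.IGNORECASE)


def redact(text: str) -> str:
    """Replace the value of secret-looking query parameters with <redacted>."""
    return _SECRET_PARAM.sub(r"\1<redacted>", text)


def error_body(code: str, exc: Exception, provider_status: Optional[int] = None) -> dict:
    """Build a structured error body matching the documented shape."""
    body = {"error": code, "message": redact(str(exc)), "type": type(exc).__name__}
    if provider_status is not None:
        body["provider_status"] = provider_status
    return body
```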
Known error codes include unknown_profile, configuration_error,
provider_api_error, provider_rate_limited, provider_timeout,
budget_exceeded, llm_error, and internal_error.
Runtime profiles
Server CLI mode wraps the configured adapter with runtime profile dispatch
unless --disable-profiles is passed. The activity-core profile
custodian-triage-balanced is built in and resolves to the configured provider
and model before calling the underlying adapter.
Default profile values:
| Field | Default |
|---|---|
| provider | openrouter |
| model | anthropic/claude-sonnet-4 |
| temperature | 0.2 |
| max_tokens | 1800 |
| max_depth | 2 |
| timeout_seconds | 300 |
| model_params.reasoning_effort | medium |
Profile provider/model and default call values can be overridden with
environment variables such as LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER,
LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL, and
LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS. Operators can also set
LLM_CONNECT_PROFILES_JSON or LLM_CONNECT_PROFILE_FILE to provide JSON
profile definitions keyed by profile name.
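The profile resolution described above can be sketched as a layered merge, with these layers in order:

1. The built-in defaults.
2. LLM_CONNECT_PROFILES_JSON, or, if that is unset, the file named by LLM_CONNECT_PROFILE_FILE.
3. The per-field LLM_CONNECT_CUSTODIAN_TRIAGE_* variables.

That precedence and the shallow per-key merge are assumptions, as are the names resolve_profile and UnknownProfileError. Only the three environment variables listed above are handled, although the contract implies there may be more.

```python
import copy
import json
import os
from typing import Mapping, Optional

# Built-in defaults for the activity-core profile, taken from the table above.
BUILTIN_PROFILES = {
    "custodian-triage-balanced": {
        "provider": "openrouter",
        "model": "anthropic/claude-sonnet-4",
        "temperature": 0.2,
        "max_tokens": 1800,
        "max_depth": 2,
        "timeout_seconds": 300,
        "model_params": {"reasoning_effort": "medium"},
    }
}

# Per-field overrides named in this contract; the real set may be larger.
_TRIAGE_ENV = {
    "LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER": ("provider", str),
    "LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL": ("model", str),
    "LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS": ("max_tokens", int),
}


class UnknownProfileError(KeyError):
    """Hypothetical error; would surface to HTTP callers as unknown_profile."""


def resolve_profile(name: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve a profile by layering built-ins, operator JSON, and per-field env overrides."""
    env = os.environ if env is None else env
    profiles = copy.deepcopy(BUILTIN_PROFILES)  # never mutate the built-in table
    # Operator JSON keyed by profile name; the inline variable wins over the file (assumed).
    raw = env.get("LLM_CONNECT_PROFILES_JSON")
    if raw is None and env.get("LLM_CONNECT_PROFILE_FILE"):
        with open(env["LLM_CONNECT_PROFILE_FILE"], encoding="utf-8") as fh:
            raw = fh.read()
    for pname, fields in json.loads(raw or "{}").items():
        profiles.setdefault(pname, {}).update(fields)  # shallow, per-key merge (assumed)
    if name not in profiles:
        raise UnknownProfileError(name)
    resolved = profiles[name]
    if name == "custodian-triage-balanced":
        for var, (field, cast) in _TRIAGE_ENV.items():
            if var in env:
                resolved[field] = cast(env[var])
    return resolved
```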
Implementation notes
- Uses the Python stdlib `http.server` module; no additional runtime dependency.
- The `[server]` optional-dependency group is reserved for a future migration to `aiohttp`/`starlette` if native async serving is required.
- `LLMServer(adapter, port=0)` binds to an OS-assigned free port; read it back via `server.port` after `start()`.
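The following self-contained sketch illustrates the stdlib approach and the port=0 behaviour with a comparable server built on `http.server.ThreadingHTTPServer`. It is not `LLMServer`, and it differs in three assumed ways:

- The adapter is modelled as a plain callable that takes (prompt, config) and returns a dict.
- The 400/404 error codes are placeholders.
- Every adapter failure maps to 500 instead of the 429/502/504 split the contract defines.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable

# Hypothetical adapter shape: (prompt, config) -> LLMResponse-like dict.
Adapter = Callable[[str, dict], dict]


def make_handler(adapter: Adapter) -> type:
    class Handler(BaseHTTPRequestHandler):
        def _send(self, status: int, payload: dict) -> None:
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self) -> None:
            if self.path == "/health":
                self._send(200, {"status": "ok"})
            else:
                self._send(404, {"error": "not_found", "message": "unknown path"})  # placeholder code

        def do_POST(self) -> None:
            if self.path != "/execute":
                self._send(404, {"error": "not_found", "message": "unknown path"})
                return
            try:
                length = int(self.headers.get("Content-Length") or 0)
                data = json.loads(self.rfile.read(length) or b"null")
            except ValueError:
                data = None  # invalid JSON or bad Content-Length
            if not isinstance(data, dict) or not isinstance(data.get("prompt"), str):
                self._send(400, {"error": "bad_request", "message": "missing prompt or invalid JSON"})
                return
            try:
                result = adapter(data["prompt"], data.get("config") or {})
            except Exception as exc:
                # Generic message so provider details (and credentials) are not echoed back.
                self._send(500, {"error": "internal_error", "message": "adapter failure",
                                 "type": type(exc).__name__})
                return
            self._send(200, result)

        def log_message(self, format: str, *args) -> None:
            pass  # keep the sketch quiet

    return Handler


def start_server(adapter: Adapter, host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start serving in a daemon thread; with port=0 the OS picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(adapter))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server  # server.server_address[1] holds the bound port
```

Binding to port 0 is what lets parallel test runs each start their own server without coordinating port numbers.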
CLI
```
python -m llm_connect.server [--host HOST] [--port PORT] [--provider PROVIDER] [--model MODEL] [--disable-profiles] [--strict-profiles]
```
CLI defaults can also be supplied through the environment variables LLM_CONNECT_HOST, LLM_CONNECT_PORT,
LLM_CONNECT_PROVIDER, and LLM_CONNECT_MODEL. The default provider is mock; any
provider registered with create_adapter is valid.
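The CLI's environment-backed defaults can be modelled with argparse. Each variable is read as the argument's default, so an explicit flag wins over the environment. The 127.0.0.1 host and 8000 port fallbacks are placeholders, because the contract does not state them. The mock provider default comes from this section. parse_args and its env parameter are hypothetical. The sketch parses --strict-profiles but does not interpret it.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None,
               env: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    """Parse CLI flags, using LLM_CONNECT_* environment variables as defaults."""
    env = os.environ if env is None else env
    p = argparse.ArgumentParser(prog="python -m llm_connect.server")
    # Host/port fallbacks are placeholders, not documented defaults.
    p.add_argument("--host", default=env.get("LLM_CONNECT_HOST", "127.0.0.1"))
    p.add_argument("--port", type=int, default=int(env.get("LLM_CONNECT_PORT", "8000")))
    p.add_argument("--provider", default=env.get("LLM_CONNECT_PROVIDER", "mock"))
    p.add_argument("--model", default=env.get("LLM_CONNECT_MODEL"))
    p.add_argument("--disable-profiles", action="store_true")
    p.add_argument("--strict-profiles", action="store_true")
    return p.parse_args(argv)
```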
Known consumers
- `inter-hub` (IHUB-WP-0012 Phase 11): drives federation calls over HTTP from non-Python services.