# Contract: HTTP Serve Mode

**layer:** Functional
**maturity:** Beta
**module:** `llm_connect.server`
**since:** WP-0003

## Purpose

Expose any `LLMAdapter` as a lightweight HTTP service. Intended for local/inter-process use; not hardened for public internet exposure.

## API endpoints

### `GET /health`

Liveness probe.

**Response 200**

```json
{"status": "ok"}
```

---

### `POST /execute`

Execute a prompt through the configured adapter.

**Request body** (JSON)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | yes | Prompt text |
| `config` | object | no | `RunConfig` overrides (see below) |

`config` sub-fields (all optional, defaults match `RunConfig` defaults):

| Field | Type | Default |
|-------|------|---------|
| `model_name` | string | `"gpt-4"` |
| `temperature` | float | `0.7` |
| `max_tokens` | int | `2000` |
| `timeout_seconds` | int | `300` |

**Response 200** (`LLMResponse.to_dict()` shape)

```json
{
  "content": "...",
  "model": "...",
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
  "finish_reason": "stop",
  "metadata": {}
}
```

**Error responses**

| HTTP | Condition |
|------|-----------|
| 400 | Missing `prompt` field or invalid JSON body |
| 404 | Unknown path |
| 429 | Provider rate limit |
| 500 | Configuration or adapter failure |
| 502 | Provider API / transport failure |
| 504 | Provider timeout |

Server error bodies are structured and must not expose provider credentials:

```json
{
  "error": "provider_api_error",
  "message": "HTTP 500 from https://provider.example/v1?key=",
  "type": "LLMAPIError",
  "provider_status": 500
}
```

Known error codes include `unknown_profile`, `configuration_error`, `provider_api_error`, `provider_rate_limited`, `provider_timeout`, `budget_exceeded`, `llm_error`, and `internal_error`.

## Runtime profiles

Server CLI mode wraps the configured adapter with runtime profile dispatch unless `--disable-profiles` is passed.
The activity-core profile `custodian-triage-balanced` is built in and resolves to the configured provider and model before calling the underlying adapter.

Default profile values:

| Field | Default |
|-------|---------|
| provider | `openrouter` |
| model | `anthropic/claude-sonnet-4` |
| temperature | `0.2` |
| max_tokens | `1800` |
| max_depth | `2` |
| timeout_seconds | `300` |
| model_params.reasoning_effort | `medium` |

Profile provider/model and default call values can be overridden with environment variables such as `LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER`, `LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL`, and `LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS`. Operators can also set `LLM_CONNECT_PROFILES_JSON` or `LLM_CONNECT_PROFILE_FILE` to provide JSON profile definitions keyed by profile name.

## Implementation notes

- Uses Python stdlib `http.server`: **no additional runtime dependency**.
- The `[server]` optional-dependency group is reserved for future migration to `aiohttp`/`starlette` if native async serving is required.
- `LLMServer(adapter, port=0)` binds to an OS-assigned free port; read back via `server.port` after `start()`.

## CLI

```
python -m llm_connect.server [--host HOST] [--port PORT]
                             [--provider PROVIDER] [--model MODEL]
                             [--disable-profiles] [--strict-profiles]
```

CLI defaults can also be supplied with `LLM_CONNECT_HOST`, `LLM_CONNECT_PORT`, `LLM_CONNECT_PROVIDER`, and `LLM_CONNECT_MODEL`.

Default provider: `mock`. All registered providers from `create_adapter` are valid.

## Known consumers

- `inter-hub` (IHUB-WP-0012 Phase 11): drives federation calls over HTTP from non-Python services.
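
## Example client (illustrative)

The `POST /execute` contract above can be exercised from any HTTP client. The following is a minimal sketch of a stdlib-only Python caller, not part of `llm_connect` itself. The helper names `build_execute_request`, `parse_body`, and `execute`, and the client-side timeout value, are hypothetical and chosen only for illustration. It also assumes that a non-JSON error body is possible (for example on 400 or 404) and falls back to the raw text in that case.

```python
import json
import urllib.error
import urllib.request


def build_execute_request(base_url, prompt, config=None):
    """Build a POST /execute request with the documented JSON body."""
    body = {"prompt": prompt}
    if config:
        # Optional RunConfig overrides, e.g. {"temperature": 0.2}.
        body["config"] = config
    return urllib.request.Request(
        base_url.rstrip("/") + "/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_body(raw):
    """Decode a response body as JSON, falling back to the raw text."""
    try:
        return json.loads(raw)
    except ValueError:
        return {"raw": raw.decode("utf-8", errors="replace")}


def execute(base_url, prompt, config=None, timeout=310):
    """Send the request and return (http_status, decoded_body).

    The default client timeout is an assumption: slightly above the
    documented 300-second `timeout_seconds` default.
    """
    request = build_execute_request(base_url, prompt, config)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, parse_body(response.read())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive here; the body may hold the
        # structured error fields (error, message, type, ...).
        return err.code, parse_body(err.read())
```

With a server listening locally (the address here is an assumption), `execute("http://127.0.0.1:8080", "Say hello", {"max_tokens": 50})` would return the status code together with either the `LLMResponse.to_dict()` payload or the error body, so callers can branch on the status and the `error` code without catching exceptions.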