# Contract: HTTP Serve Mode

- **layer:** Functional
- **maturity:** Beta
- **module:** `llm_connect.server`
- **since:** WP-0003

## Purpose

Expose any `LLMAdapter` as a lightweight HTTP service. Intended for
local/inter-process use; not hardened for public internet exposure.

## API endpoints

### `GET /health`

Liveness probe.

**Response 200**

```json
{"status": "ok"}
```

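A consumer that starts the server itself usually has to wait until this probe answers before sending work. The following is a minimal client-side sketch using only the Python standard library; it counts the server as live only when the probe returns 200 with `{"status": "ok"}`. `is_healthy`, `wait_until_healthy`, and the default deadline and polling interval are illustrative choices, not part of `llm_connect`.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET /health answers 200 with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return resp.status == 200 and isinstance(body, dict) and body.get("status") == "ok"
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, socket timeout, non-2xx status (HTTPError is a
        # URLError subclass), or a body that is not valid JSON.
        return False


def wait_until_healthy(base_url: str, deadline_s: float = 10.0, interval_s: float = 0.2) -> bool:
    """Poll the liveness probe until it succeeds or the deadline passes."""
    stop_at = time.monotonic() + deadline_s
    while time.monotonic() < stop_at:
        if is_healthy(base_url):
            return True
        time.sleep(interval_s)
    return False
```
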
---

### `POST /execute`

Execute a prompt through the configured adapter.

**Request body** (JSON)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | yes | Prompt text |
| `config` | object | no | `RunConfig` overrides (see below) |

`config` sub-fields (all optional; omitted fields take the `RunConfig` defaults):

| Field | Type | Default |
|-------|------|---------|
| `model_name` | string | `"gpt-4"` |
| `temperature` | float | `0.7` |
| `max_tokens` | int | `2000` |
| `timeout_seconds` | int | `300` |

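A minimal sketch of building a `POST /execute` request with the standard library, assuming the client wants to catch mistakes in `config` before they reach the server. Rejecting unknown `config` keys is a client-side policy chosen for this sketch; the contract does not say how the server treats extra fields. `build_execute_request` and `CONFIG_FIELDS` are hypothetical names. The returned request is sent with `urllib.request.urlopen(req)`, which raises `urllib.error.HTTPError` for any non-2xx status.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional

# Documented `config` sub-fields mapped to the Python types accepted for each.
CONFIG_FIELDS = {
    "model_name": (str,),
    "temperature": (int, float),
    "max_tokens": (int,),
    "timeout_seconds": (int,),
}


def build_execute_request(
    base_url: str, prompt: str, config: Optional[Mapping[str, Any]] = None
) -> urllib.request.Request:
    """Build, but do not send, a POST /execute request."""
    if not isinstance(prompt, str):
        # The server answers 400 for a missing prompt; fail earlier on the client.
        raise TypeError("prompt must be a string")
    payload: dict = {"prompt": prompt}
    if config:
        # Client-side policy for this sketch: reject keys the contract does not list.
        unknown = set(config) - set(CONFIG_FIELDS)
        if unknown:
            raise ValueError(f"unknown config fields: {sorted(unknown)}")
        for key, value in config.items():
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, CONFIG_FIELDS[key]):
                raise TypeError(f"config.{key} has the wrong type: {value!r}")
        payload["config"] = dict(config)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```
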
**Response 200** (`LLMResponse.to_dict()` shape)

```json
{
  "content": "...",
  "model": "...",
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
  "finish_reason": "stop",
  "metadata": {}
}
```

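On the client side, the success body can be mapped onto a small typed record. This is a sketch under the assumption that only `content` and `model` are always present; treating `usage`, `finish_reason`, and `metadata` as optional is a defensive choice rather than something the contract states. `ExecuteResult` is a hypothetical name, not a class exported by `llm_connect`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping


@dataclass
class ExecuteResult:
    """Client-side record for the documented `LLMResponse.to_dict()` shape."""

    content: str
    model: str
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    finish_reason: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "ExecuteResult":
        # `content` and `model` are required; a KeyError signals a malformed body.
        usage = data.get("usage") or {}
        return cls(
            content=data["content"],
            model=data["model"],
            # Missing or null token counts are read as 0.
            prompt_tokens=int(usage.get("prompt_tokens") or 0),
            completion_tokens=int(usage.get("completion_tokens") or 0),
            total_tokens=int(usage.get("total_tokens") or 0),
            finish_reason=data.get("finish_reason") or "",
            metadata=dict(data.get("metadata") or {}),
        )
```
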
**Error responses**

| HTTP | Condition |
|------|-----------|
| 400 | Missing `prompt` field or invalid JSON body |
| 404 | Unknown path |
| 429 | Provider rate limit |
| 500 | Configuration or adapter failure |
| 502 | Provider API / transport failure |
| 504 | Provider timeout |

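The status table suggests a split between failures worth retrying and failures that will simply recur. This is one possible client policy, not behaviour defined by the server: it retries 429 and 504, retries 502 only when the `provider_status` field of the structured error body is absent or 5xx, and treats 400, 404, and 500 as permanent. `should_retry`, `backoff_seconds`, the attempt limit, and the backoff constants are all assumptions made for the sketch.

```python
from typing import Any, Mapping, Optional


def should_retry(
    status: int,
    body: Optional[Mapping[str, Any]],
    attempt: int,
    max_attempts: int = 3,
) -> bool:
    """Decide whether to retry after `attempt` attempts (1-based) have failed."""
    if attempt >= max_attempts:
        return False
    if status in (429, 504):  # provider rate limit, provider timeout
        return True
    if status == 502:  # provider API / transport failure
        # A provider-side 4xx will fail the same way again; a 5xx or a transport
        # failure without a provider status might succeed on a later attempt.
        provider_status = (body or {}).get("provider_status")
        return provider_status is None or int(provider_status) >= 500
    # 400, 404, and 500 (configuration or adapter failure) are treated as permanent.
    return False


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff before the next attempt: base * 2**(attempt - 1), capped."""
    return min(cap, base * 2 ** (attempt - 1))
```
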
Server error bodies are structured and must not expose provider credentials:

```json
{
  "error": "provider_api_error",
  "message": "HTTP 500 from https://provider.example/v1?key=<redacted>",
  "type": "LLMAPIError",
  "provider_status": 500
}
```

Known values of the `error` field include `unknown_profile`, `configuration_error`,
`provider_api_error`, `provider_rate_limited`, `provider_timeout`,
`budget_exceeded`, `llm_error`, and `internal_error`.

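A client can turn these bodies into a single exception type so that callers branch on the `error` code rather than on the HTTP status alone. A minimal sketch, assuming only the fields shown in the example body; `LLMServerError`, `error_from_body`, and the fallback code `unparseable_error_body` are hypothetical names introduced here. The fallback covers responses that may not carry a JSON object at all (for instance, a 404 for an unknown path), since the contract does not state their format.

```python
import json
from typing import Any, Optional


class LLMServerError(Exception):
    """Client-side exception carrying the fields of a structured error body."""

    def __init__(
        self,
        http_status: int,
        code: str,
        message: str,
        error_type: Optional[str] = None,
        provider_status: Optional[int] = None,
    ) -> None:
        super().__init__(f"{http_status} {code}: {message}")
        self.http_status = http_status
        self.code = code  # e.g. "provider_timeout", "budget_exceeded"
        self.message = message
        self.error_type = error_type  # e.g. "LLMAPIError"
        self.provider_status = provider_status


def error_from_body(http_status: int, raw: bytes) -> LLMServerError:
    """Map an HTTP error status and raw response body onto LLMServerError."""
    try:
        body: Any = json.loads(raw.decode("utf-8"))
    except ValueError:  # also covers UnicodeDecodeError and JSONDecodeError
        body = None
    if not isinstance(body, dict):
        # Hypothetical fallback code for bodies that are not a JSON object.
        snippet = raw[:200].decode("utf-8", errors="replace")
        return LLMServerError(http_status, "unparseable_error_body", snippet)
    return LLMServerError(
        http_status,
        code=str(body.get("error", "unknown")),
        message=str(body.get("message", "")),
        error_type=body.get("type"),
        provider_status=body.get("provider_status"),
    )
```

Typical use wraps the standard-library error: `except urllib.error.HTTPError as exc: raise error_from_body(exc.code, exc.read()) from exc`.
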
## Runtime profiles

In CLI mode, the server wraps the configured adapter with runtime profile
dispatch unless `--disable-profiles` is passed. The activity-core profile
`custodian-triage-balanced` is built in; it resolves to the configured provider
and model before calling the underlying adapter.

Default profile values:

| Field | Default |
|-------|---------|
| `provider` | `openrouter` |
| `model` | `anthropic/claude-sonnet-4` |
| `temperature` | `0.2` |
| `max_tokens` | `1800` |
| `max_depth` | `2` |
| `timeout_seconds` | `300` |
| `model_params.reasoning_effort` | `medium` |

Profile provider/model and default call values can be overridden with
environment variables such as `LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER`,
`LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL`, and
`LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS`. Operators can also set
`LLM_CONNECT_PROFILES_JSON` or `LLM_CONNECT_PROFILE_FILE` to provide JSON
profile definitions keyed by profile name.

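To reason about which values the built-in profile will actually use, the documented defaults and the three named environment variables can be modelled as a plain dictionary merge. This is an illustrative sketch, not the server's resolution code: it assumes each variable simply replaces one field and that empty values are ignored, and it does not model `LLM_CONNECT_PROFILES_JSON` or `LLM_CONNECT_PROFILE_FILE`, whose precedence relative to the per-field variables is not specified in this contract. `effective_profile` and the constant names are hypothetical.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Defaults copied from the table above.
DEFAULT_CUSTODIAN_TRIAGE: Dict[str, Any] = {
    "provider": "openrouter",
    "model": "anthropic/claude-sonnet-4",
    "temperature": 0.2,
    "max_tokens": 1800,
    "max_depth": 2,
    "timeout_seconds": 300,
    "model_params": {"reasoning_effort": "medium"},
}

# Only the three variables named in this contract; the real list may be longer.
ENV_OVERRIDES = {
    "LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER": ("provider", str),
    "LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL": ("model", str),
    "LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS": ("max_tokens", int),
}


def effective_profile(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Illustrative model: built-in defaults with per-field env-var overrides."""
    env = os.environ if env is None else env
    profile = dict(DEFAULT_CUSTODIAN_TRIAGE)
    profile["model_params"] = dict(profile["model_params"])  # avoid aliasing the default
    for var, (field_name, cast) in ENV_OVERRIDES.items():
        raw = env.get(var)
        if raw not in (None, ""):  # assumption: empty values are ignored
            profile[field_name] = cast(raw)
    return profile
```

For example, `effective_profile({"LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS": "1200"})` keeps the `openrouter` provider and the `0.2` temperature but reports `max_tokens` as `1200`.
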
## Implementation notes

- Uses Python stdlib `http.server`; **no additional runtime dependency**.
- The `[server]` optional-dependency group is reserved for a future migration
  to `aiohttp`/`starlette` if native async serving is required.
- `LLMServer(adapter, port=0)` binds to an OS-assigned free port; read it back
  via `server.port` after `start()`.

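For exercising consumers without a real provider, a stdlib stand-in that follows the same `/health` and `/execute` shapes and the same port-0 binding can be handy. This is a minimal sketch, not `llm_connect.server`: the plain `prompt -> str` adapter callable, `make_handler`, and `start_stub_server` are hypothetical, the real `LLMAdapter` interface is not described in this contract, and 4xx bodies here use the stdlib default format, which may differ from the real server's.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Callable

# Hypothetical stand-in for an adapter: takes a prompt, returns text.
Adapter = Callable[[str], str]


def make_handler(adapter: Adapter):
    class Handler(BaseHTTPRequestHandler):
        def _send_json(self, status: int, payload: dict) -> None:
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self) -> None:
            if self.path == "/health":
                self._send_json(200, {"status": "ok"})
            else:
                self.send_error(404)  # unknown path

        def do_POST(self) -> None:
            if self.path != "/execute":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                request = json.loads(self.rfile.read(length) or b"null")
            except ValueError:
                request = None  # invalid JSON body
            if not isinstance(request, dict) or not isinstance(request.get("prompt"), str):
                self.send_error(400, "missing prompt field or invalid JSON body")
                return
            try:
                text = adapter(request["prompt"])
            except Exception as exc:  # loosely mirrors the documented 500 + structured body
                self._send_json(500, {"error": "internal_error", "message": str(exc),
                                      "type": type(exc).__name__})
                return
            self._send_json(200, {
                "content": text,
                "model": "stub",
                "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                "finish_reason": "stop",
                "metadata": {},
            })

        def log_message(self, format: str, *args: Any) -> None:
            pass  # keep test output quiet

    return Handler


def start_stub_server(adapter: Adapter, host: str = "127.0.0.1") -> ThreadingHTTPServer:
    # Port 0 asks the OS for a free port, as LLMServer(adapter, port=0) does.
    server = ThreadingHTTPServer((host, 0), make_handler(adapter))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server  # server.server_address[1] is the bound port
```

Because each stub binds to port 0, several can run in one test process without port clashes; read the chosen port from `server.server_address[1]` and stop the stub with `server.shutdown()`.
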
## CLI

```
python -m llm_connect.server [--host HOST] [--port PORT] [--provider PROVIDER] [--model MODEL] [--disable-profiles] [--strict-profiles]
```

CLI defaults can also be supplied with `LLM_CONNECT_HOST`, `LLM_CONNECT_PORT`,
`LLM_CONNECT_PROVIDER`, and `LLM_CONNECT_MODEL`. The default provider is `mock`;
any provider registered with `create_adapter` is accepted.

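When a test harness or process supervisor launches the server as a subprocess, it helps to build the command line and environment in one place. A minimal sketch that uses only the flags and variables listed above; `server_command` and `server_env` are hypothetical helpers. The phrase "CLI defaults" suggests that explicit flags take precedence over the environment variables, but that is an inference, not a documented guarantee.

```python
import sys
from typing import Any, Dict, List, Mapping, Optional


def server_command(
    host: Optional[str] = None,
    port: Optional[int] = None,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    disable_profiles: bool = False,
    strict_profiles: bool = False,
) -> List[str]:
    """Build argv for `python -m llm_connect.server` from the documented flags."""
    argv = [sys.executable, "-m", "llm_connect.server"]
    for flag, value in (("--host", host), ("--port", port),
                        ("--provider", provider), ("--model", model)):
        if value is not None:
            argv += [flag, str(value)]
    if disable_profiles:
        argv.append("--disable-profiles")
    if strict_profiles:
        argv.append("--strict-profiles")
    return argv


def server_env(base: Mapping[str, str], **settings: Any) -> Dict[str, str]:
    """Copy `base` and set LLM_CONNECT_HOST/PORT/PROVIDER/MODEL from `settings`."""
    env = dict(base)
    for key in ("host", "port", "provider", "model"):
        value = settings.get(key)
        if value is not None:
            env[f"LLM_CONNECT_{key.upper()}"] = str(value)
    return env
```

For example, `subprocess.Popen(server_command(port=8765, provider="mock"))` starts a server backed by the `mock` provider, after which the caller can poll `GET /health` before sending requests.
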
## Known consumers

- `inter-hub` (IHUB-WP-0012 Phase 11): drives federation calls over HTTP from non-Python services.