llm-connect/contracts/functional/server.md
tegwick 14ba47c129
Add activity-core LLM endpoint support
2026-06-07 19:24:45 +02:00


Contract: HTTP Serve Mode

layer: Functional
maturity: Beta
module: llm_connect.server
since: WP-0003

Purpose

Expose any LLMAdapter as a lightweight HTTP service. Intended for local/inter-process use; not hardened for public internet exposure.

API endpoints

GET /health

Liveness probe.

Response 200

```json
{"status": "ok"}
```
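A client that needs a readiness gate before sending work can poll this endpoint. The sketch below uses only the stdlib. The names `is_healthy_response` and `probe` are hypothetical helpers, not part of llm_connect. Treating anything other than the exact documented body as unhealthy is a deliberately strict client-side choice.

```python
import json
import urllib.request


def is_healthy_response(status: int, body: bytes) -> bool:
    """True only for the documented 200 {"status": "ok"} response."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # JSONDecodeError and bad UTF-8 are both ValueErrors
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(base_url: str, timeout: float = 2.0) -> bool:
    """GET {base_url}/health; connection errors and non-2xx count as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy_response(resp.status, resp.read())
    except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
        return False
```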

POST /execute

Execute a prompt through the configured adapter.

Request body (JSON)

| Field  | Type   | Required | Description                     |
|--------|--------|----------|---------------------------------|
| prompt | string | yes      | Prompt text                     |
| config | object | no       | RunConfig overrides (see below) |

config sub-fields (all optional, defaults match RunConfig defaults):

| Field           | Type   | Default |
|-----------------|--------|---------|
| model_name      | string | "gpt-4" |
| temperature     | float  | 0.7     |
| max_tokens      | int    | 2000    |
| timeout_seconds | int    | 300     |
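The request body can be assembled on the client as sketched below. `build_execute_body` and `execute_request` are hypothetical helper names.

The sketch forwards only the four documented config keys and rejects anything else locally. This contract does not say how the server treats unknown keys, so that check is a client-side assumption. When no overrides are given, `config` is left out entirely so the RunConfig defaults apply.

```python
import json
import urllib.request

# Documented config keys for POST /execute, with their expected Python types.
CONFIG_FIELDS = {
    "model_name": str,
    "temperature": float,
    "max_tokens": int,
    "timeout_seconds": int,
}


def build_execute_body(prompt: str, **config) -> dict:
    """Build the JSON body for POST /execute, validating config overrides."""
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    unknown = set(config) - set(CONFIG_FIELDS)
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    for name, value in config.items():
        expected = CONFIG_FIELDS[name]
        # Allow ints where a float is documented (temperature=1), but never bools.
        ok = isinstance(value, expected) or (expected is float and isinstance(value, int))
        if isinstance(value, bool) or not ok:
            raise TypeError(f"{name} must be {expected.__name__}")
    body = {"prompt": prompt}
    if config:  # omit "config" so server-side RunConfig defaults apply
        body["config"] = config
    return body


def execute_request(base_url: str, prompt: str, **config) -> urllib.request.Request:
    """Prepare (but do not send) a POST /execute request."""
    data = json.dumps(build_execute_body(prompt, **config)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/execute",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the prepared request is then a matter of `urllib.request.urlopen(req, timeout=...)`. Choose a client timeout at least as long as `timeout_seconds`.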

Response 200: LLMResponse.to_dict() shape

```json
{
  "content": "...",
  "model": "...",
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
  "finish_reason": "stop",
  "metadata": {}
}
```
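On the client side, the response shape above maps naturally onto a typed record. `ExecuteResult` and `Usage` are hypothetical names used only for illustration. The sketch treats `content` and `model` as required. Tolerating a missing `usage`, `finish_reason` or `metadata` is an assumption, not something the contract guarantees.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0


@dataclass(frozen=True)
class ExecuteResult:
    """Client-side mirror of the LLMResponse.to_dict() shape."""

    content: str
    model: str
    usage: Usage
    finish_reason: Optional[str]
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ExecuteResult":
        usage = data.get("usage") or {}
        return cls(
            content=data["content"],  # KeyError if the server omits it
            model=data["model"],
            usage=Usage(
                prompt_tokens=int(usage.get("prompt_tokens", 0)),
                completion_tokens=int(usage.get("completion_tokens", 0)),
                total_tokens=int(usage.get("total_tokens", 0)),
            ),
            finish_reason=data.get("finish_reason"),
            metadata=dict(data.get("metadata") or {}),
        )
```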

Error responses

| HTTP | Condition                                 |
|------|-------------------------------------------|
| 400  | Missing prompt field or invalid JSON body |
| 404  | Unknown path                              |
| 429  | Provider rate limit                       |
| 500  | Configuration or adapter failure          |
| 502  | Provider API / transport failure          |
| 504  | Provider timeout                          |
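Callers typically need to decide which of these statuses are worth retrying. The policy below is a client-side assumption; this contract does not mandate it.

- **Retried:** 429, 502 and 504.
- **Not retried:** 400, 404 and 500.
- **Caveat on 502:** it can wrap a provider error that will simply fail again.

The `Outcome` enum, `classify` and `backoff_delays` are hypothetical names. The backoff schedule is plain exponential with no jitter.

```python
from enum import Enum
from typing import List


class Outcome(Enum):
    OK = "ok"
    CLIENT_ERROR = "client_error"  # fix the request; do not retry
    RETRYABLE = "retryable"        # likely transient, provider side
    SERVER_ERROR = "server_error"  # configuration or adapter failure


# Documented status codes mapped to an assumed client-side policy.
_STATUS_POLICY = {
    200: Outcome.OK,
    400: Outcome.CLIENT_ERROR,
    404: Outcome.CLIENT_ERROR,
    429: Outcome.RETRYABLE,
    500: Outcome.SERVER_ERROR,
    502: Outcome.RETRYABLE,
    504: Outcome.RETRYABLE,
}


def classify(status: int) -> Outcome:
    """Classify an /execute status; undocumented codes fall back by class."""
    if status in _STATUS_POLICY:
        return _STATUS_POLICY[status]
    if 200 <= status < 300:
        return Outcome.OK
    if 400 <= status < 500:
        return Outcome.CLIENT_ERROR
    return Outcome.SERVER_ERROR


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential backoff schedule (seconds) for RETRYABLE outcomes."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```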

Server error bodies are structured and must not expose provider credentials:

```json
{
  "error": "provider_api_error",
  "message": "HTTP 500 from https://provider.example/v1?key=<redacted>",
  "type": "LLMAPIError",
  "provider_status": 500
}
```
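The `message` field above shows a provider URL with its `key` query parameter masked. This contract does not specify how llm_connect performs that masking. The sketch below is one plausible stdlib approach. The helper names and the set of parameter names treated as secret (`SENSITIVE_PARAMS`) are hypothetical choices.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Query parameter names assumed to carry credentials (illustrative list).
SENSITIVE_PARAMS = {"key", "api_key", "token", "access_token"}


def redact_url(url: str) -> str:
    """Replace the value of sensitive query parameters with <redacted>."""
    parts = urlsplit(url)
    if not parts.query:
        return url
    pairs = []
    for pair in parts.query.split("&"):
        name, sep, _value = pair.partition("=")
        if sep and name.lower() in SENSITIVE_PARAMS:
            pairs.append(f"{name}=<redacted>")
        else:
            pairs.append(pair)
    # Rebuild manually; urlencode would percent-encode the angle brackets.
    return urlunsplit(parts._replace(query="&".join(pairs)))


def redact_text(text: str) -> str:
    """Redact every http(s) URL embedded in a free-form message."""
    return re.sub(r"https?://\S+", lambda m: redact_url(m.group(0)), text)
```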

Known error codes include unknown_profile, configuration_error, provider_api_error, provider_rate_limited, provider_timeout, budget_exceeded, llm_error, and internal_error.
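A client can turn these structured bodies into a single exception type. `LLMServerError` and `parse_error` are hypothetical names. The placeholder codes `"unstructured"` and `"unknown"` are not part of the contract. Because the list of codes above says "include", an unrecognized code is kept as-is rather than rejected.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Codes listed in this contract; the list may not be exhaustive.
KNOWN_ERROR_CODES = frozenset({
    "unknown_profile", "configuration_error", "provider_api_error",
    "provider_rate_limited", "provider_timeout", "budget_exceeded",
    "llm_error", "internal_error",
})


@dataclass(eq=False)  # eq=False keeps identity hashing, as for other exceptions
class LLMServerError(Exception):
    """Client-side exception built from a structured server error body."""

    http_status: int
    error: str
    message: str
    error_type: Optional[str] = None  # the body's "type" field
    provider_status: Optional[int] = None

    def __str__(self) -> str:
        return f"{self.http_status} {self.error}: {self.message}"

    @property
    def is_known_code(self) -> bool:
        return self.error in KNOWN_ERROR_CODES


def parse_error(http_status: int, body: bytes) -> LLMServerError:
    """Build an exception from an error response without trusting its body."""
    try:
        payload = json.loads(body)
    except ValueError:  # invalid JSON or undecodable bytes
        payload = None
    if not isinstance(payload, dict):
        # Not the structured shape, e.g. a plain-text 404 page.
        return LLMServerError(http_status, "unstructured", body.decode("utf-8", "replace"))
    return LLMServerError(
        http_status=http_status,
        error=str(payload.get("error", "unknown")),
        message=str(payload.get("message", "")),
        error_type=payload.get("type"),
        provider_status=payload.get("provider_status"),
    )
```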

Runtime profiles

Server CLI mode wraps the configured adapter with runtime profile dispatch unless --disable-profiles is passed. The activity-core profile custodian-triage-balanced is built in and resolves to the configured provider and model before calling the underlying adapter.

Default profile values:

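| Field                         | Default                   |
|-------------------------------|---------------------------|
| provider                      | openrouter                |
| model                         | anthropic/claude-sonnet-4 |
| temperature                   | 0.2                       |
| max_tokens                    | 1800                      |
| max_depth                     | 2                         |
| timeout_seconds               | 300                       |
| model_params.reasoning_effort | medium                    |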

Profile provider/model and default call values can be overridden with environment variables such as LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER, LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL, and LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS. Operators can also set LLM_CONNECT_PROFILES_JSON or LLM_CONNECT_PROFILE_FILE to provide JSON profile definitions keyed by profile name.
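How those overrides might combine can be sketched as below. Only the environment variable names and the default values come from this contract. The rest is illustrative and should be checked against the llm_connect source before you rely on any ordering:

- **Precedence:** built-in defaults, then the JSON definition, then the per-field variables (each layer overrides the previous one).
- **Source order:** `LLM_CONNECT_PROFILES_JSON` is read before `LLM_CONNECT_PROFILE_FILE`.
- **Type coercion:** string values from the environment are cast to `str` or `int` per field.
- **Merge depth:** the JSON definition is merged shallowly, so a JSON `model_params` replaces the default dict wholesale.

Only the three per-field variables quoted above are modeled. `resolve_profile` is a hypothetical name.

```python
import copy
import json
import os
from typing import Any, Dict, Mapping, Optional

PROFILE_NAME = "custodian-triage-balanced"

# Built-in defaults, copied from the profile defaults table.
DEFAULTS: Dict[str, Any] = {
    "provider": "openrouter",
    "model": "anthropic/claude-sonnet-4",
    "temperature": 0.2,
    "max_tokens": 1800,
    "max_depth": 2,
    "timeout_seconds": 300,
    "model_params": {"reasoning_effort": "medium"},
}

# Per-field variables named in the contract; the coercion types are assumed.
ENV_OVERRIDES = {
    "LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER": ("provider", str),
    "LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL": ("model", str),
    "LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS": ("max_tokens", int),
}


def load_profile_definitions(env: Mapping[str, str]) -> Dict[str, Any]:
    """JSON profile definitions keyed by profile name (inline JSON checked first)."""
    raw = env.get("LLM_CONNECT_PROFILES_JSON")
    path = env.get("LLM_CONNECT_PROFILE_FILE")
    if not raw and path:
        with open(path, encoding="utf-8") as fh:
            raw = fh.read()
    return json.loads(raw) if raw else {}


def resolve_profile(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Layer defaults < JSON definition < per-field env vars (assumed order)."""
    env = os.environ if env is None else env
    profile = copy.deepcopy(DEFAULTS)  # never mutate the built-in defaults
    # Shallow merge: a JSON "model_params" replaces the default dict wholesale.
    profile.update(load_profile_definitions(env).get(PROFILE_NAME, {}))
    for var, (key, cast) in ENV_OVERRIDES.items():
        if var in env:
            profile[key] = cast(env[var])
    return profile
```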

Implementation notes

  • Uses the Python stdlib http.server; no additional runtime dependency.
  • The [server] optional-dependency group is reserved for future migration to aiohttp/starlette if native async serving is required.
  • LLMServer(adapter, port=0) binds to a free port assigned by the OS; read the actual port from server.port after start().
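The port=0 behaviour comes straight from the stdlib. Binding to port 0 asks the OS for a free port, and `http.server` exposes the result through `server_address`.

The stand-in below is not LLMServer. It is a hypothetical stub: `StubHandler` and `run_health_check` are illustrative names. It serves only GET /health and answers every other path with the stdlib's default 404. That makes it handy for exercising client code in tests without any provider.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple


class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)  # stdlib HTML error page, not the JSON error shape
            return
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # silence per-request logging to stderr


def run_health_check() -> Tuple[int, int, dict]:
    """Start the stub on an OS-assigned port, GET /health, then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    port = server.server_address[1]  # the port the OS actually assigned
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/health", timeout=5) as resp:
            return port, resp.status, json.loads(resp.read())
    finally:
        server.shutdown()
        server.server_close()
```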

CLI

```
python -m llm_connect.server [--host HOST] [--port PORT] [--provider PROVIDER] [--model MODEL] [--disable-profiles] [--strict-profiles]
```

CLI defaults can also be supplied with LLM_CONNECT_HOST, LLM_CONNECT_PORT, LLM_CONNECT_PROVIDER, and LLM_CONNECT_MODEL. The default provider is mock; any provider registered with create_adapter is accepted.
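The flag set and its environment fallbacks can be mirrored with argparse. The sketch below is a hypothetical reconstruction for illustration, not the module's actual parser:

- **Defaults:** the contract states no defaults for host, port or model, so they stay `None` when the environment does not supply them. Only provider falls back to `mock`.
- **Profile switches:** `--disable-profiles` and `--strict-profiles` are modeled as plain boolean flags, which is an assumption about their form.
- **Precedence:** explicit flags override the environment.
- **Port conversion:** argparse runs string defaults through `type`, so `LLM_CONNECT_PORT` is converted to an int just like `--port`.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def build_parser(env: Mapping[str, str]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m llm_connect.server")
    # Environment variables only seed defaults; explicit flags still win.
    parser.add_argument("--host", default=env.get("LLM_CONNECT_HOST"))
    parser.add_argument("--port", type=int, default=env.get("LLM_CONNECT_PORT"))
    parser.add_argument("--provider", default=env.get("LLM_CONNECT_PROVIDER", "mock"))
    parser.add_argument("--model", default=env.get("LLM_CONNECT_MODEL"))
    parser.add_argument("--disable-profiles", action="store_true")
    parser.add_argument("--strict-profiles", action="store_true")
    return parser


def parse_cli(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> argparse.Namespace:
    env = os.environ if env is None else env
    return build_parser(env).parse_args(argv)
```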

Known consumers

  • inter-hub (IHUB-WP-0012 Phase 11): drives federation calls over HTTP from non-Python services.