coulomb/llm-connect

tegwick 14ba47c129

Add activity-core LLM endpoint support

2026-06-07 19:24:45 +02:00

Contract: HTTP Serve Mode

layer: Functional
maturity: Beta
module: llm_connect.server
since: WP-0003

Purpose

Expose any LLMAdapter as a lightweight HTTP service. Intended for local/inter-process use; not hardened for public internet exposure.

API endpoints

`GET /health`

Liveness probe.

Response 200

{"status": "ok"}

`POST /execute`

Execute a prompt through the configured adapter.

Request body (JSON)

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | yes | Prompt text |
| `config` | object | no | `RunConfig` overrides (see below) |

config sub-fields (all optional, defaults match RunConfig defaults):

| Field | Type | Default |
|---|---|---|
| `model_name` | string | `"gpt-4"` |
| `temperature` | float | `0.7` |
| `max_tokens` | int | `2000` |
| `timeout_seconds` | int | `300` |

Response 200 — LLMResponse.to_dict() shape

{
  "content": "...",
  "model": "...",
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
  "finish_reason": "stop",
  "metadata": {}
}
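The request and response shapes above can be exercised from a small stdlib client. This is a sketch under assumptions: `build_execute_request` and `execute` are hypothetical helper names, the `Content-Type` header is sent as a courtesy (the contract does not say whether the server requires it), and the client timeout is simply chosen to exceed the default `timeout_seconds` of 300.

```python
import json
import urllib.request
from typing import Any, Optional


def build_execute_request(
    base_url: str,
    prompt: str,
    config: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build a POST /execute request; `config` carries RunConfig overrides."""
    body: dict[str, Any] = {"prompt": prompt}
    if config:
        body["config"] = config  # omitted entirely when no overrides are given
    return urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: not mandated by the contract
        method="POST",
    )


def execute(
    base_url: str,
    prompt: str,
    config: Optional[dict[str, Any]] = None,
    timeout: float = 310.0,
) -> dict[str, Any]:
    """Send the request and return the LLMResponse.to_dict()-shaped payload."""
    req = build_execute_request(base_url, prompt, config)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

For example, `execute(base_url, "Summarise this ticket", {"temperature": 0.2})["content"]` would return the generated text, with the remaining `config` fields left at their defaults.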

Error responses

| HTTP status | Condition |
|---|---|
| 400 | Missing `prompt` field or invalid JSON body |
| 404 | Unknown path |
| 429 | Provider rate limit |
| 500 | Configuration or adapter failure |
| 502 | Provider API / transport failure |
| 504 | Provider timeout |

Server error bodies are structured and must not expose provider credentials:

{
  "error": "provider_api_error",
  "message": "HTTP 500 from https://provider.example/v1?key=<redacted>",
  "type": "LLMAPIError",
  "provider_status": 500
}

Known error codes include unknown_profile, configuration_error, provider_api_error, provider_rate_limited, provider_timeout, budget_exceeded, llm_error, and internal_error.
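On the client side, the status codes and structured error body described above can be folded into one small value object. The sketch below is illustrative only: `ServerError`, `parse_error`, and the choice to treat 429, 502, and 504 as retryable are assumptions about a reasonable client policy, not behaviour defined by this contract.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumption: a client-side retry policy, not something the server prescribes.
RETRYABLE_STATUSES = {429, 502, 504}


@dataclass
class ServerError:
    status: int                      # HTTP status returned by the server
    code: str                        # the "error" field, e.g. "provider_api_error"
    message: str
    error_type: str                  # the "type" field, e.g. "LLMAPIError"
    provider_status: Optional[int] = None

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, raw: bytes) -> ServerError:
    """Parse a structured error body, tolerating missing or malformed JSON."""
    try:
        payload = json.loads(raw)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return ServerError(
        status=status,
        code=payload.get("error", "unknown"),
        message=payload.get("message", ""),
        error_type=payload.get("type", ""),
        provider_status=payload.get("provider_status"),
    )
```

With `urllib`, a non-2xx response raises `urllib.error.HTTPError`; passing `exc.code` and `exc.read()` to `parse_error` yields the structured view.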

Runtime profiles

When started from the CLI, the server wraps the configured adapter with runtime profile dispatch unless --disable-profiles is passed. The built-in activity-core profile custodian-triage-balanced resolves to the configured provider and model before the underlying adapter is called.

Default profile values:

| Field | Default |
|---|---|
| provider | `openrouter` |
| model | `anthropic/claude-sonnet-4` |
| temperature | `0.2` |
| max_tokens | `1800` |
| max_depth | `2` |
| timeout_seconds | `300` |
| model_params.reasoning_effort | `medium` |

Profile provider/model and default call values can be overridden with environment variables such as LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER, LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL, and LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS. Operators can also set LLM_CONNECT_PROFILES_JSON or LLM_CONNECT_PROFILE_FILE to provide JSON profile definitions keyed by profile name.
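As an illustration of the environment-based overrides, the sketch below prepares an environment for a server subprocess. The environment variable names come from the text above, but the JSON layout is an assumption: the contract only says definitions are keyed by profile name, so the per-profile field names (taken from the defaults table) and the nesting of `model_params` are guesses, and `profile_env` is a hypothetical helper.

```python
import json
import os
from typing import Mapping

# Assumed schema: one object per profile name, using the field names from
# the defaults table. The real schema may differ.
PROFILES = {
    "custodian-triage-balanced": {
        "provider": "openrouter",
        "model": "anthropic/claude-sonnet-4",
        "temperature": 0.2,
        "max_tokens": 1800,
        "max_depth": 2,
        "timeout_seconds": 300,
        "model_params": {"reasoning_effort": "medium"},
    }
}


def profile_env(base: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of `base` with profile settings added for a server process."""
    env = dict(base)
    env["LLM_CONNECT_PROFILES_JSON"] = json.dumps(PROFILES)
    # Single-value override for the built-in profile; env values are strings.
    env["LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS"] = "1200"
    return env


env = profile_env(os.environ)
```

The resulting mapping could be passed as `env=` to `subprocess.Popen` when launching `python -m llm_connect.server`. How the server resolves a conflict between the JSON definition and a single-value override is not stated here, so a deployment would normally pick one mechanism per field.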

Implementation notes

Uses Python stdlib http.server — no additional runtime dependency.
The [server] optional-dependency group is reserved for future migration to aiohttp/starlette if native async serving is required.
LLMServer(adapter, port=0) binds to an OS-assigned free port; read back via server.port after start().
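The port=0 behaviour comes from the operating system rather than from anything specific to this module. The stdlib-only sketch below shows the same pattern with a stand-in handler; it is not the `llm_connect.server` implementation, and `HealthHandler` and `start_on_free_port` are hypothetical names used for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Stand-in handler that answers GET /health like the contract describes."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_on_free_port():
    """Bind to port 0 so the OS picks a free port, then serve in the background."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    # The port actually assigned by the OS is available once the socket is bound.
    return server, server.server_address[1]
```

Binding to port 0 is mainly useful in tests, where several servers may run in parallel and a fixed port would collide.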

CLI

python -m llm_connect.server [--host HOST] [--port PORT] [--provider PROVIDER] [--model MODEL] [--disable-profiles] [--strict-profiles]

CLI defaults can also be supplied with LLM_CONNECT_HOST, LLM_CONNECT_PORT, LLM_CONNECT_PROVIDER, and LLM_CONNECT_MODEL. Default provider: mock. All registered providers from create_adapter are valid.

Known consumers

inter-hub (IHUB-WP-0012 Phase 11): drives federation calls over HTTP from non-Python services.
