llm-connect/README.md
tegwick 14ba47c129
Add activity-core LLM endpoint support
2026-06-07 19:24:45 +02:00


# llm-connect
Pluggable LLM adapters for Python and the command line. Supports OpenRouter, Gemini,
OpenAI, and the Claude Code CLI out of the box, with a clean abstract interface for adding
your own.
## Quick start
```python
from llm_connect import create_adapter, RunConfig
adapter = create_adapter("gemini", model="gemini-2.5-flash")
config = RunConfig(temperature=0.7, max_tokens=1000)
response = adapter.execute_prompt("Summarise the value chain concept.", config)
print(response.content)
```
## Installation
```bash
pip install -e /path/to/llm-connect # local editable install
# or, once published:
pip install llm-connect
```
**Requires:** Python 3.10+, `toml`
## Providers
| Provider key | Class | Notes |
|---|---|---|
| `"openrouter"` | `OpenRouterAdapter` | OpenAI-compatible endpoint; supports all OpenRouter models |
| `"gemini"` | `GeminiAdapter` | Google Generative Language REST API; supports free tier |
```python
from llm_connect import create_adapter
# OpenRouter
adapter = create_adapter("openrouter", model="anthropic/claude-sonnet-4")
# Gemini (uses GEMINI_API_KEY env var or apikey-geminifree.txt)
adapter = create_adapter("gemini", model="gemini-2.5-flash")
# OpenAI (uses OPENAI_API_KEY env var)
adapter = create_adapter("openai", model="gpt-4.1-mini")
# Claude Code CLI (uses locally installed claude binary)
adapter = create_adapter("claude-code")
```
## API keys
Keys are resolved in this order (first found wins):
1. Explicit `api_key` argument to the constructor
2. Environment variable (e.g. `OPENROUTER_API_KEY`, `GEMINI_API_KEY`, `OPENAI_API_KEY`)
3. Key file in the project root (e.g. `apikey-openrouter.txt`, `apikey-geminifree.txt`)
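The lookup order above can be sketched as a small helper. This is an illustrative stand-in, not the library's own code: the function name `resolve_api_key` and its parameters are hypothetical, and the real adapters may handle whitespace or missing files differently.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(
    explicit: Optional[str],
    env_var: str,
    key_file: Path,
) -> Optional[str]:
    """Return the first key found: argument, then env var, then key file."""
    # 1. An explicit argument always wins.
    if explicit:
        return explicit
    # 2. Fall back to the environment variable, e.g. GEMINI_API_KEY.
    from_env = os.environ.get(env_var)
    if from_env:
        return from_env
    # 3. Finally, read the key file if it exists (assumed: one key, plain text).
    if key_file.is_file():
        return key_file.read_text(encoding="utf-8").strip() or None
    return None
```

A call such as `resolve_api_key(None, "GEMINI_API_KEY", Path("apikey-geminifree.txt"))` would then return the environment value if set, and otherwise the file contents.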
## Core types
### `RunConfig`
Controls a single LLM call.
```python
from llm_connect import RunConfig
config = RunConfig(
    model_name="gemini-2.5-flash",  # overrides adapter default
    temperature=0.3,
    max_tokens=2000,
    timeout_seconds=60,
)
```
| Field | Default | Description |
|---|---|---|
| `model_name` | `"gpt-4"` | Model identifier (adapter may override) |
| `temperature` | `0.7` | Sampling temperature |
| `max_tokens` | `2000` | Maximum output tokens |
| `model_params` | `{}` | Portable extras translated by each adapter; see `docs/adapter-model-params.md` |
| `max_depth` | `3` | Max nesting depth for recursive calls |
| `skip_if_exists` | `True` | Skip the call if an identical input hash has already been processed |
| `timeout_seconds` | `300` | Request timeout |
### `LLMResponse`
Returned by every `execute_prompt` call.
```python
response = adapter.execute_prompt(prompt, config)
print(response.content) # generated text
print(response.model) # model actually used
print(response.usage) # {"prompt_tokens": …, "completion_tokens": …, "total_tokens": …}
print(response.finish_reason) # "stop", "length", etc.
```
## Server diagnostics
Serve mode can include a debug envelope without changing normal responses:
```bash
LLM_CONNECT_DEBUG=1 python -m llm_connect.server --provider openrouter
curl 'http://127.0.0.1:8080/execute?debug=1' -d '{"prompt":"hi"}'
```
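The same request can be built from Python with the standard library. This is a sketch under assumptions: the helper name `build_execute_request` is hypothetical, the JSON content type is assumed to be accepted by the server, and nesting per-call settings under a `"config"` key is inferred rather than confirmed. The function only builds the request; sending it needs a running server.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_execute_request(
    prompt: str,
    base_url: str = "http://127.0.0.1:8080",
    debug: bool = False,
    config: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the /execute endpoint."""
    payload: Dict[str, Any] = {"prompt": prompt}
    if config:
        # Assumed shape: per-call settings nested under "config".
        payload["config"] = config
    url = f"{base_url}/execute" + ("?debug=1" if debug else "")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption
        method="POST",
    )


# Sending requires a running server:
# with urllib.request.urlopen(build_execute_request("hi", debug=True)) as resp:
#     print(resp.read().decode("utf-8"))
```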
Set `LLM_CONNECT_AUDIT_DIR=/path/to/audit` to write per-call replay records,
then parse a saved record without making another provider call:
```bash
python -m llm_connect.replay /path/to/audit/record.json --json
```
## Server runtime profiles
Serve mode enables named runtime profiles by default. A client can send
`config.model_name="custodian-triage-balanced"` and the server resolves it to
the configured provider/model before calling the adapter.
Useful runtime environment variables:
```bash
LLM_CONNECT_HOST=0.0.0.0
LLM_CONNECT_PORT=8080
LLM_CONNECT_PROVIDER=openrouter
LLM_CONNECT_MODEL=anthropic/claude-sonnet-4
LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER=openrouter
LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=anthropic/claude-sonnet-4
```
For local smoke tests without provider credentials:
```bash
export LLM_CONNECT_MOCK_RESPONSE="$(python -c 'import json; print(json.dumps(json.load(open("fixtures/activity_core/daily-triage-valid-content.json"))))')"
python -m llm_connect.server --provider mock
python scripts/smoke_activity_core_endpoint.py --url http://127.0.0.1:8080
```
Disable profile dispatch with `--disable-profiles`. Set
`LLM_CONNECT_STRICT_PROFILES=1` or pass `--strict-profiles` to reject direct
model names that are not configured profiles.
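The dispatch behaviour described in this section can be pictured roughly as follows. This is a minimal sketch, not the server's implementation: the `PROFILES` table, the `resolve_model` function, and the choice to raise `ValueError` in strict mode are all hypothetical, and how the server treats strict mode when profiles are disabled is not covered here.

```python
from typing import Dict, Tuple

# Hypothetical profile table; the real server builds its own from configuration.
PROFILES: Dict[str, Tuple[str, str]] = {
    "custodian-triage-balanced": ("openrouter", "anthropic/claude-sonnet-4"),
}


def resolve_model(
    model_name: str,
    default_provider: str,
    profiles_enabled: bool = True,
    strict: bool = False,
) -> Tuple[str, str]:
    """Map a requested model name to a (provider, model) pair."""
    # A known profile name resolves to its configured provider and model.
    if profiles_enabled and model_name in PROFILES:
        return PROFILES[model_name]
    # In strict mode, anything that is not a configured profile is rejected.
    if profiles_enabled and strict:
        raise ValueError(f"unknown profile: {model_name!r}")
    # Otherwise treat the name as a direct model on the default provider.
    return default_provider, model_name
```

With profiles enabled, `resolve_model("custodian-triage-balanced", "gemini")` yields the OpenRouter pair, while a plain model name passes through unchanged unless strict mode is on.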
## Writing your own adapter
```python
from llm_connect import LLMAdapter, RunConfig, LLMResponse

class MyAdapter(LLMAdapter):
    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
        # call your API here
        return LLMResponse(content="...", model="my-model")

    def validate_config(self, config: RunConfig) -> bool:
        return True
```
## TOML configuration chain
The `resolve_llm()` function walks a 7-level priority chain to pick a
provider and model. This is used by the `llm-helper` integration but is also
available standalone:
```python
from llm_connect.toml_config import resolve_llm
resolved = resolve_llm(app_name="myapp")
print(resolved.provider, resolved.model, resolved.provider_source)
```
Priority order (highest first):
1. CLI flags (`cli_provider`, `cli_model` arguments)
2. Env var `{APP_NAME}_HELPER_MODEL` (model only)
3. User preference — `~/.config/{app_name}/config.toml` `[llm.preference]`
4. Directory preference — `.{app_name}.toml` `[llm.preference]`
5. Directory default — `.{app_name}.toml` `[llm.default]`
6. User default — `~/.config/{app_name}/config.toml` `[llm.default]`
7. Hardcoded fallback — `gemini / gemini-2.5-flash`
Example config file (`~/.config/myapp/config.toml`):
```toml
[llm.default]
provider = "gemini"
model = "gemini-2.5-flash"
[llm.preference]
provider = "openrouter"
model = "anthropic/claude-sonnet-4"
```
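To make the priority order concrete, here is a rough sketch of how such a chain can be walked over already-parsed config dictionaries. It is not `resolve_llm()` itself: the function `resolve` and its parameters are hypothetical, and resolving provider and model independently, field by field, is an assumption about the behaviour.

```python
from typing import Any, Dict, Optional, Tuple

Config = Dict[str, Any]


def resolve(
    cli: Config,
    env_model: Optional[str],
    user_cfg: Config,
    dir_cfg: Config,
) -> Tuple[str, str]:
    """Walk the chain, highest priority first, one field at a time."""
    user_llm = user_cfg.get("llm", {})
    dir_llm = dir_cfg.get("llm", {})
    layers = [
        cli,                                # 1. CLI flags
        {"model": env_model},               # 2. env var (model only)
        user_llm.get("preference", {}),     # 3. user preference
        dir_llm.get("preference", {}),      # 4. directory preference
        dir_llm.get("default", {}),         # 5. directory default
        user_llm.get("default", {}),        # 6. user default
        {"provider": "gemini", "model": "gemini-2.5-flash"},  # 7. fallback
    ]

    def first(key: str) -> str:
        # The fallback layer always has both keys, so this never runs dry.
        return next(layer[key] for layer in layers if layer.get(key))

    return first("provider"), first("model")
```

Given the example config file above as `user_cfg` and nothing else set, this sketch returns `("openrouter", "anthropic/claude-sonnet-4")`, because the user preference outranks the user default.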
## Embeddings
```python
from llm_connect import create_embedding_adapter, EmbeddingCache
adapter = create_embedding_adapter("openai", model="text-embedding-3-small")
cache = EmbeddingCache(cache_dir=".embeddings")
# Get embedding (cached after first call)
vec = cache.get_or_compute("my text", lambda t: adapter.embed([t])[0])
```
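The `get_or_compute` pattern can be illustrated with a tiny content-addressed cache. This is a stand-in written for illustration, not the library's `EmbeddingCache`: the class name `TinyEmbeddingCache`, the SHA-256 key, and the one-JSON-file-per-text layout are all assumptions.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


class TinyEmbeddingCache:
    """Illustrative stand-in; the storage layout here is hypothetical."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def get_or_compute(
        self, text: str, compute: Callable[[str], List[float]]
    ) -> List[float]:
        # Key the entry on a hash of the text so identical inputs share a file.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        path = self.cache_dir / f"{key}.json"
        if path.is_file():
            return json.loads(path.read_text(encoding="utf-8"))
        # Cache miss: compute once, persist, and return.
        vector = compute(text)
        path.write_text(json.dumps(vector), encoding="utf-8")
        return vector
```

The point of the design is that the expensive `compute` callable runs at most once per distinct text, even across process restarts.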
## Exceptions
```python
from llm_connect.exceptions import (
    LLMError,               # base
    LLMConfigurationError,  # bad key, unknown provider
    LLMAPIError,            # HTTP error from provider (has .status_code)
    LLMRateLimitError,      # 429
    LLMTimeoutError,        # request timed out
    LLMSubprocessError,     # claude CLI failed (has .return_code, .stderr)
)
```
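One common use of a hierarchy like this is retrying only the transient failures. The sketch below defines local stand-in exception classes so it runs without the package installed; in real code you would import them from `llm_connect.exceptions` instead. The helper `call_with_retry`, the exponential backoff schedule, and the stand-ins' direct inheritance from `LLMError` are assumptions, not part of the library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Local stand-ins mirroring the names above (their real base classes may differ).
class LLMError(Exception):
    pass


class LLMRateLimitError(LLMError):
    pass


class LLMTimeoutError(LLMError):
    pass


def call_with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry rate-limit and timeout errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except (LLMRateLimitError, LLMTimeoutError):
            # Give up after the last attempt; other LLMErrors propagate at once.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Typical use would wrap an adapter call, for example `call_with_retry(lambda: adapter.execute_prompt(prompt, config))`.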
## Testing
```python
from llm_connect import MockLLMAdapter, RunConfig
mock = MockLLMAdapter(mock_response="Test response")
config = RunConfig()
response = mock.execute_prompt("any prompt", config)
assert response.content == "Test response"
assert mock.call_count == 1
```
## Origin
Extracted from the [markitect](https://github.com/worsch/markitect) project.
The `markitect.llm` module remains a re-export shim pointing here.