coulomb/llm-connect

generated from coulomb/repo-seed

tegwick 9de0f495db

CI / test (3.10) (push) Has been cancelled

CI / test (3.11) (push) Has been cancelled

CI / test (3.12) (push) Has been cancelled

Pass --output-format json with --json-schema and unwrap CLI envelope

The Claude Code adapter previously passed --json-schema alone. On Claude
CLI 2.1.160 that combination still emits the model's conversational
preamble on stdout while the schema-enforced structured payload ships on
a sidecar channel the adapter cannot read. Result: callers requesting
structured output got prose that fails JSON parsing downstream — exactly
the failure mode the activity-core CUST-WP-0045 daily triage canary hit
on 2026-06-01 ("Triage report generated and returned via structured
output. Key signals:..." → json.loads error at column 1).

Fix: when --json-schema is set, also pass --output-format json. The CLI
then writes a JSON envelope on stdout. The adapter unwraps it by
probing a small allowlist of known text-bearing fields (result,
result_text, content, text, output). Unknown envelope shapes fall
through to raw stdout so the operator can introspect the structure and
extend the allowlist.

The unwrap path is only triggered when --json-schema was set, so non-
schema callers keep the existing raw-stdout behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-02 10:20:24 +02:00
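
The unwrap step described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the names `unwrap_envelope` and `TEXT_FIELDS` are hypothetical, and only the list of probed fields is taken from the commit message. The sketch assumes the envelope is a JSON object and that the first allowlisted field holding a string is the payload.

```python
import json

# Allowlist of text-bearing envelope fields named in the commit message.
TEXT_FIELDS = ("result", "result_text", "content", "text", "output")


def unwrap_envelope(stdout: str, fields=TEXT_FIELDS) -> str:
    """Return the text payload of a JSON envelope, or raw stdout if unknown.

    Hypothetical helper: the real adapter may differ in naming and details.
    """
    try:
        envelope = json.loads(stdout)
    except json.JSONDecodeError:
        # Not JSON at all: hand back the raw output unchanged.
        return stdout
    if not isinstance(envelope, dict):
        return stdout
    for name in fields:
        value = envelope.get(name)
        if isinstance(value, str):
            return value
    # Unknown envelope shape: fall through to raw stdout so it can be inspected.
    return stdout


wrapped = json.dumps({"type": "result", "result": '{"status": "ok"}'})
payload = unwrap_envelope(wrapped)
```

Falling back to the raw stdout keeps unknown envelope shapes visible to the operator instead of silently dropping them.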

Name	Last commit	Date
.claude/rules	Refresh agent instruction files	2026-05-18 16:55:44 +02:00
.github/workflows	feat: WP-0001 foundation + WP-0002 core extensions	2026-04-01 22:24:14 +00:00
contracts	Implement-LLM-WP-0005-cost-model-estimators	2026-05-19 05:02:20 +02:00
docs	Implement-LLM-WP-0005-cost-model-estimators	2026-05-19 05:02:20 +02:00
examples	Adaptive routing initial version	2026-05-18 11:38:12 +02:00
llm_connect	Pass --output-format json with --json-schema and unwrap CLI envelope	2026-06-02 10:20:24 +02:00
tests	Pass --output-format json with --json-schema and unwrap CLI envelope	2026-06-02 10:20:24 +02:00
workplans	Implement-LLM-WP-0005-cost-model-estimators	2026-05-19 05:02:20 +02:00
.custodian-brief.md	chore(consistency): sync task status from DB [auto]	2026-05-17 22:54:25 +02:00
.gitignore	chore: add .gitignore, remove pycache	2026-02-27 07:54:53 +01:00
AGENTS.md	Refresh agent instruction files	2026-05-18 16:55:44 +02:00
ARCHITECTURE-LAYERS.md	feat: WP-0001 foundation + WP-0002 core extensions	2026-04-01 22:24:14 +00:00
CLAUDE.md	chore(custodian): add CLAUDE.md and .claude/rules/ orientation files	2026-04-01 23:15:29 +02:00
FEATURE_REQUESTS.md	added feature requests	2026-04-01 21:08:15 +00:00
INTENT.md	Added INTENT.md file and reviewed scope	2026-05-03 17:46:24 +02:00
pyproject.toml	Implement-LLM-WP-0005-cost-model-estimators	2026-05-19 05:02:20 +02:00
README.md	Added INTENT.md file and reviewed scope	2026-05-03 17:46:24 +02:00
SCOPE.md	Scope update from repo-scoping refactor	2026-05-01 12:26:51 +02:00
tpsc.yaml	Third party services catalog declaration	2026-03-25 00:10:13 +01:00
uv.lock	Preserve llm-connect run config in server mode	2026-05-19 20:55:02 +02:00

README.md

llm-connect

Pluggable LLM adapters for Python and the command line. Supports OpenRouter, Gemini, OpenAI, and the Claude Code CLI out of the box, with a clean abstract interface for adding your own.

Quick start

from llm_connect import create_adapter, RunConfig

adapter = create_adapter("gemini", model="gemini-2.5-flash")
config = RunConfig(temperature=0.7, max_tokens=1000)
response = adapter.execute_prompt("Summarise the value chain concept.", config)
print(response.content)

Installation

pip install -e /path/to/llm-connect     # local editable install
# or, once published:
pip install llm-connect

Requires: Python 3.10+, toml

Providers

Provider key	Class	Notes
`"openrouter"`	`OpenRouterAdapter`	OpenAI-compatible endpoint; supports all OpenRouter models
`"gemini"`	`GeminiAdapter`	Google Generative Language REST API; supports free tier

from llm_connect import create_adapter

# OpenRouter
adapter = create_adapter("openrouter", model="anthropic/claude-sonnet-4")

# Gemini (uses GEMINI_API_KEY env var or apikey-geminifree.txt)
adapter = create_adapter("gemini", model="gemini-2.5-flash")

# OpenAI (uses OPENAI_API_KEY env var)
adapter = create_adapter("openai", model="gpt-4.1-mini")

# Claude Code CLI (uses locally installed claude binary)
adapter = create_adapter("claude-code")

API keys

Keys are resolved in this order (first found wins):

1. Explicit api_key argument to the constructor
2. Environment variable (e.g. OPENROUTER_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY)
3. Key file in the project root (e.g. apikey-openrouter.txt, apikey-geminifree.txt)
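
The lookup order above can be sketched as a small function. This is an illustration of the "first found wins" rule only: `resolve_api_key` and its parameters are hypothetical names, not part of the llm-connect API, and the sketch assumes a key file holds just the key with optional surrounding whitespace.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str],
    env_var: str,
    key_file: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first key found: explicit argument, env var, then key file.

    Hypothetical helper for illustration; the library's own lookup may differ.
    """
    # 1. An explicit argument always wins.
    if explicit:
        return explicit
    # 2. Environment variable (the mapping is injectable to ease testing).
    env = os.environ if environ is None else environ
    value = env.get(env_var)
    if value:
        return value
    # 3. Key file, assumed to contain only the key.
    path = Path(key_file)
    if path.is_file():
        return path.read_text(encoding="utf-8").strip() or None
    return None


key = resolve_api_key(
    explicit=None,
    env_var="GEMINI_API_KEY",
    key_file="apikey-geminifree.txt",
    environ={"GEMINI_API_KEY": "from-env"},
)
```

Passing `environ` explicitly keeps the example independent of the real process environment.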

Core types

`RunConfig`

Controls a single LLM call.

from llm_connect import RunConfig

config = RunConfig(
    model_name="gemini-2.5-flash",  # overrides adapter default
    temperature=0.3,
    max_tokens=2000,
    timeout_seconds=60,
)

Field	Default	Description
`model_name`	`"gpt-4"`	Model identifier (adapter may override)
`temperature`	`0.7`	Sampling temperature
`max_tokens`	`2000`	Maximum output tokens
`model_params`	`{}`	Extra provider-specific parameters
`max_depth`	`3`	Max nesting depth for recursive calls
`skip_if_exists`	`True`	Skip if identical input hash already processed
`timeout_seconds`	`300`	Request timeout
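
To make the defaults in the table concrete, here is a stand-in dataclass that mirrors them. It is not the library's `RunConfig` definition; the name `RunConfigSketch` and the field types are assumptions inferred from the table.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass
class RunConfigSketch:
    """Stand-in mirroring the documented defaults (types are assumed)."""

    model_name: str = "gpt-4"
    temperature: float = 0.7
    max_tokens: int = 2000
    # default_factory gives each instance its own dict rather than a shared one
    model_params: Dict[str, Any] = field(default_factory=dict)
    max_depth: int = 3
    skip_if_exists: bool = True
    timeout_seconds: int = 300


default = RunConfigSketch()

# Override only what differs; every other field keeps its default.
deterministic = RunConfigSketch(temperature=0.0, model_params={"top_p": 0.9})

# Derive a variant without mutating the original.
quick = replace(default, timeout_seconds=30)
```

Only the fields that differ from the defaults need to be passed, which is why the earlier examples set just two or three of them.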

`LLMResponse`

Returned by every execute_prompt call.

response = adapter.execute_prompt(prompt, config)
print(response.content)       # generated text
print(response.model)         # model actually used
print(response.usage)         # {"prompt_tokens": …, "completion_tokens": …, "total_tokens": …}
print(response.finish_reason) # "stop", "length", etc.

Writing your own adapter

from llm_connect import LLMAdapter, RunConfig, LLMResponse

class MyAdapter(LLMAdapter):
    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
        # call your API here
        return LLMResponse(content="...", model="my-model")

    def validate_config(self, config: RunConfig) -> bool:
        return True

TOML configuration chain

The resolve_llm() function walks a 7-level priority chain to pick a provider and model. This is used by the llm-helper integration but is also available standalone:

from llm_connect.toml_config import resolve_llm

resolved = resolve_llm(app_name="myapp")
print(resolved.provider, resolved.model, resolved.provider_source)

Priority order (highest first):

1. CLI flags (cli_provider, cli_model arguments)
2. Env var {APP_NAME}_HELPER_MODEL (model only)
3. User preference: ~/.config/{app_name}/config.toml [llm.preference]
4. Directory preference: .{app_name}.toml [llm.preference]
5. Directory default: .{app_name}.toml [llm.default]
6. User default: ~/.config/{app_name}/config.toml [llm.default]
7. Hardcoded fallback: gemini / gemini-2.5-flash
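
The chain above amounts to "take the first level that supplies a value". The sketch below shows that idea with plain tuples. It is not the implementation of `resolve_llm()`: the names `resolve_chain`, `Resolved`, and the `model_source` field are hypothetical, and it assumes provider and model are resolved independently, which would explain why one level can be "model only".

```python
from typing import List, NamedTuple, Optional, Tuple


class Resolved(NamedTuple):
    provider: str
    model: str
    provider_source: str
    model_source: str  # hypothetical field, added for illustration


# (source name, provider or None, model or None), highest priority first
Level = Tuple[str, Optional[str], Optional[str]]


def resolve_chain(levels: List[Level]) -> Resolved:
    """Pick provider and model from the first level that defines each."""
    provider = model = provider_source = model_source = None
    for source, level_provider, level_model in levels:
        if provider is None and level_provider:
            provider, provider_source = level_provider, source
        if model is None and level_model:
            model, model_source = level_model, source
    if provider is None or model is None:
        raise ValueError("chain must end with a fallback that sets both values")
    return Resolved(provider, model, provider_source, model_source)


levels: List[Level] = [
    ("cli", None, None),
    ("env", None, None),
    ("user-preference", "openrouter", "anthropic/claude-sonnet-4"),
    ("directory-preference", None, None),
    ("directory-default", None, None),
    ("user-default", "gemini", "gemini-2.5-flash"),
    ("fallback", "gemini", "gemini-2.5-flash"),
]
resolved = resolve_chain(levels)
```

With these inputs the user preference wins over the user default, because it sits higher in the chain.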

Example config file (~/.config/myapp/config.toml):

[llm.default]
provider = "gemini"
model = "gemini-2.5-flash"

[llm.preference]
provider = "openrouter"
model = "anthropic/claude-sonnet-4"

Embeddings

from llm_connect import create_embedding_adapter, EmbeddingCache

adapter = create_embedding_adapter("openai", model="text-embedding-3-small")
cache = EmbeddingCache(cache_dir=".embeddings")

# Get embedding (cached after first call)
vec = cache.get_or_compute("my text", lambda t: adapter.embed([t])[0])
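
The get-or-compute pattern used above can be illustrated with an in-memory stand-in. This is not how `EmbeddingCache` is implemented (it takes a `cache_dir`, so it presumably persists to disk); `InMemoryEmbeddingCache` and `fake_embed` are hypothetical, and keying by a SHA-256 of the text is an assumption.

```python
import hashlib
from typing import Callable, Dict, List


class InMemoryEmbeddingCache:
    """Illustrative cache: computes an embedding once per distinct text."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def get_or_compute(
        self, text: str, compute: Callable[[str], List[float]]
    ) -> List[float]:
        # Hash the text so long inputs map to short, fixed-length keys.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = compute(text)
        return self._store[key]


calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding call; records each invocation."""
    calls.append(text)
    return [float(len(text)), 0.0]


cache = InMemoryEmbeddingCache()
first = cache.get_or_compute("my text", fake_embed)
second = cache.get_or_compute("my text", fake_embed)  # served from the cache
```

The second call never reaches `fake_embed`, which is the point of wrapping a paid embedding call in a cache.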

Exceptions

from llm_connect.exceptions import (
    LLMError,             # base
    LLMConfigurationError,# bad key, unknown provider
    LLMAPIError,          # HTTP error from provider (has .status_code)
    LLMRateLimitError,    # 429
    LLMTimeoutError,      # request timed out
    LLMSubprocessError,   # claude CLI failed (has .return_code, .stderr)
)
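
One common use of a dedicated rate-limit exception is retrying with exponential backoff. The helper below is a generic sketch, not part of llm-connect: `call_with_retry` is a hypothetical name, and `FakeRateLimit` stands in for `LLMRateLimitError` so the example runs without the library installed.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar, Union

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Union[Type[BaseException], Tuple[Type[BaseException], ...]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on retry_on with delays of base_delay * 2**attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


class FakeRateLimit(Exception):
    """Stand-in for a provider rate-limit error."""


state = {"calls": 0}
delays: List[float] = []


def flaky() -> str:
    """Fail twice, then succeed."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise FakeRateLimit()
    return "ok"


# Record delays instead of sleeping so the example finishes instantly.
result = call_with_retry(flaky, FakeRateLimit, sleep=delays.append)
```

In real use, `retry_on` would presumably be `LLMRateLimitError`, and `fn` a lambda wrapping `adapter.execute_prompt(prompt, config)`.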

Testing

from llm_connect import MockLLMAdapter, RunConfig

mock = MockLLMAdapter(mock_response="Test response")
config = RunConfig()
response = mock.execute_prompt("any prompt", config)
assert response.content == "Test response"
assert mock.call_count == 1

Origin

Extracted from the markitect project. The markitect.llm module remains a re-export shim pointing here.