feat: WP-0001 foundation + WP-0002 core extensions

WP-0001 — Foundation & GAAF Baseline
- SCOPE.md, ARCHITECTURE-LAYERS.md, contracts/ tree
- .claude/rules/ stubs filled (architecture, stack, boundary)
- 57 tests (pytest), pyproject.toml with ruff+mypy, CI workflow

WP-0002 — Core Extensions (FR-4 + FR-3)
- FR-4: BudgetTracker (thread-safe) + LLMBudgetExceededError +
  optional RunConfig.budget_tracker + enforcement in all adapters
- FR-3: async_execute_prompt on LLMAdapter ABC (asyncio.to_thread
  fallback) + native asyncio.create_subprocess_exec in ClaudeCodeAdapter

81 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-04-01 22:24:14 +00:00
parent 57b346bb8b
commit d71f4114d1
28 changed files with 1601 additions and 26 deletions

View File

@@ -1,8 +1,58 @@
## Architecture
<!-- TODO: Describe the key design decisions and component structure.
Key modules, data flows, external integrations, state machines, etc. -->
llm-connect is structured as a **GAAF-2026 layered library**. See
`ARCHITECTURE-LAYERS.md` for the full layer map and scorecard.
## Quick Reference
### Layer summary
`~/the-custodian/state-hub/mcp_server/TOOLS.md` — MCP tool reference
```
Core (frozen after v1)
LLMAdapter ABC adapter.py
RunConfig / LLMResponse models.py
LLMError hierarchy exceptions.py
MockLLMAdapter adapter.py ← test primitive, belongs with Core
Functional (evolvable, independently shippable)
OpenAIAdapter openai.py
GeminiAdapter gemini.py
OpenRouterAdapter openrouter.py
ClaudeCodeAdapter claude_code.py
EmbeddingAdapter ABC embedding_adapter.py
OpenAICompatibleEmbeddingAdapter embedding_openai.py
EmbeddingCache embedding_cache.py
create_adapter() factory.py
create_embedding_adapter() embedding_factory.py
_token_estimator _token_estimator.py
similarity utilities similarity.py
Configuration (user-controlled declarative state)
resolve_llm() chain toml_config.py ← 7-level TOML priority chain
LLMConfig / load_config config.py
_http shared utility _http.py ← also used by Functional adapters
```
### Dependency rule
Core ← Functional ← Configuration
No upward dependencies. `_http.py` is consumed by Functional only.
### Key design decisions
**API key resolution** (`config.resolve_api_key`): three-step chain —
explicit argument → environment variable → plaintext key file in project root.
Adapters raise `LLMConfigurationError` at construction time if no key is found
(except `ClaudeCodeAdapter` which needs no key).
**TOML config chain** (`toml_config.resolve_llm`): 7 priority levels allow
per-project and per-user LLM preferences. Currently defaults to `markitect`
app_name for backward compatibility — consumers pass their own `app_name`.
**Factory pattern** (`factory.create_adapter`): lazy imports prevent pulling
all provider SDKs at module load. Add a new provider by registering its FQN
in `_PROVIDERS`.
**ClaudeCodeAdapter subprocess model**: prompt is piped via stdin (not CLI
arg) to avoid shell argument length limits on large prompts (>30 KB).
**Retry logic**: `OpenAIAdapter` and `OpenRouterAdapter` retry on 429 and 5xx
with exponential backoff. `GeminiAdapter` does not (rate-limit handling deferred).

View File

@@ -1,8 +1,17 @@
## Repo boundary
This repo owns **{PROJECT_NAME}** only. It does not own:
This repo owns **llm-connect** — the multi-provider LLM client library — only.
<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code → the-custodian/state-hub/
-->
It does NOT own:
- **API key storage / secret management** → caller's environment (env vars,
key files, vault). llm-connect resolves keys but does not store them.
- **Consumer routing logic** → `inter-hub/AgentBridge.hs`, `markitect` etc.
`RoutingPolicy` (WP-0003) provides primitives; policy data belongs in the consumer.
- **The Claude Code CLI binary** → installed separately; `ClaudeCodeAdapter`
shells out to it.
- **markitect application code** → `markitect.llm` is a shim that re-exports
from here; all implementation lives in this repo.
- **State hub / custodian infrastructure** → `the-custodian/state-hub/`
- **IHF bridge scripts** → `inter-hub/scripts/llm_bridge.py` lives in inter-hub,
not here. llm-connect is a dependency of that script.

View File

@@ -1,19 +1,59 @@
## Stack
<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **Language:**
- **Key deps:**
- **Language:** Python 3.10+
- **Key deps (runtime):** `toml` (TOML config parsing)
- **Key deps (dev):** `pytest`, `ruff`, `mypy`
- **HTTP:** stdlib `urllib` via `_http.py` (no requests/httpx runtime dep)
- **Build:** setuptools / uv
## Dev Commands
```bash
# TODO: Fill in the standard commands for this repo
# Install dependencies
# Install (editable, with dev extras)
uv pip install -e ".[dev]"
# or
pip install -e ".[dev]"
# Run tests
uv run pytest
# or
pytest
# Lint / type check
# Lint
uv run ruff check .
# Build / package (if applicable)
# Type check
uv run mypy llm_connect
# Run a single test file
uv run pytest tests/test_models.py -v
# Build package (dry run)
uv build --no-sources
```
## Project layout
```
llm_connect/ source package
adapter.py LLMAdapter ABC + Mock/ErrorLLMAdapter
models.py RunConfig, LLMResponse
exceptions.py LLMError hierarchy
factory.py create_adapter()
openai.py OpenAIAdapter
gemini.py GeminiAdapter
openrouter.py OpenRouterAdapter
claude_code.py ClaudeCodeAdapter
embedding_adapter.py EmbeddingAdapter ABC
embedding_openai.py OpenAICompatibleEmbeddingAdapter
embedding_cache.py EmbeddingCache
embedding_factory.py create_embedding_adapter()
toml_config.py 7-level TOML config resolution
config.py LLMConfig, resolve_api_key, find_project_root
_http.py shared HTTP POST utility
_token_estimator.py rough token count estimate
similarity.py cosine similarity utilities
tests/ pytest test suite
contracts/ GAAF-2026 contract docs
workplans/ workplan files (LLM-WP-NNNN)
```

37
.github/workflows/ci.yml vendored Normal file
View File

@@ -0,0 +1,37 @@
name: CI
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install uv
uses: astral-sh/setup-uv@v3
- name: Install dependencies
run: uv pip install --system -e ".[dev]"
- name: Lint (ruff)
run: ruff check .
- name: Type check (mypy)
run: mypy llm_connect
- name: Test (pytest)
run: pytest

94
ARCHITECTURE-LAYERS.md Normal file
View File

@@ -0,0 +1,94 @@
# ARCHITECTURE-LAYERS.md
**Framework:** GAAF-2026
**Last reviewed:** 2026-04-01
**Repository purpose:** Multi-provider LLM client library — unified adapter interface for Python
**Next review:** 2026-07-01
---
## Layer Map
### Core (high rigidity — frozen after v1)
Domain-agnostic primitives. Must not change without a major version bump once stable.
| Module | Contents |
|--------|----------|
| `adapter.py` | `LLMAdapter` ABC (`execute_prompt`, `validate_config`); `MockLLMAdapter`; `ErrorLLMAdapter` |
| `models.py` | `RunConfig`, `LLMResponse` dataclasses |
| `exceptions.py` | `LLMError``LLMConfigurationError`, `LLMAPIError`, `LLMRateLimitError`, `LLMTimeoutError`, `LLMSubprocessError` |
**Contract:** `contracts/core/llm-adapter.md`
### Functional (medium rigidity — evolvable, versioned)
Value-realization modules. Each adapter is independently shippable.
Maturity states: **Experimental → Beta → Stable → Deprecated**
| Module | Contents | Maturity |
|--------|----------|----------|
| `openai.py` | `OpenAIAdapter` — OpenAI chat completions | Beta |
| `gemini.py` | `GeminiAdapter` — Google Generative Language API | Beta |
| `openrouter.py` | `OpenRouterAdapter` — OpenAI-compatible multi-model routing | Beta |
| `claude_code.py` | `ClaudeCodeAdapter``claude --print` subprocess | Beta |
| `embedding_adapter.py` | `EmbeddingAdapter` ABC | Beta |
| `embedding_openai.py` | `OpenAICompatibleEmbeddingAdapter` | Beta |
| `embedding_cache.py` | `EmbeddingCache` — disk-backed embedding cache | Beta |
| `embedding_factory.py` | `create_embedding_adapter()` factory | Beta |
| `factory.py` | `create_adapter()` factory — lazy provider registration | Beta |
| `_token_estimator.py` | Rough token count estimation (word-based) | Beta |
| `similarity.py` | `cosine_similarity`, `similarity_matrix`, `find_similar_pairs` | Beta |
**Planned additions (WP-0003):** `RoutingPolicy`, `server.py`
**Contracts:** `contracts/functional/`
### Configuration (very low rigidity — user-controlled declarative state)
| Module | Contents |
|--------|----------|
| `toml_config.py` | `resolve_llm()` — 7-level TOML priority chain; `ResolvedLLM`; `LLMLayer` |
| `config.py` | `LLMConfig` dataclass; `resolve_api_key()`; `find_project_root()`; `load_config()` |
| `_http.py` | Shared HTTP POST utility (used by Functional adapters) |
**Contracts:** `contracts/config/`
---
## Dependency Rule
```
Core ← Functional ← Configuration
```
Upward dependencies (Configuration → Functional, Functional → Core) are **prohibited**.
`_http.py` sits in the Configuration layer but is consumed only by Functional adapters — acceptable as a shared utility with no upward reach.
---
## Decisions Log
| Date | Decision | Rationale |
|------|----------|-----------|
| 2026-04-01 | FR-3 async: default executor fallback on ABC rather than abstract method | Non-breaking; existing adapters remain valid; native async opt-in per adapter |
| 2026-04-01 | FR-4 BudgetTracker: optional field on RunConfig, not a separate context object | Keeps RunConfig as single call config; avoids thread-local / contextvar complexity |
| 2026-04-01 | FR-1 HTTP server: optional dep `[server]`, not runtime dep | Keeps base install lightweight; most consumers call the library directly |
---
## GAAF-2026 Scorecard (initial baseline — 2026-04-01)
> Scoring: 0 = absent / harmful · 5 = excellent
| Dimension | Score | Notes |
|-----------|-------|-------|
| **Core** | 2.5 | ABC and models well-defined; no formal contracts, no tests, no invariant docs yet |
| **Functional** | 2.5 | Adapters isolated and independently usable; no maturity labels enforced, no tests |
| **Customization** | n/a | Not applicable (library, not SaaS) |
| **Configuration** | 2.0 | TOML chain works; no schema validation; `markitect` name coupling in toml_config defaults |
| **Extensions** | n/a | Not applicable yet (RoutingPolicy + server in WP-0003) |
| **Cross-layer** | 2.0 | Dependency direction correct; no CI fitness functions; no import graph checks |
| **Weighted total** | ~2.3 | Usable but vulnerable — WP-0001 targets ≥ 3.5 |
**Target after WP-0001:** ≥ 3.5 (Strong)
**Target after WP-0002 + WP-0003:** ≥ 4.0 (Strong / Exemplary)

45
SCOPE.md Normal file
View File

@@ -0,0 +1,45 @@
# SCOPE.md — llm-connect
## Purpose
`llm-connect` is a **multi-provider LLM client library for Python**.
It provides a unified adapter interface over OpenAI, Gemini, OpenRouter,
and the Claude Code CLI, with embedding support, token estimation, and a
TOML-based configuration chain.
Extracted from [markitect](https://github.com/worsch/markitect).
The `markitect.llm` module remains a re-export shim pointing here.
## This repo owns
- `LLMAdapter` ABC and `RunConfig` / `LLMResponse` data models (Core)
- All concrete provider adapters: `OpenAIAdapter`, `GeminiAdapter`,
`OpenRouterAdapter`, `ClaudeCodeAdapter` (Functional)
- Embedding adapters: `EmbeddingAdapter` ABC, `OpenAICompatibleEmbeddingAdapter`,
`EmbeddingCache`, `create_embedding_adapter` factory (Functional)
- TOML-based config resolution (`toml_config.py`, `config.py`) (Configuration)
- Shared HTTP utility (`_http.py`), token estimator (`_token_estimator.py`),
cosine similarity utilities (`similarity.py`)
- The full `LLMError` exception hierarchy
## This repo does NOT own
- Consumer application logic — that lives in `markitect`, `inter-hub`, etc.
- API key management infrastructure — keys are resolved from env vars or
plaintext key files; secret storage belongs in the calling environment
- Model routing decisions specific to a consumer — `RoutingPolicy` (WP-0003)
provides primitives; policy configuration belongs in the consumer
- The Claude Code CLI binary itself — `ClaudeCodeAdapter` shells out to `claude`
## Consumers (as of 2026-04-01)
| Consumer | How it uses llm-connect |
|----------|------------------------|
| `markitect` | Re-exports via `markitect.llm` shim; drives document generation |
| `inter-hub` (IHF) | Subprocess bridge (`scripts/llm_bridge.py` + `AgentBridge.hs`) for multi-agent federation |
## Versioning
- Current version: **0.1.0** (pre-release; API not yet stable)
- Core layer (`LLMAdapter`, `RunConfig`, `LLMResponse`) will be stabilised at **v1.0.0**
- Breaking Core changes require a major version bump

View File

@@ -0,0 +1,80 @@
# Contract: Configuration — TOML Config Chain
**Layer:** Configuration
**Version:** 0.1.0
**Last updated:** 2026-04-01
---
## resolve_llm()
`llm_connect.toml_config.resolve_llm(cli_provider, cli_model, app_name)`
Walks a 7-level priority chain to resolve provider and model independently.
Returns `ResolvedLLM(provider, model, provider_source, model_source)`.
### Priority chain (highest → lowest)
| Level | Source |
|-------|--------|
| 1 | CLI flags (`cli_provider`, `cli_model`) |
| 2 | Env var `{APP_NAME}_HELPER_MODEL` (model only) |
| 3 | User preference — `~/.config/{app_name}/config.toml` `[llm.preference]` |
| 4 | Directory preference — `.{app_name}.toml` `[llm.preference]` |
| 5 | Directory default — `.{app_name}.toml` `[llm.default]` |
| 6 | User default — `~/.config/{app_name}/config.toml` `[llm.default]` |
| 7 | Hardcoded fallback — `gemini / gemini-2.5-flash` |
### Invariants
- Always returns a fully-resolved `ResolvedLLM` (never raises, never returns None).
- Provider and model are resolved independently — a preference for model does
not imply a preference for provider.
- TOML parse errors are silently ignored (returns empty layer).
- `app_name` defaults to `"markitect"` for backward compatibility; consumers
should pass their own app name.
### Known issue
`toml_config.py` has `markitect`-specific defaults (`MARKITECT_HELPER_MODEL`,
`USER_CONFIG_DIR`). These are kept for backward compatibility but callers
outside markitect should always pass an explicit `app_name`.
---
## resolve_api_key()
`llm_connect.config.resolve_api_key(explicit, env_var, key_file_paths)`
Resolution order:
1. `explicit` argument
2. Environment variable `env_var`
3. First readable file in `key_file_paths` with non-empty content
Returns `None` if nothing is found. Never raises.
---
## find_project_root()
Walks up from CWD looking for `pyproject.toml`. Returns the containing directory
or `None`. Used by adapters to locate key files.
---
## LLMConfig
`llm_connect.config.LLMConfig`
Dataclass holding per-adapter configuration. Used directly by `OpenRouterAdapter`
and `ClaudeCodeAdapter`. Not required by the Core `LLMAdapter` ABC.
| Field | Default |
|-------|---------|
| `provider` | `"openrouter"` |
| `model` | `"anthropic/claude-sonnet-4"` |
| `api_key` | `None` |
| `api_base` | `"https://openrouter.ai/api/v1"` |
| `claude_cli_path` | `"claude"` |
| `timeout_seconds` | `300` |
| `max_retries` | `3` |

View File

@@ -0,0 +1,122 @@
# Contract: Core — LLMAdapter Interface
**Layer:** Core
**Version:** 0.1.0
**Status:** Draft (stabilises at v1.0.0)
**Last updated:** 2026-04-01
---
## LLMAdapter ABC
`llm_connect.adapter.LLMAdapter`
### Interface
```python
class LLMAdapter(ABC):
@abstractmethod
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
@abstractmethod
def validate_config(self, config: RunConfig) -> bool: ...
```
**Planned addition (WP-0002 T07):**
```python
async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
# Default: runs execute_prompt in a thread executor
...
```
### Invariants
1. `execute_prompt` MUST return an `LLMResponse` with a non-empty `content` field on success.
2. `execute_prompt` MUST raise a subclass of `LLMError` on any failure — never a bare exception.
3. `validate_config` MUST be side-effect-free and return `bool` only.
4. `validate_config` returning `False` does not preclude calling `execute_prompt` — it is advisory.
5. Adapters MUST NOT mutate the `config` argument.
6. `execute_prompt` is allowed to be slow (network I/O) but MUST respect `config.timeout_seconds`.
### Failure modes
| Condition | Exception |
|-----------|-----------|
| Missing / invalid API key | `LLMConfigurationError` |
| HTTP 4xx (non-429) | `LLMAPIError` (with `.status_code`) |
| HTTP 429 | `LLMRateLimitError` |
| Request timeout | `LLMTimeoutError` |
| CLI subprocess failure | `LLMSubprocessError` (with `.return_code`, `.stderr`) |
| Token budget exceeded (WP-0002) | `LLMBudgetExceededError` |
### Compatibility rules
- Any code that accepts `LLMAdapter` MUST work with `MockLLMAdapter`.
- Adding new optional methods to the ABC is non-breaking (default implementations provided).
- Removing or changing the signature of `execute_prompt` or `validate_config` is a **breaking Core change** requiring a major version bump.
---
## RunConfig
`llm_connect.models.RunConfig`
### Fields and invariants
| Field | Type | Default | Invariant |
|-------|------|---------|-----------|
| `model_name` | `str` | `"gpt-4"` | Non-empty string; adapters MAY override |
| `temperature` | `float` | `0.7` | 0.0 ≤ temperature ≤ 2.0 |
| `max_tokens` | `int` | `2000` | > 0 |
| `model_params` | `dict` | `{}` | Provider-specific pass-through; no invariants |
| `max_depth` | `int` | `3` | ≥ 0 |
| `skip_if_exists` | `bool` | `True` | — |
| `timeout_seconds` | `int` | `300` | > 0 |
| `budget_tracker` | `BudgetTracker \| None` | `None` | Optional; added in WP-0002 |
Adapters MUST NOT mutate `RunConfig` fields.
---
## LLMResponse
`llm_connect.models.LLMResponse`
### Fields and invariants
| Field | Type | Invariant |
|-------|------|-----------|
| `content` | `str` | Non-empty on success; may be empty only if provider returned empty output |
| `model` | `str` | Non-empty; the model actually used (may differ from `RunConfig.model_name`) |
| `usage` | `dict` | Keys: `prompt_tokens`, `completion_tokens`, `total_tokens` (all int ≥ 0) |
| `finish_reason` | `str` | Provider-reported; `"stop"` is the normal value |
| `metadata` | `dict` | Arbitrary; always includes `"provider"` key |
---
## LLMError Hierarchy
```
LLMError
├── LLMConfigurationError bad key / unknown provider
├── LLMAPIError HTTP error (has .status_code, .response_body)
│ └── LLMRateLimitError 429
├── LLMTimeoutError request or subprocess timed out
├── LLMSubprocessError CLI failed (has .return_code, .stderr)
└── LLMBudgetExceededError token budget cap exceeded (WP-0002)
```
All exceptions carry optional `cause` (chained exception) and `context` (dict).
---
## Mock adapters
`MockLLMAdapter` and `ErrorLLMAdapter` are part of Core — they are test
primitives that any consumer may depend on without importing dev extras.
`MockLLMAdapter` invariants:
- Returns deterministic response without network I/O
- Increments `call_count` on each call
- Records `last_prompt` and `last_config`
- `reset()` clears all counters and recorded state

View File

@@ -0,0 +1,94 @@
# Contract: Functional — Provider Adapters
**Layer:** Functional
**Version:** 0.1.0
**Maturity:** Beta (all adapters)
**Last updated:** 2026-04-01
---
## Common adapter contract
All provider adapters implement `LLMAdapter` (see `contracts/core/llm-adapter.md`).
Additional shared guarantees:
- Constructors resolve API keys at instantiation and raise `LLMConfigurationError`
immediately if no key is found (fail-fast).
- HTTP-based adapters (`OpenAIAdapter`, `GeminiAdapter`, `OpenRouterAdapter`)
use `_http.post_json` and do not add runtime dependencies beyond stdlib.
- `metadata` in the returned `LLMResponse` always contains `"provider"` and
`"latency_seconds"` keys.
- HTTP adapters that retry (`OpenAIAdapter`, `OpenRouterAdapter`) use
exponential backoff: `sleep(2 ** attempt)` on 429 and 5xx.
---
## OpenAIAdapter
**Provider key:** `"openai"`
**Default model:** `gpt-4.1-mini`
**API:** `https://api.openai.com/v1/chat/completions`
**Auth:** `OPENAI_API_KEY` env var or `apikey-chatgpt.txt` in project root
**Retries:** 3 (exponential backoff on 429 and 5xx)
---
## GeminiAdapter
**Provider key:** `"gemini"`
**Default model:** `gemini-2.5-flash`
**API:** `https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent`
**Auth:** `GEMINI_API_KEY` env var or `apikey-geminifree.txt` in project root
**Retries:** 0 (no retry logic; rate-limit handling deferred)
**Note:** System prompt is simulated via a user/model turn pair (Gemini has no native system role).
---
## OpenRouterAdapter
**Provider key:** `"openrouter"`
**Default model:** `anthropic/claude-sonnet-4`
**API:** `https://openrouter.ai/api/v1/chat/completions` (configurable via `LLMConfig.api_base`)
**Auth:** `OPENROUTER_API_KEY` env var or `apikey-openrouter.txt` in project root
**Retries:** 3 (exponential backoff on 429 and 5xx)
**Note:** OpenRouter is an OpenAI-compatible endpoint; `RunConfig.model_params` are merged into the payload.
---
## ClaudeCodeAdapter
**Provider key:** `"claude-code"`
**Default model:** n/a (uses the CLI's configured default)
**Auth:** none (delegates to locally installed `claude` CLI)
**Subprocess:** `claude --print [--model M]` with prompt on stdin
**Token counts:** estimated via `_token_estimator` (not provider-reported)
**validate_config:** runs `claude --version`; returns `False` if CLI not found
---
## EmbeddingAdapter ABC
`llm_connect.embedding_adapter.EmbeddingAdapter`
```python
class EmbeddingAdapter(ABC):
@abstractmethod
def embed(self, texts: list[str]) -> list[list[float]]: ...
```
Invariant: returns a list of the same length as `texts`.
### OpenAICompatibleEmbeddingAdapter
Compatible with any OpenAI-format embedding endpoint (`/v1/embeddings`).
Default model: `text-embedding-3-small`.
---
## EmbeddingCache
`llm_connect.embedding_cache.EmbeddingCache`
Disk-backed cache keyed by text content (SHA-256 hash).
`get_or_compute(text, compute_fn)` returns cached vector or calls `compute_fn`.

View File

@@ -12,7 +12,7 @@ Quick start::
response = adapter.execute_prompt(prompt, run_config)
"""
from llm_connect.models import RunConfig, LLMResponse
from llm_connect.models import RunConfig, LLMResponse, BudgetTracker
from llm_connect.adapter import LLMAdapter, MockLLMAdapter, ErrorLLMAdapter
from llm_connect.factory import create_adapter
from llm_connect.openrouter import OpenRouterAdapter
@@ -27,6 +27,7 @@ from llm_connect.exceptions import (
LLMRateLimitError,
LLMTimeoutError,
LLMSubprocessError,
LLMBudgetExceededError,
)
from llm_connect.embedding_adapter import EmbeddingAdapter
from llm_connect.embedding_openai import OpenAICompatibleEmbeddingAdapter
@@ -41,6 +42,7 @@ from llm_connect.similarity import (
__all__ = [
"RunConfig",
"LLMResponse",
"BudgetTracker",
"LLMAdapter",
"MockLLMAdapter",
"ErrorLLMAdapter",
@@ -57,6 +59,7 @@ __all__ = [
"LLMRateLimitError",
"LLMTimeoutError",
"LLMSubprocessError",
"LLMBudgetExceededError",
"EmbeddingAdapter",
"OpenAICompatibleEmbeddingAdapter",
"EmbeddingCache",

View File

@@ -5,10 +5,11 @@ Implements abstraction layer for LLM integration, supporting
multiple providers (OpenAI, Anthropic, local models, etc.).
"""
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, Any
from llm_connect.models import RunConfig, LLMResponse
from llm_connect.models import RunConfig, LLMResponse, BudgetTracker
class LLMAdapter(ABC):
@@ -40,6 +41,26 @@ class LLMAdapter(ABC):
"""
pass
async def async_execute_prompt(
self,
prompt: str,
config: RunConfig,
) -> LLMResponse:
"""Execute a prompt asynchronously.
Default implementation runs :meth:`execute_prompt` in a thread
executor so that the event loop is not blocked. Subclasses may
override with a native ``asyncio``-based implementation.
Args:
prompt: Compiled prompt text
config: Execution configuration
Returns:
LLMResponse with generated content
"""
return await asyncio.to_thread(self.execute_prompt, prompt, config)
@abstractmethod
def validate_config(self, config: RunConfig) -> bool:
"""
@@ -53,6 +74,27 @@ class LLMAdapter(ABC):
"""
pass
# ── Budget helpers (call in execute_prompt implementations) ─────
def _preflight_budget(self, config: RunConfig) -> None:
"""Raise ``LLMBudgetExceededError`` if the budget is already exhausted."""
if config.budget_tracker is not None and config.budget_tracker.remaining() == 0:
from llm_connect.exceptions import LLMBudgetExceededError
tracker = config.budget_tracker
raise LLMBudgetExceededError(
"Token budget exhausted before making request",
total=tracker.total,
spent=tracker.spent,
requested=0,
context={"total": tracker.total, "spent": tracker.spent},
)
def _consume_budget(self, config: RunConfig, response: LLMResponse) -> None:
"""Consume tokens from the budget tracker after a successful call."""
if config.budget_tracker is not None:
tokens = response.usage.get("total_tokens", 0)
config.budget_tracker.consume(tokens)
class MockLLMAdapter(LLMAdapter):
"""
@@ -88,11 +130,12 @@ class MockLLMAdapter(LLMAdapter):
Returns:
Mock LLMResponse
"""
self._preflight_budget(config)
self.call_count += 1
self.last_prompt = prompt
self.last_config = config
return LLMResponse(
response = LLMResponse(
content=self.mock_response,
model=config.model_name,
usage={
@@ -103,6 +146,8 @@ class MockLLMAdapter(LLMAdapter):
finish_reason="stop",
metadata={"mock": True},
)
self._consume_budget(config, response)
return response
def validate_config(self, config: RunConfig) -> bool:
"""

View File

@@ -2,6 +2,7 @@
Claude Code CLI adapter — runs the ``claude`` CLI as a subprocess.
"""
import asyncio
import subprocess
from typing import Optional
@@ -35,6 +36,7 @@ class ClaudeCodeAdapter(LLMAdapter):
# ── LLMAdapter interface ────────────────────────────────────────
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
self._preflight_budget(config)
cmd = [self._cli_path, "--print"]
if self._model:
cmd.extend(["--model", self._model])
@@ -66,7 +68,7 @@ class ClaudeCodeAdapter(LLMAdapter):
prompt_tokens = estimate_tokens(prompt)
completion_tokens = estimate_tokens(content)
return LLMResponse(
response = LLMResponse(
content=content,
model=self._model or "claude-code-cli",
usage={
@@ -80,6 +82,63 @@ class ClaudeCodeAdapter(LLMAdapter):
"cli_path": self._cli_path,
},
)
self._consume_budget(config, response)
return response
async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
"""Native async implementation using asyncio.create_subprocess_exec."""
self._preflight_budget(config)
cmd = [self._cli_path, "--print"]
if self._model:
cmd.extend(["--model", self._model])
timeout = config.timeout_seconds or self._config.timeout_seconds
try:
proc = await asyncio.create_subprocess_exec(
*cmd,
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout_bytes, stderr_bytes = await asyncio.wait_for(
proc.communicate(input=prompt.encode()),
timeout=timeout,
)
except asyncio.TimeoutError as exc:
raise LLMTimeoutError(
f"claude CLI timed out after {timeout}s",
cause=exc,
) from exc
if proc.returncode != 0:
raise LLMSubprocessError(
f"claude CLI exited with code {proc.returncode}",
return_code=proc.returncode,
stderr=stderr_bytes.decode(),
)
content = stdout_bytes.decode()
prompt_tokens = estimate_tokens(prompt)
completion_tokens = estimate_tokens(content)
response = LLMResponse(
content=content,
model=self._model or "claude-code-cli",
usage={
"prompt_tokens": prompt_tokens,
"completion_tokens": completion_tokens,
"total_tokens": prompt_tokens + completion_tokens,
},
finish_reason="stop",
metadata={
"provider": "claude-code",
"cli_path": self._cli_path,
"async": True,
},
)
self._consume_budget(config, response)
return response
def validate_config(self, config: RunConfig) -> bool:
try:

View File

@@ -64,6 +64,30 @@ class LLMTimeoutError(LLMError):
pass
class LLMBudgetExceededError(LLMError):
"""Token budget cap exceeded during a call or delegation chain.
Attributes:
total: The configured token cap.
spent: Tokens already consumed before this call.
requested: Tokens this call would have consumed.
"""
def __init__(
self,
message: str,
total: int = 0,
spent: int = 0,
requested: int = 0,
cause: Optional[Exception] = None,
context: Optional[Dict[str, Any]] = None,
):
super().__init__(message, cause=cause, context=context)
self.total = total
self.spent = spent
self.requested = requested
class LLMSubprocessError(LLMError):
"""Claude Code CLI subprocess failed.

View File

@@ -2,6 +2,7 @@
Google Gemini adapter — calls the Generative Language REST API directly.
"""
import asyncio
import time
from typing import Optional, Dict, Any
@@ -48,6 +49,7 @@ class GeminiAdapter(LLMAdapter):
# ── LLMAdapter interface ────────────────────────────────────────
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
self._preflight_budget(config)
model = self._model
# Build Gemini request
@@ -92,7 +94,7 @@ class GeminiAdapter(LLMAdapter):
usage_meta = data.get("usageMetadata", {})
return LLMResponse(
response = LLMResponse(
content=content,
model=model,
usage={
@@ -106,6 +108,12 @@ class GeminiAdapter(LLMAdapter):
"latency_seconds": round(latency, 3),
},
)
self._consume_budget(config, response)
return response
async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
"""Async wrapper — runs execute_prompt in a thread executor."""
return await asyncio.to_thread(self.execute_prompt, prompt, config)
def validate_config(self, config: RunConfig) -> bool:
if not self._api_key:

View File

@@ -5,8 +5,53 @@ These classes are the canonical definitions; they are re-exported by
markitect.prompts.execution.models for backward compatibility.
"""
import threading
from dataclasses import dataclass, field
from typing import Dict, Any
from typing import Dict, Any, Optional
class BudgetTracker:
"""Shared token budget for a call or delegation chain.
Thread-safe. Tracks cumulative token spend across multiple adapter
calls. Raises ``LLMBudgetExceededError`` when the cap is exceeded.
Example::
tracker = BudgetTracker(total=4000)
config = RunConfig(budget_tracker=tracker)
# All adapter calls sharing this config will consume from the same cap.
"""
def __init__(self, total: int) -> None:
if total <= 0:
raise ValueError(f"BudgetTracker total must be positive, got {total}")
self.total = total
self.spent = 0
self._lock = threading.Lock()
def remaining(self) -> int:
"""Return tokens remaining in the budget."""
return max(0, self.total - self.spent)
def consume(self, tokens: int) -> None:
"""Record *tokens* as spent. Raises ``LLMBudgetExceededError`` if cap exceeded."""
from llm_connect.exceptions import LLMBudgetExceededError # avoid circular at module load
with self._lock:
new_spent = self.spent + tokens
if new_spent > self.total:
raise LLMBudgetExceededError(
f"Token budget exceeded: {new_spent} tokens used, cap is {self.total}",
total=self.total,
spent=self.spent,
requested=tokens,
context={"total": self.total, "spent": self.spent, "requested": tokens},
)
self.spent = new_spent
def __repr__(self) -> str:
return f"BudgetTracker(total={self.total}, spent={self.spent}, remaining={self.remaining()})"
@dataclass
@@ -30,9 +75,10 @@ class RunConfig:
max_depth: int = 3
skip_if_exists: bool = True
timeout_seconds: int = 300
budget_tracker: Optional["BudgetTracker"] = field(default=None, repr=False)
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary."""
"""Convert to dictionary. ``budget_tracker`` is excluded (runtime object)."""
return {
"model_name": self.model_name,
"temperature": self.temperature,

View File

@@ -2,6 +2,7 @@
OpenAI (ChatGPT) adapter — calls the OpenAI chat completions API.
"""
import asyncio
import time
from typing import Optional, Dict, Any
@@ -51,6 +52,7 @@ class OpenAIAdapter(LLMAdapter):
# ── LLMAdapter interface ────────────────────────────────────────
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
self._preflight_budget(config)
model = self._model
messages: list[Dict[str, str]] = []
@@ -80,7 +82,7 @@ class OpenAIAdapter(LLMAdapter):
finish_reason = choice.get("finish_reason", "stop")
usage = data.get("usage", {})
return LLMResponse(
response = LLMResponse(
content=content,
model=data.get("model", model),
usage={
@@ -95,6 +97,12 @@ class OpenAIAdapter(LLMAdapter):
"response_id": data.get("id", ""),
},
)
self._consume_budget(config, response)
return response
async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
"""Async wrapper — runs execute_prompt in a thread executor."""
return await asyncio.to_thread(self.execute_prompt, prompt, config)
def validate_config(self, config: RunConfig) -> bool:
if not self._api_key:

View File

@@ -2,6 +2,7 @@
OpenRouter adapter — calls the OpenAI-compatible chat completions API.
"""
import asyncio
import time
from typing import Optional, Dict, Any
@@ -55,6 +56,7 @@ class OpenRouterAdapter(LLMAdapter):
# ── LLMAdapter interface ────────────────────────────────────────
def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
self._preflight_budget(config)
model = self._model if self._model != _DEFAULT_MODEL else (config.model_name or self._model)
messages: list[Dict[str, str]] = []
@@ -88,7 +90,7 @@ class OpenRouterAdapter(LLMAdapter):
finish_reason = choice.get("finish_reason", "stop")
usage = data.get("usage", {})
return LLMResponse(
response = LLMResponse(
content=content,
model=data.get("model", model),
usage={
@@ -103,6 +105,12 @@ class OpenRouterAdapter(LLMAdapter):
"response_id": data.get("id", ""),
},
)
self._consume_budget(config, response)
return response
async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
"""Async wrapper — runs execute_prompt in a thread executor."""
return await asyncio.to_thread(self.execute_prompt, prompt, config)
def validate_config(self, config: RunConfig) -> bool:
if not self._api_key:

View File

@@ -14,6 +14,8 @@ dependencies = [
[project.optional-dependencies]
dev = [
"pytest>=7.0",
"ruff>=0.4",
"mypy>=1.10",
]
[tool.setuptools.packages.find]
@@ -23,4 +25,26 @@ include = ["llm_connect*"]
[dependency-groups]
dev = [
"pytest>=9.0.2",
"ruff>=0.4",
"mypy>=1.10",
]
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v"
[tool.ruff]
target-version = "py310"
line-length = 100
[tool.ruff.lint]
select = ["E", "F", "W", "I", "UP"]
ignore = ["E501"]
[tool.mypy]
python_version = "3.10"
strict = false
ignore_missing_imports = true
disallow_untyped_defs = true
warn_return_any = true
warn_unused_ignores = true

26
tests/conftest.py Normal file
View File

@@ -0,0 +1,26 @@
"""
Shared pytest fixtures for llm-connect tests.
"""
import pytest
from llm_connect.models import RunConfig, LLMResponse
from llm_connect.adapter import MockLLMAdapter
@pytest.fixture
def run_config():
"""Default RunConfig for tests."""
return RunConfig()
@pytest.fixture
def mock_adapter():
"""MockLLMAdapter with a predictable response."""
return MockLLMAdapter(mock_response="test response")
@pytest.fixture
def sample_response():
"""A minimal valid LLMResponse."""
return LLMResponse(content="hello", model="test-model")

77
tests/test_adapter.py Normal file
View File

@@ -0,0 +1,77 @@
"""
Tests for MockLLMAdapter and ErrorLLMAdapter (Core adapter utilities).
"""
import pytest
from llm_connect.adapter import MockLLMAdapter, ErrorLLMAdapter
from llm_connect.models import RunConfig, LLMResponse
class TestMockLLMAdapter:
def test_returns_mock_response(self, mock_adapter, run_config):
response = mock_adapter.execute_prompt("hello", run_config)
assert response.content == "test response"
def test_returns_llm_response(self, mock_adapter, run_config):
response = mock_adapter.execute_prompt("hello", run_config)
assert isinstance(response, LLMResponse)
def test_call_count_increments(self, mock_adapter, run_config):
assert mock_adapter.call_count == 0
mock_adapter.execute_prompt("a", run_config)
mock_adapter.execute_prompt("b", run_config)
assert mock_adapter.call_count == 2
def test_records_last_prompt(self, mock_adapter, run_config):
mock_adapter.execute_prompt("my prompt", run_config)
assert mock_adapter.last_prompt == "my prompt"
def test_records_last_config(self, mock_adapter, run_config):
mock_adapter.execute_prompt("x", run_config)
assert mock_adapter.last_config is run_config
def test_reset_clears_state(self, mock_adapter, run_config):
mock_adapter.execute_prompt("x", run_config)
mock_adapter.reset()
assert mock_adapter.call_count == 0
assert mock_adapter.last_prompt is None
assert mock_adapter.last_config is None
def test_validate_config_always_true(self, mock_adapter, run_config):
assert mock_adapter.validate_config(run_config) is True
def test_usage_contains_expected_keys(self, mock_adapter, run_config):
response = mock_adapter.execute_prompt("prompt text", run_config)
assert "prompt_tokens" in response.usage
assert "completion_tokens" in response.usage
assert "total_tokens" in response.usage
def test_custom_response_text(self, run_config):
adapter = MockLLMAdapter(mock_response="custom answer")
response = adapter.execute_prompt("q", run_config)
assert response.content == "custom answer"
def test_default_response_text(self, run_config):
adapter = MockLLMAdapter()
response = adapter.execute_prompt("q", run_config)
assert response.content == "Mock LLM response"
def test_metadata_marks_as_mock(self, mock_adapter, run_config):
response = mock_adapter.execute_prompt("q", run_config)
assert response.metadata.get("mock") is True
class TestErrorLLMAdapter:
def test_raises_on_execute(self, run_config):
adapter = ErrorLLMAdapter()
with pytest.raises(RuntimeError):
adapter.execute_prompt("q", run_config)
def test_raises_with_custom_message(self, run_config):
adapter = ErrorLLMAdapter(error_message="boom")
with pytest.raises(RuntimeError, match="boom"):
adapter.execute_prompt("q", run_config)
def test_validate_config_returns_true(self, run_config):
adapter = ErrorLLMAdapter()
assert adapter.validate_config(run_config) is True

101
tests/test_async.py Normal file
View File

@@ -0,0 +1,101 @@
"""
Tests for async_execute_prompt (FR-3).
"""
import asyncio
import pytest
from llm_connect.models import RunConfig, BudgetTracker
from llm_connect.adapter import MockLLMAdapter
from llm_connect.exceptions import LLMBudgetExceededError
class TestAsyncExecutePrompt:
def test_default_fallback_returns_response(self):
adapter = MockLLMAdapter(mock_response="async result")
config = RunConfig()
response = asyncio.run(adapter.async_execute_prompt("hello", config))
assert response.content == "async result"
def test_gather_multiple_adapters(self):
"""asyncio.gather over N adapters completes without errors."""
adapters = [MockLLMAdapter(mock_response=f"resp-{i}") for i in range(4)]
config = RunConfig()
async def run():
return await asyncio.gather(*[
a.async_execute_prompt("prompt", config) for a in adapters
])
results = asyncio.run(run())
assert len(results) == 4
for i, r in enumerate(results):
assert r.content == f"resp-{i}"
def test_gather_increments_call_counts(self):
adapter = MockLLMAdapter()
config = RunConfig()
async def run():
await asyncio.gather(*[
adapter.async_execute_prompt("p", config) for _ in range(5)
])
asyncio.run(run())
assert adapter.call_count == 5
def test_concurrent_faster_than_sequential(self):
"""Gathering N async calls should not be N× slower than one call."""
import time
adapter = MockLLMAdapter()
config = RunConfig()
async def run_concurrent(n: int):
await asyncio.gather(*[
adapter.async_execute_prompt("p", config) for _ in range(n)
])
# Just verify it completes without deadlock or error — timing is CI-unreliable
asyncio.run(run_concurrent(10))
assert adapter.call_count == 10
def test_async_with_budget_tracker(self):
"""Budget enforcement works through async calls."""
tracker = BudgetTracker(total=10000)
config = RunConfig(budget_tracker=tracker)
adapter = MockLLMAdapter(mock_response="hi")
asyncio.run(adapter.async_execute_prompt("hello", config))
assert tracker.spent > 0
def test_async_exhausted_budget_raises(self):
"""Exhausted budget raises LLMBudgetExceededError in async context."""
tracker = BudgetTracker(total=1)
tracker.consume(1)
config = RunConfig(budget_tracker=tracker)
adapter = MockLLMAdapter()
with pytest.raises(LLMBudgetExceededError):
asyncio.run(adapter.async_execute_prompt("p", config))
def test_async_gather_with_shared_budget(self):
"""Shared budget across concurrent async calls is enforced correctly."""
tracker = BudgetTracker(total=100000)
config = RunConfig(budget_tracker=tracker)
adapters = [MockLLMAdapter(mock_response="hi") for _ in range(4)]
async def run():
await asyncio.gather(*[
a.async_execute_prompt("hello", config) for a in adapters
])
asyncio.run(run())
assert tracker.spent > 0
def test_returns_llm_response_type(self):
from llm_connect.models import LLMResponse
adapter = MockLLMAdapter()
config = RunConfig()
response = asyncio.run(adapter.async_execute_prompt("q", config))
assert isinstance(response, LLMResponse)

152
tests/test_budget.py Normal file
View File

@@ -0,0 +1,152 @@
"""
Tests for BudgetTracker (FR-4) and LLMBudgetExceededError.
"""
import threading
import pytest
from llm_connect.models import BudgetTracker, RunConfig
from llm_connect.adapter import MockLLMAdapter
from llm_connect.exceptions import LLMBudgetExceededError, LLMError
class TestBudgetTracker:
def test_initial_state(self):
t = BudgetTracker(total=1000)
assert t.total == 1000
assert t.spent == 0
assert t.remaining() == 1000
def test_consume_updates_spent(self):
t = BudgetTracker(total=1000)
t.consume(300)
assert t.spent == 300
assert t.remaining() == 700
def test_consume_multiple_times(self):
t = BudgetTracker(total=1000)
t.consume(400)
t.consume(400)
assert t.spent == 800
assert t.remaining() == 200
def test_consume_exact_budget(self):
t = BudgetTracker(total=100)
t.consume(100)
assert t.spent == 100
assert t.remaining() == 0
def test_consume_exceeds_budget_raises(self):
t = BudgetTracker(total=100)
t.consume(60)
with pytest.raises(LLMBudgetExceededError):
t.consume(50)
def test_exceeded_error_carries_details(self):
t = BudgetTracker(total=100)
t.consume(80)
with pytest.raises(LLMBudgetExceededError) as exc_info:
t.consume(30)
err = exc_info.value
assert err.total == 100
assert err.spent == 80
assert err.requested == 30
def test_exceeded_error_is_subclass_of_llm_error(self):
with pytest.raises(LLMError):
t = BudgetTracker(total=10)
t.consume(20)
def test_remaining_never_negative(self):
t = BudgetTracker(total=100)
t.consume(100)
assert t.remaining() == 0
def test_invalid_total_raises(self):
with pytest.raises(ValueError):
BudgetTracker(total=0)
with pytest.raises(ValueError):
BudgetTracker(total=-1)
def test_repr(self):
t = BudgetTracker(total=500)
t.consume(100)
r = repr(t)
assert "500" in r
assert "100" in r
def test_thread_safety(self):
"""Concurrent consume() calls must not corrupt state or allow overspend."""
total = 1000
t = BudgetTracker(total=total)
errors = []
def consume_100():
try:
t.consume(100)
except LLMBudgetExceededError:
errors.append(1)
threads = [threading.Thread(target=consume_100) for _ in range(15)]
for th in threads:
th.start()
for th in threads:
th.join()
# At most 10 consumes of 100 can succeed within a budget of 1000
assert t.spent <= total
assert len(errors) == 5 # 15 attempts, 10 succeed, 5 fail
class TestBudgetEnforcementInAdapter:
def test_single_call_consumes_budget(self):
tracker = BudgetTracker(total=10000)
config = RunConfig(budget_tracker=tracker)
adapter = MockLLMAdapter(mock_response="hello world")
adapter.execute_prompt("test prompt", config)
assert tracker.spent > 0
def test_exhausted_budget_raises_before_call(self):
tracker = BudgetTracker(total=1)
tracker.consume(1) # exhaust it
config = RunConfig(budget_tracker=tracker)
adapter = MockLLMAdapter()
with pytest.raises(LLMBudgetExceededError):
adapter.execute_prompt("any prompt", config)
# Adapter should not have been called
assert adapter.call_count == 0
def test_delegation_chain_shared_tracker(self):
"""A → B → C sharing the same tracker enforces the cap across all calls."""
tracker = BudgetTracker(total=10000)
config = RunConfig(budget_tracker=tracker)
adapter = MockLLMAdapter(mock_response="response")
adapter.execute_prompt("call A", config)
adapter.execute_prompt("call B", config)
adapter.execute_prompt("call C", config)
assert adapter.call_count == 3
assert tracker.spent > 0
def test_budget_exceeded_mid_chain(self):
"""Chain stops when budget is exhausted between calls."""
# MockLLMAdapter uses word count for tokens — "x" * 200 = 200 token prompt
# mock_response "r" * 100 = 25 tokens; total ~75 per call
adapter = MockLLMAdapter(mock_response="r " * 50) # ~50 completion tokens
tracker = BudgetTracker(total=200)
config = RunConfig(budget_tracker=tracker)
# First call succeeds
adapter.execute_prompt("p " * 100, config)
# Eventually exhausts the budget
with pytest.raises(LLMBudgetExceededError):
for _ in range(10):
adapter.execute_prompt("p " * 100, config)
def test_no_tracker_has_no_effect(self):
"""Adapters work normally when no budget_tracker is set."""
config = RunConfig() # no budget_tracker
adapter = MockLLMAdapter()
response = adapter.execute_prompt("hello", config)
assert response.content == "Mock LLM response"

96
tests/test_exceptions.py Normal file
View File

@@ -0,0 +1,96 @@
"""
Tests for the LLMError exception hierarchy (Core).
"""
import pytest
from llm_connect.exceptions import (
LLMError,
LLMConfigurationError,
LLMAPIError,
LLMRateLimitError,
LLMTimeoutError,
LLMSubprocessError,
)
class TestLLMErrorHierarchy:
def test_all_are_subclasses_of_llm_error(self):
assert issubclass(LLMConfigurationError, LLMError)
assert issubclass(LLMAPIError, LLMError)
assert issubclass(LLMRateLimitError, LLMError)
assert issubclass(LLMTimeoutError, LLMError)
assert issubclass(LLMSubprocessError, LLMError)
def test_rate_limit_is_api_error(self):
assert issubclass(LLMRateLimitError, LLMAPIError)
def test_all_are_exceptions(self):
assert issubclass(LLMError, Exception)
class TestLLMError:
def test_basic_message(self):
err = LLMError("something went wrong")
assert str(err) == "something went wrong"
def test_context_appears_in_str(self):
err = LLMError("oops", context={"provider": "openai"})
assert "provider=openai" in str(err)
def test_cause_is_chained(self):
cause = ValueError("root cause")
err = LLMError("wrapper", cause=cause)
assert err.__cause__ is cause
def test_empty_context_does_not_appear(self):
err = LLMError("clean message", context={})
assert str(err) == "clean message"
class TestLLMAPIError:
def test_has_status_code(self):
err = LLMAPIError("bad request", status_code=400)
assert err.status_code == 400
def test_has_response_body(self):
err = LLMAPIError("error", status_code=500, response_body='{"error": "oops"}')
assert err.response_body == '{"error": "oops"}'
def test_defaults(self):
err = LLMAPIError("minimal")
assert err.status_code == 0
assert err.response_body == ""
def test_rate_limit_inherits_status_code(self):
err = LLMRateLimitError("too many", status_code=429)
assert err.status_code == 429
assert isinstance(err, LLMAPIError)
class TestLLMSubprocessError:
def test_has_return_code(self):
err = LLMSubprocessError("cli failed", return_code=1)
assert err.return_code == 1
def test_has_stderr(self):
err = LLMSubprocessError("cli failed", stderr="error output")
assert err.stderr == "error output"
def test_defaults(self):
err = LLMSubprocessError("minimal")
assert err.return_code == 1
assert err.stderr == ""
class TestRaiseAndCatch:
def test_catch_as_llm_error(self):
with pytest.raises(LLMError):
raise LLMConfigurationError("no key")
def test_catch_api_error_as_llm_error(self):
with pytest.raises(LLMError):
raise LLMAPIError("http error", status_code=502)
def test_catch_rate_limit_as_api_error(self):
with pytest.raises(LLMAPIError):
raise LLMRateLimitError("429", status_code=429)

97
tests/test_factory.py Normal file
View File

@@ -0,0 +1,97 @@
"""
Tests for create_adapter() and create_embedding_adapter() factories.
"""
import pytest
from llm_connect.factory import create_adapter
from llm_connect.embedding_factory import create_embedding_adapter
from llm_connect.exceptions import LLMConfigurationError
from llm_connect.adapter import LLMAdapter
from llm_connect.embedding_adapter import EmbeddingAdapter
from llm_connect.openrouter import OpenRouterAdapter
from llm_connect.claude_code import ClaudeCodeAdapter
from llm_connect.openai import OpenAIAdapter
from llm_connect.gemini import GeminiAdapter
from llm_connect.embedding_openai import OpenAICompatibleEmbeddingAdapter
class TestCreateAdapter:
def test_unknown_provider_raises(self):
with pytest.raises(LLMConfigurationError, match="Unknown LLM provider"):
create_adapter("nonexistent-provider")
def test_unknown_provider_error_lists_known(self):
with pytest.raises(LLMConfigurationError) as exc_info:
create_adapter("bad")
assert "openai" in str(exc_info.value)
assert "gemini" in str(exc_info.value)
def test_openrouter_returns_adapter(self):
adapter = create_adapter("openrouter", api_key="test-key")
assert isinstance(adapter, OpenRouterAdapter)
assert isinstance(adapter, LLMAdapter)
def test_openrouter_no_key_still_constructs(self):
# OpenRouterAdapter defers key validation to execute_prompt
adapter = create_adapter("openrouter")
assert isinstance(adapter, OpenRouterAdapter)
def test_openai_with_key_returns_adapter(self):
adapter = create_adapter("openai", api_key="sk-test-key")
assert isinstance(adapter, OpenAIAdapter)
assert isinstance(adapter, LLMAdapter)
def test_openai_without_key_raises(self, monkeypatch):
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
with pytest.raises(LLMConfigurationError):
create_adapter("openai")
def test_gemini_with_key_returns_adapter(self):
adapter = create_adapter("gemini", api_key="aistudio-test-key")
assert isinstance(adapter, GeminiAdapter)
assert isinstance(adapter, LLMAdapter)
def test_gemini_without_key_raises(self, monkeypatch):
monkeypatch.delenv("GEMINI_API_KEY", raising=False)
with pytest.raises(LLMConfigurationError):
create_adapter("gemini")
def test_claude_code_returns_adapter(self):
adapter = create_adapter("claude-code")
assert isinstance(adapter, ClaudeCodeAdapter)
assert isinstance(adapter, LLMAdapter)
def test_claude_code_with_model(self):
adapter = create_adapter("claude-code", model="claude-opus-4-6")
assert isinstance(adapter, ClaudeCodeAdapter)
def test_all_known_providers_are_reachable(self):
known = {"openrouter", "openai", "gemini", "claude-code"}
# Just verify each key is in the factory registry (no construction needed)
from llm_connect.factory import _PROVIDERS
assert known == set(_PROVIDERS.keys())
class TestCreateEmbeddingAdapter:
def test_unknown_provider_raises(self):
with pytest.raises(LLMConfigurationError, match="Unknown embedding provider"):
create_embedding_adapter("nonexistent")
def test_openai_returns_adapter(self):
adapter = create_embedding_adapter("openai", api_key="sk-test")
assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
assert isinstance(adapter, EmbeddingAdapter)
def test_openrouter_returns_adapter(self):
adapter = create_embedding_adapter("openrouter", api_key="or-test")
assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
assert isinstance(adapter, EmbeddingAdapter)
def test_validate_returns_true_when_key_set(self):
adapter = create_embedding_adapter("openai", api_key="sk-test")
assert adapter.validate() is True
def test_validate_returns_false_when_no_key(self, monkeypatch):
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
adapter = create_embedding_adapter("openai")
assert adapter.validate() is False

86
tests/test_models.py Normal file
View File

@@ -0,0 +1,86 @@
"""
Tests for RunConfig and LLMResponse (Core models).
"""
import pytest
from llm_connect.models import RunConfig, LLMResponse
class TestRunConfig:
def test_defaults(self):
cfg = RunConfig()
assert cfg.model_name == "gpt-4"
assert cfg.temperature == 0.7
assert cfg.max_tokens == 2000
assert cfg.model_params == {}
assert cfg.max_depth == 3
assert cfg.skip_if_exists is True
assert cfg.timeout_seconds == 300
def test_custom_values(self):
cfg = RunConfig(model_name="gemini-2.5-flash", temperature=0.1, max_tokens=500)
assert cfg.model_name == "gemini-2.5-flash"
assert cfg.temperature == 0.1
assert cfg.max_tokens == 500
def test_to_dict_roundtrip(self):
cfg = RunConfig(model_name="gpt-4o", temperature=0.3, max_tokens=1000)
d = cfg.to_dict()
assert d["model_name"] == "gpt-4o"
assert d["temperature"] == 0.3
assert d["max_tokens"] == 1000
def test_from_dict_roundtrip(self):
original = RunConfig(model_name="claude-3", temperature=0.5, max_tokens=800)
restored = RunConfig.from_dict(original.to_dict())
assert restored.model_name == original.model_name
assert restored.temperature == original.temperature
assert restored.max_tokens == original.max_tokens
def test_from_dict_uses_defaults_for_missing_keys(self):
cfg = RunConfig.from_dict({})
assert cfg.model_name == "gpt-4"
assert cfg.temperature == 0.7
def test_model_params_default_is_independent(self):
a = RunConfig()
b = RunConfig()
a.model_params["x"] = 1
assert "x" not in b.model_params
class TestLLMResponse:
def test_minimal_construction(self):
r = LLMResponse(content="hello", model="test-model")
assert r.content == "hello"
assert r.model == "test-model"
assert r.usage == {}
assert r.finish_reason == "stop"
assert r.metadata == {}
def test_full_construction(self):
r = LLMResponse(
content="response text",
model="gpt-4",
usage={"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
finish_reason="length",
metadata={"provider": "openai", "latency_seconds": 1.2},
)
assert r.usage["total_tokens"] == 15
assert r.finish_reason == "length"
assert r.metadata["provider"] == "openai"
def test_to_dict(self):
r = LLMResponse(content="hi", model="m", finish_reason="stop")
d = r.to_dict()
assert d["content"] == "hi"
assert d["model"] == "m"
assert d["finish_reason"] == "stop"
assert "usage" in d
assert "metadata" in d
def test_metadata_default_is_independent(self):
a = LLMResponse(content="a", model="m")
b = LLMResponse(content="b", model="m")
a.metadata["x"] = 1
assert "x" not in b.metadata

View File

@@ -0,0 +1,36 @@
# LLM-WP-0001 — Foundation & GAAF Baseline
**status:** active
**owner:** llm-connect
**repo:** llm-connect
**created:** 2026-04-01
## Purpose
Establish the structural foundation required before any Core modifications.
Covers repo orientation files, GAAF-2026 compliance artifacts, test suite, CI,
and state-hub housekeeping.
## Tasks
| ID | Title | Priority | Status |
|-----|-------|----------|--------|
| T01 | Create `SCOPE.md` | high | done |
| T02 | Fill `.claude/rules/` stubs: `architecture.md`, `stack-and-commands.md`, `repo-boundary.md` | high | done |
| T03 | Create `ARCHITECTURE-LAYERS.md` with layer map, scorecard stub, next-review date | high | done |
| T04 | Create `/contracts/` tree (`core/`, `functional/`, `config/`) | high | done |
| T05 | Core contract doc: `LLMAdapter` interface invariants, `RunConfig`/`LLMResponse` field contracts | high | done |
| T06 | Functional contract stubs for all 4 adapters + embedding adapters (maturity: Beta) | medium | done |
| T07 | Create `tests/` with `conftest.py`, wire pytest in `pyproject.toml` | high | done |
| T08 | Unit tests: `RunConfig`, `LLMResponse`, `MockLLMAdapter`, full exception hierarchy | high | done |
| T09 | Unit tests: `create_adapter` (all providers + unknown provider error), `create_embedding_adapter` | high | done |
| T10 | Add `ruff`, `mypy` to dev deps in `pyproject.toml` | medium | done |
| T11 | CI workflow: pytest + ruff + mypy | medium | done |
| T12 | State hub: register this host path, SBOM refresh | low | done |
## Exit criteria
- `ARCHITECTURE-LAYERS.md` and `/contracts/core/` exist and describe the current Core surface
- pytest passes with coverage of Core and factory
- ruff + mypy clean
- CI green on push

View File

@@ -0,0 +1,57 @@
# LLM-WP-0002 — Core Extensions (FR-4 + FR-3)
**status:** active
**owner:** llm-connect
**repo:** llm-connect
**created:** 2026-04-01
**depends-on:** LLM-WP-0001 (contracts and tests must exist before Core is modified)
## Purpose
Implement the two IHF feature requests that touch the Core layer.
FR-4 (BudgetTracker) is additive and non-breaking. FR-3 (async) extends
the Core ABC with a default executor fallback — non-breaking, overridable
per adapter for native async.
Origin: IHUB-WP-0012 Phase 11 — Advanced AI Federation (completed 2026-04-01).
## GAAF notes
Both changes are Core-layer modifications under GAAF-2026:
- FR-4: new primitive (`BudgetTracker`) + new exception (`LLMBudgetExceededError`)
added as optional `RunConfig` field — additive, non-breaking.
- FR-3: `async_execute_prompt` added to `LLMAdapter` ABC with a default
`asyncio.get_event_loop().run_in_executor(None, ...)` fallback so existing
adapters remain valid; native async overrides are provided per adapter.
Core contract doc (from WP-0001 T05) must be updated after each change.
## Tasks
### FR-4 — BudgetTracker
| ID | Title | Priority | Status |
|-----|-------|----------|--------|
| T01 | `BudgetTracker` dataclass: `total`, `spent`, `remaining()`, thread-safe increment | high | todo |
| T02 | `LLMBudgetExceededError(LLMError)` in `exceptions.py` | high | todo |
| T03 | Optional `budget_tracker: BudgetTracker \| None` field on `RunConfig` | high | todo |
| T04 | Enforcement: each adapter checks/updates tracker around call; raises on exceeded | high | todo |
| T05 | Update Core contract doc | medium | todo |
| T06 | Tests: single call, delegation chain (A→B→C shared tracker), exceeded error, multi-adapter | high | todo |
### FR-3 — async_execute_prompt
| ID | Title | Priority | Status |
|-----|-------|----------|--------|
| T07 | Add `async_execute_prompt` to `LLMAdapter` ABC with default executor fallback | high | todo |
| T08 | Native async override in `OpenAIAdapter`, `GeminiAdapter`, `OpenRouterAdapter` | high | todo |
| T09 | Native async for `ClaudeCodeAdapter` via `asyncio.create_subprocess_exec` | high | todo |
| T10 | Update Core contract doc | medium | todo |
| T11 | Tests: `asyncio.gather` over N adapters, timeout propagation, budget interaction | high | todo |
## Exit criteria
- `BudgetTracker` enforces caps across a delegation chain of 3 adapters in tests
- `asyncio.gather` over 4 mock adapters completes without errors
- All existing tests still pass (non-breaking validation)
- Core contract doc reflects both additions

View File

@@ -0,0 +1,51 @@
# LLM-WP-0003 — Functional Extensions (FR-2 + FR-1)
**status:** active
**owner:** llm-connect
**repo:** llm-connect
**created:** 2026-04-01
**depends-on:** LLM-WP-0001 (test infrastructure must exist)
## Purpose
Implement the two IHF feature requests that add new Functional-layer modules.
Neither touches Core. Both can be developed independently of WP-0002.
Origin: IHUB-WP-0012 Phase 11 — Advanced AI Federation (completed 2026-04-01).
## GAAF notes
Both additions are Functional-layer under GAAF-2026:
- Demand signal is explicit: IHF (inter-hub) is the primary consumer for both.
- Each gets its own functional contract doc in `/contracts/functional/`.
- Maturity on release: Beta (single known consumer, interface not yet stabilised).
## Tasks
### FR-2 — RoutingPolicy
| ID | Title | Priority | Status |
|-----|-------|----------|--------|
| T01 | `RoutingPolicy` data model: `rules` list with `task_type`, `prefer`, `max_cost_per_1k`, `fallback` | high | todo |
| T02 | `policy.resolve(task_type)` → returns configured `LLMAdapter` | high | todo |
| T03 | Export from `llm_connect.__init__` and update `__all__` | medium | todo |
| T04 | Functional contract doc for `RoutingPolicy` | medium | todo |
| T05 | Tests: rule match, cost-cap fallback, unknown task_type fallback, no-match default | high | todo |
### FR-1 — HTTP serve mode
| ID | Title | Priority | Status |
|-----|-------|----------|--------|
| T06 | Design `/execute` JSON schema (request: provider, model, prompt, config; response: LLMResponse fields) | high | todo |
| T07 | Implement `llm_connect/server.py` — minimal HTTP server, `POST /execute`, `GET /health` | high | todo |
| T08 | `python -m llm_connect.server --port N --provider X --model Y` CLI entry point | high | todo |
| T09 | Add `httpx` or `aiohttp` server dep under `[project.optional-dependencies] server` | medium | todo |
| T10 | Functional contract doc (API schema — request/response shapes, error codes) | medium | todo |
| T11 | Tests: spin up server in subprocess or via `TestClient`, POST round-trip (MockAdapter), error responses | high | todo |
## Exit criteria
- `RoutingPolicy.resolve("triage")` returns the correct adapter per rules in tests
- `python -m llm_connect.server --port 9999` starts and responds to `POST /execute`
- `GET /health` returns 200
- All functional contract docs present in `/contracts/functional/`