diff --git a/.claude/rules/architecture.md b/.claude/rules/architecture.md index 4f6c74c..d1484eb 100644 --- a/.claude/rules/architecture.md +++ b/.claude/rules/architecture.md @@ -1,8 +1,58 @@ ## Architecture - +llm-connect is structured as a **GAAF-2026 layered library**. See +`ARCHITECTURE-LAYERS.md` for the full layer map and scorecard. -## Quick Reference +### Layer summary -`~/the-custodian/state-hub/mcp_server/TOOLS.md` — MCP tool reference +``` +Core (frozen after v1) + LLMAdapter ABC adapter.py + RunConfig / LLMResponse models.py + LLMError hierarchy exceptions.py + MockLLMAdapter adapter.py ← test primitive, belongs with Core + +Functional (evolvable, independently shippable) + OpenAIAdapter openai.py + GeminiAdapter gemini.py + OpenRouterAdapter openrouter.py + ClaudeCodeAdapter claude_code.py + EmbeddingAdapter ABC embedding_adapter.py + OpenAICompatibleEmbeddingAdapter embedding_openai.py + EmbeddingCache embedding_cache.py + create_adapter() factory.py + create_embedding_adapter() embedding_factory.py + _token_estimator _token_estimator.py + similarity utilities similarity.py + +Configuration (user-controlled declarative state) + resolve_llm() chain toml_config.py ← 7-level TOML priority chain + LLMConfig / load_config config.py + _http shared utility _http.py ← also used by Functional adapters +``` + +### Dependency rule + +Core ← Functional ← Configuration +No upward dependencies. `_http.py` is consumed by Functional only. + +### Key design decisions + +**API key resolution** (`config.resolve_api_key`): three-step chain — +explicit argument → environment variable → plaintext key file in project root. +Adapters raise `LLMConfigurationError` at construction time if no key is found +(except `ClaudeCodeAdapter` which needs no key). + +**TOML config chain** (`toml_config.resolve_llm`): 7 priority levels allow +per-project and per-user LLM preferences. 
Currently defaults to `markitect` +app_name for backward compatibility — consumers should pass their own `app_name`. + +**Factory pattern** (`factory.create_adapter`): lazy imports prevent pulling +all provider SDKs at module load. Add a new provider by registering its FQN +in `_PROVIDERS`. + +**ClaudeCodeAdapter subprocess model**: prompt is piped via stdin (not CLI +arg) to avoid shell argument length limits on large prompts (>30 KB). + +**Retry logic**: `OpenAIAdapter` and `OpenRouterAdapter` retry on 429 and 5xx +with exponential backoff. `GeminiAdapter` does not (rate-limit handling deferred). diff --git a/.claude/rules/repo-boundary.md b/.claude/rules/repo-boundary.md index ea4e1f5..8588075 100644 --- a/.claude/rules/repo-boundary.md +++ b/.claude/rules/repo-boundary.md @@ -1,8 +1,17 @@ ## Repo boundary -This repo owns **{PROJECT_NAME}** only. It does not own: +This repo owns **llm-connect** — the multi-provider LLM client library — only. - +It does NOT own: + +- **API key storage / secret management** → caller's environment (env vars, + key files, vault). llm-connect resolves keys but does not store them. +- **Consumer routing logic** → `inter-hub/AgentBridge.hs`, `markitect` etc. + `RoutingPolicy` (WP-0003) provides primitives; policy data belongs in the consumer. +- **The Claude Code CLI binary** → installed separately; `ClaudeCodeAdapter` + shells out to it. +- **markitect application code** → `markitect.llm` is a shim that re-exports + from here; all implementation lives in this repo. +- **State hub / custodian infrastructure** → `the-custodian/state-hub/` +- **IHF bridge scripts** → `inter-hub/scripts/llm_bridge.py` lives in inter-hub, + not here. llm-connect is a dependency of that script.
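The factory note in `architecture.md` says `create_adapter()` relies on lazy imports and that a provider is added by registering its FQN in `_PROVIDERS`. A minimal sketch of that pattern, assuming `_PROVIDERS` maps a provider key to a `"module:ClassName"` string (the real table's format is not shown in this diff). Stdlib classes stand in for adapter classes so the sketch runs without any provider code, and `UnknownProviderError` is a hypothetical stand-in for the library's configuration error.

```python
import importlib
from typing import Any, Dict

# Hypothetical registry: provider key -> "module.path:ClassName".
# Stdlib classes stand in for real adapter classes so this runs anywhere.
_PROVIDERS: Dict[str, str] = {
    "ordered": "collections:OrderedDict",
    "fraction": "fractions:Fraction",
}


class UnknownProviderError(ValueError):
    """Stand-in for the error raised on an unregistered provider key."""


def create_adapter(provider: str, *args: Any, **kwargs: Any) -> Any:
    """Import the provider's module only when that provider is requested."""
    try:
        fqn = _PROVIDERS[provider]
    except KeyError:
        known = ", ".join(sorted(_PROVIDERS))
        raise UnknownProviderError(
            f"Unknown provider {provider!r}; known: {known}"
        ) from None
    module_name, _, class_name = fqn.partition(":")
    module = importlib.import_module(module_name)  # deferred import happens here
    cls = getattr(module, class_name)
    return cls(*args, **kwargs)


if __name__ == "__main__":
    print(create_adapter("fraction", 3, 4))  # prints 3/4
```

Only the requested provider's module is imported, so adding a provider means adding one registry entry rather than another top-level import.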
diff --git a/.claude/rules/stack-and-commands.md b/.claude/rules/stack-and-commands.md index dc53ac6..f1d0216 100644 --- a/.claude/rules/stack-and-commands.md +++ b/.claude/rules/stack-and-commands.md @@ -1,19 +1,59 @@ ## Stack - -- **Language:** -- **Key deps:** +- **Language:** Python 3.10+ +- **Key deps (runtime):** `toml` (TOML config parsing) +- **Key deps (dev):** `pytest`, `ruff`, `mypy` +- **HTTP:** stdlib `urllib` via `_http.py` (no requests/httpx runtime dep) +- **Build:** setuptools / uv ## Dev Commands ```bash -# TODO: Fill in the standard commands for this repo - -# Install dependencies +# Install (editable, with dev extras) +uv pip install -e ".[dev]" +# or +pip install -e ".[dev]" # Run tests +uv run pytest +# or +pytest -# Lint / type check +# Lint +uv run ruff check . -# Build / package (if applicable) +# Type check +uv run mypy llm_connect + +# Run a single test file +uv run pytest tests/test_models.py -v + +# Build package (dry run) +uv build --no-sources +``` + +## Project layout + +``` +llm_connect/ source package + adapter.py LLMAdapter ABC + Mock/ErrorLLMAdapter + models.py RunConfig, LLMResponse + exceptions.py LLMError hierarchy + factory.py create_adapter() + openai.py OpenAIAdapter + gemini.py GeminiAdapter + openrouter.py OpenRouterAdapter + claude_code.py ClaudeCodeAdapter + embedding_adapter.py EmbeddingAdapter ABC + embedding_openai.py OpenAICompatibleEmbeddingAdapter + embedding_cache.py EmbeddingCache + embedding_factory.py create_embedding_adapter() + toml_config.py 7-level TOML config resolution + config.py LLMConfig, resolve_api_key, find_project_root + _http.py shared HTTP POST utility + _token_estimator.py rough token count estimate + similarity.py cosine similarity utilities +tests/ pytest test suite +contracts/ GAAF-2026 contract docs +workplans/ workplan files (LLM-WP-NNNN) ``` diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 0000000..147a9d0 --- /dev/null +++ 
b/.github/workflows/ci.yml @@ -0,0 +1,37 @@ +name: CI + +on: + push: + branches: [main] + pull_request: + branches: [main] + +jobs: + test: + runs-on: ubuntu-latest + strategy: + matrix: + python-version: ["3.10", "3.11", "3.12"] + + steps: + - uses: actions/checkout@v4 + + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v5 + with: + python-version: ${{ matrix.python-version }} + + - name: Install uv + uses: astral-sh/setup-uv@v3 + + - name: Install dependencies + run: uv pip install --system -e ".[dev]" + + - name: Lint (ruff) + run: ruff check . + + - name: Type check (mypy) + run: mypy llm_connect + + - name: Test (pytest) + run: pytest diff --git a/ARCHITECTURE-LAYERS.md b/ARCHITECTURE-LAYERS.md new file mode 100644 index 0000000..1c6a3ee --- /dev/null +++ b/ARCHITECTURE-LAYERS.md @@ -0,0 +1,94 @@ +# ARCHITECTURE-LAYERS.md + +**Framework:** GAAF-2026 +**Last reviewed:** 2026-04-01 +**Repository purpose:** Multi-provider LLM client library — unified adapter interface for Python +**Next review:** 2026-07-01 + +--- + +## Layer Map + +### Core (high rigidity — frozen after v1) + +Domain-agnostic primitives. Must not change without a major version bump once stable. + +| Module | Contents | +|--------|----------| +| `adapter.py` | `LLMAdapter` ABC (`execute_prompt`, `validate_config`); `MockLLMAdapter`; `ErrorLLMAdapter` | +| `models.py` | `RunConfig`, `LLMResponse` dataclasses | +| `exceptions.py` | `LLMError` → `LLMConfigurationError`, `LLMAPIError`, `LLMRateLimitError`, `LLMTimeoutError`, `LLMSubprocessError` | + +**Contract:** `contracts/core/llm-adapter.md` + +### Functional (medium rigidity — evolvable, versioned) + +Value-realization modules. Each adapter is independently shippable. 
+ +Maturity states: **Experimental → Beta → Stable → Deprecated** + +| Module | Contents | Maturity | +|--------|----------|----------| +| `openai.py` | `OpenAIAdapter` — OpenAI chat completions | Beta | +| `gemini.py` | `GeminiAdapter` — Google Generative Language API | Beta | +| `openrouter.py` | `OpenRouterAdapter` — OpenAI-compatible multi-model routing | Beta | +| `claude_code.py` | `ClaudeCodeAdapter` — `claude --print` subprocess | Beta | +| `embedding_adapter.py` | `EmbeddingAdapter` ABC | Beta | +| `embedding_openai.py` | `OpenAICompatibleEmbeddingAdapter` | Beta | +| `embedding_cache.py` | `EmbeddingCache` — disk-backed embedding cache | Beta | +| `embedding_factory.py` | `create_embedding_adapter()` factory | Beta | +| `factory.py` | `create_adapter()` factory — lazy provider registration | Beta | +| `_token_estimator.py` | Rough token count estimation (word-based) | Beta | +| `similarity.py` | `cosine_similarity`, `similarity_matrix`, `find_similar_pairs` | Beta | + +**Planned additions (WP-0003):** `RoutingPolicy`, `server.py` +**Contracts:** `contracts/functional/` + +### Configuration (very low rigidity — user-controlled declarative state) + +| Module | Contents | +|--------|----------| +| `toml_config.py` | `resolve_llm()` — 7-level TOML priority chain; `ResolvedLLM`; `LLMLayer` | +| `config.py` | `LLMConfig` dataclass; `resolve_api_key()`; `find_project_root()`; `load_config()` | +| `_http.py` | Shared HTTP POST utility (used by Functional adapters) | + +**Contracts:** `contracts/config/` + +--- + +## Dependency Rule + +``` +Core ← Functional ← Configuration +``` + +Upward dependencies (Core importing Functional or Configuration, or Functional importing Configuration) are **prohibited**. +`_http.py` sits in the Configuration layer but is consumed only by Functional adapters — acceptable as a shared utility with no upward reach.
+ +--- + +## Decisions Log + +| Date | Decision | Rationale | +|------|----------|-----------| +| 2026-04-01 | FR-3 async: default executor fallback on ABC rather than abstract method | Non-breaking; existing adapters remain valid; native async opt-in per adapter | +| 2026-04-01 | FR-4 BudgetTracker: optional field on RunConfig, not a separate context object | Keeps RunConfig as single call config; avoids thread-local / contextvar complexity | +| 2026-04-01 | FR-1 HTTP server: optional dep `[server]`, not runtime dep | Keeps base install lightweight; most consumers call the library directly | + +--- + +## GAAF-2026 Scorecard (initial baseline — 2026-04-01) + +> Scoring: 0 = absent / harmful · 5 = excellent + +| Dimension | Score | Notes | +|-----------|-------|-------| +| **Core** | 2.5 | ABC and models well-defined; no formal contracts, no tests, no invariant docs yet | +| **Functional** | 2.5 | Adapters isolated and independently usable; no maturity labels enforced, no tests | +| **Customization** | n/a | Not applicable (library, not SaaS) | +| **Configuration** | 2.0 | TOML chain works; no schema validation; `markitect` name coupling in toml_config defaults | +| **Extensions** | n/a | Not applicable yet (RoutingPolicy + server in WP-0003) | +| **Cross-layer** | 2.0 | Dependency direction correct; no CI fitness functions; no import graph checks | +| **Weighted total** | ~2.3 | Usable but vulnerable — WP-0001 targets ≥ 3.5 | + +**Target after WP-0001:** ≥ 3.5 (Strong) +**Target after WP-0002 + WP-0003:** ≥ 4.0 (Strong / Exemplary) diff --git a/SCOPE.md b/SCOPE.md new file mode 100644 index 0000000..c90f43d --- /dev/null +++ b/SCOPE.md @@ -0,0 +1,45 @@ +# SCOPE.md — llm-connect + +## Purpose + +`llm-connect` is a **multi-provider LLM client library for Python**. +It provides a unified adapter interface over OpenAI, Gemini, OpenRouter, +and the Claude Code CLI, with embedding support, token estimation, and a +TOML-based configuration chain. 
+ +Extracted from [markitect](https://github.com/worsch/markitect). +The `markitect.llm` module remains a re-export shim pointing here. + +## This repo owns + +- `LLMAdapter` ABC and `RunConfig` / `LLMResponse` data models (Core) +- All concrete provider adapters: `OpenAIAdapter`, `GeminiAdapter`, + `OpenRouterAdapter`, `ClaudeCodeAdapter` (Functional) +- Embedding adapters: `EmbeddingAdapter` ABC, `OpenAICompatibleEmbeddingAdapter`, + `EmbeddingCache`, `create_embedding_adapter` factory (Functional) +- TOML-based config resolution (`toml_config.py`, `config.py`) (Configuration) +- Shared HTTP utility (`_http.py`), token estimator (`_token_estimator.py`), + cosine similarity utilities (`similarity.py`) +- The full `LLMError` exception hierarchy + +## This repo does NOT own + +- Consumer application logic — that lives in `markitect`, `inter-hub`, etc. +- API key management infrastructure — keys are resolved from env vars or + plaintext key files; secret storage belongs in the calling environment +- Model routing decisions specific to a consumer — `RoutingPolicy` (WP-0003) + provides primitives; policy configuration belongs in the consumer +- The Claude Code CLI binary itself — `ClaudeCodeAdapter` shells out to `claude` + +## Consumers (as of 2026-04-01) + +| Consumer | How it uses llm-connect | +|----------|------------------------| +| `markitect` | Re-exports via `markitect.llm` shim; drives document generation | +| `inter-hub` (IHF) | Subprocess bridge (`scripts/llm_bridge.py` + `AgentBridge.hs`) for multi-agent federation | + +## Versioning + +- Current version: **0.1.0** (pre-release; API not yet stable) +- Core layer (`LLMAdapter`, `RunConfig`, `LLMResponse`) will be stabilised at **v1.0.0** +- Breaking Core changes require a major version bump diff --git a/contracts/config/toml-chain.md b/contracts/config/toml-chain.md new file mode 100644 index 0000000..4f57f64 --- /dev/null +++ b/contracts/config/toml-chain.md @@ -0,0 +1,80 @@ +# Contract: Configuration — 
TOML Config Chain + +**Layer:** Configuration +**Version:** 0.1.0 +**Last updated:** 2026-04-01 + +--- + +## resolve_llm() + +`llm_connect.toml_config.resolve_llm(cli_provider, cli_model, app_name)` + +Walks a 7-level priority chain to resolve provider and model independently. +Returns `ResolvedLLM(provider, model, provider_source, model_source)`. + +### Priority chain (highest → lowest) + +| Level | Source | +|-------|--------| +| 1 | CLI flags (`cli_provider`, `cli_model`) | +| 2 | Env var `{APP_NAME}_HELPER_MODEL` (model only) | +| 3 | User preference — `~/.config/{app_name}/config.toml` `[llm.preference]` | +| 4 | Directory preference — `.{app_name}.toml` `[llm.preference]` | +| 5 | Directory default — `.{app_name}.toml` `[llm.default]` | +| 6 | User default — `~/.config/{app_name}/config.toml` `[llm.default]` | +| 7 | Hardcoded fallback — `gemini / gemini-2.5-flash` | + +### Invariants + +- Always returns a fully-resolved `ResolvedLLM` (never raises, never returns None). +- Provider and model are resolved independently — a preference for model does + not imply a preference for provider. +- TOML parse errors are silently ignored (returns empty layer). +- `app_name` defaults to `"markitect"` for backward compatibility; consumers + should pass their own app name. + +### Known issue + +`toml_config.py` has `markitect`-specific defaults (`MARKITECT_HELPER_MODEL`, +`USER_CONFIG_DIR`). These are kept for backward compatibility but callers +outside markitect should always pass an explicit `app_name`. + +--- + +## resolve_api_key() + +`llm_connect.config.resolve_api_key(explicit, env_var, key_file_paths)` + +Resolution order: +1. `explicit` argument +2. Environment variable `env_var` +3. First readable file in `key_file_paths` with non-empty content + +Returns `None` if nothing is found. Never raises. + +--- + +## find_project_root() + +Walks up from CWD looking for `pyproject.toml`. Returns the containing directory +or `None`. Used by adapters to locate key files. 
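The two resolution contracts above can be sketched as plain functions. This is an illustration of the documented rules, not the code in `config.py` or `toml_config.py`: `resolve_api_key_sketch` follows the explicit, then environment variable, then first readable non-empty key file order and never raises; `resolve_independently` walks layers ordered highest to lowest priority and picks provider and model separately. The function names, the `(source, dict)` layer shape, and whitespace-stripping of key files are assumptions.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping, NamedTuple, Optional, Tuple


def resolve_api_key_sketch(
    explicit: Optional[str],
    env_var: str,
    key_file_paths: Iterable[Path],
) -> Optional[str]:
    """explicit -> environment variable -> first readable, non-empty key file."""
    if explicit:
        return explicit
    from_env = os.environ.get(env_var)
    if from_env:
        return from_env
    for path in key_file_paths:
        try:
            # Stripping whitespace is an assumption of this sketch.
            text = Path(path).read_text(encoding="utf-8").strip()
        except OSError:
            continue  # missing or unreadable file: try the next one
        if text:
            return text
    return None  # nothing found; never raises


class Resolved(NamedTuple):
    provider: str
    model: str
    provider_source: str
    model_source: str


def resolve_independently(layers: List[Tuple[str, Mapping[str, str]]]) -> Resolved:
    """Pick provider and model separately from layers ordered highest -> lowest.

    The last layer is assumed to be the hardcoded fallback defining both keys.
    """

    def first(key: str) -> Tuple[str, str]:
        for source, values in layers:
            if values.get(key):
                return values[key], source
        raise LookupError(key)  # unreachable when the fallback layer is complete

    provider, provider_source = first("provider")
    model, model_source = first("model")
    return Resolved(provider, model, provider_source, model_source)


if __name__ == "__main__":
    layers = [
        ("cli", {}),
        ("env", {"model": "gpt-4.1-mini"}),  # level 2 sets the model only
        ("user-preference", {"provider": "openai"}),
        ("fallback", {"provider": "gemini", "model": "gemini-2.5-flash"}),
    ]
    print(resolve_independently(layers))
    # Resolved(provider='openai', model='gpt-4.1-mini', provider_source='user-preference', model_source='env')
```

Resolving each field on its own is what lets a model-only environment override coexist with a provider chosen further down the chain.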
+ +--- + +## LLMConfig + +`llm_connect.config.LLMConfig` + +Dataclass holding per-adapter configuration. Used directly by `OpenRouterAdapter` +and `ClaudeCodeAdapter`. Not required by the Core `LLMAdapter` ABC. + +| Field | Default | +|-------|---------| +| `provider` | `"openrouter"` | +| `model` | `"anthropic/claude-sonnet-4"` | +| `api_key` | `None` | +| `api_base` | `"https://openrouter.ai/api/v1"` | +| `claude_cli_path` | `"claude"` | +| `timeout_seconds` | `300` | +| `max_retries` | `3` | diff --git a/contracts/core/llm-adapter.md b/contracts/core/llm-adapter.md new file mode 100644 index 0000000..7859e8c --- /dev/null +++ b/contracts/core/llm-adapter.md @@ -0,0 +1,122 @@ +# Contract: Core — LLMAdapter Interface + +**Layer:** Core +**Version:** 0.1.0 +**Status:** Draft (stabilises at v1.0.0) +**Last updated:** 2026-04-01 + +--- + +## LLMAdapter ABC + +`llm_connect.adapter.LLMAdapter` + +### Interface + +```python +class LLMAdapter(ABC): + @abstractmethod + def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ... + + @abstractmethod + def validate_config(self, config: RunConfig) -> bool: ... +``` + +**Added in WP-0002 T07** (non-abstract, so existing adapters stay valid): +```python + async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: + # Default: runs execute_prompt in a thread executor + ... +``` + +### Invariants + +1. On success, `execute_prompt` MUST return an `LLMResponse`; its `content` MUST be non-empty unless the provider itself returned empty output. +2. `execute_prompt` MUST raise a subclass of `LLMError` on any failure — never a bare exception. +3. `validate_config` MUST be side-effect-free and return `bool` only. +4. `validate_config` returning `False` does not preclude calling `execute_prompt` — it is advisory. +5. Adapters MUST NOT mutate the `config` argument. +6. `execute_prompt` is allowed to be slow (network I/O) but MUST respect `config.timeout_seconds`.
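Invariants 2 and 5 are the easiest to break when writing a new adapter. A minimal sketch of an adapter that honours them, written against hypothetical stand-ins for `RunConfig`, `LLMResponse`, `LLMError` and `LLMAPIError` so it runs without the package. Unexpected exceptions are wrapped in an `LLMError` subclass, and per-call tweaks go through `dataclasses.replace` instead of mutating the caller's config. `EchoAdapter`, its behaviour, and the 1024-token limit are illustrative only.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Stand-ins mirroring the documented shapes (not imported from llm_connect).
@dataclass
class RunConfig:
    model_name: str = "gpt-4"
    temperature: float = 0.7
    max_tokens: int = 2000


@dataclass
class LLMResponse:
    content: str
    model: str
    usage: Dict[str, int] = field(default_factory=dict)
    finish_reason: str = "stop"
    metadata: Dict[str, Any] = field(default_factory=dict)


class LLMError(Exception):
    def __init__(self, message: str, cause: Optional[Exception] = None):
        super().__init__(message)
        self.cause = cause


class LLMAPIError(LLMError):
    pass


class EchoAdapter:
    """Hypothetical adapter that echoes the prompt in upper case."""

    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
        # Invariant 5: derive a local copy (1024 is a made-up provider limit).
        local = dataclasses.replace(config, max_tokens=min(config.max_tokens, 1024))
        try:
            content = self._call_provider(prompt, local)
        except LLMError:
            raise  # already part of the hierarchy
        except Exception as exc:  # Invariant 2: never leak a bare exception
            raise LLMAPIError(f"provider call failed: {exc}", cause=exc) from exc
        return LLMResponse(
            content=content,
            model=local.model_name,
            usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
            metadata={"provider": "echo"},
        )

    def validate_config(self, config: RunConfig) -> bool:
        # Invariant 3: side-effect-free, returns a bool only.
        return bool(config.model_name) and config.max_tokens > 0

    def _call_provider(self, prompt: str, config: RunConfig) -> str:
        if not prompt:
            raise ValueError("empty prompt")  # simulated low-level failure
        return prompt.upper()
```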
+ +### Failure modes + +| Condition | Exception | +|-----------|-----------| +| Missing / invalid API key | `LLMConfigurationError` | +| HTTP 4xx (non-429) | `LLMAPIError` (with `.status_code`) | +| HTTP 429 | `LLMRateLimitError` | +| Request timeout | `LLMTimeoutError` | +| CLI subprocess failure | `LLMSubprocessError` (with `.return_code`, `.stderr`) | +| Token budget exceeded (WP-0002) | `LLMBudgetExceededError` | + +### Compatibility rules + +- Any code that accepts `LLMAdapter` MUST work with `MockLLMAdapter`. +- Adding new optional methods to the ABC is non-breaking (default implementations provided). +- Removing or changing the signature of `execute_prompt` or `validate_config` is a **breaking Core change** requiring a major version bump. + +--- + +## RunConfig + +`llm_connect.models.RunConfig` + +### Fields and invariants + +| Field | Type | Default | Invariant | +|-------|------|---------|-----------| +| `model_name` | `str` | `"gpt-4"` | Non-empty string; adapters MAY override | +| `temperature` | `float` | `0.7` | 0.0 ≤ temperature ≤ 2.0 | +| `max_tokens` | `int` | `2000` | > 0 | +| `model_params` | `dict` | `{}` | Provider-specific pass-through; no invariants | +| `max_depth` | `int` | `3` | ≥ 0 | +| `skip_if_exists` | `bool` | `True` | — | +| `timeout_seconds` | `int` | `300` | > 0 | +| `budget_tracker` | `BudgetTracker \| None` | `None` | Optional; added in WP-0002 | + +Adapters MUST NOT mutate `RunConfig` fields. 
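The invariant column of the `RunConfig` table can be checked mechanically. A minimal sketch, assuming a standalone helper (`run_config_violations` is hypothetical; the contract does not say whether `RunConfig` validates itself) that takes any object with the documented field names and returns the violated invariants as a list rather than raising. The stand-in dataclass repeats the documented defaults and omits `budget_tracker`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RunConfig:
    """Stand-in with the documented fields and defaults (budget_tracker omitted)."""

    model_name: str = "gpt-4"
    temperature: float = 0.7
    max_tokens: int = 2000
    model_params: Dict[str, Any] = field(default_factory=dict)
    max_depth: int = 3
    skip_if_exists: bool = True
    timeout_seconds: int = 300


def run_config_violations(cfg: Any) -> List[str]:
    """Return human-readable violations of the documented RunConfig invariants."""
    problems: List[str] = []
    if not isinstance(cfg.model_name, str) or not cfg.model_name:
        problems.append("model_name must be a non-empty string")
    if not 0.0 <= cfg.temperature <= 2.0:
        problems.append(f"temperature {cfg.temperature} outside [0.0, 2.0]")
    if cfg.max_tokens <= 0:
        problems.append(f"max_tokens must be > 0, got {cfg.max_tokens}")
    if cfg.max_depth < 0:
        problems.append(f"max_depth must be >= 0, got {cfg.max_depth}")
    if cfg.timeout_seconds <= 0:
        problems.append(f"timeout_seconds must be > 0, got {cfg.timeout_seconds}")
    # model_params and skip_if_exists carry no invariants in the contract.
    return problems
```

Returning a list instead of raising on the first problem lets a test or a CLI report every bad field at once.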
+ +--- + +## LLMResponse + +`llm_connect.models.LLMResponse` + +### Fields and invariants + +| Field | Type | Invariant | +|-------|------|-----------| +| `content` | `str` | Non-empty on success; may be empty only if provider returned empty output | +| `model` | `str` | Non-empty; the model actually used (may differ from `RunConfig.model_name`) | +| `usage` | `dict` | Keys: `prompt_tokens`, `completion_tokens`, `total_tokens` (all int ≥ 0) | +| `finish_reason` | `str` | Provider-reported; `"stop"` is the normal value | +| `metadata` | `dict` | Arbitrary; always includes `"provider"` key | + +--- + +## LLMError Hierarchy + +``` +LLMError +├── LLMConfigurationError bad key / unknown provider +├── LLMAPIError HTTP error (has .status_code, .response_body) +│ └── LLMRateLimitError 429 +├── LLMTimeoutError request or subprocess timed out +├── LLMSubprocessError CLI failed (has .return_code, .stderr) +└── LLMBudgetExceededError token budget cap exceeded (WP-0002) +``` + +All exceptions carry optional `cause` (chained exception) and `context` (dict). + +--- + +## Mock adapters + +`MockLLMAdapter` and `ErrorLLMAdapter` are part of Core — they are test +primitives that any consumer may depend on without importing dev extras. + +`MockLLMAdapter` invariants: +- Returns deterministic response without network I/O +- Increments `call_count` on each call +- Records `last_prompt` and `last_config` +- `reset()` clears all counters and recorded state diff --git a/contracts/functional/adapters.md b/contracts/functional/adapters.md new file mode 100644 index 0000000..004811e --- /dev/null +++ b/contracts/functional/adapters.md @@ -0,0 +1,94 @@ +# Contract: Functional — Provider Adapters + +**Layer:** Functional +**Version:** 0.1.0 +**Maturity:** Beta (all adapters) +**Last updated:** 2026-04-01 + +--- + +## Common adapter contract + +All provider adapters implement `LLMAdapter` (see `contracts/core/llm-adapter.md`). 
+ +Additional shared guarantees: + +- Constructors of key-based adapters resolve API keys at instantiation and raise `LLMConfigurationError` + immediately if no key is found (fail-fast); `ClaudeCodeAdapter` needs no key. +- HTTP-based adapters (`OpenAIAdapter`, `GeminiAdapter`, `OpenRouterAdapter`) + use `_http.post_json` and do not add runtime dependencies beyond stdlib. +- `metadata` in the returned `LLMResponse` always contains a `"provider"` key; + the HTTP-based adapters also report `"latency_seconds"`. +- HTTP adapters that retry (`OpenAIAdapter`, `OpenRouterAdapter`) use + exponential backoff: `sleep(2 ** attempt)` on 429 and 5xx. + +--- + +## OpenAIAdapter + +**Provider key:** `"openai"` +**Default model:** `gpt-4.1-mini` +**API:** `https://api.openai.com/v1/chat/completions` +**Auth:** `OPENAI_API_KEY` env var or `apikey-chatgpt.txt` in project root +**Retries:** 3 (exponential backoff on 429 and 5xx) + +--- + +## GeminiAdapter + +**Provider key:** `"gemini"` +**Default model:** `gemini-2.5-flash` +**API:** `https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent` +**Auth:** `GEMINI_API_KEY` env var or `apikey-geminifree.txt` in project root +**Retries:** 0 (no retry logic; rate-limit handling deferred) +**Note:** System prompt is simulated via a user/model turn pair (`contents` roles are only `user` and `model`; the API's separate `systemInstruction` field is not used). + +--- + +## OpenRouterAdapter + +**Provider key:** `"openrouter"` +**Default model:** `anthropic/claude-sonnet-4` +**API:** `https://openrouter.ai/api/v1/chat/completions` (configurable via `LLMConfig.api_base`) +**Auth:** `OPENROUTER_API_KEY` env var or `apikey-openrouter.txt` in project root +**Retries:** 3 (exponential backoff on 429 and 5xx) +**Note:** OpenRouter is an OpenAI-compatible endpoint; `RunConfig.model_params` are merged into the payload.
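The shared retry guarantee (`sleep(2 ** attempt)` on 429 and 5xx, with 3 retries for OpenAI and OpenRouter) can be sketched as a small loop. This is an illustration under assumptions: `HTTPStatusError`, `send`, and the injectable `sleep` parameter are hypothetical names, and the contract does not say whether "3 retries" means three extra attempts (as here, four calls in total) or three attempts overall.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(status_code: int) -> bool:
    return status_code == 429 or 500 <= status_code <= 599


def with_backoff(
    send: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying retryable statuses after 1s, 2s, 4s, ... delays."""
    attempt = 0
    while True:
        try:
            return send()
        except HTTPStatusError as exc:
            if not is_retryable(exc.status_code) or attempt >= max_retries:
                raise  # non-retryable (e.g. 400, 401) or retries exhausted
            sleep(2 ** attempt)  # attempt 0 -> 1s, 1 -> 2s, 2 -> 4s
            attempt += 1
```

Injecting `sleep` keeps tests instant; callers in production get the default `time.sleep`.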
+ +--- + +## ClaudeCodeAdapter + +**Provider key:** `"claude-code"` +**Default model:** n/a (uses the CLI's configured default) +**Auth:** none (delegates to locally installed `claude` CLI) +**Subprocess:** `claude --print [--model M]` with prompt on stdin +**Token counts:** estimated via `_token_estimator` (not provider-reported) +**validate_config:** runs `claude --version`; returns `False` if CLI not found + +--- + +## EmbeddingAdapter ABC + +`llm_connect.embedding_adapter.EmbeddingAdapter` + +```python +class EmbeddingAdapter(ABC): + @abstractmethod + def embed(self, texts: list[str]) -> list[list[float]]: ... +``` + +Invariant: returns a list of the same length as `texts`. + +### OpenAICompatibleEmbeddingAdapter + +Compatible with any OpenAI-format embedding endpoint (`/v1/embeddings`). +Default model: `text-embedding-3-small`. + +--- + +## EmbeddingCache + +`llm_connect.embedding_cache.EmbeddingCache` + +Disk-backed cache keyed by text content (SHA-256 hash). +`get_or_compute(text, compute_fn)` returns cached vector or calls `compute_fn`. 
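A minimal sketch of the `EmbeddingCache` contract: disk-backed, keyed by the SHA-256 of the text, exposing `get_or_compute(text, compute_fn)`. The class name `DiskEmbeddingCache`, the `cache_dir` constructor argument, the one-JSON-file-per-key layout, and the assumption that `compute_fn` receives the text are all illustrative; the real module's storage format is not described in this diff.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


class DiskEmbeddingCache:
    """Hypothetical disk cache: one JSON file per SHA-256(text)."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, text: str) -> Path:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def get_or_compute(
        self, text: str, compute_fn: Callable[[str], List[float]]
    ) -> List[float]:
        path = self._path(text)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))  # cache hit
        vector = compute_fn(text)  # cache miss: compute once
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(vector), encoding="utf-8")
        tmp.replace(path)  # rename into place so readers never see a partial file
        return vector
```

Hashing the content rather than using the text as a filename keeps keys fixed-length and filesystem-safe, and a second process pointed at the same directory reuses earlier results.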
diff --git a/llm_connect/__init__.py b/llm_connect/__init__.py index 5d7cbe6..fc74c1f 100644 --- a/llm_connect/__init__.py +++ b/llm_connect/__init__.py @@ -12,7 +12,7 @@ Quick start:: response = adapter.execute_prompt(prompt, run_config) """ -from llm_connect.models import RunConfig, LLMResponse +from llm_connect.models import RunConfig, LLMResponse, BudgetTracker from llm_connect.adapter import LLMAdapter, MockLLMAdapter, ErrorLLMAdapter from llm_connect.factory import create_adapter from llm_connect.openrouter import OpenRouterAdapter @@ -27,6 +27,7 @@ from llm_connect.exceptions import ( LLMRateLimitError, LLMTimeoutError, LLMSubprocessError, + LLMBudgetExceededError, ) from llm_connect.embedding_adapter import EmbeddingAdapter from llm_connect.embedding_openai import OpenAICompatibleEmbeddingAdapter @@ -41,6 +42,7 @@ from llm_connect.similarity import ( __all__ = [ "RunConfig", "LLMResponse", + "BudgetTracker", "LLMAdapter", "MockLLMAdapter", "ErrorLLMAdapter", @@ -57,6 +59,7 @@ __all__ = [ "LLMRateLimitError", "LLMTimeoutError", "LLMSubprocessError", + "LLMBudgetExceededError", "EmbeddingAdapter", "OpenAICompatibleEmbeddingAdapter", "EmbeddingCache", diff --git a/llm_connect/adapter.py b/llm_connect/adapter.py index e8d4574..4742e70 100644 --- a/llm_connect/adapter.py +++ b/llm_connect/adapter.py @@ -5,10 +5,11 @@ Implements abstraction layer for LLM integration, supporting multiple providers (OpenAI, Anthropic, local models, etc.). """ +import asyncio from abc import ABC, abstractmethod from typing import Dict, Any from llm_connect.models import RunConfig, LLMResponse class LLMAdapter(ABC): @@ -40,6 +41,26 @@ class LLMAdapter(ABC): """ pass + async def async_execute_prompt( + self, + prompt: str, + config: RunConfig, + ) -> LLMResponse: + """Execute a prompt asynchronously. + + Default implementation runs :meth:`execute_prompt` in a thread + executor so that the event loop is not blocked.
Subclasses may + override with a native ``asyncio``-based implementation. + + Args: + prompt: Compiled prompt text + config: Execution configuration + + Returns: + LLMResponse with generated content + """ + return await asyncio.to_thread(self.execute_prompt, prompt, config) + @abstractmethod def validate_config(self, config: RunConfig) -> bool: """ @@ -53,6 +74,27 @@ class LLMAdapter(ABC): """ pass + # ── Budget helpers (call in execute_prompt implementations) ───── + + def _preflight_budget(self, config: RunConfig) -> None: + """Raise ``LLMBudgetExceededError`` if the budget is already exhausted.""" + if config.budget_tracker is not None and config.budget_tracker.remaining() == 0: + from llm_connect.exceptions import LLMBudgetExceededError + tracker = config.budget_tracker + raise LLMBudgetExceededError( + "Token budget exhausted before making request", + total=tracker.total, + spent=tracker.spent, + requested=0, + context={"total": tracker.total, "spent": tracker.spent}, + ) + + def _consume_budget(self, config: RunConfig, response: LLMResponse) -> None: + """Consume tokens from the budget tracker after a successful call.""" + if config.budget_tracker is not None: + tokens = response.usage.get("total_tokens", 0) + config.budget_tracker.consume(tokens) + class MockLLMAdapter(LLMAdapter): """ @@ -88,11 +130,12 @@ class MockLLMAdapter(LLMAdapter): Returns: Mock LLMResponse """ + self._preflight_budget(config) self.call_count += 1 self.last_prompt = prompt self.last_config = config - return LLMResponse( + response = LLMResponse( content=self.mock_response, model=config.model_name, usage={ @@ -103,6 +146,8 @@ class MockLLMAdapter(LLMAdapter): finish_reason="stop", metadata={"mock": True}, ) + self._consume_budget(config, response) + return response def validate_config(self, config: RunConfig) -> bool: """ diff --git a/llm_connect/claude_code.py b/llm_connect/claude_code.py index 534c80a..fa8c786 100644 --- a/llm_connect/claude_code.py +++ b/llm_connect/claude_code.py 
@@ -2,6 +2,7 @@ Claude Code CLI adapter — runs the ``claude`` CLI as a subprocess. """ +import asyncio import subprocess from typing import Optional @@ -35,6 +36,7 @@ class ClaudeCodeAdapter(LLMAdapter): # ── LLMAdapter interface ──────────────────────────────────────── def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: + self._preflight_budget(config) cmd = [self._cli_path, "--print"] if self._model: cmd.extend(["--model", self._model]) @@ -66,7 +68,7 @@ class ClaudeCodeAdapter(LLMAdapter): prompt_tokens = estimate_tokens(prompt) completion_tokens = estimate_tokens(content) - return LLMResponse( + response = LLMResponse( content=content, model=self._model or "claude-code-cli", usage={ @@ -80,6 +82,67 @@ class ClaudeCodeAdapter(LLMAdapter): "cli_path": self._cli_path, }, ) + self._consume_budget(config, response) + return response + + async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: + """Native async implementation using asyncio.create_subprocess_exec.""" + self._preflight_budget(config) + cmd = [self._cli_path, "--print"] + if self._model: + cmd.extend(["--model", self._model]) + + timeout = config.timeout_seconds or self._config.timeout_seconds + + try: + proc = await asyncio.create_subprocess_exec( + *cmd, + stdin=asyncio.subprocess.PIPE, + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE, + ) + stdout_bytes, stderr_bytes = await asyncio.wait_for( + proc.communicate(input=prompt.encode()), + timeout=timeout, + ) + except asyncio.TimeoutError as exc: + # Kill and reap the child so a timed-out CLI does not keep running. + if proc.returncode is None: + proc.kill() + await proc.wait() + raise LLMTimeoutError( + f"claude CLI timed out after {timeout}s", + cause=exc, + ) from exc + + if proc.returncode != 0: + raise LLMSubprocessError( + f"claude CLI exited with code {proc.returncode}", + return_code=proc.returncode, + stderr=stderr_bytes.decode(), + ) + + content = stdout_bytes.decode() + prompt_tokens = estimate_tokens(prompt) + completion_tokens = estimate_tokens(content) + + response = LLMResponse( + content=content, +
model=self._model or "claude-code-cli", + usage={ + "prompt_tokens": prompt_tokens, + "completion_tokens": completion_tokens, + "total_tokens": prompt_tokens + completion_tokens, + }, + finish_reason="stop", + metadata={ + "provider": "claude-code", + "cli_path": self._cli_path, + "async": True, + }, + ) + self._consume_budget(config, response) + return response def validate_config(self, config: RunConfig) -> bool: try: diff --git a/llm_connect/exceptions.py b/llm_connect/exceptions.py index f2fc34d..165a92b 100644 --- a/llm_connect/exceptions.py +++ b/llm_connect/exceptions.py @@ -64,6 +64,30 @@ class LLMTimeoutError(LLMError): pass +class LLMBudgetExceededError(LLMError): + """Token budget cap exceeded during a call or delegation chain. + + Attributes: + total: The configured token cap. + spent: Tokens already consumed before this call. + requested: Tokens this call would have consumed. + """ + + def __init__( + self, + message: str, + total: int = 0, + spent: int = 0, + requested: int = 0, + cause: Optional[Exception] = None, + context: Optional[Dict[str, Any]] = None, + ): + super().__init__(message, cause=cause, context=context) + self.total = total + self.spent = spent + self.requested = requested + + class LLMSubprocessError(LLMError): """Claude Code CLI subprocess failed. diff --git a/llm_connect/gemini.py b/llm_connect/gemini.py index 667f952..171c176 100644 --- a/llm_connect/gemini.py +++ b/llm_connect/gemini.py @@ -2,6 +2,7 @@ Google Gemini adapter — calls the Generative Language REST API directly. 
""" +import asyncio import time from typing import Optional, Dict, Any @@ -48,6 +49,7 @@ class GeminiAdapter(LLMAdapter): # ── LLMAdapter interface ──────────────────────────────────────── def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: + self._preflight_budget(config) model = self._model # Build Gemini request @@ -92,7 +94,7 @@ class GeminiAdapter(LLMAdapter): usage_meta = data.get("usageMetadata", {}) - return LLMResponse( + response = LLMResponse( content=content, model=model, usage={ @@ -106,6 +108,12 @@ class GeminiAdapter(LLMAdapter): "latency_seconds": round(latency, 3), }, ) + self._consume_budget(config, response) + return response + + async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: + """Async wrapper — runs execute_prompt in a thread executor.""" + return await asyncio.to_thread(self.execute_prompt, prompt, config) def validate_config(self, config: RunConfig) -> bool: if not self._api_key: diff --git a/llm_connect/models.py b/llm_connect/models.py index 5872918..b456e6c 100644 --- a/llm_connect/models.py +++ b/llm_connect/models.py @@ -5,8 +5,53 @@ These classes are the canonical definitions; they are re-exported by markitect.prompts.execution.models for backward compatibility. """ +import threading from dataclasses import dataclass, field -from typing import Dict, Any +from typing import Dict, Any, Optional + + +class BudgetTracker: + """Shared token budget for a call or delegation chain. + + Thread-safe. Tracks cumulative token spend across multiple adapter + calls. Raises ``LLMBudgetExceededError`` when the cap is exceeded. + + Example:: + + tracker = BudgetTracker(total=4000) + config = RunConfig(budget_tracker=tracker) + # All adapter calls sharing this config will consume from the same cap. 
+ """ + + def __init__(self, total: int) -> None: + if total <= 0: + raise ValueError(f"BudgetTracker total must be positive, got {total}") + self.total = total + self.spent = 0 + self._lock = threading.Lock() + + def remaining(self) -> int: + """Return tokens remaining in the budget.""" + return max(0, self.total - self.spent) + + def consume(self, tokens: int) -> None: + """Record *tokens* as spent. Raises ``LLMBudgetExceededError`` if cap exceeded.""" + from llm_connect.exceptions import LLMBudgetExceededError # avoid circular at module load + + with self._lock: + new_spent = self.spent + tokens + if new_spent > self.total: + raise LLMBudgetExceededError( + f"Token budget exceeded: {new_spent} tokens used, cap is {self.total}", + total=self.total, + spent=self.spent, + requested=tokens, + context={"total": self.total, "spent": self.spent, "requested": tokens}, + ) + self.spent = new_spent + + def __repr__(self) -> str: + return f"BudgetTracker(total={self.total}, spent={self.spent}, remaining={self.remaining()})" @dataclass @@ -30,9 +75,10 @@ class RunConfig: max_depth: int = 3 skip_if_exists: bool = True timeout_seconds: int = 300 + budget_tracker: Optional["BudgetTracker"] = field(default=None, repr=False) def to_dict(self) -> Dict[str, Any]: - """Convert to dictionary.""" + """Convert to dictionary. ``budget_tracker`` is excluded (runtime object).""" return { "model_name": self.model_name, "temperature": self.temperature, diff --git a/llm_connect/openai.py b/llm_connect/openai.py index 285aa60..9528fbc 100644 --- a/llm_connect/openai.py +++ b/llm_connect/openai.py @@ -2,6 +2,7 @@ OpenAI (ChatGPT) adapter — calls the OpenAI chat completions API. 
""" +import asyncio import time from typing import Optional, Dict, Any @@ -51,6 +52,7 @@ class OpenAIAdapter(LLMAdapter): # ── LLMAdapter interface ──────────────────────────────────────── def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: + self._preflight_budget(config) model = self._model messages: list[Dict[str, str]] = [] @@ -80,7 +82,7 @@ class OpenAIAdapter(LLMAdapter): finish_reason = choice.get("finish_reason", "stop") usage = data.get("usage", {}) - return LLMResponse( + response = LLMResponse( content=content, model=data.get("model", model), usage={ @@ -95,6 +97,12 @@ class OpenAIAdapter(LLMAdapter): "response_id": data.get("id", ""), }, ) + self._consume_budget(config, response) + return response + + async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: + """Async wrapper — runs execute_prompt in a thread executor.""" + return await asyncio.to_thread(self.execute_prompt, prompt, config) def validate_config(self, config: RunConfig) -> bool: if not self._api_key: diff --git a/llm_connect/openrouter.py b/llm_connect/openrouter.py index 97c9aa9..8cba1c1 100644 --- a/llm_connect/openrouter.py +++ b/llm_connect/openrouter.py @@ -2,6 +2,7 @@ OpenRouter adapter — calls the OpenAI-compatible chat completions API. 
""" +import asyncio import time from typing import Optional, Dict, Any @@ -55,6 +56,7 @@ class OpenRouterAdapter(LLMAdapter): # ── LLMAdapter interface ──────────────────────────────────────── def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: + self._preflight_budget(config) model = self._model if self._model != _DEFAULT_MODEL else (config.model_name or self._model) messages: list[Dict[str, str]] = [] @@ -88,7 +90,7 @@ class OpenRouterAdapter(LLMAdapter): finish_reason = choice.get("finish_reason", "stop") usage = data.get("usage", {}) - return LLMResponse( + response = LLMResponse( content=content, model=data.get("model", model), usage={ @@ -103,6 +105,12 @@ class OpenRouterAdapter(LLMAdapter): "response_id": data.get("id", ""), }, ) + self._consume_budget(config, response) + return response + + async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: + """Async wrapper — runs execute_prompt in a thread executor.""" + return await asyncio.to_thread(self.execute_prompt, prompt, config) def validate_config(self, config: RunConfig) -> bool: if not self._api_key: diff --git a/pyproject.toml b/pyproject.toml index b4a3357..224ef74 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -14,6 +14,8 @@ dependencies = [ [project.optional-dependencies] dev = [ "pytest>=7.0", + "ruff>=0.4", + "mypy>=1.10", ] [tool.setuptools.packages.find] @@ -23,4 +25,26 @@ include = ["llm_connect*"] [dependency-groups] dev = [ "pytest>=9.0.2", + "ruff>=0.4", + "mypy>=1.10", ] + +[tool.pytest.ini_options] +testpaths = ["tests"] +addopts = "-v" + +[tool.ruff] +target-version = "py310" +line-length = 100 + +[tool.ruff.lint] +select = ["E", "F", "W", "I", "UP"] +ignore = ["E501"] + +[tool.mypy] +python_version = "3.10" +strict = false +ignore_missing_imports = true +disallow_untyped_defs = true +warn_return_any = true +warn_unused_ignores = true diff --git a/tests/conftest.py b/tests/conftest.py new file mode 100644 index 0000000..62e5714 --- 
/dev/null +++ b/tests/conftest.py @@ -0,0 +1,26 @@ +""" +Shared pytest fixtures for llm-connect tests. +""" + +import pytest + +from llm_connect.models import RunConfig, LLMResponse +from llm_connect.adapter import MockLLMAdapter + + +@pytest.fixture +def run_config(): + """Default RunConfig for tests.""" + return RunConfig() + + +@pytest.fixture +def mock_adapter(): + """MockLLMAdapter with a predictable response.""" + return MockLLMAdapter(mock_response="test response") + + +@pytest.fixture +def sample_response(): + """A minimal valid LLMResponse.""" + return LLMResponse(content="hello", model="test-model") diff --git a/tests/test_adapter.py b/tests/test_adapter.py new file mode 100644 index 0000000..39b42f5 --- /dev/null +++ b/tests/test_adapter.py @@ -0,0 +1,77 @@ +""" +Tests for MockLLMAdapter and ErrorLLMAdapter (Core adapter utilities). +""" + +import pytest +from llm_connect.adapter import MockLLMAdapter, ErrorLLMAdapter +from llm_connect.models import RunConfig, LLMResponse + + +class TestMockLLMAdapter: + def test_returns_mock_response(self, mock_adapter, run_config): + response = mock_adapter.execute_prompt("hello", run_config) + assert response.content == "test response" + + def test_returns_llm_response(self, mock_adapter, run_config): + response = mock_adapter.execute_prompt("hello", run_config) + assert isinstance(response, LLMResponse) + + def test_call_count_increments(self, mock_adapter, run_config): + assert mock_adapter.call_count == 0 + mock_adapter.execute_prompt("a", run_config) + mock_adapter.execute_prompt("b", run_config) + assert mock_adapter.call_count == 2 + + def test_records_last_prompt(self, mock_adapter, run_config): + mock_adapter.execute_prompt("my prompt", run_config) + assert mock_adapter.last_prompt == "my prompt" + + def test_records_last_config(self, mock_adapter, run_config): + mock_adapter.execute_prompt("x", run_config) + assert mock_adapter.last_config is run_config + + def test_reset_clears_state(self, mock_adapter, 
run_config): + mock_adapter.execute_prompt("x", run_config) + mock_adapter.reset() + assert mock_adapter.call_count == 0 + assert mock_adapter.last_prompt is None + assert mock_adapter.last_config is None + + def test_validate_config_always_true(self, mock_adapter, run_config): + assert mock_adapter.validate_config(run_config) is True + + def test_usage_contains_expected_keys(self, mock_adapter, run_config): + response = mock_adapter.execute_prompt("prompt text", run_config) + assert "prompt_tokens" in response.usage + assert "completion_tokens" in response.usage + assert "total_tokens" in response.usage + + def test_custom_response_text(self, run_config): + adapter = MockLLMAdapter(mock_response="custom answer") + response = adapter.execute_prompt("q", run_config) + assert response.content == "custom answer" + + def test_default_response_text(self, run_config): + adapter = MockLLMAdapter() + response = adapter.execute_prompt("q", run_config) + assert response.content == "Mock LLM response" + + def test_metadata_marks_as_mock(self, mock_adapter, run_config): + response = mock_adapter.execute_prompt("q", run_config) + assert response.metadata.get("mock") is True + + +class TestErrorLLMAdapter: + def test_raises_on_execute(self, run_config): + adapter = ErrorLLMAdapter() + with pytest.raises(RuntimeError): + adapter.execute_prompt("q", run_config) + + def test_raises_with_custom_message(self, run_config): + adapter = ErrorLLMAdapter(error_message="boom") + with pytest.raises(RuntimeError, match="boom"): + adapter.execute_prompt("q", run_config) + + def test_validate_config_returns_true(self, run_config): + adapter = ErrorLLMAdapter() + assert adapter.validate_config(run_config) is True diff --git a/tests/test_async.py b/tests/test_async.py new file mode 100644 index 0000000..4f2dae3 --- /dev/null +++ b/tests/test_async.py @@ -0,0 +1,101 @@ +""" +Tests for async_execute_prompt (FR-3). 
+""" + +import asyncio +import pytest + +from llm_connect.models import RunConfig, BudgetTracker +from llm_connect.adapter import MockLLMAdapter +from llm_connect.exceptions import LLMBudgetExceededError + + +class TestAsyncExecutePrompt: + def test_default_fallback_returns_response(self): + adapter = MockLLMAdapter(mock_response="async result") + config = RunConfig() + response = asyncio.run(adapter.async_execute_prompt("hello", config)) + assert response.content == "async result" + + def test_gather_multiple_adapters(self): + """asyncio.gather over N adapters completes without errors.""" + adapters = [MockLLMAdapter(mock_response=f"resp-{i}") for i in range(4)] + config = RunConfig() + + async def run(): + return await asyncio.gather(*[ + a.async_execute_prompt("prompt", config) for a in adapters + ]) + + results = asyncio.run(run()) + assert len(results) == 4 + for i, r in enumerate(results): + assert r.content == f"resp-{i}" + + def test_gather_increments_call_counts(self): + adapter = MockLLMAdapter() + config = RunConfig() + + async def run(): + await asyncio.gather(*[ + adapter.async_execute_prompt("p", config) for _ in range(5) + ]) + + asyncio.run(run()) + assert adapter.call_count == 5 + + def test_concurrent_gather_completes(self): + """Gathering N async calls completes without deadlock or error.""" + # Wall-clock timing is deliberately not asserted: it is unreliable on CI. + + adapter = MockLLMAdapter() + config = RunConfig() + + async def run_concurrent(n: int): + await asyncio.gather(*[ + adapter.async_execute_prompt("p", config) for _ in range(n) + ]) + + # Every one of the 10 concurrent calls must reach the adapter. 
+ asyncio.run(run_concurrent(10)) + assert adapter.call_count == 10 + + def test_async_with_budget_tracker(self): + """Async calls record their token spend on the budget tracker.""" + tracker = BudgetTracker(total=10000) + config = RunConfig(budget_tracker=tracker) + adapter = MockLLMAdapter(mock_response="hi") + + asyncio.run(adapter.async_execute_prompt("hello", config)) + assert
tracker.spent > 0 + + def test_async_exhausted_budget_raises(self): + """Exhausted budget raises LLMBudgetExceededError in async context.""" + tracker = BudgetTracker(total=1) + tracker.consume(1) + config = RunConfig(budget_tracker=tracker) + adapter = MockLLMAdapter() + + with pytest.raises(LLMBudgetExceededError): + asyncio.run(adapter.async_execute_prompt("p", config)) + + def test_async_gather_with_shared_budget(self): + """Concurrent async calls can share one tracker, and their spend is recorded on it.""" + tracker = BudgetTracker(total=100000) + config = RunConfig(budget_tracker=tracker) + adapters = [MockLLMAdapter(mock_response="hi") for _ in range(4)] + + async def run(): + await asyncio.gather(*[ + a.async_execute_prompt("hello", config) for a in adapters + ]) + + asyncio.run(run()) + assert tracker.spent > 0 + + def test_returns_llm_response_type(self): + from llm_connect.models import LLMResponse + adapter = MockLLMAdapter() + config = RunConfig() + response = asyncio.run(adapter.async_execute_prompt("q", config)) + assert isinstance(response, LLMResponse) diff --git a/tests/test_budget.py b/tests/test_budget.py new file mode 100644 index 0000000..9458161 --- /dev/null +++ b/tests/test_budget.py @@ -0,0 +1,152 @@ +""" +Tests for BudgetTracker (FR-4) and LLMBudgetExceededError.
+""" + +import threading +import pytest + +from llm_connect.models import BudgetTracker, RunConfig +from llm_connect.adapter import MockLLMAdapter +from llm_connect.exceptions import LLMBudgetExceededError, LLMError + + +class TestBudgetTracker: + def test_initial_state(self): + t = BudgetTracker(total=1000) + assert t.total == 1000 + assert t.spent == 0 + assert t.remaining() == 1000 + + def test_consume_updates_spent(self): + t = BudgetTracker(total=1000) + t.consume(300) + assert t.spent == 300 + assert t.remaining() == 700 + + def test_consume_multiple_times(self): + t = BudgetTracker(total=1000) + t.consume(400) + t.consume(400) + assert t.spent == 800 + assert t.remaining() == 200 + + def test_consume_exact_budget(self): + t = BudgetTracker(total=100) + t.consume(100) + assert t.spent == 100 + assert t.remaining() == 0 + + def test_consume_exceeds_budget_raises(self): + t = BudgetTracker(total=100) + t.consume(60) + with pytest.raises(LLMBudgetExceededError): + t.consume(50) + + def test_exceeded_error_carries_details(self): + t = BudgetTracker(total=100) + t.consume(80) + with pytest.raises(LLMBudgetExceededError) as exc_info: + t.consume(30) + err = exc_info.value + assert err.total == 100 + assert err.spent == 80 + assert err.requested == 30 + + def test_exceeded_error_is_subclass_of_llm_error(self): + with pytest.raises(LLMError): + t = BudgetTracker(total=10) + t.consume(20) + + def test_remaining_never_negative(self): + t = BudgetTracker(total=100) + t.consume(100) + assert t.remaining() == 0 + + def test_invalid_total_raises(self): + with pytest.raises(ValueError): + BudgetTracker(total=0) + with pytest.raises(ValueError): + BudgetTracker(total=-1) + + def test_repr(self): + t = BudgetTracker(total=500) + t.consume(100) + r = repr(t) + assert "500" in r + assert "100" in r + + def test_thread_safety(self): + """Concurrent consume() calls must not corrupt state or allow overspend.""" + total = 1000 + t = BudgetTracker(total=total) + errors = [] + + def 
consume_100(): + try: + t.consume(100) + except LLMBudgetExceededError: + errors.append(1) + + threads = [threading.Thread(target=consume_100) for _ in range(15)] + for th in threads: + th.start() + for th in threads: + th.join() + + # At most 10 consumes of 100 can succeed within a budget of 1000 + assert t.spent <= total + assert len(errors) == 5 # 15 attempts, 10 succeed, 5 fail + + +class TestBudgetEnforcementInAdapter: + def test_single_call_consumes_budget(self): + tracker = BudgetTracker(total=10000) + config = RunConfig(budget_tracker=tracker) + adapter = MockLLMAdapter(mock_response="hello world") + adapter.execute_prompt("test prompt", config) + assert tracker.spent > 0 + + def test_exhausted_budget_raises_before_call(self): + tracker = BudgetTracker(total=1) + tracker.consume(1) # exhaust it + config = RunConfig(budget_tracker=tracker) + adapter = MockLLMAdapter() + with pytest.raises(LLMBudgetExceededError): + adapter.execute_prompt("any prompt", config) + # Adapter should not have been called + assert adapter.call_count == 0 + + def test_delegation_chain_shared_tracker(self): + """A → B → C calls sharing one tracker all draw from the same budget.""" + tracker = BudgetTracker(total=10000) + config = RunConfig(budget_tracker=tracker) + adapter = MockLLMAdapter(mock_response="response") + + adapter.execute_prompt("call A", config) + adapter.execute_prompt("call B", config) + adapter.execute_prompt("call C", config) + + assert adapter.call_count == 3 + assert tracker.spent > 0 + + def test_budget_exceeded_mid_chain(self): + """Chain stops when budget is exhausted between calls.""" + # Each call sends a 100-word prompt and gets a 50-word mock response, + # so each one costs a large share of the 200-token cap and a few calls exhaust it. 
+ adapter = MockLLMAdapter(mock_response="r " * 50) # ~50 completion tokens + tracker = BudgetTracker(total=200) + config = RunConfig(budget_tracker=tracker) + + # First call succeeds + adapter.execute_prompt("p " * 100, config) + # 
Eventually exhausts the budget + with pytest.raises(LLMBudgetExceededError): + for _ in range(10): + adapter.execute_prompt("p " * 100, config) + + def test_no_tracker_has_no_effect(self): + """Adapters work normally when no budget_tracker is set.""" + config = RunConfig() # no budget_tracker + adapter = MockLLMAdapter() + response = adapter.execute_prompt("hello", config) + assert response.content == "Mock LLM response" diff --git a/tests/test_exceptions.py b/tests/test_exceptions.py new file mode 100644 index 0000000..b0a5cd0 --- /dev/null +++ b/tests/test_exceptions.py @@ -0,0 +1,96 @@ +""" +Tests for the LLMError exception hierarchy (Core). +""" + +import pytest +from llm_connect.exceptions import ( + LLMError, + LLMConfigurationError, + LLMAPIError, + LLMRateLimitError, + LLMTimeoutError, + LLMSubprocessError, +) + + +class TestLLMErrorHierarchy: + def test_all_are_subclasses_of_llm_error(self): + assert issubclass(LLMConfigurationError, LLMError) + assert issubclass(LLMAPIError, LLMError) + assert issubclass(LLMRateLimitError, LLMError) + assert issubclass(LLMTimeoutError, LLMError) + assert issubclass(LLMSubprocessError, LLMError) + + def test_rate_limit_is_api_error(self): + assert issubclass(LLMRateLimitError, LLMAPIError) + + def test_all_are_exceptions(self): + assert issubclass(LLMError, Exception) + + +class TestLLMError: + def test_basic_message(self): + err = LLMError("something went wrong") + assert str(err) == "something went wrong" + + def test_context_appears_in_str(self): + err = LLMError("oops", context={"provider": "openai"}) + assert "provider=openai" in str(err) + + def test_cause_is_chained(self): + cause = ValueError("root cause") + err = LLMError("wrapper", cause=cause) + assert err.__cause__ is cause + + def test_empty_context_does_not_appear(self): + err = LLMError("clean message", context={}) + assert str(err) == "clean message" + + +class TestLLMAPIError: + def test_has_status_code(self): + err = LLMAPIError("bad request", 
status_code=400) + assert err.status_code == 400 + + def test_has_response_body(self): + err = LLMAPIError("error", status_code=500, response_body='{"error": "oops"}') + assert err.response_body == '{"error": "oops"}' + + def test_defaults(self): + err = LLMAPIError("minimal") + assert err.status_code == 0 + assert err.response_body == "" + + def test_rate_limit_inherits_status_code(self): + err = LLMRateLimitError("too many", status_code=429) + assert err.status_code == 429 + assert isinstance(err, LLMAPIError) + + +class TestLLMSubprocessError: + def test_has_return_code(self): + err = LLMSubprocessError("cli failed", return_code=1) + assert err.return_code == 1 + + def test_has_stderr(self): + err = LLMSubprocessError("cli failed", stderr="error output") + assert err.stderr == "error output" + + def test_defaults(self): + err = LLMSubprocessError("minimal") + assert err.return_code == 1 + assert err.stderr == "" + + +class TestRaiseAndCatch: + def test_catch_as_llm_error(self): + with pytest.raises(LLMError): + raise LLMConfigurationError("no key") + + def test_catch_api_error_as_llm_error(self): + with pytest.raises(LLMError): + raise LLMAPIError("http error", status_code=502) + + def test_catch_rate_limit_as_api_error(self): + with pytest.raises(LLMAPIError): + raise LLMRateLimitError("429", status_code=429) diff --git a/tests/test_factory.py b/tests/test_factory.py new file mode 100644 index 0000000..af98a46 --- /dev/null +++ b/tests/test_factory.py @@ -0,0 +1,97 @@ +""" +Tests for create_adapter() and create_embedding_adapter() factories. 
+""" + +import pytest +from llm_connect.factory import create_adapter +from llm_connect.embedding_factory import create_embedding_adapter +from llm_connect.exceptions import LLMConfigurationError +from llm_connect.adapter import LLMAdapter +from llm_connect.embedding_adapter import EmbeddingAdapter +from llm_connect.openrouter import OpenRouterAdapter +from llm_connect.claude_code import ClaudeCodeAdapter +from llm_connect.openai import OpenAIAdapter +from llm_connect.gemini import GeminiAdapter +from llm_connect.embedding_openai import OpenAICompatibleEmbeddingAdapter + + +class TestCreateAdapter: + def test_unknown_provider_raises(self): + with pytest.raises(LLMConfigurationError, match="Unknown LLM provider"): + create_adapter("nonexistent-provider") + + def test_unknown_provider_error_lists_known(self): + with pytest.raises(LLMConfigurationError) as exc_info: + create_adapter("bad") + assert "openai" in str(exc_info.value) + assert "gemini" in str(exc_info.value) + + def test_openrouter_returns_adapter(self): + adapter = create_adapter("openrouter", api_key="test-key") + assert isinstance(adapter, OpenRouterAdapter) + assert isinstance(adapter, LLMAdapter) + + def test_openrouter_no_key_still_constructs(self): + # OpenRouterAdapter defers key validation to execute_prompt + adapter = create_adapter("openrouter") + assert isinstance(adapter, OpenRouterAdapter) + + def test_openai_with_key_returns_adapter(self): + adapter = create_adapter("openai", api_key="sk-test-key") + assert isinstance(adapter, OpenAIAdapter) + assert isinstance(adapter, LLMAdapter) + + def test_openai_without_key_raises(self, monkeypatch): + monkeypatch.delenv("OPENAI_API_KEY", raising=False) + with pytest.raises(LLMConfigurationError): + create_adapter("openai") + + def test_gemini_with_key_returns_adapter(self): + adapter = create_adapter("gemini", api_key="aistudio-test-key") + assert isinstance(adapter, GeminiAdapter) + assert isinstance(adapter, LLMAdapter) + + def 
test_gemini_without_key_raises(self, monkeypatch): + monkeypatch.delenv("GEMINI_API_KEY", raising=False) + with pytest.raises(LLMConfigurationError): + create_adapter("gemini") + + def test_claude_code_returns_adapter(self): + adapter = create_adapter("claude-code") + assert isinstance(adapter, ClaudeCodeAdapter) + assert isinstance(adapter, LLMAdapter) + + def test_claude_code_with_model(self): + adapter = create_adapter("claude-code", model="claude-opus-4-6") + assert isinstance(adapter, ClaudeCodeAdapter) + + def test_all_known_providers_are_reachable(self): + known = {"openrouter", "openai", "gemini", "claude-code"} + # Just verify each key is in the factory registry (no construction needed) + from llm_connect.factory import _PROVIDERS + assert known == set(_PROVIDERS.keys()) + + +class TestCreateEmbeddingAdapter: + def test_unknown_provider_raises(self): + with pytest.raises(LLMConfigurationError, match="Unknown embedding provider"): + create_embedding_adapter("nonexistent") + + def test_openai_returns_adapter(self): + adapter = create_embedding_adapter("openai", api_key="sk-test") + assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter) + assert isinstance(adapter, EmbeddingAdapter) + + def test_openrouter_returns_adapter(self): + adapter = create_embedding_adapter("openrouter", api_key="or-test") + assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter) + assert isinstance(adapter, EmbeddingAdapter) + + def test_validate_returns_true_when_key_set(self): + adapter = create_embedding_adapter("openai", api_key="sk-test") + assert adapter.validate() is True + + def test_validate_returns_false_when_no_key(self, monkeypatch): + monkeypatch.delenv("OPENAI_API_KEY", raising=False) + adapter = create_embedding_adapter("openai") + assert adapter.validate() is False diff --git a/tests/test_models.py b/tests/test_models.py new file mode 100644 index 0000000..7f4bc3e --- /dev/null +++ b/tests/test_models.py @@ -0,0 +1,86 @@ +""" +Tests for RunConfig and 
LLMResponse (Core models). +""" + +import pytest +from llm_connect.models import RunConfig, LLMResponse + + +class TestRunConfig: + def test_defaults(self): + cfg = RunConfig() + assert cfg.model_name == "gpt-4" + assert cfg.temperature == 0.7 + assert cfg.max_tokens == 2000 + assert cfg.model_params == {} + assert cfg.max_depth == 3 + assert cfg.skip_if_exists is True + assert cfg.timeout_seconds == 300 + + def test_custom_values(self): + cfg = RunConfig(model_name="gemini-2.5-flash", temperature=0.1, max_tokens=500) + assert cfg.model_name == "gemini-2.5-flash" + assert cfg.temperature == 0.1 + assert cfg.max_tokens == 500 + + def test_to_dict_roundtrip(self): + cfg = RunConfig(model_name="gpt-4o", temperature=0.3, max_tokens=1000) + d = cfg.to_dict() + assert d["model_name"] == "gpt-4o" + assert d["temperature"] == 0.3 + assert d["max_tokens"] == 1000 + + def test_from_dict_roundtrip(self): + original = RunConfig(model_name="claude-3", temperature=0.5, max_tokens=800) + restored = RunConfig.from_dict(original.to_dict()) + assert restored.model_name == original.model_name + assert restored.temperature == original.temperature + assert restored.max_tokens == original.max_tokens + + def test_from_dict_uses_defaults_for_missing_keys(self): + cfg = RunConfig.from_dict({}) + assert cfg.model_name == "gpt-4" + assert cfg.temperature == 0.7 + + def test_model_params_default_is_independent(self): + a = RunConfig() + b = RunConfig() + a.model_params["x"] = 1 + assert "x" not in b.model_params + + +class TestLLMResponse: + def test_minimal_construction(self): + r = LLMResponse(content="hello", model="test-model") + assert r.content == "hello" + assert r.model == "test-model" + assert r.usage == {} + assert r.finish_reason == "stop" + assert r.metadata == {} + + def test_full_construction(self): + r = LLMResponse( + content="response text", + model="gpt-4", + usage={"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}, + finish_reason="length", + 
metadata={"provider": "openai", "latency_seconds": 1.2}, + ) + assert r.usage["total_tokens"] == 15 + assert r.finish_reason == "length" + assert r.metadata["provider"] == "openai" + + def test_to_dict(self): + r = LLMResponse(content="hi", model="m", finish_reason="stop") + d = r.to_dict() + assert d["content"] == "hi" + assert d["model"] == "m" + assert d["finish_reason"] == "stop" + assert "usage" in d + assert "metadata" in d + + def test_metadata_default_is_independent(self): + a = LLMResponse(content="a", model="m") + b = LLMResponse(content="b", model="m") + a.metadata["x"] = 1 + assert "x" not in b.metadata diff --git a/workplans/llm-connect-WP-0001-foundation-gaaf-baseline.md b/workplans/llm-connect-WP-0001-foundation-gaaf-baseline.md new file mode 100644 index 0000000..ad9132f --- /dev/null +++ b/workplans/llm-connect-WP-0001-foundation-gaaf-baseline.md @@ -0,0 +1,36 @@ +# LLM-WP-0001 — Foundation & GAAF Baseline + +**status:** active +**owner:** llm-connect +**repo:** llm-connect +**created:** 2026-04-01 + +## Purpose + +Establish the structural foundation required before any Core modifications. +Covers repo orientation files, GAAF-2026 compliance artifacts, test suite, CI, +and state-hub housekeeping. 
+ +## Tasks + +| ID | Title | Priority | Status | +|-----|-------|----------|--------| +| T01 | Create `SCOPE.md` | high | done | +| T02 | Fill `.claude/rules/` stubs: `architecture.md`, `stack-and-commands.md`, `repo-boundary.md` | high | done | +| T03 | Create `ARCHITECTURE-LAYERS.md` with layer map, scorecard stub, next-review date | high | done | +| T04 | Create `/contracts/` tree (`core/`, `functional/`, `config/`) | high | done | +| T05 | Core contract doc: `LLMAdapter` interface invariants, `RunConfig`/`LLMResponse` field contracts | high | done | +| T06 | Functional contract stubs for all 4 adapters + embedding adapters (maturity: Beta) | medium | done | +| T07 | Create `tests/` with `conftest.py`, wire pytest in `pyproject.toml` | high | done | +| T08 | Unit tests: `RunConfig`, `LLMResponse`, `MockLLMAdapter`, full exception hierarchy | high | done | +| T09 | Unit tests: `create_adapter` (all providers + unknown provider error), `create_embedding_adapter` | high | done | +| T10 | Add `ruff`, `mypy` to dev deps in `pyproject.toml` | medium | done | +| T11 | CI workflow: pytest + ruff + mypy | medium | done | +| T12 | State hub: register this host path, SBOM refresh | low | done | + +## Exit criteria + +- `ARCHITECTURE-LAYERS.md` and `/contracts/core/` exist and describe the current Core surface +- pytest passes with coverage of Core and factory +- ruff + mypy clean +- CI green on push diff --git a/workplans/llm-connect-WP-0002-core-extensions.md b/workplans/llm-connect-WP-0002-core-extensions.md new file mode 100644 index 0000000..150abf8 --- /dev/null +++ b/workplans/llm-connect-WP-0002-core-extensions.md @@ -0,0 +1,57 @@ +# LLM-WP-0002 — Core Extensions (FR-4 + FR-3) + +**status:** active +**owner:** llm-connect +**repo:** llm-connect +**created:** 2026-04-01 +**depends-on:** LLM-WP-0001 (contracts and tests must exist before Core is modified) + +## Purpose + +Implement the two IHF feature requests that touch the Core layer. 
+FR-4 (BudgetTracker) is additive and non-breaking. FR-3 (async) extends +the Core ABC with a default executor fallback — non-breaking, overridable +per adapter for native async. + +Origin: IHUB-WP-0012 Phase 11 — Advanced AI Federation (completed 2026-04-01). + +## GAAF notes + +Both changes are Core-layer modifications under GAAF-2026: +- FR-4: new primitive (`BudgetTracker`) + new exception (`LLMBudgetExceededError`) + added as optional `RunConfig` field — additive, non-breaking. +- FR-3: `async_execute_prompt` added to `LLMAdapter` ABC with a default + `asyncio.get_running_loop().run_in_executor(None, ...)` fallback so existing + adapters remain valid; native async overrides are provided per adapter. + +Core contract doc (from WP-0001 T05) must be updated after each change. + +## Tasks + +### FR-4 — BudgetTracker + +| ID | Title | Priority | Status | +|-----|-------|----------|--------| +| T01 | `BudgetTracker` dataclass: `total`, `spent`, `remaining()`, thread-safe increment | high | todo | +| T02 | `LLMBudgetExceededError(LLMError)` in `exceptions.py` | high | todo | +| T03 | Optional `budget_tracker: BudgetTracker \| None` field on `RunConfig` | high | todo | +| T04 | Enforcement: each adapter checks/updates tracker around call; raises on exceeded | high | todo | +| T05 | Update Core contract doc | medium | todo | +| T06 | Tests: single call, delegation chain (A→B→C shared tracker), exceeded error, multi-adapter | high | todo | + +### FR-3 — async_execute_prompt + +| ID | Title | Priority | Status | +|-----|-------|----------|--------| +| T07 | Add `async_execute_prompt` to `LLMAdapter` ABC with default executor fallback | high | todo | +| T08 | Native async override in `OpenAIAdapter`, `GeminiAdapter`, `OpenRouterAdapter` | high | todo | +| T09 | Native async for `ClaudeCodeAdapter` via `asyncio.create_subprocess_exec` | high | todo | +| T10 | Update Core contract doc | medium | todo | +| T11 | Tests: `asyncio.gather` over N adapters, timeout propagation, 
budget interaction | high | todo | + +## Exit criteria + +- `BudgetTracker` enforces caps across a delegation chain of 3 adapters in tests +- `asyncio.gather` over 4 mock adapters completes without errors +- All existing tests still pass (non-breaking validation) +- Core contract doc reflects both additions diff --git a/workplans/llm-connect-WP-0003-functional-extensions.md b/workplans/llm-connect-WP-0003-functional-extensions.md new file mode 100644 index 0000000..34d1d4b --- /dev/null +++ b/workplans/llm-connect-WP-0003-functional-extensions.md @@ -0,0 +1,51 @@ +# LLM-WP-0003 — Functional Extensions (FR-2 + FR-1) + +**status:** active +**owner:** llm-connect +**repo:** llm-connect +**created:** 2026-04-01 +**depends-on:** LLM-WP-0001 (test infrastructure must exist) + +## Purpose + +Implement the two IHF feature requests that add new Functional-layer modules. +Neither touches Core. Both can be developed independently of WP-0002. + +Origin: IHUB-WP-0012 Phase 11 — Advanced AI Federation (completed 2026-04-01). + +## GAAF notes + +Both additions are Functional-layer under GAAF-2026: +- Demand signal is explicit: IHF (inter-hub) is the primary consumer for both. +- Each gets its own functional contract doc in `/contracts/functional/`. +- Maturity on release: Beta (single known consumer, interface not yet stabilised). 
+ +## Tasks + +### FR-2 — RoutingPolicy + +| ID | Title | Priority | Status | +|-----|-------|----------|--------| +| T01 | `RoutingPolicy` data model: `rules` list with `task_type`, `prefer`, `max_cost_per_1k`, `fallback` | high | todo | +| T02 | `policy.resolve(task_type)` → returns configured `LLMAdapter` | high | todo | +| T03 | Export from `llm_connect.__init__` and update `__all__` | medium | todo | +| T04 | Functional contract doc for `RoutingPolicy` | medium | todo | +| T05 | Tests: rule match, cost-cap fallback, unknown task_type fallback, no-match default | high | todo | + +### FR-1 — HTTP serve mode + +| ID | Title | Priority | Status | +|-----|-------|----------|--------| +| T06 | Design `/execute` JSON schema (request: provider, model, prompt, config; response: LLMResponse fields) | high | todo | +| T07 | Implement `llm_connect/server.py` — minimal HTTP server, `POST /execute`, `GET /health` | high | todo | +| T08 | `python -m llm_connect.server --port N --provider X --model Y` CLI entry point | high | todo | +| T09 | Add `httpx` or `aiohttp` server dep under `[project.optional-dependencies] server` | medium | todo | +| T10 | Functional contract doc (API schema — request/response shapes, error codes) | medium | todo | +| T11 | Tests: spin up server in subprocess or via `TestClient`, POST round-trip (MockAdapter), error responses | high | todo | + +## Exit criteria + +- `RoutingPolicy.resolve("triage")` returns the correct adapter per rules in tests +- `python -m llm_connect.server --port 9999` starts and responds to `POST /execute` +- `GET /health` returns 200 +- All functional contract docs present in `/contracts/functional/`