feat: WP-0001 foundation + WP-0002 core extensions

WP-0001: Foundation & GAAF Baseline
- SCOPE.md, ARCHITECTURE-LAYERS.md, contracts/ tree
- .claude/rules/ stubs filled (architecture, stack, boundary)
- 57 tests (pytest), pyproject.toml with ruff+mypy, CI workflow

WP-0002: Core Extensions (FR-4 + FR-3)
- FR-4: BudgetTracker (thread-safe) + LLMBudgetExceededError + optional RunConfig.budget_tracker + enforcement in all adapters
- FR-3: async_execute_prompt on LLMAdapter ABC (asyncio.to_thread fallback) + native asyncio.create_subprocess_exec in ClaudeCodeAdapter

81 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 22:24:14 +00:00
parent 57b346bb8b
commit d71f4114d1
28 changed files with 1601 additions and 26 deletions
--- a/.claude/rules/architecture.md
+++ b/.claude/rules/architecture.md
@@ -1,8 +1,58 @@
 ## Architecture

-<!-- TODO: Describe the key design decisions and component structure.
-     Key modules, data flows, external integrations, state machines, etc. -->
+llm-connect is structured as a **GAAF-2026 layered library**. See
+`ARCHITECTURE-LAYERS.md` for the full layer map and scorecard.

-## Quick Reference
+### Layer summary

-`~/the-custodian/state-hub/mcp_server/TOOLS.md` — MCP tool reference
+```
+Core (frozen after v1)
+  LLMAdapter ABC          adapter.py
+  RunConfig / LLMResponse models.py
+  LLMError hierarchy      exceptions.py
+  MockLLMAdapter          adapter.py   ← test primitive, belongs with Core
+
+Functional (evolvable, independently shippable)
+  OpenAIAdapter           openai.py
+  GeminiAdapter           gemini.py
+  OpenRouterAdapter       openrouter.py
+  ClaudeCodeAdapter       claude_code.py
+  EmbeddingAdapter ABC    embedding_adapter.py
+  OpenAICompatibleEmbeddingAdapter  embedding_openai.py
+  EmbeddingCache          embedding_cache.py
+  create_adapter()        factory.py
+  create_embedding_adapter()  embedding_factory.py
+  _token_estimator        _token_estimator.py
+  similarity utilities    similarity.py
+
+Configuration (user-controlled declarative state)
+  resolve_llm() chain     toml_config.py   ← 7-level TOML priority chain
+  LLMConfig / load_config config.py
+  _http shared utility    _http.py         ← also used by Functional adapters
+```
+
+### Dependency rule
+
+Core ← Functional ← Configuration  
+No upward dependencies. `_http.py` is consumed by Functional only.
+
+### Key design decisions
+
+**API key resolution** (`config.resolve_api_key`): three-step chain —
+explicit argument → environment variable → plaintext key file in project root.
+Adapters raise `LLMConfigurationError` at construction time if no key is found
+(except `ClaudeCodeAdapter` which needs no key).
+
+**TOML config chain** (`toml_config.resolve_llm`): 7 priority levels allow
+per-project and per-user LLM preferences. `app_name` currently defaults to
+`markitect` for backward compatibility; consumers should pass their own.
+
+**Factory pattern** (`factory.create_adapter`): lazy imports prevent pulling
+all provider SDKs at module load. Add a new provider by registering its FQN
+in `_PROVIDERS`.
+
+**ClaudeCodeAdapter subprocess model**: prompt is piped via stdin (not CLI
+arg) to avoid shell argument length limits on large prompts (>30 KB).
+
+**Retry logic**: `OpenAIAdapter` and `OpenRouterAdapter` retry on 429 and 5xx
+with exponential backoff. `GeminiAdapter` does not (rate-limit handling deferred).
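The retry behaviour described above can be illustrated with a small standalone sketch. It is not the adapters' actual code: `TransientError`, `is_retryable` and `call_with_backoff` are hypothetical names, and it assumes that "3 retries" means three further attempts after the first failure, with a delay of `2 ** attempt` seconds.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for an HTTP failure carrying a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(status_code: int) -> bool:
    """Retry only on rate limiting (429) and server errors (5xx)."""
    return status_code == 429 or 500 <= status_code < 600


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``request``; on a retryable failure sleep 2 ** attempt seconds and retry."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except TransientError as exc:
            # Give up on the last attempt or on a non-retryable status.
            if attempt == max_retries or not is_retryable(exc.status_code):
                raise
            sleep(2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is a design choice of this sketch only; it lets a test record the delays instead of actually waiting.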
--- a/.claude/rules/repo-boundary.md
+++ b/.claude/rules/repo-boundary.md
@@ -1,8 +1,17 @@
 ## Repo boundary

-This repo owns **{PROJECT_NAME}** only. It does not own:
+This repo owns **llm-connect** — the multi-provider LLM client library — only.

-<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code     → the-custodian/state-hub/
-->
+It does NOT own:
+
+- **API key storage / secret management** → caller's environment (env vars,
+  key files, vault). llm-connect resolves keys but does not store them.
+- **Consumer routing logic** → `inter-hub/AgentBridge.hs`, `markitect` etc.
+  `RoutingPolicy` (WP-0003) provides primitives; policy data belongs in the consumer.
+- **The Claude Code CLI binary** → installed separately; `ClaudeCodeAdapter`
+  shells out to it.
+- **markitect application code** → `markitect.llm` is a shim that re-exports
+  from here; all implementation lives in this repo.
+- **State hub / custodian infrastructure** → `the-custodian/state-hub/`
+- **IHF bridge scripts** → `inter-hub/scripts/llm_bridge.py` lives in inter-hub,
+  not here. llm-connect is a dependency of that script.
--- a/.claude/rules/stack-and-commands.md
+++ b/.claude/rules/stack-and-commands.md
@@ -1,19 +1,59 @@
 ## Stack

-<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **Language:**
- **Key deps:**
+- **Language:** Python 3.10+
+- **Key deps (runtime):** `toml` (TOML config parsing)
+- **Key deps (dev):** `pytest`, `ruff`, `mypy`
+- **HTTP:** stdlib `urllib` via `_http.py` (no requests/httpx runtime dep)
+- **Build:** setuptools / uv

 ## Dev Commands

 ```bash
-# TODO: Fill in the standard commands for this repo
-
-# Install dependencies
+# Install (editable, with dev extras)
+uv pip install -e ".[dev]"
+# or
+pip install -e ".[dev]"

 # Run tests
+uv run pytest
+# or
+pytest

-# Lint / type check
+# Lint
+uv run ruff check .

-# Build / package (if applicable)
+# Type check
+uv run mypy llm_connect
+
+# Run a single test file
+uv run pytest tests/test_models.py -v
+
+# Build package (ignoring tool.uv.sources)
+uv build --no-sources
+```
+
+## Project layout
+
+```
+llm_connect/         source package
+  adapter.py         LLMAdapter ABC + Mock/ErrorLLMAdapter
+  models.py          RunConfig, LLMResponse
+  exceptions.py      LLMError hierarchy
+  factory.py         create_adapter()
+  openai.py          OpenAIAdapter
+  gemini.py          GeminiAdapter
+  openrouter.py      OpenRouterAdapter
+  claude_code.py     ClaudeCodeAdapter
+  embedding_adapter.py  EmbeddingAdapter ABC
+  embedding_openai.py   OpenAICompatibleEmbeddingAdapter
+  embedding_cache.py    EmbeddingCache
+  embedding_factory.py  create_embedding_adapter()
+  toml_config.py     7-level TOML config resolution
+  config.py          LLMConfig, resolve_api_key, find_project_root
+  _http.py           shared HTTP POST utility
+  _token_estimator.py  rough token count estimate
+  similarity.py      cosine similarity utilities
+tests/               pytest test suite
+contracts/           GAAF-2026 contract docs
+workplans/           workplan files (LLM-WP-NNNN)
 ```
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,37 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11", "3.12"]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v3
+
+      - name: Install dependencies
+        run: uv pip install --system -e ".[dev]"
+
+      - name: Lint (ruff)
+        run: ruff check .
+
+      - name: Type check (mypy)
+        run: mypy llm_connect
+
+      - name: Test (pytest)
+        run: pytest
--- a/ARCHITECTURE-LAYERS.md
+++ b/ARCHITECTURE-LAYERS.md
@@ -0,0 +1,94 @@
+# ARCHITECTURE-LAYERS.md
+
+**Framework:** GAAF-2026  
+**Last reviewed:** 2026-04-01  
+**Repository purpose:** Multi-provider LLM client library — unified adapter interface for Python  
+**Next review:** 2026-07-01
+
+---
+
+## Layer Map
+
+### Core (high rigidity — frozen after v1)
+
+Domain-agnostic primitives. Must not change without a major version bump once stable.
+
+| Module | Contents |
+|--------|----------|
+| `adapter.py` | `LLMAdapter` ABC (`execute_prompt`, `validate_config`); `MockLLMAdapter`; `ErrorLLMAdapter` |
+| `models.py` | `RunConfig`, `LLMResponse` dataclasses |
+| `exceptions.py` | `LLMError` → `LLMConfigurationError`, `LLMAPIError`, `LLMRateLimitError`, `LLMTimeoutError`, `LLMSubprocessError` |
+
+**Contract:** `contracts/core/llm-adapter.md`
+
+### Functional (medium rigidity — evolvable, versioned)
+
+Value-realization modules. Each adapter is independently shippable.
+Maturity states: **Experimental → Beta → Stable → Deprecated**
+
+| Module | Contents | Maturity |
+|--------|----------|----------|
+| `openai.py` | `OpenAIAdapter` — OpenAI chat completions | Beta |
+| `gemini.py` | `GeminiAdapter` — Google Generative Language API | Beta |
+| `openrouter.py` | `OpenRouterAdapter` — OpenAI-compatible multi-model routing | Beta |
+| `claude_code.py` | `ClaudeCodeAdapter` — `claude --print` subprocess | Beta |
+| `embedding_adapter.py` | `EmbeddingAdapter` ABC | Beta |
+| `embedding_openai.py` | `OpenAICompatibleEmbeddingAdapter` | Beta |
+| `embedding_cache.py` | `EmbeddingCache` — disk-backed embedding cache | Beta |
+| `embedding_factory.py` | `create_embedding_adapter()` factory | Beta |
+| `factory.py` | `create_adapter()` factory — lazy provider registration | Beta |
+| `_token_estimator.py` | Rough token count estimation (word-based) | Beta |
+| `similarity.py` | `cosine_similarity`, `similarity_matrix`, `find_similar_pairs` | Beta |
+
+**Planned additions (WP-0003):** `RoutingPolicy`, `server.py`  
+**Contracts:** `contracts/functional/`
+
+### Configuration (very low rigidity — user-controlled declarative state)
+
+| Module | Contents |
+|--------|----------|
+| `toml_config.py` | `resolve_llm()` — 7-level TOML priority chain; `ResolvedLLM`; `LLMLayer` |
+| `config.py` | `LLMConfig` dataclass; `resolve_api_key()`; `find_project_root()`; `load_config()` |
+| `_http.py` | Shared HTTP POST utility (used by Functional adapters) |
+
+**Contracts:** `contracts/config/`
+
+---
+
+## Dependency Rule
+
+```
+Core  ←  Functional  ←  Configuration
+```
+
+Upward dependencies (Core → Functional, Functional → Configuration) are **prohibited**.  
+`_http.py` sits in the Configuration layer but is consumed only by Functional adapters — acceptable as a shared utility with no upward reach.
+
+---
+
+## Decisions Log
+
+| Date | Decision | Rationale |
+|------|----------|-----------|
+| 2026-04-01 | FR-3 async: default executor fallback on ABC rather than abstract method | Non-breaking; existing adapters remain valid; native async opt-in per adapter |
+| 2026-04-01 | FR-4 BudgetTracker: optional field on RunConfig, not a separate context object | Keeps RunConfig as single call config; avoids thread-local / contextvar complexity |
+| 2026-04-01 | FR-1 HTTP server: optional dep `[server]`, not runtime dep | Keeps base install lightweight; most consumers call the library directly |
+
+---
+
+## GAAF-2026 Scorecard (initial baseline — 2026-04-01)
+
+> Scoring: 0 = absent / harmful · 5 = excellent
+
+| Dimension | Score | Notes |
+|-----------|-------|-------|
+| **Core** | 2.5 | ABC and models well-defined; no formal contracts, no tests, no invariant docs yet |
+| **Functional** | 2.5 | Adapters isolated and independently usable; no maturity labels enforced, no tests |
+| **Customization** | n/a | Not applicable (library, not SaaS) |
+| **Configuration** | 2.0 | TOML chain works; no schema validation; `markitect` name coupling in toml_config defaults |
+| **Extensions** | n/a | Not applicable yet (RoutingPolicy + server in WP-0003) |
+| **Cross-layer** | 2.0 | Dependency direction correct; no CI fitness functions; no import graph checks |
+| **Weighted total** | ~2.3 | Usable but vulnerable — WP-0001 targets ≥ 3.5 |
+
+**Target after WP-0001:** ≥ 3.5 (Strong)  
+**Target after WP-0002 + WP-0003:** ≥ 4.0 (Strong / Exemplary)
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -0,0 +1,45 @@
+# SCOPE.md — llm-connect
+
+## Purpose
+
+`llm-connect` is a **multi-provider LLM client library for Python**.
+It provides a unified adapter interface over OpenAI, Gemini, OpenRouter,
+and the Claude Code CLI, with embedding support, token estimation, and a
+TOML-based configuration chain.
+
+Extracted from [markitect](https://github.com/worsch/markitect).
+The `markitect.llm` module remains a re-export shim pointing here.
+
+## This repo owns
+
+- `LLMAdapter` ABC and `RunConfig` / `LLMResponse` data models (Core)
+- All concrete provider adapters: `OpenAIAdapter`, `GeminiAdapter`,
+  `OpenRouterAdapter`, `ClaudeCodeAdapter` (Functional)
+- Embedding adapters: `EmbeddingAdapter` ABC, `OpenAICompatibleEmbeddingAdapter`,
+  `EmbeddingCache`, `create_embedding_adapter` factory (Functional)
+- TOML-based config resolution (`toml_config.py`, `config.py`) (Configuration)
+- Shared HTTP utility (`_http.py`), token estimator (`_token_estimator.py`),
+  cosine similarity utilities (`similarity.py`)
+- The full `LLMError` exception hierarchy
+
+## This repo does NOT own
+
+- Consumer application logic — that lives in `markitect`, `inter-hub`, etc.
+- API key management infrastructure — keys are resolved from env vars or
+  plaintext key files; secret storage belongs in the calling environment
+- Model routing decisions specific to a consumer — `RoutingPolicy` (WP-0003)
+  provides primitives; policy configuration belongs in the consumer
+- The Claude Code CLI binary itself — `ClaudeCodeAdapter` shells out to `claude`
+
+## Consumers (as of 2026-04-01)
+
+| Consumer | How it uses llm-connect |
+|----------|------------------------|
+| `markitect` | Re-exports via `markitect.llm` shim; drives document generation |
+| `inter-hub` (IHF) | Subprocess bridge (`scripts/llm_bridge.py` + `AgentBridge.hs`) for multi-agent federation |
+
+## Versioning
+
+- Current version: **0.1.0** (pre-release; API not yet stable)
+- Core layer (`LLMAdapter`, `RunConfig`, `LLMResponse`) will be stabilised at **v1.0.0**
+- Breaking Core changes require a major version bump
--- a/contracts/config/toml-chain.md
+++ b/contracts/config/toml-chain.md
@@ -0,0 +1,80 @@
+# Contract: Configuration — TOML Config Chain
+
+**Layer:** Configuration  
+**Version:** 0.1.0  
+**Last updated:** 2026-04-01
+
+---
+
+## resolve_llm()
+
+`llm_connect.toml_config.resolve_llm(cli_provider, cli_model, app_name)`
+
+Walks a 7-level priority chain to resolve provider and model independently.
+Returns `ResolvedLLM(provider, model, provider_source, model_source)`.
+
+### Priority chain (highest → lowest)
+
+| Level | Source |
+|-------|--------|
+| 1 | CLI flags (`cli_provider`, `cli_model`) |
+| 2 | Env var `{APP_NAME}_HELPER_MODEL` (model only) |
+| 3 | User preference — `~/.config/{app_name}/config.toml` `[llm.preference]` |
+| 4 | Directory preference — `.{app_name}.toml` `[llm.preference]` |
+| 5 | Directory default — `.{app_name}.toml` `[llm.default]` |
+| 6 | User default — `~/.config/{app_name}/config.toml` `[llm.default]` |
+| 7 | Hardcoded fallback — `gemini / gemini-2.5-flash` |
+
+### Invariants
+
+- Always returns a fully-resolved `ResolvedLLM` (never raises, never returns None).
+- Provider and model are resolved independently — a preference for model does
+  not imply a preference for provider.
+- TOML parse errors are silently ignored (returns empty layer).
+- `app_name` defaults to `"markitect"` for backward compatibility; consumers
+  should pass their own app name.
+
+### Known issue
+
+`toml_config.py` has `markitect`-specific defaults (`MARKITECT_HELPER_MODEL`,
+`USER_CONFIG_DIR`). These are kept for backward compatibility but callers
+outside markitect should always pass an explicit `app_name`.
+
+---
+
+## resolve_api_key()
+
+`llm_connect.config.resolve_api_key(explicit, env_var, key_file_paths)`
+
+Resolution order:
+1. `explicit` argument
+2. Environment variable `env_var`
+3. First readable file in `key_file_paths` with non-empty content
+
+Returns `None` if nothing is found. Never raises.
+
+---
+
+## find_project_root()
+
+Walks up from CWD looking for `pyproject.toml`. Returns the containing directory
+or `None`. Used by adapters to locate key files.
+
+---
+
+## LLMConfig
+
+`llm_connect.config.LLMConfig`
+
+Dataclass holding per-adapter configuration. Used directly by `OpenRouterAdapter`
+and `ClaudeCodeAdapter`. Not required by the Core `LLMAdapter` ABC.
+
+| Field | Default |
+|-------|---------|
+| `provider` | `"openrouter"` |
+| `model` | `"anthropic/claude-sonnet-4"` |
+| `api_key` | `None` |
+| `api_base` | `"https://openrouter.ai/api/v1"` |
+| `claude_cli_path` | `"claude"` |
+| `timeout_seconds` | `300` |
+| `max_retries` | `3` |
--- a/contracts/core/llm-adapter.md
+++ b/contracts/core/llm-adapter.md
@@ -0,0 +1,122 @@
+# Contract: Core — LLMAdapter Interface
+
+**Layer:** Core  
+**Version:** 0.1.0  
+**Status:** Draft (stabilises at v1.0.0)  
+**Last updated:** 2026-04-01
+
+---
+
+## LLMAdapter ABC
+
+`llm_connect.adapter.LLMAdapter`
+
+### Interface
+
+```python
+class LLMAdapter(ABC):
+    @abstractmethod
+    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
+
+    @abstractmethod
+    def validate_config(self, config: RunConfig) -> bool: ...
+```
+
+**Optional method (added in WP-0002 T07):**
+```python
+    async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        # Default: runs execute_prompt in a thread executor
+        ...
+```
+
+### Invariants
+
+1. `execute_prompt` MUST return an `LLMResponse` with a non-empty `content` field on success.
+2. `execute_prompt` MUST raise a subclass of `LLMError` on any failure — never a bare exception.
+3. `validate_config` MUST be side-effect-free and return `bool` only.
+4. `validate_config` returning `False` does not preclude calling `execute_prompt` — it is advisory.
+5. Adapters MUST NOT mutate the `config` argument.
+6. `execute_prompt` is allowed to be slow (network I/O) but MUST respect `config.timeout_seconds`.
+
+### Failure modes
+
+| Condition | Exception |
+|-----------|-----------|
+| Missing / invalid API key | `LLMConfigurationError` |
+| HTTP 4xx (non-429) | `LLMAPIError` (with `.status_code`) |
+| HTTP 429 | `LLMRateLimitError` |
+| Request timeout | `LLMTimeoutError` |
+| CLI subprocess failure | `LLMSubprocessError` (with `.return_code`, `.stderr`) |
+| Token budget exceeded (WP-0002) | `LLMBudgetExceededError` |
+
+### Compatibility rules
+
+- Any code that accepts `LLMAdapter` MUST work with `MockLLMAdapter`.
+- Adding new optional methods to the ABC is non-breaking (default implementations provided).
+- Removing or changing the signature of `execute_prompt` or `validate_config` is a **breaking Core change** requiring a major version bump.
+
+---
+
+## RunConfig
+
+`llm_connect.models.RunConfig`
+
+### Fields and invariants
+
+| Field | Type | Default | Invariant |
+|-------|------|---------|-----------|
+| `model_name` | `str` | `"gpt-4"` | Non-empty string; adapters MAY override |
+| `temperature` | `float` | `0.7` | 0.0 ≤ temperature ≤ 2.0 |
+| `max_tokens` | `int` | `2000` | > 0 |
+| `model_params` | `dict` | `{}` | Provider-specific pass-through; no invariants |
+| `max_depth` | `int` | `3` | ≥ 0 |
+| `skip_if_exists` | `bool` | `True` | — |
+| `timeout_seconds` | `int` | `300` | > 0 |
+| `budget_tracker` | `BudgetTracker \| None` | `None` | Optional; added in WP-0002 |
+
+Adapters MUST NOT mutate `RunConfig` fields.
+
+---
+
+## LLMResponse
+
+`llm_connect.models.LLMResponse`
+
+### Fields and invariants
+
+| Field | Type | Invariant |
+|-------|------|-----------|
+| `content` | `str` | Non-empty on success; may be empty only if provider returned empty output |
+| `model` | `str` | Non-empty; the model actually used (may differ from `RunConfig.model_name`) |
+| `usage` | `dict` | Keys: `prompt_tokens`, `completion_tokens`, `total_tokens` (all int ≥ 0) |
+| `finish_reason` | `str` | Provider-reported; `"stop"` is the normal value |
+| `metadata` | `dict` | Arbitrary; always includes `"provider"` key |
+
+---
+
+## LLMError Hierarchy
+
+```
+LLMError
+├── LLMConfigurationError   bad key / unknown provider
+├── LLMAPIError             HTTP error (has .status_code, .response_body)
+│   └── LLMRateLimitError   429
+├── LLMTimeoutError         request or subprocess timed out
+├── LLMSubprocessError      CLI failed (has .return_code, .stderr)
+└── LLMBudgetExceededError  token budget cap exceeded (WP-0002)
+```
+
+All exceptions carry optional `cause` (chained exception) and `context` (dict).
+
+---
+
+## Mock adapters
+
+`MockLLMAdapter` and `ErrorLLMAdapter` are part of Core — they are test
+primitives that any consumer may depend on without importing dev extras.
+
+`MockLLMAdapter` invariants:
+- Returns deterministic response without network I/O
+- Increments `call_count` on each call
+- Records `last_prompt` and `last_config`
+- `reset()` clears all counters and recorded state
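The non-breaking async addition in this contract relies on a default method that delegates to the synchronous one. A minimal sketch of that pattern follows; `SketchAdapter`, `SlowEchoAdapter` and `run_concurrently` are hypothetical names, and the signatures are simplified to plain strings instead of `RunConfig` and `LLMResponse`.

```python
import asyncio
import time
from abc import ABC, abstractmethod


class SketchAdapter(ABC):
    """Simplified stand-in for an adapter ABC with a sync-only contract."""

    @abstractmethod
    def execute_prompt(self, prompt: str) -> str: ...

    async def async_execute_prompt(self, prompt: str) -> str:
        # Default: run the blocking call in a worker thread so the
        # event loop stays responsive. Subclasses may override this.
        return await asyncio.to_thread(self.execute_prompt, prompt)


class SlowEchoAdapter(SketchAdapter):
    """Only implements the sync method; inherits the async fallback."""

    def execute_prompt(self, prompt: str) -> str:
        time.sleep(0.05)  # simulate blocking network I/O
        return f"echo: {prompt}"


async def run_concurrently(prompts: list[str]) -> list[str]:
    """Issue several prompts at once through the inherited async method."""
    adapter = SlowEchoAdapter()
    return await asyncio.gather(
        *(adapter.async_execute_prompt(p) for p in prompts)
    )
```

Because the default is a concrete method, a subclass written before the method existed still satisfies the ABC, which is the compatibility property the contract depends on.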
--- a/contracts/functional/adapters.md
+++ b/contracts/functional/adapters.md
@@ -0,0 +1,94 @@
+# Contract: Functional — Provider Adapters
+
+**Layer:** Functional  
+**Version:** 0.1.0  
+**Maturity:** Beta (all adapters)  
+**Last updated:** 2026-04-01
+
+---
+
+## Common adapter contract
+
+All provider adapters implement `LLMAdapter` (see `contracts/core/llm-adapter.md`).
+
+Additional shared guarantees:
+
+- Constructors resolve API keys at instantiation and raise `LLMConfigurationError`
+  immediately if no key is found (fail-fast); `ClaudeCodeAdapter` needs no key.
+- HTTP-based adapters (`OpenAIAdapter`, `GeminiAdapter`, `OpenRouterAdapter`)
+  use `_http.post_json` and do not add runtime dependencies beyond stdlib.
+- `metadata` in the returned `LLMResponse` always contains `"provider"` and
+  `"latency_seconds"` keys.
+- HTTP adapters that retry (`OpenAIAdapter`, `OpenRouterAdapter`) use
+  exponential backoff: `sleep(2 ** attempt)` on 429 and 5xx.
+
+---
+
+## OpenAIAdapter
+
+**Provider key:** `"openai"`  
+**Default model:** `gpt-4.1-mini`  
+**API:** `https://api.openai.com/v1/chat/completions`  
+**Auth:** `OPENAI_API_KEY` env var or `apikey-chatgpt.txt` in project root  
+**Retries:** 3 (exponential backoff on 429 and 5xx)
+
+---
+
+## GeminiAdapter
+
+**Provider key:** `"gemini"`  
+**Default model:** `gemini-2.5-flash`  
+**API:** `https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent`  
+**Auth:** `GEMINI_API_KEY` env var or `apikey-geminifree.txt` in project root  
+**Retries:** 0 (no retry logic; rate-limit handling deferred)  
+**Note:** System prompt is simulated via a user/model turn pair (Gemini has no native system role).
+
+---
+
+## OpenRouterAdapter
+
+**Provider key:** `"openrouter"`  
+**Default model:** `anthropic/claude-sonnet-4`  
+**API:** `https://openrouter.ai/api/v1/chat/completions` (configurable via `LLMConfig.api_base`)  
+**Auth:** `OPENROUTER_API_KEY` env var or `apikey-openrouter.txt` in project root  
+**Retries:** 3 (exponential backoff on 429 and 5xx)  
+**Note:** OpenRouter is an OpenAI-compatible endpoint; `RunConfig.model_params` are merged into the payload.
+
+---
+
+## ClaudeCodeAdapter
+
+**Provider key:** `"claude-code"`  
+**Default model:** n/a (uses the CLI's configured default)  
+**Auth:** none (delegates to locally installed `claude` CLI)  
+**Subprocess:** `claude --print [--model M]` with prompt on stdin  
+**Token counts:** estimated via `_token_estimator` (not provider-reported)  
+**validate_config:** runs `claude --version`; returns `False` if CLI not found
+
+---
+
+## EmbeddingAdapter ABC
+
+`llm_connect.embedding_adapter.EmbeddingAdapter`
+
+```python
+class EmbeddingAdapter(ABC):
+    @abstractmethod
+    def embed(self, texts: list[str]) -> list[list[float]]: ...
+```
+
+Invariant: returns a list of the same length as `texts`.
+
+### OpenAICompatibleEmbeddingAdapter
+
+Compatible with any OpenAI-format embedding endpoint (`/v1/embeddings`).  
+Default model: `text-embedding-3-small`.
+
+---
+
+## EmbeddingCache
+
+`llm_connect.embedding_cache.EmbeddingCache`
+
+Disk-backed cache keyed by text content (SHA-256 hash).  
+`get_or_compute(text, compute_fn)` returns cached vector or calls `compute_fn`.
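A disk-backed cache keyed by a SHA-256 hash of the text, as described above, might look roughly like the sketch below. It is not the `EmbeddingCache` implementation: the class name `SketchEmbeddingCache`, the one-JSON-file-per-entry layout, and the assumption that `compute_fn` receives the text as its only argument are all illustrative choices.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


class SketchEmbeddingCache:
    """Hypothetical cache storing one JSON file per embedded text."""

    def __init__(self, cache_dir: Path) -> None:
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, text: str) -> Path:
        # The key is derived from the content, so identical texts share an entry.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return self._dir / f"{digest}.json"

    def get_or_compute(
        self,
        text: str,
        compute_fn: Callable[[str], List[float]],
    ) -> List[float]:
        """Return the cached vector, or compute, store, and return it."""
        path = self._path_for(text)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        vector = compute_fn(text)
        path.write_text(json.dumps(vector), encoding="utf-8")
        return vector
```

Hashing the content rather than using the raw text as a file name avoids path-length and character-set problems for long inputs.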
--- a/llm_connect/__init__.py
+++ b/llm_connect/__init__.py
@@ -12,7 +12,7 @@ Quick start::
    response = adapter.execute_prompt(prompt, run_config)
 """

-from llm_connect.models import RunConfig, LLMResponse
+from llm_connect.models import RunConfig, LLMResponse, BudgetTracker
 from llm_connect.adapter import LLMAdapter, MockLLMAdapter, ErrorLLMAdapter
 from llm_connect.factory import create_adapter
 from llm_connect.openrouter import OpenRouterAdapter
@@ -27,6 +27,7 @@ from llm_connect.exceptions import (
    LLMRateLimitError,
    LLMTimeoutError,
    LLMSubprocessError,
+    LLMBudgetExceededError,
 )
 from llm_connect.embedding_adapter import EmbeddingAdapter
 from llm_connect.embedding_openai import OpenAICompatibleEmbeddingAdapter
@@ -41,6 +42,7 @@ from llm_connect.similarity import (
 __all__ = [
    "RunConfig",
    "LLMResponse",
+    "BudgetTracker",
    "LLMAdapter",
    "MockLLMAdapter",
    "ErrorLLMAdapter",
@@ -57,6 +59,7 @@ __all__ = [
    "LLMRateLimitError",
    "LLMTimeoutError",
    "LLMSubprocessError",
+    "LLMBudgetExceededError",
    "EmbeddingAdapter",
    "OpenAICompatibleEmbeddingAdapter",
    "EmbeddingCache",
--- a/llm_connect/adapter.py
+++ b/llm_connect/adapter.py
@@ -5,10 +5,11 @@ Implements abstraction layer for LLM integration, supporting
 multiple providers (OpenAI, Anthropic, local models, etc.).
 """

+import asyncio
 from abc import ABC, abstractmethod
 from typing import Dict, Any

-from llm_connect.models import RunConfig, LLMResponse
+from llm_connect.models import RunConfig, LLMResponse, BudgetTracker


 class LLMAdapter(ABC):
@@ -40,6 +41,26 @@ class LLMAdapter(ABC):
        """
        pass

+    async def async_execute_prompt(
+        self,
+        prompt: str,
+        config: RunConfig,
+    ) -> LLMResponse:
+        """Execute a prompt asynchronously.
+
+        Default implementation runs :meth:`execute_prompt` in a thread
+        executor so that the event loop is not blocked. Subclasses may
+        override with a native ``asyncio``-based implementation.
+
+        Args:
+            prompt: Compiled prompt text
+            config: Execution configuration
+
+        Returns:
+            LLMResponse with generated content
+        """
+        return await asyncio.to_thread(self.execute_prompt, prompt, config)
+
    @abstractmethod
    def validate_config(self, config: RunConfig) -> bool:
        """
@@ -53,6 +74,27 @@ class LLMAdapter(ABC):
        """
        pass

+    # ── Budget helpers (call in execute_prompt implementations) ─────
+
+    def _preflight_budget(self, config: RunConfig) -> None:
+        """Raise ``LLMBudgetExceededError`` if the budget is already exhausted."""
+        if config.budget_tracker is not None and config.budget_tracker.remaining() == 0:
+            from llm_connect.exceptions import LLMBudgetExceededError
+            tracker = config.budget_tracker
+            raise LLMBudgetExceededError(
+                "Token budget exhausted before making request",
+                total=tracker.total,
+                spent=tracker.spent,
+                requested=0,
+                context={"total": tracker.total, "spent": tracker.spent},
+            )
+
+    def _consume_budget(self, config: RunConfig, response: LLMResponse) -> None:
+        """Consume tokens from the budget tracker after a successful call."""
+        if config.budget_tracker is not None:
+            tokens = response.usage.get("total_tokens", 0)
+            config.budget_tracker.consume(tokens)
+

 class MockLLMAdapter(LLMAdapter):
    """
@@ -88,11 +130,12 @@ class MockLLMAdapter(LLMAdapter):
        Returns:
            Mock LLMResponse
        """
+        self._preflight_budget(config)
        self.call_count += 1
        self.last_prompt = prompt
        self.last_config = config

-        return LLMResponse(
+        response = LLMResponse(
            content=self.mock_response,
            model=config.model_name,
            usage={
@@ -103,6 +146,8 @@ class MockLLMAdapter(LLMAdapter):
            finish_reason="stop",
            metadata={"mock": True},
        )
+        self._consume_budget(config, response)
+        return response

    def validate_config(self, config: RunConfig) -> bool:
        """
--- a/llm_connect/claude_code.py
+++ b/llm_connect/claude_code.py
@@ -2,6 +2,7 @@
 Claude Code CLI adapter — runs the ``claude`` CLI as a subprocess.
 """

+import asyncio
 import subprocess
 from typing import Optional

@@ -35,6 +36,7 @@ class ClaudeCodeAdapter(LLMAdapter):
    # ── LLMAdapter interface ────────────────────────────────────────

    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        self._preflight_budget(config)
        cmd = [self._cli_path, "--print"]
        if self._model:
            cmd.extend(["--model", self._model])
@@ -66,7 +68,7 @@ class ClaudeCodeAdapter(LLMAdapter):
        prompt_tokens = estimate_tokens(prompt)
        completion_tokens = estimate_tokens(content)

-        return LLMResponse(
+        response = LLMResponse(
            content=content,
            model=self._model or "claude-code-cli",
            usage={
@@ -80,6 +82,63 @@ class ClaudeCodeAdapter(LLMAdapter):
                "cli_path": self._cli_path,
            },
        )
+        self._consume_budget(config, response)
+        return response
+
+    async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        """Native async implementation using asyncio.create_subprocess_exec."""
+        self._preflight_budget(config)
+        cmd = [self._cli_path, "--print"]
+        if self._model:
+            cmd.extend(["--model", self._model])
+
+        timeout = config.timeout_seconds or self._config.timeout_seconds
+
+        try:
+            proc = await asyncio.create_subprocess_exec(
+                *cmd,
+                stdin=asyncio.subprocess.PIPE,
+                stdout=asyncio.subprocess.PIPE,
+                stderr=asyncio.subprocess.PIPE,
+            )
+            stdout_bytes, stderr_bytes = await asyncio.wait_for(
+                proc.communicate(input=prompt.encode()),
+                timeout=timeout,
+            )
+        except asyncio.TimeoutError as exc:
+            raise LLMTimeoutError(
+                f"claude CLI timed out after {timeout}s",
+                cause=exc,
+            ) from exc
+
+        if proc.returncode != 0:
+            raise LLMSubprocessError(
+                f"claude CLI exited with code {proc.returncode}",
+                return_code=proc.returncode,
+                stderr=stderr_bytes.decode(),
+            )
+
+        content = stdout_bytes.decode()
+        prompt_tokens = estimate_tokens(prompt)
+        completion_tokens = estimate_tokens(content)
+
+        response = LLMResponse(
+            content=content,
+            model=self._model or "claude-code-cli",
+            usage={
+                "prompt_tokens": prompt_tokens,
+                "completion_tokens": completion_tokens,
+                "total_tokens": prompt_tokens + completion_tokens,
+            },
+            finish_reason="stop",
+            metadata={
+                "provider": "claude-code",
+                "cli_path": self._cli_path,
+                "async": True,
+            },
+        )
+        self._consume_budget(config, response)
+        return response

    def validate_config(self, config: RunConfig) -> bool:
        try:
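The native async path above pipes the prompt to a subprocess and bounds the wait with `asyncio.wait_for`. The sketch below shows the same pattern in isolation, with one hedged addition that is not in the diff: it kills the child process when the timeout fires so that it does not keep running. `run_with_stdin` and `UPPERCASE_CMD` are hypothetical, and a small Python one-liner stands in for the `claude` CLI.

```python
import asyncio
import sys


async def run_with_stdin(cmd: list[str], stdin_text: str, timeout: float) -> str:
    """Run ``cmd``, feed ``stdin_text`` on stdin, and return decoded stdout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout_bytes, stderr_bytes = await asyncio.wait_for(
            proc.communicate(input=stdin_text.encode()),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # Assumption of this sketch: reap the child instead of leaving it running.
        if proc.returncode is None:
            proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"exit code {proc.returncode}: {stderr_bytes.decode()}")
    return stdout_bytes.decode()


# Stand-in command: reads stdin and writes it back in upper case.
UPPERCASE_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
```

For example, `asyncio.run(run_with_stdin(UPPERCASE_CMD, "hello", timeout=10))` feeds the text through stdin, the same way the adapter avoids argument length limits for large prompts.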
--- a/llm_connect/exceptions.py
+++ b/llm_connect/exceptions.py
@@ -64,6 +64,30 @@ class LLMTimeoutError(LLMError):
    pass


+class LLMBudgetExceededError(LLMError):
+    """Token budget cap exceeded during a call or delegation chain.
+
+    Attributes:
+        total: The configured token cap.
+        spent: Tokens already consumed before this call.
+        requested: Tokens this call would have consumed.
+    """
+
+    def __init__(
+        self,
+        message: str,
+        total: int = 0,
+        spent: int = 0,
+        requested: int = 0,
+        cause: Optional[Exception] = None,
+        context: Optional[Dict[str, Any]] = None,
+    ):
+        super().__init__(message, cause=cause, context=context)
+        self.total = total
+        self.spent = spent
+        self.requested = requested
+
+
 class LLMSubprocessError(LLMError):
    """Claude Code CLI subprocess failed.

--- a/llm_connect/gemini.py
+++ b/llm_connect/gemini.py
@@ -2,6 +2,7 @@
 Google Gemini adapter — calls the Generative Language REST API directly.
 """

+import asyncio
 import time
 from typing import Optional, Dict, Any

@@ -48,6 +49,7 @@ class GeminiAdapter(LLMAdapter):
    # ── LLMAdapter interface ────────────────────────────────────────

    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        self._preflight_budget(config)
        model = self._model

        # Build Gemini request
@@ -92,7 +94,7 @@ class GeminiAdapter(LLMAdapter):

        usage_meta = data.get("usageMetadata", {})

-        return LLMResponse(
+        response = LLMResponse(
            content=content,
            model=model,
            usage={
@@ -106,6 +108,12 @@ class GeminiAdapter(LLMAdapter):
                "latency_seconds": round(latency, 3),
            },
        )
+        self._consume_budget(config, response)
+        return response
+
+    async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        """Async wrapper — runs execute_prompt in a thread executor."""
+        return await asyncio.to_thread(self.execute_prompt, prompt, config)

    def validate_config(self, config: RunConfig) -> bool:
        if not self._api_key:
--- a/llm_connect/models.py
+++ b/llm_connect/models.py
@@ -5,8 +5,53 @@ These classes are the canonical definitions; they are re-exported by
 markitect.prompts.execution.models for backward compatibility.
 """

+import threading
 from dataclasses import dataclass, field
-from typing import Dict, Any
+from typing import Dict, Any, Optional
+
+
+class BudgetTracker:
+    """Shared token budget for a call or delegation chain.
+
+    Thread-safe. Tracks cumulative token spend across multiple adapter
+    calls. Raises ``LLMBudgetExceededError`` when the cap is exceeded.
+
+    Example::
+
+        tracker = BudgetTracker(total=4000)
+        config = RunConfig(budget_tracker=tracker)
+        # All adapter calls sharing this config will consume from the same cap.
+    """
+
+    def __init__(self, total: int) -> None:
+        if total <= 0:
+            raise ValueError(f"BudgetTracker total must be positive, got {total}")
+        self.total = total
+        self.spent = 0
+        self._lock = threading.Lock()
+
+    def remaining(self) -> int:
+        """Return tokens remaining in the budget."""
+        return max(0, self.total - self.spent)
+
+    def consume(self, tokens: int) -> None:
+        """Record *tokens* as spent. Raises ``LLMBudgetExceededError`` if cap exceeded."""
+        from llm_connect.exceptions import LLMBudgetExceededError  # avoid circular at module load
+
+        with self._lock:
+            new_spent = self.spent + tokens
+            if new_spent > self.total:
+                raise LLMBudgetExceededError(
+                    f"Token budget exceeded: {new_spent} tokens used, cap is {self.total}",
+                    total=self.total,
+                    spent=self.spent,
+                    requested=tokens,
+                    context={"total": self.total, "spent": self.spent, "requested": tokens},
+                )
+            self.spent = new_spent
+
+    def __repr__(self) -> str:
+        return f"BudgetTracker(total={self.total}, spent={self.spent}, remaining={self.remaining()})"


@dataclass
@@ -30,9 +75,10 @@ class RunConfig:
    max_depth: int = 3
    skip_if_exists: bool = True
    timeout_seconds: int = 300
+    budget_tracker: Optional["BudgetTracker"] = field(default=None, repr=False)

    def to_dict(self) -> Dict[str, Any]:
-        """Convert to dictionary."""
+        """Convert to dictionary. ``budget_tracker`` is excluded (runtime object)."""
        return {
            "model_name": self.model_name,
            "temperature": self.temperature,
--- a/llm_connect/openai.py
+++ b/llm_connect/openai.py
@@ -2,6 +2,7 @@
 OpenAI (ChatGPT) adapter — calls the OpenAI chat completions API.
 """

+import asyncio
 import time
 from typing import Optional, Dict, Any

@@ -51,6 +52,7 @@ class OpenAIAdapter(LLMAdapter):
    # ── LLMAdapter interface ────────────────────────────────────────

    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        self._preflight_budget(config)
        model = self._model

        messages: list[Dict[str, str]] = []
@@ -80,7 +82,7 @@ class OpenAIAdapter(LLMAdapter):
        finish_reason = choice.get("finish_reason", "stop")
        usage = data.get("usage", {})

-        return LLMResponse(
+        response = LLMResponse(
            content=content,
            model=data.get("model", model),
            usage={
@@ -95,6 +97,12 @@ class OpenAIAdapter(LLMAdapter):
                "response_id": data.get("id", ""),
            },
        )
+        self._consume_budget(config, response)
+        return response
+
+    async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        """Async wrapper — runs execute_prompt in a thread executor."""
+        return await asyncio.to_thread(self.execute_prompt, prompt, config)

    def validate_config(self, config: RunConfig) -> bool:
        if not self._api_key:
--- a/llm_connect/openrouter.py
+++ b/llm_connect/openrouter.py
@@ -2,6 +2,7 @@
 OpenRouter adapter — calls the OpenAI-compatible chat completions API.
 """

+import asyncio
 import time
 from typing import Optional, Dict, Any

@@ -55,6 +56,7 @@ class OpenRouterAdapter(LLMAdapter):
    # ── LLMAdapter interface ────────────────────────────────────────

    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        self._preflight_budget(config)
        model = self._model if self._model != _DEFAULT_MODEL else (config.model_name or self._model)

        messages: list[Dict[str, str]] = []
@@ -88,7 +90,7 @@ class OpenRouterAdapter(LLMAdapter):
        finish_reason = choice.get("finish_reason", "stop")
        usage = data.get("usage", {})

-        return LLMResponse(
+        response = LLMResponse(
            content=content,
            model=data.get("model", model),
            usage={
@@ -103,6 +105,12 @@ class OpenRouterAdapter(LLMAdapter):
                "response_id": data.get("id", ""),
            },
        )
+        self._consume_budget(config, response)
+        return response
+
+    async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
+        """Async wrapper — runs execute_prompt in a thread executor."""
+        return await asyncio.to_thread(self.execute_prompt, prompt, config)

    def validate_config(self, config: RunConfig) -> bool:
        if not self._api_key:
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -14,6 +14,8 @@ dependencies = [
 [project.optional-dependencies]
 dev = [
    "pytest>=7.0",
+    "ruff>=0.4",
+    "mypy>=1.10",
 ]

 [tool.setuptools.packages.find]
@@ -23,4 +25,26 @@ include = ["llm_connect*"]
 [dependency-groups]
 dev = [
    "pytest>=9.0.2",
+    "ruff>=0.4",
+    "mypy>=1.10",
 ]
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+addopts = "-v"
+
+[tool.ruff]
+target-version = "py310"
+line-length = 100
+
+[tool.ruff.lint]
+select = ["E", "F", "W", "I", "UP"]
+ignore = ["E501"]
+
+[tool.mypy]
+python_version = "3.10"
+strict = false
+ignore_missing_imports = true
+disallow_untyped_defs = true
+warn_return_any = true
+warn_unused_ignores = true
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -0,0 +1,26 @@
+"""
+Shared pytest fixtures for llm-connect tests.
+"""
+
+import pytest
+
+from llm_connect.models import RunConfig, LLMResponse
+from llm_connect.adapter import MockLLMAdapter
+
+
+@pytest.fixture
+def run_config():
+    """Default RunConfig for tests."""
+    return RunConfig()
+
+
+@pytest.fixture
+def mock_adapter():
+    """MockLLMAdapter with a predictable response."""
+    return MockLLMAdapter(mock_response="test response")
+
+
+@pytest.fixture
+def sample_response():
+    """A minimal valid LLMResponse."""
+    return LLMResponse(content="hello", model="test-model")
--- a/tests/test_adapter.py
+++ b/tests/test_adapter.py
@@ -0,0 +1,77 @@
+"""
+Tests for MockLLMAdapter and ErrorLLMAdapter (Core adapter utilities).
+"""
+
+import pytest
+from llm_connect.adapter import MockLLMAdapter, ErrorLLMAdapter
+from llm_connect.models import RunConfig, LLMResponse
+
+
+class TestMockLLMAdapter:
+    def test_returns_mock_response(self, mock_adapter, run_config):
+        response = mock_adapter.execute_prompt("hello", run_config)
+        assert response.content == "test response"
+
+    def test_returns_llm_response(self, mock_adapter, run_config):
+        response = mock_adapter.execute_prompt("hello", run_config)
+        assert isinstance(response, LLMResponse)
+
+    def test_call_count_increments(self, mock_adapter, run_config):
+        assert mock_adapter.call_count == 0
+        mock_adapter.execute_prompt("a", run_config)
+        mock_adapter.execute_prompt("b", run_config)
+        assert mock_adapter.call_count == 2
+
+    def test_records_last_prompt(self, mock_adapter, run_config):
+        mock_adapter.execute_prompt("my prompt", run_config)
+        assert mock_adapter.last_prompt == "my prompt"
+
+    def test_records_last_config(self, mock_adapter, run_config):
+        mock_adapter.execute_prompt("x", run_config)
+        assert mock_adapter.last_config is run_config
+
+    def test_reset_clears_state(self, mock_adapter, run_config):
+        mock_adapter.execute_prompt("x", run_config)
+        mock_adapter.reset()
+        assert mock_adapter.call_count == 0
+        assert mock_adapter.last_prompt is None
+        assert mock_adapter.last_config is None
+
+    def test_validate_config_always_true(self, mock_adapter, run_config):
+        assert mock_adapter.validate_config(run_config) is True
+
+    def test_usage_contains_expected_keys(self, mock_adapter, run_config):
+        response = mock_adapter.execute_prompt("prompt text", run_config)
+        assert "prompt_tokens" in response.usage
+        assert "completion_tokens" in response.usage
+        assert "total_tokens" in response.usage
+
+    def test_custom_response_text(self, run_config):
+        adapter = MockLLMAdapter(mock_response="custom answer")
+        response = adapter.execute_prompt("q", run_config)
+        assert response.content == "custom answer"
+
+    def test_default_response_text(self, run_config):
+        adapter = MockLLMAdapter()
+        response = adapter.execute_prompt("q", run_config)
+        assert response.content == "Mock LLM response"
+
+    def test_metadata_marks_as_mock(self, mock_adapter, run_config):
+        response = mock_adapter.execute_prompt("q", run_config)
+        assert response.metadata.get("mock") is True
+
+
+class TestErrorLLMAdapter:
+    def test_raises_on_execute(self, run_config):
+        adapter = ErrorLLMAdapter()
+        with pytest.raises(RuntimeError):
+            adapter.execute_prompt("q", run_config)
+
+    def test_raises_with_custom_message(self, run_config):
+        adapter = ErrorLLMAdapter(error_message="boom")
+        with pytest.raises(RuntimeError, match="boom"):
+            adapter.execute_prompt("q", run_config)
+
+    def test_validate_config_returns_true(self, run_config):
+        adapter = ErrorLLMAdapter()
+        assert adapter.validate_config(run_config) is True
--- a/tests/test_async.py
+++ b/tests/test_async.py
@@ -0,0 +1,101 @@
+"""
+Tests for async_execute_prompt (FR-3).
+"""
+
+import asyncio
+import pytest
+
+from llm_connect.models import RunConfig, BudgetTracker
+from llm_connect.adapter import MockLLMAdapter
+from llm_connect.exceptions import LLMBudgetExceededError
+
+
+class TestAsyncExecutePrompt:
+    def test_default_fallback_returns_response(self):
+        adapter = MockLLMAdapter(mock_response="async result")
+        config = RunConfig()
+        response = asyncio.run(adapter.async_execute_prompt("hello", config))
+        assert response.content == "async result"
+
+    def test_gather_multiple_adapters(self):
+        """asyncio.gather over N adapters completes without errors."""
+        adapters = [MockLLMAdapter(mock_response=f"resp-{i}") for i in range(4)]
+        config = RunConfig()
+
+        async def run():
+            return await asyncio.gather(*[
+                a.async_execute_prompt("prompt", config) for a in adapters
+            ])
+
+        results = asyncio.run(run())
+        assert len(results) == 4
+        for i, r in enumerate(results):
+            assert r.content == f"resp-{i}"
+
+    def test_gather_increments_call_counts(self):
+        adapter = MockLLMAdapter()
+        config = RunConfig()
+
+        async def run():
+            await asyncio.gather(*[
+                adapter.async_execute_prompt("p", config) for _ in range(5)
+            ])
+
+        asyncio.run(run())
+        assert adapter.call_count == 5
+
+    def test_concurrent_faster_than_sequential(self):
+        """Smoke test: N gathered async calls finish without deadlock or error.
+        Wall-clock timing is deliberately not asserted (unreliable in CI)."""
+
+        adapter = MockLLMAdapter()
+        config = RunConfig()
+
+        async def run_concurrent(n: int):
+            await asyncio.gather(*[
+                adapter.async_execute_prompt("p", config) for _ in range(n)
+            ])
+
+        # Just verify it completes without deadlock or error — timing is CI-unreliable
+        asyncio.run(run_concurrent(10))
+        assert adapter.call_count == 10
+
+    def test_async_with_budget_tracker(self):
+        """Budget enforcement works through async calls."""
+        tracker = BudgetTracker(total=10000)
+        config = RunConfig(budget_tracker=tracker)
+        adapter = MockLLMAdapter(mock_response="hi")
+
+        asyncio.run(adapter.async_execute_prompt("hello", config))
+        assert tracker.spent > 0
+
+    def test_async_exhausted_budget_raises(self):
+        """Exhausted budget raises LLMBudgetExceededError in async context."""
+        tracker = BudgetTracker(total=1)
+        tracker.consume(1)
+        config = RunConfig(budget_tracker=tracker)
+        adapter = MockLLMAdapter()
+
+        with pytest.raises(LLMBudgetExceededError):
+            asyncio.run(adapter.async_execute_prompt("p", config))
+
+    def test_async_gather_with_shared_budget(self):
+        """Shared budget across concurrent async calls is enforced correctly."""
+        tracker = BudgetTracker(total=100000)
+        config = RunConfig(budget_tracker=tracker)
+        adapters = [MockLLMAdapter(mock_response="hi") for _ in range(4)]
+
+        async def run():
+            await asyncio.gather(*[
+                a.async_execute_prompt("hello", config) for a in adapters
+            ])
+
+        asyncio.run(run())
+        assert tracker.spent > 0
+
+    def test_returns_llm_response_type(self):
+        from llm_connect.models import LLMResponse
+        adapter = MockLLMAdapter()
+        config = RunConfig()
+        response = asyncio.run(adapter.async_execute_prompt("q", config))
+        assert isinstance(response, LLMResponse)
--- a/tests/test_budget.py
+++ b/tests/test_budget.py
@@ -0,0 +1,152 @@
+"""
+Tests for BudgetTracker (FR-4) and LLMBudgetExceededError.
+"""
+
+import threading
+import pytest
+
+from llm_connect.models import BudgetTracker, RunConfig
+from llm_connect.adapter import MockLLMAdapter
+from llm_connect.exceptions import LLMBudgetExceededError, LLMError
+
+
+class TestBudgetTracker:
+    def test_initial_state(self):
+        t = BudgetTracker(total=1000)
+        assert t.total == 1000
+        assert t.spent == 0
+        assert t.remaining() == 1000
+
+    def test_consume_updates_spent(self):
+        t = BudgetTracker(total=1000)
+        t.consume(300)
+        assert t.spent == 300
+        assert t.remaining() == 700
+
+    def test_consume_multiple_times(self):
+        t = BudgetTracker(total=1000)
+        t.consume(400)
+        t.consume(400)
+        assert t.spent == 800
+        assert t.remaining() == 200
+
+    def test_consume_exact_budget(self):
+        t = BudgetTracker(total=100)
+        t.consume(100)
+        assert t.spent == 100
+        assert t.remaining() == 0
+
+    def test_consume_exceeds_budget_raises(self):
+        t = BudgetTracker(total=100)
+        t.consume(60)
+        with pytest.raises(LLMBudgetExceededError):
+            t.consume(50)
+
+    def test_exceeded_error_carries_details(self):
+        t = BudgetTracker(total=100)
+        t.consume(80)
+        with pytest.raises(LLMBudgetExceededError) as exc_info:
+            t.consume(30)
+        err = exc_info.value
+        assert err.total == 100
+        assert err.spent == 80
+        assert err.requested == 30
+
+    def test_exceeded_error_is_subclass_of_llm_error(self):
+        t = BudgetTracker(total=10)
+        with pytest.raises(LLMError):
+            t.consume(20)
+
+    def test_remaining_never_negative(self):
+        t = BudgetTracker(total=100)
+        t.consume(100)
+        assert t.remaining() == 0
+
+    def test_invalid_total_raises(self):
+        with pytest.raises(ValueError):
+            BudgetTracker(total=0)
+        with pytest.raises(ValueError):
+            BudgetTracker(total=-1)
+
+    def test_repr(self):
+        t = BudgetTracker(total=500)
+        t.consume(100)
+        r = repr(t)
+        assert "500" in r
+        assert "100" in r
+
+    def test_thread_safety(self):
+        """Concurrent consume() calls must not corrupt state or allow overspend."""
+        total = 1000
+        t = BudgetTracker(total=total)
+        errors = []
+
+        def consume_100():
+            try:
+                t.consume(100)
+            except LLMBudgetExceededError:
+                errors.append(1)
+
+        threads = [threading.Thread(target=consume_100) for _ in range(15)]
+        for th in threads:
+            th.start()
+        for th in threads:
+            th.join()
+
+        # At most 10 consumes of 100 can succeed within a budget of 1000
+        assert t.spent <= total
+        assert len(errors) == 5  # 15 attempts, 10 succeed, 5 fail
+
+
+class TestBudgetEnforcementInAdapter:
+    def test_single_call_consumes_budget(self):
+        tracker = BudgetTracker(total=10000)
+        config = RunConfig(budget_tracker=tracker)
+        adapter = MockLLMAdapter(mock_response="hello world")
+        adapter.execute_prompt("test prompt", config)
+        assert tracker.spent > 0
+
+    def test_exhausted_budget_raises_before_call(self):
+        tracker = BudgetTracker(total=1)
+        tracker.consume(1)  # exhaust it
+        config = RunConfig(budget_tracker=tracker)
+        adapter = MockLLMAdapter()
+        with pytest.raises(LLMBudgetExceededError):
+            adapter.execute_prompt("any prompt", config)
+        # Adapter should not have been called
+        assert adapter.call_count == 0
+
+    def test_delegation_chain_shared_tracker(self):
+        """A → B → C sharing the same tracker enforces the cap across all calls."""
+        tracker = BudgetTracker(total=10000)
+        config = RunConfig(budget_tracker=tracker)
+        adapter = MockLLMAdapter(mock_response="response")
+
+        adapter.execute_prompt("call A", config)
+        adapter.execute_prompt("call B", config)
+        adapter.execute_prompt("call C", config)
+
+        assert adapter.call_count == 3
+        assert tracker.spent > 0
+
+    def test_budget_exceeded_mid_chain(self):
+        """Chain stops when budget is exhausted between calls."""
+        # Prompt is 100 words (200 chars), response 50 words (100 chars): ~150 tokens per call
+        # by word count, ~75 by chars/4. Either way call 1 fits the cap of 200; a later call fails.
+        adapter = MockLLMAdapter(mock_response="r " * 50)  # 50 words, 100 characters
+        tracker = BudgetTracker(total=200)
+        config = RunConfig(budget_tracker=tracker)
+
+        # First call succeeds
+        adapter.execute_prompt("p " * 100, config)
+        # Eventually exhausts the budget
+        with pytest.raises(LLMBudgetExceededError):
+            for _ in range(10):
+                adapter.execute_prompt("p " * 100, config)
+
+    def test_no_tracker_has_no_effect(self):
+        """Adapters work normally when no budget_tracker is set."""
+        config = RunConfig()  # no budget_tracker
+        adapter = MockLLMAdapter()
+        response = adapter.execute_prompt("hello", config)
+        assert response.content == "Mock LLM response"
--- a/tests/test_exceptions.py
+++ b/tests/test_exceptions.py
@@ -0,0 +1,96 @@
+"""
+Tests for the LLMError exception hierarchy (Core).
+"""
+
+import pytest
+from llm_connect.exceptions import (
+    LLMError,
+    LLMConfigurationError,
+    LLMAPIError,
+    LLMRateLimitError,
+    LLMTimeoutError,
+    LLMSubprocessError,
+)
+
+
+class TestLLMErrorHierarchy:
+    def test_all_are_subclasses_of_llm_error(self):
+        assert issubclass(LLMConfigurationError, LLMError)
+        assert issubclass(LLMAPIError, LLMError)
+        assert issubclass(LLMRateLimitError, LLMError)
+        assert issubclass(LLMTimeoutError, LLMError)
+        assert issubclass(LLMSubprocessError, LLMError)
+
+    def test_rate_limit_is_api_error(self):
+        assert issubclass(LLMRateLimitError, LLMAPIError)
+
+    def test_all_are_exceptions(self):
+        assert issubclass(LLMError, Exception)
+
+
+class TestLLMError:
+    def test_basic_message(self):
+        err = LLMError("something went wrong")
+        assert str(err) == "something went wrong"
+
+    def test_context_appears_in_str(self):
+        err = LLMError("oops", context={"provider": "openai"})
+        assert "provider=openai" in str(err)
+
+    def test_cause_is_chained(self):
+        cause = ValueError("root cause")
+        err = LLMError("wrapper", cause=cause)
+        assert err.__cause__ is cause
+
+    def test_empty_context_does_not_appear(self):
+        err = LLMError("clean message", context={})
+        assert str(err) == "clean message"
+
+
+class TestLLMAPIError:
+    def test_has_status_code(self):
+        err = LLMAPIError("bad request", status_code=400)
+        assert err.status_code == 400
+
+    def test_has_response_body(self):
+        err = LLMAPIError("error", status_code=500, response_body='{"error": "oops"}')
+        assert err.response_body == '{"error": "oops"}'
+
+    def test_defaults(self):
+        err = LLMAPIError("minimal")
+        assert err.status_code == 0
+        assert err.response_body == ""
+
+    def test_rate_limit_inherits_status_code(self):
+        err = LLMRateLimitError("too many", status_code=429)
+        assert err.status_code == 429
+        assert isinstance(err, LLMAPIError)
+
+
+class TestLLMSubprocessError:
+    def test_has_return_code(self):
+        err = LLMSubprocessError("cli failed", return_code=1)
+        assert err.return_code == 1
+
+    def test_has_stderr(self):
+        err = LLMSubprocessError("cli failed", stderr="error output")
+        assert err.stderr == "error output"
+
+    def test_defaults(self):
+        err = LLMSubprocessError("minimal")
+        assert err.return_code == 1
+        assert err.stderr == ""
+
+
+class TestRaiseAndCatch:
+    def test_catch_as_llm_error(self):
+        with pytest.raises(LLMError):
+            raise LLMConfigurationError("no key")
+
+    def test_catch_api_error_as_llm_error(self):
+        with pytest.raises(LLMError):
+            raise LLMAPIError("http error", status_code=502)
+
+    def test_catch_rate_limit_as_api_error(self):
+        with pytest.raises(LLMAPIError):
+            raise LLMRateLimitError("429", status_code=429)
--- a/tests/test_factory.py
+++ b/tests/test_factory.py
@@ -0,0 +1,97 @@
+"""
+Tests for create_adapter() and create_embedding_adapter() factories.
+"""
+
+import pytest
+from llm_connect.factory import create_adapter
+from llm_connect.embedding_factory import create_embedding_adapter
+from llm_connect.exceptions import LLMConfigurationError
+from llm_connect.adapter import LLMAdapter
+from llm_connect.embedding_adapter import EmbeddingAdapter
+from llm_connect.openrouter import OpenRouterAdapter
+from llm_connect.claude_code import ClaudeCodeAdapter
+from llm_connect.openai import OpenAIAdapter
+from llm_connect.gemini import GeminiAdapter
+from llm_connect.embedding_openai import OpenAICompatibleEmbeddingAdapter
+
+
+class TestCreateAdapter:
+    def test_unknown_provider_raises(self):
+        with pytest.raises(LLMConfigurationError, match="Unknown LLM provider"):
+            create_adapter("nonexistent-provider")
+
+    def test_unknown_provider_error_lists_known(self):
+        with pytest.raises(LLMConfigurationError) as exc_info:
+            create_adapter("bad")
+        assert "openai" in str(exc_info.value)
+        assert "gemini" in str(exc_info.value)
+
+    def test_openrouter_returns_adapter(self):
+        adapter = create_adapter("openrouter", api_key="test-key")
+        assert isinstance(adapter, OpenRouterAdapter)
+        assert isinstance(adapter, LLMAdapter)
+
+    def test_openrouter_no_key_still_constructs(self):
+        # OpenRouterAdapter defers key validation to execute_prompt
+        adapter = create_adapter("openrouter")
+        assert isinstance(adapter, OpenRouterAdapter)
+
+    def test_openai_with_key_returns_adapter(self):
+        adapter = create_adapter("openai", api_key="sk-test-key")
+        assert isinstance(adapter, OpenAIAdapter)
+        assert isinstance(adapter, LLMAdapter)
+
+    def test_openai_without_key_raises(self, monkeypatch):
+        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
+        with pytest.raises(LLMConfigurationError):
+            create_adapter("openai")
+
+    def test_gemini_with_key_returns_adapter(self):
+        adapter = create_adapter("gemini", api_key="aistudio-test-key")
+        assert isinstance(adapter, GeminiAdapter)
+        assert isinstance(adapter, LLMAdapter)
+
+    def test_gemini_without_key_raises(self, monkeypatch):
+        monkeypatch.delenv("GEMINI_API_KEY", raising=False)
+        with pytest.raises(LLMConfigurationError):
+            create_adapter("gemini")
+
+    def test_claude_code_returns_adapter(self):
+        adapter = create_adapter("claude-code")
+        assert isinstance(adapter, ClaudeCodeAdapter)
+        assert isinstance(adapter, LLMAdapter)
+
+    def test_claude_code_with_model(self):
+        adapter = create_adapter("claude-code", model="claude-opus-4-6")
+        assert isinstance(adapter, ClaudeCodeAdapter)
+
+    def test_all_known_providers_are_reachable(self):
+        known = {"openrouter", "openai", "gemini", "claude-code"}
+        # Just verify each key is in the factory registry (no construction needed)
+        from llm_connect.factory import _PROVIDERS
+        assert known == set(_PROVIDERS.keys())
+
+
+class TestCreateEmbeddingAdapter:
+    def test_unknown_provider_raises(self):
+        with pytest.raises(LLMConfigurationError, match="Unknown embedding provider"):
+            create_embedding_adapter("nonexistent")
+
+    def test_openai_returns_adapter(self):
+        adapter = create_embedding_adapter("openai", api_key="sk-test")
+        assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
+        assert isinstance(adapter, EmbeddingAdapter)
+
+    def test_openrouter_returns_adapter(self):
+        adapter = create_embedding_adapter("openrouter", api_key="or-test")
+        assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
+        assert isinstance(adapter, EmbeddingAdapter)
+
+    def test_validate_returns_true_when_key_set(self):
+        adapter = create_embedding_adapter("openai", api_key="sk-test")
+        assert adapter.validate() is True
+
+    def test_validate_returns_false_when_no_key(self, monkeypatch):
+        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
+        adapter = create_embedding_adapter("openai")
+        assert adapter.validate() is False
--- a/tests/test_models.py
+++ b/tests/test_models.py
@@ -0,0 +1,86 @@
+"""
+Tests for RunConfig and LLMResponse (Core models).
+"""
+
+import pytest
+from llm_connect.models import RunConfig, LLMResponse
+
+
+class TestRunConfig:
+    def test_defaults(self):
+        cfg = RunConfig()
+        assert cfg.model_name == "gpt-4"
+        assert cfg.temperature == 0.7
+        assert cfg.max_tokens == 2000
+        assert cfg.model_params == {}
+        assert cfg.max_depth == 3
+        assert cfg.skip_if_exists is True
+        assert cfg.timeout_seconds == 300
+
+    def test_custom_values(self):
+        cfg = RunConfig(model_name="gemini-2.5-flash", temperature=0.1, max_tokens=500)
+        assert cfg.model_name == "gemini-2.5-flash"
+        assert cfg.temperature == 0.1
+        assert cfg.max_tokens == 500
+
+    def test_to_dict_roundtrip(self):
+        cfg = RunConfig(model_name="gpt-4o", temperature=0.3, max_tokens=1000)
+        d = cfg.to_dict()
+        assert d["model_name"] == "gpt-4o"
+        assert d["temperature"] == 0.3
+        assert d["max_tokens"] == 1000
+
+    def test_from_dict_roundtrip(self):
+        original = RunConfig(model_name="claude-3", temperature=0.5, max_tokens=800)
+        restored = RunConfig.from_dict(original.to_dict())
+        assert restored.model_name == original.model_name
+        assert restored.temperature == original.temperature
+        assert restored.max_tokens == original.max_tokens
+
+    def test_from_dict_uses_defaults_for_missing_keys(self):
+        cfg = RunConfig.from_dict({})
+        assert cfg.model_name == "gpt-4"
+        assert cfg.temperature == 0.7
+
+    def test_model_params_default_is_independent(self):
+        a = RunConfig()
+        b = RunConfig()
+        a.model_params["x"] = 1
+        assert "x" not in b.model_params
+
+
+class TestLLMResponse:
+    def test_minimal_construction(self):
+        r = LLMResponse(content="hello", model="test-model")
+        assert r.content == "hello"
+        assert r.model == "test-model"
+        assert r.usage == {}
+        assert r.finish_reason == "stop"
+        assert r.metadata == {}
+
+    def test_full_construction(self):
+        r = LLMResponse(
+            content="response text",
+            model="gpt-4",
+            usage={"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
+            finish_reason="length",
+            metadata={"provider": "openai", "latency_seconds": 1.2},
+        )
+        assert r.usage["total_tokens"] == 15
+        assert r.finish_reason == "length"
+        assert r.metadata["provider"] == "openai"
+
+    def test_to_dict(self):
+        r = LLMResponse(content="hi", model="m", finish_reason="stop")
+        d = r.to_dict()
+        assert d["content"] == "hi"
+        assert d["model"] == "m"
+        assert d["finish_reason"] == "stop"
+        assert "usage" in d
+        assert "metadata" in d
+
+    def test_metadata_default_is_independent(self):
+        a = LLMResponse(content="a", model="m")
+        b = LLMResponse(content="b", model="m")
+        a.metadata["x"] = 1
+        assert "x" not in b.metadata
--- a/workplans/llm-connect-WP-0001-foundation-gaaf-baseline.md
+++ b/workplans/llm-connect-WP-0001-foundation-gaaf-baseline.md
@@ -0,0 +1,36 @@
+# LLM-WP-0001 — Foundation & GAAF Baseline
+
+**status:** active  
+**owner:** llm-connect  
+**repo:** llm-connect  
+**created:** 2026-04-01  
+
+## Purpose
+
+Establish the structural foundation required before any Core modifications.
+Covers repo orientation files, GAAF-2026 compliance artifacts, test suite, CI,
+and state-hub housekeeping.
+
+## Tasks
+
+| ID  | Title | Priority | Status |
+|-----|-------|----------|--------|
+| T01 | Create `SCOPE.md` | high | done |
+| T02 | Fill `.claude/rules/` stubs: `architecture.md`, `stack-and-commands.md`, `repo-boundary.md` | high | done |
+| T03 | Create `ARCHITECTURE-LAYERS.md` with layer map, scorecard stub, next-review date | high | done |
+| T04 | Create `/contracts/` tree (`core/`, `functional/`, `config/`) | high | done |
+| T05 | Core contract doc: `LLMAdapter` interface invariants, `RunConfig`/`LLMResponse` field contracts | high | done |
+| T06 | Functional contract stubs for all 4 adapters + embedding adapters (maturity: Beta) | medium | done |
+| T07 | Create `tests/` with `conftest.py`, wire pytest in `pyproject.toml` | high | done |
+| T08 | Unit tests: `RunConfig`, `LLMResponse`, `MockLLMAdapter`, full exception hierarchy | high | done |
+| T09 | Unit tests: `create_adapter` (all providers + unknown provider error), `create_embedding_adapter` | high | done |
+| T10 | Add `ruff`, `mypy` to dev deps in `pyproject.toml` | medium | done |
+| T11 | CI workflow: pytest + ruff + mypy | medium | done |
+| T12 | State hub: register this host path, SBOM refresh | low | done |
+
+## Exit criteria
+
+- `ARCHITECTURE-LAYERS.md` and `/contracts/core/` exist and describe the current Core surface
+- pytest passes with coverage of Core and factory
+- ruff + mypy clean
+- CI green on push
--- a/workplans/llm-connect-WP-0002-core-extensions.md
+++ b/workplans/llm-connect-WP-0002-core-extensions.md
@@ -0,0 +1,57 @@
+# LLM-WP-0002 — Core Extensions (FR-4 + FR-3)
+
+**status:** active  
+**owner:** llm-connect  
+**repo:** llm-connect  
+**created:** 2026-04-01  
+**depends-on:** LLM-WP-0001 (contracts and tests must exist before Core is modified)
+
+## Purpose
+
+Implement the two IHF feature requests that touch the Core layer.
+FR-4 (BudgetTracker) is additive and non-breaking. FR-3 (async) extends
+the Core ABC with a default executor fallback — non-breaking, overridable
+per adapter for native async.
+
+Origin: IHUB-WP-0012 Phase 11 — Advanced AI Federation (completed 2026-04-01).
+
+## GAAF notes
+
+Both changes are Core-layer modifications under GAAF-2026:
+- FR-4: new primitive (`BudgetTracker`) + new exception (`LLMBudgetExceededError`)
+  added as optional `RunConfig` field — additive, non-breaking.
+- FR-3: `async_execute_prompt` added to `LLMAdapter` ABC with a default
+  `asyncio.to_thread(...)` fallback so existing adapters remain valid;
+  native async overrides are provided per adapter.
+
+Core contract doc (from WP-0001 T05) must be updated after each change.
+
+## Tasks
+
+### FR-4 — BudgetTracker
+
+| ID  | Title | Priority | Status |
+|-----|-------|----------|--------|
+| T01 | `BudgetTracker` dataclass: `total`, `spent`, `remaining()`, thread-safe increment | high | todo |
+| T02 | `LLMBudgetExceededError(LLMError)` in `exceptions.py` | high | todo |
+| T03 | Optional `budget_tracker: BudgetTracker \| None` field on `RunConfig` | high | todo |
+| T04 | Enforcement: each adapter checks/updates tracker around call; raises on exceeded | high | todo |
+| T05 | Update Core contract doc | medium | todo |
+| T06 | Tests: single call, delegation chain (A→B→C shared tracker), exceeded error, multi-adapter | high | todo |
+
+### FR-3 — async_execute_prompt
+
+| ID  | Title | Priority | Status |
+|-----|-------|----------|--------|
+| T07 | Add `async_execute_prompt` to `LLMAdapter` ABC with default executor fallback | high | todo |
+| T08 | Native async override in `OpenAIAdapter`, `GeminiAdapter`, `OpenRouterAdapter` | high | todo |
+| T09 | Native async for `ClaudeCodeAdapter` via `asyncio.create_subprocess_exec` | high | todo |
+| T10 | Update Core contract doc | medium | todo |
+| T11 | Tests: `asyncio.gather` over N adapters, timeout propagation, budget interaction | high | todo |
+
+## Exit criteria
+
+- `BudgetTracker` enforces caps across a delegation chain of 3 adapters in tests
+- `asyncio.gather` over 4 mock adapters completes without errors
+- All existing tests still pass (non-breaking validation)
+- Core contract doc reflects both additions
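
The first two exit criteria can be illustrated together. The sketch below is not the implementation from this commit: `EchoAdapter`, the `charge()` method, and the flat per-call cost are hypothetical names chosen for illustration, and `LLMBudgetExceededError` derives from `Exception` here rather than from `LLMError`. It shows a lock-guarded tracker shared by several adapters, with the async path falling back to a worker thread.

```python
import asyncio
import threading
from dataclasses import dataclass, field


class LLMBudgetExceededError(Exception):
    """Stand-in; the workplan derives the real class from LLMError."""


@dataclass
class BudgetTracker:
    total: float
    spent: float = 0.0
    _lock: threading.Lock = field(
        default_factory=threading.Lock, init=False, repr=False, compare=False
    )

    def remaining(self) -> float:
        with self._lock:
            return self.total - self.spent

    def charge(self, amount: float) -> None:
        """Atomically record a cost, raising if it would exceed the cap."""
        with self._lock:
            if self.spent + amount > self.total:
                raise LLMBudgetExceededError(
                    f"budget {self.total} exceeded (spent {self.spent})"
                )
            self.spent += amount


class EchoAdapter:
    """Hypothetical adapter that charges a flat cost per call."""

    def __init__(self, tracker: BudgetTracker, cost: float = 1.0) -> None:
        self.tracker = tracker
        self.cost = cost

    def execute_prompt(self, prompt: str) -> str:
        self.tracker.charge(self.cost)
        return f"echo: {prompt}"

    async def async_execute_prompt(self, prompt: str) -> str:
        # Default fallback: run the blocking call in a worker thread.
        return await asyncio.to_thread(self.execute_prompt, prompt)


async def run_all(tracker: BudgetTracker, n: int) -> list:
    adapters = [EchoAdapter(tracker) for _ in range(n)]
    calls = (a.async_execute_prompt(f"p{i}") for i, a in enumerate(adapters))
    # return_exceptions=True collects failures instead of raising the first one.
    return await asyncio.gather(*calls, return_exceptions=True)


tracker = BudgetTracker(total=3.0)
results = asyncio.run(run_all(tracker, 4))
failures = [r for r in results if isinstance(r, LLMBudgetExceededError)]
```

With a budget of 3 and four calls costing 1 each, exactly one call is rejected. Which one depends on thread scheduling, so tests should assert on the count rather than on a position in `results`.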
--- a/workplans/llm-connect-WP-0003-functional-extensions.md
+++ b/workplans/llm-connect-WP-0003-functional-extensions.md
@@ -0,0 +1,51 @@
+# LLM-WP-0003 — Functional Extensions (FR-2 + FR-1)
+
+**status:** active  
+**owner:** llm-connect  
+**repo:** llm-connect  
+**created:** 2026-04-01  
+**depends-on:** LLM-WP-0001 (test infrastructure must exist)
+
+## Purpose
+
+Implement the two IHF feature requests that add new Functional-layer modules.
+Neither touches Core. Both can be developed independently of WP-0002.
+
+Origin: IHUB-WP-0012 Phase 11 — Advanced AI Federation (completed 2026-04-01).
+
+## GAAF notes
+
+Both additions are Functional-layer under GAAF-2026:
+- Demand signal is explicit: IHF (inter-hub) is the primary consumer for both.
+- Each gets its own functional contract doc in `/contracts/functional/`.
+- Maturity on release: Beta (single known consumer, interface not yet stabilised).
+
+## Tasks
+
+### FR-2 — RoutingPolicy
+
+| ID  | Title | Priority | Status |
+|-----|-------|----------|--------|
+| T01 | `RoutingPolicy` data model: `rules` list with `task_type`, `prefer`, `max_cost_per_1k`, `fallback` | high | todo |
+| T02 | `policy.resolve(task_type)` → returns configured `LLMAdapter` | high | todo |
+| T03 | Export from `llm_connect.__init__` and update `__all__` | medium | todo |
+| T04 | Functional contract doc for `RoutingPolicy` | medium | todo |
+| T05 | Tests: rule match, cost-cap fallback, unknown task_type fallback, no-match default | high | todo |
+
+### FR-1 — HTTP serve mode
+
+| ID  | Title | Priority | Status |
+|-----|-------|----------|--------|
+| T06 | Design `/execute` JSON schema (request: provider, model, prompt, config; response: LLMResponse fields) | high | todo |
+| T07 | Implement `llm_connect/server.py` — minimal HTTP server, `POST /execute`, `GET /health` | high | todo |
+| T08 | `python -m llm_connect.server --port N --provider X --model Y` CLI entry point | high | todo |
+| T09 | Add `httpx` or `aiohttp` server dep under `[project.optional-dependencies] server` | medium | todo |
+| T10 | Functional contract doc (API schema — request/response shapes, error codes) | medium | todo |
+| T11 | Tests: spin up server in subprocess or via `TestClient`, POST round-trip (MockAdapter), error responses | high | todo |
+
+## Exit criteria
+
+- `RoutingPolicy.resolve("triage")` returns the correct adapter per rules in tests
+- `python -m llm_connect.server --port 9999` starts and responds to `POST /execute`
+- `GET /health` returns 200
+- All functional contract docs present in `/contracts/functional/`
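
A rough sketch of how the FR-2 rule fields might interact, under stated assumptions: the `Rule` class, the `default` field, and the `cost_per_1k` price table are hypothetical, and `resolve()` returns a provider name rather than a configured `LLMAdapter` to keep the example self-contained. The real implementation would presumably hand that name to `create_adapter()`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    task_type: str
    prefer: str
    max_cost_per_1k: Optional[float] = None
    fallback: Optional[str] = None


@dataclass
class RoutingPolicy:
    rules: list[Rule]
    default: str  # used when no rule matches the task type (assumed field)
    # Hypothetical price table: provider name -> cost per 1k tokens.
    cost_per_1k: dict[str, float] = field(default_factory=dict)

    def resolve(self, task_type: str) -> str:
        for rule in self.rules:
            if rule.task_type != task_type:
                continue
            cost = self.cost_per_1k.get(rule.prefer, 0.0)
            over_cap = (
                rule.max_cost_per_1k is not None and cost > rule.max_cost_per_1k
            )
            if over_cap:
                # Preferred provider is too expensive: use the rule's
                # fallback, or the policy default if none is set.
                return rule.fallback or self.default
            return rule.prefer
        return self.default


policy = RoutingPolicy(
    rules=[
        Rule("triage", prefer="openai", max_cost_per_1k=0.5, fallback="gemini"),
        Rule("summarise", prefer="openrouter"),
    ],
    default="mock",
    cost_per_1k={"openai": 1.0},
)
```

Here `policy.resolve("triage")` falls back to `"gemini"` because the assumed price of `"openai"` exceeds the rule's cap, while an unknown task type resolves to the default. These are the cases T05 lists.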