generated from coulomb/repo-seed
feat: WP-0003 — RoutingPolicy (FR-2) and HTTP serve mode (FR-1)
FR-2 RoutingPolicy:
- RoutingPolicy + RoutingRule dataclasses in llm_connect/routing.py
- resolve(task_type, estimated_cost_per_1k=None) with cost-cap fallback
- Exported from llm_connect.__init__; contract doc at contracts/functional/routing-policy.md
- 11 tests covering rule match, cost-cap, fallback, unknown type, no-match

FR-1 HTTP serve mode:
- LLMServer in llm_connect/server.py (stdlib http.server, zero extra deps)
- POST /execute + GET /health; CLI via python -m llm_connect.server
- [server] optional-dep group added to pyproject.toml
- Contract doc at contracts/functional/server.md
- 9 tests: health, round-trip, 400/404/500 errors, config forwarding
- Added "mock" provider to factory for CLI default

All 101 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
53
contracts/functional/routing-policy.md
Normal file
@@ -0,0 +1,53 @@
# Contract: RoutingPolicy

**layer:** Functional
**maturity:** Beta
**module:** `llm_connect.routing`
**since:** WP-0003

## Purpose

Route logical task types to concrete `LLMAdapter` instances based on a
prioritised rule list, with optional per-rule cost-cap fallback.

## Public surface

```python
@dataclass
class RoutingRule:
    task_type: str
    prefer: LLMAdapter
    max_cost_per_1k: Optional[float] = None  # USD per 1 000 tokens
    fallback: Optional[LLMAdapter] = None

@dataclass
class RoutingPolicy:
    rules: List[RoutingRule] = field(default_factory=list)
    default: Optional[LLMAdapter] = None

    def resolve(
        self,
        task_type: str,
        estimated_cost_per_1k: Optional[float] = None,
    ) -> LLMAdapter: ...
```

## Invariants

1. Rules are evaluated in list order; the first rule whose `task_type` matches wins.
2. When `estimated_cost_per_1k` is supplied and a matching rule has `max_cost_per_1k` set:
   - If `estimated_cost_per_1k > max_cost_per_1k` **and** `fallback is not None` → return `fallback`.
   - Otherwise → return `prefer` (no fallback configured or cost within cap).
3. When no rule matches and `default is not None` → return `default`.
4. When no rule matches and `default is None` → raise `LookupError`.
5. `resolve()` never mutates policy state.
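
The invariants above can be exercised with a small self-contained sketch. It does not import `llm_connect`: plain strings stand in for `LLMAdapter` instances, and `SketchRule` / `SketchPolicy` are illustrative names that only mirror the documented behaviour of `RoutingRule` / `RoutingPolicy`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchRule:
    # Mirrors RoutingRule; strings stand in for LLMAdapter instances.
    task_type: str
    prefer: str
    max_cost_per_1k: Optional[float] = None
    fallback: Optional[str] = None


@dataclass
class SketchPolicy:
    rules: List[SketchRule] = field(default_factory=list)
    default: Optional[str] = None

    def resolve(self, task_type: str, estimated_cost_per_1k: Optional[float] = None) -> str:
        for rule in self.rules:
            if rule.task_type != task_type:
                continue
            over_cap = (
                estimated_cost_per_1k is not None
                and rule.max_cost_per_1k is not None
                and estimated_cost_per_1k > rule.max_cost_per_1k
            )
            # Invariant 2: fall back only when over the cap AND a fallback exists.
            if over_cap and rule.fallback is not None:
                return rule.fallback
            return rule.prefer
        if self.default is not None:
            return self.default
        raise LookupError(f"No routing rule for task_type={task_type!r} and no default configured")


policy = SketchPolicy(
    rules=[
        SketchRule("triage", prefer="fast", max_cost_per_1k=0.5, fallback="cheap"),
        SketchRule("triage", prefer="shadowed"),  # never reached: first match wins
        SketchRule("analysis", prefer="smart", max_cost_per_1k=0.1),  # cap but no fallback
    ],
    default="default",
)

results = {
    "no_estimate": policy.resolve("triage"),
    "within_cap": policy.resolve("triage", estimated_cost_per_1k=0.5),
    "over_cap": policy.resolve("triage", estimated_cost_per_1k=0.6),
    "over_cap_no_fallback": policy.resolve("analysis", estimated_cost_per_1k=5.0),
    "unknown_type": policy.resolve("translate"),
}
```

Note that an estimate exactly equal to the cap still resolves to `prefer`, because the comparison is strict (`>`).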

## Error contract

| Condition | Exception |
|-----------|-----------|
| No matching rule, no default | `LookupError` |

## Known consumers

- `inter-hub` (IHUB-WP-0012 Phase 11): uses `RoutingPolicy` to select federation adapters per task class.
85
contracts/functional/server.md
Normal file
@@ -0,0 +1,85 @@
# Contract: HTTP Serve Mode

**layer:** Functional
**maturity:** Beta
**module:** `llm_connect.server`
**since:** WP-0003

## Purpose

Expose any `LLMAdapter` as a lightweight HTTP service. Intended for
local/inter-process use; not hardened for public internet exposure.

## API endpoints

### `GET /health`

Liveness probe.

**Response 200**

```json
{"status": "ok"}
```

---

### `POST /execute`

Execute a prompt through the configured adapter.

**Request body** (JSON)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | yes | Prompt text |
| `config` | object | no | `RunConfig` overrides (see below) |

`config` sub-fields (all optional, defaults match `RunConfig` defaults):

| Field | Type | Default |
|-------|------|---------|
| `model_name` | string | `"gpt-4"` |
| `temperature` | float | `0.7` |
| `max_tokens` | int | `2000` |
| `timeout_seconds` | int | `300` |
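
As a rough illustration of how partial `config` overrides combine with these defaults, the sketch below overlays a request's `config` object on the documented values and coerces the numeric fields. `EXECUTE_DEFAULTS` and `merge_config` are hypothetical names used only here; the real handler builds a `RunConfig` rather than a dict.

```python
from typing import Any, Dict, Optional

# Defaults copied from the table above; the constant and function names are illustrative.
EXECUTE_DEFAULTS: Dict[str, Any] = {
    "model_name": "gpt-4",
    "temperature": 0.7,
    "max_tokens": 2000,
    "timeout_seconds": 300,
}


def merge_config(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Overlay request overrides on the documented defaults, coercing numeric types."""
    merged = {**EXECUTE_DEFAULTS, **(overrides or {})}
    # Only the four documented fields are kept; unknown keys are dropped.
    return {
        "model_name": merged["model_name"],
        "temperature": float(merged["temperature"]),
        "max_tokens": int(merged["max_tokens"]),
        "timeout_seconds": int(merged["timeout_seconds"]),
    }


partial = merge_config({"model_name": "gpt-3.5-turbo", "max_tokens": "100"})
```

Here `partial` keeps the default `temperature` and `timeout_seconds` while taking the two overridden fields, with `max_tokens` coerced to an integer.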

**Response 200** — `LLMResponse.to_dict()` shape

```json
{
  "content": "...",
  "model": "...",
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
  "finish_reason": "stop",
  "metadata": {}
}
```

**Error responses**

| HTTP | Condition |
|------|-----------|
| 400 | Missing or empty `prompt` field, or invalid JSON body |
| 404 | Unknown path |
| 500 | Adapter raised an exception |
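
A client request matching this schema might be built as follows, using only the standard library. This is a sketch: `build_execute_request` is a hypothetical helper, and the base URL assumes a server on the default `127.0.0.1:8080`.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_execute_request(
    base_url: str,
    prompt: str,
    config: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /execute request for the given server."""
    body: Dict[str, Any] = {"prompt": prompt}
    if config:
        body["config"] = config
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The base URL is an assumption: it presumes a server on the default 127.0.0.1:8080.
request = build_execute_request(
    "http://127.0.0.1:8080",
    "say hello",
    config={"max_tokens": 100},
)
```

Passing the request to `urllib.request.urlopen` sends it. For the 400/404/500 cases in the table, `urlopen` raises `urllib.error.HTTPError`; the status is available as `exc.code` and the JSON error body via `exc.read()`.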

## Implementation notes

- Uses Python stdlib `http.server` — **no additional runtime dependency**.
- The `[server]` optional-dependency group is reserved for future migration
  to `aiohttp`/`starlette` if native async serving is required.
- `LLMServer(adapter, port=0)` binds to an OS-assigned free port; read back
  via `server.port` after `start()`.
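
The `port=0` behaviour comes from the standard library rather than from anything specific to `LLMServer`. A minimal sketch of the underlying mechanism, using `http.server.HTTPServer` directly and serving nothing:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Port 0 asks the OS for any free TCP port; the socket is bound in the constructor.
httpd = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
try:
    bound_host, bound_port = httpd.server_address[:2]
finally:
    httpd.server_close()  # release the socket; nothing was served in this sketch
```

`LLMServer.port` in this commit reads the same `server_address[1]` value from its wrapped `HTTPServer`.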

## CLI

```
python -m llm_connect.server [--host HOST] [--port PORT] [--provider PROVIDER] [--model MODEL]
```

Default provider: `mock`. All registered providers from `create_adapter` are valid.

## Known consumers

- `inter-hub` (IHUB-WP-0012 Phase 11): drives federation calls over HTTP from non-Python services.
@@ -33,6 +33,8 @@ from llm_connect.embedding_adapter import EmbeddingAdapter
 from llm_connect.embedding_openai import OpenAICompatibleEmbeddingAdapter
 from llm_connect.embedding_cache import EmbeddingCache
 from llm_connect.embedding_factory import create_embedding_adapter
+from llm_connect.routing import RoutingPolicy, RoutingRule
+from llm_connect.server import LLMServer
 from llm_connect.similarity import (
     cosine_similarity,
     similarity_matrix,
@@ -67,4 +69,7 @@ __all__ = [
     "cosine_similarity",
     "similarity_matrix",
     "find_similar_pairs",
+    "RoutingPolicy",
+    "RoutingRule",
+    "LLMServer",
 ]
@@ -13,6 +13,7 @@ _PROVIDERS: Dict[str, str] = {
     "claude-code": "llm_connect.claude_code.ClaudeCodeAdapter",
     "gemini": "llm_connect.gemini.GeminiAdapter",
     "openai": "llm_connect.openai.OpenAIAdapter",
+    "mock": "llm_connect.adapter.MockLLMAdapter",
 }


@@ -57,4 +58,4 @@ def create_adapter(
     elif provider == "claude-code":
         return cls(model=model, **kwargs)
     else:
-        return cls(**kwargs)  # pragma: no cover
+        return cls(**kwargs)
89
llm_connect/routing.py
Normal file
@@ -0,0 +1,89 @@
"""
RoutingPolicy — task-type-aware adapter selection (FR-2).

Maps task types to preferred adapters with optional cost-cap fallback.
"""

from dataclasses import dataclass, field
from typing import Optional, List

from llm_connect.adapter import LLMAdapter


@dataclass
class RoutingRule:
    """Single routing rule binding a task type to an adapter.

    Attributes:
        task_type: Logical task identifier (e.g. ``"triage"``, ``"summarise"``).
        prefer: Adapter to use when this rule matches.
        max_cost_per_1k: Optional cost ceiling (USD per 1 000 tokens). When the
            caller supplies ``estimated_cost_per_1k`` to :meth:`RoutingPolicy.resolve`
            and it exceeds this cap, *fallback* is returned instead of *prefer*.
        fallback: Adapter to use when the cost cap is breached.
    """

    task_type: str
    prefer: LLMAdapter
    max_cost_per_1k: Optional[float] = None
    fallback: Optional[LLMAdapter] = None


@dataclass
class RoutingPolicy:
    """Route task types to LLM adapters.

    Rules are evaluated in order; the first match wins. When no rule matches,
    *default* is returned. If *default* is also absent, ``LookupError`` is raised.

    Example::

        policy = RoutingPolicy(
            rules=[
                RoutingRule("triage", prefer=fast_adapter, max_cost_per_1k=0.5, fallback=cheap_adapter),
                RoutingRule("analysis", prefer=smart_adapter),
            ],
            default=cheap_adapter,
        )
        adapter = policy.resolve("triage")
    """

    rules: List[RoutingRule] = field(default_factory=list)
    default: Optional[LLMAdapter] = None

    def resolve(
        self,
        task_type: str,
        estimated_cost_per_1k: Optional[float] = None,
    ) -> LLMAdapter:
        """Return the adapter for *task_type*.

        Args:
            task_type: Logical task identifier.
            estimated_cost_per_1k: Caller-supplied cost estimate (USD / 1k tokens).
                When provided and a matching rule has ``max_cost_per_1k`` set, the
                rule's ``fallback`` is returned if the estimate exceeds the cap.

        Returns:
            The selected :class:`~llm_connect.adapter.LLMAdapter`.

        Raises:
            LookupError: No matching rule and no *default* configured.
        """
        for rule in self.rules:
            if rule.task_type == task_type:
                if (
                    estimated_cost_per_1k is not None
                    and rule.max_cost_per_1k is not None
                    and estimated_cost_per_1k > rule.max_cost_per_1k
                    and rule.fallback is not None
                ):
                    return rule.fallback
                return rule.prefer

        if self.default is not None:
            return self.default

        raise LookupError(
            f"No routing rule for task_type={task_type!r} and no default configured"
        )
164
llm_connect/server.py
Normal file
@@ -0,0 +1,164 @@
"""
Minimal HTTP server for llm_connect — serve mode (FR-1).

Exposes:
    POST /execute — run a prompt through the configured adapter
    GET /health — liveness probe

Usage (programmatic)::

    from llm_connect import MockLLMAdapter
    from llm_connect.server import LLMServer

    server = LLMServer(adapter=MockLLMAdapter(), port=8080)
    server.start()  # background thread
    # ...
    server.stop()

Usage (CLI)::

    python -m llm_connect.server --port 8080 --provider openrouter --model anthropic/claude-sonnet-4
"""

import argparse
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

from llm_connect.adapter import LLMAdapter
from llm_connect.models import RunConfig


class _Handler(BaseHTTPRequestHandler):
    """Request handler — adapter injected via server.adapter."""

    def log_message(self, format, *args):  # suppress default access log
        pass

    # ── GET ────────────────────────────────────────────────────────

    def do_GET(self):
        if self.path == "/health":
            self._respond(200, {"status": "ok"})
        else:
            self._respond(404, {"error": "not found"})

    # ── POST ───────────────────────────────────────────────────────

    def do_POST(self):
        if self.path != "/execute":
            self._respond(404, {"error": "not found"})
            return

        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, ValueError):
            self._respond(400, {"error": "invalid JSON body"})
            return

        prompt = data.get("prompt")
        if not prompt:
            self._respond(400, {"error": "missing required field: 'prompt'"})
            return

        cfg = data.get("config", {})
        config = RunConfig(
            model_name=cfg.get("model_name", "gpt-4"),
            temperature=float(cfg.get("temperature", 0.7)),
            max_tokens=int(cfg.get("max_tokens", 2000)),
            timeout_seconds=int(cfg.get("timeout_seconds", 300)),
        )

        try:
            response = self.server.adapter.execute_prompt(prompt, config)  # type: ignore[attr-defined]
            self._respond(200, response.to_dict())
        except Exception as exc:
            self._respond(500, {"error": str(exc)})

    # ── helpers ────────────────────────────────────────────────────

    def _respond(self, status: int, body: dict) -> None:
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


class LLMServer:
    """HTTP server wrapping an :class:`~llm_connect.adapter.LLMAdapter`.

    Args:
        adapter: The adapter that handles ``POST /execute`` requests.
        host: Bind address (default ``"127.0.0.1"``).
        port: TCP port (default ``8080``; ``0`` picks a free port).
    """

    def __init__(
        self,
        adapter: LLMAdapter,
        host: str = "127.0.0.1",
        port: int = 8080,
    ) -> None:
        self._httpd = HTTPServer((host, port), _Handler)
        self._httpd.adapter = adapter  # type: ignore[attr-defined]
        self._thread: Optional[threading.Thread] = None

    @property
    def port(self) -> int:
        """Actual bound port (useful when ``port=0`` was requested)."""
        return self._httpd.server_address[1]

    @property
    def host(self) -> str:
        return self._httpd.server_address[0]

    def start(self) -> None:
        """Start serving in a daemon background thread."""
        self._thread = threading.Thread(target=self._httpd.serve_forever, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        """Shut down the server and join the background thread."""
        self._httpd.shutdown()
        if self._thread is not None:
            self._thread.join()

    def serve_forever(self) -> None:
        """Block the calling thread until interrupted."""
        self._httpd.serve_forever()


# ── CLI entry point ────────────────────────────────────────────────────────────

def _build_adapter(provider: str, model: Optional[str]) -> LLMAdapter:
    from llm_connect.factory import create_adapter
    return create_adapter(provider, model=model)


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(
        prog="python -m llm_connect.server",
        description="Start llm_connect HTTP serve mode.",
    )
    parser.add_argument("--port", type=int, default=8080, help="TCP port (default: 8080)")
    parser.add_argument("--host", default="127.0.0.1", help="Bind address (default: 127.0.0.1)")
    parser.add_argument("--provider", default="mock", help="Provider name passed to create_adapter")
    parser.add_argument("--model", default=None, help="Model name (optional)")
    args = parser.parse_args(argv)

    adapter = _build_adapter(args.provider, args.model)
    server = LLMServer(adapter=adapter, host=args.host, port=args.port)
    print(f"llm_connect server listening on http://{args.host}:{args.port}")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print("\nShutting down.")


if __name__ == "__main__":
    main()
@@ -17,6 +17,8 @@ dev = [
     "ruff>=0.4",
     "mypy>=1.10",
 ]
+# serve mode uses stdlib http.server — no additional runtime dependency required
+server = []

 [tool.setuptools.packages.find]
 where = ["."]
@@ -66,7 +66,7 @@ class TestCreateAdapter:
         assert isinstance(adapter, ClaudeCodeAdapter)

     def test_all_known_providers_are_reachable(self):
-        known = {"openrouter", "openai", "gemini", "claude-code"}
+        known = {"openrouter", "openai", "gemini", "claude-code", "mock"}
         # Just verify each key is in the factory registry (no construction needed)
         from llm_connect.factory import _PROVIDERS
         assert known == set(_PROVIDERS.keys())
91
tests/test_routing.py
Normal file
@@ -0,0 +1,91 @@
"""
Tests for RoutingPolicy (FR-2).
"""

import pytest

from llm_connect.routing import RoutingPolicy, RoutingRule
from llm_connect.adapter import MockLLMAdapter


class TestRoutingPolicy:
    def _adapters(self, n: int = 3):
        return [MockLLMAdapter(mock_response=f"resp-{i}") for i in range(n)]

    def test_rule_match_returns_prefer(self):
        prefer, *_ = self._adapters()
        policy = RoutingPolicy(rules=[RoutingRule("triage", prefer=prefer)])
        assert policy.resolve("triage") is prefer

    def test_first_matching_rule_wins(self):
        a, b = self._adapters(2)
        policy = RoutingPolicy(rules=[
            RoutingRule("triage", prefer=a),
            RoutingRule("triage", prefer=b),
        ])
        assert policy.resolve("triage") is a

    def test_cost_cap_within_limit_returns_prefer(self):
        prefer, fallback = self._adapters(2)
        policy = RoutingPolicy(rules=[
            RoutingRule("triage", prefer=prefer, max_cost_per_1k=1.0, fallback=fallback)
        ])
        assert policy.resolve("triage", estimated_cost_per_1k=0.5) is prefer

    def test_cost_cap_exceeded_returns_fallback(self):
        prefer, fallback = self._adapters(2)
        policy = RoutingPolicy(rules=[
            RoutingRule("triage", prefer=prefer, max_cost_per_1k=1.0, fallback=fallback)
        ])
        assert policy.resolve("triage", estimated_cost_per_1k=2.0) is fallback

    def test_cost_cap_exceeded_no_fallback_returns_prefer(self):
        """When cost exceeds cap but no fallback is set, still return prefer."""
        prefer, *_ = self._adapters()
        policy = RoutingPolicy(rules=[
            RoutingRule("triage", prefer=prefer, max_cost_per_1k=0.1)
        ])
        assert policy.resolve("triage", estimated_cost_per_1k=5.0) is prefer

    def test_no_estimated_cost_ignores_cap(self):
        prefer, fallback = self._adapters(2)
        policy = RoutingPolicy(rules=[
            RoutingRule("triage", prefer=prefer, max_cost_per_1k=0.01, fallback=fallback)
        ])
        # No cost estimate → cap not applied
        assert policy.resolve("triage") is prefer

    def test_unknown_task_type_returns_default(self):
        prefer, default = self._adapters(2)
        policy = RoutingPolicy(
            rules=[RoutingRule("triage", prefer=prefer)],
            default=default,
        )
        assert policy.resolve("unknown") is default

    def test_no_match_no_default_raises_lookup_error(self):
        prefer, *_ = self._adapters()
        policy = RoutingPolicy(rules=[RoutingRule("triage", prefer=prefer)])
        with pytest.raises(LookupError, match="unknown"):
            policy.resolve("unknown")

    def test_empty_rules_with_default_returns_default(self):
        default, *_ = self._adapters()
        policy = RoutingPolicy(default=default)
        assert policy.resolve("anything") is default

    def test_empty_policy_raises(self):
        policy = RoutingPolicy()
        with pytest.raises(LookupError):
            policy.resolve("triage")

    def test_multiple_task_types(self):
        a, b, c = self._adapters(3)
        policy = RoutingPolicy(rules=[
            RoutingRule("fast", prefer=a),
            RoutingRule("smart", prefer=b),
            RoutingRule("cheap", prefer=c),
        ])
        assert policy.resolve("fast") is a
        assert policy.resolve("smart") is b
        assert policy.resolve("cheap") is c
134
tests/test_server.py
Normal file
@@ -0,0 +1,134 @@
"""
Tests for LLMServer HTTP serve mode (FR-1).
"""

import json
import urllib.error
import urllib.request

import pytest

from llm_connect.adapter import MockLLMAdapter, ErrorLLMAdapter
from llm_connect.models import RunConfig
from llm_connect.server import LLMServer


@pytest.fixture()
def server():
    """Start a server on a free port; stop after each test."""
    s = LLMServer(adapter=MockLLMAdapter(mock_response="hello world"), port=0)
    s.start()
    yield s
    s.stop()


def _get(url: str) -> tuple[int, dict]:
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as exc:
        return exc.code, json.loads(exc.read())


def _post(url: str, body: dict) -> tuple[int, dict]:
    payload = json.dumps(body).encode()
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as exc:
        return exc.code, json.loads(exc.read())


class TestHealth:
    def test_health_returns_200(self, server):
        status, body = _get(f"http://127.0.0.1:{server.port}/health")
        assert status == 200
        assert body["status"] == "ok"

    def test_unknown_get_returns_404(self, server):
        status, body = _get(f"http://127.0.0.1:{server.port}/nope")
        assert status == 404


class TestExecute:
    def test_post_execute_round_trip(self, server):
        status, body = _post(
            f"http://127.0.0.1:{server.port}/execute",
            {"prompt": "say hello"},
        )
        assert status == 200
        assert body["content"] == "hello world"
        assert body["finish_reason"] == "stop"

    def test_response_includes_usage(self, server):
        status, body = _post(
            f"http://127.0.0.1:{server.port}/execute",
            {"prompt": "count tokens"},
        )
        assert status == 200
        assert "usage" in body
        assert body["usage"]["total_tokens"] > 0

    def test_missing_prompt_returns_400(self, server):
        status, body = _post(
            f"http://127.0.0.1:{server.port}/execute",
            {"config": {}},
        )
        assert status == 400
        assert "prompt" in body["error"]

    def test_invalid_json_returns_400(self, server):
        req = urllib.request.Request(
            f"http://127.0.0.1:{server.port}/execute",
            data=b"not json",
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req) as resp:
                status, body = resp.status, json.loads(resp.read())
        except urllib.error.HTTPError as exc:
            status, body = exc.code, json.loads(exc.read())
        assert status == 400

    def test_unknown_post_path_returns_404(self, server):
        status, body = _post(
            f"http://127.0.0.1:{server.port}/wrong",
            {"prompt": "hi"},
        )
        assert status == 404

    def test_adapter_error_returns_500(self):
        s = LLMServer(adapter=ErrorLLMAdapter("boom"), port=0)
        s.start()
        try:
            status, body = _post(
                f"http://127.0.0.1:{s.port}/execute",
                {"prompt": "hello"},
            )
            assert status == 500
            assert "boom" in body["error"]
        finally:
            s.stop()

    def test_config_fields_forwarded(self):
        """Config fields in request body reach the adapter via RunConfig."""
        adapter = MockLLMAdapter(mock_response="x")
        s = LLMServer(adapter=adapter, port=0)
        s.start()
        try:
            status, body = _post(
                f"http://127.0.0.1:{s.port}/execute",
                {"prompt": "hi", "config": {"model_name": "gpt-3.5-turbo", "max_tokens": 100}},
            )
            assert status == 200
            assert adapter.last_config.model_name == "gpt-3.5-turbo"
            assert adapter.last_config.max_tokens == 100
        finally:
            s.stop()
@@ -1,6 +1,6 @@
 # LLM-WP-0003 — Functional Extensions (FR-2 + FR-1)

-**status:** active
+**status:** done
 **owner:** llm-connect
 **repo:** llm-connect
 **created:** 2026-04-01
@@ -26,22 +26,22 @@ Both additions are Functional-layer under GAAF-2026:

 | ID | Title | Priority | Status |
 |-----|-------|----------|--------|
-| T01 | `RoutingPolicy` data model: `rules` list with `task_type`, `prefer`, `max_cost_per_1k`, `fallback` | high | todo |
-| T02 | `policy.resolve(task_type)` → returns configured `LLMAdapter` | high | todo |
-| T03 | Export from `llm_connect.__init__` and update `__all__` | medium | todo |
-| T04 | Functional contract doc for `RoutingPolicy` | medium | todo |
-| T05 | Tests: rule match, cost-cap fallback, unknown task_type fallback, no-match default | high | todo |
+| T01 | `RoutingPolicy` data model: `rules` list with `task_type`, `prefer`, `max_cost_per_1k`, `fallback` | high | done |
+| T02 | `policy.resolve(task_type)` → returns configured `LLMAdapter` | high | done |
+| T03 | Export from `llm_connect.__init__` and update `__all__` | medium | done |
+| T04 | Functional contract doc for `RoutingPolicy` | medium | done |
+| T05 | Tests: rule match, cost-cap fallback, unknown task_type fallback, no-match default | high | done |

 ### FR-1 — HTTP serve mode

 | ID | Title | Priority | Status |
 |-----|-------|----------|--------|
-| T06 | Design `/execute` JSON schema (request: provider, model, prompt, config; response: LLMResponse fields) | high | todo |
-| T07 | Implement `llm_connect/server.py` — minimal HTTP server, `POST /execute`, `GET /health` | high | todo |
-| T08 | `python -m llm_connect.server --port N --provider X --model Y` CLI entry point | high | todo |
-| T09 | Add `httpx` or `aiohttp` server dep under `[project.optional-dependencies] server` | medium | todo |
-| T10 | Functional contract doc (API schema — request/response shapes, error codes) | medium | todo |
-| T11 | Tests: spin up server in subprocess or via `TestClient`, POST round-trip (MockAdapter), error responses | high | todo |
+| T06 | Design `/execute` JSON schema (request: provider, model, prompt, config; response: LLMResponse fields) | high | done |
+| T07 | Implement `llm_connect/server.py` — minimal HTTP server, `POST /execute`, `GET /health` | high | done |
+| T08 | `python -m llm_connect.server --port N --provider X --model Y` CLI entry point | high | done |
+| T09 | Add `httpx` or `aiohttp` server dep under `[project.optional-dependencies] server` | medium | done |
+| T10 | Functional contract doc (API schema — request/response shapes, error codes) | medium | done |
+| T11 | Tests: spin up server in subprocess or via `TestClient`, POST round-trip (MockAdapter), error responses | high | done |

 ## Exit criteria