diff --git a/FEATURE_REQUESTS.md b/FEATURE_REQUESTS.md new file mode 100644 index 0000000..e1b993f --- /dev/null +++ b/FEATURE_REQUESTS.md @@ -0,0 +1,107 @@ +# llm-connect Feature Requests + +Raised by: IHF Phase 11 — Advanced AI Federation (IHUB-WP-0012) +Date: 2026-04-01 + +These gaps were identified during integration of llm-connect into the +Interaction Hub Framework (IHF) as a subprocess bridge for multi-agent +federation. None are blockers for Phase 11, but they affect performance +and architectural elegance. + +--- + +## FR-1 — HTTP/JSON-RPC serve mode + +**Problem:** The current architecture requires spawning a new `python3 +scripts/llm_bridge.py` process for every agent invocation. This adds +significant overhead in production when collective proposals invoke 3–5 +agents in sequence. + +**Proposed API:** +```bash +python -m llm_connect.server --port 9999 +``` +IHP (Haskell) would call `POST localhost:9999/execute` with the same JSON +payload the bridge script currently reads from stdin. + +**Impact:** Eliminates process spawn overhead. A single persistent server +process handles all requests in the session lifetime. + +--- + +## FR-2 — `RoutingPolicy` class for declarative provider/model selection + +**Problem:** `RunConfig.model_name` is the only selection mechanism. IHF +needs declarative routing rules — e.g. "for triage tasks, prefer +openrouter/claude-haiku-4-5; fall back to gemini if cost exceeds 0.5/1k +tokens; never use auto_apply trust agents for autonomous actions". + +**Proposed API:** +```python +from llm_connect import RoutingPolicy + +policy = RoutingPolicy(rules=[ + { + "task_type": "triage", + "prefer": [{"provider": "openrouter", "model": "claude-haiku-4-5"}], + "max_cost_per_1k": 0.5, + "fallback": {"provider": "gemini", "model": "gemini-flash-1.5"}, + } +]) +adapter = policy.resolve(task_type="triage") +``` + +**Impact:** Moves routing logic into llm-connect instead of duplicating it +in every consumer (currently IHF implements this in `ModelRouter.hs`). + +--- + +## FR-3 — `async_execute_prompt()` for concurrent execution + +**Problem:** Collective proposals invoke agents sequentially because +`execute_prompt` is synchronous. With 3–5 agents this is 3–5× slower than +necessary. + +**Proposed API:** +```python +import asyncio +from llm_connect import create_adapter + +async def main(): + adapters = [create_adapter(...) for _ in agents] + responses = await asyncio.gather(*[ + a.async_execute_prompt(prompt, config) for a in adapters + ]) +``` + +Standard `asyncio` coroutine interface, same signature as `execute_prompt`. + +**Impact:** Collective proposal latency scales with the slowest agent +rather than the sum of all agent latencies. + +--- + +## FR-4 — `BudgetTracker` for delegation chains + +**Problem:** IHF's inter-agent delegation model enforces token budgets at +the Haskell layer (`AgentDelegation.tokenBudget`), but the bridge itself +has no concept of a shared budget. A delegation chain (A → B → C) cannot +enforce that the total token spend stays below a cap set by A. + +**Proposed API:** +```python +from llm_connect import BudgetTracker, RunConfig + +tracker = BudgetTracker(total=4000) +config = RunConfig(model_name="...", budget_tracker=tracker) +# Subsequent calls on any adapter sharing this tracker will raise +# LLMBudgetExceededError if the cumulative spend exceeds 4000 tokens. +resp = adapter.execute_prompt(prompt, config) +``` + +`LLMBudgetExceededError` should be a subclass of `LLMError` so existing +error handling catches it. + +**Impact:** Budget enforcement moves into the bridge layer where it can be +applied uniformly across all providers, rather than requiring each consumer +to track it manually.