Files
llm-connect/FEATURE_REQUESTS.md
2026-04-01 21:08:15 +00:00

108 lines
3.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# llm-connect Feature Requests
Raised by: IHF Phase 11 — Advanced AI Federation (IHUB-WP-0012)
Date: 2026-04-01
These gaps were identified during integration of llm-connect into the
Interaction Hub Framework (IHF) as a subprocess bridge for multi-agent
federation. None are blockers for Phase 11, but they affect performance
and architectural elegance.
---
## FR-1 — HTTP/JSON-RPC serve mode
**Problem:** The current architecture requires spawning a new `python3
scripts/llm_bridge.py` process for every agent invocation. This adds
significant overhead in production when collective proposals invoke 35
agents in sequence.
**Proposed API:**
```bash
python -m llm_connect.server --port 9999
```
IHP (Haskell) would call `POST localhost:9999/execute` with the same JSON
payload the bridge script currently reads from stdin.
**Impact:** Eliminates process spawn overhead. A single persistent server
process handles all requests in the session lifetime.
---
## FR-2 — `RoutingPolicy` class for declarative provider/model selection
**Problem:** `RunConfig.model_name` is the only selection mechanism. IHF
needs declarative routing rules — e.g. "for triage tasks, prefer
openrouter/claude-haiku-4-5; fall back to gemini if cost exceeds 0.5/1k
tokens; never use auto_apply trust agents for autonomous actions".
**Proposed API:**
```python
from llm_connect import RoutingPolicy
policy = RoutingPolicy(rules=[
{
"task_type": "triage",
"prefer": [{"provider": "openrouter", "model": "claude-haiku-4-5"}],
"max_cost_per_1k": 0.5,
"fallback": {"provider": "gemini", "model": "gemini-flash-1.5"},
}
])
adapter = policy.resolve(task_type="triage")
```
**Impact:** Moves routing logic into llm-connect instead of duplicating it
in every consumer (currently IHF implements this in `ModelRouter.hs`).
---
## FR-3 — `async_execute_prompt()` for concurrent execution
**Problem:** Collective proposals invoke agents sequentially because
`execute_prompt` is synchronous. With 35 agents this is 35× slower than
necessary.
**Proposed API:**
```python
import asyncio
from llm_connect import create_adapter
async def main():
adapters = [create_adapter(...) for _ in agents]
responses = await asyncio.gather(*[
a.async_execute_prompt(prompt, config) for a in adapters
])
```
Standard `asyncio` coroutine interface, same signature as `execute_prompt`.
**Impact:** Collective proposal latency scales with the slowest agent
rather than the sum of all agent latencies.
---
## FR-4 — `BudgetTracker` for delegation chains
**Problem:** IHF's inter-agent delegation model enforces token budgets at
the Haskell layer (`AgentDelegation.tokenBudget`), but the bridge itself
has no concept of a shared budget. A delegation chain (A → B → C) cannot
enforce that the total token spend stays below a cap set by A.
**Proposed API:**
```python
from llm_connect import BudgetTracker, RunConfig
tracker = BudgetTracker(total=4000)
config = RunConfig(model_name="...", budget_tracker=tracker)
# Subsequent calls on any adapter sharing this tracker will raise
# LLMBudgetExceededError if the cumulative spend exceeds 4000 tokens.
resp = adapter.execute_prompt(prompt, config)
```
`LLMBudgetExceededError` should be a subclass of `LLMError` so existing
error handling catches it.
**Impact:** Budget enforcement moves into the bridge layer where it can be
applied uniformly across all providers, rather than requiring each consumer
to track it manually.