Files
llm-connect/FEATURE_REQUESTS.md
2026-04-01 21:08:15 +00:00

3.4 KiB
Raw Blame History

llm-connect Feature Requests

Raised by: IHF Phase 11 — Advanced AI Federation (IHUB-WP-0012) Date: 2026-04-01

These gaps were identified during integration of llm-connect into the Interaction Hub Framework (IHF) as a subprocess bridge for multi-agent federation. None are blockers for Phase 11, but they affect performance and architectural elegance.


FR-1 — HTTP/JSON-RPC serve mode

Problem: The current architecture requires spawning a new python3 scripts/llm_bridge.py process for every agent invocation. This adds significant overhead in production when collective proposals invoke 35 agents in sequence.

Proposed API:

python -m llm_connect.server --port 9999

IHP (Haskell) would call POST localhost:9999/execute with the same JSON payload the bridge script currently reads from stdin.

Impact: Eliminates process spawn overhead. A single persistent server process handles all requests in the session lifetime.


FR-2 — RoutingPolicy class for declarative provider/model selection

Problem: RunConfig.model_name is the only selection mechanism. IHF needs declarative routing rules — e.g. "for triage tasks, prefer openrouter/claude-haiku-4-5; fall back to gemini if cost exceeds 0.5/1k tokens; never use auto_apply trust agents for autonomous actions".

Proposed API:

from llm_connect import RoutingPolicy

policy = RoutingPolicy(rules=[
    {
        "task_type": "triage",
        "prefer": [{"provider": "openrouter", "model": "claude-haiku-4-5"}],
        "max_cost_per_1k": 0.5,
        "fallback": {"provider": "gemini", "model": "gemini-flash-1.5"},
    }
])
adapter = policy.resolve(task_type="triage")

Impact: Moves routing logic into llm-connect instead of duplicating it in every consumer (currently IHF implements this in ModelRouter.hs).


FR-3 — async_execute_prompt() for concurrent execution

Problem: Collective proposals invoke agents sequentially because execute_prompt is synchronous. With 35 agents this is 35× slower than necessary.

Proposed API:

import asyncio
from llm_connect import create_adapter

async def main():
    adapters = [create_adapter(...) for _ in agents]
    responses = await asyncio.gather(*[
        a.async_execute_prompt(prompt, config) for a in adapters
    ])

Standard asyncio coroutine interface, same signature as execute_prompt.

Impact: Collective proposal latency scales with the slowest agent rather than the sum of all agent latencies.


FR-4 — BudgetTracker for delegation chains

Problem: IHF's inter-agent delegation model enforces token budgets at the Haskell layer (AgentDelegation.tokenBudget), but the bridge itself has no concept of a shared budget. A delegation chain (A → B → C) cannot enforce that the total token spend stays below a cap set by A.

Proposed API:

from llm_connect import BudgetTracker, RunConfig

tracker = BudgetTracker(total=4000)
config = RunConfig(model_name="...", budget_tracker=tracker)
# Subsequent calls on any adapter sharing this tracker will raise
# LLMBudgetExceededError if the cumulative spend exceeds 4000 tokens.
resp = adapter.execute_prompt(prompt, config)

LLMBudgetExceededError should be a subclass of LLMError so existing error handling catches it.

Impact: Budget enforcement moves into the bridge layer where it can be applied uniformly across all providers, rather than requiring each consumer to track it manually.