llm-connect Feature Requests

Raised by: IHF Phase 11 — Advanced AI Federation (IHUB-WP-0012) Date: 2026-04-01

These gaps were identified during integration of llm-connect into the Interaction Hub Framework (IHF) as a subprocess bridge for multi-agent federation. None are blockers for Phase 11, but they affect performance and architectural elegance.

FR-1 — HTTP/JSON-RPC serve mode

Problem: The current architecture requires spawning a new python3 scripts/llm_bridge.py process for every agent invocation. This adds significant overhead in production when collective proposals invoke 3–5 agents in sequence.

Proposed API:

python -m llm_connect.server --port 9999

IHP (Haskell) would call POST localhost:9999/execute with the same JSON payload the bridge script currently reads from stdin.

Impact: Eliminates process spawn overhead. A single persistent server process handles all requests in the session lifetime.

FR-2 — `RoutingPolicy` class for declarative provider/model selection

Problem: RunConfig.model_name is the only selection mechanism. IHF needs declarative routing rules — e.g. "for triage tasks, prefer openrouter/claude-haiku-4-5; fall back to gemini if cost exceeds 0.5/1k tokens; never use auto_apply trust agents for autonomous actions".

Proposed API:

from llm_connect import RoutingPolicy

policy = RoutingPolicy(rules=[
    {
        "task_type": "triage",
        "prefer": [{"provider": "openrouter", "model": "claude-haiku-4-5"}],
        "max_cost_per_1k": 0.5,
        "fallback": {"provider": "gemini", "model": "gemini-flash-1.5"},
    }
])
adapter = policy.resolve(task_type="triage")

Impact: Moves routing logic into llm-connect instead of duplicating it in every consumer (currently IHF implements this in ModelRouter.hs).

FR-3 — `async_execute_prompt()` for concurrent execution

Problem: Collective proposals invoke agents sequentially because execute_prompt is synchronous. With 3–5 agents this is 3–5× slower than necessary.

Proposed API:

import asyncio
from llm_connect import create_adapter

async def main():
    adapters = [create_adapter(...) for _ in agents]
    responses = await asyncio.gather(*[
        a.async_execute_prompt(prompt, config) for a in adapters
    ])

Standard asyncio coroutine interface, same signature as execute_prompt.

Impact: Collective proposal latency scales with the slowest agent rather than the sum of all agent latencies.

FR-4 — `BudgetTracker` for delegation chains

Problem: IHF's inter-agent delegation model enforces token budgets at the Haskell layer (AgentDelegation.tokenBudget), but the bridge itself has no concept of a shared budget. A delegation chain (A → B → C) cannot enforce that the total token spend stays below a cap set by A.

Proposed API:

from llm_connect import BudgetTracker, RunConfig

tracker = BudgetTracker(total=4000)
config = RunConfig(model_name="...", budget_tracker=tracker)
# Subsequent calls on any adapter sharing this tracker will raise
# LLMBudgetExceededError if the cumulative spend exceeds 4000 tokens.
resp = adapter.execute_prompt(prompt, config)

LLMBudgetExceededError should be a subclass of LLMError so existing error handling catches it.

Impact: Budget enforcement moves into the bridge layer where it can be applied uniformly across all providers, rather than requiring each consumer to track it manually.

3.4 KiB Raw Blame History Unescape Escape

llm-connect Feature Requests

FR-1 — HTTP/JSON-RPC serve mode

FR-2 — RoutingPolicy class for declarative provider/model selection

FR-3 — async_execute_prompt() for concurrent execution

FR-4 — BudgetTracker for delegation chains

3.4 KiB

Raw Blame History

FR-2 — `RoutingPolicy` class for declarative provider/model selection

FR-3 — `async_execute_prompt()` for concurrent execution

FR-4 — `BudgetTracker` for delegation chains