llm-connect/docs/infospace-bench-adaptive-routing.md
tegwick c4ad4bb9f2
Add adaptive cost-quality routing primitives
2026-05-17 21:32:27 +02:00

Infospace-Bench Adaptive Routing Guide

This guide shows how a consumer such as infospace-bench can wire task-type stages into the adaptive cost-quality primitives from llm-connect.

Stage taxonomy

The consumer owns task names and quality thresholds. A first pass for infospace-bench could use:

Stage                   Task type           Suggested floor
Source chapter summary  summarize-source    0.82
Entity extraction       extract-entities    0.88
Relation extraction     extract-relations   0.86
Entity evaluation       evaluate-entity     0.90
Report synthesis        synthesize-report   0.92

These floors are starting points, not library defaults. Raise them for stages whose errors compound downstream.
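
One way a consumer might hold these floors is a plain mapping keyed by task type. The sketch below is illustrative only: `STAGE_FLOORS` and `floor_for` are hypothetical consumer-side names, not part of llm-connect, and the `bump` parameter is just one possible way to raise a floor for a stage whose errors compound downstream.

```python
# Hypothetical consumer-side configuration; values mirror the table above.
STAGE_FLOORS = {
    "summarize-source": 0.82,
    "extract-entities": 0.88,
    "extract-relations": 0.86,
    "evaluate-entity": 0.90,
    "synthesize-report": 0.92,
}


def floor_for(task_type: str, bump: float = 0.0) -> float:
    """Return the floor for a task type, optionally raised by `bump` and capped at 1.0."""
    if task_type not in STAGE_FLOORS:
        raise KeyError(f"unknown task type: {task_type}")
    return min(1.0, round(STAGE_FLOORS[task_type] + bump, 4))
```

The returned value could then be passed as the `quality_floor` argument when resolving an adapter for that stage.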

Wiring sketch

from llm_connect.grading import ExactMatchJudge, PairedGrader
from llm_connect.quality import QualityLedger
from llm_connect.routing import AdaptiveRoutingPolicy, RoutingRule
from llm_connect.shadowing import ShadowingAdapter

ledger = QualityLedger("quality-ledger.jsonl")
grader = PairedGrader(ExactMatchJudge())

# Adapter instances are assumed to be constructed elsewhere by the consumer.
baseline = claude_code_adapter
cheap = openrouter_cheap_adapter
mid = openrouter_mid_adapter

shadowed_cheap = ShadowingAdapter(
    candidate_adapter=cheap,
    baseline_adapter=baseline,
    grader=grader,
    ledger=ledger,
    task_type="extract-relations",
    adapter_id="openrouter-cheap",
    baseline_adapter_id="claude-code",
    shadow_rate=0.1,
    tags={"prompt_fingerprint": prompt_fingerprint},
)

policy = AdaptiveRoutingPolicy(
    rules=[
        RoutingRule("extract-relations", prefer=baseline, fallback=mid),
    ],
    ledger=ledger,
    adapters_by_id={
        "openrouter-cheap": shadowed_cheap,
        "openrouter-mid": mid,
        "claude-code": baseline,
    },
    window_size=20,
    min_observations=3,
)

# Resolve the adapter for this stage; prompt and run_config come from the calling stage.
adapter = policy.resolve("extract-relations", quality_floor=0.86)
response = adapter.execute_prompt(prompt, run_config)
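
To make the role of shadow_rate concrete, here is a library-independent sketch of shadow sampling. It is not the implementation of ShadowingAdapter: the function name, the callable-based interface, and the choice to return the candidate's output are all assumptions made for illustration.

```python
import random
from typing import Callable, Optional


def run_with_shadow(
    prompt: str,
    candidate: Callable[[str], str],
    baseline: Callable[[str], str],
    grade: Callable[[str, str], float],
    record: Callable[[float], None],
    shadow_rate: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Run the candidate; on a sampled fraction of calls, also run the baseline and record a score."""
    rng = rng or random.Random()
    candidate_output = candidate(prompt)
    # rng.random() is in [0.0, 1.0), so shadow_rate=0.0 never shadows and 1.0 always does.
    if rng.random() < shadow_rate:
        baseline_output = baseline(prompt)
        record(grade(candidate_output, baseline_output))
    return candidate_output
```

At a shadow_rate of 0.1, roughly one call in ten pays for a second baseline request, which is the cost of collecting paired observations.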

Operating loop

  1. Start with static routing to the trusted baseline or mid-tier adapter.
  2. Wrap cheaper candidates with ShadowingAdapter at a conservative shadow_rate, for example 0.05 to 0.1.
  3. Record a prompt fingerprint or template version in tags so later prompt changes do not mix incompatible observations.
  4. Increase min_observations for stages with high variance.
  5. Let AdaptiveRoutingPolicy select the cheapest adapter that clears each stage floor.
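
The selection in step 5 can be pictured with a small standalone sketch. It is only an approximation of the idea, not the logic inside AdaptiveRoutingPolicy: `pick_cheapest`, the use of a mean over the most recent window, and the explicit cheapest-first `cost_order` list are assumptions.

```python
from statistics import mean
from typing import Dict, List


def pick_cheapest(
    scores_by_adapter: Dict[str, List[float]],
    cost_order: List[str],
    floor: float,
    fallback: str,
    window_size: int = 20,
    min_observations: int = 3,
) -> str:
    """Return the first adapter in cheapest-first order whose recent mean score clears the floor."""
    for adapter_id in cost_order:
        recent = scores_by_adapter.get(adapter_id, [])[-window_size:]
        # Skip adapters with too little evidence, however good their scores look.
        if len(recent) >= min_observations and mean(recent) >= floor:
            return adapter_id
    return fallback


scores = {
    "openrouter-cheap": [0.90, 0.80, 0.70],
    "openrouter-mid": [0.90, 0.90, 0.88],
}
choice = pick_cheapest(
    scores,
    cost_order=["openrouter-cheap", "openrouter-mid"],
    floor=0.86,
    fallback="claude-code",
)
```

Here the cheap adapter averages 0.80 and misses the 0.86 floor, so the mid-tier adapter is chosen. Raising min_observations, as in step 4, makes the rule slower to trust a new candidate.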

Refresh rules

When a provider model, prompt template, or parser contract changes, treat prior observations as belonging to a different regime. Before trusting adaptive selection again, do one of the following: write to a new ledger, prune the old observations, or filter on a new prompt_fingerprint tag.
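
One possible way to derive such a tag is to hash everything that defines the regime. The sketch below uses only the standard library. `regime_fingerprint` and `filter_regime` are hypothetical helpers, and the observation shape (a dict with a "tags" mapping) is an assumption, not the ledger's documented record format.

```python
import hashlib
import json
from typing import Any, Dict, List


def regime_fingerprint(model_id: str, prompt_template: str, parser_version: str) -> str:
    """Hash the model, template, and parser version into a short, stable tag."""
    payload = json.dumps(
        {"model": model_id, "template": prompt_template, "parser": parser_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def filter_regime(observations: List[Dict[str, Any]], fingerprint: str) -> List[Dict[str, Any]]:
    """Keep only observations tagged with the given fingerprint (assumed record shape)."""
    return [
        obs for obs in observations
        if obs.get("tags", {}).get("prompt_fingerprint") == fingerprint
    ]
```

Because the fingerprint changes whenever any of the three inputs changes, observations from an older template or parser drop out of the filtered view automatically.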