Infospace-Bench Adaptive Routing Guide
This guide shows how a consumer such as infospace-bench can wire task-type
stages into the adaptive cost-quality primitives from llm-connect.
Stage taxonomy
The consumer owns task names and quality thresholds. A first pass for
infospace-bench could use:
| Stage | Task type | Suggested floor |
|---|---|---|
| Source chapter summary | summarize-source | 0.82 |
| Entity extraction | extract-entities | 0.88 |
| Relation extraction | extract-relations | 0.86 |
| Entity evaluation | evaluate-entity | 0.90 |
| Report synthesis | synthesize-report | 0.92 |
These floors are starting points, not library defaults. Raise them for stages whose errors compound downstream.
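One way to keep the taxonomy in consumer code is a small lookup table. The sketch below is only an illustration: `StageConfig`, `STAGES`, and `floor_for` are hypothetical names, not part of llm-connect, and the values are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Consumer-owned routing settings for one pipeline stage (hypothetical)."""
    task_type: str
    quality_floor: float


# Starting points taken from the stage table; tune them per project.
STAGES = {
    "Source chapter summary": StageConfig("summarize-source", 0.82),
    "Entity extraction": StageConfig("extract-entities", 0.88),
    "Relation extraction": StageConfig("extract-relations", 0.86),
    "Entity evaluation": StageConfig("evaluate-entity", 0.90),
    "Report synthesis": StageConfig("synthesize-report", 0.92),
}


def floor_for(task_type: str) -> float:
    """Return the quality floor for a task type, failing loudly on typos."""
    for stage in STAGES.values():
        if stage.task_type == task_type:
            return stage.quality_floor
    raise KeyError(f"no stage configured for task type {task_type!r}")
```

Keeping the floors in one place makes it easier to raise a single stage's threshold without touching the routing code.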
Wiring sketch
from llm_connect.grading import ExactMatchJudge, PairedGrader
from llm_connect.quality import QualityLedger
from llm_connect.routing import AdaptiveRoutingPolicy, RoutingRule
from llm_connect.shadowing import ShadowingAdapter

ledger = QualityLedger("quality-ledger.jsonl")
grader = PairedGrader(ExactMatchJudge())

# The three adapters below are placeholders for adapters the consumer
# constructs elsewhere.
baseline = claude_code_adapter
cheap = openrouter_cheap_adapter
mid = openrouter_mid_adapter

# Wrap the cheap candidate so its output is graded against the baseline
# and the result is recorded in the ledger.
shadowed_cheap = ShadowingAdapter(
    candidate_adapter=cheap,
    baseline_adapter=baseline,
    grader=grader,
    ledger=ledger,
    task_type="extract-relations",
    adapter_id="openrouter-cheap",
    baseline_adapter_id="claude-code",
    shadow_rate=0.1,
    # prompt_fingerprint is supplied by the consumer.
    tags={"prompt_fingerprint": prompt_fingerprint},
)

policy = AdaptiveRoutingPolicy(
    rules=[
        RoutingRule("extract-relations", prefer=baseline, fallback=mid),
    ],
    ledger=ledger,
    adapters_by_id={
        "openrouter-cheap": shadowed_cheap,
        "openrouter-mid": mid,
        "claude-code": baseline,
    },
    window_size=20,
    min_observations=3,
)

# prompt and run_config also come from the consumer's pipeline.
adapter = policy.resolve("extract-relations", quality_floor=0.86)
response = adapter.execute_prompt(prompt, run_config)
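To make the role of shadow_rate concrete, here is a library-independent sketch of the sampling idea. It is an assumption about the general technique, not a description of how ShadowingAdapter works internally: `run_with_shadow` is a hypothetical helper, the adapters are plain callables, and the exact-match comparison stands in for a real grader.

```python
import random
from typing import Callable, List


def run_with_shadow(
    prompt: str,
    candidate: Callable[[str], str],
    baseline: Callable[[str], str],
    shadow_rate: float,
    rng: random.Random,
    observations: List[float],
) -> str:
    """Answer with the candidate; on a sampled fraction of calls, also run
    the baseline and record whether the two answers match exactly."""
    answer = candidate(prompt)
    # rng.random() is in [0.0, 1.0), so a rate of 0.0 never shadows
    # and a rate of 1.0 always does.
    if rng.random() < shadow_rate:
        reference = baseline(prompt)
        score = 1.0 if answer.strip() == reference.strip() else 0.0
        observations.append(score)
    return answer
```

The caller always receives the candidate's answer in this sketch; only the sampled calls pay for the extra baseline request.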
Operating loop
- Start with static routing to the trusted baseline or mid-tier adapter.
- Wrap cheaper candidates with ShadowingAdapter at a conservative shadow_rate, for example 0.05 to 0.1.
- Record a prompt fingerprint or template version in tags so later prompt changes do not mix incompatible observations.
- Increase min_observations for stages with high variance.
- Let AdaptiveRoutingPolicy select the cheapest adapter that clears each stage floor.
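The final step can be pictured with a small standalone sketch. It assumes each candidate carries a relative cost and a list of graded scores, and that window_size and min_observations both apply to the most recent scores. `Candidate`, `relative_cost`, and `pick_cheapest` are hypothetical names, and AdaptiveRoutingPolicy may implement selection differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    adapter_id: str
    relative_cost: float      # hypothetical cost rank; lower is cheaper
    scores: Sequence[float]   # graded scores, oldest first


def pick_cheapest(
    candidates: Sequence[Candidate],
    quality_floor: float,
    fallback_id: str,
    window_size: int = 20,
    min_observations: int = 3,
) -> str:
    """Return the id of the cheapest candidate whose recent mean score
    clears the floor, or fallback_id when none qualifies."""
    eligible = []
    for cand in candidates:
        recent = list(cand.scores)[-window_size:]
        # Too little evidence: do not trust this candidate yet.
        if len(recent) < max(min_observations, 1):
            continue
        if sum(recent) / len(recent) >= quality_floor:
            eligible.append(cand)
    if not eligible:
        return fallback_id
    return min(eligible, key=lambda c: c.relative_cost).adapter_id
```

Raising min_observations in this sketch delays promotion of a cheap adapter until more graded samples exist, which is the reason to increase it for high-variance stages.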
Refresh rules
When a provider model, prompt template, or parser contract changes, treat prior
observations as belonging to a different regime. Write to a new ledger, prune
old observations, or filter on a new prompt_fingerprint tag before trusting
adaptive selection again.
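A fingerprint that changes whenever any of those inputs changes can be derived by hashing them together. The sketch below is one possible approach, not something llm-connect prescribes: the `prompt_fingerprint` and `filter_regime` helpers are hypothetical, and the record shape (a dict with a `tags` mapping) is an assumption, since the on-disk ledger schema is not described here.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def prompt_fingerprint(template: str, model_id: str, parser_version: str) -> str:
    """Short stable hash of everything that defines an observation regime."""
    payload = json.dumps(
        {"template": template, "model": model_id, "parser": parser_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def filter_regime(records: Iterable[Dict], fingerprint: str) -> List[Dict]:
    """Keep only observations tagged with the given fingerprint.

    Assumes each record looks like {"tags": {"prompt_fingerprint": ...}, ...}.
    """
    return [
        record
        for record in records
        if record.get("tags", {}).get("prompt_fingerprint") == fingerprint
    ]
```

Because the hash covers the template, model, and parser version together, bumping any one of them yields a new tag value and old observations drop out of the filtered set.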