generated from coulomb/repo-seed
Add adaptive cost-quality routing primitives
83
docs/infospace-bench-adaptive-routing.md
Normal file
@@ -0,0 +1,83 @@
# Infospace-Bench Adaptive Routing Guide

This guide shows how a consumer such as `infospace-bench` can wire task-type
stages into the adaptive cost-quality primitives from `llm-connect`.

## Stage taxonomy

The consumer owns task names and quality thresholds. A first pass for
`infospace-bench` could use:

| Stage | Task type | Suggested floor |
|-------|-----------|-----------------|
| Source chapter summary | `summarize-source` | `0.82` |
| Entity extraction | `extract-entities` | `0.88` |
| Relation extraction | `extract-relations` | `0.86` |
| Entity evaluation | `evaluate-entity` | `0.90` |
| Report synthesis | `synthesize-report` | `0.92` |

These floors are starting points, not library defaults. Raise them for stages
whose errors compound downstream.

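Because the consumer owns these names and thresholds, one way to keep them in a single place is a small lookup table. The sketch below is illustrative only: `STAGE_FLOORS` and `floor_for` are hypothetical names chosen for this example and are not part of `llm-connect`.

```python
# Hypothetical consumer-side table; task types and floors mirror the
# suggested starting points listed in the stage taxonomy.
STAGE_FLOORS: dict[str, float] = {
    "summarize-source": 0.82,
    "extract-entities": 0.88,
    "extract-relations": 0.86,
    "evaluate-entity": 0.90,
    "synthesize-report": 0.92,
}


def floor_for(task_type: str) -> float:
    """Return the quality floor for a task type, failing loudly on unknown names."""
    try:
        return STAGE_FLOORS[task_type]
    except KeyError:
        raise ValueError(
            f"no quality floor configured for task type {task_type!r}"
        ) from None
```

Failing on an unknown task type, rather than falling back to a default floor, keeps a misspelled stage name from silently routing with the wrong threshold.
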
## Wiring sketch

```python
from llm_connect.grading import ExactMatchJudge, PairedGrader
from llm_connect.quality import QualityLedger
from llm_connect.routing import AdaptiveRoutingPolicy, RoutingRule
from llm_connect.shadowing import ShadowingAdapter

# The consumer is expected to define these names before this point:
# claude_code_adapter, openrouter_cheap_adapter, openrouter_mid_adapter,
# prompt_fingerprint, prompt, and run_config.

# One ledger is shared by the shadowing adapter and the routing policy.
ledger = QualityLedger("quality-ledger.jsonl")
grader = PairedGrader(ExactMatchJudge())

baseline = claude_code_adapter
cheap = openrouter_cheap_adapter
mid = openrouter_mid_adapter

# Wrap the cheap candidate for a single task type.
shadowed_cheap = ShadowingAdapter(
    candidate_adapter=cheap,
    baseline_adapter=baseline,
    grader=grader,
    ledger=ledger,
    task_type="extract-relations",
    adapter_id="openrouter-cheap",
    baseline_adapter_id="claude-code",
    shadow_rate=0.1,
    # Tag observations so later prompt changes can be filtered out.
    tags={"prompt_fingerprint": prompt_fingerprint},
)

policy = AdaptiveRoutingPolicy(
    rules=[
        RoutingRule("extract-relations", prefer=baseline, fallback=mid),
    ],
    ledger=ledger,
    # Keys match the adapter ids passed to ShadowingAdapter above.
    adapters_by_id={
        "openrouter-cheap": shadowed_cheap,
        "openrouter-mid": mid,
        "claude-code": baseline,
    },
    window_size=20,
    min_observations=3,
)

# The floor matches the suggested value for `extract-relations`.
adapter = policy.resolve("extract-relations", quality_floor=0.86)
response = adapter.execute_prompt(prompt, run_config)
```

## Operating loop

1. Start with static routing to the trusted baseline or mid-tier adapter.
2. Wrap cheaper candidates with `ShadowingAdapter` at a conservative
   `shadow_rate`, for example `0.05` to `0.1`.
3. Record a prompt fingerprint or template version in `tags` so later prompt
   changes do not mix incompatible observations.
4. Increase `min_observations` for stages with high variance.
5. Let `AdaptiveRoutingPolicy` select the cheapest adapter that clears each
   stage floor.

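Steps 4 and 5 can be pictured with a small standalone model. The sketch below is not the implementation of `AdaptiveRoutingPolicy`. It assumes each candidate has a known relative cost and a list of recent quality scores in the range 0 to 1, and it assumes an adapter qualifies when the mean of its last `window_size` scores meets the floor. `Candidate`, `pick_cheapest`, the `cost` field, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Candidate:
    adapter_id: str
    cost: float  # assumed relative cost per call
    scores: list[float] = field(default_factory=list)  # oldest first


def pick_cheapest(candidates, quality_floor, baseline_id,
                  window_size=20, min_observations=3):
    """Return the id of the cheapest candidate whose recent mean clears the floor."""
    for cand in sorted(candidates, key=lambda c: c.cost):
        recent = cand.scores[-window_size:]
        # Too little evidence: do not trust this adapter yet.
        if len(recent) < min_observations:
            continue
        if mean(recent) >= quality_floor:
            return cand.adapter_id
    # Nothing qualified, so fall back to the trusted baseline.
    return baseline_id


candidates = [
    Candidate("openrouter-cheap", cost=1.0, scores=[0.80, 0.84, 0.83]),
    Candidate("openrouter-mid", cost=4.0, scores=[0.90, 0.88, 0.91]),
    Candidate("claude-code", cost=20.0, scores=[]),
]
choice = pick_cheapest(candidates, quality_floor=0.86, baseline_id="claude-code")
strict = pick_cheapest(candidates, quality_floor=0.86, baseline_id="claude-code",
                       min_observations=4)
```

With these made-up scores, `choice` is `"openrouter-mid"`: the cheap adapter has enough observations but its mean falls below `0.86`. Raising `min_observations` to 4 leaves no candidate with enough evidence, so `strict` is the baseline `"claude-code"`, which is why a higher value suits high-variance stages.
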
## Refresh rules

When a provider model, prompt template, or parser contract changes, treat prior
observations as a different regime. Either write to a new ledger, prune old
observations, or filter with a new `prompt_fingerprint` tag before trusting
adaptive selection again.

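One way to produce such a tag is to hash everything that defines the regime. The sketch below assumes the regime is fully described by a provider model name, a prompt template, and a parser version. `regime_fingerprint` and `same_regime` are hypothetical helpers, and the observation dictionaries are a stand-in for whatever the ledger actually stores.

```python
import hashlib
import json


def regime_fingerprint(model: str, template: str, parser_version: str) -> str:
    """Return a short, stable hash that changes when any regime input changes."""
    payload = json.dumps(
        {"model": model, "template": template, "parser": parser_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def same_regime(observations, fingerprint):
    """Keep only observations tagged with the given fingerprint."""
    return [
        obs for obs in observations
        if obs.get("tags", {}).get("prompt_fingerprint") == fingerprint
    ]


# Changing only the template yields a different fingerprint.
old = regime_fingerprint("model-a", "Extract relations from: {text}", "v1")
new = regime_fingerprint("model-a", "List all relations in: {text}", "v1")

observations = [
    {"score": 0.91, "tags": {"prompt_fingerprint": old}},
    {"score": 0.70, "tags": {"prompt_fingerprint": new}},
]
current = same_regime(observations, new)  # only the second observation remains
```

Hashing the model and parser version along with the template means any of the three changes named above starts a fresh regime without manual bookkeeping.
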