infospace-bench/examples/routing/trading-literature.yaml
tegwick debd2b8e69 IB-WP-0020-T04: example routing config + live routing smoke
examples/routing/trading-literature.yaml is the checked-in starting
config for a Lefevre-style run. It applies the IB-WP-0018 task-type
taxonomy: cheap candidates for summary + evaluation, smart candidates
for entity + relation extraction, and a separate baseline rule wiring
claude_code for a follow-on T05 ShadowingAdapter step. Workspace-
relative ledger_path keeps adaptive observations with the workspace.

tests/test_routing_config.py gains a regression test that asserts the
shipped example parses cleanly, every stage in stage_to_task_type maps
to a declared task type, and the baseline candidate uses the
claude_code provider — so the example will not bit-rot silently.

tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke
gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY
opt-in as the existing static smoke. It builds a one-candidate routing
config, runs a single chapter through --provider routing, and asserts
the per-stage adapter-choices report section names the routed model
and the routed artifacts carry adapter_id provenance.

docs/generic-source-generator.md gains a "Live runs with --provider
routing" subsection that walks through the one-command routed run,
explains the --quality-floor override, and points at the parallel
live smoke test.

174 tests pass, 2 skipped (both live smokes, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 22:19:54 +02:00

82 lines
2.8 KiB
YAML

# Example routing config for a trading-literature Lefevre-style run.
#
# Captures the IB-WP-0018 task-type taxonomy from docs/routing-task-types.md:
#   summarize-source  → cheap model (volume-heavy, recoverable downstream)
#   extract-entities  → smart model (durable output; be strict)
#   extract-relations → smart model (depends on entities)
#   evaluate-entity   → judge model (different family from extraction)
#   synthesize-report → smart model (volume-of-one, quality matters, cheap)
#
# Quality floors are the recommended starting points from
# docs/routing-task-types.md. With a ledger configured, AdaptiveRoutingPolicy
# will pick the cheapest *qualifying* adapter per task type as observations
# accumulate; until then it falls back to the static prefer/fallback order.
#
# Refresh the model rates in src/infospace_bench/model_rates.yaml before any
# full-book run — list prices drift, and the rough USD estimate in the budget
# log depends on them.
schema_version: 1
# Workspace-relative ledger so QualityLedger observations from this workspace
# stay with this workspace. Drop this line to run pure static routing.
ledger_path: output/routing/quality.jsonl
# Floors apply when --quality-floor is not passed at the call site. The CLI
# flag wins, then the per-task quality_floor below, then this default.
default_quality_floor: 0.80
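# Worked example of that precedence, read off the values in this file. This is
# a sketch of the rule stated above, not checked against the resolver code;
# the 0.90 flag value is illustrative only.
#   extract-entities → smart → floor 0.85 (per-task quality_floor)
#   evaluate-entity  → judge → floor 0.80 (per-task quality_floor)
#   a task type that omits quality_floor → 0.80 (default_quality_floor)
#   any stage run with --quality-floor 0.90 → 0.90 (CLI flag wins)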
stage_to_task_type:
  summarize-source: cheap
  extract-entities: smart
  extract-relations: smart
  evaluate-entity: judge
  synthesize-report: smart
task_types:
  cheap:
    quality_floor: 0.70
    candidates:
      - id: openrouter:gpt-4o-mini
        provider: openrouter
        model: openai/gpt-4o-mini
        api_key_env: OPENROUTER_API_KEY
        max_cost_per_1k: 0.001
      - id: openrouter:claude-3.5-haiku
        provider: openrouter
        model: anthropic/claude-3.5-haiku
        api_key_env: OPENROUTER_API_KEY
        max_cost_per_1k: 0.003
  smart:
    quality_floor: 0.85
    candidates:
      - id: openrouter:claude-3.5-haiku
        provider: openrouter
        model: anthropic/claude-3.5-haiku
        api_key_env: OPENROUTER_API_KEY
      - id: openrouter:claude-3.5-sonnet
        provider: openrouter
        model: anthropic/claude-3.5-sonnet
        api_key_env: OPENROUTER_API_KEY
  judge:
    quality_floor: 0.80
    candidates:
      # Evaluation goes through a different family than extraction to limit
      # self-preference bias.
      - id: openrouter:gpt-4o-mini
        provider: openrouter
        model: openai/gpt-4o-mini
        api_key_env: OPENROUTER_API_KEY
# Baseline is wired here so a follow-up T05 ShadowingAdapter step can
# reference `claude-code` as the grading oracle without editing the
# task_types stanza.
baseline:
  candidates:
    - id: claude-code
      provider: claude_code
      model: claude-opus-4-7