generated from coulomb/repo-seed
examples/routing/trading-literature.yaml is the checked-in starting config for a Lefevre-style run. It applies the IB-WP-0018 task-type taxonomy: cheap candidates for summary + evaluation, smart candidates for entity + relation extraction, and a separate baseline rule wiring claude_code for a follow-on T05 ShadowingAdapter step. Workspace-relative ledger_path keeps adaptive observations with the workspace.

tests/test_routing_config.py gains a regression test that asserts the shipped example parses cleanly, every stage in stage_to_task_type maps to a declared task type, and the baseline candidate uses the claude_code provider, so the example will not bit-rot silently.

tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY opt-in as the existing static smoke. It builds a one-candidate routing config, runs a single chapter through --provider routing, and asserts the per-stage adapter-choices report section names the routed model and the routed artifacts carry adapter_id provenance.

docs/generic-source-generator.md gains a "Live runs with --provider routing" subsection that walks through the one-command routed run, explains the --quality-floor override, and points at the parallel live smoke test.

174 tests pass, 2 skipped (both live smokes, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
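The structural checks the regression test is described as making can be sketched as follows. This is a minimal illustration, not the code in tests/test_routing_config.py: the helper name `check_routing_config` is hypothetical, and it works on an already-parsed dict rather than loading the YAML file, so it has no dependency on a YAML library.

```python
def check_routing_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passed.

    Hypothetical helper: the key names follow the shipped example, but the
    real test's structure and assertions may differ.
    """
    problems = []
    task_types = config.get("task_types", {})

    # Every stage must map to a task type that is actually declared.
    for stage, task_type in config.get("stage_to_task_type", {}).items():
        if task_type not in task_types:
            problems.append(
                f"stage {stage!r} maps to undeclared task type {task_type!r}"
            )

    # The baseline rule must offer at least one claude_code candidate.
    baseline = config.get("baseline", {}).get("candidates", [])
    if not any(c.get("provider") == "claude_code" for c in baseline):
        problems.append("baseline has no claude_code candidate")

    return problems


# A trimmed-down dict mirroring the shape of the shipped example.
example = {
    "stage_to_task_type": {
        "summarize-source": "cheap",
        "extract-entities": "smart",
        "evaluate-entity": "judge",
    },
    "task_types": {"cheap": {}, "smart": {}, "judge": {}},
    "baseline": {
        "candidates": [{"id": "claude-code", "provider": "claude_code"}]
    },
}

problems = check_routing_config(example)
```

Returning a list of problems instead of raising on the first one lets a test report every broken mapping in a single failure message.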
82 lines
2.8 KiB
YAML
# Example routing config for a trading-literature Lefevre-style run.
#
# Captures the IB-WP-0018 task-type taxonomy from docs/routing-task-types.md:
#   summarize-source  → cheap model (volume-heavy, recoverable downstream)
#   extract-entities  → smart model (durable output; be strict)
#   extract-relations → smart model (depends on entities)
#   evaluate-entity   → judge model (different family from extraction)
#   synthesize-report → smart model (volume-of-one, quality matters, cheap)
#
# Quality floors are the recommended starting points from
# docs/routing-task-types.md. With a ledger configured, AdaptiveRoutingPolicy
# will pick the cheapest *qualifying* adapter per task type as observations
# accumulate; until then it falls back to the static prefer/fallback order.
#
# Refresh the model rates in src/infospace_bench/model_rates.yaml before any
# full-book run: list prices drift, and the rough USD estimate in the budget
# log depends on them.

schema_version: 1

# Workspace-relative ledger so QualityLedger observations from this workspace
# stay with this workspace. Drop this line to run pure static routing.
ledger_path: output/routing/quality.jsonl

# Floors apply when --quality-floor is not passed at the call site. The CLI
# flag wins, then the per-task quality_floor below, then this default.
default_quality_floor: 0.80

stage_to_task_type:
  summarize-source: cheap
  extract-entities: smart
  extract-relations: smart
  evaluate-entity: judge
  synthesize-report: smart

task_types:

  cheap:
    quality_floor: 0.70
    candidates:
      - id: openrouter:gpt-4o-mini
        provider: openrouter
        model: openai/gpt-4o-mini
        api_key_env: OPENROUTER_API_KEY
        max_cost_per_1k: 0.001
      - id: openrouter:claude-3.5-haiku
        provider: openrouter
        model: anthropic/claude-3.5-haiku
        api_key_env: OPENROUTER_API_KEY
        max_cost_per_1k: 0.003

  smart:
    quality_floor: 0.85
    candidates:
      - id: openrouter:claude-3.5-haiku
        provider: openrouter
        model: anthropic/claude-3.5-haiku
        api_key_env: OPENROUTER_API_KEY
      - id: openrouter:claude-3.5-sonnet
        provider: openrouter
        model: anthropic/claude-3.5-sonnet
        api_key_env: OPENROUTER_API_KEY

  judge:
    quality_floor: 0.80
    candidates:
      # Evaluation goes through a different family than extraction to limit
      # self-preference bias.
      - id: openrouter:gpt-4o-mini
        provider: openrouter
        model: openai/gpt-4o-mini
        api_key_env: OPENROUTER_API_KEY

# Baseline is wired here so a follow-up T05 ShadowingAdapter step can
# reference `claude-code` as the grading oracle without editing the
# task_types stanza.
baseline:
  candidates:
    - id: claude-code
      provider: claude_code
      model: claude-opus-4-7
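The config comments describe two rules: the quality floor is resolved in the order CLI flag, per-task `quality_floor`, then `default_quality_floor`, and the adaptive policy picks the cheapest qualifying candidate once observations exist. A rough sketch of both follows. It is not AdaptiveRoutingPolicy itself: the function names, the `cost_per_1k` field, and the shape of the observations dict are all assumptions made for illustration, as is the choice to fall back to the first static candidate when nothing qualifies.

```python
from typing import Optional


def resolve_quality_floor(
    cli_floor: Optional[float],
    task_floor: Optional[float],
    default_floor: Optional[float],
) -> float:
    """First non-None value wins: CLI flag, per-task floor, then default."""
    for floor in (cli_floor, task_floor, default_floor):
        if floor is not None:
            return floor
    raise ValueError("no quality floor configured at any level")


def pick_candidate(
    candidates: list[dict],
    observed_quality: dict[str, float],
    floor: float,
) -> dict:
    """Pick the cheapest candidate whose observed quality meets the floor.

    `candidates` is in static prefer/fallback order. `cost_per_1k` is a
    hypothetical field used only for this sketch. With no qualifying
    observation, the first static candidate is returned (an assumption).
    """
    qualifying = [
        c for c in candidates
        if observed_quality.get(c["id"]) is not None
        and observed_quality[c["id"]] >= floor
    ]
    if not qualifying:
        return candidates[0]
    return min(qualifying, key=lambda c: c["cost_per_1k"])


# Candidate ids taken from the `cheap` task type; costs are illustrative.
cheap = [
    {"id": "openrouter:gpt-4o-mini", "cost_per_1k": 0.001},
    {"id": "openrouter:claude-3.5-haiku", "cost_per_1k": 0.003},
]

floor = resolve_quality_floor(None, 0.70, 0.80)  # per-task floor applies
choice = pick_candidate(
    cheap,
    {"openrouter:gpt-4o-mini": 0.65, "openrouter:claude-3.5-haiku": 0.78},
    floor,
)
```

In this usage the cheaper model has been observed below the 0.70 floor, so the sketch selects the more expensive candidate that still qualifies.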