IB-WP-0020-T04: example routing config + live routing smoke

examples/routing/trading-literature.yaml is the checked-in starting
config for a Lefevre-style run. It applies the IB-WP-0018 task-type
taxonomy: cheap candidates for summary + evaluation, smart candidates
for entity + relation extraction, and a separate baseline rule wiring
claude_code for a follow-on T05 ShadowingAdapter step. Workspace-
relative ledger_path keeps adaptive observations with the workspace.

tests/test_routing_config.py gains a regression test that asserts the
shipped example parses cleanly, every stage in stage_to_task_type maps
to a declared task type, and the baseline candidate uses the
claude_code provider — so the example will not bit-rot silently.
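
For illustration only, the stage-mapping invariant that the regression test guards can be sketched without the real loader. The dict layout, the stage names, and the "openrouter" provider string below are assumptions made for this sketch. Only the task-type names, the 0.80 floor, and the claude_code baseline provider come from this commit.

```python
# Hypothetical stand-in for a parsed routing config. The real object comes
# from infospace_bench.routing_config.load_routing_config; this dict layout,
# the stage names, and the "openrouter" provider string are assumptions.
example_config = {
    "default_quality_floor": 0.80,
    "task_types": {
        "cheap": [{"provider": "openrouter"}],
        "smart": [{"provider": "openrouter"}],
        "judge": [{"provider": "openrouter"}],
        "baseline": [{"provider": "claude_code"}],
    },
    "stage_to_task_type": {
        "summary": "cheap",
        "evaluation": "cheap",
        "entity_extraction": "smart",
        "relation_extraction": "smart",
    },
}


def undeclared_stage_mappings(config: dict) -> dict:
    """Return the stages whose task type is not declared in the config."""
    declared = set(config["task_types"])
    return {
        stage: task_type
        for stage, task_type in config["stage_to_task_type"].items()
        if task_type not in declared
    }


def baseline_provider(config: dict) -> str:
    """Return the provider of the first baseline candidate."""
    return config["task_types"]["baseline"][0]["provider"]
```

An empty result from undeclared_stage_mappings corresponds to the per-stage assertion in the test passing for every stage.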

tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke
gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY
opt-in as the existing static smoke. It builds a one-candidate routing
config, runs a single chapter through --provider routing, and asserts
the per-stage adapter-choices report section names the routed model
and the routed artifacts carry adapter_id provenance.
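
A minimal sketch of the double opt-in described above, assuming the flag is enabled by the value "1" (this commit does not state the accepted value) and using a hypothetical helper name:

```python
import os
from typing import Mapping


def live_openrouter_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when both opt-in variables are present.

    Assumption: the enable flag must equal "1"; the real test may accept
    other truthy values.
    """
    flag = env.get("INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER", "")
    api_key = env.get("OPENROUTER_API_KEY", "")
    return flag == "1" and bool(api_key.strip())
```

A helper like this would typically feed pytest.mark.skipif(not live_openrouter_enabled(), reason="live OpenRouter smoke not enabled"), so the live test is reported as skipped rather than failed when either variable is missing.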

docs/generic-source-generator.md gains a "Live runs with --provider
routing" subsection that walks through the one-command routed run,
explains the --quality-floor override, and points at the parallel
live smoke test.

174 tests pass, 2 skipped (both live smokes, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 22:19:54 +02:00
parent d3562454d7
commit debd2b8e69
5 changed files with 221 additions and 1 deletions


@@ -412,6 +412,25 @@ def test_build_routing_policy_claude_code_needs_no_api_key() -> None:
    assert isinstance(policy.rules[0].prefer, ClaudeCodeAdapter)


def test_example_trading_literature_config_parses() -> None:
    """Regression: the shipped example config must parse cleanly."""
    from infospace_bench.routing_config import load_routing_config

    example_path = Path(__file__).resolve().parent.parent / "examples" / "routing" / "trading-literature.yaml"
    config = load_routing_config(example_path)
    task_type_names = {task.task_type for task in config.task_types}
    assert {"cheap", "smart", "judge", "baseline"} <= task_type_names
    assert config.default_quality_floor == 0.80

    # Each shipped stage maps to a task type the config actually declares.
    for stage, task_type in config.stage_to_task_type.items():
        assert task_type in task_type_names, f"stage {stage!r} maps to undeclared task type {task_type!r}"

    # baseline is included so a T05 ShadowingAdapter wiring can reference it.
    baseline = next(t for t in config.task_types if t.task_type == "baseline")
    assert baseline.candidates[0].provider == "claude_code"


def test_build_routing_policy_honours_custom_api_key_env() -> None:
    from infospace_bench.routing_config import build_routing_policy_from_config
    from llm_connect.openrouter import OpenRouterAdapter