generated from coulomb/repo-seed
IB-WP-0020-T04: example routing config + live routing smoke
examples/routing/trading-literature.yaml is the checked-in starting config for a Lefevre-style run. It applies the IB-WP-0018 task-type taxonomy: cheap candidates for summary + evaluation, smart candidates for entity + relation extraction, and a separate baseline rule wiring claude_code for a follow-on T05 ShadowingAdapter step. Workspace-relative ledger_path keeps adaptive observations with the workspace.

tests/test_routing_config.py gains a regression test that asserts the shipped example parses cleanly, every stage in stage_to_task_type maps to a declared task type, and the baseline candidate uses the claude_code provider, so the example will not bit-rot silently.

tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY opt-in as the existing static smoke. It builds a one-candidate routing config, runs a single chapter through --provider routing, and asserts the per-stage adapter-choices report section names the routed model and the routed artifacts carry adapter_id provenance.

docs/generic-source-generator.md gains a "Live runs with --provider routing" subsection that walks through the one-command routed run, explains the --quality-floor override, and points at the parallel live smoke test.

174 tests pass, 2 skipped (both live smokes, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -94,6 +94,42 @@ skipped unless both `OPENROUTER_API_KEY` and
chapter through the same path and asserts the provider metadata
plumb-through.

### Live runs with `--provider routing`

When the routing CLI is what you want to exercise live, swap
`--provider openrouter --model ...` for the routing pair:

```bash
infospace-bench generate from-source ./LEFEVRE.epub \
  --workspace ./infospaces \
  --slug reminiscences-routed \
  --name "Reminiscences (Routed)" \
  --profile trading-literature \
  --provider routing \
  --routing-config ./examples/routing/trading-literature.yaml \
  --chapter I \
  --apply
```

`examples/routing/trading-literature.yaml` is a checked-in starting
config: cheap candidates for summary/evaluation, smart candidates for
entity/relation, a `claude_code` baseline rule for future shadow
sampling, and a workspace-relative `output/routing/quality.jsonl`
ledger so adaptive observations stay with the workspace.
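
To make the shape of that config concrete, here is a minimal Python sketch of how a stage name could resolve to its candidate list. The dictionary is a hand-copied slice of the shipped YAML, and `candidates_for_stage` is a hypothetical helper written for illustration; it is not part of `infospace_bench` and only approximates what the real loader does.

```python
# Illustrative only: a hand-copied slice of
# examples/routing/trading-literature.yaml as a plain dict.
EXAMPLE_CONFIG = {
    "stage_to_task_type": {
        "summarize-source": "cheap",
        "extract-entities": "smart",
    },
    "task_types": {
        "cheap": {
            "candidates": [
                {"id": "openrouter:gpt-4o-mini", "provider": "openrouter"},
                {"id": "openrouter:claude-3.5-haiku", "provider": "openrouter"},
            ],
        },
        "smart": {
            "candidates": [
                {"id": "openrouter:claude-3.5-haiku", "provider": "openrouter"},
                {"id": "openrouter:claude-3.5-sonnet", "provider": "openrouter"},
            ],
        },
    },
}


def candidates_for_stage(config: dict, stage: str) -> list[str]:
    """Hypothetical helper: candidate ids for a stage, in declared order."""
    task_type = config["stage_to_task_type"][stage]
    if task_type not in config["task_types"]:
        # Same invariant the shipped regression test checks.
        raise KeyError(f"stage {stage!r} maps to undeclared task type {task_type!r}")
    return [c["id"] for c in config["task_types"][task_type]["candidates"]]


if __name__ == "__main__":
    print(candidates_for_stage(EXAMPLE_CONFIG, "extract-entities"))
```

The declared order matters because, per the comments in the example config, the static prefer/fallback order is what applies until the ledger has observations.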

`--quality-floor <float>` on the same command overrides the config's
`default_quality_floor` for a single invocation, useful for
tightening the bar for a specific run without editing the file. The
ledger fills up as the `AdaptiveRoutingPolicy` records each
observation; later runs against the same workspace get the benefit
without re-grading from scratch.
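
The floor precedence and the selection rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the `AdaptiveRoutingPolicy` implementation: the `Candidate` dataclass, both function names, and the idea of a single observed score per candidate are hypothetical. What comes from the example config's comments is the precedence (CLI flag, then the per-task `quality_floor`, then `default_quality_floor`) and the rule of picking the cheapest qualifying adapter, with the static order as the fallback. Falling back to the static order when observations exist but none qualify is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a routing candidate."""

    adapter_id: str
    cost_per_1k: float
    observed_quality: Optional[float] = None  # None: no ledger observations yet


def resolve_quality_floor(
    cli_floor: Optional[float],
    task_floor: Optional[float],
    default_floor: float,
) -> float:
    """CLI flag wins, then the per-task floor, then the config default."""
    if cli_floor is not None:
        return cli_floor
    if task_floor is not None:
        return task_floor
    return default_floor


def pick_candidate(candidates: Sequence[Candidate], floor: float) -> Candidate:
    """Cheapest candidate whose observed quality meets the floor.

    Falls back to the first declared candidate (static order) when
    nothing qualifies, which is an assumption of this sketch.
    """
    qualifying = [
        c for c in candidates
        if c.observed_quality is not None and c.observed_quality >= floor
    ]
    if qualifying:
        return min(qualifying, key=lambda c: c.cost_per_1k)
    return candidates[0]


if __name__ == "__main__":
    # Made-up costs and scores, purely for illustration.
    pool = [Candidate("a", 3.0, 0.88), Candidate("b", 15.0, 0.95)]
    floor = resolve_quality_floor(cli_floor=None, task_floor=0.85, default_floor=0.80)
    print(pick_candidate(pool, floor).adapter_id)
```

Passing a stricter `cli_floor` only changes the threshold for that call; the per-task and default floors in the file stay untouched, which matches the single-invocation override described above.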

The parallel live-smoke test
(`test_provider_routing_one_chapter_live_smoke`) is also gated on
`INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1` + `OPENROUTER_API_KEY` and
asserts the per-stage adapter-choices report section names the routed
model.
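
For readers who have not seen this kind of opt-in gate, a minimal sketch follows. The helper name `live_smoke_enabled` is hypothetical, and treating only the literal value `1` as opting in is an assumption; the test module's own `_LIVE_OPT_IN` and `_LIVE_API_KEY` definitions are not reproduced here and may differ.

```python
import os
from typing import Mapping


def live_smoke_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical gate: both the opt-in flag and an API key must be set."""
    opted_in = env.get("INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER") == "1"
    has_key = bool(env.get("OPENROUTER_API_KEY"))
    return opted_in and has_key


if __name__ == "__main__":
    print("live smoke enabled:", live_smoke_enabled())
```

A test module would typically feed a check like this into `@pytest.mark.skipif(not live_smoke_enabled(), reason=...)`, so that a plain `pytest` run never spends money against the live API by accident.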

### Budget and usage registry

Every `generate plan` invocation appends a compact snapshot to
81
examples/routing/trading-literature.yaml
Normal file
@@ -0,0 +1,81 @@
# Example routing config for a trading-literature Lefevre-style run.
#
# Captures the IB-WP-0018 task-type taxonomy from docs/routing-task-types.md:
#   summarize-source → cheap model (volume-heavy, recoverable downstream)
#   extract-entities → smart model (durable output; be strict)
#   extract-relations → smart model (depends on entities)
#   evaluate-entity → judge model (different family from extraction)
#   synthesize-report → smart model (volume-of-one, quality matters, cheap)
#
# Quality floors are the recommended starting points from
# docs/routing-task-types.md. With a ledger configured, AdaptiveRoutingPolicy
# will pick the cheapest *qualifying* adapter per task type as observations
# accumulate; until then it falls back to the static prefer/fallback order.
#
# Refresh the model rates in src/infospace_bench/model_rates.yaml before any
# full-book run: list prices drift, and the rough USD estimate in the budget
# log depends on them.

schema_version: 1

# Workspace-relative ledger so QualityLedger observations from this workspace
# stay with this workspace. Drop this line to run pure static routing.
ledger_path: output/routing/quality.jsonl

# Floors apply when --quality-floor is not passed at the call site. The CLI
# flag wins, then the per-task quality_floor below, then this default.
default_quality_floor: 0.80

stage_to_task_type:
  summarize-source: cheap
  extract-entities: smart
  extract-relations: smart
  evaluate-entity: judge
  synthesize-report: smart

task_types:

  cheap:
    quality_floor: 0.70
    candidates:
      - id: openrouter:gpt-4o-mini
        provider: openrouter
        model: openai/gpt-4o-mini
        api_key_env: OPENROUTER_API_KEY
        max_cost_per_1k: 0.001
      - id: openrouter:claude-3.5-haiku
        provider: openrouter
        model: anthropic/claude-3.5-haiku
        api_key_env: OPENROUTER_API_KEY
        max_cost_per_1k: 0.003

  smart:
    quality_floor: 0.85
    candidates:
      - id: openrouter:claude-3.5-haiku
        provider: openrouter
        model: anthropic/claude-3.5-haiku
        api_key_env: OPENROUTER_API_KEY
      - id: openrouter:claude-3.5-sonnet
        provider: openrouter
        model: anthropic/claude-3.5-sonnet
        api_key_env: OPENROUTER_API_KEY

  judge:
    quality_floor: 0.80
    candidates:
      # Evaluation goes through a different family than extraction to limit
      # self-preference bias.
      - id: openrouter:gpt-4o-mini
        provider: openrouter
        model: openai/gpt-4o-mini
        api_key_env: OPENROUTER_API_KEY

  # Baseline is wired here so a follow-up T05 ShadowingAdapter step can
  # reference `claude-code` as the grading oracle without editing the
  # task_types stanza.
  baseline:
    candidates:
      - id: claude-code
        provider: claude_code
        model: claude-opus-4-7
@@ -208,3 +208,87 @@ def test_openrouter_one_chapter_smoke(tmp_path: Path) -> None:
        and item.get("provenance", {}).get("provider_metadata", {}).get("request_id")
    ]
    assert generated_with_metadata, "generated artifacts should carry provider_metadata.request_id"


_LIVE_ROUTING_REASON = (
    "set INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1 and OPENROUTER_API_KEY to run "
    "the optional one-chapter routing smoke against OpenRouter"
)


@pytest.mark.skipif(not (_LIVE_OPT_IN and _LIVE_API_KEY), reason=_LIVE_ROUTING_REASON)
def test_provider_routing_one_chapter_live_smoke(tmp_path: Path) -> None:
    """Live smoke: one chapter through --provider routing against OpenRouter.

    Uses a minimal one-candidate-per-task-type routing config so the test
    spends roughly the same as the static OpenRouter smoke. Asserts the run
    completes, the routing bridge recorded adapter_id / task_type on
    provider_metadata, and the per-stage adapter-choices report section
    reflects routed choices.
    """
    book = _build_fixture_epub(tmp_path / "lefevre.epub")
    model = os.environ.get("INFOSPACE_BENCH_LIVE_MODEL", "openai/gpt-4o-mini")

    routing_config = tmp_path / "routing.yaml"
    routing_config.write_text(
        yaml.safe_dump(
            {
                "schema_version": 1,
                "stage_to_task_type": {
                    "summarize-source": "cheap",
                    "extract-entities": "cheap",
                    "extract-relations": "cheap",
                    "evaluate-entity": "cheap",
                    "synthesize-report": "cheap",
                },
                "task_types": {
                    "cheap": {
                        "candidates": [
                            {
                                "id": f"openrouter:{model}",
                                "provider": "openrouter",
                                "model": model,
                                "api_key_env": "OPENROUTER_API_KEY",
                            },
                        ],
                    },
                },
            },
            sort_keys=False,
        ),
        encoding="utf-8",
    )

    infospace = init_generation_infospace(
        tmp_path,
        book,
        "lefevre-live-routing",
        name="Lefevre Live Routing",
        profile="trading-literature",
        chapter_filter=["I"],
    )
    plan_generation(infospace.root, cost_per_1k_tokens=0.5)
    result = run_generation(
        infospace.root,
        provider="routing",
        routing_config=routing_config,
    )
    status = status_generation(infospace.root)

    assert result.status == "completed"
    assert status["source_chunk_count"] == 1
    assert status["entity_count"] >= 1

    report = (infospace.root / "reports" / "generation-summary.md").read_text(encoding="utf-8")
    assert "## Per-stage adapter choices" in report
    assert model in report, "report should name the routed model"

    # The routing bridge writes adapter_id + task_type onto provider_metadata.
    index = yaml.safe_load((infospace.root / "artifacts" / "index.yaml").read_text(encoding="utf-8"))
    routed_artifacts = [
        item
        for item in index["artifacts"]
        if item["kind"] in {"entity", "relation", "generated"}
        and (item.get("provenance") or {}).get("provider_metadata", {}).get("adapter_id")
    ]
    assert routed_artifacts, "routed artifacts must carry adapter_id provenance"
@@ -412,6 +412,25 @@ def test_build_routing_policy_claude_code_needs_no_api_key() -> None:
    assert isinstance(policy.rules[0].prefer, ClaudeCodeAdapter)


def test_example_trading_literature_config_parses() -> None:
    """Regression: the shipped example config must parse cleanly."""
    from infospace_bench.routing_config import load_routing_config

    example_path = Path(__file__).resolve().parent.parent / "examples" / "routing" / "trading-literature.yaml"

    config = load_routing_config(example_path)

    task_type_names = {task.task_type for task in config.task_types}
    assert {"cheap", "smart", "judge", "baseline"} <= task_type_names
    assert config.default_quality_floor == 0.80
    # Each shipped stage maps to a task type the config actually declares.
    for stage, task_type in config.stage_to_task_type.items():
        assert task_type in task_type_names, f"stage {stage!r} maps to undeclared task type {task_type!r}"
    # baseline is included so a T05 ShadowingAdapter wiring can reference it.
    baseline = next(t for t in config.task_types if t.task_type == "baseline")
    assert baseline.candidates[0].provider == "claude_code"


def test_build_routing_policy_honours_custom_api_key_env() -> None:
    from infospace_bench.routing_config import build_routing_policy_from_config
    from llm_connect.openrouter import OpenRouterAdapter

@@ -138,7 +138,7 @@ state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"

```task
id: IB-WP-0020-T04
-status: todo
+status: done
priority: medium
state_hub_task_id: "69288131-f265-4db5-a4b0-b0c8a6f55dd8"
```
