generated from coulomb/repo-seed
Compare commits: 82468c2165...debd2b8e69 (2 commits)
Commits: debd2b8e69, d3562454d7
@@ -94,6 +94,42 @@ skipped unless both `OPENROUTER_API_KEY` and
 chapter through the same path and asserts the provider metadata
 plumb-through.

+### Live runs with `--provider routing`
+
+When the routing CLI is what you want to exercise live, swap
+`--provider openrouter --model ...` for the routing pair:
+
+```bash
+infospace-bench generate from-source ./LEFEVRE.epub \
+  --workspace ./infospaces \
+  --slug reminiscences-routed \
+  --name "Reminiscences (Routed)" \
+  --profile trading-literature \
+  --provider routing \
+  --routing-config ./examples/routing/trading-literature.yaml \
+  --chapter I \
+  --apply
+```
+
+`examples/routing/trading-literature.yaml` is a checked-in starting
+config: cheap candidates for summary/evaluation, smart candidates for
+entity/relation, a `claude_code` baseline rule for future shadow
+sampling, and a workspace-relative `output/routing/quality.jsonl`
+ledger so adaptive observations stay with the workspace.
+
+`--quality-floor <float>` on the same command overrides the config's
+`default_quality_floor` for a single invocation, which is useful for
+tightening the bar for a specific run without editing the file. The
+ledger fills up as the `AdaptiveRoutingPolicy` records each
+observation; later runs against the same workspace get the benefit
+without re-grading from scratch.
+
+The parallel live-smoke test
+(`test_provider_routing_one_chapter_live_smoke`) is also gated on
+`INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1` + `OPENROUTER_API_KEY` and
+asserts the per-stage adapter-choices report section names the routed
+model.
+
 ### Budget and usage registry

 Every `generate plan` invocation appends a compact snapshot to
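Reviewer note on the `--quality-floor` paragraph in the README hunk: the flag overrides `default_quality_floor`, and the comments in `examples/routing/trading-literature.yaml` place a per-task `quality_floor` between the two. A minimal sketch of that precedence follows, assuming a hypothetical helper named `resolve_quality_floor`; it is not a function in this repository and only illustrates the documented ordering.

```python
from __future__ import annotations


def resolve_quality_floor(
    cli_floor: float | None,
    task_floor: float | None,
    default_floor: float,
) -> float:
    """Pick the first floor that is set: CLI flag, per-task value, config default.

    Hypothetical helper for illustration only.
    """
    # Compare against None rather than truthiness so an explicit 0.0 still wins.
    for floor in (cli_floor, task_floor):
        if floor is not None:
            return floor
    return default_floor
```

With the example config's values, `resolve_quality_floor(None, 0.85, 0.80)` picks the per-task floor for `smart`, while passing `0.90` as the first argument makes the CLI value win for that run.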
81
examples/routing/trading-literature.yaml
Normal file
@@ -0,0 +1,81 @@
+# Example routing config for a trading-literature Lefevre-style run.
+#
+# Captures the IB-WP-0018 task-type taxonomy from docs/routing-task-types.md:
+#   summarize-source  → cheap model (volume-heavy, recoverable downstream)
+#   extract-entities  → smart model (durable output; be strict)
+#   extract-relations → smart model (depends on entities)
+#   evaluate-entity   → judge model (different family from extraction)
+#   synthesize-report → smart model (volume-of-one, quality matters, cheap)
+#
+# Quality floors are the recommended starting points from
+# docs/routing-task-types.md. With a ledger configured, AdaptiveRoutingPolicy
+# will pick the cheapest *qualifying* adapter per task type as observations
+# accumulate; until then it falls back to the static prefer/fallback order.
+#
+# Refresh the model rates in src/infospace_bench/model_rates.yaml before any
+# full-book run; list prices drift, and the rough USD estimate in the budget
+# log depends on them.
+
+schema_version: 1
+
+# Workspace-relative ledger so QualityLedger observations from this workspace
+# stay with this workspace. Drop this line to run pure static routing.
+ledger_path: output/routing/quality.jsonl
+
+# Floors apply when --quality-floor is not passed at the call site. The CLI
+# flag wins, then the per-task quality_floor below, then this default.
+default_quality_floor: 0.80
+
+stage_to_task_type:
+  summarize-source: cheap
+  extract-entities: smart
+  extract-relations: smart
+  evaluate-entity: judge
+  synthesize-report: smart
+
+task_types:
+
+  cheap:
+    quality_floor: 0.70
+    candidates:
+      - id: openrouter:gpt-4o-mini
+        provider: openrouter
+        model: openai/gpt-4o-mini
+        api_key_env: OPENROUTER_API_KEY
+        max_cost_per_1k: 0.001
+      - id: openrouter:claude-3.5-haiku
+        provider: openrouter
+        model: anthropic/claude-3.5-haiku
+        api_key_env: OPENROUTER_API_KEY
+        max_cost_per_1k: 0.003
+
+  smart:
+    quality_floor: 0.85
+    candidates:
+      - id: openrouter:claude-3.5-haiku
+        provider: openrouter
+        model: anthropic/claude-3.5-haiku
+        api_key_env: OPENROUTER_API_KEY
+      - id: openrouter:claude-3.5-sonnet
+        provider: openrouter
+        model: anthropic/claude-3.5-sonnet
+        api_key_env: OPENROUTER_API_KEY
+
+  judge:
+    quality_floor: 0.80
+    candidates:
+      # Evaluation goes through a different family than extraction to limit
+      # self-preference bias.
+      - id: openrouter:gpt-4o-mini
+        provider: openrouter
+        model: openai/gpt-4o-mini
+        api_key_env: OPENROUTER_API_KEY
+
+  # Baseline is wired here so a follow-up T05 ShadowingAdapter step can
+  # reference `claude-code` as the grading oracle without editing the
+  # task_types stanza.
+  baseline:
+    candidates:
+      - id: claude-code
+        provider: claude_code
+        model: claude-opus-4-7
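Reviewer note on the config header comment: it says that, once a ledger has observations, `AdaptiveRoutingPolicy` picks the cheapest qualifying adapter per task type and otherwise falls back to the static prefer/fallback order. The sketch below illustrates that selection rule under stated assumptions. `Candidate`, `pick_adapter`, and the cost figures are hypothetical names and values, and the real policy may weigh observations differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    adapter_id: str
    cost_per_1k: float  # hypothetical price per 1k tokens, for illustration


def pick_adapter(
    candidates: list[Candidate],
    observed_quality: dict[str, float],
    floor: float,
) -> Candidate:
    """Return the cheapest candidate whose observed quality meets the floor.

    `candidates` is given in static prefer/fallback order. If no candidate has
    a qualifying observation yet, the first (preferred) candidate is returned.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    qualifying = [
        c
        for c in candidates
        if c.adapter_id in observed_quality and observed_quality[c.adapter_id] >= floor
    ]
    if not qualifying:
        # Assumption: no qualifying observation means "use the static order".
        return candidates[0]
    # min() keeps the earliest candidate on cost ties, preserving static order.
    return min(qualifying, key=lambda c: c.cost_per_1k)
```

With an empty observation map the preferred candidate is chosen; once a cheaper candidate's recorded quality clears the floor, it takes over from a more expensive one.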
@@ -29,7 +29,7 @@ _PACKAGE_RATES_PATH = Path(__file__).parent / "model_rates.yaml"
 HUB_URL_ENV = "INFOSPACE_BENCH_HUB_URL"
 HUB_DISABLE_ENV = "INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS"
 DEFAULT_HUB_URL = "http://127.0.0.1:8000"
-TOKEN_EVENTS_PATH = "/state/token-events"
+TOKEN_EVENTS_PATH = "/token-events/"
 HUB_TIMEOUT_SECONDS = 3.0

 BUDGET_DIR = Path("output/budget")
@@ -203,9 +203,11 @@ def build_parser() -> argparse.ArgumentParser:
     )
     generate_run.add_argument("root")
     generate_run.add_argument("--stage", default="all")
-    generate_run.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
+    generate_run.add_argument("--provider", choices=["fixture", "openrouter", "routing"], default="fixture")
     generate_run.add_argument("--model", default="")
     generate_run.add_argument("--fixture-responses", default="")
+    generate_run.add_argument("--routing-config", default="", help="YAML routing config (required with --provider routing)")
+    generate_run.add_argument("--quality-floor", type=float, default=None, help="Override the config's default_quality_floor for this run")
     generate_run.add_argument("--resume", action="store_true")
     generate_run.add_argument("--force", action="store_true")

@@ -215,9 +217,11 @@ def build_parser() -> argparse.ArgumentParser:
     )
     generate_resume.add_argument("root")
     generate_resume.add_argument("--stage", default="all")
-    generate_resume.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
+    generate_resume.add_argument("--provider", choices=["fixture", "openrouter", "routing"], default="fixture")
     generate_resume.add_argument("--model", default="")
     generate_resume.add_argument("--fixture-responses", default="")
+    generate_resume.add_argument("--routing-config", default="")
+    generate_resume.add_argument("--quality-floor", type=float, default=None)
     generate_resume.add_argument("--force", action="store_true")

     generate_status = generate_sub.add_parser(
@@ -236,9 +240,11 @@ def build_parser() -> argparse.ArgumentParser:
     generate_from_source.add_argument("--name", required=True)
     generate_from_source.add_argument("--profile", default="general-knowledge")
     generate_from_source.add_argument("--stage", default="all")
-    generate_from_source.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
+    generate_from_source.add_argument("--provider", choices=["fixture", "openrouter", "routing"], default="fixture")
     generate_from_source.add_argument("--model", default="")
     generate_from_source.add_argument("--fixture-responses", default="")
+    generate_from_source.add_argument("--routing-config", default="", help="YAML routing config (required with --provider routing)")
+    generate_from_source.add_argument("--quality-floor", type=float, default=None)
     generate_from_source.add_argument("--max-chunks", type=int, default=0)
     generate_from_source.add_argument(
         "--chapter",
@@ -551,6 +557,8 @@ def main(argv: list[str] | None = None) -> int:
             provider=args.provider,
             model=args.model,
             fixture_responses=args.fixture_responses or None,
+            routing_config=args.routing_config or None,
+            quality_floor=args.quality_floor,
             resume=args.resume,
             force=args.force,
         ).to_dict()
@@ -563,6 +571,8 @@ def main(argv: list[str] | None = None) -> int:
             provider=args.provider,
             model=args.model,
             fixture_responses=args.fixture_responses or None,
+            routing_config=args.routing_config or None,
+            quality_floor=args.quality_floor,
             resume=True,
             force=args.force,
         ).to_dict()
@@ -589,6 +599,8 @@ def main(argv: list[str] | None = None) -> int:
             provider=args.provider,
             model=args.model,
             fixture_responses=args.fixture_responses or None,
+            routing_config=args.routing_config or None,
+            quality_floor=args.quality_floor,
         )
         _write_json(result.to_dict())
     else:
@@ -427,6 +427,8 @@ def run_generation(
     provider: str = "fixture",
     model: str = "",
     fixture_responses: str | Path | None = None,
+    routing_config: str | Path | None = None,
+    quality_floor: float | None = None,
     resume: bool = False,
     force: bool = False,
 ) -> GenerationRunResult:
@@ -449,7 +451,14 @@ def run_generation(
     started_wall = datetime.now(timezone.utc)
     monotonic_start = _monotonic()
     adapter = (
-        _adapter_for(provider, model=model, fixture_responses=fixture_responses)
+        _adapter_for(
+            provider,
+            model=model,
+            fixture_responses=fixture_responses,
+            routing_config=routing_config,
+            quality_floor=quality_floor,
+            workspace=_workspace_for(root_path),
+        )
         if workflow_ids
         else None
     )
@@ -551,14 +560,42 @@ def _adapter_for(
     *,
     model: str,
     fixture_responses: str | Path | None,
+    routing_config: str | Path | None = None,
+    quality_floor: float | None = None,
+    workspace: Path | None = None,
 ) -> AssistedGenerationAdapter:
     if fixture_responses:
         return FixtureAssistedGenerationAdapter.from_file(Path(fixture_responses))
     if provider == "openrouter":
         return OpenRouterAssistedGenerationAdapter(model=model)
+    if provider == "routing":
+        if not routing_config:
+            raise InfospaceError(
+                "missing_routing_config",
+                "--provider routing requires --routing-config <path>",
+                {"provider": provider},
+            )
+        from .routing import RoutingAssistedGenerationAdapter
+        from .routing_config import (
+            build_routing_policy_from_config,
+            load_routing_config,
+        )
+
+        config = load_routing_config(routing_config)
+        policy = build_routing_policy_from_config(config, workspace=workspace)
+        effective_floor = (
+            quality_floor
+            if quality_floor is not None
+            else config.default_quality_floor
+        )
+        return RoutingAssistedGenerationAdapter(
+            policy=policy,
+            stage_to_task_type=dict(config.stage_to_task_type),
+            quality_floor=effective_floor,
+        )
     raise InfospaceError(
         "missing_assisted_generation_adapter",
-        "Assisted generation requires --fixture-responses or --provider openrouter",
+        "Assisted generation requires --fixture-responses, --provider openrouter, or --provider routing",
         {"provider": provider},
     )

@@ -112,7 +112,11 @@ def _identify_adapter(adapter: LLMAdapter) -> str:
     adapter_id = getattr(adapter, "adapter_id", "")
     if adapter_id:
         return str(adapter_id)
-    model = getattr(adapter, "model", "") or getattr(adapter, "model_name", "")
+    model = (
+        getattr(adapter, "model", "")
+        or getattr(adapter, "model_name", "")
+        or getattr(adapter, "_model", "")
+    )
     name = type(adapter).__name__
     if model:
         return f"{name}:{model}"
@@ -522,7 +522,7 @@ def test_emit_token_event_calls_poster_with_record_token_payload(tmp_path: Path)
     assert result["status"] == "emitted"
     assert len(calls) == 1
     url, payload, timeout = calls[0]
-    assert url == "http://hub.example/state/token-events"
+    assert url == "http://hub.example/token-events/"
     assert payload["tokens_in"] == 1200
     assert payload["tokens_out"] == 400
     assert payload["model"] == "openai/gpt-4o-mini"
@@ -208,3 +208,87 @@ def test_openrouter_one_chapter_smoke(tmp_path: Path) -> None:
         and item.get("provenance", {}).get("provider_metadata", {}).get("request_id")
     ]
     assert generated_with_metadata, "generated artifacts should carry provider_metadata.request_id"
+
+
+_LIVE_ROUTING_REASON = (
+    "set INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1 and OPENROUTER_API_KEY to run "
+    "the optional one-chapter routing smoke against OpenRouter"
+)
+
+
+@pytest.mark.skipif(not (_LIVE_OPT_IN and _LIVE_API_KEY), reason=_LIVE_ROUTING_REASON)
+def test_provider_routing_one_chapter_live_smoke(tmp_path: Path) -> None:
+    """Live smoke: one chapter through --provider routing against OpenRouter.
+
+    Uses a minimal one-candidate-per-task-type routing config so the test
+    spends roughly the same as the static OpenRouter smoke. Asserts the run
+    completes, the routing bridge recorded adapter_id / task_type on
+    provider_metadata, and the per-stage adapter-choices report section
+    reflects routed choices.
+    """
+    book = _build_fixture_epub(tmp_path / "lefevre.epub")
+    model = os.environ.get("INFOSPACE_BENCH_LIVE_MODEL", "openai/gpt-4o-mini")
+
+    routing_config = tmp_path / "routing.yaml"
+    routing_config.write_text(
+        yaml.safe_dump(
+            {
+                "schema_version": 1,
+                "stage_to_task_type": {
+                    "summarize-source": "cheap",
+                    "extract-entities": "cheap",
+                    "extract-relations": "cheap",
+                    "evaluate-entity": "cheap",
+                    "synthesize-report": "cheap",
+                },
+                "task_types": {
+                    "cheap": {
+                        "candidates": [
+                            {
+                                "id": f"openrouter:{model}",
+                                "provider": "openrouter",
+                                "model": model,
+                                "api_key_env": "OPENROUTER_API_KEY",
+                            },
+                        ],
+                    },
+                },
+            },
+            sort_keys=False,
+        ),
+        encoding="utf-8",
+    )
+
+    infospace = init_generation_infospace(
+        tmp_path,
+        book,
+        "lefevre-live-routing",
+        name="Lefevre Live Routing",
+        profile="trading-literature",
+        chapter_filter=["I"],
+    )
+    plan_generation(infospace.root, cost_per_1k_tokens=0.5)
+    result = run_generation(
+        infospace.root,
+        provider="routing",
+        routing_config=routing_config,
+    )
+    status = status_generation(infospace.root)
+
+    assert result.status == "completed"
+    assert status["source_chunk_count"] == 1
+    assert status["entity_count"] >= 1
+
+    report = (infospace.root / "reports" / "generation-summary.md").read_text(encoding="utf-8")
+    assert "## Per-stage adapter choices" in report
+    assert model in report, "report should name the routed model"
+
+    # The routing bridge writes adapter_id + task_type onto provider_metadata.
+    index = yaml.safe_load((infospace.root / "artifacts" / "index.yaml").read_text(encoding="utf-8"))
+    routed_artifacts = [
+        item
+        for item in index["artifacts"]
+        if item["kind"] in {"entity", "relation", "generated"}
+        and (item.get("provenance") or {}).get("provider_metadata", {}).get("adapter_id")
+    ]
+    assert routed_artifacts, "routed artifacts must carry adapter_id provenance"
286
tests/test_routing_cli.py
Normal file
@@ -0,0 +1,286 @@
+"""
+Tests for the routing CLI flags (IB-WP-0020-T03).
+
+Three levels:
+- _adapter_for("routing") unit checks: missing config, happy path
+- run_generation end-to-end through --provider routing with a stubbed
+  OpenRouterAdapter.execute_prompt so no network is required
+- CLI subprocess smoke that proves the new flags are wired
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import subprocess
+import sys
+import zipfile
+from pathlib import Path
+
+import pytest
+import yaml
+
+from infospace_bench.errors import InfospaceError
+from infospace_bench.generator import (
+    _adapter_for,
+    init_generation_infospace,
+    run_generation,
+    status_generation,
+)
+from infospace_bench.routing import RoutingAssistedGenerationAdapter
+
+
+FIXTURE_ROOT = Path(__file__).parent / "fixtures" / "lefevre"
+
+
+def _build_fixture_epub(target: Path) -> Path:
+    sources = FIXTURE_ROOT / "sources"
+    layout: dict[str, str] = {
+        "mimetype": "application/epub+zip",
+        "META-INF/container.xml": (sources / "container.xml").read_text(encoding="utf-8"),
+    }
+    for source in sorted(sources.glob("*.xhtml")):
+        layout[f"OEBPS/{source.name}"] = source.read_text(encoding="utf-8")
+    layout["OEBPS/content.opf"] = (sources / "content.opf").read_text(encoding="utf-8")
+    with zipfile.ZipFile(target, "w") as archive:
+        for path_in_zip, contents in layout.items():
+            archive.writestr(path_in_zip, contents)
+    return target
+
+
+def _write_routing_config(path: Path, *, ledger_relpath: str | None = None) -> None:
+    """Minimal routing config that maps every fixture stage to one cheap candidate."""
+    data: dict = {
+        "schema_version": 1,
+        "stage_to_task_type": {
+            "summarize-source": "cheap",
+            "extract-entities": "cheap",
+            "extract-relations": "cheap",
+            "evaluate-entity": "cheap",
+            "synthesize-report": "cheap",
+        },
+        "task_types": {
+            "cheap": {
+                "candidates": [
+                    {
+                        "id": "openrouter:gpt-4o-mini",
+                        "provider": "openrouter",
+                        "model": "openai/gpt-4o-mini",
+                        "api_key_env": "OPENROUTER_API_KEY",
+                    },
+                ],
+            },
+        },
+    }
+    if ledger_relpath is not None:
+        data["ledger_path"] = ledger_relpath
+    path.write_text(yaml.safe_dump(data, sort_keys=False), encoding="utf-8")
+
+
+def test_adapter_for_routing_missing_config_raises() -> None:
+    with pytest.raises(InfospaceError) as exc_info:
+        _adapter_for("routing", model="", fixture_responses=None, routing_config=None)
+    assert exc_info.value.code == "missing_routing_config"
+
+
+def test_adapter_for_routing_returns_bridge(tmp_path: Path, monkeypatch) -> None:
+    monkeypatch.setenv("OPENROUTER_API_KEY", "sk-fake-test-key")
+    config_path = tmp_path / "routing.yaml"
+    _write_routing_config(config_path)
+
+    adapter = _adapter_for(
+        "routing",
+        model="",
+        fixture_responses=None,
+        routing_config=config_path,
+        workspace=tmp_path,
+    )
+
+    assert isinstance(adapter, RoutingAssistedGenerationAdapter)
+    assert adapter.stage_to_task_type["summarize-source"] == "cheap"
+
+
+_FIXTURE_RESPONSES = {
+    "summarize-source": "# Source Summary\n\nFixture summary content.\n",
+    "extract-entities": (
+        "# Stub Entity\n\n"
+        "## Category\n\nstrategy\n\n"
+        "## Definition\n\nA stub trading concept for the routing CLI smoke.\n"
+    ),
+    "extract-relations": (
+        "# Stub Entity Practices Tape Reading\n\n"
+        "## Subject\n\nStub Entity\n\n"
+        "## Predicate\n\npractices\n\n"
+        "## Object\n\nTape Reading\n\n"
+        "## Relation Type\n\nstrategy_outcome\n\n"
+        "## Evidence\n\nFixture evidence.\n"
+    ),
+    "evaluate-entity": (
+        "---\n"
+        "artifact_id: entity/stub-entity.md\n"
+        "evaluator: fixture\n"
+        "evaluated_at: '2026-05-18T00:00:00'\n"
+        "scores:\n"
+        "  - name: groundedness\n    value: 4.0\n    max_value: 5.0\n"
+        "  - name: lesson_clarity\n    value: 4.0\n    max_value: 5.0\n"
+        "  - name: historical_context\n    value: 4.0\n    max_value: 5.0\n"
+        "  - name: overgeneralization_risk\n    value: 4.0\n    max_value: 5.0\n"
+        "---\n\n"
+        "# Evaluation: entity/stub-entity.md\n"
+    ),
+    "synthesize-report": "# Routed Report\n\nFixture report.\n",
+}
+
+
+def _stub_openrouter_execute(self, prompt, config):
+    """Replacement for OpenRouterAdapter.execute_prompt that returns canned content.
+
+    Identifies the stage from the rendered template's H1 line (templates
+    start with ``# Extract Entities`` / ``# Extract Relations`` / ``# Evaluate
+    ...`` / ``# Synthesize ...``; anything else is treated as the
+    summarize-source stage).
+    """
+    from llm_connect.models import LLMResponse
+
+    first_line = prompt.lstrip().splitlines()[0] if prompt.strip() else ""
+    lower = first_line.lower()
+    if lower.startswith("# extract") and "entit" in lower:
+        content = _FIXTURE_RESPONSES["extract-entities"]
+    elif lower.startswith("# extract") and "relation" in lower:
+        content = _FIXTURE_RESPONSES["extract-relations"]
+    elif lower.startswith("# evaluate"):
+        content = _FIXTURE_RESPONSES["evaluate-entity"]
+    elif lower.startswith("# synthesize"):
+        content = _FIXTURE_RESPONSES["synthesize-report"]
+    else:
+        content = _FIXTURE_RESPONSES["summarize-source"]
+    return LLMResponse(
+        content=content,
+        model=getattr(self, "_model", "openai/gpt-4o-mini"),
+        usage={"prompt_tokens": len(prompt.split()), "completion_tokens": 40},
+        finish_reason="stop",
+        metadata={"request_id": "or-stub-1"},
+    )
+
+
+def test_run_generation_via_routing_provider_completes_end_to_end(
+    tmp_path: Path, monkeypatch
+) -> None:
+    monkeypatch.setenv("OPENROUTER_API_KEY", "sk-fake-test-key")
+    from llm_connect.openrouter import OpenRouterAdapter
+
+    monkeypatch.setattr(
+        OpenRouterAdapter, "execute_prompt", _stub_openrouter_execute, raising=True
+    )
+
+    book = _build_fixture_epub(tmp_path / "lefevre.epub")
+    config_path = tmp_path / "routing.yaml"
+    _write_routing_config(config_path)
+
+    infospace = init_generation_infospace(
+        tmp_path,
+        book,
+        "lefevre-routing-smoke",
+        name="Lefevre Routing Smoke",
+        profile="trading-literature",
+        chapter_filter=["I"],
+    )
+    result = run_generation(
+        infospace.root,
+        provider="routing",
+        routing_config=config_path,
+    )
+    status = status_generation(infospace.root)
+
+    assert result.status == "completed"
+    assert status["source_chunk_count"] == 1
+    assert status["entity_count"] >= 1
+    assert status["evaluation_count"] >= 1
+
+    report = (infospace.root / "reports" / "generation-summary.md").read_text(encoding="utf-8")
+    assert "## Per-stage adapter choices" in report
+    assert "openai/gpt-4o-mini" in report  # adapter_id ends with the model
+
+    # Budget usage rollup should bucket calls by the routed model.
+    import yaml as _yaml
+
+    usage = _yaml.safe_load((infospace.root / "output" / "budget" / "usage.yaml").read_text(encoding="utf-8"))
+    bucket_models = {b["model"] for b in usage["runs"][0]["per_bucket"]}
+    assert "openai/gpt-4o-mini" in bucket_models
+
+
+def test_from_source_cli_provider_routing(tmp_path: Path, monkeypatch) -> None:
+    book = _build_fixture_epub(tmp_path / "lefevre.epub")
+    config_path = tmp_path / "routing.yaml"
+    _write_routing_config(config_path)
+
+    env = os.environ.copy()
+    env["PYTHONPATH"] = "src:/home/worsch/markitect-tool/src:/home/worsch/llm-connect"
+
+    # Missing API key → fast fail from the loader, no subprocess crash.
+    env.pop("OPENROUTER_API_KEY", None)
+    bad = subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "infospace_bench",
+            "generate",
+            "from-source",
+            str(book),
+            "--workspace",
+            str(tmp_path),
+            "--slug",
+            "routing-cli-missing-key",
+            "--name",
+            "Routing CLI Missing Key",
+            "--profile",
+            "trading-literature",
+            "--provider",
+            "routing",
+            "--routing-config",
+            str(config_path),
+            "--chapter",
+            "I",
+            "--apply",
+        ],
+        check=False,
+        env=env,
+        text=True,
+        capture_output=True,
+    )
+    assert bad.returncode != 0
+    assert "missing_routing_api_key" in (bad.stdout + bad.stderr)
+
+
+def test_run_via_routing_resolves_workspace_relative_ledger(
+    tmp_path: Path, monkeypatch
+) -> None:
+    monkeypatch.setenv("OPENROUTER_API_KEY", "sk-fake-test-key")
+    from llm_connect.openrouter import OpenRouterAdapter
+
+    monkeypatch.setattr(
+        OpenRouterAdapter, "execute_prompt", _stub_openrouter_execute, raising=True
+    )
+
+    book = _build_fixture_epub(tmp_path / "lefevre.epub")
+    config_path = tmp_path / "routing.yaml"
+    _write_routing_config(config_path, ledger_relpath="output/routing/quality.jsonl")
+
+    infospace = init_generation_infospace(
+        tmp_path,
+        book,
+        "lefevre-routing-ledger",
+        name="Lefevre Routing Ledger",
+        profile="trading-literature",
+        chapter_filter=["I"],
+    )
+    run_generation(
+        infospace.root,
+        provider="routing",
+        routing_config=config_path,
+        quality_floor=0.7,
+    )
+
+    # ledger_path is relative to the workspace (tmp_path), not the infospace root.
+    ledger_path = tmp_path / "output" / "routing" / "quality.jsonl"
+    assert ledger_path.parent.is_dir(), "loader must create the ledger parent dir"
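Reviewer note on the ledger test above: it expects `ledger_path` to resolve against the workspace rather than the infospace root, and expects the loader to create the parent directory. A minimal sketch of that behaviour follows, assuming a hypothetical helper `resolve_ledger_path`; the loader in `routing_config` may be structured differently, and the handling of absolute paths here is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def resolve_ledger_path(workspace: Path, ledger_path: str) -> Path:
    """Resolve a config's ledger_path against the workspace and ensure its parent exists.

    Hypothetical helper for illustration only.
    """
    candidate = Path(ledger_path)
    # Assumption: absolute paths are kept as-is; relative ones hang off the workspace.
    resolved = candidate if candidate.is_absolute() else workspace / candidate
    # Create only the directory; the ledger file itself appears on first write.
    resolved.parent.mkdir(parents=True, exist_ok=True)
    return resolved
```

Creating only the parent directory matches what the test asserts: the directory exists after a run even if no observation has been written yet.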
@@ -412,6 +412,25 @@ def test_build_routing_policy_claude_code_needs_no_api_key() -> None:
     assert isinstance(policy.rules[0].prefer, ClaudeCodeAdapter)


+def test_example_trading_literature_config_parses() -> None:
+    """Regression: the shipped example config must parse cleanly."""
+    from infospace_bench.routing_config import load_routing_config
+
+    example_path = Path(__file__).resolve().parent.parent / "examples" / "routing" / "trading-literature.yaml"
+
+    config = load_routing_config(example_path)
+
+    task_type_names = {task.task_type for task in config.task_types}
+    assert {"cheap", "smart", "judge", "baseline"} <= task_type_names
+    assert config.default_quality_floor == 0.80
+    # Each shipped stage maps to a task type the config actually declares.
+    for stage, task_type in config.stage_to_task_type.items():
+        assert task_type in task_type_names, f"stage {stage!r} maps to undeclared task type {task_type!r}"
+    # baseline is included so a T05 ShadowingAdapter wiring can reference it.
+    baseline = next(t for t in config.task_types if t.task_type == "baseline")
+    assert baseline.candidates[0].provider == "claude_code"
+
+
 def test_build_routing_policy_honours_custom_api_key_env() -> None:
     from infospace_bench.routing_config import build_routing_policy_from_config
     from llm_connect.openrouter import OpenRouterAdapter
@@ -117,7 +117,7 @@ state_hub_task_id: "5e38514b-ad6a-4d39-8716-f812f241d9fd"

 ```task
 id: IB-WP-0020-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"
 ```
@@ -138,7 +138,7 @@ state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"

 ```task
 id: IB-WP-0020-T04
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "69288131-f265-4db5-a4b0-b0c8a6f55dd8"
 ```