IB-WP-0016-T06: OpenRouter live-run guardrails

Add --chapter / --from-chapter / --to-chapter / --chunk selection flags
to generate init and generate from-source, plumb them into
init_generation_infospace via a new _filter_chunks_by_chapter helper,
and refuse to create an infospace when the filters reject every chunk
(InfospaceError "empty_chapter_selection"). The flags use the same
T03/T02 plumbing (chapter labels, roman numerals, chunk ids) so a
single-chapter selection is a one-flag command.

OpenRouter run-record metadata (model, request_id, usage tokens,
retry_count, duration_seconds) already lands in
output/workflows/runs/*.yaml; this task just adds the smoke test that
proves it stays there, plus the parallel guarantee that the same
provider metadata reaches generated artifact provenance via
provenance.provider_metadata.

tests/test_openrouter_live.py covers:
- chapter-filter, from/to-chapter range, and empty-selection failure on
  init (non-live, deterministic)
- CLI smoke through generate from-source with --chapter
- a pytest-skipped live OpenRouter one-chapter end-to-end gated by
  OPENROUTER_API_KEY + INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER, with
  INFOSPACE_BENCH_LIVE_MODEL override (default openai/gpt-4o-mini)

docs/generic-source-generator.md gains a "Live OpenRouter runs (handle
with care)" section that walks plan-before-run, single-chapter live
run, the budget/usage artifacts, and the checks a reviewer should run
before scaling to the full book.

129 tests pass, 1 skipped (the live smoke, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 23:04:19 +02:00
parent 348deca9f2
commit ab23c5873e
5 changed files with 351 additions and 1 deletions


@@ -48,6 +48,52 @@ infospace-bench generate status ./infospaces/book-space
shows chunk counts, generated artifact counts, evaluations, metrics, history,
and stale source/profile inputs.
### Live OpenRouter runs (handle with care)
A single-chapter live run is the only OpenRouter shape the test suite
covers today. Use `--chapter` (or `--from-chapter` / `--to-chapter`) on
`generate init` or `generate from-source` to scope what gets registered
before any provider calls happen:
```bash
export OPENROUTER_API_KEY=...
# Preview the cost first
infospace-bench generate plan ./infospaces/foo --chapter I --cost-per-1k 0.30
# Run only Chapter I against a cheap model
infospace-bench generate from-source ./LEFEVRE.epub \
--workspace ./infospaces \
--slug reminiscences-ch1 \
--name "Reminiscences (Ch I)" \
--profile trading-literature \
--provider openrouter \
--model openai/gpt-4o-mini \
--chapter I \
--apply
```
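
The way these selection flags combine can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the `Chunk` dataclass is a hypothetical stand-in for the real source-chunk type, and the chunk ids and chapter labels are invented. It assumes every filter that is set must pass, while a `--chapter` value may match either the chapter label (case-insensitively) or the chapter number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    # Hypothetical stand-in for the project's source-chunk type.
    chunk_id: str
    chapter_label: Optional[str]
    chapter_number: Optional[int]


def select(chunks, chapters=(), chunk_ids=(), from_chapter=None, to_chapter=None):
    """Keep only the chunks that pass every filter that is set."""
    labels = {value.strip().lower() for value in chapters if value.strip()}
    ids = {value.strip() for value in chunk_ids if value.strip()}
    kept = []
    for chunk in chunks:
        if ids and chunk.chunk_id not in ids:
            continue
        if labels:
            label = (chunk.chapter_label or "").strip().lower()
            by_number = (
                chunk.chapter_number is not None
                and str(chunk.chapter_number) in labels
            )
            by_label = bool(label) and label in labels
            if not (by_number or by_label):
                continue
        if from_chapter is not None or to_chapter is not None:
            number = chunk.chapter_number
            if number is None:
                continue  # unnumbered chunks never fall inside a range
            if from_chapter is not None and number < from_chapter:
                continue
            if to_chapter is not None and number > to_chapter:
                continue
        kept.append(chunk)
    return kept


# Invented sample data for illustration only.
book = [
    Chunk("preface", None, None),
    Chunk("chapter-01", "I", 1),
    Chunk("chapter-02", "II", 2),
    Chunk("chapter-03", "III", 3),
]

only_first = [c.chunk_id for c in select(book, chapters=["I"])]
middle = [c.chunk_id for c in select(book, from_chapter=2, to_chapter=3)]
conflicting = [c.chunk_id for c in select(book, chapters=["2"], from_chapter=3)]
print(only_first, middle, conflicting)
```

In this sketch the last call returns an empty list because the two filters contradict each other. That is the situation in which `generate init` and `generate from-source` refuse to create an infospace and report `empty_chapter_selection`.
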
`output/budget/plans.yaml`, `usage.yaml`, and `summary.yaml` record what
was estimated, what was actually spent, and the plan-vs-actual delta.
`output/workflows/runs/*.yaml` carry the OpenRouter request_id, model,
token usage, retry count, and per-call duration; the same metadata
reaches the entity/relation/evaluation artifacts via
`provenance.provider_metadata`.
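
To check that metadata by hand, a short script along the following lines may help. It is a sketch under assumptions: the record layout (a `stages` list whose entries carry `provider` and `metadata` keys) is taken from what the live smoke test reads, the helper names are hypothetical, and the sample values are invented. Loading from disk relies on PyYAML.

```python
from pathlib import Path


def provider_metadata(record: dict, provider: str = "openrouter") -> list:
    """Collect the metadata dict of every stage run by the given provider."""
    found = []
    for stage in record.get("stages") or []:
        if stage.get("provider") == provider:
            found.append(stage.get("metadata") or {})
    return found


def load_run_records(root: Path) -> list:
    """Parse every run record under <root>/output/workflows/runs."""
    import yaml  # PyYAML, imported lazily so the helper above has no dependency

    runs = root / "output" / "workflows" / "runs"
    return [
        yaml.safe_load(path.read_text(encoding="utf-8")) or {}
        for path in sorted(runs.glob("*.yaml"))
    ]


# Invented sample record for illustration only.
sample = {
    "stages": [
        {
            "provider": "openrouter",
            "metadata": {
                "model": "openai/gpt-4o-mini",
                "request_id": "req-123",
                "usage": {"prompt_tokens": 10, "completion_tokens": 5},
                "retry_count": 0,
                "duration_seconds": 1.2,
            },
        },
        {"provider": "fixture"},
    ]
}

for meta in provider_metadata(sample):
    print(meta["request_id"], meta["usage"], meta["retry_count"])
```

Against a real infospace, the same loop would run over `load_run_records(Path("./infospaces/<slug>"))` instead of the sample record.
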
Before scaling to the full book:
- Inspect each chapter's outputs and `generation-summary.md`
- Multiply the per-chapter `total_provider_calls_estimate` and
`estimated_cost_usd` by the chapter count and compare to your budget
- Decide on a final model and confirm the rate-table entry exists in
`src/infospace_bench/model_rates.yaml` or your workspace override
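
The multiplication in the second check can be sketched as follows. This is a rough linear extrapolation that assumes the remaining chapters are about as large as the one already run. The two field names come from the plan output mentioned above; the function name, the numbers, and the budget are invented for illustration.

```python
def scale_estimate(per_chapter: dict, chapter_count: int, budget_usd: float) -> dict:
    """Scale a one-chapter plan linearly and compare it to a budget."""
    calls = per_chapter["total_provider_calls_estimate"] * chapter_count
    cost = per_chapter["estimated_cost_usd"] * chapter_count
    return {
        "calls": calls,
        "cost_usd": round(cost, 2),
        "within_budget": cost <= budget_usd,
    }


# Invented one-chapter figures for illustration only.
one_chapter = {"total_provider_calls_estimate": 12, "estimated_cost_usd": 0.04}
projection = scale_estimate(one_chapter, chapter_count=24, budget_usd=2.00)
print(projection)
```

Chapters that are much longer than the sampled one will cost more than this projection suggests, so treat the result as a lower-confidence bound rather than a quote.
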
The optional live-smoke test in `tests/test_openrouter_live.py` is
skipped unless both `OPENROUTER_API_KEY` and
`INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1` are set. It runs a single
chapter through the same path and asserts the provider metadata
plumb-through.
### Budget and usage registry
Every `generate plan` invocation appends a compact snapshot to


@@ -148,6 +148,20 @@ def build_parser() -> argparse.ArgumentParser:
    generate_init.add_argument("--name", required=True)
    generate_init.add_argument("--profile", default="general-knowledge")
    generate_init.add_argument("--max-chunks", type=int, default=0)
    generate_init.add_argument(
        "--chapter",
        action="append",
        default=[],
        help="Filter to chapter label or number (repeatable: --chapter I --chapter II)",
    )
    generate_init.add_argument("--from-chapter", type=int, default=None)
    generate_init.add_argument("--to-chapter", type=int, default=None)
    generate_init.add_argument(
        "--chunk",
        action="append",
        default=[],
        help="Filter to source chunk id (repeatable)",
    )

    generate_plan = generate_sub.add_parser(
        "plan",
@@ -226,6 +240,20 @@ def build_parser() -> argparse.ArgumentParser:
    generate_from_source.add_argument("--model", default="")
    generate_from_source.add_argument("--fixture-responses", default="")
    generate_from_source.add_argument("--max-chunks", type=int, default=0)
    generate_from_source.add_argument(
        "--chapter",
        action="append",
        default=[],
        help="Filter to chapter label or number (repeatable)",
    )
    generate_from_source.add_argument("--from-chapter", type=int, default=None)
    generate_from_source.add_argument("--to-chapter", type=int, default=None)
    generate_from_source.add_argument(
        "--chunk",
        action="append",
        default=[],
        help="Filter to source chunk id (repeatable)",
    )
    generate_from_source.add_argument("--apply", action="store_true")

    budget = sub.add_parser("budget", help="Inspect per-infospace budget and usage records")
@@ -479,6 +507,10 @@ def main(argv: list[str] | None = None) -> int:
name=args.name,
profile=args.profile,
max_chunks=_optional_positive(args.max_chunks),
chapter_filter=args.chapter or None,
chunk_filter=args.chunk or None,
from_chapter=args.from_chapter,
to_chapter=args.to_chapter,
)
_write_json(
{
@@ -537,6 +569,10 @@ def main(argv: list[str] | None = None) -> int:
name=args.name,
profile=args.profile,
max_chunks=_optional_positive(args.max_chunks),
chapter_filter=args.chapter or None,
chunk_filter=args.chunk or None,
from_chapter=args.from_chapter,
to_chapter=args.to_chapter,
)
if args.apply:
result = run_generation(


@@ -79,8 +79,30 @@ def init_generation_infospace(
    name: str,
    profile: str = DEFAULT_PROFILE,
    max_chunks: int | None = None,
    chapter_filter: list[str] | None = None,
    chunk_filter: list[str] | None = None,
    from_chapter: int | None = None,
    to_chapter: int | None = None,
) -> Any:
    chunks = normalize_source(source, max_chunks=max_chunks)
    chunks = _filter_chunks_by_chapter(
        chunks,
        chapter_filter=chapter_filter,
        chunk_filter=chunk_filter,
        from_chapter=from_chapter,
        to_chapter=to_chapter,
    )
    if not chunks:
        raise InfospaceError(
            "empty_chapter_selection",
            "Selection filters excluded every chunk; nothing to register",
            {
                "chapter_filter": chapter_filter,
                "chunk_filter": chunk_filter,
                "from_chapter": from_chapter,
                "to_chapter": to_chapter,
            },
        )
    infospace = create_infospace(Path(workspace), slug, name=name)
    _install_profile(infospace.root, profile)
    _write_workflows(infospace.root, profile)
@@ -256,6 +278,42 @@ def plan_generation_summary(
}


def _filter_chunks_by_chapter(
    chunks: list[SourceChunk],
    *,
    chapter_filter: list[str] | None,
    chunk_filter: list[str] | None,
    from_chapter: int | None,
    to_chapter: int | None,
) -> list[SourceChunk]:
    chunk_set = {value.strip() for value in (chunk_filter or []) if value.strip()}
    label_set = {value.strip().lower() for value in (chapter_filter or []) if value.strip()}
    if not chunk_set and not label_set and from_chapter is None and to_chapter is None:
        return list(chunks)
    out: list[SourceChunk] = []
    for chunk in chunks:
        if chunk_set and chunk.chunk_id not in chunk_set:
            continue
        if label_set:
            label = (chunk.chapter_label or "").strip().lower()
            number_match = (
                chunk.chapter_number is not None
                and str(chunk.chapter_number) in label_set
            )
            label_match = label in label_set if label else False
            if not (number_match or label_match):
                continue
        if from_chapter is not None or to_chapter is not None:
            if chunk.chapter_number is None:
                continue
            if from_chapter is not None and chunk.chapter_number < from_chapter:
                continue
            if to_chapter is not None and chunk.chapter_number > to_chapter:
                continue
        out.append(chunk)
    return out


def _select_source_chunks(
    sources: list[Any],
    *,


@@ -0,0 +1,210 @@
"""
Optional live-run smoke against OpenRouter (IB-WP-0016-T06).

Skipped unless the caller has explicitly opted in by setting BOTH
`OPENROUTER_API_KEY` and `INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1`. The
smoke run touches the network and is billed; keep `--chapter`-style
selection bound to a single chapter and a cheap model.

Also includes a non-live regression test that exercises the new chapter
selection plumbing without hitting the network.
"""

import json
import os
import subprocess
import sys
import zipfile
from pathlib import Path

import pytest
import yaml

from infospace_bench.generator import (
    init_generation_infospace,
    plan_generation,
    run_generation,
    status_generation,
)


FIXTURE_ROOT = Path(__file__).parent / "fixtures" / "lefevre"


def _build_fixture_epub(target: Path) -> Path:
    sources = FIXTURE_ROOT / "sources"
    layout: dict[str, str] = {
        "mimetype": "application/epub+zip",
        "META-INF/container.xml": (sources / "container.xml").read_text(encoding="utf-8"),
    }
    for source in sorted(sources.glob("*.xhtml")):
        layout[f"OEBPS/{source.name}"] = source.read_text(encoding="utf-8")
    layout["OEBPS/content.opf"] = (sources / "content.opf").read_text(encoding="utf-8")
    with zipfile.ZipFile(target, "w") as archive:
        for path_in_zip, contents in layout.items():
            archive.writestr(path_in_zip, contents)
    return target


def test_init_chapter_filter_keeps_only_selected_chapters(tmp_path: Path) -> None:
    book = _build_fixture_epub(tmp_path / "lefevre.epub")
    infospace = init_generation_infospace(
        tmp_path,
        book,
        "lefevre-chapter-1",
        name="Lefevre Chapter I",
        profile="trading-literature",
        chapter_filter=["I"],
    )
    sources = sorted(p.name for p in (infospace.root / "artifacts" / "sources").glob("*.md"))
    assert sources == ["chapter-01.md"], sources


def test_init_chapter_range_keeps_only_selected_range(tmp_path: Path) -> None:
    book = _build_fixture_epub(tmp_path / "lefevre.epub")
    infospace = init_generation_infospace(
        tmp_path,
        book,
        "lefevre-chapter-range",
        name="Lefevre Chapters II-III",
        profile="trading-literature",
        from_chapter=2,
        to_chapter=3,
    )
    sources = sorted(p.name for p in (infospace.root / "artifacts" / "sources").glob("*.md"))
    assert sources == ["chapter-02.md", "chapter-03.md"], sources


def test_init_empty_selection_raises(tmp_path: Path) -> None:
    from infospace_bench.errors import InfospaceError

    book = _build_fixture_epub(tmp_path / "lefevre.epub")
    with pytest.raises(InfospaceError) as exc_info:
        init_generation_infospace(
            tmp_path,
            book,
            "lefevre-no-match",
            name="None",
            profile="trading-literature",
            chapter_filter=["nope"],
        )
    assert exc_info.value.code == "empty_chapter_selection"


def test_from_source_cli_chapter_filter(tmp_path: Path) -> None:
    book = _build_fixture_epub(tmp_path / "lefevre.epub")
    env = os.environ.copy()
    env["PYTHONPATH"] = "src:/home/worsch/markitect-tool/src"
    result = subprocess.run(
        [
            sys.executable,
            "-m",
            "infospace_bench",
            "generate",
            "from-source",
            str(book),
            "--workspace",
            str(tmp_path),
            "--slug",
            "lefevre-cli-ch1",
            "--name",
            "Lefevre CLI Chapter I",
            "--profile",
            "trading-literature",
            "--chapter",
            "I",
            "--fixture-responses",
            str(FIXTURE_ROOT / "responses.yaml"),
            "--apply",
        ],
        check=False,
        env=env,
        text=True,
        capture_output=True,
    )
    assert result.returncode == 0, result.stderr
    payload = json.loads(result.stdout)
    assert payload["status"] == "completed"
    root = Path(payload["root"])
    sources = sorted(p.name for p in (root / "artifacts" / "sources").glob("*.md"))
    assert sources == ["chapter-01.md"]


_LIVE_OPT_IN = os.environ.get("INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER")
_LIVE_API_KEY = os.environ.get("OPENROUTER_API_KEY")
_LIVE_REASON = (
    "set INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1 and OPENROUTER_API_KEY to run "
    "the optional one-chapter OpenRouter smoke"
)


@pytest.mark.skipif(not (_LIVE_OPT_IN and _LIVE_API_KEY), reason=_LIVE_REASON)
def test_openrouter_one_chapter_smoke(tmp_path: Path) -> None:
    """Live smoke: one Lefevre fixture chapter through OpenRouter end-to-end.

    Asserts the run completes, produces a manifest-backed infospace, and that
    the run record + generated artifact provenance carry the provider model,
    request_id, usage counts, and retry count fields T06 promises.
    """
    book = _build_fixture_epub(tmp_path / "lefevre.epub")
    model = os.environ.get("INFOSPACE_BENCH_LIVE_MODEL", "openai/gpt-4o-mini")
    infospace = init_generation_infospace(
        tmp_path,
        book,
        "lefevre-live-smoke",
        name="Lefevre Live Smoke",
        profile="trading-literature",
        chapter_filter=["I"],
    )
    plan_generation(infospace.root, cost_per_1k_tokens=0.5)
    result = run_generation(
        infospace.root,
        provider="openrouter",
        model=model,
    )
    status = status_generation(infospace.root)
    assert result.status == "completed"
    assert status["source_chunk_count"] == 1
    assert status["entity_count"] >= 1
    assert status["evaluation_count"] >= 1

    # At least one run record must carry provider metadata for openrouter calls.
    run_records = sorted((infospace.root / "output" / "workflows" / "runs").glob("*.yaml"))
    assert run_records
    saw_openrouter_metadata = False
    for record_path in run_records:
        data = yaml.safe_load(record_path.read_text(encoding="utf-8"))
        for stage in data.get("stages") or []:
            if stage.get("provider") != "openrouter":
                continue
            metadata = stage.get("metadata") or {}
            assert metadata.get("model") == model
            assert metadata.get("request_id"), "expected non-empty OpenRouter request_id"
            usage = metadata.get("usage") or {}
            assert int(usage.get("prompt_tokens") or 0) > 0
            assert int(usage.get("completion_tokens") or 0) > 0
            assert metadata.get("retry_count") is not None
            assert float(metadata.get("duration_seconds") or 0) > 0
            saw_openrouter_metadata = True
    assert saw_openrouter_metadata, "no openrouter stage recorded provider metadata"

    # Provider metadata also reaches generated artifact provenance.
    import yaml as _yaml

    index = _yaml.safe_load((infospace.root / "artifacts" / "index.yaml").read_text(encoding="utf-8"))
    generated_with_metadata = [
        item
        for item in index["artifacts"]
        if item["kind"] in {"entity", "relation", "generated"}
        and item.get("provenance", {}).get("provider_metadata", {}).get("request_id")
    ]
    assert generated_with_metadata, "generated artifacts should carry provider_metadata.request_id"


@@ -192,7 +192,7 @@ state_hub_task_id: "c9bbc84e-691b-4530-a79a-6ecfa9c41fdd"
```task
id: IB-WP-0016-T06
-status: todo
+status: done
priority: high
state_hub_task_id: "c6bf97c3-1c2c-4993-8f4f-97a48e01cce2"
```