IB-WP-0016-T06: OpenRouter live-run guardrails
Add --chapter / --from-chapter / --to-chapter / --chunk selection flags to generate init and generate from-source, plumb them into init_generation_infospace via a new _filter_chunks_by_chapter helper, and refuse to create an infospace when the filters reject every chunk (InfospaceError "empty_chapter_selection"). The flags use the same T03/T02 plumbing (chapter labels, roman numerals, chunk ids), so a single-chapter selection is a one-flag command.

OpenRouter run-record metadata (model, request_id, usage tokens, retry_count, duration_seconds) already lands in output/workflows/runs/*.yaml; this task just adds the smoke test that proves it stays there, plus the parallel guarantee that the same provider metadata reaches generated artifact provenance via provenance.provider_metadata.

tests/test_openrouter_live.py covers:
- chapter-filter, from/to-chapter range, and empty-selection failure on init (non-live, deterministic)
- CLI smoke through generate from-source with --chapter
- a pytest-skipped live OpenRouter one-chapter end-to-end gated by OPENROUTER_API_KEY + INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER, with INFOSPACE_BENCH_LIVE_MODEL override (default openai/gpt-4o-mini)

docs/generic-source-generator.md gains a "Live OpenRouter runs (handle with care)" section that walks through plan-before-run, the single-chapter live run, the budget/usage artifacts, and the checks a reviewer should run before scaling to the full book.

129 tests pass, 1 skipped (the live smoke, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -48,6 +48,52 @@ infospace-bench generate status ./infospaces/book-space
shows chunk counts, generated artifact counts, evaluations, metrics, history,
and stale source/profile inputs.

### Live OpenRouter runs (handle with care)

A single-chapter live run is the only OpenRouter shape the test suite
covers today. Use `--chapter` (or `--from-chapter` / `--to-chapter` for a
range, or `--chunk` for specific chunk ids) on `generate init` or
`generate from-source` to scope what gets registered before any provider
calls happen. A selection that matches no chunks is rejected with
`InfospaceError` (`empty_chapter_selection`) instead of producing an empty
infospace. For example, to preview and then run Chapter I:

```bash
export OPENROUTER_API_KEY=...

# Preview the cost first
infospace-bench generate plan ./infospaces/foo --chapter I --cost-per-1k 0.30

# Run only Chapter I against a cheap model
infospace-bench generate from-source ./LEFEVRE.epub \
  --workspace ./infospaces \
  --slug reminiscences-ch1 \
  --name "Reminiscences (Ch I)" \
  --profile trading-literature \
  --provider openrouter \
  --model openai/gpt-4o-mini \
  --chapter I \
  --apply
```
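A live run spends real money, so it can help to put a small guard in front of the `from-source` command above. The guard refuses to start unless an API key, an explicit model, and a chapter scope are all present. This is a minimal sketch that assumes bash or another POSIX-ish shell. `live_preflight` is a hypothetical helper, not part of `infospace-bench`. It only checks its arguments and the environment and never calls the CLI or the provider.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight guard for a live OpenRouter run (not part of
# infospace-bench). It inspects only its arguments and the environment.
live_preflight() {
  local model="$1"    # e.g. openai/gpt-4o-mini
  local chapter="$2"  # e.g. I
  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "refusing: OPENROUTER_API_KEY is not set" >&2
    return 1
  fi
  if [ -z "$model" ]; then
    echo "refusing: choose an explicit model first" >&2
    return 1
  fi
  if [ -z "$chapter" ]; then
    echo "refusing: scope the run to a chapter before going live" >&2
    return 1
  fi
  echo "ok: model=$model chapter=$chapter"
}
```

Chain it in front of the real command with `&&`, for example `live_preflight openai/gpt-4o-mini I && infospace-bench generate from-source ...`. With that chaining, a missing key or a missing scope stops the run before any request is made. You have to keep the model and chapter values in sync with the CLI flags by hand.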

The budget files `output/budget/plans.yaml`, `usage.yaml`, and
`summary.yaml` record what was estimated, what was actually spent, and the
plan-vs-actual delta. Run records in `output/workflows/runs/*.yaml` carry
the OpenRouter `request_id`, `model`, token usage, `retry_count`, and
per-call duration (`duration_seconds`); the same metadata reaches the
entity/relation/evaluation artifacts via `provenance.provider_metadata`.
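Before trusting a live run, a reviewer may want to spot-check that every run record actually carries those fields. The rough sketch below makes two assumptions. First, the keys appear literally as `model:`, `request_id:`, `usage:`, `retry_count:`, and `duration_seconds:` somewhere in each YAML file. Second, `usage` is the key that holds token usage. The exact nesting is not documented here, so this is a plain text grep, not a YAML parse. `check_run_record` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical spot-check: does a run record mention every provider-metadata key?
# The key names are assumptions; adjust them to your actual run-record layout.
REQUIRED_KEYS="model request_id usage retry_count duration_seconds"

check_run_record() {
  local file="$1"
  local missing=""
  for key in $REQUIRED_KEYS; do
    # Match "key:" after optional indentation or a "- " list marker,
    # since the nesting depth is not assumed.
    if ! grep -Eq "^[[:space:]-]*${key}:" "$file"; then
      missing="$missing $key"
    fi
  done
  if [ -n "$missing" ]; then
    echo "$file: missing$missing"
    return 1
  fi
  echo "$file: ok"
}
```

Run it over every record with `for f in output/workflows/runs/*.yaml; do check_run_record "$f"; done`. Because it is text-level, it can confirm that a key is present but not that the value is sensible. A zero token count, for example, would still pass.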

Before scaling to the full book:

- Inspect each chapter's outputs and `generation-summary.md`
- Multiply the per-chapter `total_provider_calls_estimate` and
  `estimated_cost_usd` by the chapter count and compare to your budget
- Decide on a final model and confirm the rate-table entry exists in
  `src/infospace_bench/model_rates.yaml` or your workspace override
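The multiplication step in the checklist is simple linear extrapolation. A minimal sketch of it follows, assuming you copy the per-chapter `total_provider_calls_estimate` and `estimated_cost_usd` values out of the plan by hand. `scale_estimate` is a hypothetical helper that uses `awk` for the floating-point math.

```bash
#!/usr/bin/env bash
# Hypothetical helper: linearly extrapolate one chapter's plan to the whole book.
# Args: per-chapter total_provider_calls_estimate, per-chapter
#       estimated_cost_usd, chapter count, budget in USD.
# Prints the projection; exit status 0 if it fits the budget, 1 otherwise.
scale_estimate() {
  awk -v calls="$1" -v cost="$2" -v n="$3" -v budget="$4" 'BEGIN {
    total_calls = calls * n
    total_cost = cost * n
    printf "calls=%d cost_usd=%.2f budget_usd=%.2f\n", total_calls, total_cost, budget
    if (total_cost <= budget) exit 0
    exit 1
  }'
}
```

For example, with illustrative (not measured) inputs, `scale_estimate 40 0.12 30 5` prints `calls=1200 cost_usd=3.60 budget_usd=5.00` and succeeds. The projection assumes chapters are roughly the same size. Longer chapters yield more chunks and more provider calls, so leave headroom or plan a second, longer chapter before committing.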

The optional live-smoke test in `tests/test_openrouter_live.py` is
skipped unless both `OPENROUTER_API_KEY` and
`INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1` are set. It runs a single
chapter through the same path and asserts that the provider metadata is
plumbed through to both the run records and artifact provenance. Set
`INFOSPACE_BENCH_LIVE_MODEL` to override the model it uses (default
`openai/gpt-4o-mini`).
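To check ahead of time whether the live smoke will run or be skipped in your shell, you can mirror the gate described above. This is a sketch based only on the prose: both variables must be set, and the enable flag must equal `1`. `live_smoke_enabled` is hypothetical, and the real check in `tests/test_openrouter_live.py` may differ in detail.

```bash
#!/usr/bin/env bash
# Hypothetical mirror of the live-smoke gate: key set AND enable flag == 1.
# The authoritative check lives in tests/test_openrouter_live.py.
live_smoke_enabled() {
  [ -n "${OPENROUTER_API_KEY:-}" ] &&
    [ "${INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER:-}" = "1" ]
}

if live_smoke_enabled; then
  # Falls back to the documented default model when no override is set.
  echo "live smoke will run with model ${INFOSPACE_BENCH_LIVE_MODEL:-openai/gpt-4o-mini}"
else
  echo "live smoke will be skipped"
fi
```

Running `pytest -rs tests/test_openrouter_live.py` also lists skipped tests with their skip reasons in the summary, which confirms the gate from pytest's side.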

### Budget and usage registry

Every `generate plan` invocation appends a compact snapshot to