Add --chapter / --from-chapter / --to-chapter / --chunk selection flags to generate init and generate from-source, plumb them into init_generation_infospace via a new _filter_chunks_by_chapter helper, and refuse to create an infospace when the filters reject every chunk (InfospaceError "empty_chapter_selection"). The flags use the same T03/T02 plumbing (chapter labels, roman numerals, chunk ids) so a single-chapter selection is a one-flag command. OpenRouter run-record metadata (model, request_id, usage tokens, retry_count, duration_seconds) already lands in output/workflows/runs/*.yaml; this task just adds the smoke test that proves it stays there, plus the parallel guarantee that the same provider metadata reaches generated artifact provenance via provenance.provider_metadata. tests/test_openrouter_live.py covers: - chapter-filter, from/to-chapter range, and empty-selection failure on init (non-live, deterministic) - CLI smoke through generate from-source with --chapter - a pytest-skipped live OpenRouter one-chapter end-to-end gated by OPENROUTER_API_KEY + INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER, with INFOSPACE_BENCH_LIVE_MODEL override (default openai/gpt-4o-mini) docs/generic-source-generator.md gains a "Live OpenRouter runs (handle with care)" section that walks plan-before-run, single-chapter live run, the budget/usage artifacts, and the checks a reviewer should run before scaling to the full book. 129 tests pass, 1 skipped (the live smoke, correctly gated). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
7.2 KiB
Generic Source Generator
Date: 2026-05-14
Purpose
infospace-bench generate turns a local article, ebook-like file, or folder of
knowledge sources into a manifest-backed infospace. It generalizes the
Wealth/VSM pilot into an explicit workflow path with deterministic fixture
support and an optional OpenRouter provider.
Deterministic Run
Use fixture responses for repeatable tests and demos:
infospace-bench generate from-source ./examples/article.md \
--workspace . \
--slug article-space \
--name "Article Space" \
--profile general-knowledge \
--fixture-responses ./examples/responses.yaml \
--apply
The command creates normalized source chunks, installs the selected profile,
runs the declared workflows, writes entities, relations, evaluations, metrics,
history, and a generation report, then registers artifacts in
artifacts/index.yaml.
Stepwise Workflow
infospace-bench generate init ./book.epub \
--workspace . \
--slug book-space \
--name "Book Space" \
--profile general-knowledge \
--max-chunks 3
infospace-bench generate plan ./infospaces/book-space --stage all
infospace-bench generate run ./infospaces/book-space \
--fixture-responses ./responses.yaml
infospace-bench generate status ./infospaces/book-space
--max-chunks caps early experiments and provider cost. generate status
shows chunk counts, generated artifact counts, evaluations, metrics, history,
and stale source/profile inputs.
Live OpenRouter runs (handle with care)
A single-chapter live run is the only OpenRouter shape the test suite
covers today. Use --chapter (or --from-chapter / --to-chapter) on
generate init or generate from-source to scope what gets registered
before any provider calls happen:
export OPENROUTER_API_KEY=...
# Preview the cost first
infospace-bench generate plan ./infospaces/foo --chapter I --cost-per-1k 0.30
# Run only Chapter I against a cheap model
infospace-bench generate from-source ./LEFEVRE.epub \
--workspace ./infospaces \
--slug reminiscences-ch1 \
--name "Reminiscences (Ch I)" \
--profile trading-literature \
--provider openrouter \
--model openai/gpt-4o-mini \
--chapter I \
--apply
output/budget/plans.yaml, usage.yaml, and summary.yaml record what
was estimated, what was actually spent, and the plan-vs-actual delta.
output/workflows/runs/*.yaml carry the OpenRouter request_id, model,
token usage, retry count, and per-call duration; the same metadata
reaches the entity/relation/evaluation artifacts via
provenance.provider_metadata.
Before scaling to the full book:
- Inspect each chapter's outputs and
generation-summary.md - Multiply the per-chapter
total_provider_calls_estimateandestimated_cost_usdby the chapter count and compare to your budget - Decide on a final model and confirm the rate-table entry exists in
src/infospace_bench/model_rates.yamlor your workspace override
The optional live-smoke test in tests/test_openrouter_live.py is
skipped unless both OPENROUTER_API_KEY and
INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1 are set. It runs a single
chapter through the same path and asserts the provider metadata
plumb-through.
Budget and usage registry
Every generate plan invocation appends a compact snapshot to
output/budget/plans.yaml (deterministic 12-char snapshot_id, 50-entry
sliding retention). Every generate run invocation appends a usage
rollup to output/budget/usage.yaml, bucketed by (workflow_id, stage_id, provider, model) with prompt and completion token counts,
known cost (when the adapter returned it), and estimated cost (when a
rate table entry matches the model).
The default rate table is bundled at
src/infospace_bench/model_rates.yaml and covers a handful of common
OpenRouter models at list price (see the file for the captured-at
timestamp). A workspace can override or extend entries by placing
model-rates.yaml next to its infospaces/ directory; the workspace
file is overlaid on top of the package default so partial overrides
are fine.
Cost resolution order on each run: adapter-returned cost first, then
the rate table, then cost_status="unknown" (recorded explicitly,
never silently zeroed). The plan-vs-actual variance summary lands in
follow-on task T04.
Profiles
Two profiles ship today:
general-knowledge— durable concepts, claims, methods, people, places, works, and objects across any sourcetrading-literature— trading memoirs and market-structure texts; tunes entity categories (trader,market,strategy,error,psychological_pattern,institution,instrument,evidence_bearing_claim), relation types (cause_effect,lesson_evidence,risk_mitigation,actor_venue,strategy_outcome), and evaluation criteria (groundedness,lesson_clarity,historical_context,overgeneralization_risk)
Select via --profile trading-literature on generate init or
generate from-source. The generic profile remains the default.
Scale-aware plan
generate plan returns a compact estimate by default — counts of selected
chunks, calls per workflow, prompt-word and token estimates, and a rough
USD cost when --cost-per-1k is supplied. Long corpora no longer dump
hundreds of full prompts unless --full is set.
infospace-bench generate plan ./infospaces/book-space \
--from-chapter 1 --to-chapter 3 \
--cost-per-1k 0.30 \
--max-calls 50 \
--cost-cap 2.00
Selection filters:
--chapter LABEL(repeatable) — match a chapter by roman/arabic label or numeric value (e.g.--chapter Ior--chapter 2)--from-chapter N/--to-chapter N— numeric chapter range--chunk ID(repeatable) — exact source chunk id (e.g.chapter-01-part-002)
Budget flags --max-calls and --cost-cap are reported as
exceeds_max_calls / exceeds_cost_cap booleans in the summary, so a
caller can fail fast before invoking run. Use --full to opt back into
the full per-workflow plan with prompts for deep inspection.
OpenRouter
Live model calls are explicit:
export OPENROUTER_API_KEY=...
infospace-bench generate run ./infospaces/book-space \
--provider openrouter \
--model openai/gpt-4o-mini \
--stage all
Choose the --model value from OpenRouter model IDs. The API key is read from
OPENROUTER_API_KEY; it is not written to infospace.yaml. Default tests never
make live provider calls.
Resume
Use resume for interrupted or reviewed runs:
infospace-bench generate resume ./infospaces/book-space \
--provider openrouter \
--model openai/gpt-4o-mini
Unchanged completed runs are skipped. Use --force when you intentionally want
to rerun completed work. Stale status is reported when source artifact digests
or installed profile/template files change.
Review Path
After generation:
- inspect
artifacts/sources/for normalized input chunks - inspect
artifacts/entities/andartifacts/relations/for generated claims - inspect
output/evaluations/for rubric output - run
infospace-bench validate <root>andinfospace-bench graph <root> - review
reports/generation-summary.md
Move from the generic profile to a specialized profile when the source domain needs stricter terminology, narrower extraction granularity, or a discipline lens such as VSM.