
tegwick debd2b8e69 IB-WP-0020-T04: example routing config + live routing smoke

examples/routing/trading-literature.yaml is the checked-in starting
config for a Lefevre-style run. It applies the IB-WP-0018 task-type
taxonomy: cheap candidates for summary + evaluation, smart candidates
for entity + relation extraction, and a separate baseline rule wiring
claude_code for a follow-on T05 ShadowingAdapter step. Workspace-
relative ledger_path keeps adaptive observations with the workspace.

tests/test_routing_config.py gains a regression test that asserts the
shipped example parses cleanly, every stage in stage_to_task_type maps
to a declared task type, and the baseline candidate uses the
claude_code provider — so the example will not bit-rot silently.

tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke
gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY
opt-in as the existing static smoke. It builds a one-candidate routing
config, runs a single chapter through --provider routing, and asserts
the per-stage adapter-choices report section names the routed model
and the routed artifacts carry adapter_id provenance.

docs/generic-source-generator.md gains a "Live runs with --provider
routing" subsection that walks through the one-command routed run,
explains the --quality-floor override, and points at the parallel
live smoke test.

174 tests pass, 2 skipped (both live smokes, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-18 22:19:54 +02:00

8.6 KiB


Generic Source Generator

Date: 2026-05-14

Purpose

infospace-bench generate turns a local article, ebook-like file, or folder of knowledge sources into a manifest-backed infospace. It generalizes the Wealth/VSM pilot into an explicit workflow path with deterministic fixture support and an optional OpenRouter provider.

Deterministic Run

Use fixture responses for repeatable tests and demos:

infospace-bench generate from-source ./examples/article.md \
  --workspace . \
  --slug article-space \
  --name "Article Space" \
  --profile general-knowledge \
  --fixture-responses ./examples/responses.yaml \
  --apply

The command creates normalized source chunks, installs the selected profile, runs the declared workflows, writes entities, relations, evaluations, metrics, history, and a generation report, then registers artifacts in artifacts/index.yaml.

Stepwise Workflow

infospace-bench generate init ./book.epub \
  --workspace . \
  --slug book-space \
  --name "Book Space" \
  --profile general-knowledge \
  --max-chunks 3

infospace-bench generate plan ./infospaces/book-space --stage all
infospace-bench generate run ./infospaces/book-space \
  --fixture-responses ./responses.yaml
infospace-bench generate status ./infospaces/book-space

--max-chunks caps early experiments and provider cost. generate status shows chunk counts, generated artifact counts, evaluations, metrics, history, and stale source/profile inputs.

Live OpenRouter runs (handle with care)

A single-chapter live run is the only OpenRouter shape the test suite covers today. Use --chapter (or --from-chapter / --to-chapter) on generate init or generate from-source to scope what gets registered before any provider calls happen:

export OPENROUTER_API_KEY=...

# Preview the cost first
infospace-bench generate plan ./infospaces/foo --chapter I --cost-per-1k 0.30

# Run only Chapter I against a cheap model
infospace-bench generate from-source ./LEFEVRE.epub \
  --workspace ./infospaces \
  --slug reminiscences-ch1 \
  --name "Reminiscences (Ch I)" \
  --profile trading-literature \
  --provider openrouter \
  --model openai/gpt-4o-mini \
  --chapter I \
  --apply

output/budget/plans.yaml, usage.yaml, and summary.yaml record what was estimated, what was actually spent, and the plan-vs-actual delta. output/workflows/runs/*.yaml carry the OpenRouter request_id, model, token usage, retry count, and per-call duration; the same metadata reaches the entity/relation/evaluation artifacts via provenance.provider_metadata.

Before scaling to the full book:

- Inspect each chapter's outputs and generation-summary.md.
- Multiply the per-chapter total_provider_calls_estimate and estimated_cost_usd by the chapter count and compare the result to your budget.
- Decide on a final model and confirm that its rate-table entry exists in src/infospace_bench/model_rates.yaml or your workspace override.
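
The scaling check in the second step is plain arithmetic. A minimal sketch, assuming the per-chapter figures have already been copied out of the plan summary; the function name and the sample numbers are illustrative and not part of infospace-bench:

```python
def project_full_book(calls_per_chapter: int, cost_per_chapter_usd: float,
                      chapter_count: int, budget_usd: float) -> dict:
    """Scale single-chapter estimates linearly to the whole book."""
    total_calls = calls_per_chapter * chapter_count
    total_cost = round(cost_per_chapter_usd * chapter_count, 4)
    return {
        "total_calls": total_calls,
        "total_cost_usd": total_cost,
        "within_budget": total_cost <= budget_usd,
    }


# Hypothetical inputs: 12 calls and $0.05 per chapter, 24 chapters, $2.00 budget.
projection = project_full_book(12, 0.05, 24, 2.00)
print(projection)
```

Linear scaling assumes the remaining chapters are roughly as long as the one already run, so treat the result as a rough bound rather than a forecast.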

The optional live-smoke test in tests/test_openrouter_live.py is skipped unless both OPENROUTER_API_KEY and INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1 are set. It runs a single chapter through the same path and asserts the provider metadata plumb-through.
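
The double opt-in can be pictured as a small predicate. This is a sketch of the gating rule described above, not the code in tests/test_openrouter_live.py; the helper name is hypothetical:

```python
import os


def live_openrouter_enabled(env=None) -> bool:
    """Return True only when both opt-in variables are set as described."""
    env = os.environ if env is None else env
    has_key = bool(env.get("OPENROUTER_API_KEY"))
    opted_in = env.get("INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER") == "1"
    return has_key and opted_in
```

A pytest suite could feed a predicate like this into pytest.mark.skipif so the live test is skipped whenever either variable is missing.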

Live runs with `--provider routing`

To exercise routing in a live run, replace --provider openrouter --model ... with the routing pair, --provider routing and --routing-config:

infospace-bench generate from-source ./LEFEVRE.epub \
  --workspace ./infospaces \
  --slug reminiscences-routed \
  --name "Reminiscences (Routed)" \
  --profile trading-literature \
  --provider routing \
  --routing-config ./examples/routing/trading-literature.yaml \
  --chapter I \
  --apply

examples/routing/trading-literature.yaml is a checked-in starting config: cheap candidates for summary/evaluation, smart candidates for entity/relation, a claude_code baseline rule for future shadow sampling, and a workspace-relative output/routing/quality.jsonl ledger so adaptive observations stay with the workspace.

--quality-floor <float> on the same command overrides the config's default_quality_floor for a single invocation — useful for tightening the bar for a specific run without editing the file. The ledger fills up as the AdaptiveRoutingPolicy records each observation; later runs against the same workspace get the benefit without re-grading from scratch.
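
The override precedence amounts to "command line wins, config otherwise". A minimal sketch, assuming the parsed config is a plain dict keyed by default_quality_floor; the function name is hypothetical and the real resolution logic may differ:

```python
from typing import Optional


def effective_quality_floor(config: dict, cli_override: Optional[float]) -> float:
    """Prefer the per-invocation flag; fall back to the config default."""
    if cli_override is not None:
        return cli_override
    return config["default_quality_floor"]
```

Checking against None rather than truthiness matters here: a floor of 0.0 passed on the command line is still an explicit override.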

The parallel live-smoke test (test_provider_routing_one_chapter_live_smoke) is also gated on INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1 + OPENROUTER_API_KEY and asserts the per-stage adapter-choices report section names the routed model.

Budget and usage registry

Every generate plan invocation appends a compact snapshot to output/budget/plans.yaml (deterministic 12-char snapshot_id, 50-entry sliding retention). Every generate run invocation appends a usage rollup to output/budget/usage.yaml, bucketed by (workflow_id, stage_id, provider, model) with prompt and completion token counts, known cost (when the adapter returned it), and estimated cost (when a rate table entry matches the model).
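
To make the registry behaviour concrete, here is a sketch of the three mechanisms named above. The snapshot_id derivation (a truncated SHA-256 over canonical JSON) and the record field layout are assumptions; only the 12-character length, the 50-entry window, and the bucket key come from the description:

```python
import hashlib
import json
from collections import defaultdict

MAX_SNAPSHOTS = 50  # retention window stated above


def snapshot_id(plan: dict) -> str:
    """Assumed derivation: first 12 hex chars of SHA-256 over canonical JSON."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def append_snapshot(history: list, plan: dict) -> list:
    """Append a snapshot and keep only the newest MAX_SNAPSHOTS entries."""
    entry = {"snapshot_id": snapshot_id(plan), **plan}
    return (history + [entry])[-MAX_SNAPSHOTS:]


def roll_up_usage(calls: list) -> dict:
    """Sum token counts per (workflow_id, stage_id, provider, model) bucket."""
    buckets = defaultdict(
        lambda: {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0}
    )
    for call in calls:
        key = (call["workflow_id"], call["stage_id"], call["provider"], call["model"])
        buckets[key]["calls"] += 1
        buckets[key]["prompt_tokens"] += call["prompt_tokens"]
        buckets[key]["completion_tokens"] += call["completion_tokens"]
    return dict(buckets)
```

Hashing a key-sorted serialization is one common way to get an identifier that is stable across runs for identical plans.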

The default rate table is bundled at src/infospace_bench/model_rates.yaml and covers a handful of common OpenRouter models at list price (see the file for the captured-at timestamp). A workspace can override or extend entries by placing model-rates.yaml next to its infospaces/ directory; the workspace file is overlaid on top of the package default so partial overrides are fine.
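
The overlay can be sketched as a per-model merge. The field names and prices below are placeholders, and merging field by field within a model entry is an assumption about what "partial overrides" means; check the bundled file for the actual schema:

```python
def overlay_rates(package_default: dict, workspace_override: dict) -> dict:
    """Return the default table with workspace entries layered on top."""
    # Copy each entry so the package default is never mutated.
    merged = {model: dict(entry) for model, entry in package_default.items()}
    for model, entry in workspace_override.items():
        merged.setdefault(model, {}).update(entry)
    return merged


# Placeholder prices, not real list prices.
defaults = {
    "openai/gpt-4o-mini": {"prompt_per_1k_usd": 0.10, "completion_per_1k_usd": 0.20},
}
workspace = {
    "openai/gpt-4o-mini": {"completion_per_1k_usd": 0.50},  # partial override
    "example/custom-model": {"prompt_per_1k_usd": 1.0, "completion_per_1k_usd": 2.0},
}
rates = overlay_rates(defaults, workspace)
```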

Cost resolution order on each run: adapter-returned cost first, then the rate table, then cost_status="unknown" (recorded explicitly, never silently zeroed). The plan-vs-actual variance summary lands in follow-on task T04.
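
The resolution order reads naturally as a three-step fallback. In this sketch the "known" and "estimated" status labels and the rate field names are assumptions; only cost_status="unknown" is taken from the description above:

```python
from typing import Optional


def resolve_cost(adapter_cost: Optional[float], prompt_tokens: int,
                 completion_tokens: int, rate: Optional[dict]) -> dict:
    """Pick a cost source: adapter first, then rate table, else unknown."""
    if adapter_cost is not None:
        return {"cost_usd": adapter_cost, "cost_status": "known"}
    if rate is not None:
        estimate = (prompt_tokens / 1000 * rate["prompt_per_1k_usd"]
                    + completion_tokens / 1000 * rate["completion_per_1k_usd"])
        return {"cost_usd": round(estimate, 6), "cost_status": "estimated"}
    # No source available: record the gap instead of reporting zero.
    return {"cost_usd": None, "cost_status": "unknown"}
```

Returning None rather than 0.0 in the last branch keeps an unpriced call distinguishable from a free one when rollups are summed later.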

Profiles

Two profiles ship today:

- general-knowledge: durable concepts, claims, methods, people, places, works, and objects across any source
- trading-literature: trading memoirs and market-structure texts; tunes entity categories (trader, market, strategy, error, psychological_pattern, institution, instrument, evidence_bearing_claim), relation types (cause_effect, lesson_evidence, risk_mitigation, actor_venue, strategy_outcome), and evaluation criteria (groundedness, lesson_clarity, historical_context, overgeneralization_risk)

Select a profile with --profile (for example, --profile trading-literature) on generate init or generate from-source. The generic general-knowledge profile remains the default.

Scale-aware plan

generate plan returns a compact estimate by default — counts of selected chunks, calls per workflow, prompt-word and token estimates, and a rough USD cost when --cost-per-1k is supplied. Long corpora no longer dump hundreds of full prompts unless --full is set.

infospace-bench generate plan ./infospaces/book-space \
  --from-chapter 1 --to-chapter 3 \
  --cost-per-1k 0.30 \
  --max-calls 50 \
  --cost-cap 2.00

Selection filters:

- --chapter LABEL (repeatable): match a chapter by roman/arabic label or numeric value (e.g. --chapter I or --chapter 2)
- --from-chapter N / --to-chapter N: numeric chapter range
- --chunk ID (repeatable): exact source chunk id (e.g. chapter-01-part-002)
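
One way to picture the label matching is to normalize roman and arabic labels to integers before comparing. This is an illustrative sketch only: the chunk record layout is hypothetical, and combining the filters as a union is an assumption, not documented behaviour:

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def chapter_number(label: str):
    """Return the numeric value of an arabic or roman label, else None."""
    text = label.strip().upper()
    if text.isdecimal():
        return int(text)
    if not text or any(ch not in ROMAN_VALUES for ch in text):
        return None
    total = 0
    for index, ch in enumerate(text):
        value = ROMAN_VALUES[ch]
        following = ROMAN_VALUES[text[index + 1]] if index + 1 < len(text) else 0
        # Subtractive notation: a smaller numeral before a larger one is subtracted.
        total += -value if value < following else value
    return total


def select_chunks(chunks, chapters=(), from_chapter=None, to_chapter=None, chunk_ids=()):
    """Keep chunks matching any supplied filter; no filters keeps everything."""
    has_range = from_chapter is not None or to_chapter is not None
    if not (chapters or has_range or chunk_ids):
        return list(chunks)
    wanted = {chapter_number(label) for label in chapters} - {None}
    low = from_chapter if from_chapter is not None else float("-inf")
    high = to_chapter if to_chapter is not None else float("inf")
    selected = []
    for chunk in chunks:
        in_range = has_range and low <= chunk["chapter"] <= high
        if chunk["id"] in chunk_ids or chunk["chapter"] in wanted or in_range:
            selected.append(chunk)
    return selected
```

The roman parser does not validate numeral ordering, so it is lenient by design; a stricter matcher would reject malformed labels.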

Budget flags --max-calls and --cost-cap are reported as exceeds_max_calls / exceeds_cost_cap booleans in the summary, so a caller can fail fast before invoking run. Use --full to opt back into the full per-workflow plan with prompts for deep inspection.
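
The budget flags can be read as simple comparisons over the compact estimate. A sketch under stated assumptions: the cost is taken as estimated tokens divided by 1000 times the --cost-per-1k rate, and a limit counts as exceeded only when the estimate is strictly greater; the function name is hypothetical:

```python
from typing import Optional


def summarize_budget(total_calls: int, estimated_tokens: int,
                     cost_per_1k: Optional[float] = None,
                     max_calls: Optional[int] = None,
                     cost_cap: Optional[float] = None) -> dict:
    """Build a compact summary with the two fail-fast booleans."""
    cost = None
    if cost_per_1k is not None:
        cost = round(estimated_tokens / 1000 * cost_per_1k, 4)
    return {
        "total_provider_calls_estimate": total_calls,
        "estimated_cost_usd": cost,
        "exceeds_max_calls": max_calls is not None and total_calls > max_calls,
        "exceeds_cost_cap": (cost_cap is not None and cost is not None
                             and cost > cost_cap),
    }


# Hypothetical plan: 60 calls, 8000 estimated tokens, same flags as the example above.
summary = summarize_budget(60, 8000, cost_per_1k=0.30, max_calls=50, cost_cap=2.00)
```

A wrapper script could inspect the two booleans and exit before calling generate run.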

OpenRouter

Live model calls are explicit:

export OPENROUTER_API_KEY=...

infospace-bench generate run ./infospaces/book-space \
  --provider openrouter \
  --model openai/gpt-4o-mini \
  --stage all

Choose the --model value from OpenRouter model IDs. The API key is read from OPENROUTER_API_KEY; it is not written to infospace.yaml. Default tests never make live provider calls.

Resume

Use resume for interrupted or reviewed runs:

infospace-bench generate resume ./infospaces/book-space \
  --provider openrouter \
  --model openai/gpt-4o-mini

Unchanged completed runs are skipped. Use --force when you intentionally want to rerun completed work. Stale status is reported when source artifact digests or installed profile/template files change.
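
The skip and stale rules can be sketched as a digest comparison. The run record layout and the decision labels here are hypothetical, and SHA-256 is an assumed digest choice; the sketch only reports stale runs and makes no claim about whether resume reruns them:

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Hex digest of the given input bytes."""
    return hashlib.sha256(data).hexdigest()


def classify_runs(runs: dict, current_digests: dict, force: bool = False) -> dict:
    """Map each run id to "skip", "run", or "stale".

    runs: {run_id: {"status": str, "input_digest": str}} (hypothetical layout)
    current_digests: {run_id: digest of the inputs as they exist now}
    """
    decisions = {}
    for run_id, record in runs.items():
        if force or record["status"] != "completed":
            decisions[run_id] = "run"
        elif record["input_digest"] != current_digests.get(run_id):
            decisions[run_id] = "stale"
        else:
            decisions[run_id] = "skip"
    return decisions
```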

Review Path

After generation:

- Inspect artifacts/sources/ for normalized input chunks.
- Inspect artifacts/entities/ and artifacts/relations/ for generated claims.
- Inspect output/evaluations/ for rubric output.
- Run infospace-bench validate <root> and infospace-bench graph <root>.
- Review reports/generation-summary.md.

Move from the generic profile to a specialized profile when the source domain needs stricter terminology, narrower extraction granularity, or a discipline lens such as VSM.
