infospace-bench/docs/generic-source-generator.md
tegwick a4dde53fc3 IB-WP-0019-T03: rate-table cost computation
Ship a starter model rate table at src/infospace_bench/model_rates.yaml
(prompt_per_1k / completion_per_1k for the OpenRouter models we have
actually touched: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude 3.5 sonnet
and haiku, claude 3 opus, gemini 1.5 flash/pro, llama 3.1 70b) and a
load_rate_table() / estimate_cost_usd() pair that overlays an optional
<workspace>/model-rates.yaml on top of the bundled defaults.

generate run now passes a workspace-aware cost_resolver into
record_run_usage, so cost_usd_estimated lands on every usage bucket
whose model matches the table. Adapter-returned cost still wins
(cost_status="known"); rate-table cost is reported under
cost_status="estimated"; unmatched models are recorded as
cost_status="unknown" rather than silently zeroed. Rate-table file is
listed in pyproject.toml package-data so pip-installed users keep the
defaults.

106 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:54:30 +02:00

# Generic Source Generator
Date: 2026-05-14
## Purpose
`infospace-bench generate` turns a local article, ebook-like file, or folder of
knowledge sources into a manifest-backed infospace. It generalizes the
Wealth/VSM pilot into an explicit workflow path with deterministic fixture
support and an optional OpenRouter provider.
## Deterministic Run
Use fixture responses for repeatable tests and demos:
```bash
infospace-bench generate from-source ./examples/article.md \
  --workspace . \
  --slug article-space \
  --name "Article Space" \
  --profile general-knowledge \
  --fixture-responses ./examples/responses.yaml \
  --apply
```
The command creates normalized source chunks, installs the selected profile,
runs the declared workflows, writes entities, relations, evaluations, metrics,
history, and a generation report, then registers artifacts in
`artifacts/index.yaml`.
## Stepwise Workflow
```bash
infospace-bench generate init ./book.epub \
  --workspace . \
  --slug book-space \
  --name "Book Space" \
  --profile general-knowledge \
  --max-chunks 3

infospace-bench generate plan ./infospaces/book-space --stage all

infospace-bench generate run ./infospaces/book-space \
  --fixture-responses ./responses.yaml

infospace-bench generate status ./infospaces/book-space
```
`--max-chunks` limits the number of source chunks, which keeps early
experiments and provider cost bounded. `generate status` shows chunk counts,
generated artifact counts, evaluations, metrics, history, and any stale
source or profile inputs.
### Budget and usage registry
Every `generate plan` invocation appends a compact snapshot to
`output/budget/plans.yaml` (deterministic 12-char `snapshot_id`, 50-entry
sliding retention). Every `generate run` invocation appends a usage
rollup to `output/budget/usage.yaml`, bucketed by `(workflow_id,
stage_id, provider, model)` with prompt and completion token counts,
known cost (when the adapter returned it), and estimated cost (when a
rate table entry matches the model).
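
The two registries can be pictured with a short Python sketch. It is illustrative only: the function names, the use of SHA-256 for `snapshot_id`, and the record fields `prompt_tokens` and `completion_tokens` are assumptions, not the package's actual implementation.

```python
import hashlib
import json
from collections import defaultdict

RETENTION = 50  # sliding window size described above


def snapshot_id(plan: dict) -> str:
    """Derive a deterministic 12-char id from plan content (assumed: SHA-256)."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def append_snapshot(snapshots: list, plan: dict) -> list:
    """Append a snapshot and keep only the newest RETENTION entries."""
    entry = {"snapshot_id": snapshot_id(plan), "plan": plan}
    return (snapshots + [entry])[-RETENTION:]


def rollup_usage(calls: list) -> dict:
    """Sum token counts per (workflow_id, stage_id, provider, model) bucket."""
    buckets = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for call in calls:
        key = (call["workflow_id"], call["stage_id"], call["provider"], call["model"])
        buckets[key]["prompt_tokens"] += call["prompt_tokens"]
        buckets[key]["completion_tokens"] += call["completion_tokens"]
    return dict(buckets)
```

Hashing a canonical JSON dump (sorted keys) is one way to make the id independent of key order, so the same plan always maps to the same `snapshot_id`.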
The default rate table is bundled at
`src/infospace_bench/model_rates.yaml` and covers a handful of common
OpenRouter models at list price (see the file for the captured-at
timestamp). A workspace can override or extend entries by placing
`model-rates.yaml` next to its `infospaces/` directory; the workspace
file is overlaid on top of the package default so partial overrides
are fine.
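
A minimal sketch of the overlay, assuming the rate table is a mapping from model ID to a dict with `prompt_per_1k` and `completion_per_1k`, and assuming that "partial" means a workspace entry may set only one of those fields. The function name `overlay_rates` and the prices shown are placeholders, not the bundled values; YAML loading is omitted.

```python
def overlay_rates(defaults: dict, overrides: dict) -> dict:
    """Return defaults with workspace overrides merged per model and per field."""
    merged = {model: dict(rates) for model, rates in defaults.items()}
    for model, rates in (overrides or {}).items():
        # New models are added; existing models keep any field not overridden.
        merged.setdefault(model, {}).update(rates)
    return merged


# Placeholder prices for illustration only.
package_defaults = {
    "openai/gpt-4o-mini": {"prompt_per_1k": 1.0, "completion_per_1k": 2.0},
}
workspace_overrides = {
    "openai/gpt-4o-mini": {"completion_per_1k": 3.0},  # partial override
    "example/custom-model": {"prompt_per_1k": 0.5, "completion_per_1k": 0.5},
}
rates = overlay_rates(package_defaults, workspace_overrides)
```

Copying each entry before merging leaves the package defaults untouched, so the same defaults can be reused across workspaces.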
Cost resolution order on each run: adapter-returned `cost` first, then
the rate table, then `cost_status="unknown"` (recorded explicitly,
never silently zeroed). The plan-vs-actual variance summary lands in
follow-on task T04.
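
The resolution order can be sketched as follows. This is an assumption-laden illustration: `resolve_cost` and its parameters are hypothetical names, and the returned keys mirror the `cost_status` and `cost_usd_estimated` fields mentioned in the commit message rather than a confirmed schema.

```python
def resolve_cost(adapter_cost, model, prompt_tokens, completion_tokens, rates):
    """Apply the order: adapter cost, then rate table, then explicit unknown."""
    if adapter_cost is not None:
        # A cost reported by the adapter always wins, even if it is 0.0.
        return {"cost_status": "known", "cost_usd": adapter_cost}
    entry = rates.get(model)
    if entry is not None:
        estimate = (
            prompt_tokens / 1000 * entry["prompt_per_1k"]
            + completion_tokens / 1000 * entry["completion_per_1k"]
        )
        return {"cost_status": "estimated", "cost_usd_estimated": estimate}
    # No adapter cost and no matching rate: record the gap instead of zero.
    return {"cost_status": "unknown"}
```

Testing `adapter_cost is not None` rather than truthiness keeps a genuine zero-cost response distinct from a missing one.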
### Profiles
Two profiles ship today:
- `general-knowledge` — durable concepts, claims, methods, people,
places, works, and objects across any source
- `trading-literature` — trading memoirs and market-structure texts;
tunes entity categories (`trader`, `market`, `strategy`, `error`,
`psychological_pattern`, `institution`, `instrument`,
`evidence_bearing_claim`), relation types (`cause_effect`,
`lesson_evidence`, `risk_mitigation`, `actor_venue`,
`strategy_outcome`), and evaluation criteria (`groundedness`,
`lesson_clarity`, `historical_context`, `overgeneralization_risk`)
Select via `--profile trading-literature` on `generate init` or
`generate from-source`. The generic `general-knowledge` profile remains the
default.
### Scale-aware plan
`generate plan` returns a compact estimate by default — counts of selected
chunks, calls per workflow, prompt-word and token estimates, and a rough
USD cost when `--cost-per-1k` is supplied. Long corpora no longer dump
hundreds of full prompts unless `--full` is set.
```bash
infospace-bench generate plan ./infospaces/book-space \
  --from-chapter 1 --to-chapter 3 \
  --cost-per-1k 0.30 \
  --max-calls 50 \
  --cost-cap 2.00
```
Selection filters:
- `--chapter LABEL` (repeatable) — match a chapter by roman/arabic label
or numeric value (e.g. `--chapter I` or `--chapter 2`)
- `--from-chapter N` / `--to-chapter N` — numeric chapter range
- `--chunk ID` (repeatable) — exact source chunk id (e.g.
`chapter-01-part-002`)
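
Matching a chapter by roman or arabic label amounts to normalizing both forms to a number. The sketch below shows one way to do that; `chapter_number` is a hypothetical helper, and the lenient roman parsing (it does not reject malformed numerals such as `IIII`) is an assumption rather than the tool's documented behavior.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def chapter_number(label: str):
    """Return the numeric value of an arabic or roman label, or None."""
    text = label.strip().upper()
    if text.isascii() and text.isdigit():
        return int(text)
    if not text or any(ch not in ROMAN for ch in text):
        return None
    total = 0
    # Pair each numeral with its successor; a smaller value before a larger
    # one is subtracted (IV = 4), otherwise it is added.
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN[ch]
        if nxt in ROMAN and ROMAN[nxt] > value:
            total -= value
        else:
            total += value
    return total


def in_chapter_range(label: str, first: int, last: int) -> bool:
    """True when the label resolves to a number within [first, last]."""
    number = chapter_number(label)
    return number is not None and first <= number <= last
```

With this normalization, `--chapter I` and `--chapter 1` would select the same chapter, and a range such as `--from-chapter 1 --to-chapter 3` reduces to a numeric comparison.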
Budget flags `--max-calls` and `--cost-cap` are reported as
`exceeds_max_calls` / `exceeds_cost_cap` booleans in the summary, so a
caller can fail fast before invoking `run`. Use `--full` to opt back into
the full per-workflow plan with prompts for deep inspection.
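
The budget booleans follow from simple arithmetic over the plan estimate. The sketch below assumes the rough cost is `estimated_tokens / 1000 * cost_per_1k`; the function name `budget_flags` and the `estimated_cost_usd` key are hypothetical, while `exceeds_max_calls` and `exceeds_cost_cap` are the names used in the summary.

```python
def budget_flags(calls, estimated_tokens, cost_per_1k=None,
                 max_calls=None, cost_cap=None):
    """Compute a rough cost and the two budget booleans for a plan summary."""
    cost = None
    if cost_per_1k is not None:
        cost = estimated_tokens / 1000 * cost_per_1k
    return {
        "estimated_cost_usd": cost,
        "exceeds_max_calls": max_calls is not None and calls > max_calls,
        # Without a cost estimate the cap cannot be evaluated, so it stays False.
        "exceeds_cost_cap": (
            cost is not None and cost_cap is not None and cost > cost_cap
        ),
    }


summary = budget_flags(calls=60, estimated_tokens=10_000,
                       cost_per_1k=0.30, max_calls=50, cost_cap=2.00)
if summary["exceeds_max_calls"] or summary["exceeds_cost_cap"]:
    print("plan exceeds budget; skip generate run")
```

A caller can apply the same check to the real plan summary and stop before any provider call is made.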
## OpenRouter
Live model calls are explicit:
```bash
export OPENROUTER_API_KEY=...
infospace-bench generate run ./infospaces/book-space \
  --provider openrouter \
  --model openai/gpt-4o-mini \
  --stage all
```
Choose the `--model` value from OpenRouter model IDs. The API key is read from
`OPENROUTER_API_KEY`; it is not written to `infospace.yaml`. Default tests never
make live provider calls.
## Resume
Use resume for interrupted or reviewed runs:
```bash
infospace-bench generate resume ./infospaces/book-space \
  --provider openrouter \
  --model openai/gpt-4o-mini
```
Unchanged completed runs are skipped. Use `--force` when you intentionally want
to rerun completed work. Stale status is reported when source artifact digests
or installed profile/template files change.
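
Digest-based stale detection can be sketched as a comparison between the digests recorded at run time and the digests of the current inputs. The helper names and the choice of SHA-256 are assumptions for illustration; the tool's actual digest scheme is not specified here.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest of one input (assumed: SHA-256 hex)."""
    return hashlib.sha256(data).hexdigest()


def stale_inputs(recorded: dict, current: dict) -> list:
    """Names whose digest changed, appeared, or disappeared since the run."""
    names = set(recorded) | set(current)
    return sorted(n for n in names if recorded.get(n) != current.get(n))


recorded = {"chunk-1": digest(b"old text"), "profile.yaml": digest(b"v1")}
current = {"chunk-1": digest(b"new text"), "profile.yaml": digest(b"v1")}
changed = stale_inputs(recorded, current)  # only chunk-1 differs
```

A completed run whose inputs produce an empty list can be skipped on resume; any non-empty result corresponds to a stale status.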
## Review Path
After generation:
- inspect `artifacts/sources/` for normalized input chunks
- inspect `artifacts/entities/` and `artifacts/relations/` for generated claims
- inspect `output/evaluations/` for rubric output
- run `infospace-bench validate <root>` and `infospace-bench graph <root>`
- review `reports/generation-summary.md`
Move from the generic profile to a specialized profile when the source domain
needs stricter terminology, narrower extraction granularity, or a discipline
lens such as VSM.