# Generic Source Generator

Date: 2026-05-14

## Purpose

`infospace-bench generate` turns a local article, ebook-like file, or folder of
knowledge sources into a manifest-backed infospace. It generalizes the
Wealth/VSM pilot into an explicit workflow path with deterministic fixture
support and an optional OpenRouter provider.

## Deterministic Run

Use fixture responses for repeatable tests and demos:

```bash
infospace-bench generate from-source ./examples/article.md \
  --workspace . \
  --slug article-space \
  --name "Article Space" \
  --profile general-knowledge \
  --fixture-responses ./examples/responses.yaml \
  --apply
```

The command creates normalized source chunks, installs the selected profile,
runs the declared workflows, writes entities, relations, evaluations, metrics,
history, and a generation report, then registers artifacts in
`artifacts/index.yaml`.

## Stepwise Workflow

```bash
infospace-bench generate init ./book.epub \
  --workspace . \
  --slug book-space \
  --name "Book Space" \
  --profile general-knowledge \
  --max-chunks 3

infospace-bench generate plan ./infospaces/book-space --stage all
infospace-bench generate run ./infospaces/book-space \
  --fixture-responses ./responses.yaml
infospace-bench generate status ./infospaces/book-space
```

`--max-chunks` limits the number of chunks, which keeps early experiments
small and provider cost low. `generate status` shows chunk counts, generated
artifact counts, evaluations, metrics, history, and stale source/profile
inputs.

### Scale-aware plan

`generate plan` returns a compact estimate by default: counts of selected
chunks, calls per workflow, prompt-word and token estimates, and a rough
USD cost when `--cost-per-1k` is supplied. Long corpora no longer dump
hundreds of full prompts unless `--full` is set.

```bash
infospace-bench generate plan ./infospaces/book-space \
  --from-chapter 1 --to-chapter 3 \
  --cost-per-1k 0.30 \
  --max-calls 50 \
  --cost-cap 2.00
```

Selection filters:

- `--chapter LABEL` (repeatable): match a chapter by its Roman or Arabic
  label or by numeric value (e.g. `--chapter I` or `--chapter 2`)
- `--from-chapter N` / `--to-chapter N`: numeric chapter range
- `--chunk ID` (repeatable): exact source chunk id (e.g.
  `chapter-01-part-002`)

Budget caps set with `--max-calls` and `--cost-cap` are checked against the
estimate and reported as `exceeds_max_calls` / `exceeds_cost_cap` booleans in
the summary, so a caller can fail fast before invoking `run`. Use `--full` to
opt back into the full per-workflow plan with prompts for deep inspection.

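
The selection and budget logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the function names, the `(chapter_number, chunk_id)` chunk shape, the summary keys other than `exceeds_max_calls` and `exceeds_cost_cap`, the AND-combination of filters, and the strict "greater than" comparison are all hypothetical. The cost arithmetic assumes `--cost-per-1k` is a price per 1,000 prompt tokens.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def chapter_number(label):
    """Convert an Arabic ("2") or Roman ("IV") chapter label to an int."""
    text = label.strip().upper()
    if text.isdigit():
        return int(text)
    if not text or any(ch not in ROMAN for ch in text):
        raise ValueError(f"not a chapter label: {label!r}")
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN[ch]
        # Subtractive notation: a smaller numeral before a larger one (IV, IX).
        total += -value if nxt in ROMAN and ROMAN[nxt] > value else value
    return total


def select_chunks(chunks, chapters=(), from_chapter=None, to_chapter=None):
    """Filter (chapter_number, chunk_id) pairs; filters are ANDed (assumption)."""
    wanted = {chapter_number(label) for label in chapters}
    selected = []
    for number, chunk_id in chunks:
        if wanted and number not in wanted:
            continue
        if from_chapter is not None and number < from_chapter:
            continue
        if to_chapter is not None and number > to_chapter:
            continue
        selected.append(chunk_id)
    return selected


def plan_summary(prompt_tokens, calls, cost_per_1k=None, max_calls=None, cost_cap=None):
    """Build a compact summary; cost stays None when no price is supplied."""
    cost = None if cost_per_1k is None else round(prompt_tokens / 1000 * cost_per_1k, 2)
    return {
        "calls": calls,                    # hypothetical key
        "prompt_tokens": prompt_tokens,    # hypothetical key
        "estimated_cost_usd": cost,        # hypothetical key
        "exceeds_max_calls": max_calls is not None and calls > max_calls,
        "exceeds_cost_cap": cost is not None and cost_cap is not None and cost > cost_cap,
    }
```

A caller could gate `run` on the two booleans, for example by exiting non-zero when either is true. With 730 calls and roughly 518,000 prompt tokens at $0.30 per 1k, the sketch estimates about $155 and flags both a 50-call cap and a $2.00 cost cap as exceeded.
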
## OpenRouter

Live model calls are explicit:

```bash
export OPENROUTER_API_KEY=...

infospace-bench generate run ./infospaces/book-space \
  --provider openrouter \
  --model openai/gpt-4o-mini \
  --stage all
```

Set `--model` to an OpenRouter model ID. The API key is read from
`OPENROUTER_API_KEY`; it is not written to `infospace.yaml`. The default test
suite never makes live provider calls.

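
The key-handling rule above (read from the environment, never persisted) might look like the following minimal sketch. The function names and the shape of the persisted settings are hypothetical; only the `OPENROUTER_API_KEY` variable name comes from this document.

```python
import os


def load_openrouter_key(environ=os.environ):
    """Read the API key from the environment and fail early if it is absent."""
    key = environ.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set; refusing to make live calls")
    return key


def persistable_settings(provider, model):
    """Settings that are safe to write to disk; the key is deliberately absent."""
    return {"provider": provider, "model": model}
```

Keeping the key out of anything that gets serialized means a committed `infospace.yaml` cannot leak credentials.
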
## Resume

Use resume for interrupted or reviewed runs:

```bash
infospace-bench generate resume ./infospaces/book-space \
  --provider openrouter \
  --model openai/gpt-4o-mini
```

Unchanged completed runs are skipped. Use `--force` when you intentionally want
to rerun completed work. Stale status is reported when source artifact digests
or installed profile/template files change.

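
Digest-based staleness detection of the kind described above can be sketched as follows. The helper names, the use of SHA-256, and the `{relative_path: digest}` record shape are assumptions for illustration; the document does not specify which digest the tool uses or where it stores recorded values.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Hex SHA-256 of a file's bytes (the digest algorithm is an assumption)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_inputs(recorded, root):
    """Return relative paths whose file is missing or whose digest has changed.

    recorded: mapping of relative path -> digest captured at the last run.
    """
    root = Path(root)
    stale = []
    for rel_path, old_digest in sorted(recorded.items()):
        path = root / rel_path
        if not path.is_file() or file_digest(path) != old_digest:
            stale.append(rel_path)
    return stale
```

An empty result corresponds to the "unchanged, skip" case; any returned path would mark the dependent run as stale.
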
## Review Path

After generation:

- inspect `artifacts/sources/` for normalized input chunks
- inspect `artifacts/entities/` and `artifacts/relations/` for generated claims
- inspect `output/evaluations/` for rubric output
- run `infospace-bench validate <root>` and `infospace-bench graph <root>`
- review `reports/generation-summary.md`

Move from the generic profile to a specialized profile when the source domain
needs stricter terminology, narrower extraction granularity, or a discipline
lens such as VSM.