generic source-to-infospace generator

2026-05-14 19:33:22 +02:00
parent 065e17f42e
commit 46aad3cce8
20 changed files with 1629 additions and 8 deletions
--- a/docs/generic-source-generator.md
+++ b/docs/generic-source-generator.md
@@ -0,0 +1,94 @@
+# Generic Source Generator
+
+Date: 2026-05-14
+
+## Purpose
+
+`infospace-bench generate` turns a local article, ebook-like file, or folder of
+knowledge sources into a manifest-backed infospace. It generalizes the
+Wealth/VSM pilot into an explicit workflow path with deterministic fixture
+support and an optional OpenRouter provider.
+
+## Deterministic Run
+
+Use fixture responses for repeatable tests and demos:
+
+```bash
+infospace-bench generate from-source ./examples/article.md \
+  --workspace . \
+  --slug article-space \
+  --name "Article Space" \
+  --profile general-knowledge \
+  --fixture-responses ./examples/responses.yaml \
+  --apply
+```
+
+The command creates normalized source chunks, installs the selected profile,
+runs the declared workflows, writes entities, relations, evaluations, metrics,
+history, and a generation report, then registers artifacts in
+`artifacts/index.yaml`.
+
+## Stepwise Workflow
+
+```bash
+infospace-bench generate init ./book.epub \
+  --workspace . \
+  --slug book-space \
+  --name "Book Space" \
+  --profile general-knowledge \
+  --max-chunks 3
+
+infospace-bench generate plan ./infospaces/book-space --stage all
+infospace-bench generate run ./infospaces/book-space \
+  --fixture-responses ./responses.yaml
+infospace-bench generate status ./infospaces/book-space
+```
+
+`--max-chunks` limits the number of source chunks, keeping early experiments
+and provider cost small. `generate status` shows chunk counts, generated
+artifact counts, evaluations, metrics, history, and stale source/profile inputs.
+
+## OpenRouter
+
+Live model calls are explicit:
+
+```bash
+export OPENROUTER_API_KEY=...
+
+infospace-bench generate run ./infospaces/book-space \
+  --provider openrouter \
+  --model openai/gpt-4o-mini \
+  --stage all
+```
+
+Choose the `--model` value from OpenRouter model IDs. The API key is read from
+`OPENROUTER_API_KEY`; it is not written to `infospace.yaml`. Default tests never
+make live provider calls.
+
+## Resume
+
+Use resume for interrupted or reviewed runs:
+
+```bash
+infospace-bench generate resume ./infospaces/book-space \
+  --provider openrouter \
+  --model openai/gpt-4o-mini
+```
+
+Unchanged completed runs are skipped. Use `--force` when you intentionally want
+to rerun completed work. Stale status is reported when source artifact digests
+or installed profile/template files change.
+
+## Review Path
+
+After generation:
+
+- inspect `artifacts/sources/` for normalized input chunks
+- inspect `artifacts/entities/` and `artifacts/relations/` for generated claims
+- inspect `output/evaluations/` for rubric output
+- run `infospace-bench validate <root>` and `infospace-bench graph <root>`
+- review `reports/generation-summary.md`
+
+Move from the generic profile to a specialized profile when the source domain
+needs stricter terminology, narrower extraction granularity, or a discipline
+lens such as VSM.
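
The Resume section says a run is reported as stale when source artifact digests change, but the check itself is not shown in this diff. A minimal Python sketch of how digest-based staleness could be detected follows. It assumes SHA-256 digests recorded per relative path at run time. The names `file_digest`, `stale_inputs`, and the `recorded` mapping are hypothetical and are not taken from infospace-bench.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_inputs(recorded: dict[str, str], root: Path) -> list[str]:
    """List relative paths whose current digest differs from the recorded one.

    `recorded` is a hypothetical mapping of relative path -> digest saved
    when the run completed. A file that no longer exists also counts as stale.
    """
    stale = []
    for rel_path, old_digest in sorted(recorded.items()):
        current = root / rel_path
        if not current.is_file() or file_digest(current) != old_digest:
            stale.append(rel_path)
    return stale
```

Under these assumptions, an empty result would mean a completed run can be skipped on resume, while any listed path would mark the run as stale.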