9.5 KiB
id, type, title, domain, repo, status, owner, topic_slug, created, updated, state_hub_workstream_slug, state_hub_workstream_id, depends_on_workplans, related_workplans
| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_slug | state_hub_workstream_id | depends_on_workplans | related_workplans | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IB-WP-0015 | workplan | Generic Source Infospace Generator CLI | markitect | infospace-bench | completed | markitect | markitect | 2026-05-14 | 2026-05-14 | ib-wp-0015-generic-source-infospace-generator-cli | 1bf47fb9-fe55-428a-b8da-8e6cc76d4b03 |
|
|
IB-WP-0015 - Generic Source Infospace Generator CLI
Goal
Turn the Wealth/VSM example into a reusable CLI capability for incrementally building an infospace from an ebook, article, or collection of knowledge.
When this workplan is done, a user should be able to run something like:
infospace-bench generate from-source ./examples/my-book.epub \
--slug my-book \
--name "My Book Infospace" \
--profile general-knowledge \
--provider openrouter \
--model <openrouter-model-id> \
--apply
and get a manifest-backed infospace with normalized sources, generated entity artifacts, relation artifacts, evaluations, metrics/history, reports, and a clear resume path.
Intent
IB-WP-0013 proved the successor shape on a one-chapter Wealth/VSM pilot. This
workplan generalizes that capability:
- source intake is generic, not Wealth-only
- workflow templates are reusable profiles
- assisted generation can use OpenRouter explicitly
- generation is incremental and resumable
- default tests stay deterministic and never require live provider credentials
The old infrastructure could generate the Adam Smith example with OpenRouter. The new infrastructure should recover that operational convenience while preserving the successor design: explicit workflows, auditable provider calls, stable artifact IDs, and clean repo boundaries.
Target CLI Shape
Suggested commands:
infospace-bench generate init <source> --slug <slug> --name <name> --profile <profile>
infospace-bench generate plan <root> --stage all
infospace-bench generate run <root> --provider openrouter --model <model-id> --stage all
infospace-bench generate resume <root> --provider openrouter --model <model-id>
infospace-bench generate status <root>
Short-form combined command:
infospace-bench generate from-source <source> \
--slug <slug> \
--name <name> \
--profile general-knowledge \
--provider openrouter \
--model <model-id> \
--apply
Default-safe modes:
--dry-run: plan without provider calls or writes beyond optional plan output--fixture-responses <path>: deterministic tests and demos--max-chunks <n>: bound early runs--stage intake|extract|relations|evaluate|metrics|all--resume: skip completed chunks and retry failed or stale work
Non-Goals
- Do not make live OpenRouter calls in the default test suite.
- Do not store API keys in
infospace.yaml. - Do not build a general document conversion product inside this repo.
- Do not hide provider calls behind implicit workflow execution.
- Do not solve remote storage backends here;
IB-WP-0014owns backend abstraction. - Do not require full EPUB/PDF/article extraction perfection in the first pass; extraction quality should be explicit and testable.
Tasks
T01 - Source intake and corpus normalization
id: IB-WP-0015-T01
status: done
priority: high
state_hub_task_id: "08196bf2-9323-4cd8-860c-4306c965ed60"
- Add a source intake module that accepts files and folders
- Normalize supported inputs into markdown-ish source artifacts with stable IDs
- First supported source types:
- Markdown
- plain text
- local HTML/article export
- EPUB or ebook-like directory fixtures
- folder collections of the above
- Record source metadata: original path, source type, title, digest, chunk ID, import time, and extractor version
- Add chunking for long inputs with deterministic chunk IDs
- Add tests for article, ebook, and folder fixtures
- Keep URL fetching optional and explicit; local fixtures must cover tests
T02 - Generic workflow template pack and schema profiles
id: IB-WP-0015-T02
status: done
priority: high
state_hub_task_id: "5604796b-cb09-43ed-b3a9-5d4906790807"
- Create reusable profile packs under a clear directory such as
profiles/general-knowledge/ - Include contracts for generated entities, relations, summaries, and evaluations
- Include prompt templates for:
- source/chunk summary
- entity extraction
- relation extraction
- entity evaluation
- collection synthesis/reporting
- Let profiles define terminology, extraction granularity, evaluation criteria, and optional lenses such as VSM
- Preserve the Wealth/VSM pilot as a specialized profile or example derived from the generic path
T03 - OpenRouter provider adapter and model configuration
id: IB-WP-0015-T03
status: done
priority: high
state_hub_task_id: "c02720c5-1b82-458a-bf8c-9147af4fd9e9"
- Add an explicit OpenRouter assisted-generation adapter
- Read credentials from environment, preferably
OPENROUTER_API_KEY - Accept
--model <openrouter-model-id>at the CLI boundary - Record provider, model, request ID if available, timing, token usage if available, retry count, and error detail in run records
- Add rate-limit and retry behavior that is visible and bounded
- Add model fallback support only when explicitly configured
- Keep fixture adapter support for deterministic tests
- Add provider contract tests with mocked HTTP, not live network calls
T04 - Generator CLI orchestration
id: IB-WP-0015-T04
status: done
priority: high
state_hub_task_id: "21b50fbc-f43e-4b18-b012-976a5241f52a"
- Add
infospace-bench generate ...subcommands generate initcreates an infospace from a source and selected profilegenerate planshows chunk/stage/provider work without mutationgenerate runexecutes selected stagesgenerate resumecontinues incomplete or failed workgenerate statusreports source chunks, generated artifacts, failures, stale outputs, evaluations, and metrics- Support both stepwise and combined
from-sourceflows - Keep CLI output structured JSON by default, consistent with existing commands
- Ensure commands work with current local-folder backend and do not block
IB-WP-0014
T05 - Incremental resume and stale output handling
id: IB-WP-0015-T05
status: done
priority: high
state_hub_task_id: "ad882b6e-924e-4f9a-8e93-119aeadd8132"
- Track a generation state file under
output/workflows/or an equivalent successor location - Record chunk digest, stage status, output artifact IDs, provider metadata, errors, and timestamps
- Skip unchanged completed chunks by default
- Detect stale generated artifacts when source digests or profile/template digests change
- Support rerun policies:
- failed only
- stale only
- force all
- selected chunk
- Add tests for interrupted generation, resume, stale detection, and idempotent manifest updates
T06 - End-to-end examples, docs, and acceptance suite
id: IB-WP-0015-T06
status: done
priority: medium
state_hub_task_id: "3461eacf-e42a-455c-954c-849b0ad69fc1"
- Add deterministic end-to-end fixtures:
- one article
- one small ebook-like fixture
- one folder collection
- Prove each can generate an infospace with fixture responses
- Add an optional live OpenRouter smoke path that is skipped unless explicitly enabled
- Document:
- how to choose a model
- where to put credentials
- how to cap chunks/cost
- how to resume
- how to review generated artifacts
- how to move from a generic profile to a specialized profile
- Update README and replacement docs with the new generator path
Acceptance
- A user can generate a new infospace from a local article fixture using only deterministic fixture responses
- A user can generate a new infospace from an ebook-like fixture using only deterministic fixture responses
- A user can generate a new infospace from a folder collection using only deterministic fixture responses
- A user can run the same CLI with
--provider openrouter --model <model-id>whenOPENROUTER_API_KEYis configured - Generated sources, chunks, entities, relations, evaluations, metrics, history, and reports are manifest-backed and inspectable
- Generation is resumable and idempotent for unchanged inputs
- Stale outputs are detected when source or profile/template inputs change
- Live provider calls are explicit, auditable, and absent from default tests
Relationship To Existing Work
- Builds on
IB-WP-0013, which proved the explicit workflow shape for the Wealth/VSM one-chapter pilot. - Should stay compatible with
IB-WP-0014, but should not wait for remote backend support. - Continues the successor split:
markitect-tool: markdown parsing, templates, contractsinfospace-bench: applied infospace generation workflow and CLIkontextual-engine: durable runtime/retrieval/audit if needed later
Implementation Notes
Completed on 2026-05-14.
- Added generic source intake for Markdown, plain text, local HTML, EPUB-like archives, and folder collections.
- Added the
general-knowledgeprofile with prompt templates and contracts. - Added an explicit OpenRouter assisted-generation adapter with mocked provider tests and environment-based credential lookup.
- Added
infospace-bench generatesubcommands for init, plan, run, resume, status, and from-source flows. - Added generation state, resume skipping, source/profile stale detection, metrics/history recording, and a manifest-backed generation report.
- Added deterministic acceptance tests for article, ebook-like, and folder generation using fixture responses.