---
id: IB-WP-0015
type: workplan
title: "Generic Source Infospace Generator CLI"
domain: markitect
repo: infospace-bench
status: completed
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0015-generic-source-infospace-generator-cli"
state_hub_workstream_id: "1bf47fb9-fe55-428a-b8da-8e6cc76d4b03"
depends_on_workplans:
  - IB-WP-0013
related_workplans:
  - IB-WP-0014
---

# IB-WP-0015 - Generic Source Infospace Generator CLI

## Goal

Turn the Wealth/VSM example into a reusable CLI capability for incrementally
building an infospace from an ebook, article, or collection of knowledge.

When this workplan is done, a user should be able to run something like:

```bash
infospace-bench generate from-source ./examples/my-book.epub \
  --slug my-book \
  --name "My Book Infospace" \
  --profile general-knowledge \
  --provider openrouter \
  --model <openrouter-model-id> \
  --apply
```

and get a manifest-backed infospace with normalized sources, generated entity
artifacts, relation artifacts, evaluations, metrics/history, reports, and a
clear resume path.

## Intent

`IB-WP-0013` proved the successor shape on a one-chapter Wealth/VSM pilot. This
workplan generalizes that capability:

- source intake is generic, not Wealth-only
- workflow templates are reusable profiles
- assisted generation can use OpenRouter explicitly
- generation is incremental and resumable
- default tests stay deterministic and never require live provider credentials

The old infrastructure could generate the Adam Smith example with OpenRouter.
The new infrastructure should recover that operational convenience while
preserving the successor design: explicit workflows, auditable provider calls,
stable artifact IDs, and clean repo boundaries.

## Target CLI Shape

Suggested commands:

```bash
infospace-bench generate init <source> --slug <slug> --name <name> --profile <profile>
infospace-bench generate plan <root> --stage all
infospace-bench generate run <root> --provider openrouter --model <model-id> --stage all
infospace-bench generate resume <root> --provider openrouter --model <model-id>
infospace-bench generate status <root>
```

Short-form combined command:

```bash
infospace-bench generate from-source <source> \
  --slug <slug> \
  --name <name> \
  --profile general-knowledge \
  --provider openrouter \
  --model <model-id> \
  --apply
```

Default-safe modes:

- `--dry-run`: plan without provider calls or writes beyond optional plan output
- `--fixture-responses <path>`: deterministic tests and demos
- `--max-chunks <n>`: bound early runs
- `--stage intake|extract|relations|evaluate|metrics|all`
- `--resume`: skip completed chunks and retry failed or stale work

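The command shape and default-safe flags above could be wired up roughly as
follows. This is a minimal `argparse` sketch, not the repository's actual
parser: the `build_parser` name, which subcommands accept which flags, and the
`all` default for `--stage` are assumptions, and only `run` and `from-source`
are shown.

```python
import argparse

# Stage names taken from the `--stage` flag described in this workplan.
STAGES = ["intake", "extract", "relations", "evaluate", "metrics", "all"]


def build_parser():
    """Hypothetical parser for `infospace-bench generate run|from-source`."""
    parser = argparse.ArgumentParser(prog="infospace-bench")
    top = parser.add_subparsers(dest="command", required=True)
    generate = top.add_parser("generate")
    actions = generate.add_subparsers(dest="action", required=True)

    # Flags shared by commands that may trigger provider calls.
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument("--provider")
    shared.add_argument("--model")
    shared.add_argument("--stage", choices=STAGES, default="all")  # default is assumed
    shared.add_argument("--dry-run", action="store_true")
    shared.add_argument("--fixture-responses")
    shared.add_argument("--max-chunks", type=int)
    shared.add_argument("--resume", action="store_true")

    run = actions.add_parser("run", parents=[shared])
    run.add_argument("root")

    from_source = actions.add_parser("from-source", parents=[shared])
    from_source.add_argument("source")
    from_source.add_argument("--slug", required=True)
    from_source.add_argument("--name", required=True)
    from_source.add_argument("--profile", required=True)
    from_source.add_argument("--apply", action="store_true")
    return parser
```

Keeping the safety flags in one shared parent parser means `run` and
`from-source` cannot drift apart in how they spell `--dry-run` or
`--max-chunks`.
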
## Non-Goals

- Do not make live OpenRouter calls in the default test suite.
- Do not store API keys in `infospace.yaml`.
- Do not build a general document conversion product inside this repo.
- Do not hide provider calls behind implicit workflow execution.
- Do not solve remote storage backends here; `IB-WP-0014` owns backend
  abstraction.
- Do not require full EPUB/PDF/article extraction perfection in the first pass;
  extraction quality should be explicit and testable.

## Tasks

### T01 - Source intake and corpus normalization

```task
id: IB-WP-0015-T01
status: done
priority: high
state_hub_task_id: "08196bf2-9323-4cd8-860c-4306c965ed60"
```

- Add a source intake module that accepts files and folders
- Normalize supported inputs into markdown-ish source artifacts with stable IDs
- First supported source types:
  - Markdown
  - plain text
  - local HTML/article export
  - EPUB or ebook-like directory fixtures
  - folder collections of the above
- Record source metadata: original path, source type, title, digest, chunk ID,
  import time, and extractor version
- Add chunking for long inputs with deterministic chunk IDs
- Add tests for article, ebook, and folder fixtures
- Keep URL fetching optional and explicit; local fixtures must cover tests

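One way the chunking and deterministic chunk ID items above might look is
sketched below. Everything here is an assumption for illustration: the
`chunk_text` and `chunk_id` names, the paragraph-packing strategy, the
`max_chars` limit, and the ID layout are hypothetical, not the module's actual
behavior.

```python
import hashlib


def chunk_text(text, max_chars=2000):
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def chunk_id(source_id, index, chunk):
    """Derive an ID from source ID, position, and content only (hypothetical layout)."""
    digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:12]
    return f"{source_id}-c{index:04d}-{digest}"
```

Because the ID depends only on the source ID, the chunk position, and the chunk
content, re-importing an unchanged source yields the same IDs, while an edited
chunk gets a new one.
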
### T02 - Generic workflow template pack and schema profiles

```task
id: IB-WP-0015-T02
status: done
priority: high
state_hub_task_id: "5604796b-cb09-43ed-b3a9-5d4906790807"
```

- Create reusable profile packs under a clear directory such as
  `profiles/general-knowledge/`
- Include contracts for generated entities, relations, summaries, and
  evaluations
- Include prompt templates for:
  - source/chunk summary
  - entity extraction
  - relation extraction
  - entity evaluation
  - collection synthesis/reporting
- Let profiles define terminology, extraction granularity, evaluation criteria,
  and optional lenses such as VSM
- Preserve the Wealth/VSM pilot as a specialized profile or example derived
  from the generic path

### T03 - OpenRouter provider adapter and model configuration

```task
id: IB-WP-0015-T03
status: done
priority: high
state_hub_task_id: "c02720c5-1b82-458a-bf8c-9147af4fd9e9"
```

- Add an explicit OpenRouter assisted-generation adapter
- Read credentials from environment, preferably `OPENROUTER_API_KEY`
- Accept `--model <openrouter-model-id>` at the CLI boundary
- Record provider, model, request ID if available, timing, token usage if
  available, retry count, and error detail in run records
- Add rate-limit and retry behavior that is visible and bounded
- Add model fallback support only when explicitly configured
- Keep fixture adapter support for deterministic tests
- Add provider contract tests with mocked HTTP, not live network calls

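The "visible and bounded" retry item and the environment-based credential
lookup above could be sketched as follows. This makes no network calls and does
not reflect the adapter's real interface: `RetryableError`, `RunRecord`,
`call_with_retries`, and `api_key_from_env` are hypothetical names, and the
exponential backoff schedule is an assumption.

```python
import os
import time
from dataclasses import dataclass, field


class RetryableError(Exception):
    """Hypothetical marker for rate limits or transient provider failures."""


@dataclass
class RunRecord:
    """Subset of the run-record fields listed in this task (names assumed)."""
    provider: str
    model: str
    retry_count: int = 0
    errors: list = field(default_factory=list)
    elapsed_s: float = 0.0


def call_with_retries(send, record, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() at most max_retries + 1 times, logging each failure in record."""
    start = time.monotonic()
    try:
        for attempt in range(max_retries + 1):
            try:
                return send()
            except RetryableError as exc:
                record.errors.append(str(exc))
                if attempt == max_retries:
                    raise  # bounded: give up after the final attempt
                record.retry_count += 1
                sleep(base_delay * 2 ** attempt)  # assumed exponential backoff
    finally:
        record.elapsed_s = time.monotonic() - start


def api_key_from_env(env=os.environ):
    """Read the key from the environment; it is never written to infospace.yaml."""
    key = env.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return key
```

Injecting `send` and `sleep` keeps the retry logic testable with a fake
provider, in line with the requirement that contract tests use mocked HTTP
rather than live calls.
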
### T04 - Generator CLI orchestration

```task
id: IB-WP-0015-T04
status: done
priority: high
state_hub_task_id: "21b50fbc-f43e-4b18-b012-976a5241f52a"
```

- Add `infospace-bench generate ...` subcommands
- `generate init` creates an infospace from a source and selected profile
- `generate plan` shows chunk/stage/provider work without mutation
- `generate run` executes selected stages
- `generate resume` continues incomplete or failed work
- `generate status` reports source chunks, generated artifacts, failures,
  stale outputs, evaluations, and metrics
- Support both stepwise and combined `from-source` flows
- Keep CLI output structured JSON by default, consistent with existing commands
- Ensure commands work with current local-folder backend and do not block
  `IB-WP-0014`

### T05 - Incremental resume and stale output handling

```task
id: IB-WP-0015-T05
status: done
priority: high
state_hub_task_id: "ad882b6e-924e-4f9a-8e93-119aeadd8132"
```

- Track a generation state file under `output/workflows/` or an equivalent
  successor location
- Record chunk digest, stage status, output artifact IDs, provider metadata,
  errors, and timestamps
- Skip unchanged completed chunks by default
- Detect stale generated artifacts when source digests or profile/template
  digests change
- Support rerun policies:
  - failed only
  - stale only
  - force all
  - selected chunk
- Add tests for interrupted generation, resume, stale detection, and idempotent
  manifest updates

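The stale detection and rerun policies above can be illustrated with a small
selection function. This is a sketch under stated assumptions: the `ChunkState`
fields, the status values, and the policy strings (`failed-only`, `stale-only`,
`force-all`, `selected`) are hypothetical and do not describe the real state
file format.

```python
from dataclasses import dataclass


@dataclass
class ChunkState:
    """Per-chunk entry in a generation state file (field names assumed)."""
    chunk_id: str
    status: str  # assumed values: "done", "failed", "pending"
    source_digest: str
    profile_digest: str


def is_stale(state, source_digest, profile_digest):
    """A completed chunk is stale when either recorded digest no longer matches."""
    return state.status == "done" and (
        state.source_digest != source_digest
        or state.profile_digest != profile_digest
    )


def select_chunks(states, current, policy="default", selected=None):
    """Return chunk IDs to (re)run.

    `current` maps chunk_id -> (source_digest, profile_digest) as computed now.
    """
    todo = []
    for state in states:
        source_digest, profile_digest = current[state.chunk_id]
        stale = is_stale(state, source_digest, profile_digest)
        failed = state.status == "failed"
        if policy == "force-all":
            pick = True
        elif policy == "failed-only":
            pick = failed
        elif policy == "stale-only":
            pick = stale
        elif policy == "selected":
            pick = state.chunk_id == selected
        else:
            # Default: skip unchanged completed chunks, run everything else.
            pick = stale or failed or state.status == "pending"
        if pick:
            todo.append(state.chunk_id)
    return todo
```

With the default policy, running the selection twice over unchanged inputs and
fully completed state returns an empty list, which is the idempotence property
the tests in this task are meant to cover.
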
### T06 - End-to-end examples, docs, and acceptance suite

```task
id: IB-WP-0015-T06
status: done
priority: medium
state_hub_task_id: "3461eacf-e42a-455c-954c-849b0ad69fc1"
```

- Add deterministic end-to-end fixtures:
  - one article
  - one small ebook-like fixture
  - one folder collection
- Prove each can generate an infospace with fixture responses
- Add an optional live OpenRouter smoke path that is skipped unless explicitly
  enabled
- Document:
  - how to choose a model
  - where to put credentials
  - how to cap chunks/cost
  - how to resume
  - how to review generated artifacts
  - how to move from a generic profile to a specialized profile
- Update README and replacement docs with the new generator path

## Acceptance

- A user can generate a new infospace from a local article fixture using only
  deterministic fixture responses
- A user can generate a new infospace from an ebook-like fixture using only
  deterministic fixture responses
- A user can generate a new infospace from a folder collection using only
  deterministic fixture responses
- A user can run the same CLI with `--provider openrouter --model <model-id>`
  when `OPENROUTER_API_KEY` is configured
- Generated sources, chunks, entities, relations, evaluations, metrics, history,
  and reports are manifest-backed and inspectable
- Generation is resumable and idempotent for unchanged inputs
- Stale outputs are detected when source or profile/template inputs change
- Live provider calls are explicit, auditable, and absent from default tests

## Relationship To Existing Work

- Builds on `IB-WP-0013`, which proved the explicit workflow shape for the
  Wealth/VSM one-chapter pilot.
- Should stay compatible with `IB-WP-0014`, but should not wait for remote
  backend support.
- Continues the successor split:
  - `markitect-tool`: markdown parsing, templates, contracts
  - `infospace-bench`: applied infospace generation workflow and CLI
  - `kontextual-engine`: durable runtime/retrieval/audit if needed later

## Implementation Notes

Completed on 2026-05-14.

- Added generic source intake for Markdown, plain text, local HTML, EPUB-like
  archives, and folder collections.
- Added the `general-knowledge` profile with prompt templates and contracts.
- Added an explicit OpenRouter assisted-generation adapter with mocked provider
  tests and environment-based credential lookup.
- Added `infospace-bench generate` subcommands for init, plan, run, resume,
  status, and from-source flows.
- Added generation state, resume skipping, source/profile stale detection,
  metrics/history recording, and a manifest-backed generation report.
- Added deterministic acceptance tests for article, ebook-like, and folder
  generation using fixture responses.