Workplan for infospace creation
workplans/IB-WP-0015-generic-source-infospace-generator-cli.md
---
id: IB-WP-0015
type: workplan
title: "Generic Source Infospace Generator CLI"
domain: markitect
repo: infospace-bench
status: planned
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0015-generic-source-infospace-generator-cli"
state_hub_workstream_id: "1bf47fb9-fe55-428a-b8da-8e6cc76d4b03"
depends_on_workplans:
  - IB-WP-0013
related_workplans:
  - IB-WP-0014
---

# IB-WP-0015 - Generic Source Infospace Generator CLI

## Goal

Turn the Wealth/VSM example into a reusable CLI capability for incrementally
building an infospace from an ebook, article, or collection of knowledge.

When this workplan is done, a user should be able to run something like:

```bash
infospace-bench generate from-source ./examples/my-book.epub \
  --slug my-book \
  --name "My Book Infospace" \
  --profile general-knowledge \
  --provider openrouter \
  --model <openrouter-model-id> \
  --apply
```

and get a manifest-backed infospace with normalized sources, generated entity
artifacts, relation artifacts, evaluations, metrics/history, reports, and a
clear resume path.

## Intent

`IB-WP-0013` proved the successor shape on a one-chapter Wealth/VSM pilot. This
workplan generalizes that capability:

- source intake is generic, not Wealth-only
- workflow templates are reusable profiles
- assisted generation can use OpenRouter explicitly
- generation is incremental and resumable
- default tests stay deterministic and never require live provider credentials

The old infrastructure could generate the Adam Smith example with OpenRouter.
The new infrastructure should recover that operational convenience while
preserving the successor design: explicit workflows, auditable provider calls,
stable artifact IDs, and clean repo boundaries.

## Target CLI Shape

Suggested commands:

```bash
infospace-bench generate init <source> --slug <slug> --name <name> --profile <profile>
infospace-bench generate plan <root> --stage all
infospace-bench generate run <root> --provider openrouter --model <model-id> --stage all
infospace-bench generate resume <root> --provider openrouter --model <model-id>
infospace-bench generate status <root>
```

Short-form combined command:

```bash
infospace-bench generate from-source <source> \
  --slug <slug> \
  --name <name> \
  --profile general-knowledge \
  --provider openrouter \
  --model <model-id> \
  --apply
```

Default-safe modes:

- `--dry-run`: plan without provider calls or writes beyond optional plan output
- `--fixture-responses <path>`: deterministic tests and demos
- `--max-chunks <n>`: bound early runs
- `--stage intake|extract|relations|evaluate|metrics|all`
- `--resume`: skip completed chunks and retry failed or stale work

## Non-Goals

- Do not make live OpenRouter calls in the default test suite.
- Do not store API keys in `infospace.yaml`.
- Do not build a general document conversion product inside this repo.
- Do not hide provider calls behind implicit workflow execution.
- Do not solve remote storage backends here; `IB-WP-0014` owns backend
  abstraction.
- Do not require full EPUB/PDF/article extraction perfection in the first pass;
extraction quality should be explicit and testable.

## Tasks

### T01 - Source intake and corpus normalization

```task
id: IB-WP-0015-T01
status: todo
priority: high
state_hub_task_id: "08196bf2-9323-4cd8-860c-4306c965ed60"
```

- Add a source intake module that accepts files and folders
- Normalize supported inputs into markdown-ish source artifacts with stable IDs
- First supported source types:
  - Markdown
  - plain text
  - local HTML/article export
  - EPUB or ebook-like directory fixtures
  - folder collections of the above
- Record source metadata: original path, source type, title, digest, chunk ID,
  import time, and extractor version
- Add chunking for long inputs with deterministic chunk IDs
- Add tests for article, ebook, and folder fixtures
- Keep URL fetching optional and explicit; local fixtures must cover tests
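
One way the deterministic chunking in T01 might look is sketched below. This is a minimal illustration and not the planned implementation: the function names, the `max_chars` default, and the `<slug>-cNNNN` ID format are all assumptions. Chunk IDs here are positional so they stay stable when content changes, while the SHA-256 digest is recorded separately so later stages can notice the change.

```python
import hashlib


def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    A single paragraph longer than max_chars is kept whole (assumption:
    splitting inside a paragraph is out of scope for this sketch).
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_chunk_records(source_slug: str, text: str, max_chars: int = 2000) -> list[dict]:
    """Return one record per chunk with a positional ID and a content digest."""
    records = []
    for index, chunk in enumerate(chunk_text(text, max_chars)):
        records.append(
            {
                # Hypothetical ID format: stable for a given slug and position.
                "chunk_id": f"{source_slug}-c{index:04d}",
                # The digest changes whenever the chunk content changes.
                "digest": hashlib.sha256(chunk.encode("utf-8")).hexdigest(),
                "text": chunk,
            }
        )
    return records
```

Because the output depends only on the slug, the text, and `max_chars`, running intake twice on the same input yields identical IDs and digests, which is the property the fixture tests would rely on.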

### T02 - Generic workflow template pack and schema profiles

```task
id: IB-WP-0015-T02
status: todo
priority: high
state_hub_task_id: "5604796b-cb09-43ed-b3a9-5d4906790807"
```

- Create reusable profile packs under a clear directory such as
  `profiles/general-knowledge/`
- Include contracts for generated entities, relations, summaries, and
  evaluations
- Include prompt templates for:
  - source/chunk summary
  - entity extraction
  - relation extraction
  - entity evaluation
  - collection synthesis/reporting
- Let profiles define terminology, extraction granularity, evaluation criteria,
  and optional lenses such as VSM
- Preserve the Wealth/VSM pilot as a specialized profile or example derived
from the generic path

### T03 - OpenRouter provider adapter and model configuration

```task
id: IB-WP-0015-T03
status: todo
priority: high
state_hub_task_id: "c02720c5-1b82-458a-bf8c-9147af4fd9e9"
```

- Add an explicit OpenRouter assisted-generation adapter
- Read credentials from environment, preferably `OPENROUTER_API_KEY`
- Accept `--model <openrouter-model-id>` at the CLI boundary
- Record provider, model, request ID if available, timing, token usage if
  available, retry count, and error detail in run records
- Add rate-limit and retry behavior that is visible and bounded
- Add model fallback support only when explicitly configured
- Keep fixture adapter support for deterministic tests
- Add provider contract tests with mocked HTTP, not live network calls
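
The "visible and bounded" retry behavior in T03 could be sketched roughly as follows. Everything here is illustrative: `RunRecord`, `RetryableProviderError`, `load_api_key`, and `call_with_retries` are hypothetical names, and the transport is passed in as a plain callable so the sketch makes no claim about the OpenRouter HTTP API. Only the `OPENROUTER_API_KEY` variable name comes from the workplan.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Callable, Mapping, Optional


class RetryableProviderError(Exception):
    """Hypothetical error for rate limits or transient provider failures."""


@dataclass
class RunRecord:
    """Hypothetical audit record for one provider call."""
    provider: str
    model: str
    retry_count: int = 0
    elapsed_seconds: float = 0.0
    errors: list = field(default_factory=list)
    succeeded: bool = False


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the key from the environment only; never from infospace.yaml."""
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return key


def call_with_retries(
    send: Callable[[str], str],
    prompt: str,
    record: RunRecord,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send(prompt), retrying at most max_retries times with backoff.

    Every failure is appended to the record, so retries stay visible.
    """
    start = time.monotonic()
    try:
        for attempt in range(max_retries + 1):
            try:
                result = send(prompt)
                record.succeeded = True
                return result
            except RetryableProviderError as exc:
                record.errors.append(str(exc))
                if attempt == max_retries:
                    raise  # retry budget exhausted; surface the failure
                record.retry_count += 1
                sleep(base_delay * 2 ** attempt)  # exponential backoff
        raise ValueError("max_retries must be non-negative")
    finally:
        record.elapsed_seconds = time.monotonic() - start
```

Injecting `send` and `sleep` keeps the contract tests deterministic: a fake `send` can fail a fixed number of times, and a fake `sleep` can record the requested delays without waiting.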

### T04 - Generator CLI orchestration

```task
id: IB-WP-0015-T04
status: todo
priority: high
state_hub_task_id: "21b50fbc-f43e-4b18-b012-976a5241f52a"
```

- Add `infospace-bench generate ...` subcommands
- `generate init` creates an infospace from a source and selected profile
- `generate plan` shows chunk/stage/provider work without mutation
- `generate run` executes selected stages
- `generate resume` continues incomplete or failed work
- `generate status` reports source chunks, generated artifacts, failures,
  stale outputs, evaluations, and metrics
- Support both stepwise and combined `from-source` flows
- Keep CLI output structured JSON by default, consistent with existing commands
- Ensure commands work with current local-folder backend and do not block
`IB-WP-0014`
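
As a rough illustration of the stepwise subcommand layout in T04, the sketch below wires the commands from "Target CLI Shape" into nested `argparse` subparsers. It assumes the CLI is built with Python's standard `argparse`, which the workplan does not state; the `build_parser` name and the flag defaults are likewise assumptions, and the combined `from-source` flow is omitted for brevity.

```python
import argparse

STAGES = ["intake", "extract", "relations", "evaluate", "metrics", "all"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for `infospace-bench generate <action> ...`."""
    parser = argparse.ArgumentParser(prog="infospace-bench")
    commands = parser.add_subparsers(dest="command", required=True)
    generate = commands.add_parser("generate")
    actions = generate.add_subparsers(dest="action", required=True)

    # generate init <source> --slug <slug> --name <name> --profile <profile>
    init = actions.add_parser("init")
    init.add_argument("source")
    init.add_argument("--slug", required=True)
    init.add_argument("--name", required=True)
    # Assumed default; the workplan only uses this value in its examples.
    init.add_argument("--profile", default="general-knowledge")

    # plan, run, resume, and status all operate on an existing infospace root.
    for name in ("plan", "run", "resume", "status"):
        sub = actions.add_parser(name)
        sub.add_argument("root")
        if name in ("plan", "run"):
            sub.add_argument("--stage", choices=STAGES, default="all")
        if name in ("run", "resume"):
            # No default provider, so live calls are never implicit.
            sub.add_argument("--provider", default=None)
            sub.add_argument("--model", default=None)
    return parser


if __name__ == "__main__":
    print(vars(build_parser().parse_args()))
```

Leaving `--provider` without a default is one way to keep provider calls explicit: a `run` without it can fall back to fixtures or fail fast instead of silently going to the network.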

### T05 - Incremental resume and stale output handling

```task
id: IB-WP-0015-T05
status: todo
priority: high
state_hub_task_id: "ad882b6e-924e-4f9a-8e93-119aeadd8132"
```

- Track a generation state file under `output/workflows/` or an equivalent
  successor location
- Record chunk digest, stage status, output artifact IDs, provider metadata,
  errors, and timestamps
- Skip unchanged completed chunks by default
- Detect stale generated artifacts when source digests or profile/template
  digests change
- Support rerun policies:
  - failed only
  - stale only
  - force all
  - selected chunk
- Add tests for interrupted generation, resume, stale detection, and idempotent
manifest updates
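
The rerun policies in T05 can be read as a selection function over recorded chunk state. The sketch below is one possible shape under stated assumptions: the `ChunkState` fields, the status strings, and the policy names (`default`, `failed-only`, `stale-only`, `force-all`, `selected`) are hypothetical and not taken from an existing state file format.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class ChunkState:
    """Hypothetical per-chunk entry in the generation state file."""
    chunk_id: str
    status: str          # "done", "failed", or "pending"
    source_digest: str   # chunk digest recorded when output was generated
    profile_digest: str  # profile/template digest recorded at that time


def is_stale(state: ChunkState, source_digest: str, profile_digest: str) -> bool:
    """A completed chunk is stale if either recorded digest no longer matches."""
    return state.status == "done" and (
        state.source_digest != source_digest
        or state.profile_digest != profile_digest
    )


def select_chunks(
    states: Sequence[ChunkState],
    current_digests: Mapping[str, str],
    profile_digest: str,
    policy: str = "default",
    only: Optional[str] = None,
) -> list:
    """Return the chunk IDs that should be (re)generated under a policy."""
    selected = []
    for state in states:
        stale = is_stale(state, current_digests[state.chunk_id], profile_digest)
        failed = state.status == "failed"
        if policy == "default":
            # Skip unchanged completed chunks; run everything else.
            pick = state.status == "pending" or failed or stale
        elif policy == "failed-only":
            pick = failed
        elif policy == "stale-only":
            pick = stale
        elif policy == "force-all":
            pick = True
        elif policy == "selected":
            pick = state.chunk_id == only
        else:
            raise ValueError(f"unknown rerun policy: {policy}")
        if pick:
            selected.append(state.chunk_id)
    return selected
```

Keeping selection a pure function of recorded state and current digests makes the resume and stale-detection tests straightforward: they can construct states directly without running any generation.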

### T06 - End-to-end examples, docs, and acceptance suite

```task
id: IB-WP-0015-T06
status: todo
priority: medium
state_hub_task_id: "3461eacf-e42a-455c-954c-849b0ad69fc1"
```

- Add deterministic end-to-end fixtures:
  - one article
  - one small ebook-like fixture
  - one folder collection
- Prove each can generate an infospace with fixture responses
- Add an optional live OpenRouter smoke path that is skipped unless explicitly
  enabled
- Document:
  - how to choose a model
  - where to put credentials
  - how to cap chunks/cost
  - how to resume
  - how to review generated artifacts
  - how to move from a generic profile to a specialized profile
- Update README and replacement docs with the new generator path
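
The "skipped unless explicitly enabled" live smoke path in T06 might be gated as in the sketch below. The opt-in variable `INFOSPACE_BENCH_LIVE_SMOKE` is a made-up name for illustration, and the sketch uses the standard-library `unittest` module, which may not match the test runner the repo actually uses. The test body is a placeholder, not a real provider call.

```python
import os
import unittest
from typing import Mapping, Optional


def live_smoke_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Require both an explicit opt-in flag and a configured API key."""
    env = os.environ if env is None else env
    # INFOSPACE_BENCH_LIVE_SMOKE is a hypothetical opt-in variable.
    opted_in = env.get("INFOSPACE_BENCH_LIVE_SMOKE") == "1"
    return opted_in and bool(env.get("OPENROUTER_API_KEY"))


class LiveOpenRouterSmokeTest(unittest.TestCase):
    @unittest.skipUnless(live_smoke_enabled(), "live OpenRouter smoke test not enabled")
    def test_single_chunk_generation(self):
        # Placeholder: a real test would run one bounded generation,
        # for example with --max-chunks 1, and inspect the run record.
        self.assertTrue(live_smoke_enabled())
```

Requiring the opt-in flag in addition to the key means a developer who happens to have `OPENROUTER_API_KEY` exported still gets a fully offline default test run.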

## Acceptance

- A user can generate a new infospace from a local article fixture using only
  deterministic fixture responses
- A user can generate a new infospace from an ebook-like fixture using only
  deterministic fixture responses
- A user can generate a new infospace from a folder collection using only
  deterministic fixture responses
- A user can run the same CLI with `--provider openrouter --model <model-id>`
  when `OPENROUTER_API_KEY` is configured
- Generated sources, chunks, entities, relations, evaluations, metrics, history,
  and reports are manifest-backed and inspectable
- Generation is resumable and idempotent for unchanged inputs
- Stale outputs are detected when source or profile/template inputs change
- Live provider calls are explicit, auditable, and absent from default tests

## Relationship To Existing Work

- Builds on `IB-WP-0013`, which proved the explicit workflow shape for the
  Wealth/VSM one-chapter pilot.
- Should stay compatible with `IB-WP-0014`, but should not wait for remote
  backend support.
- Continues the successor split:
  - `markitect-tool`: markdown parsing, templates, contracts
  - `infospace-bench`: applied infospace generation workflow and CLI
- `kontextual-engine`: durable runtime/retrieval/audit if needed later