---
id: IB-WP-0013
type: workplan
title: "Wealth VSM Generation Pipeline Parity"
domain: markitect
repo: infospace-bench
status: completed
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0013-wealth-vsm-generation-pipeline-parity"
state_hub_workstream_id: "74dc579e-9b03-4a00-b739-84b1007cfb94"
---

# IB-WP-0013 - Wealth VSM Generation Pipeline Parity

## Goal

Make `infospace-bench` capable of regenerating the Adam Smith *Wealth of Nations*
/ Viable System Model (VSM) infospace through explicit, auditable workflows.

This should replace the old `markitect-project` generation path without
copying its hidden provider calls, implicit output conventions, or monolithic
`process` command shape.

## Intent

The legacy implementation could run a chapter corpus through:

- entity extraction
- VSM mapping
- chapter-level analysis synthesis
- entity evaluation
- classification and relation enrichment
- collection metrics

The successor should express those stages as declared infospace workflows with
deterministic planning, fake-adapter tests, explicit assisted-generation
requests, stable manifest registration, and clear provenance.

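The stage list above can be read as data rather than as a script. The following minimal Python sketch uses hypothetical `Stage` and `Workflow` classes, which are not the `infospace-bench` API. It shows deterministic dry-run planning that labels each stage as either template-driven or in need of an assisted-generation adapter.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical stage kinds: "template" stages are deterministic transforms,
# "assisted" stages need a generation adapter chosen by the caller.
StageKind = Literal["template", "assisted"]


@dataclass(frozen=True)
class Stage:
    name: str
    kind: StageKind
    after: tuple[str, ...] = ()  # names of stages that must run first


@dataclass(frozen=True)
class Workflow:
    name: str
    stages: tuple[Stage, ...]

    def plan(self) -> list[str]:
        """Build a dry-run plan in declared order; no stage is executed."""
        seen: set[str] = set()
        steps: list[str] = []
        for stage in self.stages:
            for dep in stage.after:
                if dep not in seen:
                    raise ValueError(
                        f"{stage.name!r} depends on undeclared or later stage {dep!r}"
                    )
            steps.append(f"{len(steps) + 1}. {stage.name} [{stage.kind}]")
            seen.add(stage.name)
        return steps


# Illustrative pilot workflow; stage names and kinds are assumptions.
WEALTH_VSM_PILOT = Workflow(
    name="wealth-vsm-pilot",
    stages=(
        Stage("extract-entities", "assisted"),
        Stage("split-entity-bundle", "template", after=("extract-entities",)),
        Stage("map-to-vsm", "assisted", after=("split-entity-bundle",)),
        Stage("synthesize-analysis", "assisted", after=("map-to-vsm",)),
        Stage("evaluate-entity", "assisted", after=("split-entity-bundle",)),
        Stage("collect-metrics", "template", after=("evaluate-entity",)),
    ),
)

for line in WEALTH_VSM_PILOT.plan():
    print(line)
```

The plan is computed from declarations alone, so tests can assert it exactly. The `[assisted]` markers show which steps would need a provider before anything runs.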
## Non-Goals

- Recreate the old `process_chapters.py` script as-is.
- Hide provider-specific LLM calls behind a generic command.
- Require a live provider or network access for default tests.
- Commit the full regenerated Wealth/VSM output before a one-chapter pilot is
  proven.
- Move durable runtime, retrieval, or audit responsibilities into
  `infospace-bench`; those remain `kontextual-engine` concerns.

## Tasks

### T01 - Legacy pipeline decomposition and corpus map

```task
id: IB-WP-0013-T01
status: done
priority: high
state_hub_task_id: "2c558d1e-290f-4e0e-abe6-37302cc31ac4"
```

- Map the legacy `examples/infospace-with-history/process_chapters.py` pipeline
- Inventory old templates: `extract-entities`, `map-to-vsm`,
  `synthesize-analysis`, `evaluate-entity`, and `assess-metrics`
- Inventory source corpus, guidelines, VSM reference artifacts, generated
  outputs, processing logs, and metrics files
- Record what must be migrated, reframed, delegated, deferred, or retired
- Pick the first one-chapter golden target, preferably Book I Chapter III, so
  it aligns with the current pruned legacy slice

### T02 - Assisted generation adapter and CLI boundary

```task
id: IB-WP-0013-T02
status: done
priority: high
state_hub_task_id: "70beb49c-49a3-49f4-9b3a-a4c5bdb88485"
```

- Extend workflow execution so assisted stages run through an explicit adapter
  selected by the caller
- Keep dry-run planning as the default safe path
- Add a deterministic fake adapter for tests
- Persist assisted requests, provider metadata, generated outputs, and run
  records
- Expose CLI/API behavior without embedding provider-specific code in core
  workflow logic

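The adapter boundary in T02 can be modeled as a structural interface. Core workflow code depends on that interface, and the caller chooses the implementation. Everything in this sketch is an illustrative assumption and is not the shipped `infospace-bench` API:

- the names `GenerationAdapter`, `FakeAdapter`, and `run_assisted_stage`
- the hash-derived fake output
- the JSON run-record layout

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional, Protocol


@dataclass(frozen=True)
class AssistedRequest:
    workflow: str
    stage: str
    prompt: str


@dataclass(frozen=True)
class AssistedResponse:
    text: str
    provider: str
    model: str


class GenerationAdapter(Protocol):
    """Anything with this method can serve assisted stages (hypothetical)."""

    def generate(self, request: AssistedRequest) -> AssistedResponse: ...


class FakeAdapter:
    """Deterministic test adapter: output depends only on the request."""

    def generate(self, request: AssistedRequest) -> AssistedResponse:
        digest = hashlib.sha256(request.prompt.encode("utf-8")).hexdigest()[:12]
        return AssistedResponse(
            text=f"fake:{request.stage}:{digest}", provider="fake", model="none"
        )


def run_assisted_stage(
    request: AssistedRequest,
    run_dir: Path,
    adapter: Optional[GenerationAdapter] = None,
) -> Optional[AssistedResponse]:
    """Dry-run by default; generation happens only if the caller passes an adapter."""
    run_dir.mkdir(parents=True, exist_ok=True)
    record: dict = {"request": asdict(request), "dry_run": adapter is None}
    response = None
    if adapter is not None:
        response = adapter.generate(request)
        # Provider metadata and output are persisted alongside the request.
        record["response"] = asdict(response)
    path = run_dir / f"{request.workflow}--{request.stage}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return response
```

Core code sees only the protocol, so provider-specific clients can live in separate adapter classes. Calling `run_assisted_stage` without an adapter still records the request, with no network access.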
### T03 - Entity bundle splitting and manifest registration

```task
id: IB-WP-0013-T03
status: done
priority: high
state_hub_task_id: "4a340077-f0ab-40fe-a0bc-0fa94a325774"
```

- Parse generated chapter-level entity bundles into individual entity artifacts
- Normalize stable artifact IDs and filenames
- Register each artifact in `artifacts/index.yaml`
- Preserve source chapter, workflow, stage, provider, and input provenance
- Make reruns idempotent: unchanged artifacts should not duplicate manifest
  entries
- Add tests for malformed bundles, duplicate entities, and manifest updates

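Here is a minimal sketch of the splitting and idempotent registration rules. It assumes a hypothetical bundle format in which each entity starts with a `## Entity: <name>` heading. An in-memory dict stands in for `artifacts/index.yaml`, and YAML I/O is omitted so the sketch stays standard-library only.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical bundle format: one "## Entity: <name>" heading per entity.
ENTITY_HEADING = re.compile(r"^## Entity:[ \t]*(?P<name>[^\n]+?)[ \t]*$", re.MULTILINE)


@dataclass(frozen=True)
class EntityArtifact:
    artifact_id: str
    filename: str
    body: str


def slugify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def split_bundle(bundle: str, chapter_id: str) -> list[EntityArtifact]:
    """Split one chapter bundle into entity artifacts with stable IDs."""
    matches = list(ENTITY_HEADING.finditer(bundle))
    if not matches:
        raise ValueError("malformed bundle: no entity headings found")
    artifacts: list[EntityArtifact] = []
    seen: set[str] = set()
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(bundle)
        body = bundle[match.end():end].strip()
        if not body:
            raise ValueError(f"malformed bundle: entity {match['name']!r} has no body")
        artifact_id = f"{chapter_id}--{slugify(match['name'])}"
        if artifact_id in seen:
            raise ValueError(f"duplicate entity {artifact_id}")
        seen.add(artifact_id)
        artifacts.append(EntityArtifact(artifact_id, f"{artifact_id}.md", body))
    return artifacts


def register(manifest: dict[str, dict], artifact: EntityArtifact, provenance: dict) -> str:
    """Insert or update one entry keyed by artifact_id; never add a duplicate."""
    digest = hashlib.sha256(artifact.body.encode("utf-8")).hexdigest()
    existing = manifest.get(artifact.artifact_id)
    if existing is not None and existing["sha256"] == digest:
        return "unchanged"
    manifest[artifact.artifact_id] = {"path": artifact.filename, "sha256": digest, **provenance}
    return "added" if existing is None else "updated"
```

Reruns stay idempotent because entries are keyed by `artifact_id` and compared by content hash. An unchanged artifact returns `"unchanged"` and leaves the manifest size the same.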
### T04 - VSM mapping, analysis, and evaluation workflows

```task
id: IB-WP-0013-T04
status: done
priority: high
state_hub_task_id: "62696191-d6fa-4d34-bf18-97f390a31b61"
```

- Recreate `map-to-vsm` as an explicit assisted workflow
- Recreate `synthesize-analysis` as an explicit assisted workflow
- Recreate entity evaluation as an explicit assisted workflow that writes
  successor `artifact_id` evaluation files
- Ensure generated mappings and relations can be parsed by current semantic
  models, or clearly identify required model extensions
- Connect generated evaluations to metrics/history and viability checks

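Evaluations are keyed by the successor `artifact_id`, so they can be joined against the manifest before they feed metrics. The sketch below assumes a hypothetical `Evaluation` record with a single numeric score and an in-memory history list. The actual `infospace-bench check` metrics schema may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Evaluation:
    artifact_id: str
    score: float  # hypothetical 0.0-1.0 quality score


def metrics_snapshot(manifest_ids: set[str], evaluations: list[Evaluation], run_id: str) -> dict:
    """Join evaluations to registered artifacts and summarize one run."""
    by_id = {e.artifact_id: e for e in evaluations}  # last evaluation per ID wins
    unknown = sorted(set(by_id) - manifest_ids)
    if unknown:
        raise ValueError(f"evaluations reference unregistered artifacts: {unknown}")
    scores = [e.score for e in by_id.values()]
    mean_score: Optional[float] = round(mean(scores), 3) if scores else None
    return {
        "run_id": run_id,
        "evaluated": len(by_id),
        "coverage": len(by_id) / len(manifest_ids) if manifest_ids else 0.0,
        "mean_score": mean_score,
        "unevaluated": sorted(manifest_ids - set(by_id)),
    }


def append_history(history: list[dict], snapshot: dict) -> list[dict]:
    # Append-only: earlier snapshots are never rewritten.
    return [*history, snapshot]
```

Rejecting evaluations for unregistered IDs catches leftovers from legacy naming. The `unevaluated` list is the kind of gap a viability check could report.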
### T05 - Wealth VSM pilot scale-up acceptance

```task
id: IB-WP-0013-T05
status: done
priority: medium
state_hub_task_id: "fe8dd175-9630-4fe1-99aa-2f3e58172a52"
```

- Prove one-chapter regeneration end to end with deterministic tests
- Add a committed pilot report comparing regenerated successor output with the
  legacy generated output shape
- Add docs for running a live provider-backed generation outside the default
  test suite
- Document cost, rate-limit, resume, and reproducibility guidance
- Define the acceptance path for scaling from one chapter to the full corpus

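One common approach to resume and reproducibility is to key each completed stage by a hash of everything that determines its output. Only stages whose key is missing or stale are rerun. The following sketch works under that assumption; `stage_key`, `stages_to_run`, and the key fields are hypothetical.

```python
import hashlib
import json


def stage_key(stage: str, inputs: dict[str, str], provider: str, model: str, template_version: str) -> str:
    """Hash everything that should determine a stage's output."""
    payload = json.dumps(
        {
            "stage": stage,
            "inputs": inputs,  # e.g. content hashes of chapter text or upstream outputs
            "provider": provider,
            "model": model,
            "template": template_version,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stages_to_run(planned: list[tuple[str, str]], completed: dict[str, str]) -> list[str]:
    """Return stage names whose recorded key is missing or stale.

    planned: (stage name, key) pairs in execution order.
    completed: stage name -> key read back from earlier run records.
    """
    return [name for name, key in planned if completed.get(name) != key]
```

Suppose each stage's `inputs` include the hashes of its upstream outputs. Then a changed upstream result changes every downstream key, and a rate-limited full-corpus run can resume without regenerating finished chapters.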
## Acceptance

- A user can inspect, plan, and run the Wealth/VSM generation workflow over a
  one-chapter pilot without using the old `markitect-project` process script
- Default tests use fake adapters and are deterministic
- Generated entities are split into stable files and registered in the manifest
- Evaluation outputs use successor `artifact_id` semantics and feed metrics
  history
- The workflow clearly distinguishes deterministic template stages from
  assisted provider-backed stages
- Remaining full-corpus risks are documented before any large generation run

## Relationship To IB-WP-0014

This workplan can start on the current local-folder backend. It should avoid
hard-coding storage assumptions where reasonable, but it is not blocked by the
backend abstraction workplan.

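One way to avoid hard-coding the local-folder layout is to route artifact I/O through a narrow storage interface. This sketch assumes hypothetical `ArtifactStore` and `LocalFolderStore` names and an `artifacts/entities/` path convention that this workplan does not specify.

```python
from pathlib import Path
from typing import Protocol


class ArtifactStore(Protocol):
    """Minimal storage surface workflow code would depend on (hypothetical)."""

    def read_text(self, relpath: str) -> str: ...
    def write_text(self, relpath: str, text: str) -> None: ...
    def exists(self, relpath: str) -> bool: ...


class LocalFolderStore:
    """Stores artifacts as files under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _resolve(self, relpath: str) -> Path:
        path = (self.root / relpath).resolve()
        # Refuse paths that escape the infospace root, such as "../x".
        if not path.is_relative_to(self.root.resolve()):
            raise ValueError(f"path escapes store root: {relpath}")
        return path

    def read_text(self, relpath: str) -> str:
        return self._resolve(relpath).read_text(encoding="utf-8")

    def write_text(self, relpath: str, text: str) -> None:
        path = self._resolve(relpath)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    def exists(self, relpath: str) -> bool:
        return self._resolve(relpath).exists()


def save_entity(store: ArtifactStore, artifact_id: str, body: str) -> str:
    """Write one entity artifact through whatever backend the caller supplies."""
    relpath = f"artifacts/entities/{artifact_id}.md"
    store.write_text(relpath, body)
    return relpath
```

Workflow code that only calls `save_entity(store, ...)` would not need to change if another backend implemented the same three methods. `Path.is_relative_to` requires Python 3.9 or later.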
## Implementation

- Added `docs/wealth-vsm-generation-pipeline.md` with the legacy pipeline
  decomposition, one-chapter pilot path, live-provider guidance, and
  full-corpus scale-up sequence.
- Added `infospaces/wealth-vsm-generation-pilot/` containing Book I Chapter III;
  explicit extraction, mapping/analysis, and evaluation workflows;
  deterministic fixture responses; contracts; and a pilot report.
- Added `FixtureAssistedGenerationAdapter` and CLI
  `workflow run --fixture-responses` support so assisted stages are explicit
  and deterministic by default.
- Added entity bundle parsing/splitting with idempotent manifest registration.
- Added evaluation output handling so generated evaluation files feed
  `infospace-bench check` metrics/history.
- Added `tests/test_wealth_vsm_generation.py`.

## Verification

- `python3 -m pytest tests/test_wealth_vsm_generation.py`
- `python3 -m pytest`