Wealth VSM Generation Pipeline
Date: 2026-05-14
Purpose
This document defines how infospace-bench regenerates the Adam Smith
Wealth of Nations / VSM infospace through explicit workflows.
The successor path is workflow-first. It does not reuse the legacy
process_chapters.py entrypoint, hide provider calls in a broad command, or
write generated files outside the artifact manifest.
Legacy pipeline decomposition
The old Wealth/VSM experiment in markitect-main processed source chapters
through these conceptual stages:
| Legacy stage | Successor workflow shape | Notes |
|---|---|---|
| extract-entities | wealth-vsm-extract-entities assisted stage plus split_entities stage | Assisted output is a chapter entity bundle; bench splits and registers stable entity artifacts. |
| map-to-vsm | wealth-vsm-map-and-analyze assisted relation stage | Relation artifacts use the successor relation parser and manifest IDs. |
| synthesize-analysis | wealth-vsm-map-and-analyze assisted analysis stage | Analysis remains a generated artifact with source provenance. |
| evaluate-entity | wealth-vsm-evaluate-entities assisted stage | Evaluation files use successor artifact_id frontmatter. |
| assess-metrics | infospace-bench check | Deterministic checks merge generated evaluations into metrics and history. |
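The split step in the first row can be illustrated with a rough sketch. It assumes, purely for illustration, that a chapter entity bundle is markdown with one `## ` heading per entity and that a stable ID is the chapter name plus a slug of the entity name. The bundle format, `split_entities` signature, and ID scheme shown here are hypothetical, not the actual infospace-bench implementation.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def split_entities(bundle: str, chapter: str) -> dict[str, str]:
    """Split a bundle on '## ' headings and return {artifact_id: body}.

    Assumption: one '## <entity name>' heading per entity. IDs depend only
    on the chapter and entity name, so regenerating the bundle yields the
    same IDs.
    """
    artifacts: dict[str, str] = {}
    # With a capture group, re.split yields: preamble, name, body, name, body, ...
    parts = re.split(r"^## +(.+)$", bundle, flags=re.MULTILINE)
    for name, body in zip(parts[1::2], parts[2::2]):
        artifact_id = f"{chapter}/{slugify(name)}"
        if artifact_id in artifacts:
            raise ValueError(f"duplicate entity id: {artifact_id}")
        artifacts[artifact_id] = body.strip()
    return artifacts


# Illustrative bundle; the entity names and text are placeholders.
bundle = "## Division of Labour\nText A\n\n## Extent of the Market\nText B\n"
entities = split_entities(bundle, "book-1-chapter-03")
print(sorted(entities))
```

Rejecting duplicate IDs rather than overwriting keeps a regenerated bundle from silently replacing an already registered artifact.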
The first golden target is Book I Chapter III because it grounds the existing
wealth-vsm-legacy-slice pilot and exercises the market-extent relation.
One-chapter pilot
infospaces/wealth-vsm-generation-pilot/ contains:
- one source excerpt: book-1-chapter-03.md
- explicit workflow declarations for extraction, VSM mapping/analysis, and entity evaluation
- deterministic fixture responses for tests
- markdown contracts for generated entity and relation artifacts
- a pilot report comparing the successor workflow shape with the legacy process script
Default tests use fixture responses so they do not require network access, provider credentials, or live model output.
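A minimal sketch of how fixture-backed tests can stay offline, assuming a hypothetical adapter interface with a single `complete(request)` method. `Provider`, `FixtureProvider`, and `request_key` are illustrative names, not infospace-bench APIs, and the request record shape is an assumption.

```python
import hashlib
import json
from typing import Protocol


class Provider(Protocol):
    """Hypothetical adapter interface: one request record in, response text out."""

    def complete(self, request: dict) -> str: ...


def request_key(request: dict) -> str:
    """Stable key for a request record: SHA-256 of its canonical JSON form."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FixtureProvider:
    """Returns canned responses keyed by request; never touches the network."""

    def __init__(self, responses: dict[str, str]) -> None:
        self._responses = responses

    def complete(self, request: dict) -> str:
        key = request_key(request)
        if key not in self._responses:
            # Fail loudly instead of falling back to a live call.
            raise KeyError(f"no fixture response for request {key[:12]}")
        return self._responses[key]


# Placeholder request record and response, for illustration only.
request = {"workflow": "wealth-vsm-extract-entities", "source": "book-1-chapter-03.md"}
provider = FixtureProvider({request_key(request): "# placeholder entity bundle"})
print(provider.complete(request))
```

Keying fixtures on the canonical request record means a changed prompt or source selection produces a missing-fixture error rather than a stale response.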
Live provider-backed generation
Any live provider-backed generation should use the same workflow declarations and the same assisted request records. Provider adapters must be selected explicitly by the caller and should record provider metadata in workflow run records and artifact provenance.
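One way to picture explicit provider selection and provenance recording is the sketch below. It assumes a hypothetical in-memory run record. `ProviderInfo`, `RunRecord`, and `start_run` are illustrative names, and the field layout is not the actual infospace-bench record schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProviderInfo:
    """Hypothetical provider metadata carried into run records."""

    name: str
    model: str


@dataclass
class RunRecord:
    workflow: str
    provider: ProviderInfo
    started_at: str
    artifacts: list = field(default_factory=list)

    def add_artifact(self, artifact_id: str, source: str) -> dict:
        """Register an artifact and copy provider metadata into its provenance."""
        entry = {
            "artifact_id": artifact_id,
            "provenance": {
                "source": source,
                "provider": self.provider.name,
                "model": self.provider.model,
            },
        }
        self.artifacts.append(entry)
        return entry


def start_run(workflow: str, provider: Optional[ProviderInfo]) -> RunRecord:
    """Refuse to start without an explicit provider; there is no default."""
    if provider is None:
        raise ValueError("provider must be selected explicitly by the caller")
    started = datetime.now(timezone.utc).isoformat()
    return RunRecord(workflow, provider, started)


# Placeholder provider, model, and artifact ID values.
run = start_run("wealth-vsm-extract-entities", ProviderInfo("example-provider", "example-model"))
entry = run.add_artifact("example-artifact-id", "book-1-chapter-03.md")
print(entry["provenance"])
```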
Live runs should document:
- provider and model
- prompt/template version
- source corpus selection
- retry and rate-limit settings
- expected cost range
- resume strategy
- generated artifact review status
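The checklist above could be captured as a small record that reports which items are still undocumented. This is a sketch under assumptions: `LiveRunNotes` and its field names are hypothetical and simply mirror the list, and all values are free-form placeholder text.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LiveRunNotes:
    """Hypothetical record with one field per item a live run should document."""

    provider_and_model: Optional[str] = None
    prompt_template_version: Optional[str] = None
    source_corpus_selection: Optional[str] = None
    retry_and_rate_limit_settings: Optional[str] = None
    expected_cost_range: Optional[str] = None
    resume_strategy: Optional[str] = None
    artifact_review_status: Optional[str] = None

    def missing(self) -> list:
        """Return the names of fields that are unset or empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values; only two of the seven items are filled in.
notes = LiveRunNotes(
    provider_and_model="example-provider / example-model",
    source_corpus_selection="Book I Chapter III",
)
print(notes.missing())
```

A caller could refuse to start a live run while `missing()` is non-empty, which keeps undocumented runs from producing artifacts.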
Full corpus scale-up
Scale-up should proceed only after the one-chapter pilot is green.
Recommended sequence:
1. Run Book I Chapter III with fixture responses.
2. Run Book I Chapter III with a live provider in a disposable copy.
3. Review generated entities, relations, evaluations, and metrics.
4. Add a small Book I batch with explicit cost and resume notes.
5. Only then run the full corpus.
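The gating in this sequence can be sketched as a runner that stops at the first step that does not pass. The `run_sequence` helper and the step callables are hypothetical; in practice each callable would wrap a real run or a manual review sign-off.

```python
from typing import Callable


def run_sequence(steps: list) -> list:
    """Run (name, check) pairs in order and stop at the first failing check.

    Returns the names of the steps that passed, so later steps never run
    while an earlier one is red.
    """
    completed = []
    for name, check in steps:
        if not check():
            break
        completed.append(name)
    return completed


def passed() -> bool:
    return True


def failed() -> bool:
    return False


# Stand-in checks: the live run fails, so nothing after it is attempted.
steps = [
    ("fixture run, Book I Chapter III", passed),
    ("live run in a disposable copy, Book I Chapter III", failed),
    ("review of entities, relations, evaluations, and metrics", passed),
    ("small Book I batch with cost and resume notes", passed),
    ("full corpus", passed),
]
done = run_sequence(steps)
print(done)
```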
The full corpus should not be committed wholesale until it has a current scoped use, deterministic acceptance coverage, and a migration report explaining what was generated, reviewed, deferred, or retired.