Wealth VSM Generation Pipeline

Date: 2026-05-14

Purpose

This document defines how infospace-bench regenerates the Adam Smith Wealth of Nations / VSM infospace through explicit workflows.

The successor path is workflow-first. It does not reuse the legacy process_chapters.py entrypoint, hide provider calls in a broad command, or write generated files outside the artifact manifest.
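The manifest constraint can be pictured as a guard that runs before any generated file is written. The following is a minimal sketch, not the infospace-bench implementation: `check_declared`, `UndeclaredArtifactError`, and the idea that the manifest exposes a set of relative paths are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


class UndeclaredArtifactError(ValueError):
    """Raised when a write targets a path the manifest does not declare."""


def check_declared(path: str, manifest_paths: set[str]) -> str:
    """Return the normalized path if the (hypothetical) manifest declares it."""
    candidate = PurePosixPath(path)
    # Refuse absolute paths and parent traversal outright.
    if candidate.is_absolute() or ".." in candidate.parts:
        raise UndeclaredArtifactError(f"path escapes the infospace: {path}")
    # PurePosixPath collapses "." segments and duplicate slashes.
    normalized = str(candidate)
    if normalized not in manifest_paths:
        raise UndeclaredArtifactError(f"not declared in manifest: {normalized}")
    return normalized
```

A workflow stage would call such a guard with the path it intends to write and abort on the exception instead of creating the file.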

Legacy pipeline decomposition

The old Wealth/VSM experiment in markitect-main processed source chapters through these conceptual stages:

| Legacy stage | Successor workflow shape | Notes |
| --- | --- | --- |
| `extract-entities` | `wealth-vsm-extract-entities` assisted stage plus `split_entities` stage | Assisted output is a chapter entity bundle; bench splits and registers stable entity artifacts. |
| `map-to-vsm` | `wealth-vsm-map-and-analyze` assisted relation stage | Relation artifacts use the successor relation parser and manifest IDs. |
| `synthesize-analysis` | `wealth-vsm-map-and-analyze` assisted analysis stage | Analysis remains a generated artifact with source provenance. |
| `evaluate-entity` | `wealth-vsm-evaluate-entities` assisted stage | Evaluation files use successor `artifact_id` frontmatter. |
| `assess-metrics` | `infospace-bench check` | Deterministic checks merge generated evaluations into metrics and history. |
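For migration notes or tests, the stage table can also be held as a small lookup. This is an illustrative sketch only: the stage and workflow names come from the table, while `LEGACY_TO_SUCCESSOR` and `successor_for` are hypothetical helpers, not part of infospace-bench.

```python
# Hypothetical lookup built from the stage table; values list the successor
# workflow (and any extra bench stage or command) that replaces each legacy stage.
LEGACY_TO_SUCCESSOR = {
    "extract-entities": ["wealth-vsm-extract-entities", "split_entities"],
    "map-to-vsm": ["wealth-vsm-map-and-analyze"],
    "synthesize-analysis": ["wealth-vsm-map-and-analyze"],
    "evaluate-entity": ["wealth-vsm-evaluate-entities"],
    "assess-metrics": ["infospace-bench check"],
}


def successor_for(legacy_stage: str) -> list[str]:
    """Return the successor names for a legacy stage, or fail loudly."""
    try:
        return LEGACY_TO_SUCCESSOR[legacy_stage]
    except KeyError:
        raise ValueError(f"unknown legacy stage: {legacy_stage!r}") from None
```

Note that `map-to-vsm` and `synthesize-analysis` resolve to the same successor workflow, which matches the table: both are stages of `wealth-vsm-map-and-analyze`.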

The first golden target is Book I Chapter III because it grounds the existing wealth-vsm-legacy-slice pilot and exercises the market-extent relation.

One-chapter pilot

infospaces/wealth-vsm-generation-pilot/ contains:

- one source excerpt: book-1-chapter-03.md
- explicit workflow declarations for extraction, VSM mapping/analysis, and entity evaluation
- deterministic fixture responses for tests
- markdown contracts for generated entity and relation artifacts
- a pilot report comparing the successor workflow shape with the legacy process script

Default tests use fixture responses so they do not require network access, provider credentials, or live model output.
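One way to picture fixture-backed tests is a stand-in provider that only replays recorded responses. The sketch below assumes a simple request shape and key format; `AssistedRequest`, `FixtureProvider`, and the `workflow/stage/source` key are hypothetical and may not match the real request records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistedRequest:
    """Hypothetical shape of an assisted request record."""
    workflow: str
    stage: str
    source: str


class FixtureProvider:
    """Replays canned responses; never touches the network or credentials."""

    def __init__(self, responses: dict[str, str]):
        self._responses = dict(responses)

    def complete(self, request: AssistedRequest) -> str:
        key = f"{request.workflow}/{request.stage}/{request.source}"
        try:
            return self._responses[key]
        except KeyError:
            # A missing fixture is a test failure, not a cue to call a live model.
            raise LookupError(f"no fixture response recorded for {key}") from None
```

Because the same request always yields the same response, a test built on such a provider is deterministic and fails fast when a fixture is missing.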

Live provider-backed generation

Any live provider-backed generation should use the same workflow declarations and the same assisted request records. Provider adapters must be selected explicitly by the caller and should record provider metadata in workflow run records and artifact provenance.

Live runs should document:

- provider and model
- prompt/template version
- source corpus selection
- retry and rate-limit settings
- expected cost range
- resume strategy
- generated artifact review status
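The checklist above could be captured as a structured record so that an incomplete live-run note is caught before the run starts. This is a sketch under assumptions: `LiveRunRecord`, its field names, and `missing_fields` are hypothetical and only mirror the listed items.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LiveRunRecord:
    """Hypothetical record with one field per documented item."""
    provider: str
    model: str
    prompt_template_version: str
    source_corpus_selection: str
    retry_and_rate_limit_settings: str
    expected_cost_range: str
    resume_strategy: str
    review_status: str


def missing_fields(record: LiveRunRecord) -> list[str]:
    """Return the names of fields left empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]
```

A caller might refuse to start a live run while `missing_fields` returns a non-empty list.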

Full corpus scale-up

Scale-up should proceed only after the one-chapter pilot is green.

Recommended sequence:

1. Run Book I Chapter III with fixture responses.
2. Run Book I Chapter III with a live provider in a disposable copy.
3. Review generated entities, relations, evaluations, and metrics.
4. Add a small Book I batch with explicit cost and resume notes.
5. Only then run the full corpus.
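The recommended sequence is strictly ordered, which a small gate can enforce. The sketch below is illustrative: the step identifiers and `next_step` are hypothetical labels for the five steps, not names used by infospace-bench.

```python
from typing import Optional, Sequence

# Hypothetical identifiers for the five recommended steps, in order.
SCALE_UP_STEPS = (
    "fixture-chapter-3",
    "live-chapter-3-disposable-copy",
    "review-generated-artifacts",
    "small-book-1-batch",
    "full-corpus",
)


def next_step(completed: Sequence[str]) -> Optional[str]:
    """Return the next allowed step, or None once every step is done."""
    done = tuple(completed)
    # Completed steps must form a prefix of the sequence: no skipping or reordering.
    if done != SCALE_UP_STEPS[: len(done)]:
        raise ValueError("steps must be completed in order, without skipping")
    if len(done) == len(SCALE_UP_STEPS):
        return None
    return SCALE_UP_STEPS[len(done)]
```

With this shape, a full-corpus run is only offered after the four earlier steps have been recorded as complete.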

The full corpus should not be committed wholesale until it has a current scoped use, deterministic acceptance coverage, and a migration report explaining what was generated, reviewed, deferred, or retired.
