infospace pipeline for wealth of nations example

2026-05-14 18:04:38 +02:00
parent 8804461ca3
commit a729a7643e
26 changed files with 1124 additions and 32 deletions


@@ -87,3 +87,12 @@ infospace-bench workflow plan infospaces/bootstrap-pilot bootstrap-readiness
infospace-bench workflow run infospaces/bootstrap-pilot bootstrap-readiness
```
Run the Wealth/VSM one-chapter generation pilot with deterministic fixture
responses in place of live assisted output:
```bash
infospace-bench workflow run infospaces/wealth-vsm-generation-pilot wealth-vsm-extract-entities --fixture-responses infospaces/wealth-vsm-generation-pilot/workflows/fixtures/wealth-vsm-fake-responses.yaml
infospace-bench workflow run infospaces/wealth-vsm-generation-pilot wealth-vsm-map-and-analyze --fixture-responses infospaces/wealth-vsm-generation-pilot/workflows/fixtures/wealth-vsm-fake-responses.yaml
infospace-bench workflow run infospaces/wealth-vsm-generation-pilot wealth-vsm-evaluate-entities --fixture-responses infospaces/wealth-vsm-generation-pilot/workflows/fixtures/wealth-vsm-fake-responses.yaml
infospace-bench check infospaces/wealth-vsm-generation-pilot
```
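
The three workflow runs share one fixture file and should finish before the final check, so a small driver can keep that order in one place. The Python sketch below only assembles, and optionally executes, the same command lines shown above. The helper names `build_commands` and `run_pilot` are illustrative and are not part of `infospace-bench`.

```python
import subprocess

PILOT = "infospaces/wealth-vsm-generation-pilot"
FIXTURES = f"{PILOT}/workflows/fixtures/wealth-vsm-fake-responses.yaml"
WORKFLOWS = [
    "wealth-vsm-extract-entities",
    "wealth-vsm-map-and-analyze",
    "wealth-vsm-evaluate-entities",
]


def build_commands(pilot=PILOT, fixtures=FIXTURES):
    """Return argv lists for the three workflow runs plus the final check."""
    commands = [
        ["infospace-bench", "workflow", "run", pilot, name,
         "--fixture-responses", fixtures]
        for name in WORKFLOWS
    ]
    commands.append(["infospace-bench", "check", pilot])
    return commands


def run_pilot(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_commands():
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(argv, check=True)
```

Calling `run_pilot()` prints the four commands without executing them; `run_pilot(dry_run=False)` assumes `infospace-bench` is on `PATH`.
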


@@ -30,6 +30,7 @@ considered a replacement for each in-scope legacy infospace behavior from
| Persist durable assets | Optional engine-backed repository adapter | Dry-run sync tests and integration design | `IB-WP-0010` | boundary done |
| Run a legacy-derived pilot | Pruned `infospace-with-history` migration | Pilot corpus, migration report, parity comparison | `IB-WP-0011` | done |
| Provide command migration path | Legacy command parity guide | Command table, examples, migration guide, decision record, acceptance tests | `IB-WP-0012` | done |
| Regenerate Wealth/VSM pilot | Explicit assisted workflows and deterministic fixtures | One-chapter generation tests, bundle splitting, evaluation metrics, scale-up docs | `IB-WP-0013` | done |
## Replacement Gates


@@ -0,0 +1,76 @@
# Wealth VSM Generation Pipeline
Date: 2026-05-14
## Purpose
This document defines how `infospace-bench` regenerates the Adam Smith
`Wealth of Nations` / VSM infospace through explicit workflows.
The successor path is workflow-first. It does not reuse the legacy
`process_chapters.py` entrypoint, hide provider calls in a broad command, or
write generated files outside the artifact manifest.
## Legacy pipeline decomposition
The old Wealth/VSM experiment in `markitect-main` processed source chapters
through these conceptual stages:
| Legacy stage | Successor workflow shape | Notes |
| --- | --- | --- |
| `extract-entities` | `wealth-vsm-extract-entities` assisted stage plus `split_entities` stage | Assisted output is a chapter entity bundle; bench splits and registers stable entity artifacts. |
| `map-to-vsm` | `wealth-vsm-map-and-analyze` assisted relation stage | Relation artifacts use the successor relation parser and manifest IDs. |
| `synthesize-analysis` | `wealth-vsm-map-and-analyze` assisted analysis stage | Analysis remains a generated artifact with source provenance. |
| `evaluate-entity` | `wealth-vsm-evaluate-entities` assisted stage | Evaluation files use successor `artifact_id` frontmatter. |
| `assess-metrics` | `infospace-bench check` | Deterministic checks merge generated evaluations into metrics and history. |

The first golden target is Book I Chapter III because it grounds the existing
`wealth-vsm-legacy-slice` pilot and exercises the market-extent relation.
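
To make the `split_entities` row of the table more concrete, here is a minimal Python sketch of splitting one chapter bundle into per-entity artifacts. The bundle format (one `## ` heading per entity), the `<chapter>/<slug>` ID scheme, and the use of `artifact_id` frontmatter on entity files are all assumptions made for illustration. The real stage may differ on each point.

```python
import re


def slugify(name):
    """Lowercase a heading and collapse non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def split_entities(bundle_text, chapter_id):
    """Split a chapter bundle into {artifact_id: markdown} entries.

    Assumed (hypothetical) format: each entity starts at a '## ' heading.
    """
    sections = []  # list of (entity name, body lines)
    for line in bundle_text.splitlines():
        if line.startswith("## "):
            sections.append((line[3:].strip(), []))
        elif sections:
            sections[-1][1].append(line)
        # Lines before the first '## ' heading are treated as preamble and dropped.

    artifacts = {}
    for name, body in sections:
        artifact_id = f"{chapter_id}/{slugify(name)}"
        if artifact_id in artifacts:
            # Stable IDs must be unique, so a collision is an error, not an overwrite.
            raise ValueError(f"duplicate entity id: {artifact_id}")
        text = "\n".join(body).strip()
        artifacts[artifact_id] = (
            f"---\nartifact_id: {artifact_id}\n---\n# {name}\n\n{text}\n"
        )
    return artifacts
```

Deriving the ID from the chapter and heading alone means a rerun over the same bundle yields the same IDs, which is the property the table calls "stable".
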
## One-chapter pilot
`infospaces/wealth-vsm-generation-pilot/` contains:
- one source excerpt: `book-1-chapter-03.md`
- explicit workflow declarations for extraction, VSM mapping/analysis, and
entity evaluation
- deterministic fixture responses for tests
- markdown contracts for generated entity and relation artifacts
- a pilot report comparing the successor workflow shape with the legacy
process script

Default tests use fixture responses so they do not require network access,
provider credentials, or live model output.
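
The idea behind fixture-backed tests can be sketched as a responder that serves canned text and refuses anything it has no entry for. The Python class below is a hypothetical illustration. The `(workflow, stage)` key shape and the stage name `extract` are assumptions, and the actual layout of `wealth-vsm-fake-responses.yaml` is not shown here.

```python
class FixtureResponder:
    """Serve canned responses keyed by (workflow, stage); never calls a provider."""

    def __init__(self, responses):
        self._responses = dict(responses)
        self.served = []  # keys in the order they were requested

    def respond(self, workflow, stage):
        key = (workflow, stage)
        if key not in self._responses:
            # Fail loudly instead of silently falling back to a live call.
            raise KeyError(f"no fixture response for {workflow}/{stage}")
        self.served.append(key)
        return self._responses[key]


# Hypothetical fixture content for one assisted stage.
responder = FixtureResponder({
    ("wealth-vsm-extract-entities", "extract"): "## Division of Labour\nLimited by the market.",
})
first = responder.respond("wealth-vsm-extract-entities", "extract")
second = responder.respond("wealth-vsm-extract-entities", "extract")
```

Because a missing key raises instead of falling through, a test that drifts away from its fixtures fails immediately and never reaches the network.
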
## Live provider-backed generation
Any live provider-backed generation should use the same workflow declarations and
the same assisted request records. Provider adapters must be selected
explicitly by the caller and should record provider metadata in workflow run
records and artifact provenance.
Live runs should document:
- provider and model
- prompt/template version
- source corpus selection
- retry and rate-limit settings
- expected cost range
- resume strategy
- generated artifact review status
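
One way to keep this list from being skipped is to treat it as a record that is validated before a live run starts. The Python sketch below is illustrative only. `LiveRunRecord`, its field names, and `missing_fields` are hypothetical and do not describe the actual workflow run record schema.

```python
from dataclasses import dataclass, fields


@dataclass
class LiveRunRecord:
    """One field per item in the list above; empty string means undocumented."""
    provider: str = ""
    model: str = ""
    prompt_template_version: str = ""
    source_corpus_selection: str = ""
    retry_and_rate_limit_settings: str = ""
    expected_cost_range: str = ""
    resume_strategy: str = ""
    review_status: str = ""


def missing_fields(record):
    """Return the names of fields left blank, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Placeholder values only; no real provider or model is implied.
draft = LiveRunRecord(provider="example-provider", model="example-model")
todo = missing_fields(draft)
```

A caller could refuse to start a live run while `missing_fields` returns a non-empty list.
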
## Full corpus scale-up
Scale-up should proceed only after the one-chapter pilot is green, that is,
after its fixture-backed workflow runs and `infospace-bench check` pass.

Recommended sequence:
1. Run Book I Chapter III with fixture responses.
2. Run Book I Chapter III with a live provider in a disposable copy.
3. Review generated entities, relations, evaluations, and metrics.
4. Add a small Book I batch with explicit cost and resume notes.
5. Only then run the full corpus.
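
The sequence above is strictly ordered, which can be expressed as a small gate that rejects skipped or reordered steps. The Python sketch below is a hypothetical illustration. The step identifiers and `next_allowed_step` are invented names, not part of `infospace-bench`.

```python
# Hypothetical identifiers for the five steps, in the recommended order.
SCALE_UP_STEPS = [
    "fixture-run-book-1-chapter-3",
    "live-run-book-1-chapter-3-disposable-copy",
    "review-generated-artifacts",
    "small-book-1-batch",
    "full-corpus",
]


def next_allowed_step(completed):
    """Return the next step in the sequence, or None when all are done.

    Raises ValueError if `completed` skips or reorders a step.
    """
    done = list(completed)
    # The completed steps must be an exact prefix of the full sequence.
    if done != SCALE_UP_STEPS[:len(done)]:
        raise ValueError("steps must be completed in order, without skipping")
    if len(done) == len(SCALE_UP_STEPS):
        return None
    return SCALE_UP_STEPS[len(done)]
```

With this gate, `full-corpus` is only ever returned after the four earlier steps have been recorded as complete.
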

The full corpus should not be committed wholesale until it has a current scoped
use, deterministic acceptance coverage, and a migration report explaining what
was generated, reviewed, deferred, or retired.