infospace pipeline for wealth of nations example

2026-05-14 18:04:38 +02:00
parent 8804461ca3
commit a729a7643e
26 changed files with 1124 additions and 32 deletions

@@ -30,8 +30,10 @@ Start with:
- `docs/legacy-command-parity.md`
- `docs/legacy-infospace-migration-guide.md`
- `docs/replacement-readiness-decision.md`
- `docs/wealth-vsm-generation-pipeline.md`
- `infospaces/bootstrap-pilot/`
- `infospaces/wealth-vsm-legacy-slice/`
- `infospaces/wealth-vsm-generation-pilot/`
- `workplans/`
Current development command:

@@ -87,3 +87,12 @@ infospace-bench workflow plan infospaces/bootstrap-pilot bootstrap-readiness
infospace-bench workflow run infospaces/bootstrap-pilot bootstrap-readiness
```
Run the Wealth/VSM one-chapter generation pilot with deterministic assisted
fixtures:
```bash
infospace-bench workflow run infospaces/wealth-vsm-generation-pilot wealth-vsm-extract-entities --fixture-responses infospaces/wealth-vsm-generation-pilot/workflows/fixtures/wealth-vsm-fake-responses.yaml
infospace-bench workflow run infospaces/wealth-vsm-generation-pilot wealth-vsm-map-and-analyze --fixture-responses infospaces/wealth-vsm-generation-pilot/workflows/fixtures/wealth-vsm-fake-responses.yaml
infospace-bench workflow run infospaces/wealth-vsm-generation-pilot wealth-vsm-evaluate-entities --fixture-responses infospaces/wealth-vsm-generation-pilot/workflows/fixtures/wealth-vsm-fake-responses.yaml
infospace-bench check infospaces/wealth-vsm-generation-pilot
```

@@ -30,6 +30,7 @@ considered a replacement for each in-scope legacy infospace behavior from
| Persist durable assets | Optional engine-backed repository adapter | Dry-run sync tests and integration design | `IB-WP-0010` | boundary done |
| Run a legacy-derived pilot | Pruned `infospace-with-history` migration | Pilot corpus, migration report, parity comparison | `IB-WP-0011` | done |
| Provide command migration path | Legacy command parity guide | Command table, examples, migration guide, decision record, acceptance tests | `IB-WP-0012` | done |
| Regenerate Wealth/VSM pilot | Explicit assisted workflows and deterministic fixtures | One-chapter generation tests, bundle splitting, evaluation metrics, scale-up docs | `IB-WP-0013` | done |
## Replacement Gates

@@ -0,0 +1,76 @@
# Wealth VSM Generation Pipeline

Date: 2026-05-14

## Purpose

This document defines how `infospace-bench` regenerates the Adam Smith
`Wealth of Nations` / VSM infospace through explicit workflows.

The successor path is workflow-first. It does not reuse the legacy
`process_chapters.py` entrypoint, hide provider calls in a broad command, or
write generated files outside the artifact manifest and the
`output/evaluations/` evaluation history.

## Legacy pipeline decomposition

The old Wealth/VSM experiment in `markitect-main` processed source chapters
through these conceptual stages:

| Legacy stage | Successor workflow shape | Notes |
| --- | --- | --- |
| `extract-entities` | `wealth-vsm-extract-entities` assisted stage plus `split_entities` stage | Assisted output is a chapter entity bundle; bench splits and registers stable entity artifacts. |
| `map-to-vsm` | `wealth-vsm-map-and-analyze` assisted relation stage | Relation artifacts use the successor relation parser and manifest IDs. |
| `synthesize-analysis` | `wealth-vsm-map-and-analyze` assisted analysis stage | Analysis remains a generated artifact with source provenance. |
| `evaluate-entity` | `wealth-vsm-evaluate-entities` assisted stage | Evaluation files use successor `artifact_id` frontmatter. |
| `assess-metrics` | `infospace-bench check` | Deterministic checks merge generated evaluations into metrics and history. |

The first golden target is Book I Chapter III because it grounds the existing
`wealth-vsm-legacy-slice` pilot and exercises the market-extent relation.

## One-chapter pilot

`infospaces/wealth-vsm-generation-pilot/` contains:

- one source excerpt: `book-1-chapter-03.md`
- explicit workflow declarations for extraction, VSM mapping/analysis, and
  entity evaluation
- deterministic fixture responses for tests
- markdown contracts for generated entity and relation artifacts
- a pilot report comparing the successor workflow shape with the legacy
  process script

Default tests use fixture responses so they do not require network access,
provider credentials, or live model output.

## Live provider-backed generation

Any live provider-backed generation should use the same workflow declarations and
the same assisted request records. Provider adapters must be selected
explicitly by the caller and should record provider metadata in workflow run
records and artifact provenance.

Live runs should document:

- provider and model
- prompt/template version
- source corpus selection
- retry and rate-limit settings
- expected cost range
- resume strategy
- generated artifact review status
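
The checklist above could be captured as a small record that is filled in before a live run starts. This is only a sketch: `LiveRunRecord`, its field names, and the example values are hypothetical and are not part of `infospace-bench`.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LiveRunRecord:
    """Hypothetical record of what a live run should document."""

    provider: str
    model: str
    template_version: str
    source_selection: list[str]
    max_retries: int
    requests_per_minute: int
    expected_cost_range: tuple[float, float]
    resume_strategy: str
    review_status: str = "unreviewed"


def missing_fields(record: LiveRunRecord) -> list[str]:
    """Return the names of fields that were left empty."""
    return [
        name
        for name, value in asdict(record).items()
        if value in ("", [], None)
    ]


# Placeholder values only; no real provider or model is implied.
record = LiveRunRecord(
    provider="example-provider",
    model="example-model",
    template_version="extract-entities@1",
    source_selection=["source/book-1-chapter-03.md"],
    max_retries=3,
    requests_per_minute=10,
    expected_cost_range=(0.5, 2.0),
    resume_strategy="",
)
print(missing_fields(record))
```

A check like `missing_fields` makes it cheap to refuse a live run until every item on the list has been written down.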

## Full corpus scale-up

Scale-up should proceed only after the one-chapter pilot is green.

Recommended sequence:

1. Run Book I Chapter III with fixture responses.
2. Run Book I Chapter III with a live provider in a disposable copy.
3. Review generated entities, relations, evaluations, and metrics.
4. Add a small Book I batch with explicit cost and resume notes.
5. Only then run the full corpus.
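
The sequence above is a gate: each step runs only if every earlier step succeeded. A minimal sketch of that control flow follows; `run_gated` and the lambda placeholders are hypothetical stand-ins for whatever actually performs each step.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_gated(steps: list[Step]) -> list[str]:
    """Run steps in order and stop at the first one that reports failure."""
    completed: list[str] = []
    for name, check in steps:
        if not check():
            break
        completed.append(name)
    return completed


# Placeholder checks; here the review step is pretended to fail.
steps: list[Step] = [
    ("fixture run, Book I Chapter III", lambda: True),
    ("live run in a disposable copy", lambda: True),
    ("review entities, relations, evaluations, metrics", lambda: False),
    ("small Book I batch", lambda: True),
    ("full corpus", lambda: True),
]
completed = run_gated(steps)
print(completed)
```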

The full corpus should not be committed wholesale until it has a current scoped
use, deterministic acceptance coverage, and a migration report explaining what
was generated, reviewed, deferred, or retired.

@@ -0,0 +1,10 @@
artifacts:
  - id: source/book-1-chapter-03.md
    path: artifacts/sources/book-1-chapter-03.md
    kind: source
    title: Book I Chapter III
    provenance:
      legacy_path: markitect-main/examples/infospace-with-history/artifacts/sources/book-1-chapter-03.md
      pilot_role: one-chapter generation target
    relationships: []

@@ -0,0 +1,23 @@
---
id: book-1-chapter-03
title: THAT THE DIVISION OF LABOUR IS LIMITED BY THE EXTENT OF THE MARKET.
book: "1"
chapter: 3
artifact_type: content
legacy_source: markitect-main/examples/infospace-with-history/artifacts/sources/book-1-chapter-03.md
---
# Book I Chapter III
## Excerpt
As it is the power of exchanging that gives occasion to the division of
labour, so the extent of this division must always be limited by the extent of
that power, or, in other words, by the extent of the market.
When the market is very small, no person can have any encouragement to dedicate
himself entirely to one employment, for want of the power to exchange all that
surplus part of the produce of his own labour, which is over and above his own
consumption, for such parts of the produce of other men's labour as he has
occasion for.

@@ -0,0 +1,25 @@
# Economic Entity Contract
```yaml contract
id: economic-entity-generation-pilot
document:
type: economic-entity
sections:
- id: definition
title: Definition
presence: required
level: 2
- id: source-chapter
title: Source Chapter
presence: required
level: 2
- id: context
title: Context
presence: required
level: 2
- id: economic-domain
title: Economic Domain
presence: required
level: 2
```
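
The contract above requires four level-2 sections. As a rough illustration of what that requirement means for a generated entity document, the sketch below checks headings with a regular expression. It is an assumption-laden stand-in, not the contract validator the bench actually uses; `missing_sections` is a hypothetical helper.

```python
import re

# Section titles taken from the contract above.
REQUIRED_SECTIONS = ["Definition", "Source Chapter", "Context", "Economic Domain"]


def missing_sections(markdown: str, required=REQUIRED_SECTIONS, level: int = 2) -> list[str]:
    """Return required section titles that have no heading at the given level."""
    prefix = "#" * level
    pattern = re.compile(rf"(?m)^{prefix} (?P<title>.+?)\s*$")
    found = {match.group("title") for match in pattern.finditer(markdown)}
    return [title for title in required if title not in found]


doc = "# Market Extent\n## Definition\nReach of exchange.\n## Context\nBook I.\n"
print(missing_sections(doc))
```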

@@ -0,0 +1,33 @@
# Relation Triplet Contract
```yaml contract
id: relation-triplet-generation-pilot
document:
type: relation-triplet
sections:
- id: subject
title: Subject
presence: required
level: 2
- id: predicate
title: Predicate
presence: required
level: 2
- id: object
title: Object
presence: required
level: 2
- id: relation-type
title: Relation Type
presence: required
level: 2
- id: vsm-channel
title: VSM Channel
presence: required
level: 2
- id: evidence
title: Evidence
presence: required
level: 2
```

@@ -0,0 +1,106 @@
slug: wealth-vsm-generation-pilot
name: Wealth/VSM Generation Pilot
topic:
name: The Wealth of Nations / VSM Generation Pilot
domain: Classical Economics
sources: artifacts/sources
disciplines:
- name: Viable System Model
path: artifacts/vsm-reference
schemas:
entity: contracts/economic-entity.contract.md
relation: contracts/relation.contract.md
workflows:
- id: wealth-vsm-extract-entities
description: Extract economic entities from a source chapter and split the bundle into manifest artifacts.
inputs:
source:
kind: source
artifact_ids:
- source/book-1-chapter-03.md
static_macros:
discipline: Viable System Model
stages:
- id: extract-entities
kind: assisted
input: source
template: workflows/templates/extract-entities.md
provider_hint: explicit-adapter
output:
path: artifacts/generated/{{ input.slug }}-entities-bundle.md
artifact_id: generated/{{ input.slug }}-entities-bundle.md
kind: generated
title: "{{ input.title }} Entity Bundle"
- id: split-entity-bundle
kind: split_entities
input: source
static_macros:
bundle_stage: extract-entities
expected_evaluations:
- entity-contracts
- metrics
- id: wealth-vsm-map-and-analyze
description: Map generated entities to VSM relation artifacts and produce chapter analysis.
inputs:
source:
kind: source
artifact_ids:
- source/book-1-chapter-03.md
static_macros:
discipline: Viable System Model
stages:
- id: map-to-vsm
kind: assisted
input: source
template: workflows/templates/map-to-vsm.md
provider_hint: explicit-adapter
output:
path: artifacts/relations/division-of-labour-constrains-market-extent.md
artifact_id: relation/division-of-labour-constrains-market-extent.md
kind: relation
title: Division of Labour constrains Market Extent
- id: synthesize-analysis
kind: assisted
input: source
template: workflows/templates/synthesize-analysis.md
provider_hint: explicit-adapter
output:
path: artifacts/generated/{{ input.slug }}-analysis.md
artifact_id: generated/{{ input.slug }}-analysis.md
kind: generated
title: "{{ input.title }} VSM Analysis"
expected_evaluations:
- relation-contracts
- metrics
- id: wealth-vsm-evaluate-entities
description: Evaluate generated entity artifacts using successor artifact_id semantics.
inputs:
entity:
kind: entity
static_macros:
discipline: Viable System Model
stages:
- id: evaluate-entity
kind: assisted
input: entity
template: workflows/templates/evaluate-entity.md
provider_hint: explicit-adapter
output:
path: output/evaluations/{{ input.slug }}.md
artifact_id: evaluation/{{ input.slug }}.md
kind: evaluation
title: "{{ input.title }} Evaluation"
expected_evaluations:
- metrics
viability:
  coverage_ratio:
    min: 0.5
  redundancy_ratio:
    max: 0.1
  coherence_components:
    max: 3
  consistency_cycles:
    max: 0
  per_artifact_mean:
    min: 3.5
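
The `viability` block reads as a set of per-metric bounds. The sketch below shows one way such bounds could be compared against a metrics mapping. The threshold values are copied from the YAML, but `violations` and the sample metrics are hypothetical; how `infospace-bench check` evaluates these thresholds is not shown in this commit excerpt.

```python
# Thresholds copied from the viability section above.
THRESHOLDS = {
    "coverage_ratio": {"min": 0.5},
    "redundancy_ratio": {"max": 0.1},
    "coherence_components": {"max": 3},
    "consistency_cycles": {"max": 0},
    "per_artifact_mean": {"min": 3.5},
}


def violations(metrics: dict, thresholds: dict = THRESHOLDS) -> list[str]:
    """List every metric that is missing or outside its min/max bound."""
    problems = []
    for name, bounds in thresholds.items():
        if name not in metrics:
            problems.append(f"{name}: missing")
            continue
        value = metrics[name]
        if "min" in bounds and value < bounds["min"]:
            problems.append(f"{name}: {value} < min {bounds['min']}")
        if "max" in bounds and value > bounds["max"]:
            problems.append(f"{name}: {value} > max {bounds['max']}")
    return problems


# Made-up metric values for illustration only.
sample = {
    "coverage_ratio": 1.0,
    "redundancy_ratio": 0.25,
    "coherence_components": 1,
    "consistency_cycles": 0,
    "per_artifact_mean": 4.3,
}
print(violations(sample))
```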

@@ -0,0 +1,39 @@
# Wealth/VSM Generation Pilot Report
## Summary
This pilot demonstrates one-chapter regeneration for the Adam Smith Wealth/VSM
infospace using explicit `infospace-bench` workflows. It follows the successor
workflow shape and does not run the legacy process script.
## One-Chapter Target
- Source: Book I Chapter III
- Legacy alignment: `markitect-main/examples/infospace-with-history`
- Successor pilot: `infospaces/wealth-vsm-generation-pilot`
## Workflow Shape
- `wealth-vsm-extract-entities` produces a generated entity bundle and splits
it into stable entity artifacts.
- `wealth-vsm-map-and-analyze` produces a relation artifact and generated
chapter analysis.
- `wealth-vsm-evaluate-entities` produces evaluation Markdown files using
successor `artifact_id` semantics.
- `infospace-bench check` merges deterministic collection checks and generated
evaluations into metrics history.
## Comparison To Legacy
The legacy implementation centered on a large `process_chapters.py` script with
implicit provider execution and output conventions. This pilot keeps provider
work explicit through assisted requests and records generated outputs through
the manifest or evaluation history.
## Scale-Up Notes
Full-corpus generation should happen only after reviewing one live
provider-backed run. Cost, model, prompt version, retry settings, and resume
strategy should be recorded before processing more chapters.

@@ -0,0 +1,19 @@
# Book I Chapter III VSM Analysis
## Summary
The chapter describes a viability constraint: operational specialization can
increase productive power only when the exchange environment is large enough to
absorb specialized surplus.
## VSM Interpretation
Division of Labour behaves like a System 1 operational pattern. Market Extent
acts as an environmental and coordination constraint around those operations.
The generated relation records the structural dependency that makes
specialization viable only at sufficient market scale.
## Provenance
Generated by the fixture adapter for the one-chapter successor workflow pilot.

@@ -0,0 +1,52 @@
# Division of Labour
## Definition
The separation of work into specialized tasks that increases productivity and depends on opportunities for exchange.
## Source Chapter
Book I, Chapter 3
## Context
Smith explains that the power of exchanging gives occasion to the division of labour, but that this division is constrained by how much exchange the market can support.
## Economic Domain
Production
## Original Wording
Smith writes that the power of exchanging gives occasion to the division of labour.
## Modern Interpretation
Specialization works only when producers can exchange specialized output for the goods they need.
# Market Extent
## Definition
The geographical and economic reach of exchange that determines whether specialized producers can find enough demand for their surplus output.
## Source Chapter
Book I, Chapter 3
## Context
Smith states that the extent of the division of labour must always be limited by the extent of the market.
## Economic Domain
Exchange
## Original Wording
The extent of this division must always be limited by the extent of the market.
## Modern Interpretation
Market size sets the practical ceiling for specialization because small markets cannot absorb surplus production.

@@ -0,0 +1,30 @@
# Division of Labour constrains Market Extent
## Subject
Division of Labour
## Predicate
is limited by
## Object
Market Extent
## Relation Type
constrains
## VSM Channel
S2 <- S1
## Evidence
Book I, Chapter 3: "the extent of this division must always be limited by the extent of that power, or, in other words, by the extent of the market."
## Feedback Role
Market extent constrains specialization; successful specialization can increase surplus exchange and eventually expand the market.

@@ -0,0 +1,51 @@
---
artifact_id: entity/division-of-labour.md
evaluator: fixture
evaluated_at: "2026-05-14T12:00:00+00:00"
scores:
  - name: definition_precision
    value: 4.0
    max_value: 5.0
    rationale: The definition is concise and grounded in exchange-mediated specialization.
  - name: source_grounding
    value: 5.0
    max_value: 5.0
    rationale: The entity is directly supported by Book I Chapter III.
  - name: domain_placement
    value: 4.0
    max_value: 5.0
    rationale: Production is the correct primary domain.
  - name: vsm_relevance
    value: 4.0
    max_value: 5.0
    rationale: The entity maps clearly to System 1 operations and coordination needs.
  - name: explanatory_value
    value: 5.0
    max_value: 5.0
    rationale: It explains why productive organization depends on exchange scale.
notes:
  - Fixture evaluation for the explicit Wealth/VSM workflow pilot.
---
# Evaluation: entity/division-of-labour.md
## definition_precision - 4.0 / 5.0
The definition is concise and grounded in exchange-mediated specialization.
## source_grounding - 5.0 / 5.0
The entity is directly supported by Book I Chapter III.
## domain_placement - 4.0 / 5.0
Production is the correct primary domain.
## vsm_relevance - 4.0 / 5.0
The entity maps clearly to System 1 operations and coordination needs.
## explanatory_value - 5.0 / 5.0
It explains why productive organization depends on exchange scale.

@@ -0,0 +1,51 @@
---
artifact_id: entity/market-extent.md
evaluator: fixture
evaluated_at: "2026-05-14T12:00:00+00:00"
scores:
  - name: definition_precision
    value: 4.0
    max_value: 5.0
    rationale: The definition captures market reach and demand capacity.
  - name: source_grounding
    value: 5.0
    max_value: 5.0
    rationale: The chapter states the relation directly.
  - name: domain_placement
    value: 4.0
    max_value: 5.0
    rationale: Exchange is the correct economic domain.
  - name: vsm_relevance
    value: 4.0
    max_value: 5.0
    rationale: The entity captures the environment that constrains operations.
  - name: explanatory_value
    value: 4.0
    max_value: 5.0
    rationale: It explains the limit on specialization.
notes:
  - Fixture evaluation for the explicit Wealth/VSM workflow pilot.
---
# Evaluation: entity/market-extent.md
## definition_precision - 4.0 / 5.0
The definition captures market reach and demand capacity.
## source_grounding - 5.0 / 5.0
The chapter states the relation directly.
## domain_placement - 4.0 / 5.0
Exchange is the correct economic domain.
## vsm_relevance - 4.0 / 5.0
The entity captures the environment that constrains operations.
## explanatory_value - 4.0 / 5.0
It explains the limit on specialization.
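
As a worked check on the two fixture evaluations, the sketch below averages the five scores per entity and then averages those means. The score values are copied from the fixture frontmatter. The aggregation rule is an assumption: this excerpt does not show how `infospace-bench check` derives `per_artifact_mean`.

```python
from statistics import mean

# Score values copied from the two fixture evaluation files.
scores = {
    "entity/division-of-labour.md": [4.0, 5.0, 4.0, 4.0, 5.0],
    "entity/market-extent.md": [4.0, 5.0, 4.0, 4.0, 4.0],
}

# Assumed aggregation: mean per artifact, then mean across artifacts.
per_artifact = {artifact_id: mean(values) for artifact_id, values in scores.items()}
overall = mean(list(per_artifact.values()))

for artifact_id, value in per_artifact.items():
    print(f"{artifact_id}: {value:.2f}")
print(f"overall: {overall:.2f}")
```

Under that assumption the entities average 22/5 = 4.4 and 21/5 = 4.2, for an overall mean of about 4.3, which sits above the `per_artifact_mean` minimum of 3.5 declared in the pilot configuration.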

@@ -0,0 +1,22 @@
responses:
  - stage_id: extract-entities
    input_artifact_id: source/book-1-chapter-03.md
    provider: fixture
    markdown_path: responses/book-1-chapter-03-entities-bundle.md
  - stage_id: map-to-vsm
    input_artifact_id: source/book-1-chapter-03.md
    provider: fixture
    markdown_path: responses/book-1-chapter-03-relation.md
  - stage_id: synthesize-analysis
    input_artifact_id: source/book-1-chapter-03.md
    provider: fixture
    markdown_path: responses/book-1-chapter-03-analysis.md
  - stage_id: evaluate-entity
    input_artifact_id: entity/division-of-labour.md
    provider: fixture
    markdown_path: responses/division-of-labour-evaluation.md
  - stage_id: evaluate-entity
    input_artifact_id: entity/market-extent.md
    provider: fixture
    markdown_path: responses/market-extent-evaluation.md
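
Each fixture entry above is addressed by its `stage_id` and `input_artifact_id`. The fixture adapter in this commit tries the exact pair first and then falls back to a `"*"` wildcard for the stage. The standalone sketch below mirrors that lookup order; `FixtureResponse` and `lookup_response` are simplified, hypothetical stand-ins for the real adapter and result types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixtureResponse:
    """Hypothetical, reduced stand-in for the adapter's result type."""

    markdown: str
    provider: str = "fixture"


def lookup_response(responses: dict, stage_id: str, input_artifact_id: str) -> FixtureResponse:
    """Prefer an exact (stage, input) match, then fall back to (stage, "*")."""
    exact = responses.get((stage_id, input_artifact_id))
    if exact is not None:
        return exact
    wildcard = responses.get((stage_id, "*"))
    if wildcard is None:
        raise KeyError(f"no fixture response for {stage_id} / {input_artifact_id}")
    return wildcard


responses = {
    ("evaluate-entity", "entity/market-extent.md"): FixtureResponse("# exact"),
    ("evaluate-entity", "*"): FixtureResponse("# wildcard"),
}
print(lookup_response(responses, "evaluate-entity", "entity/market-extent.md").markdown)
print(lookup_response(responses, "evaluate-entity", "entity/other.md").markdown)
```

Keying on both values is what lets the single `evaluate-entity` stage return a different canned evaluation for each entity.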

@@ -0,0 +1,12 @@
# Evaluate Entity
Discipline: {{ macros.discipline }}
Entity: {{ input.artifact_id }}
Return a successor evaluation Markdown file with YAML frontmatter containing
artifact_id, evaluator, evaluated_at, and scores.
## Entity Text
{{ input.content }}
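
The template above relies on dotted placeholders such as `{{ macros.discipline }}` and `{{ input.artifact_id }}`. To illustrate how they resolve against nested data, here is a minimal renderer sketch. It is not the `render_markdown_template` function the workflow code imports; `render` and `PLACEHOLDER_RE` are hypothetical and cover only plain dotted lookups.

```python
import re

# Matches {{ dotted.name }} with optional inner whitespace.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(template: str, data: dict) -> str:
    """Replace each dotted placeholder with the matching nested value."""

    def resolve(match: re.Match) -> str:
        value = data
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError for unknown names
        return str(value)

    return PLACEHOLDER_RE.sub(resolve, template)


template = "Discipline: {{ macros.discipline }}\nEntity: {{ input.artifact_id }}"
data = {
    "macros": {"discipline": "Viable System Model"},
    "input": {"artifact_id": "entity/market-extent.md"},
}
print(render(template, data))
```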

@@ -0,0 +1,12 @@
# Extract Economic Entities
Discipline: {{ macros.discipline }}
Source: {{ input.artifact_id }}
Return a single Markdown bundle with one top-level (`#`) heading per entity.
Each entity must include Definition, Source Chapter, Context, and Economic
Domain sections as level-2 (`##`) headings.
## Source Text
{{ input.content }}

@@ -0,0 +1,13 @@
# Map Source To VSM Relation
Discipline: {{ macros.discipline }}
Source: {{ input.artifact_id }}
Return one relation triplet that links the generated economic entities and
includes Subject, Predicate, Object, Relation Type, VSM Channel, Evidence, and
Feedback Role sections.
## Source Text
{{ input.content }}

@@ -0,0 +1,12 @@
# Synthesize Chapter Analysis
Discipline: {{ macros.discipline }}
Source: {{ input.artifact_id }}
Write a concise VSM analysis of the chapter using the generated entities and
relation as evidence.
## Source Text
{{ input.content }}

@@ -9,6 +9,7 @@ from pathlib import Path
 from .checks import run_collection_checks
 from .engine import engine_capability_contract, plan_asset_sync, sync_assets
 from .errors import InfospaceError
+from .evaluation_io import read_entity_evaluations
 from .history import (
     build_viability_report,
     find_snapshot,
@@ -21,7 +22,12 @@ from .inspection import export_mermaid, relationship_summary
 from .lifecycle import add_artifact, create_infospace, load_infospace
 from .markdown_adapter import validate_infospace_artifacts
 from .semantics import list_entities, list_relations
-from .workflow import load_workflows, plan_workflow, run_workflow
+from .workflow import (
+    FixtureAssistedGenerationAdapter,
+    load_workflows,
+    plan_workflow,
+    run_workflow,
+)


 def build_parser() -> argparse.ArgumentParser:
@@ -111,6 +117,11 @@ def build_parser() -> argparse.ArgumentParser:
     )
     workflow_run.add_argument("root")
     workflow_run.add_argument("workflow_id")
+    workflow_run.add_argument(
+        "--fixture-responses",
+        default="",
+        help="Run assisted stages with deterministic fixture responses",
+    )

     engine = sub.add_parser("engine", help="Inspect and sync engine boundary state")
     engine_sub = engine.add_subparsers(dest="engine_command", required=True)
@@ -222,7 +233,11 @@ def main(argv: list[str] | None = None) -> int:
     elif args.command == "check":
         infospace = load_infospace(Path(args.root))
         report = run_collection_checks(infospace.artifacts)
-        result = record_check_results(infospace.root, report)
+        result = record_check_results(
+            infospace.root,
+            report,
+            artifact_evaluations=_read_output_evaluations(infospace.root),
+        )
         _write_json(
             {
                 **result.to_dict(),
@@ -253,8 +268,19 @@ def main(argv: list[str] | None = None) -> int:
                 plan_workflow(Path(args.root), args.workflow_id).to_dict()
             )
         elif args.workflow_command == "run":
+            adapter = (
+                FixtureAssistedGenerationAdapter.from_file(
+                    Path(args.fixture_responses)
+                )
+                if args.fixture_responses
+                else None
+            )
             _write_json(
-                run_workflow(Path(args.root), args.workflow_id).to_dict()
+                run_workflow(
+                    Path(args.root),
+                    args.workflow_id,
+                    assisted_adapter=adapter,
+                ).to_dict()
             )
         else:
             parser.error(f"Unhandled workflow command: {args.workflow_command}")
@@ -328,9 +354,14 @@ def _record_checks(root: Path):
     return record_check_results(
         infospace.root,
         run_collection_checks(infospace.artifacts),
+        artifact_evaluations=_read_output_evaluations(infospace.root),
     )


+def _read_output_evaluations(root: Path):
+    return read_entity_evaluations(root / "output" / "evaluations")
+
+
 def _relationship_summary_payload(summary) -> dict:
     return {
         "node_count": summary.node_count,
@@ -75,6 +75,17 @@ def read_entity_evaluation(path: str | Path) -> EntityEvaluation:
     )


+def read_entity_evaluations(directory: str | Path) -> list[EntityEvaluation]:
+    source = Path(directory)
+    if not source.is_dir():
+        return []
+    return [
+        read_entity_evaluation(path)
+        for path in sorted(source.glob("*.md"))
+        if path.is_file()
+    ]
+
+
 def write_snapshot(snapshot: EvaluationSnapshot, path: str | Path) -> None:
     target = Path(path)
     target.parent.mkdir(parents=True, exist_ok=True)

@@ -0,0 +1,127 @@
from __future__ import annotations

import re
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any

from .errors import InfospaceError
from .lifecycle import register_artifact
from .semantics import slugify


ENTITY_HEADING_RE = re.compile(r"(?m)^# (?P<title>.+?)\s*$")


@dataclass(frozen=True)
class EntityBundleItem:
    title: str
    slug: str
    markdown: str

    @property
    def artifact_id(self) -> str:
        return f"entity/{self.slug}.md"

    @property
    def path(self) -> str:
        return f"artifacts/entities/{self.slug}.md"

    def to_dict(self) -> dict[str, Any]:
        return asdict(self) | {
            "artifact_id": self.artifact_id,
            "path": self.path,
        }


def parse_entity_bundle(markdown: str) -> list[EntityBundleItem]:
    matches = list(ENTITY_HEADING_RE.finditer(markdown))
    if not matches:
        raise InfospaceError(
            "invalid_entity_bundle",
            "Entity bundle does not contain any top-level entity headings",
            {"required_heading": "# <Entity Title>"},
        )

    items: list[EntityBundleItem] = []
    seen_slugs: set[str] = set()
    for index, match in enumerate(matches):
        end = matches[index + 1].start() if index + 1 < len(matches) else len(markdown)
        section = markdown[match.start() : end].strip() + "\n"
        title = match.group("title").strip()
        slug = slugify(title)
        if not slug:
            raise InfospaceError(
                "invalid_entity_bundle",
                "Entity bundle contains an empty entity heading",
                {"title": title},
            )
        if slug in seen_slugs:
            raise InfospaceError(
                "duplicate_entity_bundle_item",
                f"Entity bundle contains duplicate entity: {title}",
                {"slug": slug, "title": title},
            )
        if not re.search(r"(?m)^## Definition\s*$", section):
            raise InfospaceError(
                "invalid_entity_bundle",
                f"Entity bundle item is missing a Definition section: {title}",
                {"slug": slug, "missing_sections": ["definition"]},
            )
        seen_slugs.add(slug)
        items.append(EntityBundleItem(title=title, slug=slug, markdown=section))
    return items


def write_entity_bundle_artifacts(
    root: str | Path,
    markdown: str,
    *,
    workflow_id: str,
    stage_id: str,
    input_artifact_id: str,
    source_bundle_artifact_id: str = "",
    provider: str = "",
    dry_run: bool = False,
) -> list[EntityBundleItem]:
    items = parse_entity_bundle(markdown)
    root_path = Path(root)
    for item in items:
        if dry_run:
            continue
        target = root_path / item.path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(item.markdown, encoding="utf-8")
        relationships = [
            {
                "type": "generated_from",
                "target": input_artifact_id,
            }
        ]
        if source_bundle_artifact_id:
            relationships.append(
                {
                    "type": "split_from",
                    "target": source_bundle_artifact_id,
                }
            )
        register_artifact(
            root_path,
            artifact_id=item.artifact_id,
            path=item.path,
            kind="entity",
            title=item.title,
            provenance={
                "workflow_id": workflow_id,
                "stage_id": stage_id,
                "input_artifact_id": input_artifact_id,
                **(
                    {"source_bundle_artifact_id": source_bundle_artifact_id}
                    if source_bundle_artifact_id
                    else {}
                ),
                **({"provider": provider} if provider else {}),
            },
            relationships=relationships,
        )
    return items

@@ -9,6 +9,7 @@ from typing import Any, Protocol
 import yaml

 from .errors import InfospaceError
+from .generation import write_entity_bundle_artifacts
 from .lifecycle import load_infospace, register_artifact
 from .markdown_adapter import render_markdown_template
 from .models import KnowledgeArtifact
@@ -61,7 +62,7 @@ class WorkflowStage:
     id: str
     kind: str
     input: str
-    template: str
+    template: str = ""
     output: WorkflowOutputSpec | None = None
     static_macros: dict[str, Any] = field(default_factory=dict)
     provider_hint: str | None = None
@@ -74,7 +75,7 @@ class WorkflowStage:
             id=str(data["id"]),
             kind=str(data.get("kind") or "template"),
             input=str(data.get("input") or ""),
-            template=str(data["template"]),
+            template=str(data.get("template") or ""),
             output=WorkflowOutputSpec.from_dict(output) if isinstance(output, dict) else None,
             static_macros=dict(data.get("static_macros") or {}),
             provider_hint=(
@@ -210,6 +211,60 @@ class AssistedGenerationAdapter(Protocol):
"""Generate Markdown for an assisted workflow request."""
+class FixtureAssistedGenerationAdapter:
+    def __init__(
+        self,
+        responses: dict[tuple[str, str], AssistedGenerationResult],
+    ) -> None:
+        self.responses = responses
+
+    @classmethod
+    def from_file(cls, path: str | Path) -> "FixtureAssistedGenerationAdapter":
+        source = Path(path)
+        data = yaml.safe_load(source.read_text(encoding="utf-8")) or {}
+        if not isinstance(data, dict):
+            raise InfospaceError(
+                "invalid_assisted_fixture",
+                f"Expected mapping in assisted fixture file: {source}",
+                {"path": str(source)},
+            )
+        responses: dict[tuple[str, str], AssistedGenerationResult] = {}
+        for item in data.get("responses", []):
+            if not isinstance(item, dict):
+                continue
+            stage_id = str(item["stage_id"])
+            input_artifact_id = str(item.get("input_artifact_id") or "*")
+            markdown = str(item.get("markdown") or "")
+            markdown_path = item.get("markdown_path")
+            if markdown_path:
+                markdown = (source.parent / str(markdown_path)).read_text(
+                    encoding="utf-8"
+                )
+            responses[(stage_id, input_artifact_id)] = AssistedGenerationResult(
+                markdown=markdown,
+                provider=str(item.get("provider") or "fixture"),
+                metadata=dict(item.get("metadata") or {}),
+            )
+        return cls(responses)
+
+    def generate(
+        self,
+        request: AssistedGenerationRequest,
+    ) -> AssistedGenerationResult:
+        key = (request.stage_id, request.input_artifact_id)
+        result = self.responses.get(key) or self.responses.get((request.stage_id, "*"))
+        if result is None:
+            raise InfospaceError(
+                "missing_assisted_fixture_response",
+                "No fixture response for assisted workflow request",
+                {
+                    "stage_id": request.stage_id,
+                    "input_artifact_id": request.input_artifact_id,
+                },
+            )
+        return result
+
+
 @dataclass(frozen=True)
 class WorkflowStageRecord:
     stage_id: str
@@ -317,9 +372,9 @@ def _execute_workflow(
         )
         for input_record in selected_inputs:
             data = _template_data(workflow, stage, input_record, stage_outputs)
-            template_text = _read_template(infospace.root, stage.template)
-            rendered = render_markdown_template(template_text, data)
             if stage.kind == "template":
+                template_text = _read_template(infospace.root, stage.template)
+                rendered = render_markdown_template(template_text, data)
                 output = _resolve_output(
                     workflow,
                     stage,
@@ -334,6 +389,7 @@ def _execute_workflow(
                     "content": rendered.markdown,
                     "artifact_id": output.artifact_id,
                     "path": output.path,
+                    "provider": "",
                 }
                 stages.append(
                     WorkflowStageRecord(
@@ -345,6 +401,8 @@ def _execute_workflow(
                     )
                 )
             elif stage.kind == "assisted":
+                template_text = _read_template(infospace.root, stage.template)
+                rendered = render_markdown_template(template_text, data)
                 request = AssistedGenerationRequest(
                     stage_id=stage.id,
                     workflow_id=workflow.id,
@@ -386,6 +444,13 @@ def _execute_workflow(
                     provider=result.provider,
                 )
                 outputs.append(output)
+                stage_outputs[stage.id] = {
+                    "content": result.markdown,
+                    "artifact_id": output.artifact_id,
+                    "path": output.path,
+                    "provider": result.provider,
+                    "metadata": result.metadata,
+                }
                 stages.append(
                     WorkflowStageRecord(
                         stage_id=stage.id,
@@ -395,6 +460,77 @@ def _execute_workflow(
                         output_artifact_id=output.artifact_id,
                     )
                 )
+            elif stage.kind == "split_entities":
+                bundle_stage = str(stage.static_macros.get("bundle_stage") or "")
+                if not bundle_stage:
+                    raise InfospaceError(
+                        "missing_split_bundle_stage",
+                        "split_entities stage requires static_macros.bundle_stage",
+                        {"workflow_id": workflow.id, "stage_id": stage.id},
+                    )
+                bundle_output = stage_outputs.get(bundle_stage)
+                if bundle_output is None:
+                    if dry_run:
+                        stages.append(
+                            WorkflowStageRecord(
+                                stage_id=stage.id,
+                                kind=stage.kind,
+                                status="waiting_for_assisted_output",
+                                input_artifact_id=input_record.artifact_id,
+                            )
+                        )
+                        continue
+                    raise InfospaceError(
+                        "missing_split_bundle_output",
+                        "split_entities stage could not find the source bundle output",
+                        {
+                            "workflow_id": workflow.id,
+                            "stage_id": stage.id,
+                            "bundle_stage": bundle_stage,
+                        },
+                    )
+                items = write_entity_bundle_artifacts(
+                    infospace.root,
+                    str(bundle_output.get("content") or ""),
+                    workflow_id=workflow.id,
+                    stage_id=stage.id,
+                    input_artifact_id=input_record.artifact_id,
+                    source_bundle_artifact_id=str(
+                        bundle_output.get("artifact_id") or ""
+                    ),
+                    provider=str(bundle_output.get("provider") or ""),
+                    dry_run=dry_run,
+                )
+                for item in items:
+                    outputs.append(
+                        WorkflowOutputRecord(
+                            stage_id=stage.id,
+                            artifact_id=item.artifact_id,
+                            path=item.path,
+                            kind="entity",
+                            title=item.title,
+                            input_artifact_id=input_record.artifact_id,
+                            written=not dry_run,
+                        )
+                    )
+                stage_outputs[stage.id] = {
+                    "content": "\n".join(item.markdown for item in items),
+                    "artifact_id": ",".join(item.artifact_id for item in items),
+                    "path": ",".join(item.path for item in items),
+                    "provider": str(bundle_output.get("provider") or ""),
+                }
+                stages.append(
+                    WorkflowStageRecord(
+                        stage_id=stage.id,
+                        kind=stage.kind,
+                        status="planned" if dry_run else "completed",
+                        input_artifact_id=input_record.artifact_id,
+                        output_artifact_id=",".join(
+                            item.artifact_id for item in items
+                        ),
+                        message=f"split {len(items)} entities",
+                    )
+                )
             else:
                 raise InfospaceError(
                     "unsupported_workflow_stage",
@@ -525,25 +661,26 @@ def _resolve_output(
     if not dry_run:
         target.parent.mkdir(parents=True, exist_ok=True)
         target.write_text(markdown, encoding="utf-8")
-        register_artifact(
-            root,
-            artifact_id=artifact_id,
-            path=output_path,
-            kind=stage.output.kind,
-            title=title,
-            provenance={
-                "workflow_id": workflow.id,
-                "stage_id": stage.id,
-                "input_artifact_id": input_record.artifact_id,
-                **({"provider": provider} if provider else {}),
-            },
-            relationships=[
-                {
-                    "type": "generated_from",
-                    "target": input_record.artifact_id,
-                }
-            ],
-        )
+        if stage.output.kind != "evaluation":
+            register_artifact(
+                root,
+                artifact_id=artifact_id,
+                path=output_path,
+                kind=stage.output.kind,
+                title=title,
+                provenance={
+                    "workflow_id": workflow.id,
+                    "stage_id": stage.id,
+                    "input_artifact_id": input_record.artifact_id,
+                    **({"provider": provider} if provider else {}),
+                },
+                relationships=[
+                    {
+                        "type": "generated_from",
+                        "target": input_record.artifact_id,
+                    }
+                ],
+            )
     return WorkflowOutputRecord(
         stage_id=stage.id,
         artifact_id=artifact_id,
@@ -0,0 +1,168 @@
import json
import os
import shutil
import subprocess
import sys
from pathlib import Path

import pytest

from infospace_bench import InfospaceError, load_infospace
from infospace_bench.evaluation_io import read_entity_evaluation
from infospace_bench.generation import parse_entity_bundle
from infospace_bench.history import read_metrics_file
from infospace_bench.markdown_adapter import validate_infospace_artifacts
from infospace_bench.semantics import list_entities, list_relations

REPO_ROOT = Path(__file__).resolve().parents[1]
PILOT = REPO_ROOT / "infospaces" / "wealth-vsm-generation-pilot"
FIXTURES = PILOT / "workflows" / "fixtures" / "wealth-vsm-fake-responses.yaml"


def cli_env() -> dict[str, str]:
    env = os.environ.copy()
    env["PYTHONPATH"] = str(REPO_ROOT / "src") + ":/home/worsch/markitect-tool/src"
    return env


def run_cli(*args: str, cwd: Path | None = None) -> subprocess.CompletedProcess[str]:
    return subprocess.run(
        [sys.executable, "-m", "infospace_bench", *args],
        check=False,
        cwd=cwd or REPO_ROOT,
        env=cli_env(),
        text=True,
        capture_output=True,
    )


def copy_pilot(tmp_path: Path) -> Path:
    target = tmp_path / PILOT.name
    shutil.copytree(PILOT, target)
    return target


def test_wealth_vsm_generation_plan_is_explicit_and_assisted() -> None:
    plan = run_cli("workflow", "plan", str(PILOT), "wealth-vsm-extract-entities")
    workflows = run_cli("workflow", "inspect", str(PILOT))

    assert plan.returncode == 0, plan.stderr
    assert workflows.returncode == 0, workflows.stderr
    plan_payload = json.loads(plan.stdout)
    workflow_ids = [
        workflow["id"] for workflow in json.loads(workflows.stdout)["workflows"]
    ]
    assert workflow_ids == [
        "wealth-vsm-extract-entities",
        "wealth-vsm-map-and-analyze",
        "wealth-vsm-evaluate-entities",
    ]
    assert plan_payload["status"] == "planned"
    assert plan_payload["assisted_requests"][0]["stage_id"] == "extract-entities"
    assert plan_payload["stages"][1]["kind"] == "split_entities"
    assert plan_payload["stages"][1]["status"] == "waiting_for_assisted_output"


def test_wealth_vsm_generation_pipeline_runs_with_fixture_adapter(
    tmp_path: Path,
) -> None:
    root = copy_pilot(tmp_path)
    fixture = root / "workflows" / "fixtures" / "wealth-vsm-fake-responses.yaml"

    extraction = run_cli(
        "workflow",
        "run",
        str(root),
        "wealth-vsm-extract-entities",
        "--fixture-responses",
        str(fixture),
    )
    second_extraction = run_cli(
        "workflow",
        "run",
        str(root),
        "wealth-vsm-extract-entities",
        "--fixture-responses",
        str(fixture),
    )
    mapping = run_cli(
        "workflow",
        "run",
        str(root),
        "wealth-vsm-map-and-analyze",
        "--fixture-responses",
        str(fixture),
    )
    evaluation = run_cli(
        "workflow",
        "run",
        str(root),
        "wealth-vsm-evaluate-entities",
        "--fixture-responses",
        str(fixture),
    )
    check = run_cli("check", str(root))
    validation = run_cli("validate", str(root))

    assert extraction.returncode == 0, extraction.stderr
    assert second_extraction.returncode == 0, second_extraction.stderr
    assert mapping.returncode == 0, mapping.stderr
    assert evaluation.returncode == 0, evaluation.stderr
    assert check.returncode == 0, check.stderr
    assert validation.returncode == 0, validation.stderr

    loaded = load_infospace(root)
    artifact_ids = [artifact.id for artifact in loaded.artifacts]
    entities = list_entities(root)
    relations = list_relations(root)
    metrics = read_metrics_file(root / "output" / "metrics" / "metrics.yaml")
    division_eval = read_entity_evaluation(
        root / "output" / "evaluations" / "division-of-labour.md"
    )
    validation_results = validate_infospace_artifacts(root)

    assert artifact_ids.count("entity/division-of-labour.md") == 1
    assert artifact_ids.count("entity/market-extent.md") == 1
    assert "generated/book-1-chapter-03-analysis.md" in artifact_ids
    assert [entity.slug for entity in entities] == [
        "division-of-labour",
        "market-extent",
    ]
    assert relations[0].subject_entity_id == "entity/division-of-labour.md"
    assert relations[0].object_entity_id == "entity/market-extent.md"
    assert division_eval.artifact_id == "entity/division-of-labour.md"
    assert metrics["per_artifact_mean"] == 4.3
    assert all(result.valid for result in validation_results)


def test_entity_bundle_parser_rejects_malformed_and_duplicate_bundles() -> None:
    with pytest.raises(InfospaceError) as missing_h1:
        parse_entity_bundle("## Definition\n\nNo top-level entity heading.")
    with pytest.raises(InfospaceError) as duplicate:
        parse_entity_bundle(
            "# Market Extent\n\n## Definition\n\nOne.\n\n"
            "# Market Extent\n\n## Definition\n\nTwo.\n"
        )

    assert missing_h1.value.code == "invalid_entity_bundle"
    assert duplicate.value.code == "duplicate_entity_bundle_item"


def test_wealth_vsm_generation_docs_capture_scale_up_risks() -> None:
    doc = (REPO_ROOT / "docs" / "wealth-vsm-generation-pipeline.md").read_text(
        encoding="utf-8"
    )
    report = (PILOT / "reports" / "generation-pilot-report.md").read_text(
        encoding="utf-8"
    )

    assert "Legacy pipeline decomposition" in doc
    assert "One-chapter pilot" in doc
    assert "Full corpus scale-up" in doc
    assert "live provider-backed generation" in doc
    assert "one-chapter regeneration" in report
    assert "not the legacy process script" in report


@@ -4,7 +4,7 @@ type: workplan
title: "Wealth VSM Generation Pipeline Parity"
domain: markitect
repo: infospace-bench
status: planned
status: completed
owner: markitect
topic_slug: markitect
created: "2026-05-14"
@@ -55,7 +55,7 @@ requests, stable manifest registration, and clear provenance.
```task
id: IB-WP-0013-T01
status: in_progress
status: done
priority: high
state_hub_task_id: "2c558d1e-290f-4e0e-abe6-37302cc31ac4"
```
@@ -73,7 +73,7 @@ state_hub_task_id: "2c558d1e-290f-4e0e-abe6-37302cc31ac4"
```task
id: IB-WP-0013-T02
status: in_progress
status: done
priority: high
state_hub_task_id: "70beb49c-49a3-49f4-9b3a-a4c5bdb88485"
```
@@ -91,7 +91,7 @@ state_hub_task_id: "70beb49c-49a3-49f4-9b3a-a4c5bdb88485"
```task
id: IB-WP-0013-T03
status: todo
status: done
priority: high
state_hub_task_id: "4a340077-f0ab-40fe-a0bc-0fa94a325774"
```
@@ -108,7 +108,7 @@ state_hub_task_id: "4a340077-f0ab-40fe-a0bc-0fa94a325774"
```task
id: IB-WP-0013-T04
status: todo
status: done
priority: high
state_hub_task_id: "62696191-d6fa-4d34-bf18-97f390a31b61"
```
@@ -125,7 +125,7 @@ state_hub_task_id: "62696191-d6fa-4d34-bf18-97f390a31b61"
```task
id: IB-WP-0013-T05
status: todo
status: done
priority: medium
state_hub_task_id: "fe8dd175-9630-4fe1-99aa-2f3e58172a52"
```
@@ -156,3 +156,23 @@ This workplan can start on the current local-folder backend. It should avoid
hard-coding storage assumptions where reasonable, but it is not blocked by the
backend abstraction workplan.

## Implementation

- Added `docs/wealth-vsm-generation-pipeline.md` with the legacy pipeline
decomposition, one-chapter pilot path, live-provider guidance, and full
corpus scale-up sequence.
- Added `infospaces/wealth-vsm-generation-pilot/` with Book I Chapter III,
explicit extraction, mapping/analysis, and evaluation workflows, deterministic
fixture responses, contracts, and a pilot report.
- Added `FixtureAssistedGenerationAdapter` and CLI
`workflow run --fixture-responses` support so assisted stages are explicit and
deterministic by default.
- Added entity bundle parsing/splitting with idempotent manifest registration.
- Added evaluation output handling so generated evaluation files feed
`infospace-bench check` metrics/history.
- Added `tests/test_wealth_vsm_generation.py`.
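
The entity bundle splitting and idempotent registration listed above can be illustrated with a minimal sketch. This is not the code in `infospace_bench.generation`: `BundleError`, `slugify`, `split_entity_bundle`, and `register_entities` are hypothetical names, and the manifest is assumed to be a plain dict keyed by artifact id. Only the two error codes and the `entity/<slug>.md` id shape are taken from the tests in this commit.

```python
import re


class BundleError(ValueError):
    """Hypothetical stand-in for InfospaceError, carrying a machine-readable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def slugify(title: str) -> str:
    # Lowercase the title and collapse every non-alphanumeric run to one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def split_entity_bundle(markdown: str) -> dict[str, str]:
    """Split a bundle into {slug: markdown}, one item per top-level `# ` heading."""
    items: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("# "):
            current = slugify(line[2:])
            if current in items:
                raise BundleError(
                    "duplicate_entity_bundle_item", f"duplicate entity: {current}"
                )
            items[current] = line
        elif current is None:
            # Any non-blank content before the first H1 makes the bundle invalid.
            if line.strip():
                raise BundleError(
                    "invalid_entity_bundle", "content before first entity heading"
                )
        else:
            items[current] += "\n" + line
    if not items:
        raise BundleError("invalid_entity_bundle", "no top-level entity heading")
    return {slug: text.strip() + "\n" for slug, text in items.items()}


def register_entities(manifest: dict[str, dict], items: dict[str, str]) -> None:
    """Upsert by artifact id so a repeated run does not add duplicate entries."""
    for slug in items:
        manifest[f"entity/{slug}.md"] = {"kind": "entity", "slug": slug}
```

Keying the manifest by artifact id is one simple way to get the behaviour the pipeline test checks, where running the extraction workflow twice still leaves exactly one `entity/division-of-labour.md` entry. The sketch treats any line starting with `# ` as an entity heading, so it would misread such lines inside fenced code.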

## Verification

- `python3 -m pytest tests/test_wealth_vsm_generation.py`
- `python3 -m pytest`