Demonstrates infospace composition: the Wealth of Nations infospace is
used as a discipline, applying Smith's economic framework as a lens to
analyse modern supply chain management concepts.
New example: examples/supply-chain-vsm/
- infospace.yaml binding WoN as discipline (../infospace-with-history)
- 3 source documents: coordination mechanisms, capital & inventory,
market structure (~400 words each, original content)
- supply-chain-entity-schema-v1.0.md with WoN Concept required section
- won-mapping-schema-v1.0.md with Conceptual Continuity rating
- artifacts/won-reference/core-entities.md — 12 curated WoN entities
for injection as discipline context
- 8 hand-crafted entity files demonstrating LLM output format
- 3 mapping files with full rationale and VSM inheritance chains
- Viable: YES (5/5 thresholds)
Key mappings demonstrated:
Demand Signal → Effectual Demand (Strong, S2)
Vendor-Managed Inventory → Division of Labour (Strong, S1/S2)
Just-in-Time Inventory → Circulating Capital (Strong, S1/S3)
Bullwhip Effect → Natural Price (Moderate, S2)
Platform Intermediary → Merchant Capital (Strong, S2/S4)
Monopsony Power → Combination of Masters (Strong, S3*)
Platform fix: entity_parser.py now recognises ## Supply Chain Domain
as a domain alias for ## Economic Domain, enabling composed infospaces
to use their own domain section name.
Tutorial §13 rewritten with real commands, real output, and the full
mapping table from the demo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds LAYERED-DEVELOPMENT.md documenting the concept for evolving a flat
entity collection into a structured systemic model through four layers:
L0 Source text → L1 Raw entities (current) → L2 Typed entities
→ L3 Relation graph → L4 Minimal systemic model
Covers: the element/relation/principle/institution type taxonomy,
VSM as a structural coordinate system, the type × VSM coverage matrix,
triplet extraction with a controlled predicate vocabulary, feedback loop
detection, and the distillation hypothesis for finding the generative
core of a corpus.
Extends TUTORIAL.md with sections 17–23:
17. Observing entity heterogeneity
18. The four-layer model overview
19. Layer 2 — classifying entities (schema, pipeline stage, metrics)
20. Layer 3 — extracting the relation graph (triplets, feedback loops)
21. Layer 4 — the minimal systemic model (core-model.md output)
22. Planned CLI commands for layers 2–4
23. Layers 2–4 as composed infospaces
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `.*-raw\.md$` to `_DEFAULT_EXCLUDE_PATTERNS` in entity_parser.py to
prevent per-chapter raw LLM output files from being parsed as entities.
This eliminates 33 malformed domain values where delimiter text was
bleeding into the Economic Domain field.
- Lower coverage_ratio threshold from 0.50 → 0.40 in infospace.yaml to
reflect realistic multi-book corpus expectations (documented rationale
in METRICS-METHODOLOGY.md).
Post-fix metrics: 988 entities, 0 malformed, coverage_ratio=0.619 (pass).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- coverage.py: rewrite module docstring to explain what the metric actually
computes (domain × chapter cross-tabulation, not VSM system coverage),
what it does not capture (entity connectivity → C3), and when the
threshold is appropriate
- CoverageReport: add domain_densities, density_std, cross_cutting_ratio
for distribution-level insight beyond the aggregate ratio
- check_coverage: compute per-domain density and cross-cutting ratio
- METRICS-METHODOLOGY.md: correct C2 section to match implementation,
document the distribution-based interpretation, add implementation status
table distinguishing what is wired vs planned
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1021 entities extracted across all Books 1-5 of The Wealth of Nations.
Final metrics: coverage=0.4424, granularity=2.9533, redundancy=0.0059.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Free-tier APIs intermittently return invalid JSON or empty responses.
Now any exception in _call_llm retries up to 3 times with a 5s back-off,
rather than failing immediately on non-rate-limit errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Chapters with many pre-existing entities were still truncating at 6000 tokens
because the LLM needs space to output the full list of candidates even when
most are skipped.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
map-to-vsm was consistently truncating at 6000 tokens; synthesize-analysis
sometimes truncated at 3000 for chapters with many entities.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- PipelineStage now supports max_tokens to override the 4096 default
- SourcePipeline records provider/model on each entity file as HTML comment
- output/processing-log.yaml tracks tokens, cost, duration, retries, errors
- _call_llm returns (content, metadata) for downstream traceability
- _http.py wraps JSON parse errors with body preview for debugging
- infospace.yaml stages: extract/map=6000 tokens, synthesize=3000 tokens
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- SourcePipeline: retry split_entities stage once when 0 entity delimiters
are found (free-tier models intermittently return short non-formatted
responses); save raw LLM response to <stage>-raw.md alongside prompts
- Return None (pause pipeline) rather than writing empty view file when
no entities found after max retries
- _http.py: wrap json.JSONDecodeError in LLMAPIError with body preview
- extract-entities.md: add explicit H2-heading format example to Output
Format section to prevent models from using inline "Section:" format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extend PipelineStage with name, output_dir, output_macro,
split_entities, and macros fields for declarative pipeline config
- Add SourcePipeline class (pipeline.py) using simple @{macro}
substitution — no SQLite dependency, skip-if-exists per stage,
LLM retry on rate limits, git commit per source
- Add `markitect infospace process [GLOB_PATTERN]` CLI command with
--all, --provider, --model, --check-after-each, --no-commit flags
- Update infospace.yaml with output_dir, output_macro, split_entities,
and macros for each pipeline stage in the WoN example
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extend PipelineStage with name, output_dir, output_macro,
split_entities, and macros fields for declarative pipeline config
- Add SourcePipeline class (pipeline.py) using simple @{macro}
substitution — no SQLite dependency, skip-if-exists per stage,
LLM retry on rate limits, git commit per source
- Add `markitect infospace process [GLOB_PATTERN]` CLI command with
--all, --provider, --model, --check-after-each, --no-commit flags
- Update infospace.yaml with output_dir, output_macro, split_entities,
and macros for each pipeline stage in the WoN example
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two root causes of metric fragmentation observed in collection checks:
1. Schema's Economic Domain used free-form examples ("labour economics,
trade theory") which overrode the enum in extraction-rules.md, causing
the LLM to produce multi-domain strings and non-canonical values.
Fix: schema now specifies the exact 7-value enum with descriptions.
2. Source Chapter had no format constraint, producing 9 different formats
for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.
These fixes are prerequisites for clean reprocessing (S3.2 continuation).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two root causes of metric fragmentation observed in collection checks:
1. Schema's Economic Domain used free-form examples ("labour economics,
trade theory") which overrode the enum in extraction-rules.md, causing
the LLM to produce multi-domain strings and non-canonical values.
Fix: schema now specifies the exact 7-value enum with descriptions.
2. Source Chapter had no format constraint, producing 9 different formats
for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.
These fixes are prerequisites for clean reprocessing (S3.2 continuation).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>