tegwick ce30f874d5 docs(example): rewrite tutorial for infospace tooling (S3.4)

Update TUTORIAL.md to use infospace tooling commands alongside the
chapter processing pipeline:

- Add infospace.yaml declaration and `markitect infospace init`
- Add sections for evaluate, check (C1–C5), and viability dashboard
- Add `markitect infospace history` and status/entities commands
- Add composition section (bind-discipline, disciplines, stale-mappings)
- Update cost/performance: OpenRouter free tier, note claude-code limit
- Update chapter count to 9/35, reference clean-example-history branch
- Restructure as 16 sections following S3.4 roadmap outline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-19 11:11:45 +01:00

Building an Infospace with History — Tutorial

This tutorial walks through how to build a structured infospace from Adam Smith's The Wealth of Nations, mapping classical economic concepts to Stafford Beer's Viable System Model (VSM), using MarkiTect's infospace tooling.

By the end you will understand how to:

Declare an infospace with infospace.yaml and markitect infospace init
Design schemas that scaffold structured LLM output
Write prompt templates with dependency injection (@{macro} syntax)
Run an incremental, chapter-by-chapter pipeline
Evaluate entity quality and run collection-level checks
Review viability against declared thresholds
Track every change through git history
Use a completed infospace as a discipline for a new project

1. What Is an Infospace?

An infospace is a curated, self-describing collection of entities (concepts, mechanisms, observations) that together explain a topic through the lens of one or more disciplines.

Term	This example
Topic	The Wealth of Nations (Smith, 1776)
Discipline	Viable System Model (Beer)
Entities	Economic concepts: division of labour, natural price, …
Viability	Does the entity set answer the competency questions?

The challenge with a large source corpus is that it is too big for a single prompt. MarkiTect processes it incrementally, one chapter at a time, building up the entity set and tracking progress through git.

An infospace is viable when it meets threshold scores across defined metrics — it is fit for purpose as an explanatory tool.

2. Project Layout

examples/infospace-with-history/
│
├── infospace.yaml              # Declarative infospace configuration (NEW)
├── README.md
├── TUTORIAL.md                 # This file
├── INFRA-TASKS.md              # Infrastructure issues found during the experiment
├── process_chapters.py         # Pipeline script (chapter processing)
├── infospace.db                # SQLite artifact database (generated, not in git)
│
├── schemas/                    # Output structure definitions
│   ├── economic-entity-schema-v1.0.md
│   ├── vsm-concept-schema-v1.0.md
│   ├── vsm-mapping-schema-v1.0.md
│   └── chapter-analysis-schema-v1.0.md
│
├── templates/                  # Prompt templates (with @{macro} placeholders)
│   ├── extract-entities.md
│   ├── map-to-vsm.md
│   ├── synthesize-analysis.md
│   └── assess-metrics.md
│
├── artifacts/                  # Input artifacts
│   ├── sources/                # Chapter text (35 files)
│   ├── guidelines/             # Extraction and mapping rules
│   └── vsm-reference/          # VSM framework definition
│
└── output/                     # Generated artifacts (LLM outputs)
    ├── entities/               # Flat canonical entity set + chapter views
    │   ├── division-of-labour.md        # Canonical entity file (PRIMARY)
    │   ├── exchange.md
    │   ├── book-1-chapter-01-entities.md  # Chapter view (transclusion)
    │   └── ...
    ├── mappings/               # Per-chapter VSM mappings
    ├── analyses/               # Per-chapter synthesised analyses
    └── metrics/                # Collection metrics + history
        ├── metrics.yaml        # Latest metric values
        └── history.yaml        # Timestamped snapshot log

Entity organisation: The infospace maintains a flat canonical set of entities — one markdown file per entity in output/entities/. Duplicate slugs across chapters are skipped (first occurrence wins). Per-chapter *-entities.md files are secondary views using transclusion directives ({{ include "entity.md" }}), so editing a canonical file updates every chapter view that references it.
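
To illustrate why editing a canonical file updates every chapter view, here is a minimal sketch of expanding include directives. It assumes the directive form shown above ({{ include "entity.md" }}) and resolves file names relative to the entities directory. The function name expand_includes and the regular expression are illustrative assumptions, not MarkiTect's actual resolver.

```python
import re
import tempfile
from pathlib import Path

# Assumed directive syntax, copied from the example above: {{ include "file.md" }}
INCLUDE_RE = re.compile(r'\{\{\s*include\s+"([^"]+)"\s*\}\}')


def expand_includes(view_text: str, entities_dir: Path) -> str:
    """Replace each include directive with the referenced file's content."""
    def _load(match: re.Match) -> str:
        return (entities_dir / match.group(1)).read_text(encoding="utf-8").strip()

    return INCLUDE_RE.sub(_load, view_text)


# Demo in a temporary directory so the real repository is not touched.
with tempfile.TemporaryDirectory() as tmp:
    entities = Path(tmp)
    (entities / "exchange.md").write_text("# Exchange\n\nTrading goods.\n", encoding="utf-8")
    view = '# Chapter 2 entities\n\n{{ include "exchange.md" }}\n'
    expanded = expand_includes(view, entities)
    print(expanded)
```

Because the view stores only the directive, the expanded text always reflects the current canonical file.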

3. Initialising an Infospace

Starting fresh

Use markitect infospace init to create an infospace.yaml:

cd my-new-infospace/
markitect infospace init \
  --topic "The Wealth of Nations" \
  --domain "Classical Economics" \
  --sources artifacts/sources/ \
  --discipline "Viable System Model"

This creates a minimal infospace.yaml. Edit it to add schemas, competency questions, and viability thresholds:

topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/

disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/

schemas:
  entity: schemas/economic-entity-schema-v1.0.md
  mapping: schemas/vsm-mapping-schema-v1.0.md
  analysis: schemas/chapter-analysis-schema-v1.0.md

competency_questions: |
  1. How does Smith's division of labour map to VSM System 1 operations?
  2. What mechanisms in WoN correspond to VSM coordination (System 2)?
  3. Where does Smith describe self-organising regulation (System 3)?
  4. What role does the "invisible hand" play as a System 4 mechanism?
  5. How do Smith's views on government map to System 5 policy?
  6. Is the WoN entity set viable as an explanatory framework?

viability:
  redundancy_ratio: { max: 0.10 }
  coverage_ratio: { min: 0.50 }
  coherence_components: { max: 3 }
  consistency_cycles: { max: 0 }
  granularity_entropy: { min: 1.0 }

pipeline:
  stages:
    - name: extract-entities
      template: templates/extract-entities.md
    - name: map-to-vsm
      template: templates/map-to-vsm.md
    - name: synthesize-analysis
      template: templates/synthesize-analysis.md

Checking status

At any point, inspect the infospace:

markitect infospace status
# Infospace: The Wealth of Nations
# Domain:    Classical Economics
# Entities:  109
# Domains:   Production, Distribution, Exchange, Regulation
# Disciplines: Viable System Model
# Chapters:  9/35 processed

markitect infospace entities
# Lists all entities with domain, source chapter, word count

4. Designing Schemas

Before writing any prompts, define schemas — markdown documents that tell the LLM exactly what sections each output must contain. Schemas are not code; the LLM reads them as instructions.

Economic Entity Schema (`schemas/economic-entity-schema-v1.0.md`)

Every extracted entity must have:

H1 heading with the entity name (title case)
Definition (20–150 words, precise and non-circular)
Source Chapter citing Book and Chapter
Context — where in Smith's argument the entity appears
Economic Domain (Production, Distribution, Exchange, etc.)

Optional: Smith's Original Wording, Modern Interpretation.

VSM Mapping Schema (`schemas/vsm-mapping-schema-v1.0.md`)

Every entity-to-VSM mapping must have:

H1 heading: Entity Name -> VSM Concept Name
Economic Entity Reference and VSM Concept Reference
Mapping Rationale (minimum 30 words, grounded in Beer's definitions)
Mapping Strength: Strong, Moderate, or Weak

Chapter Analysis Schema (`schemas/chapter-analysis-schema-v1.0.md`)

The per-chapter synthesis includes:

Chapter Summary (50–300 words)
Entities Extracted — bulleted list
VSM Mappings — entity, concept, strength
VSM Coverage — explicit assessment of S1 through S5 and S3*
Gaps & Observations

Key insight: Schemas are artifacts — they live in the repository and can be versioned, diffed, and refined just like code. Improving a schema and re-processing a chapter is visible as a git diff.
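
Although the schemas are read by the LLM rather than executed, the same rules can be checked mechanically after generation. The sketch below validates an entity file against the Economic Entity Schema requirements listed above. It assumes that each required section appears as an H2 heading with exactly the names given; the function validate_entity is hypothetical and not part of MarkiTect.

```python
import re

# Required sections from the Economic Entity Schema described above.
REQUIRED_SECTIONS = ["Definition", "Source Chapter", "Context", "Economic Domain"]


def validate_entity(markdown: str) -> list[str]:
    """Return a list of schema violations (empty when the entity conforms)."""
    problems = []
    if not re.search(r"^# \S", markdown, flags=re.MULTILINE):
        problems.append("missing H1 heading")
    # Assumption: sections are H2 headings; split yields [pre, title, body, ...].
    parts = re.split(r"^## +(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    sections = dict(zip(parts[1::2], parts[2::2]))
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            problems.append(f"missing section: {name}")
    if "Definition" in sections:
        words = len(sections["Definition"].split())
        if not 20 <= words <= 150:
            problems.append(f"definition has {words} words (expected 20-150)")
    return problems


sample = """# Division of Labour

## Definition
Splitting work into narrow tasks.

## Source Chapter
Book I, Chapter 1

## Context
Opens Smith's argument.
"""
issues = validate_entity(sample)
print(issues)
```

A check like this can run before committing a chapter, so schema drift shows up immediately rather than during later evaluation.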

5. Writing Prompt Templates

Each template is a markdown file with @{macro_name} placeholders that MarkiTect's resolver fills with artifact content at compile time.
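
As a rough illustration of what macro resolution involves, the following sketch substitutes @{name} placeholders from a dictionary of artifact contents. It is a simplified stand-in under stated assumptions, not MarkiTect's resolver: the function resolve_macros and the choice to raise on unknown macros are both hypothetical.

```python
import re

# Matches placeholders of the form @{macro_name}.
MACRO_RE = re.compile(r"@\{(\w+)\}")


def resolve_macros(template: str, artifacts: dict[str, str]) -> str:
    """Substitute every @{name} placeholder; fail loudly on unknown names."""
    def _substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in artifacts:
            raise KeyError(f"no artifact bound to macro @{{{name}}}")
        return artifacts[name]

    return MACRO_RE.sub(_substitute, template)


template = "## Source Chapter\n@{chapter_text}\n\n## Extraction Guidelines\n@{extraction_rules}\n"
prompt = resolve_macros(
    template,
    {"chapter_text": "Of the Division of Labour ...", "extraction_rules": "One concept per entity."},
)
print(prompt)
```

Failing on an unbound macro, rather than leaving the placeholder in the prompt, keeps an incomplete prompt from being sent to the LLM unnoticed.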

Template 1: Extract Entities (`templates/extract-entities.md`)

# Extract Economic Entities

You are an analytical economist specialising in classical economic theory.
Your task is to extract distinct economic entities from a chapter of
Adam Smith's *The Wealth of Nations*.

## Source Chapter
@{chapter_text}

## Extraction Guidelines
@{extraction_rules}

## VSM Framework Context
@{vsm_framework}

## Existing Entities
@{existing_entities}

## Output Format
Output each entity delimited by `--- ENTITY: <entity-name> ---` markers.

The @{existing_entities} macro is generated at runtime from canonical files already on disk, enabling incremental extraction without duplication.
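
The marker format in the template makes the LLM output easy to split. The sketch below parses a response into entities and skips any slug that already exists, mirroring the first-occurrence-wins rule. The slug rule (lowercase, hyphen-separated) and the helper names are assumptions for illustration; process_chapters.py may implement this differently.

```python
import re

# Marker format requested in the template: --- ENTITY: <entity-name> ---
ENTITY_MARKER = re.compile(r"^--- ENTITY: (.+?) ---[ \t]*$", re.MULTILINE)


def slugify(name: str) -> str:
    """Assumed slug rule: lowercase, non-alphanumeric runs become hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def new_entities(llm_output: str, existing_slugs: set[str]) -> dict[str, str]:
    """Return {slug: body} for entities not already in the canonical set."""
    parts = ENTITY_MARKER.split(llm_output)  # [preamble, name, body, name, body, ...]
    fresh: dict[str, str] = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        slug = slugify(name)
        if slug in existing_slugs or slug in fresh:
            continue  # first occurrence wins
        fresh[slug] = body.strip()
    return fresh


output = (
    "--- ENTITY: Division of Labour ---\n# Division of Labour\nSplitting work.\n"
    "--- ENTITY: Exchange ---\n# Exchange\nTrading.\n"
)
extracted = new_entities(output, existing_slugs={"exchange"})
print(list(extracted))
```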

Template 2: Map to VSM (`templates/map-to-vsm.md`)

Inputs: @{entities}, @{vsm_framework}, @{mapping_rules}.

Template 3: Synthesise Analysis (`templates/synthesize-analysis.md`)

Inputs: @{chapter_text}, @{entities}, @{mappings}, @{vsm_framework}.

Template 4: Assess Metrics (`templates/assess-metrics.md`)

Inputs: @{all_analyses} (all chapter analyses concatenated), @{vsm_framework}. Runs across the entire infospace, not per-chapter.

Dependency chain per chapter:

chapter_text ──────┐
extraction_rules ──┤
vsm_framework ─────┤
                   ▼
           extract-entities
                   │
                   ▼ entities
           map-to-vsm
                   │
                   ▼ mappings
           synthesize-analysis
                   │
                   ▼ analysis
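
The same chain can be expressed as a dependency graph and ordered automatically. This is a minimal sketch using Python's standard graphlib module; the dictionary simply restates the template inputs listed above, and nothing here is taken from the actual pipeline script.

```python
from graphlib import TopologicalSorter

# Each generated artifact maps to the inputs its template consumes.
DEPENDENCIES = {
    "entities": {"chapter_text", "extraction_rules", "vsm_framework", "existing_entities"},
    "mappings": {"entities", "vsm_framework", "mapping_rules"},
    "analysis": {"chapter_text", "entities", "mappings", "vsm_framework"},
}

# static_order() yields every node with its dependencies first.
order = list(TopologicalSorter(DEPENDENCIES).static_order())

# Keep only the artifacts the pipeline generates (drop static inputs).
generated = [name for name in order if name in DEPENDENCIES]
print(generated)  # ['entities', 'mappings', 'analysis']
```

Declaring dependencies as data means a new stage only needs a new dictionary entry; the ordering follows from the graph.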

6. Populating Artifacts

Source chapters (`artifacts/sources/`)

35 markdown files, one per chapter, each containing the full public-domain text. They are named book-1-chapter-01.md through book-5-chapter-03.md.

Guidelines (`artifacts/guidelines/`)

extraction-rules.md — What constitutes an entity, granularity rules, naming conventions.
mapping-rules.md — How to map entities to VSM systems, what constitutes Strong/Moderate/Weak strength.

VSM reference (`artifacts/vsm-reference/`)

vsm-framework.md — Complete description of Beer's VSM (S1–S5, S3*, recursion, variety, viability, algedonic signals, autonomy) with economic interpretations.

7. Processing Chapters

process_chapters.py orchestrates the three-stage pipeline. It initialises the artifact repository, loads static artifacts, runs entity extraction → VSM mapping → analysis synthesis, and commits each chapter to git.

Single chapter

# Manual mode (writes prompts, awaits output files):
python process_chapters.py --chapter book-1-chapter-05 --no-commit

# Auto mode via OpenRouter (free models available):
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter

# With a specific free model:
python process_chapters.py --chapter book-1-chapter-05 \
  --provider openrouter --model meta-llama/llama-4-maverick:free

Whole book or all chapters

python process_chapters.py --book 1 --provider openrouter
python process_chapters.py --all --provider openrouter

Check progress

python process_chapters.py --list

Available chapters (35):

  Chapter                        Entities     Mappings     Analysis
  ------------------------------ ------------ ------------ ------------
  book-1-chapter-01              done (13)    done         done
  book-1-chapter-02              done (7)     done         done
  ...

  Canonical entity set: 109 unique entities

Entity lifecycle

Entities in the canonical set are never silently deleted. Retire an entity by archiving it with a documented reason:

python process_chapters.py --archive-entity enlarged-monopoly \
  --reason "Subsumed by monopoly-price — same market distortion"

The archived file moves to output/entities/archive/<slug>.md with a dated header, preserving the intellectual history of every decision.
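
The archiving step can be pictured as a move plus a prepended note. The sketch below is an assumption-laden illustration: the header format (an HTML comment with an ISO date) is hypothetical, and the real --archive-entity implementation may write a different header.

```python
import tempfile
from datetime import date
from pathlib import Path


def archive_entity(entities_dir: Path, slug: str, reason: str) -> Path:
    """Move <slug>.md into archive/ and prepend a dated reason."""
    source = entities_dir / f"{slug}.md"
    archive_dir = entities_dir / "archive"
    archive_dir.mkdir(exist_ok=True)
    target = archive_dir / source.name
    # Hypothetical header format; the actual script's header may differ.
    header = f"<!-- Archived {date.today().isoformat()}: {reason} -->\n\n"
    target.write_text(header + source.read_text(encoding="utf-8"), encoding="utf-8")
    source.unlink()  # the canonical copy is removed only after the archive is written
    return target


# Demo in a temporary directory so the real repository is not touched.
with tempfile.TemporaryDirectory() as tmp:
    entities_dir = Path(tmp)
    (entities_dir / "enlarged-monopoly.md").write_text("# Enlarged Monopoly\n", encoding="utf-8")
    archived = archive_entity(entities_dir, "enlarged-monopoly", "Subsumed by monopoly-price")
    archived_text = archived.read_text(encoding="utf-8")
    original_removed = not (entities_dir / "enlarged-monopoly.md").exists()
    print(archived_text)
```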

8. Evaluating Entity Quality

Once chapters are processed, evaluate the entity set using the infospace tooling commands.

Per-entity evaluation

# Evaluate all entities (requires LLM provider):
markitect infospace evaluate --provider openrouter

# Evaluate entities from a specific chapter:
markitect infospace evaluate --chapter book-1-chapter-05 --provider openrouter

# Re-evaluate a single entity:
markitect infospace evaluate --entity division-of-labour --provider openrouter

This runs the evaluate-entity prompt template against each entity, scoring dimensions like definition precision, source grounding, and VSM relevance. Results are written to output/evaluations/.

Collection-level checks (C1–C5)

# Run all five collection checks:
markitect infospace check --provider openrouter

# Run individual checks:
markitect infospace check redundancy   # C1: Are any entities synonymous?
markitect infospace check coverage     # C2: Which domain × VSM cells are empty?
markitect infospace check coherence    # C3: Is the entity graph well-connected?
markitect infospace check consistency  # C4: Are there circular definitions?
markitect infospace check granularity  # C5: Is abstraction level balanced?

Each check uses the platform's embedding, graph analysis, and FCA infrastructure. Results are written to output/metrics/, and a new snapshot is appended to history.yaml in that directory.

Sample output:

Running collection checks on 109 entities...

  C1 — redundancy
    redundancy_ratio: 0.0183
    high_similarity_pairs: 2

  C2 — coverage
    coverage_ratio: 0.4286
    empty_cells: [['Regulation', 'S3*'], ['Historical', 'S5']]

  C3 — coherence
    coherence_components: 1
    modularity: 0.412

  C4 — consistency
    consistency_cycles: 0
    grounding_ratio: 0.94

  C5 — granularity
    granularity_entropy: 2.69

9. Reviewing Viability

markitect infospace viability

Compares the latest metrics against the thresholds declared in infospace.yaml:

Metric                         Value    Threshold   Status
-----------------------------------------------------------
redundancy_ratio               0.0183    max=0.10     PASS
coverage_ratio                 0.4286    min=0.50     FAIL
coherence_components           1         max=3        PASS
consistency_cycles             0         max=0        PASS
granularity_entropy            2.6900    min=1.0      PASS

Viable: NO (4/5 thresholds met)

Coverage is currently failing (0.4286, roughly 43%, is below the 0.50 threshold), most likely because only 9 of 35 chapters have been processed. Coverage should rise as more chapters are processed.
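
The comparison behind this report is simple enough to sketch. The code below applies the min/max thresholds from infospace.yaml to the sample metric values shown above. It assumes bounds are inclusive, which is consistent with consistency_cycles passing at 0 against max=0; the function name passes is illustrative.

```python
# Thresholds as declared in infospace.yaml.
THRESHOLDS = {
    "redundancy_ratio": {"max": 0.10},
    "coverage_ratio": {"min": 0.50},
    "coherence_components": {"max": 3},
    "consistency_cycles": {"max": 0},
    "granularity_entropy": {"min": 1.0},
}

# Metric values from the sample check output.
METRICS = {
    "redundancy_ratio": 0.0183,
    "coverage_ratio": 0.4286,
    "coherence_components": 1,
    "consistency_cycles": 0,
    "granularity_entropy": 2.69,
}


def passes(value: float, bounds: dict[str, float]) -> bool:
    """Inclusive check against optional 'min' and 'max' bounds (assumed)."""
    return bounds.get("min", float("-inf")) <= value <= bounds.get("max", float("inf"))


results = {name: passes(METRICS[name], bounds) for name, bounds in THRESHOLDS.items()}
met = sum(results.values())
verdict = "YES" if met == len(results) else "NO"
print(f"Viable: {verdict} ({met}/{len(results)} thresholds met)")
# prints: Viable: NO (4/5 thresholds met)
```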

Metrics history

markitect infospace history

Shows how metrics evolved across runs:

Snapshot  Date        Entities  coverage  redundancy  entropy
-------------------------------------------------------------
6ba48eb2  2026-02-19  85        0.361     0.000       2.687

10. Tracking History with Git

Every processed chapter produces one git commit containing:

Compiled prompts (*-prompt.md) — audit what was sent to the LLM
Canonical entity files (output/entities/<slug>.md) — first occurrence wins
Chapter entity views (<chapter>-entities.md) — transclusion references
Generated outputs (*-mappings.md, *-analysis.md)

This means:

git log shows the chronological order of processing
git diff between commits shows what each chapter contributed
You can git bisect to find where quality degraded
You can revert a chapter and re-process with improved guidelines

The clean-example-history branch in this repository demonstrates the intended structure: each chapter is a single, self-contained commit. Use it as a reference for how the infospace grew step by step.

To commit manually after reviewing:

python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
# review output/entities/ and output/mappings/
git add examples/infospace-with-history/output/
git commit -m "infospace: process book-1-chapter-05"

11. Cost and Performance

	OpenRouter (free)	OpenRouter (paid)	Gemini (free)
Time per chapter	~5 min	~2 min	~45 sec
Cost per chapter	$0.00	~$0.07	$0.00
Default model	`arcee-ai/trinity-large-preview:free`	`anthropic/claude-sonnet-4`	`gemini-2.5-flash`
Rate limits	~200 req/day	High	Per-minute
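
For planning, the per-chapter figures can be multiplied out over the 26 chapters that remain (35 minus the 9 already processed). The sketch below does that arithmetic. The assumption of three LLM requests per chapter (one per pipeline stage) is mine and is not stated by the table; retries or evaluation runs would add to it.

```python
REMAINING_CHAPTERS = 35 - 9  # 26 chapters still to process

# (minutes per chapter, dollars per chapter), approximate values from the table.
PER_CHAPTER = {
    "OpenRouter (free)": (5.0, 0.00),
    "OpenRouter (paid)": (2.0, 0.07),
    "Gemini (free)": (0.75, 0.00),
}

estimates = {
    provider: (REMAINING_CHAPTERS * minutes, round(REMAINING_CHAPTERS * dollars, 2))
    for provider, (minutes, dollars) in PER_CHAPTER.items()
}

# Assumption: one request per stage (extract, map, synthesise).
requests_needed = REMAINING_CHAPTERS * 3

for provider, (minutes, dollars) in estimates.items():
    print(f"{provider}: ~{minutes:.0f} min, ~${dollars:.2f}")
print(f"requests: {requests_needed} (free-tier limit is roughly 200 per day)")
```

Under these assumptions the remaining work takes roughly 130 minutes on the free OpenRouter tier and stays below the approximate daily request limit.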

OpenRouter free tier: Sign up at openrouter.ai (no credit card required). Store your key in apikey-openrouter.txt in the project root (git-ignored), or set OPENROUTER_API_KEY.

export OPENROUTER_API_KEY=$(cat apikey-openrouter.txt | tr -d '[:space:]')

Use openrouter/free to have OpenRouter automatically select one of the free models that is currently available:

python process_chapters.py --chapter book-1-chapter-05 \
  --provider openrouter --model openrouter/free

Gemini free tier: Get a key at aistudio.google.com/apikey and store it in apikey-geminifree.txt.

Note: The claude-code provider (Claude CLI subprocess) is not available when running inside a Claude Code session due to nested session restrictions.

12. Completing the Remaining Chapters

As of writing, 9 of 35 chapters are processed (Book I, Chapters 1–9).

Process Book I remainder:

export OPENROUTER_API_KEY=$(cat apikey-openrouter.txt | tr -d '[:space:]')
git checkout clean-example-history
python process_chapters.py --book 1 --provider openrouter

Already-processed chapters are skipped because their chapter view files already exist. The @{existing_entities} macro ensures the LLM only extracts genuinely new entities.
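
The skip rule amounts to checking for each chapter's view file. A minimal sketch, assuming the file naming shown in the project layout (<chapter>-entities.md in the entities directory); the function pending_chapters is hypothetical and the real script may check additional outputs.

```python
import tempfile
from pathlib import Path


def pending_chapters(sources_dir: Path, entities_dir: Path) -> list[str]:
    """List chapters whose entity view file does not exist yet."""
    chapters = sorted(path.stem for path in sources_dir.glob("book-*-chapter-*.md"))
    return [
        chapter
        for chapter in chapters
        if not (entities_dir / f"{chapter}-entities.md").exists()
    ]


# Demo in a temporary directory so the real repository is not touched.
with tempfile.TemporaryDirectory() as tmp:
    sources, entities = Path(tmp, "sources"), Path(tmp, "entities")
    sources.mkdir()
    entities.mkdir()
    for chapter in ("book-1-chapter-01", "book-1-chapter-02"):
        (sources / f"{chapter}.md").write_text("chapter text", encoding="utf-8")
    # Chapter 1 already has a view file, so only chapter 2 is pending.
    (entities / "book-1-chapter-01-entities.md").write_text("view", encoding="utf-8")
    pending = pending_chapters(sources, entities)
    print(pending)
```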

Process Books II–V:

python process_chapters.py --book 2 --provider openrouter
python process_chapters.py --book 3 --provider openrouter
python process_chapters.py --book 4 --provider openrouter
python process_chapters.py --book 5 --provider openrouter

Run collection checks after each book:

markitect infospace check --provider openrouter
markitect infospace viability

Expected progression:

After	Chapters	Expected coverage
Book I (11 ch.)	11/35	S1, S2, S4 strong; S3 emerging
Books I–II (16 ch.)	16/35	S3 (capital control) covered
Books I–III (20 ch.)	20/35	Historical patterns add depth
Books I–IV (30 ch.)	30/35	S5 (policy, mercantilism) emerging
All (35 ch.)	35/35	Full coverage; S3* and algedonic signals from Book V

13. Using the Infospace as a Discipline

A completed, viable infospace can itself become a discipline — a lens applied to a new topic. For example, the Wealth of Nations infospace could be applied to analyse a modern supply chain.

# In a new infospace directory:
markitect infospace init \
  --topic "Modern Supply Chain Management" \
  --domain "Operations Research" \
  --discipline "Wealth of Nations"

# Bind the WoN infospace as a discipline:
markitect infospace bind-discipline ../infospace-with-history

# List bound disciplines and their viability:
markitect infospace disciplines
# Viable System Model    PASS (from vsm-reference/)
# Wealth of Nations      PASS (from ../infospace-with-history)

# Check for stale mappings after discipline update:
markitect infospace stale-mappings

The discipline infospace must be viable (meeting its own thresholds) before it can be used as a lens. If the discipline's entities change, dependent mappings are flagged for re-evaluation.
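
One way to detect stale mappings is to record a content digest of the discipline entity when a mapping is created and compare it later. The sketch below illustrates that idea only; the data shapes and function names are hypothetical, and the source does not say how markitect infospace stale-mappings actually detects changes.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 digest of an entity's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_mappings(
    recorded: dict[str, tuple[str, str]], current_entities: dict[str, str]
) -> list[str]:
    """Return mapping ids whose discipline entity changed or disappeared.

    recorded maps mapping id -> (discipline entity slug, digest at mapping time).
    """
    stale = []
    for mapping_id, (slug, old_digest) in recorded.items():
        text = current_entities.get(slug)
        if text is None or digest(text) != old_digest:
            stale.append(mapping_id)
    return stale


original = {"division-of-labour": "Splitting work into tasks.", "exchange": "Trading goods."}
recorded = {
    "kanban->division-of-labour": ("division-of-labour", digest(original["division-of-labour"])),
    "procurement->exchange": ("exchange", digest(original["exchange"])),
}
# The discipline later revises one entity definition.
updated = dict(original, exchange="Trading goods and services.")
flagged = stale_mappings(recorded, updated)
print(flagged)
```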

14. Quality Improvement Loop

The infospace is designed to be iteratively refined:

Process chapters — run the pipeline
Evaluate — markitect infospace evaluate --provider openrouter
Check — markitect infospace check --provider openrouter
Review viability — markitect infospace viability
Refine guidelines — update extraction-rules.md or mapping-rules.md to address identified weaknesses
Re-process — delete output files for specific chapters and re-run
Compare — git diff shows how refined guidelines changed the output

Example: if checks show S3* (Audit) is consistently missing, add a paragraph to extraction-rules.md explicitly asking the LLM to look for audit, inspection, and oversight mechanisms.

To re-process a specific chapter:

rm -f output/entities/book-1-chapter-03-entities.md
rm -f output/mappings/book-1-chapter-03-mappings.md
rm -f output/analyses/book-1-chapter-03-analysis.md
python process_chapters.py --chapter book-1-chapter-03 --provider openrouter

Never silently delete canonical entity files. Archive them instead:

python process_chapters.py --archive-entity extent-of-the-market \
  --reason "Subsumed by market-price and effectual-demand"
python process_chapters.py --chapter book-1-chapter-03 --provider openrouter

15. The Artifact Database (`infospace.db`)

The pipeline stores all artifacts and dependency edges in a local SQLite database — infospace.db. This file is not committed to git because it is fully derived from the markdown files that are tracked.

To regenerate it after a fresh clone (no LLM calls needed):

python process_chapters.py --all --no-commit
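
Since the database is derived entirely from tracked markdown, rebuilding it is conceptually a scan-and-insert. The sketch below shows that idea with Python's sqlite3 module. The single-table schema is a hypothetical simplification: the real infospace.db also stores dependency edges, and its actual schema is not documented here.

```python
import sqlite3
import tempfile
from pathlib import Path


def rebuild_db(markdown_root: Path, db_path: Path) -> int:
    """Recreate a (hypothetical) artifacts table from all markdown files."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DROP TABLE IF EXISTS artifacts")
        conn.execute("CREATE TABLE artifacts (path TEXT PRIMARY KEY, content TEXT NOT NULL)")
        rows = [
            (path.relative_to(markdown_root).as_posix(), path.read_text(encoding="utf-8"))
            for path in sorted(markdown_root.rglob("*.md"))
        ]
        conn.executemany("INSERT INTO artifacts VALUES (?, ?)", rows)
        conn.commit()
        return len(rows)
    finally:
        conn.close()


# Demo in a temporary directory so the real repository is not touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "schemas").mkdir()
    (root / "schemas" / "entity-schema.md").write_text("# Schema\n", encoding="utf-8")
    (root / "exchange.md").write_text("# Exchange\n", encoding="utf-8")
    count = rebuild_db(root, root / "infospace.db")
    print(f"indexed {count} markdown files")
```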

16. Adapting This Pattern to Your Own Project

To build your own infospace:

markitect infospace init --topic "..." --domain "..." --discipline "..."
Write schemas defining required sections for each output type
Write extraction guidelines that tell the LLM what to look for
Create prompt templates using @{macro} syntax
Populate artifacts/sources/ with your source corpus
Run process_chapters.py (or your equivalent pipeline script)
Evaluate with markitect infospace evaluate and check
Review markitect infospace viability against your thresholds
Iterate: refine guidelines, re-process, re-evaluate
Once viable, use as a discipline for a new infospace

The key insight is that schemas and guidelines are artifacts — they live in the repository and can be versioned and diffed just like code. Every refinement decision is traceable through git history.
