# Building an Infospace with History — Tutorial

This tutorial walks through building a structured **infospace** from
Adam Smith's *The Wealth of Nations*, mapping classical economic concepts
to Stafford Beer's **Viable System Model** (VSM) with MarkiTect's
infospace tooling.

By the end you will understand how to:

1. Declare an infospace with `infospace.yaml` and `markitect infospace init`
2. Design schemas that scaffold structured LLM output
3. Write prompt templates with dependency injection (`@{macro}` syntax)
4. Run an incremental, chapter-by-chapter pipeline
5. Evaluate entity quality and run collection-level checks
6. Review viability against declared thresholds
7. Track every change through git history
8. Use a completed infospace as a discipline for a new project

---

## 1. What Is an Infospace?

An **infospace** is a curated, self-describing collection of **entities**
(concepts, mechanisms, observations) that together explain a **topic**
through the lens of one or more **disciplines**.

| Term | This example |
|---|---|
| Topic | *The Wealth of Nations* (Smith, 1776) |
| Discipline | Viable System Model (Beer) |
| Entities | Economic concepts: division of labour, natural price, … |
| Viability | Does the entity set answer the competency questions? |

A large source corpus does not fit in a single prompt, so MarkiTect
processes it **incrementally**, one chapter at a time, building up the
entity set and tracking progress through git.

An infospace is **viable** when every metric meets the threshold declared
for it in `infospace.yaml`, meaning it is fit for purpose as an
explanatory tool.

---

## 2. Project Layout

```
examples/infospace-with-history/
│
├── infospace.yaml              # Declarative infospace configuration
├── README.md
├── TUTORIAL.md                 # This file
├── INFRA-TASKS.md              # Infrastructure issues found during the experiment
├── process_chapters.py         # Pipeline script (chapter processing)
├── infospace.db                # SQLite artifact database (generated, not in git)
│
├── schemas/                    # Output structure definitions
│   ├── economic-entity-schema-v1.0.md
│   ├── vsm-concept-schema-v1.0.md
│   ├── vsm-mapping-schema-v1.0.md
│   └── chapter-analysis-schema-v1.0.md
│
├── templates/                  # Prompt templates (with @{macro} placeholders)
│   ├── extract-entities.md
│   ├── map-to-vsm.md
│   ├── synthesize-analysis.md
│   └── assess-metrics.md
│
├── artifacts/                  # Input artifacts
│   ├── sources/                # Chapter text (35 files)
│   ├── guidelines/             # Extraction and mapping rules
│   └── vsm-reference/         # VSM framework definition
│
└── output/                     # Generated artifacts (LLM outputs)
    ├── entities/               # Flat canonical entity set + chapter views
    │   ├── division-of-labour.md        # Canonical entity file (PRIMARY)
    │   ├── exchange.md
    │   ├── book-1-chapter-01-entities.md  # Chapter view (transclusion)
    │   └── ...
    ├── mappings/               # Per-chapter VSM mappings
    ├── analyses/               # Per-chapter synthesised analyses
    └── metrics/                # Collection metrics + history
        ├── metrics.yaml        # Latest metric values
        └── history.yaml        # Timestamped snapshot log
```

**Entity organisation**: The infospace maintains a **flat canonical set**
of entities — one markdown file per entity in `output/entities/`. Duplicate
slugs across chapters are skipped (first occurrence wins). Per-chapter
`*-entities.md` files are **secondary views** using transclusion directives
(`{{ include "entity.md" }}`), so editing a canonical file updates every
chapter view that references it.
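
As a rough illustration of how a chapter view could be expanded, the sketch below resolves `{{ include "..." }}` directives against a directory of canonical files. It is not MarkiTect's implementation: the helper name `expand_includes` and the decision to leave unresolved directives untouched are assumptions made for this example.

```python
import re
import tempfile
from pathlib import Path

# Matches directives of the form {{ include "division-of-labour.md" }}
INCLUDE_RE = re.compile(r'\{\{\s*include\s+"([^"]+)"\s*\}\}')


def expand_includes(view_text: str, entities_dir: Path) -> str:
    """Replace each include directive with the content of the named file.

    Hypothetical helper: a directive that points at a missing file is left
    in place so the gap stays visible in the expanded view.
    """
    def _replace(match: re.Match) -> str:
        target = entities_dir / match.group(1)
        if not target.is_file():
            return match.group(0)
        return target.read_text(encoding="utf-8").strip()

    return INCLUDE_RE.sub(_replace, view_text)


# Usage: one canonical file and a chapter view that transcludes it.
with tempfile.TemporaryDirectory() as tmp:
    entities = Path(tmp)
    (entities / "exchange.md").write_text(
        "# Exchange\n\nTrading goods.\n", encoding="utf-8"
    )
    view = '{{ include "exchange.md" }}\n\n{{ include "missing.md" }}\n'
    expanded = expand_includes(view, entities)
```

Because the view stores only the directive, editing `exchange.md` changes the expanded text of every view that includes it.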

---

## 3. Initialising an Infospace

### Starting fresh

Use `markitect infospace init` to create an `infospace.yaml`:

```bash
cd my-new-infospace/
markitect infospace init \
  --topic "The Wealth of Nations" \
  --domain "Classical Economics" \
  --sources artifacts/sources/ \
  --discipline "Viable System Model"
```

This creates a minimal `infospace.yaml`. Edit it to add schemas,
competency questions, and viability thresholds:

```yaml
topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/

disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/

schemas:
  entity: schemas/economic-entity-schema-v1.0.md
  mapping: schemas/vsm-mapping-schema-v1.0.md
  analysis: schemas/chapter-analysis-schema-v1.0.md

competency_questions: |
  1. How does Smith's division of labour map to VSM System 1 operations?
  2. What mechanisms in WoN correspond to VSM coordination (System 2)?
  3. Where does Smith describe self-organising regulation (System 3)?
  4. What role does the "invisible hand" play as a System 4 mechanism?
  5. How do Smith's views on government map to System 5 policy?
  6. Is the WoN entity set viable as an explanatory framework?

viability:
  redundancy_ratio: { max: 0.10 }
  coverage_ratio: { min: 0.50 }
  coherence_components: { max: 3 }
  consistency_cycles: { max: 0 }
  granularity_entropy: { min: 1.0 }

pipeline:
  stages:
    - name: extract-entities
      template: templates/extract-entities.md
    - name: map-to-vsm
      template: templates/map-to-vsm.md
    - name: synthesize-analysis
      template: templates/synthesize-analysis.md
```

### Checking status

At any point, inspect the infospace:

```bash
markitect infospace status
# Infospace: The Wealth of Nations
# Domain:    Classical Economics
# Entities:  109
# Domains:   Production, Distribution, Exchange, Regulation
# Disciplines: Viable System Model
# Chapters:  9/35 processed

markitect infospace entities
# Lists all entities with domain, source chapter, word count
```

---

## 4. Designing Schemas

Before writing any prompts, define **schemas** — markdown documents that
tell the LLM exactly what sections each output must contain. Schemas are
not code; the LLM reads them as instructions.

### Economic Entity Schema (`schemas/economic-entity-schema-v1.0.md`)

Every extracted entity must have:

- **H1 heading** with the entity name (title case)
- **Definition** (20–150 words, precise and non-circular)
- **Source Chapter** citing Book and Chapter
- **Context** — where in Smith's argument the entity appears
- **Economic Domain** (Production, Distribution, Exchange, etc.)

Optional: Smith's Original Wording, Modern Interpretation.

### VSM Mapping Schema (`schemas/vsm-mapping-schema-v1.0.md`)

Every entity-to-VSM mapping must have:

- **H1 heading**: `Entity Name -> VSM Concept Name`
- **Economic Entity Reference** and **VSM Concept Reference**
- **Mapping Rationale** (minimum 30 words, grounded in Beer's definitions)
- **Mapping Strength**: Strong, Moderate, or Weak

### Chapter Analysis Schema (`schemas/chapter-analysis-schema-v1.0.md`)

The per-chapter synthesis includes:

- **Chapter Summary** (50–300 words)
- **Entities Extracted** — bulleted list
- **VSM Mappings** — entity, concept, strength
- **VSM Coverage** — explicit assessment of S1 through S5 and S3*
- **Gaps & Observations**

**Key insight**: Schemas are artifacts — they live in the repository and
can be versioned, diffed, and refined just like code. Improving a schema
and re-processing a chapter is visible as a git diff.
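
Although the LLM reads a schema as instructions, the same rules can be checked mechanically after generation. The sketch below lints an entity file against the required sections of the economic entity schema. It assumes each section is an H2 heading named exactly as in the list above; the function names are hypothetical and not part of MarkiTect.

```python
import re

REQUIRED_SECTIONS = ["Definition", "Source Chapter", "Context", "Economic Domain"]


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '## Heading' to the text beneath it."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def lint_entity(markdown: str) -> list[str]:
    """Return a list of problems; an empty list means the entity passes."""
    problems = []
    if not re.match(r"# \S", markdown.lstrip()):
        problems.append("missing H1 heading")
    sections = split_sections(markdown)
    for name in REQUIRED_SECTIONS:
        if not sections.get(name, "").strip():
            problems.append(f"missing section: {name}")
    # The schema asks for a definition of 20 to 150 words.
    words = len(sections.get("Definition", "").split())
    if words and not 20 <= words <= 150:
        problems.append(f"definition has {words} words, expected 20-150")
    return problems
```

A check like this could run before committing a chapter, so that schema violations show up before they enter the history.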

---

## 5. Writing Prompt Templates

Each template is a markdown file with `@{macro_name}` placeholders that
MarkiTect's resolver fills with artifact content at compile time.
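
To make the substitution step concrete, here is a minimal sketch of `@{macro}` resolution. It is an illustration only, not MarkiTect's resolver: the function name `resolve_macros`, the single-pass behaviour, and the choice to raise on unresolved macros are assumptions.

```python
import re

# Matches placeholders such as @{chapter_text}
MACRO_RE = re.compile(r"@\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_macros(template: str, artifacts: dict[str, str]) -> str:
    """Substitute every @{name} with artifacts[name] in a single pass.

    Raises KeyError naming all unresolved macros, so a missing artifact
    fails loudly instead of producing a prompt with holes in it.
    """
    missing = sorted({m for m in MACRO_RE.findall(template) if m not in artifacts})
    if missing:
        raise KeyError(f"unresolved macros: {', '.join(missing)}")
    return MACRO_RE.sub(lambda m: artifacts[m.group(1)], template)


template = (
    "## Source Chapter\n@{chapter_text}\n\n"
    "## Extraction Guidelines\n@{extraction_rules}\n"
)
prompt = resolve_macros(
    template,
    {
        "chapter_text": "Of the Division of Labour ...",
        "extraction_rules": "One concept per entity.",
    },
)
```

Substituted text is not scanned again, so a source chapter that happens to contain `@{...}` is inserted verbatim.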

### Template 1: Extract Entities (`templates/extract-entities.md`)

```markdown
# Extract Economic Entities

You are an analytical economist specialising in classical economic theory.
Your task is to extract distinct economic entities from a chapter of
Adam Smith's *The Wealth of Nations*.

## Source Chapter
@{chapter_text}

## Extraction Guidelines
@{extraction_rules}

## VSM Framework Context
@{vsm_framework}

## Existing Entities
@{existing_entities}

## Output Format
Output each entity delimited by `--- ENTITY: <entity-name> ---` markers.
```

The `@{existing_entities}` macro is generated at runtime from canonical
files already on disk, enabling incremental extraction without duplication.
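
The following sketch shows one way the delimited output could be split into canonical files, and how a listing for `@{existing_entities}` might be built from disk. The slug rule, the bullet-list format of the listing, and all function names are assumptions; `process_chapters.py` may do this differently.

```python
import re
from pathlib import Path

ENTITY_MARKER = re.compile(r"^--- ENTITY: (.+?) ---[ \t]*$", re.MULTILINE)


def slugify(name: str) -> str:
    """'Division of Labour' -> 'division-of-labour'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def split_entities(llm_output: str) -> dict[str, str]:
    """Split delimited LLM output into {slug: markdown body}."""
    # re.split with one capture group yields
    # [preamble, name1, body1, name2, body2, ...]
    parts = ENTITY_MARKER.split(llm_output)
    return {slugify(n): b.strip() + "\n" for n, b in zip(parts[1::2], parts[2::2])}


def write_new_entities(entities: dict[str, str], out_dir: Path) -> list[str]:
    """Write only slugs not already on disk; return the slugs written."""
    written = []
    for slug, body in entities.items():
        path = out_dir / f"{slug}.md"
        if path.exists():
            continue  # first occurrence wins
        path.write_text(body, encoding="utf-8")
        written.append(slug)
    return written


def existing_entities_listing(out_dir: Path) -> str:
    """Bullet list of canonical slugs, skipping '*-entities.md' chapter views."""
    slugs = sorted(
        p.stem for p in out_dir.glob("*.md") if not p.stem.endswith("-entities")
    )
    return "\n".join(f"- {s}" for s in slugs)
```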

### Template 2: Map to VSM (`templates/map-to-vsm.md`)

Inputs: `@{entities}`, `@{vsm_framework}`, `@{mapping_rules}`.

### Template 3: Synthesise Analysis (`templates/synthesize-analysis.md`)

Inputs: `@{chapter_text}`, `@{entities}`, `@{mappings}`, `@{vsm_framework}`.

### Template 4: Assess Metrics (`templates/assess-metrics.md`)

Inputs: `@{all_analyses}` (all chapter analyses concatenated), `@{vsm_framework}`.
This template runs once across the entire infospace rather than once per chapter.

**Dependency chain per chapter:**

```
chapter_text ─────┐
extraction_rules ──┤
vsm_framework ────┤
                   ▼
           extract-entities
                   │
                   ▼ entities
           map-to-vsm
                   │
                   ▼ mappings
           synthesize-analysis
                   │
                   ▼ analysis
```
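
The chain above can be sketched as three calls in which each stage's output becomes an input to the next. The `llm` callable, the prompt strings, and `process_chapter` are placeholders for this illustration; in the real pipeline the prompts come from the compiled templates.

```python
from typing import Callable

LLM = Callable[[str], str]  # takes a compiled prompt, returns model output


def process_chapter(chapter_text: str, static: dict[str, str], llm: LLM) -> dict[str, str]:
    """Run extract -> map -> synthesise, feeding each output into the next stage."""
    entities = llm(
        f"EXTRACT\n{chapter_text}\n{static['extraction_rules']}\n{static['vsm_framework']}"
    )
    mappings = llm(
        f"MAP\n{entities}\n{static['vsm_framework']}\n{static['mapping_rules']}"
    )
    analysis = llm(
        f"SYNTHESISE\n{chapter_text}\n{entities}\n{mappings}\n{static['vsm_framework']}"
    )
    return {"entities": entities, "mappings": mappings, "analysis": analysis}


# Usage with a stub model that records the prompts it receives.
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"<output {len(calls)}>"


result = process_chapter(
    "chapter text",
    {"extraction_rules": "rules", "vsm_framework": "vsm", "mapping_rules": "map rules"},
    fake_llm,
)
```

Passing the model in as a callable keeps the ordering logic testable without any network access.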

---

## 6. Populating Artifacts

### Source chapters (`artifacts/sources/`)

35 markdown files containing the full public-domain text, one file per
chapter, named `book-1-chapter-01.md` through `book-5-chapter-03.md`.

### Guidelines (`artifacts/guidelines/`)

- **`extraction-rules.md`** — What constitutes an entity, granularity
  rules, naming conventions.
- **`mapping-rules.md`** — How to map entities to VSM systems, what
  constitutes Strong/Moderate/Weak strength.

### VSM reference (`artifacts/vsm-reference/`)

- **`vsm-framework.md`** — Complete description of Beer's VSM (S1–S5,
  S3*, recursion, variety, viability, algedonic signals, autonomy) with
  economic interpretations.

---

## 7. Processing Chapters

`process_chapters.py` orchestrates the three-stage pipeline. It initialises
the artifact repository, loads static artifacts, runs entity extraction →
VSM mapping → analysis synthesis, and commits each chapter to git.

### Single chapter

```bash
# Manual mode (writes prompts, awaits output files):
python process_chapters.py --chapter book-1-chapter-05 --no-commit

# Auto mode via OpenRouter (free models available):
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter

# With a specific free model:
python process_chapters.py --chapter book-1-chapter-05 \
  --provider openrouter --model meta-llama/llama-4-maverick:free
```

### Whole book or all chapters

```bash
python process_chapters.py --book 1 --provider openrouter
python process_chapters.py --all --provider openrouter
```

### Check progress

```bash
python process_chapters.py --list
```

```
Available chapters (35):

  Chapter                        Entities     Mappings     Analysis
  ------------------------------ ------------ ------------ ------------
  book-1-chapter-01              done (13)    done         done
  book-1-chapter-02              done (7)     done         done
  ...

  Canonical entity set: 109 unique entities
```

### Entity lifecycle

Entities in the canonical set are **never silently deleted**. Retire
an entity by archiving it with a documented reason:

```bash
python process_chapters.py --archive-entity enlarged-monopoly \
  --reason "Subsumed by monopoly-price — same market distortion"
```

The archived file moves to `output/entities/archive/<slug>.md` with a
dated header, preserving the intellectual history of every decision.
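
A minimal sketch of what archiving involves is shown below, assuming the dated header is a single blockquote line. The header format and the function `archive_entity` are hypothetical; the actual `--archive-entity` implementation may write a different header.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def archive_entity(
    entities_dir: Path, slug: str, reason: str, today: Optional[date] = None
) -> Path:
    """Move <slug>.md into archive/ and prepend a dated note with the reason."""
    source = entities_dir / f"{slug}.md"
    if not source.is_file():
        raise FileNotFoundError(source)
    if not reason.strip():
        raise ValueError("an archive reason is required")
    archive_dir = entities_dir / "archive"
    archive_dir.mkdir(exist_ok=True)
    stamp = (today or date.today()).isoformat()
    header = f"> Archived {stamp}: {reason}\n\n"  # header format is an assumption
    target = archive_dir / source.name
    target.write_text(header + source.read_text(encoding="utf-8"), encoding="utf-8")
    source.unlink()  # remove the canonical copy only after the archive is written
    return target
```

Requiring a non-empty reason mirrors the rule that entities are never silently deleted.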

---

## 8. Evaluating Entity Quality

Once chapters are processed, evaluate the entity set using the infospace
tooling commands.

### Per-entity evaluation

```bash
# Evaluate all entities (requires LLM provider):
markitect infospace evaluate --provider openrouter

# Evaluate entities from a specific chapter:
markitect infospace evaluate --chapter book-1-chapter-05 --provider openrouter

# Re-evaluate a single entity:
markitect infospace evaluate --entity division-of-labour --provider openrouter
```

This runs the `evaluate-entity` prompt template against each entity,
scoring dimensions like definition precision, source grounding, and
VSM relevance. Results are written to `output/evaluations/`.

### Collection-level checks (C1–C5)

```bash
# Run all five collection checks:
markitect infospace check --provider openrouter

# Run individual checks:
markitect infospace check redundancy   # C1: Are any entities synonymous?
markitect infospace check coverage     # C2: Which domain × VSM cells are empty?
markitect infospace check coherence    # C3: Is the entity graph well-connected?
markitect infospace check consistency  # C4: Are there circular definitions?
markitect infospace check granularity  # C5: Is abstraction level balanced?
```

Each check uses the platform's embedding, graph analysis, and formal
concept analysis (FCA) infrastructure. Results are written to
`output/metrics/` and a new snapshot is appended to `history.yaml`.

Sample output:

```
Running collection checks on 109 entities...

  C1 — redundancy
    redundancy_ratio: 0.0183
    high_similarity_pairs: 2

  C2 — coverage
    coverage_ratio: 0.4286
    empty_cells: [['Regulation', 'S3*'], ['Historical', 'S5']]

  C3 — coherence
    coherence_components: 1
    modularity: 0.412

  C4 — consistency
    consistency_cycles: 0
    grounding_ratio: 0.94

  C5 — granularity
    granularity_entropy: 2.69
```
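
To give a feel for the C2 numbers, the sketch below computes a coverage ratio as the fraction of non-empty cells in a domain × VSM grid. This definition is an assumption inferred from the check's description; the platform's actual computation is not shown in this tutorial, and the function name is hypothetical.

```python
from itertools import product


def coverage(
    entity_cells: list[tuple[str, str]], domains: list[str], systems: list[str]
) -> tuple[float, list[tuple[str, str]]]:
    """Return (coverage_ratio, empty_cells) for a domain x VSM-system grid.

    entity_cells holds one (domain, system) pair per mapped entity;
    duplicates are fine because a cell counts once however many entities fill it.
    """
    grid = list(product(domains, systems))
    if not grid:
        raise ValueError("grid is empty: need at least one domain and one system")
    filled = set(entity_cells) & set(grid)
    empty = [cell for cell in grid if cell not in filled]
    return len(filled) / len(grid), empty


ratio, empty = coverage(
    [("Production", "S1"), ("Exchange", "S2"), ("Production", "S1")],
    domains=["Production", "Exchange"],
    systems=["S1", "S2", "S3*"],
)
# Two of the six cells are filled, so ratio is 2/6.
```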

---

## 9. Reviewing Viability

```bash
markitect infospace viability
```

Compares the latest metrics against the thresholds declared in
`infospace.yaml`:

```
Metric                         Value    Threshold   Status
-----------------------------------------------------------
redundancy_ratio               0.0183    max=0.10     PASS
coverage_ratio                 0.4286    min=0.50     FAIL
coherence_components           1         max=3        PASS
consistency_cycles             0         max=0        PASS
granularity_entropy            2.6900    min=1.0      PASS

Viable: NO (4/5 thresholds met)
```

Coverage is currently failing (43% against a 50% threshold) because only
9 of 35 chapters have been processed. Coverage should rise as more
chapters are processed and more domain × VSM cells are filled.
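
The comparison itself is simple enough to sketch. The snippet below applies the `min`/`max` thresholds from `infospace.yaml` to the metric values shown above; it illustrates the logic and is not the code behind `markitect infospace viability`. The function name `review` is hypothetical.

```python
THRESHOLDS = {
    "redundancy_ratio": {"max": 0.10},
    "coverage_ratio": {"min": 0.50},
    "coherence_components": {"max": 3},
    "consistency_cycles": {"max": 0},
    "granularity_entropy": {"min": 1.0},
}
METRICS = {
    "redundancy_ratio": 0.0183,
    "coverage_ratio": 0.4286,
    "coherence_components": 1,
    "consistency_cycles": 0,
    "granularity_entropy": 2.69,
}


def review(metrics: dict, thresholds: dict) -> tuple[dict, bool]:
    """Return ({metric: passed}, viable); viable means every threshold is met."""
    status = {}
    for name, bounds in thresholds.items():
        value = metrics[name]
        ok = True
        if "min" in bounds and value < bounds["min"]:
            ok = False
        if "max" in bounds and value > bounds["max"]:
            ok = False
        status[name] = ok
    return status, all(status.values())


status, viable = review(METRICS, THRESHOLDS)
print(f"Viable: {'YES' if viable else 'NO'} ({sum(status.values())}/{len(status)} thresholds met)")
# prints: Viable: NO (4/5 thresholds met)
```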

### Metrics history

```bash
markitect infospace history
```

Shows how metrics evolved across runs:

```
Snapshot  Date        Entities  coverage  redundancy  entropy
-------------------------------------------------------------
6ba48eb2  2026-02-19  85        0.361     0.000       2.687
```

---

## 10. Tracking History with Git

Every processed chapter produces one git commit containing:

- Compiled prompts (`*-prompt.md`) — audit what was sent to the LLM
- Canonical entity files (`output/entities/<slug>.md`) — first occurrence wins
- Chapter entity views (`<chapter>-entities.md`) — transclusion references
- Generated outputs (`*-mappings.md`, `*-analysis.md`)

This means:

- `git log` shows the chronological order of processing
- `git diff` between commits shows what each chapter contributed
- You can `git bisect` to find where quality degraded
- You can revert a chapter and re-process with improved guidelines

The `clean-example-history` branch in this repository demonstrates the
intended structure: each chapter is a single, self-contained commit.
Use it as a reference for how the infospace grew step by step.

To commit manually after reviewing:

```bash
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
# review output/entities/ and output/mappings/
git add examples/infospace-with-history/output/
git commit -m "infospace: process book-1-chapter-05"
```

---

## 11. Cost and Performance

| | OpenRouter (free) | OpenRouter (paid) | Gemini (free) |
|---|---|---|---|
| Time per chapter | ~5 min | ~2 min | ~45 sec |
| Cost per chapter | $0.00 | ~$0.07 | $0.00 |
| Default model | `arcee-ai/trinity-large-preview:free` | `anthropic/claude-sonnet-4` | `gemini-2.5-flash` |
| Rate limits | ~200 req/day | High | Per-minute |

**OpenRouter free tier**: Sign up at [openrouter.ai](https://openrouter.ai)
(no credit card required). Store your key in `apikey-openrouter.txt` in the
project root (git-ignored), or set `OPENROUTER_API_KEY`.

```bash
export OPENROUTER_API_KEY="$(tr -d '[:space:]' < apikey-openrouter.txt)"
```
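
Inside a Python script, the same lookup could be sketched as follows. Preferring the environment variable over the key file is an assumption about precedence, and `load_api_key` is a hypothetical helper, not a function from `process_chapters.py`.

```python
import os
from pathlib import Path


def load_api_key(
    env_var: str = "OPENROUTER_API_KEY", key_file: str = "apikey-openrouter.txt"
) -> str:
    """Return the key from the environment, else from the key file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        # Drop all whitespace, mirroring `tr -d '[:space:]'`.
        key = "".join(path.read_text(encoding="utf-8").split())
        if key:
            return key
    raise RuntimeError(f"set {env_var} or create {key_file}")
```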

Pass `--model openrouter/free` to automatically select whichever free
model is available:

```bash
python process_chapters.py --chapter book-1-chapter-05 \
  --provider openrouter --model openrouter/free
```

**Gemini free tier**: Get a key at [aistudio.google.com/apikey](https://aistudio.google.com/apikey)
and store it in `apikey-geminifree.txt`.

Note: The `claude-code` provider (Claude CLI subprocess) is not available
when running inside a Claude Code session due to nested session restrictions.

---

## 12. Completing the Remaining Chapters

As of writing, 9 of 35 chapters are processed (Book I, Chapters 1–9).

**Process Book I remainder:**

```bash
export OPENROUTER_API_KEY="$(tr -d '[:space:]' < apikey-openrouter.txt)"
git checkout clean-example-history
python process_chapters.py --book 1 --provider openrouter
```

Already-processed chapters are skipped because their chapter view files
already exist. The `@{existing_entities}` macro lists the canonical
entities already on disk so that the LLM extracts only genuinely new ones.
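
The skip rule can be sketched as a check for each chapter's view file. The function below is illustrative: it assumes the file naming shown in the project layout and is not taken from `process_chapters.py`.

```python
from pathlib import Path


def pending_chapters(sources_dir: Path, entities_dir: Path) -> list[str]:
    """List chapters whose '<chapter>-entities.md' view does not exist yet."""
    chapters = sorted(p.stem for p in sources_dir.glob("book-*-chapter-*.md"))
    return [
        c for c in chapters if not (entities_dir / f"{c}-entities.md").exists()
    ]
```

Called with `artifacts/sources/` and `output/entities/`, it would return the chapters still to be processed.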

**Process Books II–V:**

```bash
python process_chapters.py --book 2 --provider openrouter
python process_chapters.py --book 3 --provider openrouter
python process_chapters.py --book 4 --provider openrouter
python process_chapters.py --book 5 --provider openrouter
```

**Run collection checks after each book:**

```bash
markitect infospace check --provider openrouter
markitect infospace viability
```

**Expected progression:**

| After | Chapters | Expected coverage |
|-------|----------|-------------------|
| Book I (11 ch.) | 11/35 | S1, S2, S4 strong; S3 emerging |
| Books I–II (16 ch.) | 16/35 | S3 (capital control) covered |
| Books I–III (20 ch.) | 20/35 | Historical patterns add depth |
| Books I–IV (30 ch.) | 30/35 | S5 (policy, mercantilism) emerging |
| All (35 ch.) | 35/35 | Full coverage; S3* and algedonic signals from Book V |

---

## 13. Using the Infospace as a Discipline

A completed, viable infospace can itself become a **discipline** — a lens
applied to a new topic. For example, the Wealth of Nations infospace could
be applied to analyse a modern supply chain.

```bash
# In a new infospace directory:
markitect infospace init \
  --topic "Modern Supply Chain Management" \
  --domain "Operations Research" \
  --discipline "Wealth of Nations"

# Bind the WoN infospace as a discipline:
markitect infospace bind-discipline ../infospace-with-history

# List bound disciplines and their viability:
markitect infospace disciplines
# Viable System Model    PASS (from vsm-reference/)
# Wealth of Nations      PASS (from ../infospace-with-history)

# Check for stale mappings after discipline update:
markitect infospace stale-mappings
```

The discipline infospace must be viable (meeting its own thresholds)
before it can be used as a lens. If the discipline's entities change,
dependent mappings are flagged for re-evaluation.
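
One plausible way to detect stale mappings is to record a content fingerprint of the discipline entity when a mapping is made and compare it later. The sketch below assumes that approach; the record fields (`id`, `discipline_entity`, `fingerprint`) and function names are hypothetical, and `markitect infospace stale-mappings` may work differently.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of an entity's markdown content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_mappings(mappings: list[dict], discipline_entities: dict[str, str]) -> list[str]:
    """Return ids of mappings whose discipline entity changed or disappeared."""
    stale = []
    for m in mappings:
        current = discipline_entities.get(m["discipline_entity"])
        if current is None or fingerprint(current) != m["fingerprint"]:
            stale.append(m["id"])
    return stale
```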

---

## 14. Quality Improvement Loop

The infospace is designed to be **iteratively refined**:

1. **Process chapters** — run the pipeline
2. **Evaluate** — `markitect infospace evaluate --provider openrouter`
3. **Check** — `markitect infospace check --provider openrouter`
4. **Review viability** — `markitect infospace viability`
5. **Refine guidelines** — update `extraction-rules.md` or
   `mapping-rules.md` to address identified weaknesses
6. **Re-process** — delete output files for specific chapters and re-run
7. **Compare** — `git diff` shows how refined guidelines changed the output

Example: if checks show S3* (Audit) is consistently missing, add a
paragraph to `extraction-rules.md` explicitly asking the LLM to look for
audit, inspection, and oversight mechanisms.

To re-process a specific chapter:

```bash
rm -f output/entities/book-1-chapter-03-entities.md
rm -f output/mappings/book-1-chapter-03-mappings.md
rm -f output/analyses/book-1-chapter-03-analysis.md
python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
```

Never silently delete canonical entity files. Archive them instead:

```bash
python process_chapters.py --archive-entity extent-of-the-market \
  --reason "Subsumed by market-price and effectual-demand"
python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
```

---

## 15. The Artifact Database (`infospace.db`)

The pipeline stores all artifacts and dependency edges in a local SQLite
database — `infospace.db`. This file is **not committed to git** because
it is fully derived from the markdown files that are tracked.

To regenerate it after a fresh clone (no LLM calls needed):

```bash
python process_chapters.py --all --no-commit
```
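
The real schema of `infospace.db` is not documented here. As an illustration of what storing artifacts and dependency edges in SQLite can look like, the sketch below uses two hypothetical tables and a recursive query to find everything derived from a given artifact.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions.
SCHEMA = """
CREATE TABLE artifacts (
    id   TEXT PRIMARY KEY,   -- path of the artifact
    kind TEXT NOT NULL       -- 'source', 'entity', 'mapping', 'analysis'
);
CREATE TABLE edges (
    src TEXT NOT NULL REFERENCES artifacts(id),  -- input artifact
    dst TEXT NOT NULL REFERENCES artifacts(id),  -- artifact derived from it
    PRIMARY KEY (src, dst)
);
"""


def downstream(conn: sqlite3.Connection, artifact_id: str) -> list[str]:
    """Return every artifact that transitively depends on artifact_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT dst FROM edges WHERE src = ?
            UNION
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id
        )
        SELECT id FROM reach ORDER BY id
        """,
        (artifact_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
chain = [
    ("sources/book-1-chapter-01.md", "source"),
    ("entities/division-of-labour.md", "entity"),
    ("mappings/book-1-chapter-01-mappings.md", "mapping"),
    ("analyses/book-1-chapter-01-analysis.md", "analysis"),
]
conn.executemany("INSERT INTO artifacts VALUES (?, ?)", chain)
# Each artifact in the chain is derived from the one before it.
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(a[0], b[0]) for a, b in zip(chain, chain[1:])],
)
affected = downstream(conn, "sources/book-1-chapter-01.md")
```

A query like this shows which outputs would need regenerating when a source or guideline changes.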

---

## 16. Adapting This Pattern to Your Own Project

To build your own infospace:

1. `markitect infospace init --topic "..." --domain "..." --discipline "..."`
2. Write schemas defining required sections for each output type
3. Write extraction guidelines that tell the LLM what to look for
4. Create prompt templates using `@{macro}` syntax
5. Populate `artifacts/sources/` with your source corpus
6. Run `process_chapters.py` (or your equivalent pipeline script)
7. Evaluate with `markitect infospace evaluate` and `check`
8. Review `markitect infospace viability` against your thresholds
9. Iterate: refine guidelines, re-process, re-evaluate
10. Once viable, use as a discipline for a new infospace

The key insight is that **schemas and guidelines are artifacts** — they
live in the repository and can be versioned and diffed just like code.
Every refinement decision is traceable through git history.