docs(examples): add infospace-with-history tutorial
Comprehensive walkthrough covering schema design, prompt templates, artifact population, pipeline usage, LLM integration, git history tracking, metrics, and how to complete the remaining 31 chapters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

examples/infospace-with-history/TUTORIAL.md (new file, 533 lines)

# Building an Infospace with History — Tutorial

This tutorial walks through how we built a structured **information space**
(infospace) from Adam Smith's *The Wealth of Nations*, mapping classical
economic concepts to Stafford Beer's **Viable System Model** (VSM), using
MarkiTect's prompt dependency resolution and LLM integration.

By the end you will understand how to:

1. Design schemas that scaffold structured LLM output
2. Write prompt templates with dependency injection (`@{macro}` syntax)
3. Populate source artifacts and reference material
4. Run an incremental, chapter-by-chapter pipeline
5. Track every change through git history
6. Measure completeness and consistency with metrics
7. Continue the work to process remaining chapters

---

## 1. The Idea

We want to transform a large body of text — the full public-domain text of
*The Wealth of Nations* (5 books, 35 chapters) — into a **curated
collection of economic concepts and entities**, each mapped to the VSM.

The challenge: this is too much for a single prompt. The text is hundreds of
thousands of words. We need to work **incrementally**, one chapter at a time,
building up the infospace and tracking progress.

MarkiTect's prompt dependency resolution lets us define **templates** with
`@{placeholder}` macros that are filled from an artifact repository at
execution time. The pipeline compiles each template into a complete prompt,
sends it to an LLM, and stores the output — all tracked by git.

---

## 2. Project Layout

```
examples/infospace-with-history/
│
├── README.md                  # Project brief
├── TUTORIAL.md                # This file
├── INFRA-TASKS.md             # Infrastructure issues found during the experiment
├── process_chapters.py        # Pipeline script
│
├── schemas/                   # Output structure definitions
│   ├── economic-entity-schema-v1.0.md
│   ├── vsm-concept-schema-v1.0.md
│   ├── vsm-mapping-schema-v1.0.md
│   └── chapter-analysis-schema-v1.0.md
│
├── templates/                 # Prompt templates (with @{macro} placeholders)
│   ├── extract-entities.md
│   ├── map-to-vsm.md
│   ├── synthesize-analysis.md
│   └── assess-metrics.md
│
├── artifacts/                 # Input artifacts
│   ├── sources/               # Chapter text (35 files)
│   ├── guidelines/            # Extraction and mapping rules
│   └── vsm-reference/         # VSM framework definition
│
└── output/                    # Generated artifacts (LLM outputs)
    ├── entities/              # Per-chapter entity extractions
    ├── mappings/              # Per-chapter VSM mappings
    ├── analyses/              # Per-chapter synthesised analyses
    └── metrics/               # Cross-chapter metrics reports
```

---

## 3. Designing Schemas

Before writing any prompts we defined **four schemas** that tell the LLM
exactly what sections each output document must contain. The aim is for every
generated document to be machine-parseable and comparable across chapters.

### Economic Entity Schema (`schemas/economic-entity-schema-v1.0.md`)

Every extracted entity must have:

- **H1 heading** with the entity name
- **Definition** (20-150 words)
- **Source Chapter** citing Book and Chapter
- **Context** — where in Smith's argument the entity appears
- **Economic Domain** (Production, Distribution, Exchange, etc.)

Optional sections: Smith's Original Wording, Modern Interpretation.
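
Because the schema boils down to required headings and word limits, a cheap structural check can catch malformed outputs before they feed later stages. The sketch below is not part of the pipeline, and it assumes (hypothetically) that each required field appears as a `## <Name>` heading under the H1; the real schema file may lay them out differently.

```python
import re

# Hypothetical layout: each required field is an H2 heading ("## Definition").
REQUIRED_SECTIONS = ("Definition", "Source Chapter", "Context", "Economic Domain")
H2 = re.compile(r"^##\s+(.+?)\s*$")


def split_sections(doc: str) -> dict:
    """Map each H2 heading to the text beneath it, up to the next H2."""
    sections = {}
    current = None
    for line in doc.splitlines():
        match = H2.match(line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def check_entity(doc: str) -> list:
    """Return schema violations; an empty list means the document passes."""
    problems = []
    if not re.search(r"^#\s+\S", doc, flags=re.MULTILINE):
        problems.append("missing H1 entity name")
    sections = split_sections(doc)
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            problems.append(f"missing section: {name}")
    if "Definition" in sections:
        words = len(sections["Definition"].split())
        if not 20 <= words <= 150:
            problems.append(f"Definition has {words} words (expected 20-150)")
    return problems
```

Running `check_entity` over each generated file gives a list of concrete violations to feed back into the extraction guidelines.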

### VSM Mapping Schema (`schemas/vsm-mapping-schema-v1.0.md`)

Every entity-to-VSM mapping must have:

- **H1 heading** in the format `Entity Name -> VSM Concept Name`
- **Economic Entity Reference** and **VSM Concept Reference**
- **Mapping Rationale** (minimum 30 words, grounded in Beer's definitions)
- **Mapping Strength**: Strong, Moderate, or Weak
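
A similar sketch for mapping documents. It assumes the H1 uses the literal `->` separator shown above and that the strength value directly follows the words `Mapping Strength` (for example `**Mapping Strength**: Strong`); `check_mapping` and `MappingHeading` are illustrative names, not part of MarkiTect.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class MappingHeading(NamedTuple):
    entity: str
    concept: str


# H1 of the form "# Entity Name -> VSM Concept Name"
HEADING = re.compile(r"^#\s+(?P<entity>.+?)\s*->\s*(?P<concept>.+?)\s*$", re.MULTILINE)
# Assumption: the value follows "Mapping Strength", e.g. "**Mapping Strength**: Strong"
STRENGTH = re.compile(r"Mapping Strength\W*(Strong|Moderate|Weak)\b")


def check_mapping(doc: str) -> Tuple[Optional[MappingHeading], List[str]]:
    """Parse the H1 and check the strength value; return (heading, problems)."""
    problems = []
    heading = None
    match = HEADING.search(doc)
    if match:
        heading = MappingHeading(match["entity"], match["concept"])
    else:
        problems.append("H1 must read 'Entity Name -> VSM Concept Name'")
    if not STRENGTH.search(doc):
        problems.append("Mapping Strength must be Strong, Moderate, or Weak")
    return heading, problems
```

The parsed `(entity, concept)` pair is also what a cross-reference check would compare against the entity documents from stage 1.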

### Chapter Analysis Schema (`schemas/chapter-analysis-schema-v1.0.md`)

The per-chapter synthesis includes:

- **Chapter Summary** (50-300 words)
- **Entities Extracted** — bulleted list
- **VSM Mappings** — entity, concept, strength
- **VSM Coverage** — explicit assessment of S1 through S5 and S3*
- **Gaps & Observations**

### Metrics Schema (implicit in the `assess-metrics` template)

The metrics report covers:

- VSM Concept Coverage (% of S1-S5, recursion, variety, etc. that are covered)
- Chapter Coverage (% of the 35 chapters processed)
- Entity and Mapping counts
- Terminology Consistency and Cross-reference Integrity scores
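
The metrics report itself is written by the LLM via `assess-metrics`, but the two coverage figures are plain ratios, so they can be cross-checked deterministically. This sketch assumes you already have the list of processed chapters and the set of VSM concepts that appear in at least one mapping; the chapter ids and concept labels below are illustrative only.

```python
def coverage(covered, universe) -> float:
    """Percentage of `universe` that appears in `covered` (extra items ignored)."""
    universe = set(universe)
    if not universe:
        return 0.0
    return 100.0 * len(universe & set(covered)) / len(universe)


# Illustrative data: 35 chapter files, the first 4 processed.
all_chapters = [f"chapter-{i:02d}" for i in range(1, 36)]
processed = all_chapters[:4]
print(f"Chapter coverage: {coverage(processed, all_chapters):.1f}%")  # Chapter coverage: 11.4%

# Illustrative concept labels; a real list would come from the VSM reference.
vsm_concepts = ["S1", "S2", "S3", "S3*", "S4", "S5"]
mapped = {"S1", "S2", "S4"}
print(f"VSM concept coverage: {coverage(mapped, vsm_concepts):.1f}%")  # VSM concept coverage: 50.0%
```

If the LLM's reported percentages disagree with these ratios, that is itself a useful consistency signal.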

**Key insight**: Schemas are not code — they are markdown documents that
the LLM reads as instructions. This means you can iterate on them without
changing any infrastructure.

---

## 4. Writing Prompt Templates

Each template is a markdown file containing instructions for the LLM plus
`@{macro_name}` placeholders that MarkiTect's resolver fills with artifact
content at compile time.

### Template 1: Extract Entities (`templates/extract-entities.md`)

```markdown
# Extract Economic Entities

You are an analytical economist specialising in classical economic theory.
Your task is to extract distinct economic entities from a chapter of
Adam Smith's *The Wealth of Nations*.

## Source Chapter

@{chapter_text}

## Extraction Guidelines

@{extraction_rules}

## VSM Framework Context

@{vsm_framework}

## Instructions
[... detailed step-by-step instructions ...]

## Output Format

Output each entity as a separate markdown document, delimited by
`--- ENTITY: <entity-name> ---` markers.
```

The three macros (`chapter_text`, `extraction_rules`, `vsm_framework`) are
resolved by looking up artifacts by name in the relevant information spaces
(`infospace-sources`, `infospace-guidelines` and `infospace-vsm-reference`
respectively; see Section 5).
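
Conceptually, macro resolution is "find every `@{name}`, look the name up, splice the text in". A minimal sketch, assuming the artifacts have already been fetched into a plain dictionary keyed by macro name; this is not MarkiTect's resolver, and `find_macros` and `compile_template` are hypothetical names.

```python
import re

# "@{name}" where name is an identifier-like token (assumption).
MACRO = re.compile(r"@\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_macros(template: str) -> list:
    """Macro names in order of first appearance, without duplicates."""
    names = []
    for name in MACRO.findall(template):
        if name not in names:
            names.append(name)
    return names


def compile_template(template: str, artifacts: dict) -> str:
    """Replace every @{name} with artifacts[name]; fail loudly on gaps."""
    missing = [name for name in find_macros(template) if name not in artifacts]
    if missing:
        raise KeyError(f"unresolved macros: {', '.join(missing)}")
    # A function replacement inserts the artifact text verbatim, so
    # backslashes in the source text are not read as escape sequences.
    return MACRO.sub(lambda m: artifacts[m.group(1)], template)


template = "## Source Chapter\n\n@{chapter_text}\n\n## Extraction Guidelines\n\n@{extraction_rules}\n"
print(find_macros(template))  # ['chapter_text', 'extraction_rules']
```

Raising on an unresolved name is a deliberate choice: compiling a prompt with an empty section would otherwise go unnoticed until the LLM output looked wrong.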

### Template 2: Map to VSM (`templates/map-to-vsm.md`)

Takes `@{entities}` (output from stage 1), `@{vsm_framework}`, and
`@{mapping_rules}` as inputs.
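
Stage 2 consumes stage 1's output, and the `--- ENTITY: <entity-name> ---` delimiter from Template 1's output format makes that output easy to split into individual entity documents. A sketch, assuming each marker sits on its own line; any text before the first marker is discarded.

```python
import re

# One marker per line, e.g. "--- ENTITY: Division of Labour ---"
DELIMITER = re.compile(r"^--- ENTITY: (.+?) ---[ \t]*$", re.MULTILINE)


def split_entities(output: str) -> list:
    """Return (entity name, document) pairs in the order the LLM produced them."""
    # With one capturing group, re.split yields
    # [preamble, name1, body1, name2, body2, ...].
    parts = DELIMITER.split(output)
    names, bodies = parts[1::2], parts[2::2]
    return [(name.strip(), body.strip()) for name, body in zip(names, bodies)]
```

Returning a list of pairs rather than a dict keeps duplicate entity names visible, which is worth flagging rather than silently overwriting.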

### Template 3: Synthesise Analysis (`templates/synthesize-analysis.md`)

Takes `@{chapter_text}`, `@{entities}` (stage 1 output),
`@{mappings}` (stage 2 output), and `@{vsm_framework}`.

### Template 4: Assess Metrics (`templates/assess-metrics.md`)

Takes `@{all_analyses}` (concatenation of all chapter analyses) and
`@{vsm_framework}`. Runs across the entire infospace, not per-chapter.

**Dependency chain per chapter:**

```
chapter_text ─────┐
extraction_rules ─┤
vsm_framework ────┤
                  ▼
          extract-entities
                  │
                  ▼ entities
             map-to-vsm
                  │
                  ▼ mappings
         synthesize-analysis
                  │
                  ▼ analysis
```

The diagram shows only the stage-to-stage flow: `map-to-vsm` also reads
`mapping_rules` and `vsm_framework`, and `synthesize-analysis` also reads
`chapter_text` and `vsm_framework`.

After all chapters are processed, `assess-metrics` evaluates the
complete infospace.
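
The per-chapter chain is a small directed acyclic graph, so a valid stage order and the list of dependency edges can both be derived from each stage's inputs. A sketch using the standard library's `graphlib` (Python 3.9+); the input sets are transcribed from the template descriptions above, with a stage's name standing for its output. How `process_chapters.py` actually stores edges is not shown here.

```python
from graphlib import TopologicalSorter

# Each stage's inputs, transcribed from the template descriptions;
# a stage name stands for that stage's output.
STAGE_INPUTS = {
    "extract-entities": {"chapter_text", "extraction_rules", "vsm_framework"},
    "map-to-vsm": {"extract-entities", "vsm_framework", "mapping_rules"},
    "synthesize-analysis": {"chapter_text", "extract-entities", "map-to-vsm", "vsm_framework"},
}

# TopologicalSorter takes {node: predecessors} and yields a valid order;
# keep only the stages (the static artifacts need no processing).
order = [node for node in TopologicalSorter(STAGE_INPUTS).static_order()
         if node in STAGE_INPUTS]

# One (source, target) edge per input, as a dependency graph would record them.
edges = sorted((source, stage) for stage, sources in STAGE_INPUTS.items() for source in sources)

print(order)       # ['extract-entities', 'map-to-vsm', 'synthesize-analysis']
print(len(edges))  # 10
```

`static_order()` raises `graphlib.CycleError` if a template ever ends up depending on its own output, which is a cheap guard when adding new stages.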

---

## 5. Populating Artifacts

### Source chapters (`artifacts/sources/`)

35 markdown files containing the full public-domain text of each chapter.
They follow the naming convention `book-1-chapter-01.md` through
`book-5-chapter-03.md`, plus `introduction.md`.

These are loaded into the `infospace-sources` information space.
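
The naming convention makes book and chapter numbers recoverable from file names alone, which is one plausible way for an option like `--book 1` to select chapters. A hypothetical sketch (not taken from `process_chapters.py`) that parses the numbers as integers, puts `introduction.md` first, and assumes no other file names appear in `artifacts/sources/`.

```python
import re
from pathlib import Path

CHAPTER_FILE = re.compile(r"^book-(\d+)-chapter-(\d+)\.md$")


def chapter_key(path: Path) -> tuple:
    """Sort key: introduction first, then (book, chapter) as integers."""
    if path.name == "introduction.md":
        return (0, 0)
    match = CHAPTER_FILE.match(path.name)
    if match is None:
        # Assumption: no other file names appear in artifacts/sources/.
        raise ValueError(f"unexpected source file: {path.name}")
    return (int(match.group(1)), int(match.group(2)))


def chapters_for_book(paths, book: int) -> list:
    """Chapter ids (file stems) for one book, in reading order."""
    selected = [p for p in paths if p.name != "introduction.md" and chapter_key(p)[0] == book]
    return [p.stem for p in sorted(selected, key=chapter_key)]


paths = [Path(n) for n in ("book-1-chapter-11.md", "introduction.md",
                           "book-2-chapter-01.md", "book-1-chapter-02.md")]
print(chapters_for_book(paths, 1))  # ['book-1-chapter-02', 'book-1-chapter-11']
```

Against the real directory, `Path("artifacts/sources").glob("*.md")` would supply `paths`.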

### Guidelines (`artifacts/guidelines/`)

Two hand-written reference documents:

- **`extraction-rules.md`** — What constitutes an entity, granularity rules,
  naming conventions, quality checks.
- **`mapping-rules.md`** — How to map entities to VSM systems, what
  constitutes strong/moderate/weak mapping strength.

These are loaded into `infospace-guidelines`.

### VSM reference (`artifacts/vsm-reference/`)

- **`vsm-framework.md`** — Complete description of Beer's VSM (S1-S5, S3*,
  recursion, variety, viability, attenuation/amplification, algedonic
  signals, autonomy). Includes economic interpretations for each system.

This is loaded into `infospace-vsm-reference`.

---

## 6. The Pipeline Script

`process_chapters.py` orchestrates everything. It:

1. Initialises the artifact repository (SQLite) and information spaces
2. Loads all static artifacts (templates, guidelines, VSM reference)
3. For each chapter, runs the three-stage pipeline
4. Optionally calls an LLM to auto-generate outputs
5. Records dependency edges in the graph
6. Commits results to git (unless `--no-commit` is passed)

### Running a single chapter

The `python process_chapters.py` commands in this section are run from
`examples/infospace-with-history/`.

```bash
# Manual mode (writes prompts, waits for you to provide output files):
python process_chapters.py --chapter book-1-chapter-05 --no-commit

# Automatic mode via OpenRouter (recommended — fast, real token counts):
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit

# Automatic mode via Claude Code CLI:
python process_chapters.py --chapter book-1-chapter-05 --provider claude-code --no-commit

# With a specific model:
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --model anthropic/claude-haiku-4-5-20251001 --no-commit
```

### Running a whole book

```bash
python process_chapters.py --book 1 --provider openrouter --no-commit
```

### Running all chapters

```bash
python process_chapters.py --all --provider openrouter --no-commit
```

### Checking progress

```bash
python process_chapters.py --list
```

This prints a table showing which chapters have completed each stage:

```
Available chapters (35):

Chapter                        Entities     Mappings     Analysis
------------------------------ ------------ ------------ ------------
book-1-chapter-01              done         done         done
book-1-chapter-02              done         done         done
book-1-chapter-03              done         done         done
book-1-chapter-04              done         done         done
book-1-chapter-05              -            -            -
...
```
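
A table like this only needs to know which output files exist. A sketch of building it, assuming the output paths used elsewhere in this tutorial (`output/entities/<chapter>-entities.md`, `output/mappings/<chapter>-mappings.md`, `output/analyses/<chapter>-analysis.md`); it illustrates the idea and is not the script's own code.

```python
from pathlib import Path

# Stage column -> (output subdirectory, file suffix), as used in this tutorial.
STAGES = {
    "Entities": ("entities", "entities"),
    "Mappings": ("mappings", "mappings"),
    "Analysis": ("analyses", "analysis"),
}


def row(first: str, cells) -> str:
    """Format one table line: a 30-wide column followed by 12-wide columns."""
    return (f"{first:<30} " + " ".join(f"{cell:<12}" for cell in cells)).rstrip()


def progress_table(output_dir: Path, chapters) -> str:
    """Mark each stage 'done' if its output file exists, '-' otherwise."""
    lines = [row("Chapter", STAGES), " ".join(["-" * 30] + ["-" * 12] * len(STAGES))]
    for chapter in chapters:
        cells = []
        for subdir, suffix in STAGES.values():
            done = (output_dir / subdir / f"{chapter}-{suffix}.md").exists()
            cells.append("done" if done else "-")
        lines.append(row(chapter, cells))
    return "\n".join(lines)
```

For example, `print(progress_table(Path("output"), ["book-1-chapter-01", "book-1-chapter-05"]))` from `examples/infospace-with-history/`.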

### Assessing metrics

After processing a batch of chapters, run the metrics assessment:

```bash
python process_chapters.py --metrics --provider openrouter --no-commit
```

This concatenates all completed analyses and asks the LLM to evaluate
coverage, consistency, and completeness.

### Dependency statistics

```bash
python process_chapters.py --stats
```

---

## 7. How the LLM Integration Works

The pipeline uses MarkiTect's `markitect.llm` module, which provides two
adapter backends that implement the `LLMAdapter` interface:

| Backend | How it works | Pros | Cons |
|---------|-------------|------|------|
| `openrouter` | HTTP POST to the OpenRouter API | Fast, real token counts, any model OpenRouter offers | Needs an API key |
| `claude-code` | Shells out to `claude --print` | No API key needed if the CLI is installed | Slower, estimated token counts |

### API key setup (OpenRouter)

The OpenRouter backend looks for a key in these places, in order:

1. An `--api-key` command-line option (not yet implemented in the CLI)
2. The `OPENROUTER_API_KEY` environment variable
3. An `apikey-openrouter.txt` file in the project root (git-ignored)
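
A sketch of that lookup order. It is hypothetical and not taken from `markitect.llm`; the `explicit` parameter stands in for the not-yet-implemented `--api-key` option, and `project_root` is whatever directory holds `apikey-openrouter.txt`.

```python
import os
from pathlib import Path
from typing import Optional


def find_openrouter_key(project_root: Path, explicit: Optional[str] = None) -> Optional[str]:
    """Return the first key found: explicit value, environment, then key file."""
    if explicit:
        return explicit
    env_key = os.environ.get("OPENROUTER_API_KEY")
    if env_key:
        return env_key.strip()
    key_file = project_root / "apikey-openrouter.txt"
    if key_file.is_file():
        # strip() drops the trailing newline most editors add; empty file -> None
        return key_file.read_text(encoding="utf-8").strip() or None
    return None
```

Keeping the key file out of git (as the list notes) matters more than the lookup order itself.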

### What happens per stage

1. The pipeline **resolves** macro placeholders by looking up artifacts
   in the repository
2. It **compiles** the template into a complete prompt (macros replaced
   with real content)
3. It writes the compiled prompt to `output/<stage>/<chapter>-prompt.md`
   for inspection
4. If an LLM adapter is configured and no output file exists yet, it
   **executes** the prompt and writes the result
5. The output is **stored** as a generated artifact in the repository
6. Dependency edges are **recorded** in the graph
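
Steps 3 and 4 can be sketched as a single function. The `complete(prompt)` method is a hypothetical stand-in, not the real `LLMAdapter` interface, and steps 5 and 6 (storing the artifact and recording edges) are left out. The compiled prompt is rewritten on every run, while an existing output file short-circuits the LLM call.

```python
from pathlib import Path
from typing import Optional, Protocol


class Adapter(Protocol):
    """Hypothetical stand-in for an LLM backend; not MarkiTect's LLMAdapter."""

    def complete(self, prompt: str) -> str: ...


def run_stage(stage_dir: Path, chapter: str, suffix: str, prompt: str,
              adapter: Optional[Adapter]) -> Optional[str]:
    """Write the compiled prompt, then reuse or generate the stage output."""
    stage_dir.mkdir(parents=True, exist_ok=True)
    # Step 3: keep the exact prompt on disk for inspection.
    (stage_dir / f"{chapter}-prompt.md").write_text(prompt, encoding="utf-8")
    output_path = stage_dir / f"{chapter}-{suffix}.md"
    # Step 4: an existing output file means no LLM call is made.
    if output_path.exists():
        return output_path.read_text(encoding="utf-8")
    if adapter is None:
        return None  # manual mode: the output file is supplied by hand later
    result = adapter.complete(prompt)
    output_path.write_text(result, encoding="utf-8")
    return result
```

The existence check is what makes re-running a whole book cheap: only chapters without output files cost tokens.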

---

## 8. Tracking History with Git

Unless you pass `--no-commit`, every processed chapter produces a git commit
containing:

- Compiled prompts (`*-prompt.md`) — so you can audit exactly what was sent
- Generated outputs (`*-entities.md`, `*-mappings.md`, `*-analysis.md`)

This means:

- `git log` shows the chronological order of processing
- `git diff` between commits shows what each chapter contributed
- You can `git bisect` to find where quality degraded
- You can revert a chapter and re-process it with different settings

To let the script auto-commit (default):

```bash
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter
```

To commit manually after reviewing:

```bash
# Run from examples/infospace-with-history/
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
# review output/entities/book-1-chapter-05-entities.md etc.
git add output/
git commit -m "infospace: process book-1-chapter-05"
```

---

## 9. Cost and Performance

From our measurements while processing Book I, Chapters 3 and 4:

| Metric | Claude Code CLI | OpenRouter |
|---|---|---|
| Time per chapter | ~5 minutes | ~2 minutes |
| Token counts | Estimated (~4 chars/token) | Real (from API) |
| Cost per chapter | ~$0.35 est. | ~$0.07 est. |

**Projected cost for all 35 chapters via OpenRouter:** ~$2.50
(35 × ~$0.07; varies by chapter length, and Book V chapters are longer).
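
The projection is simple arithmetic on the per-chapter estimates, and the CLI's token counts come from the characters-per-token heuristic in the table. A sketch of both, assuming a flat cost per chapter (which, as noted, overstates short chapters and understates Book V).

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count for backends that do not report usage."""
    return round(len(text) / chars_per_token)


def project_cost(cost_per_chapter: float, chapters: int = 35) -> float:
    """Total cost in USD if every chapter costs about the same."""
    return cost_per_chapter * chapters


print(f"OpenRouter: ~${project_cost(0.07):.2f}")       # OpenRouter: ~$2.45
print(f"Claude Code CLI: ~${project_cost(0.35):.2f}")  # Claude Code CLI: ~$12.25
print(estimate_tokens("x" * 10_000))                   # 2500
```

At the table's estimates the 31 remaining chapters would cost about $2.17 via OpenRouter (`project_cost(0.07, chapters=31)`).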

To reduce costs further, use a cheaper model:

```bash
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --model anthropic/claude-haiku-4-5-20251001 --no-commit
```

---

## 10. Completing the Remaining Chapters

As of this commit, 4 of 35 chapters are processed (Book I, Chapters 1-4).
Here is how to complete the rest.

### Step-by-step

**1. Process remaining Book I chapters (5-11):**

```bash
python process_chapters.py --book 1 --provider openrouter --no-commit
```

Chapters whose output files already exist are skipped.

**2. Process Books II-V:**

```bash
python process_chapters.py --book 2 --provider openrouter --no-commit
python process_chapters.py --book 3 --provider openrouter --no-commit
python process_chapters.py --book 4 --provider openrouter --no-commit
python process_chapters.py --book 5 --provider openrouter --no-commit
```

Or all at once:

```bash
python process_chapters.py --all --provider openrouter --no-commit
```

**3. Run metrics after each book (or at the end):**

```bash
python process_chapters.py --metrics --provider openrouter --no-commit
```

**4. Commit the results:**

```bash
# Run from examples/infospace-with-history/
git add output/
git commit -m "infospace: process all remaining chapters"
```

**5. Review the metrics report:**

Open `output/metrics/metrics-report.md`. It will show:

- Which VSM concepts (S1-S5, recursion, variety, etc.) now have mappings
- Total entity and mapping counts
- Consistency scores
- Recommendations for gaps

### Expected progression

| After | Chapters | Expected coverage |
|-------|----------|-------------------|
| Book I (11 ch.) | 11/35 | S1, S2, S4 strong; S3 emerging |
| Books I-II (16 ch.) | 16/35 | S3 (capital control) covered |
| Books I-III (20 ch.) | 20/35 | Historical patterns add depth |
| Books I-IV (30 ch.) | 30/35 | S5 (policy, mercantilism) emerging |
| All (35 ch.) | 35/35 | Full coverage, S3* and algedonic signals likely from Book V |

Book V (public revenue, taxation, sovereign duties) is expected to
fill the remaining gaps in S3*, S5, and regulatory concepts.

---

## 11. Quality Improvement Loop

The infospace is designed to be **iteratively refined**:

1. **Process chapters** — run the pipeline
2. **Assess metrics** — identify gaps in VSM coverage and consistency
3. **Refine guidelines** — update `extraction-rules.md` or
   `mapping-rules.md` to address identified weaknesses
4. **Re-process** — delete output files for specific chapters and re-run
   with improved guidelines
5. **Compare** — use `git diff` to see how the refined guidelines changed
   the output

Example: if metrics show that S3* (Audit) is consistently missed, you
could add a paragraph to `extraction-rules.md` explicitly asking the LLM
to look for audit, inspection, and oversight mechanisms.

To re-process a specific chapter:

```bash
# Run from examples/infospace-with-history/
rm output/entities/book-1-chapter-03-entities.md
rm output/mappings/book-1-chapter-03-mappings.md
rm output/analyses/book-1-chapter-03-analysis.md
python process_chapters.py --chapter book-1-chapter-03 --provider openrouter --no-commit
```
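
The three `rm` commands can be wrapped in a small helper so the file names cannot drift out of sync with the pipeline's naming. A sketch assuming the same output paths; `reset_chapter` is a hypothetical name, and it leaves the `*-prompt.md` files alone because the next run rewrites them anyway.

```python
from pathlib import Path

# (subdirectory, suffix) of each per-chapter output, as named in this tutorial.
OUTPUTS = (("entities", "entities"), ("mappings", "mappings"), ("analyses", "analysis"))


def reset_chapter(output_dir: Path, chapter: str) -> list:
    """Delete a chapter's generated outputs so the next run regenerates them."""
    removed = []
    for subdir, suffix in OUTPUTS:
        path = output_dir / subdir / f"{chapter}-{suffix}.md"
        if path.exists():
            path.unlink()
            removed.append(path)
    return removed


# Usage (from examples/infospace-with-history/), then re-run the pipeline:
# reset_chapter(Path("output"), "book-1-chapter-03")
```

Returning the removed paths makes it easy to log exactly what will be regenerated before paying for the LLM calls.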

---

## 12. Infrastructure Issues Found

During development we documented three issues with the MarkiTect
infrastructure in `INFRA-TASKS.md`:

1. **Artifact repo doesn't store content** — the resolver returns
   placeholder text; the pipeline works around this with a local cache.
2. **ContentMacro `raw_text` defaults to `""`** — causes silent data
   corruption when macros are constructed programmatically.
3. **No `@{target}` syntax in TemplateAnalyzer** — macros must be
   constructed manually rather than auto-detected from template text.

These are intentionally not fixed in this example (the constraint was
"no changes to markitect infrastructure"). They are tracked for future
improvement, after which the experiment can be re-run.
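
Issue 2 is an instance of a general hazard with empty-string defaults: a forgotten argument produces a valid-looking object instead of an error. The hypothetical dataclasses below are not MarkiTect's `ContentMacro` (whose definition is not shown here); they only contrast the two designs.

```python
from dataclasses import dataclass


@dataclass
class LooseMacro:
    """Hypothetical macro with an empty-string default (the reported hazard)."""
    name: str
    raw_text: str = ""


@dataclass
class StrictMacro:
    """Hypothetical macro where raw_text is required and must be non-blank."""
    name: str
    raw_text: str

    def __post_init__(self) -> None:
        if not self.raw_text.strip():
            raise ValueError(f"macro {self.name!r} has empty raw_text")


# Forgetting raw_text goes unnoticed here...
print(LooseMacro("chapter_text"))  # LooseMacro(name='chapter_text', raw_text='')
# ...but fails immediately here (missing required argument).
try:
    StrictMacro("chapter_text")
except TypeError as exc:
    print("rejected:", exc)
```

Making the field required moves the failure from "wrong output, discovered later" to "exception at construction time".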

---

## 13. Adapting This Pattern to Your Own Project

To build your own infospace using this pattern:

1. **Choose your source corpus** — any collection of documents you want
   to transform into structured knowledge.
2. **Define your target ontology** — what concepts, relationships, or
   categories you want to extract (our VSM is just one example).
3. **Write schemas** — markdown documents defining the required sections
   and validation rules for each output type.
4. **Write extraction guidelines** — rules that tell the LLM what to
   look for and how to handle edge cases.
5. **Create prompt templates** — use `@{macro}` syntax to inject source
   text and guidelines at compile time.
6. **Build your pipeline** — follow `process_chapters.py` as a reference
   for loading artifacts, resolving templates, and calling the LLM.
7. **Process incrementally** — work through your corpus one document at a
   time, tracking everything in git.
8. **Measure and refine** — define metrics, assess them periodically,
   and update your guidelines when gaps appear.

The key architectural insight is that **schemas and guidelines are
artifacts** — they live in the same repository as your source text and
can be versioned, diffed, and refined just like code.