From 01b9596ce666ce9ee9b57ce1ca6e7dd2db1fb1ff Mon Sep 17 00:00:00 2001
From: tegwick
Date: Wed, 11 Feb 2026 01:50:49 +0100
Subject: [PATCH] docs(examples): add infospace-with-history tutorial

Comprehensive walkthrough covering schema design, prompt templates,
artifact population, pipeline usage, LLM integration, git history
tracking, metrics, and how to complete the remaining 31 chapters.

Co-Authored-By: Claude Opus 4.6
---
 examples/infospace-with-history/TUTORIAL.md | 533 ++++++++++++++++++++
 1 file changed, 533 insertions(+)
 create mode 100644 examples/infospace-with-history/TUTORIAL.md

diff --git a/examples/infospace-with-history/TUTORIAL.md b/examples/infospace-with-history/TUTORIAL.md
new file mode 100644
index 00000000..ca57f077
--- /dev/null
+++ b/examples/infospace-with-history/TUTORIAL.md
@@ -0,0 +1,533 @@
+# Building an Infospace with History — Tutorial
+
+This tutorial walks through how we built a structured **information space**
+(infospace) from Adam Smith's *The Wealth of Nations*, mapping classical
+economic concepts to Stafford Beer's **Viable System Model** (VSM), using
+MarkiTect's prompt dependency resolution and LLM integration.
+
+By the end you will understand how to:
+
+1. Design schemas that scaffold structured LLM output
+2. Write prompt templates with dependency injection (`@{macro}` syntax)
+3. Populate source artifacts and reference material
+4. Run an incremental, chapter-by-chapter pipeline
+5. Track every change through git history
+6. Measure completeness and consistency with metrics
+7. Continue the work to process remaining chapters
+
+---
+
+## 1. The Idea
+
+We want to transform a large body of text — the full public-domain text of
+*The Wealth of Nations* (5 books, 35 chapters) — into a **curated
+collection of economic concepts and entities**, each mapped to the VSM.
+
+The challenge: this is too much for a single prompt. The text is hundreds of
+thousands of words. We need to work **incrementally**, one chapter at a time,
+building up the infospace and tracking progress.
+
+MarkiTect's prompt dependency resolution lets us define **templates** with
+`@{placeholder}` macros that are filled from an artifact repository at
+execution time. The pipeline compiles each template into a complete prompt,
+sends it to an LLM, and stores the output — all tracked by git.
+
+---
+
+## 2. Project Layout
+
+```
+examples/infospace-with-history/
+│
+├── README.md                  # Project brief
+├── TUTORIAL.md                # This file
+├── INFRA-TASKS.md             # Infrastructure issues found during the experiment
+├── process_chapters.py        # Pipeline script
+│
+├── schemas/                   # Output structure definitions
+│   ├── economic-entity-schema-v1.0.md
+│   ├── vsm-concept-schema-v1.0.md
+│   ├── vsm-mapping-schema-v1.0.md
+│   └── chapter-analysis-schema-v1.0.md
+│
+├── templates/                 # Prompt templates (with @{macro} placeholders)
+│   ├── extract-entities.md
+│   ├── map-to-vsm.md
+│   ├── synthesize-analysis.md
+│   └── assess-metrics.md
+│
+├── artifacts/                 # Input artifacts
+│   ├── sources/               # Chapter text (35 files)
+│   ├── guidelines/            # Extraction and mapping rules
+│   └── vsm-reference/         # VSM framework definition
+│
+└── output/                    # Generated artifacts (LLM outputs)
+    ├── entities/              # Per-chapter entity extractions
+    ├── mappings/              # Per-chapter VSM mappings
+    ├── analyses/              # Per-chapter synthesised analyses
+    └── metrics/               # Cross-chapter metrics reports
+```
+
+---
+
+## 3. Designing Schemas
+
+Before writing any prompts we defined **four schemas** that tell the LLM
+exactly what sections each output document must contain. This ensures every
+generated document is machine-parseable and comparable across chapters.
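
As a rough illustration of what "machine-parseable" can mean in practice, the sketch below checks a generated Markdown document for a set of required sections. The section names come from the Economic Entity Schema described in this section. Treating them as H2 headings is an assumption, and the helpers `has_h1` and `missing_sections` are hypothetical names for illustration, not part of MarkiTect.

```python
import re

# Required sections named by the entity schema; H2 level is an assumption.
REQUIRED_SECTIONS = ("Definition", "Source Chapter", "Context", "Economic Domain")


def has_h1(markdown_text):
    """True if the document contains an H1 heading (the entity name)."""
    return re.search(r"^#\s+\S", markdown_text, flags=re.MULTILINE) is not None


def missing_sections(markdown_text, required=REQUIRED_SECTIONS):
    """Return the required section names that have no matching H2 heading."""
    found = {
        match.group(1).strip()
        for match in re.finditer(r"^##\s+(.+?)\s*$", markdown_text, flags=re.MULTILINE)
    }
    return [name for name in required if name not in found]


# Hypothetical entity document with one required section left out.
sample = """# Division of Labour

## Definition
The separation of a production process into distinct tasks.

## Source Chapter
Book I, Chapter 1

## Context
Opens Smith's argument about productivity.
"""

print(has_h1(sample))
print(missing_sections(sample))
```

A check of this kind could run after each stage to flag documents that drift from the schema before they feed into the next stage.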
+
+### Economic Entity Schema (`schemas/economic-entity-schema-v1.0.md`)
+
+Every extracted entity must have:
+
+- **H1 heading** with the entity name
+- **Definition** (20-150 words)
+- **Source Chapter** citing Book and Chapter
+- **Context** — where in Smith's argument the entity appears
+- **Economic Domain** (Production, Distribution, Exchange, etc.)
+
+Optional: Smith's Original Wording, Modern Interpretation.
+
+### VSM Mapping Schema (`schemas/vsm-mapping-schema-v1.0.md`)
+
+Every entity-to-VSM mapping must have:
+
+- **H1 heading** in the format `Entity Name -> VSM Concept Name`
+- **Economic Entity Reference** and **VSM Concept Reference**
+- **Mapping Rationale** (minimum 30 words, grounded in Beer's definitions)
+- **Mapping Strength**: Strong, Moderate, or Weak
+
+### Chapter Analysis Schema (`schemas/chapter-analysis-schema-v1.0.md`)
+
+The per-chapter synthesis includes:
+
+- **Chapter Summary** (50-300 words)
+- **Entities Extracted** — bulleted list
+- **VSM Mappings** — entity, concept, strength
+- **VSM Coverage** — explicit assessment of S1 through S5 and S3*
+- **Gaps & Observations**
+
+### Metrics Schema (implicit in `assess-metrics` template)
+
+The metrics report computes:
+
+- VSM Concept Coverage (% of S1-S5, recursion, variety, etc.)
+- Chapter Coverage (% of 35 chapters processed)
+- Entity and Mapping counts
+- Terminology Consistency and Cross-reference Integrity scores
+
+**Key insight**: Schemas are not code — they are markdown documents that
+the LLM reads as instructions. This means you can iterate on them without
+changing any infrastructure.
+
+---
+
+## 4. Writing Prompt Templates
+
+Each template is a markdown file containing instructions for the LLM plus
+`@{macro_name}` placeholders that MarkiTect's resolver fills with artifact
+content at compile time.
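
To make the compile step concrete, here is a minimal sketch of `@{macro}` substitution. It is not MarkiTect's resolver: `find_macros`, `compile_template`, the macro-name grammar in the regular expression, and the plain dictionary standing in for the artifact repository are all assumptions made for illustration.

```python
import re

# Matches @{name}; the allowed characters in a macro name are an assumption.
MACRO_PATTERN = re.compile(r"@\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_macros(template):
    """Return macro names in order of first appearance, without duplicates."""
    seen = []
    for name in MACRO_PATTERN.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def compile_template(template, artifacts):
    """Replace every @{name} with artifacts[name]; fail loudly on unknown names."""
    missing = [name for name in find_macros(template) if name not in artifacts]
    if missing:
        raise KeyError(f"unresolved macros: {', '.join(missing)}")
    # A function replacement avoids backslash-escape surprises in artifact text.
    return MACRO_PATTERN.sub(lambda m: artifacts[m.group(1)], template)


template = (
    "## Source Chapter\n\n@{chapter_text}\n\n"
    "## Extraction Guidelines\n\n@{extraction_rules}\n"
)
# Placeholder artifact contents, standing in for real repository lookups.
artifacts = {
    "chapter_text": "(chapter text goes here)",
    "extraction_rules": "Extract one entity per distinct concept.",
}

print(find_macros(template))
print(compile_template(template, artifacts))
```

Raising on an unresolved macro, rather than leaving the placeholder in the prompt, is a design choice of this sketch: a prompt with a literal `@{...}` in it would otherwise be sent to the LLM unnoticed.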
+
+### Template 1: Extract Entities (`templates/extract-entities.md`)
+
+```markdown
+# Extract Economic Entities
+
+You are an analytical economist specialising in classical economic theory.
+Your task is to extract distinct economic entities from a chapter of
+Adam Smith's *The Wealth of Nations*.
+
+## Source Chapter
+
+@{chapter_text}
+
+## Extraction Guidelines
+
+@{extraction_rules}
+
+## VSM Framework Context
+
+@{vsm_framework}
+
+## Instructions
+[... detailed step-by-step instructions ...]
+
+## Output Format
+
+Output each entity as a separate markdown document, delimited by
+`--- ENTITY: ---` markers.
+```
+
+The three macros (`chapter_text`, `extraction_rules`, `vsm_framework`) are
+resolved by looking up artifacts by name in the relevant information spaces.
+
+### Template 2: Map to VSM (`templates/map-to-vsm.md`)
+
+Takes `@{entities}` (output from stage 1), `@{vsm_framework}`, and
+`@{mapping_rules}` as inputs.
+
+### Template 3: Synthesise Analysis (`templates/synthesize-analysis.md`)
+
+Takes `@{chapter_text}`, `@{entities}` (stage 1 output),
+`@{mappings}` (stage 2 output), and `@{vsm_framework}`.
+
+### Template 4: Assess Metrics (`templates/assess-metrics.md`)
+
+Takes `@{all_analyses}` (concatenation of all chapter analyses) and
+`@{vsm_framework}`. Runs across the entire infospace, not per-chapter.
+
+**Dependency chain per chapter:**
+
+```
+chapter_text  ─────┐
+extraction_rules ──┤
+vsm_framework  ────┤
+                   ▼
+            extract-entities
+                   │
+                   ▼ entities
+              map-to-vsm
+                   │
+                   ▼ mappings
+          synthesize-analysis
+                   │
+                   ▼ analysis
+```
+
+After all chapters are processed, `assess-metrics` evaluates the
+complete infospace.
+
+---
+
+## 5. Populating Artifacts
+
+### Source chapters (`artifacts/sources/`)
+
+35 markdown files containing the full public-domain text of each chapter.
+Named by convention: `book-1-chapter-01.md` through `book-5-chapter-03.md`,
+plus `introduction.md`.
+
+These are loaded into the `infospace-sources` information space.
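
The naming convention lends itself to mechanical ordering. The sketch below parses file names of the form `book-N-chapter-NN.md` and sorts them by book and chapter number. Placing `introduction.md` first is an assumption, and `chapter_sort_key` and `chapter_id` are hypothetical helpers rather than functions from `process_chapters.py`.

```python
import re

# Pattern for the naming convention, e.g. "book-1-chapter-01.md".
CHAPTER_FILE = re.compile(r"^book-(\d+)-chapter-(\d+)\.md$")


def chapter_sort_key(filename):
    """Sort key: introduction first (assumed), then by (book, chapter) number."""
    if filename == "introduction.md":
        return (0, 0)
    match = CHAPTER_FILE.match(filename)
    if match is None:
        raise ValueError(f"unexpected source file name: {filename}")
    return (int(match.group(1)), int(match.group(2)))


def chapter_id(filename):
    """Strip the .md suffix to get an identifier such as book-1-chapter-05."""
    return filename[:-3] if filename.endswith(".md") else filename


files = [
    "book-2-chapter-01.md",
    "introduction.md",
    "book-1-chapter-11.md",
    "book-1-chapter-02.md",
]
ordered = sorted(files, key=chapter_sort_key)
print([chapter_id(name) for name in ordered])
```

Sorting on the parsed integers rather than on the raw strings keeps the order correct even if a file name is ever written without zero padding.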
+
+### Guidelines (`artifacts/guidelines/`)
+
+Two hand-written reference documents:
+
+- **`extraction-rules.md`** — What constitutes an entity, granularity rules,
+  naming conventions, quality checks.
+- **`mapping-rules.md`** — How to map entities to VSM systems, what
+  constitutes strong/moderate/weak mapping strength.
+
+These are loaded into `infospace-guidelines`.
+
+### VSM reference (`artifacts/vsm-reference/`)
+
+- **`vsm-framework.md`** — Complete description of Beer's VSM (S1-S5, S3*,
+  recursion, variety, viability, attenuation/amplification, algedonic
+  signals, autonomy). Includes economic interpretations for each system.
+
+Loaded into `infospace-vsm-reference`.
+
+---
+
+## 6. The Pipeline Script
+
+`process_chapters.py` orchestrates everything. It:
+
+1. Initialises the artifact repository (SQLite) and information spaces
+2. Loads all static artifacts (templates, guidelines, VSM reference)
+3. For each chapter, runs the three-stage pipeline
+4. Optionally calls an LLM to auto-generate outputs
+5. Records dependency edges in the graph
+6. Commits results to git
+
+### Running a single chapter
+
+```bash
+# Manual mode (writes prompts, waits for you to provide output files):
+python process_chapters.py --chapter book-1-chapter-05 --no-commit
+
+# Automatic mode via OpenRouter (recommended — fast, real token counts):
+python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
+
+# Automatic mode via Claude Code CLI:
+python process_chapters.py --chapter book-1-chapter-05 --provider claude-code --no-commit
+
+# With a specific model:
+python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --model anthropic/claude-haiku-4-5-20251001 --no-commit
+```
+
+### Running a whole book
+
+```bash
+python process_chapters.py --book 1 --provider openrouter --no-commit
+```
+
+### Running all chapters
+
+```bash
+python process_chapters.py --all --provider openrouter --no-commit
+```
+
+### Checking progress
+
+```bash
+python process_chapters.py --list
+```
+
+Prints a table showing which chapters have completed each stage:
+
+```
+Available chapters (35):
+
+  Chapter                        Entities     Mappings     Analysis
+  ------------------------------ ------------ ------------ ------------
+  book-1-chapter-01              done         done         done
+  book-1-chapter-02              done         done         done
+  book-1-chapter-03              done         done         done
+  book-1-chapter-04              done         done         done
+  book-1-chapter-05              -            -            -
+  ...
+```
+
+### Assessing metrics
+
+After processing a batch of chapters, run the metrics assessment:
+
+```bash
+python process_chapters.py --metrics --provider openrouter --no-commit
+```
+
+This concatenates all completed analyses and asks the LLM to evaluate
+coverage, consistency, and completeness.
+
+### Dependency statistics
+
+```bash
+python process_chapters.py --stats
+```
+
+---
+
+## 7. How the LLM Integration Works
+
+The pipeline uses MarkiTect's `markitect.llm` module, which provides two
+adapter backends that implement the `LLMAdapter` interface:
+
+| Backend | How it works | Pros | Cons |
+|---------|-------------|------|------|
+| `openrouter` | HTTP POST to OpenRouter API | Fast, real token counts, any model | Needs API key |
+| `claude-code` | Shells out to `claude --print` | No API key needed if CLI installed | Slower, estimated token counts |
+
+### API key setup (OpenRouter)
+
+Place your key in one of these locations (checked in order):
+
+1. Pass `--api-key` on the command line (not yet implemented in the CLI)
+2. Set `OPENROUTER_API_KEY` environment variable
+3. Create `apikey-openrouter.txt` in the project root (git-ignored)
+
+### What happens per stage
+
+1. The pipeline **resolves** macro placeholders by looking up artifacts
+   in the repository
+2. It **compiles** the template into a complete prompt (macros replaced
+   with real content)
+3. It writes the compiled prompt to `output//-prompt.md`
+   for inspection
+4. If an LLM adapter is configured and no output file exists yet, it
+   **executes** the prompt and writes the result
+5. The output is **stored** as a generated artifact in the repository
+6. Dependency edges are **recorded** in the graph
+
+---
+
+## 8. Tracking History with Git
+
+Every processed chapter produces a git commit containing:
+
+- Compiled prompts (`*-prompt.md`) — so you can audit exactly what was sent
+- Generated outputs (`*-entities.md`, `*-mappings.md`, `*-analysis.md`)
+
+This means:
+
+- `git log` shows the chronological order of processing
+- `git diff` between commits shows what each chapter contributed
+- You can `git bisect` to find where quality degraded
+- You can revert a chapter and re-process it with different settings
+
+To let the script auto-commit (default):
+
+```bash
+python process_chapters.py --chapter book-1-chapter-05 --provider openrouter
+```
+
+To commit manually after reviewing:
+
+```bash
+python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
+# review output/entities/book-1-chapter-05-entities.md etc.
+git add examples/infospace-with-history/output/
+git commit -m "infospace: process book-1-chapter-05"
+```
+
+---
+
+## 9. Cost and Performance
+
+From our measurements processing chapters 3 and 4:
+
+| | Claude Code CLI | OpenRouter |
+|---|---|---|
+| Time per chapter | ~5 minutes | ~2 minutes |
+| Token counts | Estimated (4 chars/tok) | Real (from API) |
+| Cost per chapter | ~$0.35 est. | ~$0.07 est. |
+
+**Projected cost for all 35 chapters via OpenRouter:** ~$2.50
+(varies by chapter length; Book V chapters are longer).
+
+To reduce costs further, use a cheaper model:
+
+```bash
+--provider openrouter --model anthropic/claude-haiku-4-5-20251001
+```
+
+---
+
+## 10. Completing the Remaining Chapters
+
+As of now, 4 of 35 chapters are processed (Book I, Chapters 1-4). Here is
+how to complete the rest.
+
+### Step-by-step
+
+**1. Process remaining Book I chapters (5-11):**
+
+```bash
+python process_chapters.py --book 1 --provider openrouter --no-commit
+```
+
+Already-processed chapters are skipped (their output files exist).
+
+**2. Process Books II-V:**
+
+```bash
+python process_chapters.py --book 2 --provider openrouter --no-commit
+python process_chapters.py --book 3 --provider openrouter --no-commit
+python process_chapters.py --book 4 --provider openrouter --no-commit
+python process_chapters.py --book 5 --provider openrouter --no-commit
+```
+
+Or all at once:
+
+```bash
+python process_chapters.py --all --provider openrouter --no-commit
+```
+
+**3. Run metrics after each book (or at the end):**
+
+```bash
+python process_chapters.py --metrics --provider openrouter --no-commit
+```
+
+**4. Commit the results:**
+
+```bash
+git add examples/infospace-with-history/output/
+git commit -m "infospace: process all remaining chapters"
+```
+
+**5. Review the metrics report:**
+
+Open `output/metrics/metrics-report.md`. It will show:
+
+- Which VSM concepts (S1-S5, recursion, variety, etc.) now have mappings
+- Total entity and mapping counts
+- Consistency scores
+- Recommendations for gaps
+
+### Expected progression
+
+| After | Chapters | Expected coverage |
+|-------|----------|-------------------|
+| Book I (11 ch.) | 11/35 | S1, S2, S4 strong; S3 emerging |
+| Books I-II (16 ch.) | 16/35 | S3 (capital control) covered |
+| Books I-III (20 ch.) | 20/35 | Historical patterns add depth |
+| Books I-IV (30 ch.) | 30/35 | S5 (policy, mercantilism) emerging |
+| All (35 ch.) | 35/35 | Full coverage, S3* and algedonic signals likely from Book V |
+
+Book V (public revenue, taxation, sovereign duties) is expected to
+fill the remaining gaps in S3*, S5, and regulatory concepts.
+
+---
+
+## 11. Quality Improvement Loop
+
+The infospace is designed to be **iteratively refined**:
+
+1. **Process chapters** — run the pipeline
+2. **Assess metrics** — identify gaps in VSM coverage and consistency
+3. **Refine guidelines** — update `extraction-rules.md` or
+   `mapping-rules.md` to address identified weaknesses
+4. **Re-process** — delete output files for specific chapters and re-run
+   with improved guidelines
+5. **Compare** — use git diff to see how the refined guidelines changed
+   the output
+
+Example: if metrics show that S3* (Audit) is consistently missed, you
+could add a paragraph to `extraction-rules.md` explicitly asking the LLM
+to look for audit, inspection, and oversight mechanisms.
+
+To re-process a specific chapter:
+
+```bash
+rm examples/infospace-with-history/output/entities/book-1-chapter-03-entities.md
+rm examples/infospace-with-history/output/mappings/book-1-chapter-03-mappings.md
+rm examples/infospace-with-history/output/analyses/book-1-chapter-03-analysis.md
+python process_chapters.py --chapter book-1-chapter-03 --provider openrouter --no-commit
+```
+
+---
+
+## 12. Infrastructure Issues Found
+
+During development we documented three issues with the MarkiTect
+infrastructure in `INFRA-TASKS.md`:
+
+1. **Artifact repo doesn't store content** — the resolver returns
+   placeholder text; the pipeline works around this with a local cache.
+2. **ContentMacro `raw_text` defaults to `""`** — causes silent data
+   corruption when macros are constructed programmatically.
+3. **No `@{target}` syntax in TemplateAnalyzer** — macros must be
+   constructed manually rather than auto-detected from template text.
+
+These are intentionally not fixed in this example (the constraint was
+"no changes to markitect infrastructure"). They are tracked for future
+improvement, after which the experiment can be re-run.
+
+---
+
+## 13. Adapting This Pattern to Your Own Project
+
+To build your own infospace using this pattern:
+
+1. **Choose your source corpus** — any collection of documents you want
+   to transform into structured knowledge.
+2. **Define your target ontology** — what concepts, relationships, or
+   categories you want to extract (our VSM is just one example).
+3. **Write schemas** — markdown documents defining the required sections
+   and validation rules for each output type.
+4. **Write extraction guidelines** — rules that tell the LLM what to
+   look for and how to handle edge cases.
+5. **Create prompt templates** — use `@{macro}` syntax to inject source
+   text and guidelines at compile time.
+6. **Build your pipeline** — follow `process_chapters.py` as a reference
+   for loading artifacts, resolving templates, and calling the LLM.
+7. **Process incrementally** — work through your corpus one document at a
+   time, tracking everything in git.
+8. **Measure and refine** — define metrics, assess them periodically,
+   and update your guidelines when gaps appear.
+
+The key architectural insight is that **schemas and guidelines are
+artifacts** — they live in the same repository as your source text and
+can be versioned, diffed, and refined just like code.
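
For readers starting their own pipeline, the incremental, skip-if-done behaviour described in this tutorial can be sketched in a few lines. This is not the code in `process_chapters.py`: `process_chapter`, `fake_llm`, and the `STAGES` table are hypothetical, and the sketch simplifies the real dependency chain by feeding each stage only the previous stage's output. The output paths follow the `output/` layout and file names shown in this tutorial.

```python
import tempfile
from pathlib import Path

# Stage name -> (output subdirectory, file suffix), following the tutorial's layout.
STAGES = {
    "entities": ("entities", "entities"),
    "mappings": ("mappings", "mappings"),
    "analysis": ("analyses", "analysis"),
}


def fake_llm(prompt):
    """Stand-in for a real LLM call; returns a deterministic string."""
    return f"generated from {len(prompt)} prompt characters\n"


def process_chapter(chapter_id, chapter_text, output_root, llm=fake_llm):
    """Run the three stages in order, skipping any stage whose output file exists."""
    ran = []
    previous = chapter_text
    for stage, (subdir, suffix) in STAGES.items():
        out_path = Path(output_root) / subdir / f"{chapter_id}-{suffix}.md"
        if out_path.exists():
            # Reuse the earlier result as input to the next stage.
            previous = out_path.read_text(encoding="utf-8")
            continue
        out_path.parent.mkdir(parents=True, exist_ok=True)
        result = llm(f"[{stage}]\n{previous}")
        out_path.write_text(result, encoding="utf-8")
        previous = result
        ran.append(stage)
    return ran


with tempfile.TemporaryDirectory() as root:
    first = process_chapter("book-1-chapter-05", "chapter text", root)
    second = process_chapter("book-1-chapter-05", "chapter text", root)
    print(first)   # stages executed on the first run
    print(second)  # stages executed on the second run
```

Because existence of the output file is the only completion marker in this sketch, deleting a chapter's output files is enough to force it to be re-processed on the next run.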