From 8f00fa20182bf7967eee8f1fcceb356078da2384 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Sun, 22 Feb 2026 23:31:38 +0100
Subject: [PATCH] docs(tutorial): update all commands to use markitect infospace CLI (S3.4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace all process_chapters.py references throughout the tutorial with
the correct markitect infospace subcommands:

- §2 Project layout: remove process_chapters.py, add LAYERED-DEVELOPMENT.md
- §7 Processing: --chapter → process "glob", --book N → "book-N-*.md",
  --list → status/entities, --archive-entity → documented manual step
- §8 Check: remove incorrect --provider flag; note checks are deterministic
- §9 Viability: real output from full 988-entity corpus (Viable: YES)
- §10 History: real snapshot table; add --metric flag example
- §10 Git tracking: remove process_chapters.py from commit example
- §11 Cost: update openrouter/free example command
- §12 Completion: rewrite with actual observed metric progression table
- §14 Quality loop: update all commands; add archive-entity manual procedure
- §15 Artifact DB: --all without --provider = dry-run (no LLM calls)
- §16 Adapting: update step 6 and 7 to new CLI

Co-Authored-By: Claude Sonnet 4.6
---
 examples/infospace-with-history/TUTORIAL.md | 253 ++++++++++++--------
 1 file changed, 152 insertions(+), 101 deletions(-)

diff --git a/examples/infospace-with-history/TUTORIAL.md b/examples/infospace-with-history/TUTORIAL.md
index 904c8cf5..358759c9 100644
--- a/examples/infospace-with-history/TUTORIAL.md
+++ b/examples/infospace-with-history/TUTORIAL.md
@@ -45,11 +45,11 @@ metrics — it is fit for purpose as an explanatory tool.
 ```
 examples/infospace-with-history/
 │
-├── infospace.yaml            # Declarative infospace configuration (NEW)
+├── infospace.yaml            # Declarative infospace configuration
 ├── README.md
 ├── TUTORIAL.md               # This file
 ├── INFRA-TASKS.md            # Infrastructure issues found during the experiment
-├── process_chapters.py       # Pipeline script (chapter processing)
+├── LAYERED-DEVELOPMENT.md    # Concept for L2–L4 entity classification and modelling
 ├── infospace.db              # SQLite artifact database (generated, not in git)
 │
 ├── schemas/                  # Output structure definitions
@@ -301,61 +301,79 @@ Named `book-1-chapter-01.md` through `book-5-chapter-03.md`.

 ## 7. Processing Chapters

-`process_chapters.py` orchestrates the three-stage pipeline. It initialises
-the artifact repository, loads static artifacts, runs entity extraction →
-VSM mapping → analysis synthesis, and commits each chapter to git.
+`markitect infospace process` orchestrates the three-stage pipeline declared
+in `infospace.yaml`. It runs entity extraction → VSM mapping → analysis
+synthesis for each source file, and commits each chapter to git.

 ### Single chapter

 ```bash
-# Manual mode (writes prompts, awaits output files):
-python process_chapters.py --chapter book-1-chapter-05 --no-commit
+# Dry run — loads existing outputs only, no LLM calls:
+markitect infospace process "book-1-chapter-05.md"

-# Auto mode via OpenRouter (free models available):
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter
+# Process via OpenRouter (free models available):
+markitect infospace process "book-1-chapter-05.md" --provider openrouter

 # With a specific free model:
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
     --provider openrouter --model meta-llama/llama-4-maverick:free
+
+# Skip git commit after processing:
+markitect infospace process "book-1-chapter-05.md" \
+    --provider openrouter --no-commit
 ```

+The GLOB_PATTERN is matched against the `sources` directory declared in
+`infospace.yaml`. Already-processed chapters are skipped automatically —
+their output files already exist on disk.
+
 ### Whole book or all chapters

 ```bash
-python process_chapters.py --book 1 --provider openrouter
-python process_chapters.py --all --provider openrouter
+# Process all chapters of Book 1:
+markitect infospace process "book-1-*.md" --provider openrouter
+
+# Process all 35 source files:
+markitect infospace process --all --provider openrouter
+
+# Process all chapters and run quality checks after each one:
+markitect infospace process --all --provider openrouter --check-after-each
 ```

 ### Check progress

 ```bash
-python process_chapters.py --list
+markitect infospace status
 ```

 ```
-Available chapters (35):
-
-  Chapter                        Entities     Mappings     Analysis
-  ------------------------------ ------------ ------------ ------------
-  book-1-chapter-01              done (13)    done         done
-  book-1-chapter-02              done (7)     done         done
-  ...
-
-  Canonical entity set: 109 unique entities
+Infospace: The Wealth of Nations
+Domain: Classical Economics
+Entities: 988
+Domains: Accumulation, Consumption, Distribution, Exchange,
+         General Theory, Production, Regulation
+Disciplines: Viable System Model
+Last evaluated: 2026-02-19T21:54:44
 ```

+```bash
+markitect infospace entities
+```
+
+Lists all canonical entities with domain, source chapter, and word count.
+
 ### Entity lifecycle

-Entities in the canonical set are **never silently deleted**. Retire
-an entity by archiving it with a documented reason:
+Entities in the canonical set are **never silently deleted**. To retire
+an entity, move it to `output/entities/archive/.md` and add a
+dated archive header:

-```bash
-python process_chapters.py --archive-entity enlarged-monopoly \
-    --reason "Subsumed by monopoly-price — same market distortion"
+```markdown
+
 ```

-The archived file moves to `output/entities/archive/.md` with a
-dated header, preserving the intellectual history of every decision.
+Then commit the removal so the intellectual history of every decision
+is preserved in git.

 ---

@@ -385,43 +403,46 @@ VSM relevance. Results are written to `output/evaluations/`.

 ```bash
 # Run all five collection checks:
-markitect infospace check --provider openrouter
+markitect infospace check

 # Run individual checks:
-markitect infospace check redundancy     # C1: Are any entities synonymous?
-markitect infospace check coverage       # C2: Which domain × VSM cells are empty?
-markitect infospace check coherence      # C3: Is the entity graph well-connected?
-markitect infospace check consistency    # C4: Are there circular definitions?
-markitect infospace check granularity    # C5: Is abstraction level balanced?
+markitect infospace check --concern redundancy   # C1: Are any entities synonymous?
+markitect infospace check --concern coverage     # C2: Which domain × chapter cells are empty?
+markitect infospace check --concern coherence    # C3: Is the entity graph well-connected?
+markitect infospace check --concern consistency  # C4: Are there circular definitions?
+markitect infospace check --concern granularity  # C5: Is abstraction level balanced?
 ```

+Collection checks are deterministic (embeddings, graph analysis, FCA) and
+require no LLM provider.
+
 Each check uses the platform's embedding, graph analysis, and FCA
 infrastructure. Results are written to `output/metrics/` and a new
 snapshot is appended to `metrics-history.yaml`.

-Sample output:
+Sample output (full corpus, 988 entities):

 ```
-Running collection checks on 109 entities...
+Collection checks — 988 entities

 C1 — redundancy
-  redundancy_ratio: 0.0183
-  high_similarity_pairs: 2
+  redundancy_ratio: 0.0061
+  similar_pairs: 3 candidates (word-overlap > 0.85)

 C2 — coverage
-  coverage_ratio: 0.4286
-  empty_cells: [['Regulation', 'S3*'], ['Historical', 'S5']]
+  coverage_ratio: 0.619
+  domain_densities: Exchange 0.85, Regulation 0.85, General Theory 0.73 …
+  density_std: 0.211
   cross_cutting_ratio: 0.714

 C3 — coherence
-  coherence_components: 1
-  modularity: 0.412
+  connected_components: 0 (no cross-reference graph built yet)
+  modularity: 0.0

 C4 — consistency
-  consistency_cycles: 0
-  grounding_ratio: 0.94
+  cycle_count: 0

 C5 — granularity
-  granularity_entropy: 2.69
+  granularity_entropy: 2.953
 ```

 ---
@@ -436,20 +457,21 @@ Compares the latest metrics against the thresholds declared in
 `infospace.yaml`:

 ```
-Metric                     Value      Threshold    Status
------------------------------------------------------------
-redundancy_ratio           0.0183     max=0.10     PASS
-coverage_ratio             0.4286     min=0.50     FAIL
-coherence_components       1          max=3        PASS
-consistency_cycles         0          max=0        PASS
-granularity_entropy        2.6900     min=1.0      PASS
+Metric                       Value      Threshold      Status
+---------------------------------------------------------------
+redundancy_ratio             0.0059     max=0.1        PASS
+coverage_ratio               0.6190     min=0.4        PASS
+coherence_components         0.0000     max=3          PASS
+consistency_cycles           0.0000     max=0          PASS
+granularity_entropy          2.9533     min=1.0        PASS

-Viable: NO (4/5 thresholds met)
+Viable: YES (5/5 thresholds met)
 ```

-Coverage is currently failing (42% < 50% threshold) because only 9 of
-35 chapters have been processed. Once more chapters are done, coverage
-will rise.
+During early processing (first few books), coverage will fall and
+then stabilise as the domain × chapter matrix fills in. The threshold
+of 0.40 reflects realistic expectations for a multi-book corpus where
+some domains are naturally sparse in certain chapters.

 ### Metrics history

@@ -460,9 +482,19 @@ markitect infospace history
 Shows how metrics evolved across runs:

 ```
-Snapshot     Date         Entities  coverage  redundancy  entropy
--------------------------------------------------------------
-6ba48eb2     2026-02-19   85        0.361     0.000       2.687
+History: 36 snapshot(s)
+
+#    Date                  Entities  Metrics
+------------------------------------------
+1    2026-02-19T13:07:13   18        6
+2    2026-02-19T13:16:36   43        6
+...
+36   2026-02-19T21:54:44   1021      6
+```
+
+```bash
+# Show trend for a specific metric:
+markitect infospace history --metric coverage_ratio
 ```

 ---
@@ -483,16 +515,13 @@ This means:
 - You can `git bisect` to find where quality degraded
 - You can revert a chapter and re-process with improved guidelines

-The `clean-example-history` branch in this repository demonstrates the
-intended structure: each chapter is a single, self-contained commit.
-Use it as a reference for how the infospace grew step by step.
-
-To commit manually after reviewing:
+To review before committing:

 ```bash
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
+markitect infospace process "book-1-chapter-05.md" \
+    --provider openrouter --no-commit
 # review output/entities/ and output/mappings/
-git add examples/infospace-with-history/output/
+git add output/
 git commit -m "infospace: process book-1-chapter-05"
 ```

@@ -519,7 +548,7 @@ Use `openrouter/free` to automatically select from whichever free model
 is available:

 ```bash
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
     --provider openrouter --model openrouter/free
 ```

@@ -531,47 +560,53 @@ when running inside a Claude Code session due to nested session restrictions.

 ---

-## 12. Completing the Remaining Chapters
+## 12. Processing the Full Corpus

-As of writing, 9 of 35 chapters are processed (Book I, Chapters 1–9).
+All 35 chapters have been processed in this example. The commands below
+show how the full run was executed — use them as a template for your own
+corpus.

-**Process Book I remainder:**
+**Process one book at a time:**

 ```bash
 export OPENROUTER_API_KEY=$(cat apikey-openrouter.txt | tr -d '[:space:]')
-git checkout clean-example-history
-python process_chapters.py --book 1 --provider openrouter
+
+markitect infospace process "book-1-*.md" --provider openrouter
+markitect infospace process "book-2-*.md" --provider openrouter
+markitect infospace process "book-3-*.md" --provider openrouter
+markitect infospace process "book-4-*.md" --provider openrouter
+markitect infospace process "book-5-*.md" --provider openrouter
 ```

-Already-processed chapters are skipped — their chapter view files exist.
-The `@{existing_entities}` macro ensures the LLM only extracts genuinely
-new entities.
+Already-processed chapters are skipped automatically — their output files
+exist on disk. The `@{existing_entities}` macro ensures the LLM only
+extracts genuinely new entities.

-**Process Books II–V:**
+**Or process everything at once:**

 ```bash
-python process_chapters.py --book 2 --provider openrouter
-python process_chapters.py --book 3 --provider openrouter
-python process_chapters.py --book 4 --provider openrouter
-python process_chapters.py --book 5 --provider openrouter
+markitect infospace process --all --provider openrouter
 ```

 **Run collection checks after each book:**

 ```bash
-markitect infospace check --provider openrouter
+markitect infospace check
 markitect infospace viability
 ```

-**Expected progression:**
+**Observed metric progression (actual results):**

-| After | Chapters | Expected coverage |
-|-------|----------|-------------------|
-| Book I (11 ch.) | 11/35 | S1, S2, S4 strong; S3 emerging |
-| Books I–II (16 ch.) | 16/35 | S3 (capital control) covered |
-| Books I–III (20 ch.) | 20/35 | Historical patterns add depth |
-| Books I–IV (30 ch.) | 30/35 | S5 (policy, mercantilism) emerging |
-| All (35 ch.) | 35/35 | Full coverage; S3* and algedonic signals from Book V |
+| After | Entities | coverage_ratio | entropy |
+|-------|----------|----------------|---------|
+| Book I (11 ch.) | ~236 | 0.51 | 2.77 |
+| Books I–II (16 ch.) | ~348 | 0.56 | 2.82 |
+| Books I–III (20 ch.) | ~456 | 0.59 | 2.97 |
+| Books I–IV (30 ch.) | ~930 | 0.51 | 2.94 |
+| All (35 ch.) | 988 | **0.62** | 2.95 |
+
+Coverage dips in Books IV–V as policy-heavy chapters introduce domains
+that are sparse in earlier books, then recovers as the matrix fills in.

 ---

@@ -610,9 +645,9 @@ dependent mappings are flagged for re-evaluation.

 The infospace is designed to be **iteratively refined**:

-1. **Process chapters** — run the pipeline
+1. **Process chapters** — `markitect infospace process "book-1-*.md" --provider openrouter`
 2. **Evaluate** — `markitect infospace evaluate --provider openrouter`
-3. **Check** — `markitect infospace check --provider openrouter`
+3. **Check** — `markitect infospace check`
 4. **Review viability** — `markitect infospace viability`
 5. **Refine guidelines** — update `extraction-rules.md` or `mapping-rules.md`
    to address identified weaknesses
@@ -626,18 +661,31 @@ audit, inspection, and oversight mechanisms.
 To re-process a specific chapter:

 ```bash
+# Delete stage outputs for that chapter (not canonical entity files):
 rm -f output/entities/book-1-chapter-03-entities.md
 rm -f output/mappings/book-1-chapter-03-mappings.md
 rm -f output/analyses/book-1-chapter-03-analysis.md
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+
+# Re-run:
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```

-Never silently delete canonical entity files. Archive them instead:
+Never silently delete canonical entity files. Archive them instead by
+moving to `output/entities/archive/` with a dated comment header, then
+re-process the chapter so the pipeline can extract a replacement:

 ```bash
-python process_chapters.py --archive-entity extent-of-the-market \
-    --reason "Subsumed by market-price and effectual-demand"
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+# Archive the entity manually:
+mkdir -p output/entities/archive
+mv output/entities/extent-of-the-market.md output/entities/archive/
+# Add header to the archived file explaining why
+echo '' \
+  | cat - output/entities/archive/extent-of-the-market.md > /tmp/tmp.md \
+  && mv /tmp/tmp.md output/entities/archive/extent-of-the-market.md
+
+# Delete the chapter entity view so the chapter re-runs:
+rm -f output/entities/book-1-chapter-03-entities.md
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```

 ---
@@ -651,9 +699,12 @@ it is fully derived from the markdown files that are tracked.
 To regenerate it after a fresh clone (no LLM calls needed):

 ```bash
-python process_chapters.py --all --no-commit
+markitect infospace process --all
 ```

+Without `--provider`, the command runs in dry-run mode: it loads existing
+output files from disk into the database without making any LLM calls.
+
 ---

 ## 16. Adapting This Pattern to Your Own Project
@@ -665,9 +716,9 @@ To build your own infospace:
 3. Write extraction guidelines that tell the LLM what to look for
 4. Create prompt templates using `@{macro}` syntax
 5. Populate `artifacts/sources/` with your source corpus
-6. Run `process_chapters.py` (or your equivalent pipeline script)
-7. Evaluate with `markitect infospace evaluate` and `check`
-8. Review `markitect infospace viability` against your thresholds
+6. `markitect infospace process --all --provider openrouter`
+7. `markitect infospace check` and `markitect infospace evaluate --provider openrouter`
+8. `markitect infospace viability` — review against your thresholds
 9. Iterate: refine guidelines, re-process, re-evaluate
 10. Once viable, use as a discipline for a new infospace
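
The viability review in step 8 amounts to comparing each metric against a `min=` or `max=` bound. The following is a minimal Python sketch of that comparison, using the metric values and thresholds shown in the new viability table in this patch. The `Threshold` class and `check_viability` function are hypothetical names chosen for illustration; they are not part of the markitect API, and the real command may handle missing metrics or rounding differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Threshold:
    """Hypothetical bound on one metric (not markitect's real data model)."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def passes(self, value: float) -> bool:
        # A value passes when it violates neither the lower nor the upper bound.
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


def check_viability(
    metrics: Dict[str, float], thresholds: Dict[str, Threshold]
) -> Tuple[bool, Dict[str, bool]]:
    """Return (viable, per-metric results); a missing metric counts as a failure."""
    results = {
        name: name in metrics and bound.passes(metrics[name])
        for name, bound in thresholds.items()
    }
    return all(results.values()), results


# Bounds and values copied from the viability table in this patch.
thresholds = {
    "redundancy_ratio": Threshold(maximum=0.1),
    "coverage_ratio": Threshold(minimum=0.4),
    "coherence_components": Threshold(maximum=3),
    "consistency_cycles": Threshold(maximum=0),
    "granularity_entropy": Threshold(minimum=1.0),
}
metrics = {
    "redundancy_ratio": 0.0059,
    "coverage_ratio": 0.6190,
    "coherence_components": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.9533,
}

viable, results = check_viability(metrics, thresholds)
passed = sum(results.values())
print(f"Viable: {'YES' if viable else 'NO'} ({passed}/{len(results)} thresholds met)")
```

Treating a missing metric as a failure is a design choice of this sketch: it keeps a partially evaluated infospace from being reported as viable by accident.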