docs(tutorial): update all commands to use markitect infospace CLI (S3.4)

Replace all process_chapters.py references throughout the tutorial with
the correct markitect infospace subcommands:

- §2 Project layout: remove process_chapters.py, add LAYERED-DEVELOPMENT.md
- §7 Processing: --chapter → process "glob", --book N → "book-N-*.md",
  --list → status/entities, --archive-entity → documented manual step
- §8 Check: remove incorrect --provider flag; note checks are deterministic
- §9 Viability: real output from full 988-entity corpus (Viable: YES)
- §10 History: real snapshot table; add --metric flag example
- §10 Git tracking: remove process_chapters.py from commit example
- §11 Cost: update openrouter/free example command
- §12 Completion: rewrite with actual observed metric progression table
- §14 Quality loop: update all commands; add archive-entity manual procedure
- §15 Artifact DB: --all without --provider = dry-run (no LLM calls)
- §16 Adapting: update step 6 and 7 to new CLI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 23:31:38 +01:00
parent c861520ccd
commit 8f00fa2018
1 changed file with 152 additions and 101 deletions
--- a/examples/infospace-with-history/TUTORIAL.md
+++ b/examples/infospace-with-history/TUTORIAL.md
@@ -45,11 +45,11 @@ metrics — it is fit for purpose as an explanatory tool.
 ```
 examples/infospace-with-history/
 │
-├── infospace.yaml              # Declarative infospace configuration (NEW)
+├── infospace.yaml              # Declarative infospace configuration
 ├── README.md
 ├── TUTORIAL.md                 # This file
 ├── INFRA-TASKS.md              # Infrastructure issues found during the experiment
-├── process_chapters.py         # Pipeline script (chapter processing)
+├── LAYERED-DEVELOPMENT.md      # Concept for L2–L4 entity classification and modelling
 ├── infospace.db                # SQLite artifact database (generated, not in git)
 │
 ├── schemas/                    # Output structure definitions
@@ -301,61 +301,79 @@ Named `book-1-chapter-01.md` through `book-5-chapter-03.md`.
 ## 7. Processing Chapters
-`process_chapters.py` orchestrates the three-stage pipeline. It initialises
+`markitect infospace process` orchestrates the three-stage pipeline declared
-the artifact repository, loads static artifacts, runs entity extraction →
+in `infospace.yaml`. It runs entity extraction → VSM mapping → analysis
-VSM mapping → analysis synthesis, and commits each chapter to git.
+synthesis for each source file, and commits each chapter to git.
 ### Single chapter
 ```bash
-# Manual mode (writes prompts, awaits output files):
+# Dry run — loads existing outputs only, no LLM calls:
-python process_chapters.py --chapter book-1-chapter-05 --no-commit
+markitect infospace process "book-1-chapter-05.md"
-# Auto mode via OpenRouter (free models available):
+# Process via OpenRouter (free models available):
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter
+markitect infospace process "book-1-chapter-05.md" --provider openrouter
 # With a specific free model:
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
  --provider openrouter --model meta-llama/llama-4-maverick:free
+# Skip git commit after processing:
+markitect infospace process "book-1-chapter-05.md" \
+  --provider openrouter --no-commit
 ```
+The GLOB_PATTERN is matched against the `sources` directory declared in
+`infospace.yaml`. Already-processed chapters are skipped automatically —
+their output files already exist on disk.
 ### Whole book or all chapters
 ```bash
-python process_chapters.py --book 1 --provider openrouter
+# Process all chapters of Book 1:
-python process_chapters.py --all --provider openrouter
+markitect infospace process "book-1-*.md" --provider openrouter
+# Process all 35 source files:
+markitect infospace process --all --provider openrouter
+# Process all chapters and run quality checks after each one:
+markitect infospace process --all --provider openrouter --check-after-each
 ```
 ### Check progress
 ```bash
-python process_chapters.py --list
+markitect infospace status
 ```
 ```
-Available chapters (35):
+Infospace: The Wealth of Nations
-
+Domain:    Classical Economics
-  Chapter                        Entities     Mappings     Analysis
+Entities:  988
-  ------------------------------ ------------ ------------ ------------
+Domains:   Accumulation, Consumption, Distribution, Exchange,
-  book-1-chapter-01              done (13)    done         done
+           General Theory, Production, Regulation
-  book-1-chapter-02              done (7)     done         done
+Disciplines: Viable System Model
-  ...
+Last evaluated: 2026-02-19T21:54:44
-Canonical entity set: 109 unique entities
 ```
+```bash
+markitect infospace entities
+```
+Lists all canonical entities with domain, source chapter, and word count.
 ### Entity lifecycle
-Entities in the canonical set are **never silently deleted**. Retire
+Entities in the canonical set are **never silently deleted**. To retire
-an entity by archiving it with a documented reason:
+an entity, move it to `output/entities/archive/<slug>.md` and add a
+dated archive header:
-```bash
+```markdown
-python process_chapters.py --archive-entity enlarged-monopoly \
+<!-- archived: 2026-02-22 reason="Subsumed by monopoly-price — same market distortion" -->
-  --reason "Subsumed by monopoly-price — same market distortion"
 ```
-The archived file moves to `output/entities/archive/<slug>.md` with a
+Then commit the removal so the intellectual history of every decision
-dated header, preserving the intellectual history of every decision.
+is preserved in git.
 ---
@@ -385,43 +403,46 @@ VSM relevance. Results are written to `output/evaluations/`.
 ```bash
 # Run all five collection checks:
-markitect infospace check --provider openrouter
+markitect infospace check
 # Run individual checks:
-markitect infospace check redundancy   # C1: Are any entities synonymous?
+markitect infospace check --concern redundancy   # C1: Are any entities synonymous?
-markitect infospace check coverage     # C2: Which domain × VSM cells are empty?
+markitect infospace check --concern coverage     # C2: Which domain × chapter cells are empty?
-markitect infospace check coherence    # C3: Is the entity graph well-connected?
+markitect infospace check --concern coherence    # C3: Is the entity graph well-connected?
-markitect infospace check consistency  # C4: Are there circular definitions?
+markitect infospace check --concern consistency  # C4: Are there circular definitions?
-markitect infospace check granularity  # C5: Is abstraction level balanced?
+markitect infospace check --concern granularity  # C5: Is abstraction level balanced?
 ```
+Collection checks are deterministic (embeddings, graph analysis, FCA) and
+require no LLM provider.
 Each check uses the platform's embedding, graph analysis, and FCA
 infrastructure. Results are written to `output/metrics/` and a new
 snapshot is appended to `metrics-history.yaml`.
-Sample output:
+Sample output (full corpus, 988 entities):
 ```
-Running collection checks on 109 entities...
+Collection checks — 988 entities
  C1 — redundancy
-    redundancy_ratio: 0.0183
+    redundancy_ratio: 0.0061
-    high_similarity_pairs: 2
+    similar_pairs: 3 candidates (word-overlap > 0.85)
  C2 — coverage
-    coverage_ratio: 0.4286
+    coverage_ratio: 0.619
-    empty_cells: [['Regulation', 'S3*'], ['Historical', 'S5']]
+    domain_densities: Exchange 0.85, Regulation 0.85, General Theory 0.73 …
    density_std: 0.211  cross_cutting_ratio: 0.714
  C3 — coherence
-    coherence_components: 1
+    connected_components: 0   (no cross-reference graph built yet)
-    modularity: 0.412
+    modularity: 0.0
  C4 — consistency
-    consistency_cycles: 0
+    cycle_count: 0
    grounding_ratio: 0.94
  C5 — granularity
-    granularity_entropy: 2.69
+    granularity_entropy: 2.953
 ```
 ---
@@ -436,20 +457,21 @@ Compares the latest metrics against the thresholds declared in
 `infospace.yaml`:
 ```
-Metric                         Value    Threshold   Status
+Metric                            Value       Threshold   Status
-----------------------------------------------------------
+---------------------------------------------------------------
-redundancy_ratio               0.0183    max=0.10     PASS
+redundancy_ratio                 0.0059         max=0.1     PASS
-coverage_ratio                 0.4286    min=0.50     FAIL
+coverage_ratio                   0.6190         min=0.4     PASS
-coherence_components           1         max=3        PASS
+coherence_components             0.0000           max=3     PASS
-consistency_cycles             0         max=0        PASS
+consistency_cycles               0.0000           max=0     PASS
-granularity_entropy            2.6900    min=1.0      PASS
+granularity_entropy              2.9533         min=1.0     PASS
-Viable: NO (4/5 thresholds met)
+Viable: YES (5/5 thresholds met)
 ```
-Coverage is currently failing (42% < 50% threshold) because only 9 of
+During early processing (first few books), coverage will fall and
-35 chapters have been processed. Once more chapters are done, coverage
+then stabilise as the domain × chapter matrix fills in. The threshold
-will rise.
+of 0.40 reflects realistic expectations for a multi-book corpus where
+some domains are naturally sparse in certain chapters.
 ### Metrics history
@@ -460,9 +482,19 @@ markitect infospace history
 Shows how metrics evolved across runs:
 ```
-Snapshot  Date        Entities  coverage  redundancy  entropy
+History: 36 snapshot(s)
-------------------------------------------------------------
+
-6ba48eb2  2026-02-19  85        0.361     0.000       2.687
+#    Date                 Entities  Metrics
+------------------------------------------
+1    2026-02-19T13:07:13        18        6
+2    2026-02-19T13:16:36        43        6
+...
+36   2026-02-19T21:54:44      1021        6
 ```
+```bash
+# Show trend for a specific metric:
+markitect infospace history --metric coverage_ratio
+```
 ---
@@ -483,16 +515,13 @@ This means:
 - You can `git bisect` to find where quality degraded
 - You can revert a chapter and re-process with improved guidelines
-The `clean-example-history` branch in this repository demonstrates the
+To review before committing:
-intended structure: each chapter is a single, self-contained commit.
-Use it as a reference for how the infospace grew step by step.
-To commit manually after reviewing:
 ```bash
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
+markitect infospace process "book-1-chapter-05.md" \
  --provider openrouter --no-commit
 # review output/entities/ and output/mappings/
-git add examples/infospace-with-history/output/
+git add output/
 git commit -m "infospace: process book-1-chapter-05"
 ```
@@ -519,7 +548,7 @@ Use `openrouter/free` to automatically select from whichever free model is
 available:
 ```bash
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
  --provider openrouter --model openrouter/free
 ```
@@ -531,47 +560,53 @@ when running inside a Claude Code session due to nested session restrictions.
 ---
-## 12. Completing the Remaining Chapters
+## 12. Processing the Full Corpus
-As of writing, 9 of 35 chapters are processed (Book I, Chapters 1–9).
+All 35 chapters have been processed in this example. The commands below
+show how the full run was executed — use them as a template for your own
+corpus.
-**Process Book I remainder:**
+**Process one book at a time:**
 ```bash
 export OPENROUTER_API_KEY=$(cat apikey-openrouter.txt | tr -d '[:space:]')
-git checkout clean-example-history
+
-python process_chapters.py --book 1 --provider openrouter
+markitect infospace process "book-1-*.md" --provider openrouter
+markitect infospace process "book-2-*.md" --provider openrouter
+markitect infospace process "book-3-*.md" --provider openrouter
+markitect infospace process "book-4-*.md" --provider openrouter
+markitect infospace process "book-5-*.md" --provider openrouter
 ```
-Already-processed chapters are skipped — their chapter view files exist.
+Already-processed chapters are skipped automatically — their output files
-The `@{existing_entities}` macro ensures the LLM only extracts genuinely
+exist on disk. The `@{existing_entities}` macro ensures the LLM only
-new entities.
+extracts genuinely new entities.
-**Process Books II–V:**
+**Or process everything at once:**
 ```bash
-python process_chapters.py --book 2 --provider openrouter
+markitect infospace process --all --provider openrouter
-python process_chapters.py --book 3 --provider openrouter
-python process_chapters.py --book 4 --provider openrouter
-python process_chapters.py --book 5 --provider openrouter
 ```
 **Run collection checks after each book:**
 ```bash
-markitect infospace check --provider openrouter
+markitect infospace check
 markitect infospace viability
 ```
-**Expected progression:**
+**Observed metric progression (actual results):**
-| After | Chapters | Expected coverage |
+| After | Entities | coverage_ratio | entropy |
-|-------|----------|-------------------|
+|-------|----------|----------------|---------|
-| Book I (11 ch.) | 11/35 | S1, S2, S4 strong; S3 emerging |
+| Book I (11 ch.) | ~236 | 0.51 | 2.77 |
-| Books I–II (16 ch.) | 16/35 | S3 (capital control) covered |
+| Books I–II (16 ch.) | ~348 | 0.56 | 2.82 |
-| Books I–III (20 ch.) | 20/35 | Historical patterns add depth |
+| Books I–III (20 ch.) | ~456 | 0.59 | 2.97 |
-| Books I–IV (30 ch.) | 30/35 | S5 (policy, mercantilism) emerging |
+| Books I–IV (30 ch.) | ~930 | 0.51 | 2.94 |
-| All (35 ch.) | 35/35 | Full coverage; S3* and algedonic signals from Book V |
+| All (35 ch.) | 988 | **0.62** | 2.95 |
+Coverage dips in Books IV–V as policy-heavy chapters introduce domains
+that are sparse in earlier books, then recovers as the matrix fills in.
 ---
@@ -610,9 +645,9 @@ dependent mappings are flagged for re-evaluation.
 The infospace is designed to be **iteratively refined**:
-1. **Process chapters** — run the pipeline
+1. **Process chapters** — `markitect infospace process "book-1-*.md" --provider openrouter`
 2. **Evaluate** — `markitect infospace evaluate --provider openrouter`
-3. **Check** — `markitect infospace check --provider openrouter`
+3. **Check** — `markitect infospace check`
 4. **Review viability** — `markitect infospace viability`
 5. **Refine guidelines** — update `extraction-rules.md` or
   `mapping-rules.md` to address identified weaknesses
@@ -626,18 +661,31 @@ audit, inspection, and oversight mechanisms.
 To re-process a specific chapter:
 ```bash
 # Delete stage outputs for that chapter (not canonical entity files):
 rm -f output/entities/book-1-chapter-03-entities.md
 rm -f output/mappings/book-1-chapter-03-mappings.md
 rm -f output/analyses/book-1-chapter-03-analysis.md
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+
+# Re-run:
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```
-Never silently delete canonical entity files. Archive them instead:
+Never silently delete canonical entity files. Archive them instead by
+moving to `output/entities/archive/` with a dated comment header, then
+re-process the chapter so the pipeline can extract a replacement:
 ```bash
-python process_chapters.py --archive-entity extent-of-the-market \
+# Archive the entity manually:
-  --reason "Subsumed by market-price and effectual-demand"
+mkdir -p output/entities/archive
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+mv output/entities/extent-of-the-market.md output/entities/archive/
+# Add header to the archived file explaining why
+echo '<!-- archived: 2026-02-22 reason="Subsumed by market-price and effectual-demand" -->' \
+  | cat - output/entities/archive/extent-of-the-market.md > /tmp/tmp.md \
+  && mv /tmp/tmp.md output/entities/archive/extent-of-the-market.md
+# Delete the chapter entity view so the chapter re-runs:
+rm -f output/entities/book-1-chapter-03-entities.md
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```
 ---
@@ -651,9 +699,12 @@ it is fully derived from the markdown files that are tracked.
 To regenerate it after a fresh clone (no LLM calls needed):
 ```bash
-python process_chapters.py --all --no-commit
+markitect infospace process --all
 ```
+Without `--provider`, the command runs in dry-run mode: it loads existing
+output files from disk into the database without making any LLM calls.
 ---
 ## 16. Adapting This Pattern to Your Own Project
@@ -665,9 +716,9 @@ To build your own infospace:
 3. Write extraction guidelines that tell the LLM what to look for
 4. Create prompt templates using `@{macro}` syntax
 5. Populate `artifacts/sources/` with your source corpus
-6. Run `process_chapters.py` (or your equivalent pipeline script)
+6. `markitect infospace process --all --provider openrouter`
-7. Evaluate with `markitect infospace evaluate` and `check`
+7. `markitect infospace check` and `markitect infospace evaluate --provider openrouter`
-8. Review `markitect infospace viability` against your thresholds
+8. `markitect infospace viability` — review against your thresholds
 9. Iterate: refine guidelines, re-process, re-evaluate
 10. Once viable, use as a discipline for a new infospace
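
The manual archive procedure that this change documents in §7 and §14 (move the entity file, prepend a dated header) could be wrapped in a small shell helper. The sketch below is only an illustration: `archive_entity` is a hypothetical function name, not part of the markitect CLI, and it assumes the `output/entities/` layout shown in the tutorial and that it is run from the infospace root.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of markitect): archive a canonical entity
# file with a dated header, following the manual steps in the tutorial.
set -euo pipefail

archive_entity() {
  local slug="$1" reason="$2"
  local src="output/entities/${slug}.md"
  local dst_dir="output/entities/archive"
  local dst="${dst_dir}/${slug}.md"

  # Refuse to continue if the entity file does not exist.
  [ -f "$src" ] || { echo "no such entity: $src" >&2; return 1; }
  mkdir -p "$dst_dir"

  # Write the archive header followed by the original body,
  # then remove the original so the entity leaves the canonical set.
  {
    printf '<!-- archived: %s reason="%s" -->\n' "$(date +%F)" "$reason"
    cat "$src"
  } > "$dst"
  rm -f "$src"
}
```

A call such as `archive_entity extent-of-the-market "Subsumed by market-price and effectual-demand"` would then replace the `mkdir`/`mv`/`echo` sequence. Deleting the chapter entity view and re-running `markitect infospace process` remain separate steps, as in the tutorial.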