docs(tutorial): update all commands to use markitect infospace CLI (S3.4)

Replace all process_chapters.py references throughout the tutorial with the
correct markitect infospace subcommands:

- §2 Project layout: remove process_chapters.py, add LAYERED-DEVELOPMENT.md
- §7 Processing: --chapter → process "glob", --book N → "book-N-*.md",
  --list → status/entities, --archive-entity → documented manual step
- §8 Check: remove incorrect --provider flag; select individual checks with
  --concern; note checks are deterministic
- §9 Viability: real output from full 988-entity corpus (Viable: YES)
- §10 History: real snapshot table; add --metric flag example
- §10 Git tracking: remove process_chapters.py from commit example; drop the
  clean-example-history branch reference
- §11 Cost: update openrouter/free example command
- §12 Completion: rewrite with actual observed metric progression table
- §14 Quality loop: update all commands; add archive-entity manual procedure
- §15 Artifact DB: --all without --provider = dry-run (no LLM calls)
- §16 Adapting: update steps 6–8 to the new CLI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 23:31:38 +01:00
parent c861520ccd
commit 8f00fa2018
1 changed files with 152 additions and 101 deletions
--- a/examples/infospace-with-history/TUTORIAL.md
+++ b/examples/infospace-with-history/TUTORIAL.md
@@ -45,11 +45,11 @@ metrics — it is fit for purpose as an explanatory tool.
 ```
 examples/infospace-with-history/
 │
-├── infospace.yaml              # Declarative infospace configuration (NEW)
+├── infospace.yaml              # Declarative infospace configuration
 ├── README.md
 ├── TUTORIAL.md                 # This file
 ├── INFRA-TASKS.md              # Infrastructure issues found during the experiment
-├── process_chapters.py         # Pipeline script (chapter processing)
+├── LAYERED-DEVELOPMENT.md      # Concept for L2–L4 entity classification and modelling
 ├── infospace.db                # SQLite artifact database (generated, not in git)
 │
 ├── schemas/                    # Output structure definitions
@@ -301,61 +301,79 @@ Named `book-1-chapter-01.md` through `book-5-chapter-03.md`.

 ## 7. Processing Chapters

-`process_chapters.py` orchestrates the three-stage pipeline. It initialises
-the artifact repository, loads static artifacts, runs entity extraction →
-VSM mapping → analysis synthesis, and commits each chapter to git.
+`markitect infospace process` orchestrates the three-stage pipeline declared
+in `infospace.yaml`. It runs entity extraction → VSM mapping → analysis
+synthesis for each source file, and commits each chapter to git.

 ### Single chapter

 ```bash
-# Manual mode (writes prompts, awaits output files):
-python process_chapters.py --chapter book-1-chapter-05 --no-commit
+# Dry run — loads existing outputs only, no LLM calls:
+markitect infospace process "book-1-chapter-05.md"

-# Auto mode via OpenRouter (free models available):
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter
+# Process via OpenRouter (free models available):
+markitect infospace process "book-1-chapter-05.md" --provider openrouter

 # With a specific free model:
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
  --provider openrouter --model meta-llama/llama-4-maverick:free
+
+# Skip git commit after processing:
+markitect infospace process "book-1-chapter-05.md" \
+  --provider openrouter --no-commit
 ```

+The GLOB_PATTERN argument is quoted so the shell passes it through unexpanded;
+it is matched against the `sources` directory declared in `infospace.yaml`.
+Already-processed chapters are skipped automatically: their output files exist on disk.
+
 ### Whole book or all chapters

 ```bash
-python process_chapters.py --book 1 --provider openrouter
-python process_chapters.py --all --provider openrouter
+# Process all chapters of Book 1:
+markitect infospace process "book-1-*.md" --provider openrouter
+
+# Process all 35 source files:
+markitect infospace process --all --provider openrouter
+
+# Process all chapters and run quality checks after each one:
+markitect infospace process --all --provider openrouter --check-after-each
 ```

 ### Check progress

 ```bash
-python process_chapters.py --list
+markitect infospace status
 ```

 ```
-Available chapters (35):
-
-  Chapter                        Entities     Mappings     Analysis
-  ------------------------------ ------------ ------------ ------------
-  book-1-chapter-01              done (13)    done         done
-  book-1-chapter-02              done (7)     done         done
-  ...
-
-  Canonical entity set: 109 unique entities
+Infospace: The Wealth of Nations
+Domain:    Classical Economics
+Entities:  988
+Domains:   Accumulation, Consumption, Distribution, Exchange,
+           General Theory, Production, Regulation
+Disciplines: Viable System Model
+Last evaluated: 2026-02-19T21:54:44
 ```

+```bash
+markitect infospace entities
+```
+
+Lists all canonical entities with domain, source chapter, and word count.
+
 ### Entity lifecycle

-Entities in the canonical set are **never silently deleted**. Retire
-an entity by archiving it with a documented reason:
+Entities in the canonical set are **never silently deleted**. To retire
+an entity, move it to `output/entities/archive/<slug>.md` and add a
+dated archive header:

-```bash
-python process_chapters.py --archive-entity enlarged-monopoly \
-  --reason "Subsumed by monopoly-price — same market distortion"
+```markdown
+<!-- archived: 2026-02-22 reason="Subsumed by monopoly-price — same market distortion" -->
 ```

-The archived file moves to `output/entities/archive/<slug>.md` with a
-dated header, preserving the intellectual history of every decision.
+Then commit the move so the intellectual history of every decision
+is preserved in git.

 ---

@@ -385,43 +403,46 @@ VSM relevance. Results are written to `output/evaluations/`.

 ```bash
 # Run all five collection checks:
-markitect infospace check --provider openrouter
+markitect infospace check

 # Run individual checks:
-markitect infospace check redundancy   # C1: Are any entities synonymous?
-markitect infospace check coverage     # C2: Which domain × VSM cells are empty?
-markitect infospace check coherence    # C3: Is the entity graph well-connected?
-markitect infospace check consistency  # C4: Are there circular definitions?
-markitect infospace check granularity  # C5: Is abstraction level balanced?
+markitect infospace check --concern redundancy   # C1: Are any entities synonymous?
+markitect infospace check --concern coverage     # C2: Which domain × chapter cells are empty?
+markitect infospace check --concern coherence    # C3: Is the entity graph well-connected?
+markitect infospace check --concern consistency  # C4: Are there circular definitions?
+markitect infospace check --concern granularity  # C5: Is abstraction level balanced?
 ```

+Collection checks are deterministic and make no LLM calls, so no `--provider`
+flag is needed.
+
 Each check uses the platform's embedding, graph analysis, and FCA
 infrastructure. Results are written to `output/metrics/` and a new
 snapshot is appended to `metrics-history.yaml`.

-Sample output:
+Sample output (full corpus, 988 entities):

 ```
-Running collection checks on 109 entities...
+Collection checks — 988 entities

  C1 — redundancy
-    redundancy_ratio: 0.0183
-    high_similarity_pairs: 2
+    redundancy_ratio: 0.0061
+    similar_pairs: 3 candidates (word-overlap > 0.85)

  C2 — coverage
-    coverage_ratio: 0.4286
-    empty_cells: [['Regulation', 'S3*'], ['Historical', 'S5']]
+    coverage_ratio: 0.619
+    domain_densities: Exchange 0.85, Regulation 0.85, General Theory 0.73 …
+    density_std: 0.211  cross_cutting_ratio: 0.714

  C3 — coherence
-    coherence_components: 1
-    modularity: 0.412
+    connected_components: 0   (no cross-reference graph built yet)
+    modularity: 0.0

  C4 — consistency
-    consistency_cycles: 0
-    grounding_ratio: 0.94
+    cycle_count: 0

  C5 — granularity
-    granularity_entropy: 2.69
+    granularity_entropy: 2.953
 ```

 ---
@@ -436,20 +457,21 @@ Compares the latest metrics against the thresholds declared in
 `infospace.yaml`:

 ```
-Metric                         Value    Threshold   Status
-----------------------------------------------------------
-redundancy_ratio               0.0183    max=0.10     PASS
-coverage_ratio                 0.4286    min=0.50     FAIL
-coherence_components           1         max=3        PASS
-consistency_cycles             0         max=0        PASS
-granularity_entropy            2.6900    min=1.0      PASS
+Metric                            Value       Threshold   Status
+---------------------------------------------------------------
+redundancy_ratio                 0.0059         max=0.1     PASS
+coverage_ratio                   0.6190         min=0.4     PASS
+coherence_components             0.0000           max=3     PASS
+consistency_cycles               0.0000           max=0     PASS
+granularity_entropy              2.9533         min=1.0     PASS

-Viable: NO (4/5 thresholds met)
+Viable: YES (5/5 thresholds met)
 ```

-Coverage is currently failing (42% < 50% threshold) because only 9 of
-35 chapters have been processed. Once more chapters are done, coverage
-will rise.
+Coverage does not rise monotonically as books are added: it climbs through
+Books I–III, dips after Book IV, then recovers with Book V. The threshold
+of 0.40 reflects realistic expectations for a multi-book corpus where
+some domains are naturally sparse in certain chapters.

 ### Metrics history

@@ -460,9 +482,19 @@ markitect infospace history
 Shows how metrics evolved across runs:

 ```
-Snapshot  Date        Entities  coverage  redundancy  entropy
-------------------------------------------------------------
-6ba48eb2  2026-02-19  85        0.361     0.000       2.687
+History: 36 snapshot(s)
+
+#    Date                 Entities  Metrics
+------------------------------------------
+1    2026-02-19T13:07:13        18        6
+2    2026-02-19T13:16:36        43        6
+...
+36   2026-02-19T21:54:44      1021        6
+```
+
+```bash
+# Show trend for a specific metric:
+markitect infospace history --metric coverage_ratio
 ```

 ---
@@ -483,16 +515,13 @@ This means:
 - You can `git bisect` to find where quality degraded
 - You can revert a chapter and re-process with improved guidelines

-The `clean-example-history` branch in this repository demonstrates the
-intended structure: each chapter is a single, self-contained commit.
-Use it as a reference for how the infospace grew step by step.
-
-To commit manually after reviewing:
+To review before committing:

 ```bash
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
+markitect infospace process "book-1-chapter-05.md" \
+  --provider openrouter --no-commit
 # review output/entities/ and output/mappings/
-git add examples/infospace-with-history/output/
+git add output/
 git commit -m "infospace: process book-1-chapter-05"
 ```

@@ -519,7 +548,7 @@ Use `openrouter/free` to automatically select from whichever free model is
 available:

 ```bash
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
  --provider openrouter --model openrouter/free
 ```

@@ -531,47 +560,53 @@ when running inside a Claude Code session due to nested session restrictions.

 ---

-## 12. Completing the Remaining Chapters
+## 12. Processing the Full Corpus

-As of writing, 9 of 35 chapters are processed (Book I, Chapters 1–9).
+All 35 chapters have been processed in this example. The commands below
+show how the full run was executed — use them as a template for your own
+corpus.

-**Process Book I remainder:**
+**Process one book at a time:**

 ```bash
 export OPENROUTER_API_KEY=$(cat apikey-openrouter.txt | tr -d '[:space:]')
-git checkout clean-example-history
-python process_chapters.py --book 1 --provider openrouter
+
+markitect infospace process "book-1-*.md" --provider openrouter
+markitect infospace process "book-2-*.md" --provider openrouter
+markitect infospace process "book-3-*.md" --provider openrouter
+markitect infospace process "book-4-*.md" --provider openrouter
+markitect infospace process "book-5-*.md" --provider openrouter
 ```

-Already-processed chapters are skipped — their chapter view files exist.
-The `@{existing_entities}` macro ensures the LLM only extracts genuinely
-new entities.
+Already-processed chapters are skipped automatically — their output files
+exist on disk. The `@{existing_entities}` macro ensures the LLM only
+extracts genuinely new entities.

-**Process Books II–V:**
+**Or process everything at once:**

 ```bash
-python process_chapters.py --book 2 --provider openrouter
-python process_chapters.py --book 3 --provider openrouter
-python process_chapters.py --book 4 --provider openrouter
-python process_chapters.py --book 5 --provider openrouter
+markitect infospace process --all --provider openrouter
 ```

 **Run collection checks after each book:**

 ```bash
-markitect infospace check --provider openrouter
+markitect infospace check
 markitect infospace viability
 ```

-**Expected progression:**
+**Observed metric progression:**

-| After | Chapters | Expected coverage |
-|-------|----------|-------------------|
-| Book I (11 ch.) | 11/35 | S1, S2, S4 strong; S3 emerging |
-| Books I–II (16 ch.) | 16/35 | S3 (capital control) covered |
-| Books I–III (20 ch.) | 20/35 | Historical patterns add depth |
-| Books I–IV (30 ch.) | 30/35 | S5 (policy, mercantilism) emerging |
-| All (35 ch.) | 35/35 | Full coverage; S3* and algedonic signals from Book V |
+| After | Entities | coverage_ratio | granularity_entropy |
+|-------|----------|----------------|---------|
+| Book I (11 ch.) | ~236 | 0.51 | 2.77 |
+| Books I–II (16 ch.) | ~348 | 0.56 | 2.82 |
+| Books I–III (20 ch.) | ~456 | 0.59 | 2.97 |
+| Books I–IV (30 ch.) | ~930 | 0.51 | 2.94 |
+| All (35 ch.) | 988 | **0.62** | 2.95 |
+
+Coverage dips after Book IV as policy-heavy chapters introduce domains
+that are sparse in earlier books, then recovers to 0.62 once Book V is processed.

 ---

@@ -610,9 +645,9 @@ dependent mappings are flagged for re-evaluation.

 The infospace is designed to be **iteratively refined**:

-1. **Process chapters** — run the pipeline
+1. **Process chapters** — `markitect infospace process "book-1-*.md" --provider openrouter`
 2. **Evaluate** — `markitect infospace evaluate --provider openrouter`
-3. **Check** — `markitect infospace check --provider openrouter`
+3. **Check** — `markitect infospace check`
 4. **Review viability** — `markitect infospace viability`
 5. **Refine guidelines** — update `extraction-rules.md` or
   `mapping-rules.md` to address identified weaknesses
@@ -626,18 +661,31 @@ audit, inspection, and oversight mechanisms.
 To re-process a specific chapter:

 ```bash
+# Delete stage outputs for that chapter (not canonical entity files):
 rm -f output/entities/book-1-chapter-03-entities.md
 rm -f output/mappings/book-1-chapter-03-mappings.md
 rm -f output/analyses/book-1-chapter-03-analysis.md
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+
+# Re-run:
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```

-Never silently delete canonical entity files. Archive them instead:
+Never silently delete canonical entity files. Archive them instead by
+moving them to `output/entities/archive/` with a dated comment header, then
+re-process the chapter so the pipeline can extract a replacement:

 ```bash
-python process_chapters.py --archive-entity extent-of-the-market \
-  --reason "Subsumed by market-price and effectual-demand"
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+# Archive the entity manually:
+mkdir -p output/entities/archive
+mv output/entities/extent-of-the-market.md output/entities/archive/
+# Add header to the archived file explaining why
+echo '<!-- archived: 2026-02-22 reason="Subsumed by market-price and effectual-demand" -->' \
+  | cat - output/entities/archive/extent-of-the-market.md > /tmp/tmp.md \
+  && mv /tmp/tmp.md output/entities/archive/extent-of-the-market.md
+
+# Delete the chapter entity view so the chapter re-runs:
+rm -f output/entities/book-1-chapter-03-entities.md
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```

 ---
@@ -651,9 +699,12 @@ it is fully derived from the markdown files that are tracked.
 To regenerate it after a fresh clone (no LLM calls needed):

 ```bash
-python process_chapters.py --all --no-commit
+markitect infospace process --all
 ```

+Without `--provider`, the command runs in dry-run mode: it loads existing
+output files from disk into the database without making any LLM calls.
+
 ---

 ## 16. Adapting This Pattern to Your Own Project
@@ -665,9 +716,9 @@ To build your own infospace:
 3. Write extraction guidelines that tell the LLM what to look for
 4. Create prompt templates using `@{macro}` syntax
 5. Populate `artifacts/sources/` with your source corpus
-6. Run `process_chapters.py` (or your equivalent pipeline script)
-7. Evaluate with `markitect infospace evaluate` and `check`
-8. Review `markitect infospace viability` against your thresholds
+6. `markitect infospace process --all --provider openrouter`
+7. `markitect infospace check` and `markitect infospace evaluate --provider openrouter`
+8. `markitect infospace viability` — review against your thresholds
 9. Iterate: refine guidelines, re-process, re-evaluate
 10. Once viable, use as a discipline for a new infospace
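
The two viability tables in this diff follow a simple rule: each metric is compared with a `max=` or `min=` bound from `infospace.yaml`, and the verdict is `Viable: YES` only when every bound passes (4/5 gave `NO`, 5/5 gives `YES`). The Python sketch below reproduces that pass/fail logic under stated assumptions; it is not markitect's implementation. The `Threshold` type and `evaluate_viability` function are hypothetical, bounds are assumed inclusive (as `consistency_cycles` 0 passing `max=0` suggests), and a missing metric is assumed to count as a failure.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Threshold:
    """Hypothetical model of one max=/min= threshold from infospace.yaml."""
    kind: Literal["max", "min"]
    bound: float


def evaluate_viability(metrics: dict[str, float],
                       thresholds: dict[str, Threshold]) -> tuple[bool, int, int]:
    """Return (viable, thresholds_met, thresholds_total).

    Assumption: both bounds are inclusive, so max=0 accepts a value of 0.
    Assumption: a metric missing from `metrics` counts as a failed threshold.
    """
    met = 0
    for name, t in thresholds.items():
        value = metrics.get(name)
        if value is None:
            continue  # missing metric: this threshold is not met
        ok = value <= t.bound if t.kind == "max" else value >= t.bound
        if ok:
            met += 1
    total = len(thresholds)
    # Viable only when every declared threshold passes.
    return met == total, met, total


# Bounds as printed in the full-corpus viability table.
thresholds = {
    "redundancy_ratio": Threshold("max", 0.1),
    "coverage_ratio": Threshold("min", 0.4),
    "coherence_components": Threshold("max", 3),
    "consistency_cycles": Threshold("max", 0),
    "granularity_entropy": Threshold("min", 1.0),
}

# Metric values from the same table (988 entities).
latest = {
    "redundancy_ratio": 0.0059,
    "coverage_ratio": 0.6190,
    "coherence_components": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.9533,
}

viable, met, total = evaluate_viability(latest, thresholds)
print(f"Viable: {'YES' if viable else 'NO'} ({met}/{total} thresholds met)")
```

With the earlier 9-chapter values (coverage 0.4286 against the earlier `min=0.50` bound), the same function reports 4/5 thresholds met and `Viable: NO`, matching the table this commit replaces.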