docs(tutorial): update all commands to use markitect infospace CLI (S3.4)

Replace all process_chapters.py references throughout the tutorial with
the correct markitect infospace subcommands:

- §2  Project layout: remove process_chapters.py, add LAYERED-DEVELOPMENT.md
- §7  Processing: --chapter → process "glob", --book N → "book-N-*.md",
      --list → status/entities, --archive-entity → documented manual step
- §8  Check: remove incorrect --provider flag; note checks are deterministic
- §9  Viability: real output from full 988-entity corpus (Viable: YES)
- §10 History: real snapshot table; add --metric flag example
- §10 Git tracking: remove process_chapters.py from commit example
- §11 Cost: update openrouter/free example command
- §12 Completion: rewrite with actual observed metric progression table
- §14 Quality loop: update all commands; add archive-entity manual procedure
- §15 Artifact DB: --all without --provider = dry-run (no LLM calls)
- §16 Adapting: update steps 6 and 7 to new CLI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 23:31:38 +01:00
parent c861520ccd
commit 8f00fa2018


@@ -45,11 +45,11 @@ metrics — it is fit for purpose as an explanatory tool.
 ```
 examples/infospace-with-history/
-├── infospace.yaml # Declarative infospace configuration (NEW)
+├── infospace.yaml # Declarative infospace configuration
 ├── README.md
 ├── TUTORIAL.md # This file
 ├── INFRA-TASKS.md # Infrastructure issues found during the experiment
-├── process_chapters.py # Pipeline script (chapter processing)
+├── LAYERED-DEVELOPMENT.md # Concept for L2–L4 entity classification and modelling
 ├── infospace.db # SQLite artifact database (generated, not in git)
 ├── schemas/ # Output structure definitions
@@ -301,61 +301,79 @@ Named `book-1-chapter-01.md` through `book-5-chapter-03.md`.
 ## 7. Processing Chapters
-`process_chapters.py` orchestrates the three-stage pipeline. It initialises
-the artifact repository, loads static artifacts, runs entity extraction →
-VSM mapping → analysis synthesis, and commits each chapter to git.
+`markitect infospace process` orchestrates the three-stage pipeline declared
+in `infospace.yaml`. It runs entity extraction → VSM mapping → analysis
+synthesis for each source file, and commits each chapter to git.
 ### Single chapter
 ```bash
-# Manual mode (writes prompts, awaits output files):
-python process_chapters.py --chapter book-1-chapter-05 --no-commit
-# Auto mode via OpenRouter (free models available):
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter
+# Dry run — loads existing outputs only, no LLM calls:
+markitect infospace process "book-1-chapter-05.md"
+# Process via OpenRouter (free models available):
+markitect infospace process "book-1-chapter-05.md" --provider openrouter
 # With a specific free model:
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
   --provider openrouter --model meta-llama/llama-4-maverick:free
+# Skip git commit after processing:
+markitect infospace process "book-1-chapter-05.md" \
+  --provider openrouter --no-commit
 ```
+The GLOB_PATTERN is matched against the `sources` directory declared in
+`infospace.yaml`. Already-processed chapters are skipped automatically —
+their output files already exist on disk.
 ### Whole book or all chapters
 ```bash
-python process_chapters.py --book 1 --provider openrouter
-python process_chapters.py --all --provider openrouter
+# Process all chapters of Book 1:
+markitect infospace process "book-1-*.md" --provider openrouter
+# Process all 35 source files:
+markitect infospace process --all --provider openrouter
+# Process all chapters and run quality checks after each one:
+markitect infospace process --all --provider openrouter --check-after-each
 ```
 ### Check progress
 ```bash
-python process_chapters.py --list
+markitect infospace status
 ```
 ```
-Available chapters (35):
-Chapter Entities Mappings Analysis
------------------------------- ------------ ------------ ------------
-book-1-chapter-01 done (13) done done
-book-1-chapter-02 done (7) done done
-...
-Canonical entity set: 109 unique entities
+Infospace: The Wealth of Nations
+Domain: Classical Economics
+Entities: 988
+Domains: Accumulation, Consumption, Distribution, Exchange,
+General Theory, Production, Regulation
+Disciplines: Viable System Model
+Last evaluated: 2026-02-19T21:54:44
 ```
+```bash
+markitect infospace entities
+```
+Lists all canonical entities with domain, source chapter, and word count.
 ### Entity lifecycle
-Entities in the canonical set are **never silently deleted**. Retire
-an entity by archiving it with a documented reason:
-```bash
-python process_chapters.py --archive-entity enlarged-monopoly \
-  --reason "Subsumed by monopoly-price — same market distortion"
+Entities in the canonical set are **never silently deleted**. To retire
+an entity, move it to `output/entities/archive/<slug>.md` and add a
+dated archive header:
+```markdown
+<!-- archived: 2026-02-22 reason="Subsumed by monopoly-price — same market distortion" -->
 ```
-The archived file moves to `output/entities/archive/<slug>.md` with a
-dated header, preserving the intellectual history of every decision.
+Then commit the removal so the intellectual history of every decision
+is preserved in git.
 ---
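The new `status` output above reports corpus-level totals rather than a per-chapter table, so it no longer shows which chapters are still outstanding. A small shell function can approximate that by mirroring the §7 skip rule. This is a minimal sketch, not part of markitect: `pending_chapters` is a hypothetical name, it assumes sources live in `artifacts/sources/` (the directory named in §16; `infospace.yaml` may declare another), and it assumes a chapter counts as processed once `output/entities/<chapter>-entities.md` exists, which is the file §14 deletes to force a re-run.

```bash
# Hypothetical helper, not part of markitect.
# Prints the stem of every source chapter that has no chapter entity view,
# i.e. the chapters a `markitect infospace process` run would still pick up.
# Assumptions: sources live in artifacts/sources/ and the entity view is
# output/entities/<stem>-entities.md (the file §14 deletes to force a re-run).
pending_chapters() {
  local root="${1:-.}" src stem
  for src in "$root"/artifacts/sources/*.md; do
    [ -e "$src" ] || continue   # glob matched nothing: no sources at all
    stem=$(basename "$src" .md)
    if [ ! -e "$root/output/entities/$stem-entities.md" ]; then
      printf '%s\n' "$stem"
    fi
  done
}
```

Pass the infospace root as the only argument (it defaults to the current directory), for example `pending_chapters examples/infospace-with-history | wc -l` to count the chapters still to process.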
@@ -385,43 +403,46 @@ VSM relevance. Results are written to `output/evaluations/`.
 ```bash
 # Run all five collection checks:
-markitect infospace check --provider openrouter
+markitect infospace check
 # Run individual checks:
-markitect infospace check redundancy # C1: Are any entities synonymous?
-markitect infospace check coverage # C2: Which domain × VSM cells are empty?
-markitect infospace check coherence # C3: Is the entity graph well-connected?
-markitect infospace check consistency # C4: Are there circular definitions?
-markitect infospace check granularity # C5: Is abstraction level balanced?
+markitect infospace check --concern redundancy # C1: Are any entities synonymous?
+markitect infospace check --concern coverage # C2: Which domain × chapter cells are empty?
+markitect infospace check --concern coherence # C3: Is the entity graph well-connected?
+markitect infospace check --concern consistency # C4: Are there circular definitions?
+markitect infospace check --concern granularity # C5: Is abstraction level balanced?
 ```
+Collection checks are deterministic (embeddings, graph analysis, FCA) and
+require no LLM provider.
 Each check uses the platform's embedding, graph analysis, and FCA
 infrastructure. Results are written to `output/metrics/` and a new
 snapshot is appended to `metrics-history.yaml`.
-Sample output:
+Sample output (full corpus, 988 entities):
 ```
-Running collection checks on 109 entities...
+Collection checks — 988 entities
 C1 — redundancy
-redundancy_ratio: 0.0183
-high_similarity_pairs: 2
+redundancy_ratio: 0.0061
+similar_pairs: 3 candidates (word-overlap > 0.85)
 C2 — coverage
-coverage_ratio: 0.4286
-empty_cells: [['Regulation', 'S3*'], ['Historical', 'S5']]
-density_std: 0.211
+coverage_ratio: 0.619
+domain_densities: Exchange 0.85, Regulation 0.85, General Theory 0.73 …
+cross_cutting_ratio: 0.714
 C3 — coherence
-coherence_components: 1
-modularity: 0.412
+connected_components: 0 (no cross-reference graph built yet)
+modularity: 0.0
 C4 — consistency
-consistency_cycles: 0
-grounding_ratio: 0.94
+cycle_count: 0
 C5 — granularity
-granularity_entropy: 2.69
+granularity_entropy: 2.953
 ```
 ---
@@ -436,20 +457,21 @@ Compares the latest metrics against the thresholds declared in
 `infospace.yaml`:
 ```
 Metric Value Threshold Status
------------------------------------------------------------
-redundancy_ratio 0.0183 max=0.10 PASS
-coverage_ratio 0.4286 min=0.50 FAIL
-coherence_components 1 max=3 PASS
-consistency_cycles 0 max=0 PASS
-granularity_entropy 2.6900 min=1.0 PASS
-Viable: NO (4/5 thresholds met)
+---------------------------------------------------------------
+redundancy_ratio 0.0059 max=0.1 PASS
+coverage_ratio 0.6190 min=0.4 PASS
+coherence_components 0.0000 max=3 PASS
+consistency_cycles 0.0000 max=0 PASS
+granularity_entropy 2.9533 min=1.0 PASS
+Viable: YES (5/5 thresholds met)
 ```
-Coverage is currently failing (42% < 50% threshold) because only 9 of
-35 chapters have been processed. Once more chapters are done, coverage
-will rise.
+During early processing (first few books), coverage will fall and
+then stabilise as the domain × chapter matrix fills in. The threshold
+of 0.40 reflects realistic expectations for a multi-book corpus where
+some domains are naturally sparse in certain chapters.
 ### Metrics history
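The viability verdict is a threshold comparison over the latest metric values. The awk sketch below is a hypothetical re-derivation, not markitect's code, and its input format (one `metric value min|max threshold` line per metric) is invented for illustration. It assumes a `max=` threshold passes when the value is at most the threshold and a `min=` threshold when it is at least the threshold (the `consistency_cycles 0 max=0 PASS` row implies equality passes), and that the infospace is viable only when every threshold is met.

```bash
# Hypothetical sketch of a threshold check; not markitect's implementation.
# Reads lines of the form "<metric> <value> <min|max> <threshold>" on stdin.
# Assumptions: "max" passes when value <= threshold, "min" when
# value >= threshold, and viability requires every threshold to pass.
viability_summary() {
  awk '
    NF == 4 {
      total++
      pass = ($3 == "max") ? ($2 <= $4) : ($2 >= $4)
      if (pass) met++
      printf "%-22s %-8s %s=%-6s %s\n", $1, $2, $3, $4, (pass ? "PASS" : "FAIL")
    }
    END {
      ok = (total > 0 && met == total)
      printf "Viable: %s (%d/%d thresholds met)\n", (ok ? "YES" : "NO"), met, total
    }'
}
```

Fed the five rows of the old table, it prints `Viable: NO (4/5 thresholds met)` with `coverage_ratio` marked FAIL; fed the new rows, it prints `Viable: YES (5/5 thresholds met)`.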
@@ -460,9 +482,19 @@ markitect infospace history
 Shows how metrics evolved across runs:
 ```
-Snapshot Date Entities coverage redundancy entropy
--------------------------------------------------------------
-6ba48eb2 2026-02-19 85 0.361 0.000 2.687
+History: 36 snapshot(s)
+# Date Entities Metrics
+------------------------------------------
+1 2026-02-19T13:07:13 18 6
+2 2026-02-19T13:16:36 43 6
+...
+36 2026-02-19T21:54:44 1021 6
+```
+```bash
+# Show trend for a specific metric:
+markitect infospace history --metric coverage_ratio
 ```
 ---
@@ -483,16 +515,13 @@ This means:
 - You can `git bisect` to find where quality degraded
 - You can revert a chapter and re-process with improved guidelines
-The `clean-example-history` branch in this repository demonstrates the
-intended structure: each chapter is a single, self-contained commit.
-Use it as a reference for how the infospace grew step by step.
-To commit manually after reviewing:
+To review before committing:
 ```bash
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
+markitect infospace process "book-1-chapter-05.md" \
+  --provider openrouter --no-commit
 # review output/entities/ and output/mappings/
-git add examples/infospace-with-history/output/
+git add output/
 git commit -m "infospace: process book-1-chapter-05"
 ```
@@ -519,7 +548,7 @@ Use `openrouter/free` to automatically select from whichever free model is
 available:
 ```bash
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
   --provider openrouter --model openrouter/free
 ```
@@ -531,47 +560,53 @@ when running inside a Claude Code session due to nested session restrictions.
 ---
-## 12. Completing the Remaining Chapters
+## 12. Processing the Full Corpus
-As of writing, 9 of 35 chapters are processed (Book I, Chapters 1–9).
+All 35 chapters have been processed in this example. The commands below
+show how the full run was executed — use them as a template for your own
+corpus.
-**Process Book I remainder:**
+**Process one book at a time:**
 ```bash
 export OPENROUTER_API_KEY=$(cat apikey-openrouter.txt | tr -d '[:space:]')
-git checkout clean-example-history
-python process_chapters.py --book 1 --provider openrouter
+markitect infospace process "book-1-*.md" --provider openrouter
+markitect infospace process "book-2-*.md" --provider openrouter
+markitect infospace process "book-3-*.md" --provider openrouter
+markitect infospace process "book-4-*.md" --provider openrouter
+markitect infospace process "book-5-*.md" --provider openrouter
 ```
-Already-processed chapters are skipped — their chapter view files exist.
-The `@{existing_entities}` macro ensures the LLM only extracts genuinely
-new entities.
+Already-processed chapters are skipped automatically — their output files
+exist on disk. The `@{existing_entities}` macro ensures the LLM only
+extracts genuinely new entities.
-**Process Books II–V:**
+**Or process everything at once:**
 ```bash
-python process_chapters.py --book 2 --provider openrouter
-python process_chapters.py --book 3 --provider openrouter
-python process_chapters.py --book 4 --provider openrouter
-python process_chapters.py --book 5 --provider openrouter
+markitect infospace process --all --provider openrouter
 ```
 **Run collection checks after each book:**
 ```bash
-markitect infospace check --provider openrouter
+markitect infospace check
 markitect infospace viability
 ```
-**Expected progression:**
+**Observed metric progression (actual results):**
-| After | Chapters | Expected coverage |
-|-------|----------|-------------------|
-| Book I (11 ch.) | 11/35 | S1, S2, S4 strong; S3 emerging |
-| Books I–II (16 ch.) | 16/35 | S3 (capital control) covered |
-| Books I–III (20 ch.) | 20/35 | Historical patterns add depth |
-| Books I–IV (30 ch.) | 30/35 | S5 (policy, mercantilism) emerging |
-| All (35 ch.) | 35/35 | Full coverage; S3* and algedonic signals from Book V |
+| After | Entities | coverage_ratio | entropy |
+|-------|----------|----------------|---------|
+| Book I (11 ch.) | ~236 | 0.51 | 2.77 |
+| Books I–II (16 ch.) | ~348 | 0.56 | 2.82 |
+| Books I–III (20 ch.) | ~456 | 0.59 | 2.97 |
+| Books I–IV (30 ch.) | ~930 | 0.51 | 2.94 |
+| All (35 ch.) | 988 | **0.62** | 2.95 |
+Coverage dips in Books IV–V as policy-heavy chapters introduce domains
+that are sparse in earlier books, then recovers as the matrix fills in.
 ---
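The per-book commands and the advice to run collection checks after each book can be combined into one loop. A minimal sketch: `process_by_book` is a hypothetical helper that calls only the markitect commands shown in this hunk. The glob stays quoted so the shell passes it through literally and markitect matches it against the `sources` directory, as §7 describes. The loop stops at the first `process` run that exits non-zero; the tutorial does not say whether `check` or `viability` report problems through their exit status, so their output is only displayed.

```bash
# Hypothetical helper: process the given books in order, then run the
# collection checks and the viability report after each one.
# Only markitect commands shown in §12 are used; the helper name is invented.
process_by_book() {
  local book
  for book in "$@"; do
    # Quoted so the shell does not expand the glob; markitect matches it
    # against the sources directory declared in infospace.yaml.
    markitect infospace process "book-${book}-*.md" --provider openrouter || return
    markitect infospace check
    markitect infospace viability
  done
}
```

`process_by_book 1 2 3 4 5` repeats the full run; because already-processed chapters are skipped, re-running it after an interruption only picks up the remaining chapters.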
@@ -610,9 +645,9 @@ dependent mappings are flagged for re-evaluation.
 The infospace is designed to be **iteratively refined**:
-1. **Process chapters** – run the pipeline
+1. **Process chapters** – `markitect infospace process "book-1-*.md" --provider openrouter`
 2. **Evaluate** – `markitect infospace evaluate --provider openrouter`
-3. **Check** – `markitect infospace check --provider openrouter`
+3. **Check** – `markitect infospace check`
 4. **Review viability** – `markitect infospace viability`
 5. **Refine guidelines** — update `extraction-rules.md` or
 `mapping-rules.md` to address identified weaknesses
@@ -626,18 +661,31 @@ audit, inspection, and oversight mechanisms.
 To re-process a specific chapter:
 ```bash
+# Delete stage outputs for that chapter (not canonical entity files):
 rm -f output/entities/book-1-chapter-03-entities.md
 rm -f output/mappings/book-1-chapter-03-mappings.md
 rm -f output/analyses/book-1-chapter-03-analysis.md
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+# Re-run:
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```
-Never silently delete canonical entity files. Archive them instead:
+Never silently delete canonical entity files. Archive them instead by
+moving to `output/entities/archive/` with a dated comment header, then
+re-process the chapter so the pipeline can extract a replacement:
 ```bash
-python process_chapters.py --archive-entity extent-of-the-market \
-  --reason "Subsumed by market-price and effectual-demand"
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+# Archive the entity manually:
+mkdir -p output/entities/archive
+mv output/entities/extent-of-the-market.md output/entities/archive/
+# Add header to the archived file explaining why
+echo '<!-- archived: 2026-02-22 reason="Subsumed by market-price and effectual-demand" -->' \
+  | cat - output/entities/archive/extent-of-the-market.md > /tmp/tmp.md \
+  && mv /tmp/tmp.md output/entities/archive/extent-of-the-market.md
+# Delete the chapter entity view so the chapter re-runs:
+rm -f output/entities/book-1-chapter-03-entities.md
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```
 ---
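The manual archive procedure above can be wrapped in two shell functions so the header format stays consistent between entities. This is a sketch under stated assumptions: `archive_entity` and `reset_chapter` are hypothetical helpers (markitect no longer provides `--archive-entity`), both run from the infospace root, and the header copies the `<!-- archived: DATE reason="..." -->` comment used in §7 and §14. Writing the header and the original body straight into the archive path avoids the temporary `/tmp/tmp.md` file, and the original is removed only after that write succeeds.

```bash
# Hypothetical helpers (markitect no longer offers --archive-entity).
# Run from the infospace root, the directory that contains output/.

# archive_entity SLUG REASON
# Moves output/entities/SLUG.md into output/entities/archive/ and prepends
# a dated header. REASON must not contain double quotes.
archive_entity() {
  local slug="$1" reason="$2"
  local src="output/entities/$slug.md"
  local dest="output/entities/archive/$slug.md"
  if [ ! -f "$src" ]; then
    echo "archive_entity: no such entity: $src" >&2
    return 1
  fi
  mkdir -p output/entities/archive || return
  # Header and original body go straight into the archive path, so no
  # temporary file is needed; the source is removed only on success.
  {
    printf '<!-- archived: %s reason="%s" -->\n' "$(date +%F)" "$reason"
    cat "$src"
  } > "$dest" && rm -- "$src"
}

# reset_chapter CHAPTER
# Deletes the three stage outputs for one chapter so the next
# `markitect infospace process` run re-processes it. Canonical entity
# files are never touched.
reset_chapter() {
  local ch="$1"
  rm -f "output/entities/$ch-entities.md" \
        "output/mappings/$ch-mappings.md" \
        "output/analyses/$ch-analysis.md"
}
```

`reset_chapter` follows the re-process recipe and clears all three stage outputs, while the archive example above deletes only the entity view; either way, the next `markitect infospace process "book-1-chapter-03.md" --provider openrouter` run re-processes the chapter.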
@@ -651,9 +699,12 @@ it is fully derived from the markdown files that are tracked.
 To regenerate it after a fresh clone (no LLM calls needed):
 ```bash
-python process_chapters.py --all --no-commit
+markitect infospace process --all
 ```
+Without `--provider`, the command runs in dry-run mode: it loads existing
+output files from disk into the database without making any LLM calls.
 ---
 ## 16. Adapting This Pattern to Your Own Project
@@ -665,9 +716,9 @@ To build your own infospace:
 3. Write extraction guidelines that tell the LLM what to look for
 4. Create prompt templates using `@{macro}` syntax
 5. Populate `artifacts/sources/` with your source corpus
-6. Run `process_chapters.py` (or your equivalent pipeline script)
-7. Evaluate with `markitect infospace evaluate` and `check`
-8. Review `markitect infospace viability` against your thresholds
+6. `markitect infospace process --all --provider openrouter`
+7. `markitect infospace check` and `markitect infospace evaluate --provider openrouter`
+8. `markitect infospace viability` — review against your thresholds
 9. Iterate: refine guidelines, re-process, re-evaluate
 10. Once viable, use as a discipline for a new infospace