docs(tutorial): update all commands to use markitect infospace CLI (S3.4)

Replace all process_chapters.py references throughout the tutorial with
the correct markitect infospace subcommands:

- §2  Project layout: remove process_chapters.py, add LAYERED-DEVELOPMENT.md
- §7  Processing: --chapter → process "glob", --book N → "book-N-*.md",
      --list → status/entities, --archive-entity → documented manual step
- §8  Check: remove incorrect --provider flag; note checks are deterministic
- §9  Viability: real output from full 988-entity corpus (Viable: YES)
- §10 History: real snapshot table; add --metric flag example
- §10 Git tracking: remove process_chapters.py from commit example
- §11 Cost: update openrouter/free example command
- §12 Completion: rewrite with actual observed metric progression table
- §14 Quality loop: update all commands; add archive-entity manual procedure
- §15 Artifact DB: --all without --provider = dry-run (no LLM calls)
- §16 Adapting: update steps 6 and 7 to new CLI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 23:31:38 +01:00
parent c861520ccd
commit 8f00fa2018


@@ -45,11 +45,11 @@ metrics — it is fit for purpose as an explanatory tool.
```
examples/infospace-with-history/
├── infospace.yaml # Declarative infospace configuration (NEW)
├── infospace.yaml # Declarative infospace configuration
├── README.md
├── TUTORIAL.md # This file
├── INFRA-TASKS.md # Infrastructure issues found during the experiment
├── process_chapters.py # Pipeline script (chapter processing)
├── LAYERED-DEVELOPMENT.md # Concept for L2–L4 entity classification and modelling
├── infospace.db # SQLite artifact database (generated, not in git)
├── schemas/ # Output structure definitions
@@ -301,61 +301,79 @@ Named `book-1-chapter-01.md` through `book-5-chapter-03.md`.
## 7. Processing Chapters
`process_chapters.py` orchestrates the three-stage pipeline. It initialises
the artifact repository, loads static artifacts, runs entity extraction →
VSM mapping → analysis synthesis, and commits each chapter to git.
`markitect infospace process` orchestrates the three-stage pipeline declared
in `infospace.yaml`. It runs entity extraction → VSM mapping → analysis
synthesis for each source file, and commits each chapter to git.
### Single chapter
```bash
# Manual mode (writes prompts, awaits output files):
python process_chapters.py --chapter book-1-chapter-05 --no-commit
# Dry run — loads existing outputs only, no LLM calls:
markitect infospace process "book-1-chapter-05.md"
# Auto mode via OpenRouter (free models available):
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter
# Process via OpenRouter (free models available):
markitect infospace process "book-1-chapter-05.md" --provider openrouter
# With a specific free model:
python process_chapters.py --chapter book-1-chapter-05 \
markitect infospace process "book-1-chapter-05.md" \
--provider openrouter --model meta-llama/llama-4-maverick:free
# Skip git commit after processing:
markitect infospace process "book-1-chapter-05.md" \
--provider openrouter --no-commit
```
The GLOB_PATTERN is matched against the `sources` directory declared in
`infospace.yaml`. Already-processed chapters are skipped automatically —
their output files already exist on disk.
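
To preview which source files a glob would select, and which of those still lack stage output, a small shell helper can be sketched. It is not part of markitect: the function name `pending_chapters` is hypothetical, and it assumes a chapter counts as processed once `output/entities/<chapter>-entities.md` exists, following the file naming used in the re-processing example of this tutorial.

```bash
# Hypothetical helper, not a markitect command.
# Prints source chapters matching a glob that have no entity view yet.
# Usage: pending_chapters SOURCES_DIR OUTPUT_DIR GLOB
pending_chapters() {
    local sources_dir=$1 output_dir=$2 pattern=$3
    local src chapter
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for src in "$sources_dir"/$pattern; do
        [ -e "$src" ] || continue            # the glob matched nothing
        chapter=$(basename "$src" .md)
        # Assumption: a chapter is "processed" once its entity view exists.
        if [ ! -e "$output_dir/entities/$chapter-entities.md" ]; then
            printf '%s\n' "$chapter"
        fi
    done
}
```

For example, `pending_chapters artifacts/sources output "book-1-*.md"` would list the Book 1 chapters that a subsequent `process` run still has to work on, under the assumptions above.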
### Whole book or all chapters
```bash
python process_chapters.py --book 1 --provider openrouter
python process_chapters.py --all --provider openrouter
# Process all chapters of Book 1:
markitect infospace process "book-1-*.md" --provider openrouter
# Process all 35 source files:
markitect infospace process --all --provider openrouter
# Process all chapters and run quality checks after each one:
markitect infospace process --all --provider openrouter --check-after-each
```
### Check progress
```bash
python process_chapters.py --list
markitect infospace status
```
```
Available chapters (35):
Chapter Entities Mappings Analysis
------------------------------ ------------ ------------ ------------
book-1-chapter-01 done (13) done done
book-1-chapter-02 done (7) done done
...
Canonical entity set: 109 unique entities
Infospace: The Wealth of Nations
Domain: Classical Economics
Entities: 988
Domains: Accumulation, Consumption, Distribution, Exchange,
General Theory, Production, Regulation
Disciplines: Viable System Model
Last evaluated: 2026-02-19T21:54:44
```
```bash
markitect infospace entities
```
Lists all canonical entities with domain, source chapter, and word count.
### Entity lifecycle
Entities in the canonical set are **never silently deleted**. Retire
an entity by archiving it with a documented reason:
Entities in the canonical set are **never silently deleted**. To retire
an entity, move it to `output/entities/archive/<slug>.md` and add a
dated archive header:
```bash
python process_chapters.py --archive-entity enlarged-monopoly \
--reason "Subsumed by monopoly-price — same market distortion"
```markdown
<!-- archived: 2026-02-22 reason="Subsumed by monopoly-price — same market distortion" -->
```
The archived file moves to `output/entities/archive/<slug>.md` with a
dated header, preserving the intellectual history of every decision.
Then commit the removal so the intellectual history of every decision
is preserved in git.
---
@@ -385,43 +403,46 @@ VSM relevance. Results are written to `output/evaluations/`.
```bash
# Run all five collection checks:
markitect infospace check --provider openrouter
markitect infospace check
# Run individual checks:
markitect infospace check redundancy # C1: Are any entities synonymous?
markitect infospace check coverage # C2: Which domain × VSM cells are empty?
markitect infospace check coherence # C3: Is the entity graph well-connected?
markitect infospace check consistency # C4: Are there circular definitions?
markitect infospace check granularity # C5: Is abstraction level balanced?
markitect infospace check --concern redundancy # C1: Are any entities synonymous?
markitect infospace check --concern coverage # C2: Which domain × chapter cells are empty?
markitect infospace check --concern coherence # C3: Is the entity graph well-connected?
markitect infospace check --concern consistency # C4: Are there circular definitions?
markitect infospace check --concern granularity # C5: Is abstraction level balanced?
```
Collection checks are deterministic (embeddings, graph analysis, FCA) and
require no LLM provider.
Each check uses the platform's embedding, graph analysis, and FCA
infrastructure. Results are written to `output/metrics/` and a new
snapshot is appended to `metrics-history.yaml`.
Sample output:
Sample output (full corpus, 988 entities):
```
Running collection checks on 109 entities...
Collection checks — 988 entities
C1 — redundancy
redundancy_ratio: 0.0183
high_similarity_pairs: 2
redundancy_ratio: 0.0061
similar_pairs: 3 candidates (word-overlap > 0.85)
C2 — coverage
coverage_ratio: 0.4286
empty_cells: [['Regulation', 'S3*'], ['Historical', 'S5']]
coverage_ratio: 0.619
domain_densities: Exchange 0.85, Regulation 0.85, General Theory 0.73 …
density_std: 0.211 cross_cutting_ratio: 0.714
C3 — coherence
coherence_components: 1
modularity: 0.412
connected_components: 0 (no cross-reference graph built yet)
modularity: 0.0
C4 — consistency
consistency_cycles: 0
grounding_ratio: 0.94
cycle_count: 0
C5 — granularity
granularity_entropy: 2.69
granularity_entropy: 2.953
```
---
@@ -436,20 +457,21 @@ Compares the latest metrics against the thresholds declared in
`infospace.yaml`:
```
Metric Value Threshold Status
-----------------------------------------------------------
redundancy_ratio 0.0183 max=0.10 PASS
coverage_ratio 0.4286 min=0.50 FAIL
coherence_components 1 max=3 PASS
consistency_cycles 0 max=0 PASS
granularity_entropy 2.6900 min=1.0 PASS
Metric Value Threshold Status
---------------------------------------------------------------
redundancy_ratio 0.0059 max=0.1 PASS
coverage_ratio 0.6190 min=0.4 PASS
coherence_components 0.0000 max=3 PASS
consistency_cycles 0.0000 max=0 PASS
granularity_entropy 2.9533 min=1.0 PASS
Viable: NO (4/5 thresholds met)
Viable: YES (5/5 thresholds met)
```
Coverage is currently failing (42% < 50% threshold) because only 9 of
35 chapters have been processed. Once more chapters are done, coverage
will rise.
During early processing (first few books), coverage will fall and
then stabilise as the domain × chapter matrix fills in. The threshold
of 0.40 reflects realistic expectations for a multi-book corpus where
some domains are naturally sparse in certain chapters.
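
The pass/fail logic behind the viability table can be illustrated with a short shell sketch. It is not markitect's implementation: the function `check_thresholds` and its three-column input format (metric, value, `min=` or `max=` bound) are assumptions chosen to mirror the table layout.

```bash
# Illustrative only: mirrors the viability table, not markitect's code.
# Reads "metric value bound" lines on stdin, where bound is min=X or max=X.
check_thresholds() {
    awk '
        NF < 3 { next }                       # skip blank or malformed lines
        {
            split($3, b, "=")                 # b[1] is "min" or "max", b[2] the limit
            ok = (b[1] == "min") ? ($2 + 0 >= b[2] + 0) : ($2 + 0 <= b[2] + 0)
            printf "%-22s %s\n", $1, (ok ? "PASS" : "FAIL")
            total++
            if (ok) passed++
        }
        END {
            printf "Viable: %s (%d/%d thresholds met)\n",
                   (passed == total ? "YES" : "NO"), passed, total
        }
    '
}
```

Feeding it the five metric rows from either version of the table reproduces the corresponding verdict line: the earlier values fail only on `coverage_ratio` (0.4286 against `min=0.50`), while the full-corpus values pass every bound.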
### Metrics history
@@ -460,9 +482,19 @@ markitect infospace history
Shows how metrics evolved across runs:
```
Snapshot Date Entities coverage redundancy entropy
-------------------------------------------------------------
6ba48eb2 2026-02-19 85 0.361 0.000 2.687
History: 36 snapshot(s)
# Date Entities Metrics
------------------------------------------
1 2026-02-19T13:07:13 18 6
2 2026-02-19T13:16:36 43 6
...
36 2026-02-19T21:54:44 1021 6
```
```bash
# Show trend for a specific metric:
markitect infospace history --metric coverage_ratio
```
---
@@ -483,16 +515,13 @@ This means:
- You can `git bisect` to find where quality degraded
- You can revert a chapter and re-process with improved guidelines
The `clean-example-history` branch in this repository demonstrates the
intended structure: each chapter is a single, self-contained commit.
Use it as a reference for how the infospace grew step by step.
To commit manually after reviewing:
To review before committing:
```bash
python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
markitect infospace process "book-1-chapter-05.md" \
--provider openrouter --no-commit
# review output/entities/ and output/mappings/
git add examples/infospace-with-history/output/
git add output/
git commit -m "infospace: process book-1-chapter-05"
```
@@ -519,7 +548,7 @@ Use `openrouter/free` to automatically select from whichever free model is
available:
```bash
python process_chapters.py --chapter book-1-chapter-05 \
markitect infospace process "book-1-chapter-05.md" \
--provider openrouter --model openrouter/free
```
@@ -531,47 +560,53 @@ when running inside a Claude Code session due to nested session restrictions.
---
## 12. Completing the Remaining Chapters
## 12. Processing the Full Corpus
As of writing, 9 of 35 chapters are processed (Book I, Chapters 1–9).
All 35 chapters have been processed in this example. The commands below
show how the full run was executed — use them as a template for your own
corpus.
**Process Book I remainder:**
**Process one book at a time:**
```bash
export OPENROUTER_API_KEY=$(cat apikey-openrouter.txt | tr -d '[:space:]')
git checkout clean-example-history
python process_chapters.py --book 1 --provider openrouter
markitect infospace process "book-1-*.md" --provider openrouter
markitect infospace process "book-2-*.md" --provider openrouter
markitect infospace process "book-3-*.md" --provider openrouter
markitect infospace process "book-4-*.md" --provider openrouter
markitect infospace process "book-5-*.md" --provider openrouter
```
Already-processed chapters are skipped — their chapter view files exist.
The `@{existing_entities}` macro ensures the LLM only extracts genuinely
new entities.
Already-processed chapters are skipped automatically — their output files
exist on disk. The `@{existing_entities}` macro ensures the LLM only
extracts genuinely new entities.
**Process Books II–V:**
**Or process everything at once:**
```bash
python process_chapters.py --book 2 --provider openrouter
python process_chapters.py --book 3 --provider openrouter
python process_chapters.py --book 4 --provider openrouter
python process_chapters.py --book 5 --provider openrouter
markitect infospace process --all --provider openrouter
```
**Run collection checks after each book:**
```bash
markitect infospace check --provider openrouter
markitect infospace check
markitect infospace viability
```
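
The per-book rhythm of processing, checking, and reviewing viability can be wrapped in a loop. The sketch below only strings together the commands already shown in this tutorial; the function name `process_books` and the `MARKITECT` override variable are hypothetical conveniences, not features of the CLI.

```bash
# Hypothetical wrapper around the commands shown in this tutorial.
# Usage: process_books 1 2 3 4 5
# MARKITECT may name an alternative executable (an assumption made for
# testing and for non-standard installs); it defaults to "markitect".
process_books() {
    local cli=${MARKITECT:-markitect} book
    for book in "$@"; do
        # The glob is quoted so that markitect, not the shell, expands it.
        "$cli" infospace process "book-$book-*.md" --provider openrouter || return 1
        "$cli" infospace check || return 1
        # Print the viability table for review. Its exit status is not
        # interpreted here because the tutorial does not document it.
        "$cli" infospace viability
    done
}
```

Stopping on the first failed `process` or `check` keeps a broken book from being buried under later commits.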
**Expected progression:**
**Observed metric progression (actual results):**
| After | Chapters | Expected coverage |
|-------|----------|-------------------|
| Book I (11 ch.) | 11/35 | S1, S2, S4 strong; S3 emerging |
| Books I–II (16 ch.) | 16/35 | S3 (capital control) covered |
| Books I–III (20 ch.) | 20/35 | Historical patterns add depth |
| Books I–IV (30 ch.) | 30/35 | S5 (policy, mercantilism) emerging |
| All (35 ch.) | 35/35 | Full coverage; S3* and algedonic signals from Book V |
| After | Entities | coverage_ratio | entropy |
|-------|----------|----------------|---------|
| Book I (11 ch.) | ~236 | 0.51 | 2.77 |
| Books I–II (16 ch.) | ~348 | 0.56 | 2.82 |
| Books I–III (20 ch.) | ~456 | 0.59 | 2.97 |
| Books I–IV (30 ch.) | ~930 | 0.51 | 2.94 |
| All (35 ch.) | 988 | **0.62** | 2.95 |
Coverage dips in Books IV–V as policy-heavy chapters introduce domains
that are sparse in earlier books, then recovers as the matrix fills in.
---
@@ -610,9 +645,9 @@ dependent mappings are flagged for re-evaluation.
The infospace is designed to be **iteratively refined**:
1. **Process chapters** – run the pipeline
1. **Process chapters** – `markitect infospace process "book-1-*.md" --provider openrouter`
2. **Evaluate** – `markitect infospace evaluate --provider openrouter`
3. **Check** – `markitect infospace check --provider openrouter`
3. **Check** – `markitect infospace check`
4. **Review viability** – `markitect infospace viability`
5. **Refine guidelines** — update `extraction-rules.md` or
`mapping-rules.md` to address identified weaknesses
@@ -626,18 +661,31 @@ audit, inspection, and oversight mechanisms.
To re-process a specific chapter:
```bash
# Delete stage outputs for that chapter (not canonical entity files):
rm -f output/entities/book-1-chapter-03-entities.md
rm -f output/mappings/book-1-chapter-03-mappings.md
rm -f output/analyses/book-1-chapter-03-analysis.md
python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
# Re-run:
markitect infospace process "book-1-chapter-03.md" --provider openrouter
```
Never silently delete canonical entity files. Archive them instead:
Never silently delete canonical entity files. Archive them instead by
moving to `output/entities/archive/` with a dated comment header, then
re-process the chapter so the pipeline can extract a replacement:
```bash
python process_chapters.py --archive-entity extent-of-the-market \
--reason "Subsumed by market-price and effectual-demand"
python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
# Archive the entity manually:
mkdir -p output/entities/archive
mv output/entities/extent-of-the-market.md output/entities/archive/
# Add header to the archived file explaining why
echo '<!-- archived: 2026-02-22 reason="Subsumed by market-price and effectual-demand" -->' \
| cat - output/entities/archive/extent-of-the-market.md > /tmp/tmp.md \
&& mv /tmp/tmp.md output/entities/archive/extent-of-the-market.md
# Delete the chapter entity view so the chapter re-runs:
rm -f output/entities/book-1-chapter-03-entities.md
markitect infospace process "book-1-chapter-03.md" --provider openrouter
```
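
The manual archive steps can be collected into a reusable shell function. This is a sketch, not a markitect command: the name `archive_entity` and its argument order are hypothetical. It writes the archived copy next to its final location rather than to a fixed path under `/tmp`, so two concurrent runs cannot overwrite each other's temporary file.

```bash
# Hypothetical helper wrapping the manual archive steps; not a markitect command.
# Usage: archive_entity ENTITIES_DIR SLUG REASON
archive_entity() {
    local entities_dir=$1 slug=$2 reason=$3
    local src="$entities_dir/$slug.md"
    local dest="$entities_dir/archive/$slug.md"

    [ -f "$src" ] || { echo "no such entity: $src" >&2; return 1; }
    mkdir -p "$entities_dir/archive" || return 1

    # Build the archived copy under a temporary name first, so a failure
    # never leaves a half-written file at the final archive path.
    {
        printf '<!-- archived: %s reason="%s" -->\n' "$(date +%F)" "$reason"
        cat "$src"
    } > "$dest.tmp" && mv "$dest.tmp" "$dest" && rm -f "$src"
}
```

A call such as `archive_entity output/entities extent-of-the-market "Subsumed by market-price and effectual-demand"` performs the move and header steps; deleting the chapter entity view and re-running `process` remain separate, deliberate actions.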
---
@@ -651,9 +699,12 @@ it is fully derived from the markdown files that are tracked.
To regenerate it after a fresh clone (no LLM calls needed):
```bash
python process_chapters.py --all --no-commit
markitect infospace process --all
```
Without `--provider`, the command runs in dry-run mode: it loads existing
output files from disk into the database without making any LLM calls.
---
## 16. Adapting This Pattern to Your Own Project
@@ -665,9 +716,9 @@ To build your own infospace:
3. Write extraction guidelines that tell the LLM what to look for
4. Create prompt templates using `@{macro}` syntax
5. Populate `artifacts/sources/` with your source corpus
6. Run `process_chapters.py` (or your equivalent pipeline script)
7. Evaluate with `markitect infospace evaluate` and `check`
8. Review `markitect infospace viability` against your thresholds
6. `markitect infospace process --all --provider openrouter`
7. `markitect infospace check` and `markitect infospace evaluate --provider openrouter`
8. `markitect infospace viability` — review against your thresholds
9. Iterate: refine guidelines, re-process, re-evaluate
10. Once viable, use as a discipline for a new infospace