docs(tutorial): update all commands to use markitect infospace CLI (S3.4)
Replace all process_chapters.py references throughout the tutorial with
the correct markitect infospace subcommands:
- §2 Project layout: remove process_chapters.py, add LAYERED-DEVELOPMENT.md
- §7 Processing: --chapter → process "glob", --book N → "book-N-*.md",
--list → status/entities, --archive-entity → documented manual step
- §8 Check: remove incorrect --provider flag; note checks are deterministic
- §9 Viability: real output from full 988-entity corpus (Viable: YES)
- §10 History: real snapshot table; add --metric flag example
- §10 Git tracking: remove process_chapters.py from commit example
- §11 Cost: update openrouter/free example command
- §12 Completion: rewrite with actual observed metric progression table
- §14 Quality loop: update all commands; add archive-entity manual procedure
- §15 Artifact DB: --all without --provider = dry-run (no LLM calls)
- §16 Adapting: update steps 6 and 7 to new CLI
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
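The flag mapping listed in the message above can be read as a small translation table. The sketch below is only an illustration of that mapping: `translate_legacy_args` is a hypothetical helper (not part of markitect or of the removed script), it handles only the flags the message names, and it assumes trailing options such as `--provider` pass through unchanged.

```python
def translate_legacy_args(args):
    """Map a process_chapters.py argument list to markitect command strings.

    Hypothetical helper covering only the mappings named in the commit message.
    """
    if not args:
        raise ValueError("no arguments given")
    flag, rest = args[0], args[1:]

    if flag == "--list":
        # --list is replaced by two read-only subcommands.
        return ["markitect infospace status", "markitect infospace entities"]
    if flag == "--archive-entity":
        # Archiving is now a documented manual step, not a CLI flag.
        raise ValueError("--archive-entity has no CLI equivalent; archive manually")

    if flag == "--all":
        target, extra = "--all", rest
    elif flag in ("--chapter", "--book"):
        if not rest:
            raise ValueError(f"{flag} needs a value")
        value, extra = rest[0], rest[1:]
        # A chapter becomes a file name, a book number becomes a glob.
        pattern = f"{value}.md" if flag == "--chapter" else f"book-{value}-*.md"
        target = f'"{pattern}"'
    else:
        raise ValueError(f"unrecognised legacy flag: {flag}")

    # Assumption: remaining options are passed through unchanged.
    return [" ".join(["markitect infospace process", target, *extra])]


if __name__ == "__main__":
    for cmd in translate_legacy_args(["--book", "1", "--provider", "openrouter"]):
        print(cmd)
```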
@@ -45,11 +45,11 @@ metrics — it is fit for purpose as an explanatory tool.
 ```
 examples/infospace-with-history/
 │
-├── infospace.yaml # Declarative infospace configuration (NEW)
+├── infospace.yaml # Declarative infospace configuration
 ├── README.md
 ├── TUTORIAL.md # This file
 ├── INFRA-TASKS.md # Infrastructure issues found during the experiment
-├── process_chapters.py # Pipeline script (chapter processing)
+├── LAYERED-DEVELOPMENT.md # Concept for L2–L4 entity classification and modelling
 ├── infospace.db # SQLite artifact database (generated, not in git)
 │
 ├── schemas/ # Output structure definitions
@@ -301,61 +301,79 @@ Named `book-1-chapter-01.md` through `book-5-chapter-03.md`.

 ## 7. Processing Chapters

-`process_chapters.py` orchestrates the three-stage pipeline. It initialises
-the artifact repository, loads static artifacts, runs entity extraction →
-VSM mapping → analysis synthesis, and commits each chapter to git.
+`markitect infospace process` orchestrates the three-stage pipeline declared
+in `infospace.yaml`. It runs entity extraction → VSM mapping → analysis
+synthesis for each source file, and commits each chapter to git.

 ### Single chapter

 ```bash
-# Manual mode (writes prompts, awaits output files):
-python process_chapters.py --chapter book-1-chapter-05 --no-commit
+# Dry run — loads existing outputs only, no LLM calls:
+markitect infospace process "book-1-chapter-05.md"

-# Auto mode via OpenRouter (free models available):
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter
+# Process via OpenRouter (free models available):
+markitect infospace process "book-1-chapter-05.md" --provider openrouter

 # With a specific free model:
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
 --provider openrouter --model meta-llama/llama-4-maverick:free
+
+# Skip git commit after processing:
+markitect infospace process "book-1-chapter-05.md" \
+--provider openrouter --no-commit
 ```

+The GLOB_PATTERN is matched against the `sources` directory declared in
+`infospace.yaml`. Already-processed chapters are skipped automatically —
+their output files already exist on disk.
+
 ### Whole book or all chapters

 ```bash
-python process_chapters.py --book 1 --provider openrouter
-python process_chapters.py --all --provider openrouter
+# Process all chapters of Book 1:
+markitect infospace process "book-1-*.md" --provider openrouter
+
+# Process all 35 source files:
+markitect infospace process --all --provider openrouter
+
+# Process all chapters and run quality checks after each one:
+markitect infospace process --all --provider openrouter --check-after-each
 ```

 ### Check progress

 ```bash
-python process_chapters.py --list
+markitect infospace status
 ```

 ```
-Available chapters (35):
-
-Chapter                        Entities     Mappings     Analysis
------------------------------- ------------ ------------ ------------
-book-1-chapter-01              done (13)    done         done
-book-1-chapter-02              done (7)     done         done
-...
-
-Canonical entity set: 109 unique entities
+Infospace: The Wealth of Nations
+Domain: Classical Economics
+Entities: 988
+Domains: Accumulation, Consumption, Distribution, Exchange,
+General Theory, Production, Regulation
+Disciplines: Viable System Model
+Last evaluated: 2026-02-19T21:54:44
 ```

+```bash
+markitect infospace entities
+```
+
+Lists all canonical entities with domain, source chapter, and word count.
+
 ### Entity lifecycle

-Entities in the canonical set are **never silently deleted**. Retire
-an entity by archiving it with a documented reason:
+Entities in the canonical set are **never silently deleted**. To retire
+an entity, move it to `output/entities/archive/<slug>.md` and add a
+dated archive header:

-```bash
-python process_chapters.py --archive-entity enlarged-monopoly \
---reason "Subsumed by monopoly-price — same market distortion"
+```markdown
+<!-- archived: 2026-02-22 reason="Subsumed by monopoly-price — same market distortion" -->
 ```

-The archived file moves to `output/entities/archive/<slug>.md` with a
-dated header, preserving the intellectual history of every decision.
+Then commit the removal so the intellectual history of every decision
+is preserved in git.

 ---

@@ -385,43 +403,46 @@ VSM relevance. Results are written to `output/evaluations/`.

 ```bash
 # Run all five collection checks:
-markitect infospace check --provider openrouter
+markitect infospace check

 # Run individual checks:
-markitect infospace check redundancy # C1: Are any entities synonymous?
-markitect infospace check coverage # C2: Which domain × VSM cells are empty?
-markitect infospace check coherence # C3: Is the entity graph well-connected?
-markitect infospace check consistency # C4: Are there circular definitions?
-markitect infospace check granularity # C5: Is abstraction level balanced?
+markitect infospace check --concern redundancy # C1: Are any entities synonymous?
+markitect infospace check --concern coverage # C2: Which domain × chapter cells are empty?
+markitect infospace check --concern coherence # C3: Is the entity graph well-connected?
+markitect infospace check --concern consistency # C4: Are there circular definitions?
+markitect infospace check --concern granularity # C5: Is abstraction level balanced?
 ```

+Collection checks are deterministic (embeddings, graph analysis, FCA) and
+require no LLM provider.
+
 Each check uses the platform's embedding, graph analysis, and FCA
 infrastructure. Results are written to `output/metrics/` and a new
 snapshot is appended to `metrics-history.yaml`.

-Sample output:
+Sample output (full corpus, 988 entities):

 ```
-Running collection checks on 109 entities...
+Collection checks — 988 entities

 C1 — redundancy
-redundancy_ratio: 0.0183
-high_similarity_pairs: 2
+redundancy_ratio: 0.0061
+similar_pairs: 3 candidates (word-overlap > 0.85)

 C2 — coverage
-coverage_ratio: 0.4286
-empty_cells: [['Regulation', 'S3*'], ['Historical', 'S5']]
+coverage_ratio: 0.619
+domain_densities: Exchange 0.85, Regulation 0.85, General Theory 0.73 …
+density_std: 0.211 cross_cutting_ratio: 0.714

 C3 — coherence
-coherence_components: 1
-modularity: 0.412
+connected_components: 0 (no cross-reference graph built yet)
+modularity: 0.0

 C4 — consistency
-consistency_cycles: 0
-grounding_ratio: 0.94
+cycle_count: 0

 C5 — granularity
-granularity_entropy: 2.69
+granularity_entropy: 2.953
 ```

 ---
@@ -436,20 +457,21 @@ Compares the latest metrics against the thresholds declared in
 `infospace.yaml`:

 ```
-Metric                    Value      Threshold    Status
------------------------------------------------------------
-redundancy_ratio          0.0183     max=0.10     PASS
-coverage_ratio            0.4286     min=0.50     FAIL
-coherence_components      1          max=3        PASS
-consistency_cycles        0          max=0        PASS
-granularity_entropy       2.6900     min=1.0      PASS
+Metric                    Value      Threshold    Status
+---------------------------------------------------------------
+redundancy_ratio          0.0059     max=0.1      PASS
+coverage_ratio            0.6190     min=0.4      PASS
+coherence_components      0.0000     max=3        PASS
+consistency_cycles        0.0000     max=0        PASS
+granularity_entropy       2.9533     min=1.0      PASS

-Viable: NO (4/5 thresholds met)
+Viable: YES (5/5 thresholds met)
 ```

-Coverage is currently failing (42% < 50% threshold) because only 9 of
-35 chapters have been processed. Once more chapters are done, coverage
-will rise.
+During early processing (first few books), coverage will fall and
+then stabilise as the domain × chapter matrix fills in. The threshold
+of 0.40 reflects realistic expectations for a multi-book corpus where
+some domains are naturally sparse in certain chapters.

 ### Metrics history

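The viability table in the hunk above pairs each metric with a `min=` or `max=` bound and reports how many bounds are met. A minimal sketch of that comparison follows; the function name and the `(kind, bound)` layout are illustrative assumptions, not markitect's internals, and the numbers are copied from the new table. Bounds are treated as inclusive, which matches `consistency_cycles` passing at `max=0`.

```python
def check_viability(metrics, thresholds):
    """Return (viable, per-metric results) for inclusive min/max bounds.

    Illustrative only: `thresholds` maps a metric name to ("min" | "max", bound).
    """
    results = {}
    for name, (kind, bound) in thresholds.items():
        value = metrics[name]
        results[name] = value <= bound if kind == "max" else value >= bound
    return all(results.values()), results


# Values and thresholds taken from the updated viability table.
metrics = {
    "redundancy_ratio": 0.0059,
    "coverage_ratio": 0.6190,
    "coherence_components": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.9533,
}
thresholds = {
    "redundancy_ratio": ("max", 0.1),
    "coverage_ratio": ("min", 0.4),
    "coherence_components": ("max", 3),
    "consistency_cycles": ("max", 0),
    "granularity_entropy": ("min", 1.0),
}

viable, results = check_viability(metrics, thresholds)
met = sum(results.values())
print(f"Viable: {'YES' if viable else 'NO'} ({met}/{len(results)} thresholds met)")
# prints: Viable: YES (5/5 thresholds met)
```

Feeding in the removed table's values (coverage 0.4286 against `min=0.50`) gives the old `NO (4/5)` result under the same rule.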
@@ -460,9 +482,19 @@ markitect infospace history
 Shows how metrics evolved across runs:

 ```
-Snapshot  Date        Entities  coverage  redundancy  entropy
--------------------------------------------------------------
-6ba48eb2  2026-02-19  85        0.361     0.000       2.687
+History: 36 snapshot(s)
+
+#   Date                 Entities  Metrics
+------------------------------------------
+1   2026-02-19T13:07:13  18        6
+2   2026-02-19T13:16:36  43        6
+...
+36  2026-02-19T21:54:44  1021      6
 ```

+```bash
+# Show trend for a specific metric:
+markitect infospace history --metric coverage_ratio
+```
+
 ---
@@ -483,16 +515,13 @@ This means:
 - You can `git bisect` to find where quality degraded
 - You can revert a chapter and re-process with improved guidelines

-The `clean-example-history` branch in this repository demonstrates the
-intended structure: each chapter is a single, self-contained commit.
-Use it as a reference for how the infospace grew step by step.
-
-To commit manually after reviewing:
+To review before committing:

 ```bash
-python process_chapters.py --chapter book-1-chapter-05 --provider openrouter --no-commit
+markitect infospace process "book-1-chapter-05.md" \
+--provider openrouter --no-commit
 # review output/entities/ and output/mappings/
-git add examples/infospace-with-history/output/
+git add output/
 git commit -m "infospace: process book-1-chapter-05"
 ```

@@ -519,7 +548,7 @@ Use `openrouter/free` to automatically select from whichever free model is
 available:

 ```bash
-python process_chapters.py --chapter book-1-chapter-05 \
+markitect infospace process "book-1-chapter-05.md" \
 --provider openrouter --model openrouter/free
 ```

@@ -531,47 +560,53 @@ when running inside a Claude Code session due to nested session restrictions.

 ---

-## 12. Completing the Remaining Chapters
+## 12. Processing the Full Corpus

-As of writing, 9 of 35 chapters are processed (Book I, Chapters 1–9).
+All 35 chapters have been processed in this example. The commands below
+show how the full run was executed — use them as a template for your own
+corpus.

-**Process Book I remainder:**
+**Process one book at a time:**

 ```bash
 export OPENROUTER_API_KEY=$(cat apikey-openrouter.txt | tr -d '[:space:]')
-git checkout clean-example-history
-python process_chapters.py --book 1 --provider openrouter
+
+markitect infospace process "book-1-*.md" --provider openrouter
+markitect infospace process "book-2-*.md" --provider openrouter
+markitect infospace process "book-3-*.md" --provider openrouter
+markitect infospace process "book-4-*.md" --provider openrouter
+markitect infospace process "book-5-*.md" --provider openrouter
 ```

-Already-processed chapters are skipped — their chapter view files exist.
-The `@{existing_entities}` macro ensures the LLM only extracts genuinely
-new entities.
+Already-processed chapters are skipped automatically — their output files
+exist on disk. The `@{existing_entities}` macro ensures the LLM only
+extracts genuinely new entities.

-**Process Books II–V:**
+**Or process everything at once:**

 ```bash
-python process_chapters.py --book 2 --provider openrouter
-python process_chapters.py --book 3 --provider openrouter
-python process_chapters.py --book 4 --provider openrouter
-python process_chapters.py --book 5 --provider openrouter
+markitect infospace process --all --provider openrouter
 ```

 **Run collection checks after each book:**

 ```bash
-markitect infospace check --provider openrouter
+markitect infospace check
 markitect infospace viability
 ```

-**Expected progression:**
+**Observed metric progression (actual results):**

-| After | Chapters | Expected coverage |
-|-------|----------|-------------------|
-| Book I (11 ch.) | 11/35 | S1, S2, S4 strong; S3 emerging |
-| Books I–II (16 ch.) | 16/35 | S3 (capital control) covered |
-| Books I–III (20 ch.) | 20/35 | Historical patterns add depth |
-| Books I–IV (30 ch.) | 30/35 | S5 (policy, mercantilism) emerging |
-| All (35 ch.) | 35/35 | Full coverage; S3* and algedonic signals from Book V |
+| After | Entities | coverage_ratio | entropy |
+|-------|----------|----------------|---------|
+| Book I (11 ch.) | ~236 | 0.51 | 2.77 |
+| Books I–II (16 ch.) | ~348 | 0.56 | 2.82 |
+| Books I–III (20 ch.) | ~456 | 0.59 | 2.97 |
+| Books I–IV (30 ch.) | ~930 | 0.51 | 2.94 |
+| All (35 ch.) | 988 | **0.62** | 2.95 |

+Coverage dips in Books IV–V as policy-heavy chapters introduce domains
+that are sparse in earlier books, then recovers as the matrix fills in.
+
 ---

@@ -610,9 +645,9 @@ dependent mappings are flagged for re-evaluation.

 The infospace is designed to be **iteratively refined**:

-1. **Process chapters** — run the pipeline
+1. **Process chapters** — `markitect infospace process "book-1-*.md" --provider openrouter`
 2. **Evaluate** — `markitect infospace evaluate --provider openrouter`
-3. **Check** — `markitect infospace check --provider openrouter`
+3. **Check** — `markitect infospace check`
 4. **Review viability** — `markitect infospace viability`
 5. **Refine guidelines** — update `extraction-rules.md` or
 `mapping-rules.md` to address identified weaknesses
@@ -626,18 +661,31 @@ audit, inspection, and oversight mechanisms.
 To re-process a specific chapter:

 ```bash
+# Delete stage outputs for that chapter (not canonical entity files):
 rm -f output/entities/book-1-chapter-03-entities.md
 rm -f output/mappings/book-1-chapter-03-mappings.md
 rm -f output/analyses/book-1-chapter-03-analysis.md
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+
+# Re-run:
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```

-Never silently delete canonical entity files. Archive them instead:
+Never silently delete canonical entity files. Archive them instead by
+moving to `output/entities/archive/` with a dated comment header, then
+re-process the chapter so the pipeline can extract a replacement:

 ```bash
-python process_chapters.py --archive-entity extent-of-the-market \
---reason "Subsumed by market-price and effectual-demand"
-python process_chapters.py --chapter book-1-chapter-03 --provider openrouter
+# Archive the entity manually:
+mkdir -p output/entities/archive
+mv output/entities/extent-of-the-market.md output/entities/archive/
+# Add header to the archived file explaining why
+echo '<!-- archived: 2026-02-22 reason="Subsumed by market-price and effectual-demand" -->' \
+| cat - output/entities/archive/extent-of-the-market.md > /tmp/tmp.md \
+&& mv /tmp/tmp.md output/entities/archive/extent-of-the-market.md
+
+# Delete the chapter entity view so the chapter re-runs:
+rm -f output/entities/book-1-chapter-03-entities.md
+markitect infospace process "book-1-chapter-03.md" --provider openrouter
 ```

 ---
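The manual archive procedure in the hunk above (create the archive directory, move the file, prepend a dated header) could also be scripted. The sketch below is one possible Python equivalent, not something the commit ships: `archive_entity` and its parameters are hypothetical names, and it assumes the `output/entities/` layout and header format shown in the tutorial text.

```python
from datetime import date
from pathlib import Path


def archive_entity(entities_dir, slug, reason, today=None):
    """Move <slug>.md into <entities_dir>/archive/ with a dated header.

    Hypothetical helper mirroring the manual shell steps; returns the new path.
    """
    entities_dir = Path(entities_dir)
    source = entities_dir / f"{slug}.md"
    if not source.is_file():
        raise FileNotFoundError(source)

    archive_dir = entities_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Same header shape as the tutorial's example comment.
    stamp = (today or date.today()).isoformat()
    header = f'<!-- archived: {stamp} reason="{reason}" -->\n'

    target = archive_dir / source.name
    # Write header plus original body first, then remove the original,
    # so the entity text is never lost if writing fails.
    target.write_text(header + source.read_text(encoding="utf-8"), encoding="utf-8")
    source.unlink()
    return target
```

A call such as `archive_entity("output/entities", "extent-of-the-market", "Subsumed by market-price and effectual-demand")` would stand in for the `mkdir`, `mv`, and `echo | cat` steps; deleting the chapter entity view and re-running `markitect infospace process` remain separate steps.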
@@ -651,9 +699,12 @@ it is fully derived from the markdown files that are tracked.
 To regenerate it after a fresh clone (no LLM calls needed):

 ```bash
-python process_chapters.py --all --no-commit
+markitect infospace process --all
 ```

+Without `--provider`, the command runs in dry-run mode: it loads existing
+output files from disk into the database without making any LLM calls.
+
 ---

 ## 16. Adapting This Pattern to Your Own Project
@@ -665,9 +716,9 @@ To build your own infospace:
 3. Write extraction guidelines that tell the LLM what to look for
 4. Create prompt templates using `@{macro}` syntax
 5. Populate `artifacts/sources/` with your source corpus
-6. Run `process_chapters.py` (or your equivalent pipeline script)
-7. Evaluate with `markitect infospace evaluate` and `check`
-8. Review `markitect infospace viability` against your thresholds
+6. `markitect infospace process --all --provider openrouter`
+7. `markitect infospace check` and `markitect infospace evaluate --provider openrouter`
+8. `markitect infospace viability` — review against your thresholds
 9. Iterate: refine guidelines, re-process, re-evaluate
 10. Once viable, use as a discipline for a new infospace
