Three coordinated changes that let the pipeline produce a clean
chapter-by-chapter git history on long texts without archaeology after
the fact.

1. Richer commit messages. `SourcePipeline._git_commit` now diffs the
   staged changes, buckets added files by output subdirectory (entities,
   evaluations, classifications, mappings, analyses, metrics, logs), and
   includes counts in the commit body. So `git log` reads "entities: +23,
   evaluations: +23" per chapter instead of the same generic blurb on
   every commit. Zero behaviour change when no output changed; falls back
   to the original message if the diff query fails.

2. --eval-after-source / --classify-after-source on `infospace process`.
   After a source's stages succeed, the pipeline identifies which entity
   files are *new* (set diff of entity slugs before vs after), loads their
   EntityMeta, and runs per-entity evaluation and/or classification scoped
   to just those slugs before the per-source git commit lands. Result:
   each chapter's commit is self-contained: extraction + evaluation +
   classification in one atomic unit. Gated behind explicit flags because
   the cost is real (LLM latency per chapter rather than amortised across
   one bulk batch).

3. `markitect infospace chapters` subcommand. Lists source files in
   canonical order with entity count, evaluated count, classified count,
   and mean per-entity score per source. Text or JSON output. Natural
   triage surface for long-text infospaces: spot chapters that
   under-extracted or evaluated poorly.

Also: `docs/advanced-usage.md` gets a new "Systematic processing of long
texts" section with the recommended flag combo and the tradeoff note on
cost.

11 new unit tests cover the chapters command (text/json/no-sources), the
process flag wiring (help + provider requirement), and the commit-body
bucket logic. Full infospace+llm unit suite (315 tests) green; 3
pre-existing infospace failures unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

231 lines
7.8 KiB
Markdown
# Advanced Usage: Wealth of Nations Infospace

Patterns for working with the WoN infospace (988 entities) after the initial
pipeline run. Every command in this file has been run against the actual
infospace at the time of writing (2026-04-21); output shapes are excerpted
verbatim.

All commands assume `cwd = examples/infospace-with-history` and the
`markitect-venv` Python environment.

---

## 1. Incremental evaluation: add entities after the initial run

`markitect infospace evaluate` writes one file per entity under
`output/evaluations/<slug>.md`. It skips any entity whose evaluation file
already exists, so re-running after adding a new entity processes only the
new one.

```bash
# Add a new entity file
vim output/entities/new-concept.md

# Evaluate only the new entity (explicit)
markitect infospace evaluate --entity new-concept --provider openrouter

# Or re-run the whole pass: the existing 988 are skipped, only the new file hits the LLM
markitect infospace evaluate --provider openrouter
```

**How skip detection works.** Evaluation filenames are normalised to
underscores, with a possessive `'s` surviving as `_s_` (the `farmers-capital`
entity maps to the `farmer_s_capital.md` evaluation). If a new entity's slug
normalises to the name of an existing evaluation file, its evaluation is
skipped. To be sure an entity was picked up, check:

```bash
# Count entities (excluding per-chapter files such as book-1-chapter-06-entities.md)
ls output/entities/*.md | grep -Ev 'book-[0-9]+-(chapter-[0-9]+|introduction)-' | wc -l
# Count evaluations; a gap between the two numbers means some entities have no evaluation file
ls output/evaluations/*.md | wc -l
```

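The exact normalisation rule is not spelled out here, so the following is only a sketch of how such a collision can arise. It assumes the evaluation filename is derived from the entity's display name by lowercasing and collapsing every run of non-alphanumeric characters to a single underscore; `eval_slug` and `find_collisions` are hypothetical helpers, not part of `markitect`.

```python
import re
from collections import defaultdict


def eval_slug(name: str) -> str:
    """Assumed rule: lowercase, collapse each non-alphanumeric run to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def find_collisions(names):
    """Group names by assumed evaluation slug; keep groups with more than one name."""
    by_slug = defaultdict(list)
    for name in names:
        by_slug[eval_slug(name)].append(name)
    return {slug: group for slug, group in by_slug.items() if len(group) > 1}


names = ["Farmer's Capital", "Farmer-s capital", "Accumulation of Stock"]
print(eval_slug("Farmer's Capital"))  # farmer_s_capital
print(find_collisions(names))
```

Under this assumed rule the first two names share one evaluation filename, so whichever is evaluated second would be skipped.
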
---

## 2. Re-evaluating after guideline changes

`evaluate` has no `--force` flag; re-evaluation requires deleting the
existing file first.

```bash
# Re-evaluate a single entity after updating the evaluation rubric
rm output/evaluations/accumulation_of_stock.md
markitect infospace evaluate --entity accumulation-of-stock --provider openrouter

# Re-evaluate a whole chapter
ls output/entities/book-1-chapter-06-entities.md   # see which entities the chapter produced
# Map chapter entities to eval filenames (apostrophe/underscore normalisation) and rm them,
# then re-run the full pass: only entities whose evaluation file is missing hit the LLM
markitect infospace evaluate --provider openrouter
```

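For the whole-chapter case, the mapping step can be sketched as follows. This is a hedged illustration: it assumes the simple hyphen-to-underscore mapping seen in the `accumulation-of-stock` example and does not handle the apostrophe case; `eval_path` and `stale_evaluations` are hypothetical names, and the slug list must be filled in by hand from the chapter's entities.

```python
from pathlib import Path


def eval_path(entity_slug: str, eval_dir: Path) -> Path:
    # Assumption: hyphens become underscores; apostrophe handling is not covered.
    return eval_dir / (entity_slug.replace("-", "_") + ".md")


def stale_evaluations(entity_slugs, eval_dir: Path):
    """Return the evaluation files that currently exist for the given entity slugs."""
    candidates = (eval_path(slug, eval_dir) for slug in entity_slugs)
    return [path for path in candidates if path.exists()]


slugs = ["accumulation-of-stock"]  # fill in from the chapter's entity list
for path in stale_evaluations(slugs, Path("output/evaluations")):
    # Dry run: switch to path.unlink() once the list looks right.
    print(f"would remove {path}")
```
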
After re-evaluating, refresh the aggregate:

```bash
markitect infospace eval-summary --update-metrics
```

This merges `per_entity_mean` into `output/metrics/metrics.yaml` so the next
`markitect infospace viability` check reflects the new scores.

---

## 3. Interpreting per-entity score distributions

`eval-summary` shows the mean for each of the five evaluation dimensions
plus the overall range:

```
$ markitect infospace eval-summary
Evaluation summary — 985 entities evaluated

Dimension                         Mean
--------------------------------------
overall                          3.956
definition_precision             3.620
domain_placement                 4.559
explanatory_value                3.936
source_grounding                 4.358
vsm_relevance                    3.305

Range: 1.00 – 4.80
```

Interpretation:

- `overall` (3.956) is above the 3.5 viability threshold, so the collection
  passes `per_entity_mean`.
- The lowest dimension (`vsm_relevance` = 3.305) is the weakest signal. If
  the collection is meant to be VSM-grounded, this is the dimension most
  worth improving (via sharper entity definitions or schema changes).
- A wide range (1.00 – 4.80) tells you there are outliers at both ends,
  which are worth triaging (see pattern 4).

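The same reading can be expressed as a small check. This is a sketch using the means excerpted above; the 3.5 threshold is the one quoted in the text, while `interpret` and the strict `>` comparison are assumptions rather than the tool's own logic.

```python
VIABILITY_THRESHOLD = 3.5  # threshold quoted in the interpretation above

means = {
    "overall": 3.956,
    "definition_precision": 3.620,
    "domain_placement": 4.559,
    "explanatory_value": 3.936,
    "source_grounding": 4.358,
    "vsm_relevance": 3.305,
}


def interpret(means, threshold=VIABILITY_THRESHOLD):
    """Report whether the overall mean clears the threshold and which dimension is weakest."""
    dims = {name: value for name, value in means.items() if name != "overall"}
    weakest = min(dims, key=dims.get)
    return {
        "passes": means["overall"] > threshold,  # assumed strict comparison
        "weakest": weakest,
        "weakest_mean": dims[weakest],
    }


print(interpret(means))
# {'passes': True, 'weakest': 'vsm_relevance', 'weakest_mean': 3.305}
```
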
---

## 4. Triaging low scorers

`markitect infospace entities --by-type` prints each entity's star score
in-line:

```
$ markitect infospace entities --by-type | head
=== Element (315 entities) ===
active_and_productive_stock  Accumulation    S1  ★4.6
advanced_state_of_society    General Theory  S5
agio_of_bank_money           Exchange        S2  ★4.8
```

Entities with no `★` have no evaluation yet. To list the lowest-scoring
entities across the whole collection:

```bash
# Extract overall_score from every evaluation file and sort ascending
for f in output/evaluations/*.md; do
    score=$(awk '/^overall_score:/ {print $2; exit}' "$f")
    printf "%s\t%s\n" "$score" "$(basename "$f" .md)"
done | sort -n | head -20
```

The 20 lowest scorers are the natural triage list. Inspect their
`output/entities/<slug>.md` and evaluation rationales to decide whether to
refine the entity, merge it with a better-formed neighbour, or drop it.

---

## 5. Reading and acting on collection-check output

`markitect infospace check` runs five concerns (C1–C5). Use `--concern` to
focus on one and `--json` for machine-readable output:

```bash
# Redundancy: which pairs of entities are suspiciously similar?
markitect infospace check --concern redundancy --json
```

```json
{
  "redundancy": {
    "concern": "C1",
    "redundancy_ratio": 0.0061,
    "similar_pairs": [
      {"entity_a": "bank_economic_contribution_metrics",
       "entity_b": "bank_economic_development_metrics",
       "similarity": 1.0, "method": "word_overlap"},
      {"entity_a": "economic_system_objectives",
       "entity_b": "economic_system_purpose",
       "similarity": 0.9394, "method": "word_overlap"}
    ]
  }
}
```

Acting on this:

- **Similarity = 1.0** is almost certainly a duplicate. Pick one slug and
  merge or delete the other.
- **0.85–0.99** usually means two entities genuinely cover the same idea
  with slight phrasing differences. Merging is the cleanest fix.
- **< 0.85** usually represents legitimate adjacent concepts. Leave as-is
  unless the definition rubric says otherwise.

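The bands above translate directly into a triage pass over `similar_pairs`. The following is a minimal sketch over the JSON excerpt shown; `triage` is a hypothetical helper, and treating anything between 0.99 and 1.0 as a merge candidate is an assumption, since the bands do not cover that gap.

```python
import json

report = json.loads("""
{
  "redundancy": {
    "concern": "C1",
    "redundancy_ratio": 0.0061,
    "similar_pairs": [
      {"entity_a": "bank_economic_contribution_metrics",
       "entity_b": "bank_economic_development_metrics",
       "similarity": 1.0, "method": "word_overlap"},
      {"entity_a": "economic_system_objectives",
       "entity_b": "economic_system_purpose",
       "similarity": 0.9394, "method": "word_overlap"}
    ]
  }
}
""")


def triage(pairs):
    """Sort similar pairs into duplicate / merge / keep using the bands above."""
    buckets = {"duplicate": [], "merge": [], "keep": []}
    for pair in pairs:
        score = pair["similarity"]
        if score >= 1.0:
            key = "duplicate"
        elif score >= 0.85:  # assumption: 0.99 to 1.0 also counts as "merge"
            key = "merge"
        else:
            key = "keep"
        buckets[key].append((pair["entity_a"], pair["entity_b"]))
    return buckets


for action, found in triage(report["redundancy"]["similar_pairs"]).items():
    print(action, found)
```
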
For coverage and coherence, the pattern is the same: the `--json` output
surfaces the specific entities / missing links / disconnected components
you need to look at, rather than a bare ratio.

---

## 6. Systematic processing of long texts

For long source material (books, multi-chapter specifications, corpora), the
pipeline can produce a clean chapter-by-chapter git history on its own if
you let it. The pattern:

```bash
# Process all sources in canonical order, eval and classify per chapter,
# snapshot metrics after each chapter.
markitect infospace process --all \
    --provider openrouter \
    --eval-after-source \
    --classify-after-source \
    --check-after-each
```

What you get:

- **One commit per source file**, not per batch run. The commit message body
  lists counts by bucket (`entities: +23`, `evaluations: +23`,
  `classifications: +23`) derived from the actual staged diff, so `git log`
  reads like the story of the infospace growing.
- **Chapter-atomic commits.** `--eval-after-source` and
  `--classify-after-source` evaluate and classify *only the new entities*
  from the just-processed source before the commit lands, so each commit is
  a self-contained chapter snapshot.
- **Metrics-per-chapter trail.** `--check-after-each` appends a snapshot to
  `output/metrics/history.yaml` after every chapter, so
  `markitect infospace history` later shows the metric trajectory rather
  than just start/end.

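The two mechanisms behind these commits, finding the new entities and counting added files per bucket, can be sketched in a few lines. This is an illustration only, not the pipeline's code: `new_slugs` and `commit_body` are hypothetical names, the bucket list is taken from the commit description, and the list of added paths is assumed to have been obtained from the staged diff already.

```python
from collections import Counter
from pathlib import PurePosixPath

# Output subdirectories named in the commit description.
BUCKETS = ("entities", "evaluations", "classifications",
           "mappings", "analyses", "metrics", "logs")


def new_slugs(before, after):
    """Entity slugs present after processing a source but not before it."""
    return sorted(set(after) - set(before))


def commit_body(added_paths, root="output"):
    """Count added files per bucket and format them as 'bucket: +N' entries."""
    counts = Counter()
    for raw in added_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) >= 3 and parts[0] == root and parts[1] in BUCKETS:
            counts[parts[1]] += 1
    return ", ".join(f"{bucket}: +{counts[bucket]}" for bucket in BUCKETS if counts[bucket])


added = [
    "output/entities/a.md",
    "output/entities/b.md",
    "output/evaluations/a.md",
    "README.md",  # outside the output tree, so not counted
]
print(new_slugs({"a"}, {"a", "b", "c"}))  # ['b', 'c']
print(commit_body(added))                 # entities: +2, evaluations: +1
```
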
**Cost tradeoff.** `--eval-after-source` pays LLM latency per chapter rather
than amortising it across one bulk batch. It's worth it when you care about
the git history or want early quality signal, not when you're bulk-backfilling
a known-good corpus.

**Triage during the run.** While processing, use
`markitect infospace chapters` in another shell to see per-source
entity/eval/classify counts and mean scores. This is handy for spotting
chapters that under-extracted or evaluated poorly.

```
$ markitect infospace chapters
source              entities evaluated classified mean_score
------------------- -------- --------- ---------- ----------
book-1-chapter-01         96        96         79       4.22
book-1-chapter-02         16        16         10       4.06
…
```

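Spotting lagging chapters can be automated by scanning that table. The sketch below parses the text layout shown above by splitting on whitespace; `flag_chapters` is a hypothetical helper, the 3.5 cut-off is borrowed from the viability threshold in pattern 3 as an assumption, and the JSON output of the command (whose shape is not shown here) would be a sturdier input.

```python
SAMPLE = """\
source              entities evaluated classified mean_score
------------------- -------- --------- ---------- ----------
book-1-chapter-01         96        96         79       4.22
book-1-chapter-02         16        16         10       4.06
"""


def flag_chapters(table_text, min_mean=3.5):
    """Return {source: [reasons]} for chapters that look incomplete or weak."""
    flagged = {}
    for line in table_text.splitlines()[2:]:  # skip header and separator rows
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore blank lines and elision markers
        source = fields[0]
        entities, evaluated, classified = (int(value) for value in fields[1:4])
        mean_score = float(fields[4])
        reasons = []
        if evaluated < entities:
            reasons.append("unevaluated entities")
        if classified < entities:
            reasons.append("unclassified entities")
        if mean_score < min_mean:  # assumed cut-off, see lead-in
            reasons.append("low mean score")
        if reasons:
            flagged[source] = reasons
    return flagged


for source, reasons in flag_chapters(SAMPLE).items():
    print(source, reasons)
```
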
---

## See also

- `METRICS-METHODOLOGY.md`: how each metric is computed.
- `docs/composition-guide.md`: using this infospace as a discipline for a
  different domain.
- `docs/performance-notes.md`: observed timings and provider choices.