# Advanced Usage — Wealth of Nations Infospace

Patterns for working with the WoN infospace (988 entities) after the initial pipeline run. Every command in this file has been run against the actual infospace at the time of writing (2026-04-21); output shapes are excerpted verbatim.

All commands assume the working directory is `examples/infospace-with-history` and that the `markitect-venv` Python environment is active.

---

## 1. Incremental evaluation — add entities after the initial run

`markitect infospace evaluate` writes one file per entity under `output/evaluations/.md`. It skips any entity whose evaluation file already exists, so re-running after adding a new entity processes only the new one.

```bash
# Add a new entity file
vim output/entities/new-concept.md

# Evaluate only the new entity (explicit)
markitect infospace evaluate --entity new-concept --provider openrouter

# Or re-run the whole pass: the existing 988 are skipped and only the new file hits the LLM
markitect infospace evaluate --provider openrouter
```

**How skip detection works.** Evaluation filenames are normalised slugs: hyphens become underscores and apostrophes are preserved as `_s_` (the `farmers-capital` entity maps to the `farmer_s_capital.md` evaluation). If a new entity's slug normalises to the name of an existing evaluation, the new entity is skipped. To confirm that every entity was picked up, compare the two counts:

```bash
# Count entities (excluding the per-chapter and per-introduction files)
ls output/entities/*.md | grep -Ev 'book-[0-9]+-(chapter-[0-9]+|introduction)-' | wc -l

# Count evaluations
ls output/evaluations/*.md | wc -l
```

---

## 2. Re-evaluating after guideline changes

`evaluate` has no `--force` flag; re-evaluating an entity requires deleting its evaluation file first.
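Because there is no `--force`, the delete-then-evaluate step can be wrapped in a small shell function. This is a minimal sketch. `eval_file_for` and `reeval` are hypothetical helper names, not part of markitect, and the filename mapping assumes only that hyphens become underscores (as in `accumulation-of-stock` → `accumulation_of_stock.md`). It does not reproduce the `_s_` apostrophe rule, so the function stops when the computed file is missing. Otherwise it would delete nothing and `evaluate` would simply skip the entity.

```shell
# Hypothetical helpers, not part of markitect.
# ASSUMPTION: an evaluation file is named after the entity slug with hyphens
# replaced by underscores. The `_s_` apostrophe rule is NOT reproduced here.

# Print the evaluation path expected for an entity slug.
eval_file_for() {
  printf 'output/evaluations/%s.md\n' "$(printf '%s' "$1" | sed 's/-/_/g')"
}

# Delete one entity's evaluation, then evaluate that entity again.
reeval() {
  local slug file
  slug="$1"
  file="$(eval_file_for "$slug")"
  if [ ! -f "$file" ]; then
    # A mismatched name would leave the old evaluation in place and the
    # re-run would skip the entity, so refuse instead of doing nothing.
    echo "reeval: no evaluation at $file; check the slug normalisation" >&2
    return 1
  fi
  rm -- "$file"
  markitect infospace evaluate --entity "$slug" --provider openrouter
}
```

For example, `reeval accumulation-of-stock` performs the same two single-entity commands as the manual sequence that follows.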
```bash
# Re-evaluate a single entity after updating the evaluation rubric
rm output/evaluations/accumulation_of_stock.md
markitect infospace evaluate --entity accumulation-of-stock --provider openrouter

# Re-evaluate a whole chapter
cat output/entities/book-1-chapter-06-entities.md   # see which entities the chapter produced
# Map the chapter's entities to eval filenames (apostrophe/underscore normalisation) and rm them,
# then re-run the pass; only the deleted evaluations are regenerated
markitect infospace evaluate --provider openrouter
```

After re-evaluating, refresh the aggregate:

```bash
markitect infospace eval-summary --update-metrics
```

This merges `per_entity_mean` into `output/metrics/metrics.yaml` so that the next `markitect infospace viability` check reflects the new scores.

---

## 3. Interpreting per-entity score distributions

`eval-summary` shows the overall mean, the mean for each of the five evaluation dimensions, and the overall score range:

```
$ markitect infospace eval-summary
Evaluation summary — 985 entities evaluated

Dimension                 Mean
--------------------------------------
overall                   3.956
definition_precision      3.620
domain_placement          4.559
explanatory_value         3.936
source_grounding          4.358
vsm_relevance             3.305

Range: 1.00 – 4.80
```

Interpretation:

- `overall` (3.956) is above the 3.5 viability threshold, so the collection passes the `per_entity_mean` check.
- The lowest dimension (`vsm_relevance` = 3.305) is the weakest signal. If the collection is meant to be VSM-grounded, this is the dimension most worth improving, through sharper entity definitions or schema changes.
- A wide range (1.00 – 4.80) indicates outliers at both ends, which are worth triaging (see pattern 4).

---

## 4. Triaging low scorers

`markitect infospace entities --by-type` prints each entity's star score inline:

```
$ markitect infospace entities --by-type | head
=== Element (315 entities) ===
active_and_productive_stock    Accumulation     S1  ★4.6
advanced_state_of_society      General Theory   S5
agio_of_bank_money             Exchange         S2  ★4.8
```

Entities with no `★` have no evaluation yet.
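Since evaluated rows are exactly the ones carrying a `★` score, the same listing can be filtered down to a to-do list of unevaluated entities. A minimal sketch, assuming the output format in the excerpt above (`===` section headers, the entity slug in the first column, `★` only on evaluated rows); `unevaluated` is a hypothetical helper name, not a markitect command.

```shell
# Hypothetical filter, not part of markitect.
# ASSUMPTION: input looks like `markitect infospace entities --by-type` output:
# "=== Type (N entities) ===" headers, slug in column 1, a star on evaluated rows.
unevaluated() {
  awk '
    /===/              { next }     # skip section headers
    NF == 0            { next }     # skip blank separator lines
    index($0, "★") > 0 { next }     # row already has a star score
                       { print $1 } # first column is the entity slug
  '
}
```

Usage: `markitect infospace entities --by-type | unevaluated | wc -l` counts the entities still awaiting evaluation.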
To list the lowest-scoring entities across the whole collection:

```bash
# Extract overall_score from every evaluation file and sort ascending
for f in output/evaluations/*.md; do
  score=$(awk '/^overall_score:/ {print $2; exit}' "$f")
  printf "%s\t%s\n" "$score" "$(basename "$f" .md)"
done | sort -n | head -20
```

The 20 lowest scorers are the natural triage list. Inspect their `output/entities/.md` files and evaluation rationales to decide whether to refine the entity, merge it with a better-formed neighbour, or drop it.

---

## 5. Reading and acting on collection-check output

`markitect infospace check` runs five concerns (C1–C5). Use `--concern` to focus on one and `--json` for machine-readable output:

```bash
# Redundancy: which pairs of entities are suspiciously similar?
markitect infospace check --concern redundancy --json
```

```json
{
  "redundancy": {
    "concern": "C1",
    "redundancy_ratio": 0.0061,
    "similar_pairs": [
      {"entity_a": "bank_economic_contribution_metrics",
       "entity_b": "bank_economic_development_metrics",
       "similarity": 1.0, "method": "word_overlap"},
      {"entity_a": "economic_system_objectives",
       "entity_b": "economic_system_purpose",
       "similarity": 0.9394, "method": "word_overlap"}
    ]
  }
}
```

Acting on this:

- **Similarity = 1.0** is almost certainly a duplicate. Pick one slug and merge or delete the other.
- **0.85–0.99** usually means two entities genuinely cover the same idea with slight phrasing differences. Merging is the cleanest fix.
- **< 0.85** usually represents legitimate adjacent concepts. Leave these as-is unless the definition rubric says otherwise.

For coverage and coherence the pattern is the same: rather than a bare ratio, the `--json` output surfaces the specific entities, missing links, or disconnected components you need to look at.

---

## 6. Systematic processing of long texts

For long source material (books, multi-chapter specifications, corpora), the pipeline can produce a clean chapter-by-chapter git history on its own, given the right flags.
The pattern:

```bash
# Process all sources in canonical order, eval and classify per chapter,
# snapshot metrics after each chapter.
markitect infospace process --all \
  --provider openrouter \
  --eval-after-source \
  --classify-after-source \
  --check-after-each
```

What you get:

- **One commit per source file**, not per batch run. The commit message body lists counts by bucket (`entities: +23`, `evaluations: +23`, `classifications: +23`) derived from the actual staged diff, so `git log` reads like the story of the infospace growing.
- **Chapter-atomic commits.** `--eval-after-source` and `--classify-after-source` evaluate and classify *only the new entities* from the just-processed source before the commit lands, so each commit is a self-contained chapter snapshot.
- **Metrics-per-chapter trail.** `--check-after-each` appends a snapshot to `output/metrics/history.yaml` after every chapter, so `markitect infospace history` later shows the metric trajectory rather than just the start and end values.

**Cost tradeoff.** `--eval-after-source` pays LLM latency per chapter rather than amortising it across one bulk batch. It's worth it when you care about the git history or want an early quality signal, not when you're bulk-backfilling a known-good corpus.

**Triage during the run.** While processing, run `markitect infospace chapters` in another shell to see per-source entity, evaluation, and classification counts and mean scores. This makes it easy to spot chapters that under-extracted or scored poorly.

```
$ markitect infospace chapters
source               entities  evaluated  classified  mean_score
-------------------  --------  ---------  ----------  ----------
book-1-chapter-01          96         96          79        4.22
book-1-chapter-02          16         16          10        4.06
…
```

---

## See also

- `METRICS-METHODOLOGY.md`: how each metric is computed.
- `docs/composition-guide.md`: using this infospace as a discipline for a different domain.
- `docs/performance-notes.md`: observed timings and provider choices.