From 34ed7a6fab601930ffbe2ff0e3f2aa323014ab04 Mon Sep 17 00:00:00 2001
From: tegwick <bernd.worsch@gmail.com>
Date: Mon, 23 Feb 2026 05:33:11 +0100
Subject: [PATCH] =?UTF-8?q?docs(tutorial):=20update=20=C2=A78-9=20for=20ev?=
 =?UTF-8?q?al-summary=20command=20and=206/6=20viability?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add eval-summary command documentation with dimension descriptions
- Document resumable evaluate (incremental skip on re-run)
- Fix --entity slug example to use underscores (not hyphens)
- Update viability output to show per_entity_mean as 6th threshold
- Add workflow note: check → eval-summary --update-metrics → viability

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 examples/infospace-with-history/TUTORIAL.md | 62 ++++++++++++++++++---
 1 file changed, 54 insertions(+), 8 deletions(-)
diff --git a/examples/infospace-with-history/TUTORIAL.md b/examples/infospace-with-history/TUTORIAL.md
index 5144e44c..f5718f28 100644
--- a/examples/infospace-with-history/TUTORIAL.md
+++ b/examples/infospace-with-history/TUTORIAL.md
@@ -391,13 +391,54 @@ markitect infospace evaluate --provider openrouter
 # Evaluate entities from a specific chapter:
 markitect infospace evaluate --chapter book-1-chapter-05 --provider openrouter
 
-# Re-evaluate a single entity:
-markitect infospace evaluate --entity division-of-labour --provider openrouter
+# Re-evaluate a single entity (slugs use underscores):
+markitect infospace evaluate --entity division_of_labour --provider openrouter
 ```
 
-This runs the `evaluate-entity` prompt template against each entity,
-scoring dimensions like definition precision, source grounding, and
-VSM relevance. Results are written to `output/evaluations/`.
+The command is resumable: entities with existing evaluation files are
+skipped automatically. Re-run after an interruption and it picks up
+where it left off. Results are written incrementally to
+`output/evaluations/<slug>.md`.
+
+Each entity is scored on five dimensions (1–5 scale):
+- `definition_precision` — Is the definition precise and non-circular?
+- `source_grounding` — Is the entity grounded in the actual source text?
+- `domain_placement` — Is the economic domain assignment correct?
+- `vsm_relevance` — Does the entity map naturally to a VSM system (S1–S5)?
+- `explanatory_value` — Does the entity add genuine explanatory power?
+
+### Evaluation summary
+
+After the evaluation run completes, compute aggregate statistics:
+
+```bash
+# Show per-dimension means:
+markitect infospace eval-summary
+
+# Also write per_entity_mean to metrics.yaml for viability checks:
+markitect infospace eval-summary --update-metrics
+```
+
+Sample output (full corpus, 988 entities):
+
+```
+Evaluation summary — 988 entities evaluated
+
+  Dimension                        Mean
+  --------------------------------------
+  overall                         4.XX
+  definition_precision            4.XX
+  domain_placement                X.XX
+  explanatory_value               4.XX
+  source_grounding                4.XX
+  vsm_relevance                   3.XX
+
+  Range: X.XX – X.XX
+```
+
+`vsm_relevance` typically scores lower than the other dimensions —
+many WoN concepts are foundational economic ideas that don't map
+cleanly to a single VSM system. This is expected and informative.
 
 ### Collection-level checks (C1–C5)
 
@@ -459,15 +500,20 @@ Compares the latest metrics against the thresholds declared in
 ```
 Metric                            Value       Threshold   Status
 ---------------------------------------------------------------
-redundancy_ratio                 0.0059         max=0.1     PASS
+redundancy_ratio                 0.0061         max=0.1     PASS
 coverage_ratio                   0.6190         min=0.4     PASS
 coherence_components             0.0000           max=3     PASS
 consistency_cycles               0.0000           max=0     PASS
-granularity_entropy              2.9533         min=1.0     PASS
+granularity_entropy              2.6748         min=1.0     PASS
+per_entity_mean                  4.XXXX         min=3.5     PASS
 
-Viable: YES (5/5 thresholds met)
+Viable: YES (6/6 thresholds met)
 ```
 
+`per_entity_mean` only appears after running `eval-summary --update-metrics`.
+Run `check` first (deterministic), then `eval-summary --update-metrics`,
+then `viability` to see the full six-threshold dashboard.
+
 During early processing (first few books), coverage will fall and
 then stabilise as the domain × chapter matrix fills in. The threshold
 of 0.40 reflects realistic expectations for a multi-book corpus where