docs(infospace): add advanced-usage, composition guide, and performance notes (C.4/C.5/C.6)

Closes out three docs tasks from roadmap/infospace-s3-closeout/PLAN.md:

- examples/infospace-with-history/docs/advanced-usage.md (C.4): 5 worked patterns covering incremental eval, the re-eval workflow (no --force flag exists; documents the rm-then-re-run pattern instead), interpreting the eval-summary distribution, triaging low scorers via an awk pipeline over overall_score (since `entities --sort-by score` does not exist), and acting on check --json output.
- docs/composition-guide.md (C.5): walks through how supply-chain-vsm binds WoN as a discipline, then gives a step-by-step for creating a new infospace that binds an existing one. Includes live output from `markitect infospace disciplines`.
- examples/infospace-with-history/docs/performance-notes.md (C.6): cites the 6h 28m wall time of the 985-entity S3.3 batch, the ~2.5 ent/min rate, the ~2000–3000 tokens/entity estimate, the word_overlap vs embedding backend for redundancy checks, and a provider-by-scale recommendation table.

All commands in these docs were run against the live infospace at commit time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 07:02:46 +02:00
parent b7e11461f4
commit 36a5136bdf
3 changed files with 488 additions and 0 deletions
--- a/docs/composition-guide.md
+++ b/docs/composition-guide.md
@@ -0,0 +1,203 @@
+# Infospace Composition Guide
+
+One completed, viable infospace can be reused as a **discipline** for
+another infospace — a lens applied to a different topic. This guide
+explains how composition works and walks through the live
+`examples/supply-chain-vsm/` reference.
+
+---
+
+## What composition means
+
+An **infospace** is a directory of typed entities governed by
+`infospace.yaml`. Its entities and relations describe a specific topic
+(for example, Adam Smith's *Wealth of Nations*).
+
+A **discipline** is an infospace declared as a reusable analytical
+framework by another infospace. When infospace B binds infospace A as a
+discipline:
+
+1. B's entities can reference A's entities in `## WoN Concept` (or
+   equivalent) sections.
+2. Properties A has already computed on its entities — such as VSM system
+   placement — become available to B by transitivity through the mapping.
+3. B can impose its own viability thresholds independently of A's. The two
+   infospaces each pass or fail viability on their own terms.
+
+The binding is declarative: a relative path in `infospace.yaml` plus a
+display name. No code. No import. The discipline is looked up on disk at
+the declared path when B's commands run.
+
+---
+
+## The viability pre-condition
+
+Binding a non-viable infospace as a discipline is a mistake: a framework
+that fails its own thresholds is not a stable reference frame. Before
+binding, confirm the candidate discipline is viable:
+
+```bash
+cd examples/infospace-with-history
+markitect infospace viability
+```
+
+```
+Metric                            Value       Threshold   Status
+---------------------------------------------------------------
+redundancy_ratio                 0.0061         max=0.1     PASS
+coverage_ratio                   0.6190         min=0.4     PASS
+coherence_components             0.0000           max=3     PASS
+consistency_cycles               0.0000           max=0     PASS
+granularity_entropy              2.6748         min=1.0     PASS
+per_entity_mean                  3.9556         min=3.5     PASS
+
+Viable: YES (6/6 thresholds met)
+```
+
+If the discipline is not viable, fix it first (see
+`examples/infospace-with-history/docs/advanced-usage.md` §4 for triaging
+low scorers).
+
+---
+
+## Example — how `supply-chain-vsm` binds WoN
+
+The supply-chain infospace declares WoN as a discipline in its
+`infospace.yaml`:
+
+```yaml
+topic:
+  name: "Modern Supply Chain Management"
+  domain: "Operations Management"
+  sources: artifacts/sources/
+
+disciplines:
+  - name: "Wealth of Nations"
+    path: ../infospace-with-history
+```
+
+The binding is a **relative path**, so the two infospaces travel together
+(they can be moved as a pair without breaking the link).
+
+Verify the binding resolves and the discipline is viable:
+
+```bash
+cd examples/supply-chain-vsm
+markitect infospace disciplines
+```
+
+```
+Name                           Entities   Viable Path
+----------------------------------------------------------------------
+Wealth of Nations                   988      YES ../infospace-with-history
+```
+
+Each supply-chain entity then carries a `## WoN Concept` section
+mapping it to exactly one WoN entity. The consolidated mapping files
+(`output/mappings/*-mappings.md`) record the pairing, rationale, and a
+conceptual-continuity rating (Strong / Moderate / Weak):
+
+| Supply Chain Entity          | WoN Concept                      | Strength | VSM   |
+|------------------------------|----------------------------------|----------|-------|
+| Demand Signal                | Effectual Demand                 | Strong   | S2    |
+| Vendor-Managed Inventory     | Division of Labour               | Strong   | S1/S2 |
+| Just-in-Time Inventory       | Circulating Capital              | Strong   | S1/S3 |
+| Bullwhip Effect              | Natural Price as Central Price   | Moderate | S2    |
+| Safety Stock                 | Accumulation of Stock            | Moderate | S3    |
+
+Because each WoN entity already has a VSM system placement (S1–S5), the
+supply-chain entities inherit a VSM position by transitivity through
+their mapping — without supply-chain-vsm needing its own VSM reference.
+
+---
+
+## Creating a new infospace that binds an existing one
+
+Step-by-step, using WoN as the discipline for a hypothetical "Modern
+Monetary Policy" infospace:
+
+### 1. Start from the target topic
+
+```bash
+mkdir -p examples/monetary-policy/artifacts/sources
+cd examples/monetary-policy
+markitect infospace init
+```
+
+### 2. Declare the discipline in `infospace.yaml`
+
+```yaml
+topic:
+  name: "Modern Monetary Policy"
+  domain: "Macroeconomics"
+  sources: artifacts/sources/
+
+disciplines:
+  - name: "Wealth of Nations"
+    path: ../infospace-with-history
+```
+
+Alternatively, bind imperatively after `init`:
+
+```bash
+markitect infospace bind-discipline ../infospace-with-history --name "Wealth of Nations"
+```
+
+### 3. Set your own viability thresholds
+
+Copy the `viability:` block from a reference infospace and tune the
+numbers to the scale and maturity of your topic. A smaller infospace
+(50 entities, not 988) may need laxer `coverage_ratio` and stricter
+`redundancy_ratio`.
+
+### 4. Verify the binding
+
+```bash
+markitect infospace disciplines
+```
+
+If `Viable` is `NO`, stop and fix the discipline before continuing.
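+
+The stop-if-not-viable rule can be turned into a scripted gate. The sketch
+below is not part of `markitect`: `nonviable_disciplines` is a hypothetical
+helper that parses the table layout shown earlier (header row, dashed rule,
+then rows ending in entities, viable flag, and path) and assumes the path
+contains no spaces.
+
+```bash
+# Sketch: list bound disciplines whose Viable column is not YES.
+# Reads the `disciplines` table from stdin. The column layout is assumed
+# from the sample output in this guide, not from a documented contract.
+nonviable_disciplines() {
+  awk 'NR > 2 && NF >= 4 && $(NF - 1) != "YES" {
+    # Everything before the last three fields is the display name.
+    name = $1
+    for (i = 2; i <= NF - 3; i++) name = name " " $i
+    print name
+  }'
+}
+
+# Hypothetical usage (requires markitect on PATH):
+#   markitect infospace disciplines | nonviable_disciplines
+```
+
+Empty output means every bound discipline reported `YES`; any printed name
+is a discipline to fix before continuing.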
+
+### 5. Reference discipline entities in your own entities
+
+For each entity in the new infospace, add a `## <Discipline> Concept`
+section that names the WoN entity the concept maps to, plus a rationale.
+The exact section heading is configured per schema — see
+`schemas/won-mapping-schema-v1.0.md` in `supply-chain-vsm` for the
+template used there.
+
+### 6. Run checks and evaluate
+
+```bash
+markitect infospace check
+markitect infospace evaluate --provider openrouter
+markitect infospace eval-summary --update-metrics
+markitect infospace viability
+```
+
+The new infospace passes or fails viability independently of WoN.
+
+---
+
+## Why composition, not inclusion?
+
+An alternative would be to copy WoN entities directly into the target
+infospace. Composition avoids that by design:
+
+- **One source of truth** — if WoN is refined, every infospace that binds
+  it picks up the improvement on the next run without a sync step.
+- **Separation of concerns** — each infospace owns its own schema,
+  thresholds, and entity set. Changing the target topic cannot pollute
+  the discipline.
+- **Bounded dependency** — the binding is a path, so the coupling is
+  visible in one place (`infospace.yaml`) and easy to remove.
+
+---
+
+## See also
+
+- `examples/supply-chain-vsm/README.md` — the full reference composition.
+- `examples/supply-chain-vsm/output/mappings/` — consolidated mapping
+  files showing the rationale and strength rating for each pairing.
+- `examples/infospace-with-history/docs/advanced-usage.md` — patterns for
+  maintaining the discipline once it is in use.
--- a/examples/infospace-with-history/docs/advanced-usage.md
+++ b/examples/infospace-with-history/docs/advanced-usage.md
@@ -0,0 +1,179 @@
+# Advanced Usage — Wealth of Nations Infospace
+
+Patterns for working with the WoN infospace (988 entities) after the initial
+pipeline run. Every command in this file has been run against the actual
+infospace at the time of writing (2026-04-21); output shapes are excerpted
+verbatim.
+
+All commands assume `cwd = examples/infospace-with-history` and the
+`markitect-venv` Python environment.
+
+---
+
+## 1. Incremental evaluation — add entities after the initial run
+
+`markitect infospace evaluate` writes one file per entity under
+`output/evaluations/<slug>.md`. It skips any entity whose evaluation file
+already exists, so re-running after adding a new entity processes only the
+new one.
+
+```bash
+# Add a new entity file
+vim output/entities/new-concept.md
+
+# Evaluate only the new entity (explicit)
+markitect infospace evaluate --entity new-concept --provider openrouter
+
+# Or re-run the whole pass: entities that already have an evaluation file are skipped, so only unevaluated ones hit the LLM
+markitect infospace evaluate --provider openrouter
+```
+
+**How skip detection works.** Evaluation filenames use underscores where
+entity slugs use hyphens, and a possessive apostrophe becomes `_s_`
+(`farmers-capital` entity → `farmer_s_capital.md` evaluation). If a new
+entity's slug collides with an existing evaluation file under this
+normalisation, the new entity's evaluation is skipped. To confirm that an
+entity was picked up, compare the counts:
+
+```bash
+# Count entities vs evaluations
+ls output/entities/*.md | grep -Ev 'book-[0-9]+-(chapter-[0-9]+|introduction)-' | wc -l
+ls output/evaluations/*.md | wc -l
+```
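+
+Counts alone do not say which entity is missing. The sketch below lists
+entity slugs that have no matching evaluation file. The exact normalisation
+rule is not reproduced here; as an assumption, both names are reduced to a
+loose key (lowercase letters and digits only), which covers the
+hyphen/underscore difference and the `_s_` form. `loose_key` and
+`unevaluated` are hypothetical helper names.
+
+```bash
+# Loose key: lowercase, then drop everything except letters and digits.
+loose_key() {
+  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9'
+}
+
+# Print entity slugs with no evaluation file under the loose key.
+# Usage: unevaluated <entities-dir> <evaluations-dir>
+unevaluated() {
+  ent_dir=$1
+  eval_dir=$2
+  keys=$(for f in "$eval_dir"/*.md; do
+    [ -e "$f" ] || continue
+    loose_key "$(basename "$f" .md)"
+    echo
+  done)
+  for f in "$ent_dir"/*.md; do
+    [ -e "$f" ] || continue
+    slug=$(basename "$f" .md)
+    # Skip per-chapter files, using the same pattern as the count above.
+    if printf '%s\n' "$slug" | grep -Eq 'book-[0-9]+-(chapter-[0-9]+|introduction)-'; then
+      continue
+    fi
+    printf '%s\n' "$keys" | grep -qxF -- "$(loose_key "$slug")" || printf '%s\n' "$slug"
+  done
+}
+```
+
+For example, `unevaluated output/entities output/evaluations` prints one
+slug per line. A loose key can over-match, so treat the result as a hint
+rather than proof.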
+
+---
+
+## 2. Re-evaluating after guideline changes
+
+`evaluate` has no `--force` flag; re-evaluation requires deleting the
+existing file first.
+
+```bash
+# Re-evaluate a single entity after updating the evaluation rubric
+rm output/evaluations/accumulation_of_stock.md
+markitect infospace evaluate --entity accumulation-of-stock --provider openrouter
+
+# Re-evaluate a whole chapter
+cat output/entities/book-1-chapter-06-entities.md   # see which entities the chapter produced
+# Map chapter entities to eval filenames (apostrophe/underscore normalisation) and rm them
+```
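+
+The mapping step left as a comment above can be sketched as follows. The
+format of the chapter file is not documented here, so the sketch assumes
+you have already extracted the chapter's entity slugs, one per line.
+`evals_for_slugs` is a hypothetical helper, and the loose letters-and-digits
+match is an assumption standing in for the real normalisation.
+
+```bash
+# Sketch: print evaluation files that correspond to entity slugs on stdin.
+# Usage: evals_for_slugs <evaluations-dir> < slugs.txt
+evals_for_slugs() {
+  eval_dir=$1
+  while IFS= read -r slug; do
+    [ -n "$slug" ] || continue
+    want=$(printf '%s' "$slug" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9')
+    for f in "$eval_dir"/*.md; do
+      [ -e "$f" ] || continue
+      have=$(basename "$f" .md | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9')
+      if [ "$have" = "$want" ]; then
+        printf '%s\n' "$f"
+      fi
+    done
+  done
+}
+```
+
+Review the printed paths before deleting anything; once the list looks
+right, remove those files and re-run `evaluate` as in the single-entity
+case.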
+
+After re-evaluating, refresh the aggregate:
+
+```bash
+markitect infospace eval-summary --update-metrics
+```
+
+This merges `per_entity_mean` into `output/metrics/metrics.yaml` so the next
+`markitect infospace viability` check reflects the new scores.
+
+---
+
+## 3. Interpreting per-entity score distributions
+
+`eval-summary` shows the overall mean, the mean for each of the five
+evaluation dimensions, and the overall score range:
+
+```
+$ markitect infospace eval-summary
+Evaluation summary — 985 entities evaluated
+
+  Dimension                        Mean
+  --------------------------------------
+  overall                         3.956
+  definition_precision            3.620
+  domain_placement                4.559
+  explanatory_value               3.936
+  source_grounding                4.358
+  vsm_relevance                   3.305
+
+  Range: 1.00 – 4.80
+```
+
+Interpretation:
+- `overall` above the 3.5 viability threshold → the collection passes
+  `per_entity_mean`.
+- The lowest dimension (`vsm_relevance` = 3.305) is the weakest signal. If
+  the collection is meant to be VSM-grounded, this is the dimension most
+  worth improving (via sharper entity definitions or schema changes).
+- The wide range (1.00 – 4.80) against a mean of 3.956 shows that at least
+  one entity scores far below the rest; those are worth triaging (see
+  pattern 4).
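+
+To cross-check the summary by hand, the overall figures can be recomputed
+from the evaluation files. This is a sketch, not the `eval-summary`
+implementation: it assumes each file carries a single `overall_score:` line
+(the field used in pattern 4), and `score_stats` is a hypothetical helper
+name.
+
+```bash
+# Sketch: count, mean, min and max of overall_score across the given files.
+# Only the first overall_score line in each file is used.
+score_stats() {
+  awk '
+    FNR == 1 { seen = 0 }
+    !seen && /^overall_score:/ {
+      seen = 1
+      s = $2 + 0
+      n++
+      sum += s
+      if (n == 1 || s < min) min = s
+      if (n == 1 || s > max) max = s
+    }
+    END {
+      if (n == 0) { print "no scores found"; exit 1 }
+      printf "n=%d mean=%.3f min=%.2f max=%.2f\n", n, sum / n, min, max
+    }
+  ' "$@"
+}
+```
+
+Call it as `score_stats output/evaluations/*.md`; always pass at least one
+file, because `awk` reads standard input when given none.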
+
+---
+
+## 4. Triaging low scorers
+
+`markitect infospace entities --by-type` prints each entity's star score
+in-line:
+
+```
+$ markitect infospace entities --by-type | head
+=== Element (315 entities) ===
+  active_and_productive_stock              Accumulation       S1   ★4.6
+  advanced_state_of_society                General Theory     S5
+  agio_of_bank_money                       Exchange           S2   ★4.8
+```
+
+Entities with no `★` have no evaluation yet. To list the lowest-scoring
+entities across the whole collection:
+
+```bash
+# Extract overall_score from every evaluation file and sort ascending
+for f in output/evaluations/*.md; do
+  score=$(awk '/^overall_score:/ {print $2; exit}' "$f")
+  # Skip files with no overall_score so blank scores do not sort to the top
+  [ -n "$score" ] || continue
+  printf "%s\t%s\n" "$score" "$(basename "$f" .md)"
+done | sort -n | head -20
+```
+
+The 20 lowest scorers are the natural triage list — inspect their
+`output/entities/<slug>.md` and evaluation rationales to decide whether to
+refine the entity, merge it with a better-formed neighbour, or drop it.
+
+---
+
+## 5. Reading and acting on collection-check output
+
+`markitect infospace check` runs five concerns (C1–C5). Use `--concern` to
+focus on one and `--json` for machine-readable output:
+
+```bash
+# Redundancy — which pairs of entities are suspiciously similar?
+markitect infospace check --concern redundancy --json
+```
+
+```json
+{
+  "redundancy": {
+    "concern": "C1",
+    "redundancy_ratio": 0.0061,
+    "similar_pairs": [
+      {"entity_a": "bank_economic_contribution_metrics",
+       "entity_b": "bank_economic_development_metrics",
+       "similarity": 1.0, "method": "word_overlap"},
+      {"entity_a": "economic_system_objectives",
+       "entity_b": "economic_system_purpose",
+       "similarity": 0.9394, "method": "word_overlap"}
+    ]
+  }
+}
+```
+
+Acting on this:
+- **Similarity = 1.0** is almost certainly a duplicate — pick one slug and
+  merge or delete the other.
+- **0.85–0.99** usually means two entities genuinely cover the same idea
+  with slight phrasing differences. Merging is the cleanest fix.
+- **< 0.85** usually represents legitimate adjacent concepts — leave as-is
+  unless the definition rubric says otherwise.
+
+For coverage and coherence, the pattern is the same: the `--json` output
+surfaces the specific entities / missing links / disconnected components
+you need to look at, rather than a bare ratio.
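+
+The similarity bands above can be applied mechanically once the pairs are
+in a flat form. The sketch below assumes lines of
+`similarity entity_a entity_b` on stdin; getting from the JSON to that form
+is left to whatever JSON tool you prefer. `triage_pairs` is a hypothetical
+helper, and treating every value from 0.85 up to (but excluding) 1.0 as a
+merge candidate is an assumption that fills the gap between 0.99 and 1.0.
+
+```bash
+# Sketch: prefix each "similarity entity_a entity_b" line with an action.
+triage_pairs() {
+  awk '{
+    s = $1 + 0
+    if (s >= 1.0)       action = "duplicate"
+    else if (s >= 0.85) action = "merge-candidate"
+    else                action = "leave"
+    printf "%s\t%s\t%s\t%s\n", action, $1, $2, $3
+  }'
+}
+```
+
+The output is tab-separated, so it can be sorted or filtered by the first
+column with standard tools.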
+
+---
+
+## See also
+
+- `METRICS-METHODOLOGY.md`: how each metric is computed.
+- `docs/composition-guide.md` (at the repository root): using this
+  infospace as a discipline for a different domain.
+- `docs/performance-notes.md` (in this example directory): observed timings
+  and provider choices.
--- a/examples/infospace-with-history/docs/performance-notes.md
+++ b/examples/infospace-with-history/docs/performance-notes.md
@@ -0,0 +1,106 @@
+# Performance Notes — Wealth of Nations Infospace
+
+Observed timings, file sizes, and provider choices from the 988-entity WoN
+example. These are **operational notes**, not a benchmark — numbers come
+from the actual S3.3 evaluation run (2026-02-23) rather than a controlled
+experiment.
+
+---
+
+## Evaluation batch duration
+
+The initial evaluation pass produced 985 `output/evaluations/*.md` files:
+
+- First `evaluated_at`: `2026-02-23T00:11:52`
+- Last `evaluated_at`:  `2026-02-23T06:39:45`
+- **Total wall time: ~6h 28m**
+- **Effective throughput: ~2.5 entities/min** (~152 entities/hour)
+
+Extracted from evaluation frontmatter:
+```bash
+grep -h '^evaluated_at:' output/evaluations/*.md | sort | sed -n '1p;$p'
+```
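+
+The wall time and throughput follow from those two timestamps and the file
+count. The sketch below redoes the arithmetic; `throughput` is a
+hypothetical helper that assumes both timestamps fall on the same calendar
+day and use the `YYYY-MM-DDTHH:MM:SS` shape shown above. It truncates to
+whole minutes, so the 6h 27m 53s gap prints as 6h 27m rather than the
+rounded ~6h 28m.
+
+```bash
+# Sketch: elapsed time and entities/min from two same-day timestamps.
+# Usage: throughput <first-evaluated_at> <last-evaluated_at> <entity-count>
+throughput() {
+  awk -v a="$1" -v b="$2" -v n="$3" 'BEGIN {
+    split(a, x, /[T:]/)
+    split(b, y, /[T:]/)
+    secs = (y[2] * 3600 + y[3] * 60 + y[4]) - (x[2] * 3600 + x[3] * 60 + x[4])
+    if (secs <= 0) { print "timestamps must be same-day and increasing"; exit 1 }
+    printf "%dh %02dm elapsed, %.1f entities/min\n", int(secs / 3600), int((secs % 3600) / 60), n * 60 / secs
+  }'
+}
+```
+
+With the values above, `throughput 2026-02-23T00:11:52 2026-02-23T06:39:45 985`
+covers 23,273 seconds and gives roughly 2.5 entities per minute.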
+
+Caveats:
+- This was against OpenRouter's free tier, which applies implicit
+  rate-limiting and occasional retries.
+- Throughput is not constant — gaps between bursts show up as plateaus
+  when you plot the timestamps.
+- The batch was not fully parallelised; a tuned concurrent client on a paid
+  OpenRouter tier could likely raise throughput by 2–4×.
+
+---
+
+## Tokens per entity (estimate)
+
+Direct token counts are not logged in the evaluation files, but the
+inputs and outputs are on disk:
+
+- **Input per request**: evaluation schema (~3.7 KB) + entity file
+  (~0.7 KB median) + fixed system prompt ≈ **1500–2500 tokens in**
+- **Output per request**: structured evaluation with 5 dimensions and
+  rationales, median eval file 3.6 KB ≈ **600–800 tokens out**
+- **Round-trip total**: **~2000–3000 tokens per entity**
+- **Batch total estimate**: 985 entities × ~2500 tokens ≈ **2.5M tokens**
+  for the full pass
+
+The constant per-entity input means the cheapest way to reduce spend on a
+re-run is to narrow the targeted entities (`--entity <slug>` or
+`--chapter <n>`), not to shorten the schema.
+
+---
+
+## Embedding cache and collection checks
+
+`markitect infospace check --concern redundancy` supports two similarity
+backends (see `markitect/infospace/checks/redundancy.py`):
+
+- **`word_overlap`** — the default, used when no embeddings are provided.
+  Pure-Python set intersection over tokenised entity text. **No LLM calls,
+  no cache needed.** This is what the current WoN check runs.
+- **`embedding`** — active when a pre-computed `{slug: vector}` mapping is
+  passed in. No persistent on-disk embedding cache exists today; the
+  caller is responsible for computing and supplying the vectors.
+
+Implication: the 988-entity `check` runs in seconds because it's all
+word-overlap. Switching to embedding similarity would add an embedding
+API pass (another ~988 requests) which is currently a manual step
+outside the CLI.
+
+---
+
+## Provider choice — recommendation
+
+For the WoN dataset specifically (text-heavy entities, 5-dimension
+rubric):
+
+| Scale                 | Recommended provider             | Rationale |
+|-----------------------|----------------------------------|-----------|
+| < 50 entities         | `gemini/gemini-2.5-flash`        | Fast default; free tier is generous enough; consistent with `markitect llm-check` out of the box. |
+| 50 – 1000 entities    | `openrouter` with a `:free` model (e.g. `arcee-ai/trinity-large-preview:free`) | What the S3.3 batch used; it got through the 985-entity batch in one overnight run without cost. |
+| > 1000 entities       | `openrouter` with a paid small-context model, or `openai` | Free-tier rate limits start to dominate wall time; paying for higher concurrency is cheaper than calendar time. |
+
+All providers are accepted by `markitect infospace evaluate --provider`.
+The evaluation schema doesn't assume any provider-specific features.
+
+Note on provider mixing: if part of a collection is evaluated under one
+provider/model and the rest under another, `per_entity_mean` can drift
+slightly (different models calibrate scores differently). For the
+viability threshold of 3.5 the drift is usually negligible, but for
+fine-grained outlier analysis prefer a single provider per batch.
+
+---
+
+## What is *not* measured here
+
+- **End-to-end pipeline time** (entity extraction from raw chapters,
+  classification, relation graph) — only the evaluation phase is timed.
+- **Memory footprint** — the full in-memory state for 988 entities is
+  small (< 200 MB observed), but not systematically measured.
+- **Failure/retry rates** — the 985 vs 988 gap is three entities the
+  original run missed (plus one added later); no structured retry log
+  was kept.
+
+Expanding any of these into a proper benchmark is **out of scope** for
+the WoN example and should live alongside a synthetic corpus that can be
+regenerated deterministically.