docs(infospace): add advanced-usage, composition guide, and performance notes (C.4/C.5/C.6)

Closes out three docs tasks from roadmap/infospace-s3-closeout/PLAN.md:

- examples/infospace-with-history/docs/advanced-usage.md (C.4): 5 worked patterns covering incremental eval, the re-eval workflow (no --force flag exists; documents the rm-then-re-run pattern instead), interpreting the eval-summary distribution, triaging low scorers via an awk pipeline over overall_score (since `entities --sort-by score` does not exist), and acting on check --json output.
- docs/composition-guide.md (C.5): walks through how supply-chain-vsm binds WoN as a discipline, then gives a step-by-step for creating a new infospace that binds an existing one. Includes live output from `markitect infospace disciplines`.
- examples/infospace-with-history/docs/performance-notes.md (C.6): cites the 6h 28m wall time of the 985-entity S3.3 batch, the ~2.5 ent/min rate, the ~2000–3000 tokens/entity estimate, the word_overlap vs embedding backend for redundancy checks, and a provider-by-scale recommendation table.

All commands in these docs were run against the live infospace at commit time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 07:02:46 +02:00
parent b7e11461f4
commit 36a5136bdf
3 changed files with 488 additions and 0 deletions
--- a/docs/composition-guide.md
+++ b/docs/composition-guide.md
@@ -0,0 +1,203 @@
 # Infospace Composition Guide
 One completed, viable infospace can be reused as a **discipline** for
 another infospace — a lens applied to a different topic. This guide
 explains how composition works and walks through the live
 `examples/supply-chain-vsm/` reference.
 ---
 ## What composition means
 An **infospace** is a directory of typed entities governed by
 `infospace.yaml`. Its entities and relations describe a specific topic
 (for example, Adam Smith's *Wealth of Nations*).
 A **discipline** is an infospace declared as a reusable analytical
 framework by another infospace. When infospace B binds infospace A as a
 discipline:
 1. B's entities can reference A's entities in `## WoN Concept` (or
   equivalent) sections.
 2. Properties A has already computed on its entities — such as VSM system
   placement — become available to B by transitivity through the mapping.
 3. B can impose its own viability thresholds independently of A's. The two
   infospaces each pass or fail viability on their own terms.
 The binding is declarative: a relative path in `infospace.yaml` plus a
 display name. No code. No import. The discipline is looked up on disk at
 the declared path when B's commands run.
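To make the on-disk lookup concrete, here is a minimal sketch of how a relative discipline path could be resolved against the directory of the infospace that declares it. The function name `resolve_discipline` is hypothetical and not part of markitect; the only assumption taken from this guide is that a discipline is itself a directory containing an `infospace.yaml`.

```shell
# Hypothetical helper (not a markitect command): resolve a discipline path
# relative to the directory of the infospace that declares it.
resolve_discipline() {
  infospace_dir=$1   # directory holding the binding infospace.yaml
  rel_path=$2        # value of `path:` under `disciplines:`
  # cd inside the command substitution so the caller's cwd is untouched.
  target=$(cd "$infospace_dir" 2>/dev/null && cd "$rel_path" 2>/dev/null && pwd) || return 1
  # A discipline is itself an infospace, so it must carry its own infospace.yaml.
  [ -f "$target/infospace.yaml" ] || return 1
  printf '%s\n' "$target"
}

# Intended use (not executed here):
#   resolve_discipline examples/supply-chain-vsm ../infospace-with-history
```

A non-zero exit status means the path does not resolve to an infospace directory, which is the situation in which B's commands could not find the discipline.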
 ---
 ## The viability pre-condition
 Binding a non-viable infospace as a discipline is a mistake: a framework
 that fails its own thresholds is not a stable reference frame. Before
 binding, confirm the candidate discipline is viable:
 ```bash
 cd examples/infospace-with-history
 markitect infospace viability
 ```
 ```
 Metric                            Value       Threshold   Status
 ---------------------------------------------------------------
 redundancy_ratio                 0.0061         max=0.1     PASS
 coverage_ratio                   0.6190         min=0.4     PASS
 coherence_components             0.0000           max=3     PASS
 consistency_cycles               0.0000           max=0     PASS
 granularity_entropy              2.6748         min=1.0     PASS
 per_entity_mean                  3.9556         min=3.5     PASS
 Viable: YES (6/6 thresholds met)
 ```
 If the discipline is not viable, fix it first (see
 `examples/infospace-with-history/docs/advanced-usage.md` §4 for triaging
 low scorers).
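In a script, the pre-condition can be enforced by inspecting the report text. The sketch below assumes only the `Viable: YES` summary line shown in the sample output; whether `markitect infospace viability` also signals failure through its exit status is not covered here, so the helper (a hypothetical name, `is_viable`) reads the report on stdin instead.

```shell
# Hypothetical gate: succeed only if the piped viability report says "Viable: YES".
# Assumes the summary line format shown in the sample output above.
is_viable() {
  grep -Eq '^[[:space:]]*Viable:[[:space:]]*YES'
}

# Intended use (not executed here):
#   (cd examples/infospace-with-history && markitect infospace viability) | is_viable \
#     || echo "fix the discipline before binding it" >&2
```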
 ---
 ## Example — how `supply-chain-vsm` binds WoN
 The supply-chain infospace declares WoN as a discipline in its
 `infospace.yaml`:
 ```yaml
 topic:
  name: "Modern Supply Chain Management"
  domain: "Operations Management"
  sources: artifacts/sources/
 disciplines:
  - name: "Wealth of Nations"
    path: ../infospace-with-history
 ```
 The binding is a **relative path**, so the two infospaces travel together
 (they can be moved as a pair without breaking the link).
 Verify the binding resolves and the discipline is viable:
 ```bash
 cd examples/supply-chain-vsm
 markitect infospace disciplines
 ```
 ```
 Name                           Entities   Viable Path
 ----------------------------------------------------------------------
 Wealth of Nations                   988      YES ../infospace-with-history
 ```
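The same table can be checked mechanically when several disciplines are bound. This is a rough sketch that assumes the four-column layout printed above (Name, Entities, Viable, Path) and a path without spaces; `non_viable_disciplines` is a hypothetical helper name, not a markitect subcommand.

```shell
# Hypothetical check: read the `markitect infospace disciplines` table on stdin
# and list any discipline whose Viable column is not YES.
# Assumes the column layout shown above: Name, Entities, Viable, Path.
non_viable_disciplines() {
  awk '
    NR <= 2 { next }            # skip the header row and the dashed rule
    NF < 4  { next }            # skip blank or malformed lines
    $(NF-1) != "YES" {
      # The name may contain spaces, so rebuild it from the leading fields.
      name = $1
      for (i = 2; i <= NF - 3; i++) name = name " " $i
      print name
    }
  '
}

# Intended use (not executed here):
#   markitect infospace disciplines | non_viable_disciplines
```

Empty output means every bound discipline reported `YES`.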
 Each supply-chain entity then carries a `## WoN Concept` section
 mapping it to exactly one WoN entity. The consolidated mapping files
 (`output/mappings/*-mappings.md`) record the pairing, rationale, and a
 conceptual-continuity rating (Strong / Moderate / Weak):
 | Supply Chain Entity          | WoN Concept                      | Strength | VSM   |
 |------------------------------|----------------------------------|----------|-------|
 | Demand Signal                | Effectual Demand                 | Strong   | S2    |
 | Vendor-Managed Inventory     | Division of Labour               | Strong   | S1/S2 |
 | Just-in-Time Inventory       | Circulating Capital              | Strong   | S1/S3 |
 | Bullwhip Effect              | Natural Price as Central Price   | Moderate | S2    |
 | Safety Stock                 | Accumulation of Stock            | Moderate | S3    |
 Because each WoN entity already has a VSM system placement (S1–S5), the
 supply-chain entities inherit a VSM position by transitivity through
 their mapping — without supply-chain-vsm needing its own VSM reference.
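The transitive lookup amounts to a two-step join. The sketch below is illustrative only: the tab-separated input files are an assumed format (markitect stores this information in Markdown entity files and mapping files, not TSV), and `inherit_vsm` is a hypothetical name.

```shell
# Illustrative only: derive a supply-chain entity's VSM placement by looking up
# the placement of the WoN concept it maps to. Input formats are assumptions:
#   $1: tab-separated "WoN concept<TAB>VSM system" lines
#   $2: tab-separated "supply-chain entity<TAB>WoN concept" lines
inherit_vsm() {
  awk -F '\t' '
    NR == FNR { vsm[$1] = $2; next }     # first file: WoN placements
    { print $1 "\t" (($2 in vsm) ? vsm[$2] : "unmapped") }
  ' "$1" "$2"
}
```

With `Effectual Demand<TAB>S2` in the first file and `Demand Signal<TAB>Effectual Demand` in the second, the helper prints `Demand Signal<TAB>S2`, matching the first row of the table above.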
 ---
 ## Creating a new infospace that binds an existing one
 Step-by-step, using WoN as the discipline for a hypothetical "Modern
 Monetary Policy" infospace:
 ### 1. Start from the target topic
 ```bash
 mkdir -p examples/monetary-policy/artifacts/sources
 cd examples/monetary-policy
 markitect infospace init
 ```
 ### 2. Declare the discipline in `infospace.yaml`
 ```yaml
 topic:
  name: "Modern Monetary Policy"
  domain: "Macroeconomics"
  sources: artifacts/sources/
 disciplines:
  - name: "Wealth of Nations"
    path: ../infospace-with-history
 ```
 Alternatively, bind imperatively after `init`:
 ```bash
 markitect infospace bind-discipline ../infospace-with-history --name "Wealth of Nations"
 ```
 ### 3. Set your own viability thresholds
 Copy the `viability:` block from a reference infospace and tune the
 numbers to the scale and maturity of your topic. A smaller infospace
 (50 entities, not 988) may need laxer `coverage_ratio` and stricter
 `redundancy_ratio`.
 ### 4. Verify the binding
 ```bash
 markitect infospace disciplines
 ```
 If `Viable` is `NO`, stop and fix the discipline before continuing.
 ### 5. Reference discipline entities in your own entities
 For each entity in the new infospace, add a `## <Discipline> Concept`
 section that names the WoN entity the concept maps to, plus a rationale.
 The exact section heading is configured per schema — see
 `schemas/won-mapping-schema-v1.0.md` in `supply-chain-vsm` for the
 template used there.
 ### 6. Run checks and evaluate
 ```bash
 markitect infospace check
 markitect infospace evaluate --provider openrouter
 markitect infospace eval-summary --update-metrics
 markitect infospace viability
 ```
 The new infospace passes or fails viability independently of WoN.
 ---
 ## Why composition, not inclusion?
 An alternative would be to copy WoN entities directly into the target
 infospace. Composition avoids that by design:
 - **One source of truth** — if WoN is refined, every infospace that binds
  it picks up the improvement on the next run without a sync step.
 - **Separation of concerns** — each infospace owns its own schema,
  thresholds, and entity set. Changing the target topic cannot pollute
  the discipline.
 - **Bounded dependency** — the binding is a path, so the coupling is
  visible in one place (`infospace.yaml`) and easy to remove.
 ---
 ## See also
 - `examples/supply-chain-vsm/README.md` — the full reference composition.
 - `examples/supply-chain-vsm/output/mappings/` — consolidated mapping
  files showing the rationale and strength rating for each pairing.
 - `examples/infospace-with-history/docs/advanced-usage.md` — patterns for
  maintaining the discipline once it is in use.
--- a/examples/infospace-with-history/docs/advanced-usage.md
+++ b/examples/infospace-with-history/docs/advanced-usage.md
@@ -0,0 +1,179 @@
 # Advanced Usage — Wealth of Nations Infospace
 Patterns for working with the WoN infospace (988 entities) after the initial
 pipeline run. Every command in this file has been run against the actual
 infospace at the time of writing (2026-04-21); output shapes are excerpted
 verbatim.
 All commands assume `cwd = examples/infospace-with-history` and the
 `markitect-venv` Python environment.
 ---
 ## 1. Incremental evaluation — add entities after the initial run
 `markitect infospace evaluate` writes one file per entity under
 `output/evaluations/<slug>.md`. It skips any entity whose evaluation file
 already exists, so re-running after adding a new entity processes only the
 new one.
 ```bash
 # Add a new entity file
 vim output/entities/new-concept.md
 # Evaluate only the new entity (explicit)
 markitect infospace evaluate --entity new-concept --provider openrouter
 # Or re-run the whole pass: entities that already have an evaluation file are skipped, so only unevaluated ones hit the LLM
 markitect infospace evaluate --provider openrouter
 ```
 **How skip detection works.** Evaluation filenames are normalised to underscores,
 with a possessive apostrophe preserved as `_s_` (the `farmers-capital` entity is
 evaluated as `farmer_s_capital.md`). If a new entity's normalised name collides
 with an existing evaluation file, its evaluation is skipped.
 To be sure an entity was picked up, check:
 ```bash
 # Count entities (excluding the per-chapter book-N-... files) vs evaluations
 ls output/entities/*.md | grep -Ev 'book-[0-9]+-(chapter-[0-9]+|introduction)-' | wc -l
 ls output/evaluations/*.md | wc -l
 ```
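Counts show that something is missing but not which entity. As a rough sketch, the helper below lists entity slugs with no matching evaluation file. It assumes the plain hyphen-to-underscore mapping seen in this guide (`accumulation-of-stock` to `accumulation_of_stock.md`); possessive names that use `_s_` do not follow that rule, so they will be reported as missing and need a manual look. `missing_evaluations` is a hypothetical name.

```shell
# Hypothetical helper: list entity slugs that have no evaluation file yet.
# Uses the hyphen-to-underscore rule only; names that normalise with `_s_`
# (farmers-capital -> farmer_s_capital.md) will show up as false positives.
missing_evaluations() {
  entities_dir=$1
  evaluations_dir=$2
  for f in "$entities_dir"/*.md; do
    [ -e "$f" ] || continue
    slug=$(basename "$f" .md)
    # Skip per-chapter files, using the same pattern as the count above.
    printf '%s\n' "$slug" | grep -Eq 'book-[0-9]+-(chapter-[0-9]+|introduction)-' && continue
    eval_name=$(printf '%s' "$slug" | tr '-' '_')
    [ -e "$evaluations_dir/$eval_name.md" ] || printf '%s\n' "$slug"
  done
}

# Intended use (not executed here):
#   missing_evaluations output/entities output/evaluations
```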
 ---
 ## 2. Re-evaluating after guideline changes
 `evaluate` has no `--force` flag; re-evaluation requires deleting the
 existing file first.
 ```bash
 # Re-evaluate a single entity after updating the evaluation rubric
 rm output/evaluations/accumulation_of_stock.md
 markitect infospace evaluate --entity accumulation-of-stock --provider openrouter
 # Re-evaluate a whole chapter
 cat output/entities/book-1-chapter-06-entities.md   # see which entities the chapter produced
 # Map chapter entities to eval filenames (apostrophe/underscore normalisation) and rm them
 ```
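The final step above is left as a comment. One way to fill it in, sketched under assumptions: given a list of entity slugs for the chapter (how you extract them from the chapter file is not specified here), print the evaluation files that exist so they can be reviewed before deletion. `evaluations_for_slugs` is a hypothetical helper and only handles the hyphen-to-underscore case.

```shell
# Hypothetical dry-run helper: read entity slugs on stdin (one per line) and
# print the evaluation files that would need deleting before a re-run.
# Possessive names (the `_s_` case) must be checked by hand.
evaluations_for_slugs() {
  evaluations_dir=$1
  while IFS= read -r slug; do
    [ -n "$slug" ] || continue
    candidate="$evaluations_dir/$(printf '%s' "$slug" | tr '-' '_').md"
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
    else
      printf 'no evaluation found for %s\n' "$slug" >&2
    fi
  done
}

# Intended use (not executed here): review the list, then delete.
#   evaluations_for_slugs output/evaluations < chapter-slugs.txt
```

Printing instead of deleting keeps the step reversible; once the list looks right, the paths can be passed to `rm`.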
 After re-evaluating, refresh the aggregate:
 ```bash
 markitect infospace eval-summary --update-metrics
 ```
 This merges `per_entity_mean` into `output/metrics/metrics.yaml` so the next
 `markitect infospace viability` check reflects the new scores.
 ---
 ## 3. Interpreting per-entity score distributions
 `eval-summary` shows the overall mean, the mean for each of the five
 evaluation dimensions, and the score range:
 ```
 $ markitect infospace eval-summary
 Evaluation summary — 985 entities evaluated
  Dimension                        Mean
  --------------------------------------
  overall                         3.956
  definition_precision            3.620
  domain_placement                4.559
  explanatory_value               3.936
  source_grounding                4.358
  vsm_relevance                   3.305
  Range: 1.00 – 4.80
 ```
 Interpretation:
 - `overall` above the 3.5 viability threshold → the collection passes
  `per_entity_mean`.
 - The lowest dimension (`vsm_relevance` = 3.305) is the weakest signal. If
  the collection is meant to be VSM-grounded, this is the dimension most
  worth improving (via sharper entity definitions or schema changes).
 - A wide range (1.00 – 4.80) means at least one entity scores far below the
  mean, so the low end is worth triaging (see pattern 4).
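The first two checks in this list can be automated. The sketch below parses the summary text on stdin, assuming the two-column "name, mean" rows shown above; `summarise_scores` is a hypothetical helper, and the threshold is passed in by the caller.

```shell
# Hypothetical check: read `markitect infospace eval-summary` output on stdin,
# compare the overall mean with a threshold, and name the weakest dimension.
# Assumes the two-column "name  mean" row layout shown above.
summarise_scores() {
  awk -v threshold="$1" '
    NF == 2 && $1 ~ /^[a-z_]+$/ && $2 ~ /^[0-9]+\.[0-9]+$/ {
      if ($1 == "overall") { overall = $2 + 0; next }
      if (weakest == "" || $2 + 0 < low) { weakest = $1; low = $2 + 0 }
    }
    END {
      printf "overall %.3f %s threshold %s\n", overall, (overall >= threshold ? "meets" : "is below"), threshold
      printf "weakest dimension: %s (%.3f)\n", weakest, low
    }
  '
}

# Intended use (not executed here):
#   markitect infospace eval-summary | summarise_scores 3.5
```

Fed the summary above with a threshold of 3.5, this reports that the overall mean of 3.956 meets the threshold and that `vsm_relevance` (3.305) is the weakest dimension.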
 ---
 ## 4. Triaging low scorers
 `markitect infospace entities --by-type` prints each entity's star score
 in-line:
 ```
 $ markitect infospace entities --by-type | head
 === Element (315 entities) ===
  active_and_productive_stock              Accumulation       S1   ★4.6
  advanced_state_of_society                General Theory     S5
  agio_of_bank_money                       Exchange           S2   ★4.8
 ```
 Entities with no `★` have no evaluation yet. To list the lowest-scoring
 entities across the whole collection:
 ```bash
 # Extract overall_score from every evaluation file and sort ascending
 for f in output/evaluations/*.md; do
  score=$(awk '/^overall_score:/ {print $2; exit}' "$f")
  printf "%s\t%s\n" "$score" "$(basename "$f" .md)"
 done | sort -n | head -20
 ```
 The 20 lowest scorers are the natural triage list — inspect their
 `output/entities/<slug>.md` and evaluation rationales to decide whether to
 refine the entity, merge it with a better-formed neighbour, or drop it.
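Before working through the list, it can help to see how many entities sit in each score band. A minimal sketch, assuming scores fall between 0 and 5 and arrive one per line; `score_histogram` is a hypothetical helper.

```shell
# Hypothetical helper: bucket scores (one per line on stdin) by their integer
# part and print a count per bucket, lowest bucket first.
# Assumes a 0-5 scale; non-numeric lines are ignored.
score_histogram() {
  awk '
    $1 ~ /^[0-9]+(\.[0-9]+)?$/ { bucket[int($1)]++ }
    END {
      for (b = 0; b <= 5; b++)
        if (b in bucket) printf "%d.x\t%d\n", b, bucket[b]
    }
  '
}

# Intended use (not executed here), reusing the extraction from the loop above:
#   for f in output/evaluations/*.md; do
#     awk '/^overall_score:/ {print $2; exit}' "$f"
#   done | score_histogram
```

A thin `1.x` or `2.x` bucket suggests a handful of outliers to fix individually; a thick one points at a systematic problem with the entity schema or the rubric.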
 ---
 ## 5. Reading and acting on collection-check output
 `markitect infospace check` runs five concerns (C1–C5). Use `--concern` to
 focus on one and `--json` for machine-readable output:
 ```bash
 # Redundancy — which pairs of entities are suspiciously similar?
 markitect infospace check --concern redundancy --json
 ```
 ```json
 {
  "redundancy": {
    "concern": "C1",
    "redundancy_ratio": 0.0061,
    "similar_pairs": [
      {"entity_a": "bank_economic_contribution_metrics",
       "entity_b": "bank_economic_development_metrics",
       "similarity": 1.0, "method": "word_overlap"},
      {"entity_a": "economic_system_objectives",
       "entity_b": "economic_system_purpose",
       "similarity": 0.9394, "method": "word_overlap"}
    ]
  }
 }
 ```
 Acting on this:
 - **Similarity = 1.0** is almost certainly a duplicate — pick one slug and
  merge or delete the other.
 - **0.85–0.99** usually means two entities genuinely cover the same idea
  with slight phrasing differences. Merging is the cleanest fix.
 - **< 0.85** usually represents legitimate adjacent concepts — leave as-is
  unless the definition rubric says otherwise.
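These bands are easy to apply in bulk. The sketch below labels each pair from whitespace-separated `entity_a entity_b similarity` lines; the input format and the name `classify_pairs` are assumptions for illustration. Values between 0.99 and 1.0, which the list above does not cover, are treated as merge candidates here.

```shell
# Hypothetical triage helper: read "entity_a entity_b similarity" lines on
# stdin and label each pair using the bands from the list above.
classify_pairs() {
  awk '
    NF == 3 {
      s = $3 + 0
      if (s >= 1.0)       action = "duplicate: merge or delete one"
      else if (s >= 0.85) action = "merge candidate"
      else                action = "leave as-is"
      print $1, $2, action
    }
  '
}
```

If `jq` is installed, the input lines can be produced from the JSON shown above with `jq -r '.redundancy.similar_pairs[] | "\(.entity_a) \(.entity_b) \(.similarity)"'` and piped into `classify_pairs`.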
 For coverage and coherence, the pattern is the same: the `--json` output
 surfaces the specific entities / missing links / disconnected components
 you need to look at, rather than a bare ratio.
 ---
 ## See also
 - `METRICS-METHODOLOGY.md`: how each metric is computed.
 - `docs/composition-guide.md` (under the repository root, not this example's
  `docs/`): using this infospace as a discipline for a different domain.
 - `docs/performance-notes.md`: observed timings and provider choices.
--- a/examples/infospace-with-history/docs/performance-notes.md
+++ b/examples/infospace-with-history/docs/performance-notes.md
@@ -0,0 +1,106 @@
 # Performance Notes — Wealth of Nations Infospace
 Observed timings, file sizes, and provider choices from the 988-entity WoN
 example. These are **operational notes**, not a benchmark — numbers come
 from the actual S3.3 evaluation run (2026-02-23) rather than a controlled
 experiment.
 ---
 ## Evaluation batch duration
 The initial evaluation pass produced 985 `output/evaluations/*.md` files:
 - First `evaluated_at`: `2026-02-23T00:11:52`
 - Last `evaluated_at`:  `2026-02-23T06:39:45`
 - **Total wall time: ~6h 28m**
 - **Effective throughput: ~2.5 entities/min** (~152 entities/hour)
 Extracted from evaluation frontmatter:
 ```bash
 grep -h '^evaluated_at:' output/evaluations/*.md | sort | sed -n '1p;$p'
 ```
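The wall time and throughput figures follow from those two timestamps and the file count. A small sketch of the arithmetic, assuming GNU `date` (the `-d` option; BSD/macOS `date` uses different flags); `batch_throughput` is a hypothetical helper.

```shell
# Hypothetical helper: wall time and throughput from two ISO-8601 timestamps.
# Assumes GNU `date` (the -d option); BSD/macOS date uses a different syntax.
batch_throughput() {
  first=$(date -u -d "$(printf '%s' "$1" | tr 'T' ' ')" +%s) || return 1
  last=$(date -u -d "$(printf '%s' "$2" | tr 'T' ' ')" +%s) || return 1
  [ "$last" -gt "$first" ] || return 1     # avoid dividing by zero below
  awk -v secs="$((last - first))" -v n="$3" 'BEGIN {
    printf "%dh %02dm %02ds, %.2f entities/min\n",
      int(secs / 3600), int(secs % 3600 / 60), secs % 60, n / (secs / 60)
  }'
}

batch_throughput 2026-02-23T00:11:52 2026-02-23T06:39:45 985
# prints: 6h 27m 53s, 2.54 entities/min
```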
 Caveats:
 - This was against OpenRouter's free tier, which applies implicit
  rate-limiting and occasional retries.
 - Throughput is not constant — gaps between bursts show up as plateaus
  when you plot the timestamps.
 - The batch was not fully parallelised; a tuned concurrent client could
  likely raise this throughput 2–4× on a paid OpenRouter tier.
 ---
 ## Tokens per entity (estimate)
 Direct token counts are not logged in the evaluation files, but the
 inputs and outputs are on disk:
 - **Input per request**: evaluation schema (~3.7 KB) + entity file
  (~0.7 KB median) + fixed system prompt ≈ **~1500–2500 tokens in**
 - **Output per request**: structured evaluation with 5 dimensions and
  rationales, median eval file 3.6 KB ≈ **~600–800 tokens out**
 - **Round-trip total**: **~2000–3000 tokens per entity**
 - **Batch total estimate**: 985 entities × ~2500 tokens ≈ **~2.5M tokens**
  for the full pass
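The batch estimate is simple multiplication, and it is worth recomputing as a range when planning a partial re-run. A minimal sketch using shell integer arithmetic; `estimate_batch_tokens` is a hypothetical helper, and the per-entity bounds are the rough estimates from the list above, not logged values.

```shell
# Back-of-envelope helper: total tokens for a batch, given an entity count and
# a low/high per-entity estimate. Pure integer arithmetic, no API calls.
estimate_batch_tokens() {
  count=$1; low=$2; high=$3
  printf '%d-%d tokens\n' "$((count * low))" "$((count * high))"
}

estimate_batch_tokens 985 2000 3000
# prints: 1970000-2955000 tokens
```

The ~2.5M figure quoted above is the midpoint of that range.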
 The constant per-entity input means the cheapest way to reduce spend on a
 re-run is to narrow the targeted entities (`--entity <slug>` or
 `--chapter <n>`), not to shorten the schema.
 ---
 ## Embedding cache and collection checks
 `markitect infospace check --concern redundancy` supports two similarity
 backends (see `markitect/infospace/checks/redundancy.py`):
 - **`word_overlap`** — the default, used when no embeddings are provided.
  Pure-Python set intersection over tokenised entity text. **No LLM calls,
  no cache needed.** This is what the current WoN check runs.
 - **`embedding`** — active when a pre-computed `{slug: vector}` mapping is
  passed in. No persistent on-disk embedding cache exists today; the
  caller is responsible for computing and supplying the vectors.
 Implication: the 988-entity `check` runs in seconds because it's all
 word-overlap. Switching to embedding similarity would add an embedding
 API pass (another ~988 requests) which is currently a manual step
 outside the CLI.
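To give a feel for why word overlap needs no model calls, here is an illustrative similarity function. It uses the Jaccard index over lowercase word sets, which is an assumption about what "word overlap" means; it is not a port of `markitect/infospace/checks/redundancy.py`, whose exact tokenisation and formula are not reproduced here. `word_overlap` is a hypothetical name.

```shell
# Illustrative word-overlap similarity (Jaccard index over lowercase word sets).
# An assumption for illustration, not the formula used by markitect.
word_overlap() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(tolower(a), wa, /[^a-z0-9]+/)
    nb = split(tolower(b), wb, /[^a-z0-9]+/)
    for (i = 1; i <= na; i++) if (wa[i] != "") seen_a[wa[i]] = 1
    for (i = 1; i <= nb; i++) if (wb[i] != "") seen_b[wb[i]] = 1
    # total = size of the union, shared = size of the intersection
    for (w in seen_a) { total++; if (w in seen_b) shared++ }
    for (w in seen_b) if (!(w in seen_a)) total++
    printf "%.4f\n", (total ? shared / total : 0)
  }'
}

word_overlap "natural price" "market price"
# prints: 0.3333
```

Identical texts score 1.0000 under this measure, which is consistent with treating a similarity of 1.0 as a near-certain duplicate.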
 ---
 ## Provider choice — recommendation
 For the WoN dataset specifically (text-heavy entities, 5-dimension
 rubric):
 | Scale                 | Recommended provider             | Rationale |
 |-----------------------|----------------------------------|-----------|
 | < 50 entities         | `gemini/gemini-2.5-flash`        | Fast default; free tier is generous enough; consistent with `markitect llm-check` out of the box. |
 | 50 – 1000 entities    | `openrouter` with a `:free` model (e.g. `arcee-ai/trinity-large-preview:free`) | What the S3.3 batch used; it got through 985 entities in one overnight run at no cost. |
 | > 1000 entities       | `openrouter` with a paid small-context model, or `openai` | Free-tier rate limits start to dominate wall time; paying for higher concurrency is cheaper than calendar time. |
 All providers are accepted by `markitect infospace evaluate --provider`.
 The evaluation schema doesn't assume any provider-specific features.
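If the choice needs to live in a wrapper script, the table reduces to two comparisons. This is a sketch of the table's rule of thumb only; `recommend_provider` is a hypothetical helper and the strings it prints are labels for a human, not values guaranteed to be valid for `--provider`.

```shell
# Hypothetical helper: map an entity count to the recommendation in the table
# above. Boundaries follow the table: < 50, 50-1000, > 1000.
recommend_provider() {
  n=$1
  if [ "$n" -lt 50 ]; then
    echo "gemini/gemini-2.5-flash"
  elif [ "$n" -le 1000 ]; then
    echo "openrouter (:free model)"
  else
    echo "openrouter (paid) or openai"
  fi
}
```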
 Note on provider mixing: if part of a collection is evaluated under one
 provider/model and the rest under another, `per_entity_mean` can drift
 slightly (different models calibrate scores differently). For the
 viability threshold of 3.5 the drift is usually negligible, but for
 fine-grained outlier analysis prefer a single provider per batch.
 ---
 ## What is *not* measured here
 - **End-to-end pipeline time** (entity extraction from raw chapters,
  classification, relation graph) — only the evaluation phase is timed.
 - **Memory footprint** — the full in-memory state for 988 entities is
  small (< 200 MB observed), but not systematically measured.
 - **Failure/retry rates** — the 985 vs 988 gap is three entities the
  original run missed (plus one added later); no structured retry log
  was kept.
 Expanding any of these into a proper benchmark is **out of scope** for
 the WoN example and should live alongside a synthetic corpus that can be
 regenerated deterministically.