diff --git a/docs/composition-guide.md b/docs/composition-guide.md new file mode 100644 index 00000000..6ea869f1 --- /dev/null +++ b/docs/composition-guide.md @@ -0,0 +1,203 @@ +# Infospace Composition Guide + +One completed, viable infospace can be reused as a **discipline** for +another infospace — a lens applied to a different topic. This guide +explains how composition works and walks through the live +`examples/supply-chain-vsm/` reference. + +--- + +## What composition means + +An **infospace** is a directory of typed entities governed by +`infospace.yaml`. Its entities and relations describe a specific topic +(for example, Adam Smith's *Wealth of Nations*). + +A **discipline** is an infospace declared as a reusable analytical +framework by another infospace. When infospace B binds infospace A as a +discipline: + +1. B's entities can reference A's entities in `## WoN Concept` (or + equivalent) sections. +2. Properties A has already computed on its entities — such as VSM system + placement — become available to B by transitivity through the mapping. +3. B can impose its own viability thresholds independently of A's. The two + infospaces each pass or fail viability on their own terms. + +The binding is declarative: a relative path in `infospace.yaml` plus a +display name. No code. No import. The discipline is looked up on disk at +the declared path when B's commands run. + +--- + +## The viability pre-condition + +Binding a non-viable infospace as a discipline is a mistake: a framework +that fails its own thresholds is not a stable reference frame. 
Before +binding, confirm the candidate discipline is viable: + +```bash +cd examples/infospace-with-history +markitect infospace viability +``` + +``` +Metric Value Threshold Status +--------------------------------------------------------------- +redundancy_ratio 0.0061 max=0.1 PASS +coverage_ratio 0.6190 min=0.4 PASS +coherence_components 0.0000 max=3 PASS +consistency_cycles 0.0000 max=0 PASS +granularity_entropy 2.6748 min=1.0 PASS +per_entity_mean 3.9556 min=3.5 PASS + +Viable: YES (6/6 thresholds met) +``` + +If the discipline is not viable, fix it first (see +`examples/infospace-with-history/docs/advanced-usage.md` §4 for triaging +low scorers). + +--- + +## Example — how `supply-chain-vsm` binds WoN + +The supply-chain infospace declares WoN as a discipline in its +`infospace.yaml`: + +```yaml +topic: + name: "Modern Supply Chain Management" + domain: "Operations Management" + sources: artifacts/sources/ + +disciplines: + - name: "Wealth of Nations" + path: ../infospace-with-history +``` + +The binding is a **relative path**, so the two infospaces travel together +(they can be moved as a pair without breaking the link). + +Verify the binding resolves and the discipline is viable: + +```bash +cd examples/supply-chain-vsm +markitect infospace disciplines +``` + +``` +Name Entities Viable Path +---------------------------------------------------------------------- +Wealth of Nations 988 YES ../infospace-with-history +``` + +Each supply-chain entity then carries a `## WoN Concept` section +mapping it to exactly one WoN entity. 
The consolidated mapping files +(`output/mappings/*-mappings.md`) record the pairing, rationale, and a +conceptual-continuity rating (Strong / Moderate / Weak): + +| Supply Chain Entity | WoN Concept | Strength | VSM | +|------------------------------|----------------------------------|----------|-------| +| Demand Signal | Effectual Demand | Strong | S2 | +| Vendor-Managed Inventory | Division of Labour | Strong | S1/S2 | +| Just-in-Time Inventory | Circulating Capital | Strong | S1/S3 | +| Bullwhip Effect | Natural Price as Central Price | Moderate | S2 | +| Safety Stock | Accumulation of Stock | Moderate | S3 | + +Because each WoN entity already has a VSM system placement (S1–S5), the +supply-chain entities inherit a VSM position by transitivity through +their mapping — without supply-chain-vsm needing its own VSM reference. + +--- + +## Creating a new infospace that binds an existing one + +Step-by-step, using WoN as the discipline for a hypothetical "Modern +Monetary Policy" infospace: + +### 1. Start from the target topic + +```bash +mkdir -p examples/monetary-policy/artifacts/sources +cd examples/monetary-policy +markitect infospace init +``` + +### 2. Declare the discipline in `infospace.yaml` + +```yaml +topic: + name: "Modern Monetary Policy" + domain: "Macroeconomics" + sources: artifacts/sources/ + +disciplines: + - name: "Wealth of Nations" + path: ../infospace-with-history +``` + +Alternatively, bind imperatively after `init`: + +```bash +markitect infospace bind-discipline ../infospace-with-history --name "Wealth of Nations" +``` + +### 3. Set your own viability thresholds + +Copy the `viability:` block from a reference infospace and tune the +numbers to the scale and maturity of your topic. A smaller infospace +(50 entities, not 988) may need laxer `coverage_ratio` and stricter +`redundancy_ratio`. + +### 4. Verify the binding + +```bash +markitect infospace disciplines +``` + +If `Viable` is `NO`, stop and fix the discipline before continuing. 
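The stop-if-not-viable rule in step 4 could be automated along these lines. This is a minimal sketch, assuming the `markitect infospace disciplines` table keeps the four-column layout shown earlier in this guide (Name, Entities, Viable, Path) and that paths contain no spaces. `non_viable_disciplines` is a hypothetical helper, not part of markitect.

```python
def non_viable_disciplines(table_text: str) -> list[str]:
    """Return the names of disciplines whose Viable column is not YES.

    Assumes the table layout printed by `markitect infospace disciplines`
    as excerpted in this guide: a header row, a dashed separator, then one
    row per discipline with columns Name, Entities, Viable, Path.
    """
    failing = []
    seen_separator = False
    for line in table_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if set(stripped) == {"-"}:
            # Dashed rule between the header and the data rows.
            seen_separator = True
            continue
        if not seen_separator:
            continue  # still in the header
        # Split from the right so that names containing spaces stay intact.
        parts = stripped.rsplit(None, 3)
        if len(parts) != 4:
            continue  # not a data row in the assumed layout
        name, _entities, viable, _path = parts
        if viable.upper() != "YES":
            failing.append(name)
    return failing


# Sample text copied from the excerpt in this guide (spacing is illustrative).
SAMPLE = """\
Name                 Entities  Viable  Path
----------------------------------------------------------------------
Wealth of Nations    988       YES     ../infospace-with-history
"""

print(non_viable_disciplines(SAMPLE))
```

To use it against a live infospace, one option is to capture the command's stdout with `subprocess.run(..., capture_output=True, text=True)` and pass that text in. If the real column layout differs from the excerpt, the parsing would need adjusting.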
+ +### 5. Reference discipline entities in your own entities + +For each entity in the new infospace, add a `## Concept` +section that names the WoN entity the concept maps to, plus a rationale. +The exact section heading is configured per schema — see +`schemas/won-mapping-schema-v1.0.md` in `supply-chain-vsm` for the +template used there. + +### 6. Run checks and evaluate + +```bash +markitect infospace check +markitect infospace evaluate --provider openrouter +markitect infospace eval-summary --update-metrics +markitect infospace viability +``` + +The new infospace passes or fails viability independently of WoN. + +--- + +## Why composition, not inclusion? + +An alternative would be to copy WoN entities directly into the target +infospace. Composition avoids that by design: + +- **One source of truth** — if WoN is refined, every infospace that binds + it picks up the improvement on the next run without a sync step. +- **Separation of concerns** — each infospace owns its own schema, + thresholds, and entity set. Changing the target topic cannot pollute + the discipline. +- **Bounded dependency** — the binding is a path, so the coupling is + visible in one place (`infospace.yaml`) and easy to remove. + +--- + +## See also + +- `examples/supply-chain-vsm/README.md` — the full reference composition. +- `examples/supply-chain-vsm/output/mappings/` — consolidated mapping + files showing the rationale and strength rating for each pairing. +- `examples/infospace-with-history/docs/advanced-usage.md` — patterns for + maintaining the discipline once it is in use. diff --git a/examples/infospace-with-history/docs/advanced-usage.md b/examples/infospace-with-history/docs/advanced-usage.md new file mode 100644 index 00000000..cff5d36e --- /dev/null +++ b/examples/infospace-with-history/docs/advanced-usage.md @@ -0,0 +1,179 @@ +# Advanced Usage — Wealth of Nations Infospace + +Patterns for working with the WoN infospace (988 entities) after the initial +pipeline run. 
Every command in this file has been run against the actual +infospace at the time of writing (2026-04-21); output shapes are excerpted +verbatim. + +All commands assume `cwd = examples/infospace-with-history` and the +`markitect-venv` Python environment. + +--- + +## 1. Incremental evaluation — add entities after the initial run + +`markitect infospace evaluate` writes one file per entity under +`output/evaluations/.md`. It skips any entity whose evaluation file +already exists, so re-running after adding a new entity processes only the +new one. + +```bash +# Add a new entity file +vim output/entities/new-concept.md + +# Evaluate only the new entity (explicit) +markitect infospace evaluate --entity new-concept --provider openrouter + +# Or re-run the whole pass — existing 988 are skipped, only the new file hits the LLM +markitect infospace evaluate --provider openrouter +``` + +**How skip detection works.** Evaluation slugs are normalised to underscores +with `_s_` preserving apostrophes (`farmers-capital` entity → +`farmer_s_capital.md` evaluation). If a new entity slug collides with an +existing evaluation under this normalisation, the eval will be skipped. +To be sure an entity was picked up, check: + +```bash +# Count entities vs evaluations +ls output/entities/*.md | grep -Ev 'book-[0-9]+-(chapter-[0-9]+|introduction)-' | wc -l +ls output/evaluations/*.md | wc -l +``` + +--- + +## 2. Re-evaluating after guideline changes + +`evaluate` has no `--force` flag; re-evaluation requires deleting the +existing file first. 
+ +```bash +# Re-evaluate a single entity after updating the evaluation rubric +rm output/evaluations/accumulation_of_stock.md +markitect infospace evaluate --entity accumulation-of-stock --provider openrouter + +# Re-evaluate a whole chapter +ls output/entities/book-1-chapter-06-entities.md # see which entities the chapter produced +# Map chapter entities to eval filenames (apostrophe/underscore normalisation) and rm them +``` + +After re-evaluating, refresh the aggregate: + +```bash +markitect infospace eval-summary --update-metrics +``` + +This merges `per_entity_mean` into `output/metrics/metrics.yaml` so the next +`markitect infospace viability` check reflects the new scores. + +--- + +## 3. Interpreting per-entity score distributions + +`eval-summary` shows the mean for each of the five evaluation dimensions +plus the overall range: + +``` +$ markitect infospace eval-summary +Evaluation summary — 985 entities evaluated + + Dimension Mean + -------------------------------------- + overall 3.956 + definition_precision 3.620 + domain_placement 4.559 + explanatory_value 3.936 + source_grounding 4.358 + vsm_relevance 3.305 + + Range: 1.00 – 4.80 +``` + +Interpretation: +- `overall` above the 3.5 viability threshold → the collection passes + `per_entity_mean`. +- The lowest dimension (`vsm_relevance` = 3.305) is the weakest signal. If + the collection is meant to be VSM-grounded, this is the dimension most + worth improving (via sharper entity definitions or schema changes). +- A wide range (1.00 – 4.80) tells you there are outliers at both ends — + worth triaging (see pattern 4). + +--- + +## 4. 
Triaging low scorers + +`markitect infospace entities --by-type` prints each entity's star score +in-line: + +``` +$ markitect infospace entities --by-type | head +=== Element (315 entities) === + active_and_productive_stock Accumulation S1 ★4.6 + advanced_state_of_society General Theory S5 + agio_of_bank_money Exchange S2 ★4.8 +``` + +Entities with no `★` have no evaluation yet. To list the lowest-scoring +entities across the whole collection: + +```bash +# Extract overall_score from every evaluation file and sort ascending +for f in output/evaluations/*.md; do + score=$(awk '/^overall_score:/ {print $2; exit}' "$f") + printf "%s\t%s\n" "$score" "$(basename "$f" .md)" +done | sort -n | head -20 +``` + +The 20 lowest scorers are the natural triage list — inspect their +`output/entities/.md` and evaluation rationales to decide whether to +refine the entity, merge it with a better-formed neighbour, or drop it. + +--- + +## 5. Reading and acting on collection-check output + +`markitect infospace check` runs five concerns (C1–C5). Use `--concern` to +focus on one and `--json` for machine-readable output: + +```bash +# Redundancy — which pairs of entities are suspiciously similar? +markitect infospace check --concern redundancy --json +``` + +```json +{ + "redundancy": { + "concern": "C1", + "redundancy_ratio": 0.0061, + "similar_pairs": [ + {"entity_a": "bank_economic_contribution_metrics", + "entity_b": "bank_economic_development_metrics", + "similarity": 1.0, "method": "word_overlap"}, + {"entity_a": "economic_system_objectives", + "entity_b": "economic_system_purpose", + "similarity": 0.9394, "method": "word_overlap"} + ] + } +} +``` + +Acting on this: +- **Similarity = 1.0** is almost certainly a duplicate — pick one slug and + merge or delete the other. +- **0.85–0.99** usually means two entities genuinely cover the same idea + with slight phrasing differences. Merging is the cleanest fix. 
+- **< 0.85** usually represents legitimate adjacent concepts — leave as-is + unless the definition rubric says otherwise. + +For coverage and coherence, the pattern is the same: the `--json` output +surfaces the specific entities / missing links / disconnected components +you need to look at, rather than a bare ratio. + +--- + +## See also + +- `METRICS-METHODOLOGY.md` — how each metric is computed. +- `docs/composition-guide.md` — using this infospace as a discipline for a + different domain. +- `docs/performance-notes.md` — observed timings and provider choices. diff --git a/examples/infospace-with-history/docs/performance-notes.md b/examples/infospace-with-history/docs/performance-notes.md new file mode 100644 index 00000000..e6faaf54 --- /dev/null +++ b/examples/infospace-with-history/docs/performance-notes.md @@ -0,0 +1,106 @@ +# Performance Notes — Wealth of Nations Infospace + +Observed timings, file sizes, and provider choices from the 988-entity WoN +example. These are **operational notes**, not a benchmark — numbers come +from the actual S3.3 evaluation run (2026-02-23) rather than a controlled +experiment. + +--- + +## Evaluation batch duration + +The initial evaluation pass produced 985 `output/evaluations/*.md` files: + +- First `evaluated_at`: `2026-02-23T00:11:52` +- Last `evaluated_at`: `2026-02-23T06:39:45` +- **Total wall time: ~6h 28m** +- **Effective throughput: ~2.5 entities/min** (~152 entities/hour) + +Extracted from evaluation frontmatter: +```bash +grep -h '^evaluated_at:' output/evaluations/*.md | sort | sed -n '1p;$p' +``` + +Caveats: +- This was against OpenRouter's free tier, which applies implicit + rate-limiting and occasional retries. +- Throughput is not constant — gaps between bursts show up as plateaus + when you plot the timestamps. +- The batch was not fully parallelised; a tuned concurrent client could + likely reach 2–4× this throughput on a paid OpenRouter tier. 
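The wall-time and throughput figures in this section can be rederived from the two `evaluated_at` timestamps and the file count. A minimal sketch, assuming the timestamps are ISO 8601 strings exactly as quoted above; `throughput` is a hypothetical helper name used only for illustration.

```python
from datetime import datetime


def throughput(first: str, last: str, count: int) -> tuple[float, float]:
    """Return (wall-clock hours, entities per minute) for one batch."""
    start = datetime.fromisoformat(first)
    end = datetime.fromisoformat(last)
    seconds = (end - start).total_seconds()
    return seconds / 3600, count / (seconds / 60)


# Timestamps and count quoted in this section.
hours, per_min = throughput("2026-02-23T00:11:52", "2026-02-23T06:39:45", 985)
print(f"{hours:.2f} h, {per_min:.2f} entities/min, {per_min * 60:.0f} entities/hour")
# prints: 6.46 h, 2.54 entities/min, 152 entities/hour
```

The result (6 h 27 min 53 s, about 2.5 entities per minute) matches the rounded figures above. It is an average over the whole window, so it includes the idle gaps mentioned in the caveats.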
+ +--- + +## Tokens per entity (estimate) + +Direct token counts are not logged in the evaluation files, but the +inputs and outputs are on disk: + +- **Input per request**: evaluation schema (~3.7 KB) + entity file + (~0.7 KB median) + fixed system prompt ≈ **~1500–2500 tokens in** +- **Output per request**: structured evaluation with 5 dimensions and + rationales, median eval file 3.6 KB ≈ **~600–800 tokens out** +- **Round-trip total**: **~2000–3000 tokens per entity** +- **Batch total estimate**: 985 entities × ~2500 tokens ≈ **~2.5M tokens** + for the full pass + +The constant per-entity input means the cheapest way to reduce spend on a +re-run is to narrow the targeted entities (`--entity ` or +`--chapter `), not to shorten the schema. + +--- + +## Embedding cache and collection checks + +`markitect infospace check --concern redundancy` supports two similarity +backends (see `markitect/infospace/checks/redundancy.py`): + +- **`word_overlap`** — the default, used when no embeddings are provided. + Pure-Python set intersection over tokenised entity text. **No LLM calls, + no cache needed.** This is what the current WoN check runs. +- **`embedding`** — active when a pre-computed `{slug: vector}` mapping is + passed in. No persistent on-disk embedding cache exists today; the + caller is responsible for computing and supplying the vectors. + +Implication: the 988-entity `check` runs in seconds because it's all +word-overlap. Switching to embedding similarity would add an embedding +API pass (another ~988 requests) which is currently a manual step +outside the CLI. + +--- + +## Provider choice — recommendation + +For the WoN dataset specifically (text-heavy entities, 5-dimension +rubric): + +| Scale | Recommended provider | Rationale | +|-----------------------|----------------------------------|-----------| +| < 50 entities | `gemini/gemini-2.5-flash` | Fast default; free tier is generous enough; consistent with `markitect llm-check` out of the box. 
| +| 50 – 1000 entities | `openrouter` with a `:free` model (e.g. `arcee-ai/trinity-large-preview:free`) | What the S3.3 batch used; gets through 988 entities in one overnight run without cost. | +| > 1000 entities | `openrouter` with a paid small-context model, or `openai` | Free-tier rate limits start to dominate wall time; paying for higher concurrency is cheaper than calendar time. | + +All providers are accepted by `markitect infospace evaluate --provider`. +The evaluation schema doesn't assume any provider-specific features. + +Note on provider mixing: if part of a collection is evaluated under one +provider/model and the rest under another, `per_entity_mean` can drift +slightly (different models calibrate scores differently). For the +viability threshold of 3.5 the drift is usually negligible, but for +fine-grained outlier analysis prefer a single provider per batch. + +--- + +## What is *not* measured here + +- **End-to-end pipeline time** (entity extraction from raw chapters, + classification, relation graph) — only the evaluation phase is timed. +- **Memory footprint** — the full in-memory state for 988 entities is + small (< 200 MB observed), but not systematically measured. +- **Failure/retry rates** — the 985 vs 988 gap is three entities the + original run missed (plus one added later); no structured retry log + was kept. + +Expanding any of these into a proper benchmark is **out of scope** for +the WoN example and should live alongside a synthetic corpus that can be +regenerated deterministically.