docs(infospace): add advanced-usage, composition guide, and performance notes (C.4/C.5/C.6)

Closes out three docs tasks from roadmap/infospace-s3-closeout/PLAN.md:

- examples/infospace-with-history/docs/advanced-usage.md (C.4) — 5 worked
  patterns covering incremental eval, re-eval workflow (no --force flag
  exists; documents the rm-then-re-run pattern instead), interpreting the
  eval-summary distribution, triaging low scorers via an awk pipeline
  over overall_score (since `entities --sort-by score` does not exist),
  and acting on check --json output.
- docs/composition-guide.md (C.5) — walks through how supply-chain-vsm
  binds WoN as a discipline, then a step-by-step for creating a new
  infospace that binds an existing one. Includes live output from
  `markitect infospace disciplines`.
- examples/infospace-with-history/docs/performance-notes.md (C.6) — cites
  the 6h 28m wall time of the 985-entity S3.3 batch, ~2.5 ent/min rate,
  ~2000–3000 tokens/entity estimate, word_overlap vs embedding backend
  for redundancy checks, and a provider-by-scale recommendation table.

All commands in these docs were run against the live infospace at
commit time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 07:02:46 +02:00
parent b7e11461f4
commit 36a5136bdf
3 changed files with 488 additions and 0 deletions

docs/composition-guide.md

@@ -0,0 +1,203 @@
# Infospace Composition Guide
One completed, viable infospace can be reused as a **discipline** for
another infospace — a lens applied to a different topic. This guide
explains how composition works and walks through the live
`examples/supply-chain-vsm/` reference.
---
## What composition means
An **infospace** is a directory of typed entities governed by
`infospace.yaml`. Its entities and relations describe a specific topic
(for example, Adam Smith's *Wealth of Nations*).
A **discipline** is an infospace declared as a reusable analytical
framework by another infospace. When infospace B binds infospace A as a
discipline:
1. B's entities can reference A's entities in `## WoN Concept` (or
equivalent) sections.
2. Properties A has already computed on its entities — such as VSM system
placement — become available to B by transitivity through the mapping.
3. B can impose its own viability thresholds independently of A's. The two
infospaces each pass or fail viability on their own terms.
The binding is declarative: a relative path in `infospace.yaml` plus a
display name. No code. No import. The discipline is looked up on disk at
the declared path when B's commands run.
---
## The viability pre-condition
Binding a non-viable infospace as a discipline is a mistake: a framework
that fails its own thresholds is not a stable reference frame. Before
binding, confirm the candidate discipline is viable:
```bash
cd examples/infospace-with-history
markitect infospace viability
```
```
Metric Value Threshold Status
---------------------------------------------------------------
redundancy_ratio 0.0061 max=0.1 PASS
coverage_ratio 0.6190 min=0.4 PASS
coherence_components 0.0000 max=3 PASS
consistency_cycles 0.0000 max=0 PASS
granularity_entropy 2.6748 min=1.0 PASS
per_entity_mean 3.9556 min=3.5 PASS
Viable: YES (6/6 thresholds met)
```
If the discipline is not viable, fix it first (see
`examples/infospace-with-history/docs/advanced-usage.md` §4 for triaging
low scorers).
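In a script, this check can gate the binding step automatically. The sketch below is minimal and rests on two assumptions: the summary line always starts with `Viable: YES` or `Viable: NO`, and any status other than `PASS` marks a failed threshold. `assert_viable` is a hypothetical shell helper, not a markitect command.

```bash
# Hypothetical helper: reads `markitect infospace viability` output on stdin.
# Prints each metric whose status column is not PASS and returns non-zero
# unless the summary line reports "Viable: YES".
assert_viable() {
  awk '
    # Metric rows end in "<min|max>=<n> <status>"; everything else is skipped.
    NF >= 4 && $(NF-1) ~ /^(min|max)=/ && $NF != "PASS" {
      print "failing: " $1
      failed = 1
    }
    /^Viable: YES/ { viable = 1 }
    END { exit !(viable && !failed) }
  '
}

# Usage:
#   markitect infospace viability | assert_viable || echo "fix the discipline first" >&2
```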
---
## Example — how `supply-chain-vsm` binds WoN
The supply-chain infospace declares WoN as a discipline in its
`infospace.yaml`:
```yaml
topic:
  name: "Modern Supply Chain Management"
  domain: "Operations Management"
sources: artifacts/sources/
disciplines:
  - name: "Wealth of Nations"
    path: ../infospace-with-history
```
The binding is a **relative path**, so the two infospaces travel together
(they can be moved as a pair without breaking the link).
Verify the binding resolves and the discipline is viable:
```bash
cd examples/supply-chain-vsm
markitect infospace disciplines
```
```
Name Entities Viable Path
----------------------------------------------------------------------
Wealth of Nations 988 YES ../infospace-with-history
```
Each supply-chain entity then carries a `## WoN Concept` section
mapping it to exactly one WoN entity. The consolidated mapping files
(`output/mappings/*-mappings.md`) record the pairing, rationale, and a
conceptual-continuity rating (Strong / Moderate / Weak):
| Supply Chain Entity | WoN Concept | Strength | VSM |
|------------------------------|----------------------------------|----------|-------|
| Demand Signal | Effectual Demand | Strong | S2 |
| Vendor-Managed Inventory | Division of Labour | Strong | S1/S2 |
| Just-in-Time Inventory | Circulating Capital | Strong | S1/S3 |
| Bullwhip Effect | Natural Price as Central Price | Moderate | S2 |
| Safety Stock | Accumulation of Stock | Moderate | S3 |
Because each WoN entity already has a VSM system placement (S1–S5), the
supply-chain entities inherit a VSM position by transitivity through
their mapping — without supply-chain-vsm needing its own VSM reference.
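The inheritance amounts to a join of two lookups. The sketch below makes that join explicit under stated assumptions: `won-vsm.tsv` (WoN concept to VSM system) and `mappings.tsv` (supply-chain entity to WoN concept) are hypothetical tab-separated exports you would produce yourself from the WoN entities and the mapping files. markitect does not emit them in this form, and `inherit_vsm` is a hypothetical helper.

```bash
# Hypothetical inputs, both tab-separated:
#   won-vsm.tsv   <WoN concept> TAB <VSM system>
#   mappings.tsv  <supply-chain entity> TAB <WoN concept>
# Output: <supply-chain entity> TAB <WoN concept> TAB <inherited VSM, or ?>
inherit_vsm() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { vsm[$1] = $2; next }                 # first file: concept -> VSM
    { print $1, $2, (($2 in vsm) ? vsm[$2] : "?") }  # second file: follow the mapping
  ' "$1" "$2"
}

# Usage: inherit_vsm won-vsm.tsv mappings.tsv
```

Unmapped concepts print `?`, so gaps in the mapping show up instead of silently dropping rows.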
---
## Creating a new infospace that binds an existing one
Step-by-step, using WoN as the discipline for a hypothetical "Modern
Monetary Policy" infospace:
### 1. Start from the target topic
```bash
mkdir -p examples/monetary-policy/artifacts/sources
cd examples/monetary-policy
markitect infospace init
```
### 2. Declare the discipline in `infospace.yaml`
```yaml
topic:
  name: "Modern Monetary Policy"
  domain: "Macroeconomics"
sources: artifacts/sources/
disciplines:
  - name: "Wealth of Nations"
    path: ../infospace-with-history
```
Alternatively, bind imperatively after `init`:
```bash
markitect infospace bind-discipline ../infospace-with-history --name "Wealth of Nations"
```
### 3. Set your own viability thresholds
Copy the `viability:` block from a reference infospace and tune the
numbers to the scale and maturity of your topic. A smaller infospace
(50 entities rather than 988) may need a lower `coverage_ratio` minimum
and a tighter `redundancy_ratio` maximum.
### 4. Verify the binding
```bash
markitect infospace disciplines
```
If `Viable` is `NO`, stop and fix the discipline before continuing.
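To make this step fail a script automatically, the table can be checked in a pipeline. This is a minimal sketch that assumes the column layout shown earlier (`Name Entities Viable Path`) and discipline paths without spaces. `require_viable_disciplines` is a hypothetical helper.

```bash
# Hypothetical helper: reads `markitect infospace disciplines` output on stdin.
# Columns are counted from the right because names like "Wealth of Nations"
# contain spaces; paths are assumed not to.
require_viable_disciplines() {
  awk '
    NR <= 2 { next }          # header row and dashed rule
    NF < 4  { next }          # blank or truncated lines
    $(NF-1) != "YES" { print "not viable: " $0; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage:
#   markitect infospace disciplines | require_viable_disciplines || exit 1
```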
### 5. Reference discipline entities in your own entities
For each entity in the new infospace, add a `## <Discipline> Concept`
section that names the WoN entity the concept maps to, plus a rationale.
The exact section heading is configured per schema — see
`schemas/won-mapping-schema-v1.0.md` in `supply-chain-vsm` for the
template used there.
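Before running checks, it helps to know which entities still lack the section. The sketch below uses `grep -L`, which lists files with no matching line. The `output/entities/` layout follows the WoN example and the `## WoN Concept` heading follows `supply-chain-vsm`. Both are assumptions for a new infospace, so substitute whatever your schema specifies. `missing_concept_section` is a hypothetical helper.

```bash
# Hypothetical helper: lists entity files that lack the discipline heading.
#   $1: entity directory (default: output/entities)
#   $2: exact heading line (default: "## WoN Concept")
missing_concept_section() {
  local dir="${1:-output/entities}" heading="${2:-## WoN Concept}"
  # -L lists files with no matching line; -F and -x match the heading
  # literally and as a whole line. "|| true" because grep's exit status
  # with -L differs between versions and carries no meaning here.
  grep -L -F -x -- "$heading" "$dir"/*.md || true
}

# Usage: missing_concept_section output/entities '## WoN Concept'
```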
### 6. Run checks and evaluate
```bash
markitect infospace check
markitect infospace evaluate --provider openrouter
markitect infospace eval-summary --update-metrics
markitect infospace viability
```
The new infospace passes or fails viability independently of WoN.
---
## Why composition, not inclusion?
An alternative would be to copy WoN entities directly into the target
infospace. Composition avoids that by design:
- **One source of truth** — if WoN is refined, every infospace that binds
it picks up the improvement on the next run without a sync step.
- **Separation of concerns** — each infospace owns its own schema,
thresholds, and entity set. Changing the target topic cannot pollute
the discipline.
- **Bounded dependency** — the binding is a path, so the coupling is
visible in one place (`infospace.yaml`) and easy to remove.
---
## See also
- `examples/supply-chain-vsm/README.md` — the full reference composition.
- `examples/supply-chain-vsm/output/mappings/` — consolidated mapping
files showing the rationale and strength rating for each pairing.
- `examples/infospace-with-history/docs/advanced-usage.md` — patterns for
maintaining the discipline once it is in use.

examples/infospace-with-history/docs/advanced-usage.md

@@ -0,0 +1,179 @@
# Advanced Usage — Wealth of Nations Infospace
Patterns for working with the WoN infospace (988 entities) after the initial
pipeline run. Every command in this file has been run against the actual
infospace at the time of writing (2026-04-21); output shapes are excerpted
verbatim.
All commands assume `cwd = examples/infospace-with-history` and the
`markitect-venv` Python environment.
---
## 1. Incremental evaluation — add entities after the initial run
`markitect infospace evaluate` writes one file per entity under
`output/evaluations/<slug>.md`. It skips any entity whose evaluation file
already exists, so re-running after adding a new entity processes only the
new one.
```bash
# Add a new entity file
vim output/entities/new-concept.md
# Evaluate only the new entity (explicit)
markitect infospace evaluate --entity new-concept --provider openrouter
# Or re-run the whole pass — existing 988 are skipped, only the new file hits the LLM
markitect infospace evaluate --provider openrouter
```
**How skip detection works.** Evaluation filenames use underscores where
entity slugs use hyphens, and an apostrophe that the entity slug drops is
kept as `_s_` (the `farmers-capital` entity is evaluated as
`farmer_s_capital.md`). If a new entity's slug normalises to the name of an
existing evaluation file, that entity is skipped.
To confirm that every entity was picked up, compare the counts:
```bash
# Count entities vs evaluations
ls output/entities/*.md | grep -Ev 'book-[0-9]+-(chapter-[0-9]+|introduction)-' | wc -l
ls output/evaluations/*.md | wc -l
```
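The counts only tell you that something is missing, not what. The sketch below compares entities and evaluations one slug at a time. It assumes, from the examples in this file, that an evaluation filename is the entity slug with hyphens replaced by underscores. `missing_evaluations` is a hypothetical helper, not a markitect command.

```bash
# Hypothetical helper: prints entity slugs with no matching evaluation file,
# assuming evaluation name = slug with hyphens replaced by underscores.
# Possessive entities (farmers-capital -> farmer_s_capital.md) will appear
# as false positives, so inspect the list before acting on it.
missing_evaluations() {
  local entities="${1:-output/entities}" evals="${2:-output/evaluations}"
  local f slug eval_name
  for f in "$entities"/*.md; do
    [ -e "$f" ] || continue
    slug=$(basename "$f" .md)
    # Skip per-chapter listing files, mirroring the grep filter above.
    case $slug in
      book-[0-9]*-chapter-[0-9]*-*|book-[0-9]*-introduction-*) continue ;;
    esac
    eval_name=$(printf '%s' "$slug" | sed 's/-/_/g')
    [ -e "$evals/$eval_name.md" ] || printf '%s\n' "$slug"
  done
}

# Usage: missing_evaluations output/entities output/evaluations
```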
---
## 2. Re-evaluating after guideline changes
`evaluate` has no `--force` flag; re-evaluation requires deleting the
existing file first.
```bash
# Re-evaluate a single entity after updating the evaluation rubric
rm output/evaluations/accumulation_of_stock.md
markitect infospace evaluate --entity accumulation-of-stock --provider openrouter
# Re-evaluate a whole chapter: first list the entities the chapter produced
cat output/entities/book-1-chapter-06-entities.md
# then map each slug to its evaluation filename (hyphens -> underscores,
# apostrophes -> _s_), rm those files, and re-run evaluate
```
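Once the chapter's slugs are in a plain list, the removal step can be scripted. This sketch assumes one slug per line on stdin and the hyphen-to-underscore rule seen in `accumulation-of-stock` → `accumulation_of_stock.md`. Slugs whose evaluation name encodes an apostrophe (`farmer_s_capital.md`) are reported as missing rather than guessed. `remove_evaluations` and the `DRY_RUN` variable are hypothetical.

```bash
# Hypothetical helper: reads entity slugs (one per line) on stdin and deletes
# the matching evaluation files so the next `evaluate` run regenerates them.
# Set DRY_RUN=1 to print the files instead of deleting them.
remove_evaluations() {
  local evals="${1:-output/evaluations}" slug file
  while IFS= read -r slug; do
    [ -n "$slug" ] || continue
    file="$evals/$(printf '%s' "$slug" | sed 's/-/_/g').md"
    if [ ! -e "$file" ]; then
      printf 'no evaluation for %s (check apostrophe naming)\n' "$slug" >&2
    elif [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "$file"
    else
      rm -- "$file" && printf 'removed %s\n' "$file"
    fi
  done
}

# Usage (chapter-06-slugs.txt is a hand-made, hypothetical list):
#   DRY_RUN=1 remove_evaluations output/evaluations < chapter-06-slugs.txt
#   remove_evaluations output/evaluations < chapter-06-slugs.txt
#   markitect infospace evaluate --provider openrouter
```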
After re-evaluating, refresh the aggregate:
```bash
markitect infospace eval-summary --update-metrics
```
This merges `per_entity_mean` into `output/metrics/metrics.yaml` so the next
`markitect infospace viability` check reflects the new scores.
---
## 3. Interpreting per-entity score distributions
`eval-summary` shows the overall mean, the mean for each of the five
evaluation dimensions, and the overall score range:
```
$ markitect infospace eval-summary
Evaluation summary — 985 entities evaluated
Dimension Mean
--------------------------------------
overall 3.956
definition_precision 3.620
domain_placement 4.559
explanatory_value 3.936
source_grounding 4.358
vsm_relevance 3.305
Range: 1.00 – 4.80
```
Interpretation:
- `overall` above the 3.5 viability threshold → the collection passes
`per_entity_mean`.
- The lowest dimension (`vsm_relevance` = 3.305) is the weakest signal. If
the collection is meant to be VSM-grounded, this is the dimension most
worth improving (via sharper entity definitions or schema changes).
- A wide range (1.00–4.80) means there are outliers at both ends, which are
worth triaging (see pattern 4).
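The mean and range hide the shape of the distribution. This sketch bins every `overall_score` into half-point buckets, relying on the `overall_score:` frontmatter line that pattern 4 also uses. `score_histogram` is a hypothetical helper.

```bash
# Hypothetical helper: counts evaluations per half-point bin of overall_score.
score_histogram() {
  local evals="${1:-output/evaluations}"
  awk '
    FNR == 1 { seen = 0 }                    # new file: reset the per-file flag
    /^overall_score:/ && !seen {
      seen = 1
      bin = int($2 * 2) / 2                  # floor to the nearest 0.5
      count[bin]++
    }
    END { for (b in count) printf "%.1f\t%d\n", b, count[b] }
  ' "$evals"/*.md | sort -n
}

# Usage: score_histogram output/evaluations
```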
---
## 4. Triaging low scorers
`markitect infospace entities --by-type` prints each entity's star score
in-line:
```
$ markitect infospace entities --by-type | head
=== Element (315 entities) ===
active_and_productive_stock Accumulation S1 ★4.6
advanced_state_of_society General Theory S5
agio_of_bank_money Exchange S2 ★4.8
```
Entities with no `★` have no evaluation yet. To list the lowest-scoring
entities across the whole collection:
```bash
# Extract overall_score from every evaluation file and sort ascending
for f in output/evaluations/*.md; do
  score=$(awk '/^overall_score:/ {print $2; exit}' "$f")
  printf "%s\t%s\n" "$score" "$(basename "$f" .md)"
done | sort -n | head -20
```
The 20 lowest scorers are the natural triage list — inspect their
`output/entities/<slug>.md` and evaluation rationales to decide whether to
refine the entity, merge it with a better-formed neighbour, or drop it.
---
## 5. Reading and acting on collection-check output
`markitect infospace check` runs five concerns (C1–C5). Use `--concern` to
focus on one and `--json` for machine-readable output:
```bash
# Redundancy — which pairs of entities are suspiciously similar?
markitect infospace check --concern redundancy --json
```
```json
{
  "redundancy": {
    "concern": "C1",
    "redundancy_ratio": 0.0061,
    "similar_pairs": [
      {"entity_a": "bank_economic_contribution_metrics",
       "entity_b": "bank_economic_development_metrics",
       "similarity": 1.0, "method": "word_overlap"},
      {"entity_a": "economic_system_objectives",
       "entity_b": "economic_system_purpose",
       "similarity": 0.9394, "method": "word_overlap"}
    ]
  }
}
```
Acting on this:
- **Similarity = 1.0** is almost certainly a duplicate — pick one slug and
merge or delete the other.
- **0.85–0.99** usually means two entities genuinely cover the same idea
with slight phrasing differences. Merging is the cleanest fix.
- **< 0.85** usually represents legitimate adjacent concepts — leave as-is
unless the definition rubric says otherwise.
For coverage and coherence, the pattern is the same: the `--json` output
surfaces the specific entities / missing links / disconnected components
you need to look at, rather than a bare ratio.
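The three bands for redundancy can be applied mechanically to the `--json` output. This sketch uses `jq` and assumes it is installed and that the field names match the excerpt above (`similar_pairs`, `similarity`, `entity_a`, `entity_b`). `triage_pairs` is a hypothetical helper, and the bucket labels are illustrative.

```bash
# Hypothetical helper: reads `check --concern redundancy --json` on stdin and
# prints one tab-separated line per pair: bucket, similarity, entity_a, entity_b.
triage_pairs() {
  jq -r '
    .redundancy.similar_pairs[]
    | (if .similarity >= 1.0 then "duplicate"
       elif .similarity >= 0.85 then "merge-candidate"
       else "adjacent" end) as $bucket
    | [$bucket, (.similarity | tostring), .entity_a, .entity_b]
    | @tsv
  '
}

# Usage:
#   markitect infospace check --concern redundancy --json | triage_pairs
```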
---
## See also
- `METRICS-METHODOLOGY.md` — how each metric is computed.
- `docs/composition-guide.md` — using this infospace as a discipline for a
different domain.
- `docs/performance-notes.md` — observed timings and provider choices.

examples/infospace-with-history/docs/performance-notes.md

@@ -0,0 +1,106 @@
# Performance Notes — Wealth of Nations Infospace
Observed timings, file sizes, and provider choices from the 988-entity WoN
example. These are **operational notes**, not a benchmark — numbers come
from the actual S3.3 evaluation run (2026-02-23) rather than a controlled
experiment.
---
## Evaluation batch duration
The initial evaluation pass produced 985 `output/evaluations/*.md` files:
- First `evaluated_at`: `2026-02-23T00:11:52`
- Last `evaluated_at`: `2026-02-23T06:39:45`
- **Total wall time: ~6h 28m**
- **Effective throughput: ~2.5 entities/min** (~152 entities/hour)
Extracted from evaluation frontmatter:
```bash
grep -h '^evaluated_at:' output/evaluations/*.md | sort | sed -n '1p;$p'
```
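The same frontmatter yields the wall time and rate directly. This sketch assumes the `YYYY-MM-DDTHH:MM:SS` form shown above and a batch that stays within one calendar day, which holds for S3.3 but not in general. `eval_throughput` is a hypothetical helper.

```bash
# Hypothetical helper: reads "evaluated_at: ..." lines on stdin and prints the
# entity count, the span in minutes, and the mean rate in entities/min.
eval_throughput() {
  awk '
    /^evaluated_at:/ {
      split($2, dt, "T")              # dt[1] = date, dt[2] = time
      split(dt[2], t, ":")            # t[1] = HH, t[2] = MM, t[3] = SS
      s = t[1] * 3600 + t[2] * 60 + t[3]
      if (n == 0 || s < lo) lo = s
      if (n == 0 || s > hi) hi = s
      n++
    }
    END {
      if (n < 2 || hi == lo) { print "need at least two distinct timestamps"; exit 1 }
      mins = (hi - lo) / 60
      printf "entities=%d span_min=%.1f rate_per_min=%.2f\n", n, mins, n / mins
    }
  '
}

# Usage: grep -h '^evaluated_at:' output/evaluations/*.md | eval_throughput
```

For the S3.3 stamps (00:11:52 to 06:39:45, 985 entities), the arithmetic gives about 387.9 minutes and 2.54 entities/min, consistent with the figures above.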
Caveats:
- This was against OpenRouter's free tier, which applies implicit
rate-limiting and occasional retries.
- Throughput is not constant — gaps between bursts show up as plateaus
when you plot the timestamps.
- The batch was not fully parallelised; a tuned concurrent client on a paid
OpenRouter tier could likely raise this throughput 2–4×.
---
## Tokens per entity (estimate)
Direct token counts are not logged in the evaluation files, but the
inputs and outputs are on disk:
- **Input per request**: evaluation schema (~3.7 KB) + entity file
(~0.7 KB median) + fixed system prompt ≈ **~1500–2500 tokens in**
- **Output per request**: structured evaluation with 5 dimensions and
rationales, median eval file 3.6 KB ≈ **~600–800 tokens out**
- **Round-trip total**: **~2000–3000 tokens per entity**
- **Batch total estimate**: 985 entities × ~2500 tokens ≈ **~2.5M tokens**
for the full pass
The constant per-entity input means the cheapest way to reduce spend on a
re-run is to narrow the targeted entities (`--entity <slug>` or
`--chapter <n>`), not to shorten the schema.
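The size figures above can be re-derived after the schema or prompt changes. This sketch prints the median file size in bytes for a directory of Markdown files. Converting bytes to tokens depends on the model's tokenizer, so that step is deliberately left out. `median_bytes` is a hypothetical helper.

```bash
# Hypothetical helper: median size in bytes of the *.md files in a directory.
median_bytes() {
  local f
  for f in "$1"/*.md; do
    [ -e "$f" ] && wc -c < "$f"       # one byte count per file
  done | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]              # odd count: middle value
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2     # even count: mean of middle two
    }
  '
}

# Usage: median_bytes output/evaluations; median_bytes output/entities
```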
---
## Embedding cache and collection checks
`markitect infospace check --concern redundancy` supports two similarity
backends (see `markitect/infospace/checks/redundancy.py`):
- **`word_overlap`** — the default, used when no embeddings are provided.
Pure-Python set intersection over tokenised entity text. **No LLM calls,
no cache needed.** This is what the current WoN check runs.
- **`embedding`** — active when a pre-computed `{slug: vector}` mapping is
passed in. No persistent on-disk embedding cache exists today; the
caller is responsible for computing and supplying the vectors.
Implication: the 988-entity `check` runs in seconds because it's all
word-overlap. Switching to embedding similarity would add an embedding
API pass (another ~988 requests) which is currently a manual step
outside the CLI.
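To build intuition for what a word-overlap score measures, here is one plausible formulation: the Jaccard similarity of the two texts' lower-cased word sets. This is an illustration only. `redundancy.py` may tokenise, filter, and normalise differently, so its scores will not match this sketch. `word_overlap` is a hypothetical helper.

```bash
# Hypothetical helper: Jaccard similarity |A ∩ B| / |A ∪ B| of two files'
# lower-cased alphanumeric word sets, printed with four decimals.
word_overlap() {
  local a b inter union
  # Unique lower-cased tokens, one per line.
  a=$(tr -cs '[:alnum:]' '\n' < "$1" | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort -u)
  b=$(tr -cs '[:alnum:]' '\n' < "$2" | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort -u)
  # Each set is already de-duplicated, so a token seen twice is in both sets.
  inter=$(printf '%s\n' "$a" "$b" | sed '/^$/d' | sort | uniq -d | wc -l)
  union=$(printf '%s\n' "$a" "$b" | sed '/^$/d' | sort -u | wc -l)
  awk -v i="$inter" -v u="$union" 'BEGIN { printf "%.4f\n", (u > 0 ? i / u : 0) }'
}

# Usage: word_overlap output/entities/a.md output/entities/b.md
```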
---
## Provider choice — recommendation
For the WoN dataset specifically (text-heavy entities, 5-dimension
rubric):
| Scale | Recommended provider | Rationale |
|-----------------------|----------------------------------|-----------|
| < 50 entities | `gemini/gemini-2.5-flash` | Fast default; free tier is generous enough; consistent with `markitect llm-check` out of the box. |
| 50–1000 entities | `openrouter` with a `:free` model (e.g. `arcee-ai/trinity-large-preview:free`) | What the S3.3 batch used; gets through 988 entities in one overnight run without cost. |
| > 1000 entities | `openrouter` with a paid small-context model, or `openai` | Free-tier rate limits start to dominate wall time; paying for higher concurrency is cheaper than calendar time. |
All providers are accepted by `markitect infospace evaluate --provider`.
The evaluation schema doesn't assume any provider-specific features.
Note on provider mixing: if part of a collection is evaluated under one
provider/model and the rest under another, `per_entity_mean` can drift
slightly (different models calibrate scores differently). For the
viability threshold of 3.5 the drift is usually negligible, but for
fine-grained outlier analysis prefer a single provider per batch.
---
## What is *not* measured here
- **End-to-end pipeline time** (entity extraction from raw chapters,
classification, relation graph) — only the evaluation phase is timed.
- **Memory footprint** — the full in-memory state for 988 entities is
small (< 200 MB observed), but not systematically measured.
- **Failure/retry rates** — the 985 vs 988 gap is three entities the
original run missed (plus one added later); no structured retry log
was kept.
Expanding any of these into a proper benchmark is **out of scope** for
the WoN example and should live alongside a synthetic corpus that can be
regenerated deterministically.