docs(infospace): add advanced-usage, composition guide, and performance notes (C.4/C.5/C.6)
Closes out three docs tasks from roadmap/infospace-s3-closeout/PLAN.md: - examples/infospace-with-history/docs/advanced-usage.md (C.4) — 5 worked patterns covering incremental eval, re-eval workflow (no --force flag exists; documents the rm-then-re-run pattern instead), interpreting the eval-summary distribution, triaging low scorers via an awk pipeline over overall_score (since `entities --sort-by score` does not exist), and acting on check --json output. - docs/composition-guide.md (C.5) — walks through how supply-chain-vsm binds WoN as a discipline, then a step-by-step for creating a new infospace that binds an existing one. Includes live output from `markitect infospace disciplines`. - examples/infospace-with-history/docs/performance-notes.md (C.6) — cites the 6h 28m wall time of the 985-entity S3.3 batch, ~2.5 ent/min rate, ~2000–3000 tokens/entity estimate, word_overlap vs embedding backend for redundancy checks, and a provider-by-scale recommendation table. All commands in these docs were run against the live infospace at commit time. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
203
docs/composition-guide.md
Normal file
203
docs/composition-guide.md
Normal file
@@ -0,0 +1,203 @@
|
|||||||
|
# Infospace Composition Guide
|
||||||
|
|
||||||
|
One completed, viable infospace can be reused as a **discipline** for
|
||||||
|
another infospace — a lens applied to a different topic. This guide
|
||||||
|
explains how composition works and walks through the live
|
||||||
|
`examples/supply-chain-vsm/` reference.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What composition means
|
||||||
|
|
||||||
|
An **infospace** is a directory of typed entities governed by
|
||||||
|
`infospace.yaml`. Its entities and relations describe a specific topic
|
||||||
|
(for example, Adam Smith's *Wealth of Nations*).
|
||||||
|
|
||||||
|
A **discipline** is an infospace declared as a reusable analytical
|
||||||
|
framework by another infospace. When infospace B binds infospace A as a
|
||||||
|
discipline:
|
||||||
|
|
||||||
|
1. B's entities can reference A's entities in `## WoN Concept` (or
|
||||||
|
equivalent) sections.
|
||||||
|
2. Properties A has already computed on its entities — such as VSM system
|
||||||
|
placement — become available to B by transitivity through the mapping.
|
||||||
|
3. B can impose its own viability thresholds independently of A's. The two
|
||||||
|
infospaces each pass or fail viability on their own terms.
|
||||||
|
|
||||||
|
The binding is declarative: a relative path in `infospace.yaml` plus a
|
||||||
|
display name. No code. No import. The discipline is looked up on disk at
|
||||||
|
the declared path when B's commands run.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The viability pre-condition
|
||||||
|
|
||||||
|
Binding a non-viable infospace as a discipline is a mistake: a framework
|
||||||
|
that fails its own thresholds is not a stable reference frame. Before
|
||||||
|
binding, confirm the candidate discipline is viable:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd examples/infospace-with-history
|
||||||
|
markitect infospace viability
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
Metric Value Threshold Status
|
||||||
|
---------------------------------------------------------------
|
||||||
|
redundancy_ratio 0.0061 max=0.1 PASS
|
||||||
|
coverage_ratio 0.6190 min=0.4 PASS
|
||||||
|
coherence_components 0.0000 max=3 PASS
|
||||||
|
consistency_cycles 0.0000 max=0 PASS
|
||||||
|
granularity_entropy 2.6748 min=1.0 PASS
|
||||||
|
per_entity_mean 3.9556 min=3.5 PASS
|
||||||
|
|
||||||
|
Viable: YES (6/6 thresholds met)
|
||||||
|
```
|
||||||
|
|
||||||
|
If the discipline is not viable, fix it first (see
|
||||||
|
`examples/infospace-with-history/docs/advanced-usage.md` §4 for triaging
|
||||||
|
low scorers).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example — how `supply-chain-vsm` binds WoN
|
||||||
|
|
||||||
|
The supply-chain infospace declares WoN as a discipline in its
|
||||||
|
`infospace.yaml`:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
topic:
|
||||||
|
name: "Modern Supply Chain Management"
|
||||||
|
domain: "Operations Management"
|
||||||
|
sources: artifacts/sources/
|
||||||
|
|
||||||
|
disciplines:
|
||||||
|
- name: "Wealth of Nations"
|
||||||
|
path: ../infospace-with-history
|
||||||
|
```
|
||||||
|
|
||||||
|
The binding is a **relative path**, so the two infospaces travel together
|
||||||
|
(they can be moved as a pair without breaking the link).
|
||||||
|
|
||||||
|
Verify the binding resolves and the discipline is viable:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd examples/supply-chain-vsm
|
||||||
|
markitect infospace disciplines
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
Name Entities Viable Path
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
Wealth of Nations 988 YES ../infospace-with-history
|
||||||
|
```
|
||||||
|
|
||||||
|
Each supply-chain entity then carries a `## WoN Concept` section
|
||||||
|
mapping it to exactly one WoN entity. The consolidated mapping files
|
||||||
|
(`output/mappings/*-mappings.md`) record the pairing, rationale, and a
|
||||||
|
conceptual-continuity rating (Strong / Moderate / Weak):
|
||||||
|
|
||||||
|
| Supply Chain Entity | WoN Concept | Strength | VSM |
|
||||||
|
|------------------------------|----------------------------------|----------|-------|
|
||||||
|
| Demand Signal | Effectual Demand | Strong | S2 |
|
||||||
|
| Vendor-Managed Inventory | Division of Labour | Strong | S1/S2 |
|
||||||
|
| Just-in-Time Inventory | Circulating Capital | Strong | S1/S3 |
|
||||||
|
| Bullwhip Effect | Natural Price as Central Price | Moderate | S2 |
|
||||||
|
| Safety Stock | Accumulation of Stock | Moderate | S3 |
|
||||||
|
|
||||||
|
Because each WoN entity already has a VSM system placement (S1–S5), the
|
||||||
|
supply-chain entities inherit a VSM position by transitivity through
|
||||||
|
their mapping — without supply-chain-vsm needing its own VSM reference.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Creating a new infospace that binds an existing one
|
||||||
|
|
||||||
|
Step-by-step, using WoN as the discipline for a hypothetical "Modern
|
||||||
|
Monetary Policy" infospace:
|
||||||
|
|
||||||
|
### 1. Start from the target topic
|
||||||
|
|
||||||
|
```bash
|
||||||
|
mkdir -p examples/monetary-policy/artifacts/sources
|
||||||
|
cd examples/monetary-policy
|
||||||
|
markitect infospace init
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Declare the discipline in `infospace.yaml`
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
topic:
|
||||||
|
name: "Modern Monetary Policy"
|
||||||
|
domain: "Macroeconomics"
|
||||||
|
sources: artifacts/sources/
|
||||||
|
|
||||||
|
disciplines:
|
||||||
|
- name: "Wealth of Nations"
|
||||||
|
path: ../infospace-with-history
|
||||||
|
```
|
||||||
|
|
||||||
|
Alternatively, bind imperatively after `init`:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
markitect infospace bind-discipline ../infospace-with-history --name "Wealth of Nations"
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Set your own viability thresholds
|
||||||
|
|
||||||
|
Copy the `viability:` block from a reference infospace and tune the
|
||||||
|
numbers to the scale and maturity of your topic. A smaller infospace
|
||||||
|
(50 entities, not 988) may need laxer `coverage_ratio` and stricter
|
||||||
|
`redundancy_ratio`.
|
||||||
|
|
||||||
|
### 4. Verify the binding
|
||||||
|
|
||||||
|
```bash
|
||||||
|
markitect infospace disciplines
|
||||||
|
```
|
||||||
|
|
||||||
|
If `Viable` is `NO`, stop and fix the discipline before continuing.
|
||||||
|
|
||||||
|
### 5. Reference discipline entities in your own entities
|
||||||
|
|
||||||
|
For each entity in the new infospace, add a `## <Discipline> Concept`
|
||||||
|
section that names the WoN entity the concept maps to, plus a rationale.
|
||||||
|
The exact section heading is configured per schema — see
|
||||||
|
`schemas/won-mapping-schema-v1.0.md` in `supply-chain-vsm` for the
|
||||||
|
template used there.
|
||||||
|
|
||||||
|
### 6. Run checks and evaluate
|
||||||
|
|
||||||
|
```bash
|
||||||
|
markitect infospace check
|
||||||
|
markitect infospace evaluate --provider openrouter
|
||||||
|
markitect infospace eval-summary --update-metrics
|
||||||
|
markitect infospace viability
|
||||||
|
```
|
||||||
|
|
||||||
|
The new infospace passes or fails viability independently of WoN.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Why composition, not inclusion?
|
||||||
|
|
||||||
|
An alternative would be to copy WoN entities directly into the target
|
||||||
|
infospace. Composition avoids that by design:
|
||||||
|
|
||||||
|
- **One source of truth** — if WoN is refined, every infospace that binds
|
||||||
|
it picks up the improvement on the next run without a sync step.
|
||||||
|
- **Separation of concerns** — each infospace owns its own schema,
|
||||||
|
thresholds, and entity set. Changing the target topic cannot pollute
|
||||||
|
the discipline.
|
||||||
|
- **Bounded dependency** — the binding is a path, so the coupling is
|
||||||
|
visible in one place (`infospace.yaml`) and easy to remove.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## See also
|
||||||
|
|
||||||
|
- `examples/supply-chain-vsm/README.md` — the full reference composition.
|
||||||
|
- `examples/supply-chain-vsm/output/mappings/` — consolidated mapping
|
||||||
|
files showing the rationale and strength rating for each pairing.
|
||||||
|
- `examples/infospace-with-history/docs/advanced-usage.md` — patterns for
|
||||||
|
maintaining the discipline once it is in use.
|
||||||
179
examples/infospace-with-history/docs/advanced-usage.md
Normal file
179
examples/infospace-with-history/docs/advanced-usage.md
Normal file
@@ -0,0 +1,179 @@
|
|||||||
|
# Advanced Usage — Wealth of Nations Infospace
|
||||||
|
|
||||||
|
Patterns for working with the WoN infospace (988 entities) after the initial
|
||||||
|
pipeline run. Every command in this file has been run against the actual
|
||||||
|
infospace at the time of writing (2026-04-21); output shapes are excerpted
|
||||||
|
verbatim.
|
||||||
|
|
||||||
|
All commands assume `cwd = examples/infospace-with-history` and the
|
||||||
|
`markitect-venv` Python environment.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Incremental evaluation — add entities after the initial run
|
||||||
|
|
||||||
|
`markitect infospace evaluate` writes one file per entity under
|
||||||
|
`output/evaluations/<slug>.md`. It skips any entity whose evaluation file
|
||||||
|
already exists, so re-running after adding a new entity processes only the
|
||||||
|
new one.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Add a new entity file
|
||||||
|
vim output/entities/new-concept.md
|
||||||
|
|
||||||
|
# Evaluate only the new entity (explicit)
|
||||||
|
markitect infospace evaluate --entity new-concept --provider openrouter
|
||||||
|
|
||||||
|
# Or re-run the whole pass — existing 988 are skipped, only the new file hits the LLM
|
||||||
|
markitect infospace evaluate --provider openrouter
|
||||||
|
```
|
||||||
|
|
||||||
|
**How skip detection works.** Evaluation slugs are normalised to underscores
|
||||||
|
with `_s_` preserving apostrophes (`farmers-capital` entity →
|
||||||
|
`farmer_s_capital.md` evaluation). If a new entity slug collides with an
|
||||||
|
existing evaluation under this normalisation, the eval will be skipped.
|
||||||
|
To be sure an entity was picked up, check:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Count entities vs evaluations
|
||||||
|
ls output/entities/*.md | grep -Ev 'book-[0-9]+-(chapter-[0-9]+|introduction)-' | wc -l
|
||||||
|
ls output/evaluations/*.md | wc -l
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Re-evaluating after guideline changes
|
||||||
|
|
||||||
|
`evaluate` has no `--force` flag; re-evaluation requires deleting the
|
||||||
|
existing file first.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Re-evaluate a single entity after updating the evaluation rubric
|
||||||
|
rm output/evaluations/accumulation_of_stock.md
|
||||||
|
markitect infospace evaluate --entity accumulation-of-stock --provider openrouter
|
||||||
|
|
||||||
|
# Re-evaluate a whole chapter
|
||||||
|
ls output/entities/book-1-chapter-06-entities.md # see which entities the chapter produced
|
||||||
|
# Map chapter entities to eval filenames (apostrophe/underscore normalisation) and rm them
|
||||||
|
```
|
||||||
|
|
||||||
|
After re-evaluating, refresh the aggregate:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
markitect infospace eval-summary --update-metrics
|
||||||
|
```
|
||||||
|
|
||||||
|
This merges `per_entity_mean` into `output/metrics/metrics.yaml` so the next
|
||||||
|
`markitect infospace viability` check reflects the new scores.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. Interpreting per-entity score distributions
|
||||||
|
|
||||||
|
`eval-summary` shows the mean for each of the five evaluation dimensions
|
||||||
|
plus the overall range:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ markitect infospace eval-summary
|
||||||
|
Evaluation summary — 985 entities evaluated
|
||||||
|
|
||||||
|
Dimension Mean
|
||||||
|
--------------------------------------
|
||||||
|
overall 3.956
|
||||||
|
definition_precision 3.620
|
||||||
|
domain_placement 4.559
|
||||||
|
explanatory_value 3.936
|
||||||
|
source_grounding 4.358
|
||||||
|
vsm_relevance 3.305
|
||||||
|
|
||||||
|
Range: 1.00 – 4.80
|
||||||
|
```
|
||||||
|
|
||||||
|
Interpretation:
|
||||||
|
- `overall` above the 3.5 viability threshold → the collection passes
|
||||||
|
`per_entity_mean`.
|
||||||
|
- The lowest dimension (`vsm_relevance` = 3.305) is the weakest signal. If
|
||||||
|
the collection is meant to be VSM-grounded, this is the dimension most
|
||||||
|
worth improving (via sharper entity definitions or schema changes).
|
||||||
|
- A wide range (1.00 – 4.80) tells you there are outliers at both ends —
|
||||||
|
worth triaging (see pattern 4).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Triaging low scorers
|
||||||
|
|
||||||
|
`markitect infospace entities --by-type` prints each entity's star score
|
||||||
|
in-line:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ markitect infospace entities --by-type | head
|
||||||
|
=== Element (315 entities) ===
|
||||||
|
active_and_productive_stock Accumulation S1 ★4.6
|
||||||
|
advanced_state_of_society General Theory S5
|
||||||
|
agio_of_bank_money Exchange S2 ★4.8
|
||||||
|
```
|
||||||
|
|
||||||
|
Entities with no `★` have no evaluation yet. To list the lowest-scoring
|
||||||
|
entities across the whole collection:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Extract overall_score from every evaluation file and sort ascending
|
||||||
|
for f in output/evaluations/*.md; do
|
||||||
|
score=$(awk '/^overall_score:/ {print $2; exit}' "$f")
|
||||||
|
printf "%s\t%s\n" "$score" "$(basename "$f" .md)"
|
||||||
|
done | sort -n | head -20
|
||||||
|
```
|
||||||
|
|
||||||
|
The 20 lowest scorers are the natural triage list — inspect their
|
||||||
|
`output/entities/<slug>.md` and evaluation rationales to decide whether to
|
||||||
|
refine the entity, merge it with a better-formed neighbour, or drop it.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Reading and acting on collection-check output
|
||||||
|
|
||||||
|
`markitect infospace check` runs five concerns (C1–C5). Use `--concern` to
|
||||||
|
focus on one and `--json` for machine-readable output:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Redundancy — which pairs of entities are suspiciously similar?
|
||||||
|
markitect infospace check --concern redundancy --json
|
||||||
|
```
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"redundancy": {
|
||||||
|
"concern": "C1",
|
||||||
|
"redundancy_ratio": 0.0061,
|
||||||
|
"similar_pairs": [
|
||||||
|
{"entity_a": "bank_economic_contribution_metrics",
|
||||||
|
"entity_b": "bank_economic_development_metrics",
|
||||||
|
"similarity": 1.0, "method": "word_overlap"},
|
||||||
|
{"entity_a": "economic_system_objectives",
|
||||||
|
"entity_b": "economic_system_purpose",
|
||||||
|
"similarity": 0.9394, "method": "word_overlap"}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Acting on this:
|
||||||
|
- **Similarity = 1.0** is almost certainly a duplicate — pick one slug and
|
||||||
|
merge or delete the other.
|
||||||
|
- **0.85–0.99** usually means two entities genuinely cover the same idea
|
||||||
|
with slight phrasing differences. Merging is the cleanest fix.
|
||||||
|
- **< 0.85** usually represents legitimate adjacent concepts — leave as-is
|
||||||
|
unless the definition rubric says otherwise.
|
||||||
|
|
||||||
|
For coverage and coherence, the pattern is the same: the `--json` output
|
||||||
|
surfaces the specific entities / missing links / disconnected components
|
||||||
|
you need to look at, rather than a bare ratio.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## See also
|
||||||
|
|
||||||
|
- `METRICS-METHODOLOGY.md` — how each metric is computed.
|
||||||
|
- `docs/composition-guide.md` — using this infospace as a discipline for a
|
||||||
|
different domain.
|
||||||
|
- `docs/performance-notes.md` — observed timings and provider choices.
|
||||||
106
examples/infospace-with-history/docs/performance-notes.md
Normal file
106
examples/infospace-with-history/docs/performance-notes.md
Normal file
@@ -0,0 +1,106 @@
|
|||||||
|
# Performance Notes — Wealth of Nations Infospace
|
||||||
|
|
||||||
|
Observed timings, file sizes, and provider choices from the 988-entity WoN
|
||||||
|
example. These are **operational notes**, not a benchmark — numbers come
|
||||||
|
from the actual S3.3 evaluation run (2026-02-23) rather than a controlled
|
||||||
|
experiment.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Evaluation batch duration
|
||||||
|
|
||||||
|
The initial evaluation pass produced 985 `output/evaluations/*.md` files:
|
||||||
|
|
||||||
|
- First `evaluated_at`: `2026-02-23T00:11:52`
|
||||||
|
- Last `evaluated_at`: `2026-02-23T06:39:45`
|
||||||
|
- **Total wall time: ~6h 28m**
|
||||||
|
- **Effective throughput: ~2.5 entities/min** (~152 entities/hour)
|
||||||
|
|
||||||
|
Extracted from evaluation frontmatter:
|
||||||
|
```bash
|
||||||
|
grep -h '^evaluated_at:' output/evaluations/*.md | sort | sed -n '1p;$p'
|
||||||
|
```
|
||||||
|
|
||||||
|
Caveats:
|
||||||
|
- This was against OpenRouter's free tier, which applies implicit
|
||||||
|
rate-limiting and occasional retries.
|
||||||
|
- Throughput is not constant — gaps between bursts show up as plateaus
|
||||||
|
when you plot the timestamps.
|
||||||
|
- The batch was not fully parallelised; a tuned concurrent client could
|
||||||
|
likely 2–4× this throughput on a paid OpenRouter tier.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Tokens per entity (estimate)
|
||||||
|
|
||||||
|
Direct token counts are not logged in the evaluation files, but the
|
||||||
|
inputs and outputs are on disk:
|
||||||
|
|
||||||
|
- **Input per request**: evaluation schema (~3.7 KB) + entity file
|
||||||
|
(~0.7 KB median) + fixed system prompt ≈ **~1500–2500 tokens in**
|
||||||
|
- **Output per request**: structured evaluation with 5 dimensions and
|
||||||
|
rationales, median eval file 3.6 KB ≈ **~600–800 tokens out**
|
||||||
|
- **Round-trip total**: **~2000–3000 tokens per entity**
|
||||||
|
- **Batch total estimate**: 985 entities × ~2500 tokens ≈ **~2.5M tokens**
|
||||||
|
for the full pass
|
||||||
|
|
||||||
|
The constant per-entity input means the cheapest way to reduce spend on a
|
||||||
|
re-run is to narrow the targeted entities (`--entity <slug>` or
|
||||||
|
`--chapter <n>`), not to shorten the schema.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Embedding cache and collection checks
|
||||||
|
|
||||||
|
`markitect infospace check --concern redundancy` supports two similarity
|
||||||
|
backends (see `markitect/infospace/checks/redundancy.py`):
|
||||||
|
|
||||||
|
- **`word_overlap`** — the default, used when no embeddings are provided.
|
||||||
|
Pure-Python set intersection over tokenised entity text. **No LLM calls,
|
||||||
|
no cache needed.** This is what the current WoN check runs.
|
||||||
|
- **`embedding`** — active when a pre-computed `{slug: vector}` mapping is
|
||||||
|
passed in. No persistent on-disk embedding cache exists today; the
|
||||||
|
caller is responsible for computing and supplying the vectors.
|
||||||
|
|
||||||
|
Implication: the 988-entity `check` runs in seconds because it's all
|
||||||
|
word-overlap. Switching to embedding similarity would add an embedding
|
||||||
|
API pass (another ~988 requests) which is currently a manual step
|
||||||
|
outside the CLI.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Provider choice — recommendation
|
||||||
|
|
||||||
|
For the WoN dataset specifically (text-heavy entities, 5-dimension
|
||||||
|
rubric):
|
||||||
|
|
||||||
|
| Scale | Recommended provider | Rationale |
|
||||||
|
|-----------------------|----------------------------------|-----------|
|
||||||
|
| < 50 entities | `gemini/gemini-2.5-flash` | Fast default; free tier is generous enough; consistent with `markitect llm-check` out of the box. |
|
||||||
|
| 50 – 1000 entities | `openrouter` with a `:free` model (e.g. `arcee-ai/trinity-large-preview:free`) | What the S3.3 batch used; gets through 988 entities in one overnight run without cost. |
|
||||||
|
| > 1000 entities | `openrouter` with a paid small-context model, or `openai` | Free-tier rate limits start to dominate wall time; paying for higher concurrency is cheaper than calendar time. |
|
||||||
|
|
||||||
|
All providers are accepted by `markitect infospace evaluate --provider`.
|
||||||
|
The evaluation schema doesn't assume any provider-specific features.
|
||||||
|
|
||||||
|
Note on provider mixing: if part of a collection is evaluated under one
|
||||||
|
provider/model and the rest under another, `per_entity_mean` can drift
|
||||||
|
slightly (different models calibrate scores differently). For the
|
||||||
|
viability threshold of 3.5 the drift is usually negligible, but for
|
||||||
|
fine-grained outlier analysis prefer a single provider per batch.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What is *not* measured here
|
||||||
|
|
||||||
|
- **End-to-end pipeline time** (entity extraction from raw chapters,
|
||||||
|
classification, relation graph) — only the evaluation phase is timed.
|
||||||
|
- **Memory footprint** — the full in-memory state for 988 entities is
|
||||||
|
small (< 200 MB observed), but not systematically measured.
|
||||||
|
- **Failure/retry rates** — the 985 vs 988 gap is three entities the
|
||||||
|
original run missed (plus one added later); no structured retry log
|
||||||
|
was kept.
|
||||||
|
|
||||||
|
Expanding any of these into a proper benchmark is **out of scope** for
|
||||||
|
the WoN example and should live alongside a synthetic corpus that can be
|
||||||
|
regenerated deterministically.
|
||||||
Reference in New Issue
Block a user