feat(example): add baseline metrics snapshot from collection checks run

Initial metrics from S2.4 checks on 85 entities (7 of 35 chapters): coverage_ratio=0.361, redundancy=0.0, coherence_components=0.0, consistency_cycles=0.0, granularity_entropy=2.69

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(example): migrate to infospace config with tooling integration (S3.1)
2026-02-19 07:44:01 +01:00 · 2026-02-19 02:29:53 +01:00 · 2026-02-19 02:05:09 +01:00 · 2026-02-19 02:03:54 +01:00 · 2026-02-19 02:01:00 +01:00 · 2026-02-19 01:54:22 +01:00
62 changed files with 11252 additions and 7 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -78,6 +78,7 @@ Thumbs.db

 # MarkiTect database files (local development)
 markitect.db
+**/infospace.db
 assets/assets.db
 **/assets.db
 .markitect/
--- a/docs/infospace-primitives.md
+++ b/docs/infospace-primitives.md
@@ -0,0 +1,344 @@
+# Infospace Primitives Reference
+
+This document describes the primitives provided by the `markitect/infospace/`
+package for creating, evaluating, maintaining, and composing infospaces.
+
+---
+
+## Core Concepts
+
+An **infospace** is a structured, evaluable, composable collection of
+entities that explains a **topic** through the lens of one or more
+**disciplines**.
+
+| Term | Meaning |
+|------|---------|
+| **Topic** | The subject matter being explained |
+| **Discipline** | A reusable framework of concepts applied as an analytical lens |
+| **Entity** | The atomic unit of knowledge — slug, definition, provenance, domain |
+| **Evaluation** | Per-entity or collection-level quality assessment |
+| **Viability** | Whether an infospace meets its threshold scores |
+
+---
+
+## Configuration (`infospace.yaml`)
+
+Every infospace is declared via an `infospace.yaml` file. The configuration
+model is defined in `markitect/infospace/config.py`.
+
+### Minimal example
+
+```yaml
+topic:
+  name: "The Wealth of Nations"
+  domain: "Classical Economics"
+  sources: artifacts/sources/
+
+disciplines:
+  - name: "Viable System Model"
+    path: artifacts/vsm-reference/
+
+schemas:
+  entity: schemas/economic-entity-schema-v1.0.md
+
+viability:
+  coverage_ratio: { min: 0.60 }
+  redundancy_ratio: { max: 0.05 }
+  per_entity_mean: { min: 3.5 }
+```
+
+### Key models
+
+- **`TopicConfig`** — `name`, `domain`, `sources`
+- **`DisciplineBinding`** — `name`, `path` (to another infospace directory)
+- **`SchemaRegistry`** — `entity`, `mapping`, `analysis` schema paths
+- **`ViabilityThreshold`** — `metric`, `min`, `max` bounds
+- **`PipelineConfig`** — Ordered list of `PipelineStage` entries
+- **`InfospaceConfig`** — Top-level config combining all of the above
+
+### Default directories
+
+| Setting | Default |
+|---------|---------|
+| `entities_dir` | `output/entities` |
+| `evaluations_dir` | `output/evaluations` |
+| `metrics_dir` | `output/metrics` |
+
+---
+
+## Entity Metadata
+
+Entities are parsed from markdown files by `markitect/infospace/entity_parser.py`.
+
+**`EntityMeta`** fields: `slug`, `title`, `definition`, `domain`,
+`source_chapter`, `context`, `original_wording`, `modern_interpretation`,
+`definition_word_count`, `total_word_count`, `section_slugs`.
+
+```python
+from pathlib import Path
+from markitect.infospace import parse_entity_directory
+entities = parse_entity_directory(Path("output/entities"))
+```
+
+---
+
+## Schema Validation
+
+Deterministic validation of entity files against structural schemas.
+
+```python
+from markitect.infospace import validate_entity, ECONOMIC_ENTITY_SCHEMA
+result = validate_entity(entity_meta, schema=ECONOMIC_ENTITY_SCHEMA)
+print(result.summary())
+```
+
+Checks: section presence, word count ranges, heading format, enum values.
+
+---
+
+## Per-entity Evaluation
+
+LLM-based quality assessment of individual entities. Defined in
+`markitect/infospace/evaluate.py`.
+
+```bash
+# Evaluate all entities
+markitect infospace evaluate --provider openrouter
+
+# Single entity
+markitect infospace evaluate --entity division-of-labour --provider openrouter
+```
+
+### Pipeline functions
+
+- `build_evaluation_prompt(entity, topic, dimensions)` — build the LLM prompt
+- `parse_evaluation_response(text, dimensions)` — parse LLM output to `ScoreEntry` list
+- `run_entity_evaluation(config, entities, adapter, ...)` — full batch pipeline
+
+Results are written to `output/evaluations/` as YAML frontmatter + markdown.
+
+---
+
+## Collection-level Checks
+
+Five concerns are assessed at the collection level. Each has a dedicated
+module in `markitect/infospace/checks/`.
+
+| Concern | Module | Key metric |
+|---------|--------|------------|
+| **C1 — Redundancy** | `redundancy.py` | `redundancy_ratio` |
+| **C2 — Coverage** | `coverage.py` | `coverage_ratio` |
+| **C3 — Coherence** | `coherence.py` | `coherence_components`, `modularity` |
+| **C4 — Consistency** | `consistency.py` | `consistency_cycles` |
+| **C5 — Granularity** | `granularity.py` | `granularity_entropy` |
+
+### Orchestrator
+
+```python
+from markitect.infospace.checks import run_all_checks
+report = run_all_checks(entities, embeddings=emb, graph=g)
+metrics = report.metrics()  # Dict[str, float]
+```
+
+### CLI
+
+```bash
+# Run all checks
+markitect infospace check
+
+# Run specific concerns
+markitect infospace check --concern redundancy --concern coverage
+
+# JSON output
+markitect infospace check --json
+```
+
+After each check run, metrics are automatically recorded to history.
+
+---
+
+## Metrics History
+
+Timestamped snapshots track metrics over time. Defined in
+`markitect/infospace/history.py`.
+
+```bash
+# Show history
+markitect infospace history
+
+# Trend for a single metric
+markitect infospace history --metric coverage_ratio
+
+# Compare two snapshots
+markitect infospace history-diff 2026-02-01 2026-03-01
+```
+
+### Key functions
+
+- `snapshot_from_checks(report, entity_count)` — create snapshot from check results
+- `record_check_results(report, config, root, entity_count)` — save metrics + append to history
+- `get_history(config, root)` — read full history
+- `metric_trend(history, metric_name)` — extract single metric across time
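
A minimal sketch of how a single-metric trend could be pulled out of a list of timestamped snapshots. The snapshot layout (`timestamp` plus a `metrics` mapping), the helper name `metric_trend_sketch`, and the sample values are assumptions for illustration; the real `metric_trend` in `markitect/infospace/history.py` may differ.

```python
from typing import Any, Dict, List, Tuple


def metric_trend_sketch(
    history: List[Dict[str, Any]], metric_name: str
) -> List[Tuple[str, float]]:
    """Return (timestamp, value) pairs for one metric, oldest first."""
    points = [
        (snap["timestamp"], snap["metrics"][metric_name])
        for snap in history
        if metric_name in snap.get("metrics", {})
    ]
    # ISO-8601 timestamps in the same timezone sort chronologically as strings.
    return sorted(points)


# Illustrative snapshots only; not real project data.
history = [
    {"timestamp": "2026-03-01T09:00:00+00:00", "metrics": {"coverage_ratio": 0.52}},
    {"timestamp": "2026-02-01T09:00:00+00:00", "metrics": {"coverage_ratio": 0.36}},
    {"timestamp": "2026-02-15T09:00:00+00:00", "metrics": {"redundancy_ratio": 0.0}},
]
trend = metric_trend_sketch(history, "coverage_ratio")
delta = trend[-1][1] - trend[0][1]  # change between first and last snapshot
```

Snapshots that do not carry the requested metric are skipped rather than treated as zero, so a gap in the history does not look like a regression.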
+
+---
+
+## Viability
+
+Viability is assessed by comparing current metrics to thresholds declared
+in `infospace.yaml`.
+
+```bash
+markitect infospace viability
+```
+
+### Threshold model
+
+```yaml
+viability:
+  coverage_ratio: { min: 0.60 }       # must be >= 0.60
+  redundancy_ratio: { max: 0.05 }     # must be <= 0.05
+  consistency_cycles: { max: 0 }       # must be exactly 0
+```
+
+Each threshold has a `min` bound, a `max` bound, or both. A metric passes if
+it lies within its bounds (inclusive). An infospace is viable when all
+thresholds pass.
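
A minimal sketch of the bounds check described above, assuming inclusive bounds as the YAML comments suggest. The `Threshold` class and `is_viable` helper are hypothetical illustrations, not the actual `ViabilityThreshold` model in `markitect/infospace/config.py`; treating a missing metric as a failure is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Threshold:
    """Hypothetical stand-in for a viability threshold (min and/or max)."""
    min: Optional[float] = None
    max: Optional[float] = None

    def passes(self, value: float) -> bool:
        # Bounds are inclusive; an unset bound is simply not checked.
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


def is_viable(metrics: Dict[str, float], thresholds: Dict[str, Threshold]) -> bool:
    # Assumption: a metric with a threshold but no recorded value fails.
    return all(
        name in metrics and threshold.passes(metrics[name])
        for name, threshold in thresholds.items()
    )


thresholds = {
    "coverage_ratio": Threshold(min=0.60),
    "redundancy_ratio": Threshold(max=0.05),
    "consistency_cycles": Threshold(max=0),
}
# Values loosely based on the baseline in the commit message.
baseline = {"coverage_ratio": 0.361, "redundancy_ratio": 0.0, "consistency_cycles": 0.0}
viable = is_viable(baseline, thresholds)
```

With these inputs the coverage threshold is the one that fails, since 0.361 is below the 0.60 minimum.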
+
+---
+
+## Composition
+
+One infospace can use another as a discipline. The composition model is
+defined in `markitect/infospace/composition.py`.
+
+### Binding a discipline
+
+```bash
+markitect infospace bind-discipline ./path/to/vsm-infospace --name "Viable System Model"
+```
+
+This adds a `DisciplineBinding` to the local `infospace.yaml` and validates
+that the discipline directory exists and contains its own `infospace.yaml`.
+
+### Checking discipline status
+
+```bash
+markitect infospace disciplines
+```
+
+Shows: name, entity count, viability status, path.
+
+### Viability requirement
+
+A discipline must meet its own viability thresholds to be considered
+reliable. The `check_discipline_status()` function loads the discipline's
+metrics and runs its own threshold checks.
+
+### Stale mapping detection
+
+```bash
+markitect infospace stale-mappings
+```
+
+Compares local mapping references against the discipline's current entity
+set. If a referenced discipline entity has been removed, the mapping is
+flagged as stale.
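
The comparison above amounts to a set difference. The sketch below assumes mapping references are available as a dict from local entity slug to referenced discipline slugs; `find_stale` and the sample slugs are hypothetical and only mirror the idea behind `find_stale_mappings`, not its real signature.

```python
from typing import Dict, Iterable, List, Set


def find_stale(
    mapping_references: Dict[str, Iterable[str]],
    discipline_slugs: Set[str],
) -> Dict[str, List[str]]:
    """Map each local entity to the referenced discipline slugs that no longer exist."""
    stale: Dict[str, List[str]] = {}
    for local_slug, referenced in mapping_references.items():
        missing = sorted(set(referenced) - discipline_slugs)
        if missing:
            stale[local_slug] = missing
    return stale


# Hypothetical example: "algedonic-signal" was removed from the discipline.
references = {
    "division-of-labour": ["system-1"],
    "market-price": ["system-2", "algedonic-signal"],
}
current_discipline = {"system-1", "system-2"}
stale = find_stale(references, current_discipline)
```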
+
+### Key functions
+
+- `resolve_discipline_path(binding, root)` — resolve to absolute path
+- `load_discipline_config(binding, root)` — load discipline's `infospace.yaml`
+- `check_discipline_status(binding, root)` — full status with viability
+- `get_discipline_entities(binding, root)` — entity list from discipline
+- `find_stale_mappings(config, root, mapping_references)` — detect stale refs
+- `bind_discipline(config, name, path, root)` — add binding to config
+
+---
+
+## Evaluation Output Format
+
+Evaluation results use YAML frontmatter + markdown body. Defined in
+`markitect/infospace/evaluation.py` and `evaluation_io.py`.
+
+### Per-entity evaluation file
+
+```markdown
+---
+entity_slug: division-of-labour
+evaluator: openrouter/default
+evaluated_at: '2026-02-19T10:30:00'
+overall_score: 4.1667
+scores:
+- name: definition_precision
+  value: 4.5
+  max_value: 5.0
+...
+---
+
+# Evaluation: Division Of Labour
+
+## definition_precision — 4.5 / 5.0
+
+The definition clearly captures the core concept...
+```
+
+### Snapshot
+
+```yaml
+snapshot_id: abc12345
+created_at: '2026-02-19T10:30:00+00:00'
+schema_name: default
+entity_count: 85
+entity_evaluations: [...]
+collection_metrics:
+  - name: coverage_ratio
+    value: 0.75
+    concern: C2
+```
+
+---
+
+## State
+
+Runtime state is computed from entities, evaluations, and metrics.
+Defined in `markitect/infospace/state.py`.
+
+```python
+from markitect.infospace import build_state
+state = build_state(config, entities=entities, metrics=metrics)
+state.is_viable          # True if all thresholds pass
+state.viability_results  # List[ViabilityResult]
+state.summary()          # Dict for display
+```
+
+---
+
+## CLI Command Summary
+
+All commands are under `markitect infospace`:
+
+| Command | Purpose |
+|---------|---------|
+| `init` | Create a new `infospace.yaml` |
+| `status` | Show entity count, domains, evaluation state |
+| `entities` | List entities with metadata |
+| `evaluate` | Run per-entity LLM evaluation |
+| `check` | Run collection-level quality checks (C1-C5) |
+| `viability` | Show viability dashboard |
+| `history` | Show metrics history |
+| `history-diff` | Compare two snapshots by date |
+| `bind-discipline` | Bind an external infospace as a discipline |
+| `disciplines` | List bound disciplines and viability |
+| `stale-mappings` | Detect stale cross-infospace references |
+
+---
+
+## Platform Dependencies
+
+The infospace tooling builds on these platform modules:
+
+| Module | Used for |
+|--------|----------|
+| `markitect/llm/` | Embedding adapters, LLM evaluation |
+| `markitect/analysis/graph.py` | Graph analysis (networkx wrapper) |
+| `markitect/analysis/fca.py` | Formal Concept Analysis |
+| `markitect/prompts/execution/batch.py` | Batch LLM evaluation |
+| `markitect/prompts/dependencies/models.py` | DependencyGraph |
--- a/examples/infospace-with-history/INFRA-TASKS.md
+++ b/examples/infospace-with-history/INFRA-TASKS.md
@@ -37,3 +37,513 @@ no automatic parsing for this format, requiring manual macro construction.
 **Fix applied:** Added `SHORTHAND_PATTERN` to `MacroParser` that recognises
 `@{target}` and maps it to `MacroKind.REQUIRED`. Updated `has_macros()`,
 `count_macros()`, and `find_macro_positions()` accordingly.
+
+---
+
+## Assignment Assessment (18 Feb 2026)
+
+How the example measures against the objectives stated in `README.md`:
+
+| # | Objective | Status | Notes |
+|---|-----------|--------|-------|
+| 1 | Capture knowledge from Wealth of Nations | **Partial** | 7 of 35 chapters processed (Book I, ch. 1-7). 85 canonical entities extracted. |
+| 2 | Transform to VSM concepts/entities | **Done (for processed chapters)** | Entities mapped to S1-S5 with strength ratings. |
+| 3 | Consistent and complete | **Not yet** | Only 20% of chapters done. Metrics report exists but covers limited scope. |
+| 4 | Schemas as scaffolding | **Done** | Four schemas defined and used across all stages. |
+| 5 | Prompt dependency resolution | **Done** | `@{macro}` templates resolved via MultiSpaceResolutionStrategy. |
+| 6 | Incremental chapter injection | **Done** | Pipeline processes one chapter at a time; `@{existing_entities}` prevents duplication. |
+| 7 | Keep changes as git history | **Not done** | See task 4 below. |
+| 8 | Metrics for completeness/consistency | **Partial** | Template and report exist but only cover 4 chapters (report predates ch. 5-7). |
+| 9 | No infrastructure changes during experiment | **Violated** | Three infra fixes were required (tasks 1-3 above). Documented as intended. |
+| 10 | Generate task list for infra issues | **Done** | This file. |
+
+## 4. Infospace has no per-chapter git history — OPEN
+
+**Objective:** README states "The information space should utilize the option
+of keeping changes as git history."
+**Issue:** The 7 processed chapters were committed in mixed batches alongside
+infrastructure changes (LLM adapters, entity refactoring, archive policy).
+Chapters 1-2 are bundled into `fecc2fd` with the entire LLM module.
+Chapters 5-7 share a single commit (`41773f1`) with the OpenAI adapter and
+archive policy. There is no commit where you can `git diff` to see exactly
+what one chapter contributed to the infospace.
+**Impact:** Cannot use `git log`, `git diff`, or `git bisect` to trace how
+the infospace grew chapter by chapter — the core promise of "with history."
+**Suggested fix:** Re-run the 7 processed chapters (and remaining 28) using
+`process_chapters.py` without `--no-commit`, on a clean branch or after
+squashing the current output into a baseline commit. Each chapter gets its
+own commit via `_git_commit_chapter()`.
+
+## 5. Prompt files are regenerated as a side-effect of DB rebuild — OPEN
+
+**Issue:** Running `--all --no-commit` to regenerate `infospace.db` also
+overwrites `*-prompt.md` files in the output directories because each
+pipeline stage unconditionally writes the compiled prompt before checking
+whether output already exists. The `@{existing_entities}` macro content
+shifts as earlier chapters are loaded, so prompt files for already-processed
+chapters change on every full run.
+**Impact:** A DB regeneration dirties the working tree with prompt file
+changes, even though no actual outputs changed. Users must `git checkout`
+the prompt files after regeneration.
+**Suggested fix:** Skip writing prompt files when the corresponding output
+file already exists on disk, or add a `--rebuild-db-only` flag that
+populates the database without touching the file system.
+
+## 6. Metrics report is stale — OPEN
+
+**Issue:** The metrics report (`output/metrics/metrics-report.md`) was
+generated after chapters 1-4. Chapters 5-7 have since been processed but
+the report has not been refreshed.
+**Impact:** The metrics do not reflect the current state of the infospace.
+**Suggested fix:** Re-run `--metrics --provider <provider> --no-commit`
+after every batch of new chapters. Consider making metrics assessment
+automatic at the end of `--book` or `--all` runs.
+
+## 7. Remaining 28 chapters not yet processed — OPEN
+
+**Issue:** Only Book I chapters 1-7 have been processed. The rest of Book I
+and Books II-V (28 chapters in total) remain unprocessed.
+**Impact:** The infospace is incomplete — VSM coverage is limited to S1,
+S2, and partial S4. S3, S3*, S5, and many systemic concepts (algedonic
+signals, recursion, variety) are expected to emerge from later books.
+**Suggested fix:** Process remaining chapters in book-sized batches with
+per-chapter commits, refreshing metrics after each book.
+
+---
+
+## Per-Concept Metrics (tasks 8-12)
+
+The current metrics system is a single LLM-evaluated narrative report that
+assesses the infospace as a whole. It produces no machine-readable output,
+cannot be tracked over time, and conflates per-concept quality with
+collection-level coherence.
+
+The improvement splits metrics into two layers:
+
+- **LLM-Eval**: A prompt template evaluates each concept individually
+  against quality criteria defined in the schema. The LLM returns structured
+  scores, not prose.
+- **Deterministic aggregation**: `process_chapters.py` computes what it can
+  from files on disk (schema compliance, word counts, section presence,
+  coverage tallies) and aggregates LLM-eval scores into dashboard metrics.
+
+Both layers persist results in structured form so they can be diffed,
+tracked over time, and committed alongside the entities they evaluate.
+
+## 8. Add per-concept quality metrics to entity schema — OPEN
+
+**Issue:** The entity schema (`economic-entity-schema-v1.0.md`) defines
+required sections and validation rules (section presence, word count range)
+but no quality criteria. There is no definition of what makes a *good*
+entity versus a merely *compliant* one.
+**Suggested fix:** Add a `## Quality Metrics` section to the entity schema
+defining evaluation dimensions with scoring rubrics:
+
+- **Definition Precision** (1-5): Is the definition specific, non-circular,
+  and distinguishable from neighbouring concepts?
+- **Source Grounding** (1-5): Is the entity grounded in a specific passage?
+  Does the citation exist and support the definition?
+- **Domain Placement** (1-5): Is the economic domain assignment correct and
+  specific (not just "General Theory")?
+- **VSM Relevance** (1-5): Does the entity connect meaningfully to at least
+  one VSM system, or is it too granular/abstract to map?
+- **Explanatory Value** (1-5): Does this entity contribute to explaining
+  the economic system, or is it a restatement of another concept?
+
+Similarly update the VSM mapping schema with:
+
+- **Rationale Rigour** (1-5): Is the mapping justified with reference to
+  Beer's definitions, not just surface-level analogy?
+- **Strength Calibration** (1-5): Is the declared strength (Strong/Moderate/
+  Weak) consistent with the rationale given?
+
+These rubrics become the prompt instructions for task 9.
+
+## 9. Create evaluate-entity prompt template — OPEN
+
+**Depends on:** Task 8 (quality metrics in schema).
+**Issue:** There is no mechanism to evaluate an existing entity after
+extraction. Quality is only judged implicitly during the global metrics
+assessment, which is too coarse to identify individual weak entities.
+**Suggested fix:** Create `templates/evaluate-entity.md` — a prompt
+template that:
+
+1. Takes `@{entity_content}`, `@{source_chapter}`, `@{vsm_framework}`,
+   and `@{quality_rubric}` (from the schema's quality metrics section).
+2. Asks the LLM to score each dimension (1-5) with a one-sentence
+   justification per score.
+3. Outputs structured YAML front-matter (scores) followed by markdown
+   (justifications), e.g.:
+
+```yaml
+---
+entity: division-of-labour
+scores:
+  definition_precision: 5
+  source_grounding: 5
+  domain_placement: 4
+  vsm_relevance: 5
+  explanatory_value: 5
+overall: 4.8
+flags: []
+---
+```
+
+Add a pipeline stage: `--evaluate` runs this template against every
+canonical entity and writes results to `output/evaluations/<slug>-eval.md`.
+A `--evaluate --chapter <id>` variant evaluates only entities introduced
+by that chapter.
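
The task does not say how `overall` or `flags` would be derived from the per-dimension scores. One plausible reading, consistent with the example front-matter (scores 5, 5, 4, 5, 5 give 4.8), is an arithmetic mean plus a flag for any dimension below a cut-off. The sketch below assumes exactly that; `summarise_scores` and the cut-off of 3 are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarise_scores(
    scores: Dict[str, int], flag_below: int = 3
) -> Tuple[float, List[str]]:
    """Return (overall, flags) for one entity's 1-5 dimension scores."""
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name}: score {value} is outside the 1-5 rubric")
    overall = round(mean(scores.values()), 1)
    # Assumption: a dimension is flagged when it scores below `flag_below`.
    flags = sorted(name for name, value in scores.items() if value < flag_below)
    return overall, flags


example = {
    "definition_precision": 5,
    "source_grounding": 5,
    "domain_placement": 4,
    "vsm_relevance": 5,
    "explanatory_value": 5,
}
overall, flags = summarise_scores(example)
```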
+
+## 10. Add deterministic schema compliance checker — OPEN
+
+**Issue:** Schema compliance is currently LLM-evaluated ("100%" in the
+metrics report) but the validation rules in the schemas are mechanical:
+section presence, word count ranges, heading format. These should be
+checked programmatically, not by an LLM.
+**Suggested fix:** Add a `validate_entity(path) -> ValidationResult`
+function to `process_chapters.py` (or a new `validate.py` module) that:
+
+- Parses the markdown to extract H2 section headings
+- Checks required sections are present (Definition, Source Chapter,
+  Context, Economic Domain)
+- Counts words in the Definition section (must be 20-150)
+- Checks H1 heading exists and is not a slug (e.g. `effectual-demand`
+  in chapter 7 has `# effectual-demand` instead of `# Effectual Demand`)
+- Validates Source Chapter cites a specific book/chapter
+- For mapping files: checks Mapping Strength is one of the enum values
+
+Expose as `--validate` CLI flag. Output a structured report:
+
+```
+Validation: 85 entities, 3 warnings
+  effectual-demand.md: H1 is slug format, not title case
+  porter.md: Definition is 18 words (minimum 20)
+  ...
+```
+
+This is fully deterministic — no LLM calls needed.
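
A minimal sketch of such a checker, working on markdown text rather than a path so it stays self-contained. It covers the H1, required-section, and definition word-count rules only; the Source Chapter and mapping-strength checks are left out. `ValidationResult` here is a hypothetical shape, and the naive heading scan would misread `#` lines inside fenced code blocks.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

REQUIRED_SECTIONS = ("Definition", "Source Chapter", "Context", "Economic Domain")
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


@dataclass
class ValidationResult:
    warnings: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.warnings


def split_sections(markdown: str) -> Dict[str, str]:
    """Return {H2 heading: body text}; the H1 title is stored under '#'."""
    sections: Dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("# "):
            sections["#"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def validate_entity_text(markdown: str) -> ValidationResult:
    result = ValidationResult()
    sections = split_sections(markdown)
    title = sections.get("#")
    if title is None:
        result.warnings.append("H1 heading is missing")
    elif SLUG_RE.match(title):
        result.warnings.append("H1 is slug format, not title case")
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            result.warnings.append(f"Missing required section: {name}")
    if "Definition" in sections:
        words = len(sections["Definition"].split())
        if not 20 <= words <= 150:
            result.warnings.append(f"Definition is {words} words (expected 20-150)")
    return result


# Made-up entity text that trips three of the rules.
sample = """# effectual-demand

## Definition
The demand of those willing to pay the natural price.

## Source Chapter
Book I, Chapter 7

## Context
Market price discussion.
"""
report = validate_entity_text(sample)
```

For the sample, the report carries one warning each for the slug-style H1, the missing Economic Domain section, and the too-short definition.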
+
+## 11. Structured metrics output format — OPEN
+
+**Depends on:** Tasks 9 and 10.
+**Issue:** The metrics report is a markdown narrative. Values cannot be
+parsed programmatically, diffed meaningfully, or plotted over time.
+**Suggested fix:** Alongside the human-readable `metrics-report.md`,
+emit a machine-readable `metrics.yaml` (or `.json`) containing:
+
+```yaml
+timestamp: "2026-02-18T12:00:00Z"
+chapters_processed: 7
+chapters_total: 35
+entities_total: 85
+entities_archived: 0
+vsm_coverage:
+  S1: 28
+  S2: 12
+  S3: 8
+  S3_star: 0
+  S4: 5
+  S5: 0
+  recursion: 1
+  variety: 0
+mapping_strength:
+  strong: 64
+  moderate: 18
+  weak: 3
+validation:
+  schema_compliant: 82
+  warnings: 3
+evaluation:    # from LLM-eval (task 9)
+  mean_overall: 4.2
+  min_overall: 2.8
+  flagged_entities: ["porter", "country-workman"]
+```
+
+The `--metrics` command writes both files. The YAML file is committed
+to git so `git diff` shows exactly how metrics changed between runs.
+
+## 12. Metrics-over-time tracking — OPEN
+
+**Depends on:** Task 11 (structured output).
+**Issue:** There is one metrics snapshot that gets overwritten. No history
+of how metrics evolved as chapters were added.
+**Suggested fix:** Append each metrics snapshot to a cumulative log file
+`output/metrics/metrics-history.yaml` (list of timestamped entries). This
+is committed to git alongside the current snapshot. The pipeline can
+optionally render a simple text-based progress summary:
+
+```
+Metrics history (5 snapshots):
+  2026-02-10  ch 1/35   13 entities  41.7% VSM coverage
+  2026-02-11  ch 4/35   38 entities  50.0% VSM coverage
+  2026-02-11  ch 7/35   85 entities  58.3% VSM coverage
+  ...
+```
+
+This provides the "metrics that improve over time" feedback loop the
+README envisions: process chapters → evaluate → see coverage grow (or
+flag regressions when a re-extraction reduces quality scores).
+
+---
+
+## Collection-Level Metrics (tasks 13-19)
+
+These tasks implement the five collection-level concerns described in
+`METRICS-METHODOLOGY.md`. They share underlying infrastructure (entity
+metadata index, definition embeddings, relationship graph) that should
+be built once per evaluation run.
+
+See the methodology document for theoretical grounding, framework
+references, and the full metric definitions per concern.
+
+## 13. Entity metadata index — deterministic parsing layer — OPEN
+
+**Depends on:** Task 10 (schema compliance checker shares parsing logic).
+**Issue:** Several collection-level metrics (coverage matrix, FCA context,
+granularity distribution) require structured metadata extracted from entity
+files: H1 title, economic domain, VSM system(s), source chapter, section
+presence, word counts. Currently this information exists only as prose
+inside markdown files.
+**Suggested fix:** Add a `parse_entity_metadata(path) -> EntityMeta`
+function that extracts from each entity file:
+
+```python
+@dataclass
+class EntityMeta:
+    slug: str
+    title: str                  # from H1
+    domain: str                 # from Economic Domain section
+    source_chapter: str         # from Source Chapter section
+    definition_words: int       # word count of Definition section
+    has_original_wording: bool  # optional section present?
+    has_modern_interpretation: bool
+    vsm_systems: list[str]     # from mapping file if exists
+    mapping_strengths: list[str]
+```
+
+Build an index of all entities at the start of each evaluation run.
+This index is the input for tasks 14, 15, 16, and 18. Expose it as an
+`--index` CLI flag for inspection.
+
+## 14. Redundancy detection (Concern C1) — OPEN
+
+**Depends on:** Task 13 (metadata index).
+**Methodology:** OOPS! P2 (synonymous classes) + embedding similarity +
+LLM pairwise judgment. See METRICS-METHODOLOGY.md §4 C1.
+**Issue:** Entities with different slugs but overlapping meanings (e.g.
+`natural-rate` / `ordinary-or-average-rate`) survive extraction because
+dedup only checks slug collisions. There is no semantic overlap detection.
+**Suggested fix:** Implement in three stages:
+
+1. **Embed** — Compute vector embeddings of all entity definitions using
+   an embedding API (OpenRouter, OpenAI, or a local sentence-transformer).
+   Cache embeddings in `output/metrics/embeddings.json` keyed by
+   `{slug: content_digest}` so unchanged entities skip re-embedding.
+
+2. **Similarity matrix** — Compute NxN cosine similarity. Write the full
+   matrix to `output/metrics/similarity-matrix.json`. Flag all pairs with
+   cosine > 0.80 as candidates.
+
+3. **LLM pairwise judgment** — For each candidate pair, run a prompt:
+   "Given these two entity definitions, are they (a) the same concept and
+   should be merged, (b) genuinely distinct, or (c) partially overlapping
+   and should be clarified?" Write results to
+   `output/metrics/redundancy-report.md` + YAML.
+
+**Metrics produced:**
+- `high_similarity_pairs`: count and list
+- `confirmed_synonyms`: count (LLM-confirmed same concept)
+- `redundancy_ratio`: `confirmed_synonyms / total_entities`
+- `intensional_conciseness`: `1 - redundancy_ratio`
+
+**CLI:** `--check-redundancy --provider <provider>`
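
Stage 2 and the `redundancy_ratio` formula can be sketched with the standard library alone. The embeddings below are toy three-dimensional vectors, not real model output, and the function names are hypothetical; stage 1 (embedding) and stage 3 (LLM judgment) are out of scope here.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_pairs(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.80
) -> List[Tuple[str, str, float]]:
    """Return (slug_a, slug_b, similarity) for pairs above the threshold."""
    pairs = []
    for a, b in combinations(sorted(embeddings), 2):
        score = cosine(embeddings[a], embeddings[b])
        if score > threshold:
            pairs.append((a, b, score))
    # Most similar pairs first, so the LLM judgment can start with them.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def redundancy_ratio(confirmed_synonyms: int, total_entities: int) -> float:
    return confirmed_synonyms / total_entities if total_entities else 0.0


# Toy vectors for illustration only.
toy = {
    "natural-rate": [1.0, 0.0, 0.1],
    "ordinary-or-average-rate": [0.9, 0.1, 0.1],
    "agriculture": [0.0, 1.0, 0.0],
}
pairs = candidate_pairs(toy)
conciseness = 1 - redundancy_ratio(confirmed_synonyms=3, total_entities=85)
```

The pairwise loop is quadratic in the number of entities, which is unproblematic at 85 entities but would call for a vectorised or approximate search at much larger scale.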
+
+## 15. Coverage completeness (Concern C2) — OPEN
+
+**Depends on:** Task 13 (metadata index).
+**Methodology:** SEQUAL completeness + FCA gap analysis + DSL competency
+questions. See METRICS-METHODOLOGY.md §4 C2.
+**Issue:** Coverage is currently assessed by the LLM in a single narrative
+pass. There is no structured view of which domain × VSM cells are
+populated, and no way to test whether the entity set can answer specific
+questions about the economic system.
+**Suggested fix:** Implement in three stages:
+
+1. **Domain × VSM matrix** — From the metadata index, count entities per
+   {economic_domain, vsm_system} cell. Render as a table. Identify empty
+   cells as specific, actionable gaps. Compute:
+   - `coverage_ratio = populated_cells / total_cells`
+   - `vsm_balance_entropy = -Σ(pᵢ log pᵢ)` across VSM systems
+
+2. **FCA lattice** — Construct a formal context with objects = entities,
+   attributes = {domain, vsm_system, source_book, abstraction_level}.
+   Compute the concept lattice (Python `concepts` library). Extract
+   attribute combinations with no corresponding entity — these are
+   **structural coverage gaps** not visible in the simple matrix.
+
+3. **Competency questions** — Define a set of 15-20 canonical questions
+   the infospace should answer (stored in
+   `schemas/competency-questions.md`). Example questions:
+   - "How does the division of labour relate to market extent?"
+   - "What mechanisms regulate wages toward their natural rate?"
+   - "How do monopolies distort the viable system?"
+   LLM-Eval tests whether current entities suffice to answer each.
+   Unanswerable questions identify specific completeness gaps.
+
+**Metrics produced:**
+- `domain_vsm_matrix`: cell counts
+- `coverage_ratio`: scalar
+- `vsm_balance_entropy`: scalar
+- `empty_cells`: list of {domain, vsm_system} gaps
+- `fca_gap_concepts`: attribute combos with no entity
+- `competency_coverage`: fraction of questions answerable
+
+**CLI:** `--check-coverage --provider <provider>`
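
Stage 1 can be sketched as follows, assuming each entity contributes one `(economic_domain, vsm_system)` pair per mapping. The logarithm base is not specified above, so base 2 is an assumption, and the domain names in the example are invented.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Cell = Tuple[str, str]  # (economic_domain, vsm_system)


def coverage_metrics(
    assignments: Iterable[Cell],
    domains: Sequence[str],
    vsm_systems: Sequence[str],
) -> Tuple[float, float, List[Cell]]:
    """Return (coverage_ratio, vsm_balance_entropy, empty_cells)."""
    counts = Counter(assignments)
    all_cells = [(d, s) for d in domains for s in vsm_systems]
    empty_cells = [cell for cell in all_cells if counts[cell] == 0]
    populated = len(all_cells) - len(empty_cells)
    coverage_ratio = populated / len(all_cells) if all_cells else 0.0

    # Entity share per VSM system, then Shannon entropy (base 2 assumed).
    per_system: Counter = Counter()
    for (_domain, system), n in counts.items():
        per_system[system] += n
    total = sum(per_system.values())
    entropy = 0.0
    for n in per_system.values():
        if n:
            p = n / total
            entropy -= p * math.log2(p)
    return coverage_ratio, entropy, empty_cells


# Invented mini-example: two domains, two VSM systems, four mappings.
ratio, entropy, gaps = coverage_metrics(
    [("Labour", "S1"), ("Labour", "S1"), ("Prices", "S1"), ("Prices", "S2")],
    domains=["Labour", "Prices"],
    vsm_systems=["S1", "S2"],
)
```

Here three of four cells are populated, and the only gap is `("Labour", "S2")`, which is the kind of specific, actionable gap the matrix is meant to surface.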
+
+## 16. Structural coherence (Concern C3) — OPEN
+
+**Depends on:** Task 13 (metadata index).
+**Methodology:** OntoQA relationship richness + graph connectivity +
+community detection. See METRICS-METHODOLOGY.md §4 C3.
+**Issue:** It is unknown whether the 85 entities form a connected
+explanatory web or a fragmented collection. No relationship graph exists
+between entities.
+**Suggested fix:** Implement in three stages:
+
+1. **Explicit cross-references** — Scan each entity's definition for
+   mentions of other entity slugs or titles (normalised string matching).
+   This is deterministic and catches direct references.
+
+2. **LLM-inferred edges** — For entity pairs not caught by string
+   matching but in the same domain or VSM system, LLM-Eval: "Does A's
+   definition conceptually depend on or explain B, or vice versa?" Run
+   in batches. Write the combined graph to
+   `output/metrics/relationship-graph.json` (adjacency list).
+
+3. **Graph analysis** — Using networkx or equivalent:
+   - Connected components (target: 1)
+   - Graph density, average degree
+   - Betweenness centrality → identify bridge concepts
+   - Louvain community detection → compare to declared domains
+   - OntoQA Relationship Richness
+   - Cohesion per domain, coupling across domains
+   - Orphan entities (degree 0 or 1)
+
+**Metrics produced:**
+- `connected_components`: count (target: 1)
+- `graph_density`: scalar
+- `avg_degree`: scalar
+- `relationship_richness`: OntoQA RR
+- `modularity`: Louvain score
+- `bridge_concepts`: list (high betweenness centrality)
+- `orphan_entities`: list (degree ≤ 1)
+- `cohesion_by_domain` / `coupling_across_domains`: scalars
+
+**CLI:** `--check-coherence --provider <provider>`
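
The task suggests networkx for stage 3. As a dependency-free illustration, the sketch below computes a few of the simpler metrics (components, density, average degree, orphans) on an undirected graph using only the standard library; it does not attempt centrality, Louvain communities, or OntoQA richness. The entity slugs and edges are invented.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def graph_summary(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> Dict[str, object]:
    adjacency: Dict[str, Set[str]] = {node: set() for node in nodes}
    for a, b in edges:
        if a == b:
            continue  # ignore self-references
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Count connected components with a breadth-first search.
    seen: Set[str] = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)

    n = len(adjacency)
    m = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    return {
        "connected_components": components,
        "graph_density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "avg_degree": 2 * m / n if n else 0.0,
        "orphan_entities": sorted(s for s, nb in adjacency.items() if len(nb) <= 1),
    }


summary = graph_summary(
    nodes=["division-of-labour", "market-extent", "natural-price", "market-price", "porter"],
    edges=[
        ("division-of-labour", "market-extent"),
        ("market-extent", "natural-price"),
        ("natural-price", "market-price"),
    ],
)
```

In this example `porter` has no edges, so the graph has two components and misses the target of one.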
+
+## 17. Definitional consistency (Concern C4) — OPEN
+
+**Depends on:** Task 16 (relationship graph — the definitional dependency
+graph is a directed variant of the same structure).
+**Methodology:** OntoClean metaproperties + OOPS! P24 (circular
+definitions) + SEQUAL validity. See METRICS-METHODOLOGY.md §4 C4.
+**Issue:** No mechanism to detect circular definitions, contradictions
+between related entities, or terms used in definitions that should be
+entities but aren't.
+**Suggested fix:** Implement in four stages:
+
+1. **Definitional dependency graph** — Directed version of the
+   relationship graph: edge A→B means A's definition uses B's concept.
+   Reuse cross-reference extraction from task 16.
+
+2. **Cycle detection** — Find all cycles of length ≤ 3 in the directed
+   graph. Short cycles are problematic (A defines B, B defines A).
+   Compute `grounding_ratio`: fraction of entities traceable to terms
+   outside the entity set without encountering a cycle.
+
+3. **Undefined dependencies** — Extract terms from definitions that match
+   entity-name patterns (capitalised noun phrases, kebab-case slugs) but
+   have no corresponding entity file. These are concepts the infospace
+   implicitly relies on but hasn't defined.
+
+4. **LLM consistency checks** — For directly-connected entity pairs,
+   LLM-Eval: "Do these definitions contradict each other?" For entities
+   with Smith's Original Wording, LLM-Eval: "Does the definition
+   accurately represent the cited passage?"
+
+**Metrics produced:**
+- `circular_definitions`: count and list of cycles (length ≤ 3)
+- `grounding_ratio`: fraction of entities reaching primitives
+- `undefined_dependencies`: list of missing terms
+- `contradiction_candidates`: LLM-flagged pairs
+- `source_fidelity_score`: fraction passing source check
+
+**CLI:** `--check-consistency --provider <provider>`
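
The cycle-detection stage can be sketched with a bounded depth-first walk. The graph representation (slug to the set of slugs its definition uses) and the sample edges are assumptions; `grounding_ratio` and the LLM checks are not covered.

```python
from typing import Dict, List, Set, Tuple


def short_cycles(graph: Dict[str, Set[str]], max_len: int = 3) -> List[Tuple[str, ...]]:
    """Find simple directed cycles containing at most `max_len` entities."""
    found: Set[Tuple[str, ...]] = set()

    def walk(start: str, node: str, path: List[str]) -> None:
        for nxt in graph.get(node, ()):
            if nxt == start:
                # Rotate so the smallest slug comes first; this keeps the
                # direction of the cycle while removing duplicates.
                i = path.index(min(path))
                found.add(tuple(path[i:] + path[:i]))
            elif nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for start in graph:
        walk(start, start, [start])
    return sorted(found)


# Invented edges: A -> B means "A's definition uses B".
definitions = {
    "value": {"price"},
    "price": {"value", "market"},
    "market": {"demand"},
    "demand": {"supply"},
    "supply": {"wages"},
    "wages": {"market"},
}
cycles = short_cycles(definitions)
```

The two-entity loop between `price` and `value` is reported, while the four-entity loop through `market`, `demand`, `supply`, and `wages` exceeds the length limit and is ignored.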
+
+## 18. Granularity balance (Concern C5) — OPEN
+
+**Depends on:** Task 13 (metadata index).
+**Methodology:** Keet granularity theory + OntoClean rigidity +
+DSL laconicity. See METRICS-METHODOLOGY.md §4 C5.
+**Issue:** Entities range from broad sectors (`agriculture`) to specific
+market roles (`effectual-demanders`) to abstract principles
+(`division-of-labour`). It is unclear whether this range is appropriate
+or whether some entities are too specific/general relative to their peers.
+**Suggested fix:** Implement in three stages:
+
+1. **LLM classification** — For each entity, LLM-Eval assigns:
+   - Abstraction level: `theory` / `mechanism` / `observation`
+   - Scope score: 1-5 (very specific → very general)
+   - Indispensability: 1-5 ("if removed, how much explanatory power lost?")
+   Write to `output/evaluations/<slug>-classification.yaml`.
+
+2. **Distribution analysis** — Deterministic:
+   - Count per abstraction level; compute entropy
+   - Per-domain scope variance (flag domains with high variance)
+   - Level × domain matrix (from FCA context in task 15)
+   - Outlier detection: entities > 1.5σ from their domain's mean scope
+
+3. **Merge/split recommendations** — For outlier entities, LLM-Eval:
+   "Should this entity be merged into a broader concept, split into
+   sub-concepts, or is its current granularity justified?" For entities
+   with indispensability ≤ 2: "Could another entity serve this purpose?"
+
+**Metrics produced:**
+- `abstraction_distribution`: {theory: n, mechanism: n, observation: n}
+- `abstraction_entropy`: scalar (higher = more balanced)
+- `scope_variance_by_domain`: per-domain scalar
+- `dispensable_entities`: list (indispensability ≤ 2)
+- `merge_candidates`: list of pairs
+- `split_candidates`: list of entities
+
+**CLI:** `--check-granularity --provider <provider>`
+
+## 19. Unified collection evaluation command — OPEN
+
+**Depends on:** Tasks 13-18.
+**Issue:** Running five separate `--check-*` commands is cumbersome and
+repeats shared computation (metadata parsing, embedding, graph building).
+**Suggested fix:** Add `--evaluate-collection --provider <provider>` that
+runs all five checks in sequence, sharing infrastructure:
+
+1. Parse entity metadata index (task 13) — used by all
+2. Compute embeddings (task 14) — used by C1, C3
+3. Build relationship graph (task 16) — used by C3, C4
+4. Run all five concern checks
+5. Write per-concern reports to `output/metrics/`
+6. Write unified `metrics.yaml` with all collection metrics
+7. Append to `metrics-history.yaml` (task 12)
+
+Incremental mode: `--evaluate-collection --chapter <id>` re-evaluates
+only entities from that chapter plus pairwise checks involving them.
+
+Report a summary to stdout:
+
+```
+Collection evaluation (85 entities, 7 chapters):
+  Redundancy:   3 synonym candidates, conciseness 0.96
+  Coverage:     58% VSM, 20% chapters, 4 domain gaps
+  Coherence:    1 component, density 0.12, 2 orphans
+  Consistency:  0 cycles, 5 undefined deps, 0 contradictions
+  Granularity:  entropy 1.42, 1 dispensable, 2 merge candidates
+```
--- a/examples/infospace-with-history/METRICS-METHODOLOGY.md
+++ b/examples/infospace-with-history/METRICS-METHODOLOGY.md
@@ -0,0 +1,501 @@
+# Collection-Level Metrics Methodology
+
+How we evaluate the quality of the infospace as a **collection of
+interrelated concepts**, beyond the quality of individual entities.
+
+This document describes the theoretical frameworks we draw on (ontology
+engineering, formal concept analysis, semiotic quality theory, and DSL
+design) and how each is adapted to MarkiTect's two-layer evaluation
+model (LLM-Eval + deterministic aggregation).
+
+---
+
+## 1. The Two-Layer Model
+
+Every metric in this methodology decomposes into two layers:
+
+| Layer | What it does | How it runs |
+|-------|-------------|-------------|
+| **LLM-Eval** | Qualitative judgment: "Are these two concepts the same?", "Is this definition grounded in the source?" | Prompt template → LLM → structured YAML output |
+| **Deterministic** | Quantitative aggregation: cosine similarity, graph connectivity, coverage counting, cycle detection | Python code in `process_chapters.py` or dedicated `metrics.py` |
+
+The LLM-Eval layer produces **per-entity** or **per-pair** structured
+scores. The deterministic layer **aggregates** these into collection-level
+metrics, persisted as machine-readable YAML alongside human-readable
+markdown reports.
+
+Per-concept quality metrics (definition precision, source grounding, VSM
+relevance — see INFRA-TASKS 8-12) operate at the individual entity level.
+This document covers the five **collection-level concerns** that assess how
+the entities work together as an explanatory system.
+
+---
+
+## 2. Five Collection-Level Concerns
+
+### Overview
+
+| # | Concern | Question | Primary framework |
+|---|---------|----------|-------------------|
+| C1 | Semantic Overlap | Are there redundant concepts? | OOPS! P2, embedding similarity |
+| C2 | Coverage Completeness | Does the concept set cover the domain? | SEQUAL, FCA |
+| C3 | Structural Coherence | Do concepts form a connected explanatory graph? | OntoQA, graph theory |
+| C4 | Definitional Consistency | Are concepts defined consistently and non-circularly? | OntoClean, OOPS! P24 |
+| C5 | Granularity Balance | Are concepts at comparable levels of abstraction? | Granularity theory, DSL laconicity |
+
+---
+
+## 3. Theoretical Frameworks
+
+### 3.1 SEQUAL (Semiotic Quality Framework)
+
+**Origin:** Lindland, Sindre & Sølvberg (1994), extended by Krogstie et al.
+
+**What it defines:** Quality of a conceptual model as the correspondence
+between three worlds — the domain (what exists), the model (what we
+captured), and the audience's interpretation (what they understand).
+
+Two key dimensions of **semantic quality**:
+
+- **Validity** — everything in the model corresponds to something real
+  in the domain. No invented concepts.
+- **Completeness** — everything relevant in the domain is represented in
+  the model. No missing concepts.
+
+**How we use it:** SEQUAL frames our entire metrics approach. Every
+collection-level metric maps to one of these dimensions:
+
+| SEQUAL dimension | Our concerns |
+|-----------------|--------------|
+| Validity | C1 (redundancy reduces validity — duplicate concepts don't correspond to distinct domain facts), C4 (consistency — contradictory definitions can't both be valid) |
+| Completeness | C2 (coverage — are all needed concepts present?), C5 (granularity — missing levels of abstraction are completeness gaps) |
+| Both | C3 (coherence — disconnected concepts suggest either missing bridging concepts [completeness] or misplaced concepts [validity]) |
+
+**Adaptation:** SEQUAL was designed for formal models evaluated by human
+experts. We replace human judgment with LLM-Eval (for validity checks like
+"does this concept correspond to something Smith actually described?") and
+deterministic counting (for completeness checks like "which VSM systems
+lack entity mappings?").
+
+### 3.2 OntoClean
+
+**Origin:** Guarino & Welty (2004).
+
+**What it defines:** A methodology for validating taxonomic relationships
+by assigning **metaproperties** to each concept:
+
+- **Rigidity** — Is the property essential to all its instances? (e.g.
+  "market" is rigid; "effectual demander" is anti-rigid — an agent can
+  stop being an effectual demander)
+- **Identity** — Does the concept carry an identity criterion? (e.g.
+  "division of labour" can be identified by its three causal mechanisms)
+- **Unity** — Are all instances of this concept whole in the same way?
+- **Dependence** — Does the concept require another concept to exist?
+  (e.g. "market price" depends on "effectual demand")
+
+**Constraint:** A rigid concept cannot be subsumed by an anti-rigid one.
+Violations indicate structural confusion.
+
+**How we use it:** We do not have a formal taxonomy, but our flat entity
+set implicitly contains subsumption relationships (e.g. "natural rate"
+subsumes "ordinary-or-average rate"). OntoClean metaproperties help detect:
+
+- **Granularity mismatches** (C5): A rigid concept at the same level as
+  an anti-rigid one suggests different abstraction levels are mixed.
+- **Definitional consistency** (C4): If entity A depends on entity B per
+  OntoClean, but B's definition doesn't acknowledge A, the definitions
+  are inconsistent.
+- **Redundancy** (C1): Two entities with identical metaproperty profiles
+  and overlapping definitions are candidates for merging.
+
+**Adaptation:** Instead of manual metaproperty assignment, we use LLM-Eval
+to classify each entity's rigidity, identity criterion, and dependencies.
+The constraint checking is then deterministic.
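The deterministic half of this check is small. The sketch below assumes the LLM-Eval step has already produced a rigidity label per slug and a list of inferred subsumption pairs. The function name, the label strings, and the example classifications are illustrative assumptions, not part of the pipeline.

```python
def rigidity_violations(rigidity, subsumptions):
    """Return (parent, child) pairs where an anti-rigid parent subsumes a rigid child.

    rigidity:     dict slug -> "rigid" | "anti-rigid" | "non-rigid" (labels from LLM-Eval)
    subsumptions: iterable of (parent_slug, child_slug) pairs
    """
    return [
        (parent, child)
        for parent, child in subsumptions
        if rigidity.get(parent) == "anti-rigid" and rigidity.get(child) == "rigid"
    ]


# Hypothetical classifications, for illustration only.
rigidity = {
    "market": "rigid",
    "effectual-demanders": "anti-rigid",
    "natural-rate": "rigid",
    "ordinary-or-average-rate": "rigid",
}
subsumptions = [
    ("natural-rate", "ordinary-or-average-rate"),  # rigid over rigid: allowed
    ("effectual-demanders", "market"),             # anti-rigid over rigid: violation
]
violations = rigidity_violations(rigidity, subsumptions)
```

Entities without a rigidity label are skipped rather than flagged, so an incomplete classification run under-reports violations instead of inventing them.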
+
+### 3.3 OOPS! (Ontology Pitfall Scanner)
+
+**Origin:** Poveda-Villalón et al. (2014). Catalogue of 41 common
+ontology design pitfalls.
+
+**What it defines:** Concrete, testable anti-patterns. The pitfalls most
+relevant to our infospace:
+
+| Pitfall | Description | Our concern |
+|---------|-------------|-------------|
+| P2 | Synonymous classes — different names, same meaning | C1 (redundancy) |
+| P4 | Unconnected ontology elements | C3 (coherence) |
+| P13 | Inverse relationships not explicitly declared | C3 |
+| P7 | Merging different concepts in the same class | C5 (granularity — too coarse) |
+| P11 | Missing domain or range | C4 (consistency) |
+| P10 | Missing disjointness | C1 (how do we know two concepts don't overlap?) |
+| P24 | Recursive/circular definition | C4 (consistency) |
+| P25 | Inverse of itself | C4 |
+
+**How we use it:** OOPS! pitfalls become a **checklist for LLM-Eval
+prompts**. Rather than running a formal OWL scanner, we ask the LLM to
+check for each pitfall pattern:
+
+- "Are entities A and B synonymous?" (P2)
+- "Does entity A's definition reference itself?" (P24)
+- "Is entity A actually two distinct concepts merged together?" (P7)
+
+The deterministic layer counts pitfall occurrences and tracks them over
+time.
+
+**Adaptation:** We select the subset of OOPS! pitfalls applicable to
+semi-formal markdown-based ontologies (no OWL axioms) and implement each
+as an LLM-Eval prompt pattern rather than a formal reasoner check.
+
+### 3.4 OntoQA (Metric-Based Ontology Quality Analysis)
+
+**Origin:** Tartir & Arpinar (2007).
+
+**What it defines:** Quantitative schema-level and instance-level metrics:
+
+- **Relationship Richness (RR):** Proportion of non-taxonomic (lateral)
+  relationships to total relationships. `RR = non_hierarchical / total`.
+  Low RR = mere taxonomy. High RR = rich cross-cutting connections.
+- **Attribute Richness (AR):** Average number of attributes per concept.
+  `AR = total_attributes / total_concepts`.
+- **Inheritance Richness (IR):** Average subclasses per class — measures
+  how knowledge distributes across the hierarchy.
+- **Class Richness (CR):** Proportion of classes with instances.
+
+**How we use it:** Our entities don't have formal relationships declared
+between them, but we can **infer** a relationship graph from their
+definitions and mappings:
+
+- Entity A references entity B in its definition → definitional dependency
+- Entities A and B map to the same VSM system → structural co-occurrence
+- Entities A and B appear in the same chapter → contextual co-occurrence
+
+From this inferred graph, we compute OntoQA metrics directly:
+
+- **Relationship Richness** tells us whether our concepts form a web of
+  explanatory connections or just a flat list.
+- **Attribute Richness** maps to our schema sections — entities with more
+  optional sections filled (Original Wording, Modern Interpretation) are
+  richer.
+
+**Adaptation:** The key modification is that relationship inference is an
+LLM-Eval step (pairwise: "does A's definition depend on or reference B?"),
+after which all OntoQA metrics are computed deterministically on the
+resulting graph.
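Once the edges are inferred, the two OntoQA formulas quoted above reduce to simple counting. This is a minimal sketch under assumptions: the `(source, target, kind)` edge shape, the edge-kind labels, and the example slugs and sections are hypothetical and only serve to exercise the formulas.

```python
def relationship_richness(edges, hierarchical_kinds=("subsumes",)):
    """RR = non_hierarchical / total, over (source, target, kind) edges."""
    edges = list(edges)
    if not edges:
        return 0.0
    lateral = sum(1 for _, _, kind in edges if kind not in hierarchical_kinds)
    return lateral / len(edges)


def attribute_richness(sections_by_entity):
    """AR = total_attributes / total_concepts, counting filled schema sections."""
    if not sections_by_entity:
        return 0.0
    total = sum(len(sections) for sections in sections_by_entity.values())
    return total / len(sections_by_entity)


# Hypothetical inferred edges and filled sections.
edges = [
    ("natural-rate", "ordinary-or-average-rate", "subsumes"),
    ("market-price", "effectual-demand", "depends-on"),
    ("division-of-labour", "extent-of-market", "depends-on"),
    ("market-price", "natural-price", "same-chapter"),
]
sections = {
    "market-price": ["Definition", "Original Wording", "Modern Interpretation"],
    "natural-price": ["Definition"],
}
rr = relationship_richness(edges)   # 3 lateral edges out of 4
ar = attribute_richness(sections)   # 4 filled sections over 2 entities
```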
+
+### 3.5 Formal Concept Analysis (FCA)
+
+**Origin:** Wille (1982). Applied to ontology auditing by Elhaj et al.
+(2008) for SNOMED CT completeness checking.
+
+**What it defines:** A mathematical framework for deriving a **concept
+lattice** from a binary relation between objects and attributes. The
+lattice reveals:
+
+- **Formal concepts**: maximal sets of objects sharing the same attributes
+- **Subconcept/superconcept** relationships: the natural hierarchy
+- **Missing concepts**: attribute combinations with no corresponding object
+
+**How we use it:** We construct a **formal context** (binary matrix):
+
+- **Objects** = our 85 entities
+- **Attributes** = economic domain, VSM system, source book, abstraction
+  level (from LLM-Eval), key terms (extracted from definitions)
+
+The concept lattice then reveals:
+
+- **Coverage gaps** (C2): Attribute combinations with no entity. E.g. if
+  the cell {Distribution, S3} is empty, we lack control-layer concepts
+  for distribution — a specific, actionable gap.
+- **Redundancy** (C1): Entities with identical attribute sets (same formal
+  concept) are candidates for merging.
+- **Granularity** (C5): The lattice depth indicates how many meaningful
+  levels of abstraction exist. A shallow lattice suggests missing
+  intermediate concepts.
+
+**Adaptation:** Classic FCA requires crisp binary attributes. Our domains
+and VSM mappings are already categorical, but abstraction level and key
+terms need LLM-Eval to produce. The lattice computation itself is
+deterministic (Python `concepts` library or equivalent). The FCA approach
+replaces the current "ask the LLM about coverage" with a structural
+computation that can identify *specific* gaps rather than vague
+recommendations.
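The two FCA derivation operators, and the gap and redundancy readings described above, can be sketched without a lattice library. The context below is a toy example with hypothetical attribute values; a real run would presumably use the `concepts` library mentioned above or an equivalent.

```python
from collections import defaultdict


def objects_having(context, attributes):
    """Extent: every object that carries all of the given attributes."""
    return {obj for obj, attrs in context.items() if set(attributes) <= attrs}


def shared_attributes(context, objects):
    """Intent: every attribute common to all of the given objects."""
    attr_sets = [context[obj] for obj in objects]
    if not attr_sets:
        # By definition, the empty object set shares every attribute.
        return set().union(*context.values())
    return set.intersection(*attr_sets)


def identical_profiles(context):
    """Group objects whose attribute sets are exactly equal (merge candidates)."""
    groups = defaultdict(list)
    for obj, attrs in context.items():
        groups[frozenset(attrs)].append(obj)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Toy formal context: entity slug -> set of attributes (hypothetical values).
context = {
    "division-of-labour": {"Production", "S1", "mechanism"},
    "pin-factory": {"Production", "S1", "observation"},
    "market-price": {"Exchange", "S2", "mechanism"},
    "price-of-the-market": {"Exchange", "S2", "mechanism"},
}

extent = objects_having(context, {"Production", "S1"})
intent = shared_attributes(context, extent)             # (extent, intent) is a formal concept
gap = objects_having(context, {"Distribution", "S3"})   # empty extent marks a coverage gap
duplicates = identical_profiles(context)
```

An empty extent for `{Distribution, S3}` is the structural version of the coverage gap described above, and entities sharing one attribute profile are the C1 merge candidates.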
+
+### 3.6 DSL Design Principles
+
+**Origin:** Mernik et al. (2005) "When and How to Develop DSLs";
+Karsai et al. (2014) "Design Guidelines for Domain-Specific Languages".
+
+**What they define:** Quality criteria for a set of concepts that form a
+language for a specific domain:
+
+- **Soundness**: Every concept in the language corresponds to a real domain
+  concern (no invented abstractions).
+- **Completeness**: The language can express everything needed for its
+  intended tasks.
+- **Laconicity**: No unnecessary concepts — every concept earns its place.
+- **Orthogonality**: Concepts are independent; combining any two produces
+  a meaningful result (no redundant combinations).
+
+**How we use it:** Our entity set is effectively a domain-specific
+vocabulary for "explaining classical economics through VSM". DSL quality
+criteria translate directly:
+
+- **Soundness** → Validity (SEQUAL): every entity grounded in Smith's text
+- **Completeness** → Coverage (C2): can we answer the "competency
+  questions" the infospace is meant to address?
+- **Laconicity** → Anti-redundancy (C1) + Indispensability (C5): would
+  removing any entity lose explanatory power?
+- **Orthogonality** → Non-overlap (C1): entity definitions don't
+  substantially duplicate each other
+
+**Adaptation:** We operationalise DSL completeness through **competency
+questions** — a set of canonical questions the infospace should be able to
+answer (e.g. "How does the division of labour relate to market extent?",
+"What mechanisms regulate wages toward their natural rate?"). LLM-Eval
+tests whether the current entity set suffices to answer each question.
+Unanswerable questions identify specific completeness gaps.
+
+Laconicity is operationalised as **indispensability scoring**: for each
+entity, LLM-Eval rates whether removing it would lose explanatory power.
+Low-scoring entities are candidates for merging or retirement.
+
+---
+
+## 4. Integration: Metric Definitions by Concern
+
+### C1: Semantic Overlap / Redundancy
+
+**Goal:** Identify entities that substantially overlap in meaning and
+should be merged, distinguished, or retired.
+
+**Metrics:**
+
+| Metric | Type | Computation |
+|--------|------|-------------|
+| `similarity_matrix` | Deterministic | Embed all entity definitions; compute NxN cosine similarity |
+| `high_similarity_pairs` | Deterministic | Pairs with cosine > 0.80, sorted descending |
+| `confirmed_synonyms` | LLM-Eval | For each high-similarity pair, LLM judges: "same concept" / "genuinely distinct" / "partial overlap" |
+| `redundancy_ratio` | Deterministic | `len(confirmed_synonyms) / total_entities` (pairs judged "same concept") |
+| `intensional_conciseness` | Deterministic | `1 - redundancy_ratio` (from KG quality framework) |
+
+**Pipeline:**
+1. Embed definitions (embedding API or local model)
+2. Compute cosine similarity matrix
+3. Filter pairs above threshold
+4. LLM pairwise judgment on filtered pairs only (avoids N² LLM calls)
+5. Aggregate into ratio and conciseness score
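Steps 2, 3 and 5 are deterministic and can be sketched in plain Python. This is a minimal sketch, not the pipeline's implementation: the function names, the toy three-dimensional vectors, and the slugs are illustrative assumptions. Only the 0.80 threshold and the two formulas come from the table above, and the confirmed-pair count stands in for the LLM judgment of step 4.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def high_similarity_pairs(embeddings, threshold=0.80):
    """Return (slug_a, slug_b, similarity) for pairs above the threshold, highest first."""
    pairs = [
        (a, b, cosine(embeddings[a], embeddings[b]))
        for a, b in combinations(sorted(embeddings), 2)
    ]
    return sorted((p for p in pairs if p[2] > threshold), key=lambda p: -p[2])


def redundancy_metrics(confirmed_synonym_pairs, total_entities):
    """Aggregate the confirmed pair count into the two collection-level scores."""
    ratio = confirmed_synonym_pairs / total_entities if total_entities else 0.0
    return {"redundancy_ratio": ratio, "intensional_conciseness": 1 - ratio}


# Toy embeddings (hypothetical values, not real model output).
embeddings = {
    "natural-price": [1.0, 0.0, 0.0],
    "market-price": [0.9, 0.1, 0.0],
    "division-of-labour": [0.0, 1.0, 0.0],
}
candidates = high_similarity_pairs(embeddings)
scores = redundancy_metrics(confirmed_synonym_pairs=1, total_entities=85)
```

Only the pairs in `candidates` would be sent to the LLM, which is what keeps the number of pairwise judgments well below N².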
+
+**Output:** `output/metrics/redundancy-report.md` + structured YAML with
+pair list, scores, and merge/retire recommendations.
+
+### C2: Coverage Completeness
+
+**Goal:** Identify domain areas and VSM systems that lack adequate
+representation in the entity set.
+
+**Metrics:**
+
+| Metric | Type | Computation |
+|--------|------|-------------|
+| `domain_vsm_matrix` | Deterministic | Count entities per {economic_domain, VSM_system} cell |
+| `coverage_ratio` | Deterministic | `populated_cells / expected_cells` |
+| `vsm_balance_entropy` | Deterministic | Shannon entropy of entity distribution across VSM systems (higher = more balanced) |
+| `empty_cells` | Deterministic | List of {domain, VSM_system} pairs with zero entities |
+| `competency_coverage` | LLM-Eval | For each competency question, can it be answered with current entities? |
+| `fca_gap_concepts` | Deterministic | Attribute combinations in the FCA lattice with no corresponding entity |
+
+**Pipeline:**
+1. Parse entity metadata (domain, VSM mapping) from files on disk
+2. Build domain × VSM matrix; identify empty cells
+3. Build FCA formal context; compute lattice; extract gap concepts
+4. Define competency questions (initially hand-written, later LLM-generated
+   from the source material)
+5. LLM-evaluate answerability of each question
+6. Aggregate into coverage ratio, entropy, and gap list
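The matrix, ratio, and entropy steps can be sketched as follows. The domain and system names in the example are placeholders, and the base-2 logarithm is an assumption, since the text above does not fix a base.

```python
import math
from collections import Counter


def coverage_metrics(entities, domains, vsm_systems):
    """entities: iterable of (domain, vsm_system) tuples parsed from entity metadata."""
    cells = Counter(entities)
    expected = [(d, s) for d in domains for s in vsm_systems]
    empty = [cell for cell in expected if cells[cell] == 0]
    return {
        "coverage_ratio": (len(expected) - len(empty)) / len(expected),
        "empty_cells": empty,
    }


def shannon_entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Placeholder metadata: four entities over a 2 x 3 domain/VSM grid.
entities = [
    ("Production", "S1"),
    ("Production", "S1"),
    ("Production", "S2"),
    ("Distribution", "S1"),
]
report = coverage_metrics(entities, ["Production", "Distribution"], ["S1", "S2", "S3"])

per_system = Counter(system for _, system in entities)
vsm_balance_entropy = shannon_entropy([per_system[s] for s in ["S1", "S2", "S3"]])
```

Here three of six cells are populated, so `coverage_ratio` is 0.5, and `empty_cells` names the specific gaps rather than only reporting a score.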
+
+**Output:** `output/metrics/coverage-report.md` + YAML with matrix, gaps,
+and competency question results.
+
+### C3: Structural Coherence
+
+**Goal:** Determine whether the entities form a connected explanatory web
+or a fragmented collection of isolated concepts.
+
+**Metrics:**
+
+| Metric | Type | Computation |
+|--------|------|-------------|
+| `relationship_graph` | LLM-Eval + Deterministic | Infer edges from definition cross-references (string matching) + LLM judgment for implicit references |
+| `connected_components` | Deterministic | Number of connected components in the graph (target: 1) |
+| `graph_density` | Deterministic | `actual_edges / possible_edges` |
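+| `avg_degree` | Deterministic | `total_edges / total_entities` (mean out-degree of the directed graph; an undirected mean degree would be `2 * total_edges / total_entities`) |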
+| `relationship_richness` | Deterministic | OntoQA RR: `non_hierarchical_edges / total_edges` |
+| `modularity` | Deterministic | Louvain modularity score (0.3-0.7 = meaningful structure; >0.8 = fragmentation) |
+| `bridge_concepts` | Deterministic | Entities with highest betweenness centrality (connect clusters) |
+| `orphan_entities` | Deterministic | Entities with degree 0 or 1 |
+| `cohesion_by_domain` | Deterministic | Avg intra-domain edges per entity |
+| `coupling_across_domains` | Deterministic | Inter-domain edges / total edges |
+
+**Pipeline:**
+1. Extract explicit cross-references from definitions (entity name
+   mentions in other definitions — string matching with slug normalisation)
+2. For entity pairs not caught by string matching, LLM-Eval: "Does A's
+   definition depend on or reference B's concept?"
+3. Build directed graph
+4. Compute graph metrics (networkx or equivalent)
+5. Run community detection; compare detected communities to declared
+   economic domains
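For readers without networkx at hand, three of the graph metrics can be computed with the standard library alone. This sketch makes two assumptions that the text above leaves open: connectivity is measured on the undirected view of the graph (weak components), and an orphan is an entity with at most one distinct neighbour. The slugs are illustrative.

```python
from collections import deque


def graph_metrics(nodes, edges):
    """nodes: iterable of slugs; edges: directed (source, target) pairs between those slugs."""
    nodes = list(nodes)
    edges = {(a, b) for a, b in edges if a != b}
    neighbours = {n: set() for n in nodes}
    for a, b in edges:  # connectivity ignores edge direction
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1  # every unvisited start opens a new component
        seen.add(start)
        queue = deque([start])
        while queue:
            for nxt in neighbours[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)

    n = len(nodes)
    possible = n * (n - 1)  # ordered pairs in a directed graph
    return {
        "connected_components": components,
        "graph_density": len(edges) / possible if possible else 0.0,
        "orphan_entities": sorted(x for x in nodes if len(neighbours[x]) <= 1),
    }


nodes = ["division-of-labour", "extent-of-market", "money", "agriculture"]
edges = [("division-of-labour", "extent-of-market"), ("extent-of-market", "money")]
stats = graph_metrics(nodes, edges)
```

Modularity and betweenness centrality are left out on purpose; those are better taken from a graph library than reimplemented.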
+
+**Output:** `output/metrics/coherence-report.md` + YAML with graph
+statistics, orphan list, bridge concepts, and community structure.
+
+### C4: Definitional Consistency
+
+**Goal:** Ensure entities are defined consistently, non-circularly, and
+without contradicting each other.
+
+**Metrics:**
+
+| Metric | Type | Computation |
+|--------|------|-------------|
+| `definitional_dependency_graph` | Deterministic + LLM-Eval | Edges where A's definition uses B's concept |
+| `circular_definitions` | Deterministic | Cycles of length ≤ 3 in the dependency graph |
+| `definition_depth` | Deterministic | Longest dependency chain per entity before reaching a term not in the entity set |
+| `undefined_dependencies` | Deterministic | Terms used in definitions that arguably should be entities but aren't |
+| `pairwise_consistency` | LLM-Eval | For related entity pairs (sharing edges): "Do these definitions contradict each other?" |
+| `source_fidelity` | LLM-Eval | "Does this definition accurately represent what Smith wrote in the cited passage?" |
+| `metaproperty_violations` | LLM-Eval + Deterministic | OntoClean constraint checking after LLM classifies rigidity/identity |
+| `grounding_ratio` | Deterministic | Fraction of entities traceable to primitives without cycles |
+
+**Pipeline:**
+1. Build definitional dependency graph (same technique as C3; edge
+   direction matters here: "A depends on B" means A's definition uses B)
+2. Detect cycles; flag short cycles
+3. Extract undefined terms (terms matching entity-name patterns that appear
+   in definitions but have no corresponding entity file)
+4. LLM pairwise consistency check on directly-connected pairs
+5. LLM source fidelity check (compare definition to source chapter text)
+6. LLM OntoClean metaproperty classification; deterministic constraint
+   checking
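Step 2 and the `grounding_ratio` metric can be sketched on a plain adjacency dict. Two interpretations here are assumptions rather than something the text above specifies: a term with no entry in the dict is treated as a primitive, and an entity counts as grounded only if no cycle is reachable from it. The slugs are illustrative.

```python
def short_cycles(dependencies, max_length=3):
    """dependencies: dict slug -> set of slugs its definition uses.

    Returns each directed cycle of at most max_length nodes once,
    as a tuple that starts at its alphabetically smallest slug.
    """
    cycles = []

    def walk(path):
        for nxt in sorted(dependencies.get(path[-1], ())):
            if nxt == path[0]:
                cycles.append(tuple(path))
            elif nxt > path[0] and nxt not in path and len(path) < max_length:
                walk(path + [nxt])

    for start in sorted(dependencies):
        walk([start])
    return cycles


def grounding_ratio(dependencies):
    """Fraction of entities from which no dependency cycle is reachable."""
    state = {}  # slug -> True (grounded), False (reaches a cycle), None (on the DFS stack)

    def grounded(slug):
        if slug in state:
            return state[slug] is True  # None means we looped back onto the stack
        state[slug] = None
        state[slug] = all(grounded(dep) for dep in dependencies.get(slug, ()))
        return state[slug]

    if not dependencies:
        return 0.0
    return sum(1 for slug in dependencies if grounded(slug)) / len(dependencies)


dependencies = {
    "value": {"price"},
    "price": {"value"},   # value <-> price: a cycle of length 2
    "wages": {"labour"},
    "labour": set(),
    "rent": {"rent"},     # self-reference: a cycle of length 1
}
cycles = short_cycles(dependencies)
ratio = grounding_ratio(dependencies)
```

In this toy graph only `wages` and `labour` bottom out without looping, so two of five entities are grounded.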
+
+**Output:** `output/metrics/consistency-report.md` + YAML with cycle list,
+undefined terms, contradiction candidates, and metaproperty violations.
+
+### C5: Granularity Balance
+
+**Goal:** Ensure entities operate at comparable levels of abstraction
+within their respective domains and perspectives.
+
+**Metrics:**
+
+| Metric | Type | Computation |
+|--------|------|-------------|
+| `abstraction_classification` | LLM-Eval | Classify each entity as theory-level / mechanism-level / observation-level |
+| `scope_score` | LLM-Eval | Rate each entity 1-5 for generality (1 = very specific instance, 5 = broad theoretical principle) |
+| `abstraction_distribution` | Deterministic | Count per level; compute entropy |
+| `scope_variance` | Deterministic | Variance of scope scores within each domain |
+| `level_x_perspective_matrix` | Deterministic | Cross-tabulation of abstraction level × economic domain |
+| `indispensability` | LLM-Eval | "If removed, what explanatory power is lost?" (1-5) |
+| `dispensable_entities` | Deterministic | Entities with indispensability score ≤ 2 |
+| `merge_candidates` | LLM-Eval | Pairs where one is a sub-case of the other |
+
+**Pipeline:**
+1. LLM-classify each entity: abstraction level, scope score,
+   indispensability
+2. Build level × perspective matrix
+3. Compute distribution entropy and per-domain scope variance
+4. Flag outliers: entities whose scope score deviates > 1.5σ from their
+   domain mean
+5. For outlier entities, LLM-Eval: "Should this be merged into a broader
+   concept, or split into sub-concepts?"
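Steps 3 and 4 are plain statistics once the LLM scores exist. A minimal sketch, assuming base-2 entropy and the population standard deviation (the text above does not say which variant is intended); the slugs and scores are made up for the example.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pstdev


def abstraction_entropy(levels):
    """Shannon entropy in bits of the abstraction-level distribution."""
    counts = Counter(levels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def scope_outliers(entities, k=1.5):
    """entities: list of (slug, domain, scope_score).

    Flags entities whose score lies more than k standard deviations
    from the mean scope score of their own domain.
    """
    by_domain = defaultdict(list)
    for _, domain, score in entities:
        by_domain[domain].append(score)

    outliers = []
    for slug, domain, score in entities:
        scores = by_domain[domain]
        sigma = pstdev(scores)
        if sigma > 0 and abs(score - mean(scores)) > k * sigma:
            outliers.append(slug)
    return outliers


# Hypothetical scores: four narrow entities and one broad one in the same domain.
entities = [
    ("pin-making", "Production", 2),
    ("nail-making", "Production", 2),
    ("dexterity", "Production", 2),
    ("time-saving", "Production", 2),
    ("division-of-labour", "Production", 5),
]
outliers = scope_outliers(entities)
balanced = abstraction_entropy(["theory", "mechanism", "observation"])
```

With very small domains the 1.5σ rule is blunt: in a domain of only two or three entities no score can deviate that far, so nothing is flagged.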
+
+**Output:** `output/metrics/granularity-report.md` + YAML with
+classifications, distribution, outliers, and merge/split recommendations.
+
+---
+
+## 5. Shared Infrastructure
+
+Several concerns share underlying computations:
+
+| Infrastructure | Used by | Build once |
+|---------------|---------|------------|
+| Definition embeddings (vector per entity) | C1, C3 | Embedding API call per entity |
+| Relationship graph (entity → entity edges) | C3, C4 | String matching + LLM-Eval |
+| FCA formal context (entity × attribute matrix) | C2, C5 | Metadata parsing + LLM classification |
+| Entity metadata index (domain, VSM, chapter, sections) | C2, C5, C10 (schema compliance) | Deterministic markdown parsing |
+
+These should be computed once per evaluation run and cached for use by
+all concern-specific metrics.
+
+---
+
+## 6. Evaluation Workflow
+
+A full collection-level evaluation run:
+
+```
+process_chapters.py --evaluate-collection --provider <provider>
+```
+
+1. **Parse** — deterministic metadata extraction from all entity files
+2. **Embed** — compute definition embeddings (cached; only new/changed
+   entities need fresh embeddings)
+3. **Infer** — LLM-Eval for relationship edges, metaproperties,
+   abstraction levels, pairwise judgments (batched to minimise LLM calls)
+4. **Compute** — deterministic graph metrics, FCA lattice, coverage
+   matrix, similarity matrix, cycle detection
+5. **Aggregate** — combine per-entity and per-pair scores into
+   collection-level metrics
+6. **Report** — write per-concern markdown reports + unified `metrics.yaml`
+7. **Append** — add timestamped snapshot to `metrics-history.yaml`
+
+Incremental mode (`--evaluate-collection --chapter <id>`) re-evaluates
+only the entities introduced or modified by that chapter, plus any
+pairwise checks involving those entities.
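The caching behaviour described for the **Embed** step can be sketched with a content hash as cache key, so that only new or edited definitions trigger a fresh embedding call. Everything here is an assumption for illustration: the function names, the in-memory dict cache, and the `embed` callable standing in for whatever embedding backend is configured.

```python
import hashlib


def definition_key(text):
    """Stable cache key derived from the definition text itself."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(definitions, cache, embed):
    """definitions: slug -> text; cache: key -> vector; embed: callable text -> vector.

    Returns (vectors by slug, number of fresh embed calls).
    """
    vectors, fresh = {}, 0
    for slug, text in definitions.items():
        key = definition_key(text)
        if key not in cache:
            cache[key] = embed(text)
            fresh += 1
        vectors[slug] = cache[key]
    return vectors, fresh


def fake_embed(text):
    """Stand-in for a real embedding call: a one-dimensional 'vector'."""
    return [float(len(text))]


cache = {}
definitions = {"market-price": "The actual price.", "natural-price": "The central price."}
_, first_run = embed_with_cache(definitions, cache, fake_embed)

definitions["market-price"] = "The actual price at which a commodity is sold."
_, second_run = embed_with_cache(definitions, cache, fake_embed)
```

Keying on the text rather than the slug means a renamed entity with an unchanged definition costs nothing, while an edited definition is always re-embedded.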
+
+---
+
+## 7. References
+
+- Lindland, O.I., Sindre, G. & Sølvberg, A. (1994). "Understanding
+  Quality in Conceptual Modeling." *IEEE Software* 11(2), 42-49.
+  → SEQUAL framework: validity and completeness dimensions.
+
+- Guarino, N. & Welty, C.A. (2004). "An Overview of OntoClean." In
+  *Handbook on Ontologies*, Springer, 151-171.
+  → Metaproperty analysis: rigidity, identity, unity, dependence.
+
+- Poveda-Villalón, M., Gómez-Pérez, A. & Suárez-Figueroa, M.C. (2014).
+  "OOPS! (OntOlogy Pitfall Scanner!): An On-line Tool for Ontology
+  Evaluation." *IJSWIS* 10(2), 7-34.
+  → Pitfall catalogue: 41 anti-patterns for ontology design.
+
+- Tartir, S. & Arpinar, I.B. (2007). "Ontology Evaluation and Ranking
+  using OntoQA." *ICSC 2007*, IEEE, 185-192.
+  → Schema metrics: relationship richness, attribute richness.
+
+- Wille, R. (1982). "Restructuring Lattice Theory." In *Ordered Sets*,
+  Reidel, 445-470.
+  → Formal Concept Analysis: concept lattices from binary contexts.
+
+- Elhaj, H. et al. (2008). "Auditing SNOMED CT with Formal Concept
+  Analysis." *AMIA Annual Symposium*, PMC2605587.
+  → FCA for ontology completeness auditing.
+
+- Keet, C.M. (2008). *A Formal Theory of Granularity.* PhD thesis,
+  Free University of Bozen-Bolzano.
+  → Granularity levels and perspectives for ontology design.
+
+- Mernik, M., Heering, J. & Sloane, A.M. (2005). "When and How to
+  Develop Domain-Specific Languages." *ACM Computing Surveys* 37(4),
+  316-344.
+  → DSL design: soundness, completeness, laconicity.
+
+- Karsai, G. et al. (2014). "Design Guidelines for Domain Specific
+  Languages." *arXiv:1409.2378*.
+  → Orthogonality, necessary-and-sufficient principle.
+
+- Xue, B. & Zou, L. (2022). "Knowledge Graph Quality Management: A
+  Comprehensive Survey." *IEEE TKDE* 35(5), 4969-4988.
+  → KG quality dimensions: conciseness, consistency, completeness.
--- a/examples/infospace-with-history/TUTORIAL.md
+++ b/examples/infospace-with-history/TUTORIAL.md
@@ -43,6 +43,7 @@ examples/infospace-with-history/
 ├── TUTORIAL.md                 # This file
 ├── INFRA-TASKS.md              # Infrastructure issues found during the experiment
 ├── process_chapters.py         # Pipeline script
+├── infospace.db                # SQLite artifact database (generated, not in git)
 │
 ├── schemas/                    # Output structure definitions
 │   ├── economic-entity-schema-v1.0.md
@@ -369,7 +370,53 @@ python process_chapters.py --stats

 ---

-## 7. How the LLM Integration Works
+## 7. The Artifact Database (`infospace.db`)
+
+The pipeline stores all artifacts (source text, templates, guidelines, generated
+outputs) and their dependency edges in a local SQLite database —
+`infospace.db`. This file is **not checked into git** because it is a derived
+cache that can be regenerated deterministically from the files already in the
+repository.
+
+### Why it is excluded
+
+- **Binary format** — SQLite databases don't produce meaningful diffs and
+  would bloat the git history with every pipeline run.
+- **Fully derived** — every piece of data in the database originates from
+  markdown files that *are* tracked in git (sources, templates, schemas,
+  guidelines, and generated output).
+- **Reproducible** — re-running the pipeline rebuilds the database from
+  scratch without any LLM calls, because each stage checks for existing
+  output files on disk before invoking the LLM.
+
+### How to regenerate it
+
+If `infospace.db` is missing (e.g. after a fresh clone), rebuild it by
+re-running the pipeline over the chapters that already have output on disk:
+
+```bash
+# Regenerate the database from existing output files (no LLM calls needed):
+python process_chapters.py --all --no-commit
+```
+
+This will:
+
+1. Create a fresh `infospace.db`
+2. Load all static artifacts (templates, guidelines, VSM reference)
+3. For each chapter whose output files already exist, import them into the
+   database and record dependency edges
+4. Skip LLM calls entirely — existing files are detected and reused
+
+After regeneration, `--list` and `--stats` work as normal:
+
+```bash
+python process_chapters.py --list
+python process_chapters.py --stats
+```
+
+---
+
+## 8. How the LLM Integration Works

 The pipeline uses MarkiTect's `markitect.llm` module, which provides three
 adapter backends that implement the `LLMAdapter` interface:
@@ -423,7 +470,7 @@ supports `gemini-2.5-flash` with generous rate limits.

 ---

-## 8. Tracking History with Git
+## 9. Tracking History with Git

 Every processed chapter produces a git commit containing:

@@ -459,7 +506,7 @@ git commit -m "infospace: process book-1-chapter-05"

 ---

-## 9. Cost and Performance
+## 10. Cost and Performance

 From our measurements processing chapters 3-5:

@@ -486,7 +533,7 @@ To reduce costs further, use a cheaper model:

 ---

-## 10. Completing the Remaining Chapters
+## 11. Completing the Remaining Chapters

 As of now, 5 of 35 chapters are processed (Book I, Chapters 1-5). Here is
 how to complete the rest.
@@ -555,7 +602,7 @@ fill the remaining gaps in S3*, S5, and regulatory concepts.

 ---

-## 11. Quality Improvement Loop
+## 12. Quality Improvement Loop

 The infospace is designed to be **iteratively refined**:

@@ -604,7 +651,7 @@ history of the infospace — every refinement decision is traceable.

 ---

-## 12. Infrastructure Issues Found and Fixed
+## 13. Infrastructure Issues Found and Fixed

 During development we documented three issues with the MarkiTect
 infrastructure in `INFRA-TASKS.md`:
@@ -624,7 +671,7 @@ See `INFRA-TASKS.md` for details on each fix.

 ---

-## 13. Adapting This Pattern to Your Own Project
+## 14. Adapting This Pattern to Your Own Project

 To build your own infospace using this pattern:

--- a/examples/infospace-with-history/infospace.yaml
+++ b/examples/infospace-with-history/infospace.yaml
@@ -0,0 +1,51 @@
+# Infospace: The Wealth of Nations through the Viable System Model
+#
+# This configuration declares the infospace built by processing
+# Adam Smith's "The Wealth of Nations" (1776) through the lens of
+# Stafford Beer's Viable System Model (VSM).
+
+topic:
+  name: "The Wealth of Nations"
+  domain: "Classical Economics"
+  sources: artifacts/sources/
+
+disciplines:
+  - name: "Viable System Model"
+    path: artifacts/vsm-reference/
+
+schemas:
+  entity: schemas/economic-entity-schema-v1.0.md
+  mapping: schemas/vsm-mapping-schema-v1.0.md
+  analysis: schemas/chapter-analysis-schema-v1.0.md
+
+competency_questions: |
+  1. How does Smith's division of labour map to VSM System 1 operations?
+  2. What mechanisms in WoN correspond to VSM coordination (System 2)?
+  3. Where does Smith describe self-organising regulation (System 3)?
+  4. What role does the "invisible hand" play as a System 4 mechanism?
+  5. How do Smith's views on government map to System 5 policy?
+  6. Is the WoN entity set viable as an explanatory framework?
+
+viability:
+  redundancy_ratio:
+    max: 0.10
+  coverage_ratio:
+    min: 0.50
+  coherence_components:
+    max: 3
+  consistency_cycles:
+    max: 0
+  granularity_entropy:
+    min: 1.0
+
+pipeline:
+  stages:
+    - name: extract-entities
+      template: templates/extract-entities.md
+    - name: map-to-vsm
+      template: templates/map-to-vsm.md
+    - name: synthesize-analysis
+      template: templates/synthesize-analysis.md
+  post_batch:
+    - name: assess-metrics
+      template: templates/assess-metrics.md
--- a/examples/infospace-with-history/output/metrics/history.yaml
+++ b/examples/infospace-with-history/output/metrics/history.yaml
@@ -0,0 +1,26 @@
+- snapshot_id: 6ba48eb2
+  created_at: '2026-02-19T01:29:41.225843+00:00'
+  schema_name: default
+  entity_count: 85
+  entity_evaluations: []
+  collection_metrics:
+  - name: coherence_components
+    value: 0.0
+    concern: C3
+  - name: consistency_cycles
+    value: 0.0
+    concern: C4
+  - name: coverage_ratio
+    value: 0.3611111111111111
+    concern: C2
+  - name: granularity_entropy
+    value: 2.687485267017996
+    concern: C5
+  - name: modularity
+    value: 0.0
+    concern: C3
+  - name: redundancy_ratio
+    value: 0.0
+    concern: C1
+  metadata:
+    source: collection-checks
--- a/examples/infospace-with-history/output/metrics/metrics.yaml
+++ b/examples/infospace-with-history/output/metrics/metrics.yaml
@@ -0,0 +1,6 @@
+coherence_components: 0.0
+consistency_cycles: 0.0
+coverage_ratio: 0.361111
+granularity_entropy: 2.687485
+modularity: 0.0
+redundancy_ratio: 0.0
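Review note: the snapshot above can be read against the `viability` thresholds declared in the `infospace.yaml` hunk earlier in this diff. The sketch below is a minimal, self-contained illustration that assumes inclusive `min`/`max` bounds. `check_thresholds` is a hypothetical helper, not the logic inside `markitect.infospace.state.build_state`, which this diff does not show.

```python
# Thresholds copied from the `viability` block of infospace.yaml in this diff.
THRESHOLDS = {
    "redundancy_ratio": {"max": 0.10},
    "coverage_ratio": {"min": 0.50},
    "coherence_components": {"max": 3},
    "consistency_cycles": {"max": 0},
    "granularity_entropy": {"min": 1.0},
}

# Values copied from output/metrics/metrics.yaml in this diff.
METRICS = {
    "coherence_components": 0.0,
    "consistency_cycles": 0.0,
    "coverage_ratio": 0.361111,
    "granularity_entropy": 2.687485,
    "modularity": 0.0,
    "redundancy_ratio": 0.0,
}


def check_thresholds(metrics, thresholds):
    """Hypothetical helper: map each thresholded metric to True (pass) or False (fail).

    Assumes bounds are inclusive; metrics without a threshold are skipped.
    """
    results = {}
    for name, bounds in thresholds.items():
        if name not in metrics:
            continue
        value = metrics[name]
        passed = True
        if "min" in bounds and value < bounds["min"]:
            passed = False
        if "max" in bounds and value > bounds["max"]:
            passed = False
        results[name] = passed
    return results


results = check_thresholds(METRICS, THRESHOLDS)
for name, passed in sorted(results.items()):
    print(f"{name:<24} {'PASS' if passed else 'FAIL'}")
```

Under these assumptions only `coverage_ratio` (0.361 against `min: 0.50`) falls short. `modularity` has no configured threshold, so the sketch skips it.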
--- a/examples/infospace-with-history/process_chapters.py
+++ b/examples/infospace-with-history/process_chapters.py
@@ -856,6 +856,125 @@ class ChapterProcessor:
             print(f"  (No data yet: {e})")


+# ── Infospace tooling integration ─────────────────────────────────
+
+
+def _load_infospace(example_dir: Path):
+    """Load infospace config and entities from the example directory."""
+    from markitect.infospace.config import load_infospace_config
+    from markitect.infospace.entity_parser import parse_entity_directory
+
+    config_path = example_dir / "infospace.yaml"
+    if not config_path.is_file():
+        print("Error: No infospace.yaml found. Create one first.")
+        sys.exit(1)
+
+    config = load_infospace_config(config_path)
+    entities_dir = example_dir / config.entities_dir
+    entities = parse_entity_directory(entities_dir) if entities_dir.is_dir() else []
+    return config, config_path, entities
+
+
+def _run_infospace_status(example_dir: Path):
+    """Show infospace status using the tooling layer."""
+    from markitect.infospace.state import build_state
+
+    config, config_path, entities = _load_infospace(example_dir)
+    state = build_state(config, entities=entities)
+
+    print(f"Infospace: {state.topic_name}")
+    print(f"Domain:    {config.topic.domain}")
+    print(f"Entities:  {state.entity_count}")
+    if state.domains:
+        print(f"Domains:   {', '.join(state.domains)}")
+    if config.disciplines:
+        names = [d.name for d in config.disciplines]
+        print(f"Disciplines: {', '.join(names)}")
+
+    # Show processing progress
+    sources_dir = example_dir / "artifacts" / "sources"
+    total_chapters = len(list(sources_dir.glob("*.md")))
+    processed = len(list((example_dir / "output" / "analyses").glob("*-analysis.md")))
+    print(f"Chapters:  {processed}/{total_chapters} processed")
+
+
+def _run_infospace_check(example_dir: Path):
+    """Run collection-level quality checks."""
+    from markitect.infospace.checks import run_all_checks
+    from markitect.infospace.history import record_check_results
+
+    config, config_path, entities = _load_infospace(example_dir)
+
+    if not entities:
+        print("No entities to check.")
+        return
+
+    print(f"Running collection checks on {len(entities)} entities...\n")
+    report = run_all_checks(entities=entities)
+
+    d = report.to_dict()
+    for concern_name, concern_data in d.items():
+        label = concern_data.get("concern", concern_name.upper())
+        print(f"  {label} — {concern_name}")
+        for k, v in concern_data.items():
+            if k == "concern":
+                continue
+            print(f"    {k}: {v}")
+        print()
+
+    m = report.metrics()
+    if m:
+        print("Metrics summary:")
+        for k, v in sorted(m.items()):
+            print(f"  {k}: {v:.4f}")
+        snap = record_check_results(report, config, example_dir, entity_count=len(entities))
+        print(f"\nRecorded snapshot {snap.snapshot_id}")
+
+
+def _run_infospace_viability(example_dir: Path):
+    """Show viability dashboard."""
+    from markitect.infospace.history import read_metrics_file
+    from markitect.infospace.state import build_state
+
+    config, config_path, entities = _load_infospace(example_dir)
+
+    if not config.viability:
+        print("No viability thresholds configured.")
+        return
+
+    metrics = read_metrics_file(example_dir / config.metrics_dir / "metrics.yaml")
+    if not metrics:
+        print("No metrics available. Run --infospace-check first.")
+        print("\nConfigured thresholds:")
+        for name, t in config.viability.items():
+            bounds = []
+            if t.min is not None:
+                bounds.append(f"min={t.min}")
+            if t.max is not None:
+                bounds.append(f"max={t.max}")
+            print(f"  {name}: {', '.join(bounds)}")
+        return
+
+    state = build_state(config, entities=entities, metrics=metrics)
+
+    print(f"{'Metric':<30} {'Value':>8} {'Threshold':>15} {'Status':>8}")
+    print("-" * 63)
+    for r in state.viability_results:
+        bounds = []
+        if r.threshold.min is not None:
+            bounds.append(f"min={r.threshold.min}")
+        if r.threshold.max is not None:
+            bounds.append(f"max={r.threshold.max}")
+        status_str = "PASS" if r.passed else "FAIL"
+        print(f"{r.metric:<30} {r.value:>8.4f} {', '.join(bounds):>15} {status_str:>8}")
+
+    print()
+    if state.is_viable:
+        print(f"Viable: YES ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
+    else:
+        print(f"Viable: NO ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
+
+
 def main():
     parser = argparse.ArgumentParser(
         description="Process Wealth of Nations chapters through VSM analysis pipeline"
@@ -869,6 +988,12 @@ def main():
     group.add_argument("--stats", action="store_true", help="Show dependency statistics")
     group.add_argument("--archive-entity", type=str, metavar="SLUG",
                        help="Archive an entity (move to archive/ with reason)")
+    group.add_argument("--infospace-status", action="store_true",
+                       help="Show infospace status via infospace tooling")
+    group.add_argument("--infospace-check", action="store_true",
+                       help="Run collection-level quality checks (C1-C5)")
+    group.add_argument("--infospace-viability", action="store_true",
+                       help="Show viability dashboard")

     parser.add_argument("--reason", type=str, default=None,
                         help="Reason for archiving (used with --archive-entity)")
@@ -930,6 +1055,15 @@ def main():
         for ch in chapters:
             processor.process_chapter(ch, auto_commit=not args.no_commit)
             print()
+    elif args.infospace_status:
+        _run_infospace_status(example_dir)
+        return
+    elif args.infospace_check:
+        _run_infospace_check(example_dir)
+        return
+    elif args.infospace_viability:
+        _run_infospace_viability(example_dir)
+        return

     processor.show_stats()

--- a/markitect/analysis/__init__.py
+++ b/markitect/analysis/__init__.py
@@ -0,0 +1,6 @@
+"""
+markitect.analysis — Analytical utilities for MarkiTect.
+
+Provides graph analysis, similarity computation, and other
+quantitative tools used by infospace tooling.
+"""
--- a/markitect/analysis/fca.py
+++ b/markitect/analysis/fca.py
@@ -0,0 +1,307 @@
+"""
+Formal Concept Analysis (FCA) for coverage gap detection.
+
+Provides a pure-Python implementation of:
+
+- :class:`FormalContext` — entity × attribute binary relation with
+  extent/intent operations and double-prime closure.
+- :class:`ConceptLattice` — the set of all formal concepts computed
+  via the NextClosure algorithm (Ganter, 1984).
+- :func:`find_gap_concepts` — attribute combinations present in the
+  lattice whose extent is empty, revealing structural coverage gaps.
+
+Sufficient for collections of a few hundred entities.  For larger
+contexts, a library such as ``concepts`` (PyPI) can be substituted.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Iterable, Optional
+
+
+class FormalContext:
+    """Binary relation between objects and attributes.
+
+    Args:
+        objects: Iterable of object identifiers (e.g. entity slugs).
+        attributes: Iterable of attribute identifiers (e.g. "domain:Production").
+        incidence: Mapping of object → set of attributes it possesses.
+    """
+
+    def __init__(
+        self,
+        objects: Iterable[str],
+        attributes: Iterable[str],
+        incidence: dict[str, set[str]],
+    ):
+        self._objects = sorted(set(objects))
+        self._attributes = sorted(set(attributes))
+        self._obj_set = frozenset(self._objects)
+        self._attr_set = frozenset(self._attributes)
+
+        # Normalise incidence: only keep known attributes
+        self._incidence: dict[str, frozenset[str]] = {}
+        for obj in self._objects:
+            raw = incidence.get(obj, set())
+            self._incidence[obj] = frozenset(raw) & self._attr_set
+
+        # Reverse index: attribute → set of objects that have it
+        self._attr_to_objs: dict[str, frozenset[str]] = {}
+        for attr in self._attributes:
+            self._attr_to_objs[attr] = frozenset(
+                obj for obj in self._objects if attr in self._incidence[obj]
+            )
+
+    @property
+    def objects(self) -> list[str]:
+        """Sorted list of objects."""
+        return list(self._objects)
+
+    @property
+    def attributes(self) -> list[str]:
+        """Sorted list of attributes."""
+        return list(self._attributes)
+
+    @property
+    def object_count(self) -> int:
+        return len(self._objects)
+
+    @property
+    def attribute_count(self) -> int:
+        return len(self._attributes)
+
+    def extent(self, attrs: Iterable[str]) -> frozenset[str]:
+        """Objects possessing **all** given attributes (B' operation)."""
+        attr_set = frozenset(attrs)
+        if not attr_set:
+            return self._obj_set
+        result = self._obj_set
+        for attr in attr_set:
+            result = result & self._attr_to_objs.get(attr, frozenset())
+        return result
+
+    def intent(self, objs: Iterable[str]) -> frozenset[str]:
+        """Attributes shared by **all** given objects (A' operation)."""
+        obj_list = [o for o in objs if o in self._incidence]
+        if not obj_list:
+            return self._attr_set
+        result = self._incidence[obj_list[0]]
+        for obj in obj_list[1:]:
+            result = result & self._incidence[obj]
+        return result
+
+    def closure(self, attrs: Iterable[str]) -> frozenset[str]:
+        """Double-prime closure: B'' = intent(extent(B))."""
+        return self.intent(self.extent(attrs))
+
+    def has_attribute(self, obj: str, attr: str) -> bool:
+        """Check if *obj* has *attr*."""
+        return attr in self._incidence.get(obj, frozenset())
+
+    def density(self) -> float:
+        """Proportion of 1s in the incidence matrix."""
+        total = len(self._objects) * len(self._attributes)
+        if total == 0:
+            return 0.0
+        filled = sum(len(attrs) for attrs in self._incidence.values())
+        return filled / total
+
+    @classmethod
+    def from_dict(cls, entity_attributes: dict[str, set[str]]) -> FormalContext:
+        """Convenience: build context from ``{object: {attr, ...}}``."""
+        objects = list(entity_attributes.keys())
+        all_attrs: set[str] = set()
+        for attrs in entity_attributes.values():
+            all_attrs.update(attrs)
+        return cls(objects, all_attrs, entity_attributes)
+
+
+@dataclass(frozen=True)
+class FormalConcept:
+    """A formal concept (A, B) where A' = B and B' = A."""
+
+    extent: frozenset[str]
+    intent: frozenset[str]
+
+    @property
+    def extent_size(self) -> int:
+        return len(self.extent)
+
+    @property
+    def intent_size(self) -> int:
+        return len(self.intent)
+
+
+@dataclass
+class ConceptLattice:
+    """The set of all formal concepts derived from a :class:`FormalContext`.
+
+    Concepts are ordered by extent inclusion (subconcept ≤ superconcept).
+    """
+
+    concepts: list[FormalConcept] = field(default_factory=list)
+
+    @property
+    def size(self) -> int:
+        """Number of formal concepts in the lattice."""
+        return len(self.concepts)
+
+    @property
+    def top(self) -> Optional[FormalConcept]:
+        """Supremum: concept with largest extent."""
+        if not self.concepts:
+            return None
+        return max(self.concepts, key=lambda c: c.extent_size)
+
+    @property
+    def bottom(self) -> Optional[FormalConcept]:
+        """Infimum: concept with largest intent."""
+        if not self.concepts:
+            return None
+        return max(self.concepts, key=lambda c: c.intent_size)
+
+    @classmethod
+    def from_context(cls, context: FormalContext) -> ConceptLattice:
+        """Compute all formal concepts using the NextClosure algorithm."""
+        attrs = context.attributes  # sorted, fixed order
+        if not attrs:
+            # Degenerate: no attributes → single concept with all objects
+            top = FormalConcept(
+                extent=frozenset(context.objects),
+                intent=frozenset(),
+            )
+            return cls(concepts=[top])
+
+        concepts: list[FormalConcept] = []
+
+        # Start with closure of empty attribute set
+        current = context.closure(frozenset())
+        ext = context.extent(current)
+        concepts.append(FormalConcept(extent=ext, intent=current))
+
+        while current != frozenset(attrs):
+            nxt = _next_closure(current, attrs, context.closure)
+            if nxt is None:
+                break
+            ext = context.extent(nxt)
+            concepts.append(FormalConcept(extent=ext, intent=nxt))
+            current = nxt
+
+        return cls(concepts=concepts)
+
+    def gap_concepts(self) -> list[FormalConcept]:
+        """Formal concepts whose extent is empty."""
+        return [c for c in self.concepts if c.extent_size == 0]
+
+    def concepts_with_extent_size(self, min_size: int = 0, max_size: Optional[int] = None) -> list[FormalConcept]:
+        """Filter concepts by extent size."""
+        result = [c for c in self.concepts if c.extent_size >= min_size]
+        if max_size is not None:
+            result = [c for c in result if c.extent_size <= max_size]
+        return result
+
+    def depth(self) -> int:
+        """Longest chain length in the concept ordering.
+
+        A chain is a sequence of concepts c_1 < c_2 < ... < c_k
+        where < means strict subconcept (extent inclusion).
+        """
+        if not self.concepts:
+            return 0
+
+        # Implicit DAG: concept i precedes j iff extent_i ⊂ extent_j.
+        # No graph is materialised; the strict-subset test below is used directly.
+        n = len(self.concepts)
+        extents = [c.extent for c in self.concepts]
+
+        # Longest path via dynamic programming on sorted order
+        # Sort by extent size ascending (smaller extents = more specific)
+        order = sorted(range(n), key=lambda i: len(extents[i]))
+        longest = [1] * n
+
+        for idx in range(n):
+            i = order[idx]
+            for jdx in range(idx + 1, n):
+                j = order[jdx]
+                if extents[i] < extents[j]:  # strict subset
+                    if longest[j] < longest[i] + 1:
+                        longest[j] = longest[i] + 1
+
+        return max(longest) if longest else 0
+
+
+def find_gap_concepts(
+    context: FormalContext,
+    lattice: Optional[ConceptLattice] = None,
+) -> list[FormalConcept]:
+    """Find formal concepts with empty extent (coverage gaps).
+
+    These represent attribute combinations that are structurally
+    present in the lattice but have no corresponding entities.
+
+    Args:
+        context: The formal context.
+        lattice: Pre-computed lattice.  If ``None``, computed from *context*.
+
+    Returns:
+        List of :class:`FormalConcept` with empty extent, sorted by
+        intent size ascending (most general gaps first).
+    """
+    if lattice is None:
+        lattice = ConceptLattice.from_context(context)
+    gaps = lattice.gap_concepts()
+    gaps.sort(key=lambda c: c.intent_size)
+    return gaps
+
+
+def find_empty_cells(
+    context: FormalContext,
+    dimension_a: list[str],
+    dimension_b: list[str],
+) -> list[tuple[str, str]]:
+    """Find empty cells in a two-dimensional cross-tabulation.
+
+    Given two sets of attributes (e.g. domain values and VSM systems),
+    return pairs ``(attr_a, attr_b)`` where no object possesses both.
+
+    This is a simpler alternative to full FCA for two-dimensional
+    coverage analysis.
+    """
+    empty: list[tuple[str, str]] = []
+    for a in sorted(dimension_a):
+        for b in sorted(dimension_b):
+            if not context.extent([a, b]):
+                empty.append((a, b))
+    return empty
+
+
+# ── NextClosure internals ───────────────────────────────────────────
+
+
+def _next_closure(
+    current: frozenset[str],
+    attrs: list[str],
+    closure_fn,
+) -> Optional[frozenset[str]]:
+    """Compute the next closed set in lectic order after *current*.
+
+    Implements Ganter's NextClosure algorithm.
+    """
+    for i in range(len(attrs) - 1, -1, -1):
+        m = attrs[i]
+        if m in current:
+            current = current - {m}
+        else:
+            candidate = current | {m}
+            closed = closure_fn(candidate)
+            # Canonicity test: no attribute before position i
+            # was added by the closure
+            canonical = True
+            for j in range(i):
+                if attrs[j] in closed and attrs[j] not in candidate:
+                    canonical = False
+                    break
+            if canonical:
+                return closed
+    return None
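Review note: the derivation operators used by `FormalContext` (`extent`, `intent`, `closure`) can be illustrated on a toy context. The following is a simplified standalone sketch rather than the class above; the entity slugs and attribute values are invented for illustration.

```python
# Toy context: object -> attributes. Slugs and attribute values are made up.
INCIDENCE = {
    "division-of-labour": {"domain:Production", "vsm:S1"},
    "pin-factory": {"domain:Production", "vsm:S1"},
    "market-price": {"domain:Exchange", "vsm:S2"},
}
ALL_ATTRS = frozenset().union(*INCIDENCE.values())


def extent(attrs):
    """Objects that possess every attribute in *attrs* (the B' operation)."""
    wanted = set(attrs)
    return frozenset(obj for obj, has in INCIDENCE.items() if wanted <= has)


def intent(objs):
    """Attributes shared by every object in *objs* (the A' operation)."""
    objs = list(objs)
    if not objs:
        # By convention the empty object set shares all attributes.
        return ALL_ATTRS
    result = frozenset(INCIDENCE[objs[0]])
    for obj in objs[1:]:
        result &= INCIDENCE[obj]
    return result


def closure(attrs):
    """Double-prime closure: intent(extent(attrs))."""
    return intent(extent(attrs))


# "vsm:S1" only ever occurs together with "domain:Production".
print(sorted(closure({"vsm:S1"})))

# No object combines Production with System 2, so the extent is empty.
print(extent({"domain:Production", "vsm:S2"}))
```

In this sketch any attribute set with an empty extent closes to the full attribute set, so every such combination collapses into the same bottom concept. For pairwise gaps such as (domain, VSM system), a direct cross-tabulation in the style of `find_empty_cells` may therefore be the more informative view.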
--- a/markitect/analysis/graph.py
+++ b/markitect/analysis/graph.py
@@ -0,0 +1,184 @@
+"""
+Graph analysis utilities for collection-level metrics.
+
+Provides connected components, centrality, community detection,
+modularity, degree distribution, and cohesion/coupling computation.
+
+Requires ``networkx`` (optional dependency)::
+
+    pip install networkx
+"""
+
+from __future__ import annotations
+
+from typing import Optional
+
+from markitect.prompts.dependencies.models import DependencyGraph
+
+
+def _require_networkx():
+    """Import and return networkx, raising a clear error if missing."""
+    try:
+        import networkx as nx
+        return nx
+    except ImportError:
+        raise ImportError(
+            "networkx is required for graph analysis. "
+            "Install it with: pip install networkx"
+        ) from None
+
+
+def to_networkx(graph: DependencyGraph):
+    """Convert a :class:`DependencyGraph` to a networkx ``DiGraph``.
+
+    Each edge carries an ``edge_type`` attribute (string value of the
+    :class:`EdgeType` enum, or ``None``).
+    """
+    nx = _require_networkx()
+    G = nx.DiGraph()
+    G.add_nodes_from(graph.nodes)
+    for node in graph.nodes:
+        for succ in graph.get_successors(node):
+            edge_type = graph.get_edge_type(node, succ)
+            G.add_edge(
+                node, succ,
+                edge_type=edge_type.value if edge_type else None,
+            )
+    return G
+
+
+def connected_components(graph: DependencyGraph) -> list[set[str]]:
+    """Find weakly connected components (edges treated as undirected).
+
+    Returns a list of node sets, one per component, sorted largest-first.
+    """
+    nx = _require_networkx()
+    G = to_networkx(graph)
+    components = list(nx.weakly_connected_components(G))
+    components.sort(key=len, reverse=True)
+    return [set(c) for c in components]
+
+
+def betweenness_centrality(graph: DependencyGraph) -> dict[str, float]:
+    """Compute betweenness centrality for all nodes.
+
+    Returns a dict mapping node ID to centrality score in [0, 1].
+    """
+    nx = _require_networkx()
+    G = to_networkx(graph)
+    return nx.betweenness_centrality(G)
+
+
+def detect_communities(
+    graph: DependencyGraph,
+    seed: Optional[int] = None,
+) -> list[set[str]]:
+    """Detect communities using the Louvain algorithm.
+
+    Operates on an undirected projection of the graph.  Returns a list
+    of node sets, one per community, sorted largest-first.
+
+    Args:
+        graph: The dependency graph to analyse.
+        seed: Random seed for reproducibility (passed to Louvain).
+    """
+    nx = _require_networkx()
+    G = to_networkx(graph).to_undirected()
+    if len(G.nodes) == 0:
+        return []
+    communities = list(nx.community.louvain_communities(G, seed=seed))
+    communities.sort(key=len, reverse=True)
+    return [set(c) for c in communities]
+
+
+def modularity_score(
+    graph: DependencyGraph,
+    communities: Optional[list[set[str]]] = None,
+    seed: Optional[int] = None,
+) -> float:
+    """Compute the modularity score for a community partition.
+
+    Args:
+        graph: The dependency graph.
+        communities: Pre-computed communities. If ``None``, communities
+            are detected via :func:`detect_communities`.
+        seed: Random seed (used only when *communities* is ``None``).
+
+    Returns:
+        Modularity in [-0.5, 1.0].  Returns 0.0 for graphs with no edges.
+    """
+    nx = _require_networkx()
+    G = to_networkx(graph).to_undirected()
+    if len(G.edges) == 0:
+        return 0.0
+    if communities is None:
+        communities = detect_communities(graph, seed=seed)
+    return nx.community.modularity(G, communities)
+
+
+def degree_distribution(graph: DependencyGraph) -> dict[str, dict[str, int]]:
+    """Compute in-degree, out-degree, and total degree for each node.
+
+    Returns::
+
+        {"node_id": {"in_degree": 2, "out_degree": 1, "total_degree": 3}, ...}
+    """
+    nx = _require_networkx()
+    G = to_networkx(graph)
+    result = {}
+    for node in G.nodes:
+        ind = G.in_degree(node)
+        outd = G.out_degree(node)
+        result[node] = {
+            "in_degree": ind,
+            "out_degree": outd,
+            "total_degree": ind + outd,
+        }
+    return result
+
+
+def cohesion_coupling(
+    graph: DependencyGraph,
+    communities: Optional[list[set[str]]] = None,
+    seed: Optional[int] = None,
+) -> dict:
+    """Compute cohesion (intra-community edges) and coupling (inter-community edges).
+
+    Args:
+        graph: The dependency graph.
+        communities: Pre-computed communities.  If ``None``, detected
+            via :func:`detect_communities`.
+        seed: Random seed (used only when *communities* is ``None``).
+
+    Returns:
+        Dict with keys ``cohesion``, ``coupling`` (ratios in [0, 1]),
+        ``intra_edges``, ``inter_edges``, ``total_edges``, ``communities``.
+    """
+    _require_networkx()
+    G = to_networkx(graph)
+    if communities is None:
+        communities = detect_communities(graph, seed=seed)
+
+    # Build node → community index
+    node_community: dict[str, int] = {}
+    for i, comm in enumerate(communities):
+        for node in comm:
+            node_community[node] = i
+
+    intra = 0
+    inter = 0
+    for u, v in G.edges:
+        if node_community.get(u) == node_community.get(v):
+            intra += 1
+        else:
+            inter += 1
+
+    total = intra + inter
+    return {
+        "cohesion": intra / total if total > 0 else 0.0,
+        "coupling": inter / total if total > 0 else 0.0,
+        "intra_edges": intra,
+        "inter_edges": inter,
+        "total_edges": total,
+        "communities": len(communities),
+    }
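Review note: the cohesion/coupling ratios computed in `cohesion_coupling` reduce to counting edges inside versus across communities. The sketch below reproduces that counting in plain Python without `networkx`; `cohesion_coupling_sketch`, the edge list, and the partition are hypothetical and serve only as a worked example.

```python
def cohesion_coupling_sketch(edges, communities):
    """Hypothetical stand-in: count intra- and inter-community edges.

    *edges* is a list of (source, target) pairs; *communities* is a list
    of node sets forming a partition.
    """
    # Map each node to the index of the community that contains it.
    node_community = {}
    for index, community in enumerate(communities):
        for node in community:
            node_community[node] = index

    intra = 0
    inter = 0
    for u, v in edges:
        if node_community.get(u) == node_community.get(v):
            intra += 1
        else:
            inter += 1

    total = intra + inter
    return {
        "cohesion": intra / total if total > 0 else 0.0,
        "coupling": inter / total if total > 0 else 0.0,
        "intra_edges": intra,
        "inter_edges": inter,
        "total_edges": total,
    }


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
communities = [{"a", "b", "c"}, {"d", "e"}]
result = cohesion_coupling_sketch(edges, communities)
print(result)
```

Only the `c -> d` edge crosses the partition, so three of four edges are intra-community. Note that, as in the function above, two nodes missing from every community both map to `None` and would be counted as an intra-community edge; callers may want to pass a complete partition.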
--- a/markitect/cli.py
+++ b/markitect/cli.py
@@ -7147,6 +7147,13 @@ try:
 except ImportError:
     pass  # Helper module not available

+# Register infospace commands
+try:
+    from markitect.infospace.cli import infospace_commands
+    cli.add_command(infospace_commands)
+except ImportError:
+    pass  # Infospace module not available
+
 # Register proxy file system commands
 try:
     from markitect.proxy.cli import proxy_group
--- a/markitect/core/__init__.py
+++ b/markitect/core/__init__.py
@@ -9,6 +9,7 @@ This package contains the fundamental building blocks:
 """

 from .parser import parse_markdown_to_ast
+from .section_tree import build_section_tree, extract_section_text
 from .serializer import ASTSerializer
 from .document_manager import DocumentManager, CleanDocumentManager
 from .workspace import (
@@ -29,6 +30,9 @@ from .workspace import (
 __all__ = [
     # Parser
     "parse_markdown_to_ast",
+    # Section tree
+    "build_section_tree",
+    "extract_section_text",
     # Serializer
     "ASTSerializer",
     # Document Manager
--- a/markitect/core/section_tree.py
+++ b/markitect/core/section_tree.py
@@ -0,0 +1,124 @@
+"""
+Standalone section-tree utilities extracted from SchemaGenerator.
+
+Builds a hierarchical section tree from flat markdown-it AST tokens and
+provides helpers for navigating heading structure and extracting text.
+These functions are used by both the schema generator and the infospace
+entity parser.
+"""
+
+import re
+from typing import Any, Dict, List, Optional
+
+
+def slugify(text: str) -> str:
+    """Convert heading or label text to a valid slug / JSON property key."""
+    replacements = {
+        'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
+        'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue', 'ß': 'ss',
+    }
+    slug = text
+    for char, repl in replacements.items():
+        slug = slug.replace(char, repl)
+    slug = slug.lower()
+    slug = re.sub(r'[^a-z0-9]+', '_', slug)
+    slug = slug.strip('_')
+    return slug or 'feld'
+
+
+def extract_heading_level(tag: str) -> int:
+    """Extract heading level from an HTML tag string (h1, h2, …)."""
+    if tag.startswith('h') and len(tag) == 2:
+        try:
+            return int(tag[1])
+        except ValueError:
+            pass
+    return 1
+
+
+def extract_heading_content(tokens: List[Dict[str, Any]], start_index: int) -> str:
+    """Return the inline text content following a ``heading_open`` token."""
+    for i in range(start_index, min(start_index + 3, len(tokens))):
+        token = tokens[i]
+        if token.get('type') == 'inline':
+            return token.get('content', '')
+    return ''
+
+
+def build_section_tree(
+    tokens: List[Dict[str, Any]], max_depth: Optional[int] = None
+) -> Dict[str, Any]:
+    """
+    Build a hierarchical section tree from a flat markdown-it token list.
+
+    Returns a root node whose ``children`` list contains the top-level
+    sections.  Each node carries:
+
+    - ``heading`` – heading text (``None`` for the root)
+    - ``level`` – heading depth (``0`` for the root)
+    - ``slug`` – slugified heading
+    - ``content_tokens`` – non-heading tokens belonging to this section
+    - ``children`` – nested sub-sections
+    """
+    root: Dict[str, Any] = {
+        'heading': None, 'level': 0, 'slug': '',
+        'content_tokens': [], 'children': []
+    }
+    stack = [root]
+
+    i = 0
+    while i < len(tokens):
+        token = tokens[i]
+        if token.get('type') == 'heading_open':
+            level = extract_heading_level(token.get('tag', ''))
+            heading_text = extract_heading_content(tokens, i)
+
+            if max_depth is not None and level > max_depth:
+                # Skip this heading and its close token, but keep content
+                i += 1
+                while i < len(tokens) and tokens[i].get('type') != 'heading_close':
+                    i += 1
+                i += 1
+                continue
+
+            section: Dict[str, Any] = {
+                'heading': heading_text,
+                'level': level,
+                'slug': slugify(heading_text),
+                'content_tokens': [],
+                'children': []
+            }
+
+            # Pop stack until we find the parent (level < current)
+            while len(stack) > 1 and stack[-1]['level'] >= level:
+                stack.pop()
+
+            stack[-1]['children'].append(section)
+            stack.append(section)
+
+            # Skip past heading_close
+            i += 1
+            while i < len(tokens) and tokens[i].get('type') != 'heading_close':
+                i += 1
+        else:
+            # Add content token to current section
+            stack[-1]['content_tokens'].append(token)
+
+        i += 1
+
+    return root
+
+
+def extract_section_text(section: Dict[str, Any]) -> str:
+    """
+    Return the plain text content of a section node.
+
+    Concatenates the ``content`` field of every ``inline`` token found
+    in the section's ``content_tokens``, joined with newlines.  Tokens
+    of any other type are ignored.
+    """
+    parts: List[str] = []
+    for token in section.get('content_tokens', []):
+        if token.get('type') == 'inline':
+            parts.append(token.get('content', ''))
+    return '\n'.join(parts)
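Review note: the nesting rule in `build_section_tree` (pop the stack until the top has a lower level, then attach) can be shown in isolation. The sketch below works on simplified `(level, heading)` pairs instead of markdown-it tokens; `nest_headings` and the sample headings are hypothetical.

```python
def nest_headings(headings):
    """Hypothetical helper: nest (level, text) pairs into a section tree."""
    root = {"heading": None, "level": 0, "children": []}
    stack = [root]
    for level, text in headings:
        node = {"heading": text, "level": level, "children": []}
        # Pop until the top of the stack is a strict ancestor (lower level).
        while len(stack) > 1 and stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


tree = nest_headings([
    (1, "Entity"),
    (2, "Definition"),
    (2, "Provenance"),
    (3, "Source"),
    (1, "Notes"),
])


def show(node, indent=0):
    """Print the tree as an indented outline."""
    for child in node["children"]:
        print(" " * indent + child["heading"])
        show(child, indent + 2)


show(tree)
```

Because the root sits at level 0 and is never popped, a document that opens with a deeper heading (for example an `h2` before any `h1`) still attaches cleanly under the root.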
--- a/markitect/infospace/__init__.py
+++ b/markitect/infospace/__init__.py
@@ -0,0 +1,107 @@
+"""
+Infospace analysis package.
+
+Provides tooling for extracting structured metadata from entity markdown
+files and analysing infospace collections.
+"""
+
+from .models import EntityMeta
+from .entity_parser import parse_entity_file, parse_entity_directory
+from .schema import (
+    ECONOMIC_ENTITY_SCHEMA,
+    EntitySchema,
+    EnumConstraint,
+    SectionRequirement,
+    SectionRule,
+)
+from .validator import (
+    BatchComplianceResult,
+    ComplianceDiagnostic,
+    ComplianceResult,
+    validate_entities,
+    validate_entity,
+)
+from .evaluation import (
+    EntityEvaluation,
+    EvaluationSnapshot,
+    MetricChange,
+    MetricValue,
+    ScoreChange,
+    ScoreEntry,
+    SnapshotDiff,
+)
+from .evaluation_io import (
+    append_to_history,
+    diff_snapshots,
+    read_entity_evaluation,
+    read_history,
+    read_snapshot,
+    write_entity_evaluation,
+    write_snapshot,
+)
+from .config import (
+    DisciplineBinding,
+    InfospaceConfig,
+    PipelineConfig,
+    PipelineStage,
+    SchemaRegistry,
+    TopicConfig,
+    ViabilityThreshold,
+    find_infospace_config,
+    load_infospace_config,
+    save_infospace_config,
+)
+from .state import (
+    InfospaceState,
+    ViabilityResult,
+    build_state,
+)
+
+__all__ = [
+    "EntityMeta",
+    "parse_entity_file",
+    "parse_entity_directory",
+    # Schema
+    "ECONOMIC_ENTITY_SCHEMA",
+    "EntitySchema",
+    "EnumConstraint",
+    "SectionRequirement",
+    "SectionRule",
+    # Validator
+    "BatchComplianceResult",
+    "ComplianceDiagnostic",
+    "ComplianceResult",
+    "validate_entities",
+    "validate_entity",
+    # Evaluation models
+    "EntityEvaluation",
+    "EvaluationSnapshot",
+    "MetricChange",
+    "MetricValue",
+    "ScoreChange",
+    "ScoreEntry",
+    "SnapshotDiff",
+    # Evaluation I/O
+    "append_to_history",
+    "diff_snapshots",
+    "read_entity_evaluation",
+    "read_history",
+    "read_snapshot",
+    "write_entity_evaluation",
+    "write_snapshot",
+    # Config
+    "DisciplineBinding",
+    "InfospaceConfig",
+    "PipelineConfig",
+    "PipelineStage",
+    "SchemaRegistry",
+    "TopicConfig",
+    "ViabilityThreshold",
+    "find_infospace_config",
+    "load_infospace_config",
+    "save_infospace_config",
+    # State
+    "InfospaceState",
+    "ViabilityResult",
+    "build_state",
+]
--- a/markitect/infospace/checks/__init__.py
+++ b/markitect/infospace/checks/__init__.py
@@ -0,0 +1,23 @@
+"""
+Collection-level quality checks for infospaces.
+
+Five concerns: Redundancy (C1), Coverage (C2), Coherence (C3),
+Consistency (C4), Granularity (C5).
+"""
+
+from markitect.infospace.checks.redundancy import check_redundancy
+from markitect.infospace.checks.coverage import check_coverage
+from markitect.infospace.checks.coherence import check_coherence
+from markitect.infospace.checks.consistency import check_consistency
+from markitect.infospace.checks.granularity import check_granularity
+from markitect.infospace.checks.orchestrator import run_all_checks, CheckReport
+
+__all__ = [
+    "check_redundancy",
+    "check_coverage",
+    "check_coherence",
+    "check_consistency",
+    "check_granularity",
+    "run_all_checks",
+    "CheckReport",
+]
--- a/markitect/infospace/checks/coherence.py
+++ b/markitect/infospace/checks/coherence.py
@@ -0,0 +1,81 @@
+"""
+C3 — Structural coherence.
+
+Uses graph analysis to check that the entity relationship graph is
+well-connected and has meaningful community structure.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Optional
+
+from markitect.prompts.dependencies.models import DependencyGraph
+
+
+@dataclass
+class CoherenceReport:
+    """Results from coherence analysis."""
+
+    connected_components: int = 0
+    largest_component_size: int = 0
+    modularity: float = 0.0
+    community_count: int = 0
+    cohesion: float = 0.0
+    coupling: float = 0.0
+    entity_count: int = 0
+
+    def to_dict(self) -> dict:
+        return {
+            "concern": "C3",
+            "connected_components": self.connected_components,
+            "largest_component_size": self.largest_component_size,
+            "modularity": round(self.modularity, 4),
+            "community_count": self.community_count,
+            "cohesion": round(self.cohesion, 4),
+            "coupling": round(self.coupling, 4),
+            "entity_count": self.entity_count,
+        }
+
+
+def check_coherence(
+    graph: Optional[DependencyGraph] = None,
+    entity_count: int = 0,
+) -> CoherenceReport:
+    """Check structural coherence of the entity relationship graph.
+
+    Args:
+        graph: The entity relationship graph.  If ``None``, returns
+            a report with zero values.
+        entity_count: Total number of entities (for context).
+
+    Returns:
+        :class:`CoherenceReport` with connectivity and community metrics.
+    """
+    if graph is None or len(graph.nodes) == 0:
+        return CoherenceReport(entity_count=entity_count)
+
+    try:
+        from markitect.analysis.graph import (
+            connected_components,
+            modularity_score,
+            detect_communities,
+            cohesion_coupling,
+        )
+    except ImportError:
+        return CoherenceReport(entity_count=entity_count)
+
+    components = connected_components(graph)
+    communities = detect_communities(graph, seed=42)
+    mod = modularity_score(graph, communities=communities)
+    cc = cohesion_coupling(graph, communities=communities)
+
+    return CoherenceReport(
+        connected_components=len(components),
+        largest_component_size=max((len(c) for c in components), default=0),  # no sort-order assumption
+        modularity=mod,
+        community_count=len(communities),
+        cohesion=cc["cohesion"],
+        coupling=cc["coupling"],
+        entity_count=entity_count or len(graph.nodes),
+    )
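The coherence check delegates its graph work to `markitect.analysis.graph`, which this diff does not show. As a rough illustration of what the `connected_components` metric measures, here is a minimal sketch that assumes an undirected graph given as a node list plus edge pairs. `components_of` is a hypothetical helper written for this note, not the library function, and the entity slugs are made up.

```python
from collections import defaultdict, deque


def components_of(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        queue = deque([start])
        seen.add(start)
        component = {start}
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical entity slugs: four linked concepts and one isolated one.
nodes = ["labour", "wages", "rent", "stock", "money"]
edges = [("labour", "wages"), ("rent", "stock"), ("stock", "wages")]
parts = components_of(nodes, edges)
largest = max((len(p) for p in parts), default=0)
print(len(parts), largest)  # prints: 2 4
```

The sketch takes the largest component with `max`, which does not rely on the components arriving sorted by size. A single component covering every entity is the well-connected case; each extra component is an island of entities with no relationship to the rest.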
--- a/markitect/infospace/checks/consistency.py
+++ b/markitect/infospace/checks/consistency.py
@@ -0,0 +1,58 @@
+"""
+C4 — Definitional consistency.
+
+Checks for cycles in the dependency graph. Detection of definitional
+conflicts between entities is not implemented yet.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import List, Optional
+
+from markitect.infospace.models import EntityMeta
+from markitect.prompts.dependencies.models import DependencyGraph
+
+
+@dataclass
+class ConsistencyReport:
+    """Results from consistency analysis."""
+
+    cycles: List[List[str]] = field(default_factory=list)
+    cycle_count: int = 0
+    entity_count: int = 0
+
+    def to_dict(self) -> dict:
+        return {
+            "concern": "C4",
+            "cycle_count": self.cycle_count,
+            "cycles": self.cycles,
+            "entity_count": self.entity_count,
+        }
+
+
+def check_consistency(
+    entities: List[EntityMeta],
+    graph: Optional[DependencyGraph] = None,
+) -> ConsistencyReport:
+    """Check definitional consistency.
+
+    Args:
+        entities: Entity metadata list.
+        graph: Optional dependency graph for cycle detection.
+
+    Returns:
+        :class:`ConsistencyReport` with cycles found.
+    """
+    n = len(entities)
+    cycles: List[List[str]] = []
+
+    if graph is not None and len(graph.nodes) > 0:
+        raw_cycles = graph.detect_cycles()
+        cycles = raw_cycles
+
+    return ConsistencyReport(
+        cycles=cycles,
+        cycle_count=len(cycles),
+        entity_count=n,
+    )
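`check_consistency` relies on `DependencyGraph.detect_cycles()`, whose implementation is outside this diff. The sketch below shows one common way to find a dependency cycle, a depth-first search with three node colours, under the assumption that the graph is a plain `{slug: [dependency, ...]}` dict. `find_cycle` and the slugs are hypothetical and are not claimed to match what `detect_cycles()` does internally.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of slugs, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for dep in graph.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle is the current path from `dep` onwards.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical definitions: "value" is defined via "labour", and so on back to "value".
graph = {
    "value": ["labour"],
    "labour": ["wages"],
    "wages": ["value"],
    "rent": ["value"],
}
print(find_cycle(graph))
```

A cycle here means a set of definitions that each presuppose another member of the set, which is what the `cycle_count` metric is meant to surface. The recursive form is fine for a few hundred entities; a very deep graph would need an explicit stack.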
--- a/markitect/infospace/checks/coverage.py
+++ b/markitect/infospace/checks/coverage.py
@@ -0,0 +1,111 @@
+"""
+C2 — Coverage completeness.
+
+Uses FCA and cross-tabulation to detect structural coverage gaps:
+attribute combinations (domain × source chapter) with no entities.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional
+
+from markitect.infospace.models import EntityMeta
+from markitect.analysis.fca import FormalContext, find_empty_cells, find_gap_concepts
+
+
+@dataclass
+class CoverageReport:
+    """Results from coverage analysis."""
+
+    coverage_ratio: float = 0.0
+    empty_cells: List[dict] = field(default_factory=list)
+    gap_concepts: List[dict] = field(default_factory=list)
+    domain_counts: Dict[str, int] = field(default_factory=dict)
+    entity_count: int = 0
+
+    def to_dict(self) -> dict:
+        return {
+            "concern": "C2",
+            "coverage_ratio": round(self.coverage_ratio, 4),
+            "empty_cells": self.empty_cells,
+            "gap_concepts_count": len(self.gap_concepts),
+            "domain_counts": self.domain_counts,
+            "entity_count": self.entity_count,
+        }
+
+
+def _extract_attributes(entity: EntityMeta) -> set[str]:
+    """Extract FCA attributes from an entity."""
+    attrs: set[str] = set()
+    if entity.domain:
+        attrs.add(f"domain:{entity.domain}")
+    if entity.source_chapter:
+        attrs.add(f"chapter:{entity.source_chapter}")
+    return attrs
+
+
+def check_coverage(
+    entities: List[EntityMeta],
+    extra_attributes: Optional[Dict[str, set[str]]] = None,
+) -> CoverageReport:
+    """Check coverage completeness using FCA gap analysis.
+
+    Args:
+        entities: Entity metadata list.
+        extra_attributes: Optional ``{slug: {attr, ...}}`` to merge
+            with auto-extracted attributes (e.g. VSM mappings).
+
+    Returns:
+        :class:`CoverageReport` with gaps and coverage ratio.
+    """
+    n = len(entities)
+    if n == 0:
+        return CoverageReport()
+
+    # Build entity → attributes mapping
+    entity_attrs: Dict[str, set[str]] = {}
+    for e in entities:
+        attrs = _extract_attributes(e)
+        if extra_attributes and e.slug in extra_attributes:
+            attrs.update(extra_attributes[e.slug])
+        entity_attrs[e.slug] = attrs
+
+    # Domain counts
+    domain_counts: Dict[str, int] = {}
+    for e in entities:
+        d = e.domain or "(unspecified)"
+        domain_counts[d] = domain_counts.get(d, 0) + 1
+
+    # Build FCA context
+    context = FormalContext.from_dict(entity_attrs)
+
+    # Cross-tabulation: domain × chapter
+    domains = sorted({a for attrs in entity_attrs.values() for a in attrs if a.startswith("domain:")})
+    chapters = sorted({a for attrs in entity_attrs.values() for a in attrs if a.startswith("chapter:")})
+
+    empty = []
+    if domains and chapters:
+        raw_empty = find_empty_cells(context, domains, chapters)
+        empty = [{"dimension_a": a, "dimension_b": b} for a, b in raw_empty]
+
+    # FCA gap concepts
+    gaps = find_gap_concepts(context)
+    gap_dicts = [
+        {"intent": sorted(g.intent), "extent_size": g.extent_size}
+        for g in gaps
+        if g.intent_size <= 4  # Only report manageable gaps
+    ]
+
+    # Coverage ratio: populated cells / total possible cells.
+    # With no domain × chapter grid there is nothing to cover, so report 0.0.
+    total_cells = len(domains) * len(chapters)
+    ratio = (total_cells - len(empty)) / total_cells if total_cells else 0.0
+
+    return CoverageReport(
+        coverage_ratio=ratio,
+        empty_cells=empty,
+        gap_concepts=gap_dicts,
+        domain_counts=domain_counts,
+        entity_count=n,
+    )
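The coverage ratio above is the share of populated cells in a domain × chapter grid. The following sketch reproduces that arithmetic without the FCA helpers, assuming each entity carries exactly one domain and one chapter. `crosstab_coverage` and the sample data are hypothetical and only illustrate the calculation.

```python
from itertools import product


def crosstab_coverage(rows):
    """Compute (ratio, empty_cells) for an iterable of (domain, chapter) pairs."""
    pairs = set(rows)
    domains = sorted({d for d, _ in pairs})
    chapters = sorted({c for _, c in pairs})
    cells = list(product(domains, chapters))
    empty = [cell for cell in cells if cell not in pairs]
    # An empty grid has nothing to cover, so it is reported as 0.0 here.
    ratio = (len(cells) - len(empty)) / len(cells) if cells else 0.0
    return ratio, empty


# Hypothetical entities: 3 domains × 3 chapters, 4 of the 9 cells populated.
entities = [
    ("labour", "ch1"),
    ("labour", "ch2"),
    ("money", "ch1"),
    ("trade", "ch3"),
]
ratio, empty = crosstab_coverage(entities)
print(round(ratio, 4), len(empty))  # prints: 0.4444 5
```

Because the grid is built only from domains and chapters that already appear, a low ratio can simply mean that domains cluster by chapter rather than that material is missing. The empty-cell list is the part worth reading by hand.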
--- a/markitect/infospace/checks/granularity.py
+++ b/markitect/infospace/checks/granularity.py
@@ -0,0 +1,98 @@
+"""
+C5 — Granularity balance.
+
+Checks that entities are at a consistent level of abstraction,
+measured by word count distribution and Shannon entropy of domain
+assignments.
+"""
+
+from __future__ import annotations
+
+import math
+from dataclasses import dataclass, field
+from typing import Dict, List
+
+from markitect.infospace.models import EntityMeta
+
+
+@dataclass
+class GranularityReport:
+    """Results from granularity analysis."""
+
+    domain_entropy: float = 0.0
+    word_count_stats: Dict[str, float] = field(default_factory=dict)
+    domain_distribution: Dict[str, int] = field(default_factory=dict)
+    entity_count: int = 0
+
+    def to_dict(self) -> dict:
+        return {
+            "concern": "C5",
+            "domain_entropy": round(self.domain_entropy, 4),
+            "word_count_stats": {
+                k: round(v, 2) for k, v in self.word_count_stats.items()
+            },
+            "domain_distribution": self.domain_distribution,
+            "entity_count": self.entity_count,
+        }
+
+
+def _shannon_entropy(counts: Dict[str, int]) -> float:
+    """Compute Shannon entropy of a distribution."""
+    total = sum(counts.values())
+    if total == 0:
+        return 0.0
+    entropy = 0.0
+    for count in counts.values():
+        if count > 0:
+            p = count / total
+            entropy -= p * math.log2(p)
+    return entropy
+
+
+def check_granularity(entities: List[EntityMeta]) -> GranularityReport:
+    """Check granularity balance across entities.
+
+    Metrics:
+    - Domain entropy: higher = more balanced distribution.
+    - Word count statistics: mean, min, max, std dev.
+
+    Args:
+        entities: Entity metadata list.
+
+    Returns:
+        :class:`GranularityReport` with balance metrics.
+    """
+    n = len(entities)
+    if n == 0:
+        return GranularityReport()
+
+    # Domain distribution
+    domain_counts: Dict[str, int] = {}
+    for e in entities:
+        d = e.domain or "(unspecified)"
+        domain_counts[d] = domain_counts.get(d, 0) + 1
+
+    entropy = _shannon_entropy(domain_counts)
+
+    # Word count statistics
+    word_counts = [e.definition_word_count for e in entities]
+    if not word_counts:
+        word_counts = [0]
+
+    mean_wc = sum(word_counts) / len(word_counts)
+    min_wc = min(word_counts)
+    max_wc = max(word_counts)
+    variance = sum((wc - mean_wc) ** 2 for wc in word_counts) / len(word_counts)
+    std_wc = math.sqrt(variance)
+
+    return GranularityReport(
+        domain_entropy=entropy,
+        word_count_stats={
+            "mean": mean_wc,
+            "min": float(min_wc),
+            "max": float(max_wc),
+            "std": std_wc,
+        },
+        domain_distribution=domain_counts,
+        entity_count=n,
+    )
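`domain_entropy` is a raw Shannon entropy in bits, so its ceiling is `log2(k)` for `k` domains and the value grows as domains are added. The sketch below repeats the entropy calculation and adds a normalised variant for comparing snapshots with different domain counts. `normalised_entropy` is a suggestion made here, not something the commit implements, and the domain counts are invented.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a {category: count} distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def normalised_entropy(counts):
    """Entropy divided by its maximum, log2(k), for k non-empty categories."""
    k = sum(1 for c in counts.values() if c > 0)
    if k < 2:
        return 0.0
    return shannon_entropy(counts) / math.log2(k)


# Hypothetical domain distributions over 20 entities each.
balanced = {"labour": 5, "money": 5, "trade": 5, "rent": 5}
skewed = {"labour": 17, "money": 1, "trade": 1, "rent": 1}

print(shannon_entropy(balanced))              # prints: 2.0
print(round(normalised_entropy(skewed), 2))   # well below 1.0
```

With four equally sized domains the entropy reaches its maximum of 2 bits. The skewed distribution has the same four domains but scores far lower, which is the imbalance the C5 metric is meant to flag.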
--- a/markitect/infospace/checks/orchestrator.py
+++ b/markitect/infospace/checks/orchestrator.py
@@ -0,0 +1,102 @@
+"""
+Unified orchestrator for all five collection-level checks.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any, Dict, List, Optional
+
+from markitect.infospace.models import EntityMeta
+from markitect.prompts.dependencies.models import DependencyGraph
+
+from .redundancy import RedundancyReport, check_redundancy
+from .coverage import CoverageReport, check_coverage
+from .coherence import CoherenceReport, check_coherence
+from .consistency import ConsistencyReport, check_consistency
+from .granularity import GranularityReport, check_granularity
+
+
+@dataclass
+class CheckReport:
+    """Unified report from all five collection-level checks."""
+
+    redundancy: Optional[RedundancyReport] = None
+    coverage: Optional[CoverageReport] = None
+    coherence: Optional[CoherenceReport] = None
+    consistency: Optional[ConsistencyReport] = None
+    granularity: Optional[GranularityReport] = None
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {}
+        if self.redundancy:
+            d["redundancy"] = self.redundancy.to_dict()
+        if self.coverage:
+            d["coverage"] = self.coverage.to_dict()
+        if self.coherence:
+            d["coherence"] = self.coherence.to_dict()
+        if self.consistency:
+            d["consistency"] = self.consistency.to_dict()
+        if self.granularity:
+            d["granularity"] = self.granularity.to_dict()
+        return d
+
+    def metrics(self) -> Dict[str, float]:
+        """Extract key metrics for viability checking."""
+        m: Dict[str, float] = {}
+        if self.redundancy:
+            m["redundancy_ratio"] = self.redundancy.redundancy_ratio
+        if self.coverage:
+            m["coverage_ratio"] = self.coverage.coverage_ratio
+        if self.coherence:
+            m["coherence_components"] = float(self.coherence.connected_components)
+            m["modularity"] = self.coherence.modularity
+        if self.consistency:
+            m["consistency_cycles"] = float(self.consistency.cycle_count)
+        if self.granularity:
+            m["granularity_entropy"] = self.granularity.domain_entropy
+        return m
+
+
+def run_all_checks(
+    entities: List[EntityMeta],
+    embeddings: Optional[Dict[str, list[float]]] = None,
+    graph: Optional[DependencyGraph] = None,
+    extra_attributes: Optional[Dict[str, set[str]]] = None,
+    checks: Optional[List[str]] = None,
+) -> CheckReport:
+    """Run all (or selected) collection-level checks.
+
+    Args:
+        entities: Entity metadata list.
+        embeddings: Pre-computed embedding vectors for C1.
+        graph: Entity relationship graph for C3 and C4.
+        extra_attributes: Extra FCA attributes for C2.
+        checks: List of check names to run.  If ``None``, runs all five.
+            Valid names: ``redundancy``, ``coverage``, ``coherence``,
+            ``consistency``, ``granularity``.
+
+    Returns:
+        :class:`CheckReport` with results from each check.
+    """
+    run_all = checks is None
+    check_set = set(checks) if checks else set()
+
+    report = CheckReport()
+
+    if run_all or "redundancy" in check_set:
+        report.redundancy = check_redundancy(entities, embeddings=embeddings)
+
+    if run_all or "coverage" in check_set:
+        report.coverage = check_coverage(entities, extra_attributes=extra_attributes)
+
+    if run_all or "coherence" in check_set:
+        report.coherence = check_coherence(graph=graph, entity_count=len(entities))
+
+    if run_all or "consistency" in check_set:
+        report.consistency = check_consistency(entities, graph=graph)
+
+    if run_all or "granularity" in check_set:
+        report.granularity = check_granularity(entities)
+
+    return report
--- a/markitect/infospace/checks/redundancy.py
+++ b/markitect/infospace/checks/redundancy.py
@@ -0,0 +1,98 @@
+"""
+C1 — Redundancy detection.
+
+Uses embedding similarity to find entity pairs with overlapping
+meanings that may be candidates for merging.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional
+
+from markitect.infospace.models import EntityMeta
+from markitect.llm.similarity import find_similar_pairs
+
+
+@dataclass
+class RedundancyReport:
+    """Results from redundancy analysis."""
+
+    similar_pairs: List[dict] = field(default_factory=list)
+    redundancy_ratio: float = 0.0
+    entity_count: int = 0
+
+    def to_dict(self) -> dict:
+        return {
+            "concern": "C1",
+            "redundancy_ratio": round(self.redundancy_ratio, 4),
+            "similar_pairs": self.similar_pairs,
+            "entity_count": self.entity_count,
+        }
+
+
+def check_redundancy(
+    entities: List[EntityMeta],
+    embeddings: Optional[Dict[str, list[float]]] = None,
+    threshold: float = 0.85,
+) -> RedundancyReport:
+    """Check for redundant entities using embedding similarity.
+
+    Args:
+        entities: Entity metadata list.
+        embeddings: Pre-computed ``{slug: vector}`` mapping.
+            If ``None`` or empty, falls back to definition-word overlap.
+        threshold: Similarity threshold for flagging pairs.
+
+    Returns:
+        :class:`RedundancyReport` with similar pairs and ratio.
+    """
+    n = len(entities)
+    if n < 2:
+        return RedundancyReport(entity_count=n)
+
+    pairs: list[dict] = []
+
+    if embeddings:
+        # Embedding-based similarity
+        raw_pairs = find_similar_pairs(embeddings, threshold=threshold)
+        for slug_a, slug_b, sim in raw_pairs:
+            pairs.append({
+                "entity_a": slug_a,
+                "entity_b": slug_b,
+                "similarity": round(sim, 4),
+                "method": "embedding",
+            })
+    else:
+        # Fallback: structural overlap (shared definition words)
+        slug_to_words = {}
+        for e in entities:
+            words = set(e.definition.lower().split()) if e.definition else set()
+            slug_to_words[e.slug] = words
+
+        slugs = sorted(slug_to_words)
+        for i, a in enumerate(slugs):
+            for b in slugs[i + 1:]:
+                wa, wb = slug_to_words[a], slug_to_words[b]
+                if wa and wb:
+                    overlap = len(wa & wb) / min(len(wa), len(wb))
+                    if overlap >= threshold:
+                        pairs.append({
+                            "entity_a": a,
+                            "entity_b": b,
+                            "similarity": round(overlap, 4),
+                            "method": "word_overlap",
+                        })
+
+    # redundancy_ratio: fraction of entities involved in similar pairs
+    involved = set()
+    for p in pairs:
+        involved.add(p["entity_a"])
+        involved.add(p["entity_b"])
+    ratio = len(involved) / n if n > 0 else 0.0
+
+    return RedundancyReport(
+        similar_pairs=pairs,
+        redundancy_ratio=ratio,
+        entity_count=n,
+    )
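The fallback path scores a pair by shared words divided by the size of the smaller word set, a measure usually called the overlap coefficient. The sketch below contrasts it with Jaccard similarity to show how it behaves when one definition is much shorter than the other. Both helper functions and the sample definitions are hypothetical, and Jaccard is shown only for comparison since the commit does not use it.

```python
def _words(text):
    """Lower-cased whitespace tokens, as a set."""
    return set(text.lower().split())


def overlap_coefficient(text_a, text_b):
    """Shared words divided by the size of the smaller word set."""
    words_a, words_b = _words(text_a), _words(text_b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))


def jaccard(text_a, text_b):
    """Shared words divided by the size of the combined word set."""
    words_a, words_b = _words(text_a), _words(text_b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


# Hypothetical definitions: the short one is wholly contained in the long one.
short = "the price of labour"
long = "wages are the price of labour paid by the employer to the workman"

print(overlap_coefficient(short, long))   # prints: 1.0
print(round(jaccard(short, long), 4))     # prints: 0.3636
```

Under the overlap coefficient a short definition contained in a longer one scores 1.0 and would be flagged at the default 0.85 threshold, even though the longer definition says considerably more. Common words such as "the" and "of" also count toward the overlap, so word-overlap pairs are best treated as candidates for review rather than confirmed duplicates.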
--- a/markitect/infospace/cli.py
+++ b/markitect/infospace/cli.py
@@ -0,0 +1,524 @@
+"""
+CLI commands for infospace lifecycle management.
+
+Provides ``markitect infospace`` subcommands for initialising,
+inspecting, and evaluating infospaces.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Optional
+
+import click
+
+from markitect.infospace.config import (
+    DisciplineBinding,
+    InfospaceConfig,
+    SchemaRegistry,
+    TopicConfig,
+    find_infospace_config,
+    load_infospace_config,
+    save_infospace_config,
+)
+from markitect.infospace.entity_parser import parse_entity_directory
+from markitect.infospace.state import build_state
+
+
+def _load_config_or_exit(config_path: Optional[str] = None) -> tuple:
+    """Resolve and load infospace.yaml, or exit with an error."""
+    if config_path:
+        p = Path(config_path)
+    else:
+        p = find_infospace_config()
+    if p is None:
+        click.echo("Error: No infospace.yaml found. Run 'markitect infospace init' first.", err=True)
+        raise SystemExit(1)
+    cfg = load_infospace_config(p)
+    return cfg, p
+
+
+@click.group(name="infospace")
+def infospace_commands():
+    """Manage infospaces — create, inspect, evaluate."""
+    pass
+
+
+# ── init ─────────────────────────────────────────────────────────────
+
+
+@infospace_commands.command()
+@click.option("--topic", required=True, help="Topic name for the infospace.")
+@click.option("--domain", default="", help="Knowledge domain.")
+@click.option("--sources", default="", help="Path to source material directory.")
+@click.option("--discipline", multiple=True, help="Discipline name (repeatable).")
+@click.option("--output", "-o", default="infospace.yaml", help="Output config file path.")
+def init(topic: str, domain: str, sources: str, discipline: tuple, output: str):
+    """Initialise a new infospace configuration file."""
+    out_path = Path(output)
+    if out_path.exists():
+        click.echo(f"Error: {out_path} already exists.", err=True)
+        raise SystemExit(1)
+
+    disciplines = [DisciplineBinding(name=d) for d in discipline]
+    config = InfospaceConfig(
+        topic=TopicConfig(name=topic, domain=domain, sources=sources),
+        disciplines=disciplines,
+    )
+    save_infospace_config(config, out_path)
+    click.echo(f"Created {out_path}")
+
+
+# ── status ───────────────────────────────────────────────────────────
+
+
+@infospace_commands.command()
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+def status(config_path: Optional[str]):
+    """Show infospace status — entity count, domains, evaluation state."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+    root = cfg_path.parent
+
+    # Parse entities
+    entities_dir = root / cfg.entities_dir
+    entities = []
+    if entities_dir.is_dir():
+        entities = parse_entity_directory(entities_dir)
+
+    # Load latest snapshot if available
+    snapshot = None
+    history_path = root / cfg.metrics_dir / "history.yaml"
+    if history_path.is_file():
+        from markitect.infospace.evaluation_io import read_history
+        history = read_history(history_path)
+        if history:
+            snapshot = history[-1]
+
+    state = build_state(cfg, entities=entities, snapshot=snapshot)
+
+    click.echo(f"Infospace: {state.topic_name}")
+    if cfg.topic.domain:
+        click.echo(f"Domain:    {cfg.topic.domain}")
+    click.echo(f"Entities:  {state.entity_count}")
+    if state.domains:
+        click.echo(f"Domains:   {', '.join(state.domains)}")
+    if cfg.disciplines:
+        names = [d.name for d in cfg.disciplines]
+        click.echo(f"Disciplines: {', '.join(names)}")
+    if state.has_evaluations:
+        click.echo(f"Last evaluated: {state.latest_snapshot.created_at.isoformat()}")
+    else:
+        click.echo("Evaluations: none")
+
+
+# ── entities ─────────────────────────────────────────────────────────
+
+
+@infospace_commands.command()
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+@click.option(
+    "--sort-by", "sort_key",
+    type=click.Choice(["slug", "domain", "words"]),
+    default="slug",
+    help="Sort entities by field.",
+)
+def entities(config_path: Optional[str], sort_key: str):
+    """List entities with metadata summary."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+    root = cfg_path.parent
+    entities_dir = root / cfg.entities_dir
+
+    if not entities_dir.is_dir():
+        click.echo("No entities directory found.")
+        return
+
+    entity_list = parse_entity_directory(entities_dir)
+    if not entity_list:
+        click.echo("No entities found.")
+        return
+
+    # Sort
+    if sort_key == "domain":
+        entity_list.sort(key=lambda e: (e.domain or "", e.slug))
+    elif sort_key == "words":
+        entity_list.sort(key=lambda e: e.total_word_count, reverse=True)
+    else:
+        entity_list.sort(key=lambda e: e.slug)
+
+    # Format as table
+    click.echo(f"{'Slug':<40} {'Domain':<20} {'Words':>6}")
+    click.echo("-" * 68)
+    for e in entity_list:
+        click.echo(f"{e.slug:<40} {(e.domain or '-'):<20} {e.total_word_count:>6}")
+    click.echo(f"\nTotal: {len(entity_list)} entities")
+
+
+# ── evaluate ─────────────────────────────────────────────────────────
+
+
+@infospace_commands.command()
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+@click.option("--provider", default="openrouter", help="LLM provider (openrouter, openai, etc.).")
+@click.option("--model", default=None, help="LLM model name.")
+@click.option("--entity", "entity_slug", default=None, help="Evaluate a single entity by slug.")
+@click.option("--chapter", default=None, help="Evaluate entities from a specific chapter.")
+def evaluate(config_path, provider, model, entity_slug, chapter):
+    """Evaluate entities using LLM-based quality assessment."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+    root = cfg_path.parent
+
+    entities_dir = root / cfg.entities_dir
+    if not entities_dir.is_dir():
+        click.echo("Error: No entities directory found.", err=True)
+        raise SystemExit(1)
+
+    entity_list = parse_entity_directory(entities_dir)
+    if not entity_list:
+        click.echo("No entities to evaluate.")
+        return
+
+    # Filter
+    if entity_slug:
+        entity_list = [e for e in entity_list if e.slug == entity_slug]
+        if not entity_list:
+            click.echo(f"Error: Entity '{entity_slug}' not found.", err=True)
+            raise SystemExit(1)
+    elif chapter:
+        entity_list = [e for e in entity_list if chapter in (e.source_chapter or "")]
+        if not entity_list:
+            click.echo(f"No entities found for chapter '{chapter}'.")
+            return
+
+    # Create adapter
+    from markitect.llm import create_adapter
+    from markitect.prompts.execution.models import RunConfig
+    adapter = create_adapter(provider, model=model)
+    run_config = RunConfig(model_name=model or "default", temperature=0.3, max_tokens=2000)
+
+    # Progress callback
+    def on_progress(done, total, result):
+        status = result.status.upper()
+        click.echo(f"  [{done}/{total}] {result.key}: {status}")
+
+    click.echo(f"Evaluating {len(entity_list)} entities via {provider}...")
+
+    from markitect.infospace.evaluate import run_entity_evaluation
+    output_dir = root / cfg.evaluations_dir
+    summary = run_entity_evaluation(
+        config=cfg,
+        entities=entity_list,
+        adapter=adapter,
+        run_config=run_config,
+        output_dir=output_dir,
+        progress_callback=on_progress,
+    )
+
+    click.echo(f"\nDone: {summary.succeeded} succeeded, {summary.failed} failed, {summary.skipped} skipped")
+    if summary.total_tokens > 0:
+        click.echo(f"Tokens used: {summary.total_tokens}")
+
+
+# ── viability ────────────────────────────────────────────────────────
+
+
+@infospace_commands.command()
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+def viability(config_path: Optional[str]):
+    """Show viability dashboard — threshold checks and pass/fail."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+
+    if not cfg.viability:
+        click.echo("No viability thresholds configured in infospace.yaml.")
+        return
+
+    # Try to load latest metrics
+    root = cfg_path.parent
+    metrics: dict = {}
+    metrics_file = root / cfg.metrics_dir / "metrics.yaml"
+    if metrics_file.is_file():
+        import yaml
+        raw = yaml.safe_load(metrics_file.read_text(encoding="utf-8"))
+        if isinstance(raw, dict):
+            metrics = {k: float(v) for k, v in raw.items() if isinstance(v, (int, float))}
+
+    state = build_state(cfg, metrics=metrics if metrics else None)
+
+    if not state.viability_results:
+        click.echo("No metrics available. Run evaluations first.")
+        click.echo("\nConfigured thresholds:")
+        for name, t in cfg.viability.items():
+            bounds = []
+            if t.min is not None:
+                bounds.append(f"min={t.min}")
+            if t.max is not None:
+                bounds.append(f"max={t.max}")
+            click.echo(f"  {name}: {', '.join(bounds)}")
+        return
+
+    click.echo(f"{'Metric':<30} {'Value':>8} {'Threshold':>15} {'Status':>8}")
+    click.echo("-" * 63)
+    for r in state.viability_results:
+        bounds = []
+        if r.threshold.min is not None:
+            bounds.append(f"min={r.threshold.min}")
+        if r.threshold.max is not None:
+            bounds.append(f"max={r.threshold.max}")
+        status_str = "PASS" if r.passed else "FAIL"
+        click.echo(
+            f"{r.metric:<30} {r.value:>8.4f} {', '.join(bounds):>15} {status_str:>8}"
+        )
+
+    click.echo()
+    if state.is_viable:
+        click.echo(f"Viable: YES ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
+    else:
+        click.echo(f"Viable: NO ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
+
+
+# ── check ───────────────────────────────────────────────────────────
+
+
+@infospace_commands.command()
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+@click.option(
+    "--concern", "concerns", multiple=True,
+    type=click.Choice(["redundancy", "coverage", "coherence", "consistency", "granularity"]),
+    help="Run specific concern(s). Omit to run all five.",
+)
+@click.option("--json", "as_json", is_flag=True, help="Output results as JSON.")
+def check(config_path: Optional[str], concerns: tuple, as_json: bool):
+    """Run collection-level quality checks (C1–C5)."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+    root = cfg_path.parent
+
+    entities_dir = root / cfg.entities_dir
+    if not entities_dir.is_dir():
+        click.echo("Error: No entities directory found.", err=True)
+        raise SystemExit(1)
+
+    entity_list = parse_entity_directory(entities_dir)
+    if not entity_list:
+        click.echo("No entities to check.")
+        return
+
+    from markitect.infospace.checks import run_all_checks
+
+    checks_list = list(concerns) if concerns else None
+
+    report = run_all_checks(
+        entities=entity_list,
+        checks=checks_list,
+    )
+
+    if as_json:
+        import json
+        click.echo(json.dumps(report.to_dict(), indent=2))
+    else:
+        click.echo(f"Collection checks — {len(entity_list)} entities\n")
+        d = report.to_dict()
+        for concern_name, concern_data in d.items():
+            label = concern_data.get("concern", concern_name.upper())
+            click.echo(f"  {label} — {concern_name}")
+            for k, v in concern_data.items():
+                if k == "concern":
+                    continue
+                click.echo(f"    {k}: {v}")
+            click.echo()
+
+    # Show summary metrics
+    m = report.metrics()
+    if m and not as_json:
+        click.echo("Metrics summary:")
+        for k, v in sorted(m.items()):
+            click.echo(f"  {k}: {v:.4f}")
+
+    # Record to history
+    if m:
+        from markitect.infospace.history import record_check_results
+        snap = record_check_results(report, cfg, root, entity_count=len(entity_list))
+        if not as_json:
+            click.echo(f"\nRecorded snapshot {snap.snapshot_id}")
+
+
+# ── history ─────────────────────────────────────────────────────────
+
+
+@infospace_commands.command()
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+@click.option("--metric", default=None, help="Show trend for a specific metric.")
+@click.option("--json", "as_json", is_flag=True, help="Output as JSON.")
+def history(config_path: Optional[str], metric: Optional[str], as_json: bool):
+    """Show metrics history — snapshots over time."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+    root = cfg_path.parent
+
+    from markitect.infospace.history import get_history, metric_trend
+
+    snapshots = get_history(cfg, root)
+    if not snapshots:
+        click.echo("No history found. Run 'markitect infospace check' first.")
+        return
+
+    if metric:
+        trend = metric_trend(snapshots, metric)
+        if not trend:
+            click.echo(f"No data for metric '{metric}'.")
+            return
+        if as_json:
+            import json
+            click.echo(json.dumps(trend, indent=2))
+        else:
+            click.echo(f"Trend: {metric}\n")
+            for entry in trend:
+                click.echo(f"  {entry['date'][:19]}  {entry['value']:.4f}")
+        return
+
+    if as_json:
+        import json
+        click.echo(json.dumps([s.to_dict() for s in snapshots], indent=2, default=str))
+        return
+
+    click.echo(f"History: {len(snapshots)} snapshot(s)\n")
+    click.echo(f"{'#':<4} {'Date':<20} {'Entities':>8} {'Metrics':>8}")
+    click.echo("-" * 42)
+    for i, snap in enumerate(snapshots, 1):
+        date_str = snap.created_at.isoformat()[:19]
+        n_metrics = len(snap.collection_metrics)
+        click.echo(f"{i:<4} {date_str:<20} {snap.entity_count:>8} {n_metrics:>8}")
+
+
+@infospace_commands.command(name="history-diff")
+@click.argument("date_a")
+@click.argument("date_b")
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+def history_diff(date_a: str, date_b: str, config_path: Optional[str]):
+    """Compare two history snapshots by date (YYYY-MM-DD)."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+    root = cfg_path.parent
+
+    from markitect.infospace.history import find_snapshot_by_date, get_history
+    from markitect.infospace.evaluation_io import diff_snapshots
+
+    snapshots = get_history(cfg, root)
+    if len(snapshots) < 2:
+        click.echo("Need at least two snapshots to diff.")
+        return
+
+    snap_a = find_snapshot_by_date(snapshots, date_a)
+    snap_b = find_snapshot_by_date(snapshots, date_b)
+
+    if snap_a is None:
+        click.echo(f"No snapshot found near '{date_a}'.")
+        return
+    if snap_b is None:
+        click.echo(f"No snapshot found near '{date_b}'.")
+        return
+    if snap_a.snapshot_id == snap_b.snapshot_id:
+        click.echo("Both dates resolve to the same snapshot.")
+        return
+
+    diff = diff_snapshots(snap_a, snap_b)
+    click.echo(diff.summary())
+
+
+# ── bind-discipline ─────────────────────────────────────────────────
+
+
+@infospace_commands.command(name="bind-discipline")
+@click.argument("discipline_path")
+@click.option("--name", required=True, help="Name for the discipline.")
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+def bind_discipline_cmd(discipline_path: str, name: str, config_path: Optional[str]):
+    """Bind a discipline infospace to the current infospace."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+    root = cfg_path.parent
+
+    from markitect.infospace.composition import bind_discipline
+
+    status = bind_discipline(cfg, name=name, path=discipline_path, root=root)
+
+    if status.error:
+        click.echo(f"Error: {status.error}", err=True)
+        raise SystemExit(1)
+
+    # Persist updated config
+    save_infospace_config(cfg, cfg_path)
+
+    click.echo(f"Bound discipline '{name}' from {discipline_path}")
+    click.echo(f"  Entities: {status.entity_count}")
+    if status.has_config:
+        viable_str = "YES" if status.is_viable else "NO"
+        click.echo(f"  Viable: {viable_str}")
+
+
+# ── disciplines ─────────────────────────────────────────────────────
+
+
+@infospace_commands.command()
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+def disciplines(config_path: Optional[str]):
+    """List bound disciplines and their viability status."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+    root = cfg_path.parent
+
+    if not cfg.disciplines:
+        click.echo("No disciplines bound.")
+        return
+
+    from markitect.infospace.composition import check_discipline_status
+
+    click.echo(f"{'Name':<30} {'Entities':>8} {'Viable':>8} {'Path'}")
+    click.echo("-" * 70)
+    for binding in cfg.disciplines:
+        status = check_discipline_status(binding, root)
+        viable_str = "YES" if status.is_viable else ("NO" if status.has_config else "?")
+        click.echo(
+            f"{status.name:<30} {status.entity_count:>8} {viable_str:>8} {status.path}"
+        )
+        if status.error:
+            click.echo(f"  Error: {status.error}")
+
+
+# ── stale-mappings ──────────────────────────────────────────────────
+
+
+@infospace_commands.command(name="stale-mappings")
+@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
+def stale_mappings(config_path: Optional[str]):
+    """Check for stale mappings due to discipline changes."""
+    cfg, cfg_path = _load_config_or_exit(config_path)
+    root = cfg_path.parent
+
+    if not cfg.disciplines:
+        click.echo("No disciplines bound — no mappings to check.")
+        return
+
+    from markitect.infospace.composition import find_stale_mappings
+
+    # Try to load mapping references from output
+    mapping_refs = _load_mapping_references(cfg, root)
+
+    stale = find_stale_mappings(cfg, root, mapping_references=mapping_refs)
+
+    if not stale:
+        click.echo("No stale mappings detected.")
+        return
+
+    click.echo(f"Found {len(stale)} stale mapping(s):\n")
+    for s in stale:
+        click.echo(f"  {s.entity_slug} -> {s.discipline_entity}")
+        click.echo(f"    {s.reason}")
+
+
+def _load_mapping_references(
+    cfg: InfospaceConfig, root: Path
+) -> Optional[dict]:
+    """Load ``mapping-references.yaml`` from the metrics dir, or return ``None``."""
+    mapping_file = root / cfg.metrics_dir / "mapping-references.yaml"
+    if not mapping_file.is_file():
+        return None
+    import yaml
+    data = yaml.safe_load(mapping_file.read_text(encoding="utf-8"))
+    if isinstance(data, dict):
+        return data
+    return None
--- a/markitect/infospace/composition.py
+++ b/markitect/infospace/composition.py
@@ -0,0 +1,281 @@
+"""
+Infospace composition model.
+
+Allows one infospace to use another as a discipline — a reusable
+framework of concepts applied as an analytical lens.
+
+Key operations:
+- Resolve and validate discipline bindings
+- Check discipline viability (must meet its own thresholds)
+- List discipline entities as mapping targets
+- Detect stale mappings when discipline content changes
+"""
+
+from __future__ import annotations
+
+import hashlib
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+from markitect.infospace.config import (
+    DisciplineBinding,
+    InfospaceConfig,
+    load_infospace_config,
+)
+from markitect.infospace.entity_parser import parse_entity_directory
+from markitect.infospace.history import get_latest_snapshot, read_metrics_file
+from markitect.infospace.models import EntityMeta
+from markitect.infospace.state import InfospaceState, ViabilityResult, build_state
+
+
+@dataclass
+class DisciplineStatus:
+    """Status of a bound discipline infospace."""
+
+    name: str
+    path: str
+    resolved_path: Optional[Path] = None
+    exists: bool = False
+    has_config: bool = False
+    entity_count: int = 0
+    is_viable: bool = False
+    viability_results: List[ViabilityResult] = field(default_factory=list)
+    error: str = ""
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {
+            "name": self.name,
+            "path": self.path,
+            "exists": self.exists,
+            "has_config": self.has_config,
+            "entity_count": self.entity_count,
+            "is_viable": self.is_viable,
+        }
+        if self.viability_results:
+            d["viability"] = [r.to_dict() for r in self.viability_results]
+        if self.error:
+            d["error"] = self.error
+        return d
+
+
+@dataclass
+class StaleMappingInfo:
+    """Information about a mapping that may be stale."""
+
+    entity_slug: str
+    discipline_entity: str
+    reason: str
+
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "entity_slug": self.entity_slug,
+            "discipline_entity": self.discipline_entity,
+            "reason": self.reason,
+        }
+
+
+# ── Resolution ───────────────────────────────────────────────────────
+
+
+def resolve_discipline_path(
+    binding: DisciplineBinding, root: Path
+) -> Optional[Path]:
+    """Resolve a discipline binding to an absolute path.
+
+    Tries the binding's path relative to *root*, then as an absolute path.
+    Returns ``None`` if the directory doesn't exist.
+    """
+    if not binding.path:
+        return None
+
+    # Try relative to root first
+    candidate = root / binding.path
+    if candidate.is_dir():
+        return candidate.resolve()
+
+    # Try as absolute
+    candidate = Path(binding.path)
+    if candidate.is_dir():
+        return candidate.resolve()
+
+    return None
+
+
+def load_discipline_config(
+    binding: DisciplineBinding, root: Path
+) -> Optional[InfospaceConfig]:
+    """Load the infospace config for a bound discipline.
+
+    Returns ``None`` if the discipline path cannot be resolved or
+    has no ``infospace.yaml``.
+    """
+    disc_path = resolve_discipline_path(binding, root)
+    if disc_path is None:
+        return None
+
+    config_file = disc_path / "infospace.yaml"
+    if not config_file.is_file():
+        return None
+
+    return load_infospace_config(config_file)
+
+
+# ── Viability checking ───────────────────────────────────────────────
+
+
+def check_discipline_status(
+    binding: DisciplineBinding, root: Path
+) -> DisciplineStatus:
+    """Check the full status of a bound discipline.
+
+    Resolves the path, loads config, counts entities, and checks
+    viability against the discipline's own thresholds.
+    """
+    status = DisciplineStatus(name=binding.name, path=binding.path)
+
+    disc_path = resolve_discipline_path(binding, root)
+    if disc_path is None:
+        status.error = f"Path not found: {binding.path}"
+        return status
+
+    status.resolved_path = disc_path
+    status.exists = True
+
+    # Load config
+    config_file = disc_path / "infospace.yaml"
+    if not config_file.is_file():
+        status.error = "No infospace.yaml found"
+        return status
+
+    disc_config = load_infospace_config(config_file)
+    status.has_config = True
+
+    # Count entities
+    entities_dir = disc_path / disc_config.entities_dir
+    if entities_dir.is_dir():
+        entities = parse_entity_directory(entities_dir)
+        status.entity_count = len(entities)
+
+    # Check viability
+    if disc_config.viability:
+        metrics = read_metrics_file(disc_path / disc_config.metrics_dir / "metrics.yaml")
+        if metrics:
+            state = build_state(disc_config, metrics=metrics)
+            status.viability_results = state.viability_results
+            status.is_viable = state.is_viable
+
+    return status
+
+
+def get_discipline_entities(
+    binding: DisciplineBinding, root: Path
+) -> List[EntityMeta]:
+    """Get all entities from a bound discipline infospace."""
+    disc_path = resolve_discipline_path(binding, root)
+    if disc_path is None:
+        return []
+
+    disc_config = load_discipline_config(binding, root)
+    if disc_config is None:
+        return []
+
+    entities_dir = disc_path / disc_config.entities_dir
+    if not entities_dir.is_dir():
+        return []
+
+    return parse_entity_directory(entities_dir)
+
+
+# ── Stale mapping detection ─────────────────────────────────────────
+
+
+def _content_digest(entity: EntityMeta) -> str:
+    """Compute a short content digest for an entity."""
+    content = f"{entity.slug}|{entity.definition}|{entity.domain}"
+    return hashlib.sha256(content.encode()).hexdigest()[:12]
+
+
+def compute_discipline_digests(
+    binding: DisciplineBinding, root: Path
+) -> Dict[str, str]:
+    """Compute content digests for all entities in a discipline.
+
+    Returns ``{slug: digest}`` mapping.
+    """
+    entities = get_discipline_entities(binding, root)
+    return {e.slug: _content_digest(e) for e in entities}
+
+
+def find_stale_mappings(
+    config: InfospaceConfig,
+    root: Path,
+    mapping_references: Optional[Dict[str, List[str]]] = None,
+) -> List[StaleMappingInfo]:
+    """Find mappings that reference discipline entities which no longer exist.
+
+    Args:
+        config: The infospace configuration.
+        root: Project root directory.
+        mapping_references: ``{entity_slug: [discipline_entity_slugs]}``
+            mapping of local entities to the discipline entities they
+            reference. If ``None``, returns an empty list (no mapping
+            data available).
+
+    Returns:
+        List of stale mapping info objects.
+    """
+    if not mapping_references or not config.disciplines:
+        return []
+
+    stale: List[StaleMappingInfo] = []
+
+    # Pool slugs from all disciplines: a ref is stale only if none defines it.
+    known_slugs = {
+        e.slug for b in config.disciplines for e in get_discipline_entities(b, root)
+    }
+    for entity_slug, refs in mapping_references.items():
+        for ref_slug in refs:
+            if ref_slug not in known_slugs:
+                stale.append(StaleMappingInfo(
+                    entity_slug=entity_slug,
+                    discipline_entity=ref_slug,
+                    reason=f"Discipline entity '{ref_slug}' not found in any bound discipline",
+                ))
+
+    return stale
+
+
+# ── Binding management ───────────────────────────────────────────────
+
+
+def bind_discipline(
+    config: InfospaceConfig,
+    name: str,
+    path: str,
+    root: Path,
+) -> DisciplineStatus:
+    """Add a discipline binding to the config and validate it.
+
+    Does NOT persist the config — the caller should save it.
+
+    Args:
+        config: The infospace configuration to update.
+        name: Discipline name.
+        path: Path to the discipline infospace.
+        root: Project root for path resolution.
+
+    Returns:
+        Status of the newly bound discipline.
+    """
+    # Check for duplicates
+    existing = {d.name for d in config.disciplines}
+    if name in existing:
+        return DisciplineStatus(
+            name=name, path=path, error=f"Discipline '{name}' already bound"
+        )
+
+    binding = DisciplineBinding(name=name, path=path)
+    config.disciplines.append(binding)
+
+    return check_discipline_status(binding, root)
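The stale-mapping check in `composition.py` looks only at whether a referenced slug still exists; `compute_discipline_digests` returns `{slug: digest}` but nothing in this file compares digests over time. A minimal sketch of how two such maps could be compared, assuming the caller has stored an earlier map somewhere. `short_digest` mirrors the recipe of `_content_digest`; `classify_changes` and the sample data are hypothetical and not part of the commit.

```python
import hashlib
from typing import Dict


def short_digest(slug: str, definition: str, domain: str) -> str:
    """Same recipe as _content_digest: sha256 of 'slug|definition|domain', 12 hex chars."""
    content = f"{slug}|{definition}|{domain}"
    return hashlib.sha256(content.encode()).hexdigest()[:12]


def classify_changes(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: compare two {slug: digest} maps.

    Returns {slug: "removed" | "modified"} for every slug in *old* that is
    missing from *new* or whose digest differs.
    """
    changes: Dict[str, str] = {}
    for slug, digest in old.items():
        if slug not in new:
            changes[slug] = "removed"
        elif new[slug] != digest:
            changes[slug] = "modified"
    return changes


# Illustrative data only; these slugs and definitions are made up.
before = {
    "system-one": short_digest("system-one", "Operational units.", "VSM"),
    "system-two": short_digest("system-two", "Coordination.", "VSM"),
}
after = {
    "system-one": short_digest("system-one", "Operational units, revised.", "VSM"),
}
print(classify_changes(before, after))
```

Slugs that appear only in the newer map are ignored here, since a newly added discipline entity cannot make an existing mapping stale.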
--- a/markitect/infospace/config.py
+++ b/markitect/infospace/config.py
@@ -0,0 +1,309 @@
+"""
+Infospace configuration model and YAML loader.
+
+An infospace is declared via an ``infospace.yaml`` file that specifies
+its topic, disciplines, schemas, competency questions, and viability
+thresholds.  This module provides the data models and I/O for that
+configuration.
+
+Example ``infospace.yaml``::
+
+    topic:
+      name: "The Wealth of Nations"
+      domain: "Classical Economics"
+      sources: artifacts/sources/
+
+    disciplines:
+      - name: "Viable System Model"
+        path: artifacts/vsm-reference/
+
+    schemas:
+      entity: schemas/economic-entity-schema-v1.0.md
+
+    competency_questions: schemas/competency-questions.md
+
+    viability:
+      coverage_ratio: { min: 0.60 }
+      per_entity_mean: { min: 3.5 }
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+import yaml
+
+
+@dataclass
+class TopicConfig:
+    """The subject matter an infospace explains.
+
+    Attributes:
+        name: Human-readable topic name.
+        domain: Broader knowledge domain.
+        sources: Path (relative to infospace root) to source material.
+    """
+
+    name: str
+    domain: str = ""
+    sources: str = ""
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {"name": self.name}
+        if self.domain:
+            d["domain"] = self.domain
+        if self.sources:
+            d["sources"] = self.sources
+        return d
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> TopicConfig:
+        return cls(
+            name=data["name"],
+            domain=data.get("domain", ""),
+            sources=data.get("sources", ""),
+        )
+
+
+@dataclass
+class DisciplineBinding:
+    """An external infospace applied as an analytical lens.
+
+    Attributes:
+        name: Human-readable discipline name.
+        path: Path to the discipline infospace (relative to root, or absolute).
+    """
+
+    name: str
+    path: str = ""
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {"name": self.name}
+        if self.path:
+            d["path"] = self.path
+        return d
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> DisciplineBinding:
+        return cls(name=data["name"], path=data.get("path", ""))
+
+
+@dataclass
+class SchemaRegistry:
+    """Schema paths governing entity and document structure.
+
+    All paths are relative to the infospace root directory.
+    """
+
+    entity: str = ""
+    mapping: str = ""
+    analysis: str = ""
+    extra: Dict[str, str] = field(default_factory=dict)
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {}
+        if self.entity:
+            d["entity"] = self.entity
+        if self.mapping:
+            d["mapping"] = self.mapping
+        if self.analysis:
+            d["analysis"] = self.analysis
+        d.update(self.extra)
+        return d
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> SchemaRegistry:
+        known = {"entity", "mapping", "analysis"}
+        extra = {k: v for k, v in data.items() if k not in known}
+        return cls(
+            entity=data.get("entity", ""),
+            mapping=data.get("mapping", ""),
+            analysis=data.get("analysis", ""),
+            extra=extra,
+        )
+
+
+@dataclass
+class ViabilityThreshold:
+    """Threshold for a single viability metric.
+
+    At least one of *min* or *max* should be set.
+    """
+
+    metric: str
+    min: Optional[float] = None
+    max: Optional[float] = None
+
+    def check(self, value: float) -> bool:
+        """Return ``True`` if *value* is within the threshold."""
+        if self.min is not None and value < self.min:
+            return False
+        if self.max is not None and value > self.max:
+            return False
+        return True
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {}
+        if self.min is not None:
+            d["min"] = self.min
+        if self.max is not None:
+            d["max"] = self.max
+        return d
+
+
+@dataclass
+class PipelineStage:
+    """A single stage in the processing pipeline."""
+
+    template: str
+    spaces: List[str] = field(default_factory=list)
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {"template": self.template}
+        if self.spaces:
+            d["spaces"] = self.spaces
+        return d
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> PipelineStage:
+        return cls(
+            template=data["template"],
+            spaces=data.get("spaces", []),
+        )
+
+
+@dataclass
+class PipelineConfig:
+    """Processing pipeline configuration."""
+
+    stages: List[PipelineStage] = field(default_factory=list)
+    post_batch: List[PipelineStage] = field(default_factory=list)
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {}
+        if self.stages:
+            d["stages"] = [s.to_dict() for s in self.stages]
+        if self.post_batch:
+            d["post_batch"] = [s.to_dict() for s in self.post_batch]
+        return d
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> PipelineConfig:
+        return cls(
+            stages=[PipelineStage.from_dict(s) for s in data.get("stages", [])],
+            post_batch=[PipelineStage.from_dict(s) for s in data.get("post_batch", [])],
+        )
+
+
+@dataclass
+class InfospaceConfig:
+    """Complete infospace configuration, loaded from ``infospace.yaml``.
+
+    This is the declarative description of an infospace: what it
+    explains, through which lenses, governed by which schemas, and
+    what quality thresholds it must meet.
+    """
+
+    topic: TopicConfig
+    disciplines: List[DisciplineBinding] = field(default_factory=list)
+    schemas: SchemaRegistry = field(default_factory=SchemaRegistry)
+    competency_questions: str = ""
+    viability: Dict[str, ViabilityThreshold] = field(default_factory=dict)
+    pipeline: Optional[PipelineConfig] = None
+    entities_dir: str = "output/entities"
+    evaluations_dir: str = "output/evaluations"
+    metrics_dir: str = "output/metrics"
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {"topic": self.topic.to_dict()}
+        if self.disciplines:
+            d["disciplines"] = [db.to_dict() for db in self.disciplines]
+        schemas_dict = self.schemas.to_dict()
+        if schemas_dict:
+            d["schemas"] = schemas_dict
+        if self.competency_questions:
+            d["competency_questions"] = self.competency_questions
+        if self.viability:
+            d["viability"] = {
+                name: t.to_dict() for name, t in self.viability.items()
+            }
+        if self.pipeline:
+            d["pipeline"] = self.pipeline.to_dict()
+        if self.entities_dir != "output/entities":
+            d["entities_dir"] = self.entities_dir
+        if self.evaluations_dir != "output/evaluations":
+            d["evaluations_dir"] = self.evaluations_dir
+        if self.metrics_dir != "output/metrics":
+            d["metrics_dir"] = self.metrics_dir
+        return d
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> InfospaceConfig:
+        viability_raw = data.get("viability", {})
+        viability = {
+            name: ViabilityThreshold(metric=name, **bounds)
+            for name, bounds in viability_raw.items()
+        }
+        pipeline_raw = data.get("pipeline")
+        pipeline = PipelineConfig.from_dict(pipeline_raw) if pipeline_raw else None
+
+        return cls(
+            topic=TopicConfig.from_dict(data["topic"]),
+            disciplines=[
+                DisciplineBinding.from_dict(d)
+                for d in data.get("disciplines", [])
+            ],
+            schemas=SchemaRegistry.from_dict(data.get("schemas", {})),
+            competency_questions=data.get("competency_questions", ""),
+            viability=viability,
+            pipeline=pipeline,
+            entities_dir=data.get("entities_dir", "output/entities"),
+            evaluations_dir=data.get("evaluations_dir", "output/evaluations"),
+            metrics_dir=data.get("metrics_dir", "output/metrics"),
+        )
+
+
+def load_infospace_config(path: Path) -> InfospaceConfig:
+    """Load an :class:`InfospaceConfig` from a YAML file.
+
+    Args:
+        path: Path to ``infospace.yaml``.
+
+    Raises:
+        FileNotFoundError: If *path* does not exist.
+        ValueError: If the file is not a YAML mapping or has no 'topic' key.
+    """
+    data = yaml.safe_load(path.read_text(encoding="utf-8"))
+    if not isinstance(data, dict):
+        raise ValueError(f"Expected a YAML mapping in {path}")
+    if "topic" not in data:
+        raise ValueError(f"Missing required 'topic' key in {path}")
+    return InfospaceConfig.from_dict(data)
+
+
+def save_infospace_config(config: InfospaceConfig, path: Path) -> None:
+    """Write an :class:`InfospaceConfig` to a YAML file."""
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(
+        yaml.safe_dump(
+            config.to_dict(),
+            default_flow_style=False,
+            sort_keys=False,
+        ),
+        encoding="utf-8",
+    )
+
+
+def find_infospace_config(start: Optional[Path] = None) -> Optional[Path]:
+    """Walk up from *start* looking for ``infospace.yaml``.
+
+    Returns the path to the config file, or ``None``.
+    """
+    current = (start or Path.cwd()).resolve()
+    for directory in [current, *current.parents]:
+        candidate = directory / "infospace.yaml"
+        if candidate.is_file():
+            return candidate
+    return None
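The `viability` block of `infospace.yaml` maps each metric name to optional `min`/`max` bounds, and `ViabilityThreshold.check` treats both bounds as inclusive. A self-contained sketch of that logic, using the thresholds from the module docstring and the baseline `coverage_ratio=0.361` from the commit message. `Threshold` is a stand-in for `ViabilityThreshold`; `failing_metrics`, and its choice to skip metrics that have no recorded value, are assumptions for illustration rather than the behavior of `build_state`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Stand-in for ViabilityThreshold with the same min/max semantics."""

    metric: str
    min: Optional[float] = None
    max: Optional[float] = None

    def check(self, value: float) -> bool:
        # Both bounds are inclusive: a value equal to min or max passes.
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


def parse_viability(raw: Dict[str, Dict[str, float]]) -> Dict[str, Threshold]:
    """Build thresholds from the 'viability' mapping of infospace.yaml."""
    return {name: Threshold(metric=name, **bounds) for name, bounds in raw.items()}


def failing_metrics(
    thresholds: Dict[str, Threshold], metrics: Dict[str, float]
) -> List[str]:
    """Assumption: metrics with no recorded value are skipped, not failed."""
    return [
        name
        for name, threshold in thresholds.items()
        if name in metrics and not threshold.check(metrics[name])
    ]


thresholds = parse_viability({
    "coverage_ratio": {"min": 0.60},
    "per_entity_mean": {"min": 3.5},
})
print(failing_metrics(thresholds, {"coverage_ratio": 0.361}))
```

With the baseline snapshot, `coverage_ratio` falls below its minimum of 0.60, so it is the only metric reported.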
--- a/markitect/infospace/entity_parser.py
+++ b/markitect/infospace/entity_parser.py
@@ -0,0 +1,176 @@
+"""
+Entity metadata parser.
+
+Extracts structured :class:`EntityMeta` from entity markdown files
+produced by the infospace entity-extraction pipeline.
+"""
+
+import logging
+import re
+from pathlib import Path
+from typing import List, Optional, Sequence
+
+from markitect.core.parser import parse_markdown_to_ast
+from markitect.core.section_tree import (
+    build_section_tree,
+    extract_heading_content,
+    extract_heading_level,
+    extract_section_text,
+    slugify,
+)
+from .models import EntityMeta
+
+logger = logging.getLogger(__name__)
+
+# Sections we look for (slug → human-friendly label)
+_KNOWN_SECTIONS = {
+    "definition": "Definition",
+    "source_chapter": "Source Chapter",
+    "context": "Context",
+    "economic_domain": "Economic Domain",
+    "smith_s_original_wording": "Smith's Original Wording",
+    "modern_interpretation": "Modern Interpretation",
+}
+
+# Default filename patterns to exclude from directory parsing
+_DEFAULT_EXCLUDE_PATTERNS = (
+    r".*-entities\.md$",
+    r".*-prompt\.md$",
+)
+
+
+def _is_title_case(text: str) -> bool:
+    """Return True if *text* is title case; minor words may be lowercase after the first."""
+    # Words that are allowed to be lowercase in title case
+    minor_words = {
+        "a", "an", "the", "and", "but", "or", "nor", "for", "yet", "so",
+        "in", "on", "at", "to", "by", "of", "up", "as", "is", "if",
+    }
+    words = text.split()
+    if not words:
+        return False
+    for i, word in enumerate(words):
+        # Strip leading/trailing punctuation for the check
+        clean = re.sub(r"[^\w]", "", word)
+        if not clean:
+            continue
+        # First word must be capitalised
+        if i == 0:
+            if not clean[0].isupper():
+                return False
+        elif clean.lower() in minor_words:
+            continue  # minor words may be lower
+        elif not clean[0].isupper():
+            return False
+    return True
+
+
+def _word_count(text: str) -> int:
+    """Count whitespace-separated words in *text*."""
+    return len(text.split())
+
+
+def _find_h2_section(tree_root: dict, slug: str) -> Optional[dict]:
+    """Find a direct H2 child of the root by slug."""
+    for child in tree_root.get("children", []):
+        if child["level"] == 2 and child["slug"] == slug:
+            return child
+    return None
+
+
+def parse_entity_file(path: Path) -> EntityMeta:
+    """Parse a single entity markdown file into :class:`EntityMeta`.
+
+    Raises:
+        ValueError: If the file has no H1 heading.
+    """
+    content = path.read_text(encoding="utf-8")
+    tokens = parse_markdown_to_ast(content)
+    tree = build_section_tree(tokens)
+
+    # --- H1: entity title ---
+    h1_section = None
+    for child in tree["children"]:
+        if child["level"] == 1:
+            h1_section = child
+            break
+
+    if h1_section is None:
+        raise ValueError(f"No H1 heading found in {path}")
+
+    h1_raw = h1_section["heading"]
+    slug = slugify(h1_raw)
+    title = h1_raw
+    h1_is_title_case = _is_title_case(h1_raw)
+
+    # Use the H1 node as the effective root for H2 look-ups
+    effective_root = h1_section
+
+    # Collect all H2 section slugs
+    section_slugs = [c["slug"] for c in effective_root.get("children", []) if c["level"] == 2]
+
+    # --- Extract known sections ---
+    def _get_section_text(section_slug: str) -> str:
+        node = _find_h2_section(effective_root, section_slug)
+        if node is None:
+            return ""
+        return extract_section_text(node).strip()
+
+    definition = _get_section_text("definition")
+    source_chapter = _get_section_text("source_chapter")
+    context = _get_section_text("context")
+    domain = _get_section_text("economic_domain")
+    original_wording = _get_section_text("smith_s_original_wording")
+    modern_interpretation = _get_section_text("modern_interpretation")
+
+    # --- Derived metrics ---
+    has_original_wording = bool(original_wording)
+    definition_word_count = _word_count(definition)
+    total_word_count = _word_count(content)
+
+    return EntityMeta(
+        slug=slug,
+        title=title,
+        h1_raw=h1_raw,
+        definition=definition,
+        source_chapter=source_chapter,
+        context=context,
+        domain=domain,
+        original_wording=original_wording,
+        modern_interpretation=modern_interpretation,
+        h1_is_title_case=h1_is_title_case,
+        has_original_wording=has_original_wording,
+        definition_word_count=definition_word_count,
+        total_word_count=total_word_count,
+        section_slugs=section_slugs,
+        source_path=str(path),
+    )
+
+
+def parse_entity_directory(
+    directory: Path,
+    exclude_patterns: Optional[Sequence[str]] = None,
+) -> List[EntityMeta]:
+    """Parse all entity markdown files in *directory*.
+
+    Files matching *exclude_patterns* (regexes tested against the
+    filename) are skipped.  Defaults exclude chapter-view
+    (``*-entities.md``) and prompt (``*-prompt.md``) files.
+
+    Malformed files are skipped with a warning rather than raising.
+    """
+    if exclude_patterns is None:
+        exclude_patterns = _DEFAULT_EXCLUDE_PATTERNS
+
+    compiled = [re.compile(p) for p in exclude_patterns]
+    entities: List[EntityMeta] = []
+
+    for md_file in sorted(directory.glob("*.md")):
+        if any(pat.match(md_file.name) for pat in compiled):
+            continue
+        try:
+            entities.append(parse_entity_file(md_file))
+        except Exception as exc:
+            logger.warning("Skipping %s: %s", md_file.name, exc)
+
+    return entities
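`parse_entity_directory` skips files whose name matches any exclude pattern, testing each regex with `re.match` against the bare filename. A small sketch of that filtering step on its own, assuming the same two default patterns; `filter_entity_files` and the filenames are hypothetical.

```python
import re
from typing import Iterable, List, Sequence

# Same defaults as _DEFAULT_EXCLUDE_PATTERNS in entity_parser.py.
DEFAULT_EXCLUDE = (r".*-entities\.md$", r".*-prompt\.md$")


def filter_entity_files(
    names: Iterable[str],
    exclude_patterns: Sequence[str] = DEFAULT_EXCLUDE,
) -> List[str]:
    """Hypothetical helper: return sorted filenames that match no exclude pattern."""
    compiled = [re.compile(p) for p in exclude_patterns]
    return [
        name
        for name in sorted(names)
        if not any(pat.match(name) for pat in compiled)
    ]


# Made-up filenames for illustration.
names = [
    "division-of-labour.md",
    "chapter-01-entities.md",
    "extract-prompt.md",
    "capital.md",
]
print(filter_entity_files(names))
```

Because `re.match` anchors at the start of the string, a custom pattern without a leading `.*` only excludes files whose name begins with that pattern.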
--- a/markitect/infospace/evaluate.py
+++ b/markitect/infospace/evaluate.py
@@ -0,0 +1,215 @@
+"""
+Per-entity evaluation pipeline.
+
+Builds prompts from entity metadata and delegates LLM evaluation to
+the :class:`BatchEvaluator`.  Writes structured results to the
+evaluations directory.
+"""
+
+from __future__ import annotations
+
+import hashlib
+from datetime import datetime
+from pathlib import Path
+from typing import Callable, Dict, List, Optional
+
+from markitect.infospace.config import InfospaceConfig
+from markitect.infospace.evaluation import EntityEvaluation, ScoreEntry
+from markitect.infospace.evaluation_io import write_entity_evaluation
+from markitect.infospace.models import EntityMeta
+from markitect.prompts.execution.batch import BatchEvaluator, BatchItem, BatchSummary
+from markitect.prompts.execution.llm_adapter import LLMAdapter
+from markitect.prompts.execution.models import RunConfig
+
+
+_DEFAULT_DIMENSIONS = [
+    "definition_precision",
+    "source_grounding",
+    "domain_relevance",
+    "discipline_alignment",
+    "conceptual_clarity",
+]
+
+_PROMPT_TEMPLATE = """\
+You are evaluating an entity from an infospace about "{topic}".
+
+## Entity: {title}
+
+**Slug:** {slug}
+**Domain:** {domain}
+**Source chapter:** {source_chapter}
+
+### Definition
+{definition}
+
+### Context
+{context}
+
+## Instructions
+
+Rate this entity on each dimension below using a scale of 1-5 \
+(1 = poor, 5 = excellent). For each dimension, provide:
+1. A numeric score (1-5)
+2. A brief rationale (1-2 sentences)
+
+### Dimensions to evaluate:
+{dimensions_list}
+
+## Output format
+
+Return your evaluation as a structured list:
+
+DIMENSION: <name>
+SCORE: <1-5>
+RATIONALE: <explanation>
+
+Repeat for each dimension.
+"""
+
+
+def build_evaluation_prompt(
+    entity: EntityMeta,
+    topic: str,
+    dimensions: Optional[List[str]] = None,
+) -> str:
+    """Build an evaluation prompt for a single entity."""
+    dims = dimensions or _DEFAULT_DIMENSIONS
+    dims_list = "\n".join(f"- {d}" for d in dims)
+    return _PROMPT_TEMPLATE.format(
+        topic=topic,
+        title=entity.title,
+        slug=entity.slug,
+        domain=entity.domain or "(unspecified)",
+        source_chapter=entity.source_chapter or "(unspecified)",
+        definition=entity.definition or "(no definition)",
+        context=entity.context or "(no context)",
+        dimensions_list=dims_list,
+    )
+
+
+def content_digest(entity: EntityMeta) -> str:
+    """Compute a content digest for incremental evaluation."""
+    content = f"{entity.slug}:{entity.definition}:{entity.context}:{entity.domain}"
+    return hashlib.sha256(content.encode()).hexdigest()[:16]
+
+
+def parse_evaluation_response(
+    response_text: str,
+    dimensions: Optional[List[str]] = None,
+) -> List[ScoreEntry]:
+    """Parse structured dimension scores from LLM response text.
+
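+    Expects blocks of ``DIMENSION: <name>``, ``SCORE: <1-5>`` and
+    ``RATIONALE: <text>`` lines.  A block whose SCORE is missing or
+    non-numeric is dropped.  ``dimensions`` does not filter the result;
+    names are returned exactly as the model wrote them.
+    """
+    # NOTE: ``dims`` is currently unused; parsed names are not validated.
+    dims = dimensions or _DEFAULT_DIMENSIONS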
+    scores: List[ScoreEntry] = []
+    current_dim = None
+    current_score = None
+    current_rationale = ""
+
+    for line in response_text.splitlines():
+        stripped = line.strip()
+        if stripped.upper().startswith("DIMENSION:"):
+            # Flush previous
+            if current_dim is not None and current_score is not None:
+                scores.append(ScoreEntry(
+                    name=current_dim,
+                    value=current_score,
+                    max_value=5.0,
+                    rationale=current_rationale.strip(),
+                ))
+            current_dim = stripped.split(":", 1)[1].strip()
+            current_score = None
+            current_rationale = ""
+        elif stripped.upper().startswith("SCORE:"):
+            try:
+                current_score = float(stripped.split(":", 1)[1].strip())
+            except ValueError:
+                current_score = None
+        elif stripped.upper().startswith("RATIONALE:"):
+            current_rationale = stripped.split(":", 1)[1].strip()
+        elif current_dim is not None and current_score is not None:
+            # Continuation of rationale
+            if stripped:
+                current_rationale += " " + stripped
+
+    # Flush last
+    if current_dim is not None and current_score is not None:
+        scores.append(ScoreEntry(
+            name=current_dim,
+            value=current_score,
+            max_value=5.0,
+            rationale=current_rationale.strip(),
+        ))
+
+    return scores
+
+
+def run_entity_evaluation(
+    config: InfospaceConfig,
+    entities: List[EntityMeta],
+    adapter: LLMAdapter,
+    run_config: Optional[RunConfig] = None,
+    output_dir: Optional[Path] = None,
+    previous_digests: Optional[Dict[str, str]] = None,
+    progress_callback: Optional[Callable] = None,
+    dimensions: Optional[List[str]] = None,
+) -> BatchSummary:
+    """Run per-entity evaluation using the batch evaluator.
+
+    Args:
+        config: The infospace configuration.
+        entities: Entities to evaluate.
+        adapter: LLM adapter for evaluation.
+        run_config: LLM execution configuration.
+        output_dir: Where to write evaluation results.  Defaults to
+            ``config.evaluations_dir`` relative to CWD.
+        previous_digests: ``{slug: digest}`` for incremental skip.
+        progress_callback: Called after each item.
+        dimensions: Custom evaluation dimensions.
+
+    Returns:
+        A :class:`BatchSummary` with per-entity results.
+    """
+    topic = config.topic.name
+    items = [
+        BatchItem(
+            key=entity.slug,
+            prompt=build_evaluation_prompt(entity, topic, dimensions),
+            content_digest=content_digest(entity),
+            metadata={"source_path": entity.source_path},
+        )
+        for entity in entities
+    ]
+
+    evaluator = BatchEvaluator(
+        adapter=adapter,
+        config=run_config,
+        progress_callback=progress_callback,
+        previous_digests=previous_digests,
+    )
+    summary = evaluator.evaluate(items)
+
+    # Write successful results
+    evaluations_path = output_dir or Path(config.evaluations_dir)
+    evaluator_name = (run_config.model_name if run_config else "unknown")
+
+    for result in summary.results:
+        if result.status != "success" or result.response is None:
+            continue
+
+        scores = parse_evaluation_response(result.response.content, dimensions)
+        evaluation = EntityEvaluation(
+            entity_slug=result.key,
+            evaluator=evaluator_name,
+            scores=scores,
+            evaluated_at=datetime.utcnow(),
+        )
+        eval_path = evaluations_path / f"{result.key}.md"
+        write_entity_evaluation(evaluation, eval_path)
+
+    return summary
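Review note on `parse_evaluation_response`: the parser converts the SCORE value with a bare `float(...)`, so a reply such as `SCORE: 4/5` raises `ValueError` internally and the whole dimension is dropped. The sketch below is a standalone illustration of a more tolerant variant, not code from this patch. `parse_blocks` and `_LEADING_NUMBER` are hypothetical names, results are plain tuples rather than `ScoreEntry` objects, and continuation lines of a rationale are ignored for brevity.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helper: captures the leading number of a SCORE value such as
# "4", "4.5", "4/5" or "4 (solid)".
_LEADING_NUMBER = re.compile(r"\s*(\d+(?:\.\d+)?)")


def parse_blocks(text: str) -> List[Tuple[str, float, str]]:
    """Return (dimension, score, rationale) for every block with a usable score."""
    results: List[Tuple[str, float, str]] = []
    name: Optional[str] = None
    score: Optional[float] = None
    rationale = ""
    # The trailing sentinel line flushes the last block through the same branch.
    for line in text.splitlines() + ["DIMENSION:"]:
        key, _, value = line.strip().partition(":")
        key = key.upper()
        if key == "DIMENSION":
            if name is not None and score is not None:
                results.append((name, score, rationale.strip()))
            name, score, rationale = value.strip(), None, ""
        elif key == "SCORE":
            match = _LEADING_NUMBER.match(value)
            score = float(match.group(1)) if match else None
        elif key == "RATIONALE":
            rationale = value.strip()
    return results


sample = """\
DIMENSION: definition_precision
SCORE: 4/5
RATIONALE: Clear, but omits the scope of the term.
DIMENSION: source_grounding
SCORE: n/a
RATIONALE: No chapter reference given.
"""
parsed = parse_blocks(sample)
```

Here the first block survives with a score of 4.0, while the second is still skipped because it carries no number at all. Whether a lenient parse is desirable depends on how strictly the prompt's output format should be enforced.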
--- a/markitect/infospace/evaluation.py
+++ b/markitect/infospace/evaluation.py
@@ -0,0 +1,207 @@
+"""
+Data models for structured evaluation output.
+
+Provides typed containers for per-entity LLM-evaluated scores and
+collection-level metrics.  All models support ``to_dict()``/``from_dict()``
+round-tripping for YAML serialisation.
+"""
+
+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import Any, Dict, List, Optional
+
+
+@dataclass
+class ScoreEntry:
+    """A single scored dimension (e.g. definition_precision: 4.5/5.0)."""
+
+    name: str
+    value: float
+    max_value: float = 5.0
+    rationale: str = ""
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {
+            "name": self.name,
+            "value": self.value,
+            "max_value": self.max_value,
+        }
+        if self.rationale:
+            d["rationale"] = self.rationale
+        return d
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "ScoreEntry":
+        return cls(
+            name=data["name"],
+            value=float(data["value"]),
+            max_value=float(data.get("max_value", 5.0)),
+            rationale=data.get("rationale", ""),
+        )
+
+
+@dataclass
+class EntityEvaluation:
+    """Per-entity evaluation result."""
+
+    entity_slug: str
+    evaluator: str
+    scores: List[ScoreEntry]
+    evaluated_at: datetime
+    notes: List[str] = field(default_factory=list)
+
+    @property
+    def overall_score(self) -> float:
+        if not self.scores:
+            return 0.0
+        return sum(s.value for s in self.scores) / len(self.scores)
+
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "entity_slug": self.entity_slug,
+            "evaluator": self.evaluator,
+            "evaluated_at": self.evaluated_at.isoformat(),
+            "overall_score": round(self.overall_score, 4),
+            "scores": [s.to_dict() for s in self.scores],
+            "notes": self.notes,
+        }
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "EntityEvaluation":
+        return cls(
+            entity_slug=data["entity_slug"],
+            evaluator=data["evaluator"],
+            scores=[ScoreEntry.from_dict(s) for s in data["scores"]],
+            evaluated_at=datetime.fromisoformat(data["evaluated_at"]),
+            notes=data.get("notes", []),
+        )
+
+
+@dataclass
+class MetricValue:
+    """A single collection-level metric."""
+
+    name: str
+    value: float
+    concern: str = ""
+    details: Dict[str, Any] = field(default_factory=dict)
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {"name": self.name, "value": self.value}
+        if self.concern:
+            d["concern"] = self.concern
+        if self.details:
+            d["details"] = self.details
+        return d
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "MetricValue":
+        return cls(
+            name=data["name"],
+            value=float(data["value"]),
+            concern=data.get("concern", ""),
+            details=data.get("details", {}),
+        )
+
+
+@dataclass
+class EvaluationSnapshot:
+    """Timestamped snapshot of entity evaluations and collection metrics."""
+
+    snapshot_id: str
+    created_at: datetime
+    schema_name: str
+    entity_count: int
+    entity_evaluations: List[EntityEvaluation] = field(default_factory=list)
+    collection_metrics: List[MetricValue] = field(default_factory=list)
+    metadata: Dict[str, Any] = field(default_factory=dict)
+
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "snapshot_id": self.snapshot_id,
+            "created_at": self.created_at.isoformat(),
+            "schema_name": self.schema_name,
+            "entity_count": self.entity_count,
+            "entity_evaluations": [e.to_dict() for e in self.entity_evaluations],
+            "collection_metrics": [m.to_dict() for m in self.collection_metrics],
+            "metadata": self.metadata,
+        }
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "EvaluationSnapshot":
+        return cls(
+            snapshot_id=data["snapshot_id"],
+            created_at=datetime.fromisoformat(data["created_at"]),
+            schema_name=data["schema_name"],
+            entity_count=data["entity_count"],
+            entity_evaluations=[
+                EntityEvaluation.from_dict(e) for e in data.get("entity_evaluations", [])
+            ],
+            collection_metrics=[
+                MetricValue.from_dict(m) for m in data.get("collection_metrics", [])
+            ],
+            metadata=data.get("metadata", {}),
+        )
+
+
+@dataclass
+class ScoreChange:
+    """Delta record for a single score dimension between snapshots."""
+
+    entity_slug: str
+    dimension: str
+    before: float
+    after: float
+
+    @property
+    def delta(self) -> float:
+        return self.after - self.before
+
+
+@dataclass
+class MetricChange:
+    """Delta record for a collection metric between snapshots."""
+
+    name: str
+    before: float
+    after: float
+
+    @property
+    def delta(self) -> float:
+        return self.after - self.before
+
+
+@dataclass
+class SnapshotDiff:
+    """Diff between two evaluation snapshots."""
+
+    before_id: str
+    after_id: str
+    added_entities: List[str] = field(default_factory=list)
+    removed_entities: List[str] = field(default_factory=list)
+    score_changes: List[ScoreChange] = field(default_factory=list)
+    metric_changes: List[MetricChange] = field(default_factory=list)
+
+    def summary(self) -> str:
+        lines = [f"Diff: {self.before_id} -> {self.after_id}"]
+        if self.added_entities:
+            lines.append(f"  Added entities: {', '.join(self.added_entities)}")
+        if self.removed_entities:
+            lines.append(f"  Removed entities: {', '.join(self.removed_entities)}")
+        if self.score_changes:
+            lines.append(f"  Score changes: {len(self.score_changes)}")
+            for sc in self.score_changes:
+                lines.append(
+                    f"    {sc.entity_slug}/{sc.dimension}: "
+                    f"{sc.before} -> {sc.after} ({sc.delta:+.2f})"
+                )
+        if self.metric_changes:
+            lines.append(f"  Metric changes: {len(self.metric_changes)}")
+            for mc in self.metric_changes:
+                lines.append(
+                    f"    {mc.name}: {mc.before} -> {mc.after} ({mc.delta:+.2f})"
+                )
+        if not any([self.added_entities, self.removed_entities,
+                     self.score_changes, self.metric_changes]):
+            lines.append("  No changes")
+        return "\n".join(lines)
--- a/markitect/infospace/evaluation_io.py
+++ b/markitect/infospace/evaluation_io.py
@@ -0,0 +1,213 @@
+"""
+Read/write utilities for evaluation output files.
+
+Per-entity evaluations are stored as markdown with YAML frontmatter.
+Snapshots and history are stored as pure YAML files.
+"""
+
+from pathlib import Path
+from typing import List
+
+import yaml
+
+from .evaluation import (
+    EntityEvaluation,
+    EvaluationSnapshot,
+    MetricChange,
+    MetricValue,
+    ScoreChange,
+    SnapshotDiff,
+)
+
+_FRONTMATTER_SEP = "---"
+
+
+def write_entity_evaluation(evaluation: EntityEvaluation, path: Path) -> None:
+    """Write a per-entity evaluation as YAML frontmatter + markdown body."""
+    frontmatter = {
+        "entity_slug": evaluation.entity_slug,
+        "evaluator": evaluation.evaluator,
+        "evaluated_at": evaluation.evaluated_at.isoformat(),
+        "overall_score": round(evaluation.overall_score, 4),
+        "scores": [s.to_dict() for s in evaluation.scores],
+    }
+    if evaluation.notes:
+        frontmatter["notes"] = evaluation.notes
+
+    lines: List[str] = []
+    lines.append(_FRONTMATTER_SEP)
+    lines.append(yaml.safe_dump(frontmatter, default_flow_style=False, sort_keys=False).rstrip())
+    lines.append(_FRONTMATTER_SEP)
+    lines.append("")
+
+    # Title
+    title = evaluation.entity_slug.replace("_", " ").replace("-", " ").title()
+    lines.append(f"# Evaluation: {title}")
+    lines.append("")
+
+    # One section per score with rationale
+    for score in evaluation.scores:
+        lines.append(f"## {score.name} — {score.value} / {score.max_value}")
+        lines.append("")
+        if score.rationale:
+            lines.append(score.rationale)
+            lines.append("")
+
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text("\n".join(lines), encoding="utf-8")
+
+
+def read_entity_evaluation(path: Path) -> EntityEvaluation:
+    """Read a per-entity evaluation from a YAML frontmatter markdown file."""
+    text = path.read_text(encoding="utf-8")
+    parts = text.split(f"{_FRONTMATTER_SEP}\n", maxsplit=2)
+    # parts: ["", frontmatter_text, body]
+    if len(parts) < 3:
+        raise ValueError(f"Invalid frontmatter in {path}")
+    fm_text = parts[1]
+    body = parts[2]
+
+    fm = yaml.safe_load(fm_text)
+
+    # Parse rationales from body
+    rationales = _parse_rationales(body)
+
+    from datetime import datetime
+    from .evaluation import ScoreEntry
+
+    scores = []
+    for s_data in fm["scores"]:
+        se = ScoreEntry.from_dict(s_data)
+        se.rationale = rationales.get(se.name, se.rationale)
+        scores.append(se)
+
+    return EntityEvaluation(
+        entity_slug=fm["entity_slug"],
+        evaluator=fm["evaluator"],
+        scores=scores,
+        evaluated_at=datetime.fromisoformat(fm["evaluated_at"]),
+        notes=fm.get("notes", []),
+    )
+
+
+def _parse_rationales(body: str) -> dict:
+    """Extract rationale text per dimension from the markdown body."""
+    rationales: dict = {}
+    current_name = None
+    current_lines: List[str] = []
+
+    for line in body.splitlines():
+        if line.startswith("## "):
+            # Save previous
+            if current_name is not None:
+                rationales[current_name] = "\n".join(current_lines).strip()
+            # Parse "## dimension_name — 4.5 / 5.0"
+            heading = line[3:].strip()
+            name = heading.split("—")[0].strip() if "—" in heading else heading
+            current_name = name
+            current_lines = []
+        elif current_name is not None:
+            current_lines.append(line)
+
+    if current_name is not None:
+        rationales[current_name] = "\n".join(current_lines).strip()
+
+    return rationales
+
+
+def write_snapshot(snapshot: EvaluationSnapshot, path: Path) -> None:
+    """Write an evaluation snapshot as a YAML file."""
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(
+        yaml.safe_dump(snapshot.to_dict(), default_flow_style=False, sort_keys=False),
+        encoding="utf-8",
+    )
+
+
+def read_snapshot(path: Path) -> EvaluationSnapshot:
+    """Read an evaluation snapshot from a YAML file."""
+    data = yaml.safe_load(path.read_text(encoding="utf-8"))
+    return EvaluationSnapshot.from_dict(data)
+
+
+def append_to_history(snapshot: EvaluationSnapshot, history_path: Path) -> None:
+    """Append a snapshot to a YAML list file (creates if missing)."""
+    history_path.parent.mkdir(parents=True, exist_ok=True)
+    existing: List[dict] = []
+    if history_path.exists():
+        loaded = yaml.safe_load(history_path.read_text(encoding="utf-8"))
+        if loaded is not None:
+            existing = loaded
+
+    existing.append(snapshot.to_dict())
+    history_path.write_text(
+        yaml.safe_dump(existing, default_flow_style=False, sort_keys=False),
+        encoding="utf-8",
+    )
+
+
+def read_history(history_path: Path) -> List[EvaluationSnapshot]:
+    """Read all snapshots from a YAML history file."""
+    data = yaml.safe_load(history_path.read_text(encoding="utf-8"))
+    if data is None:
+        return []
+    return [EvaluationSnapshot.from_dict(d) for d in data]
+
+
+def diff_snapshots(before: EvaluationSnapshot, after: EvaluationSnapshot) -> SnapshotDiff:
+    """Compute the diff between two evaluation snapshots."""
+    before_slugs = {e.entity_slug for e in before.entity_evaluations}
+    after_slugs = {e.entity_slug for e in after.entity_evaluations}
+
+    added = sorted(after_slugs - before_slugs)
+    removed = sorted(before_slugs - after_slugs)
+
+    # Build score lookup: {slug: {dimension: value}}
+    before_scores: dict = {}
+    for ev in before.entity_evaluations:
+        before_scores[ev.entity_slug] = {s.name: s.value for s in ev.scores}
+
+    after_scores: dict = {}
+    for ev in after.entity_evaluations:
+        after_scores[ev.entity_slug] = {s.name: s.value for s in ev.scores}
+
+    score_changes: List[ScoreChange] = []
+    common_slugs = sorted(before_slugs & after_slugs)
+    for slug in common_slugs:
+        b_dims = before_scores[slug]
+        a_dims = after_scores[slug]
+        all_dims = sorted(set(b_dims) | set(a_dims))
+        for dim in all_dims:
+            bv = b_dims.get(dim)
+            av = a_dims.get(dim)
+            if bv != av:
+                score_changes.append(ScoreChange(
+                    entity_slug=slug,
+                    dimension=dim,
+                    before=bv if bv is not None else 0.0,
+                    after=av if av is not None else 0.0,
+                ))
+
+    # Metric changes
+    before_metrics = {m.name: m.value for m in before.collection_metrics}
+    after_metrics = {m.name: m.value for m in after.collection_metrics}
+    all_metric_names = sorted(set(before_metrics) | set(after_metrics))
+    metric_changes: List[MetricChange] = []
+    for name in all_metric_names:
+        bv = before_metrics.get(name)
+        av = after_metrics.get(name)
+        if bv != av:
+            metric_changes.append(MetricChange(
+                name=name,
+                before=bv if bv is not None else 0.0,
+                after=av if av is not None else 0.0,
+            ))
+
+    return SnapshotDiff(
+        before_id=before.snapshot_id,
+        after_id=after.snapshot_id,
+        added_entities=added,
+        removed_entities=removed,
+        score_changes=score_changes,
+        metric_changes=metric_changes,
+    )
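Review note on `diff_snapshots`: a dimension that exists in only one snapshot is reported with `0.0` on the missing side, which reads the same as a real score of zero. The sketch below is a standalone illustration of the same comparison that keeps `None` for the missing side instead. The `score_changes` function and the `Scores` and `Change` aliases are hypothetical and operate on plain dictionaries rather than `EvaluationSnapshot` objects.

```python
from typing import Dict, List, Optional, Tuple

Scores = Dict[str, Dict[str, float]]  # {entity_slug: {dimension: value}}
Change = Tuple[str, str, Optional[float], Optional[float]]


def score_changes(before: Scores, after: Scores) -> List[Change]:
    """List (slug, dimension, before, after) for entities present in both inputs."""
    changes: List[Change] = []
    for slug in sorted(before.keys() & after.keys()):
        dims = sorted(before[slug].keys() | after[slug].keys())
        for dim in dims:
            b = before[slug].get(dim)  # None when the dimension is absent
            a = after[slug].get(dim)
            if b != a:
                changes.append((slug, dim, b, a))
    return changes


before = {
    "division-of-labour": {"definition_precision": 4.0, "source_grounding": 3.0},
    "capital": {"definition_precision": 2.0},
}
after = {
    "division-of-labour": {
        "definition_precision": 4.5,
        "source_grounding": 3.0,
        "conceptual_clarity": 5.0,
    },
    "rent": {"definition_precision": 3.0},
}
changes = score_changes(before, after)
```

Only `division-of-labour` is compared, since `capital` and `rent` each appear on one side and would be listed as removed and added entities. A caller that needs a numeric delta has to decide how to treat the `None` entries.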
--- a/markitect/infospace/history.py
+++ b/markitect/infospace/history.py
@@ -0,0 +1,223 @@
+"""
+Metrics history and viability tracking.
+
+Converts check results into timestamped snapshots and maintains a
+persistent history file for trend analysis.
+"""
+
+from __future__ import annotations
+
+import uuid
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+import yaml
+
+from markitect.infospace.checks.orchestrator import CheckReport
+from markitect.infospace.config import InfospaceConfig
+from markitect.infospace.evaluation import EvaluationSnapshot, MetricValue
+from markitect.infospace.evaluation_io import (
+    append_to_history,
+    diff_snapshots,
+    read_history,
+)
+from markitect.infospace.state import ViabilityResult
+
+
+# ── Snapshot creation ────────────────────────────────────────────────
+
+
+def _concern_for_metric(name: str) -> str:
+    """Map a metric name to its concern label."""
+    mapping = {
+        "redundancy_ratio": "C1",
+        "coverage_ratio": "C2",
+        "coherence_components": "C3",
+        "modularity": "C3",
+        "consistency_cycles": "C4",
+        "granularity_entropy": "C5",
+    }
+    return mapping.get(name, "")
+
+
+def snapshot_from_checks(
+    check_report: CheckReport,
+    entity_count: int,
+    schema_name: str = "default",
+    metadata: Optional[Dict[str, Any]] = None,
+) -> EvaluationSnapshot:
+    """Create an :class:`EvaluationSnapshot` from collection check results.
+
+    Args:
+        check_report: Output from :func:`run_all_checks`.
+        entity_count: Number of entities checked.
+        schema_name: Schema identifier for the snapshot.
+        metadata: Optional extra metadata to attach.
+
+    Returns:
+        A snapshot containing the check metrics as collection_metrics.
+    """
+    metrics_dict = check_report.metrics()
+    collection_metrics = [
+        MetricValue(
+            name=name,
+            value=value,
+            concern=_concern_for_metric(name),
+        )
+        for name, value in sorted(metrics_dict.items())
+    ]
+
+    return EvaluationSnapshot(
+        snapshot_id=str(uuid.uuid4())[:8],
+        created_at=datetime.now(timezone.utc),
+        schema_name=schema_name,
+        entity_count=entity_count,
+        collection_metrics=collection_metrics,
+        metadata=metadata or {},
+    )
+
+
+# ── Metrics file I/O ────────────────────────────────────────────────
+
+
+def write_metrics_file(metrics: Dict[str, float], path: Path) -> None:
+    """Write the latest metrics to a simple YAML file.
+
+    This file is used by ``markitect infospace viability`` for quick
+    threshold checking.
+    """
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(
+        yaml.safe_dump(
+            {k: round(v, 6) for k, v in sorted(metrics.items())},
+            default_flow_style=False,
+            sort_keys=True,
+        ),
+        encoding="utf-8",
+    )
+
+
+def read_metrics_file(path: Path) -> Dict[str, float]:
+    """Read the latest metrics from a YAML file."""
+    if not path.is_file():
+        return {}
+    raw = yaml.safe_load(path.read_text(encoding="utf-8"))
+    if not isinstance(raw, dict):
+        return {}
+    return {k: float(v) for k, v in raw.items() if isinstance(v, (int, float))}
+
+
+# ── History operations ───────────────────────────────────────────────
+
+
+def record_check_results(
+    check_report: CheckReport,
+    config: InfospaceConfig,
+    root: Path,
+    entity_count: int,
+) -> EvaluationSnapshot:
+    """Record check results: save metrics file and append to history.
+
+    Args:
+        check_report: Output from ``run_all_checks()``.
+        config: The infospace configuration.
+        root: Project root directory.
+        entity_count: Number of entities checked.
+
+    Returns:
+        The snapshot that was recorded.
+    """
+    metrics_dir = root / config.metrics_dir
+    metrics = check_report.metrics()
+
+    # Save latest metrics
+    write_metrics_file(metrics, metrics_dir / "metrics.yaml")
+
+    # Create and append snapshot
+    snapshot = snapshot_from_checks(
+        check_report,
+        entity_count=entity_count,
+        metadata={"source": "collection-checks"},
+    )
+    append_to_history(snapshot, metrics_dir / "history.yaml")
+
+    return snapshot
+
+
+def get_history(config: InfospaceConfig, root: Path) -> List[EvaluationSnapshot]:
+    """Read the full metrics history for an infospace."""
+    history_path = root / config.metrics_dir / "history.yaml"
+    if not history_path.is_file():
+        return []
+    return read_history(history_path)
+
+
+def get_latest_snapshot(
+    config: InfospaceConfig, root: Path
+) -> Optional[EvaluationSnapshot]:
+    """Get the most recent snapshot from the history."""
+    history = get_history(config, root)
+    return history[-1] if history else None
+
+
+def find_snapshot_by_date(
+    history: List[EvaluationSnapshot], date_str: str
+) -> Optional[EvaluationSnapshot]:
+    """Find the snapshot closest to a given date string.
+
+    Args:
+        history: List of snapshots in chronological order.
+        date_str: Date string in ``YYYY-MM-DD`` or ``YYYY-MM-DDTHH:MM:SS`` format.
+
+    Returns:
+        Closest snapshot, or ``None`` if history is empty or the date is invalid.
+    """
+    if not history:
+        return None
+
+    # Parse the target date
+    try:
+        if "T" in date_str:
+            target = datetime.fromisoformat(date_str)
+        else:
+            target = datetime.fromisoformat(date_str + "T00:00:00")
+    except ValueError:
+        return None
+
+    # Make timezone-aware if needed
+    if target.tzinfo is None:
+        target = target.replace(tzinfo=timezone.utc)
+
+    best = None
+    best_delta = None
+    for snap in history:
+        snap_dt = snap.created_at
+        if snap_dt.tzinfo is None:
+            snap_dt = snap_dt.replace(tzinfo=timezone.utc)
+        delta = abs((snap_dt - target).total_seconds())
+        if best_delta is None or delta < best_delta:
+            best = snap
+            best_delta = delta
+
+    return best
+
+
+def metric_trend(
+    history: List[EvaluationSnapshot], metric_name: str
+) -> List[Dict[str, Any]]:
+    """Extract a single metric's values across the history.
+
+    Returns a list of ``{"date": iso_str, "value": float}`` entries
+    for each snapshot that contains the metric.
+    """
+    trend: List[Dict[str, Any]] = []
+    for snap in history:
+        for m in snap.collection_metrics:
+            if m.name == metric_name:
+                trend.append({
+                    "date": snap.created_at.isoformat(),
+                    "value": m.value,
+                })
+                break
+    return trend
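Review note on `find_snapshot_by_date`: the lookup treats naive timestamps as UTC and returns the entry with the smallest absolute time difference. The sketch below reproduces that rule on bare `datetime` values as a standalone illustration. `as_utc` and `closest` are hypothetical names, and the sketch relies on `datetime.fromisoformat` accepting a date-only string such as `2026-02-19`.

```python
from datetime import datetime, timezone
from typing import List, Optional


def as_utc(dt: datetime) -> datetime:
    """Treat naive datetimes as UTC; leave aware ones untouched."""
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def closest(timestamps: List[datetime], date_str: str) -> Optional[datetime]:
    """Return the timestamp nearest to date_str, or None if there is no answer."""
    if not timestamps:
        return None
    try:
        target = as_utc(datetime.fromisoformat(date_str))
    except ValueError:
        return None
    # min() keeps the first candidate on ties, like a strict "<" comparison.
    return min(timestamps, key=lambda ts: abs((as_utc(ts) - target).total_seconds()))


stamps = [
    datetime(2026, 2, 18, 23, 0),  # naive, interpreted as UTC
    datetime(2026, 2, 19, 7, 44, 1, tzinfo=timezone.utc),
    datetime(2026, 2, 20, 12, 0, tzinfo=timezone.utc),
]
by_day = closest(stamps, "2026-02-19")
by_time = closest(stamps, "2026-02-19T09:00:00")
```

A date-only query resolves to midnight UTC, so `by_day` picks the entry from late on the previous evening rather than the one later on the requested day.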
--- a/markitect/infospace/models.py
+++ b/markitect/infospace/models.py
@@ -0,0 +1,53 @@
+"""
+Data models for infospace entity metadata.
+"""
+
+from dataclasses import dataclass, field, asdict
+from typing import Any, Dict, List
+
+
+@dataclass
+class EntityMeta:
+    """Structured metadata extracted from a single entity markdown file.
+
+    The parser populates every field it can find; missing optional
+    sections are left as empty strings (validation is a separate step).
+    """
+
+    # Identity
+    slug: str
+    title: str
+    h1_raw: str  # verbatim H1 text before any normalisation
+
+    # Section contents (plain text, empty string if section missing)
+    definition: str = ""
+    source_chapter: str = ""
+    context: str = ""
+    domain: str = ""
+    original_wording: str = ""
+    modern_interpretation: str = ""
+
+    # Derived flags
+    h1_is_title_case: bool = False
+    has_original_wording: bool = False
+
+    # Metrics-ready numbers
+    definition_word_count: int = 0
+    total_word_count: int = 0
+
+    # All H2 section slugs found (preserves order)
+    section_slugs: List[str] = field(default_factory=list)
+
+    # Source file path (as string for serialisation)
+    source_path: str = ""
+
+    def to_dict(self) -> Dict[str, Any]:
+        """Serialise to a plain dictionary."""
+        return asdict(self)
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "EntityMeta":
+        """Deserialise from a plain dictionary."""
+        known_fields = {f.name for f in cls.__dataclass_fields__.values()}
+        filtered = {k: v for k, v in data.items() if k in known_fields}
+        return cls(**filtered)
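Review note on `EntityMeta.from_dict`: unknown keys are filtered out before the constructor is called, so older or richer dictionaries still load. The sketch below shows the same pattern on a small stand-in, using the public `dataclasses.fields` accessor in place of `__dataclass_fields__`. `MiniEntity` and `from_known_fields` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class MiniEntity:  # hypothetical stand-in for EntityMeta
    slug: str
    title: str
    domain: str = ""


def from_known_fields(data: Dict[str, Any]) -> MiniEntity:
    """Build a MiniEntity, silently ignoring keys that are not dataclass fields."""
    known = {f.name for f in fields(MiniEntity)}
    return MiniEntity(**{k: v for k, v in data.items() if k in known})


entity = from_known_fields(
    {"slug": "division-of-labour", "title": "Division of Labour", "legacy_id": 17}
)
```

Extra keys are dropped, but a missing required key such as `slug` still raises `TypeError` from the constructor, which is worth keeping in mind when loading hand-edited files.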
--- a/markitect/infospace/schema.py
+++ b/markitect/infospace/schema.py
@@ -0,0 +1,144 @@
+"""
+Declarative schema definitions for entity compliance validation.
+
+A schema describes the expected structure of an entity: which sections
+are required, word count bounds, heading format, and valid enum values.
+Schemas are frozen (immutable once created).
+"""
+
+from dataclasses import dataclass
+from enum import Enum
+from typing import Optional, Tuple
+
+
+class SectionRequirement(Enum):
+    """How strictly a section must be present."""
+
+    REQUIRED = "required"
+    RECOMMENDED = "recommended"
+    OPTIONAL = "optional"
+
+
+@dataclass(frozen=True)
+class SectionRule:
+    """Validation rule for a single H2 section.
+
+    Parameters
+    ----------
+    slug:
+        Section slug as it appears in entity metadata (e.g. ``definition``).
+    label:
+        Human-readable section name for diagnostics.
+    requirement:
+        Whether the section is required, recommended, or optional.
+    min_words:
+        Minimum word count (inclusive).  ``None`` means no lower bound.
+    max_words:
+        Maximum word count (inclusive).  ``None`` means no upper bound.
+    """
+
+    slug: str
+    label: str
+    requirement: SectionRequirement
+    min_words: Optional[int] = None
+    max_words: Optional[int] = None
+
+
+@dataclass(frozen=True)
+class EnumConstraint:
+    """Constraint limiting a field to a set of allowed values.
+
+    Parameters
+    ----------
+    field_name:
+        The ``EntityMeta`` field to check (e.g. ``domain``).
+    allowed_values:
+        Tuple of acceptable string values.
+    severity:
+        ``"error"`` or ``"warning"`` when the value is not in the set.
+    """
+
+    field_name: str
+    allowed_values: Tuple[str, ...]
+    severity: str = "warning"
+
+
+@dataclass(frozen=True)
+class EntitySchema:
+    """Complete validation schema for an entity type.
+
+    Parameters
+    ----------
+    name:
+        Human-readable schema name (e.g. ``"Economic Entity"``).
+    section_rules:
+        Tuple of :class:`SectionRule` objects.
+    enum_constraints:
+        Tuple of :class:`EnumConstraint` objects.
+    h1_title_case_severity:
+        Severity for non-title-case H1 headings (``"error"`` or ``"warning"``).
+    require_h1:
+        Whether a non-empty slug (H1) is required.
+    """
+
+    name: str
+    section_rules: Tuple[SectionRule, ...]
+    enum_constraints: Tuple[EnumConstraint, ...] = ()
+    h1_title_case_severity: str = "warning"
+    require_h1: bool = True
+
+
+# ── Default schema for the economic-entity infospace ──────────────
+
+ECONOMIC_ENTITY_SCHEMA = EntitySchema(
+    name="Economic Entity",
+    section_rules=(
+        SectionRule(
+            slug="definition",
+            label="Definition",
+            requirement=SectionRequirement.REQUIRED,
+            min_words=20,
+            max_words=150,
+        ),
+        SectionRule(
+            slug="source_chapter",
+            label="Source Chapter",
+            requirement=SectionRequirement.REQUIRED,
+        ),
+        SectionRule(
+            slug="context",
+            label="Context",
+            requirement=SectionRequirement.REQUIRED,
+        ),
+        SectionRule(
+            slug="economic_domain",
+            label="Economic Domain",
+            requirement=SectionRequirement.REQUIRED,
+        ),
+        SectionRule(
+            slug="smith_s_original_wording",
+            label="Smith's Original Wording",
+            requirement=SectionRequirement.OPTIONAL,
+        ),
+        SectionRule(
+            slug="modern_interpretation",
+            label="Modern Interpretation",
+            requirement=SectionRequirement.OPTIONAL,
+        ),
+    ),
+    enum_constraints=(
+        EnumConstraint(
+            field_name="domain",
+            allowed_values=(
+                "Production",
+                "Exchange",
+                "Distribution",
+                "Regulation",
+                "General Theory",
+            ),
+            severity="warning",
+        ),
+    ),
+    h1_title_case_severity="warning",
+    require_h1=True,
+)
--- a/markitect/infospace/state.py
+++ b/markitect/infospace/state.py
@@ -0,0 +1,141 @@
+"""
+Infospace runtime state.
+
+Computed from the current entities, evaluations, and metrics on disk.
+Provides the data behind ``markitect infospace status`` and
+``markitect infospace viability``.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from datetime import datetime
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+from markitect.infospace.config import InfospaceConfig, ViabilityThreshold
+from markitect.infospace.models import EntityMeta
+from markitect.infospace.evaluation import EvaluationSnapshot
+
+
+@dataclass
+class ViabilityResult:
+    """Result of checking a single viability threshold."""
+
+    metric: str
+    value: float
+    threshold: ViabilityThreshold
+    passed: bool
+
+    def to_dict(self) -> Dict[str, Any]:
+        d: Dict[str, Any] = {
+            "metric": self.metric,
+            "value": self.value,
+            "passed": self.passed,
+        }
+        if self.threshold.min is not None:
+            d["min"] = self.threshold.min
+        if self.threshold.max is not None:
+            d["max"] = self.threshold.max
+        return d
+
+
+@dataclass
+class InfospaceState:
+    """Current runtime state of an infospace.
+
+    Aggregates entity metadata, evaluation results, and viability
+    checks into a single queryable object.
+    """
+
+    config: InfospaceConfig
+    entities: List[EntityMeta] = field(default_factory=list)
+    latest_snapshot: Optional[EvaluationSnapshot] = None
+    viability_results: List[ViabilityResult] = field(default_factory=list)
+    computed_at: datetime = field(default_factory=datetime.utcnow)
+
+    @property
+    def entity_count(self) -> int:
+        return len(self.entities)
+
+    @property
+    def topic_name(self) -> str:
+        return self.config.topic.name
+
+    @property
+    def is_viable(self) -> bool:
+        """``True`` if viability has been checked and every threshold is met."""
+        if not self.viability_results:
+            return False
+        return all(r.passed for r in self.viability_results)
+
+    @property
+    def viability_pass_count(self) -> int:
+        return sum(1 for r in self.viability_results if r.passed)
+
+    @property
+    def viability_total_count(self) -> int:
+        return len(self.viability_results)
+
+    @property
+    def domains(self) -> List[str]:
+        """Distinct domain values across all entities."""
+        return sorted({e.domain for e in self.entities if e.domain})
+
+    @property
+    def has_evaluations(self) -> bool:
+        return self.latest_snapshot is not None
+
+    def check_viability(self, metrics: Dict[str, float]) -> List[ViabilityResult]:
+        """Check *metrics* against the configured viability thresholds.
+
+        Updates :attr:`viability_results` and returns the results.
+        """
+        results: List[ViabilityResult] = []
+        for name, threshold in self.config.viability.items():
+            value = metrics.get(name, 0.0)
+            results.append(ViabilityResult(
+                metric=name,
+                value=value,
+                threshold=threshold,
+                passed=threshold.check(value),
+            ))
+        self.viability_results = results
+        return results
+
+    def summary(self) -> Dict[str, Any]:
+        """Return a summary dict suitable for display or serialisation."""
+        d: Dict[str, Any] = {
+            "topic": self.topic_name,
+            "entity_count": self.entity_count,
+            "domains": self.domains,
+            "has_evaluations": self.has_evaluations,
+        }
+        if self.viability_results:
+            d["viable"] = self.is_viable
+            d["viability_pass"] = self.viability_pass_count
+            d["viability_total"] = self.viability_total_count
+        if self.latest_snapshot:
+            d["last_evaluated"] = self.latest_snapshot.created_at.isoformat()
+        return d
+
+
+def build_state(
+    config: InfospaceConfig,
+    entities: Optional[List[EntityMeta]] = None,
+    snapshot: Optional[EvaluationSnapshot] = None,
+    metrics: Optional[Dict[str, float]] = None,
+) -> InfospaceState:
+    """Build an :class:`InfospaceState` from available data.
+
+    This is a convenience function that assembles the state object
+    and optionally runs viability checks if *metrics* are provided.
+    """
+    state = InfospaceState(
+        config=config,
+        entities=entities or [],
+        latest_snapshot=snapshot,
+    )
+    if metrics is not None:
+        state.check_viability(metrics)
+    return state
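Review note on `check_viability` above: a metric missing from *metrics* is read as `0.0` before it is compared with its threshold. The sketch below illustrates the consequence. It is a minimal standalone approximation: `Threshold` is a hypothetical stand-in for `ViabilityThreshold` (whose `check` method lives in `config.py` and is not shown in this hunk), inclusive bounds are an assumption, and the threshold values `0.5` and `0.1` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Threshold:
    """Hypothetical stand-in for ViabilityThreshold (inclusive bounds assumed)."""

    min: Optional[float] = None
    max: Optional[float] = None

    def check(self, value: float) -> bool:
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


def check_all(
    thresholds: Dict[str, Threshold],
    metrics: Dict[str, float],
) -> Dict[str, bool]:
    # Mirrors the loop in check_viability: a missing metric is treated as 0.0.
    return {
        name: threshold.check(metrics.get(name, 0.0))
        for name, threshold in thresholds.items()
    }


thresholds = {
    "coverage_ratio": Threshold(min=0.5),  # illustrative value
    "redundancy": Threshold(max=0.1),      # illustrative value
}
# Only coverage_ratio was computed; redundancy is absent from the metrics dict.
outcome = check_all(thresholds, {"coverage_ratio": 0.361})
print(outcome)
```

Under these assumptions a max-only threshold passes when its metric was never computed, so callers may prefer to pass a complete metrics dict.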
--- a/markitect/infospace/validator.py
+++ b/markitect/infospace/validator.py
@@ -0,0 +1,261 @@
+"""
+Schema compliance validator for entity metadata.
+
+Validates :class:`~markitect.infospace.models.EntityMeta` instances
+against a declarative :class:`~markitect.infospace.schema.EntitySchema`.
+All checks are deterministic — no LLM calls.
+"""
+
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional, Sequence
+
+from .models import EntityMeta
+from .schema import EntitySchema, SectionRequirement
+
+# Maps schema section slugs to EntityMeta field names.  Most match directly;
+# ``economic_domain`` → ``domain``, ``smith_s_original_wording`` → ``original_wording``.
+_SECTION_FIELD_MAP: Dict[str, str] = {
+    "definition": "definition",
+    "source_chapter": "source_chapter",
+    "context": "context",
+    "economic_domain": "domain",
+    "smith_s_original_wording": "original_wording",
+    "modern_interpretation": "modern_interpretation",
+}
+
+
+@dataclass
+class ComplianceDiagnostic:
+    """A single validation finding."""
+
+    code: str
+    message: str
+    severity: str  # "error" or "warning"
+    section: Optional[str] = None
+    field: Optional[str] = None
+
+    def __str__(self) -> str:
+        parts = [f"[{self.severity.upper()}] {self.code}: {self.message}"]
+        if self.section:
+            parts.append(f"(section: {self.section})")
+        if self.field:
+            parts.append(f"(field: {self.field})")
+        return " ".join(parts)
+
+
+@dataclass
+class ComplianceResult:
+    """Validation result for a single entity."""
+
+    entity_slug: str
+    schema_name: str
+    diagnostics: List[ComplianceDiagnostic] = field(default_factory=list)
+    checks_run: int = 0
+
+    @property
+    def is_compliant(self) -> bool:
+        return self.error_count == 0
+
+    @property
+    def error_count(self) -> int:
+        return sum(1 for d in self.diagnostics if d.severity == "error")
+
+    @property
+    def warning_count(self) -> int:
+        return sum(1 for d in self.diagnostics if d.severity == "warning")
+
+    @property
+    def errors(self) -> List[ComplianceDiagnostic]:
+        return [d for d in self.diagnostics if d.severity == "error"]
+
+    @property
+    def warnings(self) -> List[ComplianceDiagnostic]:
+        return [d for d in self.diagnostics if d.severity == "warning"]
+
+    def summary(self) -> str:
+        status = "PASS" if self.is_compliant else "FAIL"
+        return (
+            f"{self.entity_slug}: {status} "
+            f"({self.checks_run} checks, "
+            f"{self.error_count} errors, "
+            f"{self.warning_count} warnings)"
+        )
+
+
+@dataclass
+class BatchComplianceResult:
+    """Aggregated validation result for multiple entities."""
+
+    results: List[ComplianceResult] = field(default_factory=list)
+    schema_name: str = ""
+
+    @property
+    def total_entities(self) -> int:
+        return len(self.results)
+
+    @property
+    def compliant_count(self) -> int:
+        return sum(1 for r in self.results if r.is_compliant)
+
+    @property
+    def non_compliant_count(self) -> int:
+        return self.total_entities - self.compliant_count
+
+    @property
+    def total_errors(self) -> int:
+        return sum(r.error_count for r in self.results)
+
+    @property
+    def total_warnings(self) -> int:
+        return sum(r.warning_count for r in self.results)
+
+    def summary(self) -> str:
+        lines = [
+            f"Schema: {self.schema_name}",
+            f"Entities: {self.total_entities}",
+            f"Compliant: {self.compliant_count}/{self.total_entities}",
+            f"Errors: {self.total_errors}, Warnings: {self.total_warnings}",
+        ]
+        for r in self.results:
+            lines.append(f"  {r.summary()}")
+        return "\n".join(lines)
+
+
+def _word_count(text: str) -> int:
+    """Count whitespace-separated words."""
+    return len(text.split())
+
+
+def validate_entity(
+    entity: EntityMeta,
+    schema: EntitySchema,
+) -> ComplianceResult:
+    """Validate a single entity against *schema*.
+
+    Returns a :class:`ComplianceResult` with all diagnostics found.
+    """
+    result = ComplianceResult(
+        entity_slug=entity.slug,
+        schema_name=schema.name,
+    )
+    checks = 0
+
+    # ── H1 checks ─────────────────────────────────────────────────
+    if schema.require_h1:
+        checks += 1
+        if not entity.slug:
+            result.diagnostics.append(
+                ComplianceDiagnostic(
+                    code="H1_MISSING",
+                    message="Entity has no H1 heading (empty slug).",
+                    severity="error",
+                )
+            )
+
+    checks += 1
+    if entity.slug and not entity.h1_is_title_case:
+        result.diagnostics.append(
+            ComplianceDiagnostic(
+                code="H1_NOT_TITLE_CASE",
+                message=f"H1 '{entity.h1_raw}' is not in title case.",
+                severity=schema.h1_title_case_severity,
+            )
+        )
+
+    # ── Section checks ────────────────────────────────────────────
+    for rule in schema.section_rules:
+        checks += 1
+        field_name = _SECTION_FIELD_MAP.get(rule.slug, rule.slug)
+        value = getattr(entity, field_name, "")
+
+        is_empty = not value or not value.strip()
+
+        if is_empty:
+            if rule.requirement == SectionRequirement.REQUIRED:
+                result.diagnostics.append(
+                    ComplianceDiagnostic(
+                        code="SECTION_MISSING",
+                        message=f"Required section '{rule.label}' is missing or empty.",
+                        severity="error",
+                        section=rule.slug,
+                    )
+                )
+            elif rule.requirement == SectionRequirement.RECOMMENDED:
+                result.diagnostics.append(
+                    ComplianceDiagnostic(
+                        code="SECTION_RECOMMENDED",
+                        message=f"Recommended section '{rule.label}' is missing.",
+                        severity="warning",
+                        section=rule.slug,
+                    )
+                )
+            # OPTIONAL + empty → no diagnostic
+            continue
+
+        # Word count bounds (only if section has content)
+        wc = _word_count(value)
+        if rule.min_words is not None and wc < rule.min_words:
+            checks += 1
+            result.diagnostics.append(
+                ComplianceDiagnostic(
+                    code="SECTION_TOO_SHORT",
+                    message=(
+                        f"Section '{rule.label}' has {wc} words "
+                        f"(minimum: {rule.min_words})."
+                    ),
+                    severity="error",
+                    section=rule.slug,
+                )
+            )
+        elif rule.max_words is not None and wc > rule.max_words:
+            checks += 1
+            result.diagnostics.append(
+                ComplianceDiagnostic(
+                    code="SECTION_TOO_LONG",
+                    message=(
+                        f"Section '{rule.label}' has {wc} words "
+                        f"(maximum: {rule.max_words})."
+                    ),
+                    severity="warning",
+                    section=rule.slug,
+                )
+            )
+
+    # ── Enum constraints ──────────────────────────────────────────
+    for constraint in schema.enum_constraints:
+        checks += 1
+        value = getattr(entity, constraint.field_name, "")
+
+        # Empty values are left to the section checks above; skip them here
+        if not value or not value.strip():
+            continue
+
+        if value.strip() not in constraint.allowed_values:
+            result.diagnostics.append(
+                ComplianceDiagnostic(
+                    code="ENUM_VALUE_UNKNOWN",
+                    message=(
+                        f"Field '{constraint.field_name}' has value "
+                        f"'{value.strip()}' which is not in the allowed set."
+                    ),
+                    severity=constraint.severity,
+                    field=constraint.field_name,
+                )
+            )
+
+    result.checks_run = checks
+    return result
+
+
+def validate_entities(
+    entities: Sequence[EntityMeta],
+    schema: EntitySchema,
+) -> BatchComplianceResult:
+    """Validate multiple entities against *schema*.
+
+    Returns a :class:`BatchComplianceResult` with per-entity results.
+    """
+    batch = BatchComplianceResult(schema_name=schema.name)
+    for entity in entities:
+        batch.results.append(validate_entity(entity, schema))
+    return batch
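Review note on the section checks in `validate_entity` above: the per-section outcome can be read as a small decision table. The sketch below restates it as a standalone function. It is an approximation for illustration only: `classify_section` is a hypothetical helper, and the requirement is passed as a plain string rather than the `SectionRequirement` enum used in the real code.

```python
from typing import List, Optional, Tuple


def classify_section(
    requirement: str,
    text: str,
    min_words: Optional[int] = None,
    max_words: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Return (code, severity) pairs for one section, following validate_entity."""
    if not text or not text.strip():
        if requirement == "required":
            return [("SECTION_MISSING", "error")]
        if requirement == "recommended":
            return [("SECTION_RECOMMENDED", "warning")]
        return []  # optional and empty: nothing to report

    # Word-count bounds apply only when the section has content.
    word_count = len(text.split())
    if min_words is not None and word_count < min_words:
        return [("SECTION_TOO_SHORT", "error")]
    if max_words is not None and word_count > max_words:
        return [("SECTION_TOO_LONG", "warning")]
    return []


missing = classify_section("required", "")
optional_blank = classify_section("optional", "   ")
too_short = classify_section("required", "only three words", 20, 150)
too_long = classify_section("required", " ".join(["word"] * 151), 20, 150)
```

Note the asymmetry carried over from the original: a section that is too short is an error, while one that is too long is only a warning.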
--- a/markitect/llm/__init__.py
+++ b/markitect/llm/__init__.py
@@ -26,6 +26,15 @@ from markitect.llm.exceptions import (
    LLMTimeoutError,
    LLMSubprocessError,
 )
+from markitect.llm.embedding_adapter import EmbeddingAdapter
+from markitect.llm.embedding_openai import OpenAICompatibleEmbeddingAdapter
+from markitect.llm.embedding_cache import EmbeddingCache
+from markitect.llm.embedding_factory import create_embedding_adapter
+from markitect.llm.similarity import (
+    cosine_similarity,
+    similarity_matrix,
+    find_similar_pairs,
+)

 __all__ = [
    "create_adapter",
@@ -41,4 +50,11 @@ __all__ = [
    "LLMRateLimitError",
    "LLMTimeoutError",
    "LLMSubprocessError",
+    "EmbeddingAdapter",
+    "OpenAICompatibleEmbeddingAdapter",
+    "EmbeddingCache",
+    "create_embedding_adapter",
+    "cosine_similarity",
+    "similarity_matrix",
+    "find_similar_pairs",
 ]
--- a/markitect/llm/embedding_adapter.py
+++ b/markitect/llm/embedding_adapter.py
@@ -0,0 +1,34 @@
+"""
+Abstract base class for embedding adapters.
+
+Embedding adapters convert text into float vectors. This is a separate
+hierarchy from :class:`LLMAdapter` (text generation) because the API
+contract is fundamentally different: text in, float vectors out.
+"""
+
+from abc import ABC, abstractmethod
+
+
+class EmbeddingAdapter(ABC):
+    """Base class for all embedding adapters."""
+
+    @abstractmethod
+    def embed(self, texts: list[str]) -> list[list[float]]:
+        """Embed a batch of texts into vectors.
+
+        Args:
+            texts: One or more strings to embed.
+
+        Returns:
+            A list of embedding vectors, one per input text,
+            in the same order as *texts*.
+        """
+
+    @abstractmethod
+    def validate(self) -> bool:
+        """Check that the adapter is configured correctly.
+
+        Returns:
+            ``True`` if the adapter has a valid configuration
+            (e.g. API key present), ``False`` otherwise.
+        """
--- a/markitect/llm/embedding_cache.py
+++ b/markitect/llm/embedding_cache.py
@@ -0,0 +1,64 @@
+"""
+File-based embedding cache.
+
+Stores embedding vectors in a single JSON file keyed by entity slug.
+Each entry includes a content digest so stale embeddings are
+automatically invalidated when entity content changes.
+"""
+
+import json
+from pathlib import Path
+from typing import Optional
+
+
+class EmbeddingCache:
+    """Persistent cache for embedding vectors.
+
+    Structure on disk (``embeddings.json``)::
+
+        {
+            "division-of-labour": {"digest": "abc123", "vector": [0.1, ...]},
+            ...
+        }
+    """
+
+    def __init__(self, cache_dir: Path):
+        self._path = cache_dir / "embeddings.json"
+        self._data: dict[str, dict] = {}
+        self._hits = 0
+        self._misses = 0
+        self._load()
+
+    def get(self, slug: str, content_digest: str) -> Optional[list[float]]:
+        """Return the cached vector if *content_digest* matches, else ``None``."""
+        entry = self._data.get(slug)
+        if entry is not None and entry.get("digest") == content_digest:
+            self._hits += 1
+            return entry["vector"]
+        self._misses += 1
+        return None
+
+    def put(self, slug: str, content_digest: str, vector: list[float]) -> None:
+        """Store or overwrite the embedding for *slug*."""
+        self._data[slug] = {"digest": content_digest, "vector": vector}
+
+    def save(self) -> None:
+        """Write cache to disk."""
+        self._path.parent.mkdir(parents=True, exist_ok=True)
+        self._path.write_text(json.dumps(self._data, separators=(",", ":")))
+
+    def stats(self) -> dict:
+        """Return cache statistics."""
+        return {
+            "entries": len(self._data),
+            "hits": self._hits,
+            "misses": self._misses,
+        }
+
+    def _load(self) -> None:
+        """Read cache from disk if it exists."""
+        if self._path.is_file():
+            try:
+                self._data = json.loads(self._path.read_text())
+            except (json.JSONDecodeError, OSError):
+                self._data = {}
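Review note on `EmbeddingCache` above: invalidation relies on the caller supplying a digest of the entity content. The sketch below shows the idea with an in-memory stand-in. `DigestKeyedCache` and `content_digest` are hypothetical names, and hashing the UTF-8 text with SHA-256 is an assumption, since this diff does not show how digests are computed.

```python
import hashlib
from typing import Dict, List, Optional


def content_digest(text: str) -> str:
    # Assumption: SHA-256 over the UTF-8 text; the real digest scheme is not shown.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DigestKeyedCache:
    """In-memory stand-in mirroring EmbeddingCache.get/put (no disk I/O)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, slug: str, digest: str) -> Optional[List[float]]:
        entry = self._data.get(slug)
        if entry is not None and entry.get("digest") == digest:
            return entry["vector"]
        return None  # unknown slug, or content changed since the vector was stored

    def put(self, slug: str, digest: str, vector: List[float]) -> None:
        self._data[slug] = {"digest": digest, "vector": vector}


cache = DigestKeyedCache()
original = "The division of labour increases productivity."
cache.put("division-of-labour", content_digest(original), [0.1, 0.2])

hit = cache.get("division-of-labour", content_digest(original))
miss = cache.get("division-of-labour", content_digest(original + " Edited."))
```

Because the slug stays the same while the digest changes, an edited entity simply misses and is overwritten on the next `put`.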
--- a/markitect/llm/embedding_factory.py
+++ b/markitect/llm/embedding_factory.py
@@ -0,0 +1,50 @@
+"""
+Factory for creating embedding adapters by provider name.
+"""
+
+from typing import Optional, Any
+
+from markitect.llm.embedding_adapter import EmbeddingAdapter
+from markitect.llm.exceptions import LLMConfigurationError
+
+_EMBEDDING_PROVIDERS = {
+    "openai": "markitect.llm.embedding_openai.OpenAICompatibleEmbeddingAdapter",
+    "openrouter": "markitect.llm.embedding_openai.OpenAICompatibleEmbeddingAdapter",
+}
+
+
+def create_embedding_adapter(
+    provider: str = "openai",
+    model: Optional[str] = None,
+    api_key: Optional[str] = None,
+    **kwargs: Any,
+) -> EmbeddingAdapter:
+    """Instantiate an :class:`EmbeddingAdapter` for the given *provider*.
+
+    Args:
+        provider: ``"openai"`` or ``"openrouter"``.
+        model: Embedding model name (e.g. ``"text-embedding-3-small"``).
+        api_key: Explicit API key.
+        **kwargs: Extra keyword arguments forwarded to the adapter.
+
+    Returns:
+        A ready-to-use :class:`EmbeddingAdapter` instance.
+
+    Raises:
+        LLMConfigurationError: If *provider* is not recognised.
+    """
+    if provider not in _EMBEDDING_PROVIDERS:
+        known = ", ".join(sorted(_EMBEDDING_PROVIDERS))
+        raise LLMConfigurationError(
+            f"Unknown embedding provider {provider!r}. Choose from: {known}",
+            context={"provider": provider},
+        )
+
+    # Lazy import
+    fqn = _EMBEDDING_PROVIDERS[provider]
+    module_path, class_name = fqn.rsplit(".", 1)
+    import importlib
+    mod = importlib.import_module(module_path)
+    cls = getattr(mod, class_name)
+
+    return cls(model=model, api_key=api_key, provider=provider, **kwargs)
--- a/markitect/llm/embedding_openai.py
+++ b/markitect/llm/embedding_openai.py
@@ -0,0 +1,125 @@
+"""
+OpenAI-compatible embedding adapter.
+
+Works with both OpenAI (``/v1/embeddings``) and OpenRouter
+(``/api/v1/embeddings``) since they share the same API format.
+The *provider* parameter determines the default base URL and
+API key environment variable.
+"""
+
+import time
+from typing import Optional, Dict, Any
+
+from markitect.llm.embedding_adapter import EmbeddingAdapter
+from markitect.llm.config import resolve_api_key, find_project_root
+from markitect.llm._http import post_json
+from markitect.llm.exceptions import (
+    LLMConfigurationError,
+    LLMAPIError,
+    LLMRateLimitError,
+)
+
+_DEFAULT_MODEL = "text-embedding-3-small"
+
+_PROVIDER_DEFAULTS: Dict[str, Dict[str, str]] = {
+    "openai": {
+        "api_base": "https://api.openai.com/v1",
+        "env_var": "OPENAI_API_KEY",
+    },
+    "openrouter": {
+        "api_base": "https://openrouter.ai/api/v1",
+        "env_var": "OPENROUTER_API_KEY",
+    },
+}
+
+
+class OpenAICompatibleEmbeddingAdapter(EmbeddingAdapter):
+    """Embedding adapter for OpenAI-compatible endpoints.
+
+    A single class handles both OpenAI and OpenRouter because they
+    expose the same ``/embeddings`` endpoint format.
+    """
+
+    def __init__(
+        self,
+        model: Optional[str] = None,
+        api_key: Optional[str] = None,
+        api_base: Optional[str] = None,
+        provider: str = "openai",
+        max_retries: int = 3,
+    ):
+        if provider not in _PROVIDER_DEFAULTS:
+            known = ", ".join(sorted(_PROVIDER_DEFAULTS))
+            raise LLMConfigurationError(
+                f"Unknown embedding provider {provider!r}. Choose from: {known}",
+                context={"provider": provider},
+            )
+
+        defaults = _PROVIDER_DEFAULTS[provider]
+        self._model = model or _DEFAULT_MODEL
+        self._api_base = (api_base or defaults["api_base"]).rstrip("/")
+        self._max_retries = max_retries
+        self._provider = provider
+
+        # Resolve API key
+        env_var = defaults["env_var"]
+        root = find_project_root()
+        key_file_paths = [root / f"apikey-{provider}.txt"] if root else []
+        self._api_key = resolve_api_key(
+            explicit=api_key,
+            env_var=env_var,
+            key_file_paths=key_file_paths,
+        )
+
+    def embed(self, texts: list[str]) -> list[list[float]]:
+        """Embed texts via the OpenAI-compatible ``/embeddings`` endpoint.
+
+        Raises:
+            LLMConfigurationError: If no API key is configured.
+            LLMAPIError: On non-retryable HTTP errors, or once retries are exhausted.
+        """
+        if not self._api_key:
+            raise LLMConfigurationError(
+                "No API key configured for embedding adapter",
+                context={"provider": self._provider},
+            )
+
+        url = f"{self._api_base}/embeddings"
+        payload: Dict[str, Any] = {
+            "model": self._model,
+            "input": texts,
+        }
+        headers = {"Authorization": f"Bearer {self._api_key}"}
+
+        data = self._post_with_retries(url, payload, headers)
+
+        # Response: {"data": [{"embedding": [...], "index": 0}, ...]}
+        # Sort by index to guarantee input order.
+        items = sorted(data["data"], key=lambda d: d["index"])
+        return [item["embedding"] for item in items]
+
+    def validate(self) -> bool:
+        """Return ``True`` if an API key is available."""
+        return bool(self._api_key)
+
+    def _post_with_retries(
+        self,
+        url: str,
+        payload: Dict[str, Any],
+        headers: Dict[str, str],
+    ) -> Dict[str, Any]:
+        last_exc: Optional[Exception] = None
+        for attempt in range(self._max_retries + 1):
+            try:
+                return post_json(url, payload, headers)
+            except LLMRateLimitError as exc:
+                last_exc = exc
+                if attempt < self._max_retries:
+                    time.sleep(2 ** attempt)
+            except LLMAPIError as exc:
+                if exc.status_code >= 500 and attempt < self._max_retries:
+                    last_exc = exc
+                    time.sleep(2 ** attempt)
+                else:
+                    raise
+        raise last_exc  # type: ignore[misc]
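Review note on `_post_with_retries` above: the loop makes `max_retries + 1` attempts and sleeps `2 ** attempt` seconds only when another attempt remains. The sketch below works out the resulting delay schedule. `backoff_schedule` is a hypothetical helper written for illustration and is not part of the adapter.

```python
from typing import List


def backoff_schedule(max_retries: int) -> List[float]:
    """Delays (in seconds) slept between attempts for a given retry budget."""
    # Attempts are numbered 0..max_retries; a sleep follows every failed
    # attempt except the last one, so there are max_retries sleeps in total.
    return [float(2 ** attempt) for attempt in range(max_retries)]


delays = backoff_schedule(3)
worst_case_wait = sum(delays)
print(delays, worst_case_wait)
```

With the default `max_retries=3`, a request that keeps hitting retryable errors therefore waits 1, 2, and 4 seconds between its four attempts before the last exception is raised.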
--- a/markitect/llm/similarity.py
+++ b/markitect/llm/similarity.py
@@ -0,0 +1,64 @@
+"""
+Pure-Python vector similarity utilities.
+
+No external dependencies; uses :mod:`math` only.  Sufficient for the
+current scale (hundreds of entities).  NumPy can be substituted later if needed.
+"""
+
+import math
+
+
+def cosine_similarity(a: list[float], b: list[float]) -> float:
+    """Cosine similarity between two vectors.
+
+    Returns a float in [-1, 1].  Returns 0.0 if either vector has
+    zero magnitude (to avoid division by zero).
+    """
+    dot = sum(x * y for x, y in zip(a, b))
+    mag_a = math.sqrt(sum(x * x for x in a))
+    mag_b = math.sqrt(sum(x * x for x in b))
+    if mag_a == 0.0 or mag_b == 0.0:
+        return 0.0
+    return dot / (mag_a * mag_b)
+
+
+def similarity_matrix(embeddings: list[list[float]]) -> list[list[float]]:
+    """Build an NxN cosine similarity matrix.
+
+    ``matrix[i][j]`` is the cosine similarity between
+    ``embeddings[i]`` and ``embeddings[j]``.
+    """
+    n = len(embeddings)
+    mat: list[list[float]] = [[0.0] * n for _ in range(n)]
+    for i in range(n):
+        mat[i][i] = 1.0
+        for j in range(i + 1, n):
+            sim = cosine_similarity(embeddings[i], embeddings[j])
+            mat[i][j] = sim
+            mat[j][i] = sim
+    return mat
+
+
+def find_similar_pairs(
+    embeddings: dict[str, list[float]],
+    threshold: float = 0.80,
+) -> list[tuple[str, str, float]]:
+    """Find all pairs with cosine similarity >= *threshold*.
+
+    Args:
+        embeddings: Mapping of slug → embedding vector.
+        threshold: Minimum similarity to include (default 0.80).
+
+    Returns:
+        List of ``(slug_a, slug_b, similarity)`` tuples sorted by
+        similarity descending.
+    """
+    slugs = sorted(embeddings)
+    pairs: list[tuple[str, str, float]] = []
+    for i, slug_a in enumerate(slugs):
+        for slug_b in slugs[i + 1:]:
+            sim = cosine_similarity(embeddings[slug_a], embeddings[slug_b])
+            if sim >= threshold:
+                pairs.append((slug_a, slug_b, sim))
+    pairs.sort(key=lambda t: t[2], reverse=True)
+    return pairs
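Review note on `find_similar_pairs` above: a small worked example may help when choosing a threshold. The sketch below re-declares `cosine_similarity` so that it runs standalone. The slugs and two-dimensional vectors are toy values chosen for illustration, and the `0.7` threshold is likewise arbitrary.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    if mag_a == 0.0 or mag_b == 0.0:
        return 0.0
    return dot / (mag_a * mag_b)


def similar_pairs(
    embeddings: Dict[str, List[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    # Compare each unordered pair once, in slug order, as the original does.
    slugs = sorted(embeddings)
    pairs = []
    for i, slug_a in enumerate(slugs):
        for slug_b in slugs[i + 1:]:
            sim = cosine_similarity(embeddings[slug_a], embeddings[slug_b])
            if sim >= threshold:
                pairs.append((slug_a, slug_b, sim))
    pairs.sort(key=lambda t: t[2], reverse=True)
    return pairs


toy = {
    "alpha": [1.0, 0.0],
    "beta": [1.0, 1.0],   # 45 degrees from both alpha and gamma
    "gamma": [0.0, 1.0],  # orthogonal to alpha
}
pairs = similar_pairs(toy, threshold=0.7)
```

Here `alpha`/`beta` and `beta`/`gamma` each score about 0.707 and are kept, while the orthogonal `alpha`/`gamma` pair scores 0.0 and is dropped. One caveat worth knowing: `zip` stops at the shorter input, so vectors of unequal length are compared over their common prefix without any error.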
--- a/markitect/prompts/execution/batch.py
+++ b/markitect/prompts/execution/batch.py
@@ -0,0 +1,168 @@
+"""
+Batch LLM evaluation orchestrator.
+
+Runs an evaluation prompt against a batch of items (entities, pairs,
+etc.), collecting structured results.  Handles:
+
+- Incremental evaluation (skip items whose content hasn't changed)
+- Progress reporting via callback
+- Graceful error handling per item (one failure doesn't stop the batch)
+- Aggregate token usage tracking
+
+This is the mechanism by which infospace tooling delegates LLM work
+to the platform.  The adapter's own retry logic handles transient
+API errors (rate limits, 5xx).
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any, Callable, Dict, List, Optional
+
+from markitect.prompts.execution.llm_adapter import LLMAdapter
+from markitect.prompts.execution.models import LLMResponse, RunConfig
+
+
+@dataclass
+class BatchItem:
+    """A single item to evaluate in a batch.
+
+    Attributes:
+        key: Unique identifier (e.g. entity slug).
+        prompt: The compiled prompt text to send to the LLM.
+        content_digest: Hash of the source content, used for
+            incremental evaluation (skip if unchanged).
+        metadata: Arbitrary pass-through metadata.
+    """
+
+    key: str
+    prompt: str
+    content_digest: str = ""
+    metadata: Dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class BatchResult:
+    """Result for a single batch item.
+
+    Attributes:
+        key: Matches the input :attr:`BatchItem.key`.
+        status: One of ``"success"``, ``"error"``, ``"skipped"``.
+        response: The LLM response (``None`` if skipped or error).
+        error: Error message (``None`` if success or skipped).
+        metadata: Pass-through metadata from the input item.
+    """
+
+    key: str
+    status: str
+    response: Optional[LLMResponse] = None
+    error: Optional[str] = None
+    metadata: Dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class BatchSummary:
+    """Aggregate results from a batch evaluation run."""
+
+    total: int = 0
+    succeeded: int = 0
+    failed: int = 0
+    skipped: int = 0
+    results: List[BatchResult] = field(default_factory=list)
+    total_prompt_tokens: int = 0
+    total_completion_tokens: int = 0
+
+    @property
+    def total_tokens(self) -> int:
+        return self.total_prompt_tokens + self.total_completion_tokens
+
+    def success_rate(self) -> float:
+        """Fraction of non-skipped items that succeeded (1.0 if none were attempted)."""
+        attempted = self.total - self.skipped
+        if attempted == 0:
+            return 1.0
+        return self.succeeded / attempted
+
+
+class BatchEvaluator:
+    """Orchestrates LLM evaluation across a batch of items.
+
+    Args:
+        adapter: The LLM adapter to use for evaluation.
+        config: Run configuration (model, temperature, etc.).
+        progress_callback: Optional ``fn(completed, total, result)``
+            called after each item is processed.
+        previous_digests: Optional ``{key: digest}`` mapping from a
+            previous run.  Items whose digest matches are skipped.
+    """
+
+    def __init__(
+        self,
+        adapter: LLMAdapter,
+        config: Optional[RunConfig] = None,
+        progress_callback: Optional[Callable[[int, int, BatchResult], None]] = None,
+        previous_digests: Optional[Dict[str, str]] = None,
+    ):
+        self._adapter = adapter
+        self._config = config or RunConfig()
+        self._progress_callback = progress_callback
+        self._previous_digests = previous_digests or {}
+
+    def evaluate(self, items: List[BatchItem]) -> BatchSummary:
+        """Run evaluation for all items and return aggregate results.
+
+        Items whose :attr:`~BatchItem.content_digest` matches an entry
+        in *previous_digests* are skipped.  All other items are sent to
+        the LLM adapter.  Errors on individual items are captured
+        without aborting the batch.
+        """
+        summary = BatchSummary(total=len(items))
+
+        for idx, item in enumerate(items):
+            result = self._evaluate_one(item)
+            summary.results.append(result)
+
+            if result.status == "success":
+                summary.succeeded += 1
+                usage = (result.response.usage if result.response else None) or {}
+                summary.total_prompt_tokens += usage.get("prompt_tokens", 0)
+                summary.total_completion_tokens += usage.get("completion_tokens", 0)
+            elif result.status == "skipped":
+                summary.skipped += 1
+            else:
+                summary.failed += 1
+
+            if self._progress_callback is not None:
+                self._progress_callback(idx + 1, len(items), result)
+
+        return summary
+
+    def _evaluate_one(self, item: BatchItem) -> BatchResult:
+        """Evaluate a single item, handling skip logic and errors."""
+        # Incremental: skip if digest unchanged
+        if (
+            item.content_digest
+            and item.key in self._previous_digests
+            and self._previous_digests[item.key] == item.content_digest
+        ):
+            return BatchResult(
+                key=item.key,
+                status="skipped",
+                metadata=item.metadata,
+            )
+
+        try:
+            response = self._adapter.execute_prompt(item.prompt, self._config)
+            return BatchResult(
+                key=item.key,
+                status="success",
+                response=response,
+                metadata=item.metadata,
+            )
+        except Exception as exc:
+            return BatchResult(
+                key=item.key,
+                status="error",
+                error=str(exc),
+                metadata=item.metadata,
+            )
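Review note on the incremental skip in `_evaluate_one` above: an item is skipped only when it carries a non-empty digest that equals the digest recorded for the same key in the previous run. The sketch below isolates that rule. `partition_by_digest` is a hypothetical helper that works on plain `(key, digest)` tuples instead of `BatchItem` objects.

```python
from typing import Dict, List, Tuple


def partition_by_digest(
    items: List[Tuple[str, str]],
    previous: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Split (key, digest) pairs into keys to evaluate and keys to skip."""
    to_run: List[str] = []
    to_skip: List[str] = []
    for key, digest in items:
        # Skip only on a non-empty digest that matches the previous run.
        if digest and previous.get(key) == digest:
            to_skip.append(key)
        else:
            to_run.append(key)
    return to_run, to_skip


items = [
    ("unchanged", "d1"),  # same digest as before: skipped
    ("edited", "d2"),     # digest differs from the previous run: evaluated
    ("no-digest", ""),    # empty digest: always evaluated
    ("new", "d4"),        # not seen before: evaluated
]
previous = {"unchanged": "d1", "edited": "old", "no-digest": ""}
to_run, to_skip = partition_by_digest(items, previous)
```

Items built without a `content_digest` are therefore re-evaluated on every run, even if an empty digest was recorded previously.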
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -33,6 +33,7 @@ development = [
    "kaizen-agentic @ file:./capabilities/kaizen-agentic"
 ]
 proxy-pdf = ["pymupdf4llm>=0.0.10"]
+analysis = ["networkx>=3.0"]
 proxy-html = ["markdownify>=0.13.1"]
 proxy-markitdown = ["markitdown-no-magika[pdf]"]
 proxy = ["markitdown-no-magika[pdf]"]
--- a/roadmap/infospace-tooling/PLAN.md
+++ b/roadmap/infospace-tooling/PLAN.md
@@ -0,0 +1,621 @@
+# Viable Infospace Tooling — Roadmap
+
+## Vision
+
+An **infospace** is a structured, evaluable, composable collection of
+concepts that explains a **topic** through the lens of one or more
+**disciplines**. Infospaces are the unit of knowledge work in MarkiTect.
+
+This roadmap organises the work needed to move from the current
+ad-hoc example (`infospace-with-history`) to a general-purpose platform
+for creating, evaluating, maintaining, and composing infospaces.
+
+---
+
+## Terminology
+
+These terms establish the vocabulary for infospace tooling. They
+generalise from the Wealth of Nations / VSM example but are not
+specific to it.
+
+### Infospace
+
+A curated, self-describing collection of **entities** (concepts,
+mechanisms, observations) that together explain a **topic**. An
+infospace has:
+
+- A **topic** — the subject matter being explained (e.g. "The Wealth
+  of Nations", "cellular biology", "Kubernetes networking")
+- One or more **disciplines** — external frameworks applied as lenses
+  (e.g. "Viable System Model", "category theory")
+- **Entities** — the atomic units of knowledge, each with a definition,
+  provenance, and quality scores
+- **Schemas** — structural templates that define what a well-formed
+  entity, mapping, or analysis looks like
+- **Evaluations** — per-entity and collection-level quality assessments
+- **Metrics** — quantitative indicators of completeness, coherence,
+  consistency, and granularity balance
+
+An infospace is **viable** when it meets threshold scores across its
+defined metrics — it is fit for purpose as an explanatory tool.
+
+### Topic
+
+The subject matter an infospace is built to explain. A topic sits
+within a **domain** (broader field of knowledge) but is more specific:
+
+- Domain: Economics → Topic: The Wealth of Nations
+- Domain: Systems Theory → Topic: Viable System Model
+- Domain: Computer Science → Topic: Distributed consensus protocols
+
+A topic provides the **source material** — the texts, data, or
+observations from which entities are extracted.
+
+### Discipline
+
+A reusable framework of concepts applied as a lens to explore a topic.
+A discipline is itself an infospace — one that has been evaluated as
+viable and packaged for reuse.
+
+In our example, the VSM is the discipline: a set of concepts (S1-S5,
+recursion, variety, viability) from systems theory, applied to the
+economic concepts in Smith's work.
+
+**Key property:** Disciplines compose. An infospace built with one
+discipline can itself become a discipline for another infospace. The
+Wealth of Nations infospace, viewed through VSM, could become a
+discipline applied to a modern supply chain analysis.
+
+### Entity
+
+The atomic unit of an infospace. An entity has:
+
+- **Identity**: a unique slug and human-readable title
+- **Definition**: a precise, non-circular explanation
+- **Provenance**: the source chapter, passage, and extraction context
+- **Domain placement**: which area of the topic it belongs to
+- **Discipline mapping**: how it connects to the applied discipline
+  (e.g. which VSM system)
+- **Quality scores**: per-entity LLM-evaluated metrics
+- **Lifecycle state**: active, archived (with reason), or draft
+
+### Evaluation
+
+A structured assessment of quality, applied at two levels:
+
+- **Per-entity evaluation**: scores an individual entity against
+  quality rubrics defined in its schema (definition precision, source
+  grounding, discipline relevance, etc.)
+- **Collection evaluation**: scores the entity set as a whole against
+  five concerns: redundancy, coverage, coherence, consistency, and
+  granularity balance
+
+Judgment-based evaluations are always performed by **delegated LLM calls**
+through MarkiTect's LLM integration, never by the coding agent working on
+infrastructure. This separation ensures that domain-level judgment
+stays in the problem space, not the tooling space.
+
+### Viability
+
+An infospace is viable when:
+
+1. Its entities individually meet quality thresholds (per-entity eval)
+2. Its collection metrics are within acceptable ranges
+3. It can answer its defined **competency questions** — the canonical
+   queries the infospace is meant to support
+4. It has been evaluated recently enough that metrics reflect current
+   content
+
+Viability is not binary — it is a profile of scores that the user
+sets thresholds for based on their needs.
+
+---
+
+## Architecture: Three Layers
+
+```
+┌──────────────────────────────────────────────────┐
+│  Layer 3: Infospace Instances                    │
+│  Specific infospaces built by users              │
+│  (Wealth of Nations + VSM, supply chain + ...)   │
+│  Works IN an infospace                           │
+├──────────────────────────────────────────────────┤
+│  Layer 2: Infospace Tooling                      │
+│  Terminology, primitives, composition model      │
+│  CLI: infospace create/evaluate/compose/...      │
+│  Works WITH infospaces                           │
+├──────────────────────────────────────────────────┤
+│  Layer 1: MarkiTect Platform                     │
+│  Artifacts, prompts, LLM, spaces, graph, embed   │
+│  Provides FOR infospaces                         │
+└──────────────────────────────────────────────────┘
+```
+
+### Boundary condition: LLM delegation
+
+All LLM-based evaluation (entity scoring, pairwise judgments, coverage
+analysis) is delegated to MarkiTect's LLM integration module. The coding
+agent that works on infrastructure never makes domain-level judgments
+itself. This keeps a clean separation:
+
+- **Coding agent** → writes Python, templates, schemas, tests
+- **MarkiTect LLM** → evaluates entities, judges redundancy, assesses
+  coverage, checks consistency
+
+The infospace tooling (Layer 2) orchestrates these LLM calls through
+prompt templates and the prompt execution engine, not through ad-hoc
+prompting.
+
+---
+
+## Stage 1: MarkiTect Platform Additions
+
+Infrastructure that must exist before infospace tooling can be built.
+These are general-purpose platform capabilities, not infospace-specific.
+
+### S1.1 — Entity metadata parser
+
+Add a deterministic markdown parser that extracts structured metadata
+from entity files: H1 title, sections present, word counts, domain,
+source chapter. Returns a dataclass usable by all downstream metrics.
+
+**Maps to:** INFRA-TASKS #13, #10
+**Location:** `markitect/prompts/quality/` or new `markitect/analysis/`
+**Depends on:** Nothing — can start immediately
+**Deliverable:** `parse_entity_metadata(path) -> EntityMeta` function
+with tests
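+
+As one possible starting point, the parser could scan headings and count
+words per section. The sketch below is a minimal illustration, not the
+planned implementation: it works on text rather than a path, and the
+`EntityMeta` fields and the `parse_entity_text` name are assumptions.
+
+```python
+import re
+from dataclasses import dataclass, field
+from typing import Dict
+
+
+@dataclass
+class EntityMeta:
+    """Hypothetical metadata record: H1 title plus word count per H2 section."""
+
+    title: str
+    sections: Dict[str, int] = field(default_factory=dict)
+
+
+def parse_entity_text(text: str) -> EntityMeta:
+    """Extract the first H1 title and per-section word counts from markdown."""
+    title = ""
+    sections: Dict[str, int] = {}
+    current = None
+    for line in text.splitlines():
+        h2 = re.match(r"^##\s+(.+)", line)
+        h1 = re.match(r"^#\s+(.+)", line)
+        if h2:
+            # A new H2 heading opens a section with a zero word count.
+            current = h2.group(1).strip()
+            sections[current] = 0
+        elif h1 and not title:
+            title = h1.group(1).strip()
+        elif current is not None:
+            # Body lines (including deeper headings) count towards the section.
+            sections[current] += len(line.split())
+    return EntityMeta(title=title, sections=sections)
+```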
+
+### S1.2 — Schema compliance validator
+
+Deterministic validation of entity/mapping files against their schemas:
+section presence, word count ranges, heading format, enum values. No
+LLM needed.
+
+**Maps to:** INFRA-TASKS #10
+**Location:** `markitect/prompts/quality/validator.py` (extend existing)
+**Depends on:** S1.1
+**Deliverable:** `validate_document(path, schema) -> ValidationResult`
+with tests
+
+### S1.3 — Embedding adapter
+
+Add embedding support to `markitect/llm/`. Needs:
+
+- `EmbeddingAdapter` interface: `embed(texts: list[str]) -> list[list[float]]`
+- `OpenRouterEmbeddingAdapter` implementation (or OpenAI embedding endpoint)
+- Caching layer: store embeddings keyed by `{slug: content_digest}` so
+  unchanged entities skip re-embedding
+- Cosine similarity utility: `similarity_matrix(embeddings) -> np.ndarray`
+
+**Maps to:** INFRA-TASKS #14 (prerequisite)
+**Location:** `markitect/llm/embeddings.py`
+**Depends on:** Nothing — can start immediately
+**Deliverable:** Embedding adapter + cache + similarity computation, with
+tests
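+
+The digest-keyed cache and the similarity utility might be sketched as
+follows. This version uses only the standard library and returns nested
+lists; `content_digest`, `EmbeddingCache`, and this pure-Python
+`similarity_matrix` are illustrative assumptions, and a real implementation
+would likely return an `np.ndarray` as listed above.
+
+```python
+import hashlib
+import math
+from typing import Dict, List, Optional, Tuple
+
+
+def content_digest(text: str) -> str:
+    """Stable digest of entity content, used to detect unchanged entities."""
+    return hashlib.sha256(text.encode("utf-8")).hexdigest()
+
+
+class EmbeddingCache:
+    """In-memory cache mapping slug -> (digest, embedding). Hypothetical."""
+
+    def __init__(self) -> None:
+        self._store: Dict[str, Tuple[str, List[float]]] = {}
+
+    def get(self, slug: str, text: str) -> Optional[List[float]]:
+        """Return the cached embedding only if the content digest still matches."""
+        entry = self._store.get(slug)
+        if entry is not None and entry[0] == content_digest(text):
+            return entry[1]
+        return None
+
+    def put(self, slug: str, text: str, embedding: List[float]) -> None:
+        self._store[slug] = (content_digest(text), embedding)
+
+
+def similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
+    """Pairwise cosine similarity; zero vectors get similarity 0.0."""
+    norms = [math.sqrt(sum(x * x for x in v)) for v in embeddings]
+    matrix = []
+    for i, a in enumerate(embeddings):
+        row = []
+        for j, b in enumerate(embeddings):
+            if norms[i] == 0.0 or norms[j] == 0.0:
+                row.append(0.0)
+            else:
+                dot = sum(x * y for x, y in zip(a, b))
+                row.append(dot / (norms[i] * norms[j]))
+        matrix.append(row)
+    return matrix
+```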
+
+### S1.4 — Graph analysis utilities
+
+The existing `DependencyGraph` supports basic traversal and cycle
+detection. Collection-level metrics need richer analysis:
+
+- Connected components
+- Betweenness centrality
+- Community detection (Louvain or label propagation)
+- Modularity score
+- Degree distribution
+- Cohesion/coupling computation
+
+Decide: extend `DependencyGraph` or add a lightweight wrapper that
+converts to networkx (adding it as an optional dependency).
+
+**Maps to:** INFRA-TASKS #16 (prerequisite)
+**Location:** `markitect/prompts/dependencies/analysis.py` or new
+`markitect/analysis/graph.py`
+**Depends on:** Nothing — can start immediately
+**Deliverable:** Graph analysis functions with tests
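+
+To make the first item concrete, connected components can be computed with
+a plain breadth-first search. This is a dependency-free sketch; the
+function name and the edge-list input are assumptions, and it does not
+use the existing `DependencyGraph` API.
+
+```python
+from collections import deque
+
+
+def connected_components(edges, nodes=()):
+    """Return the components of an undirected graph as sorted lists of nodes."""
+    adjacency = {n: set() for n in nodes}
+    for a, b in edges:
+        adjacency.setdefault(a, set()).add(b)
+        adjacency.setdefault(b, set()).add(a)
+
+    seen = set()
+    components = []
+    for start in sorted(adjacency):
+        if start in seen:
+            continue
+        # Breadth-first search from each unvisited node collects one component.
+        queue = deque([start])
+        seen.add(start)
+        component = []
+        while queue:
+            node = queue.popleft()
+            component.append(node)
+            for neighbour in adjacency[node]:
+                if neighbour not in seen:
+                    seen.add(neighbour)
+                    queue.append(neighbour)
+        components.append(sorted(component))
+    return components
+```
+
+A collection whose entity references form a single component would report
+one list here; isolated entities show up as singleton components.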
+
+### S1.5 — Structured evaluation output
+
+Define a standard format for evaluation results: YAML front-matter +
+markdown body. Add utilities for:
+
+- Writing evaluation results (per-entity, per-pair, collection-level)
+- Reading/parsing evaluation results back into dataclasses
+- Appending timestamped snapshots to a history file
+- Diffing two snapshots
+
+**Maps to:** INFRA-TASKS #11, #12
+**Location:** `markitect/prompts/quality/` or `markitect/analysis/`
+**Depends on:** S1.1
+**Deliverable:** `EvaluationResult` model + read/write utilities with
+tests
+
+### S1.6 — Batch LLM evaluation orchestrator
+
+A pipeline component that runs an evaluation prompt template against a
+batch of entities (or entity pairs), collecting structured results.
+Must handle:
+
+- Rate limiting and retry (reuse existing adapter logic)
+- Progress reporting
+- Incremental evaluation (skip entities whose content hasn't changed
+  since last eval)
+- Result aggregation
+
+This is the mechanism by which infospace tooling delegates LLM work
+to the platform.
+
+**Maps to:** INFRA-TASKS #9 (prerequisite)
+**Location:** `markitect/prompts/execution/batch.py`
+**Depends on:** S1.5
+**Deliverable:** `BatchEvaluator` class with tests
+
+### S1.7 — FCA computation
+
+Formal Concept Analysis: build a formal context (entity × attribute
+matrix), compute the concept lattice, extract gap concepts. Either
+implement a minimal FCA algorithm or integrate a library.
+
+**Maps to:** INFRA-TASKS #15 (prerequisite)
+**Location:** `markitect/analysis/fca.py`
+**Depends on:** S1.1
+**Deliverable:** `FormalContext`, `ConceptLattice`, `find_gap_concepts()`
+with tests
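+
+For small contexts, the concept lattice can be derived by closing the
+entities' attribute sets under intersection. The sketch below assumes a
+context given as a dict from entity slug to attribute set; the function
+name and the example attribute labels are hypothetical, and it leaves
+open how gap concepts would be defined.
+
+```python
+def concept_lattice(context):
+    """Return all formal concepts as (extent, intent) pairs of frozensets.
+
+    context maps each object (entity slug) to its set of attributes.
+    """
+    all_attributes = frozenset().union(*context.values()) if context else frozenset()
+    # Every concept intent is an intersection of object intents (or the full
+    # attribute set), so closing under intersection enumerates all of them.
+    intents = {all_attributes}
+    for attrs in context.values():
+        attrs = frozenset(attrs)
+        intents |= {attrs & intent for intent in intents}
+
+    concepts = []
+    for intent in intents:
+        # The extent is every object that carries all attributes of the intent.
+        extent = frozenset(o for o, attrs in context.items() if intent <= set(attrs))
+        concepts.append((extent, intent))
+    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))
+
+
+example_context = {
+    "division-of-labour": {"S1", "production"},
+    "market-price": {"S2", "exchange"},
+    "natural-price": {"S2", "exchange", "value"},
+}
+lattice = concept_lattice(example_context)
+```
+
+This approach is simple but grows quickly with the number of distinct
+intents, which is one reason the plan keeps the library option open.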
+
+### Summary: Stage 1 dependency graph
+
+```
+S1.1 Entity metadata parser ──┬── S1.2 Schema validator
+                               ├── S1.5 Eval output format ── S1.6 Batch evaluator
+                               └── S1.7 FCA computation
+
+S1.3 Embedding adapter ──────── (independent)
+S1.4 Graph analysis ─────────── (independent)
+```
+
+S1.1, S1.3, and S1.4 can proceed in parallel. S1.6 (batch evaluator) ends
+the longest chain (S1.1 → S1.5 → S1.6) and gates S2.3 and S2.4 in Stage 2.
+
+---
+
+## Stage 2: Infospace Tooling
+
+The user-facing layer that provides documented primitives for working
+with infospaces. Built on top of Stage 1 infrastructure and the existing
+`markitect/spaces/` module.
+
+### S2.1 — Infospace model and configuration
+
+Define the `Infospace` as a first-class concept that extends the existing
+`InformationSpace` with:
+
+- **Topic declaration**: name, domain, source material reference
+- **Discipline bindings**: which external infospaces are applied as lenses
+- **Schema registry**: which schemas govern entity structure
+- **Competency questions**: what the infospace should be able to answer
+- **Viability thresholds**: minimum acceptable metric scores
+- **Evaluation state**: latest per-entity and collection scores
+
+Configuration format: an `infospace.yaml` (or section in existing config)
+that declares all of the above.
+
+**Location:** new `markitect/infospace/` package
+**Depends on:** S1.1, S1.5, existing `markitect/spaces/`
+**Deliverable:** `InfospaceConfig`, `InfospaceState` models + loader
+
+### S2.2 — Infospace lifecycle commands
+
+CLI commands for the core lifecycle:
+
+```bash
+# Initialise a new infospace
+markitect infospace init --topic "Wealth of Nations" \
+  --domain "Economics" \
+  --discipline vsm-framework
+
+# Show infospace status (entity count, eval state, viability)
+markitect infospace status
+
+# List entities with quality summary
+markitect infospace entities [--sort-by score|domain|chapter]
+
+# Show viability dashboard
+markitect infospace viability
+```
+
+These commands read the `infospace.yaml` config and present information
+from the metadata index and evaluation results.
+
+**Location:** `markitect/infospace/cli.py` integrated into main CLI
+**Depends on:** S2.1
+**Deliverable:** CLI commands with help text and tests
+
+### S2.3 — Per-entity evaluation primitives
+
+Prompt templates and CLI commands for evaluating individual entities:
+
+```bash
+# Evaluate all entities
+markitect infospace evaluate --provider openrouter
+
+# Evaluate entities from a specific chapter
+markitect infospace evaluate --chapter book-1-chapter-05 --provider openrouter
+
+# Re-evaluate a single entity
+markitect infospace evaluate --entity division-of-labour --provider openrouter
+```
+
+Uses the batch evaluator (S1.6) to run the evaluate-entity prompt
+template (defined in the infospace's schema directory) against entities.
+Writes structured results to `output/evaluations/`.
+
+**Maps to:** INFRA-TASKS #8, #9
+**Location:** `markitect/infospace/evaluation.py`
+**Depends on:** S1.6, S2.1
+**Deliverable:** Per-entity evaluation pipeline + CLI + prompt template
+
+### S2.4 — Collection-level checks
+
+CLI commands for each of the five collection concerns:
+
+```bash
+# Run all collection checks
+markitect infospace check --provider openrouter
+
+# Run specific checks
+markitect infospace check redundancy --provider openrouter
+markitect infospace check coverage --provider openrouter
+markitect infospace check coherence --provider openrouter
+markitect infospace check consistency --provider openrouter
+markitect infospace check granularity --provider openrouter
+```
+
+Each check uses Stage 1 infrastructure (embeddings, graph analysis, FCA)
+and delegates LLM judgment to the platform. Results written to
+`output/metrics/` as per-concern reports + unified `metrics.yaml`.
+
+**Maps to:** INFRA-TASKS #14-19
+**Location:** `markitect/infospace/checks/` (one module per concern)
+**Depends on:** S1.3, S1.4, S1.6, S1.7, S2.1
+**Deliverable:** Five check modules + unified orchestrator + CLI
+
+### S2.5 — Metrics history and viability tracking
+
+Track metrics over time. After each evaluation or check run, append a
+timestamped snapshot to `metrics-history.yaml`. Provide commands to
+review trends:
+
+```bash
+# Show metrics history
+markitect infospace history
+
+# Compare two snapshots
+markitect infospace history diff 2026-02-18 2026-03-01
+
+# Check viability against thresholds
+markitect infospace viability
+```
+
+Viability is assessed by comparing current metrics to the thresholds
+declared in `infospace.yaml`. The report shows a simple pass/fail per
+metric alongside the actual value.
+
+**Maps to:** INFRA-TASKS #12
+**Location:** `markitect/infospace/history.py`
+**Depends on:** S2.4, S1.5
+**Deliverable:** History tracking + viability assessment + CLI
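+
+The pass/fail comparison could be as small as the sketch below. It assumes
+thresholds are given as `min`/`max` bounds per metric and that a missing
+metric counts as a failure; the function name and the sample values are
+illustrative only.
+
+```python
+def assess_viability(metrics, thresholds):
+    """Return {metric: (passed, value)} for every metric that has a threshold."""
+    report = {}
+    for name, bounds in thresholds.items():
+        value = metrics.get(name)
+        if value is None:
+            # Assumption: a metric that was never computed fails its check.
+            report[name] = (False, None)
+            continue
+        passed = True
+        if "min" in bounds and value < bounds["min"]:
+            passed = False
+        if "max" in bounds and value > bounds["max"]:
+            passed = False
+        report[name] = (passed, value)
+    return report
+
+
+example_thresholds = {
+    "coverage_ratio": {"min": 0.60},
+    "consistency_cycles": {"max": 0},
+    "granularity_entropy": {"min": 1.0},
+}
+example_metrics = {
+    "coverage_ratio": 0.361,
+    "consistency_cycles": 0.0,
+    "granularity_entropy": 2.69,
+}
+report = assess_viability(example_metrics, example_thresholds)
+```
+
+With these sample values, only `coverage_ratio` fails its threshold, so
+the infospace would not yet count as viable.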
+
+### S2.6 — Infospace composition model
+
+The mechanism by which one infospace is applied as a discipline to
+another. Builds on `markitect/spaces/composability/`:
+
+- **Discipline binding**: declare that infospace A uses infospace B as a
+  discipline. B's entities become available as mapping targets.
+- **Cross-infospace references**: entity in A maps to concept in B using
+  the same mapping schema and evaluation pipeline.
+- **Discipline viability requirement**: B must be viable (meets its own
+  thresholds) before it can be used as a discipline for A.
+- **Cascading evaluation**: when B's entities change, A's mappings that
+  reference them are flagged for re-evaluation.
+
+```bash
+# Bind a discipline to the current infospace
+markitect infospace bind-discipline ./path/to/vsm-infospace
+
+# List bound disciplines and their viability
+markitect infospace disciplines
+
+# Check for stale mappings after discipline update
+markitect infospace check stale-mappings
+```
+
+**Location:** `markitect/infospace/composition.py`
+**Depends on:** S2.1, existing `markitect/spaces/composability/`
+**Deliverable:** Composition model + CLI + documentation
+
+### S2.7 — Documentation: Infospace Primitives Reference
+
+A reference document explaining all primitives, their purpose, and how
+they compose. This is the user-facing documentation for the infospace
+tooling layer — the equivalent of a framework guide.
+
+**Location:** `docs/infospace-primitives.md` or in-CLI help
+**Depends on:** S2.1-S2.6
+**Deliverable:** Reference documentation
+
+### Summary: Stage 2 dependency graph
+
+```
+S2.1 Model & config ──┬── S2.2 Lifecycle CLI
+                       ├── S2.3 Per-entity evaluation
+                       ├── S2.4 Collection checks ── S2.5 History & viability
+                       └── S2.6 Composition model
+
+S2.7 Documentation (depends on all above)
+```
+
+---
+
+## Stage 3: Example Revision
+
+Revisit the Wealth of Nations / VSM example using the new tooling.
+The example becomes both a tutorial and a validation of the tooling.
+
+### S3.1 — Migrate example to infospace configuration
+
+Replace the ad-hoc `process_chapters.py` setup with a declarative
+`infospace.yaml`:
+
+```yaml
+topic:
+  name: "The Wealth of Nations"
+  domain: "Classical Economics"
+  sources: artifacts/sources/
+
+disciplines:
+  - name: "Viable System Model"
+    path: artifacts/vsm-reference/
+
+schemas:
+  entity: schemas/economic-entity-schema-v1.0.md
+  mapping: schemas/vsm-mapping-schema-v1.0.md
+  analysis: schemas/chapter-analysis-schema-v1.0.md
+
+competency_questions: schemas/competency-questions.md
+
+viability:
+  redundancy_ratio: { max: 0.05 }
+  coverage_ratio: { min: 0.60 }
+  coherence_components: { max: 1 }
+  consistency_cycles: { max: 0 }
+  granularity_entropy: { min: 1.0 }
+  per_entity_mean: { min: 3.5 }
+
+pipeline:
+  stages:
+    - template: extract-entities
+      spaces: [sources, guidelines, vsm-reference, entities]
+    - template: map-to-vsm
+      spaces: [entities, vsm-reference, guidelines]
+    - template: synthesize-analysis
+      spaces: [sources, entities, mappings, vsm-reference]
+  post_batch:
+    - template: assess-metrics
+      spaces: [analyses, vsm-reference]
+```
+
+**Depends on:** S2.1
+**Deliverable:** `infospace.yaml` + migration of `process_chapters.py` to
+use infospace tooling APIs
+
+### S3.2 — Clean per-chapter git history
+
+Re-run all processed chapters (and remaining ones) with per-chapter
+commits on a clean branch, then replace the current tangled history.
+
+**Maps to:** INFRA-TASKS #4, #7
+**Depends on:** S3.1
+**Deliverable:** Clean branch with one commit per chapter
+
+### S3.3 — Full evaluation run
+
+Run all per-entity evaluations and collection checks on the completed
+infospace. Establish baseline metrics. Demonstrate the viability
+dashboard.
+
+**Maps to:** INFRA-TASKS #6
+**Depends on:** S2.3, S2.4, S2.5, S3.2
+**Deliverable:** Complete evaluation results + viability report
+
+### S3.4 — Rewrite tutorial
+
+Update `TUTORIAL.md` to use infospace tooling commands instead of
+raw `process_chapters.py` invocations. The tutorial should walk
+through:
+
+1. Initialising an infospace (`markitect infospace init`)
+2. Defining schemas and competency questions
+3. Processing chapters (pipeline execution)
+4. Evaluating entities (`markitect infospace evaluate`)
+5. Running collection checks (`markitect infospace check`)
+6. Reviewing viability (`markitect infospace viability`)
+7. Iterating: refining guidelines, re-processing, re-evaluating
+8. Using the infospace as a discipline for a new project
+
+**Depends on:** S3.1-S3.3
+**Deliverable:** Revised `TUTORIAL.md`
+
+### S3.5 — Demonstrate composition
+
+Create a minimal second infospace (e.g. a modern supply chain case
+study or a different economic text) that binds the Wealth of Nations
+infospace as a discipline. Demonstrates the composition model from S2.6.
+
+**Depends on:** S2.6, S3.3
+**Deliverable:** Second example infospace + composition tutorial section
+
+---
+
+## Task Mapping
+
+Cross-reference between INFRA-TASKS numbers and roadmap stages:
+
+| INFRA-TASK | Description | Stage |
+|------------|-------------|-------|
+| 1-3 | Infra fixes (resolved) | — |
+| 4 | Per-chapter git history | S3.2 |
+| 5 | Prompt file side-effects | S1.6 (batch eval avoids this) |
+| 6 | Stale metrics | S3.3 |
+| 7 | Remaining 28 chapters | S3.2 |
+| 8 | Per-concept quality metrics in schema | S2.3 |
+| 9 | Evaluate-entity prompt template | S2.3 |
+| 10 | Deterministic schema compliance | S1.2 |
+| 11 | Structured metrics output | S1.5 |
+| 12 | Metrics-over-time tracking | S2.5 |
+| 13 | Entity metadata index | S1.1 |
+| 14 | Redundancy detection (C1) | S2.4 |
+| 15 | Coverage completeness (C2) | S2.4 |
+| 16 | Structural coherence (C3) | S2.4 |
+| 17 | Definitional consistency (C4) | S2.4 |
+| 18 | Granularity balance (C5) | S2.4 |
+| 19 | Unified collection evaluation | S2.4 |
+
+---
+
+## Implementation Order
+
+Recommended sequence, accounting for dependencies and value delivery:
+
+**Phase A — Foundation (Stage 1, parallelisable)**
+1. S1.1 Entity metadata parser
+2. S1.3 Embedding adapter
+3. S1.4 Graph analysis utilities
+
+**Phase B — Validation & Output (Stage 1)**
+4. S1.2 Schema compliance validator (needs S1.1)
+5. S1.5 Structured evaluation output (needs S1.1)
+6. S1.7 FCA computation (needs S1.1)
+
+**Phase C — Orchestration (Stage 1 → Stage 2 bridge)**
+7. S1.6 Batch LLM evaluation orchestrator (needs S1.5)
+
+**Phase D — Infospace Core (Stage 2)**
+8. S2.1 Infospace model and configuration
+9. S2.2 Lifecycle commands
+10. S2.3 Per-entity evaluation primitives (needs S1.6, S2.1)
+
+**Phase E — Collection Intelligence (Stage 2)**
+11. S2.4 Collection-level checks (needs S1.3, S1.4, S1.6, S1.7, S2.1)
+12. S2.5 Metrics history and viability tracking
+
+**Phase F — Composition (Stage 2)**
+13. S2.6 Infospace composition model
+14. S2.7 Documentation
+
+**Phase G — Example (Stage 3)**
+15. S3.1 Migrate example to infospace config
+16. S3.2 Clean per-chapter history
+17. S3.3 Full evaluation run
+18. S3.4 Rewrite tutorial
+19. S3.5 Demonstrate composition
--- a/roadmap/infospace-tooling/viable-information-spaces.md
+++ b/roadmap/infospace-tooling/viable-information-spaces.md
@@ -0,0 +1,381 @@
+# Viable Information Spaces
+
+*A preliminary introduction to the concepts, structure, and purpose of
+viable information spaces as a framework for structured knowledge work.*
+
+---
+
+## What is an Information Space?
+
+An information space is a curated collection of concepts — each precisely
+defined, grounded in source material, and connected to the others — that
+together explain a topic. It is not a database, not a knowledge graph in
+the technical sense, and not a document collection. It is closer to what
+a domain expert carries in their head: a working vocabulary of ideas,
+their relationships, and the judgment to know which idea applies where.
+
+The difference is that an information space makes this vocabulary
+**explicit, evaluable, and composable**. Every concept has a written
+definition. Every relationship can be traced. The quality of the whole
+collection can be measured and improved over time.
+
+We use the term **infospace** as shorthand.
+
+---
+
+## Why "Viable"?
+
+The word comes from Stafford Beer's Viable System Model, but the idea
+generalises beyond it. A viable system is one that can maintain a
+separate existence — it is complete enough to function, coherent enough
+to hold together, and adaptive enough to improve when circumstances
+change.
+
+A **viable infospace** has the same properties:
+
+- **Complete enough** — it covers the topic well enough to answer the
+  questions it was built to answer. Not every detail, but every concept
+  that matters.
+- **Coherent enough** — its concepts connect into an explanatory web,
+  not a disconnected list. You can trace how one idea leads to another.
+- **Consistent enough** — concepts don't contradict each other. Terms
+  are used the same way throughout. Definitions don't go in circles.
+- **Balanced enough** — concepts operate at comparable levels of
+  abstraction. The infospace doesn't mix foundational theories with
+  trivial observations without acknowledging the difference.
+- **Non-redundant enough** — each concept earns its place. Two concepts
+  that mean the same thing should be one concept.
+
+None of these are absolute. "Enough" is defined by the purpose. An
+infospace built for teaching needs different coverage than one built for
+research. Viability is a profile of scores against thresholds that the
+user sets.
+
+---
+
+## The Anatomy of an Infospace
+
+### Topic
+
+Every infospace is built to explain something specific. The **topic** is
+the subject matter: a text, a system, a body of knowledge, a problem
+domain. In our first example, the topic is Adam Smith's *The Wealth of
+Nations* — the economic ideas contained in that specific work.
+
+A topic sits within a broader **domain** (economics, biology, software
+engineering) but is more focused. The domain provides context; the topic
+provides the source material from which concepts are extracted.
+
+### Entities
+
+The atomic units of an infospace are its **entities** — the individual
+concepts, mechanisms, and observations that constitute its vocabulary.
+Each entity has:
+
+- A **name** and unique identifier
+- A **definition** — precise, non-circular, distinguishable from
+  neighbouring concepts
+- **Provenance** — where it came from (which chapter, passage, or data
+  source)
+- A **domain placement** — which area of the topic it belongs to
+- **Quality scores** — how well it is defined, grounded, and connected
+
+Entities are stored as individual files, one concept per file. This makes
+them independently addressable, diffable, and composable.
+
+### Schemas
+
+**Schemas** define what a well-formed entity looks like: which sections
+it must have, what validation rules apply, what quality metrics are
+evaluated. A schema is not code — it is a markdown document that both
+humans and LLMs read as instructions.
+
+Schemas serve two purposes:
+
+1. **Structural** — they tell the extraction pipeline what to produce
+   (required sections, word count ranges, heading formats)
+2. **Evaluative** — they define quality rubrics against which each entity
+   is scored (definition precision, source grounding, explanatory value)
+
+By changing a schema, you change what the infospace considers "good"
+without changing any infrastructure.
+
+### Disciplines
+
+Here is where things get interesting. An infospace doesn't just catalogue
+what's in the source material — it looks at the source through a
+**lens**. We call this lens a **discipline**: a structured framework of
+concepts from another domain, applied to illuminate the topic at hand.
+
+In our example, the discipline is Stafford Beer's Viable System Model —
+a set of concepts from systems theory (System 1 through System 5,
+recursion, variety, viability) applied to the economic ideas in Smith's
+work. The VSM provides the analytical structure; Smith provides the raw
+material.
+
+The key insight: **a discipline is itself an infospace.** The VSM
+concepts (S1-S5, recursion, variety, algedonic signals) form their own
+curated, evaluable collection of ideas. To use the VSM as a discipline,
+it must first be a viable infospace in its own right — its concepts must
+be well-defined, coherent, and complete.
+
+This leads to a recursive property: infospaces can be built on top of
+other infospaces. The Wealth of Nations infospace, viewed through the
+VSM lens, could itself become a discipline applied to analyse a modern
+supply chain. Each layer adds structure without losing the detail
+beneath it.
+
+---
+
+## How Infospaces Are Built
+
+Building an infospace is an incremental process with four repeating
+phases:
+
+### 1. Extract
+
+Source material is processed one unit at a time (a chapter, a document,
+a dataset). For each unit, an LLM extracts entities according to the
+schemas and guidelines. Entities that already exist are recognised and
+skipped — the infospace grows by accumulation, not duplication.
+
+### 2. Map
+
+Extracted entities are mapped to the discipline. In our example, each
+economic concept is mapped to a VSM system with a strength rating and
+rationale. This is where the discipline lens does its work: it forces
+the question "what role does this concept play in the larger system?"
+
+### 3. Evaluate
+
+After extraction and mapping, the infospace is evaluated at two levels:
+
+- **Per-entity**: each concept is scored against quality rubrics. Is the
+  definition precise? Is it grounded in the source? Does it connect
+  meaningfully to the discipline?
+- **Collection-level**: the set of concepts is assessed for redundancy,
+  coverage, coherence, consistency, and granularity balance.
+
+Evaluation produces structured, machine-readable scores — not prose
+narratives. These scores are tracked over time.
+
+### 4. Refine
+
+Evaluation reveals what needs improvement. Redundant concepts are merged
+or archived. Coverage gaps are addressed by re-extracting with improved
+guidelines. Inconsistencies are resolved by clarifying definitions.
+Guidelines and schemas are updated. The cycle repeats.
+
+This loop — extract, map, evaluate, refine — is the heartbeat of a
+viable infospace. Each iteration makes the infospace more viable:
+more complete, more coherent, more consistent.
+
+---
+
+## How Infospaces Are Evaluated
+
+Quality is assessed through two complementary mechanisms:
+
+### LLM Evaluation
+
+A language model reads an entity (or a pair of entities) and judges it
+against defined rubrics. This captures qualitative aspects that can't be
+computed mechanically: Is this definition actually precise? Does this
+mapping rationale make sense? Are these two concepts really different?
+
+LLM evaluation is always **delegated** — it runs through prompt templates
+and the platform's LLM integration, never through the human or agent
+working on infrastructure. This separation keeps domain judgment in the
+problem space.
+
+### Deterministic Aggregation
+
+Structured scores from LLM evaluation, plus metrics computed directly
+from files (section counts, word lengths, graph properties, similarity
+matrices), are aggregated into collection-level indicators. These are
+numbers that can be tracked, diffed, and plotted:
+
+- **Redundancy ratio** — what fraction of concepts substantially overlap
+- **Coverage ratio** — what fraction of the domain-discipline matrix is
+  populated
+- **Graph density** — how connected the concept web is
+- **Cycle count** — how many circular definition chains exist
+- **Granularity entropy** — how balanced the abstraction levels are
+
+These indicators, compared against user-defined thresholds, determine
+whether the infospace is **viable** for its intended purpose.
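+
+As an illustration of one such indicator, granularity entropy could be
+computed as the Shannon entropy of the abstraction-level labels. This is
+an assumption: the text does not say which entropy measure or log base is
+used, and the level names in the usage note are hypothetical.
+
+```python
+import math
+from collections import Counter
+
+
+def granularity_entropy(levels):
+    """Shannon entropy (in bits) of the distribution of abstraction levels."""
+    counts = Counter(levels)
+    total = sum(counts.values())
+    if total == 0:
+        return 0.0
+    return -sum((c / total) * math.log2(c / total) for c in counts.values())
+```
+
+A collection labelled `["principle", "mechanism", "observation", "mechanism"]`
+has level probabilities 0.25, 0.5, and 0.25, which gives 1.5 bits. A
+collection with a single level gives 0, the least balanced case under
+this measure.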
+
+---
+
+## Five Concerns of Collection Quality
+
+Individual concept quality (is this definition good?) is necessary but
+not sufficient. An infospace made of individually excellent concepts can
+still fail as a collection. Five concerns capture what can go wrong:
+
+### Redundancy
+
+Do two concepts mean the same thing? Overlap wastes the reader's
+attention and creates ambiguity about which concept to use. Redundancy is
+detected through embedding similarity (are the definitions close in
+meaning?) confirmed by LLM judgment (are they genuinely the same
+concept, or merely related?).
+
+### Coverage
+
+Does the concept set cover the domain? Are there areas of the topic that
+have no corresponding concepts? Coverage is assessed structurally (which
+cells in the domain-discipline matrix are empty?) and functionally (can
+the infospace answer the questions it was built to answer?).
+
+### Coherence
+
+Do the concepts form a connected web of explanations, or a fragmented
+list of isolated ideas? Coherence is measured through graph analysis:
+connected components (is everything reachable?), modularity (are there
+meaningful clusters?), and bridge concepts (which ideas connect different
+areas?).
+
+### Consistency
+
+Are concepts defined in terms of each other without contradiction? Are
+there circular definition chains? Do definitions use terms that should
+be concepts but aren't? Consistency is checked through dependency graph
+analysis (cycles, undefined terms) and LLM pairwise judgment
+(do related definitions contradict each other?).
+
+### Granularity Balance
+
+Are concepts at comparable levels of abstraction? An infospace that mixes
+broad theoretical principles with narrow observations — without
+acknowledging the difference — confuses more than it explains. Balance
+is assessed by classifying each concept's abstraction level and measuring
+the distribution.
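One way to measure that distribution is Shannon entropy over the assigned abstraction levels. The sketch assumes entropy in bits and invents three level labels for illustration; the source does not state which logarithm base or level scheme the actual check uses.

```python
from __future__ import annotations

import math
from collections import Counter


def granularity_entropy(levels: list[str]) -> float:
    """Shannon entropy (in bits) of the abstraction-level distribution."""
    counts = Counter(levels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical level labels, one per concept.
balanced = ["principle", "mechanism", "observation"] * 2
skewed = ["observation"] * 5 + ["principle"]

balanced_entropy = granularity_entropy(balanced)
skewed_entropy = granularity_entropy(skewed)
```

Entropy is highest when concepts are spread evenly over the levels and drops as one level dominates. Whether a higher or lower value is desirable depends on the infospace's purpose, so the threshold is left to the user.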
+
+---
+
+## Infospaces as Organisms
+
+The biological metaphor is deliberate. A viable organism maintains its
+identity while exchanging material with its environment. It has internal
+coherence (its parts work together), boundary integrity (it is
+distinguishable from its surroundings), and adaptive capacity (it
+responds to change).
+
+Infospaces exhibit the same properties:
+
+- **Internal coherence** — concepts connect and support each other
+- **Boundary** — the topic and discipline define what belongs and what
+  doesn't
+- **Adaptation** — evaluation and refinement allow the infospace to
+  improve
+
+And like organisms, infospaces don't exist in isolation.
+
+### Hierarchical Composition
+
+One infospace can serve as a discipline for another. The VSM infospace
+provides the lens for the Wealth of Nations infospace, which could
+provide the lens for a supply chain infospace. Each layer adds structure
+and interpretive power. This is analogous to biological organisation:
+cells compose into tissues, tissues into organs, organs into organisms.
+
+For this to work, the lower-level infospace must be viable — you can't
+build reliable analysis on a shaky foundation. A discipline that is
+incomplete or inconsistent will produce unreliable mappings.
+
+### Network Composition
+
+Infospaces can also relate laterally. Two infospaces at the same level
+might share concepts, reference each other's entities, or provide
+complementary views of overlapping domains. A Wealth of Nations infospace
+and a Marx's Capital infospace might share economic entities while
+differing in their analytical discipline.
+
+This networked structure mirrors how knowledge actually works: fields
+overlap, vocabularies are shared and contested, and understanding grows
+by connecting islands of well-organised thought.
+
+### Swarm Behaviour
+
+When many infospaces exist and interact, emergent properties appear.
+Common entities across many infospaces become well-tested through
+repeated evaluation in different contexts. Concepts that survive across
+multiple disciplines are more likely to be fundamental. Gaps visible from
+one perspective may be filled by insights from another.
+
+This is speculative territory for now, but the tooling should be designed
+with it in mind: infospaces as first-class, composable, addressable
+units of knowledge.
+
+---
+
+## The Role of Tooling
+
+An infospace is a living artefact that requires ongoing maintenance. The
+tooling must support every phase of the lifecycle:
+
+### Creating an infospace
+
+Declaring a topic, binding disciplines, defining schemas and competency
+questions, setting viability thresholds. This should be a single
+configuration step, not a programming exercise.
+
+### Populating an infospace
+
+Processing source material through the extract-map pipeline, one unit at
+a time. Progress is tracked. Each addition is committed to version
+history.
+
+### Evaluating an infospace
+
+Running per-entity and collection-level checks. Producing structured,
+machine-readable scores. Comparing against viability thresholds.
+Identifying specific issues (this entity is redundant, this domain gap
+needs filling, these definitions contradict).
+
+### Refining an infospace
+
+Acting on evaluation results: archiving redundant entities, re-extracting
+with improved guidelines, updating schemas, re-evaluating. Every change
+is traceable.
+
+### Composing infospaces
+
+Binding one infospace as a discipline for another. Checking that the
+discipline is viable. Propagating changes when the discipline's concepts
+are updated.
+
+### Monitoring an infospace
+
+Tracking metrics over time. Seeing how coverage, coherence, and
+consistency evolve as content is added. Detecting regressions when a
+re-extraction reduces quality.
+
+The tooling should present these operations as simple, well-documented
+commands — not as infrastructure details. The user thinks in terms of
+"evaluate my infospace" and "check for redundancy", not in terms of
+embedding vectors and graph algorithms.
+
+---
+
+## Where We Are
+
+We have built the first example infospace: 85 economic entities from
+Adam Smith's *The Wealth of Nations*, mapped to Beer's Viable System
+Model, with schemas, prompt templates, and a chapter-by-chapter
+pipeline.
+
+This example has taught us what works (incremental extraction,
+deduplication, flat canonical entity sets, transclusion views) and what's
+missing (per-concept evaluation, collection-level checks, composition
+model, clean tooling commands).
+
+The work ahead is to generalise from this example: build the platform
+capabilities needed, create the tooling layer that makes infospace
+operations accessible, and then revisit the example as both a validation
+and a tutorial.
+
+The goal is that anyone with a body of source material and an analytical
+framework can create a viable infospace — and that infospaces, once
+built, become reusable intellectual tools for future work.
--- a/tests/unit/analysis/__init__.py
+++ b/tests/unit/analysis/__init__.py
--- a/tests/unit/analysis/test_fca.py
+++ b/tests/unit/analysis/test_fca.py
@@ -0,0 +1,313 @@
+"""Tests for markitect.analysis.fca."""
+
+import pytest
+
+from markitect.analysis.fca import (
+    FormalContext,
+    FormalConcept,
+    ConceptLattice,
+    find_gap_concepts,
+    find_empty_cells,
+)
+
+
+# ── Test data ────────────────────────────────────────────────────────
+
+
+def _animal_context():
+    """Classic FCA example: animals × properties.
+
+    Context:
+        | animal    | legs | wings | feathers | fur |
+        |-----------|------|-------|----------|-----|
+        | dog       |  x   |       |          |  x  |
+        | cat       |  x   |       |          |  x  |
+        | eagle     |  x   |   x   |    x     |     |
+        | sparrow   |  x   |   x   |    x     |     |
+        | penguin   |  x   |       |    x     |     |
+    """
+    return FormalContext(
+        objects=["dog", "cat", "eagle", "sparrow", "penguin"],
+        attributes=["legs", "wings", "feathers", "fur"],
+        incidence={
+            "dog":     {"legs", "fur"},
+            "cat":     {"legs", "fur"},
+            "eagle":   {"legs", "wings", "feathers"},
+            "sparrow": {"legs", "wings", "feathers"},
+            "penguin": {"legs", "feathers"},
+        },
+    )
+
+
+def _infospace_context():
+    """Simplified infospace-style context: entities × {domain, vsm_system}.
+
+    Entities with domain and VSM classification, including gaps:
+    domain:Exchange only occurs together with vsm:S4, so for example
+    no entity has both domain:Exchange and vsm:S1.
+    """
+    return FormalContext.from_dict({
+        "division-of-labour":   {"domain:Production", "vsm:S1"},
+        "pin-factory":          {"domain:Production", "vsm:S1"},
+        "market-extent":        {"domain:Exchange", "vsm:S4"},
+        "wage-determination":   {"domain:Distribution", "vsm:S3"},
+        "rent-theory":          {"domain:Distribution", "vsm:S5"},
+        "capital-accumulation": {"domain:Production", "vsm:S3"},
+    })
+
+
+def _empty_context():
+    """Context with no objects."""
+    return FormalContext([], ["a", "b"], {})
+
+
+def _single_entity():
+    """Context with one object."""
+    return FormalContext(["only"], ["x", "y"], {"only": {"x", "y"}})
+
+
+# ── FormalContext ────────────────────────────────────────────────────
+
+
+class TestFormalContext:
+    def test_objects_sorted(self):
+        ctx = _animal_context()
+        assert ctx.objects == sorted(ctx.objects)
+
+    def test_attributes_sorted(self):
+        ctx = _animal_context()
+        assert ctx.attributes == sorted(ctx.attributes)
+
+    def test_object_count(self):
+        assert _animal_context().object_count == 5
+
+    def test_attribute_count(self):
+        assert _animal_context().attribute_count == 4
+
+    def test_extent_single_attr(self):
+        ctx = _animal_context()
+        assert ctx.extent(["fur"]) == frozenset({"dog", "cat"})
+
+    def test_extent_multiple_attrs(self):
+        ctx = _animal_context()
+        assert ctx.extent(["wings", "feathers"]) == frozenset({"eagle", "sparrow"})
+
+    def test_extent_empty_returns_all(self):
+        ctx = _animal_context()
+        assert ctx.extent([]) == frozenset(ctx.objects)
+
+    def test_extent_no_match(self):
+        ctx = _animal_context()
+        assert ctx.extent(["fur", "feathers"]) == frozenset()
+
+    def test_intent_single_obj(self):
+        ctx = _animal_context()
+        assert ctx.intent(["penguin"]) == frozenset({"legs", "feathers"})
+
+    def test_intent_multiple_objs(self):
+        ctx = _animal_context()
+        # dog and cat share: legs, fur
+        assert ctx.intent(["dog", "cat"]) == frozenset({"legs", "fur"})
+
+    def test_intent_empty_returns_all(self):
+        ctx = _animal_context()
+        assert ctx.intent([]) == frozenset(ctx.attributes)
+
+    def test_closure_is_idempotent(self):
+        ctx = _animal_context()
+        c1 = ctx.closure({"fur"})
+        c2 = ctx.closure(c1)
+        assert c1 == c2
+
+    def test_closure_expands(self):
+        ctx = _animal_context()
+        # fur → {dog, cat} → {legs, fur} (both have legs too)
+        assert ctx.closure({"fur"}) == frozenset({"legs", "fur"})
+
+    def test_has_attribute(self):
+        ctx = _animal_context()
+        assert ctx.has_attribute("dog", "legs") is True
+        assert ctx.has_attribute("dog", "wings") is False
+
+    def test_density(self):
+        ctx = _animal_context()
+        # 5 objects × 4 attributes = 20 cells
+        # dog:2, cat:2, eagle:3, sparrow:3, penguin:2 = 12 filled
+        assert ctx.density() == pytest.approx(12 / 20)
+
+    def test_density_empty(self):
+        assert FormalContext([], [], {}).density() == 0.0
+
+    def test_from_dict(self):
+        ctx = FormalContext.from_dict({
+            "a": {"x", "y"},
+            "b": {"y", "z"},
+        })
+        assert ctx.object_count == 2
+        assert ctx.attribute_count == 3
+
+    def test_unknown_attributes_ignored(self):
+        ctx = FormalContext(
+            ["a"], ["x"], {"a": {"x", "unknown"}}
+        )
+        assert ctx.intent(["a"]) == frozenset({"x"})
+
+
+# ── ConceptLattice ──────────────────────────────────────────────────
+
+
+class TestConceptLattice:
+    def test_animal_concept_count(self):
+        ctx = _animal_context()
+        lattice = ConceptLattice.from_context(ctx)
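+        # In standard FCA the animal context yields 5 formal concepts:
+        # top ({all}, {legs}), ({dog, cat}, {legs, fur}),
+        # ({eagle, sparrow, penguin}, {legs, feathers}),
+        # ({eagle, sparrow}, {legs, wings, feathers}),
+        # and bottom ({}, {all 4})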
+        assert lattice.size >= 5
+
+    def test_top_has_all_objects(self):
+        ctx = _animal_context()
+        lattice = ConceptLattice.from_context(ctx)
+        top = lattice.top
+        assert top is not None
+        assert top.extent == frozenset(ctx.objects)
+
+    def test_top_intent_is_common_attributes(self):
+        ctx = _animal_context()
+        lattice = ConceptLattice.from_context(ctx)
+        top = lattice.top
+        # All animals have "legs"
+        assert "legs" in top.intent
+
+    def test_bottom_has_all_attributes(self):
+        ctx = _animal_context()
+        lattice = ConceptLattice.from_context(ctx)
+        bottom = lattice.bottom
+        assert bottom is not None
+        assert bottom.intent == frozenset(ctx.attributes)
+
+    def test_bottom_extent_empty_when_no_universal_object(self):
+        ctx = _animal_context()
+        lattice = ConceptLattice.from_context(ctx)
+        bottom = lattice.bottom
+        # No animal has all 4 attributes
+        assert bottom.extent_size == 0
+
+    def test_all_concepts_are_closed(self):
+        ctx = _animal_context()
+        lattice = ConceptLattice.from_context(ctx)
+        for concept in lattice.concepts:
+            # intent should be closed: closure(intent) == intent
+            assert ctx.closure(concept.intent) == concept.intent
+            # extent' should equal intent
+            assert ctx.intent(concept.extent) == concept.intent
+            # intent' should equal extent
+            assert ctx.extent(concept.intent) == concept.extent
+
+    def test_empty_context(self):
+        ctx = _empty_context()
+        lattice = ConceptLattice.from_context(ctx)
+        # No objects: every attribute set closes to the full attribute set,
+        # so at least the single concept ({}, {a, b}) must exist
+        assert lattice.size >= 1
+
+    def test_single_entity(self):
+        ctx = _single_entity()
+        lattice = ConceptLattice.from_context(ctx)
+        # At least 1 concept containing the single entity
+        has_entity = any(
+            "only" in c.extent for c in lattice.concepts
+        )
+        assert has_entity
+
+    def test_no_attributes_produces_one_concept(self):
+        ctx = FormalContext(["a", "b"], [], {})
+        lattice = ConceptLattice.from_context(ctx)
+        assert lattice.size == 1
+        assert lattice.concepts[0].extent == frozenset({"a", "b"})
+
+    def test_depth(self):
+        ctx = _animal_context()
+        lattice = ConceptLattice.from_context(ctx)
+        d = lattice.depth()
+        # At least 2 levels (top → bottom)
+        assert d >= 2
+
+    def test_depth_empty(self):
+        lattice = ConceptLattice(concepts=[])
+        assert lattice.depth() == 0
+
+
+# ── Gap concepts ────────────────────────────────────────────────────
+
+
+class TestGapConcepts:
+    def test_animal_has_gap(self):
+        ctx = _animal_context()
+        gaps = find_gap_concepts(ctx)
+        # {fur, feathers} has no animal → gap concept
+        fur_feathers_gap = any(
+            {"fur", "feathers"} <= c.intent for c in gaps
+        )
+        assert fur_feathers_gap
+
+    def test_gap_extents_are_empty(self):
+        ctx = _animal_context()
+        gaps = find_gap_concepts(ctx)
+        for gap in gaps:
+            assert gap.extent_size == 0
+
+    def test_no_gaps_when_all_combinations_covered(self):
+        # Every attribute combination has at least one object
+        ctx = FormalContext.from_dict({
+            "obj1": {"a", "b"},
+            "obj2": {"a"},
+            "obj3": {"b"},
+        })
+        lattice = ConceptLattice.from_context(ctx)
+        gaps = find_gap_concepts(ctx, lattice)
+        assert len(gaps) == 0
+
+    def test_sorted_by_intent_size(self):
+        ctx = _animal_context()
+        gaps = find_gap_concepts(ctx)
+        sizes = [g.intent_size for g in gaps]
+        assert sizes == sorted(sizes)
+
+    def test_infospace_gap(self):
+        ctx = _infospace_context()
+        gaps = find_gap_concepts(ctx)
+        # domain:Exchange + vsm:S1 has no entity, so some gap concept's
+        # intent must contain both attributes
+        gap_intents = [g.intent for g in gaps]
+        exchange_s1_is_gap = any(
+            {"domain:Exchange", "vsm:S1"} <= intent for intent in gap_intents
+        )
+        assert exchange_s1_is_gap
+
+
+# ── Empty cells (cross-tab) ─────────────────────────────────────────
+
+
+class TestFindEmptyCells:
+    def test_finds_empty_cells(self):
+        ctx = _infospace_context()
+        domains = ["domain:Production", "domain:Distribution", "domain:Exchange"]
+        vsm_systems = ["vsm:S1", "vsm:S3", "vsm:S4", "vsm:S5"]
+        empty = find_empty_cells(ctx, domains, vsm_systems)
+        # domain:Exchange + vsm:S1 should be empty
+        assert ("domain:Exchange", "vsm:S1") in empty
+        # domain:Production + vsm:S1 should NOT be empty (division-of-labour)
+        assert ("domain:Production", "vsm:S1") not in empty
+
+    def test_all_filled_returns_empty_list(self):
+        ctx = FormalContext.from_dict({
+            "a": {"x", "y"},
+            "b": {"x", "z"},
+            "c": {"y", "z"},
+            "d": {"x", "y", "z"},
+        })
+        empty = find_empty_cells(ctx, ["x", "y"], ["z"])
+        assert empty == []
+
+    def test_empty_context_all_cells_empty(self):
+        ctx = FormalContext([], ["a", "b", "c"], {})
+        empty = find_empty_cells(ctx, ["a"], ["b", "c"])
+        assert len(empty) == 2
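The derivation operators that the `FormalContext` tests above exercise can be sketched in a few lines of plain Python. This is an illustration of standard formal concept analysis on the same animal context, not the `markitect.analysis.fca` implementation; the module-level functions and the brute-force `closed_intents` helper are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

INCIDENCE = {
    "dog": {"legs", "fur"},
    "cat": {"legs", "fur"},
    "eagle": {"legs", "wings", "feathers"},
    "sparrow": {"legs", "wings", "feathers"},
    "penguin": {"legs", "feathers"},
}
ATTRIBUTES = frozenset().union(*INCIDENCE.values())


def extent(attrs) -> frozenset:
    """Objects that have every attribute in attrs (all objects if attrs is empty)."""
    wanted = set(attrs)
    return frozenset(obj for obj, has in INCIDENCE.items() if wanted <= has)


def intent(objs) -> frozenset:
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(ATTRIBUTES)
    for obj in objs:
        shared &= INCIDENCE[obj]
    return frozenset(shared)


def closure(attrs) -> frozenset:
    """Smallest closed attribute set containing attrs."""
    return intent(extent(attrs))


def closed_intents() -> set:
    """Brute force: close every attribute subset. Exponential, fine for 4 attributes."""
    attrs = sorted(ATTRIBUTES)
    return {
        closure(subset)
        for size in range(len(attrs) + 1)
        for subset in combinations(attrs, size)
    }


concept_intents = closed_intents()
gap_intents = {i for i in concept_intents if not extent(i)}
```

Each closed intent corresponds to one formal concept. A gap concept is a closed intent whose extent is empty; in this context that is only the full attribute set, which also covers the `{fur, feathers}` combination.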
--- a/tests/unit/analysis/test_graph.py
+++ b/tests/unit/analysis/test_graph.py
@@ -0,0 +1,254 @@
+"""Tests for markitect.analysis.graph."""
+
+import pytest
+
+nx = pytest.importorskip("networkx", reason="networkx not installed")
+
+from markitect.prompts.dependencies.models import DependencyGraph, EdgeType
+from markitect.analysis.graph import (
+    to_networkx,
+    connected_components,
+    betweenness_centrality,
+    detect_communities,
+    modularity_score,
+    degree_distribution,
+    cohesion_coupling,
+)
+
+
+# ── Helpers ──────────────────────────────────────────────────────────
+
+
+def _linear_graph():
+    """A -> B -> C -> D (simple chain)."""
+    g = DependencyGraph()
+    g.add_edge("A", "B")
+    g.add_edge("B", "C")
+    g.add_edge("C", "D")
+    return g
+
+
+def _two_clusters():
+    """Two dense clusters connected by a single bridge edge.
+
+    Cluster 1: A -- B -- C (fully connected)
+    Cluster 2: X -- Y -- Z (fully connected)
+    Bridge: C -> X
+    """
+    g = DependencyGraph()
+    # Cluster 1
+    g.add_edge("A", "B")
+    g.add_edge("B", "A")
+    g.add_edge("B", "C")
+    g.add_edge("C", "B")
+    g.add_edge("A", "C")
+    g.add_edge("C", "A")
+    # Cluster 2
+    g.add_edge("X", "Y")
+    g.add_edge("Y", "X")
+    g.add_edge("Y", "Z")
+    g.add_edge("Z", "Y")
+    g.add_edge("X", "Z")
+    g.add_edge("Z", "X")
+    # Bridge
+    g.add_edge("C", "X")
+    return g
+
+
+def _disconnected_graph():
+    """Two separate components: {A, B} and {X, Y}."""
+    g = DependencyGraph()
+    g.add_edge("A", "B")
+    g.add_edge("X", "Y")
+    return g
+
+
+def _empty_graph():
+    """Graph with no nodes or edges."""
+    return DependencyGraph()
+
+
+def _isolated_nodes():
+    """Graph with nodes but no edges."""
+    g = DependencyGraph()
+    # add_edge creates both endpoints, so add two separate edges and keep
+    # one endpoint of each. This assumes get_subgraph drops edges whose
+    # other endpoint lies outside the requested node set.
+    g.add_edge("A", "B")
+    g.add_edge("C", "D")
+    return g.get_subgraph({"A", "C"})
+
+
+# ── to_networkx ─────────────────────────────────────────────────────
+
+
+class TestToNetworkx:
+    def test_preserves_nodes(self):
+        g = _linear_graph()
+        G = to_networkx(g)
+        assert set(G.nodes) == {"A", "B", "C", "D"}
+
+    def test_preserves_edges(self):
+        g = _linear_graph()
+        G = to_networkx(g)
+        assert G.has_edge("A", "B")
+        assert G.has_edge("B", "C")
+        assert not G.has_edge("D", "A")
+
+    def test_preserves_edge_type(self):
+        g = DependencyGraph()
+        g.add_edge("A", "B", EdgeType.GENERATES)
+        G = to_networkx(g)
+        assert G.edges["A", "B"]["edge_type"] == "generates"
+
+    def test_empty_graph(self):
+        G = to_networkx(_empty_graph())
+        assert len(G.nodes) == 0
+        assert len(G.edges) == 0
+
+
+# ── Connected components ────────────────────────────────────────────
+
+
+class TestConnectedComponents:
+    def test_single_component(self):
+        comps = connected_components(_linear_graph())
+        assert len(comps) == 1
+        assert comps[0] == {"A", "B", "C", "D"}
+
+    def test_two_components(self):
+        comps = connected_components(_disconnected_graph())
+        assert len(comps) == 2
+        node_sets = [frozenset(c) for c in comps]
+        assert frozenset({"A", "B"}) in node_sets
+        assert frozenset({"X", "Y"}) in node_sets
+
+    def test_sorted_largest_first(self):
+        g = DependencyGraph()
+        g.add_edge("A", "B")
+        g.add_edge("B", "C")
+        g.add_edge("X", "Y")
+        comps = connected_components(g)
+        assert len(comps[0]) >= len(comps[1])
+
+    def test_empty_graph(self):
+        assert connected_components(_empty_graph()) == []
+
+
+# ── Betweenness centrality ──────────────────────────────────────────
+
+
+class TestBetweennessCentrality:
+    def test_linear_chain_middle_node_highest(self):
+        g = _linear_graph()
+        bc = betweenness_centrality(g)
+        # B and C are on all shortest paths between endpoints
+        assert bc["B"] > bc["A"]
+        assert bc["C"] > bc["D"]
+
+    def test_values_in_range(self):
+        bc = betweenness_centrality(_two_clusters())
+        for v in bc.values():
+            assert 0.0 <= v <= 1.0
+
+    def test_empty_graph(self):
+        assert betweenness_centrality(_empty_graph()) == {}
+
+
+# ── Community detection ─────────────────────────────────────────────
+
+
+class TestDetectCommunities:
+    def test_two_clusters_detected(self):
+        comms = detect_communities(_two_clusters(), seed=42)
+        # Should detect at least 2 communities
+        assert len(comms) >= 2
+        # Each node in exactly one community
+        all_nodes = set()
+        for c in comms:
+            all_nodes.update(c)
+        assert all_nodes == {"A", "B", "C", "X", "Y", "Z"}
+
+    def test_deterministic_with_seed(self):
+        g = _two_clusters()
+        c1 = detect_communities(g, seed=42)
+        c2 = detect_communities(g, seed=42)
+        assert c1 == c2
+
+    def test_empty_graph(self):
+        assert detect_communities(_empty_graph()) == []
+
+    def test_sorted_largest_first(self):
+        comms = detect_communities(_two_clusters(), seed=42)
+        sizes = [len(c) for c in comms]
+        assert sizes == sorted(sizes, reverse=True)
+
+
+# ── Modularity score ────────────────────────────────────────────────
+
+
+class TestModularityScore:
+    def test_no_edges_returns_zero(self):
+        assert modularity_score(_empty_graph()) == 0.0
+
+    def test_two_clusters_positive(self):
+        g = _two_clusters()
+        comms = [{"A", "B", "C"}, {"X", "Y", "Z"}]
+        score = modularity_score(g, communities=comms)
+        assert score > 0.0
+
+    def test_single_community_near_zero(self):
+        g = _two_clusters()
+        all_nodes = {"A", "B", "C", "X", "Y", "Z"}
+        score = modularity_score(g, communities=[all_nodes])
+        assert score == pytest.approx(0.0, abs=1e-10)
+
+
+# ── Degree distribution ─────────────────────────────────────────────
+
+
+class TestDegreeDistribution:
+    def test_linear_chain(self):
+        dd = degree_distribution(_linear_graph())
+        # A: out=1 in=0; B: out=1 in=1; D: out=0 in=1
+        assert dd["A"]["out_degree"] == 1
+        assert dd["A"]["in_degree"] == 0
+        assert dd["B"]["in_degree"] == 1
+        assert dd["B"]["out_degree"] == 1
+        assert dd["D"]["in_degree"] == 1
+        assert dd["D"]["out_degree"] == 0
+
+    def test_total_degree(self):
+        dd = degree_distribution(_linear_graph())
+        for degrees in dd.values():
+            assert degrees["total_degree"] == degrees["in_degree"] + degrees["out_degree"]
+
+    def test_empty_graph(self):
+        assert degree_distribution(_empty_graph()) == {}
+
+
+# ── Cohesion / coupling ─────────────────────────────────────────────
+
+
+class TestCohesionCoupling:
+    def test_two_clusters_with_bridge(self):
+        g = _two_clusters()
+        comms = [{"A", "B", "C"}, {"X", "Y", "Z"}]
+        cc = cohesion_coupling(g, communities=comms)
+        # 12 intra-cluster edges + 1 bridge = 13 total
+        assert cc["intra_edges"] == 12
+        assert cc["inter_edges"] == 1
+        assert cc["total_edges"] == 13
+        assert cc["cohesion"] == pytest.approx(12 / 13)
+        assert cc["coupling"] == pytest.approx(1 / 13)
+        assert cc["communities"] == 2
+
+    def test_no_edges(self):
+        cc = cohesion_coupling(_empty_graph())
+        assert cc["cohesion"] == 0.0
+        assert cc["coupling"] == 0.0
+        assert cc["total_edges"] == 0
+
+    def test_ratios_sum_to_one(self):
+        g = _two_clusters()
+        comms = [{"A", "B", "C"}, {"X", "Y", "Z"}]
+        cc = cohesion_coupling(g, communities=comms)
+        assert cc["cohesion"] + cc["coupling"] == pytest.approx(1.0)
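The figures asserted in `TestCohesionCoupling` follow from a simple edge count, sketched here on plain edge tuples. This is not the `markitect.analysis.graph.cohesion_coupling` implementation: the signature is hypothetical, and treating an edge that touches a node outside every community as inter-community is an assumption.

```python
from __future__ import annotations


def cohesion_coupling(edges, communities):
    """Split directed edges into intra- and inter-community counts and ratios."""
    membership = {node: idx for idx, members in enumerate(communities) for node in members}
    intra = sum(
        1 for src, dst in edges
        if src in membership and membership[src] == membership.get(dst)
    )
    total = len(edges)
    inter = total - intra
    return {
        "intra_edges": intra,
        "inter_edges": inter,
        "total_edges": total,
        "cohesion": intra / total if total else 0.0,
        "coupling": inter / total if total else 0.0,
    }


def clique(nodes):
    """Every ordered pair of distinct nodes, i.e. a fully connected directed cluster."""
    return [(a, b) for a in nodes for b in nodes if a != b]


# Two fully connected clusters joined by one bridge edge, as in _two_clusters().
edges = clique("ABC") + clique("XYZ") + [("C", "X")]
result = cohesion_coupling(edges, [{"A", "B", "C"}, {"X", "Y", "Z"}])
```

Because every edge is either intra- or inter-community, cohesion and coupling sum to one whenever the graph has edges, which is what `test_ratios_sum_to_one` checks.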
--- a/tests/unit/core/__init__.py
+++ b/tests/unit/core/__init__.py
--- a/tests/unit/core/test_section_tree.py
+++ b/tests/unit/core/test_section_tree.py
@@ -0,0 +1,137 @@
+"""Tests for markitect.core.section_tree."""
+
+from markitect.core.parser import parse_markdown_to_ast
+from markitect.core.section_tree import (
+    build_section_tree,
+    extract_heading_content,
+    extract_heading_level,
+    extract_section_text,
+    slugify,
+)
+
+
+class TestSlugify:
+    def test_simple_text(self):
+        assert slugify("Hello World") == "hello_world"
+
+    def test_german_umlauts(self):
+        assert slugify("Ärger mit Über") == "aerger_mit_ueber"
+
+    def test_special_characters(self):
+        assert slugify("Smith's Original Wording") == "smith_s_original_wording"
+
+    def test_empty_string(self):
+        assert slugify("") == "feld"
+
+    def test_trailing_underscores_stripped(self):
+        assert slugify("--hello--") == "hello"
+
+    def test_multiple_spaces(self):
+        assert slugify("a   b") == "a_b"
+
+
+class TestExtractHeadingLevel:
+    def test_h1(self):
+        assert extract_heading_level("h1") == 1
+
+    def test_h6(self):
+        assert extract_heading_level("h6") == 6
+
+    def test_invalid_tag(self):
+        assert extract_heading_level("p") == 1
+
+    def test_empty(self):
+        assert extract_heading_level("") == 1
+
+
+class TestExtractHeadingContent:
+    def test_finds_inline_token(self):
+        tokens = [
+            {"type": "heading_open", "tag": "h1"},
+            {"type": "inline", "content": "Hello"},
+            {"type": "heading_close", "tag": "h1"},
+        ]
+        assert extract_heading_content(tokens, 0) == "Hello"
+
+    def test_no_inline(self):
+        tokens = [
+            {"type": "heading_open", "tag": "h1"},
+            {"type": "heading_close", "tag": "h1"},
+        ]
+        assert extract_heading_content(tokens, 0) == ""
+
+
+class TestBuildSectionTree:
+    def test_single_heading(self):
+        md = "# Title\n\nSome text."
+        tokens = parse_markdown_to_ast(md)
+        tree = build_section_tree(tokens)
+
+        assert tree["level"] == 0
+        assert len(tree["children"]) == 1
+        assert tree["children"][0]["heading"] == "Title"
+        assert tree["children"][0]["level"] == 1
+
+    def test_nested_headings(self):
+        md = "# Top\n\n## Sub\n\ntext\n\n## Sub2\n\nmore"
+        tokens = parse_markdown_to_ast(md)
+        tree = build_section_tree(tokens)
+
+        top = tree["children"][0]
+        assert top["heading"] == "Top"
+        assert len(top["children"]) == 2
+        assert top["children"][0]["heading"] == "Sub"
+        assert top["children"][1]["heading"] == "Sub2"
+
+    def test_max_depth(self):
+        md = "# Top\n\n## Sub\n\n### Deep\n\ntext"
+        tokens = parse_markdown_to_ast(md)
+        tree = build_section_tree(tokens, max_depth=2)
+
+        top = tree["children"][0]
+        sub = top["children"][0]
+        # H3 should be excluded from tree
+        assert len(sub["children"]) == 0
+
+    def test_content_tokens_captured(self):
+        md = "# Title\n\nParagraph text here."
+        tokens = parse_markdown_to_ast(md)
+        tree = build_section_tree(tokens)
+
+        section = tree["children"][0]
+        inline_tokens = [t for t in section["content_tokens"] if t.get("type") == "inline"]
+        assert len(inline_tokens) == 1
+        assert "Paragraph text here" in inline_tokens[0]["content"]
+
+    def test_slug_assigned(self):
+        md = "# Economic Domain\n\ntext"
+        tokens = parse_markdown_to_ast(md)
+        tree = build_section_tree(tokens)
+
+        assert tree["children"][0]["slug"] == "economic_domain"
+
+    def test_empty_document(self):
+        tokens = parse_markdown_to_ast("")
+        tree = build_section_tree(tokens)
+        assert tree["children"] == []
+
+
+class TestExtractSectionText:
+    def test_simple_paragraph(self):
+        md = "# Title\n\nHello world."
+        tokens = parse_markdown_to_ast(md)
+        tree = build_section_tree(tokens)
+        text = extract_section_text(tree["children"][0])
+        assert text == "Hello world."
+
+    def test_multiple_paragraphs(self):
+        md = "# Title\n\nFirst paragraph.\n\nSecond paragraph."
+        tokens = parse_markdown_to_ast(md)
+        tree = build_section_tree(tokens)
+        text = extract_section_text(tree["children"][0])
+        assert "First paragraph." in text
+        assert "Second paragraph." in text
+
+    def test_empty_section(self):
+        section = {"content_tokens": []}
+        assert extract_section_text(section) == ""
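The `TestSlugify` cases above pin down the slug rules closely enough to reconstruct them. The following is a guess at behaviour consistent with those six cases, not the code in `markitect.core.section_tree`: the name `slugify_sketch` is hypothetical, and the `ß` mapping is an assumption that no test covers.

```python
import re

# German umlaut transliteration; the sharp-s entry is an untested assumption.
_UMLAUTS = str.maketrans({"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue", "\u00df": "ss"})


def slugify_sketch(text: str, fallback: str = "feld") -> str:
    """Lowercase, transliterate umlauts, collapse other characters to underscores."""
    lowered = text.lower().translate(_UMLAUTS)
    slug = re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")
    # An empty result falls back to "feld", as test_empty_string expects.
    return slug or fallback


examples = {
    text: slugify_sketch(text)
    for text in ["Hello World", "Smith's Original Wording", "--hello--", "a   b", ""]
}
```

Collapsing every run of non-alphanumeric characters into a single underscore is what makes `"a   b"` and `"--hello--"` come out clean without separate whitespace handling.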
--- a/tests/unit/infospace/__init__.py
+++ b/tests/unit/infospace/__init__.py
--- a/tests/unit/infospace/test_checks.py
+++ b/tests/unit/infospace/test_checks.py
@@ -0,0 +1,413 @@
+"""
+Tests for collection-level quality checks (S2.4).
+
+Covers all five concerns: Redundancy (C1), Coverage (C2), Coherence (C3),
+Consistency (C4), Granularity (C5), and the orchestrator.
+"""
+
+from __future__ import annotations
+
+import math
+
+import pytest
+
+from markitect.infospace.models import EntityMeta
+from markitect.prompts.dependencies.models import DependencyGraph
+
+
+# ── helpers ──────────────────────────────────────────────────────────
+
+
+def _entity(slug: str, domain: str = "", definition: str = "",
+            source_chapter: str = "", word_count: int = 0) -> EntityMeta:
+    wc = word_count if word_count else (len(definition.split()) if definition else 0)
+    return EntityMeta(
+        slug=slug,
+        title=slug.replace("-", " ").title(),
+        h1_raw=slug.replace("-", " ").title(),
+        definition=definition,
+        domain=domain,
+        source_chapter=source_chapter,
+        definition_word_count=wc,
+        total_word_count=wc,
+    )
+
+
+def _sample_entities() -> list[EntityMeta]:
+    return [
+        _entity("alpha", domain="economics", definition="the first concept in our model", source_chapter="ch01"),
+        _entity("beta", domain="economics", definition="the second concept about markets", source_chapter="ch01"),
+        _entity("gamma", domain="sociology", definition="a social structure framework", source_chapter="ch02"),
+        _entity("delta", domain="sociology", definition="a social dynamic pattern", source_chapter="ch02"),
+        _entity("epsilon", domain="philosophy", definition="an epistemic principle", source_chapter="ch03"),
+    ]
+
+
+def _linear_graph() -> DependencyGraph:
+    """A -> B -> C -> D."""
+    g = DependencyGraph()
+    g.add_edge("A", "B")
+    g.add_edge("B", "C")
+    g.add_edge("C", "D")
+    return g
+
+
+def _cyclic_graph() -> DependencyGraph:
+    """A -> B -> C -> A (one cycle)."""
+    g = DependencyGraph()
+    g.add_edge("A", "B")
+    g.add_edge("B", "C")
+    g.add_edge("C", "A")
+    return g
+
+
+def _can_import_graph_analysis():
+    try:
+        from markitect.analysis.graph import connected_components  # noqa: F401
+        return True
+    except ImportError:
+        return False
+
+
+# ── C1: Redundancy ──────────────────────────────────────────────────
+
+
+class TestRedundancy:
+    def test_empty_entities(self):
+        from markitect.infospace.checks.redundancy import check_redundancy
+        report = check_redundancy([])
+        assert report.entity_count == 0
+        assert report.redundancy_ratio == 0.0
+        assert report.similar_pairs == []
+
+    def test_single_entity(self):
+        from markitect.infospace.checks.redundancy import check_redundancy
+        report = check_redundancy([_entity("a", definition="hello world")])
+        assert report.entity_count == 1
+        assert report.redundancy_ratio == 0.0
+
+    def test_no_overlap_word_fallback(self):
+        from markitect.infospace.checks.redundancy import check_redundancy
+        entities = [
+            _entity("a", definition="apple banana cherry"),
+            _entity("b", definition="delta epsilon zeta"),
+        ]
+        report = check_redundancy(entities, threshold=0.5)
+        assert report.similar_pairs == []
+        assert report.redundancy_ratio == 0.0
+
+    def test_high_overlap_word_fallback(self):
+        from markitect.infospace.checks.redundancy import check_redundancy
+        entities = [
+            _entity("a", definition="the quick brown fox"),
+            _entity("b", definition="the quick brown dog"),
+        ]
+        report = check_redundancy(entities, threshold=0.5)
+        assert len(report.similar_pairs) == 1
+        assert report.similar_pairs[0]["method"] == "word_overlap"
+        assert report.similar_pairs[0]["entity_a"] == "a"
+        assert report.similar_pairs[0]["entity_b"] == "b"
+        assert report.redundancy_ratio == 1.0  # both entities involved
+
+    def test_embedding_based(self):
+        from markitect.infospace.checks.redundancy import check_redundancy
+        entities = [
+            _entity("a", definition="x"),
+            _entity("b", definition="y"),
+            _entity("c", definition="z"),
+        ]
+        # a and b are very similar; c is different
+        embeddings = {
+            "a": [1.0, 0.0, 0.0],
+            "b": [0.99, 0.1, 0.0],
+            "c": [0.0, 0.0, 1.0],
+        }
+        report = check_redundancy(entities, embeddings=embeddings, threshold=0.9)
+        assert len(report.similar_pairs) >= 1
+        assert report.similar_pairs[0]["method"] == "embedding"
+        assert report.redundancy_ratio > 0.0
+
+    def test_to_dict(self):
+        from markitect.infospace.checks.redundancy import RedundancyReport
+        r = RedundancyReport(similar_pairs=[], redundancy_ratio=0.25, entity_count=10)
+        d = r.to_dict()
+        assert d["concern"] == "C1"
+        assert d["redundancy_ratio"] == 0.25
+        assert d["entity_count"] == 10
+
+
+# ── C2: Coverage ────────────────────────────────────────────────────
+
+
+class TestCoverage:
+    def test_empty_entities(self):
+        from markitect.infospace.checks.coverage import check_coverage
+        report = check_coverage([])
+        assert report.entity_count == 0
+        assert report.coverage_ratio == 0.0
+
+    def test_full_coverage(self):
+        """All domain×chapter cells are populated."""
+        from markitect.infospace.checks.coverage import check_coverage
+        entities = [
+            _entity("a", domain="d1", source_chapter="ch1"),
+            _entity("b", domain="d2", source_chapter="ch1"),
+            _entity("c", domain="d1", source_chapter="ch2"),
+            _entity("d", domain="d2", source_chapter="ch2"),
+        ]
+        report = check_coverage(entities)
+        assert report.coverage_ratio == 1.0
+        assert report.empty_cells == []
+
+    def test_partial_coverage(self):
+        """One cell is missing → coverage < 1.0."""
+        from markitect.infospace.checks.coverage import check_coverage
+        entities = [
+            _entity("a", domain="d1", source_chapter="ch1"),
+            _entity("b", domain="d2", source_chapter="ch1"),
+            _entity("c", domain="d1", source_chapter="ch2"),
+            # Missing: d2×ch2
+        ]
+        report = check_coverage(entities)
+        assert report.coverage_ratio < 1.0
+        assert len(report.empty_cells) == 1
+        assert report.empty_cells[0]["dimension_a"] == "domain:d2"
+        assert report.empty_cells[0]["dimension_b"] == "chapter:ch2"
+
+    def test_domain_counts(self):
+        from markitect.infospace.checks.coverage import check_coverage
+        entities = _sample_entities()
+        report = check_coverage(entities)
+        assert report.domain_counts["economics"] == 2
+        assert report.domain_counts["sociology"] == 2
+        assert report.domain_counts["philosophy"] == 1
+
+    def test_to_dict(self):
+        from markitect.infospace.checks.coverage import CoverageReport
+        r = CoverageReport(coverage_ratio=0.75, entity_count=8)
+        d = r.to_dict()
+        assert d["concern"] == "C2"
+        assert d["coverage_ratio"] == 0.75
+
+    def test_extra_attributes(self):
+        from markitect.infospace.checks.coverage import check_coverage
+        entities = [
+            _entity("a", domain="d1", source_chapter="ch1"),
+        ]
+        extra = {"a": {"vsm:production"}}
+        report = check_coverage(entities, extra_attributes=extra)
+        assert report.entity_count == 1
+
+
+# ── C3: Coherence ───────────────────────────────────────────────────
+
+
+class TestCoherence:
+    def test_no_graph(self):
+        from markitect.infospace.checks.coherence import check_coherence
+        report = check_coherence(graph=None, entity_count=5)
+        assert report.connected_components == 0
+        assert report.entity_count == 5
+
+    def test_empty_graph(self):
+        from markitect.infospace.checks.coherence import check_coherence
+        g = DependencyGraph()
+        report = check_coherence(graph=g, entity_count=0)
+        assert report.connected_components == 0
+
+    def test_to_dict(self):
+        from markitect.infospace.checks.coherence import CoherenceReport
+        r = CoherenceReport(connected_components=2, modularity=0.3456, entity_count=10)
+        d = r.to_dict()
+        assert d["concern"] == "C3"
+        assert d["modularity"] == 0.3456
+        assert d["connected_components"] == 2
+
+    @pytest.mark.skipif(
+        not _can_import_graph_analysis(),
+        reason="markitect.analysis.graph (networkx) not available",
+    )
+    def test_with_graph(self):
+        from markitect.infospace.checks.coherence import check_coherence
+        g = _linear_graph()
+        report = check_coherence(graph=g, entity_count=4)
+        assert report.connected_components >= 1
+        assert report.entity_count == 4
+
+
+# ── C4: Consistency ─────────────────────────────────────────────────
+
+
+class TestConsistency:
+    def test_no_graph(self):
+        from markitect.infospace.checks.consistency import check_consistency
+        entities = _sample_entities()
+        report = check_consistency(entities)
+        assert report.cycle_count == 0
+        assert report.entity_count == 5
+
+    def test_acyclic_graph(self):
+        from markitect.infospace.checks.consistency import check_consistency
+        entities = _sample_entities()
+        g = _linear_graph()
+        report = check_consistency(entities, graph=g)
+        assert report.cycle_count == 0
+
+    def test_cyclic_graph(self):
+        from markitect.infospace.checks.consistency import check_consistency
+        entities = _sample_entities()
+        g = _cyclic_graph()
+        report = check_consistency(entities, graph=g)
+        assert report.cycle_count >= 1
+        assert len(report.cycles) >= 1
+
+    def test_to_dict(self):
+        from markitect.infospace.checks.consistency import ConsistencyReport
+        r = ConsistencyReport(cycles=[["A", "B", "A"]], cycle_count=1, entity_count=5)
+        d = r.to_dict()
+        assert d["concern"] == "C4"
+        assert d["cycle_count"] == 1
+
+
+# ── C5: Granularity ─────────────────────────────────────────────────
+
+
+class TestGranularity:
+    def test_empty_entities(self):
+        from markitect.infospace.checks.granularity import check_granularity
+        report = check_granularity([])
+        assert report.entity_count == 0
+        assert report.domain_entropy == 0.0
+
+    def test_single_domain(self):
+        from markitect.infospace.checks.granularity import check_granularity
+        entities = [
+            _entity("a", domain="d1", word_count=10),
+            _entity("b", domain="d1", word_count=20),
+        ]
+        report = check_granularity(entities)
+        assert report.domain_entropy == 0.0  # single domain = zero entropy
+        assert report.entity_count == 2
+        assert report.word_count_stats["mean"] == 15.0
+
+    def test_balanced_domains(self):
+        from markitect.infospace.checks.granularity import check_granularity
+        entities = [
+            _entity("a", domain="d1", word_count=10),
+            _entity("b", domain="d2", word_count=10),
+        ]
+        report = check_granularity(entities)
+        assert report.domain_entropy == pytest.approx(1.0)  # log2(2) = 1.0
+        assert report.domain_distribution == {"d1": 1, "d2": 1}
+
+    def test_word_count_stats(self):
+        from markitect.infospace.checks.granularity import check_granularity
+        entities = [
+            _entity("a", domain="d1", word_count=10),
+            _entity("b", domain="d1", word_count=30),
+        ]
+        report = check_granularity(entities)
+        assert report.word_count_stats["mean"] == 20.0
+        assert report.word_count_stats["min"] == 10.0
+        assert report.word_count_stats["max"] == 30.0
+        assert report.word_count_stats["std"] == 10.0  # population std of [10, 30]
+
+    def test_to_dict(self):
+        from markitect.infospace.checks.granularity import GranularityReport
+        r = GranularityReport(domain_entropy=1.5, entity_count=4)
+        d = r.to_dict()
+        assert d["concern"] == "C5"
+        assert d["domain_entropy"] == 1.5
+
+    def test_unspecified_domain(self):
+        from markitect.infospace.checks.granularity import check_granularity
+        entities = [_entity("a", domain="", word_count=10)]
+        report = check_granularity(entities)
+        assert "(unspecified)" in report.domain_distribution
+
+
+# ── Orchestrator ────────────────────────────────────────────────────
+
+
+class TestOrchestrator:
+    def test_run_all_default(self):
+        from markitect.infospace.checks.orchestrator import run_all_checks
+        entities = _sample_entities()
+        report = run_all_checks(entities)
+        assert report.redundancy is not None
+        assert report.coverage is not None
+        assert report.coherence is not None
+        assert report.consistency is not None
+        assert report.granularity is not None
+
+    def test_run_selected_checks(self):
+        from markitect.infospace.checks.orchestrator import run_all_checks
+        entities = _sample_entities()
+        report = run_all_checks(entities, checks=["redundancy", "granularity"])
+        assert report.redundancy is not None
+        assert report.granularity is not None
+        assert report.coverage is None
+        assert report.coherence is None
+        assert report.consistency is None
+
+    def test_to_dict(self):
+        from markitect.infospace.checks.orchestrator import run_all_checks
+        entities = _sample_entities()
+        report = run_all_checks(entities, checks=["granularity"])
+        d = report.to_dict()
+        assert "granularity" in d
+        assert "redundancy" not in d
+
+    def test_metrics(self):
+        from markitect.infospace.checks.orchestrator import run_all_checks
+        entities = _sample_entities()
+        report = run_all_checks(entities, checks=["redundancy", "granularity"])
+        m = report.metrics()
+        assert "redundancy_ratio" in m
+        assert "granularity_entropy" in m
+        assert isinstance(m["redundancy_ratio"], float)
+        assert isinstance(m["granularity_entropy"], float)
+
+    def test_metrics_empty_report(self):
+        from markitect.infospace.checks.orchestrator import CheckReport
+        report = CheckReport()
+        assert report.metrics() == {}
+
+    def test_run_all_with_graph(self):
+        from markitect.infospace.checks.orchestrator import run_all_checks
+        entities = _sample_entities()
+        g = _linear_graph()
+        report = run_all_checks(entities, graph=g, checks=["consistency"])
+        assert report.consistency is not None
+        assert report.consistency.cycle_count == 0
+
+    def test_run_all_with_cyclic_graph(self):
+        from markitect.infospace.checks.orchestrator import run_all_checks
+        entities = _sample_entities()
+        g = _cyclic_graph()
+        report = run_all_checks(entities, graph=g, checks=["consistency"])
+        assert report.consistency.cycle_count >= 1
+
+
+# ── Shannon entropy helper ──────────────────────────────────────────
+
+
+class TestShannonEntropy:
+    def test_uniform_distribution(self):
+        from markitect.infospace.checks.granularity import _shannon_entropy
+        counts = {"a": 1, "b": 1, "c": 1, "d": 1}
+        assert _shannon_entropy(counts) == pytest.approx(2.0)  # log2(4)
+
+    def test_single_element(self):
+        from markitect.infospace.checks.granularity import _shannon_entropy
+        assert _shannon_entropy({"a": 10}) == 0.0
+
+    def test_empty(self):
+        from markitect.infospace.checks.granularity import _shannon_entropy
+        assert _shannon_entropy({}) == 0.0
+
+    def test_skewed(self):
+        from markitect.infospace.checks.granularity import _shannon_entropy
+        counts = {"a": 99, "b": 1}
+        entropy = _shannon_entropy(counts)
+        assert 0.0 < entropy < 1.0
--- a/tests/unit/infospace/test_cli.py
+++ b/tests/unit/infospace/test_cli.py
@@ -0,0 +1,225 @@
+"""Tests for markitect.infospace.cli."""
+
+from pathlib import Path
+
+import pytest
+from click.testing import CliRunner
+
+from markitect.infospace.cli import infospace_commands
+
+
+@pytest.fixture
+def runner():
+    return CliRunner()
+
+
+@pytest.fixture
+def infospace_dir(tmp_path):
+    """Create a minimal infospace directory with config and entities."""
+    config_yaml = """\
+topic:
+  name: "Test Infospace"
+  domain: "Testing"
+
+disciplines:
+  - name: "Test Discipline"
+
+viability:
+  coverage_ratio:
+    min: 0.60
+  redundancy_ratio:
+    max: 0.05
+"""
+    (tmp_path / "infospace.yaml").write_text(config_yaml)
+
+    entities = tmp_path / "output" / "entities"
+    entities.mkdir(parents=True)
+    (entities / "alpha.md").write_text(
+        "# Alpha\n\n## Definition\n\nAlpha is a test entity.\n\n"
+        "## Source Chapter\n\nChapter 1\n\n"
+        "## Domain\n\nProduction\n"
+    )
+    (entities / "beta.md").write_text(
+        "# Beta\n\n## Definition\n\nBeta is another test entity with more words "
+        "to make it longer.\n\n"
+        "## Source Chapter\n\nChapter 2\n\n"
+        "## Domain\n\nDistribution\n"
+    )
+    return tmp_path
+
+
+# ── init ─────────────────────────────────────────────────────────────
+
+
+class TestInitCommand:
+    def test_creates_config_file(self, runner, tmp_path):
+        out = tmp_path / "infospace.yaml"
+        result = runner.invoke(
+            infospace_commands,
+            ["init", "--topic", "My Topic", "--domain", "Science", "-o", str(out)],
+        )
+        assert result.exit_code == 0
+        assert out.exists()
+        assert "Created" in result.output
+
+    def test_config_contains_topic(self, runner, tmp_path):
+        out = tmp_path / "infospace.yaml"
+        runner.invoke(
+            infospace_commands,
+            ["init", "--topic", "My Topic", "-o", str(out)],
+        )
+        text = out.read_text()
+        assert "My Topic" in text
+
+    def test_refuses_overwrite(self, runner, tmp_path):
+        out = tmp_path / "infospace.yaml"
+        out.write_text("existing")
+        result = runner.invoke(
+            infospace_commands,
+            ["init", "--topic", "X", "-o", str(out)],
+        )
+        assert result.exit_code != 0
+        assert "already exists" in result.output
+
+    def test_with_disciplines(self, runner, tmp_path):
+        out = tmp_path / "infospace.yaml"
+        result = runner.invoke(
+            infospace_commands,
+            [
+                "init", "--topic", "T",
+                "--discipline", "VSM",
+                "--discipline", "Category Theory",
+                "-o", str(out),
+            ],
+        )
+        assert result.exit_code == 0
+        text = out.read_text()
+        assert "VSM" in text
+        assert "Category Theory" in text
+
+
+# ── status ───────────────────────────────────────────────────────────
+
+
+class TestStatusCommand:
+    def test_shows_topic_and_count(self, runner, infospace_dir):
+        result = runner.invoke(
+            infospace_commands,
+            ["status", "--config", str(infospace_dir / "infospace.yaml")],
+        )
+        assert result.exit_code == 0
+        assert "Test Infospace" in result.output
+        assert "2" in result.output  # 2 entities
+
+    def test_shows_domain_field(self, runner, infospace_dir):
+        result = runner.invoke(
+            infospace_commands,
+            ["status", "--config", str(infospace_dir / "infospace.yaml")],
+        )
+        # Domain from config (topic.domain), not entity domains
+        assert "Testing" in result.output
+
+    def test_shows_disciplines(self, runner, infospace_dir):
+        result = runner.invoke(
+            infospace_commands,
+            ["status", "--config", str(infospace_dir / "infospace.yaml")],
+        )
+        assert "Test Discipline" in result.output
+
+    def test_no_config_exits(self, runner, tmp_path):
+        result = runner.invoke(
+            infospace_commands,
+            ["status", "--config", str(tmp_path / "nonexistent.yaml")],
+        )
+        assert result.exit_code != 0
+
+
+# ── entities ─────────────────────────────────────────────────────────
+
+
+class TestEntitiesCommand:
+    def test_lists_entities(self, runner, infospace_dir):
+        result = runner.invoke(
+            infospace_commands,
+            ["entities", "--config", str(infospace_dir / "infospace.yaml")],
+        )
+        assert result.exit_code == 0
+        assert "alpha" in result.output
+        assert "beta" in result.output
+        assert "Total: 2" in result.output
+
+    def test_sort_by_domain(self, runner, infospace_dir):
+        result = runner.invoke(
+            infospace_commands,
+            [
+                "entities",
+                "--config", str(infospace_dir / "infospace.yaml"),
+                "--sort-by", "domain",
+            ],
+        )
+        assert result.exit_code == 0
+        lines = result.output.strip().split("\n")
+        # Distribution (beta) sorts before Production (alpha); only presence is asserted here
+        data_lines = [line for line in lines if "alpha" in line or "beta" in line]
+        assert len(data_lines) == 2
+
+    def test_no_entities_dir(self, runner, tmp_path):
+        (tmp_path / "infospace.yaml").write_text("topic:\n  name: X\n")
+        result = runner.invoke(
+            infospace_commands,
+            ["entities", "--config", str(tmp_path / "infospace.yaml")],
+        )
+        assert result.exit_code == 0
+        assert "No entities" in result.output
+
+
+# ── viability ────────────────────────────────────────────────────────
+
+
+class TestViabilityCommand:
+    def test_no_metrics_shows_thresholds(self, runner, infospace_dir):
+        result = runner.invoke(
+            infospace_commands,
+            ["viability", "--config", str(infospace_dir / "infospace.yaml")],
+        )
+        assert result.exit_code == 0
+        assert "coverage_ratio" in result.output
+
+    def test_with_metrics_file(self, runner, infospace_dir):
+        import yaml
+        metrics_dir = infospace_dir / "output" / "metrics"
+        metrics_dir.mkdir(parents=True, exist_ok=True)
+        metrics = {"coverage_ratio": 0.85, "redundancy_ratio": 0.02}
+        (metrics_dir / "metrics.yaml").write_text(yaml.safe_dump(metrics))
+
+        result = runner.invoke(
+            infospace_commands,
+            ["viability", "--config", str(infospace_dir / "infospace.yaml")],
+        )
+        assert result.exit_code == 0
+        assert "PASS" in result.output
+        assert "Viable: YES" in result.output
+
+    def test_failing_threshold(self, runner, infospace_dir):
+        import yaml
+        metrics_dir = infospace_dir / "output" / "metrics"
+        metrics_dir.mkdir(parents=True, exist_ok=True)
+        metrics = {"coverage_ratio": 0.3, "redundancy_ratio": 0.02}
+        (metrics_dir / "metrics.yaml").write_text(yaml.safe_dump(metrics))
+
+        result = runner.invoke(
+            infospace_commands,
+            ["viability", "--config", str(infospace_dir / "infospace.yaml")],
+        )
+        assert result.exit_code == 0
+        assert "FAIL" in result.output
+        assert "Viable: NO" in result.output
+
+    def test_no_thresholds_configured(self, runner, tmp_path):
+        (tmp_path / "infospace.yaml").write_text("topic:\n  name: X\n")
+        result = runner.invoke(
+            infospace_commands,
+            ["viability", "--config", str(tmp_path / "infospace.yaml")],
+        )
+        assert result.exit_code == 0
+        assert "No viability thresholds" in result.output
--- a/tests/unit/infospace/test_composition.py
+++ b/tests/unit/infospace/test_composition.py
@@ -0,0 +1,257 @@
+"""
+Tests for infospace composition model (S2.6).
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+import yaml
+
+from markitect.infospace.composition import (
+    DisciplineStatus,
+    StaleMappingInfo,
+    bind_discipline,
+    check_discipline_status,
+    compute_discipline_digests,
+    find_stale_mappings,
+    get_discipline_entities,
+    load_discipline_config,
+    resolve_discipline_path,
+)
+from markitect.infospace.config import (
+    DisciplineBinding,
+    InfospaceConfig,
+    TopicConfig,
+    ViabilityThreshold,
+    save_infospace_config,
+)
+
+
+# ── helpers ──────────────────────────────────────────────────────────
+
+
+def _create_discipline(tmp_path: Path, name: str = "test-discipline") -> Path:
+    """Create a minimal discipline infospace directory."""
+    disc_dir = tmp_path / name
+    disc_dir.mkdir(parents=True, exist_ok=True)
+
+    disc_config = InfospaceConfig(
+        topic=TopicConfig(name=name.replace("-", " ").title(), domain="Testing"),
+        viability={"coverage_ratio": ViabilityThreshold(metric="coverage_ratio", min=0.5)},
+    )
+    save_infospace_config(disc_config, disc_dir / "infospace.yaml")
+
+    # Create some entities
+    entities_dir = disc_dir / "output" / "entities"
+    entities_dir.mkdir(parents=True, exist_ok=True)
+
+    for slug in ["concept_a", "concept_b", "concept-c"]:
+        title = slug.replace("-", " ").title()
+        (entities_dir / f"{slug}.md").write_text(
+            f"# {title}\n\n## Definition\n\nA test concept for {slug}.\n\n"
+            f"## Source Chapter\n\nch01\n\n## Domain\n\nTesting\n",
+            encoding="utf-8",
+        )
+
+    return disc_dir
+
+
+def _parent_config(tmp_path: Path, disc_path: str = "") -> InfospaceConfig:
+    """Create a parent infospace config."""
+    return InfospaceConfig(
+        topic=TopicConfig(name="Parent", domain="Testing"),
+        disciplines=[DisciplineBinding(name="Test Discipline", path=disc_path)]
+        if disc_path
+        else [],
+    )
+
+
+# ── resolve_discipline_path ─────────────────────────────────────────
+
+
+class TestResolveDisciplinePath:
+    def test_relative_path(self, tmp_path):
+        disc_dir = _create_discipline(tmp_path)
+        binding = DisciplineBinding(name="test", path="test-discipline")
+        result = resolve_discipline_path(binding, tmp_path)
+        assert result is not None
+        assert result == disc_dir.resolve()
+
+    def test_absolute_path(self, tmp_path):
+        disc_dir = _create_discipline(tmp_path)
+        binding = DisciplineBinding(name="test", path=str(disc_dir))
+        result = resolve_discipline_path(binding, tmp_path / "other")
+        assert result is not None
+        assert result == disc_dir.resolve()
+
+    def test_missing_path(self, tmp_path):
+        binding = DisciplineBinding(name="test", path="nonexistent")
+        assert resolve_discipline_path(binding, tmp_path) is None
+
+    def test_empty_path(self, tmp_path):
+        binding = DisciplineBinding(name="test", path="")
+        assert resolve_discipline_path(binding, tmp_path) is None
+
+
+# ── load_discipline_config ──────────────────────────────────────────
+
+
+class TestLoadDisciplineConfig:
+    def test_loads_config(self, tmp_path):
+        _create_discipline(tmp_path)
+        binding = DisciplineBinding(name="test", path="test-discipline")
+        config = load_discipline_config(binding, tmp_path)
+        assert config is not None
+        assert config.topic.domain == "Testing"
+
+    def test_missing_config_file(self, tmp_path):
+        (tmp_path / "no-config").mkdir()
+        binding = DisciplineBinding(name="test", path="no-config")
+        assert load_discipline_config(binding, tmp_path) is None
+
+    def test_missing_directory(self, tmp_path):
+        binding = DisciplineBinding(name="test", path="gone")
+        assert load_discipline_config(binding, tmp_path) is None
+
+
+# ── check_discipline_status ─────────────────────────────────────────
+
+
+class TestCheckDisciplineStatus:
+    def test_valid_discipline(self, tmp_path):
+        _create_discipline(tmp_path)
+        binding = DisciplineBinding(name="Test Discipline", path="test-discipline")
+        status = check_discipline_status(binding, tmp_path)
+        assert status.exists
+        assert status.has_config
+        assert status.entity_count == 3
+        assert status.error == ""
+
+    def test_missing_discipline(self, tmp_path):
+        binding = DisciplineBinding(name="Missing", path="nope")
+        status = check_discipline_status(binding, tmp_path)
+        assert not status.exists
+        assert "not found" in status.error.lower()
+
+    def test_no_config(self, tmp_path):
+        (tmp_path / "bare").mkdir()
+        binding = DisciplineBinding(name="Bare", path="bare")
+        status = check_discipline_status(binding, tmp_path)
+        assert status.exists
+        assert not status.has_config
+
+    def test_viable_with_metrics(self, tmp_path):
+        disc_dir = _create_discipline(tmp_path)
+        # Write metrics that meet the threshold
+        metrics_dir = disc_dir / "output" / "metrics"
+        metrics_dir.mkdir(parents=True, exist_ok=True)
+        (metrics_dir / "metrics.yaml").write_text(
+            yaml.safe_dump({"coverage_ratio": 0.8}), encoding="utf-8"
+        )
+        binding = DisciplineBinding(name="Test", path="test-discipline")
+        status = check_discipline_status(binding, tmp_path)
+        assert status.is_viable
+
+    def test_not_viable_below_threshold(self, tmp_path):
+        disc_dir = _create_discipline(tmp_path)
+        metrics_dir = disc_dir / "output" / "metrics"
+        metrics_dir.mkdir(parents=True, exist_ok=True)
+        (metrics_dir / "metrics.yaml").write_text(
+            yaml.safe_dump({"coverage_ratio": 0.2}), encoding="utf-8"
+        )
+        binding = DisciplineBinding(name="Test", path="test-discipline")
+        status = check_discipline_status(binding, tmp_path)
+        assert not status.is_viable
+
+    def test_to_dict(self, tmp_path):
+        _create_discipline(tmp_path)
+        binding = DisciplineBinding(name="Test", path="test-discipline")
+        status = check_discipline_status(binding, tmp_path)
+        d = status.to_dict()
+        assert d["name"] == "Test"
+        assert d["exists"] is True
+        assert d["entity_count"] == 3
+
+
+# ── get_discipline_entities ─────────────────────────────────────────
+
+
+class TestGetDisciplineEntities:
+    def test_returns_entities(self, tmp_path):
+        _create_discipline(tmp_path)
+        binding = DisciplineBinding(name="Test", path="test-discipline")
+        entities = get_discipline_entities(binding, tmp_path)
+        assert len(entities) == 3
+        slugs = {e.slug for e in entities}
+        assert "concept_a" in slugs
+
+    def test_missing_discipline(self, tmp_path):
+        binding = DisciplineBinding(name="Test", path="nope")
+        assert get_discipline_entities(binding, tmp_path) == []
+
+
+# ── compute_discipline_digests ──────────────────────────────────────
+
+
+class TestComputeDisciplineDigests:
+    def test_returns_digests(self, tmp_path):
+        _create_discipline(tmp_path)
+        binding = DisciplineBinding(name="Test", path="test-discipline")
+        digests = compute_discipline_digests(binding, tmp_path)
+        assert len(digests) == 3
+        assert "concept_a" in digests
+        assert isinstance(digests["concept_a"], str)
+        assert len(digests["concept_a"]) == 12
+
+
+# ── find_stale_mappings ─────────────────────────────────────────────
+
+
+class TestFindStaleMappings:
+    def test_no_references(self, tmp_path):
+        cfg = _parent_config(tmp_path, disc_path="test-discipline")
+        assert find_stale_mappings(cfg, tmp_path) == []
+
+    def test_no_stale(self, tmp_path):
+        _create_discipline(tmp_path)
+        cfg = _parent_config(tmp_path, disc_path="test-discipline")
+        refs = {"entity_x": ["concept_a", "concept_b"]}
+        stale = find_stale_mappings(cfg, tmp_path, mapping_references=refs)
+        assert stale == []
+
+    def test_detects_stale(self, tmp_path):
+        _create_discipline(tmp_path)
+        cfg = _parent_config(tmp_path, disc_path="test-discipline")
+        refs = {"entity_x": ["concept_a", "deleted_concept"]}
+        stale = find_stale_mappings(cfg, tmp_path, mapping_references=refs)
+        assert len(stale) == 1
+        assert stale[0].entity_slug == "entity_x"
+        assert stale[0].discipline_entity == "deleted_concept"
+
+    def test_stale_to_dict(self):
+        info = StaleMappingInfo(
+            entity_slug="e1", discipline_entity="d1", reason="gone"
+        )
+        d = info.to_dict()
+        assert d["entity_slug"] == "e1"
+
+
+# ── bind_discipline ─────────────────────────────────────────────────
+
+
+class TestBindDiscipline:
+    def test_adds_binding(self, tmp_path):
+        _create_discipline(tmp_path)
+        cfg = InfospaceConfig(topic=TopicConfig(name="Parent"))
+        status = bind_discipline(cfg, name="Test", path="test-discipline", root=tmp_path)
+        assert status.exists
+        assert len(cfg.disciplines) == 1
+        assert cfg.disciplines[0].name == "Test"
+
+    def test_duplicate_rejected(self, tmp_path):
+        _create_discipline(tmp_path)
+        cfg = _parent_config(tmp_path, disc_path="test-discipline")
+        status = bind_discipline(cfg, name="Test Discipline", path="x", root=tmp_path)
+        assert "already bound" in status.error
--- a/tests/unit/infospace/test_config.py
+++ b/tests/unit/infospace/test_config.py
@@ -0,0 +1,400 @@
+"""Tests for markitect.infospace.config and state."""
+
+from datetime import datetime
+from pathlib import Path
+
+import pytest
+
+from markitect.infospace.config import (
+    DisciplineBinding,
+    InfospaceConfig,
+    PipelineConfig,
+    PipelineStage,
+    SchemaRegistry,
+    TopicConfig,
+    ViabilityThreshold,
+    find_infospace_config,
+    load_infospace_config,
+    save_infospace_config,
+)
+from markitect.infospace.state import (
+    InfospaceState,
+    ViabilityResult,
+    build_state,
+)
+from markitect.infospace.models import EntityMeta
+from markitect.infospace.evaluation import (
+    EntityEvaluation,
+    EvaluationSnapshot,
+    ScoreEntry,
+)
+
+
+# ── Helpers ──────────────────────────────────────────────────────────
+
+
+_SAMPLE_YAML = """\
+topic:
+  name: "The Wealth of Nations"
+  domain: "Classical Economics"
+  sources: artifacts/sources/
+
+disciplines:
+  - name: "Viable System Model"
+    path: artifacts/vsm-reference/
+
+schemas:
+  entity: schemas/economic-entity-schema-v1.0.md
+  mapping: schemas/vsm-mapping-schema-v1.0.md
+
+competency_questions: schemas/competency-questions.md
+
+viability:
+  coverage_ratio:
+    min: 0.60
+  per_entity_mean:
+    min: 3.5
+  redundancy_ratio:
+    max: 0.05
+
+pipeline:
+  stages:
+    - template: extract-entities
+      spaces: [sources, guidelines]
+    - template: map-to-vsm
+      spaces: [entities, vsm-reference]
+  post_batch:
+    - template: assess-metrics
+"""
+
+
+def _sample_config() -> InfospaceConfig:
+    return InfospaceConfig(
+        topic=TopicConfig(name="Test Topic", domain="Testing"),
+        disciplines=[DisciplineBinding(name="VSM", path="vsm/")],
+        schemas=SchemaRegistry(entity="schemas/entity.md"),
+        competency_questions="schemas/cq.md",
+        viability={
+            "coverage_ratio": ViabilityThreshold("coverage_ratio", min=0.6),
+            "redundancy_ratio": ViabilityThreshold("redundancy_ratio", max=0.05),
+        },
+    )
+
+
+def _sample_entities(n=5) -> list:
+    return [
+        EntityMeta(
+            slug=f"entity-{i}",
+            title=f"Entity {i}",
+            h1_raw=f"Entity {i}",
+            domain="Production" if i % 2 == 0 else "Distribution",
+        )
+        for i in range(n)
+    ]
+
+
+# ── TopicConfig ──────────────────────────────────────────────────────
+
+
+class TestTopicConfig:
+    def test_round_trip(self):
+        tc = TopicConfig("WoN", "Economics", "sources/")
+        d = tc.to_dict()
+        restored = TopicConfig.from_dict(d)
+        assert restored.name == "WoN"
+        assert restored.domain == "Economics"
+        assert restored.sources == "sources/"
+
+    def test_minimal(self):
+        tc = TopicConfig.from_dict({"name": "Minimal"})
+        assert tc.domain == ""
+        assert tc.sources == ""
+
+    def test_to_dict_omits_empty(self):
+        tc = TopicConfig("X")
+        d = tc.to_dict()
+        assert "domain" not in d
+        assert "sources" not in d
+
+
+# ── DisciplineBinding ────────────────────────────────────────────────
+
+
+class TestDisciplineBinding:
+    def test_round_trip(self):
+        db = DisciplineBinding("VSM", "path/to/vsm")
+        d = db.to_dict()
+        restored = DisciplineBinding.from_dict(d)
+        assert restored.name == "VSM"
+        assert restored.path == "path/to/vsm"
+
+
+# ── SchemaRegistry ───────────────────────────────────────────────────
+
+
+class TestSchemaRegistry:
+    def test_round_trip(self):
+        sr = SchemaRegistry(entity="e.md", mapping="m.md", analysis="a.md")
+        d = sr.to_dict()
+        restored = SchemaRegistry.from_dict(d)
+        assert restored.entity == "e.md"
+        assert restored.mapping == "m.md"
+
+    def test_extra_schemas(self):
+        sr = SchemaRegistry.from_dict({"entity": "e.md", "custom": "c.md"})
+        assert sr.entity == "e.md"
+        assert sr.extra == {"custom": "c.md"}
+
+
+# ── ViabilityThreshold ──────────────────────────────────────────────
+
+
+class TestViabilityThreshold:
+    def test_min_check(self):
+        t = ViabilityThreshold("x", min=0.5)
+        assert t.check(0.6) is True
+        assert t.check(0.5) is True  # lower bound is inclusive
+        assert t.check(0.4) is False
+
+    def test_max_check(self):
+        t = ViabilityThreshold("x", max=0.1)
+        assert t.check(0.05) is True
+        assert t.check(0.1) is True  # upper bound is inclusive
+        assert t.check(0.2) is False
+
+    def test_min_and_max(self):
+        t = ViabilityThreshold("x", min=0.3, max=0.7)
+        assert t.check(0.5) is True
+        assert t.check(0.2) is False
+        assert t.check(0.8) is False
+
+    def test_no_bounds_always_passes(self):
+        t = ViabilityThreshold("x")
+        assert t.check(999.0) is True
+
+
+# ── PipelineConfig ──────────────────────────────────────────────────
+
+
+class TestPipelineConfig:
+    def test_round_trip(self):
+        pc = PipelineConfig(
+            stages=[PipelineStage("extract", ["s1", "s2"])],
+            post_batch=[PipelineStage("assess")],
+        )
+        d = pc.to_dict()
+        restored = PipelineConfig.from_dict(d)
+        assert len(restored.stages) == 1
+        assert restored.stages[0].template == "extract"
+        assert restored.stages[0].spaces == ["s1", "s2"]
+        assert len(restored.post_batch) == 1
+
+
+# ── InfospaceConfig ─────────────────────────────────────────────────
+
+
+class TestInfospaceConfig:
+    def test_to_dict_from_dict_round_trip(self):
+        cfg = _sample_config()
+        d = cfg.to_dict()
+        restored = InfospaceConfig.from_dict(d)
+        assert restored.topic.name == "Test Topic"
+        assert len(restored.disciplines) == 1
+        assert restored.schemas.entity == "schemas/entity.md"
+        assert restored.competency_questions == "schemas/cq.md"
+        assert len(restored.viability) == 2
+
+    def test_viability_thresholds_preserved(self):
+        cfg = _sample_config()
+        d = cfg.to_dict()
+        restored = InfospaceConfig.from_dict(d)
+        assert restored.viability["coverage_ratio"].min == 0.6
+        assert restored.viability["redundancy_ratio"].max == 0.05
+
+    def test_default_dirs(self):
+        cfg = InfospaceConfig(topic=TopicConfig("X"))
+        assert cfg.entities_dir == "output/entities"
+        assert cfg.evaluations_dir == "output/evaluations"
+        assert cfg.metrics_dir == "output/metrics"
+
+    def test_custom_dirs(self):
+        cfg = InfospaceConfig.from_dict({
+            "topic": {"name": "X"},
+            "entities_dir": "custom/entities",
+        })
+        assert cfg.entities_dir == "custom/entities"
+
+
+# ── YAML I/O ────────────────────────────────────────────────────────
+
+
+class TestYAMLIO:
+    def test_save_load_round_trip(self, tmp_path):
+        cfg = _sample_config()
+        p = tmp_path / "infospace.yaml"
+        save_infospace_config(cfg, p)
+        loaded = load_infospace_config(p)
+        assert loaded.topic.name == cfg.topic.name
+        assert len(loaded.viability) == len(cfg.viability)
+
+    def test_load_full_example(self, tmp_path):
+        p = tmp_path / "infospace.yaml"
+        p.write_text(_SAMPLE_YAML, encoding="utf-8")
+        cfg = load_infospace_config(p)
+        assert cfg.topic.name == "The Wealth of Nations"
+        assert cfg.topic.domain == "Classical Economics"
+        assert len(cfg.disciplines) == 1
+        assert cfg.disciplines[0].name == "Viable System Model"
+        assert cfg.schemas.entity == "schemas/economic-entity-schema-v1.0.md"
+        assert cfg.competency_questions == "schemas/competency-questions.md"
+        assert len(cfg.viability) == 3
+        assert cfg.viability["coverage_ratio"].min == 0.60
+        assert cfg.viability["redundancy_ratio"].max == 0.05
+        assert cfg.pipeline is not None
+        assert len(cfg.pipeline.stages) == 2
+        assert len(cfg.pipeline.post_batch) == 1
+
+    def test_load_missing_file(self, tmp_path):
+        with pytest.raises(FileNotFoundError):
+            load_infospace_config(tmp_path / "nonexistent.yaml")
+
+    def test_load_missing_topic(self, tmp_path):
+        p = tmp_path / "bad.yaml"
+        p.write_text("schemas:\n  entity: x.md\n")
+        with pytest.raises(ValueError, match="topic"):
+            load_infospace_config(p)
+
+    def test_save_creates_parent_dirs(self, tmp_path):
+        cfg = InfospaceConfig(topic=TopicConfig("X"))
+        p = tmp_path / "deep" / "nested" / "infospace.yaml"
+        save_infospace_config(cfg, p)
+        assert p.exists()
+
+
+class TestFindConfig:
+    def test_finds_config_in_current_dir(self, tmp_path):
+        (tmp_path / "infospace.yaml").write_text("topic:\n  name: X\n")
+        found = find_infospace_config(tmp_path)
+        assert found is not None
+        assert found.name == "infospace.yaml"
+
+    def test_finds_config_in_parent(self, tmp_path):
+        (tmp_path / "infospace.yaml").write_text("topic:\n  name: X\n")
+        child = tmp_path / "sub" / "dir"
+        child.mkdir(parents=True)
+        found = find_infospace_config(child)
+        assert found is not None
+
+    def test_returns_none_if_not_found(self, tmp_path):
+        assert find_infospace_config(tmp_path) is None
+
+
+# ── InfospaceState ──────────────────────────────────────────────────
+
+
+class TestInfospaceState:
+    def test_entity_count(self):
+        cfg = _sample_config()
+        state = InfospaceState(config=cfg, entities=_sample_entities(5))
+        assert state.entity_count == 5
+
+    def test_topic_name(self):
+        cfg = _sample_config()
+        state = InfospaceState(config=cfg)
+        assert state.topic_name == "Test Topic"
+
+    def test_domains(self):
+        cfg = _sample_config()
+        state = InfospaceState(config=cfg, entities=_sample_entities(4))
+        assert "Production" in state.domains
+        assert "Distribution" in state.domains
+
+    def test_has_evaluations(self):
+        cfg = _sample_config()
+        state = InfospaceState(config=cfg)
+        assert state.has_evaluations is False
+
+        snap = EvaluationSnapshot(
+            snapshot_id="s1",
+            created_at=datetime(2026, 1, 1),
+            schema_name="Test",
+            entity_count=0,
+        )
+        state.latest_snapshot = snap
+        assert state.has_evaluations is True
+
+
+class TestViabilityCheck:
+    def test_all_pass(self):
+        cfg = _sample_config()
+        state = InfospaceState(config=cfg)
+        metrics = {"coverage_ratio": 0.8, "redundancy_ratio": 0.02}
+        results = state.check_viability(metrics)
+        assert all(r.passed for r in results)
+        assert state.is_viable is True
+
+    def test_one_fails(self):
+        cfg = _sample_config()
+        state = InfospaceState(config=cfg)
+        metrics = {"coverage_ratio": 0.4, "redundancy_ratio": 0.02}
+        results = state.check_viability(metrics)
+        assert not all(r.passed for r in results)
+        assert state.is_viable is False
+
+    def test_missing_metric_defaults_to_zero(self):
+        cfg = _sample_config()
+        state = InfospaceState(config=cfg)
+        # coverage_ratio min=0.6, missing → 0.0 → fails
+        results = state.check_viability({})
+        coverage = next(r for r in results if r.metric == "coverage_ratio")
+        assert coverage.passed is False
+        assert coverage.value == 0.0
+
+    def test_viability_counts(self):
+        cfg = _sample_config()
+        state = InfospaceState(config=cfg)
+        metrics = {"coverage_ratio": 0.8, "redundancy_ratio": 0.2}
+        state.check_viability(metrics)
+        assert state.viability_pass_count == 1  # coverage passes; redundancy 0.2 exceeds max 0.05
+        assert state.viability_total_count == 2
+
+    def test_no_thresholds_not_viable(self):
+        cfg = InfospaceConfig(topic=TopicConfig("X"))
+        state = InfospaceState(config=cfg)
+        assert state.is_viable is False
+
+
+class TestBuildState:
+    def test_builds_with_entities(self):
+        cfg = _sample_config()
+        entities = _sample_entities(3)
+        state = build_state(cfg, entities=entities)
+        assert state.entity_count == 3
+
+    def test_builds_with_metrics(self):
+        cfg = _sample_config()
+        metrics = {"coverage_ratio": 0.9, "redundancy_ratio": 0.01}
+        state = build_state(cfg, metrics=metrics)
+        assert state.is_viable is True
+
+    def test_summary(self):
+        cfg = _sample_config()
+        entities = _sample_entities(3)
+        metrics = {"coverage_ratio": 0.9, "redundancy_ratio": 0.01}
+        state = build_state(cfg, entities=entities, metrics=metrics)
+        s = state.summary()
+        assert s["topic"] == "Test Topic"
+        assert s["entity_count"] == 3
+        assert s["viable"] is True
+
+
+class TestViabilityResult:
+    def test_to_dict(self):
+        t = ViabilityThreshold("x", min=0.5)
+        r = ViabilityResult(metric="x", value=0.7, threshold=t, passed=True)
+        d = r.to_dict()
+        assert d["metric"] == "x"
+        assert d["value"] == 0.7
+        assert d["passed"] is True
+        assert d["min"] == 0.5
+        assert "max" not in d
--- a/tests/unit/infospace/test_entity_parser.py
+++ b/tests/unit/infospace/test_entity_parser.py
@@ -0,0 +1,230 @@
+"""Tests for markitect.infospace.entity_parser and EntityMeta."""
+
+import logging
+from pathlib import Path
+
+import pytest
+
+from markitect.infospace import EntityMeta, parse_entity_file, parse_entity_directory
+
+
+# ── Fixtures ────────────────────────────────────────────────────────
+
+COMPLETE_ENTITY = """\
+# Division of Labour
+
+## Definition
+
+The separation of a work process into a number of distinct tasks, each performed
+by a specialised worker, resulting in a significant increase in the productive
+powers of labour.
+
+## Source Chapter
+
+Book I, Chapter 1: "Of the Division of Labour"
+
+## Context
+
+The division of labour is the central argument of the chapter.
+
+## Economic Domain
+
+Production
+
+## Smith's Original Wording
+
+"The greatest improvements in the productive powers of labour…"
+
+## Modern Interpretation
+
+The division of labour remains a foundational concept in economics.
+"""
+
+MINIMAL_ENTITY = """\
+# Minimal Entity
+
+## Definition
+
+A brief definition.
+
+## Source Chapter
+
+Book I, Chapter 1
+
+## Context
+
+Some context.
+
+## Economic Domain
+
+Exchange
+"""
+
+SLUG_H1_ENTITY = """\
+# effectual-demand
+
+## Definition
+
+Effectual demand is the demand by consumers who are willing and able to pay.
+
+## Source Chapter
+
+Book 1, Chapter 7
+
+## Context
+
+Context for effectual demand.
+
+## Economic Domain
+
+Exchange
+
+## Smith's Original Wording
+
+"Such people may be called the effectual demanders…"
+
+## Modern Interpretation
+
+Represents the intersection of desire and purchasing power.
+"""
+
+NO_H1 = """\
+## Only H2
+
+Some content.
+"""
+
+
+# ── parse_entity_file ────────────────────────────────────────────────
+
+class TestParseEntityFile:
+    def test_complete_entity(self, tmp_path):
+        f = tmp_path / "division-of-labour.md"
+        f.write_text(COMPLETE_ENTITY)
+        meta = parse_entity_file(f)
+
+        assert meta.slug == "division_of_labour"
+        assert meta.title == "Division of Labour"
+        assert meta.h1_is_title_case is True
+        assert meta.has_original_wording is True
+        assert meta.domain == "Production"
+        assert meta.definition_word_count > 20
+        assert "separation" in meta.definition.lower()
+        assert meta.source_path == str(f)
+        assert "definition" in meta.section_slugs
+        assert "smith_s_original_wording" in meta.section_slugs
+
+    def test_minimal_entity(self, tmp_path):
+        f = tmp_path / "minimal-entity.md"
+        f.write_text(MINIMAL_ENTITY)
+        meta = parse_entity_file(f)
+
+        assert meta.slug == "minimal_entity"
+        assert meta.has_original_wording is False
+        assert meta.original_wording == ""
+        assert meta.modern_interpretation == ""
+        assert meta.domain == "Exchange"
+
+    def test_slug_format_h1(self, tmp_path):
+        f = tmp_path / "effectual-demand.md"
+        f.write_text(SLUG_H1_ENTITY)
+        meta = parse_entity_file(f)
+
+        assert meta.h1_raw == "effectual-demand"
+        assert meta.h1_is_title_case is False
+        assert meta.slug == "effectual_demand"
+        assert meta.has_original_wording is True
+
+    def test_missing_h1_raises(self, tmp_path):
+        f = tmp_path / "no-h1.md"
+        f.write_text(NO_H1)
+        with pytest.raises(ValueError, match="No H1"):
+            parse_entity_file(f)
+
+    def test_missing_sections_return_empty(self, tmp_path):
+        f = tmp_path / "minimal.md"
+        f.write_text(MINIMAL_ENTITY)
+        meta = parse_entity_file(f)
+
+        # Optional sections not present → empty string
+        assert meta.original_wording == ""
+        assert meta.modern_interpretation == ""
+
+    def test_word_count_accuracy(self, tmp_path):
+        f = tmp_path / "test.md"
+        f.write_text("# Test\n\n## Definition\n\none two three four five\n")
+        meta = parse_entity_file(f)
+        assert meta.definition_word_count == 5
+
+
+# ── parse_entity_directory ──────────────────────────────────────────
+
+class TestParseEntityDirectory:
+    def _make_dir(self, tmp_path):
+        """Create an entity directory with two entities, one view file, and one prompt file."""
+        d = tmp_path / "entities"
+        d.mkdir()
+        (d / "entity-a.md").write_text(COMPLETE_ENTITY)
+        (d / "entity-b.md").write_text(MINIMAL_ENTITY)
+        # files that should be excluded by default
+        (d / "book-1-chapter-01-entities.md").write_text("# View\n\nview file")
+        (d / "book-1-chapter-01-prompt.md").write_text("# Prompt\n\nprompt file")
+        return d
+
+    def test_excludes_view_and_prompt(self, tmp_path):
+        d = self._make_dir(tmp_path)
+        results = parse_entity_directory(d)
+        slugs = {e.slug for e in results}
+
+        assert "division_of_labour" in slugs
+        assert "minimal_entity" in slugs
+        # Excluded files should not be parsed as entities
+        assert len(results) == 2
+
+    def test_custom_exclude_patterns(self, tmp_path):
+        d = self._make_dir(tmp_path)
+        # Only exclude prompt files, allow entity views
+        results = parse_entity_directory(d, exclude_patterns=[r".*-prompt\.md$"])
+        assert len(results) == 3  # entity-a, entity-b, book-1-chapter-01-entities
+
+    def test_malformed_skipped_with_warning(self, tmp_path, caplog):
+        d = tmp_path / "entities"
+        d.mkdir()
+        (d / "good.md").write_text(COMPLETE_ENTITY)
+        (d / "bad.md").write_text(NO_H1)
+
+        with caplog.at_level(logging.WARNING):
+            results = parse_entity_directory(d)
+
+        assert len(results) == 1
+        assert "bad.md" in caplog.text
+
+
+# ── EntityMeta round-trip ───────────────────────────────────────────
+
+class TestEntityMetaRoundTrip:
+    def test_to_dict_from_dict(self, tmp_path):
+        f = tmp_path / "entity.md"
+        f.write_text(COMPLETE_ENTITY)
+        original = parse_entity_file(f)
+
+        data = original.to_dict()
+        restored = EntityMeta.from_dict(data)
+
+        assert restored.slug == original.slug
+        assert restored.title == original.title
+        assert restored.definition == original.definition
+        assert restored.h1_is_title_case == original.h1_is_title_case
+        assert restored.section_slugs == original.section_slugs
+        assert restored.definition_word_count == original.definition_word_count
+
+    def test_from_dict_ignores_unknown_keys(self):
+        data = {
+            "slug": "test",
+            "title": "Test",
+            "h1_raw": "Test",
+            "unknown_field": "should be ignored",
+        }
+        meta = EntityMeta.from_dict(data)
+        assert meta.slug == "test"
+        assert not hasattr(meta, "unknown_field")
--- a/tests/unit/infospace/test_evaluate.py
+++ b/tests/unit/infospace/test_evaluate.py
@@ -0,0 +1,224 @@
+"""Tests for markitect.infospace.evaluate."""
+
+from datetime import datetime
+from pathlib import Path
+
+import pytest
+
+from markitect.infospace.config import InfospaceConfig, TopicConfig
+from markitect.infospace.evaluate import (
+    build_evaluation_prompt,
+    content_digest,
+    parse_evaluation_response,
+    run_entity_evaluation,
+)
+from markitect.infospace.evaluation import ScoreEntry
+from markitect.infospace.models import EntityMeta
+from markitect.prompts.execution.llm_adapter import MockLLMAdapter
+from markitect.prompts.execution.models import RunConfig
+
+
+# ── Helpers ──────────────────────────────────────────────────────────
+
+
+def _entity(**overrides) -> EntityMeta:
+    defaults = dict(
+        slug="division-of-labour",
+        title="Division Of Labour",
+        h1_raw="Division Of Labour",
+        definition="Splitting work into specialised tasks.",
+        source_chapter="Book I Chapter 1",
+        context="Smith introduces the concept early.",
+        domain="Production",
+        source_path="entities/division-of-labour.md",
+    )
+    defaults.update(overrides)
+    return EntityMeta(**defaults)
+
+
+def _config() -> InfospaceConfig:
+    return InfospaceConfig(topic=TopicConfig(name="The Wealth of Nations"))
+
+
+_MOCK_RESPONSE = """\
+DIMENSION: definition_precision
+SCORE: 4.5
+RATIONALE: Clear and specific definition of the concept.
+
+DIMENSION: source_grounding
+SCORE: 4.0
+RATIONALE: Well grounded in Smith's text.
+
+DIMENSION: domain_relevance
+SCORE: 5.0
+RATIONALE: Directly relevant to production economics.
+"""
+
+
+# ── build_evaluation_prompt ──────────────────────────────────────────
+
+
+class TestBuildPrompt:
+    def test_contains_entity_fields(self):
+        entity = _entity()
+        prompt = build_evaluation_prompt(entity, "Test Topic")
+        assert "division-of-labour" in prompt
+        assert "Division Of Labour" in prompt
+        assert "Production" in prompt
+        assert "Splitting work" in prompt
+
+    def test_contains_topic(self):
+        prompt = build_evaluation_prompt(_entity(), "WoN")
+        assert "WoN" in prompt
+
+    def test_contains_dimensions(self):
+        prompt = build_evaluation_prompt(_entity(), "T")
+        assert "definition_precision" in prompt
+        assert "source_grounding" in prompt
+
+    def test_custom_dimensions(self):
+        prompt = build_evaluation_prompt(
+            _entity(), "T", dimensions=["novelty", "coherence"]
+        )
+        assert "novelty" in prompt
+        assert "coherence" in prompt
+        assert "definition_precision" not in prompt
+
+    def test_handles_missing_fields(self):
+        entity = _entity(definition="", context="", domain="")
+        prompt = build_evaluation_prompt(entity, "T")
+        assert "(no definition)" in prompt
+        assert "(no context)" in prompt
+        assert "(unspecified)" in prompt
+
+
+# ── content_digest ───────────────────────────────────────────────────
+
+
+class TestContentDigest:
+    def test_deterministic(self):
+        e = _entity()
+        assert content_digest(e) == content_digest(e)
+
+    def test_changes_with_content(self):
+        e1 = _entity(definition="A")
+        e2 = _entity(definition="B")
+        assert content_digest(e1) != content_digest(e2)
+
+
+# ── parse_evaluation_response ────────────────────────────────────────
+
+
+class TestParseResponse:
+    def test_parses_three_dimensions(self):
+        scores = parse_evaluation_response(_MOCK_RESPONSE)
+        assert len(scores) == 3
+
+    def test_correct_names(self):
+        scores = parse_evaluation_response(_MOCK_RESPONSE)
+        names = [s.name for s in scores]
+        assert "definition_precision" in names
+        assert "source_grounding" in names
+        assert "domain_relevance" in names
+
+    def test_correct_scores(self):
+        scores = parse_evaluation_response(_MOCK_RESPONSE)
+        by_name = {s.name: s for s in scores}
+        assert by_name["definition_precision"].value == 4.5
+        assert by_name["source_grounding"].value == 4.0
+        assert by_name["domain_relevance"].value == 5.0
+
+    def test_correct_rationales(self):
+        scores = parse_evaluation_response(_MOCK_RESPONSE)
+        by_name = {s.name: s for s in scores}
+        assert "Clear" in by_name["definition_precision"].rationale
+
+    def test_empty_response(self):
+        scores = parse_evaluation_response("")
+        assert scores == []
+
+    def test_malformed_score_skipped(self):
+        text = "DIMENSION: x\nSCORE: not-a-number\nRATIONALE: oops"
+        scores = parse_evaluation_response(text)
+        assert len(scores) == 0
+
+
+# ── run_entity_evaluation ────────────────────────────────────────────
+
+
+class TestRunEntityEvaluation:
+    def test_evaluates_entities(self, tmp_path):
+        adapter = MockLLMAdapter(_MOCK_RESPONSE)
+        cfg = _config()
+        entities = [_entity(), _entity(slug="pin-factory", title="Pin Factory")]
+
+        summary = run_entity_evaluation(
+            config=cfg,
+            entities=entities,
+            adapter=adapter,
+            output_dir=tmp_path / "evals",
+        )
+        assert summary.total == 2
+        assert summary.succeeded == 2
+        assert adapter.call_count == 2
+
+    def test_writes_evaluation_files(self, tmp_path):
+        adapter = MockLLMAdapter(_MOCK_RESPONSE)
+        cfg = _config()
+        entities = [_entity()]
+
+        run_entity_evaluation(
+            config=cfg,
+            entities=entities,
+            adapter=adapter,
+            output_dir=tmp_path / "evals",
+        )
+        eval_file = tmp_path / "evals" / "division-of-labour.md"
+        assert eval_file.exists()
+        text = eval_file.read_text()
+        assert "definition_precision" in text
+
+    def test_incremental_skip(self, tmp_path):
+        adapter = MockLLMAdapter(_MOCK_RESPONSE)
+        cfg = _config()
+        entity = _entity()
+        digest = content_digest(entity)
+
+        summary = run_entity_evaluation(
+            config=cfg,
+            entities=[entity],
+            adapter=adapter,
+            output_dir=tmp_path,
+            previous_digests={entity.slug: digest},
+        )
+        assert summary.skipped == 1
+        assert adapter.call_count == 0  # digest unchanged, so the adapter is never called
+
+    def test_progress_callback_called(self, tmp_path):
+        adapter = MockLLMAdapter(_MOCK_RESPONSE)
+        cfg = _config()
+        calls = []
+
+        run_entity_evaluation(
+            config=cfg,
+            entities=[_entity()],
+            adapter=adapter,
+            output_dir=tmp_path,
+            progress_callback=lambda d, t, r: calls.append((d, t, r.key)),
+        )
+        assert len(calls) == 1
+        assert calls[0] == (1, 1, "division-of-labour")
+
+    def test_passes_run_config(self, tmp_path):
+        adapter = MockLLMAdapter(_MOCK_RESPONSE)
+        cfg = _config()
+        rc = RunConfig(temperature=0.1, max_tokens=500)
+
+        run_entity_evaluation(
+            config=cfg,
+            entities=[_entity()],
+            adapter=adapter,
+            run_config=rc,
+            output_dir=tmp_path,
+        )
+        assert adapter.last_config.temperature == 0.1
--- a/tests/unit/infospace/test_evaluation.py
+++ b/tests/unit/infospace/test_evaluation.py
@@ -0,0 +1,398 @@
+"""Tests for markitect.infospace evaluation models and I/O."""
+
+from datetime import datetime
+from pathlib import Path
+
+import pytest
+
+from markitect.infospace import (
+    EntityEvaluation,
+    EvaluationSnapshot,
+    MetricChange,
+    MetricValue,
+    ScoreChange,
+    ScoreEntry,
+    SnapshotDiff,
+    append_to_history,
+    diff_snapshots,
+    read_entity_evaluation,
+    read_history,
+    read_snapshot,
+    write_entity_evaluation,
+    write_snapshot,
+)
+
+
+# ── Helpers ──────────────────────────────────────────────────────────
+
+_NOW = datetime(2026, 2, 19, 12, 0, 0)
+
+
+def _sample_scores() -> list:
+    return [
+        ScoreEntry("definition_precision", 4.5, rationale="Clear and specific."),
+        ScoreEntry("source_grounding", 4.0, rationale="Well grounded."),
+        ScoreEntry("domain_relevance", 4.5),
+    ]
+
+
+def _sample_evaluation(**overrides) -> EntityEvaluation:
+    defaults = dict(
+        entity_slug="division-of-labour",
+        evaluator="openrouter/anthropic/claude-3.5-sonnet",
+        scores=_sample_scores(),
+        evaluated_at=_NOW,
+        notes=["Strong entity with clear provenance"],
+    )
+    defaults.update(overrides)
+    return EntityEvaluation(**defaults)
+
+
+def _sample_metric() -> MetricValue:
+    return MetricValue("coverage_ratio", 0.85, concern="C2", details={"checked": 85})
+
+
+def _sample_snapshot(**overrides) -> EvaluationSnapshot:
+    defaults = dict(
+        snapshot_id="2026-02-19",
+        created_at=_NOW,
+        schema_name="Economic Entity",
+        entity_count=1,
+        entity_evaluations=[_sample_evaluation()],
+        collection_metrics=[_sample_metric()],
+        metadata={"version": "1.0"},
+    )
+    defaults.update(overrides)
+    return EvaluationSnapshot(**defaults)
+
+
+# ── Model tests ──────────────────────────────────────────────────────
+
+
+class TestScoreEntry:
+    def test_to_dict_from_dict_round_trip(self):
+        se = ScoreEntry("precision", 4.5, 5.0, "Good definition.")
+        d = se.to_dict()
+        restored = ScoreEntry.from_dict(d)
+        assert restored.name == se.name
+        assert restored.value == se.value
+        assert restored.max_value == se.max_value
+        assert restored.rationale == se.rationale
+
+    def test_to_dict_omits_empty_rationale(self):
+        se = ScoreEntry("precision", 4.5)
+        d = se.to_dict()
+        assert "rationale" not in d
+
+    def test_from_dict_defaults(self):
+        se = ScoreEntry.from_dict({"name": "x", "value": 3.0})
+        assert se.max_value == 5.0
+        assert se.rationale == ""
+
+
+class TestEntityEvaluation:
+    def test_overall_score_is_mean(self):
+        ev = _sample_evaluation()
+        # (4.5 + 4.0 + 4.5) / 3 ≈ 4.333
+        assert abs(ev.overall_score - 4.333333) < 0.001
+
+    def test_overall_score_zero_scores(self):
+        ev = _sample_evaluation(scores=[])
+        assert ev.overall_score == 0.0
+
+    def test_to_dict_from_dict_round_trip(self):
+        ev = _sample_evaluation()
+        d = ev.to_dict()
+        restored = EntityEvaluation.from_dict(d)
+        assert restored.entity_slug == ev.entity_slug
+        assert restored.evaluator == ev.evaluator
+        assert len(restored.scores) == len(ev.scores)
+        assert restored.evaluated_at == ev.evaluated_at
+        assert restored.notes == ev.notes
+
+    def test_to_dict_includes_overall_score(self):
+        ev = _sample_evaluation()
+        d = ev.to_dict()
+        assert "overall_score" in d
+        assert abs(d["overall_score"] - 4.3333) < 0.01
+
+
+class TestMetricValue:
+    def test_to_dict_from_dict_round_trip(self):
+        mv = _sample_metric()
+        d = mv.to_dict()
+        restored = MetricValue.from_dict(d)
+        assert restored.name == mv.name
+        assert restored.value == mv.value
+        assert restored.concern == mv.concern
+        assert restored.details == mv.details
+
+    def test_to_dict_omits_empty_concern(self):
+        mv = MetricValue("x", 1.0)
+        d = mv.to_dict()
+        assert "concern" not in d
+        assert "details" not in d
+
+
+class TestEvaluationSnapshot:
+    def test_to_dict_from_dict_round_trip(self):
+        snap = _sample_snapshot()
+        d = snap.to_dict()
+        restored = EvaluationSnapshot.from_dict(d)
+        assert restored.snapshot_id == snap.snapshot_id
+        assert restored.created_at == snap.created_at
+        assert restored.schema_name == snap.schema_name
+        assert restored.entity_count == snap.entity_count
+        assert len(restored.entity_evaluations) == 1
+        assert len(restored.collection_metrics) == 1
+        assert restored.metadata == snap.metadata
+
+    def test_from_dict_empty_lists(self):
+        d = {
+            "snapshot_id": "test",
+            "created_at": _NOW.isoformat(),
+            "schema_name": "Test",
+            "entity_count": 0,
+        }
+        snap = EvaluationSnapshot.from_dict(d)
+        assert snap.entity_evaluations == []
+        assert snap.collection_metrics == []
+        assert snap.metadata == {}
+
+
+# ── Per-entity file I/O ──────────────────────────────────────────────
+
+
+class TestEntityEvaluationIO:
+    def test_write_creates_file(self, tmp_path):
+        ev = _sample_evaluation()
+        p = tmp_path / "eval.md"
+        write_entity_evaluation(ev, p)
+        assert p.exists()
+
+    def test_file_has_yaml_frontmatter(self, tmp_path):
+        ev = _sample_evaluation()
+        p = tmp_path / "eval.md"
+        write_entity_evaluation(ev, p)
+        text = p.read_text()
+        assert text.startswith("---\n")
+        assert "\n---\n" in text
+
+    def test_frontmatter_contains_expected_keys(self, tmp_path):
+        ev = _sample_evaluation()
+        p = tmp_path / "eval.md"
+        write_entity_evaluation(ev, p)
+        text = p.read_text()
+        for key in ["entity_slug", "evaluator", "evaluated_at", "overall_score", "scores"]:
+            assert key in text
+
+    def test_markdown_body_contains_rationales(self, tmp_path):
+        ev = _sample_evaluation()
+        p = tmp_path / "eval.md"
+        write_entity_evaluation(ev, p)
+        text = p.read_text()
+        assert "Clear and specific." in text
+        assert "Well grounded." in text
+        assert "## definition_precision" in text
+
+    def test_read_back_matches_original(self, tmp_path):
+        ev = _sample_evaluation()
+        p = tmp_path / "eval.md"
+        write_entity_evaluation(ev, p)
+        restored = read_entity_evaluation(p)
+        assert restored.entity_slug == ev.entity_slug
+        assert restored.evaluator == ev.evaluator
+        assert restored.evaluated_at == ev.evaluated_at
+        assert restored.notes == ev.notes
+        assert len(restored.scores) == len(ev.scores)
+
+    def test_round_trip_preserves_scores(self, tmp_path):
+        ev = _sample_evaluation()
+        p = tmp_path / "eval.md"
+        write_entity_evaluation(ev, p)
+        restored = read_entity_evaluation(p)
+        for orig, rest in zip(ev.scores, restored.scores):
+            assert rest.name == orig.name
+            assert rest.value == orig.value
+            assert rest.max_value == orig.max_value
+
+    def test_round_trip_preserves_rationales(self, tmp_path):
+        ev = _sample_evaluation()
+        p = tmp_path / "eval.md"
+        write_entity_evaluation(ev, p)
+        restored = read_entity_evaluation(p)
+        assert restored.scores[0].rationale == "Clear and specific."
+        assert restored.scores[1].rationale == "Well grounded."
+        # Third score has no rationale
+        assert restored.scores[2].rationale == ""
+
+    def test_write_creates_parent_dirs(self, tmp_path):
+        ev = _sample_evaluation()
+        p = tmp_path / "deep" / "nested" / "eval.md"
+        write_entity_evaluation(ev, p)
+        assert p.exists()
+
+
+# ── Snapshot I/O ─────────────────────────────────────────────────────
+
+
+class TestSnapshotIO:
+    def test_write_creates_file(self, tmp_path):
+        snap = _sample_snapshot()
+        p = tmp_path / "snapshot.yaml"
+        write_snapshot(snap, p)
+        assert p.exists()
+
+    def test_read_back_matches_original(self, tmp_path):
+        snap = _sample_snapshot()
+        p = tmp_path / "snapshot.yaml"
+        write_snapshot(snap, p)
+        restored = read_snapshot(p)
+        assert restored.snapshot_id == snap.snapshot_id
+        assert restored.created_at == snap.created_at
+        assert restored.schema_name == snap.schema_name
+        assert restored.entity_count == snap.entity_count
+
+    def test_round_trip_preserves_entity_evaluations(self, tmp_path):
+        snap = _sample_snapshot()
+        p = tmp_path / "snapshot.yaml"
+        write_snapshot(snap, p)
+        restored = read_snapshot(p)
+        assert len(restored.entity_evaluations) == 1
+        ev = restored.entity_evaluations[0]
+        assert ev.entity_slug == "division-of-labour"
+        assert len(ev.scores) == 3
+
+    def test_round_trip_preserves_collection_metrics(self, tmp_path):
+        snap = _sample_snapshot()
+        p = tmp_path / "snapshot.yaml"
+        write_snapshot(snap, p)
+        restored = read_snapshot(p)
+        assert len(restored.collection_metrics) == 1
+        m = restored.collection_metrics[0]
+        assert m.name == "coverage_ratio"
+        assert m.value == 0.85
+        assert m.concern == "C2"
+
+
+# ── History ──────────────────────────────────────────────────────────
+
+
+class TestHistory:
+    def test_append_creates_new_file(self, tmp_path):
+        snap = _sample_snapshot()
+        hp = tmp_path / "history.yaml"
+        append_to_history(snap, hp)
+        assert hp.exists()
+        history = read_history(hp)
+        assert len(history) == 1
+
+    def test_append_adds_to_existing(self, tmp_path):
+        hp = tmp_path / "history.yaml"
+        snap1 = _sample_snapshot(snapshot_id="snap-1")
+        snap2 = _sample_snapshot(snapshot_id="snap-2")
+        append_to_history(snap1, hp)
+        append_to_history(snap2, hp)
+        history = read_history(hp)
+        assert len(history) == 2
+        assert history[0].snapshot_id == "snap-1"
+        assert history[1].snapshot_id == "snap-2"
+
+    def test_multiple_appends_all_preserved(self, tmp_path):
+        hp = tmp_path / "history.yaml"
+        for i in range(5):
+            snap = _sample_snapshot(snapshot_id=f"snap-{i}")
+            append_to_history(snap, hp)
+        history = read_history(hp)
+        assert len(history) == 5
+        assert [h.snapshot_id for h in history] == [f"snap-{i}" for i in range(5)]
+
+    def test_read_history_returns_list_in_order(self, tmp_path):
+        hp = tmp_path / "history.yaml"
+        snap_a = _sample_snapshot(snapshot_id="a")
+        snap_b = _sample_snapshot(snapshot_id="b")
+        append_to_history(snap_a, hp)
+        append_to_history(snap_b, hp)
+        history = read_history(hp)
+        assert history[0].snapshot_id == "a"
+        assert history[1].snapshot_id == "b"
+
+
+# ── Diffing ──────────────────────────────────────────────────────────
+
+
+class TestDiffSnapshots:
+    def test_identical_snapshots_empty_diff(self):
+        snap = _sample_snapshot()
+        diff = diff_snapshots(snap, snap)
+        assert diff.added_entities == []
+        assert diff.removed_entities == []
+        assert diff.score_changes == []
+        assert diff.metric_changes == []
+
+    def test_added_entity(self):
+        before = _sample_snapshot(entity_evaluations=[])
+        after = _sample_snapshot()
+        diff = diff_snapshots(before, after)
+        assert "division-of-labour" in diff.added_entities
+        assert diff.removed_entities == []
+
+    def test_removed_entity(self):
+        before = _sample_snapshot()
+        after = _sample_snapshot(entity_evaluations=[])
+        diff = diff_snapshots(before, after)
+        assert "division-of-labour" in diff.removed_entities
+        assert diff.added_entities == []
+
+    def test_changed_score(self):
+        ev_before = _sample_evaluation(scores=[ScoreEntry("precision", 4.0)])
+        ev_after = _sample_evaluation(scores=[ScoreEntry("precision", 4.8)])
+        before = _sample_snapshot(entity_evaluations=[ev_before])
+        after = _sample_snapshot(entity_evaluations=[ev_after])
+        diff = diff_snapshots(before, after)
+        assert len(diff.score_changes) == 1
+        sc = diff.score_changes[0]
+        assert sc.entity_slug == "division-of-labour"
+        assert sc.dimension == "precision"
+        assert sc.before == 4.0
+        assert sc.after == 4.8
+
+    def test_changed_metric(self):
+        before = _sample_snapshot(
+            collection_metrics=[MetricValue("coverage_ratio", 0.80)]
+        )
+        after = _sample_snapshot(
+            collection_metrics=[MetricValue("coverage_ratio", 0.90)]
+        )
+        diff = diff_snapshots(before, after)
+        assert len(diff.metric_changes) == 1
+        mc = diff.metric_changes[0]
+        assert mc.name == "coverage_ratio"
+        assert mc.before == 0.80
+        assert mc.after == 0.90
+
+    def test_summary_readable(self):
+        ev_before = _sample_evaluation(scores=[ScoreEntry("precision", 4.0)])
+        ev_after = _sample_evaluation(scores=[ScoreEntry("precision", 4.8)])
+        before = _sample_snapshot(
+            snapshot_id="snap-1",
+            entity_evaluations=[ev_before],
+            collection_metrics=[MetricValue("coverage", 0.80)],
+        )
+        after = _sample_snapshot(
+            snapshot_id="snap-2",
+            entity_evaluations=[ev_after],
+            collection_metrics=[MetricValue("coverage", 0.90)],
+        )
+        diff = diff_snapshots(before, after)
+        text = diff.summary()
+        assert "snap-1" in text
+        assert "snap-2" in text
+        assert "precision" in text
+        assert "coverage" in text
+
+    def test_summary_no_changes(self):
+        snap = _sample_snapshot()
+        diff = diff_snapshots(snap, snap)
+        text = diff.summary()
+        assert "No changes" in text
--- a/tests/unit/infospace/test_history.py
+++ b/tests/unit/infospace/test_history.py
@@ -0,0 +1,258 @@
+"""
+Tests for metrics history and viability tracking (S2.5).
+"""
+
+from __future__ import annotations
+
+import json
+from datetime import datetime, timezone
+from pathlib import Path
+
+import pytest
+import yaml
+
+from markitect.infospace.checks.orchestrator import CheckReport
+from markitect.infospace.checks.granularity import GranularityReport
+from markitect.infospace.checks.redundancy import RedundancyReport
+from markitect.infospace.config import InfospaceConfig, TopicConfig, ViabilityThreshold
+from markitect.infospace.evaluation import EvaluationSnapshot, MetricValue
+from markitect.infospace.history import (
+    find_snapshot_by_date,
+    get_history,
+    get_latest_snapshot,
+    metric_trend,
+    read_metrics_file,
+    record_check_results,
+    snapshot_from_checks,
+    write_metrics_file,
+)
+
+
+# ── helpers ──────────────────────────────────────────────────────────
+
+
+def _check_report() -> CheckReport:
+    return CheckReport(
+        redundancy=RedundancyReport(redundancy_ratio=0.1, entity_count=10),
+        granularity=GranularityReport(domain_entropy=1.5, entity_count=10),
+    )
+
+
+def _config(tmp_path: Path) -> InfospaceConfig:
+    return InfospaceConfig(
+        topic=TopicConfig(name="Test Topic", domain="Testing"),
+        metrics_dir=str(tmp_path / "metrics"),
+    )
+
+
+def _snapshot(snap_id: str, date_str: str, metrics: dict) -> EvaluationSnapshot:
+    return EvaluationSnapshot(
+        snapshot_id=snap_id,
+        created_at=datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc),
+        schema_name="default",
+        entity_count=10,
+        collection_metrics=[
+            MetricValue(name=k, value=v) for k, v in metrics.items()
+        ],
+    )
+
+
+# ── snapshot_from_checks ────────────────────────────────────────────
+
+
+class TestSnapshotFromChecks:
+    def test_creates_snapshot(self):
+        report = _check_report()
+        snap = snapshot_from_checks(report, entity_count=10)
+        assert snap.entity_count == 10
+        assert snap.snapshot_id  # non-empty
+        assert snap.created_at is not None
+
+    def test_contains_metrics(self):
+        report = _check_report()
+        snap = snapshot_from_checks(report, entity_count=10)
+        metric_names = {m.name for m in snap.collection_metrics}
+        assert "redundancy_ratio" in metric_names
+        assert "granularity_entropy" in metric_names
+
+    def test_concern_labels(self):
+        report = _check_report()
+        snap = snapshot_from_checks(report, entity_count=10)
+        by_name = {m.name: m for m in snap.collection_metrics}
+        assert by_name["redundancy_ratio"].concern == "C1"
+        assert by_name["granularity_entropy"].concern == "C5"
+
+    def test_custom_schema(self):
+        report = _check_report()
+        snap = snapshot_from_checks(report, entity_count=5, schema_name="custom")
+        assert snap.schema_name == "custom"
+
+    def test_metadata(self):
+        report = _check_report()
+        snap = snapshot_from_checks(report, entity_count=5, metadata={"key": "val"})
+        assert snap.metadata == {"key": "val"}
+
+    def test_empty_report(self):
+        report = CheckReport()
+        snap = snapshot_from_checks(report, entity_count=0)
+        assert snap.collection_metrics == []
+
+
+# ── write_metrics_file / read_metrics_file ──────────────────────────
+
+
+class TestMetricsFileIO:
+    def test_round_trip(self, tmp_path):
+        path = tmp_path / "metrics.yaml"
+        metrics = {"redundancy_ratio": 0.05, "coverage_ratio": 0.85}
+        write_metrics_file(metrics, path)
+        loaded = read_metrics_file(path)
+        assert loaded["redundancy_ratio"] == pytest.approx(0.05)
+        assert loaded["coverage_ratio"] == pytest.approx(0.85)
+
+    def test_creates_parent_dirs(self, tmp_path):
+        path = tmp_path / "deep" / "nested" / "metrics.yaml"
+        write_metrics_file({"x": 1.0}, path)
+        assert path.is_file()
+
+    def test_read_missing_file(self, tmp_path):
+        path = tmp_path / "nonexistent.yaml"
+        assert read_metrics_file(path) == {}
+
+    def test_read_invalid_content(self, tmp_path):
+        path = tmp_path / "bad.yaml"
+        path.write_text("just a string", encoding="utf-8")
+        assert read_metrics_file(path) == {}
+
+
+# ── record_check_results ────────────────────────────────────────────
+
+
+class TestRecordCheckResults:
+    def test_creates_metrics_file(self, tmp_path):
+        cfg = _config(tmp_path)
+        report = _check_report()
+        record_check_results(report, cfg, tmp_path, entity_count=10)
+        metrics_path = tmp_path / cfg.metrics_dir / "metrics.yaml"
+        assert metrics_path.is_file()
+
+    def test_creates_history_file(self, tmp_path):
+        cfg = _config(tmp_path)
+        report = _check_report()
+        record_check_results(report, cfg, tmp_path, entity_count=10)
+        history_path = tmp_path / cfg.metrics_dir / "history.yaml"
+        assert history_path.is_file()
+
+    def test_appends_to_history(self, tmp_path):
+        cfg = _config(tmp_path)
+        report = _check_report()
+        record_check_results(report, cfg, tmp_path, entity_count=10)
+        record_check_results(report, cfg, tmp_path, entity_count=12)
+        history = get_history(cfg, tmp_path)
+        assert len(history) == 2
+        assert history[0].entity_count == 10
+        assert history[1].entity_count == 12
+
+    def test_returns_snapshot(self, tmp_path):
+        cfg = _config(tmp_path)
+        report = _check_report()
+        snap = record_check_results(report, cfg, tmp_path, entity_count=10)
+        assert snap.snapshot_id
+        assert snap.entity_count == 10
+
+
+# ── get_history / get_latest_snapshot ────────────────────────────────
+
+
+class TestGetHistory:
+    def test_empty_history(self, tmp_path):
+        cfg = _config(tmp_path)
+        assert get_history(cfg, tmp_path) == []
+
+    def test_get_latest(self, tmp_path):
+        cfg = _config(tmp_path)
+        report = _check_report()
+        record_check_results(report, cfg, tmp_path, entity_count=5)
+        record_check_results(report, cfg, tmp_path, entity_count=10)
+        latest = get_latest_snapshot(cfg, tmp_path)
+        assert latest is not None
+        assert latest.entity_count == 10
+
+    def test_latest_none_when_empty(self, tmp_path):
+        cfg = _config(tmp_path)
+        assert get_latest_snapshot(cfg, tmp_path) is None
+
+
+# ── find_snapshot_by_date ────────────────────────────────────────────
+
+
+class TestFindSnapshotByDate:
+    def test_finds_closest(self):
+        history = [
+            _snapshot("a", "2026-01-01T10:00:00", {"x": 1.0}),
+            _snapshot("b", "2026-02-15T10:00:00", {"x": 2.0}),
+            _snapshot("c", "2026-03-01T10:00:00", {"x": 3.0}),
+        ]
+        result = find_snapshot_by_date(history, "2026-02-14")
+        assert result is not None
+        assert result.snapshot_id == "b"
+
+    def test_exact_match(self):
+        history = [
+            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0}),
+            _snapshot("b", "2026-02-01T00:00:00", {"x": 2.0}),
+        ]
+        result = find_snapshot_by_date(history, "2026-02-01")
+        assert result is not None
+        assert result.snapshot_id == "b"
+
+    def test_empty_history(self):
+        assert find_snapshot_by_date([], "2026-01-01") is None
+
+    def test_invalid_date(self):
+        history = [_snapshot("a", "2026-01-01T00:00:00", {"x": 1.0})]
+        assert find_snapshot_by_date(history, "not-a-date") is None
+
+    def test_with_timestamp(self):
+        history = [
+            _snapshot("a", "2026-01-01T10:00:00", {"x": 1.0}),
+            _snapshot("b", "2026-01-01T14:00:00", {"x": 2.0}),
+        ]
+        result = find_snapshot_by_date(history, "2026-01-01T13:00:00")
+        assert result is not None
+        assert result.snapshot_id == "b"
+
+
+# ── metric_trend ─────────────────────────────────────────────────────
+
+
+class TestMetricTrend:
+    def test_extracts_trend(self):
+        history = [
+            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0, "y": 2.0}),
+            _snapshot("b", "2026-02-01T00:00:00", {"x": 1.5, "y": 2.5}),
+        ]
+        trend = metric_trend(history, "x")
+        assert len(trend) == 2
+        assert trend[0]["value"] == 1.0
+        assert trend[1]["value"] == 1.5
+
+    def test_missing_metric(self):
+        history = [
+            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0}),
+        ]
+        assert metric_trend(history, "nonexistent") == []
+
+    def test_empty_history(self):
+        assert metric_trend([], "x") == []
+
+    def test_partial_presence(self):
+        history = [
+            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0}),
+            _snapshot("b", "2026-02-01T00:00:00", {"y": 2.0}),  # x missing
+            _snapshot("c", "2026-03-01T00:00:00", {"x": 3.0}),
+        ]
+        trend = metric_trend(history, "x")
+        assert len(trend) == 2
+        assert trend[0]["value"] == 1.0
+        assert trend[1]["value"] == 3.0
--- a/tests/unit/infospace/test_schema_validator.py
+++ b/tests/unit/infospace/test_schema_validator.py
@@ -0,0 +1,419 @@
+"""Tests for markitect.infospace schema and validator modules."""
+
+import pytest
+
+from markitect.infospace import (
+    ECONOMIC_ENTITY_SCHEMA,
+    BatchComplianceResult,
+    ComplianceDiagnostic,
+    ComplianceResult,
+    EntityMeta,
+    EntitySchema,
+    EnumConstraint,
+    SectionRequirement,
+    SectionRule,
+    validate_entities,
+    validate_entity,
+)
+
+
+# ── Helpers ──────────────────────────────────────────────────────────
+
+def _compliant_entity(**overrides) -> EntityMeta:
+    """Return an EntityMeta that passes ECONOMIC_ENTITY_SCHEMA."""
+    defaults = dict(
+        slug="division_of_labour",
+        title="Division of Labour",
+        h1_raw="Division of Labour",
+        definition=(
+            "The separation of a work process into a number of distinct "
+            "tasks, each performed by a specialised worker, resulting in "
+            "a significant increase in the productive powers of labour."
+        ),
+        source_chapter='Book I, Chapter 1: "Of the Division of Labour"',
+        context="The division of labour is the central argument of the chapter.",
+        domain="Production",
+        original_wording='"The greatest improvements in the productive powers…"',
+        modern_interpretation="Remains foundational in economics.",
+        h1_is_title_case=True,
+        has_original_wording=True,
+        definition_word_count=29,
+        total_word_count=100,
+        section_slugs=[
+            "definition",
+            "source_chapter",
+            "context",
+            "economic_domain",
+            "smith_s_original_wording",
+            "modern_interpretation",
+        ],
+        source_path="/tmp/division-of-labour.md",
+    )
+    defaults.update(overrides)
+    return EntityMeta(**defaults)
+
+
+# ── Single-entity validation ────────────────────────────────────────
+
+class TestValidateEntityCompliant:
+    def test_fully_compliant_zero_diagnostics(self):
+        entity = _compliant_entity()
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        assert result.diagnostics == []
+        assert result.is_compliant is True
+        assert result.error_count == 0
+        assert result.warning_count == 0
+        assert result.checks_run > 0
+
+    def test_summary_shows_pass(self):
+        entity = _compliant_entity()
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        assert "PASS" in result.summary()
+        assert "division_of_labour" in result.summary()
+
+
+class TestSectionMissing:
+    def test_missing_required_section_error(self):
+        entity = _compliant_entity(definition="")
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        codes = [d.code for d in result.diagnostics]
+        assert "SECTION_MISSING" in codes
+        assert not result.is_compliant
+
+    def test_empty_required_section_error(self):
+        entity = _compliant_entity(definition="   ")
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        codes = [d.code for d in result.diagnostics]
+        assert "SECTION_MISSING" in codes
+
+    def test_optional_section_absent_no_diagnostic(self):
+        entity = _compliant_entity(original_wording="", modern_interpretation="")
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        # Only optional sections removed — should still be fully compliant
+        assert result.is_compliant is True
+        assert result.error_count == 0
+        # No SECTION_MISSING or SECTION_RECOMMENDED for optional sections
+        section_codes = {d.code for d in result.diagnostics}
+        assert "SECTION_MISSING" not in section_codes
+        assert "SECTION_RECOMMENDED" not in section_codes
+
+
+class TestSectionRecommended:
+    def test_recommended_section_missing_warning(self):
+        schema = EntitySchema(
+            name="Test Schema",
+            section_rules=(
+                SectionRule(
+                    slug="definition",
+                    label="Definition",
+                    requirement=SectionRequirement.RECOMMENDED,
+                ),
+            ),
+        )
+        entity = _compliant_entity(definition="")
+        result = validate_entity(entity, schema)
+        codes = [d.code for d in result.diagnostics]
+        assert "SECTION_RECOMMENDED" in codes
+        severities = [d.severity for d in result.diagnostics if d.code == "SECTION_RECOMMENDED"]
+        assert severities == ["warning"]
+        # Warnings don't break compliance
+        assert result.is_compliant is True
+
+
+class TestWordCountBounds:
+    def test_definition_too_short_error(self):
+        entity = _compliant_entity(definition="only ten words here to test the lower boundary check")
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        short_diags = [d for d in result.diagnostics if d.code == "SECTION_TOO_SHORT"]
+        assert len(short_diags) == 1
+        assert short_diags[0].severity == "error"
+        assert not result.is_compliant
+
+    def test_definition_too_long_warning(self):
+        long_def = " ".join(["word"] * 200)
+        entity = _compliant_entity(definition=long_def)
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        long_diags = [d for d in result.diagnostics if d.code == "SECTION_TOO_LONG"]
+        assert len(long_diags) == 1
+        assert long_diags[0].severity == "warning"
+        # Warnings don't break compliance
+        assert result.is_compliant is True
+
+    def test_definition_at_min_boundary_passes(self):
+        exactly_20 = " ".join(["word"] * 20)
+        entity = _compliant_entity(definition=exactly_20)
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        codes = [d.code for d in result.diagnostics]
+        assert "SECTION_TOO_SHORT" not in codes
+
+    def test_definition_at_max_boundary_passes(self):
+        exactly_150 = " ".join(["word"] * 150)
+        entity = _compliant_entity(definition=exactly_150)
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        codes = [d.code for d in result.diagnostics]
+        assert "SECTION_TOO_LONG" not in codes
+
+
+class TestH1Checks:
+    def test_slug_format_h1_warning(self):
+        entity = _compliant_entity(
+            h1_raw="effectual-demand",
+            h1_is_title_case=False,
+        )
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        h1_diags = [d for d in result.diagnostics if d.code == "H1_NOT_TITLE_CASE"]
+        assert len(h1_diags) == 1
+        assert h1_diags[0].severity == "warning"
+        # Still compliant (it's a warning)
+        assert result.is_compliant is True
+
+    def test_h1_missing_error(self):
+        entity = _compliant_entity(slug="", h1_raw="")
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        codes = [d.code for d in result.diagnostics]
+        assert "H1_MISSING" in codes
+        assert not result.is_compliant
+
+    def test_h1_title_case_error_severity(self):
+        schema = EntitySchema(
+            name="Strict",
+            section_rules=(),
+            h1_title_case_severity="error",
+        )
+        entity = _compliant_entity(h1_is_title_case=False)
+        result = validate_entity(entity, schema)
+        h1_diags = [d for d in result.diagnostics if d.code == "H1_NOT_TITLE_CASE"]
+        assert h1_diags[0].severity == "error"
+        assert not result.is_compliant
+
+
+class TestEnumConstraints:
+    def test_unknown_domain_warning(self):
+        entity = _compliant_entity(domain="Metaphysics")
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        enum_diags = [d for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
+        assert len(enum_diags) == 1
+        assert enum_diags[0].severity == "warning"
+        assert result.is_compliant is True
+
+    def test_empty_domain_no_enum_diagnostic(self):
+        """Empty domain triggers SECTION_MISSING, not ENUM_VALUE_UNKNOWN."""
+        entity = _compliant_entity(domain="")
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        enum_codes = [d.code for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
+        assert len(enum_codes) == 0
+        # But SECTION_MISSING is raised for the required section
+        missing_codes = [d.code for d in result.diagnostics if d.code == "SECTION_MISSING"]
+        assert len(missing_codes) >= 1
+
+    def test_valid_domain_no_diagnostic(self):
+        for domain in ("Production", "Exchange", "Distribution", "Regulation", "General Theory"):
+            entity = _compliant_entity(domain=domain)
+            result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+            enum_diags = [d for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
+            assert len(enum_diags) == 0, f"Unexpected enum diagnostic for domain '{domain}'"
+
+
+class TestMultipleIssues:
+    def test_multiple_issues_on_one_entity(self):
+        entity = _compliant_entity(
+            definition="too short",
+            domain="UnknownDomain",
+            h1_is_title_case=False,
+        )
+        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
+        codes = {d.code for d in result.diagnostics}
+        assert "SECTION_TOO_SHORT" in codes
+        assert "ENUM_VALUE_UNKNOWN" in codes
+        assert "H1_NOT_TITLE_CASE" in codes
+        assert len(result.diagnostics) >= 3
+
+
+class TestCustomSchema:
+    def test_custom_schema_different_rules(self):
+        schema = EntitySchema(
+            name="Custom",
+            section_rules=(
+                SectionRule(
+                    slug="definition",
+                    label="Definition",
+                    requirement=SectionRequirement.REQUIRED,
+                    min_words=5,
+                    max_words=50,
+                ),
+            ),
+            enum_constraints=(
+                EnumConstraint(
+                    field_name="domain",
+                    allowed_values=("Alpha", "Beta"),
+                    severity="error",
+                ),
+            ),
+            h1_title_case_severity="error",
+            require_h1=False,
+        )
+        entity = _compliant_entity(
+            definition="just five words here exactly",
+            domain="Alpha",
+        )
+        result = validate_entity(entity, schema)
+        assert result.is_compliant is True
+        assert result.schema_name == "Custom"
+
+    def test_custom_enum_error_severity(self):
+        schema = EntitySchema(
+            name="Strict Enum",
+            section_rules=(),
+            enum_constraints=(
+                EnumConstraint(
+                    field_name="domain",
+                    allowed_values=("A",),
+                    severity="error",
+                ),
+            ),
+        )
+        entity = _compliant_entity(domain="B")
+        result = validate_entity(entity, schema)
+        assert not result.is_compliant
+        enum_diags = [d for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
+        assert enum_diags[0].severity == "error"
+
+
+# ── Batch validation ────────────────────────────────────────────────
+
+class TestBatchValidation:
+    def test_empty_list(self):
+        result = validate_entities([], ECONOMIC_ENTITY_SCHEMA)
+        assert result.total_entities == 0
+        assert result.compliant_count == 0
+        assert result.total_errors == 0
+        assert result.total_warnings == 0
+
+    def test_mixed_compliance(self):
+        good = _compliant_entity()
+        bad = _compliant_entity(slug="bad", definition="")
+        result = validate_entities([good, bad], ECONOMIC_ENTITY_SCHEMA)
+        assert result.total_entities == 2
+        assert result.compliant_count == 1
+        assert result.non_compliant_count == 1
+        assert result.total_errors >= 1
+
+    def test_summary_format(self):
+        good = _compliant_entity()
+        bad = _compliant_entity(slug="bad_entity", definition="too short")
+        result = validate_entities([good, bad], ECONOMIC_ENTITY_SCHEMA)
+        summary = result.summary()
+        assert "Schema: Economic Entity" in summary
+        assert "Entities: 2" in summary
+        assert "Compliant: 1/2" in summary
+        assert "division_of_labour" in summary
+        assert "bad_entity" in summary
+
+    def test_aggregate_counts(self):
+        entities = [
+            _compliant_entity(slug="e1"),
+            _compliant_entity(slug="e2", definition="short"),
+            _compliant_entity(slug="e3", domain="Unknown", h1_is_title_case=False),
+        ]
+        result = validate_entities(entities, ECONOMIC_ENTITY_SCHEMA)
+        assert result.total_entities == 3
+        assert result.total_errors == sum(r.error_count for r in result.results)
+        assert result.total_warnings == sum(r.warning_count for r in result.results)
+
+    def test_schema_name_propagated(self):
+        result = validate_entities([], ECONOMIC_ENTITY_SCHEMA)
+        assert result.schema_name == "Economic Entity"
+
+
+# ── Default schema checks ──────────────────────────────────────────
+
+class TestDefaultSchema:
+    def test_correct_section_count(self):
+        assert len(ECONOMIC_ENTITY_SCHEMA.section_rules) == 6
+
+    def test_required_sections(self):
+        required = [
+            r.slug for r in ECONOMIC_ENTITY_SCHEMA.section_rules
+            if r.requirement == SectionRequirement.REQUIRED
+        ]
+        assert set(required) == {"definition", "source_chapter", "context", "economic_domain"}
+
+    def test_optional_sections(self):
+        optional = [
+            r.slug for r in ECONOMIC_ENTITY_SCHEMA.section_rules
+            if r.requirement == SectionRequirement.OPTIONAL
+        ]
+        assert set(optional) == {"smith_s_original_wording", "modern_interpretation"}
+
+    def test_domain_enum_values(self):
+        domain_constraint = ECONOMIC_ENTITY_SCHEMA.enum_constraints[0]
+        assert domain_constraint.field_name == "domain"
+        assert set(domain_constraint.allowed_values) == {
+            "Production", "Exchange", "Distribution", "Regulation", "General Theory",
+        }
+
+    def test_schema_is_frozen(self):
+        with pytest.raises(AttributeError):
+            ECONOMIC_ENTITY_SCHEMA.name = "Changed"
+
+    def test_section_rule_is_frozen(self):
+        rule = ECONOMIC_ENTITY_SCHEMA.section_rules[0]
+        with pytest.raises(AttributeError):
+            rule.slug = "changed"
+
+    def test_enum_constraint_is_frozen(self):
+        constraint = ECONOMIC_ENTITY_SCHEMA.enum_constraints[0]
+        with pytest.raises(AttributeError):
+            constraint.field_name = "changed"
+
+
+# ── ComplianceDiagnostic __str__ ────────────────────────────────────
+
+class TestDiagnosticStr:
+    def test_basic_str(self):
+        d = ComplianceDiagnostic(code="TEST", message="test msg", severity="error")
+        assert "[ERROR] TEST: test msg" in str(d)
+
+    def test_str_with_section(self):
+        d = ComplianceDiagnostic(
+            code="SECTION_MISSING",
+            message="Missing.",
+            severity="error",
+            section="definition",
+        )
+        s = str(d)
+        assert "(section: definition)" in s
+
+    def test_str_with_field(self):
+        d = ComplianceDiagnostic(
+            code="ENUM_VALUE_UNKNOWN",
+            message="Unknown.",
+            severity="warning",
+            field="domain",
+        )
+        s = str(d)
+        assert "(field: domain)" in s
+
+
+# ── ComplianceResult properties ─────────────────────────────────────
+
+class TestComplianceResultProperties:
+    def test_errors_property(self):
+        result = ComplianceResult(entity_slug="test", schema_name="Test")
+        result.diagnostics = [
+            ComplianceDiagnostic(code="A", message="a", severity="error"),
+            ComplianceDiagnostic(code="B", message="b", severity="warning"),
+            ComplianceDiagnostic(code="C", message="c", severity="error"),
+        ]
+        assert len(result.errors) == 2
+        assert len(result.warnings) == 1
+        assert result.error_count == 2
+        assert result.warning_count == 1
+        assert not result.is_compliant
+
+    def test_summary_fail(self):
+        result = ComplianceResult(entity_slug="test", schema_name="Test", checks_run=5)
+        result.diagnostics = [
+            ComplianceDiagnostic(code="A", message="a", severity="error"),
+        ]
+        assert "FAIL" in result.summary()
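Reviewer sketch: the `TestDiagnosticStr` and frozen-schema tests above pin down only a few observable behaviours (an upper-cased severity tag, optional `(section: ...)` and `(field: ...)` suffixes, and `AttributeError` on assignment). The class below is a hypothetical stand-in that satisfies those assertions. The name `DiagnosticSketch` and the exact spacing of the string are assumptions, not the actual `ComplianceDiagnostic` implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiagnosticSketch:
    """Hypothetical stand-in for a compliance diagnostic (not the real class)."""

    code: str
    message: str
    severity: str
    section: Optional[str] = None
    field: Optional[str] = None

    def __str__(self) -> str:
        # Severity is upper-cased so "error" renders as "[ERROR]".
        text = f"[{self.severity.upper()}] {self.code}: {self.message}"
        # Location hints are appended only when they are set.
        if self.section:
            text += f" (section: {self.section})"
        if self.field:
            text += f" (field: {self.field})"
        return text
```

A frozen dataclass raises `dataclasses.FrozenInstanceError` on assignment, which is a subclass of `AttributeError`, so a `pytest.raises(AttributeError)` check like the ones in `TestDefaultSchema` would pass against it.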
--- a/tests/unit/llm/test_embeddings.py
+++ b/tests/unit/llm/test_embeddings.py
@@ -0,0 +1,235 @@
+"""Tests for embedding adapter, cache, similarity, and factory."""
+
+from pathlib import Path
+from unittest import mock
+
+import pytest
+
+from markitect.llm.similarity import (
+    cosine_similarity,
+    similarity_matrix,
+    find_similar_pairs,
+)
+from markitect.llm.embedding_cache import EmbeddingCache
+from markitect.llm.embedding_openai import OpenAICompatibleEmbeddingAdapter
+from markitect.llm.embedding_factory import create_embedding_adapter
+from markitect.llm.exceptions import LLMConfigurationError, LLMRateLimitError
+
+
+# ── Similarity math ─────────────────────────────────────────────────
+
+
+class TestCosineSimilarity:
+    def test_identical_vectors(self):
+        v = [1.0, 2.0, 3.0]
+        assert cosine_similarity(v, v) == pytest.approx(1.0)
+
+    def test_orthogonal_vectors(self):
+        a = [1.0, 0.0, 0.0]
+        b = [0.0, 1.0, 0.0]
+        assert cosine_similarity(a, b) == pytest.approx(0.0)
+
+    def test_opposite_vectors(self):
+        a = [1.0, 0.0]
+        b = [-1.0, 0.0]
+        assert cosine_similarity(a, b) == pytest.approx(-1.0)
+
+    def test_zero_vector(self):
+        assert cosine_similarity([0.0, 0.0], [1.0, 2.0]) == 0.0
+
+
+class TestSimilarityMatrix:
+    def test_diagonal_is_one(self):
+        vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
+        mat = similarity_matrix(vecs)
+        for i in range(len(vecs)):
+            assert mat[i][i] == pytest.approx(1.0)
+
+    def test_symmetric(self):
+        vecs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
+        mat = similarity_matrix(vecs)
+        n = len(vecs)
+        for i in range(n):
+            for j in range(n):
+                assert mat[i][j] == pytest.approx(mat[j][i])
+
+
+class TestFindSimilarPairs:
+    def test_threshold_filters(self):
+        emb = {
+            "a": [1.0, 0.0],
+            "b": [0.0, 1.0],
+            "c": [1.0, 0.01],  # very similar to "a"
+        }
+        pairs = find_similar_pairs(emb, threshold=0.90)
+        slugs_in_pairs = {(s1, s2) for s1, s2, _ in pairs}
+        assert ("a", "c") in slugs_in_pairs
+        # a-b are orthogonal, should not appear
+        assert ("a", "b") not in slugs_in_pairs
+
+    def test_sorted_descending(self):
+        emb = {
+            "x": [1.0, 0.0, 0.0],
+            "y": [0.9, 0.1, 0.0],
+            "z": [0.95, 0.05, 0.0],
+        }
+        pairs = find_similar_pairs(emb, threshold=0.0)
+        sims = [s for _, _, s in pairs]
+        assert sims == sorted(sims, reverse=True)
+
+    def test_empty_embeddings(self):
+        assert find_similar_pairs({}) == []
+
+    def test_single_embedding(self):
+        assert find_similar_pairs({"only": [1.0, 0.0]}) == []
+
+
+# ── Embedding cache ─────────────────────────────────────────────────
+
+
+class TestEmbeddingCache:
+    def test_put_get_roundtrip(self, tmp_path: Path):
+        cache = EmbeddingCache(tmp_path)
+        cache.put("division-of-labour", "abc123", [0.1, 0.2, 0.3])
+        assert cache.get("division-of-labour", "abc123") == [0.1, 0.2, 0.3]
+
+    def test_wrong_digest_returns_none(self, tmp_path: Path):
+        cache = EmbeddingCache(tmp_path)
+        cache.put("slug", "digest-v1", [1.0])
+        assert cache.get("slug", "digest-v2") is None
+
+    def test_missing_slug_returns_none(self, tmp_path: Path):
+        cache = EmbeddingCache(tmp_path)
+        assert cache.get("nonexistent", "any") is None
+
+    def test_save_load_persists(self, tmp_path: Path):
+        cache = EmbeddingCache(tmp_path)
+        cache.put("slug-a", "d1", [0.5, 0.6])
+        cache.save()
+
+        cache2 = EmbeddingCache(tmp_path)
+        assert cache2.get("slug-a", "d1") == [0.5, 0.6]
+
+    def test_stats_tracks_hits_and_misses(self, tmp_path: Path):
+        cache = EmbeddingCache(tmp_path)
+        cache.put("s", "d", [1.0])
+        cache.get("s", "d")       # hit
+        cache.get("s", "wrong")   # miss
+        cache.get("missing", "x") # miss
+        s = cache.stats()
+        assert s["entries"] == 1
+        assert s["hits"] == 1
+        assert s["misses"] == 2
+
+
+# ── Adapter (mocked HTTP) ──────────────────────────────────────────
+
+
+def _make_embedding_response(vectors):
+    """Build a mock API response for the /embeddings endpoint."""
+    return {
+        "data": [
+            {"embedding": vec, "index": i}
+            for i, vec in enumerate(vectors)
+        ],
+        "usage": {"prompt_tokens": 5, "total_tokens": 5},
+    }
+
+
+class TestOpenAICompatibleEmbeddingAdapter:
+    def _adapter(self, **kwargs):
+        defaults = {"api_key": "sk-test", "provider": "openai"}
+        defaults.update(kwargs)
+        return OpenAICompatibleEmbeddingAdapter(**defaults)
+
+    @mock.patch("markitect.llm.embedding_openai.post_json")
+    def test_embed_returns_vectors_in_order(self, mock_post):
+        # Return indices out of order to verify sorting
+        mock_post.return_value = {
+            "data": [
+                {"embedding": [0.2, 0.3], "index": 1},
+                {"embedding": [0.1, 0.2], "index": 0},
+            ],
+            "usage": {},
+        }
+        adapter = self._adapter()
+        result = adapter.embed(["text1", "text2"])
+        assert result == [[0.1, 0.2], [0.2, 0.3]]
+
+    @mock.patch("markitect.llm.embedding_openai.post_json")
+    def test_embed_payload_structure(self, mock_post):
+        mock_post.return_value = _make_embedding_response([[0.1]])
+        adapter = self._adapter(model="text-embedding-3-large")
+        adapter.embed(["hello"])
+
+        call_args = mock_post.call_args
+        url = call_args[0][0]
+        payload = call_args[0][1]
+        assert url == "https://api.openai.com/v1/embeddings"
+        assert payload["model"] == "text-embedding-3-large"
+        assert payload["input"] == ["hello"]
+
+    def test_embed_raises_without_api_key(self):
+        adapter = OpenAICompatibleEmbeddingAdapter(api_key=None, provider="openai")
+        adapter._api_key = None
+        with pytest.raises(LLMConfigurationError):
+            adapter.embed(["test"])
+
+    def test_validate_true_with_key(self):
+        adapter = self._adapter()
+        assert adapter.validate() is True
+
+    def test_validate_false_without_key(self):
+        adapter = OpenAICompatibleEmbeddingAdapter(api_key=None, provider="openai")
+        adapter._api_key = None
+        assert adapter.validate() is False
+
+    @mock.patch("markitect.llm.embedding_openai.post_json")
+    @mock.patch("markitect.llm.embedding_openai.time.sleep")
+    def test_retry_on_429(self, mock_sleep, mock_post):
+        mock_post.side_effect = [
+            LLMRateLimitError("rate limited", status_code=429),
+            _make_embedding_response([[0.1, 0.2]]),
+        ]
+        adapter = self._adapter(max_retries=2)
+        result = adapter.embed(["test"])
+        assert result == [[0.1, 0.2]]
+        assert mock_sleep.call_count == 1
+
+    def test_openai_provider_base_url(self):
+        adapter = self._adapter(provider="openai")
+        assert adapter._api_base == "https://api.openai.com/v1"
+
+    def test_openrouter_provider_base_url(self):
+        adapter = self._adapter(provider="openrouter")
+        assert adapter._api_base == "https://openrouter.ai/api/v1"
+
+    def test_unknown_provider_raises(self):
+        with pytest.raises(LLMConfigurationError):
+            OpenAICompatibleEmbeddingAdapter(api_key="sk-test", provider="unknown")
+
+
+# ── Factory ─────────────────────────────────────────────────────────
+
+
+class TestCreateEmbeddingAdapter:
+    def test_openai_provider(self):
+        adapter = create_embedding_adapter("openai", api_key="sk-test")
+        assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
+        assert adapter._provider == "openai"
+
+    def test_openrouter_provider(self):
+        adapter = create_embedding_adapter("openrouter", api_key="sk-test")
+        assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
+        assert adapter._provider == "openrouter"
+
+    def test_unknown_provider_raises(self):
+        with pytest.raises(LLMConfigurationError) as exc_info:
+            create_embedding_adapter("unknown")
+        assert "unknown" in str(exc_info.value)
+
+    def test_model_passed_through(self):
+        adapter = create_embedding_adapter(
+            "openai", model="text-embedding-3-large", api_key="sk-test"
+        )
+        assert adapter._model == "text-embedding-3-large"
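Reviewer sketch: the similarity tests above imply a pure-Python cosine similarity that returns 0.0 for a zero vector, and a pair finder that yields `(slug1, slug2, similarity)` tuples sorted by descending similarity. The functions below are a minimal illustration of that contract. They are not the code in `markitect.llm.similarity`; the default threshold of 0.85 and the alphabetical ordering of slugs within a pair are assumptions.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; report 0.0 instead of dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_pairs(embeddings, threshold=0.85):
    """Return (slug1, slug2, similarity) for pairs at or above the threshold.

    The default threshold is an assumed value chosen for illustration.
    """
    pairs = []
    # Sorting the items makes the slug order within each pair deterministic.
    for (s1, v1), (s2, v2) in combinations(sorted(embeddings.items()), 2):
        sim = cosine_similarity(v1, v2)
        if sim >= threshold:
            pairs.append((s1, s2, sim))
    # Most similar pairs first.
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs
```

With fewer than two embeddings `combinations` yields nothing, so the empty and single-entry cases return an empty list without special handling.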
--- a/tests/unit/prompts/test_batch_evaluator.py
+++ b/tests/unit/prompts/test_batch_evaluator.py
@@ -0,0 +1,281 @@
+"""Tests for markitect.prompts.execution.batch."""
+
+import pytest
+
+from markitect.prompts.execution.batch import (
+    BatchEvaluator,
+    BatchItem,
+    BatchResult,
+    BatchSummary,
+)
+from markitect.prompts.execution.llm_adapter import MockLLMAdapter, ErrorLLMAdapter
+from markitect.prompts.execution.models import RunConfig
+
+
+# ── Helpers ──────────────────────────────────────────────────────────
+
+
+def _items(n=3, digest_prefix="d"):
+    return [
+        BatchItem(
+            key=f"entity-{i}",
+            prompt=f"Evaluate entity {i}",
+            content_digest=f"{digest_prefix}{i}",
+            metadata={"index": i},
+        )
+        for i in range(n)
+    ]
+
+
+# ── BatchItem / BatchResult / BatchSummary ───────────────────────────
+
+
+class TestBatchModels:
+    def test_batch_item_defaults(self):
+        item = BatchItem(key="slug", prompt="text")
+        assert item.content_digest == ""
+        assert item.metadata == {}
+
+    def test_batch_result_defaults(self):
+        result = BatchResult(key="slug", status="success")
+        assert result.response is None
+        assert result.error is None
+
+    def test_summary_total_tokens(self):
+        s = BatchSummary(total_prompt_tokens=100, total_completion_tokens=50)
+        assert s.total_tokens == 150
+
+    def test_summary_success_rate_all_success(self):
+        s = BatchSummary(total=3, succeeded=3)
+        assert s.success_rate() == 1.0
+
+    def test_summary_success_rate_with_failures(self):
+        s = BatchSummary(total=4, succeeded=2, failed=2)
+        assert s.success_rate() == pytest.approx(0.5)
+
+    def test_summary_success_rate_all_skipped(self):
+        s = BatchSummary(total=3, skipped=3)
+        assert s.success_rate() == 1.0
+
+    def test_summary_success_rate_mixed(self):
+        s = BatchSummary(total=5, succeeded=2, failed=1, skipped=2)
+        # 3 attempted, 2 succeeded
+        assert s.success_rate() == pytest.approx(2 / 3)
+
+
+# ── BatchEvaluator ──────────────────────────────────────────────────
+
+
+class TestBatchEvaluator:
+    def test_evaluate_all_items(self):
+        adapter = MockLLMAdapter("result")
+        evaluator = BatchEvaluator(adapter)
+        summary = evaluator.evaluate(_items(3))
+
+        assert summary.total == 3
+        assert summary.succeeded == 3
+        assert summary.failed == 0
+        assert summary.skipped == 0
+        assert len(summary.results) == 3
+        assert adapter.call_count == 3
+
+    def test_results_preserve_keys(self):
+        adapter = MockLLMAdapter("ok")
+        evaluator = BatchEvaluator(adapter)
+        items = _items(2)
+        summary = evaluator.evaluate(items)
+
+        keys = [r.key for r in summary.results]
+        assert keys == ["entity-0", "entity-1"]
+
+    def test_results_preserve_metadata(self):
+        adapter = MockLLMAdapter("ok")
+        evaluator = BatchEvaluator(adapter)
+        items = _items(1)
+        summary = evaluator.evaluate(items)
+        assert summary.results[0].metadata == {"index": 0}
+
+    def test_response_content_available(self):
+        adapter = MockLLMAdapter("evaluated text")
+        evaluator = BatchEvaluator(adapter)
+        summary = evaluator.evaluate(_items(1))
+        assert summary.results[0].response.content == "evaluated text"
+
+    def test_token_usage_aggregated(self):
+        adapter = MockLLMAdapter("result")
+        evaluator = BatchEvaluator(adapter)
+        summary = evaluator.evaluate(_items(3))
+        assert summary.total_prompt_tokens > 0
+        assert summary.total_completion_tokens > 0
+        assert summary.total_tokens == summary.total_prompt_tokens + summary.total_completion_tokens
+
+    def test_config_passed_to_adapter(self):
+        adapter = MockLLMAdapter("ok")
+        config = RunConfig(temperature=0.1, max_tokens=500)
+        evaluator = BatchEvaluator(adapter, config=config)
+        evaluator.evaluate(_items(1))
+        assert adapter.last_config.temperature == 0.1
+        assert adapter.last_config.max_tokens == 500
+
+
+# ── Incremental evaluation ──────────────────────────────────────────
+
+
+class TestIncrementalEvaluation:
+    def test_skip_unchanged_items(self):
+        adapter = MockLLMAdapter("result")
+        previous = {"entity-0": "d0", "entity-1": "d1", "entity-2": "d2"}
+        evaluator = BatchEvaluator(adapter, previous_digests=previous)
+
+        summary = evaluator.evaluate(_items(3))
+        assert summary.skipped == 3
+        assert summary.succeeded == 0
+        assert adapter.call_count == 0
+
+    def test_evaluate_changed_items(self):
+        adapter = MockLLMAdapter("result")
+        # Only entity-0 has matching digest
+        previous = {"entity-0": "d0"}
+        evaluator = BatchEvaluator(adapter, previous_digests=previous)
+
+        summary = evaluator.evaluate(_items(3))
+        assert summary.skipped == 1
+        assert summary.succeeded == 2
+        assert adapter.call_count == 2
+
+    def test_evaluate_new_items(self):
+        adapter = MockLLMAdapter("result")
+        # Previous has different keys
+        previous = {"old-entity": "old-digest"}
+        evaluator = BatchEvaluator(adapter, previous_digests=previous)
+
+        summary = evaluator.evaluate(_items(2))
+        assert summary.skipped == 0
+        assert summary.succeeded == 2
+
+    def test_changed_digest_not_skipped(self):
+        adapter = MockLLMAdapter("result")
+        # Same key but different digest
+        previous = {"entity-0": "old-digest"}
+        evaluator = BatchEvaluator(adapter, previous_digests=previous)
+
+        summary = evaluator.evaluate(_items(1))
+        assert summary.skipped == 0
+        assert summary.succeeded == 1
+
+    def test_empty_digest_not_skipped(self):
+        adapter = MockLLMAdapter("result")
+        previous = {"entity-0": "d0"}
+        evaluator = BatchEvaluator(adapter, previous_digests=previous)
+
+        item = BatchItem(key="entity-0", prompt="eval", content_digest="")
+        summary = evaluator.evaluate([item])
+        assert summary.skipped == 0
+        assert summary.succeeded == 1
+
+    def test_skipped_status_in_result(self):
+        adapter = MockLLMAdapter("result")
+        previous = {"entity-0": "d0"}
+        evaluator = BatchEvaluator(adapter, previous_digests=previous)
+
+        summary = evaluator.evaluate(_items(1))
+        assert summary.results[0].status == "skipped"
+        assert summary.results[0].response is None
+
+
+# ── Error handling ──────────────────────────────────────────────────
+
+
+class TestBatchErrorHandling:
+    def test_error_captured_not_raised(self):
+        adapter = ErrorLLMAdapter("kaboom")
+        evaluator = BatchEvaluator(adapter)
+
+        summary = evaluator.evaluate(_items(2))
+        assert summary.failed == 2
+        assert summary.succeeded == 0
+
+    def test_error_message_in_result(self):
+        adapter = ErrorLLMAdapter("something went wrong")
+        evaluator = BatchEvaluator(adapter)
+
+        summary = evaluator.evaluate(_items(1))
+        assert summary.results[0].status == "error"
+        assert "something went wrong" in summary.results[0].error
+
+    def test_error_does_not_stop_batch(self):
+        """One failing item doesn't prevent others from running."""
+        call_count = 0
+
+        class FailOnFirstAdapter(MockLLMAdapter):
+            def execute_prompt(self, prompt, config):
+                nonlocal call_count
+                call_count += 1
+                if call_count == 1:
+                    raise RuntimeError("first fails")
+                return super().execute_prompt(prompt, config)
+
+        adapter = FailOnFirstAdapter("ok")
+        evaluator = BatchEvaluator(adapter)
+        summary = evaluator.evaluate(_items(3))
+
+        assert summary.failed == 1
+        assert summary.succeeded == 2
+        assert summary.results[0].status == "error"
+        assert summary.results[1].status == "success"
+        assert summary.results[2].status == "success"
+
+
+# ── Progress callback ───────────────────────────────────────────────
+
+
+class TestProgressCallback:
+    def test_callback_called_for_each_item(self):
+        calls = []
+        adapter = MockLLMAdapter("ok")
+        evaluator = BatchEvaluator(
+            adapter,
+            progress_callback=lambda done, total, result: calls.append(
+                (done, total, result.key)
+            ),
+        )
+        evaluator.evaluate(_items(3))
+
+        assert len(calls) == 3
+        assert calls[0] == (1, 3, "entity-0")
+        assert calls[1] == (2, 3, "entity-1")
+        assert calls[2] == (3, 3, "entity-2")
+
+    def test_callback_receives_result(self):
+        results = []
+        adapter = MockLLMAdapter("ok")
+        evaluator = BatchEvaluator(
+            adapter,
+            progress_callback=lambda done, total, result: results.append(result),
+        )
+        evaluator.evaluate(_items(2))
+
+        assert all(isinstance(r, BatchResult) for r in results)
+        assert results[0].status == "success"
+
+    def test_no_callback_no_error(self):
+        adapter = MockLLMAdapter("ok")
+        evaluator = BatchEvaluator(adapter)
+        # Should work fine without callback
+        summary = evaluator.evaluate(_items(1))
+        assert summary.succeeded == 1
+
+
+# ── Empty batch ─────────────────────────────────────────────────────
+
+
+class TestEmptyBatch:
+    def test_empty_items(self):
+        adapter = MockLLMAdapter("ok")
+        evaluator = BatchEvaluator(adapter)
+        summary = evaluator.evaluate([])
+
+        assert summary.total == 0
+        assert summary.succeeded == 0
+        assert summary.results == []
+        assert adapter.call_count == 0
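Reviewer sketch: taken together, the batch tests above describe three rules. An item is skipped only when it has a non-empty digest that matches the previous run, an exception from the adapter is recorded rather than raised, and the success rate is computed over attempted items only (1.0 when nothing was attempted). The loop below illustrates those rules under simplifying assumptions. `ItemSketch`, `SummarySketch`, and `run_batch` are hypothetical names, and the real `BatchEvaluator` also tracks tokens, metadata, and progress callbacks.

```python
from dataclasses import dataclass, field


@dataclass
class ItemSketch:
    key: str
    prompt: str
    content_digest: str = ""


@dataclass
class SummarySketch:
    succeeded: int = 0
    failed: int = 0
    skipped: int = 0
    statuses: dict = field(default_factory=dict)

    def success_rate(self) -> float:
        # Skipped items are excluded; an all-skipped batch counts as 1.0.
        attempted = self.succeeded + self.failed
        return 1.0 if attempted == 0 else self.succeeded / attempted


def run_batch(items, call, previous_digests=None):
    """Run `call(prompt)` per item, skipping unchanged items and isolating errors."""
    previous = previous_digests or {}
    summary = SummarySketch()
    for item in items:
        # An empty digest never counts as "unchanged".
        if item.content_digest and previous.get(item.key) == item.content_digest:
            summary.skipped += 1
            summary.statuses[item.key] = "skipped"
            continue
        try:
            call(item.prompt)
        except Exception:
            # One failing item must not stop the rest of the batch.
            summary.failed += 1
            summary.statuses[item.key] = "error"
        else:
            summary.succeeded += 1
            summary.statuses[item.key] = "success"
    return summary
```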
Author	SHA1	Message	Date
tegwick	3ac8447c10	feat(example): add baseline metrics snapshot from collection checks run Initial metrics from S2.4 checks on 85 entities (7 of 35 chapters): coverage_ratio=0.361, redundancy=0.0, coherence_components=0.0, consistency_cycles=0.0, granularity_entropy=2.69 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 07:44:01 +01:00
tegwick	94cb2063af	feat(example): migrate to infospace config with tooling integration (S3.1) Add infospace.yaml declaring topic, disciplines, schemas, viability thresholds. Integrate infospace tooling into process_chapters.py with --infospace-status, --infospace-check, and --infospace-viability flags. Initial check: 85 entities, 4/5 viable (coverage 0.36 < 0.50 — only 7/35 chapters processed so far). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 02:29:53 +01:00
tegwick	d1c6e53754	docs: add infospace primitives reference (S2.7) Reference document covering all infospace tooling primitives: config, entity metadata, schema validation, per-entity evaluation, collection checks, metrics history, viability, composition, and CLI commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 02:05:09 +01:00
tegwick	b76d6d38c1	feat(infospace): add composition model for discipline binding (S2.6) Discipline resolution, viability checking, entity access, stale mapping detection, and binding management. CLI commands: bind-discipline, disciplines, stale-mappings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 02:03:54 +01:00
tegwick	ce7f78d57d	feat(infospace): add metrics history and viability tracking (S2.5) History module with snapshot creation from check results, metrics file I/O, auto-append to history after checks, date-based snapshot lookup, and metric trend extraction. CLI commands: history, history-diff. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 02:01:00 +01:00
tegwick	11585e6968	feat(infospace): add collection-level quality checks C1–C5 (S2.4) Five concern checks: Redundancy (embedding/word overlap), Coverage (FCA gap analysis), Coherence (graph connectivity), Consistency (cycle detection), Granularity (Shannon entropy). Orchestrator runs all or selected checks, CLI `markitect infospace check` command added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:54:22 +01:00
tegwick	3461d2f354	feat(infospace): add per-entity evaluation pipeline and CLI command (S2.3) Evaluation pipeline builds prompts from entity metadata, delegates to BatchEvaluator, parses structured LLM responses into ScoreEntry objects, and writes evaluation files. CLI: 'markitect infospace evaluate' with --provider, --entity, --chapter filters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:48:34 +01:00
tegwick	3726503adb	feat(infospace): add lifecycle CLI commands — init, status, entities, viability (S2.2) Adds 'markitect infospace' command group with init (create config), status (entity count/domains/disciplines), entities (list with sort), and viability (threshold dashboard with pass/fail). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:46:54 +01:00
tegwick	b20fe4db68	feat(infospace): add infospace configuration model and state (S2.1) InfospaceConfig (topic, disciplines, schemas, competency questions, viability thresholds, pipeline) with YAML load/save and directory discovery. InfospaceState aggregates entities, evaluations, and viability checks for status reporting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:44:14 +01:00
tegwick	144a88c0c2	feat(prompts): add batch LLM evaluation orchestrator (S1.6) BatchEvaluator runs evaluation prompts across item batches with incremental evaluation (skip unchanged via content digest), per-item error isolation, progress callbacks, and aggregate token usage tracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:40:13 +01:00
tegwick	dc22017b7c	feat(analysis): add Formal Concept Analysis for coverage gap detection (S1.7) Pure-Python FCA implementation: FormalContext (entity × attribute binary relation with extent/intent/closure), ConceptLattice via NextClosure algorithm, find_gap_concepts() for structural coverage gaps, and find_empty_cells() for cross-tabulation analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:38:35 +01:00
tegwick	f8c9ab33f0	feat(infospace): add structured evaluation output with history and diffing (S1.5) Add data models (ScoreEntry, EntityEvaluation, EvaluationSnapshot, SnapshotDiff) and I/O utilities for YAML frontmatter evaluation files, snapshot persistence, history append, and snapshot diffing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:35:22 +01:00
tegwick	bad01e32bd	feat(analysis): add graph analysis utilities with networkx (S1.4) Add connected components, betweenness centrality, Louvain community detection, modularity scoring, degree distribution, and cohesion/coupling computation. Wraps DependencyGraph via networkx (optional dependency) for downstream collection-level coherence metrics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:34:53 +01:00
tegwick	267368eb60	feat(llm): add embedding adapter with cache and similarity utils (S1.3) Add OpenAI-compatible embedding support (works with both OpenAI and OpenRouter), file-based embedding cache with content-digest invalidation, and pure-Python cosine similarity utilities for downstream redundancy detection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:22:21 +01:00
tegwick	9031e1162c	feat(infospace): add schema compliance validator (S1.2) Deterministic validation of EntityMeta against declarative schemas: section presence/word counts, heading format, domain enum values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 00:48:57 +01:00
tegwick	03c6c5e8de	feat(infospace): add entity metadata parser (S1.1) Extract section-tree algorithm from SchemaGenerator into standalone core/section_tree.py and build markitect/infospace/ package with EntityMeta dataclass and parse_entity_file/parse_entity_directory. Foundation for schema compliance, coverage, and granularity metrics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 00:27:45 +01:00
tegwick	b5e994b014	docs: preliminary introduction to Viable Information Spaces Conceptual overview of infospaces as structured, evaluable, composable knowledge collections. Establishes the vocabulary (topic, discipline, entity, viability), the build cycle (extract, map, evaluate, refine), the five collection quality concerns, and the composition model (hierarchical, networked, swarm). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 23:54:53 +01:00
tegwick	4ce856d4d0	docs: metrics methodology, collection-level tasks, and infospace tooling roadmap Add METRICS-METHODOLOGY.md documenting the theoretical frameworks (SEQUAL, OntoClean, OOPS!, OntoQA, FCA, DSL principles) adapted for two-layer evaluation (LLM-Eval + deterministic aggregation) across five collection concerns: redundancy, coverage, coherence, consistency, and granularity balance. Extend INFRA-TASKS.md with assignment assessment (tasks 4-7), per-concept metrics (tasks 8-12), and collection-level metrics (tasks 13-19). Add roadmap/infospace-tooling/PLAN.md defining terminology (infospace, topic, discipline, entity, evaluation, viability) and a three-stage implementation plan: Stage 1 platform additions, Stage 2 infospace tooling layer, Stage 3 example revision. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 23:53:21 +01:00
tegwick	2f0989f9bf	docs(infospace): document infospace.db and add to .gitignore The SQLite artifact database is a derived cache regenerable from committed files — no LLM calls needed. Added tutorial section explaining why it is excluded and how to rebuild it after a fresh clone. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 22:27:08 +01:00
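
Reviewer sketch: the S2.4 commit in the list above names Shannon entropy as the granularity measure but does not say what is binned or which logarithm base is used. The function below shows the general calculation over an arbitrary list of category labels, using base 2. Both the choice of labels and the base are assumptions, so its output is not expected to reproduce the reported `granularity_entropy` value.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy in bits of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No observations: treat the entropy as zero.
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy
```

A perfectly even spread over four categories gives 2 bits, while a collection that falls entirely into one category gives 0, so higher values indicate a more balanced distribution.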