docs: add infospace primitives reference (S2.7)

Reference document covering all infospace tooling primitives: config, entity metadata, schema validation, per-entity evaluation, collection checks, metrics history, viability, composition, and CLI commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:05:09 +01:00
parent b76d6d38c1
commit d1c6e53754
1 changed files with 344 additions and 0 deletions
--- a/docs/infospace-primitives.md
+++ b/docs/infospace-primitives.md
@@ -0,0 +1,344 @@
+# Infospace Primitives Reference
+
+This document describes the primitives provided by the `markitect/infospace/`
+package for creating, evaluating, maintaining, and composing infospaces.
+
+---
+
+## Core Concepts
+
+An **infospace** is a structured, evaluable, composable collection of
+entities that explains a **topic** through the lens of one or more
+**disciplines**.
+
+| Term | Meaning |
+|------|---------|
+| **Topic** | The subject matter being explained |
+| **Discipline** | A reusable framework of concepts applied as an analytical lens |
+| **Entity** | The atomic unit of knowledge — slug, definition, provenance, domain |
+| **Evaluation** | Per-entity or collection-level quality assessment |
+| **Viability** | Whether an infospace meets its threshold scores |
+
+---
+
+## Configuration (`infospace.yaml`)
+
+Every infospace is declared via an `infospace.yaml` file. The configuration
+model is defined in `markitect/infospace/config.py`.
+
+### Minimal example
+
+```yaml
+topic:
+  name: "The Wealth of Nations"
+  domain: "Classical Economics"
+  sources: artifacts/sources/
+
+disciplines:
+  - name: "Viable System Model"
+    path: artifacts/vsm-reference/
+
+schemas:
+  entity: schemas/economic-entity-schema-v1.0.md
+
+viability:
+  coverage_ratio: { min: 0.60 }
+  redundancy_ratio: { max: 0.05 }
+  per_entity_mean: { min: 3.5 }
+```
+
+### Key models
+
+- **`TopicConfig`** — `name`, `domain`, `sources`
+- **`DisciplineBinding`** — `name`, `path` (to another infospace directory)
+- **`SchemaRegistry`** — `entity`, `mapping`, `analysis` schema paths
+- **`ViabilityThreshold`** — `metric`, `min`, `max` bounds
+- **`PipelineConfig`** — Ordered list of `PipelineStage` entries
+- **`InfospaceConfig`** — Top-level config combining all of the above
+
+### Default directories
+
+| Setting | Default |
+|---------|---------|
+| `entities_dir` | `output/entities` |
+| `evaluations_dir` | `output/evaluations` |
+| `metrics_dir` | `output/metrics` |
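+
+The following minimal sketch shows how the models above could fit together once `infospace.yaml` has been loaded into a plain dictionary (for example with PyYAML's `yaml.safe_load`). The dataclasses and the `config_from_dict` helper are hypothetical simplifications, not the definitions in `config.py`: `SchemaRegistry` is reduced to a plain dict and `PipelineConfig` is omitted.
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional
+
+# Hypothetical, simplified mirrors of the config models listed above.
+
+@dataclass
+class TopicConfig:
+    name: str
+    domain: str = ""
+    sources: str = ""
+
+@dataclass
+class DisciplineBinding:
+    name: str
+    path: str
+
+@dataclass
+class ViabilityThreshold:
+    metric: str
+    min: Optional[float] = None
+    max: Optional[float] = None
+
+@dataclass
+class InfospaceConfig:
+    topic: TopicConfig
+    disciplines: List[DisciplineBinding] = field(default_factory=list)
+    schemas: Dict[str, str] = field(default_factory=dict)
+    viability: List[ViabilityThreshold] = field(default_factory=list)
+    # Defaults taken from the "Default directories" table.
+    entities_dir: str = "output/entities"
+    evaluations_dir: str = "output/evaluations"
+    metrics_dir: str = "output/metrics"
+
+def config_from_dict(raw: Dict[str, Any]) -> InfospaceConfig:
+    """Build a config from an already-parsed infospace.yaml mapping."""
+    return InfospaceConfig(
+        topic=TopicConfig(**raw["topic"]),
+        disciplines=[DisciplineBinding(**d) for d in raw.get("disciplines", [])],
+        schemas=dict(raw.get("schemas", {})),
+        # YAML maps metric name -> {min/max}; flatten into a threshold list.
+        viability=[
+            ViabilityThreshold(metric=name, **bounds)
+            for name, bounds in raw.get("viability", {}).items()
+        ],
+    )
+```
+
+Flattening the `viability` mapping into `ViabilityThreshold` objects keeps each threshold self-describing, matching the `metric`, `min`, `max` fields listed above.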
+
+---
+
+## Entity Metadata
+
+Entities are parsed from markdown files by `markitect/infospace/entity_parser.py`.
+
+**`EntityMeta`** fields: `slug`, `title`, `definition`, `domain`,
+`source_chapter`, `context`, `original_wording`, `modern_interpretation`,
+`definition_word_count`, `total_word_count`, `section_slugs`.
+
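+```python
+from pathlib import Path
+
+from markitect.infospace import parse_entity_directory
+
+# Parse every entity markdown file under the default entities directory
+entities = parse_entity_directory(Path("output/entities"))
+```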
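+
+The parser's exact rules are not documented here. As a rough sketch only, assuming each entity lives in `<slug>.md` with a `# Title` heading and `## Section` headings (including a `## Definition` section), a subset of the `EntityMeta` fields could be derived as below. `EntityMetaSketch`, `parse_entity_text`, and `parse_directory_sketch` are hypothetical names.
+
+```python
+import re
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Dict, List
+
+@dataclass
+class EntityMetaSketch:
+    # Hypothetical subset of the EntityMeta fields listed above.
+    slug: str
+    title: str
+    definition: str
+    definition_word_count: int
+    total_word_count: int
+    section_slugs: List[str] = field(default_factory=list)
+
+def slugify(text: str) -> str:
+    # Lowercase and collapse runs of non-alphanumerics into single hyphens.
+    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
+
+def parse_entity_text(slug: str, text: str) -> EntityMetaSketch:
+    title = ""
+    sections: Dict[str, List[str]] = {}
+    current = None
+    for line in text.splitlines():
+        if line.startswith("# "):
+            title = line[2:].strip()
+        elif line.startswith("## "):
+            current = slugify(line[3:])
+            sections[current] = []
+        elif current is not None:
+            sections[current].append(line)
+    # Normalise whitespace so a multi-line definition becomes one string.
+    definition = " ".join(" ".join(sections.get("definition", [])).split())
+    # Count words in section bodies only; heading lines are excluded.
+    body_words = sum(len(" ".join(lines).split()) for lines in sections.values())
+    return EntityMetaSketch(
+        slug=slug,
+        title=title,
+        definition=definition,
+        definition_word_count=len(definition.split()),
+        total_word_count=body_words,
+        section_slugs=list(sections),
+    )
+
+def parse_directory_sketch(directory: Path) -> List[EntityMetaSketch]:
+    # One entity per *.md file; the file stem doubles as the slug.
+    return [
+        parse_entity_text(p.stem, p.read_text(encoding="utf-8"))
+        for p in sorted(directory.glob("*.md"))
+    ]
+```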
+
+---
+
+## Schema Validation
+
+Deterministic validation of entity files against structural schemas.
+
+```python
+from markitect.infospace import validate_entity, ECONOMIC_ENTITY_SCHEMA
+
+# entity_meta is an EntityMeta, e.g. one item returned by parse_entity_directory()
+result = validate_entity(entity_meta, schema=ECONOMIC_ENTITY_SCHEMA)
+print(result.summary())
+```
+
+The validator checks section presence, word-count ranges, heading format, and enum values.
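+
+A hypothetical sketch of two of these checks (section presence and word-count ranges), assuming section word counts have already been computed per entity. `SchemaSketch`, `validate_sketch`, and the example ranges are illustrative and do not reflect the contents of `ECONOMIC_ENTITY_SCHEMA`.
+
+```python
+from dataclasses import dataclass, field
+from typing import Dict, List, Tuple
+
+@dataclass
+class SchemaSketch:
+    required_sections: List[str]
+    # Inclusive (min, max) word-count ranges keyed by section slug.
+    word_ranges: Dict[str, Tuple[int, int]] = field(default_factory=dict)
+
+@dataclass
+class ValidationResultSketch:
+    errors: List[str] = field(default_factory=list)
+
+    @property
+    def ok(self) -> bool:
+        return not self.errors
+
+    def summary(self) -> str:
+        if self.ok:
+            return "valid"
+        return f"{len(self.errors)} error(s): " + "; ".join(self.errors)
+
+def validate_sketch(section_words: Dict[str, int], schema: SchemaSketch) -> ValidationResultSketch:
+    """section_words maps each section slug present in the entity to its word count."""
+    result = ValidationResultSketch()
+    for name in schema.required_sections:
+        if name not in section_words:
+            result.errors.append(f"missing section '{name}'")
+    for name, (lo, hi) in schema.word_ranges.items():
+        count = section_words.get(name)
+        # Only range-check sections that exist; absence is reported above.
+        if count is not None and not lo <= count <= hi:
+            result.errors.append(f"'{name}' has {count} words, expected {lo}-{hi}")
+    return result
+```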
+
+---
+
+## Per-entity Evaluation
+
+LLM-based quality assessment of individual entities. Defined in
+`markitect/infospace/evaluate.py`.
+
+```bash
+# Evaluate all entities
+markitect infospace evaluate --provider openrouter
+
+# Single entity
+markitect infospace evaluate --entity division-of-labour --provider openrouter
+```
+
+### Pipeline functions
+
+- `build_evaluation_prompt(entity, topic, dimensions)` — build the LLM prompt
+- `parse_evaluation_response(text, dimensions)` — parse LLM output to `ScoreEntry` list
+- `run_entity_evaluation(config, entities, adapter, ...)` — full batch pipeline
+
+Results are written to `output/evaluations/` as YAML frontmatter + markdown.
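+
+The response format that `parse_evaluation_response` expects is not specified here. Assuming the LLM is asked to reply with one `dimension: score` line per dimension, parsing and aggregation could look like the sketch below. The helper names and the dimensions `source_fidelity` and `clarity` are hypothetical, and the unweighted mean is an assumption: scores of 4.5, 4.0 and 4.0 average to 4.1667, the same figure as the example evaluation file, but the real aggregation may differ.
+
+```python
+import re
+from dataclasses import dataclass
+from typing import Dict, List
+
+@dataclass
+class ScoreEntrySketch:
+    name: str
+    value: float
+    max_value: float = 5.0
+
+# One "name: number" pair per line, e.g. "definition_precision: 4.5".
+SCORE_LINE = re.compile(r"^\s*([a-z_]+)\s*:\s*([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)
+
+def parse_scores_sketch(text: str, dimensions: List[str]) -> List[ScoreEntrySketch]:
+    wanted = set(dimensions)
+    found: Dict[str, ScoreEntrySketch] = {}
+    for name, value in SCORE_LINE.findall(text):
+        # Keep only requested dimensions, and only their first score.
+        if name in wanted and name not in found:
+            found[name] = ScoreEntrySketch(name, float(value))
+    # Preserve the caller's dimension order; skip dimensions with no score.
+    return [found[d] for d in dimensions if d in found]
+
+def overall_score(scores: List[ScoreEntrySketch]) -> float:
+    # Unweighted mean, rounded to four decimal places.
+    if not scores:
+        return 0.0
+    return round(sum(s.value for s in scores) / len(scores), 4)
+```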
+
+---
+
+## Collection-level Checks
+
+Five concerns are assessed at the collection level, each in a dedicated
+module under `markitect/infospace/checks/`.
+
+| Concern | Module | Key metric |
+|---------|--------|------------|
+| **C1 — Redundancy** | `redundancy.py` | `redundancy_ratio` |
+| **C2 — Coverage** | `coverage.py` | `coverage_ratio` |
+| **C3 — Coherence** | `coherence.py` | `coherence_components`, `modularity` |
+| **C4 — Consistency** | `consistency.py` | `consistency_cycles` |
+| **C5 — Granularity** | `granularity.py` | `granularity_entropy` |
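+
+The exact metric definitions live in the modules above. As one plausible reading of `redundancy_ratio` only, the sketch below computes the share of entities whose embedding has a near-duplicate (cosine similarity at or above a threshold). The definition, the 0.9 threshold, and the function names are assumptions.
+
+```python
+import math
+from itertools import combinations
+from typing import Dict, List
+
+def cosine(a: List[float], b: List[float]) -> float:
+    dot = sum(x * y for x, y in zip(a, b))
+    na = math.sqrt(sum(x * x for x in a))
+    nb = math.sqrt(sum(y * y for y in b))
+    # Zero vectors have no direction; treat them as dissimilar.
+    return dot / (na * nb) if na and nb else 0.0
+
+def redundancy_ratio_sketch(embeddings: Dict[str, List[float]], threshold: float = 0.9) -> float:
+    """Fraction of entities that have at least one near-duplicate."""
+    if not embeddings:
+        return 0.0
+    redundant = set()
+    # Compare every unordered pair of entities once.
+    for (s1, v1), (s2, v2) in combinations(embeddings.items(), 2):
+        if cosine(v1, v2) >= threshold:
+            redundant.update((s1, s2))
+    return len(redundant) / len(embeddings)
+```
+
+The pairwise loop is quadratic in the number of entities, which is fine for collections of a few hundred entities but would call for a vectorised or approximate nearest-neighbour approach at larger scale.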
+
+### Orchestrator
+
+```python
+from markitect.infospace.checks import run_all_checks
+
+# entities: parsed entity list; emb: entity embeddings; g: entity graph
+report = run_all_checks(entities, embeddings=emb, graph=g)
+metrics = report.metrics()  # Dict[str, float]
+```
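+
+A hypothetical sketch of the orchestration pattern: each concern maps to a check function that returns its metrics, an optional selection mirrors the CLI's `--concern` flag, and `metrics()` flattens the per-concern results into one dictionary. `CheckReportSketch` and `run_checks_sketch` are illustrative names, not the package API.
+
+```python
+from dataclasses import dataclass, field
+from typing import Callable, Dict, Iterable, Optional
+
+# Each check returns {metric_name: value} for its concern.
+CheckFn = Callable[..., Dict[str, float]]
+
+@dataclass
+class CheckReportSketch:
+    results: Dict[str, Dict[str, float]] = field(default_factory=dict)
+
+    def metrics(self) -> Dict[str, float]:
+        # Flatten per-concern results into a single name -> value mapping.
+        merged: Dict[str, float] = {}
+        for values in self.results.values():
+            merged.update(values)
+        return merged
+
+def run_checks_sketch(
+    checks: Dict[str, CheckFn],
+    concerns: Optional[Iterable[str]] = None,
+    **inputs: object,
+) -> CheckReportSketch:
+    selected = set(concerns) if concerns else set(checks)
+    unknown = selected - set(checks)
+    if unknown:
+        raise ValueError(f"unknown concern(s): {sorted(unknown)}")
+    report = CheckReportSketch()
+    for name, fn in checks.items():  # run in registry order
+        if name in selected:
+            report.results[name] = fn(**inputs)
+    return report
+```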
+
+### CLI
+
+```bash
+# Run all checks
+markitect infospace check
+
+# Run specific concerns
+markitect infospace check --concern redundancy --concern coverage
+
+# JSON output
+markitect infospace check --json
+```
+
+After each check run, metrics are automatically recorded to history.
+
+---
+
+## Metrics History
+
+Timestamped snapshots track metrics over time. Defined in
+`markitect/infospace/history.py`.
+
+```bash
+# Show history
+markitect infospace history
+
+# Trend for a single metric
+markitect infospace history --metric coverage_ratio
+
+# Compare two snapshots
+markitect infospace history-diff 2026-02-01 2026-03-01
+```
+
+### Key functions
+
+- `snapshot_from_checks(report, entity_count)` — create snapshot from check results
+- `record_check_results(report, config, root, entity_count)` — save metrics + append to history
+- `get_history(config, root)` — read full history
+- `metric_trend(history, metric_name)` — extract single metric across time
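+
+The on-disk format of the history is not described here. Assuming an append-only JSON Lines file with one snapshot per line, recording, trend extraction, and a two-snapshot diff could be sketched as follows. All names and the storage format are hypothetical.
+
+```python
+import json
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Dict, List, Optional, Tuple
+
+def append_snapshot(history_file: Path, metrics: Dict[str, float], entity_count: int,
+                    created_at: Optional[str] = None) -> dict:
+    snapshot = {
+        "created_at": created_at or datetime.now(timezone.utc).isoformat(),
+        "entity_count": entity_count,
+        "metrics": dict(metrics),
+    }
+    history_file.parent.mkdir(parents=True, exist_ok=True)
+    # Append-only: earlier snapshots are never rewritten.
+    with history_file.open("a", encoding="utf-8") as fh:
+        fh.write(json.dumps(snapshot) + "\n")
+    return snapshot
+
+def read_history(history_file: Path) -> List[dict]:
+    if not history_file.exists():
+        return []
+    lines = history_file.read_text(encoding="utf-8").splitlines()
+    return [json.loads(line) for line in lines if line.strip()]
+
+def metric_trend_sketch(history: List[dict], metric: str) -> List[Tuple[str, float]]:
+    # Skip snapshots that did not record this metric (e.g. a partial run).
+    return [(s["created_at"], s["metrics"][metric]) for s in history if metric in s["metrics"]]
+
+def diff_snapshots(old: dict, new: dict) -> Dict[str, float]:
+    # Change for each metric present in both snapshots.
+    shared = old["metrics"].keys() & new["metrics"].keys()
+    return {m: new["metrics"][m] - old["metrics"][m] for m in sorted(shared)}
+```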
+
+---
+
+## Viability
+
+Viability is assessed by comparing current metrics to thresholds declared
+in `infospace.yaml`.
+
+```bash
+markitect infospace viability
+```
+
+### Threshold model
+
+```yaml
+viability:
+  coverage_ratio: { min: 0.60 }       # must be >= 0.60
+  redundancy_ratio: { max: 0.05 }     # must be <= 0.05
+  consistency_cycles: { max: 0 }      # must be exactly 0
+```
+
+Each threshold sets a `min` bound, a `max` bound, or both. A metric passes if
+its value falls within its bounds (inclusive). An infospace is viable only
+when every threshold passes.
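+
+The pass/fail rule can be sketched directly. `ThresholdSketch`, `ResultSketch`, and `check_viability` are hypothetical stand-ins for `ViabilityThreshold` and `ViabilityResult`, and treating a metric with no recorded value as a failure is an assumption.
+
+```python
+from dataclasses import dataclass
+from typing import Dict, List, Optional
+
+@dataclass
+class ThresholdSketch:
+    metric: str
+    min: Optional[float] = None
+    max: Optional[float] = None
+
+@dataclass
+class ResultSketch:
+    metric: str
+    value: Optional[float]
+    passed: bool
+
+def check_viability(thresholds: List[ThresholdSketch], metrics: Dict[str, float]) -> List[ResultSketch]:
+    results = []
+    for t in thresholds:
+        value = metrics.get(t.metric)
+        # Assumption: a missing metric fails; bounds are inclusive.
+        passed = (
+            value is not None
+            and (t.min is None or value >= t.min)
+            and (t.max is None or value <= t.max)
+        )
+        results.append(ResultSketch(t.metric, value, passed))
+    return results
+
+def is_viable(results: List[ResultSketch]) -> bool:
+    return all(r.passed for r in results)
+```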
+
+---
+
+## Composition
+
+One infospace can use another as a discipline. The composition model is
+defined in `markitect/infospace/composition.py`.
+
+### Binding a discipline
+
+```bash
+markitect infospace bind-discipline ./path/to/vsm-infospace --name "Viable System Model"
+```
+
+This adds a `DisciplineBinding` to `infospace.yaml` and validates that the
+discipline directory exists and contains its own `infospace.yaml`.
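+
+A minimal sketch of that validation step, assuming relative discipline paths are resolved against the infospace root (as the `root` parameter of `resolve_discipline_path` suggests). `validate_discipline_dir` is a hypothetical helper, not part of `composition.py`.
+
+```python
+from pathlib import Path
+
+def validate_discipline_dir(path: str, root: Path) -> Path:
+    """Resolve a discipline path and check that it looks like an infospace."""
+    candidate = Path(path)
+    # Relative paths are interpreted against the infospace root (assumption).
+    resolved = (candidate if candidate.is_absolute() else root / candidate).resolve()
+    if not resolved.is_dir():
+        raise FileNotFoundError(f"discipline directory not found: {resolved}")
+    if not (resolved / "infospace.yaml").is_file():
+        raise FileNotFoundError(f"no infospace.yaml in {resolved}")
+    return resolved
+```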
+
+### Checking discipline status
+
+```bash
+markitect infospace disciplines
+```
+
+For each bound discipline, the output shows its name, entity count, viability status, and path.
+
+### Viability requirement
+
+A discipline must meet its own viability thresholds to be considered
+reliable. The `check_discipline_status()` function loads the discipline's
+metrics and runs its own threshold checks.
+
+### Stale mapping detection
+
+```bash
+markitect infospace stale-mappings
+```
+
+Compares local mapping references against the discipline's current entity
+set. If a referenced discipline entity has been removed, the mapping is
+flagged as stale.
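+
+Stale-mapping detection reduces to a set difference. The sketch below assumes mapping references are a dict from each local entity slug to the discipline entity slugs it points at; that shape, the function name, and the example slugs are hypothetical.
+
+```python
+from typing import Dict, List, Set
+
+def find_stale_sketch(mapping_refs: Dict[str, List[str]], discipline_slugs: Set[str]) -> Dict[str, List[str]]:
+    """Return, per local entity, the referenced discipline slugs that no longer exist."""
+    stale: Dict[str, List[str]] = {}
+    for local, refs in mapping_refs.items():
+        missing = sorted(set(refs) - discipline_slugs)
+        if missing:
+            stale[local] = missing
+    return stale
+```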
+
+### Key functions
+
+- `resolve_discipline_path(binding, root)` — resolve to absolute path
+- `load_discipline_config(binding, root)` — load discipline's `infospace.yaml`
+- `check_discipline_status(binding, root)` — full status with viability
+- `get_discipline_entities(binding, root)` — entity list from discipline
+- `find_stale_mappings(config, root, mapping_references)` — detect stale refs
+- `bind_discipline(config, name, path, root)` — add binding to config
+
+---
+
+## Evaluation Output Format
+
+Evaluation results use YAML frontmatter + markdown body. Defined in
+`markitect/infospace/evaluation.py` and `evaluation_io.py`.
+
+### Per-entity evaluation file
+
+```markdown
+---
+entity_slug: division-of-labour
+evaluator: openrouter/default
+evaluated_at: '2026-02-19T10:30:00'
+overall_score: 4.1667
+scores:
+- name: definition_precision
+  value: 4.5
+  max_value: 5.0
+...
+---
+
+# Evaluation: Division Of Labour
+
+## definition_precision — 4.5 / 5.0
+
+The definition clearly captures the core concept...
+```
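+
+To read such a file without a YAML dependency, the frontmatter can be split off from the body and the per-dimension headings parsed from the markdown. This hypothetical sketch handles only the heading layout shown above; real code would parse the frontmatter with a YAML loader.
+
+```python
+import re
+from typing import List, Tuple
+
+# Matches body headings such as "## definition_precision <em dash> 4.5 / 5.0";
+# \u2014 is the em dash used in the heading format.
+HEADING = re.compile(r"^## (\S+) \u2014 ([0-9.]+) / ([0-9.]+)\s*$", re.MULTILINE)
+
+def split_frontmatter(text: str) -> Tuple[str, str]:
+    """Return (frontmatter, body); frontmatter is the YAML between the '---' fences."""
+    if not text.startswith("---\n"):
+        return "", text
+    end = text.find("\n---\n", 4)
+    if end == -1:
+        raise ValueError("unterminated frontmatter")
+    return text[4:end], text[end + 5:]
+
+def body_scores(body: str) -> List[Tuple[str, float, float]]:
+    # (dimension, score, max score) for each scored heading in the body.
+    return [(n, float(v), float(m)) for n, v, m in HEADING.findall(body)]
+```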
+
+### Snapshot
+
+```yaml
+snapshot_id: abc12345
+created_at: '2026-02-19T10:30:00+00:00'
+schema_name: default
+entity_count: 85
+entity_evaluations: [...]
+collection_metrics:
+  - name: coverage_ratio
+    value: 0.75
+    concern: C2
+```
+
+---
+
+## State
+
+Runtime state is computed from entities, evaluations, and metrics.
+Defined in `markitect/infospace/state.py`.
+
+```python
+from markitect.infospace import build_state
+state = build_state(config, entities=entities, metrics=metrics)
+state.is_viable          # True if all thresholds pass
+state.viability_results  # List[ViabilityResult]
+state.summary()          # Dict for display
+```
+
+---
+
+## CLI Command Summary
+
+All commands are under `markitect infospace`:
+
+| Command | Purpose |
+|---------|---------|
+| `init` | Create a new `infospace.yaml` |
+| `status` | Show entity count, domains, evaluation state |
+| `entities` | List entities with metadata |
+| `evaluate` | Run per-entity LLM evaluation |
+| `check` | Run collection-level quality checks (C1-C5) |
+| `viability` | Show viability dashboard |
+| `history` | Show metrics history |
+| `history-diff` | Compare two snapshots by date |
+| `bind-discipline` | Bind an external infospace as a discipline |
+| `disciplines` | List bound disciplines and viability |
+| `stale-mappings` | Detect stale cross-infospace references |
+
+---
+
+## Platform Dependencies
+
+The infospace tooling builds on these platform modules:
+
+| Module | Used for |
+|--------|----------|
+| `markitect/llm/` | Embedding adapters, LLM evaluation |
+| `markitect/analysis/graph.py` | Graph analysis (networkx wrapper) |
+| `markitect/analysis/fca.py` | Formal Concept Analysis |
+| `markitect/prompts/execution/batch.py` | Batch LLM evaluation |
+| `markitect/prompts/dependencies/models.py` | DependencyGraph |