markitect-main/docs/infospace-primitives.md
tegwick d1c6e53754 docs: add infospace primitives reference (S2.7)
Reference document covering all infospace tooling primitives: config,
entity metadata, schema validation, per-entity evaluation, collection
checks, metrics history, viability, composition, and CLI commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:05:09 +01:00

# Infospace Primitives Reference
This document describes the primitives provided by the `markitect/infospace/`
package for creating, evaluating, maintaining, and composing infospaces.
---
## Core Concepts
An **infospace** is a structured, evaluable, composable collection of
entities that explains a **topic** through the lens of one or more
**disciplines**.
| Term | Meaning |
|------|---------|
| **Topic** | The subject matter being explained |
| **Discipline** | A reusable framework of concepts applied as an analytical lens |
| **Entity** | The atomic unit of knowledge — slug, definition, provenance, domain |
| **Evaluation** | Per-entity or collection-level quality assessment |
| **Viability** | Whether an infospace meets its threshold scores |
---
## Configuration (`infospace.yaml`)
Every infospace is declared via an `infospace.yaml` file. The configuration
model is defined in `markitect/infospace/config.py`.
### Minimal example
```yaml
topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/
disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/
schemas:
  entity: schemas/economic-entity-schema-v1.0.md
viability:
  coverage_ratio: { min: 0.60 }
  redundancy_ratio: { max: 0.05 }
  per_entity_mean: { min: 3.5 }
```
### Key models
- **`TopicConfig`** — `name`, `domain`, `sources`
- **`DisciplineBinding`** — `name`, `path` (to another infospace directory)
- **`SchemaRegistry`** — `entity`, `mapping`, `analysis` schema paths
- **`ViabilityThreshold`** — `metric`, `min`, `max` bounds
- **`PipelineConfig`** — Ordered list of `PipelineStage` entries
- **`InfospaceConfig`** — Top-level config combining all of the above
### Default directories
| Setting | Default |
|---------|---------|
| `entities_dir` | `output/entities` |
| `evaluations_dir` | `output/evaluations` |
| `metrics_dir` | `output/metrics` |
---
## Entity Metadata
Entities are parsed from markdown files by `markitect/infospace/entity_parser.py`.
**`EntityMeta`** fields: `slug`, `title`, `definition`, `domain`,
`source_chapter`, `context`, `original_wording`, `modern_interpretation`,
`definition_word_count`, `total_word_count`, `section_slugs`.
```python
from pathlib import Path
from markitect.infospace import parse_entity_directory
entities = parse_entity_directory(Path("output/entities"))
```
---
## Schema Validation
Deterministic validation of entity files against structural schemas.
```python
from markitect.infospace import validate_entity, ECONOMIC_ENTITY_SCHEMA
result = validate_entity(entity_meta, schema=ECONOMIC_ENTITY_SCHEMA)
print(result.summary())
```
Checks: section presence, word count ranges, heading format, enum values.
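
Two of the checks listed above, section presence and word count ranges, can be illustrated with a minimal sketch. This is not the `validate_entity` implementation: `SketchSchema`, `check_sections`, and the section names are hypothetical, and the entity is assumed to be available as a mapping from section name to section text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SketchSchema:
    """Hypothetical schema: required section names plus per-section word ranges."""
    required_sections: List[str]
    word_ranges: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def check_sections(sections: Dict[str, str], schema: SketchSchema) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: List[str] = []
    # Section presence: every required section must exist.
    for name in schema.required_sections:
        if name not in sections:
            problems.append(f"missing section: {name}")
    # Word count ranges: only checked for sections that are present.
    for name, (low, high) in schema.word_ranges.items():
        if name in sections:
            count = len(sections[name].split())
            if not low <= count <= high:
                problems.append(f"{name}: {count} words, expected {low}-{high}")
    return problems


schema = SketchSchema(["definition", "context"], {"definition": (3, 50)})
problems = check_sections({"definition": "Too short"}, schema)
print(problems)
```
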
---
## Per-entity Evaluation
LLM-based quality assessment of individual entities. Defined in
`markitect/infospace/evaluate.py`.
```bash
# Evaluate all entities
markitect infospace evaluate --provider openrouter
# Single entity
markitect infospace evaluate --entity division-of-labour --provider openrouter
```
### Pipeline functions
- `build_evaluation_prompt(entity, topic, dimensions)` — build the LLM prompt
- `parse_evaluation_response(text, dimensions)` — parse LLM output to `ScoreEntry` list
- `run_entity_evaluation(config, entities, adapter, ...)` — full batch pipeline
Results are written to `output/evaluations/` as YAML frontmatter + markdown.
---
## Collection-level Checks
Five concerns are assessed at the collection level. Each has a dedicated
module in `markitect/infospace/checks/`.
| Concern | Module | Key metric |
|---------|--------|------------|
| **C1 — Redundancy** | `redundancy.py` | `redundancy_ratio` |
| **C2 — Coverage** | `coverage.py` | `coverage_ratio` |
| **C3 — Coherence** | `coherence.py` | `coherence_components`, `modularity` |
| **C4 — Consistency** | `consistency.py` | `consistency_cycles` |
| **C5 — Granularity** | `granularity.py` | `granularity_entropy` |
### Orchestrator
```python
from markitect.infospace.checks import run_all_checks
report = run_all_checks(entities, embeddings=emb, graph=g)
metrics = report.metrics() # Dict[str, float]
```
### CLI
```bash
# Run all checks
markitect infospace check
# Run specific concerns
markitect infospace check --concern redundancy --concern coverage
# JSON output
markitect infospace check --json
```
After each check run, metrics are automatically recorded to history.
---
## Metrics History
Timestamped snapshots track metrics over time. Defined in
`markitect/infospace/history.py`.
```bash
# Show history
markitect infospace history
# Trend for a single metric
markitect infospace history --metric coverage_ratio
# Compare two snapshots
markitect infospace history-diff 2026-02-01 2026-03-01
```
### Key functions
- `snapshot_from_checks(report, entity_count)` — create snapshot from check results
- `record_check_results(report, config, root, entity_count)` — save metrics + append to history
- `get_history(config, root)` — read full history
- `metric_trend(history, metric_name)` — extract single metric across time
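
The idea behind extracting a single metric across time can be sketched as follows. This is an illustration only, not the real `metric_trend`: the function name `sketch_metric_trend` and the snapshot shape (a dict with `created_at` and `metrics` keys) are assumptions.

```python
from typing import Any, Dict, List, Tuple


def sketch_metric_trend(
    history: List[Dict[str, Any]], metric_name: str
) -> List[Tuple[str, float]]:
    """Return (timestamp, value) pairs for one metric, oldest first."""
    points = [
        (snap["created_at"], snap["metrics"][metric_name])
        for snap in history
        # Snapshots that never recorded this metric are skipped.
        if metric_name in snap.get("metrics", {})
    ]
    # Assumption: timestamps are ISO 8601 strings in one timezone,
    # so lexicographic order matches chronological order.
    return sorted(points, key=lambda point: point[0])


history = [
    {"created_at": "2026-03-01T00:00:00", "metrics": {"coverage_ratio": 0.75}},
    {"created_at": "2026-02-01T00:00:00",
     "metrics": {"coverage_ratio": 0.62, "redundancy_ratio": 0.04}},
    {"created_at": "2026-02-15T00:00:00", "metrics": {"redundancy_ratio": 0.03}},
]
coverage = sketch_metric_trend(history, "coverage_ratio")
print(coverage)
```
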
---
## Viability
Viability is assessed by comparing current metrics to thresholds declared
in `infospace.yaml`.
```bash
markitect infospace viability
```
### Threshold model
```yaml
viability:
  coverage_ratio: { min: 0.60 }     # must be >= 0.60
  redundancy_ratio: { max: 0.05 }   # must be <= 0.05
  consistency_cycles: { max: 0 }    # must be exactly 0
```
Each threshold has `min` and/or `max` bounds. A metric passes if it falls
within bounds. An infospace is viable when all thresholds pass.
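
The pass rule described above can be sketched in a few lines. `Threshold` is a hypothetical stand-in for `ViabilityThreshold` and `is_viable` is an illustrative helper, not part of the package. Treating a missing metric as a failure is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Hypothetical stand-in for ViabilityThreshold: a metric with optional bounds."""
    metric: str
    min: Optional[float] = None
    max: Optional[float] = None

    def passes(self, value: float) -> bool:
        # A bound that is not set imposes no constraint.
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


def is_viable(thresholds: List[Threshold], metrics: Dict[str, float]) -> bool:
    # Assumption: a threshold whose metric has no recorded value counts as failed.
    return all(
        t.metric in metrics and t.passes(metrics[t.metric]) for t in thresholds
    )


thresholds = [
    Threshold("coverage_ratio", min=0.60),
    Threshold("redundancy_ratio", max=0.05),
    Threshold("consistency_cycles", max=0),
]
metrics = {"coverage_ratio": 0.75, "redundancy_ratio": 0.02, "consistency_cycles": 0}
print(is_viable(thresholds, metrics))
```
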
---
## Composition
One infospace can use another as a discipline. The composition model is
defined in `markitect/infospace/composition.py`.
### Binding a discipline
```bash
markitect infospace bind-discipline ./path/to/vsm-infospace --name "Viable System Model"
```
This adds a `DisciplineBinding` to `infospace.yaml` and validates that the
discipline directory exists and contains its own `infospace.yaml`.
### Checking discipline status
```bash
markitect infospace disciplines
```
Shows: name, entity count, viability status, path.
### Viability requirement
A discipline must meet its own viability thresholds to be considered
reliable. The `check_discipline_status()` function loads the discipline's
metrics and checks them against the thresholds declared in the discipline's
own `infospace.yaml`.
### Stale mapping detection
```bash
markitect infospace stale-mappings
```
Compares local mapping references against the discipline's current entity
set. If a referenced discipline entity has been removed, the mapping is
flagged as stale.
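
The comparison described above amounts to a set difference. The sketch below is illustrative and is not the `find_stale_mappings` implementation: the function name, the shape of `mapping_references` (local entity slug mapped to referenced discipline slugs), and the example slugs are all assumptions.

```python
from typing import Dict, Iterable, List, Set


def sketch_stale_references(
    mapping_references: Dict[str, Iterable[str]],
    discipline_slugs: Set[str],
) -> Dict[str, List[str]]:
    """Return, per local entity, the referenced discipline slugs that no longer exist."""
    stale: Dict[str, List[str]] = {}
    for local_slug, referenced in mapping_references.items():
        # Anything referenced but absent from the current entity set is stale.
        missing = sorted(set(referenced) - discipline_slugs)
        if missing:
            stale[local_slug] = missing
    return stale


references = {
    "division-of-labour": ["system-1", "system-3"],
    "invisible-hand": ["system-2"],
}
current = {"system-1", "system-2"}  # "system-3" was removed from the discipline
stale = sketch_stale_references(references, current)
print(stale)
```

Only removals are reported in this sketch; entities added to the discipline since the mapping was written do not make a mapping stale.
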
### Key functions
- `resolve_discipline_path(binding, root)` — resolve to absolute path
- `load_discipline_config(binding, root)` — load discipline's `infospace.yaml`
- `check_discipline_status(binding, root)` — full status with viability
- `get_discipline_entities(binding, root)` — entity list from discipline
- `find_stale_mappings(config, root, mapping_references)` — detect stale refs
- `bind_discipline(config, name, path, root)` — add binding to config
---
## Evaluation Output Format
Evaluation results use YAML frontmatter + markdown body. Defined in
`markitect/infospace/evaluation.py` and `evaluation_io.py`.
### Per-entity evaluation file
```markdown
---
entity_slug: division-of-labour
evaluator: openrouter/default
evaluated_at: '2026-02-19T10:30:00'
overall_score: 4.1667
scores:
  - name: definition_precision
    value: 4.5
    max_value: 5.0
  ...
---
# Evaluation: Division Of Labour
## definition_precision — 4.5 / 5.0
The definition clearly captures the core concept...
```
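
A file in this format can be separated into its frontmatter and body with the standard library alone. The helper below is a hypothetical sketch, not the reader in `evaluation_io.py`. It assumes the frontmatter is delimited by two `---` lines with the first one on line 1.

```python
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split a document into (frontmatter, body); frontmatter is '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything between the delimiters is frontmatter, the rest is body.
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat the whole text as body


doc = (
    "---\n"
    "entity_slug: division-of-labour\n"
    "overall_score: 4.1667\n"
    "---\n"
    "# Evaluation: Division Of Labour\n"
)
front, body = split_frontmatter(doc)
print(front)
print(body)
```

The frontmatter string can then be handed to a YAML parser, for example PyYAML's `yaml.safe_load`, to obtain the fields as a dictionary.
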
### Snapshot
```yaml
snapshot_id: abc12345
created_at: '2026-02-19T10:30:00+00:00'
schema_name: default
entity_count: 85
entity_evaluations: [...]
collection_metrics:
  - name: coverage_ratio
    value: 0.75
    concern: C2
```
---
## State
Runtime state is computed from entities, evaluations, and metrics.
Defined in `markitect/infospace/state.py`.
```python
from markitect.infospace import build_state
state = build_state(config, entities=entities, metrics=metrics)
state.is_viable # True if all thresholds pass
state.viability_results # List[ViabilityResult]
state.summary() # Dict for display
```
---
## CLI Command Summary
All commands are under `markitect infospace`:
| Command | Purpose |
|---------|---------|
| `init` | Create a new `infospace.yaml` |
| `status` | Show entity count, domains, evaluation state |
| `entities` | List entities with metadata |
| `evaluate` | Run per-entity LLM evaluation |
| `check` | Run collection-level quality checks (C1-C5) |
| `viability` | Show viability dashboard |
| `history` | Show metrics history |
| `history-diff` | Compare two snapshots by date |
| `bind-discipline` | Bind an external infospace as a discipline |
| `disciplines` | List bound disciplines and viability |
| `stale-mappings` | Detect stale cross-infospace references |
---
## Platform Dependencies
The infospace tooling builds on these platform modules:
| Module | Used for |
|--------|----------|
| `markitect/llm/` | Embedding adapters, LLM evaluation |
| `markitect/analysis/graph.py` | Graph analysis (networkx wrapper) |
| `markitect/analysis/fca.py` | Formal Concept Analysis |
| `markitect/prompts/execution/batch.py` | Batch LLM evaluation |
| `markitect/prompts/dependencies/models.py` | DependencyGraph |