# Infospace Primitives Reference

This document describes the primitives provided by the `markitect/infospace/`
package for creating, evaluating, maintaining, and composing infospaces.

---

## Core Concepts

An **infospace** is a structured, evaluable, composable collection of
entities that explains a **topic** through the lens of one or more
**disciplines**.

| Term | Meaning |
|------|---------|
| **Topic** | The subject matter being explained |
| **Discipline** | A reusable framework of concepts applied as an analytical lens |
| **Entity** | The atomic unit of knowledge — slug, definition, provenance, domain |
| **Evaluation** | Per-entity or collection-level quality assessment |
| **Viability** | Whether an infospace meets its threshold scores |

---

## Configuration (`infospace.yaml`)

Every infospace is declared via an `infospace.yaml` file. The configuration
model is defined in `markitect/infospace/config.py`.

### Minimal example

```yaml
topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/

disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/

schemas:
  entity: schemas/economic-entity-schema-v1.0.md

viability:
  coverage_ratio: { min: 0.60 }
  redundancy_ratio: { max: 0.05 }
  per_entity_mean: { min: 3.5 }
```

### Key models

- **`TopicConfig`** — `name`, `domain`, `sources`
- **`DisciplineBinding`** — `name`, `path` (to another infospace directory)
- **`SchemaRegistry`** — `entity`, `mapping`, `analysis` schema paths
- **`ViabilityThreshold`** — `metric`, `min`, `max` bounds
- **`PipelineConfig`** — Ordered list of `PipelineStage` entries
- **`InfospaceConfig`** — Top-level config combining all of the above

### Default directories

| Setting | Default |
|---------|---------|
| `entities_dir` | `output/entities` |
| `evaluations_dir` | `output/evaluations` |
| `metrics_dir` | `output/metrics` |

---

## Entity Metadata

Entities are parsed from markdown files by `markitect/infospace/entity_parser.py`.

**`EntityMeta`** fields: `slug`, `title`, `definition`, `domain`,
`source_chapter`, `context`, `original_wording`, `modern_interpretation`,
`definition_word_count`, `total_word_count`, `section_slugs`.

```python
from pathlib import Path

from markitect.infospace import parse_entity_directory

# Parse every entity markdown file under the default entities directory
entities = parse_entity_directory(Path("output/entities"))
```

---

## Schema Validation

Deterministic validation of entity files against structural schemas.

```python
from markitect.infospace import validate_entity, ECONOMIC_ENTITY_SCHEMA
result = validate_entity(entity_meta, schema=ECONOMIC_ENTITY_SCHEMA)
print(result.summary())
```

Checks: section presence, word count ranges, heading format, enum values.
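
To make the first two checks concrete, the following is a minimal, standalone sketch of deterministic section-presence and word-count validation. It does not use the `markitect` API: the names `REQUIRED_SECTIONS`, `WORD_RANGES`, `split_sections`, and `check_entity` are hypothetical, and the section names and ranges are invented for illustration rather than taken from `ECONOMIC_ENTITY_SCHEMA`.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical schema values, chosen only for this example
REQUIRED_SECTIONS: List[str] = ["Definition", "Context"]
WORD_RANGES: Dict[str, Tuple[int, int]] = {"Definition": (5, 120)}


def split_sections(markdown: str) -> Dict[str, str]:
    """Map each level-2 heading ('## Name') to the text beneath it."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def check_entity(markdown: str) -> List[str]:
    """Return a list of human-readable violations (empty list means valid)."""
    errors: List[str] = []
    sections = split_sections(markdown)
    # Section presence
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            errors.append(f"missing section: {name}")
    # Word count ranges (inclusive bounds)
    for name, (low, high) in WORD_RANGES.items():
        if name in sections:
            count = len(sections[name].split())
            if not low <= count <= high:
                errors.append(f"{name}: {count} words, expected {low}-{high}")
    return errors


sample = "# Division of Labour\n\n## Definition\n\nToo short here.\n"
for problem in check_entity(sample):
    print(problem)
```

Because every check is a pure function of the file contents, the same input always yields the same result, which is what distinguishes this step from the LLM-based evaluation described next.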

---

## Per-entity Evaluation

LLM-based quality assessment of individual entities. Defined in
`markitect/infospace/evaluate.py`.

```bash
# Evaluate all entities
markitect infospace evaluate --provider openrouter

# Single entity
markitect infospace evaluate --entity division-of-labour --provider openrouter
```

### Pipeline functions

- `build_evaluation_prompt(entity, topic, dimensions)` — build the LLM prompt
- `parse_evaluation_response(text, dimensions)` — parse LLM output to `ScoreEntry` list
- `run_entity_evaluation(config, entities, adapter, ...)` — full batch pipeline

Results are written to `output/evaluations/` as YAML frontmatter + markdown.

---

## Collection-level Checks

Five concerns assessed at the collection level. Each has a dedicated
module in `markitect/infospace/checks/`.

| Concern | Module | Key metric |
|---------|--------|------------|
| **C1 — Redundancy** | `redundancy.py` | `redundancy_ratio` |
| **C2 — Coverage** | `coverage.py` | `coverage_ratio` |
| **C3 — Coherence** | `coherence.py` | `coherence_components`, `modularity` |
| **C4 — Consistency** | `consistency.py` | `consistency_cycles` |
| **C5 — Granularity** | `granularity.py` | `granularity_entropy` |

### Orchestrator

```python
from markitect.infospace.checks import run_all_checks
# emb (entity embeddings) and g (entity graph) are assumed to be computed beforehand
report = run_all_checks(entities, embeddings=emb, graph=g)
metrics = report.metrics()  # Dict[str, float]
```

### CLI

```bash
# Run all checks
markitect infospace check

# Run specific concerns
markitect infospace check --concern redundancy --concern coverage

# JSON output
markitect infospace check --json
```

After each check run, metrics are automatically recorded to history.

---

## Metrics History

Timestamped snapshots track metrics over time. Defined in
`markitect/infospace/history.py`.

```bash
# Show history
markitect infospace history

# Trend for a single metric
markitect infospace history --metric coverage_ratio

# Compare two snapshots
markitect infospace history-diff 2026-02-01 2026-03-01
```

### Key functions

- `snapshot_from_checks(report, entity_count)` — create snapshot from check results
- `record_check_results(report, config, root, entity_count)` — save metrics + append to history
- `get_history(config, root)` — read full history
- `metric_trend(history, metric_name)` — extract single metric across time
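
As an illustration of what extracting a trend involves, here is a minimal sketch that works on plain dictionaries rather than the package's own snapshot objects. The snapshot shape (`created_at` plus a `metrics` mapping) and the function name `trend` are assumptions made for this example; the real `metric_trend()` may differ.

```python
from typing import Any, Dict, List, Tuple


def trend(history: List[Dict[str, Any]], metric_name: str) -> List[Tuple[str, float]]:
    """Collect (timestamp, value) pairs for one metric, oldest first.

    Snapshots that do not contain the metric are skipped. Sorting relies on
    ISO 8601 timestamps in a uniform format, which order lexicographically.
    """
    points = [
        (snap["created_at"], snap["metrics"][metric_name])
        for snap in history
        if metric_name in snap.get("metrics", {})
    ]
    return sorted(points, key=lambda point: point[0])


# Hypothetical history entries, deliberately out of order
history = [
    {"created_at": "2026-03-01T00:00:00+00:00", "metrics": {"coverage_ratio": 0.75}},
    {
        "created_at": "2026-02-01T00:00:00+00:00",
        "metrics": {"coverage_ratio": 0.62, "redundancy_ratio": 0.04},
    },
]

for timestamp, value in trend(history, "coverage_ratio"):
    print(timestamp, value)
```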

---

## Viability

Viability is assessed by comparing current metrics to thresholds declared
in `infospace.yaml`.

```bash
markitect infospace viability
```

### Threshold model

```yaml
viability:
  coverage_ratio: { min: 0.60 }       # must be >= 0.60
  redundancy_ratio: { max: 0.05 }     # must be <= 0.05
  consistency_cycles: { max: 0 }       # must be <= 0, i.e. no cycles at all
```

Each threshold has a `min` bound, a `max` bound, or both. Bounds are
inclusive: a metric passes if it is at least `min` and at most `max`. An
infospace is viable when all thresholds pass.
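
The pass rule can be sketched in a few lines. This is an illustrative stand-in, not the package's implementation: the `Threshold` dataclass and `is_viable` function are hypothetical names, and treating a missing metric as a failure is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Hypothetical stand-in for ViabilityThreshold: a metric with optional bounds."""

    metric: str
    min: Optional[float] = None
    max: Optional[float] = None

    def passes(self, value: float) -> bool:
        # Both bounds are inclusive; an absent bound imposes no constraint
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


def is_viable(thresholds: List[Threshold], metrics: Dict[str, float]) -> bool:
    """True only if every threshold's metric is present and within bounds.

    Assumption: a metric that has not been computed counts as a failure.
    """
    return all(
        t.metric in metrics and t.passes(metrics[t.metric]) for t in thresholds
    )


thresholds = [
    Threshold("coverage_ratio", min=0.60),
    Threshold("redundancy_ratio", max=0.05),
    Threshold("consistency_cycles", max=0),
]
metrics = {"coverage_ratio": 0.75, "redundancy_ratio": 0.03, "consistency_cycles": 0}
print(is_viable(thresholds, metrics))
```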

---

## Composition

One infospace can use another as a discipline. The composition model is
defined in `markitect/infospace/composition.py`.

### Binding a discipline

```bash
markitect infospace bind-discipline ./path/to/vsm-infospace --name "Viable System Model"
```

This adds a `DisciplineBinding` to `infospace.yaml` and validates the
discipline exists and has an `infospace.yaml`.

### Checking discipline status

```bash
markitect infospace disciplines
```

Shows: name, entity count, viability status, path.

### Viability requirement

A discipline must meet its own viability thresholds to be considered
reliable. The `check_discipline_status()` function loads the discipline's
metrics and checks them against the thresholds declared in the discipline's
own `infospace.yaml`.

### Stale mapping detection

```bash
markitect infospace stale-mappings
```

Compares local mapping references against the discipline's current entity
set. If a referenced discipline entity has been removed, the mapping is
flagged as stale.
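
At its core this comparison is a set difference. The sketch below shows the idea on plain data structures; the function name `stale_references`, the shape of `mapping_references` (local slug to referenced discipline slugs), and the example slugs are all assumptions for illustration and may not match `find_stale_mappings()`.

```python
from typing import Dict, Iterable, List, Set


def stale_references(
    mapping_references: Dict[str, Iterable[str]],
    discipline_slugs: Set[str],
) -> Dict[str, List[str]]:
    """For each local entity, list referenced discipline slugs that no longer exist."""
    stale: Dict[str, List[str]] = {}
    for local_slug, refs in mapping_references.items():
        missing = sorted(set(refs) - discipline_slugs)
        if missing:
            stale[local_slug] = missing
    return stale


# Hypothetical data: "system-3-star" has been removed from the discipline
mapping_references = {
    "division-of-labour": ["system-1", "system-3-star"],
    "invisible-hand": ["system-2"],
}
discipline_slugs = {"system-1", "system-2"}

print(stale_references(mapping_references, discipline_slugs))
```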

### Key functions

- `resolve_discipline_path(binding, root)` — resolve to absolute path
- `load_discipline_config(binding, root)` — load discipline's `infospace.yaml`
- `check_discipline_status(binding, root)` — full status with viability
- `get_discipline_entities(binding, root)` — entity list from discipline
- `find_stale_mappings(config, root, mapping_references)` — detect stale refs
- `bind_discipline(config, name, path, root)` — add binding to config

---

## Evaluation Output Format

Evaluation results use YAML frontmatter + markdown body. Defined in
`markitect/infospace/evaluation.py` and `evaluation_io.py`.

### Per-entity evaluation file

```markdown
---
entity_slug: division-of-labour
evaluator: openrouter/default
evaluated_at: '2026-02-19T10:30:00'
overall_score: 4.1667
scores:
- name: definition_precision
  value: 4.5
  max_value: 5.0
...
---

# Evaluation: Division Of Labour

## definition_precision — 4.5 / 5.0

The definition clearly captures the core concept...
```
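
For readers who want to consume these files outside the package, the following sketch separates the frontmatter from the markdown body using only the standard library. The helper name `split_frontmatter` is hypothetical and unrelated to `evaluation_io.py`; turning the frontmatter text into a dictionary would additionally require a YAML parser (for example `yaml.safe_load` from PyYAML), which is left out here.

```python
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split a document into (frontmatter, body).

    The frontmatter is the text between a leading '---' line and the next
    '---' line. If no such block exists, the frontmatter is empty and the
    whole input is returned as the body.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return frontmatter, body
    # Opening delimiter without a closing one: treat as no frontmatter
    return "", text


sample = """---
entity_slug: division-of-labour
overall_score: 4.1667
---

# Evaluation: Division Of Labour
"""

frontmatter, body = split_frontmatter(sample)
print(frontmatter)
print(body)
```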

### Snapshot

```yaml
snapshot_id: abc12345
created_at: '2026-02-19T10:30:00+00:00'
schema_name: default
entity_count: 85
entity_evaluations: [...]
collection_metrics:
  - name: coverage_ratio
    value: 0.75
    concern: C2
```

---

## State

Runtime state is computed from entities, evaluations, and metrics.
Defined in `markitect/infospace/state.py`.

```python
from markitect.infospace import build_state
state = build_state(config, entities=entities, metrics=metrics)
state.is_viable          # True if all thresholds pass
state.viability_results  # List[ViabilityResult]
state.summary()          # Dict for display
```

---

## CLI Command Summary

All commands are under `markitect infospace`:

| Command | Purpose |
|---------|---------|
| `init` | Create a new `infospace.yaml` |
| `status` | Show entity count, domains, evaluation state |
| `entities` | List entities with metadata |
| `evaluate` | Run per-entity LLM evaluation |
| `check` | Run collection-level quality checks (C1-C5) |
| `viability` | Show viability dashboard |
| `history` | Show metrics history |
| `history-diff` | Compare two snapshots by date |
| `bind-discipline` | Bind an external infospace as a discipline |
| `disciplines` | List bound disciplines and viability |
| `stale-mappings` | Detect stale cross-infospace references |

---

## Platform Dependencies

The infospace tooling builds on these platform modules:

| Module | Used for |
|--------|----------|
| `markitect/llm/` | Embedding adapters, LLM evaluation |
| `markitect/analysis/graph.py` | Graph analysis (networkx wrapper) |
| `markitect/analysis/fca.py` | Formal Concept Analysis |
| `markitect/prompts/execution/batch.py` | Batch LLM evaluation |
| `markitect/prompts/dependencies/models.py` | DependencyGraph |