docs: metrics methodology, collection-level tasks, and infospace tooling roadmap

Add METRICS-METHODOLOGY.md documenting the theoretical frameworks (SEQUAL, OntoClean, OOPS!, OntoQA, FCA, DSL principles) adapted for two-layer evaluation (LLM-Eval + deterministic aggregation) across five collection concerns: redundancy, coverage, coherence, consistency, and granularity balance.

Extend INFRA-TASKS.md with assignment assessment (tasks 4-7), per-concept metrics (tasks 8-12), and collection-level metrics (tasks 13-19).

Add roadmap/infospace-tooling/PLAN.md defining terminology (infospace, topic, discipline, entity, evaluation, viability) and a three-stage implementation plan: Stage 1 platform additions, Stage 2 infospace tooling layer, Stage 3 example revision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 23:53:21 +01:00
parent 2f0989f9bf
commit 4ce856d4d0
3 changed files with 1632 additions and 0 deletions
--- a/roadmap/infospace-tooling/PLAN.md
+++ b/roadmap/infospace-tooling/PLAN.md
@@ -0,0 +1,621 @@
+# Viable Infospace Tooling — Roadmap
+
+## Vision
+
+An **infospace** is a structured, evaluable, composable collection of
+concepts that explains a **topic** through the lens of one or more
+**disciplines**. Infospaces are the unit of knowledge work in MarkiTect.
+
+This roadmap organises the work needed to move from the current
+ad-hoc example (`infospace-with-history`) to a general-purpose platform
+for creating, evaluating, maintaining, and composing infospaces.
+
+---
+
+## Terminology
+
+These terms establish the vocabulary for infospace tooling. They
+generalise from the Wealth of Nations / VSM example but are not
+specific to it.
+
+### Infospace
+
+A curated, self-describing collection of **entities** (concepts,
+mechanisms, observations) that together explain a **topic**. An
+infospace has:
+
+- A **topic** — the subject matter being explained (e.g. "The Wealth
+  of Nations", "cellular biology", "Kubernetes networking")
+- One or more **disciplines** — external frameworks applied as lenses
+  (e.g. "Viable System Model", "category theory")
+- **Entities** — the atomic units of knowledge, each with a definition,
+  provenance, and quality scores
+- **Schemas** — structural templates that define what a well-formed
+  entity, mapping, or analysis looks like
+- **Evaluations** — per-entity and collection-level quality assessments
+- **Metrics** — quantitative indicators of redundancy, coverage,
+  coherence, consistency, and granularity balance
+
+An infospace is **viable** when it meets threshold scores across its
+defined metrics — it is fit for purpose as an explanatory tool.
+
+### Topic
+
+The subject matter an infospace is built to explain. A topic sits
+within a **domain** (broader field of knowledge) but is more specific:
+
+- Domain: Economics → Topic: The Wealth of Nations
+- Domain: Systems Theory → Topic: Viable System Model
+- Domain: Computer Science → Topic: Distributed consensus protocols
+
+A topic provides the **source material** — the texts, data, or
+observations from which entities are extracted.
+
+### Discipline
+
+A reusable framework of concepts applied as a lens to explore a topic.
+A discipline is itself an infospace — one that has been evaluated as
+viable and packaged for reuse.
+
+In our example, the VSM is the discipline: a set of concepts (Systems
+1-5, recursion, variety, viability) from systems theory, applied to the
+economic concepts in Smith's work.
+
+**Key property:** Disciplines compose. An infospace built with one
+discipline can itself become a discipline for another infospace. The
+Wealth of Nations infospace, viewed through VSM, could become a
+discipline applied to a modern supply chain analysis.
+
+### Entity
+
+The atomic unit of an infospace. An entity has:
+
+- **Identity**: a unique slug and human-readable title
+- **Definition**: a precise, non-circular explanation
+- **Provenance**: the source chapter, passage, and extraction context
+- **Domain placement**: which area of the topic it belongs to
+- **Discipline mapping**: how it connects to the applied discipline
+  (e.g. which VSM system)
+- **Quality scores**: per-entity LLM-evaluated metrics
+- **Lifecycle state**: active, archived (with reason), or draft
+
+### Evaluation
+
+A structured assessment of quality, applied at two levels:
+
+- **Per-entity evaluation**: scores an individual entity against
+  quality rubrics defined in its schema (definition precision, source
+  grounding, discipline relevance, etc.)
+- **Collection evaluation**: scores the entity set as a whole against
+  five concerns: redundancy, coverage, coherence, consistency, and
+  granularity balance
+
+Evaluations are always performed by **delegated LLM calls** through
+MarkiTect's LLM integration — never by the coding agent working on
+infrastructure. This separation ensures that domain-level judgment
+stays in the problem space, not the tooling space.
+
+### Viability
+
+An infospace is viable when:
+
+1. Its entities individually meet quality thresholds (per-entity eval)
+2. Its collection metrics are within acceptable ranges
+3. It can answer its defined **competency questions** — the canonical
+   queries the infospace is meant to support
+4. It has been evaluated recently enough that metrics reflect current
+   content
+
+Viability is not binary — it is a profile of scores that the user
+sets thresholds for based on their needs.
+
+---
+
+## Architecture: Three Layers
+
+```
+┌──────────────────────────────────────────────────┐
+│  Layer 3: Infospace Instances                    │
+│  Specific infospaces built by users              │
+│  (Wealth of Nations + VSM, supply chain + ...)   │
+│  Works IN an infospace                           │
+├──────────────────────────────────────────────────┤
+│  Layer 2: Infospace Tooling                      │
+│  Terminology, primitives, composition model      │
+│  CLI: infospace create/evaluate/compose/...      │
+│  Works WITH infospaces                           │
+├──────────────────────────────────────────────────┤
+│  Layer 1: MarkiTect Platform                     │
+│  Artifacts, prompts, LLM, spaces, graph, embed   │
+│  Provides FOR infospaces                         │
+└──────────────────────────────────────────────────┘
+```
+
+### Boundary condition: LLM delegation
+
+All LLM-based evaluation (entity scoring, pairwise judgments, coverage
+analysis) is delegated to MarkiTect's LLM integration module. The coding
+agent that works on infrastructure never makes domain-level judgments
+itself. This keeps a clean separation:
+
+- **Coding agent** → writes Python, templates, schemas, tests
+- **MarkiTect LLM** → evaluates entities, judges redundancy, assesses
+  coverage, checks consistency
+
+The infospace tooling (Layer 2) orchestrates these LLM calls through
+prompt templates and the prompt execution engine, not through ad-hoc
+prompting.
+
+---
+
+## Stage 1: MarkiTect Platform Additions
+
+Infrastructure that must exist before infospace tooling can be built.
+These are general-purpose platform capabilities, not infospace-specific.
+
+### S1.1 — Entity metadata parser
+
+Add a deterministic markdown parser that extracts structured metadata
+from entity files: H1 title, sections present, word counts, domain,
+source chapter. Returns a dataclass usable by all downstream metrics.
+
+**Maps to:** INFRA-TASKS #13, #10
+**Location:** `markitect/prompts/quality/` or new `markitect/analysis/`
+**Depends on:** Nothing — can start immediately
+**Deliverable:** `parse_entity_metadata(path) -> EntityMeta` function
+with tests
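
A minimal sketch, not part of the committed plan, of what `parse_entity_metadata` might look like. The `EntityMeta` field names are assumptions, as is the file convention used here: sections are H2 headings, and domain and source chapter appear as `**Key:** value` lines. Neither is fixed by the text above.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class EntityMeta:
    """Hypothetical shape; the plan does not fix the field set."""
    title: str
    sections: list[str] = field(default_factory=list)
    word_counts: dict[str, int] = field(default_factory=dict)
    domain: Optional[str] = None
    source_chapter: Optional[str] = None


def parse_entity_text(text: str) -> EntityMeta:
    """Extract title, H2 sections, per-section word counts and key/value fields."""
    title = ""
    sections: list[str] = []
    word_counts: dict[str, int] = {}
    fields: dict[str, str] = {}
    current = None  # name of the H2 section currently being read
    for line in text.splitlines():
        if line.startswith("# ") and not title:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections.append(current)
            word_counts[current] = 0
        else:
            # Assumed convention: "**Domain:** value" style metadata lines.
            match = re.match(r"\*\*(.+?):\*\*\s*(.+)", line)
            if match:
                fields[match.group(1).strip().lower()] = match.group(2).strip()
            elif current is not None:
                word_counts[current] += len(line.split())
    return EntityMeta(title, sections, word_counts,
                      fields.get("domain"), fields.get("source chapter"))


def parse_entity_metadata(path) -> EntityMeta:
    """Path-based wrapper matching the deliverable's signature."""
    return parse_entity_text(Path(path).read_text(encoding="utf-8"))
```

Splitting the text-level parser from the path wrapper keeps the logic testable without touching the filesystem.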
+
+### S1.2 — Schema compliance validator
+
+Deterministic validation of entity/mapping files against their schemas:
+section presence, word count ranges, heading format, enum values. No
+LLM needed.
+
+**Maps to:** INFRA-TASKS #10
+**Location:** `markitect/prompts/quality/validator.py` (extend existing)
+**Depends on:** S1.1
+**Deliverable:** `validate_document(path, schema) -> ValidationResult`
+with tests
+
+### S1.3 — Embedding adapter
+
+Add embedding support to `markitect/llm/`. Needs:
+
+- `EmbeddingAdapter` interface: `embed(texts: list[str]) -> list[list[float]]`
+- `OpenRouterEmbeddingAdapter` implementation (or OpenAI embedding endpoint)
+- Caching layer: store embeddings keyed by `{slug: content_digest}` so
+  unchanged entities skip re-embedding
+- Cosine similarity utility: `similarity_matrix(embeddings) -> np.ndarray`
+
+**Maps to:** INFRA-TASKS #14 (prerequisite)
+**Location:** `markitect/llm/embeddings.py`
+**Depends on:** Nothing — can start immediately
+**Deliverable:** Embedding adapter + cache + similarity computation, with
+tests
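
As an illustration only (not part of the commit), the caching and similarity pieces could be sketched as below. `CachedEmbedder` and `content_digest` are hypothetical names, the embedding call is injected as a plain callable rather than a real provider client, and the matrix is returned as nested lists to stay dependency-free, whereas the plan's signature returns `np.ndarray`.

```python
import hashlib
import math
from typing import Callable

Vector = list[float]


def content_digest(text: str) -> str:
    """Stable fingerprint of an entity's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class CachedEmbedder:
    """Wraps any embed(texts) callable and re-embeds only changed entities."""

    def __init__(self, embed: Callable[[list[str]], list[Vector]]):
        self._embed = embed
        self._cache: dict[str, tuple[str, Vector]] = {}  # slug -> (digest, vector)

    def embed_entities(self, entities: dict[str, str]) -> dict[str, Vector]:
        # An entity is stale when it is new or its digest differs from the cached one.
        stale = [slug for slug, text in entities.items()
                 if self._cache.get(slug, ("", []))[0] != content_digest(text)]
        if stale:
            vectors = self._embed([entities[slug] for slug in stale])
            for slug, vector in zip(stale, vectors):
                self._cache[slug] = (content_digest(entities[slug]), vector)
        return {slug: self._cache[slug][1] for slug in entities}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(embeddings: list[Vector]) -> list[list[float]]:
    """Pairwise cosine similarities, row i against column j."""
    return [[cosine(a, b) for b in embeddings] for a in embeddings]
```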
+
+### S1.4 — Graph analysis utilities
+
+The existing `DependencyGraph` supports basic traversal and cycle
+detection. Collection-level metrics need richer analysis:
+
+- Connected components
+- Betweenness centrality
+- Community detection (Louvain or label propagation)
+- Modularity score
+- Degree distribution
+- Cohesion/coupling computation
+
+Decide: extend `DependencyGraph` or add a lightweight wrapper that
+converts to networkx (adding it as an optional dependency).
+
+**Maps to:** INFRA-TASKS #16 (prerequisite)
+**Location:** `markitect/prompts/dependencies/analysis.py` or new
+`markitect/analysis/graph.py`
+**Depends on:** Nothing — can start immediately
+**Deliverable:** Graph analysis functions with tests
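
To give a feel for the simpler items on this list, here is a dependency-free sketch of connected components and degree distribution. It is not part of the commit and works on a plain adjacency dict, because the `DependencyGraph` API is not shown here; `build_graph` is a hypothetical helper. Centrality and community detection are the parts where a networkx wrapper would likely pay off.

```python
from collections import Counter, deque

Graph = dict[str, set[str]]  # undirected adjacency: node -> neighbours


def build_graph(nodes, edges) -> Graph:
    """Build an undirected adjacency map; isolated nodes are kept."""
    graph: Graph = {node: set() for node in nodes}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_components(graph: Graph) -> list[set[str]]:
    """Breadth-first search from every node not yet assigned to a component."""
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in graph:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        components.append(component)
    return components


def degree_distribution(graph: Graph) -> dict[int, int]:
    """Map each degree to the number of nodes having that degree."""
    return dict(Counter(len(neighbours) for neighbours in graph.values()))
```

A component count above 1 is what the `coherence_components: { max: 1 }` threshold in the S3.1 example would flag.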
+
+### S1.5 — Structured evaluation output
+
+Define a standard format for evaluation results: YAML front-matter +
+markdown body. Add utilities for:
+
+- Writing evaluation results (per-entity, per-pair, collection-level)
+- Reading/parsing evaluation results back into dataclasses
+- Appending timestamped snapshots to a history file
+- Diffing two snapshots
+
+**Maps to:** INFRA-TASKS #11, #12
+**Location:** `markitect/prompts/quality/` or `markitect/analysis/`
+**Depends on:** S1.1
+**Deliverable:** `EvaluationResult` model + read/write utilities with
+tests
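
The snapshot-diffing utility could be as small as the sketch below (illustrative, not part of the commit). It assumes a snapshot has already been parsed into a flat `metric name -> value` mapping; `MetricChange` and `diff_snapshots` are hypothetical names, and the YAML front-matter reading and writing is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricChange:
    old: Optional[float]  # None when the metric is new in the later snapshot
    new: Optional[float]  # None when the metric was dropped

    @property
    def delta(self) -> Optional[float]:
        if self.old is None or self.new is None:
            return None
        return self.new - self.old


def diff_snapshots(old: dict, new: dict) -> dict:
    """Return only the metrics whose value differs between two snapshots."""
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = MetricChange(before, after)
    return changes
```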
+
+### S1.6 — Batch LLM evaluation orchestrator
+
+A pipeline component that runs an evaluation prompt template against a
+batch of entities (or entity pairs), collecting structured results.
+Must handle:
+
+- Rate limiting and retry (reuse existing adapter logic)
+- Progress reporting
+- Incremental evaluation (skip entities whose content hasn't changed
+  since last eval)
+- Result aggregation
+
+This is the mechanism by which infospace tooling delegates LLM work
+to the platform.
+
+**Maps to:** INFRA-TASKS #9 (prerequisite)
+**Location:** `markitect/prompts/execution/batch.py`
+**Depends on:** S1.5
+**Deliverable:** `BatchEvaluator` class with tests
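
A rough sketch of the batch loop, covering incremental skipping, progress reporting, and result aggregation. It is not part of the commit: `run_batch` and `BatchReport` are hypothetical names, the LLM call is an injected `evaluate(slug, content)` callable returning a single score, and rate limiting and retry are omitted because the plan reuses the existing adapter logic for those.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BatchReport:
    evaluated: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)
    scores: dict[str, float] = field(default_factory=dict)

    @property
    def mean_score(self) -> float:
        return sum(self.scores.values()) / len(self.scores) if self.scores else 0.0


def run_batch(entities, previous, evaluate, on_progress=None) -> BatchReport:
    """Evaluate changed entities only.

    entities: slug -> content
    previous: slug -> (digest, score) from the last run; updated in place
    """
    report = BatchReport()
    for done, (slug, content) in enumerate(entities.items(), start=1):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        cached = previous.get(slug)
        if cached and cached[0] == digest:
            # Content unchanged since the last evaluation: reuse the old score.
            report.skipped.append(slug)
            score = cached[1]
        else:
            score = evaluate(slug, content)
            previous[slug] = (digest, score)
            report.evaluated.append(slug)
        report.scores[slug] = score
        if on_progress:
            on_progress(done, len(entities))
    return report
```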
+
+### S1.7 — FCA computation
+
+Formal Concept Analysis: build a formal context (entity × attribute
+matrix), compute the concept lattice, extract gap concepts. Either
+implement a minimal FCA algorithm or integrate a library.
+
+**Maps to:** INFRA-TASKS #15 (prerequisite)
+**Location:** `markitect/analysis/fca.py`
+**Depends on:** S1.1
+**Deliverable:** `FormalContext`, `ConceptLattice`, `find_gap_concepts()`
+with tests
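
For the "minimal FCA algorithm" option, one possible starting point is sketched below (not part of the commit). It enumerates all formal concepts by closing the set of object intents under intersection, which is adequate for small contexts but not tuned for large ones. The function name is hypothetical, and gap-concept extraction is left out because the plan does not define it here.

```python
def formal_concepts(context: dict[str, set[str]]):
    """Enumerate (extent, intent) pairs of a formal context.

    context maps each entity (object) to its set of attributes.
    Every concept intent is either the full attribute set or an
    intersection of object intents, so we close under intersection.
    """
    all_attributes = frozenset().union(*context.values()) if context else frozenset()
    intents = {all_attributes}
    for attributes in context.values():
        row = frozenset(attributes)
        # Intersect the new row with every intent found so far.
        intents |= {row & intent for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every entity that has all attributes of the intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]), sorted(c[1])))
```

With two entities sharing one attribute, the lattice has four concepts: the empty extent with all attributes, one concept per entity, and the shared-attribute concept covering both.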
+
+### Summary: Stage 1 dependency graph
+
+```
+S1.1 Entity metadata parser ──┬── S1.2 Schema validator
+                               ├── S1.5 Eval output format ── S1.6 Batch evaluator
+                               └── S1.7 FCA computation
+
+S1.3 Embedding adapter ──────── (independent)
+S1.4 Graph analysis ─────────── (independent)
+```
+
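+S1.1, S1.3, and S1.4 have no dependencies and can proceed in parallel.
+S1.6 (batch evaluator) sits at the end of the longest chain
+(S1.1 → S1.5 → S1.6), so it is the last Stage 1 piece to land before
+the evaluation work in Stage 2 (S2.3, S2.4) can begin.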
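
The dependency graph above can be checked mechanically. The sketch below (not part of the commit) encodes the Stage 1 `Depends on` lines and uses the standard-library `graphlib.TopologicalSorter` to group tasks into batches that can run in parallel; `STAGE1_DEPS` and `parallel_batches` are illustrative names.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on, taken from the Stage 1 sections.
STAGE1_DEPS = {
    "S1.1": set(),
    "S1.2": {"S1.1"},
    "S1.3": set(),
    "S1.4": set(),
    "S1.5": {"S1.1"},
    "S1.6": {"S1.5"},
    "S1.7": {"S1.1"},
}


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all its dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(parallel_batches(STAGE1_DEPS))
# [['S1.1', 'S1.3', 'S1.4'], ['S1.2', 'S1.5', 'S1.7'], ['S1.6']]
```

The three waves correspond to Phases A, B, and C in the Implementation Order section.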
+
+---
+
+## Stage 2: Infospace Tooling
+
+The user-facing layer that provides documented primitives for working
+with infospaces. Built on top of Stage 1 infrastructure and the existing
+`markitect/spaces/` module.
+
+### S2.1 — Infospace model and configuration
+
+Define the `Infospace` as a first-class concept that extends the existing
+`InformationSpace` with:
+
+- **Topic declaration**: name, domain, source material reference
+- **Discipline bindings**: which external infospaces are applied as lenses
+- **Schema registry**: which schemas govern entity structure
+- **Competency questions**: what the infospace should be able to answer
+- **Viability thresholds**: minimum acceptable metric scores
+- **Evaluation state**: latest per-entity and collection scores
+
+Configuration format: an `infospace.yaml` (or section in existing config)
+that declares all of the above.
+
+**Location:** new `markitect/infospace/` package
+**Depends on:** S1.1, S1.5, existing `markitect/spaces/`
+**Deliverable:** `InfospaceConfig`, `InfospaceState` models + loader
+
+### S2.2 — Infospace lifecycle commands
+
+CLI commands for the core lifecycle:
+
+```bash
+# Initialise a new infospace
+markitect infospace init --topic "Wealth of Nations" \
+  --domain "Economics" \
+  --discipline vsm-framework
+
+# Show infospace status (entity count, eval state, viability)
+markitect infospace status
+
+# List entities with quality summary
+markitect infospace entities [--sort-by score|domain|chapter]
+
+# Show viability dashboard
+markitect infospace viability
+```
+
+These commands read the `infospace.yaml` config and present information
+from the metadata index and evaluation results.
+
+**Location:** `markitect/infospace/cli.py` integrated into main CLI
+**Depends on:** S2.1
+**Deliverable:** CLI commands with help text and tests
+
+### S2.3 — Per-entity evaluation primitives
+
+Prompt templates and CLI commands for evaluating individual entities:
+
+```bash
+# Evaluate all entities
+markitect infospace evaluate --provider openrouter
+
+# Evaluate entities from a specific chapter
+markitect infospace evaluate --chapter book-1-chapter-05 --provider openrouter
+
+# Re-evaluate a single entity
+markitect infospace evaluate --entity division-of-labour --provider openrouter
+```
+
+Uses the batch evaluator (S1.6) to run the evaluate-entity prompt
+template (defined in the infospace's schema directory) against entities.
+Writes structured results to `output/evaluations/`.
+
+**Maps to:** INFRA-TASKS #8, #9
+**Location:** `markitect/infospace/evaluation.py`
+**Depends on:** S1.6, S2.1
+**Deliverable:** Per-entity evaluation pipeline + CLI + prompt template
+
+### S2.4 — Collection-level checks
+
+CLI commands for each of the five collection concerns:
+
+```bash
+# Run all collection checks
+markitect infospace check --provider openrouter
+
+# Run specific checks
+markitect infospace check redundancy --provider openrouter
+markitect infospace check coverage --provider openrouter
+markitect infospace check coherence --provider openrouter
+markitect infospace check consistency --provider openrouter
+markitect infospace check granularity --provider openrouter
+```
+
+Each check uses Stage 1 infrastructure (embeddings, graph analysis, FCA)
+and delegates LLM judgment to the platform. Results are written to
+`output/metrics/` as per-concern reports plus a unified `metrics.yaml`.
+
+**Maps to:** INFRA-TASKS #14-19
+**Location:** `markitect/infospace/checks/` (one module per concern)
+**Depends on:** S1.3, S1.4, S1.6, S1.7, S2.1
+**Deliverable:** Five check modules + unified orchestrator + CLI
+
+### S2.5 — Metrics history and viability tracking
+
+Track metrics over time. After each evaluation or check run, append a
+timestamped snapshot to `metrics-history.yaml`. Provide commands to
+review trends:
+
+```bash
+# Show metrics history
+markitect infospace history
+
+# Compare two snapshots
+markitect infospace history diff 2026-02-18 2026-03-01
+
+# Check viability against thresholds
+markitect infospace viability
+```
+
+Viability is assessed by comparing current metrics to the thresholds
+declared in `infospace.yaml`. The report shows a simple pass/fail per
+metric alongside the actual value.
+
+**Maps to:** INFRA-TASKS #12
+**Location:** `markitect/infospace/history.py`
+**Depends on:** S2.4, S1.5
+**Deliverable:** History tracking + viability assessment + CLI
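
The threshold comparison might be sketched as follows (illustrative, not part of the commit). It assumes thresholds arrive as `{"min": ...}` / `{"max": ...}` mappings, mirroring the `viability` block in the S3.1 example, and treats a metric with no recorded value as failing, which is a design assumption rather than something the plan states. `check_viability` is a hypothetical name.

```python
def check_viability(metrics: dict, thresholds: dict) -> dict:
    """Return metric name -> (passed, actual value) for every declared threshold."""
    report = {}
    for name, bounds in thresholds.items():
        value = metrics.get(name)
        passed = value is not None  # assumption: a missing metric fails
        if passed and "min" in bounds:
            passed = value >= bounds["min"]
        if passed and "max" in bounds:
            passed = value <= bounds["max"]
        report[name] = (passed, value)
    return report


def is_viable(report: dict) -> bool:
    """Overall verdict: every thresholded metric must pass."""
    return all(passed for passed, _ in report.values())
```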
+
+### S2.6 — Infospace composition model
+
+The mechanism by which one infospace is applied as a discipline to
+another. Builds on `markitect/spaces/composability/`:
+
+- **Discipline binding**: declare that infospace A uses infospace B as a
+  discipline. B's entities become available as mapping targets.
+- **Cross-infospace references**: entity in A maps to concept in B using
+  the same mapping schema and evaluation pipeline.
+- **Discipline viability requirement**: B must be viable (meets its own
+  thresholds) before it can be used as a discipline for A.
+- **Cascading evaluation**: when B's entities change, A's mappings that
+  reference them are flagged for re-evaluation.
+
+```bash
+# Bind a discipline to the current infospace
+markitect infospace bind-discipline ./path/to/vsm-infospace
+
+# List bound disciplines and their viability
+markitect infospace disciplines
+
+# Check for stale mappings after discipline update
+markitect infospace check stale-mappings
+```
+
+**Location:** `markitect/infospace/composition.py`
+**Depends on:** S2.1, existing `markitect/spaces/composability/`
+**Deliverable:** Composition model + CLI + documentation
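
Cascading evaluation reduces to a comparison between what a mapping saw and what the discipline contains now. A minimal sketch, not part of the commit, assuming each mapping records a digest of the discipline entity it was evaluated against (the plan does not say how staleness is tracked, and `stale_mappings` is a hypothetical name):

```python
def stale_mappings(mappings: dict, recorded: dict, current: dict) -> list:
    """List entities in A whose mapping target in B changed or disappeared.

    mappings: A-entity slug -> B-entity slug it maps to
    recorded: B-entity slug -> digest seen when the mapping was evaluated
    current:  B-entity slug -> digest of B's content now
    """
    stale = []
    for entity, target in mappings.items():
        # A missing target on either side also counts as stale.
        if target not in current or recorded.get(target) != current[target]:
            stale.append(entity)
    return sorted(stale)
```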
+
+### S2.7 — Documentation: Infospace Primitives Reference
+
+A reference document explaining all primitives, their purpose, and how
+they compose. This is the user-facing documentation for the infospace
+tooling layer — the equivalent of a framework guide.
+
+**Location:** `docs/infospace-primitives.md` or in-CLI help
+**Depends on:** S2.1-S2.6
+**Deliverable:** Reference documentation
+
+### Summary: Stage 2 dependency graph
+
+```
+S2.1 Model & config ──┬── S2.2 Lifecycle CLI
+                       ├── S2.3 Per-entity evaluation
+                       ├── S2.4 Collection checks ── S2.5 History & viability
+                       └── S2.6 Composition model
+
+S2.7 Documentation (depends on all above)
+```
+
+---
+
+## Stage 3: Example Revision
+
+Revisit the Wealth of Nations / VSM example using the new tooling.
+The example becomes both a tutorial and a validation of the tooling.
+
+### S3.1 — Migrate example to infospace configuration
+
+Replace the ad-hoc `process_chapters.py` setup with a declarative
+`infospace.yaml`:
+
+```yaml
+topic:
+  name: "The Wealth of Nations"
+  domain: "Classical Economics"
+  sources: artifacts/sources/
+
+disciplines:
+  - name: "Viable System Model"
+    path: artifacts/vsm-reference/
+
+schemas:
+  entity: schemas/economic-entity-schema-v1.0.md
+  mapping: schemas/vsm-mapping-schema-v1.0.md
+  analysis: schemas/chapter-analysis-schema-v1.0.md
+
+competency_questions: schemas/competency-questions.md
+
+viability:
+  redundancy_ratio: { max: 0.05 }
+  coverage_ratio: { min: 0.60 }
+  coherence_components: { max: 1 }
+  consistency_cycles: { max: 0 }
+  granularity_entropy: { min: 1.0 }
+  per_entity_mean: { min: 3.5 }
+
+pipeline:
+  stages:
+    - template: extract-entities
+      spaces: [sources, guidelines, vsm-reference, entities]
+    - template: map-to-vsm
+      spaces: [entities, vsm-reference, guidelines]
+    - template: synthesize-analysis
+      spaces: [sources, entities, mappings, vsm-reference]
+  post_batch:
+    - template: assess-metrics
+      spaces: [analyses, vsm-reference]
+```
+
+**Depends on:** S2.1
+**Deliverable:** `infospace.yaml` + migration of `process_chapters.py` to
+use infospace tooling APIs
+
+### S3.2 — Clean per-chapter git history
+
+Re-run all processed chapters (and remaining ones) with per-chapter
+commits on a clean branch, then replace the current tangled history.
+
+**Maps to:** INFRA-TASKS #4, #7
+**Depends on:** S3.1
+**Deliverable:** Clean branch with one commit per chapter
+
+### S3.3 — Full evaluation run
+
+Run all per-entity evaluations and collection checks on the completed
+infospace. Establish baseline metrics. Demonstrate the viability
+dashboard.
+
+**Maps to:** INFRA-TASKS #6
+**Depends on:** S2.3, S2.4, S2.5, S3.2
+**Deliverable:** Complete evaluation results + viability report
+
+### S3.4 — Rewrite tutorial
+
+Update `TUTORIAL.md` to use infospace tooling commands instead of
+raw `process_chapters.py` invocations. The tutorial should walk
+through:
+
+1. Initialising an infospace (`markitect infospace init`)
+2. Defining schemas and competency questions
+3. Processing chapters (pipeline execution)
+4. Evaluating entities (`markitect infospace evaluate`)
+5. Running collection checks (`markitect infospace check`)
+6. Reviewing viability (`markitect infospace viability`)
+7. Iterating: refining guidelines, re-processing, re-evaluating
+8. Using the infospace as a discipline for a new project
+
+**Depends on:** S3.1-S3.3
+**Deliverable:** Revised `TUTORIAL.md`
+
+### S3.5 — Demonstrate composition
+
+Create a minimal second infospace (e.g. a modern supply chain case
+study or a different economic text) that binds the Wealth of Nations
+infospace as a discipline. Demonstrates the composition model from S2.6.
+
+**Depends on:** S2.6, S3.3
+**Deliverable:** Second example infospace + composition tutorial section
+
+---
+
+## Task Mapping
+
+Cross-reference between INFRA-TASKS numbers and roadmap stages:
+
+| INFRA-TASK | Description | Stage |
+|------------|-------------|-------|
+| 1-3 | Infra fixes (resolved) | — |
+| 4 | Per-chapter git history | S3.2 |
+| 5 | Prompt file side-effects | S1.6 (batch eval avoids this) |
+| 6 | Stale metrics | S3.3 |
+| 7 | Remaining 28 chapters | S3.2 |
+| 8 | Per-concept quality metrics in schema | S2.3 |
+| 9 | Evaluate-entity prompt template | S2.3 |
+| 10 | Deterministic schema compliance | S1.2 |
+| 11 | Structured metrics output | S1.5 |
+| 12 | Metrics-over-time tracking | S2.5 |
+| 13 | Entity metadata index | S1.1 |
+| 14 | Redundancy detection (C1) | S2.4 |
+| 15 | Coverage completeness (C2) | S2.4 |
+| 16 | Structural coherence (C3) | S2.4 |
+| 17 | Definitional consistency (C4) | S2.4 |
+| 18 | Granularity balance (C5) | S2.4 |
+| 19 | Unified collection evaluation | S2.4 |
+
+---
+
+## Implementation Order
+
+Recommended sequence, accounting for dependencies and value delivery:
+
+**Phase A — Foundation (Stage 1, parallelisable)**
+1. S1.1 Entity metadata parser
+2. S1.3 Embedding adapter
+3. S1.4 Graph analysis utilities
+
+**Phase B — Validation & Output (Stage 1)**
+4. S1.2 Schema compliance validator (needs S1.1)
+5. S1.5 Structured evaluation output (needs S1.1)
+6. S1.7 FCA computation (needs S1.1)
+
+**Phase C — Orchestration (Stage 1 → Stage 2 bridge)**
+7. S1.6 Batch LLM evaluation orchestrator (needs S1.5)
+
+**Phase D — Infospace Core (Stage 2)**
+8. S2.1 Infospace model and configuration
+9. S2.2 Lifecycle commands
+10. S2.3 Per-entity evaluation primitives (needs S1.6, S2.1)
+
+**Phase E — Collection Intelligence (Stage 2)**
+11. S2.4 Collection-level checks (needs S1.3, S1.4, S1.6, S1.7, S2.1)
+12. S2.5 Metrics history and viability tracking
+
+**Phase F — Composition (Stage 2)**
+13. S2.6 Infospace composition model
+14. S2.7 Documentation
+
+**Phase G — Example (Stage 3)**
+15. S3.1 Migrate example to infospace config
+16. S3.2 Clean per-chapter history
+17. S3.3 Full evaluation run
+18. S3.4 Rewrite tutorial
+19. S3.5 Demonstrate composition