19 Commits

Author SHA1 Message Date
3ac8447c10 feat(example): add baseline metrics snapshot from collection checks run
Initial metrics from S2.4 checks on 85 entities (7 of 35 chapters):
coverage_ratio=0.361, redundancy=0.0, coherence_components=0.0,
consistency_cycles=0.0, granularity_entropy=2.69

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 07:44:01 +01:00
94cb2063af feat(example): migrate to infospace config with tooling integration (S3.1)
Add infospace.yaml declaring topic, disciplines, schemas, viability
thresholds. Integrate infospace tooling into process_chapters.py with
--infospace-status, --infospace-check, and --infospace-viability flags.

Initial check: 85 entities, 4/5 viable (coverage 0.36 < 0.50 — only
7/35 chapters processed so far).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:29:53 +01:00
d1c6e53754 docs: add infospace primitives reference (S2.7)
Reference document covering all infospace tooling primitives: config,
entity metadata, schema validation, per-entity evaluation, collection
checks, metrics history, viability, composition, and CLI commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:05:09 +01:00
b76d6d38c1 feat(infospace): add composition model for discipline binding (S2.6)
Discipline resolution, viability checking, entity access, stale
mapping detection, and binding management. CLI commands: bind-discipline,
disciplines, stale-mappings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:03:54 +01:00
ce7f78d57d feat(infospace): add metrics history and viability tracking (S2.5)
History module with snapshot creation from check results, metrics file
I/O, auto-append to history after checks, date-based snapshot lookup,
and metric trend extraction. CLI commands: history, history-diff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:01:00 +01:00
11585e6968 feat(infospace): add collection-level quality checks C1–C5 (S2.4)
Five concern checks: Redundancy (embedding/word overlap), Coverage
(FCA gap analysis), Coherence (graph connectivity), Consistency
(cycle detection), Granularity (Shannon entropy). Orchestrator runs
all or selected checks, CLI `markitect infospace check` command added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:54:22 +01:00
3461d2f354 feat(infospace): add per-entity evaluation pipeline and CLI command (S2.3)
Evaluation pipeline builds prompts from entity metadata, delegates
to BatchEvaluator, parses structured LLM responses into ScoreEntry
objects, and writes evaluation files. CLI: 'markitect infospace evaluate'
with --provider, --entity, --chapter filters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:48:34 +01:00
3726503adb feat(infospace): add lifecycle CLI commands — init, status, entities, viability (S2.2)
Adds 'markitect infospace' command group with init (create config),
status (entity count/domains/disciplines), entities (list with sort),
and viability (threshold dashboard with pass/fail).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:46:54 +01:00
b20fe4db68 feat(infospace): add infospace configuration model and state (S2.1)
InfospaceConfig (topic, disciplines, schemas, competency questions,
viability thresholds, pipeline) with YAML load/save and directory
discovery. InfospaceState aggregates entities, evaluations, and
viability checks for status reporting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:44:14 +01:00
144a88c0c2 feat(prompts): add batch LLM evaluation orchestrator (S1.6)
BatchEvaluator runs evaluation prompts across item batches with
incremental evaluation (skip unchanged via content digest), per-item
error isolation, progress callbacks, and aggregate token usage tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:40:13 +01:00
dc22017b7c feat(analysis): add Formal Concept Analysis for coverage gap detection (S1.7)
Pure-Python FCA implementation: FormalContext (entity × attribute
binary relation with extent/intent/closure), ConceptLattice via
NextClosure algorithm, find_gap_concepts() for structural coverage
gaps, and find_empty_cells() for cross-tabulation analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:38:35 +01:00
f8c9ab33f0 feat(infospace): add structured evaluation output with history and diffing (S1.5)
Add data models (ScoreEntry, EntityEvaluation, EvaluationSnapshot,
SnapshotDiff) and I/O utilities for YAML frontmatter evaluation files,
snapshot persistence, history append, and snapshot diffing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:35:22 +01:00
bad01e32bd feat(analysis): add graph analysis utilities with networkx (S1.4)
Add connected components, betweenness centrality, Louvain community
detection, modularity scoring, degree distribution, and cohesion/coupling
computation. Wraps DependencyGraph via networkx (optional dependency)
for downstream collection-level coherence metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:34:53 +01:00
267368eb60 feat(llm): add embedding adapter with cache and similarity utils (S1.3)
Add OpenAI-compatible embedding support (works with both OpenAI and
OpenRouter), file-based embedding cache with content-digest invalidation,
and pure-Python cosine similarity utilities for downstream redundancy
detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:22:21 +01:00
9031e1162c feat(infospace): add schema compliance validator (S1.2)
Deterministic validation of EntityMeta against declarative schemas:
section presence/word counts, heading format, domain enum values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 00:48:57 +01:00
03c6c5e8de feat(infospace): add entity metadata parser (S1.1)
Extract section-tree algorithm from SchemaGenerator into standalone
core/section_tree.py and build markitect/infospace/ package with
EntityMeta dataclass and parse_entity_file/parse_entity_directory.
Foundation for schema compliance, coverage, and granularity metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 00:27:45 +01:00
b5e994b014 docs: preliminary introduction to Viable Information Spaces
Conceptual overview of infospaces as structured, evaluable, composable
knowledge collections. Establishes the vocabulary (topic, discipline,
entity, viability), the build cycle (extract, map, evaluate, refine),
the five collection quality concerns, and the composition model
(hierarchical, networked, swarm).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 23:54:53 +01:00
4ce856d4d0 docs: metrics methodology, collection-level tasks, and infospace tooling roadmap
Add METRICS-METHODOLOGY.md documenting the theoretical frameworks
(SEQUAL, OntoClean, OOPS!, OntoQA, FCA, DSL principles) adapted for
two-layer evaluation (LLM-Eval + deterministic aggregation) across
five collection concerns: redundancy, coverage, coherence, consistency,
and granularity balance.

Extend INFRA-TASKS.md with assignment assessment (tasks 4-7),
per-concept metrics (tasks 8-12), and collection-level metrics
(tasks 13-19).

Add roadmap/infospace-tooling/PLAN.md defining terminology (infospace,
topic, discipline, entity, evaluation, viability) and a three-stage
implementation plan: Stage 1 platform additions, Stage 2 infospace
tooling layer, Stage 3 example revision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 23:53:21 +01:00
2f0989f9bf docs(infospace): document infospace.db and add to .gitignore
The SQLite artifact database is a derived cache regenerable from
committed files — no LLM calls needed. Added tutorial section
explaining why it is excluded and how to rebuild it after a fresh clone.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 22:27:08 +01:00
62 changed files with 11252 additions and 7 deletions

1
.gitignore vendored

@@ -78,6 +78,7 @@ Thumbs.db
# MarkiTect database files (local development)
markitect.db
**/infospace.db
assets/assets.db
**/assets.db
.markitect/


@@ -0,0 +1,344 @@
# Infospace Primitives Reference
This document describes the primitives provided by the `markitect/infospace/`
package for creating, evaluating, maintaining, and composing infospaces.
---
## Core Concepts
An **infospace** is a structured, evaluable, composable collection of
entities that explains a **topic** through the lens of one or more
**disciplines**.
| Term | Meaning |
|------|---------|
| **Topic** | The subject matter being explained |
| **Discipline** | A reusable framework of concepts applied as an analytical lens |
| **Entity** | The atomic unit of knowledge — slug, definition, provenance, domain |
| **Evaluation** | Per-entity or collection-level quality assessment |
| **Viability** | Whether an infospace meets its threshold scores |
---
## Configuration (`infospace.yaml`)
Every infospace is declared via an `infospace.yaml` file. The configuration
model is defined in `markitect/infospace/config.py`.
### Minimal example
```yaml
topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/
disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/
schemas:
  entity: schemas/economic-entity-schema-v1.0.md
viability:
  coverage_ratio: { min: 0.60 }
  redundancy_ratio: { max: 0.05 }
  per_entity_mean: { min: 3.5 }
```
### Key models
- **`TopicConfig`** — `name`, `domain`, `sources`
- **`DisciplineBinding`** — `name`, `path` (to another infospace directory)
- **`SchemaRegistry`** — `entity`, `mapping`, `analysis` schema paths
- **`ViabilityThreshold`** — `metric`, `min`, `max` bounds
- **`PipelineConfig`** — Ordered list of `PipelineStage` entries
- **`InfospaceConfig`** — Top-level config combining all of the above
### Default directories
| Setting | Default |
|---------|---------|
| `entities_dir` | `output/entities` |
| `evaluations_dir` | `output/evaluations` |
| `metrics_dir` | `output/metrics` |
---
## Entity Metadata
Entities are parsed from markdown files by `markitect/infospace/entity_parser.py`.
**`EntityMeta`** fields: `slug`, `title`, `definition`, `domain`,
`source_chapter`, `context`, `original_wording`, `modern_interpretation`,
`definition_word_count`, `total_word_count`, `section_slugs`.
```python
from pathlib import Path

from markitect.infospace import parse_entity_directory

# Parse every entity markdown file under the default entities directory.
entities = parse_entity_directory(Path("output/entities"))
```
---
## Schema Validation
Deterministic validation of entity files against structural schemas.
```python
from markitect.infospace import validate_entity, ECONOMIC_ENTITY_SCHEMA
result = validate_entity(entity_meta, schema=ECONOMIC_ENTITY_SCHEMA)
print(result.summary())
```
Checks: section presence, word count ranges, heading format, enum values.
---
## Per-entity Evaluation
LLM-based quality assessment of individual entities. Defined in
`markitect/infospace/evaluate.py`.
```bash
# Evaluate all entities
markitect infospace evaluate --provider openrouter
# Single entity
markitect infospace evaluate --entity division-of-labour --provider openrouter
```
### Pipeline functions
- `build_evaluation_prompt(entity, topic, dimensions)` — build the LLM prompt
- `parse_evaluation_response(text, dimensions)` — parse LLM output to `ScoreEntry` list
- `run_entity_evaluation(config, entities, adapter, ...)` — full batch pipeline
Results are written to `output/evaluations/` as YAML frontmatter + markdown.
---
## Collection-level Checks
Five concerns assessed at the collection level. Each has a dedicated
module in `markitect/infospace/checks/`.
| Concern | Module | Key metric |
|---------|--------|------------|
| **C1 — Redundancy** | `redundancy.py` | `redundancy_ratio` |
| **C2 — Coverage** | `coverage.py` | `coverage_ratio` |
| **C3 — Coherence** | `coherence.py` | `coherence_components`, `modularity` |
| **C4 — Consistency** | `consistency.py` | `consistency_cycles` |
| **C5 — Granularity** | `granularity.py` | `granularity_entropy` |
### Orchestrator
```python
from markitect.infospace.checks import run_all_checks
report = run_all_checks(entities, embeddings=emb, graph=g)
metrics = report.metrics() # Dict[str, float]
```
### CLI
```bash
# Run all checks
markitect infospace check
# Run specific concerns
markitect infospace check --concern redundancy --concern coverage
# JSON output
markitect infospace check --json
```
After each check run, metrics are automatically recorded to history.
---
## Metrics History
Timestamped snapshots track metrics over time. Defined in
`markitect/infospace/history.py`.
```bash
# Show history
markitect infospace history
# Trend for a single metric
markitect infospace history --metric coverage_ratio
# Compare two snapshots
markitect infospace history-diff 2026-02-01 2026-03-01
```
### Key functions
- `snapshot_from_checks(report, entity_count)` — create snapshot from check results
- `record_check_results(report, config, root, entity_count)` — save metrics + append to history
- `get_history(config, root)` — read full history
- `metric_trend(history, metric_name)` — extract single metric across time
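To make the idea of a metric trend concrete, here is a minimal sketch of extracting one metric across snapshots. It assumes each snapshot is a plain dict with `created_at` and `metrics` keys; that shape, the name `trend`, and the sample values are illustrative assumptions, not the actual behaviour of `metric_trend()` in `history.py`.

```python
from typing import Any, Dict, List, Tuple


def trend(history: List[Dict[str, Any]], metric_name: str) -> List[Tuple[str, float]]:
    """Return (timestamp, value) pairs, oldest first.

    Snapshots that do not contain the metric are skipped. Sorting relies on
    ISO 8601 timestamps in one consistent format comparing correctly as strings.
    """
    points = [
        (snap["created_at"], snap["metrics"][metric_name])
        for snap in history
        if metric_name in snap.get("metrics", {})
    ]
    return sorted(points)


# Made-up snapshots for illustration only (hypothetical shape and values).
history = [
    {"created_at": "2026-02-19T07:44:01", "metrics": {"coverage_ratio": 0.361}},
    {"created_at": "2026-02-18T12:00:00", "metrics": {"coverage_ratio": 0.25}},
    {"created_at": "2026-02-17T09:00:00", "metrics": {}},
]
coverage_points = trend(history, "coverage_ratio")
```

Skipping snapshots that lack the metric keeps the trend usable when a run executed only a subset of concerns.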
---
## Viability
Viability is assessed by comparing current metrics to thresholds declared
in `infospace.yaml`.
```bash
markitect infospace viability
```
### Threshold model
```yaml
viability:
  coverage_ratio: { min: 0.60 }     # must be >= 0.60
  redundancy_ratio: { max: 0.05 }   # must be <= 0.05
  consistency_cycles: { max: 0 }    # must be exactly 0
```
Each threshold has `min` and/or `max` bounds. A metric passes if it falls
within bounds. An infospace is viable when all thresholds pass.
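The pass/fail rule above can be sketched in a few lines. This is an illustration under stated assumptions: `Threshold` and `check_viability` are hypothetical names standing in for the package's `ViabilityThreshold` model, and treating a missing metric as a failure is a design choice made here, not something the reference specifies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Illustrative stand-in for a viability threshold (hypothetical name)."""
    metric: str
    min: Optional[float] = None
    max: Optional[float] = None

    def passes(self, value: float) -> bool:
        # A bound that is not set never fails the check.
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


def check_viability(thresholds: List[Threshold], metrics: Dict[str, float]) -> Dict[str, bool]:
    """Return pass/fail per threshold; a missing metric counts as a failure (assumption)."""
    return {
        t.metric: (t.metric in metrics and t.passes(metrics[t.metric]))
        for t in thresholds
    }


thresholds = [
    Threshold("coverage_ratio", min=0.60),
    Threshold("redundancy_ratio", max=0.05),
    Threshold("consistency_cycles", max=0),
]
metrics = {"coverage_ratio": 0.361, "redundancy_ratio": 0.0, "consistency_cycles": 0.0}
results = check_viability(thresholds, metrics)
is_viable = all(results.values())  # viable only when every threshold passes
```

With these inputs the coverage threshold fails while the other two pass, so the infospace is not viable.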
---
## Composition
One infospace can use another as a discipline. The composition model is
defined in `markitect/infospace/composition.py`.
### Binding a discipline
```bash
markitect infospace bind-discipline ./path/to/vsm-infospace --name "Viable System Model"
```
This adds a `DisciplineBinding` to `infospace.yaml` and validates the
discipline exists and has an `infospace.yaml`.
### Checking discipline status
```bash
markitect infospace disciplines
```
Shows: name, entity count, viability status, path.
### Viability requirement
A discipline must meet its own viability thresholds to be considered
reliable. The `check_discipline_status()` function loads the discipline's
metrics and runs its own threshold checks.
### Stale mapping detection
```bash
markitect infospace stale-mappings
```
Compares local mapping references against the discipline's current entity
set. If a referenced discipline entity has been removed, the mapping is
flagged as stale.
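At its core, stale detection is a set difference. The sketch below assumes mapping references are available as a dict from local entity slug to the discipline slugs it cites; `stale_references` and the example slugs are hypothetical and do not reflect the signature of `find_stale_mappings()`.

```python
from typing import Dict, Iterable, List, Set


def stale_references(
    mapping_references: Dict[str, Iterable[str]],
    discipline_slugs: Set[str],
) -> Dict[str, List[str]]:
    """Map each local entity to the referenced discipline slugs that no longer exist."""
    stale: Dict[str, List[str]] = {}
    for local_slug, refs in mapping_references.items():
        missing = sorted(set(refs) - discipline_slugs)
        if missing:
            stale[local_slug] = missing
    return stale


# Hypothetical data: the discipline once had "system-3-star" but it was removed.
current = {"system-1", "system-2"}
references = {
    "division-of-labour": ["system-1"],
    "natural-rate": ["system-2", "system-3-star"],
}
flagged = stale_references(references, current)
```

Entities whose references all still exist are left out of the result, so an empty dict means no mapping is stale.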
### Key functions
- `resolve_discipline_path(binding, root)` — resolve to absolute path
- `load_discipline_config(binding, root)` — load discipline's `infospace.yaml`
- `check_discipline_status(binding, root)` — full status with viability
- `get_discipline_entities(binding, root)` — entity list from discipline
- `find_stale_mappings(config, root, mapping_references)` — detect stale refs
- `bind_discipline(config, name, path, root)` — add binding to config
---
## Evaluation Output Format
Evaluation results use YAML frontmatter + markdown body. Defined in
`markitect/infospace/evaluation.py` and `evaluation_io.py`.
### Per-entity evaluation file
```markdown
---
entity_slug: division-of-labour
evaluator: openrouter/default
evaluated_at: '2026-02-19T10:30:00'
overall_score: 4.1667
scores:
- name: definition_precision
  value: 4.5
  max_value: 5.0
...
---
# Evaluation: Division Of Labour
## definition_precision — 4.5 / 5.0
The definition clearly captures the core concept...
```
### Snapshot
```yaml
snapshot_id: abc12345
created_at: '2026-02-19T10:30:00+00:00'
schema_name: default
entity_count: 85
entity_evaluations: [...]
collection_metrics:
- name: coverage_ratio
  value: 0.75
  concern: C2
```
---
## State
Runtime state is computed from entities, evaluations, and metrics.
Defined in `markitect/infospace/state.py`.
```python
from markitect.infospace import build_state
state = build_state(config, entities=entities, metrics=metrics)
state.is_viable # True if all thresholds pass
state.viability_results # List[ViabilityResult]
state.summary() # Dict for display
```
---
## CLI Command Summary
All commands are under `markitect infospace`:
| Command | Purpose |
|---------|---------|
| `init` | Create a new `infospace.yaml` |
| `status` | Show entity count, domains, evaluation state |
| `entities` | List entities with metadata |
| `evaluate` | Run per-entity LLM evaluation |
| `check` | Run collection-level quality checks (C1-C5) |
| `viability` | Show viability dashboard |
| `history` | Show metrics history |
| `history-diff` | Compare two snapshots by date |
| `bind-discipline` | Bind an external infospace as a discipline |
| `disciplines` | List bound disciplines and viability |
| `stale-mappings` | Detect stale cross-infospace references |
---
## Platform Dependencies
The infospace tooling builds on these platform modules:
| Module | Used for |
|--------|----------|
| `markitect/llm/` | Embedding adapters, LLM evaluation |
| `markitect/analysis/graph.py` | Graph analysis (networkx wrapper) |
| `markitect/analysis/fca.py` | Formal Concept Analysis |
| `markitect/prompts/execution/batch.py` | Batch LLM evaluation |
| `markitect/prompts/dependencies/models.py` | DependencyGraph |


@@ -37,3 +37,513 @@ no automatic parsing for this format, requiring manual macro construction.
**Fix applied:** Added `SHORTHAND_PATTERN` to `MacroParser` that recognises
`@{target}` and maps it to `MacroKind.REQUIRED`. Updated `has_macros()`,
`count_macros()`, and `find_macro_positions()` accordingly.
---
## Assignment Assessment (18 Feb 2026)
How the example measures against the objectives stated in `README.md`:
| # | Objective | Status | Notes |
|---|-----------|--------|-------|
| 1 | Capture knowledge from Wealth of Nations | **Partial** | 7 of 35 chapters processed (Book I, ch. 1-7). 85 canonical entities extracted. |
| 2 | Transform to VSM concepts/entities | **Done (for processed chapters)** | Entities mapped to S1-S5 with strength ratings. |
| 3 | Consistent and complete | **Not yet** | Only 20% of chapters done. Metrics report exists but covers limited scope. |
| 4 | Schemas as scaffolding | **Done** | Four schemas defined and used across all stages. |
| 5 | Prompt dependency resolution | **Done** | `@{macro}` templates resolved via MultiSpaceResolutionStrategy. |
| 6 | Incremental chapter injection | **Done** | Pipeline processes one chapter at a time; `@{existing_entities}` prevents duplication. |
| 7 | Keep changes as git history | **Not done** | See task 4 below. |
| 8 | Metrics for completeness/consistency | **Partial** | Template and report exist but only cover 4 chapters (report predates ch. 5-7). |
| 9 | No infrastructure changes during experiment | **Violated** | Three infra fixes were required (tasks 1-3 above). Documented as intended. |
| 10 | Generate task list for infra issues | **Done** | This file. |
## 4. Infospace has no per-chapter git history — OPEN
**Objective:** README states "The information space should utilize the option
of keeping changes as git history."
**Issue:** The 7 processed chapters were committed in mixed batches alongside
infrastructure changes (LLM adapters, entity refactoring, archive policy).
Chapters 1-2 are bundled into `fecc2fd` with the entire LLM module.
Chapters 5-7 share a single commit (`41773f1`) with the OpenAI adapter and
archive policy. There is no commit where you can `git diff` to see exactly
what one chapter contributed to the infospace.
**Impact:** Cannot use `git log`, `git diff`, or `git bisect` to trace how
the infospace grew chapter by chapter — the core promise of "with history."
**Suggested fix:** Re-run the 7 processed chapters (and remaining 28) using
`process_chapters.py` without `--no-commit`, on a clean branch or after
squashing the current output into a baseline commit. Each chapter gets its
own commit via `_git_commit_chapter()`.
## 5. Prompt files are regenerated as a side-effect of DB rebuild — OPEN
**Issue:** Running `--all --no-commit` to regenerate `infospace.db` also
overwrites `*-prompt.md` files in the output directories because each
pipeline stage unconditionally writes the compiled prompt before checking
whether output already exists. The `@{existing_entities}` macro content
shifts as earlier chapters are loaded, so prompt files for already-processed
chapters change on every full run.
**Impact:** A DB regeneration dirties the working tree with prompt file
changes, even though no actual outputs changed. Users must `git checkout`
the prompt files after regeneration.
**Suggested fix:** Skip writing prompt files when the corresponding output
file already exists on disk, or add a `--rebuild-db-only` flag that
populates the database without touching the file system.
## 6. Metrics report is stale — OPEN
**Issue:** The metrics report (`output/metrics/metrics-report.md`) was
generated after chapters 1-4. Chapters 5-7 have since been processed but
the report has not been refreshed.
**Impact:** The metrics do not reflect the current state of the infospace.
**Suggested fix:** Re-run `--metrics --provider <provider> --no-commit`
after every batch of new chapters. Consider making metrics assessment
automatic at the end of `--book` or `--all` runs.
## 7. Remaining 28 chapters not yet processed — OPEN
**Issue:** Only Book I chapters 1-7 have been processed. The other 28 of
the 35 chapters (through Book V) remain unprocessed.
**Impact:** The infospace is incomplete — VSM coverage is limited to S1,
S2, and partial S4. S3, S3*, S5, and many systemic concepts (algedonic
signals, recursion, variety) are expected to emerge from later books.
**Suggested fix:** Process remaining chapters in book-sized batches with
per-chapter commits, refreshing metrics after each book.
---
## Per-Concept Metrics (tasks 8-12)
The current metrics system is a single LLM-evaluated narrative report that
assesses the infospace as a whole. It produces no machine-readable output,
cannot be tracked over time, and conflates per-concept quality with
collection-level coherence.
The improvement splits metrics into two layers:
- **LLM-Eval**: A prompt template evaluates each concept individually
against quality criteria defined in the schema. The LLM returns structured
scores, not prose.
- **Deterministic aggregation**: `process_chapters.py` computes what it can
from files on disk (schema compliance, word counts, section presence,
coverage tallies) and aggregates LLM-eval scores into dashboard metrics.
Both layers persist results in structured form so they can be diffed,
tracked over time, and committed alongside the entities they evaluate.
## 8. Add per-concept quality metrics to entity schema — OPEN
**Issue:** The entity schema (`economic-entity-schema-v1.0.md`) defines
required sections and validation rules (section presence, word count range)
but no quality criteria. There is no definition of what makes a *good*
entity versus a merely *compliant* one.
**Suggested fix:** Add a `## Quality Metrics` section to the entity schema
defining evaluation dimensions with scoring rubrics:
- **Definition Precision** (1-5): Is the definition specific, non-circular,
and distinguishable from neighbouring concepts?
- **Source Grounding** (1-5): Is the entity grounded in a specific passage?
Does the citation exist and support the definition?
- **Domain Placement** (1-5): Is the economic domain assignment correct and
specific (not just "General Theory")?
- **VSM Relevance** (1-5): Does the entity connect meaningfully to at least
one VSM system, or is it too granular/abstract to map?
- **Explanatory Value** (1-5): Does this entity contribute to explaining
the economic system, or is it a restatement of another concept?
Similarly update the VSM mapping schema with:
- **Rationale Rigour** (1-5): Is the mapping justified with reference to
Beer's definitions, not just surface-level analogy?
- **Strength Calibration** (1-5): Is the declared strength (Strong/Moderate/
Weak) consistent with the rationale given?
These rubrics become the prompt instructions for task 9.
## 9. Create evaluate-entity prompt template — OPEN
**Depends on:** Task 8 (quality metrics in schema).
**Issue:** There is no mechanism to evaluate an existing entity after
extraction. Quality is only judged implicitly during the global metrics
assessment, which is too coarse to identify individual weak entities.
**Suggested fix:** Create `templates/evaluate-entity.md` — a prompt
template that:
1. Takes `@{entity_content}`, `@{source_chapter}`, `@{vsm_framework}`,
and `@{quality_rubric}` (from the schema's quality metrics section).
2. Asks the LLM to score each dimension (1-5) with a one-sentence
justification per score.
3. Outputs structured YAML front-matter (scores) followed by markdown
(justifications), e.g.:
```yaml
---
entity: division-of-labour
scores:
  definition_precision: 5
  source_grounding: 5
  domain_placement: 4
  vsm_relevance: 5
  explanatory_value: 5
overall: 4.8
flags: []
---
```
Add a pipeline stage: `--evaluate` runs this template against every
canonical entity and writes results to `output/evaluations/<slug>-eval.md`.
A `--evaluate --chapter <id>` variant evaluates only entities introduced
by that chapter.
## 10. Add deterministic schema compliance checker — OPEN
**Issue:** Schema compliance is currently LLM-evaluated ("100%" in the
metrics report) but the validation rules in the schemas are mechanical:
section presence, word count ranges, heading format. These should be
checked programmatically, not by an LLM.
**Suggested fix:** Add a `validate_entity(path) -> ValidationResult`
function to `process_chapters.py` (or a new `validate.py` module) that:
- Parses the markdown to extract H2 section headings
- Checks required sections are present (Definition, Source Chapter,
Context, Economic Domain)
- Counts words in the Definition section (must be 20-150)
- Checks H1 heading exists and is not a slug (e.g. `effectual-demand`
in chapter 7 has `# effectual-demand` instead of `# Effectual Demand`)
- Validates Source Chapter cites a specific book/chapter
- For mapping files: checks Mapping Strength is one of the enum values
Expose as `--validate` CLI flag. Output a structured report:
```
Validation: 85 entities, 3 warnings
  effectual-demand.md: H1 is slug format, not title case
  porter.md: Definition is 18 words (minimum 20)
  ...
```
This is fully deterministic — no LLM calls needed.
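A minimal sketch of such a checker, using only the rules listed above (required sections, a 20-150 word Definition, and an H1 that is not a slug). The names `split_sections` and `validate_text` are illustrative, the function takes markdown text rather than a path to stay self-contained, and the slug regex is an assumption about what counts as slug format.

```python
import re
from typing import Dict, List

REQUIRED_SECTIONS = ["Definition", "Source Chapter", "Context", "Economic Domain"]
# Assumed slug shape: lowercase words joined by hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def split_sections(markdown: str) -> Dict[str, str]:
    """Return {H2 heading: body text}; the first H1 title is stored under '#'."""
    sections: Dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("# ") and "#" not in sections:
            sections["#"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def validate_text(markdown: str) -> List[str]:
    """Return warning strings; an empty list means the entity is compliant."""
    sections = split_sections(markdown)
    warnings: List[str] = []
    title = sections.get("#")
    if title is None:
        warnings.append("H1 heading is missing")
    elif SLUG_RE.match(title):
        warnings.append("H1 is slug format, not title case")
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            warnings.append(f"Missing required section: {name}")
    if "Definition" in sections:
        words = len(sections["Definition"].split())
        if not 20 <= words <= 150:
            warnings.append(f"Definition is {words} words (allowed 20-150)")
    return warnings


sample = """# effectual-demand

## Definition
Too short.

## Source Chapter
Book I, Chapter 7

## Context
Placeholder text.
"""
for warning in validate_text(sample):
    print(warning)
```

The sketch does not skip fenced code blocks, so a line starting with `## ` inside a fence would be misread as a heading; a production parser would need to handle that.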
## 11. Structured metrics output format — OPEN
**Depends on:** Tasks 9 and 10.
**Issue:** The metrics report is a markdown narrative. Values cannot be
parsed programmatically, diffed meaningfully, or plotted over time.
**Suggested fix:** Alongside the human-readable `metrics-report.md`,
emit a machine-readable `metrics.yaml` (or `.json`) containing:
```yaml
timestamp: "2026-02-18T12:00:00Z"
chapters_processed: 7
chapters_total: 35
entities_total: 85
entities_archived: 0
vsm_coverage:
  S1: 28
  S2: 12
  S3: 8
  S3_star: 0
  S4: 5
  S5: 0
  recursion: 1
  variety: 0
mapping_strength:
  strong: 64
  moderate: 18
  weak: 3
validation:
  schema_compliant: 82
  warnings: 3
evaluation:             # from LLM-eval (task 9)
  mean_overall: 4.2
  min_overall: 2.8
  flagged_entities: ["porter", "country-workman"]
```
The `--metrics` command writes both files. The YAML file is committed
to git so `git diff` shows exactly how metrics changed between runs.
## 12. Metrics-over-time tracking — OPEN
**Depends on:** Task 11 (structured output).
**Issue:** There is one metrics snapshot that gets overwritten. No history
of how metrics evolved as chapters were added.
**Suggested fix:** Append each metrics snapshot to a cumulative log file
`output/metrics/metrics-history.yaml` (list of timestamped entries). This
is committed to git alongside the current snapshot. The pipeline can
optionally render a simple text-based progress summary:
```
Metrics history (5 snapshots):
2026-02-10 ch 1/35 13 entities 41.7% VSM coverage
2026-02-11 ch 4/35 38 entities 50.0% VSM coverage
2026-02-11 ch 7/35 85 entities 58.3% VSM coverage
...
```
This provides the "metrics that improve over time" feedback loop the
README envisions: process chapters → evaluate → see coverage grow (or
flag regressions when a re-extraction reduces quality scores).
---
## Collection-Level Metrics (tasks 13-19)
These tasks implement the five collection-level concerns described in
`METRICS-METHODOLOGY.md`. They share underlying infrastructure (entity
metadata index, definition embeddings, relationship graph) that should
be built once per evaluation run.
See the methodology document for theoretical grounding, framework
references, and the full metric definitions per concern.
## 13. Entity metadata index — deterministic parsing layer — OPEN
**Depends on:** Task 10 (schema compliance checker shares parsing logic).
**Issue:** Several collection-level metrics (coverage matrix, FCA context,
granularity distribution) require structured metadata extracted from entity
files: H1 title, economic domain, VSM system(s), source chapter, section
presence, word counts. Currently this information exists only as prose
inside markdown files.
**Suggested fix:** Add a `parse_entity_metadata(path) -> EntityMeta`
function that extracts from each entity file:
```python
from dataclasses import dataclass

@dataclass
class EntityMeta:
    slug: str
    title: str                      # from H1
    domain: str                     # from Economic Domain section
    source_chapter: str             # from Source Chapter section
    definition_words: int           # word count of Definition section
    has_original_wording: bool      # optional section present?
    has_modern_interpretation: bool
    vsm_systems: list[str]          # from mapping file if exists
    mapping_strengths: list[str]
Build an index of all entities at the start of each evaluation run.
This index is the input for tasks 14, 16, and 18. Expose as
`--index` CLI flag for inspection.
## 14. Redundancy detection (Concern C1) — OPEN
**Depends on:** Task 13 (metadata index).
**Methodology:** OOPS! P2 (synonymous classes) + embedding similarity +
LLM pairwise judgment. See METRICS-METHODOLOGY.md §4 C1.
**Issue:** Entities with different slugs but overlapping meanings (e.g.
`natural-rate` / `ordinary-or-average-rate`) survive extraction because
dedup only checks slug collisions. There is no semantic overlap detection.
**Suggested fix:** Implement in three stages:
1. **Embed** — Compute vector embeddings of all entity definitions using
an embedding API (OpenRouter, OpenAI, or a local sentence-transformer).
Cache embeddings in `output/metrics/embeddings.json` keyed by
`{slug: content_digest}` so unchanged entities skip re-embedding.
2. **Similarity matrix** — Compute NxN cosine similarity. Write the full
matrix to `output/metrics/similarity-matrix.json`. Flag all pairs with
cosine > 0.80 as candidates.
3. **LLM pairwise judgment** — For each candidate pair, run a prompt:
"Given these two entity definitions, are they (a) the same concept and
should be merged, (b) genuinely distinct, or (c) partially overlapping
and should be clarified?" Write results to
`output/metrics/redundancy-report.md` + YAML.
**Metrics produced:**
- `high_similarity_pairs`: count and list
- `confirmed_synonyms`: count (LLM-confirmed same concept)
- `redundancy_ratio`: `confirmed_synonyms / total_entities`
- `intensional_conciseness`: `1 - redundancy_ratio`
**CLI:** `--check-redundancy --provider <provider>`
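Stage 2 and the `redundancy_ratio` metric can be sketched without any embedding API, assuming embeddings are already available as a dict of slug to vector. The function names and the toy two-dimensional vectors are illustrative only; real embeddings would come from stage 1, and the synonym count from the LLM judgment in stage 3.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_pairs(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.80
) -> List[Tuple[str, str, float]]:
    """Return (slug_a, slug_b, similarity) for every pair with cosine above the threshold."""
    pairs = []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine(embeddings[a], embeddings[b])
        if sim > threshold:
            pairs.append((a, b, sim))
    return pairs


def redundancy_ratio(confirmed_synonyms: int, total_entities: int) -> float:
    """confirmed_synonyms / total_entities, or 0.0 for an empty collection."""
    return confirmed_synonyms / total_entities if total_entities else 0.0


# Toy vectors for illustration only; not real embeddings.
toy = {
    "natural-rate": [1.0, 0.0],
    "ordinary-or-average-rate": [0.9, 0.1],
    "division-of-labour": [0.0, 1.0],
}
pairs = candidate_pairs(toy)
```

Iterating over `combinations` visits each unordered pair once, which halves the work compared with filling the full NxN matrix.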
## 15. Coverage completeness (Concern C2) — OPEN
**Depends on:** Task 13 (metadata index).
**Methodology:** SEQUAL completeness + FCA gap analysis + DSL competency
questions. See METRICS-METHODOLOGY.md §4 C2.
**Issue:** Coverage is currently assessed by the LLM in a single narrative
pass. There is no structured view of which domain × VSM cells are
populated, and no way to test whether the entity set can answer specific
questions about the economic system.
**Suggested fix:** Implement in three stages:
1. **Domain × VSM matrix** — From the metadata index, count entities per
{economic_domain, vsm_system} cell. Render as a table. Identify empty
cells as specific, actionable gaps. Compute:
- `coverage_ratio = populated_cells / total_cells`
- `vsm_balance_entropy = -Σ(pᵢ log pᵢ)` across VSM systems
2. **FCA lattice** — Construct a formal context with objects = entities,
attributes = {domain, vsm_system, source_book, abstraction_level}.
Compute the concept lattice (Python `concepts` library). Extract
attribute combinations with no corresponding entity — these are
**structural coverage gaps** not visible in the simple matrix.
3. **Competency questions** — Define a set of 15-20 canonical questions
the infospace should answer (stored in
`schemas/competency-questions.md`). Example questions:
- "How does the division of labour relate to market extent?"
- "What mechanisms regulate wages toward their natural rate?"
- "How do monopolies distort the viable system?"
LLM-Eval tests whether current entities suffice to answer each.
Unanswerable questions identify specific completeness gaps.
**Metrics produced:**
- `domain_vsm_matrix`: cell counts
- `coverage_ratio`: scalar
- `vsm_balance_entropy`: scalar
- `empty_cells`: list of {domain, vsm_system} gaps
- `fca_gap_concepts`: attribute combos with no entity
- `competency_coverage`: fraction of questions answerable
**CLI:** `--check-coverage --provider <provider>`
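Stage 1 is purely deterministic and can be sketched in a few lines. The example below assumes each entity mapping is available as an `(economic_domain, vsm_system)` tuple and uses the natural logarithm for the entropy; the function name, the toy domains, and the log base are all assumptions of this sketch, not decisions recorded in the task.

```python
import math
from collections import Counter


def coverage_metrics(entities, domains, vsm_systems):
    """entities: list of (economic_domain, vsm_system) tuples, one per mapping."""
    entities = list(entities)
    cells = Counter(entities)
    # Empty cells are the actionable gaps in the domain x VSM matrix.
    empty = [(d, s) for d in domains for s in vsm_systems if cells[(d, s)] == 0]
    total_cells = len(domains) * len(vsm_systems)
    coverage_ratio = (total_cells - len(empty)) / total_cells

    # Shannon entropy (natural log) of the entity distribution across VSM systems.
    per_system = Counter(s for _, s in entities)
    n = sum(per_system.values())
    entropy = -sum((c / n) * math.log(c / n) for c in per_system.values()) if n else 0.0
    return coverage_ratio, entropy, empty


# Illustrative data only: two domains, three VSM systems, four mappings.
toy_domains = ["Production", "Distribution"]
toy_systems = ["S1", "S2", "S3"]
toy_entities = [
    ("Production", "S1"),
    ("Production", "S1"),
    ("Production", "S3"),
    ("Distribution", "S1"),
]
ratio, entropy, empty = coverage_metrics(toy_entities, toy_domains, toy_systems)
```

In this toy case three of six cells are populated, so `ratio` is 0.5 and `empty` lists the three missing `{domain, vsm_system}` cells.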
## 16. Structural coherence (Concern C3) — OPEN
**Depends on:** Task 13 (metadata index).
**Methodology:** OntoQA relationship richness + graph connectivity +
community detection. See METRICS-METHODOLOGY.md §4 C3.
**Issue:** It is unknown whether the 85 entities form a connected
explanatory web or a fragmented collection. No relationship graph exists
between entities.
**Suggested fix:** Implement in three stages:
1. **Explicit cross-references** — Scan each entity's definition for
mentions of other entity slugs or titles (normalised string matching).
This is deterministic and catches direct references.
2. **LLM-inferred edges** — For entity pairs not caught by string
matching but in the same domain or VSM system, LLM-Eval: "Does A's
definition conceptually depend on or explain B, or vice versa?" Run
in batches. Write the combined graph to
`output/metrics/relationship-graph.json` (adjacency list).
3. **Graph analysis** — Using networkx or equivalent:
- Connected components (target: 1)
- Graph density, average degree
- Betweenness centrality → identify bridge concepts
- Louvain community detection → compare to declared domains
- OntoQA Relationship Richness
- Cohesion per domain, coupling across domains
- Orphan entities (degree 0 or 1)
**Metrics produced:**
- `connected_components`: count (target: 1)
- `graph_density`: scalar
- `avg_degree`: scalar
- `relationship_richness`: OntoQA RR
- `modularity`: Louvain score
- `bridge_concepts`: list (high betweenness centrality)
- `orphan_entities`: list (degree ≤ 1)
- `cohesion_by_domain` / `coupling_across_domains`: scalars
**CLI:** `--check-coherence --provider <provider>`
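A few of the stage 3 metrics can be illustrated without any graph library. The sketch below treats the relationship graph as undirected, which is an assumption of this example, and computes connected components, density, and orphans; `graph_metrics` is a hypothetical helper name, and networkx would replace most of this in practice.

```python
from collections import defaultdict


def graph_metrics(nodes, edges):
    """Undirected view of the relationship graph. edges: iterable of (a, b) slug pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    # Connected components via iterative depth-first search.
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)

    n = len(nodes)
    m = sum(len(adj[v]) for v in nodes) // 2  # unique undirected edges
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    orphans = sorted(v for v in nodes if len(adj[v]) <= 1)
    return {
        "connected_components": components,
        "graph_density": density,
        "orphan_entities": orphans,
    }


# Illustrative graph: three linked entities and one isolated entity.
toy_nodes = ["market-price", "effectual-demand", "natural-rate", "agriculture"]
toy_edges = [("market-price", "effectual-demand"), ("market-price", "natural-rate")]
metrics = graph_metrics(toy_nodes, toy_edges)
```

Here `agriculture` forms its own component, so the target of a single connected component is not met.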
## 17. Definitional consistency (Concern C4) — OPEN
**Depends on:** Task 16 (relationship graph — the definitional dependency
graph is a directed variant of the same structure).
**Methodology:** OntoClean metaproperties + OOPS! P24 (circular
definitions) + SEQUAL validity. See METRICS-METHODOLOGY.md §4 C4.
**Issue:** No mechanism to detect circular definitions, contradictions
between related entities, or terms used in definitions that should be
entities but aren't.
**Suggested fix:** Implement in four stages:
1. **Definitional dependency graph** — Directed version of the
relationship graph: edge A→B means A's definition uses B's concept.
Reuse cross-reference extraction from task 16.
2. **Cycle detection** — Find all cycles of length ≤ 3 in the directed
graph. Short cycles are problematic (A defines B, B defines A).
Compute `grounding_ratio`: fraction of entities traceable to terms
outside the entity set without encountering a cycle.
3. **Undefined dependencies** — Extract terms from definitions that match
entity-name patterns (capitalised noun phrases, kebab-case slugs) but
have no corresponding entity file. These are concepts the infospace
implicitly relies on but hasn't defined.
4. **LLM consistency checks** — For directly-connected entity pairs,
LLM-Eval: "Do these definitions contradict each other?" For entities
with Smith's Original Wording, LLM-Eval: "Does the definition
accurately represent the cited passage?"
**Metrics produced:**
- `circular_definitions`: count and list of cycles (length ≤ 3)
- `grounding_ratio`: fraction of entities reaching primitives
- `undefined_dependencies`: list of missing terms
- `contradiction_candidates`: LLM-flagged pairs
- `source_fidelity_score`: fraction passing source check
**CLI:** `--check-consistency --provider <provider>`
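Stage 2 (cycle detection) can be sketched as a bounded depth-first search over the dependency graph. The representation used here, a dict from slug to the set of slugs its definition uses, and the name `short_cycles` are assumptions for illustration. Each cycle is reported once, rooted at its alphabetically smallest slug.

```python
def short_cycles(deps, max_len=3):
    """deps: {slug: set of slugs its definition uses}. Returns cycles of length <= max_len."""
    cycles = []

    def walk(start, node, path):
        for nxt in sorted(deps.get(node, ())):
            if nxt == start:
                # Closed a loop back to the root of this search.
                cycles.append(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only visit slugs greater than the root so each cycle is found once.
                walk(start, nxt, path + [nxt])

    for start in sorted(deps):
        walk(start, start, [start])
    return cycles


# Illustrative dependencies: market-price and effectual-demand define each other.
toy_deps = {
    "market-price": {"effectual-demand"},
    "effectual-demand": {"market-price", "natural-rate"},
    "natural-rate": {"wages"},
    "wages": set(),
}
found = short_cycles(toy_deps)
```

A self-referencing definition shows up as a cycle of length 1, and cycles longer than `max_len` are deliberately ignored.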
## 18. Granularity balance (Concern C5) — OPEN
**Depends on:** Task 13 (metadata index).
**Methodology:** Keet granularity theory + OntoClean rigidity +
DSL laconicity. See METRICS-METHODOLOGY.md §4 C5.
**Issue:** Entities range from broad sectors (`agriculture`) to specific
market roles (`effectual-demanders`) to abstract principles
(`division-of-labour`). It is unclear whether this range is appropriate
or whether some entities are too specific/general relative to their peers.
**Suggested fix:** Implement in three stages:
1. **LLM classification** — For each entity, LLM-Eval assigns:
- Abstraction level: `theory` / `mechanism` / `observation`
- Scope score: 1-5 (very specific → very general)
- Indispensability: 1-5 ("if removed, how much explanatory power lost?")
Write to `output/evaluations/<slug>-classification.yaml`.
2. **Distribution analysis** — Deterministic:
- Count per abstraction level; compute entropy
- Per-domain scope variance (flag domains with high variance)
- Level × domain matrix (from FCA context in task 15)
- Outlier detection: entities > 1.5σ from their domain's mean scope
3. **Merge/split recommendations** — For outlier entities, LLM-Eval:
"Should this entity be merged into a broader concept, split into
sub-concepts, or is its current granularity justified?" For entities
with indispensability ≤ 2: "Could another entity serve this purpose?"
**Metrics produced:**
- `abstraction_distribution`: {theory: n, mechanism: n, observation: n}
- `abstraction_entropy`: scalar (higher = more balanced)
- `scope_variance_by_domain`: per-domain scalar
- `dispensable_entities`: list (indispensability ≤ 2)
- `merge_candidates`: list of pairs
- `split_candidates`: list of entities
**CLI:** `--check-granularity --provider <provider>`
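The outlier rule in stage 2 might look like the sketch below. It assumes the LLM classifications have already been loaded into a dict of `slug -> (domain, scope_score)` and uses the population standard deviation; both the data shape and the choice of `pstdev` over the sample deviation are assumptions of this example. The placeholder slugs `e1` to `e5` are hypothetical.

```python
import statistics
from collections import defaultdict


def scope_outliers(scores, k=1.5):
    """scores: {slug: (domain, scope_score)}.

    Flags entities whose scope deviates more than k standard deviations
    from the mean scope of their own domain.
    """
    by_domain = defaultdict(list)
    for slug, (domain, score) in scores.items():
        by_domain[domain].append((slug, score))

    outliers = []
    for items in by_domain.values():
        values = [s for _, s in items]
        if len(values) < 2:
            continue  # a single entity has no peers to deviate from
        mean, sigma = statistics.mean(values), statistics.pstdev(values)
        outliers += [slug for slug, s in items if sigma and abs(s - mean) > k * sigma]
    return sorted(outliers)


# Illustrative scores: five mid-scope entities and one very specific one.
toy_scores = {
    "e1": ("Production", 3),
    "e2": ("Production", 3),
    "e3": ("Production", 3),
    "e4": ("Production", 3),
    "e5": ("Production", 3),
    "effectual-demanders": ("Production", 1),
}
flagged = scope_outliers(toy_scores)
```

Only the flagged entities would then go to the merge/split prompt in stage 3.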
## 19. Unified collection evaluation command — OPEN
**Depends on:** Tasks 13-18.
**Issue:** Running five separate `--check-*` commands is cumbersome and
repeats shared computation (metadata parsing, embedding, graph building).
**Suggested fix:** Add `--evaluate-collection --provider <provider>` that
runs all five checks in sequence, sharing infrastructure:
1. Parse entity metadata index (task 13) — used by all
2. Compute embeddings (task 14) — used by C1, C3
3. Build relationship graph (task 16) — used by C3, C4
4. Run all five concern checks
5. Write per-concern reports to `output/metrics/`
6. Write unified `metrics.yaml` with all collection metrics
7. Append to `metrics-history.yaml` (task 12)
Incremental mode: `--evaluate-collection --chapter <id>` re-evaluates
only entities from that chapter plus pairwise checks involving them.
Report a summary to stdout:
```
Collection evaluation (85 entities, 7 chapters):
Redundancy: 3 synonym candidates, conciseness 0.96
Coverage: 58% VSM, 20% chapters, 4 domain gaps
Coherence: 1 component, density 0.12, 2 orphans
Consistency: 0 cycles, 5 undefined deps, 0 contradictions
Granularity: entropy 1.42, 1 dispensable, 2 merge candidates
```


@@ -0,0 +1,501 @@
# Collection-Level Metrics Methodology
How we evaluate the quality of the infospace as a **collection of
interrelated concepts**, beyond the quality of individual entities.
This document describes the theoretical frameworks, drawn from ontology
engineering, formal concept analysis, semiotic quality theory, and DSL
design, and how each is adapted to work within MarkiTect's two-layer
evaluation model (LLM-Eval + deterministic aggregation).
---
## 1. The Two-Layer Model
Every metric in this methodology decomposes into two layers:
| Layer | What it does | How it runs |
|-------|-------------|-------------|
| **LLM-Eval** | Qualitative judgment: "Are these two concepts the same?", "Is this definition grounded in the source?" | Prompt template → LLM → structured YAML output |
| **Deterministic** | Quantitative aggregation: cosine similarity, graph connectivity, coverage counting, cycle detection | Python code in `process_chapters.py` or dedicated `metrics.py` |
The LLM-Eval layer produces **per-entity** or **per-pair** structured
scores. The deterministic layer **aggregates** these into collection-level
metrics, persisted as machine-readable YAML alongside human-readable
markdown reports.
Per-concept quality metrics (definition precision, source grounding, VSM
relevance — see INFRA-TASKS 8-12) operate at the individual entity level.
This document covers the five **collection-level concerns** that assess how
the entities work together as an explanatory system.
---
## 2. Five Collection-Level Concerns
### Overview
| # | Concern | Question | Primary framework |
|---|---------|----------|-------------------|
| C1 | Semantic Overlap | Are there redundant concepts? | OOPS! P2, embedding similarity |
| C2 | Coverage Completeness | Does the concept set cover the domain? | SEQUAL, FCA |
| C3 | Structural Coherence | Do concepts form a connected explanatory graph? | OntoQA, graph theory |
| C4 | Definitional Consistency | Are concepts defined consistently and non-circularly? | OntoClean, OOPS! P24 |
| C5 | Granularity Balance | Are concepts at comparable levels of abstraction? | Granularity theory, DSL laconicity |
---
## 3. Theoretical Frameworks
### 3.1 SEQUAL (Semiotic Quality Framework)
**Origin:** Lindland, Sindre & Sølvberg (1994), extended by Krogstie et al.
**What it defines:** Quality of a conceptual model as the correspondence
between three worlds — the domain (what exists), the model (what we
captured), and the audience's interpretation (what they understand).
Two key dimensions of **semantic quality**:
- **Validity** — everything in the model corresponds to something real
in the domain. No invented concepts.
- **Completeness** — everything relevant in the domain is represented in
the model. No missing concepts.
**How we use it:** SEQUAL frames our entire metrics approach. Every
collection-level concern maps to one or both of these dimensions:
| SEQUAL dimension | Our concerns |
|-----------------|--------------|
| Validity | C1 (redundancy reduces validity — duplicate concepts don't correspond to distinct domain facts), C4 (consistency — contradictory definitions can't both be valid) |
| Completeness | C2 (coverage — are all needed concepts present?), C5 (granularity — missing levels of abstraction are completeness gaps) |
| Both | C3 (coherence — disconnected concepts suggest either missing bridging concepts [completeness] or misplaced concepts [validity]) |
**Adaptation:** SEQUAL was designed for formal models evaluated by human
experts. We replace human judgment with LLM-Eval (for validity checks like
"does this concept correspond to something Smith actually described?") and
deterministic counting (for completeness checks like "which VSM systems
lack entity mappings?").
### 3.2 OntoClean
**Origin:** Guarino & Welty (2004).
**What it defines:** A methodology for validating taxonomic relationships
by assigning **metaproperties** to each concept:
- **Rigidity** — Is the property essential to all its instances? (e.g.
"market" is rigid; "effectual demander" is anti-rigid — an agent can
stop being an effectual demander)
- **Identity** — Does the concept carry an identity criterion? (e.g.
"division of labour" can be identified by its three causal mechanisms)
- **Unity** — Are all instances of this concept whole in the same way?
- **Dependence** — Does the concept require another concept to exist?
(e.g. "market price" depends on "effectual demand")
**Constraint:** A rigid concept cannot be subsumed by an anti-rigid one.
Violations indicate structural confusion.
**How we use it:** We do not have a formal taxonomy, but our flat entity
set implicitly contains subsumption relationships (e.g. "natural rate"
subsumes "ordinary-or-average rate"). OntoClean metaproperties help detect:
- **Granularity mismatches** (C5): A rigid concept at the same level as
an anti-rigid one suggests different abstraction levels are mixed.
- **Definitional consistency** (C4): If entity A depends on entity B per
OntoClean, but B's definition doesn't acknowledge A, the definitions
are inconsistent.
- **Redundancy** (C1): Two entities with identical metaproperty profiles
and overlapping definitions are candidates for merging.
**Adaptation:** Instead of manual metaproperty assignment, we use LLM-Eval
to classify each entity's rigidity, identity criterion, and dependencies.
The constraint checking is then deterministic.
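Once the LLM has assigned rigidity labels, the deterministic constraint check reduces to a filter over subsumption pairs. The sketch below assumes labels are stored as plain strings and that implicit subsumptions are available as `(parent, child)` tuples; the function name and the toy data, including the deliberately wrong second pair, are hypothetical.

```python
def rigidity_violations(rigidity, subsumptions):
    """rigidity: {slug: "rigid" | "anti-rigid" | ...}; subsumptions: (parent, child) pairs.

    A violation is a rigid child subsumed by an anti-rigid parent.
    """
    return [
        (parent, child)
        for parent, child in subsumptions
        if rigidity.get(parent) == "anti-rigid" and rigidity.get(child) == "rigid"
    ]


# Illustrative labels and subsumptions; the second pair is intentionally invalid.
toy_rigidity = {
    "agent": "rigid",
    "effectual-demander": "anti-rigid",
    "market": "rigid",
}
toy_subsumptions = [
    ("agent", "effectual-demander"),
    ("effectual-demander", "market"),
]
violations = rigidity_violations(toy_rigidity, toy_subsumptions)
```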
### 3.3 OOPS! (Ontology Pitfall Scanner)
**Origin:** Poveda-Villalón et al. (2014). Catalogue of 41 common
ontology design pitfalls.
**What it defines:** Concrete, testable anti-patterns. The pitfalls most
relevant to our infospace:
| Pitfall | Description | Our concern |
|---------|-------------|-------------|
| P2 | Synonymous classes — different names, same meaning | C1 (redundancy) |
| P4 | Unconnected ontology elements | C3 (coherence) |
| P7 | Merging different concepts in the same class | C5 (granularity: too coarse) |
| P10 | Missing disjointness | C1 (how do we know two concepts don't overlap?) |
| P11 | Missing domain or range | C4 (consistency) |
| P13 | Inverse relationships not explicitly declared | C3 |
| P24 | Recursive/circular definition | C4 (consistency) |
| P25 | Inverse of itself | C4 |
**How we use it:** OOPS! pitfalls become a **checklist for LLM-Eval
prompts**. Rather than running a formal OWL scanner, we ask the LLM to
check for each pitfall pattern:
- "Are entities A and B synonymous?" (P2)
- "Does entity A's definition reference itself?" (P24)
- "Is entity A actually two distinct concepts merged together?" (P7)
The deterministic layer counts pitfall occurrences and tracks them over
time.
**Adaptation:** We select the subset of OOPS! pitfalls applicable to
semi-formal markdown-based ontologies (no OWL axioms) and implement each
as an LLM-Eval prompt pattern rather than a formal reasoner check.
### 3.4 OntoQA (Metric-Based Ontology Quality Analysis)
**Origin:** Tartir & Arpinar (2007).
**What it defines:** Quantitative schema-level and instance-level metrics:
- **Relationship Richness (RR):** Proportion of non-taxonomic (lateral)
relationships to total relationships. `RR = non_hierarchical / total`.
Low RR = mere taxonomy. High RR = rich cross-cutting connections.
- **Attribute Richness (AR):** Average number of attributes per concept.
`AR = total_attributes / total_concepts`.
- **Inheritance Richness (IR):** Average subclasses per class — measures
how knowledge distributes across the hierarchy.
- **Class Richness (CR):** Proportion of classes with instances.
**How we use it:** Our entities don't have formal relationships declared
between them, but we can **infer** a relationship graph from their
definitions and mappings:
- Entity A references entity B in its definition → definitional dependency
- Entities A and B map to the same VSM system → structural co-occurrence
- Entities A and B appear in the same chapter → contextual co-occurrence
From this inferred graph, we compute OntoQA metrics directly:
- **Relationship Richness** tells us whether our concepts form a web of
explanatory connections or just a flat list.
- **Attribute Richness** maps to our schema sections — entities with more
optional sections filled (Original Wording, Modern Interpretation) are
richer.
**Adaptation:** The key modification is that relationship inference is an
LLM-Eval step (pairwise: "does A's definition depend on or reference B?"),
after which all OntoQA metrics are computed deterministically on the
resulting graph.
### 3.5 Formal Concept Analysis (FCA)
**Origin:** Wille (1982). Applied to ontology auditing by Elhaj et al.
(2008) for SNOMED CT completeness checking.
**What it defines:** A mathematical framework for deriving a **concept
lattice** from a binary relation between objects and attributes. The
lattice reveals:
- **Formal concepts**: maximal sets of objects sharing the same attributes
- **Subconcept/superconcept** relationships: the natural hierarchy
- **Missing concepts**: attribute combinations with no corresponding object
**How we use it:** We construct a **formal context** (binary matrix):
- **Objects** = our 85 entities
- **Attributes** = economic domain, VSM system, source book, abstraction
level (from LLM-Eval), key terms (extracted from definitions)
The concept lattice then reveals:
- **Coverage gaps** (C2): Attribute combinations with no entity. E.g. if
the cell {Distribution, S3} is empty, we lack control-layer concepts
for distribution — a specific, actionable gap.
- **Redundancy** (C1): Entities with identical attribute sets (same formal
concept) are candidates for merging.
- **Granularity** (C5): The lattice depth indicates how many meaningful
levels of abstraction exist. A shallow lattice suggests missing
intermediate concepts.
**Adaptation:** Classic FCA requires crisp binary attributes. Our domains
and VSM mappings are already categorical, but abstraction level and key
terms need LLM-Eval to produce. The lattice computation itself is
deterministic (Python `concepts` library or equivalent). The FCA approach
replaces the current "ask the LLM about coverage" with a structural
computation that can identify *specific* gaps rather than vague
recommendations.
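To make the lattice computation concrete, here is a small standard-library sketch that enumerates the formal concepts of a context. It relies on the fact that every concept intent is an intersection of object intents (with the full attribute set as the intent of the empty extent). The function name and the toy context are assumptions; the `concepts` library mentioned above would normally do this work.

```python
def formal_concepts(context):
    """context: {object: set of attributes}. Returns sorted (extent, intent) pairs."""
    all_attrs = set().union(*context.values())
    # Close the set of intents under intersection with each object's attribute set.
    intents = {frozenset(all_attrs)}
    for attrs in context.values():
        intents |= {frozenset(attrs) & intent for intent in intents}

    concepts = []
    for intent in intents:
        # The extent is every object that carries all attributes of the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[1]), sorted(c[1])))


# Illustrative context: three entities described by domain and VSM system.
toy_context = {
    "market-price": {"domain:Exchange", "vsm:S2"},
    "natural-rate": {"domain:Exchange", "vsm:S3"},
    "agriculture": {"domain:Production", "vsm:S1"},
}
lattice = formal_concepts(toy_context)
```

In this toy context the concept with intent `{domain:Exchange}` groups `market-price` and `natural-rate`, while the bottom concept (all attributes, no objects) shows that no single entity spans every attribute.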
### 3.6 DSL Design Principles
**Origin:** Mernik et al. (2005) "When and How to Develop DSLs";
Karsai et al. (2014) "Design Guidelines for Domain-Specific Languages".
**What they define:** Quality criteria for a set of concepts that form a
language for a specific domain:
- **Soundness**: Every concept in the language corresponds to a real domain
concern (no invented abstractions).
- **Completeness**: The language can express everything needed for its
intended tasks.
- **Laconicity**: No unnecessary concepts — every concept earns its place.
- **Orthogonality**: Concepts are independent; combining any two produces
a meaningful result (no redundant combinations).
**How we use it:** Our entity set is effectively a domain-specific
vocabulary for "explaining classical economics through VSM". DSL quality
criteria translate directly:
- **Soundness** → Validity (SEQUAL): every entity grounded in Smith's text
- **Completeness** → Coverage (C2): can we answer the "competency
questions" the infospace is meant to address?
- **Laconicity** → Anti-redundancy (C1) + Indispensability (C5): would
removing any entity lose explanatory power?
- **Orthogonality** → Non-overlap (C1): entity definitions don't
substantially duplicate each other
**Adaptation:** We operationalise DSL completeness through **competency
questions** — a set of canonical questions the infospace should be able to
answer (e.g. "How does the division of labour relate to market extent?",
"What mechanisms regulate wages toward their natural rate?"). LLM-Eval
tests whether the current entity set suffices to answer each question.
Unanswerable questions identify specific completeness gaps.
Laconicity is operationalised as **indispensability scoring**: for each
entity, LLM-Eval rates whether removing it would lose explanatory power.
Low-scoring entities are candidates for merging or retirement.
---
## 4. Integration: Metric Definitions by Concern
### C1: Semantic Overlap / Redundancy
**Goal:** Identify entities that substantially overlap in meaning and
should be merged, distinguished, or retired.
**Metrics:**
| Metric | Type | Computation |
|--------|------|-------------|
| `similarity_matrix` | Deterministic | Embed all entity definitions; compute NxN cosine similarity |
| `high_similarity_pairs` | Deterministic | Pairs with cosine > 0.80, sorted descending |
| `confirmed_synonyms` | LLM-Eval | For each high-similarity pair, LLM judges: "same concept" / "genuinely distinct" / "partial overlap" |
| `redundancy_ratio` | Deterministic | `confirmed_synonyms / total_entities` |
| `intensional_conciseness` | Deterministic | `1 - redundancy_ratio` (a KG quality dimension; see Xue & Zou 2022) |
**Pipeline:**
1. Embed definitions (embedding API or local model)
2. Compute cosine similarity matrix
3. Filter pairs above threshold
4. LLM pairwise judgment on filtered pairs only (avoids N² LLM calls)
5. Aggregate into ratio and conciseness score
**Output:** `output/metrics/redundancy-report.md` + structured YAML with
pair list, scores, and merge/retire recommendations.
### C2: Coverage Completeness
**Goal:** Identify domain areas and VSM systems that lack adequate
representation in the entity set.
**Metrics:**
| Metric | Type | Computation |
|--------|------|-------------|
| `domain_vsm_matrix` | Deterministic | Count entities per {economic_domain, VSM_system} cell |
| `coverage_ratio` | Deterministic | `populated_cells / expected_cells` |
| `vsm_balance_entropy` | Deterministic | Shannon entropy of entity distribution across VSM systems (higher = more balanced) |
| `empty_cells` | Deterministic | List of {domain, VSM_system} pairs with zero entities |
| `competency_coverage` | LLM-Eval | For each competency question, can it be answered with current entities? |
| `fca_gap_concepts` | Deterministic | Attribute combinations in the FCA lattice with no corresponding entity |
**Pipeline:**
1. Parse entity metadata (domain, VSM mapping) from files on disk
2. Build domain × VSM matrix; identify empty cells
3. Build FCA formal context; compute lattice; extract gap concepts
4. Define competency questions (initially hand-written, later LLM-generated
from the source material)
5. LLM-evaluate answerability of each question
6. Aggregate into coverage ratio, entropy, and gap list
**Output:** `output/metrics/coverage-report.md` + YAML with matrix, gaps,
and competency question results.
### C3: Structural Coherence
**Goal:** Determine whether the entities form a connected explanatory web
or a fragmented collection of isolated concepts.
**Metrics:**
| Metric | Type | Computation |
|--------|------|-------------|
| `relationship_graph` | LLM-Eval + Deterministic | Infer edges from definition cross-references (string matching) + LLM judgment for implicit references |
| `connected_components` | Deterministic | Number of connected components in the graph (target: 1) |
| `graph_density` | Deterministic | `actual_edges / possible_edges` |
| `avg_degree` | Deterministic | `total_edges / total_entities` |
| `relationship_richness` | Deterministic | OntoQA RR: `non_hierarchical_edges / total_edges` |
| `modularity` | Deterministic | Louvain modularity score (0.3-0.7 = meaningful structure; >0.8 = fragmentation) |
| `bridge_concepts` | Deterministic | Entities with highest betweenness centrality (connect clusters) |
| `orphan_entities` | Deterministic | Entities with degree 0 or 1 |
| `cohesion_by_domain` | Deterministic | Avg intra-domain edges per entity |
| `coupling_across_domains` | Deterministic | Inter-domain edges / total edges |
**Pipeline:**
1. Extract explicit cross-references from definitions (entity name
mentions in other definitions — string matching with slug normalisation)
2. For entity pairs not caught by string matching, LLM-Eval: "Does A's
definition depend on or reference B's concept?"
3. Build directed graph
4. Compute graph metrics (networkx or equivalent)
5. Run community detection; compare detected communities to declared
economic domains
**Output:** `output/metrics/coherence-report.md` + YAML with graph
statistics, orphan list, bridge concepts, and community structure.
### C4: Definitional Consistency
**Goal:** Ensure entities are defined consistently, non-circularly, and
without contradicting each other.
**Metrics:**
| Metric | Type | Computation |
|--------|------|-------------|
| `definitional_dependency_graph` | Deterministic + LLM-Eval | Edges where A's definition uses B's concept |
| `circular_definitions` | Deterministic | Cycles of length ≤ 3 in the dependency graph |
| `definition_depth` | Deterministic | Longest dependency chain per entity before reaching a term not in the entity set |
| `undefined_dependencies` | Deterministic | Terms used in definitions that arguably should be entities but aren't |
| `pairwise_consistency` | LLM-Eval | For related entity pairs (sharing edges): "Do these definitions contradict each other?" |
| `source_fidelity` | LLM-Eval | "Does this definition accurately represent what Smith wrote in the cited passage?" |
| `metaproperty_violations` | LLM-Eval + Deterministic | OntoClean constraint checking after LLM classifies rigidity/identity |
| `grounding_ratio` | Deterministic | Fraction of entities traceable to primitives without cycles |
**Pipeline:**
1. Build definitional dependency graph (same technique as C3, but directed
— A depends on B means A's definition uses B, not vice versa)
2. Detect cycles; flag short cycles
3. Extract undefined terms (terms matching entity-name patterns that appear
in definitions but have no corresponding entity file)
4. LLM pairwise consistency check on directly-connected pairs
5. LLM source fidelity check (compare definition to source chapter text)
6. LLM OntoClean metaproperty classification; deterministic constraint
checking
**Output:** `output/metrics/consistency-report.md` + YAML with cycle list,
undefined terms, contradiction candidates, and metaproperty violations.
### C5: Granularity Balance
**Goal:** Ensure entities operate at comparable levels of abstraction
within their respective domains and perspectives.
**Metrics:**
| Metric | Type | Computation |
|--------|------|-------------|
| `abstraction_classification` | LLM-Eval | Classify each entity as theory-level / mechanism-level / observation-level |
| `scope_score` | LLM-Eval | Rate each entity 1-5 for generality (1 = very specific instance, 5 = broad theoretical principle) |
| `abstraction_distribution` | Deterministic | Count per level; compute entropy |
| `scope_variance` | Deterministic | Variance of scope scores within each domain |
| `level_x_perspective_matrix` | Deterministic | Cross-tabulation of abstraction level × economic domain |
| `indispensability` | LLM-Eval | "If removed, what explanatory power is lost?" (1-5) |
| `dispensable_entities` | Deterministic | Entities with indispensability score ≤ 2 |
| `merge_candidates` | LLM-Eval | Pairs where one is a sub-case of the other |
**Pipeline:**
1. LLM-classify each entity: abstraction level, scope score,
indispensability
2. Build level × perspective matrix
3. Compute distribution entropy and per-domain scope variance
4. Flag outliers: entities whose scope score deviates > 1.5σ from their
domain mean
5. For outlier entities, LLM-Eval: "Should this be merged into a broader
concept, or split into sub-concepts?"
**Output:** `output/metrics/granularity-report.md` + YAML with
classifications, distribution, outliers, and merge/split recommendations.
---
## 5. Shared Infrastructure
Several concerns share underlying computations:
| Infrastructure | Used by | Build once |
|---------------|---------|------------|
| Definition embeddings (vector per entity) | C1, C3 | Embedding API call per entity |
| Relationship graph (entity → entity edges) | C3, C4 | String matching + LLM-Eval |
| FCA formal context (entity × attribute matrix) | C2, C5 | Metadata parsing + LLM classification |
| Entity metadata index (domain, VSM, chapter, sections) | C2, C5, C10 (schema compliance) | Deterministic markdown parsing |
These should be computed once per evaluation run and cached for use by
all concern-specific metrics.
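For the embedding cache, one way to decide which entities need re-embedding is to compare a digest of the current definition text with the digest stored alongside the cached vector. The cache layout below (`{"digest": ..., "vector": ...}` per slug), the use of SHA-256, and the function names are assumptions of this sketch.

```python
import hashlib


def content_digest(definition):
    """Stable digest of a definition's text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()


def stale_slugs(definitions, cache):
    """definitions: {slug: text}; cache: {slug: {"digest": str, "vector": list}}.

    Returns slugs that are new or whose definition changed since they were embedded.
    """
    return sorted(
        slug
        for slug, text in definitions.items()
        if cache.get(slug, {}).get("digest") != content_digest(text)
    )


# Illustrative state: one unchanged entity, one edited, one new.
toy_definitions = {
    "natural-rate": "The rate toward which prices gravitate.",
    "market-price": "The actual price at which a commodity is sold.",
    "agriculture": "Cultivation of land.",
}
toy_cache = {
    "natural-rate": {
        "digest": content_digest("The rate toward which prices gravitate."),
        "vector": [0.1, 0.2],
    },
    "market-price": {"digest": "outdated", "vector": [0.3, 0.4]},
}
to_embed = stale_slugs(toy_definitions, toy_cache)
```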
---
## 6. Evaluation Workflow
A full collection-level evaluation run:
```
process_chapters.py --evaluate-collection --provider <provider>
```
1. **Parse** — deterministic metadata extraction from all entity files
2. **Embed** — compute definition embeddings (cached; only new/changed
entities need fresh embeddings)
3. **Infer** — LLM-Eval for relationship edges, metaproperties,
abstraction levels, pairwise judgments (batched to minimise LLM calls)
4. **Compute** — deterministic graph metrics, FCA lattice, coverage
matrix, similarity matrix, cycle detection
5. **Aggregate** — combine per-entity and per-pair scores into
collection-level metrics
6. **Report** — write per-concern markdown reports + unified `metrics.yaml`
7. **Append** — add timestamped snapshot to `metrics-history.yaml`
Incremental mode (`--evaluate-collection --chapter <id>`) re-evaluates
only the entities introduced or modified by that chapter, plus any
pairwise checks involving those entities.
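The pair selection for incremental mode can be sketched as a filter over all entity pairs, keeping only those that touch a changed entity. The function name and inputs are hypothetical; how the pipeline determines which entities a chapter introduced or modified is not shown here.

```python
from itertools import combinations


def pairs_to_recheck(all_slugs, changed):
    """Return every unordered pair that involves at least one changed entity."""
    changed = set(changed)
    return [
        (a, b)
        for a, b in combinations(sorted(all_slugs), 2)
        if a in changed or b in changed
    ]


# Illustrative run: only "wages" was touched by the chapter being re-evaluated.
recheck = pairs_to_recheck(["agriculture", "natural-rate", "wages"], ["wages"])
```

With N entities and k changed, this yields at most k·(N−1) pairs instead of all N·(N−1)/2.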
---
## 7. References
- Lindland, O.I., Sindre, G. & Sølvberg, A. (1994). "Understanding
Quality in Conceptual Modeling." *IEEE Software* 11(2), 42-49.
→ SEQUAL framework: validity and completeness dimensions.
- Guarino, N. & Welty, C.A. (2004). "An Overview of OntoClean." In
*Handbook on Ontologies*, Springer, 151-171.
→ Metaproperty analysis: rigidity, identity, unity, dependence.
- Poveda-Villalón, M., Gómez-Pérez, A. & Suárez-Figueroa, M.C. (2014).
"OOPS! (OntOlogy Pitfall Scanner!): An On-line Tool for Ontology
Evaluation." *IJSWIS* 10(2), 7-34.
→ Pitfall catalogue: 41 anti-patterns for ontology design.
- Tartir, S. & Arpinar, I.B. (2007). "Ontology Evaluation and Ranking
using OntoQA." *ICSC 2007*, IEEE, 185-192.
→ Schema metrics: relationship richness, attribute richness.
- Wille, R. (1982). "Restructuring Lattice Theory." In *Ordered Sets*,
Reidel, 445-470.
→ Formal Concept Analysis: concept lattices from binary contexts.
- Elhaj, H. et al. (2008). "Auditing SNOMED CT with Formal Concept
Analysis." *AMIA Annual Symposium*, PMC2605587.
→ FCA for ontology completeness auditing.
- Keet, C.M. (2008). *A Formal Theory of Granularity.* PhD thesis,
Free University of Bozen-Bolzano.
→ Granularity levels and perspectives for ontology design.
- Mernik, M., Heering, J. & Sloane, A.M. (2005). "When and How to
Develop Domain-Specific Languages." *ACM Computing Surveys* 37(4),
316-344.
→ DSL design: soundness, completeness, laconicity.
- Karsai, G. et al. (2014). "Design Guidelines for Domain Specific
Languages." *arXiv:1409.2378*.
→ Orthogonality, necessary-and-sufficient principle.
- Xue, B. & Zou, L. (2022). "Knowledge Graph Quality Management: A
Comprehensive Survey." *IEEE TKDE* 35(5), 4969-4988.
→ KG quality dimensions: conciseness, consistency, completeness.


@@ -43,6 +43,7 @@ examples/infospace-with-history/
├── TUTORIAL.md # This file
├── INFRA-TASKS.md # Infrastructure issues found during the experiment
├── process_chapters.py # Pipeline script
├── infospace.db # SQLite artifact database (generated, not in git)
├── schemas/ # Output structure definitions
│ ├── economic-entity-schema-v1.0.md
@@ -369,7 +370,53 @@ python process_chapters.py --stats
---
## 7. How the LLM Integration Works
## 7. The Artifact Database (`infospace.db`)
The pipeline stores all artifacts (source text, templates, guidelines, generated
outputs) and their dependency edges in a local SQLite database —
`infospace.db`. This file is **not checked into git** because it is a derived
cache that can be regenerated deterministically from the files already in the
repository.
### Why it is excluded
- **Binary format** — SQLite databases don't produce meaningful diffs and
would bloat the git history with every pipeline run.
- **Fully derived** — every piece of data in the database originates from
markdown files that *are* tracked in git (sources, templates, schemas,
guidelines, and generated output).
- **Reproducible** — re-running the pipeline rebuilds the database from
scratch without any LLM calls, because each stage checks for existing
output files on disk before invoking the LLM.
### How to regenerate it
If `infospace.db` is missing (e.g. after a fresh clone), rebuild it by
re-running the pipeline over the chapters that already have output on disk:
```bash
# Regenerate the database from existing output files (no LLM calls needed):
python process_chapters.py --all --no-commit
```
This will:
1. Create a fresh `infospace.db`
2. Load all static artifacts (templates, guidelines, VSM reference)
3. For each chapter whose output files already exist, import them into the
database and record dependency edges
4. Skip LLM calls entirely — existing files are detected and reused
After regeneration, `--list` and `--stats` work as normal:
```bash
python process_chapters.py --list
python process_chapters.py --stats
```
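The skip-if-output-exists behaviour described above can be sketched as follows. This is a minimal illustration of the caching pattern, not the code in `process_chapters.py`: `run_stage`, `fake_llm`, and the file layout are hypothetical names chosen for the example.

```python
from pathlib import Path
import tempfile


def run_stage(output_path: Path, call_llm) -> tuple[str, bool]:
    """Return (text, llm_was_called), reusing the file on disk if present."""
    if output_path.is_file():
        # Existing output is reused, so no LLM call is made.
        return output_path.read_text(encoding="utf-8"), False
    text = call_llm()
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(text, encoding="utf-8")
    return text, True


calls = []


def fake_llm() -> str:
    """Hypothetical stand-in for a real LLM adapter call."""
    calls.append(1)
    return "# Entity: division-of-labour\n"


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output" / "entities" / "chapter-01.md"
    first = run_stage(out, fake_llm)   # file missing: the stand-in runs
    second = run_stage(out, fake_llm)  # file present: reused from disk
```

Under this pattern a rebuild only costs LLM calls for chapters whose output files are missing, which is why the regeneration step is cheap for chapters that are already processed.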
---
## 8. How the LLM Integration Works
The pipeline uses MarkiTect's `markitect.llm` module, which provides three
adapter backends that implement the `LLMAdapter` interface:
@@ -423,7 +470,7 @@ supports `gemini-2.5-flash` with generous rate limits.
---
-## 8. Tracking History with Git
+## 9. Tracking History with Git
Every processed chapter produces a git commit containing:
@@ -459,7 +506,7 @@ git commit -m "infospace: process book-1-chapter-05"
---
-## 9. Cost and Performance
+## 10. Cost and Performance
From our measurements processing chapters 3-5:
@@ -486,7 +533,7 @@ To reduce costs further, use a cheaper model:
---
-## 10. Completing the Remaining Chapters
+## 11. Completing the Remaining Chapters
As of now, 5 of 35 chapters are processed (Book I, Chapters 1-5). Here is
how to complete the rest.
@@ -555,7 +602,7 @@ fill the remaining gaps in S3*, S5, and regulatory concepts.
---
-## 11. Quality Improvement Loop
+## 12. Quality Improvement Loop
The infospace is designed to be **iteratively refined**:
@@ -604,7 +651,7 @@ history of the infospace — every refinement decision is traceable.
---
-## 12. Infrastructure Issues Found and Fixed
+## 13. Infrastructure Issues Found and Fixed
During development we documented three issues with the MarkiTect
infrastructure in `INFRA-TASKS.md`:
@@ -624,7 +671,7 @@ See `INFRA-TASKS.md` for details on each fix.
---
-## 13. Adapting This Pattern to Your Own Project
+## 14. Adapting This Pattern to Your Own Project
To build your own infospace using this pattern:


@@ -0,0 +1,51 @@
# Infospace: The Wealth of Nations through the Viable System Model
#
# This configuration declares the infospace built by processing
# Adam Smith's "The Wealth of Nations" (1776) through the lens of
# Stafford Beer's Viable System Model (VSM).

topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/

disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/

schemas:
  entity: schemas/economic-entity-schema-v1.0.md
  mapping: schemas/vsm-mapping-schema-v1.0.md
  analysis: schemas/chapter-analysis-schema-v1.0.md

competency_questions: |
  1. How does Smith's division of labour map to VSM System 1 operations?
  2. What mechanisms in WoN correspond to VSM coordination (System 2)?
  3. Where does Smith describe self-organising regulation (System 3)?
  4. What role does the "invisible hand" play as a System 4 mechanism?
  5. How do Smith's views on government map to System 5 policy?
  6. Is the WoN entity set viable as an explanatory framework?

viability:
  redundancy_ratio:
    max: 0.10
  coverage_ratio:
    min: 0.50
  coherence_components:
    max: 3
  consistency_cycles:
    max: 0
  granularity_entropy:
    min: 1.0

pipeline:
  stages:
    - name: extract-entities
      template: templates/extract-entities.md
    - name: map-to-vsm
      template: templates/map-to-vsm.md
    - name: synthesize-analysis
      template: templates/synthesize-analysis.md
  post_batch:
    - name: assess-metrics
      template: templates/assess-metrics.md
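
Note on the `viability` block: the sketch below shows how min/max thresholds of this shape can be evaluated against the baseline metrics recorded in this commit series. It is an illustration only. `Threshold` and `passes` are hypothetical names, and treating the bounds as inclusive is an assumption, not something the config confirms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Hypothetical stand-in for the tooling's threshold type."""
    min: Optional[float] = None
    max: Optional[float] = None

    def passes(self, value: float) -> bool:
        # Assumption: bounds are inclusive (value == bound still passes).
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


thresholds = {
    "redundancy_ratio": Threshold(max=0.10),
    "coverage_ratio": Threshold(min=0.50),
    "coherence_components": Threshold(max=3),
    "consistency_cycles": Threshold(max=0),
    "granularity_entropy": Threshold(min=1.0),
}

# Values taken from the baseline metrics snapshot (85 entities).
metrics = {
    "redundancy_ratio": 0.0,
    "coverage_ratio": 0.361111,
    "coherence_components": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.687485,
}

results = {name: t.passes(metrics[name]) for name, t in thresholds.items()}
passed = sum(results.values())
failed = [name for name, ok in results.items() if not ok]
print(f"{passed}/{len(results)} thresholds met; failing: {failed}")
# → 4/5 thresholds met; failing: ['coverage_ratio']
```

This reproduces the "4/5 viable" result quoted in the commit message: only `coverage_ratio` (0.36 against a minimum of 0.50) falls outside its bound.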


@@ -0,0 +1,26 @@
- snapshot_id: 6ba48eb2
  created_at: '2026-02-19T01:29:41.225843+00:00'
  schema_name: default
  entity_count: 85
  entity_evaluations: []
  collection_metrics:
  - name: coherence_components
    value: 0.0
    concern: C3
  - name: consistency_cycles
    value: 0.0
    concern: C4
  - name: coverage_ratio
    value: 0.3611111111111111
    concern: C2
  - name: granularity_entropy
    value: 2.687485267017996
    concern: C5
  - name: modularity
    value: 0.0
    concern: C3
  - name: redundancy_ratio
    value: 0.0
    concern: C1
  metadata:
    source: collection-checks


@@ -0,0 +1,6 @@
coherence_components: 0.0
consistency_cycles: 0.0
coverage_ratio: 0.361111
granularity_entropy: 2.687485
modularity: 0.0
redundancy_ratio: 0.0


@@ -856,6 +856,125 @@ class ChapterProcessor:
print(f" (No data yet: {e})")
# ── Infospace tooling integration ─────────────────────────────────


def _load_infospace(example_dir: Path):
    """Load infospace config and entities from the example directory."""
    from markitect.infospace.config import load_infospace_config
    from markitect.infospace.entity_parser import parse_entity_directory

    config_path = example_dir / "infospace.yaml"
    if not config_path.is_file():
        print("Error: No infospace.yaml found. Create one first.")
        sys.exit(1)

    config = load_infospace_config(config_path)
    entities_dir = example_dir / config.entities_dir
    entities = parse_entity_directory(entities_dir) if entities_dir.is_dir() else []
    return config, config_path, entities


def _run_infospace_status(example_dir: Path):
    """Show infospace status using the tooling layer."""
    from markitect.infospace.state import build_state

    config, config_path, entities = _load_infospace(example_dir)
    state = build_state(config, entities=entities)

    print(f"Infospace: {state.topic_name}")
    print(f"Domain: {config.topic.domain}")
    print(f"Entities: {state.entity_count}")
    if state.domains:
        print(f"Domains: {', '.join(state.domains)}")
    if config.disciplines:
        names = [d.name for d in config.disciplines]
        print(f"Disciplines: {', '.join(names)}")

    # Show processing progress
    sources_dir = example_dir / "artifacts" / "sources"
    total_chapters = len(list(sources_dir.glob("*.md")))
    processed = len(list((example_dir / "output" / "analyses").glob("*-analysis.md")))
    print(f"Chapters: {processed}/{total_chapters} processed")


def _run_infospace_check(example_dir: Path):
    """Run collection-level quality checks."""
    from markitect.infospace.checks import run_all_checks
    from markitect.infospace.history import record_check_results

    config, config_path, entities = _load_infospace(example_dir)
    if not entities:
        print("No entities to check.")
        return

    print(f"Running collection checks on {len(entities)} entities...\n")
    report = run_all_checks(entities=entities)

    d = report.to_dict()
    for concern_name, concern_data in d.items():
        label = concern_data.get("concern", concern_name.upper())
        # Separate the concern label (e.g. "C3") from the concern name.
        print(f" {label}: {concern_name}")
        for k, v in concern_data.items():
            if k == "concern":
                continue
            print(f" {k}: {v}")
        print()

    m = report.metrics()
    if m:
        print("Metrics summary:")
        for k, v in sorted(m.items()):
            print(f" {k}: {v:.4f}")

    snap = record_check_results(report, config, example_dir, entity_count=len(entities))
    print(f"\nRecorded snapshot {snap.snapshot_id}")


def _run_infospace_viability(example_dir: Path):
    """Show viability dashboard."""
    from markitect.infospace.history import read_metrics_file
    from markitect.infospace.state import build_state

    config, config_path, entities = _load_infospace(example_dir)
    if not config.viability:
        print("No viability thresholds configured.")
        return

    metrics = read_metrics_file(example_dir / config.metrics_dir / "metrics.yaml")
    if not metrics:
        print("No metrics available. Run --infospace-check first.")
        print("\nConfigured thresholds:")
        for name, t in config.viability.items():
            bounds = []
            if t.min is not None:
                bounds.append(f"min={t.min}")
            if t.max is not None:
                bounds.append(f"max={t.max}")
            print(f" {name}: {', '.join(bounds)}")
        return

    state = build_state(config, entities=entities, metrics=metrics)

    print(f"{'Metric':<30} {'Value':>8} {'Threshold':>15} {'Status':>8}")
    print("-" * 63)
    for r in state.viability_results:
        bounds = []
        if r.threshold.min is not None:
            bounds.append(f"min={r.threshold.min}")
        if r.threshold.max is not None:
            bounds.append(f"max={r.threshold.max}")
        status_str = "PASS" if r.passed else "FAIL"
        print(f"{r.metric:<30} {r.value:>8.4f} {', '.join(bounds):>15} {status_str:>8}")

    print()
    if state.is_viable:
        print(f"Viable: YES ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
    else:
        print(f"Viable: NO ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")


def main():
    parser = argparse.ArgumentParser(
        description="Process Wealth of Nations chapters through VSM analysis pipeline"
@@ -869,6 +988,12 @@ def main():
    group.add_argument("--stats", action="store_true", help="Show dependency statistics")
    group.add_argument("--archive-entity", type=str, metavar="SLUG",
                       help="Archive an entity (move to archive/ with reason)")
    group.add_argument("--infospace-status", action="store_true",
                       help="Show infospace status via infospace tooling")
    group.add_argument("--infospace-check", action="store_true",
                       help="Run collection-level quality checks (C1-C5)")
    group.add_argument("--infospace-viability", action="store_true",
                       help="Show viability dashboard")
    parser.add_argument("--reason", type=str, default=None,
                        help="Reason for archiving (used with --archive-entity)")
@@ -930,6 +1055,15 @@ def main():
for ch in chapters:
processor.process_chapter(ch, auto_commit=not args.no_commit)
print()
    elif args.infospace_status:
        _run_infospace_status(example_dir)
        return
    elif args.infospace_check:
        _run_infospace_check(example_dir)
        return
    elif args.infospace_viability:
        _run_infospace_viability(example_dir)
        return
processor.show_stats()


@@ -0,0 +1,6 @@
"""
markitect.analysis — Analytical utilities for MarkiTect.
Provides graph analysis, similarity computation, and other
quantitative tools used by infospace tooling.
"""

307
markitect/analysis/fca.py Normal file

@@ -0,0 +1,307 @@
"""
Formal Concept Analysis (FCA) for coverage gap detection.

Provides a pure-Python implementation of:

- :class:`FormalContext`: entity × attribute binary relation with
  extent/intent operations and double-prime closure.
- :class:`ConceptLattice`: the set of all formal concepts computed
  via the NextClosure algorithm (Ganter, 1984).
- :func:`find_gap_concepts`: attribute combinations present in the
  lattice whose extent is empty, revealing structural coverage gaps.

Sufficient for entity scales of ~100s. For larger contexts a library
such as ``concepts`` (PyPI) can be substituted.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Optional


class FormalContext:
    """Binary relation between objects and attributes.

    Args:
        objects: Iterable of object identifiers (e.g. entity slugs).
        attributes: Iterable of attribute identifiers (e.g. "domain:Production").
        incidence: Mapping of object → set of attributes it possesses.
    """

    def __init__(
        self,
        objects: Iterable[str],
        attributes: Iterable[str],
        incidence: dict[str, set[str]],
    ):
        self._objects = sorted(set(objects))
        self._attributes = sorted(set(attributes))
        self._obj_set = frozenset(self._objects)
        self._attr_set = frozenset(self._attributes)

        # Normalise incidence: only keep known attributes
        self._incidence: dict[str, frozenset[str]] = {}
        for obj in self._objects:
            raw = incidence.get(obj, set())
            self._incidence[obj] = frozenset(raw) & self._attr_set

        # Reverse index: attribute → set of objects that have it
        self._attr_to_objs: dict[str, frozenset[str]] = {}
        for attr in self._attributes:
            self._attr_to_objs[attr] = frozenset(
                obj for obj in self._objects if attr in self._incidence[obj]
            )

    @property
    def objects(self) -> list[str]:
        """Sorted list of objects."""
        return list(self._objects)

    @property
    def attributes(self) -> list[str]:
        """Sorted list of attributes."""
        return list(self._attributes)

    @property
    def object_count(self) -> int:
        return len(self._objects)

    @property
    def attribute_count(self) -> int:
        return len(self._attributes)

    def extent(self, attrs: Iterable[str]) -> frozenset[str]:
        """Objects possessing **all** given attributes (B' operation)."""
        attr_set = frozenset(attrs)
        if not attr_set:
            return self._obj_set
        result = self._obj_set
        for attr in attr_set:
            result = result & self._attr_to_objs.get(attr, frozenset())
        return result

    def intent(self, objs: Iterable[str]) -> frozenset[str]:
        """Attributes shared by **all** given objects (A' operation)."""
        obj_list = [o for o in objs if o in self._incidence]
        if not obj_list:
            return self._attr_set
        result = self._incidence[obj_list[0]]
        for obj in obj_list[1:]:
            result = result & self._incidence[obj]
        return result

    def closure(self, attrs: Iterable[str]) -> frozenset[str]:
        """Double-prime closure: B'' = intent(extent(B))."""
        return self.intent(self.extent(attrs))

    def has_attribute(self, obj: str, attr: str) -> bool:
        """Check if *obj* has *attr*."""
        return attr in self._incidence.get(obj, frozenset())

    def density(self) -> float:
        """Proportion of 1s in the incidence matrix."""
        total = len(self._objects) * len(self._attributes)
        if total == 0:
            return 0.0
        filled = sum(len(attrs) for attrs in self._incidence.values())
        return filled / total

    @classmethod
    def from_dict(cls, entity_attributes: dict[str, set[str]]) -> FormalContext:
        """Convenience: build context from ``{object: {attr, ...}}``."""
        objects = list(entity_attributes.keys())
        all_attrs: set[str] = set()
        for attrs in entity_attributes.values():
            all_attrs.update(attrs)
        return cls(objects, all_attrs, entity_attributes)


@dataclass(frozen=True)
class FormalConcept:
    """A formal concept (A, B) where A' = B and B' = A."""

    extent: frozenset[str]
    intent: frozenset[str]

    @property
    def extent_size(self) -> int:
        return len(self.extent)

    @property
    def intent_size(self) -> int:
        return len(self.intent)


@dataclass
class ConceptLattice:
    """The set of all formal concepts derived from a :class:`FormalContext`.

    Concepts are ordered by extent inclusion (subconcept ≤ superconcept).
    """

    concepts: list[FormalConcept] = field(default_factory=list)

    @property
    def size(self) -> int:
        """Number of formal concepts in the lattice."""
        return len(self.concepts)

    @property
    def top(self) -> Optional[FormalConcept]:
        """Supremum: concept with largest extent."""
        if not self.concepts:
            return None
        return max(self.concepts, key=lambda c: c.extent_size)

    @property
    def bottom(self) -> Optional[FormalConcept]:
        """Infimum: concept with largest intent."""
        if not self.concepts:
            return None
        return max(self.concepts, key=lambda c: c.intent_size)

    @classmethod
    def from_context(cls, context: FormalContext) -> ConceptLattice:
        """Compute all formal concepts using the NextClosure algorithm."""
        attrs = context.attributes  # sorted, fixed order
        if not attrs:
            # Degenerate: no attributes → single concept with all objects
            top = FormalConcept(
                extent=frozenset(context.objects),
                intent=frozenset(),
            )
            return cls(concepts=[top])

        concepts: list[FormalConcept] = []

        # Start with closure of empty attribute set
        current = context.closure(frozenset())
        ext = context.extent(current)
        concepts.append(FormalConcept(extent=ext, intent=current))

        while current != frozenset(attrs):
            nxt = _next_closure(current, attrs, context.closure)
            if nxt is None:
                break
            ext = context.extent(nxt)
            concepts.append(FormalConcept(extent=ext, intent=nxt))
            current = nxt

        return cls(concepts=concepts)

    def gap_concepts(self) -> list[FormalConcept]:
        """Formal concepts whose extent is empty."""
        return [c for c in self.concepts if c.extent_size == 0]

    def concepts_with_extent_size(self, min_size: int = 0, max_size: Optional[int] = None) -> list[FormalConcept]:
        """Filter concepts by extent size."""
        result = [c for c in self.concepts if c.extent_size >= min_size]
        if max_size is not None:
            result = [c for c in result if c.extent_size <= max_size]
        return result

    def depth(self) -> int:
        """Longest chain length in the concept ordering.

        A chain is a sequence of concepts c_1 < c_2 < ... < c_k
        where < means strict subconcept (extent inclusion).
        """
        if not self.concepts:
            return 0

        # Build DAG: concept i → j if i is direct subconcept of j
        # Use extent inclusion: i < j iff extent_i ⊂ extent_j
        n = len(self.concepts)
        extents = [c.extent for c in self.concepts]

        # Longest path via dynamic programming on sorted order
        # Sort by extent size ascending (smaller extents = more specific)
        order = sorted(range(n), key=lambda i: len(extents[i]))
        longest = [1] * n

        for idx in range(n):
            i = order[idx]
            for jdx in range(idx + 1, n):
                j = order[jdx]
                if extents[i] < extents[j]:  # strict subset
                    if longest[j] < longest[i] + 1:
                        longest[j] = longest[i] + 1

        return max(longest) if longest else 0


def find_gap_concepts(
    context: FormalContext,
    lattice: Optional[ConceptLattice] = None,
) -> list[FormalConcept]:
    """Find formal concepts with empty extent (coverage gaps).

    These represent attribute combinations that are structurally
    present in the lattice but have no corresponding entities.

    Args:
        context: The formal context.
        lattice: Pre-computed lattice. If ``None``, computed from *context*.

    Returns:
        List of :class:`FormalConcept` with empty extent, sorted by
        intent size ascending (fewest attributes, i.e. most general
        gaps, first).
    """
    if lattice is None:
        lattice = ConceptLattice.from_context(context)
    gaps = lattice.gap_concepts()
    gaps.sort(key=lambda c: c.intent_size)
    return gaps


def find_empty_cells(
    context: FormalContext,
    dimension_a: list[str],
    dimension_b: list[str],
) -> list[tuple[str, str]]:
    """Find empty cells in a two-dimensional cross-tabulation.

    Given two sets of attributes (e.g. domain values and VSM systems),
    return pairs ``(attr_a, attr_b)`` where no object possesses both.

    This is a simpler alternative to full FCA for two-dimensional
    coverage analysis.
    """
    empty: list[tuple[str, str]] = []
    for a in sorted(dimension_a):
        for b in sorted(dimension_b):
            if not context.extent([a, b]):
                empty.append((a, b))
    return empty


# ── NextClosure internals ───────────────────────────────────────────


def _next_closure(
    current: frozenset[str],
    attrs: list[str],
    closure_fn,
) -> Optional[frozenset[str]]:
    """Compute the next closed set in lectic order after *current*.

    Implements Ganter's NextClosure algorithm.
    """
    for i in range(len(attrs) - 1, -1, -1):
        m = attrs[i]
        if m in current:
            current = current - {m}
        else:
            candidate = current | {m}
            closed = closure_fn(candidate)
            # Canonicity test: no attribute before position i
            # was added by the closure
            canonical = True
            for j in range(i):
                if attrs[j] in closed and attrs[j] not in candidate:
                    canonical = False
                    break
            if canonical:
                return closed
    return None
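
Note on `fca.py`: the extent, intent, and closure operations, and the cross-tabulation used by `find_empty_cells`, can be traced on a toy context. The sketch below is self-contained and does not import the module. The entity slugs and attribute names are invented for illustration.

```python
from itertools import product

# Toy context: object -> attributes (slugs and attribute names are made up).
context = {
    "pin-factory": {"domain:Production", "vsm:S1"},
    "division-of-labour": {"domain:Production", "vsm:S1", "vsm:S2"},
    "market-price": {"domain:Exchange", "vsm:S2"},
}
all_attrs = frozenset().union(*context.values())


def extent(attrs):
    """Objects that have every attribute in *attrs*."""
    wanted = set(attrs)
    return frozenset(o for o, a in context.items() if wanted <= a)


def intent(objs):
    """Attributes shared by every object in *objs* (all attributes if empty)."""
    objs = list(objs)
    if not objs:
        return all_attrs
    return frozenset(set.intersection(*(set(context[o]) for o in objs)))


def closure(attrs):
    """Double-prime closure: intent(extent(attrs))."""
    return intent(extent(attrs))


# Every Production object also has vsm:S1, so the closure adds it.
closed = closure({"domain:Production"})

# Cross-tabulate domain x VSM system and keep cells with no objects.
domains = ["domain:Exchange", "domain:Production"]
systems = ["vsm:S1", "vsm:S2"]
empty_cells = [(d, s) for d, s in product(domains, systems) if not extent([d, s])]
```

Here `closed` is `{"domain:Production", "vsm:S1"}` and the only empty cell is `("domain:Exchange", "vsm:S1")`. Because a formal concept is determined by its extent, a lattice contains at most one concept with an empty extent (the bottom concept), so the cross-tabulation is the tool that pinpoints individual missing combinations.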

184
markitect/analysis/graph.py Normal file

@@ -0,0 +1,184 @@
"""
Graph analysis utilities for collection-level metrics.

Provides connected components, centrality, community detection,
modularity, degree distribution, and cohesion/coupling computation.

Requires ``networkx`` (optional dependency)::

    pip install networkx
"""

from __future__ import annotations

from typing import Optional

from markitect.prompts.dependencies.models import DependencyGraph


def _require_networkx():
    """Import and return networkx, raising a clear error if missing."""
    try:
        import networkx as nx
        return nx
    except ImportError:
        raise ImportError(
            "networkx is required for graph analysis. "
            "Install it with: pip install networkx"
        ) from None


def to_networkx(graph: DependencyGraph):
    """Convert a :class:`DependencyGraph` to a networkx ``DiGraph``.

    Each edge carries an ``edge_type`` attribute (string value of the
    :class:`EdgeType` enum, or ``None``).
    """
    nx = _require_networkx()
    G = nx.DiGraph()
    G.add_nodes_from(graph.nodes)
    for node in graph.nodes:
        for succ in graph.get_successors(node):
            edge_type = graph.get_edge_type(node, succ)
            G.add_edge(
                node, succ,
                edge_type=edge_type.value if edge_type else None,
            )
    return G


def connected_components(graph: DependencyGraph) -> list[set[str]]:
    """Find weakly connected components (edges treated as undirected).

    Returns a list of node sets, one per component, sorted largest-first.
    """
    nx = _require_networkx()
    G = to_networkx(graph)
    components = list(nx.weakly_connected_components(G))
    components.sort(key=len, reverse=True)
    return [set(c) for c in components]


def betweenness_centrality(graph: DependencyGraph) -> dict[str, float]:
    """Compute betweenness centrality for all nodes.

    Returns a dict mapping node ID to centrality score in [0, 1].
    """
    nx = _require_networkx()
    G = to_networkx(graph)
    return nx.betweenness_centrality(G)


def detect_communities(
    graph: DependencyGraph,
    seed: Optional[int] = None,
) -> list[set[str]]:
    """Detect communities using the Louvain algorithm.

    Operates on an undirected projection of the graph. Returns a list
    of node sets, one per community, sorted largest-first.

    Args:
        graph: The dependency graph to analyse.
        seed: Random seed for reproducibility (passed to Louvain).
    """
    nx = _require_networkx()
    G = to_networkx(graph).to_undirected()
    if len(G.nodes) == 0:
        return []
    communities = list(nx.community.louvain_communities(G, seed=seed))
    communities.sort(key=len, reverse=True)
    return [set(c) for c in communities]


def modularity_score(
    graph: DependencyGraph,
    communities: Optional[list[set[str]]] = None,
    seed: Optional[int] = None,
) -> float:
    """Compute the modularity score for a community partition.

    Args:
        graph: The dependency graph.
        communities: Pre-computed communities. If ``None``, communities
            are detected via :func:`detect_communities`.
        seed: Random seed (used only when *communities* is ``None``).

    Returns:
        Modularity in [-0.5, 1.0]. Returns 0.0 for graphs with no edges.
    """
    nx = _require_networkx()
    G = to_networkx(graph).to_undirected()
    if len(G.edges) == 0:
        return 0.0
    if communities is None:
        communities = detect_communities(graph, seed=seed)
    return nx.community.modularity(G, communities)


def degree_distribution(graph: DependencyGraph) -> dict[str, dict[str, int]]:
    """Compute in-degree, out-degree, and total degree for each node.

    Returns::

        {"node_id": {"in_degree": 2, "out_degree": 1, "total_degree": 3}, ...}
    """
    nx = _require_networkx()
    G = to_networkx(graph)
    result = {}
    for node in G.nodes:
        ind = G.in_degree(node)
        outd = G.out_degree(node)
        result[node] = {
            "in_degree": ind,
            "out_degree": outd,
            "total_degree": ind + outd,
        }
    return result


def cohesion_coupling(
    graph: DependencyGraph,
    communities: Optional[list[set[str]]] = None,
    seed: Optional[int] = None,
) -> dict:
    """Compute cohesion (intra-community edges) and coupling (inter-community edges).

    Args:
        graph: The dependency graph.
        communities: Pre-computed communities. If ``None``, detected
            via :func:`detect_communities`.
        seed: Random seed (used only when *communities* is ``None``).

    Returns:
        Dict with keys ``cohesion``, ``coupling`` (ratios in [0, 1]),
        ``intra_edges``, ``inter_edges``, ``total_edges``, ``communities``.
    """
    _require_networkx()
    G = to_networkx(graph)

    if communities is None:
        communities = detect_communities(graph, seed=seed)

    # Build node → community index
    node_community: dict[str, int] = {}
    for i, comm in enumerate(communities):
        for node in comm:
            node_community[node] = i

    intra = 0
    inter = 0
    for u, v in G.edges:
        if node_community.get(u) == node_community.get(v):
            intra += 1
        else:
            inter += 1

    total = intra + inter
    return {
        "cohesion": intra / total if total > 0 else 0.0,
        "coupling": inter / total if total > 0 else 0.0,
        "intra_edges": intra,
        "inter_edges": inter,
        "total_edges": total,
        "communities": len(communities),
    }
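
Note on `cohesion_coupling`: the ratio computation itself needs no graph library once a partition is given. The sketch below works through it on a hand-made edge list. The node names and the partition are invented, and the community detection step (Louvain in the module) is replaced by a fixed partition.

```python
# Directed edges of a toy graph and a fixed two-community partition.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
communities = [{"a", "b", "c"}, {"d", "e"}]

# Map each node to the index of its community.
node_community = {n: i for i, comm in enumerate(communities) for n in comm}

# An edge is intra-community when both endpoints share a community index.
intra = sum(1 for u, v in edges if node_community[u] == node_community[v])
inter = len(edges) - intra

cohesion = intra / len(edges)  # share of edges inside communities
coupling = inter / len(edges)  # share of edges crossing communities
```

Four of the five edges stay inside a community, so cohesion is 0.8 and coupling is 0.2. One detail worth knowing when reading the module version: it looks nodes up with `dict.get`, so two nodes that belong to no community both map to `None` and their edge is counted as intra-community.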


@@ -7147,6 +7147,13 @@ try:
except ImportError:
    pass  # Helper module not available

# Register infospace commands
try:
    from markitect.infospace.cli import infospace_commands
    cli.add_command(infospace_commands)
except ImportError:
    pass  # Infospace module not available

# Register proxy file system commands
try:
    from markitect.proxy.cli import proxy_group


@@ -9,6 +9,7 @@ This package contains the fundamental building blocks:
"""
from .parser import parse_markdown_to_ast
from .section_tree import build_section_tree, extract_section_text
from .serializer import ASTSerializer
from .document_manager import DocumentManager, CleanDocumentManager
from .workspace import (
@@ -29,6 +30,9 @@ from .workspace import (
__all__ = [
# Parser
"parse_markdown_to_ast",
# Section tree
"build_section_tree",
"extract_section_text",
# Serializer
"ASTSerializer",
# Document Manager


@@ -0,0 +1,124 @@
"""
Standalone section-tree utilities extracted from SchemaGenerator.

Builds a hierarchical section tree from flat markdown-it AST tokens and
provides helpers for navigating heading structure and extracting text.

These functions are used by both the schema generator and the infospace
entity parser.
"""

import re
from typing import Any, Dict, List, Optional


def slugify(text: str) -> str:
    """Convert heading or label text to a valid slug / JSON property key."""
    replacements = {
        'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
        'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue', 'ß': 'ss',
    }
    slug = text
    for char, repl in replacements.items():
        slug = slug.replace(char, repl)
    slug = slug.lower()
    slug = re.sub(r'[^a-z0-9]+', '_', slug)
    slug = slug.strip('_')
    return slug or 'feld'


def extract_heading_level(tag: str) -> int:
    """Extract heading level from an HTML tag string (h1, h2, …)."""
    if tag.startswith('h') and len(tag) == 2:
        try:
            return int(tag[1])
        except ValueError:
            pass
    return 1


def extract_heading_content(tokens: List[Dict[str, Any]], start_index: int) -> str:
    """Return the inline text content following a ``heading_open`` token."""
    for i in range(start_index, min(start_index + 3, len(tokens))):
        token = tokens[i]
        if token.get('type') == 'inline':
            return token.get('content', '')
    return ''


def build_section_tree(
    tokens: List[Dict[str, Any]], max_depth: Optional[int] = None
) -> Dict[str, Any]:
    """
    Build a hierarchical section tree from a flat markdown-it token list.

    Returns a root node whose ``children`` list contains the top-level
    sections. Each node carries:

    - ``heading``         heading text (``None`` for the root)
    - ``level``           heading depth (``0`` for the root)
    - ``slug``            slugified heading
    - ``content_tokens``  non-heading tokens belonging to this section
    - ``children``        nested sub-sections
    """
    root: Dict[str, Any] = {
        'heading': None, 'level': 0, 'slug': '',
        'content_tokens': [], 'children': []
    }
    stack = [root]

    i = 0
    while i < len(tokens):
        token = tokens[i]

        if token.get('type') == 'heading_open':
            level = extract_heading_level(token.get('tag', ''))
            heading_text = extract_heading_content(tokens, i)

            if max_depth is not None and level > max_depth:
                # Skip this heading and its close token, but keep content
                i += 1
                while i < len(tokens) and tokens[i].get('type') != 'heading_close':
                    i += 1
                i += 1
                continue

            section: Dict[str, Any] = {
                'heading': heading_text,
                'level': level,
                'slug': slugify(heading_text),
                'content_tokens': [],
                'children': []
            }

            # Pop stack until we find the parent (level < current)
            while len(stack) > 1 and stack[-1]['level'] >= level:
                stack.pop()

            stack[-1]['children'].append(section)
            stack.append(section)

            # Skip past heading_close
            i += 1
            while i < len(tokens) and tokens[i].get('type') != 'heading_close':
                i += 1
        else:
            # Add content token to current section
            stack[-1]['content_tokens'].append(token)

        i += 1

    return root


def extract_section_text(section: Dict[str, Any]) -> str:
    """
    Return the plain text content of a section node.

    Concatenates the ``content`` field of every ``inline`` token found
    in the section's ``content_tokens``, joined with newlines. Tokens
    of any other type are ignored.
    """
    parts: List[str] = []
    for token in section.get('content_tokens', []):
        if token.get('type') == 'inline':
            parts.append(token.get('content', ''))
    return '\n'.join(parts)
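
Note on `build_section_tree`: the core of the function is the stack of open sections. The sketch below isolates that step on plain `(level, title)` pairs instead of markdown-it tokens. `nest_headings` and the heading titles are hypothetical, introduced only for this illustration.

```python
def nest_headings(headings):
    """Nest (level, title) pairs into a tree using a stack of open sections."""
    root = {"title": None, "level": 0, "children": []}
    stack = [root]
    for level, title in headings:
        node = {"title": title, "level": level, "children": []}
        # Close every open section at the same or a deeper level, so the
        # top of the stack is the nearest ancestor with a smaller level.
        while len(stack) > 1 and stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


tree = nest_headings([
    (1, "Entity"),
    (2, "Definition"),
    (3, "Scope"),
    (2, "Relations"),
])
entity = tree["children"][0]
child_titles = [c["title"] for c in entity["children"]]
```

`child_titles` is `["Definition", "Relations"]`, and "Scope" ends up nested under "Definition". The root is never popped (`len(stack) > 1`), so a document that starts at a deep heading level still attaches to the root.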


@@ -0,0 +1,107 @@
"""
Infospace analysis package.
Provides tooling for extracting structured metadata from entity markdown
files and analysing infospace collections.
"""
from .models import EntityMeta
from .entity_parser import parse_entity_file, parse_entity_directory
from .schema import (
ECONOMIC_ENTITY_SCHEMA,
EntitySchema,
EnumConstraint,
SectionRequirement,
SectionRule,
)
from .validator import (
BatchComplianceResult,
ComplianceDiagnostic,
ComplianceResult,
validate_entities,
validate_entity,
)
from .evaluation import (
EntityEvaluation,
EvaluationSnapshot,
MetricChange,
MetricValue,
ScoreChange,
ScoreEntry,
SnapshotDiff,
)
from .evaluation_io import (
append_to_history,
diff_snapshots,
read_entity_evaluation,
read_history,
read_snapshot,
write_entity_evaluation,
write_snapshot,
)
from .config import (
DisciplineBinding,
InfospaceConfig,
PipelineConfig,
PipelineStage,
SchemaRegistry,
TopicConfig,
ViabilityThreshold,
find_infospace_config,
load_infospace_config,
save_infospace_config,
)
from .state import (
InfospaceState,
ViabilityResult,
build_state,
)
__all__ = [
"EntityMeta",
"parse_entity_file",
"parse_entity_directory",
# Schema
"ECONOMIC_ENTITY_SCHEMA",
"EntitySchema",
"EnumConstraint",
"SectionRequirement",
"SectionRule",
# Validator
"BatchComplianceResult",
"ComplianceDiagnostic",
"ComplianceResult",
"validate_entities",
"validate_entity",
# Evaluation models
"EntityEvaluation",
"EvaluationSnapshot",
"MetricChange",
"MetricValue",
"ScoreChange",
"ScoreEntry",
"SnapshotDiff",
# Evaluation I/O
"append_to_history",
"diff_snapshots",
"read_entity_evaluation",
"read_history",
"read_snapshot",
"write_entity_evaluation",
"write_snapshot",
# Config
"DisciplineBinding",
"InfospaceConfig",
"PipelineConfig",
"PipelineStage",
"SchemaRegistry",
"TopicConfig",
"ViabilityThreshold",
"find_infospace_config",
"load_infospace_config",
"save_infospace_config",
# State
"InfospaceState",
"ViabilityResult",
"build_state",
]


@@ -0,0 +1,23 @@
"""
Collection-level quality checks for infospaces.
Five concerns: Redundancy (C1), Coverage (C2), Coherence (C3),
Consistency (C4), Granularity (C5).
"""
from markitect.infospace.checks.redundancy import check_redundancy
from markitect.infospace.checks.coverage import check_coverage
from markitect.infospace.checks.coherence import check_coherence
from markitect.infospace.checks.consistency import check_consistency
from markitect.infospace.checks.granularity import check_granularity
from markitect.infospace.checks.orchestrator import run_all_checks, CheckReport
__all__ = [
"check_redundancy",
"check_coverage",
"check_coherence",
"check_consistency",
"check_granularity",
"run_all_checks",
"CheckReport",
]


@@ -0,0 +1,81 @@
"""
C3: Structural coherence.

Uses graph analysis to check that the entity relationship graph is
well-connected and has meaningful community structure.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional

from markitect.prompts.dependencies.models import DependencyGraph


@dataclass
class CoherenceReport:
    """Results from coherence analysis."""

    connected_components: int = 0
    largest_component_size: int = 0
    modularity: float = 0.0
    community_count: int = 0
    cohesion: float = 0.0
    coupling: float = 0.0
    entity_count: int = 0

    def to_dict(self) -> dict:
        return {
            "concern": "C3",
            "connected_components": self.connected_components,
            "largest_component_size": self.largest_component_size,
            "modularity": round(self.modularity, 4),
            "community_count": self.community_count,
            "cohesion": round(self.cohesion, 4),
            "coupling": round(self.coupling, 4),
            "entity_count": self.entity_count,
        }
def check_coherence(
graph: Optional[DependencyGraph] = None,
entity_count: int = 0,
) -> CoherenceReport:
"""Check structural coherence of the entity relationship graph.
Args:
graph: The entity relationship graph. If ``None``, returns
a report with zero values.
entity_count: Total number of entities (for context).
Returns:
:class:`CoherenceReport` with connectivity and community metrics.
"""
if graph is None or len(graph.nodes) == 0:
return CoherenceReport(entity_count=entity_count)
try:
from markitect.analysis.graph import (
connected_components,
modularity_score,
detect_communities,
cohesion_coupling,
)
except ImportError:
return CoherenceReport(entity_count=entity_count)
components = connected_components(graph)
communities = detect_communities(graph, seed=42)
mod = modularity_score(graph, communities=communities)
cc = cohesion_coupling(graph, communities=communities)
return CoherenceReport(
connected_components=len(components),
# Do not assume the components arrive sorted by size.
largest_component_size=max((len(c) for c in components), default=0),
modularity=mod,
community_count=len(communities),
cohesion=cc["cohesion"],
coupling=cc["coupling"],
entity_count=entity_count or len(graph.nodes),
)
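
The connectivity part of C3 can be illustrated without `markitect.analysis.graph`, whose implementation is not shown in this diff. The following is a minimal sketch, assuming the graph is given as a plain adjacency dict and that edges are treated as undirected; `connected_components_sketch` is a hypothetical name, not the project's function.

```python
from collections import deque
from typing import Dict, List, Set


def connected_components_sketch(adjacency: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group the nodes of a graph into connected components using BFS."""
    # Make every edge symmetric so that direction does not matter.
    undirected: Dict[str, Set[str]] = {node: set() for node in adjacency}
    for node, neighbours in adjacency.items():
        for other in neighbours:
            undirected.setdefault(node, set()).add(other)
            undirected.setdefault(other, set()).add(node)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(undirected):
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in undirected[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Hypothetical entity graph: two linked groups and one isolated entity.
graph = {"a": {"b"}, "b": set(), "c": {"d"}, "d": {"e"}, "e": set(), "f": set()}
components = connected_components_sketch(graph)
largest = max((len(c) for c in components), default=0)
```

Taking the maximum over component sizes avoids relying on the order in which components are returned.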

@@ -0,0 +1,58 @@
"""
C4 — Definitional consistency.
Checks for cycles in the dependency graph. Definitional conflicts
between entities are not detected here: ``entities`` is used only for
the entity count.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from markitect.infospace.models import EntityMeta
from markitect.prompts.dependencies.models import DependencyGraph
@dataclass
class ConsistencyReport:
"""Results from consistency analysis."""
cycles: List[List[str]] = field(default_factory=list)
cycle_count: int = 0
entity_count: int = 0
def to_dict(self) -> dict:
return {
"concern": "C4",
"cycle_count": self.cycle_count,
"cycles": self.cycles,
"entity_count": self.entity_count,
}
def check_consistency(
entities: List[EntityMeta],
graph: Optional[DependencyGraph] = None,
) -> ConsistencyReport:
"""Check definitional consistency.
Args:
entities: Entity metadata list.
graph: Optional dependency graph for cycle detection.
Returns:
:class:`ConsistencyReport` with cycles found.
"""
n = len(entities)
cycles: List[List[str]] = []
if graph is not None and len(graph.nodes) > 0:
cycles = graph.detect_cycles()
return ConsistencyReport(
cycles=cycles,
cycle_count=len(cycles),
entity_count=n,
)
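
`DependencyGraph.detect_cycles()` is not part of this diff, so its algorithm is unknown. As an illustration of what a cycle check of this kind does, here is a minimal sketch using a three-colour depth-first search over a plain `{node: [dependencies]}` dict. The function name and the input shape are assumptions. It records one cycle per back edge, which is enough to flag circular definitions but does not enumerate every elementary cycle.

```python
from typing import Dict, List


def find_cycles_sketch(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Return one cycle (as a closed node path) per back edge found by DFS."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour: Dict[str, int] = {node: WHITE for node in deps}
    for targets in deps.values():
        for target in targets:
            colour.setdefault(target, WHITE)

    cycles: List[List[str]] = []
    path: List[str] = []

    def visit(node: str) -> None:
        colour[node] = GREY
        path.append(node)
        for nxt in deps.get(node, []):
            if colour[nxt] == GREY:
                # Back edge: the cycle is the path from nxt to here, closed.
                cycles.append(path[path.index(nxt):] + [nxt])
            elif colour[nxt] == WHITE:
                visit(nxt)
        path.pop()
        colour[node] = BLACK

    for node in sorted(colour):
        if colour[node] == WHITE:
            visit(node)
    return cycles


# Hypothetical definitions that refer to each other in a loop.
deps = {
    "system": ["feedback"],
    "feedback": ["control"],
    "control": ["system"],
    "variety": ["system"],
}
cycles = find_cycles_sketch(deps)
```

The recursive form is fine for small entity collections; a very deep dependency chain would need an explicit stack.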

@@ -0,0 +1,111 @@
"""
C2 — Coverage completeness.
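Uses FCA and cross-tabulation to detect structural coverage gaps:
attribute combinations (domain × chapter) with no entities. Extra
attributes (e.g. VSM mappings) enter the FCA context but not the
cross-tabulation unless they use the ``domain:`` or ``chapter:`` prefix.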
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
from markitect.infospace.models import EntityMeta
from markitect.analysis.fca import FormalContext, find_empty_cells, find_gap_concepts
@dataclass
class CoverageReport:
"""Results from coverage analysis."""
coverage_ratio: float = 0.0
empty_cells: List[dict] = field(default_factory=list)
gap_concepts: List[dict] = field(default_factory=list)
domain_counts: Dict[str, int] = field(default_factory=dict)
entity_count: int = 0
def to_dict(self) -> dict:
return {
"concern": "C2",
"coverage_ratio": round(self.coverage_ratio, 4),
"empty_cells": self.empty_cells,
"gap_concepts_count": len(self.gap_concepts),
"domain_counts": self.domain_counts,
"entity_count": self.entity_count,
}
def _extract_attributes(entity: EntityMeta) -> set[str]:
"""Extract FCA attributes from an entity."""
attrs: set[str] = set()
if entity.domain:
attrs.add(f"domain:{entity.domain}")
if entity.source_chapter:
attrs.add(f"chapter:{entity.source_chapter}")
return attrs
def check_coverage(
entities: List[EntityMeta],
extra_attributes: Optional[Dict[str, set[str]]] = None,
) -> CoverageReport:
"""Check coverage completeness using FCA gap analysis.
Args:
entities: Entity metadata list.
extra_attributes: Optional ``{slug: {attr, ...}}`` to merge
with auto-extracted attributes (e.g. VSM mappings).
Returns:
:class:`CoverageReport` with gaps and coverage ratio.
"""
n = len(entities)
if n == 0:
return CoverageReport()
# Build entity → attributes mapping
entity_attrs: Dict[str, set[str]] = {}
for e in entities:
attrs = _extract_attributes(e)
if extra_attributes and e.slug in extra_attributes:
attrs.update(extra_attributes[e.slug])
entity_attrs[e.slug] = attrs
# Domain counts
domain_counts: Dict[str, int] = {}
for e in entities:
d = e.domain or "(unspecified)"
domain_counts[d] = domain_counts.get(d, 0) + 1
# Build FCA context
context = FormalContext.from_dict(entity_attrs)
# Cross-tabulation: domain × chapter
domains = sorted({a for attrs in entity_attrs.values() for a in attrs if a.startswith("domain:")})
chapters = sorted({a for attrs in entity_attrs.values() for a in attrs if a.startswith("chapter:")})
empty = []
if domains and chapters:
raw_empty = find_empty_cells(context, domains, chapters)
empty = [{"dimension_a": a, "dimension_b": b} for a, b in raw_empty]
# FCA gap concepts
gaps = find_gap_concepts(context)
gap_dicts = [
{"intent": sorted(g.intent), "extent_size": g.extent_size}
for g in gaps
if g.intent_size <= 4 # Only report manageable gaps
]
# Coverage ratio: populated cells / total possible cells.
# With no domain or chapter attributes there is no grid to cover, so the
# ratio is 0.0 rather than a misleading 1.0.
total_cells = len(domains) * len(chapters)
populated = total_cells - len(empty)
ratio = populated / total_cells if total_cells > 0 else 0.0
return CoverageReport(
coverage_ratio=ratio,
empty_cells=empty,
gap_concepts=gap_dicts,
domain_counts=domain_counts,
entity_count=n,
)
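
The coverage ratio is easier to follow with a worked example. The sketch below assumes that `find_empty_cells` treats a (domain, chapter) cell as populated when at least one entity carries both attributes; that behaviour is inferred from how the result is used, since `markitect.analysis.fca` is not shown here. `empty_cells_sketch` and the sample entities are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def empty_cells_sketch(
    entity_attrs: Dict[str, Set[str]], rows: List[str], cols: List[str]
) -> List[Tuple[str, str]]:
    """Return the (row, col) attribute pairs that no entity carries together."""
    populated: Set[Tuple[str, str]] = set()
    for attrs in entity_attrs.values():
        for row, col in product(rows, cols):
            if row in attrs and col in attrs:
                populated.add((row, col))
    return [cell for cell in product(rows, cols) if cell not in populated]


# Three hypothetical entities spread over two domains and two chapters.
entity_attrs = {
    "feedback-loop": {"domain:control", "chapter:1"},
    "requisite-variety": {"domain:control", "chapter:2"},
    "recursion": {"domain:structure", "chapter:2"},
}
all_attrs = set().union(*entity_attrs.values())
domains = sorted(a for a in all_attrs if a.startswith("domain:"))
chapters = sorted(a for a in all_attrs if a.startswith("chapter:"))

empty = empty_cells_sketch(entity_attrs, domains, chapters)
total_cells = len(domains) * len(chapters)
ratio = (total_cells - len(empty)) / total_cells if total_cells else 0.0
```

Here three of the four cells are populated, so the ratio is 0.75 and the single gap is the pair `domain:structure` with `chapter:1`.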

@@ -0,0 +1,98 @@
"""
C5 — Granularity balance.
Checks that entities are at a consistent level of abstraction,
measured by word count distribution and Shannon entropy of domain
assignments.
"""
from __future__ import annotations
import math
from dataclasses import dataclass, field
from typing import Dict, List
from markitect.infospace.models import EntityMeta
@dataclass
class GranularityReport:
"""Results from granularity analysis."""
domain_entropy: float = 0.0
word_count_stats: Dict[str, float] = field(default_factory=dict)
domain_distribution: Dict[str, int] = field(default_factory=dict)
entity_count: int = 0
def to_dict(self) -> dict:
return {
"concern": "C5",
"domain_entropy": round(self.domain_entropy, 4),
"word_count_stats": {
k: round(v, 2) for k, v in self.word_count_stats.items()
},
"domain_distribution": self.domain_distribution,
"entity_count": self.entity_count,
}
def _shannon_entropy(counts: Dict[str, int]) -> float:
"""Compute Shannon entropy of a distribution."""
total = sum(counts.values())
if total == 0:
return 0.0
entropy = 0.0
for count in counts.values():
if count > 0:
p = count / total
entropy -= p * math.log2(p)
return entropy
def check_granularity(entities: List[EntityMeta]) -> GranularityReport:
"""Check granularity balance across entities.
Metrics:
- Domain entropy: higher = more balanced distribution.
- Word count statistics: mean, min, max, population std dev.
Args:
entities: Entity metadata list.
Returns:
:class:`GranularityReport` with balance metrics.
"""
n = len(entities)
if n == 0:
return GranularityReport()
# Domain distribution
domain_counts: Dict[str, int] = {}
for e in entities:
d = e.domain or "(unspecified)"
domain_counts[d] = domain_counts.get(d, 0) + 1
entropy = _shannon_entropy(domain_counts)
# Word count statistics
word_counts = [e.definition_word_count for e in entities]
if not word_counts:
word_counts = [0]
mean_wc = sum(word_counts) / len(word_counts)
min_wc = min(word_counts)
max_wc = max(word_counts)
variance = sum((wc - mean_wc) ** 2 for wc in word_counts) / len(word_counts)
std_wc = math.sqrt(variance)
return GranularityReport(
domain_entropy=entropy,
word_count_stats={
"mean": mean_wc,
"min": float(min_wc),
"max": float(max_wc),
"std": std_wc,
},
domain_distribution=domain_counts,
entity_count=n,
)
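
To make the entropy metric concrete, the following standalone sketch repeats the Shannon entropy calculation on two made-up domain distributions. `normalised_entropy` is a hypothetical addition, not part of this commit: it divides by the maximum possible entropy, log2 of the number of non-empty domains, so that results are comparable across collections with different numbers of domains.

```python
import math
from typing import Dict


def shannon_entropy(counts: Dict[str, int]) -> float:
    """Shannon entropy in bits of a count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def normalised_entropy(counts: Dict[str, int]) -> float:
    """Entropy divided by its maximum, log2(k), for k non-empty categories."""
    k = sum(1 for c in counts.values() if c > 0)
    return shannon_entropy(counts) / math.log2(k) if k > 1 else 0.0


# Made-up distributions of 20 entities over four domains.
balanced = {"control": 5, "structure": 5, "environment": 5, "identity": 5}
skewed = {"control": 17, "structure": 1, "environment": 1, "identity": 1}

h_balanced = shannon_entropy(balanced)  # every domain has p = 0.25
h_skewed = shannon_entropy(skewed)      # one domain dominates
```

Four equally sized domains give exactly 2 bits, the maximum for four categories, while the skewed distribution scores well under 1 bit.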

@@ -0,0 +1,102 @@
"""
Unified orchestrator for all five collection-level checks.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
from markitect.infospace.models import EntityMeta
from markitect.prompts.dependencies.models import DependencyGraph
from .redundancy import RedundancyReport, check_redundancy
from .coverage import CoverageReport, check_coverage
from .coherence import CoherenceReport, check_coherence
from .consistency import ConsistencyReport, check_consistency
from .granularity import GranularityReport, check_granularity
@dataclass
class CheckReport:
"""Unified report from all five collection-level checks."""
redundancy: Optional[RedundancyReport] = None
coverage: Optional[CoverageReport] = None
coherence: Optional[CoherenceReport] = None
consistency: Optional[ConsistencyReport] = None
granularity: Optional[GranularityReport] = None
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {}
if self.redundancy:
d["redundancy"] = self.redundancy.to_dict()
if self.coverage:
d["coverage"] = self.coverage.to_dict()
if self.coherence:
d["coherence"] = self.coherence.to_dict()
if self.consistency:
d["consistency"] = self.consistency.to_dict()
if self.granularity:
d["granularity"] = self.granularity.to_dict()
return d
def metrics(self) -> Dict[str, float]:
"""Extract key metrics for viability checking."""
m: Dict[str, float] = {}
if self.redundancy:
m["redundancy_ratio"] = self.redundancy.redundancy_ratio
if self.coverage:
m["coverage_ratio"] = self.coverage.coverage_ratio
if self.coherence:
m["coherence_components"] = float(self.coherence.connected_components)
m["modularity"] = self.coherence.modularity
if self.consistency:
m["consistency_cycles"] = float(self.consistency.cycle_count)
if self.granularity:
m["granularity_entropy"] = self.granularity.domain_entropy
return m
def run_all_checks(
entities: List[EntityMeta],
embeddings: Optional[Dict[str, list[float]]] = None,
graph: Optional[DependencyGraph] = None,
extra_attributes: Optional[Dict[str, set[str]]] = None,
checks: Optional[List[str]] = None,
) -> CheckReport:
"""Run all (or selected) collection-level checks.
Args:
entities: Entity metadata list.
embeddings: Pre-computed embedding vectors for C1.
graph: Entity relationship graph for C3 and C4.
extra_attributes: Extra FCA attributes for C2.
checks: List of check names to run. If ``None``, runs all five.
Valid names: ``redundancy``, ``coverage``, ``coherence``,
``consistency``, ``granularity``.
Returns:
:class:`CheckReport` with results from each check.
"""
run_all = checks is None
check_set = set(checks) if checks else set()
report = CheckReport()
if run_all or "redundancy" in check_set:
report.redundancy = check_redundancy(entities, embeddings=embeddings)
if run_all or "coverage" in check_set:
report.coverage = check_coverage(entities, extra_attributes=extra_attributes)
if run_all or "coherence" in check_set:
report.coherence = check_coherence(graph=graph, entity_count=len(entities))
if run_all or "consistency" in check_set:
report.consistency = check_consistency(entities, graph=graph)
if run_all or "granularity" in check_set:
report.granularity = check_granularity(entities)
return report
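
As written, `run_all_checks` ignores any name in `checks` that is not one of the five valid names, so a typo yields an empty report rather than an error. The sketch below shows one way the selection step could be validated. `select_checks` and `VALID_CHECKS` are hypothetical names, not part of this commit.

```python
from typing import Iterable, List, Optional

VALID_CHECKS = ("redundancy", "coverage", "coherence", "consistency", "granularity")


def select_checks(checks: Optional[Iterable[str]] = None) -> List[str]:
    """Resolve the requested check names, rejecting unknown ones.

    ``None`` selects all five checks; an empty iterable selects none.
    """
    if checks is None:
        return list(VALID_CHECKS)
    requested = set(checks)
    unknown = requested - set(VALID_CHECKS)
    if unknown:
        raise ValueError(f"Unknown check name(s): {', '.join(sorted(unknown))}")
    # Keep the canonical C1..C5 order regardless of the order requested.
    return [name for name in VALID_CHECKS if name in requested]


all_checks = select_checks()
subset = select_checks(["coverage", "redundancy"])
```

The CLI already restricts `--concern` through `click.Choice`, so validation of this kind would mainly protect direct callers of the Python API.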

@@ -0,0 +1,98 @@
"""
C1 — Redundancy detection.
Uses embedding similarity to find entity pairs with overlapping
meanings that may be candidates for merging.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from markitect.infospace.models import EntityMeta
from markitect.llm.similarity import find_similar_pairs
@dataclass
class RedundancyReport:
"""Results from redundancy analysis."""
similar_pairs: List[dict] = field(default_factory=list)
redundancy_ratio: float = 0.0
entity_count: int = 0
def to_dict(self) -> dict:
return {
"concern": "C1",
"redundancy_ratio": round(self.redundancy_ratio, 4),
"similar_pairs": self.similar_pairs,
"entity_count": self.entity_count,
}
def check_redundancy(
entities: List[EntityMeta],
embeddings: Optional[Dict[str, list[float]]] = None,
threshold: float = 0.85,
) -> RedundancyReport:
"""Check for redundant entities using embedding similarity.
Args:
entities: Entity metadata list.
embeddings: Pre-computed ``{slug: vector}`` mapping.
If ``None`` or empty, falls back to word overlap between definitions.
threshold: Similarity threshold for flagging pairs.
Returns:
:class:`RedundancyReport` with similar pairs and ratio.
"""
n = len(entities)
if n < 2:
return RedundancyReport(entity_count=n)
pairs: list[dict] = []
if embeddings:
# Embedding-based similarity
raw_pairs = find_similar_pairs(embeddings, threshold=threshold)
for slug_a, slug_b, sim in raw_pairs:
pairs.append({
"entity_a": slug_a,
"entity_b": slug_b,
"similarity": round(sim, 4),
"method": "embedding",
})
else:
# Fallback: structural overlap (shared definition words)
slug_to_words = {}
for e in entities:
words = set(e.definition.lower().split()) if e.definition else set()
slug_to_words[e.slug] = words
slugs = sorted(slug_to_words)
for i, a in enumerate(slugs):
for b in slugs[i + 1:]:
wa, wb = slug_to_words[a], slug_to_words[b]
if wa and wb:
overlap = len(wa & wb) / min(len(wa), len(wb))
if overlap >= threshold:
pairs.append({
"entity_a": a,
"entity_b": b,
"similarity": round(overlap, 4),
"method": "word_overlap",
})
# redundancy_ratio: fraction of entities involved in similar pairs
involved = set()
for p in pairs:
involved.add(p["entity_a"])
involved.add(p["entity_b"])
ratio = len(involved) / n if n > 0 else 0.0
return RedundancyReport(
similar_pairs=pairs,
redundancy_ratio=ratio,
entity_count=n,
)
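
The fallback path divides the number of shared words by the size of the smaller word set, which is the overlap coefficient rather than Jaccard similarity. A minimal standalone sketch of the difference, with hypothetical function names and made-up definitions:

```python
def word_set(text: str) -> set:
    """Lower-cased whitespace tokens, as in the fallback branch."""
    return set(text.lower().split())


def overlap_coefficient(a: str, b: str) -> float:
    """Shared words divided by the size of the smaller word set."""
    wa, wb = word_set(a), word_set(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by the size of the union, for comparison."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


short_def = "a feedback loop"
long_def = "a feedback loop returns part of the output of a system to its input"

overlap = overlap_coefficient(short_def, long_def)
jac = jaccard(short_def, long_def)
```

Because every word of the short definition appears in the long one, the overlap coefficient is 1.0 and the pair is flagged at the default threshold of 0.85, whereas Jaccard similarity for the same pair is only 0.25. Very short definitions can therefore match many longer ones under this measure.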

markitect/infospace/cli.py (new file, 524 lines)

@@ -0,0 +1,524 @@
"""
CLI commands for infospace lifecycle management.
Provides ``markitect infospace`` subcommands for initialising,
inspecting, and evaluating infospaces.
"""
from __future__ import annotations
from pathlib import Path
from typing import Optional
import click
from markitect.infospace.config import (
DisciplineBinding,
InfospaceConfig,
SchemaRegistry,
TopicConfig,
find_infospace_config,
load_infospace_config,
save_infospace_config,
)
from markitect.infospace.entity_parser import parse_entity_directory
from markitect.infospace.state import build_state
def _load_config_or_exit(config_path: Optional[str] = None) -> tuple:
"""Resolve and load infospace.yaml, or exit with an error."""
if config_path:
p = Path(config_path)
else:
p = find_infospace_config()
if p is None:
click.echo("Error: No infospace.yaml found. Run 'markitect infospace init' first.", err=True)
raise SystemExit(1)
cfg = load_infospace_config(p)
return cfg, p
@click.group(name="infospace")
def infospace_commands():
"""Manage infospaces — create, inspect, evaluate."""
pass
# ── init ─────────────────────────────────────────────────────────────
@infospace_commands.command()
@click.option("--topic", required=True, help="Topic name for the infospace.")
@click.option("--domain", default="", help="Knowledge domain.")
@click.option("--sources", default="", help="Path to source material directory.")
@click.option("--discipline", multiple=True, help="Discipline name (repeatable).")
@click.option("--output", "-o", default="infospace.yaml", help="Output config file path.")
def init(topic: str, domain: str, sources: str, discipline: tuple, output: str):
"""Initialise a new infospace configuration file."""
out_path = Path(output)
if out_path.exists():
click.echo(f"Error: {out_path} already exists.", err=True)
raise SystemExit(1)
disciplines = [DisciplineBinding(name=d) for d in discipline]
config = InfospaceConfig(
topic=TopicConfig(name=topic, domain=domain, sources=sources),
disciplines=disciplines,
)
save_infospace_config(config, out_path)
click.echo(f"Created {out_path}")
# ── status ───────────────────────────────────────────────────────────
@infospace_commands.command()
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
def status(config_path: Optional[str]):
"""Show infospace status — entity count, domains, evaluation state."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
# Parse entities
entities_dir = root / cfg.entities_dir
entities = []
if entities_dir.is_dir():
entities = parse_entity_directory(entities_dir)
# Load latest snapshot if available
snapshot = None
history_path = root / cfg.metrics_dir / "history.yaml"
if history_path.is_file():
from markitect.infospace.evaluation_io import read_history
history = read_history(history_path)
if history:
snapshot = history[-1]
state = build_state(cfg, entities=entities, snapshot=snapshot)
click.echo(f"Infospace: {state.topic_name}")
if cfg.topic.domain:
click.echo(f"Domain: {cfg.topic.domain}")
click.echo(f"Entities: {state.entity_count}")
if state.domains:
click.echo(f"Domains: {', '.join(state.domains)}")
if cfg.disciplines:
names = [d.name for d in cfg.disciplines]
click.echo(f"Disciplines: {', '.join(names)}")
if state.has_evaluations:
click.echo(f"Last evaluated: {state.latest_snapshot.created_at.isoformat()}")
else:
click.echo("Evaluations: none")
# ── entities ─────────────────────────────────────────────────────────
@infospace_commands.command()
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
@click.option(
"--sort-by", "sort_key",
type=click.Choice(["slug", "domain", "words"]),
default="slug",
help="Sort entities by field.",
)
def entities(config_path: Optional[str], sort_key: str):
"""List entities with metadata summary."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
entities_dir = root / cfg.entities_dir
if not entities_dir.is_dir():
click.echo("No entities directory found.")
return
entity_list = parse_entity_directory(entities_dir)
if not entity_list:
click.echo("No entities found.")
return
# Sort
if sort_key == "domain":
entity_list.sort(key=lambda e: (e.domain or "", e.slug))
elif sort_key == "words":
entity_list.sort(key=lambda e: e.total_word_count, reverse=True)
else:
entity_list.sort(key=lambda e: e.slug)
# Format as table
click.echo(f"{'Slug':<40} {'Domain':<20} {'Words':>6}")
click.echo("-" * 68)
for e in entity_list:
click.echo(f"{e.slug:<40} {(e.domain or '-'):<20} {e.total_word_count:>6}")
click.echo(f"\nTotal: {len(entity_list)} entities")
# ── evaluate ─────────────────────────────────────────────────────────
@infospace_commands.command()
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
@click.option("--provider", default="openrouter", help="LLM provider (openrouter, openai, etc.).")
@click.option("--model", default=None, help="LLM model name.")
@click.option("--entity", "entity_slug", default=None, help="Evaluate a single entity by slug.")
@click.option("--chapter", default=None, help="Evaluate entities from a specific chapter.")
def evaluate(config_path, provider, model, entity_slug, chapter):
"""Evaluate entities using LLM-based quality assessment."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
entities_dir = root / cfg.entities_dir
if not entities_dir.is_dir():
click.echo("Error: No entities directory found.", err=True)
raise SystemExit(1)
entity_list = parse_entity_directory(entities_dir)
if not entity_list:
click.echo("No entities to evaluate.")
return
# Filter
if entity_slug:
entity_list = [e for e in entity_list if e.slug == entity_slug]
if not entity_list:
click.echo(f"Error: Entity '{entity_slug}' not found.", err=True)
raise SystemExit(1)
elif chapter:
entity_list = [e for e in entity_list if chapter in (e.source_chapter or "")]
if not entity_list:
click.echo(f"No entities found for chapter '{chapter}'.")
return
# Create adapter
from markitect.llm import create_adapter
from markitect.prompts.execution.models import RunConfig
adapter = create_adapter(provider, model=model)
run_config = RunConfig(model_name=model or "default", temperature=0.3, max_tokens=2000)
# Progress callback
def on_progress(done, total, result):
status = result.status.upper()
click.echo(f" [{done}/{total}] {result.key}: {status}")
click.echo(f"Evaluating {len(entity_list)} entities via {provider}...")
from markitect.infospace.evaluate import run_entity_evaluation
output_dir = root / cfg.evaluations_dir
summary = run_entity_evaluation(
config=cfg,
entities=entity_list,
adapter=adapter,
run_config=run_config,
output_dir=output_dir,
progress_callback=on_progress,
)
click.echo(f"\nDone: {summary.succeeded} succeeded, {summary.failed} failed, {summary.skipped} skipped")
if summary.total_tokens > 0:
click.echo(f"Tokens used: {summary.total_tokens}")
# ── viability ────────────────────────────────────────────────────────
@infospace_commands.command()
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
def viability(config_path: Optional[str]):
"""Show viability dashboard — threshold checks and pass/fail."""
cfg, cfg_path = _load_config_or_exit(config_path)
if not cfg.viability:
click.echo("No viability thresholds configured in infospace.yaml.")
return
# Try to load latest metrics
root = cfg_path.parent
metrics: dict = {}
metrics_file = root / cfg.metrics_dir / "metrics.yaml"
if metrics_file.is_file():
import yaml
raw = yaml.safe_load(metrics_file.read_text(encoding="utf-8"))
if isinstance(raw, dict):
metrics = {k: float(v) for k, v in raw.items() if isinstance(v, (int, float))}
state = build_state(cfg, metrics=metrics if metrics else None)
if not state.viability_results:
click.echo("No metrics available. Run evaluations first.")
click.echo("\nConfigured thresholds:")
for name, t in cfg.viability.items():
bounds = []
if t.min is not None:
bounds.append(f"min={t.min}")
if t.max is not None:
bounds.append(f"max={t.max}")
click.echo(f" {name}: {', '.join(bounds)}")
return
click.echo(f"{'Metric':<30} {'Value':>8} {'Threshold':>15} {'Status':>8}")
click.echo("-" * 63)
for r in state.viability_results:
bounds = []
if r.threshold.min is not None:
bounds.append(f"min={r.threshold.min}")
if r.threshold.max is not None:
bounds.append(f"max={r.threshold.max}")
status_str = "PASS" if r.passed else "FAIL"
click.echo(
f"{r.metric:<30} {r.value:>8.4f} {', '.join(bounds):>15} {status_str:>8}"
)
click.echo()
if state.is_viable:
click.echo(f"Viable: YES ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
else:
click.echo(f"Viable: NO ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
# ── check ───────────────────────────────────────────────────────────
@infospace_commands.command()
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
@click.option(
"--concern", "concerns", multiple=True,
type=click.Choice(["redundancy", "coverage", "coherence", "consistency", "granularity"]),
help="Run specific concern(s). Omit to run all five.",
)
@click.option("--json", "as_json", is_flag=True, help="Output results as JSON.")
def check(config_path: Optional[str], concerns: tuple, as_json: bool):
"""Run collection-level quality checks (C1–C5)."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
entities_dir = root / cfg.entities_dir
if not entities_dir.is_dir():
click.echo("Error: No entities directory found.", err=True)
raise SystemExit(1)
entity_list = parse_entity_directory(entities_dir)
if not entity_list:
click.echo("No entities to check.")
return
from markitect.infospace.checks import run_all_checks
checks_list = list(concerns) if concerns else None
report = run_all_checks(
entities=entity_list,
checks=checks_list,
)
if as_json:
import json
click.echo(json.dumps(report.to_dict(), indent=2))
else:
click.echo(f"Collection checks — {len(entity_list)} entities\n")
d = report.to_dict()
for concern_name, concern_data in d.items():
label = concern_data.get("concern", concern_name.upper())
click.echo(f" {label} – {concern_name}")
for k, v in concern_data.items():
if k == "concern":
continue
click.echo(f" {k}: {v}")
click.echo()
# Show summary metrics
m = report.metrics()
if m and not as_json:
click.echo("Metrics summary:")
for k, v in sorted(m.items()):
click.echo(f" {k}: {v:.4f}")
# Record to history
if m:
from markitect.infospace.history import record_check_results
snap = record_check_results(report, cfg, root, entity_count=len(entity_list))
if not as_json:
click.echo(f"\nRecorded snapshot {snap.snapshot_id}")
# ── history ─────────────────────────────────────────────────────────
@infospace_commands.command()
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
@click.option("--metric", default=None, help="Show trend for a specific metric.")
@click.option("--json", "as_json", is_flag=True, help="Output as JSON.")
def history(config_path: Optional[str], metric: Optional[str], as_json: bool):
"""Show metrics history — snapshots over time."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
from markitect.infospace.history import get_history, metric_trend
snapshots = get_history(cfg, root)
if not snapshots:
click.echo("No history found. Run 'markitect infospace check' first.")
return
if metric:
trend = metric_trend(snapshots, metric)
if not trend:
click.echo(f"No data for metric '{metric}'.")
return
if as_json:
import json
click.echo(json.dumps(trend, indent=2))
else:
click.echo(f"Trend: {metric}\n")
for entry in trend:
click.echo(f" {entry['date'][:19]} {entry['value']:.4f}")
return
if as_json:
import json
click.echo(json.dumps([s.to_dict() for s in snapshots], indent=2, default=str))
return
click.echo(f"History: {len(snapshots)} snapshot(s)\n")
click.echo(f"{'#':<4} {'Date':<20} {'Entities':>8} {'Metrics':>8}")
click.echo("-" * 42)
for i, snap in enumerate(snapshots, 1):
date_str = snap.created_at.isoformat()[:19]
n_metrics = len(snap.collection_metrics)
click.echo(f"{i:<4} {date_str:<20} {snap.entity_count:>8} {n_metrics:>8}")
@infospace_commands.command(name="history-diff")
@click.argument("date_a")
@click.argument("date_b")
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
def history_diff(date_a: str, date_b: str, config_path: Optional[str]):
"""Compare two history snapshots by date (YYYY-MM-DD)."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
from markitect.infospace.history import find_snapshot_by_date, get_history
from markitect.infospace.evaluation_io import diff_snapshots
snapshots = get_history(cfg, root)
if len(snapshots) < 2:
click.echo("Need at least two snapshots to diff.")
return
snap_a = find_snapshot_by_date(snapshots, date_a)
snap_b = find_snapshot_by_date(snapshots, date_b)
if snap_a is None:
click.echo(f"No snapshot found near '{date_a}'.")
return
if snap_b is None:
click.echo(f"No snapshot found near '{date_b}'.")
return
if snap_a.snapshot_id == snap_b.snapshot_id:
click.echo("Both dates resolve to the same snapshot.")
return
diff = diff_snapshots(snap_a, snap_b)
click.echo(diff.summary())
# ── bind-discipline ─────────────────────────────────────────────────
@infospace_commands.command(name="bind-discipline")
@click.argument("discipline_path")
@click.option("--name", required=True, help="Name for the discipline.")
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
def bind_discipline_cmd(discipline_path: str, name: str, config_path: Optional[str]):
"""Bind a discipline infospace to the current infospace."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
from markitect.infospace.composition import bind_discipline
status = bind_discipline(cfg, name=name, path=discipline_path, root=root)
if status.error:
click.echo(f"Error: {status.error}", err=True)
raise SystemExit(1)
# Persist updated config
save_infospace_config(cfg, cfg_path)
click.echo(f"Bound discipline '{name}' from {discipline_path}")
click.echo(f" Entities: {status.entity_count}")
if status.has_config:
viable_str = "YES" if status.is_viable else "NO"
click.echo(f" Viable: {viable_str}")
# ── disciplines ─────────────────────────────────────────────────────
@infospace_commands.command()
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
def disciplines(config_path: Optional[str]):
"""List bound disciplines and their viability status."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
if not cfg.disciplines:
click.echo("No disciplines bound.")
return
from markitect.infospace.composition import check_discipline_status
click.echo(f"{'Name':<30} {'Entities':>8} {'Viable':>8} {'Path'}")
click.echo("-" * 70)
for binding in cfg.disciplines:
status = check_discipline_status(binding, root)
viable_str = "YES" if status.is_viable else ("NO" if status.has_config else "?")
click.echo(
f"{status.name:<30} {status.entity_count:>8} {viable_str:>8} {status.path}"
)
if status.error:
click.echo(f" Error: {status.error}")
# ── stale-mappings ──────────────────────────────────────────────────
@infospace_commands.command(name="stale-mappings")
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
def stale_mappings(config_path: Optional[str]):
"""Check for stale mappings due to discipline changes."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
if not cfg.disciplines:
click.echo("No disciplines bound — no mappings to check.")
return
from markitect.infospace.composition import find_stale_mappings
# Try to load mapping references from output
mapping_refs = _load_mapping_references(cfg, root)
stale = find_stale_mappings(cfg, root, mapping_references=mapping_refs)
if not stale:
click.echo("No stale mappings detected.")
return
click.echo(f"Found {len(stale)} stale mapping(s):\n")
for s in stale:
click.echo(f" {s.entity_slug} -> {s.discipline_entity}")
click.echo(f" {s.reason}")
def _load_mapping_references(
cfg: InfospaceConfig, root: Path
) -> Optional[dict]:
"""Try to load mapping references from ``mapping-references.yaml`` in the metrics dir."""
mapping_file = root / cfg.metrics_dir / "mapping-references.yaml"
if not mapping_file.is_file():
return None
import yaml
data = yaml.safe_load(mapping_file.read_text(encoding="utf-8"))
if isinstance(data, dict):
return data
return None
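
The `viability` command prints one row per threshold with a PASS or FAIL status, but the comparison itself lives in `markitect.infospace.state`, which is not part of this diff. The following is a minimal sketch of how such a check might work, assuming inclusive `min`/`max` bounds and assuming metrics without a measurement are skipped. `Threshold`, `Result` and `evaluate_viability` are hypothetical stand-ins, and the numbers are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Hypothetical viability threshold with optional inclusive bounds."""
    min: Optional[float] = None
    max: Optional[float] = None


@dataclass
class Result:
    metric: str
    value: float
    threshold: Threshold
    passed: bool


def evaluate_viability(
    metrics: Dict[str, float], thresholds: Dict[str, Threshold]
) -> List[Result]:
    """Compare each measured metric against its configured bounds."""
    results: List[Result] = []
    for name, t in thresholds.items():
        if name not in metrics:
            continue  # assumption: no measurement means no verdict
        value = metrics[name]
        ok = (t.min is None or value >= t.min) and (t.max is None or value <= t.max)
        results.append(Result(name, value, t, ok))
    return results


# Illustrative thresholds and metrics, not taken from a real infospace.yaml.
thresholds = {
    "coverage_ratio": Threshold(min=0.5),
    "redundancy_ratio": Threshold(max=0.1),
    "consistency_cycles": Threshold(max=0.0),
}
metrics = {"coverage_ratio": 0.36, "redundancy_ratio": 0.0, "consistency_cycles": 0.0}

results = evaluate_viability(metrics, thresholds)
pass_count = sum(r.passed for r in results)
is_viable = pass_count == len(results)
```

With these values two of three thresholds are met and the infospace is reported as not viable, because coverage falls below its minimum.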

@@ -0,0 +1,281 @@
"""
Infospace composition model.
Allows one infospace to use another as a discipline — a reusable
framework of concepts applied as an analytical lens.
Key operations:
- Resolve and validate discipline bindings
- Check discipline viability (must meet its own thresholds)
- List discipline entities as mapping targets
- Detect stale mappings when discipline content changes
"""
from __future__ import annotations
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional
from markitect.infospace.config import (
DisciplineBinding,
InfospaceConfig,
load_infospace_config,
)
from markitect.infospace.entity_parser import parse_entity_directory
from markitect.infospace.history import get_latest_snapshot, read_metrics_file
from markitect.infospace.models import EntityMeta
from markitect.infospace.state import InfospaceState, ViabilityResult, build_state
@dataclass
class DisciplineStatus:
"""Status of a bound discipline infospace."""
name: str
path: str
resolved_path: Optional[Path] = None
exists: bool = False
has_config: bool = False
entity_count: int = 0
is_viable: bool = False
viability_results: List[ViabilityResult] = field(default_factory=list)
error: str = ""
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {
"name": self.name,
"path": self.path,
"exists": self.exists,
"has_config": self.has_config,
"entity_count": self.entity_count,
"is_viable": self.is_viable,
}
if self.viability_results:
d["viability"] = [r.to_dict() for r in self.viability_results]
if self.error:
d["error"] = self.error
return d
@dataclass
class StaleMappingInfo:
"""Information about a mapping that may be stale."""
entity_slug: str
discipline_entity: str
reason: str
def to_dict(self) -> Dict[str, Any]:
return {
"entity_slug": self.entity_slug,
"discipline_entity": self.discipline_entity,
"reason": self.reason,
}
# ── Resolution ───────────────────────────────────────────────────────
def resolve_discipline_path(
binding: DisciplineBinding, root: Path
) -> Optional[Path]:
"""Resolve a discipline binding to an absolute path.
Tries the binding's path relative to *root*, then the path as given
(absolute, or relative to the current working directory).
Returns ``None`` if the directory doesn't exist.
"""
if not binding.path:
return None
# Try relative to root first
candidate = root / binding.path
if candidate.is_dir():
return candidate.resolve()
# Try as absolute
candidate = Path(binding.path)
if candidate.is_dir():
return candidate.resolve()
return None
def load_discipline_config(
binding: DisciplineBinding, root: Path
) -> Optional[InfospaceConfig]:
"""Load the infospace config for a bound discipline.
Returns ``None`` if the discipline path cannot be resolved or
has no ``infospace.yaml``.
"""
disc_path = resolve_discipline_path(binding, root)
if disc_path is None:
return None
config_file = disc_path / "infospace.yaml"
if not config_file.is_file():
return None
return load_infospace_config(config_file)
# ── Viability checking ───────────────────────────────────────────────
def check_discipline_status(
binding: DisciplineBinding, root: Path
) -> DisciplineStatus:
"""Check the full status of a bound discipline.
Resolves the path, loads config, counts entities, and checks
viability against the discipline's own thresholds.
"""
status = DisciplineStatus(name=binding.name, path=binding.path)
disc_path = resolve_discipline_path(binding, root)
if disc_path is None:
status.error = f"Path not found: {binding.path}"
return status
status.resolved_path = disc_path
status.exists = True
# Load config
config_file = disc_path / "infospace.yaml"
if not config_file.is_file():
status.error = "No infospace.yaml found"
return status
disc_config = load_infospace_config(config_file)
status.has_config = True
# Count entities
entities_dir = disc_path / disc_config.entities_dir
if entities_dir.is_dir():
entities = parse_entity_directory(entities_dir)
status.entity_count = len(entities)
# Check viability
if disc_config.viability:
metrics = read_metrics_file(disc_path / disc_config.metrics_dir / "metrics.yaml")
if metrics:
state = build_state(disc_config, metrics=metrics)
status.viability_results = state.viability_results
status.is_viable = state.is_viable
return status
def get_discipline_entities(
binding: DisciplineBinding, root: Path
) -> List[EntityMeta]:
"""Get all entities from a bound discipline infospace."""
disc_path = resolve_discipline_path(binding, root)
if disc_path is None:
return []
disc_config = load_discipline_config(binding, root)
if disc_config is None:
return []
entities_dir = disc_path / disc_config.entities_dir
if not entities_dir.is_dir():
return []
return parse_entity_directory(entities_dir)
# ── Stale mapping detection ─────────────────────────────────────────
def _content_digest(entity: EntityMeta) -> str:
"""Compute a short content digest for an entity."""
content = f"{entity.slug}|{entity.definition}|{entity.domain}"
return hashlib.sha256(content.encode()).hexdigest()[:12]
def compute_discipline_digests(
binding: DisciplineBinding, root: Path
) -> Dict[str, str]:
"""Compute content digests for all entities in a discipline.
Returns ``{slug: digest}`` mapping.
"""
entities = get_discipline_entities(binding, root)
return {e.slug: _content_digest(e) for e in entities}
def find_stale_mappings(
config: InfospaceConfig,
root: Path,
mapping_references: Optional[Dict[str, List[str]]] = None,
) -> List[StaleMappingInfo]:
"""Find mappings that may be stale due to discipline changes.
Args:
config: The infospace configuration.
root: Project root directory.
mapping_references: ``{entity_slug: [discipline_entity_slugs]}``
mapping of local entities to the discipline entities they
reference. If ``None``, returns an empty list (no mapping
data available).
Returns:
List of stale mapping info objects.
"""
if not mapping_references:
return []
stale: List[StaleMappingInfo] = []
for binding in config.disciplines:
disc_entities = get_discipline_entities(binding, root)
disc_slugs = {e.slug for e in disc_entities}
for entity_slug, refs in mapping_references.items():
for ref_slug in refs:
if ref_slug not in disc_slugs:
stale.append(StaleMappingInfo(
entity_slug=entity_slug,
discipline_entity=ref_slug,
reason=f"Discipline entity '{ref_slug}' no longer exists in '{binding.name}'",
))
return stale
# ── Binding management ───────────────────────────────────────────────
def bind_discipline(
config: InfospaceConfig,
name: str,
path: str,
root: Path,
) -> DisciplineStatus:
"""Add a discipline binding to the config and validate it.
Does NOT persist the config — the caller should save it.
Args:
config: The infospace configuration to update.
name: Discipline name.
path: Path to the discipline infospace.
root: Project root for path resolution.
Returns:
Status of the newly bound discipline.
"""
# Check for duplicates
existing = {d.name for d in config.disciplines}
if name in existing:
return DisciplineStatus(
name=name, path=path, error=f"Discipline '{name}' already bound"
)
binding = DisciplineBinding(name=name, path=path)
config.disciplines.append(binding)
return check_discipline_status(binding, root)
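
The stale-mapping check in this module only reports discipline entities that have disappeared. As a rough sketch of how the `{slug: digest}` maps returned by `compute_discipline_digests` might also reveal entities whose content changed, the example below compares two such maps. `digest_of` and `changed_slugs` are hypothetical helpers, not part of the module; `digest_of` mirrors `_content_digest` (SHA-256 over `slug|definition|domain`, first 12 hex characters), and the entity text is invented for illustration.

```python
import hashlib
from typing import Dict, List


def digest_of(slug: str, definition: str, domain: str) -> str:
    """Mirror of _content_digest: 12 hex chars of SHA-256 over slug|definition|domain."""
    content = f"{slug}|{definition}|{domain}"
    return hashlib.sha256(content.encode()).hexdigest()[:12]


def changed_slugs(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Hypothetical helper: slugs present in both maps whose digest differs."""
    return sorted(s for s in before.keys() & after.keys() if before[s] != after[s])


# Invented entity content, before and after an edit to the definition.
old = {"system-one": digest_of("system-one", "Operational units.", "VSM")}
new = {"system-one": digest_of("system-one", "Operational units and their management.", "VSM")}
print(changed_slugs(old, new))
```

Because the digest covers only slug, definition, and domain, edits to other sections of an entity would not be detected by a comparison like this.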


@@ -0,0 +1,309 @@
"""
Infospace configuration model and YAML loader.
An infospace is declared via an ``infospace.yaml`` file that specifies
its topic, disciplines, schemas, competency questions, and viability
thresholds. This module provides the data models and I/O for that
configuration.
Example ``infospace.yaml``::
topic:
name: "The Wealth of Nations"
domain: "Classical Economics"
sources: artifacts/sources/
disciplines:
- name: "Viable System Model"
path: artifacts/vsm-reference/
schemas:
entity: schemas/economic-entity-schema-v1.0.md
competency_questions: schemas/competency-questions.md
viability:
coverage_ratio: { min: 0.60 }
per_entity_mean: { min: 3.5 }
"""
from __future__ import annotations
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional
import yaml
@dataclass
class TopicConfig:
"""The subject matter an infospace explains.
Attributes:
name: Human-readable topic name.
domain: Broader knowledge domain.
sources: Path (relative to infospace root) to source material.
"""
name: str
domain: str = ""
sources: str = ""
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {"name": self.name}
if self.domain:
d["domain"] = self.domain
if self.sources:
d["sources"] = self.sources
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> TopicConfig:
return cls(
name=data["name"],
domain=data.get("domain", ""),
sources=data.get("sources", ""),
)
@dataclass
class DisciplineBinding:
"""An external infospace applied as an analytical lens.
Attributes:
name: Human-readable discipline name.
path: Path to the discipline infospace (relative to root).
"""
name: str
path: str = ""
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {"name": self.name}
if self.path:
d["path"] = self.path
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> DisciplineBinding:
return cls(name=data["name"], path=data.get("path", ""))
@dataclass
class SchemaRegistry:
"""Schema paths governing entity and document structure.
All paths are relative to the infospace root directory.
"""
entity: str = ""
mapping: str = ""
analysis: str = ""
extra: Dict[str, str] = field(default_factory=dict)
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {}
if self.entity:
d["entity"] = self.entity
if self.mapping:
d["mapping"] = self.mapping
if self.analysis:
d["analysis"] = self.analysis
d.update(self.extra)
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> SchemaRegistry:
known = {"entity", "mapping", "analysis"}
extra = {k: v for k, v in data.items() if k not in known}
return cls(
entity=data.get("entity", ""),
mapping=data.get("mapping", ""),
analysis=data.get("analysis", ""),
extra=extra,
)
@dataclass
class ViabilityThreshold:
"""Threshold for a single viability metric.
At least one of *min* or *max* should be set.
"""
metric: str
min: Optional[float] = None
max: Optional[float] = None
def check(self, value: float) -> bool:
"""Return ``True`` if *value* is within the threshold."""
if self.min is not None and value < self.min:
return False
if self.max is not None and value > self.max:
return False
return True
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {}
if self.min is not None:
d["min"] = self.min
if self.max is not None:
d["max"] = self.max
return d
@dataclass
class PipelineStage:
"""A single stage in the processing pipeline."""
template: str
spaces: List[str] = field(default_factory=list)
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {"template": self.template}
if self.spaces:
d["spaces"] = self.spaces
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> PipelineStage:
return cls(
template=data["template"],
spaces=data.get("spaces", []),
)
@dataclass
class PipelineConfig:
"""Processing pipeline configuration."""
stages: List[PipelineStage] = field(default_factory=list)
post_batch: List[PipelineStage] = field(default_factory=list)
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {}
if self.stages:
d["stages"] = [s.to_dict() for s in self.stages]
if self.post_batch:
d["post_batch"] = [s.to_dict() for s in self.post_batch]
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> PipelineConfig:
return cls(
stages=[PipelineStage.from_dict(s) for s in data.get("stages", [])],
post_batch=[PipelineStage.from_dict(s) for s in data.get("post_batch", [])],
)
@dataclass
class InfospaceConfig:
"""Complete infospace configuration, loaded from ``infospace.yaml``.
This is the declarative description of an infospace: what it
explains, through which lenses, governed by which schemas, and
what quality thresholds it must meet.
"""
topic: TopicConfig
disciplines: List[DisciplineBinding] = field(default_factory=list)
schemas: SchemaRegistry = field(default_factory=SchemaRegistry)
competency_questions: str = ""
viability: Dict[str, ViabilityThreshold] = field(default_factory=dict)
pipeline: Optional[PipelineConfig] = None
entities_dir: str = "output/entities"
evaluations_dir: str = "output/evaluations"
metrics_dir: str = "output/metrics"
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {"topic": self.topic.to_dict()}
if self.disciplines:
d["disciplines"] = [db.to_dict() for db in self.disciplines]
schemas_dict = self.schemas.to_dict()
if schemas_dict:
d["schemas"] = schemas_dict
if self.competency_questions:
d["competency_questions"] = self.competency_questions
if self.viability:
d["viability"] = {
name: t.to_dict() for name, t in self.viability.items()
}
if self.pipeline:
d["pipeline"] = self.pipeline.to_dict()
if self.entities_dir != "output/entities":
d["entities_dir"] = self.entities_dir
if self.evaluations_dir != "output/evaluations":
d["evaluations_dir"] = self.evaluations_dir
if self.metrics_dir != "output/metrics":
d["metrics_dir"] = self.metrics_dir
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> InfospaceConfig:
viability_raw = data.get("viability", {})
viability = {
name: ViabilityThreshold(metric=name, **bounds)
for name, bounds in viability_raw.items()
}
pipeline_raw = data.get("pipeline")
pipeline = PipelineConfig.from_dict(pipeline_raw) if pipeline_raw else None
return cls(
topic=TopicConfig.from_dict(data["topic"]),
disciplines=[
DisciplineBinding.from_dict(d)
for d in data.get("disciplines", [])
],
schemas=SchemaRegistry.from_dict(data.get("schemas", {})),
competency_questions=data.get("competency_questions", ""),
viability=viability,
pipeline=pipeline,
entities_dir=data.get("entities_dir", "output/entities"),
evaluations_dir=data.get("evaluations_dir", "output/evaluations"),
metrics_dir=data.get("metrics_dir", "output/metrics"),
)
def load_infospace_config(path: Path) -> InfospaceConfig:
"""Load an :class:`InfospaceConfig` from a YAML file.
Args:
path: Path to ``infospace.yaml``.
Raises:
FileNotFoundError: If *path* does not exist.
ValueError: If required fields are missing.
"""
data = yaml.safe_load(path.read_text(encoding="utf-8"))
if not isinstance(data, dict):
raise ValueError(f"Expected a YAML mapping in {path}")
if "topic" not in data:
raise ValueError(f"Missing required 'topic' key in {path}")
return InfospaceConfig.from_dict(data)
def save_infospace_config(config: InfospaceConfig, path: Path) -> None:
"""Write an :class:`InfospaceConfig` to a YAML file."""
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(
yaml.safe_dump(
config.to_dict(),
default_flow_style=False,
sort_keys=False,
),
encoding="utf-8",
)
def find_infospace_config(start: Optional[Path] = None) -> Optional[Path]:
"""Walk up from *start* looking for ``infospace.yaml``.
Returns the path to the config file, or ``None``.
"""
current = (start or Path.cwd()).resolve()
for directory in [current, *current.parents]:
candidate = directory / "infospace.yaml"
if candidate.is_file():
return candidate
return None
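
A small sketch of how the `viability` mapping turns into threshold checks. `Threshold` is a stand-in that copies the `min`/`max` semantics of `ViabilityThreshold` so the example runs without the package; the metric values are assumed for illustration and are not real measurements.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Threshold:
    """Stand-in for ViabilityThreshold with the same min/max semantics."""
    metric: str
    min: Optional[float] = None
    max: Optional[float] = None

    def check(self, value: float) -> bool:
        # Both bounds are inclusive; an unset bound is not enforced.
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


# Shape of the parsed `viability` mapping, as in the module docstring example.
viability_raw = {"coverage_ratio": {"min": 0.60}, "per_entity_mean": {"min": 3.5}}
thresholds: Dict[str, Threshold] = {
    name: Threshold(metric=name, **bounds) for name, bounds in viability_raw.items()
}

# Illustrative metric values (assumed).
metrics = {"coverage_ratio": 0.45, "per_entity_mean": 4.1}
failing = [name for name, t in thresholds.items() if not t.check(metrics[name])]
print(failing)
```

Note that unpacking `**bounds` means any key other than `min` or `max` in a threshold entry raises a `TypeError` at load time.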


@@ -0,0 +1,176 @@
"""
Entity metadata parser.
Extracts structured :class:`EntityMeta` from entity markdown files
produced by the infospace entity-extraction pipeline.
"""
import logging
import re
from pathlib import Path
from typing import List, Optional, Sequence
from markitect.core.parser import parse_markdown_to_ast
from markitect.core.section_tree import (
build_section_tree,
extract_heading_content,
extract_heading_level,
extract_section_text,
slugify,
)
from .models import EntityMeta
logger = logging.getLogger(__name__)
# Sections we look for (slug → human-friendly label)
_KNOWN_SECTIONS = {
"definition": "Definition",
"source_chapter": "Source Chapter",
"context": "Context",
"economic_domain": "Economic Domain",
"smith_s_original_wording": "Smith's Original Wording",
"modern_interpretation": "Modern Interpretation",
}
# Default filename patterns to exclude from directory parsing
_DEFAULT_EXCLUDE_PATTERNS = (
r".*-entities\.md$",
r".*-prompt\.md$",
)
def _is_title_case(text: str) -> bool:
"""Return True if *text* is in title case (ignoring short words)."""
# Words that are allowed to be lowercase in title case
minor_words = {
"a", "an", "the", "and", "but", "or", "nor", "for", "yet", "so",
"in", "on", "at", "to", "by", "of", "up", "as", "is", "if",
}
words = text.split()
if not words:
return False
for i, word in enumerate(words):
# Strip leading/trailing punctuation for the check
clean = re.sub(r"[^\w]", "", word)
if not clean:
continue
# First word must be capitalised
if i == 0:
if not clean[0].isupper():
return False
elif clean.lower() in minor_words:
continue # minor words may be lower
elif not clean[0].isupper():
return False
return True
def _word_count(text: str) -> int:
"""Count whitespace-separated words in *text*."""
return len(text.split())
def _find_h2_section(tree_root: dict, slug: str) -> Optional[dict]:
"""Find a direct H2 child of the root by slug."""
for child in tree_root.get("children", []):
if child["level"] == 2 and child["slug"] == slug:
return child
return None
def parse_entity_file(path: Path) -> EntityMeta:
"""Parse a single entity markdown file into :class:`EntityMeta`.
Raises:
ValueError: If the file has no H1 heading.
"""
content = path.read_text(encoding="utf-8")
tokens = parse_markdown_to_ast(content)
tree = build_section_tree(tokens)
# --- H1: entity title ---
h1_section = None
for child in tree["children"]:
if child["level"] == 1:
h1_section = child
break
if h1_section is None:
raise ValueError(f"No H1 heading found in {path}")
h1_raw = h1_section["heading"]
slug = slugify(h1_raw)
title = h1_raw
h1_is_title_case = _is_title_case(h1_raw)
# Use the H1 node as the effective root for H2 look-ups
effective_root = h1_section
# Collect all H2 section slugs
section_slugs = [c["slug"] for c in effective_root.get("children", []) if c["level"] == 2]
# --- Extract known sections ---
def _get_section_text(section_slug: str) -> str:
node = _find_h2_section(effective_root, section_slug)
if node is None:
return ""
return extract_section_text(node).strip()
definition = _get_section_text("definition")
source_chapter = _get_section_text("source_chapter")
context = _get_section_text("context")
domain = _get_section_text("economic_domain")
original_wording = _get_section_text("smith_s_original_wording")
modern_interpretation = _get_section_text("modern_interpretation")
# --- Derived metrics ---
has_original_wording = bool(original_wording)
definition_word_count = _word_count(definition)
total_word_count = _word_count(content)
return EntityMeta(
slug=slug,
title=title,
h1_raw=h1_raw,
definition=definition,
source_chapter=source_chapter,
context=context,
domain=domain,
original_wording=original_wording,
modern_interpretation=modern_interpretation,
h1_is_title_case=h1_is_title_case,
has_original_wording=has_original_wording,
definition_word_count=definition_word_count,
total_word_count=total_word_count,
section_slugs=section_slugs,
source_path=str(path),
)
def parse_entity_directory(
directory: Path,
exclude_patterns: Optional[Sequence[str]] = None,
) -> List[EntityMeta]:
"""Parse all entity markdown files in *directory*.
Files matching *exclude_patterns* (regexes tested against the
filename) are skipped. Defaults exclude chapter-view
(``*-entities.md``) and prompt (``*-prompt.md``) files.
Malformed files are skipped with a warning rather than raising.
"""
if exclude_patterns is None:
exclude_patterns = _DEFAULT_EXCLUDE_PATTERNS
compiled = [re.compile(p) for p in exclude_patterns]
entities: List[EntityMeta] = []
for md_file in sorted(directory.glob("*.md")):
if any(pat.match(md_file.name) for pat in compiled):
continue
try:
entities.append(parse_entity_file(md_file))
except Exception as exc:
logger.warning("Skipping %s: %s", md_file.name, exc)
return entities
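
To illustrate the default exclusion rules used by `parse_entity_directory`, the sketch below applies the same two regexes to a few filenames with `re.match`. The filenames are hypothetical.

```python
import re

# Same patterns as _DEFAULT_EXCLUDE_PATTERNS.
exclude_patterns = (r".*-entities\.md$", r".*-prompt\.md$")
compiled = [re.compile(p) for p in exclude_patterns]

# Hypothetical filenames, for illustration only.
names = ["division-of-labour.md", "chapter-01-entities.md", "chapter-01-prompt.md"]

# A file is kept only if no exclusion pattern matches its name.
kept = [n for n in names if not any(pat.match(n) for pat in compiled)]
print(kept)
```

Since the patterns are tested against the bare filename, a directory name ending in `-entities` does not cause its files to be skipped.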


@@ -0,0 +1,215 @@
"""
Per-entity evaluation pipeline.
Builds prompts from entity metadata and delegates LLM evaluation to
the :class:`BatchEvaluator`. Writes structured results to the
evaluations directory.
"""
from __future__ import annotations
import hashlib
from datetime import datetime
from pathlib import Path
from typing import Callable, Dict, List, Optional
from markitect.infospace.config import InfospaceConfig
from markitect.infospace.evaluation import EntityEvaluation, ScoreEntry
from markitect.infospace.evaluation_io import write_entity_evaluation
from markitect.infospace.models import EntityMeta
from markitect.prompts.execution.batch import BatchEvaluator, BatchItem, BatchSummary
from markitect.prompts.execution.llm_adapter import LLMAdapter
from markitect.prompts.execution.models import RunConfig
_DEFAULT_DIMENSIONS = [
"definition_precision",
"source_grounding",
"domain_relevance",
"discipline_alignment",
"conceptual_clarity",
]
_PROMPT_TEMPLATE = """\
You are evaluating an entity from an infospace about "{topic}".
## Entity: {title}
**Slug:** {slug}
**Domain:** {domain}
**Source chapter:** {source_chapter}
### Definition
{definition}
### Context
{context}
## Instructions
Rate this entity on each dimension below using a scale of 1-5 \
(1 = poor, 5 = excellent). For each dimension, provide:
1. A numeric score (1-5)
2. A brief rationale (1-2 sentences)
### Dimensions to evaluate:
{dimensions_list}
## Output format
Return your evaluation as a structured list:
DIMENSION: <name>
SCORE: <1-5>
RATIONALE: <explanation>
Repeat for each dimension.
"""
def build_evaluation_prompt(
entity: EntityMeta,
topic: str,
dimensions: Optional[List[str]] = None,
) -> str:
"""Build an evaluation prompt for a single entity."""
dims = dimensions or _DEFAULT_DIMENSIONS
dims_list = "\n".join(f"- {d}" for d in dims)
return _PROMPT_TEMPLATE.format(
topic=topic,
title=entity.title,
slug=entity.slug,
domain=entity.domain or "(unspecified)",
source_chapter=entity.source_chapter or "(unspecified)",
definition=entity.definition or "(no definition)",
context=entity.context or "(no context)",
dimensions_list=dims_list,
)
def content_digest(entity: EntityMeta) -> str:
"""Compute a content digest for incremental evaluation."""
content = f"{entity.slug}:{entity.definition}:{entity.context}:{entity.domain}"
return hashlib.sha256(content.encode()).hexdigest()[:16]
def parse_evaluation_response(
response_text: str,
dimensions: Optional[List[str]] = None,
) -> List[ScoreEntry]:
"""Parse structured dimension scores from LLM response text.
Expects blocks of::
DIMENSION: <name>
SCORE: <1-5>
RATIONALE: <text>
"""
dims = dimensions or _DEFAULT_DIMENSIONS
scores: List[ScoreEntry] = []
current_dim = None
current_score = None
current_rationale = ""
for line in response_text.splitlines():
stripped = line.strip()
if stripped.upper().startswith("DIMENSION:"):
# Flush previous
if current_dim is not None and current_score is not None:
scores.append(ScoreEntry(
name=current_dim,
value=current_score,
max_value=5.0,
rationale=current_rationale.strip(),
))
current_dim = stripped.split(":", 1)[1].strip()
current_score = None
current_rationale = ""
elif stripped.upper().startswith("SCORE:"):
try:
current_score = float(stripped.split(":", 1)[1].strip())
except ValueError:
current_score = None
elif stripped.upper().startswith("RATIONALE:"):
current_rationale = stripped.split(":", 1)[1].strip()
elif current_dim is not None and current_score is not None:
# Continuation of rationale
if stripped:
current_rationale += " " + stripped
# Flush last
if current_dim is not None and current_score is not None:
scores.append(ScoreEntry(
name=current_dim,
value=current_score,
max_value=5.0,
rationale=current_rationale.strip(),
))
return scores
def run_entity_evaluation(
config: InfospaceConfig,
entities: List[EntityMeta],
adapter: LLMAdapter,
run_config: Optional[RunConfig] = None,
output_dir: Optional[Path] = None,
previous_digests: Optional[Dict[str, str]] = None,
progress_callback: Optional[Callable] = None,
dimensions: Optional[List[str]] = None,
) -> BatchSummary:
"""Run per-entity evaluation using the batch evaluator.
Args:
config: The infospace configuration.
entities: Entities to evaluate.
adapter: LLM adapter for evaluation.
run_config: LLM execution configuration.
output_dir: Where to write evaluation results. Defaults to
``config.evaluations_dir`` relative to CWD.
previous_digests: ``{slug: digest}`` for incremental skip.
progress_callback: Called after each item.
dimensions: Custom evaluation dimensions.
Returns:
A :class:`BatchSummary` with per-entity results.
"""
topic = config.topic.name
items = [
BatchItem(
key=entity.slug,
prompt=build_evaluation_prompt(entity, topic, dimensions),
content_digest=content_digest(entity),
metadata={"source_path": entity.source_path},
)
for entity in entities
]
evaluator = BatchEvaluator(
adapter=adapter,
config=run_config,
progress_callback=progress_callback,
previous_digests=previous_digests,
)
summary = evaluator.evaluate(items)
# Write successful results
evaluations_path = output_dir or Path(config.evaluations_dir)
evaluator_name = (run_config.model_name if run_config else "unknown")
for result in summary.results:
if result.status != "success" or result.response is None:
continue
scores = parse_evaluation_response(result.response.content, dimensions)
evaluation = EntityEvaluation(
entity_slug=result.key,
evaluator=evaluator_name,
scores=scores,
evaluated_at=datetime.utcnow(),
)
eval_path = evaluations_path / f"{result.key}.md"
write_entity_evaluation(evaluation, eval_path)
return summary
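
The response format that the prompt requests can be shown with a short example. The sketch below is a simplified, regex-based stand-in for `parse_evaluation_response`, not the module's implementation: it handles only single-line rationales, and `parse_blocks` and the sample response are hypothetical. Like the parser in this module, it drops any block whose score is not numeric.

```python
import re
from typing import List, Tuple

# Hypothetical LLM response in the DIMENSION/SCORE/RATIONALE block format.
response = """\
DIMENSION: definition_precision
SCORE: 4
RATIONALE: The definition is specific and bounded.

DIMENSION: source_grounding
SCORE: n/a
RATIONALE: No score could be given.
"""

BLOCK = re.compile(
    r"^DIMENSION:\s*(?P<name>.+?)\s*\nSCORE:\s*(?P<score>.+?)\s*\nRATIONALE:\s*(?P<why>.+?)\s*$",
    re.MULTILINE | re.IGNORECASE,
)


def parse_blocks(text: str) -> List[Tuple[str, float, str]]:
    """Simplified stand-in: keep only blocks whose SCORE parses as a float."""
    out: List[Tuple[str, float, str]] = []
    for m in BLOCK.finditer(text):
        try:
            out.append((m["name"], float(m["score"]), m["why"]))
        except ValueError:
            continue  # non-numeric score: the block is dropped
    return out


print(parse_blocks(response))
```

A dropped block means the entity ends up with fewer scores than requested dimensions, which in turn shifts its mean score, so callers may want to compare the number of parsed scores against the number of dimensions.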


@@ -0,0 +1,207 @@
"""
Data models for structured evaluation output.
Provides typed containers for per-entity LLM-evaluated scores and
collection-level metrics. All models support ``to_dict()``/``from_dict()``
round-tripping for YAML serialisation.
"""
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional
@dataclass
class ScoreEntry:
"""A single scored dimension (e.g. definition_precision: 4.5/5.0)."""
name: str
value: float
max_value: float = 5.0
rationale: str = ""
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {
"name": self.name,
"value": self.value,
"max_value": self.max_value,
}
if self.rationale:
d["rationale"] = self.rationale
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "ScoreEntry":
return cls(
name=data["name"],
value=float(data["value"]),
max_value=float(data.get("max_value", 5.0)),
rationale=data.get("rationale", ""),
)
@dataclass
class EntityEvaluation:
"""Per-entity evaluation result."""
entity_slug: str
evaluator: str
scores: List[ScoreEntry]
evaluated_at: datetime
notes: List[str] = field(default_factory=list)
@property
def overall_score(self) -> float:
if not self.scores:
return 0.0
return sum(s.value for s in self.scores) / len(self.scores)
def to_dict(self) -> Dict[str, Any]:
return {
"entity_slug": self.entity_slug,
"evaluator": self.evaluator,
"evaluated_at": self.evaluated_at.isoformat(),
"overall_score": round(self.overall_score, 4),
"scores": [s.to_dict() for s in self.scores],
"notes": self.notes,
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "EntityEvaluation":
return cls(
entity_slug=data["entity_slug"],
evaluator=data["evaluator"],
scores=[ScoreEntry.from_dict(s) for s in data["scores"]],
evaluated_at=datetime.fromisoformat(data["evaluated_at"]),
notes=data.get("notes", []),
)
@dataclass
class MetricValue:
"""A single collection-level metric."""
name: str
value: float
concern: str = ""
details: Dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {"name": self.name, "value": self.value}
if self.concern:
d["concern"] = self.concern
if self.details:
d["details"] = self.details
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "MetricValue":
return cls(
name=data["name"],
value=float(data["value"]),
concern=data.get("concern", ""),
details=data.get("details", {}),
)
@dataclass
class EvaluationSnapshot:
"""Timestamped snapshot of entity evaluations and collection metrics."""
snapshot_id: str
created_at: datetime
schema_name: str
entity_count: int
entity_evaluations: List[EntityEvaluation] = field(default_factory=list)
collection_metrics: List[MetricValue] = field(default_factory=list)
metadata: Dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> Dict[str, Any]:
return {
"snapshot_id": self.snapshot_id,
"created_at": self.created_at.isoformat(),
"schema_name": self.schema_name,
"entity_count": self.entity_count,
"entity_evaluations": [e.to_dict() for e in self.entity_evaluations],
"collection_metrics": [m.to_dict() for m in self.collection_metrics],
"metadata": self.metadata,
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "EvaluationSnapshot":
return cls(
snapshot_id=data["snapshot_id"],
created_at=datetime.fromisoformat(data["created_at"]),
schema_name=data["schema_name"],
entity_count=data["entity_count"],
entity_evaluations=[
EntityEvaluation.from_dict(e) for e in data.get("entity_evaluations", [])
],
collection_metrics=[
MetricValue.from_dict(m) for m in data.get("collection_metrics", [])
],
metadata=data.get("metadata", {}),
)
@dataclass
class ScoreChange:
"""Delta record for a single score dimension between snapshots."""
entity_slug: str
dimension: str
before: float
after: float
@property
def delta(self) -> float:
return self.after - self.before
@dataclass
class MetricChange:
"""Delta record for a collection metric between snapshots."""
name: str
before: float
after: float
@property
def delta(self) -> float:
return self.after - self.before
@dataclass
class SnapshotDiff:
"""Diff between two evaluation snapshots."""
before_id: str
after_id: str
added_entities: List[str] = field(default_factory=list)
removed_entities: List[str] = field(default_factory=list)
score_changes: List[ScoreChange] = field(default_factory=list)
metric_changes: List[MetricChange] = field(default_factory=list)
def summary(self) -> str:
lines = [f"Diff: {self.before_id} -> {self.after_id}"]
if self.added_entities:
lines.append(f" Added entities: {', '.join(self.added_entities)}")
if self.removed_entities:
lines.append(f" Removed entities: {', '.join(self.removed_entities)}")
if self.score_changes:
lines.append(f" Score changes: {len(self.score_changes)}")
for sc in self.score_changes:
lines.append(
f" {sc.entity_slug}/{sc.dimension}: "
f"{sc.before} -> {sc.after} ({sc.delta:+.2f})"
)
if self.metric_changes:
lines.append(f" Metric changes: {len(self.metric_changes)}")
for mc in self.metric_changes:
lines.append(
f" {mc.name}: {mc.before} -> {mc.after} ({mc.delta:+.2f})"
)
if not any([self.added_entities, self.removed_entities,
self.score_changes, self.metric_changes]):
lines.append(" No changes")
return "\n".join(lines)
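
A worked example of the two small calculations in these models: the unweighted mean behind `overall_score`, and the signed delta formatting used in `SnapshotDiff.summary()`. The entity name and score values are assumed for illustration.

```python
from statistics import fmean

# Illustrative scores for one entity (values assumed).
scores = {"definition_precision": 4.0, "source_grounding": 3.0, "conceptual_clarity": 5.0}

# overall_score is the plain mean of the dimension values, or 0.0 with no scores.
overall = fmean(scores.values()) if scores else 0.0

# One score-change line in the style of SnapshotDiff.summary().
before, after = 3.0, 4.5
line = f"demo-entity/source_grounding: {before} -> {after} ({after - before:+.2f})"

print(overall)
print(line)
```

The mean ignores `max_value`, so mixing dimensions scored on different scales would skew the result; all default dimensions here use a 5-point scale.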


@@ -0,0 +1,213 @@
"""
Read/write utilities for evaluation output files.
Per-entity evaluations are stored as markdown with YAML frontmatter.
Snapshots and history are stored as pure YAML files.
"""
from pathlib import Path
from typing import List
import yaml
from .evaluation import (
EntityEvaluation,
EvaluationSnapshot,
MetricChange,
MetricValue,
ScoreChange,
SnapshotDiff,
)
_FRONTMATTER_SEP = "---"
def write_entity_evaluation(evaluation: EntityEvaluation, path: Path) -> None:
"""Write a per-entity evaluation as YAML frontmatter + markdown body."""
frontmatter = {
"entity_slug": evaluation.entity_slug,
"evaluator": evaluation.evaluator,
"evaluated_at": evaluation.evaluated_at.isoformat(),
"overall_score": round(evaluation.overall_score, 4),
"scores": [s.to_dict() for s in evaluation.scores],
}
if evaluation.notes:
frontmatter["notes"] = evaluation.notes
lines: List[str] = []
lines.append(_FRONTMATTER_SEP)
lines.append(yaml.safe_dump(frontmatter, default_flow_style=False, sort_keys=False).rstrip())
lines.append(_FRONTMATTER_SEP)
lines.append("")
# Title
title = evaluation.entity_slug.replace("_", " ").replace("-", " ").title()
lines.append(f"# Evaluation: {title}")
lines.append("")
# One section per score with rationale
for score in evaluation.scores:
lines.append(f"## {score.name} — {score.value} / {score.max_value}")
lines.append("")
if score.rationale:
lines.append(score.rationale)
lines.append("")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text("\n".join(lines), encoding="utf-8")
def read_entity_evaluation(path: Path) -> EntityEvaluation:
"""Read a per-entity evaluation from a YAML frontmatter markdown file."""
text = path.read_text(encoding="utf-8")
parts = text.split(f"{_FRONTMATTER_SEP}\n", maxsplit=2)
# parts: ["", frontmatter_text, body]
if len(parts) < 3:
raise ValueError(f"Invalid frontmatter in {path}")
fm_text = parts[1]
body = parts[2]
fm = yaml.safe_load(fm_text)
# Parse rationales from body
rationales = _parse_rationales(body)
from .evaluation import ScoreEntry
scores = []
for s_data in fm["scores"]:
se = ScoreEntry.from_dict(s_data)
if se.name in rationales:
se.rationale = rationales[se.name]
scores.append(se)
return EntityEvaluation(
entity_slug=fm["entity_slug"],
evaluator=fm["evaluator"],
scores=scores,
evaluated_at=__import__("datetime").datetime.fromisoformat(fm["evaluated_at"]),
notes=fm.get("notes", []),
)
def _parse_rationales(body: str) -> dict:
"""Extract rationale text per dimension from the markdown body."""
rationales: dict = {}
current_name = None
current_lines: List[str] = []
for line in body.splitlines():
if line.startswith("## "):
# Save previous
if current_name is not None:
rationales[current_name] = "\n".join(current_lines).strip()
# Parse "## dimension_name — 4.5 / 5.0"
heading = line[3:].strip()
name = heading.split("—")[0].strip() if "—" in heading else heading
current_name = name
current_lines = []
elif current_name is not None:
current_lines.append(line)
if current_name is not None:
rationales[current_name] = "\n".join(current_lines).strip()
return rationales
def write_snapshot(snapshot: EvaluationSnapshot, path: Path) -> None:
"""Write an evaluation snapshot as a YAML file."""
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(
yaml.safe_dump(snapshot.to_dict(), default_flow_style=False, sort_keys=False),
encoding="utf-8",
)
def read_snapshot(path: Path) -> EvaluationSnapshot:
"""Read an evaluation snapshot from a YAML file."""
data = yaml.safe_load(path.read_text(encoding="utf-8"))
return EvaluationSnapshot.from_dict(data)
def append_to_history(snapshot: EvaluationSnapshot, history_path: Path) -> None:
"""Append a snapshot to a YAML list file (creates if missing)."""
history_path.parent.mkdir(parents=True, exist_ok=True)
existing: List[dict] = []
if history_path.exists():
loaded = yaml.safe_load(history_path.read_text(encoding="utf-8"))
if loaded is not None:
existing = loaded
existing.append(snapshot.to_dict())
history_path.write_text(
yaml.safe_dump(existing, default_flow_style=False, sort_keys=False),
encoding="utf-8",
)
def read_history(history_path: Path) -> List[EvaluationSnapshot]:
"""Read all snapshots from a YAML history file."""
data = yaml.safe_load(history_path.read_text(encoding="utf-8"))
if data is None:
return []
return [EvaluationSnapshot.from_dict(d) for d in data]
def diff_snapshots(before: EvaluationSnapshot, after: EvaluationSnapshot) -> SnapshotDiff:
"""Compute the diff between two evaluation snapshots."""
before_slugs = {e.entity_slug for e in before.entity_evaluations}
after_slugs = {e.entity_slug for e in after.entity_evaluations}
added = sorted(after_slugs - before_slugs)
removed = sorted(before_slugs - after_slugs)
# Build score lookup: {slug: {dimension: value}}
before_scores: dict = {}
for ev in before.entity_evaluations:
before_scores[ev.entity_slug] = {s.name: s.value for s in ev.scores}
after_scores: dict = {}
for ev in after.entity_evaluations:
after_scores[ev.entity_slug] = {s.name: s.value for s in ev.scores}
score_changes: List[ScoreChange] = []
common_slugs = sorted(before_slugs & after_slugs)
for slug in common_slugs:
b_dims = before_scores[slug]
a_dims = after_scores[slug]
all_dims = sorted(set(b_dims) | set(a_dims))
for dim in all_dims:
bv = b_dims.get(dim)
av = a_dims.get(dim)
if bv != av:
score_changes.append(ScoreChange(
entity_slug=slug,
dimension=dim,
before=bv if bv is not None else 0.0,
after=av if av is not None else 0.0,
))
# Metric changes
before_metrics = {m.name: m.value for m in before.collection_metrics}
after_metrics = {m.name: m.value for m in after.collection_metrics}
all_metric_names = sorted(set(before_metrics) | set(after_metrics))
metric_changes: List[MetricChange] = []
for name in all_metric_names:
bv = before_metrics.get(name)
av = after_metrics.get(name)
if bv != av:
metric_changes.append(MetricChange(
name=name,
before=bv if bv is not None else 0.0,
after=av if av is not None else 0.0,
))
return SnapshotDiff(
before_id=before.snapshot_id,
after_id=after.snapshot_id,
added_entities=added,
removed_entities=removed,
score_changes=score_changes,
metric_changes=metric_changes,
)
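
To make the score-diff semantics of `diff_snapshots` concrete, here is a minimal sketch on plain dictionaries rather than real `EvaluationSnapshot` objects. `diff_scores` is a hypothetical helper, not part of the module. It assumes the same two rules as the code above: only slugs present in both snapshots are compared, and a dimension missing on one side is reported as 0.0.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]  # {slug: {dimension: value}}


def diff_scores(before: Scores, after: Scores) -> List[Tuple[str, str, float, float]]:
    """Return (slug, dimension, before, after) for every changed score.

    Hypothetical stand-in for the score part of diff_snapshots.
    """
    changes = []
    # Added and removed entities are handled separately, so only
    # slugs present on both sides are compared here.
    for slug in sorted(set(before) & set(after)):
        b_dims, a_dims = before[slug], after[slug]
        for dim in sorted(set(b_dims) | set(a_dims)):
            bv, av = b_dims.get(dim), a_dims.get(dim)
            if bv != av:
                changes.append((
                    slug,
                    dim,
                    bv if bv is not None else 0.0,  # missing side becomes 0.0
                    av if av is not None else 0.0,
                ))
    return changes


before = {"division-of-labour": {"clarity": 4.0}, "rent": {"clarity": 3.0}}
after = {
    "division-of-labour": {"clarity": 4.5, "fidelity": 5.0},
    "wages": {"clarity": 2.0},
}
changes = diff_scores(before, after)
```

One consequence worth keeping in mind when reading a diff: a newly introduced dimension looks the same as one that rose from a genuine score of 0.0.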

@@ -0,0 +1,223 @@
"""
Metrics history and viability tracking.
Converts check results into timestamped snapshots and maintains a
persistent history file for trend analysis.
"""
from __future__ import annotations
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional
import yaml
from markitect.infospace.checks.orchestrator import CheckReport
from markitect.infospace.config import InfospaceConfig
from markitect.infospace.evaluation import EvaluationSnapshot, MetricValue
from markitect.infospace.evaluation_io import (
append_to_history,
diff_snapshots,
read_history,
)
from markitect.infospace.state import ViabilityResult
# ── Snapshot creation ────────────────────────────────────────────────
def _concern_for_metric(name: str) -> str:
"""Map a metric name to its concern label."""
mapping = {
"redundancy_ratio": "C1",
"coverage_ratio": "C2",
"coherence_components": "C3",
"modularity": "C3",
"consistency_cycles": "C4",
"granularity_entropy": "C5",
}
return mapping.get(name, "")
def snapshot_from_checks(
check_report: CheckReport,
entity_count: int,
schema_name: str = "default",
metadata: Optional[Dict[str, Any]] = None,
) -> EvaluationSnapshot:
"""Create an :class:`EvaluationSnapshot` from collection check results.
Args:
check_report: Output from :func:`run_all_checks`.
entity_count: Number of entities checked.
schema_name: Schema identifier for the snapshot.
metadata: Optional extra metadata to attach.
Returns:
A snapshot containing the check metrics as collection_metrics.
"""
metrics_dict = check_report.metrics()
collection_metrics = [
MetricValue(
name=name,
value=value,
concern=_concern_for_metric(name),
)
for name, value in sorted(metrics_dict.items())
]
return EvaluationSnapshot(
snapshot_id=str(uuid.uuid4())[:8],
created_at=datetime.now(timezone.utc),
schema_name=schema_name,
entity_count=entity_count,
collection_metrics=collection_metrics,
metadata=metadata or {},
)
# ── Metrics file I/O ────────────────────────────────────────────────
def write_metrics_file(metrics: Dict[str, float], path: Path) -> None:
"""Write the latest metrics to a simple YAML file.
This file is used by ``markitect infospace viability`` for quick
threshold checking.
"""
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(
yaml.safe_dump(
{k: round(v, 6) for k, v in sorted(metrics.items())},
default_flow_style=False,
sort_keys=True,
),
encoding="utf-8",
)
def read_metrics_file(path: Path) -> Dict[str, float]:
"""Read the latest metrics from a YAML file."""
if not path.is_file():
return {}
raw = yaml.safe_load(path.read_text(encoding="utf-8"))
if not isinstance(raw, dict):
return {}
return {k: float(v) for k, v in raw.items() if isinstance(v, (int, float))}
# ── History operations ───────────────────────────────────────────────
def record_check_results(
check_report: CheckReport,
config: InfospaceConfig,
root: Path,
entity_count: int,
) -> EvaluationSnapshot:
"""Record check results: save metrics file and append to history.
Args:
check_report: Output from ``run_all_checks()``.
config: The infospace configuration.
root: Project root directory.
entity_count: Number of entities checked.
Returns:
The snapshot that was recorded.
"""
metrics_dir = root / config.metrics_dir
metrics = check_report.metrics()
# Save latest metrics
write_metrics_file(metrics, metrics_dir / "metrics.yaml")
# Create and append snapshot
snapshot = snapshot_from_checks(
check_report,
entity_count=entity_count,
metadata={"source": "collection-checks"},
)
append_to_history(snapshot, metrics_dir / "history.yaml")
return snapshot
def get_history(config: InfospaceConfig, root: Path) -> List[EvaluationSnapshot]:
"""Read the full metrics history for an infospace."""
history_path = root / config.metrics_dir / "history.yaml"
if not history_path.is_file():
return []
return read_history(history_path)
def get_latest_snapshot(
config: InfospaceConfig, root: Path
) -> Optional[EvaluationSnapshot]:
"""Get the most recent snapshot from the history."""
history = get_history(config, root)
return history[-1] if history else None
def find_snapshot_by_date(
    history: List[EvaluationSnapshot], date_str: str
) -> Optional[EvaluationSnapshot]:
    """Find the snapshot closest to a given date string.

    Args:
        history: List of snapshots in chronological order.
        date_str: Date string in ``YYYY-MM-DD`` or ``YYYY-MM-DDTHH:MM:SS`` format.

    Returns:
        The snapshot closest to the given date, or ``None`` if history is
        empty or *date_str* cannot be parsed.
    """
    if not history:
        return None
    # Parse the target date
    try:
        if "T" in date_str:
            target = datetime.fromisoformat(date_str)
        else:
            target = datetime.fromisoformat(date_str + "T00:00:00")
    except ValueError:
        return None
    # Treat naive datetimes as UTC so aware and naive values can be compared
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    best = None
    best_delta = None
    for snap in history:
        snap_dt = snap.created_at
        if snap_dt.tzinfo is None:
            snap_dt = snap_dt.replace(tzinfo=timezone.utc)
        delta = abs((snap_dt - target).total_seconds())
        # Strict "<" keeps the earliest snapshot when two are equally close
        if best_delta is None or delta < best_delta:
            best = snap
            best_delta = delta
    return best
def metric_trend(
history: List[EvaluationSnapshot], metric_name: str
) -> List[Dict[str, Any]]:
"""Extract a single metric's values across the history.
Returns a list of ``{"date": iso_str, "value": float}`` entries
for each snapshot that contains the metric.
"""
trend: List[Dict[str, Any]] = []
for snap in history:
for m in snap.collection_metrics:
if m.name == metric_name:
trend.append({
"date": snap.created_at.isoformat(),
"value": m.value,
})
break
return trend
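
The closest-date lookup in `find_snapshot_by_date` can be tried out without any snapshot objects. The sketch below works on bare timestamps. `closest` and `snapshots_at` are hypothetical names introduced for illustration, and the timestamps are taken from the commit dates on this page. It assumes the same normalisation as above: naive datetimes are treated as UTC.

```python
from datetime import datetime, timezone
from typing import List, Optional


def _as_utc(dt: datetime) -> datetime:
    """Attach UTC to naive datetimes; leave aware ones untouched."""
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def closest(timestamps: List[datetime], date_str: str) -> Optional[datetime]:
    """Return the timestamp nearest to date_str, or None if empty/unparseable."""
    if not timestamps:
        return None
    try:
        # fromisoformat accepts both "YYYY-MM-DD" and "YYYY-MM-DDTHH:MM:SS"
        target = _as_utc(datetime.fromisoformat(date_str))
    except ValueError:
        return None
    # min() returns the first minimal element, so ties go to the earlier entry
    return min(timestamps, key=lambda dt: abs((_as_utc(dt) - target).total_seconds()))


snapshots_at = [
    datetime(2026, 2, 18, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2026, 2, 19, 2, 29, 53, tzinfo=timezone.utc),
    datetime(2026, 2, 19, 7, 44, 1),  # naive: interpreted as UTC
]
nearest = closest(snapshots_at, "2026-02-19")
```

A date-only string resolves to midnight, so the lookup favours snapshots taken early on that day or late on the previous one.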

@@ -0,0 +1,53 @@
"""
Data models for infospace entity metadata.
"""
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List
@dataclass
class EntityMeta:
"""Structured metadata extracted from a single entity markdown file.
The parser populates every field it can find; missing optional
sections are left as empty strings (validation is a separate step).
"""
# Identity
slug: str
title: str
h1_raw: str # verbatim H1 text before any normalisation
# Section contents (plain text, empty string if section missing)
definition: str = ""
source_chapter: str = ""
context: str = ""
domain: str = ""
original_wording: str = ""
modern_interpretation: str = ""
# Derived flags
h1_is_title_case: bool = False
has_original_wording: bool = False
# Metrics-ready numbers
definition_word_count: int = 0
total_word_count: int = 0
# All H2 section slugs found (preserves order)
section_slugs: List[str] = field(default_factory=list)
# Source file path (as string for serialisation)
source_path: str = ""
def to_dict(self) -> Dict[str, Any]:
"""Serialise to a plain dictionary."""
return asdict(self)
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "EntityMeta":
"""Deserialise from a plain dictionary."""
known_fields = {f.name for f in cls.__dataclass_fields__.values()}
filtered = {k: v for k, v in data.items() if k in known_fields}
return cls(**filtered)
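
The `from_dict` pattern above silently drops keys it does not recognise, which keeps older or newer serialised files loadable. A minimal sketch of that behaviour follows, using a trimmed-down, hypothetical `MiniMeta` class in place of `EntityMeta`, and `dataclasses.fields()` as one way to list the field names.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class MiniMeta:
    """Hypothetical, reduced stand-in for EntityMeta."""

    slug: str
    title: str
    domain: str = ""
    section_slugs: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniMeta":
        # Keep only keys that correspond to declared dataclass fields
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


raw = {"slug": "rent", "title": "Rent", "legacy_field": 1}
meta = MiniMeta.from_dict(raw)                 # "legacy_field" is ignored
round_trip = MiniMeta.from_dict(asdict(meta))  # serialise and load again
```

Required fields are still enforced: a dictionary without `slug` or `title` raises `TypeError` from the dataclass constructor.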

@@ -0,0 +1,144 @@
"""
Declarative schema definitions for entity compliance validation.
A schema describes the expected structure of an entity: which sections
are required, word count bounds, heading format, and valid enum values.
Schemas are frozen (immutable once created).
"""
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple
class SectionRequirement(Enum):
"""How strictly a section must be present."""
REQUIRED = "required"
RECOMMENDED = "recommended"
OPTIONAL = "optional"
@dataclass(frozen=True)
class SectionRule:
"""Validation rule for a single H2 section.
Parameters
----------
slug:
Section slug as it appears in entity metadata (e.g. ``definition``).
label:
Human-readable section name for diagnostics.
requirement:
Whether the section is required, recommended, or optional.
min_words:
Minimum word count (inclusive). ``None`` means no lower bound.
max_words:
Maximum word count (inclusive). ``None`` means no upper bound.
"""
slug: str
label: str
requirement: SectionRequirement
min_words: Optional[int] = None
max_words: Optional[int] = None
@dataclass(frozen=True)
class EnumConstraint:
"""Constraint limiting a field to a set of allowed values.
Parameters
----------
field_name:
The ``EntityMeta`` field to check (e.g. ``domain``).
allowed_values:
Tuple of acceptable string values.
severity:
``"error"`` or ``"warning"`` when the value is not in the set.
"""
field_name: str
allowed_values: Tuple[str, ...]
severity: str = "warning"
@dataclass(frozen=True)
class EntitySchema:
"""Complete validation schema for an entity type.
Parameters
----------
name:
Human-readable schema name (e.g. ``"Economic Entity"``).
section_rules:
Tuple of :class:`SectionRule` objects.
enum_constraints:
Tuple of :class:`EnumConstraint` objects.
h1_title_case_severity:
Severity for non-title-case H1 headings (``"error"`` or ``"warning"``).
require_h1:
Whether a non-empty slug (H1) is required.
"""
name: str
section_rules: Tuple[SectionRule, ...]
enum_constraints: Tuple[EnumConstraint, ...] = ()
h1_title_case_severity: str = "warning"
require_h1: bool = True
# ── Default schema for the economic-entity infospace ──────────────
ECONOMIC_ENTITY_SCHEMA = EntitySchema(
name="Economic Entity",
section_rules=(
SectionRule(
slug="definition",
label="Definition",
requirement=SectionRequirement.REQUIRED,
min_words=20,
max_words=150,
),
SectionRule(
slug="source_chapter",
label="Source Chapter",
requirement=SectionRequirement.REQUIRED,
),
SectionRule(
slug="context",
label="Context",
requirement=SectionRequirement.REQUIRED,
),
SectionRule(
slug="economic_domain",
label="Economic Domain",
requirement=SectionRequirement.REQUIRED,
),
SectionRule(
slug="smith_s_original_wording",
label="Smith's Original Wording",
requirement=SectionRequirement.OPTIONAL,
),
SectionRule(
slug="modern_interpretation",
label="Modern Interpretation",
requirement=SectionRequirement.OPTIONAL,
),
),
enum_constraints=(
EnumConstraint(
field_name="domain",
allowed_values=(
"Production",
"Exchange",
"Distribution",
"Regulation",
"General Theory",
),
severity="warning",
),
),
h1_title_case_severity="warning",
require_h1=True,
)
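
Because the schema classes are frozen dataclasses, a rule cannot be adjusted in place once created. The sketch below shows what that means in practice, using a hypothetical `Rule` class with the same shape as `SectionRule`. The word bounds are the ones from the default schema above. The relaxed limit of 200 is an arbitrary illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for SectionRule."""

    slug: str
    min_words: Optional[int] = None
    max_words: Optional[int] = None


definition = Rule(slug="definition", min_words=20, max_words=150)

try:
    definition.max_words = 200  # frozen dataclasses reject assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# Derive a modified copy instead of mutating the original
relaxed = replace(definition, max_words=200)
```

A project that needs different bounds would therefore build a new schema from modified copies rather than patching `ECONOMIC_ENTITY_SCHEMA`.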

@@ -0,0 +1,141 @@
"""
Infospace runtime state.
Computed from the current entities, evaluations, and metrics on disk.
Provides the data behind ``markitect infospace status`` and
``markitect infospace viability``.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional
from markitect.infospace.config import InfospaceConfig, ViabilityThreshold
from markitect.infospace.models import EntityMeta
from markitect.infospace.evaluation import EvaluationSnapshot
@dataclass
class ViabilityResult:
"""Result of checking a single viability threshold."""
metric: str
value: float
threshold: ViabilityThreshold
passed: bool
def to_dict(self) -> Dict[str, Any]:
d: Dict[str, Any] = {
"metric": self.metric,
"value": self.value,
"passed": self.passed,
}
if self.threshold.min is not None:
d["min"] = self.threshold.min
if self.threshold.max is not None:
d["max"] = self.threshold.max
return d
@dataclass
class InfospaceState:
"""Current runtime state of an infospace.
Aggregates entity metadata, evaluation results, and viability
checks into a single queryable object.
"""
config: InfospaceConfig
entities: List[EntityMeta] = field(default_factory=list)
latest_snapshot: Optional[EvaluationSnapshot] = None
viability_results: List[ViabilityResult] = field(default_factory=list)
computed_at: datetime = field(default_factory=datetime.utcnow)
@property
def entity_count(self) -> int:
return len(self.entities)
@property
def topic_name(self) -> str:
return self.config.topic.name
@property
def is_viable(self) -> bool:
"""``True`` if all viability thresholds are met."""
if not self.viability_results:
return False
return all(r.passed for r in self.viability_results)
@property
def viability_pass_count(self) -> int:
return sum(1 for r in self.viability_results if r.passed)
@property
def viability_total_count(self) -> int:
return len(self.viability_results)
@property
def domains(self) -> List[str]:
"""Distinct domain values across all entities."""
return sorted({e.domain for e in self.entities if e.domain})
@property
def has_evaluations(self) -> bool:
return self.latest_snapshot is not None
    def check_viability(self, metrics: Dict[str, float]) -> List[ViabilityResult]:
        """Check *metrics* against the configured viability thresholds.

        Updates :attr:`viability_results` and returns the results.
        A metric that is configured but absent from *metrics* is
        evaluated as ``0.0``.
        """
        results: List[ViabilityResult] = []
        for name, threshold in self.config.viability.items():
            value = metrics.get(name, 0.0)
            results.append(ViabilityResult(
                metric=name,
                value=value,
                threshold=threshold,
                passed=threshold.check(value),
            ))
        self.viability_results = results
        return results
def summary(self) -> Dict[str, Any]:
"""Return a summary dict suitable for display or serialisation."""
d: Dict[str, Any] = {
"topic": self.topic_name,
"entity_count": self.entity_count,
"domains": self.domains,
"has_evaluations": self.has_evaluations,
}
if self.viability_results:
d["viable"] = self.is_viable
d["viability_pass"] = self.viability_pass_count
d["viability_total"] = self.viability_total_count
if self.latest_snapshot:
d["last_evaluated"] = self.latest_snapshot.created_at.isoformat()
return d
def build_state(
config: InfospaceConfig,
entities: Optional[List[EntityMeta]] = None,
snapshot: Optional[EvaluationSnapshot] = None,
metrics: Optional[Dict[str, float]] = None,
) -> InfospaceState:
"""Build an :class:`InfospaceState` from available data.
This is a convenience function that assembles the state object
and optionally runs viability checks if *metrics* are provided.
"""
state = InfospaceState(
config=config,
entities=entities or [],
latest_snapshot=snapshot,
)
if metrics is not None:
state.check_viability(metrics)
return state
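
The threshold logic behind `check_viability` lives in `ViabilityThreshold`, which is not shown in this diff. The sketch below is therefore an assumption: a hypothetical `Threshold` class with optional `min` and `max` bounds, both treated as inclusive. The coverage figures (0.361 against a minimum of 0.50) come from the commit messages on this page. The redundancy bound of 0.10 is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical stand-in for ViabilityThreshold (bounds assumed inclusive)."""

    min: Optional[float] = None
    max: Optional[float] = None

    def check(self, value: float) -> bool:
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


thresholds = {
    "coverage_ratio": Threshold(min=0.50),
    "redundancy_ratio": Threshold(max=0.10),  # illustrative value
}
metrics = {"coverage_ratio": 0.361}  # redundancy_ratio was not measured

# Same defaulting rule as check_viability: a missing metric counts as 0.0
results: Dict[str, bool] = {
    name: t.check(metrics.get(name, 0.0)) for name, t in thresholds.items()
}
```

Under these assumptions, the unmeasured `redundancy_ratio` passes its maximum-only threshold because it defaults to 0.0. Callers who want "not measured" to count as a failure would need to check for missing keys themselves.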

@@ -0,0 +1,261 @@
"""
Schema compliance validator for entity metadata.
Validates :class:`~markitect.infospace.models.EntityMeta` instances
against a declarative :class:`~markitect.infospace.schema.EntitySchema`.
All checks are deterministic — no LLM calls.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence
from .models import EntityMeta
from .schema import EntitySchema, SectionRequirement
# Maps section slugs (as they appear in the schema) to EntityMeta field
# names. Most match directly; ``economic_domain`` maps to ``domain``.
_SECTION_FIELD_MAP: Dict[str, str] = {
"definition": "definition",
"source_chapter": "source_chapter",
"context": "context",
"economic_domain": "domain",
"smith_s_original_wording": "original_wording",
"modern_interpretation": "modern_interpretation",
}
@dataclass
class ComplianceDiagnostic:
"""A single validation finding."""
code: str
message: str
severity: str # "error" or "warning"
section: Optional[str] = None
field: Optional[str] = None
def __str__(self) -> str:
parts = [f"[{self.severity.upper()}] {self.code}: {self.message}"]
if self.section:
parts.append(f"(section: {self.section})")
if self.field:
parts.append(f"(field: {self.field})")
return " ".join(parts)
@dataclass
class ComplianceResult:
"""Validation result for a single entity."""
entity_slug: str
schema_name: str
diagnostics: List[ComplianceDiagnostic] = field(default_factory=list)
checks_run: int = 0
@property
def is_compliant(self) -> bool:
return self.error_count == 0
@property
def error_count(self) -> int:
return sum(1 for d in self.diagnostics if d.severity == "error")
@property
def warning_count(self) -> int:
return sum(1 for d in self.diagnostics if d.severity == "warning")
@property
def errors(self) -> List[ComplianceDiagnostic]:
return [d for d in self.diagnostics if d.severity == "error"]
@property
def warnings(self) -> List[ComplianceDiagnostic]:
return [d for d in self.diagnostics if d.severity == "warning"]
def summary(self) -> str:
status = "PASS" if self.is_compliant else "FAIL"
return (
f"{self.entity_slug}: {status} "
f"({self.checks_run} checks, "
f"{self.error_count} errors, "
f"{self.warning_count} warnings)"
)
@dataclass
class BatchComplianceResult:
"""Aggregated validation result for multiple entities."""
results: List[ComplianceResult] = field(default_factory=list)
schema_name: str = ""
@property
def total_entities(self) -> int:
return len(self.results)
@property
def compliant_count(self) -> int:
return sum(1 for r in self.results if r.is_compliant)
@property
def non_compliant_count(self) -> int:
return self.total_entities - self.compliant_count
@property
def total_errors(self) -> int:
return sum(r.error_count for r in self.results)
@property
def total_warnings(self) -> int:
return sum(r.warning_count for r in self.results)
def summary(self) -> str:
lines = [
f"Schema: {self.schema_name}",
f"Entities: {self.total_entities}",
f"Compliant: {self.compliant_count}/{self.total_entities}",
f"Errors: {self.total_errors}, Warnings: {self.total_warnings}",
]
for r in self.results:
lines.append(f" {r.summary()}")
return "\n".join(lines)
def _word_count(text: str) -> int:
"""Count whitespace-separated words."""
return len(text.split())
def validate_entity(
entity: EntityMeta,
schema: EntitySchema,
) -> ComplianceResult:
"""Validate a single entity against *schema*.
Returns a :class:`ComplianceResult` with all diagnostics found.
"""
result = ComplianceResult(
entity_slug=entity.slug,
schema_name=schema.name,
)
checks = 0
# ── H1 checks ─────────────────────────────────────────────────
if schema.require_h1:
checks += 1
if not entity.slug:
result.diagnostics.append(
ComplianceDiagnostic(
code="H1_MISSING",
message="Entity has no H1 heading (empty slug).",
severity="error",
)
)
checks += 1
if entity.slug and not entity.h1_is_title_case:
result.diagnostics.append(
ComplianceDiagnostic(
code="H1_NOT_TITLE_CASE",
message=f"H1 '{entity.h1_raw}' is not in title case.",
severity=schema.h1_title_case_severity,
)
)
# ── Section checks ────────────────────────────────────────────
for rule in schema.section_rules:
checks += 1
field_name = _SECTION_FIELD_MAP.get(rule.slug, rule.slug)
value = getattr(entity, field_name, "")
is_empty = not value or not value.strip()
if is_empty:
if rule.requirement == SectionRequirement.REQUIRED:
result.diagnostics.append(
ComplianceDiagnostic(
code="SECTION_MISSING",
message=f"Required section '{rule.label}' is missing or empty.",
severity="error",
section=rule.slug,
)
)
elif rule.requirement == SectionRequirement.RECOMMENDED:
result.diagnostics.append(
ComplianceDiagnostic(
code="SECTION_RECOMMENDED",
message=f"Recommended section '{rule.label}' is missing.",
severity="warning",
section=rule.slug,
)
)
# OPTIONAL + empty → no diagnostic
continue
# Word count bounds (only if section has content)
wc = _word_count(value)
if rule.min_words is not None and wc < rule.min_words:
checks += 1
result.diagnostics.append(
ComplianceDiagnostic(
code="SECTION_TOO_SHORT",
message=(
f"Section '{rule.label}' has {wc} words "
f"(minimum: {rule.min_words})."
),
severity="error",
section=rule.slug,
)
)
elif rule.max_words is not None and wc > rule.max_words:
checks += 1
result.diagnostics.append(
ComplianceDiagnostic(
code="SECTION_TOO_LONG",
message=(
f"Section '{rule.label}' has {wc} words "
f"(maximum: {rule.max_words})."
),
severity="warning",
section=rule.slug,
)
)
# ── Enum constraints ──────────────────────────────────────────
for constraint in schema.enum_constraints:
checks += 1
value = getattr(entity, constraint.field_name, "")
# Empty field is already caught by SECTION_MISSING above
if not value or not value.strip():
continue
if value.strip() not in constraint.allowed_values:
result.diagnostics.append(
ComplianceDiagnostic(
code="ENUM_VALUE_UNKNOWN",
message=(
f"Field '{constraint.field_name}' has value "
f"'{value.strip()}' which is not in the allowed set."
),
severity=constraint.severity,
field=constraint.field_name,
)
)
result.checks_run = checks
return result
def validate_entities(
entities: Sequence[EntityMeta],
schema: EntitySchema,
) -> BatchComplianceResult:
"""Validate multiple entities against *schema*.
Returns a :class:`BatchComplianceResult` with per-entity results.
"""
batch = BatchComplianceResult(schema_name=schema.name)
for entity in entities:
batch.results.append(validate_entity(entity, schema))
return batch
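
The word-count part of `validate_entity` is easy to exercise in isolation. The sketch below reproduces just that rule in a hypothetical helper, `word_bound_findings`, on plain strings. It keeps the asymmetry of the validator above: a section that is too short is an error, while one that is too long is only a warning.

```python
from typing import List, Optional, Tuple


def word_bound_findings(
    text: str, min_words: Optional[int], max_words: Optional[int]
) -> List[Tuple[str, str]]:
    """Return (code, severity) findings for one non-empty section."""
    wc = len(text.split())  # whitespace-separated words, as in _word_count
    if min_words is not None and wc < min_words:
        return [("SECTION_TOO_SHORT", "error")]
    if max_words is not None and wc > max_words:
        return [("SECTION_TOO_LONG", "warning")]
    return []


too_short = word_bound_findings("too short", 20, 150)
just_right = word_bound_findings(" ".join(["word"] * 20), 20, 150)
too_long = word_bound_findings(" ".join(["word"] * 151), 20, 150)
```

Both bounds are inclusive, so a definition of exactly 20 or exactly 150 words produces no finding.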

@@ -26,6 +26,15 @@ from markitect.llm.exceptions import (
LLMTimeoutError,
LLMSubprocessError,
)
from markitect.llm.embedding_adapter import EmbeddingAdapter
from markitect.llm.embedding_openai import OpenAICompatibleEmbeddingAdapter
from markitect.llm.embedding_cache import EmbeddingCache
from markitect.llm.embedding_factory import create_embedding_adapter
from markitect.llm.similarity import (
cosine_similarity,
similarity_matrix,
find_similar_pairs,
)
__all__ = [
"create_adapter",
@@ -41,4 +50,11 @@ __all__ = [
"LLMRateLimitError",
"LLMTimeoutError",
"LLMSubprocessError",
"EmbeddingAdapter",
"OpenAICompatibleEmbeddingAdapter",
"EmbeddingCache",
"create_embedding_adapter",
"cosine_similarity",
"similarity_matrix",
"find_similar_pairs",
]

@@ -0,0 +1,34 @@
"""
Abstract base class for embedding adapters.
Embedding adapters convert text into float vectors. This is a separate
hierarchy from :class:`LLMAdapter` (text generation) because the API
contract is fundamentally different: text in, float vectors out.
"""
from abc import ABC, abstractmethod
class EmbeddingAdapter(ABC):
"""Base class for all embedding adapters."""
@abstractmethod
def embed(self, texts: list[str]) -> list[list[float]]:
"""Embed a batch of texts into vectors.
Args:
texts: One or more strings to embed.
Returns:
A list of embedding vectors, one per input text,
in the same order as *texts*.
"""
@abstractmethod
def validate(self) -> bool:
"""Check that the adapter is configured correctly.
Returns:
``True`` if the adapter has a valid configuration
(e.g. API key present), ``False`` otherwise.
"""

@@ -0,0 +1,64 @@
"""
File-based embedding cache.
Stores embedding vectors in a single JSON file keyed by entity slug.
Each entry includes a content digest so stale embeddings are
automatically invalidated when entity content changes.
"""
import json
from pathlib import Path
from typing import Optional
class EmbeddingCache:
"""Persistent cache for embedding vectors.
Structure on disk (``embeddings.json``)::
{
"division-of-labour": {"digest": "abc123", "vector": [0.1, ...]},
...
}
"""
def __init__(self, cache_dir: Path):
self._path = cache_dir / "embeddings.json"
self._data: dict[str, dict] = {}
self._hits = 0
self._misses = 0
self._load()
def get(self, slug: str, content_digest: str) -> Optional[list[float]]:
"""Return the cached vector if *content_digest* matches, else ``None``."""
entry = self._data.get(slug)
if entry is not None and entry.get("digest") == content_digest:
self._hits += 1
return entry["vector"]
self._misses += 1
return None
def put(self, slug: str, content_digest: str, vector: list[float]) -> None:
"""Store or overwrite the embedding for *slug*."""
self._data[slug] = {"digest": content_digest, "vector": vector}
def save(self) -> None:
"""Write cache to disk."""
self._path.parent.mkdir(parents=True, exist_ok=True)
self._path.write_text(json.dumps(self._data, separators=(",", ":")), encoding="utf-8")
def stats(self) -> dict:
"""Return cache statistics."""
return {
"entries": len(self._data),
"hits": self._hits,
"misses": self._misses,
}
def _load(self) -> None:
"""Read cache from disk if it exists."""
if self._path.is_file():
try:
self._data = json.loads(self._path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
self._data = {}
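
The cache leaves it to the caller to compute the content digest. The sketch below shows one plausible way to do that and how a changed entity text then turns into a cache miss. The choice of SHA-256 is an assumption, since the diff does not show which hash the callers use. `MiniCache` is a hypothetical in-memory stand-in that follows the same `get`/`put` rule as `EmbeddingCache`.

```python
import hashlib
from typing import Dict, List, Optional


def content_digest(text: str) -> str:
    """Hex digest of the entity text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class MiniCache:
    """In-memory stand-in for EmbeddingCache (no file I/O, no statistics)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, slug: str, digest: str) -> Optional[List[float]]:
        entry = self._data.get(slug)
        # A stored vector is only valid for the exact content it was built from
        if entry is not None and entry["digest"] == digest:
            return entry["vector"]
        return None

    def put(self, slug: str, digest: str, vector: List[float]) -> None:
        self._data[slug] = {"digest": digest, "vector": vector}


original = "Rent is the price paid for land."
edited = "Rent is the price paid for the use of land."

cache = MiniCache()
cache.put("rent", content_digest(original), [0.1, 0.2])
hit = cache.get("rent", content_digest(original))
miss = cache.get("rent", content_digest(edited))
```

Note that with the real class, `put` only changes the in-memory state. The new vector reaches `embeddings.json` once `save()` is called.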

@@ -0,0 +1,50 @@
"""
Factory for creating embedding adapters by provider name.
"""
from typing import Optional, Any
from markitect.llm.embedding_adapter import EmbeddingAdapter
from markitect.llm.exceptions import LLMConfigurationError
_EMBEDDING_PROVIDERS = {
"openai": "markitect.llm.embedding_openai.OpenAICompatibleEmbeddingAdapter",
"openrouter": "markitect.llm.embedding_openai.OpenAICompatibleEmbeddingAdapter",
}
def create_embedding_adapter(
provider: str = "openai",
model: Optional[str] = None,
api_key: Optional[str] = None,
**kwargs: Any,
) -> EmbeddingAdapter:
"""Instantiate an :class:`EmbeddingAdapter` for the given *provider*.
Args:
provider: ``"openai"`` or ``"openrouter"``.
model: Embedding model name (e.g. ``"text-embedding-3-small"``).
api_key: Explicit API key.
**kwargs: Extra keyword arguments forwarded to the adapter.
Returns:
A ready-to-use :class:`EmbeddingAdapter` instance.
Raises:
LLMConfigurationError: If *provider* is not recognised.
"""
if provider not in _EMBEDDING_PROVIDERS:
known = ", ".join(sorted(_EMBEDDING_PROVIDERS))
raise LLMConfigurationError(
f"Unknown embedding provider {provider!r}. Choose from: {known}",
context={"provider": provider},
)
# Lazy import
fqn = _EMBEDDING_PROVIDERS[provider]
module_path, class_name = fqn.rsplit(".", 1)
import importlib
mod = importlib.import_module(module_path)
cls = getattr(mod, class_name)
return cls(model=model, api_key=api_key, provider=provider, **kwargs)
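
The factory resolves a provider name to a class through a dotted-path registry and imports the module only when it is needed. The same technique is sketched below with standard-library classes, so that it runs without `markitect` installed. `_REGISTRY` and `create` are hypothetical names, and `ValueError` stands in for `LLMConfigurationError`.

```python
import importlib
from typing import Any

# Name -> fully qualified class path; nothing is imported until create() runs
_REGISTRY = {
    "fraction": "fractions.Fraction",
    "ordered": "collections.OrderedDict",
}


def create(name: str, *args: Any, **kwargs: Any) -> Any:
    """Instantiate the class registered under name (lazy import)."""
    if name not in _REGISTRY:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown name {name!r}. Choose from: {known}")
    module_path, class_name = _REGISTRY[name].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(*args, **kwargs)


half = create("fraction", 1, 2)

try:
    create("numpy")
except ValueError as exc:
    error = str(exc)
```

Keeping the registry as strings means an optional dependency of one provider is only imported when that provider is actually requested.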

@@ -0,0 +1,125 @@
"""
OpenAI-compatible embedding adapter.
Works with both OpenAI (``/v1/embeddings``) and OpenRouter
(``/api/v1/embeddings``) since they share the same API format.
The *provider* parameter determines the default base URL and
API key environment variable.
"""
import time
from typing import Optional, Dict, Any
from markitect.llm.embedding_adapter import EmbeddingAdapter
from markitect.llm.config import resolve_api_key, find_project_root
from markitect.llm._http import post_json
from markitect.llm.exceptions import (
LLMConfigurationError,
LLMAPIError,
LLMRateLimitError,
)
_DEFAULT_MODEL = "text-embedding-3-small"
_PROVIDER_DEFAULTS: Dict[str, Dict[str, str]] = {
"openai": {
"api_base": "https://api.openai.com/v1",
"env_var": "OPENAI_API_KEY",
},
"openrouter": {
"api_base": "https://openrouter.ai/api/v1",
"env_var": "OPENROUTER_API_KEY",
},
}
class OpenAICompatibleEmbeddingAdapter(EmbeddingAdapter):
"""Embedding adapter for OpenAI-compatible endpoints.
A single class handles both OpenAI and OpenRouter because they
expose the same ``/embeddings`` endpoint format.
"""
def __init__(
self,
model: Optional[str] = None,
api_key: Optional[str] = None,
api_base: Optional[str] = None,
provider: str = "openai",
max_retries: int = 3,
):
if provider not in _PROVIDER_DEFAULTS:
known = ", ".join(sorted(_PROVIDER_DEFAULTS))
raise LLMConfigurationError(
f"Unknown embedding provider {provider!r}. Choose from: {known}",
context={"provider": provider},
)
defaults = _PROVIDER_DEFAULTS[provider]
self._model = model or _DEFAULT_MODEL
self._api_base = (api_base or defaults["api_base"]).rstrip("/")
self._max_retries = max_retries
self._provider = provider
# Resolve API key
env_var = defaults["env_var"]
root = find_project_root()
key_file_paths = [root / f"apikey-{provider}.txt"] if root else []
self._api_key = resolve_api_key(
explicit=api_key,
env_var=env_var,
key_file_paths=key_file_paths,
)
def embed(self, texts: list[str]) -> list[list[float]]:
"""Embed texts via the OpenAI-compatible ``/embeddings`` endpoint.
Raises:
LLMConfigurationError: If no API key is configured.
LLMAPIError: On HTTP errors after retries are exhausted.
"""
if not self._api_key:
raise LLMConfigurationError(
"No API key configured for embedding adapter",
context={"provider": self._provider},
)
url = f"{self._api_base}/embeddings"
payload: Dict[str, Any] = {
"model": self._model,
"input": texts,
}
headers = {"Authorization": f"Bearer {self._api_key}"}
data = self._post_with_retries(url, payload, headers)
# Response: {"data": [{"embedding": [...], "index": 0}, ...]}
# Sort by index to guarantee input order.
items = sorted(data["data"], key=lambda d: d["index"])
return [item["embedding"] for item in items]
    def validate(self) -> bool:
        """Return ``True`` if a non-empty API key is available."""
        # Mirror the check in embed(), which also rejects an empty string
        return bool(self._api_key)
def _post_with_retries(
self,
url: str,
payload: Dict[str, Any],
headers: Dict[str, str],
) -> Dict[str, Any]:
last_exc: Optional[Exception] = None
for attempt in range(self._max_retries + 1):
try:
return post_json(url, payload, headers)
except LLMRateLimitError as exc:
last_exc = exc
if attempt < self._max_retries:
time.sleep(2 ** attempt)
except LLMAPIError as exc:
if exc.status_code >= 500 and attempt < self._max_retries:
last_exc = exc
time.sleep(2 ** attempt)
else:
raise
raise last_exc # type: ignore[misc]
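
The retry loop above waits 1, 2, 4, ... seconds between attempts (`2 ** attempt`). The sketch below isolates that backoff schedule. It is a simplified illustration, not the adapter's code: `TransientError` is a hypothetical stand-in for rate-limit and 5xx errors, and the `sleep` parameter is added so the delays can be recorded instead of actually waited for.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or 5xx error."""


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(2 ** attempt)  # 1, 2, 4, ... seconds
    raise AssertionError("unreachable when max_retries >= 0")


delays: List[float] = []
calls = {"n": 0}


def flaky() -> str:
    """Fail twice, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("try again")
    return "ok"


result = call_with_retries(flaky, sleep=delays.append)
```

With the default `max_retries=3`, a call that never succeeds is attempted four times and waits 1 + 2 + 4 = 7 seconds in total before the error is raised.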

@@ -0,0 +1,64 @@
"""
Pure-Python vector similarity utilities.
No external dependencies — uses :mod:`math` only. Sufficient for the
current entity scale (~100s). numpy can be substituted later if needed.
"""
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors.

    Returns a float in [-1, 1]. Returns 0.0 if either vector has
    zero magnitude (to avoid division by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    if mag_a == 0.0 or mag_b == 0.0:
        return 0.0
    return dot / (mag_a * mag_b)


def similarity_matrix(embeddings: list[list[float]]) -> list[list[float]]:
    """Build an NxN cosine similarity matrix.

    ``matrix[i][j]`` is the cosine similarity between
    ``embeddings[i]`` and ``embeddings[j]``.
    """
    n = len(embeddings)
    mat: list[list[float]] = [[0.0] * n for _ in range(n)]
    for i in range(n):
        mat[i][i] = 1.0
        for j in range(i + 1, n):
            sim = cosine_similarity(embeddings[i], embeddings[j])
            mat[i][j] = sim
            mat[j][i] = sim
    return mat


def find_similar_pairs(
    embeddings: dict[str, list[float]],
    threshold: float = 0.80,
) -> list[tuple[str, str, float]]:
    """Find all pairs with cosine similarity >= *threshold*.

    Args:
        embeddings: Mapping of slug → embedding vector.
        threshold: Minimum similarity to include (default 0.80).

    Returns:
        List of ``(slug_a, slug_b, similarity)`` tuples sorted by
        similarity descending.
    """
    slugs = sorted(embeddings)
    pairs: list[tuple[str, str, float]] = []
    for i, slug_a in enumerate(slugs):
        for slug_b in slugs[i + 1:]:
            sim = cosine_similarity(embeddings[slug_a], embeddings[slug_b])
            if sim >= threshold:
                pairs.append((slug_a, slug_b, sim))
    pairs.sort(key=lambda t: t[2], reverse=True)
    return pairs


@@ -0,0 +1,168 @@
"""
Batch LLM evaluation orchestrator.
Runs an evaluation prompt against a batch of items (entities, pairs,
etc.), collecting structured results. Handles:
- Incremental evaluation (skip items whose content hasn't changed)
- Progress reporting via callback
- Graceful error handling per item (one failure doesn't stop the batch)
- Aggregate token usage tracking
This is the mechanism by which infospace tooling delegates LLM work
to the platform. The adapter's own retry logic handles transient
API errors (rate limits, 5xx).
"""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

from markitect.prompts.execution.llm_adapter import LLMAdapter
from markitect.prompts.execution.models import LLMResponse, RunConfig


@dataclass
class BatchItem:
    """A single item to evaluate in a batch.

    Attributes:
        key: Unique identifier (e.g. entity slug).
        prompt: The compiled prompt text to send to the LLM.
        content_digest: Hash of the source content, used for
            incremental evaluation (skip if unchanged).
        metadata: Arbitrary pass-through metadata.
    """

    key: str
    prompt: str
    content_digest: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class BatchResult:
    """Result for a single batch item.

    Attributes:
        key: Matches the input :attr:`BatchItem.key`.
        status: One of ``"success"``, ``"error"``, ``"skipped"``.
        response: The LLM response (``None`` if skipped or error).
        error: Error message (``None`` if success or skipped).
        metadata: Pass-through metadata from the input item.
    """

    key: str
    status: str
    response: Optional[LLMResponse] = None
    error: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class BatchSummary:
    """Aggregate results from a batch evaluation run."""

    total: int = 0
    succeeded: int = 0
    failed: int = 0
    skipped: int = 0
    results: List[BatchResult] = field(default_factory=list)
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.total_prompt_tokens + self.total_completion_tokens

    def success_rate(self) -> float:
        """Fraction of non-skipped items that succeeded."""
        attempted = self.total - self.skipped
        if attempted == 0:
            return 1.0
        return self.succeeded / attempted
class BatchEvaluator:
    """Orchestrates LLM evaluation across a batch of items.

    Args:
        adapter: The LLM adapter to use for evaluation.
        config: Run configuration (model, temperature, etc.).
        progress_callback: Optional ``fn(completed, total, result)``
            called after each item is processed.
        previous_digests: Optional ``{key: digest}`` mapping from a
            previous run. Items whose digest matches are skipped.
    """

    def __init__(
        self,
        adapter: LLMAdapter,
        config: Optional[RunConfig] = None,
        progress_callback: Optional[Callable[[int, int, BatchResult], None]] = None,
        previous_digests: Optional[Dict[str, str]] = None,
    ):
        self._adapter = adapter
        self._config = config or RunConfig()
        self._progress_callback = progress_callback
        self._previous_digests = previous_digests or {}

    def evaluate(self, items: List[BatchItem]) -> BatchSummary:
        """Run evaluation for all items and return aggregate results.

        Items whose :attr:`~BatchItem.content_digest` matches an entry
        in *previous_digests* are skipped. All other items are sent to
        the LLM adapter. Errors on individual items are captured
        without aborting the batch.
        """
        summary = BatchSummary(total=len(items))
        for idx, item in enumerate(items):
            result = self._evaluate_one(item)
            summary.results.append(result)
            if result.status == "success":
                summary.succeeded += 1
                usage = result.response.usage if result.response else {}
                summary.total_prompt_tokens += usage.get("prompt_tokens", 0)
                summary.total_completion_tokens += usage.get("completion_tokens", 0)
            elif result.status == "skipped":
                summary.skipped += 1
            else:
                summary.failed += 1
            if self._progress_callback is not None:
                self._progress_callback(idx + 1, len(items), result)
        return summary

    def _evaluate_one(self, item: BatchItem) -> BatchResult:
        """Evaluate a single item, handling skip logic and errors."""
        # Incremental: skip if digest unchanged
        if (
            item.content_digest
            and item.key in self._previous_digests
            and self._previous_digests[item.key] == item.content_digest
        ):
            return BatchResult(
                key=item.key,
                status="skipped",
                metadata=item.metadata,
            )
        try:
            response = self._adapter.execute_prompt(item.prompt, self._config)
            return BatchResult(
                key=item.key,
                status="success",
                response=response,
                metadata=item.metadata,
            )
        except Exception as exc:
            return BatchResult(
                key=item.key,
                status="error",
                error=str(exc),
                metadata=item.metadata,
            )


@@ -33,6 +33,7 @@ development = [
"kaizen-agentic @ file:./capabilities/kaizen-agentic"
]
proxy-pdf = ["pymupdf4llm>=0.0.10"]
analysis = ["networkx>=3.0"]
proxy-html = ["markdownify>=0.13.1"]
proxy-markitdown = ["markitdown-no-magika[pdf]"]
proxy = ["markitdown-no-magika[pdf]"]


@@ -0,0 +1,621 @@
# Viable Infospace Tooling — Roadmap
## Vision
An **infospace** is a structured, evaluable, composable collection of
concepts that explains a **topic** through the lens of one or more
**disciplines**. Infospaces are the unit of knowledge work in MarkiTect.
This roadmap organises the work needed to move from the current
ad-hoc example (`infospace-with-history`) to a general-purpose platform
for creating, evaluating, maintaining, and composing infospaces.
---
## Terminology
These terms establish the vocabulary for infospace tooling. They
generalise from the Wealth of Nations / VSM example but are not
specific to it.
### Infospace
A curated, self-describing collection of **entities** (concepts,
mechanisms, observations) that together explain a **topic**. An
infospace has:
- A **topic** — the subject matter being explained (e.g. "The Wealth
of Nations", "cellular biology", "Kubernetes networking")
- One or more **disciplines** — external frameworks applied as lenses
(e.g. "Viable System Model", "category theory")
- **Entities** — the atomic units of knowledge, each with a definition,
provenance, and quality scores
- **Schemas** — structural templates that define what a well-formed
entity, mapping, or analysis looks like
- **Evaluations** — per-entity and collection-level quality assessments
- **Metrics** — quantitative indicators of completeness, coherence,
consistency, and granularity balance
An infospace is **viable** when it meets threshold scores across its
defined metrics — it is fit for purpose as an explanatory tool.
### Topic
The subject matter an infospace is built to explain. A topic sits
within a **domain** (broader field of knowledge) but is more specific:
- Domain: Economics → Topic: The Wealth of Nations
- Domain: Systems Theory → Topic: Viable System Model
- Domain: Computer Science → Topic: Distributed consensus protocols
A topic provides the **source material** — the texts, data, or
observations from which entities are extracted.
### Discipline
A reusable framework of concepts applied as a lens to explore a topic.
A discipline is itself an infospace — one that has been evaluated as
viable and packaged for reuse.
In our example, the VSM is the discipline: a set of concepts (S1-S5,
recursion, variety, viability) from systems theory, applied to the
economic concepts in Smith's work.
**Key property:** Disciplines compose. An infospace built with one
discipline can itself become a discipline for another infospace. The
Wealth of Nations infospace, viewed through VSM, could become a
discipline applied to a modern supply chain analysis.
### Entity
The atomic unit of an infospace. An entity has:
- **Identity**: a unique slug and human-readable title
- **Definition**: a precise, non-circular explanation
- **Provenance**: the source chapter, passage, and extraction context
- **Domain placement**: which area of the topic it belongs to
- **Discipline mapping**: how it connects to the applied discipline
(e.g. which VSM system)
- **Quality scores**: per-entity LLM-evaluated metrics
- **Lifecycle state**: active, archived (with reason), or draft
### Evaluation
A structured assessment of quality, applied at two levels:
- **Per-entity evaluation**: scores an individual entity against
quality rubrics defined in its schema (definition precision, source
grounding, discipline relevance, etc.)
- **Collection evaluation**: scores the entity set as a whole against
five concerns: redundancy, coverage, coherence, consistency, and
granularity balance
Evaluations are always performed by **delegated LLM calls** through
MarkiTect's LLM integration — never by the coding agent working on
infrastructure. This separation ensures that domain-level judgment
stays in the problem space, not the tooling space.
### Viability
An infospace is viable when:
1. Its entities individually meet quality thresholds (per-entity eval)
2. Its collection metrics are within acceptable ranges
3. It can answer its defined **competency questions** — the canonical
queries the infospace is meant to support
4. It has been evaluated recently enough that metrics reflect current
content
Viability is not binary — it is a profile of scores that the user
sets thresholds for based on their needs.
---
## Architecture: Three Layers
```
┌──────────────────────────────────────────────────┐
│  Layer 3: Infospace Instances                    │
│  Specific infospaces built by users              │
│  (Wealth of Nations + VSM, supply chain + ...)   │
│  Works IN an infospace                           │
├──────────────────────────────────────────────────┤
│  Layer 2: Infospace Tooling                      │
│  Terminology, primitives, composition model      │
│  CLI: infospace create/evaluate/compose/...      │
│  Works WITH infospaces                           │
├──────────────────────────────────────────────────┤
│  Layer 1: MarkiTect Platform                     │
│  Artifacts, prompts, LLM, spaces, graph, embed   │
│  Provides FOR infospaces                         │
└──────────────────────────────────────────────────┘
```
### Boundary condition: LLM delegation
All LLM-based evaluation (entity scoring, pairwise judgments, coverage
analysis) is delegated to MarkiTect's LLM integration module. The coding
agent that works on infrastructure never makes domain-level judgments
itself. This keeps a clean separation:
- **Coding agent** → writes Python, templates, schemas, tests
- **MarkiTect LLM** → evaluates entities, judges redundancy, assesses
coverage, checks consistency
The infospace tooling (Layer 2) orchestrates these LLM calls through
prompt templates and the prompt execution engine, not through ad-hoc
prompting.
---
## Stage 1: MarkiTect Platform Additions
Infrastructure that must exist before infospace tooling can be built.
These are general-purpose platform capabilities, not infospace-specific.
### S1.1 — Entity metadata parser
Add a deterministic markdown parser that extracts structured metadata
from entity files: H1 title, sections present, word counts, domain,
source chapter. Returns a dataclass usable by all downstream metrics.
**Maps to:** INFRA-TASKS #13, #10
**Location:** `markitect/prompts/quality/` or new `markitect/analysis/`
**Depends on:** Nothing — can start immediately
**Deliverable:** `parse_entity_metadata(path) -> EntityMeta` function
with tests
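
A minimal sketch of what such a parser could look like, assuming entity files use a single H1 title and H2 section headings. The `EntityMeta` fields and the `parse_entity_text` helper are illustrative guesses rather than the shipped `parse_entity_metadata` API, and the sketch takes text instead of a path so that it runs standalone.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EntityMeta:
    """Hypothetical metadata record; the field names are assumptions."""
    title: str = ""
    sections: list[str] = field(default_factory=list)
    word_counts: dict[str, int] = field(default_factory=dict)


def parse_entity_text(text: str) -> EntityMeta:
    """Extract the H1 title, H2 section names, and per-section word counts."""
    meta = EntityMeta()
    current = None  # name of the H2 section currently being read, if any
    for line in text.splitlines():
        h1 = re.match(r"^#\s+(.+)$", line)
        h2 = re.match(r"^##\s+(.+)$", line)
        if h1 and not meta.title:
            meta.title = h1.group(1).strip()
        elif h2:
            current = h2.group(1).strip()
            meta.sections.append(current)
            meta.word_counts[current] = 0
        elif current is not None:
            meta.word_counts[current] += len(line.split())
    return meta


sample = (
    "# Division of Labour\n"
    "## Definition\n"
    "Splitting work into specialised tasks.\n"
    "## Source\n"
    "Book 1, Chapter 1.\n"
)
meta = parse_entity_text(sample)
print(meta.title, meta.sections, meta.word_counts)
```

Because the parse is purely line-based and deterministic, the same file always yields the same record, which is what downstream metrics need.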
### S1.2 — Schema compliance validator
Deterministic validation of entity/mapping files against their schemas:
section presence, word count ranges, heading format, enum values. No
LLM needed.
**Maps to:** INFRA-TASKS #10
**Location:** `markitect/prompts/quality/validator.py` (extend existing)
**Depends on:** S1.1
**Deliverable:** `validate_document(path, schema) -> ValidationResult`
with tests
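
The deterministic checks described here can be sketched as follows. This is an illustration under stated assumptions: the schema is given as a plain dict (the real schemas are markdown documents), and `validate_text` and the `ValidationResult` shape are hypothetical names, not the existing validator's API. Only section presence and word-count ranges are covered.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Hypothetical result shape: a list of human-readable problems."""
    errors: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_text(text: str, schema: dict) -> ValidationResult:
    """Check required sections and per-section word-count ranges."""
    # Splitting on H2 headings yields [preamble, name1, body1, name2, body2, ...]
    parts = re.split(r"^##\s+(.+)$", text, flags=re.MULTILINE)
    bodies = {parts[i].strip(): parts[i + 1] for i in range(1, len(parts) - 1, 2)}
    result = ValidationResult()
    for name, rule in schema["sections"].items():
        if name not in bodies:
            result.errors.append(f"missing section: {name}")
            continue
        words = len(bodies[name].split())
        lo, hi = rule.get("min_words", 0), rule.get("max_words", float("inf"))
        if not lo <= words <= hi:
            result.errors.append(f"{name}: {words} words, expected {lo}-{hi}")
    return result


# Assumed schema layout, for illustration only.
schema = {"sections": {"Definition": {"min_words": 3}, "Source": {}, "VSM Mapping": {}}}
doc = "# Title\n## Definition\nfoo bar\n## Source\nBook 1\n"
report = validate_text(doc, schema)
print(report.ok, report.errors)
```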
### S1.3 — Embedding adapter
Add embedding support to `markitect/llm/`. Needs:
- `EmbeddingAdapter` interface: `embed(texts: list[str]) -> list[list[float]]`
- `OpenRouterEmbeddingAdapter` implementation (or OpenAI embedding endpoint)
- Caching layer: store embeddings keyed by `{slug: content_digest}` so
unchanged entities skip re-embedding
- Cosine similarity utility: `similarity_matrix(embeddings) -> np.ndarray`
**Maps to:** INFRA-TASKS #14 (prerequisite)
**Location:** `markitect/llm/embeddings.py`
**Depends on:** Nothing — can start immediately
**Deliverable:** Embedding adapter + cache + similarity computation, with
tests
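
One way the caching layer could work is sketched below. It assumes the digest is a SHA-256 of the entity text and wraps any function matching the `embed(texts)` signature above. `CachedEmbedder`, `content_digest`, and the in-memory dict are hypothetical names and choices; a real cache would presumably persist to disk.

```python
import hashlib
from typing import Callable


def content_digest(text: str) -> str:
    """Stable digest of entity content (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class CachedEmbedder:
    """Wraps an embed function and skips entities whose digest is unchanged."""

    def __init__(self, embed: Callable[[list[str]], list[list[float]]]):
        self._embed = embed
        # slug -> (digest, vector)
        self._cache: dict[str, tuple[str, list[float]]] = {}

    def embed_entities(self, entities: dict[str, str]) -> dict[str, list[float]]:
        """Return {slug: vector}, calling the backend only for changed text."""
        stale = [
            slug for slug, text in entities.items()
            if self._cache.get(slug, ("", []))[0] != content_digest(text)
        ]
        if stale:
            vectors = self._embed([entities[s] for s in stale])
            for slug, vec in zip(stale, vectors):
                self._cache[slug] = (content_digest(entities[slug]), vec)
        return {slug: self._cache[slug][1] for slug in entities}


calls = []  # records each batch sent to the fake backend


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Stand-in for a real embedding API call."""
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]


embedder = CachedEmbedder(fake_embed)
embedder.embed_entities({"a": "x", "b": "yy"})
vectors = embedder.embed_entities({"a": "x", "b": "yyy"})  # only "b" changed
print(calls)
```

On the second call only the changed entity reaches the backend, which is the behaviour the roadmap asks for.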
### S1.4 — Graph analysis utilities
The existing `DependencyGraph` supports basic traversal and cycle
detection. Collection-level metrics need richer analysis:
- Connected components
- Betweenness centrality
- Community detection (Louvain or label propagation)
- Modularity score
- Degree distribution
- Cohesion/coupling computation
Decide: extend `DependencyGraph` or add a lightweight wrapper that
converts to networkx (adding it as an optional dependency).
**Maps to:** INFRA-TASKS #16 (prerequisite)
**Location:** `markitect/prompts/dependencies/analysis.py` or new
`markitect/analysis/graph.py`
**Depends on:** Nothing — can start immediately
**Deliverable:** Graph analysis functions with tests
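
As a dependency-free illustration of two of the listed analyses, the sketch below computes connected components and a degree distribution over an undirected edge list. The function names and the edge-list input are assumptions made for the example; a networkx-based wrapper would expose equivalent results.

```python
from collections import Counter, deque


def connected_components(edges: list[tuple[str, str]], nodes: set[str]) -> list[set[str]]:
    """Group the nodes of an undirected graph into components using BFS."""
    adj: dict[str, set[str]] = {n: set() for n in nodes}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen: set[str] = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            # Visit every neighbour that has not been reached yet.
            for nxt in adj[queue.popleft()] - seen:
                seen.add(nxt)
                comp.add(nxt)
                queue.append(nxt)
        components.append(comp)
    return components


def degree_distribution(edges: list[tuple[str, str]]) -> Counter:
    """Map degree -> number of nodes with that degree (isolated nodes omitted)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


edges = [("labour", "wages"), ("wages", "price"), ("rent", "land")]
nodes = {"labour", "wages", "price", "rent", "land", "money"}
components = connected_components(edges, nodes)
print(len(components), degree_distribution(edges))
```

A component count above one signals that the concept web has split into islands, which is the kind of signal a coherence check would look for.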
### S1.5 — Structured evaluation output
Define a standard format for evaluation results: YAML front-matter +
markdown body. Add utilities for:
- Writing evaluation results (per-entity, per-pair, collection-level)
- Reading/parsing evaluation results back into dataclasses
- Appending timestamped snapshots to a history file
- Diffing two snapshots
**Maps to:** INFRA-TASKS #11, #12
**Location:** `markitect/prompts/quality/` or `markitect/analysis/`
**Depends on:** S1.1
**Deliverable:** `EvaluationResult` model + read/write utilities with
tests
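
Diffing two snapshots can be sketched as below, assuming each snapshot is a flat mapping of metric name to number. That shape and the `diff_snapshots` name are illustrative; the actual `EvaluationResult` model may be richer.

```python
from typing import Optional


def diff_snapshots(
    old: dict[str, float], new: dict[str, float]
) -> dict[str, dict[str, Optional[float]]]:
    """Return per-metric changes between two snapshots.

    Metrics present in only one snapshot are reported with a ``None``
    value on the missing side and a ``None`` delta.
    """
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue  # unchanged metrics are left out of the diff
        delta = after - before if before is not None and after is not None else None
        changes[name] = {"before": before, "after": after, "delta": delta}
    return changes


# Illustrative values, not real results.
earlier = {"coverage_ratio": 0.25, "granularity_entropy": 2.5}
later = {"coverage_ratio": 0.5, "granularity_entropy": 2.5, "redundancy_ratio": 0.0}
changes = diff_snapshots(earlier, later)
print(changes)
```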
### S1.6 — Batch LLM evaluation orchestrator
A pipeline component that runs an evaluation prompt template against a
batch of entities (or entity pairs), collecting structured results.
Must handle:
- Rate limiting and retry (reuse existing adapter logic)
- Progress reporting
- Incremental evaluation (skip entities whose content hasn't changed
since last eval)
- Result aggregation
This is the mechanism by which infospace tooling delegates LLM work
to the platform.
**Maps to:** INFRA-TASKS #9 (prerequisite)
**Location:** `markitect/prompts/execution/batch.py`
**Depends on:** S1.5
**Deliverable:** `BatchEvaluator` class with tests
### S1.7 — FCA computation
Formal Concept Analysis: build a formal context (entity × attribute
matrix), compute the concept lattice, extract gap concepts. Either
implement a minimal FCA algorithm or integrate a library.
**Maps to:** INFRA-TASKS #15 (prerequisite)
**Location:** `markitect/analysis/fca.py`
**Depends on:** S1.1
**Deliverable:** `FormalContext`, `ConceptLattice`, `find_gap_concepts()`
with tests
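
For orientation, a minimal FCA computation might look like the sketch below. It represents the formal context as a mapping of entity to attribute set and enumerates concept intents as the closure of the object intents under intersection, a simple approach that suits small contexts. `concept_lattice` and the example attributes are assumptions. Gap-concept extraction is left out because this roadmap does not define it.

```python
def concept_lattice(context: dict[str, set[str]]) -> list[tuple[frozenset, frozenset]]:
    """Enumerate all formal concepts as (extent, intent) pairs.

    Every intent is either the full attribute set or an intersection
    of one or more object intents; the extent is the set of objects
    that carry every attribute in the intent.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        # Intersect the new object intent with every intent found so far.
        intents |= {frozenset(attrs) & existing for existing in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


# Illustrative context: entities against VSM system and entity kind.
context = {
    "division-of-labour": {"S1", "mechanism"},
    "market-price": {"S2", "mechanism"},
    "natural-price": {"S2"},
}
concepts = concept_lattice(context)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

In this example the concept with the full attribute set as its intent has an empty extent, meaning no entity carries that combination of attributes.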
### Summary: Stage 1 dependency graph
```
S1.1 Entity metadata parser ──┬── S1.2 Schema validator
                              ├── S1.5 Eval output format ── S1.6 Batch evaluator
                              └── S1.7 FCA computation
S1.3 Embedding adapter ──────── (independent)
S1.4 Graph analysis ─────────── (independent)
```
S1.1, S1.3, and S1.4 can proceed in parallel. S1.6 (batch evaluator) is
the final piece needed before Stage 2 can begin.
---
## Stage 2: Infospace Tooling
The user-facing layer that provides documented primitives for working
with infospaces. Built on top of Stage 1 infrastructure and the existing
`markitect/spaces/` module.
### S2.1 — Infospace model and configuration
Define the `Infospace` as a first-class concept that extends the existing
`InformationSpace` with:
- **Topic declaration**: name, domain, source material reference
- **Discipline bindings**: which external infospaces are applied as lenses
- **Schema registry**: which schemas govern entity structure
- **Competency questions**: what the infospace should be able to answer
- **Viability thresholds**: minimum acceptable metric scores
- **Evaluation state**: latest per-entity and collection scores
Configuration format: an `infospace.yaml` file (or a section in the
existing config) that declares all of the above.
**Location:** new `markitect/infospace/` package
**Depends on:** S1.1, S1.5, existing `markitect/spaces/`
**Deliverable:** `InfospaceConfig`, `InfospaceState` models + loader
### S2.2 — Infospace lifecycle commands
CLI commands for the core lifecycle:
```bash
# Initialise a new infospace
markitect infospace init --topic "Wealth of Nations" \
--domain "Economics" \
--discipline vsm-framework
# Show infospace status (entity count, eval state, viability)
markitect infospace status
# List entities with quality summary
markitect infospace entities [--sort-by score|domain|chapter]
# Show viability dashboard
markitect infospace viability
```
These commands read the `infospace.yaml` config and present information
from the metadata index and evaluation results.
**Location:** `markitect/infospace/cli.py` integrated into main CLI
**Depends on:** S2.1
**Deliverable:** CLI commands with help text and tests
### S2.3 — Per-entity evaluation primitives
Prompt templates and CLI commands for evaluating individual entities:
```bash
# Evaluate all entities
markitect infospace evaluate --provider openrouter
# Evaluate entities from a specific chapter
markitect infospace evaluate --chapter book-1-chapter-05 --provider openrouter
# Re-evaluate a single entity
markitect infospace evaluate --entity division-of-labour --provider openrouter
```
Uses the batch evaluator (S1.6) to run the evaluate-entity prompt
template (defined in the infospace's schema directory) against entities.
Writes structured results to `output/evaluations/`.
**Maps to:** INFRA-TASKS #8, #9
**Location:** `markitect/infospace/evaluation.py`
**Depends on:** S1.6, S2.1
**Deliverable:** Per-entity evaluation pipeline + CLI + prompt template
### S2.4 — Collection-level checks
CLI commands for each of the five collection concerns:
```bash
# Run all collection checks
markitect infospace check --provider openrouter
# Run specific checks
markitect infospace check redundancy --provider openrouter
markitect infospace check coverage --provider openrouter
markitect infospace check coherence --provider openrouter
markitect infospace check consistency --provider openrouter
markitect infospace check granularity --provider openrouter
```
Each check uses Stage 1 infrastructure (embeddings, graph analysis, FCA)
and delegates LLM judgment to the platform. Results written to
`output/metrics/` as per-concern reports + unified `metrics.yaml`.
**Maps to:** INFRA-TASKS #14-19
**Location:** `markitect/infospace/checks/` (one module per concern)
**Depends on:** S1.3, S1.4, S1.6, S1.7, S2.1
**Deliverable:** Five check modules + unified orchestrator + CLI
### S2.5 — Metrics history and viability tracking
Track metrics over time. After each evaluation or check run, append a
timestamped snapshot to `metrics-history.yaml`. Provide commands to
review trends:
```bash
# Show metrics history
markitect infospace history
# Compare two snapshots
markitect infospace history diff 2026-02-18 2026-03-01
# Check viability against thresholds
markitect infospace viability
```
Viability is assessed by comparing current metrics to the thresholds
declared in `infospace.yaml`. A simple pass/fail per metric with the
actual value.
**Maps to:** INFRA-TASKS #12
**Location:** `markitect/infospace/history.py`
**Depends on:** S2.4, S1.5
**Deliverable:** History tracking + viability assessment + CLI
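
The pass/fail comparison can be sketched as follows, assuming thresholds take the `{min: ..., max: ...}` form used in the example `infospace.yaml` of S3.1. The function name, the report shape, and the metric values are illustrative assumptions.

```python
from typing import Optional


def check_viability(
    metrics: dict[str, float], thresholds: dict[str, dict]
) -> dict[str, tuple[bool, Optional[float]]]:
    """Return {metric: (passed, actual_value)} for every thresholded metric.

    A metric with no recorded value fails its check.
    """
    report = {}
    for name, bounds in thresholds.items():
        value = metrics.get(name)
        lower = bounds.get("min", float("-inf"))
        upper = bounds.get("max", float("inf"))
        passed = value is not None and lower <= value <= upper
        report[name] = (passed, value)
    return report


thresholds = {
    "redundancy_ratio": {"max": 0.05},
    "coverage_ratio": {"min": 0.60},
    "coherence_components": {"max": 1},
    "consistency_cycles": {"max": 0},
    "granularity_entropy": {"min": 1.0},
}
# Illustrative metric values, not real results.
metrics = {
    "redundancy_ratio": 0.0,
    "coverage_ratio": 0.36,
    "coherence_components": 1,
    "consistency_cycles": 0,
    "granularity_entropy": 2.69,
}
report = check_viability(metrics, thresholds)
for name, (passed, value) in report.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name} = {value}")
```

Reporting the actual value next to each verdict keeps the result a profile of scores and not just a single yes/no.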
### S2.6 — Infospace composition model
The mechanism by which one infospace is applied as a discipline to
another. Builds on `markitect/spaces/composability/`:
- **Discipline binding**: declare that infospace A uses infospace B as a
discipline. B's entities become available as mapping targets.
- **Cross-infospace references**: entity in A maps to concept in B using
the same mapping schema and evaluation pipeline.
- **Discipline viability requirement**: B must be viable (meets its own
thresholds) before it can be used as a discipline for A.
- **Cascading evaluation**: when B's entities change, A's mappings that
reference them are flagged for re-evaluation.
```bash
# Bind a discipline to the current infospace
markitect infospace bind-discipline ./path/to/vsm-infospace
# List bound disciplines and their viability
markitect infospace disciplines
# Check for stale mappings after discipline update
markitect infospace check stale-mappings
```
**Location:** `markitect/infospace/composition.py`
**Depends on:** S2.1, existing `markitect/spaces/composability/`
**Deliverable:** Composition model + CLI + documentation
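
Stale-mapping detection could work along the lines of the sketch below. It assumes each mapping records the digest of its target discipline entity at the time the mapping was made. The record layout (`entity`, `target`, `target_digest`) and the function names are hypothetical.

```python
import hashlib


def digest(text: str) -> str:
    """Content digest of a discipline entity (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale_mappings(mappings: list[dict], discipline_entities: dict[str, str]) -> list[str]:
    """Return the entities whose mapping target has changed or disappeared."""
    stale = []
    for mapping in mappings:
        current = discipline_entities.get(mapping["target"])
        if current is None or digest(current) != mapping["target_digest"]:
            stale.append(mapping["entity"])
    return stale


# Discipline B as it was when infospace A created its mappings.
original = {"system-1": "Operational units.", "system-2": "Coordination."}
mappings = [
    {"entity": "division-of-labour", "target": "system-1",
     "target_digest": digest(original["system-1"])},
    {"entity": "market-price", "target": "system-2",
     "target_digest": digest(original["system-2"])},
]
# B is later revised: one definition changes.
updated = {**original, "system-2": "Coordination and damping of oscillation."}
stale = find_stale_mappings(mappings, updated)
print(stale)
```

Only the flagged mappings would then need to go back through the evaluation pipeline.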
### S2.7 — Documentation: Infospace Primitives Reference
A reference document explaining all primitives, their purpose, and how
they compose. This is the user-facing documentation for the infospace
tooling layer — the equivalent of a framework guide.
**Location:** `docs/infospace-primitives.md` or in-CLI help
**Depends on:** S2.1-S2.6
**Deliverable:** Reference documentation
### Summary: Stage 2 dependency graph
```
S2.1 Model & config ──┬── S2.2 Lifecycle CLI
                      ├── S2.3 Per-entity evaluation
                      ├── S2.4 Collection checks ── S2.5 History & viability
                      └── S2.6 Composition model
S2.7 Documentation (depends on all above)
```
---
## Stage 3: Example Revision
Revisit the Wealth of Nations / VSM example using the new tooling.
The example becomes both a tutorial and a validation of the tooling.
### S3.1 — Migrate example to infospace configuration
Replace the ad-hoc `process_chapters.py` setup with a declarative
`infospace.yaml`:
```yaml
topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/
disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/
schemas:
  entity: schemas/economic-entity-schema-v1.0.md
  mapping: schemas/vsm-mapping-schema-v1.0.md
  analysis: schemas/chapter-analysis-schema-v1.0.md
competency_questions: schemas/competency-questions.md
viability:
  redundancy_ratio: { max: 0.05 }
  coverage_ratio: { min: 0.60 }
  coherence_components: { max: 1 }
  consistency_cycles: { max: 0 }
  granularity_entropy: { min: 1.0 }
  per_entity_mean: { min: 3.5 }
pipeline:
  stages:
    - template: extract-entities
      spaces: [sources, guidelines, vsm-reference, entities]
    - template: map-to-vsm
      spaces: [entities, vsm-reference, guidelines]
    - template: synthesize-analysis
      spaces: [sources, entities, mappings, vsm-reference]
  post_batch:
    - template: assess-metrics
      spaces: [analyses, vsm-reference]
```
**Depends on:** S2.1
**Deliverable:** `infospace.yaml` + migration of `process_chapters.py` to
use infospace tooling APIs
### S3.2 — Clean per-chapter git history
Re-run all processed chapters (and remaining ones) with per-chapter
commits on a clean branch, then replace the current tangled history.
**Maps to:** INFRA-TASKS #4, #7
**Depends on:** S3.1
**Deliverable:** Clean branch with one commit per chapter
### S3.3 — Full evaluation run
Run all per-entity evaluations and collection checks on the completed
infospace. Establish baseline metrics. Demonstrate the viability
dashboard.
**Maps to:** INFRA-TASKS #6
**Depends on:** S2.3, S2.4, S2.5, S3.2
**Deliverable:** Complete evaluation results + viability report
### S3.4 — Rewrite tutorial
Update `TUTORIAL.md` to use infospace tooling commands instead of
raw `process_chapters.py` invocations. The tutorial should walk
through:
1. Initialising an infospace (`markitect infospace init`)
2. Defining schemas and competency questions
3. Processing chapters (pipeline execution)
4. Evaluating entities (`markitect infospace evaluate`)
5. Running collection checks (`markitect infospace check`)
6. Reviewing viability (`markitect infospace viability`)
7. Iterating: refining guidelines, re-processing, re-evaluating
8. Using the infospace as a discipline for a new project
**Depends on:** S3.1-S3.3
**Deliverable:** Revised `TUTORIAL.md`
### S3.5 — Demonstrate composition
Create a minimal second infospace (e.g. a modern supply chain case
study or a different economic text) that binds the Wealth of Nations
infospace as a discipline. Demonstrates the composition model from S2.6.
**Depends on:** S2.6, S3.3
**Deliverable:** Second example infospace + composition tutorial section
---
## Task Mapping
Cross-reference between INFRA-TASKS numbers and roadmap stages:
| INFRA-TASK | Description | Stage |
|------------|-------------|-------|
| 1-3 | Infra fixes (resolved) | — |
| 4 | Per-chapter git history | S3.2 |
| 5 | Prompt file side-effects | S1.6 (batch eval avoids this) |
| 6 | Stale metrics | S3.3 |
| 7 | Remaining 28 chapters | S3.2 |
| 8 | Per-concept quality metrics in schema | S2.3 |
| 9 | Evaluate-entity prompt template | S2.3 |
| 10 | Deterministic schema compliance | S1.2 |
| 11 | Structured metrics output | S1.5 |
| 12 | Metrics-over-time tracking | S2.5 |
| 13 | Entity metadata index | S1.1 |
| 14 | Redundancy detection (C1) | S2.4 |
| 15 | Coverage completeness (C2) | S2.4 |
| 16 | Structural coherence (C3) | S2.4 |
| 17 | Definitional consistency (C4) | S2.4 |
| 18 | Granularity balance (C5) | S2.4 |
| 19 | Unified collection evaluation | S2.4 |
---
## Implementation Order
Recommended sequence, accounting for dependencies and value delivery:
**Phase A — Foundation (Stage 1, parallelisable)**
1. S1.1 Entity metadata parser
2. S1.3 Embedding adapter
3. S1.4 Graph analysis utilities
**Phase B — Validation & Output (Stage 1)**
4. S1.2 Schema compliance validator (needs S1.1)
5. S1.5 Structured evaluation output (needs S1.1)
6. S1.7 FCA computation (needs S1.1)
**Phase C — Orchestration (Stage 1 → Stage 2 bridge)**
7. S1.6 Batch LLM evaluation orchestrator (needs S1.5)
**Phase D — Infospace Core (Stage 2)**
8. S2.1 Infospace model and configuration
9. S2.2 Lifecycle commands
10. S2.3 Per-entity evaluation primitives (needs S1.6, S2.1)
**Phase E — Collection Intelligence (Stage 2)**
11. S2.4 Collection-level checks (needs S1.3, S1.4, S1.7, S2.1)
12. S2.5 Metrics history and viability tracking
**Phase F — Composition (Stage 2)**
13. S2.6 Infospace composition model
14. S2.7 Documentation
**Phase G — Example (Stage 3)**
15. S3.1 Migrate example to infospace config
16. S3.2 Clean per-chapter history
17. S3.3 Full evaluation run
18. S3.4 Rewrite tutorial
19. S3.5 Demonstrate composition


@@ -0,0 +1,381 @@
# Viable Information Spaces
*A preliminary introduction to the concepts, structure, and purpose of
viable information spaces as a framework for structured knowledge work.*
---
## What is an Information Space?
An information space is a curated collection of concepts — each precisely
defined, grounded in source material, and connected to the others — that
together explain a topic. It is not a database, not a knowledge graph in
the technical sense, and not a document collection. It is closer to what
a domain expert carries in their head: a working vocabulary of ideas,
their relationships, and the judgment to know which idea applies where.
The difference is that an information space makes this vocabulary
**explicit, evaluable, and composable**. Every concept has a written
definition. Every relationship can be traced. The quality of the whole
collection can be measured and improved over time.
We use the term **infospace** as shorthand.
---
## Why "Viable"?
The word comes from Stafford Beer's Viable System Model, but the idea
generalises beyond it. A viable system is one that can maintain a
separate existence — it is complete enough to function, coherent enough
to hold together, and adaptive enough to improve when circumstances
change.
A **viable infospace** has the same properties:
- **Complete enough** — it covers the topic well enough to answer the
questions it was built to answer. Not every detail, but every concept
that matters.
- **Coherent enough** — its concepts connect into an explanatory web,
not a disconnected list. You can trace how one idea leads to another.
- **Consistent enough** — concepts don't contradict each other. Terms
are used the same way throughout. Definitions don't go in circles.
- **Balanced enough** — concepts operate at comparable levels of
abstraction. The infospace doesn't mix foundational theories with
trivial observations without acknowledging the difference.
- **Non-redundant enough** — each concept earns its place. Two concepts
that mean the same thing should be one concept.
None of these are absolute. "Enough" is defined by the purpose. An
infospace built for teaching needs different coverage than one built for
research. Viability is a profile of scores against thresholds that the
user sets.
---
## The Anatomy of an Infospace
### Topic
Every infospace is built to explain something specific. The **topic** is
the subject matter: a text, a system, a body of knowledge, a problem
domain. In our first example, the topic is Adam Smith's *The Wealth of
Nations* — the economic ideas contained in that specific work.
A topic sits within a broader **domain** (economics, biology, software
engineering) but is more focused. The domain provides context; the topic
provides the source material from which concepts are extracted.
### Entities
The atomic units of an infospace are its **entities** — the individual
concepts, mechanisms, and observations that constitute its vocabulary.
Each entity has:
- A **name** and unique identifier
- A **definition** — precise, non-circular, distinguishable from
neighbouring concepts
- **Provenance** — where it came from (which chapter, passage, or data
source)
- A **domain placement** — which area of the topic it belongs to
- **Quality scores** — how well it is defined, grounded, and connected
Entities are stored as individual files, one concept per file. This makes
them independently addressable, diffable, and composable.
### Schemas
**Schemas** define what a well-formed entity looks like: which sections
it must have, what validation rules apply, what quality metrics are
evaluated. A schema is not code — it is a markdown document that both
humans and LLMs read as instructions.
Schemas serve two purposes:
1. **Structural** — they tell the extraction pipeline what to produce
(required sections, word count ranges, heading formats)
2. **Evaluative** — they define quality rubrics against which each entity
is scored (definition precision, source grounding, explanatory value)
By changing a schema, you change what the infospace considers "good"
without changing any infrastructure.
### Disciplines
Here is where things get interesting. An infospace doesn't just catalogue
what's in the source material — it looks at the source through a
**lens**. We call this lens a **discipline**: a structured framework of
concepts from another domain, applied to illuminate the topic at hand.
In our example, the discipline is Stafford Beer's Viable System Model —
a set of concepts from systems theory (System 1 through System 5,
recursion, variety, viability) applied to the economic ideas in Smith's
work. The VSM provides the analytical structure; Smith provides the raw
material.
The key insight: **a discipline is itself an infospace.** The VSM
concepts (S1-S5, recursion, variety, algedonic signals) form their own
curated, evaluable collection of ideas. To use the VSM as a discipline,
it must first be a viable infospace in its own right — its concepts must
be well-defined, coherent, and complete.
This leads to a recursive property: infospaces can be built on top of
other infospaces. The Wealth of Nations infospace, viewed through the
VSM lens, could itself become a discipline applied to analyse a modern
supply chain. Each layer adds structure without losing the detail
beneath it.
---
## How Infospaces Are Built
Building an infospace is an incremental process with four repeating
phases:
### 1. Extract
Source material is processed one unit at a time (a chapter, a document,
a dataset). For each unit, an LLM extracts entities according to the
schemas and guidelines. Entities that already exist are recognised and
skipped — the infospace grows by accumulation, not duplication.
### 2. Map

Extracted entities are mapped to the discipline. In our example, each
economic concept is mapped to a VSM system with a strength rating and
rationale. This is where the discipline lens does its work: it forces
the question "what role does this concept play in the larger system?"

### 3. Evaluate

After extraction and mapping, the infospace is evaluated at two levels:

- **Per-entity**: each concept is scored against quality rubrics. Is the
  definition precise? Is it grounded in the source? Does it connect
  meaningfully to the discipline?
- **Collection-level**: the set of concepts is assessed for redundancy,
  coverage, coherence, consistency, and granularity balance.

Evaluation produces structured, machine-readable scores — not prose
narratives. These scores are tracked over time.

### 4. Refine

Evaluation reveals what needs improvement. Redundant concepts are merged
or archived. Coverage gaps are addressed by re-extracting with improved
guidelines. Inconsistencies are resolved by clarifying definitions.
Guidelines and schemas are updated. The cycle repeats.

This loop — extract, map, evaluate, refine — is the heartbeat of a
viable infospace. Each iteration makes the infospace more viable:
more complete, more coherent, more consistent.

---

## How Infospaces Are Evaluated
Quality is assessed through two complementary mechanisms:

### LLM Evaluation

A language model reads an entity (or a pair of entities) and judges it
against defined rubrics. This captures qualitative aspects that can't be
computed mechanically: Is this definition actually precise? Does this
mapping rationale make sense? Are these two concepts really different?

LLM evaluation is always **delegated** — it runs through prompt templates
and the platform's LLM integration, never through the human or agent
working on infrastructure. This separation keeps domain judgment in the
problem space.

### Deterministic Aggregation

Structured scores from LLM evaluation, plus metrics computed directly
from files (section counts, word lengths, graph properties, similarity
matrices), are aggregated into collection-level indicators. These are
numbers that can be tracked, diffed, and plotted:

- **Redundancy ratio** — what fraction of concepts substantially overlap
- **Coverage ratio** — what fraction of the domain-discipline matrix is
  populated
- **Graph density** — how connected the concept web is
- **Cycle count** — how many circular definition chains exist
- **Granularity entropy** — how balanced the abstraction levels are

These indicators, compared against user-defined thresholds, determine
whether the infospace is **viable** for its intended purpose.
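Comparing indicators against thresholds could look roughly like the sketch below. The threshold names, their direction (`"min"` or `"max"`), the numeric values, and the function `check_viability` are all illustrative assumptions, not the tooling's actual configuration format.

```python
# Hypothetical thresholds: ("min", x) means the value must be >= x,
# ("max", x) means the value must be <= x.
THRESHOLDS = {
    "coverage_ratio": ("min", 0.50),
    "redundancy_ratio": ("max", 0.10),
    "cycle_count": ("max", 0),
}


def check_viability(metrics, thresholds):
    """Return {indicator: (value, kind, limit)} for every failed threshold."""
    failures = {}
    for name, (kind, limit) in thresholds.items():
        value = metrics[name]
        ok = value >= limit if kind == "min" else value <= limit
        if not ok:
            failures[name] = (value, kind, limit)
    return failures


# Illustrative metric values for a partially populated infospace.
metrics = {"coverage_ratio": 0.361, "redundancy_ratio": 0.0, "cycle_count": 0}
failures = check_viability(metrics, THRESHOLDS)
viable = not failures  # viable only if no threshold is violated
```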

---

## Five Concerns of Collection Quality

Individual concept quality (is this definition good?) is necessary but
not sufficient. An infospace made of individually excellent concepts can
still fail as a collection. Five concerns capture what can go wrong:

### Redundancy

Do two concepts mean the same thing? Overlap wastes the reader's
attention and creates ambiguity about which concept to use. Redundancy is
detected through embedding similarity (are the definitions close in
meaning?) confirmed by LLM judgment (are they genuinely the same
concept, or merely related?).
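The embedding-similarity half of this check can be sketched with plain cosine similarity. The function names, the 0.9 threshold, the toy vectors, and the definition of the ratio (the share of entities appearing in at least one similar pair) are assumptions for illustration; the LLM confirmation step is not modelled.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_pairs(embeddings, threshold=0.9):
    """Return (slug_a, slug_b, similarity) for pairs at or above threshold."""
    pairs = []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine(embeddings[a], embeddings[b])
        if sim >= threshold:
            pairs.append((a, b, sim))
    return pairs


# Toy 3-dimensional vectors; real ones would come from an embedding model.
embeddings = {
    "wage": [1.0, 0.0, 0.0],
    "wages-of-labour": [0.99, 0.1, 0.0],
    "rent": [0.0, 0.0, 1.0],
}
pairs = similar_pairs(embeddings)
involved = {slug for a, b, _ in pairs for slug in (a, b)}
redundancy_ratio = len(involved) / len(embeddings)
```

The pairs returned here are only candidates. Whether `wage` and `wages-of-labour` are the same concept or merely related is the judgment the text assigns to the LLM.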
### Coverage

Does the concept set cover the domain? Are there areas of the topic that
have no corresponding concepts? Coverage is assessed structurally (which
cells in the domain-discipline matrix are empty?) and functionally (can
the infospace answer the questions it was built to answer?).
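The structural half of the coverage check amounts to a cross-tabulation. In this sketch the entity dicts, the `domain` and `system` keys, and the function `coverage` are hypothetical; the functional half (competency questions) is not covered.

```python
from itertools import product


def coverage(entities, domains, systems):
    """Return (coverage_ratio, empty_cells) for the domain x system matrix."""
    filled = {(e["domain"], e["system"]) for e in entities}
    cells = list(product(domains, systems))
    empty = [cell for cell in cells if cell not in filled]
    ratio = (len(cells) - len(empty)) / len(cells) if cells else 0.0
    return ratio, empty


entities = [
    {"slug": "division-of-labour", "domain": "Production", "system": "S1"},
    {"slug": "market-extent", "domain": "Exchange", "system": "S4"},
    {"slug": "wage-determination", "domain": "Distribution", "system": "S3"},
]
ratio, empty = coverage(
    entities,
    domains=["Production", "Exchange", "Distribution"],
    systems=["S1", "S3", "S4"],
)
```

Three of the nine cells are populated, so the ratio is one third, and cells such as `("Exchange", "S1")` are reported as gaps to investigate.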
### Coherence

Do the concepts form a connected web of explanations, or a fragmented
list of isolated ideas? Coherence is measured through graph analysis:
connected components (is everything reachable?), modularity (are there
meaningful clusters?), and bridge concepts (which ideas connect different
areas?).
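The "is everything reachable?" question reduces to counting connected components. The sketch below uses only the standard library and treats links as undirected; the function name and the input shape (node names plus edge pairs) are assumptions, and modularity and bridge detection are left out.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return components as sets, largest first (edges treated as undirected)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


nodes = {"A", "B", "C", "X", "Y", "lonely"}
edges = [("A", "B"), ("B", "C"), ("X", "Y")]
components = connected_components(nodes, edges)
```

A single component means the concept web is fully connected; here the graph splits into three, with `lonely` as an isolated concept.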
### Consistency

Are concepts defined in terms of each other without contradiction? Are
there circular definition chains? Do definitions use terms that should
be concepts but aren't? Consistency is checked through dependency graph
analysis (cycles, undefined terms) and LLM pairwise judgment
(do related definitions contradict each other?).
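The two graph-based checks (cycles and undefined terms) can be sketched with a depth-first search. The `deps` mapping, from each concept to the concepts its definition uses, and both function names are hypothetical; pairwise contradiction checks by an LLM are out of scope here.

```python
def find_cycle(deps):
    """Return one circular definition chain as a list of names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for used in deps.get(node, ()):
            state = colour.get(used, WHITE)
            if state == GREY:  # back edge: `used` is already on the path
                return path[path.index(used):] + [used]
            if state == WHITE:
                found = visit(used)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(deps):
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def undefined_terms(deps):
    """Terms used in definitions that are not themselves defined concepts."""
    used = {term for terms in deps.values() for term in terms}
    return used - set(deps)


deps = {
    "value": ["labour"],
    "labour": ["wages"],
    "wages": ["value"],  # closes the loop value -> labour -> wages -> value
    "rent": ["land"],    # "land" is used but never defined
}
cycle = find_cycle(deps)
missing = undefined_terms(deps)
```

The recursive search is fine for collections of this size. A very deep dependency chain would need an iterative version to stay within Python's recursion limit.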
### Granularity Balance

Are concepts at comparable levels of abstraction? An infospace that mixes
broad theoretical principles with narrow observations — without
acknowledging the difference — confuses more than it explains. Balance
is assessed by classifying each concept's abstraction level and measuring
the distribution.
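One way to measure the distribution is Shannon entropy over the abstraction-level labels. The level names, the base-2 logarithm, and the function name are assumptions made for this sketch; the source does not say which formula the tooling uses.

```python
import math
from collections import Counter


def granularity_entropy(levels):
    """Shannon entropy (in bits) of the abstraction-level distribution."""
    counts = Counter(levels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical level labels, one per concept.
balanced = ["principle", "mechanism", "observation", "example"]
skewed = ["principle", "principle", "principle", "principle"]

balanced_entropy = granularity_entropy(balanced)  # four equally used levels
skewed_entropy = granularity_entropy(skewed)      # a single level only
```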

---

## Infospaces as Organisms

The biological metaphor is deliberate. A viable organism maintains its
identity while exchanging material with its environment. It has internal
coherence (its parts work together), boundary integrity (it is
distinguishable from its surroundings), and adaptive capacity (it
responds to change).

Infospaces exhibit the same properties:

- **Internal coherence** — concepts connect and support each other
- **Boundary** — the topic and discipline define what belongs and what
  doesn't
- **Adaptation** — evaluation and refinement allow the infospace to
  improve

And like organisms, infospaces don't exist in isolation.

### Hierarchical Composition

One infospace can serve as a discipline for another. The VSM infospace
provides the lens for the Wealth of Nations infospace, which could
provide the lens for a supply chain infospace. Each layer adds structure
and interpretive power. This is analogous to biological organisation:
cells compose into tissues, tissues into organs, organs into organisms.

For this to work, the lower-level infospace must be viable — you can't
build reliable analysis on a shaky foundation. A discipline that is
incomplete or inconsistent will produce unreliable mappings.

### Network Composition

Infospaces can also relate laterally. Two infospaces at the same level
might share concepts, reference each other's entities, or provide
complementary views of overlapping domains. A Wealth of Nations infospace
and a Marx's Capital infospace might share economic entities while
differing in their analytical discipline.

This networked structure mirrors how knowledge actually works: fields
overlap, vocabularies are shared and contested, and understanding grows
by connecting islands of well-organised thought.

### Swarm Behaviour

When many infospaces exist and interact, emergent properties appear.
Common entities across many infospaces become well-tested through
repeated evaluation in different contexts. Concepts that survive across
multiple disciplines are more likely to be fundamental. Gaps visible from
one perspective may be filled by insights from another.

This is speculative territory for now, but the tooling should be designed
with it in mind: infospaces as first-class, composable, addressable
units of knowledge.

---

## The Role of Tooling
An infospace is a living artefact that requires ongoing maintenance. The
tooling must support every phase of the lifecycle:

### Creating an infospace

Declaring a topic, binding disciplines, defining schemas and competency
questions, setting viability thresholds. This should be a single
configuration step, not a programming exercise.

### Populating an infospace

Processing source material through the extract-map pipeline, one unit at
a time. Progress is tracked. Each addition is committed to version
history.

### Evaluating an infospace

Running per-entity and collection-level checks. Producing structured,
machine-readable scores. Comparing against viability thresholds.
Identifying specific issues (this entity is redundant, this domain gap
needs filling, these definitions contradict).

### Refining an infospace

Acting on evaluation results: archiving redundant entities, re-extracting
with improved guidelines, updating schemas, re-evaluating. Every change
is traceable.

### Composing infospaces

Binding one infospace as a discipline for another. Checking that the
discipline is viable. Propagating changes when the discipline's concepts
are updated.

### Monitoring an infospace

Tracking metrics over time. Seeing how coverage, coherence, and
consistency evolve as content is added. Detecting regressions when a
re-extraction reduces quality.

The tooling should present these operations as simple, well-documented
commands — not as infrastructure details. The user thinks in terms of
"evaluate my infospace" and "check for redundancy", not in terms of
embedding vectors and graph algorithms.

---

## Where We Are
We have built the first example infospace: 85 economic entities from
Adam Smith's *The Wealth of Nations*, mapped to Beer's Viable System
Model, with schemas, prompt templates, and a chapter-by-chapter
pipeline.

This example has taught us what works (incremental extraction,
deduplication, flat canonical entity sets, transclusion views) and what's
missing (per-concept evaluation, collection-level checks, composition
model, clean tooling commands).

The work ahead is to generalise from this example: build the platform
capabilities needed, create the tooling layer that makes infospace
operations accessible, and then revisit the example as both a validation
and a tutorial.

The goal is that anyone with a body of source material and an analytical
framework can create a viable infospace — and that infospaces, once
built, become reusable intellectual tools for future work.


@@ -0,0 +1,313 @@
"""Tests for markitect.analysis.fca."""

import pytest

from markitect.analysis.fca import (
    FormalContext,
    FormalConcept,
    ConceptLattice,
    find_gap_concepts,
    find_empty_cells,
)


# ── Test data ────────────────────────────────────────────────────────


def _animal_context():
    """Classic FCA example: animals × properties.

    Context:
        | animal    | legs | wings | feathers | fur |
        |-----------|------|-------|----------|-----|
        | dog       |  x   |       |          |  x  |
        | cat       |  x   |       |          |  x  |
        | eagle     |  x   |   x   |    x     |     |
        | sparrow   |  x   |   x   |    x     |     |
        | penguin   |  x   |       |    x     |     |
    """
    return FormalContext(
        objects=["dog", "cat", "eagle", "sparrow", "penguin"],
        attributes=["legs", "wings", "feathers", "fur"],
        incidence={
            "dog": {"legs", "fur"},
            "cat": {"legs", "fur"},
            "eagle": {"legs", "wings", "feathers"},
            "sparrow": {"legs", "wings", "feathers"},
            "penguin": {"legs", "feathers"},
        },
    )


def _infospace_context():
    """Simplified infospace-style context: entities × {domain, vsm_system}.

    Entities with domain and VSM classification, including a gap:
    no entity has both domain:Exchange and vsm:S3.
    """
    return FormalContext.from_dict({
        "division-of-labour": {"domain:Production", "vsm:S1"},
        "pin-factory": {"domain:Production", "vsm:S1"},
        "market-extent": {"domain:Exchange", "vsm:S4"},
        "wage-determination": {"domain:Distribution", "vsm:S3"},
        "rent-theory": {"domain:Distribution", "vsm:S5"},
        "capital-accumulation": {"domain:Production", "vsm:S3"},
    })


def _empty_context():
    """Context with no objects."""
    return FormalContext([], ["a", "b"], {})


def _single_entity():
    """Context with one object."""
    return FormalContext(["only"], ["x", "y"], {"only": {"x", "y"}})
# ── FormalContext ────────────────────────────────────────────────────


class TestFormalContext:
    def test_objects_sorted(self):
        ctx = _animal_context()
        assert ctx.objects == sorted(ctx.objects)

    def test_attributes_sorted(self):
        ctx = _animal_context()
        assert ctx.attributes == sorted(ctx.attributes)

    def test_object_count(self):
        assert _animal_context().object_count == 5

    def test_attribute_count(self):
        assert _animal_context().attribute_count == 4

    def test_extent_single_attr(self):
        ctx = _animal_context()
        assert ctx.extent(["fur"]) == frozenset({"dog", "cat"})

    def test_extent_multiple_attrs(self):
        ctx = _animal_context()
        assert ctx.extent(["wings", "feathers"]) == frozenset({"eagle", "sparrow"})

    def test_extent_empty_returns_all(self):
        ctx = _animal_context()
        assert ctx.extent([]) == frozenset(ctx.objects)

    def test_extent_no_match(self):
        ctx = _animal_context()
        assert ctx.extent(["fur", "feathers"]) == frozenset()

    def test_intent_single_obj(self):
        ctx = _animal_context()
        assert ctx.intent(["penguin"]) == frozenset({"legs", "feathers"})

    def test_intent_multiple_objs(self):
        ctx = _animal_context()
        # dog and cat share: legs, fur
        assert ctx.intent(["dog", "cat"]) == frozenset({"legs", "fur"})

    def test_intent_empty_returns_all(self):
        ctx = _animal_context()
        assert ctx.intent([]) == frozenset(ctx.attributes)

    def test_closure_is_idempotent(self):
        ctx = _animal_context()
        c1 = ctx.closure({"fur"})
        c2 = ctx.closure(c1)
        assert c1 == c2

    def test_closure_expands(self):
        ctx = _animal_context()
        # fur → {dog, cat} → {legs, fur} (both have legs too)
        assert ctx.closure({"fur"}) == frozenset({"legs", "fur"})

    def test_has_attribute(self):
        ctx = _animal_context()
        assert ctx.has_attribute("dog", "legs") is True
        assert ctx.has_attribute("dog", "wings") is False

    def test_density(self):
        ctx = _animal_context()
        # 5 objects × 4 attributes = 20 cells
        # dog:2, cat:2, eagle:3, sparrow:3, penguin:2 = 12 filled
        assert ctx.density() == pytest.approx(12 / 20)

    def test_density_empty(self):
        assert FormalContext([], [], {}).density() == 0.0

    def test_from_dict(self):
        ctx = FormalContext.from_dict({
            "a": {"x", "y"},
            "b": {"y", "z"},
        })
        assert ctx.object_count == 2
        assert ctx.attribute_count == 3

    def test_unknown_attributes_ignored(self):
        ctx = FormalContext(
            ["a"], ["x"], {"a": {"x", "unknown"}}
        )
        assert ctx.intent(["a"]) == frozenset({"x"})
# ── ConceptLattice ──────────────────────────────────────────────────


class TestConceptLattice:
    def test_animal_concept_count(self):
        ctx = _animal_context()
        lattice = ConceptLattice.from_context(ctx)
        # In standard FCA the animal context has 5 formal concepts:
        # Top: ({all}, {legs}), Bottom: ({}, {all 4}), plus three
        # intermediate concepts with intents {legs, fur},
        # {legs, feathers} and {legs, wings, feathers}.
        assert lattice.size >= 5
    def test_top_has_all_objects(self):
        ctx = _animal_context()
        lattice = ConceptLattice.from_context(ctx)
        top = lattice.top
        assert top is not None
        assert top.extent == frozenset(ctx.objects)

    def test_top_intent_is_common_attributes(self):
        ctx = _animal_context()
        lattice = ConceptLattice.from_context(ctx)
        top = lattice.top
        # All animals have "legs"
        assert "legs" in top.intent

    def test_bottom_has_all_attributes(self):
        ctx = _animal_context()
        lattice = ConceptLattice.from_context(ctx)
        bottom = lattice.bottom
        assert bottom is not None
        assert bottom.intent == frozenset(ctx.attributes)

    def test_bottom_extent_empty_when_no_universal_object(self):
        ctx = _animal_context()
        lattice = ConceptLattice.from_context(ctx)
        bottom = lattice.bottom
        # No animal has all 4 attributes
        assert bottom.extent_size == 0

    def test_all_concepts_are_closed(self):
        ctx = _animal_context()
        lattice = ConceptLattice.from_context(ctx)
        for concept in lattice.concepts:
            # intent should be closed: closure(intent) == intent
            assert ctx.closure(concept.intent) == concept.intent
            # extent' should equal intent
            assert ctx.intent(concept.extent) == concept.intent
            # intent' should equal extent
            assert ctx.extent(concept.intent) == concept.extent

    def test_empty_context(self):
        ctx = _empty_context()
        lattice = ConceptLattice.from_context(ctx)
        # Empty context → gap concepts for all attribute combinations
        assert lattice.size >= 1

    def test_single_entity(self):
        ctx = _single_entity()
        lattice = ConceptLattice.from_context(ctx)
        # At least 1 concept containing the single entity
        has_entity = any(
            "only" in c.extent for c in lattice.concepts
        )
        assert has_entity

    def test_no_attributes_produces_one_concept(self):
        ctx = FormalContext(["a", "b"], [], {})
        lattice = ConceptLattice.from_context(ctx)
        assert lattice.size == 1
        assert lattice.concepts[0].extent == frozenset({"a", "b"})

    def test_depth(self):
        ctx = _animal_context()
        lattice = ConceptLattice.from_context(ctx)
        d = lattice.depth()
        # At least 2 levels (top → bottom)
        assert d >= 2

    def test_depth_empty(self):
        lattice = ConceptLattice(concepts=[])
        assert lattice.depth() == 0
# ── Gap concepts ────────────────────────────────────────────────────


class TestGapConcepts:
    def test_animal_has_gap(self):
        ctx = _animal_context()
        gaps = find_gap_concepts(ctx)
        # {fur, feathers} has no animal → gap concept
        fur_feathers_gap = any(
            {"fur", "feathers"} <= c.intent for c in gaps
        )
        assert fur_feathers_gap

    def test_gap_extents_are_empty(self):
        ctx = _animal_context()
        gaps = find_gap_concepts(ctx)
        for gap in gaps:
            assert gap.extent_size == 0

    def test_no_gaps_when_all_combinations_covered(self):
        # Every attribute combination has at least one object
        ctx = FormalContext.from_dict({
            "obj1": {"a", "b"},
            "obj2": {"a"},
            "obj3": {"b"},
        })
        lattice = ConceptLattice.from_context(ctx)
        gaps = find_gap_concepts(ctx, lattice)
        assert len(gaps) == 0

    def test_sorted_by_intent_size(self):
        ctx = _animal_context()
        gaps = find_gap_concepts(ctx)
        sizes = [g.intent_size for g in gaps]
        assert sizes == sorted(sizes)
    def test_infospace_gap(self):
        ctx = _infospace_context()
        gaps = find_gap_concepts(ctx)
        # domain:Exchange + vsm:S1 has no entity → should appear as gap
        gap_intents = [g.intent for g in gaps]
        exchange_s1_is_gap = any(
            {"domain:Exchange", "vsm:S1"} <= intent for intent in gap_intents
        )
        assert exchange_s1_is_gap
# ── Empty cells (cross-tab) ─────────────────────────────────────────


class TestFindEmptyCells:
    def test_finds_empty_cells(self):
        ctx = _infospace_context()
        domains = ["domain:Production", "domain:Distribution", "domain:Exchange"]
        vsm_systems = ["vsm:S1", "vsm:S3", "vsm:S4", "vsm:S5"]
        empty = find_empty_cells(ctx, domains, vsm_systems)
        # domain:Exchange + vsm:S1 should be empty
        assert ("domain:Exchange", "vsm:S1") in empty
        # domain:Production + vsm:S1 should NOT be empty (division-of-labour)
        assert ("domain:Production", "vsm:S1") not in empty

    def test_all_filled_returns_empty_list(self):
        ctx = FormalContext.from_dict({
            "a": {"x", "y"},
            "b": {"x", "z"},
            "c": {"y", "z"},
            "d": {"x", "y", "z"},
        })
        empty = find_empty_cells(ctx, ["x", "y"], ["z"])
        assert empty == []

    def test_empty_context_all_cells_empty(self):
        ctx = FormalContext([], ["a", "b", "c"], {})
        empty = find_empty_cells(ctx, ["a"], ["b", "c"])
        assert len(empty) == 2


@@ -0,0 +1,254 @@
"""Tests for markitect.analysis.graph."""

import pytest

nx = pytest.importorskip("networkx", reason="networkx not installed")

from markitect.prompts.dependencies.models import DependencyGraph, EdgeType
from markitect.analysis.graph import (
    to_networkx,
    connected_components,
    betweenness_centrality,
    detect_communities,
    modularity_score,
    degree_distribution,
    cohesion_coupling,
)


# ── Helpers ──────────────────────────────────────────────────────────


def _linear_graph():
    """A -> B -> C -> D (simple chain)."""
    g = DependencyGraph()
    g.add_edge("A", "B")
    g.add_edge("B", "C")
    g.add_edge("C", "D")
    return g


def _two_clusters():
    """Two dense clusters connected by a single bridge edge.

    Cluster 1: A -- B -- C (fully connected)
    Cluster 2: X -- Y -- Z (fully connected)
    Bridge: C -> X
    """
    g = DependencyGraph()
    # Cluster 1
    g.add_edge("A", "B")
    g.add_edge("B", "A")
    g.add_edge("B", "C")
    g.add_edge("C", "B")
    g.add_edge("A", "C")
    g.add_edge("C", "A")
    # Cluster 2
    g.add_edge("X", "Y")
    g.add_edge("Y", "X")
    g.add_edge("Y", "Z")
    g.add_edge("Z", "Y")
    g.add_edge("X", "Z")
    g.add_edge("Z", "X")
    # Bridge
    g.add_edge("C", "X")
    return g


def _disconnected_graph():
    """Two separate components: {A, B} and {X, Y}."""
    g = DependencyGraph()
    g.add_edge("A", "B")
    g.add_edge("X", "Y")
    return g


def _empty_graph():
    """Graph with no nodes or edges."""
    return DependencyGraph()


def _isolated_nodes():
    """Graph with nodes but no edges."""
    g = DependencyGraph()
    # add_edge creates both endpoint nodes, so we add a single edge
    # and then extract a subgraph intended to contain only isolated nodes
    g.add_edge("A", "B")
    return g.get_subgraph({"A", "B", "C"})
# ── to_networkx ─────────────────────────────────────────────────────


class TestToNetworkx:
    def test_preserves_nodes(self):
        g = _linear_graph()
        G = to_networkx(g)
        assert set(G.nodes) == {"A", "B", "C", "D"}

    def test_preserves_edges(self):
        g = _linear_graph()
        G = to_networkx(g)
        assert G.has_edge("A", "B")
        assert G.has_edge("B", "C")
        assert not G.has_edge("D", "A")

    def test_preserves_edge_type(self):
        g = DependencyGraph()
        g.add_edge("A", "B", EdgeType.GENERATES)
        G = to_networkx(g)
        assert G.edges["A", "B"]["edge_type"] == "generates"

    def test_empty_graph(self):
        G = to_networkx(_empty_graph())
        assert len(G.nodes) == 0
        assert len(G.edges) == 0


# ── Connected components ────────────────────────────────────────────


class TestConnectedComponents:
    def test_single_component(self):
        comps = connected_components(_linear_graph())
        assert len(comps) == 1
        assert comps[0] == {"A", "B", "C", "D"}

    def test_two_components(self):
        comps = connected_components(_disconnected_graph())
        assert len(comps) == 2
        node_sets = [frozenset(c) for c in comps]
        assert frozenset({"A", "B"}) in node_sets
        assert frozenset({"X", "Y"}) in node_sets

    def test_sorted_largest_first(self):
        g = DependencyGraph()
        g.add_edge("A", "B")
        g.add_edge("B", "C")
        g.add_edge("X", "Y")
        comps = connected_components(g)
        assert len(comps[0]) >= len(comps[1])

    def test_empty_graph(self):
        assert connected_components(_empty_graph()) == []


# ── Betweenness centrality ──────────────────────────────────────────


class TestBetweennessCentrality:
    def test_linear_chain_middle_node_highest(self):
        g = _linear_graph()
        bc = betweenness_centrality(g)
        # B and C are on all shortest paths between endpoints
        assert bc["B"] > bc["A"]
        assert bc["C"] > bc["D"]

    def test_values_in_range(self):
        bc = betweenness_centrality(_two_clusters())
        for v in bc.values():
            assert 0.0 <= v <= 1.0

    def test_empty_graph(self):
        assert betweenness_centrality(_empty_graph()) == {}
# ── Community detection ─────────────────────────────────────────────


class TestDetectCommunities:
    def test_two_clusters_detected(self):
        comms = detect_communities(_two_clusters(), seed=42)
        # Should detect at least 2 communities
        assert len(comms) >= 2
        # Each node in exactly one community
        all_nodes = set()
        for c in comms:
            all_nodes.update(c)
        assert all_nodes == {"A", "B", "C", "X", "Y", "Z"}

    def test_deterministic_with_seed(self):
        g = _two_clusters()
        c1 = detect_communities(g, seed=42)
        c2 = detect_communities(g, seed=42)
        assert c1 == c2

    def test_empty_graph(self):
        assert detect_communities(_empty_graph()) == []

    def test_sorted_largest_first(self):
        comms = detect_communities(_two_clusters(), seed=42)
        sizes = [len(c) for c in comms]
        assert sizes == sorted(sizes, reverse=True)


# ── Modularity score ────────────────────────────────────────────────


class TestModularityScore:
    def test_no_edges_returns_zero(self):
        assert modularity_score(_empty_graph()) == 0.0

    def test_two_clusters_positive(self):
        g = _two_clusters()
        comms = [{"A", "B", "C"}, {"X", "Y", "Z"}]
        score = modularity_score(g, communities=comms)
        assert score > 0.0

    def test_single_community_near_zero(self):
        g = _two_clusters()
        all_nodes = {"A", "B", "C", "X", "Y", "Z"}
        score = modularity_score(g, communities=[all_nodes])
        assert score == pytest.approx(0.0, abs=1e-10)
# ── Degree distribution ─────────────────────────────────────────────


class TestDegreeDistribution:
    def test_linear_chain(self):
        dd = degree_distribution(_linear_graph())
        # A: out=1 in=0; B: out=1 in=1; D: out=0 in=1
        assert dd["A"]["out_degree"] == 1
        assert dd["A"]["in_degree"] == 0
        assert dd["B"]["in_degree"] == 1
        assert dd["B"]["out_degree"] == 1
        assert dd["D"]["in_degree"] == 1
        assert dd["D"]["out_degree"] == 0

    def test_total_degree(self):
        dd = degree_distribution(_linear_graph())
        for node, degrees in dd.items():
            assert degrees["total_degree"] == degrees["in_degree"] + degrees["out_degree"]

    def test_empty_graph(self):
        assert degree_distribution(_empty_graph()) == {}


# ── Cohesion / coupling ─────────────────────────────────────────────


class TestCohesionCoupling:
    def test_two_clusters_with_bridge(self):
        g = _two_clusters()
        comms = [{"A", "B", "C"}, {"X", "Y", "Z"}]
        cc = cohesion_coupling(g, communities=comms)
        # 12 intra-cluster edges + 1 bridge = 13 total
        assert cc["intra_edges"] == 12
        assert cc["inter_edges"] == 1
        assert cc["total_edges"] == 13
        assert cc["cohesion"] == pytest.approx(12 / 13)
        assert cc["coupling"] == pytest.approx(1 / 13)
        assert cc["communities"] == 2

    def test_no_edges(self):
        cc = cohesion_coupling(_empty_graph())
        assert cc["cohesion"] == 0.0
        assert cc["coupling"] == 0.0
        assert cc["total_edges"] == 0

    def test_ratios_sum_to_one(self):
        g = _two_clusters()
        comms = [{"A", "B", "C"}, {"X", "Y", "Z"}]
        cc = cohesion_coupling(g, communities=comms)
        assert cc["cohesion"] + cc["coupling"] == pytest.approx(1.0)


@@ -0,0 +1,137 @@
"""Tests for markitect.core.section_tree."""

from markitect.core.parser import parse_markdown_to_ast
from markitect.core.section_tree import (
    build_section_tree,
    extract_heading_content,
    extract_heading_level,
    extract_section_text,
    slugify,
)


class TestSlugify:
    def test_simple_text(self):
        assert slugify("Hello World") == "hello_world"

    def test_german_umlauts(self):
        assert slugify("Ärger mit Über") == "aerger_mit_ueber"

    def test_special_characters(self):
        assert slugify("Smith's Original Wording") == "smith_s_original_wording"

    def test_empty_string(self):
        assert slugify("") == "feld"

    def test_trailing_underscores_stripped(self):
        assert slugify("--hello--") == "hello"

    def test_multiple_spaces(self):
        assert slugify("a b") == "a_b"


class TestExtractHeadingLevel:
    def test_h1(self):
        assert extract_heading_level("h1") == 1

    def test_h6(self):
        assert extract_heading_level("h6") == 6

    def test_invalid_tag(self):
        assert extract_heading_level("p") == 1

    def test_empty(self):
        assert extract_heading_level("") == 1


class TestExtractHeadingContent:
    def test_finds_inline_token(self):
        tokens = [
            {"type": "heading_open", "tag": "h1"},
            {"type": "inline", "content": "Hello"},
            {"type": "heading_close", "tag": "h1"},
        ]
        assert extract_heading_content(tokens, 0) == "Hello"

    def test_no_inline(self):
        tokens = [
            {"type": "heading_open", "tag": "h1"},
            {"type": "heading_close", "tag": "h1"},
        ]
        assert extract_heading_content(tokens, 0) == ""


class TestBuildSectionTree:
    def test_single_heading(self):
        md = "# Title\n\nSome text."
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)
        assert tree["level"] == 0
        assert len(tree["children"]) == 1
        assert tree["children"][0]["heading"] == "Title"
        assert tree["children"][0]["level"] == 1

    def test_nested_headings(self):
        md = "# Top\n\n## Sub\n\ntext\n\n## Sub2\n\nmore"
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)
        top = tree["children"][0]
        assert top["heading"] == "Top"
        assert len(top["children"]) == 2
        assert top["children"][0]["heading"] == "Sub"
        assert top["children"][1]["heading"] == "Sub2"

    def test_max_depth(self):
        md = "# Top\n\n## Sub\n\n### Deep\n\ntext"
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens, max_depth=2)
        top = tree["children"][0]
        sub = top["children"][0]
        # H3 should be excluded from tree
        assert len(sub["children"]) == 0

    def test_content_tokens_captured(self):
        md = "# Title\n\nParagraph text here."
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)
        section = tree["children"][0]
        inline_tokens = [t for t in section["content_tokens"] if t.get("type") == "inline"]
        assert len(inline_tokens) == 1
        assert "Paragraph text here" in inline_tokens[0]["content"]

    def test_slug_assigned(self):
        md = "# Economic Domain\n\ntext"
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)
        assert tree["children"][0]["slug"] == "economic_domain"

    def test_empty_document(self):
        tokens = parse_markdown_to_ast("")
        tree = build_section_tree(tokens)
        assert tree["children"] == []


class TestExtractSectionText:
    def test_simple_paragraph(self):
        md = "# Title\n\nHello world."
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)
        text = extract_section_text(tree["children"][0])
        assert text == "Hello world."

    def test_multiple_paragraphs(self):
        md = "# Title\n\nFirst paragraph.\n\nSecond paragraph."
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)
        text = extract_section_text(tree["children"][0])
        assert "First paragraph." in text
        assert "Second paragraph." in text

    def test_empty_section(self):
        section = {"content_tokens": []}
        assert extract_section_text(section) == ""


@@ -0,0 +1,413 @@
"""
Tests for collection-level quality checks (S2.4).
Covers all five concerns: Redundancy (C1), Coverage (C2), Coherence (C3),
Consistency (C4), Granularity (C5), and the orchestrator.
"""

from __future__ import annotations

import math

import pytest

from markitect.infospace.models import EntityMeta
from markitect.prompts.dependencies.models import DependencyGraph


# ── helpers ──────────────────────────────────────────────────────────


def _entity(slug: str, domain: str = "", definition: str = "",
            source_chapter: str = "", word_count: int = 0) -> EntityMeta:
    wc = word_count if word_count else (len(definition.split()) if definition else 0)
    return EntityMeta(
        slug=slug,
        title=slug.replace("-", " ").title(),
        h1_raw=slug.replace("-", " ").title(),
        definition=definition,
        domain=domain,
        source_chapter=source_chapter,
        definition_word_count=wc,
        total_word_count=wc,
    )


def _sample_entities() -> list[EntityMeta]:
    return [
        _entity("alpha", domain="economics", definition="the first concept in our model", source_chapter="ch01"),
        _entity("beta", domain="economics", definition="the second concept about markets", source_chapter="ch01"),
        _entity("gamma", domain="sociology", definition="a social structure framework", source_chapter="ch02"),
        _entity("delta", domain="sociology", definition="a social dynamic pattern", source_chapter="ch02"),
        _entity("epsilon", domain="philosophy", definition="an epistemic principle", source_chapter="ch03"),
    ]


def _linear_graph() -> DependencyGraph:
    """A -> B -> C -> D."""
    g = DependencyGraph()
    g.add_edge("A", "B")
    g.add_edge("B", "C")
    g.add_edge("C", "D")
    return g


def _cyclic_graph() -> DependencyGraph:
    """A -> B -> C -> A (one cycle)."""
    g = DependencyGraph()
    g.add_edge("A", "B")
    g.add_edge("B", "C")
    g.add_edge("C", "A")
    return g


def _can_import_graph_analysis():
    try:
        from markitect.analysis.graph import connected_components  # noqa: F401
        return True
    except ImportError:
        return False
# ── C1: Redundancy ──────────────────────────────────────────────────


class TestRedundancy:
    def test_empty_entities(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        report = check_redundancy([])
        assert report.entity_count == 0
        assert report.redundancy_ratio == 0.0
        assert report.similar_pairs == []

    def test_single_entity(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        report = check_redundancy([_entity("a", definition="hello world")])
        assert report.entity_count == 1
        assert report.redundancy_ratio == 0.0

    def test_no_overlap_word_fallback(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        entities = [
            _entity("a", definition="apple banana cherry"),
            _entity("b", definition="delta epsilon zeta"),
        ]
        report = check_redundancy(entities, threshold=0.5)
        assert report.similar_pairs == []
        assert report.redundancy_ratio == 0.0

    def test_high_overlap_word_fallback(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        entities = [
            _entity("a", definition="the quick brown fox"),
            _entity("b", definition="the quick brown dog"),
        ]
        report = check_redundancy(entities, threshold=0.5)
        assert len(report.similar_pairs) == 1
        assert report.similar_pairs[0]["method"] == "word_overlap"
        assert report.similar_pairs[0]["entity_a"] == "a"
        assert report.similar_pairs[0]["entity_b"] == "b"
        assert report.redundancy_ratio == 1.0  # both entities involved

    def test_embedding_based(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        entities = [
            _entity("a", definition="x"),
            _entity("b", definition="y"),
            _entity("c", definition="z"),
        ]
        # a and b are very similar; c is different
        embeddings = {
            "a": [1.0, 0.0, 0.0],
            "b": [0.99, 0.1, 0.0],
            "c": [0.0, 0.0, 1.0],
        }
        report = check_redundancy(entities, embeddings=embeddings, threshold=0.9)
        assert len(report.similar_pairs) >= 1
        assert report.similar_pairs[0]["method"] == "embedding"
        assert report.redundancy_ratio > 0.0

    def test_to_dict(self):
        from markitect.infospace.checks.redundancy import RedundancyReport
        r = RedundancyReport(similar_pairs=[], redundancy_ratio=0.25, entity_count=10)
        d = r.to_dict()
        assert d["concern"] == "C1"
        assert d["redundancy_ratio"] == 0.25
        assert d["entity_count"] == 10

# ── C2: Coverage ────────────────────────────────────────────────────

class TestCoverage:
    def test_empty_entities(self):
        from markitect.infospace.checks.coverage import check_coverage
        report = check_coverage([])
        assert report.entity_count == 0
        assert report.coverage_ratio == 0.0

    def test_full_coverage(self):
        """All domain×chapter cells are populated."""
        from markitect.infospace.checks.coverage import check_coverage
        entities = [
            _entity("a", domain="d1", source_chapter="ch1"),
            _entity("b", domain="d2", source_chapter="ch1"),
            _entity("c", domain="d1", source_chapter="ch2"),
            _entity("d", domain="d2", source_chapter="ch2"),
        ]
        report = check_coverage(entities)
        assert report.coverage_ratio == 1.0
        assert report.empty_cells == []

    def test_partial_coverage(self):
        """One cell is missing → coverage < 1.0."""
        from markitect.infospace.checks.coverage import check_coverage
        entities = [
            _entity("a", domain="d1", source_chapter="ch1"),
            _entity("b", domain="d2", source_chapter="ch1"),
            _entity("c", domain="d1", source_chapter="ch2"),
            # Missing: d2×ch2
        ]
        report = check_coverage(entities)
        assert report.coverage_ratio < 1.0
        assert len(report.empty_cells) == 1
        assert report.empty_cells[0]["dimension_a"] == "domain:d2"
        assert report.empty_cells[0]["dimension_b"] == "chapter:ch2"

    def test_domain_counts(self):
        from markitect.infospace.checks.coverage import check_coverage
        entities = _sample_entities()
        report = check_coverage(entities)
        assert report.domain_counts["economics"] == 2
        assert report.domain_counts["sociology"] == 2
        assert report.domain_counts["philosophy"] == 1

    def test_to_dict(self):
        from markitect.infospace.checks.coverage import CoverageReport
        r = CoverageReport(coverage_ratio=0.75, entity_count=8)
        d = r.to_dict()
        assert d["concern"] == "C2"
        assert d["coverage_ratio"] == 0.75

    def test_extra_attributes(self):
        from markitect.infospace.checks.coverage import check_coverage
        entities = [
            _entity("a", domain="d1", source_chapter="ch1"),
        ]
        extra = {"a": {"vsm:production"}}
        report = check_coverage(entities, extra_attributes=extra)
        assert report.entity_count == 1

# ── C3: Coherence ───────────────────────────────────────────────────

class TestCoherence:
    def test_no_graph(self):
        from markitect.infospace.checks.coherence import check_coherence
        report = check_coherence(graph=None, entity_count=5)
        assert report.connected_components == 0
        assert report.entity_count == 5

    def test_empty_graph(self):
        from markitect.infospace.checks.coherence import check_coherence
        g = DependencyGraph()
        report = check_coherence(graph=g, entity_count=0)
        assert report.connected_components == 0

    def test_to_dict(self):
        from markitect.infospace.checks.coherence import CoherenceReport
        r = CoherenceReport(connected_components=2, modularity=0.3456, entity_count=10)
        d = r.to_dict()
        assert d["concern"] == "C3"
        assert d["modularity"] == 0.3456
        assert d["connected_components"] == 2

    @pytest.mark.skipif(
        not _can_import_graph_analysis(),
        reason="networkx not available",
    )
    def test_with_graph(self):
        from markitect.infospace.checks.coherence import check_coherence
        g = _linear_graph()
        report = check_coherence(graph=g, entity_count=4)
        assert report.connected_components >= 1
        assert report.entity_count == 4

# ── C4: Consistency ─────────────────────────────────────────────────

class TestConsistency:
    def test_no_graph(self):
        from markitect.infospace.checks.consistency import check_consistency
        entities = _sample_entities()
        report = check_consistency(entities)
        assert report.cycle_count == 0
        assert report.entity_count == 5

    def test_acyclic_graph(self):
        from markitect.infospace.checks.consistency import check_consistency
        entities = _sample_entities()
        g = _linear_graph()
        report = check_consistency(entities, graph=g)
        assert report.cycle_count == 0

    def test_cyclic_graph(self):
        from markitect.infospace.checks.consistency import check_consistency
        entities = _sample_entities()
        g = _cyclic_graph()
        report = check_consistency(entities, graph=g)
        assert report.cycle_count >= 1
        assert len(report.cycles) >= 1

    def test_to_dict(self):
        from markitect.infospace.checks.consistency import ConsistencyReport
        r = ConsistencyReport(cycles=[["A", "B", "A"]], cycle_count=1, entity_count=5)
        d = r.to_dict()
        assert d["concern"] == "C4"
        assert d["cycle_count"] == 1

# ── C5: Granularity ─────────────────────────────────────────────────

class TestGranularity:
    def test_empty_entities(self):
        from markitect.infospace.checks.granularity import check_granularity
        report = check_granularity([])
        assert report.entity_count == 0
        assert report.domain_entropy == 0.0

    def test_single_domain(self):
        from markitect.infospace.checks.granularity import check_granularity
        entities = [
            _entity("a", domain="d1", word_count=10),
            _entity("b", domain="d1", word_count=20),
        ]
        report = check_granularity(entities)
        assert report.domain_entropy == 0.0  # single domain = zero entropy
        assert report.entity_count == 2
        assert report.word_count_stats["mean"] == 15.0

    def test_balanced_domains(self):
        from markitect.infospace.checks.granularity import check_granularity
        entities = [
            _entity("a", domain="d1", word_count=10),
            _entity("b", domain="d2", word_count=10),
        ]
        report = check_granularity(entities)
        assert report.domain_entropy == pytest.approx(1.0)  # log2(2) = 1.0
        assert report.domain_distribution == {"d1": 1, "d2": 1}

    def test_word_count_stats(self):
        from markitect.infospace.checks.granularity import check_granularity
        entities = [
            _entity("a", domain="d1", word_count=10),
            _entity("b", domain="d1", word_count=30),
        ]
        report = check_granularity(entities)
        assert report.word_count_stats["mean"] == 20.0
        assert report.word_count_stats["min"] == 10.0
        assert report.word_count_stats["max"] == 30.0
        assert report.word_count_stats["std"] == 10.0

    def test_to_dict(self):
        from markitect.infospace.checks.granularity import GranularityReport
        r = GranularityReport(domain_entropy=1.5, entity_count=4)
        d = r.to_dict()
        assert d["concern"] == "C5"
        assert d["domain_entropy"] == 1.5

    def test_unspecified_domain(self):
        from markitect.infospace.checks.granularity import check_granularity
        entities = [_entity("a", domain="", word_count=10)]
        report = check_granularity(entities)
        assert "(unspecified)" in report.domain_distribution

# ── Orchestrator ────────────────────────────────────────────────────

class TestOrchestrator:
    def test_run_all_default(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        report = run_all_checks(entities)
        assert report.redundancy is not None
        assert report.coverage is not None
        assert report.coherence is not None
        assert report.consistency is not None
        assert report.granularity is not None

    def test_run_selected_checks(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        report = run_all_checks(entities, checks=["redundancy", "granularity"])
        assert report.redundancy is not None
        assert report.granularity is not None
        assert report.coverage is None
        assert report.coherence is None
        assert report.consistency is None

    def test_to_dict(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        report = run_all_checks(entities, checks=["granularity"])
        d = report.to_dict()
        assert "granularity" in d
        assert "redundancy" not in d

    def test_metrics(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        report = run_all_checks(entities, checks=["redundancy", "granularity"])
        m = report.metrics()
        assert "redundancy_ratio" in m
        assert "granularity_entropy" in m
        assert isinstance(m["redundancy_ratio"], float)
        assert isinstance(m["granularity_entropy"], float)

    def test_metrics_empty_report(self):
        from markitect.infospace.checks.orchestrator import CheckReport
        report = CheckReport()
        assert report.metrics() == {}

    def test_run_all_with_graph(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        g = _linear_graph()
        report = run_all_checks(entities, graph=g, checks=["consistency"])
        assert report.consistency is not None
        assert report.consistency.cycle_count == 0

    def test_run_all_with_cyclic_graph(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        g = _cyclic_graph()
        report = run_all_checks(entities, graph=g, checks=["consistency"])
        assert report.consistency.cycle_count >= 1

# ── Shannon entropy helper ──────────────────────────────────────────

class TestShannonEntropy:
    def test_uniform_distribution(self):
        from markitect.infospace.checks.granularity import _shannon_entropy
        counts = {"a": 1, "b": 1, "c": 1, "d": 1}
        assert _shannon_entropy(counts) == pytest.approx(2.0)  # log2(4)

    def test_single_element(self):
        from markitect.infospace.checks.granularity import _shannon_entropy
        assert _shannon_entropy({"a": 10}) == 0.0

    def test_empty(self):
        from markitect.infospace.checks.granularity import _shannon_entropy
        assert _shannon_entropy({}) == 0.0

    def test_skewed(self):
        from markitect.infospace.checks.granularity import _shannon_entropy
        counts = {"a": 99, "b": 1}
        entropy = _shannon_entropy(counts)
        assert 0.0 < entropy < 1.0
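

# Illustrative sketch only: a reference base-2 Shannon entropy that satisfies
# the expectations in TestShannonEntropy. The real `_shannon_entropy` in
# markitect may be written differently; `_entropy_sketch` is a hypothetical
# name used just for this example.
import math


def _entropy_sketch(counts: dict) -> float:
    """H = -sum(p_i * log2(p_i)) over categories with a non-zero count."""
    total = sum(counts.values())
    if total <= 0:
        # An empty distribution is treated as having zero entropy.
        return 0.0
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy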


@@ -0,0 +1,225 @@
"""Tests for markitect.infospace.cli."""
from pathlib import Path

import pytest
from click.testing import CliRunner

from markitect.infospace.cli import infospace_commands


@pytest.fixture
def runner():
    return CliRunner()


@pytest.fixture
def infospace_dir(tmp_path):
    """Create a minimal infospace directory with config and entities."""
    config_yaml = """\
topic:
  name: "Test Infospace"
  domain: "Testing"
disciplines:
  - name: "Test Discipline"
viability:
  coverage_ratio:
    min: 0.60
  redundancy_ratio:
    max: 0.05
"""
    (tmp_path / "infospace.yaml").write_text(config_yaml)
    entities = tmp_path / "output" / "entities"
    entities.mkdir(parents=True)
    (entities / "alpha.md").write_text(
        "# Alpha\n\n## Definition\n\nAlpha is a test entity.\n\n"
        "## Source Chapter\n\nChapter 1\n\n"
        "## Domain\n\nProduction\n"
    )
    (entities / "beta.md").write_text(
        "# Beta\n\n## Definition\n\nBeta is another test entity with more words "
        "to make it longer.\n\n"
        "## Source Chapter\n\nChapter 2\n\n"
        "## Domain\n\nDistribution\n"
    )
    return tmp_path

# ── init ─────────────────────────────────────────────────────────────

class TestInitCommand:
    def test_creates_config_file(self, runner, tmp_path):
        out = tmp_path / "infospace.yaml"
        result = runner.invoke(
            infospace_commands,
            ["init", "--topic", "My Topic", "--domain", "Science", "-o", str(out)],
        )
        assert result.exit_code == 0
        assert out.exists()
        assert "Created" in result.output

    def test_config_contains_topic(self, runner, tmp_path):
        out = tmp_path / "infospace.yaml"
        runner.invoke(
            infospace_commands,
            ["init", "--topic", "My Topic", "-o", str(out)],
        )
        text = out.read_text()
        assert "My Topic" in text

    def test_refuses_overwrite(self, runner, tmp_path):
        out = tmp_path / "infospace.yaml"
        out.write_text("existing")
        result = runner.invoke(
            infospace_commands,
            ["init", "--topic", "X", "-o", str(out)],
        )
        assert result.exit_code != 0
        assert "already exists" in result.output

    def test_with_disciplines(self, runner, tmp_path):
        out = tmp_path / "infospace.yaml"
        result = runner.invoke(
            infospace_commands,
            [
                "init", "--topic", "T",
                "--discipline", "VSM",
                "--discipline", "Category Theory",
                "-o", str(out),
            ],
        )
        assert result.exit_code == 0
        text = out.read_text()
        assert "VSM" in text
        assert "Category Theory" in text

# ── status ───────────────────────────────────────────────────────────

class TestStatusCommand:
    def test_shows_topic_and_count(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["status", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "Test Infospace" in result.output
        assert "2" in result.output  # 2 entities

    def test_shows_domain_field(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["status", "--config", str(infospace_dir / "infospace.yaml")],
        )
        # Domain from config (topic.domain), not entity domains
        assert "Testing" in result.output

    def test_shows_disciplines(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["status", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert "Test Discipline" in result.output

    def test_no_config_exits(self, runner, tmp_path):
        result = runner.invoke(
            infospace_commands,
            ["status", "--config", str(tmp_path / "nonexistent.yaml")],
        )
        assert result.exit_code != 0

# ── entities ─────────────────────────────────────────────────────────

class TestEntitiesCommand:
    def test_lists_entities(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["entities", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "alpha" in result.output
        assert "beta" in result.output
        assert "Total: 2" in result.output

    def test_sort_by_domain(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            [
                "entities",
                "--config", str(infospace_dir / "infospace.yaml"),
                "--sort-by", "domain",
            ],
        )
        assert result.exit_code == 0
        lines = result.output.strip().split("\n")
        # Distribution comes before Production alphabetically
        data_lines = [line for line in lines if "alpha" in line or "beta" in line]
        assert len(data_lines) == 2

    def test_no_entities_dir(self, runner, tmp_path):
        (tmp_path / "infospace.yaml").write_text("topic:\n name: X\n")
        result = runner.invoke(
            infospace_commands,
            ["entities", "--config", str(tmp_path / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "No entities" in result.output

# ── viability ────────────────────────────────────────────────────────

class TestViabilityCommand:
    def test_no_metrics_shows_thresholds(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["viability", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "coverage_ratio" in result.output

    def test_with_metrics_file(self, runner, infospace_dir):
        import yaml
        metrics_dir = infospace_dir / "output" / "metrics"
        metrics_dir.mkdir(parents=True, exist_ok=True)
        metrics = {"coverage_ratio": 0.85, "redundancy_ratio": 0.02}
        (metrics_dir / "metrics.yaml").write_text(yaml.safe_dump(metrics))
        result = runner.invoke(
            infospace_commands,
            ["viability", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "PASS" in result.output
        assert "Viable: YES" in result.output

    def test_failing_threshold(self, runner, infospace_dir):
        import yaml
        metrics_dir = infospace_dir / "output" / "metrics"
        metrics_dir.mkdir(parents=True, exist_ok=True)
        metrics = {"coverage_ratio": 0.3, "redundancy_ratio": 0.02}
        (metrics_dir / "metrics.yaml").write_text(yaml.safe_dump(metrics))
        result = runner.invoke(
            infospace_commands,
            ["viability", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "FAIL" in result.output
        assert "Viable: NO" in result.output

    def test_no_thresholds_configured(self, runner, tmp_path):
        (tmp_path / "infospace.yaml").write_text("topic:\n name: X\n")
        result = runner.invoke(
            infospace_commands,
            ["viability", "--config", str(tmp_path / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "No viability thresholds" in result.output


@@ -0,0 +1,257 @@
"""
Tests for infospace composition model (S2.6).
"""
from __future__ import annotations

from pathlib import Path

import pytest
import yaml

from markitect.infospace.composition import (
    DisciplineStatus,
    StaleMappingInfo,
    bind_discipline,
    check_discipline_status,
    compute_discipline_digests,
    find_stale_mappings,
    get_discipline_entities,
    load_discipline_config,
    resolve_discipline_path,
)
from markitect.infospace.config import (
    DisciplineBinding,
    InfospaceConfig,
    TopicConfig,
    ViabilityThreshold,
    save_infospace_config,
)

# ── helpers ──────────────────────────────────────────────────────────

def _create_discipline(tmp_path: Path, name: str = "test-discipline") -> Path:
    """Create a minimal discipline infospace directory."""
    disc_dir = tmp_path / name
    disc_dir.mkdir(parents=True, exist_ok=True)
    disc_config = InfospaceConfig(
        topic=TopicConfig(name=name.replace("-", " ").title(), domain="Testing"),
        viability={"coverage_ratio": ViabilityThreshold(metric="coverage_ratio", min=0.5)},
    )
    save_infospace_config(disc_config, disc_dir / "infospace.yaml")
    # Create some entities
    entities_dir = disc_dir / "output" / "entities"
    entities_dir.mkdir(parents=True, exist_ok=True)
    for slug in ["concept_a", "concept_b", "concept-c"]:
        title = slug.replace("-", " ").title()
        (entities_dir / f"{slug}.md").write_text(
            f"# {title}\n\n## Definition\n\nA test concept for {slug}.\n\n"
            f"## Source Chapter\n\nch01\n\n## Domain\n\nTesting\n",
            encoding="utf-8",
        )
    return disc_dir


def _parent_config(tmp_path: Path, disc_path: str = "") -> InfospaceConfig:
    """Create a parent infospace config."""
    return InfospaceConfig(
        topic=TopicConfig(name="Parent", domain="Testing"),
        disciplines=[DisciplineBinding(name="Test Discipline", path=disc_path)]
        if disc_path
        else [],
    )

# ── resolve_discipline_path ─────────────────────────────────────────

class TestResolveDisciplinePath:
    def test_relative_path(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        binding = DisciplineBinding(name="test", path="test-discipline")
        result = resolve_discipline_path(binding, tmp_path)
        assert result is not None
        assert result == disc_dir.resolve()

    def test_absolute_path(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        binding = DisciplineBinding(name="test", path=str(disc_dir))
        result = resolve_discipline_path(binding, tmp_path / "other")
        assert result is not None
        assert result == disc_dir.resolve()

    def test_missing_path(self, tmp_path):
        binding = DisciplineBinding(name="test", path="nonexistent")
        assert resolve_discipline_path(binding, tmp_path) is None

    def test_empty_path(self, tmp_path):
        binding = DisciplineBinding(name="test", path="")
        assert resolve_discipline_path(binding, tmp_path) is None

# ── load_discipline_config ──────────────────────────────────────────

class TestLoadDisciplineConfig:
    def test_loads_config(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        binding = DisciplineBinding(name="test", path="test-discipline")
        config = load_discipline_config(binding, tmp_path)
        assert config is not None
        assert config.topic.domain == "Testing"

    def test_missing_config_file(self, tmp_path):
        (tmp_path / "no-config").mkdir()
        binding = DisciplineBinding(name="test", path="no-config")
        assert load_discipline_config(binding, tmp_path) is None

    def test_missing_directory(self, tmp_path):
        binding = DisciplineBinding(name="test", path="gone")
        assert load_discipline_config(binding, tmp_path) is None

# ── check_discipline_status ─────────────────────────────────────────

class TestCheckDisciplineStatus:
    def test_valid_discipline(self, tmp_path):
        _create_discipline(tmp_path)
        binding = DisciplineBinding(name="Test Discipline", path="test-discipline")
        status = check_discipline_status(binding, tmp_path)
        assert status.exists
        assert status.has_config
        assert status.entity_count == 3
        assert status.error == ""

    def test_missing_discipline(self, tmp_path):
        binding = DisciplineBinding(name="Missing", path="nope")
        status = check_discipline_status(binding, tmp_path)
        assert not status.exists
        assert "not found" in status.error.lower()

    def test_no_config(self, tmp_path):
        (tmp_path / "bare").mkdir()
        binding = DisciplineBinding(name="Bare", path="bare")
        status = check_discipline_status(binding, tmp_path)
        assert status.exists
        assert not status.has_config

    def test_viable_with_metrics(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        # Write metrics that meet the threshold
        metrics_dir = disc_dir / "output" / "metrics"
        metrics_dir.mkdir(parents=True, exist_ok=True)
        (metrics_dir / "metrics.yaml").write_text(
            yaml.safe_dump({"coverage_ratio": 0.8}), encoding="utf-8"
        )
        binding = DisciplineBinding(name="Test", path="test-discipline")
        status = check_discipline_status(binding, tmp_path)
        assert status.is_viable

    def test_not_viable_below_threshold(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        metrics_dir = disc_dir / "output" / "metrics"
        metrics_dir.mkdir(parents=True, exist_ok=True)
        (metrics_dir / "metrics.yaml").write_text(
            yaml.safe_dump({"coverage_ratio": 0.2}), encoding="utf-8"
        )
        binding = DisciplineBinding(name="Test", path="test-discipline")
        status = check_discipline_status(binding, tmp_path)
        assert not status.is_viable

    def test_to_dict(self, tmp_path):
        _create_discipline(tmp_path)
        binding = DisciplineBinding(name="Test", path="test-discipline")
        status = check_discipline_status(binding, tmp_path)
        d = status.to_dict()
        assert d["name"] == "Test"
        assert d["exists"] is True
        assert d["entity_count"] == 3

# ── get_discipline_entities ─────────────────────────────────────────

class TestGetDisciplineEntities:
    def test_returns_entities(self, tmp_path):
        _create_discipline(tmp_path)
        binding = DisciplineBinding(name="Test", path="test-discipline")
        entities = get_discipline_entities(binding, tmp_path)
        assert len(entities) == 3
        slugs = {e.slug for e in entities}
        assert "concept_a" in slugs

    def test_missing_discipline(self, tmp_path):
        binding = DisciplineBinding(name="Test", path="nope")
        assert get_discipline_entities(binding, tmp_path) == []

# ── compute_discipline_digests ──────────────────────────────────────

class TestComputeDisciplineDigests:
    def test_returns_digests(self, tmp_path):
        _create_discipline(tmp_path)
        binding = DisciplineBinding(name="Test", path="test-discipline")
        digests = compute_discipline_digests(binding, tmp_path)
        assert len(digests) == 3
        assert "concept_a" in digests
        assert isinstance(digests["concept_a"], str)
        assert len(digests["concept_a"]) == 12

# ── find_stale_mappings ─────────────────────────────────────────────

class TestFindStaleMappings:
    def test_no_references(self, tmp_path):
        cfg = _parent_config(tmp_path, disc_path="test-discipline")
        assert find_stale_mappings(cfg, tmp_path) == []

    def test_no_stale(self, tmp_path):
        _create_discipline(tmp_path)
        cfg = _parent_config(tmp_path, disc_path="test-discipline")
        refs = {"entity_x": ["concept_a", "concept_b"]}
        stale = find_stale_mappings(cfg, tmp_path, mapping_references=refs)
        assert stale == []

    def test_detects_stale(self, tmp_path):
        _create_discipline(tmp_path)
        cfg = _parent_config(tmp_path, disc_path="test-discipline")
        refs = {"entity_x": ["concept_a", "deleted_concept"]}
        stale = find_stale_mappings(cfg, tmp_path, mapping_references=refs)
        assert len(stale) == 1
        assert stale[0].entity_slug == "entity_x"
        assert stale[0].discipline_entity == "deleted_concept"

    def test_stale_to_dict(self):
        info = StaleMappingInfo(
            entity_slug="e1", discipline_entity="d1", reason="gone"
        )
        d = info.to_dict()
        assert d["entity_slug"] == "e1"

# ── bind_discipline ─────────────────────────────────────────────────

class TestBindDiscipline:
    def test_adds_binding(self, tmp_path):
        _create_discipline(tmp_path)
        cfg = InfospaceConfig(topic=TopicConfig(name="Parent"))
        status = bind_discipline(cfg, name="Test", path="test-discipline", root=tmp_path)
        assert status.exists
        assert len(cfg.disciplines) == 1
        assert cfg.disciplines[0].name == "Test"

    def test_duplicate_rejected(self, tmp_path):
        _create_discipline(tmp_path)
        cfg = _parent_config(tmp_path, disc_path="test-discipline")
        status = bind_discipline(cfg, name="Test Discipline", path="x", root=tmp_path)
        assert "already bound" in status.error


@@ -0,0 +1,400 @@
"""Tests for markitect.infospace.config and state."""
from datetime import datetime
from pathlib import Path

import pytest

from markitect.infospace.config import (
    DisciplineBinding,
    InfospaceConfig,
    PipelineConfig,
    PipelineStage,
    SchemaRegistry,
    TopicConfig,
    ViabilityThreshold,
    find_infospace_config,
    load_infospace_config,
    save_infospace_config,
)
from markitect.infospace.state import (
    InfospaceState,
    ViabilityResult,
    build_state,
)
from markitect.infospace.models import EntityMeta
from markitect.infospace.evaluation import (
    EntityEvaluation,
    EvaluationSnapshot,
    ScoreEntry,
)

# ── Helpers ──────────────────────────────────────────────────────────

_SAMPLE_YAML = """\
topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/
disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/
schemas:
  entity: schemas/economic-entity-schema-v1.0.md
  mapping: schemas/vsm-mapping-schema-v1.0.md
competency_questions: schemas/competency-questions.md
viability:
  coverage_ratio:
    min: 0.60
  per_entity_mean:
    min: 3.5
  redundancy_ratio:
    max: 0.05
pipeline:
  stages:
    - template: extract-entities
      spaces: [sources, guidelines]
    - template: map-to-vsm
      spaces: [entities, vsm-reference]
  post_batch:
    - template: assess-metrics
"""


def _sample_config() -> InfospaceConfig:
    return InfospaceConfig(
        topic=TopicConfig(name="Test Topic", domain="Testing"),
        disciplines=[DisciplineBinding(name="VSM", path="vsm/")],
        schemas=SchemaRegistry(entity="schemas/entity.md"),
        competency_questions="schemas/cq.md",
        viability={
            "coverage_ratio": ViabilityThreshold("coverage_ratio", min=0.6),
            "redundancy_ratio": ViabilityThreshold("redundancy_ratio", max=0.05),
        },
    )


def _sample_entities(n=5) -> list:
    return [
        EntityMeta(
            slug=f"entity-{i}",
            title=f"Entity {i}",
            h1_raw=f"Entity {i}",
            domain="Production" if i % 2 == 0 else "Distribution",
        )
        for i in range(n)
    ]

# ── TopicConfig ──────────────────────────────────────────────────────

class TestTopicConfig:
    def test_round_trip(self):
        tc = TopicConfig("WoN", "Economics", "sources/")
        d = tc.to_dict()
        restored = TopicConfig.from_dict(d)
        assert restored.name == "WoN"
        assert restored.domain == "Economics"
        assert restored.sources == "sources/"

    def test_minimal(self):
        tc = TopicConfig.from_dict({"name": "Minimal"})
        assert tc.domain == ""
        assert tc.sources == ""

    def test_to_dict_omits_empty(self):
        tc = TopicConfig("X")
        d = tc.to_dict()
        assert "domain" not in d
        assert "sources" not in d

# ── DisciplineBinding ────────────────────────────────────────────────

class TestDisciplineBinding:
    def test_round_trip(self):
        db = DisciplineBinding("VSM", "path/to/vsm")
        d = db.to_dict()
        restored = DisciplineBinding.from_dict(d)
        assert restored.name == "VSM"
        assert restored.path == "path/to/vsm"

# ── SchemaRegistry ───────────────────────────────────────────────────

class TestSchemaRegistry:
    def test_round_trip(self):
        sr = SchemaRegistry(entity="e.md", mapping="m.md", analysis="a.md")
        d = sr.to_dict()
        restored = SchemaRegistry.from_dict(d)
        assert restored.entity == "e.md"
        assert restored.mapping == "m.md"

    def test_extra_schemas(self):
        sr = SchemaRegistry.from_dict({"entity": "e.md", "custom": "c.md"})
        assert sr.entity == "e.md"
        assert sr.extra == {"custom": "c.md"}

# ── ViabilityThreshold ──────────────────────────────────────────────

class TestViabilityThreshold:
    def test_min_check(self):
        t = ViabilityThreshold("x", min=0.5)
        assert t.check(0.6) is True
        assert t.check(0.5) is True
        assert t.check(0.4) is False

    def test_max_check(self):
        t = ViabilityThreshold("x", max=0.1)
        assert t.check(0.05) is True
        assert t.check(0.1) is True
        assert t.check(0.2) is False

    def test_min_and_max(self):
        t = ViabilityThreshold("x", min=0.3, max=0.7)
        assert t.check(0.5) is True
        assert t.check(0.2) is False
        assert t.check(0.8) is False

    def test_no_bounds_always_passes(self):
        t = ViabilityThreshold("x")
        assert t.check(999.0) is True
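

# Illustrative sketch only: the bound semantics that TestViabilityThreshold
# pins down (inclusive min, inclusive max, missing bound means unconstrained).
# `_within_bounds_sketch` is a hypothetical helper for this example and is not
# the real ViabilityThreshold.check, whose internals are not shown here.
def _within_bounds_sketch(value, minimum=None, maximum=None) -> bool:
    """Return True when value lies inside the optional inclusive bounds."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True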

# ── PipelineConfig ──────────────────────────────────────────────────

class TestPipelineConfig:
    def test_round_trip(self):
        pc = PipelineConfig(
            stages=[PipelineStage("extract", ["s1", "s2"])],
            post_batch=[PipelineStage("assess")],
        )
        d = pc.to_dict()
        restored = PipelineConfig.from_dict(d)
        assert len(restored.stages) == 1
        assert restored.stages[0].template == "extract"
        assert restored.stages[0].spaces == ["s1", "s2"]
        assert len(restored.post_batch) == 1

# ── InfospaceConfig ─────────────────────────────────────────────────

class TestInfospaceConfig:
    def test_to_dict_from_dict_round_trip(self):
        cfg = _sample_config()
        d = cfg.to_dict()
        restored = InfospaceConfig.from_dict(d)
        assert restored.topic.name == "Test Topic"
        assert len(restored.disciplines) == 1
        assert restored.schemas.entity == "schemas/entity.md"
        assert restored.competency_questions == "schemas/cq.md"
        assert len(restored.viability) == 2

    def test_viability_thresholds_preserved(self):
        cfg = _sample_config()
        d = cfg.to_dict()
        restored = InfospaceConfig.from_dict(d)
        assert restored.viability["coverage_ratio"].min == 0.6
        assert restored.viability["redundancy_ratio"].max == 0.05

    def test_default_dirs(self):
        cfg = InfospaceConfig(topic=TopicConfig("X"))
        assert cfg.entities_dir == "output/entities"
        assert cfg.evaluations_dir == "output/evaluations"
        assert cfg.metrics_dir == "output/metrics"

    def test_custom_dirs(self):
        cfg = InfospaceConfig.from_dict({
            "topic": {"name": "X"},
            "entities_dir": "custom/entities",
        })
        assert cfg.entities_dir == "custom/entities"

# ── YAML I/O ────────────────────────────────────────────────────────

class TestYAMLIO:
    def test_save_load_round_trip(self, tmp_path):
        cfg = _sample_config()
        p = tmp_path / "infospace.yaml"
        save_infospace_config(cfg, p)
        loaded = load_infospace_config(p)
        assert loaded.topic.name == cfg.topic.name
        assert len(loaded.viability) == len(cfg.viability)

    def test_load_full_example(self, tmp_path):
        p = tmp_path / "infospace.yaml"
        p.write_text(_SAMPLE_YAML, encoding="utf-8")
        cfg = load_infospace_config(p)
        assert cfg.topic.name == "The Wealth of Nations"
        assert cfg.topic.domain == "Classical Economics"
        assert len(cfg.disciplines) == 1
        assert cfg.disciplines[0].name == "Viable System Model"
        assert cfg.schemas.entity == "schemas/economic-entity-schema-v1.0.md"
        assert cfg.competency_questions == "schemas/competency-questions.md"
        assert len(cfg.viability) == 3
        assert cfg.viability["coverage_ratio"].min == 0.60
        assert cfg.viability["redundancy_ratio"].max == 0.05
        assert cfg.pipeline is not None
        assert len(cfg.pipeline.stages) == 2
        assert len(cfg.pipeline.post_batch) == 1

    def test_load_missing_file(self, tmp_path):
        with pytest.raises(FileNotFoundError):
            load_infospace_config(tmp_path / "nonexistent.yaml")

    def test_load_missing_topic(self, tmp_path):
        p = tmp_path / "bad.yaml"
        p.write_text("schemas:\n entity: x.md\n")
        with pytest.raises(ValueError, match="topic"):
            load_infospace_config(p)

    def test_save_creates_parent_dirs(self, tmp_path):
        cfg = InfospaceConfig(topic=TopicConfig("X"))
        p = tmp_path / "deep" / "nested" / "infospace.yaml"
        save_infospace_config(cfg, p)
        assert p.exists()


class TestFindConfig:
    def test_finds_config_in_current_dir(self, tmp_path):
        (tmp_path / "infospace.yaml").write_text("topic:\n name: X\n")
        found = find_infospace_config(tmp_path)
        assert found is not None
        assert found.name == "infospace.yaml"

    def test_finds_config_in_parent(self, tmp_path):
        (tmp_path / "infospace.yaml").write_text("topic:\n name: X\n")
        child = tmp_path / "sub" / "dir"
        child.mkdir(parents=True)
        found = find_infospace_config(child)
        assert found is not None

    def test_returns_none_if_not_found(self, tmp_path):
        assert find_infospace_config(tmp_path) is None

# ── InfospaceState ──────────────────────────────────────────────────

class TestInfospaceState:
    def test_entity_count(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg, entities=_sample_entities(5))
        assert state.entity_count == 5

    def test_topic_name(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        assert state.topic_name == "Test Topic"

    def test_domains(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg, entities=_sample_entities(4))
        assert "Production" in state.domains
        assert "Distribution" in state.domains

    def test_has_evaluations(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        assert state.has_evaluations is False
        snap = EvaluationSnapshot(
            snapshot_id="s1",
            created_at=datetime(2026, 1, 1),
            schema_name="Test",
            entity_count=0,
        )
        state.latest_snapshot = snap
        assert state.has_evaluations is True


class TestViabilityCheck:
    def test_all_pass(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        metrics = {"coverage_ratio": 0.8, "redundancy_ratio": 0.02}
        results = state.check_viability(metrics)
        assert all(r.passed for r in results)
        assert state.is_viable is True

    def test_one_fails(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        metrics = {"coverage_ratio": 0.4, "redundancy_ratio": 0.02}
        results = state.check_viability(metrics)
        assert not all(r.passed for r in results)
        assert state.is_viable is False

    def test_missing_metric_defaults_to_zero(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        # coverage_ratio min=0.6, missing → 0.0 → fails
        results = state.check_viability({})
        coverage = next(r for r in results if r.metric == "coverage_ratio")
        assert coverage.passed is False
        assert coverage.value == 0.0

    def test_viability_counts(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        metrics = {"coverage_ratio": 0.8, "redundancy_ratio": 0.2}
        state.check_viability(metrics)
        assert state.viability_pass_count == 1  # coverage passes
        assert state.viability_total_count == 2

    def test_no_thresholds_not_viable(self):
        cfg = InfospaceConfig(topic=TopicConfig("X"))
        state = InfospaceState(config=cfg)
        assert state.is_viable is False


class TestBuildState:
    def test_builds_with_entities(self):
        cfg = _sample_config()
        entities = _sample_entities(3)
        state = build_state(cfg, entities=entities)
        assert state.entity_count == 3

    def test_builds_with_metrics(self):
        cfg = _sample_config()
        metrics = {"coverage_ratio": 0.9, "redundancy_ratio": 0.01}
        state = build_state(cfg, metrics=metrics)
        assert state.is_viable is True

    def test_summary(self):
        cfg = _sample_config()
        entities = _sample_entities(3)
        metrics = {"coverage_ratio": 0.9, "redundancy_ratio": 0.01}
        state = build_state(cfg, entities=entities, metrics=metrics)
        s = state.summary()
        assert s["topic"] == "Test Topic"
        assert s["entity_count"] == 3
        assert s["viable"] is True


class TestViabilityResult:
    def test_to_dict(self):
        t = ViabilityThreshold("x", min=0.5)
        r = ViabilityResult(metric="x", value=0.7, threshold=t, passed=True)
        d = r.to_dict()
        assert d["metric"] == "x"
        assert d["value"] == 0.7
        assert d["passed"] is True
        assert d["min"] == 0.5
        assert "max" not in d
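
The viability tests above pin down a few behaviours: a metric missing from the input is treated as 0.0, a threshold may carry a `min`, a `max`, or both, the serialised result omits whichever bound is unset, and an infospace with no thresholds is not viable. The following is a minimal, self-contained sketch of that logic. `Threshold`, `Result`, `check_viability`, and `is_viable` are hypothetical stand-ins written for illustration, not markitect's actual implementation, and the `max=0.05` bound used in any example call is an assumed value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Hypothetical threshold: either bound may be left unset."""
    metric: str
    min: Optional[float] = None
    max: Optional[float] = None


@dataclass
class Result:
    """Hypothetical outcome of checking one metric against one threshold."""
    metric: str
    value: float
    threshold: Threshold
    passed: bool

    def to_dict(self) -> dict:
        d = {"metric": self.metric, "value": self.value, "passed": self.passed}
        # Only bounds that are actually set are serialised.
        if self.threshold.min is not None:
            d["min"] = self.threshold.min
        if self.threshold.max is not None:
            d["max"] = self.threshold.max
        return d


def check_viability(thresholds: List[Threshold], metrics: Dict[str, float]) -> List[Result]:
    results = []
    for t in thresholds:
        # A metric that was never measured counts as 0.0.
        value = float(metrics.get(t.metric, 0.0))
        ok = (t.min is None or value >= t.min) and (t.max is None or value <= t.max)
        results.append(Result(t.metric, value, t, ok))
    return results


def is_viable(results: List[Result]) -> bool:
    # An empty result list (no thresholds) is treated as not viable.
    return bool(results) and all(r.passed for r in results)
```

One consequence of the zero default is that a missing metric fails any positive `min` bound but passes any non-negative `max` bound, so an unmeasured redundancy ratio would not by itself block viability in this sketch.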


@@ -0,0 +1,230 @@
"""Tests for markitect.infospace.entity_parser and EntityMeta."""
import logging
from pathlib import Path
import pytest
from markitect.infospace import EntityMeta, parse_entity_file, parse_entity_directory
# ── Fixtures ────────────────────────────────────────────────────────
COMPLETE_ENTITY = """\
# Division of Labour
## Definition
The separation of a work process into a number of distinct tasks, each performed
by a specialised worker, resulting in a significant increase in the productive
powers of labour.
## Source Chapter
Book I, Chapter 1: "Of the Division of Labour"
## Context
The division of labour is the central argument of the chapter.
## Economic Domain
Production
## Smith's Original Wording
"The greatest improvements in the productive powers of labour…"
## Modern Interpretation
The division of labour remains a foundational concept in economics.
"""
MINIMAL_ENTITY = """\
# Minimal Entity
## Definition
A brief definition.
## Source Chapter
Book I, Chapter 1
## Context
Some context.
## Economic Domain
Exchange
"""
SLUG_H1_ENTITY = """\
# effectual-demand
## Definition
Effectual demand is the demand by consumers who are willing and able to pay.
## Source Chapter
Book 1, Chapter 7
## Context
Context for effectual demand.
## Economic Domain
Exchange
## Smith's Original Wording
"Such people may be called the effectual demanders…"
## Modern Interpretation
Represents the intersection of desire and purchasing power.
"""
NO_H1 = """\
## Only H2
Some content.
"""
# ── parse_entity_file ────────────────────────────────────────────────
class TestParseEntityFile:
    def test_complete_entity(self, tmp_path):
        f = tmp_path / "division-of-labour.md"
        f.write_text(COMPLETE_ENTITY)
        meta = parse_entity_file(f)
        assert meta.slug == "division_of_labour"
        assert meta.title == "Division of Labour"
        assert meta.h1_is_title_case is True
        assert meta.has_original_wording is True
        assert meta.domain == "Production"
        assert meta.definition_word_count > 20
        assert "separation" in meta.definition.lower()
        assert meta.source_path == str(f)
        assert "definition" in meta.section_slugs
        assert "smith_s_original_wording" in meta.section_slugs

    def test_minimal_entity(self, tmp_path):
        f = tmp_path / "minimal-entity.md"
        f.write_text(MINIMAL_ENTITY)
        meta = parse_entity_file(f)
        assert meta.slug == "minimal_entity"
        assert meta.has_original_wording is False
        assert meta.original_wording == ""
        assert meta.modern_interpretation == ""
        assert meta.domain == "Exchange"

    def test_slug_format_h1(self, tmp_path):
        f = tmp_path / "effectual-demand.md"
        f.write_text(SLUG_H1_ENTITY)
        meta = parse_entity_file(f)
        assert meta.h1_raw == "effectual-demand"
        assert meta.h1_is_title_case is False
        assert meta.slug == "effectual_demand"
        assert meta.has_original_wording is True

    def test_missing_h1_raises(self, tmp_path):
        f = tmp_path / "no-h1.md"
        f.write_text(NO_H1)
        with pytest.raises(ValueError, match="No H1"):
            parse_entity_file(f)

    def test_missing_sections_return_empty(self, tmp_path):
        f = tmp_path / "minimal.md"
        f.write_text(MINIMAL_ENTITY)
        meta = parse_entity_file(f)
        # Optional sections not present → empty string
        assert meta.original_wording == ""
        assert meta.modern_interpretation == ""

    def test_word_count_accuracy(self, tmp_path):
        f = tmp_path / "test.md"
        f.write_text("# Test\n\n## Definition\n\none two three four five\n")
        meta = parse_entity_file(f)
        assert meta.definition_word_count == 5
# ── parse_entity_directory ──────────────────────────────────────────
class TestParseEntityDirectory:
    def _make_dir(self, tmp_path):
        """Create a temporary entity directory."""
        d = tmp_path / "entities"
        d.mkdir()
        (d / "entity-a.md").write_text(COMPLETE_ENTITY)
        (d / "entity-b.md").write_text(MINIMAL_ENTITY)
        # files that should be excluded by default
        (d / "book-1-chapter-01-entities.md").write_text("# View\n\nview file")
        (d / "book-1-chapter-01-prompt.md").write_text("# Prompt\n\nprompt file")
        return d

    def test_excludes_view_and_prompt(self, tmp_path):
        d = self._make_dir(tmp_path)
        results = parse_entity_directory(d)
        slugs = {e.slug for e in results}
        assert "division_of_labour" in slugs
        assert "minimal_entity" in slugs
        # Excluded files should not be parsed as entities
        assert len(results) == 2

    def test_custom_exclude_patterns(self, tmp_path):
        d = self._make_dir(tmp_path)
        # Only exclude prompt files, allow entity views
        results = parse_entity_directory(d, exclude_patterns=[r".*-prompt\.md$"])
        assert len(results) == 3  # entity-a, entity-b, chapter-01-entities

    def test_malformed_skipped_with_warning(self, tmp_path, caplog):
        d = tmp_path / "entities"
        d.mkdir()
        (d / "good.md").write_text(COMPLETE_ENTITY)
        (d / "bad.md").write_text(NO_H1)
        with caplog.at_level(logging.WARNING):
            results = parse_entity_directory(d)
        assert len(results) == 1
        assert "bad.md" in caplog.text
# ── EntityMeta round-trip ───────────────────────────────────────────
class TestEntityMetaRoundTrip:
    def test_to_dict_from_dict(self, tmp_path):
        f = tmp_path / "entity.md"
        f.write_text(COMPLETE_ENTITY)
        original = parse_entity_file(f)
        data = original.to_dict()
        restored = EntityMeta.from_dict(data)
        assert restored.slug == original.slug
        assert restored.title == original.title
        assert restored.definition == original.definition
        assert restored.h1_is_title_case == original.h1_is_title_case
        assert restored.section_slugs == original.section_slugs
        assert restored.definition_word_count == original.definition_word_count

    def test_from_dict_ignores_unknown_keys(self):
        data = {
            "slug": "test",
            "title": "Test",
            "h1_raw": "Test",
            "unknown_field": "should be ignored",
        }
        meta = EntityMeta.from_dict(data)
        assert meta.slug == "test"
        assert not hasattr(meta, "unknown_field") or "unknown_field" not in meta.__dict__
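
The parser tests expect the file name `division-of-labour.md` to yield the slug `division_of_labour`, the heading `Smith's Original Wording` to yield the section slug `smith_s_original_wording`, and a five-word definition to report a word count of 5. One rule that is consistent with all three expectations is sketched below, assuming (the source does not confirm this) that slugs are built by lowercasing and collapsing every run of non-alphanumeric characters into a single underscore. The helper names `slugify`, `file_slug`, and `word_count` are hypothetical.

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase, then collapse runs of non-alphanumerics into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def file_slug(path: Path) -> str:
    """Derive an entity slug from a file name, ignoring the extension."""
    return slugify(path.stem)


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())
```

Under this rule the apostrophe in `Smith's` becomes its own underscore, which is why the expected section slug contains `smith_s` rather than `smiths`.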


@@ -0,0 +1,224 @@
"""Tests for markitect.infospace.evaluate."""
from datetime import datetime
from pathlib import Path
import pytest
from markitect.infospace.config import InfospaceConfig, TopicConfig
from markitect.infospace.evaluate import (
    build_evaluation_prompt,
    content_digest,
    parse_evaluation_response,
    run_entity_evaluation,
)
from markitect.infospace.evaluation import ScoreEntry
from markitect.infospace.models import EntityMeta
from markitect.prompts.execution.llm_adapter import MockLLMAdapter
from markitect.prompts.execution.models import RunConfig
# ── Helpers ──────────────────────────────────────────────────────────
def _entity(**overrides) -> EntityMeta:
    defaults = dict(
        slug="division-of-labour",
        title="Division Of Labour",
        h1_raw="Division Of Labour",
        definition="Splitting work into specialised tasks.",
        source_chapter="Book I Chapter 1",
        context="Smith introduces the concept early.",
        domain="Production",
        source_path="entities/division-of-labour.md",
    )
    defaults.update(overrides)
    return EntityMeta(**defaults)


def _config() -> InfospaceConfig:
    return InfospaceConfig(topic=TopicConfig(name="The Wealth of Nations"))
_MOCK_RESPONSE = """\
DIMENSION: definition_precision
SCORE: 4.5
RATIONALE: Clear and specific definition of the concept.
DIMENSION: source_grounding
SCORE: 4.0
RATIONALE: Well grounded in Smith's text.
DIMENSION: domain_relevance
SCORE: 5.0
RATIONALE: Directly relevant to production economics.
"""
# ── build_evaluation_prompt ──────────────────────────────────────────
class TestBuildPrompt:
    def test_contains_entity_fields(self):
        entity = _entity()
        prompt = build_evaluation_prompt(entity, "Test Topic")
        assert "division-of-labour" in prompt
        assert "Division Of Labour" in prompt
        assert "Production" in prompt
        assert "Splitting work" in prompt

    def test_contains_topic(self):
        prompt = build_evaluation_prompt(_entity(), "WoN")
        assert "WoN" in prompt

    def test_contains_dimensions(self):
        prompt = build_evaluation_prompt(_entity(), "T")
        assert "definition_precision" in prompt
        assert "source_grounding" in prompt

    def test_custom_dimensions(self):
        prompt = build_evaluation_prompt(
            _entity(), "T", dimensions=["novelty", "coherence"]
        )
        assert "novelty" in prompt
        assert "coherence" in prompt
        assert "definition_precision" not in prompt

    def test_handles_missing_fields(self):
        entity = _entity(definition="", context="", domain="")
        prompt = build_evaluation_prompt(entity, "T")
        assert "(no definition)" in prompt
        assert "(no context)" in prompt
        assert "(unspecified)" in prompt
# ── content_digest ───────────────────────────────────────────────────
class TestContentDigest:
    def test_deterministic(self):
        e = _entity()
        assert content_digest(e) == content_digest(e)

    def test_changes_with_content(self):
        e1 = _entity(definition="A")
        e2 = _entity(definition="B")
        assert content_digest(e1) != content_digest(e2)
# ── parse_evaluation_response ────────────────────────────────────────
class TestParseResponse:
    def test_parses_three_dimensions(self):
        scores = parse_evaluation_response(_MOCK_RESPONSE)
        assert len(scores) == 3

    def test_correct_names(self):
        scores = parse_evaluation_response(_MOCK_RESPONSE)
        names = [s.name for s in scores]
        assert "definition_precision" in names
        assert "source_grounding" in names
        assert "domain_relevance" in names

    def test_correct_scores(self):
        scores = parse_evaluation_response(_MOCK_RESPONSE)
        by_name = {s.name: s for s in scores}
        assert by_name["definition_precision"].value == 4.5
        assert by_name["source_grounding"].value == 4.0
        assert by_name["domain_relevance"].value == 5.0

    def test_correct_rationales(self):
        scores = parse_evaluation_response(_MOCK_RESPONSE)
        by_name = {s.name: s for s in scores}
        assert "Clear" in by_name["definition_precision"].rationale

    def test_empty_response(self):
        scores = parse_evaluation_response("")
        assert scores == []

    def test_malformed_score_skipped(self):
        text = "DIMENSION: x\nSCORE: not-a-number\nRATIONALE: oops"
        scores = parse_evaluation_response(text)
        assert len(scores) == 0
# ── run_entity_evaluation ────────────────────────────────────────────
class TestRunEntityEvaluation:
    def test_evaluates_entities(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        entities = [_entity(), _entity(slug="pin-factory", title="Pin Factory")]
        summary = run_entity_evaluation(
            config=cfg,
            entities=entities,
            adapter=adapter,
            output_dir=tmp_path / "evals",
        )
        assert summary.total == 2
        assert summary.succeeded == 2
        assert adapter.call_count == 2

    def test_writes_evaluation_files(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        entities = [_entity()]
        run_entity_evaluation(
            config=cfg,
            entities=entities,
            adapter=adapter,
            output_dir=tmp_path / "evals",
        )
        eval_file = tmp_path / "evals" / "division-of-labour.md"
        assert eval_file.exists()
        text = eval_file.read_text()
        assert "definition_precision" in text

    def test_incremental_skip(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        entity = _entity()
        digest = content_digest(entity)
        summary = run_entity_evaluation(
            config=cfg,
            entities=[entity],
            adapter=adapter,
            output_dir=tmp_path,
            previous_digests={entity.slug: digest},
        )
        assert summary.skipped == 1
        assert adapter.call_count == 0

    def test_progress_callback_called(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        calls = []
        run_entity_evaluation(
            config=cfg,
            entities=[_entity()],
            adapter=adapter,
            output_dir=tmp_path,
            progress_callback=lambda d, t, r: calls.append((d, t, r.key)),
        )
        assert len(calls) == 1
        assert calls[0] == (1, 1, "division-of-labour")

    def test_passes_run_config(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        rc = RunConfig(temperature=0.1, max_tokens=500)
        run_entity_evaluation(
            config=cfg,
            entities=[_entity()],
            adapter=adapter,
            run_config=rc,
            output_dir=tmp_path,
        )
        assert adapter.last_config.temperature == 0.1
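
The response-parsing tests in this file describe a line-oriented format of `DIMENSION:`, `SCORE:`, and `RATIONALE:` records, in which an empty response yields no scores and a record whose score is not numeric is skipped. A rough sketch of a parser with those properties is given below. It is not the `parse_evaluation_response` from markitect; the `Score` dataclass and the `parse_response` function are hypothetical names, and the choice to drop a record that lacks a valid score is an assumption drawn only from the tests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Score:
    """Hypothetical stand-in for a parsed score record."""
    name: str
    value: float
    rationale: str = ""


def parse_response(text: str) -> List[Score]:
    scores: List[Score] = []
    name: Optional[str] = None
    value: Optional[float] = None
    rationale = ""

    def flush() -> None:
        # Emit the record collected so far, but only if it is complete.
        nonlocal name, value, rationale
        if name is not None and value is not None:
            scores.append(Score(name, value, rationale))
        name, value, rationale = None, None, ""

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("DIMENSION:"):
            flush()  # a new record begins, so close the previous one
            name = line[len("DIMENSION:"):].strip()
        elif line.startswith("SCORE:"):
            try:
                value = float(line[len("SCORE:"):].strip())
            except ValueError:
                value = None  # malformed score: the record will be dropped
        elif line.startswith("RATIONALE:"):
            rationale = line[len("RATIONALE:"):].strip()
    flush()
    return scores
```

Closing each record when the next `DIMENSION:` line arrives, rather than when `RATIONALE:` is seen, keeps records that have no rationale line at all.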


@@ -0,0 +1,398 @@
"""Tests for markitect.infospace evaluation models and I/O."""
from datetime import datetime
from pathlib import Path
import pytest
from markitect.infospace import (
    EntityEvaluation,
    EvaluationSnapshot,
    MetricChange,
    MetricValue,
    ScoreChange,
    ScoreEntry,
    SnapshotDiff,
    append_to_history,
    diff_snapshots,
    read_entity_evaluation,
    read_history,
    read_snapshot,
    write_entity_evaluation,
    write_snapshot,
)
# ── Helpers ──────────────────────────────────────────────────────────
_NOW = datetime(2026, 2, 19, 12, 0, 0)
def _sample_scores() -> list:
    return [
        ScoreEntry("definition_precision", 4.5, rationale="Clear and specific."),
        ScoreEntry("source_grounding", 4.0, rationale="Well grounded."),
        ScoreEntry("domain_relevance", 4.5),
    ]


def _sample_evaluation(**overrides) -> EntityEvaluation:
    defaults = dict(
        entity_slug="division-of-labour",
        evaluator="openrouter/anthropic/claude-3.5-sonnet",
        scores=_sample_scores(),
        evaluated_at=_NOW,
        notes=["Strong entity with clear provenance"],
    )
    defaults.update(overrides)
    return EntityEvaluation(**defaults)


def _sample_metric() -> MetricValue:
    return MetricValue("coverage_ratio", 0.85, concern="C2", details={"checked": 85})


def _sample_snapshot(**overrides) -> EvaluationSnapshot:
    defaults = dict(
        snapshot_id="2026-02-19",
        created_at=_NOW,
        schema_name="Economic Entity",
        entity_count=1,
        entity_evaluations=[_sample_evaluation()],
        collection_metrics=[_sample_metric()],
        metadata={"version": "1.0"},
    )
    defaults.update(overrides)
    return EvaluationSnapshot(**defaults)
# ── Model tests ──────────────────────────────────────────────────────
class TestScoreEntry:
    def test_to_dict_from_dict_round_trip(self):
        se = ScoreEntry("precision", 4.5, 5.0, "Good definition.")
        d = se.to_dict()
        restored = ScoreEntry.from_dict(d)
        assert restored.name == se.name
        assert restored.value == se.value
        assert restored.max_value == se.max_value
        assert restored.rationale == se.rationale

    def test_to_dict_omits_empty_rationale(self):
        se = ScoreEntry("precision", 4.5)
        d = se.to_dict()
        assert "rationale" not in d

    def test_from_dict_defaults(self):
        se = ScoreEntry.from_dict({"name": "x", "value": 3.0})
        assert se.max_value == 5.0
        assert se.rationale == ""


class TestEntityEvaluation:
    def test_overall_score_is_mean(self):
        ev = _sample_evaluation()
        # (4.5 + 4.0 + 4.5) / 3 ≈ 4.333
        assert abs(ev.overall_score - 4.333333) < 0.001

    def test_overall_score_zero_scores(self):
        ev = _sample_evaluation(scores=[])
        assert ev.overall_score == 0.0

    def test_to_dict_from_dict_round_trip(self):
        ev = _sample_evaluation()
        d = ev.to_dict()
        restored = EntityEvaluation.from_dict(d)
        assert restored.entity_slug == ev.entity_slug
        assert restored.evaluator == ev.evaluator
        assert len(restored.scores) == len(ev.scores)
        assert restored.evaluated_at == ev.evaluated_at
        assert restored.notes == ev.notes

    def test_to_dict_includes_overall_score(self):
        ev = _sample_evaluation()
        d = ev.to_dict()
        assert "overall_score" in d
        assert abs(d["overall_score"] - 4.3333) < 0.01


class TestMetricValue:
    def test_to_dict_from_dict_round_trip(self):
        mv = _sample_metric()
        d = mv.to_dict()
        restored = MetricValue.from_dict(d)
        assert restored.name == mv.name
        assert restored.value == mv.value
        assert restored.concern == mv.concern
        assert restored.details == mv.details

    def test_to_dict_omits_empty_concern(self):
        mv = MetricValue("x", 1.0)
        d = mv.to_dict()
        assert "concern" not in d
        assert "details" not in d


class TestEvaluationSnapshot:
    def test_to_dict_from_dict_round_trip(self):
        snap = _sample_snapshot()
        d = snap.to_dict()
        restored = EvaluationSnapshot.from_dict(d)
        assert restored.snapshot_id == snap.snapshot_id
        assert restored.created_at == snap.created_at
        assert restored.schema_name == snap.schema_name
        assert restored.entity_count == snap.entity_count
        assert len(restored.entity_evaluations) == 1
        assert len(restored.collection_metrics) == 1
        assert restored.metadata == snap.metadata

    def test_from_dict_empty_lists(self):
        d = {
            "snapshot_id": "test",
            "created_at": _NOW.isoformat(),
            "schema_name": "Test",
            "entity_count": 0,
        }
        snap = EvaluationSnapshot.from_dict(d)
        assert snap.entity_evaluations == []
        assert snap.collection_metrics == []
        assert snap.metadata == {}
# ── Per-entity file I/O ──────────────────────────────────────────────
class TestEntityEvaluationIO:
    def test_write_creates_file(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        assert p.exists()

    def test_file_has_yaml_frontmatter(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        text = p.read_text()
        assert text.startswith("---\n")
        assert "\n---\n" in text

    def test_frontmatter_contains_expected_keys(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        text = p.read_text()
        for key in ["entity_slug", "evaluator", "evaluated_at", "overall_score", "scores"]:
            assert key in text

    def test_markdown_body_contains_rationales(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        text = p.read_text()
        assert "Clear and specific." in text
        assert "Well grounded." in text
        assert "## definition_precision" in text

    def test_read_back_matches_original(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        restored = read_entity_evaluation(p)
        assert restored.entity_slug == ev.entity_slug
        assert restored.evaluator == ev.evaluator
        assert restored.evaluated_at == ev.evaluated_at
        assert restored.notes == ev.notes
        assert len(restored.scores) == len(ev.scores)

    def test_round_trip_preserves_scores(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        restored = read_entity_evaluation(p)
        for orig, rest in zip(ev.scores, restored.scores):
            assert rest.name == orig.name
            assert rest.value == orig.value
            assert rest.max_value == orig.max_value

    def test_round_trip_preserves_rationales(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        restored = read_entity_evaluation(p)
        assert restored.scores[0].rationale == "Clear and specific."
        assert restored.scores[1].rationale == "Well grounded."
        # Third score has no rationale
        assert restored.scores[2].rationale == ""

    def test_write_creates_parent_dirs(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "deep" / "nested" / "eval.md"
        write_entity_evaluation(ev, p)
        assert p.exists()
# ── Snapshot I/O ─────────────────────────────────────────────────────
class TestSnapshotIO:
    def test_write_creates_file(self, tmp_path):
        snap = _sample_snapshot()
        p = tmp_path / "snapshot.yaml"
        write_snapshot(snap, p)
        assert p.exists()

    def test_read_back_matches_original(self, tmp_path):
        snap = _sample_snapshot()
        p = tmp_path / "snapshot.yaml"
        write_snapshot(snap, p)
        restored = read_snapshot(p)
        assert restored.snapshot_id == snap.snapshot_id
        assert restored.created_at == snap.created_at
        assert restored.schema_name == snap.schema_name
        assert restored.entity_count == snap.entity_count

    def test_round_trip_preserves_entity_evaluations(self, tmp_path):
        snap = _sample_snapshot()
        p = tmp_path / "snapshot.yaml"
        write_snapshot(snap, p)
        restored = read_snapshot(p)
        assert len(restored.entity_evaluations) == 1
        ev = restored.entity_evaluations[0]
        assert ev.entity_slug == "division-of-labour"
        assert len(ev.scores) == 3

    def test_round_trip_preserves_collection_metrics(self, tmp_path):
        snap = _sample_snapshot()
        p = tmp_path / "snapshot.yaml"
        write_snapshot(snap, p)
        restored = read_snapshot(p)
        assert len(restored.collection_metrics) == 1
        m = restored.collection_metrics[0]
        assert m.name == "coverage_ratio"
        assert m.value == 0.85
        assert m.concern == "C2"
# ── History ──────────────────────────────────────────────────────────
class TestHistory:
    def test_append_creates_new_file(self, tmp_path):
        snap = _sample_snapshot()
        hp = tmp_path / "history.yaml"
        append_to_history(snap, hp)
        assert hp.exists()
        history = read_history(hp)
        assert len(history) == 1

    def test_append_adds_to_existing(self, tmp_path):
        hp = tmp_path / "history.yaml"
        snap1 = _sample_snapshot(snapshot_id="snap-1")
        snap2 = _sample_snapshot(snapshot_id="snap-2")
        append_to_history(snap1, hp)
        append_to_history(snap2, hp)
        history = read_history(hp)
        assert len(history) == 2
        assert history[0].snapshot_id == "snap-1"
        assert history[1].snapshot_id == "snap-2"

    def test_multiple_appends_all_preserved(self, tmp_path):
        hp = tmp_path / "history.yaml"
        for i in range(5):
            snap = _sample_snapshot(snapshot_id=f"snap-{i}")
            append_to_history(snap, hp)
        history = read_history(hp)
        assert len(history) == 5
        assert [h.snapshot_id for h in history] == [f"snap-{i}" for i in range(5)]

    def test_read_history_returns_list_in_order(self, tmp_path):
        hp = tmp_path / "history.yaml"
        snap_a = _sample_snapshot(snapshot_id="a")
        snap_b = _sample_snapshot(snapshot_id="b")
        append_to_history(snap_a, hp)
        append_to_history(snap_b, hp)
        history = read_history(hp)
        assert history[0].snapshot_id == "a"
        assert history[1].snapshot_id == "b"
# ── Diffing ──────────────────────────────────────────────────────────
class TestDiffSnapshots:
    def test_identical_snapshots_empty_diff(self):
        snap = _sample_snapshot()
        diff = diff_snapshots(snap, snap)
        assert diff.added_entities == []
        assert diff.removed_entities == []
        assert diff.score_changes == []
        assert diff.metric_changes == []

    def test_added_entity(self):
        before = _sample_snapshot(entity_evaluations=[])
        after = _sample_snapshot()
        diff = diff_snapshots(before, after)
        assert "division-of-labour" in diff.added_entities
        assert diff.removed_entities == []

    def test_removed_entity(self):
        before = _sample_snapshot()
        after = _sample_snapshot(entity_evaluations=[])
        diff = diff_snapshots(before, after)
        assert "division-of-labour" in diff.removed_entities
        assert diff.added_entities == []

    def test_changed_score(self):
        ev_before = _sample_evaluation(scores=[ScoreEntry("precision", 4.0)])
        ev_after = _sample_evaluation(scores=[ScoreEntry("precision", 4.8)])
        before = _sample_snapshot(entity_evaluations=[ev_before])
        after = _sample_snapshot(entity_evaluations=[ev_after])
        diff = diff_snapshots(before, after)
        assert len(diff.score_changes) == 1
        sc = diff.score_changes[0]
        assert sc.entity_slug == "division-of-labour"
        assert sc.dimension == "precision"
        assert sc.before == 4.0
        assert sc.after == 4.8

    def test_changed_metric(self):
        before = _sample_snapshot(
            collection_metrics=[MetricValue("coverage_ratio", 0.80)]
        )
        after = _sample_snapshot(
            collection_metrics=[MetricValue("coverage_ratio", 0.90)]
        )
        diff = diff_snapshots(before, after)
        assert len(diff.metric_changes) == 1
        mc = diff.metric_changes[0]
        assert mc.name == "coverage_ratio"
        assert mc.before == 0.80
        assert mc.after == 0.90

    def test_summary_readable(self):
        ev_before = _sample_evaluation(scores=[ScoreEntry("precision", 4.0)])
        ev_after = _sample_evaluation(scores=[ScoreEntry("precision", 4.8)])
        before = _sample_snapshot(
            snapshot_id="snap-1",
            entity_evaluations=[ev_before],
            collection_metrics=[MetricValue("coverage", 0.80)],
        )
        after = _sample_snapshot(
            snapshot_id="snap-2",
            entity_evaluations=[ev_after],
            collection_metrics=[MetricValue("coverage", 0.90)],
        )
        diff = diff_snapshots(before, after)
        text = diff.summary()
        assert "snap-1" in text
        assert "snap-2" in text
        assert "precision" in text
        assert "coverage" in text

    def test_summary_no_changes(self):
        snap = _sample_snapshot()
        diff = diff_snapshots(snap, snap)
        text = diff.summary()
        assert "No changes" in text
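
The diffing tests above distinguish three outcomes for entities (added, removed, or present in both with a changed score) and report each changed score with its entity slug, dimension, and before/after values. The comparison can be illustrated on plain dictionaries, as in the sketch below. The `Scores` type alias and the `diff_scores` function are hypothetical and deliberately much simpler than the `diff_snapshots` and `SnapshotDiff` objects exercised by the tests; in particular, ignoring dimensions that exist on only one side is an assumption.

```python
from typing import Dict, List, Tuple

# entity slug -> dimension name -> score value (illustrative structure only)
Scores = Dict[str, Dict[str, float]]


def diff_scores(
    before: Scores, after: Scores
) -> Tuple[List[str], List[str], List[Tuple[str, str, float, float]]]:
    """Return (added slugs, removed slugs, changed scores)."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changes: List[Tuple[str, str, float, float]] = []
    # Only entities present on both sides can have changed scores.
    for slug in sorted(set(before) & set(after)):
        for dim, old in before[slug].items():
            new = after[slug].get(dim)
            if new is not None and new != old:
                changes.append((slug, dim, old, new))
    return added, removed, changes
```

Comparing a snapshot with itself produces three empty lists, which corresponds to the "No changes" case in the summary test.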


@@ -0,0 +1,258 @@
"""
Tests for metrics history and viability tracking (S2.5).
"""
from __future__ import annotations
import json
from datetime import datetime, timezone
from pathlib import Path
import pytest
import yaml
from markitect.infospace.checks.orchestrator import CheckReport
from markitect.infospace.checks.granularity import GranularityReport
from markitect.infospace.checks.redundancy import RedundancyReport
from markitect.infospace.config import InfospaceConfig, TopicConfig, ViabilityThreshold
from markitect.infospace.evaluation import EvaluationSnapshot, MetricValue
from markitect.infospace.history import (
    find_snapshot_by_date,
    get_history,
    get_latest_snapshot,
    metric_trend,
    read_metrics_file,
    record_check_results,
    snapshot_from_checks,
    write_metrics_file,
)
# ── helpers ──────────────────────────────────────────────────────────
def _check_report() -> CheckReport:
    return CheckReport(
        redundancy=RedundancyReport(redundancy_ratio=0.1, entity_count=10),
        granularity=GranularityReport(domain_entropy=1.5, entity_count=10),
    )


def _config(tmp_path: Path) -> InfospaceConfig:
    return InfospaceConfig(
        topic=TopicConfig(name="Test Topic", domain="Testing"),
        metrics_dir=str(tmp_path / "metrics"),
    )


def _snapshot(snap_id: str, date_str: str, metrics: dict) -> EvaluationSnapshot:
    return EvaluationSnapshot(
        snapshot_id=snap_id,
        created_at=datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc),
        schema_name="default",
        entity_count=10,
        collection_metrics=[
            MetricValue(name=k, value=v) for k, v in metrics.items()
        ],
    )
# ── snapshot_from_checks ────────────────────────────────────────────
class TestSnapshotFromChecks:
    def test_creates_snapshot(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=10)
        assert snap.entity_count == 10
        assert snap.snapshot_id  # non-empty
        assert snap.created_at is not None

    def test_contains_metrics(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=10)
        metric_names = {m.name for m in snap.collection_metrics}
        assert "redundancy_ratio" in metric_names
        assert "granularity_entropy" in metric_names

    def test_concern_labels(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=10)
        by_name = {m.name: m for m in snap.collection_metrics}
        assert by_name["redundancy_ratio"].concern == "C1"
        assert by_name["granularity_entropy"].concern == "C5"

    def test_custom_schema(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=5, schema_name="custom")
        assert snap.schema_name == "custom"

    def test_metadata(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=5, metadata={"key": "val"})
        assert snap.metadata == {"key": "val"}

    def test_empty_report(self):
        report = CheckReport()
        snap = snapshot_from_checks(report, entity_count=0)
        assert snap.collection_metrics == []
# ── write_metrics_file / read_metrics_file ──────────────────────────
class TestMetricsFileIO:
    def test_round_trip(self, tmp_path):
        path = tmp_path / "metrics.yaml"
        metrics = {"redundancy_ratio": 0.05, "coverage_ratio": 0.85}
        write_metrics_file(metrics, path)
        loaded = read_metrics_file(path)
        assert loaded["redundancy_ratio"] == pytest.approx(0.05)
        assert loaded["coverage_ratio"] == pytest.approx(0.85)

    def test_creates_parent_dirs(self, tmp_path):
        path = tmp_path / "deep" / "nested" / "metrics.yaml"
        write_metrics_file({"x": 1.0}, path)
        assert path.is_file()

    def test_read_missing_file(self, tmp_path):
        path = tmp_path / "nonexistent.yaml"
        assert read_metrics_file(path) == {}

    def test_read_invalid_content(self, tmp_path):
        path = tmp_path / "bad.yaml"
        path.write_text("just a string", encoding="utf-8")
        assert read_metrics_file(path) == {}
# ── record_check_results ────────────────────────────────────────────
class TestRecordCheckResults:
    def test_creates_metrics_file(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        record_check_results(report, cfg, tmp_path, entity_count=10)
        metrics_path = tmp_path / cfg.metrics_dir / "metrics.yaml"
        assert metrics_path.is_file()

    def test_creates_history_file(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        record_check_results(report, cfg, tmp_path, entity_count=10)
        history_path = tmp_path / cfg.metrics_dir / "history.yaml"
        assert history_path.is_file()

    def test_appends_to_history(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        record_check_results(report, cfg, tmp_path, entity_count=10)
        record_check_results(report, cfg, tmp_path, entity_count=12)
        history = get_history(cfg, tmp_path)
        assert len(history) == 2
        assert history[0].entity_count == 10
        assert history[1].entity_count == 12

    def test_returns_snapshot(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        snap = record_check_results(report, cfg, tmp_path, entity_count=10)
        assert snap.snapshot_id
        assert snap.entity_count == 10
# ── get_history / get_latest_snapshot ────────────────────────────────
class TestGetHistory:
    def test_empty_history(self, tmp_path):
        cfg = _config(tmp_path)
        assert get_history(cfg, tmp_path) == []

    def test_get_latest(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        record_check_results(report, cfg, tmp_path, entity_count=5)
        record_check_results(report, cfg, tmp_path, entity_count=10)
        latest = get_latest_snapshot(cfg, tmp_path)
        assert latest is not None
        assert latest.entity_count == 10

    def test_latest_none_when_empty(self, tmp_path):
        cfg = _config(tmp_path)
        assert get_latest_snapshot(cfg, tmp_path) is None
# ── find_snapshot_by_date ────────────────────────────────────────────
class TestFindSnapshotByDate:
    def test_finds_closest(self):
        history = [
            _snapshot("a", "2026-01-01T10:00:00", {"x": 1.0}),
            _snapshot("b", "2026-02-15T10:00:00", {"x": 2.0}),
            _snapshot("c", "2026-03-01T10:00:00", {"x": 3.0}),
        ]
        result = find_snapshot_by_date(history, "2026-02-14")
        assert result is not None
        assert result.snapshot_id == "b"

    def test_exact_match(self):
        history = [
            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0}),
            _snapshot("b", "2026-02-01T00:00:00", {"x": 2.0}),
        ]
        result = find_snapshot_by_date(history, "2026-02-01")
        assert result is not None
        assert result.snapshot_id == "b"

    def test_empty_history(self):
        assert find_snapshot_by_date([], "2026-01-01") is None

    def test_invalid_date(self):
        history = [_snapshot("a", "2026-01-01T00:00:00", {"x": 1.0})]
        assert find_snapshot_by_date(history, "not-a-date") is None

    def test_with_timestamp(self):
        history = [
            _snapshot("a", "2026-01-01T10:00:00", {"x": 1.0}),
            _snapshot("b", "2026-01-01T14:00:00", {"x": 2.0}),
        ]
        result = find_snapshot_by_date(history, "2026-01-01T13:00:00")
        assert result is not None
        assert result.snapshot_id == "b"
# ── metric_trend ─────────────────────────────────────────────────────
class TestMetricTrend:
    def test_extracts_trend(self):
        history = [
            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0, "y": 2.0}),
            _snapshot("b", "2026-02-01T00:00:00", {"x": 1.5, "y": 2.5}),
        ]
        trend = metric_trend(history, "x")
        assert len(trend) == 2
        assert trend[0]["value"] == 1.0
        assert trend[1]["value"] == 1.5

    def test_missing_metric(self):
        history = [
            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0}),
        ]
        assert metric_trend(history, "nonexistent") == []

    def test_empty_history(self):
        assert metric_trend([], "x") == []

    def test_partial_presence(self):
        history = [
            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0}),
            _snapshot("b", "2026-02-01T00:00:00", {"y": 2.0}),  # x missing
            _snapshot("c", "2026-03-01T00:00:00", {"x": 3.0}),
        ]
        trend = metric_trend(history, "x")
        assert len(trend) == 2
        assert trend[0]["value"] == 1.0
        assert trend[1]["value"] == 3.0
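
The tests above pin down two behaviours: a date lookup that returns the nearest snapshot (or `None` for empty history or an unparseable date), and a trend extraction that skips snapshots lacking the metric. A minimal sketch of logic that would satisfy them follows. The `Snapshot` dataclass, the function names `find_closest` and `trend`, and the `"timestamp"` key in each trend point are assumptions made for illustration; the real `markitect.infospace` implementation is not shown in this diff and may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Snapshot:
    """Hypothetical stand-in for the real snapshot record."""
    snapshot_id: str
    timestamp: str  # ISO 8601, e.g. "2026-02-15T10:00:00"
    metrics: dict = field(default_factory=dict)


def find_closest(history: list, date_str: str) -> Optional[Snapshot]:
    """Return the snapshot whose timestamp is nearest to date_str."""
    try:
        target = datetime.fromisoformat(date_str)
    except ValueError:
        return None  # unparseable date, as in test_invalid_date
    if not history:
        return None  # nothing to search, as in test_empty_history
    return min(
        history,
        key=lambda s: abs(
            (datetime.fromisoformat(s.timestamp) - target).total_seconds()
        ),
    )


def trend(history: list, name: str) -> list:
    """Collect one point per snapshot that actually carries the metric."""
    return [
        {"timestamp": s.timestamp, "value": s.metrics[name]}
        for s in history
        if name in s.metrics
    ]
```

A date-only string such as `"2026-02-14"` parses to midnight of that day, so the distance comparison works for both bare dates and full timestamps.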

@@ -0,0 +1,419 @@
"""Tests for markitect.infospace schema and validator modules."""
import pytest

from markitect.infospace import (
    ECONOMIC_ENTITY_SCHEMA,
    BatchComplianceResult,
    ComplianceDiagnostic,
    ComplianceResult,
    EntityMeta,
    EntitySchema,
    EnumConstraint,
    SectionRequirement,
    SectionRule,
    validate_entities,
    validate_entity,
)


# ── Helpers ──────────────────────────────────────────────────────────
def _compliant_entity(**overrides) -> EntityMeta:
    """Return an EntityMeta that passes ECONOMIC_ENTITY_SCHEMA."""
    defaults = dict(
        slug="division_of_labour",
        title="Division of Labour",
        h1_raw="Division of Labour",
        definition=(
            "The separation of a work process into a number of distinct "
            "tasks, each performed by a specialised worker, resulting in "
            "a significant increase in the productive powers of labour."
        ),
        source_chapter='Book I, Chapter 1: "Of the Division of Labour"',
        context="The division of labour is the central argument of the chapter.",
        domain="Production",
        original_wording='"The greatest improvements in the productive powers…"',
        modern_interpretation="Remains foundational in economics.",
        h1_is_title_case=True,
        has_original_wording=True,
        definition_word_count=30,
        total_word_count=100,
        section_slugs=[
            "definition",
            "source_chapter",
            "context",
            "economic_domain",
            "smith_s_original_wording",
            "modern_interpretation",
        ],
        source_path="/tmp/division-of-labour.md",
    )
    defaults.update(overrides)
    return EntityMeta(**defaults)
# ── Single-entity validation ────────────────────────────────────────
class TestValidateEntityCompliant:
    def test_fully_compliant_zero_diagnostics(self):
        entity = _compliant_entity()
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        assert result.diagnostics == []
        assert result.is_compliant is True
        assert result.error_count == 0
        assert result.warning_count == 0
        assert result.checks_run > 0

    def test_summary_shows_pass(self):
        entity = _compliant_entity()
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        assert "PASS" in result.summary()
        assert "division_of_labour" in result.summary()
class TestSectionMissing:
    def test_missing_required_section_error(self):
        entity = _compliant_entity(definition="")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_MISSING" in codes
        assert not result.is_compliant

    def test_empty_required_section_error(self):
        entity = _compliant_entity(definition=" ")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_MISSING" in codes

    def test_optional_section_absent_no_diagnostic(self):
        entity = _compliant_entity(original_wording="", modern_interpretation="")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        # Only optional sections were removed, so the entity stays fully compliant
        assert result.is_compliant is True
        assert result.error_count == 0
        # No SECTION_MISSING or SECTION_RECOMMENDED for optional sections
        section_codes = {d.code for d in result.diagnostics}
        assert "SECTION_MISSING" not in section_codes
        assert "SECTION_RECOMMENDED" not in section_codes
class TestSectionRecommended:
    def test_recommended_section_missing_warning(self):
        schema = EntitySchema(
            name="Test Schema",
            section_rules=(
                SectionRule(
                    slug="definition",
                    label="Definition",
                    requirement=SectionRequirement.RECOMMENDED,
                ),
            ),
        )
        entity = _compliant_entity(definition="")
        result = validate_entity(entity, schema)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_RECOMMENDED" in codes
        severities = [
            d.severity for d in result.diagnostics if d.code == "SECTION_RECOMMENDED"
        ]
        assert severities == ["warning"]
        # Warnings don't break compliance
        assert result.is_compliant is True
class TestWordCountBounds:
    def test_definition_too_short_error(self):
        # Exactly ten words, below the 20-word minimum exercised by the boundary tests
        entity = _compliant_entity(
            definition="only ten words here to test the lower boundary check"
        )
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        short_diags = [d for d in result.diagnostics if d.code == "SECTION_TOO_SHORT"]
        assert len(short_diags) == 1
        assert short_diags[0].severity == "error"
        assert not result.is_compliant

    def test_definition_too_long_warning(self):
        long_def = " ".join(["word"] * 200)
        entity = _compliant_entity(definition=long_def)
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        long_diags = [d for d in result.diagnostics if d.code == "SECTION_TOO_LONG"]
        assert len(long_diags) == 1
        assert long_diags[0].severity == "warning"
        # Warnings don't break compliance
        assert result.is_compliant is True

    def test_definition_at_min_boundary_passes(self):
        exactly_20 = " ".join(["word"] * 20)
        entity = _compliant_entity(definition=exactly_20)
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_TOO_SHORT" not in codes

    def test_definition_at_max_boundary_passes(self):
        exactly_150 = " ".join(["word"] * 150)
        entity = _compliant_entity(definition=exactly_150)
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_TOO_LONG" not in codes
class TestH1Checks:
    def test_slug_format_h1_warning(self):
        entity = _compliant_entity(
            h1_raw="effectual-demand",
            h1_is_title_case=False,
        )
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        h1_diags = [d for d in result.diagnostics if d.code == "H1_NOT_TITLE_CASE"]
        assert len(h1_diags) == 1
        assert h1_diags[0].severity == "warning"
        # Still compliant (it's a warning)
        assert result.is_compliant is True

    def test_h1_missing_error(self):
        entity = _compliant_entity(slug="", h1_raw="")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "H1_MISSING" in codes
        assert not result.is_compliant

    def test_h1_title_case_error_severity(self):
        schema = EntitySchema(
            name="Strict",
            section_rules=(),
            h1_title_case_severity="error",
        )
        entity = _compliant_entity(h1_is_title_case=False)
        result = validate_entity(entity, schema)
        h1_diags = [d for d in result.diagnostics if d.code == "H1_NOT_TITLE_CASE"]
        assert h1_diags[0].severity == "error"
        assert not result.is_compliant
class TestEnumConstraints:
    def test_unknown_domain_warning(self):
        entity = _compliant_entity(domain="Metaphysics")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        enum_diags = [d for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
        assert len(enum_diags) == 1
        assert enum_diags[0].severity == "warning"
        assert result.is_compliant is True

    def test_empty_domain_no_enum_diagnostic(self):
        """Empty domain triggers SECTION_MISSING, not ENUM_VALUE_UNKNOWN."""
        entity = _compliant_entity(domain="")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        enum_codes = [d.code for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
        assert len(enum_codes) == 0
        # But SECTION_MISSING is raised for the required section
        missing_codes = [d.code for d in result.diagnostics if d.code == "SECTION_MISSING"]
        assert len(missing_codes) >= 1

    def test_valid_domain_no_diagnostic(self):
        for domain in ("Production", "Exchange", "Distribution", "Regulation", "General Theory"):
            entity = _compliant_entity(domain=domain)
            result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
            enum_diags = [d for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
            assert len(enum_diags) == 0, f"Unexpected enum diagnostic for domain '{domain}'"
class TestMultipleIssues:
    def test_multiple_issues_on_one_entity(self):
        entity = _compliant_entity(
            definition="too short",
            domain="UnknownDomain",
            h1_is_title_case=False,
        )
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = {d.code for d in result.diagnostics}
        assert "SECTION_TOO_SHORT" in codes
        assert "ENUM_VALUE_UNKNOWN" in codes
        assert "H1_NOT_TITLE_CASE" in codes
        assert len(result.diagnostics) >= 3
class TestCustomSchema:
    def test_custom_schema_different_rules(self):
        schema = EntitySchema(
            name="Custom",
            section_rules=(
                SectionRule(
                    slug="definition",
                    label="Definition",
                    requirement=SectionRequirement.REQUIRED,
                    min_words=5,
                    max_words=50,
                ),
            ),
            enum_constraints=(
                EnumConstraint(
                    field_name="domain",
                    allowed_values=("Alpha", "Beta"),
                    severity="error",
                ),
            ),
            h1_title_case_severity="error",
            require_h1=False,
        )
        entity = _compliant_entity(
            definition="just five words here exactly",
            domain="Alpha",
        )
        result = validate_entity(entity, schema)
        assert result.is_compliant is True
        assert result.schema_name == "Custom"

    def test_custom_enum_error_severity(self):
        schema = EntitySchema(
            name="Strict Enum",
            section_rules=(),
            enum_constraints=(
                EnumConstraint(
                    field_name="domain",
                    allowed_values=("A",),
                    severity="error",
                ),
            ),
        )
        entity = _compliant_entity(domain="B")
        result = validate_entity(entity, schema)
        assert not result.is_compliant
        enum_diags = [d for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
        assert enum_diags[0].severity == "error"
# ── Batch validation ────────────────────────────────────────────────
class TestBatchValidation:
    def test_empty_list(self):
        result = validate_entities([], ECONOMIC_ENTITY_SCHEMA)
        assert result.total_entities == 0
        assert result.compliant_count == 0
        assert result.total_errors == 0
        assert result.total_warnings == 0

    def test_mixed_compliance(self):
        good = _compliant_entity()
        bad = _compliant_entity(slug="bad", definition="")
        result = validate_entities([good, bad], ECONOMIC_ENTITY_SCHEMA)
        assert result.total_entities == 2
        assert result.compliant_count == 1
        assert result.non_compliant_count == 1
        assert result.total_errors >= 1

    def test_summary_format(self):
        good = _compliant_entity()
        bad = _compliant_entity(slug="bad_entity", definition="too short")
        result = validate_entities([good, bad], ECONOMIC_ENTITY_SCHEMA)
        summary = result.summary()
        assert "Schema: Economic Entity" in summary
        assert "Entities: 2" in summary
        assert "Compliant: 1/2" in summary
        assert "division_of_labour" in summary
        assert "bad_entity" in summary

    def test_aggregate_counts(self):
        entities = [
            _compliant_entity(slug="e1"),
            _compliant_entity(slug="e2", definition="short"),
            _compliant_entity(slug="e3", domain="Unknown", h1_is_title_case=False),
        ]
        result = validate_entities(entities, ECONOMIC_ENTITY_SCHEMA)
        assert result.total_entities == 3
        # Batch totals must equal the sum of the per-entity counts
        assert result.total_errors == sum(r.error_count for r in result.results)
        assert result.total_warnings == sum(r.warning_count for r in result.results)

    def test_schema_name_propagated(self):
        result = validate_entities([], ECONOMIC_ENTITY_SCHEMA)
        assert result.schema_name == "Economic Entity"
# ── Default schema checks ──────────────────────────────────────────
class TestDefaultSchema:
    def test_correct_section_count(self):
        assert len(ECONOMIC_ENTITY_SCHEMA.section_rules) == 6

    def test_required_sections(self):
        required = [
            r.slug for r in ECONOMIC_ENTITY_SCHEMA.section_rules
            if r.requirement == SectionRequirement.REQUIRED
        ]
        assert set(required) == {"definition", "source_chapter", "context", "economic_domain"}

    def test_optional_sections(self):
        optional = [
            r.slug for r in ECONOMIC_ENTITY_SCHEMA.section_rules
            if r.requirement == SectionRequirement.OPTIONAL
        ]
        assert set(optional) == {"smith_s_original_wording", "modern_interpretation"}

    def test_domain_enum_values(self):
        domain_constraint = ECONOMIC_ENTITY_SCHEMA.enum_constraints[0]
        assert domain_constraint.field_name == "domain"
        assert set(domain_constraint.allowed_values) == {
            "Production", "Exchange", "Distribution", "Regulation", "General Theory",
        }

    def test_schema_is_frozen(self):
        with pytest.raises(AttributeError):
            ECONOMIC_ENTITY_SCHEMA.name = "Changed"

    def test_section_rule_is_frozen(self):
        rule = ECONOMIC_ENTITY_SCHEMA.section_rules[0]
        with pytest.raises(AttributeError):
            rule.slug = "changed"

    def test_enum_constraint_is_frozen(self):
        constraint = ECONOMIC_ENTITY_SCHEMA.enum_constraints[0]
        with pytest.raises(AttributeError):
            constraint.field_name = "changed"
# ── ComplianceDiagnostic __str__ ────────────────────────────────────
class TestDiagnosticStr:
    def test_basic_str(self):
        d = ComplianceDiagnostic(code="TEST", message="test msg", severity="error")
        assert "[ERROR] TEST: test msg" in str(d)

    def test_str_with_section(self):
        d = ComplianceDiagnostic(
            code="SECTION_MISSING",
            message="Missing.",
            severity="error",
            section="definition",
        )
        s = str(d)
        assert "(section: definition)" in s

    def test_str_with_field(self):
        d = ComplianceDiagnostic(
            code="ENUM_VALUE_UNKNOWN",
            message="Unknown.",
            severity="warning",
            field="domain",
        )
        s = str(d)
        assert "(field: domain)" in s
# ── ComplianceResult properties ─────────────────────────────────────
class TestComplianceResultProperties:
    def test_errors_property(self):
        result = ComplianceResult(entity_slug="test", schema_name="Test")
        result.diagnostics = [
            ComplianceDiagnostic(code="A", message="a", severity="error"),
            ComplianceDiagnostic(code="B", message="b", severity="warning"),
            ComplianceDiagnostic(code="C", message="c", severity="error"),
        ]
        assert len(result.errors) == 2
        assert len(result.warnings) == 1
        assert result.error_count == 2
        assert result.warning_count == 1
        assert not result.is_compliant

    def test_summary_fail(self):
        result = ComplianceResult(entity_slug="test", schema_name="Test", checks_run=5)
        result.diagnostics = [
            ComplianceDiagnostic(code="A", message="a", severity="error"),
        ]
        assert "FAIL" in result.summary()
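
The word-count and compliance tests in this file imply a simple rule set: an empty or whitespace-only section is missing, a section under the minimum is an error, a section over the maximum is only a warning, both bounds are inclusive, and compliance means "no errors". The sketch below illustrates that rule set in isolation. The `Diagnostic` class and the functions `check_word_bounds` and `is_compliant` are hypothetical names, and the 20/150 defaults are inferred from the boundary tests rather than read from the schema itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical, reduced version of a compliance diagnostic."""
    code: str
    severity: str  # "error" or "warning"


def check_word_bounds(text: str, min_words: int = 20, max_words: int = 150) -> list:
    """Return diagnostics for one required section, using inclusive bounds."""
    count = len(text.split())  # whitespace-only text yields zero words
    if count == 0:
        return [Diagnostic("SECTION_MISSING", "error")]
    if count < min_words:
        return [Diagnostic("SECTION_TOO_SHORT", "error")]
    if count > max_words:
        return [Diagnostic("SECTION_TOO_LONG", "warning")]
    return []


def is_compliant(diagnostics: list) -> bool:
    """Warnings are tolerated; any error breaks compliance."""
    return not any(d.severity == "error" for d in diagnostics)
```

Treating over-length text as a warning keeps verbose but otherwise valid entities compliant, which matches what `test_definition_too_long_warning` asserts.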

@@ -0,0 +1,235 @@
"""Tests for embedding adapter, cache, similarity, and factory."""
from pathlib import Path
from unittest import mock

import pytest

from markitect.llm.similarity import (
    cosine_similarity,
    similarity_matrix,
    find_similar_pairs,
)
from markitect.llm.embedding_cache import EmbeddingCache
from markitect.llm.embedding_openai import OpenAICompatibleEmbeddingAdapter
from markitect.llm.embedding_factory import create_embedding_adapter
from markitect.llm.exceptions import LLMConfigurationError, LLMRateLimitError
# ── Similarity math ─────────────────────────────────────────────────
class TestCosineSimilarity:
    def test_identical_vectors(self):
        v = [1.0, 2.0, 3.0]
        assert cosine_similarity(v, v) == pytest.approx(1.0)

    def test_orthogonal_vectors(self):
        a = [1.0, 0.0, 0.0]
        b = [0.0, 1.0, 0.0]
        assert cosine_similarity(a, b) == pytest.approx(0.0)

    def test_opposite_vectors(self):
        a = [1.0, 0.0]
        b = [-1.0, 0.0]
        assert cosine_similarity(a, b) == pytest.approx(-1.0)

    def test_zero_vector(self):
        assert cosine_similarity([0.0, 0.0], [1.0, 2.0]) == 0.0


class TestSimilarityMatrix:
    def test_diagonal_is_one(self):
        vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
        mat = similarity_matrix(vecs)
        for i in range(len(vecs)):
            assert mat[i][i] == pytest.approx(1.0)

    def test_symmetric(self):
        vecs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
        mat = similarity_matrix(vecs)
        n = len(vecs)
        for i in range(n):
            for j in range(n):
                assert mat[i][j] == pytest.approx(mat[j][i])
class TestFindSimilarPairs:
    def test_threshold_filters(self):
        emb = {
            "a": [1.0, 0.0],
            "b": [0.0, 1.0],
            "c": [1.0, 0.01],  # very similar to "a"
        }
        pairs = find_similar_pairs(emb, threshold=0.90)
        slugs_in_pairs = {(s1, s2) for s1, s2, _ in pairs}
        assert ("a", "c") in slugs_in_pairs
        # a-b are orthogonal, should not appear
        assert ("a", "b") not in slugs_in_pairs

    def test_sorted_descending(self):
        emb = {
            "x": [1.0, 0.0, 0.0],
            "y": [0.9, 0.1, 0.0],
            "z": [0.95, 0.05, 0.0],
        }
        pairs = find_similar_pairs(emb, threshold=0.0)
        sims = [s for _, _, s in pairs]
        assert sims == sorted(sims, reverse=True)

    def test_empty_embeddings(self):
        assert find_similar_pairs({}) == []

    def test_single_embedding(self):
        assert find_similar_pairs({"only": [1.0, 0.0]}) == []
# ── Embedding cache ─────────────────────────────────────────────────
class TestEmbeddingCache:
    def test_put_get_roundtrip(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        cache.put("division-of-labour", "abc123", [0.1, 0.2, 0.3])
        assert cache.get("division-of-labour", "abc123") == [0.1, 0.2, 0.3]

    def test_wrong_digest_returns_none(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        cache.put("slug", "digest-v1", [1.0])
        assert cache.get("slug", "digest-v2") is None

    def test_missing_slug_returns_none(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        assert cache.get("nonexistent", "any") is None

    def test_save_load_persists(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        cache.put("slug-a", "d1", [0.5, 0.6])
        cache.save()
        cache2 = EmbeddingCache(tmp_path)
        assert cache2.get("slug-a", "d1") == [0.5, 0.6]

    def test_stats_tracks_hits_and_misses(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        cache.put("s", "d", [1.0])
        cache.get("s", "d")        # hit
        cache.get("s", "wrong")    # miss
        cache.get("missing", "x")  # miss
        s = cache.stats()
        assert s["entries"] == 1
        assert s["hits"] == 1
        assert s["misses"] == 2
# ── Adapter (mocked HTTP) ──────────────────────────────────────────
def _make_embedding_response(vectors):
    """Build a mock API response for the /embeddings endpoint."""
    return {
        "data": [
            {"embedding": vec, "index": i}
            for i, vec in enumerate(vectors)
        ],
        "usage": {"prompt_tokens": 5, "total_tokens": 5},
    }


class TestOpenAICompatibleEmbeddingAdapter:
    def _adapter(self, **kwargs):
        defaults = {"api_key": "sk-test", "provider": "openai"}
        defaults.update(kwargs)
        return OpenAICompatibleEmbeddingAdapter(**defaults)

    @mock.patch("markitect.llm.embedding_openai.post_json")
    def test_embed_returns_vectors_in_order(self, mock_post):
        # Return indices out of order to verify sorting
        mock_post.return_value = {
            "data": [
                {"embedding": [0.2, 0.3], "index": 1},
                {"embedding": [0.1, 0.2], "index": 0},
            ],
            "usage": {},
        }
        adapter = self._adapter()
        result = adapter.embed(["text1", "text2"])
        assert result == [[0.1, 0.2], [0.2, 0.3]]

    @mock.patch("markitect.llm.embedding_openai.post_json")
    def test_embed_payload_structure(self, mock_post):
        mock_post.return_value = _make_embedding_response([[0.1]])
        adapter = self._adapter(model="text-embedding-3-large")
        adapter.embed(["hello"])
        call_args = mock_post.call_args
        url = call_args[0][0]
        payload = call_args[0][1]
        assert url == "https://api.openai.com/v1/embeddings"
        assert payload["model"] == "text-embedding-3-large"
        assert payload["input"] == ["hello"]

    def test_embed_raises_without_api_key(self):
        adapter = OpenAICompatibleEmbeddingAdapter(api_key=None, provider="openai")
        adapter._api_key = None
        with pytest.raises(LLMConfigurationError):
            adapter.embed(["test"])

    def test_validate_true_with_key(self):
        adapter = self._adapter()
        assert adapter.validate() is True

    def test_validate_false_without_key(self):
        adapter = OpenAICompatibleEmbeddingAdapter(api_key=None, provider="openai")
        adapter._api_key = None
        assert adapter.validate() is False

    # Decorators apply bottom-up, so the time.sleep mock is the first argument
    @mock.patch("markitect.llm.embedding_openai.post_json")
    @mock.patch("markitect.llm.embedding_openai.time.sleep")
    def test_retry_on_429(self, mock_sleep, mock_post):
        mock_post.side_effect = [
            LLMRateLimitError("rate limited", status_code=429),
            _make_embedding_response([[0.1, 0.2]]),
        ]
        adapter = self._adapter(max_retries=2)
        result = adapter.embed(["test"])
        assert result == [[0.1, 0.2]]
        assert mock_sleep.call_count == 1

    def test_openai_provider_base_url(self):
        adapter = self._adapter(provider="openai")
        assert adapter._api_base == "https://api.openai.com/v1"

    def test_openrouter_provider_base_url(self):
        adapter = self._adapter(provider="openrouter")
        assert adapter._api_base == "https://openrouter.ai/api/v1"

    def test_unknown_provider_raises(self):
        with pytest.raises(LLMConfigurationError):
            OpenAICompatibleEmbeddingAdapter(api_key="sk-test", provider="unknown")
# ── Factory ─────────────────────────────────────────────────────────
class TestCreateEmbeddingAdapter:
    def test_openai_provider(self):
        adapter = create_embedding_adapter("openai", api_key="sk-test")
        assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
        assert adapter._provider == "openai"

    def test_openrouter_provider(self):
        adapter = create_embedding_adapter("openrouter", api_key="sk-test")
        assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
        assert adapter._provider == "openrouter"

    def test_unknown_provider_raises(self):
        with pytest.raises(LLMConfigurationError) as exc_info:
            create_embedding_adapter("unknown")
        assert "unknown" in str(exc_info.value)

    def test_model_passed_through(self):
        adapter = create_embedding_adapter(
            "openai", model="text-embedding-3-large", api_key="sk-test"
        )
        assert adapter._model == "text-embedding-3-large"
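
For readers unfamiliar with the similarity math these tests exercise, here is a minimal pure-Python sketch of cosine similarity and threshold-filtered pair search. It is an illustration written to agree with the assertions above, not the code in `markitect.llm.similarity`; the names `cosine` and `similar_pairs`, the default threshold of 0.85, and the choice to enumerate pairs in sorted slug order are all assumptions.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; report "no similarity"
    return dot / (norm_a * norm_b)


def similar_pairs(embeddings, threshold=0.85):
    """Return (slug1, slug2, similarity) triples at or above threshold,
    most similar first. Each unordered pair is visited exactly once."""
    scored = [
        (s1, s2, cosine(embeddings[s1], embeddings[s2]))
        for s1, s2 in combinations(sorted(embeddings), 2)
    ]
    kept = [p for p in scored if p[2] >= threshold]
    return sorted(kept, key=lambda p: p[2], reverse=True)
```

Returning 0.0 for a zero vector avoids a division by zero and matches `test_zero_vector`. With fewer than two embeddings there are no pairs to compare, so the result is an empty list.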

@@ -0,0 +1,281 @@
"""Tests for markitect.prompts.execution.batch."""
import pytest

from markitect.prompts.execution.batch import (
    BatchEvaluator,
    BatchItem,
    BatchResult,
    BatchSummary,
)
from markitect.prompts.execution.llm_adapter import MockLLMAdapter, ErrorLLMAdapter
from markitect.prompts.execution.models import RunConfig, LLMResponse


# ── Helpers ──────────────────────────────────────────────────────────
def _items(n=3, digest_prefix="d"):
    return [
        BatchItem(
            key=f"entity-{i}",
            prompt=f"Evaluate entity {i}",
            content_digest=f"{digest_prefix}{i}",
            metadata={"index": i},
        )
        for i in range(n)
    ]
# ── BatchItem / BatchResult / BatchSummary ───────────────────────────
class TestBatchModels:
    def test_batch_item_defaults(self):
        item = BatchItem(key="slug", prompt="text")
        assert item.content_digest == ""
        assert item.metadata == {}

    def test_batch_result_defaults(self):
        result = BatchResult(key="slug", status="success")
        assert result.response is None
        assert result.error is None

    def test_summary_total_tokens(self):
        s = BatchSummary(total_prompt_tokens=100, total_completion_tokens=50)
        assert s.total_tokens == 150

    def test_summary_success_rate_all_success(self):
        s = BatchSummary(total=3, succeeded=3)
        assert s.success_rate() == 1.0

    def test_summary_success_rate_with_failures(self):
        s = BatchSummary(total=4, succeeded=2, failed=2)
        assert s.success_rate() == pytest.approx(0.5)

    def test_summary_success_rate_all_skipped(self):
        s = BatchSummary(total=3, skipped=3)
        assert s.success_rate() == 1.0

    def test_summary_success_rate_mixed(self):
        s = BatchSummary(total=5, succeeded=2, failed=1, skipped=2)
        # 3 attempted, 2 succeeded
        assert s.success_rate() == pytest.approx(2 / 3)
# ── BatchEvaluator ──────────────────────────────────────────────────
class TestBatchEvaluator:
    def test_evaluate_all_items(self):
        adapter = MockLLMAdapter("result")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(3))
        assert summary.total == 3
        assert summary.succeeded == 3
        assert summary.failed == 0
        assert summary.skipped == 0
        assert len(summary.results) == 3
        assert adapter.call_count == 3

    def test_results_preserve_keys(self):
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        items = _items(2)
        summary = evaluator.evaluate(items)
        keys = [r.key for r in summary.results]
        assert keys == ["entity-0", "entity-1"]

    def test_results_preserve_metadata(self):
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        items = _items(1)
        summary = evaluator.evaluate(items)
        assert summary.results[0].metadata == {"index": 0}

    def test_response_content_available(self):
        adapter = MockLLMAdapter("evaluated text")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(1))
        assert summary.results[0].response.content == "evaluated text"

    def test_token_usage_aggregated(self):
        adapter = MockLLMAdapter("result")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(3))
        assert summary.total_prompt_tokens > 0
        assert summary.total_completion_tokens > 0
        assert summary.total_tokens == summary.total_prompt_tokens + summary.total_completion_tokens

    def test_config_passed_to_adapter(self):
        adapter = MockLLMAdapter("ok")
        config = RunConfig(temperature=0.1, max_tokens=500)
        evaluator = BatchEvaluator(adapter, config=config)
        evaluator.evaluate(_items(1))
        assert adapter.last_config.temperature == 0.1
        assert adapter.last_config.max_tokens == 500
# ── Incremental evaluation ──────────────────────────────────────────
class TestIncrementalEvaluation:
    def test_skip_unchanged_items(self):
        adapter = MockLLMAdapter("result")
        previous = {"entity-0": "d0", "entity-1": "d1", "entity-2": "d2"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)
        summary = evaluator.evaluate(_items(3))
        assert summary.skipped == 3
        assert summary.succeeded == 0
        assert adapter.call_count == 0

    def test_evaluate_changed_items(self):
        adapter = MockLLMAdapter("result")
        # Only entity-0 has matching digest
        previous = {"entity-0": "d0"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)
        summary = evaluator.evaluate(_items(3))
        assert summary.skipped == 1
        assert summary.succeeded == 2
        assert adapter.call_count == 2

    def test_evaluate_new_items(self):
        adapter = MockLLMAdapter("result")
        # Previous has different keys
        previous = {"old-entity": "old-digest"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)
        summary = evaluator.evaluate(_items(2))
        assert summary.skipped == 0
        assert summary.succeeded == 2

    def test_changed_digest_not_skipped(self):
        adapter = MockLLMAdapter("result")
        # Same key but different digest
        previous = {"entity-0": "old-digest"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)
        summary = evaluator.evaluate(_items(1))
        assert summary.skipped == 0
        assert summary.succeeded == 1

    def test_empty_digest_not_skipped(self):
        adapter = MockLLMAdapter("result")
        previous = {"entity-0": "d0"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)
        item = BatchItem(key="entity-0", prompt="eval", content_digest="")
        summary = evaluator.evaluate([item])
        assert summary.skipped == 0
        assert summary.succeeded == 1

    def test_skipped_status_in_result(self):
        adapter = MockLLMAdapter("result")
        previous = {"entity-0": "d0"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)
        summary = evaluator.evaluate(_items(1))
        assert summary.results[0].status == "skipped"
        assert summary.results[0].response is None
# ── Error handling ──────────────────────────────────────────────────
class TestBatchErrorHandling:
    def test_error_captured_not_raised(self):
        adapter = ErrorLLMAdapter("kaboom")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(2))
        assert summary.failed == 2
        assert summary.succeeded == 0

    def test_error_message_in_result(self):
        adapter = ErrorLLMAdapter("something went wrong")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(1))
        assert summary.results[0].status == "error"
        assert "something went wrong" in summary.results[0].error

    def test_error_does_not_stop_batch(self):
        """One failing item doesn't prevent others from running."""
        call_count = 0

        class FailOnFirstAdapter(MockLLMAdapter):
            def execute_prompt(self, prompt, config):
                nonlocal call_count
                call_count += 1
                if call_count == 1:
                    raise RuntimeError("first fails")
                return super().execute_prompt(prompt, config)

        adapter = FailOnFirstAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(3))
        assert summary.failed == 1
        assert summary.succeeded == 2
        assert summary.results[0].status == "error"
        assert summary.results[1].status == "success"
        assert summary.results[2].status == "success"
# ── Progress callback ───────────────────────────────────────────────
class TestProgressCallback:
    def test_callback_called_for_each_item(self):
        calls = []
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(
            adapter,
            progress_callback=lambda done, total, result: calls.append(
                (done, total, result.key)
            ),
        )
        evaluator.evaluate(_items(3))
        assert len(calls) == 3
        assert calls[0] == (1, 3, "entity-0")
        assert calls[1] == (2, 3, "entity-1")
        assert calls[2] == (3, 3, "entity-2")

    def test_callback_receives_result(self):
        results = []
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(
            adapter,
            progress_callback=lambda done, total, result: results.append(result),
        )
        evaluator.evaluate(_items(2))
        assert all(isinstance(r, BatchResult) for r in results)
        assert results[0].status == "success"

    def test_no_callback_no_error(self):
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        # Should work fine without callback
        summary = evaluator.evaluate(_items(1))
        assert summary.succeeded == 1
# ── Empty batch ─────────────────────────────────────────────────────
class TestEmptyBatch:
    def test_empty_items(self):
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate([])
        assert summary.total == 0
        assert summary.succeeded == 0
        assert summary.results == []
        assert adapter.call_count == 0
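
Two small rules carry most of the behaviour tested in this file: when an item may be skipped during incremental evaluation, and how the success rate treats skipped items. The sketch below states both rules as standalone functions. `should_skip` and `success_rate` are hypothetical helpers written to match the assertions above; the actual `BatchEvaluator` and `BatchSummary` internals are not part of this diff and may be organised differently.

```python
def should_skip(key, content_digest, previous_digests):
    """Skip only when the item has a digest and it matches the stored one.

    An empty digest means "unknown content", so the item is always
    re-evaluated (see test_empty_digest_not_skipped).
    """
    if not content_digest:
        return False
    return previous_digests.get(key) == content_digest


def success_rate(succeeded, failed):
    """Fraction of attempted items that succeeded.

    Skipped items are not attempts, so they are left out of the
    denominator; a batch with nothing attempted counts as fully successful.
    """
    attempted = succeeded + failed
    if attempted == 0:
        return 1.0
    return succeeded / attempted
```

Excluding skipped items from the denominator means a rerun over an unchanged collection reports 1.0 rather than 0.0, which is the behaviour `test_summary_success_rate_all_skipped` expects.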