Compare commits: 60f33443ae...3ac8447c10 (19 commits)

| Author | SHA1 | Date |
|---|---|---|
| | 3ac8447c10 | |
| | 94cb2063af | |
| | d1c6e53754 | |
| | b76d6d38c1 | |
| | ce7f78d57d | |
| | 11585e6968 | |
| | 3461d2f354 | |
| | 3726503adb | |
| | b20fe4db68 | |
| | 144a88c0c2 | |
| | dc22017b7c | |
| | f8c9ab33f0 | |
| | bad01e32bd | |
| | 267368eb60 | |
| | 9031e1162c | |
| | 03c6c5e8de | |
| | b5e994b014 | |
| | 4ce856d4d0 | |
| | 2f0989f9bf | |

1
.gitignore
vendored
@@ -78,6 +78,7 @@ Thumbs.db

# MarkiTect database files (local development)
markitect.db
**/infospace.db
assets/assets.db
**/assets.db
.markitect/

344
docs/infospace-primitives.md
Normal file
@@ -0,0 +1,344 @@
# Infospace Primitives Reference

This document describes the primitives provided by the `markitect/infospace/`
package for creating, evaluating, maintaining, and composing infospaces.

---

## Core Concepts

An **infospace** is a structured, evaluable, composable collection of
entities that explains a **topic** through the lens of one or more
**disciplines**.

| Term | Meaning |
|------|---------|
| **Topic** | The subject matter being explained |
| **Discipline** | A reusable framework of concepts applied as an analytical lens |
| **Entity** | The atomic unit of knowledge — slug, definition, provenance, domain |
| **Evaluation** | Per-entity or collection-level quality assessment |
| **Viability** | Whether an infospace meets its threshold scores |

---

## Configuration (`infospace.yaml`)

Every infospace is declared via an `infospace.yaml` file. The configuration
model is defined in `markitect/infospace/config.py`.

### Minimal example

```yaml
topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/

disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/

schemas:
  entity: schemas/economic-entity-schema-v1.0.md

viability:
  coverage_ratio: { min: 0.60 }
  redundancy_ratio: { max: 0.05 }
  per_entity_mean: { min: 3.5 }
```

### Key models

- **`TopicConfig`** — `name`, `domain`, `sources`
- **`DisciplineBinding`** — `name`, `path` (to another infospace directory)
- **`SchemaRegistry`** — `entity`, `mapping`, `analysis` schema paths
- **`ViabilityThreshold`** — `metric`, `min`, `max` bounds
- **`PipelineConfig`** — Ordered list of `PipelineStage` entries
- **`InfospaceConfig`** — Top-level config combining all of the above

### Default directories

| Setting | Default |
|---------|---------|
| `entities_dir` | `output/entities` |
| `evaluations_dir` | `output/evaluations` |
| `metrics_dir` | `output/metrics` |

---

## Entity Metadata

Entities are parsed from markdown files by `markitect/infospace/entity_parser.py`.

**`EntityMeta`** fields: `slug`, `title`, `definition`, `domain`,
`source_chapter`, `context`, `original_wording`, `modern_interpretation`,
`definition_word_count`, `total_word_count`, `section_slugs`.

```python
from pathlib import Path

from markitect.infospace import parse_entity_directory

# Parse every entity markdown file in the default entities directory.
entities = parse_entity_directory(Path("output/entities"))
```

---

## Schema Validation

Deterministic validation of entity files against structural schemas.

```python
from markitect.infospace import validate_entity, ECONOMIC_ENTITY_SCHEMA
result = validate_entity(entity_meta, schema=ECONOMIC_ENTITY_SCHEMA)
print(result.summary())
```

Checks: section presence, word count ranges, heading format, enum values.

---

## Per-entity Evaluation

LLM-based quality assessment of individual entities. Defined in
`markitect/infospace/evaluate.py`.

```bash
# Evaluate all entities
markitect infospace evaluate --provider openrouter

# Single entity
markitect infospace evaluate --entity division-of-labour --provider openrouter
```

### Pipeline functions

- `build_evaluation_prompt(entity, topic, dimensions)` — build the LLM prompt
- `parse_evaluation_response(text, dimensions)` — parse LLM output to `ScoreEntry` list
- `run_entity_evaluation(config, entities, adapter, ...)` — full batch pipeline

Results are written to `output/evaluations/` as YAML frontmatter + markdown.

---

## Collection-level Checks

Five concerns assessed at the collection level. Each has a dedicated
module in `markitect/infospace/checks/`.

| Concern | Module | Key metric |
|---------|--------|------------|
| **C1 — Redundancy** | `redundancy.py` | `redundancy_ratio` |
| **C2 — Coverage** | `coverage.py` | `coverage_ratio` |
| **C3 — Coherence** | `coherence.py` | `coherence_components`, `modularity` |
| **C4 — Consistency** | `consistency.py` | `consistency_cycles` |
| **C5 — Granularity** | `granularity.py` | `granularity_entropy` |

### Orchestrator

```python
from markitect.infospace.checks import run_all_checks
report = run_all_checks(entities, embeddings=emb, graph=g)
metrics = report.metrics()  # Dict[str, float]
```

### CLI

```bash
# Run all checks
markitect infospace check

# Run specific concerns
markitect infospace check --concern redundancy --concern coverage

# JSON output
markitect infospace check --json
```

After each check run, metrics are automatically recorded to history.

---

## Metrics History

Timestamped snapshots track metrics over time. Defined in
`markitect/infospace/history.py`.

```bash
# Show history
markitect infospace history

# Trend for a single metric
markitect infospace history --metric coverage_ratio

# Compare two snapshots
markitect infospace history-diff 2026-02-01 2026-03-01
```

### Key functions

- `snapshot_from_checks(report, entity_count)` — create snapshot from check results
- `record_check_results(report, config, root, entity_count)` — save metrics + append to history
- `get_history(config, root)` — read full history
- `metric_trend(history, metric_name)` — extract single metric across time

---

## Viability

Viability is assessed by comparing current metrics to thresholds declared
in `infospace.yaml`.

```bash
markitect infospace viability
```

### Threshold model

```yaml
viability:
  coverage_ratio: { min: 0.60 }      # must be >= 0.60
  redundancy_ratio: { max: 0.05 }    # must be <= 0.05
  consistency_cycles: { max: 0 }     # must be exactly 0
```

Each threshold has `min` and/or `max` bounds. A metric passes if it falls
within bounds. An infospace is viable when all thresholds pass.
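
The threshold rule can be sketched in a few lines of Python. This is a minimal illustration, not the `ViabilityThreshold` model from `markitect/infospace/config.py`: the `Threshold` class, its `passes` method, and `is_viable` are hypothetical names, and treating a metric that was never measured as a failure is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Hypothetical stand-in for a viability threshold with optional bounds."""
    metric: str
    min: Optional[float] = None
    max: Optional[float] = None

    def passes(self, value: float) -> bool:
        # A missing bound means "no constraint on that side".
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


def is_viable(thresholds: List[Threshold], metrics: Dict[str, float]) -> bool:
    # Assumption: a threshold whose metric is absent counts as a failure.
    return all(
        t.metric in metrics and t.passes(metrics[t.metric]) for t in thresholds
    )


thresholds = [
    Threshold("coverage_ratio", min=0.60),
    Threshold("redundancy_ratio", max=0.05),
    Threshold("consistency_cycles", max=0),
]
metrics = {"coverage_ratio": 0.75, "redundancy_ratio": 0.02, "consistency_cycles": 0}
print(is_viable(thresholds, metrics))
```

Because cycle counts cannot be negative, `max: 0` is equivalent to requiring exactly zero cycles, which is why the YAML comment reads "must be exactly 0".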

---

## Composition

One infospace can use another as a discipline. The composition model is
defined in `markitect/infospace/composition.py`.

### Binding a discipline

```bash
markitect infospace bind-discipline ./path/to/vsm-infospace --name "Viable System Model"
```

This adds a `DisciplineBinding` to `infospace.yaml` and validates the
discipline exists and has an `infospace.yaml`.

### Checking discipline status

```bash
markitect infospace disciplines
```

Shows: name, entity count, viability status, path.

### Viability requirement

A discipline must meet its own viability thresholds to be considered
reliable. The `check_discipline_status()` function loads the discipline's
metrics and runs its own threshold checks.

### Stale mapping detection

```bash
markitect infospace stale-mappings
```

Compares local mapping references against the discipline's current entity
set. If a referenced discipline entity has been removed, the mapping is
flagged as stale.

### Key functions

- `resolve_discipline_path(binding, root)` — resolve to absolute path
- `load_discipline_config(binding, root)` — load discipline's `infospace.yaml`
- `check_discipline_status(binding, root)` — full status with viability
- `get_discipline_entities(binding, root)` — entity list from discipline
- `find_stale_mappings(config, root, mapping_references)` — detect stale refs
- `bind_discipline(config, name, path, root)` — add binding to config

---

## Evaluation Output Format

Evaluation results use YAML frontmatter + markdown body. Defined in
`markitect/infospace/evaluation.py` and `evaluation_io.py`.

### Per-entity evaluation file

```markdown
---
entity_slug: division-of-labour
evaluator: openrouter/default
evaluated_at: '2026-02-19T10:30:00'
overall_score: 4.1667
scores:
  - name: definition_precision
    value: 4.5
    max_value: 5.0
  ...
---

# Evaluation: Division Of Labour

## definition_precision — 4.5 / 5.0

The definition clearly captures the core concept...
```

### Snapshot

```yaml
snapshot_id: abc12345
created_at: '2026-02-19T10:30:00+00:00'
schema_name: default
entity_count: 85
entity_evaluations: [...]
collection_metrics:
  - name: coverage_ratio
    value: 0.75
    concern: C2
```

---

## State

Runtime state is computed from entities, evaluations, and metrics.
Defined in `markitect/infospace/state.py`.

```python
from markitect.infospace import build_state
state = build_state(config, entities=entities, metrics=metrics)
state.is_viable          # True if all thresholds pass
state.viability_results  # List[ViabilityResult]
state.summary()          # Dict for display
```

---

## CLI Command Summary

All commands are under `markitect infospace`:

| Command | Purpose |
|---------|---------|
| `init` | Create a new `infospace.yaml` |
| `status` | Show entity count, domains, evaluation state |
| `entities` | List entities with metadata |
| `evaluate` | Run per-entity LLM evaluation |
| `check` | Run collection-level quality checks (C1-C5) |
| `viability` | Show viability dashboard |
| `history` | Show metrics history |
| `history-diff` | Compare two snapshots by date |
| `bind-discipline` | Bind an external infospace as a discipline |
| `disciplines` | List bound disciplines and viability |
| `stale-mappings` | Detect stale cross-infospace references |

---

## Platform Dependencies

The infospace tooling builds on these platform modules:

| Module | Used for |
|--------|----------|
| `markitect/llm/` | Embedding adapters, LLM evaluation |
| `markitect/analysis/graph.py` | Graph analysis (networkx wrapper) |
| `markitect/analysis/fca.py` | Formal Concept Analysis |
| `markitect/prompts/execution/batch.py` | Batch LLM evaluation |
| `markitect/prompts/dependencies/models.py` | DependencyGraph |
@@ -37,3 +37,513 @@ no automatic parsing for this format, requiring manual macro construction.
**Fix applied:** Added `SHORTHAND_PATTERN` to `MacroParser` that recognises
`@{target}` and maps it to `MacroKind.REQUIRED`. Updated `has_macros()`,
`count_macros()`, and `find_macro_positions()` accordingly.

---

## Assignment Assessment (18 Feb 2026)

How the example measures against the objectives stated in `README.md`:

| # | Objective | Status | Notes |
|---|-----------|--------|-------|
| 1 | Capture knowledge from Wealth of Nations | **Partial** | 7 of 35 chapters processed (Book I, ch. 1-7). 85 canonical entities extracted. |
| 2 | Transform to VSM concepts/entities | **Done (for processed chapters)** | Entities mapped to S1-S5 with strength ratings. |
| 3 | Consistent and complete | **Not yet** | Only 20% of chapters done. Metrics report exists but covers limited scope. |
| 4 | Schemas as scaffolding | **Done** | Four schemas defined and used across all stages. |
| 5 | Prompt dependency resolution | **Done** | `@{macro}` templates resolved via MultiSpaceResolutionStrategy. |
| 6 | Incremental chapter injection | **Done** | Pipeline processes one chapter at a time; `@{existing_entities}` prevents duplication. |
| 7 | Keep changes as git history | **Not done** | See task 4 below. |
| 8 | Metrics for completeness/consistency | **Partial** | Template and report exist but only cover 4 chapters (report predates ch. 5-7). |
| 9 | No infrastructure changes during experiment | **Violated** | Three infra fixes were required (tasks 1-3 above). Documented as intended. |
| 10 | Generate task list for infra issues | **Done** | This file. |

## 4. Infospace has no per-chapter git history — OPEN

**Objective:** README states "The information space should utilize the option
of keeping changes as git history."
**Issue:** The 7 processed chapters were committed in mixed batches alongside
infrastructure changes (LLM adapters, entity refactoring, archive policy).
Chapters 1-2 are bundled into `fecc2fd` with the entire LLM module.
Chapters 5-7 share a single commit (`41773f1`) with the OpenAI adapter and
archive policy. There is no commit where you can `git diff` to see exactly
what one chapter contributed to the infospace.
**Impact:** Cannot use `git log`, `git diff`, or `git bisect` to trace how
the infospace grew chapter by chapter — the core promise of "with history."
**Suggested fix:** Re-run the 7 processed chapters (and remaining 28) using
`process_chapters.py` without `--no-commit`, on a clean branch or after
squashing the current output into a baseline commit. Each chapter gets its
own commit via `_git_commit_chapter()`.

## 5. Prompt files are regenerated as a side-effect of DB rebuild — OPEN

**Issue:** Running `--all --no-commit` to regenerate `infospace.db` also
overwrites `*-prompt.md` files in the output directories because each
pipeline stage unconditionally writes the compiled prompt before checking
whether output already exists. The `@{existing_entities}` macro content
shifts as earlier chapters are loaded, so prompt files for already-processed
chapters change on every full run.
**Impact:** A DB regeneration dirties the working tree with prompt file
changes, even though no actual outputs changed. Users must `git checkout`
the prompt files after regeneration.
**Suggested fix:** Skip writing prompt files when the corresponding output
file already exists on disk, or add a `--rebuild-db-only` flag that
populates the database without touching the file system.
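
The first option (skip the write when the output exists) might look roughly like the guard below. This is a sketch only: `write_prompt_if_needed` and its parameters are hypothetical and are not taken from `process_chapters.py`.

```python
from pathlib import Path


def write_prompt_if_needed(prompt_path: Path, output_path: Path, prompt: str) -> bool:
    """Write the compiled prompt unless the stage output already exists.

    Returns True when the prompt file was written. Hypothetical helper; the
    names are illustrative only.
    """
    if output_path.exists():
        # Output already on disk: leave the committed prompt file untouched
        # so a DB rebuild does not dirty the working tree.
        return False
    prompt_path.parent.mkdir(parents=True, exist_ok=True)
    prompt_path.write_text(prompt, encoding="utf-8")
    return True
```

The check runs before the write, which reverses the order described in the issue (write first, check afterwards).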

## 6. Metrics report is stale — OPEN

**Issue:** The metrics report (`output/metrics/metrics-report.md`) was
generated after chapters 1-4. Chapters 5-7 have since been processed but
the report has not been refreshed.
**Impact:** The metrics do not reflect the current state of the infospace.
**Suggested fix:** Re-run `--metrics --provider <provider> --no-commit`
after every batch of new chapters. Consider making metrics assessment
automatic at the end of `--book` or `--all` runs.

## 7. Remaining 28 chapters not yet processed — OPEN

**Issue:** Only Book I chapters 1-7 have been processed. Books II-V
(28 chapters) remain unprocessed.
**Impact:** The infospace is incomplete — VSM coverage is limited to S1,
S2, and partial S4. S3, S3*, S5, and many systemic concepts (algedonic
signals, recursion, variety) are expected to emerge from later books.
**Suggested fix:** Process remaining chapters in book-sized batches with
per-chapter commits, refreshing metrics after each book.

---

## Per-Concept Metrics (tasks 8-12)

The current metrics system is a single LLM-evaluated narrative report that
assesses the infospace as a whole. It produces no machine-readable output,
cannot be tracked over time, and conflates per-concept quality with
collection-level coherence.

The improvement splits metrics into two layers:

- **LLM-Eval**: A prompt template evaluates each concept individually
  against quality criteria defined in the schema. The LLM returns structured
  scores, not prose.
- **Deterministic aggregation**: `process_chapters.py` computes what it can
  from files on disk (schema compliance, word counts, section presence,
  coverage tallies) and aggregates LLM-eval scores into dashboard metrics.

Both layers persist results in structured form so they can be diffed,
tracked over time, and committed alongside the entities they evaluate.

## 8. Add per-concept quality metrics to entity schema — OPEN

**Issue:** The entity schema (`economic-entity-schema-v1.0.md`) defines
required sections and validation rules (section presence, word count range)
but no quality criteria. There is no definition of what makes a *good*
entity versus a merely *compliant* one.
**Suggested fix:** Add a `## Quality Metrics` section to the entity schema
defining evaluation dimensions with scoring rubrics:

- **Definition Precision** (1-5): Is the definition specific, non-circular,
  and distinguishable from neighbouring concepts?
- **Source Grounding** (1-5): Is the entity grounded in a specific passage?
  Does the citation exist and support the definition?
- **Domain Placement** (1-5): Is the economic domain assignment correct and
  specific (not just "General Theory")?
- **VSM Relevance** (1-5): Does the entity connect meaningfully to at least
  one VSM system, or is it too granular/abstract to map?
- **Explanatory Value** (1-5): Does this entity contribute to explaining
  the economic system, or is it a restatement of another concept?

Similarly update the VSM mapping schema with:

- **Rationale Rigour** (1-5): Is the mapping justified with reference to
  Beer's definitions, not just surface-level analogy?
- **Strength Calibration** (1-5): Is the declared strength (Strong/Moderate/
  Weak) consistent with the rationale given?

These rubrics become the prompt instructions for task 9.

## 9. Create evaluate-entity prompt template — OPEN

**Depends on:** Task 8 (quality metrics in schema).
**Issue:** There is no mechanism to evaluate an existing entity after
extraction. Quality is only judged implicitly during the global metrics
assessment, which is too coarse to identify individual weak entities.
**Suggested fix:** Create `templates/evaluate-entity.md` — a prompt
template that:

1. Takes `@{entity_content}`, `@{source_chapter}`, `@{vsm_framework}`,
   and `@{quality_rubric}` (from the schema's quality metrics section).
2. Asks the LLM to score each dimension (1-5) with a one-sentence
   justification per score.
3. Outputs structured YAML front-matter (scores) followed by markdown
   (justifications), e.g.:

   ```yaml
   ---
   entity: division-of-labour
   scores:
     definition_precision: 5
     source_grounding: 5
     domain_placement: 4
     vsm_relevance: 5
     explanatory_value: 5
   overall: 4.8
   flags: []
   ---
   ```

Add a pipeline stage: `--evaluate` runs this template against every
canonical entity and writes results to `output/evaluations/<slug>-eval.md`.
A `--evaluate --chapter <id>` variant evaluates only entities introduced
by that chapter.
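
If the pipeline wanted to double-check the `overall` value and `flags` list rather than trust the LLM's arithmetic, a deterministic recomputation could look like the sketch below. The task text does not say how `overall` or `flags` are derived: treating `overall` as the unweighted mean rounded to one decimal, and flagging dimensions that score below 3, are both assumptions, and `summarise_scores` is a hypothetical name.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarise_scores(scores: Dict[str, int], flag_below: int = 3) -> Tuple[float, List[str]]:
    """Return (overall, flags) for one entity's dimension scores.

    Assumptions: `overall` is the unweighted mean rounded to one decimal, and
    a dimension is flagged when it scores below `flag_below`.
    """
    for name, value in scores.items():
        # The rubric in task 8 scores every dimension on a 1-5 scale.
        if not 1 <= value <= 5:
            raise ValueError(f"{name}: score {value} is outside the 1-5 rubric range")
    overall = round(mean(list(scores.values())), 1)
    flags = sorted(name for name, value in scores.items() if value < flag_below)
    return overall, flags


example = {
    "definition_precision": 5,
    "source_grounding": 5,
    "domain_placement": 4,
    "vsm_relevance": 5,
    "explanatory_value": 5,
}
print(summarise_scores(example))
```

Under the mean assumption, the five example scores give 24 / 5 = 4.8, which agrees with the `overall: 4.8` in the sample front-matter.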

## 10. Add deterministic schema compliance checker — OPEN

**Issue:** Schema compliance is currently LLM-evaluated ("100%" in the
metrics report) but the validation rules in the schemas are mechanical:
section presence, word count ranges, heading format. These should be
checked programmatically, not by an LLM.
**Suggested fix:** Add a `validate_entity(path) -> ValidationResult`
function to `process_chapters.py` (or a new `validate.py` module) that:

- Parses the markdown to extract H2 section headings
- Checks required sections are present (Definition, Source Chapter,
  Context, Economic Domain)
- Counts words in the Definition section (must be 20-150)
- Checks H1 heading exists and is not a slug (e.g. `effectual-demand`
  in chapter 7 has `# effectual-demand` instead of `# Effectual Demand`)
- Validates Source Chapter cites a specific book/chapter
- For mapping files: checks Mapping Strength is one of the enum values

Expose as `--validate` CLI flag. Output a structured report:

```
Validation: 85 entities, 3 warnings
  effectual-demand.md: H1 is slug format, not title case
  porter.md: Definition is 18 words (minimum 20)
  ...
```

This is fully deterministic — no LLM calls needed.
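
A minimal sketch of such a checker follows, covering the H1, required-section, and word-count rules only. It works on the markdown text rather than a path, and `validate_entity_text`, `split_sections`, and this `ValidationResult` shape are hypothetical; the Source Chapter citation and mapping-strength checks are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

REQUIRED_SECTIONS = ("Definition", "Source Chapter", "Context", "Economic Domain")
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")  # e.g. "effectual-demand"


@dataclass
class ValidationResult:
    warnings: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.warnings


def split_sections(markdown: str) -> Dict[str, str]:
    """Map each H2 heading to the text beneath it."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def validate_entity_text(markdown: str) -> ValidationResult:
    result = ValidationResult()
    # First line that starts with "# " is taken as the H1 title.
    h1 = next((l[2:].strip() for l in markdown.splitlines() if l.startswith("# ")), None)
    if h1 is None:
        result.warnings.append("H1 heading is missing")
    elif SLUG_RE.match(h1):
        result.warnings.append("H1 is slug format, not title case")
    sections = split_sections(markdown)
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            result.warnings.append(f"Missing required section: {name}")
    if "Definition" in sections:
        words = len(sections["Definition"].split())
        if not 20 <= words <= 150:
            result.warnings.append(f"Definition is {words} words (expected 20-150)")
    return result
```

A path-based `validate_entity(path)` could wrap this by reading the file and prefixing each warning with the file name to produce the report format shown above.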

## 11. Structured metrics output format — OPEN

**Depends on:** Tasks 9 and 10.
**Issue:** The metrics report is a markdown narrative. Values cannot be
parsed programmatically, diffed meaningfully, or plotted over time.
**Suggested fix:** Alongside the human-readable `metrics-report.md`,
emit a machine-readable `metrics.yaml` (or `.json`) containing:

```yaml
timestamp: "2026-02-18T12:00:00Z"
chapters_processed: 7
chapters_total: 35
entities_total: 85
entities_archived: 0
vsm_coverage:
  S1: 28
  S2: 12
  S3: 8
  S3_star: 0
  S4: 5
  S5: 0
  recursion: 1
  variety: 0
mapping_strength:
  strong: 64
  moderate: 18
  weak: 3
validation:
  schema_compliant: 82
  warnings: 3
evaluation:              # from LLM-eval (task 9)
  mean_overall: 4.2
  min_overall: 2.8
  flagged_entities: ["porter", "country-workman"]
```

The `--metrics` command writes both files. The YAML file is committed
to git so `git diff` shows exactly how metrics changed between runs.

## 12. Metrics-over-time tracking — OPEN

**Depends on:** Task 11 (structured output).
**Issue:** There is one metrics snapshot that gets overwritten. No history
of how metrics evolved as chapters were added.
**Suggested fix:** Append each metrics snapshot to a cumulative log file
`output/metrics/metrics-history.yaml` (list of timestamped entries). This
is committed to git alongside the current snapshot. The pipeline can
optionally render a simple text-based progress summary:

```
Metrics history (5 snapshots):
  2026-02-10  ch 1/35  13 entities  41.7% VSM coverage
  2026-02-11  ch 4/35  38 entities  50.0% VSM coverage
  2026-02-11  ch 7/35  85 entities  58.3% VSM coverage
  ...
```

This provides the "metrics that improve over time" feedback loop the
README envisions: process chapters → evaluate → see coverage grow (or
flag regressions when a re-extraction reduces quality scores).
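
Once the history is a list of snapshots, flagging regressions reduces to comparing consecutive values of one metric. The sketch below assumes each snapshot has already been loaded into a flat dict and that higher values are better for the chosen metric; `metric_series`, `regressions`, and the `vsm_coverage_pct` key are hypothetical names, with figures copied from the example summary above.

```python
from typing import Dict, List

Snapshot = Dict[str, float]


def metric_series(history: List[Snapshot], name: str) -> List[float]:
    """Values of one metric across snapshots, oldest first; gaps are skipped."""
    return [snap[name] for snap in history if name in snap]


def regressions(history: List[Snapshot], name: str) -> List[int]:
    """Positions in the metric series where the value dropped versus the previous one."""
    series = metric_series(history, name)
    return [i for i in range(1, len(series)) if series[i] < series[i - 1]]


history = [
    {"chapters_processed": 1, "entities_total": 13, "vsm_coverage_pct": 41.7},
    {"chapters_processed": 4, "entities_total": 38, "vsm_coverage_pct": 50.0},
    {"chapters_processed": 7, "entities_total": 85, "vsm_coverage_pct": 58.3},
]
print(regressions(history, "vsm_coverage_pct"))
```

For metrics where lower is better, such as a redundancy ratio, the comparison would need to be inverted.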

---

## Collection-Level Metrics (tasks 13-19)

These tasks implement the five collection-level concerns described in
`METRICS-METHODOLOGY.md`. They share underlying infrastructure (entity
metadata index, definition embeddings, relationship graph) that should
be built once per evaluation run.

See the methodology document for theoretical grounding, framework
references, and the full metric definitions per concern.

## 13. Entity metadata index — deterministic parsing layer — OPEN

**Depends on:** Task 10 (schema compliance checker shares parsing logic).
**Issue:** Several collection-level metrics (coverage matrix, FCA context,
granularity distribution) require structured metadata extracted from entity
files: H1 title, economic domain, VSM system(s), source chapter, section
presence, word counts. Currently this information exists only as prose
inside markdown files.
**Suggested fix:** Add a `parse_entity_metadata(path) -> EntityMeta`
function that extracts from each entity file:

```python
from dataclasses import dataclass


@dataclass
class EntityMeta:
    slug: str
    title: str                       # from H1
    domain: str                      # from Economic Domain section
    source_chapter: str              # from Source Chapter section
    definition_words: int            # word count of Definition section
    has_original_wording: bool       # optional section present?
    has_modern_interpretation: bool
    vsm_systems: list[str]           # from mapping file if exists
    mapping_strengths: list[str]
```

Build an index of all entities at the start of each evaluation run.
This index is the input for tasks 14, 16, and 18. Expose as
`--index` CLI flag for inspection.

## 14. Redundancy detection (Concern C1) — OPEN

**Depends on:** Task 13 (metadata index).
**Methodology:** OOPS! P2 (synonymous classes) + embedding similarity +
LLM pairwise judgment. See METRICS-METHODOLOGY.md §4 C1.
**Issue:** Entities with different slugs but overlapping meanings (e.g.
`natural-rate` / `ordinary-or-average-rate`) survive extraction because
dedup only checks slug collisions. There is no semantic overlap detection.
**Suggested fix:** Implement in three stages:

1. **Embed** — Compute vector embeddings of all entity definitions using
   an embedding API (OpenRouter, OpenAI, or a local sentence-transformer).
   Cache embeddings in `output/metrics/embeddings.json` keyed by
   `{slug: content_digest}` so unchanged entities skip re-embedding.

2. **Similarity matrix** — Compute NxN cosine similarity. Write the full
   matrix to `output/metrics/similarity-matrix.json`. Flag all pairs with
   cosine > 0.80 as candidates.

3. **LLM pairwise judgment** — For each candidate pair, run a prompt:
   "Given these two entity definitions, are they (a) the same concept and
   should be merged, (b) genuinely distinct, or (c) partially overlapping
   and should be clarified?" Write results to
   `output/metrics/redundancy-report.md` + YAML.

**Metrics produced:**
- `high_similarity_pairs`: count and list
- `confirmed_synonyms`: count (LLM-confirmed same concept)
- `redundancy_ratio`: `confirmed_synonyms / total_entities`
- `intensional_conciseness`: `1 - redundancy_ratio`

**CLI:** `--check-redundancy --provider <provider>`
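
Stage 2 and the two ratio metrics are simple enough to sketch without any dependencies. The code below assumes the embeddings have already been computed and are held as plain lists of floats keyed by slug; `cosine`, `candidate_pairs`, and `redundancy_ratio` are hypothetical helper names, and the two-dimensional vectors are toy values, not real embeddings.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_pairs(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.80
) -> List[Tuple[str, str, float]]:
    """All slug pairs whose similarity exceeds the threshold, in slug order."""
    pairs = []
    for (slug_a, vec_a), (slug_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = cosine(vec_a, vec_b)
        if score > threshold:
            pairs.append((slug_a, slug_b, score))
    return pairs


def redundancy_ratio(confirmed_synonyms: int, total_entities: int) -> float:
    return confirmed_synonyms / total_entities if total_entities else 0.0


toy = {
    "natural-rate": [1.0, 0.0],
    "ordinary-or-average-rate": [0.9, 0.1],
    "porter": [0.0, 1.0],
}
for slug_a, slug_b, score in candidate_pairs(toy):
    print(f"{slug_a} / {slug_b}: {score:.3f}")
```

Only the pairs returned here would go on to the LLM pairwise judgment in stage 3, which keeps the number of LLM calls well below the full N×N comparison.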

## 15. Coverage completeness (Concern C2) — OPEN

**Depends on:** Task 13 (metadata index).
**Methodology:** SEQUAL completeness + FCA gap analysis + DSL competency
questions. See METRICS-METHODOLOGY.md §4 C2.
**Issue:** Coverage is currently assessed by the LLM in a single narrative
pass. There is no structured view of which domain × VSM cells are
populated, and no way to test whether the entity set can answer specific
questions about the economic system.
**Suggested fix:** Implement in three stages:

1. **Domain × VSM matrix** — From the metadata index, count entities per
   {economic_domain, vsm_system} cell. Render as a table. Identify empty
   cells as specific, actionable gaps. Compute:
   - `coverage_ratio = populated_cells / total_cells`
   - `vsm_balance_entropy = -Σ(pᵢ log pᵢ)` across VSM systems

2. **FCA lattice** — Construct a formal context with objects = entities,
   attributes = {domain, vsm_system, source_book, abstraction_level}.
   Compute the concept lattice (Python `concepts` library). Extract
   attribute combinations with no corresponding entity — these are
   **structural coverage gaps** not visible in the simple matrix.

3. **Competency questions** — Define a set of 15-20 canonical questions
   the infospace should answer (stored in
   `schemas/competency-questions.md`). Example questions:
   - "How does the division of labour relate to market extent?"
   - "What mechanisms regulate wages toward their natural rate?"
   - "How do monopolies distort the viable system?"
   LLM-Eval tests whether current entities suffice to answer each.
   Unanswerable questions identify specific completeness gaps.

**Metrics produced:**
- `domain_vsm_matrix`: cell counts
- `coverage_ratio`: scalar
- `vsm_balance_entropy`: scalar
- `empty_cells`: list of {domain, vsm_system} gaps
- `fca_gap_concepts`: attribute combos with no entity
- `competency_coverage`: fraction of questions answerable

**CLI:** `--check-coverage --provider <provider>`
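
The two stage-1 formulas can be computed directly from (domain, VSM system) pairs. The following is a sketch under assumptions: the function names and the toy domain labels are hypothetical, and the natural logarithm is used because the task text does not state a log base.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def coverage_ratio(
    cells: Iterable[Tuple[str, str]], domains: List[str], systems: List[str]
) -> float:
    """populated_cells / total_cells over the domain x VSM-system grid."""
    populated = {(d, s) for d, s in cells if d in domains and s in systems}
    total = len(domains) * len(systems)
    return len(populated) / total if total else 0.0


def balance_entropy(systems_of_entities: Iterable[str]) -> float:
    """Shannon entropy -sum(p_i * log p_i) of entity counts per VSM system."""
    counts = Counter(systems_of_entities)
    n = sum(counts.values())
    if not n:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# One (domain, system) pair per entity mapping; toy data only.
cells = [("Price Theory", "S1"), ("Price Theory", "S2"), ("Labour", "S1"), ("Labour", "S1")]
print(coverage_ratio(cells, ["Price Theory", "Labour"], ["S1", "S2", "S3"]))
print(balance_entropy(system for _, system in cells))
```

Entropy is highest when entities are spread evenly across systems, so a low value signals that the collection is concentrated in a few VSM systems even if `coverage_ratio` looks acceptable.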
|
||||
|
||||
## 16. Structural coherence (Concern C3) — OPEN
|
||||
|
||||
**Depends on:** Task 13 (metadata index).
|
||||
**Methodology:** OntoQA relationship richness + graph connectivity +
|
||||
community detection. See METRICS-METHODOLOGY.md §4 C3.
|
||||
**Issue:** It is unknown whether the 85 entities form a connected
|
||||
explanatory web or a fragmented collection. No relationship graph exists
|
||||
between entities.
|
||||
**Suggested fix:** Implement in three stages:
|
||||
|
||||
1. **Explicit cross-references** — Scan each entity's definition for
|
||||
mentions of other entity slugs or titles (normalised string matching).
|
||||
This is deterministic and catches direct references.
|
||||
|
||||
2. **LLM-inferred edges** — For entity pairs not caught by string
|
||||
matching but in the same domain or VSM system, LLM-Eval: "Does A's
|
||||
definition conceptually depend on or explain B, or vice versa?" Run
|
||||
in batches. Write the combined graph to
|
||||
`output/metrics/relationship-graph.json` (adjacency list).
|
||||
|
||||
3. **Graph analysis** — Using networkx or equivalent:
|
||||
- Connected components (target: 1)
|
||||
- Graph density, average degree
|
||||
- Betweenness centrality → identify bridge concepts
|
||||
- Louvain community detection → compare to declared domains
|
||||
- OntoQA Relationship Richness
|
||||
- Cohesion per domain, coupling across domains
|
||||
- Orphan entities (degree 0 or 1)
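
Stage 1 (explicit cross-references) could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names `normalise` and `explicit_edges` and the sample definitions are invented here, and matching is done on slugs only (titles would need the same treatment).

```python
import re


def normalise(text: str) -> str:
    """Lower-case and collapse every non-alphanumeric run into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def explicit_edges(definitions: dict) -> set:
    """Return (source, target) pairs where source's definition mentions target.

    `definitions` maps an entity slug to its definition text (hypothetical shape).
    """
    edges = set()
    for source, text in definitions.items():
        # Pad with spaces so only whole-word matches count.
        haystack = f" {normalise(text)} "
        for target in definitions:
            if target == source:
                continue
            if f" {normalise(target)} " in haystack:
                edges.add((source, target))
    return edges


# Invented sample data, for illustration only.
definitions = {
    "market-price": "The price at which a commodity sells, governed by effectual demand.",
    "effectual-demand": "Demand of those willing to pay the natural price.",
    "natural-price": "The price that covers rent, wages and profit at their natural rates.",
}
edges = explicit_edges(definitions)
```

Because both the slug and the definition pass through the same normalisation, `effectual-demand` matches the phrase "effectual demand". Inflected forms ("effectual demanders") would not match and are left to the LLM-inferred stage.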
|
||||
|
||||
**Metrics produced:**
|
||||
- `connected_components`: count (target: 1)
|
||||
- `graph_density`: scalar
|
||||
- `avg_degree`: scalar
|
||||
- `relationship_richness`: OntoQA RR
|
||||
- `modularity`: Louvain score
|
||||
- `bridge_concepts`: list (high betweenness centrality)
|
||||
- `orphan_entities`: list (degree ≤ 1)
|
||||
- `cohesion_by_domain` / `coupling_across_domains`: scalars
|
||||
|
||||
**CLI:** `--check-coherence --provider <provider>`
|
||||
|
||||
## 17. Definitional consistency (Concern C4) — OPEN
|
||||
|
||||
**Depends on:** Task 16 (relationship graph — the definitional dependency
|
||||
graph is a directed variant of the same structure).
|
||||
**Methodology:** OntoClean metaproperties + OOPS! P24 (circular
|
||||
definitions) + SEQUAL validity. See METRICS-METHODOLOGY.md §4 C4.
|
||||
**Issue:** No mechanism to detect circular definitions, contradictions
|
||||
between related entities, or terms used in definitions that should be
|
||||
entities but aren't.
|
||||
**Suggested fix:** Implement in four stages:
|
||||
|
||||
1. **Definitional dependency graph** — Directed version of the
|
||||
relationship graph: edge A→B means A's definition uses B's concept.
|
||||
Reuse cross-reference extraction from task 16.
|
||||
|
||||
2. **Cycle detection** — Find all cycles of length ≤ 3 in the directed
|
||||
graph. Short cycles are problematic (A defines B, B defines A).
|
||||
Compute `grounding_ratio`: fraction of entities traceable to terms
|
||||
outside the entity set without encountering a cycle.
|
||||
|
||||
3. **Undefined dependencies** — Extract terms from definitions that match
|
||||
entity-name patterns (capitalised noun phrases, kebab-case slugs) but
|
||||
have no corresponding entity file. These are concepts the infospace
|
||||
implicitly relies on but hasn't defined.
|
||||
|
||||
4. **LLM consistency checks** — For directly-connected entity pairs,
|
||||
LLM-Eval: "Do these definitions contradict each other?" For entities
|
||||
with Smith's Original Wording, LLM-Eval: "Does the definition
|
||||
accurately represent the cited passage?"
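
Stage 2 can be illustrated with a small pure-Python sketch. It assumes the dependency graph is a dict mapping each slug to the slugs its definition uses, and it reads "grounded" as "cannot reach any cycle", which is only one possible interpretation of `grounding_ratio`. The names `short_cycles` and `grounded_entities` and the sample graph are hypothetical.

```python
def short_cycles(graph, max_len=3):
    """Find simple directed cycles with at most `max_len` nodes.

    Each cycle is reported once, starting from its smallest node.
    """
    cycles = []

    def walk(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                cycles.append(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for start in sorted(graph):
        walk(start, start, [start])
    return cycles


def grounded_entities(graph):
    """Entities whose dependency chains never run into a cycle."""
    state = {}  # node -> "visiting" while on the stack, then True/False

    def visit(node):
        if state.get(node) == "visiting":
            return False  # back edge: this chain loops
        if node in state:
            return state[node]
        state[node] = "visiting"
        ok = all(visit(dep) for dep in graph.get(node, ()))
        state[node] = ok
        return ok

    return {node for node in graph if visit(node)}


# Invented sample: A -> B means "A's definition uses B".
graph = {
    "wages": ["labour"],
    "labour": [],
    "value": ["price"],
    "price": ["value"],
    "rent": ["price"],
}
cycles = short_cycles(graph)
grounded = grounded_entities(graph)
grounding_ratio = len(grounded) / len(graph)
```

In this sample `price` and `value` define each other, and `rent` depends on that loop, so only `wages` and `labour` count as grounded.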
|
||||
|
||||
**Metrics produced:**
|
||||
- `circular_definitions`: count and list of cycles (length ≤ 3)
|
||||
- `grounding_ratio`: fraction of entities reaching primitives
|
||||
- `undefined_dependencies`: list of missing terms
|
||||
- `contradiction_candidates`: LLM-flagged pairs
|
||||
- `source_fidelity_score`: fraction passing source check
|
||||
|
||||
**CLI:** `--check-consistency --provider <provider>`
|
||||
|
||||
## 18. Granularity balance (Concern C5) — OPEN
|
||||
|
||||
**Depends on:** Task 13 (metadata index).
|
||||
**Methodology:** Keet granularity theory + OntoClean rigidity +
|
||||
DSL laconicity. See METRICS-METHODOLOGY.md §4 C5.
|
||||
**Issue:** Entities range from broad sectors (`agriculture`) to specific
|
||||
market roles (`effectual-demanders`) to abstract principles
|
||||
(`division-of-labour`). It is unclear whether this range is appropriate
|
||||
or whether some entities are too specific/general relative to their peers.
|
||||
**Suggested fix:** Implement in three stages:
|
||||
|
||||
1. **LLM classification** — For each entity, LLM-Eval assigns:
|
||||
- Abstraction level: `theory` / `mechanism` / `observation`
|
||||
- Scope score: 1-5 (very specific → very general)
|
||||
- Indispensability: 1-5 ("if removed, how much explanatory power lost?")
|
||||
Write to `output/evaluations/<slug>-classification.yaml`.
|
||||
|
||||
2. **Distribution analysis** — Deterministic:
|
||||
- Count per abstraction level; compute entropy
|
||||
- Per-domain scope variance (flag domains with high variance)
|
||||
- Level × domain matrix (from FCA context in task 15)
|
||||
- Outlier detection: entities > 1.5σ from their domain's mean scope
|
||||
|
||||
3. **Merge/split recommendations** — For outlier entities, LLM-Eval:
|
||||
"Should this entity be merged into a broader concept, split into
|
||||
sub-concepts, or is its current granularity justified?" For entities
|
||||
with indispensability ≤ 2: "Could another entity serve this purpose?"
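
The deterministic outlier rule in stage 2 might look like the sketch below. It assumes scope scores and domain labels are already available as plain dicts, and it uses the population standard deviation; whether the pipeline would use the sample or population form is not specified. The function name and the data are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def scope_outliers(scores, domains, k=1.5):
    """Return slugs whose scope score lies more than k sigma from their domain mean."""
    by_domain = defaultdict(list)
    for slug, score in scores.items():
        by_domain[domains[slug]].append(score)

    outliers = []
    for slug, score in scores.items():
        values = by_domain[domains[slug]]
        sigma = pstdev(values)
        # A domain with identical scores has sigma == 0 and no outliers.
        if sigma > 0 and abs(score - mean(values)) > k * sigma:
            outliers.append(slug)
    return sorted(outliers)


# Invented scores (1 = very specific, 5 = very general).
scores = {
    "division-of-labour": 3, "agriculture": 3, "manufactures": 3,
    "stock": 3, "pin-making": 1,
    "market-price": 3, "natural-price": 4,
}
domains = {
    "division-of-labour": "Production", "agriculture": "Production",
    "manufactures": "Production", "stock": "Production",
    "pin-making": "Production",
    "market-price": "Exchange", "natural-price": "Exchange",
}
outliers = scope_outliers(scores, domains)
```

With very small domains the rule is weak: for n entities the largest possible deviation is sqrt(n - 1) population sigmas, so a domain needs at least four entities before anything can exceed 1.5σ.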
|
||||
|
||||
**Metrics produced:**
|
||||
- `abstraction_distribution`: {theory: n, mechanism: n, observation: n}
|
||||
- `abstraction_entropy`: scalar (higher = more balanced)
|
||||
- `scope_variance_by_domain`: per-domain scalar
|
||||
- `dispensable_entities`: list (indispensability ≤ 2)
|
||||
- `merge_candidates`: list of pairs
|
||||
- `split_candidates`: list of entities
|
||||
|
||||
**CLI:** `--check-granularity --provider <provider>`
|
||||
|
||||
## 19. Unified collection evaluation command — OPEN
|
||||
|
||||
**Depends on:** Tasks 13-18.
|
||||
**Issue:** Running five separate `--check-*` commands is cumbersome and
|
||||
repeats shared computation (metadata parsing, embedding, graph building).
|
||||
**Suggested fix:** Add `--evaluate-collection --provider <provider>` that
|
||||
runs all five checks in sequence, sharing infrastructure:
|
||||
|
||||
1. Parse entity metadata index (task 13) — used by all
|
||||
2. Compute embeddings (task 14) — used by C1, C3
|
||||
3. Build relationship graph (task 16) — used by C3, C4
|
||||
4. Run all five concern checks
|
||||
5. Write per-concern reports to `output/metrics/`
|
||||
6. Write unified `metrics.yaml` with all collection metrics
|
||||
7. Append to `metrics-history.yaml` (task 12)
|
||||
|
||||
Incremental mode: `--evaluate-collection --chapter <id>` re-evaluates
|
||||
only entities from that chapter plus pairwise checks involving them.
|
||||
|
||||
Report a summary to stdout:
|
||||
|
||||
```
Collection evaluation (85 entities, 7 chapters):
Redundancy: 3 synonym candidates, conciseness 0.96
Coverage: 58% VSM, 20% chapters, 4 domain gaps
Coherence: 1 component, density 0.12, 2 orphans
Consistency: 0 cycles, 5 undefined deps, 0 contradictions
Granularity: entropy 1.42, 1 dispensable, 2 merge candidates
```
|
||||
|
||||
501
examples/infospace-with-history/METRICS-METHODOLOGY.md
Normal file
@@ -0,0 +1,501 @@
|
||||
# Collection-Level Metrics Methodology
|
||||
|
||||
How we evaluate the quality of the infospace as a **collection of
|
||||
interrelated concepts**, beyond the quality of individual entities.
|
||||
|
||||
This document describes the theoretical frameworks drawn from ontology
|
||||
engineering, formal concept analysis, semiotic quality theory, and DSL
|
||||
design — and how each is adapted to work within MarkiTect's two-layer
|
||||
evaluation model (LLM-Eval + deterministic aggregation).
|
||||
|
||||
---
|
||||
|
||||
## 1. The Two-Layer Model
|
||||
|
||||
Every metric in this methodology decomposes into two layers:
|
||||
|
||||
| Layer | What it does | How it runs |
|-------|-------------|-------------|
| **LLM-Eval** | Qualitative judgment: "Are these two concepts the same?", "Is this definition grounded in the source?" | Prompt template → LLM → structured YAML output |
| **Deterministic** | Quantitative aggregation: cosine similarity, graph connectivity, coverage counting, cycle detection | Python code in `process_chapters.py` or dedicated `metrics.py` |
|
||||
|
||||
The LLM-Eval layer produces **per-entity** or **per-pair** structured
|
||||
scores. The deterministic layer **aggregates** these into collection-level
|
||||
metrics, persisted as machine-readable YAML alongside human-readable
|
||||
markdown reports.
|
||||
|
||||
Per-concept quality metrics (definition precision, source grounding, VSM
|
||||
relevance — see INFRA-TASKS 8-12) operate at the individual entity level.
|
||||
This document covers the five **collection-level concerns** that assess how
|
||||
the entities work together as an explanatory system.
|
||||
|
||||
---
|
||||
|
||||
## 2. Five Collection-Level Concerns
|
||||
|
||||
### Overview
|
||||
|
||||
| # | Concern | Question | Primary framework |
|---|---------|----------|-------------------|
| C1 | Semantic Overlap | Are there redundant concepts? | OOPS! P2, embedding similarity |
| C2 | Coverage Completeness | Does the concept set cover the domain? | SEQUAL, FCA |
| C3 | Structural Coherence | Do concepts form a connected explanatory graph? | OntoQA, graph theory |
| C4 | Definitional Consistency | Are concepts defined consistently and non-circularly? | OntoClean, OOPS! P24 |
| C5 | Granularity Balance | Are concepts at comparable levels of abstraction? | Granularity theory, DSL laconicity |
|
||||
|
||||
---
|
||||
|
||||
## 3. Theoretical Frameworks
|
||||
|
||||
### 3.1 SEQUAL (Semiotic Quality Framework)
|
||||
|
||||
**Origin:** Lindland, Sindre & Sølvberg (1994), extended by Krogstie et al.
|
||||
|
||||
**What it defines:** Quality of a conceptual model as the correspondence
|
||||
between three worlds — the domain (what exists), the model (what we
|
||||
captured), and the audience's interpretation (what they understand).
|
||||
|
||||
Two key dimensions of **semantic quality**:
|
||||
|
||||
- **Validity** — everything in the model corresponds to something real
|
||||
in the domain. No invented concepts.
|
||||
- **Completeness** — everything relevant in the domain is represented in
|
||||
the model. No missing concepts.
|
||||
|
||||
**How we use it:** SEQUAL frames our entire metrics approach. Every
|
||||
collection-level metric maps to one of these dimensions:
|
||||
|
||||
| SEQUAL dimension | Our concerns |
|
||||
|-----------------|--------------|
|
||||
| Validity | C1 (redundancy reduces validity — duplicate concepts don't correspond to distinct domain facts), C4 (consistency — contradictory definitions can't both be valid) |
|
||||
| Completeness | C2 (coverage — are all needed concepts present?), C5 (granularity — missing levels of abstraction are completeness gaps) |
|
||||
| Both | C3 (coherence — disconnected concepts suggest either missing bridging concepts [completeness] or misplaced concepts [validity]) |
|
||||
|
||||
**Adaptation:** SEQUAL was designed for formal models evaluated by human
|
||||
experts. We replace human judgment with LLM-Eval (for validity checks like
|
||||
"does this concept correspond to something Smith actually described?") and
|
||||
deterministic counting (for completeness checks like "which VSM systems
|
||||
lack entity mappings?").
|
||||
|
||||
### 3.2 OntoClean
|
||||
|
||||
**Origin:** Guarino & Welty (2004).
|
||||
|
||||
**What it defines:** A methodology for validating taxonomic relationships
|
||||
by assigning **metaproperties** to each concept:
|
||||
|
||||
- **Rigidity** — Is the property essential to all its instances? (e.g.
|
||||
"market" is rigid; "effectual demander" is anti-rigid — an agent can
|
||||
stop being an effectual demander)
|
||||
- **Identity** — Does the concept carry an identity criterion? (e.g.
|
||||
"division of labour" can be identified by its three causal mechanisms)
|
||||
- **Unity** — Are all instances of this concept whole in the same way?
|
||||
- **Dependence** — Does the concept require another concept to exist?
|
||||
(e.g. "market price" depends on "effectual demand")
|
||||
|
||||
**Constraint:** A rigid concept cannot be subsumed by an anti-rigid one.
|
||||
Violations indicate structural confusion.
|
||||
|
||||
**How we use it:** We do not have a formal taxonomy, but our flat entity
|
||||
set implicitly contains subsumption relationships (e.g. "natural rate"
|
||||
subsumes "ordinary-or-average rate"). OntoClean metaproperties help detect:
|
||||
|
||||
- **Granularity mismatches** (C5): A rigid concept at the same level as
|
||||
an anti-rigid one suggests different abstraction levels are mixed.
|
||||
- **Definitional consistency** (C4): If entity A depends on entity B per
|
||||
OntoClean, but B's definition doesn't acknowledge A, the definitions
|
||||
are inconsistent.
|
||||
- **Redundancy** (C1): Two entities with identical metaproperty profiles
|
||||
and overlapping definitions are candidates for merging.
|
||||
|
||||
**Adaptation:** Instead of manual metaproperty assignment, we use LLM-Eval
|
||||
to classify each entity's rigidity, identity criterion, and dependencies.
|
||||
The constraint checking is then deterministic.
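
The deterministic half of this adaptation could be as small as the following sketch. It assumes the LLM-Eval step has already produced a rigidity label per entity and a list of inferred subsumption pairs; the function name, the label strings, and the sample data are all hypothetical.

```python
def rigidity_violations(rigidity, subsumptions):
    """Flag (parent, child) pairs where an anti-rigid parent subsumes a rigid child.

    `rigidity` maps slug -> "rigid" | "anti-rigid" (labels assumed to come from LLM-Eval).
    `subsumptions` is a list of (parent, child) pairs: parent subsumes child.
    """
    return [
        (parent, child)
        for parent, child in subsumptions
        if rigidity.get(parent) == "anti-rigid" and rigidity.get(child) == "rigid"
    ]


# Invented labels; the second pair is deliberately wrong to trigger the check.
rigidity = {
    "agent": "rigid",
    "effectual-demander": "anti-rigid",
    "market": "rigid",
}
subsumptions = [
    ("agent", "effectual-demander"),   # fine: rigid parent, anti-rigid child
    ("effectual-demander", "market"),  # violation: anti-rigid parent, rigid child
]
violations = rigidity_violations(rigidity, subsumptions)
```

Entities without a rigidity label are silently skipped here; a real check would probably report them as unclassified instead.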
|
||||
|
||||
### 3.3 OOPS! (Ontology Pitfall Scanner)
|
||||
|
||||
**Origin:** Poveda-Villalón et al. (2014). Catalogue of 41 common
|
||||
ontology design pitfalls.
|
||||
|
||||
**What it defines:** Concrete, testable anti-patterns. The pitfalls most
|
||||
relevant to our infospace:
|
||||
|
||||
| Pitfall | Description | Our concern |
|
||||
|---------|-------------|-------------|
|
||||
| P2 | Synonymous classes — different names, same meaning | C1 (redundancy) |
|
||||
| P4 | Unconnected ontology elements | C3 (coherence) |
|
||||
| P13 | Missing inverse relationships | C3 |
|
||||
| P7 | Merging different concepts in the same class | C5 (granularity — too coarse) |
|
||||
| P11 | Missing domain or range | C4 (consistency) |
|
||||
| P10 | Missing disjointness axioms | C1 (how do we know two concepts don't overlap?) |
|
||||
| P24 | Recursive/circular definition | C4 (consistency) |
|
||||
| P25 | Inverse of itself | C4 |
|
||||
|
||||
**How we use it:** OOPS! pitfalls become a **checklist for LLM-Eval
|
||||
prompts**. Rather than running a formal OWL scanner, we ask the LLM to
|
||||
check for each pitfall pattern:
|
||||
|
||||
- "Are entities A and B synonymous?" (P2)
|
||||
- "Does entity A's definition reference itself?" (P24)
|
||||
- "Is entity A actually two distinct concepts merged together?" (P7)
|
||||
|
||||
The deterministic layer counts pitfall occurrences and tracks them over
|
||||
time.
|
||||
|
||||
**Adaptation:** We select the subset of OOPS! pitfalls applicable to
|
||||
semi-formal markdown-based ontologies (no OWL axioms) and implement each
|
||||
as an LLM-Eval prompt pattern rather than a formal reasoner check.
|
||||
|
||||
### 3.4 OntoQA (Metric-Based Ontology Quality Analysis)
|
||||
|
||||
**Origin:** Tartir & Arpinar (2007).
|
||||
|
||||
**What it defines:** Quantitative schema-level and instance-level metrics:
|
||||
|
||||
- **Relationship Richness (RR):** Proportion of non-taxonomic (lateral)
|
||||
relationships to total relationships. `RR = non_hierarchical / total`.
|
||||
Low RR = mere taxonomy. High RR = rich cross-cutting connections.
|
||||
- **Attribute Richness (AR):** Average number of attributes per concept.
|
||||
`AR = total_attributes / total_concepts`.
|
||||
- **Inheritance Richness (IR):** Average subclasses per class — measures
|
||||
how knowledge distributes across the hierarchy.
|
||||
- **Class Richness (CR):** Proportion of classes with instances.
|
||||
|
||||
**How we use it:** Our entities don't have formal relationships declared
|
||||
between them, but we can **infer** a relationship graph from their
|
||||
definitions and mappings:
|
||||
|
||||
- Entity A references entity B in its definition → definitional dependency
|
||||
- Entities A and B map to the same VSM system → structural co-occurrence
|
||||
- Entities A and B appear in the same chapter → contextual co-occurrence
|
||||
|
||||
From this inferred graph, we compute OntoQA metrics directly:
|
||||
|
||||
- **Relationship Richness** tells us whether our concepts form a web of
|
||||
explanatory connections or just a flat list.
|
||||
- **Attribute Richness** maps to our schema sections — entities with more
|
||||
optional sections filled (Original Wording, Modern Interpretation) are
|
||||
richer.
|
||||
|
||||
**Adaptation:** The key modification is that relationship inference is an
|
||||
LLM-Eval step (pairwise: "does A's definition depend on or reference B?"),
|
||||
after which all OntoQA metrics are computed deterministically on the
|
||||
resulting graph.
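
As a rough sketch of the deterministic part, Relationship Richness and Attribute Richness can be computed from an edge list and a per-entity section list. The edge kinds (`"is-a"` for taxonomic edges, anything else for lateral ones) and the data shapes are assumptions made for this example, not something the pipeline defines.

```python
def relationship_richness(edges):
    """RR = non-hierarchical edges / total edges.

    `edges` is a list of (source, target, kind); kind "is-a" is treated as taxonomic.
    """
    if not edges:
        return 0.0
    lateral = sum(1 for _, _, kind in edges if kind != "is-a")
    return lateral / len(edges)


def attribute_richness(sections_by_entity):
    """AR = total filled sections / total entities."""
    if not sections_by_entity:
        return 0.0
    total = sum(len(sections) for sections in sections_by_entity.values())
    return total / len(sections_by_entity)


# Invented sample data.
edges = [
    ("ordinary-or-average-rate", "natural-rate", "is-a"),
    ("market-price", "effectual-demand", "depends-on"),
    ("market-price", "natural-price", "depends-on"),
    ("division-of-labour", "market-price", "explains"),
]
sections = {
    "market-price": ["Definition", "Original Wording"],
    "natural-price": ["Definition", "Original Wording", "Modern Interpretation", "VSM Mapping"],
}
rr = relationship_richness(edges)
ar = attribute_richness(sections)
```

With one taxonomic edge out of four, RR is 0.75; the two entities average three filled sections each.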
|
||||
|
||||
### 3.5 Formal Concept Analysis (FCA)
|
||||
|
||||
**Origin:** Wille (1982). Applied to ontology auditing by Elhaj et al.
|
||||
(2008) for SNOMED CT completeness checking.
|
||||
|
||||
**What it defines:** A mathematical framework for deriving a **concept
|
||||
lattice** from a binary relation between objects and attributes. The
|
||||
lattice reveals:
|
||||
|
||||
- **Formal concepts**: maximal sets of objects sharing the same attributes
|
||||
- **Subconcept/superconcept** relationships: the natural hierarchy
|
||||
- **Missing concepts**: attribute combinations with no corresponding object
|
||||
|
||||
**How we use it:** We construct a **formal context** (binary matrix):
|
||||
|
||||
- **Objects** = our 85 entities
|
||||
- **Attributes** = economic domain, VSM system, source book, abstraction
|
||||
level (from LLM-Eval), key terms (extracted from definitions)
|
||||
|
||||
The concept lattice then reveals:
|
||||
|
||||
- **Coverage gaps** (C2): Attribute combinations with no entity. E.g. if
|
||||
the cell {Distribution, S3} is empty, we lack control-layer concepts
|
||||
for distribution — a specific, actionable gap.
|
||||
- **Redundancy** (C1): Entities with identical attribute sets (same formal
|
||||
concept) are candidates for merging.
|
||||
- **Granularity** (C5): The lattice depth indicates how many meaningful
|
||||
levels of abstraction exist. A shallow lattice suggests missing
|
||||
intermediate concepts.
|
||||
|
||||
**Adaptation:** Classic FCA requires crisp binary attributes. Our domains
|
||||
and VSM mappings are already categorical, but abstraction level and key
|
||||
terms need LLM-Eval to produce. The lattice computation itself is
|
||||
deterministic (Python `concepts` library or equivalent). The FCA approach
|
||||
replaces the current "ask the LLM about coverage" with a structural
|
||||
computation that can identify *specific* gaps rather than vague
|
||||
recommendations.
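
To make the lattice idea concrete, here is a small pure-Python sketch that enumerates the formal concepts of a toy context by closing the object attribute sets under intersection. It does not use the `concepts` library mentioned above, scales poorly beyond small contexts, and the entity-to-attribute assignments are invented.

```python
from collections import defaultdict


def formal_concepts(context):
    """Map each intent (closed attribute set) to its extent (objects having all of it).

    `context` maps object -> frozenset of attributes.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        # Every intent is an intersection of object attribute sets.
        intents |= {intent & attrs for intent in intents}
    return {
        intent: frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        for intent in intents
    }


def identical_profiles(context):
    """Groups of objects with exactly the same attribute set (redundancy candidates)."""
    groups = defaultdict(list)
    for obj, attrs in context.items():
        groups[attrs].append(obj)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Invented formal context: entity -> {domain, VSM system}.
context = {
    "market-price": frozenset({"Exchange", "S2"}),
    "natural-price": frozenset({"Exchange", "S2"}),
    "division-of-labour": frozenset({"Production", "S1"}),
    "wages-of-labour": frozenset({"Distribution", "S1"}),
}
concepts = formal_concepts(context)
duplicates = identical_profiles(context)
```

The concept whose extent is empty (here the full attribute set) is the kind of lattice node that signals an attribute combination no entity covers.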
|
||||
|
||||
### 3.6 DSL Design Principles
|
||||
|
||||
**Origin:** Mernik et al. (2005) "When and How to Develop DSLs";
|
||||
Karsai et al. (2014) "Design Guidelines for Domain-Specific Languages".
|
||||
|
||||
**What they define:** Quality criteria for a set of concepts that form a
|
||||
language for a specific domain:
|
||||
|
||||
- **Soundness**: Every concept in the language corresponds to a real domain
|
||||
concern (no invented abstractions).
|
||||
- **Completeness**: The language can express everything needed for its
|
||||
intended tasks.
|
||||
- **Laconicity**: No unnecessary concepts — every concept earns its place.
|
||||
- **Orthogonality**: Concepts are independent; combining any two produces
|
||||
a meaningful result (no redundant combinations).
|
||||
|
||||
**How we use it:** Our entity set is effectively a domain-specific
|
||||
vocabulary for "explaining classical economics through VSM". DSL quality
|
||||
criteria translate directly:
|
||||
|
||||
- **Soundness** → Validity (SEQUAL): every entity grounded in Smith's text
|
||||
- **Completeness** → Coverage (C2): can we answer the "competency
|
||||
questions" the infospace is meant to address?
|
||||
- **Laconicity** → Anti-redundancy (C1) + Indispensability (C5): would
|
||||
removing any entity lose explanatory power?
|
||||
- **Orthogonality** → Non-overlap (C1): entity definitions don't
|
||||
substantially duplicate each other
|
||||
|
||||
**Adaptation:** We operationalise DSL completeness through **competency
|
||||
questions** — a set of canonical questions the infospace should be able to
|
||||
answer (e.g. "How does the division of labour relate to market extent?",
|
||||
"What mechanisms regulate wages toward their natural rate?"). LLM-Eval
|
||||
tests whether the current entity set suffices to answer each question.
|
||||
Unanswerable questions identify specific completeness gaps.
|
||||
|
||||
Laconicity is operationalised as **indispensability scoring**: for each
|
||||
entity, LLM-Eval rates whether removing it would lose explanatory power.
|
||||
Low-scoring entities are candidates for merging or retirement.
|
||||
|
||||
---
|
||||
|
||||
## 4. Integration: Metric Definitions by Concern
|
||||
|
||||
### C1: Semantic Overlap / Redundancy
|
||||
|
||||
**Goal:** Identify entities that substantially overlap in meaning and
|
||||
should be merged, distinguished, or retired.
|
||||
|
||||
**Metrics:**
|
||||
|
||||
| Metric | Type | Computation |
|--------|------|-------------|
| `similarity_matrix` | Deterministic | Embed all entity definitions; compute NxN cosine similarity |
| `high_similarity_pairs` | Deterministic | Pairs with cosine > 0.80, sorted descending |
| `confirmed_synonyms` | LLM-Eval | For each high-similarity pair, LLM judges: "same concept" / "genuinely distinct" / "partial overlap" |
| `redundancy_ratio` | Deterministic | `confirmed_synonyms / total_entities` |
| `intensional_conciseness` | Deterministic | `1 - redundancy_ratio` (from KG quality framework) |
|
||||
|
||||
**Pipeline:**
|
||||
1. Embed definitions (embedding API or local model)
|
||||
2. Compute cosine similarity matrix
|
||||
3. Filter pairs above threshold
|
||||
4. LLM pairwise judgment on filtered pairs only (avoids N² LLM calls)
|
||||
5. Aggregate into ratio and conciseness score
|
||||
|
||||
**Output:** `output/metrics/redundancy-report.md` + structured YAML with
|
||||
pair list, scores, and merge/retire recommendations.
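
The deterministic steps of this pipeline can be sketched with the standard library alone. The three-dimensional vectors below stand in for real embeddings and the `verdicts` dict stands in for the LLM-Eval output; both are invented, as are the function names.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def high_similarity_pairs(embeddings, threshold=0.80):
    """Pairs with cosine above the threshold, sorted by similarity descending."""
    pairs = [
        (a, b, cosine(embeddings[a], embeddings[b]))
        for a, b in combinations(sorted(embeddings), 2)
    ]
    return sorted((p for p in pairs if p[2] > threshold),
                  key=lambda p: p[2], reverse=True)


# Toy vectors in place of real embeddings.
embeddings = {
    "natural-rate": [1.0, 0.0, 0.1],
    "ordinary-or-average-rate": [0.9, 0.1, 0.1],
    "division-of-labour": [0.0, 1.0, 0.0],
}
candidates = high_similarity_pairs(embeddings)

# Stand-in for the LLM pairwise judgment on the filtered pairs.
verdicts = {("natural-rate", "ordinary-or-average-rate"): "same concept"}
confirmed = [p for p in candidates if verdicts.get((p[0], p[1])) == "same concept"]

redundancy_ratio = len(confirmed) / len(embeddings)
intensional_conciseness = 1 - redundancy_ratio
```

Only the filtered `candidates` would be sent to the LLM, which is what keeps the number of pairwise judgments well below N².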
|
||||
|
||||
### C2: Coverage Completeness
|
||||
|
||||
**Goal:** Identify domain areas and VSM systems that lack adequate
|
||||
representation in the entity set.
|
||||
|
||||
**Metrics:**
|
||||
|
||||
| Metric | Type | Computation |
|
||||
|--------|------|-------------|
|
||||
| `domain_vsm_matrix` | Deterministic | Count entities per {economic_domain, VSM_system} cell |
|
||||
| `coverage_ratio` | Deterministic | `populated_cells / expected_cells` |
|
||||
| `vsm_balance_entropy` | Deterministic | Shannon entropy of entity distribution across VSM systems (higher = more balanced) |
|
||||
| `empty_cells` | Deterministic | List of {domain, VSM_system} pairs with zero entities |
|
||||
| `competency_coverage` | LLM-Eval | For each competency question, can it be answered with current entities? |
|
||||
| `fca_gap_concepts` | Deterministic | Attribute combinations in the FCA lattice with no corresponding entity |
|
||||
|
||||
**Pipeline:**
|
||||
1. Parse entity metadata (domain, VSM mapping) from files on disk
|
||||
2. Build domain × VSM matrix; identify empty cells
|
||||
3. Build FCA formal context; compute lattice; extract gap concepts
|
||||
4. Define competency questions (initially hand-written, later LLM-generated
|
||||
from the source material)
|
||||
5. LLM-evaluate answerability of each question
|
||||
6. Aggregate into coverage ratio, entropy, and gap list
|
||||
|
||||
**Output:** `output/metrics/coverage-report.md` + YAML with matrix, gaps,
|
||||
and competency question results.
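
Steps 1, 2, and 6 reduce to counting, as in this sketch. It assumes every {domain, VSM system} cell is "expected" (the full cross product) and that each entity carries exactly one domain and one VSM mapping; the dict keys `domain` and `vsm` and the sample records are hypothetical.

```python
import math
from collections import Counter


def coverage_metrics(entities, domains, vsm_systems):
    """Return (matrix, coverage_ratio, vsm_balance_entropy, empty_cells)."""
    matrix = Counter((e["domain"], e["vsm"]) for e in entities)
    expected = [(d, s) for d in domains for s in vsm_systems]
    empty_cells = [cell for cell in expected if matrix[cell] == 0]
    coverage_ratio = (len(expected) - len(empty_cells)) / len(expected)

    # Shannon entropy (base 2) of the entity distribution across VSM systems.
    per_system = Counter(e["vsm"] for e in entities)
    total = sum(per_system.values())
    entropy = 0.0
    if total:
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in per_system.values())
    return matrix, coverage_ratio, entropy, empty_cells


# Invented metadata records.
entities = [
    {"slug": "division-of-labour", "domain": "Production", "vsm": "S1"},
    {"slug": "agriculture", "domain": "Production", "vsm": "S1"},
    {"slug": "wages-of-labour", "domain": "Distribution", "vsm": "S1"},
    {"slug": "stock", "domain": "Production", "vsm": "S2"},
]
matrix, coverage_ratio, entropy, empty_cells = coverage_metrics(
    entities, ["Production", "Distribution"], ["S1", "S2"]
)
```

The maximum entropy for k VSM systems is log2(k), so dividing by that value would give a 0 to 1 balance score that is easier to compare across runs.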
|
||||
|
||||
### C3: Structural Coherence
|
||||
|
||||
**Goal:** Determine whether the entities form a connected explanatory web
|
||||
or a fragmented collection of isolated concepts.
|
||||
|
||||
**Metrics:**
|
||||
|
||||
| Metric | Type | Computation |
|
||||
|--------|------|-------------|
|
||||
| `relationship_graph` | LLM-Eval + Deterministic | Infer edges from definition cross-references (string matching) + LLM judgment for implicit references |
|
||||
| `connected_components` | Deterministic | Number of connected components in the graph (target: 1) |
|
||||
| `graph_density` | Deterministic | `actual_edges / possible_edges` |
|
||||
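| `avg_degree` | Deterministic | `total_edges / total_entities` (mean out-degree of the directed graph; the undirected average degree is `2 × total_edges / total_entities`) |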
|
||||
| `relationship_richness` | Deterministic | OntoQA RR: `non_hierarchical_edges / total_edges` |
|
||||
| `modularity` | Deterministic | Louvain modularity score (0.3-0.7 = meaningful structure; >0.8 = fragmentation) |
|
||||
| `bridge_concepts` | Deterministic | Entities with highest betweenness centrality (connect clusters) |
|
||||
| `orphan_entities` | Deterministic | Entities with degree 0 or 1 |
|
||||
| `cohesion_by_domain` | Deterministic | Avg intra-domain edges per entity |
|
||||
| `coupling_across_domains` | Deterministic | Inter-domain edges / total edges |
|
||||
|
||||
**Pipeline:**
|
||||
1. Extract explicit cross-references from definitions (entity name
|
||||
mentions in other definitions — string matching with slug normalisation)
|
||||
2. For entity pairs not caught by string matching, LLM-Eval: "Does A's
|
||||
definition depend on or reference B's concept?"
|
||||
3. Build directed graph
|
||||
4. Compute graph metrics (networkx or equivalent)
|
||||
5. Run community detection; compare detected communities to declared
|
||||
economic domains
|
||||
|
||||
**Output:** `output/metrics/coherence-report.md` + YAML with graph
|
||||
statistics, orphan list, bridge concepts, and community structure.
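
A few of these metrics can be computed without any graph library, as in the sketch below; networkx would provide them directly. The sketch treats the graph as undirected, so each edge counts at both endpoints and the average degree is 2m/n. Betweenness centrality and community detection are left out. Node names and edges are invented.

```python
from collections import deque


def graph_metrics(nodes, edges):
    """Connected components, density, average degree, and orphans (undirected view)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)

    n = len(nodes)
    m = sum(len(neighbours) for neighbours in adj.values()) // 2

    # Breadth-first search to count connected components.
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in adj[current] - seen:
                seen.add(nxt)
                queue.append(nxt)

    return {
        "connected_components": components,
        "graph_density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "avg_degree": 2 * m / n if n else 0.0,
        "orphan_entities": sorted(x for x in nodes if len(adj[x]) <= 1),
    }


# Invented sample graph.
nodes = ["labour", "price", "rent", "stock", "wages"]
edges = [("labour", "price"), ("price", "rent"), ("labour", "rent"), ("rent", "stock")]
metrics = graph_metrics(nodes, edges)
```

Here `wages` is isolated and `stock` hangs off a single edge, so both are reported as orphans and the graph has two components.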
|
||||
|
||||
### C4: Definitional Consistency
|
||||
|
||||
**Goal:** Ensure entities are defined consistently, non-circularly, and
|
||||
without contradicting each other.
|
||||
|
||||
**Metrics:**
|
||||
|
||||
| Metric | Type | Computation |
|
||||
|--------|------|-------------|
|
||||
| `definitional_dependency_graph` | Deterministic + LLM-Eval | Edges where A's definition uses B's concept |
|
||||
| `circular_definitions` | Deterministic | Cycles of length ≤ 3 in the dependency graph |
|
||||
| `definition_depth` | Deterministic | Longest dependency chain per entity before reaching a term not in the entity set |
|
||||
| `undefined_dependencies` | Deterministic | Terms used in definitions that arguably should be entities but aren't |
|
||||
| `pairwise_consistency` | LLM-Eval | For related entity pairs (sharing edges): "Do these definitions contradict each other?" |
|
||||
| `source_fidelity` | LLM-Eval | "Does this definition accurately represent what Smith wrote in the cited passage?" |
|
||||
| `metaproperty_violations` | LLM-Eval + Deterministic | OntoClean constraint checking after LLM classifies rigidity/identity |
|
||||
| `grounding_ratio` | Deterministic | Fraction of entities traceable to primitives without cycles |
|
||||
|
||||
**Pipeline:**
|
||||
1. Build definitional dependency graph (same technique as C3, but directed
|
||||
— A depends on B means A's definition uses B, not vice versa)
|
||||
2. Detect cycles; flag short cycles
|
||||
3. Extract undefined terms (terms matching entity-name patterns that appear
|
||||
in definitions but have no corresponding entity file)
|
||||
4. LLM pairwise consistency check on directly-connected pairs
|
||||
5. LLM source fidelity check (compare definition to source chapter text)
|
||||
6. LLM OntoClean metaproperty classification; deterministic constraint
|
||||
checking
|
||||
|
||||
**Output:** `output/metrics/consistency-report.md` + YAML with cycle list,
|
||||
undefined terms, contradiction candidates, and metaproperty violations.
|
||||
|
||||
### C5: Granularity Balance
|
||||
|
||||
**Goal:** Ensure entities operate at comparable levels of abstraction
|
||||
within their respective domains and perspectives.
|
||||
|
||||
**Metrics:**
|
||||
|
||||
| Metric | Type | Computation |
|
||||
|--------|------|-------------|
|
||||
| `abstraction_classification` | LLM-Eval | Classify each entity as theory-level / mechanism-level / observation-level |
|
||||
| `scope_score` | LLM-Eval | Rate each entity 1-5 for generality (1 = very specific instance, 5 = broad theoretical principle) |
|
||||
| `abstraction_distribution` | Deterministic | Count per level; compute entropy |
|
||||
| `scope_variance` | Deterministic | Variance of scope scores within each domain |
|
||||
| `level_x_perspective_matrix` | Deterministic | Cross-tabulation of abstraction level × economic domain |
|
||||
| `indispensability` | LLM-Eval | "If removed, what explanatory power is lost?" (1-5) |
|
||||
| `dispensable_entities` | Deterministic | Entities with indispensability score ≤ 2 |
|
||||
| `merge_candidates` | LLM-Eval | Pairs where one is a sub-case of the other |
|
||||
|
||||
**Pipeline:**
|
||||
1. LLM-classify each entity: abstraction level, scope score,
|
||||
indispensability
|
||||
2. Build level × perspective matrix
|
||||
3. Compute distribution entropy and per-domain scope variance
|
||||
4. Flag outliers: entities whose scope score deviates > 1.5σ from their
|
||||
domain mean
|
||||
5. For outlier entities, LLM-Eval: "Should this be merged into a broader
|
||||
concept, or split into sub-concepts?"
|
||||
|
||||
**Output:** `output/metrics/granularity-report.md` + YAML with
|
||||
classifications, distribution, outliers, and merge/split recommendations.
|
||||
|
||||
---
|
||||
|
||||
## 5. Shared Infrastructure
|
||||
|
||||
Several concerns share underlying computations:
|
||||
|
||||
| Infrastructure | Used by | Build once |
|---------------|---------|------------|
| Definition embeddings (vector per entity) | C1, C3 | Embedding API call per entity |
| Relationship graph (entity → entity edges) | C3, C4 | String matching + LLM-Eval |
| FCA formal context (entity × attribute matrix) | C2, C5 | Metadata parsing + LLM classification |
| Entity metadata index (domain, VSM, chapter, sections) | C2, C5, C10 (schema compliance) | Deterministic markdown parsing |
|
||||
|
||||
These should be computed once per evaluation run and cached for use by
|
||||
all concern-specific metrics.
|
||||
|
||||
---
|
||||
|
||||
## 6. Evaluation Workflow
|
||||
|
||||
A full collection-level evaluation run:
|
||||
|
||||
```
process_chapters.py --evaluate-collection --provider <provider>
```
|
||||
|
||||
1. **Parse** — deterministic metadata extraction from all entity files
|
||||
2. **Embed** — compute definition embeddings (cached; only new/changed
|
||||
entities need fresh embeddings)
|
||||
3. **Infer** — LLM-Eval for relationship edges, metaproperties,
|
||||
abstraction levels, pairwise judgments (batched to minimise LLM calls)
|
||||
4. **Compute** — deterministic graph metrics, FCA lattice, coverage
|
||||
matrix, similarity matrix, cycle detection
|
||||
5. **Aggregate** — combine per-entity and per-pair scores into
|
||||
collection-level metrics
|
||||
6. **Report** — write per-concern markdown reports + unified `metrics.yaml`
|
||||
7. **Append** — add timestamped snapshot to `metrics-history.yaml`
|
||||
|
||||
Incremental mode (`--evaluate-collection --chapter <id>`) re-evaluates
|
||||
only the entities introduced or modified by that chapter, plus any
|
||||
pairwise checks involving those entities.
|
||||
|
||||
---
|
||||
|
||||
## 7. References
|
||||
|
||||
- Lindland, O.I., Sindre, G. & Sølvberg, A. (1994). "Understanding
|
||||
Quality in Conceptual Modeling." *IEEE Software* 11(2), 42-49.
|
||||
→ SEQUAL framework: validity and completeness dimensions.
|
||||
|
||||
- Guarino, N. & Welty, C.A. (2004). "An Overview of OntoClean." In
|
||||
*Handbook on Ontologies*, Springer, 151-171.
|
||||
→ Metaproperty analysis: rigidity, identity, unity, dependence.
|
||||
|
||||
- Poveda-Villalón, M., Gómez-Pérez, A. & Suárez-Figueroa, M.C. (2014).
|
||||
"OOPS! (OntOlogy Pitfall Scanner!): An On-line Tool for Ontology
|
||||
Evaluation." *IJSWIS* 10(2), 7-34.
|
||||
→ Pitfall catalogue: 41 anti-patterns for ontology design.
|
||||
|
||||
- Tartir, S. & Arpinar, I.B. (2007). "Ontology Evaluation and Ranking
|
||||
using OntoQA." *ICSC 2007*, IEEE, 185-192.
|
||||
→ Schema metrics: relationship richness, attribute richness.
|
||||
|
||||
- Wille, R. (1982). "Restructuring Lattice Theory." In *Ordered Sets*,
|
||||
Reidel, 445-470.
|
||||
→ Formal Concept Analysis: concept lattices from binary contexts.
|
||||
|
||||
- Elhaj, H. et al. (2008). "Auditing SNOMED CT with Formal Concept
|
||||
Analysis." *AMIA Annual Symposium*, PMC2605587.
|
||||
→ FCA for ontology completeness auditing.
|
||||
|
||||
- Keet, C.M. (2008). *A Formal Theory of Granularity.* PhD thesis,
|
||||
Free University of Bozen-Bolzano.
|
||||
→ Granularity levels and perspectives for ontology design.
|
||||
|
||||
- Mernik, M., Heering, J. & Sloane, A.M. (2005). "When and How to
|
||||
Develop Domain-Specific Languages." *ACM Computing Surveys* 37(4),
|
||||
316-344.
|
||||
→ DSL design: soundness, completeness, laconicity.
|
||||
|
||||
- Karsai, G. et al. (2014). "Design Guidelines for Domain Specific
|
||||
Languages." *arXiv:1409.2378*.
|
||||
→ Orthogonality, necessary-and-sufficient principle.
|
||||
|
||||
- Xue, B. & Zou, L. (2022). "Knowledge Graph Quality Management: A
|
||||
Comprehensive Survey." *IEEE TKDE* 35(5), 4969-4988.
|
||||
→ KG quality dimensions: conciseness, consistency, completeness.
|
||||
@@ -43,6 +43,7 @@ examples/infospace-with-history/
|
||||
├── TUTORIAL.md # This file
|
||||
├── INFRA-TASKS.md # Infrastructure issues found during the experiment
|
||||
├── process_chapters.py # Pipeline script
|
||||
├── infospace.db # SQLite artifact database (generated, not in git)
|
||||
│
|
||||
├── schemas/ # Output structure definitions
|
||||
│ ├── economic-entity-schema-v1.0.md
|
||||
@@ -369,7 +370,53 @@ python process_chapters.py --stats
|
||||
|
||||
---
|
||||
|
||||
## 7. How the LLM Integration Works
|
||||
## 7. The Artifact Database (`infospace.db`)
|
||||
|
||||
The pipeline stores all artifacts (source text, templates, guidelines, generated
|
||||
outputs) and their dependency edges in a local SQLite database —
|
||||
`infospace.db`. This file is **not checked into git** because it is a derived
|
||||
cache that can be regenerated deterministically from the files already in the
|
||||
repository.
|
||||
|
||||
### Why it is excluded
|
||||
|
||||
- **Binary format** — SQLite databases don't produce meaningful diffs and
|
||||
would bloat the git history with every pipeline run.
|
||||
- **Fully derived** — every piece of data in the database originates from
|
||||
markdown files that *are* tracked in git (sources, templates, schemas,
|
||||
guidelines, and generated output).
|
||||
- **Reproducible** — re-running the pipeline rebuilds the database from
|
||||
scratch without any LLM calls, because each stage checks for existing
|
||||
output files on disk before invoking the LLM.
|
||||
|
||||
### How to regenerate it
|
||||
|
||||
If `infospace.db` is missing (e.g. after a fresh clone), rebuild it by
|
||||
re-running the pipeline over the chapters that already have output on disk:
|
||||
|
||||
```bash
|
||||
# Regenerate the database from existing output files (no LLM calls needed):
|
||||
python process_chapters.py --all --no-commit
|
||||
```
|
||||
|
||||
This will:
|
||||
|
||||
1. Create a fresh `infospace.db`
|
||||
2. Load all static artifacts (templates, guidelines, VSM reference)
|
||||
3. For each chapter whose output files already exist, import them into the
|
||||
database and record dependency edges
|
||||
4. Skip LLM calls entirely — existing files are detected and reused
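
The "reuse what is already on disk" behaviour described above could look roughly like this. It is a minimal sketch, not the code in `process_chapters.py`: `run_stage`, the `generate` callback and the file name are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def run_stage(output_path: Path, generate) -> tuple[str, bool]:
    """Return (text, llm_called); reuse the output file when it already exists."""
    if output_path.is_file():
        return output_path.read_text(encoding="utf-8"), False
    text = generate()  # stand-in for the LLM call
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(text, encoding="utf-8")
    return text, True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output" / "chapter-01-analysis.md"  # hypothetical file name
    first = run_stage(target, lambda: "generated text")
    second = run_stage(target, lambda: "should not be called")

print(first, second)
```

The second call returns the cached text without invoking the callback, which is why a rebuild over already processed chapters needs no LLM access.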
|
||||
|
||||
After regeneration, `--list` and `--stats` work as normal:
|
||||
|
||||
```bash
|
||||
python process_chapters.py --list
|
||||
python process_chapters.py --stats
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 8. How the LLM Integration Works
|
||||
|
||||
The pipeline uses MarkiTect's `markitect.llm` module, which provides three
|
||||
adapter backends that implement the `LLMAdapter` interface:
|
||||
@@ -423,7 +470,7 @@ supports `gemini-2.5-flash` with generous rate limits.
|
||||
|
||||
---
|
||||
|
||||
## 8. Tracking History with Git
|
||||
## 9. Tracking History with Git
|
||||
|
||||
Every processed chapter produces a git commit containing:
|
||||
|
||||
@@ -459,7 +506,7 @@ git commit -m "infospace: process book-1-chapter-05"
|
||||
|
||||
---
|
||||
|
||||
## 9. Cost and Performance
|
||||
## 10. Cost and Performance
|
||||
|
||||
From our measurements processing chapters 3-5:
|
||||
|
||||
@@ -486,7 +533,7 @@ To reduce costs further, use a cheaper model:
|
||||
|
||||
---
|
||||
|
||||
## 10. Completing the Remaining Chapters
|
||||
## 11. Completing the Remaining Chapters
|
||||
|
||||
As of now, 5 of 35 chapters are processed (Book I, Chapters 1-5). Here is
|
||||
how to complete the rest.
|
||||
@@ -555,7 +602,7 @@ fill the remaining gaps in S3*, S5, and regulatory concepts.
|
||||
|
||||
---
|
||||
|
||||
## 11. Quality Improvement Loop
|
||||
## 12. Quality Improvement Loop
|
||||
|
||||
The infospace is designed to be **iteratively refined**:
|
||||
|
||||
@@ -604,7 +651,7 @@ history of the infospace — every refinement decision is traceable.
|
||||
|
||||
---
|
||||
|
||||
## 12. Infrastructure Issues Found and Fixed
|
||||
## 13. Infrastructure Issues Found and Fixed
|
||||
|
||||
During development we documented three issues with the MarkiTect
|
||||
infrastructure in `INFRA-TASKS.md`:
|
||||
@@ -624,7 +671,7 @@ See `INFRA-TASKS.md` for details on each fix.
|
||||
|
||||
---
|
||||
|
||||
## 13. Adapting This Pattern to Your Own Project
|
||||
## 14. Adapting This Pattern to Your Own Project
|
||||
|
||||
To build your own infospace using this pattern:
|
||||
|
||||
|
||||
51
examples/infospace-with-history/infospace.yaml
Normal file
@@ -0,0 +1,51 @@
|
||||
# Infospace: The Wealth of Nations through the Viable System Model
#
# This configuration declares the infospace built by processing
# Adam Smith's "The Wealth of Nations" (1776) through the lens of
# Stafford Beer's Viable System Model (VSM).

topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/

disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/

schemas:
  entity: schemas/economic-entity-schema-v1.0.md
  mapping: schemas/vsm-mapping-schema-v1.0.md
  analysis: schemas/chapter-analysis-schema-v1.0.md

competency_questions: |
  1. How does Smith's division of labour map to VSM System 1 operations?
  2. What mechanisms in WoN correspond to VSM coordination (System 2)?
  3. Where does Smith describe self-organising regulation (System 3)?
  4. What role does the "invisible hand" play as a System 4 mechanism?
  5. How do Smith's views on government map to System 5 policy?
  6. Is the WoN entity set viable as an explanatory framework?

viability:
  redundancy_ratio:
    max: 0.10
  coverage_ratio:
    min: 0.50
  coherence_components:
    max: 3
  consistency_cycles:
    max: 0
  granularity_entropy:
    min: 1.0

pipeline:
  stages:
    - name: extract-entities
      template: templates/extract-entities.md
    - name: map-to-vsm
      template: templates/map-to-vsm.md
    - name: synthesize-analysis
      template: templates/synthesize-analysis.md
  post_batch:
    - name: assess-metrics
      template: templates/assess-metrics.md
|
||||
26
examples/infospace-with-history/output/metrics/history.yaml
Normal file
@@ -0,0 +1,26 @@
|
||||
- snapshot_id: 6ba48eb2
  created_at: '2026-02-19T01:29:41.225843+00:00'
  schema_name: default
  entity_count: 85
  entity_evaluations: []
  collection_metrics:
  - name: coherence_components
    value: 0.0
    concern: C3
  - name: consistency_cycles
    value: 0.0
    concern: C4
  - name: coverage_ratio
    value: 0.3611111111111111
    concern: C2
  - name: granularity_entropy
    value: 2.687485267017996
    concern: C5
  - name: modularity
    value: 0.0
    concern: C3
  - name: redundancy_ratio
    value: 0.0
    concern: C1
  metadata:
    source: collection-checks
|
||||
@@ -0,0 +1,6 @@
|
||||
coherence_components: 0.0
|
||||
consistency_cycles: 0.0
|
||||
coverage_ratio: 0.361111
|
||||
granularity_entropy: 2.687485
|
||||
modularity: 0.0
|
||||
redundancy_ratio: 0.0
|
||||
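
To make the relationship between the `viability` thresholds in `infospace.yaml` and the recorded metrics concrete, here is a small standalone sketch. It does not reproduce `markitect.infospace.state`; the `Threshold` class and the assumption that bounds are inclusive are illustrative only. The numbers are copied from the configuration and the metrics snapshot shown in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical min/max bound; both ends are treated as inclusive."""

    min: Optional[float] = None
    max: Optional[float] = None

    def passes(self, value: float) -> bool:
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


# Thresholds as declared under `viability:` in infospace.yaml.
thresholds = {
    "redundancy_ratio": Threshold(max=0.10),
    "coverage_ratio": Threshold(min=0.50),
    "coherence_components": Threshold(max=3),
    "consistency_cycles": Threshold(max=0),
    "granularity_entropy": Threshold(min=1.0),
}

# Values from the recorded metrics snapshot.
metrics = {
    "redundancy_ratio": 0.0,
    "coverage_ratio": 0.361111,
    "coherence_components": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.687485,
}

results = {name: t.passes(metrics[name]) for name, t in thresholds.items()}
is_viable = all(results.values())
print(results, is_viable)
```

Under these assumptions only `coverage_ratio` falls short of its bound, so the snapshot would not count as viable until coverage reaches 0.50.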
@@ -856,6 +856,125 @@ class ChapterProcessor:
|
||||
print(f" (No data yet: {e})")
|
||||
|
||||
|
||||
# ── Infospace tooling integration ─────────────────────────────────


def _load_infospace(example_dir: Path):
    """Load infospace config and entities from the example directory."""
    from markitect.infospace.config import load_infospace_config
    from markitect.infospace.entity_parser import parse_entity_directory

    config_path = example_dir / "infospace.yaml"
    if not config_path.is_file():
        print("Error: No infospace.yaml found. Create one first.")
        sys.exit(1)

    config = load_infospace_config(config_path)
    entities_dir = example_dir / config.entities_dir
    entities = parse_entity_directory(entities_dir) if entities_dir.is_dir() else []
    return config, config_path, entities


def _run_infospace_status(example_dir: Path):
    """Show infospace status using the tooling layer."""
    from markitect.infospace.state import build_state

    config, config_path, entities = _load_infospace(example_dir)
    state = build_state(config, entities=entities)

    print(f"Infospace: {state.topic_name}")
    print(f"Domain: {config.topic.domain}")
    print(f"Entities: {state.entity_count}")
    if state.domains:
        print(f"Domains: {', '.join(state.domains)}")
    if config.disciplines:
        names = [d.name for d in config.disciplines]
        print(f"Disciplines: {', '.join(names)}")

    # Show processing progress
    sources_dir = example_dir / "artifacts" / "sources"
    total_chapters = len(list(sources_dir.glob("*.md")))
    processed = len(list((example_dir / "output" / "analyses").glob("*-analysis.md")))
    print(f"Chapters: {processed}/{total_chapters} processed")


def _run_infospace_check(example_dir: Path):
    """Run collection-level quality checks."""
    from markitect.infospace.checks import run_all_checks
    from markitect.infospace.history import record_check_results

    config, config_path, entities = _load_infospace(example_dir)

    if not entities:
        print("No entities to check.")
        return

    print(f"Running collection checks on {len(entities)} entities...\n")
    report = run_all_checks(entities=entities)

    d = report.to_dict()
    for concern_name, concern_data in d.items():
        label = concern_data.get("concern", concern_name.upper())
        print(f" {label} — {concern_name}")
        for k, v in concern_data.items():
            if k == "concern":
                continue
            print(f" {k}: {v}")
        print()

    m = report.metrics()
    if m:
        print("Metrics summary:")
        for k, v in sorted(m.items()):
            print(f" {k}: {v:.4f}")
        snap = record_check_results(report, config, example_dir, entity_count=len(entities))
        print(f"\nRecorded snapshot {snap.snapshot_id}")


def _run_infospace_viability(example_dir: Path):
    """Show viability dashboard."""
    from markitect.infospace.history import read_metrics_file
    from markitect.infospace.state import build_state

    config, config_path, entities = _load_infospace(example_dir)

    if not config.viability:
        print("No viability thresholds configured.")
        return

    metrics = read_metrics_file(example_dir / config.metrics_dir / "metrics.yaml")
    if not metrics:
        print("No metrics available. Run --infospace-check first.")
        print("\nConfigured thresholds:")
        for name, t in config.viability.items():
            bounds = []
            if t.min is not None:
                bounds.append(f"min={t.min}")
            if t.max is not None:
                bounds.append(f"max={t.max}")
            print(f" {name}: {', '.join(bounds)}")
        return

    state = build_state(config, entities=entities, metrics=metrics)

    print(f"{'Metric':<30} {'Value':>8} {'Threshold':>15} {'Status':>8}")
    print("-" * 63)
    for r in state.viability_results:
        bounds = []
        if r.threshold.min is not None:
            bounds.append(f"min={r.threshold.min}")
        if r.threshold.max is not None:
            bounds.append(f"max={r.threshold.max}")
        status_str = "PASS" if r.passed else "FAIL"
        print(f"{r.metric:<30} {r.value:>8.4f} {', '.join(bounds):>15} {status_str:>8}")

    print()
    if state.is_viable:
        print(f"Viable: YES ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
    else:
        print(f"Viable: NO ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Process Wealth of Nations chapters through VSM analysis pipeline"
|
||||
@@ -869,6 +988,12 @@ def main():
|
||||
group.add_argument("--stats", action="store_true", help="Show dependency statistics")
|
||||
group.add_argument("--archive-entity", type=str, metavar="SLUG",
|
||||
help="Archive an entity (move to archive/ with reason)")
|
||||
group.add_argument("--infospace-status", action="store_true",
|
||||
help="Show infospace status via infospace tooling")
|
||||
group.add_argument("--infospace-check", action="store_true",
|
||||
help="Run collection-level quality checks (C1-C5)")
|
||||
group.add_argument("--infospace-viability", action="store_true",
|
||||
help="Show viability dashboard")
|
||||
|
||||
parser.add_argument("--reason", type=str, default=None,
|
||||
help="Reason for archiving (used with --archive-entity)")
|
||||
@@ -930,6 +1055,15 @@ def main():
|
||||
for ch in chapters:
|
||||
processor.process_chapter(ch, auto_commit=not args.no_commit)
|
||||
print()
|
||||
elif args.infospace_status:
|
||||
_run_infospace_status(example_dir)
|
||||
return
|
||||
elif args.infospace_check:
|
||||
_run_infospace_check(example_dir)
|
||||
return
|
||||
elif args.infospace_viability:
|
||||
_run_infospace_viability(example_dir)
|
||||
return
|
||||
|
||||
processor.show_stats()
|
||||
|
||||
|
||||
6
markitect/analysis/__init__.py
Normal file
@@ -0,0 +1,6 @@
|
||||
"""
|
||||
markitect.analysis — Analytical utilities for MarkiTect.
|
||||
|
||||
Provides graph analysis, similarity computation, and other
|
||||
quantitative tools used by infospace tooling.
|
||||
"""
|
||||
307
markitect/analysis/fca.py
Normal file
@@ -0,0 +1,307 @@
|
||||
"""
Formal Concept Analysis (FCA) for coverage gap detection.

Provides a pure-Python implementation of:

- :class:`FormalContext` — entity × attribute binary relation with
  extent/intent operations and double-prime closure.
- :class:`ConceptLattice` — the set of all formal concepts computed
  via the NextClosure algorithm (Ganter, 1984).
- :func:`find_gap_concepts` — attribute combinations present in the
  lattice whose extent is empty, revealing structural coverage gaps.

Sufficient for entity scales of ~100s. For larger contexts a library
such as ``concepts`` (PyPI) can be substituted.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Optional


class FormalContext:
    """Binary relation between objects and attributes.

    Args:
        objects: Iterable of object identifiers (e.g. entity slugs).
        attributes: Iterable of attribute identifiers (e.g. "domain:Production").
        incidence: Mapping of object → set of attributes it possesses.
    """

    def __init__(
        self,
        objects: Iterable[str],
        attributes: Iterable[str],
        incidence: dict[str, set[str]],
    ):
        self._objects = sorted(set(objects))
        self._attributes = sorted(set(attributes))
        self._obj_set = frozenset(self._objects)
        self._attr_set = frozenset(self._attributes)

        # Normalise incidence: only keep known attributes
        self._incidence: dict[str, frozenset[str]] = {}
        for obj in self._objects:
            raw = incidence.get(obj, set())
            self._incidence[obj] = frozenset(raw) & self._attr_set

        # Reverse index: attribute → set of objects that have it
        self._attr_to_objs: dict[str, frozenset[str]] = {}
        for attr in self._attributes:
            self._attr_to_objs[attr] = frozenset(
                obj for obj in self._objects if attr in self._incidence[obj]
            )

    @property
    def objects(self) -> list[str]:
        """Sorted list of objects."""
        return list(self._objects)

    @property
    def attributes(self) -> list[str]:
        """Sorted list of attributes."""
        return list(self._attributes)

    @property
    def object_count(self) -> int:
        return len(self._objects)

    @property
    def attribute_count(self) -> int:
        return len(self._attributes)

    def extent(self, attrs: Iterable[str]) -> frozenset[str]:
        """Objects possessing **all** given attributes (B' operation)."""
        attr_set = frozenset(attrs)
        if not attr_set:
            return self._obj_set
        result = self._obj_set
        for attr in attr_set:
            result = result & self._attr_to_objs.get(attr, frozenset())
        return result

    def intent(self, objs: Iterable[str]) -> frozenset[str]:
        """Attributes shared by **all** given objects (A' operation)."""
        obj_list = [o for o in objs if o in self._incidence]
        if not obj_list:
            return self._attr_set
        result = self._incidence[obj_list[0]]
        for obj in obj_list[1:]:
            result = result & self._incidence[obj]
        return result

    def closure(self, attrs: Iterable[str]) -> frozenset[str]:
        """Double-prime closure: B'' = intent(extent(B))."""
        return self.intent(self.extent(attrs))

    def has_attribute(self, obj: str, attr: str) -> bool:
        """Check if *obj* has *attr*."""
        return attr in self._incidence.get(obj, frozenset())

    def density(self) -> float:
        """Proportion of 1s in the incidence matrix."""
        total = len(self._objects) * len(self._attributes)
        if total == 0:
            return 0.0
        filled = sum(len(attrs) for attrs in self._incidence.values())
        return filled / total

    @classmethod
    def from_dict(cls, entity_attributes: dict[str, set[str]]) -> FormalContext:
        """Convenience: build context from ``{object: {attr, ...}}``."""
        objects = list(entity_attributes.keys())
        all_attrs: set[str] = set()
        for attrs in entity_attributes.values():
            all_attrs.update(attrs)
        return cls(objects, all_attrs, entity_attributes)


@dataclass(frozen=True)
class FormalConcept:
    """A formal concept (A, B) where A' = B and B' = A."""

    extent: frozenset[str]
    intent: frozenset[str]

    @property
    def extent_size(self) -> int:
        return len(self.extent)

    @property
    def intent_size(self) -> int:
        return len(self.intent)


@dataclass
class ConceptLattice:
    """The set of all formal concepts derived from a :class:`FormalContext`.

    Concepts are ordered by extent inclusion (subconcept ≤ superconcept).
    """

    concepts: list[FormalConcept] = field(default_factory=list)

    @property
    def size(self) -> int:
        """Number of formal concepts in the lattice."""
        return len(self.concepts)

    @property
    def top(self) -> Optional[FormalConcept]:
        """Supremum: concept with largest extent."""
        if not self.concepts:
            return None
        return max(self.concepts, key=lambda c: c.extent_size)

    @property
    def bottom(self) -> Optional[FormalConcept]:
        """Infimum: concept with largest intent."""
        if not self.concepts:
            return None
        return max(self.concepts, key=lambda c: c.intent_size)

    @classmethod
    def from_context(cls, context: FormalContext) -> ConceptLattice:
        """Compute all formal concepts using the NextClosure algorithm."""
        attrs = context.attributes  # sorted, fixed order
        if not attrs:
            # Degenerate: no attributes → single concept with all objects
            top = FormalConcept(
                extent=frozenset(context.objects),
                intent=frozenset(),
            )
            return cls(concepts=[top])

        concepts: list[FormalConcept] = []

        # Start with closure of empty attribute set
        current = context.closure(frozenset())
        ext = context.extent(current)
        concepts.append(FormalConcept(extent=ext, intent=current))

        while current != frozenset(attrs):
            nxt = _next_closure(current, attrs, context.closure)
            if nxt is None:
                break
            ext = context.extent(nxt)
            concepts.append(FormalConcept(extent=ext, intent=nxt))
            current = nxt

        return cls(concepts=concepts)

    def gap_concepts(self) -> list[FormalConcept]:
        """Formal concepts whose extent is empty."""
        return [c for c in self.concepts if c.extent_size == 0]

    def concepts_with_extent_size(self, min_size: int = 0, max_size: Optional[int] = None) -> list[FormalConcept]:
        """Filter concepts by extent size."""
        result = [c for c in self.concepts if c.extent_size >= min_size]
        if max_size is not None:
            result = [c for c in result if c.extent_size <= max_size]
        return result

    def depth(self) -> int:
        """Longest chain length in the concept ordering.

        A chain is a sequence of concepts c_1 < c_2 < ... < c_k
        where < means strict subconcept (extent inclusion).
        """
        if not self.concepts:
            return 0

        # Implicit DAG: edge i → j whenever i is a strict subconcept of j,
        # i.e. extent_i ⊂ extent_j (no explicit graph is materialised)
        n = len(self.concepts)
        extents = [c.extent for c in self.concepts]

        # Longest path via dynamic programming on sorted order
        # Sort by extent size ascending (smaller extents = more specific)
        order = sorted(range(n), key=lambda i: len(extents[i]))
        longest = [1] * n

        for idx in range(n):
            i = order[idx]
            for jdx in range(idx + 1, n):
                j = order[jdx]
                if extents[i] < extents[j]:  # strict subset
                    if longest[j] < longest[i] + 1:
                        longest[j] = longest[i] + 1

        return max(longest) if longest else 0


def find_gap_concepts(
    context: FormalContext,
    lattice: Optional[ConceptLattice] = None,
) -> list[FormalConcept]:
    """Find formal concepts with empty extent (coverage gaps).

    These represent attribute combinations that are structurally
    present in the lattice but have no corresponding entities.

    Args:
        context: The formal context.
        lattice: Pre-computed lattice. If ``None``, computed from *context*.

    Returns:
        List of :class:`FormalConcept` with empty extent, sorted by
        intent size ascending (most general gaps first).
    """
    if lattice is None:
        lattice = ConceptLattice.from_context(context)
    gaps = lattice.gap_concepts()
    gaps.sort(key=lambda c: c.intent_size)
    return gaps


def find_empty_cells(
    context: FormalContext,
    dimension_a: list[str],
    dimension_b: list[str],
) -> list[tuple[str, str]]:
    """Find empty cells in a two-dimensional cross-tabulation.

    Given two sets of attributes (e.g. domain values and VSM systems),
    return pairs ``(attr_a, attr_b)`` where no object possesses both.

    This is a simpler alternative to full FCA for two-dimensional
    coverage analysis.
    """
    empty: list[tuple[str, str]] = []
    for a in sorted(dimension_a):
        for b in sorted(dimension_b):
            if not context.extent([a, b]):
                empty.append((a, b))
    return empty


# ── NextClosure internals ───────────────────────────────────────────


def _next_closure(
    current: frozenset[str],
    attrs: list[str],
    closure_fn,
) -> Optional[frozenset[str]]:
    """Compute the next closed set in lectic order after *current*.

    Implements Ganter's NextClosure algorithm.
    """
    for i in range(len(attrs) - 1, -1, -1):
        m = attrs[i]
        if m in current:
            current = current - {m}
        else:
            candidate = current | {m}
            closed = closure_fn(candidate)
            # Canonicity test: no attribute before position i
            # was added by the closure
            canonical = True
            for j in range(i):
                if attrs[j] in closed and attrs[j] not in candidate:
                    canonical = False
                    break
            if canonical:
                return closed
    return None
|
||||
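
The derivation operators that `fca.py` builds on can be tried on a toy context without importing the module. The sketch below is a standalone illustration: the entity slugs and attribute names are made up, and the helper functions only mirror the behaviour documented for `extent`, `intent` and `closure`.

```python
# Tiny hypothetical context: entity slug -> attributes it possesses.
incidence = {
    "division-of-labour": {"domain:Production", "vsm:S1"},
    "market-price": {"domain:Exchange", "vsm:S2"},
    "pin-factory": {"domain:Production", "vsm:S1"},
}
all_attributes = frozenset().union(*incidence.values())


def extent(attrs):
    """Objects that possess every attribute in attrs (the B' operation)."""
    wanted = set(attrs)
    return frozenset(o for o, has in incidence.items() if wanted <= has)


def intent(objs):
    """Attributes shared by every object in objs (A'); all attributes if objs is empty."""
    objs = list(objs)
    if not objs:
        return all_attributes
    return frozenset.intersection(*(frozenset(incidence[o]) for o in objs))


def closure(attrs):
    """Double-prime closure B'' = intent(extent(B))."""
    return intent(extent(attrs))


closed = closure({"vsm:S1"})
gap = closure({"domain:Production", "vsm:S2"})
print(sorted(closed), sorted(gap), sorted(extent(gap)))
```

In this toy context every `vsm:S1` entity is also a Production entity, so the closure adds that attribute. No entity combines Production with `vsm:S2`, so that combination closes to the full attribute set with an empty extent, which is the kind of concept reported as a coverage gap.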
184
markitect/analysis/graph.py
Normal file
@@ -0,0 +1,184 @@
|
||||
"""
Graph analysis utilities for collection-level metrics.

Provides connected components, centrality, community detection,
modularity, degree distribution, and cohesion/coupling computation.

Requires ``networkx`` (optional dependency)::

    pip install networkx
"""

from __future__ import annotations

from typing import Optional

from markitect.prompts.dependencies.models import DependencyGraph


def _require_networkx():
    """Import and return networkx, raising a clear error if missing."""
    try:
        import networkx as nx
        return nx
    except ImportError:
        raise ImportError(
            "networkx is required for graph analysis. "
            "Install it with: pip install networkx"
        ) from None


def to_networkx(graph: DependencyGraph):
    """Convert a :class:`DependencyGraph` to a networkx ``DiGraph``.

    Each edge carries an ``edge_type`` attribute (string value of the
    :class:`EdgeType` enum, or ``None``).
    """
    nx = _require_networkx()
    G = nx.DiGraph()
    G.add_nodes_from(graph.nodes)
    for node in graph.nodes:
        for succ in graph.get_successors(node):
            edge_type = graph.get_edge_type(node, succ)
            G.add_edge(
                node, succ,
                edge_type=edge_type.value if edge_type else None,
            )
    return G


def connected_components(graph: DependencyGraph) -> list[set[str]]:
    """Find weakly connected components (edges treated as undirected).

    Returns a list of node sets, one per component, sorted largest-first.
    """
    nx = _require_networkx()
    G = to_networkx(graph)
    components = list(nx.weakly_connected_components(G))
    components.sort(key=len, reverse=True)
    return [set(c) for c in components]


def betweenness_centrality(graph: DependencyGraph) -> dict[str, float]:
    """Compute betweenness centrality for all nodes.

    Returns a dict mapping node ID to centrality score in [0, 1].
    """
    nx = _require_networkx()
    G = to_networkx(graph)
    return nx.betweenness_centrality(G)


def detect_communities(
    graph: DependencyGraph,
    seed: Optional[int] = None,
) -> list[set[str]]:
    """Detect communities using the Louvain algorithm.

    Operates on an undirected projection of the graph. Returns a list
    of node sets, one per community, sorted largest-first.

    Args:
        graph: The dependency graph to analyse.
        seed: Random seed for reproducibility (passed to Louvain).
    """
    nx = _require_networkx()
    G = to_networkx(graph).to_undirected()
    if len(G.nodes) == 0:
        return []
    communities = list(nx.community.louvain_communities(G, seed=seed))
    communities.sort(key=len, reverse=True)
    return [set(c) for c in communities]


def modularity_score(
    graph: DependencyGraph,
    communities: Optional[list[set[str]]] = None,
    seed: Optional[int] = None,
) -> float:
    """Compute the modularity score for a community partition.

    Args:
        graph: The dependency graph.
        communities: Pre-computed communities. If ``None``, communities
            are detected via :func:`detect_communities`.
        seed: Random seed (used only when *communities* is ``None``).

    Returns:
        Modularity in [-0.5, 1.0]. Returns 0.0 for graphs with no edges.
    """
    nx = _require_networkx()
    G = to_networkx(graph).to_undirected()
    if len(G.edges) == 0:
        return 0.0
    if communities is None:
        communities = detect_communities(graph, seed=seed)
    return nx.community.modularity(G, communities)


def degree_distribution(graph: DependencyGraph) -> dict[str, dict[str, int]]:
    """Compute in-degree, out-degree, and total degree for each node.

    Returns::

        {"node_id": {"in_degree": 2, "out_degree": 1, "total_degree": 3}, ...}
    """
    nx = _require_networkx()
    G = to_networkx(graph)
    result = {}
    for node in G.nodes:
        ind = G.in_degree(node)
        outd = G.out_degree(node)
        result[node] = {
            "in_degree": ind,
            "out_degree": outd,
            "total_degree": ind + outd,
        }
    return result


def cohesion_coupling(
    graph: DependencyGraph,
    communities: Optional[list[set[str]]] = None,
    seed: Optional[int] = None,
) -> dict:
    """Compute cohesion (intra-community edges) and coupling (inter-community edges).

    Args:
        graph: The dependency graph.
        communities: Pre-computed communities. If ``None``, detected
            via :func:`detect_communities`.
        seed: Random seed (used only when *communities* is ``None``).

    Returns:
        Dict with keys ``cohesion``, ``coupling`` (ratios in [0, 1]),
        ``intra_edges``, ``inter_edges``, ``total_edges``, ``communities``.
    """
    _require_networkx()
    G = to_networkx(graph)
    if communities is None:
        communities = detect_communities(graph, seed=seed)

    # Build node → community index
    node_community: dict[str, int] = {}
    for i, comm in enumerate(communities):
        for node in comm:
            node_community[node] = i

    intra = 0
    inter = 0
    for u, v in G.edges:
        if node_community.get(u) == node_community.get(v):
            intra += 1
        else:
            inter += 1

    total = intra + inter
    return {
        "cohesion": intra / total if total > 0 else 0.0,
        "coupling": inter / total if total > 0 else 0.0,
        "intra_edges": intra,
        "inter_edges": inter,
        "total_edges": total,
        "communities": len(communities),
    }
|
||||
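
The cohesion and coupling ratios computed in `graph.py` reduce to counting edges inside and between communities. The following dependency-free sketch shows that arithmetic on a hand-made example. The function name `edge_ratios`, the node names and the fixed partition are assumptions for illustration, and it requires every node to belong to a community.

```python
def edge_ratios(edges, communities):
    """Split edges into intra- and inter-community counts and derive the ratios."""
    node_community = {
        node: index
        for index, community in enumerate(communities)
        for node in community
    }
    intra = sum(1 for u, v in edges if node_community[u] == node_community[v])
    total = len(edges)
    inter = total - intra
    return {
        "cohesion": intra / total if total else 0.0,
        "coupling": inter / total if total else 0.0,
        "intra_edges": intra,
        "inter_edges": inter,
    }


# Hypothetical graph: a chain a -> b -> c -> d split into two communities.
ratios = edge_ratios(
    edges=[("a", "b"), ("b", "c"), ("c", "d")],
    communities=[{"a", "b"}, {"c", "d"}],
)
print(ratios)
```

Two of the three edges stay inside a community, so cohesion is 2/3 and coupling is 1/3. With a Louvain partition the same counting applies, only the communities are detected rather than given.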
@@ -7147,6 +7147,13 @@ try:
|
||||
except ImportError:
    pass  # Helper module not available

# Register infospace commands
try:
    from markitect.infospace.cli import infospace_commands
    cli.add_command(infospace_commands)
except ImportError:
    pass  # Infospace module not available

# Register proxy file system commands
try:
    from markitect.proxy.cli import proxy_group
|
||||
|
||||
@@ -9,6 +9,7 @@ This package contains the fundamental building blocks:
|
||||
"""
|
||||
|
||||
from .parser import parse_markdown_to_ast
|
||||
from .section_tree import build_section_tree, extract_section_text
|
||||
from .serializer import ASTSerializer
|
||||
from .document_manager import DocumentManager, CleanDocumentManager
|
||||
from .workspace import (
|
||||
@@ -29,6 +30,9 @@ from .workspace import (
|
||||
__all__ = [
|
||||
# Parser
|
||||
"parse_markdown_to_ast",
|
||||
# Section tree
|
||||
"build_section_tree",
|
||||
"extract_section_text",
|
||||
# Serializer
|
||||
"ASTSerializer",
|
||||
# Document Manager
|
||||
|
||||
124
markitect/core/section_tree.py
Normal file
@@ -0,0 +1,124 @@
|
||||
"""
|
||||
Standalone section-tree utilities extracted from SchemaGenerator.
|
||||
|
||||
Builds a hierarchical section tree from flat markdown-it AST tokens and
|
||||
provides helpers for navigating heading structure and extracting text.
|
||||
These functions are used by both the schema generator and the infospace
|
||||
entity parser.
|
||||
"""
|
||||
|
||||
import re
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
|
||||
def slugify(text: str) -> str:
|
||||
"""Convert heading or label text to a valid slug / JSON property key."""
|
||||
replacements = {
|
||||
'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
|
||||
'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue', 'ß': 'ss',
|
||||
}
|
||||
slug = text
|
||||
for char, repl in replacements.items():
|
||||
slug = slug.replace(char, repl)
|
||||
slug = slug.lower()
|
||||
slug = re.sub(r'[^a-z0-9]+', '_', slug)
|
||||
slug = slug.strip('_')
|
||||
return slug or 'feld'
|
||||
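Review note: a few sample inputs make the behaviour of `slugify` easier to judge. The block repeats the function body from this diff so that it runs on its own. The sample strings are made up for illustration.

```python
import re


def slugify(text: str) -> str:
    """Copy of the slugify() added in this diff, repeated so the block is runnable."""
    replacements = {
        'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
        'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue', 'ß': 'ss',
    }
    slug = text
    for char, repl in replacements.items():
        slug = slug.replace(char, repl)
    slug = slug.lower()
    # Every run of characters outside [a-z0-9] collapses into one underscore.
    slug = re.sub(r'[^a-z0-9]+', '_', slug)
    slug = slug.strip('_')
    return slug or 'feld'


samples = ["Core Concepts", "Über uns & Größe", "Café", "???"]
slugs = [slugify(s) for s in samples]
print(slugs)
```

Two details are worth knowing when relying on the result as a key. Only German umlauts and `ß` are transliterated, so other accented letters are dropped (`Café` becomes `caf`). Input with no usable characters yields the fallback `feld`, so distinct headings can map to the same slug.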
|
||||
|
||||
def extract_heading_level(tag: str) -> int:
|
||||
"""Extract heading level from an HTML tag string (h1, h2, …)."""
|
||||
if tag.startswith('h') and len(tag) == 2:
|
||||
try:
|
||||
return int(tag[1])
|
||||
except ValueError:
|
||||
pass
|
||||
return 1
|
||||
|
||||
|
||||
def extract_heading_content(tokens: List[Dict[str, Any]], start_index: int) -> str:
|
||||
"""Return the inline text content following a ``heading_open`` token."""
|
||||
for i in range(start_index, min(start_index + 3, len(tokens))):
|
||||
token = tokens[i]
|
||||
if token.get('type') == 'inline':
|
||||
return token.get('content', '')
|
||||
return ''
|
||||
|
||||
|
||||
def build_section_tree(
|
||||
tokens: List[Dict[str, Any]], max_depth: Optional[int] = None
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Build a hierarchical section tree from a flat markdown-it token list.
|
||||
|
||||
Returns a root node whose ``children`` list contains the top-level
|
||||
sections. Each node carries:
|
||||
|
||||
- ``heading`` – heading text (``None`` for the root)
|
||||
- ``level`` – heading depth (``0`` for the root)
|
||||
- ``slug`` – slugified heading
|
||||
- ``content_tokens`` – non-heading tokens belonging to this section
|
||||
- ``children`` – nested sub-sections
|
||||
"""
|
||||
root: Dict[str, Any] = {
|
||||
'heading': None, 'level': 0, 'slug': '',
|
||||
'content_tokens': [], 'children': []
|
||||
}
|
||||
stack = [root]
|
||||
|
||||
i = 0
|
||||
while i < len(tokens):
|
||||
token = tokens[i]
|
||||
if token.get('type') == 'heading_open':
|
||||
level = extract_heading_level(token.get('tag', ''))
|
||||
heading_text = extract_heading_content(tokens, i)
|
||||
|
||||
if max_depth is not None and level > max_depth:
|
||||
# Skip this heading and its close token, but keep content
|
||||
i += 1
|
||||
while i < len(tokens) and tokens[i].get('type') != 'heading_close':
|
||||
i += 1
|
||||
i += 1
|
||||
continue
|
||||
|
||||
section: Dict[str, Any] = {
|
||||
'heading': heading_text,
|
||||
'level': level,
|
||||
'slug': slugify(heading_text),
|
||||
'content_tokens': [],
|
||||
'children': []
|
||||
}
|
||||
|
||||
# Pop stack until we find the parent (level < current)
|
||||
while len(stack) > 1 and stack[-1]['level'] >= level:
|
||||
stack.pop()
|
||||
|
||||
stack[-1]['children'].append(section)
|
||||
stack.append(section)
|
||||
|
||||
# Skip past heading_close
|
||||
i += 1
|
||||
while i < len(tokens) and tokens[i].get('type') != 'heading_close':
|
||||
i += 1
|
||||
else:
|
||||
# Add content token to current section
|
||||
stack[-1]['content_tokens'].append(token)
|
||||
|
||||
i += 1
|
||||
|
||||
return root
|
||||
|
||||
|
||||
def extract_section_text(section: Dict[str, Any]) -> str:
|
||||
"""
|
||||
Return the plain text content of a section node.
|
||||
|
||||
Concatenates the ``content`` field of every ``inline`` token found
|
||||
in the section's ``content_tokens``, joined with newlines.
|
||||
"""
|
||||
parts: List[str] = []
|
||||
for token in section.get('content_tokens', []):
|
||||
if token.get('type') == 'inline':
|
||||
parts.append(token.get('content', ''))
|
||||
return '\n'.join(parts)
|
||||
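Review note: the stack handling in `build_section_tree` is the core of this module. The sketch below isolates it, using `(level, heading)` pairs instead of markdown-it tokens. The helper name `nest_headings` and the sample outline are hypothetical. Content tokens, slugs and `max_depth` are left out on purpose.

```python
def nest_headings(headings):
    """Nest (level, heading) pairs with the same stack rule as build_section_tree."""
    root = {"heading": None, "level": 0, "children": []}
    stack = [root]
    for level, title in headings:
        node = {"heading": title, "level": level, "children": []}
        # Pop until the top of the stack is a shallower heading (the parent).
        while len(stack) > 1 and stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


# Hypothetical outline; note the jump from level 1 straight to level 3.
outline = [(1, "Intro"), (2, "Scope"), (2, "Terms"), (1, "Usage"), (3, "Edge case")]
tree = nest_headings(outline)

top_level = [child["heading"] for child in tree["children"]]
intro_children = [child["heading"] for child in tree["children"][0]["children"]]
usage_children = [child["heading"] for child in tree["children"][1]["children"]]
print(top_level, intro_children, usage_children)
```

A heading that skips a level (here `h3` directly under `h1`) is attached to the nearest shallower heading rather than to a synthetic intermediate node. Callers that expect strictly consecutive levels should account for that.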
107
markitect/infospace/__init__.py
Normal file
@@ -0,0 +1,107 @@
|
||||
"""
|
||||
Infospace analysis package.
|
||||
|
||||
Provides tooling for extracting structured metadata from entity markdown
|
||||
files and analysing infospace collections.
|
||||
"""
|
||||
|
||||
from .models import EntityMeta
|
||||
from .entity_parser import parse_entity_file, parse_entity_directory
|
||||
from .schema import (
|
||||
ECONOMIC_ENTITY_SCHEMA,
|
||||
EntitySchema,
|
||||
EnumConstraint,
|
||||
SectionRequirement,
|
||||
SectionRule,
|
||||
)
|
||||
from .validator import (
|
||||
BatchComplianceResult,
|
||||
ComplianceDiagnostic,
|
||||
ComplianceResult,
|
||||
validate_entities,
|
||||
validate_entity,
|
||||
)
|
||||
from .evaluation import (
|
||||
EntityEvaluation,
|
||||
EvaluationSnapshot,
|
||||
MetricChange,
|
||||
MetricValue,
|
||||
ScoreChange,
|
||||
ScoreEntry,
|
||||
SnapshotDiff,
|
||||
)
|
||||
from .evaluation_io import (
|
||||
append_to_history,
|
||||
diff_snapshots,
|
||||
read_entity_evaluation,
|
||||
read_history,
|
||||
read_snapshot,
|
||||
write_entity_evaluation,
|
||||
write_snapshot,
|
||||
)
|
||||
from .config import (
|
||||
DisciplineBinding,
|
||||
InfospaceConfig,
|
||||
PipelineConfig,
|
||||
PipelineStage,
|
||||
SchemaRegistry,
|
||||
TopicConfig,
|
||||
ViabilityThreshold,
|
||||
find_infospace_config,
|
||||
load_infospace_config,
|
||||
save_infospace_config,
|
||||
)
|
||||
from .state import (
|
||||
InfospaceState,
|
||||
ViabilityResult,
|
||||
build_state,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"EntityMeta",
|
||||
"parse_entity_file",
|
||||
"parse_entity_directory",
|
||||
# Schema
|
||||
"ECONOMIC_ENTITY_SCHEMA",
|
||||
"EntitySchema",
|
||||
"EnumConstraint",
|
||||
"SectionRequirement",
|
||||
"SectionRule",
|
||||
# Validator
|
||||
"BatchComplianceResult",
|
||||
"ComplianceDiagnostic",
|
||||
"ComplianceResult",
|
||||
"validate_entities",
|
||||
"validate_entity",
|
||||
# Evaluation models
|
||||
"EntityEvaluation",
|
||||
"EvaluationSnapshot",
|
||||
"MetricChange",
|
||||
"MetricValue",
|
||||
"ScoreChange",
|
||||
"ScoreEntry",
|
||||
"SnapshotDiff",
|
||||
# Evaluation I/O
|
||||
"append_to_history",
|
||||
"diff_snapshots",
|
||||
"read_entity_evaluation",
|
||||
"read_history",
|
||||
"read_snapshot",
|
||||
"write_entity_evaluation",
|
||||
"write_snapshot",
|
||||
# Config
|
||||
"DisciplineBinding",
|
||||
"InfospaceConfig",
|
||||
"PipelineConfig",
|
||||
"PipelineStage",
|
||||
"SchemaRegistry",
|
||||
"TopicConfig",
|
||||
"ViabilityThreshold",
|
||||
"find_infospace_config",
|
||||
"load_infospace_config",
|
||||
"save_infospace_config",
|
||||
# State
|
||||
"InfospaceState",
|
||||
"ViabilityResult",
|
||||
"build_state",
|
||||
]
|
||||
23
markitect/infospace/checks/__init__.py
Normal file
@@ -0,0 +1,23 @@
|
||||
"""
|
||||
Collection-level quality checks for infospaces.
|
||||
|
||||
Five concerns: Redundancy (C1), Coverage (C2), Coherence (C3),
|
||||
Consistency (C4), Granularity (C5).
|
||||
"""
|
||||
|
||||
from markitect.infospace.checks.redundancy import check_redundancy
|
||||
from markitect.infospace.checks.coverage import check_coverage
|
||||
from markitect.infospace.checks.coherence import check_coherence
|
||||
from markitect.infospace.checks.consistency import check_consistency
|
||||
from markitect.infospace.checks.granularity import check_granularity
|
||||
from markitect.infospace.checks.orchestrator import run_all_checks, CheckReport
|
||||
|
||||
__all__ = [
|
||||
"check_redundancy",
|
||||
"check_coverage",
|
||||
"check_coherence",
|
||||
"check_consistency",
|
||||
"check_granularity",
|
||||
"run_all_checks",
|
||||
"CheckReport",
|
||||
]
|
||||
81
markitect/infospace/checks/coherence.py
Normal file
@@ -0,0 +1,81 @@
|
||||
"""
|
||||
C3 — Structural coherence.
|
||||
|
||||
Uses graph analysis to check that the entity relationship graph is
|
||||
well-connected and has meaningful community structure.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
from markitect.prompts.dependencies.models import DependencyGraph
|
||||
|
||||
|
||||
@dataclass
|
||||
class CoherenceReport:
|
||||
"""Results from coherence analysis."""
|
||||
|
||||
connected_components: int = 0
|
||||
largest_component_size: int = 0
|
||||
modularity: float = 0.0
|
||||
community_count: int = 0
|
||||
cohesion: float = 0.0
|
||||
coupling: float = 0.0
|
||||
entity_count: int = 0
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"concern": "C3",
|
||||
"connected_components": self.connected_components,
|
||||
"largest_component_size": self.largest_component_size,
|
||||
"modularity": round(self.modularity, 4),
|
||||
"community_count": self.community_count,
|
||||
"cohesion": round(self.cohesion, 4),
|
||||
"coupling": round(self.coupling, 4),
|
||||
"entity_count": self.entity_count,
|
||||
}
|
||||
|
||||
|
||||
def check_coherence(
|
||||
graph: Optional[DependencyGraph] = None,
|
||||
entity_count: int = 0,
|
||||
) -> CoherenceReport:
|
||||
"""Check structural coherence of the entity relationship graph.
|
||||
|
||||
Args:
|
||||
graph: The entity relationship graph. If ``None``, returns
|
||||
a report with zero values.
|
||||
entity_count: Total number of entities (for context).
|
||||
|
||||
Returns:
|
||||
:class:`CoherenceReport` with connectivity and community metrics.
|
||||
"""
|
||||
if graph is None or len(graph.nodes) == 0:
|
||||
return CoherenceReport(entity_count=entity_count)
|
||||
|
||||
try:
|
||||
from markitect.analysis.graph import (
|
||||
connected_components,
|
||||
modularity_score,
|
||||
detect_communities,
|
||||
cohesion_coupling,
|
||||
)
|
||||
except ImportError:
|
||||
return CoherenceReport(entity_count=entity_count)
|
||||
|
||||
components = connected_components(graph)
|
||||
communities = detect_communities(graph, seed=42)
|
||||
mod = modularity_score(graph, communities=communities)
|
||||
cc = cohesion_coupling(graph, communities=communities)
|
||||
|
||||
return CoherenceReport(
|
||||
connected_components=len(components),
|
||||
largest_component_size=max((len(c) for c in components), default=0),
|
||||
modularity=mod,
|
||||
community_count=len(communities),
|
||||
cohesion=cc["cohesion"],
|
||||
coupling=cc["coupling"],
|
||||
entity_count=entity_count or len(graph.nodes),
|
||||
)
|
||||
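Review note: `check_coherence` reads the size of `components[0]`, which is only the largest component if `connected_components` returns its result sorted by size. That helper is not part of this hunk, so the ordering is an open question here. The sketch below uses a hypothetical pure-Python `weak_components` helper, not the project's implementation, to show how the two readings differ when the order is arbitrary.

```python
from collections import defaultdict


def weak_components(nodes, edges):
    """Group nodes into components, ignoring edge direction."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        component = set()
        frontier = [start]
        seen.add(start)
        while frontier:
            node = frontier.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append(neighbour)
        components.append(component)
    return components


# Hypothetical graph: {a, b}, {c, d, e} and the isolated node f.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("c", "d"), ("d", "e")]
components = weak_components(nodes, edges)

first_size = len(components[0])
largest_size = max((len(c) for c in components), default=0)
print(len(components), first_size, largest_size)
```

Taking the maximum over all components gives the same answer regardless of ordering and also handles the empty case without a separate branch.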
58
markitect/infospace/checks/consistency.py
Normal file
@@ -0,0 +1,58 @@
|
||||
"""
|
||||
C4 — Definitional consistency.
|
||||
|
||||
Checks for cycles in the dependency graph. Only cycle detection is
implemented here; conflicts between definitions are not checked.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
from markitect.infospace.models import EntityMeta
|
||||
from markitect.prompts.dependencies.models import DependencyGraph
|
||||
|
||||
|
||||
@dataclass
|
||||
class ConsistencyReport:
|
||||
"""Results from consistency analysis."""
|
||||
|
||||
cycles: List[List[str]] = field(default_factory=list)
|
||||
cycle_count: int = 0
|
||||
entity_count: int = 0
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"concern": "C4",
|
||||
"cycle_count": self.cycle_count,
|
||||
"cycles": self.cycles,
|
||||
"entity_count": self.entity_count,
|
||||
}
|
||||
|
||||
|
||||
def check_consistency(
|
||||
entities: List[EntityMeta],
|
||||
graph: Optional[DependencyGraph] = None,
|
||||
) -> ConsistencyReport:
|
||||
"""Check definitional consistency.
|
||||
|
||||
Args:
|
||||
entities: Entity metadata list.
|
||||
graph: Optional dependency graph for cycle detection.
|
||||
|
||||
Returns:
|
||||
:class:`ConsistencyReport` with cycles found.
|
||||
"""
|
||||
n = len(entities)
|
||||
cycles: List[List[str]] = []
|
||||
|
||||
if graph is not None and len(graph.nodes) > 0:
|
||||
cycles = graph.detect_cycles()
|
||||
|
||||
return ConsistencyReport(
|
||||
cycles=cycles,
|
||||
cycle_count=len(cycles),
|
||||
entity_count=n,
|
||||
)
|
||||
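Review note: `check_consistency` delegates to `DependencyGraph.detect_cycles()`, whose implementation is not shown in this diff. For readers who want to see what a cycle report can look like, here is a minimal depth-first sketch over a plain adjacency dict. The function `find_cycles` and the entity names are hypothetical, and the sketch reports one cycle per back edge rather than every elementary cycle.

```python
def find_cycles(dependencies):
    """Return cycles found via DFS back edges in {node: [dependency, ...]}."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in dependencies}
    cycles = []
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for target in dependencies.get(node, []):
            state = colour.get(target, WHITE)
            if state == GREY:
                # Back edge: the cycle is the path from `target` to here.
                cycles.append(path[path.index(target):] + [target])
            elif state == WHITE:
                visit(target)
        path.pop()
        colour[node] = BLACK

    for node in list(dependencies):
        if colour[node] == WHITE:
            visit(node)
    return cycles


# Hypothetical entity slugs: price -> value -> labour -> price forms a loop.
dependencies = {
    "price": ["value"],
    "value": ["labour"],
    "labour": ["price"],
    "market": ["price"],
}
cycles = find_cycles(dependencies)
print(cycles)
```

The recursion is fine for small entity graphs. A very deep dependency chain would need an explicit stack to stay under Python's recursion limit.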
111
markitect/infospace/checks/coverage.py
Normal file
@@ -0,0 +1,111 @@
|
||||
"""
|
||||
C2 — Coverage completeness.
|
||||
|
||||
Uses FCA and cross-tabulation to detect structural coverage gaps:
|
||||
attribute combinations (domain × source chapter) with no entities.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from markitect.infospace.models import EntityMeta
|
||||
from markitect.analysis.fca import FormalContext, find_empty_cells, find_gap_concepts
|
||||
|
||||
|
||||
@dataclass
|
||||
class CoverageReport:
|
||||
"""Results from coverage analysis."""
|
||||
|
||||
coverage_ratio: float = 0.0
|
||||
empty_cells: List[dict] = field(default_factory=list)
|
||||
gap_concepts: List[dict] = field(default_factory=list)
|
||||
domain_counts: Dict[str, int] = field(default_factory=dict)
|
||||
entity_count: int = 0
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"concern": "C2",
|
||||
"coverage_ratio": round(self.coverage_ratio, 4),
|
||||
"empty_cells": self.empty_cells,
|
||||
"gap_concepts_count": len(self.gap_concepts),
|
||||
"domain_counts": self.domain_counts,
|
||||
"entity_count": self.entity_count,
|
||||
}
|
||||
|
||||
|
||||
def _extract_attributes(entity: EntityMeta) -> set[str]:
|
||||
"""Extract FCA attributes from an entity."""
|
||||
attrs: set[str] = set()
|
||||
if entity.domain:
|
||||
attrs.add(f"domain:{entity.domain}")
|
||||
if entity.source_chapter:
|
||||
attrs.add(f"chapter:{entity.source_chapter}")
|
||||
return attrs
|
||||
|
||||
|
||||
def check_coverage(
|
||||
entities: List[EntityMeta],
|
||||
extra_attributes: Optional[Dict[str, set[str]]] = None,
|
||||
) -> CoverageReport:
|
||||
"""Check coverage completeness using FCA gap analysis.
|
||||
|
||||
Args:
|
||||
entities: Entity metadata list.
|
||||
extra_attributes: Optional ``{slug: {attr, ...}}`` to merge
|
||||
with auto-extracted attributes (e.g. VSM mappings).
|
||||
|
||||
Returns:
|
||||
:class:`CoverageReport` with gaps and coverage ratio.
|
||||
"""
|
||||
n = len(entities)
|
||||
if n == 0:
|
||||
return CoverageReport()
|
||||
|
||||
# Build entity → attributes mapping
|
||||
entity_attrs: Dict[str, set[str]] = {}
|
||||
for e in entities:
|
||||
attrs = _extract_attributes(e)
|
||||
if extra_attributes and e.slug in extra_attributes:
|
||||
attrs.update(extra_attributes[e.slug])
|
||||
entity_attrs[e.slug] = attrs
|
||||
|
||||
# Domain counts
|
||||
domain_counts: Dict[str, int] = {}
|
||||
for e in entities:
|
||||
d = e.domain or "(unspecified)"
|
||||
domain_counts[d] = domain_counts.get(d, 0) + 1
|
||||
|
||||
# Build FCA context
|
||||
context = FormalContext.from_dict(entity_attrs)
|
||||
|
||||
# Cross-tabulation: domain × chapter
|
||||
domains = sorted({a for attrs in entity_attrs.values() for a in attrs if a.startswith("domain:")})
|
||||
chapters = sorted({a for attrs in entity_attrs.values() for a in attrs if a.startswith("chapter:")})
|
||||
|
||||
empty = []
|
||||
if domains and chapters:
|
||||
raw_empty = find_empty_cells(context, domains, chapters)
|
||||
empty = [{"dimension_a": a, "dimension_b": b} for a, b in raw_empty]
|
||||
|
||||
# FCA gap concepts
|
||||
gaps = find_gap_concepts(context)
|
||||
gap_dicts = [
|
||||
{"intent": sorted(g.intent), "extent_size": g.extent_size}
|
||||
for g in gaps
|
||||
if g.intent_size <= 4 # Only report manageable gaps
|
||||
]
|
||||
|
||||
# Coverage ratio: populated cells / total possible cells
|
||||
total_cells = len(domains) * len(chapters) if domains and chapters else 1
|
||||
populated = total_cells - len(empty)
|
||||
ratio = populated / total_cells if total_cells > 0 else 0.0
|
||||
|
||||
return CoverageReport(
|
||||
coverage_ratio=ratio,
|
||||
empty_cells=empty,
|
||||
gap_concepts=gap_dicts,
|
||||
domain_counts=domain_counts,
|
||||
entity_count=n,
|
||||
)
|
||||
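Review note: the coverage ratio depends on what `find_empty_cells` counts as an empty cell, and that function lives in `markitect.analysis.fca`, outside this hunk. The sketch below assumes a cell `(domain, chapter)` is populated when at least one entity carries both attributes. The helper `empty_cells` and the sample entities are hypothetical.

```python
from itertools import product


def empty_cells(entity_attrs, rows, cols):
    """Return (row, col) pairs that no entity covers with both attributes."""
    populated = set()
    for attrs in entity_attrs.values():
        for row, col in product(rows, cols):
            if row in attrs and col in attrs:
                populated.add((row, col))
    return [cell for cell in product(rows, cols) if cell not in populated]


# Hypothetical entities with the attribute prefixes used by _extract_attributes.
entity_attrs = {
    "commodity": {"domain:value", "chapter:1"},
    "money": {"domain:exchange", "chapter:3"},
    "price": {"domain:exchange", "chapter:1"},
}
domains = sorted(
    {a for attrs in entity_attrs.values() for a in attrs if a.startswith("domain:")}
)
chapters = sorted(
    {a for attrs in entity_attrs.values() for a in attrs if a.startswith("chapter:")}
)

gaps = empty_cells(entity_attrs, domains, chapters)
total_cells = len(domains) * len(chapters)
coverage_ratio = (total_cells - len(gaps)) / total_cells
print(gaps, coverage_ratio)
```

Three of the four domain × chapter cells are populated, so the ratio is 0.75. Note that in `check_coverage` a collection without any domain or chapter attributes gets `total_cells = 1` and no empty cells, so it reports a ratio of 1.0.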
98
markitect/infospace/checks/granularity.py
Normal file
@@ -0,0 +1,98 @@
|
||||
"""
|
||||
C5 — Granularity balance.
|
||||
|
||||
Checks that entities are at a consistent level of abstraction,
|
||||
measured by word count distribution and Shannon entropy of domain
|
||||
assignments.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import math
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Dict, List
|
||||
|
||||
from markitect.infospace.models import EntityMeta
|
||||
|
||||
|
||||
@dataclass
|
||||
class GranularityReport:
|
||||
"""Results from granularity analysis."""
|
||||
|
||||
domain_entropy: float = 0.0
|
||||
word_count_stats: Dict[str, float] = field(default_factory=dict)
|
||||
domain_distribution: Dict[str, int] = field(default_factory=dict)
|
||||
entity_count: int = 0
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"concern": "C5",
|
||||
"domain_entropy": round(self.domain_entropy, 4),
|
||||
"word_count_stats": {
|
||||
k: round(v, 2) for k, v in self.word_count_stats.items()
|
||||
},
|
||||
"domain_distribution": self.domain_distribution,
|
||||
"entity_count": self.entity_count,
|
||||
}
|
||||
|
||||
|
||||
def _shannon_entropy(counts: Dict[str, int]) -> float:
|
||||
"""Compute Shannon entropy of a distribution."""
|
||||
total = sum(counts.values())
|
||||
if total == 0:
|
||||
return 0.0
|
||||
entropy = 0.0
|
||||
for count in counts.values():
|
||||
if count > 0:
|
||||
p = count / total
|
||||
entropy -= p * math.log2(p)
|
||||
return entropy
|
||||
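Review note: to get a feel for the scale of `domain_entropy`, the sketch below evaluates the same formula on three made-up distributions. The function is a standalone copy of the logic in `_shannon_entropy`. The domain names and counts are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a {label: count} distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Four domains with 20 entities in total, distributed three different ways.
balanced = shannon_entropy({"value": 5, "exchange": 5, "labour": 5, "capital": 5})
skewed = shannon_entropy({"value": 17, "exchange": 1, "labour": 1, "capital": 1})
single = shannon_entropy({"value": 20})
print(balanced, round(skewed, 4), single)
```

The value is measured in bits and is bounded by `log2(k)` for `k` domains, so a raw entropy of 2.0 is the maximum for four domains but only moderately balanced for sixteen. A threshold on this metric has to take the number of domains into account.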
|
||||
|
||||
def check_granularity(entities: List[EntityMeta]) -> GranularityReport:
|
||||
"""Check granularity balance across entities.
|
||||
|
||||
Metrics:
|
||||
- Domain entropy: higher = more balanced distribution.
|
||||
- Word count statistics: mean, min, max, population std dev.
|
||||
|
||||
Args:
|
||||
entities: Entity metadata list.
|
||||
|
||||
Returns:
|
||||
:class:`GranularityReport` with balance metrics.
|
||||
"""
|
||||
n = len(entities)
|
||||
if n == 0:
|
||||
return GranularityReport()
|
||||
|
||||
# Domain distribution
|
||||
domain_counts: Dict[str, int] = {}
|
||||
for e in entities:
|
||||
d = e.domain or "(unspecified)"
|
||||
domain_counts[d] = domain_counts.get(d, 0) + 1
|
||||
|
||||
entropy = _shannon_entropy(domain_counts)
|
||||
|
||||
# Word count statistics
|
||||
word_counts = [e.definition_word_count for e in entities]
|
||||
if not word_counts:
|
||||
word_counts = [0]
|
||||
|
||||
mean_wc = sum(word_counts) / len(word_counts)
|
||||
min_wc = min(word_counts)
|
||||
max_wc = max(word_counts)
|
||||
variance = sum((wc - mean_wc) ** 2 for wc in word_counts) / len(word_counts)
|
||||
std_wc = math.sqrt(variance)
|
||||
|
||||
return GranularityReport(
|
||||
domain_entropy=entropy,
|
||||
word_count_stats={
|
||||
"mean": mean_wc,
|
||||
"min": float(min_wc),
|
||||
"max": float(max_wc),
|
||||
"std": std_wc,
|
||||
},
|
||||
domain_distribution=domain_counts,
|
||||
entity_count=n,
|
||||
)
|
||||
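Review note: the `std` value in `word_count_stats` divides by `n`, so it is the population standard deviation rather than the sample one. The sketch below reproduces the arithmetic and cross-checks it against the standard library. The helper name `word_count_stats` and the sample counts are hypothetical.

```python
import math
import statistics


def word_count_stats(word_counts):
    """Mean, min, max and population standard deviation of word counts."""
    mean = sum(word_counts) / len(word_counts)
    # Dividing by len(word_counts) (not len - 1) gives the population variance.
    variance = sum((wc - mean) ** 2 for wc in word_counts) / len(word_counts)
    return {
        "mean": mean,
        "min": float(min(word_counts)),
        "max": float(max(word_counts)),
        "std": math.sqrt(variance),
    }


sample_counts = [10, 20, 30]
stats = word_count_stats(sample_counts)
print(stats)
print(statistics.pstdev(sample_counts), statistics.stdev(sample_counts))
```

For small collections the difference between the two estimators is noticeable, which matters if the value is compared against numbers produced by other tools.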
102
markitect/infospace/checks/orchestrator.py
Normal file
@@ -0,0 +1,102 @@
|
||||
"""
|
||||
Unified orchestrator for all five collection-level checks.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from markitect.infospace.models import EntityMeta
|
||||
from markitect.prompts.dependencies.models import DependencyGraph
|
||||
|
||||
from .redundancy import RedundancyReport, check_redundancy
|
||||
from .coverage import CoverageReport, check_coverage
|
||||
from .coherence import CoherenceReport, check_coherence
|
||||
from .consistency import ConsistencyReport, check_consistency
|
||||
from .granularity import GranularityReport, check_granularity
|
||||
|
||||
|
||||
@dataclass
|
||||
class CheckReport:
|
||||
"""Unified report from all five collection-level checks."""
|
||||
|
||||
redundancy: Optional[RedundancyReport] = None
|
||||
coverage: Optional[CoverageReport] = None
|
||||
coherence: Optional[CoherenceReport] = None
|
||||
consistency: Optional[ConsistencyReport] = None
|
||||
granularity: Optional[GranularityReport] = None
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {}
|
||||
if self.redundancy:
|
||||
d["redundancy"] = self.redundancy.to_dict()
|
||||
if self.coverage:
|
||||
d["coverage"] = self.coverage.to_dict()
|
||||
if self.coherence:
|
||||
d["coherence"] = self.coherence.to_dict()
|
||||
if self.consistency:
|
||||
d["consistency"] = self.consistency.to_dict()
|
||||
if self.granularity:
|
||||
d["granularity"] = self.granularity.to_dict()
|
||||
return d
|
||||
|
||||
def metrics(self) -> Dict[str, float]:
|
||||
"""Extract key metrics for viability checking."""
|
||||
m: Dict[str, float] = {}
|
||||
if self.redundancy:
|
||||
m["redundancy_ratio"] = self.redundancy.redundancy_ratio
|
||||
if self.coverage:
|
||||
m["coverage_ratio"] = self.coverage.coverage_ratio
|
||||
if self.coherence:
|
||||
m["coherence_components"] = float(self.coherence.connected_components)
|
||||
m["modularity"] = self.coherence.modularity
|
||||
if self.consistency:
|
||||
m["consistency_cycles"] = float(self.consistency.cycle_count)
|
||||
if self.granularity:
|
||||
m["granularity_entropy"] = self.granularity.domain_entropy
|
||||
return m
|
||||
|
||||
|
||||
def run_all_checks(
|
||||
entities: List[EntityMeta],
|
||||
embeddings: Optional[Dict[str, list[float]]] = None,
|
||||
graph: Optional[DependencyGraph] = None,
|
||||
extra_attributes: Optional[Dict[str, set[str]]] = None,
|
||||
checks: Optional[List[str]] = None,
|
||||
) -> CheckReport:
|
||||
"""Run all (or selected) collection-level checks.
|
||||
|
||||
Args:
|
||||
entities: Entity metadata list.
|
||||
embeddings: Pre-computed embedding vectors for C1.
|
||||
graph: Entity relationship graph for C3 and C4.
|
||||
extra_attributes: Extra FCA attributes for C2.
|
||||
checks: List of check names to run. If ``None``, runs all five.
|
||||
Valid names: ``redundancy``, ``coverage``, ``coherence``,
|
||||
``consistency``, ``granularity``.
|
||||
|
||||
Returns:
|
||||
:class:`CheckReport` with results from each check.
|
||||
"""
|
||||
run_all = checks is None
|
||||
check_set = set(checks) if checks else set()
|
||||
|
||||
report = CheckReport()
|
||||
|
||||
if run_all or "redundancy" in check_set:
|
||||
report.redundancy = check_redundancy(entities, embeddings=embeddings)
|
||||
|
||||
if run_all or "coverage" in check_set:
|
||||
report.coverage = check_coverage(entities, extra_attributes=extra_attributes)
|
||||
|
||||
if run_all or "coherence" in check_set:
|
||||
report.coherence = check_coherence(graph=graph, entity_count=len(entities))
|
||||
|
||||
if run_all or "consistency" in check_set:
|
||||
report.consistency = check_consistency(entities, graph=graph)
|
||||
|
||||
if run_all or "granularity" in check_set:
|
||||
report.granularity = check_granularity(entities)
|
||||
|
||||
return report
|
||||
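Review note: the `checks` argument of `run_all_checks` has two edge cases that are easy to miss. The sketch below isolates the selection logic with the same two expressions as the function above. The helper `selected_checks` and the constant `ALL_CHECKS` are hypothetical names used only for this illustration.

```python
ALL_CHECKS = ("redundancy", "coverage", "coherence", "consistency", "granularity")


def selected_checks(checks=None):
    """Return the check names that run_all_checks would execute for `checks`."""
    run_all = checks is None
    check_set = set(checks) if checks else set()
    return [name for name in ALL_CHECKS if run_all or name in check_set]


everything = selected_checks()
subset = selected_checks(["granularity", "coverage"])
nothing = selected_checks([])
misspelled = selected_checks(["coverge"])
print(everything, subset, nothing, misspelled)
```

Passing an empty list runs no checks at all, and an unknown name is ignored silently, so a typo in a check name produces an empty report instead of an error. Callers that build the list dynamically may want to validate it first.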
98
markitect/infospace/checks/redundancy.py
Normal file
@@ -0,0 +1,98 @@
|
||||
"""
|
||||
C1 — Redundancy detection.
|
||||
|
||||
Uses embedding similarity to find entity pairs with overlapping
|
||||
meanings that may be candidates for merging.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
from markitect.infospace.models import EntityMeta
|
||||
from markitect.llm.similarity import find_similar_pairs
|
||||
|
||||
|
||||
@dataclass
|
||||
class RedundancyReport:
|
||||
"""Results from redundancy analysis."""
|
||||
|
||||
similar_pairs: List[dict] = field(default_factory=list)
|
||||
redundancy_ratio: float = 0.0
|
||||
entity_count: int = 0
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"concern": "C1",
|
||||
"redundancy_ratio": round(self.redundancy_ratio, 4),
|
||||
"similar_pairs": self.similar_pairs,
|
||||
"entity_count": self.entity_count,
|
||||
}
|
||||
|
||||
|
||||
def check_redundancy(
|
||||
entities: List[EntityMeta],
|
||||
embeddings: Optional[Dict[str, list[float]]] = None,
|
||||
threshold: float = 0.85,
|
||||
) -> RedundancyReport:
|
||||
"""Check for redundant entities using embedding similarity.
|
||||
|
||||
Args:
|
||||
entities: Entity metadata list.
|
||||
embeddings: Pre-computed ``{slug: vector}`` mapping.
|
||||
If ``None`` or empty, falls back to word overlap between definitions.
|
||||
threshold: Similarity threshold for flagging pairs.
|
||||
|
||||
Returns:
|
||||
:class:`RedundancyReport` with similar pairs and ratio.
|
||||
"""
|
||||
n = len(entities)
|
||||
if n < 2:
|
||||
return RedundancyReport(entity_count=n)
|
||||
|
||||
pairs: list[dict] = []
|
||||
|
||||
if embeddings:
|
||||
# Embedding-based similarity
|
||||
raw_pairs = find_similar_pairs(embeddings, threshold=threshold)
|
||||
for slug_a, slug_b, sim in raw_pairs:
|
||||
pairs.append({
|
||||
"entity_a": slug_a,
|
||||
"entity_b": slug_b,
|
||||
"similarity": round(sim, 4),
|
||||
"method": "embedding",
|
||||
})
|
||||
else:
|
||||
# Fallback: structural overlap (shared definition words)
|
||||
slug_to_words = {}
|
||||
for e in entities:
|
||||
words = set(e.definition.lower().split()) if e.definition else set()
|
||||
slug_to_words[e.slug] = words
|
||||
|
||||
slugs = sorted(slug_to_words)
|
||||
for i, a in enumerate(slugs):
|
||||
for b in slugs[i + 1:]:
|
||||
wa, wb = slug_to_words[a], slug_to_words[b]
|
||||
if wa and wb:
|
||||
overlap = len(wa & wb) / min(len(wa), len(wb))
|
||||
if overlap >= threshold:
|
||||
pairs.append({
|
||||
"entity_a": a,
|
||||
"entity_b": b,
|
||||
"similarity": round(overlap, 4),
|
||||
"method": "word_overlap",
|
||||
})
|
||||
|
||||
# redundancy_ratio: fraction of entities involved in similar pairs
|
||||
involved = set()
|
||||
for p in pairs:
|
||||
involved.add(p["entity_a"])
|
||||
involved.add(p["entity_b"])
|
||||
ratio = len(involved) / n if n > 0 else 0.0
|
||||
|
||||
return RedundancyReport(
|
||||
similar_pairs=pairs,
|
||||
redundancy_ratio=ratio,
|
||||
entity_count=n,
|
||||
)
|
||||
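Review note: the fallback path in `check_redundancy` divides the shared word count by the size of the smaller word set. The sketch below isolates that measure. The function name `word_overlap` and the sample definitions are hypothetical.

```python
def word_overlap(definition_a, definition_b):
    """Shared words divided by the size of the smaller word set."""
    words_a = set(definition_a.lower().split())
    words_b = set(definition_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))


short_def = "value of a commodity"
long_def = "the exchange value of a commodity expressed in money"
subset_score = word_overlap(short_def, long_def)
partial_score = word_overlap("surplus value", "use value")
print(subset_score, partial_score)
```

Because the denominator is the smaller set, a short definition whose words all appear in a longer one scores 1.0 and is flagged at the default threshold of 0.85, even if the longer definition says considerably more. Stop words such as "of" and "a" also count toward the overlap.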
524
markitect/infospace/cli.py
Normal file
@@ -0,0 +1,524 @@
|
||||
"""
|
||||
CLI commands for infospace lifecycle management.
|
||||
|
||||
Provides ``markitect infospace`` subcommands for initialising,
|
||||
inspecting, and evaluating infospaces.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
import click
|
||||
|
||||
from markitect.infospace.config import (
|
||||
DisciplineBinding,
|
||||
InfospaceConfig,
|
||||
SchemaRegistry,
|
||||
TopicConfig,
|
||||
find_infospace_config,
|
||||
load_infospace_config,
|
||||
save_infospace_config,
|
||||
)
|
||||
from markitect.infospace.entity_parser import parse_entity_directory
|
||||
from markitect.infospace.state import build_state
|
||||
|
||||
|
||||
def _load_config_or_exit(config_path: Optional[str] = None) -> tuple[InfospaceConfig, Path]:
|
||||
"""Resolve and load infospace.yaml, or exit with an error."""
|
||||
if config_path:
|
||||
p = Path(config_path)
|
||||
else:
|
||||
p = find_infospace_config()
|
||||
if p is None:
|
||||
click.echo("Error: No infospace.yaml found. Run 'markitect infospace init' first.", err=True)
|
||||
raise SystemExit(1)
|
||||
cfg = load_infospace_config(p)
|
||||
return cfg, p
|
||||
|
||||
|
||||
@click.group(name="infospace")
|
||||
def infospace_commands():
|
||||
"""Manage infospaces — create, inspect, evaluate."""
|
||||
pass
|
||||
|
||||
|
||||
# ── init ─────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--topic", required=True, help="Topic name for the infospace.")
|
||||
@click.option("--domain", default="", help="Knowledge domain.")
|
||||
@click.option("--sources", default="", help="Path to source material directory.")
|
||||
@click.option("--discipline", multiple=True, help="Discipline name (repeatable).")
|
||||
@click.option("--output", "-o", default="infospace.yaml", help="Output config file path.")
|
||||
def init(topic: str, domain: str, sources: str, discipline: tuple, output: str):
|
||||
"""Initialise a new infospace configuration file."""
|
||||
out_path = Path(output)
|
||||
if out_path.exists():
|
||||
click.echo(f"Error: {out_path} already exists.", err=True)
|
||||
raise SystemExit(1)
|
||||
|
||||
disciplines = [DisciplineBinding(name=d) for d in discipline]
|
||||
config = InfospaceConfig(
|
||||
topic=TopicConfig(name=topic, domain=domain, sources=sources),
|
||||
disciplines=disciplines,
|
||||
)
|
||||
save_infospace_config(config, out_path)
|
||||
click.echo(f"Created {out_path}")
|
||||
|
||||
|
||||
# ── status ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
def status(config_path: Optional[str]):
|
||||
"""Show infospace status — entity count, domains, evaluation state."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
# Parse entities
|
||||
entities_dir = root / cfg.entities_dir
|
||||
entities = []
|
||||
if entities_dir.is_dir():
|
||||
entities = parse_entity_directory(entities_dir)
|
||||
|
||||
# Load latest snapshot if available
|
||||
snapshot = None
|
||||
history_path = root / cfg.metrics_dir / "history.yaml"
|
||||
if history_path.is_file():
|
||||
from markitect.infospace.evaluation_io import read_history
|
||||
history = read_history(history_path)
|
||||
if history:
|
||||
snapshot = history[-1]
|
||||
|
||||
state = build_state(cfg, entities=entities, snapshot=snapshot)
|
||||
|
||||
click.echo(f"Infospace: {state.topic_name}")
|
||||
if cfg.topic.domain:
|
||||
click.echo(f"Domain: {cfg.topic.domain}")
|
||||
click.echo(f"Entities: {state.entity_count}")
|
||||
if state.domains:
|
||||
click.echo(f"Domains: {', '.join(state.domains)}")
|
||||
if cfg.disciplines:
|
||||
names = [d.name for d in cfg.disciplines]
|
||||
click.echo(f"Disciplines: {', '.join(names)}")
|
||||
if state.has_evaluations:
|
||||
click.echo(f"Last evaluated: {state.latest_snapshot.created_at.isoformat()}")
|
||||
else:
|
||||
click.echo("Evaluations: none")
|
||||
|
||||
|
||||
# ── entities ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
@click.option(
|
||||
"--sort-by", "sort_key",
|
||||
type=click.Choice(["slug", "domain", "words"]),
|
||||
default="slug",
|
||||
help="Sort entities by field.",
|
||||
)
|
||||
def entities(config_path: Optional[str], sort_key: str):
|
||||
"""List entities with metadata summary."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
entities_dir = root / cfg.entities_dir
|
||||
|
||||
if not entities_dir.is_dir():
|
||||
click.echo("No entities directory found.")
|
||||
return
|
||||
|
||||
entity_list = parse_entity_directory(entities_dir)
|
||||
if not entity_list:
|
||||
click.echo("No entities found.")
|
||||
return
|
||||
|
||||
# Sort
|
||||
if sort_key == "domain":
|
||||
entity_list.sort(key=lambda e: (e.domain or "", e.slug))
|
||||
elif sort_key == "words":
|
||||
entity_list.sort(key=lambda e: e.total_word_count, reverse=True)
|
||||
else:
|
||||
entity_list.sort(key=lambda e: e.slug)
|
||||
|
||||
# Format as table
|
||||
click.echo(f"{'Slug':<40} {'Domain':<20} {'Words':>6}")
|
||||
click.echo("-" * 68)
|
||||
for e in entity_list:
|
||||
click.echo(f"{e.slug:<40} {(e.domain or '-'):<20} {e.total_word_count:>6}")
|
||||
click.echo(f"\nTotal: {len(entity_list)} entities")
|
||||
|
||||
|
||||
# ── evaluate ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
@click.option("--provider", default="openrouter", help="LLM provider (openrouter, openai, etc.).")
|
||||
@click.option("--model", default=None, help="LLM model name.")
|
||||
@click.option("--entity", "entity_slug", default=None, help="Evaluate a single entity by slug.")
|
||||
@click.option("--chapter", default=None, help="Evaluate entities from a specific chapter.")
|
||||
def evaluate(config_path, provider, model, entity_slug, chapter):
|
||||
"""Evaluate entities using LLM-based quality assessment."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
entities_dir = root / cfg.entities_dir
|
||||
if not entities_dir.is_dir():
|
||||
click.echo("Error: No entities directory found.", err=True)
|
||||
raise SystemExit(1)
|
||||
|
||||
entity_list = parse_entity_directory(entities_dir)
|
||||
if not entity_list:
|
||||
click.echo("No entities to evaluate.")
|
||||
return
|
||||
|
||||
# Filter
|
||||
if entity_slug:
|
||||
entity_list = [e for e in entity_list if e.slug == entity_slug]
|
||||
if not entity_list:
|
||||
click.echo(f"Error: Entity '{entity_slug}' not found.", err=True)
|
||||
raise SystemExit(1)
|
||||
elif chapter:
|
||||
entity_list = [e for e in entity_list if chapter in (e.source_chapter or "")]
|
||||
if not entity_list:
|
||||
click.echo(f"No entities found for chapter '{chapter}'.")
|
||||
return
|
||||
|
||||
# Create adapter
|
||||
from markitect.llm import create_adapter
|
||||
from markitect.prompts.execution.models import RunConfig
|
||||
adapter = create_adapter(provider, model=model)
|
||||
run_config = RunConfig(model_name=model or "default", temperature=0.3, max_tokens=2000)
|
||||
|
||||
# Progress callback
|
||||
def on_progress(done, total, result):
|
||||
status = result.status.upper()
|
||||
click.echo(f" [{done}/{total}] {result.key}: {status}")
|
||||
|
||||
click.echo(f"Evaluating {len(entity_list)} entities via {provider}...")
|
||||
|
||||
from markitect.infospace.evaluate import run_entity_evaluation
|
||||
output_dir = root / cfg.evaluations_dir
|
||||
summary = run_entity_evaluation(
|
||||
config=cfg,
|
||||
entities=entity_list,
|
||||
adapter=adapter,
|
||||
run_config=run_config,
|
||||
output_dir=output_dir,
|
||||
progress_callback=on_progress,
|
||||
)
|
||||
|
||||
click.echo(f"\nDone: {summary.succeeded} succeeded, {summary.failed} failed, {summary.skipped} skipped")
|
||||
if summary.total_tokens > 0:
|
||||
click.echo(f"Tokens used: {summary.total_tokens}")
|
||||
|
||||
|
||||
# ── viability ────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
def viability(config_path: Optional[str]):
|
||||
"""Show viability dashboard — threshold checks and pass/fail."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
|
||||
if not cfg.viability:
|
||||
click.echo("No viability thresholds configured in infospace.yaml.")
|
||||
return
|
||||
|
||||
# Try to load latest metrics
|
||||
root = cfg_path.parent
|
||||
metrics: dict = {}
|
||||
metrics_file = root / cfg.metrics_dir / "metrics.yaml"
|
||||
if metrics_file.is_file():
|
||||
import yaml
|
||||
raw = yaml.safe_load(metrics_file.read_text(encoding="utf-8"))
|
||||
if isinstance(raw, dict):
|
||||
metrics = {k: float(v) for k, v in raw.items() if isinstance(v, (int, float))}
|
||||
|
||||
state = build_state(cfg, metrics=metrics if metrics else None)
|
||||
|
||||
if not state.viability_results:
|
||||
click.echo("No metrics available. Run evaluations first.")
|
||||
click.echo("\nConfigured thresholds:")
|
||||
for name, t in cfg.viability.items():
|
||||
bounds = []
|
||||
if t.min is not None:
|
||||
bounds.append(f"min={t.min}")
|
||||
if t.max is not None:
|
||||
bounds.append(f"max={t.max}")
|
||||
click.echo(f" {name}: {', '.join(bounds)}")
|
||||
return
|
||||
|
||||
click.echo(f"{'Metric':<30} {'Value':>8} {'Threshold':>15} {'Status':>8}")
|
||||
click.echo("-" * 64)
|
||||
for r in state.viability_results:
|
||||
bounds = []
|
||||
if r.threshold.min is not None:
|
||||
bounds.append(f"min={r.threshold.min}")
|
||||
if r.threshold.max is not None:
|
||||
bounds.append(f"max={r.threshold.max}")
|
||||
status_str = "PASS" if r.passed else "FAIL"
|
||||
click.echo(
|
||||
f"{r.metric:<30} {r.value:>8.4f} {', '.join(bounds):>15} {status_str:>8}"
|
||||
)
|
||||
|
||||
click.echo()
|
||||
if state.is_viable:
|
||||
click.echo(f"Viable: YES ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
|
||||
else:
|
||||
click.echo(f"Viable: NO ({state.viability_pass_count}/{state.viability_total_count} thresholds met)")
|
||||
|
||||
|
||||
# ── check ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
@click.option(
|
||||
"--concern", "concerns", multiple=True,
|
||||
type=click.Choice(["redundancy", "coverage", "coherence", "consistency", "granularity"]),
|
||||
help="Run specific concern(s). Omit to run all five.",
|
||||
)
|
||||
@click.option("--json", "as_json", is_flag=True, help="Output results as JSON.")
|
||||
def check(config_path: Optional[str], concerns: tuple, as_json: bool):
|
||||
"""Run collection-level quality checks (C1–C5)."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
entities_dir = root / cfg.entities_dir
|
||||
if not entities_dir.is_dir():
|
||||
click.echo("Error: No entities directory found.", err=True)
|
||||
raise SystemExit(1)
|
||||
|
||||
entity_list = parse_entity_directory(entities_dir)
|
||||
if not entity_list:
|
||||
click.echo("No entities to check.")
|
||||
return
|
||||
|
||||
from markitect.infospace.checks import run_all_checks
|
||||
|
||||
checks_list = list(concerns) if concerns else None
|
||||
|
||||
report = run_all_checks(
|
||||
entities=entity_list,
|
||||
checks=checks_list,
|
||||
)
|
||||
|
||||
if as_json:
|
||||
import json
|
||||
click.echo(json.dumps(report.to_dict(), indent=2))
|
||||
else:
|
||||
click.echo(f"Collection checks — {len(entity_list)} entities\n")
|
||||
d = report.to_dict()
|
||||
for concern_name, concern_data in d.items():
|
||||
label = concern_data.get("concern", concern_name.upper())
|
||||
click.echo(f" {label} — {concern_name}")
|
||||
for k, v in concern_data.items():
|
||||
if k == "concern":
|
||||
continue
|
||||
click.echo(f" {k}: {v}")
|
||||
click.echo()
|
||||
|
||||
# Show summary metrics
|
||||
m = report.metrics()
|
||||
if m and not as_json:
|
||||
click.echo("Metrics summary:")
|
||||
for k, v in sorted(m.items()):
|
||||
click.echo(f" {k}: {v:.4f}")
|
||||
|
||||
# Record to history
|
||||
if m:
|
||||
from markitect.infospace.history import record_check_results
|
||||
snap = record_check_results(report, cfg, root, entity_count=len(entity_list))
|
||||
if not as_json:
|
||||
click.echo(f"\nRecorded snapshot {snap.snapshot_id}")
|
||||
|
||||
|
||||
# ── history ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
@click.option("--metric", default=None, help="Show trend for a specific metric.")
|
||||
@click.option("--json", "as_json", is_flag=True, help="Output as JSON.")
|
||||
def history(config_path: Optional[str], metric: Optional[str], as_json: bool):
|
||||
"""Show metrics history — snapshots over time."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
from markitect.infospace.history import get_history, metric_trend
|
||||
|
||||
snapshots = get_history(cfg, root)
|
||||
if not snapshots:
|
||||
click.echo("No history found. Run 'markitect infospace check' first.")
|
||||
return
|
||||
|
||||
if metric:
|
||||
trend = metric_trend(snapshots, metric)
|
||||
if not trend:
|
||||
click.echo(f"No data for metric '{metric}'.")
|
||||
return
|
||||
if as_json:
|
||||
import json
|
||||
click.echo(json.dumps(trend, indent=2))
|
||||
else:
|
||||
click.echo(f"Trend: {metric}\n")
|
||||
for entry in trend:
|
||||
click.echo(f" {entry['date'][:19]} {entry['value']:.4f}")
|
||||
return
|
||||
|
||||
if as_json:
|
||||
import json
|
||||
click.echo(json.dumps([s.to_dict() for s in snapshots], indent=2, default=str))
|
||||
return
|
||||
|
||||
click.echo(f"History: {len(snapshots)} snapshot(s)\n")
|
||||
click.echo(f"{'#':<4} {'Date':<20} {'Entities':>8} {'Metrics':>8}")
|
||||
click.echo("-" * 43)
|
||||
for i, snap in enumerate(snapshots, 1):
|
||||
date_str = snap.created_at.isoformat()[:19]
|
||||
n_metrics = len(snap.collection_metrics)
|
||||
click.echo(f"{i:<4} {date_str:<20} {snap.entity_count:>8} {n_metrics:>8}")
|
||||
|
||||
|
||||
@infospace_commands.command(name="history-diff")
|
||||
@click.argument("date_a")
|
||||
@click.argument("date_b")
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
def history_diff(date_a: str, date_b: str, config_path: Optional[str]):
|
||||
"""Compare two history snapshots by date (YYYY-MM-DD)."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
from markitect.infospace.history import find_snapshot_by_date, get_history
|
||||
from markitect.infospace.evaluation_io import diff_snapshots
|
||||
|
||||
snapshots = get_history(cfg, root)
|
||||
if len(snapshots) < 2:
|
||||
click.echo("Need at least two snapshots to diff.")
|
||||
return
|
||||
|
||||
snap_a = find_snapshot_by_date(snapshots, date_a)
|
||||
snap_b = find_snapshot_by_date(snapshots, date_b)
|
||||
|
||||
if snap_a is None:
|
||||
click.echo(f"No snapshot found near '{date_a}'.")
|
||||
return
|
||||
if snap_b is None:
|
||||
click.echo(f"No snapshot found near '{date_b}'.")
|
||||
return
|
||||
if snap_a.snapshot_id == snap_b.snapshot_id:
|
||||
click.echo("Both dates resolve to the same snapshot.")
|
||||
return
|
||||
|
||||
diff = diff_snapshots(snap_a, snap_b)
|
||||
click.echo(diff.summary())
|
||||
|
||||
|
||||
# ── bind-discipline ─────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command(name="bind-discipline")
|
||||
@click.argument("discipline_path")
|
||||
@click.option("--name", required=True, help="Name for the discipline.")
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
def bind_discipline_cmd(discipline_path: str, name: str, config_path: Optional[str]):
|
||||
"""Bind a discipline infospace to the current infospace."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
from markitect.infospace.composition import bind_discipline
|
||||
|
||||
status = bind_discipline(cfg, name=name, path=discipline_path, root=root)
|
||||
|
||||
if status.error:
|
||||
click.echo(f"Error: {status.error}", err=True)
|
||||
raise SystemExit(1)
|
||||
|
||||
# Persist updated config
|
||||
save_infospace_config(cfg, cfg_path)
|
||||
|
||||
click.echo(f"Bound discipline '{name}' from {discipline_path}")
|
||||
click.echo(f" Entities: {status.entity_count}")
|
||||
if status.has_config:
|
||||
viable_str = "YES" if status.is_viable else "NO"
|
||||
click.echo(f" Viable: {viable_str}")
|
||||
|
||||
|
||||
# ── disciplines ─────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
def disciplines(config_path: Optional[str]):
|
||||
"""List bound disciplines and their viability status."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
if not cfg.disciplines:
|
||||
click.echo("No disciplines bound.")
|
||||
return
|
||||
|
||||
from markitect.infospace.composition import check_discipline_status
|
||||
|
||||
click.echo(f"{'Name':<30} {'Entities':>8} {'Viable':>8} {'Path'}")
|
||||
click.echo("-" * 70)
|
||||
for binding in cfg.disciplines:
|
||||
status = check_discipline_status(binding, root)
|
||||
viable_str = "YES" if status.is_viable else ("NO" if status.has_config else "?")
|
||||
click.echo(
|
||||
f"{status.name:<30} {status.entity_count:>8} {viable_str:>8} {status.path}"
|
||||
)
|
||||
if status.error:
|
||||
click.echo(f" Error: {status.error}")
|
||||
|
||||
|
||||
# ── stale-mappings ──────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command(name="stale-mappings")
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
def stale_mappings(config_path: Optional[str]):
|
||||
"""Check for stale mappings due to discipline changes."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
if not cfg.disciplines:
|
||||
click.echo("No disciplines bound — no mappings to check.")
|
||||
return
|
||||
|
||||
from markitect.infospace.composition import find_stale_mappings
|
||||
|
||||
# Try to load mapping references from output
|
||||
mapping_refs = _load_mapping_references(cfg, root)
|
||||
|
||||
stale = find_stale_mappings(cfg, root, mapping_references=mapping_refs)
|
||||
|
||||
if not stale:
|
||||
click.echo("No stale mappings detected.")
|
||||
return
|
||||
|
||||
click.echo(f"Found {len(stale)} stale mapping(s):\n")
|
||||
for s in stale:
|
||||
click.echo(f" {s.entity_slug} -> {s.discipline_entity}")
|
||||
click.echo(f" {s.reason}")
|
||||
|
||||
|
||||
def _load_mapping_references(
|
||||
cfg: InfospaceConfig, root: Path
|
||||
) -> Optional[dict]:
|
||||
"""Try to load mapping references from YAML file in output dir."""
|
||||
mapping_file = root / cfg.metrics_dir / "mapping-references.yaml"
|
||||
if not mapping_file.is_file():
|
||||
return None
|
||||
import yaml
|
||||
data = yaml.safe_load(mapping_file.read_text(encoding="utf-8"))
|
||||
if isinstance(data, dict):
|
||||
return data
|
||||
return None
|
||||
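The `entities` command in the file above sorts the parsed entities by one of three keys and prints fixed-width rows under a 68-character rule. The sketch below reproduces just that sorting and row formatting in isolation, so the behaviour can be checked without a real infospace on disk. `Entity` is a hypothetical stand-in for the project's entity model (only the three attributes the command reads are assumed), and `sort_entities` and `format_row` are illustrative helper names that do not exist in the codebase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for the real entity model."""

    slug: str
    domain: Optional[str]
    total_word_count: int


def sort_entities(entities: List[Entity], sort_key: str = "slug") -> List[Entity]:
    """Return a sorted copy, mirroring the three --sort-by choices."""
    if sort_key == "domain":
        # Entities without a domain sort first; ties break on slug.
        return sorted(entities, key=lambda e: (e.domain or "", e.slug))
    if sort_key == "words":
        # Longest entities first.
        return sorted(entities, key=lambda e: e.total_word_count, reverse=True)
    return sorted(entities, key=lambda e: e.slug)


def format_row(e: Entity) -> str:
    """Format one table row: 40 + 1 + 20 + 1 + 6 = 68 columns."""
    return f"{e.slug:<40} {(e.domain or '-'):<20} {e.total_word_count:>6}"


# Made-up sample data, for illustration only.
sample = [
    Entity("division-of-labour", "economics", 420),
    Entity("invisible-hand", None, 180),
    Entity("capital-stock", "economics", 310),
]

if __name__ == "__main__":
    for row in map(format_row, sort_entities(sample, "words")):
        print(row)
```

Returning a sorted copy rather than sorting in place keeps the helper free of side effects, which is convenient for testing; the command itself sorts its list in place, which is equivalent for its purposes.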
281
markitect/infospace/composition.py
Normal file
@@ -0,0 +1,281 @@
|
||||
"""
|
||||
Infospace composition model.
|
||||
|
||||
Allows one infospace to use another as a discipline — a reusable
|
||||
framework of concepts applied as an analytical lens.
|
||||
|
||||
Key operations:
|
||||
- Resolve and validate discipline bindings
|
||||
- Check discipline viability (must meet its own thresholds)
|
||||
- List discipline entities as mapping targets
|
||||
- Detect stale mappings when discipline content changes
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from markitect.infospace.config import (
|
||||
DisciplineBinding,
|
||||
InfospaceConfig,
|
||||
load_infospace_config,
|
||||
)
|
||||
from markitect.infospace.entity_parser import parse_entity_directory
|
||||
from markitect.infospace.history import get_latest_snapshot, read_metrics_file
|
||||
from markitect.infospace.models import EntityMeta
|
||||
from markitect.infospace.state import InfospaceState, ViabilityResult, build_state
|
||||
|
||||
|
||||
@dataclass
|
||||
class DisciplineStatus:
|
||||
"""Status of a bound discipline infospace."""
|
||||
|
||||
name: str
|
||||
path: str
|
||||
resolved_path: Optional[Path] = None
|
||||
exists: bool = False
|
||||
has_config: bool = False
|
||||
entity_count: int = 0
|
||||
is_viable: bool = False
|
||||
viability_results: List[ViabilityResult] = field(default_factory=list)
|
||||
error: str = ""
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {
|
||||
"name": self.name,
|
||||
"path": self.path,
|
||||
"exists": self.exists,
|
||||
"has_config": self.has_config,
|
||||
"entity_count": self.entity_count,
|
||||
"is_viable": self.is_viable,
|
||||
}
|
||||
if self.viability_results:
|
||||
d["viability"] = [r.to_dict() for r in self.viability_results]
|
||||
if self.error:
|
||||
d["error"] = self.error
|
||||
return d
|
||||
|
||||
|
||||
@dataclass
|
||||
class StaleMappingInfo:
|
||||
"""Information about a mapping that may be stale."""
|
||||
|
||||
entity_slug: str
|
||||
discipline_entity: str
|
||||
reason: str
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {
|
||||
"entity_slug": self.entity_slug,
|
||||
"discipline_entity": self.discipline_entity,
|
||||
"reason": self.reason,
|
||||
}
|
||||
|
||||
|
||||
# ── Resolution ───────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def resolve_discipline_path(
|
||||
binding: DisciplineBinding, root: Path
|
||||
) -> Optional[Path]:
|
||||
"""Resolve a discipline binding to an absolute path.
|
||||
|
||||
Tries the binding's path relative to *root*, then the path as given
(absolute, or relative to the current working directory).
|
||||
Returns ``None`` if the directory doesn't exist.
|
||||
"""
|
||||
if not binding.path:
|
||||
return None
|
||||
|
||||
# Try relative to root first
|
||||
candidate = root / binding.path
|
||||
if candidate.is_dir():
|
||||
return candidate.resolve()
|
||||
|
||||
# Fall back to the path as given (absolute, or relative to the CWD)
|
||||
candidate = Path(binding.path)
|
||||
if candidate.is_dir():
|
||||
return candidate.resolve()
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def load_discipline_config(
|
||||
binding: DisciplineBinding, root: Path
|
||||
) -> Optional[InfospaceConfig]:
|
||||
"""Load the infospace config for a bound discipline.
|
||||
|
||||
Returns ``None`` if the discipline path cannot be resolved or
|
||||
has no ``infospace.yaml``.
|
||||
"""
|
||||
disc_path = resolve_discipline_path(binding, root)
|
||||
if disc_path is None:
|
||||
return None
|
||||
|
||||
config_file = disc_path / "infospace.yaml"
|
||||
if not config_file.is_file():
|
||||
return None
|
||||
|
||||
return load_infospace_config(config_file)
|
||||
|
||||
|
||||
# ── Viability checking ───────────────────────────────────────────────
|
||||
|
||||
|
||||
def check_discipline_status(
|
||||
binding: DisciplineBinding, root: Path
|
||||
) -> DisciplineStatus:
|
||||
"""Check the full status of a bound discipline.
|
||||
|
||||
Resolves the path, loads config, counts entities, and checks
|
||||
viability against the discipline's own thresholds.
|
||||
"""
|
||||
status = DisciplineStatus(name=binding.name, path=binding.path)
|
||||
|
||||
disc_path = resolve_discipline_path(binding, root)
|
||||
if disc_path is None:
|
||||
status.error = f"Path not found: {binding.path}"
|
||||
return status
|
||||
|
||||
status.resolved_path = disc_path
|
||||
status.exists = True
|
||||
|
||||
# Load config
|
||||
config_file = disc_path / "infospace.yaml"
|
||||
if not config_file.is_file():
|
||||
status.error = "No infospace.yaml found"
|
||||
return status
|
||||
|
||||
disc_config = load_infospace_config(config_file)
|
||||
status.has_config = True
|
||||
|
||||
# Count entities
|
||||
entities_dir = disc_path / disc_config.entities_dir
|
||||
if entities_dir.is_dir():
|
||||
entities = parse_entity_directory(entities_dir)
|
||||
status.entity_count = len(entities)
|
||||
|
||||
# Check viability
|
||||
if disc_config.viability:
|
||||
metrics = read_metrics_file(disc_path / disc_config.metrics_dir / "metrics.yaml")
|
||||
if metrics:
|
||||
state = build_state(disc_config, metrics=metrics)
|
||||
status.viability_results = state.viability_results
|
||||
status.is_viable = state.is_viable
|
||||
|
||||
return status
|
||||
|
||||
|
||||
def get_discipline_entities(
|
||||
binding: DisciplineBinding, root: Path
|
||||
) -> List[EntityMeta]:
|
||||
"""Get all entities from a bound discipline infospace."""
|
||||
disc_path = resolve_discipline_path(binding, root)
|
||||
if disc_path is None:
|
||||
return []
|
||||
|
||||
disc_config = load_discipline_config(binding, root)
|
||||
if disc_config is None:
|
||||
return []
|
||||
|
||||
entities_dir = disc_path / disc_config.entities_dir
|
||||
if not entities_dir.is_dir():
|
||||
return []
|
||||
|
||||
return parse_entity_directory(entities_dir)
|
||||
|
||||
|
||||
# ── Stale mapping detection ─────────────────────────────────────────
|
||||
|
||||
|
||||
def _content_digest(entity: EntityMeta) -> str:
|
||||
"""Compute a short content digest for an entity."""
|
||||
content = f"{entity.slug}|{entity.definition}|{entity.domain}"
|
||||
return hashlib.sha256(content.encode()).hexdigest()[:12]
|
||||
|
||||
|
||||
def compute_discipline_digests(
|
||||
binding: DisciplineBinding, root: Path
|
||||
) -> Dict[str, str]:
|
||||
"""Compute content digests for all entities in a discipline.
|
||||
|
||||
Returns ``{slug: digest}`` mapping.
|
||||
"""
|
||||
entities = get_discipline_entities(binding, root)
|
||||
return {e.slug: _content_digest(e) for e in entities}
|
||||
|
||||
|
||||
def find_stale_mappings(
|
||||
config: InfospaceConfig,
|
||||
root: Path,
|
||||
mapping_references: Optional[Dict[str, List[str]]] = None,
|
||||
) -> List[StaleMappingInfo]:
|
||||
"""Find mappings that reference entities missing from a bound discipline.
|
||||
|
||||
Args:
|
||||
config: The infospace configuration.
|
||||
root: Project root directory.
|
||||
mapping_references: ``{entity_slug: [discipline_entity_slugs]}``
|
||||
mapping of local entities to the discipline entities they
|
||||
reference. If ``None`` or empty, returns an empty list (no mapping
|
||||
data available).
|
||||
|
||||
Returns:
|
||||
List of stale mapping info objects.
|
||||
"""
|
||||
if not mapping_references:
|
||||
return []
|
||||
|
||||
stale: List[StaleMappingInfo] = []
|
||||
|
||||
for binding in config.disciplines:
|
||||
disc_entities = get_discipline_entities(binding, root)
|
||||
disc_slugs = {e.slug for e in disc_entities}
|
||||
|
||||
for entity_slug, refs in mapping_references.items():
|
||||
for ref_slug in refs:
|
||||
if ref_slug not in disc_slugs:
|
||||
stale.append(StaleMappingInfo(
|
||||
entity_slug=entity_slug,
|
||||
discipline_entity=ref_slug,
|
||||
reason=f"Discipline entity '{ref_slug}' no longer exists in '{binding.name}'",
|
||||
))
|
||||
|
||||
return stale
|
||||
|
||||
|
||||
# ── Binding management ───────────────────────────────────────────────
|
||||
|
||||
|
||||
def bind_discipline(
|
||||
config: InfospaceConfig,
|
||||
name: str,
|
||||
path: str,
|
||||
root: Path,
|
||||
) -> DisciplineStatus:
|
||||
"""Add a discipline binding to the config and validate it.
|
||||
|
||||
Does NOT persist the config — the caller should save it.
|
||||
|
||||
Args:
|
||||
config: The infospace configuration to update.
|
||||
name: Discipline name.
|
||||
path: Path to the discipline infospace.
|
||||
root: Project root for path resolution.
|
||||
|
||||
Returns:
|
||||
Status of the newly bound discipline.
|
||||
"""
|
||||
# Check for duplicates
|
||||
existing = {d.name for d in config.disciplines}
|
||||
if name in existing:
|
||||
return DisciplineStatus(
|
||||
name=name, path=path, error=f"Discipline '{name}' already bound"
|
||||
)
|
||||
|
||||
binding = DisciplineBinding(name=name, path=path)
|
||||
config.disciplines.append(binding)
|
||||
|
||||
return check_discipline_status(binding, root)
|
||||
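The module above computes per-entity content digests (`compute_discipline_digests`), while `find_stale_mappings` only flags references whose target slug has disappeared. As a rough illustration of how recorded digests could also surface entities whose content changed, the sketch below compares two `{slug: digest}` maps. It is an assumption-laden example, not part of the change set: `compare_digests` is a hypothetical helper, and how the earlier digests would be recorded or stored is not specified anywhere in the source.

```python
import hashlib
from typing import Dict, List, Tuple


def content_digest(slug: str, definition: str, domain: str) -> str:
    """Short digest over the same three fields the module hashes."""
    content = f"{slug}|{definition}|{domain}"
    return hashlib.sha256(content.encode()).hexdigest()[:12]


def compare_digests(
    recorded: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (removed, changed) slugs between two {slug: digest} maps.

    Hypothetical helper: `recorded` is assumed to be a digest map saved
    when the mappings were produced, `current` a freshly computed one.
    """
    removed = sorted(s for s in recorded if s not in current)
    changed = sorted(
        s for s, d in recorded.items() if s in current and current[s] != d
    )
    return removed, changed


# Made-up discipline entities, for illustration only.
recorded = {
    "system-1": content_digest("system-1", "Operational units", "vsm"),
    "system-2": content_digest("system-2", "Coordination", "vsm"),
    "system-3": content_digest("system-3", "Control", "vsm"),
}
current = {
    "system-1": recorded["system-1"],
    "system-2": content_digest("system-2", "Coordination and damping", "vsm"),
}

removed, changed = compare_digests(recorded, current)

if __name__ == "__main__":
    print("removed:", removed)
    print("changed:", changed)
```

Slugs that appear only in `current` are new discipline entities and are deliberately ignored here, since no existing mapping can reference them yet.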
309
markitect/infospace/config.py
Normal file
@@ -0,0 +1,309 @@
|
||||
"""
|
||||
Infospace configuration model and YAML loader.
|
||||
|
||||
An infospace is declared via an ``infospace.yaml`` file that specifies
|
||||
its topic, disciplines, schemas, competency questions, and viability
|
||||
thresholds. This module provides the data models and I/O for that
|
||||
configuration.
|
||||
|
||||
Example ``infospace.yaml``::
|
||||
|
||||
topic:
|
||||
name: "The Wealth of Nations"
|
||||
domain: "Classical Economics"
|
||||
sources: artifacts/sources/
|
||||
|
||||
disciplines:
|
||||
- name: "Viable System Model"
|
||||
path: artifacts/vsm-reference/
|
||||
|
||||
schemas:
|
||||
entity: schemas/economic-entity-schema-v1.0.md
|
||||
|
||||
competency_questions: schemas/competency-questions.md
|
||||
|
||||
viability:
|
||||
coverage_ratio: { min: 0.60 }
|
||||
per_entity_mean: { min: 3.5 }
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import yaml
|
||||
|
||||
|
||||
@dataclass
|
||||
class TopicConfig:
|
||||
"""The subject matter an infospace explains.
|
||||
|
||||
Attributes:
|
||||
name: Human-readable topic name.
|
||||
domain: Broader knowledge domain.
|
||||
sources: Path (relative to infospace root) to source material.
|
||||
"""
|
||||
|
||||
name: str
|
||||
domain: str = ""
|
||||
sources: str = ""
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {"name": self.name}
|
||||
if self.domain:
|
||||
d["domain"] = self.domain
|
||||
if self.sources:
|
||||
d["sources"] = self.sources
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> TopicConfig:
|
||||
return cls(
|
||||
name=data["name"],
|
||||
domain=data.get("domain", ""),
|
||||
sources=data.get("sources", ""),
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class DisciplineBinding:
|
||||
"""An external infospace applied as an analytical lens.
|
||||
|
||||
Attributes:
|
||||
name: Human-readable discipline name.
|
||||
path: Path to the discipline infospace (relative to root).
|
||||
"""
|
||||
|
||||
name: str
|
||||
path: str = ""
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {"name": self.name}
|
||||
if self.path:
|
||||
d["path"] = self.path
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> DisciplineBinding:
|
||||
return cls(name=data["name"], path=data.get("path", ""))
|
||||
|
||||
|
||||
@dataclass
|
||||
class SchemaRegistry:
|
||||
"""Schema paths governing entity and document structure.
|
||||
|
||||
All paths are relative to the infospace root directory.
|
||||
"""
|
||||
|
||||
entity: str = ""
|
||||
mapping: str = ""
|
||||
analysis: str = ""
|
||||
extra: Dict[str, str] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {}
|
||||
if self.entity:
|
||||
d["entity"] = self.entity
|
||||
if self.mapping:
|
||||
d["mapping"] = self.mapping
|
||||
if self.analysis:
|
||||
d["analysis"] = self.analysis
|
||||
d.update(self.extra)
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> SchemaRegistry:
|
||||
known = {"entity", "mapping", "analysis"}
|
||||
extra = {k: v for k, v in data.items() if k not in known}
|
||||
return cls(
|
||||
entity=data.get("entity", ""),
|
||||
mapping=data.get("mapping", ""),
|
||||
analysis=data.get("analysis", ""),
|
||||
extra=extra,
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class ViabilityThreshold:
|
||||
"""Threshold for a single viability metric.
|
||||
|
||||
At least one of *min* or *max* should be set.
|
||||
"""
|
||||
|
||||
metric: str
|
||||
min: Optional[float] = None
|
||||
max: Optional[float] = None
|
||||
|
||||
def check(self, value: float) -> bool:
|
||||
"""Return ``True`` if *value* is within the threshold."""
|
||||
if self.min is not None and value < self.min:
|
||||
return False
|
||||
if self.max is not None and value > self.max:
|
||||
return False
|
||||
return True
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {}
|
||||
if self.min is not None:
|
||||
d["min"] = self.min
|
||||
if self.max is not None:
|
||||
d["max"] = self.max
|
||||
return d
|
||||
|
||||
|
||||
@dataclass
|
||||
class PipelineStage:
|
||||
"""A single stage in the processing pipeline."""
|
||||
|
||||
template: str
|
||||
spaces: List[str] = field(default_factory=list)
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {"template": self.template}
|
||||
if self.spaces:
|
||||
d["spaces"] = self.spaces
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> PipelineStage:
|
||||
return cls(
|
||||
template=data["template"],
|
||||
spaces=data.get("spaces", []),
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class PipelineConfig:
|
||||
"""Processing pipeline configuration."""
|
||||
|
||||
stages: List[PipelineStage] = field(default_factory=list)
|
||||
post_batch: List[PipelineStage] = field(default_factory=list)
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {}
|
||||
if self.stages:
|
||||
d["stages"] = [s.to_dict() for s in self.stages]
|
||||
if self.post_batch:
|
||||
d["post_batch"] = [s.to_dict() for s in self.post_batch]
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> PipelineConfig:
|
||||
return cls(
|
||||
stages=[PipelineStage.from_dict(s) for s in data.get("stages", [])],
|
||||
post_batch=[PipelineStage.from_dict(s) for s in data.get("post_batch", [])],
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class InfospaceConfig:
|
||||
"""Complete infospace configuration, loaded from ``infospace.yaml``.
|
||||
|
||||
This is the declarative description of an infospace: what it
|
||||
explains, through which lenses, governed by which schemas, and
|
||||
what quality thresholds it must meet.
|
||||
"""
|
||||
|
||||
topic: TopicConfig
|
||||
disciplines: List[DisciplineBinding] = field(default_factory=list)
|
||||
schemas: SchemaRegistry = field(default_factory=SchemaRegistry)
|
||||
competency_questions: str = ""
|
||||
viability: Dict[str, ViabilityThreshold] = field(default_factory=dict)
|
||||
pipeline: Optional[PipelineConfig] = None
|
||||
entities_dir: str = "output/entities"
|
||||
evaluations_dir: str = "output/evaluations"
|
||||
metrics_dir: str = "output/metrics"
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {"topic": self.topic.to_dict()}
|
||||
if self.disciplines:
|
||||
d["disciplines"] = [db.to_dict() for db in self.disciplines]
|
||||
schemas_dict = self.schemas.to_dict()
|
||||
if schemas_dict:
|
||||
d["schemas"] = schemas_dict
|
||||
if self.competency_questions:
|
||||
d["competency_questions"] = self.competency_questions
|
||||
if self.viability:
|
||||
d["viability"] = {
|
||||
name: t.to_dict() for name, t in self.viability.items()
|
||||
}
|
||||
if self.pipeline:
|
||||
d["pipeline"] = self.pipeline.to_dict()
|
||||
if self.entities_dir != "output/entities":
|
||||
d["entities_dir"] = self.entities_dir
|
||||
if self.evaluations_dir != "output/evaluations":
|
||||
d["evaluations_dir"] = self.evaluations_dir
|
||||
if self.metrics_dir != "output/metrics":
|
||||
d["metrics_dir"] = self.metrics_dir
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> InfospaceConfig:
|
||||
viability_raw = data.get("viability", {})
|
||||
viability = {
|
||||
name: ViabilityThreshold(metric=name, **bounds)
|
||||
for name, bounds in viability_raw.items()
|
||||
}
|
||||
pipeline_raw = data.get("pipeline")
|
||||
pipeline = PipelineConfig.from_dict(pipeline_raw) if pipeline_raw else None
|
||||
|
||||
return cls(
|
||||
topic=TopicConfig.from_dict(data["topic"]),
|
||||
disciplines=[
|
||||
DisciplineBinding.from_dict(d)
|
||||
for d in data.get("disciplines", [])
|
||||
],
|
||||
schemas=SchemaRegistry.from_dict(data.get("schemas", {})),
|
||||
competency_questions=data.get("competency_questions", ""),
|
||||
viability=viability,
|
||||
pipeline=pipeline,
|
||||
entities_dir=data.get("entities_dir", "output/entities"),
|
||||
evaluations_dir=data.get("evaluations_dir", "output/evaluations"),
|
||||
metrics_dir=data.get("metrics_dir", "output/metrics"),
|
||||
)
|
||||
|
||||
|
||||
def load_infospace_config(path: Path) -> InfospaceConfig:
|
||||
"""Load an :class:`InfospaceConfig` from a YAML file.
|
||||
|
||||
Args:
|
||||
path: Path to ``infospace.yaml``.
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If *path* does not exist.
|
||||
ValueError: If the file is not a YAML mapping or the required ``topic`` key is missing.
|
||||
"""
|
||||
data = yaml.safe_load(path.read_text(encoding="utf-8"))
|
||||
if not isinstance(data, dict):
|
||||
raise ValueError(f"Expected a YAML mapping in {path}")
|
||||
if "topic" not in data:
|
||||
raise ValueError(f"Missing required 'topic' key in {path}")
|
||||
return InfospaceConfig.from_dict(data)
|
||||
|
||||
|
||||
def save_infospace_config(config: InfospaceConfig, path: Path) -> None:
|
||||
"""Write an :class:`InfospaceConfig` to a YAML file."""
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
path.write_text(
|
||||
yaml.safe_dump(
|
||||
config.to_dict(),
|
||||
default_flow_style=False,
|
||||
sort_keys=False,
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
|
||||
def find_infospace_config(start: Optional[Path] = None) -> Optional[Path]:
|
||||
"""Walk up from *start* looking for ``infospace.yaml``.
|
||||
|
||||
Returns the path to the config file, or ``None``.
|
||||
"""
|
||||
current = (start or Path.cwd()).resolve()
|
||||
for directory in [current, *current.parents]:
|
||||
candidate = directory / "infospace.yaml"
|
||||
if candidate.is_file():
|
||||
return candidate
|
||||
return None
|
||||
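The upward search done by `find_infospace_config` can be tried in isolation. The block below is a minimal sketch rather than the module's own code: `find_config_upwards` and `demo` are hypothetical names, and the temporary directory layout is invented for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_config_upwards(start: Path, filename: str = "infospace.yaml") -> Optional[Path]:
    """Return the nearest *filename* in *start* or any of its parents (hypothetical helper)."""
    current = start.resolve()
    # Check the starting directory first, then each ancestor up to the filesystem root.
    for directory in [current, *current.parents]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


def demo() -> bool:
    """Create a throwaway project tree and search from a nested directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        (root / "infospace.yaml").write_text("topic:\n  name: demo\n", encoding="utf-8")
        nested = root / "output" / "entities"
        nested.mkdir(parents=True)
        # The search starts two levels below the config file and walks upwards.
        return find_config_upwards(nested) == root / "infospace.yaml"
```

Because the starting directory is checked before its parents, the nearest `infospace.yaml` wins when several exist along the path.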
176
markitect/infospace/entity_parser.py
Normal file
@@ -0,0 +1,176 @@
|
||||
"""
|
||||
Entity metadata parser.
|
||||
|
||||
Extracts structured :class:`EntityMeta` from entity markdown files
|
||||
produced by the infospace entity-extraction pipeline.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import List, Optional, Sequence
|
||||
|
||||
from markitect.core.parser import parse_markdown_to_ast
|
||||
from markitect.core.section_tree import (
|
||||
build_section_tree,
|
||||
extract_heading_content,
|
||||
extract_heading_level,
|
||||
extract_section_text,
|
||||
slugify,
|
||||
)
|
||||
from .models import EntityMeta
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Sections we look for (slug → human-friendly label)
|
||||
_KNOWN_SECTIONS = {
|
||||
"definition": "Definition",
|
||||
"source_chapter": "Source Chapter",
|
||||
"context": "Context",
|
||||
"economic_domain": "Economic Domain",
|
||||
"smith_s_original_wording": "Smith's Original Wording",
|
||||
"modern_interpretation": "Modern Interpretation",
|
||||
}
|
||||
|
||||
# Default filename patterns to exclude from directory parsing
|
||||
_DEFAULT_EXCLUDE_PATTERNS = (
|
||||
r".*-entities\.md$",
|
||||
r".*-prompt\.md$",
|
||||
)
|
||||
|
||||
|
||||
def _is_title_case(text: str) -> bool:
|
||||
"""Return True if *text* is in title case (ignoring short words)."""
|
||||
# Words that are allowed to be lowercase in title case
|
||||
minor_words = {
|
||||
"a", "an", "the", "and", "but", "or", "nor", "for", "yet", "so",
|
||||
"in", "on", "at", "to", "by", "of", "up", "as", "is", "if",
|
||||
}
|
||||
words = text.split()
|
||||
if not words:
|
||||
return False
|
||||
for i, word in enumerate(words):
|
||||
# Remove every non-word character (punctuation, hyphens) before the check
|
||||
clean = re.sub(r"[^\w]", "", word)
|
||||
if not clean:
|
||||
continue
|
||||
# First word must be capitalised
|
||||
if i == 0:
|
||||
if not clean[0].isupper():
|
||||
return False
|
||||
elif clean.lower() in minor_words:
|
||||
continue # minor words may be lower
|
||||
elif not clean[0].isupper():
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def _word_count(text: str) -> int:
|
||||
"""Count whitespace-separated words in *text*."""
|
||||
return len(text.split())
|
||||
|
||||
|
||||
def _find_h2_section(tree_root: dict, slug: str) -> Optional[dict]:
|
||||
"""Find a direct H2 child of the root by slug."""
|
||||
for child in tree_root.get("children", []):
|
||||
if child["level"] == 2 and child["slug"] == slug:
|
||||
return child
|
||||
return None
|
||||
|
||||
|
||||
def parse_entity_file(path: Path) -> EntityMeta:
|
||||
"""Parse a single entity markdown file into :class:`EntityMeta`.
|
||||
|
||||
Raises:
|
||||
ValueError: If the file has no H1 heading.
|
||||
"""
|
||||
content = path.read_text(encoding="utf-8")
|
||||
tokens = parse_markdown_to_ast(content)
|
||||
tree = build_section_tree(tokens)
|
||||
|
||||
# --- H1: entity title ---
|
||||
h1_section = None
|
||||
for child in tree["children"]:
|
||||
if child["level"] == 1:
|
||||
h1_section = child
|
||||
break
|
||||
|
||||
if h1_section is None:
|
||||
raise ValueError(f"No H1 heading found in {path}")
|
||||
|
||||
h1_raw = h1_section["heading"]
|
||||
slug = slugify(h1_raw)
|
||||
title = h1_raw
|
||||
h1_is_title_case = _is_title_case(h1_raw)
|
||||
|
||||
# Use the H1 node as the effective root for H2 look-ups
|
||||
effective_root = h1_section
|
||||
|
||||
# Collect all H2 section slugs
|
||||
section_slugs = [c["slug"] for c in effective_root.get("children", []) if c["level"] == 2]
|
||||
|
||||
# --- Extract known sections ---
|
||||
def _get_section_text(section_slug: str) -> str:
|
||||
node = _find_h2_section(effective_root, section_slug)
|
||||
if node is None:
|
||||
return ""
|
||||
return extract_section_text(node).strip()
|
||||
|
||||
definition = _get_section_text("definition")
|
||||
source_chapter = _get_section_text("source_chapter")
|
||||
context = _get_section_text("context")
|
||||
domain = _get_section_text("economic_domain")
|
||||
original_wording = _get_section_text("smith_s_original_wording")
|
||||
modern_interpretation = _get_section_text("modern_interpretation")
|
||||
|
||||
# --- Derived metrics ---
|
||||
has_original_wording = bool(original_wording)
|
||||
definition_word_count = _word_count(definition)
|
||||
total_word_count = _word_count(content)
|
||||
|
||||
return EntityMeta(
|
||||
slug=slug,
|
||||
title=title,
|
||||
h1_raw=h1_raw,
|
||||
definition=definition,
|
||||
source_chapter=source_chapter,
|
||||
context=context,
|
||||
domain=domain,
|
||||
original_wording=original_wording,
|
||||
modern_interpretation=modern_interpretation,
|
||||
h1_is_title_case=h1_is_title_case,
|
||||
has_original_wording=has_original_wording,
|
||||
definition_word_count=definition_word_count,
|
||||
total_word_count=total_word_count,
|
||||
section_slugs=section_slugs,
|
||||
source_path=str(path),
|
||||
)
|
||||
|
||||
|
||||
def parse_entity_directory(
|
||||
directory: Path,
|
||||
exclude_patterns: Optional[Sequence[str]] = None,
|
||||
) -> List[EntityMeta]:
|
||||
"""Parse all entity markdown files in *directory*.
|
||||
|
||||
Files matching *exclude_patterns* (regexes tested against the
|
||||
filename) are skipped. Defaults exclude chapter-view
|
||||
(``*-entities.md``) and prompt (``*-prompt.md``) files.
|
||||
|
||||
Malformed files are skipped with a warning rather than raising.
|
||||
"""
|
||||
if exclude_patterns is None:
|
||||
exclude_patterns = _DEFAULT_EXCLUDE_PATTERNS
|
||||
|
||||
compiled = [re.compile(p) for p in exclude_patterns]
|
||||
entities: List[EntityMeta] = []
|
||||
|
||||
for md_file in sorted(directory.glob("*.md")):
|
||||
if any(pat.match(md_file.name) for pat in compiled):
|
||||
continue
|
||||
try:
|
||||
entities.append(parse_entity_file(md_file))
|
||||
except Exception as exc:
|
||||
logger.warning("Skipping %s: %s", md_file.name, exc)
|
||||
|
||||
return entities
|
||||
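The filename filtering in `parse_entity_directory` can be illustrated without touching the file system. This is a small sketch under stated assumptions: `filter_entity_filenames` is a hypothetical helper, the filenames are made up, and only the two default patterns are taken from the diff above.

```python
import re
from typing import Iterable, List, Sequence

# The two default exclusion patterns shown in entity_parser.py.
EXCLUDE_PATTERNS = (
    r".*-entities\.md$",
    r".*-prompt\.md$",
)


def filter_entity_filenames(
    names: Iterable[str],
    exclude_patterns: Sequence[str] = EXCLUDE_PATTERNS,
) -> List[str]:
    """Return the sorted filenames that match none of the exclusion regexes."""
    compiled = [re.compile(p) for p in exclude_patterns]
    # re.match anchors at the start of the name; the leading ".*" lets the
    # suffix part of each pattern match anywhere up to the end of the name.
    return [n for n in sorted(names) if not any(p.match(n) for p in compiled)]


candidates = ["division-of-labour.md", "chapter-1-entities.md", "extract-prompt.md"]
kept = filter_entity_filenames(candidates)
```

Passing a custom `exclude_patterns` sequence replaces the defaults entirely, which mirrors the `None` check in the function above.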
215
markitect/infospace/evaluate.py
Normal file
@@ -0,0 +1,215 @@
|
||||
"""
|
||||
Per-entity evaluation pipeline.
|
||||
|
||||
Builds prompts from entity metadata and delegates LLM evaluation to
|
||||
the :class:`BatchEvaluator`. Writes structured results to the
|
||||
evaluations directory.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Callable, Dict, List, Optional
|
||||
|
||||
from markitect.infospace.config import InfospaceConfig
|
||||
from markitect.infospace.evaluation import EntityEvaluation, ScoreEntry
|
||||
from markitect.infospace.evaluation_io import write_entity_evaluation
|
||||
from markitect.infospace.models import EntityMeta
|
||||
from markitect.prompts.execution.batch import BatchEvaluator, BatchItem, BatchSummary
|
||||
from markitect.prompts.execution.llm_adapter import LLMAdapter
|
||||
from markitect.prompts.execution.models import RunConfig
|
||||
|
||||
|
||||
_DEFAULT_DIMENSIONS = [
|
||||
"definition_precision",
|
||||
"source_grounding",
|
||||
"domain_relevance",
|
||||
"discipline_alignment",
|
||||
"conceptual_clarity",
|
||||
]
|
||||
|
||||
_PROMPT_TEMPLATE = """\
|
||||
You are evaluating an entity from an infospace about "{topic}".
|
||||
|
||||
## Entity: {title}
|
||||
|
||||
**Slug:** {slug}
|
||||
**Domain:** {domain}
|
||||
**Source chapter:** {source_chapter}
|
||||
|
||||
### Definition
|
||||
{definition}
|
||||
|
||||
### Context
|
||||
{context}
|
||||
|
||||
## Instructions
|
||||
|
||||
Rate this entity on each dimension below using a scale of 1-5 \
|
||||
(1 = poor, 5 = excellent). For each dimension, provide:
|
||||
1. A numeric score (1-5)
|
||||
2. A brief rationale (1-2 sentences)
|
||||
|
||||
### Dimensions to evaluate:
|
||||
{dimensions_list}
|
||||
|
||||
## Output format
|
||||
|
||||
Return your evaluation as a structured list:
|
||||
|
||||
DIMENSION: <name>
|
||||
SCORE: <1-5>
|
||||
RATIONALE: <explanation>
|
||||
|
||||
Repeat for each dimension.
|
||||
"""
|
||||
|
||||
|
||||
def build_evaluation_prompt(
|
||||
entity: EntityMeta,
|
||||
topic: str,
|
||||
dimensions: Optional[List[str]] = None,
|
||||
) -> str:
|
||||
"""Build an evaluation prompt for a single entity."""
|
||||
dims = dimensions or _DEFAULT_DIMENSIONS
|
||||
dims_list = "\n".join(f"- {d}" for d in dims)
|
||||
return _PROMPT_TEMPLATE.format(
|
||||
topic=topic,
|
||||
title=entity.title,
|
||||
slug=entity.slug,
|
||||
domain=entity.domain or "(unspecified)",
|
||||
source_chapter=entity.source_chapter or "(unspecified)",
|
||||
definition=entity.definition or "(no definition)",
|
||||
context=entity.context or "(no context)",
|
||||
dimensions_list=dims_list,
|
||||
)
|
||||
|
||||
|
||||
def content_digest(entity: EntityMeta) -> str:
|
||||
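"""Compute a short SHA-256 digest (16 hex characters) of the entity's slug, definition, context, and domain, used to skip unchanged entities in incremental evaluation."""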
|
||||
content = f"{entity.slug}:{entity.definition}:{entity.context}:{entity.domain}"
|
||||
return hashlib.sha256(content.encode()).hexdigest()[:16]
|
||||
|
||||
|
||||
def parse_evaluation_response(
|
||||
response_text: str,
|
||||
dimensions: Optional[List[str]] = None,
|
||||
) -> List[ScoreEntry]:
|
||||
"""Parse structured dimension scores from LLM response text.
|
||||
|
||||
Expects blocks of::
|
||||
|
||||
DIMENSION: <name>
|
||||
SCORE: <1-5>
|
||||
RATIONALE: <text>
|
||||
"""
|
||||
dims = dimensions or _DEFAULT_DIMENSIONS
|
||||
scores: List[ScoreEntry] = []
|
||||
current_dim = None
|
||||
current_score = None
|
||||
current_rationale = ""
|
||||
|
||||
for line in response_text.splitlines():
|
||||
stripped = line.strip()
|
||||
if stripped.upper().startswith("DIMENSION:"):
|
||||
# Flush previous
|
||||
if current_dim is not None and current_score is not None:
|
||||
scores.append(ScoreEntry(
|
||||
name=current_dim,
|
||||
value=current_score,
|
||||
max_value=5.0,
|
||||
rationale=current_rationale.strip(),
|
||||
))
|
||||
current_dim = stripped.split(":", 1)[1].strip()
|
||||
current_score = None
|
||||
current_rationale = ""
|
||||
elif stripped.upper().startswith("SCORE:"):
|
||||
try:
|
||||
current_score = float(stripped.split(":", 1)[1].strip())
|
||||
except ValueError:
|
||||
current_score = None
|
||||
elif stripped.upper().startswith("RATIONALE:"):
|
||||
current_rationale = stripped.split(":", 1)[1].strip()
|
||||
elif current_dim is not None and current_score is not None:
|
||||
# Continuation of rationale
|
||||
if stripped:
|
||||
current_rationale += " " + stripped
|
||||
|
||||
# Flush last
|
||||
if current_dim is not None and current_score is not None:
|
||||
scores.append(ScoreEntry(
|
||||
name=current_dim,
|
||||
value=current_score,
|
||||
max_value=5.0,
|
||||
rationale=current_rationale.strip(),
|
||||
))
|
||||
|
||||
return scores
|
||||
|
||||
|
||||
def run_entity_evaluation(
|
||||
config: InfospaceConfig,
|
||||
entities: List[EntityMeta],
|
||||
adapter: LLMAdapter,
|
||||
run_config: Optional[RunConfig] = None,
|
||||
output_dir: Optional[Path] = None,
|
||||
previous_digests: Optional[Dict[str, str]] = None,
|
||||
progress_callback: Optional[Callable] = None,
|
||||
dimensions: Optional[List[str]] = None,
|
||||
) -> BatchSummary:
|
||||
"""Run per-entity evaluation using the batch evaluator.
|
||||
|
||||
Args:
|
||||
config: The infospace configuration.
|
||||
entities: Entities to evaluate.
|
||||
adapter: LLM adapter for evaluation.
|
||||
run_config: LLM execution configuration.
|
||||
output_dir: Where to write evaluation results. Defaults to
|
||||
``config.evaluations_dir`` relative to CWD.
|
||||
previous_digests: ``{slug: digest}`` for incremental skip.
|
||||
progress_callback: Called after each item.
|
||||
dimensions: Custom evaluation dimensions.
|
||||
|
||||
Returns:
|
||||
A :class:`BatchSummary` with per-entity results.
|
||||
"""
|
||||
topic = config.topic.name
|
||||
items = [
|
||||
BatchItem(
|
||||
key=entity.slug,
|
||||
prompt=build_evaluation_prompt(entity, topic, dimensions),
|
||||
content_digest=content_digest(entity),
|
||||
metadata={"source_path": entity.source_path},
|
||||
)
|
||||
for entity in entities
|
||||
]
|
||||
|
||||
evaluator = BatchEvaluator(
|
||||
adapter=adapter,
|
||||
config=run_config,
|
||||
progress_callback=progress_callback,
|
||||
previous_digests=previous_digests,
|
||||
)
|
||||
summary = evaluator.evaluate(items)
|
||||
|
||||
# Write successful results
|
||||
evaluations_path = output_dir or Path(config.evaluations_dir)
|
||||
evaluator_name = (run_config.model_name if run_config else "unknown")
|
||||
|
||||
for result in summary.results:
|
||||
if result.status != "success" or result.response is None:
|
||||
continue
|
||||
|
||||
scores = parse_evaluation_response(result.response.content, dimensions)
|
||||
evaluation = EntityEvaluation(
|
||||
entity_slug=result.key,
|
||||
evaluator=evaluator_name,
|
||||
scores=scores,
|
||||
evaluated_at=datetime.utcnow(),
|
||||
)
|
||||
eval_path = evaluations_path / f"{result.key}.md"
|
||||
write_entity_evaluation(evaluation, eval_path)
|
||||
|
||||
return summary
|
||||
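The `DIMENSION` / `SCORE` / `RATIONALE` block format that `parse_evaluation_response` expects is easier to follow with a concrete response. The sketch below is a simplified stand-in, not the function from the diff: `parse_scores` is a hypothetical name, it returns plain dictionaries instead of `ScoreEntry` objects, and the sample response text is invented.

```python
from typing import Any, Dict, List, Optional


def parse_scores(response_text: str) -> List[Dict[str, Any]]:
    """Parse DIMENSION/SCORE/RATIONALE blocks into dictionaries (simplified sketch)."""
    scores: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None
    for line in response_text.splitlines():
        stripped = line.strip()
        upper = stripped.upper()
        if upper.startswith("DIMENSION:"):
            # A new block starts: keep the previous one only if it had a numeric score.
            if current is not None and current["value"] is not None:
                scores.append(current)
            current = {"name": stripped.split(":", 1)[1].strip(), "value": None, "rationale": ""}
        elif current is None:
            continue  # ignore any text before the first DIMENSION line
        elif upper.startswith("SCORE:"):
            try:
                current["value"] = float(stripped.split(":", 1)[1].strip())
            except ValueError:
                current["value"] = None  # e.g. "4/5" is not a plain number
        elif upper.startswith("RATIONALE:"):
            current["rationale"] = stripped.split(":", 1)[1].strip()
        elif stripped and current["value"] is not None:
            # Any other non-empty line continues the rationale.
            current["rationale"] = (current["rationale"] + " " + stripped).strip()
    if current is not None and current["value"] is not None:
        scores.append(current)
    return scores


SAMPLE = (
    "DIMENSION: definition_precision\n"
    "SCORE: 4\n"
    "RATIONALE: Clear and specific.\n"
    "It cites the source.\n"
    "\n"
    "DIMENSION: source_grounding\n"
    "SCORE: 4/5\n"
    "RATIONALE: Ambiguous.\n"
)
parsed = parse_scores(SAMPLE)
```

In this sample the second block is dropped because `4/5` cannot be converted with `float()`. The parser in the diff treats an unparseable score the same way, so a model that answers in "n/5" form would silently lose that dimension.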
207
markitect/infospace/evaluation.py
Normal file
@@ -0,0 +1,207 @@
|
||||
"""
|
||||
Data models for structured evaluation output.
|
||||
|
||||
Provides typed containers for per-entity LLM-evaluated scores and
|
||||
collection-level metrics. All models support ``to_dict()``/``from_dict()``
|
||||
round-tripping for YAML serialisation.
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
|
||||
@dataclass
|
||||
class ScoreEntry:
|
||||
"""A single scored dimension (e.g. definition_precision: 4.5/5.0)."""
|
||||
|
||||
name: str
|
||||
value: float
|
||||
max_value: float = 5.0
|
||||
rationale: str = ""
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {
|
||||
"name": self.name,
|
||||
"value": self.value,
|
||||
"max_value": self.max_value,
|
||||
}
|
||||
if self.rationale:
|
||||
d["rationale"] = self.rationale
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "ScoreEntry":
|
||||
return cls(
|
||||
name=data["name"],
|
||||
value=float(data["value"]),
|
||||
max_value=float(data.get("max_value", 5.0)),
|
||||
rationale=data.get("rationale", ""),
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class EntityEvaluation:
|
||||
"""Per-entity evaluation result."""
|
||||
|
||||
entity_slug: str
|
||||
evaluator: str
|
||||
scores: List[ScoreEntry]
|
||||
evaluated_at: datetime
|
||||
notes: List[str] = field(default_factory=list)
|
||||
|
||||
@property
|
||||
def overall_score(self) -> float:
|
||||
if not self.scores:
|
||||
return 0.0
|
||||
return sum(s.value for s in self.scores) / len(self.scores)
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {
|
||||
"entity_slug": self.entity_slug,
|
||||
"evaluator": self.evaluator,
|
||||
"evaluated_at": self.evaluated_at.isoformat(),
|
||||
"overall_score": round(self.overall_score, 4),
|
||||
"scores": [s.to_dict() for s in self.scores],
|
||||
"notes": self.notes,
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "EntityEvaluation":
|
||||
return cls(
|
||||
entity_slug=data["entity_slug"],
|
||||
evaluator=data["evaluator"],
|
||||
scores=[ScoreEntry.from_dict(s) for s in data["scores"]],
|
||||
evaluated_at=datetime.fromisoformat(data["evaluated_at"]),
|
||||
notes=data.get("notes", []),
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class MetricValue:
|
||||
"""A single collection-level metric."""
|
||||
|
||||
name: str
|
||||
value: float
|
||||
concern: str = ""
|
||||
details: Dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {"name": self.name, "value": self.value}
|
||||
if self.concern:
|
||||
d["concern"] = self.concern
|
||||
if self.details:
|
||||
d["details"] = self.details
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "MetricValue":
|
||||
return cls(
|
||||
name=data["name"],
|
||||
value=float(data["value"]),
|
||||
concern=data.get("concern", ""),
|
||||
details=data.get("details", {}),
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class EvaluationSnapshot:
|
||||
"""Timestamped snapshot of entity evaluations and collection metrics."""
|
||||
|
||||
snapshot_id: str
|
||||
created_at: datetime
|
||||
schema_name: str
|
||||
entity_count: int
|
||||
entity_evaluations: List[EntityEvaluation] = field(default_factory=list)
|
||||
collection_metrics: List[MetricValue] = field(default_factory=list)
|
||||
metadata: Dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {
|
||||
"snapshot_id": self.snapshot_id,
|
||||
"created_at": self.created_at.isoformat(),
|
||||
"schema_name": self.schema_name,
|
||||
"entity_count": self.entity_count,
|
||||
"entity_evaluations": [e.to_dict() for e in self.entity_evaluations],
|
||||
"collection_metrics": [m.to_dict() for m in self.collection_metrics],
|
||||
"metadata": self.metadata,
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "EvaluationSnapshot":
|
||||
return cls(
|
||||
snapshot_id=data["snapshot_id"],
|
||||
created_at=datetime.fromisoformat(data["created_at"]),
|
||||
schema_name=data["schema_name"],
|
||||
entity_count=data["entity_count"],
|
||||
entity_evaluations=[
|
||||
EntityEvaluation.from_dict(e) for e in data.get("entity_evaluations", [])
|
||||
],
|
||||
collection_metrics=[
|
||||
MetricValue.from_dict(m) for m in data.get("collection_metrics", [])
|
||||
],
|
||||
metadata=data.get("metadata", {}),
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class ScoreChange:
|
||||
"""Delta record for a single score dimension between snapshots."""
|
||||
|
||||
entity_slug: str
|
||||
dimension: str
|
||||
before: float
|
||||
after: float
|
||||
|
||||
@property
|
||||
def delta(self) -> float:
|
||||
return self.after - self.before
|
||||
|
||||
|
||||
@dataclass
|
||||
class MetricChange:
|
||||
"""Delta record for a collection metric between snapshots."""
|
||||
|
||||
name: str
|
||||
before: float
|
||||
after: float
|
||||
|
||||
@property
|
||||
def delta(self) -> float:
|
||||
return self.after - self.before
|
||||
|
||||
|
||||
@dataclass
|
||||
class SnapshotDiff:
|
||||
"""Diff between two evaluation snapshots."""
|
||||
|
||||
before_id: str
|
||||
after_id: str
|
||||
added_entities: List[str] = field(default_factory=list)
|
||||
removed_entities: List[str] = field(default_factory=list)
|
||||
score_changes: List[ScoreChange] = field(default_factory=list)
|
||||
metric_changes: List[MetricChange] = field(default_factory=list)
|
||||
|
||||
def summary(self) -> str:
|
||||
lines = [f"Diff: {self.before_id} -> {self.after_id}"]
|
||||
if self.added_entities:
|
||||
lines.append(f" Added entities: {', '.join(self.added_entities)}")
|
||||
if self.removed_entities:
|
||||
lines.append(f" Removed entities: {', '.join(self.removed_entities)}")
|
||||
if self.score_changes:
|
||||
lines.append(f" Score changes: {len(self.score_changes)}")
|
||||
for sc in self.score_changes:
|
||||
lines.append(
|
||||
f" {sc.entity_slug}/{sc.dimension}: "
|
||||
f"{sc.before} -> {sc.after} ({sc.delta:+.2f})"
|
||||
)
|
||||
if self.metric_changes:
|
||||
lines.append(f" Metric changes: {len(self.metric_changes)}")
|
||||
for mc in self.metric_changes:
|
||||
lines.append(
|
||||
f" {mc.name}: {mc.before} -> {mc.after} ({mc.delta:+.2f})"
|
||||
)
|
||||
if not any([self.added_entities, self.removed_entities,
|
||||
self.score_changes, self.metric_changes]):
|
||||
lines.append(" No changes")
|
||||
return "\n".join(lines)
|
||||
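The `overall_score` property of `EntityEvaluation` is an unweighted mean of the raw score values. The following sketch reproduces that arithmetic with a stripped-down stand-in: `Score` and the free function `overall_score` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Score:
    """Stand-in for ScoreEntry with only the fields the mean needs."""

    name: str
    value: float
    max_value: float = 5.0


def overall_score(scores: List[Score]) -> float:
    """Unweighted mean of the raw values; 0.0 when there are no scores."""
    if not scores:
        return 0.0
    return sum(s.value for s in scores) / len(scores)


example = [
    Score("definition_precision", 4.0),
    Score("source_grounding", 5.0),
    Score("domain_relevance", 3.0),
]
mean_value = overall_score(example)  # (4.0 + 5.0 + 3.0) / 3
```

The mean does not divide by `max_value`, so it is only comparable across entities when every dimension uses the same scale, as the default 1-5 dimensions do.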
213
markitect/infospace/evaluation_io.py
Normal file
@@ -0,0 +1,213 @@
|
||||
"""
|
||||
Read/write utilities for evaluation output files.
|
||||
|
||||
Per-entity evaluations are stored as markdown with YAML frontmatter.
|
||||
Snapshots and history are stored as pure YAML files.
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
from typing import List
|
||||
|
||||
import yaml
|
||||
|
||||
from .evaluation import (
|
||||
EntityEvaluation,
|
||||
EvaluationSnapshot,
|
||||
MetricChange,
|
||||
MetricValue,
|
||||
ScoreChange,
|
||||
SnapshotDiff,
|
||||
)
|
||||
|
||||
_FRONTMATTER_SEP = "---"
|
||||
|
||||
|
||||
def write_entity_evaluation(evaluation: EntityEvaluation, path: Path) -> None:
|
||||
"""Write a per-entity evaluation as YAML frontmatter + markdown body."""
|
||||
frontmatter = {
|
||||
"entity_slug": evaluation.entity_slug,
|
||||
"evaluator": evaluation.evaluator,
|
||||
"evaluated_at": evaluation.evaluated_at.isoformat(),
|
||||
"overall_score": round(evaluation.overall_score, 4),
|
||||
"scores": [s.to_dict() for s in evaluation.scores],
|
||||
}
|
||||
if evaluation.notes:
|
||||
frontmatter["notes"] = evaluation.notes
|
||||
|
||||
lines: List[str] = []
|
||||
lines.append(_FRONTMATTER_SEP)
|
||||
lines.append(yaml.safe_dump(frontmatter, default_flow_style=False, sort_keys=False).rstrip())
|
||||
lines.append(_FRONTMATTER_SEP)
|
||||
lines.append("")
|
||||
|
||||
# Title
|
||||
title = evaluation.entity_slug.replace("_", " ").replace("-", " ").title()
|
||||
lines.append(f"# Evaluation: {title}")
|
||||
lines.append("")
|
||||
|
||||
# One section per score with rationale
|
||||
for score in evaluation.scores:
|
||||
lines.append(f"## {score.name} — {score.value} / {score.max_value}")
|
||||
lines.append("")
|
||||
if score.rationale:
|
||||
lines.append(score.rationale)
|
||||
lines.append("")
|
||||
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
path.write_text("\n".join(lines), encoding="utf-8")
|
||||
|
||||
|
||||
def read_entity_evaluation(path: Path) -> EntityEvaluation:
|
||||
"""Read a per-entity evaluation from a YAML frontmatter markdown file."""
|
||||
text = path.read_text(encoding="utf-8")
|
||||
parts = text.split(f"{_FRONTMATTER_SEP}\n", maxsplit=2)
|
||||
# parts: ["", frontmatter_text, body]
|
||||
if len(parts) < 3:
|
||||
raise ValueError(f"Invalid frontmatter in {path}")
|
||||
fm_text = parts[1]
|
||||
body = parts[2]
|
||||
|
||||
fm = yaml.safe_load(fm_text)
|
||||
|
||||
# Parse rationales from body
|
||||
rationales = _parse_rationales(body)
|
||||
|
||||
from .evaluation import ScoreEntry
|
||||
|
||||
scores = []
|
||||
for s_data in fm["scores"]:
|
||||
se = ScoreEntry.from_dict(s_data)
|
||||
if se.name in rationales:
|
||||
se.rationale = rationales[se.name]
|
||||
scores.append(se)
|
||||
|
||||
return EntityEvaluation(
|
||||
entity_slug=fm["entity_slug"],
|
||||
evaluator=fm["evaluator"],
|
||||
scores=scores,
|
||||
evaluated_at=__import__("datetime").datetime.fromisoformat(fm["evaluated_at"]),
|
||||
notes=fm.get("notes", []),
|
||||
)
|
||||
|
||||
|
||||
def _parse_rationales(body: str) -> dict:
|
||||
"""Extract rationale text per dimension from the markdown body."""
|
||||
rationales: dict = {}
|
||||
current_name = None
|
||||
current_lines: List[str] = []
|
||||
|
||||
for line in body.splitlines():
|
||||
if line.startswith("## "):
|
||||
# Save previous
|
||||
if current_name is not None:
|
||||
rationales[current_name] = "\n".join(current_lines).strip()
|
||||
# Parse "## dimension_name — 4.5 / 5.0"
|
||||
heading = line[3:].strip()
|
||||
name = heading.split("—")[0].strip() if "—" in heading else heading
|
||||
current_name = name
|
||||
current_lines = []
|
||||
elif current_name is not None:
|
||||
current_lines.append(line)
|
||||
|
||||
if current_name is not None:
|
||||
rationales[current_name] = "\n".join(current_lines).strip()
|
||||
|
||||
return rationales
|
||||
|
||||
|
||||
def write_snapshot(snapshot: EvaluationSnapshot, path: Path) -> None:
|
||||
"""Write an evaluation snapshot as a YAML file."""
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
path.write_text(
|
||||
yaml.safe_dump(snapshot.to_dict(), default_flow_style=False, sort_keys=False),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
|
||||
def read_snapshot(path: Path) -> EvaluationSnapshot:
|
||||
"""Read an evaluation snapshot from a YAML file."""
|
||||
data = yaml.safe_load(path.read_text(encoding="utf-8"))
|
||||
return EvaluationSnapshot.from_dict(data)
|
||||
|
||||
|
||||
def append_to_history(snapshot: EvaluationSnapshot, history_path: Path) -> None:
|
||||
"""Append a snapshot to a YAML list file, creating the file if it is missing."""
|
||||
history_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
existing: List[dict] = []
|
||||
if history_path.exists():
|
||||
loaded = yaml.safe_load(history_path.read_text(encoding="utf-8"))
|
||||
if loaded is not None:
|
||||
existing = loaded
|
||||
|
||||
existing.append(snapshot.to_dict())
|
||||
history_path.write_text(
|
||||
yaml.safe_dump(existing, default_flow_style=False, sort_keys=False),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
|
||||
def read_history(history_path: Path) -> List[EvaluationSnapshot]:
|
||||
"""Read all snapshots from a YAML history file."""
|
||||
data = yaml.safe_load(history_path.read_text(encoding="utf-8"))
|
||||
if data is None:
|
||||
return []
|
||||
return [EvaluationSnapshot.from_dict(d) for d in data]
|
||||
|
||||
|
||||
def diff_snapshots(before: EvaluationSnapshot, after: EvaluationSnapshot) -> SnapshotDiff:
|
||||
"""Compute the diff between two evaluation snapshots."""
|
||||
before_slugs = {e.entity_slug for e in before.entity_evaluations}
|
||||
after_slugs = {e.entity_slug for e in after.entity_evaluations}
|
||||
|
||||
added = sorted(after_slugs - before_slugs)
|
||||
removed = sorted(before_slugs - after_slugs)
|
||||
|
||||
# Build score lookup: {slug: {dimension: value}}
|
||||
before_scores: dict = {}
|
||||
for ev in before.entity_evaluations:
|
||||
before_scores[ev.entity_slug] = {s.name: s.value for s in ev.scores}
|
||||
|
||||
after_scores: dict = {}
|
||||
for ev in after.entity_evaluations:
|
||||
after_scores[ev.entity_slug] = {s.name: s.value for s in ev.scores}
|
||||
|
||||
score_changes: List[ScoreChange] = []
|
||||
common_slugs = sorted(before_slugs & after_slugs)
|
||||
for slug in common_slugs:
|
||||
b_dims = before_scores[slug]
|
||||
a_dims = after_scores[slug]
|
||||
all_dims = sorted(set(b_dims) | set(a_dims))
|
||||
for dim in all_dims:
|
||||
bv = b_dims.get(dim)
|
||||
av = a_dims.get(dim)
|
||||
if bv != av:
|
||||
score_changes.append(ScoreChange(
|
||||
entity_slug=slug,
|
||||
dimension=dim,
|
||||
before=bv if bv is not None else 0.0,
|
||||
after=av if av is not None else 0.0,
|
||||
))
|
||||
|
||||
# Metric changes
|
||||
before_metrics = {m.name: m.value for m in before.collection_metrics}
|
||||
after_metrics = {m.name: m.value for m in after.collection_metrics}
|
||||
all_metric_names = sorted(set(before_metrics) | set(after_metrics))
|
||||
metric_changes: List[MetricChange] = []
|
||||
for name in all_metric_names:
|
||||
bv = before_metrics.get(name)
|
||||
av = after_metrics.get(name)
|
||||
if bv != av:
|
||||
metric_changes.append(MetricChange(
|
||||
name=name,
|
||||
before=bv if bv is not None else 0.0,
|
||||
after=av if av is not None else 0.0,
|
||||
))
|
||||
|
||||
return SnapshotDiff(
|
||||
before_id=before.snapshot_id,
|
||||
after_id=after.snapshot_id,
|
||||
added_entities=added,
|
||||
removed_entities=removed,
|
||||
score_changes=score_changes,
|
||||
metric_changes=metric_changes,
|
||||
)
|
||||
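The per-dimension comparison inside `diff_snapshots` can be sketched with plain dictionaries. This is an illustrative reduction, not the function from the diff: `diff_scores` is a hypothetical name, changes are returned as tuples instead of `ScoreChange` objects, and the slugs and scores are invented.

```python
from typing import Dict, List, Tuple

ScoreMap = Dict[str, Dict[str, float]]  # {entity_slug: {dimension: value}}


def diff_scores(before: ScoreMap, after: ScoreMap) -> List[Tuple[str, str, float, float]]:
    """Return (slug, dimension, before, after) for every changed score."""
    changes: List[Tuple[str, str, float, float]] = []
    # Only entities present in both snapshots are compared dimension by dimension.
    for slug in sorted(set(before) & set(after)):
        b_dims, a_dims = before[slug], after[slug]
        for dim in sorted(set(b_dims) | set(a_dims)):
            bv, av = b_dims.get(dim), a_dims.get(dim)
            if bv != av:
                # A dimension missing on one side is reported as 0.0 on that side.
                changes.append((
                    slug,
                    dim,
                    bv if bv is not None else 0.0,
                    av if av is not None else 0.0,
                ))
    return changes


before = {"labour": {"clarity": 3.0, "grounding": 4.0}, "rent": {"clarity": 2.0}}
after = {"labour": {"clarity": 4.0, "grounding": 4.0, "relevance": 5.0}, "wages": {"clarity": 5.0}}
changes = diff_scores(before, after)
```

Entities that exist in only one snapshot (`rent` and `wages` here) produce no score changes. In the diff above they are reported through `removed_entities` and `added_entities` instead.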
223
markitect/infospace/history.py
Normal file
@@ -0,0 +1,223 @@
|
||||
"""
|
||||
Metrics history and viability tracking.
|
||||
|
||||
Converts check results into timestamped snapshots and maintains a
|
||||
persistent history file for trend analysis.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import uuid
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import yaml
|
||||
|
||||
from markitect.infospace.checks.orchestrator import CheckReport
|
||||
from markitect.infospace.config import InfospaceConfig
|
||||
from markitect.infospace.evaluation import EvaluationSnapshot, MetricValue
|
||||
from markitect.infospace.evaluation_io import (
|
||||
append_to_history,
|
||||
diff_snapshots,
|
||||
read_history,
|
||||
)
|
||||
from markitect.infospace.state import ViabilityResult
|
||||
|
||||
|
||||
# ── Snapshot creation ────────────────────────────────────────────────
|
||||
|
||||
|
||||
def _concern_for_metric(name: str) -> str:
|
||||
"""Map a metric name to its concern label."""
|
||||
mapping = {
|
||||
"redundancy_ratio": "C1",
|
||||
"coverage_ratio": "C2",
|
||||
"coherence_components": "C3",
|
||||
"modularity": "C3",
|
||||
"consistency_cycles": "C4",
|
||||
"granularity_entropy": "C5",
|
||||
}
|
||||
return mapping.get(name, "")
|
||||
|
||||
|
||||
def snapshot_from_checks(
|
||||
check_report: CheckReport,
|
||||
entity_count: int,
|
||||
schema_name: str = "default",
|
||||
metadata: Optional[Dict[str, Any]] = None,
|
||||
) -> EvaluationSnapshot:
|
||||
"""Create an :class:`EvaluationSnapshot` from collection check results.
|
||||
|
||||
Args:
|
||||
check_report: Output from :func:`run_all_checks`.
|
||||
entity_count: Number of entities checked.
|
||||
schema_name: Schema identifier for the snapshot.
|
||||
metadata: Optional extra metadata to attach.
|
||||
|
||||
Returns:
|
||||
A snapshot containing the check metrics as collection_metrics.
|
||||
"""
|
||||
metrics_dict = check_report.metrics()
|
||||
collection_metrics = [
|
||||
MetricValue(
|
||||
name=name,
|
||||
value=value,
|
||||
concern=_concern_for_metric(name),
|
||||
)
|
||||
for name, value in sorted(metrics_dict.items())
|
||||
]
|
||||
|
||||
return EvaluationSnapshot(
|
||||
snapshot_id=str(uuid.uuid4())[:8],
|
||||
created_at=datetime.now(timezone.utc),
|
||||
schema_name=schema_name,
|
||||
entity_count=entity_count,
|
||||
collection_metrics=collection_metrics,
|
||||
metadata=metadata or {},
|
||||
)
|
||||
|
||||
|
||||
# ── Metrics file I/O ────────────────────────────────────────────────
|
||||
|
||||
|
||||
def write_metrics_file(metrics: Dict[str, float], path: Path) -> None:
|
||||
"""Write the latest metrics to a simple YAML file.
|
||||
|
||||
This file is used by ``markitect infospace viability`` for quick
|
||||
threshold checking.
|
||||
"""
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
path.write_text(
|
||||
yaml.safe_dump(
|
||||
{k: round(v, 6) for k, v in sorted(metrics.items())},
|
||||
default_flow_style=False,
|
||||
sort_keys=True,
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
|
||||
def read_metrics_file(path: Path) -> Dict[str, float]:
|
||||
"""Read the latest metrics from a YAML file."""
|
||||
if not path.is_file():
|
||||
return {}
|
||||
raw = yaml.safe_load(path.read_text(encoding="utf-8"))
|
||||
if not isinstance(raw, dict):
|
||||
return {}
|
||||
return {k: float(v) for k, v in raw.items() if isinstance(v, (int, float))}
|
||||
|
||||
|
||||
# ── History operations ───────────────────────────────────────────────
|
||||
|
||||
|
||||
def record_check_results(
|
||||
check_report: CheckReport,
|
||||
config: InfospaceConfig,
|
||||
root: Path,
|
||||
entity_count: int,
|
||||
) -> EvaluationSnapshot:
|
||||
"""Record check results: save metrics file and append to history.
|
||||
|
||||
Args:
|
||||
check_report: Output from ``run_all_checks()``.
|
||||
config: The infospace configuration.
|
||||
root: Project root directory.
|
||||
entity_count: Number of entities checked.
|
||||
|
||||
Returns:
|
||||
The snapshot that was recorded.
|
||||
"""
|
||||
metrics_dir = root / config.metrics_dir
|
||||
metrics = check_report.metrics()
|
||||
|
||||
# Save latest metrics
|
||||
write_metrics_file(metrics, metrics_dir / "metrics.yaml")
|
||||
|
||||
# Create and append snapshot
|
||||
snapshot = snapshot_from_checks(
|
||||
check_report,
|
||||
entity_count=entity_count,
|
||||
metadata={"source": "collection-checks"},
|
||||
)
|
||||
append_to_history(snapshot, metrics_dir / "history.yaml")
|
||||
|
||||
return snapshot
|
||||
|
||||
|
||||
def get_history(config: InfospaceConfig, root: Path) -> List[EvaluationSnapshot]:
|
||||
"""Read the full metrics history for an infospace."""
|
||||
history_path = root / config.metrics_dir / "history.yaml"
|
||||
if not history_path.is_file():
|
||||
return []
|
||||
return read_history(history_path)
|
||||
|
||||
|
||||
def get_latest_snapshot(
|
||||
config: InfospaceConfig, root: Path
|
||||
) -> Optional[EvaluationSnapshot]:
|
||||
"""Get the most recent snapshot from the history."""
|
||||
history = get_history(config, root)
|
||||
return history[-1] if history else None
|
||||
|
||||
|
||||
def find_snapshot_by_date(
|
||||
history: List[EvaluationSnapshot], date_str: str
|
||||
) -> Optional[EvaluationSnapshot]:
|
||||
"""Find the snapshot closest to a given date string.
|
||||
|
||||
Args:
|
||||
history: List of snapshots in chronological order.
|
||||
date_str: Date string in ``YYYY-MM-DD`` or ``YYYY-MM-DDTHH:MM:SS`` format.
|
||||
|
||||
Returns:
|
||||
The snapshot closest to the given date, or ``None`` if history is empty or *date_str* cannot be parsed.
|
||||
"""
|
||||
if not history:
|
||||
return None
|
||||
|
||||
# Parse the target date
|
||||
try:
|
||||
if "T" in date_str:
|
||||
target = datetime.fromisoformat(date_str)
|
||||
else:
|
||||
target = datetime.fromisoformat(date_str + "T00:00:00")
|
||||
except ValueError:
|
||||
return None
|
||||
|
||||
# Make timezone-aware if needed
|
||||
if target.tzinfo is None:
|
||||
target = target.replace(tzinfo=timezone.utc)
|
||||
|
||||
best = None
|
||||
best_delta = None
|
||||
for snap in history:
|
||||
snap_dt = snap.created_at
|
||||
if snap_dt.tzinfo is None:
|
||||
snap_dt = snap_dt.replace(tzinfo=timezone.utc)
|
||||
delta = abs((snap_dt - target).total_seconds())
|
||||
if best_delta is None or delta < best_delta:
|
||||
best = snap
|
||||
best_delta = delta
|
||||
|
||||
return best
|
||||
|
||||
|
||||
def metric_trend(
|
||||
history: List[EvaluationSnapshot], metric_name: str
|
||||
) -> List[Dict[str, Any]]:
|
||||
"""Extract a single metric's values across the history.
|
||||
|
||||
Returns a list of ``{"date": iso_str, "value": float}`` entries
|
||||
for each snapshot that contains the metric.
|
||||
"""
|
||||
trend: List[Dict[str, Any]] = []
|
||||
for snap in history:
|
||||
for m in snap.collection_metrics:
|
||||
if m.name == metric_name:
|
||||
trend.append({
|
||||
"date": snap.created_at.isoformat(),
|
||||
"value": m.value,
|
||||
})
|
||||
break
|
||||
return trend
|
||||
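The nearest-snapshot lookup in `find_snapshot_by_date` can be exercised in isolation. The sketch below is illustrative rather than the module's code: `Snap` is a hypothetical stand-in for `EvaluationSnapshot` that carries only `snapshot_id` and `created_at`, and `closest` and `_as_utc` are hypothetical helper names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Snap:
    """Hypothetical stand-in for EvaluationSnapshot (illustration only)."""
    snapshot_id: str
    created_at: datetime


def _as_utc(dt: datetime) -> datetime:
    # Treat naive datetimes as UTC, mirroring the handling in history.py.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def closest(history: List[Snap], date_str: str) -> Optional[Snap]:
    """Return the snapshot whose timestamp is nearest to date_str."""
    if not history:
        return None
    try:
        target = _as_utc(datetime.fromisoformat(date_str))
    except ValueError:
        return None
    # min() keeps the first snapshot on ties, like the strict "<" in the loop.
    return min(
        history,
        key=lambda s: abs((_as_utc(s.created_at) - target).total_seconds()),
    )


history = [
    Snap("a1", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    Snap("b2", datetime(2024, 1, 10, tzinfo=timezone.utc)),
    Snap("c3", datetime(2024, 2, 1, tzinfo=timezone.utc)),
]
nearest = closest(history, "2024-01-08")
```

The lookup is a linear scan, which is adequate for a history file that grows by one entry per check run. A bisect over sorted timestamps would only be worth considering for much longer histories.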
53
markitect/infospace/models.py
Normal file
@@ -0,0 +1,53 @@
|
||||
"""
|
||||
Data models for infospace entity metadata.
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass, field, asdict
|
||||
from typing import Any, Dict, List
|
||||
|
||||
|
||||
@dataclass
|
||||
class EntityMeta:
|
||||
"""Structured metadata extracted from a single entity markdown file.
|
||||
|
||||
The parser populates every field it can find; missing optional
|
||||
sections are left as empty strings (validation is a separate step).
|
||||
"""
|
||||
|
||||
# Identity
|
||||
slug: str
|
||||
title: str
|
||||
h1_raw: str # verbatim H1 text before any normalisation
|
||||
|
||||
# Section contents (plain text, empty string if section missing)
|
||||
definition: str = ""
|
||||
source_chapter: str = ""
|
||||
context: str = ""
|
||||
domain: str = ""
|
||||
original_wording: str = ""
|
||||
modern_interpretation: str = ""
|
||||
|
||||
# Derived flags
|
||||
h1_is_title_case: bool = False
|
||||
has_original_wording: bool = False
|
||||
|
||||
# Metrics-ready numbers
|
||||
definition_word_count: int = 0
|
||||
total_word_count: int = 0
|
||||
|
||||
# All H2 section slugs found (preserves order)
|
||||
section_slugs: List[str] = field(default_factory=list)
|
||||
|
||||
# Source file path (as string for serialisation)
|
||||
source_path: str = ""
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
"""Serialise to a plain dictionary."""
|
||||
return asdict(self)
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "EntityMeta":
|
||||
"""Deserialise from a plain dictionary."""
|
||||
known_fields = {f.name for f in cls.__dataclass_fields__.values()}
|
||||
filtered = {k: v for k, v in data.items() if k in known_fields}
|
||||
return cls(**filtered)
|
||||
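The tolerant deserialisation in `EntityMeta.from_dict` can be illustrated with a smaller dataclass. This is a sketch under assumptions: `MiniMeta` is a hypothetical, trimmed-down stand-in for `EntityMeta`, and it uses `dataclasses.fields()` where the original reads `__dataclass_fields__`. The two yield the same field names for a plain dataclass like this one.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class MiniMeta:
    """Hypothetical, trimmed-down stand-in for EntityMeta."""
    slug: str
    title: str
    definition: str = ""
    section_slugs: List[str] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniMeta":
        # Keep only keys that name a dataclass field; ignore the rest.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


raw = {"slug": "division-of-labour", "title": "Division of Labour", "legacy_key": 1}
meta = MiniMeta.from_dict(raw)
round_tripped = MiniMeta.from_dict(meta.to_dict())
```

Dropping unknown keys lets older or newer serialised files load without errors. The cost is that a misspelled key is silently ignored instead of being reported.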
144
markitect/infospace/schema.py
Normal file
@@ -0,0 +1,144 @@
|
||||
"""
|
||||
Declarative schema definitions for entity compliance validation.
|
||||
|
||||
A schema describes the expected structure of an entity: which sections
|
||||
are required, word count bounds, heading format, and valid enum values.
|
||||
Schemas are frozen (immutable once created).
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass
|
||||
from enum import Enum
|
||||
from typing import Optional, Tuple
|
||||
|
||||
|
||||
class SectionRequirement(Enum):
|
||||
"""How strictly a section must be present."""
|
||||
|
||||
REQUIRED = "required"
|
||||
RECOMMENDED = "recommended"
|
||||
OPTIONAL = "optional"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SectionRule:
|
||||
"""Validation rule for a single H2 section.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
slug:
|
||||
Section slug as it appears in entity metadata (e.g. ``definition``).
|
||||
label:
|
||||
Human-readable section name for diagnostics.
|
||||
requirement:
|
||||
Whether the section is required, recommended, or optional.
|
||||
min_words:
|
||||
Minimum word count (inclusive). ``None`` means no lower bound.
|
||||
max_words:
|
||||
Maximum word count (inclusive). ``None`` means no upper bound.
|
||||
"""
|
||||
|
||||
slug: str
|
||||
label: str
|
||||
requirement: SectionRequirement
|
||||
min_words: Optional[int] = None
|
||||
max_words: Optional[int] = None
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class EnumConstraint:
|
||||
"""Constraint limiting a field to a set of allowed values.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
field_name:
|
||||
The ``EntityMeta`` field to check (e.g. ``domain``).
|
||||
allowed_values:
|
||||
Tuple of acceptable string values.
|
||||
severity:
|
||||
``"error"`` or ``"warning"`` when the value is not in the set.
|
||||
"""
|
||||
|
||||
field_name: str
|
||||
allowed_values: Tuple[str, ...]
|
||||
severity: str = "warning"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class EntitySchema:
|
||||
"""Complete validation schema for an entity type.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
name:
|
||||
Human-readable schema name (e.g. ``"Economic Entity"``).
|
||||
section_rules:
|
||||
Tuple of :class:`SectionRule` objects.
|
||||
enum_constraints:
|
||||
Tuple of :class:`EnumConstraint` objects.
|
||||
h1_title_case_severity:
|
||||
Severity for non-title-case H1 headings (``"error"`` or ``"warning"``).
|
||||
require_h1:
|
||||
Whether a non-empty slug (H1) is required.
|
||||
"""
|
||||
|
||||
name: str
|
||||
section_rules: Tuple[SectionRule, ...]
|
||||
enum_constraints: Tuple[EnumConstraint, ...] = ()
|
||||
h1_title_case_severity: str = "warning"
|
||||
require_h1: bool = True
|
||||
|
||||
|
||||
# ── Default schema for the economic-entity infospace ──────────────
|
||||
|
||||
ECONOMIC_ENTITY_SCHEMA = EntitySchema(
|
||||
name="Economic Entity",
|
||||
section_rules=(
|
||||
SectionRule(
|
||||
slug="definition",
|
||||
label="Definition",
|
||||
requirement=SectionRequirement.REQUIRED,
|
||||
min_words=20,
|
||||
max_words=150,
|
||||
),
|
||||
SectionRule(
|
||||
slug="source_chapter",
|
||||
label="Source Chapter",
|
||||
requirement=SectionRequirement.REQUIRED,
|
||||
),
|
||||
SectionRule(
|
||||
slug="context",
|
||||
label="Context",
|
||||
requirement=SectionRequirement.REQUIRED,
|
||||
),
|
||||
SectionRule(
|
||||
slug="economic_domain",
|
||||
label="Economic Domain",
|
||||
requirement=SectionRequirement.REQUIRED,
|
||||
),
|
||||
SectionRule(
|
||||
slug="smith_s_original_wording",
|
||||
label="Smith's Original Wording",
|
||||
requirement=SectionRequirement.OPTIONAL,
|
||||
),
|
||||
SectionRule(
|
||||
slug="modern_interpretation",
|
||||
label="Modern Interpretation",
|
||||
requirement=SectionRequirement.OPTIONAL,
|
||||
),
|
||||
),
|
||||
enum_constraints=(
|
||||
EnumConstraint(
|
||||
field_name="domain",
|
||||
allowed_values=(
|
||||
"Production",
|
||||
"Exchange",
|
||||
"Distribution",
|
||||
"Regulation",
|
||||
"General Theory",
|
||||
),
|
||||
severity="warning",
|
||||
),
|
||||
),
|
||||
h1_title_case_severity="warning",
|
||||
require_h1=True,
|
||||
)
|
||||
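A schema for another entity type would presumably be declared the same way as `ECONOMIC_ENTITY_SCHEMA`. The sketch below shows the frozen-dataclass pattern on its own. `Rule` and `Schema` are hypothetical stand-ins for `SectionRule` and `EntitySchema`, and the "Glossary Entry" schema is invented for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for SectionRule."""
    slug: str
    required: bool
    min_words: Optional[int] = None
    max_words: Optional[int] = None


@dataclass(frozen=True)
class Schema:
    """Hypothetical stand-in for EntitySchema."""
    name: str
    rules: Tuple[Rule, ...]


schema = Schema(
    name="Glossary Entry",
    rules=(
        Rule(slug="definition", required=True, min_words=20, max_words=150),
        Rule(slug="context", required=False),
    ),
)

try:
    schema.name = "Renamed"  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

required_slugs = [r.slug for r in schema.rules if r.required]
```

Using tuples for the rule collections keeps the whole schema hashable and prevents in-place edits, which fits the module's statement that schemas are immutable once created.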
141
markitect/infospace/state.py
Normal file
@@ -0,0 +1,141 @@
|
||||
"""
|
||||
Infospace runtime state.
|
||||
|
||||
Computed from the current entities, evaluations, and metrics on disk.
|
||||
Provides the data behind ``markitect infospace status`` and
|
||||
``markitect infospace viability``.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from markitect.infospace.config import InfospaceConfig, ViabilityThreshold
|
||||
from markitect.infospace.models import EntityMeta
|
||||
from markitect.infospace.evaluation import EvaluationSnapshot
|
||||
|
||||
|
||||
@dataclass
|
||||
class ViabilityResult:
|
||||
"""Result of checking a single viability threshold."""
|
||||
|
||||
metric: str
|
||||
value: float
|
||||
threshold: ViabilityThreshold
|
||||
passed: bool
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {
|
||||
"metric": self.metric,
|
||||
"value": self.value,
|
||||
"passed": self.passed,
|
||||
}
|
||||
if self.threshold.min is not None:
|
||||
d["min"] = self.threshold.min
|
||||
if self.threshold.max is not None:
|
||||
d["max"] = self.threshold.max
|
||||
return d
|
||||
|
||||
|
||||
@dataclass
|
||||
class InfospaceState:
|
||||
"""Current runtime state of an infospace.
|
||||
|
||||
Aggregates entity metadata, evaluation results, and viability
|
||||
checks into a single queryable object.
|
||||
"""
|
||||
|
||||
config: InfospaceConfig
|
||||
entities: List[EntityMeta] = field(default_factory=list)
|
||||
latest_snapshot: Optional[EvaluationSnapshot] = None
|
||||
viability_results: List[ViabilityResult] = field(default_factory=list)
|
||||
computed_at: datetime = field(default_factory=datetime.utcnow)
|
||||
|
||||
@property
|
||||
def entity_count(self) -> int:
|
||||
return len(self.entities)
|
||||
|
||||
@property
|
||||
def topic_name(self) -> str:
|
||||
return self.config.topic.name
|
||||
|
||||
@property
|
||||
def is_viable(self) -> bool:
|
||||
"""``True`` if all viability thresholds are met."""
|
||||
if not self.viability_results:
|
||||
return False
|
||||
return all(r.passed for r in self.viability_results)
|
||||
|
||||
@property
|
||||
def viability_pass_count(self) -> int:
|
||||
return sum(1 for r in self.viability_results if r.passed)
|
||||
|
||||
@property
|
||||
def viability_total_count(self) -> int:
|
||||
return len(self.viability_results)
|
||||
|
||||
@property
|
||||
def domains(self) -> List[str]:
|
||||
"""Distinct domain values across all entities."""
|
||||
return sorted({e.domain for e in self.entities if e.domain})
|
||||
|
||||
@property
|
||||
def has_evaluations(self) -> bool:
|
||||
return self.latest_snapshot is not None
|
||||
|
||||
def check_viability(self, metrics: Dict[str, float]) -> List[ViabilityResult]:
|
||||
"""Check *metrics* against the configured viability thresholds.
|
||||
|
||||
Updates :attr:`viability_results` and returns the results. A metric absent from *metrics* is checked as ``0.0``.
|
||||
"""
|
||||
results: List[ViabilityResult] = []
|
||||
for name, threshold in self.config.viability.items():
|
||||
value = metrics.get(name, 0.0)
|
||||
results.append(ViabilityResult(
|
||||
metric=name,
|
||||
value=value,
|
||||
threshold=threshold,
|
||||
passed=threshold.check(value),
|
||||
))
|
||||
self.viability_results = results
|
||||
return results
|
||||
|
||||
def summary(self) -> Dict[str, Any]:
|
||||
"""Return a summary dict suitable for display or serialisation."""
|
||||
d: Dict[str, Any] = {
|
||||
"topic": self.topic_name,
|
||||
"entity_count": self.entity_count,
|
||||
"domains": self.domains,
|
||||
"has_evaluations": self.has_evaluations,
|
||||
}
|
||||
if self.viability_results:
|
||||
d["viable"] = self.is_viable
|
||||
d["viability_pass"] = self.viability_pass_count
|
||||
d["viability_total"] = self.viability_total_count
|
||||
if self.latest_snapshot:
|
||||
d["last_evaluated"] = self.latest_snapshot.created_at.isoformat()
|
||||
return d
|
||||
|
||||
|
||||
def build_state(
|
||||
config: InfospaceConfig,
|
||||
entities: Optional[List[EntityMeta]] = None,
|
||||
snapshot: Optional[EvaluationSnapshot] = None,
|
||||
metrics: Optional[Dict[str, float]] = None,
|
||||
) -> InfospaceState:
|
||||
"""Build an :class:`InfospaceState` from available data.
|
||||
|
||||
This is a convenience function that assembles the state object
|
||||
and optionally runs viability checks if *metrics* are provided.
|
||||
"""
|
||||
state = InfospaceState(
|
||||
config=config,
|
||||
entities=entities or [],
|
||||
latest_snapshot=snapshot,
|
||||
)
|
||||
if metrics is not None:
|
||||
state.check_viability(metrics)
|
||||
return state
|
||||
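The viability check reduces to comparing each configured metric against its bounds. The sketch below makes that concrete under a clearly labelled assumption: `ViabilityThreshold.check` is not shown in this diff, so `Threshold.check` here assumes inclusive `min` and `max` bounds. `Threshold`, `check_all`, and the numeric limits are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical stand-in for ViabilityThreshold (bounds assumed inclusive)."""
    min: Optional[float] = None
    max: Optional[float] = None

    def check(self, value: float) -> bool:
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


def check_all(
    thresholds: Dict[str, Threshold], metrics: Dict[str, float]
) -> Dict[str, bool]:
    # As in check_viability, a metric missing from `metrics` is checked as 0.0.
    return {name: t.check(metrics.get(name, 0.0)) for name, t in thresholds.items()}


thresholds = {
    "coverage_ratio": Threshold(min=0.8),
    "redundancy_ratio": Threshold(max=0.1),
}
results = check_all(thresholds, {"coverage_ratio": 0.85})
viable = bool(results) and all(results.values())
```

In this run `redundancy_ratio` was never measured, yet it passes its max-only threshold because the fallback value is `0.0`. If that is undesirable, treating a missing metric as a failure would be the stricter alternative.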
261
markitect/infospace/validator.py
Normal file
@@ -0,0 +1,261 @@
|
||||
"""
|
||||
Schema compliance validator for entity metadata.
|
||||
|
||||
Validates :class:`~markitect.infospace.models.EntityMeta` instances
|
||||
against a declarative :class:`~markitect.infospace.schema.EntitySchema`.
|
||||
All checks are deterministic — no LLM calls.
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Dict, List, Optional, Sequence
|
||||
|
||||
from .models import EntityMeta
|
||||
from .schema import EntitySchema, SectionRequirement
|
||||
|
||||
# Maps section slugs (as they appear in the schema) to EntityMeta field
|
||||
# names. ``economic_domain`` maps to ``domain`` and ``smith_s_original_wording`` to ``original_wording``; the rest match directly.
|
||||
_SECTION_FIELD_MAP: Dict[str, str] = {
|
||||
"definition": "definition",
|
||||
"source_chapter": "source_chapter",
|
||||
"context": "context",
|
||||
"economic_domain": "domain",
|
||||
"smith_s_original_wording": "original_wording",
|
||||
"modern_interpretation": "modern_interpretation",
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class ComplianceDiagnostic:
|
||||
"""A single validation finding."""
|
||||
|
||||
code: str
|
||||
message: str
|
||||
severity: str # "error" or "warning"
|
||||
section: Optional[str] = None
|
||||
field: Optional[str] = None
|
||||
|
||||
def __str__(self) -> str:
|
||||
parts = [f"[{self.severity.upper()}] {self.code}: {self.message}"]
|
||||
if self.section:
|
||||
parts.append(f"(section: {self.section})")
|
||||
if self.field:
|
||||
parts.append(f"(field: {self.field})")
|
||||
return " ".join(parts)
|
||||
|
||||
|
||||
@dataclass
|
||||
class ComplianceResult:
|
||||
"""Validation result for a single entity."""
|
||||
|
||||
entity_slug: str
|
||||
schema_name: str
|
||||
diagnostics: List[ComplianceDiagnostic] = field(default_factory=list)
|
||||
checks_run: int = 0
|
||||
|
||||
@property
|
||||
def is_compliant(self) -> bool:
|
||||
return self.error_count == 0
|
||||
|
||||
@property
|
||||
def error_count(self) -> int:
|
||||
return sum(1 for d in self.diagnostics if d.severity == "error")
|
||||
|
||||
@property
|
||||
def warning_count(self) -> int:
|
||||
return sum(1 for d in self.diagnostics if d.severity == "warning")
|
||||
|
||||
@property
|
||||
def errors(self) -> List[ComplianceDiagnostic]:
|
||||
return [d for d in self.diagnostics if d.severity == "error"]
|
||||
|
||||
@property
|
||||
def warnings(self) -> List[ComplianceDiagnostic]:
|
||||
return [d for d in self.diagnostics if d.severity == "warning"]
|
||||
|
||||
def summary(self) -> str:
|
||||
status = "PASS" if self.is_compliant else "FAIL"
|
||||
return (
|
||||
f"{self.entity_slug}: {status} "
|
||||
f"({self.checks_run} checks, "
|
||||
f"{self.error_count} errors, "
|
||||
f"{self.warning_count} warnings)"
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class BatchComplianceResult:
|
||||
"""Aggregated validation result for multiple entities."""
|
||||
|
||||
results: List[ComplianceResult] = field(default_factory=list)
|
||||
schema_name: str = ""
|
||||
|
||||
@property
|
||||
def total_entities(self) -> int:
|
||||
return len(self.results)
|
||||
|
||||
@property
|
||||
def compliant_count(self) -> int:
|
||||
return sum(1 for r in self.results if r.is_compliant)
|
||||
|
||||
@property
|
||||
def non_compliant_count(self) -> int:
|
||||
return self.total_entities - self.compliant_count
|
||||
|
||||
@property
|
||||
def total_errors(self) -> int:
|
||||
return sum(r.error_count for r in self.results)
|
||||
|
||||
@property
|
||||
def total_warnings(self) -> int:
|
||||
return sum(r.warning_count for r in self.results)
|
||||
|
||||
def summary(self) -> str:
|
||||
lines = [
|
||||
f"Schema: {self.schema_name}",
|
||||
f"Entities: {self.total_entities}",
|
||||
f"Compliant: {self.compliant_count}/{self.total_entities}",
|
||||
f"Errors: {self.total_errors}, Warnings: {self.total_warnings}",
|
||||
]
|
||||
for r in self.results:
|
||||
lines.append(f" {r.summary()}")
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def _word_count(text: str) -> int:
|
||||
"""Count whitespace-separated words."""
|
||||
return len(text.split())
|
||||
|
||||
|
||||
def validate_entity(
|
||||
entity: EntityMeta,
|
||||
schema: EntitySchema,
|
||||
) -> ComplianceResult:
|
||||
"""Validate a single entity against *schema*.
|
||||
|
||||
Returns a :class:`ComplianceResult` with all diagnostics found.
|
||||
"""
|
||||
result = ComplianceResult(
|
||||
entity_slug=entity.slug,
|
||||
schema_name=schema.name,
|
||||
)
|
||||
checks = 0
|
||||
|
||||
# ── H1 checks ─────────────────────────────────────────────────
|
||||
if schema.require_h1:
|
||||
checks += 1
|
||||
if not entity.slug:
|
||||
result.diagnostics.append(
|
||||
ComplianceDiagnostic(
|
||||
code="H1_MISSING",
|
||||
message="Entity has no H1 heading (empty slug).",
|
||||
severity="error",
|
||||
)
|
||||
)
|
||||
|
||||
checks += 1
|
||||
if entity.slug and not entity.h1_is_title_case:
|
||||
result.diagnostics.append(
|
||||
ComplianceDiagnostic(
|
||||
code="H1_NOT_TITLE_CASE",
|
||||
message=f"H1 '{entity.h1_raw}' is not in title case.",
|
||||
severity=schema.h1_title_case_severity,
|
||||
)
|
||||
)
|
||||
|
||||
# ── Section checks ────────────────────────────────────────────
|
||||
for rule in schema.section_rules:
|
||||
checks += 1
|
||||
field_name = _SECTION_FIELD_MAP.get(rule.slug, rule.slug)
|
||||
value = getattr(entity, field_name, "")
|
||||
|
||||
is_empty = not value or not value.strip()
|
||||
|
||||
if is_empty:
|
||||
if rule.requirement == SectionRequirement.REQUIRED:
|
||||
result.diagnostics.append(
|
||||
ComplianceDiagnostic(
|
||||
code="SECTION_MISSING",
|
||||
message=f"Required section '{rule.label}' is missing or empty.",
|
||||
severity="error",
|
||||
section=rule.slug,
|
||||
)
|
||||
)
|
||||
elif rule.requirement == SectionRequirement.RECOMMENDED:
|
||||
result.diagnostics.append(
|
||||
ComplianceDiagnostic(
|
||||
code="SECTION_RECOMMENDED",
|
||||
message=f"Recommended section '{rule.label}' is missing.",
|
||||
severity="warning",
|
||||
section=rule.slug,
|
||||
)
|
||||
)
|
||||
# OPTIONAL + empty → no diagnostic
|
||||
continue
|
||||
|
||||
# Word count bounds (only if section has content)
|
||||
wc = _word_count(value)
|
||||
if rule.min_words is not None and wc < rule.min_words:
|
||||
checks += 1
|
||||
result.diagnostics.append(
|
||||
ComplianceDiagnostic(
|
||||
code="SECTION_TOO_SHORT",
|
||||
message=(
|
||||
f"Section '{rule.label}' has {wc} words "
|
||||
f"(minimum: {rule.min_words})."
|
||||
),
|
||||
severity="error",
|
||||
section=rule.slug,
|
||||
)
|
||||
)
|
||||
elif rule.max_words is not None and wc > rule.max_words:
|
||||
checks += 1
|
||||
result.diagnostics.append(
|
||||
ComplianceDiagnostic(
|
||||
code="SECTION_TOO_LONG",
|
||||
message=(
|
||||
f"Section '{rule.label}' has {wc} words "
|
||||
f"(maximum: {rule.max_words})."
|
||||
),
|
||||
severity="warning",
|
||||
section=rule.slug,
|
||||
)
|
||||
)
|
||||
|
||||
# ── Enum constraints ──────────────────────────────────────────
|
||||
for constraint in schema.enum_constraints:
|
||||
checks += 1
|
||||
value = getattr(entity, constraint.field_name, "")
|
||||
|
||||
# Skip empty values; a required section already reports SECTION_MISSING above
|
||||
if not value or not value.strip():
|
||||
continue
|
||||
|
||||
if value.strip() not in constraint.allowed_values:
|
||||
result.diagnostics.append(
|
||||
ComplianceDiagnostic(
|
||||
code="ENUM_VALUE_UNKNOWN",
|
||||
message=(
|
||||
f"Field '{constraint.field_name}' has value "
|
||||
f"'{value.strip()}' which is not in the allowed set."
|
||||
),
|
||||
severity=constraint.severity,
|
||||
field=constraint.field_name,
|
||||
)
|
||||
)
|
||||
|
||||
result.checks_run = checks
|
||||
return result
|
||||
|
||||
|
||||
def validate_entities(
|
||||
entities: Sequence[EntityMeta],
|
||||
schema: EntitySchema,
|
||||
) -> BatchComplianceResult:
|
||||
"""Validate multiple entities against *schema*.
|
||||
|
||||
Returns a :class:`BatchComplianceResult` with per-entity results.
|
||||
"""
|
||||
batch = BatchComplianceResult(schema_name=schema.name)
|
||||
for entity in entities:
|
||||
batch.results.append(validate_entity(entity, schema))
|
||||
return batch
|
||||
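The per-section logic in `validate_entity` follows a fixed order: presence first, then word-count bounds. The following condensed sketch is a hypothetical helper, not the validator itself. `check_section` collapses the three requirement levels into a single `required` flag and returns `(code, severity)` pairs instead of `ComplianceDiagnostic` objects.

```python
from typing import List, Optional, Tuple


def check_section(
    text: str,
    required: bool,
    min_words: Optional[int] = None,
    max_words: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Return (code, severity) pairs for one section; hypothetical helper."""
    if not text.strip():
        # Empty sections are only a problem when the rule requires them.
        return [("SECTION_MISSING", "error")] if required else []
    wc = len(text.split())  # whitespace-separated words, as in _word_count
    if min_words is not None and wc < min_words:
        return [("SECTION_TOO_SHORT", "error")]
    if max_words is not None and wc > max_words:
        return [("SECTION_TOO_LONG", "warning")]
    return []


missing = check_section("   ", required=True)
short = check_section("Too brief.", required=True, min_words=20)
fine = check_section("", required=False)
```

As in the validator, an empty section never triggers a word-count diagnostic, and a section that is too short is an error while one that is too long is only a warning.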
@@ -26,6 +26,15 @@ from markitect.llm.exceptions import (
|
||||
LLMTimeoutError,
|
||||
LLMSubprocessError,
|
||||
)
|
||||
from markitect.llm.embedding_adapter import EmbeddingAdapter
|
||||
from markitect.llm.embedding_openai import OpenAICompatibleEmbeddingAdapter
|
||||
from markitect.llm.embedding_cache import EmbeddingCache
|
||||
from markitect.llm.embedding_factory import create_embedding_adapter
|
||||
from markitect.llm.similarity import (
|
||||
cosine_similarity,
|
||||
similarity_matrix,
|
||||
find_similar_pairs,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"create_adapter",
|
||||
@@ -41,4 +50,11 @@ __all__ = [
|
||||
"LLMRateLimitError",
|
||||
"LLMTimeoutError",
|
||||
"LLMSubprocessError",
|
||||
"EmbeddingAdapter",
|
||||
"OpenAICompatibleEmbeddingAdapter",
|
||||
"EmbeddingCache",
|
||||
"create_embedding_adapter",
|
||||
"cosine_similarity",
|
||||
"similarity_matrix",
|
||||
"find_similar_pairs",
|
||||
]
|
||||
|
||||
34
markitect/llm/embedding_adapter.py
Normal file
@@ -0,0 +1,34 @@
|
||||
"""
|
||||
Abstract base class for embedding adapters.
|
||||
|
||||
Embedding adapters convert text into float vectors. This is a separate
|
||||
hierarchy from :class:`LLMAdapter` (text generation) because the API
|
||||
contract is fundamentally different: text in, float vectors out.
|
||||
"""
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
|
||||
class EmbeddingAdapter(ABC):
|
||||
"""Base class for all embedding adapters."""
|
||||
|
||||
@abstractmethod
|
||||
def embed(self, texts: list[str]) -> list[list[float]]:
|
||||
"""Embed a batch of texts into vectors.
|
||||
|
||||
Args:
|
||||
texts: One or more strings to embed.
|
||||
|
||||
Returns:
|
||||
A list of embedding vectors, one per input text,
|
||||
in the same order as *texts*.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def validate(self) -> bool:
|
||||
"""Check that the adapter is configured correctly.
|
||||
|
||||
Returns:
|
||||
``True`` if the adapter has a valid configuration
|
||||
(e.g. API key present), ``False`` otherwise.
|
||||
"""
|
||||
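One practical use of the abstract interface is an offline adapter for tests. The sketch below is an assumption-laden illustration: the base class is re-declared locally so the example is self-contained, and `HashEmbeddingAdapter` is a hypothetical class whose vectors are deterministic but carry no semantic meaning.

```python
import hashlib
from abc import ABC, abstractmethod


class EmbeddingAdapter(ABC):
    """Local copy of the interface so the sketch is self-contained."""

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]: ...

    @abstractmethod
    def validate(self) -> bool: ...


class HashEmbeddingAdapter(EmbeddingAdapter):
    """Hypothetical offline adapter: deterministic vectors from SHA-256 bytes."""

    def __init__(self, dim: int = 8):
        self._dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dim` bytes into [0, 1]; not semantically meaningful.
            vectors.append([b / 255 for b in digest[: self._dim]])
        return vectors

    def validate(self) -> bool:
        # SHA-256 yields 32 bytes, so larger dimensions cannot be filled.
        return 0 < self._dim <= 32


adapter = HashEmbeddingAdapter(dim=4)
vectors = adapter.embed(["division of labour", "invisible hand"])
```

An adapter like this keeps similarity and caching code testable without network access or an API key, at the price of meaningless distances between vectors.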
64
markitect/llm/embedding_cache.py
Normal file
@@ -0,0 +1,64 @@
|
||||
"""
|
||||
File-based embedding cache.
|
||||
|
||||
Stores embedding vectors in a single JSON file keyed by entity slug.
|
||||
Each entry includes a content digest so stale embeddings are
|
||||
automatically invalidated when entity content changes.
|
||||
"""
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
|
||||
class EmbeddingCache:
|
||||
"""Persistent cache for embedding vectors.
|
||||
|
||||
Structure on disk (``embeddings.json``)::
|
||||
|
||||
{
|
||||
"division-of-labour": {"digest": "abc123", "vector": [0.1, ...]},
|
||||
...
|
||||
}
|
||||
"""
|
||||
|
||||
def __init__(self, cache_dir: Path):
|
||||
self._path = cache_dir / "embeddings.json"
|
||||
self._data: dict[str, dict] = {}
|
||||
self._hits = 0
|
||||
self._misses = 0
|
||||
self._load()
|
||||
|
||||
def get(self, slug: str, content_digest: str) -> Optional[list[float]]:
|
||||
"""Return the cached vector if *content_digest* matches, else ``None``."""
|
||||
entry = self._data.get(slug)
|
||||
if entry is not None and entry.get("digest") == content_digest:
|
||||
self._hits += 1
|
||||
return entry["vector"]
|
||||
self._misses += 1
|
||||
return None
|
||||
|
||||
def put(self, slug: str, content_digest: str, vector: list[float]) -> None:
|
||||
"""Store or overwrite the embedding for *slug*."""
|
||||
self._data[slug] = {"digest": content_digest, "vector": vector}
|
||||
|
||||
def save(self) -> None:
|
||||
"""Write cache to disk."""
|
||||
self._path.parent.mkdir(parents=True, exist_ok=True)
|
||||
self._path.write_text(json.dumps(self._data, separators=(",", ":")))
|
||||
|
||||
def stats(self) -> dict:
|
||||
"""Return cache statistics."""
|
||||
return {
|
||||
"entries": len(self._data),
|
||||
"hits": self._hits,
|
||||
"misses": self._misses,
|
||||
}
|
||||
|
||||
def _load(self) -> None:
|
||||
"""Read cache from disk if it exists."""
|
||||
if self._path.is_file():
|
||||
try:
|
||||
self._data = json.loads(self._path.read_text())
|
||||
except (json.JSONDecodeError, OSError):
|
||||
self._data = {}
|
||||
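The digest check is what invalidates stale vectors. The sketch below isolates that behaviour in memory. `MemoryCache` and `content_digest` are hypothetical names, and the choice of SHA-256 is an assumption, since the diff does not show how callers compute the digest.

```python
import hashlib
from typing import Dict, List, Optional


def content_digest(text: str) -> str:
    # Assumption: any stable hash works as a digest; SHA-256 is used here.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class MemoryCache:
    """Hypothetical in-memory analogue of EmbeddingCache (no file I/O)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, slug: str, digest: str) -> Optional[List[float]]:
        entry = self._data.get(slug)
        if entry is not None and entry["digest"] == digest:
            return entry["vector"]
        return None  # unknown slug, or content changed since caching

    def put(self, slug: str, digest: str, vector: List[float]) -> None:
        self._data[slug] = {"digest": digest, "vector": vector}


cache = MemoryCache()
old_text = "The division of labour increases productivity."
cache.put("division-of-labour", content_digest(old_text), [0.1, 0.2])

hit = cache.get("division-of-labour", content_digest(old_text))
stale = cache.get("division-of-labour", content_digest(old_text + " Edited."))
```

Keying by slug and validating by digest means an edited entity overwrites its old entry on the next `put`, so the cache does not accumulate vectors for outdated content.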
50
markitect/llm/embedding_factory.py
Normal file
@@ -0,0 +1,50 @@
|
||||
"""
|
||||
Factory for creating embedding adapters by provider name.
|
||||
"""
|
||||
|
||||
from typing import Optional, Any
|
||||
|
||||
from markitect.llm.embedding_adapter import EmbeddingAdapter
|
||||
from markitect.llm.exceptions import LLMConfigurationError
|
||||
|
||||
_EMBEDDING_PROVIDERS = {
|
||||
"openai": "markitect.llm.embedding_openai.OpenAICompatibleEmbeddingAdapter",
|
||||
"openrouter": "markitect.llm.embedding_openai.OpenAICompatibleEmbeddingAdapter",
|
||||
}
|
||||
|
||||
|
||||
def create_embedding_adapter(
|
||||
provider: str = "openai",
|
||||
model: Optional[str] = None,
|
||||
api_key: Optional[str] = None,
|
||||
**kwargs: Any,
|
||||
) -> EmbeddingAdapter:
|
||||
"""Instantiate an :class:`EmbeddingAdapter` for the given *provider*.
|
||||
|
||||
Args:
|
||||
provider: ``"openai"`` or ``"openrouter"``.
|
||||
model: Embedding model name (e.g. ``"text-embedding-3-small"``).
|
||||
api_key: Explicit API key.
|
||||
**kwargs: Extra keyword arguments forwarded to the adapter.
|
||||
|
||||
Returns:
|
||||
A ready-to-use :class:`EmbeddingAdapter` instance.
|
||||
|
||||
Raises:
|
||||
LLMConfigurationError: If *provider* is not recognised.
|
||||
"""
|
||||
if provider not in _EMBEDDING_PROVIDERS:
|
||||
known = ", ".join(sorted(_EMBEDDING_PROVIDERS))
|
||||
raise LLMConfigurationError(
|
||||
f"Unknown embedding provider {provider!r}. Choose from: {known}",
|
||||
context={"provider": provider},
|
||||
)
|
||||
|
||||
# Lazy import
|
||||
fqn = _EMBEDDING_PROVIDERS[provider]
|
||||
module_path, class_name = fqn.rsplit(".", 1)
|
||||
import importlib
|
||||
mod = importlib.import_module(module_path)
|
||||
cls = getattr(mod, class_name)
|
||||
|
||||
return cls(model=model, api_key=api_key, provider=provider, **kwargs)
|
||||
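The factory resolves a dotted path to a class only when a provider is requested. The sketch below shows the same lazy-import pattern with standard-library classes standing in for adapters. `_REGISTRY` and `create` are hypothetical names, and `ValueError` stands in for `LLMConfigurationError`.

```python
import importlib
from typing import Any

# Hypothetical registry: stdlib classes stand in for adapter classes.
_REGISTRY = {
    "ordered": "collections.OrderedDict",
    "counter": "collections.Counter",
}


def create(name: str, *args: Any, **kwargs: Any) -> Any:
    """Resolve a dotted path from the registry and instantiate the class."""
    if name not in _REGISTRY:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown provider {name!r}. Choose from: {known}")
    module_path, class_name = _REGISTRY[name].rsplit(".", 1)
    module = importlib.import_module(module_path)  # imported only when requested
    return getattr(module, class_name)(*args, **kwargs)


counts = create("counter", "aab")
```

Storing dotted paths instead of class objects means a provider's module, and any heavy dependencies it pulls in, is not imported until that provider is actually used.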
125
markitect/llm/embedding_openai.py
Normal file
@@ -0,0 +1,125 @@
|
||||
"""
|
||||
OpenAI-compatible embedding adapter.

Works with both OpenAI (``/v1/embeddings``) and OpenRouter
(``/api/v1/embeddings``) since they share the same API format.
The *provider* parameter determines the default base URL and
API key environment variable.
"""

import time
from typing import Optional, Dict, Any

from markitect.llm.embedding_adapter import EmbeddingAdapter
from markitect.llm.config import resolve_api_key, find_project_root
from markitect.llm._http import post_json
from markitect.llm.exceptions import (
    LLMConfigurationError,
    LLMAPIError,
    LLMRateLimitError,
)

_DEFAULT_MODEL = "text-embedding-3-small"

_PROVIDER_DEFAULTS: Dict[str, Dict[str, str]] = {
    "openai": {
        "api_base": "https://api.openai.com/v1",
        "env_var": "OPENAI_API_KEY",
    },
    "openrouter": {
        "api_base": "https://openrouter.ai/api/v1",
        "env_var": "OPENROUTER_API_KEY",
    },
}


class OpenAICompatibleEmbeddingAdapter(EmbeddingAdapter):
    """Embedding adapter for OpenAI-compatible endpoints.

    A single class handles both OpenAI and OpenRouter because they
    expose the same ``/embeddings`` endpoint format.
    """

    def __init__(
        self,
        model: Optional[str] = None,
        api_key: Optional[str] = None,
        api_base: Optional[str] = None,
        provider: str = "openai",
        max_retries: int = 3,
    ):
        if provider not in _PROVIDER_DEFAULTS:
            known = ", ".join(sorted(_PROVIDER_DEFAULTS))
            raise LLMConfigurationError(
                f"Unknown embedding provider {provider!r}. Choose from: {known}",
                context={"provider": provider},
            )

        defaults = _PROVIDER_DEFAULTS[provider]
        self._model = model or _DEFAULT_MODEL
        self._api_base = (api_base or defaults["api_base"]).rstrip("/")
        self._max_retries = max_retries
        self._provider = provider

        # Resolve API key
        env_var = defaults["env_var"]
        root = find_project_root()
        key_file_paths = [root / f"apikey-{provider}.txt"] if root else []
        self._api_key = resolve_api_key(
            explicit=api_key,
            env_var=env_var,
            key_file_paths=key_file_paths,
        )

    def embed(self, texts: list[str]) -> list[list[float]]:
        """Embed texts via the OpenAI-compatible ``/embeddings`` endpoint.

        Raises:
            LLMConfigurationError: If no API key is configured.
            LLMAPIError: On HTTP errors after retries are exhausted.
        """
        if not self._api_key:
            raise LLMConfigurationError(
                "No API key configured for embedding adapter",
                context={"provider": self._provider},
            )

        url = f"{self._api_base}/embeddings"
        payload: Dict[str, Any] = {
            "model": self._model,
            "input": texts,
        }
        headers = {"Authorization": f"Bearer {self._api_key}"}

        data = self._post_with_retries(url, payload, headers)

        # Response: {"data": [{"embedding": [...], "index": 0}, ...]}
        # Sort by index to guarantee input order.
        items = sorted(data["data"], key=lambda d: d["index"])
        return [item["embedding"] for item in items]

    def validate(self) -> bool:
        """Return ``True`` if a non-empty API key is available."""
        # Mirrors the truthiness check in embed(), so an empty-string key
        # is reported as invalid here instead of failing later.
        return bool(self._api_key)

    def _post_with_retries(
        self,
        url: str,
        payload: Dict[str, Any],
        headers: Dict[str, str],
    ) -> Dict[str, Any]:
        last_exc: Optional[Exception] = None
        for attempt in range(self._max_retries + 1):
            try:
                return post_json(url, payload, headers)
            except LLMRateLimitError as exc:
                # Rate limited: back off exponentially (1s, 2s, 4s, ...).
                last_exc = exc
                if attempt < self._max_retries:
                    time.sleep(2 ** attempt)
            except LLMAPIError as exc:
                # Retry server errors only; client errors are raised as-is.
                if exc.status_code >= 500 and attempt < self._max_retries:
                    last_exc = exc
                    time.sleep(2 ** attempt)
                else:
                    raise
        raise last_exc  # type: ignore[misc]
|
||||
64
markitect/llm/similarity.py
Normal file
@@ -0,0 +1,64 @@
|
||||
"""
|
||||
Pure-Python vector similarity utilities.

No external dependencies — uses :mod:`math` only. Sufficient for the
current entity scale (~100s). numpy can be substituted later if needed.
"""

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors.

    Returns a float in [-1, 1]. Returns 0.0 if either vector has
    zero magnitude (to avoid division by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    if mag_a == 0.0 or mag_b == 0.0:
        return 0.0
    return dot / (mag_a * mag_b)


def similarity_matrix(embeddings: list[list[float]]) -> list[list[float]]:
    """Build an NxN cosine similarity matrix.

    ``matrix[i][j]`` is the cosine similarity between
    ``embeddings[i]`` and ``embeddings[j]``.
    """
    n = len(embeddings)
    mat: list[list[float]] = [[0.0] * n for _ in range(n)]
    for i in range(n):
        mat[i][i] = 1.0
        for j in range(i + 1, n):
            sim = cosine_similarity(embeddings[i], embeddings[j])
            mat[i][j] = sim
            mat[j][i] = sim
    return mat


def find_similar_pairs(
    embeddings: dict[str, list[float]],
    threshold: float = 0.80,
) -> list[tuple[str, str, float]]:
    """Find all pairs with cosine similarity >= *threshold*.

    Args:
        embeddings: Mapping of slug → embedding vector.
        threshold: Minimum similarity to include (default 0.80).

    Returns:
        List of ``(slug_a, slug_b, similarity)`` tuples sorted by
        similarity descending.
    """
    slugs = sorted(embeddings)
    pairs: list[tuple[str, str, float]] = []
    for i, slug_a in enumerate(slugs):
        for slug_b in slugs[i + 1:]:
            sim = cosine_similarity(embeddings[slug_a], embeddings[slug_b])
            if sim >= threshold:
                pairs.append((slug_a, slug_b, sim))
    pairs.sort(key=lambda t: t[2], reverse=True)
    return pairs
|
||||
168
markitect/prompts/execution/batch.py
Normal file
@@ -0,0 +1,168 @@
|
||||
"""
|
||||
Batch LLM evaluation orchestrator.

Runs an evaluation prompt against a batch of items (entities, pairs,
etc.), collecting structured results. Handles:

- Incremental evaluation (skip items whose content hasn't changed)
- Progress reporting via callback
- Graceful error handling per item (one failure doesn't stop the batch)
- Aggregate token usage tracking

This is the mechanism by which infospace tooling delegates LLM work
to the platform. The adapter's own retry logic handles transient
API errors (rate limits, 5xx).
"""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

from markitect.prompts.execution.llm_adapter import LLMAdapter
from markitect.prompts.execution.models import LLMResponse, RunConfig


@dataclass
class BatchItem:
    """A single item to evaluate in a batch.

    Attributes:
        key: Unique identifier (e.g. entity slug).
        prompt: The compiled prompt text to send to the LLM.
        content_digest: Hash of the source content, used for
            incremental evaluation (skip if unchanged).
        metadata: Arbitrary pass-through metadata.
    """

    key: str
    prompt: str
    content_digest: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class BatchResult:
    """Result for a single batch item.

    Attributes:
        key: Matches the input :attr:`BatchItem.key`.
        status: One of ``"success"``, ``"error"``, ``"skipped"``.
        response: The LLM response (``None`` if skipped or error).
        error: Error message (``None`` if success or skipped).
        metadata: Pass-through metadata from the input item.
    """

    key: str
    status: str
    response: Optional[LLMResponse] = None
    error: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class BatchSummary:
    """Aggregate results from a batch evaluation run."""

    total: int = 0
    succeeded: int = 0
    failed: int = 0
    skipped: int = 0
    results: List[BatchResult] = field(default_factory=list)
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.total_prompt_tokens + self.total_completion_tokens

    def success_rate(self) -> float:
        """Fraction of non-skipped items that succeeded."""
        attempted = self.total - self.skipped
        if attempted == 0:
            return 1.0
        return self.succeeded / attempted


class BatchEvaluator:
    """Orchestrates LLM evaluation across a batch of items.

    Args:
        adapter: The LLM adapter to use for evaluation.
        config: Run configuration (model, temperature, etc.).
        progress_callback: Optional ``fn(completed, total, result)``
            called after each item is processed.
        previous_digests: Optional ``{key: digest}`` mapping from a
            previous run. Items whose digest matches are skipped.
    """

    def __init__(
        self,
        adapter: LLMAdapter,
        config: Optional[RunConfig] = None,
        progress_callback: Optional[Callable[[int, int, BatchResult], None]] = None,
        previous_digests: Optional[Dict[str, str]] = None,
    ):
        self._adapter = adapter
        self._config = config or RunConfig()
        self._progress_callback = progress_callback
        self._previous_digests = previous_digests or {}

    def evaluate(self, items: List[BatchItem]) -> BatchSummary:
        """Run evaluation for all items and return aggregate results.

        Items whose :attr:`~BatchItem.content_digest` matches an entry
        in *previous_digests* are skipped. All other items are sent to
        the LLM adapter. Errors on individual items are captured
        without aborting the batch.
        """
        summary = BatchSummary(total=len(items))

        for idx, item in enumerate(items):
            result = self._evaluate_one(item)
            summary.results.append(result)

            if result.status == "success":
                summary.succeeded += 1
                usage = result.response.usage if result.response else {}
                summary.total_prompt_tokens += usage.get("prompt_tokens", 0)
                summary.total_completion_tokens += usage.get("completion_tokens", 0)
            elif result.status == "skipped":
                summary.skipped += 1
            else:
                summary.failed += 1

            if self._progress_callback is not None:
                self._progress_callback(idx + 1, len(items), result)

        return summary

    def _evaluate_one(self, item: BatchItem) -> BatchResult:
        """Evaluate a single item, handling skip logic and errors."""
        # Incremental: skip if digest unchanged
        if (
            item.content_digest
            and item.key in self._previous_digests
            and self._previous_digests[item.key] == item.content_digest
        ):
            return BatchResult(
                key=item.key,
                status="skipped",
                metadata=item.metadata,
            )

        try:
            response = self._adapter.execute_prompt(item.prompt, self._config)
            return BatchResult(
                key=item.key,
                status="success",
                response=response,
                metadata=item.metadata,
            )
        except Exception as exc:
            return BatchResult(
                key=item.key,
                status="error",
                error=str(exc),
                metadata=item.metadata,
            )
|
||||
@@ -33,6 +33,7 @@ development = [
|
||||
"kaizen-agentic @ file:./capabilities/kaizen-agentic"
|
||||
]
|
||||
proxy-pdf = ["pymupdf4llm>=0.0.10"]
|
||||
analysis = ["networkx>=3.0"]
|
||||
proxy-html = ["markdownify>=0.13.1"]
|
||||
proxy-markitdown = ["markitdown-no-magika[pdf]"]
|
||||
proxy = ["markitdown-no-magika[pdf]"]
|
||||
|
||||
621
roadmap/infospace-tooling/PLAN.md
Normal file
@@ -0,0 +1,621 @@
|
||||
# Viable Infospace Tooling — Roadmap
|
||||
|
||||
## Vision
|
||||
|
||||
An **infospace** is a structured, evaluable, composable collection of
|
||||
concepts that explains a **topic** through the lens of one or more
|
||||
**disciplines**. Infospaces are the unit of knowledge work in MarkiTect.
|
||||
|
||||
This roadmap organises the work needed to move from the current
|
||||
ad-hoc example (`infospace-with-history`) to a general-purpose platform
|
||||
for creating, evaluating, maintaining, and composing infospaces.
|
||||
|
||||
---
|
||||
|
||||
## Terminology
|
||||
|
||||
These terms establish the vocabulary for infospace tooling. They
|
||||
generalise from the Wealth of Nations / VSM example but are not
|
||||
specific to it.
|
||||
|
||||
### Infospace
|
||||
|
||||
A curated, self-describing collection of **entities** (concepts,
|
||||
mechanisms, observations) that together explain a **topic**. An
|
||||
infospace has:
|
||||
|
||||
- A **topic** — the subject matter being explained (e.g. "The Wealth
|
||||
of Nations", "cellular biology", "Kubernetes networking")
|
||||
- One or more **disciplines** — external frameworks applied as lenses
|
||||
(e.g. "Viable System Model", "category theory")
|
||||
- **Entities** — the atomic units of knowledge, each with a definition,
|
||||
provenance, and quality scores
|
||||
- **Schemas** — structural templates that define what a well-formed
|
||||
entity, mapping, or analysis looks like
|
||||
- **Evaluations** — per-entity and collection-level quality assessments
|
||||
- **Metrics** — quantitative indicators of completeness, coherence,
|
||||
consistency, and granularity balance
|
||||
|
||||
An infospace is **viable** when it meets threshold scores across its
|
||||
defined metrics — it is fit for purpose as an explanatory tool.
|
||||
|
||||
### Topic
|
||||
|
||||
The subject matter an infospace is built to explain. A topic sits
|
||||
within a **domain** (broader field of knowledge) but is more specific:
|
||||
|
||||
- Domain: Economics → Topic: The Wealth of Nations
|
||||
- Domain: Systems Theory → Topic: Viable System Model
|
||||
- Domain: Computer Science → Topic: Distributed consensus protocols
|
||||
|
||||
A topic provides the **source material** — the texts, data, or
|
||||
observations from which entities are extracted.
|
||||
|
||||
### Discipline
|
||||
|
||||
A reusable framework of concepts applied as a lens to explore a topic.
|
||||
A discipline is itself an infospace — one that has been evaluated as
|
||||
viable and packaged for reuse.
|
||||
|
||||
In our example, the VSM is the discipline: a set of concepts (S1-S5,
|
||||
recursion, variety, viability) from systems theory, applied to the
|
||||
economic concepts in Smith's work.
|
||||
|
||||
**Key property:** Disciplines compose. An infospace built with one
|
||||
discipline can itself become a discipline for another infospace. The
|
||||
Wealth of Nations infospace, viewed through VSM, could become a
|
||||
discipline applied to a modern supply chain analysis.
|
||||
|
||||
### Entity
|
||||
|
||||
The atomic unit of an infospace. An entity has:
|
||||
|
||||
- **Identity**: a unique slug and human-readable title
|
||||
- **Definition**: a precise, non-circular explanation
|
||||
- **Provenance**: the source chapter, passage, and extraction context
|
||||
- **Domain placement**: which area of the topic it belongs to
|
||||
- **Discipline mapping**: how it connects to the applied discipline
|
||||
(e.g. which VSM system)
|
||||
- **Quality scores**: per-entity LLM-evaluated metrics
|
||||
- **Lifecycle state**: active, archived (with reason), or draft
|
||||
|
||||
### Evaluation
|
||||
|
||||
A structured assessment of quality, applied at two levels:
|
||||
|
||||
- **Per-entity evaluation**: scores an individual entity against
|
||||
quality rubrics defined in its schema (definition precision, source
|
||||
grounding, discipline relevance, etc.)
|
||||
- **Collection evaluation**: scores the entity set as a whole against
|
||||
five concerns: redundancy, coverage, coherence, consistency, and
|
||||
granularity balance
|
||||
|
||||
Evaluations are always performed by **delegated LLM calls** through
|
||||
MarkiTect's LLM integration — never by the coding agent working on
|
||||
infrastructure. This separation ensures that domain-level judgment
|
||||
stays in the problem space, not the tooling space.
|
||||
|
||||
### Viability
|
||||
|
||||
An infospace is viable when:
|
||||
|
||||
1. Its entities individually meet quality thresholds (per-entity eval)
|
||||
2. Its collection metrics are within acceptable ranges
|
||||
3. It can answer its defined **competency questions** — the canonical
|
||||
queries the infospace is meant to support
|
||||
4. It has been evaluated recently enough that metrics reflect current
|
||||
content
|
||||
|
||||
Viability is not binary — it is a profile of scores that the user
|
||||
sets thresholds for based on their needs.
|
||||
|
||||
---
|
||||
|
||||
## Architecture: Three Layers
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────┐
|
||||
│ Layer 3: Infospace Instances │
|
||||
│ Specific infospaces built by users │
|
||||
│ (Wealth of Nations + VSM, supply chain + ...) │
|
||||
│ Works IN an infospace │
|
||||
├──────────────────────────────────────────────────┤
|
||||
│ Layer 2: Infospace Tooling │
|
||||
│ Terminology, primitives, composition model │
|
||||
│ CLI: infospace create/evaluate/compose/... │
|
||||
│ Works WITH infospaces │
|
||||
├──────────────────────────────────────────────────┤
|
||||
│ Layer 1: MarkiTect Platform │
|
||||
│ Artifacts, prompts, LLM, spaces, graph, embed │
|
||||
│ Provides FOR infospaces │
|
||||
└──────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Boundary condition: LLM delegation
|
||||
|
||||
All LLM-based evaluation (entity scoring, pairwise judgments, coverage
|
||||
analysis) is delegated to MarkiTect's LLM integration module. The coding
|
||||
agent that works on infrastructure never makes domain-level judgments
|
||||
itself. This keeps a clean separation:
|
||||
|
||||
- **Coding agent** → writes Python, templates, schemas, tests
|
||||
- **MarkiTect LLM** → evaluates entities, judges redundancy, assesses
|
||||
coverage, checks consistency
|
||||
|
||||
The infospace tooling (Layer 2) orchestrates these LLM calls through
|
||||
prompt templates and the prompt execution engine, not through ad-hoc
|
||||
prompting.
|
||||
|
||||
---
|
||||
|
||||
## Stage 1: MarkiTect Platform Additions
|
||||
|
||||
Infrastructure that must exist before infospace tooling can be built.
|
||||
These are general-purpose platform capabilities, not infospace-specific.
|
||||
|
||||
### S1.1 — Entity metadata parser
|
||||
|
||||
Add a deterministic markdown parser that extracts structured metadata
|
||||
from entity files: H1 title, sections present, word counts, domain,
|
||||
source chapter. Returns a dataclass usable by all downstream metrics.
|
||||
|
||||
**Maps to:** INFRA-TASKS #13, #10
|
||||
**Location:** `markitect/prompts/quality/` or new `markitect/analysis/`
|
||||
**Depends on:** Nothing — can start immediately
|
||||
**Deliverable:** `parse_entity_metadata(path) -> EntityMeta` function
|
||||
with tests
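
A minimal sketch of the kind of deterministic parsing S1.1 describes. It works on text rather than a path to stay self-contained, and the `EntityMeta` fields and the `parse_entity_text` name are assumptions for illustration; the plan does not fix them, and domain and source-chapter extraction are left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EntityMeta:
    """Hypothetical shape; the real field names are not fixed by this plan."""
    title: str = ""
    sections: list[str] = field(default_factory=list)
    word_counts: dict[str, int] = field(default_factory=dict)


def parse_entity_text(text: str) -> EntityMeta:
    """Extract the H1 title, H2 section names, and per-section word counts."""
    meta = EntityMeta()
    current = None  # name of the H2 section currently being read
    for line in text.splitlines():
        h1 = re.match(r"^#\s+(.*)$", line)
        h2 = re.match(r"^##\s+(.*)$", line)
        if h1 and not meta.title:
            meta.title = h1.group(1).strip()
        elif h2:
            current = h2.group(1).strip()
            meta.sections.append(current)
            meta.word_counts[current] = 0
        elif current is not None:
            # Body text (and any deeper headings) counts toward the section.
            meta.word_counts[current] += len(line.split())
    return meta
```

Because the parser is regex-only and has no LLM in the loop, the same input always yields the same `EntityMeta`, which is what downstream metrics need.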
|
||||
|
||||
### S1.2 — Schema compliance validator
|
||||
|
||||
Deterministic validation of entity/mapping files against their schemas:
|
||||
section presence, word count ranges, heading format, enum values. No
|
||||
LLM needed.
|
||||
|
||||
**Maps to:** INFRA-TASKS #10
|
||||
**Location:** `markitect/prompts/quality/validator.py` (extend existing)
|
||||
**Depends on:** S1.1
|
||||
**Deliverable:** `validate_document(path, schema) -> ValidationResult`
|
||||
with tests
|
||||
|
||||
### S1.3 — Embedding adapter
|
||||
|
||||
Add embedding support to `markitect/llm/`. Needs:
|
||||
|
||||
- `EmbeddingAdapter` interface: `embed(texts: list[str]) -> list[list[float]]`
|
||||
- `OpenRouterEmbeddingAdapter` implementation (or OpenAI embedding endpoint)
|
||||
- Caching layer: store embeddings keyed by `{slug: content_digest}` so
|
||||
unchanged entities skip re-embedding
|
||||
- Cosine similarity utility: `similarity_matrix(embeddings) -> np.ndarray`
|
||||
|
||||
**Maps to:** INFRA-TASKS #14 (prerequisite)
|
||||
**Location:** `markitect/llm/embeddings.py`
|
||||
**Depends on:** Nothing — can start immediately
|
||||
**Deliverable:** Embedding adapter + cache + similarity computation, with
|
||||
tests
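
The caching requirement above could be met along these lines. This is a sketch under assumptions: the cache is a plain dict of `slug -> (digest, vector)`, SHA-256 is used as the content digest, and `embed_with_cache` and `content_digest` are hypothetical names. The `embed` callable stands in for any `EmbeddingAdapter.embed` implementation.

```python
import hashlib
from typing import Callable


def content_digest(text: str) -> str:
    """Stable digest of the entity text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(
    texts: dict[str, str],
    cache: dict[str, tuple[str, list[float]]],
    embed: Callable[[list[str]], list[list[float]]],
) -> dict[str, list[float]]:
    """Embed only slugs whose digest changed; update the cache in place."""
    stale = [
        slug for slug in sorted(texts)
        if slug not in cache or cache[slug][0] != content_digest(texts[slug])
    ]
    if stale:
        # One batched call for everything that needs (re-)embedding.
        vectors = embed([texts[slug] for slug in stale])
        for slug, vec in zip(stale, vectors):
            cache[slug] = (content_digest(texts[slug]), vec)
    return {slug: cache[slug][1] for slug in texts}
```

Batching the stale texts into a single `embed` call keeps the number of API requests independent of how many entities are unchanged.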
|
||||
|
||||
### S1.4 — Graph analysis utilities
|
||||
|
||||
The existing `DependencyGraph` supports basic traversal and cycle
|
||||
detection. Collection-level metrics need richer analysis:
|
||||
|
||||
- Connected components
|
||||
- Betweenness centrality
|
||||
- Community detection (Louvain or label propagation)
|
||||
- Modularity score
|
||||
- Degree distribution
|
||||
- Cohesion/coupling computation
|
||||
|
||||
Decide: extend `DependencyGraph` or add a lightweight wrapper that
|
||||
converts to networkx (adding it as an optional dependency).
|
||||
|
||||
**Maps to:** INFRA-TASKS #16 (prerequisite)
|
||||
**Location:** `markitect/prompts/dependencies/analysis.py` or new
|
||||
`markitect/analysis/graph.py`
|
||||
**Depends on:** Nothing — can start immediately
|
||||
**Deliverable:** Graph analysis functions with tests
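
Two of the listed analyses, connected components and degree distribution, are small enough to sketch without networkx. The code below treats the graph as undirected and takes a plain edge list; it does not use the `DependencyGraph` API, whose interface is not shown here, and the function names are hypothetical.

```python
from collections import Counter, deque


def _adjacency(edges, nodes=()):
    """Undirected adjacency sets; `nodes` lets isolated nodes be included."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def connected_components(edges, nodes=()):
    """Return the components as a list of node sets, via breadth-first search."""
    adj = _adjacency(edges, nodes)
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        queue, comp = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nxt in adj[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(comp)
    return components


def degree_distribution(edges, nodes=()):
    """Map each degree value to the number of nodes with that degree."""
    adj = _adjacency(edges, nodes)
    return dict(Counter(len(neighbours) for neighbours in adj.values()))
```

Centrality, community detection, and modularity are considerably more involved, which is the main argument for the optional networkx wrapper mentioned above.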
|
||||
|
||||
### S1.5 — Structured evaluation output
|
||||
|
||||
Define a standard format for evaluation results: YAML front-matter +
|
||||
markdown body. Add utilities for:
|
||||
|
||||
- Writing evaluation results (per-entity, per-pair, collection-level)
|
||||
- Reading/parsing evaluation results back into dataclasses
|
||||
- Appending timestamped snapshots to a history file
|
||||
- Diffing two snapshots
|
||||
|
||||
**Maps to:** INFRA-TASKS #11, #12
|
||||
**Location:** `markitect/prompts/quality/` or `markitect/analysis/`
|
||||
**Depends on:** S1.1
|
||||
**Deliverable:** `EvaluationResult` model + read/write utilities with
|
||||
tests
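
Snapshot diffing can be illustrated on flat metric dictionaries. This is only a sketch: the real `EvaluationResult` model is not defined in this plan, so snapshots are assumed here to be `{metric_name: value}` mappings and `diff_snapshots` is a hypothetical helper.

```python
from typing import Optional


def diff_snapshots(
    old: dict[str, float],
    new: dict[str, float],
) -> dict[str, dict[str, Optional[float]]]:
    """Report metrics that were added, removed, or changed between snapshots."""
    changes: dict[str, dict[str, Optional[float]]] = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue  # unchanged metrics are omitted from the diff
        delta = None
        if before is not None and after is not None:
            delta = after - before
        changes[name] = {"before": before, "after": after, "delta": delta}
    return changes
```

A metric present in only one snapshot gets `None` on the missing side and no delta, so additions and removals remain visible in the diff.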
|
||||
|
||||
### S1.6 — Batch LLM evaluation orchestrator
|
||||
|
||||
A pipeline component that runs an evaluation prompt template against a
|
||||
batch of entities (or entity pairs), collecting structured results.
|
||||
Must handle:
|
||||
|
||||
- Rate limiting and retry (reuse existing adapter logic)
|
||||
- Progress reporting
|
||||
- Incremental evaluation (skip entities whose content hasn't changed
|
||||
since last eval)
|
||||
- Result aggregation
|
||||
|
||||
This is the mechanism by which infospace tooling delegates LLM work
|
||||
to the platform.
|
||||
|
||||
**Maps to:** INFRA-TASKS #9 (prerequisite)
|
||||
**Location:** `markitect/prompts/execution/batch.py`
|
||||
**Depends on:** S1.5
|
||||
**Deliverable:** `BatchEvaluator` class with tests
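
Incremental evaluation depends on the caller maintaining a `{key: digest}` map between runs. The sketch below shows one way that bookkeeping might look; `plan_run` and `record_successes` are hypothetical helpers, SHA-256 is an assumed digest, and recording digests only for successful items is a design suggestion rather than something the plan specifies.

```python
import hashlib


def digest(text: str) -> str:
    """Content digest used to detect unchanged items (assumed: SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_run(sources: dict[str, str], previous: dict[str, str]):
    """Split keys into those needing evaluation and those safe to skip."""
    to_run, to_skip = [], []
    for key in sorted(sources):
        if previous.get(key) == digest(sources[key]):
            to_skip.append(key)
        else:
            to_run.append(key)
    return to_run, to_skip


def record_successes(previous: dict[str, str], sources: dict[str, str], succeeded):
    """Return an updated digest map covering only successfully evaluated keys."""
    updated = dict(previous)
    for key in succeeded:
        updated[key] = digest(sources[key])
    return updated
```

Leaving failed items out of the digest map means they are picked up again on the next run without any separate retry list.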
|
||||
|
||||
### S1.7 — FCA computation
|
||||
|
||||
Formal Concept Analysis: build a formal context (entity × attribute
|
||||
matrix), compute the concept lattice, extract gap concepts. Either
|
||||
implement a minimal FCA algorithm or integrate a library.
|
||||
|
||||
**Maps to:** INFRA-TASKS #15 (prerequisite)
|
||||
**Location:** `markitect/analysis/fca.py`
|
||||
**Depends on:** S1.1
|
||||
**Deliverable:** `FormalContext`, `ConceptLattice`, `find_gap_concepts()`
|
||||
with tests
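
For a sense of scale, a minimal FCA computation fits in a few lines. The sketch below assumes the formal context is a dict mapping each entity to its set of attributes and enumerates all formal concepts by closing the attribute sets under intersection. It is a naive approach suited to small contexts, not the `FormalContext` / `ConceptLattice` API named in the deliverable, and it does not attempt gap detection.

```python
def formal_concepts(context: dict[str, set[str]]):
    """Return all (extent, intent) pairs of the formal context.

    Every concept intent is an intersection of object attribute sets
    (or the full attribute set), so closing under intersection finds them all.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {intent & attrs for intent in intents}

    concepts = []
    for intent in intents:
        # The extent is every object that carries all attributes in the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Order from the most specific concept (smallest extent) to the most general.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))
```

The number of concepts can grow exponentially with the context size, so a library or an incremental algorithm may still be preferable beyond a few hundred entities.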
|
||||
|
||||
### Summary: Stage 1 dependency graph
|
||||
|
||||
```
|
||||
S1.1 Entity metadata parser ──┬── S1.2 Schema validator
                              ├── S1.5 Eval output format ── S1.6 Batch evaluator
                              └── S1.7 FCA computation
|
||||
|
||||
S1.3 Embedding adapter ──────── (independent)
|
||||
S1.4 Graph analysis ─────────── (independent)
|
||||
```
|
||||
|
||||
S1.1, S1.3, and S1.4 can proceed in parallel. S1.6 (batch evaluator) is
|
||||
the final piece needed before Stage 2 can begin.
|
||||
|
||||
---
|
||||
|
||||
## Stage 2: Infospace Tooling
|
||||
|
||||
The user-facing layer that provides documented primitives for working
|
||||
with infospaces. Built on top of Stage 1 infrastructure and the existing
|
||||
`markitect/spaces/` module.
|
||||
|
||||
### S2.1 — Infospace model and configuration
|
||||
|
||||
Define the `Infospace` as a first-class concept that extends the existing
|
||||
`InformationSpace` with:
|
||||
|
||||
- **Topic declaration**: name, domain, source material reference
|
||||
- **Discipline bindings**: which external infospaces are applied as lenses
|
||||
- **Schema registry**: which schemas govern entity structure
|
||||
- **Competency questions**: what the infospace should be able to answer
|
||||
- **Viability thresholds**: minimum acceptable metric scores
|
||||
- **Evaluation state**: latest per-entity and collection scores
|
||||
|
||||
Configuration format: an `infospace.yaml` (or section in existing config)
|
||||
that declares all of the above.
|
||||
|
||||
**Location:** new `markitect/infospace/` package
|
||||
**Depends on:** S1.1, S1.5, existing `markitect/spaces/`
|
||||
**Deliverable:** `InfospaceConfig`, `InfospaceState` models + loader
|
||||
|
||||
### S2.2 — Infospace lifecycle commands
|
||||
|
||||
CLI commands for the core lifecycle:
|
||||
|
||||
```bash
|
||||
# Initialise a new infospace
|
||||
markitect infospace init --topic "Wealth of Nations" \
|
||||
--domain "Economics" \
|
||||
--discipline vsm-framework
|
||||
|
||||
# Show infospace status (entity count, eval state, viability)
|
||||
markitect infospace status
|
||||
|
||||
# List entities with quality summary
|
||||
markitect infospace entities [--sort-by score|domain|chapter]
|
||||
|
||||
# Show viability dashboard
|
||||
markitect infospace viability
|
||||
```
|
||||
|
||||
These commands read the `infospace.yaml` config and present information
|
||||
from the metadata index and evaluation results.
|
||||
|
||||
**Location:** `markitect/infospace/cli.py` integrated into main CLI
|
||||
**Depends on:** S2.1
|
||||
**Deliverable:** CLI commands with help text and tests
|
||||
|
||||
### S2.3 — Per-entity evaluation primitives
|
||||
|
||||
Prompt templates and CLI commands for evaluating individual entities:
|
||||
|
||||
```bash
|
||||
# Evaluate all entities
|
||||
markitect infospace evaluate --provider openrouter
|
||||
|
||||
# Evaluate entities from a specific chapter
|
||||
markitect infospace evaluate --chapter book-1-chapter-05 --provider openrouter
|
||||
|
||||
# Re-evaluate a single entity
|
||||
markitect infospace evaluate --entity division-of-labour --provider openrouter
|
||||
```
|
||||
|
||||
Uses the batch evaluator (S1.6) to run the evaluate-entity prompt
|
||||
template (defined in the infospace's schema directory) against entities.
|
||||
Writes structured results to `output/evaluations/`.
|
||||
|
||||
**Maps to:** INFRA-TASKS #8, #9
|
||||
**Location:** `markitect/infospace/evaluation.py`
|
||||
**Depends on:** S1.6, S2.1
|
||||
**Deliverable:** Per-entity evaluation pipeline + CLI + prompt template
|
||||
|
||||
### S2.4 — Collection-level checks
|
||||
|
||||
CLI commands for each of the five collection concerns:
|
||||
|
||||
```bash
|
||||
# Run all collection checks
|
||||
markitect infospace check --provider openrouter
|
||||
|
||||
# Run specific checks
|
||||
markitect infospace check redundancy --provider openrouter
|
||||
markitect infospace check coverage --provider openrouter
|
||||
markitect infospace check coherence --provider openrouter
|
||||
markitect infospace check consistency --provider openrouter
|
||||
markitect infospace check granularity --provider openrouter
|
||||
```
|
||||
|
||||
Each check uses Stage 1 infrastructure (embeddings, graph analysis, FCA)
and delegates LLM judgment to the platform. Results are written to
`output/metrics/` as per-concern reports plus a unified `metrics.yaml`.
|
||||
|
||||
**Maps to:** INFRA-TASKS #14-19
|
||||
**Location:** `markitect/infospace/checks/` (one module per concern)
|
||||
**Depends on:** S1.3, S1.4, S1.6, S1.7, S2.1
|
||||
**Deliverable:** Five check modules + unified orchestrator + CLI
|
||||
|
||||
### S2.5 — Metrics history and viability tracking
|
||||
|
||||
Track metrics over time. After each evaluation or check run, append a
|
||||
timestamped snapshot to `metrics-history.yaml`. Provide commands to
|
||||
review trends:
|
||||
|
||||
```bash
|
||||
# Show metrics history
|
||||
markitect infospace history
|
||||
|
||||
# Compare two snapshots
|
||||
markitect infospace history diff 2026-02-18 2026-03-01
|
||||
|
||||
# Check viability against thresholds
|
||||
markitect infospace viability
|
||||
```
|
||||
|
||||
Viability is assessed by comparing current metrics to the thresholds
declared in `infospace.yaml`. The result is a simple pass/fail per
metric, reported alongside the actual value.
|
||||
|
||||
**Maps to:** INFRA-TASKS #12
|
||||
**Location:** `markitect/infospace/history.py`
|
||||
**Depends on:** S2.4, S1.5
|
||||
**Deliverable:** History tracking + viability assessment + CLI
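
The threshold comparison could be as simple as the sketch below. It assumes thresholds are given as `{metric: {"min": ...}}` or `{metric: {"max": ...}}` mappings, matching the `viability:` block used in the example configuration. `assess_viability` is a hypothetical name, and treating a missing metric as a failure is an assumption.

```python
def assess_viability(metrics: dict, thresholds: dict) -> dict:
    """Return {metric: {"value": ..., "pass": bool}} for every threshold."""
    report = {}
    for name, bounds in thresholds.items():
        value = metrics.get(name)
        ok = value is not None  # assumption: an unmeasured metric fails
        if ok and "min" in bounds:
            ok = value >= bounds["min"]
        if ok and "max" in bounds:
            ok = value <= bounds["max"]
        report[name] = {"value": value, "pass": ok}
    return report


def is_viable(report: dict) -> bool:
    """An infospace passes only if every thresholded metric passes."""
    return all(entry["pass"] for entry in report.values())
```

Keeping the per-metric report separate from the overall verdict preserves the profile of scores while still allowing a single pass/fail summary.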
|
||||
|
||||
### S2.6 — Infospace composition model
|
||||
|
||||
The mechanism by which one infospace is applied as a discipline to
|
||||
another. Builds on `markitect/spaces/composability/`:
|
||||
|
||||
- **Discipline binding**: declare that infospace A uses infospace B as a
|
||||
discipline. B's entities become available as mapping targets.
|
||||
- **Cross-infospace references**: entity in A maps to concept in B using
|
||||
the same mapping schema and evaluation pipeline.
|
||||
- **Discipline viability requirement**: B must be viable (meets its own
|
||||
thresholds) before it can be used as a discipline for A.
|
||||
- **Cascading evaluation**: when B's entities change, A's mappings that
|
||||
reference them are flagged for re-evaluation.
|
||||
|
||||
```bash
|
||||
# Bind a discipline to the current infospace
|
||||
markitect infospace bind-discipline ./path/to/vsm-infospace
|
||||
|
||||
# List bound disciplines and their viability
|
||||
markitect infospace disciplines
|
||||
|
||||
# Check for stale mappings after discipline update
|
||||
markitect infospace check stale-mappings
|
||||
```
|
||||
|
||||
**Location:** `markitect/infospace/composition.py`
|
||||
**Depends on:** S2.1, existing `markitect/spaces/composability/`
|
||||
**Deliverable:** Composition model + CLI + documentation
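
Cascading evaluation can be sketched as a digest comparison. The example assumes each mapping in infospace A records the slug of its target entity in B together with the digest of that entity at the time the mapping was last evaluated. Both the data shape and the `find_stale_mappings` name are hypothetical.

```python
def find_stale_mappings(
    mappings: dict[str, tuple[str, str]],
    discipline_digests: dict[str, str],
) -> list[tuple[str, str]]:
    """Flag mappings whose target entity in the discipline changed or vanished.

    mappings: {mapping_id: (target_slug, digest_seen_at_last_evaluation)}
    discipline_digests: {target_slug: current_digest} for infospace B
    """
    stale = []
    for mapping_id, (target, seen) in sorted(mappings.items()):
        current = discipline_digests.get(target)
        if current is None:
            stale.append((mapping_id, "target-missing"))
        elif current != seen:
            stale.append((mapping_id, "target-changed"))
    return stale
```

Only the flagged mappings would then need to go back through the evaluation pipeline, which keeps a discipline update from forcing a full re-evaluation of A.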
|
||||
|
||||
### S2.7 — Documentation: Infospace Primitives Reference
|
||||
|
||||
A reference document explaining all primitives, their purpose, and how
|
||||
they compose. This is the user-facing documentation for the infospace
|
||||
tooling layer — the equivalent of a framework guide.
|
||||
|
||||
**Location:** `docs/infospace-primitives.md` or in-CLI help
|
||||
**Depends on:** S2.1-S2.6
|
||||
**Deliverable:** Reference documentation
|
||||
|
||||
### Summary: Stage 2 dependency graph
|
||||
|
||||
```
|
||||
S2.1 Model & config ──┬── S2.2 Lifecycle CLI
                      ├── S2.3 Per-entity evaluation
                      ├── S2.4 Collection checks ── S2.5 History & viability
                      └── S2.6 Composition model
|
||||
|
||||
S2.7 Documentation (depends on all above)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Stage 3: Example Revision
|
||||
|
||||
Revisit the Wealth of Nations / VSM example using the new tooling.
|
||||
The example becomes both a tutorial and a validation of the tooling.
|
||||
|
||||
### S3.1 — Migrate example to infospace configuration
|
||||
|
||||
Replace the ad-hoc `process_chapters.py` setup with a declarative
|
||||
`infospace.yaml`:
|
||||
|
||||
```yaml
|
||||
topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/

disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/

schemas:
  entity: schemas/economic-entity-schema-v1.0.md
  mapping: schemas/vsm-mapping-schema-v1.0.md
  analysis: schemas/chapter-analysis-schema-v1.0.md

competency_questions: schemas/competency-questions.md

viability:
  redundancy_ratio: { max: 0.05 }
  coverage_ratio: { min: 0.60 }
  coherence_components: { max: 1 }
  consistency_cycles: { max: 0 }
  granularity_entropy: { min: 1.0 }
  per_entity_mean: { min: 3.5 }

pipeline:
  stages:
    - template: extract-entities
      spaces: [sources, guidelines, vsm-reference, entities]
    - template: map-to-vsm
      spaces: [entities, vsm-reference, guidelines]
    - template: synthesize-analysis
      spaces: [sources, entities, mappings, vsm-reference]
  post_batch:
    - template: assess-metrics
      spaces: [analyses, vsm-reference]
|
||||
```
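
To make the `viability` block concrete, here is a minimal sketch of how such thresholds could be checked against measured metrics. The function name `check_viability`, the rule that a missing metric counts as a failure, and the metric values are all assumptions for illustration; the thresholds are copied from the configuration above.

```python
def check_viability(metrics, thresholds):
    """Compare metrics against {name: {"min": x}} or {name: {"max": y}} bounds.

    Returns one (metric, value, bound_kind, bound) tuple per violation.
    A metric missing from `metrics` counts as a violation.
    """
    failures = []
    for name, bounds in thresholds.items():
        value = metrics.get(name)
        for kind, bound in bounds.items():
            too_low = kind == "min" and (value is None or value < bound)
            too_high = kind == "max" and (value is None or value > bound)
            if too_low or too_high:
                failures.append((name, value, kind, bound))
    return failures


thresholds = {
    "redundancy_ratio": {"max": 0.05},
    "coverage_ratio": {"min": 0.60},
    "coherence_components": {"max": 1},
    "consistency_cycles": {"max": 0},
    "granularity_entropy": {"min": 1.0},
    "per_entity_mean": {"min": 3.5},
}

# Made-up measurements, not results from the example infospace.
metrics = {
    "redundancy_ratio": 0.08,
    "coverage_ratio": 0.72,
    "coherence_components": 1,
    "consistency_cycles": 0,
    "granularity_entropy": 1.3,
    "per_entity_mean": 3.9,
}

failures = check_viability(metrics, thresholds)
viable = not failures
print(viable, failures)
```

With these invented numbers the infospace is not viable, because the redundancy ratio exceeds its maximum.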
|
||||
|
||||
**Depends on:** S2.1
|
||||
**Deliverable:** `infospace.yaml` + migration of `process_chapters.py` to
|
||||
use infospace tooling APIs
|
||||
|
||||
### S3.2 — Clean per-chapter git history
|
||||
|
||||
Re-run all processed chapters (and remaining ones) with per-chapter
|
||||
commits on a clean branch, then replace the current tangled history.
|
||||
|
||||
**Maps to:** INFRA-TASKS #4, #7
|
||||
**Depends on:** S3.1
|
||||
**Deliverable:** Clean branch with one commit per chapter
|
||||
|
||||
### S3.3 — Full evaluation run
|
||||
|
||||
Run all per-entity evaluations and collection checks on the completed
|
||||
infospace. Establish baseline metrics. Demonstrate the viability
|
||||
dashboard.
|
||||
|
||||
**Maps to:** INFRA-TASKS #6
|
||||
**Depends on:** S2.3, S2.4, S2.5, S3.2
|
||||
**Deliverable:** Complete evaluation results + viability report
|
||||
|
||||
### S3.4 — Rewrite tutorial
|
||||
|
||||
Update `TUTORIAL.md` to use infospace tooling commands instead of
|
||||
raw `process_chapters.py` invocations. The tutorial should walk
|
||||
through:
|
||||
|
||||
1. Initialising an infospace (`markitect infospace init`)
|
||||
2. Defining schemas and competency questions
|
||||
3. Processing chapters (pipeline execution)
|
||||
4. Evaluating entities (`markitect infospace evaluate`)
|
||||
5. Running collection checks (`markitect infospace check`)
|
||||
6. Reviewing viability (`markitect infospace viability`)
|
||||
7. Iterating: refining guidelines, re-processing, re-evaluating
|
||||
8. Using the infospace as a discipline for a new project
|
||||
|
||||
**Depends on:** S3.1-S3.3
|
||||
**Deliverable:** Revised `TUTORIAL.md`
|
||||
|
||||
### S3.5 — Demonstrate composition
|
||||
|
||||
Create a minimal second infospace (e.g. a modern supply chain case
|
||||
study or a different economic text) that binds the Wealth of Nations
|
||||
infospace as a discipline. Demonstrates the composition model from S2.6.
|
||||
|
||||
**Depends on:** S2.6, S3.3
|
||||
**Deliverable:** Second example infospace + composition tutorial section
|
||||
|
||||
---
|
||||
|
||||
## Task Mapping
|
||||
|
||||
Cross-reference between INFRA-TASKS numbers and roadmap stages:
|
||||
|
||||
| INFRA-TASK | Description | Stage |
|------------|-------------|-------|
| 1-3 | Infra fixes (resolved) | — |
| 4 | Per-chapter git history | S3.2 |
| 5 | Prompt file side-effects | S1.6 (batch eval avoids this) |
| 6 | Stale metrics | S3.3 |
| 7 | Remaining 28 chapters | S3.2 |
| 8 | Per-concept quality metrics in schema | S2.3 |
| 9 | Evaluate-entity prompt template | S2.3 |
| 10 | Deterministic schema compliance | S1.2 |
| 11 | Structured metrics output | S1.5 |
| 12 | Metrics-over-time tracking | S2.5 |
| 13 | Entity metadata index | S1.1 |
| 14 | Redundancy detection (C1) | S2.4 |
| 15 | Coverage completeness (C2) | S2.4 |
| 16 | Structural coherence (C3) | S2.4 |
| 17 | Definitional consistency (C4) | S2.4 |
| 18 | Granularity balance (C5) | S2.4 |
| 19 | Unified collection evaluation | S2.4 |
|
||||
|
||||
---
|
||||
|
||||
## Implementation Order
|
||||
|
||||
Recommended sequence, accounting for dependencies and value delivery:
|
||||
|
||||
**Phase A — Foundation (Stage 1, parallelisable)**
|
||||
1. S1.1 Entity metadata parser
|
||||
2. S1.3 Embedding adapter
|
||||
3. S1.4 Graph analysis utilities
|
||||
|
||||
**Phase B — Validation & Output (Stage 1)**
|
||||
4. S1.2 Schema compliance validator (needs S1.1)
|
||||
5. S1.5 Structured evaluation output (needs S1.1)
|
||||
6. S1.7 FCA computation (needs S1.1)
|
||||
|
||||
**Phase C — Orchestration (Stage 1 → Stage 2 bridge)**
|
||||
7. S1.6 Batch LLM evaluation orchestrator (needs S1.5)
|
||||
|
||||
**Phase D — Infospace Core (Stage 2)**
|
||||
8. S2.1 Infospace model and configuration
|
||||
9. S2.2 Lifecycle commands
|
||||
10. S2.3 Per-entity evaluation primitives (needs S1.6, S2.1)
|
||||
|
||||
**Phase E — Collection Intelligence (Stage 2)**
|
||||
11. S2.4 Collection-level checks (needs S1.3, S1.4, S1.7, S2.1)
|
||||
12. S2.5 Metrics history and viability tracking
|
||||
|
||||
**Phase F — Composition (Stage 2)**
|
||||
13. S2.6 Infospace composition model
|
||||
14. S2.7 Documentation
|
||||
|
||||
**Phase G — Example (Stage 3)**
|
||||
15. S3.1 Migrate example to infospace config
|
||||
16. S3.2 Clean per-chapter history
|
||||
17. S3.3 Full evaluation run
|
||||
18. S3.4 Rewrite tutorial
|
||||
19. S3.5 Demonstrate composition
|
||||
381
roadmap/infospace-tooling/viable-information-spaces.md
Normal file
@@ -0,0 +1,381 @@
|
||||
# Viable Information Spaces
|
||||
|
||||
*A preliminary introduction to the concepts, structure, and purpose of
|
||||
viable information spaces as a framework for structured knowledge work.*
|
||||
|
||||
---
|
||||
|
||||
## What is an Information Space?
|
||||
|
||||
An information space is a curated collection of concepts — each precisely
|
||||
defined, grounded in source material, and connected to the others — that
|
||||
together explain a topic. It is not a database, not a knowledge graph in
|
||||
the technical sense, and not a document collection. It is closer to what
|
||||
a domain expert carries in their head: a working vocabulary of ideas,
|
||||
their relationships, and the judgment to know which idea applies where.
|
||||
|
||||
The difference is that an information space makes this vocabulary
|
||||
**explicit, evaluable, and composable**. Every concept has a written
|
||||
definition. Every relationship can be traced. The quality of the whole
|
||||
collection can be measured and improved over time.
|
||||
|
||||
We use the term **infospace** as shorthand.
|
||||
|
||||
---
|
||||
|
||||
## Why "Viable"?
|
||||
|
||||
The word comes from Stafford Beer's Viable System Model, but the idea
|
||||
generalises beyond it. A viable system is one that can maintain a
|
||||
separate existence — it is complete enough to function, coherent enough
|
||||
to hold together, and adaptive enough to improve when circumstances
|
||||
change.
|
||||
|
||||
A **viable infospace** has the same properties:
|
||||
|
||||
- **Complete enough** — it covers the topic well enough to answer the
|
||||
questions it was built to answer. Not every detail, but every concept
|
||||
that matters.
|
||||
- **Coherent enough** — its concepts connect into an explanatory web,
|
||||
not a disconnected list. You can trace how one idea leads to another.
|
||||
- **Consistent enough** — concepts don't contradict each other. Terms
|
||||
are used the same way throughout. Definitions don't go in circles.
|
||||
- **Balanced enough** — concepts operate at comparable levels of
|
||||
abstraction. The infospace doesn't mix foundational theories with
|
||||
trivial observations without acknowledging the difference.
|
||||
- **Non-redundant enough** — each concept earns its place. Two concepts
|
||||
that mean the same thing should be one concept.
|
||||
|
||||
None of these are absolute. "Enough" is defined by the purpose. An
|
||||
infospace built for teaching needs different coverage than one built for
|
||||
research. Viability is a profile of scores against thresholds that the
|
||||
user sets.
|
||||
|
||||
---
|
||||
|
||||
## The Anatomy of an Infospace
|
||||
|
||||
### Topic
|
||||
|
||||
Every infospace is built to explain something specific. The **topic** is
|
||||
the subject matter: a text, a system, a body of knowledge, a problem
|
||||
domain. In our first example, the topic is Adam Smith's *The Wealth of
|
||||
Nations* — the economic ideas contained in that specific work.
|
||||
|
||||
A topic sits within a broader **domain** (economics, biology, software
|
||||
engineering) but is more focused. The domain provides context; the topic
|
||||
provides the source material from which concepts are extracted.
|
||||
|
||||
### Entities
|
||||
|
||||
The atomic units of an infospace are its **entities** — the individual
|
||||
concepts, mechanisms, and observations that constitute its vocabulary.
|
||||
Each entity has:
|
||||
|
||||
- A **name** and unique identifier
|
||||
- A **definition** — precise, non-circular, distinguishable from
|
||||
neighbouring concepts
|
||||
- **Provenance** — where it came from (which chapter, passage, or data
|
||||
source)
|
||||
- A **domain placement** — which area of the topic it belongs to
|
||||
- **Quality scores** — how well it is defined, grounded, and connected
|
||||
|
||||
Entities are stored as individual files, one concept per file. This makes
|
||||
them independently addressable, diffable, and composable.
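
As an illustration only, an entity record with the fields listed above might be modelled like this. The class name `Entity`, its field names, the 1-5 score scale, and the `<slug>.md` file naming are assumptions for the sketch, not the actual `markitect` model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    slug: str            # unique identifier
    name: str
    definition: str
    provenance: str      # chapter, passage, or data source
    domain: str          # area of the topic the concept belongs to
    scores: dict = field(default_factory=dict)   # rubric name -> score

    def filename(self) -> str:
        """One concept per file: the slug doubles as the file stem."""
        return f"{self.slug}.md"

    def mean_score(self):
        """Average of all recorded quality scores, or None if unscored."""
        if not self.scores:
            return None
        return sum(self.scores.values()) / len(self.scores)


entity = Entity(
    slug="division-of-labour",
    name="Division of Labour",
    definition="The splitting of production into specialised tasks.",
    provenance="Book I, Chapter 1",
    domain="Production",
    scores={"definition_precision": 4, "source_grounding": 5},
)
print(entity.filename(), entity.mean_score())
```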
|
||||
|
||||
### Schemas
|
||||
|
||||
**Schemas** define what a well-formed entity looks like: which sections
|
||||
it must have, what validation rules apply, what quality metrics are
|
||||
evaluated. A schema is not code — it is a markdown document that both
|
||||
humans and LLMs read as instructions.
|
||||
|
||||
Schemas serve two purposes:
|
||||
|
||||
1. **Structural** — they tell the extraction pipeline what to produce
|
||||
(required sections, word count ranges, heading formats)
|
||||
2. **Evaluative** — they define quality rubrics against which each entity
|
||||
is scored (definition precision, source grounding, explanatory value)
|
||||
|
||||
By changing a schema, you change what the infospace considers "good"
|
||||
without changing any infrastructure.
|
||||
|
||||
### Disciplines
|
||||
|
||||
Here is where things get interesting. An infospace doesn't just catalogue
|
||||
what's in the source material — it looks at the source through a
|
||||
**lens**. We call this lens a **discipline**: a structured framework of
|
||||
concepts from another domain, applied to illuminate the topic at hand.
|
||||
|
||||
In our example, the discipline is Stafford Beer's Viable System Model —
|
||||
a set of concepts from systems theory (System 1 through System 5,
|
||||
recursion, variety, viability) applied to the economic ideas in Smith's
|
||||
work. The VSM provides the analytical structure; Smith provides the raw
|
||||
material.
|
||||
|
||||
The key insight: **a discipline is itself an infospace.** The VSM
|
||||
concepts (S1-S5, recursion, variety, algedonic signals) form their own
|
||||
curated, evaluable collection of ideas. To use the VSM as a discipline,
|
||||
it must first be a viable infospace in its own right — its concepts must
|
||||
be well-defined, coherent, and complete.
|
||||
|
||||
This leads to a recursive property: infospaces can be built on top of
|
||||
other infospaces. The Wealth of Nations infospace, viewed through the
|
||||
VSM lens, could itself become a discipline applied to analyse a modern
|
||||
supply chain. Each layer adds structure without losing the detail
|
||||
beneath it.
|
||||
|
||||
---
|
||||
|
||||
## How Infospaces Are Built
|
||||
|
||||
Building an infospace is an incremental process with four repeating
|
||||
phases:
|
||||
|
||||
### 1. Extract
|
||||
|
||||
Source material is processed one unit at a time (a chapter, a document,
|
||||
a dataset). For each unit, an LLM extracts entities according to the
|
||||
schemas and guidelines. Entities that already exist are recognised and
|
||||
skipped — the infospace grows by accumulation, not duplication.
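
Growth by accumulation can be sketched as a merge that never overwrites an existing concept. This simplified version matches entities by slug only; recognising the same concept under a different name would need more than this, and `merge_entities` is a hypothetical helper.

```python
def merge_entities(existing, extracted):
    """Add newly extracted entities; keep existing ones untouched.

    Both arguments map slug -> definition. Returns the slugs that were added.
    """
    added = []
    for slug, definition in extracted.items():
        if slug in existing:
            continue          # already known: recognised and skipped
        existing[slug] = definition
        added.append(slug)
    return added


existing = {"division-of-labour": "Original definition."}
extracted = {
    "division-of-labour": "Reworded definition from a later chapter.",
    "market-extent": "The size of the market limits specialisation.",
}
added = merge_entities(existing, extracted)
print(added, len(existing))
```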
|
||||
|
||||
### 2. Map
|
||||
|
||||
Extracted entities are mapped to the discipline. In our example, each
|
||||
economic concept is mapped to a VSM system with a strength rating and
|
||||
rationale. This is where the discipline lens does its work: it forces
|
||||
the question "what role does this concept play in the larger system?"
|
||||
|
||||
### 3. Evaluate
|
||||
|
||||
After extraction and mapping, the infospace is evaluated at two levels:
|
||||
|
||||
- **Per-entity**: each concept is scored against quality rubrics. Is the
|
||||
definition precise? Is it grounded in the source? Does it connect
|
||||
meaningfully to the discipline?
|
||||
- **Collection-level**: the set of concepts is assessed for redundancy,
|
||||
coverage, coherence, consistency, and granularity balance.
|
||||
|
||||
Evaluation produces structured, machine-readable scores — not prose
|
||||
narratives. These scores are tracked over time.
|
||||
|
||||
### 4. Refine
|
||||
|
||||
Evaluation reveals what needs improvement. Redundant concepts are merged
|
||||
or archived. Coverage gaps are addressed by re-extracting with improved
|
||||
guidelines. Inconsistencies are resolved by clarifying definitions.
|
||||
Guidelines and schemas are updated. The cycle repeats.
|
||||
|
||||
This loop — extract, map, evaluate, refine — is the heartbeat of a
|
||||
viable infospace. Each iteration makes the infospace more viable:
|
||||
more complete, more coherent, more consistent.
|
||||
|
||||
---
|
||||
|
||||
## How Infospaces Are Evaluated
|
||||
|
||||
Quality is assessed through two complementary mechanisms:
|
||||
|
||||
### LLM Evaluation
|
||||
|
||||
A language model reads an entity (or a pair of entities) and judges it
|
||||
against defined rubrics. This captures qualitative aspects that can't be
|
||||
computed mechanically: Is this definition actually precise? Does this
|
||||
mapping rationale make sense? Are these two concepts really different?
|
||||
|
||||
LLM evaluation is always **delegated** — it runs through prompt templates
|
||||
and the platform's LLM integration, never through the human or agent
|
||||
working on infrastructure. This separation keeps domain judgment in the
|
||||
problem space.
|
||||
|
||||
### Deterministic Aggregation
|
||||
|
||||
Structured scores from LLM evaluation, plus metrics computed directly
|
||||
from files (section counts, word lengths, graph properties, similarity
|
||||
matrices), are aggregated into collection-level indicators. These are
|
||||
numbers that can be tracked, diffed, and plotted:
|
||||
|
||||
- **Redundancy ratio** — what fraction of concepts substantially overlap
|
||||
- **Coverage ratio** — what fraction of the domain-discipline matrix is
|
||||
populated
|
||||
- **Graph density** — how connected the concept web is
|
||||
- **Cycle count** — how many circular definition chains exist
|
||||
- **Granularity entropy** — how balanced the abstraction levels are
|
||||
|
||||
These indicators, compared against user-defined thresholds, determine
|
||||
whether the infospace is **viable** for its intended purpose.
|
||||
|
||||
---
|
||||
|
||||
## Five Concerns of Collection Quality
|
||||
|
||||
Individual concept quality (is this definition good?) is necessary but
|
||||
not sufficient. An infospace made of individually excellent concepts can
|
||||
still fail as a collection. Five concerns capture what can go wrong:
|
||||
|
||||
### Redundancy
|
||||
|
||||
Do two concepts mean the same thing? Overlap wastes the reader's
|
||||
attention and creates ambiguity about which concept to use. Redundancy is
|
||||
detected through embedding similarity (are the definitions close in
|
||||
meaning?) confirmed by LLM judgment (are they genuinely the same
|
||||
concept, or merely related?).
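
The first, mechanical half of that check might look like the sketch below: compute pairwise cosine similarity and keep the pairs above a cut-off as candidates for LLM review. The 0.9 threshold, the three-dimensional toy vectors, and the function names are assumptions; real embeddings would come from an embedding model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def redundancy_candidates(embeddings, threshold=0.9):
    """Return (slug_a, slug_b, similarity) for every pair at or above threshold."""
    pairs = []
    for (a, u), (b, v) in combinations(sorted(embeddings.items()), 2):
        score = cosine(u, v)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Toy vectors standing in for definition embeddings.
embeddings = {
    "division-of-labour": [0.90, 0.10, 0.00],
    "specialisation": [0.88, 0.15, 0.02],
    "rent-theory": [0.00, 0.20, 0.95],
}
candidates = redundancy_candidates(embeddings)
print([(a, b) for a, b, _ in candidates])
```

Candidates are only suspects. Whether two close definitions are the same concept or merely related is left to the LLM judgment step.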
|
||||
|
||||
### Coverage
|
||||
|
||||
Does the concept set cover the domain? Are there areas of the topic that
|
||||
have no corresponding concepts? Coverage is assessed structurally (which
|
||||
cells in the domain-discipline matrix are empty?) and functionally (can
|
||||
the infospace answer the questions it was built to answer?).
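
The structural half of the coverage check reduces to a cross-tabulation. The sketch below assumes each entity has exactly one domain and one discipline slot; the helper names and the sample placements are illustrative.

```python
from itertools import product


def empty_cells(placements, domains, systems):
    """List (domain, system) cells that no entity occupies."""
    filled = set(placements.values())
    return [cell for cell in product(domains, systems) if cell not in filled]


def coverage_ratio(placements, domains, systems):
    """Fraction of the domain-discipline matrix that is populated."""
    total = len(domains) * len(systems)
    if total == 0:
        return 0.0
    return 1 - len(empty_cells(placements, domains, systems)) / total


# slug -> (domain, VSM system); sample data for illustration.
placements = {
    "division-of-labour": ("Production", "S1"),
    "pin-factory": ("Production", "S1"),
    "market-extent": ("Exchange", "S4"),
    "wage-determination": ("Distribution", "S3"),
    "rent-theory": ("Distribution", "S5"),
    "capital-accumulation": ("Production", "S3"),
}
domains = ["Production", "Distribution", "Exchange"]
systems = ["S1", "S3", "S4", "S5"]

gaps = empty_cells(placements, domains, systems)
ratio = coverage_ratio(placements, domains, systems)
print(len(gaps), round(ratio, 3))
```

An empty cell is not automatically a defect: some domain-discipline combinations may be legitimately empty, which is why the functional check against competency questions complements this one.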
|
||||
|
||||
### Coherence
|
||||
|
||||
Do the concepts form a connected web of explanations, or a fragmented
|
||||
list of isolated ideas? Coherence is measured through graph analysis:
|
||||
connected components (is everything reachable?), modularity (are there
|
||||
meaningful clusters?), and bridge concepts (which ideas connect different
|
||||
areas?).
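
Counting connected components is the simplest of these measures. A minimal sketch, assuming links between concepts are treated as undirected and every edge endpoint appears in the node list; the function name and sample data are illustrative.

```python
from collections import deque


def connected_components(nodes, edges):
    """Group nodes into connected components using breadth-first search."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(component)
    return components


nodes = ["division-of-labour", "market-extent", "capital-accumulation", "rent-theory"]
edges = [
    ("division-of-labour", "market-extent"),
    ("market-extent", "capital-accumulation"),
]
components = connected_components(nodes, edges)
print(len(components))
```

Here `rent-theory` links to nothing, so the collection splits into two components and would fail a threshold of at most one component.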
|
||||
|
||||
### Consistency
|
||||
|
||||
Are concepts defined in terms of each other without contradiction? Are
|
||||
there circular definition chains? Do definitions use terms that should
|
||||
be concepts but aren't? Consistency is checked through dependency graph
|
||||
analysis (cycles, undefined terms) and LLM pairwise judgment
|
||||
(do related definitions contradict each other?).
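
Circular definition chains can be found with the standard library alone. The sketch assumes a mapping from each concept to the concepts its definition relies on; `find_definition_cycle` is a hypothetical helper and reports only the first cycle that `graphlib` detects.

```python
from graphlib import CycleError, TopologicalSorter


def find_definition_cycle(depends_on):
    """Return one circular chain of concepts, or None if there is none.

    `depends_on` maps a concept to the set of concepts used in its definition.
    The returned list starts and ends with the same concept.
    """
    try:
        TopologicalSorter(depends_on).prepare()
    except CycleError as err:
        return err.args[1]   # graphlib stores the detected cycle here
    return None


circular = {
    "value": {"price"},     # value is defined in terms of price
    "price": {"value"},     # and price in terms of value
    "rent": {"price"},
}
acyclic = {"rent": {"price"}, "price": set()}

cycle = find_definition_cycle(circular)
print(cycle, find_definition_cycle(acyclic))
```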
|
||||
|
||||
### Granularity Balance
|
||||
|
||||
Are concepts at comparable levels of abstraction? An infospace that mixes
|
||||
broad theoretical principles with narrow observations — without
|
||||
acknowledging the difference — confuses more than it explains. Balance
|
||||
is assessed by classifying each concept's abstraction level and measuring
|
||||
the distribution.
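
One plausible way to measure the distribution is Shannon entropy over the abstraction labels. The sketch assumes entropy in bits and invented level names; how the levels are assigned, and whether the tooling uses this exact formula, is not specified here.

```python
import math
from collections import Counter


def granularity_entropy(levels):
    """Shannon entropy (in bits) of the abstraction-level distribution."""
    counts = Counter(levels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# One abstraction label per entity (labels are illustrative).
levels = ["principle", "mechanism", "mechanism", "observation"]
entropy = granularity_entropy(levels)
print(entropy)
```

A collection in which every concept sits at the same level scores 0, and the score rises as concepts spread more evenly across levels.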
|
||||
|
||||
---
|
||||
|
||||
## Infospaces as Organisms
|
||||
|
||||
The biological metaphor is deliberate. A viable organism maintains its
|
||||
identity while exchanging material with its environment. It has internal
|
||||
coherence (its parts work together), boundary integrity (it is
|
||||
distinguishable from its surroundings), and adaptive capacity (it
|
||||
responds to change).
|
||||
|
||||
Infospaces exhibit the same properties:
|
||||
|
||||
- **Internal coherence** — concepts connect and support each other
|
||||
- **Boundary** — the topic and discipline define what belongs and what
|
||||
doesn't
|
||||
- **Adaptation** — evaluation and refinement allow the infospace to
|
||||
improve
|
||||
|
||||
And like organisms, infospaces don't exist in isolation.
|
||||
|
||||
### Hierarchical Composition
|
||||
|
||||
One infospace can serve as a discipline for another. The VSM infospace
|
||||
provides the lens for the Wealth of Nations infospace, which could
|
||||
provide the lens for a supply chain infospace. Each layer adds structure
|
||||
and interpretive power. This is analogous to biological organisation:
|
||||
cells compose into tissues, tissues into organs, organs into organisms.
|
||||
|
||||
For this to work, the lower-level infospace must be viable — you can't
|
||||
build reliable analysis on a shaky foundation. A discipline that is
|
||||
incomplete or inconsistent will produce unreliable mappings.
|
||||
|
||||
### Network Composition
|
||||
|
||||
Infospaces can also relate laterally. Two infospaces at the same level
|
||||
might share concepts, reference each other's entities, or provide
|
||||
complementary views of overlapping domains. A Wealth of Nations infospace
|
||||
and a Marx's Capital infospace might share economic entities while
|
||||
differing in their analytical discipline.
|
||||
|
||||
This networked structure mirrors how knowledge actually works: fields
|
||||
overlap, vocabularies are shared and contested, and understanding grows
|
||||
by connecting islands of well-organised thought.
|
||||
|
||||
### Swarm Behaviour
|
||||
|
||||
When many infospaces exist and interact, emergent properties appear.
|
||||
Common entities across many infospaces become well-tested through
|
||||
repeated evaluation in different contexts. Concepts that survive across
|
||||
multiple disciplines are more likely to be fundamental. Gaps visible from
|
||||
one perspective may be filled by insights from another.
|
||||
|
||||
This is speculative territory for now, but the tooling should be designed
|
||||
with it in mind: infospaces as first-class, composable, addressable
|
||||
units of knowledge.
|
||||
|
||||
---
|
||||
|
||||
## The Role of Tooling
|
||||
|
||||
An infospace is a living artefact that requires ongoing maintenance. The
|
||||
tooling must support every phase of the lifecycle:
|
||||
|
||||
### Creating an infospace
|
||||
|
||||
Declaring a topic, binding disciplines, defining schemas and competency
|
||||
questions, setting viability thresholds. This should be a single
|
||||
configuration step, not a programming exercise.
|
||||
|
||||
### Populating an infospace
|
||||
|
||||
Processing source material through the extract-map pipeline, one unit at
|
||||
a time. Progress is tracked. Each addition is committed to version
|
||||
history.
|
||||
|
||||
### Evaluating an infospace
|
||||
|
||||
Running per-entity and collection-level checks. Producing structured,
|
||||
machine-readable scores. Comparing against viability thresholds.
|
||||
Identifying specific issues (this entity is redundant, this domain gap
|
||||
needs filling, these definitions contradict).
|
||||
|
||||
### Refining an infospace
|
||||
|
||||
Acting on evaluation results: archiving redundant entities, re-extracting
|
||||
with improved guidelines, updating schemas, re-evaluating. Every change
|
||||
is traceable.
|
||||
|
||||
### Composing infospaces
|
||||
|
||||
Binding one infospace as a discipline for another. Checking that the
|
||||
discipline is viable. Propagating changes when the discipline's concepts
|
||||
are updated.
|
||||
|
||||
### Monitoring an infospace
|
||||
|
||||
Tracking metrics over time. Seeing how coverage, coherence, and
|
||||
consistency evolve as content is added. Detecting regressions when a
|
||||
re-extraction reduces quality.
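
Regression detection can be as simple as comparing two metric snapshots. In this sketch the set of lower-is-better metrics, the tolerance parameter, and the snapshot values are assumptions; the metric names follow the viability thresholds used in the roadmap example.

```python
# Metrics where a smaller value is an improvement (assumed classification).
LOWER_IS_BETTER = {"redundancy_ratio", "consistency_cycles", "coherence_components"}


def regressions(previous, current, tolerance=0.0):
    """Return {metric: (old, new)} for every metric that got worse."""
    worse = {}
    for name, old in previous.items():
        new = current.get(name)
        if new is None:
            continue   # metric not measured this time: nothing to compare
        delta = new - old
        if name in LOWER_IS_BETTER:
            regressed = delta > tolerance
        else:
            regressed = delta < -tolerance
        if regressed:
            worse[name] = (old, new)
    return worse


# Invented snapshots before and after a re-extraction.
previous = {"redundancy_ratio": 0.04, "coverage_ratio": 0.70, "per_entity_mean": 3.8}
current = {"redundancy_ratio": 0.07, "coverage_ratio": 0.74, "per_entity_mean": 3.6}

worse = regressions(previous, current)
print(sorted(worse))
```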
|
||||
|
||||
The tooling should present these operations as simple, well-documented
|
||||
commands — not as infrastructure details. The user thinks in terms of
|
||||
"evaluate my infospace" and "check for redundancy", not in terms of
|
||||
embedding vectors and graph algorithms.
|
||||
|
||||
---
|
||||
|
||||
## Where We Are
|
||||
|
||||
We have built the first example infospace: 85 economic entities from
|
||||
Adam Smith's *The Wealth of Nations*, mapped to Beer's Viable System
|
||||
Model, with schemas, prompt templates, and a chapter-by-chapter
|
||||
pipeline.
|
||||
|
||||
This example has taught us what works (incremental extraction,
|
||||
deduplication, flat canonical entity sets, transclusion views) and what's
|
||||
missing (per-concept evaluation, collection-level checks, composition
|
||||
model, clean tooling commands).
|
||||
|
||||
The work ahead is to generalise from this example: build the platform
|
||||
capabilities needed, create the tooling layer that makes infospace
|
||||
operations accessible, and then revisit the example as both a validation
|
||||
and a tutorial.
|
||||
|
||||
The goal is that anyone with a body of source material and an analytical
|
||||
framework can create a viable infospace — and that infospaces, once
|
||||
built, become reusable intellectual tools for future work.
|
||||
0
tests/unit/analysis/__init__.py
Normal file
313
tests/unit/analysis/test_fca.py
Normal file
@@ -0,0 +1,313 @@
|
||||
"""Tests for markitect.analysis.fca."""
|
||||
|
||||
import pytest
|
||||
|
||||
from markitect.analysis.fca import (
|
||||
FormalContext,
|
||||
FormalConcept,
|
||||
ConceptLattice,
|
||||
find_gap_concepts,
|
||||
find_empty_cells,
|
||||
)
|
||||
|
||||
|
||||
# ── Test data ────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def _animal_context():
|
||||
"""Classic FCA example: animals × properties.
|
||||
|
||||
Context:
|
||||
| animal | legs | wings | feathers | fur |
|
||||
|-----------|------|-------|----------|-----|
|
||||
| dog | x | | | x |
|
||||
| cat | x | | | x |
|
||||
| eagle | x | x | x | |
|
||||
| sparrow | x | x | x | |
|
||||
| penguin | x | | x | |
|
||||
"""
|
||||
return FormalContext(
|
||||
objects=["dog", "cat", "eagle", "sparrow", "penguin"],
|
||||
attributes=["legs", "wings", "feathers", "fur"],
|
||||
incidence={
|
||||
"dog": {"legs", "fur"},
|
||||
"cat": {"legs", "fur"},
|
||||
"eagle": {"legs", "wings", "feathers"},
|
||||
"sparrow": {"legs", "wings", "feathers"},
|
||||
"penguin": {"legs", "feathers"},
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
def _infospace_context():
|
||||
"""Simplified infospace-style context: entities × {domain, vsm_system}.
|
||||
|
||||
Entities with domain and VSM classification, including a gap:
|
||||
no entity has domain:Exchange together with vsm:S1 or vsm:S3.
|
||||
"""
|
||||
return FormalContext.from_dict({
|
||||
"division-of-labour": {"domain:Production", "vsm:S1"},
|
||||
"pin-factory": {"domain:Production", "vsm:S1"},
|
||||
"market-extent": {"domain:Exchange", "vsm:S4"},
|
||||
"wage-determination": {"domain:Distribution", "vsm:S3"},
|
||||
"rent-theory": {"domain:Distribution", "vsm:S5"},
|
||||
"capital-accumulation": {"domain:Production", "vsm:S3"},
|
||||
})
|
||||
|
||||
|
||||
def _empty_context():
|
||||
"""Context with no objects."""
|
||||
return FormalContext([], ["a", "b"], {})
|
||||
|
||||
|
||||
def _single_entity():
|
||||
"""Context with one object."""
|
||||
return FormalContext(["only"], ["x", "y"], {"only": {"x", "y"}})
|
||||
|
||||
|
||||
# ── FormalContext ────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestFormalContext:
|
||||
def test_objects_sorted(self):
|
||||
ctx = _animal_context()
|
||||
assert ctx.objects == sorted(ctx.objects)
|
||||
|
||||
def test_attributes_sorted(self):
|
||||
ctx = _animal_context()
|
||||
assert ctx.attributes == sorted(ctx.attributes)
|
||||
|
||||
def test_object_count(self):
|
||||
assert _animal_context().object_count == 5
|
||||
|
||||
def test_attribute_count(self):
|
||||
assert _animal_context().attribute_count == 4
|
||||
|
||||
def test_extent_single_attr(self):
|
||||
ctx = _animal_context()
|
||||
assert ctx.extent(["fur"]) == frozenset({"dog", "cat"})
|
||||
|
||||
def test_extent_multiple_attrs(self):
|
||||
ctx = _animal_context()
|
||||
assert ctx.extent(["wings", "feathers"]) == frozenset({"eagle", "sparrow"})
|
||||
|
||||
def test_extent_empty_returns_all(self):
|
||||
ctx = _animal_context()
|
||||
assert ctx.extent([]) == frozenset(ctx.objects)
|
||||
|
||||
def test_extent_no_match(self):
|
||||
ctx = _animal_context()
|
||||
assert ctx.extent(["fur", "feathers"]) == frozenset()
|
||||
|
||||
def test_intent_single_obj(self):
|
||||
ctx = _animal_context()
|
||||
assert ctx.intent(["penguin"]) == frozenset({"legs", "feathers"})
|
||||
|
||||
def test_intent_multiple_objs(self):
|
||||
ctx = _animal_context()
|
||||
# dog and cat share: legs, fur
|
||||
assert ctx.intent(["dog", "cat"]) == frozenset({"legs", "fur"})
|
||||
|
||||
def test_intent_empty_returns_all(self):
|
||||
ctx = _animal_context()
|
||||
assert ctx.intent([]) == frozenset(ctx.attributes)
|
||||
|
||||
def test_closure_is_idempotent(self):
|
||||
ctx = _animal_context()
|
||||
c1 = ctx.closure({"fur"})
|
||||
c2 = ctx.closure(c1)
|
||||
assert c1 == c2
|
||||
|
||||
def test_closure_expands(self):
|
||||
ctx = _animal_context()
|
||||
# fur → {dog, cat} → {legs, fur} (both have legs too)
|
||||
assert ctx.closure({"fur"}) == frozenset({"legs", "fur"})
|
||||
|
||||
def test_has_attribute(self):
|
||||
ctx = _animal_context()
|
||||
assert ctx.has_attribute("dog", "legs") is True
|
||||
assert ctx.has_attribute("dog", "wings") is False
|
||||
|
||||
def test_density(self):
|
||||
ctx = _animal_context()
|
||||
# 5 objects × 4 attributes = 20 cells
|
||||
# dog:2, cat:2, eagle:3, sparrow:3, penguin:2 = 12 filled
|
||||
assert ctx.density() == pytest.approx(12 / 20)
|
||||
|
||||
def test_density_empty(self):
|
||||
assert FormalContext([], [], {}).density() == 0.0
|
||||
|
||||
def test_from_dict(self):
|
||||
ctx = FormalContext.from_dict({
|
||||
"a": {"x", "y"},
|
||||
"b": {"y", "z"},
|
||||
})
|
||||
assert ctx.object_count == 2
|
||||
assert ctx.attribute_count == 3
|
||||
|
||||
def test_unknown_attributes_ignored(self):
|
||||
ctx = FormalContext(
|
||||
["a"], ["x"], {"a": {"x", "unknown"}}
|
||||
)
|
||||
assert ctx.intent(["a"]) == frozenset({"x"})
|
||||
|
||||
|
||||
# ── ConceptLattice ──────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestConceptLattice:
|
||||
def test_animal_concept_count(self):
|
||||
ctx = _animal_context()
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
# Known: the animal context produces exactly 5 formal concepts
|
||||
# Top: ({all}, {legs}), Bottom: ({}, {all 4}),
|
||||
# plus intermediate concepts
|
||||
assert lattice.size >= 5
|
||||
|
||||
def test_top_has_all_objects(self):
|
||||
ctx = _animal_context()
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
top = lattice.top
|
||||
assert top is not None
|
||||
assert top.extent == frozenset(ctx.objects)
|
||||
|
||||
def test_top_intent_is_common_attributes(self):
|
||||
ctx = _animal_context()
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
top = lattice.top
|
||||
# All animals have "legs"
|
||||
assert "legs" in top.intent
|
||||
|
||||
def test_bottom_has_all_attributes(self):
|
||||
ctx = _animal_context()
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
bottom = lattice.bottom
|
||||
assert bottom is not None
|
||||
assert bottom.intent == frozenset(ctx.attributes)
|
||||
|
||||
def test_bottom_extent_empty_when_no_universal_object(self):
|
||||
ctx = _animal_context()
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
bottom = lattice.bottom
|
||||
# No animal has all 4 attributes
|
||||
assert bottom.extent_size == 0
|
||||
|
||||
def test_all_concepts_are_closed(self):
|
||||
ctx = _animal_context()
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
for concept in lattice.concepts:
|
||||
# intent should be closed: closure(intent) == intent
|
||||
assert ctx.closure(concept.intent) == concept.intent
|
||||
# extent' should equal intent
|
||||
assert ctx.intent(concept.extent) == concept.intent
|
||||
# intent' should equal extent
|
||||
assert ctx.extent(concept.intent) == concept.extent
|
||||
|
||||
def test_empty_context(self):
|
||||
ctx = _empty_context()
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
# Empty context → gap concepts for all attribute combinations
|
||||
assert lattice.size >= 1
|
||||
|
||||
def test_single_entity(self):
|
||||
ctx = _single_entity()
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
# At least 1 concept containing the single entity
|
||||
has_entity = any(
|
||||
"only" in c.extent for c in lattice.concepts
|
||||
)
|
||||
assert has_entity
|
||||
|
||||
def test_no_attributes_produces_one_concept(self):
|
||||
ctx = FormalContext(["a", "b"], [], {})
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
assert lattice.size == 1
|
||||
assert lattice.concepts[0].extent == frozenset({"a", "b"})
|
||||
|
||||
def test_depth(self):
|
||||
ctx = _animal_context()
|
||||
lattice = ConceptLattice.from_context(ctx)
|
||||
d = lattice.depth()
|
||||
# At least 2 levels (top → bottom)
|
||||
assert d >= 2
|
||||
|
||||
def test_depth_empty(self):
|
||||
lattice = ConceptLattice(concepts=[])
|
||||
assert lattice.depth() == 0
|
||||
|
||||
|
||||
# ── Gap concepts ────────────────────────────────────────────────────


class TestGapConcepts:
    def test_animal_has_gap(self):
        ctx = _animal_context()
        gaps = find_gap_concepts(ctx)
        # {fur, feathers} has no animal → gap concept
        fur_feathers_gap = any(
            {"fur", "feathers"} <= c.intent for c in gaps
        )
        assert fur_feathers_gap

    def test_gap_extents_are_empty(self):
        ctx = _animal_context()
        gaps = find_gap_concepts(ctx)
        for gap in gaps:
            assert gap.extent_size == 0

    def test_no_gaps_when_all_combinations_covered(self):
        # Every attribute combination has at least one object
        ctx = FormalContext.from_dict({
            "obj1": {"a", "b"},
            "obj2": {"a"},
            "obj3": {"b"},
        })
        lattice = ConceptLattice.from_context(ctx)
        gaps = find_gap_concepts(ctx, lattice)
        assert len(gaps) == 0

    def test_sorted_by_intent_size(self):
        ctx = _animal_context()
        gaps = find_gap_concepts(ctx)
        sizes = [g.intent_size for g in gaps]
        assert sizes == sorted(sizes)

    def test_infospace_gap(self):
        ctx = _infospace_context()
        gaps = find_gap_concepts(ctx)
        # domain:Exchange + vsm:S1 has no entity → should appear as gap
        gap_intents = [g.intent for g in gaps]
        exchange_s1_covered = any(
            {"domain:Exchange", "vsm:S1"} <= intent for intent in gap_intents
        )
        assert exchange_s1_covered


# ── Empty cells (cross-tab) ─────────────────────────────────────────


class TestFindEmptyCells:
    def test_finds_empty_cells(self):
        ctx = _infospace_context()
        domains = ["domain:Production", "domain:Distribution", "domain:Exchange"]
        vsm_systems = ["vsm:S1", "vsm:S3", "vsm:S4", "vsm:S5"]
        empty = find_empty_cells(ctx, domains, vsm_systems)
        # domain:Exchange + vsm:S1 should be empty
        assert ("domain:Exchange", "vsm:S1") in empty
        # domain:Production + vsm:S1 should NOT be empty (division-of-labour)
        assert ("domain:Production", "vsm:S1") not in empty

    def test_all_filled_returns_empty_list(self):
        ctx = FormalContext.from_dict({
            "a": {"x", "y"},
            "b": {"x", "z"},
            "c": {"y", "z"},
            "d": {"x", "y", "z"},
        })
        empty = find_empty_cells(ctx, ["x", "y"], ["z"])
        assert empty == []

    def test_empty_context_all_cells_empty(self):
        ctx = FormalContext([], ["a", "b", "c"], {})
        empty = find_empty_cells(ctx, ["a"], ["b", "c"])
        assert len(empty) == 2
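The lattice and gap tests above rely on the two derivation operators of formal concept analysis: `intent` (attributes shared by a set of objects) and `extent` (objects having a set of attributes). The following is a minimal sketch of those operators on a plain dictionary, not the `FormalContext` implementation from this change. The function signatures, `TOY_ATTRIBUTES`, and `toy_context` are assumptions made up for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy data: object -> attributes it has (not the fixture used in the tests).
TOY_ATTRIBUTES: Set[str] = {"fur", "feathers", "four_legs", "wings"}
toy_context: Dict[str, Set[str]] = {
    "dog": {"fur", "four_legs"},
    "cat": {"fur", "four_legs"},
    "eagle": {"feathers", "wings"},
    "bat": {"fur", "wings"},
}


def intent(ctx: Dict[str, Set[str]], all_attributes: Iterable[str],
           objects: Iterable[str]) -> FrozenSet[str]:
    """Attributes shared by every object; all attributes for an empty object set."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= ctx[obj]
    return frozenset(shared)


def extent(ctx: Dict[str, Set[str]], attrs: Iterable[str]) -> FrozenSet[str]:
    """Objects that carry every requested attribute; all objects for no attributes."""
    wanted = set(attrs)
    return frozenset(obj for obj, has in ctx.items() if wanted <= has)


def closure(ctx: Dict[str, Set[str]], all_attributes: Iterable[str],
            attrs: Iterable[str]) -> FrozenSet[str]:
    """Close an attribute set: intent(extent(attrs))."""
    return intent(ctx, all_attributes, extent(ctx, attrs))


if __name__ == "__main__":
    # No toy animal has both fur and feathers, so the extent is empty (a "gap").
    print(sorted(extent(toy_context, {"fur", "feathers"})))
    # Everything with four legs also has fur, so the closure adds "fur".
    print(sorted(closure(toy_context, TOY_ATTRIBUTES, {"four_legs"})))
```

Under this reading, a concept is a pair whose intent is closed, and a gap concept is a closed intent whose extent is empty, which is what `test_gap_extents_are_empty` checks.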
254
tests/unit/analysis/test_graph.py
Normal file
@@ -0,0 +1,254 @@
"""Tests for markitect.analysis.graph."""

import pytest

nx = pytest.importorskip("networkx", reason="networkx not installed")

from markitect.prompts.dependencies.models import DependencyGraph, EdgeType
from markitect.analysis.graph import (
    to_networkx,
    connected_components,
    betweenness_centrality,
    detect_communities,
    modularity_score,
    degree_distribution,
    cohesion_coupling,
)


# ── Helpers ──────────────────────────────────────────────────────────


def _linear_graph():
    """A -> B -> C -> D (simple chain)."""
    g = DependencyGraph()
    g.add_edge("A", "B")
    g.add_edge("B", "C")
    g.add_edge("C", "D")
    return g


def _two_clusters():
    """Two dense clusters connected by a single bridge edge.

    Cluster 1: A -- B -- C (fully connected)
    Cluster 2: X -- Y -- Z (fully connected)
    Bridge: C -> X
    """
    g = DependencyGraph()
    # Cluster 1
    g.add_edge("A", "B")
    g.add_edge("B", "A")
    g.add_edge("B", "C")
    g.add_edge("C", "B")
    g.add_edge("A", "C")
    g.add_edge("C", "A")
    # Cluster 2
    g.add_edge("X", "Y")
    g.add_edge("Y", "X")
    g.add_edge("Y", "Z")
    g.add_edge("Z", "Y")
    g.add_edge("X", "Z")
    g.add_edge("Z", "X")
    # Bridge
    g.add_edge("C", "X")
    return g


def _disconnected_graph():
    """Two separate components: {A, B} and {X, Y}."""
    g = DependencyGraph()
    g.add_edge("A", "B")
    g.add_edge("X", "Y")
    return g


def _empty_graph():
    """Graph with no nodes or edges."""
    return DependencyGraph()


def _isolated_nodes():
    """Graph with nodes but no edges."""
    g = DependencyGraph()
    # add_edge creates both nodes, so we use two separate edges
    # and then extract a subgraph with isolated nodes
    g.add_edge("A", "B")
    return g.get_subgraph({"A", "B", "C"})


# ── to_networkx ─────────────────────────────────────────────────────


class TestToNetworkx:
    def test_preserves_nodes(self):
        g = _linear_graph()
        G = to_networkx(g)
        assert set(G.nodes) == {"A", "B", "C", "D"}

    def test_preserves_edges(self):
        g = _linear_graph()
        G = to_networkx(g)
        assert G.has_edge("A", "B")
        assert G.has_edge("B", "C")
        assert not G.has_edge("D", "A")

    def test_preserves_edge_type(self):
        g = DependencyGraph()
        g.add_edge("A", "B", EdgeType.GENERATES)
        G = to_networkx(g)
        assert G.edges["A", "B"]["edge_type"] == "generates"

    def test_empty_graph(self):
        G = to_networkx(_empty_graph())
        assert len(G.nodes) == 0
        assert len(G.edges) == 0


# ── Connected components ────────────────────────────────────────────


class TestConnectedComponents:
    def test_single_component(self):
        comps = connected_components(_linear_graph())
        assert len(comps) == 1
        assert comps[0] == {"A", "B", "C", "D"}

    def test_two_components(self):
        comps = connected_components(_disconnected_graph())
        assert len(comps) == 2
        node_sets = [frozenset(c) for c in comps]
        assert frozenset({"A", "B"}) in node_sets
        assert frozenset({"X", "Y"}) in node_sets

    def test_sorted_largest_first(self):
        g = DependencyGraph()
        g.add_edge("A", "B")
        g.add_edge("B", "C")
        g.add_edge("X", "Y")
        comps = connected_components(g)
        assert len(comps[0]) >= len(comps[1])

    def test_empty_graph(self):
        assert connected_components(_empty_graph()) == []


# ── Betweenness centrality ──────────────────────────────────────────


class TestBetweennessCentrality:
    def test_linear_chain_middle_node_highest(self):
        g = _linear_graph()
        bc = betweenness_centrality(g)
        # B and C are on all shortest paths between endpoints
        assert bc["B"] > bc["A"]
        assert bc["C"] > bc["D"]

    def test_values_in_range(self):
        bc = betweenness_centrality(_two_clusters())
        for v in bc.values():
            assert 0.0 <= v <= 1.0

    def test_empty_graph(self):
        assert betweenness_centrality(_empty_graph()) == {}


# ── Community detection ─────────────────────────────────────────────


class TestDetectCommunities:
    def test_two_clusters_detected(self):
        comms = detect_communities(_two_clusters(), seed=42)
        # Should detect at least 2 communities
        assert len(comms) >= 2
        # Together the communities cover every node
        all_nodes = set()
        for c in comms:
            all_nodes.update(c)
        assert all_nodes == {"A", "B", "C", "X", "Y", "Z"}

    def test_deterministic_with_seed(self):
        g = _two_clusters()
        c1 = detect_communities(g, seed=42)
        c2 = detect_communities(g, seed=42)
        assert c1 == c2

    def test_empty_graph(self):
        assert detect_communities(_empty_graph()) == []

    def test_sorted_largest_first(self):
        comms = detect_communities(_two_clusters(), seed=42)
        sizes = [len(c) for c in comms]
        assert sizes == sorted(sizes, reverse=True)


# ── Modularity score ────────────────────────────────────────────────


class TestModularityScore:
    def test_no_edges_returns_zero(self):
        assert modularity_score(_empty_graph()) == 0.0

    def test_two_clusters_positive(self):
        g = _two_clusters()
        comms = [{"A", "B", "C"}, {"X", "Y", "Z"}]
        score = modularity_score(g, communities=comms)
        assert score > 0.0

    def test_single_community_near_zero(self):
        g = _two_clusters()
        all_nodes = {"A", "B", "C", "X", "Y", "Z"}
        score = modularity_score(g, communities=[all_nodes])
        assert score == pytest.approx(0.0, abs=1e-10)


# ── Degree distribution ─────────────────────────────────────────────


class TestDegreeDistribution:
    def test_linear_chain(self):
        dd = degree_distribution(_linear_graph())
        # A: out=1 in=0; B: out=1 in=1; D: out=0 in=1
        assert dd["A"]["out_degree"] == 1
        assert dd["A"]["in_degree"] == 0
        assert dd["B"]["in_degree"] == 1
        assert dd["B"]["out_degree"] == 1
        assert dd["D"]["in_degree"] == 1
        assert dd["D"]["out_degree"] == 0

    def test_total_degree(self):
        dd = degree_distribution(_linear_graph())
        for node, degrees in dd.items():
            assert degrees["total_degree"] == degrees["in_degree"] + degrees["out_degree"]

    def test_empty_graph(self):
        assert degree_distribution(_empty_graph()) == {}


# ── Cohesion / coupling ─────────────────────────────────────────────


class TestCohesionCoupling:
    def test_two_clusters_with_bridge(self):
        g = _two_clusters()
        comms = [{"A", "B", "C"}, {"X", "Y", "Z"}]
        cc = cohesion_coupling(g, communities=comms)
        # 12 intra-cluster edges + 1 bridge = 13 total
        assert cc["intra_edges"] == 12
        assert cc["inter_edges"] == 1
        assert cc["total_edges"] == 13
        assert cc["cohesion"] == pytest.approx(12 / 13)
        assert cc["coupling"] == pytest.approx(1 / 13)
        assert cc["communities"] == 2

    def test_no_edges(self):
        cc = cohesion_coupling(_empty_graph())
        assert cc["cohesion"] == 0.0
        assert cc["coupling"] == 0.0
        assert cc["total_edges"] == 0

    def test_ratios_sum_to_one(self):
        g = _two_clusters()
        comms = [{"A", "B", "C"}, {"X", "Y", "Z"}]
        cc = cohesion_coupling(g, communities=comms)
        assert cc["cohesion"] + cc["coupling"] == pytest.approx(1.0)
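The cohesion and coupling tests above expect edge counts split by whether both endpoints fall in the same community (12 intra-cluster edges and 1 bridge for the two-cluster fixture). The sketch below shows one way such a metric could be computed from a plain edge list. It is an illustration only: `cohesion_coupling_sketch`, `two_cluster_edges`, and `clusters` are hypothetical names, and the real `cohesion_coupling` in `markitect.analysis.graph` may be implemented differently.

```python
from typing import Dict, List, Set, Tuple


def cohesion_coupling_sketch(
    edges: List[Tuple[str, str]], communities: List[Set[str]]
) -> Dict[str, float]:
    """Count directed edges inside vs. between communities and report the ratios."""
    # Map every node to the index of the community that contains it.
    membership = {node: idx for idx, comm in enumerate(communities) for node in comm}
    intra = 0
    inter = 0
    for src, dst in edges:
        src_comm = membership.get(src)
        if src_comm is not None and src_comm == membership.get(dst):
            intra += 1
        else:
            inter += 1
    total = intra + inter
    return {
        "intra_edges": intra,
        "inter_edges": inter,
        "total_edges": total,
        # Define both ratios as 0.0 for an edgeless graph to avoid dividing by zero.
        "cohesion": intra / total if total else 0.0,
        "coupling": inter / total if total else 0.0,
        "communities": len(communities),
    }


# Same shape as the two-cluster fixture: two bidirectional triangles plus one bridge.
two_cluster_edges = [
    ("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("A", "C"), ("C", "A"),
    ("X", "Y"), ("Y", "X"), ("Y", "Z"), ("Z", "Y"), ("X", "Z"), ("Z", "X"),
    ("C", "X"),
]
clusters = [{"A", "B", "C"}, {"X", "Y", "Z"}]

if __name__ == "__main__":
    print(cohesion_coupling_sketch(two_cluster_edges, clusters))
```

Because every edge is counted as either intra or inter, cohesion and coupling sum to one whenever the graph has at least one edge, which matches `test_ratios_sum_to_one`.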
0
tests/unit/core/__init__.py
Normal file
137
tests/unit/core/test_section_tree.py
Normal file
@@ -0,0 +1,137 @@
"""Tests for markitect.core.section_tree."""

from markitect.core.parser import parse_markdown_to_ast
from markitect.core.section_tree import (
    build_section_tree,
    extract_heading_content,
    extract_heading_level,
    extract_section_text,
    slugify,
)


class TestSlugify:
    def test_simple_text(self):
        assert slugify("Hello World") == "hello_world"

    def test_german_umlauts(self):
        assert slugify("Ärger mit Über") == "aerger_mit_ueber"

    def test_special_characters(self):
        assert slugify("Smith's Original Wording") == "smith_s_original_wording"

    def test_empty_string(self):
        assert slugify("") == "feld"

    def test_trailing_underscores_stripped(self):
        assert slugify("--hello--") == "hello"

    def test_multiple_spaces(self):
        assert slugify("a   b") == "a_b"


class TestExtractHeadingLevel:
    def test_h1(self):
        assert extract_heading_level("h1") == 1

    def test_h6(self):
        assert extract_heading_level("h6") == 6

    def test_invalid_tag(self):
        assert extract_heading_level("p") == 1

    def test_empty(self):
        assert extract_heading_level("") == 1


class TestExtractHeadingContent:
    def test_finds_inline_token(self):
        tokens = [
            {"type": "heading_open", "tag": "h1"},
            {"type": "inline", "content": "Hello"},
            {"type": "heading_close", "tag": "h1"},
        ]
        assert extract_heading_content(tokens, 0) == "Hello"

    def test_no_inline(self):
        tokens = [
            {"type": "heading_open", "tag": "h1"},
            {"type": "heading_close", "tag": "h1"},
        ]
        assert extract_heading_content(tokens, 0) == ""


class TestBuildSectionTree:
    def test_single_heading(self):
        md = "# Title\n\nSome text."
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)

        assert tree["level"] == 0
        assert len(tree["children"]) == 1
        assert tree["children"][0]["heading"] == "Title"
        assert tree["children"][0]["level"] == 1

    def test_nested_headings(self):
        md = "# Top\n\n## Sub\n\ntext\n\n## Sub2\n\nmore"
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)

        top = tree["children"][0]
        assert top["heading"] == "Top"
        assert len(top["children"]) == 2
        assert top["children"][0]["heading"] == "Sub"
        assert top["children"][1]["heading"] == "Sub2"

    def test_max_depth(self):
        md = "# Top\n\n## Sub\n\n### Deep\n\ntext"
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens, max_depth=2)

        top = tree["children"][0]
        sub = top["children"][0]
        # H3 should be excluded from tree
        assert len(sub["children"]) == 0

    def test_content_tokens_captured(self):
        md = "# Title\n\nParagraph text here."
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)

        section = tree["children"][0]
        inline_tokens = [t for t in section["content_tokens"] if t.get("type") == "inline"]
        assert len(inline_tokens) == 1
        assert "Paragraph text here" in inline_tokens[0]["content"]

    def test_slug_assigned(self):
        md = "# Economic Domain\n\ntext"
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)

        assert tree["children"][0]["slug"] == "economic_domain"

    def test_empty_document(self):
        tokens = parse_markdown_to_ast("")
        tree = build_section_tree(tokens)
        assert tree["children"] == []


class TestExtractSectionText:
    def test_simple_paragraph(self):
        md = "# Title\n\nHello world."
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)
        text = extract_section_text(tree["children"][0])
        assert text == "Hello world."

    def test_multiple_paragraphs(self):
        md = "# Title\n\nFirst paragraph.\n\nSecond paragraph."
        tokens = parse_markdown_to_ast(md)
        tree = build_section_tree(tokens)
        text = extract_section_text(tree["children"][0])
        assert "First paragraph." in text
        assert "Second paragraph." in text

    def test_empty_section(self):
        section = {"content_tokens": []}
        assert extract_section_text(section) == ""
0
tests/unit/infospace/__init__.py
Normal file
413
tests/unit/infospace/test_checks.py
Normal file
@@ -0,0 +1,413 @@
"""
Tests for collection-level quality checks (S2.4).

Covers all five concerns: Redundancy (C1), Coverage (C2), Coherence (C3),
Consistency (C4), Granularity (C5), and the orchestrator.
"""

from __future__ import annotations

import math

import pytest

from markitect.infospace.models import EntityMeta
from markitect.prompts.dependencies.models import DependencyGraph


# ── helpers ──────────────────────────────────────────────────────────


def _entity(slug: str, domain: str = "", definition: str = "",
            source_chapter: str = "", word_count: int = 0) -> EntityMeta:
    wc = word_count if word_count else (len(definition.split()) if definition else 0)
    return EntityMeta(
        slug=slug,
        title=slug.replace("-", " ").title(),
        h1_raw=slug.replace("-", " ").title(),
        definition=definition,
        domain=domain,
        source_chapter=source_chapter,
        definition_word_count=wc,
        total_word_count=wc,
    )


def _sample_entities() -> list[EntityMeta]:
    return [
        _entity("alpha", domain="economics", definition="the first concept in our model", source_chapter="ch01"),
        _entity("beta", domain="economics", definition="the second concept about markets", source_chapter="ch01"),
        _entity("gamma", domain="sociology", definition="a social structure framework", source_chapter="ch02"),
        _entity("delta", domain="sociology", definition="a social dynamic pattern", source_chapter="ch02"),
        _entity("epsilon", domain="philosophy", definition="an epistemic principle", source_chapter="ch03"),
    ]


def _linear_graph() -> DependencyGraph:
    """A -> B -> C -> D."""
    g = DependencyGraph()
    g.add_edge("A", "B")
    g.add_edge("B", "C")
    g.add_edge("C", "D")
    return g


def _cyclic_graph() -> DependencyGraph:
    """A -> B -> C -> A (one cycle)."""
    g = DependencyGraph()
    g.add_edge("A", "B")
    g.add_edge("B", "C")
    g.add_edge("C", "A")
    return g


def _can_import_graph_analysis():
    try:
        from markitect.analysis.graph import connected_components  # noqa: F401
        return True
    except ImportError:
        return False


# ── C1: Redundancy ──────────────────────────────────────────────────


class TestRedundancy:
    def test_empty_entities(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        report = check_redundancy([])
        assert report.entity_count == 0
        assert report.redundancy_ratio == 0.0
        assert report.similar_pairs == []

    def test_single_entity(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        report = check_redundancy([_entity("a", definition="hello world")])
        assert report.entity_count == 1
        assert report.redundancy_ratio == 0.0

    def test_no_overlap_word_fallback(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        entities = [
            _entity("a", definition="apple banana cherry"),
            _entity("b", definition="delta epsilon zeta"),
        ]
        report = check_redundancy(entities, threshold=0.5)
        assert report.similar_pairs == []
        assert report.redundancy_ratio == 0.0

    def test_high_overlap_word_fallback(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        entities = [
            _entity("a", definition="the quick brown fox"),
            _entity("b", definition="the quick brown dog"),
        ]
        report = check_redundancy(entities, threshold=0.5)
        assert len(report.similar_pairs) == 1
        assert report.similar_pairs[0]["method"] == "word_overlap"
        assert report.similar_pairs[0]["entity_a"] == "a"
        assert report.similar_pairs[0]["entity_b"] == "b"
        assert report.redundancy_ratio == 1.0  # both entities involved

    def test_embedding_based(self):
        from markitect.infospace.checks.redundancy import check_redundancy
        entities = [
            _entity("a", definition="x"),
            _entity("b", definition="y"),
            _entity("c", definition="z"),
        ]
        # a and b are very similar; c is different
        embeddings = {
            "a": [1.0, 0.0, 0.0],
            "b": [0.99, 0.1, 0.0],
            "c": [0.0, 0.0, 1.0],
        }
        report = check_redundancy(entities, embeddings=embeddings, threshold=0.9)
        assert len(report.similar_pairs) >= 1
        assert report.similar_pairs[0]["method"] == "embedding"
        assert report.redundancy_ratio > 0.0

    def test_to_dict(self):
        from markitect.infospace.checks.redundancy import RedundancyReport
        r = RedundancyReport(similar_pairs=[], redundancy_ratio=0.25, entity_count=10)
        d = r.to_dict()
        assert d["concern"] == "C1"
        assert d["redundancy_ratio"] == 0.25
        assert d["entity_count"] == 10


# ── C2: Coverage ────────────────────────────────────────────────────


class TestCoverage:
    def test_empty_entities(self):
        from markitect.infospace.checks.coverage import check_coverage
        report = check_coverage([])
        assert report.entity_count == 0
        assert report.coverage_ratio == 0.0

    def test_full_coverage(self):
        """All domain×chapter cells are populated."""
        from markitect.infospace.checks.coverage import check_coverage
        entities = [
            _entity("a", domain="d1", source_chapter="ch1"),
            _entity("b", domain="d2", source_chapter="ch1"),
            _entity("c", domain="d1", source_chapter="ch2"),
            _entity("d", domain="d2", source_chapter="ch2"),
        ]
        report = check_coverage(entities)
        assert report.coverage_ratio == 1.0
        assert report.empty_cells == []

    def test_partial_coverage(self):
        """One cell is missing → coverage < 1.0."""
        from markitect.infospace.checks.coverage import check_coverage
        entities = [
            _entity("a", domain="d1", source_chapter="ch1"),
            _entity("b", domain="d2", source_chapter="ch1"),
            _entity("c", domain="d1", source_chapter="ch2"),
            # Missing: d2×ch2
        ]
        report = check_coverage(entities)
        assert report.coverage_ratio < 1.0
        assert len(report.empty_cells) == 1
        assert report.empty_cells[0]["dimension_a"] == "domain:d2"
        assert report.empty_cells[0]["dimension_b"] == "chapter:ch2"

    def test_domain_counts(self):
        from markitect.infospace.checks.coverage import check_coverage
        entities = _sample_entities()
        report = check_coverage(entities)
        assert report.domain_counts["economics"] == 2
        assert report.domain_counts["sociology"] == 2
        assert report.domain_counts["philosophy"] == 1

    def test_to_dict(self):
        from markitect.infospace.checks.coverage import CoverageReport
        r = CoverageReport(coverage_ratio=0.75, entity_count=8)
        d = r.to_dict()
        assert d["concern"] == "C2"
        assert d["coverage_ratio"] == 0.75

    def test_extra_attributes(self):
        from markitect.infospace.checks.coverage import check_coverage
        entities = [
            _entity("a", domain="d1", source_chapter="ch1"),
        ]
        extra = {"a": {"vsm:production"}}
        report = check_coverage(entities, extra_attributes=extra)
        assert report.entity_count == 1


# ── C3: Coherence ───────────────────────────────────────────────────


class TestCoherence:
    def test_no_graph(self):
        from markitect.infospace.checks.coherence import check_coherence
        report = check_coherence(graph=None, entity_count=5)
        assert report.connected_components == 0
        assert report.entity_count == 5

    def test_empty_graph(self):
        from markitect.infospace.checks.coherence import check_coherence
        g = DependencyGraph()
        report = check_coherence(graph=g, entity_count=0)
        assert report.connected_components == 0

    def test_to_dict(self):
        from markitect.infospace.checks.coherence import CoherenceReport
        r = CoherenceReport(connected_components=2, modularity=0.3456, entity_count=10)
        d = r.to_dict()
        assert d["concern"] == "C3"
        assert d["modularity"] == 0.3456
        assert d["connected_components"] == 2

    @pytest.mark.skipif(
        not _can_import_graph_analysis(),
        reason="networkx not available",
    )
    def test_with_graph(self):
        from markitect.infospace.checks.coherence import check_coherence
        g = _linear_graph()
        report = check_coherence(graph=g, entity_count=4)
        assert report.connected_components >= 1
        assert report.entity_count == 4


# ── C4: Consistency ─────────────────────────────────────────────────


class TestConsistency:
    def test_no_graph(self):
        from markitect.infospace.checks.consistency import check_consistency
        entities = _sample_entities()
        report = check_consistency(entities)
        assert report.cycle_count == 0
        assert report.entity_count == 5

    def test_acyclic_graph(self):
        from markitect.infospace.checks.consistency import check_consistency
        entities = _sample_entities()
        g = _linear_graph()
        report = check_consistency(entities, graph=g)
        assert report.cycle_count == 0

    def test_cyclic_graph(self):
        from markitect.infospace.checks.consistency import check_consistency
        entities = _sample_entities()
        g = _cyclic_graph()
        report = check_consistency(entities, graph=g)
        assert report.cycle_count >= 1
        assert len(report.cycles) >= 1

    def test_to_dict(self):
        from markitect.infospace.checks.consistency import ConsistencyReport
        r = ConsistencyReport(cycles=[["A", "B", "A"]], cycle_count=1, entity_count=5)
        d = r.to_dict()
        assert d["concern"] == "C4"
        assert d["cycle_count"] == 1


# ── C5: Granularity ─────────────────────────────────────────────────


class TestGranularity:
    def test_empty_entities(self):
        from markitect.infospace.checks.granularity import check_granularity
        report = check_granularity([])
        assert report.entity_count == 0
        assert report.domain_entropy == 0.0

    def test_single_domain(self):
        from markitect.infospace.checks.granularity import check_granularity
        entities = [
            _entity("a", domain="d1", word_count=10),
            _entity("b", domain="d1", word_count=20),
        ]
        report = check_granularity(entities)
        assert report.domain_entropy == 0.0  # single domain = zero entropy
        assert report.entity_count == 2
        assert report.word_count_stats["mean"] == 15.0

    def test_balanced_domains(self):
        from markitect.infospace.checks.granularity import check_granularity
        entities = [
            _entity("a", domain="d1", word_count=10),
            _entity("b", domain="d2", word_count=10),
        ]
        report = check_granularity(entities)
        assert report.domain_entropy == pytest.approx(1.0)  # log2(2) = 1.0
        assert report.domain_distribution == {"d1": 1, "d2": 1}

    def test_word_count_stats(self):
        from markitect.infospace.checks.granularity import check_granularity
        entities = [
            _entity("a", domain="d1", word_count=10),
            _entity("b", domain="d1", word_count=30),
        ]
        report = check_granularity(entities)
        assert report.word_count_stats["mean"] == 20.0
        assert report.word_count_stats["min"] == 10.0
        assert report.word_count_stats["max"] == 30.0
        assert report.word_count_stats["std"] == 10.0

    def test_to_dict(self):
        from markitect.infospace.checks.granularity import GranularityReport
        r = GranularityReport(domain_entropy=1.5, entity_count=4)
        d = r.to_dict()
        assert d["concern"] == "C5"
        assert d["domain_entropy"] == 1.5

    def test_unspecified_domain(self):
        from markitect.infospace.checks.granularity import check_granularity
        entities = [_entity("a", domain="", word_count=10)]
        report = check_granularity(entities)
        assert "(unspecified)" in report.domain_distribution


# ── Orchestrator ────────────────────────────────────────────────────


class TestOrchestrator:
    def test_run_all_default(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        report = run_all_checks(entities)
        assert report.redundancy is not None
        assert report.coverage is not None
        assert report.coherence is not None
        assert report.consistency is not None
        assert report.granularity is not None

    def test_run_selected_checks(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        report = run_all_checks(entities, checks=["redundancy", "granularity"])
        assert report.redundancy is not None
        assert report.granularity is not None
        assert report.coverage is None
        assert report.coherence is None
        assert report.consistency is None

    def test_to_dict(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        report = run_all_checks(entities, checks=["granularity"])
        d = report.to_dict()
        assert "granularity" in d
        assert "redundancy" not in d

    def test_metrics(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        report = run_all_checks(entities, checks=["redundancy", "granularity"])
        m = report.metrics()
        assert "redundancy_ratio" in m
        assert "granularity_entropy" in m
        assert isinstance(m["redundancy_ratio"], float)
        assert isinstance(m["granularity_entropy"], float)

    def test_metrics_empty_report(self):
        from markitect.infospace.checks.orchestrator import CheckReport
        report = CheckReport()
        assert report.metrics() == {}

    def test_run_all_with_graph(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        g = _linear_graph()
        report = run_all_checks(entities, graph=g, checks=["consistency"])
        assert report.consistency is not None
        assert report.consistency.cycle_count == 0

    def test_run_all_with_cyclic_graph(self):
        from markitect.infospace.checks.orchestrator import run_all_checks
        entities = _sample_entities()
        g = _cyclic_graph()
        report = run_all_checks(entities, graph=g, checks=["consistency"])
        assert report.consistency.cycle_count >= 1


# ── Shannon entropy helper ──────────────────────────────────────────


class TestShannonEntropy:
    def test_uniform_distribution(self):
        from markitect.infospace.checks.granularity import _shannon_entropy
        counts = {"a": 1, "b": 1, "c": 1, "d": 1}
        assert _shannon_entropy(counts) == pytest.approx(2.0)  # log2(4)

    def test_single_element(self):
        from markitect.infospace.checks.granularity import _shannon_entropy
        assert _shannon_entropy({"a": 10}) == 0.0

    def test_empty(self):
        from markitect.infospace.checks.granularity import _shannon_entropy
        assert _shannon_entropy({}) == 0.0

    def test_skewed(self):
        from markitect.infospace.checks.granularity import _shannon_entropy
        counts = {"a": 99, "b": 1}
        entropy = _shannon_entropy(counts)
        assert 0.0 < entropy < 1.0
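Review note: the four entropy tests above pin down the expected behaviour fairly tightly (bits as the unit, `0.0` for empty or single-bucket input). The sketch below is one way a helper could satisfy them. It is not the actual `_shannon_entropy` from `markitect.infospace.checks.granularity`, whose body is not part of this diff; the name `shannon_entropy` and its signature are assumptions made for illustration.

```python
import math
from typing import Mapping


def shannon_entropy(counts: Mapping[str, int]) -> float:
    """Return the Shannon entropy, in bits, of a label -> count mapping.

    Hypothetical stand-in for the helper under test; empty input and
    zero-count buckets contribute nothing to the sum.
    """
    total = sum(counts.values())
    if total <= 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

With four equally sized buckets every term is `0.25 * 2`, so the sum is `log2(4)`; a single bucket gives `p = 1` and a zero term, which is why the tests can compare against `0.0` directly.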
225
tests/unit/infospace/test_cli.py
Normal file
@@ -0,0 +1,225 @@
"""Tests for markitect.infospace.cli."""

from pathlib import Path

import pytest
from click.testing import CliRunner

from markitect.infospace.cli import infospace_commands


@pytest.fixture
def runner():
    return CliRunner()


@pytest.fixture
def infospace_dir(tmp_path):
    """Create a minimal infospace directory with config and entities."""
    config_yaml = """\
topic:
  name: "Test Infospace"
  domain: "Testing"

disciplines:
  - name: "Test Discipline"

viability:
  coverage_ratio:
    min: 0.60
  redundancy_ratio:
    max: 0.05
"""
    (tmp_path / "infospace.yaml").write_text(config_yaml)

    entities = tmp_path / "output" / "entities"
    entities.mkdir(parents=True)
    (entities / "alpha.md").write_text(
        "# Alpha\n\n## Definition\n\nAlpha is a test entity.\n\n"
        "## Source Chapter\n\nChapter 1\n\n"
        "## Domain\n\nProduction\n"
    )
    (entities / "beta.md").write_text(
        "# Beta\n\n## Definition\n\nBeta is another test entity with more words "
        "to make it longer.\n\n"
        "## Source Chapter\n\nChapter 2\n\n"
        "## Domain\n\nDistribution\n"
    )
    return tmp_path


# ── init ─────────────────────────────────────────────────────────────


class TestInitCommand:
    def test_creates_config_file(self, runner, tmp_path):
        out = tmp_path / "infospace.yaml"
        result = runner.invoke(
            infospace_commands,
            ["init", "--topic", "My Topic", "--domain", "Science", "-o", str(out)],
        )
        assert result.exit_code == 0
        assert out.exists()
        assert "Created" in result.output

    def test_config_contains_topic(self, runner, tmp_path):
        out = tmp_path / "infospace.yaml"
        runner.invoke(
            infospace_commands,
            ["init", "--topic", "My Topic", "-o", str(out)],
        )
        text = out.read_text()
        assert "My Topic" in text

    def test_refuses_overwrite(self, runner, tmp_path):
        out = tmp_path / "infospace.yaml"
        out.write_text("existing")
        result = runner.invoke(
            infospace_commands,
            ["init", "--topic", "X", "-o", str(out)],
        )
        assert result.exit_code != 0
        assert "already exists" in result.output

    def test_with_disciplines(self, runner, tmp_path):
        out = tmp_path / "infospace.yaml"
        result = runner.invoke(
            infospace_commands,
            [
                "init", "--topic", "T",
                "--discipline", "VSM",
                "--discipline", "Category Theory",
                "-o", str(out),
            ],
        )
        assert result.exit_code == 0
        text = out.read_text()
        assert "VSM" in text
        assert "Category Theory" in text


# ── status ───────────────────────────────────────────────────────────


class TestStatusCommand:
    def test_shows_topic_and_count(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["status", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "Test Infospace" in result.output
        assert "2" in result.output  # 2 entities

    def test_shows_domain_field(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["status", "--config", str(infospace_dir / "infospace.yaml")],
        )
        # Domain from config (topic.domain), not entity domains
        assert "Testing" in result.output

    def test_shows_disciplines(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["status", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert "Test Discipline" in result.output

    def test_no_config_exits(self, runner, tmp_path):
        result = runner.invoke(
            infospace_commands,
            ["status", "--config", str(tmp_path / "nonexistent.yaml")],
        )
        assert result.exit_code != 0


# ── entities ─────────────────────────────────────────────────────────


class TestEntitiesCommand:
    def test_lists_entities(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["entities", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "alpha" in result.output
        assert "beta" in result.output
        assert "Total: 2" in result.output

    def test_sort_by_domain(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            [
                "entities",
                "--config", str(infospace_dir / "infospace.yaml"),
                "--sort-by", "domain",
            ],
        )
        assert result.exit_code == 0
        lines = result.output.strip().split("\n")
        # Distribution comes before Production alphabetically
        data_lines = [line for line in lines if "alpha" in line or "beta" in line]
        assert len(data_lines) == 2

    def test_no_entities_dir(self, runner, tmp_path):
        (tmp_path / "infospace.yaml").write_text("topic:\n name: X\n")
        result = runner.invoke(
            infospace_commands,
            ["entities", "--config", str(tmp_path / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "No entities" in result.output


# ── viability ────────────────────────────────────────────────────────


class TestViabilityCommand:
    def test_no_metrics_shows_thresholds(self, runner, infospace_dir):
        result = runner.invoke(
            infospace_commands,
            ["viability", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "coverage_ratio" in result.output

    def test_with_metrics_file(self, runner, infospace_dir):
        import yaml
        metrics_dir = infospace_dir / "output" / "metrics"
        metrics_dir.mkdir(parents=True, exist_ok=True)
        metrics = {"coverage_ratio": 0.85, "redundancy_ratio": 0.02}
        (metrics_dir / "metrics.yaml").write_text(yaml.safe_dump(metrics))

        result = runner.invoke(
            infospace_commands,
            ["viability", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "PASS" in result.output
        assert "Viable: YES" in result.output

    def test_failing_threshold(self, runner, infospace_dir):
        import yaml
        metrics_dir = infospace_dir / "output" / "metrics"
        metrics_dir.mkdir(parents=True, exist_ok=True)
        metrics = {"coverage_ratio": 0.3, "redundancy_ratio": 0.02}
        (metrics_dir / "metrics.yaml").write_text(yaml.safe_dump(metrics))

        result = runner.invoke(
            infospace_commands,
            ["viability", "--config", str(infospace_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "FAIL" in result.output
        assert "Viable: NO" in result.output

    def test_no_thresholds_configured(self, runner, tmp_path):
        (tmp_path / "infospace.yaml").write_text("topic:\n name: X\n")
        result = runner.invoke(
            infospace_commands,
            ["viability", "--config", str(tmp_path / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "No viability thresholds" in result.output
257
tests/unit/infospace/test_composition.py
Normal file
@@ -0,0 +1,257 @@
"""
Tests for infospace composition model (S2.6).
"""

from __future__ import annotations

from pathlib import Path

import pytest
import yaml

from markitect.infospace.composition import (
    DisciplineStatus,
    StaleMappingInfo,
    bind_discipline,
    check_discipline_status,
    compute_discipline_digests,
    find_stale_mappings,
    get_discipline_entities,
    load_discipline_config,
    resolve_discipline_path,
)
from markitect.infospace.config import (
    DisciplineBinding,
    InfospaceConfig,
    TopicConfig,
    ViabilityThreshold,
    save_infospace_config,
)


# ── helpers ──────────────────────────────────────────────────────────


def _create_discipline(tmp_path: Path, name: str = "test-discipline") -> Path:
    """Create a minimal discipline infospace directory."""
    disc_dir = tmp_path / name
    disc_dir.mkdir(parents=True, exist_ok=True)

    disc_config = InfospaceConfig(
        topic=TopicConfig(name=name.replace("-", " ").title(), domain="Testing"),
        viability={"coverage_ratio": ViabilityThreshold(metric="coverage_ratio", min=0.5)},
    )
    save_infospace_config(disc_config, disc_dir / "infospace.yaml")

    # Create some entities
    entities_dir = disc_dir / "output" / "entities"
    entities_dir.mkdir(parents=True, exist_ok=True)

    for slug in ["concept_a", "concept_b", "concept-c"]:
        title = slug.replace("-", " ").title()
        (entities_dir / f"{slug}.md").write_text(
            f"# {title}\n\n## Definition\n\nA test concept for {slug}.\n\n"
            "## Source Chapter\n\nch01\n\n## Domain\n\nTesting\n",
            encoding="utf-8",
        )

    return disc_dir


def _parent_config(tmp_path: Path, disc_path: str = "") -> InfospaceConfig:
    """Create a parent infospace config."""
    return InfospaceConfig(
        topic=TopicConfig(name="Parent", domain="Testing"),
        disciplines=[DisciplineBinding(name="Test Discipline", path=disc_path)]
        if disc_path
        else [],
    )


# ── resolve_discipline_path ─────────────────────────────────────────


class TestResolveDisciplinePath:
    def test_relative_path(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        binding = DisciplineBinding(name="test", path="test-discipline")
        result = resolve_discipline_path(binding, tmp_path)
        assert result is not None
        assert result == disc_dir.resolve()

    def test_absolute_path(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        binding = DisciplineBinding(name="test", path=str(disc_dir))
        result = resolve_discipline_path(binding, tmp_path / "other")
        assert result is not None
        assert result == disc_dir.resolve()

    def test_missing_path(self, tmp_path):
        binding = DisciplineBinding(name="test", path="nonexistent")
        assert resolve_discipline_path(binding, tmp_path) is None

    def test_empty_path(self, tmp_path):
        binding = DisciplineBinding(name="test", path="")
        assert resolve_discipline_path(binding, tmp_path) is None


# ── load_discipline_config ──────────────────────────────────────────


class TestLoadDisciplineConfig:
    def test_loads_config(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        binding = DisciplineBinding(name="test", path="test-discipline")
        config = load_discipline_config(binding, tmp_path)
        assert config is not None
        assert config.topic.domain == "Testing"

    def test_missing_config_file(self, tmp_path):
        (tmp_path / "no-config").mkdir()
        binding = DisciplineBinding(name="test", path="no-config")
        assert load_discipline_config(binding, tmp_path) is None

    def test_missing_directory(self, tmp_path):
        binding = DisciplineBinding(name="test", path="gone")
        assert load_discipline_config(binding, tmp_path) is None


# ── check_discipline_status ─────────────────────────────────────────


class TestCheckDisciplineStatus:
    def test_valid_discipline(self, tmp_path):
        _create_discipline(tmp_path)
        binding = DisciplineBinding(name="Test Discipline", path="test-discipline")
        status = check_discipline_status(binding, tmp_path)
        assert status.exists
        assert status.has_config
        assert status.entity_count == 3
        assert status.error == ""

    def test_missing_discipline(self, tmp_path):
        binding = DisciplineBinding(name="Missing", path="nope")
        status = check_discipline_status(binding, tmp_path)
        assert not status.exists
        assert "not found" in status.error.lower()

    def test_no_config(self, tmp_path):
        (tmp_path / "bare").mkdir()
        binding = DisciplineBinding(name="Bare", path="bare")
        status = check_discipline_status(binding, tmp_path)
        assert status.exists
        assert not status.has_config

    def test_viable_with_metrics(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        # Write metrics that meet the threshold
        metrics_dir = disc_dir / "output" / "metrics"
        metrics_dir.mkdir(parents=True, exist_ok=True)
        (metrics_dir / "metrics.yaml").write_text(
            yaml.safe_dump({"coverage_ratio": 0.8}), encoding="utf-8"
        )
        binding = DisciplineBinding(name="Test", path="test-discipline")
        status = check_discipline_status(binding, tmp_path)
        assert status.is_viable

    def test_not_viable_below_threshold(self, tmp_path):
        disc_dir = _create_discipline(tmp_path)
        metrics_dir = disc_dir / "output" / "metrics"
        metrics_dir.mkdir(parents=True, exist_ok=True)
        (metrics_dir / "metrics.yaml").write_text(
            yaml.safe_dump({"coverage_ratio": 0.2}), encoding="utf-8"
        )
        binding = DisciplineBinding(name="Test", path="test-discipline")
        status = check_discipline_status(binding, tmp_path)
        assert not status.is_viable

    def test_to_dict(self, tmp_path):
        _create_discipline(tmp_path)
        binding = DisciplineBinding(name="Test", path="test-discipline")
        status = check_discipline_status(binding, tmp_path)
        d = status.to_dict()
        assert d["name"] == "Test"
        assert d["exists"] is True
        assert d["entity_count"] == 3


# ── get_discipline_entities ─────────────────────────────────────────


class TestGetDisciplineEntities:
    def test_returns_entities(self, tmp_path):
        _create_discipline(tmp_path)
        binding = DisciplineBinding(name="Test", path="test-discipline")
        entities = get_discipline_entities(binding, tmp_path)
        assert len(entities) == 3
        slugs = {e.slug for e in entities}
        assert "concept_a" in slugs

    def test_missing_discipline(self, tmp_path):
        binding = DisciplineBinding(name="Test", path="nope")
        assert get_discipline_entities(binding, tmp_path) == []


# ── compute_discipline_digests ──────────────────────────────────────


class TestComputeDisciplineDigests:
    def test_returns_digests(self, tmp_path):
        _create_discipline(tmp_path)
        binding = DisciplineBinding(name="Test", path="test-discipline")
        digests = compute_discipline_digests(binding, tmp_path)
        assert len(digests) == 3
        assert "concept_a" in digests
        assert isinstance(digests["concept_a"], str)
        assert len(digests["concept_a"]) == 12


# ── find_stale_mappings ─────────────────────────────────────────────


class TestFindStaleMappings:
    def test_no_references(self, tmp_path):
        cfg = _parent_config(tmp_path, disc_path="test-discipline")
        assert find_stale_mappings(cfg, tmp_path) == []

    def test_no_stale(self, tmp_path):
        _create_discipline(tmp_path)
        cfg = _parent_config(tmp_path, disc_path="test-discipline")
        refs = {"entity_x": ["concept_a", "concept_b"]}
        stale = find_stale_mappings(cfg, tmp_path, mapping_references=refs)
        assert stale == []

    def test_detects_stale(self, tmp_path):
        _create_discipline(tmp_path)
        cfg = _parent_config(tmp_path, disc_path="test-discipline")
        refs = {"entity_x": ["concept_a", "deleted_concept"]}
        stale = find_stale_mappings(cfg, tmp_path, mapping_references=refs)
        assert len(stale) == 1
        assert stale[0].entity_slug == "entity_x"
        assert stale[0].discipline_entity == "deleted_concept"

    def test_stale_to_dict(self):
        info = StaleMappingInfo(
            entity_slug="e1", discipline_entity="d1", reason="gone"
        )
        d = info.to_dict()
        assert d["entity_slug"] == "e1"


# ── bind_discipline ─────────────────────────────────────────────────


class TestBindDiscipline:
    def test_adds_binding(self, tmp_path):
        _create_discipline(tmp_path)
        cfg = InfospaceConfig(topic=TopicConfig(name="Parent"))
        status = bind_discipline(cfg, name="Test", path="test-discipline", root=tmp_path)
        assert status.exists
        assert len(cfg.disciplines) == 1
        assert cfg.disciplines[0].name == "Test"

    def test_duplicate_rejected(self, tmp_path):
        _create_discipline(tmp_path)
        cfg = _parent_config(tmp_path, disc_path="test-discipline")
        status = bind_discipline(cfg, name="Test Discipline", path="x", root=tmp_path)
        assert "already bound" in status.error
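Review note: the `TestFindStaleMappings` cases above treat a mapping reference as stale when it names a discipline entity slug that no longer exists. The sketch below shows that check in isolation, assuming the known slugs have already been collected into a set. `StaleRef` and `find_stale` are hypothetical names for illustration; the real `find_stale_mappings` takes a config and a root path, and its implementation is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class StaleRef:
    """Hypothetical record mirroring the fields the tests read."""
    entity_slug: str
    discipline_entity: str
    reason: str


def find_stale(
    mapping_references: Dict[str, Iterable[str]],
    known_slugs: Set[str],
) -> List[StaleRef]:
    """Return one StaleRef per reference whose target slug is unknown."""
    stale: List[StaleRef] = []
    for entity_slug, targets in mapping_references.items():
        for target in targets:
            if target not in known_slugs:
                stale.append(
                    StaleRef(entity_slug, target, "not found in discipline")
                )
    return stale
```

Passing an empty reference mapping yields an empty list, which matches the `test_no_references` expectation without needing the discipline directory to exist.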
400
tests/unit/infospace/test_config.py
Normal file
@@ -0,0 +1,400 @@
"""Tests for markitect.infospace.config and state."""

from datetime import datetime
from pathlib import Path

import pytest

from markitect.infospace.config import (
    DisciplineBinding,
    InfospaceConfig,
    PipelineConfig,
    PipelineStage,
    SchemaRegistry,
    TopicConfig,
    ViabilityThreshold,
    find_infospace_config,
    load_infospace_config,
    save_infospace_config,
)
from markitect.infospace.state import (
    InfospaceState,
    ViabilityResult,
    build_state,
)
from markitect.infospace.models import EntityMeta
from markitect.infospace.evaluation import (
    EntityEvaluation,
    EvaluationSnapshot,
    ScoreEntry,
)


# ── Helpers ──────────────────────────────────────────────────────────


_SAMPLE_YAML = """\
topic:
  name: "The Wealth of Nations"
  domain: "Classical Economics"
  sources: artifacts/sources/

disciplines:
  - name: "Viable System Model"
    path: artifacts/vsm-reference/

schemas:
  entity: schemas/economic-entity-schema-v1.0.md
  mapping: schemas/vsm-mapping-schema-v1.0.md

competency_questions: schemas/competency-questions.md

viability:
  coverage_ratio:
    min: 0.60
  per_entity_mean:
    min: 3.5
  redundancy_ratio:
    max: 0.05

pipeline:
  stages:
    - template: extract-entities
      spaces: [sources, guidelines]
    - template: map-to-vsm
      spaces: [entities, vsm-reference]
  post_batch:
    - template: assess-metrics
"""


def _sample_config() -> InfospaceConfig:
    return InfospaceConfig(
        topic=TopicConfig(name="Test Topic", domain="Testing"),
        disciplines=[DisciplineBinding(name="VSM", path="vsm/")],
        schemas=SchemaRegistry(entity="schemas/entity.md"),
        competency_questions="schemas/cq.md",
        viability={
            "coverage_ratio": ViabilityThreshold("coverage_ratio", min=0.6),
            "redundancy_ratio": ViabilityThreshold("redundancy_ratio", max=0.05),
        },
    )


def _sample_entities(n=5) -> list:
    return [
        EntityMeta(
            slug=f"entity-{i}",
            title=f"Entity {i}",
            h1_raw=f"Entity {i}",
            domain="Production" if i % 2 == 0 else "Distribution",
        )
        for i in range(n)
    ]


# ── TopicConfig ──────────────────────────────────────────────────────


class TestTopicConfig:
    def test_round_trip(self):
        tc = TopicConfig("WoN", "Economics", "sources/")
        d = tc.to_dict()
        restored = TopicConfig.from_dict(d)
        assert restored.name == "WoN"
        assert restored.domain == "Economics"
        assert restored.sources == "sources/"

    def test_minimal(self):
        tc = TopicConfig.from_dict({"name": "Minimal"})
        assert tc.domain == ""
        assert tc.sources == ""

    def test_to_dict_omits_empty(self):
        tc = TopicConfig("X")
        d = tc.to_dict()
        assert "domain" not in d
        assert "sources" not in d


# ── DisciplineBinding ────────────────────────────────────────────────


class TestDisciplineBinding:
    def test_round_trip(self):
        db = DisciplineBinding("VSM", "path/to/vsm")
        d = db.to_dict()
        restored = DisciplineBinding.from_dict(d)
        assert restored.name == "VSM"
        assert restored.path == "path/to/vsm"


# ── SchemaRegistry ───────────────────────────────────────────────────


class TestSchemaRegistry:
    def test_round_trip(self):
        sr = SchemaRegistry(entity="e.md", mapping="m.md", analysis="a.md")
        d = sr.to_dict()
        restored = SchemaRegistry.from_dict(d)
        assert restored.entity == "e.md"
        assert restored.mapping == "m.md"

    def test_extra_schemas(self):
        sr = SchemaRegistry.from_dict({"entity": "e.md", "custom": "c.md"})
        assert sr.entity == "e.md"
        assert sr.extra == {"custom": "c.md"}


# ── ViabilityThreshold ──────────────────────────────────────────────


class TestViabilityThreshold:
    def test_min_check(self):
        t = ViabilityThreshold("x", min=0.5)
        assert t.check(0.6) is True
        assert t.check(0.5) is True
        assert t.check(0.4) is False

    def test_max_check(self):
        t = ViabilityThreshold("x", max=0.1)
        assert t.check(0.05) is True
        assert t.check(0.1) is True
        assert t.check(0.2) is False

    def test_min_and_max(self):
        t = ViabilityThreshold("x", min=0.3, max=0.7)
        assert t.check(0.5) is True
        assert t.check(0.2) is False
        assert t.check(0.8) is False

    def test_no_bounds_always_passes(self):
        t = ViabilityThreshold("x")
        assert t.check(999.0) is True


# ── PipelineConfig ──────────────────────────────────────────────────


class TestPipelineConfig:
    def test_round_trip(self):
        pc = PipelineConfig(
            stages=[PipelineStage("extract", ["s1", "s2"])],
            post_batch=[PipelineStage("assess")],
        )
        d = pc.to_dict()
        restored = PipelineConfig.from_dict(d)
        assert len(restored.stages) == 1
        assert restored.stages[0].template == "extract"
        assert restored.stages[0].spaces == ["s1", "s2"]
        assert len(restored.post_batch) == 1


# ── InfospaceConfig ─────────────────────────────────────────────────


class TestInfospaceConfig:
    def test_to_dict_from_dict_round_trip(self):
        cfg = _sample_config()
        d = cfg.to_dict()
        restored = InfospaceConfig.from_dict(d)
        assert restored.topic.name == "Test Topic"
        assert len(restored.disciplines) == 1
        assert restored.schemas.entity == "schemas/entity.md"
        assert restored.competency_questions == "schemas/cq.md"
        assert len(restored.viability) == 2

    def test_viability_thresholds_preserved(self):
        cfg = _sample_config()
        d = cfg.to_dict()
        restored = InfospaceConfig.from_dict(d)
        assert restored.viability["coverage_ratio"].min == 0.6
        assert restored.viability["redundancy_ratio"].max == 0.05

    def test_default_dirs(self):
        cfg = InfospaceConfig(topic=TopicConfig("X"))
        assert cfg.entities_dir == "output/entities"
        assert cfg.evaluations_dir == "output/evaluations"
        assert cfg.metrics_dir == "output/metrics"

    def test_custom_dirs(self):
        cfg = InfospaceConfig.from_dict({
            "topic": {"name": "X"},
            "entities_dir": "custom/entities",
        })
        assert cfg.entities_dir == "custom/entities"


# ── YAML I/O ────────────────────────────────────────────────────────


class TestYAMLIO:
    def test_save_load_round_trip(self, tmp_path):
        cfg = _sample_config()
        p = tmp_path / "infospace.yaml"
        save_infospace_config(cfg, p)
        loaded = load_infospace_config(p)
        assert loaded.topic.name == cfg.topic.name
        assert len(loaded.viability) == len(cfg.viability)

    def test_load_full_example(self, tmp_path):
        p = tmp_path / "infospace.yaml"
        p.write_text(_SAMPLE_YAML, encoding="utf-8")
        cfg = load_infospace_config(p)
        assert cfg.topic.name == "The Wealth of Nations"
        assert cfg.topic.domain == "Classical Economics"
        assert len(cfg.disciplines) == 1
        assert cfg.disciplines[0].name == "Viable System Model"
        assert cfg.schemas.entity == "schemas/economic-entity-schema-v1.0.md"
        assert cfg.competency_questions == "schemas/competency-questions.md"
        assert len(cfg.viability) == 3
        assert cfg.viability["coverage_ratio"].min == 0.60
        assert cfg.viability["redundancy_ratio"].max == 0.05
        assert cfg.pipeline is not None
        assert len(cfg.pipeline.stages) == 2
        assert len(cfg.pipeline.post_batch) == 1

    def test_load_missing_file(self, tmp_path):
        with pytest.raises(FileNotFoundError):
            load_infospace_config(tmp_path / "nonexistent.yaml")

    def test_load_missing_topic(self, tmp_path):
        p = tmp_path / "bad.yaml"
        p.write_text("schemas:\n entity: x.md\n")
        with pytest.raises(ValueError, match="topic"):
            load_infospace_config(p)

    def test_save_creates_parent_dirs(self, tmp_path):
        cfg = InfospaceConfig(topic=TopicConfig("X"))
        p = tmp_path / "deep" / "nested" / "infospace.yaml"
        save_infospace_config(cfg, p)
        assert p.exists()


class TestFindConfig:
    def test_finds_config_in_current_dir(self, tmp_path):
        (tmp_path / "infospace.yaml").write_text("topic:\n name: X\n")
        found = find_infospace_config(tmp_path)
        assert found is not None
        assert found.name == "infospace.yaml"

    def test_finds_config_in_parent(self, tmp_path):
        (tmp_path / "infospace.yaml").write_text("topic:\n name: X\n")
        child = tmp_path / "sub" / "dir"
        child.mkdir(parents=True)
        found = find_infospace_config(child)
        assert found is not None

    def test_returns_none_if_not_found(self, tmp_path):
        assert find_infospace_config(tmp_path) is None


# ── InfospaceState ──────────────────────────────────────────────────


class TestInfospaceState:
    def test_entity_count(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg, entities=_sample_entities(5))
        assert state.entity_count == 5

    def test_topic_name(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        assert state.topic_name == "Test Topic"

    def test_domains(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg, entities=_sample_entities(4))
        assert "Production" in state.domains
        assert "Distribution" in state.domains

    def test_has_evaluations(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        assert state.has_evaluations is False

        snap = EvaluationSnapshot(
            snapshot_id="s1",
            created_at=datetime(2026, 1, 1),
            schema_name="Test",
            entity_count=0,
        )
        state.latest_snapshot = snap
        assert state.has_evaluations is True


class TestViabilityCheck:
    def test_all_pass(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        metrics = {"coverage_ratio": 0.8, "redundancy_ratio": 0.02}
        results = state.check_viability(metrics)
        assert all(r.passed for r in results)
        assert state.is_viable is True

    def test_one_fails(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        metrics = {"coverage_ratio": 0.4, "redundancy_ratio": 0.02}
        results = state.check_viability(metrics)
        assert not all(r.passed for r in results)
        assert state.is_viable is False

    def test_missing_metric_defaults_to_zero(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        # coverage_ratio min=0.6, missing → 0.0 → fails
        results = state.check_viability({})
        coverage = next(r for r in results if r.metric == "coverage_ratio")
        assert coverage.passed is False
        assert coverage.value == 0.0

    def test_viability_counts(self):
        cfg = _sample_config()
        state = InfospaceState(config=cfg)
        metrics = {"coverage_ratio": 0.8, "redundancy_ratio": 0.2}
        state.check_viability(metrics)
        assert state.viability_pass_count == 1  # coverage passes
        assert state.viability_total_count == 2

    def test_no_thresholds_not_viable(self):
        cfg = InfospaceConfig(topic=TopicConfig("X"))
        state = InfospaceState(config=cfg)
        assert state.is_viable is False


class TestBuildState:
    def test_builds_with_entities(self):
        cfg = _sample_config()
        entities = _sample_entities(3)
        state = build_state(cfg, entities=entities)
        assert state.entity_count == 3

    def test_builds_with_metrics(self):
        cfg = _sample_config()
        metrics = {"coverage_ratio": 0.9, "redundancy_ratio": 0.01}
        state = build_state(cfg, metrics=metrics)
        assert state.is_viable is True

    def test_summary(self):
        cfg = _sample_config()
        entities = _sample_entities(3)
        metrics = {"coverage_ratio": 0.9, "redundancy_ratio": 0.01}
        state = build_state(cfg, entities=entities, metrics=metrics)
        s = state.summary()
        assert s["topic"] == "Test Topic"
        assert s["entity_count"] == 3
        assert s["viable"] is True


class TestViabilityResult:
    def test_to_dict(self):
        t = ViabilityThreshold("x", min=0.5)
        r = ViabilityResult(metric="x", value=0.7, threshold=t, passed=True)
        d = r.to_dict()
        assert d["metric"] == "x"
        assert d["value"] == 0.7
        assert d["passed"] is True
        assert d["min"] == 0.5
        assert "max" not in d
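Review note: taken together, the threshold and viability tests in this file imply inclusive bounds, a pass when no bounds are set, a default of `0.0` for missing metrics, and "not viable" when no thresholds are configured. The sketch below restates those rules in a self-contained form. `Threshold` and `is_viable` are hypothetical names; this is not the `ViabilityThreshold` or `InfospaceState` code from `markitect.infospace`, which this diff does not show.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold with optional inclusive lower and upper bounds."""
    metric: str
    min: Optional[float] = None
    max: Optional[float] = None

    def check(self, value: float) -> bool:
        # Bounds are inclusive: a value equal to min or max passes.
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


def is_viable(thresholds: Dict[str, Threshold], metrics: Dict[str, float]) -> bool:
    """All thresholds must pass; no thresholds at all counts as not viable."""
    if not thresholds:
        return False
    # Assumption taken from the tests: a missing metric is read as 0.0.
    return all(t.check(metrics.get(name, 0.0)) for name, t in thresholds.items())
```

One consequence of the `0.0` default is worth noting: a missing metric fails a `min` threshold but passes a `max` threshold, so under these rules an absent `redundancy_ratio` would not by itself block viability.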
230
tests/unit/infospace/test_entity_parser.py
Normal file
@@ -0,0 +1,230 @@
"""Tests for markitect.infospace.entity_parser and EntityMeta."""

import logging
from pathlib import Path

import pytest

from markitect.infospace import EntityMeta, parse_entity_file, parse_entity_directory


# ── Fixtures ────────────────────────────────────────────────────────

COMPLETE_ENTITY = """\
# Division of Labour

## Definition

The separation of a work process into a number of distinct tasks, each performed
by a specialised worker, resulting in a significant increase in the productive
powers of labour.

## Source Chapter

Book I, Chapter 1: "Of the Division of Labour"

## Context

The division of labour is the central argument of the chapter.

## Economic Domain

Production

## Smith's Original Wording

"The greatest improvements in the productive powers of labour…"

## Modern Interpretation

The division of labour remains a foundational concept in economics.
"""

MINIMAL_ENTITY = """\
# Minimal Entity

## Definition

A brief definition.

## Source Chapter

Book I, Chapter 1

## Context

Some context.

## Economic Domain

Exchange
"""

SLUG_H1_ENTITY = """\
# effectual-demand

## Definition

Effectual demand is the demand by consumers who are willing and able to pay.

## Source Chapter

Book 1, Chapter 7

## Context

Context for effectual demand.

## Economic Domain

Exchange

## Smith's Original Wording

"Such people may be called the effectual demanders…"

## Modern Interpretation

Represents the intersection of desire and purchasing power.
"""

NO_H1 = """\
## Only H2

Some content.
"""

# ── parse_entity_file ────────────────────────────────────────────────

class TestParseEntityFile:
    def test_complete_entity(self, tmp_path):
        f = tmp_path / "division-of-labour.md"
        f.write_text(COMPLETE_ENTITY)
        meta = parse_entity_file(f)

        assert meta.slug == "division_of_labour"
        assert meta.title == "Division of Labour"
        assert meta.h1_is_title_case is True
        assert meta.has_original_wording is True
        assert meta.domain == "Production"
        assert meta.definition_word_count > 20
        assert "separation" in meta.definition.lower()
        assert meta.source_path == str(f)
        assert "definition" in meta.section_slugs
        assert "smith_s_original_wording" in meta.section_slugs

    def test_minimal_entity(self, tmp_path):
        f = tmp_path / "minimal-entity.md"
        f.write_text(MINIMAL_ENTITY)
        meta = parse_entity_file(f)

        assert meta.slug == "minimal_entity"
        assert meta.has_original_wording is False
        assert meta.original_wording == ""
        assert meta.modern_interpretation == ""
        assert meta.domain == "Exchange"

    def test_slug_format_h1(self, tmp_path):
        f = tmp_path / "effectual-demand.md"
        f.write_text(SLUG_H1_ENTITY)
        meta = parse_entity_file(f)

        assert meta.h1_raw == "effectual-demand"
        assert meta.h1_is_title_case is False
        assert meta.slug == "effectual_demand"
        assert meta.has_original_wording is True

    def test_missing_h1_raises(self, tmp_path):
        f = tmp_path / "no-h1.md"
        f.write_text(NO_H1)
        with pytest.raises(ValueError, match="No H1"):
            parse_entity_file(f)

    def test_missing_sections_return_empty(self, tmp_path):
        f = tmp_path / "minimal.md"
        f.write_text(MINIMAL_ENTITY)
        meta = parse_entity_file(f)

        # Optional sections not present → empty string
        assert meta.original_wording == ""
        assert meta.modern_interpretation == ""

    def test_word_count_accuracy(self, tmp_path):
        f = tmp_path / "test.md"
        f.write_text("# Test\n\n## Definition\n\none two three four five\n")
        meta = parse_entity_file(f)
        assert meta.definition_word_count == 5

# ── parse_entity_directory ──────────────────────────────────────────

class TestParseEntityDirectory:
    def _make_dir(self, tmp_path):
        """Create a temporary entity directory."""
        d = tmp_path / "entities"
        d.mkdir()
        (d / "entity-a.md").write_text(COMPLETE_ENTITY)
        (d / "entity-b.md").write_text(MINIMAL_ENTITY)
        # files that should be excluded by default
        (d / "book-1-chapter-01-entities.md").write_text("# View\n\nview file")
        (d / "book-1-chapter-01-prompt.md").write_text("# Prompt\n\nprompt file")
        return d

    def test_excludes_view_and_prompt(self, tmp_path):
        d = self._make_dir(tmp_path)
        results = parse_entity_directory(d)
        slugs = {e.slug for e in results}

        assert "division_of_labour" in slugs
        assert "minimal_entity" in slugs
        # Excluded files should not be parsed as entities
        assert len(results) == 2

    def test_custom_exclude_patterns(self, tmp_path):
        d = self._make_dir(tmp_path)
        # Only exclude prompt files, allow entity views
        results = parse_entity_directory(d, exclude_patterns=[r".*-prompt\.md$"])
        assert len(results) == 3  # entity-a, entity-b, chapter-01-entities

    def test_malformed_skipped_with_warning(self, tmp_path, caplog):
        d = tmp_path / "entities"
        d.mkdir()
        (d / "good.md").write_text(COMPLETE_ENTITY)
        (d / "bad.md").write_text(NO_H1)

        with caplog.at_level(logging.WARNING):
            results = parse_entity_directory(d)

        assert len(results) == 1
        assert "bad.md" in caplog.text

# ── EntityMeta round-trip ───────────────────────────────────────────

class TestEntityMetaRoundTrip:
    def test_to_dict_from_dict(self, tmp_path):
        f = tmp_path / "entity.md"
        f.write_text(COMPLETE_ENTITY)
        original = parse_entity_file(f)

        data = original.to_dict()
        restored = EntityMeta.from_dict(data)

        assert restored.slug == original.slug
        assert restored.title == original.title
        assert restored.definition == original.definition
        assert restored.h1_is_title_case == original.h1_is_title_case
        assert restored.section_slugs == original.section_slugs
        assert restored.definition_word_count == original.definition_word_count

    def test_from_dict_ignores_unknown_keys(self):
        data = {
            "slug": "test",
            "title": "Test",
            "h1_raw": "Test",
            "unknown_field": "should be ignored",
        }
        meta = EntityMeta.from_dict(data)
        assert meta.slug == "test"
        assert not hasattr(meta, "unknown_field") or "unknown_field" not in meta.__dict__
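
Reviewer note: the parser tests expect slugs such as `division_of_labour`, `effectual_demand` and `smith_s_original_wording`, and `entity-a.md` yields `division_of_labour`, which suggests slugs come from heading text rather than file names. A minimal sketch of a normalisation that reproduces those values follows; `slugify` and `word_count` are hypothetical helpers, and the real parser may derive these differently.

```python
import re


def slugify(heading: str) -> str:
    """Lower-case the heading and collapse every non-alphanumeric run to one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", heading.lower()).strip("_")


def word_count(text: str) -> int:
    """Count whitespace-separated tokens, as a simple definition-length measure."""
    return len(text.split())
```

Collapsing runs (rather than replacing each character) keeps headings with punctuation followed by a space from producing double underscores.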
224
tests/unit/infospace/test_evaluate.py
Normal file
@@ -0,0 +1,224 @@
"""Tests for markitect.infospace.evaluate."""

from datetime import datetime
from pathlib import Path

import pytest

from markitect.infospace.config import InfospaceConfig, TopicConfig
from markitect.infospace.evaluate import (
    build_evaluation_prompt,
    content_digest,
    parse_evaluation_response,
    run_entity_evaluation,
)
from markitect.infospace.evaluation import ScoreEntry
from markitect.infospace.models import EntityMeta
from markitect.prompts.execution.llm_adapter import MockLLMAdapter
from markitect.prompts.execution.models import RunConfig


# ── Helpers ──────────────────────────────────────────────────────────


def _entity(**overrides) -> EntityMeta:
    defaults = dict(
        slug="division-of-labour",
        title="Division Of Labour",
        h1_raw="Division Of Labour",
        definition="Splitting work into specialised tasks.",
        source_chapter="Book I Chapter 1",
        context="Smith introduces the concept early.",
        domain="Production",
        source_path="entities/division-of-labour.md",
    )
    defaults.update(overrides)
    return EntityMeta(**defaults)


def _config() -> InfospaceConfig:
    return InfospaceConfig(topic=TopicConfig(name="The Wealth of Nations"))


_MOCK_RESPONSE = """\
DIMENSION: definition_precision
SCORE: 4.5
RATIONALE: Clear and specific definition of the concept.

DIMENSION: source_grounding
SCORE: 4.0
RATIONALE: Well grounded in Smith's text.

DIMENSION: domain_relevance
SCORE: 5.0
RATIONALE: Directly relevant to production economics.
"""

# ── build_evaluation_prompt ──────────────────────────────────────────


class TestBuildPrompt:
    def test_contains_entity_fields(self):
        entity = _entity()
        prompt = build_evaluation_prompt(entity, "Test Topic")
        assert "division-of-labour" in prompt
        assert "Division Of Labour" in prompt
        assert "Production" in prompt
        assert "Splitting work" in prompt

    def test_contains_topic(self):
        prompt = build_evaluation_prompt(_entity(), "WoN")
        assert "WoN" in prompt

    def test_contains_dimensions(self):
        prompt = build_evaluation_prompt(_entity(), "T")
        assert "definition_precision" in prompt
        assert "source_grounding" in prompt

    def test_custom_dimensions(self):
        prompt = build_evaluation_prompt(
            _entity(), "T", dimensions=["novelty", "coherence"]
        )
        assert "novelty" in prompt
        assert "coherence" in prompt
        assert "definition_precision" not in prompt

    def test_handles_missing_fields(self):
        entity = _entity(definition="", context="", domain="")
        prompt = build_evaluation_prompt(entity, "T")
        assert "(no definition)" in prompt
        assert "(no context)" in prompt
        assert "(unspecified)" in prompt


# ── content_digest ───────────────────────────────────────────────────


class TestContentDigest:
    def test_deterministic(self):
        e = _entity()
        assert content_digest(e) == content_digest(e)

    def test_changes_with_content(self):
        e1 = _entity(definition="A")
        e2 = _entity(definition="B")
        assert content_digest(e1) != content_digest(e2)

# ── parse_evaluation_response ────────────────────────────────────────


class TestParseResponse:
    def test_parses_three_dimensions(self):
        scores = parse_evaluation_response(_MOCK_RESPONSE)
        assert len(scores) == 3

    def test_correct_names(self):
        scores = parse_evaluation_response(_MOCK_RESPONSE)
        names = [s.name for s in scores]
        assert "definition_precision" in names
        assert "source_grounding" in names
        assert "domain_relevance" in names

    def test_correct_scores(self):
        scores = parse_evaluation_response(_MOCK_RESPONSE)
        by_name = {s.name: s for s in scores}
        assert by_name["definition_precision"].value == 4.5
        assert by_name["source_grounding"].value == 4.0
        assert by_name["domain_relevance"].value == 5.0

    def test_correct_rationales(self):
        scores = parse_evaluation_response(_MOCK_RESPONSE)
        by_name = {s.name: s for s in scores}
        assert "Clear" in by_name["definition_precision"].rationale

    def test_empty_response(self):
        scores = parse_evaluation_response("")
        assert scores == []

    def test_malformed_score_skipped(self):
        text = "DIMENSION: x\nSCORE: not-a-number\nRATIONALE: oops"
        scores = parse_evaluation_response(text)
        assert len(scores) == 0

# ── run_entity_evaluation ────────────────────────────────────────────


class TestRunEntityEvaluation:
    def test_evaluates_entities(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        entities = [_entity(), _entity(slug="pin-factory", title="Pin Factory")]

        summary = run_entity_evaluation(
            config=cfg,
            entities=entities,
            adapter=adapter,
            output_dir=tmp_path / "evals",
        )
        assert summary.total == 2
        assert summary.succeeded == 2
        assert adapter.call_count == 2

    def test_writes_evaluation_files(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        entities = [_entity()]

        run_entity_evaluation(
            config=cfg,
            entities=entities,
            adapter=adapter,
            output_dir=tmp_path / "evals",
        )
        eval_file = tmp_path / "evals" / "division-of-labour.md"
        assert eval_file.exists()
        text = eval_file.read_text()
        assert "definition_precision" in text

    def test_incremental_skip(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        entity = _entity()
        digest = content_digest(entity)

        summary = run_entity_evaluation(
            config=cfg,
            entities=[entity],
            adapter=adapter,
            output_dir=tmp_path,
            previous_digests={entity.slug: digest},
        )
        assert summary.skipped == 1
        assert adapter.call_count == 0

    def test_progress_callback_called(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        calls = []

        run_entity_evaluation(
            config=cfg,
            entities=[_entity()],
            adapter=adapter,
            output_dir=tmp_path,
            progress_callback=lambda d, t, r: calls.append((d, t, r.key)),
        )
        assert len(calls) == 1
        assert calls[0] == (1, 1, "division-of-labour")

    def test_passes_run_config(self, tmp_path):
        adapter = MockLLMAdapter(_MOCK_RESPONSE)
        cfg = _config()
        rc = RunConfig(temperature=0.1, max_tokens=500)

        run_entity_evaluation(
            config=cfg,
            entities=[_entity()],
            adapter=adapter,
            run_config=rc,
            output_dir=tmp_path,
        )
        assert adapter.last_config.temperature == 0.1
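
Reviewer note: the mock response used in this file follows a `DIMENSION` / `SCORE` / `RATIONALE` block layout, and the tests expect blocks with a non-numeric score to be dropped. A minimal sketch of a parser with that behaviour follows; `Score` and `parse_scores` are hypothetical names, not the package's `ScoreEntry` or `parse_evaluation_response`, whose implementation is not shown in this diff.

```python
import re
from typing import List, NamedTuple


class Score(NamedTuple):
    """Hypothetical result record for one evaluated dimension."""

    name: str
    value: float
    rationale: str


# One block is three consecutive lines; [ \t]* keeps each match on its own line.
_BLOCK = re.compile(
    r"DIMENSION:[ \t]*(?P<name>\S+)[ \t]*\r?\n"
    r"SCORE:[ \t]*(?P<score>\S+)[ \t]*\r?\n"
    r"RATIONALE:[ \t]*(?P<rationale>[^\r\n]*)"
)


def parse_scores(text: str) -> List[Score]:
    """Return one Score per well-formed block, skipping blocks with a bad score."""
    scores = []
    for match in _BLOCK.finditer(text):
        try:
            value = float(match.group("score"))
        except ValueError:
            continue  # e.g. "SCORE: not-a-number"
        scores.append(
            Score(match.group("name"), value, match.group("rationale").strip())
        )
    return scores
```

Skipping a malformed block instead of raising keeps one bad dimension from discarding the rest of a response, which matches what `test_malformed_score_skipped` expects.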
398
tests/unit/infospace/test_evaluation.py
Normal file
@@ -0,0 +1,398 @@
"""Tests for markitect.infospace evaluation models and I/O."""

from datetime import datetime
from pathlib import Path

import pytest

from markitect.infospace import (
    EntityEvaluation,
    EvaluationSnapshot,
    MetricChange,
    MetricValue,
    ScoreChange,
    ScoreEntry,
    SnapshotDiff,
    append_to_history,
    diff_snapshots,
    read_entity_evaluation,
    read_history,
    read_snapshot,
    write_entity_evaluation,
    write_snapshot,
)


# ── Helpers ──────────────────────────────────────────────────────────

_NOW = datetime(2026, 2, 19, 12, 0, 0)


def _sample_scores() -> list:
    return [
        ScoreEntry("definition_precision", 4.5, rationale="Clear and specific."),
        ScoreEntry("source_grounding", 4.0, rationale="Well grounded."),
        ScoreEntry("domain_relevance", 4.5),
    ]


def _sample_evaluation(**overrides) -> EntityEvaluation:
    defaults = dict(
        entity_slug="division-of-labour",
        evaluator="openrouter/anthropic/claude-3.5-sonnet",
        scores=_sample_scores(),
        evaluated_at=_NOW,
        notes=["Strong entity with clear provenance"],
    )
    defaults.update(overrides)
    return EntityEvaluation(**defaults)


def _sample_metric() -> MetricValue:
    return MetricValue("coverage_ratio", 0.85, concern="C2", details={"checked": 85})


def _sample_snapshot(**overrides) -> EvaluationSnapshot:
    defaults = dict(
        snapshot_id="2026-02-19",
        created_at=_NOW,
        schema_name="Economic Entity",
        entity_count=1,
        entity_evaluations=[_sample_evaluation()],
        collection_metrics=[_sample_metric()],
        metadata={"version": "1.0"},
    )
    defaults.update(overrides)
    return EvaluationSnapshot(**defaults)

# ── Model tests ──────────────────────────────────────────────────────


class TestScoreEntry:
    def test_to_dict_from_dict_round_trip(self):
        se = ScoreEntry("precision", 4.5, 5.0, "Good definition.")
        d = se.to_dict()
        restored = ScoreEntry.from_dict(d)
        assert restored.name == se.name
        assert restored.value == se.value
        assert restored.max_value == se.max_value
        assert restored.rationale == se.rationale

    def test_to_dict_omits_empty_rationale(self):
        se = ScoreEntry("precision", 4.5)
        d = se.to_dict()
        assert "rationale" not in d

    def test_from_dict_defaults(self):
        se = ScoreEntry.from_dict({"name": "x", "value": 3.0})
        assert se.max_value == 5.0
        assert se.rationale == ""


class TestEntityEvaluation:
    def test_overall_score_is_mean(self):
        ev = _sample_evaluation()
        # (4.5 + 4.0 + 4.5) / 3 ≈ 4.333
        assert abs(ev.overall_score - 4.333333) < 0.001

    def test_overall_score_zero_scores(self):
        ev = _sample_evaluation(scores=[])
        assert ev.overall_score == 0.0

    def test_to_dict_from_dict_round_trip(self):
        ev = _sample_evaluation()
        d = ev.to_dict()
        restored = EntityEvaluation.from_dict(d)
        assert restored.entity_slug == ev.entity_slug
        assert restored.evaluator == ev.evaluator
        assert len(restored.scores) == len(ev.scores)
        assert restored.evaluated_at == ev.evaluated_at
        assert restored.notes == ev.notes

    def test_to_dict_includes_overall_score(self):
        ev = _sample_evaluation()
        d = ev.to_dict()
        assert "overall_score" in d
        assert abs(d["overall_score"] - 4.3333) < 0.01


class TestMetricValue:
    def test_to_dict_from_dict_round_trip(self):
        mv = _sample_metric()
        d = mv.to_dict()
        restored = MetricValue.from_dict(d)
        assert restored.name == mv.name
        assert restored.value == mv.value
        assert restored.concern == mv.concern
        assert restored.details == mv.details

    def test_to_dict_omits_empty_concern(self):
        mv = MetricValue("x", 1.0)
        d = mv.to_dict()
        assert "concern" not in d
        assert "details" not in d


class TestEvaluationSnapshot:
    def test_to_dict_from_dict_round_trip(self):
        snap = _sample_snapshot()
        d = snap.to_dict()
        restored = EvaluationSnapshot.from_dict(d)
        assert restored.snapshot_id == snap.snapshot_id
        assert restored.created_at == snap.created_at
        assert restored.schema_name == snap.schema_name
        assert restored.entity_count == snap.entity_count
        assert len(restored.entity_evaluations) == 1
        assert len(restored.collection_metrics) == 1
        assert restored.metadata == snap.metadata

    def test_from_dict_empty_lists(self):
        d = {
            "snapshot_id": "test",
            "created_at": _NOW.isoformat(),
            "schema_name": "Test",
            "entity_count": 0,
        }
        snap = EvaluationSnapshot.from_dict(d)
        assert snap.entity_evaluations == []
        assert snap.collection_metrics == []
        assert snap.metadata == {}

# ── Per-entity file I/O ──────────────────────────────────────────────


class TestEntityEvaluationIO:
    def test_write_creates_file(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        assert p.exists()

    def test_file_has_yaml_frontmatter(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        text = p.read_text()
        assert text.startswith("---\n")
        assert "\n---\n" in text

    def test_frontmatter_contains_expected_keys(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        text = p.read_text()
        for key in ["entity_slug", "evaluator", "evaluated_at", "overall_score", "scores"]:
            assert key in text

    def test_markdown_body_contains_rationales(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        text = p.read_text()
        assert "Clear and specific." in text
        assert "Well grounded." in text
        assert "## definition_precision" in text

    def test_read_back_matches_original(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        restored = read_entity_evaluation(p)
        assert restored.entity_slug == ev.entity_slug
        assert restored.evaluator == ev.evaluator
        assert restored.evaluated_at == ev.evaluated_at
        assert restored.notes == ev.notes
        assert len(restored.scores) == len(ev.scores)

    def test_round_trip_preserves_scores(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        restored = read_entity_evaluation(p)
        for orig, rest in zip(ev.scores, restored.scores):
            assert rest.name == orig.name
            assert rest.value == orig.value
            assert rest.max_value == orig.max_value

    def test_round_trip_preserves_rationales(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "eval.md"
        write_entity_evaluation(ev, p)
        restored = read_entity_evaluation(p)
        assert restored.scores[0].rationale == "Clear and specific."
        assert restored.scores[1].rationale == "Well grounded."
        # Third score has no rationale
        assert restored.scores[2].rationale == ""

    def test_write_creates_parent_dirs(self, tmp_path):
        ev = _sample_evaluation()
        p = tmp_path / "deep" / "nested" / "eval.md"
        write_entity_evaluation(ev, p)
        assert p.exists()

# ── Snapshot I/O ─────────────────────────────────────────────────────


class TestSnapshotIO:
    def test_write_creates_file(self, tmp_path):
        snap = _sample_snapshot()
        p = tmp_path / "snapshot.yaml"
        write_snapshot(snap, p)
        assert p.exists()

    def test_read_back_matches_original(self, tmp_path):
        snap = _sample_snapshot()
        p = tmp_path / "snapshot.yaml"
        write_snapshot(snap, p)
        restored = read_snapshot(p)
        assert restored.snapshot_id == snap.snapshot_id
        assert restored.created_at == snap.created_at
        assert restored.schema_name == snap.schema_name
        assert restored.entity_count == snap.entity_count

    def test_round_trip_preserves_entity_evaluations(self, tmp_path):
        snap = _sample_snapshot()
        p = tmp_path / "snapshot.yaml"
        write_snapshot(snap, p)
        restored = read_snapshot(p)
        assert len(restored.entity_evaluations) == 1
        ev = restored.entity_evaluations[0]
        assert ev.entity_slug == "division-of-labour"
        assert len(ev.scores) == 3

    def test_round_trip_preserves_collection_metrics(self, tmp_path):
        snap = _sample_snapshot()
        p = tmp_path / "snapshot.yaml"
        write_snapshot(snap, p)
        restored = read_snapshot(p)
        assert len(restored.collection_metrics) == 1
        m = restored.collection_metrics[0]
        assert m.name == "coverage_ratio"
        assert m.value == 0.85
        assert m.concern == "C2"

# ── History ──────────────────────────────────────────────────────────


class TestHistory:
    def test_append_creates_new_file(self, tmp_path):
        snap = _sample_snapshot()
        hp = tmp_path / "history.yaml"
        append_to_history(snap, hp)
        assert hp.exists()
        history = read_history(hp)
        assert len(history) == 1

    def test_append_adds_to_existing(self, tmp_path):
        hp = tmp_path / "history.yaml"
        snap1 = _sample_snapshot(snapshot_id="snap-1")
        snap2 = _sample_snapshot(snapshot_id="snap-2")
        append_to_history(snap1, hp)
        append_to_history(snap2, hp)
        history = read_history(hp)
        assert len(history) == 2
        assert history[0].snapshot_id == "snap-1"
        assert history[1].snapshot_id == "snap-2"

    def test_multiple_appends_all_preserved(self, tmp_path):
        hp = tmp_path / "history.yaml"
        for i in range(5):
            snap = _sample_snapshot(snapshot_id=f"snap-{i}")
            append_to_history(snap, hp)
        history = read_history(hp)
        assert len(history) == 5
        assert [h.snapshot_id for h in history] == [f"snap-{i}" for i in range(5)]

    def test_read_history_returns_list_in_order(self, tmp_path):
        hp = tmp_path / "history.yaml"
        snap_a = _sample_snapshot(snapshot_id="a")
        snap_b = _sample_snapshot(snapshot_id="b")
        append_to_history(snap_a, hp)
        append_to_history(snap_b, hp)
        history = read_history(hp)
        assert history[0].snapshot_id == "a"
        assert history[1].snapshot_id == "b"

# ── Diffing ──────────────────────────────────────────────────────────


class TestDiffSnapshots:
    def test_identical_snapshots_empty_diff(self):
        snap = _sample_snapshot()
        diff = diff_snapshots(snap, snap)
        assert diff.added_entities == []
        assert diff.removed_entities == []
        assert diff.score_changes == []
        assert diff.metric_changes == []

    def test_added_entity(self):
        before = _sample_snapshot(entity_evaluations=[])
        after = _sample_snapshot()
        diff = diff_snapshots(before, after)
        assert "division-of-labour" in diff.added_entities
        assert diff.removed_entities == []

    def test_removed_entity(self):
        before = _sample_snapshot()
        after = _sample_snapshot(entity_evaluations=[])
        diff = diff_snapshots(before, after)
        assert "division-of-labour" in diff.removed_entities
        assert diff.added_entities == []

    def test_changed_score(self):
        ev_before = _sample_evaluation(scores=[ScoreEntry("precision", 4.0)])
        ev_after = _sample_evaluation(scores=[ScoreEntry("precision", 4.8)])
        before = _sample_snapshot(entity_evaluations=[ev_before])
        after = _sample_snapshot(entity_evaluations=[ev_after])
        diff = diff_snapshots(before, after)
        assert len(diff.score_changes) == 1
        sc = diff.score_changes[0]
        assert sc.entity_slug == "division-of-labour"
        assert sc.dimension == "precision"
        assert sc.before == 4.0
        assert sc.after == 4.8

    def test_changed_metric(self):
        before = _sample_snapshot(
            collection_metrics=[MetricValue("coverage_ratio", 0.80)]
        )
        after = _sample_snapshot(
            collection_metrics=[MetricValue("coverage_ratio", 0.90)]
        )
        diff = diff_snapshots(before, after)
        assert len(diff.metric_changes) == 1
        mc = diff.metric_changes[0]
        assert mc.name == "coverage_ratio"
        assert mc.before == 0.80
        assert mc.after == 0.90

    def test_summary_readable(self):
        ev_before = _sample_evaluation(scores=[ScoreEntry("precision", 4.0)])
        ev_after = _sample_evaluation(scores=[ScoreEntry("precision", 4.8)])
        before = _sample_snapshot(
            snapshot_id="snap-1",
            entity_evaluations=[ev_before],
            collection_metrics=[MetricValue("coverage", 0.80)],
        )
        after = _sample_snapshot(
            snapshot_id="snap-2",
            entity_evaluations=[ev_after],
            collection_metrics=[MetricValue("coverage", 0.90)],
        )
        diff = diff_snapshots(before, after)
        text = diff.summary()
        assert "snap-1" in text
        assert "snap-2" in text
        assert "precision" in text
        assert "coverage" in text

    def test_summary_no_changes(self):
        snap = _sample_snapshot()
        diff = diff_snapshots(snap, snap)
        text = diff.summary()
        assert "No changes" in text
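
Reviewer note: the diff tests above distinguish added entities, removed entities, and per-dimension score changes on entities present in both snapshots. A minimal sketch of that comparison over plain dictionaries follows; `diff_scores` and its input shape (slug mapped to a dimension/score dictionary) are assumptions made for illustration, not the signature of `diff_snapshots`.

```python
from typing import Dict, List, Tuple

ScoreMap = Dict[str, Dict[str, float]]  # entity slug -> {dimension: score}


def diff_scores(
    before: ScoreMap, after: ScoreMap
) -> Tuple[List[str], List[str], List[Tuple[str, str, float, float]]]:
    """Return (added slugs, removed slugs, changed scores) between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changes = []
    # Only entities present on both sides can have score changes.
    for slug in sorted(set(before) & set(after)):
        shared_dims = set(before[slug]) & set(after[slug])
        for dim in sorted(shared_dims):
            old, new = before[slug][dim], after[slug][dim]
            if old != new:
                changes.append((slug, dim, old, new))
    return added, removed, changes
```

Sorting the slugs and dimensions makes the output deterministic, which keeps a textual summary stable between runs.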
258
tests/unit/infospace/test_history.py
Normal file
@@ -0,0 +1,258 @@
"""
Tests for metrics history and viability tracking (S2.5).
"""

from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path

import pytest
import yaml

from markitect.infospace.checks.orchestrator import CheckReport
from markitect.infospace.checks.granularity import GranularityReport
from markitect.infospace.checks.redundancy import RedundancyReport
from markitect.infospace.config import InfospaceConfig, TopicConfig, ViabilityThreshold
from markitect.infospace.evaluation import EvaluationSnapshot, MetricValue
from markitect.infospace.history import (
    find_snapshot_by_date,
    get_history,
    get_latest_snapshot,
    metric_trend,
    read_metrics_file,
    record_check_results,
    snapshot_from_checks,
    write_metrics_file,
)


# ── helpers ──────────────────────────────────────────────────────────


def _check_report() -> CheckReport:
    return CheckReport(
        redundancy=RedundancyReport(redundancy_ratio=0.1, entity_count=10),
        granularity=GranularityReport(domain_entropy=1.5, entity_count=10),
    )


def _config(tmp_path: Path) -> InfospaceConfig:
    return InfospaceConfig(
        topic=TopicConfig(name="Test Topic", domain="Testing"),
        metrics_dir=str(tmp_path / "metrics"),
    )


def _snapshot(snap_id: str, date_str: str, metrics: dict) -> EvaluationSnapshot:
    return EvaluationSnapshot(
        snapshot_id=snap_id,
        created_at=datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc),
        schema_name="default",
        entity_count=10,
        collection_metrics=[
            MetricValue(name=k, value=v) for k, v in metrics.items()
        ],
    )

# ── snapshot_from_checks ────────────────────────────────────────────


class TestSnapshotFromChecks:
    def test_creates_snapshot(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=10)
        assert snap.entity_count == 10
        assert snap.snapshot_id  # non-empty
        assert snap.created_at is not None

    def test_contains_metrics(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=10)
        metric_names = {m.name for m in snap.collection_metrics}
        assert "redundancy_ratio" in metric_names
        assert "granularity_entropy" in metric_names

    def test_concern_labels(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=10)
        by_name = {m.name: m for m in snap.collection_metrics}
        assert by_name["redundancy_ratio"].concern == "C1"
        assert by_name["granularity_entropy"].concern == "C5"

    def test_custom_schema(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=5, schema_name="custom")
        assert snap.schema_name == "custom"

    def test_metadata(self):
        report = _check_report()
        snap = snapshot_from_checks(report, entity_count=5, metadata={"key": "val"})
        assert snap.metadata == {"key": "val"}

    def test_empty_report(self):
        report = CheckReport()
        snap = snapshot_from_checks(report, entity_count=0)
        assert snap.collection_metrics == []


# ── write_metrics_file / read_metrics_file ──────────────────────────


class TestMetricsFileIO:
    def test_round_trip(self, tmp_path):
        path = tmp_path / "metrics.yaml"
        metrics = {"redundancy_ratio": 0.05, "coverage_ratio": 0.85}
        write_metrics_file(metrics, path)
        loaded = read_metrics_file(path)
        assert loaded["redundancy_ratio"] == pytest.approx(0.05)
        assert loaded["coverage_ratio"] == pytest.approx(0.85)

    def test_creates_parent_dirs(self, tmp_path):
        path = tmp_path / "deep" / "nested" / "metrics.yaml"
        write_metrics_file({"x": 1.0}, path)
        assert path.is_file()

    def test_read_missing_file(self, tmp_path):
        path = tmp_path / "nonexistent.yaml"
        assert read_metrics_file(path) == {}

    def test_read_invalid_content(self, tmp_path):
        path = tmp_path / "bad.yaml"
        path.write_text("just a string", encoding="utf-8")
        assert read_metrics_file(path) == {}


# ── record_check_results ────────────────────────────────────────────


class TestRecordCheckResults:
    def test_creates_metrics_file(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        record_check_results(report, cfg, tmp_path, entity_count=10)
        metrics_path = tmp_path / cfg.metrics_dir / "metrics.yaml"
        assert metrics_path.is_file()

    def test_creates_history_file(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        record_check_results(report, cfg, tmp_path, entity_count=10)
        history_path = tmp_path / cfg.metrics_dir / "history.yaml"
        assert history_path.is_file()

    def test_appends_to_history(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        record_check_results(report, cfg, tmp_path, entity_count=10)
        record_check_results(report, cfg, tmp_path, entity_count=12)
        history = get_history(cfg, tmp_path)
        assert len(history) == 2
        assert history[0].entity_count == 10
        assert history[1].entity_count == 12

    def test_returns_snapshot(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        snap = record_check_results(report, cfg, tmp_path, entity_count=10)
        assert snap.snapshot_id
        assert snap.entity_count == 10


# ── get_history / get_latest_snapshot ────────────────────────────────


class TestGetHistory:
    def test_empty_history(self, tmp_path):
        cfg = _config(tmp_path)
        assert get_history(cfg, tmp_path) == []

    def test_get_latest(self, tmp_path):
        cfg = _config(tmp_path)
        report = _check_report()
        record_check_results(report, cfg, tmp_path, entity_count=5)
        record_check_results(report, cfg, tmp_path, entity_count=10)
        latest = get_latest_snapshot(cfg, tmp_path)
        assert latest is not None
        assert latest.entity_count == 10

    def test_latest_none_when_empty(self, tmp_path):
        cfg = _config(tmp_path)
        assert get_latest_snapshot(cfg, tmp_path) is None


# ── find_snapshot_by_date ────────────────────────────────────────────


class TestFindSnapshotByDate:
    def test_finds_closest(self):
        history = [
            _snapshot("a", "2026-01-01T10:00:00", {"x": 1.0}),
            _snapshot("b", "2026-02-15T10:00:00", {"x": 2.0}),
            _snapshot("c", "2026-03-01T10:00:00", {"x": 3.0}),
        ]
        result = find_snapshot_by_date(history, "2026-02-14")
        assert result is not None
        assert result.snapshot_id == "b"

    def test_exact_match(self):
        history = [
            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0}),
            _snapshot("b", "2026-02-01T00:00:00", {"x": 2.0}),
        ]
        result = find_snapshot_by_date(history, "2026-02-01")
        assert result is not None
        assert result.snapshot_id == "b"

    def test_empty_history(self):
        assert find_snapshot_by_date([], "2026-01-01") is None

    def test_invalid_date(self):
        history = [_snapshot("a", "2026-01-01T00:00:00", {"x": 1.0})]
        assert find_snapshot_by_date(history, "not-a-date") is None

    def test_with_timestamp(self):
        history = [
            _snapshot("a", "2026-01-01T10:00:00", {"x": 1.0}),
            _snapshot("b", "2026-01-01T14:00:00", {"x": 2.0}),
        ]
        result = find_snapshot_by_date(history, "2026-01-01T13:00:00")
        assert result is not None
        assert result.snapshot_id == "b"
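
Reviewer note: the `TestFindSnapshotByDate` cases above pin down a "nearest snapshot by timestamp" lookup. The sketch below is one way such a lookup could behave; it is not the code under test. `Snap` and `closest_snapshot` are hypothetical names, and treating naive input as UTC is an assumption chosen to match the UTC-stamped fixtures.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass
class Snap:
    """Hypothetical stand-in for EvaluationSnapshot (only the fields used here)."""
    snapshot_id: str
    created_at: datetime


def closest_snapshot(history: Sequence[Snap], date_str: str) -> Optional[Snap]:
    """Return the snapshot whose created_at is nearest to date_str, or None."""
    if not history:
        return None
    try:
        target = datetime.fromisoformat(date_str)
    except ValueError:
        return None  # unparseable date string
    if target.tzinfo is None:
        # Assumption: naive input is interpreted as UTC.
        target = target.replace(tzinfo=timezone.utc)
    # Smallest absolute time difference wins; ties go to the earlier list entry.
    return min(history, key=lambda s: abs(s.created_at - target))
```

Using the absolute difference means a query can resolve to a snapshot taken after the requested date, which is what `test_finds_closest` expects.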


# ── metric_trend ─────────────────────────────────────────────────────


class TestMetricTrend:
    def test_extracts_trend(self):
        history = [
            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0, "y": 2.0}),
            _snapshot("b", "2026-02-01T00:00:00", {"x": 1.5, "y": 2.5}),
        ]
        trend = metric_trend(history, "x")
        assert len(trend) == 2
        assert trend[0]["value"] == 1.0
        assert trend[1]["value"] == 1.5

    def test_missing_metric(self):
        history = [
            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0}),
        ]
        assert metric_trend(history, "nonexistent") == []

    def test_empty_history(self):
        assert metric_trend([], "x") == []

    def test_partial_presence(self):
        history = [
            _snapshot("a", "2026-01-01T00:00:00", {"x": 1.0}),
            _snapshot("b", "2026-02-01T00:00:00", {"y": 2.0}),  # x missing
            _snapshot("c", "2026-03-01T00:00:00", {"x": 3.0}),
        ]
        trend = metric_trend(history, "x")
        assert len(trend) == 2
        assert trend[0]["value"] == 1.0
        assert trend[1]["value"] == 3.0
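
Reviewer note: the `TestMetricTrend` cases imply that snapshots lacking the requested metric are skipped, not filled with a placeholder. A minimal sketch of that behaviour follows. Plain dicts stand in for `EvaluationSnapshot`, and `trend_of` plus the `"id"` and `"metrics"` keys are hypothetical; only the `"value"` key in each returned point is confirmed by the tests.

```python
from typing import Any, Dict, List


def trend_of(history: List[Dict[str, Any]], metric: str) -> List[Dict[str, Any]]:
    """Collect one point per snapshot that actually recorded `metric`."""
    points: List[Dict[str, Any]] = []
    for snap in history:  # history is assumed to be ordered oldest-first
        if metric in snap["metrics"]:
            points.append({"snapshot_id": snap["id"], "value": snap["metrics"][metric]})
    return points
```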
419
tests/unit/infospace/test_schema_validator.py
Normal file
@@ -0,0 +1,419 @@
"""Tests for markitect.infospace schema and validator modules."""

import pytest

from markitect.infospace import (
    ECONOMIC_ENTITY_SCHEMA,
    BatchComplianceResult,
    ComplianceDiagnostic,
    ComplianceResult,
    EntityMeta,
    EntitySchema,
    EnumConstraint,
    SectionRequirement,
    SectionRule,
    validate_entities,
    validate_entity,
)


# ── Helpers ──────────────────────────────────────────────────────────

def _compliant_entity(**overrides) -> EntityMeta:
    """Return an EntityMeta that passes ECONOMIC_ENTITY_SCHEMA."""
    defaults = dict(
        slug="division_of_labour",
        title="Division of Labour",
        h1_raw="Division of Labour",
        definition=(
            "The separation of a work process into a number of distinct "
            "tasks, each performed by a specialised worker, resulting in "
            "a significant increase in the productive powers of labour."
        ),
        source_chapter='Book I, Chapter 1: "Of the Division of Labour"',
        context="The division of labour is the central argument of the chapter.",
        domain="Production",
        original_wording='"The greatest improvements in the productive powers…"',
        modern_interpretation="Remains foundational in economics.",
        h1_is_title_case=True,
        has_original_wording=True,
        definition_word_count=30,
        total_word_count=100,
        section_slugs=[
            "definition",
            "source_chapter",
            "context",
            "economic_domain",
            "smith_s_original_wording",
            "modern_interpretation",
        ],
        source_path="/tmp/division-of-labour.md",
    )
    defaults.update(overrides)
    return EntityMeta(**defaults)


# ── Single-entity validation ────────────────────────────────────────

class TestValidateEntityCompliant:
    def test_fully_compliant_zero_diagnostics(self):
        entity = _compliant_entity()
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        assert result.diagnostics == []
        assert result.is_compliant is True
        assert result.error_count == 0
        assert result.warning_count == 0
        assert result.checks_run > 0

    def test_summary_shows_pass(self):
        entity = _compliant_entity()
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        assert "PASS" in result.summary()
        assert "division_of_labour" in result.summary()


class TestSectionMissing:
    def test_missing_required_section_error(self):
        entity = _compliant_entity(definition="")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_MISSING" in codes
        assert not result.is_compliant

    def test_empty_required_section_error(self):
        entity = _compliant_entity(definition="   ")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_MISSING" in codes

    def test_optional_section_absent_no_diagnostic(self):
        entity = _compliant_entity(original_wording="", modern_interpretation="")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        # Only optional sections removed — should still be fully compliant
        assert result.is_compliant is True
        assert result.error_count == 0
        # No SECTION_MISSING or SECTION_RECOMMENDED for optional sections
        section_codes = {d.code for d in result.diagnostics}
        assert "SECTION_MISSING" not in section_codes
        assert "SECTION_RECOMMENDED" not in section_codes


class TestSectionRecommended:
    def test_recommended_section_missing_warning(self):
        schema = EntitySchema(
            name="Test Schema",
            section_rules=(
                SectionRule(
                    slug="definition",
                    label="Definition",
                    requirement=SectionRequirement.RECOMMENDED,
                ),
            ),
        )
        entity = _compliant_entity(definition="")
        result = validate_entity(entity, schema)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_RECOMMENDED" in codes
        severities = [d.severity for d in result.diagnostics if d.code == "SECTION_RECOMMENDED"]
        assert severities == ["warning"]
        # Warnings don't break compliance
        assert result.is_compliant is True


class TestWordCountBounds:
    def test_definition_too_short_error(self):
        # Exactly ten words, below the 20-word minimum exercised by the boundary test.
        entity = _compliant_entity(
            definition="only ten words here to test the lower boundary check"
        )
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        short_diags = [d for d in result.diagnostics if d.code == "SECTION_TOO_SHORT"]
        assert len(short_diags) == 1
        assert short_diags[0].severity == "error"
        assert not result.is_compliant

    def test_definition_too_long_warning(self):
        long_def = " ".join(["word"] * 200)
        entity = _compliant_entity(definition=long_def)
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        long_diags = [d for d in result.diagnostics if d.code == "SECTION_TOO_LONG"]
        assert len(long_diags) == 1
        assert long_diags[0].severity == "warning"
        # Warnings don't break compliance
        assert result.is_compliant is True

    def test_definition_at_min_boundary_passes(self):
        exactly_20 = " ".join(["word"] * 20)
        entity = _compliant_entity(definition=exactly_20)
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_TOO_SHORT" not in codes

    def test_definition_at_max_boundary_passes(self):
        exactly_150 = " ".join(["word"] * 150)
        entity = _compliant_entity(definition=exactly_150)
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "SECTION_TOO_LONG" not in codes
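
Reviewer note: taken together, the `TestWordCountBounds` cases describe inclusive bounds where falling short is an error and running long is only a warning. The sketch below illustrates that rule in isolation. `check_word_bounds` is a hypothetical helper, not the validator's real API; the diagnostic codes and severities are the ones asserted above, and the 20/150 limits used in the usage line are inferred from the boundary tests.

```python
from typing import List, Optional, Tuple


def check_word_bounds(
    text: str, min_words: Optional[int], max_words: Optional[int]
) -> List[Tuple[str, str]]:
    """Return (code, severity) pairs for a section's word count; bounds are inclusive."""
    count = len(text.split())  # whitespace-separated tokens
    found: List[Tuple[str, str]] = []
    if min_words is not None and count < min_words:
        found.append(("SECTION_TOO_SHORT", "error"))
    if max_words is not None and count > max_words:
        found.append(("SECTION_TOO_LONG", "warning"))
    return found


# Usage: a 200-word definition checked against assumed limits of 20 and 150.
print(check_word_bounds(" ".join(["word"] * 200), 20, 150))
```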


class TestH1Checks:
    def test_slug_format_h1_warning(self):
        entity = _compliant_entity(
            h1_raw="effectual-demand",
            h1_is_title_case=False,
        )
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        h1_diags = [d for d in result.diagnostics if d.code == "H1_NOT_TITLE_CASE"]
        assert len(h1_diags) == 1
        assert h1_diags[0].severity == "warning"
        # Still compliant (it's a warning)
        assert result.is_compliant is True

    def test_h1_missing_error(self):
        entity = _compliant_entity(slug="", h1_raw="")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = [d.code for d in result.diagnostics]
        assert "H1_MISSING" in codes
        assert not result.is_compliant

    def test_h1_title_case_error_severity(self):
        schema = EntitySchema(
            name="Strict",
            section_rules=(),
            h1_title_case_severity="error",
        )
        entity = _compliant_entity(h1_is_title_case=False)
        result = validate_entity(entity, schema)
        h1_diags = [d for d in result.diagnostics if d.code == "H1_NOT_TITLE_CASE"]
        assert h1_diags[0].severity == "error"
        assert not result.is_compliant


class TestEnumConstraints:
    def test_unknown_domain_warning(self):
        entity = _compliant_entity(domain="Metaphysics")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        enum_diags = [d for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
        assert len(enum_diags) == 1
        assert enum_diags[0].severity == "warning"
        assert result.is_compliant is True

    def test_empty_domain_no_enum_diagnostic(self):
        """Empty domain triggers SECTION_MISSING, not ENUM_VALUE_UNKNOWN."""
        entity = _compliant_entity(domain="")
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        enum_codes = [d.code for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
        assert len(enum_codes) == 0
        # But SECTION_MISSING is raised for the required section
        missing_codes = [d.code for d in result.diagnostics if d.code == "SECTION_MISSING"]
        assert len(missing_codes) >= 1

    def test_valid_domain_no_diagnostic(self):
        for domain in ("Production", "Exchange", "Distribution", "Regulation", "General Theory"):
            entity = _compliant_entity(domain=domain)
            result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
            enum_diags = [d for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
            assert len(enum_diags) == 0, f"Unexpected enum diagnostic for domain '{domain}'"


class TestMultipleIssues:
    def test_multiple_issues_on_one_entity(self):
        entity = _compliant_entity(
            definition="too short",
            domain="UnknownDomain",
            h1_is_title_case=False,
        )
        result = validate_entity(entity, ECONOMIC_ENTITY_SCHEMA)
        codes = {d.code for d in result.diagnostics}
        assert "SECTION_TOO_SHORT" in codes
        assert "ENUM_VALUE_UNKNOWN" in codes
        assert "H1_NOT_TITLE_CASE" in codes
        assert len(result.diagnostics) >= 3


class TestCustomSchema:
    def test_custom_schema_different_rules(self):
        schema = EntitySchema(
            name="Custom",
            section_rules=(
                SectionRule(
                    slug="definition",
                    label="Definition",
                    requirement=SectionRequirement.REQUIRED,
                    min_words=5,
                    max_words=50,
                ),
            ),
            enum_constraints=(
                EnumConstraint(
                    field_name="domain",
                    allowed_values=("Alpha", "Beta"),
                    severity="error",
                ),
            ),
            h1_title_case_severity="error",
            require_h1=False,
        )
        entity = _compliant_entity(
            definition="just five words here exactly",
            domain="Alpha",
        )
        result = validate_entity(entity, schema)
        assert result.is_compliant is True
        assert result.schema_name == "Custom"

    def test_custom_enum_error_severity(self):
        schema = EntitySchema(
            name="Strict Enum",
            section_rules=(),
            enum_constraints=(
                EnumConstraint(
                    field_name="domain",
                    allowed_values=("A",),
                    severity="error",
                ),
            ),
        )
        entity = _compliant_entity(domain="B")
        result = validate_entity(entity, schema)
        assert not result.is_compliant
        enum_diags = [d for d in result.diagnostics if d.code == "ENUM_VALUE_UNKNOWN"]
        assert enum_diags[0].severity == "error"


# ── Batch validation ────────────────────────────────────────────────

class TestBatchValidation:
    def test_empty_list(self):
        result = validate_entities([], ECONOMIC_ENTITY_SCHEMA)
        assert result.total_entities == 0
        assert result.compliant_count == 0
        assert result.total_errors == 0
        assert result.total_warnings == 0

    def test_mixed_compliance(self):
        good = _compliant_entity()
        bad = _compliant_entity(slug="bad", definition="")
        result = validate_entities([good, bad], ECONOMIC_ENTITY_SCHEMA)
        assert result.total_entities == 2
        assert result.compliant_count == 1
        assert result.non_compliant_count == 1
        assert result.total_errors >= 1

    def test_summary_format(self):
        good = _compliant_entity()
        bad = _compliant_entity(slug="bad_entity", definition="too short")
        result = validate_entities([good, bad], ECONOMIC_ENTITY_SCHEMA)
        summary = result.summary()
        assert "Schema: Economic Entity" in summary
        assert "Entities: 2" in summary
        assert "Compliant: 1/2" in summary
        assert "division_of_labour" in summary
        assert "bad_entity" in summary

    def test_aggregate_counts(self):
        entities = [
            _compliant_entity(slug="e1"),
            _compliant_entity(slug="e2", definition="short"),
            _compliant_entity(slug="e3", domain="Unknown", h1_is_title_case=False),
        ]
        result = validate_entities(entities, ECONOMIC_ENTITY_SCHEMA)
        assert result.total_entities == 3
        # Batch totals must equal the sum of the per-entity counts
        assert result.total_errors == sum(r.error_count for r in result.results)
        assert result.total_warnings == sum(r.warning_count for r in result.results)

    def test_schema_name_propagated(self):
        result = validate_entities([], ECONOMIC_ENTITY_SCHEMA)
        assert result.schema_name == "Economic Entity"


# ── Default schema checks ──────────────────────────────────────────

class TestDefaultSchema:
    def test_correct_section_count(self):
        assert len(ECONOMIC_ENTITY_SCHEMA.section_rules) == 6

    def test_required_sections(self):
        required = [
            r.slug for r in ECONOMIC_ENTITY_SCHEMA.section_rules
            if r.requirement == SectionRequirement.REQUIRED
        ]
        assert set(required) == {"definition", "source_chapter", "context", "economic_domain"}

    def test_optional_sections(self):
        optional = [
            r.slug for r in ECONOMIC_ENTITY_SCHEMA.section_rules
            if r.requirement == SectionRequirement.OPTIONAL
        ]
        assert set(optional) == {"smith_s_original_wording", "modern_interpretation"}

    def test_domain_enum_values(self):
        domain_constraint = ECONOMIC_ENTITY_SCHEMA.enum_constraints[0]
        assert domain_constraint.field_name == "domain"
        assert set(domain_constraint.allowed_values) == {
            "Production", "Exchange", "Distribution", "Regulation", "General Theory",
        }

    def test_schema_is_frozen(self):
        with pytest.raises(AttributeError):
            ECONOMIC_ENTITY_SCHEMA.name = "Changed"

    def test_section_rule_is_frozen(self):
        rule = ECONOMIC_ENTITY_SCHEMA.section_rules[0]
        with pytest.raises(AttributeError):
            rule.slug = "changed"

    def test_enum_constraint_is_frozen(self):
        constraint = ECONOMIC_ENTITY_SCHEMA.enum_constraints[0]
        with pytest.raises(AttributeError):
            constraint.field_name = "changed"


# ── ComplianceDiagnostic __str__ ────────────────────────────────────

class TestDiagnosticStr:
    def test_basic_str(self):
        d = ComplianceDiagnostic(code="TEST", message="test msg", severity="error")
        assert "[ERROR] TEST: test msg" in str(d)

    def test_str_with_section(self):
        d = ComplianceDiagnostic(
            code="SECTION_MISSING",
            message="Missing.",
            severity="error",
            section="definition",
        )
        s = str(d)
        assert "(section: definition)" in s

    def test_str_with_field(self):
        d = ComplianceDiagnostic(
            code="ENUM_VALUE_UNKNOWN",
            message="Unknown.",
            severity="warning",
            field="domain",
        )
        s = str(d)
        assert "(field: domain)" in s


# ── ComplianceResult properties ─────────────────────────────────────

class TestComplianceResultProperties:
    def test_errors_property(self):
        result = ComplianceResult(entity_slug="test", schema_name="Test")
        result.diagnostics = [
            ComplianceDiagnostic(code="A", message="a", severity="error"),
            ComplianceDiagnostic(code="B", message="b", severity="warning"),
            ComplianceDiagnostic(code="C", message="c", severity="error"),
        ]
        assert len(result.errors) == 2
        assert len(result.warnings) == 1
        assert result.error_count == 2
        assert result.warning_count == 1
        assert not result.is_compliant

    def test_summary_fail(self):
        result = ComplianceResult(entity_slug="test", schema_name="Test", checks_run=5)
        result.diagnostics = [
            ComplianceDiagnostic(code="A", message="a", severity="error"),
        ]
        assert "FAIL" in result.summary()
235
tests/unit/llm/test_embeddings.py
Normal file
@@ -0,0 +1,235 @@
"""Tests for embedding adapter, cache, similarity, and factory."""

from pathlib import Path
from unittest import mock

import pytest

from markitect.llm.similarity import (
    cosine_similarity,
    similarity_matrix,
    find_similar_pairs,
)
from markitect.llm.embedding_cache import EmbeddingCache
from markitect.llm.embedding_openai import OpenAICompatibleEmbeddingAdapter
from markitect.llm.embedding_factory import create_embedding_adapter
from markitect.llm.exceptions import LLMConfigurationError, LLMRateLimitError


# ── Similarity math ─────────────────────────────────────────────────


class TestCosineSimilarity:
    def test_identical_vectors(self):
        v = [1.0, 2.0, 3.0]
        assert cosine_similarity(v, v) == pytest.approx(1.0)

    def test_orthogonal_vectors(self):
        a = [1.0, 0.0, 0.0]
        b = [0.0, 1.0, 0.0]
        assert cosine_similarity(a, b) == pytest.approx(0.0)

    def test_opposite_vectors(self):
        a = [1.0, 0.0]
        b = [-1.0, 0.0]
        assert cosine_similarity(a, b) == pytest.approx(-1.0)

    def test_zero_vector(self):
        assert cosine_similarity([0.0, 0.0], [1.0, 2.0]) == 0.0
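
Reviewer note: the four `TestCosineSimilarity` cases fix the contract, including the choice to return `0.0` for a zero-length vector where the formula would otherwise divide by zero. A pure-Python sketch that satisfies the same contract is shown below. `cosine` is a hypothetical name, and the real `markitect.llm.similarity.cosine_similarity` may be implemented differently.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # guard against division by zero
    return dot / (norm_a * norm_b)
```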


class TestSimilarityMatrix:
    def test_diagonal_is_one(self):
        vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
        mat = similarity_matrix(vecs)
        for i in range(len(vecs)):
            assert mat[i][i] == pytest.approx(1.0)

    def test_symmetric(self):
        vecs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
        mat = similarity_matrix(vecs)
        n = len(vecs)
        for i in range(n):
            for j in range(n):
                assert mat[i][j] == pytest.approx(mat[j][i])


class TestFindSimilarPairs:
    def test_threshold_filters(self):
        emb = {
            "a": [1.0, 0.0],
            "b": [0.0, 1.0],
            "c": [1.0, 0.01],  # very similar to "a"
        }
        pairs = find_similar_pairs(emb, threshold=0.90)
        slugs_in_pairs = {(s1, s2) for s1, s2, _ in pairs}
        assert ("a", "c") in slugs_in_pairs
        # a-b are orthogonal, should not appear
        assert ("a", "b") not in slugs_in_pairs

    def test_sorted_descending(self):
        emb = {
            "x": [1.0, 0.0, 0.0],
            "y": [0.9, 0.1, 0.0],
            "z": [0.95, 0.05, 0.0],
        }
        pairs = find_similar_pairs(emb, threshold=0.0)
        sims = [s for _, _, s in pairs]
        assert sims == sorted(sims, reverse=True)

    def test_empty_embeddings(self):
        assert find_similar_pairs({}) == []

    def test_single_embedding(self):
        assert find_similar_pairs({"only": [1.0, 0.0]}) == []
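
Reviewer note: `TestFindSimilarPairs` expects `(slug, slug, similarity)` triples, filtered by a threshold and sorted from most to least similar. One way to get that shape is sketched below. `similar_pairs` and `_cosine` are hypothetical names, and the default threshold of 0.85 is an assumption: the tests show only that some default exists.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a zero-norm guard."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def similar_pairs(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.85,  # assumed default, not taken from the source
) -> List[Tuple[str, str, float]]:
    """Return every unordered slug pair at or above threshold, most similar first."""
    pairs: List[Tuple[str, str, float]] = []
    # Sorting the slugs makes the order inside each pair deterministic.
    for s1, s2 in combinations(sorted(embeddings), 2):
        sim = _cosine(embeddings[s1], embeddings[s2])
        if sim >= threshold:
            pairs.append((s1, s2, sim))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs
```

Each pair is visited once, so the cost is quadratic in the number of entities. That is acceptable for small collections but is the first thing to revisit if the entity count grows.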


# ── Embedding cache ─────────────────────────────────────────────────


class TestEmbeddingCache:
    def test_put_get_roundtrip(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        cache.put("division-of-labour", "abc123", [0.1, 0.2, 0.3])
        assert cache.get("division-of-labour", "abc123") == [0.1, 0.2, 0.3]

    def test_wrong_digest_returns_none(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        cache.put("slug", "digest-v1", [1.0])
        assert cache.get("slug", "digest-v2") is None

    def test_missing_slug_returns_none(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        assert cache.get("nonexistent", "any") is None

    def test_save_load_persists(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        cache.put("slug-a", "d1", [0.5, 0.6])
        cache.save()

        cache2 = EmbeddingCache(tmp_path)
        assert cache2.get("slug-a", "d1") == [0.5, 0.6]

    def test_stats_tracks_hits_and_misses(self, tmp_path: Path):
        cache = EmbeddingCache(tmp_path)
        cache.put("s", "d", [1.0])
        cache.get("s", "d")  # hit
        cache.get("s", "wrong")  # miss
        cache.get("missing", "x")  # miss
        s = cache.stats()
        assert s["entries"] == 1
        assert s["hits"] == 1
        assert s["misses"] == 2
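
Reviewer note: the `TestEmbeddingCache` cases describe a cache keyed by slug and validated by a content digest, so a changed entity misses even though its slug is unchanged. The sketch below shows that idea with a single JSON file. `DigestCache` and the `embeddings.json` filename are hypothetical; the on-disk format of the real `EmbeddingCache` is not shown in this diff.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


class DigestCache:
    """Hypothetical digest-validated cache; not the real EmbeddingCache."""

    def __init__(self, cache_dir: Path, filename: str = "embeddings.json") -> None:
        self._path = Path(cache_dir) / filename
        self._entries: Dict[str, dict] = {}
        self._hits = 0
        self._misses = 0
        if self._path.is_file():
            # Reload whatever an earlier instance saved.
            self._entries = json.loads(self._path.read_text(encoding="utf-8"))

    def get(self, slug: str, digest: str) -> Optional[List[float]]:
        entry = self._entries.get(slug)
        if entry is None or entry["digest"] != digest:
            self._misses += 1  # unknown slug or stale content
            return None
        self._hits += 1
        return entry["vector"]

    def put(self, slug: str, digest: str, vector: List[float]) -> None:
        self._entries[slug] = {"digest": digest, "vector": list(vector)}

    def save(self) -> None:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps(self._entries), encoding="utf-8")

    def stats(self) -> Dict[str, int]:
        return {"entries": len(self._entries), "hits": self._hits, "misses": self._misses}
```

Storing one entry per slug means a re-embedded entity overwrites its stale vector, so the cache does not accumulate one entry per digest.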


# ── Adapter (mocked HTTP) ──────────────────────────────────────────


def _make_embedding_response(vectors):
    """Build a mock API response for the /embeddings endpoint."""
    return {
        "data": [
            {"embedding": vec, "index": i}
            for i, vec in enumerate(vectors)
        ],
        "usage": {"prompt_tokens": 5, "total_tokens": 5},
    }


class TestOpenAICompatibleEmbeddingAdapter:
    def _adapter(self, **kwargs):
        defaults = {"api_key": "sk-test", "provider": "openai"}
        defaults.update(kwargs)
        return OpenAICompatibleEmbeddingAdapter(**defaults)

    @mock.patch("markitect.llm.embedding_openai.post_json")
    def test_embed_returns_vectors_in_order(self, mock_post):
        # Return indices out of order to verify sorting
        mock_post.return_value = {
            "data": [
                {"embedding": [0.2, 0.3], "index": 1},
                {"embedding": [0.1, 0.2], "index": 0},
            ],
            "usage": {},
        }
        adapter = self._adapter()
        result = adapter.embed(["text1", "text2"])
        assert result == [[0.1, 0.2], [0.2, 0.3]]

    @mock.patch("markitect.llm.embedding_openai.post_json")
    def test_embed_payload_structure(self, mock_post):
        mock_post.return_value = _make_embedding_response([[0.1]])
        adapter = self._adapter(model="text-embedding-3-large")
        adapter.embed(["hello"])

        call_args = mock_post.call_args
        url = call_args[0][0]
        payload = call_args[0][1]
        assert url == "https://api.openai.com/v1/embeddings"
        assert payload["model"] == "text-embedding-3-large"
        assert payload["input"] == ["hello"]

    def test_embed_raises_without_api_key(self):
        adapter = OpenAICompatibleEmbeddingAdapter(api_key=None, provider="openai")
        adapter._api_key = None
        with pytest.raises(LLMConfigurationError):
            adapter.embed(["test"])

    def test_validate_true_with_key(self):
        adapter = self._adapter()
        assert adapter.validate() is True

    def test_validate_false_without_key(self):
        adapter = OpenAICompatibleEmbeddingAdapter(api_key=None, provider="openai")
        adapter._api_key = None
        assert adapter.validate() is False

    @mock.patch("markitect.llm.embedding_openai.post_json")
    @mock.patch("markitect.llm.embedding_openai.time.sleep")
    def test_retry_on_429(self, mock_sleep, mock_post):
        mock_post.side_effect = [
            LLMRateLimitError("rate limited", status_code=429),
            _make_embedding_response([[0.1, 0.2]]),
        ]
        adapter = self._adapter(max_retries=2)
        result = adapter.embed(["test"])
        assert result == [[0.1, 0.2]]
        assert mock_sleep.call_count == 1

    def test_openai_provider_base_url(self):
        adapter = self._adapter(provider="openai")
        assert adapter._api_base == "https://api.openai.com/v1"

    def test_openrouter_provider_base_url(self):
        adapter = self._adapter(provider="openrouter")
        assert adapter._api_base == "https://openrouter.ai/api/v1"

    def test_unknown_provider_raises(self):
        with pytest.raises(LLMConfigurationError):
            OpenAICompatibleEmbeddingAdapter(api_key="sk-test", provider="unknown")
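
Reviewer note: `test_retry_on_429` above asserts that one rate-limit failure followed by a success produces exactly one sleep. The sketch below shows a retry loop with that property. `RateLimited`, `call_with_retry`, and the exponential backoff schedule are all hypothetical: the test confirms only the sleep count, not the delay policy of the real adapter.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for LLMRateLimitError."""


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited up to max_retries times with doubling delays."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimited:
            if attempt >= max_retries:
                raise  # retries exhausted; surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ... the base delay
            attempt += 1
```

Passing `sleep` as a parameter serves the same purpose as patching `time.sleep` in the test: the retry path can be exercised without real delays.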


# ── Factory ─────────────────────────────────────────────────────────


class TestCreateEmbeddingAdapter:
    def test_openai_provider(self):
        adapter = create_embedding_adapter("openai", api_key="sk-test")
        assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
        assert adapter._provider == "openai"

    def test_openrouter_provider(self):
        adapter = create_embedding_adapter("openrouter", api_key="sk-test")
        assert isinstance(adapter, OpenAICompatibleEmbeddingAdapter)
        assert adapter._provider == "openrouter"

    def test_unknown_provider_raises(self):
        with pytest.raises(LLMConfigurationError) as exc_info:
            create_embedding_adapter("unknown")
        assert "unknown" in str(exc_info.value)

    def test_model_passed_through(self):
        adapter = create_embedding_adapter(
            "openai", model="text-embedding-3-large", api_key="sk-test"
        )
        assert adapter._model == "text-embedding-3-large"
281
tests/unit/prompts/test_batch_evaluator.py
Normal file
@@ -0,0 +1,281 @@
"""Tests for markitect.prompts.execution.batch."""

import pytest

from markitect.prompts.execution.batch import (
    BatchEvaluator,
    BatchItem,
    BatchResult,
    BatchSummary,
)
from markitect.prompts.execution.llm_adapter import MockLLMAdapter, ErrorLLMAdapter
from markitect.prompts.execution.models import RunConfig, LLMResponse


# ── Helpers ──────────────────────────────────────────────────────────


def _items(n=3, digest_prefix="d"):
    return [
        BatchItem(
            key=f"entity-{i}",
            prompt=f"Evaluate entity {i}",
            content_digest=f"{digest_prefix}{i}",
            metadata={"index": i},
        )
        for i in range(n)
    ]


# ── BatchItem / BatchResult / BatchSummary ───────────────────────────


class TestBatchModels:
    def test_batch_item_defaults(self):
        item = BatchItem(key="slug", prompt="text")
        assert item.content_digest == ""
        assert item.metadata == {}

    def test_batch_result_defaults(self):
        result = BatchResult(key="slug", status="success")
        assert result.response is None
        assert result.error is None

    def test_summary_total_tokens(self):
        s = BatchSummary(total_prompt_tokens=100, total_completion_tokens=50)
        assert s.total_tokens == 150

    def test_summary_success_rate_all_success(self):
        s = BatchSummary(total=3, succeeded=3)
        assert s.success_rate() == 1.0

    def test_summary_success_rate_with_failures(self):
        s = BatchSummary(total=4, succeeded=2, failed=2)
        assert s.success_rate() == pytest.approx(0.5)

    def test_summary_success_rate_all_skipped(self):
        s = BatchSummary(total=3, skipped=3)
        assert s.success_rate() == 1.0

    def test_summary_success_rate_mixed(self):
        s = BatchSummary(total=5, succeeded=2, failed=1, skipped=2)
        # 3 attempted, 2 succeeded
        assert s.success_rate() == pytest.approx(2 / 3)


# ── BatchEvaluator ──────────────────────────────────────────────────


class TestBatchEvaluator:
    def test_evaluate_all_items(self):
        adapter = MockLLMAdapter("result")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(3))

        assert summary.total == 3
        assert summary.succeeded == 3
        assert summary.failed == 0
        assert summary.skipped == 0
        assert len(summary.results) == 3
        assert adapter.call_count == 3

    def test_results_preserve_keys(self):
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        items = _items(2)
        summary = evaluator.evaluate(items)

        keys = [r.key for r in summary.results]
        assert keys == ["entity-0", "entity-1"]

    def test_results_preserve_metadata(self):
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        items = _items(1)
        summary = evaluator.evaluate(items)
        assert summary.results[0].metadata == {"index": 0}

    def test_response_content_available(self):
        adapter = MockLLMAdapter("evaluated text")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(1))
        assert summary.results[0].response.content == "evaluated text"

    def test_token_usage_aggregated(self):
        adapter = MockLLMAdapter("result")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(3))
        assert summary.total_prompt_tokens > 0
        assert summary.total_completion_tokens > 0
        assert summary.total_tokens == summary.total_prompt_tokens + summary.total_completion_tokens

    def test_config_passed_to_adapter(self):
        adapter = MockLLMAdapter("ok")
        config = RunConfig(temperature=0.1, max_tokens=500)
        evaluator = BatchEvaluator(adapter, config=config)
        evaluator.evaluate(_items(1))
        assert adapter.last_config.temperature == 0.1
        assert adapter.last_config.max_tokens == 500


# ── Incremental evaluation ──────────────────────────────────────────


class TestIncrementalEvaluation:
    def test_skip_unchanged_items(self):
        adapter = MockLLMAdapter("result")
        previous = {"entity-0": "d0", "entity-1": "d1", "entity-2": "d2"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)

        summary = evaluator.evaluate(_items(3))
        assert summary.skipped == 3
        assert summary.succeeded == 0
        assert adapter.call_count == 0

    def test_evaluate_changed_items(self):
        adapter = MockLLMAdapter("result")
        # Only entity-0 has matching digest
        previous = {"entity-0": "d0"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)

        summary = evaluator.evaluate(_items(3))
        assert summary.skipped == 1
        assert summary.succeeded == 2
        assert adapter.call_count == 2

    def test_evaluate_new_items(self):
        adapter = MockLLMAdapter("result")
        # Previous has different keys
        previous = {"old-entity": "old-digest"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)

        summary = evaluator.evaluate(_items(2))
        assert summary.skipped == 0
        assert summary.succeeded == 2

    def test_changed_digest_not_skipped(self):
        adapter = MockLLMAdapter("result")
        # Same key but different digest
        previous = {"entity-0": "old-digest"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)

        summary = evaluator.evaluate(_items(1))
        assert summary.skipped == 0
        assert summary.succeeded == 1

    def test_empty_digest_not_skipped(self):
        adapter = MockLLMAdapter("result")
        previous = {"entity-0": "d0"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)

        item = BatchItem(key="entity-0", prompt="eval", content_digest="")
        summary = evaluator.evaluate([item])
        assert summary.skipped == 0
        assert summary.succeeded == 1

    def test_skipped_status_in_result(self):
        adapter = MockLLMAdapter("result")
        previous = {"entity-0": "d0"}
        evaluator = BatchEvaluator(adapter, previous_digests=previous)

        summary = evaluator.evaluate(_items(1))
        assert summary.results[0].status == "skipped"
        assert summary.results[0].response is None


# ── Error handling ──────────────────────────────────────────────────


class TestBatchErrorHandling:
    def test_error_captured_not_raised(self):
        adapter = ErrorLLMAdapter("kaboom")
        evaluator = BatchEvaluator(adapter)

        summary = evaluator.evaluate(_items(2))
        assert summary.failed == 2
        assert summary.succeeded == 0

    def test_error_message_in_result(self):
        adapter = ErrorLLMAdapter("something went wrong")
        evaluator = BatchEvaluator(adapter)

        summary = evaluator.evaluate(_items(1))
        assert summary.results[0].status == "error"
        assert "something went wrong" in summary.results[0].error

    def test_error_does_not_stop_batch(self):
        """One failing item doesn't prevent others from running."""
        call_count = 0

        class FailOnFirstAdapter(MockLLMAdapter):
            def execute_prompt(self, prompt, config):
                nonlocal call_count
                call_count += 1
                if call_count == 1:
                    raise RuntimeError("first fails")
                return super().execute_prompt(prompt, config)

        adapter = FailOnFirstAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate(_items(3))

        assert summary.failed == 1
        assert summary.succeeded == 2
        assert summary.results[0].status == "error"
        assert summary.results[1].status == "success"
        assert summary.results[2].status == "success"


# ── Progress callback ───────────────────────────────────────────────


class TestProgressCallback:
    def test_callback_called_for_each_item(self):
        calls = []
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(
            adapter,
            progress_callback=lambda done, total, result: calls.append(
                (done, total, result.key)
            ),
        )
        evaluator.evaluate(_items(3))

        assert len(calls) == 3
        assert calls[0] == (1, 3, "entity-0")
        assert calls[1] == (2, 3, "entity-1")
        assert calls[2] == (3, 3, "entity-2")

    def test_callback_receives_result(self):
        results = []
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(
            adapter,
            progress_callback=lambda done, total, result: results.append(result),
        )
        evaluator.evaluate(_items(2))

        assert all(isinstance(r, BatchResult) for r in results)
        assert results[0].status == "success"

    def test_no_callback_no_error(self):
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        # Should work fine without callback
        summary = evaluator.evaluate(_items(1))
        assert summary.succeeded == 1


# ── Empty batch ─────────────────────────────────────────────────────


class TestEmptyBatch:
    def test_empty_items(self):
        adapter = MockLLMAdapter("ok")
        evaluator = BatchEvaluator(adapter)
        summary = evaluator.evaluate([])

        assert summary.total == 0
        assert summary.succeeded == 0
        assert summary.results == []
        assert adapter.call_count == 0
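
The tests in `test_batch_evaluator.py` pin down a small behavioural contract: items whose non-empty digest matches the previous run are skipped, adapter errors are recorded instead of raised, the progress callback fires once per item, and the success rate ignores skipped items. The sketch below is one way that contract could be satisfied. It is not the code in `markitect/prompts/execution/batch.py`, which this diff does not show here; `SketchItem`, `SketchSummary`, `sketch_evaluate`, and the `call_llm` callable are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SketchItem:
    # Hypothetical stand-in for BatchItem; only the fields the rules need.
    key: str
    prompt: str
    content_digest: str = ""


@dataclass
class SketchSummary:
    # Hypothetical stand-in for BatchSummary.
    total: int = 0
    succeeded: int = 0
    failed: int = 0
    skipped: int = 0
    statuses: List[Tuple[str, str]] = field(default_factory=list)

    def success_rate(self) -> float:
        # Skipped items are not attempts. With nothing attempted the rate
        # is taken to be 1.0, as the all-skipped test expects.
        attempted = self.succeeded + self.failed
        return 1.0 if attempted == 0 else self.succeeded / attempted


def sketch_evaluate(
    items: List[SketchItem],
    call_llm: Callable[[str], str],
    previous_digests: Optional[Dict[str, str]] = None,
    progress: Optional[Callable[[int, int, Tuple[str, str]], None]] = None,
) -> SketchSummary:
    previous = previous_digests or {}
    summary = SketchSummary(total=len(items))
    for done, item in enumerate(items, start=1):
        # Skip only when the digest is non-empty AND equal to the stored one.
        unchanged = bool(item.content_digest) and (
            previous.get(item.key) == item.content_digest
        )
        if unchanged:
            status = "skipped"
            summary.skipped += 1
        else:
            try:
                call_llm(item.prompt)
                status = "success"
                summary.succeeded += 1
            except Exception:
                # Record the failure and keep going with the next item.
                status = "error"
                summary.failed += 1
        result = (item.key, status)
        summary.statuses.append(result)
        if progress is not None:
            progress(done, len(items), result)
    return summary
```

Treating an empty digest as "always evaluate" means callers that never compute digests still get every item evaluated, even if a previous run happened to store an empty string for the same key.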