docs: metrics methodology, collection-level tasks, and infospace tooling roadmap
Add METRICS-METHODOLOGY.md documenting the theoretical frameworks (SEQUAL, OntoClean, OOPS!, OntoQA, FCA, DSL principles) adapted for two-layer evaluation (LLM-Eval + deterministic aggregation) across five collection concerns: redundancy, coverage, coherence, consistency, and granularity balance.

Extend INFRA-TASKS.md with assignment assessment (tasks 4-7), per-concept metrics (tasks 8-12), and collection-level metrics (tasks 13-19).

Add roadmap/infospace-tooling/PLAN.md defining terminology (infospace, topic, discipline, entity, evaluation, viability) and a three-stage implementation plan: Stage 1 platform additions, Stage 2 infospace tooling layer, Stage 3 example revision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -37,3 +37,513 @@ no automatic parsing for this format, requiring manual macro construction.
|
||||
**Fix applied:** Added `SHORTHAND_PATTERN` to `MacroParser` that recognises
|
||||
`@{target}` and maps it to `MacroKind.REQUIRED`. Updated `has_macros()`,
|
||||
`count_macros()`, and `find_macro_positions()` accordingly.
|
||||
|
||||
---
|
||||
|
||||
## Assignment Assessment (18 Feb 2026)
|
||||
|
||||
How the example measures against the objectives stated in `README.md`:
|
||||
|
||||
| # | Objective | Status | Notes |
|---|-----------|--------|-------|
| 1 | Capture knowledge from Wealth of Nations | **Partial** | 7 of 35 chapters processed (Book I, ch. 1-7). 85 canonical entities extracted. |
| 2 | Transform to VSM concepts/entities | **Done (for processed chapters)** | Entities mapped to S1-S5 with strength ratings. |
| 3 | Consistent and complete | **Not yet** | Only 20% of chapters done. Metrics report exists but covers limited scope. |
| 4 | Schemas as scaffolding | **Done** | Four schemas defined and used across all stages. |
| 5 | Prompt dependency resolution | **Done** | `@{macro}` templates resolved via MultiSpaceResolutionStrategy. |
| 6 | Incremental chapter injection | **Done** | Pipeline processes one chapter at a time; `@{existing_entities}` prevents duplication. |
| 7 | Keep changes as git history | **Not done** | See task 4 below. |
| 8 | Metrics for completeness/consistency | **Partial** | Template and report exist but only cover 4 chapters (report predates ch. 5-7). |
| 9 | No infrastructure changes during experiment | **Violated** | Three infra fixes were required (tasks 1-3 above). Documented as intended. |
| 10 | Generate task list for infra issues | **Done** | This file. |
|
||||
|
||||
## 4. Infospace has no per-chapter git history — OPEN
|
||||
|
||||
**Objective:** README states "The information space should utilize the option
|
||||
of keeping changes as git history."
|
||||
**Issue:** The 7 processed chapters were committed in mixed batches alongside
|
||||
infrastructure changes (LLM adapters, entity refactoring, archive policy).
|
||||
Chapters 1-2 are bundled into `fecc2fd` with the entire LLM module.
|
||||
Chapters 5-7 share a single commit (`41773f1`) with the OpenAI adapter and
|
||||
archive policy. There is no commit where you can `git diff` to see exactly
|
||||
what one chapter contributed to the infospace.
|
||||
**Impact:** Cannot use `git log`, `git diff`, or `git bisect` to trace how
|
||||
the infospace grew chapter by chapter — the core promise of "with history."
|
||||
**Suggested fix:** Re-run the 7 processed chapters (and remaining 28) using
|
||||
`process_chapters.py` without `--no-commit`, on a clean branch or after
|
||||
squashing the current output into a baseline commit. Each chapter gets its
|
||||
own commit via `_git_commit_chapter()`.
|
||||
|
||||
## 5. Prompt files are regenerated as a side-effect of DB rebuild — OPEN
|
||||
|
||||
**Issue:** Running `--all --no-commit` to regenerate `infospace.db` also
|
||||
overwrites `*-prompt.md` files in the output directories because each
|
||||
pipeline stage unconditionally writes the compiled prompt before checking
|
||||
whether output already exists. The `@{existing_entities}` macro content
|
||||
shifts as earlier chapters are loaded, so prompt files for already-processed
|
||||
chapters change on every full run.
|
||||
**Impact:** A DB regeneration dirties the working tree with prompt file
|
||||
changes, even though no actual outputs changed. Users must `git checkout`
|
||||
the prompt files after regeneration.
|
||||
**Suggested fix:** Skip writing prompt files when the corresponding output
|
||||
file already exists on disk, or add a `--rebuild-db-only` flag that
|
||||
populates the database without touching the file system.
|
||||
|
||||
## 6. Metrics report is stale — OPEN
|
||||
|
||||
**Issue:** The metrics report (`output/metrics/metrics-report.md`) was
|
||||
generated after chapters 1-4. Chapters 5-7 have since been processed but
|
||||
the report has not been refreshed.
|
||||
**Impact:** The metrics do not reflect the current state of the infospace.
|
||||
**Suggested fix:** Re-run `--metrics --provider <provider> --no-commit`
|
||||
after every batch of new chapters. Consider making metrics assessment
|
||||
automatic at the end of `--book` or `--all` runs.
|
||||
|
||||
## 7. Remaining 28 chapters not yet processed — OPEN
|
||||
|
||||
**Issue:** Only Book I chapters 1-7 have been processed. The remaining
28 chapters (the rest of Book I and Books II-V) are unprocessed.
|
||||
**Impact:** The infospace is incomplete — VSM coverage is limited to S1,
|
||||
S2, and partial S4. S3, S3*, S5, and many systemic concepts (algedonic
|
||||
signals, recursion, variety) are expected to emerge from later books.
|
||||
**Suggested fix:** Process remaining chapters in book-sized batches with
|
||||
per-chapter commits, refreshing metrics after each book.
|
||||
|
||||
---
|
||||
|
||||
## Per-Concept Metrics (tasks 8-12)
|
||||
|
||||
The current metrics system is a single LLM-evaluated narrative report that
|
||||
assesses the infospace as a whole. It produces no machine-readable output,
|
||||
cannot be tracked over time, and conflates per-concept quality with
|
||||
collection-level coherence.
|
||||
|
||||
The improvement splits metrics into two layers:
|
||||
|
||||
- **LLM-Eval**: A prompt template evaluates each concept individually
|
||||
against quality criteria defined in the schema. The LLM returns structured
|
||||
scores, not prose.
|
||||
- **Deterministic aggregation**: `process_chapters.py` computes what it can
|
||||
from files on disk (schema compliance, word counts, section presence,
|
||||
coverage tallies) and aggregates LLM-eval scores into dashboard metrics.
|
||||
|
||||
Both layers persist results in structured form so they can be diffed,
|
||||
tracked over time, and committed alongside the entities they evaluate.
|
||||
|
||||
## 8. Add per-concept quality metrics to entity schema — OPEN
|
||||
|
||||
**Issue:** The entity schema (`economic-entity-schema-v1.0.md`) defines
|
||||
required sections and validation rules (section presence, word count range)
|
||||
but no quality criteria. There is no definition of what makes a *good*
|
||||
entity versus a merely *compliant* one.
|
||||
**Suggested fix:** Add a `## Quality Metrics` section to the entity schema
|
||||
defining evaluation dimensions with scoring rubrics:
|
||||
|
||||
- **Definition Precision** (1-5): Is the definition specific, non-circular,
|
||||
and distinguishable from neighbouring concepts?
|
||||
- **Source Grounding** (1-5): Is the entity grounded in a specific passage?
|
||||
Does the citation exist and support the definition?
|
||||
- **Domain Placement** (1-5): Is the economic domain assignment correct and
|
||||
specific (not just "General Theory")?
|
||||
- **VSM Relevance** (1-5): Does the entity connect meaningfully to at least
|
||||
one VSM system, or is it too granular/abstract to map?
|
||||
- **Explanatory Value** (1-5): Does this entity contribute to explaining
|
||||
the economic system, or is it a restatement of another concept?
|
||||
|
||||
Similarly update the VSM mapping schema with:
|
||||
|
||||
- **Rationale Rigour** (1-5): Is the mapping justified with reference to
|
||||
Beer's definitions, not just surface-level analogy?
|
||||
- **Strength Calibration** (1-5): Is the declared strength (Strong/Moderate/
|
||||
Weak) consistent with the rationale given?
|
||||
|
||||
These rubrics become the prompt instructions for task 9.
|
||||
|
||||
## 9. Create evaluate-entity prompt template — OPEN
|
||||
|
||||
**Depends on:** Task 8 (quality metrics in schema).
|
||||
**Issue:** There is no mechanism to evaluate an existing entity after
|
||||
extraction. Quality is only judged implicitly during the global metrics
|
||||
assessment, which is too coarse to identify individual weak entities.
|
||||
**Suggested fix:** Create `templates/evaluate-entity.md` — a prompt
|
||||
template that:
|
||||
|
||||
1. Takes `@{entity_content}`, `@{source_chapter}`, `@{vsm_framework}`,
|
||||
and `@{quality_rubric}` (from the schema's quality metrics section).
|
||||
2. Asks the LLM to score each dimension (1-5) with a one-sentence
|
||||
justification per score.
|
||||
3. Outputs structured YAML front-matter (scores) followed by markdown
|
||||
(justifications), e.g.:
|
||||
|
||||
```yaml
---
entity: division-of-labour
scores:
  definition_precision: 5
  source_grounding: 5
  domain_placement: 4
  vsm_relevance: 5
  explanatory_value: 5
overall: 4.8
flags: []
---
```
|
||||
|
||||
Add a pipeline stage: `--evaluate` runs this template against every
|
||||
canonical entity and writes results to `output/evaluations/<slug>-eval.md`.
|
||||
A `--evaluate --chapter <id>` variant evaluates only entities introduced
|
||||
by that chapter.
|
||||
|
||||
## 10. Add deterministic schema compliance checker — OPEN
|
||||
|
||||
**Issue:** Schema compliance is currently LLM-evaluated ("100%" in the
|
||||
metrics report) but the validation rules in the schemas are mechanical:
|
||||
section presence, word count ranges, heading format. These should be
|
||||
checked programmatically, not by an LLM.
|
||||
**Suggested fix:** Add a `validate_entity(path) -> ValidationResult`
|
||||
function to `process_chapters.py` (or a new `validate.py` module) that:
|
||||
|
||||
- Parses the markdown to extract H2 section headings
|
||||
- Checks required sections are present (Definition, Source Chapter,
|
||||
Context, Economic Domain)
|
||||
- Counts words in the Definition section (must be 20-150)
|
||||
- Checks H1 heading exists and is not a slug (e.g. `effectual-demand`
|
||||
in chapter 7 has `# effectual-demand` instead of `# Effectual Demand`)
|
||||
- Validates Source Chapter cites a specific book/chapter
|
||||
- For mapping files: checks Mapping Strength is one of the enum values
|
||||
|
||||
Expose as `--validate` CLI flag. Output a structured report:
|
||||
|
||||
```
Validation: 85 entities, 3 warnings
  effectual-demand.md: H1 is slug format, not title case
  porter.md: Definition is 18 words (minimum 20)
  ...
```
|
||||
|
||||
This is fully deterministic — no LLM calls needed.
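
The checks listed above could be sketched roughly as follows. This is a minimal illustration and not the pipeline's actual code: `ValidationResult`, `split_sections`, and `validate_entity_text` are hypothetical names, the function works on text instead of a path so it can be tried in isolation, and the slug test (all-lowercase, hyphen-separated) is an assumed heuristic.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Required H2 sections, taken from the task description above.
REQUIRED_SECTIONS = ("Definition", "Source Chapter", "Context", "Economic Domain")
# Assumed heuristic: a lowercase word or kebab-case string counts as a slug.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


@dataclass
class ValidationResult:  # hypothetical result type
    warnings: list[str] = field(default_factory=list)


def split_sections(text: str) -> tuple[Optional[str], dict[str, str]]:
    """Return the first H1 title and a mapping of H2 heading -> body text."""
    title = None
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return title, {name: "\n".join(body) for name, body in sections.items()}


def validate_entity_text(text: str) -> ValidationResult:
    result = ValidationResult()
    title, sections = split_sections(text)
    if title is None:
        result.warnings.append("H1 heading is missing")
    elif SLUG_RE.match(title):
        result.warnings.append("H1 is slug format, not title case")
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            result.warnings.append(f"Missing required section: {name}")
    if "Definition" in sections:
        words = len(sections["Definition"].split())
        if not 20 <= words <= 150:
            result.warnings.append(f"Definition is {words} words (allowed 20-150)")
    return result
```

A `validate_entity(path)` wrapper would then only need to read the file and pass its text to this function. The Source Chapter citation check and the mapping-file enum check are left out of the sketch.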
|
||||
|
||||
## 11. Structured metrics output format — OPEN
|
||||
|
||||
**Depends on:** Tasks 9 and 10.
|
||||
**Issue:** The metrics report is a markdown narrative. Values cannot be
|
||||
parsed programmatically, diffed meaningfully, or plotted over time.
|
||||
**Suggested fix:** Alongside the human-readable `metrics-report.md`,
|
||||
emit a machine-readable `metrics.yaml` (or `.json`) containing:
|
||||
|
||||
```yaml
timestamp: "2026-02-18T12:00:00Z"
chapters_processed: 7
chapters_total: 35
entities_total: 85
entities_archived: 0
vsm_coverage:
  S1: 28
  S2: 12
  S3: 8
  S3_star: 0
  S4: 5
  S5: 0
  recursion: 1
  variety: 0
mapping_strength:
  strong: 64
  moderate: 18
  weak: 3
validation:
  schema_compliant: 82
  warnings: 3
evaluation:            # from LLM-eval (task 9)
  mean_overall: 4.2
  min_overall: 2.8
  flagged_entities: ["porter", "country-workman"]
```
|
||||
|
||||
The `--metrics` command writes both files. The YAML file is committed
|
||||
to git so `git diff` shows exactly how metrics changed between runs.
|
||||
|
||||
## 12. Metrics-over-time tracking — OPEN
|
||||
|
||||
**Depends on:** Task 11 (structured output).
|
||||
**Issue:** There is one metrics snapshot that gets overwritten. No history
|
||||
of how metrics evolved as chapters were added.
|
||||
**Suggested fix:** Append each metrics snapshot to a cumulative log file
|
||||
`output/metrics/metrics-history.yaml` (list of timestamped entries). This
|
||||
is committed to git alongside the current snapshot. The pipeline can
|
||||
optionally render a simple text-based progress summary:
|
||||
|
||||
```
Metrics history (5 snapshots):
  2026-02-10  ch 1/35  13 entities  41.7% VSM coverage
  2026-02-11  ch 4/35  38 entities  50.0% VSM coverage
  2026-02-11  ch 7/35  85 entities  58.3% VSM coverage
  ...
```
|
||||
|
||||
This provides the "metrics that improve over time" feedback loop the
|
||||
README envisions: process chapters → evaluate → see coverage grow (or
|
||||
flag regressions when a re-extraction reduces quality scores).
|
||||
|
||||
---
|
||||
|
||||
## Collection-Level Metrics (tasks 13-19)
|
||||
|
||||
These tasks implement the five collection-level concerns described in
|
||||
`METRICS-METHODOLOGY.md`. They share underlying infrastructure (entity
|
||||
metadata index, definition embeddings, relationship graph) that should
|
||||
be built once per evaluation run.
|
||||
|
||||
See the methodology document for theoretical grounding, framework
|
||||
references, and the full metric definitions per concern.
|
||||
|
||||
## 13. Entity metadata index — deterministic parsing layer — OPEN
|
||||
|
||||
**Depends on:** Task 10 (schema compliance checker shares parsing logic).
|
||||
**Issue:** Several collection-level metrics (coverage matrix, FCA context,
|
||||
granularity distribution) require structured metadata extracted from entity
|
||||
files: H1 title, economic domain, VSM system(s), source chapter, section
|
||||
presence, word counts. Currently this information exists only as prose
|
||||
inside markdown files.
|
||||
**Suggested fix:** Add a `parse_entity_metadata(path) -> EntityMeta`
|
||||
function that extracts from each entity file:
|
||||
|
||||
```python
from dataclasses import dataclass


@dataclass
class EntityMeta:
    slug: str
    title: str                       # from H1
    domain: str                      # from Economic Domain section
    source_chapter: str              # from Source Chapter section
    definition_words: int            # word count of Definition section
    has_original_wording: bool       # optional section present?
    has_modern_interpretation: bool  # optional section present?
    vsm_systems: list[str]           # from mapping file if exists
    mapping_strengths: list[str]     # from mapping file if exists
```
|
||||
|
||||
Build an index of all entities at the start of each evaluation run.
|
||||
This index is the input for tasks 14, 15, 16, and 18. Expose as
|
||||
`--index` CLI flag for inspection.
|
||||
|
||||
## 14. Redundancy detection (Concern C1) — OPEN
|
||||
|
||||
**Depends on:** Task 13 (metadata index).
|
||||
**Methodology:** OOPS! P2 (synonymous classes) + embedding similarity +
|
||||
LLM pairwise judgment. See METRICS-METHODOLOGY.md §4 C1.
|
||||
**Issue:** Entities with different slugs but overlapping meanings (e.g.
|
||||
`natural-rate` / `ordinary-or-average-rate`) survive extraction because
|
||||
dedup only checks slug collisions. There is no semantic overlap detection.
|
||||
**Suggested fix:** Implement in three stages:
|
||||
|
||||
1. **Embed** — Compute vector embeddings of all entity definitions using
|
||||
an embedding API (OpenRouter, OpenAI, or a local sentence-transformer).
|
||||
Cache embeddings in `output/metrics/embeddings.json` keyed by
|
||||
`{slug: content_digest}` so unchanged entities skip re-embedding.
|
||||
|
||||
2. **Similarity matrix** — Compute NxN cosine similarity. Write the full
|
||||
matrix to `output/metrics/similarity-matrix.json`. Flag all pairs with
|
||||
cosine > 0.80 as candidates.
|
||||
|
||||
3. **LLM pairwise judgment** — For each candidate pair, run a prompt:
|
||||
"Given these two entity definitions, are they (a) the same concept and
|
||||
should be merged, (b) genuinely distinct, or (c) partially overlapping
|
||||
and should be clarified?" Write results to
|
||||
`output/metrics/redundancy-report.md` + YAML.
|
||||
|
||||
**Metrics produced:**
|
||||
- `high_similarity_pairs`: count and list
|
||||
- `confirmed_synonyms`: count (LLM-confirmed same concept)
|
||||
- `redundancy_ratio`: `confirmed_synonyms / total_entities`
|
||||
- `intensional_conciseness`: `1 - redundancy_ratio`
|
||||
|
||||
**CLI:** `--check-redundancy --provider <provider>`
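
Stage 2 and the derived ratios can be illustrated with a small standard-library sketch. It assumes embeddings are already available as plain lists of floats keyed by slug. The function names and the toy vectors are hypothetical, and a real run would read vectors from the embedding cache.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_pairs(embeddings, threshold=0.80):
    """Return (slug_a, slug_b, similarity) for every pair above the threshold."""
    pairs = []
    for (slug_a, vec_a), (slug_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        sim = cosine(vec_a, vec_b)
        if sim > threshold:
            pairs.append((slug_a, slug_b, sim))
    return pairs


def redundancy_metrics(confirmed_synonyms, total_entities):
    """Ratios as defined under 'Metrics produced' above."""
    ratio = confirmed_synonyms / total_entities if total_entities else 0.0
    return {"redundancy_ratio": ratio, "intensional_conciseness": 1 - ratio}


# Toy three-dimensional vectors stand in for real embeddings.
toy = {
    "natural-rate": [0.9, 0.1, 0.0],
    "ordinary-or-average-rate": [0.85, 0.15, 0.05],
    "division-of-labour": [0.0, 0.2, 0.9],
}
pairs = candidate_pairs(toy)
```

Only the flagged pairs go on to the LLM pairwise judgment, so the threshold controls how many LLM calls a run makes.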
|
||||
|
||||
## 15. Coverage completeness (Concern C2) — OPEN
|
||||
|
||||
**Depends on:** Task 13 (metadata index).
|
||||
**Methodology:** SEQUAL completeness + FCA gap analysis + DSL competency
|
||||
questions. See METRICS-METHODOLOGY.md §4 C2.
|
||||
**Issue:** Coverage is currently assessed by the LLM in a single narrative
|
||||
pass. There is no structured view of which domain × VSM cells are
|
||||
populated, and no way to test whether the entity set can answer specific
|
||||
questions about the economic system.
|
||||
**Suggested fix:** Implement in three stages:
|
||||
|
||||
1. **Domain × VSM matrix** — From the metadata index, count entities per
|
||||
{economic_domain, vsm_system} cell. Render as a table. Identify empty
|
||||
cells as specific, actionable gaps. Compute:
|
||||
- `coverage_ratio = populated_cells / total_cells`
|
||||
- `vsm_balance_entropy = -Σ(pᵢ log pᵢ)` across VSM systems
|
||||
|
||||
2. **FCA lattice** — Construct a formal context with objects = entities,
|
||||
attributes = {domain, vsm_system, source_book, abstraction_level}.
|
||||
Compute the concept lattice (Python `concepts` library). Extract
|
||||
attribute combinations with no corresponding entity — these are
|
||||
**structural coverage gaps** not visible in the simple matrix.
|
||||
|
||||
3. **Competency questions** — Define a set of 15-20 canonical questions
|
||||
the infospace should answer (stored in
|
||||
`schemas/competency-questions.md`). Example questions:
|
||||
- "How does the division of labour relate to market extent?"
|
||||
- "What mechanisms regulate wages toward their natural rate?"
|
||||
- "How do monopolies distort the viable system?"
|
||||
LLM-Eval tests whether current entities suffice to answer each.
|
||||
Unanswerable questions identify specific completeness gaps.
|
||||
|
||||
**Metrics produced:**
|
||||
- `domain_vsm_matrix`: cell counts
|
||||
- `coverage_ratio`: scalar
|
||||
- `vsm_balance_entropy`: scalar
|
||||
- `empty_cells`: list of {domain, vsm_system} gaps
|
||||
- `fca_gap_concepts`: attribute combos with no entity
|
||||
- `competency_coverage`: fraction of questions answerable
|
||||
|
||||
**CLI:** `--check-coverage --provider <provider>`
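
The two deterministic formulas from stage 1 can be sketched as below. The function names are hypothetical, the input is assumed to be one `(domain, vsm_system)` pair per mapping, and the natural logarithm is an assumption because the task does not fix a log base.

```python
import math
from collections import Counter


def domain_vsm_matrix(assignments):
    """Count entities per (domain, vsm_system) cell."""
    return Counter(assignments)


def coverage_metrics(cells, domains, systems):
    """coverage_ratio = populated_cells / total_cells, plus the empty cells."""
    all_cells = [(d, s) for d in domains for s in systems]
    empty = [cell for cell in all_cells if cells.get(cell, 0) == 0]
    ratio = (len(all_cells) - len(empty)) / len(all_cells) if all_cells else 0.0
    return {"coverage_ratio": ratio, "empty_cells": empty}


def balance_entropy(counts):
    """Shannon entropy -sum(p * ln p); zero counts contribute nothing."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

Entropy is highest when entities are spread evenly across VSM systems, so a low value signals that a few systems dominate the collection.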
|
||||
|
||||
## 16. Structural coherence (Concern C3) — OPEN
|
||||
|
||||
**Depends on:** Task 13 (metadata index).
|
||||
**Methodology:** OntoQA relationship richness + graph connectivity +
|
||||
community detection. See METRICS-METHODOLOGY.md §4 C3.
|
||||
**Issue:** It is unknown whether the 85 entities form a connected
|
||||
explanatory web or a fragmented collection. No relationship graph exists
|
||||
between entities.
|
||||
**Suggested fix:** Implement in three stages:
|
||||
|
||||
1. **Explicit cross-references** — Scan each entity's definition for
|
||||
mentions of other entity slugs or titles (normalised string matching).
|
||||
This is deterministic and catches direct references.
|
||||
|
||||
2. **LLM-inferred edges** — For entity pairs not caught by string
|
||||
matching but in the same domain or VSM system, LLM-Eval: "Does A's
|
||||
definition conceptually depend on or explain B, or vice versa?" Run
|
||||
in batches. Write the combined graph to
|
||||
`output/metrics/relationship-graph.json` (adjacency list).
|
||||
|
||||
3. **Graph analysis** — Using networkx or equivalent:
|
||||
- Connected components (target: 1)
|
||||
- Graph density, average degree
|
||||
- Betweenness centrality → identify bridge concepts
|
||||
- Louvain community detection → compare to declared domains
|
||||
- OntoQA Relationship Richness
|
||||
- Cohesion per domain, coupling across domains
|
||||
- Orphan entities (degree 0 or 1)
|
||||
|
||||
**Metrics produced:**
|
||||
- `connected_components`: count (target: 1)
|
||||
- `graph_density`: scalar
|
||||
- `avg_degree`: scalar
|
||||
- `relationship_richness`: OntoQA RR
|
||||
- `modularity`: Louvain score
|
||||
- `bridge_concepts`: list (high betweenness centrality)
|
||||
- `orphan_entities`: list (degree ≤ 1)
|
||||
- `cohesion_by_domain` / `coupling_across_domains`: scalars
|
||||
|
||||
**CLI:** `--check-coherence --provider <provider>`
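
A few of the graph metrics can be computed without any third-party dependency. The sketch below uses only the standard library and treats the relationship graph as undirected. All function names are hypothetical, and betweenness centrality, Louvain communities, and OntoQA relationship richness are left to a graph library.

```python
from collections import deque


def build_graph(slugs, edges):
    """Undirected adjacency sets; self-references are ignored."""
    adj = {slug: set() for slug in slugs}
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Breadth-first search over every unvisited node."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in adj[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(sorted(component))
    return components


def graph_metrics(adj):
    n = len(adj)
    edge_count = sum(len(neigh) for neigh in adj.values()) // 2
    return {
        "connected_components": len(connected_components(adj)),
        "graph_density": 2 * edge_count / (n * (n - 1)) if n > 1 else 0.0,
        "avg_degree": 2 * edge_count / n if n else 0.0,
        # Orphans follow the task's definition: degree 0 or 1.
        "orphan_entities": sorted(s for s, neigh in adj.items() if len(neigh) <= 1),
    }
```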
|
||||
|
||||
## 17. Definitional consistency (Concern C4) — OPEN
|
||||
|
||||
**Depends on:** Task 16 (relationship graph — the definitional dependency
|
||||
graph is a directed variant of the same structure).
|
||||
**Methodology:** OntoClean metaproperties + OOPS! P24 (circular
|
||||
definitions) + SEQUAL validity. See METRICS-METHODOLOGY.md §4 C4.
|
||||
**Issue:** No mechanism to detect circular definitions, contradictions
|
||||
between related entities, or terms used in definitions that should be
|
||||
entities but aren't.
|
||||
**Suggested fix:** Implement in four stages:
|
||||
|
||||
1. **Definitional dependency graph** — Directed version of the
|
||||
relationship graph: edge A→B means A's definition uses B's concept.
|
||||
Reuse cross-reference extraction from task 16.
|
||||
|
||||
2. **Cycle detection** — Find all cycles of length ≤ 3 in the directed
|
||||
graph. Short cycles are problematic (A defines B, B defines A).
|
||||
Compute `grounding_ratio`: fraction of entities traceable to terms
|
||||
outside the entity set without encountering a cycle.
|
||||
|
||||
3. **Undefined dependencies** — Extract terms from definitions that match
|
||||
entity-name patterns (capitalised noun phrases, kebab-case slugs) but
|
||||
have no corresponding entity file. These are concepts the infospace
|
||||
implicitly relies on but hasn't defined.
|
||||
|
||||
4. **LLM consistency checks** — For directly-connected entity pairs,
|
||||
LLM-Eval: "Do these definitions contradict each other?" For entities
|
||||
with Smith's Original Wording, LLM-Eval: "Does the definition
|
||||
accurately represent the cited passage?"
|
||||
|
||||
**Metrics produced:**
|
||||
- `circular_definitions`: count and list of cycles (length ≤ 3)
|
||||
- `grounding_ratio`: fraction of entities reaching primitives
|
||||
- `undefined_dependencies`: list of missing terms
|
||||
- `contradiction_candidates`: LLM-flagged pairs
|
||||
- `source_fidelity_score`: fraction passing source check
|
||||
|
||||
**CLI:** `--check-consistency --provider <provider>`
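
Stage 2 (short-cycle detection) might look like the sketch below. It assumes the dependency graph is a dict mapping each slug to the slugs its definition uses. `short_cycles` is a hypothetical name, and `grounding_ratio` is not computed here.

```python
def short_cycles(graph, max_len=3):
    """Find directed cycles with at most max_len nodes.

    Each cycle is reported once, rotated so its smallest slug comes first.
    """
    found = set()

    def walk(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                # Closing edge found: canonicalise the rotation and record it.
                i = path.index(min(path))
                found.add(tuple(path[i:] + path[:i]))
            elif nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for start in graph:
        walk(start, start, [start])
    return sorted(found)


# Hypothetical dependency edges: A -> B means A's definition uses B.
deps = {
    "market-price": ["effectual-demand"],
    "effectual-demand": ["market-price", "natural-price"],
    "natural-price": [],
}
cycles = short_cycles(deps)
```

The depth bound keeps the search cheap on a graph of this size. Longer cycles are ignored on purpose, matching the task's focus on short definitional loops.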
|
||||
|
||||
## 18. Granularity balance (Concern C5) — OPEN
|
||||
|
||||
**Depends on:** Task 13 (metadata index).
|
||||
**Methodology:** Keet granularity theory + OntoClean rigidity +
|
||||
DSL laconicity. See METRICS-METHODOLOGY.md §4 C5.
|
||||
**Issue:** Entities range from broad sectors (`agriculture`) to specific
|
||||
market roles (`effectual-demanders`) to abstract principles
|
||||
(`division-of-labour`). It is unclear whether this range is appropriate
|
||||
or whether some entities are too specific/general relative to their peers.
|
||||
**Suggested fix:** Implement in three stages:
|
||||
|
||||
1. **LLM classification** — For each entity, LLM-Eval assigns:
|
||||
- Abstraction level: `theory` / `mechanism` / `observation`
|
||||
- Scope score: 1-5 (very specific → very general)
|
||||
- Indispensability: 1-5 ("if removed, how much explanatory power lost?")
|
||||
Write to `output/evaluations/<slug>-classification.yaml`.
|
||||
|
||||
2. **Distribution analysis** — Deterministic:
|
||||
- Count per abstraction level; compute entropy
|
||||
- Per-domain scope variance (flag domains with high variance)
|
||||
- Level × domain matrix (from FCA context in task 15)
|
||||
- Outlier detection: entities > 1.5σ from their domain's mean scope
|
||||
|
||||
3. **Merge/split recommendations** — For outlier entities, LLM-Eval:
|
||||
"Should this entity be merged into a broader concept, split into
|
||||
sub-concepts, or is its current granularity justified?" For entities
|
||||
with indispensability ≤ 2: "Could another entity serve this purpose?"
|
||||
|
||||
**Metrics produced:**
|
||||
- `abstraction_distribution`: {theory: n, mechanism: n, observation: n}
|
||||
- `abstraction_entropy`: scalar (higher = more balanced)
|
||||
- `scope_variance_by_domain`: per-domain scalar
|
||||
- `dispensable_entities`: list (indispensability ≤ 2)
|
||||
- `merge_candidates`: list of pairs
|
||||
- `split_candidates`: list of entities
|
||||
|
||||
**CLI:** `--check-granularity --provider <provider>`
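
The outlier rule from stage 2 can be sketched as follows, assuming stage 1 has already produced a scope score per entity. The function name and input shape are hypothetical, and using the population standard deviation (`pstdev`) is an assumption, since the task does not say which estimator to use.

```python
from collections import defaultdict
from statistics import mean, pstdev


def scope_outliers(scores, k=1.5):
    """Return slugs whose scope is more than k sigma from their domain mean.

    scores maps slug -> (domain, scope_score).
    """
    by_domain = defaultdict(list)
    for slug, (domain, scope) in scores.items():
        by_domain[domain].append((slug, scope))

    outliers = []
    for items in by_domain.values():
        values = [scope for _, scope in items]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # every entity in the domain has the same scope
        outliers.extend(slug for slug, scope in items if abs(scope - mu) > k * sigma)
    return sorted(outliers)
```

Domains with only a handful of entities give unstable variance estimates, so results for small domains are best treated as prompts for review.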
|
||||
|
||||
## 19. Unified collection evaluation command — OPEN
|
||||
|
||||
**Depends on:** Tasks 13-18.
|
||||
**Issue:** Running five separate `--check-*` commands is cumbersome and
|
||||
repeats shared computation (metadata parsing, embedding, graph building).
|
||||
**Suggested fix:** Add `--evaluate-collection --provider <provider>` that
|
||||
runs all five checks in sequence, sharing infrastructure:
|
||||
|
||||
1. Parse entity metadata index (task 13) — used by all
|
||||
2. Compute embeddings (task 14) — used by C1, C3
|
||||
3. Build relationship graph (task 16) — used by C3, C4
|
||||
4. Run all five concern checks
|
||||
5. Write per-concern reports to `output/metrics/`
|
||||
6. Write unified `metrics.yaml` with all collection metrics
|
||||
7. Append to `metrics-history.yaml` (task 12)
|
||||
|
||||
Incremental mode: `--evaluate-collection --chapter <id>` re-evaluates
|
||||
only entities from that chapter plus pairwise checks involving them.
|
||||
|
||||
Report a summary to stdout:
|
||||
|
||||
```
Collection evaluation (85 entities, 7 chapters):
  Redundancy:   3 synonym candidates, conciseness 0.96
  Coverage:     58% VSM, 20% chapters, 4 domain gaps
  Coherence:    1 component, density 0.12, 2 orphans
  Consistency:  0 cycles, 5 undefined deps, 0 contradictions
  Granularity:  entropy 1.42, 1 dispensable, 2 merge candidates
```
|
||||
|
||||
501
examples/infospace-with-history/METRICS-METHODOLOGY.md
Normal file
@@ -0,0 +1,501 @@
|
||||
# Collection-Level Metrics Methodology
|
||||
|
||||
How we evaluate the quality of the infospace as a **collection of
|
||||
interrelated concepts**, beyond the quality of individual entities.
|
||||
|
||||
This document describes the theoretical frameworks drawn from ontology
|
||||
engineering, formal concept analysis, semiotic quality theory, and DSL
|
||||
design — and how each is adapted to work within MarkiTect's two-layer
|
||||
evaluation model (LLM-Eval + deterministic aggregation).
|
||||
|
||||
---
|
||||
|
||||
## 1. The Two-Layer Model
|
||||
|
||||
Every metric in this methodology decomposes into two layers:
|
||||
|
||||
| Layer | What it does | How it runs |
|-------|-------------|-------------|
| **LLM-Eval** | Qualitative judgment: "Are these two concepts the same?", "Is this definition grounded in the source?" | Prompt template → LLM → structured YAML output |
| **Deterministic** | Quantitative aggregation: cosine similarity, graph connectivity, coverage counting, cycle detection | Python code in `process_chapters.py` or dedicated `metrics.py` |
|
||||
|
||||
The LLM-Eval layer produces **per-entity** or **per-pair** structured
|
||||
scores. The deterministic layer **aggregates** these into collection-level
|
||||
metrics, persisted as machine-readable YAML alongside human-readable
|
||||
markdown reports.
|
||||
|
||||
Per-concept quality metrics (definition precision, source grounding, VSM
|
||||
relevance — see INFRA-TASKS 8-12) operate at the individual entity level.
|
||||
This document covers the five **collection-level concerns** that assess how
|
||||
the entities work together as an explanatory system.
|
||||
|
||||
---
|
||||
|
||||
## 2. Five Collection-Level Concerns
|
||||
|
||||
### Overview
|
||||
|
||||
| # | Concern | Question | Primary framework |
|---|---------|----------|-------------------|
| C1 | Semantic Overlap | Are there redundant concepts? | OOPS! P2, embedding similarity |
| C2 | Coverage Completeness | Does the concept set cover the domain? | SEQUAL, FCA |
| C3 | Structural Coherence | Do concepts form a connected explanatory graph? | OntoQA, graph theory |
| C4 | Definitional Consistency | Are concepts defined consistently and non-circularly? | OntoClean, OOPS! P24 |
| C5 | Granularity Balance | Are concepts at comparable levels of abstraction? | Granularity theory, DSL laconicity |
|
||||
|
||||
---
|
||||
|
||||
## 3. Theoretical Frameworks
|
||||
|
||||
### 3.1 SEQUAL (Semiotic Quality Framework)
|
||||
|
||||
**Origin:** Lindland, Sindre & Sølvberg (1994), extended by Krogstie et al.
|
||||
|
||||
**What it defines:** Quality of a conceptual model as the correspondence
|
||||
between three worlds — the domain (what exists), the model (what we
|
||||
captured), and the audience's interpretation (what they understand).
|
||||
|
||||
Two key dimensions of **semantic quality**:
|
||||
|
||||
- **Validity** — everything in the model corresponds to something real
|
||||
in the domain. No invented concepts.
|
||||
- **Completeness** — everything relevant in the domain is represented in
|
||||
the model. No missing concepts.
|
||||
|
||||
**How we use it:** SEQUAL frames our entire metrics approach. Every
|
||||
collection-level metric maps to one of these dimensions:
|
||||
|
||||
| SEQUAL dimension | Our concerns |
|-----------------|--------------|
| Validity | C1 (redundancy reduces validity: duplicate concepts don't correspond to distinct domain facts), C4 (consistency: contradictory definitions can't both be valid) |
| Completeness | C2 (coverage: are all needed concepts present?), C5 (granularity: missing levels of abstraction are completeness gaps) |
| Both | C3 (coherence: disconnected concepts suggest either missing bridging concepts [completeness] or misplaced concepts [validity]) |
|
||||
|
||||
**Adaptation:** SEQUAL was designed for formal models evaluated by human
|
||||
experts. We replace human judgment with LLM-Eval (for validity checks like
|
||||
"does this concept correspond to something Smith actually described?") and
|
||||
deterministic counting (for completeness checks like "which VSM systems
|
||||
lack entity mappings?").
|
||||
|
||||
### 3.2 OntoClean

**Origin:** Guarino & Welty (2004).

**What it defines:** A methodology for validating taxonomic relationships
by assigning **metaproperties** to each concept:

- **Rigidity**: Is the property essential to all its instances? (e.g.
  "market" is rigid; "effectual demander" is anti-rigid, since an agent can
  stop being an effectual demander)
- **Identity**: Does the concept carry an identity criterion? (e.g.
  "division of labour" can be identified by its three causal mechanisms)
- **Unity**: Are all instances of this concept whole in the same way?
- **Dependence**: Does the concept require another concept to exist?
  (e.g. "market price" depends on "effectual demand")

**Constraint:** A rigid concept cannot be subsumed by an anti-rigid one.
Violations indicate structural confusion.

**How we use it:** We do not have a formal taxonomy, but our flat entity
set implicitly contains subsumption relationships (e.g. "natural rate"
subsumes "ordinary-or-average rate"). OntoClean metaproperties help detect:

- **Granularity mismatches** (C5): A rigid concept at the same level as
  an anti-rigid one suggests different abstraction levels are mixed.
- **Definitional consistency** (C4): If entity A depends on entity B per
  OntoClean, but B's definition doesn't acknowledge A, the definitions
  are inconsistent.
- **Redundancy** (C1): Two entities with identical metaproperty profiles
  and overlapping definitions are candidates for merging.

**Adaptation:** Instead of manual metaproperty assignment, we use LLM-Eval
to classify each entity's rigidity, identity criterion, and dependencies.
The constraint checking is then deterministic.

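The deterministic half of the OntoClean check can be sketched as follows. This is a minimal illustration, assuming the LLM-Eval step has already produced a rigidity label per entity and a list of inferred subsumption edges; the `Subsumption` type, the function name, and the sample labels are hypothetical and not part of the existing codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subsumption:
    """Hypothetical edge type: `parent` is the more general entity."""
    parent: str
    child: str


def rigidity_violations(rigidity, subsumptions):
    """Return edges where an anti-rigid parent subsumes a rigid child."""
    return [
        edge
        for edge in subsumptions
        if rigidity.get(edge.parent) == "anti-rigid"
        and rigidity.get(edge.child) == "rigid"
    ]


# Illustrative labels only; real labels would come from LLM-Eval.
rigidity = {
    "market": "rigid",
    "effectual-demander": "anti-rigid",
    "natural-rate": "rigid",
    "ordinary-or-average-rate": "rigid",
}
edges = [
    Subsumption("natural-rate", "ordinary-or-average-rate"),
    # Deliberately wrong edge, included to show a violation being caught.
    Subsumption("effectual-demander", "market"),
]
violations = rigidity_violations(rigidity, edges)
```

Entities without a rigidity label are skipped rather than flagged, which keeps the check conservative while classification is still incomplete.
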
### 3.3 OOPS! (Ontology Pitfall Scanner)

**Origin:** Poveda-Villalón et al. (2014). Catalogue of 41 common
ontology design pitfalls.

**What it defines:** Concrete, testable anti-patterns. The pitfalls most
relevant to our infospace:

| Pitfall | Description | Our concern |
|---------|-------------|-------------|
| P2 | Synonymous classes: different names, same meaning | C1 (redundancy) |
| P4 | Unconnected ontology elements | C3 (coherence) |
| P7 | Merging different concepts in the same class | C5 (granularity: too coarse) |
| P10 | Missing disjointness axioms | C1 (how do we know two concepts don't overlap?) |
| P11 | Missing domain or range | C4 (consistency) |
| P13 | Missing inverse relationships | C3 |
| P24 | Recursive/circular definition | C4 (consistency) |
| P25 | Inverse of itself | C4 |

**How we use it:** OOPS! pitfalls become a **checklist for LLM-Eval
prompts**. Rather than running a formal OWL scanner, we ask the LLM to
check for each pitfall pattern:

- "Are entities A and B synonymous?" (P2)
- "Does entity A's definition reference itself?" (P24)
- "Is entity A actually two distinct concepts merged together?" (P7)

The deterministic layer counts pitfall occurrences and tracks them over
time.

**Adaptation:** We select the subset of OOPS! pitfalls applicable to
semi-formal markdown-based ontologies (no OWL axioms) and implement each
as an LLM-Eval prompt pattern rather than a formal reasoner check.

### 3.4 OntoQA (Metric-Based Ontology Quality Analysis)

**Origin:** Tartir & Arpinar (2007).

**What it defines:** Quantitative schema-level and instance-level metrics:

- **Relationship Richness (RR):** Proportion of non-taxonomic (lateral)
  relationships to total relationships. `RR = non_hierarchical / total`.
  Low RR = mere taxonomy. High RR = rich cross-cutting connections.
- **Attribute Richness (AR):** Average number of attributes per concept.
  `AR = total_attributes / total_concepts`.
- **Inheritance Richness (IR):** Average subclasses per class; measures
  how knowledge distributes across the hierarchy.
- **Class Richness (CR):** Proportion of classes with instances.

**How we use it:** Our entities don't have formal relationships declared
between them, but we can **infer** a relationship graph from their
definitions and mappings:

- Entity A references entity B in its definition → definitional dependency
- Entities A and B map to the same VSM system → structural co-occurrence
- Entities A and B appear in the same chapter → contextual co-occurrence

From this inferred graph, we compute OntoQA metrics directly:

- **Relationship Richness** tells us whether our concepts form a web of
  explanatory connections or just a flat list.
- **Attribute Richness** maps to our schema sections: entities with more
  optional sections filled (Original Wording, Modern Interpretation) are
  richer.

**Adaptation:** The key modification is that relationship inference is an
LLM-Eval step (pairwise: "does A's definition depend on or reference B?"),
after which all OntoQA metrics are computed deterministically on the
resulting graph.

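As a rough sketch of the deterministic step, the two OntoQA metrics used here reduce to simple ratios once the inferred graph exists. The edge-kind labels (`subsumes`, `depends_on`, `references`) and the sample data are assumptions made for illustration; the document does not fix an edge vocabulary.

```python
def relationship_richness(edges):
    """OntoQA RR: non-hierarchical edges / total edges (0.0 if no edges)."""
    if not edges:
        return 0.0
    lateral = sum(1 for _source, _target, kind in edges if kind != "subsumes")
    return lateral / len(edges)


def attribute_richness(sections_by_entity):
    """OntoQA AR, treating each filled schema section as one attribute."""
    if not sections_by_entity:
        return 0.0
    total = sum(len(sections) for sections in sections_by_entity.values())
    return total / len(sections_by_entity)


# Illustrative edges: (source, target, kind). Kinds are hypothetical labels.
edges = [
    ("natural-rate", "ordinary-or-average-rate", "subsumes"),
    ("market-price", "effectual-demand", "depends_on"),
    ("division-of-labour", "market-extent", "depends_on"),
    ("market-price", "natural-price", "references"),
]
sections = {
    "market-price": ["Definition", "Original Wording", "Modern Interpretation"],
    "natural-price": ["Definition"],
}

rr = relationship_richness(edges)   # 3 lateral edges out of 4
ar = attribute_richness(sections)   # 4 sections across 2 entities
```
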
### 3.5 Formal Concept Analysis (FCA)

**Origin:** Wille (1982). Applied to ontology auditing by Elhaj et al.
(2008) for SNOMED CT completeness checking.

**What it defines:** A mathematical framework for deriving a **concept
lattice** from a binary relation between objects and attributes. The
lattice reveals:

- **Formal concepts**: maximal sets of objects sharing the same attributes
- **Subconcept/superconcept** relationships: the natural hierarchy
- **Missing concepts**: attribute combinations with no corresponding object

**How we use it:** We construct a **formal context** (binary matrix):

- **Objects** = our 85 entities
- **Attributes** = economic domain, VSM system, source book, abstraction
  level (from LLM-Eval), key terms (extracted from definitions)

The concept lattice then reveals:

- **Coverage gaps** (C2): Attribute combinations with no entity. E.g. if
  the cell {Distribution, S3} is empty, we lack control-layer concepts
  for distribution, which is a specific, actionable gap.
- **Redundancy** (C1): Entities with identical attribute sets (same formal
  concept) are candidates for merging.
- **Granularity** (C5): The lattice depth indicates how many meaningful
  levels of abstraction exist. A shallow lattice suggests missing
  intermediate concepts.

**Adaptation:** Classic FCA requires crisp binary attributes. Our domains
and VSM mappings are already categorical, but abstraction level and key
terms need LLM-Eval to produce. The lattice computation itself is
deterministic (Python `concepts` library or equivalent). The FCA approach
replaces the current "ask the LLM about coverage" with a structural
computation that can identify *specific* gaps rather than vague
recommendations.

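To make the lattice idea concrete, here is a small plain-Python sketch of the two FCA derivation operators, a brute-force concept enumeration, and the {domain, VSM system} gap check. It deliberately avoids the `concepts` library so as not to assume its API; the function names and the toy context (including the VSM assignments) are illustrative assumptions, and the enumeration is exponential in the number of attributes, so it only suits small contexts.

```python
from itertools import combinations, product


def extent(context, attrs):
    """Objects that carry every attribute in `attrs`."""
    return frozenset(obj for obj, own in context.items() if attrs <= own)


def intent(context, objs):
    """Attributes shared by all `objs` (every attribute if `objs` is empty)."""
    shared = set(frozenset().union(*context.values()))
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    attributes = sorted(frozenset().union(*context.values()))
    found = set()
    for size in range(len(attributes) + 1):
        for combo in combinations(attributes, size):
            objs = extent(context, frozenset(combo))
            found.add((objs, intent(context, objs)))
    return found


def gap_cells(context, domains, systems):
    """{domain, system} pairs that no entity covers."""
    return [
        (d, s)
        for d, s in product(domains, systems)
        if not extent(context, frozenset({d, s}))
    ]


# Toy context; the domain and VSM assignments are made up for illustration.
context = {
    "division-of-labour": frozenset({"Production", "S1"}),
    "market-price": frozenset({"Exchange", "S2"}),
    "natural-price": frozenset({"Exchange", "S2"}),
    "wages": frozenset({"Distribution", "S1"}),
}
concepts_found = formal_concepts(context)
gaps = gap_cells(
    context,
    ["Production", "Exchange", "Distribution"],
    ["S1", "S2", "S3"],
)
```

In this toy context, `market-price` and `natural-price` end up in the same formal concept, which is the redundancy signal described above, and `("Distribution", "S3")` appears in `gaps`.
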
### 3.6 DSL Design Principles

**Origin:** Mernik et al. (2005) "When and How to Develop DSLs";
Karsai et al. (2014) "Design Guidelines for Domain-Specific Languages".

**What they define:** Quality criteria for a set of concepts that form a
language for a specific domain:

- **Soundness**: Every concept in the language corresponds to a real domain
  concern (no invented abstractions).
- **Completeness**: The language can express everything needed for its
  intended tasks.
- **Laconicity**: No unnecessary concepts; every concept earns its place.
- **Orthogonality**: Concepts are independent; combining any two produces
  a meaningful result (no redundant combinations).

**How we use it:** Our entity set is effectively a domain-specific
vocabulary for "explaining classical economics through VSM". DSL quality
criteria translate directly:

- **Soundness** → Validity (SEQUAL): every entity grounded in Smith's text
- **Completeness** → Coverage (C2): can we answer the "competency
  questions" the infospace is meant to address?
- **Laconicity** → Anti-redundancy (C1) + Indispensability (C5): would
  removing any entity lose explanatory power?
- **Orthogonality** → Non-overlap (C1): entity definitions don't
  substantially duplicate each other

**Adaptation:** We operationalise DSL completeness through **competency
questions**: a set of canonical questions the infospace should be able to
answer (e.g. "How does the division of labour relate to market extent?",
"What mechanisms regulate wages toward their natural rate?"). LLM-Eval
tests whether the current entity set suffices to answer each question.
Unanswerable questions identify specific completeness gaps.

Laconicity is operationalised as **indispensability scoring**: for each
entity, LLM-Eval rates whether removing it would lose explanatory power.
Low-scoring entities are candidates for merging or retirement.

---

## 4. Integration: Metric Definitions by Concern

### C1: Semantic Overlap / Redundancy

**Goal:** Identify entities that substantially overlap in meaning and
should be merged, distinguished, or retired.

**Metrics:**

| Metric | Type | Computation |
|--------|------|-------------|
| `similarity_matrix` | Deterministic | Embed all entity definitions; compute NxN cosine similarity |
| `high_similarity_pairs` | Deterministic | Pairs with cosine > 0.80, sorted descending |
| `confirmed_synonyms` | LLM-Eval | For each high-similarity pair, LLM judges: "same concept" / "genuinely distinct" / "partial overlap" |
| `redundancy_ratio` | Deterministic | `confirmed_synonyms / total_entities` |
| `intensional_conciseness` | Deterministic | `1 - redundancy_ratio` (from KG quality framework) |

**Pipeline:**

1. Embed definitions (embedding API or local model)
2. Compute cosine similarity matrix
3. Filter pairs above threshold
4. LLM pairwise judgment on filtered pairs only (avoids N² LLM calls)
5. Aggregate into ratio and conciseness score

**Output:** `output/metrics/redundancy-report.md` + structured YAML with
pair list, scores, and merge/retire recommendations.

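Steps 2, 3, and 5 of the C1 pipeline are purely deterministic and can be sketched in plain Python. This is an illustration under assumptions: the embeddings are toy 3-dimensional vectors, the function names are hypothetical, and `redundancy_ratio` reads `confirmed_synonyms` as a count of confirmed pairs, which is one possible interpretation of the table above.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def high_similarity_pairs(embeddings, threshold=0.80):
    """Return (slug_a, slug_b, score) above the threshold, highest first."""
    scored = [
        (a, b, cosine(embeddings[a], embeddings[b]))
        for a, b in combinations(sorted(embeddings), 2)
    ]
    above = [pair for pair in scored if pair[2] > threshold]
    return sorted(above, key=lambda pair: pair[2], reverse=True)


def redundancy_ratio(confirmed_pairs, total_entities):
    """Confirmed synonym pairs divided by the number of entities."""
    return len(confirmed_pairs) / total_entities if total_entities else 0.0


# Toy vectors standing in for real definition embeddings.
embeddings = {
    "market-price": [1.0, 0.0, 0.0],
    "price-in-the-market": [0.9, 0.1, 0.0],
    "division-of-labour": [0.0, 1.0, 0.0],
}
candidates = high_similarity_pairs(embeddings)

# Pretend the LLM confirmed every candidate as "same concept".
confirmed = [(a, b) for a, b, _score in candidates]
ratio = redundancy_ratio(confirmed, len(embeddings))
conciseness = 1 - ratio
```

Only the pairs in `candidates` would be sent to the LLM for judgment, which is what keeps the number of LLM calls well below N².
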
### C2: Coverage Completeness

**Goal:** Identify domain areas and VSM systems that lack adequate
representation in the entity set.

**Metrics:**

| Metric | Type | Computation |
|--------|------|-------------|
| `domain_vsm_matrix` | Deterministic | Count entities per {economic_domain, VSM_system} cell |
| `coverage_ratio` | Deterministic | `populated_cells / expected_cells` |
| `vsm_balance_entropy` | Deterministic | Shannon entropy of entity distribution across VSM systems (higher = more balanced) |
| `empty_cells` | Deterministic | List of {domain, VSM_system} pairs with zero entities |
| `competency_coverage` | LLM-Eval | For each competency question, can it be answered with current entities? |
| `fca_gap_concepts` | Deterministic | Attribute combinations in the FCA lattice with no corresponding entity |

**Pipeline:**

1. Parse entity metadata (domain, VSM mapping) from files on disk
2. Build domain × VSM matrix; identify empty cells
3. Build FCA formal context; compute lattice; extract gap concepts
4. Define competency questions (initially hand-written, later LLM-generated
   from the source material)
5. LLM-evaluate answerability of each question
6. Aggregate into coverage ratio, entropy, and gap list

**Output:** `output/metrics/coverage-report.md` + YAML with matrix, gaps,
and competency question results.

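The deterministic C2 metrics are small enough to sketch directly. The entity records below are hypothetical dictionaries with `domain` and `vsm` keys, a stand-in for whatever the metadata parser actually returns; entropy is computed in bits (log base 2), which is an assumption since the document does not fix the base.

```python
import math
from collections import Counter


def domain_vsm_matrix(entities):
    """Count entities per (domain, VSM system) cell."""
    return Counter((e["domain"], e["vsm"]) for e in entities)


def empty_cells(matrix, domains, systems):
    """Expected cells that hold no entity."""
    return [(d, s) for d in domains for s in systems if matrix.get((d, s), 0) == 0]


def coverage_ratio(matrix, domains, systems):
    """populated_cells / expected_cells."""
    expected = len(domains) * len(systems)
    if expected == 0:
        return 0.0
    return (expected - len(empty_cells(matrix, domains, systems))) / expected


def vsm_balance_entropy(entities, systems):
    """Shannon entropy (bits) of the entity distribution across VSM systems."""
    counts = Counter(e["vsm"] for e in entities)
    total = sum(counts[s] for s in systems)
    if total == 0:
        return 0.0
    entropy = 0.0
    for s in systems:
        p = counts[s] / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


# Illustrative records; assignments are made up for the example.
entities = [
    {"slug": "division-of-labour", "domain": "Production", "vsm": "S1"},
    {"slug": "pin-manufacture", "domain": "Production", "vsm": "S1"},
    {"slug": "market-price", "domain": "Exchange", "vsm": "S2"},
    {"slug": "natural-price", "domain": "Exchange", "vsm": "S2"},
]
domains, systems = ["Production", "Exchange"], ["S1", "S2"]

matrix = domain_vsm_matrix(entities)
gaps = empty_cells(matrix, domains, systems)
ratio = coverage_ratio(matrix, domains, systems)
balance = vsm_balance_entropy(entities, systems)
```
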
### C3: Structural Coherence

**Goal:** Determine whether the entities form a connected explanatory web
or a fragmented collection of isolated concepts.

**Metrics:**

| Metric | Type | Computation |
|--------|------|-------------|
| `relationship_graph` | LLM-Eval + Deterministic | Infer edges from definition cross-references (string matching) + LLM judgment for implicit references |
| `connected_components` | Deterministic | Number of connected components in the graph (target: 1) |
| `graph_density` | Deterministic | `actual_edges / possible_edges` |
| `avg_degree` | Deterministic | `total_edges / total_entities` (mean out-degree of the directed graph) |
| `relationship_richness` | Deterministic | OntoQA RR: `non_hierarchical_edges / total_edges` |
| `modularity` | Deterministic | Louvain modularity score (0.3-0.7 = meaningful structure; >0.8 = fragmentation) |
| `bridge_concepts` | Deterministic | Entities with highest betweenness centrality (connect clusters) |
| `orphan_entities` | Deterministic | Entities with degree 0 or 1 |
| `cohesion_by_domain` | Deterministic | Avg intra-domain edges per entity |
| `coupling_across_domains` | Deterministic | Inter-domain edges / total edges |

**Pipeline:**

1. Extract explicit cross-references from definitions (entity name
   mentions in other definitions, using string matching with slug
   normalisation)
2. For entity pairs not caught by string matching, LLM-Eval: "Does A's
   definition depend on or reference B's concept?"
3. Build directed graph
4. Compute graph metrics (networkx or equivalent)
5. Run community detection; compare detected communities to declared
   economic domains

**Output:** `output/metrics/coherence-report.md` + YAML with graph
statistics, orphan list, bridge concepts, and community structure.

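A few of the simpler C3 metrics can be computed without any graph library, as the sketch below shows. It is only an illustration: the document leaves the choice between networkx and an in-house implementation open, the function names are hypothetical, and components are computed on the undirected view of the directed graph (weak connectivity), which is an assumption. Modularity and betweenness centrality are intentionally left out.

```python
def undirected_adjacency(nodes, edges):
    """Neighbour sets ignoring edge direction; every edge endpoint must be in `nodes`."""
    adjacency = {node: set() for node in nodes}
    for source, target in edges:
        if source != target:
            adjacency[source].add(target)
            adjacency[target].add(source)
    return adjacency


def connected_components(adjacency):
    """Weakly connected components via iterative depth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def graph_density(node_count, directed_edge_count):
    """actual_edges / possible_edges for a directed graph without self-loops."""
    possible = node_count * (node_count - 1)
    return directed_edge_count / possible if possible else 0.0


def orphan_entities(adjacency):
    """Entities with degree 0 or 1."""
    return sorted(n for n, neighbours in adjacency.items() if len(neighbours) <= 1)


# Illustrative graph: "stock" has no inferred relationships at all.
nodes = ["market-price", "effectual-demand", "natural-price", "stock"]
edges = [("market-price", "effectual-demand"), ("market-price", "natural-price")]

adjacency = undirected_adjacency(nodes, edges)
components = connected_components(adjacency)
density = graph_density(len(nodes), len(edges))
orphans = orphan_entities(adjacency)
```
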
### C4: Definitional Consistency

**Goal:** Ensure entities are defined consistently, non-circularly, and
without contradicting each other.

**Metrics:**

| Metric | Type | Computation |
|--------|------|-------------|
| `definitional_dependency_graph` | Deterministic + LLM-Eval | Edges where A's definition uses B's concept |
| `circular_definitions` | Deterministic | Cycles of length ≤ 3 in the dependency graph |
| `definition_depth` | Deterministic | Longest dependency chain per entity before reaching a term not in the entity set |
| `undefined_dependencies` | Deterministic | Terms used in definitions that arguably should be entities but aren't |
| `pairwise_consistency` | LLM-Eval | For related entity pairs (sharing edges): "Do these definitions contradict each other?" |
| `source_fidelity` | LLM-Eval | "Does this definition accurately represent what Smith wrote in the cited passage?" |
| `metaproperty_violations` | LLM-Eval + Deterministic | OntoClean constraint checking after LLM classifies rigidity/identity |
| `grounding_ratio` | Deterministic | Fraction of entities traceable to primitives without cycles |

**Pipeline:**

1. Build definitional dependency graph (same technique as C3, but directed:
   A depends on B means A's definition uses B, not vice versa)
2. Detect cycles; flag short cycles
3. Extract undefined terms (terms matching entity-name patterns that appear
   in definitions but have no corresponding entity file)
4. LLM pairwise consistency check on directly-connected pairs
5. LLM source fidelity check (compare definition to source chapter text)
6. LLM OntoClean metaproperty classification; deterministic constraint
   checking

**Output:** `output/metrics/consistency-report.md` + YAML with cycle list,
undefined terms, contradiction candidates, and metaproperty violations.

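The `circular_definitions` metric amounts to finding short directed cycles in the dependency graph. The following is a minimal sketch under assumptions: the graph is a plain dict mapping each entity slug to the slugs its definition uses, the function name is hypothetical, and each cycle is reported once, rotated to start at its alphabetically smallest member.

```python
def short_cycles(dependencies, max_len=3):
    """Directed cycles of length <= max_len, each reported exactly once."""
    cycles = set()

    def walk(start, node, path):
        for nxt in dependencies.get(node, ()):
            if nxt == start:
                # Closed the loop back to the starting entity.
                cycles.add(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only visit nodes larger than `start` so that a cycle is
                # found from its smallest member and never duplicated.
                walk(start, nxt, path + [nxt])

    for start in sorted(dependencies):
        walk(start, start, [start])
    return sorted(cycles)


# Illustrative dependency graph: A -> [B, ...] means A's definition uses B.
dependencies = {
    "market-price": ["natural-price"],
    "natural-price": ["market-price", "wages"],
    "wages": ["labour"],
    "labour": [],
}
cycles = short_cycles(dependencies)
```

Here the two price entities define each other, so the pair is flagged, while the chain through `wages` and `labour` is acyclic and passes.
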
### C5: Granularity Balance

**Goal:** Ensure entities operate at comparable levels of abstraction
within their respective domains and perspectives.

**Metrics:**

| Metric | Type | Computation |
|--------|------|-------------|
| `abstraction_classification` | LLM-Eval | Classify each entity as theory-level / mechanism-level / observation-level |
| `scope_score` | LLM-Eval | Rate each entity 1-5 for generality (1 = very specific instance, 5 = broad theoretical principle) |
| `abstraction_distribution` | Deterministic | Count per level; compute entropy |
| `scope_variance` | Deterministic | Variance of scope scores within each domain |
| `level_x_perspective_matrix` | Deterministic | Cross-tabulation of abstraction level × economic domain |
| `indispensability` | LLM-Eval | "If removed, what explanatory power is lost?" (1-5) |
| `dispensable_entities` | Deterministic | Entities with indispensability score ≤ 2 |
| `merge_candidates` | LLM-Eval | Pairs where one is a sub-case of the other |

**Pipeline:**

1. LLM-classify each entity: abstraction level, scope score,
   indispensability
2. Build level × perspective matrix
3. Compute distribution entropy and per-domain scope variance
4. Flag outliers: entities whose scope score deviates > 1.5σ from their
   domain mean
5. For outlier entities, LLM-Eval: "Should this be merged into a broader
   concept, or split into sub-concepts?"

**Output:** `output/metrics/granularity-report.md` + YAML with
classifications, distribution, outliers, and merge/split recommendations.

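Step 4 of the C5 pipeline (the 1.5σ outlier rule) could look roughly like this. The sketch assumes the population standard deviation is intended, which the document does not specify, and uses hypothetical slugs and scores. Note that with population σ a domain needs at least four scored entities before any single score can exceed 1.5σ.

```python
from collections import defaultdict
from statistics import mean, pstdev


def scope_outliers(scores, domain_of, k=1.5):
    """Slugs whose scope score deviates more than k sigma from their domain mean."""
    by_domain = defaultdict(list)
    for slug, score in scores.items():
        by_domain[domain_of[slug]].append((slug, score))

    outliers = []
    for items in by_domain.values():
        values = [score for _slug, score in items]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # all scores identical, nothing to flag
        outliers.extend(slug for slug, score in items if abs(score - mu) > k * sigma)
    return sorted(outliers)


# Illustrative scope scores (1 = very specific, 5 = broad principle).
scores = {
    "market-price": 3,
    "natural-price": 3,
    "effectual-demand": 3,
    "price-of-monopoly": 3,
    "higgling-of-the-market": 3,
    "pin-factory-example": 1,
}
domain_of = {slug: "Exchange" for slug in scores}
outliers = scope_outliers(scores, domain_of)
```
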
---

## 5. Shared Infrastructure

Several concerns share underlying computations:

| Infrastructure | Used by | Build once |
|---------------|---------|------------|
| Definition embeddings (vector per entity) | C1, C3 | Embedding API call per entity |
| Relationship graph (entity → entity edges) | C3, C4 | String matching + LLM-Eval |
| FCA formal context (entity × attribute matrix) | C2, C5 | Metadata parsing + LLM classification |
| Entity metadata index (domain, VSM, chapter, sections) | C2, C5, INFRA-TASKS #10 (schema compliance) | Deterministic markdown parsing |

These should be computed once per evaluation run and cached for use by
all concern-specific metrics.

---

## 6. Evaluation Workflow

A full collection-level evaluation run:

```
process_chapters.py --evaluate-collection --provider <provider>
```

1. **Parse**: deterministic metadata extraction from all entity files
2. **Embed**: compute definition embeddings (cached; only new/changed
   entities need fresh embeddings)
3. **Infer**: LLM-Eval for relationship edges, metaproperties,
   abstraction levels, pairwise judgments (batched to minimise LLM calls)
4. **Compute**: deterministic graph metrics, FCA lattice, coverage
   matrix, similarity matrix, cycle detection
5. **Aggregate**: combine per-entity and per-pair scores into
   collection-level metrics
6. **Report**: write per-concern markdown reports + unified `metrics.yaml`
7. **Append**: add timestamped snapshot to `metrics-history.yaml`

Incremental mode (`--evaluate-collection --chapter <id>`) re-evaluates
only the entities introduced or modified by that chapter, plus any
pairwise checks involving those entities.

---

## 7. References

- Lindland, O.I., Sindre, G. & Sølvberg, A. (1994). "Understanding
  Quality in Conceptual Modeling." *IEEE Software* 11(2), 42-49.
  → SEQUAL framework: validity and completeness dimensions.

- Guarino, N. & Welty, C.A. (2004). "An Overview of OntoClean." In
  *Handbook on Ontologies*, Springer, 151-171.
  → Metaproperty analysis: rigidity, identity, unity, dependence.

- Poveda-Villalón, M., Gómez-Pérez, A. & Suárez-Figueroa, M.C. (2014).
  "OOPS! (OntOlogy Pitfall Scanner!): An On-line Tool for Ontology
  Evaluation." *IJSWIS* 10(2), 7-34.
  → Pitfall catalogue: 41 anti-patterns for ontology design.

- Tartir, S. & Arpinar, I.B. (2007). "Ontology Evaluation and Ranking
  using OntoQA." *ICSC 2007*, IEEE, 185-192.
  → Schema metrics: relationship richness, attribute richness.

- Wille, R. (1982). "Restructuring Lattice Theory." In *Ordered Sets*,
  Reidel, 445-470.
  → Formal Concept Analysis: concept lattices from binary contexts.

- Elhaj, H. et al. (2008). "Auditing SNOMED CT with Formal Concept
  Analysis." *AMIA Annual Symposium*, PMC2605587.
  → FCA for ontology completeness auditing.

- Keet, C.M. (2008). *A Formal Theory of Granularity.* PhD thesis,
  Free University of Bozen-Bolzano.
  → Granularity levels and perspectives for ontology design.

- Mernik, M., Heering, J. & Sloane, A.M. (2005). "When and How to
  Develop Domain-Specific Languages." *ACM Computing Surveys* 37(4),
  316-344.
  → DSL design: soundness, completeness, laconicity.

- Karsai, G. et al. (2014). "Design Guidelines for Domain Specific
  Languages." *arXiv:1409.2378*.
  → Orthogonality, necessary-and-sufficient principle.

- Xue, B. & Zou, L. (2022). "Knowledge Graph Quality Management: A
  Comprehensive Survey." *IEEE TKDE* 35(5), 4969-4988.
  → KG quality dimensions: conciseness, consistency, completeness.

621
roadmap/infospace-tooling/PLAN.md
Normal file
@@ -0,0 +1,621 @@
# Viable Infospace Tooling: Roadmap

## Vision

An **infospace** is a structured, evaluable, composable collection of
concepts that explains a **topic** through the lens of one or more
**disciplines**. Infospaces are the unit of knowledge work in MarkiTect.

This roadmap organises the work needed to move from the current
ad-hoc example (`infospace-with-history`) to a general-purpose platform
for creating, evaluating, maintaining, and composing infospaces.

---

## Terminology

These terms establish the vocabulary for infospace tooling. They
generalise from the Wealth of Nations / VSM example but are not
specific to it.

### Infospace

A curated, self-describing collection of **entities** (concepts,
mechanisms, observations) that together explain a **topic**. An
infospace has:

- A **topic**: the subject matter being explained (e.g. "The Wealth
  of Nations", "cellular biology", "Kubernetes networking")
- One or more **disciplines**: external frameworks applied as lenses
  (e.g. "Viable System Model", "category theory")
- **Entities**: the atomic units of knowledge, each with a definition,
  provenance, and quality scores
- **Schemas**: structural templates that define what a well-formed
  entity, mapping, or analysis looks like
- **Evaluations**: per-entity and collection-level quality assessments
- **Metrics**: quantitative indicators of completeness, coherence,
  consistency, and granularity balance

An infospace is **viable** when it meets threshold scores across its
defined metrics, that is, when it is fit for purpose as an explanatory
tool.

### Topic

The subject matter an infospace is built to explain. A topic sits
within a **domain** (broader field of knowledge) but is more specific:

- Domain: Economics → Topic: The Wealth of Nations
- Domain: Systems Theory → Topic: Viable System Model
- Domain: Computer Science → Topic: Distributed consensus protocols

A topic provides the **source material**: the texts, data, or
observations from which entities are extracted.

### Discipline

A reusable framework of concepts applied as a lens to explore a topic.
A discipline is itself an infospace, one that has been evaluated as
viable and packaged for reuse.

In our example, the VSM is the discipline: a set of concepts (S1-S5,
recursion, variety, viability) from systems theory, applied to the
economic concepts in Smith's work.

**Key property:** Disciplines compose. An infospace built with one
discipline can itself become a discipline for another infospace. The
Wealth of Nations infospace, viewed through VSM, could become a
discipline applied to a modern supply chain analysis.

### Entity

The atomic unit of an infospace. An entity has:

- **Identity**: a unique slug and human-readable title
- **Definition**: a precise, non-circular explanation
- **Provenance**: the source chapter, passage, and extraction context
- **Domain placement**: which area of the topic it belongs to
- **Discipline mapping**: how it connects to the applied discipline
  (e.g. which VSM system)
- **Quality scores**: per-entity LLM-evaluated metrics
- **Lifecycle state**: active, archived (with reason), or draft

### Evaluation

A structured assessment of quality, applied at two levels:

- **Per-entity evaluation**: scores an individual entity against
  quality rubrics defined in its schema (definition precision, source
  grounding, discipline relevance, etc.)
- **Collection evaluation**: scores the entity set as a whole against
  five concerns: redundancy, coverage, coherence, consistency, and
  granularity balance

Evaluations are always performed by **delegated LLM calls** through
MarkiTect's LLM integration, never by the coding agent working on
infrastructure. This separation ensures that domain-level judgment
stays in the problem space, not the tooling space.

### Viability

An infospace is viable when:

1. Its entities individually meet quality thresholds (per-entity eval)
2. Its collection metrics are within acceptable ranges
3. It can answer its defined **competency questions**, the canonical
   queries the infospace is meant to support
4. It has been evaluated recently enough that metrics reflect current
   content

Viability is not binary: it is a profile of scores that the user
sets thresholds for based on their needs.

---

## Architecture: Three Layers

```
┌──────────────────────────────────────────────────┐
│  Layer 3: Infospace Instances                    │
│  Specific infospaces built by users              │
│  (Wealth of Nations + VSM, supply chain + ...)   │
│  Works IN an infospace                           │
├──────────────────────────────────────────────────┤
│  Layer 2: Infospace Tooling                      │
│  Terminology, primitives, composition model      │
│  CLI: infospace create/evaluate/compose/...      │
│  Works WITH infospaces                           │
├──────────────────────────────────────────────────┤
│  Layer 1: MarkiTect Platform                     │
│  Artifacts, prompts, LLM, spaces, graph, embed   │
│  Provides FOR infospaces                         │
└──────────────────────────────────────────────────┘
```

### Boundary condition: LLM delegation

All LLM-based evaluation (entity scoring, pairwise judgments, coverage
analysis) is delegated to MarkiTect's LLM integration module. The coding
agent that works on infrastructure never makes domain-level judgments
itself. This keeps a clean separation:

- **Coding agent** → writes Python, templates, schemas, tests
- **MarkiTect LLM** → evaluates entities, judges redundancy, assesses
  coverage, checks consistency

The infospace tooling (Layer 2) orchestrates these LLM calls through
prompt templates and the prompt execution engine, not through ad-hoc
prompting.

---

## Stage 1: MarkiTect Platform Additions

Infrastructure that must exist before infospace tooling can be built.
These are general-purpose platform capabilities, not infospace-specific.

### S1.1: Entity metadata parser

Add a deterministic markdown parser that extracts structured metadata
from entity files: H1 title, sections present, word counts, domain,
source chapter. Returns a dataclass usable by all downstream metrics.

**Maps to:** INFRA-TASKS #13, #10
**Location:** `markitect/prompts/quality/` or new `markitect/analysis/`
**Depends on:** Nothing (can start immediately)
**Deliverable:** `parse_entity_metadata(path) -> EntityMeta` function
with tests

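One possible shape for this deliverable is sketched below. Only the names `parse_entity_metadata` and `EntityMeta` come from the plan; the field names, the helper `parse_entity_text`, and above all the assumed file layout (`**Domain:**` and `**Source chapter:**` lines under the H1, with `##` section headings) are guesses for illustration and would need to be aligned with the real entity schema.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional

# Assumed metadata line format, e.g. "**Domain:** Exchange".
FIELD_RE = re.compile(r"^\*\*(Domain|Source chapter):\*\*\s*(.+)$", re.IGNORECASE)


@dataclass
class EntityMeta:
    title: str
    sections: List[str] = field(default_factory=list)
    word_counts: Dict[str, int] = field(default_factory=dict)
    domain: Optional[str] = None
    source_chapter: Optional[str] = None


def parse_entity_text(text):
    """Extract title, sections, per-section word counts, and metadata fields."""
    meta = EntityMeta(title="")
    current = None
    for line in text.splitlines():
        if line.startswith("# ") and not meta.title:
            meta.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            meta.sections.append(current)
            meta.word_counts[current] = 0
        else:
            match = FIELD_RE.match(line.strip())
            if match:
                key, value = match.group(1).lower(), match.group(2).strip()
                if key == "domain":
                    meta.domain = value
                else:
                    meta.source_chapter = value
            elif current is not None:
                meta.word_counts[current] += len(line.split())
    return meta


def parse_entity_metadata(path):
    """File-based entry point named in the plan; delegates to the text parser."""
    return parse_entity_text(Path(path).read_text(encoding="utf-8"))


SAMPLE = """# Market Price

**Domain:** Exchange
**Source chapter:** Book I, ch. 7

## Definition

The price at which a commodity is actually sold.

## Original Wording

Placeholder for the quoted passage.
"""
meta = parse_entity_text(SAMPLE)
```

Splitting the text parser from the file reader keeps the parsing logic testable without touching the filesystem.
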
### S1.2: Schema compliance validator

Deterministic validation of entity/mapping files against their schemas:
section presence, word count ranges, heading format, enum values. No
LLM needed.

**Maps to:** INFRA-TASKS #10
**Location:** `markitect/prompts/quality/validator.py` (extend existing)
**Depends on:** S1.1
**Deliverable:** `validate_document(path, schema) -> ValidationResult`
with tests

### S1.3: Embedding adapter

Add embedding support to `markitect/llm/`. Needs:

- `EmbeddingAdapter` interface: `embed(texts: list[str]) -> list[list[float]]`
- `OpenRouterEmbeddingAdapter` implementation (or OpenAI embedding endpoint)
- Caching layer: store embeddings keyed by `{slug: content_digest}` so
  unchanged entities skip re-embedding
- Cosine similarity utility: `similarity_matrix(embeddings) -> np.ndarray`

**Maps to:** INFRA-TASKS #14 (prerequisite)
**Location:** `markitect/llm/embeddings.py`
**Depends on:** Nothing (can start immediately)
**Deliverable:** Embedding adapter + cache + similarity computation, with
tests

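The caching requirement can be illustrated with a small in-memory wrapper. This is a sketch under assumptions: `CachedEmbedder` and `content_digest` are hypothetical names, SHA-256 is one reasonable digest choice rather than a stated requirement, and `fake_embed` stands in for a real adapter's `embed` call so the example runs offline. A real implementation would also persist the cache between runs.

```python
import hashlib


def content_digest(text):
    """Stable digest of a definition; changes whenever the text changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class CachedEmbedder:
    """Wraps an embed(texts) callable and skips unchanged entities."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # callable: list[str] -> list[list[float]]
        self._cache = {}           # slug -> (digest, vector)

    def embed(self, definitions):
        """Return {slug: vector}, embedding only new or changed definitions."""
        stale = [
            slug
            for slug, text in definitions.items()
            if self._cache.get(slug, (None, None))[0] != content_digest(text)
        ]
        if stale:
            vectors = self._embed_fn([definitions[slug] for slug in stale])
            for slug, vector in zip(stale, vectors):
                self._cache[slug] = (content_digest(definitions[slug]), vector)
        return {slug: self._cache[slug][1] for slug in definitions}


batch_sizes = []  # records how many texts each backend call received


def fake_embed(texts):
    """Stand-in for a real embedding backend; not a meaningful embedding."""
    batch_sizes.append(len(texts))
    return [[float(len(t)), float(t.count(" "))] for t in texts]


embedder = CachedEmbedder(fake_embed)
first = embedder.embed({"market-price": "actual selling price", "wages": "price of labour"})
second = embedder.embed({"market-price": "actual selling price", "wages": "reward of labour"})
```

On the second call only `wages` has a new digest, so the backend sees a batch of one.
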
### S1.4 — Graph analysis utilities
|
||||
|
||||
The existing `DependencyGraph` supports basic traversal and cycle
|
||||
detection. Collection-level metrics need richer analysis:
|
||||
|
||||
- Connected components
|
||||
- Betweenness centrality
|
||||
- Community detection (Louvain or label propagation)
|
||||
- Modularity score
|
||||
- Degree distribution
|
||||
- Cohesion/coupling computation
|
||||
|
||||
Decide: extend `DependencyGraph` or add a lightweight wrapper that
|
||||
converts to networkx (adding it as an optional dependency).
|
||||
|
||||
**Maps to:** INFRA-TASKS #16 (prerequisite)
|
||||
**Location:** `markitect/prompts/dependencies/analysis.py` or new
|
||||
`markitect/analysis/graph.py`
|
||||
**Depends on:** Nothing — can start immediately
|
||||
**Deliverable:** Graph analysis functions with tests
|
||||
|
||||
### S1.5 — Structured evaluation output
|
||||
|
||||
Define a standard format for evaluation results: YAML front-matter +
|
||||
markdown body. Add utilities for:
|
||||
|
||||
- Writing evaluation results (per-entity, per-pair, collection-level)
|
||||
- Reading/parsing evaluation results back into dataclasses
|
||||
- Appending timestamped snapshots to a history file
|
||||
- Diffing two snapshots
|
||||
|
||||
**Maps to:** INFRA-TASKS #11, #12
|
||||
**Location:** `markitect/prompts/quality/` or `markitect/analysis/`
|
||||
**Depends on:** S1.1
|
||||
**Deliverable:** `EvaluationResult` model + read/write utilities with
|
||||
tests
|
||||
|
||||
### S1.6 — Batch LLM evaluation orchestrator
|
||||
|
||||
A pipeline component that runs an evaluation prompt template against a
|
||||
batch of entities (or entity pairs), collecting structured results.
|
||||
Must handle:
|
||||
|
||||
- Rate limiting and retry (reuse existing adapter logic)
|
||||
- Progress reporting
|
||||
- Incremental evaluation (skip entities whose content hasn't changed
|
||||
since last eval)
|
||||
- Result aggregation
|
||||
|
||||
This is the mechanism by which infospace tooling delegates LLM work
|
||||
to the platform.
|
||||
|
||||
**Maps to:** INFRA-TASKS #9 (prerequisite)
|
||||
**Location:** `markitect/prompts/execution/batch.py`
|
||||
**Depends on:** S1.5
|
||||
**Deliverable:** `BatchEvaluator` class with tests
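
The incremental-evaluation requirement can be illustrated with a content hash stored next to each result. This is a sketch under stated assumptions, not the planned `BatchEvaluator`: the function names, the cache layout, and the `evaluate` callback (standing in for an LLM call through the existing adapters) are all hypothetical, and rate limiting and retry are deliberately left out.

```python
import hashlib
from typing import Callable


def content_hash(text: str) -> str:
    """Stable fingerprint of an entity's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def evaluate_incrementally(
    entities: dict[str, str],
    previous: dict[str, dict],
    evaluate: Callable[[str, str], dict],
) -> tuple[dict[str, dict], list[str]]:
    """Re-evaluate only entities whose content changed since the last run.

    `previous` maps entity id -> {"hash": ..., "result": ...} from an
    earlier run. Returns the merged results and the ids that were
    actually sent for evaluation.
    """
    results: dict[str, dict] = {}
    evaluated: list[str] = []
    for entity_id, content in sorted(entities.items()):
        digest = content_hash(content)
        cached = previous.get(entity_id)
        if cached is not None and cached["hash"] == digest:
            results[entity_id] = cached  # unchanged: reuse the old result
            continue
        results[entity_id] = {"hash": digest, "result": evaluate(entity_id, content)}
        evaluated.append(entity_id)
    return results, evaluated


def fake_evaluate(entity_id: str, content: str) -> dict:
    """Stand-in for an LLM evaluation; returns a dummy score."""
    return {"length_score": len(content)}


first_run, first_ids = evaluate_incrementally(
    {"money": "Medium of exchange.", "rent": "Price paid for land."}, {}, fake_evaluate
)
second_run, second_ids = evaluate_incrementally(
    {"money": "Medium of exchange.", "rent": "Price paid for the use of land."},
    first_run,
    fake_evaluate,
)
```

On the second run only `rent` is re-evaluated, because its content hash no longer matches the cached one.
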
|
||||
|
||||
### S1.7 — FCA computation
|
||||
|
||||
Formal Concept Analysis: build a formal context (entity × attribute
|
||||
matrix), compute the concept lattice, extract gap concepts. Either
|
||||
implement a minimal FCA algorithm or integrate a library.
|
||||
|
||||
**Maps to:** INFRA-TASKS #15 (prerequisite)
|
||||
**Location:** `markitect/analysis/fca.py`
|
||||
**Depends on:** S1.1
|
||||
**Deliverable:** `FormalContext`, `ConceptLattice`, `find_gap_concepts()`
|
||||
with tests
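
For a sense of scale, a minimal FCA pass fits in a few lines when the context is small. The sketch below computes every formal concept by closing the entity attribute sets under intersection. Note the assumptions: the context is a plain `dict` rather than the planned `FormalContext` class, the attribute names are invented, and the definition of a "gap concept" used here (a non-empty attribute combination shared by at least two entities that no single entity carries exactly) is a guess at the intent, not a definition taken from the methodology document.

```python
Concept = tuple[frozenset[str], frozenset[str]]  # (extent, intent)


def formal_concepts(context: dict[str, set[str]]) -> list[Concept]:
    """Enumerate all formal concepts of an entity -> attributes context."""
    all_attributes = frozenset().union(*context.values()) if context else frozenset()
    # Concept intents are exactly the intersections of entity attribute
    # sets, plus the full attribute set (the empty intersection).
    intents = {all_attributes}
    for attributes in context.values():
        row = frozenset(attributes)
        intents |= {row & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(e for e, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[1]), sorted(c[1])))


def find_gap_concepts(context: dict[str, set[str]]) -> list[Concept]:
    """Concepts shared by 2+ entities whose intent matches no single entity."""
    named = {frozenset(attrs) for attrs in context.values()}
    return [
        (extent, intent)
        for extent, intent in formal_concepts(context)
        if intent and len(extent) >= 2 and intent not in named
    ]


# Hypothetical context: attribute labels are illustrative only.
CONTEXT = {
    "division-of-labour": {"S1", "production", "labour"},
    "fixed-capital": {"S1", "production", "capital"},
    "market-price": {"S2", "exchange"},
}
concepts = formal_concepts(CONTEXT)
gaps = find_gap_concepts(CONTEXT)
```

Here the only gap is the pair of entities sharing `{"S1", "production"}`, which hints at a missing parent entity. This enumeration is exponential in the worst case, so a library or a dedicated algorithm would be the safer choice for large contexts.
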
|
||||
|
||||
### Summary: Stage 1 dependency graph
|
||||
|
||||
```
|
||||
S1.1 Entity metadata parser ──┬── S1.2 Schema validator
|
||||
├── S1.5 Eval output format ── S1.6 Batch evaluator
|
||||
└── S1.7 FCA computation
|
||||
|
||||
S1.3 Embedding adapter ──────── (independent)
|
||||
S1.4 Graph analysis ─────────── (independent)
|
||||
```
|
||||
|
||||
S1.1, S1.3, and S1.4 can proceed in parallel. S1.6 (batch evaluator) is
|
||||
the final piece needed before Stage 2 can begin.
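
The parallelism claim can be checked mechanically. The sketch below encodes the Stage 1 edges from the diagram above and groups tasks into batches that can run concurrently, using the standard-library `graphlib` module (Python 3.9+). The `parallel_batches` helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Task -> tasks it depends on, transcribed from the Stage 1 diagram.
STAGE1_DEPENDS_ON = {
    "S1.1": set(),
    "S1.2": {"S1.1"},
    "S1.3": set(),
    "S1.4": set(),
    "S1.5": {"S1.1"},
    "S1.6": {"S1.5"},
    "S1.7": {"S1.1"},
}


def parallel_batches(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(STAGE1_DEPENDS_ON)
# batches == [["S1.1", "S1.3", "S1.4"], ["S1.2", "S1.5", "S1.7"], ["S1.6"]]
```

The three batches line up with Phases A, B, and C in the Implementation Order section.
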
|
||||
|
||||
---
|
||||
|
||||
## Stage 2: Infospace Tooling
|
||||
|
||||
The user-facing layer that provides documented primitives for working
|
||||
with infospaces. Built on top of Stage 1 infrastructure and the existing
|
||||
`markitect/spaces/` module.
|
||||
|
||||
### S2.1 — Infospace model and configuration
|
||||
|
||||
Define the `Infospace` as a first-class concept that extends the existing
|
||||
`InformationSpace` with:
|
||||
|
||||
- **Topic declaration**: name, domain, source material reference
|
||||
- **Discipline bindings**: which external infospaces are applied as lenses
|
||||
- **Schema registry**: which schemas govern entity structure
|
||||
- **Competency questions**: what the infospace should be able to answer
|
||||
- **Viability thresholds**: minimum acceptable metric scores
|
||||
- **Evaluation state**: latest per-entity and collection scores
|
||||
|
||||
Configuration format: an `infospace.yaml` (or section in existing config)
|
||||
that declares all of the above.
|
||||
|
||||
**Location:** new `markitect/infospace/` package
|
||||
**Depends on:** S1.1, S1.5, existing `markitect/spaces/`
|
||||
**Deliverable:** `InfospaceConfig`, `InfospaceState` models + loader
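
One possible shape for the configuration model is sketched below. Everything here is an assumption for illustration: the field names, the `config_from_mapping` helper, and the threshold validation rule are hypothetical, and the input is a plain `dict` as a YAML loader would produce, so no YAML library is involved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InfospaceConfig:
    """Hypothetical model for the declarative part of an infospace."""

    topic_name: str
    domain: str
    sources: str
    disciplines: list[str] = field(default_factory=list)
    schemas: dict[str, str] = field(default_factory=dict)
    competency_questions: Optional[str] = None
    # Metric name -> {"min": x} and/or {"max": y}
    viability: dict[str, dict[str, float]] = field(default_factory=dict)


def config_from_mapping(raw: dict) -> InfospaceConfig:
    """Build a config from an already-parsed mapping (e.g. loaded YAML)."""
    viability = raw.get("viability", {})
    for metric, bounds in viability.items():
        # Reject thresholds that declare no bound or use an unknown key.
        if not bounds or set(bounds) - {"min", "max"}:
            raise ValueError(f"invalid threshold for {metric!r}: {bounds!r}")
    topic = raw["topic"]
    return InfospaceConfig(
        topic_name=topic["name"],
        domain=topic["domain"],
        sources=topic["sources"],
        disciplines=[d["path"] for d in raw.get("disciplines", [])],
        schemas=dict(raw.get("schemas", {})),
        competency_questions=raw.get("competency_questions"),
        viability=viability,
    )


example = config_from_mapping({
    "topic": {
        "name": "The Wealth of Nations",
        "domain": "Classical Economics",
        "sources": "artifacts/sources/",
    },
    "disciplines": [{"name": "Viable System Model", "path": "artifacts/vsm-reference/"}],
    "viability": {"coverage_ratio": {"min": 0.60}},
})
```

Validating thresholds at load time means a typo in a bound name fails early instead of silently disabling a viability check.
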
|
||||
|
||||
### S2.2 — Infospace lifecycle commands
|
||||
|
||||
CLI commands for the core lifecycle:
|
||||
|
||||
```bash
|
||||
# Initialise a new infospace
|
||||
markitect infospace init --topic "Wealth of Nations" \
|
||||
--domain "Economics" \
|
||||
--discipline vsm-framework
|
||||
|
||||
# Show infospace status (entity count, eval state, viability)
|
||||
markitect infospace status
|
||||
|
||||
# List entities with quality summary
|
||||
markitect infospace entities [--sort-by score|domain|chapter]
|
||||
|
||||
# Show viability dashboard
|
||||
markitect infospace viability
|
||||
```
|
||||
|
||||
These commands read the `infospace.yaml` config and present information
|
||||
from the metadata index and evaluation results.
|
||||
|
||||
**Location:** `markitect/infospace/cli.py` integrated into main CLI
|
||||
**Depends on:** S2.1
|
||||
**Deliverable:** CLI commands with help text and tests
|
||||
|
||||
### S2.3 — Per-entity evaluation primitives
|
||||
|
||||
Prompt templates and CLI commands for evaluating individual entities:
|
||||
|
||||
```bash
|
||||
# Evaluate all entities
|
||||
markitect infospace evaluate --provider openrouter
|
||||
|
||||
# Evaluate entities from a specific chapter
|
||||
markitect infospace evaluate --chapter book-1-chapter-05 --provider openrouter
|
||||
|
||||
# Re-evaluate a single entity
|
||||
markitect infospace evaluate --entity division-of-labour --provider openrouter
|
||||
```
|
||||
|
||||
Uses the batch evaluator (S1.6) to run the evaluate-entity prompt
|
||||
template (defined in the infospace's schema directory) against entities.
|
||||
Writes structured results to `output/evaluations/`.
|
||||
|
||||
**Maps to:** INFRA-TASKS #8, #9
|
||||
**Location:** `markitect/infospace/evaluation.py`
|
||||
**Depends on:** S1.6, S2.1
|
||||
**Deliverable:** Per-entity evaluation pipeline + CLI + prompt template
|
||||
|
||||
### S2.4 — Collection-level checks
|
||||
|
||||
CLI commands for each of the five collection concerns:
|
||||
|
||||
```bash
|
||||
# Run all collection checks
|
||||
markitect infospace check --provider openrouter
|
||||
|
||||
# Run specific checks
|
||||
markitect infospace check redundancy --provider openrouter
|
||||
markitect infospace check coverage --provider openrouter
|
||||
markitect infospace check coherence --provider openrouter
|
||||
markitect infospace check consistency --provider openrouter
|
||||
markitect infospace check granularity --provider openrouter
|
||||
```
|
||||
|
||||
Each check uses Stage 1 infrastructure (embeddings, graph analysis, FCA)
|
||||
and delegates LLM judgment to the platform. Results written to
|
||||
`output/metrics/` as per-concern reports + unified `metrics.yaml`.
|
||||
|
||||
**Maps to:** INFRA-TASKS #14-19
|
||||
**Location:** `markitect/infospace/checks/` (one module per concern)
|
||||
**Depends on:** S1.3, S1.4, S1.6, S1.7, S2.1
|
||||
**Deliverable:** Five check modules + unified orchestrator + CLI
|
||||
|
||||
### S2.5 — Metrics history and viability tracking
|
||||
|
||||
Track metrics over time. After each evaluation or check run, append a
|
||||
timestamped snapshot to `metrics-history.yaml`. Provide commands to
|
||||
review trends:
|
||||
|
||||
```bash
|
||||
# Show metrics history
|
||||
markitect infospace history
|
||||
|
||||
# Compare two snapshots
|
||||
markitect infospace history diff 2026-02-18 2026-03-01
|
||||
|
||||
# Check viability against thresholds
|
||||
markitect infospace viability
|
||||
```
|
||||
|
||||
Viability is assessed by comparing current metrics to the thresholds
|
||||
declared in `infospace.yaml`. Each metric is reported as a simple
|
||||
pass/fail result alongside its actual value.
|
||||
|
||||
**Maps to:** INFRA-TASKS #12
|
||||
**Location:** `markitect/infospace/history.py`
|
||||
**Depends on:** S2.4, S1.5
|
||||
**Deliverable:** History tracking + viability assessment + CLI
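
The pass/fail comparison is simple enough to sketch directly. The thresholds below are copied from the example `infospace.yaml` in S3.1. The metric values are made up for illustration and are not results, and both the `assess_viability` name and the choice to treat a missing metric as a failure are assumptions.

```python
def assess_viability(
    metrics: dict[str, float],
    thresholds: dict[str, dict[str, float]],
) -> dict[str, dict]:
    """Compare each metric to its min/max bound and report pass/fail."""
    report: dict[str, dict] = {}
    for name, bounds in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # Assumption: a metric that was never computed counts as failing.
            report[name] = {"value": None, "passed": False}
            continue
        passed = True
        if "min" in bounds and value < bounds["min"]:
            passed = False
        if "max" in bounds and value > bounds["max"]:
            passed = False
        report[name] = {"value": value, "passed": passed}
    return report


THRESHOLDS = {
    "redundancy_ratio": {"max": 0.05},
    "coverage_ratio": {"min": 0.60},
    "coherence_components": {"max": 1},
    "consistency_cycles": {"max": 0},
    "granularity_entropy": {"min": 1.0},
    "per_entity_mean": {"min": 3.5},
}

# Made-up sample values; per_entity_mean is deliberately missing.
sample_metrics = {
    "redundancy_ratio": 0.08,
    "coverage_ratio": 0.72,
    "coherence_components": 1,
    "consistency_cycles": 0,
    "granularity_entropy": 1.4,
}

report = assess_viability(sample_metrics, THRESHOLDS)
failed = [name for name, row in report.items() if not row["passed"]]
```

With these sample values the infospace would not be viable: `redundancy_ratio` exceeds its maximum and `per_entity_mean` has not been computed.
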
|
||||
|
||||
### S2.6 — Infospace composition model
|
||||
|
||||
The mechanism by which one infospace is applied as a discipline to
|
||||
another. Builds on `markitect/spaces/composability/`:
|
||||
|
||||
- **Discipline binding**: declare that infospace A uses infospace B as a
|
||||
discipline. B's entities become available as mapping targets.
|
||||
- **Cross-infospace references**: entity in A maps to concept in B using
|
||||
the same mapping schema and evaluation pipeline.
|
||||
- **Discipline viability requirement**: B must be viable (meets its own
|
||||
thresholds) before it can be used as a discipline for A.
|
||||
- **Cascading evaluation**: when B's entities change, A's mappings that
|
||||
reference them are flagged for re-evaluation.
|
||||
|
||||
```bash
|
||||
# Bind a discipline to the current infospace
|
||||
markitect infospace bind-discipline ./path/to/vsm-infospace
|
||||
|
||||
# List bound disciplines and their viability
|
||||
markitect infospace disciplines
|
||||
|
||||
# Check for stale mappings after discipline update
|
||||
markitect infospace check stale-mappings
|
||||
```
|
||||
|
||||
**Location:** `markitect/infospace/composition.py`
|
||||
**Depends on:** S2.1, existing `markitect/spaces/composability/`
|
||||
**Deliverable:** Composition model + CLI + documentation
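
Cascading evaluation could be driven by recording, with each mapping, a fingerprint of the discipline entity it was evaluated against. The sketch below is one hypothetical way to flag stale mappings. The `Mapping` record, its fields, and `find_stale_mappings` are assumptions, and the hash strings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical cross-infospace mapping record."""

    source_entity: str  # entity in infospace A
    target_entity: str  # discipline entity in infospace B
    target_hash: str    # fingerprint of B's entity when the mapping was evaluated


def find_stale_mappings(
    mappings: list[Mapping],
    current_hashes: dict[str, str],
) -> list[Mapping]:
    """Return mappings whose discipline entity has changed or disappeared."""
    stale = []
    for mapping in mappings:
        current = current_hashes.get(mapping.target_entity)
        if current != mapping.target_hash:  # also true when the entity is gone
            stale.append(mapping)
    return stale


mappings = [
    Mapping("division-of-labour", "system-1", "aaa"),
    Mapping("market-price", "system-2", "bbb"),
    Mapping("sovereign", "system-5", "ccc"),
]
# Placeholder state of discipline B: system-2 was edited, system-5 removed.
current = {"system-1": "aaa", "system-2": "bbb-edited"}
stale = find_stale_mappings(mappings, current)
stale_sources = [m.source_entity for m in stale]
```

Only the flagged mappings would need to be sent back through the evaluation pipeline, which keeps re-evaluation cost proportional to the size of the discipline change.
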
|
||||
|
||||
### S2.7 — Documentation: Infospace Primitives Reference
|
||||
|
||||
A reference document explaining all primitives, their purpose, and how
|
||||
they compose. This is the user-facing documentation for the infospace
|
||||
tooling layer — the equivalent of a framework guide.
|
||||
|
||||
**Location:** `docs/infospace-primitives.md` or in-CLI help
|
||||
**Depends on:** S2.1-S2.6
|
||||
**Deliverable:** Reference documentation
|
||||
|
||||
### Summary: Stage 2 dependency graph
|
||||
|
||||
```
|
||||
S2.1 Model & config ──┬── S2.2 Lifecycle CLI
|
||||
├── S2.3 Per-entity evaluation
|
||||
├── S2.4 Collection checks ── S2.5 History & viability
|
||||
└── S2.6 Composition model
|
||||
|
||||
S2.7 Documentation (depends on all above)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Stage 3: Example Revision
|
||||
|
||||
Revisit the Wealth of Nations / VSM example using the new tooling.
|
||||
The example becomes both a tutorial and a validation of the tooling.
|
||||
|
||||
### S3.1 — Migrate example to infospace configuration
|
||||
|
||||
Replace the ad-hoc `process_chapters.py` setup with a declarative
|
||||
`infospace.yaml`:
|
||||
|
||||
```yaml
|
||||
topic:
|
||||
name: "The Wealth of Nations"
|
||||
domain: "Classical Economics"
|
||||
sources: artifacts/sources/
|
||||
|
||||
disciplines:
|
||||
- name: "Viable System Model"
|
||||
path: artifacts/vsm-reference/
|
||||
|
||||
schemas:
|
||||
entity: schemas/economic-entity-schema-v1.0.md
|
||||
mapping: schemas/vsm-mapping-schema-v1.0.md
|
||||
analysis: schemas/chapter-analysis-schema-v1.0.md
|
||||
|
||||
competency_questions: schemas/competency-questions.md
|
||||
|
||||
viability:
|
||||
redundancy_ratio: { max: 0.05 }
|
||||
coverage_ratio: { min: 0.60 }
|
||||
coherence_components: { max: 1 }
|
||||
consistency_cycles: { max: 0 }
|
||||
granularity_entropy: { min: 1.0 }
|
||||
per_entity_mean: { min: 3.5 }
|
||||
|
||||
pipeline:
|
||||
stages:
|
||||
- template: extract-entities
|
||||
spaces: [sources, guidelines, vsm-reference, entities]
|
||||
- template: map-to-vsm
|
||||
spaces: [entities, vsm-reference, guidelines]
|
||||
- template: synthesize-analysis
|
||||
spaces: [sources, entities, mappings, vsm-reference]
|
||||
post_batch:
|
||||
- template: assess-metrics
|
||||
spaces: [analyses, vsm-reference]
|
||||
```
|
||||
|
||||
**Depends on:** S2.1
|
||||
**Deliverable:** `infospace.yaml` + migration of `process_chapters.py` to
|
||||
use infospace tooling APIs
|
||||
|
||||
### S3.2 — Clean per-chapter git history
|
||||
|
||||
Re-run all processed chapters (and remaining ones) with per-chapter
|
||||
commits on a clean branch, then replace the current tangled history.
|
||||
|
||||
**Maps to:** INFRA-TASKS #4, #7
|
||||
**Depends on:** S3.1
|
||||
**Deliverable:** Clean branch with one commit per chapter
|
||||
|
||||
### S3.3 — Full evaluation run
|
||||
|
||||
Run all per-entity evaluations and collection checks on the completed
|
||||
infospace. Establish baseline metrics. Demonstrate the viability
|
||||
dashboard.
|
||||
|
||||
**Maps to:** INFRA-TASKS #6
|
||||
**Depends on:** S2.3, S2.4, S2.5, S3.2
|
||||
**Deliverable:** Complete evaluation results + viability report
|
||||
|
||||
### S3.4 — Rewrite tutorial
|
||||
|
||||
Update `TUTORIAL.md` to use infospace tooling commands instead of
|
||||
raw `process_chapters.py` invocations. The tutorial should walk
|
||||
through:
|
||||
|
||||
1. Initialising an infospace (`markitect infospace init`)
|
||||
2. Defining schemas and competency questions
|
||||
3. Processing chapters (pipeline execution)
|
||||
4. Evaluating entities (`markitect infospace evaluate`)
|
||||
5. Running collection checks (`markitect infospace check`)
|
||||
6. Reviewing viability (`markitect infospace viability`)
|
||||
7. Iterating: refining guidelines, re-processing, re-evaluating
|
||||
8. Using the infospace as a discipline for a new project
|
||||
|
||||
**Depends on:** S3.1-S3.3
|
||||
**Deliverable:** Revised `TUTORIAL.md`
|
||||
|
||||
### S3.5 — Demonstrate composition
|
||||
|
||||
Create a minimal second infospace (e.g. a modern supply chain case
|
||||
study or a different economic text) that binds the Wealth of Nations
|
||||
infospace as a discipline. Demonstrates the composition model from S2.6.
|
||||
|
||||
**Depends on:** S2.6, S3.3
|
||||
**Deliverable:** Second example infospace + composition tutorial section
|
||||
|
||||
---
|
||||
|
||||
## Task Mapping
|
||||
|
||||
Cross-reference between INFRA-TASKS numbers and roadmap stages:
|
||||
|
||||
| INFRA-TASK | Description | Stage |
|
||||
|------------|-------------|-------|
|
||||
| 1-3 | Infra fixes (resolved) | — |
|
||||
| 4 | Per-chapter git history | S3.2 |
|
||||
| 5 | Prompt file side-effects | S1.6 (batch eval avoids this) |
|
||||
| 6 | Stale metrics | S3.3 |
|
||||
| 7 | Remaining 28 chapters | S3.2 |
|
||||
| 8 | Per-concept quality metrics in schema | S2.3 |
|
||||
| 9 | Evaluate-entity prompt template | S2.3 |
|
||||
| 10 | Deterministic schema compliance | S1.2 |
|
||||
| 11 | Structured metrics output | S1.5 |
|
||||
| 12 | Metrics-over-time tracking | S2.5 |
|
||||
| 13 | Entity metadata index | S1.1 |
|
||||
| 14 | Redundancy detection (C1) | S2.4 |
|
||||
| 15 | Coverage completeness (C2) | S2.4 |
|
||||
| 16 | Structural coherence (C3) | S2.4 |
|
||||
| 17 | Definitional consistency (C4) | S2.4 |
|
||||
| 18 | Granularity balance (C5) | S2.4 |
|
||||
| 19 | Unified collection evaluation | S2.4 |
|
||||
|
||||
---
|
||||
|
||||
## Implementation Order
|
||||
|
||||
Recommended sequence, accounting for dependencies and value delivery:
|
||||
|
||||
**Phase A — Foundation (Stage 1, parallelisable)**
|
||||
1. S1.1 Entity metadata parser
|
||||
2. S1.3 Embedding adapter
|
||||
3. S1.4 Graph analysis utilities
|
||||
|
||||
**Phase B — Validation & Output (Stage 1)**
|
||||
4. S1.2 Schema compliance validator (needs S1.1)
|
||||
5. S1.5 Structured evaluation output (needs S1.1)
|
||||
6. S1.7 FCA computation (needs S1.1)
|
||||
|
||||
**Phase C — Orchestration (Stage 1 → Stage 2 bridge)**
|
||||
7. S1.6 Batch LLM evaluation orchestrator (needs S1.5)
|
||||
|
||||
**Phase D — Infospace Core (Stage 2)**
|
||||
8. S2.1 Infospace model and configuration
|
||||
9. S2.2 Lifecycle commands
|
||||
10. S2.3 Per-entity evaluation primitives (needs S1.6, S2.1)
|
||||
|
||||
**Phase E — Collection Intelligence (Stage 2)**
|
||||
11. S2.4 Collection-level checks (needs S1.3, S1.4, S1.6, S1.7, S2.1)
|
||||
12. S2.5 Metrics history and viability tracking
|
||||
|
||||
**Phase F — Composition (Stage 2)**
|
||||
13. S2.6 Infospace composition model
|
||||
14. S2.7 Documentation
|
||||
|
||||
**Phase G — Example (Stage 3)**
|
||||
15. S3.1 Migrate example to infospace config
|
||||
16. S3.2 Clean per-chapter history
|
||||
17. S3.3 Full evaluation run
|
||||
18. S3.4 Rewrite tutorial
|
||||
19. S3.5 Demonstrate composition
|
||||