markitect-main/examples/infospace-with-history/INFRA-TASKS.md
tegwick b055c8d7bb
docs(example): close out INFRA-TASKS with summary and 4 follow-up items
Adds a closing remark (23 Feb 2026) summarising the final state of the
infospace: 988 entities, 985 evaluations, 823 L2 classifications, 15 L3
relations, viability 6/6 PASS.

New open tasks 20–23:
  20. Complete L2 classification batch (165 entities blocked on credits)
  21. Run classify-links for 58 Relation-type entities
  22. Refresh stale metrics-report.md narrative
  23. Smoke-test the graph command end-to-end

Also committed: history.py fix — write_metrics_file now preserves
non-float metric values (type_distribution dict) instead of crashing
on round().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 13:45:58 +01:00

# Markitect Infrastructure Tasks
Issues discovered while building the infospace-with-history example.
Tasks 1-3 have all been fixed in commit `706981c` and the pipeline script
refactored to use the fixed infrastructure directly.
## 1. Artifact Repository does not store content — RESOLVED
**File:** `markitect/prompts/resolver/resolver.py`, lines 147-148
**Issue:** `content = f"[Content of {artifact.name} from {space_id}]"` — the
resolver returns placeholder text instead of actual artifact content because
the SQLiteArtifactRepository stores metadata (digest, name, type) but not
the content itself.
**Impact:** Consumers must maintain their own content cache alongside the
repository, defeating the purpose of centralised artifact storage.
**Fix applied:** Added `content` field to `Artifact` model, `content TEXT`
column to SQLite schema (with migration for existing DBs), and replaced
the resolver placeholder with `artifact.content`.
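**Illustrative sketch:** One way such a migration could look, assuming a table named `artifacts` with the metadata columns listed above. The table name and the helper `ensure_content_column` are hypothetical and not taken from the Markitect code base.

```python
import sqlite3


def ensure_content_column(conn: sqlite3.Connection, table: str = "artifacts") -> bool:
    """Add a `content TEXT` column if the table lacks one. Returns True if added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if "content" in columns:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN content TEXT")
    return True


# Simulate an existing database that only stores metadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (digest TEXT PRIMARY KEY, name TEXT, type TEXT)")

added_first = ensure_content_column(conn)   # migration runs
added_again = ensure_content_column(conn)   # second call is a no-op

conn.execute(
    "INSERT INTO artifacts VALUES (?, ?, ?, ?)",
    ("abc123", "chapter-01", "text", "chapter text"),
)
stored = conn.execute(
    "SELECT content FROM artifacts WHERE digest = ?", ("abc123",)
).fetchone()[0]
```

Checking `PRAGMA table_info` first keeps the migration idempotent, so it can run on every start-up against both old and new databases.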
## 2. ContentMacro raw_text defaults to empty string — RESOLVED
**File:** `markitect/prompts/templates/models.py`, line 46
**Issue:** `raw_text: str = ""` — when macros are constructed programmatically
(not parsed from template text), `raw_text` defaults to `""`. The
ContextCompiler then calls `str.replace("", resolved.content)` which inserts
content between every character, producing multi-gigabyte output.
**Impact:** Silent data corruption; compiled prompts become unusable.
**Fix applied:** Added `__post_init__` to `ContentMacro` that auto-derives
`raw_text = f"@{{{self.target}}}"` when not provided.
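**Illustrative sketch:** The failure mode and the fix can be reproduced in a few lines. The `ContentMacro` class below is a simplified stand-in with only the two fields discussed here, not the real model.

```python
from dataclasses import dataclass

# Replacing the empty string matches at every character boundary,
# so the replacement is inserted before, between and after all characters.
corrupted = "abc".replace("", "<X>")


@dataclass
class ContentMacro:
    """Simplified stand-in for the real model (assumption: two fields only)."""
    target: str
    raw_text: str = ""

    def __post_init__(self) -> None:
        # Derive the shorthand form when the macro was built programmatically.
        if not self.raw_text:
            self.raw_text = f"@{{{self.target}}}"


macro = ContentMacro(target="existing_entities")
compiled = "Known entities: @{existing_entities}".replace(macro.raw_text, "labour, rent")
```

With the derived `raw_text`, the replacement touches only the macro occurrence instead of every character boundary.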
## 3. No TemplateAnalyzer support for @{target} syntax — RESOLVED
**File:** `markitect/prompts/templates/parser.py`
**Issue:** The MacroParser parses `{{kind:target}}` syntax but the
templates in this example use the simplified `@{target}` syntax. There's
no automatic parsing for this format, requiring manual macro construction.
**Fix applied:** Added `SHORTHAND_PATTERN` to `MacroParser` that recognises
`@{target}` and maps it to `MacroKind.REQUIRED`. Updated `has_macros()`,
`count_macros()`, and `find_macro_positions()` accordingly.
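**Illustrative sketch:** A pattern of roughly this shape would recognise the shorthand. The exact regular expression used in `MacroParser` is not shown in this file, so the character class and the helper names below are assumptions.

```python
import re

# Assumed shape: "@{" + identifier (letters, digits, "_" or "-") + "}".
SHORTHAND_PATTERN = re.compile(r"@\{([A-Za-z_][A-Za-z0-9_-]*)\}")


def find_shorthand_targets(text: str) -> list[str]:
    """Return macro targets in order of appearance."""
    return SHORTHAND_PATTERN.findall(text)


def has_shorthand(text: str) -> bool:
    """True if the text contains at least one @{target} macro."""
    return SHORTHAND_PATTERN.search(text) is not None


template = "Compare with @{existing_entities} using @{vsm_framework}."
targets = find_shorthand_targets(template)
```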
---
## Assignment Assessment (18 Feb 2026)
How the example measures against the objectives stated in `README.md`:
| # | Objective | Status | Notes |
|---|-----------|--------|-------|
| 1 | Capture knowledge from Wealth of Nations | **Partial** | 7 of 35 chapters processed (Book I, ch. 1-7). 85 canonical entities extracted. |
| 2 | Transform to VSM concepts/entities | **Done (for processed chapters)** | Entities mapped to S1-S5 with strength ratings. |
| 3 | Consistent and complete | **Not yet** | Only 20% of chapters done. Metrics report exists but covers limited scope. |
| 4 | Schemas as scaffolding | **Done** | Four schemas defined and used across all stages. |
| 5 | Prompt dependency resolution | **Done** | `@{macro}` templates resolved via MultiSpaceResolutionStrategy. |
| 6 | Incremental chapter injection | **Done** | Pipeline processes one chapter at a time; `@{existing_entities}` prevents duplication. |
| 7 | Keep changes as git history | **Not done** | See task 4 below. |
| 8 | Metrics for completeness/consistency | **Partial** | Template and report exist but only cover 4 chapters (report predates ch. 5-7). |
| 9 | No infrastructure changes during experiment | **Violated** | Three infra fixes were required (tasks 1-3 above). Documented as intended. |
| 10 | Generate task list for infra issues | **Done** | This file. |
## 4. Infospace has no per-chapter git history — PARTIAL
**Objective:** README states "The information space should utilize the option
of keeping changes as git history."
**Issue:** The 7 processed chapters were committed in mixed batches alongside
infrastructure changes (LLM adapters, entity refactoring, archive policy).
Chapters 1-2 are bundled into `fecc2fd` with the entire LLM module.
Chapters 5-7 share a single commit (`41773f1`) with the OpenAI adapter and
archive policy. There is no commit where you can `git diff` to see exactly
what one chapter contributed to the infospace.
**Impact:** Cannot use `git log`, `git diff`, or `git bisect` to trace how
the infospace grew chapter by chapter — the core promise of "with history."
**Progress:** Branch `clean-example-history` was created. Chapters 1-8 have
clean per-chapter commits. 27 chapters remain. Example completeness (tasks 4
and 7) is deferred; no further action planned.
**Suggested fix (original):** Re-run the processed chapters using
`process_chapters.py` without `--no-commit`, on a clean branch or after
squashing the current output into a baseline commit. Each chapter gets its
own commit via `_git_commit_chapter()`.
## 5. Prompt files are regenerated as a side-effect of DB rebuild — RESOLVED
**Issue:** Running `--all --no-commit` to regenerate `infospace.db` also
overwrites `*-prompt.md` files in the output directories because each
pipeline stage unconditionally writes the compiled prompt before checking
whether output already exists. The `@{existing_entities}` macro content
shifts as earlier chapters are loaded, so prompt files for already-processed
chapters change on every full run.
**Impact:** A DB regeneration dirties the working tree with prompt file
changes, even though no actual outputs changed. Users must `git checkout`
the prompt files after regeneration.
**Fix applied:** Each pipeline stage (`stage_extract_entities`,
`stage_map_to_vsm`, `stage_synthesize_analysis`, `assess_metrics`) now
skips writing the `*-prompt.md` file when the corresponding output file
already exists on disk. DB regeneration no longer dirties the working tree.
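**Illustrative sketch:** The guard amounts to an existence check before the write. The helper name `write_prompt_unless_output_exists` and the file names are hypothetical; the real stages may structure this differently.

```python
import tempfile
from pathlib import Path


def write_prompt_unless_output_exists(prompt_path: Path, output_path: Path, prompt: str) -> bool:
    """Write the compiled prompt only when the stage output is still missing."""
    if output_path.exists():
        return False  # output already present: leave the committed prompt untouched
    prompt_path.write_text(prompt, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "entities.md"
    prompt = Path(tmp) / "entities-prompt.md"

    first = write_prompt_unless_output_exists(prompt, output, "prompt v1")
    output.write_text("extracted entities", encoding="utf-8")  # stage has now run
    second = write_prompt_unless_output_exists(prompt, output, "prompt v2")
    final_prompt = prompt.read_text(encoding="utf-8")
```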
## 6. Metrics report is stale — OPEN
**Issue:** The metrics report (`output/metrics/metrics-report.md`) was
generated after chapters 1-4. Chapters 5-7 have since been processed but
the report has not been refreshed.
**Impact:** The metrics do not reflect the current state of the infospace.
**Suggested fix:** Re-run `--metrics --provider <provider> --no-commit`
after every batch of new chapters. Consider making metrics assessment
automatic at the end of `--book` or `--all` runs.
## 7. Remaining 28 chapters not yet processed — DEFERRED
**Issue:** Only Book I chapters 1-7 have been processed. The other
28 chapters (the rest of Book I and Books II-V) remain unprocessed.
**Impact:** The infospace is incomplete — VSM coverage is limited to S1,
S2, and partial S4. S3, S3*, S5, and many systemic concepts (algedonic
signals, recursion, variety) are expected to emerge from later books.
**Note:** Example completeness is deferred. The 7/35 chapter corpus is
sufficient to validate the tooling. Resuming requires the `clean-example-history`
branch and a valid `OPENROUTER_API_KEY`.
---
## Per-Concept Metrics (tasks 8-12)
The current metrics system is a single LLM-evaluated narrative report that
assesses the infospace as a whole. It produces no machine-readable output,
cannot be tracked over time, and conflates per-concept quality with
collection-level coherence.
The improvement splits metrics into two layers:
- **LLM-Eval**: A prompt template evaluates each concept individually
against quality criteria defined in the schema. The LLM returns structured
scores, not prose.
- **Deterministic aggregation**: `process_chapters.py` computes what it can
from files on disk (schema compliance, word counts, section presence,
coverage tallies) and aggregates LLM-eval scores into dashboard metrics.
Both layers persist results in structured form so they can be diffed,
tracked over time, and committed alongside the entities they evaluate.
## 8. Add per-concept quality metrics to entity schema — RESOLVED
**Issue:** The entity schema (`economic-entity-schema-v1.0.md`) defines
required sections and validation rules (section presence, word count range)
but no quality criteria. There is no definition of what makes a *good*
entity versus a merely *compliant* one.
**Suggested fix:** Add a `## Quality Metrics` section to the entity schema
defining evaluation dimensions with scoring rubrics:
- **Definition Precision** (1-5): Is the definition specific, non-circular,
and distinguishable from neighbouring concepts?
- **Source Grounding** (1-5): Is the entity grounded in a specific passage?
Does the citation exist and support the definition?
- **Domain Placement** (1-5): Is the economic domain assignment correct and
specific (not just "General Theory")?
- **VSM Relevance** (1-5): Does the entity connect meaningfully to at least
one VSM system, or is it too granular/abstract to map?
- **Explanatory Value** (1-5): Does this entity contribute to explaining
the economic system, or is it a restatement of another concept?
Similarly update the VSM mapping schema with:
- **Rationale Rigour** (1-5): Is the mapping justified with reference to
Beer's definitions, not just surface-level analogy?
- **Strength Calibration** (1-5): Is the declared strength (Strong/Moderate/
Weak) consistent with the rationale given?
These rubrics become the prompt instructions for task 9.
**Fix applied:** `## Quality Metrics` section added to
`schemas/economic-entity-schema-v1.0.md` and `schemas/vsm-mapping-schema-v1.0.md`.
## 9. Create evaluate-entity prompt template — RESOLVED
**Depends on:** Task 8 (quality metrics in schema).
**Issue:** There is no mechanism to evaluate an existing entity after
extraction. Quality is only judged implicitly during the global metrics
assessment, which is too coarse to identify individual weak entities.
**Suggested fix:** Create `templates/evaluate-entity.md` — a prompt
template that:
1. Takes `@{entity_content}`, `@{source_chapter}`, `@{vsm_framework}`,
and `@{quality_rubric}` (from the schema's quality metrics section).
2. Asks the LLM to score each dimension (1-5) with a one-sentence
justification per score.
3. Outputs structured YAML front-matter (scores) followed by markdown
(justifications), e.g.:
```yaml
---
entity: division-of-labour
scores:
  definition_precision: 5
  source_grounding: 5
  domain_placement: 4
  vsm_relevance: 5
  explanatory_value: 5
overall: 4.8
flags: []
---
```
Add a pipeline stage: `--evaluate` runs this template against every
canonical entity and writes results to `output/evaluations/<slug>-eval.md`.
A `--evaluate --chapter <id>` variant evaluates only entities introduced
by that chapter.
**Fix applied:** `templates/evaluate-entity.md` created. `--evaluate`
flag added to `process_chapters.py`. Reads `@{quality_rubric}` from the
entity schema's Quality Metrics section.
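**Illustrative sketch:** In the example front-matter above, `overall` equals the mean of the five dimension scores. Assuming that is the intended rule, and assuming a hypothetical flagging threshold (any dimension below 3), the summary fields could be derived like this:

```python
from statistics import mean


def summarise_scores(scores: dict[str, int], flag_below: int = 3) -> dict:
    """Derive `overall` and `flags` from per-dimension scores.

    Assumptions: overall is the arithmetic mean rounded to one decimal,
    and a dimension is flagged when it scores below `flag_below`.
    """
    overall = round(mean(scores.values()), 1)
    flags = sorted(name for name, value in scores.items() if value < flag_below)
    return {"overall": overall, "flags": flags}


summary = summarise_scores({
    "definition_precision": 5,
    "source_grounding": 5,
    "domain_placement": 4,
    "vsm_relevance": 5,
    "explanatory_value": 5,
})
```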
## 10. Add deterministic schema compliance checker — RESOLVED
**Issue:** Schema compliance is currently LLM-evaluated ("100%" in the
metrics report) but the validation rules in the schemas are mechanical:
section presence, word count ranges, heading format. These should be
checked programmatically, not by an LLM.
**Suggested fix:** Add a `validate_entity(path) -> ValidationResult`
function to `process_chapters.py` (or a new `validate.py` module) that:
- Parses the markdown to extract H2 section headings
- Checks required sections are present (Definition, Source Chapter,
Context, Economic Domain)
- Counts words in the Definition section (must be 20-150)
- Checks H1 heading exists and is not a slug (e.g. `effectual-demand`
in chapter 7 has `# effectual-demand` instead of `# Effectual Demand`)
- Validates Source Chapter cites a specific book/chapter
- For mapping files: checks Mapping Strength is one of the enum values
Expose as `--validate` CLI flag. Output a structured report:
```
Validation: 85 entities, 3 warnings
  effectual-demand.md: H1 is slug format, not title case
  porter.md: Definition is 18 words (minimum 20)
  ...
```
This is fully deterministic — no LLM calls needed.
**Fix applied:** `markitect/infospace/validator.py` provides `validate_entity()`
and `validate_entities()`. Exposed via `--infospace-check`.
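**Illustrative sketch:** A minimal version of three of the checks listed above (required sections, Definition word count, slug-style H1). It works on a markdown string rather than a path and returns plain warning strings. Both simplifications are assumptions and do not reflect the actual `validator.py` API.

```python
import re

REQUIRED_SECTIONS = ("Definition", "Source Chapter", "Context", "Economic Domain")
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")  # e.g. "effectual-demand"


def validate_entity_text(markdown: str) -> list[str]:
    """Return a list of warnings for one entity file's content."""
    h1 = None
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("# ") and h1 is None:
            h1 = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    warnings = []
    if h1 is None:
        warnings.append("missing H1 heading")
    elif SLUG_RE.match(h1):
        warnings.append("H1 is slug format, not title case")
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            warnings.append(f"missing section: {name}")
    if "Definition" in sections:
        words = len(" ".join(sections["Definition"]).split())
        if not 20 <= words <= 150:
            warnings.append(f"Definition is {words} words (expected 20-150)")
    return warnings


sample = (
    "# effectual-demand\n\n"
    "## Definition\n" + " ".join(["word"] * 18) + "\n\n"
    "## Source Chapter\nBook I, Chapter 7\n\n"
    "## Context\nplaceholder\n"
)
result = validate_entity_text(sample)
```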
## 11. Structured metrics output format — RESOLVED
**Depends on:** Tasks 9 and 10.
**Issue:** The metrics report is a markdown narrative. Values cannot be
parsed programmatically, diffed meaningfully, or plotted over time.
**Suggested fix:** Alongside the human-readable `metrics-report.md`,
emit a machine-readable `metrics.yaml` (or `.json`) containing:
```yaml
timestamp: "2026-02-18T12:00:00Z"
chapters_processed: 7
chapters_total: 35
entities_total: 85
entities_archived: 0
vsm_coverage:
  S1: 28
  S2: 12
  S3: 8
  S3_star: 0
  S4: 5
  S5: 0
  recursion: 1
  variety: 0
mapping_strength:
  strong: 64
  moderate: 18
  weak: 3
validation:
  schema_compliant: 82
  warnings: 3
evaluation:               # from LLM-eval (task 9)
  mean_overall: 4.2
  min_overall: 2.8
  flagged_entities: ["porter", "country-workman"]
```
The `--metrics` command writes both files. The YAML file is committed
to git so `git diff` shows exactly how metrics changed between runs.
**Fix applied:** `output/metrics/metrics.yaml` produced by `--infospace-check`.
## 12. Metrics-over-time tracking — RESOLVED
**Depends on:** Task 11 (structured output).
**Issue:** There is one metrics snapshot that gets overwritten. No history
of how metrics evolved as chapters were added.
**Suggested fix:** Append each metrics snapshot to a cumulative log file
`output/metrics/metrics-history.yaml` (list of timestamped entries). This
is committed to git alongside the current snapshot. The pipeline can
optionally render a simple text-based progress summary:
```
Metrics history (5 snapshots):
  2026-02-10  ch 1/35   13 entities  41.7% VSM coverage
  2026-02-11  ch 4/35   38 entities  50.0% VSM coverage
  2026-02-11  ch 7/35   85 entities  58.3% VSM coverage
  ...
```
This provides the "metrics that improve over time" feedback loop the
README envisions: process chapters → evaluate → see coverage grow (or
flag regressions when a re-extraction reduces quality scores).
**Fix applied:** `output/metrics/history.yaml` maintained by
`markitect/infospace/history.py`.
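**Illustrative sketch:** A snapshot can mix scalar metrics with structured ones (for example a per-type count dict), so any rounding step has to leave non-float values alone. The helpers `round_metrics` and `append_snapshot` below are hypothetical and only show the guard. They are not the code in `history.py`.

```python
def round_metrics(metrics: dict, ndigits: int = 3) -> dict:
    """Round float values; pass every other value through unchanged."""
    rounded = {}
    for key, value in metrics.items():
        if isinstance(value, float):
            rounded[key] = round(value, ndigits)
        else:
            # dicts, lists, ints and strings would fail or lose meaning under round()
            rounded[key] = value
    return rounded


def append_snapshot(history: list, timestamp: str, metrics: dict) -> list:
    """Return a new history list with one more timestamped entry."""
    return history + [{"timestamp": timestamp, **round_metrics(metrics)}]


history = append_snapshot(
    [],
    "2026-02-18T12:00:00Z",
    {"coverage": 0.58333, "entities_total": 85, "type_distribution": {"Relation": 58}},
)
```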
---
## Collection-Level Metrics (tasks 13-19)
These tasks implement the five collection-level concerns described in
`METRICS-METHODOLOGY.md`. They share underlying infrastructure (entity
metadata index, definition embeddings, relationship graph) that should
be built once per evaluation run.
See the methodology document for theoretical grounding, framework
references, and the full metric definitions per concern.
## 13. Entity metadata index — deterministic parsing layer — RESOLVED
**Depends on:** Task 10 (schema compliance checker shares parsing logic).
**Issue:** Several collection-level metrics (coverage matrix, FCA context,
granularity distribution) require structured metadata extracted from entity
files: H1 title, economic domain, VSM system(s), source chapter, section
presence, word counts. Currently this information exists only as prose
inside markdown files.
**Suggested fix:** Add a `parse_entity_metadata(path) -> EntityMeta`
function that extracts from each entity file:
```python
from dataclasses import dataclass


@dataclass
class EntityMeta:
    slug: str
    title: str                       # from H1
    domain: str                      # from Economic Domain section
    source_chapter: str              # from Source Chapter section
    definition_words: int            # word count of Definition section
    has_original_wording: bool       # optional section present?
    has_modern_interpretation: bool  # optional section present?
    vsm_systems: list[str]           # from mapping file if exists
    mapping_strengths: list[str]     # from mapping file if exists
```
Build an index of all entities at the start of each evaluation run.
This index is the input for tasks 14, 16, and 18. Expose as
`--index` CLI flag for inspection.
**Fix applied:** `markitect/infospace/entity_parser.py` provides `parse_entity_file()`
and `parse_entity_directory()`. Used automatically by `--infospace-check`.
## 14. Redundancy detection (Concern C1) — RESOLVED
**Depends on:** Task 13 (metadata index).
**Methodology:** OOPS! P2 (synonymous classes) + embedding similarity +
LLM pairwise judgment. See METRICS-METHODOLOGY.md §4 C1.
**Issue:** Entities with different slugs but overlapping meanings (e.g.
`natural-rate` / `ordinary-or-average-rate`) survive extraction because
dedup only checks slug collisions. There is no semantic overlap detection.
**Suggested fix:** Implement in three stages:
1. **Embed** — Compute vector embeddings of all entity definitions using
an embedding API (OpenRouter, OpenAI, or a local sentence-transformer).
Cache embeddings in `output/metrics/embeddings.json` keyed by
`{slug: content_digest}` so unchanged entities skip re-embedding.
2. **Similarity matrix** — Compute NxN cosine similarity. Write the full
matrix to `output/metrics/similarity-matrix.json`. Flag all pairs with
cosine > 0.80 as candidates.
3. **LLM pairwise judgment** — For each candidate pair, run a prompt:
"Given these two entity definitions, are they (a) the same concept and
should be merged, (b) genuinely distinct, or (c) partially overlapping
and should be clarified?" Write results to
`output/metrics/redundancy-report.md` + YAML.
**Metrics produced:**
- `high_similarity_pairs`: count and list
- `confirmed_synonyms`: count (LLM-confirmed same concept)
- `redundancy_ratio`: `confirmed_synonyms / total_entities`
- `intensional_conciseness`: `1 - redundancy_ratio`
**CLI:** `--check-redundancy --provider <provider>`
**Fix applied:** `markitect/infospace/checks/redundancy.py`. Exposed via `--infospace-check`.
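**Illustrative sketch:** Stage 2 and the two ratio metrics, using only the standard library. The two-dimensional vectors are toy values chosen for the example, not real embeddings, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_pairs(embeddings: dict[str, list[float]], threshold: float = 0.80):
    """Return (slug_a, slug_b, similarity) for every pair above the threshold."""
    pairs = []
    for (slug_a, vec_a), (slug_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = cosine(vec_a, vec_b)
        if score > threshold:
            pairs.append((slug_a, slug_b, score))
    return pairs


def conciseness(confirmed_synonyms: int, total_entities: int) -> tuple[float, float]:
    """Return (redundancy_ratio, intensional_conciseness)."""
    ratio = confirmed_synonyms / total_entities
    return ratio, 1 - ratio


toy = {
    "natural-rate": [1.0, 0.0],
    "ordinary-or-average-rate": [0.9, 0.1],
    "porter": [0.0, 1.0],
}
flagged = [(a, b) for a, b, _ in candidate_pairs(toy)]
```

Only flagged pairs go on to the LLM pairwise judgment, which keeps the number of LLM calls well below the full N×N comparison.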
## 15. Coverage completeness (Concern C2) — RESOLVED
**Depends on:** Task 13 (metadata index).
**Methodology:** SEQUAL completeness + FCA gap analysis + DSL competency
questions. See METRICS-METHODOLOGY.md §4 C2.
**Issue:** Coverage is currently assessed by the LLM in a single narrative
pass. There is no structured view of which domain × VSM cells are
populated, and no way to test whether the entity set can answer specific
questions about the economic system.
**Suggested fix:** Implement in three stages:
1. **Domain × VSM matrix** — From the metadata index, count entities per
{economic_domain, vsm_system} cell. Render as a table. Identify empty
cells as specific, actionable gaps. Compute:
- `coverage_ratio = populated_cells / total_cells`
- `vsm_balance_entropy = -Σ(pᵢ log pᵢ)` across VSM systems
2. **FCA lattice** — Construct a formal context with objects = entities,
attributes = {domain, vsm_system, source_book, abstraction_level}.
Compute the concept lattice (Python `concepts` library). Extract
attribute combinations with no corresponding entity — these are
**structural coverage gaps** not visible in the simple matrix.
3. **Competency questions** — Define a set of 15-20 canonical questions
the infospace should answer (stored in
`schemas/competency-questions.md`). Example questions:
- "How does the division of labour relate to market extent?"
- "What mechanisms regulate wages toward their natural rate?"
- "How do monopolies distort the viable system?"
LLM-Eval tests whether current entities suffice to answer each.
Unanswerable questions identify specific completeness gaps.
**Metrics produced:**
- `domain_vsm_matrix`: cell counts
- `coverage_ratio`: scalar
- `vsm_balance_entropy`: scalar
- `empty_cells`: list of {domain, vsm_system} gaps
- `fca_gap_concepts`: attribute combos with no entity
- `competency_coverage`: fraction of questions answerable
**CLI:** `--check-coverage --provider <provider>`
**Fix applied:** `markitect/infospace/checks/coverage.py`. Exposed via `--infospace-check`.
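**Illustrative sketch:** The two scalar metrics from stage 1. The natural logarithm is an assumption (the formula above does not fix a base), and the domain names and counts are toy data.

```python
import math
from collections import Counter


def coverage_ratio(cells: list[tuple[str, str]], domains: list[str], systems: list[str]) -> float:
    """populated_cells / total_cells for the domain x VSM matrix."""
    populated = {c for c in cells if c[0] in domains and c[1] in systems}
    return len(populated) / (len(domains) * len(systems))


def balance_entropy(counts: dict[str, int]) -> float:
    """Shannon entropy -sum(p_i * ln p_i) over non-empty VSM systems."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


# One (domain, vsm_system) tuple per entity, taken from the metadata index.
entity_cells = [("Labour", "S1"), ("Labour", "S1"), ("Prices", "S2")]
ratio = coverage_ratio(entity_cells, ["Labour", "Prices"], ["S1", "S2"])
entropy = balance_entropy(Counter(system for _, system in entity_cells))
```

Entropy is highest when entities are spread evenly across systems, so a low value signals concentration in a few systems even when the coverage ratio looks acceptable.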
## 16. Structural coherence (Concern C3) — RESOLVED
**Depends on:** Task 13 (metadata index).
**Methodology:** OntoQA relationship richness + graph connectivity +
community detection. See METRICS-METHODOLOGY.md §4 C3.
**Issue:** It is unknown whether the 85 entities form a connected
explanatory web or a fragmented collection. No relationship graph exists
between entities.
**Suggested fix:** Implement in three stages:
1. **Explicit cross-references** — Scan each entity's definition for
mentions of other entity slugs or titles (normalised string matching).
This is deterministic and catches direct references.
2. **LLM-inferred edges** — For entity pairs not caught by string
matching but in the same domain or VSM system, LLM-Eval: "Does A's
definition conceptually depend on or explain B, or vice versa?" Run
in batches. Write the combined graph to
`output/metrics/relationship-graph.json` (adjacency list).
3. **Graph analysis** — Using networkx or equivalent:
- Connected components (target: 1)
- Graph density, average degree
- Betweenness centrality → identify bridge concepts
- Louvain community detection → compare to declared domains
- OntoQA Relationship Richness
- Cohesion per domain, coupling across domains
- Orphan entities (degree 0 or 1)
**Metrics produced:**
- `connected_components`: count (target: 1)
- `graph_density`: scalar
- `avg_degree`: scalar
- `relationship_richness`: OntoQA RR
- `modularity`: Louvain score
- `bridge_concepts`: list (high betweenness centrality)
- `orphan_entities`: list (degree ≤ 1)
- `cohesion_by_domain` / `coupling_across_domains`: scalars
**CLI:** `--check-coherence --provider <provider>`
**Fix applied:** `markitect/infospace/checks/coherence.py`. Exposed via `--infospace-check`.
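**Illustrative sketch:** Three of the stage 3 metrics (connected components, density, orphans) computed with the standard library as a stand-in for networkx. The graph is treated as undirected and the edges are toy data. `analyse_graph` is a hypothetical name.

```python
from collections import deque


def analyse_graph(nodes: list[str], edges: list[tuple[str, str]]) -> dict:
    """Compute basic connectivity metrics for an undirected graph."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Count connected components with breadth-first search.
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)

    n = len(nodes)
    edge_count = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return {
        "connected_components": components,
        "graph_density": 2 * edge_count / (n * (n - 1)) if n > 1 else 0.0,
        "avg_degree": 2 * edge_count / n if n else 0.0,
        "orphan_entities": sorted(s for s, nbrs in adjacency.items() if len(nbrs) <= 1),
    }


report = analyse_graph(
    ["division-of-labour", "market-extent", "wages", "porter"],
    [("division-of-labour", "market-extent"),
     ("division-of-labour", "wages"),
     ("market-extent", "wages")],
)
```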
## 17. Definitional consistency (Concern C4) — RESOLVED
**Depends on:** Task 16 (relationship graph — the definitional dependency
graph is a directed variant of the same structure).
**Methodology:** OntoClean metaproperties + OOPS! P24 (circular
definitions) + SEQUAL validity. See METRICS-METHODOLOGY.md §4 C4.
**Issue:** No mechanism to detect circular definitions, contradictions
between related entities, or terms used in definitions that should be
entities but aren't.
**Suggested fix:** Implement in four stages:
1. **Definitional dependency graph** — Directed version of the
relationship graph: edge A→B means A's definition uses B's concept.
Reuse cross-reference extraction from task 16.
2. **Cycle detection** — Find all cycles of length ≤ 3 in the directed
graph. Short cycles are problematic (A defines B, B defines A).
Compute `grounding_ratio`: fraction of entities traceable to terms
outside the entity set without encountering a cycle.
3. **Undefined dependencies** — Extract terms from definitions that match
entity-name patterns (capitalised noun phrases, kebab-case slugs) but
have no corresponding entity file. These are concepts the infospace
implicitly relies on but hasn't defined.
4. **LLM consistency checks** — For directly-connected entity pairs,
LLM-Eval: "Do these definitions contradict each other?" For entities
with Smith's Original Wording, LLM-Eval: "Does the definition
accurately represent the cited passage?"
**Metrics produced:**
- `circular_definitions`: count and list of cycles (length ≤ 3)
- `grounding_ratio`: fraction of entities reaching primitives
- `undefined_dependencies`: list of missing terms
- `contradiction_candidates`: LLM-flagged pairs
- `source_fidelity_score`: fraction passing source check
**CLI:** `--check-consistency --provider <provider>`
**Fix applied:** `markitect/infospace/checks/consistency.py`. Exposed via `--infospace-check`.
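**Illustrative sketch:** Stage 2 restricted to cycles of length 3 or less, found by direct enumeration rather than a general cycle-finding algorithm. The edges are toy data and the function names are hypothetical.

```python
def canonical(cycle: tuple[str, ...]) -> tuple[str, ...]:
    """Rotate a cycle so its smallest node comes first (direction preserved)."""
    i = cycle.index(min(cycle))
    return cycle[i:] + cycle[:i]


def short_cycles(edges: list[tuple[str, str]]) -> list[tuple[str, ...]]:
    """Return all directed cycles of length 1, 2 or 3, each reported once."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())

    cycles = set()
    for a in graph:
        for b in graph[a]:
            if b == a:                      # self-reference
                cycles.add((a,))
                continue
            if a in graph[b]:               # A defines B, B defines A
                cycles.add(canonical((a, b)))
            for c in graph[b]:              # A -> B -> C -> A
                if c not in (a, b) and a in graph[c]:
                    cycles.add(canonical((a, b, c)))
    return sorted(cycles)


found = short_cycles([
    ("natural-price", "market-price"),
    ("market-price", "natural-price"),
    ("wages", "profit"),
    ("profit", "rent"),
    ("rent", "wages"),
    ("rent", "labour"),
])
```

Canonical rotation ensures a cycle discovered from each of its nodes is counted only once.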
## 18. Granularity balance (Concern C5) — RESOLVED
**Depends on:** Task 13 (metadata index).
**Methodology:** Keet granularity theory + OntoClean rigidity +
DSL laconicity. See METRICS-METHODOLOGY.md §4 C5.
**Issue:** Entities range from broad sectors (`agriculture`) to specific
market roles (`effectual-demanders`) to abstract principles
(`division-of-labour`). It is unclear whether this range is appropriate
or whether some entities are too specific/general relative to their peers.
**Suggested fix:** Implement in three stages:
1. **LLM classification** — For each entity, LLM-Eval assigns:
- Abstraction level: `theory` / `mechanism` / `observation`
- Scope score: 1-5 (very specific → very general)
- Indispensability: 1-5 ("if removed, how much explanatory power lost?")
Write to `output/evaluations/<slug>-classification.yaml`.
2. **Distribution analysis** — Deterministic:
- Count per abstraction level; compute entropy
- Per-domain scope variance (flag domains with high variance)
- Level × domain matrix (from FCA context in task 15)
- Outlier detection: entities > 1.5σ from their domain's mean scope
3. **Merge/split recommendations** — For outlier entities, LLM-Eval:
"Should this entity be merged into a broader concept, split into
sub-concepts, or is its current granularity justified?" For entities
with indispensability ≤ 2: "Could another entity serve this purpose?"
**Metrics produced:**
- `abstraction_distribution`: {theory: n, mechanism: n, observation: n}
- `abstraction_entropy`: scalar (higher = more balanced)
- `scope_variance_by_domain`: per-domain scalar
- `dispensable_entities`: list (indispensability ≤ 2)
- `merge_candidates`: list of pairs
- `split_candidates`: list of entities
**CLI:** `--check-granularity --provider <provider>`
**Fix applied:** `markitect/infospace/checks/granularity.py`. Exposed via `--infospace-check`.
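**Illustrative sketch:** The outlier rule from stage 2, flagging entities whose scope score lies more than 1.5σ from their domain's mean. Using the population standard deviation is an assumption, and the slugs and scores are toy data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def scope_outliers(scopes: dict[str, int], domains: dict[str, str], k: float = 1.5) -> list[str]:
    """Return slugs whose scope deviates more than k sigma from the domain mean."""
    by_domain = defaultdict(list)
    for slug, score in scopes.items():
        by_domain[domains[slug]].append(score)

    outliers = []
    for slug, score in scopes.items():
        values = by_domain[domains[slug]]
        sigma = pstdev(values)
        # sigma == 0 means every entity in the domain has the same scope.
        if sigma > 0 and abs(score - mean(values)) > k * sigma:
            outliers.append(slug)
    return sorted(outliers)


scopes = {"entity-a": 3, "entity-b": 3, "entity-c": 3, "entity-d": 3, "entity-e": 5}
domains = {slug: "Production" for slug in scopes}
flagged = scope_outliers(scopes, domains)
```

With very small domains the population standard deviation rarely lets any point exceed 1.5σ, so domains with only a handful of entities may need a different rule.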
## 19. Unified collection evaluation command — RESOLVED
**Depends on:** Tasks 13-18.
**Issue:** Running five separate `--check-*` commands is cumbersome and
repeats shared computation (metadata parsing, embedding, graph building).
**Suggested fix:** Add `--evaluate-collection --provider <provider>` that
runs all five checks in sequence, sharing infrastructure:
1. Parse entity metadata index (task 13) — used by all
2. Compute embeddings (task 14) — used by C1, C3
3. Build relationship graph (task 16) — used by C3, C4
4. Run all five concern checks
5. Write per-concern reports to `output/metrics/`
6. Write unified `metrics.yaml` with all collection metrics
7. Append to `metrics-history.yaml` (task 12)
Incremental mode: `--evaluate-collection --chapter <id>` re-evaluates
only entities from that chapter plus pairwise checks involving them.
**Fix applied:** `markitect/infospace/checks/orchestrator.py` + `--infospace-check`
CLI flag. All five checks share the metadata index. Results recorded in
`output/metrics/metrics.yaml` and `output/metrics/history.yaml`.
Report a summary to stdout:
```
Collection evaluation (85 entities, 7 chapters):
  Redundancy:   3 synonym candidates, conciseness 0.96
  Coverage:     58% VSM, 20% chapters, 4 domain gaps
  Coherence:    1 component, density 0.12, 2 orphans
  Consistency:  0 cycles, 5 undefined deps, 0 contradictions
  Granularity:  entropy 1.42, 1 dispensable, 2 merge candidates
```
---
## Closing Remark (23 Feb 2026)
All 19 original infrastructure tasks have been resolved (or intentionally
deferred for scope reasons — tasks 4 and 7). The infospace now has:
- **988 entities** extracted from 7 of 35 WoN chapters
- **985 per-entity LLM evaluations** (5 dimensions, mean 3.96/5.0)
- **823 L2 type × VSM classifications** (29/30 matrix cells occupied)
- **15 L3 relation triplets** with 6 feedback loops detected
- **Viability dashboard: 6/6 PASS**
The tooling built here generalises beyond this example. The `markitect
infospace` CLI family — `classify`, `evaluate`, `check`, `relations`,
`graph`, `viability` — is ready to use on any new infospace.
The following tasks remain open as of this writing:
---
## 20. Complete L2 classification batch — OPEN
**Issue:** 165 of 988 entities remain unclassified. The batch was
interrupted by OpenRouter credit exhaustion (HTTP 402) near the end of
the alphabet (roughly the `w` to `z` entities).
**Impact:** `classify-summary` reports 823/988 entities; the type × VSM
matrix has one empty cell (Institution/S4) that may fill once the
remaining entities are classified.
**Fix:** Run `markitect infospace classify --provider openrouter` (or
`--provider gemini --rpm 10` after daily quota reset). Incremental skip
handles the 823 already classified automatically.
---
## 21. Run classify-links for Relation-type entities — OPEN
**Depends on:** Task 20 (full classification batch).
**Issue:** 58 entities were classified as type `Relation`. These are
structural connectors (e.g. "Rent determined by Price") that name two
endpoints and a mechanism. The `classify-links` command captures this
endpoint data (SUBJECT, OBJECT, MECHANISM fields), enriching the
classification files and enabling the `entities --by-type` display
to show the relation graph inline.
**Impact:** Without `classify-links`, Relation-type entities are displayed
without their linking data. The `graph` command cannot draw edges through
Relation-typed entities as intermediaries.
**Fix:** `markitect infospace classify-links --provider openrouter`
(or gemini). Enriches existing classification files in-place; safe to
re-run.
---
## 22. Refresh stale metrics report — OPEN
**Issue (original task 6, still open):** `output/metrics/metrics-report.md`
is a narrative report generated by the old pipeline. It predates the
structured `metrics.yaml` / `history.yaml` system and reflects only
the 7-chapter partial corpus without LLM evaluation scores.
**Impact:** The narrative report is misleading — it shows outdated numbers.
Readers may trust it over the authoritative `metrics.yaml`.
**Fix:** Either regenerate via `markitect infospace process --post-batch`
(if the `assess-metrics` template is updated to reference `metrics.yaml`),
or archive the old report with a deprecation notice pointing to
`output/metrics/metrics.yaml`.
---
## 23. Smoke-test the graph command end-to-end — OPEN
**Issue:** `markitect infospace graph` was implemented and committed
(S2 graph export module) but has not been exercised against real
classified data from this example.
**Impact:** The command may have edge cases when combining sparse
classifications (823/988 entities) with the 15 seed relation files.
Filters (`--type`, `--vsm`, `--loops`, `--entity`) and both output
formats (Mermaid, DOT) should be verified.
**Fix:** Run a few representative invocations:
```
markitect infospace graph --format mermaid --color-by vsm --loops
markitect infospace graph --type Relation --format dot
markitect infospace graph --entity division_of_labour
```
Fix any rendering issues found.