- coverage.py: rewrite module docstring to explain what the metric actually
computes (domain × chapter cross-tabulation, not VSM system coverage),
what it does not capture (entity connectivity → C3), and when the
threshold is appropriate
- CoverageReport: add domain_densities, density_std, cross_cutting_ratio
for distribution-level insight beyond the aggregate ratio
- check_coverage: compute per-domain density and cross-cutting ratio
- METRICS-METHODOLOGY.md: correct C2 section to match implementation,
document the distribution-based interpretation, add implementation status
table distinguishing what is wired vs planned
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1021 entities extracted across all Books 1-5 of The Wealth of Nations.
Final metrics: coverage=0.4424, granularity=2.9533, redundancy=0.0059.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Free-tier APIs intermittently return invalid JSON or empty responses.
Now any exception in _call_llm retries up to 3 times with a 5s back-off,
rather than failing immediately on non-rate-limit errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Chapters with many pre-existing entities were still truncating at 6000 tokens
because the LLM needs space to output the full list of candidates even when
most are skipped.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
map-to-vsm was consistently truncating at 6000 tokens; synthesize-analysis
sometimes truncated at 3000 for chapters with many entities.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- PipelineStage now supports max_tokens to override the 4096 default
- SourcePipeline records provider/model on each entity file as HTML comment
- output/processing-log.yaml tracks tokens, cost, duration, retries, errors
- _call_llm returns (content, metadata) for downstream traceability
- _http.py wraps JSON parse errors with body preview for debugging
- infospace.yaml stages: extract/map=6000 tokens, synthesize=3000 tokens
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- SourcePipeline: retry split_entities stage once when 0 entity delimiters
are found (free-tier models intermittently return short non-formatted
responses); save raw LLM response to <stage>-raw.md alongside prompts
- Return None (pause pipeline) rather than writing empty view file when
no entities found after max retries
- _http.py: wrap json.JSONDecodeError in LLMAPIError with body preview
- extract-entities.md: add explicit H2-heading format example to Output
Format section to prevent models from using inline "Section:" format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extend PipelineStage with name, output_dir, output_macro,
split_entities, and macros fields for declarative pipeline config
- Add SourcePipeline class (pipeline.py) using simple @{macro}
substitution — no SQLite dependency, skip-if-exists per stage,
LLM retry on rate limits, git commit per source
- Add `markitect infospace process [GLOB_PATTERN]` CLI command with
--all, --provider, --model, --check-after-each, --no-commit flags
- Update infospace.yaml with output_dir, output_macro, split_entities,
and macros for each pipeline stage in the WoN example
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two root causes of metric fragmentation observed in collection checks:
1. Schema's Economic Domain used free-form examples ("labour economics,
trade theory") which overrode the enum in extraction-rules.md, causing
the LLM to produce multi-domain strings and non-canonical values.
Fix: schema now specifies the exact 7-value enum with descriptions.
2. Source Chapter had no format constraint, producing 9 different formats
for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.
These fixes are prerequisites for clean reprocessing (S3.2 continuation).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
History module with snapshot creation from check results, metrics file
I/O, auto-append to history after checks, date-based snapshot lookup,
and metric trend extraction. CLI commands: history, history-diff.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Evaluation pipeline builds prompts from entity metadata, delegates
to BatchEvaluator, parses structured LLM responses into ScoreEntry
objects, and writes evaluation files. CLI: 'markitect infospace evaluate'
with --provider, --entity, --chapter filters.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds 'markitect infospace' command group with init (create config),
status (entity count/domains/disciplines), entities (list with sort),
and viability (threshold dashboard with pass/fail).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>