Commit Graph

602 Commits

Author SHA1 Message Date
dfe56a4f9b docs(metrics): clarify C2 coverage — domain×chapter matrix, not domain×VSM
- coverage.py: rewrite module docstring to explain what the metric actually
  computes (domain × chapter cross-tabulation, not VSM system coverage),
  what it does not capture (entity connectivity → C3), and when the
  threshold is appropriate
- CoverageReport: add domain_densities, density_std, cross_cutting_ratio
  for distribution-level insight beyond the aggregate ratio
- check_coverage: compute per-domain density and cross-cutting ratio
- METRICS-METHODOLOGY.md: correct C2 section to match implementation,
  document the distribution-based interpretation, add implementation status
  table distinguishing what is wired vs planned

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 00:08:46 +01:00
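
The C2 description above (a domain × chapter cross-tabulation with per-domain density, a density spread, and a cross-cutting ratio) can be illustrated with a minimal sketch. The function name, the input shape, and in particular the definition of "cross-cutting" are assumptions for illustration; the actual definitions in coverage.py may differ.

```python
from collections import defaultdict
from statistics import pstdev


def coverage_sketch(entities, domains, chapters):
    """Hypothetical helper. entities is an iterable of (domain, chapter) pairs."""
    filled = defaultdict(set)  # domain -> chapters holding at least one entity
    for domain, chapter in entities:
        if domain in domains and chapter in chapters:
            filled[domain].add(chapter)

    # Per-domain density: share of chapters in which the domain appears.
    densities = {d: len(filled[d]) / len(chapters) for d in domains}

    # Aggregate ratio: non-empty cells over all cells of the matrix.
    total_cells = len(domains) * len(chapters)
    ratio = sum(len(filled[d]) for d in domains) / total_cells

    # ASSUMPTION: a domain counts as cross-cutting if it spans more than one chapter.
    cross = sum(1 for d in domains if len(filled[d]) > 1) / len(domains)

    return {
        "coverage_ratio": ratio,
        "domain_densities": densities,
        "density_std": pstdev(list(densities.values())),
        "cross_cutting_ratio": cross,
    }
```

Reporting the per-domain densities next to the aggregate ratio makes it visible when a moderate overall ratio hides a few dense domains and many sparse ones.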
0f54f094e4 chore(example): final metrics snapshot — all 35 chapters processed
1021 entities extracted across all Books 1-5 of The Wealth of Nations.
Final metrics: coverage=0.4424, granularity=2.9533, redundancy=0.0059.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 22:54:54 +01:00
4a15a50337 infospace: process book-5-chapter-03
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 22:54:40 +01:00
92dfe367c7 infospace: process book-5-chapter-02
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 22:46:32 +01:00
23c397e46a infospace: process book-5-chapter-01
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 22:36:06 +01:00
e695ddfbbd infospace: process book-4-chapter-09
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 22:32:07 +01:00
5245dbbfc8 infospace: process book-4-chapter-08
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 22:25:52 +01:00
4319d2a32b infospace: process book-4-chapter-07
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 22:14:18 +01:00
efdaa884c8 infospace: process book-4-chapter-06
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 22:01:44 +01:00
2804de3d24 infospace: process book-4-chapter-05
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 21:47:52 +01:00
3e96ac7b8d infospace: process book-4-chapter-04
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 21:36:17 +01:00
a687e508f3 infospace: process book-4-chapter-03
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 21:31:40 +01:00
da9c5fce80 infospace: process book-4-chapter-02
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 21:19:39 +01:00
cd87ebfdc0 infospace: process book-4-chapter-01
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 21:13:08 +01:00
666f78d1ba infospace: process book-4-introduction
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 21:02:00 +01:00
579e02989b infospace: process book-3-chapter-04
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 20:46:20 +01:00
8401c69ff2 infospace: process book-3-chapter-03
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 20:40:35 +01:00
1b9a31665c fix(pipeline): retry on all LLM errors (not just rate limits)
Free-tier APIs intermittently return invalid JSON or empty responses.
Now any exception in _call_llm retries up to 3 times with a 5s back-off,
rather than failing immediately on non-rate-limit errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 20:32:23 +01:00
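
The retry behaviour described in this fix can be sketched as follows. This is not the project's _call_llm; the helper name and parameters are hypothetical, and it assumes "up to 3 times" means three retries after the initial attempt, with a fixed 5-second wait between attempts.

```python
import time


def call_with_retry(fn, max_retries=3, backoff_seconds=5.0, sleep=time.sleep):
    """Call fn(); on any exception, wait and retry up to max_retries times."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:  # deliberately broad: not only rate limits
            last_error = exc
            if attempt < max_retries:
                sleep(backoff_seconds)
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

Passing sleep as a parameter is a design choice for the sketch only: it lets a test replace the real wait with a recorder.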
06e904ccf5 infospace: process book-3-chapter-02
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 20:30:22 +01:00
59d42b1665 infospace: process book-3-chapter-01
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 20:18:15 +01:00
8c11e13fef infospace: process book-2-chapter-05
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 20:03:11 +01:00
ac4e508aff infospace: process book-2-chapter-04
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 19:57:59 +01:00
8e1943afdb infospace: process book-2-chapter-03
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 19:50:53 +01:00
05711e541d infospace: process book-2-chapter-02
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 19:43:19 +01:00
8cb9ee6f6e infospace: process book-2-chapter-01
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 19:26:57 +01:00
db129fde6b infospace: process book-1-chapter-11
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 19:19:20 +01:00
6d9ec4e34b infospace: process book-1-chapter-10
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 18:59:36 +01:00
679f482e49 config(example): increase extract-entities max_tokens to 8000
Chapters with many pre-existing entities were still truncating at 6000 tokens
because the LLM needs space to output the full list of candidates even when
most are skipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 18:48:33 +01:00
368571905a infospace: process book-1-chapter-09
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 15:58:08 +01:00
9c95912d68 infospace: process book-1-chapter-08
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 15:47:12 +01:00
0828581269 infospace: process book-1-chapter-07
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 15:40:24 +01:00
283abac378 infospace: process book-1-chapter-06
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 15:29:59 +01:00
90ca14dd85 config(example): increase max_tokens for map-to-vsm (10k) and synthesize (4k)
map-to-vsm was consistently truncating at 6000 tokens; synthesize-analysis
sometimes truncated at 3000 for chapters with many entities.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 15:21:04 +01:00
098b781f92 infospace: process book-1-chapter-05
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 15:20:35 +01:00
eea397a380 infospace: process book-1-chapter-04
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 15:12:54 +01:00
7615beb139 chore(example): update metrics after chapter-03 collection check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 15:06:03 +01:00
c2e06c15d7 infospace: process book-1-chapter-03
Extract entities, map to VSM, and synthesize analysis.
2026-02-19 15:04:57 +01:00
df1fdf1842 feat(pipeline): per-stage max_tokens, LLM provenance, processing log
- PipelineStage now supports max_tokens to override the 4096 default
- SourcePipeline records provider/model on each entity file as HTML comment
- output/processing-log.yaml tracks tokens, cost, duration, retries, errors
- _call_llm returns (content, metadata) for downstream traceability
- _http.py wraps JSON parse errors with body preview for debugging
- infospace.yaml stages: extract/map=6000 tokens, synthesize=3000 tokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 14:50:49 +01:00
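
A per-stage override of the 4096-token default might look like the sketch below. StageSketch is a hypothetical stand-in, not the project's PipelineStage, and the fallback logic is an assumption about how the override is resolved.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_TOKENS = 4096  # default stated in the commit message


@dataclass
class StageSketch:
    """Hypothetical stand-in for a pipeline stage definition."""
    name: str
    max_tokens: Optional[int] = None  # None means "use the default"

    def effective_max_tokens(self) -> int:
        # The per-stage value wins when it is set.
        if self.max_tokens is None:
            return DEFAULT_MAX_TOKENS
        return self.max_tokens
```

With the values from this commit, a stage built as StageSketch("extract-entities", 6000) would resolve to 6000, while a stage without an override falls back to 4096.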
5ede1de4b8 fix(pipeline): retry on 0-entity response, save raw debug, improve template
- SourcePipeline: retry split_entities stage once when 0 entity delimiters
  are found (free-tier models intermittently return short non-formatted
  responses); save raw LLM response to <stage>-raw.md alongside prompts
- Return None (pause pipeline) rather than writing empty view file when
  no entities found after max retries
- _http.py: wrap json.JSONDecodeError in LLMAPIError with body preview
- extract-entities.md: add explicit H2-heading format example to Output
  Format section to prevent models from using inline "Section:" format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 14:26:28 +01:00
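
The zero-entity case in this fix depends on finding H2 headings as entity delimiters. A minimal sketch of that split, assuming each entity starts at a line of the form "## Title" and that a response without such a line should yield None so the caller can retry or pause (the function name and return shape are hypothetical):

```python
import re

# A line that starts with exactly two '#' characters followed by a space.
H2 = re.compile(r"^## +(.+?)[ \t]*$", re.MULTILINE)


def split_entities(text):
    """Return [(title, body), ...] split on H2 headings, or None if none found."""
    matches = list(H2.finditer(text))
    if not matches:
        return None  # caller decides whether to retry or pause
    entities = []
    for i, match in enumerate(matches):
        # The body runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entities.append((match.group(1), text[match.end():end].strip()))
    return entities
```

An inline "Section: ..." response contains no H2 line, so it returns None rather than an empty list that could be written out as an empty view file.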
72d9904485 feat(infospace): add process command for batch source file processing
- Extend PipelineStage with name, output_dir, output_macro,
  split_entities, and macros fields for declarative pipeline config
- Add SourcePipeline class (pipeline.py) using simple @{macro}
  substitution — no SQLite dependency, skip-if-exists per stage,
  LLM retry on rate limits, git commit per source
- Add `markitect infospace process [GLOB_PATTERN]` CLI command with
  --all, --provider, --model, --check-after-each, --no-commit flags
- Update infospace.yaml with output_dir, output_macro, split_entities,
  and macros for each pipeline stage in the WoN example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 13:29:50 +01:00
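
The "simple @{macro} substitution" mentioned above can be sketched with a single regular expression. The function name, the allowed characters in a macro name, and the choice to leave unknown macros untouched are all assumptions; the SourcePipeline implementation may behave differently.

```python
import re

# ASSUMPTION: macro names consist of letters, digits, underscores, and hyphens.
MACRO = re.compile(r"@\{([A-Za-z0-9_\-]+)\}")


def expand_macros(template, values):
    """Replace each @{name} with values[name]; leave unknown macros as-is."""
    def replace(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return MACRO.sub(replace, template)
```

Leaving unknown macros in place, rather than substituting an empty string, keeps a misconfigured prompt easy to spot in the saved prompt files.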
77dd3fee6d fix(example): standardise domain enum and source chapter format in schema/rules
Two root causes of metric fragmentation observed in collection checks:

1. Schema's Economic Domain used free-form examples ("labour economics,
   trade theory") which overrode the enum in extraction-rules.md, causing
   the LLM to produce multi-domain strings and non-canonical values.
   Fix: schema now specifies the exact 7-value enum with descriptions.

2. Source Chapter had no format constraint, producing 9 different formats
   for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
   Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.

These fixes are prerequisites for clean reprocessing (S3.2 continuation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 13:02:05 +01:00
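
The mandated Source Chapter format can be checked mechanically. The sketch below assumes Roman numerals I to V for the book (the example covers Books 1-5) and an Arabic chapter number; it does not cover introductions, and the function name is hypothetical.

```python
import re

# "Book [Roman], Chapter [n]" with books I-V and a positive Arabic chapter number.
SOURCE_CHAPTER = re.compile(r"Book (I|II|III|IV|V), Chapter ([1-9][0-9]*)")


def is_canonical_source_chapter(value):
    """Return True only if value matches the canonical format exactly."""
    return SOURCE_CHAPTER.fullmatch(value) is not None
```

A check like this would reject the variants named in the commit message, such as mixed Roman/Arabic numerals or values wrapped in asterisks.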
715ef19d1c infospace: remove example output — will replay chapter by chapter
This commit clears the tangled example output so each chapter
can be re-committed cleanly via S3.2.
2026-02-19 09:22:55 +01:00
3ac8447c10 feat(example): add baseline metrics snapshot from collection checks run
Initial metrics from S2.4 checks on 85 entities (7 of 35 chapters):
coverage_ratio=0.361, redundancy=0.0, coherence_components=0.0,
consistency_cycles=0.0, granularity_entropy=2.69

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 07:44:01 +01:00
94cb2063af feat(example): migrate to infospace config with tooling integration (S3.1)
Add infospace.yaml declaring topic, disciplines, schemas, viability
thresholds. Integrate infospace tooling into process_chapters.py with
--infospace-status, --infospace-check, and --infospace-viability flags.

Initial check: 85 entities, 4/5 viable (coverage 0.36 < 0.50 — only
7/35 chapters processed so far).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:29:53 +01:00
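
The viability result above (coverage 0.36 against a 0.50 threshold) amounts to comparing each metric with a configured limit. A minimal sketch, assuming each threshold is either a minimum or a maximum; the function name and the threshold representation are hypothetical, and only the coverage limit of 0.50 comes from the commit message.

```python
def viability(metrics, thresholds):
    """thresholds maps metric name -> (kind, limit), where kind is "min" or "max"."""
    results = {}
    for name, (kind, limit) in thresholds.items():
        value = metrics[name]
        # "min": value must reach the limit; "max": value must not exceed it.
        results[name] = value >= limit if kind == "min" else value <= limit
    return results
```

Called as viability({"coverage": 0.36}, {"coverage": ("min", 0.50)}), the sketch marks coverage as not viable, matching the reported check.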
d1c6e53754 docs: add infospace primitives reference (S2.7)
Reference document covering all infospace tooling primitives: config,
entity metadata, schema validation, per-entity evaluation, collection
checks, metrics history, viability, composition, and CLI commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:05:09 +01:00
b76d6d38c1 feat(infospace): add composition model for discipline binding (S2.6)
Discipline resolution, viability checking, entity access, stale
mapping detection, and binding management. CLI commands: bind-discipline,
disciplines, stale-mappings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:03:54 +01:00
ce7f78d57d feat(infospace): add metrics history and viability tracking (S2.5)
History module with snapshot creation from check results, metrics file
I/O, auto-append to history after checks, date-based snapshot lookup,
and metric trend extraction. CLI commands: history, history-diff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:01:00 +01:00
11585e6968 feat(infospace): add collection-level quality checks C1–C5 (S2.4)
Five concern checks: Redundancy (embedding/word overlap), Coverage
(FCA gap analysis), Coherence (graph connectivity), Consistency
(cycle detection), Granularity (Shannon entropy). Orchestrator runs
all or selected checks, CLI `markitect infospace check` command added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:54:22 +01:00
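
Of the five checks, Granularity is described as Shannon entropy. A minimal sketch of that calculation over a list of labels, assuming base-2 logarithms; which labels the real check uses, and which log base, are not stated in the commit message.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Entropy is zero when every item carries the same label and grows as items spread more evenly over more labels.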
3461d2f354 feat(infospace): add per-entity evaluation pipeline and CLI command (S2.3)
Evaluation pipeline builds prompts from entity metadata, delegates
to BatchEvaluator, parses structured LLM responses into ScoreEntry
objects, and writes evaluation files. CLI: 'markitect infospace evaluate'
with --provider, --entity, --chapter filters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:48:34 +01:00
3726503adb feat(infospace): add lifecycle CLI commands — init, status, entities, viability (S2.2)
Adds 'markitect infospace' command group with init (create config),
status (entity count/domains/disciplines), entities (list with sort),
and viability (threshold dashboard with pass/fail).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:46:54 +01:00