Three coordinated changes that let the pipeline produce a clean
chapter-by-chapter git history on long texts without archaeology after
the fact.

1. Richer commit messages. `SourcePipeline._git_commit` now diffs the
staged changes, buckets added files by output subdirectory (entities,
evaluations, classifications, mappings, analyses, metrics, logs), and
includes counts in the commit body. So `git log` reads "entities:
+23, evaluations: +23" per chapter instead of the same generic blurb
on every commit. Zero behaviour change when no output changed; falls
back to the original message if the diff query fails.
2. `--eval-after-source` / `--classify-after-source` on `infospace process`.
After a source's stages succeed, the pipeline identifies which entity
files are *new* (set diff of entity slugs before vs after), loads
their EntityMeta, and runs per-entity evaluation and/or
classification scoped to just those slugs before the per-source git
commit lands. Result: each chapter's commit is self-contained —
extraction + evaluation + classification in one atomic unit. Gated
behind explicit flags because the cost is real (LLM latency per
chapter rather than amortised across one bulk batch).
3. `markitect infospace chapters` subcommand. Lists source files in
canonical order with entity count, evaluated count, classified
count, and mean per-entity score per source. Text or JSON output.
Natural triage surface for long-text infospaces — spot chapters that
under-extracted or evaluated poorly.
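
A minimal sketch of the bucket-count and new-slug logic described in items 1 and 2, not the code in `SourcePipeline._git_commit`. The function names and example paths are hypothetical, and the sketch assumes each bucket shows up as a directory component of the added path. The list of added paths is assumed to come from something like `git diff --cached --name-only --diff-filter=A`.

```python
from collections import Counter
from pathlib import PurePosixPath

# Bucket names are taken from the commit message; everything else is illustrative.
BUCKETS = ("entities", "evaluations", "classifications", "mappings",
           "analyses", "metrics", "logs")


def bucket_added_files(added_paths):
    """Count added files per output subdirectory, reported in BUCKETS order."""
    counts = Counter()
    for path in added_paths:
        # Assumption: the bucket name is one of the path's directory components.
        for part in PurePosixPath(path).parts[:-1]:
            if part in BUCKETS:
                counts[part] += 1
                break
    return {name: counts[name] for name in BUCKETS if counts[name]}


def format_commit_body(counts):
    """Render counts as 'entities: +2, evaluations: +1'; empty string if none."""
    return ", ".join(f"{name}: +{n}" for name, n in counts.items())


def new_entity_slugs(before, after):
    """Slugs present after processing a source that were absent before it."""
    return sorted(set(after) - set(before))


added = [
    "output/entities/ahab.md",
    "output/entities/ishmael.md",
    "output/evaluations/ahab.json",
    "README.md",  # outside every bucket, so it is not counted
]
print(format_commit_body(bucket_added_files(added)))
print(new_entity_slugs({"ahab"}, {"ahab", "ishmael"}))
```

An empty result from `format_commit_body` corresponds to the "no output changed" case, where the original commit message would be used as-is.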

Also: `docs/advanced-usage.md` gets a new "Systematic processing of
long texts" section with the recommended flag combo and the tradeoff
note on cost.

11 new unit tests cover the chapters command (text/json/no-sources),
the process flag wiring (help + provider requirement), and the
commit-body bucket logic. Full infospace+llm unit suite (315 tests)
green; 3 pre-existing infospace failures unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>