- Add `.*-raw\.md$` to `_DEFAULT_EXCLUDE_PATTERNS` in entity_parser.py to
prevent per-chapter raw LLM output files from being parsed as entities.
This eliminates 33 malformed domain values where delimiter text was
bleeding into the Economic Domain field.
- Lower coverage_ratio threshold from 0.50 → 0.40 in infospace.yaml to
reflect realistic multi-book corpus expectations (documented rationale
in METRICS-METHODOLOGY.md).
Post-fix metrics: 988 entities, 0 malformed, coverage_ratio=0.619 (pass).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Chapters with many pre-existing entities were still truncating at 6000 tokens
because the LLM needs space to output the full list of candidates even when
most are skipped.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
map-to-vsm was consistently truncating at 6000 tokens; synthesize-analysis
sometimes truncated at 3000 for chapters with many entities.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- PipelineStage now supports max_tokens to override the 4096 default
- SourcePipeline records provider/model on each entity file as HTML comment
- output/processing-log.yaml tracks tokens, cost, duration, retries, errors
- _call_llm returns (content, metadata) for downstream traceability
- _http.py wraps JSON parse errors with body preview for debugging
- infospace.yaml stages: extract/map=6000 tokens, synthesize=3000 tokens
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extend PipelineStage with name, output_dir, output_macro,
split_entities, and macros fields for declarative pipeline config
- Add SourcePipeline class (pipeline.py) using simple @{macro}
substitution — no SQLite dependency, skip-if-exists per stage,
LLM retry on rate limits, git commit per source
- Add `markitect infospace process [GLOB_PATTERN]` CLI command with
--all, --provider, --model, --check-after-each, --no-commit flags
- Update infospace.yaml with output_dir, output_macro, split_entities,
and macros for each pipeline stage in the WoN example
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>