generated from coulomb/repo-seed
b9173b6569f32323fbd31b1c8304f8ecf5b629c9
Resolve chapter labels from EPUB nav entries (when present) and from the first in-document h1/h2/h3 heading, parse roman-numeral and "Chapter N" labels into numeric chapter indices, and generate stable IDs of the form chapter-NN with -part-NNN suffix when a chapter exceeds max_words.

The chunker now operates on cleaned body text, distributes id="Page_*" page anchors per part via inline markers extracted before splitting, and supports a configurable overlap_words evidence window between adjacent parts of the same chapter.

Reclassify body sections whose chapter label matches contents/transcriber-notes/license/colophon tokens so they leave the body stream by default.

Strip <head>...</head> from HTML body extraction to stop the <title> tag from duplicating heading text in the chunk markdown.

Real Lefevre EPUB now detects all 24 roman-numeral chapters with stable chapter-NN IDs, distributes Page_N anchors across multi-part chapters, and reclassifies Contents and Transcriber's Notes out of body (role histogram body=67, cover=1, header=1, toc=1, notes=1, footer=2). 82 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
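The label parsing, ID scheme, and overlap window described in the commit message above could look roughly like the sketch below. This is an illustration only, not the repository's code: the function names (roman_to_int, parse_chapter_index, chunk_id, split_with_overlap) are hypothetical, part numbers are assumed to be 1-based, and the overlap is assumed to count toward max_words rather than extend it. The roman-numeral reader is deliberately lenient and does not reject malformed numerals.

```python
import re

_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    """Return the value of a roman numeral, or None if text contains other characters."""
    s = text.strip().upper()
    if not s or any(ch not in _ROMAN for ch in s):
        return None
    total = 0
    # Subtract a symbol when a larger one follows it (IV = 4, IX = 9).
    for ch, nxt in zip(s, s[1:] + " "):
        value = _ROMAN[ch]
        total += -value if _ROMAN.get(nxt, 0) > value else value
    return total


def parse_chapter_index(label):
    """Map labels such as 'Chapter 7', 'CHAPTER XII.' or 'XXIV' to an int, else None."""
    match = re.match(r"\s*chapter\s+(\S+)", label, re.IGNORECASE)
    token = (match.group(1) if match else label).strip().rstrip(".:")
    if re.fullmatch(r"[0-9]+", token):
        return int(token)
    return roman_to_int(token)


def chunk_id(chapter, part=None):
    """Build 'chapter-NN', adding '-part-NNN' only when a part number is given."""
    base = f"chapter-{chapter:02d}"
    return base if part is None else f"{base}-part-{part:03d}"


def split_with_overlap(words, max_words, overlap_words=0):
    """Split a word list into parts of at most max_words, sharing overlap_words between neighbours."""
    if max_words <= 0 or not 0 <= overlap_words < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap_words < max_words")
    if len(words) <= max_words:
        return [words]
    step = max_words - overlap_words
    parts, start = [], 0
    while True:
        parts.append(words[start:start + max_words])
        if start + max_words >= len(words):
            return parts
        start += step


if __name__ == "__main__":
    index = parse_chapter_index("CHAPTER XII.")
    body = "one two three four five six seven eight nine ten".split()
    for number, part in enumerate(split_with_overlap(body, 4, 1), start=1):
        print(chunk_id(index, number), " ".join(part))
```

Under these assumptions a chapter that fits within max_words keeps the bare chapter-NN ID, and only longer chapters get part suffixes, which keeps IDs stable for short chapters when max_words changes.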
infospace-bench
Workspace and service for creating, developing, evaluating, and inspecting structured knowledge spaces.
This repo is the application-layer successor for the infospace work that began
inside markitect-main. It focuses on concrete infospaces and their lifecycle,
while lower-level markdown tooling and runtime orchestration remain in sibling
projects.
Start with:
INTENT.md
wiki/ProductRequirementsDocument.md
wiki/FunctionalRequirementsSpecification.md
SCOPE.md
docs/infospace-layout.md
docs/evaluation-and-inspection.md
docs/reference-pilot-decision.md
docs/markitect-main-scope-assessment.md
docs/markitect-tool-adapter.md
docs/entity-relation-model.md
docs/evaluation-history-and-metrics.md
docs/workflow-generation-pipeline.md
docs/kontextual-engine-boundary.md
docs/orthogonal-successor-roadmap.md
docs/legacy-infospace-feature-inventory.md
docs/successor-boundary-interface-map.md
docs/replacement-acceptance-matrix.md
docs/legacy-command-parity.md
docs/legacy-infospace-migration-guide.md
docs/replacement-readiness-decision.md
docs/wealth-vsm-generation-pipeline.md
docs/generic-source-generator.md
docs/agentic-memory-profile-pilot.md
docs/lefevre-epub3-validation.md
infospaces/bootstrap-pilot/
infospaces/wealth-vsm-legacy-slice/
infospaces/wealth-vsm-generation-pilot/
infospaces/agentic-memory-profile-pilot/
workplans/
Current development command:
python3 -m pytest
Languages: Python 99.9%, Makefile 0.1%