IB-WP-0016: refresh validation baseline after T01/T02 smoke run

Run a fixture-backed end-to-end smoke against the real Lefevre EPUB
(max-chunks 3) and capture the result in the validation note and the
workplan. The pipeline produces a complete infospace with stable
chapter-01-part-NNN source IDs, full chapter/book/anchor provenance on
every source artifact, viable metrics, and exact-title entity dedupe.

Refresh the workplan validation baseline to reflect the post-T01/T02
state, and add a remaining-gaps section that maps the open issues to the
right follow-on tasks: cost/scope controls and plan preview to T03, the
trading-literature profile to T04, chunk-level resume to T06, and a
richer generation-summary report (entity titles, chapter coverage,
anchor links) to T07.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-17 16:13:39 +02:00
parent 745edc8b81
commit 001b64d67b
2 changed files with 65 additions and 17 deletions

View File

@@ -112,3 +112,26 @@ now produces:
`Page_1..Page_14` distributed across its three parts)
- Optional `overlap_words` parameter supports evidence-window context
between adjacent parts of the same chapter without duplicating headings
## Fixture Smoke Run (2026-05-17)
`generate from-source ... --fixture-responses ... --max-chunks 3 --apply`
against the real EPUB produced a complete infospace:
- 3 source chunks (`chapter-01-part-001..003`) and 3 entities/relations/
evaluations plus the generation-summary report
- `artifacts/index.yaml` carries full T01/T02 provenance on every source
artifact (`chapter_label`, `chapter_number`, `page_anchors`, OPF
`book_metadata`)
- Metrics viable: `coverage=1.0`, `redundancy=0.0`, `granularity_entropy
≈ 1.79`; viability gates pass
- Repeated same-title entities upserted to single artifact files — basic
exact-title dedupe works; near-duplicate dedupe is still open
Gaps the smoke surfaced for follow-on tasks:
- `generation-summary.md` is just counts + metrics; needs entity titles,
chapter coverage, page-anchor links for review (T07)
- No `plan` cost preview, no chapter/cost cap selection — running the full
book at default `max_words` is ~335 provider calls (T03)
- Generic profile shaped the output, not Lefevre's trading vocabulary (T04)