IB-WP-0016: refresh validation baseline after T01/T02 smoke run

Capture a fixture-backed end-to-end smoke run (--max-chunks 3) against
the real Lefevre EPUB in the validation note and the workplan. The run
produced a complete infospace with stable chapter-01-part-NNN source IDs,
full chapter/book/anchor provenance on every source artifact, viable
metrics, and exact-title entity dedupe.

Refresh the workplan validation baseline to reflect the post-T01/T02
state, and add a remaining-gaps section that maps the open issues to the
right follow-on tasks: cost/scope controls and plan preview to T03, the
trading-literature profile to T04, chunk-level resume to T06, and a
richer generation-summary report (entity titles, chapter coverage,
anchor links) to T07.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 16:13:39 +02:00
parent 745edc8b81
commit 001b64d67b
2 changed files with 65 additions and 17 deletions
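The stable `chapter-01-part-NNN` source IDs can be illustrated with a minimal sketch. It assumes, without confirmation from the project, that an ID is built only from the detected chapter number and a 1-based part index, and that parts are cut at `max_words` (default 800 per the workplan). `chunk_id` and `split_chapter` are hypothetical names, and the optional `overlap_words` behavior is deliberately left out.

```python
def chunk_id(chapter_number: int, part: int) -> str:
    """Build a stable source ID such as 'chapter-01-part-001' (hypothetical helper)."""
    if chapter_number < 1 or part < 1:
        raise ValueError("chapter_number and part are 1-based")
    return f"chapter-{chapter_number:02d}-part-{part:03d}"


def split_chapter(chapter_number: int, words: list, max_words: int = 800) -> list:
    """Cut one chapter's words into parts of at most max_words, each with a stable ID.

    The ID depends only on the chapter number and the part's position, never on
    a generated title, so re-running the split on the same text yields the same IDs.
    """
    parts = []
    for part, start in enumerate(range(0, len(words), max_words), start=1):
        text = " ".join(words[start:start + max_words])
        parts.append((chunk_id(chapter_number, part), text))
    return parts


# Illustrative input: a 1,700-word chapter I at the default max_words.
print([pid for pid, _ in split_chapter(1, ["word"] * 1700)])
# ['chapter-01-part-001', 'chapter-01-part-002', 'chapter-01-part-003']
```

In this sketch the ID never depends on a chunk title, so it stays stable even when titles collapse to the same Gutenberg page title.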

@@ -112,3 +112,26 @@ now produces:
 `Page_1..Page_14` distributed across its three parts)
 - Optional `overlap_words` parameter supports evidence-window context
 between adjacent parts of the same chapter without duplicating headings
+## Fixture Smoke Run (2026-05-17)
+`generate from-source ... --fixture-responses ... --max-chunks 3 --apply`
+against the real EPUB produced a complete infospace:
+- 3 source chunks (`chapter-01-part-001..003`) and 3 entities/relations/
+evaluations plus the generation-summary report
+- `artifacts/index.yaml` carries full T01/T02 provenance on every source
+artifact (`chapter_label`, `chapter_number`, `page_anchors`, OPF
+`book_metadata`)
+- Metrics viable: `coverage=1.0`, `redundancy=0.0`, `granularity_entropy
+≈ 1.79`; viability gates pass
+- Repeated same-title entities upserted to single artifact files — basic
+exact-title dedupe works; near-duplicate dedupe is still open
+Gaps the smoke surfaced for follow-on tasks:
+- `generation-summary.md` is just counts + metrics; needs entity titles,
+chapter coverage, page-anchor links for review (T07)
+- No `plan` cost preview, no chapter/cost cap selection — running the full
+book at default `max_words` is ~335 provider calls (T03)
+- Generic profile shaped the output, not Lefevre's trading vocabulary (T04)
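The ~335-call figure is 67 body chunks × ~5 stages per chunk. The following is a minimal sketch of the kind of plan preview T03 calls for, assuming a flat five provider calls per chunk. `Chunk`, `plan_preview`, and its parameters are hypothetical; they only mirror the proposed `--from-chapter`, `--to-chapter`, and `--max-calls` selection, which does not exist yet.

```python
from dataclasses import dataclass

# Assumption: every chunk runs about five provider-backed stages (workplan estimate).
STAGES_PER_CHUNK = 5


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    chapter_number: int


def plan_preview(chunks, from_chapter=1, to_chapter=None, max_calls=None):
    """Return (selected chunk IDs, estimated provider calls) without calling a provider."""
    selected = [
        c for c in chunks
        if c.chapter_number >= from_chapter
        and (to_chapter is None or c.chapter_number <= to_chapter)
    ]
    if max_calls is not None:
        # Keep whole chunks only, so no chunk is left with a partial set of stages.
        selected = selected[: max_calls // STAGES_PER_CHUNK]
    return [c.chunk_id for c in selected], len(selected) * STAGES_PER_CHUNK


# A 67-chunk book; chapters are assigned round-robin purely for illustration.
book = [Chunk(f"chunk-{i:03d}", chapter_number=i % 24 + 1) for i in range(67)]
ids, calls = plan_preview(book)
print(calls)  # 335
```

Capping by whole chunks rather than individual calls is a design choice of this sketch: until chunk-level resume (T06) exists, a chunk with only some of its stages generated would be awkward to pick up again.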

@@ -45,27 +45,49 @@ provenance, reviewability, or cost control.
 ## Validation Baseline
-Validation note: `docs/lefevre-epub3-validation.md`.
+Validation note: `docs/lefevre-epub3-validation.md` (includes T01 and T02
+result sections).
-Current WP-0015 infrastructure can initialize the local EPUB and run
-source-only metrics in a disposable workspace:
+After T01 and T02, the local Lefevre EPUB is intake-ready:
-- source chunks: 155
-- entity count: 0
-- relation count: 0
-- evaluation count: 0
-- source-only metrics history can be written without provider calls
+- 67 body chunks at default `max_words=800`, all 24 roman-numeral chapters
+detected, stable IDs `chapter-01..chapter-24` with `-part-NNN` suffix
+- Cover, PG header/footer, Contents, Transcriber's Notes, and license
+sections classified out of the body stream by default
+- Per-chunk provenance carries full OPF book metadata, chapter label and
+number, page anchors, and spine index
-The run proves the basic intake path works, but also shows why a live all-book
-run should wait:
+### Smoke Run (2026-05-17)
-- most generated chunk titles collapse to the same Gutenberg page title
-- EPUB spine/chapter metadata is not yet honored deeply enough
-- archive-order sorting risks confusing reading order
-- non-body sections such as cover/header/footer/license need explicit policy
-- plan output is too prompt-heavy for cost review on a 155-chunk book
-- long-book resume needs chunk-level state, not only whole-run skip
-- generated entities need cross-chunk dedupe/merge policy
+A fixture-backed end-to-end smoke run with `--max-chunks 3` against the
+real EPUB produced a complete infospace:
+- 3 source chunks (`chapter-01-part-001..003`), 3 entities, 3 relations,
+3 evaluations, 1 generation-summary report
+- All chapter/book/anchor provenance fields land in `artifacts/index.yaml`
+(verified: `chapter_label=I`, `chapter_number=1`,
+`page_anchors=[Page_1, Page_2, Page_3]` on the first chunk)
+- Metrics viable: `coverage=1.0`, `redundancy=0.0`,
+`granularity_entropy=1.79`, viability gates pass
+- Same-title entities returned by repeated stages were upserted to single
+artifact files — basic dedupe works for exact-title matches
+### Remaining Gaps
+These are the gaps a serious full-book run still hits:
+- No compact `plan` output for cost/call preview on a 67-chunk run
+(~5 stages per chunk = ~335 provider calls at default `max_words`) — T03
+- No `--chapter`, `--from-chapter`, `--to-chapter`, `--cost-cap`, or
+`--max-calls` selection — T03
+- Generic profile produces sensible structure but does not push concepts
+toward traders, markets, lessons, or strategies — T04
+- The generation-summary report only shows counts and metrics; it should
+surface entity titles, chapter coverage, page-anchor links, and unmapped
+chunks for human review — T07
+- Long-book resume is still whole-run-skip, not chunk-level — T06
+- Near-duplicate entities across chunks (e.g. "Larry Livingston" vs "the
+narrator") need cross-chunk merge/dedupe policy before a 24-chapter run
 ## Non-Goals
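The exact-title dedupe observed in the smoke run, and why it misses near-duplicates, can be shown with a small sketch. It assumes entities are keyed by their exact title and that a repeated title merges its source chunk IDs into the existing record; `upsert_entities` and the record shape are hypothetical, not the project's code.

```python
def upsert_entities(store, entities):
    """Upsert entities keyed by exact title (assumed behavior).

    store:    dict mapping title -> {"title": str, "sources": list[str]}
    entities: iterable of {"title": str, "sources": list[str]}
    """
    for entity in entities:
        title = entity["title"]
        if title in store:
            # Same exact title: merge source chunk IDs into the existing record.
            merged = set(store[title]["sources"]) | set(entity["sources"])
            store[title]["sources"] = sorted(merged)
        else:
            store[title] = {"title": title, "sources": list(entity["sources"])}
    return store


store = upsert_entities({}, [
    {"title": "Larry Livingston", "sources": ["chapter-01-part-001"]},
    {"title": "Larry Livingston", "sources": ["chapter-01-part-002"]},
    # A near-duplicate with a different title is NOT merged by exact matching.
    {"title": "the narrator", "sources": ["chapter-01-part-003"]},
])
print(sorted(store))  # ['Larry Livingston', 'the narrator']
```

Merging the last two records needs something beyond title matching, which is the cross-chunk policy the workplan still leaves open.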
@@ -197,6 +219,9 @@ state_hub_task_id: "5ff1f11e-49ad-4c2d-bd4c-b8cc261309bc"
 - Add a review checklist for duplicate entities, relation endpoints, weak
 evidence, and over-broad trading lessons
 - Add a final readiness report before generating the full book
+- Enrich `reports/generation-summary.md` beyond counts and metrics: list
+entity titles, per-chapter coverage, page-anchor links, and any unmapped
+source chunks (gap found in the 2026-05-17 smoke run)
 ## Acceptance
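The enriched `reports/generation-summary.md` planned for T07 could take roughly the following shape. This is a minimal sketch only: the chunk and entity dict shapes, the section layout, and `render_summary` are all hypothetical, and page anchors are printed as plain IDs because the link target format is not specified here.

```python
from collections import defaultdict


def render_summary(chunks, entities):
    """Render a review-oriented generation summary as Markdown (hypothetical layout).

    chunks:   list of {"id": str, "chapter_label": str, "page_anchors": list[str]}
    entities: list of {"title": str, "sources": list[str]}  # sources are chunk IDs
    """
    mapped = {src for e in entities for src in e["sources"]}
    by_chapter = defaultdict(list)  # keeps chapters in first-seen order
    for chunk in chunks:
        by_chapter[chunk["chapter_label"]].append(chunk)

    lines = ["# Generation Summary", "", "## Entities"]
    for e in sorted(entities, key=lambda item: item["title"]):
        lines.append(f"- {e['title']} ({', '.join(e['sources'])})")

    lines += ["", "## Chapter Coverage"]
    for label, members in by_chapter.items():
        covered = sum(1 for c in members if c["id"] in mapped)
        anchors = ", ".join(a for c in members for a in c["page_anchors"])
        lines.append(f"- Chapter {label}: {covered}/{len(members)} chunks mapped; anchors: {anchors}")

    # Chunks that no entity cites are listed so a reviewer can check them.
    unmapped = [c["id"] for c in chunks if c["id"] not in mapped]
    lines += ["", "## Unmapped Chunks"]
    lines += [f"- {cid}" for cid in unmapped] or ["- none"]
    return "\n".join(lines) + "\n"


# Illustrative sample data, not output from the real smoke run.
sample = render_summary(
    [{"id": "chapter-01-part-001", "chapter_label": "I", "page_anchors": ["Page_1", "Page_2", "Page_3"]},
     {"id": "chapter-01-part-002", "chapter_label": "I", "page_anchors": ["Page_4"]}],
    [{"title": "Larry Livingston", "sources": ["chapter-01-part-001"]}],
)
print(sample)
```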