generated from coulomb/repo-seed
IB-WP-0016-T07: review report and output policy; close IB-WP-0016
Enrich reports/generation-summary.md with the review-oriented sections that the 2026-05-17 smoke run flagged as missing: ## Chapter coverage (per-chapter source/entity/relation/anchor counts), ## Entities (the deduped title list), ## Unmapped source chunks (sources with no downstream generated artifact), and ## Page anchors (total plus deterministic sample). Sections are conditional on data being present so generic non-Lefevre runs stay terse.

Add docs/lefevre-readiness.md as the final sign-off document for IB-WP-0016: what is wired (T01-T06 recap), an output policy table (checked-in fixture sources vs disposable generated infospaces vs archive targets), a seven-item reviewer checklist (duplicate entities, relation endpoints, weak evidence, overgeneralization, anchor coverage, unmapped sources, plan-vs-actual variance), a scale-up plan from one-chapter to full-book, and the load-bearing risks still outstanding (cross-chunk dedup, whole-run resume, adaptive routing deferred to LLM-WP-0004 / IB-WP-0018, rate-table drift).

Closes IB-WP-0016 (Lefevre EPUB3 Infospace Readiness Pilot): T01-T07 all done; the workplan is set to status=done.

131 tests pass, 1 skipped (live OpenRouter smoke, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
164
docs/lefevre-readiness.md
Normal file
@@ -0,0 +1,164 @@
# Lefevre Infospace — Final Readiness Report

Date: 2026-05-17
Workplan: IB-WP-0016
Status: ready for a single-chapter live run; full-book run gated on
human review of that first chapter's output

This is the human-facing readiness summary that closes IB-WP-0016. It
records what is wired, where generated outputs live, what gets
committed, the review checks a reviewer must perform before scaling
beyond one chapter, and a few load-bearing risks that should be
re-checked before any full-book run.

## What is wired (T01–T06)

- **T01 spine-aware EPUB3 intake.** Parses `META-INF/container.xml` and
  the OPF package document; iterates documents in spine order; tags
  every spine entry with a section role (`body`, `cover`, `nav`, `toc`,
  `header`, `footer`, `notes`, `license`, `auxiliary`); excludes
  non-body sections by default with an `include_non_body=True` opt-in.
  Full OPF book metadata (title, creator, language, subjects, rights,
  identifier, source_url, modified) reaches every chunk.
- **T02 chapter-aware chunking and stable IDs.** Resolves chapter
  labels from the nav doc and from in-document headings; parses roman
  numerals and "Chapter N" labels into numeric indices; emits stable
  IDs `chapter-NN` with `-part-NNN` suffix on multi-part chapters.
  `id="Page_*"` anchors are extracted upfront and distributed per
  chunk; an `overlap_words` parameter supports an evidence window
  between adjacent parts.
- **T03 scale-aware planning.** `generate plan` returns a compact
  summary by default (selected chunks, per-workflow calls, prompt
  tokens, rough USD). Selection filters `--chapter`, `--from-chapter`,
  `--to-chapter`, `--chunk` and budget caps `--max-calls`, `--cost-cap`,
  `--cost-per-1k` are all wired. `--full` opts back into the full
  per-workflow plan when needed.
- **T04 trading-literature profile.** Eight entity categories (trader,
  market, strategy, error, psychological_pattern, institution,
  instrument, evidence_bearing_claim), five relation types
  (cause_effect, lesson_evidence, risk_mitigation, actor_venue,
  strategy_outcome), four evaluation criteria (groundedness,
  lesson_clarity, historical_context, overgeneralization_risk).
- **T05 deterministic Lefevre fixture.** A checked-in Lefevre-shaped
  EPUB fixture under `tests/fixtures/lefevre/` plus a trading-tuned
  responses YAML. Three tests prove the full pipeline produces a
  manifest-backed infospace with stable `chapter-NN.md` source slugs
  and that PG boilerplate is excluded by default.
- **T06 OpenRouter live-run guardrails.** `--chapter` selection on
  `init` and `from-source` so a one-chapter live run is a one-flag
  command. Provider metadata (model, request_id, usage tokens,
  retry_count, duration_seconds) lives in run records *and* in the
  generated artifact provenance under `provider_metadata`. The optional
  live smoke test in `tests/test_openrouter_live.py` is gated on
  `OPENROUTER_API_KEY` plus `INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1`.
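
To make the T02 ID scheme concrete, here is a minimal sketch of how a chapter label might be turned into a numeric index and then into a stable chunk ID. The function names (`roman_to_int`, `chapter_index`, `chunk_id`) and the zero-padding widths are assumptions inferred from the `chapter-NN` / `-part-NNN` pattern above; this is not the repository's actual implementation.

```python
import re
from typing import Optional

_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text: str) -> int:
    """Convert a roman numeral such as 'XIV' to 14.

    Malformed numerals are not rejected; this is only an illustration.
    """
    values = [_ROMAN[ch] for ch in text.upper()]
    total = 0
    for i, value in enumerate(values):
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def chapter_index(label: str) -> Optional[int]:
    """Extract a numeric index from 'Chapter 7', 'CHAPTER XIV', or a bare 'IV'."""
    match = re.fullmatch(
        r"\s*(?:chapter\s+)?(\d+|[ivxlcdm]+)\.?\s*", label, re.IGNORECASE
    )
    if match is None:
        return None
    token = match.group(1)
    return int(token) if token.isdigit() else roman_to_int(token)


def chunk_id(chapter: int, part: Optional[int] = None) -> str:
    """Build 'chapter-NN', with a '-part-NNN' suffix for multi-part chapters."""
    base = f"chapter-{chapter:02d}"
    return base if part is None else f"{base}-part-{part:03d}"


print(chunk_id(chapter_index("CHAPTER XIV"), part=2))
```

Because the ID depends only on the chapter index and part number, re-running intake on the same EPUB should yield the same slugs, which is the property the fixture tests in T05 rely on.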

The IB-WP-0019 budget registry rides along: every `generate plan`
appends a snapshot, every `generate run` writes a usage rollup +
variance summary + a state-hub token event with failure isolation.

## Output policy

Where things live, and what to commit:

| Path | Status | Notes |
|---|---|---|
| `tests/fixtures/lefevre/sources/` | **Committed** | Inspectable XHTML; the smoke test rebuilds the EPUB at test time. |
| `tests/fixtures/lefevre/responses.yaml` | **Committed** | Trading-tuned fixture responses. |
| `docs/` | **Committed** | This file, the validation note, generator docs. |
| `infospaces/<slug>/` | **Disposable** | Generated infospaces — do not commit. |
| `infospaces/<slug>/output/budget/` | **Disposable but archive-relevant** | Carried into archive packages by IB-WP-0014 / IB-WP-0019-T07. |
| Live-run outputs (entities, relations, evaluations, reports) | **Review then archive** | After a successful chapter review, archive via `infospace-bench archive` rather than commit. |

The repo gitignore already excludes `infospaces/` content at the
working-copy level; the only Lefevre input shape committed today is the
small fixture under `tests/fixtures/lefevre/`. Archived outputs live in
the artifact-store registry, not in git.
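
The path rows of the table can be read as a small lookup from path to policy status. The sketch below is illustrative only: the `classify` helper and its status strings are hypothetical, it covers only the path-based rows (not the "review then archive" row for live-run outputs), and it is not part of `infospace-bench`.

```python
from pathlib import PurePosixPath

# Committed prefixes taken from the output policy table.
COMMITTED_PREFIXES = [
    "tests/fixtures/lefevre/sources",
    "tests/fixtures/lefevre/responses.yaml",
    "docs",
]


def classify(path: str) -> str:
    """Map a repo-relative path to a policy status from the table."""
    parts = PurePosixPath(path).parts
    if len(parts) >= 2 and parts[0] == "infospaces":
        # infospaces/<slug>/output/budget/ is disposable but carried into archives.
        if parts[2:4] == ("output", "budget"):
            return "disposable-archive-relevant"
        return "disposable"
    for prefix in COMMITTED_PREFIXES:
        prefix_parts = PurePosixPath(prefix).parts
        if parts[: len(prefix_parts)] == prefix_parts:
            return "committed"
    return "unclassified"


print(classify("infospaces/lefevre/output/budget/summary.yaml"))
```

Checking the more specific `output/budget/` case before the general `infospaces/<slug>/` case keeps the two disposable rows from shadowing each other.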

## Reviewer checklist (per chapter)

Run after each chapter's generation completes and before scaling:

1. **Duplicate entities.** Check `artifacts/entities/` and the report's
   `## Entities` list. Look for near-duplicates (e.g. `Larry
   Livingston` vs `The narrator`, or `Bucket Shop` vs
   `Cosmopolitan Stock Brokerage Company` when the latter is intended
   as a specific instance). Merge or split before continuing.
2. **Relation endpoints.** Every relation's `## Subject` and
   `## Object` should match an existing entity title. Anything that
   does not should either gain an entity or be dropped.
3. **Weak evidence.** Open the `## Evidence` section of each relation
   and confirm it quotes a concrete phrase from the source chunk, not
   a paraphrase. Relations with evidence like "the chapter implies…"
   should be downgraded or removed.
4. **Overgeneralization.** For every entity whose category is
   `strategy`, `error`, `psychological_pattern`, or
   `evidence_bearing_claim`, check the evaluation's
   `overgeneralization_risk` score and read the `## Review Notes` it
   produced. Anything that silently universalises a chapter-local
   claim ("traders always…", "every market does…") should be
   re-scoped or dropped.
5. **Page anchor coverage.** The report's `## Page anchors` section
   should show anchors actually present in the source. If the anchor
   count is zero for a chapter that should have them, intake mis-fired.
6. **Unmapped source chunks.** The report's `## Unmapped source
   chunks` section must be empty before a full-book run. Any chunk
   listed there had its entity or relation stage skip silently — fix
   the underlying workflow or re-run with a different selection.
7. **Plan-vs-actual variance.** Check `output/budget/summary.yaml` and
   the "Plan variance" line of the report. If actual is more than 1.5×
   estimated for either calls or tokens, re-plan before scaling.
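
Two of the checks above (items 2 and 7) are mechanical enough to sketch in code. The helpers below are hypothetical and assume the relations and budget figures have already been loaded into plain Python values; the on-disk layout of relation files and `summary.yaml` is not specified here, so no parsing is attempted. The entity title `Tape Reading` is made up for the example.

```python
from typing import Iterable, List, Tuple


def dangling_endpoints(
    relations: Iterable[Tuple[str, str]], entity_titles: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (subject, object) pairs where either endpoint is not a known entity title."""
    known = set(entity_titles)
    return [(s, o) for s, o in relations if s not in known or o not in known]


def exceeds_variance(estimated: float, actual: float, limit: float = 1.5) -> bool:
    """True when actual is more than `limit` times the estimate (item 7)."""
    return actual > limit * estimated


# Hypothetical data: one relation points at an entity that was never generated.
entities = ["Larry Livingston", "Bucket Shop"]
relations = [
    ("Larry Livingston", "Bucket Shop"),
    ("Larry Livingston", "Tape Reading"),
]
print(dangling_endpoints(relations, entities))
print(exceeds_variance(estimated=100, actual=151))
```

A reviewer would apply `exceeds_variance` separately to calls and to tokens, since the checklist treats either overrun as a reason to re-plan.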

## Scale-up plan

To go from a reviewed one-chapter run to the full Lefevre book:

1. Re-plan against the full book: `infospace-bench generate plan
   <root> --cost-per-1k <rate>` and inspect
   `total_provider_calls_estimate`, `total_prompt_tokens_estimate`,
   and `estimated_cost_usd`. The current real-book numbers are 730
   calls / ~518k tokens / ~$155 at $0.30/1k.
2. Pick a defensible cost-cap. If the plan shows estimated cost above
   the cap, narrow selection with `--from-chapter`/`--to-chapter`
   before running.
3. Pick the final model. Confirm an entry exists in
   `src/infospace_bench/model_rates.yaml` (or a workspace override) so
   `cost_usd_estimated` lines up with reality. List prices drift —
   refresh `captured_at` if older than 90 days.
4. Run one chapter, review per the checklist above, then either
   continue chapter-by-chapter or batch via
   `--from-chapter`/`--to-chapter`. Resume is whole-run-skip today
   (see Risks); avoid relying on it for partial recovery.
5. After each successful range, archive the infospace with
   `infospace-bench archive` so the budget log, the metrics, and the
   generated artifacts all land in a single content-addressed package.
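
The cost figure in step 1 follows from simple arithmetic: prompt tokens divided by 1,000, times the per-1k rate. The sketch below reproduces that calculation and the cap comparison from step 2. The function names and the cap values are hypothetical, and the real planner may count tokens differently (for example, including completion tokens).

```python
def estimated_cost_usd(prompt_tokens: int, cost_per_1k: float) -> float:
    """Rough cost: tokens / 1000 * rate per 1k tokens."""
    return prompt_tokens / 1000 * cost_per_1k


def within_cap(prompt_tokens: int, cost_per_1k: float, cost_cap: float) -> bool:
    """True when the estimate does not exceed the chosen cap."""
    return estimated_cost_usd(prompt_tokens, cost_per_1k) <= cost_cap


# ~518k tokens at $0.30 per 1k tokens, as quoted for the full book.
cost = estimated_cost_usd(518_000, 0.30)
print(round(cost, 2))                      # about 155.4
print(within_cap(518_000, 0.30, 100.0))    # a $100 cap would require narrowing
```

With a cap below the estimate, the plan calls for narrowing the chapter range rather than raising the cap after the fact.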

## Risks still load-bearing

These do not block a one-chapter run but should be re-checked before a
full-book run:

- **Cross-chunk entity dedup.** Exact-title upsert works (same entity
  title across chunks collapses to one file), but near-duplicate dedup
  is still a reviewer responsibility. Plan: only proceed to
  multi-chapter once a chapter's entity list has been hand-pruned.
- **Whole-run resume.** `generate resume` skips a completed run; it
  does not skip individual chunks that already completed. For a real
  24-chapter run that fails midway, the safest recovery today is a new
  infospace with the remaining chapter range — not resume on the
  original.
- **Adaptive routing.** The current single-model run is fine for one
  chapter but expensive at full-book scale. The cost-quality routing
  layer is parked in `llm-connect` `LLM-WP-0004`; the consumer wiring
  is parked in `infospace-bench` `IB-WP-0018`. Either land that work
  first or accept the single-model bill.
- **Provider rate drift.** The default rate table in
  `src/infospace_bench/model_rates.yaml` captured prices on 2026-05-17.
  Refresh before a full-book run if the file is older than 90 days.
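
The 90-day refresh rule for the rate table is a plain date comparison. The sketch below assumes the `captured_at` value has already been read from the rate table and parsed into a `datetime.date`; the helper name and that parsing step are assumptions, not the package's API.

```python
from datetime import date


def rates_are_stale(captured_at: date, today: date, max_age_days: int = 90) -> bool:
    """True when the rate table is older than `max_age_days`."""
    return (today - captured_at).days > max_age_days


captured = date(2026, 5, 17)  # capture date stated in this report
print(rates_are_stale(captured, date(2026, 8, 15)))  # exactly 90 days: not yet stale
print(rates_are_stale(captured, date(2026, 8, 16)))  # 91 days: refresh first
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the check deterministic in tests.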

## Sign-off

IB-WP-0016 T01–T07 are done. The pipeline can plan a chapter, run it
against OpenRouter, write a manifest-backed infospace with provider
metadata, record budget and variance, archive the result, and surface
the review-oriented sections that this checklist depends on. The full
Lefevre book is not yet a committed artifact and should not become one
until at least one chapter has cleared the reviewer checklist above.