# Lefevre Infospace: Final Readiness Report

Date: 2026-05-17

Workplan: IB-WP-0016

Status: ready for a single-chapter live run; full-book run gated on human review of that first chapter's output

This is the human-facing readiness summary that closes IB-WP-0016. It records what is wired, where generated outputs live, what gets committed, the review checks a reviewer must perform before scaling beyond one chapter, and a few load-bearing risks that should be re-checked before any full-book run.

## What is wired (T01–T06)

- **T01 spine-aware EPUB3 intake.** Parses `META-INF/container.xml` and the OPF package document; iterates documents in spine order; tags every spine entry with a section role (`body`, `cover`, `nav`, `toc`, `header`, `footer`, `notes`, `license`, `auxiliary`); excludes non-body sections by default with an `include_non_body=True` opt-in. Full OPF book metadata (title, creator, language, subjects, rights, identifier, source_url, modified) reaches every chunk.
- **T02 chapter-aware chunking and stable IDs.** Resolves chapter labels from the nav doc and from in-document headings; parses roman numerals and "Chapter N" labels into numeric indices; emits stable IDs `chapter-NN` with `-part-NNN` suffix on multi-part chapters. `id="Page_*"` anchors are extracted upfront and distributed per chunk; an `overlap_words` parameter supports an evidence window between adjacent parts.
- **T03 scale-aware planning.** `generate plan` returns a compact summary by default (selected chunks, per-workflow calls, prompt tokens, rough USD). Selection filters `--chapter`, `--from-chapter`, `--to-chapter`, `--chunk` and budget caps `--max-calls`, `--cost-cap`, `--cost-per-1k` are all wired. `--full` opts back into the full per-workflow plan when needed.
- **T04 trading-literature profile.** Eight entity categories (trader, market, strategy, error, psychological_pattern, institution, instrument, evidence_bearing_claim), five relation types (cause_effect, lesson_evidence, risk_mitigation, actor_venue, strategy_outcome), four evaluation criteria (groundedness, lesson_clarity, historical_context, overgeneralization_risk).
- **T05 deterministic Lefevre fixture.** A checked-in Lefevre-shaped EPUB fixture under `tests/fixtures/lefevre/` plus a trading-tuned responses YAML. Three tests prove the full pipeline produces a manifest-backed infospace with stable `chapter-NN.md` source slugs and that PG boilerplate is excluded by default.
- **T06 OpenRouter live-run guardrails.** `--chapter` selection on `init` and `from-source` so a one-chapter live run is a one-flag command. Provider metadata (model, request_id, usage tokens, retry_count, duration_seconds) lives in run records *and* in the generated artifact provenance under `provider_metadata`. The optional live smoke test in `tests/test_openrouter_live.py` is gated on `OPENROUTER_API_KEY` plus `INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1`.

The IB-WP-0019 budget registry rides along: every `generate plan` appends a snapshot, every `generate run` writes a usage rollup + variance summary + a state-hub token event with failure isolation.

## Output policy

Where things live, and what to commit:

| Path | Status | Notes |
|---|---|---|
| `tests/fixtures/lefevre/sources/` | **Committed** | Inspectable XHTML; the smoke test rebuilds the EPUB at test time. |
| `tests/fixtures/lefevre/responses.yaml` | **Committed** | Trading-tuned fixture responses. |
| `docs/` | **Committed** | This file, the validation note, generator docs. |
| `infospaces//` | **Disposable** | Generated infospaces; do not commit. |
| `infospaces//output/budget/` | **Disposable but archive-relevant** | Carried into archive packages by IB-WP-0014 / IB-WP-0019-T07. |
| Live-run outputs (entities, relations, evaluations, reports) | **Review then archive** | After a successful chapter review, archive via `infospace-bench archive` rather than commit. |

The repo gitignore already excludes `infospaces/` content at the working-copy level; the only Lefevre input shape committed today is the small fixture under `tests/fixtures/lefevre/`. Archived outputs live in the artifact-store registry, not in git.

## Reviewer checklist (per chapter)

Run after each chapter's generation completes and before scaling:

1. **Duplicate entities.** `artifacts/entities/` and the report's `## Entities` list. Look for near-duplicates (e.g. `Larry Livingston` vs `The narrator`, or `Bucket Shop` vs `Cosmopolitan Stock Brokerage Company` when the latter is intended as a specific instance). Merge or split before continuing.
2. **Relation endpoints.** Every relation's `## Subject` and `## Object` should match an existing entity title. Anything that does not should either gain an entity or be dropped.
3. **Weak evidence.** Open the `## Evidence` section of each relation and confirm it quotes a concrete phrase from the source chunk, not a paraphrase. Relations with evidence like "the chapter implies…" should be downgraded or removed.
4. **Overgeneralization.** For every entity whose category is `strategy`, `error`, `psychological_pattern`, or `evidence_bearing_claim`, check the evaluation's `overgeneralization_risk` score and read the `## Review Notes` it produced. Anything that silently universalises a chapter-local claim ("traders always…", "every market does…") should be re-scoped or dropped.
5. **Page anchor coverage.** The report's `## Page anchors` section should show anchors actually present in the source. If the anchor count is zero for a chapter that should have them, intake mis-fired.
6. **Unmapped source chunks.** The report's `## Unmapped source chunks` section must be empty before a full-book run.
   Any chunk listed there had its entity or relation stage skip silently. Fix the underlying workflow or re-run with a different selection.
7. **Plan-vs-actual variance.** `output/budget/summary.yaml` and the "Plan variance" line of the report. If actual is more than 1.5× estimated for either calls or tokens, re-plan before scaling.

## Scale-up plan

To go from a reviewed one-chapter run to the full Lefevre book:

1. Re-plan against the full book: `infospace-bench generate plan --cost-per-1k ` and inspect `total_provider_calls_estimate`, `total_prompt_tokens_estimate`, and `estimated_cost_usd`. The current real-book numbers are 730 calls / ~518k tokens / ~$155 at $0.30/1k.
2. Pick a defensible cost-cap. If the plan shows estimated cost above the cap, narrow selection with `--from-chapter`/`--to-chapter` before running.
3. Pick the final model. Confirm an entry exists in `src/infospace_bench/model_rates.yaml` (or a workspace override) so `cost_usd_estimated` lines up with reality. List prices drift; refresh `captured_at` if older than 90 days.
4. Run one chapter, review per the checklist above, then either continue chapter-by-chapter or batch via `--from-chapter`/`--to-chapter`. Resume is whole-run-skip today (see Risks); avoid relying on it for partial recovery.
5. After each successful range, archive the infospace with `infospace-bench archive` so the budget log, the metrics, and the generated artifacts all land in a single content-addressed package.

## Risks still load-bearing

These do not block a one-chapter run but should be re-checked before a full-book run:

- **Cross-chunk entity dedupe.** Exact-title upsert works (same entity title across chunks collapses to one file), but near-duplicate dedup is still a reviewer responsibility. Plan: only proceed to multi-chapter once a chapter's entity list has been hand-pruned.
- **Whole-run resume.** `generate resume` skips a completed run; it does not skip just-completed-chunks.
  For a real 24-chapter run that fails midway, the safest recovery today is a new infospace with the remaining chapter range, not resume on the original.
- **Adaptive routing.** The current single-model run is fine for one chapter but expensive at full-book scale. The cost-quality routing layer is parked in `llm-connect` `LLM-WP-0004`; the consumer wiring is parked in `infospace-bench` `IB-WP-0018`. Either land that work first or accept the single-model bill.
- **Provider rate drift.** The default rate table in `src/infospace_bench/model_rates.yaml` captured prices on 2026-05-17. Refresh before a full-book run if the file is older than 90 days.

## Sign-off

IB-WP-0016 T01–T07 are done. The pipeline can plan a chapter, run it against OpenRouter, write a manifest-backed infospace with provider metadata, record budget and variance, archive the result, and surface the review-oriented sections that this checklist depends on. The full Lefevre book is not yet a committed artifact and should not become one until at least one chapter has cleared the reviewer checklist above.
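
Two numeric rules in this report can be sketched in a few lines of Python: the rough cost estimate in the scale-up plan (tokens priced per 1k) and the 1.5× plan-variance threshold in the reviewer checklist. This is a minimal illustration, not the `infospace-bench` implementation. The `Usage` dataclass and both function names are hypothetical, and nothing here assumes the real schema of `output/budget/summary.yaml`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Calls and tokens for one run (hypothetical shape, for illustration only)."""
    calls: int
    tokens: int


def estimated_cost_usd(prompt_tokens: int, cost_per_1k: float) -> float:
    """Rough cost estimate: prompt tokens priced per thousand."""
    return prompt_tokens / 1000 * cost_per_1k


def needs_replan(planned: Usage, actual: Usage, threshold: float = 1.5) -> bool:
    """True when actual is more than `threshold` times planned for calls or tokens."""
    return (
        actual.calls > threshold * planned.calls
        or actual.tokens > threshold * planned.tokens
    )


if __name__ == "__main__":
    # Figures quoted in the scale-up plan: ~518k tokens at $0.30 per 1k.
    print(round(estimated_cost_usd(518_000, 0.30), 2))

    # Made-up one-chapter numbers: tokens overshoot the plan, calls do not.
    planned = Usage(calls=30, tokens=20_000)
    actual = Usage(calls=31, tokens=32_000)
    print(needs_replan(planned, actual))
```

The comparison is strict, so a run that lands at exactly 1.5× the plan does not trigger a re-plan, matching the "more than 1.5×" wording in the checklist.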