infospace-bench/docs/lefevre-readiness.md
tegwick 1d62dffae9 IB-WP-0016-T07: review report and output policy; close IB-WP-0016
Enrich reports/generation-summary.md with the review-oriented sections
that the 2026-05-17 smoke run flagged as missing: ## Chapter coverage
(per-chapter source/entity/relation/anchor counts), ## Entities (the
deduped title list), ## Unmapped source chunks (sources with no
downstream generated artifact), and ## Page anchors (total plus
deterministic sample). Sections are conditional on data being present
so generic non-Lefevre runs stay terse.

Add docs/lefevre-readiness.md as the final sign-off document for
IB-WP-0016: what is wired (T01-T06 recap), an output policy table
(checked-in fixture sources vs disposable generated infospaces vs
archive targets), a seven-item reviewer checklist (duplicate entities,
relation endpoints, weak evidence, overgeneralization, anchor
coverage, unmapped sources, plan-vs-actual variance), a scale-up plan
from one-chapter to full-book, and the load-bearing risks still
outstanding (cross-chunk dedup, whole-run resume, adaptive routing
deferred to LLM-WP-0004 / IB-WP-0018, rate-table drift).

Closes IB-WP-0016 (Lefevre EPUB3 Infospace Readiness Pilot): T01-T07
all done; the workplan is set to status=done.

131 tests pass, 1 skipped (live OpenRouter smoke, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 01:22:41 +02:00


Lefevre Infospace — Final Readiness Report

Date: 2026-05-17
Workplan: IB-WP-0016
Status: ready for a single-chapter live run; full-book run gated on human review of that first chapter's output

This is the human-facing readiness summary that closes IB-WP-0016. It records what is wired, where generated outputs live, what gets committed, the review checks a reviewer must perform before scaling beyond one chapter, and a few load-bearing risks that should be re-checked before any full-book run.

What is wired (T01-T06)

  • T01 spine-aware EPUB3 intake. Parses META-INF/container.xml and the OPF package document; iterates documents in spine order; tags every spine entry with a section role (body, cover, nav, toc, header, footer, notes, license, auxiliary); excludes non-body sections by default with an include_non_body=True opt-in. Full OPF book metadata (title, creator, language, subjects, rights, identifier, source_url, modified) reaches every chunk.
  • T02 chapter-aware chunking and stable IDs. Resolves chapter labels from the nav doc and from in-document headings; parses roman numerals and "Chapter N" labels into numeric indices; emits stable IDs chapter-NN with -part-NNN suffix on multi-part chapters. id="Page_*" anchors are extracted upfront and distributed per chunk; an overlap_words parameter supports an evidence window between adjacent parts.
  • T03 scale-aware planning. generate plan returns a compact summary by default (selected chunks, per-workflow calls, prompt tokens, rough USD). Selection filters --chapter, --from-chapter, --to-chapter, --chunk and budget caps --max-calls, --cost-cap, --cost-per-1k are all wired. --full opts back into the full per-workflow plan when needed.
  • T04 trading-literature profile. Eight entity categories (trader, market, strategy, error, psychological_pattern, institution, instrument, evidence_bearing_claim), five relation types (cause_effect, lesson_evidence, risk_mitigation, actor_venue, strategy_outcome), four evaluation criteria (groundedness, lesson_clarity, historical_context, overgeneralization_risk).
  • T05 deterministic Lefevre fixture. A checked-in Lefevre-shaped EPUB fixture under tests/fixtures/lefevre/ plus a trading-tuned responses YAML. Three tests prove the full pipeline produces a manifest-backed infospace with stable chapter-NN.md source slugs and that PG boilerplate is excluded by default.
  • T06 OpenRouter live-run guardrails. --chapter selection is available on init and from-source, so a one-chapter live run is a one-flag command. Provider metadata (model, request_id, usage tokens, retry_count, duration_seconds) is stored in run records and in the generated artifact provenance under provider_metadata. The optional live smoke test in tests/test_openrouter_live.py runs only when OPENROUTER_API_KEY is set and INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1.
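The T02 label-to-ID step can be illustrated with a minimal sketch. It assumes labels are either a bare number, a bare roman numeral, or either one prefixed with "Chapter"; the function names (roman_to_int, chapter_index, chunk_id) are hypothetical and are not taken from the infospace-bench code. Only the chapter-NN and -part-NNN shapes come from the description above.

```python
import re
from typing import Optional

_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Hypothetical label grammar: optional "Chapter", then digits or a roman numeral.
_LABEL = re.compile(r"^\s*(?:chapter\s+)?(\d+|[ivxlcdm]+)\.?\s*$", re.IGNORECASE)


def roman_to_int(text: str) -> int:
    """Convert a roman numeral to an integer (malformed numerals are not rejected)."""
    values = [_ROMAN[ch] for ch in text.upper()]
    total = 0
    for i, value in enumerate(values):
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def chapter_index(label: str) -> Optional[int]:
    """Return the numeric chapter index for a label, or None if it is not a chapter label."""
    match = _LABEL.match(label)
    if match is None:
        return None
    token = match.group(1)
    return int(token) if token.isdigit() else roman_to_int(token)


def chunk_id(index: int, part: Optional[int] = None) -> str:
    """Build a stable ID: chapter-NN, plus -part-NNN for multi-part chapters."""
    base = f"chapter-{index:02d}"
    return base if part is None else f"{base}-part-{part:03d}"


print(chunk_id(chapter_index("Chapter IX")))   # chapter-09
print(chunk_id(chapter_index("XIV"), part=2))  # chapter-14-part-002
```

Because the ID is derived only from the chapter index and the part number, re-running intake on the same EPUB yields the same slugs, which is what makes the chapter-NN.md source files stable across runs.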

The IB-WP-0019 budget registry is wired in as well: every generate plan appends a snapshot, and every generate run writes a usage rollup, a variance summary, and a state-hub token event with failure isolation.

Output policy

Where things live, and what to commit:

| Path | Status | Notes |
| --- | --- | --- |
| tests/fixtures/lefevre/sources/ | Committed | Inspectable XHTML; the smoke test rebuilds the EPUB at test time. |
| tests/fixtures/lefevre/responses.yaml | Committed | Trading-tuned fixture responses. |
| docs/ | Committed | This file, the validation note, generator docs. |
| infospaces/<slug>/ | Disposable | Generated infospaces; do not commit. |
| infospaces/<slug>/output/budget/ | Disposable but archive-relevant | Carried into archive packages by IB-WP-0014 / IB-WP-0019-T07. |
| Live-run outputs (entities, relations, evaluations, reports) | Review then archive | After a successful chapter review, archive via infospace-bench archive rather than commit. |

The repo gitignore already excludes infospaces/ content at the working-copy level; the only Lefevre input shape committed today is the small fixture under tests/fixtures/lefevre/. Archived outputs live in the artifact-store registry, not in git.

Reviewer checklist (per chapter)

Run after each chapter's generation completes and before scaling:

  1. Duplicate entities. Inspect artifacts/entities/ and the report's ## Entities list for near-duplicates (e.g. Larry Livingston vs The narrator, or Bucket Shop vs Cosmopolitan Stock Brokerage Company when the latter is intended as a specific instance). Merge or split before continuing.
  2. Relation endpoints. Every relation's ## Subject and ## Object should match an existing entity title. Anything that does not should either gain an entity or be dropped.
  3. Weak evidence. Open the ## Evidence section of each relation and confirm it quotes a concrete phrase from the source chunk, not a paraphrase. Relations with evidence like "the chapter implies…" should be downgraded or removed.
  4. Overgeneralization. For every entity whose category is strategy, error, psychological_pattern, or evidence_bearing_claim, check the evaluation's overgeneralization_risk score and read the ## Review Notes it produced. Anything that silently universalises a chapter-local claim ("traders always…", "every market does…") should be re-scoped or dropped.
  5. Page anchor coverage. The report's ## Page anchors section should show anchors actually present in the source. If the anchor count is zero for a chapter that should have them, intake mis-fired.
  6. Unmapped source chunks. The report's ## Unmapped source chunks section must be empty before a full-book run. Any chunk listed there had its entity or relation stage skip silently — fix the underlying workflow or re-run with a different selection.
  7. Plan-vs-actual variance. Check output/budget/summary.yaml and the "Plan variance" line of the report. If actual is more than 1.5× estimated for either calls or tokens, re-plan before scaling.
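Two of the checks above are mechanical enough to script: relation endpoints (item 2) and plan-vs-actual variance (item 7). The sketch below is illustrative only. The dict keys ("id", "subject", "object") and both function names are hypothetical, and the sketch assumes the reviewer has already loaded the relation records and the budget figures; the actual schema of the relation files and of output/budget/summary.yaml is not shown in this report.

```python
from typing import Dict, Iterable, List, Tuple


def dangling_endpoints(
    relations: Iterable[Dict[str, str]],
    entity_titles: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (relation id, field, value) for every endpoint with no matching entity title."""
    known = set(entity_titles)
    missing = []
    for rel in relations:
        for field in ("subject", "object"):
            if rel[field] not in known:
                missing.append((rel["id"], field, rel[field]))
    return missing


def needs_replan(
    estimated_calls: float,
    actual_calls: float,
    estimated_tokens: float,
    actual_tokens: float,
    threshold: float = 1.5,
) -> bool:
    """True if actual exceeds threshold x estimated for calls or for tokens."""
    if estimated_calls <= 0 or estimated_tokens <= 0:
        raise ValueError("estimates must be positive")
    return (
        actual_calls / estimated_calls > threshold
        or actual_tokens / estimated_tokens > threshold
    )


# Made-up sample data, purely for illustration.
relations = [
    {"id": "r1", "subject": "Larry Livingston", "object": "Bucket Shop"},
    {"id": "r2", "subject": "The narrator", "object": "Bucket Shop"},
]
titles = ["Larry Livingston", "Bucket Shop"]

print(dangling_endpoints(relations, titles))  # [('r2', 'subject', 'The narrator')]
print(needs_replan(30, 31, 20_000, 31_000))   # True: tokens are 1.55x the estimate
```

The endpoint check uses exact title matching on purpose: a near-miss such as "The narrator" should surface as a dangling endpoint so the reviewer decides whether it is a duplicate of an existing entity or a missing one.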

Scale-up plan

To go from a reviewed one-chapter run to the full Lefevre book:

  1. Re-plan against the full book: infospace-bench generate plan <root> --cost-per-1k <rate> and inspect total_provider_calls_estimate, total_prompt_tokens_estimate, and estimated_cost_usd. The current real-book numbers are 730 calls / ~518k tokens / ~$155 at $0.30/1k.
  2. Pick a defensible cost-cap. If the plan shows estimated cost above the cap, narrow selection with --from-chapter/--to-chapter before running.
  3. Pick the final model. Confirm an entry exists in src/infospace_bench/model_rates.yaml (or a workspace override) so cost_usd_estimated lines up with reality. List prices drift — refresh captured_at if older than 90 days.
  4. Run one chapter, review per the checklist above, then either continue chapter-by-chapter or batch via --from-chapter/--to-chapter. Resume is whole-run-skip today (see Risks); avoid relying on it for partial recovery.
  5. After each successful range, archive the infospace with infospace-bench archive so the budget log, the metrics, and the generated artifacts all land in a single content-addressed package.
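The arithmetic behind steps 1 and 2 can be checked by hand. The sketch below assumes the estimate is simply prompt tokens divided by 1,000 and multiplied by the --cost-per-1k rate, which is consistent with the figures quoted in step 1; the function names and the example cap values are hypothetical.

```python
def estimated_cost_usd(prompt_tokens: int, cost_per_1k: float) -> float:
    """Rough cost estimate: tokens / 1000 * rate per 1k tokens."""
    return prompt_tokens / 1000 * cost_per_1k


def within_cap(prompt_tokens: int, cost_per_1k: float, cost_cap: float) -> bool:
    """True if the estimated cost does not exceed the chosen cap."""
    return estimated_cost_usd(prompt_tokens, cost_per_1k) <= cost_cap


# Full-book figures from step 1: ~518k prompt tokens at $0.30 per 1k.
cost = estimated_cost_usd(518_000, 0.30)
print(round(cost, 2))                     # 155.4
print(within_cap(518_000, 0.30, 100.0))   # False: narrow the chapter range first
print(within_cap(518_000, 0.30, 200.0))   # True
```

If the cap is below the estimate, the same calculation on a narrower chapter range shows how many chapters fit before a run is started.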

Risks still load-bearing

These do not block a one-chapter run but should be re-checked before a full-book run:

  • Cross-chunk entity dedup. Exact-title upsert works (the same entity title across chunks collapses to one file), but near-duplicate dedup is still a reviewer responsibility. Plan: only proceed to multi-chapter runs once a chapter's entity list has been hand-pruned.
  • Whole-run resume. generate resume skips a run that has already completed; it does not skip only the chunks that completed within a failed run. For a real 24-chapter run that fails midway, the safest recovery today is a new infospace with the remaining chapter range, not resume on the original.
  • Adaptive routing. The current single-model run is fine for one chapter but expensive at full-book scale. The cost-quality routing layer is parked in llm-connect LLM-WP-0004; the consumer wiring is parked in infospace-bench IB-WP-0018. Either land that work first or accept the single-model bill.
  • Provider rate drift. The default rate table in src/infospace_bench/model_rates.yaml captured prices on 2026-05-17. Refresh before a full-book run if the file is older than 90 days.
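The 90-day refresh rule for the rate table reduces to a date comparison. This is a minimal sketch under two assumptions: that the captured_at value has already been read from model_rates.yaml as a date, and that "older than 90 days" means strictly more than 90 days. The function name is hypothetical.

```python
from datetime import date, timedelta


def rates_are_stale(captured_at: date, today: date, max_age_days: int = 90) -> bool:
    """True if the rate table was captured more than max_age_days before today."""
    return (today - captured_at) > timedelta(days=max_age_days)


captured = date(2026, 5, 17)  # capture date stated in this report

print(rates_are_stale(captured, date(2026, 8, 15)))  # False: exactly 90 days old
print(rates_are_stale(captured, date(2026, 8, 16)))  # True: 91 days old
```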

Sign-off

IB-WP-0016 T01-T07 are done. The pipeline can plan a chapter, run it against OpenRouter, write a manifest-backed infospace with provider metadata, record budget and variance, archive the result, and surface the review-oriented sections that this checklist depends on. The full Lefevre book is not yet a committed artifact and should not become one until at least one chapter has cleared the reviewer checklist above.