infospace-bench/docs/lefevre-readiness.md
tegwick 1d62dffae9 IB-WP-0016-T07: review report and output policy; close IB-WP-0016
Enrich reports/generation-summary.md with the review-oriented sections
that the 2026-05-17 smoke run flagged as missing: ## Chapter coverage
(per-chapter source/entity/relation/anchor counts), ## Entities (the
deduped title list), ## Unmapped source chunks (sources with no
downstream generated artifact), and ## Page anchors (total plus
deterministic sample). Sections are conditional on data being present
so generic non-Lefevre runs stay terse.

Add docs/lefevre-readiness.md as the final sign-off document for
IB-WP-0016: what is wired (T01-T06 recap), an output policy table
(checked-in fixture sources vs disposable generated infospaces vs
archive targets), a seven-item reviewer checklist (duplicate entities,
relation endpoints, weak evidence, overgeneralization, anchor
coverage, unmapped sources, plan-vs-actual variance), a scale-up plan
from one-chapter to full-book, and the load-bearing risks still
outstanding (cross-chunk dedup, whole-run resume, adaptive routing
deferred to LLM-WP-0004 / IB-WP-0018, rate-table drift).

Closes IB-WP-0016 (Lefevre EPUB3 Infospace Readiness Pilot): T01-T07
all done; the workplan is set to status=done.

131 tests pass, 1 skipped (live OpenRouter smoke, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 01:22:41 +02:00

# Lefevre Infospace — Final Readiness Report
Date: 2026-05-17

Workplan: IB-WP-0016

Status: ready for a single-chapter live run; full-book run gated on
human review of that first chapter's output

This is the human-facing readiness summary that closes IB-WP-0016. It
records what is wired, where generated outputs live, what gets
committed, the review checks a reviewer must perform before scaling
beyond one chapter, and a few load-bearing risks that should be
re-checked before any full-book run.
## What is wired (T01-T06)
- **T01 spine-aware EPUB3 intake.** Parses `META-INF/container.xml` and
the OPF package document; iterates documents in spine order; tags
every spine entry with a section role (`body`, `cover`, `nav`, `toc`,
`header`, `footer`, `notes`, `license`, `auxiliary`); excludes
non-body sections by default with an `include_non_body=True` opt-in.
Full OPF book metadata (title, creator, language, subjects, rights,
identifier, source_url, modified) reaches every chunk.
- **T02 chapter-aware chunking and stable IDs.** Resolves chapter
labels from the nav doc and from in-document headings; parses roman
numerals and "Chapter N" labels into numeric indices; emits stable
IDs `chapter-NN` with `-part-NNN` suffix on multi-part chapters.
`id="Page_*"` anchors are extracted upfront and distributed per
chunk; an `overlap_words` parameter supports an evidence window
between adjacent parts.
- **T03 scale-aware planning.** `generate plan` returns a compact
summary by default (selected chunks, per-workflow calls, prompt
tokens, rough USD). Selection filters `--chapter`, `--from-chapter`,
`--to-chapter`, `--chunk` and budget caps `--max-calls`, `--cost-cap`,
`--cost-per-1k` are all wired. `--full` opts back into the full
per-workflow plan when needed.
- **T04 trading-literature profile.** Eight entity categories (trader,
market, strategy, error, psychological_pattern, institution,
instrument, evidence_bearing_claim), five relation types
(cause_effect, lesson_evidence, risk_mitigation, actor_venue,
strategy_outcome), four evaluation criteria (groundedness,
lesson_clarity, historical_context, overgeneralization_risk).
- **T05 deterministic Lefevre fixture.** A checked-in Lefevre-shaped
EPUB fixture under `tests/fixtures/lefevre/` plus a trading-tuned
responses YAML. Three tests prove the full pipeline produces a
manifest-backed infospace with stable `chapter-NN.md` source slugs
and that PG boilerplate is excluded by default.
- **T06 OpenRouter live-run guardrails.** `--chapter` selection on
`init` and `from-source` so a one-chapter live run is a one-flag
command. Provider metadata (model, request_id, usage tokens,
retry_count, duration_seconds) lives in run records *and* in the
generated artifact provenance under `provider_metadata`. The optional
live smoke test in `tests/test_openrouter_live.py` is gated on
`OPENROUTER_API_KEY` plus `INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1`.
The IB-WP-0019 budget registry rides along: every `generate plan`
appends a snapshot, every `generate run` writes a usage rollup +
variance summary + a state-hub token event with failure isolation.
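
The stable-ID scheme described under T02 can be illustrated with a minimal sketch. It assumes labels arrive as plain strings such as `Chapter 3` or `XIV` and that part numbers start at 1; the function names `roman_to_int`, `chapter_index`, and `chunk_id` are hypothetical and are not taken from the `infospace-bench` source.

```python
import re
from typing import Optional

_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Hypothetical label pattern: optional "Chapter " prefix, then digits or a roman numeral.
_LABEL = re.compile(r"^(?:chapter\s+)?(?P<num>\d+|[ivxlcdm]+)\.?$", re.IGNORECASE)


def roman_to_int(numeral: str) -> int:
    """Convert a roman numeral such as 'XIV' to an integer."""
    total = 0
    prev = 0
    # Walk right to left; a smaller value before a larger one is subtracted.
    for ch in reversed(numeral.upper()):
        value = _ROMAN[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total


def chapter_index(label: str) -> Optional[int]:
    """Return the numeric chapter index for a label, or None if it has none."""
    match = _LABEL.match(label.strip())
    if match is None:
        return None
    num = match.group("num")
    return int(num) if num.isdigit() else roman_to_int(num)


def chunk_id(chapter: int, part: Optional[int] = None) -> str:
    """Build a `chapter-NN` ID, with a `-part-NNN` suffix for multi-part chapters."""
    base = f"chapter-{chapter:02d}"
    return base if part is None else f"{base}-part-{part:03d}"


print(chunk_id(chapter_index("Chapter 3")))   # chapter-03
print(chunk_id(chapter_index("XIV"), part=2))  # chapter-14-part-002
```

Zero-padding keeps the IDs lexically sortable, which is one plausible reason for the `NN` / `NNN` widths; the real resolver also consults the nav doc and in-document headings, which this sketch does not model.
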
## Output policy
Where things live, and what to commit:

| Path | Status | Notes |
|---|---|---|
| `tests/fixtures/lefevre/sources/` | **Committed** | Inspectable XHTML; the smoke test rebuilds the EPUB at test time. |
| `tests/fixtures/lefevre/responses.yaml` | **Committed** | Trading-tuned fixture responses. |
| `docs/` | **Committed** | This file, the validation note, generator docs. |
| `infospaces/<slug>/` | **Disposable** | Generated infospaces; do not commit. |
| `infospaces/<slug>/output/budget/` | **Disposable but archive-relevant** | Carried into archive packages by IB-WP-0014 / IB-WP-0019-T07. |
| Live-run outputs (entities, relations, evaluations, reports) | **Review then archive** | After a successful chapter review, archive via `infospace-bench archive` rather than commit. |

The repo gitignore already excludes `infospaces/` content at the
working-copy level; the only Lefevre input shape committed today is the
small fixture under `tests/fixtures/lefevre/`. Archived outputs live in
the artifact-store registry, not in git.
## Reviewer checklist (per chapter)
Run after each chapter's generation completes and before scaling:
1. **Duplicate entities.** `artifacts/entities/` and the report's
`## Entities` list. Look for near-duplicates (e.g. `Larry
Livingston` vs `The narrator`, or `Bucket Shop` vs
`Cosmopolitan Stock Brokerage Company` when the latter is intended
as a specific instance). Merge or split before continuing.
2. **Relation endpoints.** Every relation's `## Subject` and
`## Object` should match an existing entity title. Anything that
does not should either gain an entity or be dropped.
3. **Weak evidence.** Open the `## Evidence` section of each relation
and confirm it quotes a concrete phrase from the source chunk, not
a paraphrase. Relations with evidence like "the chapter implies…"
should be downgraded or removed.
4. **Overgeneralization.** For every entity whose category is
`strategy`, `error`, `psychological_pattern`, or
`evidence_bearing_claim`, check the evaluation's
`overgeneralization_risk` score and read the `## Review Notes` it
produced. Anything that silently universalises a chapter-local
claim ("traders always…", "every market does…") should be
re-scoped or dropped.
5. **Page anchor coverage.** The report's `## Page anchors` section
should show anchors actually present in the source. If the anchor
count is zero for a chapter that should have them, intake mis-fired.
6. **Unmapped source chunks.** The report's `## Unmapped source
chunks` section must be empty before a full-book run. Any chunk
listed there had its entity or relation stage skip silently — fix
the underlying workflow or re-run with a different selection.
7. **Plan-vs-actual variance.** `output/budget/summary.yaml` and the
"Plan variance" line of the report. If actual is more than 1.5×
estimated for either calls or tokens, re-plan before scaling.
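
The 1.5× rule in item 7 amounts to a simple threshold check. The sketch below assumes the estimate and the actuals have already been loaded into dictionaries with `calls` and `tokens` keys; those key names and the `needs_replan` helper are hypothetical, since the real layout of `output/budget/summary.yaml` is not shown here.

```python
from typing import Dict, List


def needs_replan(
    estimated: Dict[str, float],
    actual: Dict[str, float],
    threshold: float = 1.5,
) -> List[str]:
    """Return the metrics whose actual value is more than threshold x the estimate."""
    flagged = []
    for metric in ("calls", "tokens"):  # hypothetical key names
        # Strictly greater than: exactly 1.5x the estimate does not trip the rule.
        if actual[metric] > threshold * estimated[metric]:
            flagged.append(metric)
    return flagged


plan = {"calls": 100, "tokens": 50_000}
run = {"calls": 140, "tokens": 80_000}
print(needs_replan(plan, run))  # ['tokens']
```

Returning the list of offending metrics, rather than a bare boolean, lets the reviewer see whether call count or token volume drove the overrun before re-planning.
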
## Scale-up plan
To go from a reviewed one-chapter run to the full Lefevre book:
1. Re-plan against the full book: `infospace-bench generate plan
<root> --cost-per-1k <rate>` and inspect
`total_provider_calls_estimate`, `total_prompt_tokens_estimate`,
and `estimated_cost_usd`. The current real-book numbers are 730
calls / ~518k tokens / ~$155 at $0.30/1k.
2. Pick a defensible cost-cap. If the plan shows estimated cost above
the cap, narrow selection with `--from-chapter`/`--to-chapter`
before running.
3. Pick the final model. Confirm an entry exists in
`src/infospace_bench/model_rates.yaml` (or a workspace override) so
`cost_usd_estimated` lines up with reality. List prices drift —
refresh `captured_at` if older than 90 days.
4. Run one chapter, review per the checklist above, then either
continue chapter-by-chapter or batch via
`--from-chapter`/`--to-chapter`. Resume is whole-run-skip today
(see Risks); avoid relying on it for partial recovery.
5. After each successful range, archive the infospace with
`infospace-bench archive` so the budget log, the metrics, and the
generated artifacts all land in a single content-addressed package.
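
The cost figure in step 1 and the cap decision in step 2 can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes cost is a flat per-1k rate applied to the prompt-token estimate; the helper names are hypothetical, and the real planner may account for completion tokens or per-model rates differently.

```python
def estimated_cost_usd(prompt_tokens: int, cost_per_1k: float) -> float:
    """Flat-rate estimate: tokens / 1000 * rate (assumed pricing model)."""
    return prompt_tokens / 1000 * cost_per_1k


def within_cap(prompt_tokens: int, cost_per_1k: float, cost_cap: float) -> bool:
    """True when the estimated cost does not exceed the chosen cap."""
    return estimated_cost_usd(prompt_tokens, cost_per_1k) <= cost_cap


# Full-book figures quoted above: ~518k tokens at $0.30 per 1k tokens.
cost = estimated_cost_usd(518_000, 0.30)
print(round(cost, 2))                     # 155.4
print(within_cap(518_000, 0.30, 100.0))   # False: narrow the chapter range first
```

With a hypothetical $100 cap the full-book plan fails the check, which is the situation where step 2 calls for narrowing the selection with `--from-chapter`/`--to-chapter`.
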
## Risks still load-bearing
These do not block a one-chapter run but should be re-checked before a
full-book run:
- **Cross-chunk entity dedup.** Exact-title upsert works (the same entity
title across chunks collapses to one file), but near-duplicate dedup
is still a reviewer responsibility. Plan: proceed to a multi-chapter
run only once a chapter's entity list has been hand-pruned.
- **Whole-run resume.** `generate resume` skips a run that has already
completed; it does not skip individual chunks that completed within a
failed run. For a real 24-chapter run that fails midway, the safest
recovery today is a new infospace covering the remaining chapter range,
not a resume on the original.
- **Adaptive routing.** The current single-model run is fine for one
chapter but expensive at full-book scale. The cost-quality routing
layer is parked in `llm-connect` `LLM-WP-0004`; the consumer wiring
is parked in `infospace-bench` `IB-WP-0018`. Either land that work
first or accept the single-model bill.
- **Provider rate drift.** The default rate table in
`src/infospace_bench/model_rates.yaml` captured prices on 2026-05-17.
Refresh before a full-book run if the file is older than 90 days.
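
As an aid to the hand-pruning mentioned under cross-chunk dedup, a reviewer could pre-screen entity titles for near-duplicates. The sketch below uses the standard library's `difflib.SequenceMatcher`; the `near_duplicates` helper and the 0.8 threshold are assumptions for illustration, not part of `infospace-bench`.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Iterable, List, Tuple


def near_duplicates(
    titles: Iterable[str], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return title pairs whose case-insensitive similarity meets the threshold."""
    pairs = []
    # Exact duplicates are removed first; exact-title upsert already handles those.
    for a, b in combinations(sorted(set(titles)), 2):
        ratio = SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


titles = ["Bucket Shop", "Bucket Shops", "Larry Livingston", "The narrator"]
for a, b, score in near_duplicates(titles):
    print(f"{a!r} ~ {b!r} ({score})")
```

String similarity only catches spelling-level variants such as `Bucket Shop` vs `Bucket Shops`. Co-referent titles such as `Larry Livingston` vs `The narrator` share almost no characters and still need a human reviewer.
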
## Sign-off
IB-WP-0016 T01-T07 are done. The pipeline can plan a chapter, run it
against OpenRouter, write a manifest-backed infospace with provider
metadata, record budget and variance, archive the result, and surface
the review-oriented sections that this checklist depends on. The full
Lefevre book is not yet a committed artifact and should not become one
until at least one chapter has cleared the reviewer checklist above.