examples/routing/trading-literature.yaml is the checked-in starting
config for a Lefevre-style run. It applies the IB-WP-0018 task-type
taxonomy: cheap candidates for summary + evaluation, smart candidates
for entity + relation extraction, and a separate baseline rule wiring
claude_code for a follow-on T05 ShadowingAdapter step. Workspace-
relative ledger_path keeps adaptive observations with the workspace.
tests/test_routing_config.py gains a regression test that asserts the
shipped example parses cleanly, every stage in stage_to_task_type maps
to a declared task type, and the baseline candidate uses the
claude_code provider — so the example will not bit-rot silently.
tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke
gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY
opt-in as the existing static smoke. It builds a one-candidate routing
config, runs a single chapter through --provider routing, and asserts
the per-stage adapter-choices report section names the routed model
and the routed artifacts carry adapter_id provenance.
docs/generic-source-generator.md gains a "Live runs with --provider
routing" subsection that walks through the one-command routed run,
explains the --quality-floor override, and points at the parallel
live smoke test.
174 tests pass, 2 skipped (both live smokes, correctly gated).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add a small YAML routing config schema (schema_version 1) and a
parser-only loader at src/infospace_bench/routing_config.py. The
loader validates the declarative shape — task_types with candidates,
optional per-task quality_floor, optional default_quality_floor,
optional ledger_path, optional stage_to_task_type override map — and
refuses bad shapes before any network or workspace work happens.
Supported provider names: openrouter, claude_code, openai, gemini.
Unknown providers, missing required candidate fields, out-of-range
quality floors, negative max_cost_per_1k, duplicate candidate ids
within a task type, and non-mapping stage_to_task_type all raise
focused InfospaceError codes that callers can pattern-match.
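The fail-fast checks listed above can be sketched roughly as follows.
This is a minimal illustration, not the loader in routing_config.py:
ConfigError stands in for InfospaceError, every error code string is
invented for the example, the required candidate fields (id, provider)
are assumed, and quality floors are assumed to live in [0, 1].

```python
SUPPORTED_PROVIDERS = {"openrouter", "claude_code", "openai", "gemini"}


class ConfigError(ValueError):
    """Stand-in for InfospaceError; the real class and codes may differ."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def _check_floor(value):
    # Assumption: floors live in the closed range [0, 1].
    if value is not None and not 0.0 <= value <= 1.0:
        raise ConfigError("quality_floor_out_of_range", str(value))


def validate_routing_config(raw):
    """Check the declarative shape and raise on the first problem found."""
    if raw.get("schema_version") != 1:
        raise ConfigError("bad_schema_version", "expected schema_version 1")
    task_types = raw.get("task_types")
    if not isinstance(task_types, dict) or not task_types:
        raise ConfigError("empty_task_types", "need a non-empty mapping")
    for name, spec in task_types.items():
        candidates = spec.get("candidates") or []
        if not candidates:
            raise ConfigError("empty_candidates", name)
        seen = set()
        for cand in candidates:
            # Assumption: id and provider are the required fields.
            if "id" not in cand or "provider" not in cand:
                raise ConfigError("missing_candidate_field", name)
            if cand["provider"] not in SUPPORTED_PROVIDERS:
                raise ConfigError("unsupported_provider", cand["provider"])
            cost = cand.get("max_cost_per_1k")
            if cost is not None and cost < 0:
                raise ConfigError("negative_cost", cand["id"])
            if cand["id"] in seen:
                raise ConfigError("duplicate_candidate_id", cand["id"])
            seen.add(cand["id"])
        _check_floor(spec.get("quality_floor"))
    _check_floor(raw.get("default_quality_floor"))
    ledger_path = raw.get("ledger_path")
    if ledger_path is not None and not isinstance(ledger_path, str):
        raise ConfigError("bad_ledger_path", "ledger_path must be a string")
    if not isinstance(raw.get("stage_to_task_type", {}), dict):
        raise ConfigError("bad_stage_map", "stage_to_task_type must be a mapping")
    return raw
```

Callers can then branch on the code attribute instead of parsing the
message text, which is the pattern-matching use the commit describes.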
docs/routing-config.md documents the schema with two annotated
examples (OpenRouter-only two-tier, and adaptive with a ClaudeCode
baseline) plus the full "what fails fast" list.
16 parser tests cover happy-path round-trip, file load, missing file,
malformed YAML, and every validation surface (wrong/missing schema
version, empty task_types, empty candidates, missing required fields,
unsupported provider, negative cost, out-of-range quality_floor,
duplicate ids, non-mapping stage_map, non-string ledger_path).
T02 will turn a RoutingConfig into a live llm-connect RoutingPolicy /
AdaptiveRoutingPolicy with constructed LLMAdapter instances.
160 tests pass, 1 skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T01 — task-type taxonomy. docs/routing-task-types.md names the five
generation stages as the default identity-mapped task types
(summarize-source, extract-entities, extract-relations,
evaluate-entity, synthesize-report) and records the recommended quality
floors per stage. The taxonomy explicitly does not decide which adapter
ships per task type, where the ledger lives, or what a quality score
means — those stay with the caller per the LLM-WP-0004 scope guardrail.
T02 — RoutingAssistedGenerationAdapter bridge in
src/infospace_bench/routing.py. Wraps any llm-connect RoutingPolicy or
AdaptiveRoutingPolicy as an infospace-bench AssistedGenerationAdapter:
maps stage_id -> task_type (overridable), resolves an LLMAdapter,
delegates execute_prompt with a configurable RunConfig, and surfaces
the resolved adapter id, task type, model, usage, and finish_reason
back on AssistedGenerationResult.metadata. Provider tag stays
back-compatible with the strings already used in run records and the
budget rollup (openrouter / claude_code / openai / gemini / mock /
routing).
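The stage_id -> task_type step can be pictured with a small sketch. The
function name and the pass-through behaviour for unmapped stage ids are
assumptions made for illustration; only the five stage names and the
identity default come from the taxonomy.

```python
# Identity mapping over the five generation stages named in T01.
DEFAULT_STAGE_TO_TASK_TYPE = {
    stage: stage
    for stage in (
        "summarize-source",
        "extract-entities",
        "extract-relations",
        "evaluate-entity",
        "synthesize-report",
    )
}


def resolve_task_type(stage_id, overrides=None):
    """Map a stage id to a task type, preferring caller overrides."""
    mapping = {**DEFAULT_STAGE_TO_TASK_TYPE, **(overrides or {})}
    # Assumption: an unmapped stage id falls through as its own task type.
    return mapping.get(stage_id, stage_id)
```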
T05 — eight tests in tests/test_routing_adapter.py cover: static-policy
per-stage resolution, stage_to_task_type overrides, default-mapping
completeness, fall-through for unmapped stage ids, the adaptive path
selecting the cheaper qualifying adapter when a quality_floor is set,
adaptive policy falling back to static when no floor is set, response
metadata round-trip with provider tagging, and estimated_cost_per_1k
pass-through.
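The two adaptive behaviours under test can be modelled in a few lines.
This is a toy model, not llm-connect's AdaptiveRoutingPolicy: the
Candidate type, its fields, and the rule that the first candidate is
the static choice are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    adapter_id: str
    cost_per_1k: float
    observed_quality: Optional[float] = None  # None until data exists


def choose(candidates, quality_floor=None):
    """Pick a candidate; candidates[0] plays the role of the static choice."""
    if quality_floor is None:
        # No floor set: behave like the static policy.
        return candidates[0]
    qualifying = [
        c for c in candidates
        if c.observed_quality is not None and c.observed_quality >= quality_floor
    ]
    if not qualifying:
        # Assumption: with nothing above the floor, keep the static choice.
        return candidates[0]
    # Floor set: the cheapest candidate that meets it wins.
    return min(qualifying, key=lambda c: c.cost_per_1k)
```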
Adds llm-connect as a path dependency in pyproject.toml and to the
pytest pythonpath. Static OpenRouter and fixture paths are unchanged;
this commit only adds the option of routing.
139 tests pass, 1 skipped (the OpenRouter live smoke, gated as before).
T03 (shadow-mode integration) and T04 (CLI + per-stage chosen-adapter
in the generation report) follow next.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Enrich reports/generation-summary.md with the review-oriented sections
that the 2026-05-17 smoke run flagged as missing: ## Chapter coverage
(per-chapter source/entity/relation/anchor counts), ## Entities (the
deduped title list), ## Unmapped source chunks (sources with no
downstream generated artifact), and ## Page anchors (total plus
deterministic sample). Sections are conditional on data being present
so generic non-Lefevre runs stay terse.
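The conditional-section rule amounts to skipping any heading that has
no rows. A minimal sketch, assuming a hypothetical render_summary
helper that takes (heading, rows) pairs; the real report builder is not
shown here and may be structured differently.

```python
def render_summary(sections):
    """Render only the sections that have rows, in the given order."""
    lines = []
    for heading, rows in sections:
        if not rows:
            continue  # conditional: an empty section is omitted entirely
        lines.append(f"## {heading}")
        lines.extend(f"- {row}" for row in rows)
        lines.append("")
    return ("\n".join(lines).rstrip() + "\n") if lines else ""
```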
Add docs/lefevre-readiness.md as the final sign-off document for
IB-WP-0016: what is wired (T01-T06 recap), an output policy table
(checked-in fixture sources vs disposable generated infospaces vs
archive targets), a seven-item reviewer checklist (duplicate entities,
relation endpoints, weak evidence, overgeneralization, anchor
coverage, unmapped sources, plan-vs-actual variance), a scale-up plan
from one-chapter to full-book, and the load-bearing risks still
outstanding (cross-chunk dedup, whole-run resume, adaptive routing
deferred to LLM-WP-0004 / IB-WP-0018, rate-table drift).
Closes IB-WP-0016 (Lefevre EPUB3 Infospace Readiness Pilot): T01-T07
all done; the workplan is set to status=done.
131 tests pass, 1 skipped (live OpenRouter smoke, correctly gated).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add --chapter / --from-chapter / --to-chapter / --chunk selection flags
to generate init and generate from-source, plumb them into
init_generation_infospace via a new _filter_chunks_by_chapter helper,
and refuse to create an infospace when the filters reject every chunk
(InfospaceError "empty_chapter_selection"). The flags use the same
T03/T02 plumbing (chapter labels, roman numerals, chunk ids) so a
single-chapter selection is a one-flag command.
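A sketch of the selection logic, under stated assumptions: chunks are
plain dicts with hypothetical id and chapter_number keys, the supplied
filters combine with AND, and EmptySelection stands in for the
InfospaceError raised by the real _filter_chunks_by_chapter helper,
whose signature is not reproduced here.

```python
class EmptySelection(ValueError):
    """Stand-in for InfospaceError("empty_chapter_selection")."""


def filter_chunks(chunks, chapters=(), from_chapter=None, to_chapter=None,
                  chunk_ids=()):
    """Keep chunks that satisfy every filter that was actually supplied."""
    selected = []
    for chunk in chunks:
        number = chunk["chapter_number"]
        if chapters and number not in chapters:
            continue
        if from_chapter is not None and number < from_chapter:
            continue
        if to_chapter is not None and number > to_chapter:
            continue
        if chunk_ids and chunk["id"] not in chunk_ids:
            continue
        selected.append(chunk)
    if not selected:
        # Refuse to continue rather than create an empty infospace.
        raise EmptySelection("empty_chapter_selection")
    return selected
```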
OpenRouter run-record metadata (model, request_id, usage tokens,
retry_count, duration_seconds) already lands in
output/workflows/runs/*.yaml; this task just adds the smoke test that
proves it stays there, plus the parallel guarantee that the same
provider metadata reaches generated artifact provenance via
provenance.provider_metadata.
tests/test_openrouter_live.py covers:
- chapter-filter, from/to-chapter range, and empty-selection failure on
init (non-live, deterministic)
- CLI smoke through generate from-source with --chapter
- a live OpenRouter one-chapter end-to-end smoke, skipped by pytest
unless OPENROUTER_API_KEY and INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER
are set, with an INFOSPACE_BENCH_LIVE_MODEL override (default
openai/gpt-4o-mini)
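The opt-in gate for the live smoke can be expressed as a pure function
over the environment. The helper name is hypothetical, and treating
"1" as the enabling value is an assumption; the environment variable
names and the default model are the ones listed above.

```python
import os

DEFAULT_LIVE_MODEL = "openai/gpt-4o-mini"


def live_smoke_settings(env=None):
    """Return (enabled, model) for the live smoke from environment variables."""
    env = os.environ if env is None else env
    # Assumption: "1" is the enabling value; the real test may accept others.
    opted_in = env.get("INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER") == "1"
    has_key = bool(env.get("OPENROUTER_API_KEY"))
    model = env.get("INFOSPACE_BENCH_LIVE_MODEL") or DEFAULT_LIVE_MODEL
    return opted_in and has_key, model
```

A test module could feed the first element into pytest.mark.skipif so
the live test is reported as skipped, not failed, when either variable
is missing.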
docs/generic-source-generator.md gains a "Live OpenRouter runs (handle
with care)" section that walks plan-before-run, single-chapter live
run, the budget/usage artifacts, and the checks a reviewer should run
before scaling to the full book.
129 tests pass, 1 skipped (the live smoke, correctly gated).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ship a starter model rate table at src/infospace_bench/model_rates.yaml
(prompt_per_1k / completion_per_1k for the OpenRouter models we have
actually touched: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude 3.5 sonnet
and haiku, claude 3 opus, gemini 1.5 flash/pro, llama 3.1 70b) and a
load_rate_table() / estimate_cost_usd() pair that overlays an optional
<workspace>/model-rates.yaml on top of the bundled defaults.
generate run now passes a workspace-aware cost_resolver into
record_run_usage, so cost_usd_estimated lands on every usage bucket
whose model matches the table. Adapter-returned cost still wins
(cost_status="known"); rate-table cost is reported under
cost_status="estimated"; unmatched models are recorded as
cost_status="unknown" rather than silently zeroed. Rate-table file is
listed in pyproject.toml package-data so pip-installed users keep the
defaults.
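The overlay and the known / estimated / unknown precedence can be
sketched as below. The function names and signatures are stand-ins,
not the real load_rate_table() / estimate_cost_usd() API, and the
sketch assumes a workspace entry replaces the bundled entry for the
same model key as a whole.

```python
def overlay_rates(defaults, workspace_rates=None):
    """Workspace entries replace bundled entries with the same model key."""
    merged = {model: dict(rate) for model, rate in defaults.items()}
    for model, rate in (workspace_rates or {}).items():
        merged[model] = dict(rate)
    return merged


def estimate_cost(rates, model, prompt_tokens, completion_tokens):
    """Return an estimated USD cost, or None when the model has no rate."""
    rate = rates.get(model)
    if rate is None:
        return None
    prompt_cost = prompt_tokens / 1000 * rate["prompt_per_1k"]
    completion_cost = completion_tokens / 1000 * rate["completion_per_1k"]
    return prompt_cost + completion_cost


def resolve_cost(adapter_cost_usd, rates, model, prompt_tokens,
                 completion_tokens):
    """Return (cost, cost_status) using the precedence described above."""
    if adapter_cost_usd is not None:
        return adapter_cost_usd, "known"  # adapter-returned cost wins
    estimated = estimate_cost(rates, model, prompt_tokens, completion_tokens)
    if estimated is not None:
        return estimated, "estimated"  # fell back to the rate table
    return None, "unknown"  # never silently zeroed
```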
106 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ship a specialized profile for trading memoirs and market-structure
texts. The profile names eight entity categories (trader, market,
strategy, error, psychological_pattern, institution, instrument,
evidence_bearing_claim), five relation types (cause_effect,
lesson_evidence, risk_mitigation, actor_venue, strategy_outcome), and
four evaluation criteria (groundedness, lesson_clarity,
historical_context, overgeneralization_risk). Each is reflected in the
prompts and contracts so the LLM is steered toward operator-level
findings rather than biographical detail or moralising.
The generic profile remains the default. A 2-chapter Lefevre smoke run
with --profile trading-literature completes end-to-end with viable
metrics; 93 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace generate plan's full-prompt dump with a compact summary that
reports selected-chunk counts, selected chapter numbers, per-workflow
call counts, prompt-word and token estimates, and a rough USD cost when
--cost-per-1k is supplied. Selection filters --chapter (label or number,
repeatable), --from-chapter / --to-chapter (numeric range), and --chunk
(repeatable id) shape the estimate. Budget caps --max-calls and
--cost-cap are reported as exceeds_* booleans so callers can fail fast
before run.
The old full per-workflow plan with prompts remains available behind
--full so deep inspection is opt-in instead of the default.
Whole-Lefevre estimate at default max_words=800: 146 chunks, 730 calls,
~518k prompt tokens, ~$155 at $0.30/1k. Chapters 3-5 only: 19 chunks,
95 calls, ~64k tokens. 87 tests pass.
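The arithmetic behind those figures is simple enough to restate. In
this sketch the function and the exceeds_max_calls / exceeds_cost_cap
key names are illustrative guesses at the exceeds_* fields, the five
calls per chunk are inferred from 730 / 146, and the caps passed in the
example are arbitrary.

```python
def plan_estimate(chunks, calls_per_chunk, prompt_tokens, cost_per_1k=None,
                  max_calls=None, cost_cap=None):
    """Summarise call count, rough cost, and whether any cap is exceeded."""
    calls = chunks * calls_per_chunk
    cost = None if cost_per_1k is None else prompt_tokens / 1000 * cost_per_1k
    return {
        "calls": calls,
        "cost_usd": cost,
        "exceeds_max_calls": max_calls is not None and calls > max_calls,
        "exceeds_cost_cap": (
            cost_cap is not None and cost is not None and cost > cost_cap
        ),
    }


# Whole-book figures from this commit, with arbitrary example caps.
whole_book = plan_estimate(146, 5, 518_000, cost_per_1k=0.30,
                           max_calls=500, cost_cap=100)
```

Here whole_book reports 730 calls and roughly $155, with both exceeds
flags set, so a caller could stop before generate run spends anything.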
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Run a fixture-backed end-to-end smoke against the real Lefevre EPUB
(max-chunks 3) and capture the result in the validation note and the
workplan. The pipeline produces a complete infospace with stable
chapter-01-part-NNN source IDs, full chapter/book/anchor provenance on
every source artifact, viable metrics, and exact-title entity dedupe.
Refresh the workplan validation baseline to reflect the post-T01/T02
state, and add a remaining-gaps section that maps the open issues to the
right follow-on tasks: cost/scope controls and plan preview to T03, the
trading-literature profile to T04, chunk-level resume to T06, and a
richer generation-summary report (entity titles, chapter coverage,
anchor links) to T07.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Resolve chapter labels from EPUB nav entries (when present) and from the
first in-document h1/h2/h3 heading, parse roman-numeral and "Chapter N"
labels into numeric chapter indices, and generate stable IDs of the form
chapter-NN with -part-NNN suffix when a chapter exceeds max_words. The
chunker now operates on cleaned body text, distributes id="Page_*" page
anchors per part via inline markers extracted before splitting, and
supports a configurable overlap_words evidence window between adjacent
parts of the same chapter. Reclassify body sections whose chapter label
matches contents/transcriber-notes/license/colophon tokens so they leave
the body stream by default. Strip <head>...</head> from HTML body
extraction to stop the <title> tag from duplicating heading text in the
chunk markdown.
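Label parsing and ID construction can be illustrated with a short
sketch. The helper names are hypothetical and the roman-numeral
conversion does no validity checking, so a heading made up only of the
letters I, V, X, L, C, D, M would be misread as a number; the real
chunker may guard against that.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    """Convert a roman numeral such as "XIV" to 14 (no validity checks)."""
    total = 0
    for current, following in zip(text, text[1:] + " "):
        value = ROMAN_VALUES[current]
        # A smaller value before a larger one is subtracted (IV, IX, XL).
        total += -value if ROMAN_VALUES.get(following, 0) > value else value
    return total


def chapter_number(label):
    """Return the numeric chapter index for a label, or None if unparseable."""
    cleaned = label.strip().rstrip(".")
    match = re.fullmatch(r"(?i)chapter\s+(\w+)", cleaned)
    token = match.group(1) if match else cleaned
    if token.isdigit():
        return int(token)
    if re.fullmatch(r"[IVXLCDM]+", token.upper()):
        return roman_to_int(token.upper())
    return None


def chunk_id(chapter, part=None):
    """Build chapter-NN, plus -part-NNN when the chapter was split."""
    base = f"chapter-{chapter:02d}"
    return base if part is None else f"{base}-part-{part:03d}"
```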
Real Lefevre EPUB now detects all 24 roman-numeral chapters with stable
chapter-NN IDs, distributes Page_N anchors across multi-part chapters,
and reclassifies Contents and Transcriber's Notes out of body
(role histogram body=67, cover=1, header=1, toc=1, notes=1, footer=2).
82 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Parse META-INF/container.xml and the OPF package document, then iterate
documents in spine reading order instead of archive-name sort. Classify
each spine item (body, cover, nav, toc, header, footer, notes, license,
auxiliary) and exclude non-body sections by default; include_non_body=True
opts them back in for inspection. Capture OPF book metadata (title,
creator, language, subjects, rights, identifier, source_url, modified)
onto every chunk and propagate it through source artifact provenance.
Preserve the legacy zip-without-OPF fallback for malformed EPUBs.
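Resolving spine order needs only the standard library. The sketch
below covers the container.xml -> OPF -> spine lookup and nothing else:
spine_documents is a hypothetical name, and section classification,
metadata capture, and the zip-without-OPF fallback are left out.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(epub_file):
    """Return document paths inside the EPUB zip in spine reading order."""
    with zipfile.ZipFile(epub_file) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf_path = rootfile.attrib["full-path"]
        package = ET.fromstring(archive.read(opf_path))
    # The manifest maps item ids to hrefs relative to the OPF file.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    base = posixpath.dirname(opf_path)
    # The spine lists item ids in reading order.
    return [
        posixpath.normpath(posixpath.join(base, manifest[ref.attrib["idref"]]))
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]
```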
Real Lefevre EPUB now yields 148 body chunks in spine order (was 155
mixed, archive-sorted) with cover=1, header=1, footer=4 detected and
dropped. 78 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Round out IB-WP-0014 with the remaining archive operations and docs.
- restore_archive() and `infospace-bench restore <pkg> --target <dir>`
round-trip a finalized package's bytes back to disk. Both refuse to
overwrite a non-empty target unless --force is passed. --from
<infospace-root> resolves the store location.
- archive-list CLI with --with-retention flag; annotate_retention() opens the
per-infospace registry and joins each record with its current retention
state (effective class, expires, holds, eligibility).
- docs/archive-integration.md covers when to archive, the include set,
retention classes, storage layout, credentials policy, and the explicit
non-goal that S3/git backends live in artifact-store.
- SCOPE.md cross-links the new doc.
- Workplan flipped to status: done. Full pytest suite: 72 passed.
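The overwrite guard on restore can be sketched as follows. The helper
name is hypothetical and the sketch raises built-in exceptions;
restore_archive() itself presumably reports the refusal through
InfospaceError.

```python
from pathlib import Path


def check_restore_target(target, force=False):
    """Create the target if needed; refuse a non-empty one unless forced."""
    path = Path(target)
    if path.exists() and not path.is_dir():
        raise NotADirectoryError(f"restore target is not a directory: {path}")
    if path.is_dir() and any(path.iterdir()) and not force:
        raise FileExistsError(f"restore target is not empty: {path}")
    path.mkdir(parents=True, exist_ok=True)
    return path
```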
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>