tegwick 13f9c1895c IB-WP-0016-T03: scale-aware planning

Replace generate plan's full-prompt dump with a compact summary that
reports selected-chunk counts, selected chapter numbers, per-workflow
call counts, prompt-word and token estimates, and a rough USD cost when
--cost-per-1k is supplied. Selection filters --chapter (label or number,
repeatable), --from-chapter / --to-chapter (numeric range), and --chunk
(repeatable id) shape the estimate. Budget caps --max-calls and
--cost-cap are reported as exceeds_* booleans so callers can fail fast
before run.

The old full per-workflow plan with prompts remains available behind
--full so deep inspection is opt-in instead of the default.

Whole-Lefevre estimate at default max_words=800: 146 chunks, 730 calls,
~518k prompt tokens, ~$155 at $0.30/1k. Chapters 3-5 only: 19 chunks,
95 calls, ~64k tokens. 87 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-17 18:18:09 +02:00

3.5 KiB

id, type, title, domain, repo, status, owner, topic_slug, created, updated, depends_on_workplans, related_workplans, state_hub_workstream_id

IB-WP-0018

workplan

Adaptive LLM Routing — infospace-bench Consumer Wiring

markitect

infospace-bench

blocked

markitect

2026-05-17

LLM-WP-0004

IB-WP-0016

3d38642e-9d6d-4c7f-869f-b185a00bd0e6

IB-WP-0018 — Adaptive LLM Routing — infospace-bench Consumer Wiring

Goal

Wire infospace-bench workflow stages to llm-connect's adaptive cost-quality routing once LLM-WP-0004 ships the primitives. The goal is to let an infospace generation run pick the cheapest model that clears a per-stage quality bar — for example, a small/cheap model for chunk summarisation and a larger model for entity/relation extraction — without hardcoding any specific model in infospace-bench itself.

This workplan is a stub until LLM-WP-0004 tasks T01..T03 (ledger, grader, adaptive policy) are done in llm-connect. The exact task list will be refined once that API is stable.
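
The selection rule described above (cheapest model that clears a per-stage quality bar) can be sketched as follows. This is only an illustration and not llm-connect's API: `Candidate`, `pick_cheapest`, the model names, the prices, and the quality scores are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record: one model with its price and observed quality."""
    model: str
    cost_per_1k_tokens: float  # USD, illustrative value only
    observed_quality: float    # 0.0 to 1.0, illustrative value only


def pick_cheapest(
    candidates: Sequence[Candidate], quality_floor: float
) -> Optional[Candidate]:
    """Return the cheapest candidate whose quality meets the floor, or None."""
    eligible = [c for c in candidates if c.observed_quality >= quality_floor]
    return min(eligible, key=lambda c: c.cost_per_1k_tokens, default=None)


# Made-up candidates; a real run would read these from the quality ledger.
candidates = [
    Candidate("small-model", 0.05, 0.78),
    Candidate("medium-model", 0.30, 0.88),
    Candidate("large-model", 1.20, 0.95),
]

# A lenient floor for summarisation, a stricter one for extraction.
summarise_choice = pick_cheapest(candidates, quality_floor=0.75)
extract_choice = pick_cheapest(candidates, quality_floor=0.90)
```

With these placeholder numbers the summarisation stage resolves to the small model and the extraction stage to the large one. A floor that no candidate meets yields `None`, which a caller would need to treat as a configuration error or a fallback to the static adapter.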

Status

Blocked on LLM-WP-0004 T01..T03.

Why this is a separate workplan

IB-WP-0016 brings the Lefevre EPUB pipeline to a state where a chapter-by-chapter live OpenRouter run is feasible. That work uses OpenRouterAssistedGenerationAdapter directly. Replacing that direct adapter with a task-typed adaptive route is a meaningful architectural shift that deserves its own scope, baseline, and tests, rather than being absorbed into IB-WP-0016.

Provisional Tasks (refined when LLM-WP-0004 lands)

T01 — Task-type taxonomy

Name the generation stages as task types for routing (summarize-source, extract-entities, extract-relations, evaluate-entity, synthesize-report)
Document quality expectations for each task type so a per-stage quality floor can be set
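
One possible shape for the taxonomy is an enum keyed by the stage names listed above, paired with a floor per task type. The five string values come from this workplan; the `TaskType` class, the `QUALITY_FLOORS` table, and every numeric floor are assumptions for illustration.

```python
from enum import Enum
from typing import Dict


class TaskType(str, Enum):
    """Generation stages named as routing task types."""
    SUMMARIZE_SOURCE = "summarize-source"
    EXTRACT_ENTITIES = "extract-entities"
    EXTRACT_RELATIONS = "extract-relations"
    EVALUATE_ENTITY = "evaluate-entity"
    SYNTHESIZE_REPORT = "synthesize-report"


# Placeholder floors (assumed values, not measured expectations).
QUALITY_FLOORS: Dict[TaskType, float] = {
    TaskType.SUMMARIZE_SOURCE: 0.75,
    TaskType.EXTRACT_ENTITIES: 0.90,
    TaskType.EXTRACT_RELATIONS: 0.90,
    TaskType.EVALUATE_ENTITY: 0.85,
    TaskType.SYNTHESIZE_REPORT: 0.80,
}


def quality_floor(stage_name: str) -> float:
    """Look up the floor for a stage name; unknown names raise ValueError."""
    return QUALITY_FLOORS[TaskType(stage_name)]
```

Keeping the floors in one table next to the enum means a new stage cannot be routed until someone has written down its quality expectation.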

T02 — Adapter swap

Introduce a small router-aware adapter that wraps AdaptiveRoutingPolicy.resolve(task_type) and exposes the existing AssistedGenerationAdapter protocol used by workflow.py
Keep OpenRouterAssistedGenerationAdapter available as the static baseline so deterministic test runs and fixture mode continue to work
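
A rough sketch of the wrapper follows. The real method set of `AssistedGenerationAdapter` and the return type of `AdaptiveRoutingPolicy.resolve(task_type)` are not given in this workplan, so the single `generate(prompt)` method and the "resolve returns an adapter" behaviour are assumptions. `RouterAwareAdapter`, `EchoAdapter`, and `FixedPolicy` are hypothetical names.

```python
from typing import Dict, Protocol


class AssistedGenerationAdapter(Protocol):
    """Stand-in for the protocol used by workflow.py (method set assumed)."""

    def generate(self, prompt: str) -> str: ...


class RoutingPolicy(Protocol):
    """Stand-in for AdaptiveRoutingPolicy (return type of resolve assumed)."""

    def resolve(self, task_type: str) -> AssistedGenerationAdapter: ...


class RouterAwareAdapter:
    """Looks like an ordinary adapter but delegates model choice to a policy."""

    def __init__(self, policy: RoutingPolicy, task_type: str) -> None:
        self._policy = policy
        self._task_type = task_type

    def generate(self, prompt: str) -> str:
        # Resolve on every call so the choice can change as observations arrive.
        return self._policy.resolve(self._task_type).generate(prompt)


class EchoAdapter:
    """Deterministic fake adapter for demonstration; makes no live call."""

    def __init__(self, name: str) -> None:
        self._name = name

    def generate(self, prompt: str) -> str:
        return f"[{self._name}] {prompt}"


class FixedPolicy:
    """Deterministic fake policy: a fixed task-type to adapter table."""

    def __init__(self, table: Dict[str, AssistedGenerationAdapter]) -> None:
        self._table = table

    def resolve(self, task_type: str) -> AssistedGenerationAdapter:
        return self._table[task_type]


policy = FixedPolicy({
    "summarize-source": EchoAdapter("small"),
    "extract-entities": EchoAdapter("large"),
})
adapter = RouterAwareAdapter(policy, "summarize-source")
reply = adapter.generate("hello")
```

Because the wrapper satisfies the same protocol as the static adapter, the calling code would not need to know whether routing is active.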

T03 — Baseline + shadow integration

Use ClaudeCodeAdapter as the default baseline grader (subject to availability)
Enable ShadowingAdapter for the first multi-chapter run so the quality ledger fills up while real generation proceeds
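
The shadowing idea can be illustrated with a toy wrapper: the caller receives the primary output, while a baseline output is produced on the side and a graded observation is appended to a ledger. How `ShadowingAdapter` and the baseline grader actually work is not specified here. `ShadowingSketch`, `Observation`, and the word-overlap grader are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Observation:
    """Hypothetical ledger row: one graded output for one task type."""
    task_type: str
    quality: float


def word_overlap(candidate: str, reference: str) -> float:
    """Toy grader: Jaccard overlap of whitespace-separated words."""
    a, b = set(candidate.split()), set(reference.split())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class ShadowingSketch:
    """Returns the primary output and records a graded observation."""

    def __init__(
        self,
        primary: Callable[[str], str],
        baseline: Callable[[str], str],
        ledger: List[Observation],
        task_type: str,
    ) -> None:
        self._primary = primary
        self._baseline = baseline
        self._ledger = ledger
        self._task_type = task_type

    def generate(self, prompt: str) -> str:
        output = self._primary(prompt)
        reference = self._baseline(prompt)  # shadow call, never returned
        score = word_overlap(output, reference)
        self._ledger.append(Observation(self._task_type, score))
        return output


ledger: List[Observation] = []
shadowed = ShadowingSketch(
    primary=lambda prompt: "alpha beta",
    baseline=lambda prompt: "alpha beta gamma",
    ledger=ledger,
    task_type="summarize-source",
)
result = shadowed.generate("summarise chapter 3")
```

Each shadowed call costs one extra baseline call, which is why limiting it to the first multi-chapter run keeps the overhead bounded.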

T04 — Cost/quality reporting

Surface per-stage chosen adapter, observed quality, and cumulative cost in reports/generation-summary.md
Add a small CLI helper to print the ledger summary for an infospace
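
A per-stage summary of this kind could be aggregated roughly as below. The `LedgerEntry` fields and the table layout are assumptions, since the llm-connect ledger schema is not described in this workplan, and the sample entries are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LedgerEntry:
    """Hypothetical ledger row for one generation call."""
    stage: str
    adapter: str
    quality: float
    cost_usd: float


def summarise(entries: Iterable[LedgerEntry]) -> List[str]:
    """Render one Markdown table row per stage, sorted by stage name."""
    by_stage: Dict[str, List[LedgerEntry]] = defaultdict(list)
    for entry in entries:
        by_stage[entry.stage].append(entry)

    lines = [
        "| stage | adapter | mean quality | cost (USD) |",
        "| --- | --- | --- | --- |",
    ]
    for stage in sorted(by_stage):
        rows = by_stage[stage]
        adapters = ", ".join(sorted({r.adapter for r in rows}))
        mean_quality = sum(r.quality for r in rows) / len(rows)
        total_cost = sum(r.cost_usd for r in rows)
        lines.append(
            f"| {stage} | {adapters} | {mean_quality:.2f} | {total_cost:.4f} |"
        )
    return lines


sample = [
    LedgerEntry("summarize-source", "small", 0.80, 0.0010),
    LedgerEntry("summarize-source", "small", 0.90, 0.0020),
    LedgerEntry("extract-entities", "large", 0.95, 0.0100),
]
table = summarise(sample)
```

The same list of lines could be printed by a CLI helper or written into the generation summary, so both outputs stay consistent.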

T05 — Tests

Fixture-backed test that routes through a deterministic adaptive policy with mocked observations
Regression test that demonstrates the static path still works when the router is bypassed
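
The two tests might take roughly this shape. `run_stage` and `static_adapter` are hypothetical stand-ins for the real workflow entry point and the static baseline, and a plain dict stands in for a deterministic policy; none of these names come from the codebase.

```python
from typing import Callable, Dict, Optional

Adapter = Callable[[str], str]


def static_adapter(prompt: str) -> str:
    """Fixture stand-in for the static baseline path; makes no live call."""
    return f"static:{prompt}"


def run_stage(
    prompt: str,
    task_type: str,
    router: Optional[Dict[str, Adapter]] = None,
) -> str:
    """Use the router's choice when one is given, else the static adapter."""
    adapter = static_adapter if router is None else router[task_type]
    return adapter(prompt)


def test_routed_stage_uses_policy_choice() -> None:
    # Deterministic "policy": a fixed mapping, so the test needs no network.
    router = {"summarize-source": lambda p: f"small:{p}"}
    assert run_stage("ch1", "summarize-source", router) == "small:ch1"


def test_static_path_survives_router_bypass() -> None:
    # Regression guard: omitting the router must fall back to the static path.
    assert run_stage("ch1", "summarize-source") == "static:ch1"
```

Both functions follow the pytest naming convention, so a test runner would collect them without extra configuration.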

Acceptance

An infospace generation run can be configured to use the adaptive router without any code change inside workflow.py
A multi-chapter Lefevre run completes with per-stage adapter choices recorded in the generation summary
The fixture-mode test suite continues to pass with no live calls
The static OpenRouterAssistedGenerationAdapter path remains usable for callers that opt out of the router

Non-Goals

Authoring the routing primitives themselves (that is LLM-WP-0004's job)
Owning a task-type taxonomy beyond infospace-bench workflow stages
Embedding cost or quality observations inside infospace-bench beyond what the llm-connect ledger already records
