tegwick cb37a7f408 IB-WP-0018: stub workplan for adaptive LLM routing consumer wiring

Blocked stub that names the dependency on llm-connect WP-0004 (adaptive
cost-quality routing). Activates once T01..T03 of that workplan land
and the QualityLedger / BaselineGrader / AdaptiveRoutingPolicy APIs are
stable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-17 17:26:36 +02:00

id, type, title, domain, repo, status, owner, topic_slug, created, updated, depends_on_workplans, related_workplans

IB-WP-0018

workplan

Adaptive LLM Routing — infospace-bench Consumer Wiring

markitect

infospace-bench

blocked

markitect

2026-05-17

LLM-WP-0004

IB-WP-0016

IB-WP-0018 — Adaptive LLM Routing — infospace-bench Consumer Wiring

Goal

Wire infospace-bench workflow stages to llm-connect's adaptive cost-quality routing once LLM-WP-0004 ships the primitives. The goal is to let an infospace generation run pick the cheapest model that clears a per-stage quality bar — for example, a small/cheap model for chunk summarisation and a larger model for entity/relation extraction — without hardcoding any specific model in infospace-bench itself.
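The selection rule described above (cheapest model that clears a per-stage quality bar) can be sketched as follows. This is a minimal illustration, not the llm-connect implementation: the `Candidate` type, the model names, the cost units, and the 0.0 to 1.0 quality scale are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of one model's cost and observed quality for a stage."""
    model: str
    cost_per_call: float      # illustrative units, not real pricing
    observed_quality: float   # assumed 0.0..1.0 scale


def cheapest_clearing(candidates: list, quality_floor: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose quality meets the floor, else None."""
    eligible = [c for c in candidates if c.observed_quality >= quality_floor]
    return min(eligible, key=lambda c: c.cost_per_call, default=None)


# Made-up numbers, purely for illustration.
candidates = [
    Candidate("small-model", cost_per_call=0.2, observed_quality=0.81),
    Candidate("large-model", cost_per_call=1.5, observed_quality=0.93),
]

summary_choice = cheapest_clearing(candidates, quality_floor=0.75)
extraction_choice = cheapest_clearing(candidates, quality_floor=0.90)
```

With these placeholder numbers, a lenient floor selects the small model and a strict floor selects the large one, which mirrors the summarisation versus extraction example in the goal.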

This workplan remains a stub until LLM-WP-0004 tasks T01..T03 (ledger, grader, adaptive policy) are complete in llm-connect. The task list below is provisional and will be refined once that API is stable.

Status

Blocked on LLM-WP-0004 T01..T03.

Why this is a separate workplan

IB-WP-0016 brings the Lefevre EPUB pipeline to a state where a chapter-by-chapter live OpenRouter run is feasible. That work uses OpenRouterAssistedGenerationAdapter directly. Replacing that direct adapter with a task-typed adaptive route is a meaningful architectural shift that deserves its own scope, baseline, and tests, rather than being absorbed into IB-WP-0016.

Provisional Tasks (refined when LLM-WP-0004 lands)

T01 — Task-type taxonomy

Name the generation stages as task types for routing (summarize-source, extract-entities, extract-relations, evaluate-entity, synthesize-report)
Document quality expectations for each task type so a per-stage quality floor can be set
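One possible shape for the taxonomy, sketched under assumptions: the task-type names come from the list above, but the `TaskType` enum, the `QUALITY_FLOORS` mapping, and every numeric floor are placeholders, not measured or agreed values.

```python
from enum import Enum


class TaskType(str, Enum):
    """Generation stages named as routing task types (names taken from T01)."""
    SUMMARIZE_SOURCE = "summarize-source"
    EXTRACT_ENTITIES = "extract-entities"
    EXTRACT_RELATIONS = "extract-relations"
    EVALUATE_ENTITY = "evaluate-entity"
    SYNTHESIZE_REPORT = "synthesize-report"


# Placeholder floors on an assumed 0.0..1.0 scale; real values would come
# from the documented quality expectations for each task type.
QUALITY_FLOORS = {
    TaskType.SUMMARIZE_SOURCE: 0.70,
    TaskType.EXTRACT_ENTITIES: 0.90,
    TaskType.EXTRACT_RELATIONS: 0.90,
    TaskType.EVALUATE_ENTITY: 0.85,
    TaskType.SYNTHESIZE_REPORT: 0.80,
}


def floor_for(task_type: str) -> float:
    """Look up the floor for a task-type string; unknown names raise ValueError."""
    return QUALITY_FLOORS[TaskType(task_type)]
```

Keeping the floors in one mapping keyed by the enum makes it easy to check that every stage has a documented expectation.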

T02 — Adapter swap

Introduce a small router-aware adapter that wraps AdaptiveRoutingPolicy.resolve(task_type) and implements the existing AssistedGenerationAdapter protocol used by workflow.py
Keep OpenRouterAssistedGenerationAdapter available as the static baseline so deterministic test runs and fixture mode continue to work
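A rough sketch of the adapter swap, assuming the protocol has a single `generate(task_type, prompt)` method and that the policy's `resolve(task_type)` returns an adapter. Neither signature is confirmed by this workplan; `GenerationAdapter`, `RoutingPolicy`, `RouterAwareAdapter`, `EchoAdapter`, and `TablePolicy` are hypothetical stand-ins, and the fallback-on-lookup-failure behaviour is an assumption.

```python
from typing import Protocol


class GenerationAdapter(Protocol):
    """Stand-in for AssistedGenerationAdapter; the real method set is assumed."""

    def generate(self, task_type: str, prompt: str) -> str: ...


class RoutingPolicy(Protocol):
    """Stand-in for AdaptiveRoutingPolicy; only resolve(task_type) is modelled."""

    def resolve(self, task_type: str) -> GenerationAdapter: ...


class RouterAwareAdapter:
    """Delegates each call to whichever adapter the policy resolves."""

    def __init__(self, policy: RoutingPolicy, fallback: GenerationAdapter) -> None:
        self._policy = policy
        self._fallback = fallback

    def generate(self, task_type: str, prompt: str) -> str:
        try:
            chosen = self._policy.resolve(task_type)
        except LookupError:
            # Assumption: an unroutable task type falls back to the static adapter.
            chosen = self._fallback
        return chosen.generate(task_type, prompt)


class EchoAdapter:
    """Deterministic fake used only for this example."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, task_type: str, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class TablePolicy:
    """Fixed routing table; a missing key raises KeyError, a LookupError subclass."""

    def __init__(self, routes: dict) -> None:
        self._routes = routes

    def resolve(self, task_type: str) -> GenerationAdapter:
        return self._routes[task_type]


router = RouterAwareAdapter(
    policy=TablePolicy({"summarize-source": EchoAdapter("small")}),
    fallback=EchoAdapter("static"),
)
```

Because the wrapper satisfies the same protocol as the static adapter, a caller written against that protocol would not need to know which of the two it received.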

T03 — Baseline + shadow integration

Use ClaudeCodeAdapter as the default baseline grader (subject to availability)
Enable ShadowingAdapter for the first multi-chapter run so the quality ledger fills up while real generation proceeds
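The shadowing idea can be illustrated as follows: the primary output is returned to the caller while a baseline output is produced alongside it and a grade is appended to a ledger. This is a sketch under assumptions; `ShadowRunner`, the callable signatures, the list-based ledger, and the exact-match grader are hypothetical and do not describe the real ShadowingAdapter, QualityLedger, or BaselineGrader APIs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ShadowRunner:
    """Hypothetical shadow wrapper: serve the primary, grade against a baseline."""
    primary: Callable[[str], str]
    baseline: Callable[[str], str]
    grade: Callable[[str, str], float]
    ledger: list = field(default_factory=list)  # stand-in for a quality ledger

    def generate(self, task_type: str, prompt: str) -> str:
        output = self.primary(prompt)
        reference = self.baseline(prompt)
        # Record the observation; the caller still receives the primary output.
        self.ledger.append(
            {"task_type": task_type, "quality": self.grade(output, reference)}
        )
        return output


def exact_match(output: str, reference: str) -> float:
    """Toy grader for the example: 1.0 on identical text, otherwise 0.0."""
    return 1.0 if output == reference else 0.0


runner = ShadowRunner(
    primary=lambda prompt: prompt.upper(),
    baseline=lambda prompt: prompt.upper(),
    grade=exact_match,
)
result = runner.generate("summarize-source", "chapter one")
```

In this shape the ledger grows by one entry per call, so a multi-chapter run would accumulate observations per task type without changing what the workflow receives.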

T04 — Cost/quality reporting

Surface per-stage chosen adapter, observed quality, and cumulative cost in reports/generation-summary.md
Add a small CLI helper to print the ledger summary for an infospace
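As an illustration of the reporting task, the sketch below renders per-stage records into a Markdown table with a running cost total. The `StageRecord` fields, the column layout, and the sample values are assumptions; the actual format of reports/generation-summary.md is not defined in this workplan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageRecord:
    """Hypothetical per-stage entry; field names are illustrative."""
    stage: str
    adapter: str
    quality: float
    cost: float


def render_stage_summary(records: list) -> str:
    """Render records as a Markdown table with a cumulative cost column."""
    lines = [
        "| Stage | Adapter | Quality | Cumulative cost |",
        "| --- | --- | --- | --- |",
    ]
    running = 0.0
    for record in records:
        running += record.cost
        lines.append(
            f"| {record.stage} | {record.adapter} "
            f"| {record.quality:.2f} | {running:.4f} |"
        )
    return "\n".join(lines)


# Made-up values, purely for illustration.
report = render_stage_summary([
    StageRecord("summarize-source", "small-model", quality=0.80, cost=0.25),
    StageRecord("extract-entities", "large-model", quality=0.95, cost=0.50),
])
```

The same function could back a CLI helper by printing its return value for a given infospace.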

T05 — Tests

Fixture-backed test that routes through a deterministic adaptive policy with mocked observations
Regression test that demonstrates the static path still works when the router is bypassed
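The two tests above might look roughly like this. It is a sketch only: `FixtureAdapter` and `DeterministicPolicy` are hypothetical stand-ins that make no live calls, and the mocked observations are invented numbers rather than real ledger data.

```python
import unittest


class FixtureAdapter:
    """Deterministic stand-in that returns canned text and makes no live calls."""

    def __init__(self, name):
        self.name = name

    def generate(self, task_type, prompt):
        return f"{self.name}:{task_type}"


class DeterministicPolicy:
    """Picks the cheapest adapter whose mocked quality clears the floor."""

    def __init__(self, adapters, observations, floor):
        self._adapters = adapters          # {name: adapter}
        self._observations = observations  # {name: (cost, quality)}
        self._floor = floor

    def resolve(self, task_type):
        eligible = [
            (cost, name)
            for name, (cost, quality) in self._observations.items()
            if quality >= self._floor
        ]
        if not eligible:
            raise LookupError(f"no adapter clears the floor for {task_type}")
        return self._adapters[min(eligible)[1]]


class RoutingTests(unittest.TestCase):
    def setUp(self):
        self.adapters = {
            "cheap": FixtureAdapter("cheap"),
            "strong": FixtureAdapter("strong"),
        }
        self.observations = {"cheap": (1.0, 0.70), "strong": (5.0, 0.95)}

    def test_adaptive_route_follows_mocked_observations(self):
        lenient = DeterministicPolicy(self.adapters, self.observations, floor=0.6)
        strict = DeterministicPolicy(self.adapters, self.observations, floor=0.9)
        self.assertEqual(lenient.resolve("summarize-source").name, "cheap")
        self.assertEqual(strict.resolve("extract-entities").name, "strong")

    def test_static_path_works_without_router(self):
        static = FixtureAdapter("static")
        self.assertEqual(
            static.generate("summarize-source", "text"),
            "static:summarize-source",
        )
```

Saved as a module, these cases could be run with `python -m unittest`; no network access is involved.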

Acceptance

An infospace generation run can be configured to use the adaptive router without any code change inside workflow.py
A multi-chapter Lefevre run completes with per-stage adapter choices recorded in the generation summary
The fixture-mode test suite continues to pass with no live calls
The static OpenRouterAssistedGenerationAdapter path remains usable for callers that opt out of the router

Non-Goals

Authoring the routing primitives themselves (that is LLM-WP-0004's job)
Owning a task-type taxonomy beyond infospace-bench workflow stages
Embedding cost or quality observations inside infospace-bench beyond what the llm-connect ledger already records
