tegwick 95779bae02 Normalize agent instructions and workplan frontmatter (STATE-WP-0067)

- Align agent files with on-disk workplan prefixes (infer from workplan ids)
- Set workplan domain to registered domain_slug; add topic_slug where applicable
- Repair frontmatter delimiter formatting; migrate legacy task status literals
- Regenerate AGENTS.md, CLAUDE.md, and .claude/rules from State Hub templates

2026-06-22 23:16:25 +02:00

id, type, title, domain, repo, status, owner, topic_slug, created, updated, depends_on_workplans, related_workplans, state_hub_workstream_id

IB-WP-0018

workplan

Adaptive LLM Routing — infospace-bench Consumer Wiring

communication

infospace-bench

done

markitect

2026-05-17

2026-05-18

LLM-WP-0004

IB-WP-0016

3d38642e-9d6d-4c7f-869f-b185a00bd0e6

IB-WP-0018 — Adaptive LLM Routing — infospace-bench Consumer Wiring

Goal

Wire infospace-bench workflow stages to llm-connect's adaptive cost-quality routing once LLM-WP-0004 ships the primitives. The goal is to let an infospace generation run pick the cheapest model that clears a per-stage quality bar — for example, a small/cheap model for chunk summarisation and a larger model for entity/relation extraction — without hardcoding any specific model in infospace-bench itself.

This workplan is a stub until LLM-WP-0004 tasks T01..T03 (ledger, grader, adaptive policy) are done in llm-connect. The exact task list will be refined once that API is stable.
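The selection rule in the goal (the cheapest model that clears a per-stage quality bar) can be sketched as below. This is a minimal illustration, not llm-connect's implementation: the `Candidate` record, the model names, the costs, and the quality figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one model's ledger statistics for a task type."""
    model: str
    cost_per_call: float     # illustrative cost unit, not a real price
    observed_quality: float  # mean graded quality in [0, 1]


def cheapest_clearing(candidates: list[Candidate], quality_floor: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose observed quality meets the floor."""
    eligible = [c for c in candidates if c.observed_quality >= quality_floor]
    # default=None covers the case where no candidate clears the floor.
    return min(eligible, key=lambda c: c.cost_per_call, default=None)


candidates = [
    Candidate("small-model", cost_per_call=0.001, observed_quality=0.82),
    Candidate("large-model", cost_per_call=0.020, observed_quality=0.95),
]

# A lenient floor (e.g. chunk summarisation) admits the small model.
summary_choice = cheapest_clearing(candidates, quality_floor=0.80)
# A stricter floor (e.g. entity/relation extraction) forces the large model.
extraction_choice = cheapest_clearing(candidates, quality_floor=0.90)
```

With these made-up numbers the same candidate pool yields a different model per stage, which is the behaviour the workplan wants without naming any model in infospace-bench.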

Status

Done. LLM-WP-0004 landed QualityLedger, QualityObservation, BaselineGrader, PairedGrader, ExactMatchJudge, EmbeddingSimilarityJudge, LLMJudge, AdaptiveRoutingPolicy, and ShadowingAdapter in llm-connect; the five tasks below are all complete.

T01 — task-type taxonomy (docs/routing-task-types.md)
T02 — RoutingAssistedGenerationAdapter bridge in src/infospace_bench/routing.py
T03 — wrap_with_shadow_sampling() helper that, on an opt-in basis, installs llm-connect's ShadowingAdapter around any candidate
T04 — "## Per-stage adapter choices" section in reports/generation-summary.md (driven from artifact provenance.provider_metadata) and the "infospace-bench routing ledger" CLI subcommand
T05 — tests/test_routing_adapter.py (13 tests, including a CLI smoke and the adapter-choices unit test)

Why this is a separate workplan

IB-WP-0016 brings the Lefevre EPUB pipeline to a state where a chapter-by-chapter live OpenRouter run is feasible. That work uses OpenRouterAssistedGenerationAdapter directly. Replacing that direct adapter with a task-typed adaptive route is a meaningful architectural shift that deserves its own scope, baseline, and tests, rather than being absorbed into IB-WP-0016.

Provisional Tasks (refined when LLM-WP-0004 lands)

T01 — Task-type taxonomy

Name the generation stages as task types for routing (summarize-source, extract-entities, extract-relations, evaluate-entity, synthesize-report)
Document quality expectations for each task type so a per-stage quality floor can be set
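One way to encode the taxonomy is an enum keyed by the task-type names above, paired with a floor per type. The sketch below assumes floors are plain numbers in [0, 1]; the values are placeholders, and `QUALITY_FLOORS` and `quality_floor_for` are hypothetical names, not part of docs/routing-task-types.md.

```python
from enum import Enum


class TaskType(str, Enum):
    """Generation stages named as routing task types (names from the list above)."""
    SUMMARIZE_SOURCE = "summarize-source"
    EXTRACT_ENTITIES = "extract-entities"
    EXTRACT_RELATIONS = "extract-relations"
    EVALUATE_ENTITY = "evaluate-entity"
    SYNTHESIZE_REPORT = "synthesize-report"


# Placeholder floors for illustration only; real values would come from
# the documented quality expectations for each task type.
QUALITY_FLOORS: dict[TaskType, float] = {
    TaskType.SUMMARIZE_SOURCE: 0.80,
    TaskType.EXTRACT_ENTITIES: 0.90,
    TaskType.EXTRACT_RELATIONS: 0.90,
    TaskType.EVALUATE_ENTITY: 0.85,
    TaskType.SYNTHESIZE_REPORT: 0.85,
}


def quality_floor_for(task_type: str) -> float:
    """Look up the floor for a task-type string; unknown names raise ValueError."""
    return QUALITY_FLOORS[TaskType(task_type)]
```

Validating the string through the enum means a misspelled stage name fails loudly instead of silently routing with no floor.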

T02 — Adapter swap

Introduce a small router-aware adapter that wraps AdaptiveRoutingPolicy.resolve(task_type) and exposes the existing AssistedGenerationAdapter protocol used by workflow.py
Keep OpenRouterAssistedGenerationAdapter available as the static baseline so deterministic test runs and fixture mode continue to work
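The adapter swap can be pictured as a thin wrapper that resolves a concrete adapter per call and delegates to it. The sketch below uses stand-in protocols: only `resolve(task_type)` is taken from this workplan, while the `generate(prompt)` method, the assumption that `resolve` returns an adapter, and the `RouterAwareAdapter`, `FixedPolicy`, and `EchoAdapter` classes are hypothetical.

```python
from typing import Protocol


class AssistedGenerationAdapter(Protocol):
    """Stand-in for the protocol used by workflow.py; the method name is assumed."""
    def generate(self, prompt: str) -> str: ...


class RoutingPolicy(Protocol):
    """Stand-in for AdaptiveRoutingPolicy; the return type of resolve is assumed."""
    def resolve(self, task_type: str) -> AssistedGenerationAdapter: ...


class RouterAwareAdapter:
    """Exposes the adapter protocol while delegating the model choice to a policy."""

    def __init__(self, policy: RoutingPolicy, task_type: str) -> None:
        self._policy = policy
        self._task_type = task_type

    def generate(self, prompt: str) -> str:
        # Resolve on every call so the choice can follow new ledger observations.
        chosen = self._policy.resolve(self._task_type)
        return chosen.generate(prompt)


class EchoAdapter:
    """Deterministic fake adapter for fixture-style runs (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class FixedPolicy:
    """Deterministic fake policy backed by a lookup table (hypothetical)."""

    def __init__(self, table: dict[str, AssistedGenerationAdapter]) -> None:
        self._table = table

    def resolve(self, task_type: str) -> AssistedGenerationAdapter:
        return self._table[task_type]


policy = FixedPolicy({
    "summarize-source": EchoAdapter("small"),
    "extract-entities": EchoAdapter("large"),
})
summarizer = RouterAwareAdapter(policy, "summarize-source")
extractor = RouterAwareAdapter(policy, "extract-entities")
```

Because the wrapper satisfies the same protocol as a static adapter, a caller can pass either one, which is what keeps the static baseline usable alongside the router.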

T03 — Baseline + shadow integration

Use ClaudeCodeAdapter as the default baseline grader (subject to availability)
Enable ShadowingAdapter for the first multi-chapter run so the quality ledger fills up while real generation proceeds
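Shadow sampling, in general terms, serves the primary output while occasionally also running a baseline on the same prompt and recording the pair for grading. The sketch below illustrates that idea only; it is not llm-connect's ShadowingAdapter, and `shadowed`, `record`, and `sample_rate` are hypothetical names.

```python
import random
from typing import Callable, Optional

Generate = Callable[[str], str]
Record = Callable[[str, str, str], None]


def shadowed(
    primary: Generate,
    baseline: Generate,
    record: Record,
    sample_rate: float,
    rng: Optional[random.Random] = None,
) -> Generate:
    """Wrap `primary` so a fraction of calls also run `baseline` for comparison."""
    rng = rng or random.Random()

    def generate(prompt: str) -> str:
        output = primary(prompt)
        # random() is in [0, 1), so rate 1.0 always samples and 0.0 never does.
        if rng.random() < sample_rate:
            record(prompt, output, baseline(prompt))
        # The caller always receives the primary output, sampled or not.
        return output

    return generate


observations: list[tuple[str, str, str]] = []


def remember(prompt: str, primary_out: str, baseline_out: str) -> None:
    """Store one paired observation; a real run would hand it to a grader."""
    observations.append((prompt, primary_out, baseline_out))


always = shadowed(str.upper, str.lower, remember, sample_rate=1.0)
never = shadowed(str.upper, str.lower, remember, sample_rate=0.0)
```

Keeping the sample rate as an explicit argument is what makes the behaviour opt-in: a rate of zero leaves the primary path untouched and adds no baseline cost.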

T04 — Cost/quality reporting

Surface per-stage chosen adapter, observed quality, and cumulative cost in reports/generation-summary.md
Add a small CLI helper to print the ledger summary for an infospace
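A per-stage summary of this kind can be produced by grouping artifacts by stage and chosen adapter and summing cost. The sketch below assumes each artifact is a dict with a `stage` key and a `provenance.provider_metadata` mapping holding `adapter` and `cost`; those key names and the output line format are assumptions, not the actual artifact schema.

```python
from collections import defaultdict


def render_adapter_choices(artifacts: list[dict]) -> str:
    """Render a Markdown section listing adapter usage and cumulative cost per stage."""
    # (stage, adapter) -> [call count, cumulative cost]
    totals: dict[tuple[str, str], list] = defaultdict(lambda: [0, 0.0])
    for artifact in artifacts:
        metadata = artifact["provenance"]["provider_metadata"]
        key = (artifact["stage"], metadata["adapter"])
        totals[key][0] += 1
        totals[key][1] += metadata.get("cost", 0.0)

    lines = ["## Per-stage adapter choices", ""]
    # Sort for stable output so the report diffs cleanly between runs.
    for (stage, adapter), (calls, cost) in sorted(totals.items()):
        lines.append(f"- {stage}: {adapter} (calls={calls}, cost={cost:.4f})")
    return "\n".join(lines)


def _artifact(stage: str, adapter: str, cost: float) -> dict:
    """Build a minimal fake artifact for the example (hypothetical shape)."""
    return {"stage": stage, "provenance": {"provider_metadata": {"adapter": adapter, "cost": cost}}}


section = render_adapter_choices([
    _artifact("summarize-source", "small-model", 0.001),
    _artifact("summarize-source", "small-model", 0.001),
    _artifact("extract-entities", "large-model", 0.020),
])
```

Reading only from artifact provenance keeps the report a pure function of what was already recorded, so it adds no cost or quality bookkeeping inside infospace-bench.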

T05 — Tests

Fixture-backed test that routes through a deterministic adaptive policy with mocked observations
Regression test that demonstrates the static path still works when the router is bypassed

Acceptance

An infospace generation run can be configured to use the adaptive router without any code change inside workflow.py
A multi-chapter Lefevre run completes with per-stage adapter choices recorded in the generation summary
The fixture-mode test suite continues to pass with no live calls
The static OpenRouterAssistedGenerationAdapter path remains usable for callers that opt out of the router

Non-Goals

Authoring the routing primitives themselves (that is LLM-WP-0004's job)
Owning a task-type taxonomy beyond infospace-bench workflow stages
Embedding cost or quality observations inside infospace-bench beyond what the llm-connect ledger already records
