
tegwick debd2b8e69 IB-WP-0020-T04: example routing config + live routing smoke

examples/routing/trading-literature.yaml is the checked-in starting
config for a Lefevre-style run. It applies the IB-WP-0018 task-type
taxonomy: cheap candidates for summary + evaluation, smart candidates
for entity + relation extraction, and a separate baseline rule wiring
claude_code for a follow-on T05 ShadowingAdapter step. Workspace-
relative ledger_path keeps adaptive observations with the workspace.

tests/test_routing_config.py gains a regression test that asserts the
shipped example parses cleanly, every stage in stage_to_task_type maps
to a declared task type, and the baseline candidate uses the
claude_code provider — so the example will not bit-rot silently.

tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke
gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY
opt-in as the existing static smoke. It builds a one-candidate routing
config, runs a single chapter through --provider routing, and asserts
the per-stage adapter-choices report section names the routed model
and the routed artifacts carry adapter_id provenance.

docs/generic-source-generator.md gains a "Live runs with --provider
routing" subsection that walks through the one-command routed run,
explains the --quality-floor override, and points at the parallel
live smoke test.

174 tests pass, 2 skipped (both live smokes, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-18 22:19:54 +02:00


id, type, title, domain, repo, status, owner, topic_slug, created, updated, depends_on_workplans, related_workplans, state_hub_workstream_slug, state_hub_workstream_id

IB-WP-0020

workplan

Provider Routing CLI Integration

markitect

infospace-bench

active

markitect

2026-05-18

IB-WP-0018

LLM-WP-0004

IB-WP-0016

IB-WP-0019

ib-wp-0020-provider-routing-cli

172bb082-610a-477b-b5e0-26c9f4bdfd95

IB-WP-0020 — Provider Routing CLI Integration

Goal

Expose RoutingAssistedGenerationAdapter (IB-WP-0018) as a first-class CLI option so a real multi-chapter or full-book run can use the adaptive router without writing any Python. Today --provider accepts fixture and openrouter; this workplan adds routing, plus a small config file that names the rules, the ledger, the quality floors, and the per-stage task-type overrides.

The end state is a single command that does cost-aware adaptive routing across multiple OpenRouter models and writes back the per-stage adapter choices, the budget log, and (optionally) sampled shadow grades:

infospace-bench generate from-source ./LEFEVRE.epub \
  --workspace ./infospaces \
  --slug reminiscences-routed \
  --name "Reminiscences (Routed)" \
  --profile trading-literature \
  --provider routing \
  --routing-config ./routing.yaml \
  --chapter I \
  --apply

Why this is a separate workplan

IB-WP-0018 shipped the bridge module and its programmatic API. CLI wiring needs its own config-file schema, its own loader, its own error surfaces, and its own end-to-end smoke test — and that is enough scope to justify a separate review surface rather than absorbing it into the already-closed IB-WP-0018.

Non-Goals

Owning the routing policy primitives (those live in llm-connect LLM-WP-0004).
Replacing the static openrouter provider — that path stays usable for callers who do not want the router.
Embedding model selection logic inside the CLI; the config file is declarative and routing decisions stay with AdaptiveRoutingPolicy.

Tasks

T01 — Routing config file schema

id: IB-WP-0020-T01
status: done
priority: medium
state_hub_task_id: "39597441-22ab-4dcf-b68d-b045823a9374"

Define a small YAML schema for a routing config:
- quality_floor: <float | null> (global default)
- ledger_path: <str | null> (relative to workspace by default)
- task_types: map of task_type to a list of candidate adapters, each with id, provider (openrouter, claude_code, openai, …), model, api_key_env, optional max_cost_per_1k, optional quality_floor override
- stage_to_task_type: optional override map
Document the schema in docs/routing-config.md with two annotated examples (one OpenRouter-only, one ClaudeCode-as-baseline + OpenRouter candidates).
Tests: schema parses; missing fields default cleanly; unknown providers raise a focused error.
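The schema above can be sketched as a pair of dataclasses plus a small parser. This is an illustrative sketch only: the names `Candidate`, `RoutingConfig`, `parse_routing_config`, `UnknownProviderError`, and the exact contents of `KNOWN_PROVIDERS` are assumptions, not the shipped module, and the input is taken to be the dict a YAML parser would already have produced.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Assumed allowlist, based on the provider names mentioned in this workplan.
KNOWN_PROVIDERS = {"openrouter", "claude_code", "openai", "gemini"}


class UnknownProviderError(ValueError):
    """Hypothetical focused error for a provider outside the allowlist."""


@dataclass
class Candidate:
    id: str
    provider: str
    model: str
    api_key_env: Optional[str] = None
    max_cost_per_1k: Optional[float] = None
    quality_floor: Optional[float] = None  # per-candidate override


@dataclass
class RoutingConfig:
    quality_floor: Optional[float] = None  # global default
    ledger_path: Optional[str] = None      # relative to the workspace
    task_types: Dict[str, List[Candidate]] = field(default_factory=dict)
    stage_to_task_type: Dict[str, str] = field(default_factory=dict)


def parse_routing_config(raw: Dict[str, Any]) -> RoutingConfig:
    """Turn an already-parsed YAML mapping into a RoutingConfig."""
    task_types: Dict[str, List[Candidate]] = {}
    for task_type, entries in (raw.get("task_types") or {}).items():
        candidates = []
        for entry in entries:
            if entry["provider"] not in KNOWN_PROVIDERS:
                raise UnknownProviderError(
                    f"task type {task_type!r}: unknown provider {entry['provider']!r}"
                )
            candidates.append(Candidate(**entry))
        task_types[task_type] = candidates
    return RoutingConfig(
        quality_floor=raw.get("quality_floor"),
        ledger_path=raw.get("ledger_path"),
        task_types=task_types,
        stage_to_task_type=dict(raw.get("stage_to_task_type") or {}),
    )


# Usage with a placeholder model name; omitted fields fall back to None or {}.
example = parse_routing_config(
    {"task_types": {"summary": [
        {"id": "cheap", "provider": "openrouter", "model": "example/cheap-model"}
    ]}}
)
```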

T02 — Routing config loader

id: IB-WP-0020-T02
status: done
priority: high
state_hub_task_id: "5e38514b-ad6a-4d39-8716-f812f241d9fd"

Add src/infospace_bench/routing_config.py (or extend routing.py) with load_routing_config(path, *, workspace) that returns a RoutingPolicy (or AdaptiveRoutingPolicy when the config sets quality_floor or names a ledger) ready to hand to RoutingAssistedGenerationAdapter.
Provider construction:
- openrouter → llm-connect OpenRouterAdapter with API key from api_key_env (default OPENROUTER_API_KEY)
- claude_code → llm-connect ClaudeCodeAdapter
- others (openai, gemini) supported but explicitly documented as untested for production use
Tests: builds a static policy from a minimal config; builds an adaptive policy with a ledger; missing API key raises before any network call.
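A minimal sketch of the provider construction step follows, assuming the loader dispatches through an explicit mapping of provider names to constructor callables and resolves the API key before building anything. `build_adapter`, `resolve_api_key`, `MissingApiKeyError`, and the `factories` mapping are hypothetical names; the real llm-connect constructors are deliberately not called here because their signatures are not given in this workplan.

```python
import os
from typing import Callable, Mapping, Optional


class MissingApiKeyError(RuntimeError):
    """Hypothetical error raised at load time, before any network call."""


def resolve_api_key(api_key_env: Optional[str],
                    env: Mapping[str, str] = os.environ) -> str:
    # Fall back to OPENROUTER_API_KEY when the candidate names no variable.
    name = api_key_env or "OPENROUTER_API_KEY"
    key = env.get(name, "")
    if not key:
        raise MissingApiKeyError(f"environment variable {name} is not set")
    return key


def build_adapter(candidate: Mapping[str, str],
                  factories: Mapping[str, Callable[..., object]],
                  env: Mapping[str, str] = os.environ) -> object:
    """Build one adapter via an explicit allowlist of provider factories."""
    provider = candidate["provider"]
    if provider not in factories:
        raise ValueError(f"unsupported provider {provider!r}")
    if provider == "openrouter":
        # Resolve the key first so a missing key fails before construction.
        api_key = resolve_api_key(candidate.get("api_key_env"), env)
        return factories[provider](model=candidate["model"], api_key=api_key)
    # Assumption: other providers are built from the model name alone.
    return factories[provider](model=candidate["model"])
```

Passing `env` explicitly keeps the missing-key test free of real environment state: a test can hand in an empty dict and assert that the factory was never invoked.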

T03 — `--provider routing` and `--routing-config` CLI flags

id: IB-WP-0020-T03
status: done
priority: high
state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"

Add routing to the --provider choices on generate run, generate resume, and generate from-source.
Add --routing-config <path> (required when --provider routing).
Add --quality-floor <float> to override the config-level floor at the call site (handy for tightening or loosening the floor for a single run without editing the file).
Wire the loader into _adapter_for/run_generation so a RoutingAssistedGenerationAdapter is constructed and passed to the workflow engine.
Tests: CLI smoke that builds a routing config pointing at mocked adapter ids and confirms the run goes through the bridge.
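The flag behaviour described in this task can be sketched with the standard library's `argparse`. This is not the project's actual CLI wiring: the parser name, the `fixture` default, and the helper `effective_quality_floor` are assumptions made for illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="generate-sketch")
    parser.add_argument(
        "--provider",
        choices=["fixture", "openrouter", "routing"],
        default="fixture",  # assumed default, not confirmed by the workplan
    )
    parser.add_argument("--routing-config", metavar="PATH")
    parser.add_argument("--quality-floor", type=float, default=None)
    return parser


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse cannot express "required only with another flag" directly,
    # so the conditional requirement is checked after parsing.
    if args.provider == "routing" and not args.routing_config:
        parser.error("--routing-config is required when --provider routing")
    return args


def effective_quality_floor(args: argparse.Namespace,
                            config_floor: Optional[float]) -> Optional[float]:
    # The CLI value wins when given; "is not None" keeps an explicit 0.0.
    if args.quality_floor is not None:
        return args.quality_floor
    return config_floor
```

`parser.error` prints a usage message and exits with status 2, so a missing `--routing-config` fails before any config loading or adapter construction happens.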

T04 — Example config and live-smoke wiring

id: IB-WP-0020-T04
status: done
priority: medium
state_hub_task_id: "69288131-f265-4db5-a4b0-b0c8a6f55dd8"

Add examples/routing/trading-literature.yaml with a realistic Lefevre-aimed config: cheap model for summaries, mid model for entities/relations, ClaudeCode baseline behind a shadow sampler.
Update the optional live-OpenRouter smoke test module (tests/test_openrouter_live.py) with a parallel test, skipped by default, that exercises --provider routing end-to-end when both OPENROUTER_API_KEY and INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1 are set.
Document how to run the live routing smoke in docs/generic-source-generator.md.
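The double opt-in for the live smoke can be expressed as a single predicate. The helper name `live_routing_smoke_enabled` is hypothetical; only the two environment variable names come from this workplan.

```python
import os
from typing import Mapping


def live_routing_smoke_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when the opt-in flag is exactly "1" and a key is present."""
    opted_in = env.get("INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER") == "1"
    has_key = bool(env.get("OPENROUTER_API_KEY"))
    return opted_in and has_key
```

In a pytest module a predicate like this would typically feed `pytest.mark.skipif(not live_routing_smoke_enabled(), reason=...)`, so the test is collected but skipped unless both variables are set.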

T05 — Shadow-mode opt-in flag

id: IB-WP-0020-T05
status: todo
priority: medium
state_hub_task_id: "02658420-056c-4d73-8055-e6a7ab51876b"

Add --shadow-rate <float> and --shadow-baseline <id> flags so a caller can enable wrap_with_shadow_sampling() for an entire run without editing the config file. When set, the loader wraps each candidate adapter in ShadowingAdapter with the named baseline and the chosen rate.
Tests: monkeypatched baseline asserts the shadow path fires at shadow_rate=1.0 and skips at shadow_rate=0.0.
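To make the shadow_rate=1.0 and shadow_rate=0.0 test expectations concrete, here is a stand-in for the sampling decision. `ShadowSketch` is purely illustrative and is not the real ShadowingAdapter or wrap_with_shadow_sampling(); it models adapters as plain callables and assumes the sampling decision is a single uniform draw per call.

```python
import random
from typing import Callable, Optional


class ShadowSketch:
    """Illustrative wrapper: always returns the primary result and
    occasionally also runs a baseline for comparison."""

    def __init__(self,
                 primary: Callable[[str], str],
                 baseline: Callable[[str], str],
                 shadow_rate: float,
                 on_shadow: Callable[[str, str, str], None],
                 rng: Optional[random.Random] = None) -> None:
        if not 0.0 <= shadow_rate <= 1.0:
            raise ValueError("shadow_rate must be within [0.0, 1.0]")
        self.primary = primary
        self.baseline = baseline
        self.shadow_rate = shadow_rate
        self.on_shadow = on_shadow
        self.rng = rng or random.Random()

    def generate(self, prompt: str) -> str:
        result = self.primary(prompt)
        # random() is in [0.0, 1.0), so the comparison is always true at
        # rate 1.0 and never true at rate 0.0.
        if self.rng.random() < self.shadow_rate:
            self.on_shadow(prompt, result, self.baseline(prompt))
        return result
```

Because the draw is strictly less than 1.0, the two boundary rates are deterministic, which is what lets a monkeypatched baseline assert "always fires" and "never fires" without seeding the generator.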

Acceptance

infospace-bench generate from-source ... --provider routing --routing-config <path> succeeds against the deterministic Lefevre fixture with a hand-crafted routing config and mocked adapters.
The generation report's ## Per-stage adapter choices section reflects the routed choices, and output/budget/usage.yaml buckets reflect the actual model that ran each call.
The static openrouter and fixture provider paths remain unchanged.
An optional live smoke test exists and is gated identically to the IB-WP-0016 OpenRouter smoke.
Documentation explains the config shape, the API-key resolution, and the difference between adaptive routing and shadow-mode sampling.

Risks and open questions

Adapter constructor surface. llm-connect's adapter constructors vary slightly per provider; the loader needs to keep a small but explicit allowlist of provider names rather than reflective magic.
API key plumbing. Today openrouter reads OPENROUTER_API_KEY directly. The config will name the env var explicitly to make multi-key setups workable; no key material belongs in the config file itself.
Schema versioning. Include a schema_version field from day one so the loader can refuse mismatched configs once the shape stabilises.
Shadow grader choice. v1 will default the shadow grader to ExactMatchJudge because it has no extra cost. LLMJudge and EmbeddingSimilarityJudge configuration belongs in a follow-up.
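For the schema-versioning risk above, the refusal could be as small as the check below. The constant `SUPPORTED_SCHEMA_VERSION`, the error class, and the choice to reject a config with no schema_version at all are assumptions for illustration, not decisions recorded in this workplan.

```python
from typing import Any, Mapping

SUPPORTED_SCHEMA_VERSION = 1  # assumed initial version


class SchemaVersionError(ValueError):
    """Hypothetical error for a config whose schema_version is not supported."""


def check_schema_version(raw: Mapping[str, Any]) -> None:
    """Refuse a parsed config whose schema_version is missing or mismatched."""
    version = raw.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise SchemaVersionError(
            f"unsupported schema_version {version!r}; "
            f"expected {SUPPORTED_SCHEMA_VERSION}"
        )
```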

Downstream effects

infospace-bench routing ledger <path> (already shipped via IB-WP-0018) becomes the natural companion CLI for inspecting the observations the routed runs accumulate.
A successful T03 + T04 lets us run a multi-chapter Lefevre live build using the adaptive router and validate the IB-WP-0016 reviewer checklist on real output without single-model lock-in.
