Add --shadow-baseline <id> and --shadow-rate <float> opt-in flags to
generate run, generate resume, and generate from-source. When
--shadow-baseline names a candidate id from the routing config,
build_routing_policy_from_config wraps every other candidate in an
llm-connect ShadowingAdapter using that baseline plus a
PairedGrader(ExactMatchJudge()) and the workspace-resolved
QualityLedger. The baseline candidate itself is never wrapped — that
would shadow it against itself. --shadow-rate defaults to 0.1 when
--shadow-baseline is set; passing --shadow-rate without
--shadow-baseline fails fast with shadow_rate_without_baseline.
Setting --shadow-baseline without a ledger_path in the config fails
with missing_routing_ledger_for_shadow so observations have a place to
land before any call goes out.
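The flag rules above can be sketched as a small validation step. This is a minimal illustration, not the code in this commit: resolve_shadow_options and ShadowConfigError are hypothetical names, the unknown_shadow_baseline code is an assumption (the message only says that case fails fast), and the order of the ledger and unknown-id checks is also assumed.

```python
from typing import Optional, Set, Tuple

DEFAULT_SHADOW_RATE = 0.1  # default stated in the commit message


class ShadowConfigError(ValueError):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def resolve_shadow_options(
    shadow_baseline: Optional[str],
    shadow_rate: Optional[float],
    ledger_path: Optional[str],
    candidate_ids: Set[str],
) -> Optional[Tuple[str, float]]:
    """Return (baseline_id, rate) when shadow mode is on, else None."""
    if shadow_baseline is None:
        if shadow_rate is not None:
            # A rate without a baseline has nothing to shadow against.
            raise ShadowConfigError("shadow_rate_without_baseline")
        return None
    if ledger_path is None:
        # Observations need somewhere to land before any call goes out.
        raise ShadowConfigError("missing_routing_ledger_for_shadow")
    if shadow_baseline not in candidate_ids:
        # Assumed code name; the source only says this fails fast.
        raise ShadowConfigError("unknown_shadow_baseline")
    rate = DEFAULT_SHADOW_RATE if shadow_rate is None else shadow_rate
    return shadow_baseline, rate
```

All three failure cases raise before any adapter is built, which is what keeps them cheap to hit in tests.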
run_generation grew shadow_baseline + shadow_rate kwargs and
_adapter_for("routing", ...) plumbs them into
build_routing_policy_from_config. The wrapped ShadowingAdapter slots
into the policy's prefer/fallback per task type via a
(candidate_id, task_type) reverse lookup, and adapters_by_id on the
adaptive policy gets the string-keyed entries.
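The wrapping and reverse lookup described above might look roughly like the following. StubAdapter and ShadowWrapper are stand-ins invented for this sketch; the real llm-connect ShadowingAdapter constructor (grader, ledger, async options) is not reproduced here and may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StubAdapter:
    """Stand-in for a raw provider adapter."""
    name: str


@dataclass
class ShadowWrapper:
    """Stand-in for llm-connect's ShadowingAdapter (real signature may differ)."""
    primary: StubAdapter
    baseline: StubAdapter
    rate: float


def wrap_candidates(
    task_types: Dict[str, List[str]],
    raw_adapters: Dict[str, StubAdapter],
    baseline_id: str,
    rate: float,
) -> Tuple[Dict[Tuple[str, str], object], Dict[str, object]]:
    """Wrap every non-baseline candidate and index the results two ways."""
    baseline = raw_adapters[baseline_id]
    by_id: Dict[str, object] = {}
    for cid, adapter in raw_adapters.items():
        # The baseline stays raw; wrapping it would shadow it against itself.
        by_id[cid] = adapter if cid == baseline_id else ShadowWrapper(adapter, baseline, rate)
    # (candidate_id, task_type) reverse lookup used to fill prefer/fallback slots.
    lookup = {
        (cid, task_type): by_id[cid]
        for task_type, cids in task_types.items()
        for cid in cids
    }
    return lookup, by_id
```

Building by_id first and deriving the lookup from it means a candidate listed under several task types shares one wrapped instance.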
Five new tests cover: shadow_rate without baseline fails fast, shadow
mode without a ledger fails fast, unknown shadow baseline id fails
fast, structural assertion that ShadowingAdapter wraps non-baseline
candidates and leaves the baseline raw, and a behavioural check that
shadow_rate=1.0 calls the baseline on every call while shadow_rate=0.0
skips it entirely. The behavioural test forces async_shadow=False so the
call counter is deterministic.
Closes IB-WP-0020: T01-T05 all done. Workplan status flips from active
to finished. 179 tests pass, 2 skipped (both live OpenRouter smokes).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| id | type | title | domain | repo | status | owner | topic_slug | created | updated | depends_on_workplans | related_workplans | state_hub_workstream_slug | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IB-WP-0020 | workplan | Provider Routing CLI Integration | markitect | infospace-bench | finished | markitect | markitect | 2026-05-18 | 2026-05-18 | | | ib-wp-0020-provider-routing-cli | 172bb082-610a-477b-b5e0-26c9f4bdfd95 |
IB-WP-0020 — Provider Routing CLI Integration
Goal
Expose RoutingAssistedGenerationAdapter (IB-WP-0018) as a first-class
CLI option so a real multi-chapter or full-book run can use the
adaptive router without writing any Python. Today --provider accepts
fixture and openrouter; this workplan adds routing, plus a small
config file that names the rules, the ledger, the quality floors, and
the per-stage task-type overrides.
The end state is a single command that does cost-aware adaptive routing across multiple OpenRouter models and writes back the per-stage adapter choices, the budget log, and (optionally) sampled shadow grades:
infospace-bench generate from-source ./LEFEVRE.epub \
    --workspace ./infospaces \
    --slug reminiscences-routed \
    --name "Reminiscences (Routed)" \
    --profile trading-literature \
    --provider routing \
    --routing-config ./routing.yaml \
    --chapter I \
    --apply
Why this is a separate workplan
IB-WP-0018 shipped the bridge module and its programmatic API. CLI
wiring needs its own config-file schema, its own loader, its own error
surfaces, and its own end-to-end smoke test — and that is enough scope
to justify a separate review surface rather than absorbing it into the
already-closed IB-WP-0018.
Non-Goals
- Owning the routing policy primitives (those live in llm-connect LLM-WP-0004).
- Replacing the static openrouter provider: that path stays usable for callers who do not want the router.
- Embedding model selection logic inside the CLI; the config file is declarative and routing decisions stay with AdaptiveRoutingPolicy.
Tasks
T01 — Routing config file schema
id: IB-WP-0020-T01
status: done
priority: medium
state_hub_task_id: "39597441-22ab-4dcf-b68d-b045823a9374"
- Define a small YAML schema for a routing config:
  - quality_floor: <float | null> (global default)
  - ledger_path: <str | null> (relative to workspace by default)
  - task_types: map of task_type to a list of candidate adapters, each with id, provider (openrouter, claude_code, openai, …), model, api_key_env, optional max_cost_per_1k, optional quality_floor override
  - stage_to_task_type: optional override map
- Document the schema in docs/routing-config.md with two annotated examples (one OpenRouter-only, one ClaudeCode-as-baseline + OpenRouter candidates).
- Tests: schema parses; missing fields default cleanly; unknown providers raise a focused error.
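To make the schema and its defaulting behaviour concrete, here is a minimal sketch that parses an already-loaded mapping (what a YAML loader would return) into typed records. Candidate, RoutingConfig, parse_routing_config, and KNOWN_PROVIDERS are hypothetical names, and the allowlist holds only the providers this task names; the elided ones are deliberately not filled in.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Assumed allowlist: only the providers named in this task; the "…" is not expanded.
KNOWN_PROVIDERS = {"openrouter", "claude_code", "openai"}


@dataclass
class Candidate:
    id: str
    provider: str
    model: str
    api_key_env: Optional[str] = None
    max_cost_per_1k: Optional[float] = None
    quality_floor: Optional[float] = None  # per-candidate override


@dataclass
class RoutingConfig:
    quality_floor: Optional[float] = None  # global default
    ledger_path: Optional[str] = None
    task_types: Dict[str, List[Candidate]] = field(default_factory=dict)
    stage_to_task_type: Dict[str, str] = field(default_factory=dict)


def parse_routing_config(raw: Dict[str, Any]) -> RoutingConfig:
    """Build a RoutingConfig, defaulting missing fields and rejecting unknown providers."""
    task_types: Dict[str, List[Candidate]] = {}
    for task_type, entries in (raw.get("task_types") or {}).items():
        candidates = []
        for entry in entries:
            if entry["provider"] not in KNOWN_PROVIDERS:
                raise ValueError(
                    f"unknown provider {entry['provider']!r} for candidate {entry['id']!r}"
                )
            candidates.append(Candidate(**entry))
        task_types[task_type] = candidates
    return RoutingConfig(
        quality_floor=raw.get("quality_floor"),
        ledger_path=raw.get("ledger_path"),
        task_types=task_types,
        stage_to_task_type=dict(raw.get("stage_to_task_type") or {}),
    )
```

Naming the offending provider and candidate id in the error is one way to get the "focused error" the tests ask for.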
T02 — Routing config loader
id: IB-WP-0020-T02
status: done
priority: high
state_hub_task_id: "5e38514b-ad6a-4d39-8716-f812f241d9fd"
- Add src/infospace_bench/routing_config.py (or extend routing.py) with load_routing_config(path, *, workspace) that returns a RoutingPolicy (or AdaptiveRoutingPolicy when the config sets quality_floor or names a ledger) ready to hand to RoutingAssistedGenerationAdapter.
- Provider construction:
  - openrouter → llm-connect OpenRouterAdapter with API key from api_key_env (default OPENROUTER_API_KEY)
  - claude_code → llm-connect ClaudeCodeAdapter
  - others (openai, gemini) supported but explicitly documented as untested for production use
- Tests: builds a static policy from a minimal config; builds an adaptive policy with a ledger; missing API key raises before any network call.
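The "missing API key raises before any network call" requirement can be met by resolving keys at load time. A minimal sketch follows, with resolve_api_key and MissingApiKeyError as hypothetical names; treating claude_code as needing no key is an assumption of this sketch, not something the workplan states.

```python
import os
from typing import Mapping, Optional

# Default env var for openrouter, as named in the task above.
DEFAULT_KEY_ENV = {"openrouter": "OPENROUTER_API_KEY"}


class MissingApiKeyError(RuntimeError):
    """Hypothetical error raised while loading the config, before any adapter call."""


def resolve_api_key(
    provider: str,
    api_key_env: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the key for a candidate, or None when no env var applies."""
    env_name = api_key_env or DEFAULT_KEY_ENV.get(provider)
    if env_name is None:
        # Assumption: providers without a configured or default env var need no key.
        return None
    key = environ.get(env_name)
    if not key:
        raise MissingApiKeyError(
            f"{env_name} is not set (needed for provider {provider!r})"
        )
    return key
```

Passing environ explicitly keeps the check testable without mutating the process environment.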
T03 — --provider routing and --routing-config CLI flags
id: IB-WP-0020-T03
status: done
priority: high
state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"
- Add routing to the --provider choices on generate run, generate resume, and generate from-source.
- Add --routing-config <path> (required when --provider routing).
- Add --quality-floor <float> to override the config-level floor at the call site (handy for tightening or loosening for a single run without editing the file).
- Wire the loader into _adapter_for / run_generation so a RoutingAssistedGenerationAdapter is constructed and passed to the workflow engine.
- Tests: CLI smoke that builds a routing config pointing at mocked adapter ids and confirms the run goes through the bridge.
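The conditional requirement on --routing-config is the one piece of flag logic a plain choices list cannot express. A minimal sketch, assuming an argparse-based CLI (the workplan does not say which parser infospace-bench uses) and an assumed default provider of fixture:

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="infospace-bench generate run")
    parser.add_argument(
        "--provider",
        choices=["fixture", "openrouter", "routing"],
        default="fixture",  # assumed default for this sketch
    )
    parser.add_argument("--routing-config", help="path to the routing config file")
    parser.add_argument("--quality-floor", type=float, help="override the config-level floor")
    return parser


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse has no built-in "required if"; enforce it after parsing.
    if args.provider == "routing" and args.routing_config is None:
        parser.error("--routing-config is required when --provider routing")
    return args
```

parser.error prints the usage line and exits with status 2, so a missing config fails before any adapter is constructed.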
T04 — Example config and live-smoke wiring
id: IB-WP-0020-T04
status: done
priority: medium
state_hub_task_id: "69288131-f265-4db5-a4b0-b0c8a6f55dd8"
- Add examples/routing/trading-literature.yaml with a realistic Lefevre-aimed config: cheap model for summaries, mid model for entities/relations, ClaudeCode baseline behind a shadow sampler.
- Update the optional live-OpenRouter smoke test (tests/test_openrouter_live.py) with a parallel skipped test that exercises --provider routing end-to-end when both OPENROUTER_API_KEY and INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1 are set.
- Document how to run the live routing smoke in docs/generic-source-generator.md.
T05 — Shadow-mode opt-in flag
id: IB-WP-0020-T05
status: done
priority: medium
state_hub_task_id: "02658420-056c-4d73-8055-e6a7ab51876b"
- Add --shadow-rate <float> and --shadow-baseline <id> flags so a caller can enable wrap_with_shadow_sampling() for an entire run without editing the config file. When set, the loader wraps each non-baseline candidate adapter in ShadowingAdapter with the named baseline and the chosen rate; the baseline itself stays unwrapped.
- Tests: monkeypatched baseline asserts the shadow path fires at shadow_rate=1.0 and skips at shadow_rate=0.0.
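The two boundary rates make the behavioural test deterministic without seeding. The sketch below uses a stand-in wrapper (SamplingShadow) and a counting test double; neither is the llm-connect ShadowingAdapter, whose sampling logic is assumed here to be a simple "random draw below rate" check.

```python
import random
from typing import Callable, Optional


class CountingBaseline:
    """Test double that records how often the shadow path reaches it."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"baseline:{prompt}"


class SamplingShadow:
    """Stand-in shadowing wrapper; NOT llm-connect's ShadowingAdapter."""

    def __init__(
        self,
        primary: Callable[[str], str],
        baseline: CountingBaseline,
        rate: float,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.primary = primary
        self.baseline = baseline
        self.rate = rate
        self.rng = rng or random.Random()

    def complete(self, prompt: str) -> str:
        result = self.primary(prompt)
        # random() is in [0.0, 1.0): rate=1.0 always samples, rate=0.0 never does.
        if self.rng.random() < self.rate:
            # Called synchronously so the counter is settled when complete() returns.
            self.baseline.complete(prompt)
        return result


def count_shadow_calls(rate: float, n: int = 20) -> int:
    """Run n primary calls and report how many reached the baseline."""
    baseline = CountingBaseline()
    shadow = SamplingShadow(lambda p: f"primary:{p}", baseline, rate)
    for i in range(n):
        shadow.complete(f"prompt-{i}")
    return baseline.calls
```

The primary result is always what the caller gets back; the baseline call exists only to produce an observation.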
Acceptance
- infospace-bench generate from-source ... --provider routing --routing-config <path> succeeds against the deterministic Lefevre fixture with a hand-crafted routing config and mocked adapters.
- The generation report's ## Per-stage adapter choices section reflects the routed choices, and output/budget/usage.yaml buckets reflect the actual model that ran each call.
- The static openrouter and fixture provider paths remain unchanged.
- An optional live smoke test exists and is gated identically to the IB-WP-0016 OpenRouter smoke.
- Documentation explains the config shape, the API-key resolution, and the difference between adaptive routing and shadow-mode sampling.
Risks and open questions
- Adapter constructor surface. llm-connect's adapter constructors vary slightly per provider; the loader needs to keep a small but explicit allowlist of provider names rather than reflective magic.
- API key plumbing. Today openrouter reads OPENROUTER_API_KEY directly. The config will name the env var explicitly to make multi-key setups workable; no key material belongs in the config file itself.
- Schema versioning. Bump schema_version from day one so the loader can refuse mismatched configs once the shape stabilises.
- Shadow grader choice. v1 will default the shadow grader to ExactMatchJudge because it has no extra cost. LLMJudge and EmbeddingSimilarityJudge configuration belongs in a follow-up.
Downstream effects
- infospace-bench routing ledger <path> (already shipped via IB-WP-0018) becomes the natural companion CLI for inspecting the observations the routed runs accumulate.
- A successful T03 + T04 lets us run a multi-chapter Lefevre live build using the adaptive router and validate the IB-WP-0016 reviewer checklist on real output without single-model lock-in.