generated from coulomb/repo-seed
Add --shadow-baseline <id> and --shadow-rate <float> opt-in flags to
generate run, generate resume, and generate from-source. When
--shadow-baseline names a candidate id from the routing config,
build_routing_policy_from_config wraps every other candidate in an
llm-connect ShadowingAdapter using that baseline plus a
PairedGrader(ExactMatchJudge()) and the workspace-resolved
QualityLedger. The baseline candidate itself is never wrapped, since
that would shadow it against itself. --shadow-rate defaults to 0.1 when
--shadow-baseline is set; passing --shadow-rate without
--shadow-baseline fails fast with shadow_rate_without_baseline.
Setting --shadow-baseline without a ledger_path in the config fails
with missing_routing_ledger_for_shadow so observations have a place to
land before any call goes out.

run_generation grew shadow_baseline and shadow_rate kwargs, and
_adapter_for("routing", ...) plumbs them into
build_routing_policy_from_config. The wrapped ShadowingAdapter slots
into the policy's prefer/fallback per task type via a
(candidate_id, task_type) reverse lookup, and adapters_by_id on the
adaptive policy gets the string-keyed entries.

Five new tests cover: --shadow-rate without a baseline fails fast;
shadow mode without a ledger fails fast; an unknown shadow baseline id
fails fast; a structural assertion that ShadowingAdapter wraps
non-baseline candidates and leaves the baseline raw; and a behavioural
check that shadow_rate=1.0 calls the baseline on every call while
shadow_rate=0.0 skips it entirely. The behavioural test forces
async_shadow=False so the call counter is deterministic.

Closes IB-WP-0020: T01-T05 all done. Workplan status flips from active
to finished. 179 tests pass, 2 skipped (both live OpenRouter smokes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
212 lines
7.7 KiB
Markdown
---
id: IB-WP-0020
type: workplan
title: "Provider Routing CLI Integration"
domain: markitect
repo: infospace-bench
status: finished
owner: markitect
topic_slug: markitect
created: "2026-05-18"
updated: "2026-05-18"
depends_on_workplans:
  - IB-WP-0018
  - LLM-WP-0004
related_workplans:
  - IB-WP-0016
  - IB-WP-0019
state_hub_workstream_slug: "ib-wp-0020-provider-routing-cli"
state_hub_workstream_id: "172bb082-610a-477b-b5e0-26c9f4bdfd95"
---

# IB-WP-0020: Provider Routing CLI Integration

## Goal

Expose `RoutingAssistedGenerationAdapter` (IB-WP-0018) as a first-class
CLI option so a real multi-chapter or full-book run can use the
adaptive router without writing any Python. Today `--provider` accepts
`fixture` and `openrouter`; this workplan adds `routing`, plus a small
config file that names the rules, the ledger, the quality floors, and
the per-stage task-type overrides.

The end state is a single command that does cost-aware adaptive
routing across multiple OpenRouter models and writes back the
per-stage adapter choices, the budget log, and (optionally) sampled
shadow grades:

```bash
infospace-bench generate from-source ./LEFEVRE.epub \
  --workspace ./infospaces \
  --slug reminiscences-routed \
  --name "Reminiscences (Routed)" \
  --profile trading-literature \
  --provider routing \
  --routing-config ./routing.yaml \
  --chapter I \
  --apply
```

## Why this is a separate workplan

`IB-WP-0018` shipped the bridge module and its programmatic API. CLI
wiring needs its own config-file schema, its own loader, its own error
surfaces, and its own end-to-end smoke test, and that is enough scope
to justify a separate review surface rather than absorbing it into the
already-closed IB-WP-0018.

## Non-Goals

- Owning the routing policy primitives (those live in
  `llm-connect` LLM-WP-0004).
- Replacing the static `openrouter` provider; that path stays usable
  for callers who do not want the router.
- Embedding model selection logic inside the CLI; the config file is
  declarative and routing decisions stay with `AdaptiveRoutingPolicy`.

## Tasks

### T01: Routing config file schema

```task
id: IB-WP-0020-T01
status: done
priority: medium
state_hub_task_id: "39597441-22ab-4dcf-b68d-b045823a9374"
```

- Define a small YAML schema for a routing config:
  - `quality_floor: <float | null>` (global default)
  - `ledger_path: <str | null>` (relative to workspace by default)
  - `task_types`: map of task_type to a list of candidate adapters,
    each with `id`, `provider` (`openrouter`, `claude_code`,
    `openai`, …), `model`, `api_key_env`, optional `max_cost_per_1k`,
    optional `quality_floor` override
  - `stage_to_task_type`: optional override map
- Document the schema in `docs/routing-config.md` with two annotated
  examples (one OpenRouter-only, one ClaudeCode-as-baseline +
  OpenRouter candidates).
- Tests: schema parses; missing fields default cleanly; unknown
  providers raise a focused error.

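The schema bullets above can be read as a pair of small data classes. The following is a minimal sketch, not the shipped loader: `Candidate`, `RoutingConfig`, `parse_routing_config`, `UnknownProviderError`, and the exact contents of `ALLOWED_PROVIDERS` are hypothetical names chosen for illustration, and the input is assumed to be a mapping already loaded from YAML.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumed allowlist; the workplan names these providers but not the final set.
ALLOWED_PROVIDERS = {"openrouter", "claude_code", "openai", "gemini"}


class UnknownProviderError(ValueError):
    """Hypothetical focused error for a provider outside the allowlist."""


@dataclass
class Candidate:
    id: str
    provider: str
    model: str
    api_key_env: Optional[str] = None
    max_cost_per_1k: Optional[float] = None
    quality_floor: Optional[float] = None  # per-candidate override


@dataclass
class RoutingConfig:
    quality_floor: Optional[float] = None  # global default
    ledger_path: Optional[str] = None
    task_types: dict[str, list[Candidate]] = field(default_factory=dict)
    stage_to_task_type: dict[str, str] = field(default_factory=dict)


def parse_routing_config(raw: dict[str, Any]) -> RoutingConfig:
    """Turn an already-loaded YAML mapping into a RoutingConfig."""
    task_types: dict[str, list[Candidate]] = {}
    for task_type, entries in (raw.get("task_types") or {}).items():
        candidates = []
        for entry in entries:
            if entry["provider"] not in ALLOWED_PROVIDERS:
                raise UnknownProviderError(
                    f"unknown provider {entry['provider']!r} "
                    f"for candidate {entry['id']!r}"
                )
            candidates.append(Candidate(**entry))
        task_types[task_type] = candidates
    # Missing top-level fields fall back to the dataclass defaults.
    return RoutingConfig(
        quality_floor=raw.get("quality_floor"),
        ledger_path=raw.get("ledger_path"),
        task_types=task_types,
        stage_to_task_type=dict(raw.get("stage_to_task_type") or {}),
    )


# Illustrative input only; the model name is a placeholder.
example = {
    "quality_floor": 0.8,
    "task_types": {
        "summary": [
            {
                "id": "cheap",
                "provider": "openrouter",
                "model": "example/cheap-model",
                "api_key_env": "OPENROUTER_API_KEY",
            },
        ],
    },
}
config = parse_routing_config(example)
```

Keeping the provider check next to candidate construction means a typo in `provider` surfaces with the candidate id attached, which is roughly what "a focused error" asks for.
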
### T02: Routing config loader

```task
id: IB-WP-0020-T02
status: done
priority: high
state_hub_task_id: "5e38514b-ad6a-4d39-8716-f812f241d9fd"
```

- Add `src/infospace_bench/routing_config.py` (or extend
  `routing.py`) with `load_routing_config(path, *, workspace)` that
  returns a `RoutingPolicy` (or `AdaptiveRoutingPolicy` when the
  config sets `quality_floor` or names a ledger) ready to hand to
  `RoutingAssistedGenerationAdapter`.
- Provider construction:
  - `openrouter` → llm-connect `OpenRouterAdapter` with API key from
    `api_key_env` (default `OPENROUTER_API_KEY`)
  - `claude_code` → llm-connect `ClaudeCodeAdapter`
  - others (openai, gemini) supported but explicitly documented as
    untested for production use
- Tests: builds a static policy from a minimal config; builds an
  adaptive policy with a ledger; missing API key raises before any
  network call.

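Two decisions in the loader bullets above are easy to isolate: resolving the API key from the named environment variable before any adapter is built, and choosing between the static and adaptive policy. The sketch below illustrates both under stated assumptions; `resolve_api_key`, `wants_adaptive_policy`, and `MissingApiKeyError` are hypothetical helpers, and no llm-connect class is constructed here.

```python
import os
from typing import Mapping, Optional


class MissingApiKeyError(RuntimeError):
    """Hypothetical error raised before any adapter is constructed."""


def resolve_api_key(
    api_key_env: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Read the key from the named env var, defaulting to OPENROUTER_API_KEY."""
    name = api_key_env or "OPENROUTER_API_KEY"
    # Passing `env` explicitly keeps the function testable without
    # touching the real process environment.
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise MissingApiKeyError(f"environment variable {name} is not set")
    return value


def wants_adaptive_policy(
    quality_floor: Optional[float], ledger_path: Optional[str]
) -> bool:
    """Adaptive policy when the config sets a floor or names a ledger."""
    return quality_floor is not None or ledger_path is not None
```

Because the key lookup happens while the policy is being built, a missing variable fails the run before the first network call rather than partway through a chapter.
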
### T03: `--provider routing` and `--routing-config` CLI flags

```task
id: IB-WP-0020-T03
status: done
priority: high
state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"
```

- Add `routing` to the `--provider` choices on `generate run`,
  `generate resume`, and `generate from-source`.
- Add `--routing-config <path>` (required when `--provider routing`).
- Add `--quality-floor <float>` to override the config-level floor at
  the call site (handy for tightening or loosening for a single run
  without editing the file).
- Wire the loader into `_adapter_for`/`run_generation` so a
  `RoutingAssistedGenerationAdapter` is constructed and passed to the
  workflow engine.
- Tests: CLI smoke that builds a routing config pointing at mocked
  adapter ids and confirms the run goes through the bridge.

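The flag rules above (a conditional requirement plus a call-site override) can be sketched with `argparse`. This is an illustration only: the real CLI's parser layout, its default provider, and its error wording are not given in this workplan, so `build_parser`, `parse_args`, `effective_quality_floor`, the `fixture` default, and the error message are all assumptions.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="generate-sketch")
    parser.add_argument(
        "--provider",
        choices=["fixture", "openrouter", "routing"],
        default="fixture",  # assumed default, for illustration only
    )
    parser.add_argument("--routing-config", default=None)
    parser.add_argument("--quality-floor", type=float, default=None)
    return parser


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse has no built-in conditional requirement, so check after parsing.
    if args.provider == "routing" and args.routing_config is None:
        parser.error("--routing-config is required when --provider routing")
    return args


def effective_quality_floor(
    config_floor: Optional[float], cli_floor: Optional[float]
) -> Optional[float]:
    """The CLI value wins when given; otherwise keep the config-level floor."""
    return config_floor if cli_floor is None else cli_floor
```

Comparing against `None` rather than truthiness matters here, since `--quality-floor 0.0` is a legitimate way to loosen the floor for a single run.
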
### T04: Example config and live-smoke wiring

```task
id: IB-WP-0020-T04
status: done
priority: medium
state_hub_task_id: "69288131-f265-4db5-a4b0-b0c8a6f55dd8"
```

- Add `examples/routing/trading-literature.yaml` with a realistic
  Lefevre-aimed config: cheap model for summaries, mid model for
  entities/relations, ClaudeCode baseline behind a shadow sampler.
- Update the optional live-OpenRouter smoke test
  (`tests/test_openrouter_live.py`) with a parallel skipped test that
  exercises `--provider routing` end-to-end when both
  `OPENROUTER_API_KEY` and
  `INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1` are set.
- Document how to run the live routing smoke in
  `docs/generic-source-generator.md`.

### T05: Shadow-mode opt-in flag

```task
id: IB-WP-0020-T05
status: done
priority: medium
state_hub_task_id: "02658420-056c-4d73-8055-e6a7ab51876b"
```

- Add `--shadow-rate <float>` and `--shadow-baseline <id>` flags so a
  caller can enable `wrap_with_shadow_sampling()` for an entire run
  without editing the config file. When set, the loader wraps each
  non-baseline candidate adapter in `ShadowingAdapter` with the named
  baseline and the chosen rate; the baseline itself stays unwrapped so
  it is never shadowed against itself.
- Tests: monkeypatched baseline asserts the shadow path fires at
  `shadow_rate=1.0` and skips at `shadow_rate=0.0`.

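The sampling behaviour that the T05 test pins down can be illustrated with a toy wrapper. This is a sketch and not llm-connect's `ShadowingAdapter`: `ShadowSketch`, `resolve_shadow_flags`, and the callable-based adapter type are hypothetical, and the range check on the rate is an assumption. The `shadow_rate_without_baseline` error name and the 0.1 default come from the commit message.

```python
import random
from typing import Callable, List, Optional

Adapter = Callable[[str], str]  # stand-in for a real adapter interface


def resolve_shadow_flags(
    baseline: Optional[str], rate: Optional[float]
) -> Optional[float]:
    """Return the effective shadow rate, or None when shadow mode is off."""
    if baseline is None:
        if rate is not None:
            raise ValueError("shadow_rate_without_baseline")
        return None
    rate = 0.1 if rate is None else rate
    if not 0.0 <= rate <= 1.0:  # assumed validation, not from the workplan
        raise ValueError("shadow rate must be between 0.0 and 1.0")
    return rate


class ShadowSketch:
    """Call the candidate always; call the baseline on a sampled fraction."""

    def __init__(
        self,
        candidate: Adapter,
        baseline: Adapter,
        rate: float,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.candidate = candidate
        self.baseline = baseline
        self.rate = rate
        self.rng = rng or random.Random()
        self.observations: List[bool] = []

    def __call__(self, prompt: str) -> str:
        answer = self.candidate(prompt)
        # random() is in [0.0, 1.0), so rate=1.0 always samples
        # and rate=0.0 never does.
        if self.rng.random() < self.rate:
            reference = self.baseline(prompt)
            # Exact-match grading, recorded as a simple boolean here.
            self.observations.append(answer == reference)
        return answer
```

The caller always receives the candidate's answer; the baseline call only feeds the recorded observation, which is why a synchronous path makes the call counter in a test deterministic.
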
## Acceptance

- `infospace-bench generate from-source ... --provider routing
  --routing-config <path>` succeeds against the deterministic Lefevre
  fixture with a hand-crafted routing config and mocked adapters.
- The generation report's `## Per-stage adapter choices` section
  reflects the routed choices, and `output/budget/usage.yaml` buckets
  reflect the actual model that ran each call.
- The static `openrouter` and `fixture` provider paths remain
  unchanged.
- An optional live smoke test exists and is gated identically to the
  IB-WP-0016 OpenRouter smoke.
- Documentation explains the config shape, the API-key resolution, and
  the difference between adaptive routing and shadow-mode sampling.

## Risks and open questions

- **Adapter constructor surface.** llm-connect's adapter constructors
  vary slightly per provider; the loader needs to keep a small but
  explicit allowlist of provider names rather than reflective magic.
- **API key plumbing.** Today `openrouter` reads
  `OPENROUTER_API_KEY` directly. The config will name the env var
  explicitly to make multi-key setups workable; no key material
  belongs in the config file itself.
- **Schema versioning.** Include a `schema_version` field from day one
  so the loader can refuse mismatched configs once the shape
  stabilises.
- **Shadow grader choice.** v1 will default the shadow grader to
  `ExactMatchJudge` because it has no extra cost. `LLMJudge` and
  `EmbeddingSimilarityJudge` configuration belongs in a follow-up.

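For the schema-versioning risk, one possible shape of the check is sketched below. It is purely illustrative: the workplan names the `schema_version` field but not its type, its current value, or the loader's behaviour, so `SUPPORTED_SCHEMA_VERSION`, `check_schema_version`, and the error text are assumptions.

```python
from typing import Any, Mapping

SUPPORTED_SCHEMA_VERSION = 1  # assumed value, for illustration only


def check_schema_version(raw: Mapping[str, Any]) -> int:
    """Refuse configs whose schema_version is missing or unsupported."""
    version = raw.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(
            f"unsupported routing config schema_version {version!r}; "
            f"expected {SUPPORTED_SCHEMA_VERSION}"
        )
    return version
```

Treating a missing field the same as a mismatch is the strict option; a loader could instead default a missing field to the first version while the shape is still settling.
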
## Downstream effects

- `infospace-bench routing ledger <path>` (already shipped via
  IB-WP-0018) becomes the natural companion CLI for inspecting the
  observations the routed runs accumulate.
- A successful T03 + T04 lets us run a multi-chapter Lefevre live
  build using the adaptive router and validate the IB-WP-0016
  reviewer checklist on real output without single-model lock-in.