---
id: IB-WP-0020
type: workplan
title: "Provider Routing CLI Integration"
domain: markitect
repo: infospace-bench
status: active
owner: markitect
topic_slug: markitect
created: "2026-05-18"
updated: "2026-05-18"
depends_on_workplans:
  - IB-WP-0018
  - LLM-WP-0004
related_workplans:
  - IB-WP-0016
  - IB-WP-0019
state_hub_workstream_slug: "ib-wp-0020-provider-routing-cli"
state_hub_workstream_id: "172bb082-610a-477b-b5e0-26c9f4bdfd95"
---

# IB-WP-0020 — Provider Routing CLI Integration

## Goal

Expose `RoutingAssistedGenerationAdapter` (IB-WP-0018) as a first-class CLI option so a real multi-chapter or full-book run can use the adaptive router without writing any Python. Today `--provider` accepts `fixture` and `openrouter`; this workplan adds `routing`, plus a small config file that names the rules, the ledger, the quality floors, and the per-stage task-type overrides.

The end state is a single command that does cost-aware adaptive routing across multiple OpenRouter models and writes back the per-stage adapter choices, the budget log, and (optionally) sampled shadow grades:

```bash
infospace-bench generate from-source ./LEFEVRE.epub \
  --workspace ./infospaces \
  --slug reminiscences-routed \
  --name "Reminiscences (Routed)" \
  --profile trading-literature \
  --provider routing \
  --routing-config ./routing.yaml \
  --chapter I \
  --apply
```

## Why this is a separate workplan

`IB-WP-0018` shipped the bridge module and its programmatic API. CLI wiring needs its own config-file schema, its own loader, its own error surfaces, and its own end-to-end smoke test — and that is enough scope to justify a separate review surface rather than absorbing it into the already-closed IB-WP-0018.

## Non-Goals

- Owning the routing policy primitives (those live in `llm-connect` LLM-WP-0004).
- Replacing the static `openrouter` provider — that path stays usable for callers who do not want the router.
- Embedding model selection logic inside the CLI; the config file is declarative and routing decisions stay with `AdaptiveRoutingPolicy`.

## Tasks

### T01 — Routing config file schema

```task
id: IB-WP-0020-T01
status: done
priority: medium
state_hub_task_id: "39597441-22ab-4dcf-b68d-b045823a9374"
```

- Define a small YAML schema for a routing config:
  - `quality_floor: ` (global default)
  - `ledger_path: ` (relative to workspace by default)
  - `task_types`: map of task_type to a list of candidate adapters, each with `id`, `provider` (`openrouter`, `claude_code`, `openai`, …), `model`, `api_key_env`, optional `max_cost_per_1k`, optional `quality_floor` override
  - `stage_to_task_type`: optional override map
- Document the schema in `docs/routing-config.md` with two annotated examples (one OpenRouter-only, one ClaudeCode-as-baseline + OpenRouter candidates).
- Tests: schema parses; missing fields default cleanly; unknown providers raise a focused error.

### T02 — Routing config loader

```task
id: IB-WP-0020-T02
status: todo
priority: high
state_hub_task_id: "5e38514b-ad6a-4d39-8716-f812f241d9fd"
```

- Add `src/infospace_bench/routing_config.py` (or extend `routing.py`) with `load_routing_config(path, *, workspace)` that returns a `RoutingPolicy` (or `AdaptiveRoutingPolicy` when the config sets `quality_floor` or names a ledger) ready to hand to `RoutingAssistedGenerationAdapter`.
- Provider construction:
  - `openrouter` → llm-connect `OpenRouterAdapter` with API key from `api_key_env` (default `OPENROUTER_API_KEY`)
  - `claude_code` → llm-connect `ClaudeCodeAdapter`
  - others (openai, gemini) supported but explicitly documented as untested for production use
- Tests: builds a static policy from a minimal config; builds an adaptive policy with a ledger; missing API key raises before any network call.
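The validation rules in T01 and the fail-fast key check in T02 could be sketched roughly as below. This is a minimal illustration, not the planned loader: `ALLOWED_PROVIDERS`, `RoutingConfigError`, `Candidate`, and `parse_candidates` are hypothetical names, the config is passed as an already-parsed dict rather than read from YAML, and no llm-connect adapter is constructed. Only the field names and the `OPENROUTER_API_KEY` default come from the task descriptions above.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional

# Hypothetical explicit allowlist (assumption: the real loader may name this differently).
ALLOWED_PROVIDERS = {"openrouter", "claude_code", "openai", "gemini"}


class RoutingConfigError(ValueError):
    """Hypothetical focused error, raised before any adapter or network call."""


@dataclass(frozen=True)
class Candidate:
    id: str
    provider: str
    model: str
    api_key_env: Optional[str]
    max_cost_per_1k: Optional[float]
    quality_floor: Optional[float]


def parse_candidates(
    config: Mapping, env: Optional[Mapping[str, str]] = None
) -> Dict[str, List[Candidate]]:
    """Validate `task_types` and resolve per-candidate defaults."""
    env = os.environ if env is None else env
    default_floor = config.get("quality_floor")  # global default, may be absent
    parsed: Dict[str, List[Candidate]] = {}
    for task_type, entries in config.get("task_types", {}).items():
        candidates = []
        for entry in entries:
            provider = entry["provider"]
            if provider not in ALLOWED_PROVIDERS:
                raise RoutingConfigError(
                    f"unknown provider {provider!r} for task type {task_type!r}"
                )
            key_env = entry.get("api_key_env")
            if provider == "openrouter" and not key_env:
                key_env = "OPENROUTER_API_KEY"  # default named in T02
            # Fail fast: a named but unset env var is a config error.
            if key_env and not env.get(key_env):
                raise RoutingConfigError(
                    f"environment variable {key_env} is not set for {entry['id']!r}"
                )
            candidates.append(
                Candidate(
                    id=entry["id"],
                    provider=provider,
                    model=entry["model"],
                    api_key_env=key_env,
                    max_cost_per_1k=entry.get("max_cost_per_1k"),
                    # A per-candidate floor overrides the global one.
                    quality_floor=entry.get("quality_floor", default_floor),
                )
            )
        parsed[task_type] = candidates
    return parsed
```

Passing `env` explicitly keeps the missing-key test independent of the real process environment. The sketch checks the key only when a candidate names `api_key_env` or uses `openrouter`, and leaves open whether other providers need a key.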
### T03 — `--provider routing` and `--routing-config` CLI flags

```task
id: IB-WP-0020-T03
status: todo
priority: high
state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"
```

- Add `routing` to the `--provider` choices on `generate run`, `generate resume`, and `generate from-source`.
- Add `--routing-config ` (required when `--provider routing`).
- Add `--quality-floor ` to override the config-level floor at the call site (handy for tightening or loosening for a single run without editing the file).
- Wire the loader into `_adapter_for`/`run_generation` so a `RoutingAssistedGenerationAdapter` is constructed and passed to the workflow engine.
- Tests: CLI smoke that builds a routing config pointing at mocked adapter ids and confirms the run goes through the bridge.

### T04 — Example config and live-smoke wiring

```task
id: IB-WP-0020-T04
status: todo
priority: medium
state_hub_task_id: "69288131-f265-4db5-a4b0-b0c8a6f55dd8"
```

- Add `examples/routing/trading-literature.yaml` with a realistic Lefevre-aimed config: cheap model for summaries, mid model for entities/relations, ClaudeCode baseline behind a shadow sampler.
- Update the optional live-OpenRouter smoke test (`tests/test_openrouter_live.py`) with a parallel skipped test that exercises `--provider routing` end-to-end when both `OPENROUTER_API_KEY` and `INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1` are set.
- Document how to run the live routing smoke in `docs/generic-source-generator.md`.

### T05 — Shadow-mode opt-in flag

```task
id: IB-WP-0020-T05
status: todo
priority: medium
state_hub_task_id: "02658420-056c-4d73-8055-e6a7ab51876b"
```

- Add `--shadow-rate ` and `--shadow-baseline ` flags so a caller can enable `wrap_with_shadow_sampling()` for an entire run without editing the config file. When set, the loader wraps each candidate adapter in `ShadowingAdapter` with the named baseline and the chosen rate.
- Tests: monkeypatched baseline asserts the shadow path fires at `shadow_rate=1.0` and skips at `shadow_rate=0.0`.

## Acceptance

- `infospace-bench generate from-source ... --provider routing --routing-config ` succeeds against the deterministic Lefevre fixture with a hand-crafted routing config and mocked adapters.
- The generation report's `## Per-stage adapter choices` section reflects the routed choices, and `output/budget/usage.yaml` buckets reflect the actual model that ran each call.
- The static `openrouter` and `fixture` provider paths remain unchanged.
- An optional live smoke test exists and is gated identically to the IB-WP-0016 OpenRouter smoke.
- Documentation explains the config shape, the API-key resolution, and the difference between adaptive routing and shadow-mode sampling.

## Risks and open questions

- **Adapter constructor surface.** llm-connect's adapter constructors vary slightly per provider; the loader needs to keep a small but explicit allowlist of provider names rather than reflective magic.
- **API key plumbing.** Today `openrouter` reads `OPENROUTER_API_KEY` directly. The config will name the env var explicitly to make multi-key setups workable; no key material belongs in the config file itself.
- **Schema versioning.** Bump `schema_version` from day one so the loader can refuse mismatched configs once the shape stabilises.
- **Shadow grader choice.** v1 will default the shadow grader to `ExactMatchJudge` because it has no extra cost. `LLMJudge` and `EmbeddingSimilarityJudge` configuration belongs in a follow-up.

## Downstream effects

- `infospace-bench routing ledger ` (already shipped via IB-WP-0018) becomes the natural companion CLI for inspecting the observations the routed runs accumulate.
- A successful T03 + T04 lets us run a multi-chapter Lefevre live build using the adaptive router and validate the IB-WP-0016 reviewer checklist on real output without single-model lock-in.
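As an illustration of the flag contract described in T03 and T05, the sketch below shows one way the cross-flag checks might look. It assumes an `argparse`-based parser, which this workplan does not confirm. The helper names `build_parser`, `parse_args`, and `_rate`, the `[0.0, 1.0]` range check, and the rule that `--shadow-rate` and `--shadow-baseline` must appear together are all assumptions made for the example.

```python
import argparse
from typing import List


def _rate(value: str) -> float:
    """Parse a shadow rate; the [0.0, 1.0] bound is an assumption."""
    rate = float(value)
    if not 0.0 <= rate <= 1.0:
        raise argparse.ArgumentTypeError("shadow rate must be between 0.0 and 1.0")
    return rate


def build_parser() -> argparse.ArgumentParser:
    # Stand-in for one `generate` subcommand; the real CLI has more options.
    parser = argparse.ArgumentParser(prog="infospace-bench-sketch")
    parser.add_argument(
        "--provider", choices=["fixture", "openrouter", "routing"], required=True
    )
    parser.add_argument("--routing-config")
    parser.add_argument("--quality-floor", type=float)
    parser.add_argument("--shadow-rate", type=_rate)
    parser.add_argument("--shadow-baseline")
    return parser


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # T03: the config file is mandatory only for the routing provider.
    if args.provider == "routing" and not args.routing_config:
        parser.error("--routing-config is required when --provider routing")
    # Assumed rule: shadow sampling needs both a rate and a baseline.
    if (args.shadow_rate is None) != (args.shadow_baseline is None):
        parser.error("--shadow-rate and --shadow-baseline must be given together")
    return args
```

Keeping the cross-flag checks in one function after `parse_args` means `generate run`, `generate resume`, and `generate from-source` could share them, and `parser.error` exits with the usual usage message instead of a traceback.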