infospace-bench/workplans/IB-WP-0020-provider-routing-cli.md

---
id: IB-WP-0020
type: workplan
title: "Provider Routing CLI Integration"
domain: markitect
repo: infospace-bench
status: active
owner: markitect
topic_slug: markitect
created: "2026-05-18"
updated: "2026-05-18"
depends_on_workplans:
  - IB-WP-0018
  - LLM-WP-0004
related_workplans:
  - IB-WP-0016
  - IB-WP-0019
state_hub_workstream_slug: "ib-wp-0020-provider-routing-cli"
state_hub_workstream_id: "172bb082-610a-477b-b5e0-26c9f4bdfd95"
---

# IB-WP-0020 — Provider Routing CLI Integration

## Goal

Expose `RoutingAssistedGenerationAdapter` (IB-WP-0018) as a first-class
CLI option so a real multi-chapter or full-book run can use the
adaptive router without writing any Python. Today `--provider` accepts
`fixture` and `openrouter`; this workplan adds `routing`, plus a small
config file that names the rules, the ledger, the quality floors, and
the per-stage task-type overrides.

The end state is a single command that does cost-aware adaptive
routing across multiple OpenRouter models and writes back the
per-stage adapter choices, the budget log, and (optionally) sampled
shadow grades:

```bash
infospace-bench generate from-source ./LEFEVRE.epub \
  --workspace ./infospaces \
  --slug reminiscences-routed \
  --name "Reminiscences (Routed)" \
  --profile trading-literature \
  --provider routing \
  --routing-config ./routing.yaml \
  --chapter I \
  --apply
```

## Why this is a separate workplan

`IB-WP-0018` shipped the bridge module and its programmatic API. CLI
wiring needs its own config-file schema, loader, error surfaces, and
end-to-end smoke test. That is enough scope to justify a separate
review surface rather than reopening the already-closed IB-WP-0018.

## Non-Goals

- Owning the routing policy primitives (those live in
  `llm-connect` LLM-WP-0004).
- Replacing the static `openrouter` provider — that path stays usable
  for callers who do not want the router.
- Embedding model selection logic inside the CLI; the config file is
  declarative and routing decisions stay with `AdaptiveRoutingPolicy`.

## Tasks

### T01 — Routing config file schema

```task
id: IB-WP-0020-T01
status: done
priority: medium
state_hub_task_id: "39597441-22ab-4dcf-b68d-b045823a9374"
```

- Define a small YAML schema for a routing config:
  - `quality_floor: <float | null>` (global default)
  - `ledger_path: <str | null>` (relative to workspace by default)
  - `task_types`: map of task_type to a list of candidate adapters,
    each with `id`, `provider` (`openrouter`, `claude_code`,
    `openai`, …), `model`, `api_key_env`, optional `max_cost_per_1k`,
    optional `quality_floor` override
  - `stage_to_task_type`: optional override map
- Document the schema in `docs/routing-config.md` with two annotated
  examples (one OpenRouter-only, one ClaudeCode-as-baseline +
  OpenRouter candidates).
- Tests: schema parses; missing fields default cleanly; unknown
  providers raise a focused error.
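
One way to pin down the schema shape listed above is a pair of dataclasses over the already-parsed YAML mapping. This is a minimal sketch, not the shipped schema: `CandidateSpec`, `RoutingConfigSpec`, `RoutingConfigError`, `parse_routing_config`, and the contents of `ALLOWED_PROVIDERS` are hypothetical names chosen for illustration. Only the field names come from the task description, and treating `id` and `model` as required is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Mapping

# Hypothetical allowlist; the task names openrouter, claude_code, openai, ...
ALLOWED_PROVIDERS = frozenset({"openrouter", "claude_code", "openai", "gemini"})


class RoutingConfigError(ValueError):
    """Raised when a routing config is malformed (hypothetical name)."""


@dataclass(frozen=True)
class CandidateSpec:
    id: str
    provider: str
    model: str
    api_key_env: str | None = None
    max_cost_per_1k: float | None = None
    quality_floor: float | None = None  # per-candidate override


@dataclass(frozen=True)
class RoutingConfigSpec:
    quality_floor: float | None = None  # global default
    ledger_path: str | None = None
    task_types: dict[str, list[CandidateSpec]] = field(default_factory=dict)
    stage_to_task_type: dict[str, str] = field(default_factory=dict)


def parse_routing_config(raw: Mapping[str, Any]) -> RoutingConfigSpec:
    """Validate a mapping such as the result of parsing the YAML file."""
    task_types: dict[str, list[CandidateSpec]] = {}
    for task_type, candidates in (raw.get("task_types") or {}).items():
        specs = []
        for entry in candidates:
            provider = entry.get("provider")
            if provider not in ALLOWED_PROVIDERS:
                # Focused error: say where the problem is and what is allowed.
                raise RoutingConfigError(
                    f"task_types.{task_type}: unknown provider {provider!r}; "
                    f"expected one of {sorted(ALLOWED_PROVIDERS)}"
                )
            specs.append(CandidateSpec(
                id=entry["id"],
                provider=provider,
                model=entry["model"],
                api_key_env=entry.get("api_key_env"),
                max_cost_per_1k=entry.get("max_cost_per_1k"),
                quality_floor=entry.get("quality_floor"),
            ))
        task_types[task_type] = specs
    # Missing top-level fields fall back to the dataclass defaults.
    return RoutingConfigSpec(
        quality_floor=raw.get("quality_floor"),
        ledger_path=raw.get("ledger_path"),
        task_types=task_types,
        stage_to_task_type=dict(raw.get("stage_to_task_type") or {}),
    )
```

Parsing from a plain mapping keeps the validation independent of whichever YAML library the loader ends up using.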

### T02 — Routing config loader

```task
id: IB-WP-0020-T02
status: todo
priority: high
state_hub_task_id: "5e38514b-ad6a-4d39-8716-f812f241d9fd"
```

- Add `src/infospace_bench/routing_config.py` (or extend
  `routing.py`) with `load_routing_config(path, *, workspace)` that
  returns a `RoutingPolicy` (or `AdaptiveRoutingPolicy` when the
  config sets `quality_floor` or names a ledger) ready to hand to
  `RoutingAssistedGenerationAdapter`.
- Provider construction:
  - `openrouter` → llm-connect `OpenRouterAdapter` with API key from
    `api_key_env` (default `OPENROUTER_API_KEY`)
  - `claude_code` → llm-connect `ClaudeCodeAdapter`
  - others (`openai`, `gemini`) supported but explicitly documented as
    untested for production use
- Tests: builds a static policy from a minimal config; builds an
  adaptive policy with a ledger; missing API key raises before any
  network call.
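
The two decisions the loader makes before touching any adapter (which API key to use, and whether the config calls for an adaptive policy) can be sketched in isolation. This is an illustration under stated assumptions, not the planned implementation: `resolve_api_key`, `wants_adaptive_policy`, and `MissingApiKeyError` are hypothetical names, and no llm-connect constructor is called because their signatures are not given here. The sketch also assumes that only the global `quality_floor` triggers the adaptive policy.

```python
from __future__ import annotations

import os
from typing import Any, Mapping

# Assumption: only openrouter has a documented default key variable.
DEFAULT_API_KEY_ENV = {"openrouter": "OPENROUTER_API_KEY"}


class MissingApiKeyError(RuntimeError):
    """Hypothetical error raised before any adapter is constructed."""


def resolve_api_key(
    provider: str,
    api_key_env: str | None = None,
    env: Mapping[str, str] | None = None,
) -> str | None:
    """Return the key for one candidate, or None if no variable applies."""
    env = os.environ if env is None else env
    name = api_key_env or DEFAULT_API_KEY_ENV.get(provider)
    if name is None:
        # Assumption: providers without a named variable (for example
        # claude_code) authenticate some other way.
        return None
    value = env.get(name)
    if not value:
        # Fail here, before any network call can happen.
        raise MissingApiKeyError(
            f"provider {provider!r} needs ${name}, which is unset or empty"
        )
    return value


def wants_adaptive_policy(config: Mapping[str, Any]) -> bool:
    """True when the config sets a quality floor or names a ledger."""
    return config.get("quality_floor") is not None or bool(config.get("ledger_path"))
```

Passing `env` explicitly lets the tests exercise the missing-key path without mutating the process environment.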

### T03 — `--provider routing` and `--routing-config` CLI flags

```task
id: IB-WP-0020-T03
status: todo
priority: high
state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"
```

- Add `routing` to the `--provider` choices on `generate run`,
  `generate resume`, and `generate from-source`.
- Add `--routing-config <path>` (required when `--provider routing`).
- Add `--quality-floor <float>` to override the config-level floor at
  the call site (handy for tightening or loosening the floor for a
  single run without editing the file).
- Wire the loader into `_adapter_for`/`run_generation` so a
  `RoutingAssistedGenerationAdapter` is constructed and passed to the
  workflow engine.
- Tests: CLI smoke that builds a routing config pointing at mocked
  adapter ids and confirms the run goes through the bridge.
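
The flag interplay described above can be sketched with `argparse`. This is a minimal illustration, assuming the real CLI is argparse-based. The helper names `build_parser`, `validate_provider_flags`, and `effective_quality_floor` are hypothetical, and `required=True` stands in for whatever default the existing `--provider` flag has.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="generate-sketch")
    parser.add_argument(
        "--provider",
        choices=["fixture", "openrouter", "routing"],
        required=True,  # assumption: the real CLI's default is not stated here
    )
    parser.add_argument("--routing-config", metavar="PATH", default=None)
    parser.add_argument("--quality-floor", type=float, default=None)
    return parser


def validate_provider_flags(
    parser: argparse.ArgumentParser, args: argparse.Namespace
) -> None:
    """Enforce the cross-flag rule that argparse cannot express declaratively."""
    if args.provider == "routing" and args.routing_config is None:
        # parser.error prints usage to stderr and exits with status 2.
        parser.error("--routing-config is required when --provider is 'routing'")


def effective_quality_floor(
    config_floor: float | None, cli_floor: float | None
) -> float | None:
    """The call-site flag wins; otherwise keep the config-level floor."""
    return config_floor if cli_floor is None else cli_floor
```

Keeping the cross-flag check in a separate function means the same rule can be reused by `generate run`, `generate resume`, and `generate from-source`.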

### T04 — Example config and live-smoke wiring

```task
id: IB-WP-0020-T04
status: todo
priority: medium
state_hub_task_id: "69288131-f265-4db5-a4b0-b0c8a6f55dd8"
```

- Add `examples/routing/trading-literature.yaml` with a realistic
  config aimed at the Lefevre source: cheap model for summaries, mid
  model for entities/relations, ClaudeCode baseline behind a shadow
  sampler.
- Update the optional live-OpenRouter smoke test
  (`tests/test_openrouter_live.py`) with a parallel skipped test that
  exercises `--provider routing` end-to-end when both
  `OPENROUTER_API_KEY` and
  `INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1` are set.
- Document how to run the live routing smoke in
  `docs/generic-source-generator.md`.
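
The gating condition for the live routing smoke test can be expressed as a small predicate. This is a sketch only: the function name `live_routing_smoke_enabled` is hypothetical, and the two environment variable names are the ones given in the task above.

```python
from __future__ import annotations

import os
from typing import Mapping


def live_routing_smoke_enabled(env: Mapping[str, str] | None = None) -> bool:
    """True only when a key is present AND the opt-in switch is exactly '1'."""
    env = os.environ if env is None else env
    has_key = bool(env.get("OPENROUTER_API_KEY"))
    opted_in = env.get("INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER") == "1"
    return has_key and opted_in
```

In a pytest module this predicate would typically feed `pytest.mark.skipif(not live_routing_smoke_enabled(), reason="live OpenRouter smoke not enabled")`, so the test is collected but skipped by default.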

### T05 — Shadow-mode opt-in flag

```task
id: IB-WP-0020-T05
status: todo
priority: medium
state_hub_task_id: "02658420-056c-4d73-8055-e6a7ab51876b"
```

- Add `--shadow-rate <float>` and `--shadow-baseline <id>` flags so a
  caller can enable `wrap_with_shadow_sampling()` for an entire run
  without editing the config file. When set, the loader wraps each
  candidate adapter in `ShadowingAdapter` with the named baseline and
  the chosen rate.
- Tests: monkeypatched baseline asserts the shadow path fires at
  `shadow_rate=1.0` and skips at `shadow_rate=0.0`.
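
The sampling behaviour the tests above rely on can be illustrated with a stand-in wrapper. `ShadowSketch` is a hypothetical class written for this example. It is not llm-connect's `ShadowingAdapter`, whose constructor and call signature are not shown here, and it models adapters as plain `str -> str` callables.

```python
from __future__ import annotations

import random
from typing import Callable

Generate = Callable[[str], str]  # simplified stand-in for an adapter call


class ShadowSketch:
    """Always answer with the primary; sometimes also run the baseline."""

    def __init__(
        self,
        primary: Generate,
        baseline: Generate,
        rate: float,
        rng: random.Random | None = None,
    ) -> None:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("shadow rate must be within [0.0, 1.0]")
        self.primary = primary
        self.baseline = baseline
        self.rate = rate
        self.rng = rng or random.Random()
        # (prompt, primary output, baseline output) triples for later grading.
        self.samples: list[tuple[str, str, str]] = []

    def __call__(self, prompt: str) -> str:
        result = self.primary(prompt)
        # random() is in [0.0, 1.0): rate=1.0 always fires, rate=0.0 never does.
        if self.rng.random() < self.rate:
            self.samples.append((prompt, result, self.baseline(prompt)))
        return result
```

Because the comparison is strict and `random()` never returns 1.0, the two boundary rates are deterministic, which is what makes the monkeypatched-baseline tests reliable without seeding.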

## Acceptance

- `infospace-bench generate from-source ... --provider routing
  --routing-config <path>` succeeds against the deterministic Lefevre
  fixture with a hand-crafted routing config and mocked adapters.
- The generation report's `## Per-stage adapter choices` section
  reflects the routed choices, and `output/budget/usage.yaml` buckets
  reflect the actual model that ran each call.
- The static `openrouter` and `fixture` provider paths remain
  unchanged.
- An optional live smoke test exists and is gated identically to the
  IB-WP-0016 OpenRouter smoke.
- Documentation explains the config shape, the API-key resolution, and
  the difference between adaptive routing and shadow-mode sampling.

## Risks and open questions

- **Adapter constructor surface.** llm-connect's adapter constructors
  vary slightly per provider; the loader needs to keep a small but
  explicit allowlist of provider names rather than reflective magic.
- **API key plumbing.** Today `openrouter` reads
  `OPENROUTER_API_KEY` directly. The config will name the env var
  explicitly to make multi-key setups workable; no key material
  belongs in the config file itself.
- **Schema versioning.** Include a `schema_version` field from day one
  so the loader can refuse mismatched configs once the shape
  stabilises.
- **Shadow grader choice.** v1 will default the shadow grader to
  `ExactMatchJudge` because it has no extra cost. `LLMJudge` and
  `EmbeddingSimilarityJudge` configuration belongs in a follow-up.
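
For the schema-versioning item above, the refusal check could look like the following. This is a sketch under assumptions: `SUPPORTED_SCHEMA_VERSION`, `SchemaVersionError`, and `check_schema_version` are hypothetical names, the version value `1` is illustrative, and rejecting a config with no `schema_version` at all is one possible policy rather than a decided one.

```python
from __future__ import annotations

from typing import Any, Mapping

SUPPORTED_SCHEMA_VERSION = 1  # illustrative value, not a decided one


class SchemaVersionError(ValueError):
    """Hypothetical error for configs the loader refuses to interpret."""


def check_schema_version(raw: Mapping[str, Any]) -> int:
    """Return the declared version, or raise if it is missing or unsupported."""
    version = raw.get("schema_version")
    if version is None:
        # Assumption: an undeclared version is rejected rather than guessed.
        raise SchemaVersionError("routing config is missing 'schema_version'")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise SchemaVersionError(
            f"unsupported schema_version {version!r}; "
            f"this loader understands {SUPPORTED_SCHEMA_VERSION}"
        )
    return version
```

Running this check before any other parsing keeps version mismatches from surfacing as confusing field-level errors.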

## Downstream effects

- `infospace-bench routing ledger <path>` (already shipped via
  IB-WP-0018) becomes the natural companion CLI for inspecting the
  observations the routed runs accumulate.
- Completing T03 and T04 lets us run a multi-chapter Lefevre live
  build using the adaptive router and validate the IB-WP-0016
  reviewer checklist on real output without single-model lock-in.