infospace-bench/workplans/IB-WP-0020-provider-routing-cli.md
tegwick debd2b8e69 IB-WP-0020-T04: example routing config + live routing smoke
examples/routing/trading-literature.yaml is the checked-in starting
config for a Lefevre-style run. It applies the IB-WP-0018 task-type
taxonomy: cheap candidates for summary + evaluation, smart candidates
for entity + relation extraction, and a separate baseline rule wiring
claude_code for a follow-on T05 ShadowingAdapter step. Workspace-
relative ledger_path keeps adaptive observations with the workspace.

tests/test_routing_config.py gains a regression test that asserts the
shipped example parses cleanly, every stage in stage_to_task_type maps
to a declared task type, and the baseline candidate uses the
claude_code provider — so the example will not bit-rot silently.

tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke
gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY
opt-in as the existing static smoke. It builds a one-candidate routing
config, runs a single chapter through --provider routing, and asserts
the per-stage adapter-choices report section names the routed model
and the routed artifacts carry adapter_id provenance.

docs/generic-source-generator.md gains a "Live runs with --provider
routing" subsection that walks through the one-command routed run,
explains the --quality-floor override, and points at the parallel
live smoke test.

174 tests pass, 2 skipped (both live smokes, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 22:19:54 +02:00


id type title domain repo status owner topic_slug created updated depends_on_workplans related_workplans state_hub_workstream_slug state_hub_workstream_id
IB-WP-0020 workplan Provider Routing CLI Integration markitect infospace-bench active markitect markitect 2026-05-18 2026-05-18
IB-WP-0018
LLM-WP-0004
IB-WP-0016
IB-WP-0019
ib-wp-0020-provider-routing-cli 172bb082-610a-477b-b5e0-26c9f4bdfd95

IB-WP-0020 — Provider Routing CLI Integration

Goal

Expose RoutingAssistedGenerationAdapter (IB-WP-0018) as a first-class CLI option so a real multi-chapter or full-book run can use the adaptive router without writing any Python. Today --provider accepts fixture and openrouter; this workplan adds routing, plus a small config file that names the rules, the ledger, the quality floors, and the per-stage task-type overrides.

The end state is a single command that does cost-aware adaptive routing across multiple OpenRouter models and writes back the per-stage adapter choices, the budget log, and (optionally) sampled shadow grades:

infospace-bench generate from-source ./LEFEVRE.epub \
  --workspace ./infospaces \
  --slug reminiscences-routed \
  --name "Reminiscences (Routed)" \
  --profile trading-literature \
  --provider routing \
  --routing-config ./routing.yaml \
  --chapter I \
  --apply

Why this is a separate workplan

IB-WP-0018 shipped the bridge module and its programmatic API. CLI wiring needs its own config-file schema, loader, error surfaces, and end-to-end smoke test. That is enough scope to justify a separate workplan with its own review, rather than reopening the already-closed IB-WP-0018.

Non-Goals

  • Owning the routing policy primitives (those live in llm-connect LLM-WP-0004).
  • Replacing the static openrouter provider — that path stays usable for callers who do not want the router.
  • Embedding model selection logic inside the CLI; the config file is declarative and routing decisions stay with AdaptiveRoutingPolicy.

Tasks

T01 — Routing config file schema

id: IB-WP-0020-T01
status: done
priority: medium
state_hub_task_id: "39597441-22ab-4dcf-b68d-b045823a9374"
  • Define a small YAML schema for a routing config:
    • quality_floor: <float | null> (global default)
    • ledger_path: <str | null> (relative to workspace by default)
    • task_types: map of task_type to a list of candidate adapters, each with id, provider (openrouter, claude_code, openai, …), model, api_key_env, optional max_cost_per_1k, optional quality_floor override
    • stage_to_task_type: optional override map
  • Document the schema in docs/routing-config.md with two annotated examples (one OpenRouter-only, one ClaudeCode-as-baseline + OpenRouter candidates).
  • Tests: schema parses; missing fields default cleanly; unknown providers raise a focused error.
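The schema fields listed in T01 can be sketched as a pair of dataclasses plus a parser over an already-loaded mapping. This is an illustration only, not the shipped loader: `Candidate`, `RoutingConfig`, `parse_routing_config`, `UnknownProviderError`, and `KNOWN_PROVIDERS` are hypothetical names, the YAML is assumed to have been parsed into a plain dict beforehand, and treating `model` as optional is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumption: the allowlist mirrors the providers named in this workplan.
KNOWN_PROVIDERS = {"openrouter", "claude_code", "openai", "gemini"}


class UnknownProviderError(ValueError):
    """Raised when a candidate names a provider outside the allowlist."""


@dataclass(frozen=True)
class Candidate:
    id: str
    provider: str
    model: Optional[str] = None            # assumption: optional in this sketch
    api_key_env: Optional[str] = None
    max_cost_per_1k: Optional[float] = None
    quality_floor: Optional[float] = None  # per-candidate override


@dataclass(frozen=True)
class RoutingConfig:
    quality_floor: Optional[float] = None  # global default
    ledger_path: Optional[str] = None
    task_types: dict[str, list[Candidate]] = field(default_factory=dict)
    stage_to_task_type: dict[str, str] = field(default_factory=dict)


def parse_routing_config(raw: dict[str, Any]) -> RoutingConfig:
    """Build a RoutingConfig from a parsed mapping; absent fields fall back to defaults."""
    task_types: dict[str, list[Candidate]] = {}
    for task_type, entries in (raw.get("task_types") or {}).items():
        candidates = []
        for entry in entries:
            provider = entry.get("provider")
            if provider not in KNOWN_PROVIDERS:
                # Focused error: say which task type and candidate is at fault.
                raise UnknownProviderError(
                    f"task_types.{task_type}: unknown provider {provider!r} "
                    f"for candidate {entry.get('id')!r}"
                )
            candidates.append(Candidate(**entry))
        task_types[task_type] = candidates
    return RoutingConfig(
        quality_floor=raw.get("quality_floor"),
        ledger_path=raw.get("ledger_path"),
        task_types=task_types,
        stage_to_task_type=dict(raw.get("stage_to_task_type") or {}),
    )
```

Checking the provider before constructing the `Candidate` keeps the error message tied to the config location instead of surfacing later as a generic failure.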

T02 — Routing config loader

id: IB-WP-0020-T02
status: done
priority: high
state_hub_task_id: "5e38514b-ad6a-4d39-8716-f812f241d9fd"
  • Add src/infospace_bench/routing_config.py (or extend routing.py) with load_routing_config(path, *, workspace) that returns a RoutingPolicy (or AdaptiveRoutingPolicy when the config sets quality_floor or names a ledger) ready to hand to RoutingAssistedGenerationAdapter.
  • Provider construction:
    • openrouter → llm-connect OpenRouterAdapter with API key from api_key_env (default OPENROUTER_API_KEY)
    • claude_code → llm-connect ClaudeCodeAdapter
    • other providers (openai, gemini) are supported but explicitly documented as untested for production use
  • Tests: builds a static policy from a minimal config; builds an adaptive policy with a ledger; missing API key raises before any network call.
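One way to get "missing API key raises before any network call" is to resolve the key while the adapter is being built, behind an explicit provider table. The sketch below uses placeholder dicts where the real llm-connect adapters would be constructed, because their constructor signatures are not given here. `resolve_api_key`, `build_adapter`, `PROVIDER_BUILDERS`, and `MissingApiKeyError` are hypothetical names.

```python
import os
from typing import Any, Callable, Mapping, Optional

Env = Mapping[str, str]


class MissingApiKeyError(RuntimeError):
    """Raised at build time when the named environment variable is unset."""


def resolve_api_key(api_key_env: Optional[str], default_env: str, environ: Env) -> str:
    """Return the key from the configured env var, or fail before any I/O happens."""
    env_name = api_key_env or default_env
    key = environ.get(env_name, "")
    if not key:
        raise MissingApiKeyError(f"environment variable {env_name} is not set")
    return key


def build_openrouter(candidate: Mapping[str, Any], environ: Env) -> dict[str, Any]:
    api_key = resolve_api_key(candidate.get("api_key_env"), "OPENROUTER_API_KEY", environ)
    # Placeholder standing in for the real OpenRouter adapter construction.
    return {"provider": "openrouter", "model": candidate["model"], "api_key": api_key}


def build_claude_code(candidate: Mapping[str, Any], environ: Env) -> dict[str, Any]:
    # Assumption: no key lookup is needed for this provider in the sketch.
    return {"provider": "claude_code", "model": candidate.get("model")}


# Explicit table of supported providers, with no reflection involved.
PROVIDER_BUILDERS: dict[str, Callable[[Mapping[str, Any], Env], dict[str, Any]]] = {
    "openrouter": build_openrouter,
    "claude_code": build_claude_code,
}


def build_adapter(candidate: Mapping[str, Any], environ: Optional[Env] = None) -> dict[str, Any]:
    env = os.environ if environ is None else environ
    provider = candidate.get("provider")
    builder = PROVIDER_BUILDERS.get(provider)
    if builder is None:
        raise ValueError(
            f"unsupported provider {provider!r}; expected one of {sorted(PROVIDER_BUILDERS)}"
        )
    return builder(candidate, env)
```

Passing the environment in as a parameter lets a test exercise the missing-key path without touching the real process environment.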

T03 — --provider routing and --routing-config CLI flags

id: IB-WP-0020-T03
status: done
priority: high
state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"
  • Add routing to the --provider choices on generate run, generate resume, and generate from-source.
  • Add --routing-config <path> (required when --provider routing).
  • Add --quality-floor <float> to override the config-level floor at the call site (handy for tightening or loosening the floor for a single run without editing the file).
  • Wire the loader into _adapter_for/run_generation so a RoutingAssistedGenerationAdapter is constructed and passed to the workflow engine.
  • Tests: CLI smoke that builds a routing config pointing at mocked adapter ids and confirms the run goes through the bridge.
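The flag rules in T03 can be sketched with argparse. This is not the project's actual parser: the `generate-sketch` program name, the `fixture` default, and the helper names `parse_args` and `effective_quality_floor` are assumptions for illustration.

```python
import argparse
from typing import Optional, Sequence

PROVIDERS = ("fixture", "openrouter", "routing")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="generate-sketch")
    # Assumption: "fixture" as the default provider is illustrative only.
    parser.add_argument("--provider", choices=PROVIDERS, default="fixture")
    parser.add_argument("--routing-config", default=None)
    parser.add_argument("--quality-floor", type=float, default=None)
    return parser


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse cannot express "required only when another flag has a given value",
    # so the conditional requirement is checked after parsing.
    if args.provider == "routing" and args.routing_config is None:
        parser.error("--routing-config is required when --provider routing")
    return args


def effective_quality_floor(
    config_floor: Optional[float], cli_floor: Optional[float]
) -> Optional[float]:
    """The CLI value wins when given; an explicit 0.0 still counts as given."""
    return cli_floor if cli_floor is not None else config_floor
```

Comparing against `None` instead of testing truthiness matters here, because `--quality-floor 0.0` is a legitimate override.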

T04 — Example config and live-smoke wiring

id: IB-WP-0020-T04
status: done
priority: medium
state_hub_task_id: "69288131-f265-4db5-a4b0-b0c8a6f55dd8"
  • Add examples/routing/trading-literature.yaml with a realistic Lefevre-aimed config: cheap model for summaries, mid model for entities/relations, ClaudeCode baseline behind a shadow sampler.
  • Extend the optional live-OpenRouter smoke test module (tests/test_openrouter_live.py) with a parallel test, skipped by default, that exercises --provider routing end-to-end when both OPENROUTER_API_KEY and INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1 are set.
  • Document how to run the live routing smoke in docs/generic-source-generator.md.

T05 — Shadow-mode opt-in flag

id: IB-WP-0020-T05
status: todo
priority: medium
state_hub_task_id: "02658420-056c-4d73-8055-e6a7ab51876b"
  • Add --shadow-rate <float> and --shadow-baseline <id> flags so a caller can enable wrap_with_shadow_sampling() for an entire run without editing the config file. When set, the loader wraps each candidate adapter in ShadowingAdapter with the named baseline and the chosen rate.
  • Tests: with a monkeypatched baseline, assert that the shadow path fires at shadow_rate=1.0 and is skipped at shadow_rate=0.0.
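The sampling behaviour those tests pin down can be sketched with a small wrapper. This is not the llm-connect ShadowingAdapter. `ShadowSketch` is a hypothetical stand-in that treats adapters as plain callables and records an exact-match comparison, in line with the ExactMatchJudge default listed under the risks in this workplan.

```python
import random
from typing import Callable, Optional

Adapter = Callable[[str], str]


class ShadowSketch:
    """Runs the primary adapter every time and the baseline on a sampled fraction of calls."""

    def __init__(
        self,
        primary: Adapter,
        baseline: Adapter,
        shadow_rate: float,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not 0.0 <= shadow_rate <= 1.0:
            raise ValueError("shadow_rate must be between 0.0 and 1.0")
        self.primary = primary
        self.baseline = baseline
        self.shadow_rate = shadow_rate
        self.comparisons: list[tuple[str, bool]] = []
        self._rng = rng or random.Random()

    def __call__(self, prompt: str) -> str:
        result = self.primary(prompt)
        # random() returns a value in [0.0, 1.0), so a rate of 1.0 always
        # samples and a rate of 0.0 never does.
        if self._rng.random() < self.shadow_rate:
            baseline_result = self.baseline(prompt)
            self.comparisons.append((prompt, baseline_result == result))
        # The caller always receives the primary result; the baseline is only observed.
        return result
```

The strict `<` comparison is what makes the two boundary rates deterministic, so the tests need no seeded generator.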

Acceptance

  • infospace-bench generate from-source ... --provider routing --routing-config <path> succeeds against the deterministic Lefevre fixture with a hand-crafted routing config and mocked adapters.
  • The generation report's ## Per-stage adapter choices section reflects the routed choices, and output/budget/usage.yaml buckets reflect the actual model that ran each call.
  • The static openrouter and fixture provider paths remain unchanged.
  • An optional live smoke test exists and is gated identically to the IB-WP-0016 OpenRouter smoke.
  • Documentation explains the config shape, the API-key resolution, and the difference between adaptive routing and shadow-mode sampling.

Risks and open questions

  • Adapter constructor surface. llm-connect's adapter constructors vary slightly per provider; the loader needs to keep a small but explicit allowlist of provider names rather than rely on reflective magic.
  • API key plumbing. Today openrouter reads OPENROUTER_API_KEY directly. The config will name the env var explicitly to make multi-key setups workable; no key material belongs in the config file itself.
  • Schema versioning. Include a schema_version field from day one so the loader can refuse mismatched configs once the shape stabilises.
  • Shadow grader choice. v1 will default the shadow grader to ExactMatchJudge because it has no extra cost. LLMJudge and EmbeddingSimilarityJudge configuration belongs in a follow-up.
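The schema-versioning point could be enforced with a check that runs before any other field is read. The sketch below is an assumption about how that might look: `SUPPORTED_SCHEMA_VERSION`, `SchemaVersionError`, and `check_schema_version` are hypothetical names, and rejecting a config that omits the field is one possible policy, not a decision recorded in this workplan.

```python
from typing import Any, Mapping

# Hypothetical: the single version this loader sketch understands.
SUPPORTED_SCHEMA_VERSION = 1


class SchemaVersionError(ValueError):
    """Raised when a routing config declares a missing or unsupported schema_version."""


def check_schema_version(raw: Mapping[str, Any]) -> int:
    """Return the declared version, or refuse the config with a focused error."""
    version = raw.get("schema_version")
    if version is None:
        raise SchemaVersionError("routing config is missing schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise SchemaVersionError(
            f"unsupported schema_version {version!r}; "
            f"this loader supports {SUPPORTED_SCHEMA_VERSION}"
        )
    return version
```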

Downstream effects

  • infospace-bench routing ledger <path> (already shipped via IB-WP-0018) becomes the natural companion CLI for inspecting the observations the routed runs accumulate.
  • A successful T03 + T04 lets us run a multi-chapter Lefevre live build using the adaptive router and validate the IB-WP-0016 reviewer checklist on real output without single-model lock-in.