
tegwick c11a942bb7 IB-WP-0020-T01: routing config schema and parser

Add a small YAML routing config schema (schema_version 1) and a
parser-only loader at src/infospace_bench/routing_config.py. The
loader validates the declarative shape — task_types with candidates,
optional per-task quality_floor, optional default_quality_floor,
optional ledger_path, optional stage_to_task_type override map — and
refuses bad shapes before any network or workspace work happens.

Supported provider names: openrouter, claude_code, openai, gemini.
Unknown providers, missing required candidate fields, out-of-range
quality floors, negative max_cost_per_1k, duplicate candidate ids
within a task type, and non-mapping stage_to_task_type all raise
focused InfospaceError codes that callers can pattern-match.

docs/routing-config.md documents the schema with two annotated
examples (OpenRouter-only two-tier, and adaptive with a ClaudeCode
baseline) plus the full "what fails fast" list.

16 parser tests cover happy-path round-trip, file load, missing file,
malformed YAML, and every validation surface (wrong/missing schema
version, empty task_types, empty candidates, missing required fields,
unsupported provider, negative cost, out-of-range quality_floor,
duplicate ids, non-mapping stage_map, non-string ledger_path).

T02 will turn a RoutingConfig into a live llm-connect RoutingPolicy /
AdaptiveRoutingPolicy with constructed LLMAdapter instances.

160 tests pass, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-18 18:09:28 +02:00

4.7 KiB


Routing Config Schema

Workplan: IB-WP-0020 (T01 schema, T02 loader)

Module: src/infospace_bench/routing_config.py

A routing config is a small YAML file that names the candidate adapters per task type and (optionally) the quality floor, the QualityLedger path, and a stage-to-task-type override map. The file is the consumer side of llm-connect LLM-WP-0004's routing primitives: it does not embed model selection logic, just declares the universe the policy can choose from.

The schema_version is pinned to 1. Bump it (and the parser) before making backward-incompatible changes.

Top-level fields

| Field | Type | Notes |
| --- | --- | --- |
| `schema_version` | int (required) | Currently `1`. Mismatch fails fast. |
| `task_types` | mapping (required) | At least one entry. Each entry has `candidates` and an optional `quality_floor`. |
| `default_quality_floor` | float (optional) | Falls back when a task type does not name its own. Must be 0..1. |
| `ledger_path` | string (optional) | Path to a `QualityLedger` JSONL. Relative paths resolve against the workspace by default. Required when any `quality_floor` is non-null. |
| `stage_to_task_type` | mapping (optional) | Caller-supplied mapping from infospace-bench stage ids to task types. Falls through to identity when omitted. |
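The three fallback rules in the table (per-task floor over default floor, identity stage mapping, workspace-relative ledger path) can be sketched in a few lines. This is a minimal illustration, not the code in routing_config.py: the function names are hypothetical, and treating a stage that is absent from a supplied map as identity is an assumption that goes beyond what the table states.

```python
from pathlib import Path
from typing import Mapping, Optional


def effective_quality_floor(task_cfg: Mapping, default_floor: Optional[float]) -> Optional[float]:
    """A per-task quality_floor wins; otherwise default_quality_floor applies."""
    floor = task_cfg.get("quality_floor")
    return default_floor if floor is None else floor


def resolve_task_type(stage_id: str, stage_map: Optional[Mapping[str, str]]) -> str:
    """Identity fall-through: with no map, the stage id is its own task type.

    Assumption: a stage missing from a supplied map also falls through.
    """
    if not stage_map:
        return stage_id
    return stage_map.get(stage_id, stage_id)


def resolve_ledger_path(ledger_path: str, workspace: Path) -> Path:
    """Relative ledger paths are joined onto the workspace; absolute ones are kept."""
    path = Path(ledger_path)
    return path if path.is_absolute() else workspace / path
```

With these helpers, `resolve_task_type("summarize-source", None)` returns the stage id unchanged, while a map such as `{"summarize-source": "cheap"}` redirects it to `cheap`.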

Candidate fields

Each entry under task_types.<task_type>.candidates[]:

| Field | Type | Notes |
| --- | --- | --- |
| `id` | string (required) | Stable adapter id used for the `QualityLedger` and the per-stage adapter-choice line of the generation report. |
| `provider` | string (required) | One of `openrouter`, `claude_code`, `openai`, `gemini`. |
| `model` | string (required) | Provider-specific model id, e.g. `openai/gpt-4o-mini`. |
| `api_key_env` | string (optional) | Env var that holds the API key. Defaults to a provider-specific name (`OPENROUTER_API_KEY` etc.) in the T02 loader. |
| `max_cost_per_1k` | float (optional) | Static cost cap. Static `RoutingPolicy` falls back to a cheaper candidate when the caller-supplied estimate exceeds this. |
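As a rough picture of what one parsed candidate could look like in memory, the sketch below maps the five fields onto a frozen dataclass and checks the required ones. The `Candidate` class, `candidate_from_mapping`, and the use of `ValueError` are illustrative assumptions; the real module raises InfospaceError codes and may model candidates differently.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

SUPPORTED_PROVIDERS = ("openrouter", "claude_code", "openai", "gemini")


@dataclass(frozen=True)
class Candidate:
    """Hypothetical in-memory form of one candidates[] entry."""
    id: str
    provider: str
    model: str
    api_key_env: Optional[str] = None
    max_cost_per_1k: Optional[float] = None


def candidate_from_mapping(raw: Mapping[str, Any]) -> Candidate:
    # id, provider, and model are required, non-empty strings.
    for name in ("id", "provider", "model"):
        value = raw.get(name)
        if not isinstance(value, str) or not value:
            raise ValueError(f"candidate is missing required field {name!r}")
    if raw["provider"] not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider {raw['provider']!r}")
    # max_cost_per_1k is optional, but must be a non-negative number when present.
    cost = raw.get("max_cost_per_1k")
    if cost is not None:
        if isinstance(cost, bool) or not isinstance(cost, (int, float)) or cost < 0:
            raise ValueError("max_cost_per_1k must be a non-negative number")
        cost = float(cost)
    return Candidate(
        id=raw["id"],
        provider=raw["provider"],
        model=raw["model"],
        api_key_env=raw.get("api_key_env"),
        max_cost_per_1k=cost,
    )
```

Leaving `api_key_env` as `None` here mirrors the table: the provider-specific default is filled in later by the T02 loader, not at parse time.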

Example A — OpenRouter-only, two-tier

A pragmatic Lefevre-style config: a cheap model for summaries and entity evaluation, and a mid-tier model for entity extraction, relation extraction, and report synthesis. No adaptive routing, no ledger.

schema_version: 1

stage_to_task_type:
  summarize-source: cheap
  extract-entities: smart
  extract-relations: smart
  evaluate-entity: cheap
  synthesize-report: smart

task_types:
  cheap:
    candidates:
      - id: openrouter:gpt-4o-mini
        provider: openrouter
        model: openai/gpt-4o-mini
        api_key_env: OPENROUTER_API_KEY
  smart:
    candidates:
      - id: openrouter:claude-3.5-sonnet
        provider: openrouter
        model: anthropic/claude-3.5-sonnet
        api_key_env: OPENROUTER_API_KEY

Example B — Adaptive with a ClaudeCode baseline

A two-candidate-per-stage adaptive config. The QualityLedger accumulates observations; over time, the cheaper model that still meets the stage's quality floor is preferred. ClaudeCodeAdapter is wired into a separate task_types.baseline entry so that a ShadowingAdapter builder (T05) can reference it.

schema_version: 1
default_quality_floor: 0.80
ledger_path: output/routing/quality.jsonl

task_types:
  summarize-source:
    quality_floor: 0.70
    candidates:
      - id: openrouter:gpt-4o-mini
        provider: openrouter
        model: openai/gpt-4o-mini
        api_key_env: OPENROUTER_API_KEY
        max_cost_per_1k: 0.001
      - id: openrouter:claude-3.5-haiku
        provider: openrouter
        model: anthropic/claude-3.5-haiku
        api_key_env: OPENROUTER_API_KEY
        max_cost_per_1k: 0.003

  extract-entities:
    quality_floor: 0.85
    candidates:
      - id: openrouter:claude-3.5-haiku
        provider: openrouter
        model: anthropic/claude-3.5-haiku
        api_key_env: OPENROUTER_API_KEY
      - id: openrouter:claude-3.5-sonnet
        provider: openrouter
        model: anthropic/claude-3.5-sonnet
        api_key_env: OPENROUTER_API_KEY

  baseline:
    candidates:
      - id: claude-code
        provider: claude_code
        model: claude-opus-4-7
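To make "the cheaper qualifying model is preferred" concrete, here is one possible reading as a toy selection function over the summarize-source candidates from Example B. It is not llm-connect's AdaptiveRoutingPolicy: the in-memory `observations` mapping, the `min_samples` threshold, the fallback to the last-listed candidate, and the use of `max_cost_per_1k` as the cost ordering are all assumptions made for illustration, and the scores are invented.

```python
from statistics import mean
from typing import Any, Mapping, Sequence


def pick_candidate(
    candidates: Sequence[Mapping[str, Any]],
    observations: Mapping[str, Sequence[float]],
    quality_floor: float,
    min_samples: int = 3,
) -> str:
    """Return the id of the cheapest candidate whose mean observed quality meets the floor.

    Assumption: when no candidate qualifies yet, fall back to the last-listed one.
    """
    qualifying = []
    for cand in candidates:
        scores = observations.get(cand["id"], [])
        if len(scores) >= min_samples and mean(scores) >= quality_floor:
            qualifying.append(cand)
    if not qualifying:
        return candidates[-1]["id"]
    # Candidates without a cost cap sort last.
    return min(qualifying, key=lambda c: c.get("max_cost_per_1k", float("inf")))["id"]


summarize_candidates = [
    {"id": "openrouter:gpt-4o-mini", "max_cost_per_1k": 0.001},
    {"id": "openrouter:claude-3.5-haiku", "max_cost_per_1k": 0.003},
]
# Invented quality scores standing in for ledger observations.
observed = {
    "openrouter:gpt-4o-mini": [0.72, 0.75, 0.71],
    "openrouter:claude-3.5-haiku": [0.90, 0.88, 0.91],
}
choice = pick_candidate(summarize_candidates, observed, quality_floor=0.70)
```

With the 0.70 floor both candidates qualify, so the cheaper `openrouter:gpt-4o-mini` is chosen; raising the floor to 0.85 would leave only the haiku candidate qualifying.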

What fails fast

The parser refuses, before any network or workspace work, when:

schema_version is missing or not 1
task_types is missing or empty
Any task_type has no candidates
A candidate is missing id, provider, or model
A provider is not one of the supported names
max_cost_per_1k is non-numeric or negative
Any quality_floor (top-level or per-task) is outside 0..1
A task_type has duplicate candidate ids
ledger_path or stage_to_task_type has the wrong YAML shape

api_key_env resolution and live adapter construction happen in T02. This file only validates the declarative shape.
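The config-level checks above could be arranged roughly as follows, operating on an already-parsed mapping (for example the result of a YAML load). This is a sketch under stated assumptions: `ConfigError` is a hypothetical stand-in for InfospaceError, the code strings are invented, and per-candidate field checks are left out for brevity.

```python
from collections.abc import Mapping
from typing import Any


class ConfigError(Exception):
    """Hypothetical stand-in for InfospaceError; carries a short code callers can match."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def _check_floor(value: Any, where: str) -> None:
    if value is None:
        return
    if isinstance(value, bool) or not isinstance(value, (int, float)) or not 0 <= value <= 1:
        raise ConfigError("quality_floor_out_of_range", f"{where} must be within 0..1")


def validate_shape(raw: Mapping) -> None:
    """Raise ConfigError on the first bad shape; touches no network or workspace."""
    version = raw.get("schema_version")
    if isinstance(version, bool) or version != 1:
        raise ConfigError("unsupported_schema_version", "schema_version must be 1")
    task_types = raw.get("task_types")
    if not isinstance(task_types, Mapping) or not task_types:
        raise ConfigError("task_types_empty", "task_types needs at least one entry")
    _check_floor(raw.get("default_quality_floor"), "default_quality_floor")
    for name, cfg in task_types.items():
        candidates = cfg.get("candidates") if isinstance(cfg, Mapping) else None
        if not isinstance(candidates, list) or not candidates:
            raise ConfigError("candidates_empty", f"task type {name!r} has no candidates")
        _check_floor(cfg.get("quality_floor"), f"task_types.{name}.quality_floor")
        ids = [c.get("id") if isinstance(c, Mapping) else None for c in candidates]
        if len(ids) != len(set(ids)):
            raise ConfigError("duplicate_candidate_id", f"task type {name!r} repeats a candidate id")
    ledger_path = raw.get("ledger_path")
    if ledger_path is not None and not isinstance(ledger_path, str):
        raise ConfigError("ledger_path_not_string", "ledger_path must be a string")
    stage_map = raw.get("stage_to_task_type")
    if stage_map is not None and not isinstance(stage_map, Mapping):
        raise ConfigError("stage_map_not_mapping", "stage_to_task_type must be a mapping")
```

A caller can then branch on `exc.code` instead of parsing messages, which is the pattern-matching style the parser's focused error codes are meant to support.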
