IB-WP-0020-T01: routing config schema and parser

Add a small YAML routing config schema (schema_version 1) and a parser-only loader at src/infospace_bench/routing_config.py. The loader validates the declarative shape (task_types with candidates, optional per-task quality_floor, optional default_quality_floor, optional ledger_path, optional stage_to_task_type override map) and refuses bad shapes before any network or workspace work happens.

Supported provider names: openrouter, claude_code, openai, gemini. Unknown providers, missing required candidate fields, out-of-range quality floors, negative max_cost_per_1k, duplicate candidate ids within a task type, and non-mapping stage_to_task_type all raise focused InfospaceError codes that callers can pattern-match.

docs/routing-config.md documents the schema with two annotated examples (OpenRouter-only two-tier, and adaptive with a ClaudeCode baseline) plus the full "what fails fast" list.

16 parser tests cover happy-path round-trip, file load, missing file, malformed YAML, and every validation surface (wrong/missing schema version, empty task_types, empty candidates, missing required fields, unsupported provider, negative cost, out-of-range quality_floor, duplicate ids, non-mapping stage_map, non-string ledger_path).

T02 will turn a RoutingConfig into a live llm-connect RoutingPolicy / AdaptiveRoutingPolicy with constructed LLMAdapter instances.

160 tests pass, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Refresh agent instruction files
2026-05-18 18:09:28 +02:00 · 2026-05-18 16:55:43 +02:00 · 2026-05-18 13:50:26 +02:00 · 2026-05-18 11:52:05 +02:00
19 changed files with 1609 additions and 85 deletions
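
The schema's two optional fallbacks, as described in docs/routing-config.md, are that a stage id with no entry in `stage_to_task_type` falls through to identity, and that a task type without its own `quality_floor` uses `default_quality_floor`. The sketch below illustrates both rules on a plain dict. The helper names `resolve_task_type` and `effective_quality_floor` are hypothetical and not taken from this commit; the real resolution is expected to live in the T02 loader and may differ.

```python
from __future__ import annotations

from typing import Any


def resolve_task_type(stage_id: str, stage_map: dict[str, str] | None) -> str:
    """Map a stage id to a task type; unmapped stages fall through to identity."""
    return (stage_map or {}).get(stage_id, stage_id)


def effective_quality_floor(config: dict[str, Any], task_type: str) -> float | None:
    """Prefer the task type's own floor, else the top-level default (may be None)."""
    rule = config.get("task_types", {}).get(task_type) or {}
    floor = rule.get("quality_floor")
    return floor if floor is not None else config.get("default_quality_floor")


# Values borrowed from the two examples in docs/routing-config.md.
stage_map = {"summarize-source": "cheap", "extract-entities": "smart"}
config = {
    "default_quality_floor": 0.80,
    "task_types": {
        "summarize-source": {"quality_floor": 0.70},
        "baseline": {},  # no floor of its own, so the default applies
    },
}

print(resolve_task_type("summarize-source", stage_map))  # mapped stage
print(resolve_task_type("intake", stage_map))            # identity fallback
print(effective_quality_floor(config, "baseline"))       # default floor
```
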
--- a/.claude/rules/agents.md
+++ b/.claude/rules/agents.md
@@ -0,0 +1,20 @@
+## Kaizen Agents
+
+Specialized agent personas available on demand via the state-hub MCP.
+
+**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
+**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them
+
+Common agents:
+
+| Agent | Category | When to use |
+|-------|----------|-------------|
+| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
+| `code-refactoring` | quality | Code quality analysis and safe refactoring |
+| `test-maintenance` | testing | Diagnose and fix failing tests |
+| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
+| `keepaTodofile` | process | Maintain TODO.md during work |
+| `project-management` | process | Track status, determine next steps |
+| `datamodel-optimization` | quality | Optimize dataclasses and data structures |
+
+All 17 agents: call `list_kaizen_agents()` for the full list.
--- a/.claude/rules/architecture.md
+++ b/.claude/rules/architecture.md
@@ -1,20 +1,8 @@
-# Architecture Notes
+## Architecture

-The intended architecture is layered:
+<!-- TODO: Describe the key design decisions and component structure.
+     Key modules, data flows, external integrations, state machines, etc. -->

-```text
-markitect-tool     -> syntax layer
-kontextual-engine  -> system/runtime layer
-infospace-bench    -> application layer
-```
+## Quick Reference

-The first implementation should establish repo shape before service shape:
-
- `infospaces/` for concrete infospace projects
- `schemas/` or dependency references for artifact schemas
- `workflows/` for application-level workflow definitions
- `reports/` for evaluation and inspection outputs
- `docs/` for migration and design records
-
-Use direct dependencies on lower-layer projects only where they clarify the
-boundary. Avoid copying infrastructure wholesale from `markitect-main`.
+`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference
--- a/.claude/rules/first-session.md
+++ b/.claude/rules/first-session.md
@@ -0,0 +1,38 @@
+## First Session Protocol
+
+Triggered when `get_domain_summary("markitect")` shows **no workstreams**.
+The project is registered but work has not yet been structured.
+
+**Step 1 — Read, don't write**
+- `~/the-custodian/canon/projects/markitect/project_charter_v0.1.md` — purpose, scope
+- `~/the-custodian/canon/projects/markitect/roadmap_v0.1.md` — planned phases
+- Scan repo root: README, directory structure, existing code or docs
+
+**Step 2 — Survey in-progress work**
+Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.
+
+**Step 3 — Propose workstreams to Bernd**
+Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a
+roadmap phase. **Wait for approval before creating.**
+
+**Step 4 — Create workplan file first, then DB record (ADR-001)**
+```
+workplans/infospace-bench-WP-NNNN-<slug>.md   ← write this first
+```
+Then register in the hub:
+```
+create_workstream(topic_id="5571d954-0d30-4950-980d-7bcaaad8e3e2", title="...", owner="...", description="...")
+create_task(workstream_id="<id>", title="...", priority="high|medium|low")
+```
+
+**Step 5 — Record the setup**
+```
+add_progress_event(
+    summary="First session: structured markitect into N workstreams, M tasks",
+    event_type="milestone",
+    topic_id="5571d954-0d30-4950-980d-7bcaaad8e3e2",
+    detail={"workstreams": [...], "tasks_created": M}
+)
+```
+
+<!-- Delete or archive this file once past first session -->
--- a/.claude/rules/repo-boundary.md
+++ b/.claude/rules/repo-boundary.md
@@ -1,19 +1,8 @@
-# Repo Boundary
+## Repo boundary

-`infospace-bench` owns application-level infospace usage. It must not absorb
-lower-layer responsibilities.
+This repo owns **infospace-bench** only. It does not own:

-Belongs here:
-
- Infospace definitions and examples
- Application workflow definitions
- Evaluation and inspection reports
- Migration notes from `markitect-main`
- Workplans for applied infospace capabilities
-
-Belongs elsewhere:
-
- Markdown parsing and structural syntax primitives: `markitect-tool`
- Runtime persistence and orchestration: `kontextual-engine`
- LLM provider abstraction: `llm-connect` or equivalent
- Final production domain artifacts: the relevant domain repo
+<!-- TODO: List what belongs in adjacent repos, e.g.:
+- SSH key management → railiance-infra/
+- State hub code     → state-hub/
+-->
--- a/.claude/rules/repo-identity.md
+++ b/.claude/rules/repo-identity.md
@@ -1,11 +1,5 @@
-# Repo Identity
+**Purpose:** Application-layer workspace and service for concrete structured knowledge spaces; scoped successor to markitect-main infospace work.

- Project: `infospace-bench`
- Domain: `markitect`
- State Hub repo slug: `infospace-bench`
- State Hub topic ID: `5571d954-0d30-4950-980d-7bcaaad8e3e2`
- Purpose: application-layer workspace and service for concrete infospaces.
-
-This repo is a scoped successor to the application-level infospace work in
-`markitect-main`. It should preserve and extend the parts that help create,
-evaluate, inspect, and evolve real knowledge spaces.
+**Domain:** markitect
+**Repo slug:** infospace-bench
+**Topic ID:** 5571d954-0d30-4950-980d-7bcaaad8e3e2
--- a/.claude/rules/session-protocol.md
+++ b/.claude/rules/session-protocol.md
@@ -1,8 +1,84 @@
-# Session Protocol
+## Session Protocol

-1. Read `SCOPE.md`, `INTENT.md`, and the active workplan before making changes.
-2. Check `git status --short` and preserve user changes.
-3. Use State Hub as the coordination record when available.
-4. Keep repo artifacts traceable: workplans, docs, configs, metrics, and outputs
-   should explain what changed and why.
-5. Prefer narrow, inspectable changes over broad platform work.
+State Hub: http://127.0.0.1:8000
+
+**Step 1 — Orient**
+
+Read the offline-safe brief first — it works without a live hub connection:
+```bash
+cat .custodian-brief.md
+```
+Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
+```
+get_domain_summary("markitect")
+```
+If MCP tools are unavailable in the current agent session, use the REST API:
+```bash
+curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
+```
+If the hub is offline: `cd ~/state-hub && make api`
+
+**Step 2 — Check inbox**
+With MCP tools:
+```
+get_messages(to_agent="infospace-bench", unread_only=True)
+```
+Mark read with `mark_message_read(message_id)`. Reply or act on coordination
+requests before proceeding.
+
+Without MCP tools:
+```bash
+curl -s "http://127.0.0.1:8000/messages/?to_agent=infospace-bench&unread_only=true" \
+  | python3 -m json.tool
+curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
+  -H "Content-Type: application/json" -d '{}'
+```
+
+**Step 3 — Scan workplans**
+```bash
+ls workplans/
+```
+For each file with `status: ready`, `active`, or `blocked`, note pending
+`todo`/`in_progress` tasks.
+
+**Step 4 — Present brief**
+
+1. **Active workstreams** for `markitect` — title, task counts, blocking decisions
+2. **Pending tasks** from `workplans/` + any `[repo:infospace-bench]` hub tasks
+3. **Goal guidance** — if `goal_guidance` in summary:
+   - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
+   - `alignment_warnings`: flag if active work is not aligned with current goal
+4. **Suggested next action** — highest-priority open item
+5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
+
+If no workstreams: follow First Session Protocol (`first-session.md`).
+
+**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
+
+> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
+> are First Session Protocol only. Work structure belongs in repo files (ADR-001).
+
+**Session close:**
+With MCP tools:
+```
+add_progress_event(summary="...", topic_id="5571d954-0d30-4950-980d-7bcaaad8e3e2", workstream_id="<uuid>")
+```
+Without MCP tools:
+```bash
+curl -s -X POST http://127.0.0.1:8000/progress/ \
+  -H "Content-Type: application/json" \
+  -d '{"topic_id":"5571d954-0d30-4950-980d-7bcaaad8e3e2","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
+```
+If workplan files were modified, ensure the local copy is up to date first:
+```bash
+git -C <repo_path> pull --ff-only
+cd ~/state-hub && make fix-consistency REPO=infospace-bench
+```
+For repos where implementation runs on a remote machine (e.g. CoulombCore),
+use the combined target which pulls before fixing:
+```bash
+cd ~/state-hub && make fix-consistency-remote REPO=infospace-bench
+```
+**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
+will sync the file to match DB.  **C-16** (repo behind remote) blocks all writes
+until you pull — intentional to prevent clobbering remote progress.
--- a/.claude/rules/stack-and-commands.md
+++ b/.claude/rules/stack-and-commands.md
@@ -1,26 +1,19 @@
-# Stack And Commands
+## Stack

-The implementation stack is not established yet. Until it is, prefer
-documentation and small scaffold changes over choosing frameworks prematurely.
+<!-- TODO: Fill in language, frameworks, and key dependencies -->
+- **Language:**
+- **Key deps:**

-The Python package depends on path deps (`markitect-tool`, `artifactstore`)
-that bring heavy runtime dependencies. Use the Makefile to provision a
-local venv before running tests:
+## Dev Commands

 ```bash
-make install     # creates ./.venv with all path deps
-make test        # full pytest suite (must run via .venv/bin/python)
-```
-
-Useful commands:
-
-```bash
-git status --short
-rg --files
-```
-
-State Hub registration was completed with:
-
-```bash
-/home/worsch/the-custodian/state-hub/.venv/bin/custodian register-project --domain markitect --path /home/worsch/infospace-bench
+# TODO: Fill in the standard commands for this repo
+
+# Install dependencies
+
+# Run tests
+
+# Lint / type check
+
+# Build / package (if applicable)
 ```
--- a/.claude/rules/workplan-convention.md
+++ b/.claude/rules/workplan-convention.md
@@ -1,9 +1,28 @@
-# Workplan Convention
+## Workplan Convention (ADR-001)

- Workplans live in `workplans/`.
- Prefix new workplans with `IB-WP-`.
- Use YAML frontmatter with `id`, `type`, `title`, `domain`, `repo`, `status`,
-  `owner`, `topic_slug`, `created`, and `updated`.
- Include task blocks with stable IDs, status, priority, and optional State Hub
-  task IDs.
- Keep workplans tied to this repo's PRD/FRS requirements and State Hub context.
+File location: `workplans/infospace-bench-WP-NNNN-<slug>.md`
+ID prefix: `INFOSPACE-WP`
+
+Work items originate as files in this repo **before** being registered in the hub.
+
+Canonical workplan/workstream frontmatter statuses are:
+`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
+Use `proposed` for a newly drafted plan, `ready` after review against current
+repo state, and `finished` when implementation is complete. `stalled` and
+`needs_review` are derived health labels, not stored statuses.
+
+Closed workplans may be moved to `workplans/archived/` with a completion-date
+prefix: `YYMMDD-infospace-bench-WP-NNNN-<slug>.md`. The frontmatter id remains
+unchanged; the prefix is only for quick visual reference.
+
+Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
+`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
+`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
+directly. Promote anything requiring analysis, design, approval, dependencies, or
+multiple planned phases into a normal workplan.
+
+Ecosystem todos from other agents arrive as `[repo:infospace-bench]` hub tasks —
+visible at session start. Pick one up by creating the workplan file, then registering
+the workstream.
+
+<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,162 @@
+# infospace-bench — Agent Instructions
+
+## Repo Identity
+
+**Purpose:** Application-layer workspace and service for concrete structured knowledge spaces; scoped successor to markitect-main infospace work.
+
+**Domain:** markitect
+**Repo slug:** infospace-bench
+**Topic ID:** `5571d954-0d30-4950-980d-7bcaaad8e3e2`
+**Workplan prefix:** `INFOSPACE-WP-`
+
+---
+
+## State Hub Integration
+
+The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
+there is no MCP server for Codex agents.
+
+| Context | URL |
+|---------|-----|
+| Local workstation | `http://127.0.0.1:8000` |
+| Remote via tunnel | `http://127.0.0.1:18000` |
+
+### Orient at session start
+
+```bash
+# Offline brief — works without hub connection
+cat .custodian-brief.md
+
+# Active workstreams for this domain
+curl -s "http://127.0.0.1:8000/workstreams/?topic_id=5571d954-0d30-4950-980d-7bcaaad8e3e2&status=active" \
+  | python3 -m json.tool
+
+# Check inbox
+curl -s "http://127.0.0.1:8000/messages/?to_agent=infospace-bench&unread_only=true" \
+  | python3 -m json.tool
+```
+
+Mark a message read:
+```bash
+curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
+  -H "Content-Type: application/json" -d '{}'
+```
+
+### Log progress (required at session close)
+
+```bash
+curl -s -X POST http://127.0.0.1:8000/progress/ \
+  -H "Content-Type: application/json" \
+  -d '{
+    "summary": "what was done",
+    "event_type": "note",
+    "author": "codex",
+    "workstream_id": "<uuid>",
+    "task_id": "<uuid>"
+  }'
+```
+
+Omit `workstream_id` / `task_id` when not applicable.
+
+### Update task status
+
+```bash
+curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
+  -H "Content-Type: application/json" \
+  -d '{"status": "in_progress"}'
+# values: todo | in_progress | done | blocked
+```
+
+### Flag a task for human review
+
+```bash
+curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
+  -H "Content-Type: application/json" \
+  -d '{"needs_human": true, "intervention_note": "reason"}'
+```
+
+---
+
+## Session Protocol
+
+**Start:**
+1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
+2. Check inbox: `GET /messages/?to_agent=infospace-bench&unread_only=true`; mark read
+3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
+4. Check blocked tasks: `GET /tasks/?needs_human=true`
+
+**During work:**
+- Update task statuses in workplan files as tasks progress
+- Record significant decisions via `POST /decisions/`
+
+**Close:**
+1. Update workplan file task statuses to reflect progress
+2. Log: `POST /progress/` with a summary of what changed
+3. Note for the custodian operator: after workplan file changes, run from
+   `~/state-hub`:
+   ```bash
+   make fix-consistency REPO=infospace-bench
+   ```
+   This syncs task status from files into the hub DB.
+
+---
+
+## Workplan Convention (ADR-001)
+
+Work items originate as files in this repo — not in the hub. The hub is a
+read/cache/index layer that rebuilds from files.
+
+**File location:** `workplans/INFOSPACE-WP-NNNN-<slug>.md`
+
+**Archived location:** finished workplans may move to
+`workplans/archived/YYMMDD-INFOSPACE-WP-NNNN-<slug>.md`. The `YYMMDD` prefix is
+the completion/archive date; the frontmatter `id` does not change.
+
+**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use
+`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use
+this only for low-risk work completed directly; create a normal workplan for
+anything needing analysis, design, approval, dependencies, or multiple phases.
+
+**Frontmatter:**
+
+```yaml
+---
+id: INFOSPACE-WP-NNNN
+type: workplan
+title: "..."
+domain: markitect
+repo: infospace-bench
+status: proposed | ready | active | blocked | backlog | finished | archived
+owner: codex
+topic_slug: ...
+created: "YYYY-MM-DD"
+updated: "YYYY-MM-DD"
+state_hub_workstream_id: "<uuid>"   # written by fix-consistency — do not edit
+---
+```
+
+Use `proposed` for a new draft, `ready` after review against current repo
+state, and `finished` after implementation. `stalled` and `needs_review` are
+derived health labels, not frontmatter statuses.
+
+**Task block format** (one per `##` section):
+
+```
+## Task Title
+
+` ` `task
+id: INFOSPACE-WP-NNNN-T01
+status: todo | in_progress | done | blocked
+priority: high | medium | low
+state_hub_task_id: "<uuid>"         # written by fix-consistency — do not edit
+` ` `
+
+Task description text.
+```
+
+Status progression: `todo` → `in_progress` → `done` (or `blocked`)
+
+To create a new workplan:
+1. Write the file following the format above
+2. Notify the custodian operator to run `make fix-consistency REPO=infospace-bench`
+   (or send a message to the hub agent via `POST /messages/`)
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,7 +1,11 @@
+# infospace-bench — Claude Code Instructions
+
@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
+@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
-@.claude/rules/repo-boundary.md
-@.claude/rules/architecture.md
@.claude/rules/stack-and-commands.md
+@.claude/rules/architecture.md
+@.claude/rules/repo-boundary.md
+@.claude/rules/agents.md
--- a/docs/routing-config.md
+++ b/docs/routing-config.md
@@ -0,0 +1,131 @@
+# Routing Config Schema
+
+Workplan: IB-WP-0020 (T01 schema, T02 loader)
+Module: `src/infospace_bench/routing_config.py`
+
+A routing config is a small YAML file that names the candidate adapters
+per task type and (optionally) the quality floor, the
+`QualityLedger` path, and a stage-to-task-type override map. The file
+is the consumer side of llm-connect `LLM-WP-0004`'s routing primitives:
+it does not embed model selection logic, just declares the universe
+the policy can choose from.
+
+The schema_version is pinned to `1`. Bump it (and the parser) before
+making backward-incompatible changes.
+
+## Top-level fields
+
+| Field | Type | Notes |
+|---|---|---|
+| `schema_version` | int (required) | Currently `1`. Mismatch fails fast. |
+| `task_types` | mapping (required) | At least one entry. Each entry has `candidates` and an optional `quality_floor`. |
+| `default_quality_floor` | float (optional) | Falls back when a task type does not name its own. Must be 0..1. |
+| `ledger_path` | string (optional) | Path to a `QualityLedger` JSONL. Relative paths resolve against the workspace by default. Required when any `quality_floor` is non-null. |
+| `stage_to_task_type` | mapping (optional) | Caller-supplied mapping from infospace-bench stage ids to task types. Falls through to identity when omitted. |
+
+## Candidate fields
+
+Each entry under `task_types.<task_type>.candidates[]`:
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | string (required) | Stable adapter id used for the `QualityLedger` and the per-stage adapter-choice line of the generation report. |
+| `provider` | string (required) | One of `openrouter`, `claude_code`, `openai`, `gemini`. |
+| `model` | string (required) | Provider-specific model id, e.g. `openai/gpt-4o-mini`. |
+| `api_key_env` | string (optional) | Env var that holds the API key. Defaults to a provider-specific name (`OPENROUTER_API_KEY` etc.) in the T02 loader. |
+| `max_cost_per_1k` | float (optional) | Static cost cap. Static `RoutingPolicy` falls back to a cheaper candidate when the caller-supplied estimate exceeds this. |
+
+## Example A — OpenRouter-only, two-tier
+
+A pragmatic Lefevre-style config. Cheap model for summaries, mid model
+for entities/relations, cheap again for evaluation. No adaptive
+routing, no ledger.
+
+```yaml
+schema_version: 1
+
+stage_to_task_type:
+  summarize-source: cheap
+  extract-entities: smart
+  extract-relations: smart
+  evaluate-entity: cheap
+  synthesize-report: smart
+
+task_types:
+  cheap:
+    candidates:
+      - id: openrouter:gpt-4o-mini
+        provider: openrouter
+        model: openai/gpt-4o-mini
+        api_key_env: OPENROUTER_API_KEY
+  smart:
+    candidates:
+      - id: openrouter:claude-3.5-sonnet
+        provider: openrouter
+        model: anthropic/claude-3.5-sonnet
+        api_key_env: OPENROUTER_API_KEY
+```
+
+## Example B — Adaptive with a ClaudeCode baseline
+
+A two-candidate-per-stage adaptive config. The `QualityLedger`
+accumulates observations; over time, the cheaper qualifying model is
+preferred per stage. `ClaudeCodeAdapter` is wired into a separate
+`task_types.baseline` rule so it can be referenced by a
+`ShadowingAdapter` builder (T05).
+
+```yaml
+schema_version: 1
+default_quality_floor: 0.80
+ledger_path: output/routing/quality.jsonl
+
+task_types:
+  summarize-source:
+    quality_floor: 0.70
+    candidates:
+      - id: openrouter:gpt-4o-mini
+        provider: openrouter
+        model: openai/gpt-4o-mini
+        api_key_env: OPENROUTER_API_KEY
+        max_cost_per_1k: 0.001
+      - id: openrouter:claude-3.5-haiku
+        provider: openrouter
+        model: anthropic/claude-3.5-haiku
+        api_key_env: OPENROUTER_API_KEY
+        max_cost_per_1k: 0.003
+
+  extract-entities:
+    quality_floor: 0.85
+    candidates:
+      - id: openrouter:claude-3.5-haiku
+        provider: openrouter
+        model: anthropic/claude-3.5-haiku
+        api_key_env: OPENROUTER_API_KEY
+      - id: openrouter:claude-3.5-sonnet
+        provider: openrouter
+        model: anthropic/claude-3.5-sonnet
+        api_key_env: OPENROUTER_API_KEY
+
+  baseline:
+    candidates:
+      - id: claude-code
+        provider: claude_code
+        model: claude-opus-4-7
+```
+
+## What fails fast
+
+The parser refuses, before any network or workspace work, when:
+
+- `schema_version` is missing or not `1`
+- `task_types` is missing or empty
+- Any `task_type` has no `candidates`
+- A candidate is missing `id`, `provider`, or `model`
+- A `provider` is not one of the supported names
+- `max_cost_per_1k` is non-numeric or negative
+- Any `quality_floor` (top-level or per-task) is outside 0..1
+- A `task_type` has duplicate candidate `id`s
+- `ledger_path` or `stage_to_task_type` has the wrong YAML shape
+
+`api_key_env` resolution and live adapter construction happen in T02.
+This file only validates the declarative shape.
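
The "what fails fast" list in docs/routing-config.md can be read as a sequence of shape checks over the parsed YAML mapping. The following is a minimal sketch of that idea, not the code in `routing_config.py`: `ConfigShapeError` stands in for `InfospaceError`, and `validate_routing_shape` and every error code string are invented here for illustration.

```python
from __future__ import annotations

from typing import Any

SUPPORTED_PROVIDERS = frozenset({"openrouter", "claude_code", "openai", "gemini"})


class ConfigShapeError(ValueError):
    """Hypothetical stand-in for InfospaceError; carries a matchable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _check_floor(value: Any, where: str) -> None:
    # None means "not set"; anything else must be a number in 0..1.
    if value is not None and not (_is_number(value) and 0.0 <= value <= 1.0):
        raise ConfigShapeError("quality_floor_out_of_range", f"{where}: {value!r}")


def validate_routing_shape(raw: Any) -> None:
    """Raise ConfigShapeError on the first rule the mapping violates."""
    if not isinstance(raw, dict) or raw.get("schema_version") != 1:
        raise ConfigShapeError("schema_version", "schema_version must be 1")
    task_types = raw.get("task_types")
    if not isinstance(task_types, dict) or not task_types:
        raise ConfigShapeError("task_types_empty", "task_types is missing or empty")
    _check_floor(raw.get("default_quality_floor"), "default_quality_floor")
    ledger_path = raw.get("ledger_path")
    if ledger_path is not None and not isinstance(ledger_path, str):
        raise ConfigShapeError("ledger_path_not_string", repr(ledger_path))
    stage_map = raw.get("stage_to_task_type")
    if stage_map is not None and not isinstance(stage_map, dict):
        raise ConfigShapeError("stage_map_not_mapping", repr(stage_map))

    for name, rule in task_types.items():
        candidates = rule.get("candidates") if isinstance(rule, dict) else None
        if not isinstance(candidates, list) or not candidates:
            raise ConfigShapeError("candidates_empty", f"task_types.{name}")
        _check_floor(rule.get("quality_floor"), f"task_types.{name}.quality_floor")
        seen: set[str] = set()
        for cand in candidates:
            for key in ("id", "provider", "model"):
                if not isinstance(cand, dict) or not isinstance(cand.get(key), str) or not cand[key]:
                    raise ConfigShapeError("candidate_field_missing", f"{name}: {key}")
            if cand["provider"] not in SUPPORTED_PROVIDERS:
                raise ConfigShapeError("unsupported_provider", cand["provider"])
            cost = cand.get("max_cost_per_1k")
            if cost is not None and not (_is_number(cost) and cost >= 0):
                raise ConfigShapeError("invalid_max_cost", f"{cand['id']}: {cost!r}")
            if cand["id"] in seen:
                raise ConfigShapeError("duplicate_candidate_id", f"{name}: {cand['id']}")
            seen.add(cand["id"])
```

A caller could then branch on `exc.code` instead of parsing the message, which is the pattern-matching behaviour the commit message describes for `InfospaceError`.
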
--- a/src/infospace_bench/cli.py
+++ b/src/infospace_bench/cli.py
@@ -256,6 +256,14 @@ def build_parser() -> argparse.ArgumentParser:
    )
    generate_from_source.add_argument("--apply", action="store_true")

+    routing = sub.add_parser("routing", help="Inspect llm-connect routing observations")
+    routing_sub = routing.add_subparsers(dest="routing_command", required=True)
+    routing_ledger = routing_sub.add_parser(
+        "ledger",
+        help="Summarise a llm-connect QualityLedger by (task_type, adapter_id)",
+    )
+    routing_ledger.add_argument("ledger_path")
+
    budget = sub.add_parser("budget", help="Inspect per-infospace budget and usage records")
    budget_sub = budget.add_subparsers(dest="budget_command", required=True)
    budget_list = budget_sub.add_parser(
@@ -587,6 +595,17 @@ def main(argv: list[str] | None = None) -> int:
                    _write_json(plan_generation(infospace.root, stage=args.stage))
            else:
                parser.error(f"Unhandled generate command: {args.generate_command}")
+        elif args.command == "routing":
+            from .routing import summarise_quality_ledger
+            if args.routing_command == "ledger":
+                _write_json(
+                    {
+                        "ledger_path": str(Path(args.ledger_path)),
+                        "rows": summarise_quality_ledger(args.ledger_path),
+                    }
+                )
+            else:
+                parser.error(f"Unhandled routing command: {args.routing_command}")
        elif args.command == "budget":
            from .budget import budget_list_workspace, budget_show
            if args.budget_command == "list":
--- a/src/infospace_bench/generator.py
+++ b/src/infospace_bench/generator.py
@@ -791,6 +791,15 @@ def _write_generation_report(root: Path, metrics: dict[str, Any], snapshot_id: s
                "",
            ]
        )
+    if review.get("adapter_choices"):
+        lines.extend(["## Per-stage adapter choices", ""])
+        for row in review["adapter_choices"]:
+            lines.append(
+                f"- `{row['stage_id']}` ({row['task_type']}) -> "
+                f"`{row['adapter_id']}` · {row['calls']} call(s) · "
+                f"{row['prompt_tokens']} prompt + {row['completion_tokens']} completion tokens"
+            )
+        lines.append("")
    text = "\n".join(lines)
    path = root / "reports" / "generation-summary.md"
    path.parent.mkdir(parents=True, exist_ok=True)
@@ -872,15 +881,55 @@ def _collect_review_report(root: Path) -> dict[str, Any]:
    entity_titles = sorted(
        {item.title for item in infospace.artifacts if item.kind == "entity" and item.title}
    )
+    adapter_choices = _collect_adapter_choices(generated)
    return {
        "chapter_coverage": chapter_coverage,
        "entity_titles": entity_titles,
        "unmapped_sources": unmapped,
        "page_anchor_total": len(anchors),
        "page_anchor_sample": anchors[:6],
+        "adapter_choices": adapter_choices,
    }


+def _collect_adapter_choices(generated: list[Any]) -> list[dict[str, Any]]:
+    """Roll up which adapter ran each stage when the routing bridge was used.
+
+    Returns one row per (stage_id, adapter_id) with call counts and
+    cumulative tokens. Entries without provider_metadata are skipped so
+    fixture-only runs produce an empty list rather than a noisy section.
+    """
+    buckets: dict[tuple[str, str], dict[str, Any]] = {}
+    for item in generated:
+        provenance = item.provenance or {}
+        metadata = provenance.get("provider_metadata") or {}
+        if not isinstance(metadata, dict):
+            continue
+        adapter_id = str(metadata.get("adapter_id") or metadata.get("model") or "")
+        if not adapter_id:
+            continue
+        stage_id = str(metadata.get("stage_id") or provenance.get("stage_id") or "")
+        if not stage_id:
+            continue
+        usage = metadata.get("usage") or {}
+        key = (stage_id, adapter_id)
+        bucket = buckets.setdefault(
+            key,
+            {
+                "stage_id": stage_id,
+                "adapter_id": adapter_id,
+                "task_type": metadata.get("task_type") or stage_id,
+                "calls": 0,
+                "prompt_tokens": 0,
+                "completion_tokens": 0,
+            },
+        )
+        bucket["calls"] += 1
+        bucket["prompt_tokens"] += int(usage.get("prompt_tokens") or 0)
+        bucket["completion_tokens"] += int(usage.get("completion_tokens") or 0)
+    return sorted(buckets.values(), key=lambda row: (row["stage_id"], row["adapter_id"]))
+
+
 def _workflow_ids_for_stage(stage: str) -> list[str]:
    normalized = stage.strip().lower()
    if normalized == "intake":
--- a/src/infospace_bench/routing.py
+++ b/src/infospace_bench/routing.py
@@ -15,8 +15,11 @@ from dataclasses import dataclass, field
 from typing import Any

 from llm_connect.adapter import LLMAdapter
+from llm_connect.grading import BaselineGrader
 from llm_connect.models import RunConfig
+from llm_connect.quality import QualityLedger
 from llm_connect.routing import AdaptiveRoutingPolicy, RoutingPolicy
+from llm_connect.shadowing import ShadowingAdapter

 from .workflow import AssistedGenerationRequest, AssistedGenerationResult

@@ -116,6 +119,88 @@ def _identify_adapter(adapter: LLMAdapter) -> str:
    return name


+def wrap_with_shadow_sampling(
+    *,
+    candidate: LLMAdapter,
+    baseline: LLMAdapter,
+    grader: BaselineGrader,
+    ledger: QualityLedger,
+    task_type: str,
+    adapter_id: str | None = None,
+    baseline_adapter_id: str | None = None,
+    shadow_rate: float = 0.1,
+    async_shadow: bool = True,
+    on_shadow_error: Any | None = None,
+) -> ShadowingAdapter:
+    """Wrap ``candidate`` with llm-connect's ``ShadowingAdapter``.
+
+    Sampled baseline grading collects QualityLedger observations without
+    changing the response the caller sees. Errors in the shadow path
+    (baseline outage, grader failure, ledger write error) never alter the
+    candidate response — failures land on ``on_shadow_error`` when
+    provided, else are silently swallowed by the underlying adapter.
+
+    The returned ``ShadowingAdapter`` is still an ``LLMAdapter``, so it
+    can be slotted into a ``RoutingPolicy`` rule and used through
+    ``RoutingAssistedGenerationAdapter`` without further changes.
+    """
+    return ShadowingAdapter(
+        candidate_adapter=candidate,
+        baseline_adapter=baseline,
+        grader=grader,
+        ledger=ledger,
+        task_type=task_type,
+        adapter_id=adapter_id or _identify_adapter(candidate),
+        baseline_adapter_id=baseline_adapter_id or _identify_adapter(baseline),
+        shadow_rate=shadow_rate,
+        async_shadow=async_shadow,
+        on_shadow_error=on_shadow_error,
+    )
+
+
+def summarise_quality_ledger(
+    ledger_path: str | Any,
+) -> list[dict[str, Any]]:
+    """Roll up a QualityLedger into one row per (task_type, adapter_id).
+
+    Useful as a CLI helper or a quick budget-style inspection without
+    loading llm-connect's full ledger API at the call site.
+    """
+    from pathlib import Path
+
+    ledger = QualityLedger(path=Path(ledger_path))
+    observations = ledger.read_all()
+    grouped: dict[tuple[str, str], dict[str, Any]] = {}
+    for obs in observations:
+        key = (obs.task_type, obs.adapter_id)
+        bucket = grouped.setdefault(
+            key,
+            {
+                "task_type": obs.task_type,
+                "adapter_id": obs.adapter_id,
+                "observations": 0,
+                "mean_quality": 0.0,
+                "mean_cost_usd": 0.0,
+                "total_tokens_in": 0,
+                "total_tokens_out": 0,
+            },
+        )
+        bucket["observations"] += 1
+        bucket["mean_quality"] += float(obs.quality_score)
+        bucket["mean_cost_usd"] += float(obs.cost_usd)
+        bucket["total_tokens_in"] += int(getattr(obs, "tokens_in", 0) or 0)
+        bucket["total_tokens_out"] += int(getattr(obs, "tokens_out", 0) or 0)
+    rows: list[dict[str, Any]] = []
+    for bucket in grouped.values():
+        count = bucket["observations"]
+        if count:
+            bucket["mean_quality"] = round(bucket["mean_quality"] / count, 4)
+            bucket["mean_cost_usd"] = round(bucket["mean_cost_usd"] / count, 6)
+        rows.append(bucket)
+    rows.sort(key=lambda row: (row["task_type"], row["adapter_id"]))
+    return rows
+
+
 def _provider_tag(adapter: LLMAdapter) -> str:
    """Coarse provider tag matching the strings already used in run records.

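Review note: the roll-up in `summarise_quality_ledger` accumulates sums per `(task_type, adapter_id)` and divides by the observation count at the end. Below is a minimal, self-contained sketch of that sum-then-divide pattern for checking the arithmetic by hand. `Obs` and `rollup` are hypothetical stand-ins introduced here for illustration; they are not part of llm-connect or of this change, and the real function reads its observations from a `QualityLedger`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Obs:
    """Hypothetical stand-in for a ledger observation (illustration only)."""

    task_type: str
    adapter_id: str
    quality_score: float
    cost_usd: float


def rollup(observations: list[Obs]) -> list[dict[str, Any]]:
    """Accumulate sums per (task_type, adapter_id), then divide by the count."""
    grouped: dict[tuple[str, str], dict[str, Any]] = {}
    for obs in observations:
        bucket = grouped.setdefault(
            (obs.task_type, obs.adapter_id),
            {
                "task_type": obs.task_type,
                "adapter_id": obs.adapter_id,
                "observations": 0,
                "mean_quality": 0.0,  # holds the running sum until the final pass
                "mean_cost_usd": 0.0,  # holds the running sum until the final pass
            },
        )
        bucket["observations"] += 1
        bucket["mean_quality"] += obs.quality_score
        bucket["mean_cost_usd"] += obs.cost_usd
    for bucket in grouped.values():
        count = bucket["observations"]  # always >= 1 for a bucket that exists
        bucket["mean_quality"] = round(bucket["mean_quality"] / count, 4)
        bucket["mean_cost_usd"] = round(bucket["mean_cost_usd"] / count, 6)
    return sorted(grouped.values(), key=lambda r: (r["task_type"], r["adapter_id"]))


rows = rollup(
    [
        Obs("extract-entities", "cheap-1", 0.9, 0.001),
        Obs("extract-entities", "cheap-1", 0.95, 0.001),
        Obs("extract-entities", "cheap-1", 0.85, 0.001),
        Obs("summarize-source", "cheaper-1", 0.7, 0.0001),
    ]
)
```

The sample values mirror the ones used in `test_summarise_quality_ledger_rolls_up_by_task_and_adapter`, so the first row should carry three observations and the second row one.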
--- a/src/infospace_bench/routing_config.py
+++ b/src/infospace_bench/routing_config.py
@@ -0,0 +1,265 @@
+"""
+Routing config schema (IB-WP-0020-T01).
+
+Parser-only: this module reads a YAML file into validated dataclasses.
+The follow-on task T02 takes a ``RoutingConfig`` and constructs the
+actual llm-connect ``RoutingPolicy`` / ``AdaptiveRoutingPolicy`` plus
+LLMAdapter instances (which involves API keys and provider-specific
+construction). Keeping parsing separate lets T01 stay network-free and
+deterministically testable.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from .errors import InfospaceError
+
+ROUTING_SCHEMA_VERSION = 1
+
+# Provider names that the T02 loader will know how to construct.
+# Validation happens here so a config typo fails before any work begins.
+SUPPORTED_PROVIDERS: frozenset[str] = frozenset(
+    {"openrouter", "claude_code", "openai", "gemini"}
+)
+
+
+@dataclass(frozen=True)
+class RoutingCandidateConfig:
+    """One candidate adapter inside a task_type rule."""
+
+    id: str
+    provider: str
+    model: str
+    api_key_env: str = ""
+    max_cost_per_1k: float | None = None
+
+
+@dataclass(frozen=True)
+class RoutingTaskTypeConfig:
+    """All candidate adapters for one task_type, with an optional quality floor."""
+
+    task_type: str
+    candidates: tuple[RoutingCandidateConfig, ...]
+    quality_floor: float | None = None
+
+
+@dataclass(frozen=True)
+class RoutingConfig:
+    """Top-level routing config payload, parsed from YAML."""
+
+    schema_version: int
+    task_types: tuple[RoutingTaskTypeConfig, ...]
+    default_quality_floor: float | None = None
+    ledger_path: str | None = None
+    stage_to_task_type: dict[str, str] = field(default_factory=dict)
+
+
+def load_routing_config(path: str | Path) -> RoutingConfig:
+    """Read and validate a routing config YAML file."""
+    config_path = Path(path)
+    if not config_path.is_file():
+        raise InfospaceError(
+            "missing_routing_config",
+            f"Routing config does not exist: {config_path}",
+            {"path": str(config_path)},
+        )
+    raw_text = config_path.read_text(encoding="utf-8")
+    try:
+        data = yaml.safe_load(raw_text)
+    except yaml.YAMLError as exc:
+        raise InfospaceError(
+            "invalid_routing_config_yaml",
+            f"Routing config is not valid YAML: {exc}",
+            {"path": str(config_path)},
+        ) from exc
+    if not isinstance(data, dict):
+        raise InfospaceError(
+            "invalid_routing_config",
+            "Routing config must be a YAML mapping at the top level",
+            {"path": str(config_path)},
+        )
+    return parse_routing_config(data, source=str(config_path))
+
+
+def parse_routing_config(
+    data: dict[str, Any], *, source: str = "<inline>"
+) -> RoutingConfig:
+    """Validate a parsed routing config dict and return a frozen config."""
+    schema_version = data.get("schema_version")
+    if type(schema_version) is not int or schema_version != ROUTING_SCHEMA_VERSION:  # rejects bool
+        raise InfospaceError(
+            "unsupported_routing_schema",
+            f"Routing config schema_version must be {ROUTING_SCHEMA_VERSION}",
+            {"source": source, "got": schema_version},
+        )
+    task_types_raw = data.get("task_types") or {}
+    if not isinstance(task_types_raw, dict) or not task_types_raw:
+        raise InfospaceError(
+            "empty_routing_task_types",
+            "Routing config must declare at least one task_type with candidates",
+            {"source": source},
+        )
+
+    task_types: list[RoutingTaskTypeConfig] = []
+    for task_type, entry in task_types_raw.items():
+        task_types.append(_parse_task_type(str(task_type), entry, source=source))
+
+    default_floor = _optional_quality_floor(
+        data.get("default_quality_floor"), "default_quality_floor", source
+    )
+    ledger_path_value = data.get("ledger_path")
+    if ledger_path_value is not None and not isinstance(ledger_path_value, str):
+        raise InfospaceError(
+            "invalid_routing_ledger_path",
+            "ledger_path must be a string when present",
+            {"source": source},
+        )
+
+    stage_map_raw = data.get("stage_to_task_type") or {}
+    if not isinstance(stage_map_raw, dict):
+        raise InfospaceError(
+            "invalid_routing_stage_map",
+            "stage_to_task_type must be a mapping",
+            {"source": source},
+        )
+    stage_to_task_type = {str(key): str(value) for key, value in stage_map_raw.items()}
+
+    return RoutingConfig(
+        schema_version=schema_version,
+        task_types=tuple(task_types),
+        default_quality_floor=default_floor,
+        ledger_path=ledger_path_value if isinstance(ledger_path_value, str) else None,
+        stage_to_task_type=stage_to_task_type,
+    )
+
+
+def _parse_task_type(
+    task_type: str, entry: Any, *, source: str
+) -> RoutingTaskTypeConfig:
+    if not isinstance(entry, dict):
+        raise InfospaceError(
+            "invalid_routing_task_type",
+            f"task_types.{task_type} must be a mapping",
+            {"source": source, "task_type": task_type},
+        )
+    candidates_raw = entry.get("candidates") or []
+    if not isinstance(candidates_raw, list) or not candidates_raw:
+        raise InfospaceError(
+            "empty_routing_candidates",
+            f"task_types.{task_type} must declare at least one candidate",
+            {"source": source, "task_type": task_type},
+        )
+    candidates: list[RoutingCandidateConfig] = []
+    seen_ids: set[str] = set()
+    for index, candidate_raw in enumerate(candidates_raw):
+        candidate = _parse_candidate(task_type, index, candidate_raw, source=source)
+        if candidate.id in seen_ids:
+            raise InfospaceError(
+                "duplicate_routing_candidate_id",
+                f"task_types.{task_type} has duplicate candidate id {candidate.id!r}",
+                {"source": source, "task_type": task_type, "id": candidate.id},
+            )
+        seen_ids.add(candidate.id)
+        candidates.append(candidate)
+    quality_floor = _optional_quality_floor(
+        entry.get("quality_floor"),
+        f"task_types.{task_type}.quality_floor",
+        source,
+    )
+    return RoutingTaskTypeConfig(
+        task_type=task_type,
+        candidates=tuple(candidates),
+        quality_floor=quality_floor,
+    )
+
+
+def _parse_candidate(
+    task_type: str, index: int, candidate_raw: Any, *, source: str
+) -> RoutingCandidateConfig:
+    if not isinstance(candidate_raw, dict):
+        raise InfospaceError(
+            "invalid_routing_candidate",
+            f"task_types.{task_type}.candidates[{index}] must be a mapping",
+            {"source": source, "task_type": task_type, "index": index},
+        )
+    candidate_id = str(candidate_raw.get("id") or "").strip()
+    provider = str(candidate_raw.get("provider") or "").strip().lower()
+    model = str(candidate_raw.get("model") or "").strip()
+    missing = [
+        field_name
+        for field_name, value in (("id", candidate_id), ("provider", provider), ("model", model))
+        if not value
+    ]
+    if missing:
+        raise InfospaceError(
+            "missing_routing_candidate_field",
+            f"task_types.{task_type}.candidates[{index}] is missing required fields: "
+            f"{', '.join(missing)}",
+            {
+                "source": source,
+                "task_type": task_type,
+                "index": index,
+                "missing": missing,
+            },
+        )
+    if provider not in SUPPORTED_PROVIDERS:
+        raise InfospaceError(
+            "unsupported_routing_provider",
+            f"Unsupported provider {provider!r}; allowed: {sorted(SUPPORTED_PROVIDERS)}",
+            {
+                "source": source,
+                "task_type": task_type,
+                "index": index,
+                "provider": provider,
+            },
+        )
+    max_cost = _optional_float(
+        candidate_raw.get("max_cost_per_1k"),
+        f"task_types.{task_type}.candidates[{index}].max_cost_per_1k",
+        source,
+    )
+    if max_cost is not None and not max_cost >= 0:  # also rejects NaN
+        raise InfospaceError(
+            "invalid_routing_max_cost",
+            "max_cost_per_1k must be non-negative",
+            {"source": source, "task_type": task_type, "index": index, "value": max_cost},
+        )
+    api_key_env = str(candidate_raw.get("api_key_env") or "").strip()
+    return RoutingCandidateConfig(
+        id=candidate_id,
+        provider=provider,
+        model=model,
+        api_key_env=api_key_env,
+        max_cost_per_1k=max_cost,
+    )
+
+
+def _optional_float(value: Any, name: str, source: str) -> float | None:
+    if value is None:
+        return None
+    try:
+        return float(value)
+    except (TypeError, ValueError) as exc:
+        raise InfospaceError(
+            "invalid_routing_float",
+            f"{name} must be numeric",
+            {"source": source, "value": value},
+        ) from exc
+
+
+def _optional_quality_floor(value: Any, name: str, source: str) -> float | None:
+    floor = _optional_float(value, name, source)
+    if floor is None:
+        return None
+    if not 0 <= floor <= 1:
+        raise InfospaceError(
+            "invalid_routing_quality_floor",
+            f"{name} must be between 0 and 1",
+            {"source": source, "name": name, "value": floor},
+        )
+    return floor
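Review note: the parser raises `InfospaceError` with a stable code so callers can branch on the code rather than on message text. The sketch below shows one way a caller might do that. The `InfospaceError` class here is a local stand-in, assuming the real class in `infospace_bench.errors` takes `(code, message, detail)` and exposes `.code` and `.detail`, as the call sites and tests in this change suggest. `hint_for` and its hint strings are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from typing import Any


class InfospaceError(Exception):
    """Stand-in for infospace_bench.errors.InfospaceError (assumed shape)."""

    def __init__(
        self, code: str, message: str, detail: dict[str, Any] | None = None
    ) -> None:
        super().__init__(message)
        self.code = code
        self.detail = detail or {}


def hint_for(exc: InfospaceError) -> str:
    """Map a routing-config error code to a short operator hint (hypothetical)."""
    if exc.code == "missing_routing_config":
        return f"create the file at {exc.detail.get('path')}"
    if exc.code == "unsupported_routing_provider":
        return f"provider {exc.detail.get('provider')!r} is not supported"
    if exc.code == "missing_routing_candidate_field":
        return "add fields: " + ", ".join(exc.detail.get("missing", []))
    # Fall back to the human-readable message for codes not special-cased here.
    return str(exc)


try:
    raise InfospaceError(
        "unsupported_routing_provider",
        "Unsupported provider 'acme'",
        {"provider": "acme"},
    )
except InfospaceError as err:
    hint = hint_for(err)
```

Branching on `code` keeps callers insulated from wording changes in the messages, which is presumably why each validation failure gets its own code.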
--- a/tests/test_routing_adapter.py
+++ b/tests/test_routing_adapter.py
@@ -213,6 +213,200 @@ def test_bridge_preserves_response_metadata_and_provider_tag() -> None:
    assert result.provider == "mock"


+def test_wrap_with_shadow_sampling_passes_candidate_through(tmp_path) -> None:
+    from llm_connect.grading import ExactMatchJudge, PairedGrader
+    from infospace_bench.routing import wrap_with_shadow_sampling
+
+    candidate = _MockAdapter(model="cheap-1", content="match")
+    baseline = _MockAdapter(model="baseline-1", content="match")
+    ledger = QualityLedger(path=tmp_path / "quality.jsonl")
+    grader = PairedGrader(judge=ExactMatchJudge())
+
+    shadow = wrap_with_shadow_sampling(
+        candidate=candidate,
+        baseline=baseline,
+        grader=grader,
+        ledger=ledger,
+        task_type="extract-entities",
+        shadow_rate=1.0,
+        async_shadow=False,
+    )
+
+    config = RunConfig(model_name="cheap-1")
+    response = shadow.execute_prompt("Hello.", config)
+
+    assert response.content == "match"
+    # Baseline ran in the shadow path; ledger now has one observation.
+    assert baseline.calls, "baseline must have been called when shadow_rate=1.0"
+    observations = ledger.by_task_type("extract-entities")
+    assert observations, "shadow path should append at least one observation"
+
+
+def test_wrap_with_shadow_sampling_isolates_baseline_failure(tmp_path) -> None:
+    from llm_connect.grading import ExactMatchJudge, PairedGrader
+    from infospace_bench.routing import wrap_with_shadow_sampling
+
+    candidate = _MockAdapter(model="cheap-1", content="ok")
+
+    class _AngryBaseline(LLMAdapter):
+        def execute_prompt(self, prompt, config):
+            raise RuntimeError("baseline outage")
+
+        def validate_config(self, config):
+            return True
+
+    seen_errors: list[Exception] = []
+    shadow = wrap_with_shadow_sampling(
+        candidate=candidate,
+        baseline=_AngryBaseline(),
+        grader=PairedGrader(judge=ExactMatchJudge()),
+        ledger=QualityLedger(path=tmp_path / "quality.jsonl"),
+        task_type="summarize-source",
+        shadow_rate=1.0,
+        async_shadow=False,
+        on_shadow_error=seen_errors.append,
+    )
+    response = shadow.execute_prompt("Hello.", RunConfig(model_name="cheap-1"))
+
+    assert response.content == "ok", "candidate response must survive baseline outage"
+    assert seen_errors and "baseline outage" in str(seen_errors[0])
+
+
+def test_summarise_quality_ledger_rolls_up_by_task_and_adapter(tmp_path) -> None:
+    from infospace_bench.routing import summarise_quality_ledger
+
+    ledger_path = tmp_path / "quality.jsonl"
+    ledger = QualityLedger(path=ledger_path)
+    for quality in (0.9, 0.95, 0.85):
+        ledger.append(
+            QualityObservation(
+                task_type="extract-entities",
+                adapter_id="cheap-1",
+                model_id="cheap-1",
+                cost_usd=0.001,
+                quality_score=quality,
+                tokens_in=100,
+                tokens_out=50,
+                latency_ms=10,
+            )
+        )
+    ledger.append(
+        QualityObservation(
+            task_type="summarize-source",
+            adapter_id="cheaper-1",
+            model_id="cheaper-1",
+            cost_usd=0.0001,
+            quality_score=0.7,
+            tokens_in=80,
+            tokens_out=20,
+            latency_ms=5,
+        )
+    )
+
+    rows = summarise_quality_ledger(ledger_path)
+
+    by_key = {(row["task_type"], row["adapter_id"]): row for row in rows}
+    extract = by_key[("extract-entities", "cheap-1")]
+    assert extract["observations"] == 3
+    assert extract["mean_quality"] == round((0.9 + 0.95 + 0.85) / 3, 4)
+    assert extract["mean_cost_usd"] == 0.001
+    summarize = by_key[("summarize-source", "cheaper-1")]
+    assert summarize["observations"] == 1
+
+
+def test_collect_adapter_choices_rolls_up_per_stage(tmp_path) -> None:
+    """Unit test: report helper aggregates adapter choices from artifact provenance."""
+    from infospace_bench.generator import _collect_adapter_choices
+
+    class _FakeArtifact:
+        def __init__(self, kind: str, provenance: dict) -> None:
+            self.kind = kind
+            self.provenance = provenance
+
+    artifacts = [
+        _FakeArtifact(
+            kind="entity",
+            provenance={
+                "stage_id": "extract-entities",
+                "provider_metadata": {
+                    "adapter_id": "_MockAdapter:cheap-1",
+                    "task_type": "extract-entities",
+                    "usage": {"prompt_tokens": 120, "completion_tokens": 40},
+                },
+            },
+        ),
+        _FakeArtifact(
+            kind="entity",
+            provenance={
+                "stage_id": "extract-entities",
+                "provider_metadata": {
+                    "adapter_id": "_MockAdapter:cheap-1",
+                    "task_type": "extract-entities",
+                    "usage": {"prompt_tokens": 130, "completion_tokens": 50},
+                },
+            },
+        ),
+        _FakeArtifact(
+            kind="relation",
+            provenance={
+                "stage_id": "extract-relations",
+                "provider_metadata": {
+                    "adapter_id": "_MockAdapter:smart-1",
+                    "task_type": "extract-relations",
+                    "usage": {"prompt_tokens": 200, "completion_tokens": 80},
+                },
+            },
+        ),
+        # Artifact without provider_metadata should be ignored.
+        _FakeArtifact(kind="generated", provenance={"stage_id": "summarize-source"}),
+    ]
+
+    rows = _collect_adapter_choices(artifacts)
+
+    by_key = {(row["stage_id"], row["adapter_id"]): row for row in rows}
+    entities_row = by_key[("extract-entities", "_MockAdapter:cheap-1")]
+    relations_row = by_key[("extract-relations", "_MockAdapter:smart-1")]
+    assert entities_row["calls"] == 2
+    assert entities_row["prompt_tokens"] == 250
+    assert entities_row["completion_tokens"] == 90
+    assert relations_row["calls"] == 1
+    assert relations_row["task_type"] == "extract-relations"
+
+
+def test_routing_ledger_cli(tmp_path) -> None:
+    import json as _json
+    import subprocess as _sub
+    import sys as _sys
+    import os as _os
+
+    ledger_path = tmp_path / "quality.jsonl"
+    ledger = QualityLedger(path=ledger_path)
+    ledger.append(
+        QualityObservation(
+            task_type="extract-entities",
+            adapter_id="cheap-1",
+            model_id="cheap-1",
+            cost_usd=0.001,
+            quality_score=0.9,
+            tokens_in=100,
+            tokens_out=50,
+            latency_ms=10,
+        )
+    )
+
+    env = _os.environ.copy()
+    env["PYTHONPATH"] = _os.pathsep.join(p for p in _sys.path if p)  # reuse parent's import path
+    result = _sub.run(
+        [_sys.executable, "-m", "infospace_bench", "routing", "ledger", str(ledger_path)],
+        check=False, env=env, text=True, capture_output=True,
+    )
+
+    assert result.returncode == 0, result.stderr
+    payload = _json.loads(result.stdout)
+    assert payload["ledger_path"] == str(ledger_path)
+    assert payload["rows"] and payload["rows"][0]["task_type"] == "extract-entities"
+
+
 def test_bridge_passes_estimated_cost_per_1k_through() -> None:
    captured: dict[str, Any] = {}

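Review note: the two shadow-sampling tests in this file pin down a contract. The caller always receives the candidate's response, and a failure in the baseline path is reported through `on_shadow_error` instead of propagating. The sketch below illustrates that contract under the assumption that the shadow path runs synchronously after the candidate call. `shadow_call`, `record`, and `broken_baseline` are hypothetical names, and this is not how llm-connect's `ShadowingAdapter` is implemented.

```python
from __future__ import annotations

import random
from typing import Callable


def shadow_call(
    prompt: str,
    candidate: Callable[[str], str],
    baseline: Callable[[str], str],
    record: Callable[[str, str], None],
    *,
    shadow_rate: float = 1.0,
    on_shadow_error: Callable[[Exception], None] | None = None,
    rng: random.Random | None = None,
) -> str:
    """Return the candidate response; sample the baseline without affecting it."""
    response = candidate(prompt)  # candidate errors propagate as usual
    sampler = rng or random.Random()
    # random() is in [0.0, 1.0), so shadow_rate=1.0 always samples and 0.0 never does.
    if sampler.random() < shadow_rate:
        try:
            record(response, baseline(prompt))
        except Exception as exc:  # isolate any failure in the shadow path
            if on_shadow_error is not None:
                on_shadow_error(exc)
    return response


def broken_baseline(prompt: str) -> str:
    """Simulate a baseline outage."""
    raise RuntimeError("baseline outage")


errors: list[Exception] = []
pairs: list[tuple[str, str]] = []
result = shadow_call(
    "Hello.",
    candidate=lambda p: "ok",
    baseline=broken_baseline,
    record=lambda c, b: pairs.append((c, b)),
    on_shadow_error=errors.append,
)
```

With a healthy baseline the same call would append one `(candidate, baseline)` pair to `pairs`; with `shadow_rate=0.0` the baseline is never invoked.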
--- a/tests/test_routing_config.py
+++ b/tests/test_routing_config.py
@@ -0,0 +1,272 @@
+"""
+Tests for the routing config schema (IB-WP-0020-T01).
+
+Parser-only — no network calls, no llm-connect construction. T02 will
+test the provider construction loader separately.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+import yaml
+
+from infospace_bench.errors import InfospaceError
+from infospace_bench.routing_config import (
+    ROUTING_SCHEMA_VERSION,
+    RoutingCandidateConfig,
+    RoutingConfig,
+    RoutingTaskTypeConfig,
+    load_routing_config,
+    parse_routing_config,
+)
+
+
+MINIMAL = {
+    "schema_version": 1,
+    "task_types": {
+        "summarize-source": {
+            "candidates": [
+                {
+                    "id": "openrouter:gpt-4o-mini",
+                    "provider": "openrouter",
+                    "model": "openai/gpt-4o-mini",
+                },
+            ],
+        },
+    },
+}
+
+
+def test_parses_minimal_config() -> None:
+    config = parse_routing_config(MINIMAL)
+
+    assert config.schema_version == ROUTING_SCHEMA_VERSION
+    assert config.default_quality_floor is None
+    assert config.ledger_path is None
+    assert config.stage_to_task_type == {}
+    assert len(config.task_types) == 1
+    task = config.task_types[0]
+    assert task.task_type == "summarize-source"
+    assert task.quality_floor is None
+    assert len(task.candidates) == 1
+    candidate = task.candidates[0]
+    assert candidate.id == "openrouter:gpt-4o-mini"
+    assert candidate.provider == "openrouter"
+    assert candidate.model == "openai/gpt-4o-mini"
+    assert candidate.api_key_env == ""
+    assert candidate.max_cost_per_1k is None
+
+
+def test_parses_full_config_round_trip() -> None:
+    data = {
+        "schema_version": 1,
+        "default_quality_floor": 0.8,
+        "ledger_path": "output/routing/quality.jsonl",
+        "stage_to_task_type": {
+            "extract-entities": "smart",
+            "extract-relations": "smart",
+        },
+        "task_types": {
+            "cheap": {
+                "quality_floor": 0.7,
+                "candidates": [
+                    {
+                        "id": "openrouter:gpt-4o-mini",
+                        "provider": "openrouter",
+                        "model": "openai/gpt-4o-mini",
+                        "api_key_env": "OPENROUTER_API_KEY",
+                        "max_cost_per_1k": 0.001,
+                    },
+                ],
+            },
+            "smart": {
+                "quality_floor": 0.85,
+                "candidates": [
+                    {
+                        "id": "openrouter:claude-haiku",
+                        "provider": "openrouter",
+                        "model": "anthropic/claude-3.5-haiku",
+                    },
+                    {
+                        "id": "openrouter:claude-sonnet",
+                        "provider": "openrouter",
+                        "model": "anthropic/claude-3.5-sonnet",
+                        "max_cost_per_1k": 0.003,
+                    },
+                ],
+            },
+        },
+    }
+
+    config = parse_routing_config(data)
+
+    assert config.default_quality_floor == 0.8
+    assert config.ledger_path == "output/routing/quality.jsonl"
+    assert config.stage_to_task_type == {
+        "extract-entities": "smart",
+        "extract-relations": "smart",
+    }
+    smart = next(t for t in config.task_types if t.task_type == "smart")
+    assert smart.quality_floor == 0.85
+    assert len(smart.candidates) == 2
+    assert smart.candidates[1].max_cost_per_1k == 0.003
+
+
+def test_load_routing_config_reads_yaml_file(tmp_path: Path) -> None:
+    config_path = tmp_path / "routing.yaml"
+    config_path.write_text(yaml.safe_dump(MINIMAL, sort_keys=False), encoding="utf-8")
+
+    config = load_routing_config(config_path)
+
+    assert isinstance(config, RoutingConfig)
+    assert config.schema_version == 1
+
+
+def test_load_routing_config_missing_file(tmp_path: Path) -> None:
+    with pytest.raises(InfospaceError) as exc_info:
+        load_routing_config(tmp_path / "missing.yaml")
+    assert exc_info.value.code == "missing_routing_config"
+
+
+def test_load_routing_config_bad_yaml(tmp_path: Path) -> None:
+    config_path = tmp_path / "broken.yaml"
+    config_path.write_text("schema_version: 1\n  bad: indent\n: : : :\n", encoding="utf-8")
+
+    with pytest.raises(InfospaceError) as exc_info:
+        load_routing_config(config_path)
+    assert exc_info.value.code == "invalid_routing_config_yaml"
+
+
+def test_rejects_wrong_schema_version() -> None:
+    payload = {**MINIMAL, "schema_version": 2}
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "unsupported_routing_schema"
+
+
+def test_rejects_missing_schema_version() -> None:
+    payload = {"task_types": MINIMAL["task_types"]}
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "unsupported_routing_schema"
+
+
+def test_rejects_empty_task_types() -> None:
+    payload = {"schema_version": 1, "task_types": {}}
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "empty_routing_task_types"
+
+
+def test_rejects_task_type_without_candidates() -> None:
+    payload = {
+        "schema_version": 1,
+        "task_types": {"foo": {"candidates": []}},
+    }
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "empty_routing_candidates"
+
+
+def test_rejects_candidate_missing_required_field() -> None:
+    payload = {
+        "schema_version": 1,
+        "task_types": {
+            "foo": {
+                "candidates": [{"provider": "openrouter", "model": "x"}],  # missing id
+            },
+        },
+    }
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "missing_routing_candidate_field"
+    assert "id" in exc_info.value.detail["missing"]
+
+
+def test_rejects_unsupported_provider() -> None:
+    payload = {
+        "schema_version": 1,
+        "task_types": {
+            "foo": {
+                "candidates": [
+                    {"id": "x", "provider": "acme", "model": "acme/model"},
+                ],
+            },
+        },
+    }
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "unsupported_routing_provider"
+
+
+def test_rejects_negative_max_cost() -> None:
+    payload = {
+        "schema_version": 1,
+        "task_types": {
+            "foo": {
+                "candidates": [
+                    {
+                        "id": "x",
+                        "provider": "openrouter",
+                        "model": "openai/gpt-4o-mini",
+                        "max_cost_per_1k": -1,
+                    },
+                ],
+            },
+        },
+    }
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "invalid_routing_max_cost"
+
+
+def test_rejects_quality_floor_out_of_range() -> None:
+    payload = {
+        "schema_version": 1,
+        "default_quality_floor": 1.5,
+        "task_types": MINIMAL["task_types"],
+    }
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "invalid_routing_quality_floor"
+
+
+def test_rejects_duplicate_candidate_ids_within_task_type() -> None:
+    payload = {
+        "schema_version": 1,
+        "task_types": {
+            "foo": {
+                "candidates": [
+                    {"id": "dupe", "provider": "openrouter", "model": "a"},
+                    {"id": "dupe", "provider": "openrouter", "model": "b"},
+                ],
+            },
+        },
+    }
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "duplicate_routing_candidate_id"
+
+
+def test_rejects_non_mapping_stage_map() -> None:
+    payload = {
+        "schema_version": 1,
+        "task_types": MINIMAL["task_types"],
+        "stage_to_task_type": ["not", "a", "mapping"],
+    }
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "invalid_routing_stage_map"
+
+
+def test_rejects_non_string_ledger_path() -> None:
+    payload = {
+        "schema_version": 1,
+        "task_types": MINIMAL["task_types"],
+        "ledger_path": 42,
+    }
+    with pytest.raises(InfospaceError) as exc_info:
+        parse_routing_config(payload)
+    assert exc_info.value.code == "invalid_routing_ledger_path"
--- a/workplans/IB-WP-0018-adaptive-llm-routing-consumer.md
+++ b/workplans/IB-WP-0018-adaptive-llm-routing-consumer.md
@@ -4,11 +4,11 @@ type: workplan
 title: "Adaptive LLM Routing — infospace-bench Consumer Wiring"
 domain: markitect
 repo: infospace-bench
-status: blocked
+status: done
 owner: markitect
 topic_slug: markitect
 created: "2026-05-17"
-updated: "2026-05-17"
+updated: "2026-05-18"
 depends_on_workplans:
  - LLM-WP-0004
 related_workplans:
@@ -33,7 +33,22 @@ list will be refined once that API is stable.

 ## Status

-Blocked on `LLM-WP-0004` T01..T03.
+Done. LLM-WP-0004 landed `QualityLedger`, `QualityObservation`,
+`BaselineGrader`/`PairedGrader`/`ExactMatchJudge`/`EmbeddingSimilarityJudge`/
+`LLMJudge`, `AdaptiveRoutingPolicy`, and `ShadowingAdapter` in
+llm-connect; the five tasks below are all complete.
+
+- T01 — task-type taxonomy (`docs/routing-task-types.md`)
+- T02 — `RoutingAssistedGenerationAdapter` bridge in
+  `src/infospace_bench/routing.py`
+- T03 — opt-in `wrap_with_shadow_sampling()` helper that installs
+  llm-connect's `ShadowingAdapter` around any candidate
+- T04 — `## Per-stage adapter choices` section in
+  `reports/generation-summary.md` (driven from artifact
+  `provenance.provider_metadata`) and `infospace-bench routing ledger`
+  CLI subcommand
+- T05 — `tests/test_routing_adapter.py` (13 tests, including a CLI
+  smoke and the adapter-choices unit test)

 ## Why this is a separate workplan

--- a/workplans/IB-WP-0020-provider-routing-cli.md
+++ b/workplans/IB-WP-0020-provider-routing-cli.md
@@ -0,0 +1,211 @@
+---
+id: IB-WP-0020
+type: workplan
+title: "Provider Routing CLI Integration"
+domain: markitect
+repo: infospace-bench
+status: active
+owner: markitect
+topic_slug: markitect
+created: "2026-05-18"
+updated: "2026-05-18"
+depends_on_workplans:
+  - IB-WP-0018
+  - LLM-WP-0004
+related_workplans:
+  - IB-WP-0016
+  - IB-WP-0019
+state_hub_workstream_slug: "ib-wp-0020-provider-routing-cli"
+state_hub_workstream_id: "172bb082-610a-477b-b5e0-26c9f4bdfd95"
+---
+
+# IB-WP-0020 — Provider Routing CLI Integration
+
+## Goal
+
+Expose `RoutingAssistedGenerationAdapter` (IB-WP-0018) as a first-class
+CLI option so a real multi-chapter or full-book run can use the
+adaptive router without writing any Python. Today `--provider` accepts
+`fixture` and `openrouter`; this workplan adds `routing`, plus a small
+config file that names the rules, the ledger, the quality floors, and
+the per-stage task-type overrides.
+
+The end state is a single command that does cost-aware adaptive
+routing across multiple OpenRouter models and writes back the
+per-stage adapter choices, the budget log, and (optionally) sampled
+shadow grades:
+
+```bash
+infospace-bench generate from-source ./LEFEVRE.epub \
+  --workspace ./infospaces \
+  --slug reminiscences-routed \
+  --name "Reminiscences (Routed)" \
+  --profile trading-literature \
+  --provider routing \
+  --routing-config ./routing.yaml \
+  --chapter I \
+  --apply
+```
+
+## Why this is a separate workplan
+
+`IB-WP-0018` shipped the bridge module and its programmatic API. CLI
+wiring needs its own config-file schema, its own loader, its own error
+surfaces, and its own end-to-end smoke test — and that is enough scope
+to justify a separate review surface rather than absorbing it into the
+already-closed IB-WP-0018.
+
+## Non-Goals
+
+- Owning the routing policy primitives (those live in
+  `llm-connect` LLM-WP-0004).
+- Replacing the static `openrouter` provider — that path stays usable
+  for callers who do not want the router.
+- Embedding model selection logic inside the CLI; the config file is
+  declarative and routing decisions stay with `AdaptiveRoutingPolicy`.
+
+## Tasks
+
+### T01 — Routing config file schema
+
+```task
+id: IB-WP-0020-T01
+status: done
+priority: medium
+state_hub_task_id: "39597441-22ab-4dcf-b68d-b045823a9374"
+```
+
+- Define a small YAML schema (`schema_version: 1`) for a routing config:
+  - `default_quality_floor: <float | null>` (global default)
+  - `ledger_path: <str | null>` (relative to workspace by default)
+  - `task_types`: map of task_type to a list of candidate adapters,
+    each with `id`, `provider` (`openrouter`, `claude_code`,
+    `openai`, …), `model`, `api_key_env`, optional `max_cost_per_1k`,
+    optional `quality_floor` override
+  - `stage_to_task_type`: optional override map
+- Document the schema in `docs/routing-config.md` with two annotated
+  examples (one OpenRouter-only, one ClaudeCode-as-baseline +
+  OpenRouter candidates).
+- Tests: schema parses; missing fields default cleanly; unknown
+  providers raise a focused error.
+
+### T02 — Routing config loader
+
+```task
+id: IB-WP-0020-T02
+status: todo
+priority: high
+state_hub_task_id: "5e38514b-ad6a-4d39-8716-f812f241d9fd"
+```
+
+- Add `src/infospace_bench/routing_config.py` (or extend
+  `routing.py`) with `load_routing_config(path, *, workspace)` that
+  returns a `RoutingPolicy` (or `AdaptiveRoutingPolicy` when the
+  config sets `quality_floor` or names a ledger) ready to hand to
+  `RoutingAssistedGenerationAdapter`.
+- Provider construction:
+  - `openrouter` → llm-connect `OpenRouterAdapter` with API key from
+    `api_key_env` (default `OPENROUTER_API_KEY`)
+  - `claude_code` → llm-connect `ClaudeCodeAdapter`
+  - `openai`, `gemini` → supported, but explicitly documented as
+    untested for production use
+- Tests: builds a static policy from a minimal config; builds an
+  adaptive policy with a ledger; missing API key raises before any
+  network call.
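Review note: the "missing API key raises before any network call" requirement could look something like the sketch below. It only resolves the key from the environment and constructs nothing from llm-connect. `MissingApiKeyError`, `resolve_api_key`, and the choice to treat `claude_code` as needing no key are hypothetical; only the `OPENROUTER_API_KEY` default comes from the task text.

```python
import os
from typing import Mapping, Optional

# Default env var per provider; only the openrouter default is from the plan.
DEFAULT_API_KEY_ENV = {"openrouter": "OPENROUTER_API_KEY"}
# Assumption: claude_code needs no API key from the config.
KEYLESS_PROVIDERS = {"claude_code"}


class MissingApiKeyError(RuntimeError):
    """Hypothetical error raised before any adapter is constructed."""


def resolve_api_key(
    candidate: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the API key for a candidate, or None for keyless providers."""
    env = os.environ if environ is None else environ
    provider = candidate["provider"]
    if provider in KEYLESS_PROVIDERS:
        return None
    # The config names the env var; it never carries key material itself.
    env_name = candidate.get("api_key_env") or DEFAULT_API_KEY_ENV.get(provider)
    if not env_name:
        raise MissingApiKeyError(f"{candidate['id']}: no api_key_env configured")
    key = env.get(env_name)
    if not key:
        raise MissingApiKeyError(f"{candidate['id']}: ${env_name} is not set")
    return key
```

Passing `environ` explicitly keeps the tests free of monkeypatching; production callers would leave it unset and read `os.environ`.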
+
+### T03 — `--provider routing` and `--routing-config` CLI flags
+
+```task
+id: IB-WP-0020-T03
+status: todo
+priority: high
+state_hub_task_id: "fe5888e0-da33-413a-b026-71ed811b8c73"
+```
+
+- Add `routing` to the `--provider` choices on `generate run`,
+  `generate resume`, and `generate from-source`.
+- Add `--routing-config <path>` (required when `--provider routing`).
+- Add `--quality-floor <float>` to override the config-level floor at
+  the call site (handy for tightening or loosening the floor for a
+  single run without editing the file).
+- Wire the loader into `_adapter_for`/`run_generation` so a
+  `RoutingAssistedGenerationAdapter` is constructed and passed to the
+  workflow engine.
+- Tests: CLI smoke that builds a routing config pointing at mocked
+  adapter ids and confirms the run goes through the bridge.
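Review note: a rough `argparse` sketch of the flag rules in T03. The parser name, the `fixture` default, and the helper names are assumptions for illustration; the real CLI wiring in `_adapter_for`/`run_generation` is not shown.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for one `generate` subcommand parser.
    parser = argparse.ArgumentParser(prog="generate-sketch")
    parser.add_argument(
        "--provider",
        choices=["fixture", "openrouter", "routing"],
        default="fixture",  # assumption: the default is not stated in the plan
    )
    parser.add_argument("--routing-config", metavar="PATH")
    parser.add_argument("--quality-floor", type=float)
    return parser


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # --routing-config is only mandatory on the routing path.
    if args.provider == "routing" and not args.routing_config:
        parser.error("--routing-config is required when --provider routing")
    return args


def effective_quality_floor(
    args: argparse.Namespace, config_floor: Optional[float]
) -> Optional[float]:
    """The call-site flag wins; otherwise fall back to the config value."""
    if args.quality_floor is not None:
        return args.quality_floor
    return config_floor
```

`parser.error` exits with status 2, so a missing config path fails before any loader or adapter code runs.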
+
+### T04 — Example config and live-smoke wiring
+
+```task
+id: IB-WP-0020-T04
+status: todo
+priority: medium
+state_hub_task_id: "69288131-f265-4db5-a4b0-b0c8a6f55dd8"
+```
+
+- Add `examples/routing/trading-literature.yaml` with a realistic
+  Lefevre-aimed config: cheap model for summaries, mid model for
+  entities/relations, ClaudeCode baseline behind a shadow sampler.
+- Update the optional live-OpenRouter smoke test
+  (`tests/test_openrouter_live.py`) with a parallel skipped test that
+  exercises `--provider routing` end-to-end when both
+  `OPENROUTER_API_KEY` and
+  `INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER=1` are set.
+- Document how to run the live routing smoke in
+  `docs/generic-source-generator.md`.
+
+### T05 — Shadow-mode opt-in flag
+
+```task
+id: IB-WP-0020-T05
+status: todo
+priority: medium
+state_hub_task_id: "02658420-056c-4d73-8055-e6a7ab51876b"
+```
+
+- Add `--shadow-rate <float>` and `--shadow-baseline <id>` flags so a
+  caller can enable `wrap_with_shadow_sampling()` for an entire run
+  without editing the config file. When set, the loader wraps each
+  candidate adapter in `ShadowingAdapter` with the named baseline and
+  the chosen rate.
+- Tests: monkeypatched baseline asserts the shadow path fires at
+  `shadow_rate=1.0` and skips at `shadow_rate=0.0`.
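Review note: the boundary behaviour the T05 tests ask for (always fire at `1.0`, never fire at `0.0`) falls out of a strict less-than comparison against a uniform draw. The sketch below is a stand-in for that sampling decision only; `should_shadow` is a hypothetical helper, and how llm-connect's `ShadowingAdapter` actually samples is not confirmed here.

```python
import random


def should_shadow(shadow_rate: float, rng: random.Random) -> bool:
    """Decide whether one call is also sent to the shadow baseline."""
    if not 0.0 <= shadow_rate <= 1.0:
        raise ValueError("shadow_rate must lie in [0, 1]")
    # rng.random() is in [0.0, 1.0), so rate 1.0 always fires
    # and rate 0.0 never does.
    return rng.random() < shadow_rate
```

Injecting the `random.Random` instance lets a test pin the seed instead of monkeypatching the global generator.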
+
+## Acceptance
+
+- `infospace-bench generate from-source ... --provider routing
+  --routing-config <path>` succeeds against the deterministic Lefevre
+  fixture with a hand-crafted routing config and mocked adapters.
+- The generation report's `## Per-stage adapter choices` section
+  reflects the routed choices, and `output/budget/usage.yaml` buckets
+  reflect the actual model that ran each call.
+- The static `openrouter` and `fixture` provider paths remain
+  unchanged.
+- An optional live smoke test exists and is gated identically to the
+  IB-WP-0016 OpenRouter smoke.
+- Documentation explains the config shape, the API-key resolution, and
+  the difference between adaptive routing and shadow-mode sampling.
+
+## Risks and open questions
+
+- **Adapter constructor surface.** llm-connect's adapter constructors
+  vary slightly per provider; the loader needs to keep a small but
+  explicit allowlist of provider names rather than reflective magic.
+- **API key plumbing.** Today `openrouter` reads
+  `OPENROUTER_API_KEY` directly. The config will name the env var
+  explicitly to make multi-key setups workable; no key material
+  belongs in the config file itself.
+- **Schema versioning.** Carry a `schema_version` field from day one
+  so the loader can refuse mismatched configs once the shape stabilises.
+- **Shadow grader choice.** v1 will default the shadow grader to
+  `ExactMatchJudge` because it has no extra cost. `LLMJudge` and
+  `EmbeddingSimilarityJudge` configuration belongs in a follow-up.
+
+## Downstream effects
+
+- `infospace-bench routing ledger <path>` (already shipped via
+  IB-WP-0018) becomes the natural companion CLI for inspecting the
+  observations the routed runs accumulate.
+- A successful T03 + T04 lets us run a multi-chapter Lefevre live
+  build using the adaptive router and validate the IB-WP-0016
+  reviewer checklist on real output without single-model lock-in.
Author	SHA1	Message	Date
tegwick	c11a942bb7	IB-WP-0020-T01: routing config schema and parser Add a small YAML routing config schema (schema_version 1) and a parser-only loader at src/infospace_bench/routing_config.py. The loader validates the declarative shape — task_types with candidates, optional per-task quality_floor, optional default_quality_floor, optional ledger_path, optional stage_to_task_type override map — and refuses bad shapes before any network or workspace work happens. Supported provider names: openrouter, claude_code, openai, gemini. Unknown providers, missing required candidate fields, out-of-range quality floors, negative max_cost_per_1k, duplicate candidate ids within a task type, and non-mapping stage_to_task_type all raise focused InfospaceError codes that callers can pattern-match. docs/routing-config.md documents the schema with two annotated examples (OpenRouter-only two-tier, and adaptive with a ClaudeCode baseline) plus the full "what fails fast" list. 16 parser tests cover happy-path round-trip, file load, missing file, malformed YAML, and every validation surface (wrong/missing schema version, empty task_types, empty candidates, missing required fields, unsupported provider, negative cost, out-of-range quality_floor, duplicate ids, non-mapping stage_map, non-string ledger_path). T02 will turn a RoutingConfig into a live llm-connect RoutingPolicy / AdaptiveRoutingPolicy with constructed LLMAdapter instances. 160 tests pass, 1 skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 18:09:28 +02:00
tegwick	706ace3661	Refresh agent instruction files	2026-05-18 16:55:43 +02:00
tegwick	a95322051f	IB-WP-0020: provider routing CLI workplan (todo) Open a workplan that turns the IB-WP-0018 RoutingAssistedGenerationAdapter bridge into a first-class CLI option. Adds --provider routing, a YAML routing config schema, --quality-floor and --shadow-rate / --shadow-baseline opt-in flags so a real multi-chapter Lefevre live run can use adaptive cost-quality routing without writing any Python. Workstream registered with state-hub (172bb082-610a-477b-b5e0-26c9f4bdfd95) with five tasks: - T01 routing config file schema (medium) - T02 routing config loader (high) - T03 --provider routing + --routing-config + --quality-floor CLI flags (high) - T04 example config + optional live routing smoke test (medium) - T05 --shadow-rate / --shadow-baseline opt-in flags (medium) Depends on IB-WP-0018 (already done) and LLM-WP-0004 (already done in ~/llm-connect). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 13:50:26 +02:00
tegwick	f818acfc62	IB-WP-0018-T03+T04: shadow sampling + report/CLI surfacing; close IB-WP-0018 T03 — wrap_with_shadow_sampling() helper in routing.py: builds a llm-connect ShadowingAdapter around any candidate LLMAdapter with a caller-supplied baseline, grader, and QualityLedger. async_shadow=True by default so production load is not doubled; on_shadow_error escape hatch keeps caller logs informed when a baseline outage swallows the shadow path. The returned adapter is still an LLMAdapter so it slots into a RoutingPolicy rule without further code change. T04 — generation report enrichment plus a small CLI helper: - _collect_adapter_choices walks artifact provenance, groups by (stage_id, adapter_id), and surfaces calls + prompt/completion tokens per (stage, adapter) pair in a new ## Per-stage adapter choices section. Runs that did not go through the bridge have no provider_metadata.adapter_id and emit an empty list, so fixture-only reports stay terse. - summarise_quality_ledger() rolls a llm-connect QualityLedger up by (task_type, adapter_id) with mean quality, mean cost, observations, and cumulative tokens. - infospace-bench routing ledger <path> CLI prints the rollup as JSON. Five new tests cover shadow happy-path, shadow failure isolation, ledger rollup, the routing CLI, and the report's adapter-choice aggregation. Closes IB-WP-0018: T01-T05 are all done and the workplan status flips from blocked to done now that LLM-WP-0004's primitives have shipped. 144 tests pass, 1 skipped (the OpenRouter live smoke, gated as before). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 11:52:05 +02:00