T03 — wrap_with_shadow_sampling() helper in routing.py: builds a
llm-connect ShadowingAdapter around any candidate LLMAdapter with a
caller-supplied baseline, grader, and QualityLedger. async_shadow=True
by default so production load is not doubled; on_shadow_error escape
hatch keeps caller logs informed when a baseline outage swallows the
shadow path. The returned adapter is still an LLMAdapter so it slots
into a RoutingPolicy rule without further code change.
T04 — generation report enrichment plus a small CLI helper:
- _collect_adapter_choices walks artifact provenance, groups by
(stage_id, adapter_id), and surfaces calls + prompt/completion tokens
per (stage, adapter) pair in a new ## Per-stage adapter choices
section. Runs that did not go through the bridge have no
provider_metadata.adapter_id and emit an empty list, so fixture-only
reports stay terse.
- summarise_quality_ledger() rolls a llm-connect QualityLedger up by
(task_type, adapter_id) with mean quality, mean cost, observations,
and cumulative tokens.
- infospace-bench routing ledger <path> CLI prints the rollup as JSON.
Five new tests cover shadow happy-path, shadow failure isolation,
ledger rollup, the routing CLI, and the report's adapter-choice
aggregation. Closes IB-WP-0018: T01-T05 are all done and the workplan
status flips from blocked to done now that LLM-WP-0004's primitives
have shipped.
144 tests pass, 1 skipped (the OpenRouter live smoke, gated as before).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace generate plan's full-prompt dump with a compact summary that
reports selected-chunk counts, selected chapter numbers, per-workflow
call counts, prompt-word and token estimates, and a rough USD cost when
--cost-per-1k is supplied. Selection filters --chapter (label or number,
repeatable), --from-chapter / --to-chapter (numeric range), and --chunk
(repeatable id) shape the estimate. Budget caps --max-calls and
--cost-cap are reported as exceeds_* booleans so callers can fail fast
before run.
The old full per-workflow plan with prompts remains available behind
--full so deep inspection is opt-in instead of the default.
Whole-Lefevre estimate at default max_words=800: 146 chunks, 730 calls,
~518k prompt tokens, ~$155 at $0.30/1k. Chapters 3-5 only: 19 chunks,
95 calls, ~64k tokens. 87 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Blocked stub that names the dependency on llm-connect WP-0004 (adaptive
cost-quality routing). Activates once T01..T03 of that workplan land
and the QualityLedger / BaselineGrader / AdaptiveRoutingPolicy APIs are
stable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>