infospace-bench

coulomb/infospace-bench

generated from coulomb/repo-seed

Commit Graph

Author

SHA1

Message

Date

tegwick

debd2b8e69

IB-WP-0020-T04: example routing config + live routing smoke

examples/routing/trading-literature.yaml is the checked-in starting
config for a Lefevre-style run. It applies the IB-WP-0018 task-type
taxonomy: cheap candidates for summary + evaluation, smart candidates
for entity + relation extraction, and a separate baseline rule wiring
claude_code for a follow-on T05 ShadowingAdapter step. Workspace-
relative ledger_path keeps adaptive observations with the workspace.

tests/test_routing_config.py gains a regression test that asserts the
shipped example parses cleanly, every stage in stage_to_task_type maps
to a declared task type, and the baseline candidate uses the
claude_code provider — so the example will not bit-rot silently.
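
The regression check described above could look roughly like the sketch below. The config is shown as an already-parsed dict, and the `task_types` and `baseline` keys and the placeholder model names are assumptions made for illustration; only `stage_to_task_type` and the `claude_code` provider name come from the commit message, and the real schema of examples/routing/trading-literature.yaml may differ.

```python
from typing import Any


def check_routing_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config passes."""
    problems: list[str] = []
    # Assumed layout: task types are declared under a top-level "task_types" key.
    declared = set(config.get("task_types", {}))
    for stage, task_type in config.get("stage_to_task_type", {}).items():
        if task_type not in declared:
            problems.append(
                f"stage {stage!r} maps to undeclared task type {task_type!r}"
            )
    # Assumed layout: the baseline rule lives under a top-level "baseline" key.
    if config.get("baseline", {}).get("provider") != "claude_code":
        problems.append("baseline candidate does not use the claude_code provider")
    return problems


# Hypothetical parsed config; model names are placeholders, not real choices.
example_config = {
    "task_types": {
        "summary": {"candidates": ["cheap-model"]},
        "evaluation": {"candidates": ["cheap-model"]},
        "entity_extraction": {"candidates": ["smart-model"]},
        "relation_extraction": {"candidates": ["smart-model"]},
    },
    "stage_to_task_type": {
        "summarize": "summary",
        "evaluate": "evaluation",
        "entities": "entity_extraction",
        "relations": "relation_extraction",
    },
    "baseline": {"provider": "claude_code"},
}

problems = check_routing_config(example_config)
```

Returning a list of problems instead of raising on the first one lets a single test failure report every broken mapping at once.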

tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke
gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY
opt-in as the existing static smoke. It builds a one-candidate routing
config, runs a single chapter through --provider routing, and asserts
the per-stage adapter-choices report section names the routed model
and the routed artifacts carry adapter_id provenance.

docs/generic-source-generator.md gains a "Live runs with --provider
routing" subsection that walks through the one-command routed run,
explains the --quality-floor override, and points at the parallel
live smoke test.

174 tests pass, 2 skipped (both live smokes, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-18 22:19:54 +02:00

tegwick

ab23c5873e

IB-WP-0016-T06: OpenRouter live-run guardrails

Add --chapter / --from-chapter / --to-chapter / --chunk selection flags
to generate init and generate from-source, plumb them into
init_generation_infospace via a new _filter_chunks_by_chapter helper,
and refuse to create an infospace when the filters reject every chunk
(InfospaceError "empty_chapter_selection"). The flags use the same
T03/T02 plumbing (chapter labels, roman numerals, chunk ids) so a
single-chapter selection is a one-flag command.
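
A minimal sketch of the selection behaviour described above, not the actual _filter_chunks_by_chapter helper. The chunk shape (dicts with `id` and `chapter` keys), the function names, and the `InfospaceError` stand-in class are assumptions; only the flag semantics and the "empty_chapter_selection" code come from the commit message.

```python
from typing import Optional


class InfospaceError(Exception):
    """Stand-in for the project's error type; carries a short error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def chapter_number(label: str) -> int:
    """Accept an arabic ('3') or roman ('III', 'iii') chapter label."""
    text = label.strip().upper()
    if text.isdigit():
        return int(text)
    total = 0
    for i, ch in enumerate(text):
        value = _ROMAN[ch]
        following = _ROMAN[text[i + 1]] if i + 1 < len(text) else 0
        # A smaller numeral before a larger one is subtracted (IV = 4).
        total += -value if value < following else value
    return total


def filter_chunks(
    chunks: list[dict],
    chapter: Optional[str] = None,
    from_chapter: Optional[str] = None,
    to_chapter: Optional[str] = None,
    chunk_ids: Optional[set[str]] = None,
) -> list[dict]:
    """Keep chunks matching every supplied filter; refuse an empty result."""
    exact = chapter_number(chapter) if chapter else None
    low = chapter_number(from_chapter) if from_chapter else None
    high = chapter_number(to_chapter) if to_chapter else None
    selected = []
    for chunk in chunks:
        number = chunk["chapter"]
        if exact is not None and number != exact:
            continue
        if low is not None and number < low:
            continue
        if high is not None and number > high:
            continue
        if chunk_ids is not None and chunk["id"] not in chunk_ids:
            continue
        selected.append(chunk)
    if not selected:
        raise InfospaceError("empty_chapter_selection")
    return selected


chunks = [
    {"id": "c1", "chapter": 1},
    {"id": "c2", "chapter": 2},
    {"id": "c3", "chapter": 3},
]
single = filter_chunks(chunks, chapter="II")
```

Failing before any infospace is created means a mistyped chapter label costs nothing, which matters once the same flags drive paid live runs.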

OpenRouter run-record metadata (model, request_id, usage tokens,
retry_count, duration_seconds) already lands in
output/workflows/runs/*.yaml; this task just adds the smoke test that
proves it stays there, plus the parallel guarantee that the same
provider metadata reaches generated artifact provenance via
provenance.provider_metadata.
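
As a rough illustration of what such a smoke test might assert, the sketch below checks a run record for the metadata fields listed above. It assumes the record is an already-parsed mapping with flat keys and that usage tokens sit under a `usage` key; the real layout of output/workflows/runs/*.yaml and of generated artifacts may differ.

```python
from typing import Any

# Field names from the commit message; "usage" as the key for usage tokens
# is an assumption.
REQUIRED_FIELDS = ("model", "request_id", "usage", "retry_count", "duration_seconds")


def missing_provider_metadata(record: dict[str, Any]) -> list[str]:
    """List required provider-metadata fields absent from a run record."""
    return [field for field in REQUIRED_FIELDS if field not in record]


def artifact_provider_metadata(artifact: dict[str, Any]) -> dict[str, Any]:
    """Read provenance.provider_metadata from an artifact, or {} if absent."""
    return artifact.get("provenance", {}).get("provider_metadata", {})


# Hypothetical records with made-up values, for illustration only.
run_record = {
    "model": "openai/gpt-4o-mini",
    "request_id": "req-example",
    "usage": {"total_tokens": 10},
    "retry_count": 0,
    "duration_seconds": 1.2,
}
artifact = {"provenance": {"provider_metadata": dict(run_record)}}

missing = missing_provider_metadata(run_record)
```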

tests/test_openrouter_live.py covers:
- chapter-filter, from/to-chapter range, and empty-selection failure on
  init (non-live, deterministic)
- CLI smoke through generate from-source with --chapter
- a pytest-skipped live OpenRouter one-chapter end-to-end gated by
  OPENROUTER_API_KEY + INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER, with
  INFOSPACE_BENCH_LIVE_MODEL override (default openai/gpt-4o-mini)
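
The opt-in gate for the live smoke can be sketched as below. The helper name is hypothetical, and treating any non-empty value of INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER as "enabled" is an assumption; the environment variable names and the default model come from the commit message.

```python
import os
from typing import Mapping, Optional


def live_openrouter_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[dict[str, str]]:
    """Return live-run settings, or None when the live smoke should be skipped."""
    env = os.environ if env is None else env
    has_key = bool(env.get("OPENROUTER_API_KEY"))
    # Assumption: any non-empty value counts as opting in.
    opted_in = bool(env.get("INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER"))
    if not (has_key and opted_in):
        return None
    return {"model": env.get("INFOSPACE_BENCH_LIVE_MODEL", "openai/gpt-4o-mini")}


# Both variables must be set; the API key alone is not enough.
key_only = live_openrouter_settings({"OPENROUTER_API_KEY": "placeholder"})
both = live_openrouter_settings(
    {
        "OPENROUTER_API_KEY": "placeholder",
        "INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER": "1",
    }
)
```

In a pytest module, a helper like this would typically feed `pytest.mark.skipif(live_openrouter_settings() is None, reason=...)`, so the live test is reported as skipped instead of silently passing.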

docs/generic-source-generator.md gains a "Live OpenRouter runs (handle
with care)" section that walks plan-before-run, single-chapter live
run, the budget/usage artifacts, and the checks a reviewer should run
before scaling to the full book.

129 tests pass, 1 skipped (the live smoke, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-17 23:04:19 +02:00

2 Commits