coulomb/infospace-bench

generated from coulomb/repo-seed

Commit Graph

Author	SHA1	Message	Date

tegwick

a4dde53fc3

IB-WP-0019-T03: rate-table cost computation

Ship a starter model rate table at src/infospace_bench/model_rates.yaml
(prompt_per_1k / completion_per_1k for the OpenRouter models we have
actually touched: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude 3.5 sonnet
and haiku, claude 3 opus, gemini 1.5 flash/pro, llama 3.1 70b) and a
load_rate_table() / estimate_cost_usd() pair that overlays an optional
<workspace>/model-rates.yaml on top of the bundled defaults.
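A minimal sketch of the overlay-and-estimate idea described above, not the actual load_rate_table() / estimate_cost_usd() code. The function names overlay_rates and estimate_cost, the in-memory dicts standing in for the YAML files, and the rate values are all hypothetical; only the prompt_per_1k / completion_per_1k keys come from the commit message.

```python
from typing import Dict, Optional

RateTable = Dict[str, Dict[str, float]]

# Placeholder rates in USD per 1,000 tokens. These are NOT real prices.
BUNDLED: RateTable = {
    "model-a": {"prompt_per_1k": 1.0, "completion_per_1k": 2.0},
    "model-b": {"prompt_per_1k": 0.5, "completion_per_1k": 1.5},
}


def overlay_rates(defaults: RateTable, workspace: Optional[RateTable] = None) -> RateTable:
    """Return a new table where workspace entries override bundled defaults."""
    merged = {model: dict(rates) for model, rates in defaults.items()}
    for model, rates in (workspace or {}).items():
        merged.setdefault(model, {}).update(rates)
    return merged


def estimate_cost(table: RateTable, model: str,
                  prompt_tokens: int, completion_tokens: int) -> Optional[float]:
    """Price a call from the table, or return None when the model is not covered."""
    rates = table.get(model)
    if not rates or "prompt_per_1k" not in rates or "completion_per_1k" not in rates:
        return None  # unmatched model: the caller decides how to report it
    return (prompt_tokens / 1000 * rates["prompt_per_1k"]
            + completion_tokens / 1000 * rates["completion_per_1k"])
```

Returning None for an unmatched model, rather than 0.0, keeps "no rate available" distinguishable from "free".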

generate run now passes a workspace-aware cost_resolver into
record_run_usage, so cost_usd_estimated lands on every usage bucket
whose model matches the table. Adapter-returned cost still wins
(cost_status="known"); rate-table cost is reported under
cost_status="estimated"; unmatched models are recorded as
cost_status="unknown" rather than silently zeroed. Rate-table file is
listed in pyproject.toml package-data so pip-installed users keep the
defaults.
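The precedence between adapter-returned and rate-table cost could look roughly like the sketch below. resolve_cost is a hypothetical helper, and treating None as "the adapter reported nothing" is an assumption; the three cost_status values are the ones named in the commit message.

```python
from typing import Optional, Tuple


def resolve_cost(adapter_cost: Optional[float],
                 estimated_cost: Optional[float]) -> Tuple[str, Optional[float]]:
    """Pick the cost to report for a usage bucket, along with its cost_status."""
    if adapter_cost is not None:
        return "known", adapter_cost        # adapter-returned cost wins
    if estimated_cost is not None:
        return "estimated", estimated_cost  # rate-table fallback
    return "unknown", None                  # no source: do not record a zero
```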

106 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-17 19:54:30 +02:00

tegwick

678508226a

IB-WP-0019-T02: usage rollup from run records

Every completed generate run now aggregates per-call adapter usage from
the workflow-engine run records into output/budget/usage.yaml. Per-call
data is bucketed by (workflow_id, stage_id, provider, model) with
running totals for calls, prompt_tokens, completion_tokens,
total_tokens, and cost_usd_known (sum of adapter-reported cost when the
provider returns it; usually zero today). A run-level entry captures
run_index, started_at, completed_at, duration_seconds, the executing
plan snapshot_id (resolved from the latest plans.yaml entry), and the
workflow-level run_id / stage_count summaries.
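As an illustration of the bucketing described above, here is a small sketch that groups per-call records by (workflow_id, stage_id, provider, model) and keeps running totals. The rollup function and the shape of the input records (plain dicts with a cost_usd key) are assumptions made for the example; the real run-record format is not shown in this commit.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BucketKey = Tuple[str, str, str, str]


def rollup(calls: Iterable[dict]) -> Dict[BucketKey, dict]:
    """Aggregate per-call usage into one bucket per (workflow, stage, provider, model)."""
    buckets: Dict[BucketKey, dict] = defaultdict(lambda: {
        "calls": 0,
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "cost_usd_known": 0.0,
    })
    for call in calls:
        key = (call["workflow_id"], call["stage_id"], call["provider"], call["model"])
        prompt = call.get("prompt_tokens", 0)
        completion = call.get("completion_tokens", 0)
        bucket = buckets[key]
        bucket["calls"] += 1
        bucket["prompt_tokens"] += prompt
        bucket["completion_tokens"] += completion
        bucket["total_tokens"] += prompt + completion
        # Adapter-reported cost is optional; a missing value adds nothing.
        bucket["cost_usd_known"] += call.get("cost_usd") or 0.0
    return dict(buckets)
```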

cost_usd_estimated is left as None for this task; T03 wires the
rate-table resolver so the same bucket gets a model-priced fallback
when the adapter does not return cost directly.

Fixture-mode runs are recorded with provider='fixture', zero tokens,
and cost_status='unknown' rather than silently skipped, so the rollup
honestly reflects which stages actually ran.

102 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-17 19:46:40 +02:00

tegwick

182f7011bb

IB-WP-0019-T01: plan snapshot persistence

Every generate plan invocation now appends its compact summary to
output/budget/plans.yaml with a deterministic 12-char snapshot_id
hashed over the selection filters and the estimated call/token/cost
totals. Identical-fingerprint plans refresh the most recent entry's
recorded_at instead of stacking duplicates. Retention defaults to the
last 50 snapshots; older entries are pruned and counted on a top-level
pruned_count field.
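The snapshot behaviour above can be sketched as follows. This is an illustration under stated assumptions, not the code in budget.py: the commit message does not name the hash function (SHA-256 over canonical JSON is assumed here), the duplicate check is assumed to compare only against the most recent entry, and snapshot_id / record_plan with their dict-based state are hypothetical names.

```python
import hashlib
import json


def snapshot_id(filters: dict, totals: dict) -> str:
    """Deterministic 12-character ID over the filters and estimated totals."""
    payload = json.dumps({"filters": filters, "totals": totals},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def record_plan(state: dict, filters: dict, totals: dict,
                recorded_at: str, keep: int = 50) -> str:
    """Append a plan snapshot, refreshing instead of duplicating, then prune."""
    sid = snapshot_id(filters, totals)
    plans = state.setdefault("plans", [])
    if plans and plans[-1]["snapshot_id"] == sid:
        # Same fingerprint as the latest entry: only refresh its timestamp.
        plans[-1]["recorded_at"] = recorded_at
    else:
        plans.append({"snapshot_id": sid, "recorded_at": recorded_at,
                      "filters": filters, "totals": totals})
    overflow = len(plans) - keep
    if overflow > 0:
        del plans[:overflow]  # drop the oldest entries first
        state["pruned_count"] = state.get("pruned_count", 0) + overflow
    return sid
```

Serializing with sort_keys=True keeps the ID stable regardless of the order in which filter keys were supplied.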

The summary now echoes its input filters (chapter_filter, chunk_filter,
from_chapter, to_chapter) so reviewers can read the snapshot without
cross-referencing the CLI invocation.

New module src/infospace_bench/budget.py owns layer 1 (per-infospace
recording) of the IB-WP-0019 three-layer design; layer 2 still belongs
in llm-connect LLM-WP-0004 and layer 3 in state-hub.

99 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-17 19:19:35 +02:00

3 Commits