Commit Graph

4 Commits

Author SHA1 Message Date
a4dde53fc3 IB-WP-0019-T03: rate-table cost computation
Ship a starter model rate table at src/infospace_bench/model_rates.yaml
(prompt_per_1k / completion_per_1k for the OpenRouter models we have
actually touched: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude 3.5 sonnet
and haiku, claude 3 opus, gemini 1.5 flash/pro, llama 3.1 70b) and a
load_rate_table() / estimate_cost_usd() pair that overlays an optional
<workspace>/model-rates.yaml on top of the bundled defaults.

generate run now passes a workspace-aware cost_resolver into
record_run_usage, so cost_usd_estimated lands on every usage bucket
whose model matches the table. Adapter-returned cost still wins
(cost_status="known"); rate-table cost is reported under
cost_status="estimated"; unmatched models are recorded as
cost_status="unknown" rather than silently zeroed. Rate-table file is
listed in pyproject.toml package-data so pip-installed users keep the
defaults.

106 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:54:30 +02:00
df87e212a2 IB-WP-0016-T04: trading-literature profile
Ship a specialized profile for trading memoirs and market-structure
texts. The profile names eight entity categories (trader, market,
strategy, error, psychological_pattern, institution, instrument,
evidence_bearing_claim), five relation types (cause_effect,
lesson_evidence, risk_mitigation, actor_venue, strategy_outcome), and
four evaluation criteria (groundedness, lesson_clarity,
historical_context, overgeneralization_risk). Each is reflected in the
prompts and contracts so the LLM is steered toward operator-level
findings rather than biographical detail or moralising.

The generic profile remains the default. A 2-chapter Lefevre smoke run
with --profile trading-literature completes end-to-end with viable
metrics; 93 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:59:45 +02:00
13f9c1895c IB-WP-0016-T03: scale-aware planning
Replace generate plan's full-prompt dump with a compact summary that
reports selected-chunk counts, selected chapter numbers, per-workflow
call counts, prompt-word and token estimates, and a rough USD cost when
--cost-per-1k is supplied. Selection filters --chapter (label or number,
repeatable), --from-chapter / --to-chapter (numeric range), and --chunk
(repeatable id) shape the estimate. Budget caps --max-calls and
--cost-cap are reported as exceeds_* booleans so callers can fail fast
before run.

The old full per-workflow plan with prompts remains available behind
--full so deep inspection is opt-in instead of the default.

Whole-Lefevre estimate at default max_words=800: 146 chunks, 730 calls,
~518k prompt tokens, ~$155 at $0.30/1k. Chapters 3-5 only: 19 chunks,
95 calls, ~64k tokens. 87 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:18:09 +02:00
46aad3cce8 generic source-to-infospace generator 2026-05-14 19:33:22 +02:00