2 Commits

Author SHA1 Message Date
3ca891de4a fix: review findings from Lefevre live smoke
Two small fixes informed by the 2026-05-18 live OpenRouter chapter-I run.

1. extract-entities templates (trading-literature and general-knowledge):
   the # Entity Title placeholder was interpreted by gpt-4o-mini as a
   literal heading prefix, so every entity came back as "# Entity Title:
   Bucket Shop" etc. The instruction now spells the placeholder out
   with concrete examples and an explicit "not the literal string"
   note, so smaller models hit the intended shape.

2. generate plan gains a --model <id> option. When supplied, the cost
   estimate pulls separate prompt and completion rates from the bundled
   model_rates.yaml instead of multiplying a single blended
   --cost-per-1k value across all tokens. The summary now also returns
   a separate estimated_completion_tokens field plus a cost_source tag
   ("rate_table:<model>" | "cost_per_1k_blended" | None).
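
A minimal sketch of the cost-source selection described in item 2, not
the project's actual code. The function name estimate_cost, the keys
estimated_prompt_tokens and estimated_cost_usd, the in-memory RATES
table (standing in for the bundled model_rates.yaml), its rate values,
and the fallback order are all assumptions for illustration. Only
estimated_completion_tokens, cost_source, and the three tag values come
from this commit.

```python
# Hypothetical stand-in for model_rates.yaml, in USD per 1k tokens.
# The values are illustrative assumptions, not the bundled rates.
RATES = {
    "openai/gpt-4o-mini": {"prompt_per_1k": 0.00015, "completion_per_1k": 0.0006},
}


def estimate_cost(prompt_tokens, completion_tokens, model=None, cost_per_1k=None):
    """Return a plan-summary fragment with a cost and the tag naming its source."""
    summary = {
        "estimated_prompt_tokens": prompt_tokens,          # hypothetical key
        "estimated_completion_tokens": completion_tokens,  # named in the commit
        "estimated_cost_usd": None,                        # hypothetical key
        "cost_source": None,
    }
    if model is not None and model in RATES:
        # Per-model path: price prompt and completion tokens separately.
        rate = RATES[model]
        summary["estimated_cost_usd"] = (
            prompt_tokens / 1000 * rate["prompt_per_1k"]
            + completion_tokens / 1000 * rate["completion_per_1k"]
        )
        summary["cost_source"] = f"rate_table:{model}"
    elif cost_per_1k is not None:
        # Blended path: one rate across all tokens (assumed to include completions).
        summary["estimated_cost_usd"] = (
            (prompt_tokens + completion_tokens) / 1000 * cost_per_1k
        )
        summary["cost_source"] = "cost_per_1k_blended"
    return summary
```

With neither a known model nor a blended rate, the sketch leaves both
the cost and cost_source as None, which mirrors the third tag value.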

This is a stopgap. LLM-WP-0005 (proposed in llm-connect this round)
will move the rate registry and token-shape problem classes upstream
so consumers stop re-implementing them.

The live smoke used 28k prompt tokens and 7.5k completion tokens, for
$0.0088 actual. With --model openai/gpt-4o-mini the plan estimate now
lands at
$0.0076 (within 14% of actual) versus the prior $8.40 estimate at
--cost-per-1k 0.30.
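
The figures above can be checked with a few lines of arithmetic. This
is only a worked check using the numbers quoted in this message; the
variable names are illustrative. The $8.40 figure is consistent with
applying the blended rate to the 28k prompt tokens alone, which is an
inference rather than something the commit states.

```python
# Figures quoted in the commit message.
prompt_tokens = 28_000
actual_usd = 0.0088
new_estimate_usd = 0.0076
blended_cost_per_1k = 0.30

# Prior estimate: one blended rate multiplied across the prompt tokens.
prior_estimate_usd = prompt_tokens / 1000 * blended_cost_per_1k

# Relative error of the per-model estimate against the actual spend.
relative_error = abs(new_estimate_usd - actual_usd) / actual_usd

print(f"prior estimate: ${prior_estimate_usd:.2f}")    # prior estimate: $8.40
print(f"new estimate error: {relative_error:.1%}")     # new estimate error: 13.6%
```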

181 tests pass, 2 skipped (both live OpenRouter smokes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 04:30:33 +02:00
13f9c1895c IB-WP-0016-T03: scale-aware planning
Replace generate plan's full-prompt dump with a compact summary that
reports selected-chunk counts, selected chapter numbers, per-workflow
call counts, prompt-word and token estimates, and a rough USD cost when
--cost-per-1k is supplied. The selection filters --chapter (label or
number, repeatable), --from-chapter / --to-chapter (numeric range), and
--chunk (repeatable id) narrow what the estimate covers. The budget caps
--max-calls and --cost-cap are reported as exceeds_* booleans so callers
can fail fast before a run.
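
A rough sketch of how the selection filters and budget flags described
above could fit together. Everything here is hypothetical: the Chunk
fields, the function names, the choice to AND the filters together, and
the concrete flag names exceeds_max_calls and exceeds_cost_cap (the
commit only states the exceeds_* pattern).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    chapter_number: int
    chapter_label: str


def select_chunks(chunks, chapters=(), from_chapter=None, to_chapter=None, chunk_ids=()):
    """Keep chunks that pass every supplied filter (filters are ANDed, by assumption)."""
    wanted = {str(c) for c in chapters}  # --chapter accepts a label or a number
    ids = set(chunk_ids)
    selected = []
    for ch in chunks:
        if wanted and not ({str(ch.chapter_number), ch.chapter_label} & wanted):
            continue
        if from_chapter is not None and ch.chapter_number < from_chapter:
            continue
        if to_chapter is not None and ch.chapter_number > to_chapter:
            continue
        if ids and ch.chunk_id not in ids:
            continue
        selected.append(ch)
    return selected


def budget_flags(calls, cost_usd, max_calls=None, cost_cap=None):
    """Report cap violations as booleans; an unset cap never trips its flag."""
    return {
        "exceeds_max_calls": max_calls is not None and calls > max_calls,
        "exceeds_cost_cap": (
            cost_cap is not None and cost_usd is not None and cost_usd > cost_cap
        ),
    }
```

Reporting flags instead of raising leaves the policy to the caller,
which can inspect the summary and abort before any model call is made.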

The old full per-workflow plan with prompts remains available behind
--full so deep inspection is opt-in instead of the default.

Whole-Lefevre estimate at default max_words=800: 146 chunks, 730 calls,
~518k prompt tokens, ~$155 at $0.30/1k. Chapters 3-5 only: 19 chunks,
95 calls, ~64k tokens. 87 tests pass.
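
The scale figures above are internally consistent, as a short check
shows. The five-calls-per-chunk ratio and the cost for chapters 3-5 are
derived here from the quoted numbers and are not stated in the commit;
the variable names are illustrative.

```python
blended_cost_per_1k = 0.30

# Whole-book estimate quoted above.
whole = {"chunks": 146, "calls": 730, "prompt_tokens": 518_000}
# Chapters 3-5 estimate quoted above.
subset = {"chunks": 19, "calls": 95, "prompt_tokens": 64_000}

for name, est in (("whole", whole), ("chapters 3-5", subset)):
    calls_per_chunk = est["calls"] / est["chunks"]
    cost_usd = est["prompt_tokens"] / 1000 * blended_cost_per_1k
    print(f"{name}: {calls_per_chunk:.0f} calls per chunk, ${cost_usd:.2f}")
```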

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:18:09 +02:00