The default archive include set already pulls output/ in wholesale, so output/budget/ already lands inside the archive package with no code change. Add a budget_summary block to ArchiveRecord.metadata so catalog-level tools can see plans_count, runs_count, total_tokens, total_cost_usd_known, total_cost_usd_estimated, and the latest_snapshot_id without unpacking the archive. An infospace with no budget data still archives cleanly with an empty metadata dict.

Closes IB-WP-0019 (Budget and Usage Registry): T01-T07 all done. Three-layer design landed end-to-end: layer 1 (per-infospace plans.yaml / usage.yaml / summary.yaml) and layer 3 (state-hub record_token_event emission with failure isolation) live here; layer 2 (cross-application QualityLedger for adaptive routing) is parked in llm-connect LLM-WP-0004 and infospace-bench IB-WP-0018 awaits it. 122 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| id | type | title | domain | repo | status | owner | topic_slug | created | updated | depends_on_workplans | related_workplans | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IB-WP-0019 | workplan | Budget and Usage Registry for Infospaces | markitect | infospace-bench | done | markitect | markitect | 2026-05-17 | 2026-05-17 | | | 063c6285-a56e-476b-8666-109d6fa35858 |
IB-WP-0019 — Budget and Usage Registry for Infospaces
Goal
Persist budget and usage signals at the per-infospace layer and emit organizational rollups, so every infospace can answer "what did we estimate, what did we actually spend, on which model, at what cost" without scraping commit messages or state-hub events.
This workplan owns the recording and rollup layer. It does not own:
- Adaptive routing decisions or per-task quality grading: those belong to `llm-connect` `LLM-WP-0004` and the consumer workplan `IB-WP-0018`.
- Authoritative provider pricing: we read a small rate table and combine it with adapter-returned usage; the table itself is a static artifact that consumers refresh.
Why
IB-WP-0016-T03 made the planning estimates cheap to obtain (chunks,
calls, tokens, rough USD), but the numbers vanish after the JSON is
printed. Run records under output/workflows/runs/*.yaml capture
per-call prompt_tokens and completion_tokens but nothing rolls them
up, no cost is computed, and there is no plan-vs-actual variance.
Without this layer:
- Each new infospace re-discovers the same cost surprises
- `LLM-WP-0004`'s adaptive policy has no per-application history to learn from when it lands
- `IB-WP-0014`'s archive packages forget the budget shape of the work that produced them
- State-hub's organizational token ledger stays blind to infospace-bench runs
Non-Goals
- Owning a cross-application quality ledger (that is `LLM-WP-0004`)
- Auto-refreshing provider price lists at runtime
- Failing a `generate run` when state-hub is unreachable
- Persisting full prompt text for retrospective replay (the existing run records already keep what is needed)
Layered design (read first)
Three layers, each owned by a different repo:
| Layer | Lives in | Purpose | This workplan? |
|---|---|---|---|
| 1. Per-infospace budget log | infospace-bench (this workplan) | Plans + usage + variance, archived with the infospace | yes |
| 2. Cross-application observations | llm-connect (`LLM-WP-0004`) | Per-task per-adapter (cost, tokens, latency, quality) for adaptive routing | no |
| 3. Organizational rollup | state-hub (already exists) | `record_token_event` / `get_token_summary` across all projects | this workplan emits, hub stores |
Tasks
T01 — Plan snapshot persistence
id: IB-WP-0019-T01
status: done
priority: high
state_hub_task_id: "7f1a4e0a-c1ad-49f3-aad1-6946de9b1219"
- Append the compact `plan_generation_summary` payload to `output/budget/plans.yaml` on every `generate plan` invocation
- Include a stable `snapshot_id` (hash of relevant fields), the stage, selection filters, and a `recorded_at` timestamp
- Cap the history length with a configurable retention (default keep last 50 snapshots; older snapshots are pruned with a single rollup entry preserved)
- Tests: round-trip, retention, repeat plans produce distinct snapshots
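The snapshot and retention bullets above could be sketched roughly as follows. This is a minimal illustration, not the shipped implementation: the function names `make_snapshot` and `apply_retention`, the choice of SHA-256, the exact set of hashed fields, and the shape of the rollup entry are all assumptions. Here `recorded_at` is deliberately left out of the hash so the id stays stable for identical inputs; whether repeated plans should share an id is not settled by the text above.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_snapshot(summary: dict, stage: str, filters: dict) -> dict:
    """Build one plan snapshot entry (layout is hypothetical)."""
    # Canonical JSON with sorted keys, so key order does not change the hash.
    hashed = json.dumps(
        {"summary": summary, "stage": stage, "filters": filters},
        sort_keys=True,
    )
    return {
        "snapshot_id": hashlib.sha256(hashed.encode("utf-8")).hexdigest()[:16],
        "stage": stage,
        "filters": filters,
        "summary": summary,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def apply_retention(snapshots: list, keep: int = 50) -> tuple:
    """Keep the newest `keep` snapshots; describe the pruned ones in one entry.

    Returns (rollup_or_None, kept_snapshots). Snapshots are assumed to be
    ordered oldest first.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if len(snapshots) <= keep:
        return None, snapshots
    pruned, kept = snapshots[:-keep], snapshots[-keep:]
    rollup = {
        "pruned_count": len(pruned),
        "first_recorded_at": pruned[0]["recorded_at"],
        "last_recorded_at": pruned[-1]["recorded_at"],
    }
    return rollup, kept
```

A caller would append the result of `make_snapshot` to the history, run `apply_retention`, and write both the rollup entry and the kept snapshots back to `output/budget/plans.yaml`. Merging a new rollup with one left by an earlier pruning pass is omitted here.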
T02 — Usage rollup from run records
id: IB-WP-0019-T02
status: done
priority: high
state_hub_task_id: "a612f8d4-f96d-4fae-9aa6-66a7946414f5"
- On `run` and `resume` completion, scan the run-record YAML written by the workflow engine and aggregate per-call usage into `output/budget/usage.yaml`
- Aggregate buckets: workflow, stage, provider, model
- Fields per bucket: `calls`, `prompt_tokens`, `completion_tokens`, `total_tokens`, `cost_usd_known` (sum over calls with known cost), `cost_usd_estimated` (computed via rate table fallback)
- Append a top-level `runs[]` entry per completed run with the run's rollup, the `snapshot_id` of the plan it executed against (when one exists), and the wall-clock duration
- Tests: aggregate across multiple stages, fixture-mode produces zero cost without erroring, missing-usage entries do not abort the rollup
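One possible reading of the bucketing above, sketched in Python. The per-call record layout (a `usage` mapping plus an optional `cost_usd`) and the function name `rollup_calls` are assumptions, since the run-record schema is not shown here. The sketch keys buckets on the full (workflow, stage, provider, model) tuple and leaves `cost_usd_estimated` to the rate-table step.

```python
from collections import defaultdict


def rollup_calls(calls: list) -> dict:
    """Aggregate per-call usage into buckets keyed by
    (workflow, stage, provider, model). Record layout is hypothetical."""
    buckets = defaultdict(lambda: {
        "calls": 0,
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "cost_usd_known": 0.0,
    })
    for call in calls:
        key = (
            call.get("workflow"),
            call.get("stage"),
            call.get("provider"),
            call.get("model"),
        )
        # A call without usage data still counts as a call; it must not
        # abort the rollup, so missing values are treated as zero.
        usage = call.get("usage") or {}
        prompt = usage.get("prompt_tokens") or 0
        completion = usage.get("completion_tokens") or 0
        bucket = buckets[key]
        bucket["calls"] += 1
        bucket["prompt_tokens"] += prompt
        bucket["completion_tokens"] += completion
        bucket["total_tokens"] += prompt + completion
        if call.get("cost_usd") is not None:
            bucket["cost_usd_known"] += call["cost_usd"]
    return dict(buckets)
```

Keying on the full tuple keeps a mixed fixture and OpenRouter run split cleanly; coarser views (per stage, per model) can be derived by summing buckets.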
T03 — Cost computation from a rate table
id: IB-WP-0019-T03
status: done
priority: high
state_hub_task_id: "688c590d-8885-455e-bcf6-61409a45e001"
- Add `docs/model-rates.yaml` with `model -> {prompt_per_1k, completion_per_1k, currency, source_url, captured_at}` for the OpenRouter models we have actually used (start small: the ones currently exercised in tests/smoke)
- Resolver order: adapter-returned cost (when present) > rate table > unknown (recorded explicitly, not silently zeroed)
- Allow a per-workspace override via `${workspace}/model-rates.yaml` for self-hosted or private-rate setups
- Tests: known model, unknown model surfaces as `cost_usd: null` with `cost_status: "unknown"`, override file takes precedence
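The resolver order above might look something like this. It is a sketch under assumptions: the function name `resolve_cost` and the status labels `"known"` and `"estimated"` are made up for illustration (only `"unknown"` appears in the task text), and the rate tables are passed in as already-loaded dicts rather than read from YAML.

```python
def resolve_cost(model, prompt_tokens, completion_tokens,
                 adapter_cost=None, rates=None, overrides=None):
    """Resolve a call's cost: adapter-returned cost > rate table > unknown.

    `rates` and `overrides` map model name -> {"prompt_per_1k": ...,
    "completion_per_1k": ...}; the workspace override wins per model.
    """
    if adapter_cost is not None:
        return {"cost_usd": adapter_cost, "cost_status": "known"}

    # Later dict wins on key collision, so overrides take precedence.
    table = {**(rates or {}), **(overrides or {})}
    rate = table.get(model)
    if rate is None:
        # Recorded explicitly as unknown, never silently zeroed.
        return {"cost_usd": None, "cost_status": "unknown"}

    cost = (prompt_tokens / 1000 * rate["prompt_per_1k"]
            + completion_tokens / 1000 * rate["completion_per_1k"])
    return {"cost_usd": cost, "cost_status": "estimated"}
```

For example, with an illustrative (not real) rate of 0.5 per 1k prompt tokens and 1.5 per 1k completion tokens, a call with 2000 prompt tokens and 1000 completion tokens resolves to an estimated cost of 2.5.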
T04 — Plan-vs-actual variance and surfacing
id: IB-WP-0019-T04
status: done
priority: medium
state_hub_task_id: "c6adc4fb-9062-4c81-a0b2-98d3166e047d"
- Compute a small variance record on each run: actual_calls / estimated_calls, actual_tokens / estimated_tokens, actual_cost / estimated_cost, plus per-stage variance
- Persist to `output/budget/summary.yaml` (overwrite each run; previous versions live in usage.yaml history)
- Surface a one-line variance summary in `reports/generation-summary.md` (touches T07 of IB-WP-0016)
- Add the variance summary to `generate status` JSON output
- Tests: zero-cost fixture run, known-model OpenRouter mock run, missing-plan run (variance fields are null but the run still records)
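The variance record described above is a set of actual-over-estimated ratios. A minimal sketch, assuming hypothetical `calls`, `tokens`, and `cost_usd` keys on both the run rollup and the plan estimate, and assuming a zero or missing estimate should yield a null ratio rather than a division error:

```python
def ratio(actual, estimated):
    """actual / estimated, or None when either side is missing or the
    estimate is zero (for example a zero-cost fixture plan)."""
    if actual is None or estimated is None or estimated == 0:
        return None
    return actual / estimated


def variance_record(actual: dict, plan=None) -> dict:
    """Plan-vs-actual variance; every field is None when no plan exists."""
    plan = plan or {}
    return {
        "calls": ratio(actual.get("calls"), plan.get("calls")),
        "tokens": ratio(actual.get("tokens"), plan.get("tokens")),
        "cost_usd": ratio(actual.get("cost_usd"), plan.get("cost_usd")),
    }
```

A ratio above 1.0 means the run exceeded the estimate. Per-stage variance would apply the same function to each stage's pair of records.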
T05 — State-hub token-event emission
id: IB-WP-0019-T05
status: done
priority: medium
state_hub_task_id: "968bca1d-63ff-4818-83bb-ca314b1e633c"
- After each completed run, call state-hub `record_token_event` with the run's rollup (tokens in/out, model, USD cost when known, `infospace_slug`, `workspace`)
- Emit at most one event per run; tag the event with the workplan context when available
- Failure isolation: a state-hub error must not fail the run; log the failure and continue
- Honor an opt-out env var `INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS`
- Tests: monkey-patched hub client, opt-out flag respected, run succeeds when the hub raises
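Failure isolation and the opt-out could be combined in one small wrapper. The sketch below is illustrative only: the `hub_client` object, the keyword-argument call shape of `record_token_event`, the logger name, and the rule that any non-empty value of the env var disables emission are all assumptions.

```python
import logging
import os

logger = logging.getLogger("infospace_bench.budget")  # hypothetical logger name

OPT_OUT_VAR = "INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS"


def emit_token_event(hub_client, rollup: dict, env=None) -> bool:
    """Send one token event for a completed run; never raise.

    Returns True when the event was sent, False when emission was skipped
    (opt-out) or failed (hub error).
    """
    env = os.environ if env is None else env
    if env.get(OPT_OUT_VAR):
        return False
    try:
        # Assumed call shape; the real client signature may differ.
        hub_client.record_token_event(**rollup)
    except Exception:
        # A state-hub problem must not fail the run: log and continue.
        logger.warning("state-hub token event failed; continuing", exc_info=True)
        return False
    return True
```

Catching the broad `Exception` is deliberate here, since the point is that no hub-side failure mode should propagate into the run. Passing `env` explicitly makes the opt-out easy to test without touching the process environment.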
T06 — Workspace-level rollup CLI
id: IB-WP-0019-T06
status: done
priority: medium
state_hub_task_id: "7cb34bfc-c562-4dda-a6d4-b44158644e19"
- Add `infospace-bench budget list <workspace>` that walks `infospaces/*/output/budget/` and prints a JSON table: `slug`, `plans_count`, `runs_count`, `total_tokens`, `total_cost_usd_known`, `total_cost_usd_estimated`, `last_run_at`
- Add `infospace-bench budget show <infospace-root>` that prints the full per-infospace budget structure
- Tests: empty workspace, multiple infospaces, missing budget dir is treated as zero, not an error
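The walk behind `budget list` could be sketched as below. Several things are assumed: the function name `budget_list`, a `snapshots` key in `plans.yaml`, and a `total_tokens` field on each `runs[]` entry. To keep the sketch standard-library only, the parser is injected as `load` and defaults to `json.loads`; real code would presumably pass a YAML loader instead. Only four of the listed columns are shown.

```python
import json
from pathlib import Path


def _read(path: Path, load) -> dict:
    """Parse a budget file, treating a missing or empty file as {}."""
    if not path.is_file():
        return {}
    return load(path.read_text(encoding="utf-8")) or {}


def budget_list(workspace, load=json.loads) -> list:
    """One row per infospace under <workspace>/infospaces/*.

    A missing budget directory yields zero counts, not an error.
    """
    rows = []
    for root in sorted(Path(workspace).glob("infospaces/*")):
        if not root.is_dir():
            continue
        budget = root / "output" / "budget"
        plans = _read(budget / "plans.yaml", load).get("snapshots", [])
        runs = _read(budget / "usage.yaml", load).get("runs", [])
        rows.append({
            "slug": root.name,
            "plans_count": len(plans),
            "runs_count": len(runs),
            "total_tokens": sum(run.get("total_tokens") or 0 for run in runs),
        })
    return rows
```

Printing `json.dumps(budget_list(workspace), indent=2)` would give the parseable table the CLI is meant to return.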
T07 — Archive integration
id: IB-WP-0019-T07
status: done
priority: low
state_hub_task_id: "b97906e0-2835-4246-9868-840c02d64fae"
- Confirm `output/budget/` ends up inside the archive package built by `IB-WP-0014`'s `archive_infospace()` (it should, via the existing default-include rules; verify with a test)
- Add a `budget_summary` field to the archive manifest so catalog-level tools can find the cost shape of an archived infospace without unpacking it
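A rough sketch of how the `budget_summary` block might be assembled from already-loaded plan snapshots and run rollups. The function name `build_archive_metadata` and the per-run field names are assumptions; how the result is attached to the archive manifest is not shown, because that API is not described here. An infospace with no budget data yields an empty dict.

```python
def build_archive_metadata(plans: list, runs: list) -> dict:
    """Return {"budget_summary": {...}}, or {} when there is no budget data.

    `plans` is assumed to be ordered oldest first, each entry carrying a
    snapshot_id; `runs` entries carry hypothetical total_tokens,
    cost_usd_known, and cost_usd_estimated fields.
    """
    if not plans and not runs:
        return {}

    def total(field):
        # Missing or null values count as zero in the totals.
        return sum(run.get(field) or 0 for run in runs)

    return {
        "budget_summary": {
            "plans_count": len(plans),
            "runs_count": len(runs),
            "total_tokens": total("total_tokens"),
            "total_cost_usd_known": total("cost_usd_known"),
            "total_cost_usd_estimated": total("cost_usd_estimated"),
            "latest_snapshot_id": plans[-1]["snapshot_id"] if plans else None,
        }
    }
```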
Acceptance
- A `generate plan` invocation persists a snapshot to `output/budget/plans.yaml` and is idempotent across runs
- A `generate run` invocation appends a usage rollup to `output/budget/usage.yaml`, writes a variance summary, and emits one state-hub token event (when the hub is reachable)
- `generate status` and the generation-summary report surface the plan-vs-actual variance for the most recent run
- `infospace-bench budget list <workspace>` returns a parseable rollup across all infospaces in a workspace
- Archived infospace packages carry their budget log and expose a `budget_summary` field in the archive manifest
- Tests cover plan persistence, run rollup, rate-table resolution, variance, state-hub emission with hub-down isolation, and the workspace CLI
Risks and open questions
- Rate-table drift. Provider prices change. The rate table will go stale unless someone refreshes it. Add `captured_at` to every entry and surface "rate older than 90 days" as a warning in budget output; do not block.
- Multiple-provider cost. When a single run mixes providers (e.g. fixture for cheap stages + OpenRouter for expensive ones), the rollup must split clearly. The model+provider bucketing in T02 covers this; tests should pin the behaviour.
- State-hub coupling. Emitting token events introduces a cross-repo write. T05 keeps it opt-outable and failure-isolated, but callers running offline want zero coupling, so make sure the default is "emit if reachable, silent skip otherwise" rather than "fail if unreachable".
- Concurrency. Two `generate run` invocations on the same infospace would race on `usage.yaml`. Existing infospace workflows assume sequential runs; document the constraint rather than building locks.
- Budget vs adaptive observations. This workplan records what happened. `LLM-WP-0004` records what we learned about quality. Keep them as two distinct files / schemas so the layering stays inspectable; do not merge.
- Privacy. Usage records do not include prompt or completion text, only counts and identifiers. State-hub events likewise. If this assumption later changes, add an explicit redaction hook before doing so.
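The rate-table drift warning from the first risk above could be checked with a small helper. This sketch assumes `captured_at` is stored as an ISO date string (`YYYY-MM-DD`) and uses a hypothetical function name; it only reports stale models and never blocks.

```python
from datetime import date, timedelta


def stale_rates(rates: dict, today: date, max_age_days: int = 90) -> list:
    """Return the models whose rate entry is older than max_age_days.

    `rates` maps model name -> entry with a `captured_at` ISO date string.
    The result is a sorted list meant for a warning line in budget output.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        model
        for model, entry in rates.items()
        if date.fromisoformat(entry["captured_at"]) < cutoff
    )
```

Taking `today` as a parameter rather than calling `date.today()` inside keeps the check deterministic in tests.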
Downstream effects
- `IB-WP-0018` (adaptive routing consumer) gains a local history to cross-check against the `QualityLedger` once `LLM-WP-0004` lands
- `IB-WP-0016-T07` (review report and output policy) can pull the variance summary directly instead of regenerating numbers
- `IB-WP-0014` archives become budget-bearing artifacts without code changes beyond T07's manifest field
- State-hub's `get_token_summary` finally sees infospace-bench runs alongside other domains' token spend