---
id: IB-WP-0019
type: workplan
title: "Budget and Usage Registry for Infospaces"
domain: markitect
repo: infospace-bench
status: done
owner: markitect
topic_slug: markitect
created: "2026-05-17"
updated: "2026-05-17"
depends_on_workplans: []
related_workplans:
  - IB-WP-0016
  - IB-WP-0014
  - IB-WP-0018
  - LLM-WP-0004
state_hub_workstream_id: "063c6285-a56e-476b-8666-109d6fa35858"
---

# IB-WP-0019 — Budget and Usage Registry for Infospaces

## Goal

Persist budget and usage signals at the per-infospace layer and emit organizational rollups, so every infospace can answer "what did we estimate, what did we actually spend, on which model, at what cost" without scraping commit messages or state-hub events.

This workplan owns the *recording and rollup* layer. It does **not** own:

- Adaptive routing decisions or per-task quality grading — those belong to `llm-connect` `LLM-WP-0004` and the consumer workplan `IB-WP-0018`.
- Authoritative provider pricing — we read a small rate table and combine it with adapter-returned usage; the table itself is a static artifact that consumers refresh.

## Why

`IB-WP-0016-T03` made the planning estimates cheap to obtain (chunks, calls, tokens, rough USD), but the numbers vanish after the JSON is printed. Run records under `output/workflows/runs/*.yaml` capture per-call `prompt_tokens` and `completion_tokens` but nothing rolls them up, no cost is computed, and there is no plan-vs-actual variance.

Without this layer:

- Each new infospace re-discovers the same cost surprises
- `LLM-WP-0004`'s adaptive policy has no per-application history to learn from when it lands
- `IB-WP-0014`'s archive packages forget the budget shape of the work that produced them
- State-hub's organizational token ledger stays blind to infospace-bench runs

## Non-Goals

- Owning a cross-application quality ledger (that is `LLM-WP-0004`)
- Auto-refreshing provider price lists at runtime
- Failing a `generate run` when state-hub is unreachable
- Persisting full prompt text for retrospective replay (the existing run records already keep what is needed)

## Layered design (read first)

Three layers, each owned by a different repo:

| Layer | Lives in | Purpose | This workplan? |
|---|---|---|---|
| 1. Per-infospace budget log | `infospace-bench` (this workplan) | Plans + usage + variance, archived with the infospace | yes |
| 2. Cross-application observations | `llm-connect` (LLM-WP-0004) | Per-task per-adapter (cost, tokens, latency, quality) for adaptive routing | no |
| 3. Organizational rollup | `state-hub` (already exists) | `record_token_event` / `get_token_summary` across all projects | this workplan emits, hub stores |

## Tasks

### T01 — Plan snapshot persistence

```task
id: IB-WP-0019-T01
status: done
priority: high
state_hub_task_id: "7f1a4e0a-c1ad-49f3-aad1-6946de9b1219"
```

- Append the compact `plan_generation_summary` payload to `output/budget/plans.yaml` on every `generate plan` invocation
- Include a stable `snapshot_id` (hash of relevant fields), the stage, selection filters, and a `recorded_at` timestamp
- Cap the history length with a configurable retention (default keep last 50 snapshots; older snapshots are pruned with a single rollup entry preserved)
- Tests: round-trip, retention, repeat plans produce distinct snapshots

### T02 — Usage rollup from run records

```task
id: IB-WP-0019-T02
status: done
priority: high
state_hub_task_id: "a612f8d4-f96d-4fae-9aa6-66a7946414f5"
```

- On `run` and `resume` completion, scan the run-record YAML written by the workflow engine and aggregate per-call usage into `output/budget/usage.yaml`
- Aggregate buckets: workflow, stage, provider, model
- Fields per bucket: `calls`, `prompt_tokens`, `completion_tokens`, `total_tokens`, `cost_usd_known` (sum over calls with known cost), `cost_usd_estimated` (computed via rate table fallback)
- Append a top-level `runs[]` entry per completed run with the run's rollup, the `snapshot_id` of the plan it executed against (when one exists), and the wall-clock duration
- Tests: aggregate across multiple stages, fixture-mode produces zero cost without erroring, missing-usage entries do not abort the rollup

### T03 — Cost computation from a rate table

```task
id: IB-WP-0019-T03
status: done
priority: high
state_hub_task_id: "688c590d-8885-455e-bcf6-61409a45e001"
```

- Add `docs/model-rates.yaml` with `model -> {prompt_per_1k, completion_per_1k, currency, source_url, captured_at}` for the OpenRouter models we have actually used (start small: the ones
  currently exercised in tests/smoke)
- Resolver order: adapter-returned cost (when present) > rate table > unknown (recorded explicitly, not silently zeroed)
- Allow a per-workspace override via `${workspace}/model-rates.yaml` for self-hosted or private-rate setups
- Tests: known model, unknown model surfaces as `cost_usd: null` with `cost_status: "unknown"`, override file takes precedence

### T04 — Plan-vs-actual variance and surfacing

```task
id: IB-WP-0019-T04
status: done
priority: medium
state_hub_task_id: "c6adc4fb-9062-4c81-a0b2-98d3166e047d"
```

- Compute a small variance record on each run: actual_calls / estimated_calls, actual_tokens / estimated_tokens, actual_cost / estimated_cost, plus per-stage variance
- Persist to `output/budget/summary.yaml` (overwrite each run; previous versions live in usage.yaml history)
- Surface a one-line variance summary in `reports/generation-summary.md` (touches T07 of IB-WP-0016)
- Add the variance summary to `generate status` JSON output
- Tests: zero-cost fixture run, known-model OpenRouter mock run, missing-plan run (variance fields are null but the run still records)

### T05 — State-hub token-event emission

```task
id: IB-WP-0019-T05
status: done
priority: medium
state_hub_task_id: "968bca1d-63ff-4818-83bb-ca314b1e633c"
```

- After each completed run, call state-hub `record_token_event` with the run's rollup (tokens in/out, model, USD cost when known, `infospace_slug`, `workspace`)
- Emit at most one event per run; tag the event with the workplan context when available
- Failure isolation: a state-hub error must not fail the run; log the failure and continue
- Honor an opt-out env var `INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS`
- Tests: monkey-patched hub client, opt-out flag respected, run succeeds when the hub raises

### T06 — Workspace-level rollup CLI

```task
id: IB-WP-0019-T06
status: done
priority: medium
state_hub_task_id: "7cb34bfc-c562-4dda-a6d4-b44158644e19"
```

- Add `infospace-bench budget list ` that walks
  `infospaces/*/output/budget/` and prints a JSON table: `slug`, `plans_count`, `runs_count`, `total_tokens`, `total_cost_usd_known`, `total_cost_usd_estimated`, `last_run_at`
- Add `infospace-bench budget show ` that prints the full per-infospace budget structure
- Tests: empty workspace, multiple infospaces, missing budget dir is treated as zero, not an error

### T07 — Archive integration

```task
id: IB-WP-0019-T07
status: done
priority: low
state_hub_task_id: "b97906e0-2835-4246-9868-840c02d64fae"
```

- Confirm `output/budget/` ends up inside the archive package built by `IB-WP-0014`'s `archive_infospace()` (it should, via the existing default-include rules — verify with a test)
- Add a `budget_summary` field to the archive manifest so catalog-level tools can find the cost shape of an archived infospace without unpacking it

## Acceptance

- A `generate plan` invocation persists a snapshot to `output/budget/plans.yaml` and is idempotent across runs
- A `generate run` invocation appends a usage rollup to `output/budget/usage.yaml`, writes a variance summary, and emits one state-hub token event (when the hub is reachable)
- `generate status` and the generation-summary report surface the plan-vs-actual variance for the most recent run
- `infospace-bench budget list ` returns a parseable rollup across all infospaces in a workspace
- Archived infospace packages carry their budget log and expose a `budget_summary` field in the archive manifest
- Tests cover plan persistence, run rollup, rate-table resolution, variance, state-hub emission with hub-down isolation, and the workspace CLI

## Risks and open questions

- **Rate-table drift.** Provider prices change. The rate table will go stale unless someone refreshes it. Add `captured_at` to every entry and surface "rate older than 90 days" as a warning in budget output; do not block.
- **Multiple-provider cost.** When a single run mixes providers (e.g.
  fixture for cheap stages + OpenRouter for expensive ones), the rollup must split clearly. The model+provider bucketing in T02 covers this; tests should pin the behaviour.
- **State-hub coupling.** Emitting token events introduces a cross-repo write. T05 keeps it opt-outable and failure-isolated, but callers running offline want zero coupling — make sure the default is "emit if reachable, silent skip otherwise" rather than "fail if unreachable".
- **Concurrency.** Two `generate run` invocations on the same infospace would race on `usage.yaml`. Existing infospace workflows assume sequential runs; document the constraint rather than building locks.
- **Budget vs adaptive observations.** This workplan records *what happened*. `LLM-WP-0004` records *what we learned about quality*. Keep them as two distinct files / schemas so the layering stays inspectable; do not merge.
- **Privacy.** Usage records do not include prompt or completion text — only counts and identifiers. State-hub events likewise. If this assumption later changes, add an explicit redaction hook before doing so.

## Downstream effects

- `IB-WP-0018` (adaptive routing consumer) gains a local history to cross-check against the `QualityLedger` once `LLM-WP-0004` lands
- `IB-WP-0016-T07` (review report and output policy) can pull the variance summary directly instead of regenerating numbers
- `IB-WP-0014` archives become budget-bearing artifacts without code changes beyond T07's manifest field
- State-hub's `get_token_summary` finally sees infospace-bench runs alongside other domains' token spend
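
## Appendix: cost resolution sketch

The resolver order from T03 (adapter-returned cost, then rate table, then explicit unknown) can be illustrated with a minimal Python sketch. This is not the shipped implementation: the `Rate` class, the `resolve_cost` function, and the `"known"` / `"estimated"` status strings are hypothetical names chosen for illustration. Only the `prompt_per_1k` / `completion_per_1k` fields, the `cost_usd: null` plus `cost_status: "unknown"` outcome, and the precedence of the workspace override come from the task description. The model name in the usage note is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rate:
    """One rate-table entry (hypothetical shape); prices are per 1,000 tokens."""
    prompt_per_1k: float
    completion_per_1k: float
    currency: str = "USD"


def resolve_cost(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    adapter_cost: Optional[float],
    rates: dict[str, Rate],
    overrides: Optional[dict[str, Rate]] = None,
) -> dict:
    """Resolve the cost of one call: adapter cost > rate table > unknown."""
    # 1. Prefer the adapter-returned cost when present. Compare against None
    #    so that a genuine 0.0 (e.g. a fixture run) still counts as known.
    if adapter_cost is not None:
        return {"cost_usd": adapter_cost, "cost_status": "known"}

    # 2. Fall back to the rate table. Entries from the per-workspace override
    #    replace entries from the default table for the same model.
    merged = {**rates, **(overrides or {})}
    rate = merged.get(model)
    if rate is not None:
        cost = (
            prompt_tokens / 1000 * rate.prompt_per_1k
            + completion_tokens / 1000 * rate.completion_per_1k
        )
        return {"cost_usd": round(cost, 6), "cost_status": "estimated"}

    # 3. No adapter cost and no rate: record the gap explicitly, never as 0.
    return {"cost_usd": None, "cost_status": "unknown"}
```

With a default table of `{"example/model-a": Rate(0.5, 1.5)}`, a call with 2000 prompt tokens and 1000 completion tokens and no adapter cost resolves to an estimated 2.5; the same call for a model missing from both tables resolves to `cost_usd: None` with `cost_status: "unknown"`, which a rollup can then exclude from both cost sums rather than counting as zero.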