infospace-bench/workplans/IB-WP-0019-budget-and-usage-registry.md

---
id: IB-WP-0019
type: workplan
title: "Budget and Usage Registry for Infospaces"
domain: markitect
repo: infospace-bench
status: done
owner: markitect
topic_slug: markitect
created: "2026-05-17"
updated: "2026-05-17"
depends_on_workplans: []
related_workplans:
  - IB-WP-0016
  - IB-WP-0014
  - IB-WP-0018
  - LLM-WP-0004
state_hub_workstream_id: "063c6285-a56e-476b-8666-109d6fa35858"
---

# IB-WP-0019 — Budget and Usage Registry for Infospaces

## Goal

Persist budget and usage signals at the per-infospace layer and emit
organizational rollups, so every infospace can answer "what did we
estimate, what did we actually spend, on which model, at what cost"
without scraping commit messages or state-hub events.

This workplan owns the *recording and rollup* layer. It does **not**
own:

- Adaptive routing decisions or per-task quality grading — those belong
  to `llm-connect` `LLM-WP-0004` and the consumer workplan `IB-WP-0018`.
- Authoritative provider pricing — we read a small rate table and
  combine it with adapter-returned usage; the table itself is a static
  artifact that consumers refresh.

## Why

`IB-WP-0016-T03` made the planning estimates cheap to obtain (chunks,
calls, tokens, rough USD), but the numbers vanish after the JSON is
printed. Run records under `output/workflows/runs/*.yaml` capture
per-call `prompt_tokens` and `completion_tokens` but nothing rolls them
up, no cost is computed, and there is no plan-vs-actual variance.
Without this layer:

- Each new infospace re-discovers the same cost surprises
- `LLM-WP-0004`'s adaptive policy has no per-application history to
  learn from when it lands
- `IB-WP-0014`'s archive packages forget the budget shape of the work
  that produced them
- State-hub's organizational token ledger stays blind to
  infospace-bench runs

## Non-Goals

- Owning a cross-application quality ledger (that is `LLM-WP-0004`)
- Auto-refreshing provider price lists at runtime
- Failing a `generate run` when state-hub is unreachable
- Persisting full prompt text for retrospective replay (the existing
  run records already keep what is needed)

## Layered design (read first)

Three layers, each owned by a different repo:

| Layer | Lives in | Purpose | This workplan? |
|---|---|---|---|
| 1. Per-infospace budget log | `infospace-bench` (this workplan) | Plans + usage + variance, archived with the infospace | yes |
| 2. Cross-application observations | `llm-connect` (LLM-WP-0004) | Per-task per-adapter (cost, tokens, latency, quality) for adaptive routing | no |
| 3. Organizational rollup | `state-hub` (already exists) | `record_token_event` / `get_token_summary` across all projects | this workplan emits, hub stores |

## Tasks

### T01 — Plan snapshot persistence

```task
id: IB-WP-0019-T01
status: done
priority: high
state_hub_task_id: "7f1a4e0a-c1ad-49f3-aad1-6946de9b1219"
```

- Append the compact `plan_generation_summary` payload to
  `output/budget/plans.yaml` on every `generate plan` invocation
- Include a stable `snapshot_id` (hash of relevant fields), the stage,
  selection filters, and a `recorded_at` timestamp
- Cap the history length with a configurable retention limit (default:
  keep the last 50 snapshots; older snapshots are pruned and replaced
  by a single rollup entry)
- Tests: round-trip, retention, repeat plans produce distinct snapshots
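
A minimal sketch of the snapshot hash and retention pruning described above. The field selection in `RELEVANT_FIELDS`, the 16-character truncation, and the rollup entry shape (`kind`, `pruned_count`) are illustrative assumptions, not the shipped schema. The sketch leaves `recorded_at` out of the hash, so two identical plans share a `snapshot_id` while still being appended as separate entries; whether the real implementation does the same is not stated here.

```python
import hashlib
import json

# Assumption: which keys count as "relevant" is hypothetical.
RELEVANT_FIELDS = ("stage", "filters", "estimate")


def snapshot_id(plan: dict) -> str:
    """Hash the relevant fields using a canonical JSON encoding."""
    subset = {key: plan.get(key) for key in RELEVANT_FIELDS}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def prune(snapshots: list[dict], keep: int = 50) -> list[dict]:
    """Keep the newest `keep` entries; fold everything older into one rollup."""
    if len(snapshots) <= keep:
        return snapshots
    old, recent = snapshots[:-keep], snapshots[-keep:]
    rollup = {
        "kind": "rollup",  # hypothetical marker for the preserved entry
        # An earlier rollup carries its own count; a plain snapshot counts as 1.
        "pruned_count": sum(entry.get("pruned_count", 1) for entry in old),
    }
    return [rollup] + recent
```

Carrying `pruned_count` forward means repeated pruning never loses track of how many snapshots were dropped in total.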

### T02 — Usage rollup from run records

```task
id: IB-WP-0019-T02
status: done
priority: high
state_hub_task_id: "a612f8d4-f96d-4fae-9aa6-66a7946414f5"
```

- On `run` and `resume` completion, scan the run-record YAML written by
  the workflow engine and aggregate per-call usage into
  `output/budget/usage.yaml`
- Aggregate buckets: workflow, stage, provider, model
- Fields per bucket: `calls`, `prompt_tokens`, `completion_tokens`,
  `total_tokens`, `cost_usd_known` (sum over calls with known cost),
  `cost_usd_estimated` (computed via rate table fallback)
- Append a top-level `runs[]` entry per completed run with the run's
  rollup, the `snapshot_id` of the plan it executed against (when one
  exists), and the wall-clock duration
- Tests: aggregate across multiple stages, fixture-mode produces zero
  cost without erroring, missing-usage entries do not abort the rollup
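
The aggregation above could be sketched as follows. This assumes each call record is a flat dict carrying `workflow`, `stage`, `provider`, `model`, and the two token counts, and that the four bucket dimensions form one composite key; both the record shape and the composite key are assumptions for illustration. Cost fields are omitted to keep the sketch short.

```python
from collections import defaultdict

TOKEN_FIELDS = ("prompt_tokens", "completion_tokens")


def rollup_calls(calls: list[dict]) -> dict:
    """Aggregate per-call usage into (workflow, stage, provider, model) buckets."""
    buckets = defaultdict(lambda: {
        "calls": 0,
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
    })
    for call in calls:
        key = (
            call.get("workflow"),
            call.get("stage"),
            call.get("provider"),
            call.get("model"),
        )
        bucket = buckets[key]
        bucket["calls"] += 1
        for field in TOKEN_FIELDS:
            # Missing or null usage counts as zero instead of aborting the rollup.
            bucket[field] += int(call.get(field) or 0)
        bucket["total_tokens"] = bucket["prompt_tokens"] + bucket["completion_tokens"]
    return dict(buckets)
```

A call with no usage data still increments `calls`, so fixture-mode runs show up in the rollup with zero tokens.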

### T03 — Cost computation from a rate table

```task
id: IB-WP-0019-T03
status: done
priority: high
state_hub_task_id: "688c590d-8885-455e-bcf6-61409a45e001"
```

- Add `docs/model-rates.yaml` with `model -> {prompt_per_1k,
  completion_per_1k, currency, source_url, captured_at}` for the
  OpenRouter models we have actually used (start small: the ones
  currently exercised in tests/smoke)
- Resolver order: adapter-returned cost (when present) > rate table >
  unknown (recorded explicitly, not silently zeroed)
- Allow a per-workspace override via `${workspace}/model-rates.yaml`
  for self-hosted or private-rate setups
- Tests: known model; unknown model surfaces as `cost_usd: null` with
  `cost_status: "unknown"`; override file takes precedence
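
The resolver order above, as a hedged sketch. The function name, the flat call-record shape, and the `"known"` / `"estimated"` status labels are assumptions; only `"unknown"` and the rate-table field names come from this workplan. The sketch also ignores the `currency` field and treats every rate as USD.

```python
from typing import Optional


def resolve_cost(call: dict, rates: dict, overrides: Optional[dict] = None) -> dict:
    """Resolve cost: adapter-returned > rate table (override first) > unknown."""
    # 1. Adapter-returned cost wins, including an explicit 0 from fixture runs.
    if call.get("cost_usd") is not None:
        return {"cost_usd": float(call["cost_usd"]), "cost_status": "known"}

    # 2. Rate table, with the per-workspace override taking precedence.
    model = call.get("model")
    rate = (overrides or {}).get(model) or rates.get(model)
    if rate is None:
        # 3. Unknown is recorded explicitly, never silently zeroed.
        return {"cost_usd": None, "cost_status": "unknown"}

    cost = (
        call.get("prompt_tokens", 0) / 1000 * rate["prompt_per_1k"]
        + call.get("completion_tokens", 0) / 1000 * rate["completion_per_1k"]
    )
    return {"cost_usd": round(cost, 6), "cost_status": "estimated"}
```

Checking `is not None` instead of truthiness keeps a legitimate zero cost distinct from a missing one.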

### T04 — Plan-vs-actual variance and surfacing

```task
id: IB-WP-0019-T04
status: done
priority: medium
state_hub_task_id: "c6adc4fb-9062-4c81-a0b2-98d3166e047d"
```

- Compute a small variance record on each run:
  `actual_calls / estimated_calls`, `actual_tokens / estimated_tokens`,
  `actual_cost / estimated_cost`, plus per-stage variance
- Persist to `output/budget/summary.yaml` (overwrite each run; previous
  versions live in the `usage.yaml` history)
- Surface a one-line variance summary in
  `reports/generation-summary.md` (touches T07 of IB-WP-0016)
- Add the variance summary to `generate status` JSON output
- Tests: zero-cost fixture run, known-model OpenRouter mock run,
  missing-plan run (variance fields are null but the run still records)
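
A small sketch of the ratio computation, assuming the actuals and the plan are dicts keyed by `calls`, `tokens`, and `cost` (hypothetical key names). Treating a zero estimate as a null ratio is also an assumption; it avoids a division by zero when a fixture plan estimates no cost.

```python
from typing import Optional


def ratio(actual: Optional[float], estimated: Optional[float]) -> Optional[float]:
    """Return actual / estimated, or None when either side is missing or the estimate is zero."""
    if actual is None or not estimated:
        return None
    return round(actual / estimated, 4)


def variance(actual: dict, plan: Optional[dict]) -> dict:
    """Build the variance record; with no plan, every ratio is None."""
    plan = plan or {}
    return {
        f"{name}_ratio": ratio(actual.get(name), plan.get(name))
        for name in ("calls", "tokens", "cost")
    }
```

A ratio above 1.0 means the run used more than the plan estimated; a missing plan yields null fields while the run itself still records.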

### T05 — State-hub token-event emission

```task
id: IB-WP-0019-T05
status: done
priority: medium
state_hub_task_id: "968bca1d-63ff-4818-83bb-ca314b1e633c"
```

- After each completed run, call state-hub `record_token_event` with
  the run's rollup (tokens in/out, model, USD cost when known,
  `infospace_slug`, `workspace`)
- Emit at most one event per run; tag the event with the workplan
  context when available
- Failure isolation: a state-hub error must not fail the run; log the
  failure and continue
- Honor an opt-out env var `INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS`
- Tests: monkey-patched hub client, opt-out flag respected, run
  succeeds when the hub raises
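
Failure isolation and the opt-out could look roughly like this. The hub client is injected as a plain callable because the real `record_token_event` signature is not described here; passing the rollup as keyword arguments and accepting `1`, `true`, or `yes` as opt-out values are both assumptions.

```python
import logging
import os
from typing import Callable

logger = logging.getLogger(__name__)
OPT_OUT_VAR = "INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS"


def emit_token_event(record_token_event: Callable[..., object], rollup: dict) -> bool:
    """Emit one event for a run; return True when sent, and never raise."""
    # Assumption: these truthy spellings disable emission.
    if os.environ.get(OPT_OUT_VAR, "").strip().lower() in {"1", "true", "yes"}:
        return False
    try:
        record_token_event(**rollup)
    except Exception:
        # A hub failure is logged and swallowed so the run itself succeeds.
        logger.warning("state-hub token event failed; run continues", exc_info=True)
        return False
    return True
```

Injecting the client keeps the function easy to test with a stub that raises, which mirrors the monkey-patched tests listed above.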

### T06 — Workspace-level rollup CLI

```task
id: IB-WP-0019-T06
status: done
priority: medium
state_hub_task_id: "7cb34bfc-c562-4dda-a6d4-b44158644e19"
```

- Add `infospace-bench budget list <workspace>` that walks
  `infospaces/*/output/budget/` and prints a JSON table:
  `slug`, `plans_count`, `runs_count`, `total_tokens`,
  `total_cost_usd_known`, `total_cost_usd_estimated`, `last_run_at`
- Add `infospace-bench budget show <infospace-root>` that prints the
  full per-infospace budget structure
- Tests: empty workspace, multiple infospaces, missing budget dir is
  treated as zero, not an error
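
A sketch of the workspace walk behind `budget list`, reduced to three of the columns. The YAML loader is passed in as `load` so the sketch stays dependency-free (in practice something like `yaml.safe_load` from PyYAML would fit); the `runs[].total_tokens` location is an assumed shape for `usage.yaml`.

```python
from pathlib import Path
from typing import Callable


def budget_rows(workspace: Path, load: Callable[[str], dict]) -> list[dict]:
    """Return one row per infospace; a missing budget dir yields zeros."""
    rows = []
    infospaces = workspace / "infospaces"
    # glob() on a missing directory yields nothing, so an empty workspace is fine.
    for root in sorted(p for p in infospaces.glob("*") if p.is_dir()):
        usage_file = root / "output" / "budget" / "usage.yaml"
        usage = load(usage_file.read_text()) if usage_file.is_file() else {}
        runs = (usage or {}).get("runs", [])
        rows.append({
            "slug": root.name,
            "runs_count": len(runs),
            "total_tokens": sum(run.get("total_tokens", 0) for run in runs),
        })
    return rows
```

Sorting the infospace directories keeps the JSON table stable between invocations.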

### T07 — Archive integration

```task
id: IB-WP-0019-T07
status: done
priority: low
state_hub_task_id: "b97906e0-2835-4246-9868-840c02d64fae"
```

- Confirm `output/budget/` ends up inside the archive package built by
  `IB-WP-0014`'s `archive_infospace()` (it should, via the existing
  default-include rules — verify with a test)
- Add a `budget_summary` field to the archive manifest so
  catalog-level tools can find the cost shape of an archived infospace
  without unpacking it

## Acceptance

- A `generate plan` invocation persists a snapshot to
  `output/budget/plans.yaml` and is idempotent across runs
- A `generate run` invocation appends a usage rollup to
  `output/budget/usage.yaml`, writes a variance summary, and emits one
  state-hub token event (when the hub is reachable)
- `generate status` and the generation-summary report surface the
  plan-vs-actual variance for the most recent run
- `infospace-bench budget list <workspace>` returns a parseable rollup
  across all infospaces in a workspace
- Archived infospace packages carry their budget log and expose a
  `budget_summary` field in the archive manifest
- Tests cover plan persistence, run rollup, rate-table resolution,
  variance, state-hub emission with hub-down isolation, and the
  workspace CLI

## Risks and open questions

- **Rate-table drift.** Provider prices change. The rate table will go
  stale unless someone refreshes it. Add `captured_at` to every entry
  and surface "rate older than 90 days" as a warning in budget output;
  do not block.
- **Multiple-provider cost.** When a single run mixes providers (e.g.
  fixture for cheap stages + OpenRouter for expensive ones), the
  rollup must split clearly. The model+provider bucketing in T02
  covers this; tests should pin the behaviour.
- **State-hub coupling.** Emitting token events introduces a
  cross-repo write. T05 keeps it opt-out and failure-isolated, but
  callers running offline want zero coupling, so the default must be
  "emit if reachable, silent skip otherwise" rather than "fail if
  unreachable".
- **Concurrency.** Two `generate run` invocations on the same
  infospace would race on `usage.yaml`. Existing infospace workflows
  assume sequential runs; document the constraint rather than building
  locks.
- **Budget vs adaptive observations.** This workplan records *what
  happened*. `LLM-WP-0004` records *what we learned about quality*.
  Keep them as two distinct files / schemas so the layering stays
  inspectable; do not merge.
- **Privacy.** Usage records do not include prompt or completion
  text, only counts and identifiers. State-hub events likewise. If
  either ever needs to carry text, add an explicit redaction hook
  first.
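
For the rate-table drift risk, the non-blocking staleness warning could be derived along these lines. The function name and the dict-of-entries shape are assumptions, as is `captured_at` being an ISO date string; the 90-day threshold comes from the risk note above.

```python
from datetime import date, timedelta


def stale_rates(rates: dict, today: date, max_age_days: int = 90) -> list[str]:
    """Return the models whose `captured_at` is older than `max_age_days`."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        model
        for model, entry in rates.items()
        if date.fromisoformat(entry["captured_at"]) < cutoff
    )
```

The caller would print the returned models as warnings in budget output and carry on, since stale rates must not block a run.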

## Downstream effects

- `IB-WP-0018` (adaptive routing consumer) gains a local history to
  cross-check against the `QualityLedger` once `LLM-WP-0004` lands
- `IB-WP-0016-T07` (review report and output policy) can pull the
  variance summary directly instead of regenerating numbers
- `IB-WP-0014` archives become budget-bearing artifacts without code
  changes beyond T07's manifest field
- State-hub's `get_token_summary` finally sees infospace-bench runs
  alongside other domains' token spend