IB-WP-0019: budget and usage registry workplan (todo)

Open a separate workplan for the budget/usage recording layer surfaced
by the IB-WP-0016-T03 conversation. Three-layer design: layer 1 (per-infospace
budget log) and layer 3 (state-hub emission) live here; layer 2
(cross-application quality observations for adaptive routing) stays in
llm-connect LLM-WP-0004.

Seven tasks cover plan snapshot persistence, run usage rollup,
rate-table cost computation, plan-vs-actual variance, state-hub token
events with hub-down isolation, a workspace-level rollup CLI, and
archive integration so IB-WP-0014 packages carry their budget shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:30:49 +02:00
parent 74c52c6239
commit d2deebe081


---
id: IB-WP-0019
type: workplan
title: "Budget and Usage Registry for Infospaces"
domain: markitect
repo: infospace-bench
status: todo
owner: markitect
topic_slug: markitect
created: "2026-05-17"
updated: "2026-05-17"
depends_on_workplans: []
related_workplans:
- IB-WP-0016
- IB-WP-0014
- IB-WP-0018
- LLM-WP-0004
---
# IB-WP-0019 — Budget and Usage Registry for Infospaces
## Goal
Persist budget and usage signals at the per-infospace layer and emit
organizational rollups, so every infospace can answer "what did we
estimate, what did we actually spend, on which model, at what cost"
without scraping commit messages or state-hub events.
This workplan owns the *recording and rollup* layer. It does **not**
own:
- Adaptive routing decisions or per-task quality grading — those belong
to `llm-connect` `LLM-WP-0004` and the consumer workplan `IB-WP-0018`.
- Authoritative provider pricing — we read a small rate table and
combine it with adapter-returned usage; the table itself is a static
artifact that consumers refresh.
## Why
`IB-WP-0016-T03` made the planning estimates cheap to obtain (chunks,
calls, tokens, rough USD), but the numbers vanish after the JSON is
printed. Run records under `output/workflows/runs/*.yaml` capture
per-call `prompt_tokens` and `completion_tokens`, but nothing rolls them
up, no cost is computed, and there is no plan-vs-actual variance.
Without this layer:
- Each new infospace re-discovers the same cost surprises
- `LLM-WP-0004`'s adaptive policy has no per-application history to
learn from when it lands
- `IB-WP-0014`'s archive packages forget the budget shape of the work
that produced them
- State-hub's organizational token ledger stays blind to
infospace-bench runs
## Non-Goals
- Owning a cross-application quality ledger (that is `LLM-WP-0004`)
- Auto-refreshing provider price lists at runtime
- Failing a `generate run` when state-hub is unreachable
- Persisting full prompt text for retrospective replay (the existing
run records already keep what is needed)
## Layered design (read first)
Three layers, each owned by a different repo:
| Layer | Lives in | Purpose | This workplan? |
|---|---|---|---|
| 1. Per-infospace budget log | `infospace-bench` (this workplan) | Plans + usage + variance, archived with the infospace | yes |
| 2. Cross-application observations | `llm-connect` (LLM-WP-0004) | Per-task per-adapter (cost, tokens, latency, quality) for adaptive routing | no |
| 3. Organizational rollup | `state-hub` (already exists) | `record_token_event` / `get_token_summary` across all projects | this workplan emits, hub stores |
## Tasks
### T01 — Plan snapshot persistence
```task
id: IB-WP-0019-T01
status: todo
priority: high
```
- Append the compact `plan_generation_summary` payload to
`output/budget/plans.yaml` on every `generate plan` invocation
- Include a stable `snapshot_id` (hash of relevant fields), the stage,
selection filters, and a `recorded_at` timestamp
- Cap the history length with a configurable retention (default: keep the
last 50 snapshots; older snapshots are pruned and summarized in a single
preserved rollup entry)
- Tests: round-trip, retention, repeat plans produce distinct snapshots
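
The following is a minimal Python sketch of the T01 snapshot logic. It assumes the history is the list of entries read from `plans.yaml`, and it omits YAML I/O. The hashed field names in `HASHED_FIELDS`, the 16-character id, and the rollup entry's keys (`kind`, `pruned_count`, `newest_pruned_at`) are hypothetical choices, not part of the plan.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical: which plan fields feed the hash is an assumption, not a fixed schema.
HASHED_FIELDS = ("stage", "selection", "summary")


def snapshot_id(plan: dict) -> str:
    """Stable id: SHA-256 of the canonical JSON of the hashed fields only."""
    payload = {key: plan.get(key) for key in HASHED_FIELDS}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def append_snapshot(history: list, plan: dict, keep: int = 50) -> list:
    """Append one snapshot, then prune to `keep` entries plus one rollup entry."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    entry = {key: plan.get(key) for key in HASHED_FIELDS}
    entry["snapshot_id"] = snapshot_id(plan)
    # recorded_at stays outside the hash, so identical plans share an id.
    entry["recorded_at"] = datetime.now(timezone.utc).isoformat()

    rollup = next((h for h in history if h.get("kind") == "rollup"), None)
    snapshots = [h for h in history if h.get("kind") != "rollup"] + [entry]
    if len(snapshots) <= keep:
        return ([rollup] if rollup else []) + snapshots

    # Fold everything older than the newest `keep` snapshots into the rollup.
    pruned, kept = snapshots[:-keep], snapshots[-keep:]
    new_rollup = {
        "kind": "rollup",
        "pruned_count": (rollup["pruned_count"] if rollup else 0) + len(pruned),
        "newest_pruned_at": pruned[-1]["recorded_at"],
    }
    return [new_rollup] + kept
```

Because `recorded_at` is excluded from the hash, re-running an identical plan still appends its own entry but reuses the earlier `snapshot_id`. Run rollups can then point at a plan shape rather than at a single invocation.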
### T02 — Usage rollup from run records
```task
id: IB-WP-0019-T02
status: todo
priority: high
```
- On `run` and `resume` completion, scan the run-record YAML written by
the workflow engine and aggregate per-call usage into
`output/budget/usage.yaml`
- Aggregate buckets: workflow, stage, provider, model
- Fields per bucket: `calls`, `prompt_tokens`, `completion_tokens`,
`total_tokens`, `cost_usd_known` (sum over calls with known cost),
`cost_usd_estimated` (computed via rate table fallback)
- Append a top-level `runs[]` entry per completed run with the run's
rollup, the `snapshot_id` of the plan it executed against (when one
exists), and the wall-clock duration
- Tests: aggregate across multiple stages, fixture-mode produces zero
cost without erroring, missing-usage entries do not abort the rollup
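
The T02 aggregation can be sketched as a single pass over per-call records. The call shape used here is an assumption about the run-record YAML:

- top-level `workflow`, `stage`, `provider`, and `model` keys;
- a nested `usage` dict with `prompt_tokens`, `completion_tokens`, and an optional adapter-reported `cost_usd`.

`calls_missing_usage` is an illustrative extra field. `cost_usd_estimated` is left to the rate-table resolver in T03.

```python
from collections import defaultdict

BUCKET_KEYS = ("workflow", "stage", "provider", "model")


def _empty_bucket() -> dict:
    return {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0,
            "total_tokens": 0, "cost_usd_known": 0.0, "calls_missing_usage": 0}


def rollup_calls(calls: list) -> list:
    """Aggregate per-call usage into one row per (workflow, stage, provider, model)."""
    buckets = defaultdict(_empty_bucket)
    for call in calls:
        bucket = buckets[tuple(call.get(k) for k in BUCKET_KEYS)]
        bucket["calls"] += 1
        usage = call.get("usage")
        if not usage:
            # A call without usage is counted, never fatal to the rollup.
            bucket["calls_missing_usage"] += 1
            continue
        prompt = usage.get("prompt_tokens") or 0
        completion = usage.get("completion_tokens") or 0
        bucket["prompt_tokens"] += prompt
        bucket["completion_tokens"] += completion
        bucket["total_tokens"] += prompt + completion
        if usage.get("cost_usd") is not None:
            # Only adapter-reported costs count as "known".
            bucket["cost_usd_known"] += usage["cost_usd"]
    # Flatten each bucket key back into named columns next to its totals.
    return [dict(zip(BUCKET_KEYS, key), **totals) for key, totals in buckets.items()]
```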
### T03 — Cost computation from a rate table
```task
id: IB-WP-0019-T03
status: todo
priority: high
```
- Add `docs/model-rates.yaml` with `model -> {prompt_per_1k,
completion_per_1k, currency, source_url, captured_at}` for the
OpenRouter models we have actually used (start small: the ones
currently exercised in tests/smoke)
- Resolver order: adapter-returned cost (when present) > rate table >
unknown (recorded explicitly, not silently zeroed)
- Allow a per-workspace override via `${workspace}/model-rates.yaml`
for self-hosted or private-rate setups
- Tests: known model, unknown model surfaces as `cost_usd: null` with
`cost_status: "unknown"`, override file takes precedence
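
This is a sketch of the T03 resolver order. It assumes rate entries shaped like the `docs/model-rates.yaml` description and priced in USD. The `override` argument stands in for the per-workspace `model-rates.yaml`. The `cost_status` values other than `"unknown"` (`"adapter"`, `"override"`, `"rate_table"`) are hypothetical labels.

```python
from typing import Optional


def resolve_cost(model: str, prompt_tokens: int, completion_tokens: int,
                 adapter_cost: Optional[float], rates: dict,
                 override: Optional[dict] = None) -> dict:
    """Resolve a call's cost: adapter > workspace override > rate table > unknown."""
    if adapter_cost is not None:
        return {"cost_usd": adapter_cost, "cost_status": "adapter"}
    for status, table in (("override", override or {}), ("rate_table", rates)):
        rate = table.get(model)
        if rate is not None:
            # Rates are per 1,000 tokens and assumed to be in USD.
            cost = ((prompt_tokens / 1000) * rate["prompt_per_1k"]
                    + (completion_tokens / 1000) * rate["completion_per_1k"])
            return {"cost_usd": cost, "cost_status": status}
    # Unknown is recorded explicitly, never silently zeroed.
    return {"cost_usd": None, "cost_status": "unknown"}
```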
### T04 — Plan-vs-actual variance and surfacing
```task
id: IB-WP-0019-T04
status: todo
priority: medium
```
- Compute a small variance record on each run: actual_calls /
estimated_calls, actual_tokens / estimated_tokens, actual_cost /
estimated_cost, plus per-stage variance
- Persist to `output/budget/summary.yaml` (overwrite each run; previous
versions live in usage.yaml history)
- Surface a one-line variance summary in
`reports/generation-summary.md` (touches T07 of IB-WP-0016)
- Add the variance summary to `generate status` JSON output
- Tests: zero-cost fixture run, known-model OpenRouter mock run,
missing-plan run (variance fields are null but the run still records)
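
The T04 variance record reduces to guarded ratios. This sketch assumes that both the actual rollup and the plan expose `calls`, `tokens`, `cost`, and a `stages` mapping with the same keys; those names are illustrative. A ratio is `None` when the plan is missing or the estimate is zero. That covers both the missing-plan case and the zero-cost fixture case without raising.

```python
from typing import Optional

METRICS = ("calls", "tokens", "cost")


def ratio(actual: Optional[float], estimated: Optional[float]) -> Optional[float]:
    """actual / estimated, or None if either side is missing or the estimate is zero."""
    if actual is None or estimated is None or estimated == 0:
        return None
    return actual / estimated


def variance(actual: dict, plan: Optional[dict]) -> dict:
    """Run-level and per-stage plan-vs-actual ratios; every ratio is None without a plan."""
    plan = plan or {}
    planned_stages = plan.get("stages", {})
    record = {f"{m}_ratio": ratio(actual.get(m), plan.get(m)) for m in METRICS}
    record["stages"] = {
        stage: {f"{m}_ratio": ratio(values.get(m), planned_stages.get(stage, {}).get(m))
                for m in METRICS}
        for stage, values in actual.get("stages", {}).items()
    }
    return record
```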
### T05 — State-hub token-event emission
```task
id: IB-WP-0019-T05
status: todo
priority: medium
```
- After each completed run, call state-hub `record_token_event` with
the run's rollup (tokens in/out, model, USD cost when known,
`infospace_slug`, `workspace`)
- Emit at most one event per run; tag the event with the workplan
context when available
- Failure isolation: a state-hub error must not fail the run; log the
failure and continue
- Honor an opt-out env var `INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS`
- Tests: monkey-patched hub client, opt-out flag respected, run
succeeds when the hub raises
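
One way to get T05's failure isolation and opt-out is to wrap the hub call so it can never raise. Some parts of this sketch are assumptions:

- `send` stands in for a thin wrapper around state-hub's `record_token_event`, whose real signature is not given here.
- The payload keys are drawn from the fields listed above.
- The set of opt-out values is a hypothetical choice.

Injecting `send` also gives the monkey-patched tests a seam.

```python
import logging
import os
from typing import Callable, Optional

log = logging.getLogger("infospace_bench.budget")
OPT_OUT_ENV = "INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS"
# Assumption: these values count as "opted out"; the workplan only names the variable.
TRUTHY = {"1", "true", "yes", "on"}


def emit_token_event(send: Callable[[dict], object], rollup: dict,
                     workplan: Optional[str] = None) -> bool:
    """Send at most one token event for a run. Never raises; returns True if sent."""
    if os.environ.get(OPT_OUT_ENV, "").strip().lower() in TRUTHY:
        return False
    try:
        payload = {
            "tokens_in": rollup["prompt_tokens"],
            "tokens_out": rollup["completion_tokens"],
            "model": rollup["model"],
            "cost_usd": rollup.get("cost_usd"),  # None when the cost is unknown
            "infospace_slug": rollup["infospace_slug"],
            "workspace": rollup["workspace"],
        }
        if workplan:
            payload["workplan"] = workplan
        send(payload)
    except Exception:
        # Hub down, timeout, or a malformed rollup: log it and let the run succeed.
        log.warning("state-hub token event not recorded", exc_info=True)
        return False
    return True
```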
### T06 — Workspace-level rollup CLI
```task
id: IB-WP-0019-T06
status: todo
priority: medium
```
- Add `infospace-bench budget list <workspace>` that walks
`infospaces/*/output/budget/` and prints a JSON table:
`slug`, `plans_count`, `runs_count`, `total_tokens`,
`total_cost_usd_known`, `total_cost_usd_estimated`, `last_run_at`
- Add `infospace-bench budget show <infospace-root>` that prints the
full per-infospace budget structure
- Tests: empty workspace, multiple infospaces, missing budget dir is
treated as zero, not an error
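
This is a sketch of the directory walk behind `budget list`. It assumes two file shapes:

- `usage.yaml` holds a top-level `runs` list whose entries carry `total_tokens`, `cost_usd_known`, `cost_usd_estimated`, and a hypothetical `finished_at` timestamp.
- `plans.yaml` holds a top-level `snapshots` list.

The loader is injected so the sketch stays stdlib-only. A real CLI would likely pass something like `lambda p: yaml.safe_load(p.read_text())` from PyYAML.

```python
from pathlib import Path
from typing import Callable

Loader = Callable[[Path], object]


def _read(path: Path, load: Loader) -> dict:
    """Load a budget file, treating a missing or empty file as an empty mapping."""
    return (load(path) or {}) if path.is_file() else {}


def budget_list(workspace: Path, load: Loader) -> list:
    """One row per infospace under `infospaces/`; a missing budget dir counts as zero."""
    base = workspace / "infospaces"
    if not base.is_dir():
        return []
    rows = []
    for root in sorted(p for p in base.iterdir() if p.is_dir()):
        budget = root / "output" / "budget"
        snapshots = _read(budget / "plans.yaml", load).get("snapshots", [])
        runs = _read(budget / "usage.yaml", load).get("runs", [])
        rows.append({
            "slug": root.name,
            "plans_count": len(snapshots),
            "runs_count": len(runs),
            "total_tokens": sum(r.get("total_tokens", 0) for r in runs),
            "total_cost_usd_known": sum(r.get("cost_usd_known") or 0.0 for r in runs),
            "total_cost_usd_estimated": sum(r.get("cost_usd_estimated") or 0.0 for r in runs),
            # ISO-8601 UTC strings sort chronologically, so max() gives the latest run.
            "last_run_at": max((r["finished_at"] for r in runs if r.get("finished_at")),
                               default=None),
        })
    return rows
```

Passing the result through `json.dumps(..., indent=2)` would give the parseable JSON table the acceptance criteria ask for.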
### T07 — Archive integration
```task
id: IB-WP-0019-T07
status: todo
priority: low
```
- Confirm `output/budget/` ends up inside the archive package built by
`IB-WP-0014`'s `archive_infospace()` (it should, via the existing
default-include rules — verify with a test)
- Add a `budget_summary` field to the archive manifest so
catalog-level tools can find the cost shape of an archived infospace
without unpacking it
## Acceptance
- A `generate plan` invocation persists a snapshot to
`output/budget/plans.yaml` and is idempotent across runs
- A `generate run` invocation appends a usage rollup to
`output/budget/usage.yaml`, writes a variance summary, and emits one
state-hub token event (when the hub is reachable)
- `generate status` and the generation-summary report surface the
plan-vs-actual variance for the most recent run
- `infospace-bench budget list <workspace>` returns a parseable rollup
across all infospaces in a workspace
- Archived infospace packages carry their budget log and expose a
`budget_summary` field in the archive manifest
- Tests cover plan persistence, run rollup, rate-table resolution,
variance, state-hub emission with hub-down isolation, and the
workspace CLI
## Risks and open questions
- **Rate-table drift.** Provider prices change. The rate table will go
stale unless someone refreshes it. Add `captured_at` to every entry
and surface "rate older than 90 days" as a warning in budget output;
do not block.
- **Multiple-provider cost.** When a single run mixes providers (e.g.
fixture for cheap stages + OpenRouter for expensive ones), the
rollup must split clearly. The model+provider bucketing in T02
covers this; tests should pin the behaviour.
- **State-hub coupling.** Emitting token events introduces a
cross-repo write. T05 keeps it opt-outable and failure-isolated, but
callers running offline want zero coupling — make sure the default
is "emit if reachable, silent skip otherwise" rather than "fail if
unreachable".
- **Concurrency.** Two `generate run` invocations on the same
infospace would race on `usage.yaml`. Existing infospace workflows
assume sequential runs; document the constraint rather than building
locks.
- **Budget vs adaptive observations.** This workplan records *what
happened*. `LLM-WP-0004` records *what we learned about quality*.
Keep them as two distinct files / schemas so the layering stays
inspectable; do not merge.
- **Privacy.** Usage records do not include prompt or completion
text — only counts and identifiers. State-hub events likewise. If
this assumption later changes, add an explicit redaction hook before
doing so.
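
For the rate-table drift risk above, the 90-day warning could be computed as follows. The sketch accepts `captured_at` as an ISO string or as a `date`/`datetime`; PyYAML's `safe_load` turns unquoted ISO dates into `datetime.date` objects. Flagging entries with no `captured_at` at all is an added assumption. The function returns warnings rather than raising, in keeping with "do not block".

```python
from datetime import date, datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=90)


def _as_date(value) -> date:
    """Accept a date, a datetime, or an ISO-8601 string (only the date part is used)."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    return date.fromisoformat(str(value)[:10])


def stale_rate_warnings(rates: dict, today: Optional[date] = None) -> list:
    """Warn, never fail, for rate entries older than 90 days or lacking captured_at."""
    today = today or date.today()
    warnings = []
    for model, entry in sorted(rates.items()):
        captured = entry.get("captured_at")
        if captured is None:
            warnings.append(f"{model}: rate has no captured_at")
        elif today - _as_date(captured) > STALE_AFTER:
            warnings.append(f"{model}: rate older than 90 days (captured {captured})")
    return warnings
```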
## Downstream effects
- `IB-WP-0018` (adaptive routing consumer) gains a local history to
cross-check against the `QualityLedger` once `LLM-WP-0004` lands
- `IB-WP-0016-T07` (review report and output policy) can pull the
variance summary directly instead of regenerating numbers
- `IB-WP-0014` archives become budget-bearing artifacts without code
changes beyond T07's manifest field
- State-hub's `get_token_summary` finally sees infospace-bench runs
alongside other domains' token spend