The default archive include set already pulls output/ in wholesale, so output/budget/ already lands inside the archive package with no code change. Add a budget_summary block to ArchiveRecord.metadata so catalog-level tools can see plans_count, runs_count, total_tokens, total_cost_usd_known, total_cost_usd_estimated, and the latest_snapshot_id without unpacking the archive. An infospace with no budget data still archives cleanly with an empty metadata dict. Closes IB-WP-0019 (Budget and Usage Registry): T01-T07 all done. Three-layer design landed end-to-end — layer 1 (per-infospace plans.yaml / usage.yaml / summary.yaml) and layer 3 (state-hub record_token_event emission with failure isolation) live here; layer 2 (cross-application QualityLedger for adaptive routing) is parked in llm-connect LLM-WP-0004 and infospace-bench IB-WP-0018 awaits it. 122 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
id: IB-WP-0019
type: workplan
title: "Budget and Usage Registry for Infospaces"
domain: markitect
repo: infospace-bench
status: done
owner: markitect
topic_slug: markitect
created: "2026-05-17"
updated: "2026-05-17"
depends_on_workplans: []
related_workplans:
  - IB-WP-0016
  - IB-WP-0014
  - IB-WP-0018
  - LLM-WP-0004
state_hub_workstream_id: "063c6285-a56e-476b-8666-109d6fa35858"
---

# IB-WP-0019 — Budget and Usage Registry for Infospaces

## Goal

Persist budget and usage signals at the per-infospace layer and emit
organizational rollups, so every infospace can answer "what did we
estimate, what did we actually spend, on which model, at what cost"
without scraping commit messages or state-hub events.

This workplan owns the *recording and rollup* layer. It does **not**
own:

- Adaptive routing decisions or per-task quality grading; those belong
  to `llm-connect` `LLM-WP-0004` and the consumer workplan `IB-WP-0018`.
- Authoritative provider pricing; we read a small rate table and
  combine it with adapter-returned usage. The table itself is a static
  artifact that consumers refresh.

## Why

`IB-WP-0016-T03` made the planning estimates cheap to obtain (chunks,
calls, tokens, rough USD), but the numbers vanish after the JSON is
printed. Run records under `output/workflows/runs/*.yaml` capture
per-call `prompt_tokens` and `completion_tokens` but nothing rolls them
up, no cost is computed, and there is no plan-vs-actual variance.
Without this layer:

- Each new infospace re-discovers the same cost surprises
- `LLM-WP-0004`'s adaptive policy has no per-application history to
  learn from when it lands
- `IB-WP-0014`'s archive packages forget the budget shape of the work
  that produced them
- State-hub's organizational token ledger stays blind to
  infospace-bench runs

## Non-Goals

- Owning a cross-application quality ledger (that is `LLM-WP-0004`)
- Auto-refreshing provider price lists at runtime
- Failing a `generate run` when state-hub is unreachable
- Persisting full prompt text for retrospective replay (the existing
  run records already keep what is needed)

## Layered design (read first)

Three layers, each owned by a different repo:

| Layer | Lives in | Purpose | This workplan? |
|---|---|---|---|
| 1. Per-infospace budget log | `infospace-bench` (this workplan) | Plans + usage + variance, archived with the infospace | yes |
| 2. Cross-application observations | `llm-connect` (LLM-WP-0004) | Per-task per-adapter (cost, tokens, latency, quality) for adaptive routing | no |
| 3. Organizational rollup | `state-hub` (already exists) | `record_token_event` / `get_token_summary` across all projects | this workplan emits, hub stores |

## Tasks

### T01 — Plan snapshot persistence

```task
id: IB-WP-0019-T01
status: done
priority: high
state_hub_task_id: "7f1a4e0a-c1ad-49f3-aad1-6946de9b1219"
```

- Append the compact `plan_generation_summary` payload to
  `output/budget/plans.yaml` on every `generate plan` invocation
- Include a stable `snapshot_id` (hash of relevant fields), the stage,
  selection filters, and a `recorded_at` timestamp
- Cap the history length with a configurable retention (default keep
  last 50 snapshots; older snapshots are pruned with a single rollup
  entry preserved)
- Tests: round-trip, retention, repeat plans produce distinct snapshots

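The snapshot and retention rules in T01 can be sketched roughly as follows. This is a minimal illustration, not the shipped code: the function names, the `kind: rollup` marker, and the `pruned_count` / `last_pruned_at` fields are hypothetical, and it assumes `recorded_at` is one of the hashed fields so that repeated plans yield distinct ids.

```python
import hashlib
import json


def snapshot_id(summary: dict, recorded_at: str) -> str:
    """Hash the plan summary plus its timestamp into a short id.

    Assumption: the timestamp is part of the hashed fields, so two
    identical plans recorded at different times get different ids.
    """
    payload = json.dumps(
        {"summary": summary, "recorded_at": recorded_at}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def append_snapshot(history: list, summary: dict, recorded_at: str, keep: int = 50) -> list:
    """Append a snapshot, then fold anything beyond `keep` into one rollup entry."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    entry = {
        "snapshot_id": snapshot_id(summary, recorded_at),
        "recorded_at": recorded_at,
        "summary": summary,
    }
    # Hypothetical layout: at most one rollup entry, kept at the front.
    rollup = next((e for e in history if e.get("kind") == "rollup"), None)
    snapshots = [e for e in history if e.get("kind") != "rollup"] + [entry]
    pruned, kept = snapshots[:-keep], snapshots[-keep:]
    if pruned:
        count = (rollup or {}).get("pruned_count", 0) + len(pruned)
        rollup = {
            "kind": "rollup",
            "pruned_count": count,
            "last_pruned_at": pruned[-1]["recorded_at"],
        }
    return ([rollup] if rollup else []) + kept
```

Sorting keys before hashing keeps the id independent of dict ordering, which is one plausible reading of "stable" here.
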
### T02 — Usage rollup from run records

```task
id: IB-WP-0019-T02
status: done
priority: high
state_hub_task_id: "a612f8d4-f96d-4fae-9aa6-66a7946414f5"
```

- On `run` and `resume` completion, scan the run-record YAML written by
  the workflow engine and aggregate per-call usage into
  `output/budget/usage.yaml`
- Aggregate buckets: workflow, stage, provider, model
- Fields per bucket: `calls`, `prompt_tokens`, `completion_tokens`,
  `total_tokens`, `cost_usd_known` (sum over calls with known cost),
  `cost_usd_estimated` (computed via rate table fallback)
- Append a top-level `runs[]` entry per completed run with the run's
  rollup, the `snapshot_id` of the plan it executed against (when one
  exists), and the wall-clock duration
- Tests: aggregate across multiple stages, fixture-mode produces zero
  cost without erroring, missing-usage entries do not abort the rollup

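As a rough sketch of the T02 bucketing, the aggregation could look like the block below. The per-call record shape (a `usage` mapping plus optional `cost_usd` and `cost_usd_estimated` keys) is an assumption made for illustration; the real run-record schema may differ.

```python
from collections import defaultdict

BUCKET_KEYS = ("workflow", "stage", "provider", "model")


def empty_bucket() -> dict:
    """Zeroed bucket with the fields listed in T02."""
    return {
        "calls": 0,
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "cost_usd_known": 0.0,
        "cost_usd_estimated": 0.0,
    }


def rollup_calls(calls: list) -> dict:
    """Aggregate per-call usage into (workflow, stage, provider, model) buckets."""
    buckets = defaultdict(empty_bucket)
    for call in calls:
        key = tuple(call.get(name) or "unknown" for name in BUCKET_KEYS)
        # A missing or null usage entry counts as zero tokens and does not abort.
        usage = call.get("usage") or {}
        prompt = usage.get("prompt_tokens") or 0
        completion = usage.get("completion_tokens") or 0
        bucket = buckets[key]
        bucket["calls"] += 1
        bucket["prompt_tokens"] += prompt
        bucket["completion_tokens"] += completion
        bucket["total_tokens"] += prompt + completion
        # Known cost wins; the estimate is only summed when no known cost exists.
        if call.get("cost_usd") is not None:
            bucket["cost_usd_known"] += call["cost_usd"]
        elif call.get("cost_usd_estimated") is not None:
            bucket["cost_usd_estimated"] += call["cost_usd_estimated"]
    return dict(buckets)
```

Keying on the full tuple keeps mixed-provider runs split cleanly, so a fixture stage and an OpenRouter stage never share a bucket.
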
### T03 — Cost computation from a rate table

```task
id: IB-WP-0019-T03
status: done
priority: high
state_hub_task_id: "688c590d-8885-455e-bcf6-61409a45e001"
```

- Add `docs/model-rates.yaml` with `model -> {prompt_per_1k,
  completion_per_1k, currency, source_url, captured_at}` for the
  OpenRouter models we have actually used (start small: the ones
  currently exercised in tests/smoke)
- Resolver order: adapter-returned cost (when present) > rate table >
  unknown (recorded explicitly, not silently zeroed)
- Allow a per-workspace override via `${workspace}/model-rates.yaml`
  for self-hosted or private-rate setups
- Tests: known model, unknown model surfaces as `cost_usd: null` with
  `cost_status: "unknown"`, override file takes precedence

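The resolver order in T03 might be expressed along these lines. This is a sketch under assumptions: `resolve_cost` and its parameters are hypothetical names, the rate tables are passed in as already-loaded dicts, and the `"known"` and `"estimated"` status strings are illustrative (only `"unknown"` appears in the task text).

```python
def resolve_cost(model, prompt_tokens, completion_tokens,
                 adapter_cost=None, rates=None, overrides=None) -> dict:
    """Resolve cost as: adapter-returned cost > rate table > explicit unknown."""
    # 1. An adapter-returned cost wins, including a legitimate 0.0.
    if adapter_cost is not None:
        return {"cost_usd": adapter_cost, "cost_status": "known"}

    # 2. Rate table lookup; workspace overrides shadow the default table.
    table = {**(rates or {}), **(overrides or {})}
    rate = table.get(model)
    if rate is None:
        # 3. Unknown is recorded explicitly rather than silently zeroed.
        return {"cost_usd": None, "cost_status": "unknown"}

    cost = (prompt_tokens / 1000) * rate["prompt_per_1k"] \
        + (completion_tokens / 1000) * rate["completion_per_1k"]
    return {"cost_usd": round(cost, 6), "cost_status": "estimated"}
```

For example, with a rate of 0.5 per 1k prompt tokens and 1.5 per 1k completion tokens, 2000 prompt tokens and 1000 completion tokens resolve to an estimated cost of 2.5.
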
### T04 — Plan-vs-actual variance and surfacing

```task
id: IB-WP-0019-T04
status: done
priority: medium
state_hub_task_id: "c6adc4fb-9062-4c81-a0b2-98d3166e047d"
```

- Compute a small variance record on each run: actual_calls /
  estimated_calls, actual_tokens / estimated_tokens, actual_cost /
  estimated_cost, plus per-stage variance
- Persist to `output/budget/summary.yaml` (overwrite each run; previous
  versions live in usage.yaml history)
- Surface a one-line variance summary in
  `reports/generation-summary.md` (touches T07 of IB-WP-0016)
- Add the variance summary to `generate status` JSON output
- Tests: zero-cost fixture run, known-model OpenRouter mock run,
  missing-plan run (variance fields are null but the run still records)

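The variance ratios in T04 reduce to a guarded division. The sketch below assumes flat `calls` / `tokens` / `cost_usd` keys on both the actual rollup and the plan estimate; those key names and the `*_ratio` output names are hypothetical.

```python
def ratio(actual, estimated):
    """Return actual / estimated, or None when either side is unusable."""
    # A missing plan or a zero estimate (e.g. a fixture run) yields null,
    # not a division error.
    if actual is None or not estimated:
        return None
    return round(actual / estimated, 4)


def variance(actual: dict, plan) -> dict:
    """Compute run-level variance; `plan` may be None for a missing-plan run."""
    plan = plan or {}
    return {
        f"{field}_ratio": ratio(actual.get(field), plan.get(field))
        for field in ("calls", "tokens", "cost_usd")
    }
```

A ratio above 1.0 means the run exceeded the estimate. Per-stage variance could reuse the same `ratio` helper over each stage's pair of numbers.
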
### T05 — State-hub token-event emission

```task
id: IB-WP-0019-T05
status: done
priority: medium
state_hub_task_id: "968bca1d-63ff-4818-83bb-ca314b1e633c"
```

- After each completed run, call state-hub `record_token_event` with
  the run's rollup (tokens in/out, model, USD cost when known,
  `infospace_slug`, `workspace`)
- Emit at most one event per run; tag the event with the workplan
  context when available
- Failure isolation: a state-hub error must not fail the run; log the
  failure and continue
- Honor an opt-out env var `INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS`
- Tests: monkey-patched hub client, opt-out flag respected, run
  succeeds when the hub raises

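The failure-isolation and opt-out behaviour in T05 can be illustrated with a small wrapper. This sketch makes several assumptions: `hub_client` is any object exposing a `record_token_event` method that accepts keyword arguments (the real client signature is not shown in this workplan), any non-empty value of the env var counts as opt-out, and the logger name is invented.

```python
import logging
import os

logger = logging.getLogger("infospace_bench.budget")  # hypothetical logger name
OPT_OUT_ENV = "INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS"


def emit_token_event(hub_client, rollup: dict) -> bool:
    """Emit one token event for a run; return True if sent. Never raises."""
    # Assumption: any non-empty value disables emission.
    if os.environ.get(OPT_OUT_ENV):
        return False
    try:
        hub_client.record_token_event(**rollup)
    except Exception:
        # Failure isolation: log and continue so the run itself still succeeds.
        logger.warning("state-hub token event failed; continuing", exc_info=True)
        return False
    return True
```

Catching `Exception` broadly is deliberate in this sketch, since the task text says a hub error of any kind must not fail the run.
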
### T06 — Workspace-level rollup CLI

```task
id: IB-WP-0019-T06
status: done
priority: medium
state_hub_task_id: "7cb34bfc-c562-4dda-a6d4-b44158644e19"
```

- Add `infospace-bench budget list <workspace>` that walks
  `infospaces/*/output/budget/` and prints a JSON table:
  `slug`, `plans_count`, `runs_count`, `total_tokens`,
  `total_cost_usd_known`, `total_cost_usd_estimated`, `last_run_at`
- Add `infospace-bench budget show <infospace-root>` that prints the
  full per-infospace budget structure
- Tests: empty workspace, multiple infospaces, missing budget dir is
  treated as zero, not an error

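The workspace walk behind `budget list` might look roughly like this. It is a sketch only: the `snapshots` and `runs` keys, the per-run field names, and the function names are assumptions. To stay dependency-free the loader defaults to `json.loads`; a real implementation would presumably pass a YAML loader such as PyYAML's `yaml.safe_load`.

```python
import json
from pathlib import Path


def budget_row(infospace_root: Path, load=json.loads) -> dict:
    """Summarise one infospace; a missing budget dir yields zeros, not an error."""
    budget_dir = infospace_root / "output" / "budget"

    def read(name: str) -> dict:
        path = budget_dir / name
        if not path.is_file():
            return {}
        return load(path.read_text(encoding="utf-8")) or {}

    plans, usage = read("plans.yaml"), read("usage.yaml")
    runs = usage.get("runs", [])
    return {
        "slug": infospace_root.name,
        "plans_count": len(plans.get("snapshots", [])),
        "runs_count": len(runs),
        "total_tokens": sum(r.get("total_tokens", 0) for r in runs),
        "total_cost_usd_known": sum(r.get("cost_usd_known", 0.0) for r in runs),
        "total_cost_usd_estimated": sum(r.get("cost_usd_estimated", 0.0) for r in runs),
        "last_run_at": max((r["recorded_at"] for r in runs if "recorded_at" in r), default=None),
    }


def budget_list(workspace: Path, load=json.loads) -> list:
    """Return one row per directory under <workspace>/infospaces, sorted by path."""
    base = workspace / "infospaces"
    if not base.is_dir():
        return []  # empty workspace
    roots = sorted(p for p in base.iterdir() if p.is_dir())
    return [budget_row(root, load) for root in roots]
```

Taking `max` over `recorded_at` strings relies on the timestamps being ISO 8601 in a single timezone, so that lexical order matches chronological order.
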
### T07 — Archive integration

```task
id: IB-WP-0019-T07
status: done
priority: low
state_hub_task_id: "b97906e0-2835-4246-9868-840c02d64fae"
```

- Confirm `output/budget/` ends up inside the archive package built by
  `IB-WP-0014`'s `archive_infospace()` (it should, via the existing
  default-include rules; verify with a test)
- Add a `budget_summary` field to the archive manifest so
  catalog-level tools can find the cost shape of an archived infospace
  without unpacking it

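A `budget_summary` block for the manifest could be assembled as in the sketch below. The function name and the per-run field names are hypothetical; the output keys follow the fields named for the summary (`plans_count`, `runs_count`, `total_tokens`, the two cost totals, and `latest_snapshot_id`), and an infospace with no budget data yields an empty dict.

```python
def build_budget_summary(plans: list, runs: list) -> dict:
    """Condense plan snapshots and run rollups into a manifest-sized summary."""
    if not plans and not runs:
        return {}  # no budget data: the archive still builds, with nothing to report
    return {
        "plans_count": len(plans),
        "runs_count": len(runs),
        "total_tokens": sum(r.get("total_tokens", 0) for r in runs),
        "total_cost_usd_known": sum(r.get("cost_usd_known", 0.0) for r in runs),
        "total_cost_usd_estimated": sum(r.get("cost_usd_estimated", 0.0) for r in runs),
        # Assumes snapshots are stored oldest-first, so the last one is the latest.
        "latest_snapshot_id": plans[-1]["snapshot_id"] if plans else None,
    }
```

Keeping the summary to a handful of scalars means a catalog tool can read it from the manifest alone, without opening `output/budget/`.
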
## Acceptance

- A `generate plan` invocation persists a snapshot to
  `output/budget/plans.yaml` and is idempotent across runs
- A `generate run` invocation appends a usage rollup to
  `output/budget/usage.yaml`, writes a variance summary, and emits one
  state-hub token event (when the hub is reachable)
- `generate status` and the generation-summary report surface the
  plan-vs-actual variance for the most recent run
- `infospace-bench budget list <workspace>` returns a parseable rollup
  across all infospaces in a workspace
- Archived infospace packages carry their budget log and expose a
  `budget_summary` field in the archive manifest
- Tests cover plan persistence, run rollup, rate-table resolution,
  variance, state-hub emission with hub-down isolation, and the
  workspace CLI

## Risks and open questions

- **Rate-table drift.** Provider prices change. The rate table will go
  stale unless someone refreshes it. Add `captured_at` to every entry
  and surface "rate older than 90 days" as a warning in budget output;
  do not block.
- **Multiple-provider cost.** When a single run mixes providers (e.g.
  fixture for cheap stages + OpenRouter for expensive ones), the
  rollup must split clearly. The model+provider bucketing in T02
  covers this; tests should pin the behaviour.
- **State-hub coupling.** Emitting token events introduces a
  cross-repo write. T05 keeps it opt-outable and failure-isolated, but
  callers running offline want zero coupling, so make sure the default
  is "emit if reachable, silent skip otherwise" rather than "fail if
  unreachable".
- **Concurrency.** Two `generate run` invocations on the same
  infospace would race on `usage.yaml`. Existing infospace workflows
  assume sequential runs; document the constraint rather than building
  locks.
- **Budget vs adaptive observations.** This workplan records *what
  happened*. `LLM-WP-0004` records *what we learned about quality*.
  Keep them as two distinct files / schemas so the layering stays
  inspectable; do not merge.
- **Privacy.** Usage records do not include prompt or completion
  text, only counts and identifiers. State-hub events likewise. If
  this assumption later changes, add an explicit redaction hook before
  doing so.

## Downstream effects

- `IB-WP-0018` (adaptive routing consumer) gains a local history to
  cross-check against the `QualityLedger` once `LLM-WP-0004` lands
- `IB-WP-0016-T07` (review report and output policy) can pull the
  variance summary directly instead of regenerating numbers
- `IB-WP-0014` archives become budget-bearing artifacts without code
  changes beyond T07's manifest field
- State-hub's `get_token_summary` finally sees infospace-bench runs
  alongside other domains' token spend