IB-WP-0019: budget and usage registry workplan (todo)
Open a separate workplan for the budget/usage recording layer surfaced by the T03 conversation.

Three-layer design: layer 1 (per-infospace budget log) and layer 3 (state-hub emission) live here; layer 2 (cross-application quality observations for adaptive routing) stays in llm-connect LLM-WP-0004.

Seven tasks cover plan snapshot persistence, run usage rollup, rate-table cost computation, plan-vs-actual variance, state-hub token events with hub-down isolation, a workspace-level rollup CLI, and archive integration so IB-WP-0014 packages carry their budget shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
256
workplans/IB-WP-0019-budget-and-usage-registry.md
Normal file
@@ -0,0 +1,256 @@
---
id: IB-WP-0019
type: workplan
title: "Budget and Usage Registry for Infospaces"
domain: markitect
repo: infospace-bench
status: todo
owner: markitect
topic_slug: markitect
created: "2026-05-17"
updated: "2026-05-17"
depends_on_workplans: []
related_workplans:
  - IB-WP-0016
  - IB-WP-0014
  - IB-WP-0018
  - LLM-WP-0004
---

# IB-WP-0019 — Budget and Usage Registry for Infospaces

## Goal

Persist budget and usage signals at the per-infospace layer and emit
organizational rollups, so every infospace can answer "what did we
estimate, what did we actually spend, on which model, at what cost"
without scraping commit messages or state-hub events.

This workplan owns the *recording and rollup* layer. It does **not**
own:

- Adaptive routing decisions or per-task quality grading — those belong
  to `llm-connect` `LLM-WP-0004` and the consumer workplan `IB-WP-0018`.
- Authoritative provider pricing — we read a small rate table and
  combine it with adapter-returned usage; the table itself is a static
  artifact that consumers refresh.

## Why

`IB-WP-0016-T03` made the planning estimates cheap to obtain (chunks,
calls, tokens, rough USD), but the numbers vanish after the JSON is
printed. Run records under `output/workflows/runs/*.yaml` capture
per-call `prompt_tokens` and `completion_tokens` but nothing rolls them
up, no cost is computed, and there is no plan-vs-actual variance.
Without this layer:

- Each new infospace re-discovers the same cost surprises
- `LLM-WP-0004`'s adaptive policy has no per-application history to
  learn from when it lands
- `IB-WP-0014`'s archive packages forget the budget shape of the work
  that produced them
- State-hub's organizational token ledger stays blind to
  infospace-bench runs

## Non-Goals

- Owning a cross-application quality ledger (that is `LLM-WP-0004`)
- Auto-refreshing provider price lists at runtime
- Failing a `generate run` when state-hub is unreachable
- Persisting full prompt text for retrospective replay (the existing
  run records already keep what is needed)

## Layered design (read first)

Three layers, each owned by a different repo:

| Layer | Lives in | Purpose | This workplan? |
|---|---|---|---|
| 1. Per-infospace budget log | `infospace-bench` (this workplan) | Plans + usage + variance, archived with the infospace | yes |
| 2. Cross-application observations | `llm-connect` (LLM-WP-0004) | Per-task per-adapter (cost, tokens, latency, quality) for adaptive routing | no |
| 3. Organizational rollup | `state-hub` (already exists) | `record_token_event` / `get_token_summary` across all projects | this workplan emits, hub stores |

## Tasks

### T01 — Plan snapshot persistence

```task
id: IB-WP-0019-T01
status: todo
priority: high
```

- Append the compact `plan_generation_summary` payload to
  `output/budget/plans.yaml` on every `generate plan` invocation
- Include a stable `snapshot_id` (hash of relevant fields), the stage,
  selection filters, and a `recorded_at` timestamp
- Cap the history length with a configurable retention (default keep
  last 50 snapshots; older snapshots are pruned with a single rollup
  entry preserved)
- Tests: round-trip, retention, repeat plans produce distinct snapshots
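
A minimal sketch of the snapshot and retention logic described above, operating on an in-memory list so that YAML reading and writing stay out of the picture. The function names, the entry layout, and the `pruned_rollup` marker are hypothetical. The sketch also assumes that `snapshot_id` hashes only the plan-defining fields, so two identical plans share an id while each invocation still appends its own entry; whether that is what "distinct snapshots" means is left open by the task.

```python
import hashlib
import json
from datetime import datetime, timezone

DEFAULT_RETENTION = 50  # "keep last 50 snapshots" from the task description


def snapshot_id(summary, stage, filters):
    """Hash the plan-defining fields only; the timestamp is deliberately excluded."""
    canonical = json.dumps(
        {"summary": summary, "stage": stage, "filters": filters},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def append_snapshot(history, summary, stage, filters,
                    retention=DEFAULT_RETENTION, now=None):
    """Return a new history with the snapshot appended and old entries pruned."""
    now = now or datetime.now(timezone.utc)
    entry = {
        "snapshot_id": snapshot_id(summary, stage, filters),
        "stage": stage,
        "filters": filters,
        "recorded_at": now.isoformat(),
        "summary": summary,
    }
    # Separate the (hypothetical) single rollup entry from real snapshots.
    rollup = None
    snapshots = []
    for item in history:
        if item.get("kind") == "pruned_rollup":
            rollup = dict(item)
        else:
            snapshots.append(item)
    snapshots.append(entry)

    overflow = len(snapshots) - retention
    if overflow > 0:
        pruned, snapshots = snapshots[:overflow], snapshots[overflow:]
        rollup = rollup or {"kind": "pruned_rollup", "pruned_count": 0}
        rollup["pruned_count"] += len(pruned)
        rollup["last_pruned_at"] = pruned[-1]["recorded_at"]
    return ([rollup] if rollup else []) + snapshots
```

Returning a new list instead of mutating the input keeps the round-trip and retention tests independent of any file handling.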

### T02 — Usage rollup from run records

```task
id: IB-WP-0019-T02
status: todo
priority: high
```

- On `run` and `resume` completion, scan the run-record YAML written by
  the workflow engine and aggregate per-call usage into
  `output/budget/usage.yaml`
- Aggregate buckets: workflow, stage, provider, model
- Fields per bucket: `calls`, `prompt_tokens`, `completion_tokens`,
  `total_tokens`, `cost_usd_known` (sum over calls with known cost),
  `cost_usd_estimated` (computed via rate table fallback)
- Append a top-level `runs[]` entry per completed run with the run's
  rollup, the `snapshot_id` of the plan it executed against (when one
  exists), and the wall-clock duration
- Tests: aggregate across multiple stages, fixture-mode produces zero
  cost without erroring, missing-usage entries do not abort the rollup
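
The aggregation could look roughly like the sketch below. It assumes each call record is a dict with `workflow`, `stage`, `provider`, and `model` keys plus an optional `usage` mapping; that layout is a guess, and only `prompt_tokens` and `completion_tokens` are named in the run records. It also reads the bucket list as one combined key rather than four separate rollups, and it leaves `cost_usd_estimated` to the rate-table resolver.

```python
from collections import defaultdict

BUCKET_KEYS = ("workflow", "stage", "provider", "model")
TOKEN_FIELDS = ("prompt_tokens", "completion_tokens")


def rollup_usage(calls):
    """Aggregate per-call usage into (workflow, stage, provider, model) buckets."""
    buckets = defaultdict(lambda: {
        "calls": 0,
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "cost_usd_known": 0.0,
    })
    for call in calls:
        key = tuple(call.get(name) for name in BUCKET_KEYS)
        bucket = buckets[key]
        bucket["calls"] += 1
        # A call without usage still counts as a call; it must not abort the rollup.
        usage = call.get("usage") or {}
        for field in TOKEN_FIELDS:
            bucket[field] += int(usage.get(field) or 0)
        bucket["total_tokens"] = bucket["prompt_tokens"] + bucket["completion_tokens"]
        if usage.get("cost_usd") is not None:
            bucket["cost_usd_known"] += float(usage["cost_usd"])
    # Flatten to a list of rows in first-seen order, ready to serialize.
    return [dict(zip(BUCKET_KEYS, key), **values) for key, values in buckets.items()]
```

A fixture-mode run with no usage data yields buckets with zero tokens and zero known cost, which matches the "zero cost without erroring" test case.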

### T03 — Cost computation from a rate table

```task
id: IB-WP-0019-T03
status: todo
priority: high
```

- Add `docs/model-rates.yaml` with `model -> {prompt_per_1k,
  completion_per_1k, currency, source_url, captured_at}` for the
  OpenRouter models we have actually used (start small: the ones
  currently exercised in tests/smoke)
- Resolver order: adapter-returned cost (when present) > rate table >
  unknown (recorded explicitly, not silently zeroed)
- Allow a per-workspace override via `${workspace}/model-rates.yaml`
  for self-hosted or private-rate setups
- Tests: known model, unknown model surfaces as `cost_usd: null` with
  `cost_status: "unknown"`, override file takes precedence
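
The resolver order can be sketched as a single function over already-loaded rate tables. The function name and the `"known"` and `"estimated"` status values are assumptions (only `"unknown"` is named above), and the sketch ignores the `currency` field by treating every rate as USD.

```python
def resolve_cost(model, prompt_tokens, completion_tokens,
                 adapter_cost=None, rates=None, overrides=None):
    """Resolve cost as: adapter-returned cost > rate table > explicit unknown."""
    if adapter_cost is not None:
        # An adapter-reported 0.0 is a real value and must not fall through.
        return {"cost_usd": float(adapter_cost), "cost_status": "known"}

    # Per-workspace overrides take precedence over the shared table.
    table = {**(rates or {}), **(overrides or {})}
    rate = table.get(model)
    if rate is None:
        # Record the gap explicitly instead of silently zeroing it.
        return {"cost_usd": None, "cost_status": "unknown"}

    cost = (
        (prompt_tokens / 1000) * rate["prompt_per_1k"]
        + (completion_tokens / 1000) * rate["completion_per_1k"]
    )
    return {"cost_usd": round(cost, 6), "cost_status": "estimated"}
```

For example, with a rate of 0.5 per 1k prompt tokens and 1.5 per 1k completion tokens, 2000 prompt tokens and 1000 completion tokens resolve to 2 × 0.5 + 1 × 1.5 = 2.5.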

### T04 — Plan-vs-actual variance and surfacing

```task
id: IB-WP-0019-T04
status: todo
priority: medium
```

- Compute a small variance record on each run: `actual_calls /
  estimated_calls`, `actual_tokens / estimated_tokens`, `actual_cost /
  estimated_cost`, plus per-stage variance
- Persist to `output/budget/summary.yaml` (overwrite each run; previous
  versions live in `usage.yaml` history)
- Surface a one-line variance summary in
  `reports/generation-summary.md` (touches T07 of IB-WP-0016)
- Add the variance summary to `generate status` JSON output
- Tests: zero-cost fixture run, known-model OpenRouter mock run,
  missing-plan run (variance fields are null but the run still records)
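
A sketch of the ratio computation, under the assumption that the plan and the run rollup are dicts keyed `estimated_*` and `actual_*` with an optional `stages` mapping of the same shape. The `*_ratio` output keys are hypothetical. A missing plan or a zero estimate yields `None` rather than a division error, which covers the fixture and missing-plan test cases.

```python
METRICS = ("calls", "tokens", "cost")


def _ratio(actual, estimated):
    """Return actual / estimated, or None when a side is missing or the estimate is zero."""
    if actual is None or estimated is None or estimated == 0:
        return None
    return actual / estimated


def _ratios(plan, actual):
    return {
        f"{metric}_ratio": _ratio(
            actual.get(f"actual_{metric}"), plan.get(f"estimated_{metric}")
        )
        for metric in METRICS
    }


def variance_record(plan, actual):
    """Build the run-level variance plus one entry per stage seen in the run."""
    plan = plan or {}  # a run without a plan still records, with null ratios
    record = _ratios(plan, actual)
    plan_stages = plan.get("stages") or {}
    record["stages"] = {
        stage: _ratios(plan_stages.get(stage) or {}, stage_actual)
        for stage, stage_actual in (actual.get("stages") or {}).items()
    }
    return record
```

For instance, a plan estimating 10 calls against 12 actual calls gives a `calls_ratio` of 1.2, meaning the run used 20% more calls than planned.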

### T05 — State-hub token-event emission

```task
id: IB-WP-0019-T05
status: todo
priority: medium
```

- After each completed run, call state-hub `record_token_event` with
  the run's rollup (tokens in/out, model, USD cost when known,
  `infospace_slug`, `workspace`)
- Emit at most one event per run; tag the event with the workplan
  context when available
- Failure isolation: a state-hub error must not fail the run; log the
  failure and continue
- Honor an opt-out env var `INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS`
- Tests: monkey-patched hub client, opt-out flag respected, run
  succeeds when the hub raises
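
The failure-isolation and opt-out behaviour might be wrapped as follows. This is a sketch under assumptions: the real signature of `record_token_event` is not given here, so the code assumes a client object whose method accepts keyword arguments, and the set of values treated as "opt-out enabled" is a guess. The logger name and return strings are likewise hypothetical.

```python
import logging
import os

log = logging.getLogger("infospace_bench.budget")  # hypothetical logger name
OPT_OUT_ENV = "INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS"
TRUTHY = {"1", "true", "yes"}  # assumed spelling of an enabled flag


def emit_token_event(hub_client, rollup, environ=None):
    """Send at most one token event for a completed run; never raise."""
    environ = os.environ if environ is None else environ
    if environ.get(OPT_OUT_ENV, "").strip().lower() in TRUTHY:
        return "skipped"
    try:
        # Assumption: the hub client accepts the rollup fields as keywords.
        hub_client.record_token_event(**rollup)
    except Exception as exc:
        # A hub outage is logged and swallowed so the run itself still succeeds.
        log.warning("state-hub token event failed: %s", exc)
        return "failed"
    return "emitted"
```

Passing `environ` explicitly keeps the opt-out test free of global state, and a stub client with a raising `record_token_event` is enough to exercise the hub-down path.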

### T06 — Workspace-level rollup CLI

```task
id: IB-WP-0019-T06
status: todo
priority: medium
```

- Add `infospace-bench budget list <workspace>` that walks
  `infospaces/*/output/budget/` and prints a JSON table:
  `slug`, `plans_count`, `runs_count`, `total_tokens`,
  `total_cost_usd_known`, `total_cost_usd_estimated`, `last_run_at`
- Add `infospace-bench budget show <infospace-root>` that prints the
  full per-infospace budget structure
- Tests: empty workspace, multiple infospaces, missing budget dir is
  treated as zero, not an error
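
One possible shape for the `budget list` walk, shown as a sketch only. The document loader is injected so the example stays dependency-free; in practice something like PyYAML's `yaml.safe_load` would be passed in. The per-run field names (`total_tokens`, `cost_usd_known`, `cost_usd_estimated`, `completed_at`) and the assumption that `plans.yaml` holds a list while `usage.yaml` holds a mapping with `runs` are guesses about the file layout.

```python
import json
from pathlib import Path


def summarize_infospace(root, load):
    """Summarize one infospace; a missing budget dir counts as zero, not an error."""
    budget = root / "output" / "budget"

    def read(name, default):
        path = budget / name
        if not path.is_file():
            return default
        return load(path.read_text(encoding="utf-8")) or default

    plans = read("plans.yaml", [])
    runs = read("usage.yaml", {}).get("runs") or []
    return {
        "slug": root.name,
        "plans_count": len(plans),
        "runs_count": len(runs),
        "total_tokens": sum(r.get("total_tokens") or 0 for r in runs),
        "total_cost_usd_known": sum(r.get("cost_usd_known") or 0.0 for r in runs),
        "total_cost_usd_estimated": sum(r.get("cost_usd_estimated") or 0.0 for r in runs),
        # Assumes ISO-8601 timestamps in one timezone, so string order is time order.
        "last_run_at": max(
            (r["completed_at"] for r in runs if r.get("completed_at")), default=None
        ),
    }


def list_budgets(workspace, load):
    """Walk infospaces/* under the workspace and return one row per infospace."""
    base = Path(workspace) / "infospaces"
    if not base.is_dir():
        return []
    return [summarize_infospace(p, load) for p in sorted(base.iterdir()) if p.is_dir()]


def render(rows):
    """Serialize the rows as the JSON table the CLI would print."""
    return json.dumps(rows, indent=2)
```

Keeping the walk separate from rendering means `budget show` could reuse the same loader without going through the table format.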

### T07 — Archive integration

```task
id: IB-WP-0019-T07
status: todo
priority: low
```

- Confirm `output/budget/` ends up inside the archive package built by
  `IB-WP-0014`'s `archive_infospace()` (it should, via the existing
  default-include rules — verify with a test)
- Add a `budget_summary` field to the archive manifest so
  catalog-level tools can find the cost shape of an archived infospace
  without unpacking it

## Acceptance

- A `generate plan` invocation persists a snapshot to
  `output/budget/plans.yaml` and is idempotent across runs
- A `generate run` invocation appends a usage rollup to
  `output/budget/usage.yaml`, writes a variance summary, and emits one
  state-hub token event (when the hub is reachable)
- `generate status` and the generation-summary report surface the
  plan-vs-actual variance for the most recent run
- `infospace-bench budget list <workspace>` returns a parseable rollup
  across all infospaces in a workspace
- Archived infospace packages carry their budget log and expose a
  `budget_summary` field in the archive manifest
- Tests cover plan persistence, run rollup, rate-table resolution,
  variance, state-hub emission with hub-down isolation, and the
  workspace CLI

## Risks and open questions

- **Rate-table drift.** Provider prices change. The rate table will go
  stale unless someone refreshes it. Add `captured_at` to every entry
  and surface "rate older than 90 days" as a warning in budget output;
  do not block.
- **Multiple-provider cost.** When a single run mixes providers (e.g.
  fixture for cheap stages + OpenRouter for expensive ones), the
  rollup must split clearly. The model+provider bucketing in T02
  covers this; tests should pin the behaviour.
- **State-hub coupling.** Emitting token events introduces a
  cross-repo write. T05 makes it failure-isolated and lets callers opt
  out, but callers running offline want zero coupling — make sure the
  default is "emit if reachable, silent skip otherwise" rather than
  "fail if unreachable".
- **Concurrency.** Two `generate run` invocations on the same
  infospace would race on `usage.yaml`. Existing infospace workflows
  assume sequential runs; document the constraint rather than building
  locks.
- **Budget vs adaptive observations.** This workplan records *what
  happened*. `LLM-WP-0004` records *what we learned about quality*.
  Keep them as two distinct files / schemas so the layering stays
  inspectable; do not merge.
- **Privacy.** Usage records do not include prompt or completion
  text — only counts and identifiers. State-hub events likewise. If
  this assumption later changes, add an explicit redaction hook before
  doing so.
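
The rate-table drift warning could be as small as the sketch below. It assumes `captured_at` is either an ISO `YYYY-MM-DD` string or an already-parsed `date` (YAML loaders often convert unquoted dates); the function name and message wording are hypothetical.

```python
from datetime import date, timedelta

MAX_RATE_AGE = timedelta(days=90)  # "rate older than 90 days" threshold


def stale_rate_warnings(rates, today):
    """Return warning strings for rate entries older than 90 days; never raise to block."""
    warnings = []
    for model, entry in sorted(rates.items()):
        captured = entry["captured_at"]
        if not isinstance(captured, date):
            captured = date.fromisoformat(captured)
        age = today - captured
        if age > MAX_RATE_AGE:
            warnings.append(
                f"rate for {model} is {age.days} days old (captured {captured.isoformat()})"
            )
    return warnings
```

Taking `today` as a parameter keeps the check deterministic in tests, and returning strings leaves it to the caller to decide where in the budget output the warning appears.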

## Downstream effects

- `IB-WP-0018` (adaptive routing consumer) gains a local history to
  cross-check against the `QualityLedger` once `LLM-WP-0004` lands
- `IB-WP-0016-T07` (review report and output policy) can pull the
  variance summary directly instead of regenerating numbers
- `IB-WP-0014` archives become budget-bearing artifacts without code
  changes beyond T07's manifest field
- State-hub's `get_token_summary` finally sees infospace-bench runs
  alongside other domains' token spend