
tegwick d65bc701da feat(token-tracking): record AI token consumption per task (CUST-WP-0029)

Introduces end-to-end token consumption tracking so agent work is
visible as a cost/effort metric alongside tasks and workplans.

- Migration o2j3k4l5m6n7: token_events table with FK indexes on
  task_id, workstream_id, repo_id, created_at
- ORM model, Pydantic schemas (TokenEventCreate, TokenEventRead with
  computed tokens_total, TokenSummary)
- Router: POST /token-events/, GET /token-events/ (7 filters),
  GET /token-events/summary/ (task|workstream|repo|commit|release scope)
- MCP tools: record_token_event, get_token_summary (formatted table)
- update_task_status enriched with optional tokens_in/tokens_out
  passthrough — one call creates status update + token event
- Dashboard token-cost.md page: by-repo bar, by-workplan table,
  by-model bar, top-10 tasks by tokens
- ralph-workplan skill updated with token reporting guidance and
  per-task heuristics for estimating counts
- Tests: test_token_events.py + test_token_passthrough.py (182 pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-29 17:46:46 +02:00



id	type	title	domain	repo	status	owner	topic_slug	created	updated	state_hub_workstream_id
CUST-WP-0029	workplan	Token Consumption Tracking	custodian	the-custodian	done	custodian	custodian	2026-03-29	2026-03-29	6f459d9f-b4d4-46d7-a5d7-d5f10721b29e

Token Consumption Tracking

Goal

Record AI token consumption at task granularity and aggregate it up to workstream, repo, commit, and release level. This makes agent work visible as a cost/effort metric that can be reviewed alongside tasks, workplans, and releases.

Background

Sessions produce no durable token signal today. Without it, there is no way to ask "how expensive was WP-0003?" or "which repo consumes the most tokens per release?". Token counts are the closest proxy for AI effort and cost that can be captured without external instrumentation.

Reporting model: agents self-report tokens when completing a task (alongside update_task_status). The ralph-workplan skill is updated to pass token data per iteration. Commit/release tagging is optional and manual.

Schema

token_events
  id            UUID PK
  task_id       UUID FK tasks (nullable)
  workstream_id UUID FK workstreams (nullable)
  repo_id       UUID FK managed_repos (nullable)
  session_id    TEXT          -- agent session identifier
  model         TEXT          -- e.g. "claude-sonnet-4-6"
  tokens_in     INT NOT NULL
  tokens_out    INT NOT NULL
  agent         TEXT          -- "custodian", "ralph", etc.
  ref_type      TEXT          -- 'task'|'workstream'|'commit'|'release'|'session'
  ref_id        TEXT          -- commit SHA, release tag, etc.
  note          TEXT
  created_at    TIMESTAMPTZ   server_default=now()

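The derived field tokens_total = tokens_in + tokens_out is computed at query time and is not stored. The aggregation endpoint rolls up by any FK axis (task, workstream, repo) and, for commit and release scopes, by ref_type and ref_id.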
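
The query-time derivation can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the real Postgres table, so UUID and TIMESTAMPTZ columns become TEXT and the foreign-key constraints are omitted; the sample rows and the rollup_by_workstream helper are hypothetical.

```python
import sqlite3

# Simplified stand-in for the Postgres table: TEXT replaces UUID/TIMESTAMPTZ
# and foreign-key constraints are left out to keep the sketch self-contained.
DDL = """
CREATE TABLE token_events (
    id            TEXT PRIMARY KEY,
    task_id       TEXT,
    workstream_id TEXT,
    repo_id       TEXT,
    model         TEXT,
    tokens_in     INTEGER NOT NULL,
    tokens_out    INTEGER NOT NULL,
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX ix_token_events_workstream_id ON token_events (workstream_id);
"""


def rollup_by_workstream(conn):
    """Sum tokens per workstream; tokens_total is derived in the query, not stored."""
    return conn.execute(
        """
        SELECT workstream_id,
               SUM(tokens_in)              AS tokens_in,
               SUM(tokens_out)             AS tokens_out,
               SUM(tokens_in + tokens_out) AS tokens_total,
               COUNT(*)                    AS event_count
        FROM token_events
        GROUP BY workstream_id
        ORDER BY workstream_id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany(
    "INSERT INTO token_events (id, task_id, workstream_id, model, tokens_in, tokens_out) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("e1", "t1", "ws-a", "claude-sonnet-4-6", 1200, 800),
        ("e2", "t2", "ws-a", "claude-sonnet-4-6", 1000, 500),
        ("e3", "t3", "ws-b", "claude-sonnet-4-6", 300, 200),
    ],
)
rollup = rollup_by_workstream(conn)
```

The same GROUP BY pattern would apply to task_id or repo_id; only the grouping column changes.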

Exit Criteria

Token events can be recorded via MCP tool
Aggregation queries work for task / workstream / repo / commit / release
Dashboard page shows token spend by repo, workplan, model
ralph-workplan logs a token event per completed task iteration
All tests passing; consistency check clean

Tasks

T01 — Migration: token_events table

id: CUST-WP-0029-T01
status: done
priority: high
state_hub_task_id: "5a758a61-4021-44e8-8f99-60a63cba6e50"

Write Alembic migration for token_events. Table as per schema above. Indexes: ix_token_events_task_id, ix_token_events_workstream_id, ix_token_events_repo_id, ix_token_events_created_at.

down_revision = current Alembic head (check with alembic heads).

Exit criteria: make migrate succeeds; table visible in psql.

T02 — Model and schema

id: CUST-WP-0029-T02
status: done
priority: high
state_hub_task_id: "57d71132-001a-4c85-bc39-2d20155c4971"

Add api/models/token_event.py (SQLAlchemy ORM, relationships to Task, Workstream, ManagedRepo). Add api/schemas/token_event.py:

TokenEventCreate — input (task_id, workstream_id, repo_id all nullable; tokens_in, tokens_out required; model, agent, ref_type, ref_id, note optional)
TokenEventRead — full row + tokens_total: int computed field
TokenSummary — aggregated view: { scope, scope_id, tokens_in, tokens_out, tokens_total, event_count, by_model: dict[str, int], by_agent: dict[str, int] }

Exit criteria: models import cleanly; tokens_total computed correctly.
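
As a rough illustration of the computed field, the sketch below uses standard-library dataclasses as a stand-in for the Pydantic schemas, so it shows the shape of the data rather than the actual api/schemas/token_event.py. The field names follow the list above; treating tokens_total as a read-only property is an assumption about how the computed field behaves.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TokenEventRead:
    """Stand-in for the read schema (the real one is a Pydantic model)."""
    id: str
    tokens_in: int
    tokens_out: int
    task_id: Optional[str] = None
    workstream_id: Optional[str] = None
    repo_id: Optional[str] = None
    model: Optional[str] = None
    agent: Optional[str] = None

    @property
    def tokens_total(self) -> int:
        # Derived on access, never stored.
        return self.tokens_in + self.tokens_out


@dataclass
class TokenSummary:
    """Aggregated view for one scope (task, workstream, repo, commit, release)."""
    scope: str
    scope_id: str
    tokens_in: int = 0
    tokens_out: int = 0
    event_count: int = 0
    by_model: Dict[str, int] = field(default_factory=dict)
    by_agent: Dict[str, int] = field(default_factory=dict)

    @property
    def tokens_total(self) -> int:
        return self.tokens_in + self.tokens_out


event = TokenEventRead(id="e1", tokens_in=1200, tokens_out=800, model="claude-sonnet-4-6")
```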

T03 — Router: CRUD + aggregation

id: CUST-WP-0029-T03
status: done
priority: high
state_hub_task_id: "14604f26-5aa6-455d-b65d-0e7a4ba42509"

Add api/routers/token_events.py:

POST /token-events/ — create event; auto-populate workstream_id from task if not provided (look up task.workstream_id)
GET /token-events/ — list with filters: task_id, workstream_id, repo_id, ref_type, ref_id, model, agent; default limit 100
GET /token-events/summary/ — aggregation; required query param scope (task|workstream|repo|commit|release) + id (the FK value or ref_id). Returns TokenSummary.

Exit criteria: all three endpoints return correct data; summary rolls up tokens_in/tokens_out and breaks down by model and agent.
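
The summary rollup could look roughly like the following sketch, which works on plain dicts instead of ORM rows. Two points are assumptions rather than facts from this workplan: that commit and release scopes match on ref_type plus ref_id, and that by_model and by_agent hold tokens_total per key. The summarize function and the sample events are hypothetical.

```python
from collections import defaultdict

# Assumed mapping from scope name to the event field that holds its identifier.
FK_SCOPES = {"task": "task_id", "workstream": "workstream_id", "repo": "repo_id"}
REF_SCOPES = {"commit", "release"}


def summarize(events, scope, scope_id):
    """Aggregate token events for one scope into a TokenSummary-shaped dict."""
    if scope in FK_SCOPES:
        key = FK_SCOPES[scope]
        matched = [e for e in events if e.get(key) == scope_id]
    elif scope in REF_SCOPES:
        matched = [
            e for e in events
            if e.get("ref_type") == scope and e.get("ref_id") == scope_id
        ]
    else:
        raise ValueError(f"unknown scope: {scope!r}")

    by_model = defaultdict(int)
    by_agent = defaultdict(int)
    for e in matched:
        total = e["tokens_in"] + e["tokens_out"]
        by_model[e.get("model") or "unknown"] += total
        by_agent[e.get("agent") or "unknown"] += total

    tokens_in = sum(e["tokens_in"] for e in matched)
    tokens_out = sum(e["tokens_out"] for e in matched)
    return {
        "scope": scope,
        "scope_id": scope_id,
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "tokens_total": tokens_in + tokens_out,
        "event_count": len(matched),
        "by_model": dict(by_model),
        "by_agent": dict(by_agent),
    }


events = [
    {"task_id": "t1", "workstream_id": "ws-a", "model": "claude-sonnet-4-6",
     "agent": "ralph", "tokens_in": 1200, "tokens_out": 800},
    {"task_id": "t2", "workstream_id": "ws-a", "model": "claude-sonnet-4-6",
     "agent": "custodian", "tokens_in": 1000, "tokens_out": 500},
    {"workstream_id": "ws-b", "ref_type": "commit", "ref_id": "abc123",
     "model": "claude-sonnet-4-6", "agent": "ralph", "tokens_in": 300, "tokens_out": 200},
]
summary = summarize(events, "workstream", "ws-a")
```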

T04 — MCP tools: record_token_event and get_token_summary

id: CUST-WP-0029-T04
status: done
priority: high
state_hub_task_id: "e032aa47-2bf9-4cd7-982c-4d21c9d5e286"

Add two tools to mcp_server/server.py:

record_token_event(tokens_in, tokens_out, task_id?, workstream_id?, repo_id?, model?, agent?, ref_type?, ref_id?, note?, session_id?)

POSTs to /token-events/
Returns the created event id and running total for the task/workstream

get_token_summary(scope, id)

GETs /token-events/summary/?scope=X&id=Y
Returns TokenSummary formatted as a readable table string

Update TOOLS.md with both tools.

Exit criteria: both tools callable from Claude Code MCP; record → read round-trip works.
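
The "readable table string" is not specified further, so the sketch below is only one possible layout. The format_summary helper and the sample summary are hypothetical; the keys mirror the TokenSummary fields from T02.

```python
def format_summary(summary: dict) -> str:
    """Render a TokenSummary-shaped dict as a fixed-width text table."""
    lines = [
        f"Token summary for {summary['scope']} {summary['scope_id']}",
        f"{'metric':<14}{'value':>12}",
        "-" * 26,
    ]
    for key in ("tokens_in", "tokens_out", "tokens_total", "event_count"):
        lines.append(f"{key:<14}{summary[key]:>12,}")

    # Breakdown sections, largest consumer first.
    for label in ("by_model", "by_agent"):
        lines.append(f"{label}:")
        breakdown = summary.get(label, {})
        for name, total in sorted(breakdown.items(), key=lambda kv: -kv[1]):
            lines.append(f"  {name:<24}{total:>10,}")
    return "\n".join(lines)


sample = {
    "scope": "workstream",
    "scope_id": "ws-a",
    "tokens_in": 2200,
    "tokens_out": 1300,
    "tokens_total": 3500,
    "event_count": 2,
    "by_model": {"claude-sonnet-4-6": 3500},
    "by_agent": {"ralph": 2000, "custodian": 1500},
}
table = format_summary(sample)
```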

T05 — Enrich update_task_status with optional token passthrough

id: CUST-WP-0029-T05
status: done
priority: medium
state_hub_task_id: "0c95c442-0b03-4acc-aba7-a986fd006416"

Extend the existing update_task_status MCP tool signature to accept optional tokens_in: int and tokens_out: int parameters. When provided, automatically call record_token_event internally (with the task's workstream_id and repo_id auto-populated). This lets agents report tokens in one call instead of two.

Keep the parameters optional — existing callers are unaffected.

Exit criteria: update_task_status(task_id, status='done', tokens_in=1200, tokens_out=800) creates both a status update and a token event.
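
A minimal sketch of the passthrough, using in-memory dicts in place of the state hub API. The TASKS and TOKEN_EVENTS stores and the return shape are hypothetical, and recording an event only when both token counts are supplied is an assumption; the workplan does not say what happens when only one is given.

```python
import uuid

# Hypothetical in-memory stand-ins for the tasks and token_events tables.
TASKS = {
    "task-1": {"status": "in_progress", "workstream_id": "ws-a", "repo_id": "repo-1"},
}
TOKEN_EVENTS = []


def record_token_event(tokens_in, tokens_out, task_id=None,
                       workstream_id=None, repo_id=None, model=None):
    """Append one token event and return it."""
    event = {
        "id": str(uuid.uuid4()),
        "task_id": task_id,
        "workstream_id": workstream_id,
        "repo_id": repo_id,
        "model": model,
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
    }
    TOKEN_EVENTS.append(event)
    return event


def update_task_status(task_id, status, tokens_in=None, tokens_out=None, model=None):
    """Update the task; if both token counts are given, also record a token event."""
    task = TASKS[task_id]
    task["status"] = status

    event = None
    if tokens_in is not None and tokens_out is not None:
        # workstream_id and repo_id are filled in from the task, not the caller.
        event = record_token_event(
            tokens_in, tokens_out,
            task_id=task_id,
            workstream_id=task["workstream_id"],
            repo_id=task["repo_id"],
            model=model,
        )
    return {
        "task_id": task_id,
        "status": status,
        "token_event_id": event["id"] if event else None,
    }


result = update_task_status("task-1", status="done", tokens_in=1200, tokens_out=800)
```

Callers that omit the token arguments take the same path as before and no event is written.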

T06 — Dashboard: token cost page

id: CUST-WP-0029-T06
status: done
priority: medium
state_hub_task_id: "02cc5d8e-a9da-4fb3-9c39-fdc05812d8d0"

Add dashboard/src/token-cost.md Observable page:

By repo bar chart — total tokens per repo (stacked in/out)
By workplan table — workstream slug, title, tokens_total, event_count, dominant model
By model breakdown — pie or bar; shows model mix across all events
Top 10 tasks by tokens — useful for identifying expensive tasks

Data loader: dashboard/src/data/token-summary.json.py — calls GET /token-events/summary/ for each repo and workstream.

Add page to observablehq.config.js nav under "Analytics".

Exit criteria: page renders with real data; updates on refresh.
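
Observable Framework data loaders write their result to standard output, so a token-summary.json.py loader might be structured as in the sketch below. The payload shape, the build_payload helper, and the fake_fetch stand-in are all assumptions; a real loader would replace fake_fetch with an HTTP call to GET /token-events/summary/ and obtain the repo and workstream ids from the API.

```python
import json
import sys


def build_payload(fetch_summary, repo_ids, workstream_ids):
    """Collect one summary per repo and per workstream into a single JSON-ready dict."""
    return {
        "repos": [fetch_summary("repo", rid) for rid in repo_ids],
        "workstreams": [fetch_summary("workstream", wid) for wid in workstream_ids],
    }


def fake_fetch(scope, scope_id):
    """Stand-in for the HTTP call; returns an empty TokenSummary-shaped dict."""
    return {
        "scope": scope,
        "scope_id": scope_id,
        "tokens_in": 0,
        "tokens_out": 0,
        "tokens_total": 0,
        "event_count": 0,
        "by_model": {},
        "by_agent": {},
    }


if __name__ == "__main__":
    # The loader's stdout becomes the contents of token-summary.json.
    payload = build_payload(fake_fetch, ["repo-1"], ["ws-a"])
    json.dump(payload, sys.stdout)
```

Passing the fetch function in as an argument keeps the payload assembly testable without a running API.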

T07 — ralph-workplan: log token event per completed task

id: CUST-WP-0029-T07
status: done
priority: medium
state_hub_task_id: "676f2a94-b5a5-467d-8dd4-889817acb159"

Update ~/.claude/plugins/ralph-workplan/ralph-workplan.md and ~/.claude/commands/ralph-workplan.md:

Add a note in Step 5 (loop active) instructing the agent that each time a task is marked done, it should report tokens via update_task_status (with tokens_in/tokens_out) or a standalone record_token_event call.

Guidance: estimate tokens from the Claude Code status bar (input/output shown at session end) or use a rough per-task heuristic (1000 in / 500 out) when exact counts are unavailable. Log model from the known session model (claude-sonnet-4-6 by default).

Exit criteria: both skill files updated; guidance is clear and actionable for an agent running a loop.
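
The fallback rule in the guidance can be expressed as a small helper. This is a sketch only: the token_report function is hypothetical, and using the note field to flag estimated counts is an assumption, not something the skill files are stated to do.

```python
# Defaults taken from the guidance above.
DEFAULT_TOKENS_IN = 1000
DEFAULT_TOKENS_OUT = 500
DEFAULT_MODEL = "claude-sonnet-4-6"


def token_report(exact_in=None, exact_out=None, model=None):
    """Build the token arguments for a completed task, falling back to the heuristic."""
    estimated = exact_in is None or exact_out is None
    return {
        "tokens_in": DEFAULT_TOKENS_IN if exact_in is None else exact_in,
        "tokens_out": DEFAULT_TOKENS_OUT if exact_out is None else exact_out,
        "model": model or DEFAULT_MODEL,
        # Assumption: mark heuristic values so they can be told apart later.
        "note": "estimated (per-task heuristic)" if estimated else None,
    }


exact = token_report(exact_in=1840, exact_out=620)
fallback = token_report()
```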

T08 — Tests and consistency gate

id: CUST-WP-0029-T08
status: done
priority: high
state_hub_task_id: "a3627144-9d98-4a3b-aa64-3079fd087448"

Add tests to state-hub/tests/:

test_token_events.py: create event, list with filter, summary aggregation (single task, cross-workstream rollup, by-model breakdown)
test_token_passthrough.py: update_task_status with tokens creates event

Run make test. Run make fix-consistency REPO=the-custodian.

Exit criteria: all new tests pass; consistency check clean; Alembic head matches DB.
