generated from coulomb/repo-seed
Fixed and improved token tracking
@@ -4,12 +4,12 @@ type: workplan
 title: "Multi-User Onboarding and Environment Bootstrap"
 domain: custodian
 repo: state-hub
-status: active
+status: finished
 owner: custodian
 topic_slug: custodian
 state_hub_workstream_id: "a28d9e29-4119-4b73-9469-f921920253ef"
 created: "2026-03-11"
-updated: "2026-05-17"
+updated: "2026-05-23"
 ---

 # Multi-User Onboarding and Environment Bootstrap
@@ -51,7 +51,7 @@ Two personas:
 ```task
 id: CUST-WP-0012-T01
 state_hub_task_id: 71628269-9a75-4dae-a347-e64a86040322
-status: todo
+status: done
 priority: medium
 ```

@@ -79,6 +79,12 @@ git config --global credential.helper 'cache --timeout=3600'
 **Done when:** included in bootstrap script; push to Gitea works without
 re-entering credentials on second attempt.

+**Implemented 2026-05-23:** `scripts/bootstrap-env.sh` configures a global
+credential helper when one is not already present. It prefers `libsecret`, uses
+`cache --timeout=3600` as the safe automatic fallback, and supports explicit
+headless plaintext storage via `--git-helper store --allow-plaintext-store`.
+`docs/onboarding.md` documents the tradeoffs.
+
 ---

 ### T02 — SSH key generation and authorization automation
@@ -86,7 +92,7 @@ re-entering credentials on second attempt.
 ```task
 id: CUST-WP-0012-T02
 state_hub_task_id: fea965e9-8a8f-439c-9096-8f7756eb71ed
-status: todo
+status: done
 priority: medium
 ```

@@ -110,6 +116,11 @@ ssh-copy-id -i ~/.ssh/id_ed25519.pub tegwick@92.205.130.254

 **Done when:** included in bootstrap script; documented in onboarding guide.

+**Implemented 2026-05-23:** `scripts/bootstrap-env.sh` generates
+`~/.ssh/id_ed25519` if missing, prints the public key, and can run
+`ssh-copy-id` for Railiance01 and CoulombCore with `--authorize-ssh`.
+`docs/onboarding.md` documents the operator and collaborator path.
+
 ---

 ### T03 — Claude Code MCP registration automation
@@ -117,7 +128,7 @@ ssh-copy-id -i ~/.ssh/id_ed25519.pub tegwick@92.205.130.254
 ```task
 id: CUST-WP-0012-T03
 state_hub_task_id: 60318e9a-972e-45c8-afde-82ed0625f594
-status: todo
+status: done
 priority: medium
 ```

@@ -132,10 +143,10 @@ make register-mcp # idempotent; safe to re-run

 The script should:
 1. Detect whether `state-hub` is already in `~/.claude.json`
-2. Extract the server config from `.mcp.json`
+2. Use the current SSE MCP config (`http://127.0.0.1:8001/sse` locally or
+   `http://127.0.0.1:18001/sse` through ops-bridge)
 3. Run `claude mcp add-json -s user state-hub <config>`
-4. Run `patch_mcp_cwd.py` to restore the cwd field
-5. Print instructions to restart Claude Code
+4. Print instructions to restart Claude Code

 Should also detect whether the state hub is reachable directly
 (`http://127.0.0.1:8000`) or needs a tunnel (via ops-bridge), and emit
@@ -144,6 +155,12 @@ a warning if neither is available.
 **Done when:** `make register-mcp` works on a clean machine; documented
 in onboarding guide.

+**Implemented 2026-05-23:** `scripts/register-mcp.sh` and the
+`make register-mcp` target register the current SSE MCP transport
+idempotently. The script detects local/tunnel reachability, supports
+`MCP_URL`, `API_BASE`, and `DRY_RUN=1`, and documents the old `.mcp.json` cwd
+patch path as legacy.
+
 ---

 ### T04 — Environment bootstrap script
@@ -151,7 +168,7 @@ in onboarding guide.
 ```task
 id: CUST-WP-0012-T04
 state_hub_task_id: 84a94761-e424-4470-a9a2-64d9cabadb7f
-status: todo
+status: done
 priority: high
 ```

@@ -176,6 +193,11 @@ Design constraints:
 **Done when:** running the script on a clean Ubuntu 24.04 machine
 produces a working Custodian environment with no additional manual steps.

+**Implemented 2026-05-23:** `scripts/bootstrap-env.sh` and
+`make bootstrap-env` provide the idempotent entrypoint. It supports dry-run,
+non-interactive mode, optional apt package installation, SSH authorization,
+Gitea token prompting, MCP registration, and State Hub health checks.
+
 ---

 ### T05 — Onboarding guide and user journey documentation
@@ -183,7 +205,7 @@ produces a working Custodian environment with no additional manual steps.
 ```task
 id: CUST-WP-0012-T05
 state_hub_task_id: b0839802-659a-475b-8b84-ab7341ea3d15
-status: todo
+status: done
 priority: medium
 ```

@@ -208,6 +230,10 @@ for both personas:
 **Done when:** a new collaborator can follow the guide without
 clarification from the primary operator.

+**Implemented 2026-05-23:** `docs/onboarding.md` covers primary operator and
+domain collaborator journeys, including SSH, Gitea token file, credential
+helper choices, MCP registration, tunnel setup, and verification checks.
+
 ---

 ### T06 — State Hub multi-user model (deferred)
@@ -215,7 +241,7 @@ clarification from the primary operator.
 ```task
 id: CUST-WP-0012-T06
 state_hub_task_id: d5df3302-67b9-4765-a8d8-ea2df53dff6e
-status: todo
+status: done
 priority: low
 ```

@@ -235,6 +261,11 @@ domain) or rely on Gitea repo permissions as the authoritative boundary
 Implement T01–T05 first; multi-user access control is only needed when
 there is more than one user.

+**Implemented 2026-05-23:** `docs/multi-user-access-model.md` records the
+current decision: repo permissions, SSH access, tunnels, and OpenBao remain the
+authoritative boundaries for this phase; State Hub API auth is deferred until a
+real second-user or exposed-deployment trigger exists.
+
 ---

 ## References

310
workplans/STATE-WP-0045-token-measurement-accuracy.md
Normal file
@@ -0,0 +1,310 @@
---
id: STATE-WP-0045
type: workplan
title: "Token Measurement Accuracy and Resilience"
domain: custodian
repo: state-hub
status: finished
owner: codex
topic_slug: custodian
created: "2026-05-23"
updated: "2026-05-23"
state_hub_workstream_id: "0aefe379-c182-4471-84dd-c136d5e1206b"
---

# Token Measurement Accuracy and Resilience

## Summary

Make State Hub token tracking accurate enough to trust for daily operations and
robust enough to survive agent/tool changes.

The May 19 flatline showed the current weak spots: token events mixed measured
usage, task-completion fallbacks, and file-sync side effects in the same table;
Claude measurement depended on one hook path; Codex usage lived in local session
logs until a manual backfill; and the dashboard treated every token event as the
same quality of evidence. The immediate fix restored Codex session totals and
suppressed sync-generated fallback events, but the system still needs a durable
measurement model, idempotent source adapters, reconciliation checks, and a
dashboard that exposes provenance and confidence.

## Current Findings

- `token_events` stores counts, associations, free-text notes, and timestamps,
  but not structured provenance such as source system, source event id, parser
  version, raw token categories, confidence, or whether the row is measured,
  allocated, estimated, or superseded.
- `PATCH /tasks/{id}` can still create heuristic token events on a transition to
  `done`. That fallback is useful as a temporary operational signal, but it is
  not a measurement and should not be blended into measured totals.
- `fix-consistency` now suppresses token events while syncing file-backed task
  status, but this is a narrow guard. Other bulk sync, import, and migration
  paths need the same invariant.
- Codex Desktop session logs contain structured `token_count` events with
  `last_token_usage`, `total_token_usage`, cached-input counts, and reasoning
  output counts. The new backfill script can restore these, but it is not yet a
  scheduled or monitored ingestion path.
- Claude Code measurement currently depends on `scripts/task_token_hook.py`
  firing after one MCP tool name. It uses per-session state in `/tmp`, so missed
  hooks, restarts, renamed tools, and non-MCP REST paths can silently degrade to
  fallback events.
- Repository attribution for Codex backfill is path-based. This is good enough
  for the emergency restore, but long-term attribution should prefer registered
  repo fingerprints/remotes and then fall back to paths.
- The Token Cost dashboard currently aggregates all events returned by
  `/token-events/?limit=1000`; it does not show measurement quality, source,
  superseded rows, ingestion freshness, or possible gaps.

## Out of Scope

- Exact billing reconciliation against vendor invoices.
- Capturing private transcript content in State Hub.
- Replacing existing task/workstream/repo relationships.
- Implementing every provider-specific parser in one pass. The first pass should
  cover Codex Desktop and Claude Code, with a documented adapter contract for
  others.

## T01 - Define Token Evidence Model

```task
id: STATE-WP-0045-T01
status: done
priority: high
state_hub_task_id: "29aed6d9-40aa-40fc-9e9a-3eb3e6f985bc"
```

Define a structured model that separates measured usage from allocated,
estimated, and superseded rows.

Implementation notes:

- Add a short design note or ADR section covering token event semantics.
- Define measurement classes such as `measured`, `allocated`, `estimated`, and
  `superseded`.
- Define source classes such as `codex_session`, `claude_transcript`,
  `llm_connect`, `manual`, and `task_fallback`.
- Define structured provenance fields: source system, source id, source path or
  URI, source timestamp, parser version, ingestion timestamp, and confidence.
- Decide how to represent raw token categories: input, cached input, output,
  reasoning output, and provider total.
- Decide whether cached input should be included in default totals or shown as a
  separate metric. Preserve enough fields to support both views.
- Replace free-text note taxonomy as the primary quality signal. Notes can
  remain for human context, but dashboards and APIs should rely on structured
  fields.

Done when the repo has a reviewed token evidence contract and the follow-on
schema/API tasks can implement it without ambiguity.

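One way to picture the evidence contract from T01 is as a small typed record. The following is only a sketch: `TokenEvidence`, its field names, and the `total` helper are hypothetical and not taken from the State Hub codebase; the enum values mirror the measurement and source classes listed in the notes. It also assumes the four token categories are disjoint. Some providers report cached input as a subset of input, in which case the arithmetic would need to change.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MeasurementKind(str, Enum):
    MEASURED = "measured"
    ALLOCATED = "allocated"
    ESTIMATED = "estimated"
    SUPERSEDED = "superseded"


class SourceClass(str, Enum):
    CODEX_SESSION = "codex_session"
    CLAUDE_TRANSCRIPT = "claude_transcript"
    LLM_CONNECT = "llm_connect"
    MANUAL = "manual"
    TASK_FALLBACK = "task_fallback"


@dataclass(frozen=True)
class TokenEvidence:
    """Hypothetical record shape; field names are illustrative only."""

    kind: MeasurementKind
    source: SourceClass
    source_id: Optional[str]
    input_tokens: int = 0            # assumed to exclude cached input
    cached_input_tokens: int = 0
    output_tokens: int = 0           # assumed to exclude reasoning output
    reasoning_output_tokens: int = 0
    confidence: float = 1.0

    def total(self, include_cached: bool = False) -> int:
        """Sum the categories; cached input is opt-in so both views stay available."""
        base = self.input_tokens + self.output_tokens + self.reasoning_output_tokens
        return base + (self.cached_input_tokens if include_cached else 0)

    def counts_as_measured(self) -> bool:
        return self.kind is MeasurementKind.MEASURED
```

Keeping cached input as its own field, rather than folding it into a single total, is what lets a dashboard offer either view later without re-ingesting the source.
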
## T02 - Add Provenance Schema and Idempotent Upsert API

```task
id: STATE-WP-0045-T02
status: done
priority: high
state_hub_task_id: "ade2bd40-343c-4829-ba4f-44bc8b7cbef9"
```

Extend token storage so source-derived events can be written repeatedly without
duplicates and without losing provenance.

Implementation notes:

- Add migration fields for the evidence model from T01. Candidate fields:
  `measurement_kind`, `source_provider`, `source_id`, `source_path`,
  `source_created_at`, `ingested_at`, `parser_version`, `confidence`,
  `cached_input_tokens`, `reasoning_output_tokens`, `raw_total_tokens`,
  `cost_estimated_usd`, and `raw_metadata`.
- Add a unique constraint or partial unique index that prevents duplicate
  measured source rows. For example: source provider plus source id, scoped by
  measurement kind.
- Provide an upsert endpoint or make `POST /token-events/` support an explicit
  idempotency key. The behavior should update a growing live session rather than
  creating a second row.
- Keep backward compatibility for existing clients that only post
  `tokens_in`/`tokens_out`, but classify those rows explicitly.
- Update schemas, router tests, and migration tests.

Done when source-backed token events can be inserted or updated idempotently and
legacy callers continue to work.

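The idempotent write described in T02 can be illustrated with an in-memory SQLite database. This is a sketch under assumptions, not the actual migration: the source does not say which database or migration tool State Hub uses, the table is reduced to a handful of the candidate columns, and classifying legacy posts as `estimated` is a placeholder choice. It relies on SQLite's `ON CONFLICT ... DO UPDATE` (available since SQLite 3.24) and on the fact that SQLite treats `NULL` values as distinct in a unique index, so legacy rows without a `source_id` never collide.

```python
import sqlite3

SCHEMA = """
CREATE TABLE token_events (
    id INTEGER PRIMARY KEY,
    measurement_kind TEXT NOT NULL,
    source_provider TEXT,
    source_id TEXT,
    tokens_in INTEGER NOT NULL,
    tokens_out INTEGER NOT NULL
);
CREATE UNIQUE INDEX uq_token_source
    ON token_events (source_provider, source_id, measurement_kind);
"""

UPSERT = """
INSERT INTO token_events
    (measurement_kind, source_provider, source_id, tokens_in, tokens_out)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (source_provider, source_id, measurement_kind)
DO UPDATE SET tokens_in = excluded.tokens_in,
              tokens_out = excluded.tokens_out
"""


def upsert_event(conn, kind, provider, source_id, tokens_in, tokens_out):
    """Insert a row, or overwrite the counts of the row with the same source key."""
    conn.execute(UPSERT, (kind, provider, source_id, tokens_in, tokens_out))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# A live session reported twice: the second write updates the first row.
upsert_event(conn, "measured", "codex_session", "sess-1", 100, 20)
upsert_event(conn, "measured", "codex_session", "sess-1", 250, 60)

# Legacy callers send no source id; each post stays its own row.
upsert_event(conn, "estimated", None, None, 10, 5)
upsert_event(conn, "estimated", None, None, 10, 5)

rows = conn.execute(
    "SELECT measurement_kind, source_id, tokens_in, tokens_out "
    "FROM token_events ORDER BY id"
).fetchall()
```

After these four writes the table holds three rows: one measured row for `sess-1` carrying the latest counts, and two separate legacy rows.
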
## T03 - Build Reusable Token Source Adapters

```task
id: STATE-WP-0045-T03
status: done
priority: high
state_hub_task_id: "3844fb70-4ceb-4f90-9894-d4845970f0a6"
```

Move source-specific parsing out of one-off scripts and hooks into reusable,
tested adapter modules.

Implementation notes:

- Add an `api/services/token_sources/` package or equivalent service layer.
- Implement a Codex Desktop adapter for `.codex/sessions/**` and
  `.codex/archived_sessions/**`.
- Implement a Claude Code adapter for `.claude/projects/**/*.jsonl` that reads
  usage metadata without storing transcript text.
- Provide a common adapter result type with source id, timestamps, token
  categories, model, agent, cwd/path context, and raw parser metadata.
- Make parsing safe by default: no conversation text in logs, progress events,
  token notes, or API payloads.
- Add fixtures with synthetic Codex and Claude session records that cover live
  sessions, archived sessions, duplicate files, malformed JSONL, resets, and
  missing usage records.
- Keep `scripts/backfill_codex_token_events.py` as a thin CLI over the reusable
  service or replace it with a new unified CLI.

Done when Codex and Claude token sources have deterministic parser tests and a
shared ingestion interface.

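A parser in the spirit of T03 might look like the sketch below. The record shape is an assumption made for illustration: it treats each JSONL line as a flat object with `"type": "token_count"` and a `total_token_usage` mapping, which may not match the real Codex Desktop log layout. `SessionUsage`, `parse_session`, and the field tuple are hypothetical names. The function reads only numeric counters, skips malformed lines, and treats a drop in the cumulative total as a counter reset.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable

# Assumed counter names inside total_token_usage.
FIELDS = (
    "input_tokens",
    "cached_input_tokens",
    "output_tokens",
    "reasoning_output_tokens",
)


@dataclass
class SessionUsage:
    source_id: str
    totals: Dict[str, int] = field(default_factory=lambda: dict.fromkeys(FIELDS, 0))
    malformed_lines: int = 0
    usage_records: int = 0


def parse_session(source_id: str, lines: Iterable[str]) -> SessionUsage:
    """Fold cumulative token_count records into one session total."""
    result = SessionUsage(source_id)
    carried = dict.fromkeys(FIELDS, 0)   # totals from segments before a reset
    segment = dict.fromkeys(FIELDS, 0)   # latest cumulative totals in this segment
    for line in lines:
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            result.malformed_lines += 1
            continue
        if not isinstance(record, dict) or record.get("type") != "token_count":
            continue  # other record types are ignored, so no text is copied
        usage = record.get("total_token_usage")
        if not isinstance(usage, dict):
            continue  # token_count record without usage data
        current = {name: int(usage.get(name) or 0) for name in FIELDS}
        if sum(current.values()) < sum(segment.values()):
            # The cumulative counter went backwards: bank the finished segment.
            for name in FIELDS:
                carried[name] += segment[name]
        segment = current
        result.usage_records += 1
    result.totals = {name: carried[name] + segment[name] for name in FIELDS}
    return result
```

Because the function takes an iterable of lines rather than a path, the same code can be exercised against synthetic fixtures and against real files opened by a thin CLI.
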
## T04 - Improve Repo, Workstream, and Task Attribution

```task
id: STATE-WP-0045-T04
status: done
priority: high
state_hub_task_id: "d78b36ea-2a1a-40d6-bd83-03d48ff2ad9b"
```

Make attribution accurate without relying solely on local path string matching.

Implementation notes:

- Resolve repo attribution by git root fingerprint and remote URL when possible,
  then fall back to registered host paths.
- Handle duplicate local paths or alias repos explicitly, especially where one
  checkout is registered under multiple slugs.
- Attribute session-level usage to repo first, then optionally to workstreams or
  tasks when there is strong evidence.
- Define task allocation rules that do not change measured session totals. For
  example, produce `allocated` child rows from measured session rows using task
  completion timestamps, tool-call metadata, or explicit operator input.
- Record the allocation method and confidence for every task-level allocation.
- Avoid minting task-level heuristic rows automatically for bulk import, status
  sync, migration, and consistency tooling.

Done when measured session totals are stable and task/workstream attribution is
explicitly either measured, allocated, or estimated.

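The remote-first, path-second resolution order from T04 could be sketched as follows. Everything here is illustrative: `normalize_remote`, `resolve_repo`, and the two lookup dictionaries are hypothetical stand-ins for whatever registry State Hub keeps, git root fingerprints are left out, and only POSIX-style paths are handled. The path fallback compares whole path components, so a sibling directory that merely shares a string prefix does not match.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple


def normalize_remote(url: str) -> str:
    """Reduce SSH and HTTPS remote URLs to a comparable 'host/owner/repo' key."""
    url = url.strip()
    scp_like = re.match(r"^[\w.-]+@([^:/]+):(.+)$", url)  # git@host:owner/repo.git
    if scp_like:
        host, path = scp_like.groups()
    else:
        without_scheme = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url, flags=re.I)
        without_user = without_scheme.split("@", 1)[-1]
        host, _, path = without_user.partition("/")
        host = host.split(":", 1)[0]  # drop an explicit port
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[:-4]
    return f"{host.lower()}/{path}"


def resolve_repo(
    remote_url: Optional[str],
    cwd: str,
    by_remote: Dict[str, str],
    by_path: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (repo_slug, method); method records how the match was made."""
    if remote_url:
        slug = by_remote.get(normalize_remote(remote_url))
        if slug:
            return slug, "remote"
    here = PurePosixPath(cwd)
    # Longest registered path first, so nested checkouts resolve to the inner repo.
    for root in sorted(by_path, key=len, reverse=True):
        root_path = PurePosixPath(root)
        if here == root_path or root_path in here.parents:
            return by_path[root], "path"
    return None, "unattributed"
```

Returning the method alongside the slug gives later stages something concrete to store as the attribution method, and makes unattributed sessions easy to count.
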
## T05 - Add Reconciliation, Gap Detection, and Backfill Operations

```task
id: STATE-WP-0045-T05
status: done
priority: high
state_hub_task_id: "efaa2629-4f9a-439c-b0a3-85d77b03580f"
```

Add an operator-safe reconciliation command that detects flatlines, duplicate
rows, stale ingestion, and fallback leakage.

Implementation notes:

- Add a command such as `make token-reconcile` or
  `python scripts/token_reconcile.py --since <date>`.
- Report sessions found, sessions ingested, sessions stale, duplicate source
  ids, fallback events, superseded rows, unattributed sessions, and rows missing
  structured provenance.
- Support `--dry-run` by default and `--apply` for writes.
- Include an explicit `--zero-superseded-fallbacks` or equivalent flag rather
  than silently editing historical rows.
- Store reconciliation summaries as progress events or report files without
  including transcript content.
- Add a canary threshold: alert or fail when measured token volume is zero while
  task/progress activity exists for the same window.

Done when an operator can run one command to verify token tracking health and
perform safe, idempotent backfills.

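Two of the checks named in T05, the flatline canary and duplicate source ids, are simple enough to sketch in plain Python. The `Row` type, the per-day activity mapping, and both function names are hypothetical; a real reconciliation command would read these from the State Hub API or database and would choose its own window size.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Row:
    day: date
    measurement_kind: str
    source_id: Optional[str]
    tokens: int


def flatline_days(rows: Sequence[Row], activity: Dict[date, int]) -> List[date]:
    """Days with task/progress activity but zero measured tokens."""
    measured = Counter()
    for row in rows:
        if row.measurement_kind == "measured":
            measured[row.day] += row.tokens
    return sorted(
        day for day, count in activity.items() if count > 0 and measured[day] == 0
    )


def duplicate_source_ids(rows: Sequence[Row]) -> List[Tuple[str, str]]:
    """(measurement_kind, source_id) pairs that appear on more than one row."""
    seen = Counter(
        (row.measurement_kind, row.source_id)
        for row in rows
        if row.source_id is not None
    )
    return sorted(key for key, count in seen.items() if count > 1)
```

Note that fallback or estimated rows on a given day do not clear the canary: only rows classified as `measured` count, which is the property that would have surfaced the May 19 flatline.
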
## T06 - Harden Hooks and Runtime Integration

```task
id: STATE-WP-0045-T06
status: done
priority: medium
state_hub_task_id: "5fd99241-e6dd-4ca6-8c58-a0048f08f0ca"
```

Make token collection survive hook misses, tool renames, restarts, and multiple
agent runtimes.

Implementation notes:

- Update Claude hook handling so it can match supported task completion paths,
  not just one exact MCP tool name.
- Persist hook high-water marks in a durable State Hub or repo-local location
  instead of only `/tmp`.
- Add hook health logging that records when a hook ran, what source id it
  processed, and whether it patched or skipped a token event.
- Add a Codex ingestion path that can run on demand and from a schedule without
  requiring manual script execution.
- Document required environment variables and path discovery for Windows, WSL,
  and remote Linux hosts.
- Ensure failures degrade to visible `estimated` events or health warnings, not
  silent flatlines.

Done when missing or stale token ingestion becomes visible within one reporting
window and can be recovered without ad hoc inspection.

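A durable high-water mark, as called for in T06, can be kept in a small JSON file that is replaced atomically on each update. The sketch below is one possible shape, not the hook's actual implementation: the file location, the function names, and the use of a single cumulative total per session are all assumptions. It writes to a temporary file in the same directory and then calls `os.replace`, so a crash mid-write leaves the previous file intact.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict


def load_marks(path: Path) -> Dict[str, int]:
    """Read the mark file; a missing or corrupt file is treated as empty."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    if not isinstance(data, dict):
        return {}
    return {str(key): int(value) for key, value in data.items()}


def save_marks(path: Path, marks: Dict[str, int]) -> None:
    """Write marks to a temp file, then atomically swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(marks, handle, sort_keys=True)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, path)  # atomic when source and target share a filesystem


def record_total(path: Path, session_id: str, cumulative_total: int) -> int:
    """Return tokens not yet reported for this session and advance its mark."""
    marks = load_marks(path)
    previous = marks.get(session_id, 0)
    delta = max(cumulative_total - previous, 0)
    if cumulative_total > previous:
        marks[session_id] = cumulative_total
        save_marks(path, marks)
    return delta
```

Calling `record_total` twice with the same cumulative total yields a delta of zero the second time, which is what makes a re-fired or restarted hook harmless.
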
## T07 - Upgrade Token APIs and Dashboard Quality Signals

```task
id: STATE-WP-0045-T07
status: done
priority: medium
state_hub_task_id: "ecaf6ff8-59aa-4c56-8163-125dc96b2068"
```

Expose token quality, source, and freshness in APIs and dashboard views.

Implementation notes:

- Add API filters for measurement kind, source provider, repo, time range,
  superseded rows, and unattributed rows.
- Replace the hard dashboard dependence on `/token-events/?limit=1000` with
  paginated or pre-aggregated endpoints that support time windows.
- Add dashboard controls for measured-only, include allocated, include
  estimates, and show superseded rows.
- Show ingestion freshness: last Codex session ingested, last Claude transcript
  ingested, and last reconciliation run.
- Add a data-quality section listing fallback events, unattributed measured
  sessions, duplicate source ids, and days with progress/task activity but zero
  measured tokens.
- Update the Token Cost page and docs so operators know which numbers are
  measured versus inferred.

Done when the dashboard no longer presents fallback, allocated, and measured
usage as indistinguishable totals.

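The dashboard controls in T07 come down to choosing which measurement kinds feed the headline number while still reporting every kind separately. A minimal sketch of that aggregation, assuming events arrive as dictionaries with `measurement_kind`, `tokens_in`, and `tokens_out` keys (the function name and return shape are hypothetical):

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, Mapping

MEASURED_ONLY: AbstractSet[str] = frozenset({"measured"})


def summarize(
    events: Iterable[Mapping[str, object]],
    include: AbstractSet[str] = MEASURED_ONLY,
) -> Dict[str, object]:
    """Total tokens per measurement kind, plus a headline over the chosen kinds."""
    by_kind: Dict[str, int] = defaultdict(int)
    for event in events:
        kind = str(event["measurement_kind"])
        by_kind[kind] += int(event["tokens_in"]) + int(event["tokens_out"])
    headline = sum(total for kind, total in by_kind.items() if kind in include)
    return {"headline_total": headline, "by_kind": dict(by_kind)}
```

With the default argument the headline counts measured rows only; passing `{"measured", "allocated"}` or adding `"estimated"` corresponds to the other toggles, and superseded rows stay out of the headline unless explicitly included.
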
## T08 - Verification and Migration Playbook

```task
id: STATE-WP-0045-T08
status: done
priority: medium
state_hub_task_id: "61baff79-832e-45f8-80f3-106abe262096"
```

Cover the new measurement system with tests and a safe rollout plan.

Implementation notes:

- Add unit tests for the evidence model, source adapters, source-id
  deduplication, repo attribution, and task allocation.
- Add router tests for idempotent upsert, source filters, measurement-kind
  filters, created-at preservation, and backwards-compatible legacy posts.
- Add reconciliation tests with synthetic pre-May-19 and post-May-19 flatline
  scenarios.
- Add dashboard/data-loader tests or fixture checks for quality filters and
  aggregate counts.
- Write a migration playbook covering old heuristic rows, existing
  `backfill:codex-session` rows, and any rows without structured provenance.
- Verify the full suite and run a dry-run reconciliation before marking this
  workplan finished.

Done when the improved token measurement path has automated coverage, an
operator playbook, and a dry-run reconciliation report showing no hidden
fallback leakage.