Fixed and improved token tracking

2026-05-23 13:59:05 +02:00
parent dd3279ea1a
commit c12091c2eb
29 changed files with 3549 additions and 278 deletions

@@ -0,0 +1,75 @@
# State Hub Multi-User Access Model
State Hub is local-first coordination infrastructure. It reflects repo-backed
workplans, progress, and operational state; it is not the authority for source
control, host access, identity, or runtime secret custody.
## Decision
For the current phase, enforce user access through the systems that already own
the boundary:
- Gitea controls repository read/write rights.
- SSH authorized keys control host access.
- ops-bridge controls whether a remote machine can reach local services.
- OpenBao controls runtime secret custody after bootstrap.
State Hub API authentication is deferred until there is an active external
collaborator or an exposed deployment that needs per-user write enforcement.
Until then, State Hub stays private to local or tunneled operator networks.
## Roles
| Role | State Hub access | Source of authority |
|------|------------------|---------------------|
| Primary operator | Full read/write across domains | host access, repo ownership, operator secret custody |
| Domain collaborator | Read all public coordination state; write through owned domain repo and approved hub actions | Gitea repo permissions plus SSH/tunnel authorization |
| Observer | Read-only brief/dashboard access where explicitly exposed | tunnel or future API token |
## Current Enforcement Boundary
1. Repo files remain authoritative. A collaborator can change workplans only in
repos where Gitea allows them to push.
2. State Hub indexes files and records progress events, but it should not become
the primary identity authority.
3. Direct dashboard/API access is private by default. Do not publish State Hub
unauthenticated on the public internet.
4. Runtime secrets, service account keys, database credentials, and package
tokens should move into OpenBao after the OpenBao bootstrap, unseal, audit,
and recovery procedure is complete.
## Future API Auth Trigger
Add API-layer auth when one of these becomes true:
- a second human needs direct State Hub API/dashboard mutation rights
- State Hub is exposed beyond localhost or a tightly controlled SSH tunnel
- automation needs per-consumer attribution and revocation independent of repo
commits
- domain-scoped write checks are needed at request time
## Future Token Shape
When the trigger is reached, implement a small token model rather than a full
identity provider inside State Hub:
- accept NetKingdom IAM Profile OIDC tokens when the identity plane is ready
- support one emergency local admin token for break-glass operation
- map claims to `primary_operator`, `domain_collaborator`, or `observer`
- enforce domain write scopes in mutating endpoints
- keep repo permissions as the durable source of contribution authority
Candidate scopes:
```text
statehub:read
statehub:write
statehub:domain:<slug>:write
statehub:admin
```
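
As an illustration of how a mutating endpoint might enforce the candidate scopes, here is a minimal sketch. The function name `can_write_domain` is hypothetical, and two points are assumptions rather than decisions recorded in this document: `statehub:admin` is treated as implying every domain write, and plain `statehub:write` is treated as not granting domain-scoped writes.

```python
def can_write_domain(scopes: set[str], domain_slug: str) -> bool:
    """Return True when the token scopes allow a write to one domain.

    Assumption: statehub:admin implies all domain writes.
    Assumption: statehub:write alone does not grant domain-scoped writes.
    """
    if "statehub:admin" in scopes:
        return True
    return f"statehub:domain:{domain_slug}:write" in scopes


# Example: a collaborator token scoped to a single, hypothetical domain slug.
collaborator_scopes = {"statehub:read", "statehub:domain:custodian:write"}
allowed = can_write_domain(collaborator_scopes, "custodian")
denied = can_write_domain(collaborator_scopes, "railiance")
```

Keeping the check a pure function of the scope set and the domain slug would let the same rule apply regardless of whether the scopes come from an OIDC token or the break-glass admin token.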
## Operator Rule
Do not store collaborator credentials in the State Hub database. Store secrets
in OpenBao or the approved bootstrap bundle, and store source permissions in
Gitea.

docs/onboarding.md

@@ -0,0 +1,212 @@
# State Hub Onboarding
This guide turns a new machine into a usable State Hub operator or collaborator
environment. It covers local credentials, SSH reachability, Gitea access, and
Claude Code MCP registration.
State Hub remains a coordination read/cache layer. Repo permissions, SSH
access, and controlled tunnels are the first access boundary. OpenBao is the
runtime secret authority for platform and workload secrets once its bootstrap
ceremony is complete.
## Quick Start
Clone the repo, then run the bootstrap script:
```bash
git clone https://gitea.coulomb.social/coulomb/state-hub.git ~/state-hub
cd ~/state-hub
make bootstrap-env
```
On a clean Ubuntu 24.04 machine, allow package installation explicitly:
```bash
make bootstrap-env ARGS="--install-missing"
```
For a remote machine that reaches State Hub through ops-bridge:
```bash
make bridges
make register-mcp MCP_URL=http://127.0.0.1:18001/sse API_BASE=http://127.0.0.1:18000
```
Restart Claude Code after MCP registration.
## Primary Operator: New Machine
1. Install minimal host prerequisites:
```bash
sudo apt-get update
sudo apt-get install -y git curl openssh-client make python3
```
2. Clone `state-hub` and any domain repo you expect to operate:
```bash
git clone https://gitea.coulomb.social/coulomb/state-hub.git ~/state-hub
git clone https://gitea.coulomb.social/coulomb/the-custodian.git ~/the-custodian
```
3. Run the bootstrap:
```bash
cd ~/state-hub
make bootstrap-env ARGS="--install-missing"
```
The script will:
- check required tools
- configure `git credential.helper`
- create `~/.ssh/id_ed25519` when missing
- print the public key for managed hosts
- create `~/.railiance_gitea.conf` when you provide a Gitea token
- register the State Hub MCP server for Claude Code
- check State Hub API reachability
4. Authorize the SSH key on managed hosts. If password or existing key access
is available, rerun:
```bash
make bootstrap-env ARGS="--authorize-ssh --skip-gitea --skip-mcp"
```
Default targets:
- `tegwick@92.205.62.239` for Railiance01
- `tegwick@92.205.130.254` for CoulombCore
5. Start or connect to State Hub:
```bash
make api
make mcp-http
```
If the hub is remote, use ops-bridge:
```bash
make bridges
```
6. Restart Claude Code and verify that `state-hub` appears in the MCP server
list. In the first session, call `get_state_summary()` if the MCP tools are
available. If they are not, fall back to:
```bash
cat .custodian-brief.md
curl -s "http://127.0.0.1:8000/workstreams/?status=active" | python3 -m json.tool
```
## Domain Collaborator: New Person
1. Get a Gitea account with write access to the relevant domain repo.
2. Clone this repo and the domain repo:
```bash
git clone https://gitea.coulomb.social/coulomb/state-hub.git ~/state-hub
git clone https://gitea.coulomb.social/coulomb/<domain-repo>.git ~/<domain-repo>
```
3. Run the bootstrap:
```bash
cd ~/state-hub
make bootstrap-env
```
4. Send the printed SSH public key to the operator, or authorize it yourself if
you already have host access:
```bash
ssh-copy-id -i ~/.ssh/id_ed25519.pub tegwick@92.205.62.239
```
5. Bring up the State Hub tunnel when direct local access is unavailable:
```bash
make bridges
make register-mcp MCP_URL=http://127.0.0.1:18001/sse API_BASE=http://127.0.0.1:18000
```
6. Restart Claude Code, open the domain repo, and orient from the repo brief:
```bash
cat .custodian-brief.md
```
7. Contribute work through repo-backed workplans. A new workplan lives under
`workplans/` and follows ADR-001. The hub indexes files; the files remain
authoritative.
## Credential Helper Choices
`make bootstrap-env` configures Git credentials only when no global helper is
already set.
Default behavior:
- use `libsecret` when the helper exists
- otherwise use `credential.helper=cache --timeout=3600`
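
The selection rule above can be sketched as a small decision function. This is not the bootstrap script's actual code: the function name and parameters are hypothetical, and the returned strings simply mirror the helper values named in this section.

```python
from typing import Optional


def choose_credential_helper(
    existing_global_helper: Optional[str],
    libsecret_available: bool,
) -> Optional[str]:
    """Return the helper value to configure, or None to leave Git untouched."""
    if existing_global_helper:
        # A global helper is already set, so the bootstrap changes nothing.
        return None
    if libsecret_available:
        return "libsecret"
    # Fallback: in-memory cache that expires after one hour.
    return "cache --timeout=3600"
```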
For headless hosts where a persistent plaintext helper is acceptable:
```bash
make bootstrap-env ARGS="--git-helper store --allow-plaintext-store"
```
Prefer SSH remotes or a keyring-backed helper for normal operator machines.
## Gitea Token File
Some Railiance scripts read `~/.railiance_gitea.conf`:
```bash
GITEA_URL="http://92.205.130.254:32166"
GITEA_USER="<user>"
GITEA_TOKEN="<token>"
```
Required token capabilities depend on the action:
- repo creation needs `read:user` and repository write/admin scope
- package publishing needs package write scope
- inventory reads need repository read scope
The bootstrap script writes this file with mode `0600` and does not print the
token.
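
For readers who need to create the file by hand, the following sketch shows one way to write it with mode `0600` from the start, without echoing the token. It is an illustration under assumptions, not the bootstrap script itself; `write_gitea_conf` is a hypothetical name.

```python
import os
from pathlib import Path


def write_gitea_conf(path: Path, url: str, user: str, token: str) -> None:
    """Write the Gitea config file so only the owner can read or write it."""
    content = (
        f'GITEA_URL="{url}"\n'
        f'GITEA_USER="{user}"\n'
        f'GITEA_TOKEN="{token}"\n'
    )
    # Create the file with 0600 so it is never readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # Tighten the mode in case the file already existed with looser bits.
    os.chmod(path, 0o600)
```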
## MCP Registration
Local registration:
```bash
make register-mcp
```
Tunnel registration:
```bash
make register-mcp MCP_URL=http://127.0.0.1:18001/sse API_BASE=http://127.0.0.1:18000
```
The current State Hub MCP transport is SSE. The old `.mcp.json`/stdio flow is
legacy; use `make mcp-http` to run the SSE service on `127.0.0.1:8001`.
## Verification Checklist
Run these checks after bootstrap:
```bash
git config --global --get credential.helper
test -f ~/.ssh/id_ed25519.pub
test -f ~/.railiance_gitea.conf
curl -fsS http://127.0.0.1:8000/state/health || curl -fsS http://127.0.0.1:18000/state/health
make register-mcp DRY_RUN=1
```
Then restart Claude Code and confirm that the `state-hub` MCP server is
available.
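
The health check in the list above tries the local API first and the tunneled port second. A rough Python equivalent, useful inside scripts, might look like this; the function names are hypothetical and the URLs are the ones used in the `curl` command.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

HEALTH_URLS = (
    "http://127.0.0.1:8000/state/health",   # local API
    "http://127.0.0.1:18000/state/health",  # ops-bridge tunnel
)


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def first_healthy(
    urls: Iterable[str] = HEALTH_URLS,
    probe: Callable[[str], bool] = http_ok,
) -> Optional[str]:
    """Return the first URL that passes the probe, or None if none do."""
    for url in urls:
        if probe(url):
            return url
    return None
```

Passing `probe` as a parameter keeps the fallback order testable without a running hub.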

@@ -0,0 +1,57 @@
# Token Evidence Model
State Hub token events distinguish source-backed measurements from inferred
operational signals. Dashboards and reports should use structured fields for
quality and provenance; `note` remains human context only.
## Measurement Kinds
| Kind | Meaning | Default confidence |
| --- | --- | --- |
| `measured` | Parsed from a source that reports usage metadata, such as Codex session logs or Claude transcript usage blocks. | `1.0` |
| `allocated` | A share of a larger known total, assigned to a task/workstream by a documented allocation method. | `0.70` |
| `estimated` | A fallback or operator-entered estimate without direct source evidence. | `0.35` |
| `superseded` | Historical rows retained for audit but excluded from active totals. | `0.0` |
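
To make the table concrete, here is a minimal sketch of how a report might apply the default confidences and keep `superseded` rows out of active totals. The row shape (plain dicts with `measurement_kind`, `tokens_in`, `tokens_out`) and the function names are assumptions for illustration, not the State Hub schema.

```python
DEFAULT_CONFIDENCE = {
    "measured": 1.0,
    "allocated": 0.70,
    "estimated": 0.35,
    "superseded": 0.0,
}


def confidence_for(row: dict) -> float:
    """Use the row's explicit confidence, else the default for its kind."""
    explicit = row.get("confidence")
    if explicit is not None:
        return explicit
    return DEFAULT_CONFIDENCE[row["measurement_kind"]]


def active_total(rows: list[dict]) -> int:
    """Sum tokens_in + tokens_out over rows that are not superseded."""
    return sum(
        row["tokens_in"] + row["tokens_out"]
        for row in rows
        if row["measurement_kind"] != "superseded"
    )
```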
## Source Providers
| Provider | Source |
| --- | --- |
| `codex_session` | Codex Desktop `.codex/sessions/**` and `.codex/archived_sessions/**` JSONL token_count events. |
| `claude_transcript` | Claude Code `.claude/projects/**/*.jsonl` usage metadata. Transcript text is never stored. |
| `llm_connect` | Future llm-connect usage metadata. |
| `manual` | Explicit operator/API input. |
| `task_fallback` | Fixed task-completion fallback rows created when no source data is available. |
## Provenance Fields
Each source-backed row should include:
- `source_provider`, `source_id`, `source_path`, `source_created_at`
- `parser_version`, `ingested_at`, `confidence`
- `cached_input_tokens`, `reasoning_output_tokens`, `raw_total_tokens`
- `raw_metadata` with parser and attribution metadata, never transcript content
`tokens_in + tokens_out` remains the default active total. Cached input and
reasoning output are preserved separately so dashboards can show both default
and provider-style totals without rewriting history.
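
A quality check for missing provenance could be sketched as follows. The field names come from the list above; treating a field as missing when it is absent or `None`, and the function name itself, are assumptions made for this example.

```python
REQUIRED_PROVENANCE_FIELDS = (
    "source_provider",
    "source_id",
    "source_path",
    "source_created_at",
    "parser_version",
    "ingested_at",
    "confidence",
    "cached_input_tokens",
    "reasoning_output_tokens",
    "raw_total_tokens",
    "raw_metadata",
)


def missing_provenance(row: dict) -> list[str]:
    """List the provenance fields that are absent or None on a row."""
    return [
        name
        for name in REQUIRED_PROVENANCE_FIELDS
        if row.get(name) is None
    ]
```

A zero value such as `cached_input_tokens = 0` counts as present, so rows from providers that report no cached input are not flagged.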
## Idempotency
Measured sources must be written with a stable `source_id`. State Hub enforces
one row for each `(measurement_kind, source_provider, source_id)` tuple, and
`POST /token-events/upsert` updates the row for a growing live session instead
of creating duplicates.
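
The idempotency rule can be illustrated with an in-memory stand-in for the table. This is a sketch of the keying behavior only: the dict-based store and the function name are hypothetical, and the real endpoint presumably relies on a database uniqueness constraint.

```python
def upsert_token_event(store: dict, event: dict) -> dict:
    """Insert or update one row per (kind, provider, source_id) tuple."""
    key = (
        event["measurement_kind"],
        event["source_provider"],
        event["source_id"],
    )
    existing = store.get(key)
    if existing is None:
        # First sighting of this source: create the row.
        store[key] = dict(event)
    else:
        # Same source seen again (a growing live session): update in place.
        existing.update(event)
    return store[key]


store: dict = {}
base = {
    "measurement_kind": "measured",
    "source_provider": "codex_session",
    "source_id": "session-abc",  # hypothetical stable id
}
upsert_token_event(store, {**base, "tokens_in": 100, "tokens_out": 20})
upsert_token_event(store, {**base, "tokens_in": 250, "tokens_out": 60})
```

After both calls the store holds a single row for the session, carrying the latest token counts.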
## Migration Playbook
1. Run the token-event provenance migration.
2. Run `python3 scripts/token_reconcile.py --since 2026-05-19` and inspect the
dry-run report.
3. Run `python3 scripts/token_reconcile.py --since 2026-05-19 --apply` to
upsert measured Codex/Claude source rows.
4. Run the same command with `--zero-superseded-fallbacks` only after measured
source rows cover the affected window.
5. Check `/token-events/quality/` or the Token Cost dashboard for fallback,
missing-provenance, duplicate-source, and unattributed measured signals.
6. Keep historical fallback rows as `superseded`; do not delete them.