Fetches /token-events/?limit=1000 in parallel with progress events and
renders a second area+line chart (amber) below the events-per-day chart,
aggregating tokens_in + tokens_out per calendar day over the same 30-day window.
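A minimal Python sketch of the per-day aggregation, for illustration only; the dashboard's own code is not shown in this log. It assumes each token event carries a naive ISO-8601 `created_at` string. That field name and the helper `tokens_per_day` are hypothetical.

```python
from datetime import date, datetime, timedelta


def tokens_per_day(events, today: date, window_days: int = 30) -> dict[date, int]:
    """Sum tokens_in + tokens_out per calendar day over a trailing window.

    Assumes each event has a `created_at` ISO-8601 string (hypothetical field).
    """
    start = today - timedelta(days=window_days - 1)
    # Pre-seed every day in the window so quiet days plot as zero, not gaps.
    totals = {start + timedelta(days=i): 0 for i in range(window_days)}
    for ev in events:
        day = datetime.fromisoformat(ev["created_at"]).date()
        if start <= day <= today:  # drop events outside the window
            totals[day] += (ev.get("tokens_in") or 0) + (ev.get("tokens_out") or 0)
    return totals
```

Pre-seeding all 30 days keeps the x-axis continuous, so a day with no events renders as zero in the area chart instead of being skipped.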
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three reactive dropdowns below the Token Cost heading:
- Filter by repo: client-side filter via 3-level chain resolution
- Sort by: Tokens Total (default), Tokens In, Out, Event Count, Most Recent
- Show: 10/20/50/100/500 rows per table (default 20)
Applies uniformly to By Repo, By Workplan, and Top Tasks tables.
"Most Recent" derives last_event_at per group from the fetched events.
Truncated tables show a "Showing M of N" count below.
Completes CUST-WP-0030 T07–T09.
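The filter/sort/limit pipeline might look like this as a Python sketch. The field names (`repo`, `created_at`, `last_event_at`) and the sort-key identifiers are assumptions, not the dashboard's actual option values, and the repo filter here compares an already-resolved `repo` field instead of performing the chain resolution itself.

```python
def last_event_at(events, group_key):
    """Latest created_at per group, derived from the fetched events."""
    latest = {}
    for ev in events:
        k = ev[group_key]
        # ISO-8601 strings in a single format compare correctly as plain strings.
        if k not in latest or ev["created_at"] > latest[k]:
            latest[k] = ev["created_at"]
    return latest


# Sort options keyed by hypothetical identifiers for the dropdown values.
SORT_KEYS = {
    "tokens_total": lambda g: g["tokens_in"] + g["tokens_out"],
    "tokens_in": lambda g: g["tokens_in"],
    "tokens_out": lambda g: g["tokens_out"],
    "event_count": lambda g: g["event_count"],
    "most_recent": lambda g: g.get("last_event_at") or "",
}


def apply_controls(rows, repo=None, sort_by="tokens_total", limit=20):
    """Filter by repo, sort descending, truncate, and build the footer text."""
    if repo is not None:
        rows = [r for r in rows if r["repo"] == repo]
    rows = sorted(rows, key=SORT_KEYS[sort_by], reverse=True)
    shown = rows[:limit]
    footer = f"Showing {len(shown)} of {len(rows)}" if len(rows) > limit else ""
    return shown, footer
```

Because the same `apply_controls` call is made for each table, the three tables stay consistent with one set of dropdown values.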
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
By Repo now resolves via the full chain rather than requiring repo_id
directly on the token event:
1. token_events.repo_id (direct)
2. → workstreams.repo_id (via workstream_id)
3. → task.workstream_id → workstreams.repo_id (via task_id)
Changes:
- Auto-populate repo_id on token events at creation time (both the
token_events router and the tasks router)
- New GET /token-events/by-repo/ endpoint with RepoTokenSummary schema;
returns tokens_in/out/total, event_count, by_model, by_note per repo
- Dashboard By Repo section uses /by-repo/ directly and shows repo_slug
instead of a truncated UUID
- Backfilled the three existing events (userbased) with repo_id via SQL
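A sketch of the three-level resolution and the per-repo rollup, assuming events, workstreams, and tasks are available as plain dicts. The dict shapes, the helper names, and summing tokens (rather than counting events) in `by_model`/`by_note` are assumptions; the real endpoint likely does this in SQL.

```python
from collections import defaultdict


def resolve_repo_id(event, workstreams, tasks):
    """Resolve a token event's repo via the three-level chain.

    `workstreams` maps workstream id -> {"repo_id": ...};
    `tasks` maps task id -> {"workstream_id": ...} (hypothetical shapes).
    """
    # 1. Direct repo_id on the event.
    if event.get("repo_id"):
        return event["repo_id"]
    # 2. Via the event's workstream.
    ws = workstreams.get(event.get("workstream_id"))
    if ws and ws.get("repo_id"):
        return ws["repo_id"]
    # 3. Via the event's task, then that task's workstream.
    task = tasks.get(event.get("task_id"))
    if task:
        ws = workstreams.get(task.get("workstream_id"))
        if ws and ws.get("repo_id"):
            return ws["repo_id"]
    return None


def summarize_by_repo(events, workstreams, tasks):
    """Roll events up into per-repo totals shaped loosely like RepoTokenSummary."""
    out = defaultdict(lambda: {"tokens_in": 0, "tokens_out": 0, "tokens_total": 0,
                               "event_count": 0, "by_model": defaultdict(int),
                               "by_note": defaultdict(int)})
    for ev in events:
        repo_id = resolve_repo_id(ev, workstreams, tasks)
        if repo_id is None:
            continue  # unresolvable events are left out of By Repo
        s = out[repo_id]
        total = ev["tokens_in"] + ev["tokens_out"]
        s["tokens_in"] += ev["tokens_in"]
        s["tokens_out"] += ev["tokens_out"]
        s["tokens_total"] += total
        s["event_count"] += 1
        s["by_model"][ev.get("model")] += total
        s["by_note"][ev.get("note")] += total
    return out
```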
185 tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tier 1 (exact counts) now defaults to note="measured" instead of null,
signalling the counts were read from the Claude Code status bar.
Callers can pass note="userbased" when a human provided the numbers.
- measured: agent read exact counts from the Claude Code status bar
- userbased: counts provided by a human
- workplan: prorated from workplan total across task count
- heuristic: server fallback, 1000/500, no agent input
Added token_note field to TaskUpdate schema and exposed note param on
update_task_status and record_interactive_task MCP tools.
TOOLS.md documents the full taxonomy. 185 tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New tool for capturing ad-hoc work done outside formal workplans.
Finds or creates a persistent 'interactive-<repo>' workstream for the
repo, creates the task, marks it done, and records a token event using
the three-tier logic — all in a single call.
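A simplified in-memory sketch of the single-call flow. The `store` layout and the helper names are hypothetical, and the token step is reduced to "exact counts, else heuristic" rather than the full three-tier logic.

```python
def find_or_create_workstream(workstreams, repo_slug):
    """Return the persistent interactive-<repo> workstream, creating it once."""
    name = f"interactive-{repo_slug}"
    for ws in workstreams:
        if ws["name"] == name:
            return ws
    ws = {"id": len(workstreams) + 1, "name": name, "repo": repo_slug}
    workstreams.append(ws)
    return ws


def record_interactive_task(store, repo_slug, title, tokens_in=None, tokens_out=None):
    """Create a done task plus a token event in one call (in-memory sketch)."""
    ws = find_or_create_workstream(store["workstreams"], repo_slug)
    task = {"id": len(store["tasks"]) + 1, "workstream_id": ws["id"],
            "title": title, "status": "done"}
    store["tasks"].append(task)
    # Simplified stand-in for the three-tier logic: exact counts, else heuristic.
    if tokens_in is not None and tokens_out is not None:
        event = {"tokens_in": tokens_in, "tokens_out": tokens_out, "note": "measured"}
    else:
        event = {"tokens_in": 1000, "tokens_out": 500, "note": "heuristic"}
    event["task_id"] = task["id"]
    store["token_events"].append(event)
    return task, event
```

Repeated calls for the same repo reuse one workstream, which is what keeps ad-hoc work grouped instead of scattering one-task workstreams.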
Seeded two example events on interactive-the-custodian:
- Three-tier token recording on task done (8000/3500)
- Add record_interactive_task MCP tool (4500/1800)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Token events are now always created when update_task_status is called
with status="done", using the best available data:
- Tier 1 (best): exact tokens_in + tokens_out passed by agent
- Tier 2: workplan_tokens_in + workplan_tokens_out prorated
  across workstream task count (note="workplan")
- Tier 3 (fallback): heuristic 1000 in / 500 out (note="heuristic")
Non-done status changes never create a token event.
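The tier selection can be sketched as a pure function. Prorating with integer division is an assumption, as is the `token_event_for` name; the tier-1 default of `note="measured"` reflects the later taxonomy change recorded in this log.

```python
HEURISTIC_IN, HEURISTIC_OUT = 1000, 500


def token_event_for(status, tokens_in=None, tokens_out=None,
                    workplan_tokens_in=None, workplan_tokens_out=None,
                    task_count=1, note=None):
    """Pick the best available token data for a status change.

    Returns None for non-done transitions; otherwise a dict for the token event.
    """
    if status != "done":
        return None  # only "done" ever records tokens
    # Tier 1: exact counts supplied by the agent.
    if tokens_in is not None and tokens_out is not None:
        return {"tokens_in": tokens_in, "tokens_out": tokens_out,
                "note": note or "measured"}
    # Tier 2: prorate the workplan total across the workstream's tasks.
    if workplan_tokens_in is not None and workplan_tokens_out is not None:
        n = max(task_count, 1)  # guard against an empty workstream
        return {"tokens_in": workplan_tokens_in // n,
                "tokens_out": workplan_tokens_out // n, "note": "workplan"}
    # Tier 3: server-side heuristic fallback.
    return {"tokens_in": HEURISTIC_IN, "tokens_out": HEURISTIC_OUT, "note": "heuristic"}
```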
MCP tool updated with workplan_tokens_in/out params and tiered docs.
Ralph-workplan skill files updated with the three-tier guidance.
184 tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The post-commit hook re-invokes fix-consistency, which commits writeback
changes, which re-triggers the hook — causing exponential process spawning.
Fix: pass GIT_CUSTODIAN_SYNC=1 in the env for all writeback git commits.
Update the post-commit hook (not tracked by git) to exit early when this
variable is set.
Also remove the --no-verify flag that was added as a failed attempt (it
only skips pre-commit/commit-msg, not post-commit hooks).
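A sketch of the guard, assuming the writeback step shells out to git via `subprocess`. The marker works because environment variables set on the `git commit` child process are inherited by the hooks git runs. `writeback_commit` and `hook_should_run` are hypothetical names.

```python
import os
import subprocess

SYNC_VAR = "GIT_CUSTODIAN_SYNC"


def writeback_commit(repo_path, paths, message):
    """Commit writeback changes with the sync marker set in the child env."""
    env = {**os.environ, SYNC_VAR: "1"}  # inherited by git and by its hooks
    subprocess.run(["git", "-C", repo_path, "add", "--", *paths], env=env, check=True)
    subprocess.run(["git", "-C", repo_path, "commit", "-m", message], env=env, check=True)


def hook_should_run(env=None):
    """Mirror of the post-commit hook's early exit, for illustration."""
    env = os.environ if env is None else env
    return env.get(SYNC_VAR) != "1"
```

The real hook is a shell script; the equivalent early exit there is `[ "$GIT_CUSTODIAN_SYNC" = "1" ] && exit 0` as its first command.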
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add git_fingerprint (root commit SHA-1) to managed_repos as a stable,
machine-independent identifier — identical across every clone regardless
of checkout path, remote URL, or SSH alias.
- Migration n1i2j3k4l5m6: adds git_fingerprint column + non-unique index
(non-unique to support repos that share ancestry via forks/splits)
- GET /repos/by-fingerprint?hash=<sha>[&remote_url=<url>]: lookup by
fingerprint; optional remote_url disambiguates shared-ancestry repos
- GET /repos/by-remote?url=<url>: fallback lookup by remote URL
- consistency_check.py --here [PATH]: auto-detects repo slug from any
local checkout via fingerprint (falls back to remote URL), then auto-
registers host_paths[hostname] so subsequent runs need no override
- --all now includes repos with host_paths[current_hostname], not just
those with local_path
- fix-consistency-here / check-consistency-here Makefile targets
- Fixed _api_get bug: httpx strips query strings when params={} is passed
- Backfilled fingerprints for 14 repos on this host
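A sketch of computing the fingerprint and resolving shared-ancestry collisions. `git rev-list --max-parents=0 HEAD` lists root commits; the rule for picking one when there are several, the dict keys, and the helper names are assumptions.

```python
import subprocess


def pick_root(rev_list_output: str):
    """Choose one root SHA from `git rev-list --max-parents=0 HEAD` output.

    Histories that merged unrelated projects can have several roots; taking the
    lexicographically smallest is an arbitrary but deterministic choice.
    """
    roots = sorted(line.strip() for line in rev_list_output.splitlines() if line.strip())
    return roots[0] if roots else None


def git_fingerprint(path: str):
    out = subprocess.run(["git", "-C", path, "rev-list", "--max-parents=0", "HEAD"],
                         capture_output=True, text=True, check=True).stdout
    return pick_root(out)


def lookup_by_fingerprint(repos, fingerprint, remote_url=None):
    """Find repos by fingerprint; use remote_url only to break shared-ancestry ties."""
    matches = [r for r in repos if r.get("git_fingerprint") == fingerprint]
    if remote_url is not None and len(matches) > 1:
        narrowed = [r for r in matches if r.get("remote_url") == remote_url]
        matches = narrowed or matches
    return matches
```

Using `remote_url` only as a tie-breaker keeps lookups stable when a clone's remote or SSH alias differs, which is the point of fingerprinting in the first place.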
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POST /topics/ was already implemented in the REST API but had no MCP
wrapper, so agents couldn't create topics (e.g. inter_hub) via MCP.
Tool follows the same pattern as create_domain.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `Optional` to typing imports in mcp_server/server.py — it was used
in 13 annotations but never imported, crashing FastMCP v3 at startup
- Remove legacy tunnel/tunnel-daemon/tunnel-loop/tunnel-status/tunnel-stop
targets from Makefile; ops-bridge (tunnels-up/status/check) supersedes them
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the dashboard's architecture, framework choice rationale, data-fetching
strategies (static loaders + live polling), component library, page inventory,
and key features including the Workstream Health Index and entity modals.
Also registers the new page in the Reference nav and adds a runbook section for
node overload / runaway agent process (INC-002) with a hardening checklist.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds _write_custodian_brief() to consistency_check.py. After every fix_repo()
run, a .custodian-brief.md is written to the repo root with: domain, last-synced
timestamp, current repo goal, active workstreams with progress (done/total), and
the first 7 open tasks per workstream (blocked → in_progress → todo order) with
task IDs. The file is git-committed when content changes so remote workers (e.g.
CoulombCore) can pull it and orient without a live MCP connection.
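A sketch of how one workstream section of the brief could be built. Task dict shapes, the heading format, and counting every task (including cancelled ones) in the total are assumptions; only the blocked, in_progress, todo ordering and the limit of 7 come from the description above.

```python
from pathlib import Path

# Open statuses in the order the brief lists them.
OPEN_ORDER = {"blocked": 0, "in_progress": 1, "todo": 2}


def open_tasks(tasks, limit=7):
    """First `limit` open tasks: blocked, then in_progress, then todo."""
    open_ = [t for t in tasks if t["status"] in OPEN_ORDER]
    # sorted() is stable, so file order is kept within each status.
    return sorted(open_, key=lambda t: OPEN_ORDER[t["status"]])[:limit]


def render_workstream(name, tasks):
    done = sum(t["status"] == "done" for t in tasks)
    lines = [f"### {name} ({done}/{len(tasks)} done)"]
    lines += [f"- [{t['status']}] {t['id']}: {t['title']}" for t in open_tasks(tasks)]
    return "\n".join(lines)


def write_if_changed(path: Path, content: str) -> bool:
    """Write the brief only when it differs, so unchanged runs make no commit."""
    if path.exists() and path.read_text() == content:
        return False
    path.write_text(content)
    return True
```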
Session protocol template and CLAUDE.md updated: read .custodian-brief.md first,
then call get_domain_summary() as an enhancement (skip if MCP unreachable).
This eliminates false "State hub is offline" alarms in subagents and remote workers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds --remote CLI flag and fix_all_remote() function. When run without a
REPO argument, the target checks all registered repos and:
- Skips repos whose local path does not exist on this machine
- Skips repos that are already clean (no fixable issues other than C-08
background noise, no FAILs, not behind remote)
- For repos that need work: git pull --ff-only then fix_repo()
Prints a summary of CLEAN (skipped) and NOT ON THIS HOST (skipped) repos
before the detailed fix reports.
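A sketch of the skip logic using hypothetical per-repo status fields. `pull` and `fix` are injected callables standing in for the `git pull --ff-only` and `fix_repo()` steps, so the bucketing can be read (and tested) without touching real repos.

```python
def classify(status):
    """Bucket a repo for fix_all_remote using hypothetical status fields."""
    if not status["path_exists"]:
        return "NOT ON THIS HOST"
    # C-08 is tolerated background noise; any other fixable code means work to do.
    only_noise = set(status["fixable"]) <= {"C-08"}
    if only_noise and not status["fails"] and not status["behind_remote"]:
        return "CLEAN"
    return "NEEDS WORK"


def fix_all_remote(statuses, pull, fix):
    """Return skipped buckets and the repos that were pulled and fixed."""
    skipped = {"CLEAN": [], "NOT ON THIS HOST": []}
    to_fix = []
    for slug, status in statuses.items():
        bucket = classify(status)
        if bucket in skipped:
            skipped[bucket].append(slug)
        else:
            to_fix.append(slug)
    for slug in to_fix:
        pull(slug)  # e.g. git pull --ff-only
        fix(slug)   # e.g. fix_repo()
    return skipped, to_fix
```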
Simplifies the Makefile target from shell-level curl+git to a single
uv run call using --remote. Same flag handles both single-repo and all-repos.
Also adds _git_pull() helper and 13 new tests (71 total in consistency suite).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_detect_behind_remote was comparing HEAD != @{u}, which incorrectly
triggered C-16 when the local repo had unpushed commits. Fixed to use
git rev-list --count HEAD..@{u}, which only counts commits the remote
has that local lacks. Adds test_returns_false_when_local_ahead.
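A sketch of the corrected check. Treating a missing upstream, a non-repo path, or a missing git binary as "not behind" mirrors the best-effort behaviour described for C-16, but the helper name and the error handling are assumptions.

```python
import subprocess


def detect_behind_remote(path: str) -> bool:
    """True only if the upstream has commits that the local HEAD lacks.

    `HEAD..@{u}` lists commits reachable from @{u} but not from HEAD, so
    local-only (unpushed) commits do not count.
    """
    try:
        result = subprocess.run(
            ["git", "-C", path, "rev-list", "--count", "HEAD..@{u}"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return False  # git not installed: best-effort, skip the check
    if result.returncode != 0:
        return False  # not a repo or no upstream configured: skip the check
    return int(result.stdout.strip() or "0") > 0
```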
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
T01 — No-regress rule (C-15): fix-consistency now detects when a DB task
status is ahead of the workplan file (e.g. marked done on CoulombCore)
and emits C-15 WARN instead of regressing the DB back to the stale file
value. STATUS_ORDER ranking: todo(0) < in_progress/blocked(1) < done/cancelled(2).
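A sketch of the no-regress comparison using the STATUS_ORDER ranking above. The function name and its `(finding, status_to_keep)` return shape are hypothetical.

```python
STATUS_ORDER = {"todo": 0, "in_progress": 1, "blocked": 1, "done": 2, "cancelled": 2}


def reconcile(db_status: str, file_status: str):
    """Return (finding, status_to_keep) for one task under the no-regress rule."""
    if STATUS_ORDER[db_status] > STATUS_ORDER[file_status]:
        # DB is ahead (e.g. marked done on another host): warn, do not regress.
        return "C-15 WARN", db_status
    # Otherwise the workplan file remains the source for the DB value.
    return None, file_status
```

Equal ranks (for example `blocked` vs `in_progress`) still follow the file, so only a strict move backwards in the ordering is refused.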
T02 — Pull gate (C-16): fix_repo runs git fetch + rev-parse at the start
of every --fix run. If the local repo is behind its remote tracking branch,
all write operations are skipped and C-16 WARN is emitted. Best-effort:
offline/no-remote silently skips the check.
T03 — DB→file writeback: C-15 fix path patches the status field in the
matching task block and git-commits the change with a standard message.
--no-writeback flag disables writeback while keeping T01/T02 active.
T04 — CLAUDE.md + session-protocol.template updated with new guidance,
C-15/C-16 semantics, and fix-consistency-remote recommendation.
T05 — Makefile: fix-consistency-remote pulls then fixes in one step.
16 new tests; 155 passed total.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a structured dispute mechanism when capability request routing is wrong:
- New `routing_disputed` status with four DB columns (dispute_reason, disputed_by,
dispute_suggested_domain, disputed_at) via Alembic migration m0h1i2j3k4l5
- POST /capability-requests/{id}/dispute — any party can flag misrouting with a reason
and optional suggested domain; notifies custodian + current fulfilling domain
- POST /capability-requests/{id}/reroute — custodian re-routes to correct domain via
catalog_entry_id or direct slug; appends audit trail to routing_note; resets to requested
- Two new MCP tools: dispute_capability_routing and reroute_capability_request
- Dashboard: amber disputed-banner at top of Summary, routing_disputed Kanban column,
dispute details (reason, suggested domain, raised-by) shown on disputed cards
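A sketch of the dispute and reroute state changes on a plain dict. The `fulfilling_domain` field, the audit-line format, and the returned notification list are assumptions; the column names and status values come from the description above.

```python
from datetime import datetime, timezone


def dispute(req, reason, disputed_by, suggested_domain=None):
    """Flag misrouting; returns the parties to notify (hypothetical shape)."""
    req.update(
        status="routing_disputed",
        dispute_reason=reason,
        disputed_by=disputed_by,
        dispute_suggested_domain=suggested_domain,
        disputed_at=datetime.now(timezone.utc).isoformat(),
    )
    return ["custodian", req["fulfilling_domain"]]


def reroute(req, new_domain):
    """Custodian re-route: append an audit line and reset to requested."""
    old = req["fulfilling_domain"]
    audit = f"rerouted {old} -> {new_domain}"  # hypothetical audit format
    req["routing_note"] = f"{req['routing_note']}\n{audit}" if req.get("routing_note") else audit
    req["fulfilling_domain"] = new_domain
    req["status"] = "requested"
    return req
```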
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New page at /tools listing all connected applications grouped by
category: Local Services (State Hub API, KeePassXC, pgAdmin, ops-bridge),
Source Control (Gitea), Identity/Auth (KeyCape, Authelia, privacyIDEA,
LLDAP), and Dev Tooling (Claude Code, uv). Local services show live
green/red/grey status dots via no-cors fetch probes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `routing_note` column (migration l9g0h1i2j3k4) to persist why a request was routed to a given domain
- Fix substring-match bug in `_route_capability`: use `\b` word-boundary regex so 'postgres' no longer matches inside 'postgresql'
- Include `title` in keyword scoring for better routing accuracy
- Return `routing_note` string from `_route_capability` and store it on the request
- Add `PATCH /capability-requests/{id}` endpoint + `CapabilityRequestPatch` schema to correct mutable metadata (catalog_entry_id, priority, blocking_task_id, fulfilling_workstream_id)
- Add `patch_capability_request` MCP tool wrapping the new endpoint
- Add 105 lines of routing tests (word-boundary, title-match, multi-entry scoring, broadcast fallback)
- Add `tunnels-up`, `tunnels-status`, `tunnels-check` Makefile targets for ops-bridge managed tunnels
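A sketch of word-boundary keyword scoring that includes the title. The catalog entry shape, first-wins tie-breaking, and the note wording are assumptions; the regex shows why `postgres` no longer matches inside `postgresql`.

```python
import re


def keyword_score(keywords, text):
    """Count keywords that occur as whole words (case-insensitive)."""
    return sum(
        1 for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", text, re.IGNORECASE)
    )


def route_capability(request, catalog):
    """Return (domain or None, routing_note) for the best-scoring catalog entry."""
    # Title is scored together with the description.
    text = f"{request.get('title', '')} {request.get('description', '')}"
    best_domain, best_score = None, 0
    for entry in catalog:
        score = keyword_score(entry["keywords"], text)
        if score > best_score:  # first entry wins ties
            best_domain, best_score = entry["domain"], score
    if best_domain is None:
        return None, "no keyword match; broadcast fallback"
    return best_domain, f"matched {best_score} keyword(s) for {best_domain}"
```

`re.escape` keeps keywords containing regex metacharacters (such as dots) from being interpreted as patterns.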
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
New page (docs/state-hub.md) covers:
- Why: the invisible state problem across repos and agents
- What: Derived Data Store, Read Model, Agent Orchestration Layer,
Cross-Repo Observatory — and what it is NOT
- Derived Data Store principle (ADR-003): fingerprint cache, rebuild
guarantee, force-refresh
- Repository Orchestrator: session protocol, cross-domain coordination
via messages + capability routing, Kaizen agents
- Architecture diagram (ASCII), technology choices, data model overview
- Running the hub, design principles, related docs
reference.md: add Architecture & Design section grouping state-hub,
TPSC, GDPR maturity, SCOPE.md, capabilities, and goals docs.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Root cause: C2/C9/C10 each made a full HTTP round-trip back to the API
(asyncio.to_thread → urllib → TCP → uvicorn → SQLAlchemy → DB) for every
repo. 16 repos × 3 calls = 48 self-calls at ~80-150ms each = ~6s total.
Fix: doi_engine.evaluate() accepts a prefetch dict. The summary endpoint
runs 3 bulk GROUP BY queries (domain status, TPSC snapshot counts, active
goal counts) and passes results directly — zero HTTP self-calls in summary
mode.
Result: /repos/doi/summary 6s → <1s (6× improvement on top of prior 13×).
Total improvement from original: 108s → <1s.
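A sketch of the prefetch switch. Mapping C2/C9/C10 to domain status, TPSC snapshot count, and active goal count follows the three bulk queries listed above, but the pass conditions, the prefetch keys, and the `self_call` signature are assumptions.

```python
import asyncio


async def evaluate(slug, prefetch=None, self_call=None):
    """Evaluate three criteria from prefetched bulk data or via self-calls.

    `prefetch` holds dicts keyed by repo slug, as produced by GROUP BY queries;
    `self_call(slug, key)` is a blocking per-repo lookup (hypothetical).
    """
    keys = ("domain_status", "tpsc_snapshots", "active_goals")
    if prefetch is not None:
        # Summary mode: plain dict lookups, no HTTP round-trips.
        values = [prefetch[k].get(slug) for k in keys]
    else:
        # Per-repo mode: one blocking call per criterion, run off the event loop.
        values = await asyncio.gather(
            *(asyncio.to_thread(self_call, slug, k) for k in keys)
        )
    domain_status, tpsc, goals = values
    return {
        "C2": domain_status is not None,
        "C9": (tpsc or 0) > 0,
        "C10": (goals or 0) > 0,
    }
```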
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Page now renders in ~200ms. DoI badges and KPI card show a spinner
while the background fetch resolves (~6s), then update reactively
via Observable Mutable pattern (doiData / doiLoading).
Fast path: repos, SBOM, domains, workstreams — immediate render.
Slow path: /repos/doi/summary — background, non-blocking.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
1. skip_consistency=True in summary mode — omits C7/C13 subprocess calls
(consistency_check.py) which were the main bottleneck (32 spawns for 16 repos).
Full check still available per-repo via GET /repos/{slug}/doi.
2. asyncio.gather — all repos evaluated in parallel instead of sequentially.
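A self-contained sketch of the parallel summary. `evaluate_repo` is a stand-in that sleeps to simulate I/O; it is not the real engine, and the timing helper is only there to make the concurrency visible.

```python
import asyncio
import time


async def evaluate_repo(slug, skip_consistency=False):
    """Stand-in for one repo's DoI evaluation; sleep simulates I/O."""
    await asyncio.sleep(0.05)
    # In summary mode the subprocess-backed consistency criteria are skipped.
    return {"slug": slug, "consistency": "skipped" if skip_consistency else "checked"}


async def doi_summary(slugs):
    # gather() runs all evaluations concurrently and keeps input order.
    return await asyncio.gather(
        *(evaluate_repo(s, skip_consistency=True) for s in slugs)
    )


def timed_summary(slugs):
    start = time.perf_counter()
    results = asyncio.run(doi_summary(slugs))
    return results, time.perf_counter() - start
```

With 16 repos, a sequential loop would take about 16 × 0.05 s here, while `gather` finishes in roughly the time of one evaluation.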
Also: rename Repositories page title from "Repos" to "Repositories".
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Implements the 14-criterion DoI checklist as a runnable gate with API,
MCP tools, CLI script, and dashboard integration.
Core components:
- api/doi_engine.py — async engine evaluating all 14 criteria (asyncio.to_thread
for non-blocking HTTP self-calls), shared by API and CLI
- api/schemas/doi.py — DoICriterion, DoIReport, DoISummaryEntry schemas
- api/routers/repos.py — GET /repos/{slug}/doi + GET /repos/doi/summary
- scripts/check_doi.py — CLI: make check-doi REPO=<slug> / check-doi-all
- mcp_server/server.py — check_repo_doi(), get_doi_summary() tools
Dashboard (repos.md):
- DoI tier badge per repo (None/Core/Standard/Full) colour-coded red→green
- Domain block shows lowest DoI tier across its repos
- DoI KPI card in summary row
- DoI filter in All Repos Table
- Link to Repository DoI policy page
Also fixes: TPSC snapshots 500 error (missing nested selectinload for
catalog_entry relationship in list_snapshots endpoint).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Three-tier checklist defining what 'fully integrated with the state-hub'
means for a repository:
- Core (Registered): registered, domain assigned, path resolves, remote URL
- Standard (Integrated): SCOPE.md, CLAUDE.md, workplan convention, SBOM, TPSC
- Full (Fully Integrated): repo goal, capabilities declared, agents template,
clean consistency check, host paths registered
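A sketch of tier computation from the checklist above, treating tiers as cumulative. That is an assumption: under it, a repo with every Full item but a missing Standard item stays at Core. Criterion keys are hypothetical names for the 14 items; the domain roll-up uses the "lowest tier across its repos" rule.

```python
TIERS = ["None", "Core", "Standard", "Full"]

# Criterion keys are hypothetical names for the 14 checklist items.
CRITERIA = {
    "Core": ["registered", "domain_assigned", "path_resolves", "remote_url"],
    "Standard": ["scope_md", "claude_md", "workplan_convention", "sbom", "tpsc"],
    "Full": ["repo_goal", "capabilities_declared", "agents_template",
             "consistency_clean", "host_paths_registered"],
}


def doi_tier(passed: dict) -> str:
    """Highest tier whose criteria, and all lower tiers' criteria, pass."""
    tier = "None"
    for name in TIERS[1:]:
        if all(passed.get(c, False) for c in CRITERIA[name]):
            tier = name
        else:
            break  # tiers are cumulative: a gap stops promotion
    return tier


def domain_tier(repo_tiers) -> str:
    """A domain shows the lowest tier among its repos."""
    return min(repo_tiers, key=TIERS.index, default="None")
```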
Exposed via /policy/repo-doi (editable in dashboard) and linked under Policies.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Stale host_paths entries (wrong username, old machine) were silently overriding
the correct local_path, causing FileNotFoundError on tools like list_kaizen_agents.
Extracts _resolve_repo_path(repo) helper that tries host_paths[hostname] first
but validates the path exists on disk before trusting it, then falls back to
local_path. Both candidates support ~ expansion. Applied to all 4 call sites:
_kaizen_agents_dir, validate_repo_adr, check_repo_consistency, ingest_sbom_tool.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Introduces a capability catalog (CUST-WP-0022) so domains can advertise what
they provide and agents can request capabilities from other domains with
auto-routing, lifecycle tracking, and task-unblocking on completion.
- New models: CapabilityCatalog, CapabilityRequest with full lifecycle
(requested → accepted → in_progress → ready_for_review → completed/rejected/withdrawn)
- Migration i6d7e8f9a0b1: capability_catalog + capability_requests tables
- Router /capability-catalog and /capability-requests with accept/status endpoints
- 7 new MCP tools: register_capability, list_capabilities, request_capability,
accept_capability_request, update_capability_request_status,
list_capability_requests, get_capability_request
- StateSummary gains open_capability_requests count
- Dashboard: capability-requests.md page + docs/capabilities.md + docs/scope.md
- SCOPE.md: three seed capabilities documented (MCP registration, state tracking, SBOM)
- scope.template: Provided Capabilities section with example block
- scripts/ingest_capabilities.py + make ingest-capabilities[/-all] targets
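A sketch of lifecycle enforcement. The transition table is a literal reading of the arrow chain above and may be stricter or looser than the real router; the post-unblock task status `todo` and the helper name `advance` are hypothetical.

```python
# Allowed transitions, read literally from the lifecycle chain (an assumption).
TRANSITIONS = {
    "requested": {"accepted"},
    "accepted": {"in_progress"},
    "in_progress": {"ready_for_review"},
    "ready_for_review": {"completed", "rejected", "withdrawn"},
}


def advance(req: dict, new_status: str, tasks: dict) -> dict:
    """Move a request forward; on completion, unblock the waiting task."""
    if new_status not in TRANSITIONS.get(req["status"], set()):
        raise ValueError(f"illegal transition {req['status']} -> {new_status}")
    req["status"] = new_status
    blocked_id = req.get("blocking_task_id")
    if new_status == "completed" and blocked_id in tasks:
        task = tasks[blocked_id]
        if task["status"] == "blocked":
            task["status"] = "todo"  # hypothetical post-unblock status
    return req
```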
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
repos.json.py now fetches /sbom/snapshots/ alongside /repos/ and
annotates each repo with sbom_snapshot_count, sbom_entry_count, and a
last_sbom_at fallback derived from actual snapshot data. This prevents
"LastSBOM=never" when the denormalized field is out of sync.
repo-sync.md gains SBOM KPI tiles (ingested vs no-SBOM), color-coded
SBOM age column (same green/orange/red scale as state sync), and an
entry count column showing packages from the latest snapshot.
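A sketch of the loader's annotation step in Python, the loader's own language. Snapshot field names (`repo_id`, `created_at`, `entry_count`) and the helper name are assumptions.

```python
def annotate_sbom(repos, snapshots):
    """Attach snapshot-derived SBOM fields to each repo dict (hypothetical keys)."""
    by_repo = {}
    for snap in snapshots:
        by_repo.setdefault(snap["repo_id"], []).append(snap)
    for repo in repos:
        snaps = sorted(by_repo.get(repo["id"], []), key=lambda s: s["created_at"])
        latest = snaps[-1] if snaps else None
        repo["sbom_snapshot_count"] = len(snaps)
        repo["sbom_entry_count"] = latest["entry_count"] if latest else 0
        # Fall back to the newest snapshot when the denormalized field is empty.
        if not repo.get("last_sbom_at") and latest:
            repo["last_sbom_at"] = latest["created_at"]
    return repos
```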
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Parses go.sum lockfiles for Go projects. Reads go.mod alongside to
mark direct vs indirect dependencies. Deduplicates by (module, version),
skipping go.mod hash lines.
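A sketch of a go.sum parser along these lines. It handles the two common `require` forms in go.mod (single-line and parenthesised block). Treating modules absent from go.mod as indirect is an assumption, as are the output keys.

```python
def parse_go_mod_requires(go_mod: str) -> dict:
    """Map required module path -> True if direct (no `// indirect` marker)."""
    direct = {}
    in_block = False
    for raw in go_mod.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("require ("):
            in_block = True
            continue
        if in_block and line == ")":
            in_block = False
            continue
        if line.startswith("require "):
            line = line[len("require "):].strip()  # single-line require
        elif not in_block:
            continue  # module, go, replace, etc.
        parts = line.split()
        if len(parts) >= 2:
            direct[parts[0]] = "// indirect" not in line
    return direct


def parse_go_sum(go_sum: str, go_mod: str = "") -> list:
    """One entry per (module, version); go.mod hash lines are skipped."""
    requires = parse_go_mod_requires(go_mod)
    seen, packages = set(), []
    for line in go_sum.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        module, version, _hash = parts
        if version.endswith("/go.mod"):
            continue  # hash of the module's go.mod file, not the module itself
        if (module, version) in seen:
            continue
        seen.add((module, version))
        packages.append({"name": module, "version": version,
                         "direct": requires.get(module, False)})
    return packages
```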
Used to ingest key-cape (netkingdom domain): 23 Go modules.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Always call display() for the warning element so Observable Framework
replaces it on each poll re-run. Previously the conditional display()
call left the warning rendered indefinitely once shown.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MCP server is now a persistent SSE service on :8001 (make mcp-http),
independent of the Claude Code session. Re-registration is a single
claude mcp add-json command; no patch_mcp_cwd.py needed.
- Makefile: mcp-http is primary transport, add fuser restart + updated comment
- state-hub/README.md: stack table, MCP section, troubleshooting note updated
- CLAUDE.md (project): registration instructions rewritten for SSE
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The old bare `api` target (uvicorn only) is subsumed into the new `api`
target (db + postgres-wait + migrate + fuser-restart + uvicorn). Updated
all doc references and cleaned up duplicate entries left by the rename.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pkill -f matched the shell subprocess's own argv (which contains the
pattern as a -c argument), causing make to receive SIGTERM and abort.
fuser -k 8000/tcp / 3000/tcp targets only the process bound to the
port — no self-kill risk.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- `make backend` replaces `make start`; polls postgres with nc (up to 10s)
instead of a fixed sleep, and kills any running uvicorn before starting fresh
- `make dashboard` kills any running observable preview before restarting
- Update all references in CLAUDE.md, README.md, SCOPE.md, state-hub/README.md,
and dashboard/src/docs/live-data.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Agents had no way to look up task UUIDs by workstream, so they could not
call update_task_status without already knowing the UUID.
list_tasks() wraps GET /tasks with workstream_id filter, returning
[{id, title, status, priority}] for all matching tasks.
FR raised by kaizen-agentic worker on COULOMBCORE while syncing
KAIZEN-WP-0002 task IDs. Marked merged in contributions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- parse_task_blocks() now injects the nearest preceding ### heading
text as `title` — tasks no longer stored with bare IDs as their title
- C-11 fix skips creating tasks when workstream is completed/archived
(prevents duplicate task creation on repeated fix-consistency runs)
- C-12 is now fixable: auto-cancels open orphan DB tasks when the
backing workstream is finished (completed/archived)
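A sketch of the title injection. The `id: <ID>` task marker is a hypothetical stand-in for the real workplan syntax; only the nearest-preceding-`###` rule comes from the description.

```python
import re

# Hypothetical task-block marker; the real workplan syntax may differ.
TASK_ID_RE = re.compile(r"^\s*id:\s*(\S+)")
HEADING_RE = re.compile(r"^###\s+(.*\S)")


def parse_task_blocks(markdown: str) -> list:
    """Collect task IDs, titling each with the nearest preceding ### heading."""
    tasks, heading = [], None
    for line in markdown.splitlines():
        m = HEADING_RE.match(line)
        if m:
            heading = m.group(1)
            continue
        m = TASK_ID_RE.match(line)
        if m:
            task_id = m.group(1)
            # Fall back to the bare ID only when no heading precedes the block.
            tasks.append({"id": task_id, "title": heading or task_id})
    return tasks
```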
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
FastMCP validates dict | None strictly, rejecting a JSON string even if
parseable. Broaden to dict | str | None and coerce in the function body
so callers don't need to pre-parse the detail payload.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>