199 Commits

Author SHA1 Message Date
0375e57626 chore(state-hub): decouple embedded service tree 2026-05-17 20:16:30 +02:00
ab77698702 docs(state-hub): plan repo extraction 2026-05-17 18:48:31 +02:00
ca8a09ed04 feat(state-hub): CUST-WP-0040 — NATS lifecycle event publishing for activity-core
Makes the state hub an event publisher so activity-core can drive
maintenance automation declaratively via ActivityDefinitions, rather
than the hub creating tasks itself.

- api/events/: lazy JetStream publisher + EventEnvelope mirroring
  activity-core's contract; no-op when NATS_URL unset, fire-and-forget
  with logged failures so publishing never breaks an API request.
- Wired publishers on the five v1.0 lifecycle events:
    org.statehub.repo.registered        (POST /repos/)
    org.statehub.workstream.completed   (PATCH /workstreams/* on transition)
    org.statehub.decision.resolved      (POST /decisions/*/resolve)
    org.statehub.domain.goal.activated  (POST /domain-goals/*/activate)
    org.statehub.task.stale             (scripts/cleanup_stale_tasks.py)
- docs/nats-event-subjects.md: subject naming convention + catalog.
- docs/cron-migration.md: design stub for replacing custodian-sync
  systemd timer and cleanup-stale cron with ActivityDefinitions
  (depends on activity-core WP-0003).
- docs/activity-core-delegation.md: protocol, invariants, cutover plan.
- SCOPE.md: declares activity-core as downstream event consumer and
  restates that the state hub stays a read model, not a task factory.

Workplan: workplans/CUST-WP-0040-state-hub-nats-activity-core-integration.md
242 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 05:49:29 +02:00
66db2da9d5 Update State Hub image build provenance 2026-05-15 15:02:30 +02:00
0eb2ef0650 perf(api): CUST-WP-0041 — DB indexes, TTL caches, noload on list endpoints
- Migration t7o8p9q0r1s2: indexes on tasks.status, tasks(workstream_id,status),
  workstreams.status, sbom_snapshots(repo_id,snapshot_at)
- workplan-index: 30 s TTL cache + ?refresh param (4171 ms → 16 ms on hit)
- /state/summary: 15 s TTL cache, bypassed on Cache-Control: no-cache
- /topics/: noload(workstreams, decisions, progress_events) (2382 ms → 115 ms)
- /domains/: noload(topics, repos, goals) (2252 ms → 39 ms)
- /repos/: noload(goals) (2222 ms → 599 ms first / fast on repeat)
- conftest: reset TTL caches between tests to prevent bleed-through

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 11:12:17 +02:00
41ce4ede2c feat(dashboard): poll optimisation — T4, T5, T6
T4: workstreams.md and dependencies.md now call /state/deps instead of the
    full /state/summary — removes 2 heavy 10-table queries per 60 s cycle.

T5: index.md's 4 independent polling loops (summaryState, sbomSnapState,
    regsState, wsChartState) consolidated into a single pageState generator
    with one Promise.all batch and a shared backoff counter.

T6: config.js gains waitForVisible(ms) — pauses polling entirely while the
    tab is hidden and fires immediately on visibilitychange.  pollDelay()
    simplified (hidden-tab POLL_HIDDEN logic removed).  All 16 polling pages
    migrated from await sleep(pollDelay(...)) to await waitForVisible(pollDelay(...)).

CUST-WP-0039 complete — all 6 tasks done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 17:58:18 +02:00
512c0a73ed feat(api): dashboard poll optimisation — T1, T2, T3
T1: Cache-Control max-age=60 on /topics/, /repos/, /domains/ list endpoints
    so repeated dashboard polls within a minute are served from browser cache.

T2: ETag middleware (md5 hash) on all JSON GET responses with conditional-GET
    (304 Not Modified) support; If-None-Match and ETag added to CORS headers.
    ETag registered inside CORS so 304s automatically carry CORS headers.

T3: GET /state/deps — lightweight dep-graph endpoint returning open workstreams
    with depends_on/blocks edges only, skipping the 10-table full-summary query.
    Prerequisite for T4 (switching workstreams.md and dependencies.md off /state/summary).

Workplan: CUST-WP-0039-dashboard-poll-optimization.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 17:26:30 +02:00
e01298177e fix(api): restrict uvicorn --reload to source dirs only
Watching .venv/ (6k files) and dashboard/node_modules/ (6k files) was
causing sustained ~42% CPU on the uvicorn main process.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 15:26:48 +02:00
646fb21113 Load limiting safeguards 2026-05-06 04:04:53 +02:00
30f7ac8fea Updated repo onboarding 2026-05-04 19:18:10 +02:00
bfed370a6e Improved workplan dependency management facilities 2026-05-04 11:45:24 +02:00
4d6164e81b register project fix 2026-05-04 11:04:25 +02:00
85bf0bc180 railiance-bootstrap to railiance-cluster rename cleanup 2026-05-03 16:30:07 +02:00
190704a4eb Locked in cytoscape.js as visualization for dep graph 2026-05-03 01:43:50 +02:00
acd52966d3 Codex state hub stuff added 2026-05-02 17:13:13 +02:00
c48f26dd1f Overview shows workstreams by repo now and allows drilldown 2026-05-02 12:01:45 +02:00
f4e00f5070 Cleanup of documentation 2026-05-02 00:46:07 +02:00
95bcc5c83c Task flow engine implementation 2026-05-02 00:21:14 +02:00
4be941e65e Implemented foundation of task-flow-engine 2026-05-01 22:19:03 +02:00
5d50dee3f1 Implemented Ad-Hoc Task handling 2026-05-01 21:27:52 +02:00
01af0f8810 scope refactoring 2026-05-01 01:47:14 +02:00
273bcb5b78 state-hub scope functionality work 2026-05-01 01:33:15 +02:00
f99d77d539 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-01:
  - update .custodian-brief.md for the-custodian
2026-05-01 00:55:53 +02:00
17303d2519 feat(state-hub): Interface Change Registry (CUST-WP-0033 T01-T06)
Adds first-class tracking for API and interface mutations across the
agent ecosystem. Breaking changes are documented, affected repos are
notified via inbox, and agents discover pending changes at session
start via the dispatch endpoint.

- Migration q4l5m6n7o8p9: interface_changes table
- Model/schema: InterfaceChange with draft→published→resolved lifecycle
- Router: POST/GET/PATCH /interface-changes/, /publish, /resolve actions
  (auto-notify affected repo agents on publish; progress event on origin)
- Dispatch: GET /repos/{slug}/dispatch now returns pending_interface_changes
- MCP tools: register_interface_change, list_interface_changes,
  publish_interface_change, resolve_interface_change
- Dashboard: /interface-changes page with type badges, planned calendar,
  published cards, and draft table
- EP-CUST-ICR-001 registered: webhook subscriptions (deliberately deferred)

First record: trailing-slash normalisation (2026-04-26), published,
affecting repo-registry — visible in repo-registry dispatch immediately.

223 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 15:29:08 +02:00
8dd15efde1 fix(api): normalize trailing slashes — no slash on param routes
Rule: trailing slash only on collection roots (/). Any route containing
a path parameter {…} uses no trailing slash. Applies across all routers,
scripts, Makefile, and tests. Fixes 307-redirect fragility on POST/PATCH
from naive clients (curl, Codex HTTP calls).

Also adds POST /repos/{slug}/sync — runs ADR-001 consistency check with
--fix via HTTP, so non-MCP agents (Codex) can self-service DB sync without
operator intervention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 15:13:01 +02:00
fa31befeff fix(sbom): resolve repo path from hub host_paths when --repo-path omitted
Previously defaulted to CWD ("."), causing ingest to silently scan the
state-hub directory instead of the target repo when called without
--repo-path. Now queries GET /repos/{slug}/ for host_paths[hostname]
and exits with a clear error if neither flag nor hub lookup succeeds.

Also deleted the incorrect SBOM snapshot for repo-registry (420 entries
that were actually state-hub packages).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:27:09 +02:00
7a65da1ebe chore(deps): update uv.lock 2026-04-26 13:23:33 +02:00
3481a15e7f feat(registration): add --codex flag and AGENTS.md template
- register_project.sh: parse --additional/--codex as named flags (not
  positional), skip MCP check in codex mode, generate AGENTS.md from
  agents-codex.template instead of CLAUDE.md + .claude/rules/
- agents-codex.template: new template for Codex repos — HTTP REST session
  protocol, inbox/progress curl examples, ADR-001 workplan convention
- Makefile: add register-codex-project target

Driven by onboarding repo-registry (first non-Claude-Code repo, first
repo under the capabilities domain).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:17:50 +02:00
2eea36cdc9 Added missing api to the make guidance 2026-04-25 23:15:30 +02:00
e410d44959 feat(consistency): T04 push seal — closed-loop writeback for automated commits
Root cause of the 501-commit pile-up in inter-hub: fix_repo() created
git commits (brief updates, T03 writebacks) but never pushed them, so
the 15-minute timer accumulated local commits indefinitely. Once real
development landed on remote the repos diverged with no self-healing path.

Changes
-------
repo_sync.py (new module)
  Extracts all git lifecycle primitives: pull_ff, push_ff,
  count_remote_ahead (C-16 input), count_local_ahead (C-17/T04 input).
  Module docstring documents the push-seal invariant and stable state.

consistency_check.py
  - Imports primitives from repo_sync; thin _detect_behind_remote wrapper
    preserves backward compat for existing callers and tests.
  - C-17 backlog guard: if local has unpushed commits from a prior failed
    push, retry before making more; skip all writes if push still fails.
  - T04 push seal: unconditional push_ff() at end of every fix_repo() run.
  - _report_needs_action: ahead_of_remote param so repos with unpushed
    backlogs are not silently skipped as "clean" by fix_all_remote().
  - Domain-slug fallback: brief no longer degrades to "(unknown)" when all
    workplans are completed — falls back to any workstream for domain context.
  - Service switched from --all --fix to --remote --all (pulls before
    fixing, skips already-clean repos).

push-seal.md (new)
  Capability documentation: the problem, the invariant, all three checks
  (C-16/C-17/T04), stable-state description, API reference, and test map.

test_repo_sync.py (new, 32 tests)
  Full coverage of all four primitives via real git repos (tmp_path).
  Includes C-17 scenario, push-seal invariant, and four end-to-end
  loop-stability tests.

test_consistency_check.py
  Four new _report_needs_action cases for the ahead_of_remote parameter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 01:43:40 +02:00
13734c404e Improved documentation of how to start everything 2026-04-20 00:04:46 +02:00
f794d8c47f feat(token-events): auto-capture real token counts via PostToolUse hook
- Add PATCH /token-events/{id} endpoint to correct heuristic events
- Add `note` filter to GET /token-events/ list
- Add TokenEventPatch schema
- Add task_token_hook.py: PostToolUse hook that reads the Claude Code
  session transcript, computes per-task token delta, and replaces the
  heuristic token event with real measured counts (note="measured")
- Register hook in ~/.claude/settings.json on mcp__state-hub__update_task_status
  Covers both interactive sessions and ralph-workplan loops

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 22:38:45 +02:00
fa1694b49b make bridge target 2026-04-01 21:53:51 +02:00
d58ef71339 feat(capability-registry): CUST-WP-0031 domain capability registry
- Migration p3k4l5m6n7o8: nullable repo_id FK on capability_catalog
- PATCH /capability-catalog/{id} endpoint for back-filling repo attribution
- register_capability MCP tool accepts optional repo_slug
- get_domain_summary now includes compact capabilities list (type+title+repo_slug)
- New get_capability_profile MCP tool: domain → repos → capabilities tree
- 6 repo descriptions populated; 25 catalog entries attributed to repos
- 9 new capabilities registered for personhood, foerster_capabilities, coulomb_social
- TOOLS.md: Capability Catalog & Requests section with full tool reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 17:23:45 +02:00
5399fcec20 fix(dashboard): merge token + event charts into single dual-axis chart on Progress page
Replaces the two separate charts with one combined area+line chart.
Events use the left y-axis (steelblue); tokens use a normalized scale
with a right y-axis (amber) that formats values as k/M. When no token
data exists yet the right axis is omitted and a legend note explains.
Hover tooltips show actual values for both series.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 00:48:22 +02:00
b54ee8149c feat(dashboard): add tokens consumed per day chart to Progress page
Fetches /token-events/?limit=1000 in parallel with progress events and
renders a second area+line chart (amber) below the events-per-day chart,
aggregating tokens_in + tokens_out per calendar day over the same 30-day window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 00:42:09 +02:00
f5c166e77e feat(dashboard): add repo filter, sort order, and max results controls to Token Cost page
Three reactive dropdowns below the Token Cost heading:
- Filter by repo: client-side filter via 3-level chain resolution
- Sort by: Tokens Total (default), Tokens In, Out, Event Count, Most Recent
- Show: 10/20/50/100/500 rows per table (default 20)

Applies uniformly to By Repo, By Workplan, and Top Tasks tables.
"Most Recent" derives last_event_at per group from the fetched events.
Truncated tables show a "Showing M of N" count below.

Completes CUST-WP-0030 T07–T09.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 00:02:17 +02:00
20ce332d55 feat(dashboard): entity list UX — REF column, name cells, detail pages (CUST-WP-0030)
- ref-cell.js: REF column component — click=copy deeplink, dblclick=open
- field-help.js: field registry + fieldRow helper with help-tip decoration;
  FK fields (task_id, workstream_id, repo_id) render as async-linked cells
  with entity-title bubble-help on hover
- GET /token-events/{id} endpoint + get-by-id tests
- GET /repos/by-id/{repo_id} UUID lookup endpoint
- Landing pages: /token-events/[id], /workstreams/[id], /repos/[slug], /tasks/[id]
- token-cost.md: REF + Name columns on all three tables; parallel fetch of
  workstreams/tasks for title resolution
- reference.md: entity detail page URL scheme documented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 22:35:35 +02:00
bb965e4990 feat(token-tracking): repo aggregation via graph walk (task→workstream→repo)
By Repo now resolves via the full chain rather than requiring repo_id
directly on the token event:
  1. token_events.repo_id (direct)
  2. → workstreams.repo_id (via workstream_id)
  3. → task.workstream_id → workstreams.repo_id (via task_id)

Changes:
- Auto-populate repo_id on token events at creation time (both the
  token_events router and the tasks router)
- New GET /token-events/by-repo/ endpoint with RepoTokenSummary schema;
  returns tokens_in/out/total, event_count, by_model, by_note per repo
- Dashboard By Repo section uses /by-repo/ directly and shows repo_slug
  instead of a truncated UUID
- Backfilled the three existing events (userbased) with repo_id via SQL

185 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 19:05:23 +02:00
68943684e1 feat(token-tracking): introduce token note taxonomy (measured/userbased/workplan/heuristic)
Tier 1 (exact counts) now defaults to note="measured" instead of null,
signalling the counts were read from the Claude Code status bar.
Callers can pass note="userbased" when a human provided the numbers.

  measured  — agent read exact counts from the Claude Code status bar
  userbased — counts provided by a human
  workplan  — prorated from workplan total across task count
  heuristic — server fallback, 1000/500, no agent input

Added token_note field to TaskUpdate schema and exposed note param on
update_task_status and record_interactive_task MCP tools.
TOOLS.md documents the full taxonomy. 185 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 18:47:40 +02:00
edffc62775 feat(token-tracking): add record_interactive_task MCP tool
New tool for capturing ad-hoc work done outside formal workplans.
Finds or creates a persistent 'interactive-<repo>' workstream for the
repo, creates the task, marks it done, and records a token event using
the three-tier logic — all in a single call.

Seeded two example events on interactive-the-custodian:
  - Three-tier token recording on task done (8000/3500)
  - Add record_interactive_task MCP tool (4500/1800)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 18:36:51 +02:00
e247c20439 feat(token-tracking): three-tier token recording on task done
Token events are now always created when update_task_status is called
with status="done", using the best available data:

  Tier 1 (best): exact tokens_in + tokens_out passed by agent
  Tier 2:        workplan_tokens_in + workplan_tokens_out prorated
                 across workstream task count (note="workplan")
  Tier 3 (fallback): heuristic 1000 in / 500 out (note="heuristic")

Non-done status changes never create a token event.
MCP tool updated with workplan_tokens_in/out params and tiered docs.
Ralph-workplan skill files updated with the three-tier guidance.
184 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 18:28:18 +02:00
d65bc701da feat(token-tracking): record AI token consumption per task (CUST-WP-0029)
Introduces end-to-end token consumption tracking so agent work is
visible as a cost/effort metric alongside tasks and workplans.

- Migration o2j3k4l5m6n7: token_events table with FK indexes on
  task_id, workstream_id, repo_id, created_at
- ORM model, Pydantic schemas (TokenEventCreate, TokenEventRead with
  computed tokens_total, TokenSummary)
- Router: POST /token-events/, GET /token-events/ (7 filters),
  GET /token-events/summary/ (task|workstream|repo|commit|release scope)
- MCP tools: record_token_event, get_token_summary (formatted table)
- update_task_status enriched with optional tokens_in/tokens_out
  passthrough — one call creates status update + token event
- Dashboard token-cost.md page: by-repo bar, by-workplan table,
  by-model bar, top-10 tasks by tokens
- ralph-workplan skill updated with token reporting guidance and
  per-task heuristics for estimating counts
- Tests: test_token_events.py + test_token_passthrough.py (182 pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 17:46:46 +02:00
b0b6c98aec fix(consistency): prevent post-commit hook re-entrancy loop
The post-commit hook re-invokes fix-consistency, which commits writeback
changes, which re-triggers the hook — causing exponential process spawning.

Fix: pass GIT_CUSTODIAN_SYNC=1 in the env for all writeback git commits.
Update the post-commit hook (not tracked by git) to exit early when this
variable is set.

Also remove the --no-verify flag that was added as a failed attempt (it
only skips pre-commit/commit-msg, not post-commit hooks).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 23:55:07 +01:00
091695766b feat(repos): git-fingerprint-based machine-independent repo identity
Add git_fingerprint (root commit SHA-1) to managed_repos as a stable,
machine-independent identifier — identical across every clone regardless
of checkout path, remote URL, or SSH alias.

- Migration n1i2j3k4l5m6: adds git_fingerprint column + non-unique index
  (non-unique to support repos that share ancestry via forks/splits)
- GET /repos/by-fingerprint?hash=<sha>[&remote_url=<url>]: lookup by
  fingerprint; optional remote_url disambiguates shared-ancestry repos
- GET /repos/by-remote?url=<url>: fallback lookup by remote URL
- consistency_check.py --here [PATH]: auto-detects repo slug from any
  local checkout via fingerprint (falls back to remote URL), then auto-
  registers host_paths[hostname] so subsequent runs need no override
- --all now includes repos with host_paths[current_hostname], not just
  those with local_path
- fix-consistency-here / check-consistency-here Makefile targets
- Fixed _api_get bug: httpx strips query strings when params={} is passed
- Backfilled fingerprints for 14 repos on this host

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 23:55:06 +01:00
aa88a53db2 feat(mcp): add create_topic tool
POST /topics/ was already implemented in the REST API but had no MCP
wrapper, so agents couldn't create topics (e.g. inter_hub) via MCP.
Tool follows the same pattern as create_domain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 01:32:52 +01:00
662db5c593 fix(state-hub): fix mcp-http startup crash and remove legacy tunnel targets
- Add `Optional` to typing imports in mcp_server/server.py — it was used
  in 13 annotations but never imported, crashing FastMCP v3 at startup
- Remove legacy tunnel/tunnel-daemon/tunnel-loop/tunnel-status/tunnel-stop
  targets from Makefile; ops-bridge (tunnels-up/status/check) supersedes them

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:32:51 +01:00
b19896a9a9 docs(dashboard): add technical reference page for Observable Framework dashboard
Documents the dashboard's architecture, framework choice rationale, data-fetching
strategies (static loaders + live polling), component library, page inventory,
and key features including the Workstream Health Index and entity modals.
Also registers the new page in the Reference nav and adds runbook section for
node overload / runaway agent process (INC-002) with hardening checklist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 00:09:18 +01:00
6018df03cf feat(brief): generate .custodian-brief.md per repo for offline worker orientation
Adds _write_custodian_brief() to consistency_check.py. After every fix_repo()
run, a .custodian-brief.md is written to the repo root with: domain, last-synced
timestamp, current repo goal, active workstreams with progress (done/total), and
the first 7 open tasks per workstream (blocked → in_progress → todo order) with
task IDs. The file is git-committed when content changes so remote workers (e.g.
CoulombCore) can pull it and orient without a live MCP connection.

Session protocol template and CLAUDE.md updated: read .custodian-brief.md first,
then call get_domain_summary() as an enhancement (skip if MCP unreachable).
This eliminates false "State hub is offline" alarms in subagents and remote workers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 17:48:36 +01:00
8ad371f753 feat(consistency): fix-consistency-remote works without REPO for all repos
Adds --remote CLI flag and fix_all_remote() function. When run without a
REPO argument, the target checks all registered repos and:
- Skips repos whose local path does not exist on this machine
- Skips repos that are already clean (no fixable issues, no FAILs, not
  behind remote, only C-08 background noise allowed)
- For repos that need work: git pull --ff-only then fix_repo()

Prints a summary of CLEAN (skipped) and NOT ON THIS HOST (skipped) repos
before the detailed fix reports.

Simplifies the Makefile target from shell-level curl+git to a single
uv run call using --remote. Same flag handles both single-repo and all-repos.

Also adds _git_pull() helper and 13 new tests (71 total in consistency suite).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 14:38:30 +01:00