State Hub
State Hub is the live coordination service for the Custodian ecosystem: PostgreSQL persistence, FastAPI API, FastMCP server, Observable dashboard, consistency tooling, and repo/workplan synchronization.
This repository is the standalone home for the service. It was extracted from the former embedded implementation at:
/home/worsch/the-custodian/state-hub
Extraction State
The extraction workplan CUST-WP-0043 - State Hub Repo Extraction is complete.
Current state:
- The implementation has been imported here with subtree history.
- CUST-WP-0042 has been re-homed into this repository.
- The old embedded tree in the-custodian remains only as a pointer.
- This repository is authoritative for State Hub code, docs, tests, dashboard, migrations, scripts, policies, and State Hub-local workplans.
Workplans
New State Hub-local workplans should use the prefix:
STATE-WP-0001
Legacy Custodian-hosted State Hub plans, such as CUST-WP-0042, may retain
their existing IDs when that preserves State Hub workstream/task continuity.
Do not create duplicate workstreams manually; write the workplan file first,
then run consistency sync.
Stack
| Layer | Technology | Port |
|---|---|---|
| Database | PostgreSQL 16-alpine (Docker) | 127.0.0.1:5432 |
| API | FastAPI + SQLAlchemy 2.0 async + asyncpg | 127.0.0.1:8000 |
| MCP server | FastMCP SSE | 127.0.0.1:8001 |
| Dashboard | Observable Framework | 127.0.0.1:3000 |
| CLI | custodian (Python, uv entry point) | n/a |
All services bind to 127.0.0.1 only — nothing exposed to the network.
Setup
Prerequisites
- Docker Engine
- Python 3.12+ with uv (pip install uv)
- Node.js 18+ (dashboard only)
First-time
cd /home/worsch/state-hub
cp .env.example .env # edit POSTGRES_PASSWORD
make install # uv sync
make db # docker compose up postgres
make migrate # alembic upgrade head
make seed # insert 6 canonical topics
make api # db + migrate + uvicorn :8000 (restarts if running)
Dashboard
make dashboard # installs dashboard deps if needed, then Observable dev server on :3000
make dashboard-check # installs deps if needed, then runs Observable build
Start Everything
To start all the infrastructure, run each of these in its own console:
make db # docker compose up postgres
make mcp-http # start the State Hub MCP service
make dashboard # Observable dev server on :3000
make bridges # set up SSH bridges for cross-machine access
CLI
make install-cli # symlink .venv/bin/custodian → ~/.local/bin
custodian status # API health + summary totals
custodian register-project # register cwd as a Custodian project
Makefile Targets
| Target | What it does |
|---|---|
| make install | uv sync: install Python deps + entry points |
| make install-cli | Symlink custodian to ~/.local/bin |
| make db | Start postgres container |
| make db-tools | Start postgres + pgadmin (http://127.0.0.1:5050) |
| make migrate | alembic upgrade head |
| make seed | Insert 6 canonical topics (legacy bootstrap) |
| make register-from-classification REPO=slug | Upsert repo from .repo-classification.yaml |
| make register-from-classification-all | Bulk reclassify all repos with classification files |
| make api | db + wait + migrate + uvicorn (restarts if running) |
| make dashboard-install | Install dashboard npm deps from dashboard/package-lock.json |
| make dashboard-check | Build the Observable dashboard as a smoke/regression check |
| make dashboard | Install deps if needed, then start Observable dev server (restarts if running) |
| make check | curl /state/health |
| make test | Python test suite plus make dashboard-check |
| make register-project DOMAIN=x PROJECT_PATH=y | Register a project |
| make clean | docker compose down -v (destroys DB volume) |
Database Schema
Repo-anchored coordination spine (STATE-WP-0065):
domains (14 market domains: infotech, financials, communication, …)
managed_repos (classification: category, domain, capability_tags, business_stake, …)
  └── workplans (repo_id required; topic_id optional legacy tag)
        └── tasks
              └── progress_events
topics (optional cross-repo tag; domain_id → market domain)
decisions (FK: topic_id and/or workplan_id)
Each registered repo carries a committed .repo-classification.yaml (canon
standard v1.0). Registration and reclassification use
make register-from-classification.
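For a sense of what a classification file has to satisfy, here is a minimal validator over an already-parsed file (for example, the dict returned by PyYAML's yaml.safe_load). It is a sketch under assumptions: validate_classification is a hypothetical name, treating the four managed_repos fields listed above as required is a guess, requiring capability_tags to be a list is a guess, and the allowed domain slugs are passed in because only three of the 14 are named in this README. The canonical rules live in the canon standard and its allowed-values file.

```python
from collections.abc import Iterable
from typing import Any

# Category values from the "Key enums / vocabularies" table.
ALLOWED_CATEGORIES = frozenset(
    {"experimental", "research", "project", "tooling", "product", "business"}
)

# Assumption: these keys are treated as required; the canon standard may differ.
REQUIRED_KEYS = ("category", "domain", "capability_tags", "business_stake")


def validate_classification(
    data: dict[str, Any], allowed_domains: Iterable[str]
) -> list[str]:
    """Return a list of problems; an empty list means the dict looks valid.

    Hypothetical helper: State Hub's real checks run through
    `make register-from-classification` against the canon allowed-values file.
    """
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    category = data.get("category")
    if category is not None and category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {category!r}")
    domain = data.get("domain")
    if domain is not None and domain not in set(allowed_domains):
        problems.append(f"unknown domain: {domain!r}")
    tags = data.get("capability_tags")
    if tags is not None and not isinstance(tags, list):  # assumption: a YAML list
        problems.append("capability_tags must be a list")
    return problems
```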
Key enums / vocabularies
| Field | Values |
|---|---|
| workplan_status | proposed · ready · active · blocked · backlog · finished · archived |
| task_status | wait · todo · progress · done · cancel |
| repo category | experimental · research · project · tooling · product · business |
| market domain | 14 fixed slugs; see the-custodian/canon/standards/repo-classification.allowed.yaml |
Governance constraints encoded in schema
- No hard DELETE endpoints; records are only soft-deleted via the archived, cancel, or superseded status.
- progress_events has no updated_at and no DELETE endpoint (append-only per constitution §5).
- decisions with financial/legal keywords and a pending type automatically get an escalation_note (§4).
API
Interactive docs at http://127.0.0.1:8000/docs once the API is running.
Key endpoint: /state/summary
Returns a full snapshot in one call — used by both the MCP server and dashboard:
{
"generated_at": "...",
"totals": {
"topics": { "active": 6, "paused": 0, "archived": 0, "total": 6 },
"workstreams": { "ready": 1, "active": 1, "blocked": 0, "finished": 1, "total": 3 },
"tasks": { "wait": 0, "todo": 9, "progress": 0, "done": 11, "cancel": 0, "total": 20 },
"decisions": { "open": 1, "resolved": 0, "escalated": 0, "total": 1 }
},
"topics": [...], // topics with nested workstream stubs
"blocking_decisions": [...], // pending decisions only
"waiting_tasks": [...],
"recent_progress": [...], // last 20 events
"open_workstreams": [...]
}
Caching: responses are revision-gated — the API compares cheap per-table
MAX(updated_at) / MAX(created_at) watermarks before rebuilding. Unchanged
data returns the cached snapshot (X-StateHub-Cache: hit-revision). When core
data changes, the last good snapshot may be served immediately while a
background refresh runs (X-StateHub-Cache: stale). Force a synchronous rebuild
with ?refresh=true or Cache-Control: no-cache. Infrastructure probes should
use /state/health, not /state/summary.
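The revision gate with stale-while-revalidate can be sketched in a few lines of asyncio. This is a minimal illustration, not the State Hub implementation: RevisionGatedCache, wants_forced_refresh, and the "miss" label are hypothetical names; read_watermarks stands in for the per-table MAX(updated_at)/MAX(created_at) probe and build_snapshot for the full /state/summary rebuild. In a real service the background refresh would also need its own database session, since one AsyncSession cannot serve concurrent operations.

```python
import asyncio
from collections.abc import Awaitable, Callable, Hashable
from typing import Any

Snapshot = dict[str, Any]


class RevisionGatedCache:
    """Serve a cached snapshot until the data's revision watermarks move."""

    def __init__(
        self,
        read_watermarks: Callable[[], Awaitable[Hashable]],  # cheap probe
        build_snapshot: Callable[[], Awaitable[Snapshot]],  # expensive rebuild
    ) -> None:
        self._read_watermarks = read_watermarks
        self._build_snapshot = build_snapshot
        self._snapshot: Snapshot | None = None
        self._revision: Hashable | None = None
        self._refresh_task: asyncio.Task[None] | None = None

    async def get(self, force: bool = False) -> tuple[Snapshot, str]:
        """Return (snapshot, label) for an X-StateHub-Cache-style header."""
        marks = await self._read_watermarks()
        if force or self._snapshot is None:
            # Synchronous rebuild: first request, or ?refresh=true / no-cache.
            await self._rebuild(marks)
            return self._snapshot, "miss"  # hypothetical label
        if marks == self._revision:
            return self._snapshot, "hit-revision"
        # Data moved: serve the last good snapshot and refresh in the
        # background, with at most one refresh in flight at a time.
        if self._refresh_task is None or self._refresh_task.done():
            self._refresh_task = asyncio.create_task(self._rebuild(marks))
        return self._snapshot, "stale"

    async def _rebuild(self, marks: Hashable) -> None:
        # Record the watermarks read *before* building: if data changes
        # mid-build, the next request sees newer marks and refreshes again.
        self._snapshot = await self._build_snapshot()
        self._revision = marks


def wants_forced_refresh(query: dict[str, str], headers: dict[str, str]) -> bool:
    """True for ?refresh=true or a Cache-Control header containing no-cache.

    Assumes header names have already been lower-cased.
    """
    if query.get("refresh", "").lower() == "true":
        return True
    directives = headers.get("cache-control", "").split(",")
    return any(d.strip().lower() == "no-cache" for d in directives)
```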
Router summary
| Prefix | Operations |
|---|---|
| /topics | CRUD (soft-delete: archived) |
| /workplans | Preferred CRUD surface for repo-backed workplans (soft-delete: archived) |
| /workstreams | Legacy compatibility CRUD surface; usage is recorded by legacy-meter |
| /tasks | CRUD (soft-delete: cancel); PATCH updates status |
| /decisions | CRUD (soft-delete: superseded); auto-escalation |
| /progress | GET list + POST append; no DELETE |
| /legacy-meter | Register, meter, and review legacy interface usage |
| /state/summary | Full snapshot |
| /state/health | DB connectivity check |
See docs/workplan-terminology-transition.md for the workstream-to-workplan
compatibility policy and retirement criteria.
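To make the soft-delete column of the router table concrete, here is a hypothetical helper that maps a delete request onto a status change. SOFT_DELETE_STATUS, soft_delete, and the "status" field name are assumptions for illustration; the real routers may expose soft deletes differently (for example through PATCH). /workstreams is left out because its soft-delete status is not stated here.

```python
from typing import Any

# Soft-delete target status per router prefix, taken from the router table.
SOFT_DELETE_STATUS = {
    "/topics": "archived",
    "/workplans": "archived",
    "/tasks": "cancel",
    "/decisions": "superseded",
}


def soft_delete(prefix: str, record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of record marked as deleted; the row itself is never removed.

    Hypothetical helper: the "status" field name is an assumption.
    """
    if prefix == "/progress":
        raise PermissionError("progress events are append-only (constitution §5)")
    if prefix not in SOFT_DELETE_STATUS:
        raise ValueError(f"no soft-delete rule for {prefix}")
    return {**record, "status": SOFT_DELETE_STATUS[prefix]}
```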
MCP Server
Runs as a persistent SSE service on :8001, independent of the Claude Code session.
Restart it anytime without restarting Claude Code.
make mcp-http # start (or restart) the MCP SSE server on :8001
Registered at user scope in ~/.claude.json:
{ "type": "sse", "url": "http://127.0.0.1:8001/sse" }
To re-register from scratch:
claude mcp remove state-hub -s user 2>/dev/null || true
claude mcp add-json -s user state-hub '{"type":"sse","url":"http://127.0.0.1:8001/sse"}'
See mcp_server/TOOLS.md for the full tool reference card (30 lines, faster than reading server.py).
Tools at a glance
Query (read-only): get_state_summary · get_topic · list_blocked_tasks · list_pending_decisions · get_recent_progress
Mutate (each auto-emits a progress event): create_task · update_task_status · record_decision · resolve_decision · add_progress_event · create_workplan · update_workplan_status · register_repo_from_classification
Resources: state://summary · state://topics · state://workplans/{topic_slug} · state://decisions/blocking · state://tasks/blocked
Legacy workstream_* tool names remain as aliases — see mcp_server/TOOLS.md.
custodian CLI
Installed into .venv/bin/custodian by uv sync; symlinked to ~/.local/bin by make install-cli.
custodian register-project [--domain DOMAIN] [--path PATH]
- --path defaults to the current working directory.
- --domain is auto-detected from project_charter_v*.md frontmatter if omitted.
custodian status
Prints API health, totals, and any blocking decisions.
What register-project does
- Verifies the API is reachable (fails fast with a make api hint)
- Looks up the topic ID for the domain via /topics/?status=active
- Checks that state-hub is in ~/.claude.json
- Writes $PROJECT_PATH/CLAUDE.md from scripts/project_claude_md.template
- Posts a milestone progress event recording the registration
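The CLAUDE.md step amounts to filling the template placeholders {PROJECT_NAME}, {DOMAIN}, and {TOPIC_ID}. This sketch assumes the project name is the directory name and that placeholders are replaced literally; render_claude_md and write_claude_md are hypothetical helpers, not the CLI's actual code.

```python
from pathlib import Path


def render_claude_md(template: str, project_name: str, domain: str, topic_id: str) -> str:
    """Fill the {PROJECT_NAME}, {DOMAIN}, {TOPIC_ID} placeholders literally."""
    values = {"PROJECT_NAME": project_name, "DOMAIN": domain, "TOPIC_ID": topic_id}
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("{" + name + "}", value)
    return rendered


def write_claude_md(project_path: Path, template_path: Path, domain: str, topic_id: str) -> Path:
    """Render the template into <project_path>/CLAUDE.md (assumes name = directory name)."""
    template = template_path.read_text(encoding="utf-8")
    target = project_path / "CLAUDE.md"
    target.write_text(
        render_claude_md(template, project_path.name, domain, str(topic_id)),
        encoding="utf-8",
    )
    return target
```

Literal str.replace is used instead of str.format so that any other braces in the Markdown template (JSON snippets, for instance) pass through untouched.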
Project Registration Scripts
| Script | Purpose |
|---|---|
| scripts/register_project.sh | Shell version of custodian register-project |
| scripts/patch_mcp_cwd.py | Legacy: patched cwd for the old stdio registration (no longer needed) |
| scripts/project_claude_md.template | CLAUDE.md template with {PROJECT_NAME}, {DOMAIN}, {TOPIC_ID} |
| scripts/seed.py | Insert the 6 canonical topics into a fresh database |
| scripts/pull_image.py | WSL2 workaround: pull Docker images via Python urllib with Range-request chunking |
Dashboard
Four pages at http://127.0.0.1:3000 (dev) or built with npm run build:
| Page | Content |
|---|---|
| Overview | Status cards, task-by-status chart, recent activity feed, decisions due within 7 days |
| Workstreams | Filterable table by domain/status/owner; selected workstream task list; progress timeline |
| Decisions | Pending tab (with escalation highlights) and Made tab; resolution velocity chart |
| Progress | Append-only event feed with author badges; 30-day event volume chart |
Data loaders (src/data/*.json.py) are Python scripts that call the local API. They run at dev-server start and on npm run build. Clear the cache if data appears stale:
rm -rf dashboard/src/.observablehq/cache/
Known Issues / WSL2 Notes
- TLS bad record MAC on large downloads: WSL2 corrupts packets on big TCP transfers. Use scripts/pull_image.py instead of docker pull for future image pulls.
- MCP server is now SSE, not stdio: re-register with claude mcp add-json -s user state-hub '{"type":"sse","url":"http://127.0.0.1:8001/sse"}'. The patch_mcp_cwd.py script and .mcp.json config are legacy artifacts from the old stdio setup.
- AsyncSession concurrency: SQLAlchemy 2.0 async sessions don't support concurrent operations, so all queries in /state/summary run sequentially on a single session.
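The single-session constraint is easy to see with a stand-in. SingleFlightSession below is a hypothetical toy, not SQLAlchemy, that mimics the one-operation-at-a-time rule: awaiting queries one after another works, while fanning them out with asyncio.gather on the same session fails. With a real AsyncSession, genuinely parallel queries would each need their own session.

```python
import asyncio


class SingleFlightSession:
    """Toy stand-in for an AsyncSession: at most one operation may be in flight."""

    def __init__(self) -> None:
        self._busy = False

    async def execute(self, query: str) -> str:
        if self._busy:
            raise RuntimeError("concurrent operations on one session are not permitted")
        self._busy = True
        try:
            await asyncio.sleep(0)  # yield to the event loop, as real I/O would
            return f"rows for {query}"
        finally:
            self._busy = False


async def build_summary_sequential(session: SingleFlightSession, queries: list[str]) -> list[str]:
    # The safe pattern: each query finishes before the next one starts.
    return [await session.execute(q) for q in queries]


async def build_summary_gathered(session: SingleFlightSession, queries: list[str]) -> list[str]:
    # The unsafe pattern: overlapping operations on one shared session.
    return list(await asyncio.gather(*(session.execute(q) for q in queries)))
```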