Add an unblock sweep for blocked workplans vs. the inbox, inbox triage
discipline (thread_id + unread-age alerts), a workplan ID prefix lint with
cross-repo collision detection, and SCOPE Current State freshness. Addresses findings from the
2026-07-01 railiance-cluster review (RAILIANCE-WP-0014 sat blocked 13 days
after the unblocking inbox message arrived).
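The unblock sweep and unread-age alert could look roughly like this. It is a minimal sketch, not the shipped implementation: the `BlockedWorkplan` and `InboxMessage` shapes, their field names, and the two-day unread threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BlockedWorkplan:  # hypothetical shape
    workplan_id: str
    thread_id: str  # inbox thread expected to carry the unblocking message
    blocked_since: datetime


@dataclass
class InboxMessage:  # hypothetical shape
    thread_id: str
    received_at: datetime
    read: bool


def unblock_candidates(blocked, inbox):
    """Ids of blocked workplans whose thread got a message after they blocked."""
    return [
        wp.workplan_id
        for wp in blocked
        if any(
            m.thread_id == wp.thread_id and m.received_at > wp.blocked_since
            for m in inbox
        )
    ]


def unread_age_alerts(inbox, now, max_age=timedelta(days=2)):
    """Unread messages older than max_age (the threshold is an assumption)."""
    return [m for m in inbox if not m.read and now - m.received_at > max_age]
```

Run on a schedule, a sweep like this flags a workplan as soon as a message lands on its thread, instead of leaving it blocked until the next human review.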
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Align agent files with on-disk workplan prefixes (infer from workplan ids)
- Set workplan domain to registered domain_slug; add topic_slug where applicable
- Repair frontmatter delimiter formatting; migrate legacy task status literals
- Regenerate AGENTS.md, CLAUDE.md, and .claude/rules from State Hub templates
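Inferring a repo's prefix from its workplan IDs, and linting for cross-repo prefix collisions, can be sketched as follows. This assumes IDs follow the `PREFIX-WP-NNNN` shape seen in this log (e.g. `CUST-WP-0050`, `STATE-WP-0065`); the function names and the repo-to-prefix mapping are hypothetical, and the real lint may read prefixes from agent files rather than a dict.

```python
import re
from collections import Counter, defaultdict

# Assumed ID shape: uppercase prefix, "-WP-", four digits.
WORKPLAN_ID = re.compile(r"^([A-Z][A-Z0-9]*)-WP-(\d{4})$")


def infer_prefix(workplan_ids):
    """Most common prefix among well-formed workplan IDs, or None."""
    counts = Counter()
    for wid in workplan_ids:
        m = WORKPLAN_ID.match(wid)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(1)[0][0] if counts else None


def malformed_ids(workplan_ids, expected_prefix):
    """IDs that do not parse or that use a different prefix."""
    bad = []
    for wid in workplan_ids:
        m = WORKPLAN_ID.match(wid)
        if m is None or m.group(1) != expected_prefix:
            bad.append(wid)
    return bad


def prefix_collisions(repo_prefixes):
    """Map each prefix claimed by more than one repo to its sorted repo slugs."""
    owners = defaultdict(list)
    for repo, prefix in repo_prefixes.items():
        owners[prefix].append(repo)
    return {p: sorted(rs) for p, rs in owners.items() if len(rs) > 1}
```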
Add human-review script for 13 high-blast-radius repos, bulk-push helper,
and SSH-based Gitea inventory probe. Update exclusion list with SSH-verified
absent slugs; marki-docx now classified and registered.
Add exclusion list and batch classification author for post-cutover inventory.
Mark workplan finished after registering 7 new repos and reclassifying 43
migration rows via state-hub register-from-classification tooling.
Per the 2026-06-22 review: T03 dropped (registering unregistered repos under the
old model would only create legacy data to clean up). Implementation re-homed to state-hub-local
STATE-WP-0065; T04/T05/T10 merged into one spine migration (P1). CUST-WP-0050
stays the coordination driver. T11 (post-cutover inventory) replaces T03.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bernd confirmed kaizen-agentic and llm-connect stay agents-primary
(infotech secondary). All 11 custodian-repo .repo-classification.yaml
flipped to classified_by: human and re-validated clean against T01.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Insert a 'tooling' category between project and product (reusable internal
tooling/infrastructure: libraries, CLIs, services, ops components used across
the ecosystem rather than offered to external customers). Update §5 definition,
§11 decision procedure, §16 agent prompt, the machine-readable allowed-values,
and the CUST-WP-0050 T02 progress note. Nine custodian tooling repos
reclassified to it; the-custodian and inter-hub remain research.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
All 11 custodian-domain repos now carry a committed, validated
.repo-classification.yaml (first-pass classified_by: agent). T02 remains
in_progress pending the human-review step.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Activate the workplan and complete T01: add the machine-readable controlled
vocabulary canon/standards/repo-classification.allowed.yaml (categories,
domains, business_stake, business_mechanics, capability families, guidance),
reference it from the standard §12, and add tools/validate_repo_classification.py
(stdlib + PyYAML, --self-test PASS).
Begin T02: author the-custodian/.repo-classification.yaml (research · infotech ·
agents), which validates clean. classified_by: agent, pending human review.
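The kind of check `tools/validate_repo_classification.py` performs can be sketched with the standard library alone, assuming the YAML has already been parsed into a dict. The categories (research, project, tooling, product) and `classified_by` values (agent, human) come from this log; the two domains are just the ones named here out of the 14 market domains. The field names, and reading "research · infotech · agents" as category · primary domain · secondary domain, are assumptions.

```python
# Illustrative subset of the controlled vocabulary; the real values live in
# canon/standards/repo-classification.allowed.yaml (loaded there with PyYAML).
ALLOWED = {
    "category": {"research", "project", "tooling", "product"},
    "domain": {"infotech", "agents"},
    "classified_by": {"agent", "human"},
}

# Hypothetical field names for a parsed .repo-classification.yaml.
REQUIRED = ("category", "primary_domain", "classified_by")


def validate_classification(doc, allowed=ALLOWED):
    """Return a list of error strings; an empty list means it validates clean."""
    errors = [f"missing field: {key}" for key in REQUIRED if key not in doc]
    if "category" in doc and doc["category"] not in allowed["category"]:
        errors.append(f"unknown category: {doc['category']!r}")
    for key in ("primary_domain", "secondary_domain"):
        if key in doc and doc[key] not in allowed["domain"]:
            errors.append(f"unknown {key}: {doc[key]!r}")
    if "classified_by" in doc and doc["classified_by"] not in allowed["classified_by"]:
        errors.append(f"unknown classified_by: {doc['classified_by']!r}")
    return errors
```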
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adopt the repo as the primary workplan anchor: repo_id becomes required,
market-domain is derived from each repo's classification, and the
domain/topic spine is demoted/retired (RepoGoal becomes the goal primitive).
Add task T10 for the re-anchor plus the workstream -> workplan rename across
schema/API/MCP.
Add ADR-005 (Cross-Repo Workplans Live in Dedicated Project Repos): complex
cross-repo efforts get their own project repo (category: project) as the
anchor, retired to archive on completion with results living on in the
modified product repos. Rewrite D1 as resolved and add D1a for the
project-repo naming/archival convention.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Proposed workplan to adopt the Repo Classification Standard ecosystem-wide:
per-repo .repo-classification.yaml as source of truth, State Hub domain model
replaced by the standard's 14 market domains, auto-registration tooling, and
reclassification of the 57 existing registrations. Folds in the 2026-06-21
discrepancy findings as reconciliation targets. Blocking design question D1
(topic vs market-domain) flagged for resolution before schema work.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move specs/RepoClassificationStandard.md to
canon/standards/repo-classification-standard_v1.0.md with provenance
frontmatter (id: canon-repo-classification, status: active, v1.0). The
standard originated in Helix Forge; the-custodian is interim steward. Leave
a pointer stub in specs/ redirecting to canon and the rollout workplan.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- "seven project domains" (contradicted "six") -> "a growing set"
- clarify the six canon charters are the founding domains; note the hub
now tracks a larger live set (14 active) via list_domains()
- rename Foerster Capabilities -> Capabilities (live domain slug) in the
dependency chain and domains table
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Archive the out-of-date SCOPE.md to history/20260621-SCOPE.md and rebuild
it via the kaizen scope-analyst persona, grounded in INTENT.md (the newer
boundary authority), the repo tree, and the live list_domains() result.
Fixes:
- Stop conflating this repo with the standalone State Hub service: reframe
Provided Capabilities to governance canon / session protocol / append-only
memory (state-tracking, SBOM, MCP-tool-registration belong to /state-hub).
- Add missing boundary owners issue-core and repo-scoping to Out of Scope.
- Replace the self-contradictory 6/7 domain count with a pointer to the live
list (14 active as of 2026-06-21).
- Add updated frontmatter for freshness tracking.
Also records the scope-analyst agent memory for future runs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Switch coach and optimization agents from daily to weekly Monday crons
- Restore disabled tdd-workflow stanza; quote cron expressions
- Add credential routing guidance to AGENTS.md for Codex/Grok agents
- Wire credential-routing rule into CLAUDE.md for Claude Code sessions
- Scaffold kaizen agent memory files and record failed daily-triage run
Seed a non-secret service inventory (environments, hosts, clusters,
services, endpoints, access paths, evidence, gaps) with a JSON schema,
a renderer, and a generated service-catalog view. Adds the
`make ops-inventory-view` target, probe ActivityDefinition, and docs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codex daily-state-hub-wsjf-triage paused per operator confirmation;
activity-core is now the only enabled runner for the daily State Hub
WSJF triage at 07:20 Europe/Berlin. Set status: draft → active to
match.
Temporal schedule activity-schedule-6fca51fa-... is unpaused; next
run tomorrow ~07:20 Europe/Berlin will be the first real scheduled
verification.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The daily-triage workflow completed end-to-end with all three evidence
surfaces (working-memory note, State Hub daily_triage event, ActivityRun
row) referencing the same run_id f9b97749. Backend: llm-connect against
OpenRouter anthropic/claude-sonnet-4, 12.85s end-to-end.
Add Implementation Notes - 2026-06-02 capturing the bug chain found and
fixed today (five llm-connect commits, two activity-core commits), the
backend choice and its consequences for the next scheduled run, and an
explicit carve-out: the operational cutover step (pause Codex, flip
enabled: true, sync schedules) is intentionally deferred to operator
action and remains a prerequisite for T08.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
llm-connect's CLI default port (:8080) collides with the dev stack's
temporal-ui container. Hit during the 2026-06-01 cutover attempt with
OSError: Address already in use. Update Steps 3, 5, and 6 to use :8088
and note the conflict reason inline so the next operator does not
rediscover this the slow way.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T06 remains in_progress — no canary was rerun. Capture the runbook deliverable
(workplans/CUST-WP-0045-cutover-runbook.md @ 8ef5399), the still-unchanged
upstream fixes that should let the patched canary succeed, and the two
operational gotchas the runbook now documents (host-mode env overrides vs.
Docker-network .env; Claude CLI quota collision when triggering from inside
an active Claude Code session). Bump updated: to 2026-06-01.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Exact command sequence to rerun the patched real-LLM canary and, on success,
perform the Codex → activity-core cutover. Captures the heads-up about CLI
session collision, the host-mode env-var overrides for the worker/API, and the
verification queries for all three evidence surfaces. Frontmatter uses
type: runbook so the consistency checker does not treat it as a workstream.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Makes the state hub an event publisher so activity-core can drive
maintenance automation declaratively via ActivityDefinitions, rather
than the hub creating tasks itself.
- api/events/: lazy JetStream publisher + EventEnvelope mirroring
activity-core's contract; no-op when NATS_URL unset, fire-and-forget
with logged failures so publishing never breaks an API request.
- Wired publishers on the five v1.0 lifecycle events:
org.statehub.repo.registered (POST /repos/)
org.statehub.workstream.completed (PATCH /workstreams/* on transition)
org.statehub.decision.resolved (POST /decisions/*/resolve)
org.statehub.domain.goal.activated (POST /domain-goals/*/activate)
org.statehub.task.stale (scripts/cleanup_stale_tasks.py)
- docs/nats-event-subjects.md: subject naming convention + catalog.
- docs/cron-migration.md: design stub for replacing custodian-sync
systemd timer and cleanup-stale cron with ActivityDefinitions
(depends on activity-core WP-0003).
- docs/activity-core-delegation.md: protocol, invariants, cutover plan.
- SCOPE.md: declares activity-core as downstream event consumer and
restates that the state hub stays a read model, not a task factory.
Workplan: workplans/CUST-WP-0040-state-hub-nats-activity-core-integration.md
242 tests pass.
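The two publisher properties listed above (no-op when `NATS_URL` is unset, failures logged rather than raised) can be sketched independently of any NATS client. The `publish` callable stands in for a JetStream client wrapper, and the `Envelope` fields are hypothetical, not activity-core's actual `EventEnvelope` contract; lazy connection handling is not shown.

```python
import asyncio
import json
import logging
import os
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

log = logging.getLogger("statehub.events")


@dataclass
class Envelope:  # hypothetical fields, not activity-core's real contract
    subject: str
    data: dict
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class SafePublisher:
    """No-op without NATS_URL; publish failures are logged, never raised."""

    def __init__(self, publish, env=None):
        # publish: async callable (subject: str, payload: bytes) -> None,
        # e.g. a thin wrapper around a JetStream client (assumed, not shown).
        self._publish = publish
        self._enabled = bool((os.environ if env is None else env).get("NATS_URL"))

    async def emit(self, subject, data):
        if not self._enabled:
            return False
        payload = json.dumps(asdict(Envelope(subject, data))).encode()
        try:
            await self._publish(subject, payload)
            return True
        except Exception:
            # Swallow and log so an outage never fails the API request.
            log.exception("event publish failed: %s", subject)
            return False


if __name__ == "__main__":
    async def print_publish(subject, payload):
        print(subject, payload.decode())

    publisher = SafePublisher(print_publish, env={"NATS_URL": "nats://127.0.0.1:4222"})
    asyncio.run(publisher.emit("org.statehub.repo.registered", {"slug": "state-hub"}))
```

A route handler could `await publisher.emit(...)` after its database commit, or wrap the call in `asyncio.create_task` so the response does not wait on the broker at all.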
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Migration t7o8p9q0r1s2: indexes on tasks.status, tasks(workstream_id,status),
workstreams.status, sbom_snapshots(repo_id,snapshot_at)
- workplan-index: 30 s TTL cache + ?refresh param (4171 ms → 16 ms on hit)
- /state/summary: 15 s TTL cache, bypassed on Cache-Control: no-cache
- /topics/: noload(workstreams, decisions, progress_events) (2382 ms → 115 ms)
- /domains/: noload(topics, repos, goals) (2252 ms → 39 ms)
- /repos/: noload(goals) (2222 ms → 599 ms first / fast on repeat)
- conftest: reset TTL caches between tests to prevent bleed-through
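The TTL caches above follow a simple pattern: recompute after a fixed lifetime, or immediately when the caller asks for a refresh. A minimal single-value sketch, not the actual implementation (the class name is hypothetical, and it is not thread-safe):

```python
import time


class TTLCache:
    """Single-value cache that recomputes after ttl_seconds or on demand."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value = None
        self._expires_at = None  # None means the cache is empty

    def get(self, compute, refresh=False):
        now = self._clock()
        if refresh or self._expires_at is None or now >= self._expires_at:
            self._value = compute()
            self._expires_at = now + self.ttl
        return self._value

    def clear(self):
        """Drop the cached value, e.g. between tests to prevent bleed-through."""
        self._value = None
        self._expires_at = None
```

For `/state/summary`, a handler could call something like `summary_cache.get(build_summary, refresh=...)`, passing `refresh=True` when the request carries `Cache-Control: no-cache`; the workplan-index `?refresh` parameter would map to the same flag. Both names are hypothetical.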
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Workplan for delegating maintenance automation from the state hub to
activity-core via NATS JetStream. Covers NATS publisher wiring, subject
schema, lifecycle event emission, cron migration stubs, and delegation
protocol docs.
Hub workstream: d8ac100b-a844-46a5-9684-415df0d32539
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
T4: workstreams.md and dependencies.md now call /state/deps instead of the
full /state/summary — removes 2 heavy 10-table queries per 60 s cycle.
T5: index.md's 4 independent polling loops (summaryState, sbomSnapState,
regsState, wsChartState) consolidated into a single pageState generator
with one Promise.all batch and a shared backoff counter.
T6: config.js gains waitForVisible(ms) — pauses polling entirely while the
tab is hidden and fires immediately on visibilitychange. pollDelay()
simplified (hidden-tab POLL_HIDDEN logic removed). All 16 polling pages
migrated from await sleep(pollDelay(...)) to await waitForVisible(pollDelay(...)).
CUST-WP-0039 complete — all 6 tasks done.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
T1: Cache-Control max-age=60 on /topics/, /repos/, /domains/ list endpoints
so repeated dashboard polls within a minute are served from browser cache.
T2: ETag middleware (md5 hash) on all JSON GET responses with conditional-GET
(304 Not Modified) support; If-None-Match and ETag added to CORS headers.
ETag registered inside CORS so 304s automatically carry CORS headers.
T3: GET /state/deps — lightweight dep-graph endpoint returning open workstreams
with depends_on/blocks edges only, skipping the 10-table full-summary query.
Prerequisite for T4 (switching workstreams.md and dependencies.md off /state/summary).
Workplan: CUST-WP-0039-dashboard-poll-optimization.md
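The conditional-GET logic behind the ETag middleware can be shown without a web framework. This sketch hashes the serialized body with md5 as described and applies the weak comparison that RFC 9110 requires for `If-None-Match`; the function names are hypothetical. Because the body is still built before hashing, a 304 saves bandwidth and client-side parsing, not the server-side query.

```python
import hashlib


def compute_etag(body: bytes) -> str:
    """Strong ETag: the quoted md5 hex digest of the response body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def if_none_match_hits(header, etag):
    """Weak comparison: W/"x" matches "x", and '*' matches any current body."""
    if not header:
        return False
    tags = [t.strip() for t in header.split(",")]
    return any(t == "*" or t.removeprefix("W/") == etag for t in tags)


def conditional_get(body: bytes, if_none_match=None):
    """Return (status, body, headers) for a JSON GET response."""
    etag = compute_etag(body)
    if if_none_match_hits(if_none_match, etag):
        return 304, b"", {"ETag": etag}
    return 200, body, {"ETag": etag}
```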
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Watching .venv/ (6k files) and dashboard/node_modules/ (6k files) was
causing sustained ~42% CPU on the uvicorn main process.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds first-class tracking for API and interface mutations across the
agent ecosystem. Breaking changes are documented, affected repos are
notified via inbox, and agents discover pending changes at session
start via the dispatch endpoint.
- Migration q4l5m6n7o8p9: interface_changes table
- Model/schema: InterfaceChange with draft→published→resolved lifecycle
- Router: POST/GET/PATCH /interface-changes/, /publish, /resolve actions
(auto-notify affected repo agents on publish; progress event on origin)
- Dispatch: GET /repos/{slug}/dispatch now returns pending_interface_changes
- MCP tools: register_interface_change, list_interface_changes,
publish_interface_change, resolve_interface_change
- Dashboard: /interface-changes page with type badges, planned calendar,
published cards, and draft table
- EP-CUST-ICR-001 registered: webhook subscriptions (deliberately deferred)
First record: trailing-slash normalisation (2026-04-26), published,
affecting repo-registry — visible in repo-registry dispatch immediately.
223 tests passing.
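The draft → published → resolved lifecycle, with notifications fired only on publish, can be modeled as a small forward-only state machine. The dict fields (`status`, `title`, `affected_repos`) and the notification tuple format are hypothetical, and the progress event on the origin repo is not modeled.

```python
from enum import Enum


class ChangeStatus(str, Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    RESOLVED = "resolved"


# Forward-only lifecycle: draft -> published -> resolved.
NEXT = {
    ChangeStatus.DRAFT: {ChangeStatus.PUBLISHED},
    ChangeStatus.PUBLISHED: {ChangeStatus.RESOLVED},
    ChangeStatus.RESOLVED: set(),
}


def transition(change: dict, target: ChangeStatus) -> list:
    """Advance change["status"] and return (repo, message) inbox notifications.

    Only publishing notifies; recipients are change["affected_repos"].
    """
    current = ChangeStatus(change["status"])
    if target not in NEXT[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    change["status"] = target.value
    if target is ChangeStatus.PUBLISHED:
        return [
            (repo, f"interface change published: {change['title']}")
            for repo in change["affected_repos"]
        ]
    return []
```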
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rule: trailing slash only on collection roots (/). Any route containing
a path parameter {…} uses no trailing slash. Applies across all routers,
scripts, Makefile, and tests. Fixes 307-redirect fragility on POST/PATCH
from naive clients (curl, Codex HTTP calls).
Also adds POST /repos/{slug}/sync — runs ADR-001 consistency check with
--fix via HTTP, so non-MCP agents (Codex) can self-service DB sync without
operator intervention.
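The trailing-slash rule is easy to enforce as a lint over route paths, for example over the `path` of each route in a FastAPI app. A sketch with a hypothetical function name; treating a single static segment such as `/repos/` as the only kind of collection root is a simplifying assumption.

```python
def trailing_slash_problem(route):
    """Return a message if route breaks the trailing-slash rule, else None.

    Assumption: a collection root is a single static segment such as /repos/.
    """
    segments = [s for s in route.strip("/").split("/") if s]
    if "{" in route:
        # Any route with a path parameter must not end with a slash.
        if route.endswith("/"):
            return f"{route}: path-parameter routes must not end with '/'"
        return None
    if len(segments) == 1 and not route.endswith("/"):
        return f"{route}: collection roots must end with '/'"
    return None
```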
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously defaulted to CWD ("."), causing ingest to silently scan the
state-hub directory instead of the target repo when called without
--repo-path. Now queries GET /repos/{slug}/ for host_paths[hostname]
and exits with a clear error if neither flag nor hub lookup succeeds.
Also deleted the incorrect SBOM snapshot for repo-registry (420 entries
that were actually state-hub packages).
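The new resolution order (explicit flag, then the hub's `host_paths[hostname]`, otherwise a hard error instead of falling back to `.`) can be sketched as below. `fetch_repo` stands in for the `GET /repos/{slug}/` call and is injected so the sketch needs no HTTP client; the function and error names are hypothetical.

```python
import socket


class RepoPathError(RuntimeError):
    pass


def resolve_repo_path(slug, repo_path=None, fetch_repo=None, hostname=None):
    """Pick the checkout to scan: explicit flag first, then the hub's host_paths.

    fetch_repo(slug) should return the decoded JSON for GET /repos/{slug}/
    (or raise). There is deliberately no fallback to the current directory.
    """
    if repo_path:
        return repo_path
    hostname = hostname or socket.gethostname()
    repo = None
    if fetch_repo is not None:
        try:
            repo = fetch_repo(slug)
        except Exception:
            repo = None  # treat hub errors like a missing entry
    path = ((repo or {}).get("host_paths") or {}).get(hostname)
    if path:
        return path
    raise RepoPathError(
        f"no --repo-path given and no host_paths[{hostname!r}] for {slug!r}; "
        "refusing to scan the current directory"
    )
```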
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause of the 501-commit pile-up in inter-hub: fix_repo() created
git commits (brief updates, T03 writebacks) but never pushed them, so
the 15-minute timer accumulated local commits indefinitely. Once real
development landed on remote the repos diverged with no self-healing path.
Changes
-------
repo_sync.py (new module)
Extracts all git lifecycle primitives: pull_ff, push_ff,
count_remote_ahead (C-16 input), count_local_ahead (C-17/T04 input).
Module docstring documents the push-seal invariant and stable state.
consistency_check.py
- Imports primitives from repo_sync; thin _detect_behind_remote wrapper
preserves backward compat for existing callers and tests.
- C-17 backlog guard: if local has unpushed commits from a prior failed
push, retry before making more; skip all writes if push still fails.
- T04 push seal: unconditional push_ff() at end of every fix_repo() run.
- _report_needs_action: ahead_of_remote param so repos with unpushed
backlogs are not silently skipped as "clean" by fix_all_remote().
- Domain-slug fallback: brief no longer degrades to "(unknown)" when all
workplans are completed — falls back to any workstream for domain context.
- Service switched from --all --fix to --remote --all (pulls before
fixing, skips already-clean repos).
push-seal.md (new)
Capability documentation: the problem, the invariant, all three checks
(C-16/C-17/T04), stable-state description, API reference, and test map.
test_repo_sync.py (new, 32 tests)
Full coverage of all four primitives via real git repos (tmp_path).
Includes C-17 scenario, push-seal invariant, and four end-to-end
loop-stability tests.
test_consistency_check.py
Four new _report_needs_action cases for the ahead_of_remote parameter.
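The control flow described for `fix_repo()` (pull first, C-17 backlog guard, T04 push seal) can be sketched with the git primitives passed in as callables. The primitive names follow `repo_sync.py` as listed above, but this signature and the returned status strings are hypothetical.

```python
def fix_repo_sketch(pull_ff, count_local_ahead, push_ff, apply_fixes):
    """One consistency pass with the C-17 backlog guard and the T04 push seal.

    pull_ff/push_ff return True on success, count_local_ahead returns the
    number of unpushed local commits, and apply_fixes may create local
    commits (brief updates, writebacks).
    """
    pull_ff()  # --remote mode fast-forwards first; result ignored in this sketch
    # C-17: retry a backlog left by an earlier failed push; if it still cannot
    # be pushed, make no further commits this run.
    if count_local_ahead() > 0 and not push_ff():
        return "skipped: unpushed backlog"
    apply_fixes()
    # T04: seal every run with an unconditional push so commits never pile up.
    return "fixed" if push_ff() else "fixed: push failed, retried next run"
```

The guard is what breaks the old accumulation loop: a repo that cannot push stops producing new commits, and the next run retries the backlog before doing anything else.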
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add PATCH /token-events/{id} endpoint to correct heuristic events
- Add `note` filter to GET /token-events/ list
- Add TokenEventPatch schema
- Add task_token_hook.py: PostToolUse hook that reads the Claude Code
session transcript, computes per-task token delta, and replaces the
heuristic token event with real measured counts (note="measured")
- Register hook in ~/.claude/settings.json on mcp__state-hub__update_task_status
Covers both interactive sessions and ralph-workplan loops
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
7-task workplan to give worker agents efficient MCP access to domain/repo
scope and capabilities without exploring source repos directly.
Phase 1 (code): repo_id FK on CapabilityCatalog, capabilities in
get_domain_summary, new get_capability_profile MCP tool.
Phase 2 (data): populate missing repo descriptions, back-fill repo_id
on 25 existing entries, register capabilities for 3 empty domains.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the two separate charts with one combined area+line chart.
Events use the left y-axis (steelblue); tokens use a normalized scale
with a right y-axis (amber) that formats values as k/M. When no token
data exists yet the right axis is omitted and a legend note explains.
Hover tooltips show actual values for both series.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fetches /token-events/?limit=1000 in parallel with progress events and
renders a second area+line chart (amber) below the events-per-day chart,
aggregating tokens_in + tokens_out per calendar day over the same 30-day window.
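The per-day aggregation is sketched here in Python for consistency with the other examples, although the dashboard itself runs in the browser. The `created_at` field name is an assumption, and timestamps are taken as naive ISO-8601 strings in a single timezone. Because the fetch is capped at `limit=1000`, a window with more events than that would undercount its oldest days.

```python
from datetime import date, datetime, timedelta


def tokens_per_day(events, today: date, days: int = 30):
    """Sum tokens_in + tokens_out per calendar day for the window ending today.

    Every day in the window gets an entry (zero-filled) so the chart has no gaps.
    """
    start = today - timedelta(days=days - 1)
    totals = {start + timedelta(days=i): 0 for i in range(days)}
    for ev in events:
        day = datetime.fromisoformat(ev["created_at"]).date()
        if day in totals:  # events outside the window are ignored
            totals[day] += ev.get("tokens_in", 0) + ev.get("tokens_out", 0)
    return totals
```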
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three reactive dropdowns below the Token Cost heading:
- Filter by repo: client-side filter via 3-level chain resolution
- Sort by: Tokens Total (default), Tokens In, Out, Event Count, Most Recent
- Show: 10/20/50/100/500 rows per table (default 20)
Applies uniformly to By Repo, By Workplan, and Top Tasks tables.
"Most Recent" derives last_event_at per group from the fetched events.
Truncated tables show a "Showing M of N" count below.
Completes CUST-WP-0030 T07–T09.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
By Repo now resolves via the full chain rather than requiring repo_id
directly on the token event:
1. token_events.repo_id (direct)
2. → workstreams.repo_id (via workstream_id)
3. → task.workstream_id → workstreams.repo_id (via task_id)
Changes:
- Auto-populate repo_id on token events at creation time (both the
token_events router and the tasks router)
- New GET /token-events/by-repo/ endpoint with RepoTokenSummary schema;
returns tokens_in/out/total, event_count, by_model, by_note per repo
- Dashboard By Repo section uses /by-repo/ directly and shows repo_slug
instead of a truncated UUID
- Backfilled the three existing events (userbased) with repo_id via SQL
185 tests pass.
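The three-level chain can be illustrated with in-memory dicts standing in for the tables; the real endpoint presumably resolves it in SQL. Since `repo_id` is now populated at creation time, the fallback steps mainly matter for older events. The function name and dict shapes are hypothetical.

```python
def resolve_repo_id(event, workstreams, tasks):
    """Resolve a token event's repo via the three-level chain.

    workstreams: {workstream_id: {"repo_id": ...}}
    tasks: {task_id: {"workstream_id": ...}}
    """
    # 1. repo_id stored directly on the event
    if event.get("repo_id"):
        return event["repo_id"]
    # 2. event.workstream_id -> workstreams.repo_id
    ws = workstreams.get(event.get("workstream_id")) or {}
    if ws.get("repo_id"):
        return ws["repo_id"]
    # 3. event.task_id -> task.workstream_id -> workstreams.repo_id
    task = tasks.get(event.get("task_id")) or {}
    ws = workstreams.get(task.get("workstream_id")) or {}
    return ws.get("repo_id")
```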
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tier 1 (exact counts) now defaults to note="measured" instead of null,
signalling the counts were read from the Claude Code status bar.
Callers can pass note="userbased" when a human provided the numbers.
measured — agent read exact counts from the Claude Code status bar
userbased — counts provided by a human
workplan — prorated from workplan total across task count
heuristic — server fallback, 1000/500, no agent input
Added token_note field to TaskUpdate schema and exposed note param on
update_task_status and record_interactive_task MCP tools.
TOOLS.md documents the full taxonomy. 185 tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New tool for capturing ad-hoc work done outside formal workplans.
Finds or creates a persistent 'interactive-<repo>' workstream for the
repo, creates the task, marks it done, and records a token event using
the three-tier logic — all in a single call.
Seeded two example events on interactive-the-custodian:
- Three-tier token recording on task done (8000/3500)
- Add record_interactive_task MCP tool (4500/1800)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Token events are now always created when update_task_status is called
with status="done", using the best available data:
Tier 1 (best): exact tokens_in + tokens_out passed by agent
Tier 2: workplan_tokens_in + workplan_tokens_out prorated
across workstream task count (note="workplan")
Tier 3 (fallback): heuristic 1000 in / 500 out (note="heuristic")
Non-done status changes never create a token event.
MCP tool updated with workplan_tokens_in/out params and tiered docs.
Ralph-workplan skill files updated with the three-tier guidance.
184 tests pass.
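The tier selection can be summarized as a pure function. Parameter names mirror the MCP tool parameters mentioned above, but the signature is hypothetical. Tier 1 defaults to `note="measured"` as in the note taxonomy described earlier in this log, and the floor division used for prorating is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenEvent:
    tokens_in: int
    tokens_out: int
    note: str


def token_event_for(status, tokens_in=None, tokens_out=None,
                    workplan_tokens_in=None, workplan_tokens_out=None,
                    task_count=0, note=None) -> Optional[TokenEvent]:
    """Pick the best available token data when a task status changes."""
    if status != "done":
        return None  # non-done transitions never record tokens
    # Tier 1: exact counts from the caller ("measured" unless overridden).
    if tokens_in is not None and tokens_out is not None:
        return TokenEvent(tokens_in, tokens_out, note or "measured")
    # Tier 2: workplan totals prorated evenly across the workstream's tasks.
    if workplan_tokens_in is not None and workplan_tokens_out is not None and task_count > 0:
        return TokenEvent(workplan_tokens_in // task_count,
                          workplan_tokens_out // task_count, "workplan")
    # Tier 3: server-side heuristic fallback.
    return TokenEvent(1000, 500, "heuristic")
```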
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The post-commit hook re-invokes fix-consistency, which commits writeback
changes, which re-triggers the hook — causing exponential process spawning.
Fix: pass GIT_CUSTODIAN_SYNC=1 in the env for all writeback git commits.
Update the post-commit hook (not tracked by git) to exit early when this
variable is set.
Also remove the --no-verify flag that was added as a failed attempt (it
only skips pre-commit/commit-msg, not post-commit hooks).
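The guard works because git runs hooks with the environment of the git process, so a variable set on the writeback commit is visible to the post-commit hook. A sketch with hypothetical helper names; the real hook is an untracked shell script, so its early-exit check appears here as a Python predicate, and staging the writeback files is not shown.

```python
import os
import subprocess

GUARD_VAR = "GIT_CUSTODIAN_SYNC"


def writeback_env(base=None):
    """Copy of the environment with the recursion guard set."""
    env = dict(os.environ if base is None else base)
    env[GUARD_VAR] = "1"
    return env


def commit_writeback(repo_dir, message):
    """Commit already-staged writeback changes with the guard in the env."""
    subprocess.run(["git", "-C", repo_dir, "commit", "-m", message],
                   env=writeback_env(), check=True)


def hook_should_skip(env=None):
    """The post-commit hook's first check: exit early on guarded commits."""
    return (os.environ if env is None else env).get(GUARD_VAR) == "1"
```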
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add git_fingerprint (root commit SHA-1) to managed_repos as a stable,
machine-independent identifier — identical across every clone regardless
of checkout path, remote URL, or SSH alias.
- Migration n1i2j3k4l5m6: adds git_fingerprint column + non-unique index
(non-unique to support repos that share ancestry via forks/splits)
- GET /repos/by-fingerprint?hash=<sha>[&remote_url=<url>]: lookup by
fingerprint; optional remote_url disambiguates shared-ancestry repos
- GET /repos/by-remote?url=<url>: fallback lookup by remote URL
- consistency_check.py --here [PATH]: auto-detects repo slug from any
local checkout via fingerprint (falls back to remote URL), then auto-
registers host_paths[hostname] so subsequent runs need no override
- --all now includes repos with host_paths[current_hostname], not just
those with local_path
- fix-consistency-here / check-consistency-here Makefile targets
- Fixed _api_get bug: httpx strips query strings when params={} is passed
- Backfilled fingerprints for 14 repos on this host
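Computing the fingerprint and the `by-fingerprint` lookup with optional `remote_url` disambiguation might look like this. `git rev-list --max-parents=0 HEAD` lists root commits; picking the lexicographically smallest when a history has several roots is an assumption of this sketch, as are the function names and dict fields.

```python
import subprocess


def pick_fingerprint(rev_list_output):
    """Choose a deterministic root SHA from `git rev-list --max-parents=0` output."""
    roots = sorted(line.strip() for line in rev_list_output.splitlines() if line.strip())
    if not roots:
        raise ValueError("no root commit found")
    return roots[0]  # several roots only occur after merging unrelated histories


def git_fingerprint(repo_dir):
    """Root commit SHA-1 of a checkout; identical across clones."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--max-parents=0", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return pick_fingerprint(out)


def lookup_by_fingerprint(repos, fingerprint, remote_url=None):
    """Filter registered repos; remote_url narrows shared-ancestry matches."""
    matches = [r for r in repos if r.get("git_fingerprint") == fingerprint]
    if remote_url is not None and len(matches) > 1:
        matches = [r for r in matches if r.get("remote_url") == remote_url]
    return matches
```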
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POST /topics/ was already implemented in the REST API but had no MCP
wrapper, so agents couldn't create topics (e.g. inter_hub) via MCP.
Tool follows the same pattern as create_domain.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**Purpose:** Transgenerational cognitive infrastructure and central coordination hub for all domains. Houses the state-hub (PostgreSQL + FastAPI + MCP + dashboard), governance canon, workplans, and agent session memory.
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What This Repository Is
**The Custodian** is a *transgenerational cognitive infrastructure* — a local-first, sovereignty-preserving agent system for co-creating and stewarding knowledge across seven project domains. v0.1 is a governance and schema skeleton; `state-hub/` is the first live implementation layer.
Each project under `canon/projects/` follows a consistent three-file pattern:
- `project_charter_v0.1.md` — purpose, problem statement, scope, success criteria
- `concepts_seed_v0.1.md` — ten foundational concepts for the domain
- `roadmap_v0.1.md` — multi-phase implementation plan
## Build / Test / Lint
### State Hub (primary active service)
```bash
cd state-hub
# One-time setup
cp .env.example .env # edit POSTGRES_PASSWORD
make install # uv sync → installs Python deps
# Docker (requires Docker Engine — see Docker Setup below)
make db # start postgres on 127.0.0.1:5432
make migrate # alembic upgrade head
make seed # insert 6 canonical topics
# Run services (each restarts the service if already running)
make api # db + migrate + uvicorn on 127.0.0.1:8000
make dashboard # Observable preview on :3000
make check # curl /state/health
```
The MCP server runs as a persistent SSE service (`make mcp-http`, port 8001). Registered at user scope via `claude mcp add-json -s user state-hub '{"type":"sse","url":"http://127.0.0.1:8001/sse"}'`. Restart the MCP server independently — no Claude Code restart needed.
Confidential and Proprietary. Authorized Use Only. Subject to NDA & Contractual
# The Custodian
**Transgenerational Cognitive Infrastructure** — a local-first, sovereignty-preserving agent system for co-creating and stewarding knowledge across a growing set of project domains.
The Custodian acts as co-creator and steward, not authority. Humans approve all irreversible decisions. The system is designed to still be coherent decades from now.
- `concepts_seed_v0.1.md` — ten foundational concepts
- `roadmap_v0.1.md` — multi-phase implementation plan
## State Hub — Quick Start
The State Hub is the live operational layer: a PostgreSQL database, a FastAPI REST service, an MCP server, and an Observable dashboard. Its authoritative implementation now lives in the standalone checkout at `/home/worsch/state-hub`.
### Prerequisites
### First-time setup
```bash
cd /home/worsch/state-hub
cp .env.example .env # set POSTGRES_PASSWORD
make install # uv sync → Python deps + custodian CLI in .venv
```
### Dashboard
```bash
cd /home/worsch/state-hub
make dashboard # Observable Framework dev server on :3000
```
The MCP server exposes 11 tools and 5 resources directly in every Claude Code session.
> This file helps you quickly understand what this repository is about,
## One-liner
Governance and continuity substrate for a local-first, multi-domain agent ecosystem — owns canon, memory, workplans, and agent runtime scaffolding; coordinates through the standalone State Hub service rather than hosting it.
---
## Core Idea
The Custodian holds the long-lived **meaning, boundaries, and continuity** of the
`get_domain_summary("custodian")` (MCP); State Hub service at
`/home/worsch/state-hub` (`make api`)
---
## Provided Capabilities
```capability
type: reference
title: Governance canon
description: Constitution, foundational values, standards, and per-domain charters/roadmaps that define what matters and what is permitted across the ecosystem.
```

```capability
title: Session protocol and cross-domain orientation
description: Conventions for how agents orient, coordinate, and hand off via the State Hub, including ADR-001 workplan origination and human-gated review.
```
Dependency order for domain sequencing: Railiance → Markitect → Coulomb.social → Personhood/Foerster → Custodian.
- This repo intentionally avoids reabsorbing runtime code. If a subsystem grows a
runtime, tests, and a deployment surface, it should move to its own repo and
report back through the State Hub and workplans (see `INTENT.md` → design values).
- After any workplan change, run `cd /home/worsch/state-hub && make
fix-consistency REPO=the-custodian` to keep the dashboard accurate.
description: Meta-agent that analyzes and optimizes other Claude Code subagents based on their performance data, usage patterns, and effectiveness metrics. Use PROACTIVELY for agent ecosystem improvement.
Meta-agent that analyzes and optimizes other Claude Code subagents based on their performance data, usage patterns, and effectiveness metrics. Continuously improves the agent ecosystem by identifying patterns that correlate with success or failure, and proposing data-driven refinements to agent specifications.
## When to Use This Agent
Use the kaizen-optimizer agent when you need:
- Analysis of subagent performance and effectiveness
- Optimization recommendations for existing agents
- Agent specification improvements based on usage data
- Performance pattern identification across agent invocations
- Agent ecosystem health assessment
- Continuous improvement of the agent framework
### Trigger Patterns
1. **Scheduled Reviews**: Regular analysis of agent performance (weekly/monthly)
2. **Performance Degradation**: When agent success rates drop below thresholds
3. **New Agent Evaluation**: After deploying new agents to assess effectiveness
4. **Usage Pattern Changes**: When agent usage patterns shift significantly
5. **Explicit Optimization Requests**: Direct requests for agent improvement analysis
### Example Usage Scenarios
1. **Post-Project Analysis**: "Analyze how well our agents performed during Issue #15 implementation and suggest improvements"
2. **Agent Performance Review**: "Review the effectiveness of tddai-assistant over the last 30 days and recommend optimizations"
3. **Ecosystem Optimization**: "Identify which agents are underperforming and suggest specification improvements"
4. **Success Pattern Analysis**: "Analyze successful agent chains and recommend best practices"
Works with other agents to gather performance data:
- Uses **general-purpose** for complex analysis tasks
- Coordinates with **project-assistant** for milestone-based performance tracking
- Leverages **claude-expert** for framework knowledge and best practices
## Expected Outputs
### Performance Analysis Reports
- Agent effectiveness rankings with supporting evidence
- Usage pattern analysis and trend identification
- Success/failure correlation analysis
- Performance bottleneck identification
### Optimization Recommendations
- Specific agent specification improvements
- Trigger pattern refinements
- Agent chain optimization suggestions
- New agent capability recommendations
### Implementation Guidance
- Prioritized improvement roadmap
- Specification update templates
- A/B testing suggestions for agent improvements
- Rollback strategies for failed optimizations
## Best Practices for Usage
### Provide Performance Context
- Share specific agent interactions that were particularly effective or ineffective
- Describe user experience challenges with current agents
- Include examples of successful and unsuccessful agent chains
- Specify performance concerns or optimization goals
### Be Specific About Scope
- Focus on particular agents or agent categories for analysis
- Define time windows for performance analysis
- Specify success criteria for optimization efforts
- Clarify whether analysis should be broad ecosystem or targeted
### Implementation Approach
- Request prioritized recommendations based on impact vs. effort
- Ask for specific specification changes rather than general advice
- Seek rollback plans for proposed optimizations
- Request measurable success criteria for improvements
## Quality Standards
### Analysis Rigor
- Evidence-based recommendations supported by usage patterns
- Consideration of trade-offs between different optimization approaches
- Realistic improvement expectations and timelines
- Acknowledgment of limitations in available performance data
### Recommendation Quality
- Specific, actionable changes to agent specifications
- Clear success criteria for measuring improvement effectiveness
- Integration considerations for agent ecosystem harmony
- Risk assessment for proposed changes
## Integration Notes
This agent operates within Claude Code's conversation context and focuses on:
- **Qualitative Analysis**: Where detailed metrics are unavailable, the agent focuses on behavioral patterns and the quality of user interactions
- **Specification Optimization**: Improving agent descriptions, examples, and usage guidance
- **Ecosystem Balance**: Ensuring agents complement rather than compete with each other
- **Practical Improvements**: Recommendations that can be implemented through specification updates
The agent serves as the continuous improvement engine for the subagent ecosystem, ensuring agents evolve to better serve user needs and project requirements.
## Session Start
1. Check for `.kaizen/agents/optimization/memory.md` in the project root.
2. If present, read it before beginning analysis.
3. If `.kaizen/metrics/optimizer/analysis.json` exists, review it for the latest fleet report.
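
The Session Start checks above can be sketched in a few lines. This is a minimal illustration, not kaizen-agentic code. The helper name `load_session_context` is hypothetical. The sketch also assumes the agent runs from the project root and that `analysis.json` holds a single JSON document.

```python
import json
from pathlib import Path


def load_session_context(project_root):
    """Load the optional optimizer memory file and latest fleet report.

    Hypothetical helper: the paths follow the Session Start steps, and a
    missing file is reported as None instead of raising.
    """
    root = Path(project_root)
    memory_path = root / ".kaizen" / "agents" / "optimization" / "memory.md"
    report_path = root / ".kaizen" / "metrics" / "optimizer" / "analysis.json"

    memory = memory_path.read_text(encoding="utf-8") if memory_path.is_file() else None
    report = None
    if report_path.is_file():
        # Assumption: analysis.json holds a single JSON document.
        report = json.loads(report_path.read_text(encoding="utf-8"))
    return {"memory": memory, "report": report}
```

Returning `None` for absent files keeps a first session (no memory, no report yet) on the same code path as later sessions.
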
## Session Close
1. When analysis completes, record key findings under `## Accumulated Findings` in `memory.md`.
2. Append one line to `## Session Log`: `YYYY-MM-DD · <agents reviewed> · <outcome>`.
3. Bump `last_updated` and increment `session_count`.
4. Persist quantitative analysis via CLI (ADR-004):
```bash
kaizen-agentic metrics optimize [agent-name]
```
Run without an agent name to analyze all agents with project metrics. Requires
≥10 execution records per agent for actionable recommendations (see
that production/shared staging deploys must not depend on `latest`.
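
The memory bookkeeping in Session Close steps 2 and 3, plus the 10-record threshold, can be sketched as follows. This is a hedged illustration only. The function names are hypothetical. The sketch assumes `last_updated:` and `session_count:` appear as plain `key: value` lines (for example in YAML frontmatter) and that `memory.md` already has a `## Session Log` heading.

```python
import re
from datetime import date

MIN_RECORDS = 10  # per-agent threshold for actionable recommendations


def close_session(memory_text, agents, outcome, today=None):
    """Return memory.md text with one Session Log entry added.

    Assumes `last_updated:` and `session_count:` appear as plain
    `key: value` lines and that a `## Session Log` heading exists.
    """
    day = (today or date.today()).isoformat()
    entry = f"{day} · {', '.join(agents)} · {outcome}"

    text = re.sub(r"(?m)^last_updated:.*$", f"last_updated: {day}", memory_text, count=1)
    text = re.sub(
        r"(?m)^session_count:[ \t]*(\d+)[ \t]*$",
        lambda m: f"session_count: {int(m.group(1)) + 1}",
        text,
        count=1,
    )

    # The Session Log section ends at the next level-2 heading or at end of file.
    start = text.index("## Session Log")
    next_heading = text.find("\n## ", start + 1)
    end = len(text) if next_heading == -1 else next_heading
    head = text[:end].rstrip("\n") + f"\n{entry}\n"
    tail = text[end:].lstrip("\n")
    return head + ("\n" + tail if tail else "")


def actionable_agents(record_counts):
    """Names of agents with at least MIN_RECORDS execution records."""
    return sorted(name for name, count in record_counts.items() if count >= MIN_RECORDS)
```
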
The current Core Hub staging profile is acceptable as a near-term service-repo
profile. It should later be promoted into a Railiance app release path once the
API contract and staging evidence are stable.
## Environment Posture
Use three distinct postures so build/release work does not overfit to the local
workstation:
| Posture | Purpose | Core Hub behavior |
|---|---|---|
| Local/dev | Fast contract work and disposable smoke proof. | `uv`, SQLite/disposable DBs, `CORE_HUB_AUTO_CREATE_TABLES=1` only for local smoke/test bootstraps, no live cluster dependency. |
| Staging | Production-like proof without cutover. | Postgres, Alembic migrations only, Kubernetes `core-hub-staging`, commit-SHA images, OpenBao/operator-owned secret references, deployed API and activity-core smokes. |
| Production/cutover | Replacement of Haskell Inter-Hub. | Requires staging import, dual-run or shadow smokes, rollback notes, non-secret readiness summary, and explicit operator approval. |
Do not let missing runner automation block API contract work. It should block
only publish/deploy automation that actually needs forge runner labels or
cluster credentials.
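
The posture rules above can be expressed as a pre-deploy check. This is a minimal sketch, not existing Core Hub tooling. `DeployPlan`, its posture labels, the `approval_id` field, and the assumption that image tags are 7 to 40 character hex commit SHAs are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

COMMIT_SHA = re.compile(r"[0-9a-f]{7,40}")  # assumption: short or full git SHA tags


@dataclass
class DeployPlan:
    posture: str  # hypothetical labels: "local", "staging", or "production"
    image_tag: str
    env: dict
    approval_id: Optional[str] = None  # hypothetical record of operator approval


def posture_violations(plan):
    """List posture-rule violations for a planned deploy (empty list means ok)."""
    if plan.posture not in {"local", "staging", "production"}:
        return [f"unknown posture {plan.posture!r}"]
    if plan.posture == "local":
        return []  # disposable local smoke/test bootstraps may auto-create tables
    problems = []
    if plan.env.get("CORE_HUB_AUTO_CREATE_TABLES") == "1":
        problems.append("CORE_HUB_AUTO_CREATE_TABLES=1 is local/dev only; use Alembic migrations")
    if not COMMIT_SHA.fullmatch(plan.image_tag):
        problems.append(f"image tag {plan.image_tag!r} is not a commit SHA")
    if plan.posture == "production" and not plan.approval_id:
        problems.append("production/cutover needs an explicit operator approval id")
    return problems
```

A check like this could gate publish/deploy automation only. That matches the rule that missing runner automation should not block API contract work.
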
## Recommended Core Hub Build Lane
Keep the existing Core Hub Make targets and add future targets only when they
| `ops-bridge-tunnel` | `ops-bridge` | SSH tunnels and port forwards | Route; supply `cert_command` pattern when needed | `wiki/playbooks/ops-bridge-tunnel-cert.md#migration-checklist` |
## Security-Stage and Maturity Triage
Use ops-warden `wiki/WorkloadSecurityPosture.md` to split vague IT-security
blockers into concrete outcomes.
| Classifier | CUST-WP-0051 interpretation |
| --- | --- |
| Dev/test posture only | Not blocked on production secrets. Use synthetic contract doubles or generated test values. |
| Prod posture with real values | Owner custody and policy gates are required. Record only route id, path/version, decision id, populated-key count, or smoke id. |
| Workload maturity below secret requirement | Real blocker until the workload matures, the secret is reclassified, or the design avoids that secret. |
| Route exists and lane is `exec_capable` | `warden access --fetch/--exec` may remove manual copy/paste as a blocker by proxying the owning tool as the caller. |
| Unseal, break-glass, issuer custody unresolved | Operator ceremony/design blocker; do not bypass with Codex-visible values. |
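
The triage table can be read as a small classifier. This is a hedged sketch only. The function name, the input flags, and especially the precedence order are assumptions, because the table lists classifiers without ranking them.

```python
def triage_secret_blocker(posture, *, maturity_ok=True, lane=None, custody_unresolved=False):
    """Map a credential blocker to one row of the triage table.

    The precedence below is an assumption: dev/test first, then unresolved
    custody, then workload maturity, then the catalog lane.
    """
    if posture in {"dev", "test"}:
        return "not blocked: use synthetic contract doubles or generated test values"
    if custody_unresolved:
        return "operator ceremony/design blocker: do not bypass with Codex-visible values"
    if not maturity_ok:
        return "real blocker until the workload matures, the secret is reclassified, or the design avoids it"
    if lane == "exec_capable":
        return "owner custody required; warden access --fetch/--exec may replace manual copy/paste"
    return "owner custody and policy gates required; record non-secret evidence only"
```
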
Current read:
| Gate family | Posture/maturity read |
| --- | --- |
| Inter-Hub / ops-hub runtime keys | Legacy/fallback production real-value gate; implementation can proceed with route evidence, but live smoke waits on OpenBao/operator custody. |
| Core Hub deployed smoke/runtime token | Preferred replacement gate: `CORE_HUB_BASE_URL` is endpoint routing, operator/runtime token custody routes to `openbao-api-key`, and activity-core widget mapping belongs to Core Hub/activity-core. ops-warden does not mint or store this token. |
| activity-core to issue-core | Production service credential gate; the blocker is `ISSUE_CORE_API_KEY` injection/evidence, not repo-side contract work. |
| OpenBao unseal / issuer profile | M3-style operator ceremony. The narrow `warden-sign` lane is verified/banked; broader issuer/profile work remains separate. |
| ops-warden policy gate / warden-sign | Verified and banked: `SECRETS-WP-0004` and `FLEX-WP-0007` are finished, with `decision:032b096c433ad80c`, `ttl_out_of_bounds`, backend `vault`, and no secret material recorded. |
| Forgejo SMTP/package/runner migration | Production credential and recovery-readiness gate; use OpenBao/key-cape/ops-bridge routes, then record non-secret drill evidence. |
## Live Gates
| Gate | Blocking work | Owner and route | Expected execution host | Non-secret evidence | Fallback decision | Next action | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Core Hub deployed evidence and activity-core sink | `CUST-WP-0025-T16`, `CORE-WP-0004-T03/T04`, feeds `CUST-WP-0025-T17` | `openbao-api-key` for operator/runtime token custody; `ops-bridge-tunnel` if the staging URL is private; Core Hub/activity-core for widget mapping | Core Hub staging runtime, operator workstation, or trusted cluster job | deployed-smoke run id, hub/manifest/API-consumer ids, key prefix only, widget/event ids, counts, readiness and containment booleans | Keep Inter-Hub as legacy/fallback until Core Hub evidence exists or explicit supersede decision. | Provide `CORE_HUB_BASE_URL`, approved token custody, activity-core widget id/mapping, then run deployed smoke plus sink smoke. | Waiting on staging/custody handoff |
| Inter-Hub ops-hub bootstrap | `CUST-WP-0049-T06`, unblocks `CUST-WP-0047-T05` | `inter-hub-bootstrap-ssh` for the envelope; `openbao-api-key` for operator/runtime key custody; `ssh-cert-host-access` only for cert signing if remote execution is used | Local workstation with `IHUB_OPERATOR_KEY_FILE`, or trusted host with railiance-infra force-command wrapper | Hub id, manifest id, widget count, runtime key prefix only, bootstrap smoke result, State Hub progress id | Prefer API helper. Use deployment-side migration/bootstrap only by explicit operator approval. Manual SQL remains last-resort and must be recorded as an exception. | Operator materializes Inter-Hub operator key through approved custody, runs the ops-hub helper, stores generated runtime key outside Git, removes temp files. | Ready for operator handoff |
| Ops-hub runtime evidence key | `IHUB-WP-0022-T04`, then `IHUB-WP-0022-T07` | `openbao-api-key` owned by `railiance-platform` / OpenBao | Operator workstation, OpenBao UI/CLI session, or trusted cluster job; not a Codex-visible shell with printed values | OpenBao path/version or populated key count only, token exchange HTTP status, evidence submission smoke id | Attended one-time key file is acceptable only long enough to store in OpenBao and remove; no chat or State Hub transfer. | Store/provide `OPS_HUB_KEY` via OpenBao path, then run Inter-Hub submission smoke. | Waiting on operator custody |
| OpenBao unseal and token automation | `NET-WP-0020`, related OpenBao token-grant and policy-gate blockers | `openbao-api-key` for OpenBao issuer/token paths; `railiance-infra-principals` for host policy; `ssh-cert-host-access` for cert signing; `key-cape-oidc-login` for login/MFA | OpenBao operator terminal, cluster-admin context, or trusted railiance-infra deployment path | Policy names, role names, token accessor only, decision ids, allow/deny smoke result | Keep attended ceremony path until auto-unseal/profile is explicitly approved. Do not invent `warden secret` or paste `VAULT_TOKEN`. | Broader custody profile remains open; do not treat the completed `warden-sign` lane as a general OpenBao credential helper. | Needs operator design/approval |
| ops-warden policy gate / warden-sign lane | `SECRETS-WP-0004`, `FLEX-WP-0007` | secrets-engine owned the OpenBao lane; flex-auth owned the policy decision runtime; ops-warden ran the smoke | CoulombCore via deployed flex-auth runtime `127.0.0.1:18090` and production OpenBao | `decision:032b096c433ad80c`, `ttl_out_of_bounds`, backend `vault`, no token/role/secret/accessor values | Keep `policy.enabled` off until testing/production maturity; live enforcement is an ops-warden operator posture decision. | No CUST action. Bank the verified gate and avoid reopening it as a generic credential blocker. | Verified/banked |
| Forgejo production migration | `RAIL-HO-WP-0005` T02/T06/T11/T12 | `openbao-api-key` for SMTP/package/provider credentials; `key-cape-oidc-login` for login/MFA; `ops-bridge-tunnel` or `ssh-cert-host-access` only for host reachability | Forgejo admin/browser session, railiance01 trusted host, or approved GitOps/deployment path | Decision record id, hostname/exposure choice, SMTP sender/domain alignment, password-reset smoke, backup/restore drill id, package pull smoke, cutover approval id | Keep Gitea as read-only rollback until stabilization passes; do not retire legacy Gitea without explicit approval. | Resolve production choices, store SMTP credentials through OpenBao, run recovery and migration drills, then request cutover approval. | Needs human production decisions |
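
The "Blocking work" column encodes ordering constraints between tasks. A topological sort can check that those constraints are consistent. This minimal sketch uses Python's standard `graphlib` (3.9+). It includes only the three explicit "unblocks/then/feeds" edges from the table, and `pickup_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Task -> tasks it waits on, read from the "Blocking work" column above.
WAITS_ON = {
    "CUST-WP-0047-T05": {"CUST-WP-0049-T06"},
    "IHUB-WP-0022-T07": {"IHUB-WP-0022-T04"},
    "CUST-WP-0025-T17": {"CUST-WP-0025-T16"},
}


def pickup_order(waits_on):
    """Return tasks so that every task follows the tasks it waits on.

    Raises graphlib.CycleError if two gates wait on each other.
    """
    return list(TopologicalSorter(waits_on).static_order())
```

Independent chains may come out in any relative order. Only the waits-on constraints are guaranteed.
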
## Route Lookup Commands
```bash
cd /home/worsch/ops-warden
uv run warden route show openbao-api-key --json
uv run warden route show inter-hub-bootstrap-ssh --json
uv run warden route show ssh-cert-host-access --json
uv run warden route show railiance-infra-principals --json
uv run warden route show key-cape-oidc-login --json
uv run warden route show ops-bridge-tunnel --json
```
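
The lookups above can be scripted when several routes need checking at once. This is a minimal sketch. It reuses only the documented `uv run warden route show <id> --json` invocation and assumes that command prints one JSON document. The output schema is not assumed. `show_routes` is a hypothetical helper with an injectable runner so the loop can be tested without ops-warden.

```python
import json
import subprocess

ROUTES = [
    "openbao-api-key", "inter-hub-bootstrap-ssh", "ssh-cert-host-access",
    "railiance-infra-principals", "key-cape-oidc-login", "ops-bridge-tunnel",
]


def route_show_argv(route_id):
    """Argument vector for the documented `warden route show` lookup."""
    return ["uv", "run", "warden", "route", "show", route_id, "--json"]


def show_routes(route_ids, cwd="/home/worsch/ops-warden", run=subprocess.run):
    """Run each lookup and parse its JSON output, keyed by route id.

    Route output is catalog metadata; this never requests secret values.
    """
    results = {}
    for route_id in route_ids:
        proc = run(route_show_argv(route_id), cwd=cwd, capture_output=True, text=True, check=True)
        results[route_id] = json.loads(proc.stdout)
    return results
```
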
## Pickup Order
1. Core Hub deployed evidence and activity-core sink smoke, because this is
the preferred replacement proof for ops-hub evidence and can supersede
legacy Inter-Hub waits once real staging evidence exists.
2. Keep Inter-Hub ops-hub bootstrap and runtime keys as legacy/fallback only
unless the operator explicitly chooses rollback or compatibility proof.
3. OpenBao custody profile, because several credential-helper and policy-gate
blockers collapse once a narrow issuer path exists.
4. Forgejo production decisions, because those require human design approval
Track `CUST-WP-0051-T07` and `CUST-WP-0052`: sequence `CUST-WP-0025` so FOS Hub bootstrap can resume from current repo reality rather than the older mega-hub/Keycloak/Inter-Hub assumptions.
## Current Decision
Do not restart FOS bootstrap at the old `NK-WP-0001` Keycloak path. That workplan is archived and superseded. The active identity baseline is:
- `NK-WP-0002` local identity: complete; usable for bootstrap/dev OIDC.
- `NK-WP-0012` IAM Profile v0.2: finished; canonical NetKingdom-owned profile and conformance suite.
- KeyCape/Authelia/LLDAP stack from the superseding NetKingdom path: current lightweight identity mode.
- `NK-WP-0011` expanded-mode Keycloak: proposed enterprise federation lane, not a blocker for ops-hub bootstrap.
## Sequence Board
| Area | Current state | Pickup action |
| --- | --- | --- |
| Identity | Old `CUST-WP-0025-T01` pointed at archived `NK-WP-0001`; local identity and IAM Profile v0.2 are done. | Keep T01 cancelled, T02 done, and make T03 the remaining identity gate: a protected FastAPI fixture using IAM Profile v0.2 against local-identity or KeyCape. |
| Hub extraction/dev-hub | `CUST-WP-0025-T05` through `T12` are done: hub-core exists, State Hub imports hub-core, and MCP naming moved to dev-hub. | Treat Phase 2 as complete. Do not spend pickup energy here unless consistency drift appears. |
| Ops hub | Core Hub is now the replacement platform: `CORE-WP-0008` finished the API smoke harness, activity-core sink, staging profile, CLI wrappers, UI rebuild backlog, and Custodian handoff. Live deployed smokes and cutover evidence are still open. | Continue through Core Hub deployed evidence, migration import, activity-core smoke, and cutover gates. Route operator/runtime tokens through `openbao-api-key`, endpoint access through `ops-bridge-tunnel` when private, and widget mapping through Core Hub/activity-core. Treat Haskell Inter-Hub as legacy compatibility or rollback evidence. |
| Old ops-hub scaffold tasks | `CUST-WP-0025-T13`-`T19` have been rewritten around Core Hub API evidence, CLI parity, deployed smoke/cutover gates, whynot-aligned UI, and cancellation of immediate standalone ops-hub MCP registration. | Execute the remaining wait/todo gates in the rewritten Phase 3. Do not resume the obsolete standalone ops-hub scaffold sequence. |
| Fin hub/business | `CUST-WP-0025-T20`-`T26` are all todo and depend on a proven multi-hub pattern. | Defer until ops-hub has a working first signal and the identity integration gate is proven. |
## Stable Pickup Order
1. Close the identity drift: T01 cancelled, T02 done, T03 remains as the one real identity integration test.
2. Use the finished `CORE-WP-0008` evidence lane and `CUST-WP-0052` reset notes as the Core Hub replacement baseline.
3. Keep `CUST-WP-0047`/`CUST-WP-0049` as legacy evidence/fallback until Core Hub deployed smoke evidence or an explicit supersede decision closes them.
4. Execute rewritten `CUST-WP-0025-T16` and `T17` through the Core Hub staging route: deployed API smoke, activity-core sink smoke, migration import, then cutover readiness. Keep T14/T15/T18 as completed definition/CLI/UI inputs.
5. Start fin-hub/business work only after ops-hub proves the Core Hub pattern end-to-end.
| `api/models/domain.py` | Core domain identity; remove relationships to dev-hub-only models from core. |
| `api/models/managed_repo.py` | Core repo registry; make `topic_id`, SBOM, and sync timestamps extension fields or keep them in dev-hub until a second pass. |
| `api/models/agent_message.py` | Generic agent inbox and thread model. |
| `api/models/tpsc.py` | Third-party service catalog/snapshot primitives. |
| `api/routers/progress.py` | Generic progress-event router once dev-hub foreign keys move behind `subject_refs` or extension mapping. |
| `api/routers/capability_requests.py` | Generic capability catalog/request router once dev-hub flow side effects and task unblocking stay in dev-hub. |
| `api/routers/tpsc.py` | Generic catalog and GDPR report router. |
| `api/routers/policy.py` | Generic policy document router if policy roots become configurable. |
The first committed router seam is factory-based rather than global:
| `hub_core.utils.pagination` | Shared limit/offset bounds and SQLAlchemy pagination. |
| `hub_core.utils.paths` | Resolve repo paths from `host_paths` before falling back to `local_path`. |
| `hub_core.utils.routing` | Normalize a path or URL path component while preserving query strings and fragments. |
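
The three utility behaviors described in the table can be illustrated as follows. These are hypothetical re-implementations for explanation, not the actual `hub_core.utils` code. The function names, the default limit of 50, and the maximum of 200 are assumptions.

```python
import posixpath
import re
from urllib.parse import urlsplit, urlunsplit


def clamp_page(limit, offset, default_limit=50, max_limit=200):
    """Bound limit/offset query parameters (default and maximum are hypothetical)."""
    limit = default_limit if limit is None else max(1, min(int(limit), max_limit))
    offset = max(0, int(offset or 0))
    return limit, offset


def resolve_repo_path(host_paths, local_path, host):
    """Prefer the repo path registered for this host, else fall back to local_path."""
    return (host_paths or {}).get(host) or local_path


def normalize_route(value):
    """Collapse duplicate slashes and dot segments; keep query and fragment.

    posixpath.normpath also drops any trailing slash.
    """
    parts = urlsplit(value)
    path = posixpath.normpath(re.sub(r"/+", "/", parts.path or "/"))
    if not path.startswith("/"):
        path = "/" + path
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Only `parts.path` is rewritten, so query strings and fragments pass through unchanged.
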
## Migration Scaffold
`/home/worsch/hub-core` now carries Alembic template files under
`hub_core/migrations/` plus `versions/0001_core_schema.py`. The first migration
covers only the currently extracted core tables:
- `domains`
- `managed_repos`
- `agent_messages`
- `capability_catalog`
- `capability_requests`
- `progress_events`
- `tpsc_catalog`
- `tpsc_snapshots`
- `tpsc_entries`
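
After applying `0001_core_schema`, a quick check can confirm that all nine core tables exist. This is a minimal sketch that assumes a disposable local/dev SQLite database. For Postgres, the same idea would query the catalog instead. `missing_core_tables` is a hypothetical helper.

```python
import sqlite3

CORE_TABLES = {
    "domains", "managed_repos", "agent_messages",
    "capability_catalog", "capability_requests", "progress_events",
    "tpsc_catalog", "tpsc_snapshots", "tpsc_entries",
}


def missing_core_tables(conn):
    """Return core tables that are absent from a SQLite database, sorted by name."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    present = {name for (name,) in rows}
    return sorted(CORE_TABLES - present)
```
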
## Needs An Adapter Seam
These are still part of the target architecture, but the current State Hub
implementation is coupled to dev-hub concepts:
| Surface | Coupling to resolve |
| --- | --- |
| `Domain` and `domains.py` detail views | Detail counts now use a dev-hub callback behind the hub-core router factory. Domain relationships still need a later model split if State Hub stops carrying topics/goals on the core table. |
| `ManagedRepo` | State Hub create/read schemas now extend hub-core contracts, with `topic_id`, SBOM fields, and state-sync timestamps kept as dev-hub extensions. Generic repo registry collection, lookup, detail, update, and host-path routes now mount from the hub-core factory; State Hub keeps onboarding, DoI, scope-health, dispatch, archive, and consistency-sync behavior locally. |
| `CapabilityRequest` | Write routes now mount from `create_capability_request_write_router` with host callbacks. State Hub keeps workplan/task columns on its model; generic hubs use `request_context` / `fulfillment_context` JSON. Optional future step: map State Hub columns into JSON and drop duplicate fields. |
| `ProgressEvent` | Adapter seam implemented with generic `subject_refs`; State Hub still needs a later refactor to map topic/workstream/task/decision foreign keys into that field or a dev-hub extension table. |
| MCP tools in `mcp_server/server.py` | Generic tools register via `HubCoreMCPServer.attach_to(mcp, exclude=...)`. Remaining local tools: dev-hub orientation (`get_state_summary`, `get_domain_summary`), extended repo/capability/TPSC contracts, and all workstream/task/decision tooling. |
The first two adapter seams are now implemented in hub-core:
- `ProgressEvent.subject_refs`: generic JSON references for hub-local subjects.
- `CapabilityRequest.request_context` and `fulfillment_context`: generic JSON
  context for hub-local workstreams, tasks, incidents, services, budgets, or
  other future hub entities.
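
The pending State Hub refactor maps topic/workstream/task/decision foreign keys into `subject_refs`. It could look roughly like this sketch. The `*_id` column names and the `{"hub", "kind", "id"}` reference shape are assumptions for illustration, not the hub-core schema.

```python
# Hypothetical dev-hub foreign-key columns on a State Hub progress row.
FK_KINDS = ("topic", "workstream", "task", "decision")


def to_subject_refs(row):
    """Map dev-hub foreign-key columns into generic subject_refs entries."""
    refs = []
    for kind in FK_KINDS:
        value = row.get(f"{kind}_id")
        if value is not None:
            refs.append({"hub": "dev-hub", "kind": kind, "id": str(value)})
    return refs


def from_subject_refs(refs):
    """Recover dev-hub column values; refs from other hubs are ignored."""
    return {f"{ref['kind']}_id": ref["id"] for ref in refs if ref.get("hub") == "dev-hub"}
```

Tagging each reference with its hub lets other hubs store their own subjects in the same JSON field without colliding with dev-hub keys.
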
## Keep In Dev-Hub
The following State Hub areas should not move into hub-core during T05:
- Topics, workstreams, tasks, decisions, dependencies, and flow state.
`make -C ~/net-kingdom openbao-init-unseal` exists with custody-model gate
and non-secret evidence; operator review still needed to wire it as a phase
inside `creds-bootstrap-agent.sh`, and greenfield live proof needs a rebuild
slate.
## Purpose
This checkpoint is the restart surface for the infrastructure stabilization
metaplan. It consolidates the workplan review, unblock boards, current State
Hub registration state, and the next strategic picks.
Use this file first when resuming the lane. Then open the source workplan named
in the relevant row and continue from its task state.
## Registration State
State Hub active workstreams queried on 2026-06-27:
| Workstream | Current pickup meaning |
| --- | --- |
| `artifact-store-wp-0007` | Start D7.1/D7.2 assessment and compatibility harness; D7.3 STS vending may route to NetKingdom. |
| `ihub-wp-0022` | Ops-hub evidence intake contract is aligned to live vocabulary; runtime key custody, protected widget lookup, and smoke remain. |
| `cust-wp-0047` | Now-view waits on the ops-hub Inter-Hub evidence lane, not on service inventory collection. |
| `cust-wp-0049` | Bootstrap access helper/runbook is ready; authenticated execution is operator-gated. |
| `cust-wp-0051` | This metaplan is the coordination layer for remaining cross-workplan gates. |
| `activity-wp-0016-llm-output-robustness-trust-boundary` | Repo-side output robustness bundle is prepared; live deploy/smoke proof remains. |
| `three-phoenix-ha-cluster` | HA substrate remains future critical-workload work, not the current State Hub cutover blocker. |
| `rail-ho-wp-0005` | Forgejo production migration is parked behind explicit design, SMTP, backup, runner, and cutover decisions. |
| `net-wp-0020` | OpenBao unseal/token custody remains an operator design and smoke gate. |
| `issue-wp-0003` | issue-core service is healthy; activity-core REST emission wiring remains. |
| `activity-wp-0006` | Calibration waits on the post-WP-0016 live daily-triage smoke and three clean scheduled runs. |
| `cust-wp-0038` | Full State Hub HA migration is deferred until the pragmatic railiance01 path stabilizes. |
| `cust-wp-0025` | FOS bootstrap resumes from identity integration and ops-hub evidence, not the old mega-hub scaffold. |
| `cust-wp-0011` | Active State Hub migration path; next gate is explicit cutover approval. |
Hygiene status:
- `CUST-WP-0045-cutover-runbook` is no longer active; it is a finished runbook
  record, not an empty active workstream.
- `CUST-WP-0014` is reopened as `backlog`; it is no longer a done workplan with
  todo task blocks.
- Completed or cancelled tasks no longer carry the stale human-needed flags
  cleared during this stabilization session.
- `make fix-consistency REPO=the-custodian` still reports pre-existing C-12
  orphan-row warnings, but the relevant workplan lifecycle and task states sync.
- `RAIL-BS-WP-0006-staged-promotion-lifecycle` is finished: all seven tasks
  are done, the workstream is finished in State Hub, and the file frontmatter
  is `status: finished`.
## Blocker Board
No live credential, access, or approval gate is unowned. Do not ask
`ops-warden` for secret values; use the route catalog, the `warden access`
assist/proxy surface where the catalog lane allows it, and the owning subsystem.
For credential-related blockers, classify the environment posture and workload
maturity first. Dev/test work can use synthetic contract doubles; production
real-value work needs owner custody, policy gates where applicable, and
non-secret evidence. See `docs/ops-warden-secret-posture-review.md`.
Do not implement ops-warden changes from this Custodian lane. New ops-warden
needs should be posted through State Hub as requirements or suggestions for the
separate ops-warden worker.
| Gate | Owner/route | Non-secret evidence to collect | Next action |
| --- | --- | --- | --- |
| State Hub pragmatic cutover | Custodian operator approval; `CUST-WP-0011-T07` | Final dump id/time, row-count comparison, chosen private endpoint, stabilization notes | Approve freeze/final restore and make railiance01 State Hub primary, or leave WSL2 primary explicitly. |
| State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
| Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Legacy/fallback only. Prefer Core Hub deployed smoke; run attended Inter-Hub bootstrap only by explicit operator supersede/rollback decision. |
| Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Do not materialize legacy `OPS_HUB_KEY` until a deployed Core Hub smoke or explicit legacy Inter-Hub smoke is ready to use it. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Bank the 2026-06-28 / 2026-06-29 / 2026-06-30 clean streak, then have the activity-core owner land/sync the in-flight WP-0016 diagnostics and prove bounded top-N plus graceful-degradation smoke. |
| activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set REST sink env, restart/sync, run safe emission. |
| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
| OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | `warden-sign` lane is verified/banked; broader custody profile and issuer automation remain separate operator-design gates. |
| ops-warden policy gate / warden-sign lane | `SECRETS-WP-0004` + `FLEX-WP-0007` finished; ops-warden operator posture | `decision:032b096c433ad80c`, `ttl_out_of_bounds`, backend `vault`; no token/role/secret/accessor values | No Custodian action. Keep `policy.enabled` off until testing/production maturity. |
## Daily Automation Evidence
The scheduled daily-triage runner is alive and writing State Hub plus working
memory evidence. The current blocker is bounded output-contract adoption and
live graceful-degradation proof, not scheduling or sink reachability.
Bank the three-run calibration streak, but keep the WP-0016 live-proof gate open
until the bounded top-N contract and graceful-degradation smoke are proven. The
activity-core worktree currently has in-flight uncommitted ACTIVITY-WP-0016
and ACTIVITY-WP-0018/0019 changes, so Custodian should wait for that owner to
commit/sync or explicitly hand off before treating those files as source truth.
Use the activity-core repo-native automation status surface once it lands; do
not use assistant-provided scheduling as operational evidence.
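
Banking a calibration streak means checking for consecutive clean days. This is a minimal sketch. It assumes one scheduled run per day and treats only an `output-valid` status as clean, so partial or quarantined runs break the streak. `has_clean_streak` is hypothetical.

```python
from datetime import date, timedelta


def has_clean_streak(runs, length=3):
    """True if `length` consecutive calendar days have an output-valid run.

    runs: iterable of (date, status) pairs, assumed one run per day.
    """
    clean_days = sorted({day for day, status in runs if status == "output-valid"})
    streak, previous = 0, None
    for day in clean_days:
        consecutive = previous is not None and day - previous == timedelta(days=1)
        streak = streak + 1 if consecutive else 1
        if streak >= length:
            return True
        previous = day
    return False
```
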
## Production Service Summary
| Surface | Stable fact | Remaining gate |
| --- | --- | --- |
| State Hub | Pragmatic railiance01 path has image, manifests, empty deploy, migrations, restored WSL2 data, row-count comparison, and healthy API through `CUST-WP-0011-T06`. | `CUST-WP-0011-T07` cutover approval, then stabilization; HA path stays deferred. |
| Inter-Hub / Core Hub | Public `https://hub.coulomb.social/api/v2/hubs` exposes `ops-hub`; `CORE-WP-0008` finished the Core Hub API smoke harness, activity-core sink, staging profile, CLI wrappers, UI backlog, and Custodian handoff. | Run deployed Core Hub smoke, staging import, activity-core sink smoke, and readiness summary; keep Haskell Inter-Hub only for migration/rollback proof. |
| ops-hub evidence | `CUST-WP-0025-T14` is done with the Core Hub ops evidence contract spec. `CUST-WP-0025-T13` through `T19` now use Core Hub API/CLI/UI gates; `CUST-WP-0047` and `CUST-WP-0049` remain legacy/fallback records. | Execute `CUST-WP-0025-T16`, `T17`, and `T18`; close legacy Inter-Hub waits only through deployed Core Hub evidence or explicit supersede decision. |
| issue-core | ArgoCD service is healthy on port `8765`; image `0.2.1`; ExternalSecret Ready; authenticated smoke created Gitea issue `175`. | activity-core still needs `ISSUE_CORE_API_KEY`, URL port `8765`, `ISSUE_SINK_TYPE=rest`, and a safe emission smoke. |
| Forgejo | Migration inventory/design lane is active but pre-cutover. | Production design decisions, SMTP/email recovery, package registry, Actions, backup/restore, migration drill, cutover approval. |
| artifact-store | D7.1 is done; D7.2 has an opt-in live MinIO compatibility harness and manual smoke docs. No live secret handoff is recorded. | Run D7.2 against an approved MinIO-compatible endpoint, then route D7.3 STS vending through identity/platform custody before changing credential behavior. |
| secrets-engine | `SECRETS-WP-0004` is finished: the scoped `warden-sign` lane supported the vault-backed policy-gate smoke without exposing token material. `SECRETS-WP-0003` remains active for the real whynot-design npm publish pilot. | Finish or park `SECRETS-WP-0003` behind Gitea bot/package-token provisioning, OpenBao custody, ops-warden route confirmation, and real package publish evidence. |
| FOS hub | Old NK-WP-0001 Keycloak prerequisite is cancelled; NK-WP-0002 local identity, IAM Profile v0.2, the Core Hub FastAPI IAM Profile integration test, and Core Hub operator UI first screens are done; hub-core extraction/dev-hub work is done; CUST-WP-0025 Phase 3 has been rewritten for Core Hub. | Execute the remaining Core Hub deployed evidence and cutover gates: `CUST-WP-0025-T16` and `T17`. |
## Next-Pick List
1. Execute the remaining rewritten `CUST-WP-0025` Core Hub gates: deployed
smoke and activity-core proof (`T16`) and cutover decision coupling (`T17`).
T03, T14, and T18 are complete as the identity integration template, ops
evidence/read-model contract, and operator UI first-screen gates.
2. Keep `CUST-WP-0047` and `CUST-WP-0049` as legacy evidence/fallback until
Core Hub deployed smoke evidence or an explicit supersede decision closes
them.
3. Bank the 2026-06-28 / 2026-06-29 / 2026-06-30 clean daily-triage
streak for calibration, then have the activity-core owner land/sync the
in-flight WP-0016 diagnostics/status work and prove the bounded top-N plus
graceful-degradation smoke.
4. Complete the issue-core handoff by wiring activity-core to port `8765` with
`ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or
record that WSL2 remains primary for the next operating period.
6. Run artifact-store D7.2 live MinIO-compatible evidence; Forgejo and storage
work can now inherit the finished staged-promotion gates.
7. Keep `SECRETS-WP-0003` parked until Gitea bot/package-token provisioning,
OpenBao custody, route confirmation, and a coordinated whynot-design version
bump are available.
8. Keep Forgejo cutover and State Hub HA work parked until their human decision
and drill gates are satisfied.
## Resume Commands
```bash
cd /home/worsch/the-custodian
sed -n '1,260p' workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
sed -n '1,260p' docs/infrastructure-stabilization-pickup-checkpoint.md
sed -n '1,260p' docs/credential-custody-unblock-board.md
```
Track `CUST-WP-0051-T05`: finish or park near-term production service lanes
before starting larger migrations.
## Lane Board
| Lane | Current state | Next action |
| --- | --- | --- |
| `issue-wp-0003` | issue-core is live through ArgoCD; image `0.2.1`, Service port `8765`, ExternalSecret Ready, authenticated smoke created Gitea issue `175`. | Do not flip activity-core blindly. First inject `ISSUE_CORE_API_KEY` into `actcore-runtime-secret` through route `activity-core-issue-sink`; then set activity-core `ISSUE_CORE_URL` to port `8765`, set `ISSUE_SINK_TYPE=rest`, restart/sync, and run one safe emission smoke. |
| `rail-ho-wp-0005` | Forgejo migration remains pre-implementation. Inventory is in progress; production decisions, SMTP/email recovery, cutover, and legacy retirement are human-gated. | Resolve T02 production decisions first, then build the disposable Forgejo probe. Do not start production cutover before promotion lifecycle, email recovery, package registry, Actions, backup/restore, and migration drill pass. |
| `artifact-store-wp-0007` | D7.1 is done. The dated MinIO/fork/object-store landscape assessment chose a compatibility-profile lane rather than a direct MaxIO fork. D7.2 is in progress with an opt-in live MinIO pytest harness and manual smoke docs; no secret value was read or recorded. | Run the D7.2 harness against an approved MinIO-compatible endpoint and capture health/round-trip/multipart evidence. Route D7.3 STS credential vending through identity/platform custody before changing artifact-store credential behavior. |
| `secrets-wp-0003` | Active. The whynot-design real npm publish pilot has a canonical decision and source-side runbook, but real publication still waits on Gitea bot/package-token provisioning, OpenBao custody, ops-warden route confirmation, and a coordinated whynot-design version bump. | Keep parked until the operator/Gitea/OpenBao gates are ready; do not request or record token values. The next safe non-secret action is route-confirmation evidence from ops-warden. |
| `staged-promotion-lifecycle` | Finished. Lifecycle spec, app contract, overlay scaffold, Stage 1 runner, canary template, deploy/observe tooling, promote/rollback tooling, and onboarding guide are done. | Use the finished promotion gates as prerequisites for Forgejo/source-forge and storage production work. |
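A preflight check of the runtime environment is one way to make "do not flip activity-core blindly" concrete for `issue-wp-0003`. The sketch below is hypothetical. It assumes the three variables named on the lane board are visible to the check as a plain mapping, for example `os.environ` inside the activity-core pod. It only tests whether the API key is present and never reads or returns its value. The `issue-core` hostname used in the test values is illustrative.

```python
from urllib.parse import urlsplit

# Port taken from the lane board: the issue-core Service listens on 8765.
EXPECTED_ISSUE_CORE_PORT = 8765


def check_issue_sink_env(env: dict) -> list:
    """Return a list of problems; an empty list means the flip looks ready."""
    problems = []
    # Presence check only: the key value is never printed or returned.
    if not env.get("ISSUE_CORE_API_KEY"):
        problems.append("ISSUE_CORE_API_KEY is not present")
    try:
        port = urlsplit(env.get("ISSUE_CORE_URL", "")).port
    except ValueError:  # e.g. a non-numeric port in the URL
        port = None
    if port != EXPECTED_ISSUE_CORE_PORT:
        problems.append(f"ISSUE_CORE_URL does not target port {EXPECTED_ISSUE_CORE_PORT}")
    if env.get("ISSUE_SINK_TYPE") != "rest":
        problems.append("ISSUE_SINK_TYPE is not 'rest'")
    return problems
```

Run the check after the key injection and env changes, but before restart/sync. An empty list is a precondition for the safe emission smoke. It does not replace that smoke.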
## Credential And Operator Routing
`activity-core -> issue-core` REST emission uses route catalog id
`activity-core-issue-sink`.
Route lookup on 2026-06-27:
- owner: `activity-core + issue-core`
- ops-warden executes: no
- status: active
- next action: follow `ops-warden/wiki/playbooks/activity-core-issue-sink.md#worker-checklist`
No secret value was read or written. The required non-secret evidence is:
- `actcore-runtime-secret` has an `ISSUE_CORE_API_KEY` data key;
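You can collect this evidence without printing any secret value. The following is a minimal sketch, and it rests on two assumptions. First, the worker's kubeconfig must allow `get` on Secrets. Second, the Secret must live in the `activity-core` namespace. That namespace comes from the inventory row, not from the route entry. kubectl's `go-template` output format prints only the key names under `.data`. kubectl still receives the full object from the API server, so this approach keeps values out of visible output but not out of the process.

```python
import subprocess

# Go template that prints only the key names under .data, one per line,
# so no secret value reaches stdout. Raw string keeps "\n" for Go to parse.
KEYS_ONLY_TEMPLATE = r'{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'


def parse_key_lines(output: str) -> list:
    """Split kubectl output into a list of non-empty key names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_secret_keys(name: str, namespace: str) -> list:
    """Return the data-key names of a Kubernetes Secret (never the values)."""
    result = subprocess.run(
        ["kubectl", "get", "secret", name, "-n", namespace,
         "-o", "go-template=" + KEYS_ONLY_TEMPLATE],
        capture_output=True, text=True, check=True,
    )
    return parse_key_lines(result.stdout)


def has_data_key(keys: list, key: str) -> bool:
    """True if the key name is present in the listed keys."""
    return key in keys
```

Usage would be `has_data_key(list_secret_keys("actcore-runtime-secret", "activity-core"), "ISSUE_CORE_API_KEY")`. Only a boolean belongs in the evidence record.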
`inter-hub/docs/contracts/ops-hub-activity-core-mapping.md` and
`ops-hub-activity-core-event-payloads.md` still describe the early
activity-core proposal:
| Contract name | Live seed status | Recommended action |
| --- | --- | --- |
| `ops-service-observed` | Not in live event registry | Rename to `ops-service-discovered`, or add an explicit alias event in the ops-hub manifest. |
| `ops-endpoint-verified` | Live | Keep. |
| `ops-access-path-checked` | Not in live event registry; no `ops-access-path` widget type in seed | Either add access-path vocabulary/widgets, or defer access-path submissions and keep State Hub fallback. |
| `ops-backup-verified` | Live | Keep, but map to `ops-backup-set` widget type. |
| `ops-inventory-drift` | Not in live event registry | Rename to `ops-drift-detected`, or add an explicit alias event. |
| `ops-evidence` policy scope | Not in live policy scopes | Use an existing ops scope or add `ops-evidence` to the manifest and activate it. |
| aggregate refs such as `ops:service:aggregate` | Not in `ops-hub/seeds/ops-hub-widgets.seed.json` | Seed aggregate intake widgets or change mapping to the existing entity/readiness widgets. |
| widget types such as `ops-service-card` | Not in live widget types | Use live widget types like `ops-service`, `ops-endpoint`, `ops-backup-set`, and `ops-readiness-gate`. |
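The table above amounts to a set difference between the contract vocabulary and the live seed. The sketch below implements that check, assuming both vocabularies can be loaded as sets of names per category. The live sets are deliberately partial. They contain only names that appear in this section, not an export of `ops-hub/seeds/ops-hub-widgets.seed.json` or the live registry.

```python
def find_drift(contract: dict, live: dict) -> dict:
    """Per vocabulary category, return contract names missing from the live seed."""
    drift = {}
    for category, names in contract.items():
        missing = set(names) - set(live.get(category, ()))
        if missing:
            drift[category] = missing
    return drift


# Contract-side names as listed in the table.
CONTRACT = {
    "event_types": {"ops-service-observed", "ops-endpoint-verified",
                    "ops-access-path-checked", "ops-backup-verified",
                    "ops-inventory-drift"},
    "policy_scopes": {"ops-evidence"},
    "widget_types": {"ops-service-card"},
}
# Partial live vocabulary, built only from names mentioned in this section.
LIVE = {
    "event_types": {"ops-service-discovered", "ops-endpoint-verified",
                    "ops-backup-verified", "ops-drift-detected"},
    "policy_scopes": {"ops-production", "ops-registry", "ops-backup-retention"},
    "widget_types": {"ops-service", "ops-endpoint", "ops-backup-set",
                     "ops-readiness-gate"},
}
```

`find_drift(CONTRACT, LIVE)` reproduces the "Not in live ..." rows of the table for events, policy scopes, and widget types.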
## 2026-06-27 Contract Alignment
The Inter-Hub contract docs were revised in `/home/worsch/inter-hub` to target
the live ops-hub seed vocabulary:
- `ops-service-observed` is now a transition alias for
`ops-service-discovered`.
- `ops-inventory-drift` is now a transition alias for `ops-drift-detected`.
- `ops-access-path-checked` is explicitly deferred to State Hub fallback until
ops-hub adds access-path vocabulary or a readiness/risk mapping decision.
- The old `ops-evidence` policy scope is replaced by declared live scopes such
as `ops-production`, `ops-registry`, and `ops-backup-retention`.
- Payload examples now post only live manifest event types.
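An emitter could follow these alignment rules by resolving transition aliases and deferrals before posting. The helper below is hypothetical and is not part of inter-hub or ops-hub. `LIVE_EVENT_TYPES` is a partial illustration; in practice it should be loaded from the live manifest.

```python
# Transition aliases and deferrals as described in the alignment notes.
TRANSITION_ALIASES = {
    "ops-service-observed": "ops-service-discovered",
    "ops-inventory-drift": "ops-drift-detected",
}
DEFERRED_TO_STATE_HUB = {"ops-access-path-checked"}

# Partial live registry for illustration only.
LIVE_EVENT_TYPES = {
    "ops-service-discovered", "ops-endpoint-verified",
    "ops-backup-verified", "ops-drift-detected",
}


def route_event(event_type: str, live_events=LIVE_EVENT_TYPES) -> tuple:
    """Return (destination, event_type) for an outgoing evidence event."""
    if event_type in DEFERRED_TO_STATE_HUB:
        # Deferred events keep their name and go to the State Hub fallback.
        return ("state-hub", event_type)
    resolved = TRANSITION_ALIASES.get(event_type, event_type)
    if resolved not in live_events:
        # Refuse to post anything that is not a live manifest event type.
        raise ValueError(f"{event_type!r} is not a live manifest event type")
    return ("ops-hub", resolved)
```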
This removes the known contract-drift blocker before the attended bootstrap.
The remaining gates are authenticated widget lookup, any missing backup/risk seed
widget, runtime key custody, and a protected event submission smoke.
## Current Closure State
`CUST-WP-0049-T06` remains `wait`: the helper and runbook are ready, but an
approved authenticated execution lane is still required.
`CUST-WP-0047-T05` remains `wait`: the ops-hub row and vocabulary are visible,
but seeded widgets and event acceptance cannot be proven without the protected
runtime path.
`IHUB-WP-0022-T03/T04/T07` remain gated: before an end-to-end smoke, reconcile
the activity-core mapping contract to the live ops-hub seed vocabulary or add
the missing aliases/aggregate widgets to the manifest.
## Next Pick
1. Use the aligned live-vocabulary contract for the attended
`CUST-WP-0049-T06` bootstrap.
2. Confirm protected widget ids and seed any missing backup/risk target widgets
required by the mapping.
3. Store or confirm `OPS_HUB_KEY` through OpenBao, then run the protected
| activity-core (activity-core) | Railiance01<br>type: k3s; cluster: railiance01-k3s; namespace: activity-core | activity-core<br>the-custodian | activity-core API health endpoint<br>Expected: status 200, healthy DB and Temporal status | observed_ok<br>2026-05-23: API health, worker rollout, Temporal CLI schedule listing, and State Hub bridge were verified. | postgresql:activity-core<br>temporal:activity-core<br>nats:railiance01 | k8s: observed_ok (railiance01-k3s/activity-core) | Add explicit ops inventory probes and evidence events. |
| Ops Bridge (ops-bridge) | Local Workstation<br>type: bridge; host: local-workstation | ops-bridge | - | unknown<br>2026-05-16: Bridge is useful for connected-server visibility but is not itself the service catalog. | - | ssh-tunnel: unknown (connected remote servers) | Emit reachability evidence into ops-hub instead of relying on bridge state as inventory. |
| Haskell Build Agent (haskell-build-agent) | Local Workstation<br>type: systemd; host: haskell-build-vm | the-custodian | http://127.0.0.1:18000<br>Expected: VM can reach State Hub through SSH forward | unknown<br>undated: Build agent is a systemd service and registers with State Hub on boot. | - | ssh: unknown (local workstation reverse tunnel port 12222) | Current tunnel and capability registration need live evidence in ops-hub. |
## Open Operating Gaps
### Gitea (`gitea`)
- Package token and push/pull verification need current evidence.
- Backup and restore evidence for database and shared storage not recorded in ops inventory.
### Gitea Database (`gitea-database`)
- Backup and restore evidence not recorded in ops inventory.
### Gitea Shared Storage (`gitea-shared-storage`)
- Package blob backup and restore evidence not confirmed.
### State Hub (`state-hub`)
- Future cluster deployment readiness still needs ops evidence.
### Inter-Hub (`inter-hub`)
- ops-hub bootstrap requires authenticated UI flow or deployment-side migration.
### activity-core (`activity-core`)
- Add explicit ops inventory probes and evidence events.
### Ops Bridge (`ops-bridge`)
- Emit reachability evidence into ops-hub instead of relying on bridge state as inventory.
### Haskell Build Agent (`haskell-build-agent`)
- Current tunnel and capability registration need live evidence in ops-hub.
## Next Evidence Events
- `ops-service-observed` for each runtime object confirmed by a probe.
- `ops-endpoint-verified` for HTTP, HTTPS, tunnel, or cluster endpoints.
- `ops-access-path-checked` for non-secret access path checks.
- `ops-backup-verified` where backup and restore evidence exists.
- `ops-inventory-drift` when observed state differs from this inventory.
posture plus `M0-M3` workload maturity and a secret-flow lattice.
This helps CUST-WP-0051 because a security blocker can now be classified instead
of being left as a generic "credentials needed" stop.
## Blocker Refinement Rules
| Situation | CUST-WP-0051 action |
| --- | --- |
| Dev/test implementation needs a credential-shaped dependency | Use synthetic contract doubles; do not wait for production secrets. |
| Production smoke needs a real value | Route to the owner, collect non-secret evidence, and keep the value out of Codex-visible surfaces. |
| Route is `exec_capable` | Prefer `warden access --fetch/--exec` as the caller over copy/paste handling. |
| Workload maturity is below the secret requirement | Keep the blocker; resolve by maturity advancement, policy/design change, or avoiding the secret. |
| OpenBao unseal, break-glass, or issuer custody is unresolved | Keep as operator ceremony/design blocker. |
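You can read the rules table as a lookup from a classified situation to an action. The sketch below assumes each blocker has already been classified into exactly one row. The enum names are hypothetical. Only the two rows whose action says "keep" are treated as still blocking.

```python
from enum import Enum


class Situation(Enum):
    DEV_TEST_CREDENTIAL_SHAPED = "dev/test needs a credential-shaped dependency"
    PRODUCTION_SMOKE_REAL_VALUE = "production smoke needs a real value"
    ROUTE_EXEC_CAPABLE = "route is exec_capable"
    MATURITY_BELOW_REQUIREMENT = "workload maturity below the secret requirement"
    OPENBAO_CUSTODY_UNRESOLVED = "OpenBao unseal, break-glass, or issuer custody unresolved"


# One action per table row, paraphrased from the rules above.
ACTIONS = {
    Situation.DEV_TEST_CREDENTIAL_SHAPED: "use synthetic contract doubles",
    Situation.PRODUCTION_SMOKE_REAL_VALUE: "route to owner; collect non-secret evidence",
    Situation.ROUTE_EXEC_CAPABLE: "prefer warden access --fetch/--exec as the caller",
    Situation.MATURITY_BELOW_REQUIREMENT: "keep blocker; advance maturity, change policy, or avoid the secret",
    Situation.OPENBAO_CUSTODY_UNRESOLVED: "keep as operator ceremony/design blocker",
}

# Only these two rows say to keep the blocker.
KEEPS_BLOCKER = {Situation.MATURITY_BELOW_REQUIREMENT, Situation.OPENBAO_CUSTODY_UNRESOLVED}


def refine_blocker(situation: Situation) -> tuple:
    """Return (keeps_blocker, action) for one classified situation."""
    return (situation in KEEPS_BLOCKER, ACTIONS[situation])
```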
## Current CUST-WP-0051 Read
| Gate | Refined blocker |
| --- | --- |
| Ops-hub runtime `OPS_HUB_KEY` | Production real-value custody gate; implementation is not blocked, live smoke is. |
| Inter-Hub ops-hub bootstrap | Access/custody gate with an attended execution path; no need to request secret values from ops-warden. |
| activity-core -> issue-core | Production API key injection/evidence gate; route is known through `activity-core-issue-sink`. |
Use `CUST-WP-0011` as the active State Hub stabilization path.
Keep `CUST-WP-0038` and `RAIL-BS-WP-0007` as deferred HA/ThreePhoenix follow-up lanes.
Rationale: the pragmatic railiance01 deployment has already completed image
publish, cluster manifests, empty deploy, migrations, WSL2 data restore, row-count
comparison, and cluster API health checks. The remaining work is cutover and
stabilization, not initial buildout.
## Current State
| Path | State | Next action |
| --- | --- | --- |
| `CUST-WP-0011` pragmatic railiance01 | T01-T06 done. Cluster State Hub has verified restored WSL2 data and healthy API. | T07: get explicit approval to freeze WSL2 writes, restore final dump, compare again, and redirect private access/MCP to the cluster endpoint. |
| `CUST-WP-0038` full HA State Hub | Entry criteria depend on completing or superseding CUST-WP-0011 and passing stabilization. All implementation tasks are still todo. | Defer until cluster-hosted State Hub proves stable and ThreePhoenix storage/database strategy is current. |
| `RAIL-BS-WP-0007` ThreePhoenix HA cluster | All phases are todo. | Treat as substrate work for future critical workloads and HA State Hub, not as a blocker for pragmatic cutover. |
## Human Gates
- `CUST-WP-0011-T07`: explicit approval required before freezing WSL2 writes and making the cluster State Hub primary.
- `CUST-WP-0038-T08`: explicit approval required before retiring WSL2 fallback after HA failover and restore drills.
## Stable Pickup Path
1. Reconfirm current WSL2 backup and take final pre-cutover dump.
2. Restore final dump into railiance01 State Hub and compare counts again.
3. Redirect the active private access path: either keep local `127.0.0.1:8000` and move it to an ops-bridge/SSH tunnel, or set MCP `API_BASE` to the private cluster endpoint.
4. Run stabilization with WSL2 retained as fallback.
5. Document the operating model and leave final retirement to a later explicit decision or HA workplan.
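Steps 1 and 2 end in a count comparison between the WSL2 source and the railiance01 target. The minimal sketch below assumes per-table counts have already been collected from each database, for example with one `SELECT count(*)` per table. The table names in the test are hypothetical.

```python
from typing import Dict, Optional, Tuple


def compare_row_counts(
    wsl2: Dict[str, int], cluster: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {table: (wsl2_count, cluster_count)} for every table that differs.

    A table present on only one side shows None for the missing side.
    """
    mismatches = {}
    for table in sorted(wsl2.keys() | cluster.keys()):
        before, after = wsl2.get(table), cluster.get(table)
        if before != after:
            mismatches[table] = (before, after)
    return mismatches
```

An empty result would be the condition for moving on to step 3. Any entry, including a table present on only one side, is a reason to stop and investigate before redirecting access.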
> This file helps you quickly understand what this repository is about,
> when it is relevant, and when it is not.
> It is intentionally lightweight and may be incomplete.
---
## One-liner
Central cognitive infrastructure and coordination hub for seven project domains — provides governance canon and coordinates through the standalone State Hub API/MCP service.
---
## Core Idea
The Custodian repository is the **governance substrate**: canon, constitution, values, domain charters, workplans, and runtime scaffolding. The operational State Hub service (PostgreSQL + FastAPI + MCP server + Observable dashboard) now lives in the standalone `/home/worsch/state-hub` repository and acts as episodic memory and coordination layer for work across repos.
---
## In Scope
- Canon layer: governance constitution, foundational values, six domain charters/roadmaps
- Coordination through the standalone State Hub API: topics, workstreams, tasks, decisions, progress events, contributions, SBOM, goals
- MCP session protocol: use the State Hub MCP tools from registered agent sessions
- Publishing lifecycle events on NATS JetStream (`org.statehub.>`) so activity-core can react via declarative ActivityDefinitions
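Consumers bind to `org.statehub.>` using NATS subject wildcards. `*` matches exactly one dot-separated token, and `>` matches one or more trailing tokens. The NATS server performs this matching itself. The function below only restates those rules so that a subject naming scheme can be checked in tests. Subjects such as `org.statehub.task.created` are hypothetical examples, not State Hub's published subjects.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS subject matches a subscription pattern."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # '>' is only valid as the last token and must cover at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if token != "*" and token != s_tokens[i]:
            return False
    # Without '>', pattern and subject must have the same number of tokens.
    return len(p_tokens) == len(s_tokens)
```

With these rules, `org.statehub.>` covers every subject one or more levels below `org.statehub` but not `org.statehub` itself.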
---
## Out of Scope
- Domain-specific implementation work (Railiance, Markitect, etc. each own their repos)
- Financial/legal transactions or external publication
- Storing plaintext credentials
- Direct writes to `canon/` without a human-approved review gate
- State Hub implementation work; use `/home/worsch/state-hub`
- Maintenance task *creation* in response to lifecycle events — that responsibility lives in activity-core (see `/home/worsch/state-hub/docs/activity-core-delegation.md`). The state hub remains a **read model**, not a task factory.
---
## Relevant When
- Starting or closing any session in a registered domain repo (orientation via `get_domain_summary()`)
- Tracking cross-domain decisions, blockers, or workplan progress
- Registering a new project into the ecosystem (`make register-project`)
- Consulting governance rules or domain charters
- Running the standalone State Hub API locally for MCP connectivity
---
## Not Relevant When
- Implementing single-domain features (stay in the domain repo)
- Working fully offline with no need for state coordination
- Non-custodian ecosystem work (standalone projects, throw-away scripts)
---
## Current State
- Status: active
- Implementation: ~60% — canon + standalone State Hub operational; RAG/drafting pipelines (Phase 2) not yet started
- Stability: stable (versioned Alembic migrations; no breaking API changes since v0.3)
- Usage: running daily; 15+ active workstreams across 6 domains; MCP server active in Claude Code
---
## How It Fits
- Upstream dependencies: none (sits at the top of the dependency order)
- Downstream consumers: all six domains (railiance → markitect → coulomb.social → personhood/foerster → custodian); **activity-core** consumes state hub lifecycle events on NATS subject `org.statehub.>` to drive maintenance ActivityDefinitions
- Often used with: kaizen-agentic (agent definitions), ops-bridge (remote tunnel connectivity), activity-core (task factory + event bridge)
Dependency order for domain sequencing: Railiance → Markitect → Coulomb.social → Personhood/Foerster → Custodian. The consistency checker (`cd /home/worsch/state-hub && make fix-consistency REPO=the-custodian`) must be run after any workplan changes to keep the dashboard accurate.
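Each daily triage report that follows pairs a one-line summary with a JSON `recommendations` list. Later reports also add `rank` and a `wsjf` breakdown. The validator below is a hypothetical sketch. Its allowed actions come from the triage action table later in this diff. The score formula (the sum of the four value factors divided by `job_size`) is an assumption, because the reports do not state one. Under that assumption, some recorded scores check out, such as the first ranked `hf-wp-0001` entry (9.5). Others do not, such as the `cust-wp-0011` entry next to it (7.6 recorded, 8.5 recomputed). Treat a mismatch as a prompt to look, not as proof of an error.

```python
import json

# Action vocabulary from the triage action table.
ALLOWED_ACTIONS = {
    "work-next", "revisit", "split", "park", "close-out",
    "needs-human", "needs-cross-agent", "needs-consistency-sync",
}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}
# Assumed WSJF numerator: the four value factors recorded in each report.
WSJF_FACTORS = ("strategic_value", "time_criticality", "risk_reduction", "opportunity_enablement")


def wsjf_score(wsjf: dict) -> float:
    """Recompute the score under the assumed formula (factor sum / job_size)."""
    return sum(wsjf[f] for f in WSJF_FACTORS) / wsjf["job_size"]


def validate_report(report: dict, tolerance: float = 0.05) -> list:
    """Return a list of problems found in one parsed triage report."""
    problems = []
    if "summary" not in report:
        problems.append("missing summary")
    for i, rec in enumerate(report.get("recommendations", [])):
        where = f"recommendations[{i}] ({rec.get('candidate', '?')})"
        if rec.get("action") not in ALLOWED_ACTIONS:
            problems.append(f"{where}: unknown action {rec.get('action')!r}")
        if rec.get("confidence") not in ALLOWED_CONFIDENCE:
            problems.append(f"{where}: unknown confidence {rec.get('confidence')!r}")
        wsjf = rec.get("wsjf")
        if wsjf is not None:
            expected = wsjf_score(wsjf)
            if abs(expected - wsjf["score"]) > tolerance:
                problems.append(f"{where}: score {wsjf['score']} != recomputed {expected:.2f}")
    return problems


def validate_report_text(text: str) -> list:
    """Parse the JSON block of a report and validate it."""
    return validate_report(json.loads(text))
```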
13 active workstreams with 58 open tasks. High-priority triage items need immediate attention, including 6 tasks requiring human intervention and 1 blocked workstream. Daily triage infrastructure itself is in active development.
```json
{
"recommendations":[
{
"action":"work-next",
"candidate":"cust-wp-0044",
"confidence":"high",
"why":"Current workplan for this very triage process - needs completion for self-sustaining operations"
},
{
"action":"work-next",
"candidate":"cust-wp-0045",
"confidence":"high",
"why":"Daily triage runner infrastructure - critical dependency for automated operations"
},
{
"action":"needs-human",
"candidate":"cust-wp-0046",
"confidence":"high",
"why":"Blocked status with 1 needs_human task - requires intervention to unblock"
},
{
"action":"needs-human",
"candidate":"hf-wp-0001",
"confidence":"high",
"why":"5 needs_human tasks in high-priority ops-hub establishment workstream"
},
{
"action":"needs-human",
"candidate":"railiance-wp-0004",
"confidence":"medium",
"why":"Medium priority with 1 needs_human task but no open todos - may need task breakdown"
},
{
"action":"work-next",
"candidate":"whi-kpi-card",
"confidence":"medium",
"why":"9 open high-priority tasks for workstream health monitoring - supports triage operations"
},
{
"action":"revisit",
"candidate":"cust-wp-0011",
"confidence":"medium",
"why":"6 high-priority migration tasks but no planning priority set - needs prioritization review"
},
{
"action":"split",
"candidate":"cust-wp-0025",
"confidence":"high",
"why":"25 open tasks in single workstream - too large for effective management"
},
{
"action":"park",
"candidate":"adhoc-2026-06-01",
"confidence":"medium",
"why":"Single low-priority opportunistic fix - defer until higher priorities complete"
}
],
"summary":"13 active workstreams with 58 open tasks. High-priority triage items need immediate attention, including 6 tasks requiring human intervention and 1 blocked workstream. Daily triage infrastructure itself is in active development."
13 active workstreams with 59 open tasks. High-priority triage items include 6 needs-human tasks in HF-WP-0001, blocked CUST-WP-0046, and two active daily triage workstreams ready for execution.
```json
{
"recommendations":[
{
"action":"work-next",
"candidate":"cust-wp-0044",
"confidence":"high",
"why":"High priority daily triage workstream with 1 todo task, directly supports this report generation"
},
{
"action":"work-next",
"candidate":"cust-wp-0045",
"confidence":"high",
"why":"High priority daily triage runner with 2 todo tasks, enables automation of this process"
},
{
"action":"needs-human",
"candidate":"hf-wp-0001",
"confidence":"high",
"why":"High priority but has 5 needs-human tasks requiring human decision-making"
},
{
"action":"needs-human",
"candidate":"cust-wp-0046",
"confidence":"high",
"why":"Blocked status with 1 needs-human task preventing progress"
},
{
"action":"work-next",
"candidate":"whi-kpi-card",
"confidence":"medium",
"why":"9 todo tasks with high priority items, no human intervention needed"
},
{
"action":"revisit",
"candidate":"cust-wp-0011",
"confidence":"medium",
"why":"6 high priority migration tasks but no planning priority set, needs prioritization review"
},
{
"action":"work-next",
"candidate":"adhoc-llmc-2026-06-02",
"confidence":"medium",
"why":"Recent ad-hoc work with 6 medium priority tasks, likely time-sensitive"
}
],
"summary":"13 active workstreams with 59 open tasks. High-priority triage items include 6 needs-human tasks in HF-WP-0001, blocked CUST-WP-0046, and two active daily triage workstreams ready for execution."
11 active workstreams with 3 high-priority items needing immediate attention. CUST-WP-0044 (this triage system) and CUST-WP-0045 (daily runner) are in calibration phase. HF-WP-0001 has 5 human-needed tasks blocking ops-hub extension. One workstream blocked, infrastructure migration work distributed across multiple streams.
```json
{
"recommendations":[
{
"action":"work-next",
"candidate":"cust-wp-0044",
"confidence":"high",
"why":"High priority, active calibration of this triage system itself"
"why":"High priority but 5 tasks need human input, blocking ops-hub extension"
},
{
"action":"revisit",
"candidate":"cust-wp-0046",
"confidence":"medium",
"why":"Blocked status with 1 human-needed task, assess unblocking conditions"
},
{
"action":"needs-human",
"candidate":"rail-ho-wp-0005",
"confidence":"medium",
"why":"Large workstream (11 tasks) with 4 human-needed items including high-priority design decisions"
},
{
"action":"work-next",
"candidate":"cust-wp-0003",
"confidence":"medium",
"why":"9 todo tasks, all high priority, no human intervention needed"
},
{
"action":"needs-consistency-sync",
"candidate":"cust-wp-0011",
"confidence":"medium",
"why":"Infrastructure migration overlaps with CUST-WP-0038, coordinate sequencing"
},
{
"action":"close-out",
"candidate":"state-wp-0052",
"confidence":"high",
"why":"No open tasks remaining, appears complete"
},
{
"action":"close-out",
"candidate":"ihub-wp-0018",
"confidence":"high",
"why":"No open tasks remaining, appears complete"
}
],
"summary":"11 active workstreams with 3 high-priority items needing immediate attention. CUST-WP-0044 (this triage system) and CUST-WP-0045 (daily runner) are in calibration phase. HF-WP-0001 has 5 human-needed tasks blocking ops-hub extension. One workstream blocked, infrastructure migration work distributed across multiple streams."
High-priority ops-hub establishment blocked on human decisions; multiple infrastructure migrations ready for automation; recommend focusing on unblocking foundational systems first
```json
{
"recommendations":[
{
"action":"needs-human",
"candidate":"hf-wp-0001",
"confidence":"high",
"rank":1,
"why":"Critical ops-hub establishment blocked with 5 human-needed tasks in wait status",
"wsjf":{
"job_size":2,
"opportunity_enablement":5,
"risk_reduction":4,
"score":9.5,
"strategic_value":5,
"time_criticality":5
}
},
{
"action":"work-next",
"candidate":"cust-wp-0011",
"confidence":"high",
"rank":2,
"why":"State Hub migration in progress with clear next steps, no blockers",
"wsjf":{
"job_size":2,
"opportunity_enablement":4,
"risk_reduction":4,
"score":7.6,
"strategic_value":4,
"time_criticality":5
}
},
{
"action":"work-next",
"candidate":"ihub-wp-0018",
"confidence":"high",
"rank":3,
"why":"Railiance01 deployment has actionable todo tasks ready for automation",
"wsjf":{
"job_size":2,
"opportunity_enablement":4,
"risk_reduction":3,
"score":6.8,
"strategic_value":4,
"time_criticality":4
}
},
{
"action":"needs-human",
"candidate":"rail-ho-wp-0005",
"confidence":"medium",
"rank":4,
"why":"Forgejo migration has high-priority human decisions needed for T02",
"wsjf":{
"job_size":3,
"opportunity_enablement":3,
"risk_reduction":4,
"score":5.0,
"strategic_value":3,
"time_criticality":3
}
},
{
"action":"split",
"candidate":"cust-wp-0025",
"confidence":"medium",
"rank":5,
"why":"FOS Hub Bootstrap has 25 todo tasks - too large, should be decomposed",
"wsjf":{
"job_size":4,
"opportunity_enablement":4,
"risk_reduction":3,
"score":3.8,
"strategic_value":4,
"time_criticality":2
}
},
{
"action":"park",
"candidate":"cust-wp-0038",
"confidence":"high",
"rank":6,
"why":"HA migration should wait until basic State Hub migration completes",
"wsjf":{
"job_size":4,
"opportunity_enablement":3,
"risk_reduction":4,
"score":3.2,
"strategic_value":3,
"time_criticality":1
}
},
{
"action":"needs-cross-agent",
"candidate":"railiance-wp-0004",
"confidence":"medium",
"rank":7,
"why":"Package registry publication depends on Gitea/Forgejo infrastructure decisions",
"wsjf":{
"job_size":3,
"opportunity_enablement":3,
"risk_reduction":2,
"score":3.0,
"strategic_value":2,
"time_criticality":2
}
},
{
"action":"needs-consistency-sync",
"candidate":"issue-wp-0002",
"confidence":"medium",
"rank":8,
"why":"Blocked status conflicts with railiance-wp-0004 dependency chain",
"wsjf":{
"job_size":3,
"opportunity_enablement":2,
"risk_reduction":2,
"score":2.7,
"strategic_value":2,
"time_criticality":2
}
},
{
"action":"revisit",
"candidate":"activity-wp-0006",
"confidence":"low",
"rank":9,
"why":"Single wait task with medium priority - check if conditions met",
"wsjf":{
"job_size":3,
"opportunity_enablement":2,
"risk_reduction":2,
"score":2.5,
"strategic_value":2,
"time_criticality":1
}
},
{
"action":"revisit",
"candidate":"rail-fab-wp-0023",
"confidence":"low",
"rank":10,
"why":"Single wait task for UI improvement - verify completion conditions",
"wsjf":{
"job_size":2,
"opportunity_enablement":1,
"risk_reduction":1,
"score":2.0,
"strategic_value":1,
"time_criticality":1
}
}
],
"summary":"High-priority ops-hub establishment blocked on human decisions; multiple infrastructure migrations ready for automation; recommend focusing on unblocking foundational systems first"
11 active workstreams with critical ops-hub establishment blocked on human decisions. High-priority infrastructure migrations ready for execution while foundational systems await manual intervention.
```json
{
"recommendations":[
{
"action":"needs-human",
"candidate":"hf-wp-0001",
"confidence":"high",
"rank":1,
"why":"5 high-priority tasks waiting for human decisions to establish ops-hub foundation",
"wsjf":{
"job_size":2,
"opportunity_enablement":5,
"risk_reduction":5,
"score":10.0,
"strategic_value":5,
"time_criticality":5
}
},
{
"action":"work-next",
"candidate":"cust-wp-0011",
"confidence":"high",
"rank":2,
"why":"State Hub migration in progress with clear next steps and no blockers",
"wsjf":{
"job_size":2,
"opportunity_enablement":5,
"risk_reduction":5,
"score":9.5,
"strategic_value":5,
"time_criticality":4
}
},
{
"action":"needs-human",
"candidate":"rail-ho-wp-0005",
"confidence":"high",
"rank":3,
"why":"Forgejo migration requires human decisions on production design",
"wsjf":{
"job_size":3,
"opportunity_enablement":4,
"risk_reduction":4,
"score":7.7,
"strategic_value":4,
"time_criticality":4
}
},
{
"action":"needs-cross-agent",
"candidate":"cust-wp-0047",
"confidence":"medium",
"rank":4,
"why":"Ops Hub widgets waiting on Inter-Hub activation coordination",
"wsjf":{
"job_size":2,
"opportunity_enablement":4,
"risk_reduction":3,
"score":7.5,
"strategic_value":4,
"time_criticality":4
}
},
{
"action":"needs-cross-agent",
"candidate":"ihub-wp-0018",
"confidence":"medium",
"rank":5,
"why":"Railiance01 deployment has dependencies across multiple systems",
"wsjf":{
"job_size":3,
"opportunity_enablement":3,
"risk_reduction":4,
"score":6.7,
"strategic_value":4,
"time_criticality":3
}
},
{
"action":"work-next",
"candidate":"agentic-wp-0001",
"confidence":"high",
"rank":6,
"why":"Ready status with clear todo tasks for State Hub integration",
"wsjf":{
"job_size":2,
"opportunity_enablement":3,
"risk_reduction":3,
"score":6.0,
"strategic_value":3,
"time_criticality":3
}
},
{
"action":"park",
"candidate":"cust-wp-0038",
"confidence":"high",
"rank":7,
"why":"HA migration should wait until basic State Hub migration completes",
"wsjf":{
"job_size":4,
"opportunity_enablement":3,
"risk_reduction":4,
"score":5.0,
"strategic_value":4,
"time_criticality":2
}
},
{
"action":"park",
"candidate":"cust-wp-0025",
"confidence":"medium",
"rank":8,
"why":"25 todo tasks indicate scope too large, should be split after core systems stable",
"wsjf":{
"job_size":5,
"opportunity_enablement":4,
"risk_reduction":3,
"score":3.6,
"strategic_value":3,
"time_criticality":2
}
},
{
"action":"revisit",
"candidate":"activity-wp-0006",
"confidence":"low",
"rank":9,
"why":"Single waiting task needs status check for operational hardening",
"wsjf":{
"job_size":1,
"opportunity_enablement":3,
"risk_reduction":2,
"score":3.0,
"strategic_value":2,
"time_criticality":2
}
},
{
"action":"close-out",
"candidate":"ihub-wp-0010",
"confidence":"high",
"rank":10,
"why":"No open tasks remaining, appears complete",
"wsjf":{
"job_size":1,
"opportunity_enablement":1,
"risk_reduction":1,
"score":2.0,
"strategic_value":1,
"time_criticality":1
}
}
],
"summary":"11 active workstreams with critical ops-hub establishment blocked on human decisions. High-priority infrastructure migrations ready for execution while foundational systems await manual intervention."
High-priority ops-hub establishment blocked on human decisions; infrastructure migrations progressing; recommend focusing on unblocking wait states and completing foundational deployments
```json
{
"recommendations":[
{
"action":"needs-human",
"candidate":"hf-wp-0001",
"confidence":"high",
"rank":1,
"why":"Critical ops-hub establishment blocked with 5 human-needed tasks in wait state",
"wsjf":{
"job_size":2,
"opportunity_enablement":3,
"risk_reduction":5,
"score":9.0,
"strategic_value":5,
"time_criticality":5
}
},
{
"action":"work-next",
"candidate":"cust-wp-0011",
"confidence":"high",
"rank":2,
"why":"State Hub migration in progress with clear next steps, foundational for other workstreams",
"wsjf":{
"job_size":3,
"opportunity_enablement":3,
"risk_reduction":4,
"score":6.0,
"strategic_value":4,
"time_criticality":4
}
},
{
"action":"needs-human",
"candidate":"rail-ho-wp-0005",
"confidence":"high",
"rank":3,
"why":"Forgejo migration has 4 human-needed tasks, critical for development infrastructure",
"wsjf":{
"job_size":3,
"opportunity_enablement":3,
"risk_reduction":4,
"score":5.7,
"strategic_value":4,
"time_criticality":3
}
},
{
"action":"work-next",
"candidate":"ihub-wp-0018",
"confidence":"medium",
"rank":4,
"why":"Railiance01 deployment progressing with actionable todo tasks",
"wsjf":{
"job_size":3,
"opportunity_enablement":3,
"risk_reduction":3,
"score":5.0,
"strategic_value":4,
"time_criticality":3
}
},
{
"action":"needs-cross-agent",
"candidate":"cust-wp-0047",
"confidence":"medium",
"rank":5,
"why":"Ops Hub widgets blocked on Inter-Hub activation, needs coordination",
"wsjf":{
"job_size":2,
"opportunity_enablement":4,
"risk_reduction":2,
"score":4.5,
"strategic_value":3,
"time_criticality":4
}
},
{
"action":"work-next",
"candidate":"agentic-wp-0001",
"confidence":"medium",
"rank":6,
"why":"Ready status with clear todo tasks for State Hub integration",
"wsjf":{
"job_size":3,
"opportunity_enablement":5,
"risk_reduction":3,
"score":4.3,
"strategic_value":3,
"time_criticality":2
}
},
{
"action":"split",
"candidate":"cust-wp-0025",
"confidence":"high",
"rank":7,
"why":"Large workstream with 25 todo tasks, should be broken into smaller chunks",
"wsjf":{
"job_size":5,
"opportunity_enablement":4,
"risk_reduction":3,
"score":3.6,
"strategic_value":4,
"time_criticality":2
}
},
{
"action":"park",
"candidate":"cust-wp-0038",
"confidence":"medium",
"rank":8,
"why":"HA migration should wait until basic State Hub migration completes",
"wsjf":{
"job_size":4,
"opportunity_enablement":3,
"risk_reduction":4,
"score":2.6,
"strategic_value":2,
"time_criticality":1
}
},
{
"action":"revisit",
"candidate":"activity-wp-0006",
"confidence":"low",
"rank":9,
"why":"Single wait task for calibration feedback, check if conditions are met",
"wsjf":{
"job_size":2,
"opportunity_enablement":2,
"risk_reduction":1,
"score":2.5,
"strategic_value":2,
"time_criticality":2
}
},
{
"action":"close-out",
"candidate":"ihub-wp-0010",
"confidence":"high",
"rank":10,
"why":"No open tasks remaining, appears complete",
"wsjf":{
"job_size":1,
"opportunity_enablement":1,
"risk_reduction":1,
"score":2.0,
"strategic_value":1,
"time_criticality":1
}
}
],
"summary":"High-priority ops-hub establishment blocked on human decisions; infrastructure migrations progressing; recommend focusing on unblocking wait states and completing foundational deployments"
secrets_rule: "Do not store credentials, tokens, private addresses that are not already operationally documented, or command output containing secrets."
| `work-next` | Best next executable task | scope is local, reversible, and inside an existing approved workplan | it touches money, legal, secrets, public reputation, or external commitments |
| `revisit` | Re-read and refresh before execution | the plan is stale, ambiguous, blocked, or context has moved | revisiting changes purpose, scope, owner, or approval posture |
| `split` | Break an oversized workplan into smaller plans | split is file-backed and preserves provenance | split would drop scope, change priorities, or alter commitments |
| `park` | Move out of active focus | plan is clearly not current and parking is proposed only | actually changing status to backlog/archive needs review |
| `close-out` | Finish closure review and mark done when appropriate | remaining tasks are truly done/cancelled/carry-forwarded | tasks are ambiguous, cancelled for policy reasons, or external effects are involved |
| `needs-human` | Human decision or approval needed | never auto-resolve; report the ask crisply | always |
| `needs-cross-agent` | Another repo/agent is the right owner | send/prepare a coordination message or task only when requested | when ownership or priority is uncertain |
| `needs-consistency-sync` | File/DB/index state should be reconciled | running read/check/fix consistency is already allowed by repo protocol | sync would overwrite work or local repo is behind remote |
The operational brain of the Custodian: a local PostgreSQL database, FastAPI REST service, FastMCP SSE server for Claude Code, Observable Framework dashboard, and a `custodian` CLI.
State Hub is no longer owned as an embedded implementation tree in this
Data loaders (`src/data/*.json.py`) are Python scripts that call the local API. They run at dev-server start and on `npm run build`. Clear the cache if data appears stale:
```bash
rm -rf dashboard/src/.observablehq/cache/
```
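A loader in this style can be very small. The sketch below assumes the local API listens on `http://127.0.0.1:8000` and serves `/state/summary` as JSON. The file name `src/data/summary.json.py` is hypothetical. Observable Framework uses whatever the script writes to stdout as the data file, so a real loader would end with a bare `main()` call. That call is left out here so the functions can be imported without a running API.

```python
import json
import urllib.request

# Assumed address of the local State Hub API.
API_BASE = "http://127.0.0.1:8000"


def fetch_json(url: str):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def render(fetch=fetch_json, path: str = "/state/summary") -> str:
    """Fetch one endpoint and return it as pretty-printed JSON text."""
    return json.dumps(fetch(API_BASE + path), indent=2)


def main() -> None:
    # Observable Framework captures stdout as the contents of the data file.
    print(render())
```

Passing `fetch` as a parameter lets the loader's output shape be checked with a stub instead of a live API.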
---
## Known Issues / WSL2 Notes
- **TLS bad record MAC on large downloads**: WSL2 corrupts packets on big TCP transfers. Use `scripts/pull_image.py` instead of `docker pull` for future image pulls.
- **MCP server is now SSE, not stdio**: Re-registration is `claude mcp add-json -s user state-hub '{"type":"sse","url":"http://127.0.0.1:8001/sse"}'`. The `patch_mcp_cwd.py` script and `.mcp.json` config are legacy artifacts from the old stdio setup.
- **AsyncSession concurrency**: SQLAlchemy 2.0 async sessions don't support concurrent operations. All queries in `/state/summary` run sequentially on a single session.
This directory remains only as a pointer so old references to
`the-custodian/state-hub` fail gently instead of implying that this repository