Commit Graph

213 Commits

Author SHA1 Message Date
f2bde496cb chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - CUST-WP-0028-T06: todo → cancelled
2026-03-27 01:01:43 +01:00
2e9a7b6244 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 01:01:37 +01:00
19381858f5 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 01:01:06 +01:00
1b9061d726 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 01:00:26 +01:00
54b5087df0 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:59:11 +01:00
7b807ede10 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:58:44 +01:00
0082234b3e chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:57:46 +01:00
b2c7e7ee06 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:57:03 +01:00
7cf8d3a7a2 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:56:19 +01:00
7a2dbe7ede chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:55:43 +01:00
e413ae9a18 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:55:11 +01:00
ca21812c30 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:54:33 +01:00
a04f873223 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:53:54 +01:00
6b163683e6 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:53:11 +01:00
d061c777d1 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:52:18 +01:00
276196028a chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:51:25 +01:00
5d5a922dc2 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:50:55 +01:00
2c63b5145d chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:50:15 +01:00
aacaaaceb5 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:49:29 +01:00
ecfe59a8f2 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:49:01 +01:00
d9a5ab18c2 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:48:28 +01:00
94dde646ba chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:42:06 +01:00
29b84de13c ADR and Runbook artefacts 2026-03-27 00:16:09 +01:00
b19896a9a9 docs(dashboard): add technical reference page for Observable Framework dashboard
Documents the dashboard's architecture, framework choice rationale, data-fetching
strategies (static loaders + live polling), component library, page inventory,
and key features including the Workstream Health Index and entity modals.
Also registers the new page in the Reference nav and adds runbook section for
node overload / runaway agent process (INC-002) with hardening checklist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 00:09:18 +01:00
6018df03cf feat(brief): generate .custodian-brief.md per repo for offline worker orientation
Adds _write_custodian_brief() to consistency_check.py. After every fix_repo()
run, a .custodian-brief.md is written to the repo root with: domain, last-synced
timestamp, current repo goal, active workstreams with progress (done/total), and
the first 7 open tasks per workstream (blocked → in_progress → todo order) with
task IDs. The file is git-committed when content changes so remote workers (e.g.
CoulombCore) can pull it and orient without a live MCP connection.

Session protocol template and CLAUDE.md updated: read .custodian-brief.md first,
then call get_domain_summary() as an enhancement (skip if MCP unreachable).
This eliminates false "State hub is offline" alarms in subagents and remote workers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 17:48:36 +01:00
f5f1323eb3 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-26:
  - update .custodian-brief.md for the-custodian
2026-03-26 17:45:38 +01:00
8ad371f753 feat(consistency): fix-consistency-remote works without REPO for all repos
Adds --remote CLI flag and fix_all_remote() function. When run without a
REPO argument, the target checks all registered repos and:
- Skips repos whose local path does not exist on this machine
- Skips repos that are already clean (no fixable issues, no FAILs, not
  behind remote, only C-08 background noise allowed)
- For repos that need work: git pull --ff-only then fix_repo()

Prints a summary of CLEAN (skipped) and NOT ON THIS HOST (skipped) repos
before the detailed fix reports.

Simplifies the Makefile target from shell-level curl+git to a single
uv run call using --remote. Same flag handles both single-repo and all-repos.

Also adds _git_pull() helper and 13 new tests (71 total in consistency suite).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 14:38:30 +01:00
86fd570533 fix(consistency): correct behind-remote detection to not trigger on local-ahead
_detect_behind_remote was comparing HEAD != @{u} which incorrectly
triggered C-16 when the local repo had unpushed commits. Fixed to use
git rev-list --count HEAD..@{u} which only counts commits the remote
has that local lacks. Adds test_returns_false_when_local_ahead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 13:31:28 +01:00
f1b72aab82 chore(workplan): mark CUST-WP-0026 done — all tasks completed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 12:16:33 +01:00
626061dff1 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-26:
  - CUST-WP-0026-T05: todo → done
2026-03-26 11:41:51 +01:00
9d538fa80b chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-26:
  - CUST-WP-0026-T04: todo → done
2026-03-26 11:41:50 +01:00
84306d1a7a chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-26:
  - CUST-WP-0026-T03: todo → done
2026-03-26 11:41:50 +01:00
a48c59793c chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-26:
  - CUST-WP-0026-T02: todo → done
2026-03-26 11:41:50 +01:00
7cbc5751d4 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-26:
  - CUST-WP-0026-T01: todo → done
2026-03-26 11:41:50 +01:00
26c920ce3e feat(consistency): distributed multi-machine safety (CUST-WP-0026)
T01 — No-regress rule (C-15): fix-consistency now detects when a DB task
status is ahead of the workplan file (e.g. marked done on CoulombCore)
and emits C-15 WARN instead of regressing the DB back to the stale file
value. STATUS_ORDER ranking: todo(0) < in_progress/blocked(1) < done/cancelled(2).

T02 — Pull gate (C-16): fix_repo runs git fetch + rev-parse at the start
of every --fix run. If the local repo is behind its remote tracking branch,
all write operations are skipped and C-16 WARN is emitted. Best-effort:
offline/no-remote silently skips the check.

T03 — DB→file writeback: C-15 fix path patches the status field in the
matching task block and git-commits the change with a standard message.
--no-writeback flag disables writeback while keeping T01/T02 active.

T04 — CLAUDE.md + session-protocol.template updated with new guidance,
C-15/C-16 semantics, and fix-consistency-remote recommendation.

T05 — Makefile: fix-consistency-remote pulls then fixes in one step.

16 new tests; 155 passed total.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 10:19:23 +01:00
41d239c166 ops: establish ops/ directory with Gitea runbook and INC-001 incident report
- Create ops/runbooks/gitea-coulombcore.md — recovery checklist for Gitea
  on COULOMBCORE, documents containerd StartError pattern and CPU budget issue
- Create ops/incidents/2026-03-25-gitea-pgpool-crashloop.md — INC-001 post-mortem
  for 13-day Gitea outage (PGPool CrashLoopBackOff + rolling update CPU deadlock)
- Create ops/README.md — index for runbooks and incidents
- state-hub/dashboard/src/docs/connecting.md: add railiance01 tunnel config
  (was previously unsaved)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:30:44 +01:00
efbbef76b0 feat(capability-requests): add routing dispute & reroute workflow (CUST-WP-0027)
Adds a structured dispute mechanism when capability request routing is wrong:
- New `routing_disputed` status with four DB columns (dispute_reason, disputed_by,
  dispute_suggested_domain, disputed_at) via Alembic migration m0h1i2j3k4l5
- POST /capability-requests/{id}/dispute — any party can flag misrouting with a reason
  and optional suggested domain; notifies custodian + current fulfilling domain
- POST /capability-requests/{id}/reroute — custodian re-routes to correct domain via
  catalog_entry_id or direct slug; appends audit trail to routing_note; resets to requested
- Two new MCP tools: dispute_capability_routing and reroute_capability_request
- Dashboard: amber disputed-banner at top of Summary, routing_disputed Kanban column,
  dispute details (reason, suggested domain, raised-by) shown on disputed cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 23:58:52 +01:00
e31693ad67 feat(workplan): CUST-WP-0027 — capability request dispute & negotiation
Adds a routing_disputed status, dispute/reroute endpoints, MCP tools,
and dashboard highlighting so any agent can flag a misrouting and
negotiate before a request is accepted.

Also rerouted 0e0aefd7 (KeyCape GHCR image) from railiance → netkingdom
and created netkingdom catalog entry for container image publishing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 23:42:25 +01:00
e423ff7126 feat(dashboard): add Tools & Apps page with liveness probes
New page at /tools listing all connected applications grouped by
category: Local Services (State Hub API, KeePassXC, pgAdmin, ops-bridge),
Source Control (Gitea), Identity/Auth (KeyCape, Authelia, privacyIDEA,
LLDAP), and Dev Tooling (Claude Code, uv). Local services show live
green/red/grey status dots via no-cors fetch probes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 01:18:11 +01:00
ad3024e55f feat(workplan): CUST-WP-0026 — distributed consistency for multi-machine state sync
Addresses false status regressions when work progresses on CoulombCore
via ops-bridge but local workplan files are stale. Three-layer fix:
T01 no-regress rule (C-13), T02 pull gate (C-14), T03 DB→file writeback.
Plus session protocol update and fix-consistency-remote Makefile target.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 01:04:00 +01:00
5bb71a223f docs: correct domain count from six to seven across all documentation
netkingdom was registered as the 7th active domain on 2026-02-28 but
was never reflected in CLAUDE.md, README.md, or SCOPE.md. Fixed all
three files. Global ~/.claude/CLAUDE.md updated separately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 00:19:15 +01:00
0777e5b2f0 feat: add FOS/credential standards, big-picture guidance, and CUST-WP-0025 workplan
- canon/standards/credential-management_v0.1.md: single root-of-trust credential hierarchy standard
- canon/standards/federated-organization-standard_v1.0.md: FOS reference architecture (VSM-based)
- wiki/BigPictureGuidance.md: integration guidance for OAS + FOS orthogonal layers
- workplans/CUST-WP-0025-fos-hub-bootstrap.md: 4-phase plan (identity, hub-core extraction, ops-hub, fin-hub)
- state-hub/Makefile: treat exit 2 (warnings-only) as success in check-consistency targets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 23:48:13 +01:00
cbad0dc958 Some cleanup needs to wait for later 2026-03-20 03:52:05 +01:00
634642cb52 feat(capability-requests): add routing_note, PATCH endpoint, word-boundary fix, and ops-bridge tunnel targets
- Add `routing_note` column (migration l9g0h1i2j3k4) to persist why a request was routed to a given domain
- Fix substring-match bug in `_route_capability`: use `\b` word-boundary regex so 'postgres' no longer matches inside 'postgresql'
- Include `title` in keyword scoring for better routing accuracy
- Return `routing_note` string from `_route_capability` and store it on the request
- Add `PATCH /capability-requests/{id}` endpoint + `CapabilityRequestPatch` schema to correct mutable metadata (catalog_entry_id, priority, blocking_task_id, fulfilling_workstream_id)
- Add `patch_capability_request` MCP tool wrapping the new endpoint
- Add 105 lines of routing tests (word-boundary, title-match, multi-entry scoring, broadcast fallback)
- Add `tunnels-up`, `tunnels-status`, `tunnels-check` Makefile targets for ops-bridge managed tunnels

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 03:47:54 +01:00
1e0ae37c89 docs: add State Hub reference page and restructure reference index
New page (docs/state-hub.md) covers:
- Why: the invisible state problem across repos and agents
- What: Derived Data Store, Read Model, Agent Orchestration Layer,
  Cross-Repo Observatory — and what it is NOT
- Derived Data Store principle (ADR-003): fingerprint cache, rebuild
  guarantee, force-refresh
- Repository Orchestrator: session protocol, cross-domain coordination
  via messages + capability routing, Kaizen agents
- Architecture diagram (ASCII), technology choices, data model overview
- Running the hub, design principles, related docs

reference.md: add Architecture & Design section grouping state-hub,
TPSC, GDPR maturity, SCOPE.md, capabilities, and goals docs.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 02:01:58 +01:00
51e95ec21a docs(adr): ADR-003 — Materialized Derived State with Fingerprint Invalidation
Formalises the caching pattern introduced in doi_cache: pre-compute and store
repo-sourced derived data, invalidate by fingerprint (composite of DB timestamps
+ file mtimes), force-refresh on demand.

Names the pattern against the literature (Materialized View, Derived Data Store,
CQRS Read Model, ETag-style invalidation) and mandates its use for all future
repo-sourced derived data with an implementation checklist.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 01:54:21 +01:00
d63e7310d5 perf(doi): fingerprint-based DB cache for DoI results
Adds doi_cache table (migration k8f9a0b1c2d3). Results are stored after
each evaluation and reused on subsequent requests when the fingerprint
matches. Fingerprint covers repo.updated_at, latest TPSC snapshot_at,
latest goal updated_at, and mtime of SCOPE.md / CLAUDE.md / tpsc.yaml.

Behaviour:
- Summary (warm cache, nothing changed): ~0.4s (was 0.9s)
- Summary (one repo stale): ~0.9s (only stale repos recomputed)
- Single repo (cache hit): ~0.2s (was 40s for full check)
- Single repo ?force_refresh=true: ~2s (full C7/C13 subprocess check)

Total journey: 108s (original) → 6s → <1s → 0.2s (cached single repo)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 01:47:19 +01:00
6ae5cb6bf7 perf(doi): eliminate HTTP self-calls in summary — 48 calls → 3 bulk DB queries
Root cause: C2/C9/C10 each made a full HTTP round-trip back to the API
(asyncio.to_thread → urllib → TCP → uvicorn → SQLAlchemy → DB) for every
repo. 16 repos × 3 calls = 48 self-calls at ~80-150ms each = ~6s total.

Fix: doi_engine.evaluate() accepts a prefetch dict. The summary endpoint
runs 3 bulk GROUP BY queries (domain status, TPSC snapshot counts, active
goal counts) and passes results directly — zero HTTP self-calls in summary
mode.

Result: /repos/doi/summary 6s → <1s (6× improvement on top of prior 13×).
Total improvement from original: 108s → <1s.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 01:37:40 +01:00
7dec3ac9ee perf(dashboard): lazy-load DoI tiers on Repositories page
Page now renders in ~200ms. DoI badges and KPI card show a spinner
while the background fetch resolves (~6s), then update reactively
via Observable Mutable pattern (doiData / doiLoading).

Fast path: repos, SBOM, domains, workstreams — immediate render.
Slow path: /repos/doi/summary — background, non-blocking.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 01:31:48 +01:00
66e3a9afe4 perf(doi): 13x speedup for /repos/doi/summary (108s → ~6s)
Two fixes:
1. skip_consistency=True in summary mode — omits C7/C13 subprocess calls
   (consistency_check.py) which were the main bottleneck (32 spawns for 16 repos).
   Full check still available per-repo via GET /repos/{slug}/doi.
2. asyncio.gather — all repos evaluated in parallel instead of sequentially.

Also: rename Repositories page title from "Repos" to "Repositories".

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 01:29:27 +01:00