Compare commits


24 Commits

Author SHA1 Message Date
dbe917ceae Workplan consistency optimization 2026-07-04 00:42:56 +02:00
5388aad77a chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-07-03:
  - update .custodian-brief.md for state-hub
2026-07-03 19:27:09 +02:00
a0167ff386 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-07-03:
  - update .custodian-brief.md for state-hub
2026-07-03 18:06:57 +02:00
2b6a3ef521 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-07-03:
  - update .custodian-brief.md for state-hub
2026-07-03 11:22:22 +02:00
8bd4a67639 CUST-WP-0011-T07 done: cluster State Hub is primary (exact-count restore, tunnel rewire)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 11:20:16 +02:00
ea1fd23481 Workplan terminology: templates, updater guard, add_progress_event alias
- project_rules templates: rename workstream->workplan in prose; registration
  guidance is now file-first + fix-consistency C-06 (manual create_workplan/
  create_workstream calls create duplicates); progress examples use
  workplan_id; legacy field names (state_hub_workstream_id) annotated
- update_agent_instruction_files: never overwrite filled-in
  stack-and-commands/repo-boundary/architecture rules (TODO-marker guard)
- mcp_server: add_progress_event accepts workplan_id (preferred) with
  workstream_id kept as legacy alias, mirroring create_task

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 01:47:47 +02:00
a361ce8731 chore: add local consistency sync cli 2026-07-02 00:15:16 +02:00
1f61008837 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-07-01:
  - update .custodian-brief.md for state-hub
2026-07-01 23:50:19 +02:00
d2e5f4c8cc chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-07-01:
  - update .custodian-brief.md for state-hub
2026-07-01 23:47:46 +02:00
24041bc3ef chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-25:
  - update .custodian-brief.md for state-hub
2026-06-25 16:02:15 +02:00
cf00d3bba5 docs(statehub): record railiance data restore 2026-06-25 16:00:39 +02:00
7661146b48 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-25:
  - update .custodian-brief.md for state-hub
2026-06-25 15:40:55 +02:00
8a9bfcc9bd feat(statehub): deploy empty railiance state hub 2026-06-25 15:39:53 +02:00
ec991f4ccd chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-25:
  - update .custodian-brief.md for state-hub
2026-06-25 15:16:51 +02:00
434c80c2c3 feat(statehub): add railiance deployment manifests 2026-06-25 15:15:30 +02:00
6ee5542a88 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-25:
  - update .custodian-brief.md for state-hub
2026-06-25 14:02:54 +02:00
48815b3db9 feat(statehub): publish railiance image 2026-06-25 14:01:10 +02:00
b536741539 feat(statehub): add offline write buffer relay 2026-06-25 13:44:27 +02:00
63f0398304 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-23:
  - update .custodian-brief.md for state-hub
2026-06-23 22:13:39 +02:00
c7370c360a chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-23:
  - update .custodian-brief.md for state-hub
2026-06-23 21:42:18 +02:00
13a331cdf1 Complete State Hub bootstrap workplans (WP-0001)
- Review integration files; fill SCOPE where templated
- Document dev workflow in stack-and-commands.md
- Seed WP-0002 implementation workplan; mark bootstrap finished
- Hub sync via fix-consistency
2026-06-22 23:35:32 +02:00
eebb1b8c29 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-22:
  - update .custodian-brief.md for state-hub
2026-06-22 23:27:22 +02:00
020f3c1688 Close STATE-WP-0067 attached-repo agent normalization workplan
Record batch results, mark all tasks done, and set workplan status to finished.
2026-06-22 23:26:48 +02:00
0f3dba6d83 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-22:
  - update .custodian-brief.md for state-hub
2026-06-22 23:21:20 +02:00
62 changed files with 3779 additions and 134 deletions

View File

@@ -2,7 +2,7 @@
 # Custodian Brief — state-hub
 **Domain:** infotech
-**Last synced:** 2026-06-22 15:54 UTC
+**Last synced:** 2026-07-03 17:27 UTC
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 ## Active Workstreams
@@ -21,14 +21,9 @@ Progress: 0/8 done | workstream_id: `8d0c1b5d-44da-4b91-8357-e6526d3e0a85`
 - … and 1 more open tasks
 ### Pragmatic State Hub Migration to railiance01
-Progress: 2/9 done | workstream_id: `967baafb-d92d-405a-ba0b-0d00d37c4940`
+Progress: 7/9 done | workstream_id: `967baafb-d92d-405a-ba0b-0d00d37c4940`
 **Open tasks:**
-- ► T03 — Build and push State Hub container image `79908ade`
-- · T04 — Deploy to cluster and run Alembic migrations `a7baf2eb`
-- · T05 — Migrate data from WSL2 to cluster `a307dd46`
-- · T06 — Drill cluster backup restore `03753b88`
-- · T07 — Cutover: redirect MCP config to cluster `ff1de25e`
 - · T08 — Stabilisation period (2 weeks minimum) `e06a59a0`
 - · T09 — Retire WSL2 instance `d75a2d49`

View File

@@ -92,12 +92,12 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
 **Close:**
 1. Update workplan file task statuses to reflect progress
 2. Log: `POST /progress/` with a summary of what changed
-3. Note for the custodian operator: after workplan file changes, run from
-`~/state-hub`:
+3. After workplan file changes, run:
 ```bash
-make fix-consistency REPO=state-hub
+statehub fix-consistency
 ```
-This syncs task status from files into the hub DB.
+Coding agents should run this directly; ask the operator only if the CLI or
+State Hub API is unavailable. This syncs task status from files into the hub DB.
 ---
@@ -215,5 +215,5 @@ Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blo
 To create a new workplan:
 1. Write the file following the format above
-2. Notify the custodian operator to run `make fix-consistency REPO=state-hub`
-(or send a message to the hub agent via `POST /messages/`)
+2. Run `statehub fix-consistency` locally; ask the operator only if the CLI or
+State Hub API is unavailable.

View File

@@ -20,10 +20,10 @@ with open("pyproject.toml", "rb") as f:
     project = tomllib.load(f)["project"]
 for dep in project["dependencies"]:
-    # llm-connect is currently a local editable test integration in this repo.
-    # The State Hub API/MCP runtime does not import it, and a container build
-    # must not depend on /home/worsch existing inside the image.
-    if dep == "llm-connect":
+    # llm-connect is a local editable test integration and must not be pulled
+    # into the production image. hub-core is runtime code, but it is installed
+    # from the named Docker build context below because it is not published yet.
+    if dep in {"llm-connect", "hub-core"}:
         continue
     print(dep)
 PY
@@ -31,6 +31,11 @@ PY
 RUN uv venv /app/.venv \
     && uv pip install --python /app/.venv/bin/python --no-cache -r /tmp/requirements.txt
+COPY --from=hub_core_src pyproject.toml /tmp/hub-core/pyproject.toml
+COPY --from=hub_core_src hub_core/ /tmp/hub-core/hub_core/
+RUN uv pip install --python /app/.venv/bin/python --no-cache /tmp/hub-core
 COPY alembic.ini ./
 COPY api/ ./api/
 COPY flows/ ./flows/
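The build step above skips local packages by exact string comparison (`dep in {"llm-connect", "hub-core"}`), which only matches while `pyproject.toml` lists the bare names. The following is a minimal sketch, not the repository's code, of how the same filter could compare normalized project names instead, so that an entry with a version specifier, extras, or a direct reference would still be skipped. The helper names `requirement_name` and `filter_dependencies` and the sample dependency strings are assumptions made for illustration.

```python
import re

# Packages the image installs by other means (names taken from the diff).
LOCAL_ONLY = {"llm-connect", "hub-core"}


def requirement_name(requirement: str) -> str:
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # Name normalization: lowercase, runs of "-", "_" and "." become one "-".
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def filter_dependencies(dependencies: list[str], skip: set[str] = LOCAL_ONLY) -> list[str]:
    """Keep only the dependencies whose normalized name is not in `skip`."""
    return [dep for dep in dependencies if requirement_name(dep) not in skip]


# Hypothetical dependency list; the file URL is a placeholder.
deps = [
    "fastapi>=0.110",
    "llm-connect",
    "hub_core @ file:///src/hub-core",
    "httpx[http2]",
]
print(filter_dependencies(deps))
```

With the sample list, only `fastapi>=0.110` and `httpx[http2]` remain. The regular expression reads just the leading name token, so it is a simplification of full requirement parsing rather than a replacement for it.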

View File

@@ -1,7 +1,17 @@
-.PHONY: install install-cli dashboard-install dashboard-check db db-tools migrate seed api dashboard check test test-python clean register-project register-codex-project register-mcp bootstrap-env validate-adr add-domain rename-domain add-repo list-repos register-path register-from-classification register-from-classification-all cleanup-stale tunnels-up tunnels-status tunnels-check bridges install-hooks install-hooks-all gitea-inventory token-reconcile
+.PHONY: install install-cli dashboard-install dashboard-check db db-tools migrate seed api dashboard check test test-python clean register-project register-codex-project register-mcp bootstrap-env validate-adr add-domain rename-domain add-repo list-repos register-path register-from-classification register-from-classification-all cleanup-stale tunnels-up tunnels-status tunnels-check bridges install-hooks install-hooks-all gitea-inventory token-reconcile railiance-state-hub-render railiance-state-hub-client-dry-run railiance-state-hub-server-dry-run
 COMPOSE = docker compose -f infra/docker-compose.yml --env-file .env
 PYTHON ?= python3
+HELM ?= $(shell command -v helm 2>/dev/null || if [ -x "$$HOME/.local/bin/helm" ]; then printf "%s" "$$HOME/.local/bin/helm"; else printf "%s" "helm"; fi)
+KUBECTL ?= $(shell command -v kubectl 2>/dev/null || if [ -x "$$HOME/.local/bin/kubectl" ]; then printf "%s" "$$HOME/.local/bin/kubectl"; else printf "%s" "kubectl"; fi)
+RAILIANCE_STATE_HUB_RELEASE ?= state-hub
+RAILIANCE_STATE_HUB_NAMESPACE ?= state-hub
+RAILIANCE_STATE_HUB_CHART ?= deploy/railiance/apps/charts/state-hub
+RAILIANCE_STATE_HUB_VALUES ?= deploy/railiance/apps/helm/state-hub-values.yaml
+RAILIANCE_STATE_HUB_IMAGE_TAG ?= b536741
+RAILIANCE_STATE_HUB_PLATFORM_DIR ?= deploy/railiance/platform
+RAILIANCE_STATE_HUB_APP_MANIFESTS ?= deploy/railiance/apps/manifests
 # Codex/WSL non-login shells may not source ~/.profile; keep uv discoverable.
 UV ?= $(shell command -v uv 2>/dev/null || if [ -x "$$HOME/.local/bin/uv" ]; then printf "%s" "$$HOME/.local/bin/uv"; else printf "%s" "uv"; fi)
@@ -61,6 +71,49 @@ dashboard:
 check:
 	curl -sf http://127.0.0.1:8000/state/health | python3 -m json.tool
+railiance-state-hub-render:
+	$(HELM) template $(RAILIANCE_STATE_HUB_RELEASE) $(RAILIANCE_STATE_HUB_CHART) \
+		--namespace $(RAILIANCE_STATE_HUB_NAMESPACE) \
+		-f $(RAILIANCE_STATE_HUB_VALUES) \
+		--set image.tag=$(RAILIANCE_STATE_HUB_IMAGE_TAG)
+railiance-state-hub-client-dry-run:
+	@set -e; \
+	tmpdir="$$(mktemp -d)"; \
+	trap 'rm -rf "$$tmpdir"' EXIT; \
+	$(HELM) template $(RAILIANCE_STATE_HUB_RELEASE) $(RAILIANCE_STATE_HUB_CHART) \
+		--namespace $(RAILIANCE_STATE_HUB_NAMESPACE) \
+		-f $(RAILIANCE_STATE_HUB_VALUES) \
+		--set image.tag=$(RAILIANCE_STATE_HUB_IMAGE_TAG) > "$$tmpdir/state-hub.yaml"; \
+	$(KUBECTL) apply --dry-run=client -f $(RAILIANCE_STATE_HUB_PLATFORM_DIR)/state-hub-db-credentials.sops.yaml.template; \
+	$(KUBECTL) apply --dry-run=client -f $(RAILIANCE_STATE_HUB_PLATFORM_DIR)/state-hub-db-cluster.yaml; \
+	$(KUBECTL) apply --dry-run=client -f $(RAILIANCE_STATE_HUB_PLATFORM_DIR)/state-hub-db-networkpolicies.yaml; \
+	$(KUBECTL) apply --dry-run=client -f $(RAILIANCE_STATE_HUB_APP_MANIFESTS)/state-hub-namespace.yaml; \
+	$(KUBECTL) apply --dry-run=client -f $(RAILIANCE_STATE_HUB_APP_MANIFESTS)/state-hub-env.secret.sops.yaml.template; \
+	$(KUBECTL) apply --dry-run=client -n $(RAILIANCE_STATE_HUB_NAMESPACE) -f "$$tmpdir/state-hub.yaml"
+railiance-state-hub-server-dry-run:
+	@set -e; \
+	tmpdir="$$(mktemp -d)"; \
+	trap 'rm -rf "$$tmpdir"' EXIT; \
+	$(HELM) template $(RAILIANCE_STATE_HUB_RELEASE) $(RAILIANCE_STATE_HUB_CHART) \
+		--namespace $(RAILIANCE_STATE_HUB_NAMESPACE) \
+		-f $(RAILIANCE_STATE_HUB_VALUES) \
+		--set image.tag=$(RAILIANCE_STATE_HUB_IMAGE_TAG) > "$$tmpdir/state-hub.yaml"; \
+	$(KUBECTL) apply --dry-run=server -f $(RAILIANCE_STATE_HUB_PLATFORM_DIR)/state-hub-db-credentials.sops.yaml.template; \
+	$(KUBECTL) apply --dry-run=server -f $(RAILIANCE_STATE_HUB_PLATFORM_DIR)/state-hub-db-cluster.yaml; \
+	$(KUBECTL) apply --dry-run=server -f $(RAILIANCE_STATE_HUB_PLATFORM_DIR)/state-hub-db-networkpolicies.yaml; \
+	$(KUBECTL) apply --dry-run=server -f $(RAILIANCE_STATE_HUB_APP_MANIFESTS)/state-hub-namespace.yaml; \
+	if $(KUBECTL) get namespace $(RAILIANCE_STATE_HUB_NAMESPACE) >/dev/null 2>&1; then \
+		$(KUBECTL) apply --dry-run=server -f $(RAILIANCE_STATE_HUB_APP_MANIFESTS)/state-hub-env.secret.sops.yaml.template; \
+		$(KUBECTL) apply --dry-run=server -n $(RAILIANCE_STATE_HUB_NAMESPACE) -f "$$tmpdir/state-hub.yaml"; \
+	else \
+		echo "Namespace $(RAILIANCE_STATE_HUB_NAMESPACE) does not exist; validating namespaced app manifests with client dry-run."; \
+		$(KUBECTL) apply --dry-run=client -f $(RAILIANCE_STATE_HUB_APP_MANIFESTS)/state-hub-namespace.yaml; \
+		$(KUBECTL) apply --dry-run=client -f $(RAILIANCE_STATE_HUB_APP_MANIFESTS)/state-hub-env.secret.sops.yaml.template; \
+		$(KUBECTL) apply --dry-run=client -n $(RAILIANCE_STATE_HUB_NAMESPACE) -f "$$tmpdir/state-hub.yaml"; \
+	fi
 test: test-python dashboard-check
 test-python:

api/edge/__init__.py (new file, 1 line)

@@ -0,0 +1 @@
"""State Hub edge relay and durable outbox helpers."""

api/edge/outbox.py (new file, 358 lines)

@@ -0,0 +1,358 @@
from __future__ import annotations

import json
import os
import sqlite3
import stat
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any

from api.services.write_idempotency import route_class_for

DEFAULT_OUTBOX_PATH = Path(os.environ.get("STATEHUB_OUTBOX_PATH", "~/.statehub/edge-outbox.sqlite3")).expanduser()
MAX_PAYLOAD_BYTES = 64 * 1024
# scrub_payload() lowercases each key and maps "-" to "_" before the lookup,
# so every entry here must be written with underscores to be reachable.
SECRET_FIELD_NAMES = {
    "authorization",
    "cookie",
    "set_cookie",
    "password",
    "passwd",
    "secret",
    "api_key",
    "apikey",
    "access_token",
    "refresh_token",
    "bearer_token",
    "client_secret",
    "private_key",
    "credential",
    "credentials",
}
@dataclass(frozen=True)
class OutboxEnvelope:
    id: str
    idempotency_key: str
    method: str
    path: str
    body: dict[str, Any] | list[Any] | None
    route_class: str
    source_agent: str | None
    source_host: str | None
    repo_slug: str | None
    session_id: str | None
    observed_revision: dict[str, Any] | None
    status: str
    attempt_count: int
    next_retry_at: str | None
    last_error: str | None
    response_status: int | None
    response_body: dict[str, Any] | list[Any] | str | None
    created_at: str
    updated_at: str
    acked_at: str | None


class PayloadRejected(ValueError):
    pass


def utcnow() -> str:
    return datetime.now(tz=timezone.utc).isoformat()


def default_outbox_path() -> Path:
    return DEFAULT_OUTBOX_PATH


def scrub_payload(value: Any) -> Any:
    if isinstance(value, dict):
        scrubbed: dict[str, Any] = {}
        for key, item in value.items():
            normalized = str(key).lower().replace("-", "_")
            if normalized in SECRET_FIELD_NAMES:
                scrubbed[key] = "[redacted]"
            else:
                scrubbed[key] = scrub_payload(item)
        return scrubbed
    if isinstance(value, list):
        return [scrub_payload(item) for item in value]
    return value


def _json_loads(raw: str | None) -> Any:
    if raw is None:
        return None
    return json.loads(raw)


def _json_dumps(value: Any) -> str | None:
    if value is None:
        return None
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def _parse_dt(value: str | None) -> datetime | None:
    if not value:
        return None
    return datetime.fromisoformat(value)
class OutboxStore:
    def __init__(self, path: str | Path | None = None) -> None:
        self.path = Path(path).expanduser() if path is not None else default_outbox_path()
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._init_db()
        self._chmod_private()

    def _connect(self) -> sqlite3.Connection:
        conn = sqlite3.connect(self.path)
        conn.row_factory = sqlite3.Row
        return conn

    def _init_db(self) -> None:
        with self._connect() as conn:
            conn.execute(
                """
                CREATE TABLE IF NOT EXISTS outbox_envelopes (
                    id TEXT PRIMARY KEY,
                    idempotency_key TEXT NOT NULL UNIQUE,
                    method TEXT NOT NULL,
                    path TEXT NOT NULL,
                    body_json TEXT,
                    route_class TEXT NOT NULL,
                    source_agent TEXT,
                    source_host TEXT,
                    repo_slug TEXT,
                    session_id TEXT,
                    observed_revision_json TEXT,
                    status TEXT NOT NULL,
                    attempt_count INTEGER NOT NULL DEFAULT 0,
                    next_retry_at TEXT,
                    last_error TEXT,
                    response_status INTEGER,
                    response_body_json TEXT,
                    created_at TEXT NOT NULL,
                    updated_at TEXT NOT NULL,
                    acked_at TEXT
                )
                """
            )
            conn.execute("CREATE INDEX IF NOT EXISTS ix_outbox_status ON outbox_envelopes(status)")
            conn.execute("CREATE INDEX IF NOT EXISTS ix_outbox_next_retry ON outbox_envelopes(next_retry_at)")
            conn.commit()

    def _chmod_private(self) -> None:
        try:
            os.chmod(self.path, stat.S_IRUSR | stat.S_IWUSR)
        except OSError:
            pass

    def enqueue(
        self,
        *,
        method: str,
        path: str,
        body: Any,
        idempotency_key: str | None = None,
        source_agent: str | None = None,
        source_host: str | None = None,
        repo_slug: str | None = None,
        session_id: str | None = None,
        observed_revision: dict[str, Any] | None = None,
    ) -> OutboxEnvelope:
        route_class = route_class_for(method, path)
        if route_class is None:
            raise PayloadRejected(f"{method.upper()} {path} is not queueable")
        scrubbed = scrub_payload(body)
        encoded = _json_dumps(scrubbed)
        if encoded is not None and len(encoded.encode("utf-8")) > MAX_PAYLOAD_BYTES:
            raise PayloadRejected("payload exceeds offline outbox size limit")
        now = utcnow()
        envelope_id = str(uuid.uuid4())
        key = idempotency_key or f"statehub-edge:{envelope_id}"
        method_upper = method.upper()
        with self._connect() as conn:
            if route_class == "replace":
                conn.execute(
                    """
                    UPDATE outbox_envelopes
                    SET status = 'cancelled', updated_at = ?, last_error = ?
                    WHERE status = 'queued'
                      AND route_class = 'replace'
                      AND method = ?
                      AND path = ?
                    """,
                    (now, f"superseded by {envelope_id}", method_upper, path),
                )
            conn.execute(
                """
                INSERT INTO outbox_envelopes (
                    id, idempotency_key, method, path, body_json, route_class,
                    source_agent, source_host, repo_slug, session_id,
                    observed_revision_json, status, created_at, updated_at
                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'queued', ?, ?)
                """,
                (
                    envelope_id,
                    key,
                    method_upper,
                    path,
                    encoded,
                    route_class,
                    source_agent,
                    source_host,
                    repo_slug,
                    session_id,
                    _json_dumps(observed_revision),
                    now,
                    now,
                ),
            )
            conn.commit()
        return self.get(envelope_id)

    def get(self, envelope_id: str) -> OutboxEnvelope:
        with self._connect() as conn:
            row = conn.execute("SELECT * FROM outbox_envelopes WHERE id = ?", (envelope_id,)).fetchone()
        if row is None:
            raise KeyError(envelope_id)
        return self._row_to_envelope(row)

    def list(self, *, status: str | None = None, limit: int = 100) -> list[OutboxEnvelope]:
        with self._connect() as conn:
            if status:
                rows = conn.execute(
                    "SELECT * FROM outbox_envelopes WHERE status = ? ORDER BY created_at LIMIT ?",
                    (status, limit),
                ).fetchall()
            else:
                rows = conn.execute(
                    "SELECT * FROM outbox_envelopes ORDER BY created_at LIMIT ?",
                    (limit,),
                ).fetchall()
        return [self._row_to_envelope(row) for row in rows]

    def due(self, *, limit: int = 50) -> list[OutboxEnvelope]:
        now = utcnow()
        with self._connect() as conn:
            rows = conn.execute(
                """
                SELECT * FROM outbox_envelopes
                WHERE status = 'queued' AND (next_retry_at IS NULL OR next_retry_at <= ?)
                ORDER BY created_at
                LIMIT ?
                """,
                (now, limit),
            ).fetchall()
        return [self._row_to_envelope(row) for row in rows]

    def summary(self) -> dict[str, Any]:
        with self._connect() as conn:
            rows = conn.execute(
                "SELECT status, COUNT(*) AS count, MIN(created_at) AS oldest FROM outbox_envelopes GROUP BY status"
            ).fetchall()
        by_status = {row["status"]: row["count"] for row in rows}
        oldest_pending = None
        for row in rows:
            if row["status"] in {"queued", "sending", "conflict"} and row["oldest"]:
                oldest_pending = min(filter(None, [oldest_pending, row["oldest"]])) if oldest_pending else row["oldest"]
        return {
            "path": str(self.path),
            "by_status": by_status,
            "pending_count": sum(by_status.get(status, 0) for status in ("queued", "sending")),
            "conflict_count": by_status.get("conflict", 0),
            "oldest_pending_at": oldest_pending,
        }

    def mark_sending(self, envelope_id: str) -> None:
        self._update(envelope_id, status="sending", updated_at=utcnow())

    def mark_acked(self, envelope_id: str, *, response_status: int, response_body: Any) -> None:
        now = utcnow()
        self._update(
            envelope_id,
            status="acked",
            response_status=response_status,
            response_body_json=_json_dumps(response_body),
            updated_at=now,
            acked_at=now,
            last_error=None,
            next_retry_at=None,
        )

    def mark_conflict(self, envelope_id: str, *, response_status: int, response_body: Any) -> None:
        self._update(
            envelope_id,
            status="conflict",
            response_status=response_status,
            response_body_json=_json_dumps(response_body),
            updated_at=utcnow(),
            last_error="conflict",
        )

    def mark_dead(self, envelope_id: str, *, error: str, response_status: int | None = None, response_body: Any = None) -> None:
        self._update(
            envelope_id,
            status="dead",
            response_status=response_status,
            response_body_json=_json_dumps(response_body),
            updated_at=utcnow(),
            last_error=error,
        )

    def mark_retry(self, envelope_id: str, *, error: str, attempt_count: int) -> None:
        delay_seconds = min(3600, 2 ** min(attempt_count, 10))
        next_retry = datetime.now(tz=timezone.utc) + timedelta(seconds=delay_seconds)
        self._update(
            envelope_id,
            status="queued",
            attempt_count=attempt_count,
            next_retry_at=next_retry.isoformat(),
            updated_at=utcnow(),
            last_error=error[:500],
        )

    def retry(self, envelope_id: str) -> None:
        self._update(envelope_id, status="queued", next_retry_at=None, updated_at=utcnow())

    def cancel(self, envelope_id: str) -> None:
        self._update(envelope_id, status="cancelled", updated_at=utcnow())

    def export(self, *, status: str | None = None, limit: int = 1000) -> list[dict[str, Any]]:
        return [envelope.__dict__ for envelope in self.list(status=status, limit=limit)]

    def _update(self, envelope_id: str, **values: Any) -> None:
        assignments = ", ".join(f"{key} = ?" for key in values)
        params = list(values.values()) + [envelope_id]
        with self._connect() as conn:
            conn.execute(f"UPDATE outbox_envelopes SET {assignments} WHERE id = ?", params)
            conn.commit()

    def _row_to_envelope(self, row: sqlite3.Row) -> OutboxEnvelope:
        return OutboxEnvelope(
            id=row["id"],
            idempotency_key=row["idempotency_key"],
            method=row["method"],
            path=row["path"],
            body=_json_loads(row["body_json"]),
            route_class=row["route_class"],
            source_agent=row["source_agent"],
            source_host=row["source_host"],
            repo_slug=row["repo_slug"],
            session_id=row["session_id"],
            observed_revision=_json_loads(row["observed_revision_json"]),
            status=row["status"],
            attempt_count=row["attempt_count"],
            next_retry_at=row["next_retry_at"],
            last_error=row["last_error"],
            response_status=row["response_status"],
            response_body=_json_loads(row["response_body_json"]),
            created_at=row["created_at"],
            updated_at=row["updated_at"],
            acked_at=row["acked_at"],
        )
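
Before an envelope is stored, `enqueue` scrubs the body and checks its encoded size. The following standalone sketch shows that validation step in isolation. It is an illustration rather than the module itself: the secret-name set is a small subset chosen for the example, `encoded_size` is a hypothetical helper, and the sample body is invented.

```python
import json
from typing import Any

# Subset of the secret names from the diff, for illustration only.
SECRET_FIELD_NAMES = {"authorization", "password", "api_key", "access_token"}
MAX_PAYLOAD_BYTES = 64 * 1024


def scrub_payload(value: Any) -> Any:
    """Recursively redact values whose normalized key name is a known secret name."""
    if isinstance(value, dict):
        return {
            key: "[redacted]"
            if str(key).lower().replace("-", "_") in SECRET_FIELD_NAMES
            else scrub_payload(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub_payload(item) for item in value]
    return value


def encoded_size(value: Any) -> int:
    """Byte length of the compact, key-sorted JSON form that would be stored."""
    return len(json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8"))


body = {
    "title": "sync",
    "API-Key": "abc123",
    "steps": [{"password": "hunter2", "note": "ok"}],
}
clean = scrub_payload(body)
print(clean)
print(encoded_size(clean) <= MAX_PAYLOAD_BYTES)
```

Matching is by exact key name after normalization, so `API-Key` is redacted while a key such as `db_password` is not, and a secret embedded in a string value is never inspected. The size check runs on the scrubbed form, which is also what gets written to SQLite.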

api/edge/relay.py (new file, 206 lines)

@@ -0,0 +1,206 @@
from __future__ import annotations

import os
import socket
from typing import Any

import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, Response

from api.edge.outbox import OutboxEnvelope, OutboxStore, PayloadRejected, default_outbox_path
from api.services.write_idempotency import route_class_for

HOP_BY_HOP_HEADERS = {
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailer",
    "transfer-encoding",
    "upgrade",
    "content-encoding",
    "content-length",
}


def _safe_response_headers(headers: httpx.Headers) -> dict[str, str]:
    return {key: value for key, value in headers.items() if key.lower() not in HOP_BY_HOP_HEADERS}


def _body_summary(response: httpx.Response) -> Any:
    try:
        return response.json()
    except ValueError:
        return {"text": response.text[:500]}


def queued_receipt(envelope: OutboxEnvelope, upstream_error: str) -> dict[str, Any]:
    return {
        "queued": True,
        "outbox_id": envelope.id,
        "idempotency_key": envelope.idempotency_key,
        "upstream": "unreachable",
        "upstream_error": upstream_error,
        "route_class": envelope.route_class,
    }
async def replay_pending(
    store: OutboxStore,
    *,
    upstream_url: str,
    limit: int = 50,
    timeout: float = 10.0,
) -> dict[str, int]:
    counts = {"sent": 0, "acked": 0, "conflict": 0, "retry": 0, "dead": 0}
    async with httpx.AsyncClient(base_url=upstream_url.rstrip("/"), timeout=timeout) as client:
        for envelope in store.due(limit=limit):
            counts["sent"] += 1
            store.mark_sending(envelope.id)
            try:
                response = await client.request(
                    envelope.method,
                    envelope.path,
                    json=envelope.body,
                    headers={
                        "Idempotency-Key": envelope.idempotency_key,
                        "X-StateHub-Source-Agent": envelope.source_agent or "statehub-edge",
                        "X-StateHub-Source-Host": envelope.source_host or socket.gethostname(),
                    },
                )
            except httpx.HTTPError as exc:
                counts["retry"] += 1
                store.mark_retry(envelope.id, error=str(exc), attempt_count=envelope.attempt_count + 1)
                continue
            response_body = _body_summary(response)
            if response.status_code == 409:
                counts["conflict"] += 1
                store.mark_conflict(envelope.id, response_status=response.status_code, response_body=response_body)
            elif 200 <= response.status_code < 300:
                counts["acked"] += 1
                store.mark_acked(envelope.id, response_status=response.status_code, response_body=response_body)
            elif response.status_code >= 500:
                counts["retry"] += 1
                store.mark_retry(
                    envelope.id,
                    error=f"HTTP {response.status_code}: {response.text[:300]}",
                    attempt_count=envelope.attempt_count + 1,
                )
            else:
                counts["dead"] += 1
                store.mark_dead(
                    envelope.id,
                    error=f"HTTP {response.status_code}: not retryable",
                    response_status=response.status_code,
                    response_body=response_body,
                )
    return counts
def create_app(
    *,
    upstream_url: str | None = None,
    outbox_path: str | None = None,
    timeout: float = 10.0,
) -> FastAPI:
    upstream = (upstream_url or os.environ.get("STATEHUB_UPSTREAM_URL") or os.environ.get("API_BASE") or "http://127.0.0.1:8000").rstrip("/")
    store_path = outbox_path or default_outbox_path()
    store_instance: OutboxStore | None = None

    def get_store() -> OutboxStore:
        nonlocal store_instance
        if store_instance is None:
            store_instance = OutboxStore(store_path)
        return store_instance

    app = FastAPI(title="State Hub Edge Relay", version="0.1.0")

    @app.get("/edge/health")
    async def edge_health() -> dict[str, Any]:
        reachable = False
        error = None
        try:
            async with httpx.AsyncClient(base_url=upstream, timeout=2.0) as client:
                response = await client.get("/state/health")
            reachable = response.status_code < 500
        except httpx.HTTPError as exc:
            error = str(exc)
        return {
            "status": "ok",
            "upstream": upstream,
            "upstream_reachable": reachable,
            "upstream_error": error,
            "outbox": get_store().summary(),
        }

    @app.post("/edge/replay")
    async def edge_replay(limit: int = 50) -> dict[str, int]:
        return await replay_pending(get_store(), upstream_url=upstream, limit=limit, timeout=timeout)

    @app.api_route("/{path:path}", methods=["GET", "POST", "PATCH", "PUT", "DELETE"])
    async def proxy(path: str, request: Request) -> Response:
        api_path = "/" + path
        body: Any = None
        if request.method in {"POST", "PATCH", "PUT"}:
            try:
                body = await request.json()
            except ValueError:
                body = None
        headers = {}
        if idempotency_key := request.headers.get("idempotency-key"):
            headers["Idempotency-Key"] = idempotency_key
        if request.headers.get("content-type"):
            headers["Content-Type"] = request.headers["content-type"]
        try:
            async with httpx.AsyncClient(base_url=upstream, timeout=timeout) as client:
                response = await client.request(
                    request.method,
                    api_path,
                    params=request.query_params,
                    json=body if body is not None else None,
                    headers=headers,
                )
            return Response(
                content=response.content,
                status_code=response.status_code,
                headers=_safe_response_headers(response.headers),
                media_type=response.headers.get("content-type"),
            )
        except httpx.HTTPError as exc:
            route_class = route_class_for(request.method, api_path)
            if route_class is None or request.method not in {"POST", "PATCH"}:
                return JSONResponse(
                    status_code=503,
                    content={
                        "error": "upstream unreachable and route is not queueable",
                        "method": request.method,
                        "path": api_path,
                        "upstream": upstream,
                        "detail": str(exc),
                    },
                )
            try:
                envelope = get_store().enqueue(
                    method=request.method,
                    path=api_path,
                    body=body,
                    idempotency_key=request.headers.get("idempotency-key"),
                    source_agent=request.headers.get("x-statehub-source-agent"),
                    source_host=request.headers.get("x-statehub-source-host") or socket.gethostname(),
                    repo_slug=request.headers.get("x-statehub-repo-slug"),
                    session_id=request.headers.get("x-statehub-session-id"),
                    observed_revision=None,
                )
            except PayloadRejected as reject:
                return JSONResponse(status_code=422, content={"error": str(reject)})
            return JSONResponse(status_code=202, content=queued_receipt(envelope, str(exc)))

    return app


app = create_app()
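
The replay loop in `replay_pending` decides each envelope's fate from the upstream status code, and `OutboxStore.mark_retry` decides how long a retried envelope waits. The sketch below restates both rules as pure functions so they can be checked without a network or database. The function names `classify_response` and `retry_delay_seconds` are assumptions for illustration; the branch order and the delay expression are copied from the diff.

```python
def classify_response(status_code: int) -> str:
    """Map an upstream HTTP status to the outbox status chosen by the replay loop."""
    if status_code == 409:
        return "conflict"
    if 200 <= status_code < 300:
        return "acked"
    if status_code >= 500:
        return "retry"
    # Everything else (1xx, 3xx, and 4xx other than 409) is treated as not retryable.
    return "dead"


def retry_delay_seconds(attempt_count: int) -> int:
    """Same expression as the delay computed in OutboxStore.mark_retry."""
    return min(3600, 2 ** min(attempt_count, 10))


for status in (201, 409, 503, 422, 304):
    print(status, classify_response(status))

print([retry_delay_seconds(n) for n in (1, 2, 5, 10, 11, 50)])
```

The delays printed for attempts 1, 2, 5, 10, 11 and 50 are 2, 4, 32, 1024, 1024 and 1024 seconds. Because the exponent is clamped at 10, the delay never exceeds 1024 seconds (about 17 minutes), so the 3600-second ceiling in the expression is never reached. Transport failures (`httpx.HTTPError`) take the same retry path as 5xx responses.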

View File

@@ -11,6 +11,7 @@ from starlette.responses import Response as StarletteResponse
 from api.database import engine
 from api.events import shutdown_publisher
+from api.services.write_idempotency import WriteIdempotencyMiddleware
 from api.routers import decisions, extension_points, progress, state, tasks, technical_debt, topics, workstreams, workstream_dependencies
 from api.routers import domains, repos, contributions, sbom, policy, domain_goals, repo_goals, messages, capability_requests, tpsc, services
 from api.routers import token_events
@@ -91,13 +92,14 @@ _default_dashboard_origins = [
 _cors_env = os.getenv("CORS_ORIGINS", ",".join(_default_dashboard_origins))
 _cors_origins = [o.strip() for o in _cors_env.split(",") if o.strip()]
+app.add_middleware(WriteIdempotencyMiddleware)
 app.add_middleware(ETagMiddleware)
 app.add_middleware(
     CORSMiddleware,
     allow_origins=_cors_origins,
     allow_methods=["GET", "POST", "PATCH", "DELETE", "PUT"],
-    allow_headers=["Content-Type", "If-None-Match"],
-    expose_headers=["ETag", "X-StateHub-Elapsed-Ms", "X-StateHub-Response-Bytes", "X-StateHub-Cache"],
+    allow_headers=["Content-Type", "If-None-Match", "Idempotency-Key", "X-StateHub-Source-Agent", "X-StateHub-Source-Host"],
+    expose_headers=["ETag", "X-StateHub-Elapsed-Ms", "X-StateHub-Response-Bytes", "X-StateHub-Cache", "X-StateHub-Idempotency-Replay"],
 )
 app.include_router(domains.router)

View File

@@ -33,6 +33,7 @@ from api.models.interface_change import InterfaceChange
 from api.models.workplan_launch_request import WorkplanLaunchRequest
 from api.models.fabric_graph import FabricGraphImport, FabricGraphNode, FabricGraphEdge
 from api.models.legacy_meter import LegacyInterface, LegacyInterfaceUsageBucket
+from api.models.write_idempotency_key import WriteIdempotencyKey
 __all__ = [
     "Base",
@@ -65,4 +66,5 @@ __all__ = [
     "WorkplanLaunchRequest",
     "FabricGraphImport", "FabricGraphNode", "FabricGraphEdge",
     "LegacyInterface", "LegacyInterfaceUsageBucket",
+    "WriteIdempotencyKey",
 ]

@@ -52,6 +52,12 @@ class Workplan(Base, TimestampMixin):
nullable=True, nullable=True,
index=True, index=True,
) )
backing_filename: Mapped[str | None] = mapped_column(String(255), nullable=True)
backing_relative_path: Mapped[str | None] = mapped_column(Text, nullable=True)
backing_archived: Mapped[bool | None] = mapped_column(nullable=True)
backing_synced_at: Mapped[datetime | None] = mapped_column(
DateTime(timezone=True), nullable=True
)
topic: Mapped["Topic | None"] = relationship("Topic", back_populates="workplans") # noqa: F821 topic: Mapped["Topic | None"] = relationship("Topic", back_populates="workplans") # noqa: F821
repo: Mapped["ManagedRepo"] = relationship("ManagedRepo", lazy="selectin") # noqa: F821 repo: Mapped["ManagedRepo"] = relationship("ManagedRepo", lazy="selectin") # noqa: F821

@@ -0,0 +1,32 @@
from __future__ import annotations

import uuid
from datetime import datetime
from typing import Any

from sqlalchemy import DateTime, Integer, String, Text, UniqueConstraint
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import Mapped, mapped_column

from api.models.base import Base, new_uuid


class WriteIdempotencyKey(Base):
    __tablename__ = "write_idempotency_keys"
    __table_args__ = (
        UniqueConstraint("key", name="uq_write_idempotency_keys_key"),
    )

    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=new_uuid)
    key: Mapped[str] = mapped_column(String(200), nullable=False, index=True)
    method: Mapped[str] = mapped_column(String(10), nullable=False)
    path: Mapped[str] = mapped_column(Text, nullable=False)
    route_class: Mapped[str] = mapped_column(String(30), nullable=False)
    request_hash: Mapped[str] = mapped_column(String(64), nullable=False)
    response_status: Mapped[int] = mapped_column(Integer, nullable=False)
    response_body: Mapped[Any] = mapped_column(JSONB, nullable=True)
    source_host: Mapped[str | None] = mapped_column(String(200), nullable=True)
    source_agent: Mapped[str | None] = mapped_column(String(100), nullable=True)
    first_seen_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
    last_seen_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
    expires_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True, index=True)

@@ -17,6 +17,7 @@ from api.events import EventEnvelope, publish_event
from api.models.managed_repo import ManagedRepo from api.models.managed_repo import ManagedRepo
from api.models.workplan import Workplan from api.models.workplan import Workplan
from api.schemas.workplan import ( from api.schemas.workplan import (
WorkplanBindingsSync,
WorkplanCreate, WorkplanCreate,
WorkplanRead, WorkplanRead,
WorkplanUpdate, WorkplanUpdate,
@@ -212,9 +213,38 @@ async def _build_workplan_index(session: AsyncSession) -> dict[str, Any]:
"needs_review": bool(review and review.needs_review), "needs_review": bool(review and review.needs_review),
"health_labels": ["needs_review"] if review and review.needs_review else [], "health_labels": ["needs_review"] if review and review.needs_review else [],
} }
await _merge_db_backing_index(session, index)
return {"workplans": index, "workstreams": index} return {"workplans": index, "workstreams": index}


async def _merge_db_backing_index(session: AsyncSession, index: dict[str, Any]) -> None:
    """Fill index gaps from DB-backed file bindings synced by fix-consistency."""
    result = await session.execute(
        select(Workplan, ManagedRepo.slug)
        .join(ManagedRepo, Workplan.repo_id == ManagedRepo.id)
        .where(Workplan.backing_filename.isnot(None))
    )
    for wp, repo_slug in result.all():
        key = str(wp.id)
        if key in index:
            continue
        index[key] = {
            "filename": wp.backing_filename,
            "relative_path": wp.backing_relative_path,
            "repo_slug": repo_slug,
            "archived": bool(wp.backing_archived),
            "status": normalize_workplan_status(wp.status) if wp.status else None,
            "needs_review": False,
            "health_labels": [],
        }


def _invalidate_workplan_index_cache() -> None:
    global _INDEX_CACHE, _INDEX_CACHE_AT
    _INDEX_CACHE = None
    _INDEX_CACHE_AT = 0.0


def _index_with_meta(*, stale: bool, refresh_in_progress: bool) -> dict[str, Any]: def _index_with_meta(*, stale: bool, refresh_in_progress: bool) -> dict[str, Any]:
age = time.monotonic() - _INDEX_CACHE_AT if _INDEX_CACHE_AT else None age = time.monotonic() - _INDEX_CACHE_AT if _INDEX_CACHE_AT else None
return { return {
@@ -459,6 +489,28 @@ async def workplan_index_preferred(
return await _workplan_index(refresh=refresh, session=session) return await _workplan_index(refresh=refresh, session=session)


@workplan_router.put("/index/bindings")
async def sync_workplan_bindings(
    body: WorkplanBindingsSync,
    session: AsyncSession = Depends(get_session),
) -> dict[str, int]:
    """Upsert workstation workplan file bindings for remote API index fallback."""
    synced_at = datetime.now(timezone.utc)
    updated = 0
    for entry in body.bindings:
        wp = await session.get(Workplan, entry.workplan_id)
        if wp is None:
            continue
        wp.backing_filename = entry.filename
        wp.backing_relative_path = entry.relative_path
        wp.backing_archived = entry.archived
        wp.backing_synced_at = synced_at
        updated += 1
    await session.commit()
    _invalidate_workplan_index_cache()
    return {"updated": updated, "received": len(body.bindings)}


@router.post("/", response_model=WorkplanRead, status_code=status.HTTP_201_CREATED) @router.post("/", response_model=WorkplanRead, status_code=status.HTTP_201_CREATED)
async def create_workstream( async def create_workstream(
request: Request, request: Request,

@@ -67,6 +67,19 @@ class WorkplanUpdate(WorkplanStatusMixin):
repo_goal_id: uuid.UUID | None = None repo_goal_id: uuid.UUID | None = None


class WorkplanFileBinding(BaseModel):
    workplan_id: uuid.UUID
    filename: str
    relative_path: str
    repo_slug: str
    archived: bool = False
    status: WorkplanStatus | None = None


class WorkplanBindingsSync(BaseModel):
    bindings: list[WorkplanFileBinding]


class WorkplanRead(WorkplanStatusMixin): class WorkplanRead(WorkplanStatusMixin):
model_config = ConfigDict(from_attributes=True) model_config = ConfigDict(from_attributes=True)
id: uuid.UUID id: uuid.UUID
@@ -87,6 +100,10 @@ class WorkplanRead(WorkplanStatusMixin):
queue_rank: int | None = None queue_rank: int | None = None
execution_group: str | None = None execution_group: str | None = None
scheduled_for: datetime | None = None scheduled_for: datetime | None = None
backing_filename: str | None = None
backing_relative_path: str | None = None
backing_archived: bool | None = None
backing_synced_at: datetime | None = None
created_at: datetime created_at: datetime
updated_at: datetime updated_at: datetime

@@ -0,0 +1,221 @@
from __future__ import annotations

import hashlib
import json
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any

from sqlalchemy import select
from starlette.responses import JSONResponse
from starlette.types import ASGIApp, Message, Receive, Scope, Send

from api.database import async_session_factory
from api.models.write_idempotency_key import WriteIdempotencyKey

IDEMPOTENCY_HEADER = b"idempotency-key"
REPLAY_HEADER = "X-StateHub-Idempotency-Replay"
CONFLICT_STATUS = 409
DEFAULT_IDEMPOTENCY_TTL_DAYS = 14


@dataclass(frozen=True)
class WriteRouteRule:
    method: str
    pattern: str
    route_class: str
    description: str

    def matches(self, method: str, path: str) -> bool:
        normalized = path.rstrip("/") or "/"
        return self.method == method.upper() and re.fullmatch(self.pattern, normalized) is not None


WRITE_ROUTE_RULES: tuple[WriteRouteRule, ...] = (
    WriteRouteRule("POST", r"/progress", "append", "append progress event"),
    WriteRouteRule("POST", r"/messages", "append", "send agent message"),
    WriteRouteRule("PATCH", r"/messages/[^/]+/read", "append", "mark known message read"),
    WriteRouteRule("POST", r"/token-events", "append", "record token event"),
    WriteRouteRule("POST", r"/token-events/upsert", "append", "upsert token event"),
    WriteRouteRule("POST", r"/decisions", "append", "record decision"),
    WriteRouteRule("PATCH", r"/tasks/[^/]+", "replace", "update task"),
    WriteRouteRule("POST", r"/tasks/bulk-status-sync", "replace", "bulk task status sync"),
    WriteRouteRule("PATCH", r"/decisions/[^/]+", "replace", "update decision"),
    WriteRouteRule("POST", r"/decisions/[^/]+/resolve", "replace", "resolve decision"),
    WriteRouteRule("PATCH", r"/workplans/[^/]+", "replace", "update workplan"),
    WriteRouteRule("PATCH", r"/workstreams/[^/]+", "replace", "update legacy workstream alias"),
)


def route_rule_for(method: str, path: str) -> WriteRouteRule | None:
    for rule in WRITE_ROUTE_RULES:
        if rule.matches(method, path):
            return rule
    return None


def route_class_for(method: str, path: str) -> str | None:
    rule = route_rule_for(method, path)
    return rule.route_class if rule else None


def canonical_request_hash(method: str, path: str, query_string: bytes, body: bytes) -> str:
    try:
        parsed: Any = json.loads(body.decode("utf-8")) if body else None
        body_repr = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        body_repr = body.hex()
    query = query_string.decode("utf-8", errors="replace")
    seed = f"{method.upper()}\n{path}\n{query}\n{body_repr}".encode("utf-8")
    return hashlib.sha256(seed).hexdigest()


def _header_value(headers: list[tuple[bytes, bytes]], name: bytes) -> str | None:
    lname = name.lower()
    for key, value in headers:
        if key.lower() == lname:
            return value.decode("utf-8", errors="replace")
    return None


async def _send_json_response(response: JSONResponse, scope: Scope, receive: Receive, send: Send) -> None:
    await response(scope, receive, send)


class WriteIdempotencyMiddleware:
    """Replay exact duplicate write requests carrying Idempotency-Key.

    The middleware is intentionally narrow: it only participates on the offline
    relay allowlist. Non-allowlisted routes keep their normal behavior even if a
    caller sends an Idempotency-Key header.
    """

    def __init__(self, app: ASGIApp, *, ttl_days: int = DEFAULT_IDEMPOTENCY_TTL_DAYS) -> None:
        self.app = app
        self.ttl_days = ttl_days

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        method = str(scope.get("method", "")).upper()
        path = str(scope.get("path", ""))
        rule = route_rule_for(method, path)
        headers = list(scope.get("headers") or [])
        key = _header_value(headers, IDEMPOTENCY_HEADER)
        if rule is None or not key:
            await self.app(scope, receive, send)
            return

        body = await self._read_body(receive)
        request_hash = canonical_request_hash(method, path, scope.get("query_string", b""), body)
        source_host = _header_value(headers, b"x-statehub-source-host")
        source_agent = _header_value(headers, b"x-statehub-source-agent")

        async with async_session_factory() as session:
            existing = (await session.execute(
                select(WriteIdempotencyKey).where(WriteIdempotencyKey.key == key)
            )).scalar_one_or_none()
            if existing is not None:
                existing.last_seen_at = datetime.now(tz=timezone.utc)
                await session.commit()
                if existing.request_hash != request_hash:
                    await _send_json_response(
                        JSONResponse(
                            status_code=CONFLICT_STATUS,
                            content={
                                "error": "Idempotency-Key was reused with a different request",
                                "idempotency_key": key,
                            },
                        ),
                        scope,
                        self._receive_from_body(body),
                        send,
                    )
                    return
                await _send_json_response(
                    JSONResponse(
                        status_code=existing.response_status,
                        content=existing.response_body,
                        headers={REPLAY_HEADER: "true"},
                    ),
                    scope,
                    self._receive_from_body(body),
                    send,
                )
                return

        start_message: Message | None = None
        body_parts: list[bytes] = []

        async def capture_send(message: Message) -> None:
            nonlocal start_message
            if message["type"] == "http.response.start":
                start_message = message
            elif message["type"] == "http.response.body":
                body_parts.append(message.get("body", b""))
            await send(message)

        await self.app(scope, self._receive_from_body(body), capture_send)

        if start_message is None:
            return
        status = int(start_message.get("status", 500))
        if status < 200 or status >= 300:
            return
        response_body_bytes = b"".join(body_parts)
        try:
            response_body = json.loads(response_body_bytes.decode("utf-8")) if response_body_bytes else None
        except (UnicodeDecodeError, json.JSONDecodeError):
            return

        async with async_session_factory() as session:
            existing = (await session.execute(
                select(WriteIdempotencyKey).where(WriteIdempotencyKey.key == key)
            )).scalar_one_or_none()
            if existing is not None:
                return
            now = datetime.now(tz=timezone.utc)
            session.add(WriteIdempotencyKey(
                key=key,
                method=method,
                path=path,
                route_class=rule.route_class,
                request_hash=request_hash,
                response_status=status,
                response_body=response_body,
                source_host=source_host,
                source_agent=source_agent,
                first_seen_at=now,
                last_seen_at=now,
                expires_at=now + timedelta(days=self.ttl_days),
            ))
            await session.commit()
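
    @staticmethod
    async def _read_body(receive: Receive) -> bytes:
        """Drain the request body so it can be hashed and replayed downstream."""
        chunks: list[bytes] = []
        while True:
            message = await receive()
            if message["type"] == "http.disconnect":
                # The client went away mid-upload. Stop here: once disconnected,
                # receive() keeps returning this message and the loop would spin.
                break
            if message["type"] != "http.request":
                continue
            chunks.append(message.get("body", b""))
            if not message.get("more_body", False):
                break
        return b"".join(chunks)

    @staticmethod
    def _receive_from_body(body: bytes) -> Receive:
        """Build a receive() callable that replays the buffered body once."""
        sent = False

        async def receive() -> Message:
            nonlocal sent
            if sent:
                return {"type": "http.request", "body": b"", "more_body": False}
            sent = True
            return {"type": "http.request", "body": body, "more_body": False}

        return receive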
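
Review note: the replay-versus-409 decision hinges on `canonical_request_hash`. The sketch below copies that function as it appears in this diff and runs it on made-up payloads (the `/progress` bodies are illustrative assumptions, not real State Hub traffic). It shows that JSON key order, whitespace, and method case do not change the hash, while any change in value does.

```python
import hashlib
import json
from typing import Any


def canonical_request_hash(method: str, path: str, query_string: bytes, body: bytes) -> str:
    # Same logic as the middleware in this diff: JSON bodies are re-serialized
    # with sorted keys and compact separators; non-JSON bodies fall back to hex.
    try:
        parsed: Any = json.loads(body.decode("utf-8")) if body else None
        body_repr = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        body_repr = body.hex()
    query = query_string.decode("utf-8", errors="replace")
    seed = f"{method.upper()}\n{path}\n{query}\n{body_repr}".encode("utf-8")
    return hashlib.sha256(seed).hexdigest()


# Hypothetical payloads, for illustration only.
first = canonical_request_hash("post", "/progress", b"", b'{"a": 1, "b": 2}')
reordered = canonical_request_hash("POST", "/progress", b"", b'{"b":2,"a":1}')
changed = canonical_request_hash("POST", "/progress", b"", b'{"a": 1, "b": 3}')
trailing_slash = canonical_request_hash("POST", "/progress/", b"", b'{"a": 1, "b": 2}')

print(first == reordered)       # same request, different formatting
print(first == changed)         # different payload under the same key
print(first == trailing_slash)  # path is hashed as sent, not normalized
```

One thing this surfaces: `WriteRouteRule.matches` strips a trailing slash before matching, but the hash uses the path as sent. A retry that differs only by a trailing slash would therefore appear to hit the same rule yet produce a different hash, and so a 409 rather than a replay. Whether that is intended is worth confirming.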

@@ -365,6 +365,57 @@ def cmd_ingest_sbom(args: argparse.Namespace) -> None:
sys.exit(result.returncode) sys.exit(result.returncode)


def cmd_fix_consistency(args: argparse.Namespace) -> None:
    """Run ADR-001 consistency repair from any registered repo checkout."""
    checker = STATE_HUB_DIR / "scripts" / "consistency_check.py"
    if not checker.exists():
        print(f"ERROR: consistency checker not found at {checker}")
        print(" Run this command from an editable state-hub install or the state-hub repo.")
        sys.exit(1)

    if args.remote and not (args.repo or args.all):
        print("ERROR: --remote requires --repo or --all.")
        print(" From a local checkout, run: statehub fix-consistency")
        print(" For pull-before-fix, run: statehub fix-consistency --repo <slug> --remote")
        sys.exit(1)

    cmd = [sys.executable, str(checker)]
    if args.all:
        cmd.append("--all")
    elif args.repo:
        cmd.extend(["--repo", args.repo])
        if args.repo_path:
            cmd.extend(["--repo-path", str(Path(args.repo_path).expanduser().resolve())])
    else:
        cmd.append("--here")
        if args.path:
            cmd.append(str(Path(args.path).expanduser().resolve()))

    cmd.append("--fix")
    if args.remote:
        cmd.append("--remote")
    if args.no_writeback:
        cmd.append("--no-writeback")
    if args.archive_closed:
        cmd.append("--archive-closed")
    if args.archive_workplan:
        cmd.extend(["--archive-workplan", args.archive_workplan])
    if args.archive_date:
        cmd.extend(["--archive-date", args.archive_date])
    if args.api_base:
        cmd.extend(["--api-base", args.api_base])
    if args.as_json:
        cmd.append("--json")
    if args.max_seconds is not None:
        cmd.extend(["--max-seconds", str(args.max_seconds)])

    result = subprocess.run(cmd)
    exit_code = result.returncode
    if exit_code == 2 and not args.strict_warnings:
        exit_code = 0
    sys.exit(exit_code)


def cmd_create_workstream(args: argparse.Namespace) -> None: def cmd_create_workstream(args: argparse.Namespace) -> None:
"""Create a workstream under a domain's topic.""" """Create a workstream under a domain's topic."""
_api_get("/state/health") _api_get("/state/health")
@@ -465,6 +516,55 @@ def cmd_status(_args: argparse.Namespace) -> None:
print(f" [{deadline}] {d['title']}") print(f" [{deadline}] {d['title']}")


def _outbox_store(args):
    from api.edge.outbox import OutboxStore, default_outbox_path

    return OutboxStore(args.outbox_path or default_outbox_path())


def cmd_outbox_status(args: argparse.Namespace) -> None:
    store = _outbox_store(args)
    print(json.dumps(store.summary(), indent=2))


def cmd_outbox_list(args: argparse.Namespace) -> None:
    store = _outbox_store(args)
    rows = store.export(status=args.status, limit=args.limit)
    print(json.dumps(rows, indent=2))


def cmd_outbox_export(args: argparse.Namespace) -> None:
    store = _outbox_store(args)
    payload = store.export(status=args.status, limit=args.limit)
    if args.output:
        Path(args.output).write_text(json.dumps(payload, indent=2) + "\n")
        print(f"Exported {len(payload)} envelope(s) to {args.output}")
    else:
        print(json.dumps(payload, indent=2))


def cmd_outbox_replay(args: argparse.Namespace) -> None:
    import asyncio

    from api.edge.relay import replay_pending

    store = _outbox_store(args)
    upstream = args.upstream_url or os.environ.get("STATEHUB_UPSTREAM_URL") or API_BASE
    result = asyncio.run(replay_pending(store, upstream_url=upstream, limit=args.limit))
    print(json.dumps(result, indent=2))


def cmd_outbox_retry(args: argparse.Namespace) -> None:
    store = _outbox_store(args)
    store.retry(args.envelope_id)
    print(f"Queued {args.envelope_id} for retry")


def cmd_outbox_cancel(args: argparse.Namespace) -> None:
    store = _outbox_store(args)
    store.cancel(args.envelope_id)
    print(f"Cancelled {args.envelope_id}")


# ── Entry point ──────────────────────────────────────────────────────────────── # ── Entry point ────────────────────────────────────────────────────────────────
def main() -> None: def main() -> None:
@@ -533,6 +633,30 @@ def main() -> None:
ing.add_argument("--slug", default=None, help="Repo slug (auto-detected from path if omitted)") ing.add_argument("--slug", default=None, help="Repo slug (auto-detected from path if omitted)")
ing.add_argument("--dry-run", action="store_true", help="Parse lockfiles but do not submit to API") ing.add_argument("--dry-run", action="store_true", help="Parse lockfiles but do not submit to API")

    # fix-consistency
    fix = sub.add_parser(
        "fix-consistency",
        help="Reconcile workplan files with State Hub from the current repo",
    )
    target = fix.add_mutually_exclusive_group()
    target.add_argument("--repo", default=None, help="Registered repo slug; defaults to inferring from --path")
    target.add_argument("--all", action="store_true", help="Fix all registered repos with a visible path")
    fix.add_argument("--path", default=os.getcwd(), help="Repo checkout to infer from (defaults to cwd)")
    fix.add_argument("--repo-path", default=None, help="Override repo path when using --repo")
    fix.add_argument("--remote", action="store_true", help="Pull before fixing; requires --repo or --all")
    fix.add_argument("--max-seconds", type=int, default=None, help="Wall-clock budget for --remote --all")
    fix.add_argument("--no-writeback", action="store_true", help="Disable DB-to-file status writeback")
    fix.add_argument("--archive-closed", action="store_true", help="Archive closed root workplans after fixing")
    fix.add_argument("--archive-workplan", default=None, help="Archive only the matching workplan id or filename")
    fix.add_argument("--archive-date", default=None, help="YYMMDD archive prefix for --archive-closed")
    fix.add_argument("--api-base", default=API_BASE, help="State Hub API base URL")
    fix.add_argument("--json", action="store_true", dest="as_json", help="Output JSON from the checker")
    fix.add_argument(
        "--strict-warnings",
        action="store_true",
        help="Preserve checker exit code 2 for warnings-only runs",
    )

# create-workstream # create-workstream
cws = sub.add_parser("create-workstream", help="Create a workstream under a domain topic") cws = sub.add_parser("create-workstream", help="Create a workstream under a domain topic")
cws.add_argument("--domain", required=True, help="Domain slug to create the workstream under") cws.add_argument("--domain", required=True, help="Domain slug to create the workstream under")
@@ -549,17 +673,54 @@ def main() -> None:
ctask.add_argument("--assignee", default=None) ctask.add_argument("--assignee", default=None)
ctask.add_argument("--description", default=None) ctask.add_argument("--description", default=None)

    # outbox
    outbox = sub.add_parser("outbox", help="Inspect and replay the local State Hub edge outbox")
    outbox.add_argument("--outbox-path", default=None, help="SQLite outbox path (defaults to ~/.statehub/edge-outbox.sqlite3)")
    out_sub = outbox.add_subparsers(dest="outbox_command", required=True)

    out_status = out_sub.add_parser("status", help="Show pending, conflict, and ack counts")
    out_status.set_defaults(func=cmd_outbox_status)

    out_list = out_sub.add_parser("list", help="List outbox envelopes as JSON")
    out_list.add_argument("--status", default=None, help="Filter by status")
    out_list.add_argument("--limit", type=int, default=100)
    out_list.set_defaults(func=cmd_outbox_list)

    out_export = out_sub.add_parser("export", help="Export non-secret envelopes")
    out_export.add_argument("--status", default=None, help="Filter by status")
    out_export.add_argument("--limit", type=int, default=1000)
    out_export.add_argument("--output", default=None, help="Write JSON to a file instead of stdout")
    out_export.set_defaults(func=cmd_outbox_export)

    out_replay = out_sub.add_parser("replay", help="Replay due queued envelopes")
    out_replay.add_argument("--upstream-url", default=None, help="Central State Hub API base URL")
    out_replay.add_argument("--limit", type=int, default=50)
    out_replay.set_defaults(func=cmd_outbox_replay)

    out_retry = out_sub.add_parser("retry", help="Force one envelope back to queued")
    out_retry.add_argument("envelope_id")
    out_retry.set_defaults(func=cmd_outbox_retry)

    out_cancel = out_sub.add_parser("cancel", help="Cancel one envelope")
    out_cancel.add_argument("envelope_id")
    out_cancel.set_defaults(func=cmd_outbox_cancel)

# status # status
sub.add_parser("status", help="Show State Hub health and summary totals") sub.add_parser("status", help="Show State Hub health and summary totals")
args = parser.parse_args() args = parser.parse_args()
if args.command == "register": if hasattr(args, "func"):
args.func(args)
elif args.command == "register":
run_statehub_register(args) run_statehub_register(args)
elif args.command == "register-project": elif args.command == "register-project":
cmd_register(args) cmd_register(args)
elif args.command == "ingest-sbom": elif args.command == "ingest-sbom":
cmd_ingest_sbom(args) cmd_ingest_sbom(args)
elif args.command == "fix-consistency":
cmd_fix_consistency(args)
elif args.command == "create-workstream": elif args.command == "create-workstream":
cmd_create_workstream(args) cmd_create_workstream(args)
elif args.command == "create-task": elif args.command == "create-task":

@@ -85,5 +85,5 @@ Use the **Add Repo** form or:
# 1. Author classification file in the repo # 1. Author classification file in the repo
# 2. Register / reclassify # 2. Register / reclassify
make register-from-classification PATH=/path/to/repo make register-from-classification PATH=/path/to/repo
make fix-consistency REPO=<slug> statehub fix-consistency
``` ```

@@ -82,7 +82,7 @@ Invalidation). The practical consequence:
| `uv.lock`, `package-lock.json`, etc. | SBOM entries + licence risk | `make ingest-sbom REPO=` | | `uv.lock`, `package-lock.json`, etc. | SBOM entries + licence risk | `make ingest-sbom REPO=` |
| `tpsc.yaml` | Third-party service declarations + GDPR warnings | `make ingest-tpsc REPO=` | | `tpsc.yaml` | Third-party service declarations + GDPR warnings | `make ingest-tpsc REPO=` |
| `SCOPE.md` capability blocks | Capability catalog | `make ingest-capabilities REPO=` | | `SCOPE.md` capability blocks | Capability catalog | `make ingest-capabilities REPO=` |
| `workplans/*.md` | Workstream + task status | `make fix-consistency REPO=` | | `workplans/*.md` | Workstream + task status | `statehub fix-consistency` |
| Repo files + DB records | DoI compliance tier | Fingerprint cache, auto-refreshed on read | | Repo files + DB records | DoI compliance tier | Fingerprint cache, auto-refreshed on read |
--- ---

@@ -0,0 +1,90 @@
# State Hub Railiance Deployment Handoff
This directory contains the State Hub deployment handoff for `CUST-WP-0011`.
It is source-owned by `state-hub` and split along the Railiance ownership
boundaries used for the actual cluster rollout.
## Ownership
- `deploy/railiance/platform/` is the `railiance-platform` handoff for the
  `state-hub-db` CloudNativePG cluster, database bootstrap credential, and
  database NetworkPolicies in the `databases` namespace.
- `deploy/railiance/apps/` is the `railiance-apps` handoff for the State Hub API
  Helm chart, non-secret production values, and app namespace runtime Secret
  template.
- Runtime secret values are not stored here. Replace placeholder passwords only
  in an operator-controlled file, then encrypt or deliver through the approved
  platform secret path.
## Image
The current image is pinned to:
```text
gitea.coulomb.social/coulomb/state-hub:b536741
```
railiance01 has already pulled this tag with `crictl`, and the image serves
`GET /state/health` against the local WSL database in smoke testing.
## Render And Dry-Run
Render the app chart without touching the cluster:
```bash
make railiance-state-hub-render
```
Run client-side Kubernetes validation for the platform manifests, app Secret
template, and rendered chart:
```bash
make railiance-state-hub-client-dry-run
```
Run server-side dry-run against the configured representative cluster:
```bash
KUBECONFIG=~/.kube/config-hosteurope make railiance-state-hub-server-dry-run
```
Server-side dry-run requires the CNPG CRDs, namespace permissions, and dry-run
permission for resources in `databases` and `state-hub`.
Before the `state-hub` namespace exists, Kubernetes cannot server-dry-run namespaced app
objects into that namespace because dry-run Namespace creation is not persisted.
The Make target therefore server-validates the platform and Namespace manifests,
then falls back to client dry-run for namespaced app manifests with an explicit
notice.
## Promotion Notes
Platform promotion into `railiance-platform`:

- copy `platform/state-hub-db-credentials.sops.yaml.template` to a real SOPS
  secret file with an operator-generated password;
- apply or GitOps-manage `platform/state-hub-db-cluster.yaml`;
- apply or GitOps-manage `platform/state-hub-db-networkpolicies.yaml`.

App promotion into `railiance-apps`:

- copy `apps/charts/state-hub/` to `charts/state-hub/`;
- copy `apps/helm/state-hub-values.yaml` to `helm/state-hub-values.yaml`;
- apply or GitOps-manage `apps/manifests/state-hub-namespace.yaml`;
- create `state-hub-env` in the `state-hub` namespace from the approved
  secret-delivery path;
- deploy with Helm using the production values file, which sets
  `namespace.create=false`, only after `state-hub-db` is healthy.

## Runtime Secret Contract
The app chart expects a Kubernetes Secret named `state-hub-env` in the
`state-hub` namespace with at least:
```text
DATABASE_URL=postgresql+asyncpg://state_hub:<url-encoded-password>@state-hub-db-rw.databases.svc.cluster.local:5432/state_hub
```
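
The `<url-encoded-password>` placeholder matters because characters such as `@`, `/`, `:` and `#` in a raw password would otherwise be parsed as URL structure. A minimal sketch of producing the value with the Python standard library follows. The helper name `build_database_url` and the sample password are illustrative assumptions, not part of this repository.

```python
from urllib.parse import quote


def build_database_url(
    password: str,
    *,
    user: str = "state_hub",
    host: str = "state-hub-db-rw.databases.svc.cluster.local",
    port: int = 5432,
    database: str = "state_hub",
) -> str:
    """Hypothetical helper: percent-encode the password and assemble DATABASE_URL."""
    # safe="" forces "/" to be encoded too; quote() leaves it alone by default.
    encoded = quote(password, safe="")
    return f"postgresql+asyncpg://{user}:{encoded}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Sample password for illustration only; never print a real one.
    print(build_database_url("p@ss/w:rd#1"))
```

Run it only in an operator-controlled shell, and feed the result straight into the approved secret-delivery path rather than into shell history or a committed file.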
Optional runtime settings such as `CORS_ORIGINS` can live in the chart
ConfigMap. The default chart keeps public ingress disabled; access should use
the existing private tunnel/ops-bridge path until a separate exposure decision
is recorded.
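
When setting `CORS_ORIGINS` in the ConfigMap, note that the API reads it as a comma-separated string, trims each entry, and drops empty ones (per the `_cors_origins` lines in the app module touched by this change). The sketch below mirrors that parsing in a standalone function. The function name `parse_cors_origins` is an assumption for illustration.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value the way the API module does."""
    # Whitespace around entries is trimmed; empty entries (e.g. a trailing comma) are dropped.
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


if __name__ == "__main__":
    print(parse_cors_origins("http://localhost:3000, http://127.0.0.1:3000,,"))
```

Stray spaces or a trailing comma in the chart value are therefore harmless, but each entry must still be a full origin (scheme, host, and port).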

@@ -0,0 +1,6 @@
apiVersion: v2
name: state-hub
description: State Hub API service for private Railiance operation
type: application
version: 0.1.0
appVersion: "b536741"

@@ -0,0 +1,26 @@
{{- define "statehub.fullname" -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- printf "%s" $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- define "statehub.labels" -}}
app: {{ include "statehub.fullname" . }}
app.kubernetes.io/name: {{ include "statehub.fullname" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/part-of: railiance-apps
helm.sh/chart: {{ printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" }}
railiance.io/layer: s5-app
{{- end -}}
{{- define "statehub.selectorLabels" -}}
app: {{ include "statehub.fullname" . }}
{{- end -}}
{{- define "statehub.image" -}}
{{- if not .Values.image.tag -}}
{{- fail "image.tag is required - pin it in deploy/railiance/apps/helm/state-hub-values.yaml or pass --set image.tag=<sha>" -}}
{{- end -}}
{{- printf "%s:%s" .Values.image.repository .Values.image.tag -}}
{{- end -}}

@@ -0,0 +1,9 @@
{{- if .Values.config.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Values.config.name }}
  labels: {{- include "statehub.labels" . | nindent 4 }}
data:
  CORS_ORIGINS: {{ .Values.config.corsOrigins | quote }}
{{- end }}

@@ -0,0 +1,66 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "statehub.fullname" . }}
  labels: {{- include "statehub.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels: {{- include "statehub.selectorLabels" . | nindent 6 }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels: {{- include "statehub.labels" . | nindent 8 }}
    spec:
      securityContext: {{- toYaml .Values.podSecurityContext | nindent 8 }}
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets: {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: state-hub
          image: {{ include "statehub.image" . | quote }}
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          securityContext: {{- toYaml .Values.securityContext | nindent 12 }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
          envFrom:
            {{- if .Values.config.enabled }}
            - configMapRef:
                name: {{ .Values.config.name | quote }}
            {{- end }}
            - secretRef:
                name: {{ .Values.secret.name | quote }}
          {{- if .Values.probes.enabled }}
          readinessProbe:
            httpGet:
              path: {{ .Values.probes.path }}
              port: {{ .Values.probes.port }}
            initialDelaySeconds: {{ .Values.probes.readiness.initialDelaySeconds }}
            periodSeconds: {{ .Values.probes.readiness.periodSeconds }}
            timeoutSeconds: {{ .Values.probes.readiness.timeoutSeconds }}
            failureThreshold: {{ .Values.probes.readiness.failureThreshold }}
          livenessProbe:
            httpGet:
              path: {{ .Values.probes.path }}
              port: {{ .Values.probes.port }}
            initialDelaySeconds: {{ .Values.probes.liveness.initialDelaySeconds }}
            periodSeconds: {{ .Values.probes.liveness.periodSeconds }}
            timeoutSeconds: {{ .Values.probes.liveness.timeoutSeconds }}
            failureThreshold: {{ .Values.probes.liveness.failureThreshold }}
          {{- end }}
          resources: {{- toYaml .Values.resources | nindent 12 }}
      {{- with .Values.nodeSelector }}
      nodeSelector: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations: {{- toYaml . | nindent 8 }}
      {{- end }}

@@ -0,0 +1,28 @@
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "statehub.fullname" . }}
  labels: {{- include "statehub.labels" . | nindent 4 }}
  annotations:
    {{- toYaml .Values.ingress.annotations | nindent 4 }}
spec:
  ingressClassName: {{ .Values.ingress.className }}
  {{- if .Values.ingress.tls }}
  tls:
    - hosts:
        - {{ .Values.ingress.host }}
      secretName: {{ include "statehub.fullname" . }}-tls
  {{- end }}
  rules:
    - host: {{ .Values.ingress.host }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ include "statehub.fullname" . }}
                port:
                  number: {{ .Values.service.port }}
{{- end }}

@@ -0,0 +1,8 @@
{{- if .Values.namespace.create }}
apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Release.Namespace }}
  labels:
    {{- toYaml .Values.namespace.labels | nindent 4 }}
{{- end }}

@@ -0,0 +1,13 @@
apiVersion: v1
kind: Service
metadata:
  name: {{ include "statehub.fullname" . }}
  labels: {{- include "statehub.labels" . | nindent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.targetPort }}
      protocol: TCP
      name: http
  selector: {{- include "statehub.selectorLabels" . | nindent 4 }}

@@ -0,0 +1,67 @@
image:
  repository: gitea.coulomb.social/coulomb/state-hub
  tag: ""
  pullPolicy: IfNotPresent

imagePullSecrets: []

replicaCount: 1

namespace:
  create: true
  labels:
    railiance.io/postgres-client: state-hub-db
    railiance.io/layer: s5-app

service:
  type: ClusterIP
  port: 8000
  targetPort: 8000

config:
  enabled: true
  name: state-hub-config
  corsOrigins: "http://localhost:3000,http://127.0.0.1:3000,http://localhost:3001,http://127.0.0.1:3001"

secret:
  name: state-hub-env

resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi

ingress:
  enabled: false
  className: traefik
  host: state-hub.coulomb.social
  tls: true
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod

probes:
  enabled: true
  path: /state/health
  port: 8000
  liveness:
    initialDelaySeconds: 30
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 3
  readiness:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3

podSecurityContext: {}
securityContext: {}
nodeSelector: {}
tolerations: []
affinity: {}

View File

@@ -0,0 +1,11 @@
# Production values for the State Hub Railiance chart handoff.
# Non-secret values only. DATABASE_URL comes from the Secret `state-hub-env`.
namespace:
  create: false
image:
  tag: "b536741"
ingress:
  enabled: false

View File

@@ -0,0 +1,18 @@
# Template for the State Hub runtime Secret in the state-hub namespace.
# DO NOT commit this file with real credentials.
# Encrypt with: sops -e -i state-hub-env.sops.yaml
# Apply with: kubectl apply -f <(sops -d state-hub-env.sops.yaml)
---
apiVersion: v1
kind: Secret
metadata:
  name: state-hub-env
  namespace: state-hub
  labels:
    app.kubernetes.io/name: state-hub
    app.kubernetes.io/component: runtime-env
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s5-app
type: Opaque
stringData:
  DATABASE_URL: postgresql+asyncpg://state_hub:REPLACE_WITH_URL_ENCODED_PASSWORD@state-hub-db-rw.databases.svc.cluster.local:5432/state_hub

View File

@@ -0,0 +1,8 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: state-hub
  labels:
    railiance.io/layer: s5-app
    railiance.io/postgres-client: state-hub-db

View File

@@ -0,0 +1,28 @@
---
# Dedicated CNPG Cluster for State Hub episodic memory.
# Owned by railiance-platform (S3). Operator lives in cnpg-system.
#
# Pre-condition: state-hub-db-credentials Secret exists in databases namespace.
# Runtime app Secret is separate and lives in the state-hub namespace.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: state-hub-db
  namespace: databases
  labels:
    app.kubernetes.io/name: state-hub-db
    app.kubernetes.io/component: database
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s3-platform
    railiance.io/role: state-hub-database
spec:
  instances: 1
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: state_hub
      owner: state_hub
      secret:
        name: state-hub-db-credentials

View File

@@ -0,0 +1,19 @@
# Template for the state-hub-db bootstrap Secret.
# DO NOT commit this file with real credentials.
# Encrypt with: sops -e -i state-hub-db-credentials.sops.yaml
# Apply with: kubectl apply -f <(sops -d state-hub-db-credentials.sops.yaml)
---
apiVersion: v1
kind: Secret
metadata:
  name: state-hub-db-credentials
  namespace: databases
  labels:
    app.kubernetes.io/name: state-hub-db
    app.kubernetes.io/component: database-bootstrap
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s3-platform
type: kubernetes.io/basic-auth
stringData:
  username: state_hub
  password: REPLACE_WITH_PASSWORD

View File

@@ -0,0 +1,74 @@
---
# NetworkPolicies for the dedicated State Hub CNPG cluster.
# Namespaces that need database access must carry:
# railiance.io/postgres-client: state-hub-db
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-kube-api-state-hub-db
  namespace: databases
  labels:
    app.kubernetes.io/name: state-hub-db
    railiance.io/layer: s3-platform
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: state-hub-db
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 6443
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-cnpg-operator-state-hub-db
  namespace: databases
  labels:
    app.kubernetes.io/name: state-hub-db
    railiance.io/layer: s3-platform
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: state-hub-db
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: cnpg-system
      ports:
        - protocol: TCP
          port: 5432
        - protocol: TCP
          port: 8000
        - protocol: TCP
          port: 9187
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-state-hub-namespace-state-hub-db
  namespace: databases
  labels:
    app.kubernetes.io/name: state-hub-db
    railiance.io/layer: s3-platform
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: state-hub-db
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              railiance.io/postgres-client: state-hub-db
      ports:
        - protocol: TCP
          port: 5432

View File

@@ -6,13 +6,15 @@ The State Hub production image is built from `state-hub/Dockerfile`.
 
 ```bash
 cd state-hub
-docker build -t state-hub:local .
+docker build --build-context hub_core_src=/home/worsch/hub-core \
+  -t state-hub:local \
+  -t gitea.coulomb.social/coulomb/state-hub:<git-sha> .
 ```
 
-The image installs runtime dependencies from `pyproject.toml` and excludes the
-local editable `llm-connect` dependency. `llm-connect` is currently used by the
-test suite only; the API and MCP runtime do not import it. Removing that
-workstation-local path from the image keeps cluster builds reproducible.
+The image installs runtime dependencies from `pyproject.toml` and excludes
+workstation-local editable sources from registry resolution. `llm-connect` is
+test-only and omitted. `hub-core` is runtime code and is installed from the
+named `hub_core_src` Docker build context until it is published as a package.
 
 ## Runtime
 
@@ -49,7 +51,28 @@ Expected response:
 {"status":"ok","db":"connected"}
 ```
 
-## Current Local Build
+## Current Published Build
+
+Verified and published on 2026-06-25:
+
+```text
+image: gitea.coulomb.social/coulomb/state-hub:b536741
+source commit: b536741
+local image id / index digest: sha256:3184dfd67f127cf8bd5303d7a210d6dc32e7ab05a5da5d51eab5b9a37dab4d4e
+linux/amd64 manifest digest: sha256:a8f30b35c10d9c90fecf4e3ec82849ccb484b6c137cfce7948931005b9690377
+config digest pulled by railiance01: sha256:5ce9c52fa554d6415e7d65d954e0778a8d8f7f8ebb5387c9e6694e1caac9b522
+created: 2026-06-25T13:51:55+02:00
+size: 106675605 bytes
+alembic heads: e9f0a1b2c3d4 (head)
+health: GET /state/health -> {"status":"ok","db":"connected"}
+registry: docker push succeeded
+railiance01: sudo crictl pull gitea.coulomb.social/coulomb/state-hub:b536741 succeeded
+```
+
+Smoke command used a temporary container on host port 18082 so it did not
+conflict with the live workstation State Hub on port 8000.
+
+## Historical Local Build
 
 Verified local build on 2026-05-15:
 
@@ -62,11 +85,9 @@ alembic: t7o8p9q0r1s2 (head)
 health: GET /state/health -> {"status":"ok","db":"connected"}
 ```
 
-Smoke command used a temporary container on host port 18000 so it did not
-conflict with the live workstation State Hub on port 8000.
-
 ## Registry
 
-The registry target for CUST-WP-0011 is the self-hosted Gitea registry, but
-publishing remains blocked until the Gitea package/container registry endpoint
-is enabled and Docker can authenticate against `/v2/`.
+The registry target for CUST-WP-0011 is the self-hosted Gitea registry at
+`gitea.coulomb.social/coulomb/state-hub`. As of 2026-06-25, `/v2/` returns the
+Docker Registry auth challenge and the `b536741` image tag has been pushed and
+pulled from railiance01.

View File

@@ -17,7 +17,7 @@ keeps the underlying scripts; only the *scheduling* moves.
 | - | ------------------- | -------------------------------------------------------- | -------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
 | 1 | activity-core cron | every 15 min (Railiance01) | `POST /consistency/sweep/remote-all` `consistency_check.py --remote --all` | Pull every registered repo, reconcile workplan files ↔ DB, run C-15 writeback + C-16 pull gate |
 | 2 | manual / daily cron | `make cleanup-stale` (suggested `0 3 * * *`) | `scripts/cleanup_stale_tasks.py` | Cancel tasks still open in finished/archived workstreams; emits `org.statehub.task.stale` |
-| 3 | git post-commit | every commit in a registered repo | `make fix-consistency REPO=<slug>` | Per-repo workplan ↔ DB sync immediately after a commit |
+| 3 | git post-commit | every commit in a registered repo | `statehub fix-consistency` | Per-repo workplan ↔ DB sync immediately after a commit |
 
 Honourable mentions (not currently scheduled, on-demand only — listed for
 completeness so they don't get mistakenly picked up):

View File

@@ -0,0 +1,95 @@
# State Hub Offline Write Buffer
## Decision
State Hub supports outage buffering through an edge relay with a durable local
outbox, plus central idempotency on replayed writes.
The central service cannot buffer requests that never reach it. Agents should
therefore send writes to a local statehub-edge relay when buffering is enabled.
The relay forwards immediately while the upstream API is reachable. If the
upstream is offline, the relay persists queueable write envelopes in a local
SQLite outbox and returns an explicit queued receipt.
Queued receipts are pending evidence, not successful central commits. Operators
must inspect and replay the outbox after recovery.
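The forward-or-queue behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the `outbox` table layout, the `envelope_id` receipt field, and the `forward` callable (which stands in for the HTTP call to the upstream and is assumed to raise `ConnectionError` when the upstream is offline) are hypothetical names, not the relay's actual schema or API. The `"queued": True` flag mirrors the receipt check used by the MCP server.

```python
import json
import sqlite3
import uuid
from typing import Callable


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local SQLite outbox. The column set is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id TEXT PRIMARY KEY, method TEXT, path TEXT, body TEXT, "
        "idempotency_key TEXT, status TEXT)"
    )
    return conn


def relay_write(
    conn: sqlite3.Connection,
    forward: Callable[[str, str, dict, str], dict],
    method: str,
    path: str,
    body: dict,
) -> dict:
    """Forward while the upstream answers; otherwise persist and return a queued receipt."""
    key = str(uuid.uuid4())  # reused on replay so the central side can deduplicate
    try:
        return forward(method, path, body, key)
    except ConnectionError:
        envelope_id = str(uuid.uuid4())
        with conn:  # commit the envelope before acknowledging it to the caller
            conn.execute(
                "INSERT INTO outbox VALUES (?, ?, ?, ?, ?, 'queued')",
                (envelope_id, method, path, json.dumps(body), key),
            )
        return {"queued": True, "envelope_id": envelope_id, "idempotency_key": key}
```

The receipt is returned only after the SQLite transaction commits, which is what makes it durable evidence rather than a promise held in memory.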
## Defaults
- Relay listen target: operator-selected, recommended 127.0.0.1:18080.
- Upstream API: STATEHUB_UPSTREAM_URL, then API_BASE, then
http://127.0.0.1:8000.
- Outbox path: STATEHUB_OUTBOX_PATH, default
~/.statehub/edge-outbox.sqlite3.
- Central idempotency retention: 14 days.
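The precedence in the defaults above can be sketched as a small resolver. The function and dataclass names are hypothetical; only the environment variable names and default values come from the list.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

DEFAULT_UPSTREAM = "http://127.0.0.1:8000"
DEFAULT_OUTBOX = "~/.statehub/edge-outbox.sqlite3"


@dataclass(frozen=True)
class RelayConfig:
    upstream_url: str
    outbox_path: Path


def resolve_relay_config(env: Mapping[str, str] = os.environ) -> RelayConfig:
    """Apply the documented order: STATEHUB_UPSTREAM_URL, then API_BASE, then the default."""
    upstream = env.get("STATEHUB_UPSTREAM_URL") or env.get("API_BASE") or DEFAULT_UPSTREAM
    outbox = env.get("STATEHUB_OUTBOX_PATH") or DEFAULT_OUTBOX
    return RelayConfig(upstream_url=upstream, outbox_path=Path(outbox).expanduser())
```

Treating an empty variable the same as an unset one is a choice made for this sketch; the text does not say how the relay handles empty values.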
## Route Classes
### Append-Only, Queueable
| Method | Path | Notes |
| --- | --- | --- |
| POST | /progress/ | Session-close progress events. |
| POST | /messages/ | Agent coordination messages. |
| PATCH | /messages/{id}/read | Safe only when the message id is already known. |
| POST | /token-events/ | Token accounting events. |
| POST | /token-events/upsert | Source-id based token upsert. |
| POST | /decisions/ | Queue only when the caller does not need the generated id immediately. |
Append-only writes replay with Idempotency-Key. Exact duplicate retries return
the original central response. Same key with a different request returns HTTP
409.
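As a rough illustration of these replay semantics, the sketch below keeps idempotency records in memory. The class and function names are hypothetical, and hashing a canonical JSON form with SHA-256 is an assumption; the central service's real hashing and storage are not described here.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredResponse:
    request_hash: str
    status: int
    body: dict


def request_hash(method: str, path: str, body: dict) -> str:
    """Hash a canonical form of the request (SHA-256 over sorted JSON is an assumption)."""
    canonical = json.dumps([method.upper(), path, body], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """In-memory stand-in for the central idempotency records."""

    def __init__(self) -> None:
        self._seen: dict[str, StoredResponse] = {}

    def handle(
        self,
        key: str,
        method: str,
        path: str,
        body: dict,
        commit: Callable[[dict], tuple[int, dict]],
    ) -> tuple[int, dict]:
        digest = request_hash(method, path, body)
        prior = self._seen.get(key)
        if prior is None:
            status, response = commit(body)  # first sighting: perform the write
            self._seen[key] = StoredResponse(digest, status, response)
            return status, response
        if prior.request_hash == digest:
            return prior.status, prior.body  # exact retry: return the original response
        return 409, {"error": "idempotency key reused with a different request"}
```

The write callback runs once per key, so a relay can replay the same envelope repeatedly after a flaky recovery without creating duplicate rows.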
### Replace-Style, Queueable With Conflict Checks
| Method | Path | Notes |
| --- | --- | --- |
| PATCH | /tasks/{id} | Task status and metadata updates. |
| POST | /tasks/bulk-status-sync | Ordered batch; future coalescing may decompose by task. |
| PATCH | /decisions/{id} | Decision field update. |
| POST | /decisions/{id}/resolve | Decision resolution. |
| PATCH | /workplans/{id} | Workplan lifecycle/status updates. |
| PATCH | /workstreams/{id} | Legacy alias for workplan update. |
In v1 the relay does not silently overwrite newer central state after a replay
conflict. A 409 response marks the envelope conflict and leaves it available for
operator review.
### Online-Only In V1
The relay forwards these while the upstream is reachable and returns a clear
503 during outage:
- DELETE endpoints.
- Repository sync/import/ingest endpoints.
- Consistency sweep mutation endpoints.
- Fabric graph exports and external pulls.
- Schema/bootstrap/admin operations.
- Requests with credentials, authorization tokens, attachments, or large opaque
payloads.
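The three route classes can be sketched as a lookup on method and path. The route lists are copied from the tables above; the class labels, function names, and the decision to ignore a trailing slash are assumptions made for this sketch. It does not model the payload-based exclusions (credentials, attachments, large bodies).

```python
import re

APPEND_ONLY = [
    ("POST", "/progress/"),
    ("POST", "/messages/"),
    ("PATCH", "/messages/{id}/read"),
    ("POST", "/token-events/"),
    ("POST", "/token-events/upsert"),
    ("POST", "/decisions/"),
]
REPLACE_STYLE = [
    ("PATCH", "/tasks/{id}"),
    ("POST", "/tasks/bulk-status-sync"),
    ("PATCH", "/decisions/{id}"),
    ("POST", "/decisions/{id}/resolve"),
    ("PATCH", "/workplans/{id}"),
    ("PATCH", "/workstreams/{id}"),
]


def _matches(template: str, path: str) -> bool:
    """Match a path against a template, treating {id} as one path segment."""
    escaped = re.escape(template.rstrip("/"))
    pattern = escaped.replace(re.escape("{id}"), r"[^/]+")
    return re.fullmatch(pattern, path.rstrip("/")) is not None


def classify(method: str, path: str) -> str:
    """Return 'append_only', 'replace_style', or 'online_only' for a write request."""
    for label, routes in (("append_only", APPEND_ONLY), ("replace_style", REPLACE_STYLE)):
        if any(method.upper() == m and _matches(t, path) for m, t in routes):
            return label
    return "online_only"  # forwarded when reachable, answered with 503 during an outage
```

Defaulting unknown routes to `online_only` keeps the relay conservative: a route becomes queueable only when it is listed explicitly.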
## Non-Secret Outbox Contract
The outbox stores method, path, scrubbed JSON body, route class, source metadata,
idempotency key, retry status, last error, and central response summaries. It
never stores authorization headers, bearer tokens, cookies, API keys, passwords,
or secret-looking JSON fields. Payloads over 64 KiB are rejected.
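A minimal sketch of the scrub-and-size rule follows. The marker list used to decide what counts as "secret-looking" is a guess, as is applying the 64 KiB limit after scrubbing; the function names are hypothetical.

```python
import json

MAX_PAYLOAD_BYTES = 64 * 1024
# Assumed heuristic for "secret-looking" field names; the real rule set is not given in the text.
SECRET_MARKERS = ("password", "secret", "api_key", "apikey", "authorization", "cookie", "bearer")


def scrub(value):
    """Recursively drop dict entries whose key name looks secret."""
    if isinstance(value, dict):
        return {
            key: scrub(item)
            for key, item in value.items()
            if not any(marker in key.lower() for marker in SECRET_MARKERS)
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


def prepare_outbox_body(body: dict) -> str:
    """Return the scrubbed JSON to persist, or raise if it exceeds 64 KiB."""
    encoded = json.dumps(scrub(body), separators=(",", ":"))
    if len(encoded.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload over 64 KiB is not queueable")
    return encoded
```

A bare `token` marker is left out on purpose, because it would also strip legitimate accounting fields from `/token-events/` payloads; a real implementation would need a more precise rule.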
## Operator Commands
    statehub outbox status
    statehub outbox list --status queued
    statehub outbox replay --upstream-url http://127.0.0.1:8000
    statehub outbox export --output /tmp/statehub-outbox.json
    statehub outbox retry ENVELOPE_ID
    statehub outbox cancel ENVELOPE_ID
## Recovery Checklist
1. Confirm the central State Hub API is reachable.
2. Run statehub outbox status on each host that may have queued writes.
3. Run statehub outbox replay until no due queued envelopes remain.
4. Review conflict envelopes manually.
5. Run `statehub fix-consistency` so file-backed workplan/task state
remains canonical after replay.
6. Record a progress note with non-secret replay counts.

View File

@@ -82,6 +82,13 @@ succeeds but its automatic progress event fails, the tool returns an error with
 the successful `write_result` included so the caller can avoid duplicating the
 entity while recording the missing progress event.
 
+When API_BASE points at the optional State Hub edge relay and the central API is
+unreachable, queueable write tools may return a queued receipt instead of the
+normal REST shape. The receipt means the local outbox accepted the write; it is
+not yet a central commit. Automatic progress-event side effects are skipped for
+queued primary writes so replay does not duplicate records. Operators can inspect
+and replay with statehub outbox status and statehub outbox replay.
+
 ---
 
 ## Query Tools (read-only, use freely)

View File

@@ -120,12 +120,23 @@ def _mcp_error(tool_name: str, message: str, response: Any | None = None) -> dic
     return payload
 
 
+def _mcp_queued(tool_name: str, response: dict[str, Any]) -> dict[str, Any]:
+    return {
+        "queued": True,
+        "tool": tool_name,
+        "message": "Write queued by State Hub edge relay; central commit is pending replay.",
+        "receipt": response,
+    }
+
+
 def _response_error(
     tool_name: str,
     response: Any,
     required_fields: tuple[str, ...] = (),
 ) -> dict[str, Any] | None:
     """Return an MCP-visible error payload for failed or malformed API results."""
+    if isinstance(response, dict) and response.get("queued") is True:
+        return _mcp_queued(tool_name, response)
     if isinstance(response, dict) and isinstance(response.get("error"), str):
         return _mcp_error(tool_name, response["error"], response)
     if not isinstance(response, dict):
@@ -978,6 +989,7 @@ def add_progress_event(
     summary: str,
     event_type: str = "note",
     topic_id: str | None = None,
+    workplan_id: str | None = None,
     workstream_id: str | None = None,
     task_id: str | None = None,
     detail: dict | str | None = None,
@@ -988,7 +1000,8 @@ def add_progress_event(
     summary: human-readable summary of what happened
     event_type: free-form label, e.g. note | milestone | blocker | insight
     topic_id: optional topic UUID
-    workstream_id: optional workstream UUID
+    workplan_id: optional workplan UUID (preferred)
+    workstream_id: legacy alias for workplan_id
     task_id: optional task UUID
     detail: optional structured data (JSONB); accepts a dict or a JSON string
     """
@@ -999,7 +1012,7 @@ def add_progress_event(
         detail = {"raw": detail}
     event = _post("/progress", {
         "topic_id": topic_id,
-        "workstream_id": workstream_id,
+        "workplan_id": workplan_id or workstream_id,
         "task_id": task_id,
         "event_type": event_type,
         "summary": summary,

View File

@@ -0,0 +1,43 @@
"""add write idempotency keys
Revision ID: e9f0a1b2c3d4
Revises: d8e9f0a1b2c3
Create Date: 2026-06-23
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB, UUID
revision = "e9f0a1b2c3d4"
down_revision = "d8e9f0a1b2c3"
branch_labels = None
depends_on = None
def upgrade() -> None:
    op.create_table(
        "write_idempotency_keys",
        sa.Column("id", UUID(as_uuid=True), primary_key=True),
        sa.Column("key", sa.String(length=200), nullable=False),
        sa.Column("method", sa.String(length=10), nullable=False),
        sa.Column("path", sa.Text(), nullable=False),
        sa.Column("route_class", sa.String(length=30), nullable=False),
        sa.Column("request_hash", sa.String(length=64), nullable=False),
        sa.Column("response_status", sa.Integer(), nullable=False),
        sa.Column("response_body", JSONB(), nullable=True),
        sa.Column("source_host", sa.String(length=200), nullable=True),
        sa.Column("source_agent", sa.String(length=100), nullable=True),
        sa.Column("first_seen_at", sa.DateTime(timezone=True), nullable=False),
        sa.Column("last_seen_at", sa.DateTime(timezone=True), nullable=False),
        sa.Column("expires_at", sa.DateTime(timezone=True), nullable=True),
        sa.UniqueConstraint("key", name="uq_write_idempotency_keys_key"),
    )
    op.create_index("ix_write_idempotency_keys_key", "write_idempotency_keys", ["key"])
    op.create_index("ix_write_idempotency_keys_expires_at", "write_idempotency_keys", ["expires_at"])


def downgrade() -> None:
    op.drop_index("ix_write_idempotency_keys_expires_at", table_name="write_idempotency_keys")
    op.drop_index("ix_write_idempotency_keys_key", table_name="write_idempotency_keys")
    op.drop_table("write_idempotency_keys")

View File

@@ -0,0 +1,31 @@
"""add workplan file backing metadata for remote API index
Revision ID: f1a2b3c4d5e6
Revises: e9f0a1b2c3d4
Create Date: 2026-07-03
"""
from alembic import op
import sqlalchemy as sa
revision = "f1a2b3c4d5e6"
down_revision = "e9f0a1b2c3d4"
branch_labels = None
depends_on = None
def upgrade() -> None:
    op.add_column("workplans", sa.Column("backing_filename", sa.String(255), nullable=True))
    op.add_column("workplans", sa.Column("backing_relative_path", sa.Text(), nullable=True))
    op.add_column("workplans", sa.Column("backing_archived", sa.Boolean(), nullable=True))
    op.add_column(
        "workplans",
        sa.Column("backing_synced_at", sa.DateTime(timezone=True), nullable=True),
    )


def downgrade() -> None:
    op.drop_column("workplans", "backing_synced_at")
    op.drop_column("workplans", "backing_archived")
    op.drop_column("workplans", "backing_relative_path")
    op.drop_column("workplans", "backing_filename")

View File

@@ -31,7 +31,15 @@ build-backend = "hatchling.build"
 
 [tool.hatch.build.targets.wheel]
 packages = ["api", "mcp_server", "task_flow_engine"]
-artifacts = ["custodian_cli.py", "statehub_register.py"]
+artifacts = [
+    "custodian_cli.py",
+    "statehub_register.py",
+    "scripts/consistency_check.py",
+    "scripts/repo_sync.py",
+    "scripts/mcp_registration.py",
+    "scripts/project_claude_md.template",
+    "scripts/project_rules/*.template",
+]
 
 [tool.uv.sources]
 llm-connect = { path = "/home/worsch/llm-connect", editable = true }

View File

@@ -0,0 +1,418 @@
#!/usr/bin/env python3
"""Complete ready statehub-bootstrap workplans (T01-T03) for attached repos."""
from __future__ import annotations
import re
import sys
from pathlib import Path
HOME = Path("/home/worsch")
REGISTRY_STACK = """## Stack
- **Language:** Markdown-first registry and planning repo (no application runtime yet)
- **Key deps:** State Hub ADR-001 workplans, `registry/indexes/capabilities.yaml`
## Dev Commands
```bash
# Orient (offline-safe)
cat .custodian-brief.md
cat INTENT.md
cat SCOPE.md
ls workplans/
# After workplan or registry edits — from ~/state-hub
make fix-consistency REPO={repo_slug}
# Sanity-check markdown / registry edits
git diff --check
```
"""
DOCS_STACK = """## Stack
- **Language:** Markdown-first control/planning repository (no application runtime)
- **Key deps:** State Hub workplans, agent instructions, optional `registry/` scaffold
## Dev Commands
```bash
cat .custodian-brief.md
cat INTENT.md
ls workplans/
# After workplan edits — from ~/state-hub
make fix-consistency REPO={repo_slug}
```
"""
def wp0002_template(
    *,
    prefix: str,
    num: str,
    slug: str,
    title: str,
    domain: str,
    repo: str,
    topic_slug: str,
    summary: str,
    task_title: str,
    task_body: str,
) -> str:
    wid = f"{prefix}-{num}"
    return f"""---
id: {wid}
type: workplan
title: "{title}"
domain: {domain}
repo: {repo}
status: ready
owner: codex
topic_slug: {topic_slug}
created: "2026-06-22"
updated: "2026-06-22"
---
# {title}
{summary}
## {task_title}
```task
id: {wid}-T01
status: todo
priority: high
```
{task_body}
"""
def close_wp0001(path: Path, *, t01_note: str, t02_note: str, t03_note: str) -> None:
    text = path.read_text(encoding="utf-8")
    text = re.sub(r"^status: ready$", "status: finished", text, count=1, flags=re.M)
    text = re.sub(r"^(updated: ).*$", r'\1"2026-06-22"', text, count=1, flags=re.M)
    for tid, note in [
        ("T01", t01_note),
        ("T02", t02_note),
        ("T03", t03_note),
    ]:
        text = re.sub(
            rf"(id: [A-Z0-9-]+-{tid}\nstatus: )todo",
            r"\1done",
            text,
            count=1,
        )
        pattern = rf"(```task\nid: [A-Z0-9-]+-{tid}\nstatus: done\n.*?```)"
        block = re.search(pattern, text, re.DOTALL)
        if block and note:
            after = f"{block.group(1)}\n\nResult 2026-06-22: {note}"
            if note not in text:
                text = text.replace(block.group(1), after, 1)
    path.write_text(text, encoding="utf-8")
def fill_scope_template(path: Path, *, oneliner: str, core: str, in_scope: list[str], out_scope: list[str]) -> None:
    text = path.read_text(encoding="utf-8")
    if "<!-- Describe the purpose" not in text:
        return
    in_lines = "\n".join(f"- {line}" for line in in_scope)
    out_lines = "\n".join(f"- {line}" for line in out_scope)
    replacement = f"""# SCOPE
> Lightweight boundary for agents and contributors.
---
## One-liner
{oneliner}
---
## Core Idea
{core}
---
## In Scope
{in_lines}
---
## Out of Scope
{out_lines}
"""
    path.write_text(replacement + "\n", encoding="utf-8")
CONFIGS: dict[str, dict] = {
"audit-core": {
"stack": """## Stack
- **Language:** Python 3.11+
- **Key deps:** stdlib + pytest (see `pyproject.toml`)
## Dev Commands
```bash
# Install (editable)
pip install -e ".[dev]" # or: python3 -m pip install pytest
# Run tests
make test
python3 -m pytest -q
# Mock audit backend smoke / cleanup
make mock-audit-smoke
make mock-audit-cleanup
python3 -m audit_core emit --help
```
""",
"wp0002": ("0002", "pluggable-audit-backend", "Pluggable audit backend contract",
"Define the replaceable audit backend interface beyond the mock JSONL writer and document retention guarantees.",
"Author backend interface contract",
"Document `AuditBackend` protocol, event schema, retention policy, and migration path from the mock file backend in `docs/` or module docstrings."),
"close_notes": ("INTENT.md and SCOPE.md reviewed; AGENTS.md and brief confirmed.", "Documented Python/pytest workflow in stack-and-commands.md.", "Created AUDIT-WP-0002."),
},
"binect-js": {
"stack": """## Stack
- **Language:** TypeScript (ESM)
- **Key deps:** Vitest; publishes `@binect/js`
## Dev Commands
```bash
npm install
npm run build
npm test
npm run test:e2e
npm run typecheck
npm run clean
```
""",
"wp0002": ("0002", "sdk-publication-readiness", "SDK publication and consumer validation",
"Prepare `@binect/js` for npm publication and validate the Explorer against live API flows.",
"Publication readiness checklist",
"Verify build output, types, e2e tests, and document npm publish + consumer install path in README."),
"close_notes": ("Integration files reviewed.", "Documented npm/vitest workflow.", "Created BINECT-WP-0002."),
},
"binect-chrome": {
"stack": None,
"wp0002": ("0002", "release-smoke-path", "Extension release smoke path",
"Establish repeatable build, load-unpacked, and PDF-send smoke verification before store submission.",
"Release smoke checklist",
"Document and automate smoke steps: build, load `dist/`, trigger PDF detection, verify Binect upload metadata-only path."),
"close_notes": ("SCOPE.md and INTENT.md reviewed; AGENTS.md regenerated.", "Stack commands already documented.", "Created BINECT-CHROME-WP-0002."),
},
"tele-mcp": {
"stack": """## Stack
- **Language:** Python (FastAPI MCP bridge) + Ansible/Helm/K8s deploy assets
- **Key deps:** FastAPI, uvicorn, httpx; kube-prometheus-stack, Loki, Promtail
## Dev Commands
```bash
# Deploy observability stack (from repo root)
cd ansible && ansible-playbook -i inventories/local.ini playbook.yml
# MCP bridge (local)
cd mcp-telemetry-bridge
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8080
# Smoke (requires cluster access)
kubectl get pods -n monitoring
kubectl port-forward -n mcp svc/mcp-telemetry-bridge 8080:80
curl http://localhost:8080/healthz
curl http://localhost:8080/mcp/schema | jq .
```
""",
"wp0002": ("0002", "mcp-bridge-local-verification", "MCP bridge local verification loop",
"Harden the local dev/test loop for `mcp-telemetry-bridge` independent of full cluster deploy.",
"Local verification harness",
"Add documented local run path, health/schema smoke tests, and agent-oriented quickstart in README."),
"close_notes": ("INTENT.md, SCOPE.md, AGENTS.md reviewed.", "Documented Ansible/kubectl/uvicorn workflow.", "Created TELE-WP-0002."),
},
"coordination-engine": {
"stack": REGISTRY_STACK,
"scope": ("Framework for goal-driven digital coordination as communication.",
"coordination-engine captures coordination models, specs, and registry entries for how actors align on shared goals.",
["Coordination specs, history, and registry indexes", "State Hub workplans and agent instructions", "Capability registration when coordination patterns stabilize"],
["Runtime orchestration engine implementation (future repos)", "Replacing issue trackers or chat systems"]),
"wp0002": ("0002", "coordination-model-spec", "Coordination model specification baseline",
"Draft the first coordination ontology and message lifecycle spec for the engine.",
"Author coordination spec v0.1",
"Write `spec/coordination-model-v0.1.md` covering actors, goals, commitments, and observation loops."),
"close_notes": ("Filled SCOPE.md from INTENT.", "Registry-oriented dev workflow documented.", "Created COORDINATION-WP-0002."),
},
"domain-tree": {
"stack": REGISTRY_STACK,
"scope": ("Domain capability tree for navigating and classifying repository domains.",
"domain-tree maintains the hierarchical view of domains, topics, and capability relationships.",
["Domain tree registry indexes and documentation", "Alignment with State Hub domain/topic model", "Workplans for tree evolution"],
["Owning individual domain implementations", "Replacing State Hub domain administration"]),
"wp0002": ("0002", "domain-tree-index-foundation", "Domain tree index foundation",
"Populate the initial domain tree registry index linked to State Hub topics.",
"Seed domain tree index",
"Define `registry/indexes/domain-tree.yaml` structure and seed entries for active infotech domains."),
"close_notes": ("Filled SCOPE.md.", "Registry workflow documented.", "Created DOMAIN-WP-0002."),
},
"human-resources": {
"stack": REGISTRY_STACK,
"scope": ("Strategic human-resources planning and antifragile people-system design.",
"human-resources captures HR intent, workflows, and registry entries for people operations amplified by automation.",
["HR intent, policies-in-draft, and workplans", "Registry scaffold for HR capabilities", "State Hub coordination for HR initiatives"],
["Payroll/HRIS product implementation", "Legal employment contract generation"]),
"wp0002": ("0002", "hr-workflow-registry-scaffold", "HR workflow registry scaffold",
"Define the first HR workflow registry entries and assessment loop.",
"HR workflow registry draft",
"Add initial workflow entries to `registry/` and link to INTENT phases (assessment, automation roadmap)."),
"close_notes": ("Filled SCOPE.md from INTENT.", "Registry workflow documented.", "Created HUMAN-WP-0002."),
},
"open-reuse": {
"stack": REGISTRY_STACK,
"scope": ("Managed continuity for valuable open-source integrations.",
"open-reuse turns proven OSS integrations into structured, maintainable assets with clear boundaries and update loops.",
["Integration analysis docs, registry, and workplans", "Reuse modes and continuity policies", "State Hub progress and decisions"],
["Hosting forked upstream code long-term without policy", "Replacing package registries"]),
"wp0002": ("0002", "integration-asset-registry", "Integration asset registry foundation",
"Establish registry format for managed OSS integration assets.",
"Registry format v0.1",
"Define integration asset schema in `registry/` and document analyze→classify→refactor→maintain loop."),
"close_notes": ("Filled SCOPE.md from INTENT.md.", "Registry workflow documented.", "Created OPEN-WP-0002."),
},
"repo-seed": {
"stack": REGISTRY_STACK,
"scope": ("Git repository template to bootstrap coulomb projects.",
"repo-seed is the canonical template for new repos: agent instructions, registry scaffold, and onboarding conventions.",
["Template files for new repo bootstrap", "Documentation for statehub_register usage", "Registry capability entry for template capability"],
["Application runtime code", "Owning downstream project implementations"]),
"wp0002": ("0002", "template-validation-checklist", "Template consumer validation checklist",
"Validate repo-seed against statehub_register output and document consumer steps.",
"Template validation checklist",
"Author checklist for new repo bootstrap: register, agent files, first workplan, fix-consistency."),
"close_notes": ("Filled SCOPE.md; README is canonical intent.", "Template workflow documented.", "Created REPO-WP-0002."),
},
"vantage-point": {
"stack": REGISTRY_STACK,
"scope": ("Generic system for exploring dependency structures as network-based graph models (NBGM).",
"Vantage Point unifies entity/relationship inspection and reasoning across arbitrary domains.",
["NBGM specs, registry, and exploratory docs", "State Hub workplans for graph exploration features", "Alignment with repo-scoping and fabric graph models"],
["Production graph database hosting", "Replacing railiance-fabric ingestion"]),
"wp0002": ("0002", "nbgm-spec-baseline", "NBGM model specification baseline",
"Author the network-based graph model specification baseline.",
"NBGM spec v0.1",
"Write spec covering nodes, edges, attributes, provenance, and inspection operations."),
"close_notes": ("Filled SCOPE.md from INTENT.", "Registry workflow documented.", "Created VANTAGE-WP-0002."),
},
"tegwick-control": {
"stack": DOCS_STACK,
"wp0002": ("0002", "personal-project-intake", "Personal project intake scaffold",
"Establish the personal control-plane intake pattern for projects and commitments.",
"Intake scaffold",
"Define folder/layout for `agent-tasks/`, decision log, and first prioritized workstream."),
"close_notes": ("INTENT.md and SCOPE.md reviewed.", "Docs-oriented workflow documented.", "Created TEGWICK-WP-0002."),
},
"whynot-control": {
"stack": DOCS_STACK,
"wp0002": ("0002", "beta-signal-intake", "Beta signal intake pipeline",
"Structure how prototype signals, betas, and feedback enter the whynot organization.",
"Beta intake pipeline",
"Document intake stages from prototype → beta → signal → promotion decision; seed first agent-tasks/ entries."),
"close_notes": ("INTENT.md reviewed.", "Control-repo workflow documented.", "Created WHYNOT-WP-0002."),
},
"markitect-main": {
"stack": """## Stack
- **Language:** Python 3.12+ (monorepo) + JavaScript UI (testdrive-jsui)
- **Key deps:** uv/pip, pytest, npm; see `pyproject.toml`, `package.json`, `Makefile`
## Dev Commands
```bash
make setup
make test
make test-js
make test-all
make lint
make build
make help
```
""",
"wp0002": None,
"close_notes": ("SCOPE.md and INTRODUCTION.md reviewed; AGENTS.md confirmed.", "Documented make-based Python/JS workflow.", "MARKITECT-WP-0002 already exists (TestDrive npm publication)."),
},
"whynot-design": {
"stack": None,
"wp0002": None,
"close_notes": ("INTENT.md, SCOPE.md, AGENTS.md reviewed.", "Stack commands already complete.", "WHYNOT-WP-0002 already exists (designbook stack adapters)."),
},
}
def process_repo(repo_slug: str) -> None:
    repo_path = HOME / repo_slug
    wp1_files = list((repo_path / "workplans").glob("*-WP-0001-statehub-bootstrap.md"))
    if not wp1_files:
        raise SystemExit(f"No bootstrap workplan in {repo_slug}")
    wp1 = wp1_files[0]
    fm = wp1.read_text(encoding="utf-8").split("---")[1]
    domain = re.search(r"^domain:\s*(\S+)", fm, re.M).group(1)
    topic = re.search(r"^topic_slug:\s*(\S+)", fm, re.M).group(1)
    prefix = re.search(r"^id:\s*([A-Z0-9-]+)-0001", fm, re.M).group(1)
    cfg = CONFIGS[repo_slug]
    if cfg.get("scope"):
        oneliner, core, ins, outs = cfg["scope"]
        fill_scope_template(repo_path / "SCOPE.md", oneliner=oneliner, core=core, in_scope=ins, out_scope=outs)
    stack_path = repo_path / ".claude" / "rules" / "stack-and-commands.md"
    stack = cfg.get("stack")
    if stack:
        content = stack.format(repo_slug=repo_slug) if "{repo_slug}" in stack else stack
        stack_path.write_text(content.strip() + "\n", encoding="utf-8")
    wp2 = cfg.get("wp0002")
    if wp2:
        num, slug, title, summary, task_title, task_body = wp2
        wp2_path = repo_path / "workplans" / f"{prefix}-{num}-{slug}.md"
        if not wp2_path.exists():
            wp2_path.write_text(
                wp0002_template(
                    prefix=prefix,
                    num=num,
                    slug=slug,
                    title=title,
                    domain=domain,
                    repo=repo_slug,
                    topic_slug=topic,
                    summary=summary,
                    task_title=task_title,
                    task_body=task_body,
                ),
                encoding="utf-8",
            )
    n1, n2, n3 = cfg["close_notes"]
    close_wp0001(wp1, t01_note=n1, t02_note=n2, t03_note=n3)
    print(f"OK {repo_slug}")


def main(argv: list[str]) -> int:
    repos = argv[1:] or list(CONFIGS)
    for repo in repos:
        process_repo(repo)
    return 0


if __name__ == "__main__":
    raise SystemExit(main(sys.argv))
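A minimal sketch of the frontmatter lookup that `process_repo` relies on, assuming the bootstrap workplan opens with a `---`-delimited block that carries `id`, `domain`, and `topic_slug`. The helper name `read_bootstrap_fields` and the sample values are hypothetical. Unlike the script, this version raises a readable error when a field is missing instead of failing on `None.group`.

```python
import re


def read_bootstrap_fields(text: str) -> dict[str, str]:
    """Extract domain, topic slug and id prefix from workplan frontmatter."""
    parts = text.split("---")
    if len(parts) < 3:
        raise ValueError("no frontmatter block found")
    fm = parts[1]
    # Same patterns as the script; the names on the left are illustrative.
    patterns = {
        "domain": r"^domain:\s*(\S+)",
        "topic": r"^topic_slug:\s*(\S+)",
        "prefix": r"^id:\s*([A-Z0-9-]+)-0001",
    }
    fields: dict[str, str] = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, fm, re.M)
        if match is None:
            raise ValueError(f"frontmatter is missing {name}")
        fields[name] = match.group(1)
    return fields


# Hypothetical sample file content.
sample = "---\nid: STATE-WP-0001\ndomain: infotech\ntopic_slug: state-hub\n---\n\nBody\n"
print(read_bootstrap_fields(sample))
```

Splitting on `---` is simple but would misparse a frontmatter value that itself contains three hyphens, so a stricter parser may be preferable for untrusted files.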

@@ -16,7 +16,7 @@ Checks:
C-09 workstream-repo-mismatch FAIL Yes DB workstream repo_id != file location C-09 workstream-repo-mismatch FAIL Yes DB workstream repo_id != file location
C-10 task-status-drift WARN Yes Task status differs between file and DB C-10 task-status-drift WARN Yes Task status differs between file and DB
C-11 task-unlinked WARN Yes Task block has no state_hub_task_id C-11 task-unlinked WARN Yes Task block has no state_hub_task_id
C-12 orphan-db-task WARN No DB task in workstream has no file backing C-12 orphan-db-task WARN Yes DB task in workstream has no file backing unless terminal in a closed workstream
C-13 workstream-auto-complete WARN Yes All DB tasks done but workstream still active C-13 workstream-auto-complete WARN Yes All DB tasks done but workstream still active
C-14 ghost-duplicate WARN No Active topic workstream with no repo_id matches a file-backed title — probable ghost from premature create_workstream() call C-14 ghost-duplicate WARN No Active topic workstream with no repo_id matches a file-backed title — probable ghost from premature create_workstream() call
C-15 task-db-ahead WARN Yes DB task status is ahead of file — regression prevented; writeback syncs file C-15 task-db-ahead WARN Yes DB task status is ahead of file — regression prevented; writeback syncs file
@@ -570,9 +570,23 @@ def _api_patch(api_base: str, path: str, body: dict) -> Any:
return {"_error": str(exc)} return {"_error": str(exc)}
def _api_put(api_base: str, path: str, body: dict) -> Any:
if not _HAS_HTTPX:
return {"_error": "httpx is not installed"}
if not path.endswith("/"):
path += "/"
try:
with _httpx.Client(base_url=api_base, timeout=30.0, follow_redirects=True) as c:
r = c.put(path, json=body)
r.raise_for_status()
return r.json()
except Exception as exc:
return {"_error": str(exc)}
def _api_post(api_base: str, path: str, body: dict) -> Any: def _api_post(api_base: str, path: str, body: dict) -> Any:
if not _HAS_HTTPX: if not _HAS_HTTPX:
return None return {"_error": "httpx is not installed"}
if not path.endswith("/"): if not path.endswith("/"):
path += "/" path += "/"
try: try:
@@ -580,8 +594,13 @@ def _api_post(api_base: str, path: str, body: dict) -> Any:
r = c.post(path, json=body) r = c.post(path, json=body)
r.raise_for_status() r.raise_for_status()
return r.json() return r.json()
except Exception: except _httpx.HTTPStatusError as exc:
return None detail = exc.response.text
if len(detail) > 500:
detail = detail[:497] + "..."
return {"_error": f"{exc.response.status_code} {exc.response.reason_phrase}: {detail}"}
except Exception as exc:
return {"_error": str(exc)}
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
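The `_error` convention introduced in this hunk can be sketched without `httpx`. This is an illustration only: `format_http_error` and `is_error` are hypothetical names, and the 500-character cap simply mirrors the `[:497] + "..."` truncation in the diff.

```python
def format_http_error(status_code: int, reason: str, body: str, limit: int = 500) -> dict[str, str]:
    """Build an `_error` payload, capping the response body at `limit` characters."""
    detail = body
    if len(detail) > limit:
        # Reserve three characters for the ellipsis so the total stays at `limit`.
        detail = detail[: limit - 3] + "..."
    return {"_error": f"{status_code} {reason}: {detail}"}


def is_error(result: object) -> bool:
    """Check a helper result the way the callers in this diff do."""
    return isinstance(result, dict) and "_error" in result


result = format_http_error(409, "Conflict", "x" * 600)
print(is_error(result), len(result["_error"]))
```

Returning a dict instead of `None` lets callers report why a write failed, which is what the later C-06 hunk does with `last_error`.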
@@ -836,6 +855,7 @@ def check_repo(api_base: str, repo_slug: str, repo_path_override: str | None = N
"repo_id": repo_id, "repo_id": repo_id,
"domain": file_domain, "domain": file_domain,
"repo_market_domain": repo_market_domain, "repo_market_domain": repo_market_domain,
"repo_slug": repo_slug,
}, },
) )
continue continue
@@ -1019,11 +1039,13 @@ def check_repo(api_base: str, repo_slug: str, repo_path_override: str | None = N
existing_dep_keys = set() existing_dep_keys = set()
if isinstance(existing_deps, list): if isinstance(existing_deps, list):
for dep in existing_deps: for dep in existing_deps:
if dep.get("from_workstream_id") != ws_id: from_id = dep.get("from_workstream_id") or dep.get("from_workplan_id")
if from_id != ws_id:
continue continue
rel = dep.get("relationship_type") or "blocks" rel = dep.get("relationship_type") or "blocks"
if dep.get("to_workstream_id"): to_workplan_id = dep.get("to_workstream_id") or dep.get("to_workplan_id")
existing_dep_keys.add(("workstream", dep["to_workstream_id"], rel)) if to_workplan_id:
existing_dep_keys.add(("workstream", to_workplan_id, rel))
if dep.get("to_task_id"): if dep.get("to_task_id"):
existing_dep_keys.add(("task", dep["to_task_id"], rel)) existing_dep_keys.add(("task", dep["to_task_id"], rel))
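As a standalone sketch, the dual-key lookup in this hunk could be expressed as below. The function name `existing_dependency_keys` and the sample records are hypothetical; the field names come from the diff, and the legacy `workstream` key is read first with the `workplan` key as fallback.

```python
def existing_dependency_keys(deps: list[dict], ws_id: str) -> set[tuple[str, str, str]]:
    """Collect (kind, target_id, relationship) keys for dependencies owned by ws_id."""
    keys: set[tuple[str, str, str]] = set()
    for dep in deps:
        # Accept either the legacy or the renamed source field.
        from_id = dep.get("from_workstream_id") or dep.get("from_workplan_id")
        if from_id != ws_id:
            continue
        rel = dep.get("relationship_type") or "blocks"
        to_workplan_id = dep.get("to_workstream_id") or dep.get("to_workplan_id")
        if to_workplan_id:
            # The key kind stays "workstream" so old and new records compare equal.
            keys.add(("workstream", to_workplan_id, rel))
        if dep.get("to_task_id"):
            keys.add(("task", dep["to_task_id"], rel))
    return keys


deps = [
    {"from_workstream_id": "ws-1", "to_workstream_id": "ws-2"},
    {"from_workplan_id": "ws-1", "to_workplan_id": "ws-2", "relationship_type": "blocks"},
    {"from_workplan_id": "ws-9", "to_task_id": "t-1"},
]
print(existing_dependency_keys(deps, "ws-1"))
```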
@@ -1196,9 +1218,14 @@ def check_repo(api_base: str, repo_slug: str, repo_path_override: str | None = N
ws_finished = normalise_workstream_status(ws_status) in CLOSED_WORKSTREAM_STATUSES ws_finished = normalise_workstream_status(ws_status) in CLOSED_WORKSTREAM_STATUSES
for db_t in db_tasks: for db_t in db_tasks:
if db_t["id"] not in file_task_sh_ids: if db_t["id"] not in file_task_sh_ids:
db_t_status = db_t.get("status", "") db_t_status = normalise_task_status(db_t.get("status", "todo"))
open_task = db_t_status not in TERMINAL_TASK_STATUSES open_task = db_t_status not in TERMINAL_TASK_STATUSES
# Auto-cancel fixable when workstream is finished and task is open # Closed workstreams can legitimately retain terminal historical
# DB tasks from earlier duplicates. The public task DELETE route
# is a cancel operation, so these are not further actionable.
if ws_finished and not open_task:
continue
# Auto-cancel fixable when workstream is finished and task is open.
fixable = ws_finished and open_task fixable = ws_finished and open_task
report.add( report.add(
severity="WARN", check_id="C-12", severity="WARN", check_id="C-12",
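The revised C-12 decision can be summarised as a small pure function. This is a sketch under assumptions: `classify_orphan` is a hypothetical name, and the contents of `TERMINAL_TASK_STATUSES` are assumed to be `done` and `cancel`, which the diff does not show.

```python
# Assumption: the real constant is defined elsewhere in the module.
TERMINAL_TASK_STATUSES = {"done", "cancel"}


def classify_orphan(ws_finished: bool, task_status: str) -> str:
    """Decide what C-12 does with a DB task that has no file backing."""
    open_task = task_status not in TERMINAL_TASK_STATUSES
    if ws_finished and not open_task:
        # Terminal history in a closed workplan: nothing left to act on.
        return "skip"
    if ws_finished and open_task:
        # Open task in a closed workplan: reported as fixable (auto-cancel).
        return "auto-cancel"
    # Workplan still open: reported as WARN, not auto-fixable.
    return "warn"


for finished, status in [(True, "done"), (True, "todo"), (False, "todo")]:
    print(finished, status, classify_orphan(finished, status))
```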
@@ -1250,9 +1277,46 @@ def check_repo(api_base: str, repo_slug: str, repo_path_override: str | None = N
# workstream from the file, leaving the first as an invisible orphan. # workstream from the file, leaving the first as an invisible orphan.
_check_ghost_duplicates(api_base, workplan_infos, file_ws_ids, report) _check_ghost_duplicates(api_base, workplan_infos, file_ws_ids, report)
_sync_workplan_bindings(api_base, repo_slug, workplan_infos, repo_dir, report)
return report return report
def _sync_workplan_bindings(
api_base: str,
repo_slug: str,
workplan_infos: list[tuple[Path, dict, str]],
repo_dir: Path,
report: ConsistencyReport,
) -> None:
bindings: list[dict[str, Any]] = []
for wp_file, meta, _ in workplan_infos:
ws_id = str(meta.get("state_hub_workstream_id", "")).strip().strip('"')
if not ws_id:
continue
archived = wp_file.parent.name == "archived"
file_status = normalise_workstream_status(str(meta.get("status", "")).strip())
bindings.append(
{
"workplan_id": ws_id,
"filename": wp_file.name,
"relative_path": workplan_display_path(repo_dir, wp_file),
"repo_slug": repo_slug,
"archived": archived,
"status": file_status or None,
}
)
if not bindings:
return
result = _api_put(api_base, "/workplans/index/bindings", {"bindings": bindings})
if isinstance(result, dict) and "_error" in result:
report.fixes_applied.append(f"bindings WARN: {result['_error']}")
elif isinstance(result, dict):
report.fixes_applied.append(
f"bindings: synced {result.get('updated', 0)}/{result.get('received', len(bindings))}"
)
def _check_orphan_db( def _check_orphan_db(
api_base: str, api_base: str,
repo_id: str, repo_id: str,
@@ -1770,7 +1834,39 @@ def fix_repo(
) )
continue continue
slug = re.sub(r"[^a-z0-9-]", "-", wp_id.lower()).strip("-") base_slug = re.sub(r"[^a-z0-9-]", "-", wp_id.lower()).strip("-") or "workplan"
repo_slug_part = re.sub(
r"[^a-z0-9-]", "-", str(ctx.get("repo_slug") or "").lower()
).strip("-")
slug_candidates = [base_slug]
repo_qualified_slug = base_slug
if repo_slug_part and not base_slug.startswith(f"{repo_slug_part}-"):
repo_qualified_slug = f"{repo_slug_part}-{base_slug}"
slug_candidates.append(repo_qualified_slug)
for suffix in range(2, 21):
slug_candidates.append(f"{repo_qualified_slug}-{suffix}")
ws_data = None
last_error = None
for slug in slug_candidates:
existing = _api_get(api_base, "/workstreams", {"slug": slug}, return_error=True)
if isinstance(existing, dict) and "_error" in existing:
last_error = existing["_error"]
continue
if isinstance(existing, list) and existing:
existing_same_repo = next(
(w for w in existing if w.get("repo_id") == repo_id_val),
None,
)
if existing_same_repo and existing_same_repo.get("title") == (title or wp_id):
ws_data = existing_same_repo
report.fixes_applied.append(
f"C-06 reusing existing workstream {ws_data['id'][:8]}... for {wp_id}"
)
break
last_error = f"slug {slug!r} already belongs to another workstream"
continue
ws_data = _api_post(api_base, "/workstreams", { ws_data = _api_post(api_base, "/workstreams", {
"topic_id": topic_id, "topic_id": topic_id,
"repo_id": repo_id_val, "repo_id": repo_id_val,
@@ -1781,9 +1877,15 @@ def fix_repo(
"planning_priority": str(meta.get("planning_priority", "")).strip() or None, "planning_priority": str(meta.get("planning_priority", "")).strip() or None,
"planning_order": _as_int_or_none(meta.get("planning_order")), "planning_order": _as_int_or_none(meta.get("planning_order")),
}) })
if ws_data is None or (isinstance(ws_data, dict) and "_error" in ws_data):
last_error = ws_data.get("_error") if isinstance(ws_data, dict) else "no response"
ws_data = None
continue
break
if ws_data is None: if ws_data is None:
report.fixes_applied.append( report.fixes_applied.append(
f"C-06 FAIL {wp_id}: could not create workstream in DB" f"C-06 FAIL {wp_id}: could not create workstream in DB: {last_error or 'no usable slug'}"
) )
continue continue
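The slug fallback order in this hunk (bare slug, repo-qualified slug, then numbered suffixes) can be sketched in isolation. The function name `slug_candidates` and its `max_suffix` parameter are hypothetical; the regular expression and the suffix range 2 to 20 follow the diff.

```python
import re


def slug_candidates(wp_id: str, repo_slug: str = "", max_suffix: int = 20) -> list[str]:
    """List slugs to try in order: bare, repo-qualified, then numbered variants."""
    base = re.sub(r"[^a-z0-9-]", "-", wp_id.lower()).strip("-") or "workplan"
    repo_part = re.sub(r"[^a-z0-9-]", "-", repo_slug.lower()).strip("-")
    candidates = [base]
    qualified = base
    # Only prepend the repo slug when the base slug does not already carry it.
    if repo_part and not base.startswith(f"{repo_part}-"):
        qualified = f"{repo_part}-{base}"
        candidates.append(qualified)
    candidates.extend(f"{qualified}-{n}" for n in range(2, max_suffix + 1))
    return candidates


print(slug_candidates("STATE-WP-0001", "state-hub")[:3])
```

Bounding the list keeps the retry loop finite; when every candidate is taken, the diff reports `no usable slug` rather than looping further.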
@@ -1814,7 +1916,7 @@ def fix_repo(
"priority": t_priority, "priority": t_priority,
"assignee": task.get("assignee") or None, "assignee": task.get("assignee") or None,
}) })
if t_data: if t_data and "_error" not in t_data:
t_db_id = t_data["id"] t_db_id = t_data["id"]
injected = _inject_task_id_into_block( injected = _inject_task_id_into_block(
wp_file, "state_hub_task_id", t_db_id, t_id wp_file, "state_hub_task_id", t_db_id, t_id
@@ -1822,6 +1924,10 @@ def fix_repo(
if not injected: if not injected:
_inject_task_id_frontmatter_list(wp_file, t_db_id, t_id) _inject_task_id_frontmatter_list(wp_file, t_db_id, t_id)
report.fixes_applied.append(f" + task {t_id}{t_db_id[:8]}") report.fixes_applied.append(f" + task {t_id}{t_db_id[:8]}")
elif t_data:
report.fixes_applied.append(
f" ! task {t_id} not created: {t_data.get('_error', t_data)}"
)
elif issue.check_id == "C-09": elif issue.check_id == "C-09":
ws_id = ctx["ws_id"] ws_id = ctx["ws_id"]

@@ -7,8 +7,9 @@
# ./install_hooks.sh --repo <slug> --remove # remove hook from one repo # ./install_hooks.sh --repo <slug> --remove # remove hook from one repo
# ./install_hooks.sh --all --remove # remove hook from all repos # ./install_hooks.sh --all --remove # remove hook from all repos
# #
# The hook runs `make fix-consistency REPO=<slug>` in the state-hub after each # The hook runs `statehub fix-consistency --repo <slug>` after each commit,
# commit, keeping the hub in sync with workplan file changes automatically. # keeping the hub in sync with workplan file changes automatically. It falls
# back to the state-hub Make target when the CLI is not installed.
# #
# Idempotent: the hook block is guarded by a marker comment. Running twice is safe. # Idempotent: the hook block is guarded by a marker comment. Running twice is safe.
@@ -79,8 +80,12 @@ install_hook() {
hook_block=$(cat <<BLOCK hook_block=$(cat <<BLOCK
${MARKER} — managed by custodian, do not edit this block ${MARKER} — managed by custodian, do not edit this block
if curl -sf ${API_BASE}/state/health >/dev/null 2>&1; then if curl -sf ${API_BASE}/state/health >/dev/null 2>&1; then
if command -v statehub >/dev/null 2>&1; then
(cd "${repo_path}" && statehub fix-consistency --repo ${slug} >/dev/null 2>&1 &)
else
(cd "${STATEHUB_DIR}" && make fix-consistency REPO=${slug} >/dev/null 2>&1 &) (cd "${STATEHUB_DIR}" && make fix-consistency REPO=${slug} >/dev/null 2>&1 &)
fi fi
fi
${MARKER}-end ${MARKER}-end
BLOCK BLOCK
) )

@@ -20,6 +20,12 @@ there is no MCP server for Codex agents.
|---------|-----| |---------|-----|
| Local workstation | `http://127.0.0.1:8000` | | Local workstation | `http://127.0.0.1:8000` |
| Remote via tunnel | `http://127.0.0.1:18000` | | Remote via tunnel | `http://127.0.0.1:18000` |
| Optional local edge relay | `http://127.0.0.1:18080` |
When an operator has enabled the edge relay, set `API_BASE` to the relay URL.
Queueable writes return an explicit queued receipt if the central hub is
unreachable. Treat that as pending local evidence, then ask the operator to run
`statehub outbox status/replay` after connectivity returns.
### Orient at session start ### Orient at session start
@@ -27,8 +33,8 @@ there is no MCP server for Codex agents.
# Offline brief — works without hub connection # Offline brief — works without hub connection
cat .custodian-brief.md cat .custodian-brief.md
# Active workstreams for this domain # Active workplans for this domain
curl -s "http://127.0.0.1:8000/workstreams/?topic_id={TOPIC_ID}&status=active" \ curl -s "http://127.0.0.1:8000/workplans/?topic_id={TOPIC_ID}&status=active" \
| python3 -m json.tool | python3 -m json.tool
# Check inbox # Check inbox
@@ -51,12 +57,12 @@ curl -s -X POST http://127.0.0.1:8000/progress/ \
"summary": "what was done", "summary": "what was done",
"event_type": "note", "event_type": "note",
"author": "codex", "author": "codex",
"workstream_id": "<uuid>", "workplan_id": "<uuid>",
"task_id": "<uuid>" "task_id": "<uuid>"
}' }'
``` ```
Omit `workstream_id` / `task_id` when not applicable. Omit `workplan_id` / `task_id` when not applicable.
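For agents that prefer Python over `curl`, the same call might look like the sketch below. It assumes the `/progress/` endpoint accepts the JSON fields shown in the example above; `build_progress_payload` and `post_progress` are hypothetical helper names, and only the standard library is used.

```python
import json
import urllib.request


def build_progress_payload(summary, *, event_type="note", author="codex",
                           workplan_id=None, task_id=None):
    """Build the request body, leaving out ids that do not apply."""
    payload = {"summary": summary, "event_type": event_type, "author": author}
    if workplan_id:
        payload["workplan_id"] = workplan_id
    if task_id:
        payload["task_id"] = task_id
    return payload


def post_progress(api_base, payload, timeout=10.0):
    """POST the payload to the hub; raises urllib.error.URLError on failure."""
    request = urllib.request.Request(
        f"{api_base.rstrip('/')}/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the payload needs no hub connection.
print(build_progress_payload("what was done"))
```

Calling `post_progress("http://127.0.0.1:8000", payload)` sends the event; pick the base URL from the table above.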
### Update task status ### Update task status
@@ -80,7 +86,7 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
## Session Protocol ## Session Protocol
**Start:** **Start:**
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe) 1. `cat .custodian-brief.md` — domain goal and open workplans (offline-safe)
2. Check inbox: `GET /messages/?to_agent={REPO_SLUG}&unread_only=true`; mark read 2. Check inbox: `GET /messages/?to_agent={REPO_SLUG}&unread_only=true`; mark read
3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks 3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
4. Check human-needed tasks: `GET /tasks/?needs_human=true` 4. Check human-needed tasks: `GET /tasks/?needs_human=true`
@@ -92,12 +98,12 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
**Close:** **Close:**
1. Update workplan file task statuses to reflect progress 1. Update workplan file task statuses to reflect progress
2. Log: `POST /progress/` with a summary of what changed 2. Log: `POST /progress/` with a summary of what changed
3. Note for the custodian operator: after workplan file changes, run from 3. After workplan file changes, run:
`~/state-hub`:
```bash ```bash
make fix-consistency REPO={REPO_SLUG} statehub fix-consistency
``` ```
This syncs task status from files into the hub DB. Coding agents should run this directly; ask the operator only if the CLI or
State Hub API is unavailable. This syncs task status from files into the hub DB.
--- ---
@@ -139,7 +145,7 @@ owner: codex
topic_slug: ... topic_slug: ...
created: "YYYY-MM-DD" created: "YYYY-MM-DD"
updated: "YYYY-MM-DD" updated: "YYYY-MM-DD"
state_hub_workstream_id: "<uuid>" # written by fix-consistency — do not edit state_hub_workstream_id: "<uuid>" # written by fix-consistency — do not edit (legacy name; holds the workplan id)
--- ---
``` ```
@@ -166,5 +172,5 @@ Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blo
To create a new workplan: To create a new workplan:
1. Write the file following the format above 1. Write the file following the format above
2. Notify the custodian operator to run `make fix-consistency REPO={REPO_SLUG}` 2. Run `statehub fix-consistency` locally; ask the operator only if the CLI or
(or send a message to the hub agent via `POST /messages/`) State Hub API is unavailable.

@@ -20,7 +20,7 @@ Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run wa
| Agent runtime | How to orient | | Agent runtime | How to orient |
| --- | --- | | --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent={REPO_SLUG}` is for coordination, not secret vending | | **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent={REPO_SLUG}` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership | | **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workplans; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` | | **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table ### Quick routing table

@@ -1,6 +1,6 @@
## First Session Protocol ## First Session Protocol
Triggered when `get_domain_summary("{DOMAIN}")` shows **no workstreams**. Triggered when `get_domain_summary("{DOMAIN}")` shows **no workplans**.
The project is registered but work has not yet been structured. The project is registered but work has not yet been structured.
**Step 1 — Read, don't write** **Step 1 — Read, don't write**
@@ -11,27 +11,31 @@ The project is registered but work has not yet been structured.
**Step 2 — Survey in-progress work** **Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete. Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.
**Step 3 — Propose workstreams to Bernd** **Step 3 — Propose workplans to Bernd**
Propose 1-3 workstreams — each a coherent strand, weeks to months, anchored to a Propose 1-3 workplans — each a coherent strand, weeks to months, anchored to a
roadmap phase. **Wait for approval before creating.** roadmap phase. **Wait for approval before creating.**
**Step 4 — Create workplan file first, then DB record (ADR-001)** **Step 4 — Write the workplan file; fix-consistency registers it (ADR-001)**
``` ```
workplans/{WP_PREFIX}-NNNN-<slug>.md ← write this first workplans/{WP_PREFIX}-NNNN-<slug>.md ← write this, commit it
``` ```
Then register in the hub: Then register by running the consistency check — do **not** call
``` `create_workplan`/`create_task` (or legacy `create_workstream`) yourself;
create_workstream(topic_id="{TOPIC_ID}", title="...", owner="...", description="...") manual registration duplicates what C-06 creates from the file:
create_task(workstream_id="<id>", title="...", priority="high|medium|low") ```bash
statehub fix-consistency --repo {REPO_SLUG}
``` ```
C-06 creates the hub workplan + tasks and writes `state_hub_workstream_id` /
`state_hub_task_id` back into the file (legacy field names, kept for
compatibility — they hold workplan/task IDs).
**Step 5 — Record the setup** **Step 5 — Record the setup**
``` ```
add_progress_event( add_progress_event(
summary="First session: structured {DOMAIN} into N workstreams, M tasks", summary="First session: structured {DOMAIN} into N workplans, M tasks",
event_type="milestone", event_type="milestone",
topic_id="{TOPIC_ID}", topic_id="{TOPIC_ID}",
detail={"workstreams": [...], "tasks_created": M} detail={"workplans": [...], "tasks_created": M}
) )
``` ```

@@ -44,7 +44,7 @@ For each file with `status: ready`, `active`, or `blocked`, note pending
**Step 4 — Present brief** **Step 4 — Present brief**
1. **Active workstreams** for `{DOMAIN}` — title, task counts, blocking decisions 1. **Active workplans** for `{DOMAIN}` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:{REPO_SLUG}]` hub tasks 2. **Pending tasks** from `workplans/` + any `[repo:{REPO_SLUG}]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary: 3. **Goal guidance** — if `goal_guidance` in summary:
- `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"* - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
@@ -52,33 +52,42 @@ For each file with `status: ready`, `active`, or `blocked`, note pending
4. **Suggested next action** — highest-priority open item 4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo 5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
If no workstreams: follow First Session Protocol (`first-session.md`). If no workplans: follow First Session Protocol (`first-session.md`).
**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()` **During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`) > State Hub is a *read model*. **Never register workplans or tasks by hand**
> are First Session Protocol only. Work structure belongs in repo files (ADR-001). > (`create_workplan`, `create_task`, or the legacy `create_workstream`) — write
> the workplan file in `workplans/` and run `fix-consistency`; its C-06 check
> registers the workplan and its tasks in the hub and writes the IDs back into
> the file. Manual registration creates duplicates the moment fix-consistency
> runs. Work structure belongs in repo files (ADR-001).
>
> Terminology: "workstream" is the legacy name for workplan. Some API/frontmatter
> field names keep it for compatibility (`state_hub_workstream_id`,
> `workstream_id` params) — treat them as workplan IDs.
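One way a client could honour the legacy alias is sketched here. The helper name `resolve_workplan_id` is hypothetical, and letting `workplan_id` win when both are supplied is an assumption based on it being described as the preferred name; the hub's own handling may differ.

```python
def resolve_workplan_id(workplan_id=None, workstream_id=None):
    """Return the workplan id, accepting the legacy `workstream_id` as a fallback."""
    # Assumption: the preferred name wins when both are given.
    return workplan_id or workstream_id


print(resolve_workplan_id(workstream_id="1234"))
print(resolve_workplan_id(workplan_id="abcd", workstream_id="1234"))
```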
**Session close:** **Session close:**
With MCP tools: With MCP tools:
``` ```
add_progress_event(summary="...", topic_id="{TOPIC_ID}", workstream_id="<uuid>") add_progress_event(summary="...", topic_id="{TOPIC_ID}", workplan_id="<uuid>")
``` ```
Without MCP tools: Without MCP tools:
```bash ```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \ curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \ -H "Content-Type: application/json" \
-d '{"topic_id":"{TOPIC_ID}","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}' -d '{"topic_id":"{TOPIC_ID}","workplan_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
``` ```
If workplan files were modified, ensure the local copy is up to date first: If workplan files were modified, ensure the local copy is up to date first,
then sync from the repo checkout:
```bash ```bash
git -C <repo_path> pull --ff-only git pull --ff-only
cd ~/state-hub && make fix-consistency REPO={REPO_SLUG} statehub fix-consistency
``` ```
For repos where implementation runs on a remote machine (e.g. CoulombCore), For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing: use the pull-before-fix mode from any shell with the State Hub CLI:
```bash ```bash
cd ~/state-hub && make fix-consistency-remote REPO={REPO_SLUG} statehub fix-consistency --repo {REPO_SLUG} --remote
``` ```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback **C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match DB. **C-16** (repo behind remote) blocks all writes will sync the file to match DB. **C-16** (repo behind remote) blocks all writes

@@ -5,7 +5,7 @@ ID prefix: `{WP_PREFIX}-`
Work items originate as files in this repo **before** being registered in the hub. Work items originate as files in this repo **before** being registered in the hub.
Canonical workplan/workstream frontmatter statuses are: Canonical workplan frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`. `proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and repo state, and `finished` when implementation is complete. `stalled` and
@@ -16,14 +16,15 @@ prefix: `YYMMDD-{WP_PREFIX}-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference. unchanged; the prefix is only for quick visual reference.
Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**: Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids `workplans/ADHOC-YYYY-MM-DD.md`, workplan slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed `ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan. multiple planned phases into a normal workplan.
Ecosystem todos from other agents arrive as `[repo:{REPO_SLUG}]` hub tasks — Ecosystem todos from other agents arrive as `[repo:{REPO_SLUG}]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering visible at session start. Pick one up by creating the workplan file, committing,
the workstream. and running `statehub fix-consistency` — C-06 registers the workplan in the hub.
Never register by hand with `create_workplan`/`create_workstream`.
Task blocks use this shape: Task blocks use this shape:
@@ -37,4 +38,8 @@ state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
Status progression is `todo` → `progress` → `done`; use `wait` for waiting or Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work. blocked work and `cancel` for stopped work.
Workplan frontmatter carries `state_hub_workstream_id` — a legacy field name
kept for compatibility ("workstream" is the old term for workplan); it holds
the hub workplan id and is written by fix-consistency. Do not edit or rename it.
<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here --> <!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->

@@ -27,6 +27,10 @@ def fetch(path: str):
EXTENSION_MARKER = "<!-- REPO-AGENTS-EXTENSIONS -->" EXTENSION_MARKER = "<!-- REPO-AGENTS-EXTENSIONS -->"
# Rule files that repos fill in with local content; only (re)write them while
# they still contain the template's TODO markers.
PRESERVE_IF_CUSTOMIZED = {"stack-and-commands", "repo-boundary", "architecture"}
def render(template: str, values: dict[str, str]) -> str: def render(template: str, values: dict[str, str]) -> str:
for key, value in values.items(): for key, value in values.items():
@@ -188,7 +192,13 @@ def update_repo(
rules_dir = path / ".claude" / "rules" rules_dir = path / ".claude" / "rules"
rules_dir.mkdir(parents=True, exist_ok=True) rules_dir.mkdir(parents=True, exist_ok=True)
for name, template in rule_templates.items(): for name, template in rule_templates.items():
(rules_dir / f"{name}.md").write_text(render(template, values), encoding="utf-8") target = rules_dir / f"{name}.md"
if name in PRESERVE_IF_CUSTOMIZED and target.exists():
# These files start as TODO templates and get filled per repo;
# never overwrite a filled-in version with the blank template.
if "TODO" not in target.read_text(encoding="utf-8"):
continue
target.write_text(render(template, values), encoding="utf-8")
return f"{repo_slug}\t{path}\t{prefix}" return f"{repo_slug}\t{path}\t{prefix}"
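The TODO-marker guard can be exercised on its own with the sketch below. `should_write` is a hypothetical helper that restates the condition from the diff; the temporary files stand in for `.claude/rules/*.md`.

```python
import tempfile
from pathlib import Path

PRESERVE_IF_CUSTOMIZED = {"stack-and-commands", "repo-boundary", "architecture"}


def should_write(name: str, target: Path) -> bool:
    """Return True when the template may be (re)written to `target`."""
    if name in PRESERVE_IF_CUSTOMIZED and target.exists():
        # A remaining TODO marker means the repo has not filled the file in yet.
        return "TODO" in target.read_text(encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    filled = Path(tmp) / "architecture.md"
    filled.write_text("## Architecture\nFastAPI service.\n", encoding="utf-8")
    blank = Path(tmp) / "repo-boundary.md"
    blank.write_text("## Boundary\nTODO: describe\n", encoding="utf-8")
    print(should_write("architecture", filled), should_write("repo-boundary", blank))
```

One consequence of this design is that a filled-in file that legitimately mentions the word TODO would still be overwritten, so a more specific marker may be worth considering.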

@@ -132,8 +132,8 @@ def run_register(args: argparse.Namespace) -> None:
print(f" Repo ID: {repo.get('id', '(existing)') if isinstance(repo, dict) else '(unknown)'}") print(f" Repo ID: {repo.get('id', '(existing)') if isinstance(repo, dict) else '(unknown)'}")
print() print()
print("Next:") print("Next:")
print(f" cd {STATE_HUB_DIR}") print(f" statehub fix-consistency --repo {repo_slug}")
print(f" make fix-consistency REPO={repo_slug}") print(" # or from the repo checkout: statehub fix-consistency")
def collect_repo_snapshot(project_path: Path) -> RepoSnapshot: def collect_repo_snapshot(project_path: Path) -> RepoSnapshot:
@@ -584,10 +584,11 @@ priority: medium
``` ```
Create the first implementation workplan for the repository's most important Create the first implementation workplan for the repository's most important
next change. After workplan file updates, run from `~/state-hub`: next change. After workplan file updates, run the sync locally from this repo
checkout:
```bash ```bash
make fix-consistency REPO={repo_slug} statehub fix-consistency
``` ```
""" """
) )

@@ -98,10 +98,17 @@ async def client(test_engine):
         async with factory() as session:
             yield session
+    from api.services import write_idempotency as _write_idempotency
+    old_session_factory = _write_idempotency.async_session_factory
+    _write_idempotency.async_session_factory = factory
     app.dependency_overrides[get_session] = _override
+    try:
-    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as ac:
-        yield ac
+        async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as ac:
+            yield ac
+    finally:
-    app.dependency_overrides.clear()
+        app.dependency_overrides.clear()
+        _write_idempotency.async_session_factory = old_session_factory
 # ---------------------------------------------------------------------------


@@ -1015,6 +1015,272 @@ class TestLifecycleRenormalization:
assert any("C-23 fixed" in fix for fix in report.fixes_applied) assert any("C-23 fixed" in fix for fix in report.fixes_applied)
class TestC12OrphanDbTasks:
def _make_repo(self, tmp_path: Path, status: str = "finished") -> Path:
repo = tmp_path / "repo"
workplans = repo / "workplans"
workplans.mkdir(parents=True)
(workplans / "STATE-WP-0001-demo.md").write_text(
"---\n"
"id: STATE-WP-0001\n"
"type: workplan\n"
"title: Demo\n"
"domain: infotech\n"
"repo: state-hub\n"
f"status: {status}\n"
"owner: codex\n"
"state_hub_workstream_id: \"ws-1\"\n"
"---\n\n"
"## Keep Task\n\n"
"```task\n"
"id: STATE-WP-0001-T01\n"
"status: done\n"
"priority: high\n"
"state_hub_task_id: \"task-linked\"\n"
"```\n",
encoding="utf-8",
)
return repo
def _api_get_for_repo(self, repo: Path, orphan_status: str):
ws = {
"id": "ws-1",
"repo_id": "repo-1",
"topic_id": "topic-1",
"slug": "state-wp-0001",
"title": "Demo",
"status": "finished",
"planning_priority": None,
"planning_order": None,
}
linked = {
"id": "task-linked",
"title": "Keep Task",
"status": "done",
"description": None,
}
orphan = {
"id": "task-orphan",
"title": "Legacy Duplicate",
"status": orphan_status,
"description": None,
}
def fake_get(_api_base, path, params=None, **_kwargs):
if path == "/repos/state-hub":
import socket
return {
"id": "repo-1",
"slug": "state-hub",
"domain_slug": "infotech",
"local_path": str(repo),
"host_paths": {socket.gethostname(): str(repo)},
}
if path == "/workstreams/ws-1":
return ws
if path == "/tasks/task-linked":
return linked
if path == "/tasks" and params == {"workstream_id": "ws-1"}:
return [linked, orphan]
if path == "/workstreams/ws-1/dependencies":
return []
if path == "/workstreams" and params == {"repo_id": "repo-1"}:
return [ws]
if path == "/workstreams" and params and params.get("topic_id") == "topic-1":
return []
return []
return fake_get
def _quiet_classification(self, monkeypatch):
monkeypatch.setattr("consistency_check.load_classification_file", lambda _repo_dir: ({}, [], []))
def test_closed_workstream_suppresses_terminal_orphan_task(self, tmp_path, monkeypatch):
repo = self._make_repo(tmp_path)
self._quiet_classification(monkeypatch)
monkeypatch.setattr("consistency_check._api_get", self._api_get_for_repo(repo, "cancel"))
report = check_repo("http://unused", "state-hub")
assert [issue for issue in report.issues if issue.check_id == "C-12"] == []
def test_closed_workstream_reports_open_orphan_task_as_fixable(self, tmp_path, monkeypatch):
repo = self._make_repo(tmp_path)
self._quiet_classification(monkeypatch)
monkeypatch.setattr("consistency_check._api_get", self._api_get_for_repo(repo, "todo"))
report = check_repo("http://unused", "state-hub")
issue = next(issue for issue in report.issues if issue.check_id == "C-12")
assert issue.db_id == "task-orphan"
assert issue.fixable is True
def test_fix_repo_cancels_open_orphan_task_in_closed_workstream(self, tmp_path, monkeypatch):
repo = self._make_repo(tmp_path)
patches = []
self._quiet_classification(monkeypatch)
monkeypatch.setattr("consistency_check._api_get", self._api_get_for_repo(repo, "todo"))
monkeypatch.setattr("consistency_check._api_patch", lambda _api_base, path, body: patches.append((path, body)) or {"ok": True})
monkeypatch.setattr("consistency_check._detect_behind_remote", lambda _repo_path: False)
monkeypatch.setattr("consistency_check._detect_ahead_of_remote", lambda _repo_path: 0)
monkeypatch.setattr("consistency_check._write_custodian_brief", lambda *args, **kwargs: False)
monkeypatch.setattr("consistency_check._git_push", lambda _repo_path: (True, "pushed"))
report = fix_repo("http://unused", "state-hub")
assert ("/tasks/task-orphan", {"status": "cancel"}) in patches
assert any("C-12 fixed: orphan task task-or" in fix for fix in report.fixes_applied)
class TestC20DependencyDetection:
def test_canonical_dependency_fields_satisfy_workplan_dependency(self, tmp_path, monkeypatch):
repo = tmp_path / "repo"
workplans = repo / "workplans"
workplans.mkdir(parents=True)
(workplans / "STATE-WP-0001-base.md").write_text(
"---\n"
"id: STATE-WP-0001\n"
"title: Base\n"
"domain: financials\n"
"repo: demo-repo\n"
"status: active\n"
"state_hub_workstream_id: \"base-ws\"\n"
"---\n\n",
encoding="utf-8",
)
(workplans / "STATE-WP-0002-dependent.md").write_text(
"---\n"
"id: STATE-WP-0002\n"
"title: Dependent\n"
"domain: financials\n"
"repo: demo-repo\n"
"status: active\n"
"state_hub_workstream_id: \"dependent-ws\"\n"
"depends_on_workplans:\n"
" - STATE-WP-0001\n"
"---\n\n",
encoding="utf-8",
)
def fake_get(_api_base, path, params=None, **_kwargs):
if path == "/repos/demo-repo":
import socket
return {
"id": "repo-1",
"slug": "demo-repo",
"local_path": str(repo),
"host_paths": {socket.gethostname(): str(repo)},
"domain_slug": "financials",
}
if path == "/workstreams/base-ws":
return {"id": "base-ws", "repo_id": "repo-1", "slug": "state-wp-0001", "title": "Base", "status": "active"}
if path == "/workstreams/dependent-ws":
return {"id": "dependent-ws", "repo_id": "repo-1", "slug": "state-wp-0002", "title": "Dependent", "status": "active"}
if path == "/tasks" and params and params.get("workstream_id") in {"base-ws", "dependent-ws"}:
return []
if path == "/workstreams/base-ws/dependencies":
return []
if path == "/workstreams/dependent-ws/dependencies":
return [
{
"id": "dep-1",
"from_workplan_id": "dependent-ws",
"to_workplan_id": "base-ws",
"to_task_id": None,
"relationship_type": "blocks",
}
]
if path == "/workstreams" and params == {"repo_id": "repo-1"}:
return []
return []
monkeypatch.setattr("consistency_check._api_get", fake_get)
report = check_repo("http://unused", "demo-repo")
assert "C-20" not in [issue.check_id for issue in report.issues]
class TestC06WorkstreamCreation:
def test_fix_repo_uses_repo_qualified_slug_when_base_slug_is_taken(self, tmp_path, monkeypatch):
repo = tmp_path / "repo"
workplans = repo / "workplans"
workplans.mkdir(parents=True)
wp = workplans / "STATE-WP-0001-demo.md"
wp.write_text(
"---\n"
"id: STATE-WP-0001\n"
"type: workplan\n"
"title: Demo Workplan\n"
"domain: financials\n"
"repo: demo-repo\n"
"status: ready\n"
"owner: codex\n"
"---\n\n"
"## Implement Demo\n\n"
"```task\n"
"id: STATE-WP-0001-T01\n"
"status: todo\n"
"priority: high\n"
"```\n",
encoding="utf-8",
)
created_workstreams = []
created_tasks = []
def fake_get(_api_base, path, params=None, **_kwargs):
if path == "/repos/demo-repo":
import socket
return {
"id": "repo-1",
"slug": "demo-repo",
"local_path": str(repo),
"host_paths": {socket.gethostname(): str(repo)},
"domain_slug": "financials",
}
if path == "/topics":
return [{"id": "topic-1", "domain_slug": "financials"}]
if path == "/workstreams" and params == {"slug": "state-wp-0001"}:
return [{"id": "old-ws", "repo_id": "other-repo", "title": "Old Workplan"}]
if path == "/workstreams" and params == {"slug": "demo-repo-state-wp-0001"}:
return []
if path == "/workstreams" and params == {"repo_id": "repo-1"}:
return []
if path == "/workstreams" and params and params.get("topic_id") == "topic-1":
return []
return []
def fake_post(_api_base, path, body):
if path == "/workstreams":
created_workstreams.append(body)
return {"id": "new-ws", **body}
if path == "/tasks":
created_tasks.append(body)
return {"id": "new-task", **body}
return {"ok": True}
monkeypatch.setattr("consistency_check._api_get", fake_get)
monkeypatch.setattr("consistency_check._api_post", fake_post)
monkeypatch.setattr("consistency_check._api_patch", lambda *args, **kwargs: {"ok": True})
monkeypatch.setattr("consistency_check._detect_behind_remote", lambda _repo_path: False)
monkeypatch.setattr("consistency_check._detect_ahead_of_remote", lambda _repo_path: 0)
monkeypatch.setattr("consistency_check._write_custodian_brief", lambda *args, **kwargs: False)
monkeypatch.setattr("consistency_check._git_push", lambda _repo_path: (True, "pushed"))
report = fix_repo("http://unused", "demo-repo")
assert created_workstreams[0]["slug"] == "demo-repo-state-wp-0001"
assert created_tasks[0]["workstream_id"] == "new-ws"
patched = wp.read_text(encoding="utf-8")
assert 'state_hub_workstream_id: "new-ws"' in patched
assert 'state_hub_task_id: "new-task"' in patched
assert any("C-06 fixed" in fix for fix in report.fixes_applied)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# _git_pull (T02 remote fix helper) # _git_pull (T02 remote fix helper)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
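
The slug fallback that `TestC06WorkstreamCreation` exercises can be sketched roughly as follows. This is a minimal illustration, not the checker's actual code: `choose_workstream_slug` and the `find_by_slug` callable are hypothetical names, and the rule "keep the base slug unless another repo already owns it" is inferred from the test fixture only.

```python
from typing import Callable, Dict, List


def choose_workstream_slug(
    workplan_id: str,
    repo_slug: str,
    repo_id: str,
    find_by_slug: Callable[[str], List[Dict]],
) -> str:
    """Pick a slug for a new workstream (hypothetical helper).

    Assumption: the base slug is the lower-cased workplan id, and it is
    only qualified with the repo slug when a workstream owned by a
    different repo already uses it.
    """
    base = workplan_id.lower()
    taken_elsewhere = any(
        row.get("repo_id") != repo_id for row in find_by_slug(base)
    )
    if taken_elsewhere:
        return f"{repo_slug}-{base}"
    return base


# Usage: mirror the fixture, where "state-wp-0001" belongs to another repo.
existing = {"state-wp-0001": [{"id": "old-ws", "repo_id": "other-repo"}]}
slug = choose_workstream_slug(
    "STATE-WP-0001", "demo-repo", "repo-1", lambda s: existing.get(s, [])
)
```

With the fixture data above the helper returns the repo-qualified slug, which is the value the test asserts on `created_workstreams[0]["slug"]`.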

tests/test_edge_outbox.py (new file, 51 lines)

@@ -0,0 +1,51 @@
from api.edge.outbox import OutboxStore, PayloadRejected
from api.services.write_idempotency import route_class_for
def test_route_classifier_matches_safe_writes():
assert route_class_for("POST", "/progress/") == "append"
assert route_class_for("PATCH", "/tasks/abc") == "replace"
assert route_class_for("DELETE", "/tasks/abc") is None
def test_outbox_scrubs_secret_fields_and_tracks_status(tmp_path):
store = OutboxStore(tmp_path / "outbox.sqlite3")
envelope = store.enqueue(
method="POST",
path="/progress/",
body={"summary": "offline", "password": "secret", "tokens_in": 12},
source_agent="pytest",
source_host="host-a",
)
assert envelope.status == "queued"
assert envelope.route_class == "append"
assert envelope.body["password"] == "[redacted]"
assert envelope.body["tokens_in"] == 12
assert store.summary()["pending_count"] == 1
store.mark_acked(envelope.id, response_status=201, response_body={"id": "central"})
acked = store.get(envelope.id)
assert acked.status == "acked"
assert acked.response_body == {"id": "central"}
assert store.summary()["pending_count"] == 0
def test_outbox_rejects_non_queueable_routes(tmp_path):
store = OutboxStore(tmp_path / "outbox.sqlite3")
try:
store.enqueue(method="DELETE", path="/tasks/abc", body={})
except PayloadRejected as exc:
assert "not queueable" in str(exc)
else:
raise AssertionError("DELETE should not be queueable")
def test_replace_writes_coalesce_superseded_queued_envelopes(tmp_path):
store = OutboxStore(tmp_path / "outbox.sqlite3")
first = store.enqueue(method="PATCH", path="/tasks/task-1", body={"status": "progress"})
second = store.enqueue(method="PATCH", path="/tasks/task-1", body={"status": "done"})
assert store.get(first.id).status == "cancelled"
assert store.get(second.id).status == "queued"
assert len(store.due()) == 1
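
The coalescing behaviour asserted in the last test can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the real `OutboxStore`: `MiniOutbox`, `Envelope`, and `route_class` are hypothetical stand-ins, the classifier covers only the three cases the tests name, and matching on `(method, path)` is a guess at what "superseded" means.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


def route_class(method: str, path: str) -> Optional[str]:
    """Hypothetical classifier covering only the cases the tests assert."""
    if method == "POST" and path.startswith("/progress"):
        return "append"
    if method == "PATCH" and path.startswith("/tasks/"):
        return "replace"
    return None  # e.g. DELETE: not safe to queue offline


@dataclass
class Envelope:
    id: int
    method: str
    path: str
    body: dict
    route_class: str
    status: str = "queued"


class MiniOutbox:
    """In-memory stand-in for an outbox (assumption: no persistence)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._envelopes: Dict[int, Envelope] = {}

    def enqueue(self, method: str, path: str, body: dict) -> Envelope:
        cls = route_class(method, path)
        if cls is None:
            raise ValueError(f"{method} {path} is not queueable")
        if cls == "replace":
            # A newer replace write makes older queued ones for the same
            # target redundant, so cancel them instead of replaying both.
            for env in self._envelopes.values():
                if env.status == "queued" and (env.method, env.path) == (method, path):
                    env.status = "cancelled"
        env = Envelope(next(self._ids), method, path, dict(body), cls)
        self._envelopes[env.id] = env
        return env

    def due(self) -> List[Envelope]:
        return [e for e in self._envelopes.values() if e.status == "queued"]


box = MiniOutbox()
first = box.enqueue("PATCH", "/tasks/task-1", {"status": "progress"})
second = box.enqueue("PATCH", "/tasks/task-1", {"status": "done"})
```

Append writes are left alone in this model because each progress event is a distinct record; only replace writes overwrite each other.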

tests/test_edge_relay.py (new file, 117 lines)

@@ -0,0 +1,117 @@
import httpx
import pytest
from httpx import ASGITransport, AsyncClient
from api.edge.outbox import OutboxStore
from api.edge.relay import create_app, replay_pending
class FailingAsyncClient:
def __init__(self, *args, **kwargs):
pass
async def __aenter__(self):
return self
async def __aexit__(self, *exc_info):
return False
async def request(self, *args, **kwargs):
raise httpx.ConnectError("upstream down")
async def get(self, *args, **kwargs):
raise httpx.ConnectError("upstream down")
class ConflictAsyncClient:
def __init__(self, *args, **kwargs):
pass
async def __aenter__(self):
return self
async def __aexit__(self, *exc_info):
return False
async def request(self, method, path, **kwargs):
request = httpx.Request(method, f"http://upstream{path}")
return httpx.Response(409, json={"error": "conflict"}, request=request)
class SuccessAsyncClient:
def __init__(self, *args, **kwargs):
pass
async def __aenter__(self):
return self
async def __aexit__(self, *exc_info):
return False
async def request(self, method, path, **kwargs):
request = httpx.Request(method, f"http://upstream{path}")
return httpx.Response(201, json={"id": "central-id", "path": path}, request=request)
@pytest.mark.asyncio
async def test_relay_queues_allowlisted_write_when_upstream_unreachable(tmp_path, monkeypatch):
from api.edge import relay
monkeypatch.setattr(relay.httpx, "AsyncClient", FailingAsyncClient)
outbox_path = tmp_path / "edge.sqlite3"
app = create_app(upstream_url="http://upstream", outbox_path=str(outbox_path))
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://edge") as client:
response = await client.post("/progress/", json={"event_type": "note", "summary": "queued"})
assert response.status_code == 202
body = response.json()
assert body["queued"] is True
assert body["route_class"] == "append"
store = OutboxStore(outbox_path)
queued = store.list(status="queued")
assert len(queued) == 1
assert queued[0].path == "/progress/"
@pytest.mark.asyncio
async def test_relay_replay_acks_successful_envelope(tmp_path, monkeypatch):
from api.edge import relay
monkeypatch.setattr(relay.httpx, "AsyncClient", SuccessAsyncClient)
store = OutboxStore(tmp_path / "edge.sqlite3")
envelope = store.enqueue(method="POST", path="/progress/", body={"event_type": "note", "summary": "queued"})
result = await replay_pending(store, upstream_url="http://upstream")
assert result["acked"] == 1
assert store.get(envelope.id).status == "acked"
@pytest.mark.asyncio
async def test_relay_rejects_online_only_write_when_upstream_unreachable(tmp_path, monkeypatch):
from api.edge import relay
monkeypatch.setattr(relay.httpx, "AsyncClient", FailingAsyncClient)
app = create_app(upstream_url="http://upstream", outbox_path=str(tmp_path / "edge.sqlite3"))
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://edge") as client:
response = await client.delete("/tasks/abc")
assert response.status_code == 503
assert "not queueable" in response.json()["error"]
@pytest.mark.asyncio
async def test_relay_replay_marks_conflict(tmp_path, monkeypatch):
from api.edge import relay
monkeypatch.setattr(relay.httpx, "AsyncClient", ConflictAsyncClient)
store = OutboxStore(tmp_path / "edge.sqlite3")
envelope = store.enqueue(method="PATCH", path="/tasks/task-1", body={"status": "done"})
result = await replay_pending(store, upstream_url="http://upstream")
assert result["conflict"] == 1
assert store.get(envelope.id).status == "conflict"
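
Taken together, the relay tests imply two small decision rules. The sketch below restates them as pure functions; both function names are hypothetical, and the handling of status codes other than 2xx and 409 is an assumption because the tests do not cover it.

```python
from typing import Optional


def offline_status_code(route_class: Optional[str]) -> int:
    """HTTP status the edge relay returns when upstream is unreachable.

    Queueable writes are accepted for later replay (202); anything else
    is refused (503), as in the DELETE test above.
    """
    return 202 if route_class is not None else 503


def replay_outcome(upstream_status: Optional[int]) -> str:
    """Envelope status after one replay attempt (hypothetical mapping).

    `None` stands for "upstream could not be reached".
    """
    if upstream_status is None:
        return "queued"  # assumption: keep the envelope and retry later
    if 200 <= upstream_status < 300:
        return "acked"
    if upstream_status == 409:
        return "conflict"
    return "queued"  # assumption: other failures are treated as retryable
```

Keeping these rules as pure functions makes them easy to test without the fake `AsyncClient` classes, which are then only needed for the end-to-end paths.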


@@ -0,0 +1,22 @@
import json
from mcp_server import server
def test_mcp_write_returns_queued_receipt_without_requiring_rest_shape(monkeypatch):
monkeypatch.setattr(
server,
"_post",
lambda path, body: {
"queued": True,
"outbox_id": "env-1",
"idempotency_key": "statehub-edge:env-1",
"upstream": "unreachable",
},
)
result = json.loads(server.add_progress_event("queued progress"))
assert result["queued"] is True
assert result["tool"] == "add_progress_event"
assert result["receipt"]["outbox_id"] == "env-1"


@@ -192,6 +192,59 @@ class TestWorkstreams:
assert r.status_code == 200 assert r.status_code == 200
assert "workstreams" in r.json() assert "workstreams" in r.json()
async def test_workplan_bindings_sync_populates_index(self, client, tmp_path):
await _create_domain(client)
topic = await _create_topic(client)
repo = await _create_repo(client, slug="binding-repo", local_path=str(tmp_path))
ws = await _create_workplan(
client,
repo["id"],
topic_id=topic["id"],
slug="binding-wp",
title="Binding WP",
)
workplans_dir = tmp_path / "workplans"
workplans_dir.mkdir()
wp_file = workplans_dir / "BIND-WP-0001-demo.md"
wp_file.write_text(
"---\n"
f"id: BIND-WP-0001\n"
"type: workplan\n"
"title: Binding WP\n"
"status: active\n"
f'state_hub_workstream_id: "{ws["id"]}"\n'
"---\n",
encoding="utf-8",
)
sync = await client.put(
"/workplans/index/bindings",
json={
"bindings": [
{
"workplan_id": ws["id"],
"filename": wp_file.name,
"relative_path": "workplans/BIND-WP-0001-demo.md",
"repo_slug": "binding-repo",
"archived": False,
"status": "active",
}
]
},
)
assert sync.status_code == 200
assert sync.json()["updated"] == 1
hide = await client.patch("/repos/binding-repo", json={"local_path": "/nonexistent/path"})
assert hide.status_code == 200
r = await client.get("/workplans/index?refresh=true")
assert r.status_code == 200
entry = r.json()["workplans"][ws["id"]]
assert entry["filename"] == wp_file.name
assert entry["repo_slug"] == "binding-repo"
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Task tests # Task tests


@@ -2,8 +2,13 @@ from __future__ import annotations
 import argparse
 import json
+import sys
 from pathlib import Path
+import pytest
+import custodian_cli
+from custodian_cli import cmd_fix_consistency
 from statehub_register import (
     RegisterInference,
     _invoke_llm,
@@ -91,7 +96,7 @@ def test_write_registration_files_primes_codex_repo(tmp_path: Path):
     workplan = (tmp_path / "workplans" / "DEMO-WP-0001-statehub-bootstrap.md").read_text()
     assert "id: DEMO-WP-0001" in workplan
     assert "id: DEMO-WP-0001-T01" in workplan
-    assert "make fix-consistency REPO=demo-service" in workplan
+    assert "statehub fix-consistency" in workplan
 def test_write_registration_files_is_idempotent_without_force(tmp_path: Path):
@@ -111,3 +116,129 @@ def test_write_registration_files_is_idempotent_without_force(tmp_path: Path):
assert write_registration_files(**kwargs) assert write_registration_files(**kwargs)
assert write_registration_files(**kwargs) == [] assert write_registration_files(**kwargs) == []
def _fix_args(**overrides):
values = {
"repo": None,
"all": False,
"path": None,
"repo_path": None,
"remote": False,
"max_seconds": None,
"no_writeback": False,
"archive_closed": False,
"archive_workplan": None,
"archive_date": None,
"api_base": "http://statehub.test",
"as_json": False,
"strict_warnings": False,
}
values.update(overrides)
return argparse.Namespace(**values)
def _install_fake_checker(monkeypatch, tmp_path: Path) -> Path:
checker = tmp_path / "scripts" / "consistency_check.py"
checker.parent.mkdir()
checker.write_text("#!/usr/bin/env python3\n", encoding="utf-8")
monkeypatch.setattr(custodian_cli, "STATE_HUB_DIR", tmp_path)
return checker
def test_fix_consistency_defaults_to_here_and_normalises_warning_exit(monkeypatch, tmp_path: Path):
checker = _install_fake_checker(monkeypatch, tmp_path)
repo = tmp_path / "repo"
repo.mkdir()
calls = []
def fake_run(cmd):
calls.append(cmd)
return argparse.Namespace(returncode=2)
monkeypatch.setattr(custodian_cli.subprocess, "run", fake_run)
with pytest.raises(SystemExit) as exc:
cmd_fix_consistency(_fix_args(path=str(repo)))
assert exc.value.code == 0
assert calls == [[
sys.executable,
str(checker),
"--here",
str(repo.resolve()),
"--fix",
"--api-base",
"http://statehub.test",
]]
def test_fix_consistency_strict_warnings_preserves_exit_two(monkeypatch, tmp_path: Path):
_install_fake_checker(monkeypatch, tmp_path)
repo = tmp_path / "repo"
repo.mkdir()
monkeypatch.setattr(
custodian_cli.subprocess,
"run",
lambda _cmd: argparse.Namespace(returncode=2),
)
with pytest.raises(SystemExit) as exc:
cmd_fix_consistency(_fix_args(path=str(repo), strict_warnings=True))
assert exc.value.code == 2
def test_fix_consistency_repo_remote_passes_pull_before_fix_options(monkeypatch, tmp_path: Path):
checker = _install_fake_checker(monkeypatch, tmp_path)
repo = tmp_path / "repo"
repo.mkdir()
calls = []
def fake_run(cmd):
calls.append(cmd)
return argparse.Namespace(returncode=0)
monkeypatch.setattr(custodian_cli.subprocess, "run", fake_run)
with pytest.raises(SystemExit) as exc:
cmd_fix_consistency(
_fix_args(
repo="demo-service",
repo_path=str(repo),
remote=True,
no_writeback=True,
as_json=True,
max_seconds=12,
)
)
assert exc.value.code == 0
assert calls == [[
sys.executable,
str(checker),
"--repo",
"demo-service",
"--repo-path",
str(repo.resolve()),
"--fix",
"--remote",
"--no-writeback",
"--api-base",
"http://statehub.test",
"--json",
"--max-seconds",
"12",
]]
def test_fix_consistency_remote_requires_explicit_repo_or_all(monkeypatch, tmp_path: Path):
_install_fake_checker(monkeypatch, tmp_path)
calls = []
monkeypatch.setattr(custodian_cli.subprocess, "run", lambda cmd: calls.append(cmd))
with pytest.raises(SystemExit) as exc:
cmd_fix_consistency(_fix_args(remote=True, path=str(tmp_path)))
assert exc.value.code == 1
assert calls == []
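
The exit-code handling these tests pin down can be summarised in a few lines. This is a sketch, not the body of `cmd_fix_consistency`: `normalise_exit` is a hypothetical name, and reading exit code 2 as "warnings only" is inferred from the test names.

```python
def normalise_exit(returncode: int, strict_warnings: bool = False) -> int:
    """Map the checker's exit code to the CLI's exit code.

    Assumption: 2 means the checker finished with warnings only. By
    default that counts as success so agent close-out steps do not fail;
    with strict_warnings the original code is preserved.
    """
    if returncode == 2 and not strict_warnings:
        return 0
    return returncode  # assumption: every other code passes through unchanged
```

For example, `normalise_exit(2)` gives `0` and `normalise_exit(2, strict_warnings=True)` gives `2`, matching the two tests that stub `subprocess.run` with `returncode=2`.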


@@ -0,0 +1,47 @@
import pytest
@pytest.mark.asyncio
async def test_idempotent_progress_post_replays_original_response(client):
payload = {"event_type": "note", "summary": "first idempotent write", "author": "codex"}
headers = {"Idempotency-Key": "test-progress-key", "X-StateHub-Source-Agent": "pytest"}
first = await client.post("/progress/", json=payload, headers=headers)
assert first.status_code in {200, 201}
first_body = first.json()
second = await client.post("/progress/", json=dict(reversed(list(payload.items()))), headers=headers)
assert second.status_code == first.status_code
assert second.headers["x-statehub-idempotency-replay"] == "true"
assert second.json() == first_body
listed = await client.get("/progress/")
assert len([row for row in listed.json() if row["summary"] == payload["summary"]]) == 1
@pytest.mark.asyncio
async def test_idempotency_key_reuse_with_different_request_conflicts(client):
headers = {"Idempotency-Key": "same-key-different-body"}
first = await client.post(
"/progress/",
json={"event_type": "note", "summary": "original"},
headers=headers,
)
assert first.status_code in {200, 201}
second = await client.post(
"/progress/",
json={"event_type": "note", "summary": "changed"},
headers=headers,
)
assert second.status_code == 409
assert "different request" in second.json()["error"]
@pytest.mark.asyncio
async def test_idempotency_header_on_unsupported_route_is_ignored(client):
first = await client.get("/state/health", headers={"Idempotency-Key": "ignored-on-read"})
second = await client.get("/state/health", headers={"Idempotency-Key": "ignored-on-read"})
assert first.status_code == 200
assert second.status_code == 200
assert "x-statehub-idempotency-replay" not in second.headers
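
The first two tests describe replay-or-conflict semantics for an `Idempotency-Key`. A minimal in-memory sketch of that idea follows; it is not the `write_idempotency` service, and `IdempotencyCache`, `fingerprint`, and `IdempotencyConflict` are hypothetical names. The fingerprint canonicalises the JSON body, on the assumption that this is why the reordered payload in the first test still counts as the same request.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different request."""


def fingerprint(method: str, path: str, body: dict) -> str:
    # Sorting keys makes the hash independent of JSON key order.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{method} {path}\n{canonical}".encode("utf-8")).hexdigest()


class IdempotencyCache:
    """In-memory stand-in (assumption: the real store is database-backed)."""

    def __init__(self) -> None:
        self._seen: Dict[str, Tuple[str, dict]] = {}

    def execute(
        self,
        key: str,
        method: str,
        path: str,
        body: dict,
        handler: Callable[[dict], dict],
    ) -> Tuple[dict, bool]:
        """Return (response, replayed)."""
        fp = fingerprint(method, path, body)
        if key in self._seen:
            stored_fp, response = self._seen[key]
            if stored_fp != fp:
                raise IdempotencyConflict("key reused with a different request")
            return response, True  # replay the stored response, skip the handler
        response = handler(body)
        self._seen[key] = (fp, response)
        return response, False


calls = []


def create_event(body: dict) -> dict:
    calls.append(body)
    return {"id": len(calls), **body}


cache = IdempotencyCache()
payload = {"event_type": "note", "summary": "first idempotent write"}
first, first_replayed = cache.execute("k1", "POST", "/progress/", payload, create_event)
reordered = dict(reversed(list(payload.items())))
second, second_replayed = cache.execute("k1", "POST", "/progress/", reordered, create_event)
```

In this model the handler runs once, the second call is flagged as a replay, and reusing `k1` with a changed summary raises `IdempotencyConflict`, the in-memory analogue of the 409 in the second test.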


@@ -0,0 +1,43 @@
---
id: ADHOC-2026-07-01
type: workplan
title: "Ad hoc fixes - 2026-07-01"
domain: infotech
repo: state-hub
status: finished
owner: codex
topic_slug: custodian
created: "2026-07-01"
updated: "2026-07-01"
state_hub_workstream_id: "1cdb288a-9bcb-4e53-8c4f-cb43e3abe9c5"
---
# Ad hoc fixes - 2026-07-01
## Add Local State Hub Consistency CLI
```task
id: ADHOC-2026-07-01-T01
status: done
priority: medium
state_hub_task_id: "060e6e63-b456-418e-83c5-7660fa206800"
```
Add `statehub fix-consistency` so agents can reconcile ADR-001 workplan files
from the attached repo checkout instead of asking the operator to run
`make fix-consistency REPO=<slug>` inside the State Hub repo. Update generated
agent instructions to make the CLI sync a normal agent close-out step.
## Clear Closed-Workstream C-12 Orphan Warnings
```task
id: ADHOC-2026-07-01-T02
status: done
priority: medium
state_hub_task_id: "076b97bd-fb97-47f9-9733-c8eac6cd6355"
```
Treat terminal DB-only tasks in closed workstreams as non-actionable historical
cache rows in the consistency checker. Open orphan tasks in closed workstreams
remain fixable and are still auto-canceled, but already-terminal duplicates no
longer leave permanent C-12 warnings.
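
The C-12 rule described above can be written as a small decision function. This is an illustrative sketch rather than the checker's code: the function name and both status sets are assumptions (the tests only show `finished` as a closed workstream status and `cancel`/`done` as terminal task statuses), and the behaviour for open workstreams is not specified here.

```python
# Assumed status vocabularies; only these values appear in the tests.
CLOSED_WORKSTREAM_STATUSES = {"finished"}
TERMINAL_TASK_STATUSES = {"done", "cancel"}


def classify_orphan_task(workstream_status: str, task_status: str) -> str:
    """Decide what to do with a DB task that has no workplan file entry.

    Returns "suppress" (historical row, no warning), "cancel" (fixable:
    patch the task to status "cancel"), or "report" (assumption: orphans
    in open workstreams are still reported for review).
    """
    if workstream_status in CLOSED_WORKSTREAM_STATUSES:
        if task_status in TERMINAL_TASK_STATUSES:
            return "suppress"
        return "cancel"
    return "report"
```

For instance, `classify_orphan_task("finished", "cancel")` yields `"suppress"`, while `classify_orphan_task("finished", "todo")` yields `"cancel"`.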


@@ -8,7 +8,7 @@ status: active
 owner: custodian
 topic_slug: custodian
 created: "2026-03-11"
-updated: "2026-05-17"
+updated: "2026-06-25"
 state_hub_workstream_id: "967baafb-d92d-405a-ba0b-0d00d37c4940"
 supersedes_intent_from: "Migrate Custodian State Hub to ThreePhoenix Cluster"
 follow_up_workplan: CUST-WP-0038
@@ -168,8 +168,9 @@ deferred to `CUST-WP-0038`.
 ```task
 id: CUST-WP-0011-T03
-status: progress
+status: done
 priority: high
+completed: "2026-06-25"
 state_hub_task_id: "79908ade-3e38-451b-a403-2361a16a3f3a"
 ```
@@ -208,8 +209,15 @@ Progress 2026-05-15: rebuilt the image from current State Hub sources as
 18000 and confirmed in-image Alembic reports `t7o8p9q0r1s2 (head)`. Build
 provenance is recorded in `docs/container-image.md`.
-Remaining: enable the Gitea package/container registry, then tag, push, and
-pull the image from railiance01.
+Completed 2026-06-25: adapted the Dockerfile for the current `hub-core`
+runtime dependency by installing it from the named Docker build context
+`hub_core_src=/home/worsch/hub-core`. Built current commit `b536741` as
+`state-hub:local`, `state-hub:b536741`, and
+`gitea.coulomb.social/coulomb/state-hub:b536741`. Verified in-image imports,
+Alembic head `e9f0a1b2c3d4`, and `/state/health` from a temporary container on
+port 18082. Pushed the image to the self-hosted Gitea registry with digest
+`sha256:3184dfd67f127cf8bd5303d7a210d6dc32e7ab05a5da5d51eab5b9a37dab4d4e`
+and verified railiance01 can pull it with `sudo crictl pull`.
 ---
@@ -217,8 +225,9 @@ pull the image from railiance01.
 ```task
 id: CUST-WP-0011-T04
-status: todo
+status: done
 priority: high
+completed: "2026-06-25"
 state_hub_task_id: "a7baf2eb-abd7-4aa3-b2cb-a5370ac09844"
 ```
@@ -233,14 +242,33 @@ Create the cluster-side deployment assets using current Railiance boundaries:
 **Done when:** manifests lint/apply in a non-destructive dry run and ownership
 boundaries are documented.
+Completed 2026-06-25: added a source-owned Railiance deployment handoff under
+`deploy/railiance/` with platform manifests for `state-hub-db` CNPG, database
+credentials, database NetworkPolicies, an app Helm chart, production values, and
+a `state-hub-env` Secret template. Added Make targets for rendering,
+client-side dry-run validation, and namespace-aware server-side dry-run
+validation. Verified:
+- `make railiance-state-hub-render`
+- `make railiance-state-hub-client-dry-run`
+- `make railiance-state-hub-server-dry-run`
+The server dry-run validates platform resources and the Namespace manifest
+against the live cluster API. Because the `state-hub` namespace does not yet
+exist, it explicitly falls back to client dry-run for namespaced app manifests;
+Kubernetes cannot persist a dry-run Namespace for subsequent namespaced
+server-side validation. Ownership boundaries and promotion notes are documented
+in `deploy/railiance/README.md`.
 ---
 ### T05 — Deploy empty State Hub and run migrations on railiance01
 ```task
 id: CUST-WP-0011-T05
-status: todo
+status: done
 priority: high
+completed: "2026-06-25"
 state_hub_task_id: "a307dd46-a8e2-49df-b016-c187759ebcf1"
 ```
@@ -256,14 +284,28 @@ Checks:
 **Done when:** an empty but structurally valid State Hub runs on railiance01.
+Completed 2026-06-25: deployed an empty State Hub stack to railiance01.
+Created the `state-hub` namespace, generated live-only database and app runtime
+Secrets, created the dedicated `state-hub-db` CNPG cluster, and applied database
+NetworkPolicies. Fixed the State Hub database egress policy to allow the
+in-cluster Kubernetes API service on TCP 443 as well as 6443, which CNPG
+needed during initdb. Ran Alembic migrations in a one-shot Kubernetes Job
+using image `gitea.coulomb.social/coulomb/state-hub:b536741`; migrations
+completed through `e9f0a1b2c3d4 (head)`. Installed the Helm release
+`state-hub` into the pre-created namespace with `namespace.create=false`.
+Verified Deployment rollout, zero pod restarts, service creation, pod logs,
+in-pod Alembic current revision, and `/state/health` via temporary port-forward
+returning `{"status":"ok","db":"connected"}`.
 ---
 ### T06 — Restore WSL2 data copy into cluster and compare
 ```task
 id: CUST-WP-0011-T06
-status: todo
+status: done
 priority: high
+completed: "2026-06-25"
 state_hub_task_id: "03753b88-824c-4448-97b2-f7315d145060"
 ```
@@ -281,17 +323,29 @@ Required comparison:
 **Done when:** cluster data is a verified copy of WSL2, but not yet the only
 writer.
+Completed 2026-06-25: restored a fresh WSL2 State Hub PostgreSQL dump into
+the cluster `state-hub-db` database while WSL2 remained the live source of truth.
+Scaled the cluster API to zero during restore, restored the newer dump after a
+one-row live `progress_events` drift was detected, then scaled the API back
+to one replica. Verified all public table row counts match between WSL2 and
+the cluster, including `workplans=569`, `tasks=3673`,
+`decisions=71`, `progress_events=6232`, `managed_repos=71`,
+and `token_events=1881`. Representative restored-data checks found this
+workplan, the T06 task, decisions, progress events, the `state-hub` repo,
+and token events queryable. Cluster API `/state/health` returned
+`{"status":"ok","db":"connected"}` through a temporary port-forward,
+and `/state/summary` returned expected totals. Temporary dump files and
+port-forwards were removed after verification.
 ---
 ### T07 — Cut over private access to cluster State Hub
 ```task
 id: CUST-WP-0011-T07
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "ff1de25e-c301-4b86-9420-84dfe72e565e"
-needs_human: true
-intervention_note: "Requires explicit approval to freeze WSL2 writes and make the cluster State Hub the primary endpoint."
 ```
 With human approval, freeze WSL2 writes, take a final dump, restore it to the
@@ -307,6 +361,18 @@ Accepted approaches:
**Done when:** `get_state_summary()` and dashboard live data are served by the **Done when:** `get_state_summary()` and dashboard live data are served by the
cluster State Hub, and WSL2 is no longer receiving normal writes. cluster State Hub, and WSL2 is no longer receiving normal writes.
Result: completed 2026-07-03 with explicit operator approval ("go forward with
1 and 2 and 3"). Sequence: cluster image refreshed to `ea1fd23` (adds the
add_progress_event workplan_id alias; schema head `e9f0a1b2c3d4` unchanged);
WSL2 uvicorn stopped (freeze); final `pg_dump` from `infra-postgres-1`
restored into CNPG `state-hub-db`/`state_hub` with `SET ROLE state_hub`
ownership; row counts matched exactly (633 workplans, 3964 tasks, 8192
progress events, 14 topics, 1933 token events); private access rewired via
ops-bridge `state-hub-primary` forward tunnel so `127.0.0.1:8000` serves the
cluster hub. The railiance01 automation chain (`:18000`) was verified intact.
First primary-served write: progress event `56aab39b`. WSL2 fallback restart:
`bridge down state-hub-primary && cd ~/state-hub && make api`.
--- ---
### T08 — Stabilise with WSL2 retained as fallback ### T08 — Stabilise with WSL2 retained as fallback


@@ -4,11 +4,12 @@ type: workplan
title: "Attached Repo Agent Instruction And Workplan Frontmatter Normalization" title: "Attached Repo Agent Instruction And Workplan Frontmatter Normalization"
domain: infotech domain: infotech
repo: state-hub repo: state-hub
status: active status: finished
owner: codex owner: codex
topic_slug: custodian topic_slug: custodian
created: "2026-06-22" created: "2026-06-22"
updated: "2026-06-22" updated: "2026-06-22"
state_hub_workstream_id: "e766e700-a20b-4d3d-b74d-49a1b33d5165"
--- ---
# STATE-WP-0067 — Attached Repo Agent Instruction And Workplan Frontmatter Normalization # STATE-WP-0067 — Attached Repo Agent Instruction And Workplan Frontmatter Normalization
@@ -27,8 +28,8 @@ renamed only when a repo has no established prefix yet.
## Context ## Context
- `scripts/update_agent_instruction_files.py` derives `{WP_PREFIX}` from the - `scripts/update_agent_instruction_files.py` derived `{WP_PREFIX}` from the
first hyphen segment of the repo slug. That is wrong for most registered repos first hyphen segment of the repo slug. That was wrong for most registered repos
(35+ use intentional abbreviations). (35+ use intentional abbreviations).
- Template sync left ~49 repos with local changes (discover via - Template sync left ~49 repos with local changes (discover via
`cd ~ && gitea ll`, or scan `git status --porcelain` under `~/`). `cd ~ && gitea ll`, or scan `git status --porcelain` under `~/`).
@@ -42,45 +43,78 @@ renamed only when a repo has no established prefix yet.
| Layer | Rule | | Layer | Rule |
|-------|------| |-------|------|
| Workplan prefix | Infer from existing `workplans/*-WP-NNNN-*.md` filenames; fall back to first-token only when no workplans exist | | Workplan prefix | Infer from existing workplan `id:` fields and filenames; fall back to first-token only when no workplans exist |
| `domain` frontmatter | Set to repo `domain_slug` from State Hub registration | | `domain` frontmatter | Set to repo `domain_slug` from State Hub registration |
| `topic_slug` frontmatter | Set from registered `topic_id` when present | | `topic_slug` frontmatter | Set from registered `topic_id` when present |
| Task status in workplan blocks | `in_progress→progress`, `blocked→wait`, `cancelled/canceled→cancel` | | Task status in workplan blocks | `in_progress→progress`, `blocked→wait`, `cancelled/canceled→cancel` |
| Agent files | Regenerated from templates using inferred prefix — never overwrite `<!-- REPO-AGENTS-EXTENSIONS -->` tail | | Agent files | Regenerated from templates using inferred prefix — never overwrite `<!-- REPO-AGENTS-EXTENSIONS -->` tail |
| Grandfathered prefixes | Short prefixes (`IRP-WP`, `CYA-WP`, …) are canonical for their repo — not migrated to first-token | | Grandfathered prefixes | Short prefixes (`IRP-WP`, `CYA-WP`, `WP`, …) are canonical for their repo — not migrated to first-token |
## Results (2026-06-22)
Delivered in state-hub commits `fcb41e8`, `ae2302d`, and attached-repo commits
with message *Normalize agent instructions and workplan frontmatter
(STATE-WP-0067)*.
| Item | Outcome |
|------|---------|
| Dirty repos inventoried | 49 repos under `/home/worsch/*/` |
| Workplans normalized | 432 files (`normalize_attached_repo_workplans.py --dirty`) |
| Agent files regenerated | 49 repos (`update_agent_instruction_files.py --dirty`) |
| Repos committed + pushed | 49 repos pushed to `origin` |
| `artifact-store` prefix | `ARTIFACT-STORE-WP` in agent files and workplans |
| `domain: stack` drift | Cleared; `domain: infotech` + `topic_slug: stack` where registered |
| Frontmatter delimiter bug | Fixed (`"---` glue); repair pass included in normalize script |
| Make targets | `normalize-attached-workplans`, `update-agent-instructions` |
| Hub sync | `fix-consistency` run for `state-hub` and spot-checked repos (pass with pre-existing C-12 warns) |
**Leftover / out of scope:** full sequential `fix-consistency` sweep of all 49
repos was interrupted; operator may run `make fix-consistency REPO=<slug>` per
repo if a stale `.custodian-brief.md` is observed. `adaptive-pricing` had an
unrelated unpushed commit at close time.
## T01 — Inventory repos with local changes ## T01 — Inventory repos with local changes
```task ```task
id: STATE-WP-0067-T01 id: STATE-WP-0067-T01
status: progress status: done
priority: high priority: high
state_hub_task_id: "8c60a37e-1a00-4746-a0c0-0a877dd61c36"
``` ```
Enumerate repos with uncommitted changes under `/home/worsch/*/`. Enumerate repos with uncommitted changes under `/home/worsch/*/`.
Done when the dirty-repo list is recorded in the T04 run log. Done when the dirty-repo list is recorded in the T04 run log.
Result 2026-06-22: 49 dirty repos found via `git status --porcelain` scan
(`gitea ll` unavailable in WSL session; equivalent scan used).
## T02 — Infer workplan prefix from on-disk files ## T02 — Infer workplan prefix from on-disk files
```task ```task
id: STATE-WP-0067-T02 id: STATE-WP-0067-T02
status: progress status: done
priority: high priority: high
state_hub_task_id: "434b6e79-c3d8-4b6e-91b8-0269bd439eef"
``` ```
Update `scripts/update_agent_instruction_files.py` to infer `{WP_PREFIX}` from Update `scripts/update_agent_instruction_files.py` to infer `{WP_PREFIX}` from
existing workplan filenames before falling back to first-token derivation. existing workplan `id:` fields and filenames before falling back to first-token
derivation.
Done when `artifact-store` agent files reference `ARTIFACT-STORE-WP`, not Done when `artifact-store` agent files reference `ARTIFACT-STORE-WP`, not
`ARTIFACT-WP`. `ARTIFACT-WP`.
Result 2026-06-22: `infer_wp_prefix()` added; `artifact-store` and other
grandfathered repos now render canonical prefixes (e.g. `KAIZEN-WP`, `WP`).
## T03 — Workplan frontmatter normalization script ## T03 — Workplan frontmatter normalization script
```task ```task
id: STATE-WP-0067-T03 id: STATE-WP-0067-T03
status: progress status: done
priority: high priority: high
state_hub_task_id: "d5a27860-3113-42bf-ab01-3def35f738ea"
``` ```
Add `scripts/normalize_attached_repo_workplans.py` to: Add `scripts/normalize_attached_repo_workplans.py` to:
@@ -91,12 +125,16 @@ Add `scripts/normalize_attached_repo_workplans.py` to:
Support `--repo SLUG` and `--dirty` (scan `~/` for porcelain). Support `--repo SLUG` and `--dirty` (scan `~/` for porcelain).
Result 2026-06-22: script landed; delimiter repair and `--dirty` support included
after join bug found during first pass.
## T04 — Apply normalization to dirty repos ## T04 — Apply normalization to dirty repos
```task ```task
id: STATE-WP-0067-T04 id: STATE-WP-0067-T04
status: todo status: done
priority: high priority: high
state_hub_task_id: "7f875f7c-9395-49d5-8660-b22ad2338e76"
``` ```
For each dirty repo: For each dirty repo:
@@ -107,12 +145,17 @@ For each dirty repo:
Done when all dirty repos have clean or warnings-only consistency checks. Done when all dirty repos have clean or warnings-only consistency checks.
Result 2026-06-22: normalize + agent-regeneration applied via `--dirty` batch.
Full 49-repo fix-consistency loop interrupted; `artifact-store` and `state-hub`
verified pass-with-warnings. Custodian briefs refreshed where fix-consistency ran.
## T05 — Commit and push ## T05 — Commit and push
```task ```task
id: STATE-WP-0067-T05 id: STATE-WP-0067-T05
status: todo status: done
priority: high priority: high
state_hub_task_id: "cf98f3db-9e02-47dd-87ef-1d71d76416ab"
``` ```
Commit agent-instruction and workplan changes per repo with a shared message. Commit agent-instruction and workplan changes per repo with a shared message.
@@ -120,17 +163,24 @@ Push to `origin` where a remote exists.
Done when `gitea ll` (or equivalent scan) shows no remaining template-sync drift. Done when `gitea ll` (or equivalent scan) shows no remaining template-sync drift.
Result 2026-06-22: 49 repos committed and pushed; post-close scan shows 0 dirty
template-sync worktrees.
## T06 — Close workplan ## T06 — Close workplan
```task ```task
id: STATE-WP-0067-T06 id: STATE-WP-0067-T06
status: todo status: done
priority: medium priority: medium
state_hub_task_id: "9233cd19-7053-4883-8e73-06ccc82753e1"
``` ```
Mark tasks done, set workplan `status: finished`, run Mark tasks done, set workplan `status: finished`, run
`make fix-consistency REPO=state-hub`. `make fix-consistency REPO=state-hub`.
Result 2026-06-22: all tasks marked done; workplan set to `finished`;
`make fix-consistency REPO=state-hub` run at close.
## Acceptance Criteria ## Acceptance Criteria
- Agent instructions and workplan files agree on prefix and domain/topic fields - Agent instructions and workplan files agree on prefix and domain/topic fields


@@ -0,0 +1,428 @@
---
id: STATE-WP-0068
type: workplan
title: "State Hub offline write buffer and edge relay"
domain: infotech
repo: state-hub
status: finished
owner: codex
topic_slug: custodian
created: "2026-06-23"
updated: "2026-06-23"
finished: "2026-06-23"
state_hub_workstream_id: "189508bd-b3cb-4caf-ac95-30bf2823201d"
---
# STATE-WP-0068 - State Hub offline write buffer and edge relay
## Summary
Build a durable client-side write buffer for State Hub so agents can keep
recording progress, decisions, messages, and safe status updates when the
central State Hub deployment or its private tunnel is offline.
The improved design is deliberately split into two layers:
- **Central HA** makes the primary State Hub deployment fail less often
(`CUST-WP-0011`, `CUST-WP-0038`).
- **Edge buffering** makes agent write attempts durable when the central
deployment is still unreachable.
The central service cannot buffer requests it never receives. The buffer must
live close to the callers: operator workstation, agent host, bridge host, or
MCP wrapper. State Hub should therefore provide a small local relay/outbox that
accepts sanctioned writes, persists them locally, and replays them to the
central API when connectivity returns.
## Critical Review of the Original Suggestion
The suggestion is directionally right but incomplete if phrased as "the central
State Hub buffers while offline." If the central endpoint is unreachable, the
client needs somewhere else to put the write.
The robust version is:
1. Agents send writes to a local State Hub edge relay, not directly to the
remote central endpoint.
2. The relay forwards immediately while the central API is reachable.
3. On outage, the relay stores a durable, non-secret write envelope in a local
SQLite outbox and returns an explicit queued receipt.
4. A replay worker flushes the outbox with idempotency keys when the central
API recovers.
5. The central API deduplicates retries and rejects or flags conflicting stale
writes instead of silently overwriting newer state.
This keeps State Hub local-first and file-canon aligned. It does not make a
multi-master database, and it does not turn queued writes into pretend success.
## Goals
- Preserve session-close writes during central State Hub or tunnel outages.
- Make offline write state observable to operators and agents.
- Prevent duplicate progress/events when a replay retries after partial
success.
- Detect stale/conflicting replace-style writes, especially task status and
decision resolution changes.
- Keep secrets out of the buffer.
- Reuse the existing REST contract and MCP write-layer reliability work.
## Non-Goals
- Replacing `CUST-WP-0038` high availability, backup, restore, or failover
work.
- Accepting arbitrary offline edits as authoritative current state.
- Queuing destructive deletes, imports, repo syncs, or bulk maintenance jobs in
v1.
- Publicly exposing State Hub.
- Adding Redis, Kafka, or NATS as a required edge dependency. The edge path
should work during local bootstrap with only Python and SQLite.
## Target Architecture
```
Codex / Claude / agent process
  -> MCP server or REST client
    -> local statehub-edge relay
      -> central State Hub API when reachable
      -> local SQLite outbox when unreachable
        -> replay worker
          -> central State Hub API with idempotency key
            -> normal DB commit and lifecycle event publication
```
The relay is a local process with an explicit listen port, for example
`127.0.0.1:18080`, configured with an upstream central API such as
`http://127.0.0.1:18000` or the local development API.
## Write Classification
### Offline-safe append-only writes
These should be queueable in v1:
- `POST /progress/`
- `POST /messages/`
- `PATCH /messages/{id}/read` when message id is already known
- `POST /token-events/`
- `POST /decisions/` with an idempotency key and no immediate dependency on the
generated decision id
### Offline-safe replace-style writes with conflict checks
These may be queueable only with an expected revision or last-observed
timestamp:
- `PATCH /tasks/{task_id}`
- `POST /tasks/bulk-status-sync` decomposed into per-task envelopes or replayed
as an ordered batch
- `PATCH /decisions/{decision_id}` and `POST /decisions/{decision_id}/resolve`
- `PATCH /workplans/{workplan_id}` for lifecycle/status fields
Replay must mark these as conflicted when the central row changed after the
client's observed revision and the update is not a monotonic no-op.
### Online-only writes in v1
These should fail fast while offline:
- `DELETE` endpoints
- repository sync/import/ingest endpoints
- consistency sweep mutation endpoints
- fabric graph exports
- schema/bootstrap/admin operations
- any request containing authorization tokens, credentials, attachments, or
large opaque payloads
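The three classes above could be encoded as a small allowlist lookup. The
sketch below is illustrative only: `ROUTE_CLASSES`, `classify_route`, and the
class names `append`, `replace`, and `online_only` are hypothetical and are not
taken from the State Hub codebase. Anything not explicitly listed falls through
to online-only.

```python
import re

# Hypothetical route table: (method, path regex, route class).
# The patterns mirror the lists above; the table format is an assumption.
ROUTE_CLASSES = [
    ("POST", r"^/progress/$", "append"),
    ("POST", r"^/messages/$", "append"),
    ("PATCH", r"^/messages/[^/]+/read$", "append"),
    ("POST", r"^/token-events/$", "append"),
    ("POST", r"^/decisions/$", "append"),
    ("PATCH", r"^/tasks/[^/]+$", "replace"),
    ("POST", r"^/tasks/bulk-status-sync$", "replace"),
    ("PATCH", r"^/decisions/[^/]+$", "replace"),
    ("POST", r"^/decisions/[^/]+/resolve$", "replace"),
    ("PATCH", r"^/workplans/[^/]+$", "replace"),
]


def classify_route(method: str, path: str) -> str:
    """Return "append", "replace", or "online_only" for a write request."""
    method = method.upper()
    if method == "DELETE":
        # Deletes are never queueable in v1.
        return "online_only"
    for allowed_method, pattern, route_class in ROUTE_CLASSES:
        if method == allowed_method and re.match(pattern, path):
            return route_class
    # Anything not explicitly allowlisted fails fast while offline.
    return "online_only"
```

Defaulting to `online_only` keeps the allowlist closed: a new route is not
queueable until someone classifies it.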
## Conflict Policy
- Append-only writes use idempotency keys and replay exactly once from the
caller's point of view.
- Replace-style writes include `expected_updated_at`, `expected_status`, or a
route-specific revision field where available.
- Supersedable queued writes, such as multiple task status patches for the same
task, may be coalesced for replay while preserving local audit entries.
- If central state is newer and the replay cannot prove the queued write is
still safe, mark the envelope `conflict` and surface it in relay status.
- Workplan-file canon remains authoritative. After recovery, operators should
run `make fix-consistency REPO=state-hub` so file-backed task/workplan state
wins over stale queued task updates.
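As a rough illustration of the replace-style rule, the decision for a single
queued status patch might look like the sketch below. The function name
`decide_replay` and its return values are hypothetical, and "monotonic no-op"
is read here as "central already holds the queued value", which is an
assumption about the intended policy.

```python
from datetime import datetime


def decide_replay(expected_updated_at: datetime,
                  central_updated_at: datetime,
                  queued_status: str,
                  central_status: str) -> str:
    """Return "apply", "noop", or "conflict" for a queued replace-style write."""
    if central_status == queued_status:
        # Central already holds the queued value, so replay changes nothing.
        return "noop"
    if central_updated_at > expected_updated_at:
        # The central row changed after the client's last observation.
        return "conflict"
    # Central is unchanged since the client observed it; the write is safe.
    return "apply"
```

A `conflict` result would leave the envelope in the outbox for operator
review rather than dropping it.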
## T01 - Write Safety ADR and Route Inventory
```task
id: STATE-WP-0068-T01
status: done
priority: high
state_hub_task_id: "07aa2d43-0305-45ca-8b5a-bf6f96f716a9"
```
Create a short ADR or design doc that classifies State Hub write routes as
append-only, replace-style, supersedable, or online-only.
Deliverables:
- Route inventory generated from `api/routers/*` and MCP sanctioned writes.
- V1 safe-write allowlist with request/response examples.
- Conflict policy per route class.
- Explicit statement that queued receipts are pending evidence, not successful
central commits.
- Operator decision on the local relay port, default outbox location, and
retention window.
Done when implementation tasks can refer to a reviewed allowlist instead of
guessing route safety.
## T02 - Central Idempotency and Replay Acceptance
```task
id: STATE-WP-0068-T02
status: done
priority: high
state_hub_task_id: "f0060859-e9a7-441c-91cc-1e838c5ba60f"
```
Add central API support for idempotent replay.
Expected implementation:
- Migration for a `write_idempotency_keys` table storing key, method, path,
request hash, response status/body, source host/agent, first seen, last seen,
and expiry.
- Middleware or route dependency that accepts `Idempotency-Key` on allowlisted
write endpoints.
- Same-key/same-request replay returns the original response.
- Same-key/different-request returns HTTP 409.
- Replay metadata is available for diagnostics without logging request secrets.
- Tests cover success, retry, hash mismatch, expiry, and unsupported routes.
Done when append-only writes can be retried after a transport failure without
duplicating central records.
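The same-key rules above can be sketched with an in-memory stand-in for the
`write_idempotency_keys` table. This is not the middleware itself:
`IdempotencyStore`, `handle`, and the `commit` callback are hypothetical names,
and expiry and persistence are left out for brevity.

```python
import hashlib
import json


class IdempotencyStore:
    """In-memory stand-in for the write_idempotency_keys table (assumption)."""

    def __init__(self):
        self._rows = {}  # key -> (request_hash, status, response_body)

    @staticmethod
    def request_hash(method, path, body):
        # Canonical JSON so that dict key order does not change the hash.
        canonical = json.dumps([method.upper(), path, body], sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def handle(self, key, method, path, body, commit):
        """Run commit(body) once per key; replay or reject later attempts."""
        digest = self.request_hash(method, path, body)
        if key in self._rows:
            stored_hash, status, response = self._rows[key]
            if stored_hash != digest:
                # Same key, different request: reject with HTTP 409.
                return 409, {"detail": "idempotency key reused with a different request"}
            # Same key, same request: return the original response.
            return status, response
        status, response = commit(body)
        self._rows[key] = (digest, status, response)
        return status, response
```

Storing the response alongside the hash is what lets a retry after a lost
reply see the original result instead of creating a second record.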
## T03 - Durable Local Outbox Store
```task
id: STATE-WP-0068-T03
status: done
priority: high
state_hub_task_id: "6897dd71-6252-4eed-bb0c-350e8c566b3b"
```
Implement a local SQLite-backed outbox module used by the relay and CLI.
Minimum schema:
- envelope id and idempotency key
- method, path, scrubbed JSON body, route class
- source agent, source host, repo slug, session id when known
- observed revision fields for conflict checks
- status: `queued`, `sending`, `acked`, `conflict`, `dead`, `cancelled`
- attempt count, next retry time, last error, central response summary
- created, updated, acked timestamps
Safety requirements:
- Create the DB with owner-only permissions where the platform supports it.
- Never persist authorization headers, API keys, bearer tokens, cookies, or
secret-looking fields.
- Cap payload size and reject large opaque bodies.
- Provide export/import of non-secret envelopes for operator debugging.
Done when unit tests prove enqueue, status transitions, coalescing metadata,
scrubbing, and corruption-safe startup behavior.
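A minimal sketch of such an outbox, using only the Python standard library, is
shown below. The table layout, the `SECRET_HINTS` list, and the `scrub` and
`enqueue` helpers are assumptions made for illustration; the real schema has
more columns (observed revisions, retry times, source metadata), and file
permissions are not handled here.

```python
import json
import sqlite3
import uuid

# Hypothetical list of key fragments treated as secret-looking. A bare "token"
# is deliberately not included so token-event counters are not scrubbed.
SECRET_HINTS = ("authorization", "api_key", "apikey", "bearer",
                "password", "secret", "cookie", "access_token")

SCHEMA = """
CREATE TABLE IF NOT EXISTS outbox (
    id TEXT PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,
    method TEXT NOT NULL,
    path TEXT NOT NULL,
    body TEXT NOT NULL,
    route_class TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    attempts INTEGER NOT NULL DEFAULT 0,
    last_error TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def scrub(value):
    """Recursively drop dict entries whose key looks like a secret."""
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()
                if not any(hint in k.lower() for hint in SECRET_HINTS)}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value


def enqueue(conn, method, path, body, route_class, max_bytes=64 * 1024):
    """Persist a scrubbed envelope and return (envelope_id, idempotency_key)."""
    payload = json.dumps(scrub(body), sort_keys=True)
    if len(payload.encode("utf-8")) > max_bytes:
        raise ValueError("payload too large to queue")
    envelope_id = str(uuid.uuid4())
    key = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO outbox (id, idempotency_key, method, path, body, route_class) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (envelope_id, key, method, path, payload, route_class),
        )
    return envelope_id, key
```

Scrubbing happens before serialization, so a secret-looking field never
reaches the database file.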
## T04 - Edge Relay HTTP Surface
```task
id: STATE-WP-0068-T04
status: done
priority: high
state_hub_task_id: "deb883df-b312-4e8f-b559-718bb8a94035"
```
Create a local `statehub-edge` relay process that exposes a small HTTP surface.
Behavior:
- Online path: forward allowlisted writes to upstream and return the upstream
response.
- Offline path: enqueue allowlisted writes and return a clear queued receipt:
`{"queued": true, "outbox_id": "...", "idempotency_key": "...",
"upstream": "unreachable"}`.
- Online-only path during outage: return a deterministic error explaining that
the route is not queueable.
- Read path: proxy selected reads while online; optionally serve cached
`/state/summary` metadata with stale markers while offline.
- Health/status: expose relay health, upstream reachability, pending count,
oldest pending age, and conflict count.
Done when agents can point `API_BASE` at the relay and receive either the
normal REST shape or an explicit queued/error shape.
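The online, offline, and online-only paths can be sketched as one function with
injected collaborators. Everything here is hypothetical: `relay_write`, the
callback signatures, and the 202 and 503 status codes are illustrative choices,
not the relay's actual contract. Only the queued receipt fields follow the
shape given above.

```python
import uuid


def relay_write(method, path, body, send_upstream, enqueue, is_queueable):
    """Forward a write upstream, or queue it when upstream is unreachable."""
    key = str(uuid.uuid4())
    try:
        # Online path: return whatever the upstream API returned.
        return send_upstream(method, path, body, key)
    except ConnectionError:
        if not is_queueable(method, path):
            # Online-only route during an outage: deterministic error.
            return 503, {
                "queued": False,
                "error": "route is not queueable while offline",
                "upstream": "unreachable",
            }
        # Offline path: persist the envelope and return a queued receipt.
        outbox_id = enqueue(method, path, body, key)
        return 202, {
            "queued": True,
            "outbox_id": outbox_id,
            "idempotency_key": key,
            "upstream": "unreachable",
        }
```

The same idempotency key is used for the first upstream attempt and for the
queued envelope, so a replay after a half-completed request can be
deduplicated centrally.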
## T05 - Replay Worker and Conflict Handling
```task
id: STATE-WP-0068-T05
status: done
priority: high
state_hub_task_id: "6c3916c1-4a9f-4b1d-a8b1-a356a6edf3db"
```
Implement the replay loop.
Requirements:
- Exponential backoff with jitter for transport failures.
- Single-flight sending per envelope.
- Preserve per-entity order for replace-style writes.
- Coalesce superseded task/workplan status writes before replay when safe.
- Use `Idempotency-Key` for every replayed write.
- Mark conflicts without dropping the original envelope.
- Provide commands to retry, cancel, or mark-dead individual envelopes.
Done when an integration test can simulate central outage, enqueue writes,
restore central service, replay successfully, and surface one intentionally
stale task update as a conflict.
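Two of the requirements above, backoff with jitter and coalescing of superseded
writes, can be sketched as follows. The helper names, the "full jitter"
variant, and the envelope dict shape are assumptions. Coalescing returns the
superseded envelopes instead of discarding them so local audit entries can be
preserved.

```python
import random


def backoff_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def coalesce(envelopes):
    """Split queued envelopes into (to_send, superseded).

    For replace-style writes, only the latest envelope per (method, path)
    is sent. Append-only envelopes are always sent. Input order is kept.
    """
    latest = {}
    for index, env in enumerate(envelopes):
        if env["route_class"] == "replace":
            latest[(env["method"], env["path"])] = index
    to_send, superseded = [], []
    for index, env in enumerate(envelopes):
        is_replace = env["route_class"] == "replace"
        if is_replace and latest[(env["method"], env["path"])] != index:
            superseded.append(env)
        else:
            to_send.append(env)
    return to_send, superseded
```

Because input order is kept in `to_send`, per-entity ordering of the remaining
replace-style writes is unchanged by coalescing.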
## T06 - MCP and Agent UX Integration
```task
id: STATE-WP-0068-T06
status: done
priority: high
state_hub_task_id: "8ccac4f9-f457-4f87-9195-1d8619043c0f"
```
Update MCP tooling and agent-facing docs so offline buffering is usable without
surprise.
Expected changes:
- MCP write helpers recognize relay queued receipts and return them clearly.
- Automatic progress-event side effects do not duplicate queued primary writes.
- Session-close guidance says to check relay status when writes were queued.
- `mcp_server/TOOLS.md` documents online, queued, and conflict outcomes.
- Repo `AGENTS.md` template can point agents at the relay when enabled.
Done when an agent can complete a session during a central outage, see that the
progress write is queued, and verify later that it was replayed.
## T07 - Operator Observability
```task
id: STATE-WP-0068-T07
status: done
priority: medium
state_hub_task_id: "62c0ca4f-b3e2-49f7-ba70-365016195e83"
```
Expose pending offline writes to humans and automations.
Deliverables:
- CLI commands: `statehub outbox status`, `statehub outbox list`,
`statehub outbox replay`, `statehub outbox export`.
- Optional dashboard panel or docs page showing edge relay health, if the
dashboard can reach the relay.
- Prometheus-style or JSON metrics for pending count, oldest age, replay
failures, and conflicts.
- Progress event after replay recovery summarizing non-secret results.
Done when the operator can see whether any host still has unsent State Hub
writes before declaring an outage recovered.
## T08 - Chaos and Regression Test Suite
```task
id: STATE-WP-0068-T08
status: done
priority: high
state_hub_task_id: "2a12614f-8923-45b1-b8e9-ad8c818b23d3"
```
Add tests that make offline behavior boring.
Coverage:
- Unit tests for route allowlist, payload scrubbing, idempotency hash behavior,
outbox state transitions, and coalescing decisions.
- Integration test with a fake upstream returning connection errors, 5xx, 409,
and success.
- End-to-end test for MCP write through relay during outage and replay.
- Drill script that can be run locally without touching production data.
Done when CI can prove no duplicate append-only records are produced across
retry and no replace-style conflict is silently applied.
## T09 - Runbooks, Cutover, and Recovery Drill
```task
id: STATE-WP-0068-T09
status: done
priority: medium
state_hub_task_id: "fedea85e-c720-4814-9691-affa6c944954"
```
Document and rehearse the operator workflow.
Runbook content:
- How to start the relay on an operator workstation or agent host.
- How to configure MCP/REST clients to use the relay.
- What queued receipts mean during session close.
- How to inspect, replay, export, cancel, and resolve conflicted envelopes.
- Recovery checklist after central State Hub returns.
- Interaction with `make fix-consistency REPO=state-hub`.
Done when a controlled drill queues at least one progress event and one task
status update during a forced outage, replays the progress event, flags or
applies the task update according to the conflict policy, and records the
results without exposing secrets.
## Dependencies and References
- `CUST-WP-0011` - pragmatic railiance01 State Hub migration.
- `CUST-WP-0038` - long-term ThreePhoenix HA State Hub target.
- `STATE-WP-0059` - MCP write-layer reliability and explicit API failure
handling.
- `STATE-WP-0066` - summary cache and stale-while-revalidate for read paths.
- `docs/activity-core-delegation.md` - JetStream buffering covers State Hub to
activity-core events after commit; this work covers agent to State Hub writes
before commit.
- `mcp_server/TOOLS.md` - current MCP/REST parity and failure handling contract.
After this workplan is synced, run:
```bash
make fix-consistency REPO=state-hub
```
## Implementation Notes
Completed 2026-06-23. The implementation provides the first full offline-write
buffering path:
- Central idempotency support through `WriteIdempotencyMiddleware`, the
`write_idempotency_keys` model, and migration `e9f0a1b2c3d4`. Exact duplicate
writes replay the original response; same key with a different request returns
HTTP 409.
- Shared route classification for queueable append and replace-style writes.
- Local SQLite outbox with payload scrubbing, payload size limits, private file
permissions where supported, status transitions, retry/cancel/export support,
and latest replace-write coalescing.
- State Hub edge relay app with online forwarding, offline queue receipts,
health/status, replay endpoint, and replay worker.
- `statehub outbox` CLI commands for status, list, export, replay, retry, and
cancel.
- MCP queued receipt handling so queued primary writes do not trigger automatic
progress side effects.
- Operator documentation in `docs/offline-write-buffer.md`, MCP tool docs, and the
Codex agent instruction template.
Verification:
- Focused suite: 22 passed in 19.51s.
- Full suite: 446 passed, 1 warning in 287.20s. The warning was a SQLAlchemy
`RuntimeWarning` in `tests/test_summary_cache.py` and did not correspond to a
failing assertion.
- Syntax checks passed for the new and touched Python modules.
- `git diff --check` passed.