Compare commits

3 Commits

Author SHA1 Message Date
eacfccdffd Add Service DoM dashboard policy page
Mirror the repo-doi/workstream-dod Observable policy pages for service-dom:
read/edit view backed by GET/PUT /policy/service-dom. Add it to the Policies
nav section and the State Hub reference doc. Builds clean (62 pages).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 16:05:04 +02:00
e4126bc755 Add Service Definition of Mature policy and health-route test
Establish policies/service-dom.md as the service-level companion to the repo
DoI and workstream DoD. Its load-bearing Core criterion is a cheap, side-effect
free health endpoint for availability probing — satisfied by the existing
GET /state/health (DB readiness, 200/503). Served automatically at
/policy/service-dom by the existing policy router.

Add a regression test asserting /state/health returns 200 with the expected
shape, since none existed (DoM Standard criterion #4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 15:58:51 +02:00
044141de48 Add STATE-WP-0061 demand-weighted suggestion backlog workplan
Proposed plan (status: proposed) for a Suggestion entity with a persisted
relevance/demand counter feeding a WSJF read-model projection. Authored during
ops-warden WP-0012 triage; tracks gated needs as relevance-accruing suggestions
rather than inert todo tasks.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 15:58:36 +02:00
6 changed files with 389 additions and 0 deletions

@@ -44,6 +44,7 @@ export default {
open: false,
pages: [
{ name: "Repository DoI", path: "/policy/repo-doi" },
{ name: "Service DoM", path: "/policy/service-dom" },
{ name: "Workstream DoD", path: "/policy/workstream-dod" },
],
},

@@ -265,5 +265,6 @@ why, even years later.
- [Connecting to the Hub](/docs/connecting)
- [Repo Integration](/docs/repo-integration)
- [Repository DoI](/policy/repo-doi) — Definition of Integrated
- [Service DoM](/policy/service-dom) — Definition of Mature
- [TPSC](/docs/tpsc) — Third-Party Services Catalog
- [SBOM](/docs/sbom)

@@ -0,0 +1,90 @@
---
title: Service Definition of Mature (DoM)
---
```js
import {API} from "../components/config.js";
```
```js
import {marked} from "npm:marked";
const _resp = await fetch(`${API}/policy/service-dom`);
if (!_resp.ok) throw new Error(`Failed to load policy: ${_resp.status}`);
const _policy = await _resp.json();
```
```js
let _content = _policy.content;
let _editing = false;
const _root = display(html`<div></div>`);
async function _save(text) {
const r = await fetch(`${API}/policy/service-dom`, {
method: "PUT",
headers: {"Content-Type": "application/json"},
body: JSON.stringify({content: text}),
});
if (!r.ok) throw new Error(`Save failed: ${r.status}`);
_content = text;
}
function _toolbar(...nodes) {
return html`<div style="display:flex;gap:0.5rem;margin-bottom:1rem">${nodes}</div>`;
}
function _btn(label, primary = false) {
return html`<button style="
padding:0.35rem 0.9rem;border-radius:4px;cursor:pointer;font-size:13px;
background:${primary ? "#1e293b" : "#f1f5f9"};
color:${primary ? "#f8fafc" : "#1e293b"};
border:1px solid ${primary ? "#1e293b" : "#cbd5e1"};
">${label}</button>`;
}
function _render() {
_root.innerHTML = "";
if (_editing) {
const area = html`<textarea style="
width:100%;box-sizing:border-box;height:520px;
font-family:ui-monospace,monospace;font-size:13px;line-height:1.6;
padding:0.75rem;border:1px solid #cbd5e1;border-radius:4px;
background:#f8fafc;color:#1e293b;resize:vertical;
">${_content}</textarea>`;
const saveBtn = _btn("Save", true);
const cancelBtn = _btn("Cancel");
saveBtn.onclick = async () => {
saveBtn.disabled = true;
saveBtn.textContent = "Saving…";
try {
await _save(area.value);
_editing = false;
_render();
} catch (e) {
saveBtn.disabled = false;
saveBtn.textContent = "Save";
alert(e.message);
}
};
cancelBtn.onclick = () => { _editing = false; _render(); };
_root.append(_toolbar(saveBtn, cancelBtn), area);
} else {
const editBtn = _btn("Edit");
editBtn.onclick = () => { _editing = true; _render(); };
const body = html`<div style="max-width:720px;line-height:1.7;"></div>`;
body.innerHTML = marked.parse(_content);
_root.append(_toolbar(editBtn), body);
}
}
_render();
```

policies/service-dom.md

@@ -0,0 +1,106 @@
# Service Definition of Mature (DoM)
A long-running **service** (an HTTP API, MCP server, worker, or daemon — as
opposed to a repository or a workstream) is considered **mature** when all
criteria below are satisfied. This is the service-level companion to the
Repository *Definition of Integrated* (`repo-doi.md`) and the Workstream
*Definition of Done* (`workstream-dod.md`).
Criteria are grouped by tier: a service that meets all **Core** criteria is
*operable*; meeting **Standard** criteria makes it *observable*; meeting
**Full** criteria makes it *mature*.
---
## Tier 1 — Core (Operable)
The minimum for a service to be run and reasoned about by agents and operators.
- [ ] **Health endpoint** — the service exposes an unauthenticated health route
that allows efficient "is the service available?" probing **without** running
business logic. It returns `200` with a small JSON body
(`{"status": "ok", ...}`) when ready, and a non-2xx (e.g. `503`) when a hard
dependency is unavailable. For the State Hub API this is
`GET /state/health`, which also reports DB connectivity (`{"db": "connected"}`).
Agents should probe this **before** assuming the service is offline (see the
session protocol fallback in `CLAUDE.md`).
- [ ] **Start command documented** — a single documented command brings the
service up from a clean checkout (for the State Hub API: `make api`, with
`make db` first if Postgres is not running).
- [ ] **Bound address known** — the listen host/port is fixed and documented
(State Hub API: `http://127.0.0.1:8000`; remote via ops-bridge:
`http://127.0.0.1:18000`).
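
As an illustration of the probing rule above, here is a minimal client-side sketch using only the Python standard library. The function names and the result labels (`available`, `degraded`, `unknown`, `offline`) are hypothetical and not part of the State Hub codebase; only the route, the default address, and the `200`/`503` semantics come from this policy.

```python
import json
import urllib.error
import urllib.request


def classify_health(status_code, body_text):
    """Map a health response to a coarse availability label (labels are assumptions)."""
    if status_code == 200:
        try:
            body = json.loads(body_text)
        except ValueError:
            return "unknown"  # 200 without a JSON body is not the documented shape
        if isinstance(body, dict) and body.get("status") == "ok":
            return "available"
        return "unknown"
    if status_code == 503:
        return "degraded"  # service is up but a hard dependency is unavailable
    return "unknown"


def probe(base_url="http://127.0.0.1:8000", timeout=2.0):
    """Probe GET /state/health; return 'offline' only when nothing answers."""
    url = f"{base_url}/state/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # Non-2xx answers still prove the process is listening.
        return classify_health(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        return "offline"
```

An agent would call `probe()` once and fall back to the offline protocol only on `"offline"`; a `"degraded"` result means the process is reachable and the dependency is the problem.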
---
## Tier 2 — Standard (Observable)
The service can be monitored and integrated by other agents and tooling.
- [ ] **Health route is tested** — an automated test asserts the health route
returns a success status and the expected shape, so regressions that silently
make the service un-probeable are caught.
- [ ] **Dependencies declared** — external service dependencies are declared in
`tpsc.yaml` and ingested (`make ingest-tpsc REPO={slug}`); an empty
`services: []` is used when there are none, to make the absence explicit.
- [ ] **Remote reachability path** — if the service is consumed across machines,
the tunnel/bridge route is documented (ops-bridge port map) and the health
endpoint is reachable over it.
- [ ] **Graceful dependency failure** — when a hard dependency (DB, broker) is
down, the service reports it via the health route rather than crashing or
hanging callers.
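
The graceful-failure criterion can be sketched as a framework-independent handler body. This is an assumption-laden illustration, not the State Hub implementation: `health_response` and `ping_db` are hypothetical names, and the `503` body shape is invented here because this policy only specifies the success body.

```python
def health_response(ping_db):
    """Return (status_code, body) for a health route.

    ping_db is a zero-argument callable that raises when the database is
    unreachable. A real check should also carry a short timeout so the
    probe itself cannot hang callers.
    """
    try:
        ping_db()
    except Exception:
        # Report the failed dependency instead of letting the error crash the route.
        # The failure body below is an assumed shape.
        return 503, {"status": "unavailable", "db": "unreachable"}
    return 200, {"status": "ok", "db": "connected"}
```

Keeping the decision in a plain function like this makes both outcomes easy to assert in a unit test without starting a database.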
---
## Tier 3 — Full (Mature)
The service participates safely in the wider ecosystem over time.
- [ ] **Versioned interface** — breaking interface changes are published via the
interface-change tracker (`publish_interface_change`) so consumers are warned.
- [ ] **Authn/authz boundary documented** — which routes are public (e.g. health)
versus authenticated is explicit, and credential needs route through the
standard channels (`credential-routing.md` / `warden route`).
- [ ] **Recovery documented** — the runbook for restart and for restoring a
failed dependency is captured (for the State Hub API: `make db` then
`make api`; consistency repair via `make fix-consistency`).
- [ ] **Progress/telemetry on lifecycle** — significant lifecycle events
(deploys, migrations, outages) are recorded so the hub reflects service state.
---
## Maturity Checklist (Quick Reference)
| # | Criterion | Tier | Verified by |
|---|---|---|---|
| 1 | Health endpoint | Core | `curl -s $BASE/state/health` → `200`, `{"status":"ok"}` |
| 2 | Start command documented | Core | `make api` from clean checkout |
| 3 | Bound address known | Core | docs / `CLAUDE.md` |
| 4 | Health route is tested | Standard | `tests/` asserts health route |
| 5 | Dependencies declared | Standard | `make ingest-tpsc` |
| 6 | Remote reachability path | Standard | ops-bridge health probe |
| 7 | Graceful dependency failure | Standard | health returns `503` when DB down |
| 8 | Versioned interface | Full | `publish_interface_change` |
| 9 | Authn/authz boundary documented | Full | docs review |
| 10 | Recovery documented | Full | runbook present |
| 11 | Lifecycle telemetry | Full | `add_progress_event` on lifecycle |
---
## Notes
- The DoM is enforced by convention, not by automated gates.
- The **health endpoint** (Core #1) is the load-bearing criterion: it is what
lets agents and monitors distinguish *"service down"* from *"service up but
the request is wrong,"* cheaply and without side effects.
- "Service" here means a process exposing an interface over its lifetime — the
State Hub API and the FastMCP server each qualify. A one-shot CLI or a
migration script is **not** a service and is out of scope for the DoM.

@@ -1512,3 +1512,20 @@ class TestFabricGraphReadModel:
summary = r.json()
assert summary["schema_version"] is None
assert summary["nodes_by_fabric"] == {}
# ---------------------------------------------------------------------------
# Health route — Service Definition of Mature (policies/service-dom.md), Core #1
# ---------------------------------------------------------------------------
async def test_health_route_reports_ok_when_db_reachable(client):
"""The health endpoint is a cheap availability probe with no business logic.
It must return 200 and a small JSON body so agents and monitors can tell
"service available" from "request wrong" without side effects.
"""
r = await client.get("/state/health")
assert r.status_code == 200, r.text
body = r.json()
assert body["status"] == "ok"
assert body["db"] == "connected"

@@ -0,0 +1,174 @@
---
id: STATE-WP-0061
type: workplan
title: "Demand-weighted suggestion backlog (relevance-fed WSJF)"
domain: custodian
repo: state-hub
status: proposed
owner: codex
topic_slug: custodian
created: "2026-06-18"
updated: "2026-06-18"
---
# STATE-WP-0061 — Demand-weighted suggestion backlog (relevance-fed WSJF)
**Origin:** ops-warden WP-0012 triage (2026-06-18). Most WP-0012 tasks are gated
on external owners shipping paths that do not exist yet. They should not sit as
inert `todo` tasks, nor be fabricated as active entries. They should live as
**suggestions that accrue demand pressure** every time they are needed-but-unmet,
so in-demand work is promoted to real tasks first.
## Problem
The hub today cannot represent "a need that has been raised but not yet vetted or
scheduled, whose urgency grows with repeated demand." Concretely:
- `NextStep` suggestions are **derived on the fly and never persisted**
(`api/schemas/state.py`), so they cannot accumulate anything.
- There is **no relevance/demand counter** on any entity.
- There is **no suggestion → vetted requirement → task** promotion pipeline.
`CapabilityRequest` is the closest analog but models cross-domain brokering, and
a repeat need spawns a *new* request rather than bumping demand on an existing one.
- The **WSJF triage is advisory** (activity-core `daily-statehub-wsjf-triage`,
CUST-WP-0044) and consumes current summary/workplan/progress state — **no
persisted demand signal feeds it**, and the hub holds no Cost-of-Delay / Job-Size
data model.
## Approach (decided)
- **New `Suggestion` entity** (not an extension of `CapabilityRequest`/`TechnicalDebt`).
Stages: `suggestion` → `requirement` (vetted + structured) → `promoted` (became a
Task); plus terminal `declined`. Append-only `SuggestionNote` trail (mirrors
`TDNote`) records vetting. `promoted_task_id` links the resulting `Task`.
- **Relevance is a persisted demand counter.** It increments whenever the
suggestion is *needed but not yet done* (defined in T3). `relevance` +
`last_requested_at` + a `relevance_events` count drive ranking.
- **WSJF is computed in the hub as a read-model projection** and exposed via a
ranked endpoint; the existing activity-core daily triage **consumes** it.
`wsjf = cost_of_delay / job_size`, where
`cost_of_delay = base_value + (relevance_weight × relevance)` so repeated demand
raises priority. Job size is an operator-set estimate (default medium).
- Promotion keeps the active task backlog clean: gated needs (the WP-0012 case)
live as relevance-accruing suggestions, **not** as inert `todo` tasks or
fabricated active catalog entries.
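
A worked example of the formula above, as a minimal sketch. The numeric values are illustrative assumptions only; the workplan leaves the `relevance_weight` default and the job-size scale open.

```python
def wsjf(base_value, relevance, relevance_weight, job_size):
    """wsjf = cost_of_delay / job_size, with demand folded into cost of delay."""
    if job_size <= 0:
        raise ValueError("job_size must be positive")
    cost_of_delay = base_value + relevance_weight * relevance
    return cost_of_delay / job_size


# Same suggestion before and after four unmet requests (values are made up).
fresh = wsjf(base_value=5.0, relevance=0, relevance_weight=2.0, job_size=5.0)
# 5.0 / 5.0 = 1.0
in_demand = wsjf(base_value=5.0, relevance=4, relevance_weight=2.0, job_size=5.0)
# (5.0 + 2.0 * 4) / 5.0 = 2.6
```

With everything else held constant, each bump adds `relevance_weight / job_size` to the score, so repeated demand moves small jobs up the ranking faster than large ones.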
> **Read-model boundary (ADR-001 / hub design):** suggestion writes are a new
> sanctioned write surface alongside `resolve_decision` and `get_next_steps`.
> `bump_relevance`, `promote_suggestion_to_task`, and vetting transitions are the
> only writes; ranking/WSJF are pure projections. T6 records the ADR amendment.
## Open questions (resolve during T1/T4, do not block proposal)
- Exact WSJF cost-of-delay decomposition (single `base_value` vs SAFe triple of
business-value / time-criticality / risk-reduction). Start with `base_value`
+ relevance; leave room to split later.
- `relevance_weight` default and whether relevance should decay over time
(staleness) — model the field now, tune in T4.
---
## Tasks
### T1 — Suggestion data model + migration
```task
id: STATE-WP-0061-T01
status: todo
priority: high
```
- [ ] `api/models/suggestion.py`: `Suggestion` (id, domain_id, topic_id?,
workstream_id?, title, description, origin, stage, relevance,
relevance_events, last_requested_at, base_value, job_size,
relevance_weight, promoted_task_id) + `SuggestionNote` (append-only trail).
- [ ] `SuggestionStage` enum: `suggestion | requirement | promoted | declined`.
- [ ] Alembic migration; register model in `api/models/__init__.py`.
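
The field list above could take roughly the following shape. This sketch uses a plain dataclass rather than the ORM model the migration would target; the column types and every default value are assumptions, and only the field and stage names come from this workplan.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class SuggestionStage(str, Enum):
    SUGGESTION = "suggestion"
    REQUIREMENT = "requirement"
    PROMOTED = "promoted"
    DECLINED = "declined"


@dataclass
class Suggestion:
    # Required identity and description fields.
    id: str
    domain_id: str
    title: str
    description: str
    origin: str
    # Optional links (marked with "?" in the task list).
    topic_id: Optional[str] = None
    workstream_id: Optional[str] = None
    # Lifecycle and demand tracking.
    stage: SuggestionStage = SuggestionStage.SUGGESTION
    relevance: int = 0
    relevance_events: int = 0
    last_requested_at: Optional[datetime] = None
    # WSJF inputs; the defaults here are placeholders, not decided values.
    base_value: float = 1.0
    job_size: float = 3.0
    relevance_weight: float = 1.0
    promoted_task_id: Optional[str] = None
```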
### T2 — API + MCP sanctioned write layer
```task
id: STATE-WP-0061-T02
status: todo
priority: high
```
- [ ] REST + MCP: `create_suggestion`, `vet_suggestion` (→ requirement, with
structured fields + note), `decline_suggestion`, `promote_suggestion_to_task`
(creates a `Task`, sets `promoted_task_id`, stage→promoted), and `list/get`.
- [ ] `bump_relevance(id, reason)` — sanctioned write; appends a relevance event,
increments counter, sets `last_requested_at`.
- [ ] Document these as sanctioned writes (alongside `resolve_decision`).
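
The stage changes implied by these writes can be sketched as a small transition table. Two points are assumptions not settled by this workplan: that `declined` is reachable from both open stages, and that a suggestion cannot be promoted without first being vetted. The names `ALLOWED`, `IllegalTransition`, and `transition` are hypothetical.

```python
# Legal stage moves; terminal stages map to an empty set.
ALLOWED = {
    "suggestion": {"requirement", "declined"},   # vet_suggestion / decline_suggestion
    "requirement": {"promoted", "declined"},     # promote_suggestion_to_task / decline
    "promoted": set(),
    "declined": set(),
}


class IllegalTransition(ValueError):
    """Raised when a write would move a suggestion to a stage it cannot reach."""


def transition(current, target):
    """Return the new stage, or raise IllegalTransition for a disallowed move."""
    if target not in ALLOWED.get(current, set()):
        raise IllegalTransition(f"{current} -> {target} is not allowed")
    return target
```

Routing every write through one function like `transition` gives the T6 test for illegal moves a single place to assert against.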
### T3 — Relevance emission wiring ("needed but not done")
```task
id: STATE-WP-0061-T03
status: todo
priority: high
```
- [ ] Define the demand events that bump relevance: (a) `get_next_steps` /
dependency lookup resolves to an open suggestion/requirement; (b) a
`CapabilityRequest` matches an unfulfilled suggestion; (c) an explicit agent
bump when it hits a gap (the WP-0012 routing-scenario case).
- [ ] Wire (a) and (b) in-hub; expose (c) via the MCP write from T2.
- [ ] Idempotency/debounce so a single lookup does not double-count.
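
One possible reading of the debounce requirement is a per-reason time window, sketched below in memory. The five-minute window, the choice of `reason` as the debounce key, and the class name are all assumptions; the workplan also does not say how `relevance` and `relevance_events` differ, so this sketch simply increments both.

```python
from datetime import datetime, timedelta

DEBOUNCE = timedelta(minutes=5)  # assumed window, to be tuned


class RelevanceCounter:
    """In-memory stand-in for the persisted demand fields on a suggestion."""

    def __init__(self):
        self.relevance = 0
        self.relevance_events = 0
        self.last_requested_at = None
        self._last_by_reason = {}

    def bump(self, reason, now):
        """Count one demand event unless the same reason fired within DEBOUNCE."""
        last = self._last_by_reason.get(reason)
        if last is not None and now - last < DEBOUNCE:
            return False  # duplicate within the window: do not double-count
        self._last_by_reason[reason] = now
        self.relevance += 1
        self.relevance_events += 1
        self.last_requested_at = now
        return True
```

Passing `now` in explicitly keeps the function deterministic, which makes the idempotency test in T6 straightforward to write.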
### T4 — WSJF projection + ranked endpoint
```task
id: STATE-WP-0061-T04
status: todo
priority: high
```
- [ ] Pure projection: `wsjf = (base_value + relevance_weight × relevance) / job_size`.
- [ ] `GET /suggestions?rank=wsjf` returns suggestions/requirements ordered by score
(promoted/declined excluded by default).
- [ ] Feed the activity-core daily triage: include the ranked suggestion list in
the `daily_triage` report input (coordinate with CUST-WP-0044 runner).
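
The ranked listing could be sketched as a pure function over already-loaded rows. The dict-based row shape and the `include_closed` flag are assumptions made for illustration; the real endpoint would read from the `Suggestion` model.

```python
OPEN_STAGES = {"suggestion", "requirement"}


def rank_by_wsjf(suggestions, include_closed=False):
    """Order rows by WSJF score, highest first; drop promoted/declined by default."""

    def score(row):
        cost_of_delay = row["base_value"] + row["relevance_weight"] * row["relevance"]
        return cost_of_delay / row["job_size"]

    rows = [s for s in suggestions if include_closed or s["stage"] in OPEN_STAGES]
    return sorted(rows, key=score, reverse=True)
```

Because the function neither mutates its input nor touches storage, it stays on the read-model side of the boundary described earlier in this workplan.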
### T5 — Dashboard surface
```task
id: STATE-WP-0061-T05
status: todo
priority: medium
```
- [ ] `/suggestions` page: ranked table (stage, relevance, WSJF, last requested),
with vet/promote/decline actions guarded to the sanctioned write layer.
- [ ] Link from `/wsjf-triage`; short `src/docs/suggestions.md`.
### T6 — Tests, docs, ADR amendment
```task
id: STATE-WP-0061-T06
status: todo
priority: medium
```
- [ ] Tests: model + migration, relevance bump idempotency, WSJF ordering,
promotion creates a linked task, stage transitions reject illegal moves.
- [ ] SCOPE/INTENT note; amend the read-model ADR to list the new sanctioned writes.
- [ ] Backfill example: register the gated WP-0012 routing scenarios as suggestions.
---
## Acceptance
- A need can be recorded as a `suggestion`, vetted into a `requirement`, and
promoted into a real `Task` — with the demand trail preserved.
- Each unmet lookup increments `relevance`; higher relevance raises WSJF, and
`GET /suggestions?rank=wsjf` reflects the new order.
- The daily WSJF triage report includes the ranked suggestion backlog.
- Gated work (WP-0012-style) lives as a relevance-accruing suggestion, never as an
inert `todo` task or a fabricated active catalog entry.
## See also
- `STATE-WP-0053` — WSJF triage review page (consumer surface)
- `CUST-WP-0044` — activity-core daily triage runner (producer; cross-repo seam)
- `api/models/capability_request.py`, `api/models/technical_debt.py` — prior art
- ops-warden `WARDEN-WP-0012` — the gated-backlog case that motivated this