generated from coulomb/repo-seed
Compare commits: af2972a460...eacfccdffd (3 commits)

| Author | SHA1 | Date |
|---|---|---|
| | eacfccdffd | |
| | e4126bc755 | |
| | 044141de48 | |
@@ -44,6 +44,7 @@ export default {
      open: false,
      pages: [
        { name: "Repository DoI", path: "/policy/repo-doi" },
        { name: "Service DoM", path: "/policy/service-dom" },
        { name: "Workstream DoD", path: "/policy/workstream-dod" },
      ],
    },

@@ -265,5 +265,6 @@ why, even years later.
- [Connecting to the Hub](/docs/connecting)
- [Repo Integration](/docs/repo-integration)
- [Repository DoI](/policy/repo-doi) — Definition of Integrated
- [Service DoM](/policy/service-dom) — Definition of Mature
- [TPSC](/docs/tpsc) — Third-Party Services Catalog
- [SBOM](/docs/sbom)

90
dashboard/src/policy/service-dom.md
Normal file
@@ -0,0 +1,90 @@
---
title: Service Definition of Mature (DoM)
---

```js
import {API} from "../components/config.js";
```

```js
import {marked} from "npm:marked";

const _resp = await fetch(`${API}/policy/service-dom`);
if (!_resp.ok) throw new Error(`Failed to load policy: ${_resp.status}`);
const _policy = await _resp.json();
```

```js
let _content = _policy.content;
let _editing = false;

const _root = display(html`<div></div>`);

async function _save(text) {
  const r = await fetch(`${API}/policy/service-dom`, {
    method: "PUT",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify({content: text}),
  });
  if (!r.ok) throw new Error(`Save failed: ${r.status}`);
  _content = text;
}

function _toolbar(...nodes) {
  return html`<div style="display:flex;gap:0.5rem;margin-bottom:1rem">${nodes}</div>`;
}

function _btn(label, primary = false) {
  return html`<button style="
    padding:0.35rem 0.9rem;border-radius:4px;cursor:pointer;font-size:13px;
    background:${primary ? "#1e293b" : "#f1f5f9"};
    color:${primary ? "#f8fafc" : "#1e293b"};
    border:1px solid ${primary ? "#1e293b" : "#cbd5e1"};
  ">${label}</button>`;
}

function _render() {
  _root.innerHTML = "";

  if (_editing) {
    const area = html`<textarea style="
      width:100%;box-sizing:border-box;height:520px;
      font-family:ui-monospace,monospace;font-size:13px;line-height:1.6;
      padding:0.75rem;border:1px solid #cbd5e1;border-radius:4px;
      background:#f8fafc;color:#1e293b;resize:vertical;
    ">${_content}</textarea>`;

    const saveBtn = _btn("Save", true);
    const cancelBtn = _btn("Cancel");

    saveBtn.onclick = async () => {
      saveBtn.disabled = true;
      saveBtn.textContent = "Saving…";
      try {
        await _save(area.value);
        _editing = false;
        _render();
      } catch (e) {
        saveBtn.disabled = false;
        saveBtn.textContent = "Save";
        alert(e.message);
      }
    };

    cancelBtn.onclick = () => { _editing = false; _render(); };

    _root.append(_toolbar(saveBtn, cancelBtn), area);

  } else {
    const editBtn = _btn("Edit");
    editBtn.onclick = () => { _editing = true; _render(); };

    const body = html`<div style="max-width:720px;line-height:1.7;"></div>`;
    body.innerHTML = marked.parse(_content);

    _root.append(_toolbar(editBtn), body);
  }
}

_render();
```
106
policies/service-dom.md
Normal file
@@ -0,0 +1,106 @@
# Service Definition of Mature (DoM)

A long-running **service** (an HTTP API, MCP server, worker, or daemon — as
opposed to a repository or a workstream) is considered **mature** when all
criteria below are satisfied. This is the service-level companion to the
Repository *Definition of Integrated* (`repo-doi.md`) and the Workstream
*Definition of Done* (`workstream-dod.md`).

Criteria are grouped by tier: a service that meets all **Core** criteria is
*operable*; meeting **Standard** criteria makes it *observable*; meeting
**Full** criteria makes it *mature*.

---

## Tier 1 — Core (Operable)

The minimum for a service to be run and reasoned about by agents and operators.

- [ ] **Health endpoint** — the service exposes an unauthenticated health route
  that allows efficient "is the service available?" probing **without** running
  business logic. It returns `200` with a small JSON body
  (`{"status": "ok", ...}`) when ready, and a non-2xx (e.g. `503`) when a hard
  dependency is unavailable. For the State Hub API this is
  `GET /state/health`, which also reports DB connectivity (`{"db": "connected"}`).
  Agents should probe this **before** assuming the service is offline (see the
  session protocol fallback in `CLAUDE.md`).

- [ ] **Start command documented** — a single documented command brings the
  service up from a clean checkout (for the State Hub API: `make api`, with
  `make db` first if Postgres is not running).

- [ ] **Bound address known** — the listen host/port is fixed and documented
  (State Hub API: `http://127.0.0.1:8000`; remote via ops-bridge:
  `http://127.0.0.1:18000`).
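
The response contract in the health endpoint criterion can be sketched in plain Python, independent of any web framework. This is only an illustration, not the State Hub API's implementation: `build_health_response` and its `db_reachable` parameter are hypothetical names, and the failure body (`"unavailable"`, `"disconnected"`) is an assumption, since the policy only requires a non-2xx status in that case.

```python
from http import HTTPStatus


def build_health_response(db_reachable: bool) -> tuple[int, dict[str, str]]:
    """Return (status_code, json_body) for a health probe.

    No business logic runs here: the caller passes in the result of a cheap
    dependency check, so the probe stays free of side effects.
    """
    if db_reachable:
        # Shape taken from the policy text: status "ok", db "connected".
        return HTTPStatus.OK.value, {"status": "ok", "db": "connected"}
    # Assumed failure shape; the policy only asks for a non-2xx such as 503.
    return (
        HTTPStatus.SERVICE_UNAVAILABLE.value,
        {"status": "unavailable", "db": "disconnected"},
    )


ready_code, ready_body = build_health_response(db_reachable=True)
down_code, down_body = build_health_response(db_reachable=False)
```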

---

## Tier 2 — Standard (Observable)

The service can be monitored and integrated by other agents and tooling.

- [ ] **Health route is tested** — an automated test asserts the health route
  returns a success status and the expected shape, so regressions that take the
  service silently un-probeable are caught.

- [ ] **Dependencies declared** — external service dependencies are declared in
  `tpsc.yaml` and ingested (`make ingest-tpsc REPO={slug}`); an empty
  `services: []` is used when there are none, to make the absence explicit.

- [ ] **Remote reachability path** — if the service is consumed across machines,
  the tunnel/bridge route is documented (ops-bridge port map) and the health
  endpoint is reachable over it.

- [ ] **Graceful dependency failure** — when a hard dependency (DB, broker) is
  down, the service reports it via the health route rather than crashing or
  hanging callers.

---

## Tier 3 — Full (Mature)

The service participates safely in the wider ecosystem over time.

- [ ] **Versioned interface** — breaking interface changes are published via the
  interface-change tracker (`publish_interface_change`) so consumers are warned.

- [ ] **Authn/authz boundary documented** — which routes are public (e.g. health)
  versus authenticated is explicit, and credential needs route through the
  standard channels (`credential-routing.md` / `warden route`).

- [ ] **Recovery documented** — the runbook for restart and for restoring a
  failed dependency is captured (for the State Hub API: `make db` then
  `make api`; consistency repair via `make fix-consistency`).

- [ ] **Progress/telemetry on lifecycle** — significant lifecycle events
  (deploys, migrations, outages) are recorded so the hub reflects service state.

---

## Maturity Checklist (Quick Reference)

| # | Criterion | Tier | Verified by |
|---|---|---|---|
| 1 | Health endpoint | Core | `curl -s $BASE/state/health` → `200`, `{"status":"ok"}` |
| 2 | Start command documented | Core | `make api` from clean checkout |
| 3 | Bound address known | Core | docs / `CLAUDE.md` |
| 4 | Health route is tested | Standard | `tests/` asserts health route |
| 5 | Dependencies declared | Standard | `make ingest-tpsc` |
| 6 | Remote reachability path | Standard | ops-bridge health probe |
| 7 | Graceful dependency failure | Standard | health returns `503` when DB down |
| 8 | Versioned interface | Full | `publish_interface_change` |
| 9 | Authn/authz boundary documented | Full | docs review |
| 10 | Recovery documented | Full | runbook present |
| 11 | Lifecycle telemetry | Full | `add_progress_event` on lifecycle |

---

## Notes

- The DoM is enforced by convention, not by automated gates.
- The **health endpoint** (Core #1) is the load-bearing criterion: it is what
  lets agents and monitors distinguish *"service down"* from *"service up but
  the request is wrong,"* cheaply and without side effects.
- "Service" here means a process exposing an interface over its lifetime — the
  State Hub API and the FastMCP server each qualify. A one-shot CLI or a
  migration script is **not** a service and is out of scope for the DoM.
@@ -1512,3 +1512,20 @@ class TestFabricGraphReadModel:
        summary = r.json()
        assert summary["schema_version"] is None
        assert summary["nodes_by_fabric"] == {}


# ---------------------------------------------------------------------------
# Health route — Service Definition of Mature (policies/service-dom.md), Core #1
# ---------------------------------------------------------------------------

async def test_health_route_reports_ok_when_db_reachable(client):
    """The health endpoint is a cheap availability probe with no business logic.

    It must return 200 and a small JSON body so agents and monitors can tell
    "service available" from "request wrong" without side effects.
    """
    r = await client.get("/state/health")
    assert r.status_code == 200, r.text
    body = r.json()
    assert body["status"] == "ok"
    assert body["db"] == "connected"

174
workplans/STATE-WP-0061-demand-weighted-suggestion-backlog.md
Normal file
@@ -0,0 +1,174 @@
---
id: STATE-WP-0061
type: workplan
title: "Demand-weighted suggestion backlog (relevance-fed WSJF)"
domain: custodian
repo: state-hub
status: proposed
owner: codex
topic_slug: custodian
created: "2026-06-18"
updated: "2026-06-18"
---

# STATE-WP-0061 — Demand-weighted suggestion backlog (relevance-fed WSJF)

**Origin:** ops-warden WP-0012 triage (2026-06-18). Most WP-0012 tasks are gated
on external owners shipping paths that do not exist yet. They should not sit as
inert `todo` tasks, nor be fabricated as active entries. They should live as
**suggestions that accrue demand pressure** every time they are needed-but-unmet,
so in-demand work is promoted to real tasks first.

## Problem

The hub today cannot represent "a need that has been raised but not yet vetted or
scheduled, whose urgency grows with repeated demand." Concretely:

- `NextStep` suggestions are **derived on the fly and never persisted**
  (`api/schemas/state.py`), so they cannot accumulate anything.
- There is **no relevance/demand counter** on any entity.
- There is **no suggestion → vetted requirement → task** promotion pipeline.
  `CapabilityRequest` is the closest analog but models cross-domain brokering, and
  a repeat need spawns a *new* request rather than bumping demand on an existing one.
- The **WSJF triage is advisory** (activity-core `daily-statehub-wsjf-triage`,
  CUST-WP-0044) and consumes current summary/workplan/progress state — **no
  persisted demand signal feeds it**, and the hub holds no Cost-of-Delay / Job-Size
  data model.

## Approach (decided)

- **New `Suggestion` entity** (not an extension of `CapabilityRequest`/`TechnicalDebt`).
  Stages: `suggestion` → `requirement` (vetted + structured) → `promoted` (became a
  Task); plus terminal `declined`. Append-only `SuggestionNote` trail (mirrors
  `TDNote`) records vetting. `promoted_task_id` links the resulting `Task`.
- **Relevance is a persisted demand counter.** It increments whenever the
  suggestion is *needed but not yet done* (defined in T3). `relevance` +
  `last_requested_at` + a `relevance_events` count drive ranking.
- **WSJF is computed in the hub as a read-model projection** and exposed via a
  ranked endpoint; the existing activity-core daily triage **consumes** it.
  `wsjf = cost_of_delay / job_size`, where
  `cost_of_delay = base_value + (relevance_weight × relevance)` so repeated demand
  raises priority. Job size is an operator-set estimate (default medium).
- Promotion keeps the active task backlog clean: gated needs (the WP-0012 case)
  live as relevance-accruing suggestions, **not** as inert `todo` tasks or
  fabricated active catalog entries.
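
The WSJF projection described above can be illustrated with a small, pure Python sketch. It follows the stated formula, `wsjf = (base_value + relevance_weight × relevance) / job_size`; everything else is assumed for illustration: the `ScoredSuggestion` class, the numeric scales, and the default `relevance_weight` of `1.0` (the workplan leaves that default as an open question).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredSuggestion:
    """Hypothetical read-model row holding only the fields the formula needs."""

    title: str
    base_value: float
    relevance: int
    job_size: float
    relevance_weight: float = 1.0  # assumed default, not decided in the workplan


def wsjf(s: ScoredSuggestion) -> float:
    """Compute cost_of_delay / job_size, where repeated demand raises cost of delay."""
    if s.job_size <= 0:
        raise ValueError("job_size must be positive")
    cost_of_delay = s.base_value + s.relevance_weight * s.relevance
    return cost_of_delay / s.job_size


def rank(suggestions: list[ScoredSuggestion]) -> list[ScoredSuggestion]:
    """Order suggestions by descending WSJF without mutating the input."""
    return sorted(suggestions, key=wsjf, reverse=True)


quiet = ScoredSuggestion("quiet", base_value=5, relevance=0, job_size=2)
in_demand = ScoredSuggestion("in-demand", base_value=5, relevance=6, job_size=2)
large = ScoredSuggestion("large", base_value=8, relevance=0, job_size=8)

ordered = [s.title for s in rank([quiet, large, in_demand])]
```

With these made-up inputs, `quiet` and `in_demand` differ only in `relevance`, so the accumulated demand alone moves `in_demand` to the top of the ranking.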

> **Read-model boundary (ADR-001 / hub design):** suggestion writes are a new
> sanctioned write surface alongside `resolve_decision` and `get_next_steps`.
> `bump_relevance`, `promote_suggestion_to_task`, and vetting transitions are the
> only writes; ranking/WSJF are pure projections. T6 records the ADR amendment.

## Open questions (resolve during T1/T4, do not block proposal)

- Exact WSJF cost-of-delay decomposition (single `base_value` vs SAFe triple of
  business-value / time-criticality / risk-reduction). Start with `base_value`
  + relevance; leave room to split later.
- `relevance_weight` default and whether relevance should decay over time
  (staleness) — model the field now, tune in T4.

---

## Tasks

### T1 — Suggestion data model + migration

```task
id: STATE-WP-0061-T01
status: todo
priority: high
```

- [ ] `api/models/suggestion.py`: `Suggestion` (id, domain_id, topic_id?,
  workstream_id?, title, description, origin, stage, relevance,
  relevance_events, last_requested_at, base_value, job_size,
  relevance_weight, promoted_task_id) + `SuggestionNote` (append-only trail).
- [ ] `SuggestionStage` enum: `suggestion | requirement | promoted | declined`.
- [ ] Alembic migration; register model in `api/models/__init__.py`.
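
As a rough sketch of how the `SuggestionStage` enum and its legal transitions might look, the following uses only the standard library. The four stage values come from the task above; the transition table is an assumption (in particular, that `declined` is reachable from both `suggestion` and `requirement`, and that `promoted` is reachable only from `requirement`), and `transition` is a hypothetical helper name.

```python
from enum import Enum


class SuggestionStage(str, Enum):
    SUGGESTION = "suggestion"
    REQUIREMENT = "requirement"
    PROMOTED = "promoted"
    DECLINED = "declined"


# Assumed transition table: promoted and declined are treated as terminal.
ALLOWED_TRANSITIONS: dict[SuggestionStage, frozenset[SuggestionStage]] = {
    SuggestionStage.SUGGESTION: frozenset(
        {SuggestionStage.REQUIREMENT, SuggestionStage.DECLINED}
    ),
    SuggestionStage.REQUIREMENT: frozenset(
        {SuggestionStage.PROMOTED, SuggestionStage.DECLINED}
    ),
    SuggestionStage.PROMOTED: frozenset(),
    SuggestionStage.DECLINED: frozenset(),
}


def transition(current: SuggestionStage, target: SuggestionStage) -> SuggestionStage:
    """Return the new stage, or raise ValueError for an illegal move."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


stage = transition(SuggestionStage.SUGGESTION, SuggestionStage.REQUIREMENT)
stage = transition(stage, SuggestionStage.PROMOTED)
```

Keeping the table as data rather than branching logic makes the "stage transitions reject illegal moves" test in T6 a simple loop over pairs.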

### T2 — API + MCP sanctioned write layer

```task
id: STATE-WP-0061-T02
status: todo
priority: high
```

- [ ] REST + MCP: `create_suggestion`, `vet_suggestion` (→ requirement, with
  structured fields + note), `decline_suggestion`, `promote_suggestion_to_task`
  (creates a `Task`, sets `promoted_task_id`, stage→promoted), and `list/get`.
- [ ] `bump_relevance(id, reason)` — sanctioned write; appends a relevance event,
  increments counter, sets `last_requested_at`.
- [ ] Document these as sanctioned writes (alongside `resolve_decision`).

### T3 — Relevance emission wiring ("needed but not done")

```task
id: STATE-WP-0061-T03
status: todo
priority: high
```

- [ ] Define the demand events that bump relevance: (a) `get_next_steps` /
  dependency lookup resolves to an open suggestion/requirement; (b) a
  `CapabilityRequest` matches an unfulfilled suggestion; (c) an explicit agent
  bump when it hits a gap (the WP-0012 routing-scenario case).
- [ ] Wire (a) and (b) in-hub; expose (c) via the MCP write from T2.
- [ ] Idempotency/debounce so a single lookup does not double-count.
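
One possible shape for the idempotency/debounce requirement is sketched below in plain Python. It is an in-memory illustration only, not the planned persistence design: `DemandCounter`, the `event_key` argument (a caller-supplied identifier for the lookup that triggered the bump), and the five-minute window are all assumptions. Only `relevance` and `last_requested_at` are taken from the workplan's field list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DemandCounter:
    """Hypothetical in-memory stand-in for a suggestion's demand fields."""

    debounce: timedelta = timedelta(minutes=5)  # assumed window
    relevance: int = 0
    last_requested_at: Optional[datetime] = None
    _seen: dict[str, datetime] = field(default_factory=dict, repr=False)

    def bump(self, event_key: str, now: datetime) -> bool:
        """Count a demand event unless the same key was counted within the window.

        Returns True when the counter was incremented, False when debounced.
        """
        previous = self._seen.get(event_key)
        if previous is not None and now - previous < self.debounce:
            return False
        self._seen[event_key] = now
        self.relevance += 1
        self.last_requested_at = now
        return True


t0 = datetime(2026, 6, 18, 12, 0, tzinfo=timezone.utc)
counter = DemandCounter()
first = counter.bump("lookup-1", t0)
repeat = counter.bump("lookup-1", t0 + timedelta(minutes=1))   # same lookup, debounced
other = counter.bump("lookup-2", t0 + timedelta(minutes=1))    # different lookup counts
later = counter.bump("lookup-1", t0 + timedelta(minutes=10))   # outside the window
```

Passing `now` in explicitly, rather than reading the clock inside `bump`, keeps the debounce behavior deterministic in tests.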

### T4 — WSJF projection + ranked endpoint

```task
id: STATE-WP-0061-T04
status: todo
priority: high
```

- [ ] Pure projection: `wsjf = (base_value + relevance_weight × relevance) / job_size`.
- [ ] `GET /suggestions?rank=wsjf` returns suggestions/requirements ordered by score
  (promoted/declined excluded by default).
- [ ] Feed the activity-core daily triage: include the ranked suggestion list in
  the `daily_triage` report input (coordinate with CUST-WP-0044 runner).

### T5 — Dashboard surface

```task
id: STATE-WP-0061-T05
status: todo
priority: medium
```

- [ ] `/suggestions` page: ranked table (stage, relevance, WSJF, last requested),
  with vet/promote/decline actions guarded to the sanctioned write layer.
- [ ] Link from `/wsjf-triage`; short `src/docs/suggestions.md`.

### T6 — Tests, docs, ADR amendment

```task
id: STATE-WP-0061-T06
status: todo
priority: medium
```

- [ ] Tests: model + migration, relevance bump idempotency, WSJF ordering,
  promotion creates a linked task, stage transitions reject illegal moves.
- [ ] SCOPE/INTENT note; amend the read-model ADR to list the new sanctioned writes.
- [ ] Backfill example: register the gated WP-0012 routing scenarios as suggestions.

---

## Acceptance

- A need can be recorded as a `suggestion`, vetted into a `requirement`, and
  promoted into a real `Task` — with the demand trail preserved.
- Each unmet lookup increments `relevance`; higher relevance raises WSJF, and
  `GET /suggestions?rank=wsjf` reflects the new order.
- The daily WSJF triage report includes the ranked suggestion backlog.
- Gated work (WP-0012-style) lives as a relevance-accruing suggestion, never as an
  inert `todo` task or a fabricated active catalog entry.

## See also

- `STATE-WP-0053` — WSJF triage review page (consumer surface)
- `CUST-WP-0044` — activity-core daily triage runner (producer; cross-repo seam)
- `api/models/capability_request.py`, `api/models/technical_debt.py` — prior art
- ops-warden `WARDEN-WP-0012` — the gated-backlog case that motivated this