generated from coulomb/repo-seed
Rename Tier 1/2/3 -> Level 1/2/3 (Core/Standard/Full) in the Service DoM policy and the checklist header to "Level", aligning with the service_catalog maturity_level column. The DoI tier subsystem is intentionally untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
107 lines
4.8 KiB
Markdown
107 lines
4.8 KiB
Markdown
# Service Definition of Mature (DoM)
|
|
|
|
A long-running **service** (an HTTP API, MCP server, worker, or daemon — as
|
|
opposed to a repository or a workstream) is considered **mature** when all
|
|
criteria below are satisfied. This is the service-level companion to the
|
|
Repository *Definition of Integrated* (`repo-doi.md`) and the Workstream
|
|
*Definition of Done* (`workstream-dod.md`).
|
|
|
|
Criteria are grouped by **Service Maturity Level**: a service that meets all
|
|
**Core** criteria is *operable*; meeting **Standard** criteria makes it
|
|
*observable*; meeting **Full** criteria makes it *mature*.
|
|
|
|
---
|
|
|
|
## Level 1 — Core (Operable)
|
|
|
|
The minimum for a service to be run and reasoned about by agents and operators.
|
|
|
|
- [ ] **Health endpoint** — the service exposes an unauthenticated health route
|
|
that allows efficient "is the service available?" probing **without** running
|
|
business logic. It returns `200` with a small JSON body
|
|
(`{"status": "ok", ...}`) when ready, and a non-2xx (e.g. `503`) when a hard
|
|
dependency is unavailable. For the State Hub API this is
|
|
`GET /state/health`, which also reports DB connectivity (`{"db": "connected"}`).
|
|
Agents should probe this **before** assuming the service is offline (see the
|
|
session protocol fallback in `CLAUDE.md`).
|
|
|
|
- [ ] **Start command documented** — a single documented command brings the
|
|
service up from a clean checkout (for the State Hub API: `make api`, with
|
|
`make db` first if Postgres is not running).
|
|
|
|
- [ ] **Bound address known** — the listen host/port is fixed and documented
|
|
(State Hub API: `http://127.0.0.1:8000`; remote via ops-bridge:
|
|
`http://127.0.0.1:18000`).
|
|
|
|
---
|
|
|
|
## Level 2 — Standard (Observable)
|
|
|
|
The service can be monitored and integrated by other agents and tooling.
|
|
|
|
- [ ] **Health route is tested** — an automated test asserts the health route
|
|
returns a success status and the expected shape, so regressions that take the
|
|
service silently un-probeable are caught.
|
|
|
|
- [ ] **Dependencies declared** — external service dependencies are declared in
|
|
`tpsc.yaml` and ingested (`make ingest-tpsc REPO={slug}`); an empty
|
|
`services: []` is used when there are none, to make the absence explicit.
|
|
|
|
- [ ] **Remote reachability path** — if the service is consumed across machines,
|
|
the tunnel/bridge route is documented (ops-bridge port map) and the health
|
|
endpoint is reachable over it.
|
|
|
|
- [ ] **Graceful dependency failure** — when a hard dependency (DB, broker) is
|
|
down, the service reports it via the health route rather than crashing or
|
|
hanging callers.
|
|
|
|
---
|
|
|
|
## Level 3 — Full (Mature)
|
|
|
|
The service participates safely in the wider ecosystem over time.
|
|
|
|
- [ ] **Versioned interface** — breaking interface changes are published via the
|
|
interface-change tracker (`publish_interface_change`) so consumers are warned.
|
|
|
|
- [ ] **Authn/authz boundary documented** — which routes are public (e.g. health)
|
|
versus authenticated is explicit, and credential needs route through the
|
|
standard channels (`credential-routing.md` / `warden route`).
|
|
|
|
- [ ] **Recovery documented** — the runbook for restart and for restoring a
|
|
failed dependency is captured (for the State Hub API: `make db` then
|
|
`make api`; consistency repair via `make fix-consistency`).
|
|
|
|
- [ ] **Progress/telemetry on lifecycle** — significant lifecycle events
|
|
(deploys, migrations, outages) are recorded so the hub reflects service state.
|
|
|
|
---
|
|
|
|
## Maturity Checklist (Quick Reference)
|
|
|
|
| # | Criterion | Level | Verified by |
|
|
|---|---|---|---|
|
|
| 1 | Health endpoint | 1 · Core | `curl -s $BASE/state/health` → `200`, `{"status":"ok"}` |
|
|
| 2 | Start command documented | 1 · Core | `make api` from clean checkout |
|
|
| 3 | Bound address known | 1 · Core | docs / `CLAUDE.md` |
|
|
| 4 | Health route is tested | 2 · Standard | `tests/` asserts health route |
|
|
| 5 | Dependencies declared | 2 · Standard | `make ingest-tpsc` |
|
|
| 6 | Remote reachability path | 2 · Standard | ops-bridge health probe |
|
|
| 7 | Graceful dependency failure | 2 · Standard | health returns `503` when DB down |
|
|
| 8 | Versioned interface | 3 · Full | `publish_interface_change` |
|
|
| 9 | Authn/authz boundary documented | 3 · Full | docs review |
|
|
| 10 | Recovery documented | 3 · Full | runbook present |
|
|
| 11 | Lifecycle telemetry | 3 · Full | `add_progress_event` on lifecycle |
|
|
|
|
---
|
|
|
|
## Notes
|
|
|
|
- The DoM is enforced by convention, not by automated gates.
|
|
- The **health endpoint** (Core #1) is the load-bearing criterion: it is what
|
|
lets agents and monitors distinguish *"service down"* from *"service up but
|
|
the request is wrong,"* cheaply and without side effects.
|
|
- "Service" here means a process exposing an interface over its lifetime — the
|
|
State Hub API and the FastMCP server each qualify. A one-shot CLI or a
|
|
migration script is **not** a service and is out of scope for the DoM.
|