Service Definition of Mature (DoM)
A long-running service (an HTTP API, MCP server, worker, or daemon — as
opposed to a repository or a workstream) is considered mature when all
criteria below are satisfied. This is the service-level companion to the
Repository Definition of Integrated (repo-doi.md) and the Workstream
Definition of Done (workstream-dod.md).
Criteria are grouped by Service Maturity Level: a service that meets all Core criteria is operable; meeting Standard criteria makes it observable; meeting Full criteria makes it mature.
Level 1 — Core (Operable)
The minimum for a service to be run and reasoned about by agents and operators.
- Health endpoint: the service exposes an unauthenticated health route that allows efficient "is the service available?" probing without running business logic. It returns `200` with a small JSON body (`{"status": "ok", ...}`) when ready, and a non-2xx status (e.g. `503`) when a hard dependency is unavailable. For the State Hub API this is `GET /state/health`, which also reports DB connectivity (`{"db": "connected"}`). Agents should probe this before assuming the service is offline (see the session protocol fallback in `CLAUDE.md`).
- Start command documented: a single documented command brings the service up from a clean checkout (for the State Hub API: `make api`, with `make db` first if Postgres is not running).
- Bound address known: the listen host/port is fixed and documented (State Hub API: `http://127.0.0.1:8000`; remote via ops-bridge: `http://127.0.0.1:18000`).
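The probing guidance in the health endpoint criterion can be sketched from the agent side. This is a minimal illustration, not the State Hub's actual client: the function names `classify_health` and `probe` and the three labels (`"up"`, `"degraded"`, `"down"`) are hypothetical. Only the `/state/health` path, the default address, and the `{"status": "ok"}` body come from the criteria above.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def classify_health(status: Optional[int], body: bytes = b"") -> str:
    """Map a health response to 'up', 'degraded', or 'down' (hypothetical labels)."""
    if status is None:
        return "down"  # no HTTP response at all: the process is not reachable
    if 200 <= status < 300:
        try:
            payload = json.loads(body or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            return "degraded"
        if isinstance(payload, dict) and payload.get("status") == "ok":
            return "up"
        return "degraded"
    # Non-2xx (e.g. 503): the process answered, but it reports itself not ready.
    return "degraded"


def probe(base_url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> str:
    """Probe the health route once, with a bounded wait."""
    url = base_url.rstrip("/") + "/state/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status, resp.read())
    except urllib.error.HTTPError as exc:
        # urllib raises for non-2xx statuses; the response is still informative.
        return classify_health(exc.code, exc.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return classify_health(None)
```

Keeping the classification separate from the network call means the decision logic can be unit-tested without a running service. An agent would call `probe()` and treat only `"down"` as grounds for the offline fallback.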
Level 2 — Standard (Observable)
The service can be monitored and integrated by other agents and tooling.
- Health route is tested: an automated test asserts that the health route returns a success status and the expected shape, so regressions that would silently make the service un-probeable are caught.
- Dependencies declared: external service dependencies are declared in `tpsc.yaml` and ingested (`make ingest-tpsc REPO={slug}`); an empty `services: []` is used when there are none, to make the absence explicit.
- Remote reachability path: if the service is consumed across machines, the tunnel/bridge route is documented (ops-bridge port map) and the health endpoint is reachable over it.
- Graceful dependency failure: when a hard dependency (DB, broker) is down, the service reports it via the health route rather than crashing or hanging callers.
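One way to read the graceful dependency failure criterion is that the health route runs each dependency check under a time bound and converts any failure into a `503`. The sketch below assumes this reading. The name `health_payload`, the `check_db` callable, and the `"unavailable"`, `"disconnected"`, and `"timeout"` values are hypothetical; only the `200` body shape comes from the Level 1 criteria.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, Tuple


def health_payload(
    check_db: Callable[[], object], timeout: float = 1.0
) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, body) for a health route without hanging the caller.

    check_db is any callable that raises when the database is unreachable.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(check_db)
        try:
            future.result(timeout=timeout)
        except FutureTimeout:
            # The check is still running; report instead of waiting for it.
            return 503, {"status": "unavailable", "db": "timeout"}
        except Exception:
            # The check failed outright (e.g. connection refused).
            return 503, {"status": "unavailable", "db": "disconnected"}
        return 200, {"status": "ok", "db": "connected"}
    finally:
        # Do not block on a stuck worker. Python cannot kill the thread,
        # so the timeout bounds only how long the caller waits.
        pool.shutdown(wait=False)
```

A test for the "health route is tested" criterion can then assert both the success shape and the failure status by passing in stub checks, for example `health_payload(lambda: None)` for the healthy case and a callable that raises `ConnectionError` for the degraded case.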
Level 3 — Full (Mature)
The service participates safely in the wider ecosystem over time.
- Versioned interface: breaking interface changes are published via the interface-change tracker (`publish_interface_change`) so consumers are warned.
- Authn/authz boundary documented: which routes are public (e.g. health) versus authenticated is explicit, and credential needs route through the standard channels (`credential-routing.md` / `warden route`).
- Recovery documented: the runbook for restart and for restoring a failed dependency is captured (for the State Hub API: `make db` then `make api`; consistency repair via `make fix-consistency`).
- Progress/telemetry on lifecycle: significant lifecycle events (deploys, migrations, outages) are recorded so the hub reflects service state.
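The authn/authz boundary criterion asks that the public routes be explicit. A minimal sketch of one way to encode that is a default-deny allowlist, where every route requires authentication unless it is listed. The names `PUBLIC_ROUTES` and `requires_auth` are hypothetical and this is not the State Hub's actual middleware; `/state/health` is the only route named in this document.

```python
# Explicit allowlist of (METHOD, path) pairs that need no credentials.
# Anything not listed here is treated as authenticated (default deny).
PUBLIC_ROUTES = frozenset({
    ("GET", "/state/health"),
})


def requires_auth(method: str, path: str) -> bool:
    """Return True unless the route is explicitly declared public."""
    normalized = path.rstrip("/") or "/"  # tolerate a trailing slash
    return (method.upper(), normalized) not in PUBLIC_ROUTES
```

Listing the public routes rather than the protected ones means a newly added route is authenticated until someone deliberately documents it as public.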
Maturity Checklist (Quick Reference)
| # | Criterion | Level | Verified by |
|---|---|---|---|
| 1 | Health endpoint | 1 · Core | curl -s $BASE/state/health → 200, {"status":"ok"} |
| 2 | Start command documented | 1 · Core | make api from clean checkout |
| 3 | Bound address known | 1 · Core | docs / CLAUDE.md |
| 4 | Health route is tested | 2 · Standard | tests/ asserts health route |
| 5 | Dependencies declared | 2 · Standard | make ingest-tpsc |
| 6 | Remote reachability path | 2 · Standard | ops-bridge health probe |
| 7 | Graceful dependency failure | 2 · Standard | health returns 503 when DB down |
| 8 | Versioned interface | 3 · Full | publish_interface_change |
| 9 | Authn/authz boundary documented | 3 · Full | docs review |
| 10 | Recovery documented | 3 · Full | runbook present |
| 11 | Lifecycle telemetry | 3 · Full | add_progress_event on lifecycle |
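The checklist can be turned into a small level calculation. The sketch below assumes levels are cumulative (a service reaches Level 2 only if every Level 1 criterion also passes), which is one reading of the grouping above rather than a stated rule. The names `CRITERIA_BY_LEVEL` and `maturity_level` are hypothetical; the criterion numbers are the `#` column of the table.

```python
from typing import Iterable

# Criterion numbers from the checklist, grouped by level.
CRITERIA_BY_LEVEL = {
    1: {1, 2, 3},        # Core
    2: {4, 5, 6, 7},     # Standard
    3: {8, 9, 10, 11},   # Full
}
LEVEL_NAMES = {0: "None", 1: "Core", 2: "Standard", 3: "Full"}


def maturity_level(passed: Iterable[int]) -> int:
    """Return the highest level whose criteria, and all lower levels', pass."""
    passed_set = set(passed)
    level = 0
    for candidate in (1, 2, 3):
        if CRITERIA_BY_LEVEL[candidate] <= passed_set:
            level = candidate
        else:
            break  # a gap at a lower level caps the result
    return level
```

For example, `LEVEL_NAMES[maturity_level({1, 2, 3, 4, 5, 6, 7})]` gives `"Standard"`. Criterion 6 applies only to services consumed across machines, so a caller would count it as passed when it is not applicable.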
Notes
- The DoM is enforced by convention, not by automated gates.
- The health endpoint (Core #1) is the load-bearing criterion: it is what lets agents and monitors distinguish "service down" from "service up but the request is wrong," cheaply and without side effects.
- "Service" here means a process exposing an interface over its lifetime — the State Hub API and the FastMCP server each qualify. A one-shot CLI or a migration script is not a service and is out of scope for the DoM.