# Service Definition of Mature (DoM) A long-running **service** (an HTTP API, MCP server, worker, or daemon — as opposed to a repository or a workstream) is considered **mature** when all criteria below are satisfied. This is the service-level companion to the Repository *Definition of Integrated* (`repo-doi.md`) and the Workstream *Definition of Done* (`workstream-dod.md`). Criteria are grouped by **Service Maturity Level**: a service that meets all **Core** criteria is *operable*; meeting **Standard** criteria makes it *observable*; meeting **Full** criteria makes it *mature*. --- ## Level 1 — Core (Operable) The minimum for a service to be run and reasoned about by agents and operators. - [ ] **Health endpoint** — the service exposes an unauthenticated health route that allows efficient "is the service available?" probing **without** running business logic. It returns `200` with a small JSON body (`{"status": "ok", ...}`) when ready, and a non-2xx (e.g. `503`) when a hard dependency is unavailable. For the State Hub API this is `GET /state/health`, which also reports DB connectivity (`{"db": "connected"}`). Agents should probe this **before** assuming the service is offline (see the session protocol fallback in `CLAUDE.md`). - [ ] **Start command documented** — a single documented command brings the service up from a clean checkout (for the State Hub API: `make api`, with `make db` first if Postgres is not running). - [ ] **Bound address known** — the listen host/port is fixed and documented (State Hub API: `http://127.0.0.1:8000`; remote via ops-bridge: `http://127.0.0.1:18000`). --- ## Level 2 — Standard (Observable) The service can be monitored and integrated by other agents and tooling. - [ ] **Health route is tested** — an automated test asserts the health route returns a success status and the expected shape, so regressions that take the service silently un-probeable are caught. - [ ] **Dependencies declared** — external service dependencies are declared in `tpsc.yaml` and ingested (`make ingest-tpsc REPO={slug}`); an empty `services: []` is used when there are none, to make the absence explicit. - [ ] **Remote reachability path** — if the service is consumed across machines, the tunnel/bridge route is documented (ops-bridge port map) and the health endpoint is reachable over it. - [ ] **Graceful dependency failure** — when a hard dependency (DB, broker) is down, the service reports it via the health route rather than crashing or hanging callers. --- ## Level 3 — Full (Mature) The service participates safely in the wider ecosystem over time. - [ ] **Versioned interface** — breaking interface changes are published via the interface-change tracker (`publish_interface_change`) so consumers are warned. - [ ] **Authn/authz boundary documented** — which routes are public (e.g. health) versus authenticated is explicit, and credential needs route through the standard channels (`credential-routing.md` / `warden route`). - [ ] **Recovery documented** — the runbook for restart and for restoring a failed dependency is captured (for the State Hub API: `make db` then `make api`; consistency repair via `make fix-consistency`). - [ ] **Progress/telemetry on lifecycle** — significant lifecycle events (deploys, migrations, outages) are recorded so the hub reflects service state. --- ## Maturity Checklist (Quick Reference) | # | Criterion | Level | Verified by | |---|---|---|---| | 1 | Health endpoint | 1 · Core | `curl -s $BASE/state/health` → `200`, `{"status":"ok"}` | | 2 | Start command documented | 1 · Core | `make api` from clean checkout | | 3 | Bound address known | 1 · Core | docs / `CLAUDE.md` | | 4 | Health route is tested | 2 · Standard | `tests/` asserts health route | | 5 | Dependencies declared | 2 · Standard | `make ingest-tpsc` | | 6 | Remote reachability path | 2 · Standard | ops-bridge health probe | | 7 | Graceful dependency failure | 2 · Standard | health returns `503` when DB down | | 8 | Versioned interface | 3 · Full | `publish_interface_change` | | 9 | Authn/authz boundary documented | 3 · Full | docs review | | 10 | Recovery documented | 3 · Full | runbook present | | 11 | Lifecycle telemetry | 3 · Full | `add_progress_event` on lifecycle | --- ## Notes - The DoM is enforced by convention, not by automated gates. - The **health endpoint** (Core #1) is the load-bearing criterion: it is what lets agents and monitors distinguish *"service down"* from *"service up but the request is wrong,"* cheaply and without side effects. - "Service" here means a process exposing an interface over its lifetime — the State Hub API and the FastMCP server each qualify. A one-shot CLI or a migration script is **not** a service and is out of scope for the DoM.