state-hub/policies/service-dom.md
tegwick f14c225dd9 STATE-WP-0062 T4: Service DoM uses "Level" not "Tier"
Rename Tier 1/2/3 -> Level 1/2/3 (Core/Standard/Full) in the Service DoM policy
and the checklist header to "Level", aligning with the service_catalog
maturity_level column. The DoI tier subsystem is intentionally untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 21:03:35 +02:00


Service Definition of Mature (DoM)

A long-running service (an HTTP API, MCP server, worker, or daemon — as opposed to a repository or a workstream) is considered mature when all criteria below are satisfied. This is the service-level companion to the Repository Definition of Integrated (repo-doi.md) and the Workstream Definition of Done (workstream-dod.md).

Criteria are grouped by Service Maturity Level: a service that meets all Core criteria is operable; meeting Standard criteria makes it observable; meeting Full criteria makes it mature.


Level 1 — Core (Operable)

The minimum for a service to be run and reasoned about by agents and operators.

  • Health endpoint — the service exposes an unauthenticated health route that allows efficient "is the service available?" probing without running business logic. It returns 200 with a small JSON body ({"status": "ok", ...}) when ready, and a non-2xx (e.g. 503) when a hard dependency is unavailable. For the State Hub API this is GET /state/health, which also reports DB connectivity ({"db": "connected"}). Agents should probe this before assuming the service is offline (see the session protocol fallback in CLAUDE.md).

  • Start command documented — a single documented command brings the service up from a clean checkout (for the State Hub API: make api, with make db first if Postgres is not running).

  • Bound address known — the listen host/port is fixed and documented (State Hub API: http://127.0.0.1:8000; remote via ops-bridge: http://127.0.0.1:18000).
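The health-endpoint criterion above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the State Hub API's actual implementation: the names health_response, make_handler, and serve, the injected check_db callable, and the "unavailable" / "disconnected" labels are all hypothetical. Only the route, the 200/503 split, and the {"status": "ok"} / {"db": "connected"} fields come from the policy text.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple, Type


def health_response(db_connected: bool) -> Tuple[int, Dict[str, str]]:
    """Map hard-dependency state to (HTTP status, JSON body)."""
    if db_connected:
        return 200, {"status": "ok", "db": "connected"}
    # The failure labels below are assumptions; the policy only requires non-2xx.
    return 503, {"status": "unavailable", "db": "disconnected"}


def make_handler(check_db: Callable[[], bool]) -> Type[BaseHTTPRequestHandler]:
    """Build a handler class that serves only the unauthenticated health route."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/state/health":
                self.send_error(404)
                return
            try:
                db_ok = check_db()  # cheap connectivity check, no business logic
            except Exception:
                db_ok = False  # a failing check is reported, never raised to the caller
            status, body = health_response(db_ok)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep probe traffic out of the log

    return HealthHandler


def serve(check_db: Callable[[], bool], host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the sketch on the documented bind address (blocks until interrupted)."""
    HTTPServer((host, port), make_handler(check_db)).serve_forever()
```

Keeping the status mapping in a pure function (health_response) separate from the transport makes the contract testable without starting a server.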


Level 2 — Standard (Observable)

The service can be monitored and integrated by other agents and tooling.

  • Health route is tested — an automated test asserts that the health route returns a success status and the expected response shape, so that a regression which silently makes the service un-probeable is caught.

  • Dependencies declared — external service dependencies are declared in tpsc.yaml and ingested (make ingest-tpsc REPO={slug}); an empty services: [] is used when there are none, to make the absence explicit.

  • Remote reachability path — if the service is consumed across machines, the tunnel/bridge route is documented (ops-bridge port map) and the health endpoint is reachable over it.

  • Graceful dependency failure — when a hard dependency (DB, broker) is down, the service reports it via the health route rather than crashing or hanging callers.
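The "health route is tested" and "graceful dependency failure" criteria can be expressed as two small contract checks. The sketch below is transport-agnostic: it assumes a hypothetical fetch callable that returns (status, raw_body), which in a real test suite would wrap whatever test client the service uses. The helper names are illustrative and are not part of any existing test suite.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical adapter type: performs GET on the health route and
# returns (HTTP status, raw response body).
Fetch = Callable[[], Tuple[int, bytes]]


def assert_ready(fetch: Fetch) -> Dict[str, object]:
    """When dependencies are up: expect 200 and a JSON object with status == "ok"."""
    status, raw = fetch()
    assert status == 200, f"expected 200 when ready, got {status}"
    body = json.loads(raw)
    assert isinstance(body, dict), f"expected a JSON object, got {type(body).__name__}"
    assert body.get("status") == "ok", f"unexpected body: {body!r}"
    return body


def assert_reports_outage(fetch: Fetch) -> None:
    """When a hard dependency is down: expect a non-2xx answer, not a crash or hang."""
    status, _ = fetch()
    assert not (200 <= status < 300), f"expected non-2xx during outage, got {status}"


def test_health_contract() -> None:
    # Canned responses stand in for a real client in this sketch.
    assert_ready(lambda: (200, b'{"status": "ok", "db": "connected"}'))
    assert_reports_outage(lambda: (503, b'{"status": "unavailable"}'))
```

In practice the second check would run with the database stopped or its connection stubbed out, so that the test exercises the failure path rather than a canned response.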


Level 3 — Full (Mature)

The service participates safely in the wider ecosystem over time.

  • Versioned interface — breaking interface changes are published via the interface-change tracker (publish_interface_change) so consumers are warned.

  • Authn/authz boundary documented — which routes are public (e.g. health) versus authenticated is explicit, and credential needs route through the standard channels (credential-routing.md / warden route).

  • Recovery documented — the runbook for restart and for restoring a failed dependency is captured (for the State Hub API: make db then make api; consistency repair via make fix-consistency).

  • Progress/telemetry on lifecycle — significant lifecycle events (deploys, migrations, outages) are recorded so the hub reflects service state.


Maturity Checklist (Quick Reference)

| #  | Criterion                       | Level        | Verified by                                       |
|----|---------------------------------|--------------|---------------------------------------------------|
| 1  | Health endpoint                 | 1 · Core     | curl -s $BASE/state/health → 200, {"status":"ok"} |
| 2  | Start command documented        | 1 · Core     | make api from clean checkout                      |
| 3  | Bound address known             | 1 · Core     | docs / CLAUDE.md                                  |
| 4  | Health route is tested          | 2 · Standard | tests/ asserts health route                       |
| 5  | Dependencies declared           | 2 · Standard | make ingest-tpsc                                  |
| 6  | Remote reachability path        | 2 · Standard | ops-bridge health probe                           |
| 7  | Graceful dependency failure     | 2 · Standard | health returns 503 when DB down                   |
| 8  | Versioned interface             | 3 · Full     | publish_interface_change                          |
| 9  | Authn/authz boundary documented | 3 · Full     | docs review                                       |
| 10 | Recovery documented             | 3 · Full     | runbook present                                   |
| 11 | Lifecycle telemetry             | 3 · Full     | add_progress_event on lifecycle                   |

Notes

  • The DoM is enforced by convention, not by automated gates.
  • The health endpoint (Core #1) is the load-bearing criterion: it is what lets agents and monitors distinguish "service down" from "service up but the request is wrong," cheaply and without side effects.
  • "Service" here means a process exposing an interface over its lifetime — the State Hub API and the FastMCP server each qualify. A one-shot CLI or a migration script is not a service and is out of scope for the DoM.
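The distinction drawn in the notes, between a service that is down and one that is up but unhappy, can be sketched as a client-side probe. This is a minimal example using only urllib from the standard library. The function name probe and the three labels "up", "degraded", and "down" are assumptions chosen for illustration. The /state/health path is the State Hub API route named above, and other services would substitute their own.

```python
import urllib.error
import urllib.request


def probe(base_url: str, timeout: float = 2.0) -> str:
    """Classify a service by its health route without side effects.

    Returns "up" for a 2xx answer, "degraded" when the process answers
    with a non-2xx status (e.g. 503 because a dependency is down), and
    "down" when nothing answers at all (refused, unreachable, timed out).
    """
    url = base_url.rstrip("/") + "/state/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "up" if 200 <= resp.status < 300 else "degraded"
    except urllib.error.HTTPError:
        # The process is running and responded, but reported a problem.
        return "degraded"
    except (urllib.error.URLError, OSError):
        # No listener, network failure, or timeout: treat as offline.
        return "down"
```

An agent might call probe("http://127.0.0.1:8000") first and fall back to the ops-bridge address http://127.0.0.1:18000 only if the result is "down".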