generated from coulomb/repo-seed
Add Service Definition of Mature policy and health-route test
Establish policies/service-dom.md as the service-level companion to the repo DoI and workstream DoD. Its load-bearing Core criterion is a cheap, side-effect-free health endpoint for availability probing, satisfied by the existing GET /state/health (DB readiness, 200/503). The policy is served automatically at /policy/service-dom by the existing policy router.

Add a regression test asserting /state/health returns 200 with the expected shape, since none existed (DoM Standard criterion #4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
106
policies/service-dom.md
Normal file
@@ -0,0 +1,106 @@
# Service Definition of Mature (DoM)

A long-running **service** (an HTTP API, MCP server, worker, or daemon — as
opposed to a repository or a workstream) is considered **mature** when all
criteria below are satisfied. This is the service-level companion to the
Repository *Definition of Integrated* (`repo-doi.md`) and the Workstream
*Definition of Done* (`workstream-dod.md`).

Criteria are grouped by tier: a service that meets all **Core** criteria is
*operable*; meeting **Standard** criteria makes it *observable*; meeting
**Full** criteria makes it *mature*.

---

## Tier 1 — Core (Operable)

The minimum for a service to be run and reasoned about by agents and operators.

- [ ] **Health endpoint** — the service exposes an unauthenticated health route
  that allows efficient "is the service available?" probing **without** running
  business logic. It returns `200` with a small JSON body
  (`{"status": "ok", ...}`) when ready, and a non-2xx (e.g. `503`) when a hard
  dependency is unavailable. For the State Hub API this is
  `GET /state/health`, which also reports DB connectivity (`{"db": "connected"}`).
  Agents should probe this **before** assuming the service is offline (see the
  session protocol fallback in `CLAUDE.md`).

- [ ] **Start command documented** — a single documented command brings the
  service up from a clean checkout (for the State Hub API: `make api`, with
  `make db` first if Postgres is not running).

- [ ] **Bound address known** — the listen host/port is fixed and documented
  (State Hub API: `http://127.0.0.1:8000`; remote via ops-bridge:
  `http://127.0.0.1:18000`).
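
The probing behaviour described under **Health endpoint** could look roughly like the sketch below. It is illustrative only and not part of this commit: the function name `probe_health` and the three outcome labels (`ready`, `degraded`, `unreachable`) are assumptions made for the example, and the only thing taken from the policy is the `GET /state/health` route and its `{"status": "ok"}` body. A short timeout keeps the probe cheap, and a non-2xx answer is kept distinct from no answer at all.

```python
import http.client
import json
import urllib.error
import urllib.request


def probe_health(base_url: str, timeout: float = 2.0) -> str:
    """Classify a service as 'ready', 'degraded', or 'unreachable'.

    Hypothetical helper: the labels are illustrative, not defined by the policy.
    """
    url = base_url.rstrip("/") + "/state/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            raw = resp.read()
    except urllib.error.HTTPError:
        # The process answered, but with a non-2xx status such as 503.
        return "degraded"
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return "unreachable"

    try:
        body = json.loads(raw)
    except ValueError:
        # A 2xx answer without a JSON body does not match the expected shape.
        return "degraded"

    if isinstance(body, dict) and body.get("status") == "ok":
        return "ready"
    return "degraded"
```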

---

## Tier 2 — Standard (Observable)

The service can be monitored and integrated by other agents and tooling.

- [ ] **Health route is tested** — an automated test asserts the health route
  returns a success status and the expected shape, so regressions that take the
  service silently un-probeable are caught.

- [ ] **Dependencies declared** — external service dependencies are declared in
  `tpsc.yaml` and ingested (`make ingest-tpsc REPO={slug}`); an empty
  `services: []` is used when there are none, to make the absence explicit.

- [ ] **Remote reachability path** — if the service is consumed across machines,
  the tunnel/bridge route is documented (ops-bridge port map) and the health
  endpoint is reachable over it.

- [ ] **Graceful dependency failure** — when a hard dependency (DB, broker) is
  down, the service reports it via the health route rather than crashing or
  hanging callers.
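
As a rough illustration of **Graceful dependency failure**, the sketch below maps a dependency check onto a health response. It does not show how the State Hub API implements `/state/health`: `health_response` and the `check_db` callable are hypothetical, and the shape of the failure body is an assumption, since the policy only specifies the `200` body and a non-2xx status such as `503`. To avoid hanging callers, `check_db` would itself need a short timeout.

```python
from typing import Callable, Dict, Tuple


def health_response(check_db: Callable[[], None]) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, body) for a health route.

    `check_db` is a hypothetical callable that raises when the database
    cannot be reached and returns normally otherwise.
    """
    try:
        check_db()
    except Exception as exc:
        # Report the failed dependency instead of letting the error propagate.
        # The failure body below is an assumed shape, not taken from the policy.
        return 503, {
            "status": "unavailable",
            "db": "disconnected",
            "detail": type(exc).__name__,
        }
    return 200, {"status": "ok", "db": "connected"}
```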

---

## Tier 3 — Full (Mature)

The service participates safely in the wider ecosystem over time.

- [ ] **Versioned interface** — breaking interface changes are published via the
  interface-change tracker (`publish_interface_change`) so consumers are warned.

- [ ] **Authn/authz boundary documented** — which routes are public (e.g. health)
  versus authenticated is explicit, and credential needs route through the
  standard channels (`credential-routing.md` / `warden route`).

- [ ] **Recovery documented** — the runbook for restart and for restoring a
  failed dependency is captured (for the State Hub API: `make db` then
  `make api`; consistency repair via `make fix-consistency`).

- [ ] **Progress/telemetry on lifecycle** — significant lifecycle events
  (deploys, migrations, outages) are recorded so the hub reflects service state.

---

## Maturity Checklist (Quick Reference)

| # | Criterion | Tier | Verified by |
|---|---|---|---|
| 1 | Health endpoint | Core | `curl -s $BASE/state/health` → `200`, `{"status":"ok"}` |
| 2 | Start command documented | Core | `make api` from clean checkout |
| 3 | Bound address known | Core | docs / `CLAUDE.md` |
| 4 | Health route is tested | Standard | `tests/` asserts health route |
| 5 | Dependencies declared | Standard | `make ingest-tpsc` |
| 6 | Remote reachability path | Standard | ops-bridge health probe |
| 7 | Graceful dependency failure | Standard | health returns `503` when DB down |
| 8 | Versioned interface | Full | `publish_interface_change` |
| 9 | Authn/authz boundary documented | Full | docs review |
| 10 | Recovery documented | Full | runbook present |
| 11 | Lifecycle telemetry | Full | `add_progress_event` on lifecycle |
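
The tiering can be read as cumulative: *operable* needs all Core criteria, *observable* additionally needs all Standard criteria, and *mature* needs all eleven. The sketch below encodes that reading using the criterion numbers from the table above. The cumulative interpretation, the `maturity_level` function, and the `not operable` label are assumptions made for illustration; the policy itself is enforced by convention and ships no such tool.

```python
from typing import Dict

# Criterion numbers per tier, copied from the quick-reference table.
TIERS = {
    "Core": [1, 2, 3],
    "Standard": [4, 5, 6, 7],
    "Full": [8, 9, 10, 11],
}

# Label earned once a tier (and every tier before it) is fully satisfied.
LABELS = (("Core", "operable"), ("Standard", "observable"), ("Full", "mature"))


def maturity_level(passed: Dict[int, bool]) -> str:
    """Return the highest label whose tier and all lower tiers pass.

    Criteria missing from `passed` are treated as not satisfied.
    """
    level = "not operable"  # hypothetical label for a service below Core
    for tier, label in LABELS:
        if not all(passed.get(number, False) for number in TIERS[tier]):
            break
        level = label
    return level
```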

---

## Notes

- The DoM is enforced by convention, not by automated gates.
- The **health endpoint** (Core #1) is the load-bearing criterion: it is what
  lets agents and monitors distinguish *"service down"* from *"service up but
  the request is wrong,"* cheaply and without side effects.
- "Service" here means a process exposing an interface over its lifetime — the
  State Hub API and the FastMCP server each qualify. A one-shot CLI or a
  migration script is **not** a service and is out of scope for the DoM.
@@ -1512,3 +1512,20 @@ class TestFabricGraphReadModel:
        summary = r.json()
        assert summary["schema_version"] is None
        assert summary["nodes_by_fabric"] == {}


# ---------------------------------------------------------------------------
# Health route — Service Definition of Mature (policies/service-dom.md), Core #1
# ---------------------------------------------------------------------------

async def test_health_route_reports_ok_when_db_reachable(client):
    """The health endpoint is a cheap availability probe with no business logic.

    It must return 200 and a small JSON body so agents and monitors can tell
    "service available" from "request wrong" without side effects.
    """
    r = await client.get("/state/health")
    assert r.status_code == 200, r.text
    body = r.json()
    assert body["status"] == "ok"
    assert body["db"] == "connected"