Document service job durability contract

2026-05-15 14:37:44 +02:00
parent 4089b7e400
commit 6154d74add
6 changed files with 113 additions and 8 deletions

@@ -2,16 +2,15 @@
 # Custodian Brief — guide-board
 **Domain:** markitect
-**Last synced:** 2026-05-15 11:49 UTC
+**Last synced:** 2026-05-15 12:35 UTC
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 ## Active Workstreams
 ### Assessment Operations Baseline
-Progress: 3/6 done | workstream_id: `fc5b1573-91b2-4a19-b6a9-dd4d17057d9b`
+Progress: 4/6 done | workstream_id: `fc5b1573-91b2-4a19-b6a9-dd4d17057d9b`
 **Open tasks:**
-- · D2.4 - Service Job Durability Contract `10e4003c`
 - · D2.5 - Container Smoke Acceptance `9e2e7fa7`
 - · D2.6 - External Extension Acceptance Path `65fbf1df`

@@ -56,5 +56,6 @@ See:
 - [docs/CONTAINER.md](docs/CONTAINER.md)
 - [docs/EXTENSION-SDK.md](docs/EXTENSION-SDK.md)
 - [docs/LOCAL-SERVICE-API.md](docs/LOCAL-SERVICE-API.md)
+- [docs/SERVICE-JOB-DURABILITY.md](docs/SERVICE-JOB-DURABILITY.md)
 - [extensions/CANDIDATES.md](extensions/CANDIDATES.md)
 - [workplans/GUIDE-BOARD-WP-0001-bootstrapping.md](workplans/GUIDE-BOARD-WP-0001-bootstrapping.md)

@@ -147,7 +147,8 @@ curl -sf http://127.0.0.1:8080/runs/JOB_ID/reports | python3 -m json.tool
 Service job state is currently in memory for the running service process. Run
 artifacts are durable in the output directory and can still be inspected after a
-service restart.
+service restart. See `docs/SERVICE-JOB-DURABILITY.md` for the restart and
+recovery contract.
 ## Status Vocabulary

@@ -87,7 +87,8 @@ run directory; the assessment result itself is still reported separately as
 ### `GET /runs`
-Lists known in-memory jobs for the current service process.
+Lists known in-memory jobs for the current service process. Job records are not
+durable across service restarts.
 ### `GET /runs/{job_id}`
@@ -111,4 +112,5 @@ podman run --rm -p 8080:8080 \
``` ```
 The service keeps job state in memory. Durable run evidence remains in the
-mounted output directory.
+mounted output directory. See `docs/SERVICE-JOB-DURABILITY.md` for the explicit
+restart and recovery contract.

@@ -0,0 +1,90 @@
# Service Job Durability
Status: draft
Created: 2026-05-15
## Decision
The guide-board local service keeps HTTP job state in memory for the baseline.
This is intentional. The service is a thin local transport over the CLI
contracts, not a workflow database.
Durable state lives in run directories:
- `run.json`
- `plan.json`
- `retention-summary.json`
- `normalized/evidence.json`
- `normalized/findings.json`
- `normalized/mappings.json`
- `reports/assessment-package.json`
- `reports/report.md`
- `artifacts/`
The durable recovery index is the set of `retention-summary.json` files under a
runs directory.
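
The recovery index described above can be walked with a few lines of Python. This is a minimal sketch, not part of guide-board: the helper name `find_retained_runs` is hypothetical, and it assumes only that each retained run leaves a JSON file named `retention-summary.json` somewhere under the runs directory. It makes no assumption about the fields inside that file.

```python
import json
from pathlib import Path


def find_retained_runs(runs_dir):
    """Return (run_directory, summary) pairs for each readable retained summary.

    Hypothetical helper: only the file name retention-summary.json comes from
    the contract; the summary contents are returned as parsed, uninterpreted.
    """
    found = []
    for summary_path in sorted(Path(runs_dir).rglob("retention-summary.json")):
        try:
            summary = json.loads(summary_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or truncated summaries are skipped rather than fatal;
            # such a run directory should be inspected by hand.
            continue
        found.append((summary_path.parent, summary))
    return found
```

A directory without a readable summary is simply absent from the result, which matches the rule that only completed runs are recoverable through retained summaries.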
## Why In-Memory Jobs Stay The Baseline
In-memory service jobs keep the first service layer dependency-light and easy to
embed in local, container, and extension-specific environments. Operators can
restart the service without migrating or repairing a service database, and the
CLI remains the source of truth for execution semantics.
This also keeps interrupted service runs easy to reason about:
- if the process exits before a run completes, the HTTP job record is gone,
- any partial run directory remains for inspection,
- completed runs are recoverable through retained run summaries,
- repeated runs should use a new output directory or an intentional overwrite
policy chosen by the operator.
## Restart Semantics
After a service restart:
- `GET /runs` returns only jobs created since the new service process started,
- `job_id` values issued before the restart are no longer recognized,
- `GET /runs/{job_id}` cannot recover pre-restart job metadata,
- `GET /runs/{job_id}/reports` only works for jobs known to the current process,
- run artifacts from earlier service processes remain available on disk.
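
The restart behaviour listed above follows directly from holding job records in a process-local dictionary. The toy model below illustrates that; it is a sketch under stated assumptions, not the service's implementation. `InMemoryJobRegistry`, its method names, and the use of `uuid4` for job ids are all hypothetical, and creating a second registry object stands in for a service restart.

```python
import uuid


class InMemoryJobRegistry:
    """Toy stand-in for process-local job state (hypothetical, for illustration)."""

    def __init__(self):
        # Job records live only in this dictionary, so they vanish with the process.
        self._jobs = {}

    def create(self, output_dir):
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"job_id": job_id, "output_dir": output_dir}
        return job_id

    def get(self, job_id):
        # Returns None for ids this process never issued.
        return self._jobs.get(job_id)

    def list_jobs(self):
        return list(self._jobs.values())


# A "restart" is modelled by building a fresh registry.
before_restart = InMemoryJobRegistry()
old_job_id = before_restart.create("runs/example")
after_restart = InMemoryJobRegistry()
```

Looking up `old_job_id` in `after_restart` finds nothing, while the files written under the output directory are untouched by the swap.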
Operators should recover previous results with the CLI run-history commands:
```sh
PYTHONPATH=src python3 -m guide_board runs list --runs-dir runs
PYTHONPATH=src python3 -m guide_board runs latest --runs-dir runs
PYTHONPATH=src python3 -m guide_board runs report --runs-dir runs --run-id RUN_ID
```
## Recovery Flow
Use this flow when the service process restarted or a browser/UI lost its job
state:
1. Identify the output directory passed to `POST /runs`.
2. Confirm whether `retention-summary.json` exists.
3. If it exists, use `guide-board runs report --runs-dir <parent>` to retrieve
report paths, adding `--run-id RUN_ID` as in the run-history commands above to
select a specific run.
4. If only partial files exist, inspect `run.json`, `plan.json`, and artifacts
before rerunning.
5. Rerun into a fresh output directory when the prior status is unclear.
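
Steps 2 to 5 of the flow amount to a small decision on the contents of the output directory. A minimal sketch follows, assuming the summary file sits directly inside the output directory; the function name `classify_run_dir` and the labels it returns are hypothetical and are not part of the guide-board status vocabulary.

```python
from pathlib import Path


def classify_run_dir(output_dir):
    """Classify an output directory for recovery purposes (hypothetical labels)."""
    run_dir = Path(output_dir)
    if not run_dir.is_dir():
        return "missing"   # nothing was written: rerun
    if (run_dir / "retention-summary.json").is_file():
        return "complete"  # recover through the run-history commands
    if any(run_dir.iterdir()):
        return "partial"   # inspect run.json, plan.json, and artifacts first
    return "empty"         # directory exists but holds no evidence: rerun
```

For anything other than `"complete"`, the flow above still applies: inspect what is there, then rerun into a fresh output directory.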
## Future Durable Index Option
A future durable service index may be added if UI or automation workflows need
cross-restart job lookup. If added, it should remain reconstructable from run
directories and should not become the authority for assessment results.
The minimum acceptable durable index would contain:
- job id,
- request payload,
- job transport status,
- run id,
- output directory,
- result paths,
- error summary.
The index should be optional, dependency-light, and repairable by scanning
retained run summaries.
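
The minimum index record listed above could be modelled as a plain dataclass serialized one JSON object per line. This is a sketch of one possible shape, not a committed design: the class name `JobIndexEntry`, the field names, the defaults, and the JSON-lines encoding are all assumptions, chosen to stay within the standard library.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class JobIndexEntry:
    """One hypothetical index record holding the seven minimum fields."""

    job_id: str
    request_payload: Dict[str, Any]
    transport_status: str
    run_id: Optional[str] = None
    output_dir: Optional[str] = None
    result_paths: List[str] = field(default_factory=list)
    error_summary: Optional[str] = None


def dump_entry(entry):
    """Serialize an entry as a single JSON line."""
    return json.dumps(asdict(entry), sort_keys=True)


def load_entry(line):
    """Parse a JSON line produced by dump_entry back into an entry."""
    return JobIndexEntry(**json.loads(line))
```

Keeping the record flat and append-friendly means a damaged index can be discarded and rebuilt from run directories, so it never becomes the authority for assessment results.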

@@ -120,7 +120,7 @@ Progress:
 ```task
 id: GUIDE-BOARD-WP-0002-T004
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "10e4003c-dc11-4a8e-aecc-7815559ac439"
 ```
@@ -134,6 +134,18 @@ Acceptance:
 - If durable indexing is added, keep it dependency-light and reconstructable
 from retained run artifacts.
+Decision:
+- Keep local service job state intentionally in-memory for the baseline.
+- Treat run directories and `retention-summary.json` as the durable recovery
+source.
+Progress:
+- Added `docs/SERVICE-JOB-DURABILITY.md`.
+- Linked the contract from README, the local service API docs, and the
+assessment operations guide.
 ## D2.5 - Container Smoke Acceptance
```task ```task