# Service Job Durability
Status: draft
Created: 2026-05-15
## Decision
In the baseline, the guide-board local service keeps HTTP job state in memory
only. This is intentional: the service is a thin local transport over the CLI
contracts, not a workflow database.
Durable state lives in run directories:
- `run.json`
- `plan.json`
- `retention-summary.json`
- `normalized/evidence.json`
- `normalized/findings.json`
- `normalized/mappings.json`
- `reports/assessment-package.json`
- `reports/report.md`
- `artifacts/`
The durable recovery index is the set of `retention-summary.json` files under a
runs directory.
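
A minimal sketch of reading that index, assuming each run directory sits somewhere below the runs directory and holds its own `retention-summary.json`. The function name `scan_retention_summaries` is hypothetical and not part of the guide-board CLI, and the summary schema is not assumed here; the content is returned as parsed JSON.

```python
import json
from pathlib import Path


def scan_retention_summaries(runs_dir):
    """Return (run_directory, summary) pairs for retained runs under runs_dir.

    Hypothetical helper: the layout below runs_dir is an assumption.
    """
    found = []
    for summary_path in sorted(Path(runs_dir).rglob("retention-summary.json")):
        try:
            summary = json.loads(summary_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # An unreadable or truncated summary is skipped rather than
            # reported as a recoverable run.
            continue
        found.append((summary_path.parent, summary))
    return found
```

Calling `scan_retention_summaries("runs")` after a restart would list every run that left a readable summary, without consulting any service state.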
## Why In-Memory Jobs Stay The Baseline
In-memory service jobs keep the first service layer dependency-light and easy to
embed in local, container, and extension-specific environments. Operators can
restart the service without migrating or repairing a service database, and the
CLI remains the source of truth for execution semantics.
This also keeps interrupted service runs easy to reason about:
- if the process exits before a run completes, the HTTP job record is gone,
- any partial run directory remains for inspection,
- completed runs are recoverable through retained run summaries,
- repeated runs should use a new output directory or an intentional overwrite
policy chosen by the operator.
## Restart Semantics
After a service restart:
- `GET /runs` returns only jobs created since the new service process started,
- old `job_id` values are invalid,
- `GET /runs/{job_id}` cannot recover pre-restart job metadata,
- `GET /runs/{job_id}/reports` works only for jobs known to the current process,
- run artifacts from earlier service processes remain available on disk.
Operators should recover previous results with the CLI run-history commands:
```sh
PYTHONPATH=src python3 -m guide_board runs list --runs-dir runs
PYTHONPATH=src python3 -m guide_board runs latest --runs-dir runs
PYTHONPATH=src python3 -m guide_board runs report --runs-dir runs --run-id RUN_ID
```
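
The restart semantics above follow directly from keeping the job table in process memory. The sketch below illustrates the idea with a hypothetical `InMemoryJobRegistry`; the class, its field names, and the `"queued"` status value are assumptions for illustration, not the service's actual implementation.

```python
import uuid


class InMemoryJobRegistry:
    """Hypothetical stand-in for a per-process HTTP job table."""

    def __init__(self):
        # The table starts empty in every new process; nothing is loaded from disk.
        self._jobs = {}

    def create(self, output_dir):
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {
            "job_id": job_id,
            "output_dir": output_dir,
            "status": "queued",  # assumed status label
        }
        return job_id

    def get(self, job_id):
        # Returns None for ids issued by an earlier process.
        return self._jobs.get(job_id)

    def all_jobs(self):
        return list(self._jobs.values())


# Simulate a restart: the second registry has no memory of the first.
before_restart = InMemoryJobRegistry()
old_job_id = before_restart.create("runs/example")
after_restart = InMemoryJobRegistry()
print(after_restart.get(old_job_id))  # None
```

The run directory written under `runs/example` is unaffected by the lost job record, which is why recovery goes through the run-history commands rather than the HTTP API.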
## Recovery Flow
Use this flow when the service process has restarted or a browser or UI has
lost its job state:
1. Identify the output directory passed to `POST /runs`.
2. Confirm whether `retention-summary.json` exists.
3. If it exists, run `guide-board runs report --runs-dir <parent>`, where
`<parent>` is the runs directory that contains the output directory, to
retrieve report paths.
4. If only partial files exist, inspect `run.json`, `plan.json`, and artifacts
before rerunning.
5. Rerun into a fresh output directory when the prior status is unclear.
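
Steps 2 and 4 of the flow above can be sketched as a small check on the output directory. The helper name `classify_output_dir` and the labels `"recoverable"`, `"partial"`, and `"missing"` are hypothetical; only the file names come from this document.

```python
from pathlib import Path


def classify_output_dir(output_dir):
    """Classify a run output directory for recovery purposes (hypothetical helper)."""
    out = Path(output_dir)
    if (out / "retention-summary.json").is_file():
        # A retained summary exists, so the run-history commands can find the run.
        return "recoverable"
    partial_markers = [out / "run.json", out / "plan.json", out / "artifacts"]
    if any(marker.exists() for marker in partial_markers):
        # Some run files exist without a summary: inspect them before rerunning.
        return "partial"
    # Nothing usable was written: rerun into a fresh output directory.
    return "missing"
```

A `"partial"` result is a prompt for manual inspection, not a signal that the run can be resumed.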
## Future Durable Index Option
A future durable service index may be added if UI or automation workflows need
cross-restart job lookup. If added, it should remain reconstructable from run
directories and should not become the authority for assessment results.
The minimum acceptable durable index would contain:
- job id,
- request payload,
- job transport status,
- run id,
- output directory,
- result paths,
- error summary.
The index should be optional, dependency-light, and repairable by scanning
retained run summaries.
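
One possible shape for such an index is sketched below, assuming a JSON Lines file with one record per update. The `JobIndexEntry` type, its field names, the helper functions, and the file format are all hypothetical; they only mirror the minimum field list above and use the standard library to stay dependency-light.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class JobIndexEntry:
    """Hypothetical record mirroring the minimum durable index fields."""

    job_id: str
    request_payload: dict
    transport_status: str
    run_id: Optional[str]
    output_dir: str
    result_paths: dict = field(default_factory=dict)
    error_summary: Optional[str] = None


def append_entry(index_path, entry):
    """Append one record; status changes are written as new lines."""
    with Path(index_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(entry), sort_keys=True) + "\n")


def load_entries(index_path):
    """Load the index into a dict keyed by job id; a missing file is an empty index."""
    path = Path(index_path)
    if not path.exists():
        return {}
    entries = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            entry = JobIndexEntry(**json.loads(line))
            entries[entry.job_id] = entry  # the latest line for a job id wins
    return entries
```

Because the file only caches transport metadata, deleting it loses nothing that the run directories do not already hold, which keeps the run directories as the authority for assessment results.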