# Service Job Durability

Status: draft
Created: 2026-05-15

## Decision

The guide-board local service keeps HTTP job state in memory for the baseline.
This is intentional. The service is a thin local transport over the CLI
contracts, not a workflow database.

Durable state lives in run directories:

- `run.json`
- `plan.json`
- `sources.lock.json`
- `retention-summary.json`
- `normalized/evidence.json`
- `normalized/findings.json`
- `normalized/mappings.json`
- `reports/assessment-package.json`
- `reports/report.md`
- `reports/submission-package.json`
- `artifacts/`

The durable recovery index is the set of `retention-summary.json` files under a
runs directory. No separate durable service index is required for the baseline;
the service reconstructs retained-run views by scanning those summaries.

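The scan described above can be sketched in a few lines. This is a minimal illustration, not the service's actual code: the function name `scan_retained_runs` is hypothetical, and it assumes each run lives in a direct child directory of the runs directory. Nothing is assumed about the fields inside `retention-summary.json`.

```python
import json
from pathlib import Path


def scan_retained_runs(runs_dir):
    """Return one entry per run directory that has a readable retention summary."""
    retained = []
    # Assumption: each run lives in a direct child directory of runs_dir.
    for summary_path in sorted(Path(runs_dir).glob("*/retention-summary.json")):
        try:
            summary = json.loads(summary_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Unreadable or partially written summaries are skipped, not repaired.
            continue
        retained.append({"run_dir": str(summary_path.parent), "summary": summary})
    return retained
```

Because the result is rebuilt from disk on every call, there is no index to migrate or repair after a restart, which is the property this decision relies on.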
## Why In-Memory Jobs Stay The Baseline

In-memory service jobs keep the first service layer dependency-light and easy to
embed in local, container, and extension-specific environments. Operators can
restart the service without migrating or repairing a service database, and the
CLI remains the source of truth for execution semantics.

This also keeps interrupted service runs easy to reason about:

- if the process exits before a run completes, the HTTP job record is gone,
- any partial run directory remains for inspection,
- completed runs are recoverable through retained run summaries,
- repeated runs should use a new output directory or an intentional overwrite
  policy chosen by the operator.

## Restart Semantics

After a service restart:

- `GET /runs` returns only jobs created since the new service process started,
- old `job_id` values are invalid,
- `GET /runs/{job_id}` cannot recover pre-restart job metadata,
- `GET /runs/{job_id}/reports` only works for jobs known to the current process,
- run artifacts from earlier service processes remain available on disk,
- `GET /retained-runs`, `GET /retained-runs/latest`,
  `GET /retained-runs/{run_id}/reports`, and
  `GET /retained-runs/{run_id}/artifact-manifest` can expose completed retained
  runs after restart.

Operators can recover previous results with either the CLI run-history commands
or the retained-run service endpoints:

```sh
PYTHONPATH=src python3 -m guide_board runs list --runs-dir runs
PYTHONPATH=src python3 -m guide_board runs latest --runs-dir runs
PYTHONPATH=src python3 -m guide_board runs report --runs-dir runs --run-id RUN_ID
```

```sh
curl -sf "http://127.0.0.1:8080/retained-runs?runs_dir=runs" | python3 -m json.tool
curl -sf "http://127.0.0.1:8080/retained-runs/RUN_ID/reports?runs_dir=runs" | python3 -m json.tool
curl -sf "http://127.0.0.1:8080/retained-runs/RUN_ID/artifact-manifest?runs_dir=runs" | python3 -m json.tool
```

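For scripted recovery, the same retained-run URLs can be built programmatically so that run ids and directory names are escaped correctly. The helper below is a hypothetical sketch, not part of guide-board; it only mirrors the endpoint paths and the `runs_dir` query parameter shown in the `curl` examples.

```python
from urllib.parse import quote, urlencode


def retained_run_urls(base_url, runs_dir, run_id):
    """Build the retained-run URLs used in the curl examples (hypothetical helper)."""
    query = urlencode({"runs_dir": runs_dir})
    # Escape the run id so it is safe to place in a URL path segment.
    rid = quote(run_id, safe="")
    root = base_url.rstrip("/") + "/retained-runs"
    return {
        "list": f"{root}?{query}",
        "reports": f"{root}/{rid}/reports?{query}",
        "artifact_manifest": f"{root}/{rid}/artifact-manifest?{query}",
    }
```

A caller would pass the service address, for example `retained_run_urls("http://127.0.0.1:8080", "runs", "RUN_ID")`, and fetch the resulting URLs with any HTTP client.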
## Recovery Flow

Use this flow when the service process restarted or a browser/UI lost its job
state:

1. Identify the output directory passed to `POST /runs`.
2. Confirm whether `retention-summary.json` exists in that directory.
3. If it exists, use
   `guide-board runs report --runs-dir <parent> --run-id <run_id>` or
   `GET /retained-runs/{run_id}/reports?runs_dir=<parent>` to retrieve report
   paths, where `<parent>` is the runs directory that contains the output
   directory.
4. If only partial files exist, inspect `run.json`, `plan.json`, and artifacts
   before rerunning.
5. Rerun into a fresh output directory when the prior status is unclear.

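The checks in steps 2 to 5 can be sketched as a small classifier. This is an illustration only: the function name and the `retained` / `partial` / `empty` labels are hypothetical, and the file names are taken from the durable-state list in this document.

```python
from pathlib import Path

# File names come from the durable-state list; the status labels are hypothetical.
PARTIAL_MARKERS = ("run.json", "plan.json")


def classify_run_dir(output_dir):
    """Classify an output directory as 'retained', 'partial', or 'empty'."""
    path = Path(output_dir)
    if (path / "retention-summary.json").is_file():
        # Steps 2-3: the run completed and is reachable through run history.
        return "retained"
    has_partial_file = any((path / name).is_file() for name in PARTIAL_MARKERS)
    if has_partial_file or (path / "artifacts").is_dir():
        # Step 4: inspect what is there before rerunning.
        return "partial"
    # Step 5: nothing recognizable; rerun into a fresh output directory.
    return "empty"
```

The classifier only reads the directory, so it is safe to run against a directory that an interrupted job left behind.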
## Future Durable Index Option

A future durable service index may be added if UI or automation workflows need
cross-restart transport job lookup. If added, it should remain reconstructable
from run directories and should not become the authority for assessment results.

The minimum acceptable durable index would contain:

- job id,
- request payload,
- job transport status,
- run id,
- output directory,
- result paths,
- error summary.

The index should be optional, dependency-light, and repairable by scanning
retained run summaries.

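One possible shape for such a record, using only the standard library, is sketched below. The class name, field types, defaults, and the JSON-lines serialization are all assumptions made for illustration; the document fixes only the list of fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class JobIndexRecord:
    """Hypothetical index entry; one field per item in the minimum list above."""

    job_id: str
    request_payload: Dict[str, Any]
    transport_status: str  # status values are not specified by this document
    run_id: Optional[str] = None
    output_dir: Optional[str] = None
    result_paths: Dict[str, str] = field(default_factory=dict)
    error_summary: Optional[str] = None

    def to_json_line(self) -> str:
        # One JSON object per line keeps the index appendable and easy to rebuild.
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the record a plain data structure with no behavior beyond serialization fits the requirement that the index stays dependency-light and never becomes the authority for assessment results.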