Service Job Durability

Status: draft
Created: 2026-05-15

Decision

In the baseline, the guide-board local service keeps HTTP job state in memory only. This is intentional: the service is a thin local transport over the CLI contracts, not a workflow database.

Durable state lives in run directories:

  • run.json
  • plan.json
  • retention-summary.json
  • normalized/evidence.json
  • normalized/findings.json
  • normalized/mappings.json
  • reports/assessment-package.json
  • reports/report.md
  • artifacts/

The durable recovery index is the set of retention-summary.json files under a runs directory.
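As a rough illustration of what "the set of retention-summary.json files under a runs directory" means in practice, the sketch below walks a runs directory and lists the run directories that hold a summary. The function name `find_retained_runs` is hypothetical and not part of guide-board, and the recursive search is an assumption, since the exact nesting of run directories is not specified here.

```python
from pathlib import Path

# File name taken from the durable-state list above.
SUMMARY_NAME = "retention-summary.json"


def find_retained_runs(runs_dir):
    """Return the run directories under runs_dir that contain a retention summary.

    Hypothetical helper for illustration; it only checks for the file's
    presence and makes no assumption about the summary's JSON schema.
    """
    root = Path(runs_dir)
    if not root.is_dir():
        return []
    # rglob searches recursively, so nested run directories are also found.
    return sorted(p.parent for p in root.rglob(SUMMARY_NAME) if p.is_file())
```

A run directory with no summary (for example, one holding only run.json) is left out of the result, which matches the idea that only retained runs are part of the recovery index.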

Why In-Memory Jobs Stay The Baseline

In-memory service jobs keep the first service layer dependency-light and easy to embed in local, container, and extension-specific environments. Operators can restart the service without migrating or repairing a service database, and the CLI remains the source of truth for execution semantics.

This also keeps interrupted service runs easy to reason about:

  • if the process exits before a run completes, the HTTP job record is gone,
  • any partial run directory remains for inspection,
  • completed runs are recoverable through retained run summaries,
  • repeated runs should use a new output directory or an intentional overwrite policy chosen by the operator.

Restart Semantics

After a service restart:

  • GET /runs returns only jobs created since the new service process started,
  • old job_id values are invalid,
  • GET /runs/{job_id} cannot recover pre-restart job metadata,
  • GET /runs/{job_id}/reports only works for jobs known to the current process,
  • run artifacts from earlier service processes remain available on disk.
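The restart behaviour above follows directly from holding jobs in a process-local structure. The toy model below is not the service's actual code: the class `InMemoryJobStore`, its method names, and the `"queued"` status value are all assumptions made for illustration.

```python
import uuid


class InMemoryJobStore:
    """Hypothetical stand-in for a process-local job table (not guide-board code)."""

    def __init__(self):
        # Lives only as long as this process; nothing is written to disk.
        self._jobs = {}

    def create(self, output_dir):
        job_id = uuid.uuid4().hex
        # "queued" is an assumed placeholder status, not a documented value.
        self._jobs[job_id] = {"job_id": job_id, "output_dir": output_dir, "status": "queued"}
        return job_id

    def get(self, job_id):
        # None here would correspond to "unknown job" in an HTTP handler.
        return self._jobs.get(job_id)

    def list_jobs(self):
        return list(self._jobs.values())


# Simulated restart: a new store knows nothing about earlier job ids.
before_restart = InMemoryJobStore()
old_job_id = before_restart.create("runs/example")
after_restart = InMemoryJobStore()
```

In this model, `after_restart.get(old_job_id)` returns `None` and `after_restart.list_jobs()` is empty, while anything the run wrote under its output directory is untouched.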

Operators should recover previous results with the CLI run-history commands:

PYTHONPATH=src python3 -m guide_board runs list --runs-dir runs
PYTHONPATH=src python3 -m guide_board runs latest --runs-dir runs
PYTHONPATH=src python3 -m guide_board runs report --runs-dir runs --run-id RUN_ID

Recovery Flow

Use this flow when the service process restarted or a browser/UI lost its job state:

  1. Identify the output directory passed to POST /runs.
  2. Confirm whether retention-summary.json exists in that directory.
  3. If it exists, use guide-board runs report --runs-dir <parent>, where <parent> is the runs directory that contains the output directory, to retrieve report paths.
  4. If only partial files exist, inspect run.json, plan.json, and artifacts before rerunning.
  5. Rerun into a fresh output directory when the prior status is unclear.
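Steps 2 to 5 amount to a small decision over the contents of the output directory. The sketch below encodes that decision under stated assumptions: `classify_run_dir` and its four labels are hypothetical, and the presence of retention-summary.json is treated as the only signal that a run was retained.

```python
from pathlib import Path


def classify_run_dir(output_dir):
    """Classify an output directory for the recovery flow.

    Hypothetical helper; the labels are illustrative, not guide-board terms.
    """
    run_dir = Path(output_dir)
    if not run_dir.is_dir():
        return "missing"   # nothing on disk: rerun
    if (run_dir / "retention-summary.json").is_file():
        return "retained"  # step 3: retrieve report paths with the CLI
    if any(run_dir.iterdir()):
        return "partial"   # step 4: inspect run.json, plan.json, artifacts
    return "empty"         # directory exists but holds no files


def next_action(label):
    """Map a label to the recovery-flow step it suggests."""
    return {
        "retained": "use the runs report command",
        "partial": "inspect partial files before rerunning",
    }.get(label, "rerun into a fresh output directory")
```

A partial directory is deliberately not treated as a failure here; the flow asks the operator to inspect it first and only then decide whether to rerun.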

Future Durable Index Option

A future durable service index may be added if UI or automation workflows need cross-restart job lookup. If added, it should remain reconstructable from run directories and should not become the authority for assessment results.

The minimum acceptable durable index would contain:

  • job id,
  • request payload,
  • job transport status,
  • run id,
  • output directory,
  • result paths,
  • error summary.

The index should be optional, dependency-light, and repairable by scanning retained run summaries.
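One dependency-light shape for such an index entry is a standard-library dataclass serialized as one JSON object per line. This is only a sketch of the field list above: the class name `JobIndexEntry`, the field types, the defaults, and the JSON-lines encoding are all assumptions, not a committed format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class JobIndexEntry:
    """Hypothetical record mirroring the minimum index fields listed above."""

    job_id: str
    request_payload: dict
    transport_status: str  # job transport status, not the assessment result
    run_id: Optional[str] = None
    output_dir: Optional[str] = None
    result_paths: list = field(default_factory=list)
    error_summary: Optional[str] = None


def to_json_line(entry):
    """Serialize one entry as a single JSON line (assumed storage format)."""
    return json.dumps(asdict(entry), sort_keys=True)


def from_json_line(line):
    """Rebuild an entry from a line produced by to_json_line."""
    return JobIndexEntry(**json.loads(line))
```

Because every field is plain JSON data, a lost or corrupted index file could in principle be discarded and rebuilt by scanning run directories, which keeps the run directories as the authority for assessment results.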