diff --git a/workplans/RAIL-FAB-WP-0011-operational-rescan-loops.md b/workplans/RAIL-FAB-WP-0011-operational-rescan-loops.md
new file mode 100644
index 0000000..f816690
--- /dev/null
+++ b/workplans/RAIL-FAB-WP-0011-operational-rescan-loops.md
@@ -0,0 +1,313 @@
+---
+id: RAIL-FAB-WP-0011
+type: workplan
+title: "Operational Rescan Loops"
+domain: railiance
+repo: railiance-fabric
+status: ready
+owner: codex
+topic_slug: railiance
+planning_priority: high
+planning_order: 11
+created: "2026-05-19"
+updated: "2026-05-19"
+state_hub_workstream_id: "b6eb92ee-1aba-49b4-8580-ab15782cb970"
+---
+
+# RAIL-FAB-WP-0011 - Operational Rescan Loops
+
+## Goal
+
+Turn the repo reality scanner into a regular operational loop that can rescan
+the local Fabric estate, compare each repo against its latest known discovery
+state, store useful baselines, surface changes for review, and update the
+registry without requiring manual JSON handoffs between runs.
+
+The desired outcome is a boring, repeatable command path that can be run by a
+human, cron, Codex automation, or a later State Hub operator. A run should answer
+three practical questions:
+
+- what changed in the observed repo reality?
+- what needs review before acceptance?
+- which repos failed, were skipped, or are becoming stale?
+
+## Background
+
+`RAIL-FAB-WP-0010` established the repo reality scanner, deterministic and
+LLM-assisted extraction, reconciliation, registry discovery snapshot storage,
+multi-repo `registry scan-manifest`, and the first small rollout dry-run.
+
+The scanner can already:
+
+- scan one repo or a manifest of repos
+- write discovery snapshots to files
+- reconcile against a previous snapshot directory
+- ingest discovery snapshots into the Fabric registry
+- accept candidates that are already review-safe
+- produce concise per-repo summaries
+
+The remaining operational gap is that repeated rescans still require too much
+manual setup: choosing a snapshot directory, exporting previous snapshots,
+remembering when to ingest, and turning run summaries into a persistent review
+backlog.
+
+This workplan closes that gap by making the registry and CLI cooperate around
+baselines, previous-from-registry diffs, run reports, stale/failure health, and
+automation-safe modes.
+
+## Design Principles
+
+- Default to safe dry-runs and explicit ingest/accept actions.
+- Prefer the registry as the durable source of prior discovery state.
+- Keep local snapshot caches useful but optional.
+- Make unchanged runs cheap and quiet.
+- Treat conflicts, tombstones, LLM failures, and missing repos as review
+  signals, not as silent noise.
+- Preserve per-repo failure isolation in every operational mode.
+- Keep the loop automation-friendly: stable exit codes, machine-readable
+  reports, lock/overlap behavior, and concise human summaries.
+- Avoid accepting or projecting discovery data unless review state and policy
+  allow it.
+
+## Proposed Operational Loop
+
+1. Read `registry/local-repos.yaml` or another onboarding manifest.
+2. For each selected repo, determine the previous discovery snapshot from:
+   - the latest registry snapshot for the same repo/profile, or
+   - a configured local snapshot cache, or
+   - no previous snapshot on first baseline.
+3. Run the scanner with deterministic rules and explicitly enabled connectors
+   or LLM profile.
+4. Reconcile current evidence against previous evidence.
+5. Write an operational run report with per-repo results, diffs, failures,
+   skipped LLM state, review artifact counts, and accepted/ingested ids.
+6. Optionally ingest changed or baseline snapshots into the registry.
+7. Optionally project candidates only when policy says they are acceptable.
+8. Expose the run result through registry/status endpoints and State Hub
+   progress notes.
+
+## Scope
+
+In scope:
+
+- CLI and registry support for previous-from-registry rescans.
+- Standard local snapshot/report directory conventions.
+- Run reports that can be consumed by humans and automation.
+- Idempotent ingest behavior for unchanged runs.
+- Review-oriented summary output and health status.
+- Documentation and tests for recurring use.
+
+Out of scope for this workplan:
+
+- A full review UI for discovery conflicts and tombstones.
+- Live server/deployment inventory connectors beyond existing local connector
+  mechanics.
+- Auto-generating repo-owned Fabric declaration patches from accepted
+  discoveries.
+- Enabling external LLM providers by default.
+
+Those are likely follow-up workplans once the operational loop produces steady
+baseline data.
+
+## Tasks
+
+### T01 - Snapshot Cache And Baseline Conventions
+
+```task
+id: RAIL-FAB-WP-0011-T01
+status: todo
+priority: high
+state_hub_task_id: "cb6f05b6-ae8c-47b1-aead-4505276b089f"
+```
+
+Define and implement the local baseline conventions for repeated discovery
+scans.
+
+Acceptance notes:
+
+- Define a standard local directory, likely `.fabric-discovery/`, for snapshot
+  caches and run reports.
+- Decide whether the directory is ignored, partially checked in, or fully local
+  operational state; document the reason.
+- Add CLI defaults or manifest configuration so `scan-manifest` can write and
+  read this directory without repeated flags.
+- Preserve explicit `--output-dir` and `--previous-dir` overrides.
+- Ensure output filenames remain stable by repo slug and scan profile.
+- Add tests that prove first-baseline runs create predictable snapshot/report
+  paths without affecting registry state in dry-run mode.
+
+### T02 - Previous-From-Registry Reconciliation
+
+```task
+id: RAIL-FAB-WP-0011-T02
+status: todo
+priority: high
+state_hub_task_id: "ee8e5437-3c87-473c-99a0-84d947d09249"
+```
+
+Allow manifest rescans to diff against the latest stored discovery snapshot in
+the registry, so operators do not need to export JSON before every run.
+
+Acceptance notes:
+
+- Add a `scan-manifest` option such as `--previous-source registry|dir|none`
+  or `--previous-from-registry`.
+- Fetch the latest discovery snapshot for each repo/profile through existing
+  registry discovery APIs.
+- Fall back cleanly when a repo has no previous registry snapshot and mark the
+  run as a first baseline for that repo.
+- Keep local `--previous-dir` support for offline or file-based workflows.
+- Include previous snapshot id/source in per-repo results and run reports.
+- Add tests for registry previous found, registry previous missing, registry
+  unavailable, and file-directory fallback.
+
+### T03 - Operational Run Reports
+
+```task
+id: RAIL-FAB-WP-0011-T03
+status: todo
+priority: high
+state_hub_task_id: "d11621f2-7610-4060-863d-dbf86858a3e6"
+```
+
+Persist each rescan loop as a report that can drive review, State Hub notes,
+and future automation.
+
+Acceptance notes:
+
+- Add a report schema or documented JSON shape for manifest rescan runs.
+- Record command profile, manifest path, selected repos, generated timestamp,
+  scanner version, registry URL, dry-run/ingest/accept flags, and LLM budget
+  policy.
+- For each repo, record commit, previous source/id, current output path,
+  discovery snapshot id, accepted graph snapshot id, candidate counts, diff
+  counts, review artifact counts, connector run summaries, and errors.
+- Add `--report-output` and a default report path under the standard
+  operational directory.
+- Keep console output concise while making the JSON report complete.
+- Add tests for report content in success, partial failure, and no-change runs.
+
+### T04 - Idempotent Ingest And Acceptance Policies
+
+```task
+id: RAIL-FAB-WP-0011-T04
+status: todo
+priority: high
+state_hub_task_id: "c64daf3b-a5ec-4ea9-82b6-f8f352eb9283"
+```
+
+Make registry writes safe for recurring execution by avoiding unnecessary
+snapshot churn and by separating ingest from acceptance policy.
+
+Acceptance notes:
+
+- Add a mode to skip ingesting unchanged discovery snapshots unless explicitly
+  requested.
+- Detect unchanged snapshots by reconciliation diff and/or normalized snapshot
+  fingerprint.
+- Keep an explicit first-baseline ingest mode for repos with no prior discovery
+  snapshot.
+- Add acceptance policy controls such as accepted-only, no-conflicts,
+  no-tombstones, selected keys, or selected review states.
+- Prevent `--accept` from projecting conflicted, needs-review, or low-confidence
+  candidates unless an explicit override is supplied.
+- Report why a repo was ingested, skipped unchanged, blocked for review, or
+  accepted.
+- Add tests covering unchanged skip, baseline ingest, changed ingest, blocked
+  acceptance, and explicit acceptance override.
+
+### T05 - Rescan Health And Registry Surfaces
+
+```task
+id: RAIL-FAB-WP-0011-T05
+status: todo
+priority: medium
+state_hub_task_id: "b3440439-b9c4-4753-98bc-618d1934ed4e"
+```
+
+Expose operational rescan state through the registry so humans and tools can
+see freshness, failures, and review load.
+
+Acceptance notes:
+
+- Store or derive latest rescan metadata per repo/profile.
+- Add registry inventory/status fields for latest discovery run, latest diff
+  counts, latest failure, stale age, and review artifact counts.
+- Add an endpoint or CLI view for repos needing review.
+- Add an endpoint or CLI view for repos stale beyond a configurable age.
+- Keep existing graph and discovery snapshot APIs backward compatible.
+- Add tests for inventory/status output after baseline, changed, failed, and
+  stale runs.
+
+### T06 - Automation-Safe Command Mode
+
+```task
+id: RAIL-FAB-WP-0011-T06
+status: todo
+priority: medium
+state_hub_task_id: "7461e6f1-7ef0-4947-9cfa-67f463e9aa00"
+```
+
+Make the rescan loop safe to run from cron, Codex automations, or a State Hub
+operator without bespoke shell glue.
+
+Acceptance notes:
+
+- Add a documented command recipe, script, or subcommand for the standard local
+  rescan loop.
+- Define stable exit codes for success, changes found, review required,
+  partial repo failures, and infrastructure failure.
+- Add lock/overlap protection so two local rescan loops do not run against the
+  same manifest concurrently.
+- Keep dry-run as the safe default unless ingest/accept flags are explicit.
+- Emit concise human output and machine-readable JSON consistently.
+- Add tests for exit-code policy and lock behavior where practical.
+
+### T07 - Documentation, Rollout, And First Baseline
+
+```task
+id: RAIL-FAB-WP-0011-T07
+status: todo
+priority: medium
+state_hub_task_id: "9c6f8e33-dd13-48ca-815d-73fd09b25423"
+```
+
+Document the operational loop and run a first controlled baseline against a
+small local repo set before broad adoption.
+
+Acceptance notes:
+
+- Document the standard local rescan workflow, registry-backed workflow,
+  report format, exit codes, and failure handling.
+- Document how to use deterministic-only mode, connector mode, and LLM-capped
+  mode safely.
+- Document the manual review steps before acceptance.
+- Run a first baseline loop against a small allowlist such as
+  `repo-scoping`, `llm-connect`, and `railiance-fabric`.
+- Record the resulting report summary and follow-up backlog in docs and State
+  Hub progress.
+- Mark this workplan ready for broader all-local-repo rollout only after the
+  small baseline loop is repeatable.
+
+## Open Questions
+
+- Should local discovery caches be committed, ignored, or treated as operator
+  runtime state only?
+- Should the registry store every run report or only latest run metadata?
+- What is the right default stale age for local repos: daily, weekly, or based
+  on commit changes?
+- Should exit code `0` mean "no infrastructure failure" or "no changes found"?
+- Which acceptance policies are safe enough for unattended operation?
+- Should State Hub receive one progress note per run or only when changes,
+  failures, or review-required conditions appear?
+
+## Close Criteria
+
+- A single documented command can perform a safe repeated rescan loop across a
+  manifest.
+- The command can diff against registry-stored previous discovery snapshots.
+- First-baseline, unchanged, changed, failed, and review-required repos are
+  distinguishable in console output, JSON reports, and registry status.
+- Repeated runs do not create noisy duplicate registry snapshots by default.
+- Acceptance remains explicit and policy-gated.
+- Tests cover the recurring loop behavior well enough to trust automation.
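Review note on T04 and the "no noisy duplicate registry snapshots" close criterion: the "normalized snapshot fingerprint" could be sketched roughly as below. This is only an illustration under stated assumptions, not the scanner's actual implementation. The workplan does not define the snapshot schema, so the volatile field names (`generated_at`, `snapshot_id`) and the helpers `snapshot_fingerprint` and `ingest_decision` are hypothetical.

```python
import hashlib
import json

# Hypothetical volatile keys that change on every run; the real snapshot
# schema is not defined in this workplan.
VOLATILE_KEYS = frozenset({"generated_at", "snapshot_id"})


def normalize(value):
    """Recursively drop volatile keys so equal content compares equal."""
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        # List order is preserved, so a reordered list counts as a change.
        return [normalize(v) for v in value]
    return value


def snapshot_fingerprint(snapshot):
    """Return a SHA-256 hex digest of the canonical JSON form of a snapshot."""
    canonical = json.dumps(normalize(snapshot), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest_decision(current, previous):
    """Classify a repo result as 'baseline', 'unchanged', or 'changed'."""
    if previous is None:
        return "baseline"  # no prior discovery snapshot for this repo/profile
    if snapshot_fingerprint(current) == snapshot_fingerprint(previous):
        return "unchanged"  # safe to skip ingest by default
    return "changed"
```

A caller would pass the current snapshot and the previous one (or `None` on first baseline) and skip ingest when the result is `"unchanged"`. If candidate ordering is not meaningful in the real schema, lists would need sorting by a stable key before hashing; that choice is left open here.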