new workplan rescan

2026-05-20 00:57:26 +02:00
parent 6746943c0b
commit 50810ffd54
1 changed file with 313 additions and 0 deletions
--- a/workplans/RAIL-FAB-WP-0011-operational-rescan-loops.md
+++ b/workplans/RAIL-FAB-WP-0011-operational-rescan-loops.md
@@ -0,0 +1,313 @@
+---
+id: RAIL-FAB-WP-0011
+type: workplan
+title: "Operational Rescan Loops"
+domain: railiance
+repo: railiance-fabric
+status: ready
+owner: codex
+topic_slug: railiance
+planning_priority: high
+planning_order: 11
+created: "2026-05-19"
+updated: "2026-05-19"
+state_hub_workstream_id: "b6eb92ee-1aba-49b4-8580-ab15782cb970"
+---
+
+# RAIL-FAB-WP-0011 - Operational Rescan Loops
+
+## Goal
+
+Turn the repo reality scanner into a regular operational loop that can rescan
+the local Fabric estate, compare each repo against its latest known discovery
+state, store useful baselines, surface changes for review, and update the
+registry without requiring manual JSON handoffs between runs.
+
+The desired outcome is a boring, repeatable command path that can be run by a
+human, cron, Codex automation, or a later State Hub operator. A run should answer
+three practical questions:
+
+- what changed in the observed repo reality?
+- what needs review before acceptance?
+- which repos failed, were skipped, or are becoming stale?
+
+## Background
+
+`RAIL-FAB-WP-0010` established the repo reality scanner, deterministic and
+LLM-assisted extraction, reconciliation, registry discovery snapshot storage,
+multi-repo `registry scan-manifest`, and the first small rollout dry-run.
+
+The scanner can already:
+
+- scan one repo or a manifest of repos
+- write discovery snapshots to files
+- reconcile against a previous snapshot directory
+- ingest discovery snapshots into the Fabric registry
+- accept candidates that are already review-safe
+- produce concise per-repo summaries
+
+The remaining operational gap is that repeated rescans still require too much
+manual setup: choosing a snapshot directory, exporting previous snapshots,
+remembering when to ingest, and turning run summaries into a persistent review
+backlog.
+
+This workplan closes that gap by making the registry and CLI cooperate around
+baselines, previous-from-registry diffs, run reports, stale/failure health, and
+automation-safe modes.
+
+## Design Principles
+
+- Default to safe dry-runs and explicit ingest/accept actions.
+- Prefer the registry as the durable source of prior discovery state.
+- Keep local snapshot caches useful but optional.
+- Make unchanged runs cheap and quiet.
+- Treat conflicts, tombstones, LLM failures, and missing repos as review
+  signals, not as noise to be silently dropped.
+- Preserve per-repo failure isolation in every operational mode.
+- Keep the loop automation-friendly: stable exit codes, machine-readable
+  reports, lock/overlap behavior, and concise human summaries.
+- Avoid accepting or projecting discovery data unless review state and policy
+  allow it.
+
+## Proposed Operational Loop
+
+1. Read `registry/local-repos.yaml` or another onboarding manifest.
+2. For each selected repo, determine the previous discovery snapshot from:
+   - the latest registry snapshot for the same repo/profile, or
+   - a configured local snapshot cache, or
+   - no previous snapshot on first baseline.
+3. Run the scanner with deterministic rules, plus any explicitly enabled
+   connectors or LLM profile.
+4. Reconcile current evidence against previous evidence.
+5. Write an operational run report with per-repo results, diffs, failures,
+   whether LLM extraction was skipped, review artifact counts, and
+   accepted/ingested ids.
+6. Optionally ingest changed or baseline snapshots into the registry.
+7. Optionally project candidates only when policy says they are acceptable.
+8. Expose the run result through registry/status endpoints and State Hub
+   progress notes.
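
The loop above, together with the per-repo failure isolation principle, can be sketched roughly as follows. This is only an illustration: `RepoResult`, `run_loop`, and the `scan_repo` callable are hypothetical names, not part of the existing scanner.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RepoResult:
    """Outcome of one repo in a rescan run (hypothetical shape)."""
    repo: str
    status: str  # "ok" or "failed"
    error: Optional[str] = None


def run_loop(repos: list[str], scan_repo: Callable[[str], None]) -> list[RepoResult]:
    """Scan every repo in order; a failure in one repo never aborts the rest."""
    results = []
    for repo in repos:
        try:
            scan_repo(repo)
            results.append(RepoResult(repo, "ok"))
        except Exception as exc:  # isolate per-repo failures
            results.append(RepoResult(repo, "failed", str(exc)))
    return results
```

Collecting a result per repo, rather than raising on the first error, is what lets a later report list failed and skipped repos alongside successful ones.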
+
+## Scope
+
+In scope:
+
+- CLI and registry support for previous-from-registry rescans.
+- Standard local snapshot/report directory conventions.
+- Run reports that can be consumed by humans and automation.
+- Idempotent ingest behavior for unchanged runs.
+- Review-oriented summary output and health status.
+- Documentation and tests for recurring use.
+
+Out of scope for this workplan:
+
+- A full review UI for discovery conflicts and tombstones.
+- Live server/deployment inventory connectors beyond existing local connector
+  mechanics.
+- Auto-generating repo-owned Fabric declaration patches from accepted
+  discoveries.
+- Enabling external LLM providers by default.
+
+Those are likely follow-up workplans once the operational loop produces steady
+baseline data.
+
+## Tasks
+
+### T01 - Snapshot Cache And Baseline Conventions
+
+```task
+id: RAIL-FAB-WP-0011-T01
+status: todo
+priority: high
+state_hub_task_id: "cb6f05b6-ae8c-47b1-aead-4505276b089f"
+```
+
+Define and implement the local baseline conventions for repeated discovery
+scans.
+
+Acceptance notes:
+
+- Define a standard local directory, likely `.fabric-discovery/`, for snapshot
+  caches and run reports.
+- Decide whether the directory is ignored, partially checked in, or fully local
+  operational state; document the reason.
+- Add CLI defaults or manifest configuration so `scan-manifest` can write and
+  read this directory without repeated flags.
+- Preserve explicit `--output-dir` and `--previous-dir` overrides.
+- Ensure output filenames remain stable by repo slug and scan profile.
+- Add tests that prove first-baseline runs create predictable snapshot/report
+  paths without affecting registry state in dry-run mode.
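
One possible way to keep output filenames stable by repo slug and scan profile is sketched below. The `snapshots/` subdirectory, the `<slug>.<profile>.json` naming, and the helper names `slugify` and `snapshot_path` are assumptions for illustration; the workplan leaves the actual layout open.

```python
import re
from pathlib import Path


def slugify(name: str) -> str:
    """Lowercase and collapse anything non-alphanumeric into single dashes."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def snapshot_path(base: Path, repo: str, profile: str) -> Path:
    """Deterministic snapshot location for one repo/profile pair.

    The layout (base/snapshots/<slug>.<profile>.json) is an assumed
    convention, not one defined by the scanner today.
    """
    return base / "snapshots" / f"{slugify(repo)}.{slugify(profile)}.json"
```

Because the path depends only on the repo name and profile, a second run overwrites the same file instead of accumulating timestamped copies.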
+
+### T02 - Previous-From-Registry Reconciliation
+
+```task
+id: RAIL-FAB-WP-0011-T02
+status: todo
+priority: high
+state_hub_task_id: "ee8e5437-3c87-473c-99a0-84d947d09249"
+```
+
+Allow manifest rescans to diff against the latest stored discovery snapshot in
+the registry, so operators do not need to export JSON before every run.
+
+Acceptance notes:
+
+- Add a `scan-manifest` option such as `--previous-source registry|dir|none`
+  or `--previous-from-registry`.
+- Fetch the latest discovery snapshot for each repo/profile through existing
+  registry discovery APIs.
+- Fall back cleanly when a repo has no previous registry snapshot and mark the
+  run as a first baseline for that repo.
+- Keep local `--previous-dir` support for offline or file-based workflows.
+- Include previous snapshot id/source in per-repo results and run reports.
+- Add tests for four cases: previous snapshot found in the registry, previous
+  snapshot missing from the registry, registry unavailable, and file-directory
+  fallback.
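
The previous-snapshot lookup described in this task might look like the sketch below. Everything here is hypothetical: `resolve_previous`, the `fetch_latest` callable standing in for the registry discovery API, the `<repo>.<profile>.json` file naming, and the source labels.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def resolve_previous(
    repo: str,
    profile: str,
    source: str,  # "registry", "dir", or "none"
    fetch_latest: Optional[Callable[[str, str], Optional[dict]]] = None,
    previous_dir: Optional[Path] = None,
) -> tuple[Optional[dict], str]:
    """Return (previous_snapshot, label) for one repo/profile.

    The label is meant to be recorded in per-repo results and run reports.
    """
    if source == "none":
        return None, "none"
    if source == "registry" and fetch_latest is not None:
        snapshot = fetch_latest(repo, profile)  # stand-in for the registry call
        if snapshot is not None:
            return snapshot, "registry"
    elif source == "dir" and previous_dir is not None:
        path = previous_dir / f"{repo}.{profile}.json"  # assumed naming
        if path.is_file():
            return json.loads(path.read_text()), "dir"
    # Nothing found: treat this run as the first baseline for the repo.
    return None, "first-baseline"
```

A registry outage is deliberately not handled here; in a real implementation `fetch_latest` would need to distinguish "no snapshot" from "registry unavailable" so the two cases can be reported separately.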
+
+### T03 - Operational Run Reports
+
+```task
+id: RAIL-FAB-WP-0011-T03
+status: todo
+priority: high
+state_hub_task_id: "d11621f2-7610-4060-863d-dbf86858a3e6"
+```
+
+Persist each rescan loop as a report that can drive review, State Hub notes,
+and future automation.
+
+Acceptance notes:
+
+- Add a report schema or documented JSON shape for manifest rescan runs.
+- Record command profile, manifest path, selected repos, generated timestamp,
+  scanner version, registry URL, dry-run/ingest/accept flags, and LLM budget
+  policy.
+- For each repo, record commit, previous source/id, current output path,
+  discovery snapshot id, accepted graph snapshot id, candidate counts, diff
+  counts, review artifact counts, connector run summaries, and errors.
+- Add `--report-output` and a default report path under the standard
+  operational directory.
+- Keep console output concise while making the JSON report complete.
+- Add tests for report content in success, partial failure, and no-change runs.
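
As a starting point for the report shape, the fields listed above could be modelled as dataclasses and serialized to JSON. This is a partial sketch under assumptions: the class names, field names, and the subset of fields shown are illustrative, not an agreed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RepoReport:
    """Per-repo section of a run report (illustrative subset of fields)."""
    repo: str
    commit: Optional[str] = None
    previous_source: Optional[str] = None
    previous_id: Optional[str] = None
    discovery_snapshot_id: Optional[str] = None
    diff_counts: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)


@dataclass
class RunReport:
    """Top-level run report (illustrative subset of fields)."""
    manifest_path: str
    generated_at: str
    dry_run: bool = True
    repos: list = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys keeps the output stable across runs, which helps diffing.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A usage example: `RunReport("registry/local-repos.yaml", "2026-05-19T00:00:00Z", repos=[RepoReport("railiance-fabric")]).to_json()` yields a nested JSON document with one entry under `repos`.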
+
+### T04 - Idempotent Ingest And Acceptance Policies
+
+```task
+id: RAIL-FAB-WP-0011-T04
+status: todo
+priority: high
+state_hub_task_id: "c64daf3b-a5ec-4ea9-82b6-f8f352eb9283"
+```
+
+Make registry writes safe for recurring execution by avoiding unnecessary
+snapshot churn and by separating ingest from acceptance policy.
+
+Acceptance notes:
+
+- Add a mode to skip ingesting unchanged discovery snapshots unless explicitly
+  requested.
+- Detect unchanged snapshots by reconciliation diff and/or normalized snapshot
+  fingerprint.
+- Keep an explicit first-baseline ingest mode for repos with no prior discovery
+  snapshot.
+- Add acceptance policy controls such as accepted-only, no-conflicts,
+  no-tombstones, selected keys, or selected review states.
+- Prevent `--accept` from projecting conflicted, needs-review, or low-confidence
+  candidates unless an explicit override is supplied.
+- Report why a repo was ingested, skipped unchanged, blocked for review, or
+  accepted.
+- Add tests covering unchanged skip, baseline ingest, changed ingest, blocked
+  acceptance, and explicit acceptance override.
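
Unchanged-snapshot detection via a normalized fingerprint could be sketched as below. The set of volatile keys, the function names, and the decision labels are assumptions; the real snapshot schema may need deeper normalization than dropping top-level fields.

```python
import hashlib
import json
from typing import Optional

# Assumed: top-level fields that differ on every run and should not count
# as a change. The real snapshot schema may use other names.
VOLATILE_KEYS = {"generated_at", "snapshot_id"}


def fingerprint(snapshot: dict) -> str:
    """SHA-256 over a canonical JSON form with volatile fields removed."""
    stable = {k: v for k, v in snapshot.items() if k not in VOLATILE_KEYS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest_decision(current: dict, previous: Optional[dict], force: bool = False) -> str:
    """Return the reason string to record in the run report."""
    if previous is None:
        return "baseline"
    if not force and fingerprint(current) == fingerprint(previous):
        return "skipped-unchanged"
    return "ingest"
```

Sorting keys before hashing means two snapshots with the same content but different key order produce the same fingerprint. List order is still significant in this sketch.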
+
+### T05 - Rescan Health And Registry Surfaces
+
+```task
+id: RAIL-FAB-WP-0011-T05
+status: todo
+priority: medium
+state_hub_task_id: "b3440439-b9c4-4753-98bc-618d1934ed4e"
+```
+
+Expose operational rescan state through the registry so humans and tools can
+see freshness, failures, and review load.
+
+Acceptance notes:
+
+- Store or derive latest rescan metadata per repo/profile.
+- Add registry inventory/status fields for latest discovery run, latest diff
+  counts, latest failure, stale age, and review artifact counts.
+- Add an endpoint or CLI view for repos needing review.
+- Add an endpoint or CLI view for repos stale beyond a configurable age.
+- Keep existing graph and discovery snapshot APIs backward compatible.
+- Add tests for inventory/status output after baseline, changed, failed, and
+  stale runs.
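
A stale-repo view with a configurable age could be derived along these lines. The function name and the input shape (a mapping from repo to last successful scan time) are hypothetical; the default age itself is still an open question in this workplan.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_repos(
    last_scanned: dict[str, Optional[datetime]],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return repos whose last scan is older than max_age, sorted by name.

    A repo with no recorded scan (None) is treated as stale.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        repo
        for repo, scanned_at in last_scanned.items()
        if scanned_at is None or now - scanned_at > max_age
    )
```

Passing `now` explicitly keeps the function deterministic in tests.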
+
+### T06 - Automation-Safe Command Mode
+
+```task
+id: RAIL-FAB-WP-0011-T06
+status: todo
+priority: medium
+state_hub_task_id: "7461e6f1-7ef0-4947-9cfa-67f463e9aa00"
+```
+
+Make the rescan loop safe to run from cron, Codex automations, or a State Hub
+operator without bespoke shell glue.
+
+Acceptance notes:
+
+- Add a documented command recipe, script, or subcommand for the standard local
+  rescan loop.
+- Define stable exit codes for success, changes found, review required,
+  partial repo failures, and infrastructure failure.
+- Add lock/overlap protection so two local rescan loops do not run against the
+  same manifest concurrently.
+- Keep dry-run as the safe default unless ingest/accept flags are explicit.
+- Emit concise human output and machine-readable JSON consistently.
+- Add tests for exit-code policy and lock behavior where practical.
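
A minimal sketch of an exit-code policy and lock-file overlap protection follows. The numeric codes, status strings, and names (`ExitCode`, `exit_code`, `manifest_lock`) are placeholders: the meaning of exit code `0` is still listed under Open Questions, and this sketch simply assumes it means "nothing to report".

```python
import os
from contextlib import contextmanager
from enum import IntEnum


class ExitCode(IntEnum):
    """Placeholder values; the real policy is still to be decided."""
    OK = 0
    CHANGES_FOUND = 10
    REVIEW_REQUIRED = 20
    PARTIAL_FAILURE = 30
    INFRASTRUCTURE_FAILURE = 40


def exit_code(statuses: list[str]) -> ExitCode:
    """Pick the most severe outcome across per-repo statuses."""
    if "failed" in statuses:
        return ExitCode.PARTIAL_FAILURE
    if "review-required" in statuses:
        return ExitCode.REVIEW_REQUIRED
    if "changed" in statuses:
        return ExitCode.CHANGES_FOUND
    return ExitCode.OK


@contextmanager
def manifest_lock(lock_path: str):
    """Hold an exclusive lock file for the duration of a run."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another rescan holds {lock_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

This sketch does not recover from a stale lock left behind by a crashed run; a real implementation would need a policy for that, for example checking whether the recorded PID is still alive.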
+
+### T07 - Documentation, Rollout, And First Baseline
+
+```task
+id: RAIL-FAB-WP-0011-T07
+status: todo
+priority: medium
+state_hub_task_id: "9c6f8e33-dd13-48ca-815d-73fd09b25423"
+```
+
+Document the operational loop and run a first controlled baseline against a
+small local repo set before broad adoption.
+
+Acceptance notes:
+
+- Document the standard local rescan workflow, registry-backed workflow,
+  report format, exit codes, and failure handling.
+- Document how to use deterministic-only mode, connector mode, and LLM-capped
+  mode safely.
+- Document the manual review steps before acceptance.
+- Run a first baseline loop against a small allowlist such as
+  `repo-scoping`, `llm-connect`, and `railiance-fabric`.
+- Record the resulting report summary and follow-up backlog in docs and State
+  Hub progress.
+- Mark this workplan ready for broader all-local-repo rollout only after the
+  small baseline loop is repeatable.
+
+## Open Questions
+
+- Should local discovery caches be committed, ignored, or treated as operator
+  runtime state only?
+- Should the registry store every run report or only latest run metadata?
+- What is the right default stale age for local repos: one day, one week, or
+  a threshold based on commit changes?
+- Should exit code `0` mean "no infrastructure failure" or "no changes found"?
+- Which acceptance policies are safe enough for unattended operation?
+- Should State Hub receive one progress note per run or only when changes,
+  failures, or review-required conditions appear?
+
+## Close Criteria
+
+- A single documented command can perform a safe repeated rescan loop across a
+  manifest.
+- The command can diff against registry-stored previous discovery snapshots.
+- First-baseline, unchanged, changed, failed, and review-required repos are
+  distinguishable in console output, JSON reports, and registry status.
+- Repeated runs do not create noisy duplicate registry snapshots by default.
+- Acceptance remains explicit and policy-gated.
+- Tests cover the recurring loop behavior well enough to trust automation.