Add deterministic repo scanner

2026-05-19 03:55:50 +02:00
parent 17bd23e79b
commit afd8b3d608
5 changed files with 1547 additions and 1 deletions

@@ -20,6 +20,41 @@ repository, one commit, and one scan profile. It contains:
The JSON schema lives at `schemas/discovery-snapshot.schema.yaml`.
## Deterministic Scanner CLI
The first implementation slice adds an offline deterministic scan command:
```bash
railiance-fabric scan . \
--repo-slug railiance-fabric \
--commit "$(git rev-parse HEAD)" \
--dry-run \
--output discovery-snapshot.json
```
Use `--json` to print the full `FabricDiscoverySnapshot` to stdout. Without
`--json`, the command prints a concise summary of node, edge, attribute, and
replacement-scope counts. The scanner does not call registries, catalogs, or
LLMs in this mode; `--output` is the only write side effect.
The deterministic extractor framework currently covers:
- repository metadata from local git/path evidence
- README, INTENT, and SCOPE document presence and headings
- repo-owned Fabric declarations under `fabric/`
- Python `pyproject.toml` package metadata and dependencies
- Node `package.json` package metadata and dependencies
- common lockfiles such as `package-lock.json`, `poetry.lock`, and `uv.lock`
- Dockerfiles and Docker Compose services
- OpenAPI and AsyncAPI contract files
- Score workload files
- Kubernetes-style deployment manifests
- common service config files such as `application.yaml` and
`appsettings.json`
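As a rough illustration of what one deterministic extractor might look like, the sketch below reads a `package.json` document and returns candidate records. The function name `extract_package_json`, the candidate dict shape, and the `npm-package:` / `npm-dependency:` key formats are assumptions made for this example, not the scanner's actual API; only the `name`, `version`, `dependencies`, and `devDependencies` fields come from the `package.json` format itself.

```python
import json


def extract_package_json(text, path="package.json"):
    """Return candidate dicts for a package and its declared dependencies.

    Hypothetical sketch: the candidate shape and key formats are assumed.
    """
    manifest = json.loads(text)
    name = manifest.get("name", "")
    candidates = [{
        "stable_key": f"npm-package:{name}",
        "kind": "package",
        "attributes": {"version": manifest.get("version")},
        "evidence": [path],
    }]
    for section in ("dependencies", "devDependencies"):
        # Sort dependency names so the output order never depends on file order.
        for dep, spec in sorted(manifest.get(section, {}).items()):
            candidates.append({
                "stable_key": f"npm-dependency:{name}->{dep}",
                "kind": "dependency",
                "attributes": {"spec": spec, "section": section},
                "evidence": [path],
            })
    return candidates


sample = '{"name": "demo", "version": "1.0.0", "dependencies": {"b": "^2", "a": "^1"}}'
result = extract_package_json(sample)
```

The extractor only parses local file content and sorts what it finds, so running it twice on the same input yields the same candidates in the same order.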
Every extractor emits its candidates through one shared accumulator, so
candidates with the same stable key are merged within a scan before the
snapshot is returned.
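The merge behaviour described above could be sketched roughly as follows. The `Accumulator` class, the candidate dict shape, and the merge policy (first-seen attribute values win, evidence paths accumulate) are all assumptions for illustration; the real accumulator may resolve conflicts differently.

```python
class Accumulator:
    """Hypothetical accumulator that merges candidates sharing a stable key."""

    def __init__(self):
        self._by_key = {}

    def emit(self, candidate):
        key = candidate["stable_key"]
        merged = self._by_key.get(key)
        if merged is None:
            # Copy so later merges never mutate the extractor's own objects.
            self._by_key[key] = {
                "stable_key": key,
                "kind": candidate["kind"],
                "attributes": dict(candidate.get("attributes", {})),
                "evidence": list(candidate.get("evidence", [])),
            }
            return
        # Assumed policy: keep the first value seen for each attribute.
        for name, value in candidate.get("attributes", {}).items():
            merged["attributes"].setdefault(name, value)
        # Record every distinct evidence path once, in arrival order.
        for source in candidate.get("evidence", []):
            if source not in merged["evidence"]:
                merged["evidence"].append(source)

    def candidates(self):
        # Sort by stable key so the result is independent of emit order.
        return [self._by_key[key] for key in sorted(self._by_key)]


acc = Accumulator()
acc.emit({
    "stable_key": "npm-package:demo",
    "kind": "package",
    "attributes": {"version": "1.0.0"},
    "evidence": ["package.json"],
})
acc.emit({
    "stable_key": "npm-package:demo",
    "kind": "package",
    "attributes": {"version": "9.9.9", "locked": True},
    "evidence": ["package-lock.json"],
})
merged = acc.candidates()
```

Here the two emits collapse into a single candidate that keeps the first-seen `version`, gains the `locked` attribute, and lists both files as evidence. Sorting by stable key on the way out is one simple way to keep the snapshot deterministic regardless of extractor order.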
## Identity
Identity is the main safety boundary. The scanner must not append guesses on