Add llm-assisted discovery extraction

2026-05-19 04:35:35 +02:00
parent bc25eb6871
commit a76c6a4aea
7 changed files with 981 additions and 4 deletions
--- a/docs/repo-reality-scanner.md
+++ b/docs/repo-reality-scanner.md
@@ -55,6 +55,42 @@ The deterministic extractor framework currently covers:
 Each extractor emits candidates through the same accumulator so stable-key
 duplicates merge inside a scan before the snapshot is returned.

+## LLM-Assisted Extraction
+
+LLM extraction is opt-in and runs only when `--llm` is passed explicitly:
+
+```bash
+railiance-fabric scan . \
+  --repo-slug railiance-fabric \
+  --llm \
+  --llm-provider openai \
+  --llm-model gpt-4.1-mini \
+  --dry-run \
+  --output discovery-with-llm.json
+```
+
+The implementation integrates through `llm-connect` with `create_adapter` and
+`RunConfig`. Tests use a `MockLLMAdapter`-compatible boundary so CI stays
+offline. If `llm-connect` is unavailable, the provider call fails, or the model
+returns malformed JSON, the scanner records a `review_artifacts` entry and keeps
+the discovery snapshot schema-valid.
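
The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the scanner's actual code: `extract_with_fallback`, the `call_model` callable, and the artifact kinds `llm_call_failed` and `llm_malformed_json` are hypothetical names chosen for the example, and the real `llm-connect` API is deliberately not used here.

```python
import json
from typing import Any, Callable

# Top-level keys the prompt asks the model to return.
EXPECTED_KEYS = ("nodes", "edges", "attributes")


def extract_with_fallback(call_model: Callable[[str], str], prompt: str) -> dict[str, Any]:
    """Return LLM candidates plus review artifacts without raising to the caller."""
    result: dict[str, Any] = {key: [] for key in EXPECTED_KEYS}
    result["review_artifacts"] = []

    try:
        raw = call_model(prompt)
    except Exception as exc:  # missing dependency or provider failure
        result["review_artifacts"].append(
            {"kind": "llm_call_failed", "detail": str(exc)}  # hypothetical kind
        )
        return result

    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, TypeError) as exc:
        result["review_artifacts"].append(
            {"kind": "llm_malformed_json", "detail": str(exc)}  # hypothetical kind
        )
        return result

    # Valid JSON can still have the wrong shape; treat that as malformed too.
    shape_ok = isinstance(payload, dict) and all(
        isinstance(payload.get(key), list) for key in EXPECTED_KEYS
    )
    if not shape_ok:
        result["review_artifacts"].append(
            {"kind": "llm_malformed_json", "detail": "unexpected shape"}
        )
        return result

    for key in EXPECTED_KEYS:
        result[key] = payload[key]
    return result
```

Because every failure path returns the same dictionary shape with empty candidate lists, a caller can merge the result into the snapshot unconditionally, which is one way to keep the output schema-valid.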
+
+The LLM never receives the whole repository. The scanner first builds a compact
+evidence bundle from deterministic candidates, prioritizing repo-owned Fabric
+declarations, services, capabilities, interfaces, libraries, deployments, and
+small README/INTENT/SCOPE signals. The prompt asks for strict JSON:
+
+```json
+{"nodes": [], "edges": [], "attributes": []}
+```
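
One way to picture the evidence bundle is as a prioritized, truncated view of the deterministic candidates. The sketch below is an assumption about how such a bundle could be built: the kind names in `KIND_PRIORITY`, the candidate fields `kind`, `key`, and `summary`, and the limits of 40 items and 200 characters are all hypothetical.

```python
from typing import Any

# Hypothetical priority order, loosely following the kinds listed above.
KIND_PRIORITY = [
    "fabric_declaration",
    "service",
    "capability",
    "interface",
    "library",
    "deployment",
    "doc_signal",
]


def build_evidence_bundle(
    candidates: list[dict[str, Any]], max_items: int = 40
) -> list[dict[str, Any]]:
    """Order candidates by kind priority and keep a compact, bounded subset."""
    rank = {kind: index for index, kind in enumerate(KIND_PRIORITY)}
    # sorted() is stable, so candidates of the same kind keep their input order;
    # unknown kinds sort after every known kind.
    ordered = sorted(candidates, key=lambda c: rank.get(c.get("kind"), len(rank)))
    return [
        {
            "kind": c.get("kind"),
            "key": c.get("key"),
            # Truncate free text so the prompt stays small.
            "summary": str(c.get("summary", ""))[:200],
        }
        for c in ordered[:max_items]
    ]
```

Bounding both the item count and the per-item text keeps the prompt size predictable regardless of repository size.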
+
+Projected LLM candidates are always marked `origin: llm` and
+`review_state: needs_review`. Candidates below the configured confidence
+threshold become `llm_low_confidence` review artifacts instead of graph
+candidates. Edges with unresolved endpoints and attributes with unresolved
+targets also become review artifacts. Accepted graph data still requires
+deterministic evidence, repo-owned declarations, or a later human review/acceptance path.
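
The projection rules above can be sketched as a small triage function. Only `origin: llm`, `review_state: needs_review`, and `llm_low_confidence` come from the text. The field names `key`, `source`, `target`, and `confidence`, the artifact kinds `llm_unresolved_reference` and `llm_missing_key`, the default threshold of 0.6, and the choice to let edges resolve against LLM-proposed nodes that passed the threshold are assumptions made for illustration.

```python
from typing import Any, Iterable

Candidate = dict[str, Any]


def triage_llm_output(
    payload: dict[str, list[Candidate]],
    known_keys: Iterable[str],
    threshold: float = 0.6,  # hypothetical default
) -> tuple[list[Candidate], list[Candidate]]:
    """Split parsed LLM output into graph candidates and review artifacts."""
    candidates: list[Candidate] = []
    artifacts: list[Candidate] = []
    resolvable = set(known_keys)

    def confident(item: Candidate) -> bool:
        if float(item.get("confidence", 0.0)) >= threshold:
            return True
        artifacts.append({"kind": "llm_low_confidence", "item": item})
        return False

    def project(item: Candidate, section: str) -> None:
        # Every projected candidate is tagged for later review.
        candidates.append(
            {**item, "section": section, "origin": "llm", "review_state": "needs_review"}
        )

    for node in payload.get("nodes", []):
        if not confident(node):
            continue
        if node.get("key") is None:
            artifacts.append({"kind": "llm_missing_key", "item": node})
            continue
        resolvable.add(node["key"])
        project(node, "nodes")

    for edge in payload.get("edges", []):
        if not confident(edge):
            continue
        if edge.get("source") in resolvable and edge.get("target") in resolvable:
            project(edge, "edges")
        else:
            artifacts.append({"kind": "llm_unresolved_reference", "item": edge})

    for attr in payload.get("attributes", []):
        if not confident(attr):
            continue
        if attr.get("target") in resolvable:
            project(attr, "attributes")
        else:
            artifacts.append({"kind": "llm_unresolved_reference", "item": attr})

    return candidates, artifacts
```

Nothing in this sketch marks a candidate as accepted: the function only separates reviewable candidates from review artifacts, leaving acceptance to a separate path.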
+
 ## Identity

 Identity is the main safety boundary. The scanner must not append guesses on