Add llm-assisted discovery extraction

2026-05-19 04:35:35 +02:00
parent bc25eb6871
commit a76c6a4aea
7 changed files with 981 additions and 4 deletions


@@ -55,6 +55,42 @@ The deterministic extractor framework currently covers:
Each extractor emits candidates through the same accumulator so stable-key
duplicates merge inside a scan before the snapshot is returned.
## LLM-Assisted Extraction
LLM extraction is optional and runs only when explicitly requested with `--llm`:
```bash
railiance-fabric scan . \
--repo-slug railiance-fabric \
--llm \
--llm-provider openai \
--llm-model gpt-4.1-mini \
--dry-run \
--output discovery-with-llm.json
```
The implementation integrates through `llm-connect` with `create_adapter` and
`RunConfig`. Tests use a `MockLLMAdapter`-compatible boundary so CI stays
offline. If `llm-connect` is unavailable, the provider call fails, or the model
returns malformed JSON, the scanner records a `review_artifacts` entry and keeps
the discovery snapshot schema-valid.
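
The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: `run_llm_extraction`, the `call_model` callable, and the artifact kinds `llm_call_failed` and `llm_malformed_json` are hypothetical names, and the real `llm-connect` adapter call is deliberately hidden behind a plain callable so the sketch assumes nothing about that library's API.

```python
from __future__ import annotations

import json
from typing import Any, Callable, Optional


def run_llm_extraction(
    call_model: Callable[[str], str],
    prompt: str,
    snapshot: dict[str, Any],
) -> Optional[dict[str, Any]]:
    """Call the model once; on failure, record a review artifact and return None.

    `call_model` stands in for whatever wraps the provider (hypothetical).
    The snapshot is only ever appended to, so it keeps its existing shape.
    """
    artifacts = snapshot.setdefault("review_artifacts", [])

    try:
        raw = call_model(prompt)
    except Exception as exc:  # missing dependency, provider error, timeout, ...
        artifacts.append({"kind": "llm_call_failed", "detail": str(exc)})
        return None

    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        artifacts.append({"kind": "llm_malformed_json", "detail": str(exc)})
        return None

    if not isinstance(payload, dict):
        # Valid JSON, but not the object the prompt asked for.
        artifacts.append({"kind": "llm_malformed_json", "detail": "expected a JSON object"})
        return None

    return payload
```

Passing a stub callable in place of `call_model` is one way a test could stay offline, in the same spirit as the `MockLLMAdapter`-compatible boundary mentioned above.
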
The LLM never receives the whole repository. The scanner first builds a compact
evidence bundle from deterministic candidates, prioritizing repo-owned Fabric
declarations, services, capabilities, interfaces, libraries, deployments, and
small README/INTENT/SCOPE signals. The prompt asks for strict JSON:
```json
{"nodes": [], "edges": [], "attributes": []}
```
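
As a rough sketch of how a compact evidence bundle might be assembled before prompting, assuming each deterministic candidate is a dict with `kind`, `key`, and `excerpt` fields. Those field names, the kind labels, the size limits, and both function names are illustrative assumptions; only the priority order follows the paragraph above.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical kind labels, ordered as in the prose above.
KIND_PRIORITY = [
    "fabric_declaration",
    "service",
    "capability",
    "interface",
    "library",
    "deployment",
    "doc_signal",  # small README/INTENT/SCOPE signals
]


def build_evidence_bundle(
    candidates: list[dict[str, Any]],
    max_items: int = 40,
    max_chars: int = 400,
) -> list[dict[str, str]]:
    """Select and trim candidates so the prompt never carries the whole repo."""
    rank = {kind: i for i, kind in enumerate(KIND_PRIORITY)}
    known = [c for c in candidates if c.get("kind") in rank]
    # sorted() is stable, so candidates of equal priority keep scan order.
    ordered = sorted(known, key=lambda c: rank[c["kind"]])
    return [
        {
            "kind": c["kind"],
            "key": str(c.get("key", "")),
            "excerpt": str(c.get("excerpt", ""))[:max_chars],
        }
        for c in ordered[:max_items]
    ]


def build_prompt(bundle: list[dict[str, str]]) -> str:
    """Ask for the strict JSON shape and attach the trimmed evidence."""
    return (
        "Return strict JSON with exactly the keys nodes, edges, attributes.\n"
        "Evidence:\n" + json.dumps(bundle, indent=2)
    )
```

Capping both the item count and the excerpt length keeps the prompt size bounded regardless of repository size.
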
Projected LLM candidates are always marked `origin: llm` and
`review_state: needs_review`. Candidates below the configured confidence
threshold become `llm_low_confidence` review artifacts instead of graph
candidates. Unresolved edge endpoints or attribute targets also become review
artifacts. Accepted graph data still requires deterministic evidence,
repo-owned declarations, or a later human review/acceptance path.
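
The projection rules above might look roughly like this in code. It is a sketch under stated assumptions: the `key`, `source`, `target`, and `confidence` field names, the default threshold of 0.6, the function names, and the `llm_unresolved_endpoint` artifact kind are all hypothetical; only `origin: llm`, `review_state: needs_review`, and `llm_low_confidence` come from the text. Attribute targets would follow the same pattern as edge endpoints and are omitted for brevity.

```python
from __future__ import annotations

from typing import Any


def _confidence(item: dict[str, Any]) -> float:
    """Treat a missing or non-numeric confidence as 0.0 (assumed policy)."""
    value = item.get("confidence")
    return float(value) if isinstance(value, (int, float)) else 0.0


def _as_candidate(item: dict[str, Any]) -> dict[str, Any]:
    # Later keys win, so the model cannot override origin or review state.
    return {**item, "origin": "llm", "review_state": "needs_review"}


def project_llm_payload(
    payload: dict[str, Any],
    known_keys: set[str],
    threshold: float = 0.6,
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split an LLM payload into graph candidates and review artifacts."""
    candidates: list[dict[str, Any]] = []
    artifacts: list[dict[str, Any]] = []
    resolvable = set(known_keys)  # deterministic keys plus accepted LLM nodes

    for node in payload.get("nodes", []):
        if _confidence(node) < threshold:
            artifacts.append({"kind": "llm_low_confidence", "item": node})
            continue
        candidates.append(_as_candidate(node))
        if node.get("key"):
            resolvable.add(node["key"])

    for edge in payload.get("edges", []):
        if _confidence(edge) < threshold:
            artifacts.append({"kind": "llm_low_confidence", "item": edge})
        elif edge.get("source") not in resolvable or edge.get("target") not in resolvable:
            artifacts.append({"kind": "llm_unresolved_endpoint", "item": edge})
        else:
            candidates.append(_as_candidate(edge))

    return candidates, artifacts
```

Nothing returned here is accepted graph data: every candidate still carries `needs_review`, so acceptance remains a separate step.
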
## Identity
Identity is the main safety boundary. The scanner must not append guesses on