Abstraction Strategy
The registry has three layers with different trust levels:
- Observed facts are deterministic scanner output: files, manifests, framework hints, tests, docs, routes, commands, and source locations.
- Candidate claims are abstractions proposed from those facts. They are useful review seeds, not registry truth.
- Approved entries are curated truth after human review or an explicit trusted automation mode.
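One way to picture the three layers is as a trust level attached to each registry item, with promotion to approved allowed only from the candidate layer. This is a minimal sketch, not the project's data model: `TrustLevel`, `RegistryItem`, and `approve` are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TrustLevel(Enum):
    OBSERVED = "observed"    # deterministic scanner output
    CANDIDATE = "candidate"  # proposed abstraction, a review seed
    APPROVED = "approved"    # curated truth after review


@dataclass(frozen=True)
class RegistryItem:
    # Hypothetical shape; the real registry schema may differ.
    name: str
    trust: TrustLevel
    source_refs: tuple[str, ...] = ()


def approve(item: RegistryItem, reviewer: str) -> RegistryItem:
    """Promote a candidate to approved. Facts and approved entries are rejected."""
    if item.trust is not TrustLevel.CANDIDATE:
        raise ValueError(f"only candidates can be approved, got {item.trust.value}")
    if not reviewer:
        # Assumption: a human reviewer or a trusted automation mode must be named.
        raise ValueError("approval requires a reviewer or trusted automation mode")
    return RegistryItem(item.name, TrustLevel.APPROVED, item.source_refs)
```

The promotion keeps `source_refs` intact, so an approved entry still points back to the facts it was derived from.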
Granularity
Features should describe a user-visible or operational behavior surface, not mirror individual scanner facts. A one-to-one pattern such as one route fact becoming one feature is a smell unless the repository truly exposes only one behavior.
Current deterministic grouping:
- Multiple HTTP route facts become one HTTP API surface feature with several source references.
- Multiple CLI command facts become one CLI command surface feature with several source references.
- Facts remain available as drilldown evidence through source_refs.
This gives reviewers orientation at the behavior level while keeping provenance.
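The grouping rule can be sketched as follows. The `Fact` and `Feature` classes, the `kind` strings, and `group_facts` are assumptions made for this example; only the feature names and the `source_refs` field come from the description above.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    kind: str        # hypothetical kind tag, e.g. "http_route" or "cli_command"
    detail: str      # e.g. "GET /repos"
    source_ref: str  # e.g. "app/api.py:12"


@dataclass
class Feature:
    name: str
    source_refs: list[str] = field(default_factory=list)


# Interface fact kinds that collapse into a single surface feature.
SURFACE_NAMES = {
    "http_route": "HTTP API surface",
    "cli_command": "CLI command surface",
}


def group_facts(facts: list[Fact]) -> list[Feature]:
    """Group interface facts by kind; each group becomes one feature."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for fact in facts:
        if fact.kind in SURFACE_NAMES:
            grouped[fact.kind].append(fact.source_ref)
    return [Feature(SURFACE_NAMES[kind], refs) for kind, refs in grouped.items()]
```

Facts of other kinds, such as docs or tests, are left out of the surface features in this sketch; they would still be stored as facts and could be attached as corroborating evidence elsewhere.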
What Deterministic Logic Can Do
Deterministic scanners can reliably identify:
- repository structure and languages
- package manifests and framework hints
- API/CLI entry-point surfaces
- docs, examples, and tests as corroborating evidence
- stable source references for review and approval
Deterministic candidate generation can group these into conservative capabilities such as interface exposure and repository structure. It should avoid pretending it understands domain intent when the evidence is thin.
Where LLM Assistance Helps
LLMs are most useful for naming and explaining intent:
- turning HTTP API surface into a domain capability such as repository ingestion, review workflow, or search
- separating administrative, operational, and product-facing capabilities
- summarizing README and code context into clearer ability descriptions
- suggesting merges or relinks when deterministic names are too generic
LLM output remains candidate material. It should cite source paths and be reviewed by a human or configured agentic reviewer before becoming approved registry truth. Deterministic checks can block or flag weak candidates; they do not approve them.
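A deterministic gate of this kind might look like the sketch below. Note that the verdict type deliberately has no "approve" value: the best outcome is passing the candidate on to review. `Candidate`, `Verdict`, `check_candidate`, and the two-path threshold for flagging are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS_TO_REVIEW = "pass_to_review"  # still needs a reviewer
    FLAG = "flag"                      # reviewable, but marked as weak
    BLOCK = "block"                    # not shown to reviewers as-is


@dataclass(frozen=True)
class Candidate:
    name: str
    description: str
    source_paths: tuple[str, ...]


def check_candidate(candidate: Candidate, known_paths: set[str]) -> Verdict:
    """Deterministic screening of an LLM-proposed candidate."""
    if not candidate.source_paths:
        return Verdict.BLOCK  # no cited evidence at all
    if any(path not in known_paths for path in candidate.source_paths):
        return Verdict.BLOCK  # cites a path the scanner never observed
    if len(candidate.source_paths) < 2:
        return Verdict.FLAG   # assumed threshold: a single citation is thin evidence
    return Verdict.PASS_TO_REVIEW
```

Checking cited paths against the scanner's observed paths is one cheap way to catch invented citations before a reviewer spends time on the candidate.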
Trial Repo Observations
repo-scoping demonstrates the current boundary well: deterministic scanning sees
FastAPI routes, tests, docs, and Python structure, but the meaningful abstractions
are repository ingestion, deterministic analysis, candidate review, discovery, and
State Hub coordination. Those names likely require either review edits or LLM
assistance.
The other trial repos reinforce the same point: fact lists are useful audit trails, but the primary UI should lead with candidate or approved ability maps and expose facts as drilldown evidence.
Regression Guard
tests/test_candidate_graph.py includes a guard that multiple interface facts are
grouped into behavioral surface features with multiple source refs. This protects
against falling back to one feature per observed fact.
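The shape of such a guard, written pytest-style, could be roughly as follows. This is not the contents of tests/test_candidate_graph.py: `build_features` is a hypothetical stand-in for the real candidate graph builder, and the fact dictionaries are made up for the example.

```python
def build_features(facts):
    """Hypothetical stand-in for the candidate graph builder."""
    surfaces = {
        "http_route": "HTTP API surface",
        "cli_command": "CLI command surface",
    }
    features = {}
    for fact in facts:
        name = surfaces.get(fact["kind"])
        if name is None:
            continue  # non-interface facts do not create surface features
        features.setdefault(name, {"name": name, "source_refs": []})
        features[name]["source_refs"].append(fact["source_ref"])
    return list(features.values())


def test_interface_facts_group_into_surface_features():
    facts = [
        {"kind": "http_route", "source_ref": "app/api.py:10"},
        {"kind": "http_route", "source_ref": "app/api.py:24"},
        {"kind": "cli_command", "source_ref": "app/cli.py:8"},
        {"kind": "cli_command", "source_ref": "app/cli.py:19"},
    ]
    features = build_features(facts)

    # Fewer features than facts: no one-feature-per-fact fallback.
    assert len(features) < len(facts)
    # Every surface feature keeps provenance for all grouped facts.
    assert all(len(f["source_refs"]) > 1 for f in features)
    assert {f["name"] for f in features} == {
        "HTTP API surface",
        "CLI command surface",
    }
```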