# Abstraction Strategy

The registry has three layers with different trust levels:

1. Observed facts are deterministic scanner output: files, manifests, framework
   hints, tests, docs, routes, commands, and source locations.
2. Candidate claims are abstractions proposed from those facts. They are useful
   review seeds, not registry truth.
3. Approved entries are curated truth after human review or an explicit trusted
   automation mode.

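The three layers can be pictured as a trust tag attached to every registry item. The following is a minimal sketch, not the project's actual data model: `TrustLevel`, `RegistryItem`, and `is_registry_truth` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TrustLevel(Enum):
    """Hypothetical labels for the three layers described above."""

    OBSERVED = "observed"    # deterministic scanner output
    CANDIDATE = "candidate"  # proposed abstraction, a review seed
    APPROVED = "approved"    # curated truth after review or trusted automation


@dataclass(frozen=True)
class RegistryItem:
    name: str
    trust: TrustLevel
    source_refs: tuple[str, ...] = ()  # provenance, e.g. "app/api.py:42"


def is_registry_truth(item: RegistryItem) -> bool:
    """Only approved entries count as registry truth."""
    return item.trust is TrustLevel.APPROVED
```

Under this sketch, a candidate claim can carry the same `source_refs` as the facts it was derived from, but it is never treated as truth until its trust level changes.
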
## Granularity

Features should describe a user-visible or operational behavior surface, not mirror
individual scanner facts. A one-to-one pattern such as one route fact becoming one
feature is a smell unless the repository truly exposes only one behavior.

Current deterministic grouping:

- Multiple HTTP route facts become one `HTTP API surface` feature with several
  source references.
- Multiple CLI command facts become one `CLI command surface` feature with several
  source references.
- Facts remain available as drilldown evidence through `source_refs`.

This gives reviewers orientation at the behavior level while keeping provenance.

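The grouping rule above can be sketched in a few lines. This is an illustration under stated assumptions rather than the scanner's real code: the `Fact` and `Feature` types, the `kind` labels `"http_route"` and `"cli_command"`, and the function `group_interface_facts` are all hypothetical. Only the feature names and the `source_refs` field come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumed mapping from a fact kind to the surface feature it rolls up into.
SURFACE_NAMES = {
    "http_route": "HTTP API surface",
    "cli_command": "CLI command surface",
}


@dataclass(frozen=True)
class Fact:
    kind: str        # assumed label, e.g. "http_route"
    source_ref: str  # e.g. "app/api.py:42"


@dataclass
class Feature:
    name: str
    source_refs: list[str] = field(default_factory=list)


def group_interface_facts(facts: list[Fact]) -> list[Feature]:
    """Collapse interface facts of the same kind into one surface feature."""
    refs_by_kind: dict[str, list[str]] = defaultdict(list)
    for fact in facts:
        if fact.kind in SURFACE_NAMES:  # non-interface facts are left alone
            refs_by_kind[fact.kind].append(fact.source_ref)
    return [
        Feature(name=SURFACE_NAMES[kind], source_refs=refs)
        for kind, refs in refs_by_kind.items()
    ]
```

With two route facts and one command fact as input, this sketch yields two features, and each keeps the source references of the facts it absorbed, so the facts stay reachable as drilldown evidence.
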
## What Deterministic Logic Can Do

Deterministic scanners can reliably identify:

- repository structure and languages
- package manifests and framework hints
- API/CLI entry-point surfaces
- docs, examples, and tests as corroborating evidence
- stable source references for review and approval

Deterministic candidate generation can group these into conservative capabilities
such as interface exposure and repository structure. It should avoid pretending it
understands domain intent when the evidence is thin.

## Where LLM Assistance Helps

LLMs are most useful for naming and explaining intent:

- turning `HTTP API surface` into a domain capability such as repository ingestion,
  review workflow, or search
- separating administrative, operational, and product-facing capabilities
- summarizing README and code context into clearer ability descriptions
- suggesting merges or relinks when deterministic names are too generic

LLM output remains candidate material. It should cite source paths and be reviewed
or explicitly auto-approved by a trusted mode before becoming approved registry
truth.

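The promotion rule in the last paragraph amounts to a small gate. The sketch below is one way to express it, assuming a hypothetical `Candidate` type and `can_approve` function. Treating a missing citation as a hard block is also an assumption, since the text only says LLM output should cite source paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    source_paths: tuple[str, ...]  # files the claim cites as evidence


def can_approve(
    candidate: Candidate,
    *,
    human_reviewed: bool,
    trusted_mode: bool = False,
) -> bool:
    """Return True only when the candidate may become approved registry truth."""
    if not candidate.source_paths:
        return False  # assumption: uncited claims are never promoted
    # Either a human review or an explicit trusted automation mode is required.
    return human_reviewed or trusted_mode
```

Keeping `trusted_mode` as an explicit keyword argument that defaults to `False` mirrors the requirement that auto-approval be opted into rather than assumed.
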
## Trial Repo Observations

`repo-scoping` demonstrates the current boundary well: deterministic scanning sees
FastAPI routes, tests, docs, and Python structure, but the meaningful abstractions
are repository ingestion, deterministic analysis, candidate review, discovery, and
State Hub coordination. Those names likely require either review edits or LLM
assistance.

The other trial repos reinforce the same point: fact lists are useful audit trails,
but the primary UI should lead with candidate or approved ability maps and expose
facts as drilldown evidence.

## Regression Guard

`tests/test_candidate_graph.py` includes a guard that multiple interface facts are
grouped into behavioral surface features with multiple source refs. This protects
against falling back to one feature per observed fact.
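
A guard of this kind might look roughly like the sketch below. It is not the contents of `tests/test_candidate_graph.py`: the helper `find_ungrouped_features`, the dictionary shape, and the sample source references are hypothetical and serve only to illustrate the invariant.

```python
def find_ungrouped_features(features: list[dict]) -> list[str]:
    """Return names of surface features backed by fewer than two source refs."""
    return [f["name"] for f in features if len(f["source_refs"]) < 2]


def test_interface_facts_are_grouped() -> None:
    # Hypothetical candidate output for a repo with three routes and two commands.
    features = [
        {
            "name": "HTTP API surface",
            "source_refs": ["api.py:10", "api.py:24", "api.py:31"],
        },
        {
            "name": "CLI command surface",
            "source_refs": ["cli.py:5", "cli.py:18"],
        },
    ]
    # A regression to one feature per fact would leave single-ref features.
    assert find_ungrouped_features(features) == []
```

A check like this only makes sense for fixtures that contain several interface facts of each kind, because a repository that truly exposes one behavior legitimately produces a single-reference feature.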