# Characteristic And Evidence Model

The registry should treat a repository profile as a characteristic tree.

## Characteristics

A characteristic is an interpreted claim about a repository. The current concrete
levels are:

- Scope: the single root characteristic for the repository.
- Ability: a high-level thing the repository is meant to enable.
- Capability: a more specific capacity that contributes to an ability.
- Feature: a concrete user-facing, operational, interface, or implementation
  feature that contributes to a capability.

The regular target shape is:

```text
Scope -> Ability -> Capability -> Feature -> Observed Fact
```

This regular tree is an orientation tool, not a claim that every real repository
is perfectly tree-shaped. Cross references and same-level references can be
useful during review, but they are also quality signals: frequent same-level
feature references may indicate that features are too coarse, too fine, or
organized under the wrong capability.
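
The levels and the same-level reference signal described above can be sketched in a few lines of Python. This is a minimal illustration, not the registry's actual schema: the `Characteristic` class, its field names, and the `same_level_references` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Levels of the regular target shape, ordered from root to leaf."""
    SCOPE = 0
    ABILITY = 1
    CAPABILITY = 2
    FEATURE = 3


@dataclass
class Characteristic:
    # Hypothetical shape; field names are assumptions for illustration.
    id: str
    level: Level
    title: str
    parent_id: Optional[str] = None
    # Cross references are kept apart from the parent link that forms the tree.
    references: list = field(default_factory=list)


def same_level_references(items):
    """Return (source_id, target_id) pairs where both sides share a level."""
    by_id = {c.id: c for c in items}
    pairs = []
    for c in items:
        for ref in c.references:
            target = by_id.get(ref)
            if target is not None and target.level == c.level:
                pairs.append((c.id, target.id))
    return pairs
```

A reviewer could count the returned pairs per capability; a high count is the quality signal described above, not an error in itself.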

## Facts, Source References, And Evidence

Observed facts are deterministic scanner output. They describe what was seen in
the repository: files, languages, frameworks, routes, tests, documentation,
provider names, configuration variables, and similar source-linked observations.

Facts can carry a source role so generation can separate product evidence from
ambient context. Important roles include:

- `intent_summary`: `INTENT.md` or equivalent design-intent material describing
  why the repository should exist and what utility it is meant to provide.
- `derived_scope`: `SCOPE.md` or equivalent current-scope material. This is a
  derived or curated description of what is believed to be true now, not primary
  evidence for rebuilding the same characteristic model.
- `product_documentation`: README, docs, specifications, and user-facing guides.
- `implementation_source`: source code owned by the repository.
- `dependency_declaration`: manifests, imports, lockfiles, and package metadata.
- `configuration`, `ci_tooling`, `test_evidence`, and `agent_guidance`.

`INTENT.md` and `SCOPE.md` deliberately answer different questions. Intent is a
design artifact: what the repository is supposed to become or provide. Scope is
a derived current-state artifact: what the repository is understood to provide
after evidence and review. A good `SCOPE.md` is valuable context, but using it
as ordinary evidence for generated characteristics creates a circular model.
Rebuilds should therefore prefer `INTENT.md`, product documentation, source, and
tests; `SCOPE.md` should be used as comparison material or explicit bootstrap
input only when a curator chooses that mode.
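
As a rough sketch of that preference, a rebuild could filter facts by source role before generating candidates. The role names come from the list above; the `Fact` class, the `select_rebuild_evidence` function, and the `scope_bootstrap` flag are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Roles the text says rebuilds should prefer. Grouping them in one set is an
# assumption of this sketch.
PREFERRED_REBUILD_ROLES = {
    "intent_summary",
    "product_documentation",
    "implementation_source",
    "test_evidence",
}


@dataclass(frozen=True)
class Fact:
    path: str
    source_role: str


def select_rebuild_evidence(facts, scope_bootstrap=False):
    """Keep facts a rebuild may use as evidence.

    `derived_scope` facts are included only when a curator has explicitly
    chosen bootstrap mode; all other roles stay out as context or comparison
    material.
    """
    selected = []
    for fact in facts:
        if fact.source_role in PREFERRED_REBUILD_ROLES:
            selected.append(fact)
        elif fact.source_role == "derived_scope" and scope_bootstrap:
            selected.append(fact)
    return selected
```

Defaulting `scope_bootstrap` to `False` keeps the circular path opt-in, so `SCOPE.md` cannot feed a rebuild by accident.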

For repositories that already have a useful `SCOPE.md` but no `INTENT.md`,
repo-scoping can perform a one-time bootstrap by copying the scope text into a
new intent file with a clear provenance note. After that bootstrap, the files
should diverge naturally: `INTENT.md` remains design intent, while `SCOPE.md`
remains generated or curated current scope.
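
A one-time bootstrap of this kind might look like the sketch below. It is not the repo-scoping implementation: the `bootstrap_intent` function and the wording of the provenance note are assumptions.

```python
from pathlib import Path

# Hypothetical provenance note; the real wording is not specified in this document.
PROVENANCE_NOTE = (
    "<!-- Bootstrapped once from SCOPE.md. Maintain this file as design intent; "
    "it is not kept in sync with SCOPE.md. -->\n\n"
)


def bootstrap_intent(repo_root: Path) -> bool:
    """Copy SCOPE.md into a new INTENT.md; return True if a file was written."""
    scope = repo_root / "SCOPE.md"
    intent = repo_root / "INTENT.md"
    # Never overwrite existing intent, and do nothing without a scope file.
    if intent.exists() or not scope.exists():
        return False
    scope_text = scope.read_text(encoding="utf-8")
    intent.write_text(PROVENANCE_NOTE + scope_text, encoding="utf-8")
    return True
```

Because the function refuses to run when `INTENT.md` already exists, calling it again is harmless and the two files are free to diverge afterwards.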

Provider, dependency, and tooling facts should also carry a utility
relationship. A provider mentioned in documentation is usually a `mention`; an
environment variable is usually `configure`; a manifest entry is usually
`dependency`; implementation code under provider or adapter modules may be
`owned` or `adapter`. Candidate generation should promote only relationships
that show the repository provides the utility directly or intentionally exposes
it as a facade/adapter. Mentions, dependencies, configuration, and tooling are
context until a curator promotes them or stronger owned evidence appears.
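
The default relationships described above could be assigned with a small lookup like the following. The relationship names and source roles are taken from this document; the directory-name heuristic for implementation code and the conservative `mention` fallback are assumptions of this sketch.

```python
from pathlib import PurePosixPath

# Relationships that may be promoted into candidates.
PROMOTABLE_RELATIONSHIPS = {"owned", "facade", "adapter"}

# Usual relationship per source role, following the text above.
_ROLE_DEFAULTS = {
    "product_documentation": "mention",
    "configuration": "configure",
    "dependency_declaration": "dependency",
}


def default_relationship(source_role: str, path: str) -> str:
    """Guess a utility relationship for a provider-related fact."""
    if source_role == "implementation_source":
        # Assumed heuristic: only code under provider or adapter directories
        # counts as owned or adapter evidence.
        parts = {p.lower() for p in PurePosixPath(path).parts}
        if parts & {"adapter", "adapters"}:
            return "adapter"
        if parts & {"provider", "providers"}:
            return "owned"
    # Anything else stays context until a curator promotes it.
    return _ROLE_DEFAULTS.get(source_role, "mention")


def is_promotable(relationship: str) -> bool:
    return relationship in PROMOTABLE_RELATIONSHIPS
```

A curator override would sit on top of this default, since the text allows promotion by review as well as by stronger owned evidence.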

Trusted auto-approval applies the same rule. A candidate capability must have
source references and an eligible utility relationship (`owned`, `facade`, or
`adapter`) before it can be approved automatically. Dependency, tooling,
configuration, and mention-only candidates remain review material. The review
decision should explain both sides: why approved candidates were considered safe
and why skipped candidates need curator review.
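
The auto-approval rule, including the requirement to explain both outcomes, can be illustrated as follows. The `Candidate` and `Decision` classes and the reason strings are hypothetical; only the two conditions come from the text.

```python
from dataclasses import dataclass, field

ELIGIBLE_RELATIONSHIPS = ("owned", "facade", "adapter")


@dataclass
class Candidate:
    name: str
    utility_relationship: str
    source_refs: list = field(default_factory=list)


@dataclass
class Decision:
    approved: bool
    reason: str


def auto_approval_decision(candidate: Candidate) -> Decision:
    """Approve only with source references and an eligible relationship."""
    if not candidate.source_refs:
        return Decision(False, f"{candidate.name}: no source references, needs curator review")
    if candidate.utility_relationship not in ELIGIBLE_RELATIONSHIPS:
        return Decision(
            False,
            f"{candidate.name}: '{candidate.utility_relationship}' is context only, "
            "needs curator review",
        )
    return Decision(
        True,
        f"{candidate.name}: '{candidate.utility_relationship}' evidence with "
        f"{len(candidate.source_refs)} source reference(s)",
    )
```

Returning a reason for skipped candidates as well as approved ones gives the review decision both sides of the explanation.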

`INTENT.md` may also seed intended capabilities when it contains an explicit
capability section. These intent-derived candidates are marked as review
required because intent says what the repository is meant to provide, not what
has already been proven. `SCOPE.md` sections with the same wording are not
treated as equivalent input during rebuilds, because scope is derived from the
registry model being rebuilt.

The motivating failure mode was a key-cape-like repository whose agent guidance
and generic backend-adapter vocabulary looked superficially like LLM provider
routing. That pattern should produce source-linked facts for the files that
exist, but it should not become an LLM-provider capability unless there is
provider-specific owned, facade, or adapter evidence. The scanner and generator
should solve this by provenance and utility relationship rules, not by
hard-coding product names.

Source references point from interpreted claims back to files or facts.

Evidence is support for a characteristic. It is not the same thing as an observed
fact. Evidence may reference:

- Observed facts.
- Source files or content chunks.
- Lower-level characteristics, such as a capability using features as evidence.

Evidence should usually point downward in abstraction. An ability can use
capabilities or features as support. A capability can use features or facts as
support. A feature should usually use facts or source references as support, not
abilities or capabilities.

Same-level evidence references are allowed as review material, but should be
treated as a possible organization smell.
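
One way to check the direction of an evidence reference is to rank each kind by abstraction and compare the ranks. The ranking below follows the levels named in this document; the kind strings and the `evidence_direction` function are illustrative assumptions.

```python
# Lower rank means more abstract. Facts, source refs, and content chunks
# share the lowest level of abstraction.
ABSTRACTION_RANK = {
    "scope": 0,
    "ability": 1,
    "capability": 2,
    "feature": 3,
    "fact": 4,
    "source_ref": 4,
    "content_chunk": 4,
}


def evidence_direction(supported_kind: str, reference_kind: str) -> str:
    """Classify a reference as 'downward', 'same_level', or 'upward'."""
    supported = ABSTRACTION_RANK[supported_kind]
    referenced = ABSTRACTION_RANK[reference_kind]
    if referenced > supported:
        return "downward"
    if referenced == supported:
        return "same_level"
    return "upward"
```

In a review tool, `downward` would pass silently, `same_level` would be surfaced as a possible organization smell, and `upward` (a feature citing a capability, for example) would be flagged more strongly.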

## Implementation Direction

The current schema still stores evidence on capabilities, with textual
references and source refs. The next additive schema step should generalize this
without breaking existing data:

- Add a scope root per repository.
- Add typed evidence targets: supported characteristic kind/id.
- Add typed evidence references: fact, source ref, content chunk, or
  characteristic kind/id.
- Keep legacy evidence fields until migration/export/search have been updated.
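
The additive step could be modelled roughly as below: typed targets and references are introduced alongside the legacy fields rather than replacing them. All class and field names here are hypothetical, and the mapping in `from_legacy` is only one plausible migration.

```python
from dataclasses import dataclass, field
from typing import Optional

CHARACTERISTIC_KINDS = {"scope", "ability", "capability", "feature"}
REFERENCE_KINDS = CHARACTERISTIC_KINDS | {"fact", "source_ref", "content_chunk"}


@dataclass(frozen=True)
class TypedRef:
    kind: str
    id: str


@dataclass
class Evidence:
    target: TypedRef                       # the characteristic being supported
    references: list = field(default_factory=list)
    # Legacy fields kept until migration, export, and search are updated.
    legacy_text: Optional[str] = None
    legacy_source_refs: list = field(default_factory=list)

    def __post_init__(self):
        if self.target.kind not in CHARACTERISTIC_KINDS:
            raise ValueError(f"unsupported target kind: {self.target.kind}")
        for ref in self.references:
            if ref.kind not in REFERENCE_KINDS:
                raise ValueError(f"unsupported reference kind: {ref.kind}")


def from_legacy(capability_id, text, source_refs):
    """Wrap capability-level legacy evidence in the typed shape."""
    return Evidence(
        target=TypedRef("capability", capability_id),
        references=[TypedRef("source_ref", ref) for ref in source_refs],
        legacy_text=text,
        legacy_source_refs=list(source_refs),
    )
```

Keeping the legacy values on the same record means existing readers continue to work while new code moves to `target` and `references`.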

The UI should make this relationship clear by presenting evidence as support
under the characteristic it supports, not as a peer of features.

## Rebuilds And Supersession

Use a normal analysis rerun when the existing approved map is mostly trustworthy
and the goal is to compare new evidence against prior candidates. Use a rebuild
from scratch when approved characteristics are polluted by a bad extraction
pattern, stale after a major rename, or circularly derived from old scope text.

A dry-run rebuild should be the first step. It scans current source, generates a
fresh candidate graph, and reports what approved abilities, capabilities,
features, and evidence would be superseded. A confirmed rebuild preserves audit
history by recording which approved IDs were superseded, then clears the current
approved map and leaves the fresh candidate graph for review or trusted
auto-approval.
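
The difference between a dry run and a confirmed rebuild can be sketched with an in-memory registry. This is an illustration under simplifying assumptions: the `Registry` class, its fields, and the `rebuild` function are hypothetical, and scanning and candidate generation are taken as already done.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    # Maps characteristic ID to kind, e.g. "cap-1" -> "capability".
    approved: dict = field(default_factory=dict)
    candidates: dict = field(default_factory=dict)
    supersession_log: list = field(default_factory=list)


def rebuild(registry, fresh_candidates, reason, confirm=False):
    """Report what would be superseded; apply the change only when confirmed."""
    report = {
        "reason": reason,
        "would_supersede": dict(sorted(registry.approved.items())),
        "fresh_candidates": sorted(fresh_candidates),
        "confirmed": confirm,
    }
    if confirm:
        # Record the superseded IDs before clearing, so audit history survives.
        registry.supersession_log.append(report)
        registry.approved = {}
        registry.candidates = dict(fresh_candidates)
    return report
```

Calling `rebuild(...)` without `confirm=True` leaves the registry untouched, which is what makes the dry run safe as a first step.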

Curators should treat superseded characteristics as historical claims, not as
deleted facts. They explain what the registry used to believe and why a rebuild
was chosen over incremental correction.