# Characteristic And Evidence Model
The registry should treat a repository profile as a characteristic tree.
## Characteristics
A characteristic is an interpreted claim about a repository. The current concrete
levels are:
- Scope: the single root characteristic for the repository.
- Ability: a high-level thing the repository is meant to enable.
- Capability: a more specific capacity that contributes to an ability.
- Feature: a concrete user-facing, operational, interface, or implementation
feature that contributes to a capability.
The regular target shape is:
```text
Scope -> Ability -> Capability -> Feature -> Observed Fact
```
This regular tree is an orientation tool, not a claim that every real repository
is perfectly tree-shaped. Cross references and same-level references can be
useful during review, but they are also quality signals: frequent same-level
feature references may indicate that features are too coarse, too fine, or
organized under the wrong capability.
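
As an illustration of the regular shape, the following minimal sketch models the levels as an ordered enum and reports edges that skip or repeat a level. The names `Level`, `Node`, and `irregular_edges` are hypothetical and are not part of the registry schema.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Levels of the regular tree, ordered from most to least abstract."""
    SCOPE = 0
    ABILITY = 1
    CAPABILITY = 2
    FEATURE = 3
    OBSERVED_FACT = 4


@dataclass
class Node:
    """Hypothetical tree node: a characteristic or an observed fact."""
    name: str
    level: Level
    children: list["Node"] = field(default_factory=list)


def irregular_edges(root: Node) -> list[tuple[str, str]]:
    """Return (parent, child) name pairs where the child is not exactly one level below."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child.level != node.level + 1:
                found.append((node.name, child.name))
            stack.append(child)
    return found
```

An irregular edge is not an error in this sketch. It is only surfaced so a reviewer can decide whether the organization needs attention.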
## Facts, Source References, And Evidence
Observed facts are deterministic scanner output. They describe what was seen in
the repository: files, languages, frameworks, routes, tests, documentation,
provider names, configuration variables, and similar source-linked observations.
Facts can carry a source role so generation can separate product evidence from
ambient context. Important roles include:
- `intent_summary`: `INTENT.md` or equivalent design-intent material describing
why the repository should exist and what utility it is meant to provide.
- `derived_scope`: `SCOPE.md` or equivalent current-scope material. This is a
derived or curated description of what is believed to be true now, not primary
evidence for rebuilding the same characteristic model.
- `product_documentation`: README, docs, specifications, and user-facing guides.
- `implementation_source`: source code owned by the repository.
- `dependency_declaration`: manifests, imports, lockfiles, and package metadata.
- `configuration`, `ci_tooling`, `test_evidence`, and `agent_guidance`.
`INTENT.md` and `SCOPE.md` deliberately answer different questions. Intent is a
design artifact: what the repository is supposed to become or provide. Scope is
a derived current-state artifact: what the repository is understood to provide
after evidence and review. A good `SCOPE.md` is valuable context, but using it
as ordinary evidence for generated characteristics creates a circular model.
Rebuilds should therefore prefer `INTENT.md`, product documentation, source, and
tests; `SCOPE.md` should be used as comparison material or explicit bootstrap
input only when a curator chooses that mode.
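
The role-based separation described above could be sketched as follows. The role strings come from the list in this section; the `Fact` type, the `split_rebuild_inputs` function, and the `bootstrap_from_scope` flag are assumptions made for illustration.

```python
from dataclasses import dataclass

# Role string taken from the list above; treating it as "comparison only"
# by default is this sketch's reading of the rebuild rule.
DERIVED_SCOPE = "derived_scope"


@dataclass(frozen=True)
class Fact:
    """Hypothetical observed fact carrying a source role."""
    path: str
    source_role: str


def split_rebuild_inputs(facts, bootstrap_from_scope=False):
    """Split facts into (rebuild inputs, comparison material).

    Derived-scope facts are held back as comparison material unless a
    curator explicitly enables bootstrap mode.
    """
    inputs, comparison = [], []
    for fact in facts:
        if fact.source_role == DERIVED_SCOPE and not bootstrap_from_scope:
            comparison.append(fact)
        else:
            inputs.append(fact)
    return inputs, comparison
```

Keeping the held-back facts in a separate list, rather than dropping them, leaves `SCOPE.md` available for comparison after the rebuild.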
For repositories that already have a useful `SCOPE.md` but no `INTENT.md`,
repo-scoping can perform a one-time bootstrap by copying the scope text into a
new intent file with a clear provenance note. After that bootstrap, the files
should diverge naturally: `INTENT.md` remains design intent, while `SCOPE.md`
remains generated or curated current scope.
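
A one-time bootstrap of this kind might look like the sketch below. The function name `bootstrap_intent` and the wording of the provenance note are assumptions; the actual repo-scoping command may behave differently.

```python
from pathlib import Path

# Hypothetical provenance note; the real wording is not specified here.
PROVENANCE_NOTE = (
    "<!-- Bootstrapped once from SCOPE.md. "
    "Edit this file to describe design intent. -->\n\n"
)


def bootstrap_intent(repo_root: Path) -> bool:
    """Copy SCOPE.md into a new INTENT.md with a provenance note.

    Returns True if INTENT.md was created. Does nothing when INTENT.md
    already exists or SCOPE.md is missing, so the bootstrap stays one-time.
    """
    scope = repo_root / "SCOPE.md"
    intent = repo_root / "INTENT.md"
    if intent.exists() or not scope.exists():
        return False
    scope_text = scope.read_text(encoding="utf-8")
    intent.write_text(PROVENANCE_NOTE + scope_text, encoding="utf-8")
    return True
```

Refusing to overwrite an existing `INTENT.md` is what lets the two files diverge after the first run.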
Provider, dependency, and tooling facts should also carry a utility
relationship. A provider mentioned in documentation is usually a `mention`; an
environment variable is usually `configure`; a manifest entry is usually
`dependency`; implementation code under provider or adapter modules may be
`owned` or `adapter`. Candidate generation should promote only relationships
that show the repository provides the utility directly or intentionally exposes
it as a facade/adapter. Mentions, dependencies, configuration, and tooling are
context until a curator promotes them or stronger owned evidence appears.
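
The promotion rule can be summarized in a few lines. This is a sketch under assumptions: `mention`, `configure`, `dependency`, `owned`, and `adapter` appear in the text above, while the `facade` and `tooling` identifiers and the `is_promotable` function are hypothetical.

```python
# Relationships showing the repository provides the utility directly or
# intentionally exposes it. "facade" is an assumed identifier.
PROMOTABLE = {"owned", "adapter", "facade"}

# Relationships treated as context only. "tooling" is an assumed identifier.
CONTEXT_ONLY = {"mention", "configure", "dependency", "tooling"}


def is_promotable(relationship: str, curator_promoted: bool = False) -> bool:
    """Decide whether a fact's utility relationship may seed a candidate.

    Context-only relationships are promoted only when a curator has
    explicitly chosen to promote them.
    """
    if relationship in PROMOTABLE:
        return True
    if relationship in CONTEXT_ONLY:
        return curator_promoted
    raise ValueError(f"unknown utility relationship: {relationship!r}")
```

Rejecting unknown relationship names, instead of defaulting them to context, keeps a typo from silently changing what gets promoted.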
Deterministic quality gates apply the same source-role and utility-relationship
signals, but they never approve a candidate on their own. Gates may reject, downgrade,
invalidate, flag, merge, or require review. Approval requires human judgement or
a configured agentic reviewer that records evidence, criteria version, and
rationale. Dependency, tooling, configuration, and mention-only candidates remain
review material.
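
One way to encode the rule that gates never approve is to leave approval out of the gate outcome type entirely and require a separate, fully populated approval record. The type and field names below are assumptions for illustration, not the registry's schema.

```python
from dataclasses import dataclass
from enum import Enum


class GateOutcome(Enum):
    """Outcomes a deterministic gate may produce. There is no approve value."""
    REJECT = "reject"
    DOWNGRADE = "downgrade"
    INVALIDATE = "invalidate"
    FLAG = "flag"
    MERGE = "merge"
    REQUIRE_REVIEW = "require_review"


@dataclass(frozen=True)
class Approval:
    """Hypothetical approval record written by a human or agentic reviewer."""
    reviewer: str
    evidence_ids: tuple[str, ...]
    criteria_version: str
    rationale: str

    def __post_init__(self):
        # An approval must record evidence, criteria version, and rationale.
        if not self.evidence_ids:
            raise ValueError("approval must reference evidence")
        if not self.criteria_version:
            raise ValueError("approval must record a criteria version")
        if not self.rationale.strip():
            raise ValueError("approval must record a rationale")
```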
`INTENT.md` may also seed intended capabilities when it contains an explicit
capability section. These intent-derived candidates are marked as review
required because intent says what the repository is meant to provide, not what
has already been proven. `SCOPE.md` sections with the same wording are not
treated as equivalent input during rebuilds, because scope is derived from the
registry model being rebuilt.
The motivating failure mode was a key-cape-like repository whose agent guidance
and generic backend-adapter vocabulary looked superficially like LLM provider
routing. That pattern should produce source-linked facts for the files that
exist, but it should not become an LLM-provider capability unless there is
provider-specific owned, facade, or adapter evidence. The scanner and generator
should solve this by provenance and utility relationship rules, not by
hard-coding product names.
Source references point from interpreted claims back to files or facts.
Evidence is support for a characteristic. It is not the same thing as an observed
fact. Evidence may reference:
- Observed facts.
- Source files or content chunks.
- Lower-level characteristics, such as a capability using features as evidence.
Evidence should usually point downward in abstraction. An ability can use
capabilities or features as support. A capability can use features or facts as
support. A feature should usually use facts or source references as support, not
abilities or capabilities.
Same-level evidence references are allowed as review material, but should be
treated as a possible organization smell.
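
The direction rule can be checked mechanically. The sketch below assumes string kind names and a hypothetical `classify_reference` helper; it ranks each kind by abstraction and labels a reference as downward, same-level, or upward.

```python
# Lower rank means more abstract. Facts, source references, and content
# chunks share the lowest level of abstraction in this sketch.
RANK = {
    "scope": 0,
    "ability": 1,
    "capability": 2,
    "feature": 3,
    "fact": 4,
    "source_ref": 4,
    "content_chunk": 4,
}


def classify_reference(supported_kind: str, reference_kind: str) -> str:
    """Classify an evidence reference relative to the characteristic it supports."""
    supported = RANK[supported_kind]
    reference = RANK[reference_kind]
    if reference > supported:
        return "downward"    # the usual, expected direction
    if reference == supported:
        return "same_level"  # allowed, but worth flagging for review
    return "upward"          # e.g. a feature citing an ability as support
```

For example, `classify_reference("capability", "feature")` returns `"downward"`, while `classify_reference("feature", "ability")` returns `"upward"`.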
## Implementation Direction
The current schema still stores evidence on capabilities, with textual
references and source refs. The next additive schema step should generalize this
without breaking existing data:
- Add a scope root per repository.
- Add typed evidence targets: supported characteristic kind/id.
- Add typed evidence references: fact, source ref, content chunk, or
characteristic kind/id.
- Keep legacy evidence fields until migration/export/search have been updated.
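
A possible shape for the generalized evidence record is sketched below. All field names, the `Evidence` type, and the `from_legacy` helper are hypothetical; the sketch only shows how a typed target and a typed reference can coexist with a retained legacy field.

```python
from dataclasses import dataclass
from typing import Optional

CHARACTERISTIC_KINDS = {"scope", "ability", "capability", "feature"}
REFERENCE_KINDS = CHARACTERISTIC_KINDS | {"fact", "source_ref", "content_chunk"}


@dataclass(frozen=True)
class Evidence:
    """Hypothetical typed evidence record."""
    target_kind: str   # kind of the supported characteristic
    target_id: str
    ref_kind: str      # fact, source ref, content chunk, or characteristic kind
    ref_id: str
    legacy_text: Optional[str] = None  # kept until migration/export/search are updated

    def __post_init__(self):
        if self.target_kind not in CHARACTERISTIC_KINDS:
            raise ValueError(f"invalid target kind: {self.target_kind!r}")
        if self.ref_kind not in REFERENCE_KINDS:
            raise ValueError(f"invalid reference kind: {self.ref_kind!r}")


def from_legacy(capability_id: str, text: str, source_ref: str) -> Evidence:
    """Lift a legacy capability-level evidence row into the typed shape."""
    return Evidence("capability", capability_id, "source_ref", source_ref, legacy_text=text)
```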
The UI should make this relationship clear by presenting evidence as support
under the characteristic it supports, not as a peer of features.
## Rebuilds And Supersession
Use a normal analysis rerun when the existing approved map is mostly trustworthy
and the goal is to compare new evidence against prior candidates. Use a rebuild
from scratch when approved characteristics are polluted by a bad extraction
pattern, stale after a major rename, or circularly derived from old scope text.
A dry-run rebuild should be the first step. It scans current source, generates a
fresh candidate graph, and reports what approved abilities, capabilities,
features, and evidence would be superseded. A confirmed rebuild preserves audit
history by recording which approved IDs were superseded, then clears the current
approved map and leaves the fresh candidate graph for review or trusted
auto-approval.
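
The dry-run and confirmed paths can be sketched as one function with a flag. The `Registry` type, its fields, and the `rebuild` signature are assumptions for illustration; they show the ordering of steps, not the actual storage model.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory registry: IDs mapped to characteristic names."""
    approved: dict[str, str]
    candidates: dict[str, str] = field(default_factory=dict)
    superseded_log: list[dict] = field(default_factory=list)


def rebuild(registry: Registry, fresh_candidates: dict[str, str],
            reason: str, dry_run: bool = True) -> list[str]:
    """Report approved IDs that a rebuild would supersede.

    With dry_run=True nothing is modified. With dry_run=False the
    superseded IDs are recorded first, then the approved map is cleared
    and the fresh candidate graph is left for review.
    """
    would_supersede = sorted(registry.approved)
    if dry_run:
        return would_supersede
    # Preserve audit history before clearing anything.
    registry.superseded_log.append(
        {"reason": reason, "superseded": dict(registry.approved)}
    )
    registry.approved = {}
    registry.candidates = dict(fresh_candidates)
    return would_supersede
```

Defaulting `dry_run` to `True` means a caller has to opt in explicitly before the approved map is cleared.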
Curators should treat superseded characteristics as historical claims, not as
deleted facts. They explain what the registry used to believe and why a rebuild
was chosen over incremental correction.