Characteristic And Evidence Model
The registry should treat a repository profile as a characteristic tree.
Characteristics
A characteristic is an interpreted claim about a repository. The current concrete levels are:
- Scope: the single root characteristic for the repository.
- Ability: a high-level thing the repository is meant to enable.
- Capability: a more specific capacity that contributes to an ability.
- Feature: a concrete user-facing, operational, interface, or implementation feature that contributes to a capability.
The regular target shape is:
Scope -> Ability -> Capability -> Feature -> Observed Fact
This regular tree is an orientation tool, not a claim that every real repository is perfectly tree-shaped. Cross references and same-level references can be useful during review, but they are also quality signals: frequent same-level feature references may indicate that features are too coarse, too fine, or organized under the wrong capability.
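The regular shape can be illustrated with a small in-memory model. This is a minimal sketch, not the registry's actual schema: the names Level, Characteristic, add_child, and fact_ids are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Characteristic levels, ordered from most to least abstract."""
    SCOPE = 0
    ABILITY = 1
    CAPABILITY = 2
    FEATURE = 3


@dataclass
class Characteristic:
    # Hypothetical in-memory shape; the real registry schema may differ.
    level: Level
    name: str
    children: list["Characteristic"] = field(default_factory=list)
    # Identifiers of observed facts; in the regular shape these hang off features.
    fact_ids: list[str] = field(default_factory=list)

    def add_child(self, child: "Characteristic") -> "Characteristic":
        # The regular shape only allows a child exactly one level below its parent.
        if child.level != self.level + 1:
            raise ValueError(
                f"{child.level.name} cannot sit directly under {self.level.name}"
            )
        self.children.append(child)
        return child


scope = Characteristic(Level.SCOPE, "example-repo")
ability = scope.add_child(Characteristic(Level.ABILITY, "example ability"))
capability = ability.add_child(Characteristic(Level.CAPABILITY, "example capability"))
feature = capability.add_child(Characteristic(Level.FEATURE, "example feature"))
feature.fact_ids.append("fact-1")
```

The sketch enforces only the regular parent/child shape. Cross references and same-level references would be modelled separately so that they can be counted as quality signals rather than rejected.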
Facts, Source References, And Evidence
Observed facts are deterministic scanner output. They describe what was seen in the repository: files, languages, frameworks, routes, tests, documentation, provider names, configuration variables, and similar source-linked observations. Facts can carry a source role so generation can separate product evidence from ambient context. Important roles include:
- intent_summary: INTENT.md or equivalent design-intent material describing why the repository should exist and what utility it is meant to provide.
- derived_scope: SCOPE.md or equivalent current-scope material. This is a derived or curated description of what is believed to be true now, not primary evidence for rebuilding the same characteristic model.
- product_documentation: README, docs, specifications, and user-facing guides.
- implementation_source: source code owned by the repository.
- dependency_declaration: manifests, imports, lockfiles, and package metadata.
- configuration, ci_tooling, test_evidence, and agent_guidance.
INTENT.md and SCOPE.md deliberately answer different questions. Intent is a
design artifact: what the repository is supposed to become or provide. Scope is
a derived current-state artifact: what the repository is understood to provide
after evidence and review. A good SCOPE.md is valuable context, but using it
as ordinary evidence for generated characteristics creates a circular model.
Rebuilds should therefore prefer INTENT.md, product documentation, source, and
tests; SCOPE.md should be used as comparison material or explicit bootstrap
input only when a curator chooses that mode.
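The role-based preference for rebuilds could be sketched as follows. The role values come from the list above; the enum, the PREFERRED_REBUILD_ROLES set, and the is_primary_rebuild_evidence helper are hypothetical names. Treating the remaining roles as context rather than primary evidence is an assumption of this sketch.

```python
from enum import Enum


class SourceRole(Enum):
    INTENT_SUMMARY = "intent_summary"
    DERIVED_SCOPE = "derived_scope"
    PRODUCT_DOCUMENTATION = "product_documentation"
    IMPLEMENTATION_SOURCE = "implementation_source"
    DEPENDENCY_DECLARATION = "dependency_declaration"
    CONFIGURATION = "configuration"
    CI_TOOLING = "ci_tooling"
    TEST_EVIDENCE = "test_evidence"
    AGENT_GUIDANCE = "agent_guidance"


# Roles a rebuild prefers: intent, product documentation, source, and tests.
PREFERRED_REBUILD_ROLES = frozenset({
    SourceRole.INTENT_SUMMARY,
    SourceRole.PRODUCT_DOCUMENTATION,
    SourceRole.IMPLEMENTATION_SOURCE,
    SourceRole.TEST_EVIDENCE,
})


def is_primary_rebuild_evidence(role: SourceRole, bootstrap_from_scope: bool = False) -> bool:
    """Return True if a fact with this role may feed candidate generation in a rebuild."""
    if role is SourceRole.DERIVED_SCOPE:
        # Scope text is circular input unless a curator explicitly chose bootstrap mode.
        return bootstrap_from_scope
    # Assumption: all other roles are kept as context, not primary evidence.
    return role in PREFERRED_REBUILD_ROLES
```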
For repositories that already have a useful SCOPE.md but no INTENT.md,
repo-scoping can perform a one-time bootstrap by copying the scope text into a
new intent file with a clear provenance note. After that bootstrap, the files
should diverge naturally: INTENT.md remains design intent, while SCOPE.md
remains generated or curated current scope.
Provider, dependency, and tooling facts should also carry a utility
relationship. A provider mentioned in documentation is usually a "mention"; an
environment variable is usually "configure"; a manifest entry is usually
"dependency"; implementation code under provider or adapter modules may be
"owned" or "adapter". Candidate generation should promote only relationships
that show the repository provides the utility directly or intentionally exposes
it as a facade/adapter. Mentions, dependencies, configuration, and tooling are
context until a curator promotes them or stronger owned evidence appears.
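A minimal sketch of this promotion rule, assuming each fact carries one relationship value. The enum and the should_promote helper are hypothetical. The "tooling" and "facade" member names are inferred from the prose and may not match the real relationship vocabulary.

```python
from enum import Enum
from typing import Iterable


class UtilityRelationship(Enum):
    MENTION = "mention"
    CONFIGURE = "configure"
    DEPENDENCY = "dependency"
    TOOLING = "tooling"   # assumed name
    OWNED = "owned"
    FACADE = "facade"     # assumed name
    ADAPTER = "adapter"


# Relationships showing the repository provides or intentionally exposes the utility.
PROMOTABLE = frozenset({
    UtilityRelationship.OWNED,
    UtilityRelationship.FACADE,
    UtilityRelationship.ADAPTER,
})


def should_promote(
    relationships: Iterable[UtilityRelationship],
    curator_promoted: bool = False,
) -> bool:
    """Promote a candidate only on owned/facade/adapter evidence or curator decision."""
    return curator_promoted or any(r in PROMOTABLE for r in relationships)
```

Under this rule, a provider that appears only in documentation and a manifest stays context, while the same provider backed by adapter code becomes a candidate.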
Deterministic quality gates apply the same source and utility relationship signals, but they do not approve automatically. Gates may reject, downgrade, invalidate, flag, merge, or require review. Approval requires human judgement or a configured agentic reviewer that records evidence, criteria version, and rationale. Dependency, tooling, configuration, and mention-only candidates remain review material.
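One way to keep gates from approving is to leave approval out of the gate outcome type and require a separate approval record. The following is a sketch under that assumption; GateOutcome, Approval, and all field names are hypothetical. Requiring the recorded fields only for agentic reviewers is one reading of the rule above.

```python
from dataclasses import dataclass
from enum import Enum


class GateOutcome(Enum):
    # Deliberately no APPROVE member: deterministic gates cannot approve.
    REJECT = "reject"
    DOWNGRADE = "downgrade"
    INVALIDATE = "invalidate"
    FLAG = "flag"
    MERGE = "merge"
    REQUIRE_REVIEW = "require_review"


@dataclass(frozen=True)
class Approval:
    candidate_id: str
    reviewer_kind: str              # "human" or "agentic" (assumed vocabulary)
    evidence_ids: tuple[str, ...]
    criteria_version: str
    rationale: str

    def __post_init__(self) -> None:
        if self.reviewer_kind not in ("human", "agentic"):
            raise ValueError(f"unknown reviewer kind: {self.reviewer_kind!r}")
        # An agentic reviewer must record evidence, criteria version, and rationale.
        if self.reviewer_kind == "agentic" and not (
            self.evidence_ids and self.criteria_version and self.rationale
        ):
            raise ValueError("agentic approval needs evidence, criteria version, and rationale")
```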
INTENT.md may also seed intended capabilities when it contains an explicit
capability section. These intent-derived candidates are marked as review
required because intent says what the repository is meant to provide, not what
has already been proven. SCOPE.md sections with the same wording are not
treated as equivalent input during rebuilds, because scope is derived from the
registry model being rebuilt.
The motivating failure mode was a key-cape-like repository whose agent guidance and generic backend-adapter vocabulary looked superficially like LLM provider routing. That pattern should produce source-linked facts for the files that exist, but it should not become an LLM-provider capability unless there is provider-specific owned, facade, or adapter evidence. The scanner and generator should solve this by provenance and utility relationship rules, not by hard-coding product names.
Source references point from interpreted claims back to files or facts.
Evidence is support for a characteristic. It is not the same thing as an observed fact. Evidence may reference:
- Observed facts.
- Source files or content chunks.
- Lower-level characteristics, such as a capability using features as evidence.
Evidence should usually point downward in abstraction. An ability can use capabilities or features as support. A capability can use features or facts as support. A feature should usually use facts or source references as support, not abilities or capabilities.
Same-level evidence references are allowed as review material, but should be treated as a possible organization smell.
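The direction rule can be checked mechanically by ranking kinds by abstraction. This is a sketch only: the rank table, the kind strings, and classify_evidence_reference are hypothetical, and placing facts, source references, and content chunks on one shared lowest rank is an assumption.

```python
# Lower rank means more abstract. Non-characteristic references share the lowest rank.
ABSTRACTION_RANK = {
    "scope": 0,
    "ability": 1,
    "capability": 2,
    "feature": 3,
    "fact": 4,
    "source_ref": 4,
    "content_chunk": 4,
}


def classify_evidence_reference(supported_kind: str, reference_kind: str) -> str:
    """Classify an evidence reference relative to the characteristic it supports."""
    supported = ABSTRACTION_RANK[supported_kind]
    referenced = ABSTRACTION_RANK[reference_kind]
    if referenced > supported:
        return "downward"    # the usual, expected direction
    if referenced == supported:
        return "same_level"  # allowed, but a possible organization smell
    return "upward"          # e.g. a feature citing a capability as support
```

For example, classify_evidence_reference("capability", "feature") yields "downward", while a feature citing another feature yields "same_level" and can be surfaced for review.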
Implementation Direction
The current schema still stores evidence on capabilities, with textual references and source refs. The next additive schema step should generalize this without breaking existing data:
- Add a scope root per repository.
- Add typed evidence targets: supported characteristic kind/id.
- Add typed evidence references: fact, source ref, content chunk, or characteristic kind/id.
- Keep legacy evidence fields until migration/export/search have been updated.
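The additive steps above might translate into a record shape like the following. This is a sketch under stated assumptions, not the planned schema: every class and field name is hypothetical, and the legacy fields stand in for whatever the current capability-level evidence stores.

```python
from dataclasses import dataclass, field
from typing import Optional

CHARACTERISTIC_KINDS = frozenset({"scope", "ability", "capability", "feature"})
REFERENCE_KINDS = CHARACTERISTIC_KINDS | {"fact", "source_ref", "content_chunk"}


@dataclass(frozen=True)
class EvidenceTarget:
    """The characteristic that a piece of evidence supports."""
    kind: str
    id: str

    def __post_init__(self) -> None:
        if self.kind not in CHARACTERISTIC_KINDS:
            raise ValueError(f"not a characteristic kind: {self.kind!r}")


@dataclass(frozen=True)
class EvidenceReference:
    """What the evidence points at: a fact, source ref, content chunk, or characteristic."""
    kind: str
    id: str

    def __post_init__(self) -> None:
        if self.kind not in REFERENCE_KINDS:
            raise ValueError(f"not a reference kind: {self.kind!r}")


@dataclass
class Evidence:
    target: EvidenceTarget
    references: list[EvidenceReference] = field(default_factory=list)
    # Legacy fields kept until migration, export, and search are updated.
    legacy_text: Optional[str] = None
    legacy_source_refs: list[str] = field(default_factory=list)


evidence = Evidence(
    target=EvidenceTarget("capability", "cap-1"),
    references=[EvidenceReference("feature", "feat-1"), EvidenceReference("fact", "fact-9")],
    legacy_text="existing textual reference",
)
```

Keeping the legacy fields optional lets existing rows load unchanged while new rows populate the typed target and references.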
The UI should make this relationship clear by presenting evidence as support under the characteristic it supports, not as a peer of features.
Rebuilds And Supersession
Use a normal analysis rerun when the existing approved map is mostly trustworthy and the goal is to compare new evidence against prior candidates. Use a rebuild from scratch when approved characteristics are polluted by a bad extraction pattern, stale after a major rename, or circularly derived from old scope text.
A dry-run rebuild should be the first step. It scans current source, generates a fresh candidate graph, and reports what approved abilities, capabilities, features, and evidence would be superseded. A confirmed rebuild preserves audit history by recording which approved IDs were superseded, then clears the current approved map and leaves the fresh candidate graph for review or trusted auto-approval.
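The difference between a dry-run and a confirmed rebuild can be sketched with a toy in-memory registry. The Registry class, its fields, and the rebuild function are hypothetical stand-ins. Scanning and candidate generation are assumed to have already produced fresh_candidates.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    # Approved map: characteristic or evidence ID -> kind (toy representation).
    approved: dict[str, str] = field(default_factory=dict)
    candidates: list[str] = field(default_factory=list)
    superseded_log: list[str] = field(default_factory=list)


def rebuild(registry: Registry, fresh_candidates: list[str], dry_run: bool = True) -> list[str]:
    """Report the approved IDs a rebuild would supersede; apply only if not a dry run."""
    would_supersede = sorted(registry.approved)
    if not dry_run:
        # Preserve audit history before clearing the approved map.
        registry.superseded_log.extend(would_supersede)
        registry.approved.clear()
        # Leave the fresh candidate graph in place for review.
        registry.candidates = list(fresh_candidates)
    return would_supersede


registry = Registry(approved={"cap-1": "capability", "abil-1": "ability"})
report = rebuild(registry, ["cand-1", "cand-2"])             # dry run: nothing changes
confirmed = rebuild(registry, ["cand-1", "cand-2"], dry_run=False)
```

Defaulting dry_run to True mirrors the guidance that a dry run should come first; a caller has to opt in explicitly to supersede approved characteristics.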
Curators should treat superseded characteristics as historical claims, not as deleted facts. They explain what the registry used to believe and why a rebuild was chosen over incremental correction.