Characteristic And Evidence Model

The registry should treat a repository profile as a characteristic tree.

Characteristics

A characteristic is an interpreted claim about a repository. The current concrete levels are:

  • Scope: the single root characteristic for the repository.
  • Ability: a high-level thing the repository is meant to enable.
  • Capability: a more specific capacity that contributes to an ability.
  • Feature: a concrete user-facing, operational, interface, or implementation feature that contributes to a capability.

The regular target shape is:

Scope -> Ability -> Capability -> Feature -> Observed Fact

This regular tree is an orientation tool, not a claim that every real repository is perfectly tree-shaped. Cross references and same-level references can be useful during review, but they are also quality signals: frequent same-level feature references may indicate that features are too coarse, too fine, or organized under the wrong capability.
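The regular shape and the same-level quality signal can be sketched as a small data structure. This is a minimal illustration only: the names `Level`, `Characteristic`, `related`, and `same_level_references` are hypothetical and do not come from the registry schema.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Level(IntEnum):
    """The four concrete characteristic levels, ordered from root to leaf."""
    SCOPE = 0
    ABILITY = 1
    CAPABILITY = 2
    FEATURE = 3


@dataclass
class Characteristic:
    level: Level
    name: str
    children: List["Characteristic"] = field(default_factory=list)
    # Observed facts normally attach to features in the regular shape.
    fact_ids: List[str] = field(default_factory=list)
    # Hypothetical field: cross references that fall outside the tree.
    related: List["Characteristic"] = field(default_factory=list)

    def add_child(self, child: "Characteristic") -> "Characteristic":
        # Enforce Scope -> Ability -> Capability -> Feature, one step at a time.
        if child.level != self.level + 1:
            raise ValueError(
                f"{child.level.name} cannot sit directly under {self.level.name}"
            )
        self.children.append(child)
        return child


def same_level_references(root: Characteristic) -> int:
    """Count cross references whose two ends share a level (a quality signal)."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += sum(1 for other in node.related if other.level == node.level)
        stack.extend(node.children)
    return count
```

A high count does not make the tree invalid; it only suggests that the features or capabilities involved deserve a second look during review.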

Facts, Source References, And Evidence

Observed facts are deterministic scanner output. They describe what was seen in the repository: files, languages, frameworks, routes, tests, documentation, provider names, configuration variables, and similar source-linked observations. Facts can carry a source role so generation can separate product evidence from ambient context. Important roles include:

  • intent_summary: INTENT.md or equivalent design-intent material describing why the repository should exist and what utility it is meant to provide.
  • derived_scope: SCOPE.md or equivalent current-scope material. This is a derived or curated description of what is believed to be true now, not primary evidence for rebuilding the same characteristic model.
  • product_documentation: README, docs, specifications, and user-facing guides.
  • implementation_source: source code owned by the repository.
  • dependency_declaration: manifests, imports, lockfiles, and package metadata.
  • configuration, ci_tooling, test_evidence, and agent_guidance.

INTENT.md and SCOPE.md deliberately answer different questions. Intent is a design artifact: what the repository is supposed to become or provide. Scope is a derived current-state artifact: what the repository is understood to provide after evidence and review. A good SCOPE.md is valuable context, but using it as ordinary evidence for generated characteristics creates a circular model. Rebuilds should therefore prefer INTENT.md, product documentation, source, and tests; SCOPE.md should be used as comparison material or explicit bootstrap input only when a curator chooses that mode.
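The role split described above can be sketched as a partition step that runs before candidate generation. The role values are the ones listed earlier; the bucket names, `PREFERRED_FOR_REBUILD`, `partition_facts`, and the `scope_bootstrap` flag are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class SourceRole(str, Enum):
    INTENT_SUMMARY = "intent_summary"
    DERIVED_SCOPE = "derived_scope"
    PRODUCT_DOCUMENTATION = "product_documentation"
    IMPLEMENTATION_SOURCE = "implementation_source"
    DEPENDENCY_DECLARATION = "dependency_declaration"
    CONFIGURATION = "configuration"
    CI_TOOLING = "ci_tooling"
    TEST_EVIDENCE = "test_evidence"
    AGENT_GUIDANCE = "agent_guidance"


# Roles that rebuilds should prefer: intent, product docs, source, and tests.
PREFERRED_FOR_REBUILD = frozenset({
    SourceRole.INTENT_SUMMARY,
    SourceRole.PRODUCT_DOCUMENTATION,
    SourceRole.IMPLEMENTATION_SOURCE,
    SourceRole.TEST_EVIDENCE,
})


def partition_facts(
    facts: Iterable[Tuple[str, SourceRole]],
    scope_bootstrap: bool = False,
) -> Dict[str, List[str]]:
    """Split (path, role) pairs into evidence, context, and comparison buckets."""
    buckets: Dict[str, List[str]] = {"evidence": [], "context": [], "comparison": []}
    for path, role in facts:
        if role is SourceRole.DERIVED_SCOPE:
            # Scope text is comparison material unless a curator opts in.
            bucket = "evidence" if scope_bootstrap else "comparison"
        elif role in PREFERRED_FOR_REBUILD:
            bucket = "evidence"
        else:
            bucket = "context"
        buckets[bucket].append(path)
    return buckets
```

Keeping SCOPE.md in its own bucket by default is what breaks the circularity: the generator never sees it unless the curator explicitly chooses bootstrap mode.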

For repositories that already have a useful SCOPE.md but no INTENT.md, repo-scoping can perform a one-time bootstrap by copying the scope text into a new intent file with a clear provenance note. After that bootstrap, the files should diverge naturally: INTENT.md remains design intent, while SCOPE.md remains generated or curated current scope.
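A one-time bootstrap of this kind might look like the sketch below. The function name `bootstrap_intent` and the wording of the provenance note are hypothetical; how repo-scoping actually performs the copy is not specified here.

```python
from datetime import date
from pathlib import Path


def bootstrap_intent(repo_root: Path) -> bool:
    """Copy SCOPE.md into a new INTENT.md with a provenance note.

    Returns True only when a new INTENT.md was written. An existing
    INTENT.md is never overwritten, so the bootstrap stays one-time.
    """
    scope_path = repo_root / "SCOPE.md"
    intent_path = repo_root / "INTENT.md"
    if intent_path.exists() or not scope_path.exists():
        return False
    # Hypothetical note format; any clear provenance marker would do.
    note = (
        f"<!-- Provenance: bootstrapped from SCOPE.md on {date.today().isoformat()}. "
        "Maintain this file as design intent from here on. -->\n\n"
    )
    scope_text = scope_path.read_text(encoding="utf-8")
    intent_path.write_text(note + scope_text, encoding="utf-8")
    return True
```

Refusing to overwrite is the important design choice: once INTENT.md exists, it is a design artifact in its own right and later scope changes should not leak back into it.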

Provider, dependency, and tooling facts should also carry a utility relationship, which records how the repository relates to the utility named in the fact. A provider mentioned in documentation is usually a mention; an environment variable is usually a configure relationship; a manifest entry is usually a dependency; implementation code under provider or adapter modules may be owned or adapter. Candidate generation should promote only relationships showing that the repository provides the utility directly or intentionally exposes it as a facade or adapter. Mentions, dependencies, configuration, and tooling remain context until a curator promotes them or stronger owned evidence appears.
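As a rough sketch, the default relationships and the promotion rule could be expressed as follows. The `Utility` enum, the path heuristics, and both function names are assumptions for illustration; in particular, mapping a `providers` directory to owned and everything unrecognized to mention is a conservative guess, not documented behaviour.

```python
from enum import Enum
from pathlib import PurePosixPath


class Utility(str, Enum):
    MENTION = "mention"
    CONFIGURE = "configure"
    DEPENDENCY = "dependency"
    TOOLING = "tooling"
    OWNED = "owned"
    FACADE = "facade"
    ADAPTER = "adapter"


# Only these relationships show that the repository provides or exposes the utility.
PROMOTABLE = frozenset({Utility.OWNED, Utility.FACADE, Utility.ADAPTER})

# Usual defaults per source role; anything else falls back to MENTION.
_ROLE_DEFAULTS = {
    "product_documentation": Utility.MENTION,
    "configuration": Utility.CONFIGURE,
    "dependency_declaration": Utility.DEPENDENCY,
    "ci_tooling": Utility.TOOLING,
}


def default_relationship(source_role: str, path: str) -> Utility:
    """Guess a utility relationship from the source role and file path."""
    if source_role == "implementation_source":
        parts = set(PurePosixPath(path).parts)
        if parts & {"adapter", "adapters"}:
            return Utility.ADAPTER
        if parts & {"provider", "providers"}:
            return Utility.OWNED
    return _ROLE_DEFAULTS.get(source_role, Utility.MENTION)


def can_promote(relationship: Utility, curator_promoted: bool = False) -> bool:
    """Context-only relationships need an explicit curator decision."""
    return curator_promoted or relationship in PROMOTABLE
```

For example, `default_relationship("dependency_declaration", "package.json")` yields a dependency, which `can_promote` rejects unless a curator has promoted it.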

Deterministic quality gates apply the same source and utility relationship signals, but they do not approve automatically. Gates may reject, downgrade, invalidate, flag, merge, or require review. Approval requires human judgement or a configured agentic reviewer that records evidence, criteria version, and rationale. Dependency, tooling, configuration, and mention-only candidates remain review material.
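The separation between gates and approval can be sketched as below. The enum deliberately has no approve member. The `Candidate` and `ReviewRecord` types, the rule that a candidate without source refs is rejected, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class GateAction(Enum):
    """Outcomes a deterministic gate may produce. There is no APPROVE."""
    REJECT = "reject"
    DOWNGRADE = "downgrade"
    INVALIDATE = "invalidate"
    FLAG = "flag"
    MERGE = "merge"
    REQUIRE_REVIEW = "require_review"


CONTEXT_ONLY = frozenset({"mention", "dependency", "configure", "tooling"})


@dataclass(frozen=True)
class Candidate:
    name: str
    relationship: str
    source_refs: Tuple[str, ...]


@dataclass(frozen=True)
class ReviewRecord:
    reviewer: str  # a human or a configured agentic reviewer
    evidence: Tuple[str, ...]
    criteria_version: str
    rationale: str


def run_gate(candidate: Candidate) -> Optional[GateAction]:
    """Return an objection, or None when the gate has nothing to object to."""
    if not candidate.source_refs:
        return GateAction.REJECT  # assumed rule: unsourced claims are rejected
    if candidate.relationship in CONTEXT_ONLY:
        return GateAction.REQUIRE_REVIEW
    return None  # no objection is still not an approval


def approve(candidate: Candidate, review: ReviewRecord) -> bool:
    """Approval needs a complete review record and no gate rejection."""
    if run_gate(candidate) is GateAction.REJECT:
        return False
    return all([
        review.reviewer.strip(),
        review.evidence,
        review.criteria_version.strip(),
        review.rationale.strip(),
    ])
```

Returning `None` rather than a positive verdict keeps the gate honest: a clean gate result means only that review may proceed.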

INTENT.md may also seed intended capabilities when it contains an explicit capability section. These intent-derived candidates are marked as review required because intent says what the repository is meant to provide, not what has already been proven. SCOPE.md sections with the same wording are not treated as equivalent input during rebuilds, because scope is derived from the registry model being rebuilt.

The motivating failure mode was a key-cape-like repository whose agent guidance and generic backend-adapter vocabulary looked superficially like LLM provider routing. That pattern should produce source-linked facts for the files that exist, but it should not become an LLM-provider capability unless there is provider-specific owned, facade, or adapter evidence. The scanner and generator should solve this by provenance and utility relationship rules, not by hard-coding product names.

Source references point from interpreted claims back to files or facts.

Evidence is support for a characteristic. It is not the same thing as an observed fact. Evidence may reference:

  • Observed facts.
  • Source files or content chunks.
  • Lower-level characteristics, such as a capability using features as evidence.

Evidence should usually point downward in abstraction. An ability can use capabilities or features as support. A capability can use features or facts as support. A feature should usually use facts or source references as support, not abilities or capabilities.

Same-level evidence references are allowed as review material, but should be treated as a possible organization smell.
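The direction rule lends itself to a simple rank comparison. In this sketch the rank table and the function names are illustrative assumptions; facts, source refs, and content chunks are all treated as sitting below features.

```python
from typing import Iterable, List, Tuple

# Lower number means higher abstraction.
RANK = {
    "scope": 0,
    "ability": 1,
    "capability": 2,
    "feature": 3,
    "fact": 4,
    "source_ref": 4,
    "content_chunk": 4,
}


def evidence_direction(supported_kind: str, reference_kind: str) -> str:
    """Classify an evidence reference relative to the characteristic it supports."""
    supported = RANK[supported_kind]
    referenced = RANK[reference_kind]
    if referenced > supported:
        return "downward"
    if referenced == supported:
        return "same_level"
    return "upward"


def review_flags(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return every (supported, reference, direction) that is not downward."""
    flagged = []
    for supported_kind, reference_kind in pairs:
        direction = evidence_direction(supported_kind, reference_kind)
        if direction != "downward":
            flagged.append((supported_kind, reference_kind, direction))
    return flagged
```

Same-level and upward references are reported rather than rejected, which matches their status as review material instead of errors.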

Implementation Direction

The current schema still stores evidence on capabilities, with textual references and source refs. The next additive schema step should generalize this without breaking existing data:

  • Add a scope root per repository.
  • Add typed evidence targets: the kind and id of the characteristic being supported.
  • Add typed evidence references: a fact, source ref, content chunk, or another characteristic identified by kind and id.
  • Keep legacy evidence fields until migration/export/search have been updated.
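One possible shape for the additive step is sketched below. It is not the registry schema: every type and field name (`EvidenceTarget`, `EvidenceReference`, `legacy_text`, `from_legacy`, and so on) is a hypothetical stand-in chosen to show typed targets and references living alongside the legacy fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

TARGET_KINDS = {"scope", "ability", "capability", "feature"}
REFERENCE_KINDS = TARGET_KINDS | {"fact", "source_ref", "content_chunk"}


@dataclass(frozen=True)
class EvidenceTarget:
    kind: str  # which characteristic level is being supported
    id: str

    def __post_init__(self) -> None:
        if self.kind not in TARGET_KINDS:
            raise ValueError(f"unknown target kind: {self.kind}")


@dataclass(frozen=True)
class EvidenceReference:
    kind: str  # fact, source_ref, content_chunk, or a characteristic kind
    id: str

    def __post_init__(self) -> None:
        if self.kind not in REFERENCE_KINDS:
            raise ValueError(f"unknown reference kind: {self.kind}")


@dataclass
class Evidence:
    target: EvidenceTarget
    references: List[EvidenceReference] = field(default_factory=list)
    # Legacy fields stay populated until migration, export, and search catch up.
    legacy_text: Optional[str] = None
    legacy_source_refs: Tuple[str, ...] = ()


def from_legacy(capability_id: str, text: str, source_refs: Sequence[str]) -> Evidence:
    """Wrap a legacy capability-level evidence row without losing its fields."""
    return Evidence(
        target=EvidenceTarget("capability", capability_id),
        references=[EvidenceReference("source_ref", ref) for ref in source_refs],
        legacy_text=text,
        legacy_source_refs=tuple(source_refs),
    )
```

Because existing rows all attach to capabilities, `from_legacy` can fill in the typed target mechanically, which is what keeps the change additive.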

The UI should make this relationship clear by presenting evidence as support under the characteristic it supports, not as a peer of features.

Rebuilds And Supersession

Use a normal analysis rerun when the existing approved map is mostly trustworthy and the goal is to compare new evidence against prior candidates. Use a rebuild from scratch when approved characteristics are polluted by a bad extraction pattern, stale after a major rename, or circularly derived from old scope text.

A dry-run rebuild should be the first step. It scans current source, generates a fresh candidate graph, and reports what approved abilities, capabilities, features, and evidence would be superseded. A confirmed rebuild preserves audit history by recording which approved IDs were superseded, then clears the current approved map and leaves the fresh candidate graph for review or trusted auto-approval.
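The two-phase flow can be sketched with an in-memory registry. Scanning and candidate generation are elided and the fresh candidates are passed in directly; `Registry`, `dry_run_rebuild`, `confirm_rebuild`, and the audit entry layout are hypothetical names used only to show the ordering of steps.

```python
from dataclasses import dataclass, field
from typing import Dict, List

KINDS = ("ability", "capability", "feature", "evidence")


def _empty_map() -> Dict[str, List[str]]:
    return {kind: [] for kind in KINDS}


@dataclass
class Registry:
    approved: Dict[str, List[str]] = field(default_factory=_empty_map)
    candidates: List[str] = field(default_factory=list)
    audit_log: List[dict] = field(default_factory=list)


def dry_run_rebuild(registry: Registry, fresh_candidates: List[str]) -> dict:
    """Report what would be superseded without changing the registry."""
    return {
        "superseded": {kind: list(ids) for kind, ids in registry.approved.items()},
        "candidate_count": len(fresh_candidates),
    }


def confirm_rebuild(registry: Registry, fresh_candidates: List[str], reason: str) -> dict:
    """Record superseded IDs first, then clear the approved map."""
    report = dry_run_rebuild(registry, fresh_candidates)
    # Audit history is written before anything is cleared.
    registry.audit_log.append({"reason": reason, "superseded": report["superseded"]})
    registry.approved = _empty_map()
    registry.candidates = list(fresh_candidates)
    return report
```

Building the confirmed rebuild on top of the dry-run report means the curator sees exactly the same superseded list in both phases.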

Curators should treat superseded characteristics as historical claims, not as deleted facts. They explain what the registry used to believe and why a rebuild was chosen over incremental correction.