repo-scoping/docs/scope-md-spec.md

# SCOPE.md Reference Specification

`SCOPE.md` is the human- and agent-facing boundary definition for a repository.
It answers, quickly and concretely, what the repository is for, when it is useful,
where it fits, and what capabilities it can provide.

Repo-scoping is the source of truth for generating and validating `SCOPE.md`
because its approved characteristic model already captures the same structure:

```text
Scope -> Ability -> Capability -> Feature -> Evidence -> Observed Fact
```

This specification supersedes the Custodian dashboard reference at
`state-hub/dashboard/src/docs/scope.md`. The scaffold template remains at
`state-hub/scripts/project_rules/scope.template`; this document defines how
repo-scoping should generate, validate, and update `SCOPE.md` files based on it.

Related model docs:

- `docs/characteristic-evidence-model.md`
- `docs/classification-strategy.md`

## Purpose

`SCOPE.md` is not a README, architecture document, or marketing page. It is a
short orientation artifact for deciding whether a repo is relevant before reading
its code in depth.

It should answer:

- What is this repository for?
- Should I care about it right now?
- When is it relevant to my work?
- Where does it fit in the ecosystem?
- Is it mature enough to trust or reuse?
- Does it overlap with something else?
- What capabilities can it provide to other domains?

## Canonical Template

The historical Custodian reference calls this an "11-section template". The
current `scope.template` contains twelve functional sections plus an optional
`Notes` tail. Repo-scoping should preserve the current template headings for
compatibility and treat `Notes` as curator-owned free text.

Generated files must contain these sections, in this order (only `Notes` may be omitted):

| Section | Source in repo-scoping | Generation ownership |
|---------|--------------------------|----------------------|
| `## One-liner` | Scope name plus scope description | generated, curator-reviewed |
| `## Core Idea` | Scope description and top approved abilities | generated, curator-reviewed |
| `## In Scope` | Approved abilities and high-confidence capabilities | generated, curator-reviewed |
| `## Out of Scope` | Abilities or expectation gaps classified as exclusions | curator-owned unless explicitly modeled |
| `## Relevant When` | Approved features with `primary_class: business-usecase` or `attributes` including use-case labels | generated, curator-reviewed |
| `## Not Relevant When` | Negative use-case expectation gaps or curator exclusions | curator-owned unless explicitly modeled |
| `## Current State` | Observed facts aggregated by scanner: status, language, framework, tests, routes, docs, manifests | generated |
| `## How It Fits` | Evidence/support references to other characteristics or repos; dependency facts | generated, curator-reviewed |
| `## Terminology` | Domain term facts, names, aliases, and classification labels | generated, curator-reviewed |
| `## Related / Overlapping` | Cross-repo support references and comparison/discovery data | generated when known, curator-reviewed |
| `## Getting Oriented` | Source refs, content chunks, key files, entry points, docs, tests | generated |
| `## Provided Capabilities` | Approved capability characteristics rendered as machine-readable `capability` blocks | generated, file-origin truth |
| `## Notes` | Human-maintained remarks that do not fit the structured sections | curator-owned |

When a generated section has insufficient data, emit a short stub plus:

```markdown
<!-- needs curator input -->
```

This makes gaps visible without pretending the scanner knows more than it does.
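
The stub rule can be sketched in Python as follows. This is a minimal illustration, not repo-scoping's actual renderer: the function name `render_section` and the default stub sentence are assumptions made for the example.

```python
from typing import Optional

CURATOR_MARKER = "<!-- needs curator input -->"


def render_section(heading: str, body: Optional[str],
                   stub: str = "Not yet determined.") -> str:
    """Render one section; emit a stub plus the marker when there is no data.

    `stub` is a hypothetical placeholder sentence, not wording from the template.
    """
    if body and body.strip():
        return f"## {heading}\n\n{body.strip()}\n"
    return f"## {heading}\n\n{stub}\n\n{CURATOR_MARKER}\n"
```

Calling `render_section("Terminology", None)` yields the heading, the stub sentence, and the marker, so a later validation pass can find the gap by searching for the marker text.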

## Section Mapping Details

### One-liner

Use the approved repository `Scope` as the root characteristic. Prefer a single
sentence from the scope description. If no curated sentence exists, use:

```text
<scope name> defines and maintains the repository scope for <repository name>.
```
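
A minimal Python sketch of this preference order is shown here. The function name `one_liner`, its parameters, and the naive first-sentence split are assumptions for illustration; a real implementation would likely need better sentence detection.

```python
from typing import Optional


def one_liner(scope_name: str, repository_name: str,
              scope_description: Optional[str] = None) -> str:
    """Prefer the first sentence of the scope description, else the fallback."""
    if scope_description and scope_description.strip():
        # Naive sentence split: take everything before the first ". ".
        first = scope_description.strip().split(". ")[0].rstrip(".")
        return first + "."
    return (f"{scope_name} defines and maintains the repository scope "
            f"for {repository_name}.")
```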

### Core Idea

Summarize the root `Scope` and the most important approved `Ability` entries.
Use ability descriptions where available. Avoid listing every capability here;
the goal is orientation, not completeness.

### In Scope

Render approved abilities as top-level bullets. Mention the most important
capabilities inline within each bullet rather than as nested lists, so the
generated Markdown stays flat.

Suggested form:

```markdown
- <Ability name> — <ability description>. Includes <capability A>, <capability B>.
```
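
One way to produce a bullet in the suggested form, sketched in Python. The function name `in_scope_bullet` and the cap of three capabilities per bullet are assumptions; the specification only asks for "the most important" capabilities.

```python
from typing import Sequence

# Separator used in the suggested form (an em dash, written as an escape).
SEPARATOR = " \u2014 "


def in_scope_bullet(name: str, description: str,
                    capabilities: Sequence[str],
                    max_capabilities: int = 3) -> str:
    """Render one approved ability as a flat bullet with inline capabilities."""
    text = f"- {name}{SEPARATOR}{description.rstrip('.')}."
    top = list(capabilities)[:max_capabilities]  # hypothetical cap
    if top:
        text += f" Includes {', '.join(top)}."
    return text
```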

### Out of Scope

This section is primarily curator-owned. Repo-scoping may seed it from
classification expectation gaps whose `expected_type` is one of:

- `classification-granularity`
- `classification-support`
- `out-of-scope`

Generated text must be conservative and marked for review until an approved
negative/exclusion model exists.

### Relevant When

Use approved features that represent real usage scenarios. Strong signals:

- `primary_class == "business-usecase"`
- `attributes` contains `usecase`, `workflow`, `review`, `generation`,
  `analysis`, `integration`, or another domain-specific use-case label

If no business-usecase features exist, seed from high-confidence abilities and
capabilities with a curator-input marker.
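
The signals above can be expressed as a small filter. This Python sketch assumes features are plain dictionaries with `approved`, `primary_class`, and `attributes` keys; that shape, and the function names, are illustrative rather than taken from the repo-scoping data model.

```python
from typing import Iterable, List

# Labels named in this section; domain-specific labels could be added.
USECASE_LABELS = {"usecase", "workflow", "review", "generation",
                  "analysis", "integration"}


def is_usecase_feature(feature: dict) -> bool:
    """True when a feature carries one of the strong use-case signals."""
    if feature.get("primary_class") == "business-usecase":
        return True
    return bool(USECASE_LABELS & set(feature.get("attributes", [])))


def relevant_when_candidates(features: Iterable[dict]) -> List[dict]:
    """Keep approved features that look like real usage scenarios."""
    return [f for f in features
            if f.get("approved") and is_usecase_feature(f)]
```

An empty result is the cue to fall back to high-confidence abilities and capabilities and to add the curator-input marker.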

### Not Relevant When

This section is curator-owned unless explicit negative use-case facts or
expectation gaps exist. Do not infer broad exclusions from missing features.

### Current State

Aggregate observed facts. Good generated indicators include:

- Status: derive from repository status and analysis run state.
- Implementation: derive from source files, package manifests, tests, and route
  or CLI facts.
- Stability: conservative default `evolving` unless curated.
- Usage: conservative default `internal` or `unknown` unless facts indicate
  production usage.

Include compact bullets for detected languages, frameworks, tests, manifests,
docs, interfaces, provider facts, and scanner gaps.

### How It Fits

Use support/evidence relationships and source refs:

- Upstream dependencies: package, service, provider, and integration facts.
- Downstream consumers: cross-repo support references when available.
- Often used with: related repo links and common provider/framework facts.

Evidence is support for a characteristic, not the same thing as a fact. Prefer
evidence links that point downward in abstraction, as described in
`docs/characteristic-evidence-model.md`.

### Terminology

Generate from:

- scope, ability, capability, and feature names
- `primary_class` and `attributes`
- scanner facts for providers, frameworks, commands, APIs, and domain terms
- aliases or expectation gaps when present

Mark ambiguous or overlapping terms for curator review.

### Related / Overlapping

Generate only when there is cross-repo evidence, comparison data, or explicit
curator input. Do not invent related repositories from name similarity alone.

### Getting Oriented

Use source references and observed facts to name good entry points:

- Start with: README, docs, API route files, CLI files, core service modules
- Key files / directories: source paths with high fact/support density
- Entry points: API routes, CLI commands, package manifests, tests

### Provided Capabilities

Render approved `Capability` characteristics as fenced `capability` blocks. This
section is parsed by the Custodian capability catalog and remains file-origin
truth under ADR-001.

Block format:

````markdown
```capability
type: api
title: scope.generate
description: >
  Generates a SCOPE.md from approved repository characteristics.
keywords: [scope, scope-md, generation]
```
````

Fields:

| Field | Required | Source |
|-------|----------|--------|
| `type` | yes | capability `primary_class`, normalized to catalog categories |
| `title` | yes | capability name or curated capability key |
| `description` | no | capability description |
| `keywords` | no | capability attributes plus relevant feature classes |

Allowed catalog categories remain compatible with the existing Custodian ingest:

- `infrastructure`
- `api`
- `data`
- `security`
- `documentation`
- `other`

If a capability's `primary_class` is not one of these categories, map it to
`api`, `data`, `documentation`, or `other` conservatively and preserve the
original class as a keyword.
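
A rough Python sketch of block rendering with type normalization follows. The function names are assumptions, and the fallback is simplified: every unknown class maps to `other`, the most conservative of the listed targets, whereas a real mapper might choose `api`, `data`, or `documentation` when the class clearly fits.

```python
from typing import List, Optional, Sequence, Tuple

CATALOG_CATEGORIES = ("infrastructure", "api", "data", "security",
                      "documentation", "other")
FENCE = "`" * 3  # built programmatically to keep this example fence-safe


def normalize_type(primary_class: str,
                   keywords: Sequence[str]) -> Tuple[str, List[str]]:
    """Map a class to a catalog category, keeping unknown classes as keywords."""
    kws = list(keywords)
    if primary_class in CATALOG_CATEGORIES:
        return primary_class, kws
    if primary_class and primary_class not in kws:
        kws.append(primary_class)
    return "other", kws  # simplified fallback (assumption)


def render_capability_block(title: str, primary_class: str,
                            description: Optional[str] = None,
                            keywords: Sequence[str] = ()) -> str:
    """Render one capability as a fenced `capability` block."""
    cap_type, kws = normalize_type(primary_class, keywords)
    lines = [f"{FENCE}capability", f"type: {cap_type}", f"title: {title}"]
    if description:
        lines += ["description: >", f"  {description}"]
    if kws:
        lines.append(f"keywords: [{', '.join(kws)}]")
    lines.append(FENCE)
    return "\n".join(lines)
```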

### Notes

`Notes` is optional and curator-owned. Generators should preserve existing notes
when updating a file and should not overwrite this section unless explicitly
requested.

## Generation Ownership

Repo-scoping-generated sections:

- One-liner
- Core Idea
- In Scope
- Relevant When
- Current State
- How It Fits
- Terminology
- Related / Overlapping
- Getting Oriented
- Provided Capabilities

Curator-owned or curator-reviewed sections:

- Out of Scope
- Not Relevant When
- Notes
- Any generated section containing `<!-- needs curator input -->`

The generator may write stubs for curator-owned sections, but the updater must
preserve existing curator text unless the caller explicitly asks for a full
rewrite.
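
The preservation rule might look like this in Python. The sketch assumes sections are held as a mapping from heading to body text; `merge_sections` and `full_rewrite` are hypothetical names, and the sketch does not handle generated sections that carry the curator-input marker.

```python
from typing import Dict

# Sections listed above as curator-owned.
CURATOR_OWNED = ("Out of Scope", "Not Relevant When", "Notes")


def merge_sections(existing: Dict[str, str], generated: Dict[str, str],
                   full_rewrite: bool = False) -> Dict[str, str]:
    """Start from generated content, then restore existing curator text."""
    merged = dict(generated)
    if full_rewrite:
        return merged  # caller explicitly asked to overwrite everything
    for heading in CURATOR_OWNED:
        if existing.get(heading, "").strip():
            merged[heading] = existing[heading]
    return merged
```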

## Validation Rules

The validator should mirror the Custodian DOI C5 checks:

- C5a: `SCOPE.md` exists at the repository root.
- C5b: required headings are present in canonical order.
- C5c: `## Provided Capabilities` contains parseable `capability` blocks, or an
  explicit empty-state note when the repo provides no routable capabilities.
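
A heading check in the spirit of C5b could be sketched as below. This is not the Custodian implementation; `check_headings` is a hypothetical name, and the regex scan is naive in that it would also match `## ` lines inside fenced code blocks and reports duplicate headings as an ordering problem.

```python
import re
from typing import List

REQUIRED_HEADINGS = [
    "One-liner", "Core Idea", "In Scope", "Out of Scope", "Relevant When",
    "Not Relevant When", "Current State", "How It Fits", "Terminology",
    "Related / Overlapping", "Getting Oriented", "Provided Capabilities",
]


def check_headings(markdown: str) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    found = re.findall(r"^## (.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    problems = [f"missing heading: {h}"
                for h in REQUIRED_HEADINGS if h not in found]
    # Compare the order of the required headings that are present.
    present = [h for h in found if h in REQUIRED_HEADINGS]
    expected = [h for h in REQUIRED_HEADINGS if h in present]
    if present != expected:
        problems.append("headings out of canonical order")
    return problems
```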

Additional repo-scoping validation:

- Generated sections with missing data must include `<!-- needs curator input -->`.
- Capability blocks must parse as key/value metadata.
- Capability block titles should be stable enough for routing.
- Curator-owned sections should be preserved by diff/update flows.
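
The key/value parsing rule can be illustrated with a small standard-library parser. It is a sketch under assumptions: it handles only the fields and the folded `description: >` form shown in this document, it is not a YAML parser, and the names `parse_capability_blocks` and `capability_problems` are hypothetical.

```python
import re
from typing import Dict, List

FENCE = "`" * 3
BLOCK_RE = re.compile(
    rf"^{FENCE}capability[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def parse_capability_blocks(markdown: str) -> List[Dict[str, str]]:
    """Return one dict per capability block; folded lines are joined."""
    blocks = []
    for body in BLOCK_RE.findall(markdown):
        fields: Dict[str, str] = {}
        current = None
        for line in body.splitlines():
            if not line.strip():
                continue
            if line[0] in " \t" and current:
                # Indented continuation of the previous key.
                fields[current] = (fields[current] + " " + line.strip()).strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                current = key.strip()
                value = value.strip()
                fields[current] = "" if value in (">", "|") else value
            else:
                raise ValueError(f"unparseable capability line: {line!r}")
        blocks.append(fields)
    return blocks


def capability_problems(markdown: str) -> List[str]:
    """Report blocks that lack the required `type` or `title` fields."""
    problems = []
    for i, fields in enumerate(parse_capability_blocks(markdown), start=1):
        for required in ("type", "title"):
            if not fields.get(required):
                problems.append(f"block {i}: missing required field '{required}'")
    return problems
```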

## Update Semantics

The validator/differ compares the existing file to freshly generated content by
section. A section is:

- `ok` when the normalized existing text matches the normalized generated content.
- `stale` when the section exists but still differs after normalization.
- `missing` when the heading is absent.

Normalization should ignore repeated whitespace and harmless Markdown wrapping,
but must not ignore changed capability block metadata.

Generated updates should be section-aware. Do not rewrite the whole file when a
smaller section update is enough.
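
A per-section comparison along these lines could be sketched as follows. The mapping-based section representation and the names `normalize` and `section_status` are assumptions. Only whitespace is collapsed, so a changed value inside a capability block still shows up as a difference.

```python
import re
from typing import Dict


def normalize(text: str) -> str:
    """Collapse whitespace runs, including line wrapping, to single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def section_status(existing: Dict[str, str], generated: Dict[str, str],
                   heading: str) -> str:
    """Classify one section as `ok`, `stale`, or `missing`."""
    if heading not in existing:
        return "missing"
    if normalize(existing[heading]) == normalize(generated.get(heading, "")):
        return "ok"
    return "stale"
```

Running `section_status` over each canonical heading gives the list of sections that need a targeted update, which keeps the rest of the file untouched.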

## Agent Guidance

Agents should treat `SCOPE.md` as a decision aid:

- Read it before deep code exploration.
- Prefer it over README for scope boundaries.
- Use `AGENTS.md` for operating instructions and repo-specific workflow.
- Use generated diffs to spot stale scope claims.
- Record expectation gaps when generated scope, classes, or capabilities do not
  match human judgment.