# SCOPE.md Reference Specification
`SCOPE.md` is the human- and agent-facing boundary definition for a repository.
It answers, quickly and concretely, what the repository is for, when it is useful,
where it fits, and what capabilities it can provide.
Repo-scoping is the source of truth for generating and validating `SCOPE.md`
because its approved characteristic model already captures the same structure:
```text
Scope -> Ability -> Capability -> Feature -> Evidence -> Observed Fact
```
This specification supersedes the Custodian dashboard reference at
`state-hub/dashboard/src/docs/scope.md`. The scaffold template remains at
`state-hub/scripts/project_rules/scope.template`; this document defines how
repo-scoping should generate, validate, and update that file.
Related model docs:
- `docs/characteristic-evidence-model.md`
- `docs/classification-strategy.md`
## Purpose
`SCOPE.md` is not a README, architecture document, or marketing page. It is a
short orientation artifact for deciding whether a repo is relevant before reading
its code in depth.
It should answer:
- What is this repository for?
- Should I care about it right now?
- When is it relevant to my work?
- Where does it fit in the ecosystem?
- Is it mature enough to trust or reuse?
- Does it overlap with something else?
- What capabilities can it provide to other domains?
## Canonical Template
The historical Custodian reference calls this an "11-section template". The
current `scope.template` contains twelve functional sections plus an optional
`Notes` tail. Repo-scoping should preserve the current template headings for
compatibility and treat `Notes` as curator-owned free text.
Generated files must contain the twelve functional sections below, in this order; `Notes` may follow as an optional final section:
| Section | Source in repo-scoping | Generation ownership |
|---------|--------------------------|----------------------|
| `## One-liner` | Scope name plus scope description | generated, curator-reviewed |
| `## Core Idea` | Scope description and top approved abilities | generated, curator-reviewed |
| `## In Scope` | Approved abilities and high-confidence capabilities | generated, curator-reviewed |
| `## Out of Scope` | Abilities or expectation gaps classified as exclusions | curator-owned unless explicitly modeled |
| `## Relevant When` | Approved features with `primary_class: business-usecase` or `attributes` including use-case labels | generated, curator-reviewed |
| `## Not Relevant When` | Negative use-case expectation gaps or curator exclusions | curator-owned unless explicitly modeled |
| `## Current State` | Observed facts aggregated by scanner: status, language, framework, tests, routes, docs, manifests | generated |
| `## How It Fits` | Evidence/support references to other characteristics or repos; dependency facts | generated, curator-reviewed |
| `## Terminology` | Domain term facts, names, aliases, and classification labels | generated, curator-reviewed |
| `## Related / Overlapping` | Cross-repo support references and comparison/discovery data | generated when known, curator-reviewed |
| `## Getting Oriented` | Source refs, content chunks, key files, entry points, docs, tests | generated |
| `## Provided Capabilities` | Approved capability characteristics rendered as machine-readable `capability` blocks | generated, file-origin truth |
| `## Notes` | Human-maintained remarks that do not fit the structured sections | curator-owned |
When a generated section has insufficient data, emit a short stub plus:
```markdown
<!-- needs curator input -->
```
This makes gaps visible without pretending the scanner knows more than it does.
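
The stub rule can be sketched as a small renderer. This is a minimal illustration, not the repo-scoping implementation: `render_section`, its parameters, and the default stub wording are hypothetical names chosen for the example; only the marker text comes from this specification.

```python
from typing import Optional

# Marker text as defined in this specification.
CURATOR_MARKER = "<!-- needs curator input -->"


def render_section(heading: str, body: Optional[str],
                   stub: str = "Not yet determined.") -> str:
    """Render one `## <heading>` section (hypothetical helper).

    When `body` is empty or missing, emit a short stub followed by the
    curator marker instead of inventing content.
    """
    text = (body or "").strip()
    if text:
        return f"## {heading}\n{text}\n"
    # Insufficient data: make the gap visible for a curator.
    return f"## {heading}\n{stub}\n{CURATOR_MARKER}\n"
```

A section with data renders unchanged; a section without data always carries the marker, which is what the validation rules later check for.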
## Section Mapping Details
### One-liner
Use the approved repository `Scope` as the root characteristic. Prefer a single
sentence from the scope description. If no curated sentence exists, use:
```text
<scope name> defines and maintains the repository scope for <repository name>.
```
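
One possible way to apply this preference order is shown below. The function name `one_liner` and the naive first-sentence split are assumptions for illustration; a real generator may pick the curated sentence differently.

```python
import re
from typing import Optional


def one_liner(scope_name: str, repository_name: str,
              scope_description: Optional[str] = None) -> str:
    """Return the One-liner text (hypothetical helper).

    Prefers the first sentence of the scope description and falls back to
    the template sentence from this specification.
    """
    description = (scope_description or "").strip()
    if description:
        # Naive split: a sentence ends at ., ! or ? followed by whitespace.
        return re.split(r"(?<=[.!?])\s+", description, maxsplit=1)[0]
    return (f"{scope_name} defines and maintains the repository scope "
            f"for {repository_name}.")
```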
### Core Idea
Summarize the root `Scope` and the most important approved `Ability` entries.
Use ability descriptions where available. Avoid listing every capability here;
the goal is orientation, not completeness.
### In Scope
Render approved abilities as top-level bullets. Name the most important
capabilities inline within each bullet rather than as nested sub-bullets, so
the generated Markdown stays flat.
Suggested form:
```markdown
- <Ability name> — <ability description>. Includes <capability A>, <capability B>.
```
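
A minimal sketch of rendering the suggested form follows. `in_scope_bullet` and the `max_capabilities` cap are hypothetical; the spec only asks for "the most important" capabilities and does not fix a number.

```python
from typing import Sequence


def in_scope_bullet(name: str, description: str,
                    capabilities: Sequence[str] = (),
                    max_capabilities: int = 3) -> str:
    """Render one flat In Scope bullet (hypothetical helper).

    `max_capabilities` is an assumed cap, not a value from the spec.
    """
    sentence = description.strip().rstrip(".")
    # \u2014 is the dash used by the suggested form above.
    line = f"- {name} \u2014 {sentence}."
    shown = list(capabilities)[:max_capabilities]
    if shown:
        line += " Includes " + ", ".join(shown) + "."
    return line
```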
### Out of Scope
This section is primarily curator-owned. Repo-scoping may seed it from
classification expectation gaps whose `expected_type` is one of:
- `classification-granularity`
- `classification-support`
- `out-of-scope`
Generated text must be conservative and marked for review until an approved
negative/exclusion model exists.
### Relevant When
Use approved features that represent real usage scenarios. Strong signals:
- `primary_class == "business-usecase"`
- `attributes` contains `usecase`, `workflow`, `review`, `generation`,
`analysis`, `integration`, or another domain-specific use-case label
If no business-usecase features exist, seed from high-confidence abilities and
capabilities with a curator-input marker.
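
The strong signals above can be expressed as a predicate. This sketch assumes features are plain dictionaries with `primary_class` and `attributes` keys as named in this spec; the `approved` flag and both function names are hypothetical.

```python
# Use-case labels listed in this specification; domain-specific labels
# could be added to this set.
USECASE_LABELS = frozenset(
    {"usecase", "workflow", "review", "generation", "analysis", "integration"}
)


def is_usecase_feature(feature: dict) -> bool:
    """True when a feature carries one of the strong use-case signals."""
    if feature.get("primary_class") == "business-usecase":
        return True
    return bool(USECASE_LABELS.intersection(feature.get("attributes") or []))


def select_usecase_features(features: list) -> list:
    """Keep approved features that represent real usage scenarios.

    The `approved` key is an assumed representation of approval status.
    """
    return [f for f in features if f.get("approved") and is_usecase_feature(f)]
```

If the selection comes back empty, the generator would fall back to seeding from abilities and capabilities and add the curator-input marker.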
### Not Relevant When
This section is curator-owned unless explicit negative use-case facts or
expectation gaps exist. Do not infer broad exclusions from missing features.
### Current State
Aggregate observed facts. Good generated indicators include:
- Status: derive from repository status and analysis run state.
- Implementation: derive from source files, package manifests, tests, and route
or CLI facts.
- Stability: conservative default `evolving` unless curated.
- Usage: conservative default `internal` or `unknown` unless facts indicate
production usage.
Include compact bullets for detected languages, frameworks, tests, manifests,
docs, interfaces, provider facts, and scanner gaps.
### How It Fits
Use support/evidence relationships and source refs:
- Upstream dependencies: package, service, provider, and integration facts.
- Downstream consumers: cross-repo support references when available.
- Often used with: related repo links and common provider/framework facts.
Evidence supports a characteristic; it is not itself an observed fact. Prefer
evidence links that point downward in abstraction, as described in
`docs/characteristic-evidence-model.md`.
### Terminology
Generate from:
- scope, ability, capability, and feature names
- `primary_class` and `attributes`
- scanner facts for providers, frameworks, commands, APIs, and domain terms
- aliases or expectation gaps when present
Mark ambiguous or overlapping terms for curator review.
### Related / Overlapping
Generate only when there is cross-repo evidence, comparison data, or explicit
curator input. Do not invent related repositories from name similarity alone.
### Getting Oriented
Use source references and observed facts to name good entry points:
- Start with: README, docs, API route files, CLI files, core service modules
- Key files / directories: source paths with high fact/support density
- Entry points: API routes, CLI commands, package manifests, tests
### Provided Capabilities
Render approved `Capability` characteristics as fenced `capability` blocks. This
section is parsed by the Custodian capability catalog and remains file-origin
truth under ADR-001.
Block format:
````markdown
```capability
type: api
title: scope.generate
description: >
  Generates a SCOPE.md from approved repository characteristics.
keywords: [scope, scope-md, generation]
```
````
Fields:
| Field | Required | Source |
|-------|----------|--------|
| `type` | yes | capability `primary_class`, normalized to catalog categories |
| `title` | yes | capability name or curated capability key |
| `description` | no | capability description |
| `keywords` | no | capability attributes plus relevant feature classes |
Allowed catalog categories remain compatible with the existing Custodian ingest:
- `infrastructure`
- `api`
- `data`
- `security`
- `documentation`
- `other`
If a capability's `primary_class` is not one of these categories, map it
conservatively to the closest of `api`, `data`, `documentation`, or `other`, and
preserve the original class as a keyword.
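
The normalization rule might look like the sketch below. The category list comes from this spec; the `FALLBACKS` table and the function name are illustrative assumptions, since the spec does not define a concrete mapping.

```python
from typing import Iterable, List, Tuple

# Catalog categories accepted by the existing ingest (from this spec).
CATALOG_CATEGORIES = (
    "infrastructure", "api", "data", "security", "documentation", "other",
)

# Hypothetical fallback mapping; real class names and targets may differ.
FALLBACKS = {"cli": "api", "storage": "data", "guide": "documentation"}


def normalize_capability_type(primary_class: str,
                              keywords: Iterable[str] = ()) -> Tuple[str, List[str]]:
    """Return (catalog type, keywords) for a capability block."""
    result = list(keywords)
    if primary_class in CATALOG_CATEGORIES:
        return primary_class, result
    # Unknown class: choose a conservative category, defaulting to "other",
    # and keep the original class so no information is lost.
    if primary_class not in result:
        result.append(primary_class)
    return FALLBACKS.get(primary_class, "other"), result
```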
### Notes
`Notes` is optional and curator-owned. Generators should preserve existing notes
when updating a file and should not overwrite this section unless explicitly
requested.
## Generation Ownership
Repo-scoping-generated sections:
- One-liner
- Core Idea
- In Scope
- Relevant When
- Current State
- How It Fits
- Terminology
- Related / Overlapping
- Getting Oriented
- Provided Capabilities
Curator-owned or curator-reviewed sections:
- Out of Scope
- Not Relevant When
- Notes
- Any generated section containing `<!-- needs curator input -->`
The generator may write stubs for curator-owned sections, but the updater must
preserve existing curator text unless the caller explicitly asks for a full
rewrite.
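
The preservation rule can be sketched as a merge over per-section text. This assumes sections are held in dictionaries keyed by heading text; `merge_sections` and `full_rewrite` are hypothetical names, not an existing repo-scoping API.

```python
# Curator-owned headings, as listed above.
CURATOR_OWNED = ("Out of Scope", "Not Relevant When", "Notes")


def merge_sections(existing: dict, generated: dict,
                   full_rewrite: bool = False) -> dict:
    """Combine freshly generated sections with an existing file's sections.

    Existing curator text wins unless the caller explicitly requests a
    full rewrite.
    """
    merged = dict(generated)
    if full_rewrite:
        return merged
    for heading in CURATOR_OWNED:
        text = existing.get(heading, "")
        if text.strip():
            # Keep the curator's wording exactly as written.
            merged[heading] = text
    return merged
```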
## Validation Rules
The validator should mirror the Custodian DOI C5 checks:
- C5a: `SCOPE.md` exists at the repository root.
- C5b: required headings are present in canonical order.
- C5c: `## Provided Capabilities` contains parseable `capability` blocks, or an
explicit empty-state note when the repo provides no routable capabilities.
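
A check in the spirit of C5b might look like the following sketch. It treats `Notes` as optional, which is one reading of this spec, and uses a simplified fence tracker; the function names are hypothetical and this is not the Custodian implementation.

```python
# The twelve functional headings in canonical order (Notes is optional).
REQUIRED_HEADINGS = [
    "One-liner", "Core Idea", "In Scope", "Out of Scope", "Relevant When",
    "Not Relevant When", "Current State", "How It Fits", "Terminology",
    "Related / Overlapping", "Getting Oriented", "Provided Capabilities",
]


def find_headings(markdown: str) -> list:
    """Collect `## ` headings, skipping lines inside fenced code blocks."""
    headings, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # simplified fence tracking
        elif not in_fence and line.startswith("## "):
            headings.append(line[3:].strip())
    return headings


def check_headings(markdown: str) -> list:
    """Return a list of problems; an empty list means the check passes."""
    found = [h for h in find_headings(markdown) if h in REQUIRED_HEADINGS]
    problems = [f"missing: {h}" for h in REQUIRED_HEADINGS if h not in found]
    expected = [h for h in REQUIRED_HEADINGS if h in found]
    if found != expected:
        problems.append("required headings are out of canonical order")
    return problems
```
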
Additional repo-scoping validation:
- Generated sections with missing data must include `<!-- needs curator input -->`.
- Capability blocks must parse as key/value metadata.
- Capability block titles should stay stable across regenerations so that routing keeps working.
- Curator-owned sections should be preserved by diff/update flows.
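
To illustrate the key/value requirement, here is a deliberately small parser for `capability` fences. It is an assumption-laden sketch: it handles only the shapes shown in this spec (plain values, a folded `>` description, and an inline `[a, b]` list) and is not a YAML parser or the Custodian ingest.

```python
import re

# Matches a fenced block whose info string is `capability`.
FENCE = re.compile(r"^```capability\s*\n(.*?)^```\s*$", re.MULTILINE | re.DOTALL)
KEY = re.compile(r"^([A-Za-z_]+):\s*(.*)$")
REQUIRED_FIELDS = ("type", "title")


def parse_capability_blocks(markdown: str) -> list:
    """Return one dict of fields per capability block."""
    blocks = []
    for match in FENCE.finditer(markdown):
        fields, current = {}, None
        for raw in match.group(1).splitlines():
            key_match = KEY.match(raw)
            if key_match:
                current, value = key_match.group(1), key_match.group(2).strip()
                # A bare block-scalar indicator means the text follows below.
                fields[current] = "" if value in (">", "|") else value
            elif current and raw.strip():
                # Continuation line: fold into the previous field.
                fields[current] = (fields[current] + " " + raw.strip()).strip()
        keywords = fields.get("keywords", "")
        if keywords.startswith("[") and keywords.endswith("]"):
            fields["keywords"] = [k.strip() for k in keywords[1:-1].split(",") if k.strip()]
        blocks.append(fields)
    return blocks


def missing_required(block: dict) -> list:
    """List required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not block.get(name)]
```

A validator could report any block for which `missing_required` returns a non-empty list.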
## Update Semantics
The validator/differ compares the existing file to freshly generated content
section by section. Each section is classified as:
- `ok` when the normalized existing text matches the generated content.
- `stale` when the section exists but its normalized text differs from the generated content.
- `missing` when the heading is absent.
Normalization should ignore repeated whitespace and harmless Markdown wrapping,
but must not ignore changed capability block metadata.
Generated updates should be section-aware. Do not rewrite the whole file when a
smaller section update is enough.
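
The three states can be sketched with a whitespace-only normalizer. The function names and the dictionary-per-heading representation are assumptions for illustration; collapsing whitespace never hides a changed capability field value, so metadata changes still surface as `stale`.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not matter."""
    return re.sub(r"\s+", " ", text).strip()


def section_status(existing: Optional[str], generated: str) -> str:
    """Classify one section as `ok`, `stale`, or `missing`."""
    if existing is None:
        return "missing"
    return "ok" if normalize(existing) == normalize(generated) else "stale"


def diff_sections(existing: dict, generated: dict) -> dict:
    """Map each generated heading to its status against the existing file."""
    return {
        heading: section_status(existing.get(heading), text)
        for heading, text in generated.items()
    }
```

A section-aware updater would then rewrite only the headings reported as `stale` or `missing`.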
## Agent Guidance
Agents should treat `SCOPE.md` as a decision aid:
- Read it before deep code exploration.
- Prefer it over README for scope boundaries.
- Use `AGENTS.md` for operating instructions and repo-specific workflow.
- Use generated diffs to spot stale scope claims.
- Record expectation gaps when generated scope, classes, or capabilities do not
match human judgment.