generated from coulomb/repo-seed
Transferred deep scope functionality from the custodian
12
AGENTS.md
@@ -1,4 +1,4 @@
-# repo-registry — Agent Instructions
+# repo-scoping — Agent Instructions

 ## Repo Identity

@@ -8,7 +8,7 @@ scanners establish observed facts; LLM-assisted extractors propose interpreted
 claims; humans or trusted agents approve registry truth.

 **Domain:** capabilities
-**Repo slug:** repo-registry
+**Repo slug:** repo-scoping
 **Topic ID:** `64418556-3206-457a-ba29-6884b5b12cf3`
 **Workplan prefix:** `RREG-WP-`

@@ -33,7 +33,7 @@ curl -s "http://127.0.0.1:8000/workstreams/?topic_id=64418556-3206-457a-ba29-688
 curl -s "http://127.0.0.1:8000/tasks/?status=todo" | python3 -m json.tool

 # Check inbox
-curl -s "http://127.0.0.1:8000/messages/?to_agent=repo-registry&unread_only=true" \
+curl -s "http://127.0.0.1:8000/messages/?to_agent=repo-scoping&unread_only=true" \
   | python3 -m json.tool
 ```

@@ -79,7 +79,7 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \

 **Start:**
 1. `ls workplans/` — note active workplans and their open tasks
-2. Check inbox via `GET /messages/?to_agent=repo-registry&unread_only=true`
+2. Check inbox via `GET /messages/?to_agent=repo-scoping&unread_only=true`
 3. Check for human-flagged tasks: `GET /tasks/?needs_human=true`

 **During work:**
@@ -92,7 +92,7 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
 3. If workplan files changed, sync them to the hub DB:

 ```bash
-curl -s -X POST "http://127.0.0.1:8000/repos/repo-registry/sync" | python3 -m json.tool
+curl -s -X POST "http://127.0.0.1:8000/repos/repo-scoping/sync" | python3 -m json.tool
 ```

 This runs the ADR-001 consistency check with `--fix` and returns a JSON report.
@@ -116,7 +116,7 @@ id: RREG-WP-NNNN
 type: workplan
 title: "..."
 domain: capabilities
-repo: repo-registry
+repo: repo-scoping
 status: active | done
 owner: codex
 topic_slug: foerster-capabilities
197
SCOPE.md
@@ -1,48 +1,175 @@
 ---
 domain: capabilities
-repo: repo-registry
-updated: "2026-04-26"
+repo: repo-scoping
+updated: "2026-04-30"
 ---

-# repo-registry — Scope
+# SCOPE

-## Purpose
+> This file helps you quickly understand what this repository is about,
+> when it is relevant, and when it is not.
+> It is curated from repo-scoping's approved characteristics and operating role.

-Repository Ability Registry. Turns Git repositories into reviewable, source-linked
-maps of `Ability → Capability → Feature → Evidence`.
+---

-## Core Design Principle
+## One-liner

-```
-deterministic scanners → observed facts (file paths, languages, API routes, …)
-LLM-assisted extractors → interpreted claims (ability names, descriptions, links)
-human / agent review → approved registry truth
+repo-scoping turns Git repositories into reviewable, source-linked scope maps and
+maintains SCOPE.md files as the human and agent entry point to repository utility.
+
+---
+
+## Core Idea
+
+repo-scoping models a repository as a hierarchy of characteristics:
+`Scope -> Ability -> Capability -> Feature -> Evidence -> Observed Fact`.
+
+Deterministic scanners establish observed facts from repository content. Optional
+LLM-assisted extraction proposes interpreted candidates. Humans or trusted agents
+approve the resulting characteristics before they become registry truth.
+
+The primary output is a useful repository scope profile: what a repo is for, when
+to use it, what capabilities it provides, and which facts or lower-level
+characteristics support those claims.
+
+---
+
+## In Scope
+
+- Register repositories and keep metadata, analysis runs, facts, candidates, and
+  approved characteristics together.
+- Analyze repositories with deterministic scanners and optional LLM-assisted
+  candidate extraction.
+- Review, edit, approve, reject, merge, and relink candidate abilities,
+  capabilities, features, and evidence.
+- Search and compare approved repository characteristics.
+- Generate, diff, validate, and write SCOPE.md files from approved
+  characteristics.
+- Support the Custodian State Hub by acting as the provider for scope generation
+  and update capabilities.
+
+---
+
+## Out of Scope
+
+- Owning the Custodian State Hub, its database, or cross-domain governance rules.
+- Making unreviewed truth claims canonical without approval.
+- Replacing human product judgment for curator-owned scope sections.
+- Continuous Git hosting automation, deployment infrastructure, or access-control
+  policy beyond repository ingestion needs.
+- Full static code understanding across every language and framework.
+
+---
+
+## Relevant When
+
+- You need to understand what a repository is useful for without reading the whole
+  codebase first.
+- You want a source-linked map from high-level repository scope down to observed
+  implementation facts.
+- You need to review generated candidate abilities, capabilities, features, and
+  evidence before approving them.
+- You need to create or refresh a SCOPE.md for a registered repository.
+- You need to compare repositories by approved characteristics or find capability
+  gaps across a domain.
+
+---
+
+## Not Relevant When
+
+- You only need raw Git hosting, CI, deployment, or issue tracking.
+- You need a fully autonomous ontology without human review.
+- The repository has not been registered or analyzed and no approved
+  characteristics exist yet.
+- The needed decision is curator-owned product positioning rather than
+  source-observable repository behavior.
+
+---
+
+## Current State
+
+- Status: active and evolving.
+- Implementation: FastAPI service with SQLite development storage, deterministic
+  Git scanning, candidate graph review workflow, search, comparison, and SCOPE.md
+  generation endpoints.
+- LLM assistance: optional; deterministic non-LLM behavior remains a first-class
+  path for continued optimization.
+- UI: available for repository registration, analysis runs, candidate review, and
+  characteristic navigation.
+- Integration: registered in the Custodian State Hub as `repo-scoping`.
+
+---
+
+## How It Fits
+
+- Upstream coordination: the Custodian State Hub owns workstream/task state,
+  managed repository records, and capability routing.
+- Downstream consumers: Custodian agents and humans use repo-scoping to inspect,
+  refine, and refresh repository utility profiles.
+- Often used with: `llm-connect` for optional LLM-assisted extraction and
+  `the-custodian` for state, routing, and domain coordination.
+
+---
+
+## Terminology
+
+- Preferred terms: scope, ability, capability, feature, evidence, observed fact,
+  characteristic, candidate, approved characteristic, SCOPE.md.
+- Also known as: repository scoping service, repository ability registry.
+- Potentially confusing terms: evidence is not just a raw fact; it is support for
+  a characteristic and may reference facts or lower-level characteristics.
+  Candidates are proposed claims awaiting review; approved characteristics are
+  canonical registry truth.
+
+---
+
+## Related / Overlapping Repositories
+
+- `the-custodian` - coordination layer, State Hub, workplans, and capability
+  catalog.
+- `llm-connect` - optional provider abstraction for LLM-assisted extraction.
+- `markitect` / `markitect-project` - content and documentation platform with
+  related scope-document needs.
+
+---
+
+## Getting Oriented
+
+- Start with: `README.md`, `AGENTS.md`, and this `SCOPE.md`.
+- Key files / directories: `src/repo_registry/web_api/app.py`,
+  `src/repo_registry/core/service.py`, `src/repo_registry/scope/`,
+  `src/repo_registry/candidate_graph/`, `src/repo_registry/repo_scanning/`,
+  `docs/scope-md-spec.md`, and `workplans/`.
+- Entry points: `uvicorn repo_registry.web_api.app:app --reload`, the `/ui`
+  routes, and the `/repos/{repo_slug}/scope*` API endpoints.
+
+---
+
+## Provided Capabilities
+
+```capability
+type: api
+title: scope.generate
+description: >
+  Generates a SCOPE.md from scratch for a registered repo using its approved
+  characteristics profile (abilities, capabilities, features, facts).
+keywords: [scope, scope-md, generation, repository-utility]
 ```

-Approved entries are always explicit, reviewable, and source-linked. The system
-never publishes unapproved claims as canonical truth.
+```capability
+type: api
+title: scope.update
+description: >
+  Diffs an existing SCOPE.md against the current characteristics profile
+  and returns or writes an updated version.
+keywords: [scope, scope-md, update, diff, staleness]
+```

-## In Scope (v0.1)
+---

-- Repository registration by Git URL
-- Deterministic repository scan (file tree, languages, frameworks, API/CLI surface)
-- Candidate extraction for abilities, capabilities, features, and evidence
-- Human review workflow: edit, approve, reject, merge, relink
-- Natural-language and semantic search over approved registry entries
-- REST API for repositories, ability maps, capabilities, and search
+## Notes

-## Out of Scope (v0.1)
-
-- Continuous GitHub App integration
-- Full static code understanding (AST/type analysis)
-- Advanced ontology enforcement
-- Distributed indexing
-- Benchmark execution
-- Marketplace or commercial features
-- Complex access control
-- Automated truth claims without review
-
-## Domain Context
-
-Part of the **capabilities** domain — systematic modeling of abilities, capabilities,
-and features across the Custodian ecosystem. First registered repo in this domain.
+- The local checkout path is still `/home/worsch/repo-registry`; the canonical
+  State Hub slug and Git remote are now `repo-scoping`.
+- Ecosystem-wide SCOPE.md refresh is blocked until Custodian C5b/C5c checks are
+  active and more managed repos have approved characteristics in repo-scoping.
292
docs/scope-md-spec.md
Normal file
@@ -0,0 +1,292 @@
# SCOPE.md Reference Specification

`SCOPE.md` is the human- and agent-facing boundary definition for a repository.
It answers, quickly and concretely, what the repository is for, when it is useful,
where it fits, and what capabilities it can provide.

Repo-registry is the source of truth for generating and validating `SCOPE.md`
because its approved characteristic model already captures the same structure:

```text
Scope -> Ability -> Capability -> Feature -> Evidence -> Observed Fact
```

This specification supersedes the Custodian dashboard reference at
`state-hub/dashboard/src/docs/scope.md`. The scaffold template remains at
`state-hub/scripts/project_rules/scope.template`; this document defines how
repo-registry should generate, validate, and update that file.

Related model docs:

- `docs/characteristic-evidence-model.md`
- `docs/classification-strategy.md`

## Purpose

`SCOPE.md` is not a README, architecture document, or marketing page. It is a
short orientation artifact for deciding whether a repo is relevant before reading
its code in depth.

It should answer:

- What is this repository for?
- Should I care about it right now?
- When is it relevant to my work?
- Where does it fit in the ecosystem?
- Is it mature enough to trust or reuse?
- Does it overlap with something else?
- What capabilities can it provide to other domains?

## Canonical Template

The historical Custodian reference calls this an "11-section template". The
current `scope.template` contains twelve functional sections plus an optional
`Notes` tail. Repo-registry should preserve the current template headings for
compatibility and treat `Notes` as curator-owned free text.

Generated files must contain these sections, in this order:

| Section | Source in repo-registry | Generation ownership |
|---------|--------------------------|----------------------|
| `## One-liner` | Scope name plus scope description | generated, curator-reviewed |
| `## Core Idea` | Scope description and top approved abilities | generated, curator-reviewed |
| `## In Scope` | Approved abilities and high-confidence capabilities | generated, curator-reviewed |
| `## Out of Scope` | Abilities or expectation gaps classified as exclusions | curator-owned unless explicitly modeled |
| `## Relevant When` | Approved features with `primary_class: business-usecase` or `attributes` including use-case labels | generated, curator-reviewed |
| `## Not Relevant When` | Negative use-case expectation gaps or curator exclusions | curator-owned unless explicitly modeled |
| `## Current State` | Observed facts aggregated by scanner: status, language, framework, tests, routes, docs, manifests | generated |
| `## How It Fits` | Evidence/support references to other characteristics or repos; dependency facts | generated, curator-reviewed |
| `## Terminology` | Domain term facts, names, aliases, and classification labels | generated, curator-reviewed |
| `## Related / Overlapping Repositories` | Cross-repo support references and comparison/discovery data | generated when known, curator-reviewed |
| `## Getting Oriented` | Source refs, content chunks, key files, entry points, docs, tests | generated |
| `## Provided Capabilities` | Approved capability characteristics rendered as machine-readable `capability` blocks | generated, file-origin truth |
| `## Notes` | Human-maintained remarks that do not fit the structured sections | curator-owned |

When a generated section has insufficient data, emit a short stub plus:

```markdown
<!-- needs curator input -->
```

This makes gaps visible without pretending the scanner knows more than it does.

## Section Mapping Details

### One-liner

Use the approved repository `Scope` as the root characteristic. Prefer a single
sentence from the scope description. If no curated sentence exists, use:

```text
<scope name> defines and maintains the repository scope for <repository name>.
```

### Core Idea

Summarize the root `Scope` and the most important approved `Ability` entries.
Use ability descriptions where available. Avoid listing every capability here;
the goal is orientation, not completeness.

### In Scope

Render approved abilities as top-level bullets. Include the most important
capabilities as nested wording inside the bullet, but avoid deep nesting in the
generated Markdown.

Suggested form:

```markdown
- <Ability name> — <ability description>. Includes <capability A>, <capability B>.
```

### Out of Scope

This section is primarily curator-owned. Repo-registry may seed it from
classification expectation gaps whose `expected_type` is one of:

- `classification-granularity`
- `classification-support`
- `out-of-scope`

Generated text must be conservative and marked for review unless there is an
approved negative/exclusion model in the future.

### Relevant When

Use approved features that represent real usage scenarios. Strong signals:

- `primary_class == "business-usecase"`
- `attributes` contains `usecase`, `workflow`, `review`, `generation`,
  `analysis`, `integration`, or another domain-specific use-case label

If no business-usecase features exist, seed from high-confidence abilities and
capabilities with a curator-input marker.

### Not Relevant When

This section is curator-owned unless explicit negative use-case facts or
expectation gaps exist. Do not infer broad exclusions from missing features.

### Current State

Aggregate observed facts. Good generated indicators include:

- Status: derive from repository status and analysis run state.
- Implementation: derive from source files, package manifests, tests, and route
  or CLI facts.
- Stability: conservative default `evolving` unless curated.
- Usage: conservative default `internal` or `unknown` unless facts indicate
  production usage.

Include compact bullets for detected languages, frameworks, tests, manifests,
docs, interfaces, provider facts, and scanner gaps.

### How It Fits

Use support/evidence relationships and source refs:

- Upstream dependencies: package, service, provider, and integration facts.
- Downstream consumers: cross-repo support references when available.
- Often used with: related repo links and common provider/framework facts.

Evidence is support for a characteristic, not the same thing as a fact. Prefer
evidence links that point downward in abstraction, as described in
`docs/characteristic-evidence-model.md`.

### Terminology

Generate from:

- scope, ability, capability, and feature names
- `primary_class` and `attributes`
- scanner facts for providers, frameworks, commands, APIs, and domain terms
- aliases or expectation gaps when present

Mark ambiguous or overlapping terms for curator review.

### Related / Overlapping Repositories

Generate only when there is cross-repo evidence, comparison data, or explicit
curator input. Do not invent related repositories from name similarity alone.

### Getting Oriented

Use source references and observed facts to name good entry points:

- Start with: README, docs, API route files, CLI files, core service modules
- Key files / directories: source paths with high fact/support density
- Entry points: API routes, CLI commands, package manifests, tests

### Provided Capabilities

Render approved `Capability` characteristics as fenced `capability` blocks. This
section is parsed by the Custodian capability catalog and remains file-origin
truth under ADR-001.

Block format:

````markdown
```capability
type: api
title: scope.generate
description: >
  Generates a SCOPE.md from approved repository characteristics.
keywords: [scope, scope-md, generation]
```
````

Fields:

| Field | Required | Source |
|-------|----------|--------|
| `type` | yes | capability `primary_class`, normalized to catalog categories |
| `title` | yes | capability name or curated capability key |
| `description` | no | capability description |
| `keywords` | no | capability attributes plus relevant feature classes |

Allowed catalog categories remain compatible with the existing Custodian ingest:

- `infrastructure`
- `api`
- `data`
- `security`
- `documentation`
- `other`

If a capability's `primary_class` is not one of these categories, map it to
`api`, `data`, `documentation`, or `other` conservatively and preserve the
original class as a keyword.
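
The field rules and category fallback can be sketched roughly as follows. This is a minimal illustration, not the repo-registry implementation: `ALLOWED_CATEGORIES` mirrors the list in this section, while `FALLBACK_HINTS`, `normalize_category`, and `render_capability_block` are hypothetical names introduced only for this example.

```python
ALLOWED_CATEGORIES = {"infrastructure", "api", "data", "security", "documentation", "other"}

# Hypothetical hints for classes outside the catalog; unknown classes fall back to "other".
FALLBACK_HINTS = {"service": "api", "storage": "data", "docs": "documentation"}

FENCE = "`" * 3  # built at runtime so this example does not nest literal fences


def normalize_category(primary_class):
    """Return (catalog category, keywords that preserve the original class)."""
    cls = (primary_class or "").strip().lower()
    if cls in ALLOWED_CATEGORIES:
        return cls, []
    return FALLBACK_HINTS.get(cls, "other"), ([cls] if cls else [])


def render_capability_block(title, primary_class, description="", keywords=()):
    """Render one capability block; `type` and `title` are always emitted."""
    category, preserved = normalize_category(primary_class)
    merged = list(dict.fromkeys([*keywords, *preserved]))  # de-duplicate, keep order
    lines = [FENCE + "capability", f"type: {category}", f"title: {title}"]
    if description:
        lines += ["description: >", f"  {description}"]
    if merged:
        lines.append(f"keywords: [{', '.join(merged)}]")
    lines.append(FENCE)
    return "\n".join(lines)
```

Keeping the original class as a keyword means a later, richer catalog could recover it without re-running classification.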

### Notes

`Notes` is optional and curator-owned. Generators should preserve existing notes
when updating a file and should not overwrite this section unless explicitly
requested.

## Generation Ownership

Repo-registry-generated sections:

- One-liner
- Core Idea
- In Scope
- Relevant When
- Current State
- How It Fits
- Terminology
- Related / Overlapping Repositories
- Getting Oriented
- Provided Capabilities

Curator-owned or curator-reviewed sections:

- Out of Scope
- Not Relevant When
- Notes
- Any generated section containing `<!-- needs curator input -->`

The generator may write stubs for curator-owned sections, but the updater must
preserve existing curator text unless the caller explicitly asks for a full
rewrite.
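
One way to read the preservation rule is sketched below. It assumes sections are held as a heading-to-body mapping; `merge_sections` and its `full_rewrite` flag are hypothetical names for illustration and are not part of the code in this commit.

```python
CURATOR_OWNED = {"Out of Scope", "Not Relevant When", "Notes"}
NEEDS_INPUT = "<!-- needs curator input -->"


def merge_sections(existing, generated, full_rewrite=False):
    """Combine freshly generated sections with existing curator text.

    Curator-owned sections keep their existing body unless that body is
    empty, is only the stub marker, or the caller asks for a full rewrite.
    """
    merged = {}
    for heading, new_body in generated.items():
        old_body = existing.get(heading, "").strip()
        keep_old = (
            not full_rewrite
            and heading in CURATOR_OWNED
            and bool(old_body)
            and old_body != NEEDS_INPUT
        )
        merged[heading] = old_body if keep_old else new_body
    return merged
```

A stub-only section is treated as not yet curated, so the generator is free to refresh it.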

## Validation Rules

The validator should mirror the Custodian DOI C5 checks:

- C5a: `SCOPE.md` exists at the repository root.
- C5b: required headings are present in canonical order.
- C5c: `## Provided Capabilities` contains parseable `capability` blocks, or an
  explicit empty-state note when the repo provides no routable capabilities.

Additional repo-registry validation:

- Generated sections with missing data must include `<!-- needs curator input -->`.
- Capability blocks must parse as key/value metadata.
- Capability block titles should be stable enough for routing.
- Curator-owned sections should be preserved by diff/update flows.
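
The C5b and C5c rules above might be checked along these lines. This is a sketch under assumptions: it treats the twelve functional sections as required and `Notes` as optional, it does not skip headings that appear inside fenced examples, and the function names are hypothetical rather than the actual `ScopeValidator` API.

```python
import re

REQUIRED_HEADINGS = [
    "One-liner", "Core Idea", "In Scope", "Out of Scope", "Relevant When",
    "Not Relevant When", "Current State", "How It Fits", "Terminology",
    "Related / Overlapping Repositories", "Getting Oriented", "Provided Capabilities",
]
FENCE = "`" * 3  # built at runtime so this example does not nest literal fences
CAPABILITY_BLOCK = re.compile(FENCE + r"capability\n(.*?)\n" + FENCE, re.DOTALL)


def check_headings(text):
    """C5b-style check: report missing or out-of-order required headings."""
    found = re.findall(r"^## (.+?)\s*$", text, flags=re.MULTILINE)
    problems = [f"missing heading: {h}" for h in REQUIRED_HEADINGS if h not in found]
    present = [h for h in found if h in REQUIRED_HEADINGS]
    expected = [h for h in REQUIRED_HEADINGS if h in present]
    if present != expected:
        problems.append("headings are not in canonical order")
    return problems


def check_capability_blocks(text):
    """C5c-style check: each capability block needs top-level `type` and `title`."""
    problems = []
    for index, body in enumerate(CAPABILITY_BLOCK.findall(text), start=1):
        keys = {
            line.split(":", 1)[0].strip()
            for line in body.splitlines()
            if ":" in line and not line.startswith(" ")
        }
        for required in ("type", "title"):
            if required not in keys:
                problems.append(f"capability block {index}: missing {required}")
    return problems
```

A file with no capability blocks yields no C5c problems here; checking for the explicit empty-state note would need a further rule.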

## Update Semantics

The validator/differ compares the existing file to freshly generated content by
section. A section is:

- `ok` when normalized existing text matches generated content.
- `stale` when the section exists but differs materially.
- `missing` when the heading is absent.

Normalization should ignore repeated whitespace and harmless Markdown wrapping,
but must not ignore changed capability block metadata.

Generated updates should be section-aware. Do not rewrite the whole file when a
smaller section update is enough.
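
The three section states can be illustrated with a small comparison routine. This sketch assumes whitespace collapsing is a sufficient normalization, which is a simplification of "harmless Markdown wrapping"; `split_sections`, `normalize`, and `section_status` are hypothetical names.

```python
import re


def split_sections(text):
    """Map each `## Heading` to the body text that follows it."""
    parts = re.split(r"^## (.+?)\s*$", text, flags=re.MULTILINE)
    # parts is [preamble, heading1, body1, heading2, body2, ...]
    return {parts[i]: parts[i + 1] for i in range(1, len(parts) - 1, 2)}


def normalize(body):
    """Collapse whitespace runs so re-wrapped lines compare equal."""
    return " ".join(body.split())


def section_status(existing, generated):
    """Classify each generated section as ok, stale, or missing."""
    old, new = split_sections(existing), split_sections(generated)
    status = {}
    for heading, body in new.items():
        if heading not in old:
            status[heading] = "missing"
        elif normalize(old[heading]) == normalize(body):
            status[heading] = "ok"
        else:
            status[heading] = "stale"
    return status
```

Because only whitespace is collapsed, any change to a capability block's keys or values still marks that section `stale`.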

## Agent Guidance

Agents should treat `SCOPE.md` as a decision aid:

- Read it before deep code exploration.
- Prefer it over README for scope boundaries.
- Use `AGENTS.md` for operating instructions and repo-specific workflow.
- Use generated diffs to spot stale scope claims.
- Record expectation gaps when generated scope, classes, or capabilities do not
  match human judgment.
4
src/repo_registry/scope/__init__.py
Normal file
@@ -0,0 +1,4 @@
from repo_registry.scope.generator import ScopeGenerator
from repo_registry.scope.validator import ScopeValidator

__all__ = ["ScopeGenerator", "ScopeValidator"]
323
src/repo_registry/scope/generator.py
Normal file
@@ -0,0 +1,323 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from dataclasses import asdict
|
||||
|
||||
from repo_registry.core.service import RegistryService
|
||||
from repo_registry.storage.sqlite import NotFoundError
|
||||
|
||||
|
||||
SCOPE_SECTIONS = [
|
||||
"One-liner",
|
||||
"Core Idea",
|
||||
"In Scope",
|
||||
"Out of Scope",
|
||||
"Relevant When",
|
||||
"Not Relevant When",
|
||||
"Current State",
|
||||
"How It Fits",
|
||||
"Terminology",
|
||||
"Related / Overlapping Repositories",
|
||||
"Getting Oriented",
|
||||
"Provided Capabilities",
|
||||
"Notes",
|
||||
]
|
||||
|
||||
|
||||
NEEDS_INPUT = "<!-- needs curator input -->"
|
||||
|
||||
|
||||
class ScopeGenerator:
|
||||
"""Render SCOPE.md from approved repository characteristics."""
|
||||
|
||||
def __init__(self, service: RegistryService) -> None:
|
||||
self.service = service
|
||||
|
||||
def generate(self, repo_slug: str) -> str:
|
||||
repository = self._repository_by_slug(repo_slug)
|
||||
ability_map = asdict(self.service.ability_map(repository.id))
|
||||
facts = [asdict(fact) for fact in self.service.list_observed_facts(repository.id)]
|
||||
sections = {
|
||||
"One-liner": self._one_liner(ability_map),
|
||||
"Core Idea": self._core_idea(ability_map),
|
||||
"In Scope": self._in_scope(ability_map),
|
||||
"Out of Scope": self._curator_stub(),
|
||||
"Relevant When": self._relevant_when(ability_map),
|
||||
"Not Relevant When": self._curator_stub(),
|
||||
"Current State": self._current_state(repository.status, facts),
|
||||
"How It Fits": self._how_it_fits(ability_map),
|
||||
"Terminology": self._terminology(ability_map, facts),
|
||||
"Related / Overlapping Repositories": self._curator_stub(),
|
||||
"Getting Oriented": self._getting_oriented(ability_map, facts),
|
||||
"Provided Capabilities": self._provided_capabilities(ability_map),
|
||||
"Notes": self._curator_stub(),
|
||||
}
|
||||
lines = [
|
||||
"# SCOPE",
|
||||
"",
|
||||
"> This file helps you quickly understand what this repository is about,",
|
||||
"> when it is relevant, and when it is not.",
|
||||
"> It was generated from approved repo-registry characteristics.",
|
||||
"",
|
||||
"---",
|
||||
"",
|
||||
]
|
||||
for section in SCOPE_SECTIONS:
|
||||
lines.extend([f"## {section}", "", sections[section].rstrip(), "", "---", ""])
|
||||
return "\n".join(lines).rstrip() + "\n"
|
||||
|
||||
def _repository_by_slug(self, repo_slug: str):
|
||||
wanted = self._slug(repo_slug)
|
||||
for repository in self.service.list_repositories():
|
||||
candidates = {
|
||||
self._slug(repository.name),
|
||||
self._slug(repository.url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")),
|
||||
}
|
||||
if wanted in candidates:
|
||||
return repository
|
||||
raise NotFoundError(f"repository slug {repo_slug!r} was not found")
|
||||
|
||||
def _one_liner(self, ability_map: dict) -> str:
|
||||
scope = ability_map["scope"]
|
||||
description = self._sentence(scope.get("description", ""))
|
||||
if description:
|
||||
return description
|
||||
return f"{scope['name']} defines the repository scope for {ability_map['repository']['name']}."
|
||||
|
||||
def _core_idea(self, ability_map: dict) -> str:
|
||||
scope = ability_map["scope"]
|
||||
abilities = ability_map.get("abilities", [])
|
||||
lines = [scope.get("description") or self._one_liner(ability_map)]
|
||||
if abilities:
|
||||
lines.append("")
|
||||
lines.append("Approved abilities:")
|
||||
lines.extend(
|
||||
f"- {ability['name']} — {ability.get('description') or 'Approved repository ability.'}"
|
||||
for ability in abilities[:5]
|
||||
)
|
||||
else:
|
||||
lines.extend(["", NEEDS_INPUT])
|
||||
return "\n".join(lines)
|
||||
|
||||
def _in_scope(self, ability_map: dict) -> str:
|
||||
abilities = ability_map.get("abilities", [])
|
||||
if not abilities:
|
||||
return self._curator_stub()
|
||||
lines = []
|
||||
for ability in abilities:
|
||||
capabilities = ", ".join(
|
||||
capability["name"] for capability in ability.get("capabilities", [])[:4]
|
||||
)
|
||||
suffix = f" Includes {capabilities}." if capabilities else ""
|
||||
lines.append(
|
||||
f"- {ability['name']} — {ability.get('description') or 'Approved ability.'}{suffix}"
|
||||
)
|
||||
return "\n".join(lines)
|
||||
|
||||
def _relevant_when(self, ability_map: dict) -> str:
|
||||
features = [
|
||||
feature
|
||||
for feature in self._features(ability_map)
|
||||
if self._is_usecase_feature(feature)
|
||||
]
|
||||
if not features:
|
||||
features = self._features(ability_map)[:5]
|
||||
if not features:
|
||||
return self._curator_stub()
|
||||
lines = [
|
||||
f"- You need {feature['name']} ({feature.get('primary_class') or feature.get('type', 'feature')})."
|
||||
for feature in features
|
||||
]
|
||||
if not any(self._is_usecase_feature(feature) for feature in features):
|
||||
lines.append(NEEDS_INPUT)
|
||||
return "\n".join(lines)
|
||||
|
||||
def _current_state(self, status: str, facts: list[dict]) -> str:
|
||||
kinds = self._facts_by_kind(facts)
|
||||
languages = self._fact_names(kinds.get("language", []))
|
||||
frameworks = self._fact_names(kinds.get("framework", []))
|
||||
tests = kinds.get("test", [])
|
||||
interfaces = kinds.get("interface", [])
|
||||
manifests = kinds.get("manifest", [])
|
||||
implementation = "substantial" if interfaces or manifests else "partial"
|
||||
if not facts:
|
||||
implementation = "unknown"
|
||||
lines = [
|
||||
f"- Status: {status}",
|
||||
f"- Implementation: {implementation}",
|
||||
"- Stability: evolving",
|
||||
"- Usage: internal",
|
||||
f"- Languages: {', '.join(languages) if languages else 'unknown'}",
|
||||
f"- Frameworks: {', '.join(frameworks) if frameworks else 'none detected'}",
|
||||
f"- Tests observed: {len(tests)}",
|
||||
f"- Interfaces observed: {len(interfaces)}",
|
||||
f"- Manifests observed: {len(manifests)}",
|
||||
]
|
||||
if not facts:
|
||||
lines.append(NEEDS_INPUT)
|
||||
return "\n".join(lines)
|
||||
|
||||
def _how_it_fits(self, ability_map: dict) -> str:
|
||||
evidence = [
|
||||
item
|
||||
for capability in self._capabilities(ability_map)
|
||||
for item in capability.get("evidence", [])
|
||||
]
|
||||
if not evidence:
|
||||
return "\n".join(
|
||||
[
|
||||
"- Upstream dependencies: " + NEEDS_INPUT,
|
||||
"- Downstream consumers: " + NEEDS_INPUT,
|
||||
"- Often used with: " + NEEDS_INPUT,
|
||||
]
|
||||
)
|
||||
refs = ", ".join(
|
||||
sorted({item.get("reference", "") for item in evidence if item.get("reference")})[:8]
|
||||
)
|
||||
return "\n".join(
|
||||
[
|
||||
f"- Supported by evidence references: {refs or 'available evidence'}",
|
||||
"- Upstream dependencies: " + NEEDS_INPUT,
|
||||
"- Downstream consumers: " + NEEDS_INPUT,
|
||||
"- Often used with: " + NEEDS_INPUT,
|
||||
]
|
||||
)
|
||||
|
||||
    def _terminology(self, ability_map: dict, facts: list[dict]) -> str:
        terms = set()
        for item in [ability_map["scope"], *ability_map.get("abilities", [])]:
            terms.add(item.get("name", ""))
            terms.add(item.get("primary_class", ""))
            terms.update(item.get("attributes", []))
        for capability in self._capabilities(ability_map):
            terms.add(capability.get("name", ""))
            terms.add(capability.get("primary_class", ""))
            terms.update(capability.get("attributes", []))
        for fact in facts:
            if fact.get("kind") in {"framework", "llm_provider", "provider_registry"}:
                terms.add(fact.get("name", ""))
        visible = [term for term in sorted(terms) if term]
        if not visible:
            return self._curator_stub()
        return "\n".join(
            [
                "- Preferred terms: " + ", ".join(visible[:12]),
                "- Also known as: " + NEEDS_INPUT,
                "- Potentially confusing terms: " + NEEDS_INPUT,
            ]
        )

    def _getting_oriented(self, ability_map: dict, facts: list[dict]) -> str:
        paths = self._source_paths(ability_map, facts)
        if not paths:
            return self._curator_stub()
        return "\n".join(
            [
                f"- Start with: {paths[0]}",
                f"- Key files / directories: {', '.join(paths[:8])}",
                f"- Entry points: {', '.join(paths[:5])}",
            ]
        )

    def _provided_capabilities(self, ability_map: dict) -> str:
        capabilities = self._capabilities(ability_map)
        if not capabilities:
            return f"<!-- No approved capabilities yet. -->\n{NEEDS_INPUT}"
        blocks = []
        for capability in capabilities:
            keywords = self._keywords_for_capability(capability)
            blocks.append(
                "\n".join(
                    [
                        "```capability",
                        f"type: {self._capability_type(capability.get('primary_class', 'other'))}",
                        f"title: {capability['name']}",
                        "description: >",
                        f"  {capability.get('description') or 'Approved repository capability.'}",
                        f"keywords: [{', '.join(keywords)}]",
                        "```",
                    ]
                )
            )
        return "\n\n".join(blocks)

    def _capabilities(self, ability_map: dict) -> list[dict]:
        return [
            capability
            for ability in ability_map.get("abilities", [])
            for capability in ability.get("capabilities", [])
        ]

    def _features(self, ability_map: dict) -> list[dict]:
        return [
            feature
            for capability in self._capabilities(ability_map)
            for feature in capability.get("features", [])
        ]

    def _is_usecase_feature(self, feature: dict) -> bool:
        labels = {str(feature.get("primary_class", "")).lower()}
        labels.update(str(item).lower() for item in feature.get("attributes", []))
        return bool(labels & {"business-usecase", "usecase", "workflow", "review"})

    def _keywords_for_capability(self, capability: dict) -> list[str]:
        keywords = [capability.get("primary_class", "")]
        keywords.extend(capability.get("attributes", []))
        for feature in capability.get("features", []):
            keywords.append(feature.get("primary_class", ""))
            keywords.extend(feature.get("attributes", []))
        return [self._keyword(item) for item in self._unique(keywords)[:8] if item]

    def _capability_type(self, primary_class: str) -> str:
        normalized = primary_class.lower()
        if normalized in {"api", "infrastructure", "data", "security", "documentation"}:
            return normalized
        if normalized in {"interface", "integration", "llm-integration"}:
            return "api"
        if normalized in {"storage", "repository-structure"}:
            return "data"
        return "other"

    def _facts_by_kind(self, facts: list[dict]) -> dict[str, list[dict]]:
        grouped: dict[str, list[dict]] = {}
        for fact in facts:
            grouped.setdefault(fact.get("kind", ""), []).append(fact)
        return grouped

    def _fact_names(self, facts: list[dict]) -> list[str]:
        return self._unique([fact.get("name", "") for fact in facts])

    def _source_paths(self, ability_map: dict, facts: list[dict]) -> list[str]:
        paths = [fact.get("path", "") for fact in facts if fact.get("path")]
        for feature in self._features(ability_map):
            paths.append(feature.get("location", ""))
            for source_ref in feature.get("source_refs", []):
                paths.append(source_ref.get("path", ""))
        return self._unique(paths)

    def _curator_stub(self) -> str:
        return f"- {NEEDS_INPUT}"

    def _sentence(self, text: str) -> str:
        cleaned = re.sub(r"\s+", " ", text.strip())
        if not cleaned:
            return ""
        return re.split(r"(?<=[.!?])\s+", cleaned, maxsplit=1)[0]

    def _slug(self, value: str) -> str:
        return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")

    def _keyword(self, value: str) -> str:
        return self._slug(value) or "other"

    def _unique(self, values: list[str]) -> list[str]:
        result: list[str] = []
        seen: set[str] = set()
        for value in values:
            item = str(value).strip()
            key = item.lower()
            if not item or key in seen:
                continue
            seen.add(key)
            result.append(item)
        return result
|
||||
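The keyword line in each capability block comes from the helpers at the end of the generator: labels are gathered from the capability and its features, de-duplicated case-insensitively in first-seen order, capped at eight, and slugged. The sketch below restates that flow as standalone functions so it can be run without the registry; the function names and the `example` record are illustrative assumptions, not part of the commit.

```python
import re


def slug(value: str) -> str:
    """Lowercase and collapse each run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def unique(values: list[str]) -> list[str]:
    """Drop blanks and case-insensitive repeats, keeping first-seen order."""
    result: list[str] = []
    seen: set[str] = set()
    for value in values:
        item = str(value).strip()
        if not item or item.lower() in seen:
            continue
        seen.add(item.lower())
        result.append(item)
    return result


def keywords_for(capability: dict, limit: int = 8) -> list[str]:
    """Collect class and attribute labels from a capability and its features."""
    labels = [capability.get("primary_class", "")]
    labels.extend(capability.get("attributes", []))
    for feature in capability.get("features", []):
        labels.append(feature.get("primary_class", ""))
        labels.extend(feature.get("attributes", []))
    return [slug(label) or "other" for label in unique(labels)[:limit]]


# Hypothetical record shaped like the capability dicts the generator reads.
example = {
    "primary_class": "api",
    "attributes": ["scope", "generation"],
    "features": [
        {"primary_class": "business-usecase", "attributes": ["scope", "preview"]}
    ],
}
keywords = keywords_for(example)
print(keywords)
```

With this input the repeated `scope` attribute is emitted once, which is why the generator test expects five keywords rather than six.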
184
src/repo_registry/scope/validator.py
Normal file
@@ -0,0 +1,184 @@
|
||||
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path

from repo_registry.scope.generator import SCOPE_SECTIONS, ScopeGenerator


@dataclass(frozen=True)
class ScopeDiffSection:
    section: str
    status: str
    current_text: str | None
    proposed_text: str | None


@dataclass(frozen=True)
class ScopeDiff:
    sections: list[ScopeDiffSection]

    @property
    def needs_update(self) -> bool:
        return any(section.status != "ok" for section in self.sections)


@dataclass(frozen=True)
class ScopeValidationIssue:
    check: str
    severity: str
    message: str


@dataclass(frozen=True)
class ValidationResult:
    issues: list[ScopeValidationIssue]

    @property
    def ok(self) -> bool:
        return not any(issue.severity == "error" for issue in self.issues)


class ScopeValidator:
    """Validate and diff SCOPE.md files."""

    def __init__(self, generator: ScopeGenerator | None = None) -> None:
        self.generator = generator

    def diff(self, repo_slug: str, existing_path: Path) -> ScopeDiff:
        if self.generator is None:
            raise ValueError("ScopeValidator.diff requires a ScopeGenerator")
        current = existing_path.read_text(encoding="utf-8") if existing_path.exists() else ""
        proposed = self.generator.generate(repo_slug)
        current_sections = self._parse_sections(current)
        proposed_sections = self._parse_sections(proposed)
        sections: list[ScopeDiffSection] = []
        for section in SCOPE_SECTIONS:
            current_text = current_sections.get(section)
            proposed_text = proposed_sections.get(section, "")
            if current_text is None:
                status = "missing"
            elif self._normalize(current_text) == self._normalize(proposed_text):
                status = "ok"
            else:
                status = "stale"
            sections.append(
                ScopeDiffSection(
                    section=section,
                    status=status,
                    current_text=current_text,
                    proposed_text=proposed_text,
                )
            )
        return ScopeDiff(sections=sections)

    def validate(self, path: Path) -> ValidationResult:
        issues: list[ScopeValidationIssue] = []
        if not path.exists():
            return ValidationResult(
                issues=[
                    ScopeValidationIssue(
                        check="C5a",
                        severity="error",
                        message="SCOPE.md is missing.",
                    )
                ]
            )
        content = path.read_text(encoding="utf-8")
        sections = self._parse_sections(content)
        missing = [section for section in SCOPE_SECTIONS if section not in sections]
        if missing:
            severity = "warn" if missing == ["Provided Capabilities"] else "error"
            issues.append(
                ScopeValidationIssue(
                    check="C5b",
                    severity=severity,
                    message=f"Missing SCOPE.md section(s): {', '.join(missing)}.",
                )
            )
        ordered = self._heading_order(content)
        expected_order = [section for section in SCOPE_SECTIONS if section in sections]
        if ordered[: len(expected_order)] != expected_order:
            issues.append(
                ScopeValidationIssue(
                    check="C5b",
                    severity="warn",
                    message="SCOPE.md sections are not in canonical order.",
                )
            )
        capabilities = sections.get("Provided Capabilities")
        if capabilities is None:
            issues.append(
                ScopeValidationIssue(
                    check="C5c",
                    severity="warn",
                    message="Provided Capabilities section is missing.",
                )
            )
        elif "```capability" in capabilities:
            for index, block in enumerate(self._capability_blocks(capabilities), start=1):
                keys = self._capability_keys(block)
                missing_keys = {"type", "title"} - keys
                if missing_keys:
                    issues.append(
                        ScopeValidationIssue(
                            check="C5c",
                            severity="warn",
                            message=(
                                f"Capability block {index} is missing required field(s): "
                                f"{', '.join(sorted(missing_keys))}."
                            ),
                        )
                    )
        elif "No approved capabilities yet" not in capabilities:
            issues.append(
                ScopeValidationIssue(
                    check="C5c",
                    severity="warn",
                    message=(
                        "Provided Capabilities has no capability blocks or explicit "
                        "empty-state note."
                    ),
                )
            )
        return ValidationResult(issues=issues)

    def _parse_sections(self, content: str) -> dict[str, str]:
        matches = list(re.finditer(r"^##\s+(.+?)\s*$", content, re.MULTILINE))
        sections: dict[str, str] = {}
        for index, match in enumerate(matches):
            title = match.group(1).strip()
            start = match.end()
            end = matches[index + 1].start() if index + 1 < len(matches) else len(content)
            body = content[start:end]
            body = re.sub(r"\n---\s*$", "", body.strip())
            sections[title] = body.strip()
        return sections

    def _heading_order(self, content: str) -> list[str]:
        return [
            match.group(1).strip()
            for match in re.finditer(r"^##\s+(.+?)\s*$", content, re.MULTILINE)
            if match.group(1).strip() in SCOPE_SECTIONS
        ]

    def _normalize(self, value: str | None) -> str:
        if value is None:
            return ""
        without_comments = re.sub(r"<!--.*?-->", "", value, flags=re.DOTALL)
        without_markdown = re.sub(r"[`*_>#-]+", " ", without_comments)
        return re.sub(r"\s+", " ", without_markdown).strip().lower()

    def _capability_blocks(self, content: str) -> list[str]:
        return re.findall(
            r"```capability\s*(.*?)```",
            content,
            flags=re.DOTALL | re.IGNORECASE,
        )

    def _capability_keys(self, block: str) -> set[str]:
        return {
            match.group(1)
            for match in re.finditer(r"^([A-Za-z_][A-Za-z0-9_-]*):", block, re.MULTILINE)
        }
|
||||
@@ -1,8 +1,11 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import json
|
||||
from dataclasses import asdict
|
||||
from pathlib import Path
|
||||
from urllib.error import HTTPError, URLError
|
||||
from urllib.request import urlopen
|
||||
|
||||
from fastapi import Depends, FastAPI, HTTPException, Query
|
||||
from fastapi.responses import PlainTextResponse
|
||||
@@ -13,6 +16,7 @@ from repo_registry.core.service import RegistryService
|
||||
from repo_registry.llm_extraction import LLMCandidateExtractor, create_llm_connect_adapter
|
||||
from repo_registry.repo_ingestion.git import GitIngestionService
|
||||
from repo_registry.semantic import HashingEmbeddingProvider
|
||||
from repo_registry.scope import ScopeGenerator, ScopeValidator
|
||||
from repo_registry.storage.sqlite import NotFoundError, RegistryStore
|
||||
from repo_registry.web_api.schemas import (
|
||||
AbilityCreate,
|
||||
@@ -58,6 +62,12 @@ from repo_registry.web_api.schemas import (
|
||||
)
|
||||
|
||||
|
||||
def slugify(value: str) -> str:
|
||||
import re
|
||||
|
||||
return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
|
||||
|
||||
|
||||
class Settings(BaseSettings):
|
||||
model_config = SettingsConfigDict(env_prefix="REPO_REGISTRY_")
|
||||
|
||||
@@ -67,6 +77,7 @@ class Settings(BaseSettings):
|
||||
llm_provider: str | None = Field(default=None)
|
||||
llm_model: str | None = Field(default=None)
|
||||
embedding_provider: str | None = Field(default=None)
|
||||
state_hub_base_url: str = Field(default="http://127.0.0.1:8000")
|
||||
log_level: str = Field(default="INFO")
|
||||
|
||||
|
||||
@@ -111,6 +122,7 @@ OPENAPI_TAGS = [
|
||||
{"name": "analysis", "description": "Repository scans and extracted review inputs."},
|
||||
{"name": "review", "description": "Candidate graph approval and correction workflow."},
|
||||
{"name": "registry", "description": "Approved ability maps and manual registry CRUD."},
|
||||
{"name": "scope", "description": "SCOPE.md generation, diffing, and writing."},
|
||||
{"name": "search", "description": "Agent-facing discovery endpoints."},
|
||||
{"name": "discovery", "description": "Comparison, gap analysis, and export helpers."},
|
||||
]
|
||||
@@ -1120,6 +1132,144 @@ def export_repository_registry_entry(
|
||||
return PlainTextResponse(content, media_type="application/x-yaml")
|
||||
|
||||
|
||||
@app.get(
    "/repos/{repo_slug}/scope",
    tags=["scope"],
    response_class=PlainTextResponse,
    responses={
        200: {
            "content": {"text/markdown": {}},
            "description": "Generated SCOPE.md preview from approved characteristics.",
        }
    },
)
def generate_repository_scope(
    repo_slug: str,
    service: RegistryService = Depends(get_service),
) -> PlainTextResponse:
    try:
        ensure_scope_generation_ready(service, repo_slug)
        content = ScopeGenerator(service).generate(repo_slug)
    except NotFoundError as exc:
        raise HTTPException(status_code=404, detail=str(exc)) from exc
    return PlainTextResponse(content, media_type="text/markdown")


@app.get(
    "/repos/{repo_slug}/scope/diff",
    tags=["scope"],
)
def diff_repository_scope(
    repo_slug: str,
    service: RegistryService = Depends(get_service),
    settings: Settings = Depends(get_settings),
) -> dict[str, object]:
    try:
        repository = ensure_scope_generation_ready(service, repo_slug)
        scope_path = scope_file_path(service, repository, repo_slug, settings)
        diff = ScopeValidator(ScopeGenerator(service)).diff(repo_slug, scope_path)
    except NotFoundError as exc:
        raise HTTPException(status_code=404, detail=str(exc)) from exc
    except ValueError as exc:
        raise HTTPException(status_code=409, detail=str(exc)) from exc
    return {
        "sections": [asdict(section) for section in diff.sections],
        "needs_update": diff.needs_update,
    }


@app.post(
    "/repos/{repo_slug}/scope/write",
    tags=["scope"],
)
def write_repository_scope(
    repo_slug: str,
    service: RegistryService = Depends(get_service),
    settings: Settings = Depends(get_settings),
) -> dict[str, object]:
    try:
        repository = ensure_scope_generation_ready(service, repo_slug)
        scope_path = scope_file_path(service, repository, repo_slug, settings)
        content = ScopeGenerator(service).generate(repo_slug)
    except NotFoundError as exc:
        raise HTTPException(status_code=404, detail=str(exc)) from exc
    except ValueError as exc:
        raise HTTPException(status_code=409, detail=str(exc)) from exc
    scope_path.write_text(content, encoding="utf-8")
    return {"written": True, "path": str(scope_path)}


def ensure_scope_generation_ready(
    service: RegistryService,
    repo_slug: str,
):
    repository = repository_by_slug(service, repo_slug)
    ability_map = service.ability_map(repository.id)
    if not ability_map.abilities:
        raise NotFoundError(
            f"repository {repo_slug!r} has no approved characteristics"
        )
    return repository


def repository_by_slug(service: RegistryService, repo_slug: str):
    wanted = slugify(repo_slug)
    for repository in service.list_repositories():
        candidates = {
            slugify(repository.name),
            slugify(repository.url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")),
        }
        if wanted in candidates:
            return repository
    raise NotFoundError(f"repository slug {repo_slug!r} was not found")


def scope_file_path(
    service: RegistryService,
    repository,
    repo_slug: str,
    settings: Settings,
) -> Path:
    state_hub_path = state_hub_scope_file_path(repo_slug, settings)
    if state_hub_path is not None:
        return state_hub_path
    source_path = Path(repository.url)
    if source_path.exists() and source_path.is_dir():
        return source_path / "SCOPE.md"
    checkout = service.ingestion.cached_checkout(repository.url)
    if checkout is not None and checkout.source_path.exists():
        return checkout.source_path / "SCOPE.md"
    raise ValueError(
        "repository has no known local checkout path on this host"
    )


def state_hub_scope_file_path(repo_slug: str, settings: Settings) -> Path | None:
    base_url = settings.state_hub_base_url.rstrip("/")
    if not base_url:
        return None
    try:
        with urlopen(f"{base_url}/repos/{repo_slug}/", timeout=2) as response:
            repo = json.loads(response.read().decode("utf-8"))
    except HTTPError as exc:
        if exc.code == 404:
            return None
        raise ValueError("state hub repository path lookup failed") from exc
    except (URLError, TimeoutError, OSError, json.JSONDecodeError):
        return None
    local_path = repo.get("local_path")
    if not local_path:
        raise ValueError(
            f"state hub repo {repo_slug!r} has no local path on this host"
        )
    path = Path(local_path)
    if path.exists() and path.is_dir():
        return path / "SCOPE.md"
    raise ValueError(
        f"state hub local path for repo {repo_slug!r} is not available: {path}"
    )


@app.get(
|
||||
"/repository-comparisons",
|
||||
tags=["discovery"],
|
||||
|
||||
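The scope endpoints address repositories by slug rather than by ID: a repository matches when the requested slug equals either the slugged display name or the slugged last path segment of its URL (with any `.git` suffix removed). Here is a small sketch of that matching rule, assuming plain dicts in place of the service's repository objects; `candidate_slugs`, `find_by_slug`, and the sample records are hypothetical, and the sketch returns `None` where the real helper raises `NotFoundError`.

```python
from __future__ import annotations

import re


def slugify(value: str) -> str:
    """Lowercase and collapse each run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def candidate_slugs(name: str, url: str) -> set[str]:
    """Slugs a repository answers to: its name and its URL's last segment."""
    tail = url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    return {slugify(name), slugify(tail)}


def find_by_slug(repositories: list[dict], wanted: str) -> dict | None:
    """Return the first repository whose candidate slugs contain the request."""
    target = slugify(wanted)
    for repository in repositories:
        if target in candidate_slugs(repository["name"], repository["url"]):
            return repository
    return None


# Hypothetical records; the URLs mirror the ones used in the tests.
repositories = [
    {"name": "Repo Registry", "url": "https://example.test/coulomb/repo-registry.git"},
    {"name": "Sparse Repo", "url": "https://example.test/sparse.git"},
]

by_url_tail = find_by_slug(repositories, "sparse")
by_name = find_by_slug(repositories, "sparse-repo")
unknown = find_by_slug(repositories, "does-not-exist")
```

Matching on either form is what lets the generator test call `generate("sparse")` for a repository registered as "Sparse Repo". Note that `str.removesuffix` requires Python 3.9 or later.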
138
tests/test_scope_generator.py
Normal file
@@ -0,0 +1,138 @@
|
||||
from repo_registry.core.service import RegistryService
|
||||
from repo_registry.repo_ingestion.git import GitIngestionService
|
||||
from repo_registry.scope.generator import SCOPE_SECTIONS, ScopeGenerator
|
||||
from repo_registry.scope.validator import ScopeValidator
|
||||
from repo_registry.storage.sqlite import RegistryStore
|
||||
|
||||
|
||||
def make_service(tmp_path):
|
||||
store = RegistryStore(tmp_path / "registry.sqlite3")
|
||||
store.initialize()
|
||||
return RegistryService(store, ingestion=GitIngestionService(tmp_path / "checkouts"))
|
||||
|
||||
|
||||
def test_scope_generator_renders_canonical_sections_and_capability_blocks(tmp_path):
|
||||
service = make_service(tmp_path)
|
||||
repository = service.register_repository(
|
||||
name="Repo Registry",
|
||||
url="https://example.test/coulomb/repo-registry.git",
|
||||
description="Generates repository scope files from approved characteristics.",
|
||||
)
|
||||
service.update_scope(
|
||||
repository.id,
|
||||
name="Repo Scoping",
|
||||
description="Generates and validates SCOPE.md files for registered repositories.",
|
||||
confidence=0.95,
|
||||
)
|
||||
ability_id = service.add_ability(
|
||||
repository.id,
|
||||
name="Maintain Repository Scope",
|
||||
description="Keeps repository utility and boundaries understandable.",
|
||||
primary_class="repository-intelligence",
|
||||
attributes=["scope", "capability-mapping"],
|
||||
)
|
||||
capability_id = service.add_capability(
|
||||
repository.id,
|
||||
ability_id,
|
||||
name="Generate SCOPE.md",
|
||||
description="Renders SCOPE.md from approved repository characteristics.",
|
||||
primary_class="api",
|
||||
attributes=["scope", "generation"],
|
||||
)
|
||||
service.add_feature(
|
||||
repository.id,
|
||||
capability_id,
|
||||
name="Preview generated SCOPE.md",
|
||||
type="business-usecase",
|
||||
primary_class="business-usecase",
|
||||
attributes=["scope", "preview"],
|
||||
location="src/repo_registry/scope/generator.py",
|
||||
)
|
||||
|
||||
content = ScopeGenerator(service).generate("repo-registry")
|
||||
|
||||
assert content.startswith("# SCOPE\n")
|
||||
for section in SCOPE_SECTIONS:
|
||||
assert f"## {section}" in content
|
||||
assert "Generates and validates SCOPE.md files" in content
|
||||
assert "Maintain Repository Scope" in content
|
||||
assert "Preview generated SCOPE.md" in content
|
||||
assert "src/repo_registry/scope/generator.py" in content
|
||||
assert "```capability" in content
|
||||
assert "type: api" in content
|
||||
assert "title: Generate SCOPE.md" in content
|
||||
assert "keywords: [api, scope, generation, business-usecase, preview]" in content
|
||||
|
||||
|
||||
def test_scope_generator_marks_missing_curator_owned_sections(tmp_path):
|
||||
service = make_service(tmp_path)
|
||||
service.register_repository(
|
||||
name="Sparse Repo",
|
||||
url="https://example.test/sparse.git",
|
||||
description="Sparse repo.",
|
||||
)
|
||||
|
||||
content = ScopeGenerator(service).generate("sparse")
|
||||
|
||||
assert "## Out of Scope" in content
|
||||
assert "<!-- needs curator input -->" in content
|
||||
assert "<!-- No approved capabilities yet. -->" in content
|
||||
|
||||
|
||||
def test_scope_validator_validates_generated_scope_and_diffs_sections(tmp_path):
|
||||
service = make_service(tmp_path)
|
||||
repository = service.register_repository(
|
||||
name="Validator Repo",
|
||||
url="https://example.test/validator-repo.git",
|
||||
description="Validates generated scope files.",
|
||||
)
|
||||
ability_id = service.add_ability(repository.id, name="Validate Scope Files")
|
||||
service.add_capability(
|
||||
repository.id,
|
||||
ability_id,
|
||||
name="Diff SCOPE.md",
|
||||
description="Compares generated and existing scope sections.",
|
||||
primary_class="api",
|
||||
attributes=["scope", "diff"],
|
||||
)
|
||||
generator = ScopeGenerator(service)
|
||||
validator = ScopeValidator(generator)
|
||||
path = tmp_path / "SCOPE.md"
|
||||
path.write_text(generator.generate("validator-repo"), encoding="utf-8")
|
||||
|
||||
validation = validator.validate(path)
|
||||
diff = validator.diff("validator-repo", path)
|
||||
|
||||
assert validation.ok
|
||||
assert validation.issues == []
|
||||
assert not diff.needs_update
|
||||
assert {section.status for section in diff.sections} == {"ok"}
|
||||
|
||||
path.write_text(
|
||||
path.read_text(encoding="utf-8").replace("## Core Idea", "## Core Thought"),
|
||||
encoding="utf-8",
|
||||
)
|
||||
diff = validator.diff("validator-repo", path)
|
||||
assert diff.needs_update
|
||||
assert next(section for section in diff.sections if section.section == "Core Idea").status == "missing"
|
||||
|
||||
|
||||
def test_scope_validator_warns_when_provided_capabilities_section_is_missing(tmp_path):
|
||||
path = tmp_path / "SCOPE.md"
|
||||
path.write_text(
|
||||
"\n\n".join(
|
||||
f"## {section}\n\nplaceholder"
|
||||
for section in SCOPE_SECTIONS
|
||||
if section != "Provided Capabilities"
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
result = ScopeValidator().validate(path)
|
||||
|
||||
assert any(
|
||||
issue.check == "C5c"
|
||||
and issue.severity == "warn"
|
||||
and "Provided Capabilities" in issue.message
|
||||
for issue in result.issues
|
||||
)
|
||||
@@ -18,6 +18,7 @@ def test_openapi_groups_agent_facing_endpoints():
|
||||
"analysis",
|
||||
"review",
|
||||
"registry",
|
||||
"scope",
|
||||
"search",
|
||||
"discovery",
|
||||
}
|
||||
@@ -252,6 +253,15 @@ def test_openapi_contract_snapshot_for_stable_agent_paths():
|
||||
"/repos/{repository_id}/export": {
|
||||
"get": {"tags": ["discovery"], "success_schema": "application/x-yaml"}
|
||||
},
|
||||
"/repos/{repo_slug}/scope": {
|
||||
"get": {"tags": ["scope"], "success_schema": None}
|
||||
},
|
||||
"/repos/{repo_slug}/scope/diff": {
|
||||
"get": {"tags": ["scope"], "success_schema": "object"}
|
||||
},
|
||||
"/repos/{repo_slug}/scope/write": {
|
||||
"post": {"tags": ["scope"], "success_schema": "object"}
|
||||
},
|
||||
"/repos/{repository_id}/expectation-gaps": {
|
||||
"get": {"tags": ["review"], "success_schema": "list[ExpectationGapResponse]"},
|
||||
"post": {"tags": ["review"], "success_schema": "ExpectationGapResponse"},
|
||||
@@ -455,6 +465,100 @@ def test_api_manual_registry_loop(tmp_path):
|
||||
app.dependency_overrides.clear()
|
||||
|
||||
|
||||
def test_api_generates_diffs_and_writes_scope_md(tmp_path):
|
||||
source = tmp_path / "scope-repo"
|
||||
source.mkdir()
|
||||
|
||||
def override_settings():
|
||||
return Settings(
|
||||
database_path=str(tmp_path / "scope-api.sqlite3"),
|
||||
checkout_root=str(tmp_path / "checkouts"),
|
||||
)
|
||||
|
||||
app.dependency_overrides[get_settings] = override_settings
|
||||
client = TestClient(app)
|
||||
try:
|
||||
repository = client.post(
|
||||
"/repos",
|
||||
json={
|
||||
"name": "Scope Repo",
|
||||
"url": str(source),
|
||||
"description": "Generates SCOPE.md through the API.",
|
||||
},
|
||||
).json()
|
||||
ability_id = client.post(
|
||||
f"/repos/{repository['id']}/abilities",
|
||||
json={
|
||||
"name": "Maintain Repository Scope",
|
||||
"description": "Keeps repository utility understandable.",
|
||||
},
|
||||
).json()["id"]
|
||||
client.post(
|
||||
f"/repos/{repository['id']}/capabilities",
|
||||
json={
|
||||
"ability_id": ability_id,
|
||||
"name": "Generate SCOPE.md",
|
||||
"description": "Renders SCOPE.md from approved characteristics.",
|
||||
"primary_class": "api",
|
||||
"attributes": ["scope", "generation"],
|
||||
},
|
||||
)
|
||||
|
||||
preview = client.get("/repos/scope-repo/scope")
|
||||
assert preview.status_code == 200
|
||||
assert preview.headers["content-type"].startswith("text/markdown")
|
||||
assert "# SCOPE" in preview.text
|
||||
assert "title: Generate SCOPE.md" in preview.text
|
||||
|
||||
diff = client.get("/repos/scope-repo/scope/diff")
|
||||
assert diff.status_code == 200
|
||||
assert diff.json()["needs_update"] is True
|
||||
assert {section["status"] for section in diff.json()["sections"]} == {"missing"}
|
||||
|
||||
write = client.post("/repos/scope-repo/scope/write")
|
||||
assert write.status_code == 200
|
||||
assert write.json() == {"written": True, "path": str(source / "SCOPE.md")}
|
||||
assert (source / "SCOPE.md").read_text(encoding="utf-8").startswith("# SCOPE")
|
||||
|
||||
current = client.get("/repos/scope-repo/scope/diff")
|
||||
assert current.status_code == 200
|
||||
assert current.json()["needs_update"] is False
|
||||
assert {section["status"] for section in current.json()["sections"]} == {"ok"}
|
||||
|
||||
empty = client.post(
|
||||
"/repos",
|
||||
json={
|
||||
"name": "Empty Scope",
|
||||
"url": "https://example.test/empty-scope.git",
|
||||
"description": "No approved characteristics yet.",
|
||||
},
|
||||
).json()
|
||||
assert client.get("/repos/empty-scope/scope").status_code == 404
|
||||
|
||||
remote = client.post(
|
||||
"/repos",
|
||||
json={
|
||||
"name": "Remote Scope",
|
||||
"url": "https://example.test/remote-scope.git",
|
||||
"description": "Has no known local checkout path.",
|
||||
},
|
||||
).json()
|
||||
remote_ability = client.post(
|
||||
f"/repos/{remote['id']}/abilities",
|
||||
json={"name": "Remote Scope Generation"},
|
||||
).json()["id"]
|
||||
client.post(
|
||||
f"/repos/{remote['id']}/capabilities",
|
||||
json={
|
||||
"ability_id": remote_ability,
|
||||
"name": "Generate Remote SCOPE.md",
|
||||
},
|
||||
)
|
||||
assert client.post("/repos/remote-scope/scope/write").status_code == 409
|
||||
finally:
|
||||
app.dependency_overrides.clear()
|
||||
|
||||
|
||||
def test_api_compare_gap_and_export_use_cases(tmp_path):
|
||||
def override_settings():
|
||||
return Settings(
|
||||
|
||||
@@ -4,7 +4,7 @@ type: workplan
|
||||
title: "SCOPE.md Generation Feature"
|
||||
domain: capabilities
|
||||
repo: repo-registry
|
||||
status: todo
|
||||
status: done
|
||||
owner: codex
|
||||
topic_slug: foerster-capabilities
|
||||
created: "2026-04-30"
|
||||
@@ -37,7 +37,7 @@ Unblocks: RREG-WP-0006
|
||||
|
||||
```task
|
||||
id: RREG-WP-0005-T01
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "83154aae-dd06-4329-8df6-3906b2bf0f14"
|
||||
```
|
||||
@@ -73,11 +73,17 @@ Acceptance: `docs/scope-md-spec.md` exists, covers all 11 sections with
|
||||
explicit characteristic-to-section mappings, and is consistent with the
|
||||
existing template at `state-hub/scripts/project_rules/scope.template`.
|
||||
|
||||
Implementation note 2026-04-30: `docs/scope-md-spec.md` now owns the reference
specification. It maps the current Custodian template headings to the
Scope/Ability/Capability/Feature/Evidence/Facts model, documents generated vs.
curator-owned sections, preserves the existing capability block format, and
cross-references the characteristic/evidence and classification strategy docs.
|
||||
|
||||
## T02: Build SCOPE.md generator
|
||||
|
||||
```task
|
||||
id: RREG-WP-0005-T02
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "39feb7ea-72ca-4d99-8094-b006df605dbe"
|
||||
```
|
||||
@@ -109,11 +115,17 @@ valid SCOPE.md; all 11 sections are present; the `## Provided Capabilities`
|
||||
section contains parseable capability blocks; the output passes the C5b/C5c
|
||||
checks defined in CUST-WP-0034-T01.
|
||||
|
||||
Implementation note 2026-04-30: `repo_registry.scope.ScopeGenerator` now renders
SCOPE.md from approved repository scope, abilities, capabilities, features, facts,
support evidence, and classification metadata. It preserves the current template
headings, emits curator-input stubs for missing data, and renders approved
capabilities as parseable `capability` blocks.
|
||||
|
||||
## T03: Build SCOPE.md validator and differ
|
||||
|
||||
```task
|
||||
id: RREG-WP-0005-T03
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "0c9c1347-368a-4657-a039-ae143a6500bd"
|
||||
```
|
||||
@@ -137,11 +149,15 @@ Acceptance: `ScopeValidator.diff("repo-registry", Path("SCOPE.md"))` returns
|
||||
a diff with at least some `ok` sections and surfaces any real gaps; the
|
||||
validator catches a missing `## Provided Capabilities` section as a `warn`.
|
||||
|
||||
Implementation note 2026-04-30: `repo_registry.scope.ScopeValidator` now validates
C5a/C5b/C5c-style SCOPE.md structure, parses capability blocks, and produces
section-aware diffs against freshly generated content.
|
||||
|
||||
## T04: API endpoints
|
||||
|
||||
```task
|
||||
id: RREG-WP-0005-T04
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "a2d1937b-f9e2-480e-8e28-1c12837e1b23"
|
||||
```
|
||||
@@ -164,11 +180,17 @@ Acceptance: `GET /repos/repo-registry/scope` returns valid Markdown; `GET
|
||||
/repos/repo-registry/scope/diff` returns a diff JSON; a `POST` write succeeds
|
||||
and the written file passes the validator.
|
||||
|
||||
Implementation note 2026-04-30: Added `/repos/{repo_slug}/scope`,
`/repos/{repo_slug}/scope/diff`, and `/repos/{repo_slug}/scope/write` API
endpoints. The endpoints resolve registered repositories by slug, require
approved characteristics, use a local repository path or cached checkout for
diff/write operations, and return 409 when no local path is available.
|
||||
|
||||
## T05: Register capabilities in custodian
|
||||
|
||||
```task
|
||||
id: RREG-WP-0005-T05
|
||||
status: todo
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "e1bd4a4f-3d9a-4384-a254-ef75bd9905b9"
|
||||
```
|
||||
@@ -200,3 +222,9 @@ CUST-WP-0034-T03 will then resolve correctly.
|
||||
Acceptance: `list_capabilities()` in the state-hub MCP returns `scope.generate`
|
||||
and `scope.update` with `provider_repo: repo-registry`; `request_capability`
|
||||
with either key resolves without routing error.
|
||||
|
||||
Implementation note 2026-04-30: Added `scope.generate` and `scope.update`
capability blocks to `SCOPE.md`, then ingested them with the State Hub
capability ingestion script using `/home/worsch/repo-registry` as the explicit
repo path. The State Hub catalog created `api/scope.generate` and
`api/scope.update` entries for `repo-registry`.
|
||||
|
||||
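Capability blocks such as the ones described in the T05 note are only useful to downstream ingestion if each carries the fields the validator's C5c check looks for, namely `type` and `title`. The sketch below applies the same two regular expressions the validator uses to a sample document; the sample block contents and the helper names are hypothetical and do not reproduce the blocks actually added to `SCOPE.md`.

```python
import re

FENCE = "`" * 3  # built at runtime so the example nests cleanly inside Markdown

# Hypothetical Provided Capabilities body: one complete block, one without a type.
sample = "\n".join(
    [
        FENCE + "capability",
        "type: api",
        "title: Generate SCOPE.md",
        "keywords: [api, scope]",
        FENCE,
        "",
        FENCE + "capability",
        "title: Untyped entry",
        FENCE,
    ]
)


def capability_blocks(content: str) -> list[str]:
    """Return the text between each capability fence and its closing fence."""
    pattern = FENCE + r"capability\s*(.*?)" + FENCE
    return re.findall(pattern, content, flags=re.DOTALL | re.IGNORECASE)


def capability_keys(block: str) -> set[str]:
    """Collect top-level 'key:' names that start a line in the block."""
    return {
        match.group(1)
        for match in re.finditer(r"^([A-Za-z_][A-Za-z0-9_-]*):", block, re.MULTILINE)
    }


def missing_required(content: str, required=frozenset({"type", "title"})) -> list[list[str]]:
    """For each block, list the required keys it lacks (empty list means complete)."""
    return [sorted(required - capability_keys(block)) for block in capability_blocks(content)]


report = missing_required(sample)
print(report)
```

An empty inner list marks a complete block; in the validator the equivalent finding is reported as a `warn`, so an incomplete block does not by itself make `ValidationResult.ok` false.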