
tegwick 28ea672225 chore(consistency): sync task status from DB [auto]

Updated by fix-consistency on 2026-05-15:
  - update .custodian-brief.md for repo-scoping

2026-05-15 21:17:32 +02:00

docs

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

migrations

Improved datamodel and deterministic generation

2026-04-30 01:29:29 +02:00

src/repo_scoping

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

tests

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

wiki

Documentation, terminology repo cleanup.

2026-05-01 15:00:39 +02:00

workplans

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

.custodian-brief.md

chore(consistency): sync task status from DB [auto]

2026-05-15 21:17:32 +02:00

.gitignore

Initial commit

2026-04-25 19:02:19 +00:00

AGENTS.md

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

CLAUDE.md

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

INTENT.md

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

LICENSE

Initial commit

2026-04-25 19:02:19 +00:00

Makefile

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

pyproject.toml

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

README.md

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

SCOPE.md

Finalize repo-scoping runtime rename

2026-05-15 21:16:34 +02:00

README.md

Repository Scoping

Repository Scoping maps repositories from usefulness to implementation:

Ability -> Capability -> Feature -> Evidence -> Code location

The implementation is a Python registry core plus a FastAPI HTTP API and a small curator UI. Repository registration imports basic metadata from the repository itself; analysis then builds observed facts and candidate review entries.

Local Development

Create an environment and install dependencies:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"

Run tests:

pytest

Run the API:

uvicorn repo_scoping.web_api.app:app --reload

The API creates a local SQLite database at var/repo-scoping.sqlite3 by default. Runtime configuration uses the REPO_SCOPING_ environment prefix, and the Python package is repo_scoping.

First API Loop

curl -X POST http://127.0.0.1:8000/repos \
  -H 'content-type: application/json' \
  -d '{"url":"https://example.com/mail-router.git"}'

The registry uses the submitted repository path or URL as the default name, then imports description and fallback metadata from pyproject.toml, package.json, or README where possible. Then add abilities, capabilities, features, and evidence under that repository and inspect:

curl http://127.0.0.1:8000/repos/1/ability-map
curl 'http://127.0.0.1:8000/search?q=classify'
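For scripted use, the registration call can also be built from Python with only the standard library. This is a minimal sketch, not part of the project: the helper names `build_register_request` and `send_json` are hypothetical, and the shape of the JSON response is deliberately not assumed.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://127.0.0.1:8000"  # local uvicorn address used in the curl examples


def build_register_request(repo_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /repos request for one repository URL or path."""
    body = json.dumps({"url": repo_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/repos",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def send_json(request: urllib.request.Request) -> Any:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the API running, `send_json(build_register_request("https://example.com/mail-router.git"))` performs the same call as the curl command. Keeping request construction separate from sending makes the payload easy to inspect without a live server.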

Deterministic Analysis

For local development, repository URLs may be local filesystem paths. Git URLs, including file:// URLs, are cloned into var/checkouts before scanning. Trigger a deterministic scan:

curl -X POST http://127.0.0.1:8000/repos/1/analysis-runs \
  -H 'content-type: application/json' \
  -d '{}'

Or override the scan source path explicitly:

curl -X POST http://127.0.0.1:8000/repos/1/analysis-runs \
  -H 'content-type: application/json' \
  -d '{"source_path":"/path/to/repository"}'
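The two request variants above differ only in the JSON body. A small sketch of building either one from Python follows; `build_analysis_request` is a hypothetical helper name, and only the `source_path` field shown in the curl example is used.

```python
import json
import urllib.request
from typing import Optional


def build_analysis_request(
    repo_id: int,
    source_path: Optional[str] = None,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build the POST /repos/{id}/analysis-runs request.

    An empty body scans the registered source; source_path overrides it.
    """
    payload = {} if source_path is None else {"source_path": source_path}
    return urllib.request.Request(
        f"{base_url}/repos/{repo_id}/analysis-runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
```

Pass the returned object to `urllib.request.urlopen` to trigger the scan against a running API.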

Inspect recorded facts:

curl http://127.0.0.1:8000/repos/1/analysis-runs
curl http://127.0.0.1:8000/repos/1/analysis-runs/1
curl http://127.0.0.1:8000/repos/1/observed-facts

The deterministic scanner records observed facts only: languages, documentation files, examples, tests, package manifests, configuration files, framework hints, and likely API/CLI interfaces.
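To illustrate what "observed facts only" can mean in practice, here is a toy path classifier. It is a sketch under loud assumptions, not the project's scanner: the rule tables, fact kind names, and function names are all hypothetical. The point is that each fact follows directly from a file path, and that sorting the result makes repeated scans produce identical output.

```python
from pathlib import PurePosixPath

# Hypothetical classification rules, for illustration only.
LANGUAGE_BY_SUFFIX = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript"}
MANIFEST_NAMES = {"pyproject.toml", "package.json"}


def classify_path(path: str) -> list[tuple[str, str]]:
    """Return (fact_kind, value) pairs observed for one repository-relative path."""
    p = PurePosixPath(path)
    facts = []
    if p.suffix in LANGUAGE_BY_SUFFIX:
        facts.append(("language", LANGUAGE_BY_SUFFIX[p.suffix]))
    if p.name in MANIFEST_NAMES:
        facts.append(("package_manifest", p.name))
    if p.name.lower().startswith("readme") or p.suffix == ".md":
        facts.append(("documentation_file", path))
    if "tests" in p.parts or p.name.startswith("test_"):
        facts.append(("test", path))
    if "examples" in p.parts:
        facts.append(("example", path))
    return facts


def scan(paths: list[str]) -> list[tuple[str, str]]:
    """Collect de-duplicated facts in sorted order, independent of input order."""
    return sorted({fact for path in paths for fact in classify_path(path)})
```

No interpretation happens at this stage; turning facts into abilities or capabilities is left to candidate generation and review.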

Each completed analysis run also creates a conservative candidate graph for review:

curl http://127.0.0.1:8000/repos/1/analysis-runs/1/candidate-graph

Candidate entries are source-linked review seeds. They are not canonical registry truth until a review workflow approves them. Candidate, approved, and search responses include numeric confidence values plus low, medium, or high confidence labels for quick triage.
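The mapping from a numeric confidence to a label can be pictured as simple thresholding. The sketch below assumes a 0.0 to 1.0 scale and uses hypothetical cut-offs of 0.5 and 0.8; the actual scale and thresholds used by the registry are not stated here, so treat the numbers as placeholders.

```python
def confidence_label(confidence: float, medium_at: float = 0.5, high_at: float = 0.8) -> str:
    """Map a numeric confidence to 'low', 'medium', or 'high'.

    The 0..1 range and both thresholds are assumptions for illustration.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= high_at:
        return "high"
    if confidence >= medium_at:
        return "medium"
    return "low"
```

A client triaging candidates might review `low` entries first and skim `high` ones.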

Approve a candidate graph into the canonical registry:

curl -X POST http://127.0.0.1:8000/repos/1/analysis-runs/1/candidate-graph/approve \
  -H 'content-type: application/json' \
  -d '{"notes":"Approved first review package"}'

Approval copies candidate abilities, capabilities, features, and evidence into the approved registry tables, marks candidates approved, and moves the repository status to indexed.
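The approval step described above can be modeled in memory as follows. This is an illustrative sketch only: the `Repository` dataclass, its field names, the `"analyzed"` starting status, and the shape of the returned decision record are hypothetical and do not reflect the real table layout.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    status: str = "analyzed"  # hypothetical pre-approval status
    candidates: list = field(default_factory=list)  # candidate graph entries
    approved: list = field(default_factory=list)    # canonical registry entries


def approve_candidate_graph(repo: Repository, notes: str) -> dict:
    """Copy candidates into the approved set, mark them approved, index the repo."""
    for candidate in repo.candidates:
        # Copy everything except the review status into the approved registry.
        repo.approved.append({k: v for k, v in candidate.items() if k != "status"})
        candidate["status"] = "approved"
    repo.status = "indexed"
    return {"notes": notes, "approved_count": len(repo.approved)}
```

The candidates stay in place with an updated status, so the review history remains inspectable after approval.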

Review Workflow

Candidate graphs are meant to be corrected before publication. The API supports:

edit candidate abilities and capabilities with PATCH
reject candidate abilities, capabilities, features, and evidence
relink capabilities under another ability
relink features or evidence under another capability
merge duplicate abilities, capabilities, features, or evidence

Examples are available in the generated OpenAPI docs at /docs.

Optional LLM Extraction

The llm_extraction module is designed to work with the sibling llm-connect project without making it a hard dependency. To enable provider-backed extraction locally:

python -m pip install -e ../llm-connect

The integration accepts any llm-connect style adapter with execute_prompt(prompt, config) and parses strict JSON candidate drafts from model responses. Parsed drafts can be mapped into reviewable candidate graph entries while preserving source paths where they match observed facts or content chunks. Tests use fake adapters, so the default test suite does not call external providers.
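A fake adapter in the spirit of the test suite might look like the sketch below. Only the `execute_prompt(prompt, config)` method name comes from the description above; the string return type, the `FakeAdapter` class, the `parse_candidate_draft` helper, and the `{"candidates": [...]}` JSON shape are assumptions made for illustration.

```python
import json
from typing import Any, Protocol


class PromptAdapter(Protocol):
    """Anything with an llm-connect style execute_prompt method (return type assumed)."""

    def execute_prompt(self, prompt: str, config: Any) -> str: ...


class FakeAdapter:
    """Returns a canned response so tests never call an external provider."""

    def __init__(self, canned_response: str) -> None:
        self.canned_response = canned_response

    def execute_prompt(self, prompt: str, config: Any) -> str:
        return self.canned_response


def parse_candidate_draft(raw: str) -> list:
    """Strict parsing: the whole response must be a JSON object with a 'candidates' list."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict) or not isinstance(data.get("candidates"), list):
        raise ValueError("expected a JSON object with a 'candidates' list")
    return data["candidates"]
```

Strict parsing means a chatty model response such as `Sure! Here is the JSON: {...}` is rejected outright instead of being partially salvaged.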

Application code can inject an LLMCandidateExtractor into RegistryService. When an extractor is present and returns candidates, analysis stores those reviewable candidates; when it returns no candidates, the deterministic heuristic generator remains the fallback. If extraction fails, the failure is recorded as a review decision and analysis continues with deterministic candidates. Successful LLM candidate generation is also recorded as a review decision so curators can see whether a graph came from deterministic heuristics or an LLM draft.
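The fallback rules in the paragraph above reduce to a small decision function. The sketch below uses plain callables in place of `LLMCandidateExtractor` and the heuristic generator, and the decision strings are hypothetical labels, not the values the registry actually stores.

```python
from typing import Callable, Optional


def generate_candidates(
    deterministic: Callable[[], list],
    extractor: Optional[Callable[[], list]] = None,
) -> tuple:
    """Return (candidates, review_decisions) following the LLM-then-fallback rules."""
    decisions = []
    if extractor is not None:
        try:
            llm_candidates = extractor()
        except Exception as exc:
            # A failed extraction is recorded, and analysis continues.
            decisions.append(f"llm_extraction_failed: {exc}")
        else:
            if llm_candidates:
                decisions.append("llm_candidates_generated")
                return llm_candidates, decisions
    # No extractor, an empty result, or a failure: deterministic heuristics apply.
    return deterministic(), decisions
```

Because every path ends in a usable candidate list, an unavailable provider degrades analysis quality but never blocks it.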

The FastAPI settings object also accepts llm_provider and llm_model. By default llm_provider is unset, so analysis is fully offline and deterministic. Environment variables use the REPO_SCOPING_ prefix:

REPO_SCOPING_LLM_PROVIDER=gemini
REPO_SCOPING_LLM_MODEL=gemini-2.5-flash

LLM assistance can also be disabled even when a provider is configured:

REPO_SCOPING_LLM_ENABLED=false

Individual analysis requests may opt out with {"use_llm_assistance": false}. For local demos, {"trusted_auto_approve": true} approves the generated candidate graph immediately after analysis and records the review decision as trusted_auto_approve_candidate_graph. The default remains review-first: automation is off unless explicitly requested.
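Taken together, the three switches combine roughly as in this sketch. The function name `llm_assistance_active` is hypothetical, as are two details it has to assume: which strings count as "false" for `REPO_SCOPING_LLM_ENABLED`, and that `use_llm_assistance` defaults to true when omitted from a request.

```python
def llm_assistance_active(env: dict, request_body: dict) -> bool:
    """Decide whether one analysis request should use LLM assistance."""
    # No provider configured: analysis stays offline and deterministic.
    if not env.get("REPO_SCOPING_LLM_PROVIDER"):
        return False
    # Global kill switch (accepted false-like spellings are an assumption).
    enabled = env.get("REPO_SCOPING_LLM_ENABLED", "true").strip().lower()
    if enabled in {"false", "0", "no"}:
        return False
    # Per-request opt-out.
    return request_body.get("use_llm_assistance", True) is not False
```

In application code `env` would typically be `os.environ`; passing it in explicitly keeps the rule easy to test.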

Agent-Facing Endpoints

The v0.1 API covers the main registration, analysis, review, search, and inspection loop:

GET  /repos
POST /repos
GET  /repos/{id}
PATCH /repos/{id}
DELETE /repos/{id}
POST /repos/{id}/analysis-runs
GET  /repos/{id}/analysis-runs
GET  /repos/{id}/analysis-runs/{run_id}
GET  /repos/{id}/analysis-runs/{run_id}/candidate-graph
POST /repos/{id}/analysis-runs/{run_id}/candidate-graph/approve
GET  /repos/{id}/ability-map
PATCH /repos/{id}/abilities/{ability_id}
DELETE /repos/{id}/abilities/{ability_id}
PATCH /repos/{id}/capabilities/{capability_id}
DELETE /repos/{id}/capabilities/{capability_id}
PATCH /repos/{id}/features/{feature_id}
DELETE /repos/{id}/features/{feature_id}
PATCH /repos/{id}/evidence/{evidence_id}
DELETE /repos/{id}/evidence/{evidence_id}
GET  /abilities
GET  /capabilities
GET  /search?q=...
GET  /repository-comparisons?repository_ids=1&repository_ids=2
POST /capability-gaps
GET  /repos/{id}/export

Agent API Loop

Agents can use the API as a conservative inspect-and-curate loop:

Register or find a repository with POST /repos or GET /repos.
Run analysis with POST /repos/{id}/analysis-runs.
Inspect observed facts, content chunks, and the candidate graph:

curl http://127.0.0.1:8000/repos/1/observed-facts
curl http://127.0.0.1:8000/repos/1/content-chunks
curl http://127.0.0.1:8000/repos/1/analysis-runs/1/candidate-graph

Correct candidate claims with review endpoints when needed, then approve:

curl -X POST http://127.0.0.1:8000/repos/1/analysis-runs/1/candidate-graph/approve \
  -H 'content-type: application/json' \
  -d '{"notes":"Approved after source-linked review"}'

Search and inspect only approved registry truth:

curl 'http://127.0.0.1:8000/search?q=classify%20email&status=indexed'
curl http://127.0.0.1:8000/repos/1/ability-map
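When building the search URL in code, the query text needs percent-encoding. A minimal sketch with the standard library follows; `search_url` is a hypothetical helper, and only the `q` and `status` parameters shown in the curl example are used.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def search_url(
    query: str,
    status: Optional[str] = None,
    base_url: str = "http://127.0.0.1:8000",
) -> str:
    """Build a GET /search URL, encoding spaces as %20 as in the curl example."""
    params = {"q": query}
    if status is not None:
        params["status"] = status
    # quote_via=quote yields %20 for spaces; the default (quote_plus) would yield '+'.
    return f"{base_url}/search?{urlencode(params, quote_via=quote)}"
```

`search_url("classify email", status="indexed")` reproduces the URL used in the first curl command above.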

Search results include match_type, matched_field, confidence, confidence_label, ability/capability identifiers where available, and source/evidence context when the match comes from implementation evidence. The generated OpenAPI schema at /openapi.json and docs at /docs include typed response schemas and examples for the main agent-facing responses. The API compatibility policy is documented in docs/api-contract.md; stable agent-facing paths are guarded by an OpenAPI contract snapshot test.

Discovery helpers are available for production-readiness workflows that compare approved profiles, find simple capability gaps, or export a registry entry:

curl 'http://127.0.0.1:8000/repository-comparisons?repository_ids=1&repository_ids=2'

curl -X POST http://127.0.0.1:8000/capability-gaps \
  -H 'content-type: application/json' \
  -d '{"desired_ability":"Business Email Routing","desired_capabilities":["Classify Incoming Email","Route Email to Team"],"repository_ids":[1,2]}'

curl http://127.0.0.1:8000/repos/1/export
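The comparison endpoint takes a repeated `repository_ids` query parameter, which `urlencode` produces when `doseq=True`. The sketch below builds that URL and the capability-gap request body from the examples above; both helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"


def comparison_url(repository_ids: list, base_url: str = BASE_URL) -> str:
    """Build GET /repository-comparisons with one repository_ids entry per id."""
    query = urlencode({"repository_ids": repository_ids}, doseq=True)
    return f"{base_url}/repository-comparisons?{query}"


def capability_gap_body(
    desired_ability: str,
    desired_capabilities: list,
    repository_ids: list,
) -> bytes:
    """Encode the JSON body for POST /capability-gaps."""
    return json.dumps(
        {
            "desired_ability": desired_ability,
            "desired_capabilities": desired_capabilities,
            "repository_ids": repository_ids,
        }
    ).encode("utf-8")
```

Without `doseq=True`, `urlencode` would serialize the list as a single string value instead of repeating the parameter.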

Description

A platform that analyzes the utility characteristics of git repositories, mapping scope, abilities, capabilities, and features into a searchable, inspectable description tree that reveals what the code can do and how it does it.

Readme MIT-0 5.1 MiB