
# Repository Ability Registry

The Repository Ability Registry maps each repository from what it is useful for down to where that usefulness is implemented:

```text
Ability -> Capability -> Feature -> Evidence -> Code location
```

The first implementation slice is a Python registry core plus a FastAPI HTTP API and a small curator UI. Repository registration imports basic metadata from the repository itself; analysis then builds observed facts and candidate review entries.
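
To make the chain above concrete, here is a minimal sketch of the hierarchy as nested dataclasses. It is an illustration only: the class and field names, the `code_locations` helper, the feature name, and the `src/classifier.py` path are all hypothetical and do not describe the registry's actual models or tables.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    # Hypothetical shape: a source path plus an optional line number.
    path: str
    line: Optional[int] = None


@dataclass
class Feature:
    name: str
    evidence: List[Evidence] = field(default_factory=list)


@dataclass
class Capability:
    name: str
    features: List[Feature] = field(default_factory=list)


@dataclass
class Ability:
    name: str
    capabilities: List[Capability] = field(default_factory=list)


def code_locations(ability: Ability) -> List[str]:
    """Walk Ability -> Capability -> Feature -> Evidence and collect paths."""
    return [
        ev.path
        for cap in ability.capabilities
        for feat in cap.features
        for ev in feat.evidence
    ]


# Illustrative data only; the feature name and path are made up.
ability = Ability(
    name="Business Email Routing",
    capabilities=[
        Capability(
            name="Classify Incoming Email",
            features=[
                Feature(
                    name="Keyword classifier",
                    evidence=[Evidence(path="src/classifier.py", line=12)],
                )
            ],
        )
    ],
)
```

Calling `code_locations(ability)` walks the whole chain and returns the evidence paths, which is the "usefulness to implementation" mapping in miniature.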

## Local Development

Create an environment and install dependencies:

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"
```

Run tests:

```bash
pytest
```

Run the API:

```bash
uvicorn repo_registry.web_api.app:app --reload
```

The API creates a local SQLite database at `var/repo-registry.sqlite3` by default.

## First API Loop

Register a repository:

```bash
curl -X POST http://127.0.0.1:8000/repos \
  -H 'content-type: application/json' \
  -d '{"url":"https://example.com/mail-router.git"}'
```

The registry imports the name and description from `pyproject.toml`, `package.json`, or the README where possible. Then add abilities, capabilities, features, and evidence under that repository and inspect the result:

```bash
curl http://127.0.0.1:8000/repos/1/ability-map
curl 'http://127.0.0.1:8000/search?q=classify'
```
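
The metadata import mentioned above could be sketched roughly as follows. This is an assumption-heavy illustration, not the registry's code: `pick_metadata` is a hypothetical helper, the precedence order (`pyproject.toml`, then `package.json`, then README) is assumed from the order in which the files are listed, and the README rule (first `#` heading as name, first following text line as description) is a guess. The only firm facts are that PEP 621 stores `name` and `description` under the `[project]` table and that `package.json` stores them at the top level.

```python
import re
from typing import Optional


def pick_metadata(
    pyproject: Optional[dict] = None,
    package_json: Optional[dict] = None,
    readme_text: Optional[str] = None,
) -> dict:
    """Return the first name/description pair found, in an assumed order."""
    # PEP 621 keeps project metadata under the [project] table.
    if pyproject and pyproject.get("project", {}).get("name"):
        project = pyproject["project"]
        return {"name": project["name"], "description": project.get("description")}
    # package.json keeps name and description at the top level.
    if package_json and package_json.get("name"):
        return {
            "name": package_json["name"],
            "description": package_json.get("description"),
        }
    # Assumed README fallback: the first "# " heading is the name and the
    # first non-empty, non-heading line after it is the description.
    if readme_text:
        heading = re.search(r"^#[ \t]+(.+)$", readme_text, flags=re.MULTILINE)
        if heading:
            rest = readme_text[heading.end():].splitlines()
            description = next(
                (ln.strip() for ln in rest if ln.strip() and not ln.startswith("#")),
                None,
            )
            return {"name": heading.group(1).strip(), "description": description}
    return {"name": None, "description": None}
```

The function takes already-parsed inputs so the precedence logic stays separate from file I/O and parsing.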

## Deterministic Analysis

For local development, repository URLs may be local filesystem paths. Git URLs, including `file://` URLs, are cloned into `var/checkouts` before scanning. Trigger a deterministic scan:

```bash
curl -X POST http://127.0.0.1:8000/repos/1/analysis-runs \
  -H 'content-type: application/json' \
  -d '{}'
```

Or override the scan source path explicitly:

```bash
curl -X POST http://127.0.0.1:8000/repos/1/analysis-runs \
  -H 'content-type: application/json' \
  -d '{"source_path":"/path/to/repository"}'
```
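
The difference between local paths and Git URLs described at the start of this section can be pictured with a small check like the one below. It is a sketch under assumptions: `needs_clone` is a hypothetical name, and the set of schemes and the scp-style remote rule are guesses at reasonable behavior, not the registry's actual logic.

```python
from urllib.parse import urlparse

# Assumed set of schemes treated as Git URLs.
GIT_SCHEMES = {"http", "https", "ssh", "git", "file"}


def needs_clone(url: str) -> bool:
    """Guess whether a repository URL must be cloned before scanning."""
    # scp-style Git remotes such as git@host:org/repo.git have no scheme.
    if "@" in url and ":" in url and "://" not in url:
        return True
    # Known Git transport schemes are cloned; a bare filesystem path
    # has an empty scheme and is scanned in place.
    return urlparse(url).scheme in GIT_SCHEMES
```

Under these assumptions a `file://` URL is cloned like any other Git URL, while a plain path such as `/path/to/repository` is scanned directly.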

Inspect recorded facts:

```bash
curl http://127.0.0.1:8000/repos/1/analysis-runs
curl http://127.0.0.1:8000/repos/1/analysis-runs/1
curl http://127.0.0.1:8000/repos/1/observed-facts
```

The deterministic scanner records observed facts only: languages, documentation files, examples, tests, package manifests, configuration files, framework hints, and likely API/CLI interfaces.
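
As a rough picture of what "observed facts only" means, the sketch below classifies file paths into fact kinds using nothing but their names. The lookup tables, the fact kind strings, and the `classify` and `scan` helpers are all hypothetical; the real scanner's rules and output format may differ.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

# Hypothetical lookup tables; the real scanner's rules may differ.
MANIFESTS = {"pyproject.toml", "package.json", "Cargo.toml", "go.mod"}
LANGUAGES = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".rs": "Rust"}


def classify(path: str) -> Optional[Tuple[str, str]]:
    """Map one relative file path to a (fact_kind, value) pair, or None."""
    p = PurePosixPath(path)
    top = p.parts[0] if p.parts else ""
    if p.name in MANIFESTS:
        return ("package_manifest", p.name)
    if p.name.lower().startswith("readme") or top == "docs":
        return ("documentation", path)
    if "tests" in p.parts or p.name.startswith("test_"):
        return ("test", path)
    if top == "examples":
        return ("example", path)
    if p.suffix in LANGUAGES:
        return ("language", LANGUAGES[p.suffix])
    return None


def scan(paths: List[str]) -> List[Tuple[str, str]]:
    """Deduplicate and sort facts so repeated runs give identical output."""
    facts = {fact for fact in map(classify, paths) if fact is not None}
    return sorted(facts)
```

Sorting the deduplicated facts is what makes the result independent of the order in which files are visited, which is one simple way to keep a scan deterministic.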

Each completed analysis run also creates a conservative candidate graph for review:

```bash
curl http://127.0.0.1:8000/repos/1/analysis-runs/1/candidate-graph
```

Candidate entries are source-linked review seeds. They are not canonical registry truth until a review workflow approves them.
Candidate, approved, and search responses include numeric confidence values plus `low`, `medium`, or `high` confidence labels for quick triage.
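
A label like this is typically derived from the numeric value by simple bucketing. The sketch below assumes scores in the range 0 to 1 and uses made-up cut-offs of 0.4 and 0.75; neither the range nor the thresholds are taken from the registry, and `confidence_label` here is a hypothetical helper name.

```python
def confidence_label(confidence: float) -> str:
    """Bucket a 0..1 confidence score into low / medium / high."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Assumed cut-offs; the registry's real thresholds may differ.
    if confidence >= 0.75:
        return "high"
    if confidence >= 0.4:
        return "medium"
    return "low"
```

For triage, a client can filter on the label and fall back to the numeric value only when ordering results within one bucket.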

Approve a candidate graph into the canonical registry:

```bash
curl -X POST http://127.0.0.1:8000/repos/1/analysis-runs/1/candidate-graph/approve \
  -H 'content-type: application/json' \
  -d '{"notes":"Approved first review package"}'
```

Approval copies candidate abilities, capabilities, features, and evidence into the approved registry tables, marks candidates approved, and moves the repository status to `indexed`.
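
The effect of approval can be modeled in memory as below. This is a toy sketch, not the service code: the `Repo` class, the `"analyzed"` starting status, the `"candidate"` and `"rejected"` status strings, and the rule that rejected candidates are skipped are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repo:
    status: str = "analyzed"  # assumed pre-approval status name
    candidates: List[dict] = field(default_factory=list)
    approved: List[dict] = field(default_factory=list)


def approve(repo: Repo, notes: str = "") -> int:
    """Copy pending candidates into the approved list and index the repo."""
    copied = 0
    for candidate in repo.candidates:
        # Assumption: only entries still marked "candidate" are promoted.
        if candidate["status"] != "candidate":
            continue
        repo.approved.append({**candidate, "status": "approved", "notes": notes})
        candidate["status"] = "approved"
        copied += 1
    repo.status = "indexed"
    return copied


repo = Repo(
    candidates=[
        {"name": "Classify Incoming Email", "status": "candidate"},
        {"name": "Duplicate entry", "status": "rejected"},
    ]
)
promoted = approve(repo, notes="Approved first review package")
```

After the call, one entry has been copied into `repo.approved`, the rejected entry is untouched, and `repo.status` is `indexed`.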

## Review Workflow

Candidate graphs are meant to be corrected before publication. The API lets curators:

- edit candidate abilities and capabilities with `PATCH`
- reject candidate abilities, capabilities, features, and evidence
- relink capabilities under another ability
- relink features or evidence under another capability
- merge duplicate abilities, capabilities, features, or evidence

Examples are available in the generated OpenAPI docs at `/docs`.

## Optional LLM Extraction

The `llm_extraction` module is designed to work with the sibling `llm-connect`
project without making it a hard dependency. To enable provider-backed
extraction locally:

```bash
python -m pip install -e ../llm-connect
```

The integration accepts any `llm-connect` style adapter with
`execute_prompt(prompt, config)` and parses strict JSON candidate drafts from
model responses. Parsed drafts can be mapped into reviewable candidate graph
entries while preserving source paths where they match observed facts or
content chunks. Tests use fake adapters, so the default test suite does not call
external providers.
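
The adapter contract and strict JSON parsing described above can be sketched with a fake adapter, much as the tests are said to do. Only the `execute_prompt(prompt, config)` method name and arguments come from this README. The string return type, the `FakeAdapter` class, the `parse_candidate_draft` helper, and the `{"abilities": [...]}` draft shape are assumptions for illustration.

```python
import json
from typing import List


class FakeAdapter:
    """Stand-in for an llm-connect style adapter; returns a canned reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def execute_prompt(self, prompt: str, config: dict) -> str:
        # Assumed return type: the raw model response as text.
        return self.reply


def parse_candidate_draft(raw: str) -> List[dict]:
    """Parse a strict JSON draft; reject anything that is not pure JSON."""
    # json.loads raises json.JSONDecodeError (a ValueError) on prose
    # wrappers or Markdown fences around the JSON.
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("abilities"), list):
        raise ValueError("draft must be an object with an 'abilities' list")
    for ability in data["abilities"]:
        if not isinstance(ability, dict) or not isinstance(ability.get("name"), str):
            raise ValueError("each ability needs a string 'name'")
    return data["abilities"]


adapter = FakeAdapter('{"abilities": [{"name": "Business Email Routing"}]}')
draft = parse_candidate_draft(adapter.execute_prompt("Describe the repo", {}))
```

Being strict here means a chatty model response fails loudly instead of producing half-parsed candidates.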

Application code can inject an `LLMCandidateExtractor` into `RegistryService`.
When an extractor is present and returns candidates, analysis stores those
reviewable candidates; when it returns no candidates, the deterministic
heuristic generator remains the fallback.
If extraction fails, the failure is recorded as a review decision and analysis
continues with deterministic candidates.
Successful LLM candidate generation is also recorded as a review decision so
curators can see whether a graph came from deterministic heuristics or an LLM
draft.
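
The fallback rules above reduce to a small decision function. The sketch below is hypothetical: `generate_candidates`, the callable-based extractor interface, and the decision `kind` strings are invented for illustration and are not the `RegistryService` or `LLMCandidateExtractor` API.

```python
from typing import Callable, List, Optional, Tuple

Candidates = List[dict]


def generate_candidates(
    facts: List[dict],
    deterministic: Callable[[List[dict]], Candidates],
    extractor: Optional[Callable[[List[dict]], Candidates]] = None,
) -> Tuple[Candidates, List[dict]]:
    """Return (candidates, review_decisions) following the fallback rules."""
    decisions: List[dict] = []
    if extractor is not None:
        try:
            drafted = extractor(facts)
        except Exception as exc:
            # Extraction failure is recorded and must not stop analysis.
            decisions.append({"kind": "llm_extraction_failed", "detail": str(exc)})
        else:
            if drafted:
                # Record provenance so curators can tell the graph is an LLM draft.
                decisions.append({"kind": "llm_candidates_generated"})
                return drafted, decisions
    # No extractor, empty result, or failure: use deterministic heuristics.
    return deterministic(facts), decisions
```

All three non-LLM outcomes (no extractor, no candidates, failure) end in the same deterministic call, so analysis always produces a candidate graph.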

The FastAPI settings object also accepts `llm_provider` and `llm_model`. By
default `llm_provider` is unset, so analysis is fully offline and deterministic.
Environment variables use the `REPO_REGISTRY_` prefix:

```bash
REPO_REGISTRY_LLM_PROVIDER=gemini
REPO_REGISTRY_LLM_MODEL=gemini-2.5-flash
```
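
For illustration, prefixed environment variables like these can be read with the standard library alone. The sketch below is not the project's settings object: `LLMSettings`, `load_llm_settings`, the `offline` property, and the choice to treat empty strings as unset are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "REPO_REGISTRY_"


@dataclass(frozen=True)
class LLMSettings:
    llm_provider: Optional[str] = None
    llm_model: Optional[str] = None

    @property
    def offline(self) -> bool:
        # No provider configured means deterministic-only analysis.
        return self.llm_provider is None


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read REPO_REGISTRY_LLM_* values, treating empty strings as unset."""
    return LLMSettings(
        llm_provider=env.get(PREFIX + "LLM_PROVIDER") or None,
        llm_model=env.get(PREFIX + "LLM_MODEL") or None,
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.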

## Agent-Facing Endpoints

The v0.1 API covers the main registration, analysis, review, search, and inspection loop:

```text
GET    /repos
POST   /repos
GET    /repos/{id}
PATCH  /repos/{id}
DELETE /repos/{id}
POST   /repos/{id}/analysis-runs
GET    /repos/{id}/analysis-runs
GET    /repos/{id}/analysis-runs/{run_id}
GET    /repos/{id}/analysis-runs/{run_id}/candidate-graph
POST   /repos/{id}/analysis-runs/{run_id}/candidate-graph/approve
GET    /repos/{id}/ability-map
PATCH  /repos/{id}/abilities/{ability_id}
DELETE /repos/{id}/abilities/{ability_id}
PATCH  /repos/{id}/capabilities/{capability_id}
DELETE /repos/{id}/capabilities/{capability_id}
PATCH  /repos/{id}/features/{feature_id}
DELETE /repos/{id}/features/{feature_id}
PATCH  /repos/{id}/evidence/{evidence_id}
DELETE /repos/{id}/evidence/{evidence_id}
GET    /abilities
GET    /capabilities
GET    /search?q=...
GET    /repository-comparisons?repository_ids=1&repository_ids=2
POST   /capability-gaps
GET    /repos/{id}/export
```
```

## Agent API Loop

Agents can use the API as a conservative inspect-and-curate loop:

1. Register or find a repository with `POST /repos` or `GET /repos`.
2. Run analysis with `POST /repos/{id}/analysis-runs`.
3. Inspect observed facts, content chunks, and the candidate graph:

```bash
curl http://127.0.0.1:8000/repos/1/observed-facts
curl http://127.0.0.1:8000/repos/1/content-chunks
curl http://127.0.0.1:8000/repos/1/analysis-runs/1/candidate-graph
```

4. Correct candidate claims with review endpoints when needed, then approve:

```bash
curl -X POST http://127.0.0.1:8000/repos/1/analysis-runs/1/candidate-graph/approve \
  -H 'content-type: application/json' \
  -d '{"notes":"Approved after source-linked review"}'
```

5. Search and inspect only approved registry truth:

```bash
curl 'http://127.0.0.1:8000/search?q=classify%20email&status=indexed'
curl http://127.0.0.1:8000/repos/1/ability-map
```

Search results include `match_type`, `matched_field`, `confidence`,
`confidence_label`, ability/capability identifiers where available, and
source/evidence context when the match comes from implementation evidence.
The generated OpenAPI schema at `/openapi.json` and docs at `/docs` include
typed response schemas and examples for the main agent-facing responses.
The API compatibility policy is documented in `docs/api-contract.md`; stable
agent-facing paths are guarded by an OpenAPI contract snapshot test.
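
An agent written in Python might build the search request and triage results along these lines. The `confidence` and `confidence_label` field names come from the description above. The helper names, the assumption that results arrive as a list of JSON objects, and the default label filter are hypothetical, so check `/openapi.json` for the real response schema.

```python
from typing import List, Sequence
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:8000"


def search_url(query: str, status: str = "indexed") -> str:
    """Build the /search URL with the query percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    params = urlencode({"q": query, "status": status}, quote_via=quote)
    return f"{BASE_URL}/search?{params}"


def keep_confident(
    results: List[dict], labels: Sequence[str] = ("medium", "high")
) -> List[dict]:
    """Keep results whose confidence_label is in labels, best score first."""
    kept = [r for r in results if r.get("confidence_label") in labels]
    return sorted(kept, key=lambda r: r.get("confidence", 0.0), reverse=True)
```

`search_url("classify email")` produces the same URL as the `curl` call in step 5, and `keep_confident` drops `low` matches before an agent acts on them.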

Discovery helpers are available for production-readiness workflows that compare
approved profiles, find simple capability gaps, or export a registry entry:

```bash
curl 'http://127.0.0.1:8000/repository-comparisons?repository_ids=1&repository_ids=2'

curl -X POST http://127.0.0.1:8000/capability-gaps \
  -H 'content-type: application/json' \
  -d '{"desired_ability":"Business Email Routing","desired_capabilities":["Classify Incoming Email","Route Email to Team"],"repository_ids":[1,2]}'

curl http://127.0.0.1:8000/repos/1/export
```
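
The same discovery calls can be prepared from Python with the standard library. This sketch only builds the requests and does not send them. The helper names are hypothetical, while the paths, the repeated `repository_ids` parameter, and the JSON field names mirror the `curl` examples above.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"


def comparison_url(repository_ids: List[int]) -> str:
    """Repeat repository_ids once per ID in the query string."""
    # doseq=True expands the list into repeated key=value pairs.
    query = urlencode({"repository_ids": repository_ids}, doseq=True)
    return f"{BASE_URL}/repository-comparisons?{query}"


def capability_gap_request(
    desired_ability: str,
    desired_capabilities: List[str],
    repository_ids: List[int],
) -> Request:
    """Build (but do not send) the POST /capability-gaps request."""
    body = {
        "desired_ability": desired_ability,
        "desired_capabilities": desired_capabilities,
        "repository_ids": repository_ids,
    }
    return Request(
        f"{BASE_URL}/capability-gaps",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


req = capability_gap_request(
    "Business Email Routing",
    ["Classify Incoming Email", "Route Email to Team"],
    [1, 2],
)
```

Passing `req` to `urllib.request.urlopen` would send it to a running local API; the response schema is documented in the generated OpenAPI docs.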