CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

```bash
# Install
pip install -e ".[dev]"

# Run dev server (port 8001)
uvicorn repo_registry.web_api.app:app --reload --port 8001

# Run tests
pytest
pytest -k "test_scanner"        # filter by keyword
pytest tests/test_web_api.py    # single file

# Health check
curl http://127.0.0.1:8001/health
```

Note: AGENTS.md shows src.repo_registry.app:app, but the correct module path is repo_registry.web_api.app:app. The package is installed from src/, so the import path carries no src. prefix.

Architecture

The service maps Git repositories to reviewable scope maps using a fixed hierarchy:

Scope → Ability → Capability → Feature → Evidence → ObservedFact
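
As a rough illustration of the hierarchy, each level could be modelled as a frozen dataclass that holds a tuple of the level below it. The field names here are assumptions made for the example and are not taken from src/repo_registry/core/models.py.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical shapes: field names are illustrative, not the real models.
@dataclass(frozen=True)
class ObservedFact:
    kind: str   # e.g. "file", "language", "manifest"
    value: str


@dataclass(frozen=True)
class Evidence:
    summary: str
    facts: tuple[ObservedFact, ...] = ()


@dataclass(frozen=True)
class Feature:
    name: str
    evidence: tuple[Evidence, ...] = ()


@dataclass(frozen=True)
class Capability:
    name: str
    features: tuple[Feature, ...] = ()


@dataclass(frozen=True)
class Ability:
    name: str
    capabilities: tuple[Capability, ...] = ()


@dataclass(frozen=True)
class Scope:
    repo: str
    abilities: tuple[Ability, ...] = ()


def count_facts(scope: Scope) -> int:
    """Walk the hierarchy top-down and count the leaf ObservedFact objects."""
    return sum(
        len(evidence.facts)
        for ability in scope.abilities
        for capability in ability.capabilities
        for feature in capability.features
        for evidence in feature.evidence
    )
```

Using tuples rather than lists keeps each node hashable and immutable, which fits a frozen dataclass.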

Data flow for an analysis run:

1. POST /repos/{id}/analysis-runs triggers the pipeline in RegistryService.run_analysis()
2. GitIngestionService clones or resolves the repo path
3. RepositoryMetadataExtractor reads pyproject.toml / package.json / README
4. DeterministicScanner produces ObservedFact objects (files, languages, manifests, APIs, etc.)
5. ContentExtractor chunks files into searchable segments
6. CandidateGraphGenerator builds a draft ability→capability→feature→evidence tree from facts
7. Optionally, LLMCandidateExtractor proposes additional candidates (requires REPO_REGISTRY_LLM_ENABLED=true)
8. Candidates are stored; humans or agents review them via POST .../candidate-graph/approve
9. Approved characteristics feed ScopeGenerator to produce SCOPE.md
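
The steps above can be sketched as a linear pipeline in which each stage builds on the previous one and the LLM stage is gated by a flag. Every name in this block is a hypothetical stand-in; the real orchestration lives in RegistryService.run_analysis() and may be structured differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RunState:
    """Accumulates what each stage produces (hypothetical structure)."""
    repo_path: str
    facts: list[str] = field(default_factory=list)
    candidates: list[str] = field(default_factory=list)
    stages_run: list[str] = field(default_factory=list)


def scan(state: RunState) -> None:
    # Stand-in for the deterministic scan: emits two fixed facts.
    state.facts.extend(["file:README.md", "manifest:pyproject.toml"])
    state.stages_run.append("scan")


def build_candidates(state: RunState) -> None:
    # Stand-in for candidate graph generation: one candidate per fact.
    state.candidates.extend(f"candidate-from:{fact}" for fact in state.facts)
    state.stages_run.append("candidates")


def llm_extract(state: RunState) -> None:
    # Stand-in for the optional LLM stage; no model is called here.
    state.candidates.append("candidate-from:llm")
    state.stages_run.append("llm")


def run_analysis_sketch(repo_path: str, llm_enabled: bool = False) -> RunState:
    """Run the stages in order; the LLM stage only runs when enabled."""
    state = RunState(repo_path=repo_path)
    scan(state)
    build_candidates(state)
    if llm_enabled:
        llm_extract(state)
    return state
```

Keeping the deterministic stages unconditional means a run with the LLM disabled still yields a complete candidate set for review.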

Key source locations:

| Component | Path |
| --- | --- |
| FastAPI routes + DI | `src/repo_registry/web_api/app.py` |
| Orchestration | `src/repo_registry/core/service.py` |
| Frozen dataclasses | `src/repo_registry/core/models.py` |
| Deterministic scanner | `src/repo_registry/repo_scanning/scanner.py` |
| Candidate graph builder | `src/repo_registry/candidate_graph/generator.py` |
| SQLite store | `src/repo_registry/storage/sqlite.py` |
| Schema migration | `migrations/0001_initial.sql` |

Storage: SQLite at var/repo-registry.sqlite3 (auto-created). Schema migrations run at startup. Dynamic columns are added to support evidence relationships, classification, and expectation gaps.
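
One common way to add columns at startup without a full migration is to check PRAGMA table_info before issuing ALTER TABLE. The sketch below shows that technique against an in-memory database. The helper, table and column names are hypothetical, and the actual approach in src/repo_registry/storage/sqlite.py may differ.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add `column` to `table` if it is missing. Returns True when added.

    Identifiers are interpolated directly, so only pass trusted constants.
    """
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evidence (id INTEGER PRIMARY KEY, summary TEXT)")
first = ensure_column(conn, "evidence", "classification", "TEXT")   # column is added
second = ensure_column(conn, "evidence", "classification", "TEXT")  # no-op on rerun
```

Because the check is idempotent, running it on every startup is safe for both fresh and existing databases.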

LLM extraction is optional and disabled by default. Enable with REPO_REGISTRY_LLM_ENABLED=true plus REPO_REGISTRY_LLM_PROVIDER and REPO_REGISTRY_LLM_MODEL. The llm-connect sibling package provides the adapter abstraction.

Semantic search uses HashingEmbeddingProvider by default — deterministic, no external service required.
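
A hashing embedding can be sketched with the feature-hashing trick: each token is hashed into a bucket of a fixed-size vector, so the same text always produces the same vector without a model or network call. This illustrates the general technique only and is not the HashingEmbeddingProvider implementation; the function names, dimension and whitespace tokenization are assumptions.

```python
from __future__ import annotations

import hashlib
import math


def hashing_embed(text: str, dim: int = 64) -> list[float]:
    """Map text to a fixed-size, L2-normalized vector via token hashing."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # A stable digest (unlike built-in hash()) gives the same bucket on every run.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; inputs are unit length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))
```

Distinct tokens can collide in the same bucket, so a small dimension trades accuracy for size.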

Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| `REPO_REGISTRY_DATABASE_PATH` | `var/repo-registry.sqlite3` | SQLite file |
| `REPO_REGISTRY_CHECKOUT_ROOT` | `var/checkouts` | Git clone cache |
| `REPO_REGISTRY_LLM_ENABLED` | `false` | Enable LLM extraction |
| `REPO_REGISTRY_LLM_PROVIDER` | (none) | e.g. `gemini`, `anthropic` |
| `REPO_REGISTRY_LLM_MODEL` | (none) | e.g. `gemini-2.5-flash` |
| `REPO_REGISTRY_STATE_HUB_BASE_URL` | `http://127.0.0.1:8000` | State Hub for coordination |
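
A minimal sketch of how these variables could be read into a typed settings object. The Settings class and load_settings function are hypothetical and are not the repository's configuration code; the accepted truthy spellings are also an assumption. LLM extraction defaults to off here, following the statement that it is disabled by default.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_path: str
    checkout_root: str
    llm_enabled: bool
    llm_provider: Optional[str]
    llm_model: Optional[str]
    state_hub_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, falling back to defaults."""
    prefix = "REPO_REGISTRY_"
    llm_flag = env.get(prefix + "LLM_ENABLED", "false")
    return Settings(
        database_path=env.get(prefix + "DATABASE_PATH", "var/repo-registry.sqlite3"),
        checkout_root=env.get(prefix + "CHECKOUT_ROOT", "var/checkouts"),
        # Assumed truthy spellings; anything else counts as disabled.
        llm_enabled=llm_flag.strip().lower() in {"1", "true", "yes"},
        llm_provider=env.get(prefix + "LLM_PROVIDER"),
        llm_model=env.get(prefix + "LLM_MODEL"),
        state_hub_base_url=env.get(prefix + "STATE_HUB_BASE_URL", "http://127.0.0.1:8000"),
    )
```

Passing the mapping as a parameter lets tests supply a plain dict instead of mutating os.environ.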

State Hub & Workplans

Active work is tracked in workplans/RREG-WP-*.md — these files are the source of truth (ADR-001). The Custodian State Hub caches this state; workplan files take precedence.

Session protocol (see AGENTS.md for full curl examples):

- Start: check workplans/ status headers and State Hub inbox
- Close: update task statuses in workplan files, then POST /progress/ and sync via POST /repos/repo-scoping/sync

Workplan sync emits a C-17 warning (unpushed commits); that is expected. A "result": "fail" in the sync output needs investigation.

Docs

Design decisions and terminology live in docs/:

- docs/terminology.md: characteristic model definitions
- docs/scope-md-spec.md: SCOPE.md format
- docs/characteristic-evidence-model.md: evidence target kinds
- docs/classification-strategy.md: how characteristics are classified
