CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Commands
# Install
pip install -e ".[dev]"
# Run dev server (port 8001)
uvicorn repo_registry.web_api.app:app --reload --port 8001
# Run tests
pytest
pytest -k "test_scanner" # filter by keyword
pytest tests/test_web_api.py # single file
# Health check
curl http://127.0.0.1:8001/health
Note: AGENTS.md shows src.repo_registry.app:app, but the correct module path is repo_registry.web_api.app:app. The package is installed from src/, so src is not part of the import path.
Architecture
The service maps Git repositories to reviewable scope maps using a fixed hierarchy:
Scope → Ability → Capability → Feature → Evidence → ObservedFact
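To make the nesting concrete, here is a minimal sketch of the hierarchy as frozen dataclasses. Only the six level names and their order come from this file; every field name (kind, value, summary, repo, and so on) and the count_facts helper are hypothetical and may not match src/repo_registry/core/models.py.

```python
from dataclasses import dataclass
from typing import Tuple


# Field names below are illustrative assumptions, not the real models.
@dataclass(frozen=True)
class ObservedFact:
    kind: str
    value: str


@dataclass(frozen=True)
class Evidence:
    summary: str
    facts: Tuple[ObservedFact, ...] = ()


@dataclass(frozen=True)
class Feature:
    name: str
    evidence: Tuple[Evidence, ...] = ()


@dataclass(frozen=True)
class Capability:
    name: str
    features: Tuple[Feature, ...] = ()


@dataclass(frozen=True)
class Ability:
    name: str
    capabilities: Tuple[Capability, ...] = ()


@dataclass(frozen=True)
class Scope:
    repo: str
    abilities: Tuple[Ability, ...] = ()


def count_facts(scope: Scope) -> int:
    """Walk every level of the hierarchy and count the leaf facts."""
    return sum(
        len(ev.facts)
        for ability in scope.abilities
        for capability in ability.capabilities
        for feature in capability.features
        for ev in feature.evidence
    )


# Build one branch bottom-up: fact -> evidence -> feature -> capability -> ability -> scope.
fact = ObservedFact(kind="manifest", value="pyproject.toml")
evidence = Evidence(summary="pyproject.toml present", facts=(fact,))
feature = Feature(name="Declares dependencies", evidence=(evidence,))
capability = Capability(name="Build metadata", features=(feature,))
ability = Ability(name="Packaging", capabilities=(capability,))
scope = Scope(repo="example-repo", abilities=(ability,))
```

Tuples are used instead of lists so that frozen instances stay immutable all the way down.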
Data flow for an analysis run:
- POST /repos/{id}/analysis-runs triggers the pipeline in RegistryService.run_analysis()
- GitIngestionService clones or resolves the repo path
- RepositoryMetadataExtractor reads pyproject.toml / package.json / README
- DeterministicScanner produces ObservedFact objects (files, languages, manifests, APIs, etc.)
- ContentExtractor chunks files into searchable segments
- CandidateGraphGenerator builds a draft ability→capability→feature→evidence tree from facts
- Optionally, LLMCandidateExtractor proposes additional candidates (requires REPO_REGISTRY_LLM_ENABLED=true)
- Candidates are stored; humans or agents review them via POST .../candidate-graph/approve
- Approved characteristics feed ScopeGenerator to produce SCOPE.md
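The ordering of the automated stages can be sketched as a simple sequential pipeline. This is an illustration only: the stage names are taken from the data flow above, but RunContext, make_stage, and the body of run_analysis are hypothetical placeholders, not the real RegistryService code. The sketch stops once candidates are stored, because approval and SCOPE.md generation happen later, after review.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunContext:
    """Hypothetical per-run state; the real service may track far more."""
    repo_id: str
    llm_enabled: bool = False
    trace: List[str] = field(default_factory=list)


def make_stage(name: str) -> Callable[[RunContext], None]:
    """Build a placeholder stage that only records that it ran."""
    def stage(ctx: RunContext) -> None:
        ctx.trace.append(name)
    return stage


def run_analysis(ctx: RunContext) -> List[str]:
    """Run the stages in the documented order and return the trace."""
    stages = [
        make_stage("GitIngestionService"),
        make_stage("RepositoryMetadataExtractor"),
        make_stage("DeterministicScanner"),
        make_stage("ContentExtractor"),
        make_stage("CandidateGraphGenerator"),
    ]
    # The LLM stage is optional and only runs when explicitly enabled.
    if ctx.llm_enabled:
        stages.append(make_stage("LLMCandidateExtractor"))
    stages.append(make_stage("store_candidates"))
    for stage in stages:
        stage(ctx)
    return ctx.trace
```

Calling run_analysis(RunContext("demo", llm_enabled=True)) adds the LLM stage between candidate graph generation and storage.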
Key source locations:
| Component | Path |
|---|---|
| FastAPI routes + DI | src/repo_registry/web_api/app.py |
| Orchestration | src/repo_registry/core/service.py |
| Frozen dataclasses | src/repo_registry/core/models.py |
| Deterministic scanner | src/repo_registry/repo_scanning/scanner.py |
| Candidate graph builder | src/repo_registry/candidate_graph/generator.py |
| SQLite store | src/repo_registry/storage/sqlite.py |
| Schema migration | migrations/0001_initial.sql |
Storage: SQLite at var/repo-registry.sqlite3 (auto-created). Schema migrations run at startup. Dynamic columns are added to support evidence relationships, classification, and expectation gaps.
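One common way to add columns at startup without failing on an existing database is to check PRAGMA table_info before issuing ALTER TABLE. The sketch below shows that pattern with the standard sqlite3 module. It is an assumption about how such a step could look, not a copy of src/repo_registry/storage/sqlite.py; the characteristics table, the classification column, and the ensure_column helper are hypothetical names.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add a column only if it is missing. Returns True when a column was added.

    Identifiers are interpolated directly, so pass trusted constants only.
    """
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


# An in-memory database stands in for var/repo-registry.sqlite3 here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS characteristics ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

added_first = ensure_column(conn, "characteristics", "classification", "TEXT")
added_again = ensure_column(conn, "characteristics", "classification", "TEXT")
```

Because the check is idempotent, running it on every startup is safe for both fresh and previously migrated databases.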
LLM extraction is optional and disabled by default. Enable with REPO_REGISTRY_LLM_ENABLED=true plus REPO_REGISTRY_LLM_PROVIDER and REPO_REGISTRY_LLM_MODEL. The llm-connect sibling package provides the adapter abstraction.
Semantic search uses HashingEmbeddingProvider by default — deterministic, no external service required.
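For intuition, a hashing embedding can be built with the feature-hashing trick: each token is hashed to a fixed index and a sign, so the same text always maps to the same vector with no model or network call. The sketch below illustrates that general technique. It is not the actual HashingEmbeddingProvider; the function names, the 64-dimension default, and whitespace tokenization are all assumptions.

```python
import hashlib
import math
from typing import List


def hashing_embed(text: str, dim: int = 64) -> List[float]:
    """Map text to a fixed-size, L2-normalized vector via feature hashing."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity for vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


query = hashing_embed("scanner")
```

hashlib is used rather than the built-in hash() because Python randomizes string hashes per process, which would break determinism across runs.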
Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| REPO_REGISTRY_DATABASE_PATH | var/repo-registry.sqlite3 | SQLite file |
| REPO_REGISTRY_CHECKOUT_ROOT | var/checkouts | Git clone cache |
| REPO_REGISTRY_LLM_ENABLED | false | Enable LLM extraction |
| REPO_REGISTRY_LLM_PROVIDER | (none) | e.g. gemini, anthropic |
| REPO_REGISTRY_LLM_MODEL | (none) | e.g. gemini-2.5-flash |
| REPO_REGISTRY_STATE_HUB_BASE_URL | http://127.0.0.1:8000 | State Hub for coordination |
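A settings loader for these variables might look like the sketch below. The variable names and path defaults come from the table; the Settings class, the load_settings function, and the set of accepted truthy strings are hypothetical. LLM extraction is treated as off unless REPO_REGISTRY_LLM_ENABLED is set, matching the Architecture section's statement that it is disabled by default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed parsing rule, not confirmed by the repo


@dataclass(frozen=True)
class Settings:
    database_path: str
    checkout_root: str
    llm_enabled: bool
    llm_provider: Optional[str]
    llm_model: Optional[str]
    state_hub_base_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read REPO_REGISTRY_* variables, falling back to the documented defaults."""
    if env is None:
        env = os.environ
    return Settings(
        database_path=env.get("REPO_REGISTRY_DATABASE_PATH", "var/repo-registry.sqlite3"),
        checkout_root=env.get("REPO_REGISTRY_CHECKOUT_ROOT", "var/checkouts"),
        llm_enabled=env.get("REPO_REGISTRY_LLM_ENABLED", "false").strip().lower() in TRUTHY,
        llm_provider=env.get("REPO_REGISTRY_LLM_PROVIDER"),
        llm_model=env.get("REPO_REGISTRY_LLM_MODEL"),
        state_hub_base_url=env.get("REPO_REGISTRY_STATE_HUB_BASE_URL", "http://127.0.0.1:8000"),
    )


defaults = load_settings({})
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in tests.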
State Hub & Workplans
Active work is tracked in workplans/RREG-WP-*.md — these files are the source of truth (ADR-001). The Custodian State Hub caches this state; workplan files take precedence.
Session protocol (see AGENTS.md for full curl examples):
- Start: check workplans/ status headers and State Hub inbox
- Close: update task statuses in workplan files, then POST /progress/ and sync via POST /repos/repo-scoping/sync
Workplan sync warns on C-17 (unpushed commits) — that's normal. A "result": "fail" needs investigation.
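The triage rule above can be expressed as a small helper. Note that the response shape is an assumption: this file only mentions a "result" field and the C-17 check, so the list-of-checks structure, the "id" key, the "warn" value, and the example ID C-03 are all hypothetical. Check AGENTS.md or an actual sync response before relying on it.

```python
from typing import Dict, Iterable, List

# C-17 (unpushed commits) is described above as a normal warning.
EXPECTED_WARNINGS = {"C-17"}


def triage_sync(checks: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Sort check IDs into failures to investigate and warnings to note."""
    report: Dict[str, List[str]] = {"investigate": [], "expected": [], "other_warnings": []}
    for check in checks:
        check_id = check.get("id", "?")
        result = check.get("result")
        if result == "fail":
            report["investigate"].append(check_id)
        elif result == "warn":
            bucket = "expected" if check_id in EXPECTED_WARNINGS else "other_warnings"
            report[bucket].append(check_id)
    return report


# Hypothetical payload: one expected warning and one failure.
sample = [{"id": "C-17", "result": "warn"}, {"id": "C-03", "result": "fail"}]
report = triage_sync(sample)
```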
Docs
Design decisions and terminology live in docs/:
- docs/terminology.md: characteristic model definitions
- docs/scope-md-spec.md: SCOPE.md format
- docs/characteristic-evidence-model.md: evidence target kinds
- docs/classification-strategy.md: how characteristics are classified