# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
# Install
pip install -e ".[dev]"

# Run dev server (port 8001)
uvicorn repo_registry.web_api.app:app --reload --port 8001

# Run tests
pytest
pytest -k "test_scanner"        # filter by keyword
pytest tests/test_web_api.py    # single file

# Health check
curl http://127.0.0.1:8001/health
```

Note: `AGENTS.md` shows `src.repo_registry.app:app`, but the correct module path is `repo_registry.web_api.app:app`. The package is installed from `src/`, so `src` is not part of the import path.

## Architecture

The service maps Git repositories to reviewable scope maps using a fixed hierarchy:

```
Scope → Ability → Capability → Feature → Evidence → ObservedFact
```
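The hierarchy can be pictured as nested frozen dataclasses. The following is a minimal sketch, not the contents of `core/models.py`: every field name (`kind`, `value`, `summary`, `repo`, and the tuple fields) and the `count_facts` helper are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedFact:
    kind: str   # hypothetical field, e.g. "file", "language", "manifest"
    value: str  # hypothetical field, e.g. a path or a language name


@dataclass(frozen=True)
class Evidence:
    summary: str
    facts: tuple[ObservedFact, ...] = ()


@dataclass(frozen=True)
class Feature:
    name: str
    evidence: tuple[Evidence, ...] = ()


@dataclass(frozen=True)
class Capability:
    name: str
    features: tuple[Feature, ...] = ()


@dataclass(frozen=True)
class Ability:
    name: str
    capabilities: tuple[Capability, ...] = ()


@dataclass(frozen=True)
class Scope:
    repo: str
    abilities: tuple[Ability, ...] = ()


def count_facts(scope: Scope) -> int:
    """Walk the hierarchy top-down and count the leaf ObservedFact objects."""
    return sum(
        len(ev.facts)
        for ab in scope.abilities
        for cap in ab.capabilities
        for feat in cap.features
        for ev in feat.evidence
    )


# Build one branch of the tree, from Scope down to a single ObservedFact.
fact = ObservedFact(kind="manifest", value="pyproject.toml")
evidence = Evidence("pyproject present", (fact,))
feature = Feature("Editable install", (evidence,))
capability = Capability("Python packaging", (feature,))
scope = Scope("example/repo", (Ability("Packaging", (capability,)),))
```

Tuples are used instead of lists so that a frozen instance stays immutable all the way down.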

**Data flow for an analysis run:**

1. `POST /repos/{id}/analysis-runs` triggers the pipeline in `RegistryService.run_analysis()`
2. `GitIngestionService` clones or resolves the repo path
3. `RepositoryMetadataExtractor` reads `pyproject.toml`, `package.json`, and the README
4. `DeterministicScanner` produces `ObservedFact` objects (files, languages, manifests, APIs, etc.)
5. `ContentExtractor` chunks files into searchable segments
6. `CandidateGraphGenerator` builds a draft ability→capability→feature→evidence tree from facts
7. Optionally, `LLMCandidateExtractor` proposes additional candidates (requires `REPO_REGISTRY_LLM_ENABLED=true`)
8. Candidates are stored; humans or agents review them via `POST .../candidate-graph/approve`
9. Approved characteristics feed `ScopeGenerator` to produce `SCOPE.md`
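The ordering of the stages, and the way the LLM stage is switched in, can be sketched as a list of functions applied in sequence. This is an illustration only: `RunContext`, the three stage functions, and the string-based facts and candidates are hypothetical stand-ins, and the real `RegistryService.run_analysis()` may be structured quite differently.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class RunContext:
    """Accumulated state of one analysis run (hypothetical shape)."""
    repo_path: str
    facts: tuple[str, ...] = ()
    candidates: tuple[str, ...] = ()


Stage = Callable[[RunContext], RunContext]


def scan(ctx: RunContext) -> RunContext:
    # Stand-in for DeterministicScanner: emit one fact for the checkout.
    return replace(ctx, facts=ctx.facts + ("manifest:pyproject.toml",))


def build_candidates(ctx: RunContext) -> RunContext:
    # Stand-in for CandidateGraphGenerator: one draft candidate per fact.
    drafts = tuple(f"candidate<{fact}>" for fact in ctx.facts)
    return replace(ctx, candidates=ctx.candidates + drafts)


def llm_candidates(ctx: RunContext) -> RunContext:
    # Stand-in for LLMCandidateExtractor: propose one extra candidate.
    return replace(ctx, candidates=ctx.candidates + ("candidate<llm>",))


def run_analysis(repo_path: str, llm_enabled: bool = False) -> RunContext:
    """Apply the stages in order; the LLM stage runs only when enabled."""
    stages: list[Stage] = [scan, build_candidates]
    if llm_enabled:
        stages.append(llm_candidates)
    ctx = RunContext(repo_path)
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Each stage returns a new context rather than mutating the old one, which mirrors the frozen dataclasses the repository uses for its models.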

**Key source locations:**

| Component | Path |
|-----------|------|
| FastAPI routes + DI | `src/repo_registry/web_api/app.py` |
| Orchestration | `src/repo_registry/core/service.py` |
| Frozen dataclasses | `src/repo_registry/core/models.py` |
| Deterministic scanner | `src/repo_registry/repo_scanning/scanner.py` |
| Candidate graph builder | `src/repo_registry/candidate_graph/generator.py` |
| SQLite store | `src/repo_registry/storage/sqlite.py` |
| Schema migration | `migrations/0001_initial.sql` |

**Storage:** SQLite at `var/repo-registry.sqlite3` (auto-created). Schema migrations run at startup. Dynamic columns are added to support evidence relationships, classification, and expectation gaps.
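One common way to add columns at startup without failing on an already-upgraded database is to check `PRAGMA table_info` before issuing `ALTER TABLE`. The sketch below shows that pattern with Python's standard `sqlite3` module. It is an assumption about how such a step could look, not a copy of `storage/sqlite.py`; the `ensure_column` helper and the `evidence` / `classification` names are hypothetical.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add `column` to `table` if it is missing; return True when it was added.

    Identifiers are interpolated into the SQL text, so they must come from
    trusted code, never from user input.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evidence (id INTEGER PRIMARY KEY, summary TEXT)")
added_first = ensure_column(conn, "evidence", "classification", "TEXT")
added_again = ensure_column(conn, "evidence", "classification", "TEXT")
```

Because the second call is a no-op, the helper is safe to run on every startup.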

**LLM extraction** is optional and disabled by default. Enable with `REPO_REGISTRY_LLM_ENABLED=true` plus `REPO_REGISTRY_LLM_PROVIDER` and `REPO_REGISTRY_LLM_MODEL`. The `llm-connect` sibling package provides the adapter abstraction.

**Semantic search** uses `HashingEmbeddingProvider` by default, which is deterministic and requires no external service.
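To show why a hashing-based embedding needs no model or network call, here is a minimal sketch of the general hashing trick: each token is hashed into one of a fixed number of buckets, and the resulting vector is normalized. It is not the code of `HashingEmbeddingProvider`; the function names, the 64-dimension default, and the tokenizer regex are assumptions.

```python
import hashlib
import math
import re


def hashing_embedding(text: str, dims: int = 64) -> list[float]:
    """Map text to a fixed-size vector using the hashing trick."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9_]+", text.lower()):
        # hashlib is stable across processes, unlike the built-in hash(),
        # which is salted per interpreter run for strings.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dims
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    # Normalize to unit length; an empty text stays the zero vector.
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))
```

The same text always yields the same vector, so search results are reproducible across runs and machines. The trade-off is that similarity reflects shared tokens rather than meaning.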

## Environment Variables

| Variable | Default | Purpose |
|----------|---------|---------|
| `REPO_REGISTRY_DATABASE_PATH` | `var/repo-registry.sqlite3` | SQLite file |
| `REPO_REGISTRY_CHECKOUT_ROOT` | `var/checkouts` | Git clone cache |
| `REPO_REGISTRY_LLM_ENABLED` | `false` | Enable LLM extraction |
| `REPO_REGISTRY_LLM_PROVIDER` | (none) | e.g. `gemini`, `anthropic` |
| `REPO_REGISTRY_LLM_MODEL` | (none) | e.g. `gemini-2.5-flash` |
| `REPO_REGISTRY_STATE_HUB_BASE_URL` | `http://127.0.0.1:8000` | State Hub for coordination |
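As a rough illustration of how these variables could be read, the sketch below loads them into a frozen dataclass. The `Settings` class and `load_settings` function are hypothetical, not the repository's configuration code. It assumes LLM extraction defaults to off, as the Architecture section states, and that enabling it without a provider and model should be rejected.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_path: str
    checkout_root: str
    llm_enabled: bool
    llm_provider: Optional[str]
    llm_model: Optional[str]
    state_hub_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the defaults."""
    settings = Settings(
        database_path=env.get("REPO_REGISTRY_DATABASE_PATH", "var/repo-registry.sqlite3"),
        checkout_root=env.get("REPO_REGISTRY_CHECKOUT_ROOT", "var/checkouts"),
        llm_enabled=env.get("REPO_REGISTRY_LLM_ENABLED", "false").strip().lower() == "true",
        llm_provider=env.get("REPO_REGISTRY_LLM_PROVIDER"),
        llm_model=env.get("REPO_REGISTRY_LLM_MODEL"),
        state_hub_base_url=env.get("REPO_REGISTRY_STATE_HUB_BASE_URL", "http://127.0.0.1:8000"),
    )
    # Assumed validation: enabling the LLM requires both provider and model.
    if settings.llm_enabled and not (settings.llm_provider and settings.llm_model):
        raise ValueError("LLM extraction needs REPO_REGISTRY_LLM_PROVIDER and REPO_REGISTRY_LLM_MODEL")
    return settings
```

Passing the environment as a mapping keeps the loader easy to test: `load_settings({})` returns the defaults without touching the real process environment.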

## State Hub & Workplans

Active work is tracked in `workplans/RREG-WP-*.md`; these files are the source of truth (ADR-001). The Custodian State Hub only caches this state, so when the two disagree, the workplan files take precedence.

Session protocol (see `AGENTS.md` for full curl examples):
- **Start:** check `workplans/` status headers and State Hub inbox
- **Close:** update task statuses in workplan files, then `POST /progress/` and sync via `POST /repos/repo-scoping/sync`

Workplan sync warns on C-17 (unpushed commits); that is normal. A `"result": "fail"` needs investigation.
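The close step can be sketched in Python as two prepared requests plus a check on the sync result. Treat this as a hedged illustration: `AGENTS.md` remains the reference for the real calls. The sketch assumes both endpoints live on the State Hub at the default base URL, and the `summary` payload field, the helper names, and the `"warn"` result value used in the usage note are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

STATE_HUB = "http://127.0.0.1:8000"  # default REPO_REGISTRY_STATE_HUB_BASE_URL


def build_close_requests(summary: str, base_url: str = STATE_HUB) -> list[urllib.request.Request]:
    """Prepare, but do not send, the two close-of-session requests."""
    # Hypothetical payload shape; see AGENTS.md for the real one.
    body = json.dumps({"summary": summary}).encode("utf-8")
    progress = urllib.request.Request(
        f"{base_url}/progress/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    sync = urllib.request.Request(
        f"{base_url}/repos/repo-scoping/sync",
        data=b"",
        method="POST",
    )
    return [progress, sync]


def sync_needs_investigation(response: dict) -> bool:
    """Warnings such as C-17 are expected; only a "fail" result needs follow-up."""
    return response.get("result") == "fail"
```

Each request would be sent with `urllib.request.urlopen(request)`. A response such as `{"result": "warn", "warnings": ["C-17"]}` would pass the check, while `{"result": "fail"}` would not.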

## Docs

Design decisions and terminology live in `docs/`:
- `docs/terminology.md`: characteristic model definitions
- `docs/scope-md-spec.md`: SCOPE.md format
- `docs/characteristic-evidence-model.md`: evidence target kinds
- `docs/classification-strategy.md`: how characteristics are classified