CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

# Install
pip install -e ".[dev]"

# Run dev server (port 8001)
uvicorn repo_scoping.web_api.app:app --reload --port 8001

# Run tests
pytest
pytest -k "test_scanner"        # filter by keyword
pytest tests/test_web_api.py    # single file

# Health check
curl http://127.0.0.1:8001/health

Architecture

The service maps Git repositories to reviewable scope maps using a fixed hierarchy:

Scope → Ability → Capability → Feature → Evidence → ObservedFact
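One way to picture the hierarchy is as nested frozen dataclasses. This is a minimal sketch only: every field name below is an assumption, and the real definitions live in src/repo_scoping/core/models.py.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical field names throughout; only the class nesting mirrors the hierarchy.


@dataclass(frozen=True)
class ObservedFact:
    kind: str   # e.g. "file", "language", "manifest"
    value: str


@dataclass(frozen=True)
class Evidence:
    summary: str
    facts: Tuple[ObservedFact, ...] = ()


@dataclass(frozen=True)
class Feature:
    name: str
    evidence: Tuple[Evidence, ...] = ()


@dataclass(frozen=True)
class Capability:
    name: str
    features: Tuple[Feature, ...] = ()


@dataclass(frozen=True)
class Ability:
    name: str
    capabilities: Tuple[Capability, ...] = ()


@dataclass(frozen=True)
class Scope:
    repo: str
    abilities: Tuple[Ability, ...] = ()


def count_facts(scope: Scope) -> int:
    """Walk every level of the hierarchy and count the leaf facts."""
    return sum(
        len(ev.facts)
        for ab in scope.abilities
        for cap in ab.capabilities
        for feat in cap.features
        for ev in feat.evidence
    )


# Build one branch of the tree, from the leaf fact up to the scope.
fact = ObservedFact(kind="manifest", value="pyproject.toml")
evidence = Evidence("declares project metadata", (fact,))
feature = Feature("pyproject", (evidence,))
capability = Capability("Build metadata", (feature,))
ability = Ability("Packaging", (capability,))
scope = Scope(repo="repo-scoping", abilities=(ability,))
```

Tuples are used instead of lists so that a frozen instance stays immutable all the way down.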

Data flow for an analysis run:

  1. POST /repos/{id}/analysis-runs triggers the pipeline in RegistryService.run_analysis()
  2. GitIngestionService clones or resolves the repo path
  3. RepositoryMetadataExtractor reads pyproject.toml / package.json / README
  4. DeterministicScanner produces ObservedFact objects (files, languages, manifests, APIs, etc.)
  5. ContentExtractor chunks files into searchable segments
  6. CandidateGraphGenerator builds a draft ability→capability→feature→evidence tree from facts
  7. Optionally, LLMCandidateExtractor proposes additional candidates (requires REPO_SCOPING_LLM_ENABLED=true)
  8. Candidates are stored; humans or agents review them via POST .../candidate-graph/approve
  9. Approved characteristics feed ScopeGenerator to produce SCOPE.md
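The stage order above can be sketched as a chain of functions that each extend a shared context. Every function here is a hypothetical stand-in for the named service class, with made-up return values; steps 8 and 9 (review and SCOPE.md generation) are left out.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]          # hypothetical run state passed between stages
Stage = Callable[[Context], Context]


def ingest(ctx: Context) -> Context:
    # Stand-in for GitIngestionService: resolve the repo to a local path.
    return {**ctx, "path": f"var/checkouts/{ctx['repo']}"}


def extract_metadata(ctx: Context) -> Context:
    # Stand-in for RepositoryMetadataExtractor.
    return {**ctx, "metadata": {"name": ctx["repo"]}}


def scan(ctx: Context) -> Context:
    # Stand-in for DeterministicScanner: emit observed facts.
    return {**ctx, "facts": ["file:README.md", "language:python"]}


def chunk_content(ctx: Context) -> Context:
    # Stand-in for ContentExtractor.
    return {**ctx, "chunks": ["README.md#0"]}


def build_candidates(ctx: Context) -> Context:
    # Stand-in for CandidateGraphGenerator: one draft candidate per fact.
    return {**ctx, "candidates": [f"draft from {fact}" for fact in ctx["facts"]]}


def llm_candidates(ctx: Context) -> Context:
    # Stand-in for LLMCandidateExtractor: append extra proposals.
    return {**ctx, "candidates": ctx["candidates"] + ["llm proposal"]}


def run_analysis(repo: str, llm_enabled: bool = False) -> Context:
    """Run the stages in order; the LLM stage is included only when enabled."""
    stages: List[Stage] = [ingest, extract_metadata, scan, chunk_content, build_candidates]
    if llm_enabled:
        stages.append(llm_candidates)
    ctx: Context = {"repo": repo}
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Calling run_analysis("demo") yields two draft candidates in this toy version; passing llm_enabled=True appends a third.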

Key source locations:

| Component | Path |
| --- | --- |
| FastAPI routes + DI | src/repo_scoping/web_api/app.py |
| Orchestration | src/repo_scoping/core/service.py |
| Frozen dataclasses | src/repo_scoping/core/models.py |
| Deterministic scanner | src/repo_scoping/repo_scanning/scanner.py |
| Candidate graph builder | src/repo_scoping/candidate_graph/generator.py |
| SQLite store | src/repo_scoping/storage/sqlite.py |
| Schema migration | migrations/0001_initial.sql |

Storage: a SQLite database at var/repo-scoping.sqlite3, created automatically if missing. Schema migrations run at startup, and columns for evidence relationships, classification, and expectation gaps are added dynamically.
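A common way to add columns dynamically in SQLite is to inspect the table with PRAGMA table_info and issue ALTER TABLE only when the column is missing. The sketch below shows that pattern under stated assumptions: the table name, column name, and helper are hypothetical, and this is not necessarily how src/repo_scoping/storage/sqlite.py does it.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add `column` to `table` if absent. Returns True when a column was added.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


# In-memory database with a hypothetical table, standing in for the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evidence (id INTEGER PRIMARY KEY, summary TEXT)")

added_first = ensure_column(conn, "evidence", "classification", "TEXT")
added_again = ensure_column(conn, "evidence", "classification", "TEXT")
```

Because the check runs before the ALTER, the helper is safe to call on every startup.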

LLM extraction is optional and disabled by default. Enable with REPO_SCOPING_LLM_ENABLED=true plus REPO_SCOPING_LLM_PROVIDER and REPO_SCOPING_LLM_MODEL. The llm-connect sibling package provides the adapter abstraction.

Semantic search uses HashingEmbeddingProvider by default, which is deterministic and requires no external service.
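To illustrate why a hashing-based embedding is deterministic and needs no external service, here is a minimal sketch of the general hashing trick. It is not the HashingEmbeddingProvider implementation; the function names, the tokenization, and the 64-dimension default are all assumptions.

```python
import hashlib
import math
from typing import List


def hashing_embed(text: str, dim: int = 64) -> List[float]:
    """Map text to a fixed-size unit vector by hashing each token into a bucket."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim   # which dimension to touch
        sign = 1.0 if digest[4] % 2 == 0 else -1.0          # signed update reduces collision bias
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # Leave the vector as all zeros when there are no tokens or they cancel out.
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))
```

The same input always produces the same vector because the only source of variation is a cryptographic hash of the tokens.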

Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| REPO_SCOPING_DATABASE_PATH | var/repo-scoping.sqlite3 | SQLite file |
| REPO_SCOPING_CHECKOUT_ROOT | var/checkouts | Git clone cache |
| REPO_SCOPING_LLM_ENABLED | false | Enable LLM extraction |
| REPO_SCOPING_LLM_PROVIDER | | e.g. gemini, anthropic |
| REPO_SCOPING_LLM_MODEL | | e.g. gemini-2.5-flash |
| REPO_SCOPING_STATE_HUB_BASE_URL | http://127.0.0.1:8000 | State Hub for coordination |
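These variables could be read into a single settings object as sketched below. The Settings class and load_settings function are hypothetical, not the service's actual configuration code. The sketch follows the Architecture notes in treating LLM extraction as off unless REPO_SCOPING_LLM_ENABLED is set to true, and it assumes the provider and model have no default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_path: str
    checkout_root: str
    llm_enabled: bool
    llm_provider: Optional[str]
    llm_model: Optional[str]
    state_hub_base_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), applying documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        database_path=env.get("REPO_SCOPING_DATABASE_PATH", "var/repo-scoping.sqlite3"),
        checkout_root=env.get("REPO_SCOPING_CHECKOUT_ROOT", "var/checkouts"),
        # Assumption: only the literal string "true" (case-insensitive) enables the LLM.
        llm_enabled=env.get("REPO_SCOPING_LLM_ENABLED", "false").strip().lower() == "true",
        llm_provider=env.get("REPO_SCOPING_LLM_PROVIDER"),
        llm_model=env.get("REPO_SCOPING_LLM_MODEL"),
        state_hub_base_url=env.get("REPO_SCOPING_STATE_HUB_BASE_URL", "http://127.0.0.1:8000"),
    )
```

Passing an explicit mapping, for example load_settings({"REPO_SCOPING_LLM_ENABLED": "true"}), keeps tests independent of the real process environment.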

State Hub & Workplans

Active work is tracked in workplans/RREG-WP-*.md — these files are the source of truth (ADR-001). The Custodian State Hub caches this state; workplan files take precedence.

Session protocol (see AGENTS.md for full curl examples):

  • Start: check workplans/ status headers and State Hub inbox
  • Close: update task statuses in workplan files, then POST /progress/ and sync via POST /repos/repo-scoping/sync

Workplan sync warns on C-17 (unpushed commits); that warning is normal. A response containing "result": "fail" needs investigation.
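The close-out steps and the pass/fail rule can be sketched as follows. Several things here are assumptions: that both endpoints live on the State Hub base URL, that the sync response is a JSON object with a top-level "result" key, and the shape of the progress payload (AGENTS.md has the real curl examples). The requests are only constructed, not sent.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request


def close_session_requests(base_url: str, progress_payload: Dict[str, Any]) -> List[Request]:
    """Build (but do not send) the two POST requests used when closing a session."""
    progress = Request(
        f"{base_url}/progress/",
        data=json.dumps(progress_payload).encode("utf-8"),  # hypothetical payload shape
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    sync = Request(f"{base_url}/repos/repo-scoping/sync", data=b"", method="POST")
    return [progress, sync]


def triage_sync(response: Dict[str, Any]) -> str:
    """Return "investigate" on a failed sync; warnings such as C-17 alone are treated as fine."""
    if response.get("result") == "fail":
        return "investigate"
    return "ok"


requests_to_send = close_session_requests("http://127.0.0.1:8000", {"note": "example"})
```

Each request can then be sent with urllib.request.urlopen once the payload matches what the State Hub expects.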

Docs

Design decisions and terminology live in docs/:

  • docs/terminology.md — characteristic model definitions
  • docs/scope-md-spec.md — SCOPE.md format
  • docs/characteristic-evidence-model.md — evidence target kinds
  • docs/classification-strategy.md — how characteristics are classified