coulomb/repo-scoping

generated from coulomb/repo-seed

tegwick 04faa6e6f3 Closed mvp implementation workplan with followup for production hardening

2026-04-26 12:54:13 +02:00

Repository Ability Registry Implementation Workplan

MVP Closure

Status: closed as MVP complete on 2026-04-26.

The v0.1 implementation now covers the core product loop:

Register repository
Analyze repository
Generate source-linked candidate map
Review and approve candidates
Publish approved profile
Search, inspect, compare, gap-check, and export registry entries

The full test suite passed at closure: 63 tests passed.

Remaining work is no longer considered part of this MVP workplan. Production hardening items have moved to workplans/ProductionHardeningWorkplan.md, including first-class analysis-run diffs, semantic/vector search, broader fixture coverage, and richer UI surfaces for discovery workflows.

1. Documentation Review Summary

The wiki defines a coherent v0.1 product: a registry that turns Git repositories into reviewable, source-linked maps of:

Ability -> Capability -> Feature -> Evidence -> Code location

The strongest architectural principle across the docs is:

deterministic scanners establish observed facts
LLM-assisted extractors propose interpreted claims
humans or trusted agents approve registry truth

This should remain the core design constraint for implementation. The system should be conservative, explainable, reviewable, and source-linked rather than attempting fully automatic code understanding.

2. MVP Scope

The first version should implement the core journey documented in the PRD, FRS, architecture sketch, and use-case catalog:

Register repository
Analyze repository
Generate candidate ability/capability/feature/evidence map
Review and approve candidates
Publish registry profile
Search and inspect repositories

In scope for v0.1:

Repository registration by Git URL
Repository metadata and snapshot tracking
Deterministic repository scan
Candidate extraction for abilities, capabilities, features, and evidence
Human review actions: edit, approve, reject, merge, relink
Inspectable ability map
Natural-language search over approved registry entries
API access for repositories, ability maps, capabilities, and search

Out of scope for v0.1:

Continuous GitHub app integration
Full static code understanding
Advanced ontology enforcement
Distributed indexing
Benchmark execution
Marketplace features
Complex access control
Automated truth claims without review

3. Recommended Technical Baseline

Use a pragmatic stack that keeps the analyzer and registry easy to evolve:

Backend: Python FastAPI
Database: PostgreSQL
Semantic search: pgvector inside PostgreSQL
Worker: simple background jobs first; graduate to RQ or Celery when needed
Git access: subprocess git or GitPython
Frontend: React/Next.js or server-rendered FastAPI templates for earliest prototype
LLM extraction: provider-abstracted interface
Local artifact storage: filesystem under an application data directory

For the first implementation pass, prefer a modular monolith over distributed services. Keep clean module boundaries internally, but avoid operational complexity until the product loop is proven.

4. Core Domain Model

Implement these entities first:

Repository
RepositorySnapshot
AnalysisRun
ObservedFact
CandidateAbility
CandidateCapability
CandidateFeature
CandidateEvidence
ApprovedAbility
ApprovedCapability
ApprovedFeature
ApprovedEvidence
SourceReference
ReviewDecision

The model should preserve a clear distinction between observed facts and interpreted claims.

Observed facts include things like:

File paths
Documentation files
Test files
Package manifests
API routes
CLI commands
Public modules/functions
Detected languages/frameworks

Interpreted claims include:

Ability names and descriptions
Capability names and descriptions
Feature-to-capability links
Evidence-to-capability links
Confidence scores
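
One way to keep the observed/interpreted distinction visible in code is to tag every record with its claim kind. The following is a minimal sketch, not the project's actual model: the ClaimKind enum, the field names, and the example fact kinds are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ClaimKind(Enum):
    OBSERVED = "observed"        # produced by a deterministic scanner
    INTERPRETED = "interpreted"  # proposed by an extractor, needs review


@dataclass(frozen=True)
class SourceReference:
    path: str
    start_line: Optional[int] = None  # line range is optional
    end_line: Optional[int] = None


@dataclass(frozen=True)
class ObservedFact:
    kind: str   # hypothetical labels such as "api_route" or "test_file"
    value: str
    source: SourceReference
    claim_kind: ClaimKind = ClaimKind.OBSERVED


@dataclass
class CandidateCapability:
    name: str
    description: str
    confidence: float
    sources: list = field(default_factory=list)  # list of SourceReference
    claim_kind: ClaimKind = ClaimKind.INTERPRETED
```

Under this sketch a candidate can reuse the source of an observed fact, so a reviewer can trace an interpreted claim back to the file and lines that support it.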

5. Suggested Module Boundaries

Use the architecture sketch's boundaries as implementation modules:

repo_ingestion: validate Git URLs, clone/fetch repos, resolve branch/commit
repo_scanning: deterministic file tree, language, docs, tests, examples, API/CLI detection
content_indexing: text extraction, chunking, source references, embeddings
llm_extraction: prompt orchestration and structured candidate generation
candidate_graph: build and validate ability/capability/feature/evidence relationships
review_workflow: edit, approve, reject, merge, relink, publish
registry_query: search, filters, profile retrieval, ability-map assembly
web_api: HTTP endpoints and request/response schemas
web_ui: registration, analysis, review, profile, and search screens

6. Milestones

Milestone 0: Project Foundation

Goal: establish the application skeleton and development path.

Deliverables:

Backend app skeleton
Database migration setup
Configuration system
Local development instructions
Basic test harness
Health endpoint

Acceptance criteria:

App starts locally
Tests run locally
Database migrations apply cleanly

Milestone 1: Manual Registry

Goal: prove the core data model and inspection experience before automation.

Deliverables:

Repository CRUD
Manual ability/capability/feature/evidence CRUD
Ability map endpoint
Basic repository profile UI

Acceptance criteria:

A user can create a repository profile by hand
The UI displays Ability -> Capability -> Feature -> Evidence
API returns the same map as structured JSON
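
As an illustration of the structured JSON mentioned above, the sketch below nests flat rows into the Ability -> Capability -> Feature -> Evidence chain. The row shapes and key names (ability_id, capability_id, feature_id, path) are assumptions for this example, not the project's schema.

```python
def build_ability_map(abilities, capabilities, features, evidence):
    """Nest flat rows into Ability -> Capability -> Feature -> Evidence."""

    def children(rows, parent_key, parent_id):
        return [row for row in rows if row[parent_key] == parent_id]

    result = []
    for ability in abilities:
        caps = []
        for cap in children(capabilities, "ability_id", ability["id"]):
            feats = []
            for feat in children(features, "capability_id", cap["id"]):
                ev = [
                    {"id": e["id"], "path": e["path"]}
                    for e in children(evidence, "feature_id", feat["id"])
                ]
                feats.append({"id": feat["id"], "name": feat["name"], "evidence": ev})
            caps.append({"id": cap["id"], "name": cap["name"], "features": feats})
        result.append({"id": ability["id"], "name": ability["name"], "capabilities": caps})
    return {"abilities": result}
```

The returned dictionary contains only plain lists, dicts, and strings, so it can be passed to json.dumps or returned from an HTTP handler unchanged.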

Milestone 2: Git Ingestion and Deterministic Scanner

Goal: establish trustworthy observed facts from repository contents.

Deliverables:

Git URL validation
Clone/fetch and checkout
Snapshot record with branch and commit hash
File tree scan
README/docs/examples/tests/package manifest detection
Basic language/framework/interface detection
Analysis run status tracking

Acceptance criteria:

A public Git repository can be registered and analyzed
The system records a snapshot and deterministic scan summary
Analysis failures are visible without corrupting prior data
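
Git URL validation could start as a small allow-list check before anything is handed to git. This is a sketch under stated assumptions: the allowed schemes and the scp-style pattern are choices made for this example, and local paths are deliberately not handled here.

```python
import re
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"https", "ssh", "git"}  # assumed allow-list
# scp-style remotes such as git@host:owner/repo.git
SCP_STYLE = re.compile(r"^[\w.-]+@[\w.-]+:[\w./~-]+$")


def is_valid_git_url(url: str) -> bool:
    url = url.strip()
    # Reject empty input and anything git could parse as a command-line option.
    if not url or url.startswith("-"):
        return False
    if SCP_STYLE.match(url):
        return True
    parsed = urlparse(url)
    return (
        parsed.scheme in ALLOWED_SCHEMES
        and bool(parsed.hostname)
        and len(parsed.path) > 1
    )
```

Rejecting values that begin with a dash is a cheap guard against option injection when the URL is later passed to a git subprocess.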

Milestone 3: Reviewable Candidate Graph

Goal: generate candidate registry entries from deterministic facts and extracted content.

Deliverables:

Content extraction from README, docs, examples, tests, package metadata, and selected source files
Source references with file paths and line ranges where possible
Candidate ability generation
Candidate capability generation
Candidate feature generation
Candidate evidence detection
Confidence scoring using the documented additive factors
Candidate graph endpoint and UI

Acceptance criteria:

Analysis produces candidates with source references and confidence
Candidates distinguish observed facts from interpreted claims
Candidate output is explainable enough for curator review

Milestone 4: Review and Approval Workflow

Goal: turn candidates into canonical registry entries.

Deliverables:

Approve/reject candidate entries
Edit names, descriptions, confidence, and relationships
Merge duplicate abilities/capabilities/features
Relink capabilities, features, and evidence
Publish approved repository profile
Persist review decisions

Acceptance criteria:

A curator can correct and approve an analysis result
Only approved entries appear in canonical search/profile views
Repository status changes from analyzed to indexed/published
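
The approve/reject path with persisted decisions can be sketched with an in-memory store. ReviewStore and its fields are hypothetical names for this example; merge and relink are omitted, and a real implementation would write to the database instead.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ReviewStore:
    candidates: dict = field(default_factory=dict)  # candidate id -> candidate dict
    approved: dict = field(default_factory=dict)    # candidate id -> approved dict
    decisions: list = field(default_factory=list)   # append-only decision log

    def review(self, candidate_id: str, action: str, reviewer: str,
               edits: Optional[dict] = None) -> None:
        if action not in {"approve", "reject"}:
            raise ValueError(f"unsupported action: {action}")
        candidate = self.candidates[candidate_id]  # KeyError if unknown
        if action == "approve":
            # Curator edits override the extracted values before approval.
            self.approved[candidate_id] = {**candidate, **(edits or {})}
        self.decisions.append({
            "candidate_id": candidate_id,
            "action": action,
            "reviewer": reviewer,
            "decided_at": datetime.now(timezone.utc).isoformat(),
        })
```

Rejected candidates never reach the approved collection, but the decision is still logged, which matches the requirement that only approved entries appear in canonical views while review history is preserved.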

Milestone 5: Search and Inspection

Goal: make the registry useful for discovery.

Deliverables:

Text search over repositories, abilities, capabilities, and descriptions
Semantic search with pgvector
Search filters for language, framework, and ability/capability presence
Search UI
Repository profile drill-down UI
Code/evidence links from features and capabilities

Acceptance criteria:

A user can search by need using natural language
Results show repository, matching ability/capability, confidence, and evidence level
A user can drill from a search result into the ability map and code/evidence references
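
A plain text search that also reports why a result matched might look like the sketch below. It uses simple token overlap as a stand-in for real full-text or semantic search, and the entry fields (repository, capability, description, language, confidence) are assumed for this example.

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query, entries, language=None):
    """Rank approved entries by the share of query terms they contain."""
    query_terms = tokenize(query)
    results = []
    for entry in entries:
        if language and entry.get("language") != language:
            continue  # structured filter applied before text matching
        matched = query_terms & tokenize(entry["capability"] + " " + entry["description"])
        if matched:
            results.append({
                "repository": entry["repository"],
                "capability": entry["capability"],
                "confidence": entry["confidence"],
                "matched_terms": sorted(matched),  # explains the match
                "score": len(matched) / len(query_terms),
            })
    return sorted(results, key=lambda r: (-r["score"], -r["confidence"]))
```

Returning matched_terms alongside the score gives the UI something concrete to display when explaining a result.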

Milestone 6: API Completeness for Agents

Goal: support programmatic consumers cleanly.

Deliverables:

GET /repos
POST /repos
GET /repos/{id}
POST /repos/{id}/analysis-runs
GET /repos/{id}/analysis-runs/{run_id}
GET /repos/{id}/ability-map
GET /abilities
GET /capabilities
GET /search?q=...
OpenAPI examples

Acceptance criteria:

API covers repository registration, analysis, search, and inspection
Responses are stable enough for agent/tooling integration
OpenAPI docs describe all MVP endpoints

6.1 Implemented Status Checkpoint

Status date: 2026-04-26

Current implementation baseline:

Milestone 0: implemented. FastAPI app, SQLite migrations, settings, health endpoint, README development flow, and pytest harness are in place.
Milestone 1: implemented. Repository CRUD, manual ability/capability/feature/evidence CRUD, ability-map API, and server-rendered repository profile UI are in place.
Milestone 2: implemented for local paths and Git URLs. Registration can import metadata, analysis records snapshots and observed facts, and failures are captured on analysis runs.
Milestone 3: implemented for deterministic extraction plus optional LLM-assisted extraction. Analysis stores content chunks, source-linked candidates, candidate evidence, confidence scores, and confidence labels.
Milestone 4: implemented. Candidate approve, reject, edit, relink, and merge actions, persisted review decisions, and indexed repository publication are supported through API and UI paths.
Milestone 5: partially implemented. Text search, filters, search UI, ability-map drill-down, and evidence/source context are implemented. pgvector-backed semantic search remains future work.
Milestone 6: implemented for the MVP and review workflow. Agent-facing endpoints have typed OpenAPI response schemas, examples, tags, and docs smoke coverage.

Use case coverage status:

ID	Use Case	Implementation Status	E2E Coverage Status
UC-01	Register Git Repository	Implemented through API and UI.	Covered by API and UI registration loops.
UC-02	Import Repository Metadata	Implemented from repository files when name/description are omitted.	Covered by API and service metadata tests.
UC-03	Analyze Repository Structure	Implemented by deterministic scanner and analysis runs.	Covered by API, service, scanner, and UI analysis loops.
UC-04	Extract Candidate Abilities	Implemented by deterministic generator and optional LLM mapper.	Covered by API/service analysis loops and LLM extraction tests.
UC-05	Extract Candidate Capabilities	Implemented by deterministic generator and optional LLM mapper.	Covered by API/service analysis loops and LLM extraction tests.
UC-06	Extract Candidate Features	Implemented with detected interfaces, languages, frameworks, docs, tests, and manifests.	Covered by API/service analysis loops plus source-linked fixture e2e assertions.
UC-07	Link Features to Code Locations	Implemented through feature locations and source references.	Covered by service approval tests and API e2e assertions for source paths/lines.
UC-08	Attach Evidence to Capabilities	Implemented for candidate and approved evidence.	Covered by API/UI review, manual registry tests, and source-linked approved evidence e2e assertions.
UC-09	Review and Approve Analysis	Implemented through approve, edit, reject, relink, merge, and review decisions.	Covered by API/service/UI review tests.
UC-10	Search Repositories by Need	Implemented with text search and structured filters.	Covered by API/service/UI search tests. Semantic search remains future work.
UC-11	Inspect Repository Ability Map	Implemented through API and UI profile drill-down.	Covered by API/service/UI ability-map tests.
UC-12	Compare Repositories	Implemented as a read-only API comparison over approved ability maps.	Covered by API e2e comparison test.
UC-13	Detect Capability Gaps	Implemented as a read-only API gap report over desired capabilities and approved maps.	Covered by API e2e gap-analysis test.
UC-14	Expose Registry via API	Implemented for MVP plus review workflow.	Covered by API contract, OpenAPI, and docs smoke tests.
UC-15	Update Registry After Repo Change	Partially implemented by rerunning analysis; no explicit diff/change-review workflow yet.	Covered for rerun behavior by API e2e: second analysis records new candidates without corrupting approved profile.
UC-16	Export Registry Entry	Implemented as YAML export for approved registry entries.	Covered by API e2e export test.
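
For UC-13, a read-only gap report can be as small as a case-insensitive comparison between desired capability names and the approved map. The input shape below (an "abilities" list with nested "capabilities") is an assumption for this sketch, and exact name matching is a simplification.

```python
def capability_gaps(desired, approved_map):
    """Split desired capability names into covered and missing lists."""
    approved = {
        cap["name"].strip().lower()
        for ability in approved_map["abilities"]
        for cap in ability["capabilities"]
    }
    covered = [name for name in desired if name.strip().lower() in approved]
    missing = [name for name in desired if name.strip().lower() not in approved]
    return {"covered": covered, "missing": missing}
```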

Immediate production-readiness test focus:

If UC-15 becomes a production priority, add an explicit diff/change-review model instead of relying only on rerun analysis.
Broaden fixture coverage over time for README-only, Python CLI, FastAPI, JavaScript/TypeScript, tests/examples, and weak-doc repositories.
Add richer UI affordances for comparison, gap analysis, and export if these discovery endpoints become curator-facing workflows.

7. Initial Database Shape

Start with tables for:

repositories
repository_snapshots
analysis_runs
observed_facts
source_references
candidate_abilities
candidate_capabilities
candidate_features
candidate_evidence
candidate_links
approved_abilities
approved_capabilities
approved_features
approved_evidence
approved_links
review_decisions
content_chunks
embeddings
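
To make the table list more concrete, here is a sketch of three of those tables using the standard-library sqlite3 module. The column names, constraints, and the choice of SQLite for the example are assumptions; the real migrations may differ.

```python
import sqlite3

# Hypothetical columns; only the table names come from the list above.
SCHEMA = """
CREATE TABLE repositories (
    id INTEGER PRIMARY KEY,
    git_url TEXT NOT NULL UNIQUE,
    status TEXT NOT NULL DEFAULT 'registered'
);
CREATE TABLE repository_snapshots (
    id INTEGER PRIMARY KEY,
    repository_id INTEGER NOT NULL REFERENCES repositories(id),
    branch TEXT NOT NULL,
    commit_hash TEXT NOT NULL
);
CREATE TABLE analysis_runs (
    id INTEGER PRIMARY KEY,
    snapshot_id INTEGER NOT NULL REFERENCES repository_snapshots(id),
    status TEXT NOT NULL,
    error TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

Keeping analysis_runs tied to a snapshot rather than directly to a repository means a failed run can be recorded without touching earlier snapshots or approved data.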

Use status fields consistently:

registered
ingesting
analyzing
analysis_failed
analyzed
reviewing
indexed
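
Consistent use of these statuses is easier to enforce with an explicit transition table. The status names come from the list above, but the allowed transitions below are an assumption for illustration; the document does not define them.

```python
# Assumed transitions: every state can restart ingestion on re-analysis.
ALLOWED_TRANSITIONS = {
    "registered": {"ingesting"},
    "ingesting": {"analyzing", "analysis_failed"},
    "analyzing": {"analyzed", "analysis_failed"},
    "analysis_failed": {"ingesting"},
    "analyzed": {"reviewing", "ingesting"},
    "reviewing": {"indexed", "ingesting"},
    "indexed": {"ingesting"},
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {target!r}")
    return target
```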

8. Analyzer v0.1 Strategy

The first analyzer should be intentionally modest.

Deterministic scan:

Identify repo root metadata files
Identify docs, examples, tests, package manifests, API specs, config files
Detect languages from extensions and package files
Detect common frameworks from manifests
Detect likely API/CLI features using simple framework-specific scanners
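
The deterministic scan steps above can be sketched over a plain list of repository-relative paths. The extension map, manifest names, and directory conventions below are a small assumed sample, not the scanner's real rule set.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed sample mappings for illustration only.
EXTENSION_LANGUAGES = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript",
                       ".go": "Go", ".rs": "Rust"}
MANIFEST_FILES = {"pyproject.toml", "requirements.txt", "package.json",
                  "Cargo.toml", "go.mod"}


def scan_paths(paths):
    """Classify repository-relative paths without reading file contents."""
    languages = Counter()
    manifests, docs, tests = [], [], []
    for raw in paths:
        path = PurePosixPath(raw)
        language = EXTENSION_LANGUAGES.get(path.suffix.lower())
        if language:
            languages[language] += 1
        if path.name in MANIFEST_FILES:
            manifests.append(raw)
        if path.name.lower().startswith("readme") or "docs" in path.parts:
            docs.append(raw)
        if "tests" in path.parts or path.name.startswith("test_"):
            tests.append(raw)
    return {
        "languages": dict(languages),
        "manifests": sorted(manifests),
        "docs": sorted(docs),
        "tests": sorted(tests),
    }
```

Because the function depends only on paths, the same input always produces the same summary, which is what makes its output usable as observed facts.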

Content extraction:

README and docs first
Examples and tests second
Selected source files only when they expose interfaces
Preserve path and line references

LLM extraction:

Use separate prompts for abilities, capabilities, features, and evidence
Request structured JSON
Require source references for each candidate
Reject or mark speculative any candidate without supporting sources
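
The rule about unsupported candidates can be applied as a post-processing step on the model's JSON output. This sketch assumes a response shaped like {"candidates": [{"name": ..., "source_references": [{"path": ...}]}]}; that shape is hypothetical and not a documented contract.

```python
import json


def triage_candidates(raw_json: str):
    """Split extracted candidates into (supported, speculative) lists."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return [], []  # unparseable output yields no candidates
    if not isinstance(payload, dict):
        return [], []
    supported, speculative = [], []
    for item in payload.get("candidates") or []:
        if not isinstance(item, dict) or not item.get("name"):
            continue  # drop malformed entries
        refs = [
            ref for ref in (item.get("source_references") or [])
            if isinstance(ref, dict) and ref.get("path")
        ]
        candidate = {**item, "source_references": refs}
        (supported if refs else speculative).append(candidate)
    return supported, speculative
```

Whether speculative candidates are discarded or shown to curators with a warning is a product decision; the sketch only separates them.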

Confidence scoring:

Start from the documented additive model
Normalize to 0.0-1.0
Store both numeric confidence and label
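
An additive score that is clamped to 0.0-1.0 and paired with a label could look like the sketch below. The factor names, weights, and label thresholds are placeholders; the documented additive factors are not reproduced in this workplan, so substitute the real ones.

```python
# Placeholder factors and weights, not the documented model.
FACTOR_WEIGHTS = {
    "documented_in_readme": 0.3,
    "has_tests": 0.3,
    "has_example": 0.2,
    "exposed_interface": 0.2,
    "mentioned_in_docs": 0.1,
}


def score_confidence(factors) -> float:
    """Sum the weights of the distinct factors present, clamped to [0.0, 1.0]."""
    raw = sum((FACTOR_WEIGHTS.get(name, 0.0) for name in set(factors)), 0.0)
    return round(min(max(raw, 0.0), 1.0), 2)


def confidence_label(score: float) -> str:
    # Assumed thresholds for the stored label.
    if score >= 0.75:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"
```

Storing both values lets the UI show the label while search ranking and filters use the number.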

9. UI Workplan

Build application screens in this order:

Repository list
Repository registration
Repository detail and analysis status
Deterministic scan summary
Candidate review tree
Published repository profile
Search

The UI should feel like an operational tool rather than a marketing site: dense, clear, review-focused, and optimized for repeated curator work.

10. Testing Strategy

Add tests around the highest-risk boundaries:

Database migrations and model relationships
Git URL validation
Scanner output for fixture repositories
Candidate graph validation
Review workflow transitions
Search result ranking and filtering
API contract tests for MVP endpoints

Create small fixture repositories for:

README-only repository
Python CLI repository
FastAPI repository
JavaScript/TypeScript package
Repository with tests and examples
Repository with weak or misleading docs
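
Fixture repositories like these can be generated from a small in-code definition so tests do not depend on checked-in directories. The fixture names and file contents below are invented for illustration, and only two of the six fixture types are shown.

```python
import tempfile
from pathlib import Path

# Hypothetical fixture definitions: relative path -> file content.
FIXTURES = {
    "readme_only": {
        "README.md": "# Demo\n\nConverts CSV files to JSON.\n",
    },
    "python_cli": {
        "README.md": "# CLI demo\n",
        "pyproject.toml": "[project]\nname = \"demo-cli\"\n",
        "demo_cli/__main__.py": "print('hello')\n",
    },
}


def write_fixture(name: str, root) -> Path:
    """Materialize one fixture under root and return its directory."""
    repo = Path(root) / name
    for relative, content in FIXTURES[name].items():
        target = repo / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
    return repo


def list_fixture_files(name: str):
    """Write a fixture to a temporary directory and list its files."""
    with tempfile.TemporaryDirectory() as root:
        repo = write_fixture(name, root)
        return sorted(
            p.relative_to(repo).as_posix() for p in repo.rglob("*") if p.is_file()
        )
```

With pytest, the same write_fixture helper could be called with the built-in tmp_path fixture instead of a manual temporary directory.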

11. Key Risks and Mitigations

Extraction quality risk:

Require source references.
Keep candidates reviewable.
Separate observed facts from interpreted claims.

Over-complex ontology risk:

Keep v0.1 schema minimal.
Avoid enforcing deep taxonomy too early.

Search quality risk:

Combine relational filters, full-text search, and vector search.
Show why a result matched.

Operational complexity risk:

Start as a modular monolith.
Use simple jobs before adding worker infrastructure.

Trust risk:

Never publish unapproved claims as canonical truth.
Preserve analysis run history and review decisions.

12. Immediate Next Actions

Recommended next implementation sequence:

Scaffold the FastAPI application, database migrations, and test harness.
Implement the core schema for repositories, snapshots, analysis runs, observed facts, candidates, and approved entries.
Add manual registry CRUD and ability-map API.
Build a minimal repository list/profile UI.
Add Git ingestion and deterministic scanning.
Add candidate graph generation and review workflow.

The first meaningful demo should be:

Create a repository
Add or generate an ability map
Approve it
Search for a capability
Open the repository profile
Drill down to feature and evidence locations
