Repository Ability Registry Implementation Workplan

MVP Closure

Status: closed as MVP complete on 2026-04-26.

The v0.1 implementation now covers the core product loop:

Register repository
Analyze repository
Generate source-linked candidate map
Review and approve candidates
Publish approved profile
Search, inspect, compare, gap-check, and export registry entries

The full test suite passed at closure (63 tests passed).

Remaining work is no longer considered part of this MVP workplan. Production hardening items have moved to workplans/ProductionHardeningWorkplan.md, including first-class analysis-run diffs, semantic/vector search, broader fixture coverage, and richer UI surfaces for discovery workflows.

1. Documentation Review Summary

The wiki defines a coherent v0.1 product: a registry that turns Git repositories into reviewable, source-linked maps of:

Ability -> Capability -> Feature -> Evidence -> Code location

The strongest architectural principle across the docs is:

deterministic scanners establish observed facts
LLM-assisted extractors propose interpreted claims
humans or trusted agents approve registry truth

This should remain the core design constraint for implementation. The system should be conservative, explainable, reviewable, and source-linked rather than attempting fully automatic code understanding.

2. MVP Scope

The first version should implement the core journey documented in the PRD, FRS, architecture sketch, and use-case catalog:

Register repository
Analyze repository
Generate candidate ability/capability/feature/evidence map
Review and approve candidates
Publish registry profile
Search and inspect repositories

In scope for v0.1:

  • Repository registration by Git URL
  • Repository metadata and snapshot tracking
  • Deterministic repository scan
  • Candidate extraction for abilities, capabilities, features, and evidence
  • Human review actions: edit, approve, reject, merge, relink
  • Inspectable ability map
  • Natural-language search over approved registry entries
  • API access for repositories, ability maps, capabilities, and search

Out of scope for v0.1:

  • Continuous GitHub app integration
  • Full static code understanding
  • Advanced ontology enforcement
  • Distributed indexing
  • Benchmark execution
  • Marketplace features
  • Complex access control
  • Automated truth claims without review

3. Recommended Stack

Use a pragmatic stack that keeps the analyzer and registry easy to evolve:

  • Backend: Python FastAPI
  • Database: PostgreSQL
  • Semantic search: pgvector inside PostgreSQL
  • Worker: simple background jobs first; graduate to RQ or Celery when needed
  • Git access: subprocess git or GitPython
  • Frontend: React/Next.js or server-rendered FastAPI templates for earliest prototype
  • LLM extraction: provider-abstracted interface
  • Local artifact storage: filesystem under an application data directory
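
As a rough illustration of the "provider-abstracted interface" item above, the sketch below uses a structural protocol so the extraction code never imports a vendor SDK directly. The names `ExtractionProvider`, `EchoProvider`, and `extract_candidates` are hypothetical and do not come from the implementation.

```python
from typing import Protocol


class ExtractionProvider(Protocol):
    """Hypothetical interface: any LLM backend that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider for tests; always returns a canned JSON string."""

    def complete(self, prompt: str) -> str:
        return '{"candidates": []}'


def extract_candidates(provider: ExtractionProvider, chunk_text: str) -> str:
    # The caller depends only on the protocol, never on a specific vendor.
    prompt = f"List capabilities evidenced by this text as JSON:\n{chunk_text}"
    return provider.complete(prompt)


raw = extract_candidates(EchoProvider(), "Supports CSV export via --format csv")
```

A real provider would wrap whichever client library is chosen; swapping it should not require changes to `extract_candidates`.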

For the first implementation pass, prefer a modular monolith over distributed services. Keep clean module boundaries internally, but avoid operational complexity until the product loop is proven.

4. Core Domain Model

Implement these entities first:

  • Repository
  • RepositorySnapshot
  • AnalysisRun
  • ObservedFact
  • CandidateAbility
  • CandidateCapability
  • CandidateFeature
  • CandidateEvidence
  • ApprovedAbility
  • ApprovedCapability
  • ApprovedFeature
  • ApprovedEvidence
  • SourceReference
  • ReviewDecision

The model should preserve a clear distinction between observed facts and interpreted claims.

Observed facts include things like:

  • File paths
  • Documentation files
  • Test files
  • Package manifests
  • API routes
  • CLI commands
  • Public modules/functions
  • Detected languages/frameworks

Interpreted claims include:

  • Ability names and descriptions
  • Capability names and descriptions
  • Feature-to-capability links
  • Evidence-to-capability links
  • Confidence scores
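
One way to keep observed facts and interpreted claims apart is to give them separate types, so a claim can never be mistaken for a fact. The following is a minimal sketch under assumed field names; only the entity names `SourceReference`, `ObservedFact`, and `CandidateCapability` come from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SourceReference:
    path: str
    start_line: Optional[int] = None  # line ranges are recorded where possible
    end_line: Optional[int] = None


@dataclass(frozen=True)
class ObservedFact:
    """Produced by deterministic scanners; immutable once recorded."""
    kind: str    # illustrative values: "test_file", "api_route"
    value: str
    source: SourceReference


@dataclass
class CandidateCapability:
    """An interpreted claim; stays a candidate until a reviewer approves it."""
    name: str
    description: str
    confidence: float
    sources: list[SourceReference] = field(default_factory=list)
    status: str = "candidate"  # assumed values: "candidate", "approved", "rejected"


ref = SourceReference("tests/test_export.py", 10, 24)
fact = ObservedFact(kind="test_file", value="tests/test_export.py", source=ref)
claim = CandidateCapability("Export to CSV", "Writes tables as CSV", 0.6, [ref])
```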

5. Suggested Module Boundaries

Use the architecture sketch's boundaries as implementation modules:

  • repo_ingestion: validate Git URLs, clone/fetch repos, resolve branch/commit
  • repo_scanning: deterministic file tree, language, docs, tests, examples, API/CLI detection
  • content_indexing: text extraction, chunking, source references, embeddings
  • llm_extraction: prompt orchestration and structured candidate generation
  • candidate_graph: build and validate ability/capability/feature/evidence relationships
  • review_workflow: edit, approve, reject, merge, relink, publish
  • registry_query: search, filters, profile retrieval, ability-map assembly
  • web_api: HTTP endpoints and request/response schemas
  • web_ui: registration, analysis, review, profile, and search screens

6. Milestones

Milestone 0: Project Foundation

Goal: establish the application skeleton and development path.

Deliverables:

  • Backend app skeleton
  • Database migration setup
  • Configuration system
  • Local development instructions
  • Basic test harness
  • Health endpoint

Acceptance criteria:

  • App starts locally
  • Tests run locally
  • Database migrations apply cleanly

Milestone 1: Manual Registry

Goal: prove the core data model and inspection experience before automation.

Deliverables:

  • Repository CRUD
  • Manual ability/capability/feature/evidence CRUD
  • Ability map endpoint
  • Basic repository profile UI

Acceptance criteria:

  • A user can create a repository profile by hand
  • The UI displays Ability -> Capability -> Feature -> Evidence
  • API returns the same map as structured JSON
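
To make the "same map as structured JSON" criterion concrete, the sketch below nests flat rows into the Ability -> Capability -> Feature -> Evidence shape. The row layout, key names, and sample data are invented for illustration; the actual response schema may differ.

```python
import json

# Flat rows as they might come from a database; all ids and names are invented.
abilities = [{"id": 1, "name": "Data export"}]
capabilities = [{"id": 10, "ability_id": 1, "name": "Export to CSV"}]
features = [{"id": 100, "capability_id": 10, "name": "export --format csv"}]
evidence = [{"id": 1000, "feature_id": 100, "path": "tests/test_export.py", "line": 12}]


def children(rows, key, parent_id):
    """Return the rows whose foreign key points at parent_id."""
    return [row for row in rows if row[key] == parent_id]


def build_ability_map(abilities, capabilities, features, evidence):
    result = []
    for ability in abilities:
        caps = []
        for cap in children(capabilities, "ability_id", ability["id"]):
            feats = []
            for feat in children(features, "capability_id", cap["id"]):
                ev = [
                    {"path": e["path"], "line": e["line"]}
                    for e in children(evidence, "feature_id", feat["id"])
                ]
                feats.append({"name": feat["name"], "evidence": ev})
            caps.append({"name": cap["name"], "features": feats})
        result.append({"name": ability["name"], "capabilities": caps})
    return {"abilities": result}


ability_map = build_ability_map(abilities, capabilities, features, evidence)
print(json.dumps(ability_map, indent=2))
```

Serving the UI and the API from one assembly function like this is a simple way to keep both views identical.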

Milestone 2: Git Ingestion and Deterministic Scanner

Goal: establish trustworthy observed facts from repository contents.

Deliverables:

  • Git URL validation
  • Clone/fetch and checkout
  • Snapshot record with branch and commit hash
  • File tree scan
  • README/docs/examples/tests/package manifest detection
  • Basic language/framework/interface detection
  • Analysis run status tracking

Acceptance criteria:

  • A public Git repository can be registered and analyzed
  • The system records a snapshot and deterministic scan summary
  • Analysis failures are visible without corrupting prior data
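
Git URL validation could start from something like the sketch below, which accepts a small allow-list of URL schemes plus the scp-style `user@host:path` form. The allow-list and the regular expression are assumptions, not the implemented rules; rejecting `file://` and option-like strings is a conservative choice for input that is later passed to `git`.

```python
import re
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"https", "ssh", "git"}  # assumed allow-list
# scp-style remote, e.g. git@host:org/repo.git
SCP_STYLE = re.compile(r"^[\w.-]+@[\w.-]+:[\w./~-]+$")


def is_valid_git_url(url: str) -> bool:
    url = url.strip()
    if SCP_STYLE.match(url):
        return True
    parsed = urlparse(url)
    # Require an allowed scheme, a host, and a non-trivial path.
    return (
        parsed.scheme in ALLOWED_SCHEMES
        and bool(parsed.netloc)
        and len(parsed.path) > 1
    )
```

For example, `is_valid_git_url("https://github.com/org/repo.git")` is accepted, while `"file:///etc/passwd"` and `"--upload-pack=evil"` are rejected.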

Milestone 3: Reviewable Candidate Graph

Goal: generate candidate registry entries from deterministic facts and extracted content.

Deliverables:

  • Content extraction from README, docs, examples, tests, package metadata, and selected source files
  • Source references with file paths and line ranges where possible
  • Candidate ability generation
  • Candidate capability generation
  • Candidate feature generation
  • Candidate evidence detection
  • Confidence scoring using the documented additive factors
  • Candidate graph endpoint and UI

Acceptance criteria:

  • Analysis produces candidates with source references and confidence
  • Candidates distinguish observed facts from interpreted claims
  • Candidate output is explainable enough for curator review

Milestone 4: Review and Approval Workflow

Goal: turn candidates into canonical registry entries.

Deliverables:

  • Approve/reject candidate entries
  • Edit names, descriptions, confidence, and relationships
  • Merge duplicate abilities/capabilities/features
  • Relink capabilities, features, and evidence
  • Publish approved repository profile
  • Persist review decisions

Acceptance criteria:

  • A curator can correct and approve an analysis result
  • Only approved entries appear in canonical search/profile views
  • Repository status changes from analyzed to indexed/published

Milestone 5: Search and Inspection

Goal: make the registry useful for discovery.

Deliverables:

  • Text search over repositories, abilities, capabilities, and descriptions
  • Semantic search with pgvector
  • Search filters for language, framework, and ability/capability presence
  • Search UI
  • Repository profile drill-down UI
  • Code/evidence links from features and capabilities

Acceptance criteria:

  • A user can search by need using natural language
  • Results show repository, matching ability/capability, confidence, and evidence level
  • A user can drill from a search result into the ability map and code/evidence references
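
The acceptance criteria above can be illustrated with a small in-memory text search that skips unapproved entries and reports which query terms matched. This is a sketch only: the `Entry` type, the tokenizer, and the ranking rule are assumptions, and a real implementation would likely use database full-text search.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    repository: str
    capability: str
    description: str
    confidence: float
    approved: bool


def tokenize(text: str) -> set[str]:
    words = (word.strip(".,:;()").lower() for word in text.split())
    return {word for word in words if word}


def search(entries: list[Entry], query: str) -> list[tuple[Entry, list[str]]]:
    terms = tokenize(query)
    hits = []
    for entry in entries:
        if not entry.approved:
            continue  # candidates never surface in canonical search
        matched = sorted(terms & tokenize(f"{entry.capability} {entry.description}"))
        if matched:
            hits.append((entry, matched))
    # Rank by number of matched terms, then by confidence.
    hits.sort(key=lambda hit: (len(hit[1]), hit[0].confidence), reverse=True)
    return hits


entries = [
    Entry("repo-a", "Export to CSV", "Writes tabular data to CSV files", 0.9, True),
    Entry("repo-b", "Export to CSV", "Draft claim", 0.8, False),
    Entry("repo-c", "Parse YAML", "Loads YAML config files", 0.7, True),
]
results = search(entries, "export csv files")
```

Returning the matched terms alongside each entry gives the UI enough information to show why a result appeared.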

Milestone 6: API Completeness for Agents

Goal: support programmatic consumers cleanly.

Deliverables:

  • GET /repos
  • POST /repos
  • GET /repos/{id}
  • POST /repos/{id}/analysis-runs
  • GET /repos/{id}/analysis-runs/{run_id}
  • GET /repos/{id}/ability-map
  • GET /abilities
  • GET /capabilities
  • GET /search?q=...
  • OpenAPI examples

Acceptance criteria:

  • API covers repository registration, analysis, search, and inspection
  • Responses are stable enough for agent/tooling integration
  • OpenAPI docs describe all MVP endpoints

6.1 Implemented Status Checkpoint

Status date: 2026-04-26

Current implementation baseline:

  • Milestone 0: implemented. FastAPI app, SQLite migrations, settings, health endpoint, README development flow, and pytest harness are in place.
  • Milestone 1: implemented. Repository CRUD, manual ability/capability/feature/evidence CRUD, ability-map API, and server-rendered repository profile UI are in place.
  • Milestone 2: implemented for local paths and Git URLs. Registration can import metadata, analysis records snapshots and observed facts, and failures are captured on analysis runs.
  • Milestone 3: implemented for deterministic extraction plus optional LLM-assisted extraction. Analysis stores content chunks, source-linked candidates, candidate evidence, confidence scores, and confidence labels.
  • Milestone 4: implemented. Candidate approval, rejection, editing, relinking, merging, review decisions, and indexed repository publication are supported through API and UI paths.
  • Milestone 5: partially implemented. Text search, filters, search UI, ability-map drill-down, and evidence/source context are implemented. pgvector-backed semantic search remains future work.
  • Milestone 6: implemented for the MVP and review workflow. Agent-facing endpoints have typed OpenAPI response schemas, examples, tags, and docs smoke coverage.

Use case coverage status:

| ID | Use Case | Implementation Status | E2E Coverage Status |
|---|---|---|---|
| UC-01 | Register Git Repository | Implemented through API and UI. | Covered by API and UI registration loops. |
| UC-02 | Import Repository Metadata | Implemented from repository files when name/description are omitted. | Covered by API and service metadata tests. |
| UC-03 | Analyze Repository Structure | Implemented by deterministic scanner and analysis runs. | Covered by API, service, scanner, and UI analysis loops. |
| UC-04 | Extract Candidate Abilities | Implemented by deterministic generator and optional LLM mapper. | Covered by API/service analysis loops and LLM extraction tests. |
| UC-05 | Extract Candidate Capabilities | Implemented by deterministic generator and optional LLM mapper. | Covered by API/service analysis loops and LLM extraction tests. |
| UC-06 | Extract Candidate Features | Implemented with detected interfaces, languages, frameworks, docs, tests, and manifests. | Covered by API/service analysis loops plus source-linked fixture e2e assertions. |
| UC-07 | Link Features to Code Locations | Implemented through feature locations and source references. | Covered by service approval tests and API e2e assertions for source paths/lines. |
| UC-08 | Attach Evidence to Capabilities | Implemented for candidate and approved evidence. | Covered by API/UI review, manual registry tests, and source-linked approved evidence e2e assertions. |
| UC-09 | Review and Approve Analysis | Implemented through approve, edit, reject, relink, merge, and review decisions. | Covered by API/service/UI review tests. |
| UC-10 | Search Repositories by Need | Implemented with text search and structured filters. | Covered by API/service/UI search tests. Semantic search remains future work. |
| UC-11 | Inspect Repository Ability Map | Implemented through API and UI profile drill-down. | Covered by API/service/UI ability-map tests. |
| UC-12 | Compare Repositories | Implemented as a read-only API comparison over approved ability maps. | Covered by API e2e comparison test. |
| UC-13 | Detect Capability Gaps | Implemented as a read-only API gap report over desired capabilities and approved maps. | Covered by API e2e gap-analysis test. |
| UC-14 | Expose Registry via API | Implemented for MVP plus review workflow. | Covered by API contract, OpenAPI, and docs smoke tests. |
| UC-15 | Update Registry After Repo Change | Partially implemented by rerunning analysis; no explicit diff/change-review workflow yet. | Covered for rerun behavior by API e2e: second analysis records new candidates without corrupting approved profile. |
| UC-16 | Export Registry Entry | Implemented as YAML export for approved registry entries. | Covered by API e2e export test. |

Immediate production-readiness test focus:

  1. If UC-15 becomes a production priority, add an explicit diff/change-review model instead of relying only on rerun analysis.
  2. Broaden fixture coverage over time for README-only, Python CLI, FastAPI, JavaScript/TypeScript, tests/examples, and weak-doc repositories.
  3. Add richer UI affordances for comparison, gap analysis, and export if these discovery endpoints become curator-facing workflows.

7. Initial Database Shape

Start with tables for:

  • repositories
  • repository_snapshots
  • analysis_runs
  • observed_facts
  • source_references
  • candidate_abilities
  • candidate_capabilities
  • candidate_features
  • candidate_evidence
  • candidate_links
  • approved_abilities
  • approved_capabilities
  • approved_features
  • approved_evidence
  • approved_links
  • review_decisions
  • content_chunks
  • embeddings

Use status fields consistently:

registered
ingesting
analyzing
analysis_failed
analyzed
reviewing
indexed
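
Using status fields consistently is easier when transitions are checked in one place. The sketch below encodes one plausible transition table over the statuses listed above; the allowed edges are an assumption (the workplan lists the statuses but not the transitions), and `transition` is a hypothetical helper.

```python
# Assumed transition table; only the status names come from the workplan.
ALLOWED_TRANSITIONS = {
    "registered": {"ingesting"},
    "ingesting": {"analyzing", "analysis_failed"},
    "analyzing": {"analyzed", "analysis_failed"},
    "analysis_failed": {"ingesting"},                   # retry after a failure
    "analyzed": {"reviewing", "indexed", "ingesting"},  # review, publish, or re-analyze
    "reviewing": {"indexed", "ingesting"},
    "indexed": {"ingesting"},                           # re-analysis after a repo change
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal status change: {current} -> {target}")
    return target


status = "registered"
for step in ("ingesting", "analyzing", "analyzed", "reviewing", "indexed"):
    status = transition(status, step)
```

Routing every status change through a single function like this keeps a failed run from silently overwriting an `indexed` profile.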

8. Analyzer v0.1 Strategy

The first analyzer should be intentionally modest.

Deterministic scan:

  • Identify repo root metadata files
  • Identify docs, examples, tests, package manifests, API specs, config files
  • Detect languages from extensions and package files
  • Detect common frameworks from manifests
  • Detect likely API/CLI features using simple framework-specific scanners
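
A deterministic scan along these lines can be sketched as pure functions over a list of file paths, which also makes it easy to test against fixture repositories. The extension table, manifest names, and classification rules below are illustrative assumptions, not the scanner's actual rules.

```python
from collections import Counter
from pathlib import PurePosixPath

# Illustrative lookup tables; a real scanner would cover many more cases.
LANGUAGE_BY_EXTENSION = {".py": "Python", ".ts": "TypeScript", ".js": "JavaScript", ".go": "Go"}
MANIFEST_NAMES = {"pyproject.toml", "package.json", "go.mod", "Cargo.toml"}


def classify(path: str) -> str:
    p = PurePosixPath(path)
    directories = {part.lower() for part in p.parts[:-1]}
    name = p.name.lower()
    if p.name in MANIFEST_NAMES:
        return "manifest"
    if name.startswith("readme") or "docs" in directories:
        return "docs"
    if "tests" in directories or name.startswith("test_"):
        return "test"
    if "examples" in directories:
        return "example"
    return "source" if p.suffix in LANGUAGE_BY_EXTENSION else "other"


def scan(paths: list[str]) -> dict:
    kinds = {path: classify(path) for path in sorted(paths)}
    suffixes = (PurePosixPath(path).suffix for path in paths)
    languages = Counter(LANGUAGE_BY_EXTENSION[s] for s in suffixes if s in LANGUAGE_BY_EXTENSION)
    return {"kinds": kinds, "languages": dict(languages)}


summary = scan(["README.md", "pyproject.toml", "app/main.py", "tests/test_main.py", "docs/usage.md"])
```

Because the result depends only on the path list, the same input always yields the same summary.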

Content extraction:

  • README and docs first
  • Examples and tests second
  • Selected source files only when they expose interfaces
  • Preserve path and line references
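
Preserving path and line references during extraction can be as simple as chunking by line and recording 1-based inclusive ranges. A minimal sketch, assuming a fixed chunk size and a hypothetical `chunk_lines` helper:

```python
def chunk_lines(path: str, text: str, max_lines: int = 40) -> list[dict]:
    """Split text into chunks that remember their 1-based, inclusive line range."""
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        chunks.append({
            "path": path,
            "start_line": start + 1,
            "end_line": start + len(block),
            "text": "\n".join(block),
        })
    return chunks


sample = "\n".join(f"line {i}" for i in range(1, 6))  # five lines of text
chunks = chunk_lines("README.md", sample, max_lines=2)
```

With five lines and `max_lines=2`, the chunks cover lines 1-2, 3-4, and 5-5, so any candidate derived from a chunk can point back to an exact location.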

LLM extraction:

  • Use separate prompts for abilities, capabilities, features, and evidence
  • Request structured JSON
  • Require source references for each candidate
  • Reject or mark speculative any candidate without supporting sources
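
The last two rules can be enforced after parsing the model's JSON output. The sketch below separates candidates that carry at least one source path from those that do not; the payload shape (`candidates`, `sources`, `path`) is an assumed format, not the documented one.

```python
import json


def triage_candidates(raw_json: str) -> tuple[list[dict], list[dict]]:
    """Split parsed candidates into sourced and speculative lists."""
    payload = json.loads(raw_json)
    sourced, speculative = [], []
    for candidate in payload.get("candidates", []):
        # Keep only references that actually name a file.
        refs = [ref for ref in candidate.get("sources", []) if ref.get("path")]
        if refs:
            sourced.append({**candidate, "sources": refs})
        else:
            speculative.append({**candidate, "sources": [], "speculative": True})
    return sourced, speculative


raw = json.dumps({
    "candidates": [
        {"name": "Export to CSV", "sources": [{"path": "README.md", "line": 42}]},
        {"name": "Real-time sync", "sources": []},
        {"name": "Plugin system"},
    ]
})
sourced, speculative = triage_candidates(raw)
```

Whether speculative candidates are dropped or shown to curators with a warning is a policy decision; the flag keeps both options open.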

Confidence scoring:

  • Start from the documented additive model
  • Normalize to 0.0-1.0
  • Store both numeric confidence and label
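
The scoring steps above might look like the following sketch. The factor names, weights, and label thresholds are placeholders, since the documented additive factors are not reproduced in this workplan; clamping is used here as one simple way to keep the result within 0.0-1.0.

```python
# Placeholder factors and weights; the documented additive model may differ.
FACTOR_WEIGHTS = {
    "documented_in_readme": 0.3,
    "has_tests": 0.3,
    "has_example": 0.2,
    "public_interface": 0.2,
    "multiple_sources": 0.2,
}


def score(factors: set[str]) -> tuple[float, str]:
    """Return (numeric confidence in 0.0-1.0, label) for the observed factors."""
    raw = sum(FACTOR_WEIGHTS.get(factor, 0.0) for factor in factors)
    value = round(min(max(raw, 0.0), 1.0), 2)  # clamp, then round for storage
    if value >= 0.75:      # assumed threshold
        label = "high"
    elif value >= 0.4:     # assumed threshold
        label = "medium"
    else:
        label = "low"
    return value, label


confidence, label = score({"documented_in_readme", "has_tests"})
```

Storing the label next to the number lets the review UI filter on a coarse level while the API still exposes the exact score.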

9. UI Workplan

Build application screens in this order:

  1. Repository list
  2. Repository registration
  3. Repository detail and analysis status
  4. Deterministic scan summary
  5. Candidate review tree
  6. Published repository profile
  7. Search

The UI should feel like an operational tool rather than a marketing site: dense, clear, review-focused, and optimized for repeated curator work.

10. Testing Strategy

Add tests around the highest-risk boundaries:

  • Database migrations and model relationships
  • Git URL validation
  • Scanner output for fixture repositories
  • Candidate graph validation
  • Review workflow transitions
  • Search result ranking and filtering
  • API contract tests for MVP endpoints

Create small fixture repositories for:

  • README-only repository
  • Python CLI repository
  • FastAPI repository
  • JavaScript/TypeScript package
  • Repository with tests and examples
  • Repository with weak or misleading docs

11. Key Risks and Mitigations

Extraction quality risk:

  • Require source references.
  • Keep candidates reviewable.
  • Separate observed facts from interpreted claims.

Over-complex ontology risk:

  • Keep v0.1 schema minimal.
  • Avoid enforcing deep taxonomy too early.

Search quality risk:

  • Combine relational filters, full-text search, and vector search.
  • Show why a result matched.

Operational complexity risk:

  • Start as a modular monolith.
  • Use simple jobs before adding worker infrastructure.

Trust risk:

  • Never publish unapproved claims as canonical truth.
  • Preserve analysis run history and review decisions.

12. Immediate Next Actions

Recommended next implementation sequence:

  1. Scaffold the FastAPI application, database migrations, and test harness.
  2. Implement the core schema for repositories, snapshots, analysis runs, observed facts, candidates, and approved entries.
  3. Add manual registry CRUD and ability-map API.
  4. Build a minimal repository list/profile UI.
  5. Add Git ingestion and deterministic scanning.
  6. Add candidate graph generation and review workflow.

The first meaningful demo should be:

Create a repository
Add or generate an ability map
Approve it
Search for a capability
Open the repository profile
Drill down to feature and evidence locations