# Repository Ability Registry Implementation Workplan

## 1. Documentation Review Summary

The wiki defines a coherent v0.1 product: a registry that turns Git repositories into reviewable, source-linked maps of:

```text
Ability -> Capability -> Feature -> Evidence -> Code location
```

The strongest architectural principle across the docs is:

```text
deterministic scanners establish observed facts
LLM-assisted extractors propose interpreted claims
humans or trusted agents approve registry truth
```

This should remain the core design constraint for implementation. The system should be conservative, explainable, reviewable, and source-linked rather than attempting fully automatic code understanding.

## 2. MVP Scope

The first version should implement the core journey documented in the PRD, FRS, architecture sketch, and use-case catalog:

```text
Register repository
Analyze repository
Generate candidate ability/capability/feature/evidence map
Review and approve candidates
Publish registry profile
Search and inspect repositories
```

In scope for v0.1:

- Repository registration by Git URL
- Repository metadata and snapshot tracking
- Deterministic repository scan
- Candidate extraction for abilities, capabilities, features, and evidence
- Human review actions: edit, approve, reject, merge, relink
- Inspectable ability map
- Natural-language search over approved registry entries
- API access for repositories, ability maps, capabilities, and search

Out of scope for v0.1:

- Continuous GitHub app integration
- Full static code understanding
- Advanced ontology enforcement
- Distributed indexing
- Benchmark execution
- Marketplace features
- Complex access control
- Automated truth claims without review

## 3. Recommended Technical Baseline

Use a pragmatic stack that keeps the analyzer and registry easy to evolve:

- Backend: Python FastAPI
- Database: PostgreSQL
- Semantic search: pgvector inside PostgreSQL
- Worker: simple background jobs first; graduate to RQ or Celery when needed
- Git access: subprocess git or GitPython
- Frontend: React/Next.js, or server-rendered FastAPI templates for the earliest prototype
- LLM extraction: provider-abstracted interface
- Local artifact storage: filesystem under an application data directory

For the first implementation pass, prefer a modular monolith over distributed services. Keep clean module boundaries internally, but avoid operational complexity until the product loop is proven.
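
The "provider-abstracted interface" for LLM extraction could be expressed as a structural type that the extraction module depends on, so vendors can be swapped and tests can run offline. A minimal sketch, assuming Python's `typing.Protocol`; `ExtractionProvider`, `complete_json`, `StubProvider`, and `extract_capabilities` are hypothetical names, and no real vendor SDK is shown.

```python
import json
from typing import Protocol


class ExtractionProvider(Protocol):
    """Hypothetical provider-neutral interface for LLM-assisted extraction."""

    def complete_json(self, prompt: str) -> str:
        """Return the model's raw reply, which should be a JSON document."""
        ...


class StubProvider:
    """Offline stand-in for tests: returns the same canned JSON for every prompt."""

    def __init__(self, canned: dict) -> None:
        self._canned = canned

    def complete_json(self, prompt: str) -> str:
        return json.dumps(self._canned)


def extract_capabilities(provider: ExtractionProvider, readme_text: str) -> list[dict]:
    """Ask any provider for capability candidates and parse its JSON reply."""
    prompt = (
        "List the capabilities described in the text below as JSON of the form "
        '{"capabilities": [{"name": "...", "source_path": "..."}]}.\n\n'
        + readme_text
    )
    reply = json.loads(provider.complete_json(prompt))
    return list(reply.get("capabilities", []))
```

A production provider would wrap a vendor SDK behind the same `complete_json` method; keeping prompt construction and parsing provider-agnostic is what lets the stub stand in during tests.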

## 4. Core Domain Model

Implement these entities first:

- Repository
- RepositorySnapshot
- AnalysisRun
- ObservedFact
- CandidateAbility
- CandidateCapability
- CandidateFeature
- CandidateEvidence
- ApprovedAbility
- ApprovedCapability
- ApprovedFeature
- ApprovedEvidence
- SourceReference
- ReviewDecision

The model should preserve a clear distinction between observed facts and interpreted claims.

Observed facts include things like:

- File paths
- Documentation files
- Test files
- Package manifests
- API routes
- CLI commands
- Public modules/functions
- Detected languages/frameworks

Interpreted claims include:

- Ability names and descriptions
- Capability names and descriptions
- Feature-to-capability links
- Evidence-to-capability links
- Confidence scores
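
One way to keep observed facts and interpreted claims apart is to give them separate types, so a claim can only point at sources and never masquerade as a fact. A minimal sketch using standard-library dataclasses; `SourceReference`, `ObservedFact`, `InterpretedClaim`, and the `status` values are illustrative assumptions, not the final schema.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass(frozen=True)
class SourceReference:
    """A file path plus an optional inclusive line range."""
    path: str
    start_line: Optional[int] = None
    end_line: Optional[int] = None


@dataclass(frozen=True)
class ObservedFact:
    """Something a deterministic scanner saw directly; no interpretation involved."""
    kind: str  # e.g. "test_file", "package_manifest", "api_route"
    value: str
    source: SourceReference


ClaimKind = Literal["ability", "capability", "feature", "evidence"]


@dataclass
class InterpretedClaim:
    """A proposed claim; it stays a candidate until a reviewer approves it."""
    kind: ClaimKind
    name: str
    description: str
    confidence: float
    sources: list[SourceReference] = field(default_factory=list)
    status: Literal["candidate", "approved", "rejected"] = "candidate"

    @property
    def is_speculative(self) -> bool:
        # Without a supporting source reference the claim cannot be checked.
        return not self.sources
```

In the database these types would map onto separate tables (for example `observed_facts` versus the `candidate_*` and `approved_*` tables) rather than one table with a type flag.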

## 5. Suggested Module Boundaries

Use the architecture sketch's boundaries as implementation modules:

- `repo_ingestion`: validate Git URLs, clone/fetch repos, resolve branch/commit
- `repo_scanning`: deterministic file tree, language, docs, tests, examples, API/CLI detection
- `content_indexing`: text extraction, chunking, source references, embeddings
- `llm_extraction`: prompt orchestration and structured candidate generation
- `candidate_graph`: build and validate ability/capability/feature/evidence relationships
- `review_workflow`: edit, approve, reject, merge, relink, publish
- `registry_query`: search, filters, profile retrieval, ability-map assembly
- `web_api`: HTTP endpoints and request/response schemas
- `web_ui`: registration, analysis, review, profile, and search screens

## 6. Milestones

### Milestone 0: Project Foundation

Goal: establish the application skeleton and development path.

Deliverables:

- Backend app skeleton
- Database migration setup
- Configuration system
- Local development instructions
- Basic test harness
- Health endpoint

Acceptance criteria:

- App starts locally
- Tests run locally
- Database migrations apply cleanly

### Milestone 1: Manual Registry

Goal: prove the core data model and inspection experience before automation.

Deliverables:

- Repository CRUD
- Manual ability/capability/feature/evidence CRUD
- Ability map endpoint
- Basic repository profile UI

Acceptance criteria:

- A user can create a repository profile by hand
- The UI displays `Ability -> Capability -> Feature -> Evidence`
- The API returns the same map as structured JSON
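
Milestone 1's ability map endpoint mostly reduces to nesting flat rows. A sketch, assuming each row is a plain dict with hypothetical `id`, `name`, and parent-key fields (`ability_id`, `capability_id`, `feature_id`); one function like this could back both the UI tree and the JSON returned by `GET /repos/{id}/ability-map`, which keeps the two views consistent as the acceptance criteria require.

```python
import json
from collections import defaultdict


def build_ability_map(abilities, capabilities, features, evidence):
    """Nest flat rows into Ability -> Capability -> Feature -> Evidence.

    Every row is a dict with "id" and "name"; child rows carry a parent key
    (ability_id, capability_id, feature_id), and evidence rows carry a "path".
    """
    evidence_by_feature = defaultdict(list)
    for ev in evidence:
        evidence_by_feature[ev["feature_id"]].append({"name": ev["name"], "path": ev["path"]})

    features_by_capability = defaultdict(list)
    for feat in features:
        features_by_capability[feat["capability_id"]].append(
            {"name": feat["name"], "evidence": evidence_by_feature[feat["id"]]}
        )

    capabilities_by_ability = defaultdict(list)
    for cap in capabilities:
        capabilities_by_ability[cap["ability_id"]].append(
            {"name": cap["name"], "features": features_by_capability[cap["id"]]}
        )

    return {
        "abilities": [
            {"name": ab["name"], "capabilities": capabilities_by_ability[ab["id"]]}
            for ab in abilities
        ]
    }


if __name__ == "__main__":
    demo = build_ability_map(
        [{"id": 1, "name": "Data ingestion"}],
        [{"id": 10, "ability_id": 1, "name": "Parse CSV"}],
        [{"id": 100, "capability_id": 10, "name": "read_csv()"}],
        [{"id": 1000, "feature_id": 100, "name": "unit test", "path": "tests/test_csv.py"}],
    )
    print(json.dumps(demo, indent=2))
```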

### Milestone 2: Git Ingestion and Deterministic Scanner

Goal: establish trustworthy observed facts from repository contents.

Deliverables:

- Git URL validation
- Clone/fetch and checkout
- Snapshot record with branch and commit hash
- File tree scan
- README/docs/examples/tests/package manifest detection
- Basic language/framework/interface detection
- Analysis run status tracking

Acceptance criteria:

- A public Git repository can be registered and analyzed
- The system records a snapshot and deterministic scan summary
- Analysis failures are visible without corrupting prior data
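
Git URL validation can be a purely syntactic gate in front of `git clone`. The sketch below assumes a hypothetical policy that accepts only HTTPS remotes and scp-style `git@host:owner/repo` remotes; the function name and exact rules are illustrative. Rejecting leading dashes and transports such as `file://` and `ext::` matters because the URL is eventually handed to a subprocess; passing `--` before the URL in the clone command is a further safeguard.

```python
import re
from urllib.parse import urlparse

# Hypothetical policy: HTTPS remotes and scp-style SSH remotes only.
_SCP_STYLE = re.compile(r"^git@[A-Za-z0-9.-]+:[\w.-]+/[\w.-]+$")


def validate_git_url(url: str) -> bool:
    """Syntactic check only; reachability and visibility are confirmed at clone time."""
    url = url.strip()
    if not url or url.startswith("-"):
        # A leading dash could be parsed by `git clone` as a command-line option.
        return False
    if _SCP_STYLE.match(url):
        return True
    try:
        parsed = urlparse(url)
    except ValueError:
        return False  # e.g. malformed bracketed IPv6 hosts
    if parsed.scheme != "https" or not parsed.hostname:
        # Rejects file://, http://, ext:: transports and bare local paths.
        return False
    if parsed.username or parsed.password or parsed.query or parsed.fragment:
        # Embedded credentials or query strings are not expected for public repositories.
        return False
    # Require at least an owner and a repository segment, e.g. /owner/repo.git
    segments = [s for s in parsed.path.split("/") if s]
    return len(segments) >= 2
```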

### Milestone 3: Reviewable Candidate Graph

Goal: generate candidate registry entries from deterministic facts and extracted content.

Deliverables:

- Content extraction from README, docs, examples, tests, package metadata, and selected source files
- Source references with file paths and line ranges where possible
- Candidate ability generation
- Candidate capability generation
- Candidate feature generation
- Candidate evidence detection
- Confidence scoring using the documented additive factors
- Candidate graph endpoint and UI

Acceptance criteria:

- Analysis produces candidates with source references and confidence
- Candidates distinguish observed facts from interpreted claims
- Candidate output is explainable enough for curator review

### Milestone 4: Review and Approval Workflow

Goal: turn candidates into canonical registry entries.

Deliverables:

- Approve/reject candidate entries
- Edit names, descriptions, confidence, and relationships
- Merge duplicate abilities/capabilities/features
- Relink capabilities, features, and evidence
- Publish approved repository profile
- Persist review decisions

Acceptance criteria:

- A curator can correct and approve an analysis result
- Only approved entries appear in canonical search/profile views
- Repository status moves from `analyzed` through `reviewing` to `indexed` when the approved profile is published
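
Merging duplicates is the review action most likely to lose information. A sketch, assuming hypothetical `Candidate` and `ReviewDecision` shapes: the kept candidate absorbs the duplicates' source references, duplicates are marked `merged` rather than deleted, and the returned decision record is what would be persisted to `review_decisions`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Candidate:
    id: str
    name: str
    sources: list[str]
    status: str = "candidate"
    merged_into: Optional[str] = None


@dataclass
class ReviewDecision:
    action: str
    reviewer: str
    target_ids: list[str]
    note: str = ""
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def merge_candidates(keep: Candidate, duplicates: list[Candidate], reviewer: str) -> ReviewDecision:
    """Fold duplicates into `keep` without discarding their evidence or history."""
    for dup in duplicates:
        for src in dup.sources:
            if src not in keep.sources:
                keep.sources.append(src)  # order-preserving union of source references
        dup.status = "merged"  # kept for audit instead of being deleted
        dup.merged_into = keep.id
    return ReviewDecision("merge", reviewer, [keep.id] + [d.id for d in duplicates])
```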

### Milestone 5: Search and Inspection

Goal: make the registry useful for discovery.

Deliverables:

- Text search over repositories, abilities, capabilities, and descriptions
- Semantic search with pgvector
- Search filters for language, framework, and ability/capability presence
- Search UI
- Repository profile drill-down UI
- Code/evidence links from features and capabilities

Acceptance criteria:

- A user can search by need using natural language
- Results show repository, matching ability/capability, confidence, and evidence level
- A user can drill from a search result into the ability map and code/evidence references
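
Search here combines relational filters, keyword matching, and vector similarity, and results should say why they matched. The sketch does all three in plain Python over toy 2-D embeddings so the ranking logic is easy to test; the `Entry` shape and the 0.4/0.6 weights are assumptions. In PostgreSQL the same pieces would map onto `WHERE` clauses, full-text search, and pgvector's `<=>` cosine-distance operator.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """An approved capability row with a precomputed embedding."""
    repo: str
    capability: str
    language: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(entries, query_text, query_vec, language: Optional[str] = None,
           w_text: float = 0.4, w_vec: float = 0.6):
    """Filter relationally, then blend keyword overlap with vector similarity."""
    terms = set(query_text.lower().split())
    results = []
    for e in entries:
        if language and e.language != language:
            continue  # the relational filter runs before any scoring
        matched = sorted(terms & set(e.capability.lower().split()))
        text_score = len(matched) / len(terms) if terms else 0.0
        vec_score = cosine(query_vec, e.embedding)
        # Record human-readable reasons so the UI can show why a result matched.
        reasons = [f"keywords: {', '.join(matched)}"] if matched else []
        reasons.append(f"semantic similarity {vec_score:.2f}")
        score = round(w_text * text_score + w_vec * vec_score, 3)
        results.append((score, e.repo, e.capability, reasons))
    return sorted(results, key=lambda r: r[0], reverse=True)
```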

### Milestone 6: API Completeness for Agents

Goal: support programmatic consumers cleanly.

Deliverables:

- `GET /repos`
- `POST /repos`
- `GET /repos/{id}`
- `POST /repos/{id}/analysis-runs`
- `GET /repos/{id}/analysis-runs/{run_id}`
- `GET /repos/{id}/ability-map`
- `GET /abilities`
- `GET /capabilities`
- `GET /search?q=...`
- OpenAPI examples

Acceptance criteria:

- The API covers repository registration, analysis, search, and inspection
- Responses are stable enough for agent/tooling integration
- OpenAPI docs describe all MVP endpoints

## 7. Initial Database Shape

Start with tables for:

- `repositories`
- `repository_snapshots`
- `analysis_runs`
- `observed_facts`
- `source_references`
- `candidate_abilities`
- `candidate_capabilities`
- `candidate_features`
- `candidate_evidence`
- `candidate_links`
- `approved_abilities`
- `approved_capabilities`
- `approved_features`
- `approved_evidence`
- `approved_links`
- `review_decisions`
- `content_chunks`
- `embeddings`

Use status fields consistently:

```text
registered
ingesting
analyzing
analysis_failed
analyzed
reviewing
indexed
```
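
Status values stay consistent more easily when every transition is checked in one place. A sketch of a transition guard over the statuses listed above; the set of allowed transitions (including retry after `analysis_failed` and re-analysis from `indexed`) is an assumption to be confirmed against the review workflow.

```python
from enum import Enum


class RepoStatus(str, Enum):
    REGISTERED = "registered"
    INGESTING = "ingesting"
    ANALYZING = "analyzing"
    ANALYSIS_FAILED = "analysis_failed"
    ANALYZED = "analyzed"
    REVIEWING = "reviewing"
    INDEXED = "indexed"


# Hypothetical transition table; adjust once the review workflow is finalized.
ALLOWED = {
    RepoStatus.REGISTERED: {RepoStatus.INGESTING},
    RepoStatus.INGESTING: {RepoStatus.ANALYZING, RepoStatus.ANALYSIS_FAILED},
    RepoStatus.ANALYZING: {RepoStatus.ANALYZED, RepoStatus.ANALYSIS_FAILED},
    RepoStatus.ANALYSIS_FAILED: {RepoStatus.INGESTING},  # retry
    RepoStatus.ANALYZED: {RepoStatus.REVIEWING, RepoStatus.INGESTING},
    RepoStatus.REVIEWING: {RepoStatus.INDEXED, RepoStatus.ANALYZED},
    RepoStatus.INDEXED: {RepoStatus.INGESTING},  # re-analyze a newer snapshot
}


class InvalidTransition(ValueError):
    pass


def transition(current: RepoStatus, target: RepoStatus) -> RepoStatus:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```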

## 8. Analyzer v0.1 Strategy

The first analyzer should be intentionally modest.

Deterministic scan:

- Identify repo root metadata files
- Identify docs, examples, tests, package manifests, API specs, and config files
- Detect languages from extensions and package files
- Detect common frameworks from manifests
- Detect likely API/CLI features using simple framework-specific scanners
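
A first deterministic scan can work from the list of tracked paths alone. The sketch classifies paths into documentation, tests, examples, package manifests, and source, and counts languages by file extension; the lookup tables and precedence rules are hypothetical and intentionally small. Framework detection from manifest contents is not shown.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical, deliberately small lookup tables.
EXTENSION_LANGUAGES = {".py": "Python", ".ts": "TypeScript", ".js": "JavaScript",
                       ".go": "Go", ".rs": "Rust", ".java": "Java"}
MANIFESTS = {"pyproject.toml", "setup.py", "requirements.txt", "package.json",
             "go.mod", "Cargo.toml", "pom.xml"}


def classify(path: str) -> str:
    """Assign one category per path; earlier rules take precedence."""
    p = PurePosixPath(path)
    dirs = {part.lower() for part in p.parts[:-1]}
    name = p.name.lower()
    if p.name in MANIFESTS:
        return "package_manifest"
    if name.startswith("readme") or "docs" in dirs or p.suffix.lower() in {".md", ".rst"}:
        return "documentation"
    if dirs & {"test", "tests"} or name.startswith("test_") or ".test." in name or ".spec." in name:
        return "test"
    if dirs & {"example", "examples"}:
        return "example"
    return "source"


def scan_paths(paths: list[str]) -> dict:
    """Group paths by category and count languages by file extension."""
    categories: dict[str, list[str]] = {}
    languages: Counter = Counter()
    for path in paths:
        categories.setdefault(classify(path), []).append(path)
        lang = EXTENSION_LANGUAGES.get(PurePosixPath(path).suffix.lower())
        if lang:
            languages[lang] += 1
    return {"categories": categories, "languages": dict(languages.most_common())}
```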

Content extraction:

- README and docs first
- Examples and tests second
- Selected source files only when they expose interfaces
- Preserve path and line references
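
To preserve path and line references through extraction, each chunk can carry its own line range. A minimal sketch, assuming fixed-size line windows with a small overlap (the 40-line and 5-line defaults and the `Chunk` type are arbitrary choices); heading-aware splitting for Markdown would be a reasonable refinement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_file(path: str, text: str, max_lines: int = 40, overlap: int = 5) -> list[Chunk]:
    """Split text into overlapping line windows that remember where they came from."""
    if max_lines <= overlap:
        raise ValueError("max_lines must exceed overlap")
    lines = text.splitlines()
    chunks = []
    start = 0
    while start < len(lines):
        end = min(start + max_lines, len(lines))
        chunks.append(Chunk(path, start + 1, end, "\n".join(lines[start:end])))
        if end == len(lines):
            break
        start = end - overlap  # step back so adjacent chunks share context
    return chunks
```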

LLM extraction:

- Use separate prompts for abilities, capabilities, features, and evidence
- Request structured JSON
- Require source references for each candidate
- Reject any candidate without supporting sources, or mark it speculative
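
The source-reference rule can be enforced mechanically after each extraction call. This sketch assumes a hypothetical reply shape `{"candidates": [{"name", "description", "sources"}]}` and checks each cited path against the files the deterministic scan observed: grounded candidates are accepted, ungrounded ones are kept but marked speculative, and malformed ones are rejected.

```python
import json

REQUIRED_FIELDS = ("name", "description", "sources")


def triage_candidates(raw_json: str, known_paths: set[str]) -> dict[str, list]:
    """Sort LLM-proposed candidates into accepted, speculative, and rejected."""
    result = {"accepted": [], "speculative": [], "rejected": []}
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return result  # an unparseable reply contributes no candidates
    candidates = payload.get("candidates", []) if isinstance(payload, dict) else []
    for cand in candidates:
        if not isinstance(cand, dict) or any(k not in cand for k in REQUIRED_FIELDS):
            result["rejected"].append(cand)  # malformed: missing required fields
            continue
        sources = cand["sources"] if isinstance(cand["sources"], list) else []
        # Keep only references that point at files the deterministic scan observed.
        grounded = [s for s in sources if isinstance(s, dict) and s.get("path") in known_paths]
        bucket = "accepted" if grounded else "speculative"
        result[bucket].append({**cand, "sources": grounded})
    return result
```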

Confidence scoring:

- Start from the documented additive model
- Normalize to `0.0-1.0`
- Store both the numeric confidence and its label
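
The documented additive model is not reproduced in this workplan, so the factor names, weights, and label thresholds below are placeholders. The sketch shows only the mechanics: sum integer weights for the factors present, normalize by the maximum possible total to land in `0.0-1.0`, and derive a label so both values can be stored.

```python
# Hypothetical factor weights; replace with the documented additive model.
FACTOR_WEIGHTS = {
    "readme_mention": 2,
    "docs_mention": 1,
    "example_usage": 2,
    "test_coverage": 3,
    "interface_detected": 2,
}
MAX_SCORE = sum(FACTOR_WEIGHTS.values())  # 10 with the weights above


def score_confidence(factors: set[str]) -> tuple[float, str]:
    """Return (normalized score, label) for the factors observed for a candidate."""
    unknown = factors - set(FACTOR_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    raw = sum(FACTOR_WEIGHTS[f] for f in factors)
    value = round(raw / MAX_SCORE, 2)  # normalized to 0.0-1.0
    # Hypothetical label thresholds.
    if value >= 0.7:
        label = "high"
    elif value >= 0.4:
        label = "medium"
    else:
        label = "low"
    return value, label
```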

## 9. UI Workplan

Build application screens in this order:

1. Repository list
2. Repository registration
3. Repository detail and analysis status
4. Deterministic scan summary
5. Candidate review tree
6. Published repository profile
7. Search

The UI should feel like an operational tool rather than a marketing site: dense, clear, review-focused, and optimized for repeated curator work.

## 10. Testing Strategy

Add tests around the highest-risk boundaries:

- Database migrations and model relationships
- Git URL validation
- Scanner output for fixture repositories
- Candidate graph validation
- Review workflow transitions
- Search result ranking and filtering
- API contract tests for MVP endpoints

Create small fixture repositories for:

- README-only repository
- Python CLI repository
- FastAPI repository
- JavaScript/TypeScript package
- Repository with tests and examples
- Repository with weak or misleading docs
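
Fixture repositories can be generated on demand rather than committed as nested Git repositories. A sketch, assuming fixtures are declared as hypothetical path-to-contents mappings (only two of the six listed fixtures are shown) and written into a temporary directory; `list_files` yields the kind of path list a scanner test could consume. Running `git init` and committing inside the directory would add snapshot metadata, but that step is left out here.

```python
import tempfile
from pathlib import Path

# Hypothetical fixture specs: relative path -> file contents.
FIXTURES = {
    "readme_only": {"README.md": "# Demo\nA README with no code.\n"},
    "python_cli": {
        "README.md": "# greet\nRun `greet NAME`.\n",
        "pyproject.toml": '[project]\nname = "greet"\n',
        "greet/cli.py": "import sys\n\ndef main():\n    print('hello', sys.argv[1])\n",
        "tests/test_cli.py": "def test_placeholder():\n    assert True\n",
    },
}


def materialize(name: str, root: Path) -> Path:
    """Write the named fixture under root and return the repository directory."""
    repo = root / name
    for rel_path, contents in FIXTURES[name].items():
        target = repo / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents, encoding="utf-8")
    return repo


def list_files(repo: Path) -> list[str]:
    """Sorted POSIX-style relative paths, suitable as scanner input."""
    return sorted(p.relative_to(repo).as_posix() for p in repo.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(list_files(materialize("python_cli", Path(tmp))))
```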

## 11. Key Risks and Mitigations

Extraction quality risk:

- Require source references.
- Keep candidates reviewable.
- Separate observed facts from interpreted claims.

Over-complex ontology risk:

- Keep the v0.1 schema minimal.
- Avoid enforcing a deep taxonomy too early.

Search quality risk:

- Combine relational filters, full-text search, and vector search.
- Show why a result matched.

Operational complexity risk:

- Start as a modular monolith.
- Use simple jobs before adding worker infrastructure.

Trust risk:

- Never publish unapproved claims as canonical truth.
- Preserve analysis run history and review decisions.

## 12. Immediate Next Actions

Recommended next implementation sequence:

1. Scaffold the FastAPI application, database migrations, and test harness.
2. Implement the core schema for repositories, snapshots, analysis runs, observed facts, candidates, and approved entries.
3. Add manual registry CRUD and the ability-map API.
4. Build a minimal repository list/profile UI.
5. Add Git ingestion and deterministic scanning.
6. Add candidate graph generation and review workflow.

The first meaningful demo should be:

```text
Create a repository
Add or generate an ability map
Approve it
Search for a capability
Open the repository profile
Drill down to feature and evidence locations
```