diff --git a/wiki/AbilityExtractionHeuristics.md b/wiki/AbilityExtractionHeuristics.md new file mode 100644 index 0000000..2c629fb --- /dev/null +++ b/wiki/AbilityExtractionHeuristics.md @@ -0,0 +1,822 @@ +AbilityExtractionHeuristics + +*How repositories will be explored* + +# Ability / Capability Extraction Heuristics v0.1 + +## Repository Ability Registry + +## 1. Purpose + +The extraction engine should answer: + +> “What is this repository useful for, what bounded behaviors does it provide, and where are those behaviors implemented?” + +It should produce **candidate entries**, not final truth. Human/agent review remains part of the workflow. + +--- + +# 2. Extraction Layers + +```text +Ability → usefulness / problem class +Capability → bounded behavior +Feature → concrete interface or implementation +Evidence → reason to believe the claim +``` + +--- + +# 3. Source Priority + +Not all repository signals are equally trustworthy. + +## Priority 1 — High Trust + +Use these first: + +```text +README +docs/ +examples/ +tests/ +API specs +CLI help +package metadata +``` + +These usually express intended usage. + +## Priority 2 — Medium Trust + +```text +module names +function names +class names +route names +config files +workflow files +``` + +These show implemented structure. + +## Priority 3 — Low Trust + +```text +comments +commit messages +dependency names +directory names alone +``` + +Useful as supporting signals, but not enough by themselves. + +--- + +# 4. Ability Extraction Heuristics + +Abilities describe **why the repository is useful**. + +## 4.1 Ability Signal Patterns + +Look for phrases like: + +```text +"helps users..." +"enables..." +"automates..." +"provides a way to..." +"used for..." +"designed to..." +"allows..." +"supports..." +``` + +Example: + +```text +"This library helps route incoming business emails." 
+``` + +Candidate ability: + +```yaml +name: Business Email Routing +``` + +--- + +## 4.2 Ability Naming Rule + +Ability names should be: + +```text +Domain + Problem Class +``` + +Good: + +```text +Business Email Routing +Document Classification +Invoice Data Extraction +Kubernetes Deployment Inspection +Agent Workflow Orchestration +``` + +Bad: + +```text +Fast API +Email Button +Classifier +Uses GPT +``` + +--- + +## 4.3 Ability Extraction Sources + +Best sources for abilities: + +```text +README intro +project tagline +docs overview +examples index +package description +``` + +Ability is usually described in prose, not code. + +--- + +## 4.4 Ability Confidence + +Assign confidence based on signal quality: + +```yaml +confidence: + high: + - explicitly stated in README/docs + - supported by examples + - supported by tests or APIs + + medium: + - inferred from multiple capabilities/features + - visible in examples but not stated + + low: + - inferred from names only + - based on dependencies or folder structure +``` + +--- + +# 5. Capability Extraction Heuristics + +Capabilities describe **bounded behavior**. 
+ +## 5.1 Capability Signal Patterns + +Look for verbs applied to objects: + +```text +classify email +extract invoice data +summarize document +validate schema +generate response +deploy service +monitor cluster +route ticket +convert format +``` + +Pattern: + +```text +Verb + Object +``` + +Examples: + +```text +Classify Email Intent +Extract Invoice Metadata +Generate Routing Explanation +Validate Repository Metadata +``` + +--- + +## 5.2 Capability Naming Rule + +Capability names should be: + +```text +Action Verb + Domain Object +``` + +Good: + +```text +Classify Incoming Email +Extract PDF Metadata +Generate API Client +Validate Kubernetes Manifest +Detect Broken Links +``` + +Bad: + +```text +Email Capability +Parser +Smart Document Stuff +Endpoint +``` + +--- + +## 5.3 Capability Sources + +Best sources: + +```text +API route names +CLI commands +public functions +service classes +tests +examples +docs tutorials +``` + +Capability is often visible in code and tests. + +--- + +## 5.4 Capability Boundary Rule + +A capability should be small enough to test. + +Good: + +```text +Extract invoice date from PDF +Classify email into intent category +Generate markdown from DOCX +``` + +Too broad: + +```text +Manage documents +Automate business +Understand everything +``` + +Too narrow: + +```text +Read config variable +Call helper function +Trim whitespace +``` + +Rule of thumb: + +> If you can write a meaningful acceptance test for it, it is probably a capability. + +--- + +# 6. Feature Extraction Heuristics + +Features describe **how the capability is exposed or implemented**. 
+ +## 6.1 Feature Signal Patterns + +Look for concrete affordances: + +```text +REST endpoint +CLI command +UI component +configuration option +SDK method +background job +database migration +import/export format +plugin hook +``` + +Examples: + +```yaml +features: + - name: /classify-email endpoint + - name: classify-email CLI command + - name: department-rules.yaml config + - name: JSON result export +``` + +--- + +## 6.2 Feature Naming Rule + +Feature names should be concrete and inspectable. + +Good: + +```text +POST /api/classify-email +classify-email CLI command +Rule Configuration File +PDF Upload Component +``` + +Bad: + +```text +AI routing +Document understanding +Magic extraction +``` + +--- + +# 7. Evidence Extraction Heuristics + +Evidence supports claims. + +## 7.1 Evidence Types + +```yaml +evidence_types: + - unit_test + - integration_test + - example + - demo + - benchmark + - documentation + - API specification + - production usage note + - manual review +``` + +--- + +## 7.2 Evidence Mapping + +Map evidence to the nearest capability. + +Example: + +```text +tests/test_email_classifier.py +``` + +Supports: + +```text +Classify Incoming Email +``` + +Example: + +```text +examples/invoice_extraction_demo.py +``` + +Supports: + +```text +Extract Invoice Metadata +``` + +--- + +## 7.3 Evidence Strength + +```yaml +evidence_strength: + strong: + - automated tests + - benchmark results + - executable examples + - integration tests + + medium: + - documentation + - tutorials + - screenshots + - sample output + + weak: + - README claim only + - comments + - filename hints +``` + +--- + +# 8. Ability–Capability–Feature Linking + +## 8.1 Link Rule + +```text +Ability explains why. +Capability explains what. +Feature explains how/where. 
+``` + +Example: + +```yaml +ability: + name: Business Email Routing + +capability: + name: Classify Incoming Email + supports: + - Business Email Routing + +feature: + name: POST /api/classify-email + implements: + - Classify Incoming Email +``` + +--- + +## 8.2 Linking Heuristic + +A capability supports an ability if: + +```text +Removing the capability would weaken the repository’s ability to deliver that usefulness. +``` + +A feature implements a capability if: + +```text +The feature is an interface, component, or code location through which the behavior is performed or exposed. +``` + +--- + +# 9. Confidence Scoring + +Use a simple additive model first. + +## 9.1 Candidate Confidence Factors + +```yaml +confidence_factors: + explicit_doc_claim: +0.30 + example_present: +0.20 + test_present: +0.25 + implementation_location_found: +0.15 + api_or_cli_exposed: +0.15 + multiple_source_agreement: +0.20 + inferred_from_names_only: -0.25 + no_evidence: -0.30 +``` + +Normalize to: + +```text +0.0 – 1.0 +``` + +## 9.2 Confidence Labels + +```yaml +0.80 - 1.00: high +0.50 - 0.79: medium +0.20 - 0.49: low +0.00 - 0.19: speculative +``` + +--- + +# 10. Classification Rules + +## 10.1 Is it an Ability? + +Ask: + +```text +Would a user search for this as a desired outcome? +``` + +If yes, probably ability. + +Example: + +```text +“I need document classification.” +``` + +Ability. + +--- + +## 10.2 Is it a Capability? + +Ask: + +```text +Can this behavior be tested with input/output expectations? +``` + +If yes, probably capability. + +Example: + +```text +“Classify document into category.” +``` + +Capability. + +--- + +## 10.3 Is it a Feature? + +Ask: + +```text +Is this a concrete interface, option, component, or implementation artifact? +``` + +If yes, probably feature. + +Example: + +```text +“POST /api/classify-document” +``` + +Feature. + +--- + +# 11. Anti-Heuristics + +Things the extractor should avoid. 
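The anti-heuristics in the subsections of this section could be enforced as a small lint pass over candidate names before they reach review. The sketch below is one possible shape, not a prescribed implementation: the function name, the warning strings, and the two word lists are assumptions for illustration, seeded only with the "Bad" examples this section gives.

```python
# Hypothetical word lists, seeded from the "Bad" examples in this section.
HYPE_TERMS = ("intelligent automation", "next-gen ai", "enterprise-ready transformation")
TECHNOLOGY_NAMES = {"fastapi", "openai"}


def anti_heuristic_warnings(kind, name):
    """Return a list of warnings for a candidate entry name.

    kind is "ability", "capability", or "feature"; an empty list means
    none of the checks below fired.
    """
    lowered = name.strip().lower()
    warnings = []
    # A dependency phrased as behavior, e.g. "Uses OpenAI".
    if kind == "capability" and lowered.startswith("uses "):
        warnings.append("dependency stated as capability")
    # A bare technology name offered as usefulness, e.g. "FastAPI".
    if kind == "ability" and lowered in TECHNOLOGY_NAMES:
        warnings.append("technology stated as ability")
    # Marketing vocabulary that cannot be tested.
    if any(term in lowered for term in HYPE_TERMS):
        warnings.append("vendor-hype term")
    return warnings
```

A warning here would only flag the candidate for the curator; whether to reject or rename it remains a review decision.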
+ +## 11.1 Do Not Treat Dependencies as Capabilities + +Bad: + +```yaml +capability: Uses OpenAI +``` + +Better: + +```yaml +feature: OpenAI provider integration +capability: Generate Text Summary +``` + +--- + +## 11.2 Do Not Treat Technology as Ability + +Bad: + +```yaml +ability: FastAPI +``` + +Better: + +```yaml +feature: FastAPI REST interface +``` + +--- + +## 11.3 Do Not Treat Internal Helpers as Capabilities + +Bad: + +```yaml +capability: Parse YAML Config +``` + +Unless parsing YAML config is a user-visible behavior. + +--- + +## 11.4 Avoid Vendor-Hype Terms + +Bad: + +```text +intelligent automation +next-gen AI +enterprise-ready transformation +``` + +Convert into testable candidates: + +```text +Classify Documents +Generate Reports +Route Tasks +``` + +--- + +# 12. Extraction Pipeline v0.1 + +## Step 1 — Repository Intake + +Collect: + +```text +README +docs +examples +tests +package files +source tree +API routes +CLI definitions +``` + +--- + +## Step 2 — Structural Summary + +Produce: + +```yaml +repository_summary: + languages: [] + frameworks: [] + interfaces: [] + docs_found: [] + tests_found: [] + examples_found: [] +``` + +--- + +## Step 3 — Candidate Ability Extraction + +From README/docs/package descriptions. + +Output: + +```yaml +candidate_abilities: + - name + - description + - confidence + - supporting_sources +``` + +--- + +## Step 4 — Candidate Capability Extraction + +From APIs, tests, examples, public modules. + +Output: + +```yaml +candidate_capabilities: + - name + - description + - inputs + - outputs + - linked_abilities + - confidence + - supporting_sources +``` + +--- + +## Step 5 — Candidate Feature Extraction + +From endpoints, CLI commands, config files, UI components, modules. 
+ +Output: + +```yaml +candidate_features: + - name + - type + - location + - linked_capabilities + - confidence +``` + +--- + +## Step 6 — Evidence Linking + +Attach evidence: + +```yaml +evidence: + - type + - path + - supports + - strength +``` + +--- + +## Step 7 — Review Package + +Generate a curator-friendly review view: + +```text +Ability + Capability + Feature + Evidence +``` + +--- + +# 13. Example Extraction + +Given README: + +```text +MailRouter helps companies automatically classify incoming emails and route them to the right department. +``` + +Given route: + +```text +POST /api/classify-email +``` + +Given test: + +```text +tests/test_email_classification.py +``` + +Output: + +```yaml +abilities: + - id: ability.business_email_routing + name: Business Email Routing + confidence: 0.9 + +capabilities: + - id: capability.classify_incoming_email + name: Classify Incoming Email + ability_refs: + - ability.business_email_routing + confidence: 0.85 + +features: + - id: feature.classify_email_endpoint + name: POST /api/classify-email + type: REST endpoint + location: src/routes/classify_email.py + capability_refs: + - capability.classify_incoming_email + +evidence: + - type: unit_test + path: tests/test_email_classification.py + supports: + - capability.classify_incoming_email + strength: strong +``` + +--- + +# 14. MVP Principle + +The extractor should be: + +```text +conservative +explainable +reviewable +source-linked +``` + +Not magical. + +The best first version is not the one that extracts everything. + +It is the one where the user says: + +> “Yes, I understand why the system proposed this.” + + +xxx diff --git a/wiki/ArchitectureSketch.md b/wiki/ArchitectureSketch.md new file mode 100644 index 0000000..2ef67aa --- /dev/null +++ b/wiki/ArchitectureSketch.md @@ -0,0 +1,575 @@ +ArchitectureSketch + +*Repository Ability Registry — Architecture v0.1* + +# Repository Ability Registry — Architecture v0.1 + +## 1. 
Core architectural idea + +Use a **pipeline + registry + inspection UI** architecture. + +```text +Git Repo + ↓ +Ingestion + ↓ +Analysis Pipeline + ↓ +Candidate Registry Entries + ↓ +Human Review / Approval + ↓ +Searchable Ability Registry + ↓ +Web UI / API / CLI +``` + +The system should not pretend the first analysis is truth. It produces **reviewable candidates**. + +--- + +## 2. Main components + +### 1. Registry Web App + +Purpose: + +* register repositories +* trigger analysis +* review results +* inspect ability maps +* search repos + +Could be a normal web app with: + +```text +Frontend + Backend API + Database +``` + +--- + +### 2. Git Ingestion Service + +Responsibilities: + +* clone/pull repositories +* checkout commit +* store snapshot metadata +* detect repo structure + +Outputs: + +```yaml +repo_snapshot: + repo_id + commit_hash + branch + file_tree + metadata_files +``` + +--- + +### 3. Repository Analyzer + +This is the heart. + +Pipeline stages: + +```text +Structure Scanner +Documentation Scanner +Interface Scanner +Test Scanner +LLM-Assisted Extractor +Evidence Linker +Confidence Scorer +``` + +Important: split deterministic scanners from LLM extraction. + +--- + +### 4. Candidate Registry Store + +Stores unapproved results: + +```text +candidate abilities +candidate capabilities +candidate features +candidate evidence +source references +confidence scores +``` + +These are editable. + +--- + +### 5. Curator Review Layer + +Allows a human or agent to: + +* accept +* reject +* rename +* merge +* relink +* approve + +This turns candidates into official registry entries. + +--- + +### 6. Search / Query Layer + +Supports: + +```text +natural language search +ability search +capability search +repo search +feature search +evidence search +``` + +Use both: + +* relational filters +* vector/semantic search + +--- + +### 7. 
Public/Agent API + +Expose structured access: + +```http +GET /repos +GET /repos/{id} +GET /abilities +GET /capabilities +GET /search?q=... +GET /repos/{id}/ability-map +``` + +Later this becomes MCP-friendly. + +--- + +## 3. Suggested storage architecture + +Use a hybrid model: + +### PostgreSQL + +For canonical structured data: + +```text +repositories +snapshots +abilities +capabilities +features +evidence +links +analysis_runs +review_status +``` + +### Vector index + +For semantic search over: + +```text +README chunks +docs chunks +ability descriptions +capability descriptions +feature descriptions +``` + +Start simple with `pgvector` inside PostgreSQL. + +### Object/file storage + +For: + +```text +repo snapshots +analysis artifacts +parsed file summaries +exported registry YAML +``` + +Local filesystem is fine for MVP. + +--- + +## 4. Data model sketch + +```text +Repository + has many Snapshots + has many AnalysisRuns + has many RegistryEntries + +Snapshot + commit_hash + branch + file_tree + extracted_documents + +AnalysisRun + snapshot_id + status + started_at + completed_at + model_used + analyzer_version + +Ability + repo_id + name + description + confidence + status + +Capability + repo_id + ability_id + name + description + inputs + outputs + confidence + status + +Feature + repo_id + capability_id + name + type + location + confidence + status + +Evidence + repo_id + capability_id + type + path + strength +``` + +--- + +## 5. Analysis pipeline + +### Step 1 — Clone / update repo + +```text +git clone +checkout commit +record commit hash +``` + +--- + +### Step 2 — Deterministic scan + +Detect: + +```text +languages +frameworks +package managers +entrypoints +routes +CLI commands +tests +docs +examples +config files +``` + +This should be deterministic code, not LLM. 
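A minimal sketch of what such a deterministic scan might look like, using only the Python standard library. The extension map, the directory-name conventions, and the function name `scan_repository` are assumptions made for illustration; a real scanner would also need to detect frameworks, package managers, entrypoints, routes, and CLI commands, which this sketch leaves out.

```python
from collections import Counter
from pathlib import Path

# Assumed extension-to-language map; a real scanner would cover far more.
LANGUAGE_BY_EXTENSION = {
    ".py": "Python",
    ".ts": "TypeScript",
    ".js": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
}

# Assumed directory-name conventions for docs, tests, and examples.
ROLE_BY_DIRECTORY = {
    "docs": "docs",
    "doc": "docs",
    "tests": "tests",
    "test": "tests",
    "examples": "examples",
}

CONFIG_SUFFIXES = {".yaml", ".yml", ".toml", ".ini"}
SKIPPED_DIRECTORIES = {".git", "node_modules", "__pycache__"}


def scan_repository(root):
    """Return observed facts about a checked-out repository; no inference."""
    root = Path(root)
    languages = Counter()
    found = {"docs": [], "tests": [], "examples": [], "config_files": []}

    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if not path.is_file() or SKIPPED_DIRECTORIES & set(relative.parts):
            continue
        suffix = path.suffix.lower()
        if suffix in LANGUAGE_BY_EXTENSION:
            languages[LANGUAGE_BY_EXTENSION[suffix]] += 1
        # The role comes from the first directory component that matches.
        role = next(
            (ROLE_BY_DIRECTORY[part] for part in relative.parts[:-1]
             if part in ROLE_BY_DIRECTORY),
            None,
        )
        if role:
            found[role].append(relative.as_posix())
        elif relative.name.lower().startswith("readme"):
            found["docs"].append(relative.as_posix())
        elif suffix in CONFIG_SUFFIXES:
            found["config_files"].append(relative.as_posix())

    return {"languages": dict(languages), **found}
```

Everything the function returns is a path or a count read directly from the file tree, so the same snapshot always yields the same result, which is the property this step asks for.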
+ +--- + +### Step 3 — Content chunking + +Create chunks from: + +```text +README +docs +examples +tests +API specs +selected source files +``` + +Each chunk keeps: + +```text +file path +line range +content type +semantic role +``` + +--- + +### Step 4 — LLM-assisted extraction + +Ask the model separately for: + +```text +candidate abilities +candidate capabilities +candidate features +evidence mappings +``` + +Do not use one giant prompt for everything. + +--- + +### Step 5 — Confidence scoring + +Combine: + +```text +LLM confidence +source quality +tests present +examples present +implementation found +multiple-source agreement +``` + +--- + +### Step 6 — Candidate graph generation + +Output: + +```text +Ability + → Capability + → Feature + → Evidence +``` + +--- + +### Step 7 — Review + +Only approved entries become canonical. + +--- + +## 6. Important design decision + +Separate: + +```text +Observed facts +``` + +from: + +```text +Interpreted claims +``` + +Example: + +### Observed fact + +```yaml +file: src/routes/classify.py +route: POST /classify +``` + +### Interpreted claim + +```yaml +capability: Classify Incoming Email +``` + +The first is source-derived. +The second is inferred. + +This distinction is crucial for trust. + +--- + +## 7. MVP technology suggestion + +A very pragmatic stack: + +```text +Backend: Python FastAPI +DB: PostgreSQL + pgvector +Worker: Celery/RQ or simple background jobs +Git analysis: GitPython / subprocess git +Frontend: React / Next.js or simple server-rendered app +LLM extraction: provider-abstracted interface +``` + +Given the broader agent/tooling context, Python is probably the best fit for the analyzer. + +--- + +## 8. 
Web UI structure + +### Page 1 — Repository List + +Shows: + +```text +name +description +status +last analyzed +top abilities +``` + +### Page 2 — Register Repository + +Input: + +```text +Git URL +branch +access token optional +``` + +### Page 3 — Analysis Run + +Shows: + +```text +scan progress +detected structure +candidate entries +warnings +``` + +### Page 4 — Review + +Tree view: + +```text +Ability + Capability + Feature + Evidence +``` + +Actions: + +```text +approve +edit +reject +merge +relink +``` + +### Page 5 — Repository Profile + +Final inspectable view. + +### Page 6 — Search + +Natural-language search with filters: + +```text +domain +language +framework +capability type +maturity +evidence strength +``` + +--- + +## 9. Internal API boundaries + +Keep clean module boundaries: + +```text +repo_ingestion +repo_scanning +content_indexing +llm_extraction +candidate_graph +review_workflow +registry_query +web_api +``` + +This prevents the analyzer from becoming a ball of mud. + +--- + +## 10. What to avoid in v0.1 + +Do not build yet: + +```text +continuous GitHub app integration +full static code analysis +full ontology engine +automatic truth claims +complex permission system +benchmark execution +marketplace functionality +``` + +MVP should prove: + +> Can we register repos, extract useful maps, review them, and search them? + +--- + +## 11. Recommended first implementation path + +### Milestone 1 — Manual Registry + +Create schema + UI where entries can be entered manually. + +### Milestone 2 — Deterministic Scanner + +Add repo clone + README/docs/tests/interface detection. + +### Milestone 3 — LLM Candidate Extraction + +Generate candidate ability/capability/feature graph. + +### Milestone 4 — Review Workflow + +Approve/edit/reject extracted entries. + +### Milestone 5 — Search + +Add semantic search over approved registry entries. + +--- + +## 12. Architecture principle + +> Deterministic scanners establish facts. +> LLMs propose interpretations. 
+> Humans or trusted agents approve registry truth. + +That should be the backbone. + + +xxx diff --git a/wiki/FunctionalRequirementsSpecV0.1.md b/wiki/FunctionalRequirementsSpecV0.1.md new file mode 100644 index 0000000..5e62843 --- /dev/null +++ b/wiki/FunctionalRequirementsSpecV0.1.md @@ -0,0 +1,439 @@ +FunctionalRequirementsSpecV0.1 + +*Repository Ability Registry Functionality* + +This **Functional Requirements Specification (FRS)** for the **Repository Ability Registry (v0.1)** covers only **externally observable system behavior** and is aligned with the architecture sketch and the PRD. + +--- + +# **Functional Requirements Specification (FRS)** + +## **Repository Ability Registry v0.1** + +--- + +# 1. **Scope** + +This FRS defines the externally visible functionality of a system that: + +* registers Git repositories +* analyzes repository content +* extracts and manages **abilities, capabilities, features, and evidence** +* enables **search and inspection** of repositories via UI and API + +--- + +# 2. **System Overview** + +The system provides: + +```text +Repository Registration +→ Analysis & Extraction +→ Review & Approval +→ Search & Inspection +→ API Access +``` + +--- + +# 3. **Actors** + +| Actor | Description | +| ------- | ------------------------------------------ | +| User | Registers repositories, searches, inspects | +| Curator | Reviews and approves extracted data | +| System | Performs automated analysis | +| Agent | External system interacting via API | + +--- + +# 4. **Functional Areas** + +--- + +# 4.1 Repository Management + +## FR-001 Register Repository + +The system shall allow a user to register a repository by providing a Git URL. + +## FR-002 Validate Repository Access + +The system shall validate that the repository is accessible. 
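One way FR-002 might be satisfied without cloning is to ask the remote for its refs with `git ls-remote`. The sketch below assumes the `git` command-line client is installed on the host; the function names and the 30-second timeout are illustrative choices, not part of the requirement.

```python
import os
import subprocess


def build_ls_remote_command(git_url):
    """Build the git invocation that lists a remote's refs without cloning."""
    return ["git", "ls-remote", git_url]


def is_repository_accessible(git_url, timeout_seconds=30):
    """Return True if `git ls-remote` can read the repository's refs."""
    # GIT_TERMINAL_PROMPT=0 makes git fail instead of prompting for credentials.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = subprocess.run(
            build_ls_remote_command(git_url),
            capture_output=True,
            env=env,
            timeout=timeout_seconds,
            check=False,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # git is not installed, or the remote did not answer in time.
        return False
    return result.returncode == 0
```

A caller would typically run this check at registration time and store the outcome with the repository status. A URL taken from user input should be validated before it is passed to `git`, which this sketch does not do.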
+ +## FR-003 Store Repository Metadata + +The system shall store: + +* repository URL +* name +* description (optional) +* branch (default: main) + +## FR-004 List Registered Repositories + +The system shall allow users to view all registered repositories. + +## FR-005 View Repository Details + +The system shall display repository metadata and status. + +--- + +# 4.2 Repository Analysis + +## FR-010 Trigger Analysis + +The system shall allow a user or system process to trigger repository analysis. + +## FR-011 Clone Repository + +The system shall retrieve repository contents for analysis. + +## FR-012 Detect Repository Structure + +The system shall identify: + +* programming languages +* frameworks (if detectable) +* directory structure +* documentation files +* test files +* example files + +## FR-013 Extract Content Sources + +The system shall extract text from: + +* README +* documentation +* examples +* selected source files +* test files + +## FR-014 Record Analysis Run + +The system shall record each analysis execution with: + +* timestamp +* repository snapshot reference +* status + +--- + +# 4.3 Ability Extraction + +## FR-020 Generate Candidate Abilities + +The system shall generate candidate abilities from repository content. + +## FR-021 Ability Attributes + +Each ability shall include: + +* name +* description +* confidence score +* source references + +## FR-022 List Candidate Abilities + +The system shall display candidate abilities for a repository. + +--- + +# 4.4 Capability Extraction + +## FR-030 Generate Candidate Capabilities + +The system shall generate candidate capabilities. + +## FR-031 Capability Attributes + +Each capability shall include: + +* name +* description +* inputs (if detectable) +* outputs (if detectable) +* linked ability references +* confidence score + +## FR-032 List Candidate Capabilities + +The system shall display candidate capabilities. 
+ +--- + +# 4.5 Feature Extraction + +## FR-040 Generate Candidate Features + +The system shall generate candidate features. + +## FR-041 Feature Attributes + +Each feature shall include: + +* name +* type (e.g. API, CLI, config) +* implementation location (file path) +* linked capability references +* confidence score + +## FR-042 List Candidate Features + +The system shall display candidate features. + +--- + +# 4.6 Evidence Handling + +## FR-050 Detect Evidence Sources + +The system shall identify evidence such as: + +* test files +* examples +* documentation + +## FR-051 Associate Evidence with Capabilities + +The system shall link evidence to relevant capabilities. + +## FR-052 Evidence Attributes + +Each evidence item shall include: + +* type +* file path or reference +* associated capability +* strength classification + +--- + +# 4.7 Review and Curation + +## FR-060 View Analysis Results + +The system shall allow users to view extracted abilities, capabilities, features, and evidence. + +## FR-061 Edit Entries + +The system shall allow users to: + +* modify names +* modify descriptions +* adjust relationships + +## FR-062 Accept Entries + +The system shall allow users to mark entries as approved. + +## FR-063 Reject Entries + +The system shall allow users to remove candidate entries. + +## FR-064 Merge Entries + +The system shall allow users to merge duplicate or overlapping entries. + +## FR-065 Persist Approved Entries + +The system shall store approved entries as canonical registry data. + +--- + +# 4.8 Search + +## FR-070 Natural Language Search + +The system shall allow users to search repositories using free-text queries. 
+ +## FR-071 Semantic Matching + +The system shall match queries to: + +* abilities +* capabilities +* repository descriptions + +## FR-072 Display Search Results + +Search results shall include: + +* repository name +* matching ability/capability +* confidence indicator + +## FR-073 Filter Results + +The system shall allow filtering by: + +* language +* framework (if available) +* ability/capability presence + +--- + +# 4.9 Repository Inspection + +## FR-080 View Repository Profile + +The system shall display an inspectable repository view. + +## FR-081 Display Ability Map + +The system shall display: + +```text +Ability → Capability → Feature → Code Location +``` + +## FR-082 Drill-down Navigation + +The system shall allow users to navigate: + +* from ability to capabilities +* from capability to features +* from feature to code location + +## FR-083 Display Evidence + +The system shall display evidence associated with capabilities. + +## FR-084 Display Confidence + +The system shall display confidence levels for abilities, capabilities, and features. + +--- + +# 4.10 API Access + +## FR-090 Repository API + +The system shall provide an API to retrieve repository metadata. + +## FR-091 Ability API + +The system shall provide an API to retrieve abilities. + +## FR-092 Capability API + +The system shall provide an API to retrieve capabilities. + +## FR-093 Search API + +The system shall provide an API for search queries. + +## FR-094 Inspection API + +The system shall provide an API to retrieve full ability maps. + +--- + +# 4.11 Updates + +## FR-100 Re-run Analysis + +The system shall allow re-analysis of a repository. + +## FR-101 Track Changes + +The system shall record differences between analysis runs. + +--- + +# 5. **Data Handling Requirements** + +## FR-110 Store Repository Snapshots + +The system shall maintain references to analyzed repository versions. + +## FR-111 Maintain Versioned Entries + +The system shall retain historical analysis results. 
+ +--- + +# 6. **Error Handling** + +## FR-120 Analysis Failure Handling + +The system shall report analysis failures without corrupting existing data. + +## FR-121 Partial Results + +The system shall allow partial extraction results to be stored and displayed. + +--- + +# 7. **Security & Access (MVP)** + +## FR-130 Basic Access Control + +The system shall support: + +* public repositories +* optional restricted access + +(No advanced role model required in v0.1) + +--- + +# 8. **User Interface Requirements** + +## FR-140 Repository Registration UI + +The system shall provide a form to register repositories. + +## FR-141 Analysis View UI + +The system shall display analysis progress and results. + +## FR-142 Review UI + +The system shall allow editing and approval of entries. + +## FR-143 Search UI + +The system shall provide a search interface. + +## FR-144 Inspection UI + +The system shall display the ability-capability-feature hierarchy. + +--- + +# 9. **Acceptance Criteria (MVP)** + +The system shall be considered functional when: + +1. A repository can be registered and analyzed +2. Candidate abilities, capabilities, and features are generated +3. A user can review and approve extracted entries +4. A user can search for repositories using natural language +5. A user can inspect a repository via an ability map +6. A user can navigate from abstraction to code location + +--- + +# 10. **Summary** + +The system provides: + +> A structured, inspectable, and searchable mapping from repository usefulness (abilities) to concrete implementation (features and code), supported by evidence and reviewable extraction. 
+ +--- + +## Next high-leverage step + +If you want to push toward implementation: + +👉 Define **API schema (OpenAPI spec)** +👉 Design **database schema in SQL (ready to implement)** +👉 Build **first analyzer prototype (step-by-step coding plan)** + + +xxx diff --git a/wiki/LandingPage.md b/wiki/LandingPage.md new file mode 100644 index 0000000..baa677c --- /dev/null +++ b/wiki/LandingPage.md @@ -0,0 +1,104 @@ +LandingPage + +*Introducing the Repository Ability Registry* + +# **Repository Ability Registry** + +## **Understand What Code Can Do. Instantly.** + +### *From raw repositories to real understanding.* + +--- + +## **Code is everywhere. Understanding it isn’t.** + +Every day, developers and architects face the same problem: + +* Repositories are hard to navigate +* READMEs are incomplete or misleading +* Useful functionality is hidden deep in code +* Reuse decisions rely on guesswork + +> You don’t need more code. +> You need to understand the code you already have. + +--- + +## **What if repositories could explain themselves?** + +The **Repository Ability Registry** transforms Git repositories into **structured, inspectable knowledge**. + +It reveals: + +* **Abilities** — what a repository is useful for +* **Capabilities** — what it can actually do +* **Features** — how those behaviors are implemented +* **Evidence** — why you can trust those claims + +--- + +## **How it works** + +### 🔗 **1. Register a Repository** + +Add any Git repository to the registry. + +--- + +### 🧠 **2. Analyze & Extract** + +The system scans code, documentation, tests, and interfaces to generate a structured map. + +--- + +### 🛠 **3. Review & Refine** + +You or your team validate and improve the extracted understanding. + +--- + +### 🔍 **4. 
Search & Inspect** + +Explore repositories through a powerful interface: + +* Search by need (“classify documents”, “route emails”) +* Compare repositories +* Drill down from abstraction to code + +--- + +## **From Structure to Understanding** + +Most tools show you: + +> files, folders, and dependencies + +We show you: + +> **what the repository can actually help you do—and where to find it** + +--- + +## **Built for Humans and Agents** + +Whether you’re: + +* a developer looking for reusable components +* an architect mapping system capabilities +* an AI agent navigating codebases + +The registry provides a shared, machine-readable understanding. + +--- + +## **The Missing Layer** + +Between raw code and real-world usefulness, there is a gap. + +The **Repository Ability Registry** fills it. + +--- + +### **Start turning code into capability.** + +xxx diff --git a/wiki/ProductRequirementsSpecificationV0.1.md b/wiki/ProductRequirementsSpecificationV0.1.md new file mode 100644 index 0000000..758d65c --- /dev/null +++ b/wiki/ProductRequirementsSpecificationV0.1.md @@ -0,0 +1,495 @@ +ProductRequirementsSpecV0.1 + +*Repository Ability Registry Requirements* + +# **Product Requirements Document (PRD)** + +## **Repository Ability Registry (v0.1)** + +--- + +# 1. **Purpose** + +The **Repository Ability Registry** provides a structured, inspectable orientation layer for code repositories by mapping them from **abilities → capabilities → features → implementation → evidence**. + +It enables developers, architects, and agents to: + +* understand what a repository is useful for +* inspect how it implements that usefulness +* compare repositories based on real functionality +* search repositories using natural language + +--- + +# 2. 
**Problem Statement** + +Modern code repositories suffer from **orientation opacity**: + +* usefulness is unclear from structure alone +* README files are incomplete or misleading +* functionality is scattered across code +* reuse decisions rely on tribal knowledge + +This results in: + +* wasted time exploring repositories +* duplicated work +* poor architectural decisions +* low reuse of existing assets + +--- + +# 3. **Goals** + +## 3.1 Primary Goal + +> Enable users to **register a repository and understand its usefulness within minutes** via an inspectable ability map. + +--- + +## 3.2 Secondary Goals + +* enable **natural language search** across repositories +* provide **traceability from abstraction to code** +* establish a **foundation for proof-of-ability integration** +* support both **human and agent interaction** + +--- + +## 3.3 Non-Goals (v0.1) + +* full automated benchmarking +* advanced ontology enforcement +* marketplace features +* monetization features +* full CI/CD integration + +--- + +# 4. **Users** + +## 4.1 Developer / Architect + +* needs to find reusable functionality +* needs to understand unfamiliar repos quickly + +## 4.2 Repository Owner + +* wants their repo to be understandable and discoverable + +## 4.3 Registry Curator + +* ensures quality and correctness of extracted data + +## 4.4 Agent / Automation + +* queries registry programmatically + +--- + +# 5. **Core Concepts** + +```text +Ability = why the repository is useful +Capability = what behavior it provides +Feature = how the behavior is exposed/implemented +Evidence = why the capability can be trusted +``` + +--- + +# 6. 
**Product Scope (MVP)** + +## 6.1 Included + +* repository registration +* repository analysis (semi-automated) +* ability/capability/feature extraction (assisted) +* human review and correction +* searchable registry +* inspectable repository view + +--- + +## 6.2 Excluded + +* automated full-code understanding engine +* distributed indexing +* deep static analysis +* real-time synchronization +* full dependency graph modeling + +--- + +# 7. **Functional Requirements** + +--- + +## 7.1 Repository Registration + +### FR-01 + +User can register a repository via Git URL. + +### FR-02 + +System validates repository access. + +### FR-03 + +System stores repository metadata. + +--- + +## 7.2 Repository Analysis + +### FR-04 + +System clones or accesses repository contents. + +### FR-05 + +System extracts: + +* languages +* structure +* interfaces (API, CLI, etc.) +* documentation files +* tests + +--- + +## 7.3 Ability Extraction + +### FR-06 + +System generates candidate abilities based on: + +* README +* documentation +* examples + +### FR-07 + +Each ability includes: + +* name +* description +* confidence +* source references + +--- + +## 7.4 Capability Extraction + +### FR-08 + +System identifies candidate capabilities from: + +* modules +* APIs +* tests +* code structure + +### FR-09 + +Capabilities include: + +* inputs +* outputs +* linked abilities +* description + +--- + +## 7.5 Feature Extraction + +### FR-10 + +System identifies concrete features: + +* endpoints +* CLI commands +* modules +* configuration interfaces + +### FR-11 + +Each feature includes: + +* type +* implementation location +* linked capability + +--- + +## 7.6 Evidence Linking + +### FR-12 + +System identifies potential evidence: + +* tests +* examples +* benchmarks +* docs + +### FR-13 + +User can manually attach evidence to capabilities. + +--- + +## 7.7 Review and Curation + +### FR-14 + +User can review extracted abilities, capabilities, features. 
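
The entries a reviewer sees under FR-14 carry the fields listed in FR-07, FR-09, and FR-11. The sketch below models them as Python dataclasses and renders them as an indented outline for review. The class names, the `review_outline` helper, the float confidence, and the sample descriptions are all assumptions made for illustration; the requirements do not prescribe an implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Ability:
    # Fields from FR-07; a float confidence is an assumption.
    name: str
    description: str
    confidence: float
    source_refs: list[str] = field(default_factory=list)


@dataclass
class Capability:
    # Fields from FR-09.
    name: str
    description: str
    inputs: list[str]
    outputs: list[str]
    ability_refs: list[str]


@dataclass
class Feature:
    # Fields from FR-11; references are stored by capability name.
    name: str
    type: str
    location: str
    capability_refs: list[str] = field(default_factory=list)


def review_outline(abilities, capabilities, features):
    """Return indented lines: each ability, its capabilities, their features."""
    lines = []
    for ability in abilities:
        lines.append(f"{ability.name} (confidence {ability.confidence:.2f})")
        for cap in capabilities:
            if ability.name not in cap.ability_refs:
                continue
            lines.append(f"  {cap.name}")
            for feat in features:
                if cap.name in feat.capability_refs:
                    lines.append(f"    {feat.name} [{feat.type}] {feat.location}")
    return lines


# Illustrative sample data only.
abilities = [Ability("Business Email Routing", "Routes inbound business email.",
                     0.82, ["README.md"])]
capabilities = [Capability("Email Intent Classification",
                           "Assigns an intent category to an email.",
                           ["email subject", "email body"],
                           ["intent category", "confidence"],
                           ["Business Email Routing"])]
features = [Feature("Classify Email Endpoint", "REST endpoint",
                    "src/api/routes/classify.py",
                    ["Email Intent Classification"])]

outline = review_outline(abilities, capabilities, features)
print("\n".join(outline))
```

Linking by name keeps the sketch short; a real registry would more likely link by the `id` fields.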
 + +### FR-15 + +User can edit, delete, or merge entries. + +### FR-16 + +User can approve registry entry. + +--- + +## 7.8 Search + +### FR-17 + +User can search using natural language. + +### FR-18 + +System maps queries to abilities/capabilities. + +### FR-19 + +Search results include: + +* repository name +* matching ability +* matching capability +* confidence + +--- + +## 7.9 Inspection UI + +### FR-20 + +User can view repository profile. + +### FR-21 + +UI shows hierarchical structure: + +```text +Ability → Capability → Feature → Code +``` + +### FR-22 + +User can drill down to: + +* feature details +* code locations +* evidence + +--- + +## 7.10 API Access + +### FR-23 + +System exposes API endpoints for: + +* search +* repository retrieval +* capability lookup + +--- + +# 8. **Non-Functional Requirements** + +--- + +## 8.1 Performance + +* search results returned in under 2 seconds +* repository analysis completes in under 2 minutes for medium-sized repos + +--- + +## 8.2 Scalability + +* support 100–1,000 repositories (MVP) +* architecture allows later horizontal scaling + +--- + +## 8.3 Usability + +* repository can be registered in < 2 minutes +* inspection view understandable within 1 minute + +--- + +## 8.4 Extensibility + +* schema versioned +* ability to extend capability model later + +--- + +## 8.5 Reliability + +* analysis failures do not corrupt registry +* partial results allowed + +--- + +# 9. **Data Model (Simplified)** + +```yaml +Repository: + id + name + url + description + metadata + +Ability: + id + name + description + confidence + +Capability: + id + name + description + inputs + outputs + ability_refs + +Feature: + id + name + type + location + capability_refs + +Evidence: + id + type + reference + capability_refs +``` + +--- + +# 10. **User Experience** + +--- + +## 10.1 Key Screens + +### 1. Repository Registration + +* input URL +* confirm metadata + +### 2. Analysis View + +* detected structure +* proposed abilities/capabilities/features + +### 3. 
Review Interface + +* edit/approve entries + +### 4. Search Page + +* natural language search +* result list + +### 5. Repository Profile Page + +* ability map +* drill-down navigation +* code links + +--- + +# 11. **Success Metrics** + +--- + +## 11.1 Product Metrics + +* time to understand a repository < 5 minutes +* number of repositories indexed +* number of successful searches +* user engagement with inspection view + +--- + +## 11.2 Qualitative Metrics + +* “I understand what this repo does now” +* “I found something reusable” +* “This saved me time” + +--- + +# 12. **Risks** + +--- + +## 12.1 Extraction Quality Risk + +* automated inference may be inaccurate + +**Mitigation:** human review required + +--- + +## 12.2 Over-Complexity Risk + +* ontology becomes too heavy + +**Mitigation:** keep schema minimal in v0.1 + +--- + +## 12.3 Adoption Risk + +* users may not contribute metadata + +**Mitigation:** provide value via auto-analysis first + +--- + +# 13. **Future Extensions** + +* integration with CI/CD for live updates +* ability benchmarking integration (ties back to GIL) +* capability maturity scoring +* dependency graph across repositories +* marketplace / discovery layer +* automated code reasoning agents + +--- + +# 14. **Positioning** + +> The Repository Ability Registry is the missing orientation layer between raw code and practical reuse—making repositories understandable, comparable, and actionable. + + + + +xxx diff --git a/wiki/RepositoryRegistry.md b/wiki/RepositoryRegistry.md new file mode 100644 index 0000000..3d4cecb --- /dev/null +++ b/wiki/RepositoryRegistry.md @@ -0,0 +1,7 @@ +RepositoryRegistry + +*Exploring repositories by ability, capability and features* + +Our mission is to make code understandable and usable by transforming repositories into transparent, structured representations of their abilities, capabilities, and features—so humans and intelligent systems can reliably discover, evaluate, and build upon what already exists. 
+ +xxx diff --git a/wiki/UseCaseCatalog.md b/wiki/UseCaseCatalog.md new file mode 100644 index 0000000..5629c47 --- /dev/null +++ b/wiki/UseCaseCatalog.md @@ -0,0 +1,610 @@ +UseCaseCatalog + +*Register, analyse and explore git repos* + +# Use Case Catalog + +## Repository Ability Registry + +## 1. Scope + +The **Repository Ability Registry** allows users to register Git repositories, analyze their contents, extract or maintain structured descriptions of their **abilities, capabilities, features, evidence, and implementation locations**, and make this information searchable through efficient interfaces and a web UI. + +Core promise: + +> Register a repository. Understand what it can do. Inspect the evidence. Find what is useful. + +--- + +# 2. Primary Actors + +## Repository Owner + +Maintains a Git repository and wants it represented accurately in the registry. + +## Registry Curator + +Reviews, improves, corrects, and approves extracted ability/capability/feature metadata. + +## Developer / Architect + +Searches across registered repositories to find reusable functionality or understand system structure. + +## Agent / Automation + +Uses API/CLI access to query repositories, capabilities, evidence, and implementation locations. + +## Viewer / Explorer + +Uses the web UI to browse and inspect registered repositories. + +--- + +# 3. Core Domain Objects + +```text +Repository +Ability +Capability +Feature +Evidence +Analysis Run +Registry Entry +Inspection View +Search Query +``` + +--- + +# 4. 
Use Case Overview + +| ID | Use Case | Primary Actor | +| ----- | --------------------------------- | ------------------------ | +| UC-01 | Register Git Repository | Repository Owner | +| UC-02 | Import Repository Metadata | System | +| UC-03 | Analyze Repository Structure | System | +| UC-04 | Extract Candidate Abilities | System / Agent | +| UC-05 | Extract Candidate Capabilities | System / Agent | +| UC-06 | Extract Candidate Features | System / Agent | +| UC-07 | Link Features to Code Locations | System | +| UC-08 | Attach Evidence to Capabilities | System / Curator | +| UC-09 | Review and Approve Analysis | Registry Curator | +| UC-10 | Search Repositories by Need | Developer / Architect | +| UC-11 | Inspect Repository Ability Map | Developer / Architect | +| UC-12 | Compare Repositories | Developer / Architect | +| UC-13 | Detect Capability Gaps | Architect / Agent | +| UC-14 | Expose Registry via API | Agent / Automation | +| UC-15 | Update Registry After Repo Change | System | +| UC-16 | Export Registry Entry | Repository Owner / Agent | + +--- + +# 5. Detailed Use Cases + +## UC-01 — Register Git Repository + +### Goal + +Add a Git repository to the registry. + +### Primary Actor + +Repository Owner + +### Preconditions + +The actor has a repository URL and access rights if the repo is private. + +### Main Flow + +1. Actor opens “Register Repository”. +2. Actor enters Git URL. +3. Actor provides optional metadata: + + * name + * description + * owner + * domain tags + * visibility +4. System validates access. +5. System creates a repository record. +6. System queues initial analysis. + +### Postconditions + +Repository exists in the registry with status `registered`. + +--- + +## UC-02 — Import Repository Metadata + +### Goal + +Collect basic metadata from the Git repository. + +### Primary Actor + +System + +### Main Flow + +1. System clones or accesses the repository. +2. 
System reads metadata files: + + * README + * package manifests + * build files + * license + * docs + * existing registry metadata +3. System stores detected metadata. +4. System prepares repository for deeper analysis. + +### Postconditions + +Repository has initial metadata and source snapshot reference. + +--- + +## UC-03 — Analyze Repository Structure + +### Goal + +Create a structural map of the repository. + +### Primary Actor + +System + +### Main Flow + +1. System detects language/frameworks. +2. System identifies major folders and modules. +3. System detects APIs, CLIs, services, tests, examples, docs. +4. System stores structural findings. +5. System generates a repository structure summary. + +### Output + +Example: + +```yaml +languages: + - Python +interfaces: + - REST API + - CLI +tests: + - pytest +documentation: + - README.md + - docs/ +``` + +--- + +## UC-04 — Extract Candidate Abilities + +### Goal + +Infer high-level usefulness from the repository. + +### Primary Actor + +System / Agent + +### Main Flow + +1. System analyzes README, docs, examples, names, and tests. +2. System proposes candidate abilities. +3. Each ability receives: + + * name + * description + * confidence + * supporting sources +4. Candidate abilities are shown for review. + +### Example Output + +```yaml +abilities: + - name: Business Email Routing + confidence: 0.82 + rationale: README and examples describe routing inbound messages. +``` + +--- + +## UC-05 — Extract Candidate Capabilities + +### Goal + +Identify bounded behaviors the repository provides. + +### Primary Actor + +System / Agent + +### Main Flow + +1. System inspects APIs, modules, functions, tests, examples. +2. System proposes capabilities. +3. System links capabilities to abilities. +4. System records confidence and source evidence. 
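
Steps 3 and 4 of this flow can be sketched as a small linking pass. The sketch assumes candidate capabilities arrive as plain dictionaries with `ability_refs`, `confidence`, and `sources` keys. The function name `link_capabilities`, the sample file paths, and the "Attachment Parsing" / "Document Archiving" entries are hypothetical and not part of the catalog.

```python
def link_capabilities(capabilities, ability_names):
    """Split candidate capabilities into linked and unlinked lists.

    A capability counts as linked when at least one of its ability_refs
    names a known candidate ability; unknown references are dropped.
    """
    known = set(ability_names)
    linked, unlinked = [], []
    for cap in capabilities:
        refs = [ref for ref in cap.get("ability_refs", []) if ref in known]
        if refs:
            # Keep confidence and sources untouched; only narrow the refs.
            linked.append({**cap, "ability_refs": refs})
        else:
            unlinked.append(cap)
    return linked, unlinked


# Hypothetical candidates for illustration.
candidates = [
    {
        "name": "Email Intent Classification",
        "ability_refs": ["Business Email Routing"],
        "confidence": 0.7,
        "sources": ["tests/test_classify.py"],
    },
    {
        "name": "Attachment Parsing",
        "ability_refs": ["Document Archiving"],
        "confidence": 0.4,
        "sources": ["src/attachments.py"],
    },
]

linked, unlinked = link_capabilities(candidates, ["Business Email Routing"])
print([cap["name"] for cap in linked])
print([cap["name"] for cap in unlinked])
```

Capabilities that end up unlinked are not discarded. Surfacing them separately lets the review step in UC-09 decide whether an ability is missing or the capability is noise.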
+ +### Example + +```yaml +capabilities: + - name: Email Intent Classification + ability_refs: + - Business Email Routing + inputs: + - email subject + - email body + outputs: + - intent category + - confidence +``` + +--- + +## UC-06 — Extract Candidate Features + +### Goal + +Identify concrete exposed or implemented features. + +### Primary Actor + +System / Agent + +### Main Flow + +1. System detects endpoints, commands, UI components, config options, public functions, modules. +2. System creates candidate feature entries. +3. System links features to capabilities. +4. System records implementation locations. + +### Example + +```yaml +features: + - name: Classify Email Endpoint + type: REST endpoint + location: src/api/routes/classify.py + capability_refs: + - Email Intent Classification +``` + +--- + +## UC-07 — Link Features to Code Locations + +### Goal + +Make registry entries inspectable down to source code. + +### Primary Actor + +System + +### Main Flow + +1. System identifies file paths and symbols. +2. System links registry entries to code locations. +3. UI exposes links from ability → capability → feature → source. +4. Actor can inspect relevant files without browsing the whole repo. + +### Postconditions + +Features and capabilities are traceable to implementation. + +--- + +## UC-08 — Attach Evidence to Capabilities + +### Goal + +Support trust in capability claims. + +### Primary Actor + +System / Curator + +### Main Flow + +1. System detects tests, examples, benchmarks, demos, docs. +2. System proposes evidence links. +3. Curator confirms or edits evidence. +4. Evidence is attached to capabilities. + +### Evidence Types + +```text +unit test +integration test +example +demo +benchmark +documentation +production usage note +manual review +``` + +--- + +## UC-09 — Review and Approve Analysis + +### Goal + +Allow human correction before registry metadata becomes authoritative. + +### Primary Actor + +Registry Curator + +### Main Flow + +1. 
Curator opens analysis results. +2. Curator reviews proposed abilities, capabilities, features, and evidence. +3. Curator accepts, edits, rejects, or merges entries. +4. System saves approved registry entry. +5. Repository status changes to `indexed`. + +### Postconditions + +Repository has a reviewed registry profile. + +--- + +## UC-10 — Search Repositories by Need + +### Goal + +Find repositories using everyday language. + +### Primary Actor + +Developer / Architect + +### Main Flow + +1. Actor enters a query such as: + + > “I need something that can classify incoming customer emails.” +2. System maps query to possible abilities and capabilities. +3. System returns matching repositories. +4. Results show: + + * matching ability + * matching capability + * confidence + * maturity + * evidence level + +### Postconditions + +Actor can identify candidate repositories without knowing their names. + +--- + +## UC-11 — Inspect Repository Ability Map + +### Goal + +Understand what a repository is useful for. + +### Primary Actor + +Developer / Architect + +### Main Flow + +1. Actor opens repository profile. +2. UI displays: + + * repository summary + * abilities + * capabilities + * features + * evidence + * code links +3. Actor drills down from high-level ability to implementation details. + +### Key UI Concept + +```text +Ability + → Capability + → Feature + → Code Location + → Evidence +``` + +--- + +## UC-12 — Compare Repositories + +### Goal + +Compare multiple repositories by abilities, capabilities, maturity, and evidence. + +### Primary Actor + +Developer / Architect + +### Main Flow + +1. Actor selects two or more repositories. +2. System shows comparison matrix. +3. Actor compares: + + * overlapping abilities + * unique capabilities + * maturity + * evidence quality + * interfaces +4. Actor identifies best fit or complementarity. + +--- + +## UC-13 — Detect Capability Gaps + +### Goal + +Identify missing, weak, or unsupported capabilities. 
+ +### Primary Actor + +Architect / Agent + +### Main Flow + +1. Actor defines desired ability or target capability map. +2. System compares desired map with registered repositories. +3. System reports: + + * missing capabilities + * weakly evidenced capabilities + * duplicate capabilities + * abandoned repositories + * features without mapped capability +4. Actor uses results for planning. + +### Example Output + +```text +Gap: Document classification ability exists, but no repository provides benchmarked German-language evaluation. +``` + +--- + +## UC-14 — Expose Registry via API + +### Goal + +Allow agents and external tools to query the registry. + +### Primary Actor + +Agent / Automation + +### Main Flow + +1. Agent calls registry API. +2. API supports queries such as: + + * find repositories by ability + * show capability details + * list features of repo + * retrieve evidence links +3. API returns structured JSON. + +### Example + +```http +GET /api/capabilities?query=email-routing +``` + +--- + +## UC-15 — Update Registry After Repo Change + +### Goal + +Keep registry entries aligned with repository changes. + +### Primary Actor + +System + +### Main Flow + +1. System detects repository change. +2. System runs incremental analysis. +3. System compares old and new findings. +4. System flags changed abilities/capabilities/features. +5. Curator reviews differences. +6. Registry entry is updated. + +### Postconditions + +Registry reflects current repository state. + +--- + +## UC-16 — Export Registry Entry + +### Goal + +Allow registry data to travel with the repository. + +### Primary Actor + +Repository Owner / Agent + +### Main Flow + +1. Actor requests export. +2. System generates `repo-abilities.yaml`. +3. Actor commits file into repository. +4. Future analyses can use it as a prior. + +### Output + +```text +/.well-known/repo-abilities.yaml +``` + +--- + +# 6. 
MVP Use Cases + +For the first version, implement only these: + +```text +UC-01 Register Git Repository +UC-02 Import Repository Metadata +UC-03 Analyze Repository Structure +UC-04 Extract Candidate Abilities +UC-05 Extract Candidate Capabilities +UC-06 Extract Candidate Features +UC-09 Review and Approve Analysis +UC-10 Search Repositories by Need +UC-11 Inspect Repository Ability Map +``` + +Everything else can follow. + +--- + +# 7. Core MVP User Journey + +```text +Register repo + ↓ +Analyze repo + ↓ +Review extracted ability/capability/feature map + ↓ +Publish registry profile + ↓ +Search and inspect repos through web UI/API +``` + +--- + +# 8. Product Principle + +The registry should not merely answer: + +> “What files are in this repository?” + +It should answer: + +> **“What can this repository help me do, how does it do it, and where can I inspect the proof?”** + + +xxx