added initial concept documents

This commit is contained in:
2026-04-25 21:15:17 +02:00
parent 4bcd22a518
commit 8d3d5aab42
7 changed files with 3052 additions and 0 deletions


@@ -0,0 +1,822 @@
AbilityExtractionHeuristics
*How repositories will be explored*
# Ability / Capability Extraction Heuristics v0.1
## Repository Ability Registry
## 1. Purpose
The extraction engine should answer:
> “What is this repository useful for, what bounded behaviors does it provide, and where are those behaviors implemented?”
It should produce **candidate entries**, not final truth. Human/agent review remains part of the workflow.
---
# 2. Extraction Layers
```text
Ability → usefulness / problem class
Capability → bounded behavior
Feature → concrete interface or implementation
Evidence → reason to believe the claim
```
---
# 3. Source Priority
Not all repository signals are equally trustworthy.
## Priority 1 — High Trust
Use these first:
```text
README
docs/
examples/
tests/
API specs
CLI help
package metadata
```
These usually express intended usage.
## Priority 2 — Medium Trust
```text
module names
function names
class names
route names
config files
workflow files
```
These show implemented structure.
## Priority 3 — Low Trust
```text
comments
commit messages
dependency names
directory names alone
```
Useful as supporting signals, but not enough by themselves.
---
# 4. Ability Extraction Heuristics
Abilities describe **why the repository is useful**.
## 4.1 Ability Signal Patterns
Look for phrases like:
```text
"helps users..."
"enables..."
"automates..."
"provides a way to..."
"used for..."
"designed to..."
"allows..."
"supports..."
```
Example:
```text
"This library helps route incoming business emails."
```
Candidate ability:
```yaml
name: Business Email Routing
```
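A minimal sketch of how the signal phrases above could be turned into a first-pass filter. The function name `find_ability_sentences`, the regular expression, and the sentence-splitting rule are illustrative assumptions, not part of the specification; matching sentences are only candidates for later review.

```python
import re

# Signal phrases taken from the list above. "helps" is matched with or
# without "users" so that the README example sentence is caught (assumption).
SIGNAL_PATTERN = re.compile(
    r"\b(helps(?: users)?|enables|automates|provides a way to|"
    r"used for|designed to|allows|supports)\b",
    re.IGNORECASE,
)


def find_ability_sentences(text: str) -> list[str]:
    """Return the sentences that contain at least one ability signal phrase."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if SIGNAL_PATTERN.search(s)]


readme = "This library helps route incoming business emails. It is written in Python."
candidates = find_ability_sentences(readme)
```

A phrase match only flags a sentence; naming the ability (for example `Business Email Routing`) remains a separate extraction and review step.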
---
## 4.2 Ability Naming Rule
Ability names should be:
```text
Domain + Problem Class
```
Good:
```text
Business Email Routing
Document Classification
Invoice Data Extraction
Kubernetes Deployment Inspection
Agent Workflow Orchestration
```
Bad:
```text
Fast API
Email Button
Classifier
Uses GPT
```
---
## 4.3 Ability Extraction Sources
Best sources for abilities:
```text
README intro
project tagline
docs overview
examples index
package description
```
Ability is usually described in prose, not code.
---
## 4.4 Ability Confidence
Assign confidence based on signal quality:
```yaml
confidence:
high:
- explicitly stated in README/docs
- supported by examples
- supported by tests or APIs
medium:
- inferred from multiple capabilities/features
- visible in examples but not stated
low:
- inferred from names only
- based on dependencies or folder structure
```
---
# 5. Capability Extraction Heuristics
Capabilities describe **bounded behavior**.
## 5.1 Capability Signal Patterns
Look for verbs applied to objects:
```text
classify email
extract invoice data
summarize document
validate schema
generate response
deploy service
monitor cluster
route ticket
convert format
```
Pattern:
```text
Verb + Object
```
Examples:
```text
Classify Email Intent
Extract Invoice Metadata
Generate Routing Explanation
Validate Repository Metadata
```
---
## 5.2 Capability Naming Rule
Capability names should be:
```text
Action Verb + Domain Object
```
Good:
```text
Classify Incoming Email
Extract PDF Metadata
Generate API Client
Validate Kubernetes Manifest
Detect Broken Links
```
Bad:
```text
Email Capability
Parser
Smart Document Stuff
Endpoint
```
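The naming rule can be checked mechanically before a candidate reaches review. The sketch below assumes a small hand-maintained verb list (`ACTION_VERBS`, built from the verbs used in this document) and a hypothetical helper `looks_like_capability_name`; a real extractor would likely need a larger vocabulary.

```python
# Verbs drawn from the examples in this document; the set is an assumption
# and would need to grow for real repositories.
ACTION_VERBS = {
    "classify", "extract", "generate", "validate", "detect",
    "summarize", "deploy", "monitor", "route", "convert",
}


def looks_like_capability_name(name: str) -> bool:
    """True if the name starts with a known action verb followed by an object."""
    words = name.split()
    return len(words) >= 2 and words[0].lower() in ACTION_VERBS
```

Names that fail the check are not necessarily wrong; they are simply flagged for renaming during review.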
---
## 5.3 Capability Sources
Best sources:
```text
API route names
CLI commands
public functions
service classes
tests
examples
docs tutorials
```
Capability is often visible in code and tests.
---
## 5.4 Capability Boundary Rule
A capability should be small enough to test.
Good:
```text
Extract invoice date from PDF
Classify email into intent category
Generate markdown from DOCX
```
Too broad:
```text
Manage documents
Automate business
Understand everything
```
Too narrow:
```text
Read config variable
Call helper function
Trim whitespace
```
Rule of thumb:
> If you can write a meaningful acceptance test for it, it is probably a capability.
---
# 6. Feature Extraction Heuristics
Features describe **how the capability is exposed or implemented**.
## 6.1 Feature Signal Patterns
Look for concrete affordances:
```text
REST endpoint
CLI command
UI component
configuration option
SDK method
background job
database migration
import/export format
plugin hook
```
Examples:
```yaml
features:
- name: /classify-email endpoint
- name: classify-email CLI command
- name: department-rules.yaml config
- name: JSON result export
```
---
## 6.2 Feature Naming Rule
Feature names should be concrete and inspectable.
Good:
```text
POST /api/classify-email
classify-email CLI command
Rule Configuration File
PDF Upload Component
```
Bad:
```text
AI routing
Document understanding
Magic extraction
```
---
# 7. Evidence Extraction Heuristics
Evidence supports claims.
## 7.1 Evidence Types
```yaml
evidence_types:
  - unit_test
  - integration_test
  - example
  - demo
  - benchmark
  - documentation
  - API specification
  - production usage note
  - manual review
```
---
## 7.2 Evidence Mapping
Map evidence to the nearest capability.
Example:
```text
tests/test_email_classifier.py
```
Supports:
```text
Classify Incoming Email
```
Example:
```text
examples/invoice_extraction_demo.py
```
Supports:
```text
Extract Invoice Metadata
```
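One possible way to approximate "nearest capability" is token overlap between the file name and the capability name. The sketch below is an assumption: the helper names are hypothetical, and truncating tokens to five characters is only a crude stand-in for real stemming (it lets `classifier` match `Classify`).

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def _stems(text: str) -> set[str]:
    """Lowercase alphabetic tokens, truncated to 5 characters (crude stemmer)."""
    return {token[:5] for token in re.findall(r"[a-z]+", text.lower())}


def nearest_capability(path: str, capabilities: list[str]) -> Optional[str]:
    """Return the capability sharing the most stems with the file name, if any."""
    path_stems = _stems(PurePosixPath(path).stem)
    best, best_score = None, 0
    for capability in capabilities:
        score = len(path_stems & _stems(capability))
        if score > best_score:
            best, best_score = capability, score
    return best


capabilities = ["Classify Incoming Email", "Extract Invoice Metadata"]
```

Files with no overlapping stems return `None` and stay unlinked, which keeps the mapping conservative.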
---
## 7.3 Evidence Strength
```yaml
evidence_strength:
  strong:
    - automated tests
    - benchmark results
    - executable examples
    - integration tests
  medium:
    - documentation
    - tutorials
    - screenshots
    - sample output
  weak:
    - README claim only
    - comments
    - filename hints
```
---
# 8. Ability-Capability-Feature Linking
## 8.1 Link Rule
```text
Ability explains why.
Capability explains what.
Feature explains how/where.
```
Example:
```yaml
ability:
  name: Business Email Routing
capability:
  name: Classify Incoming Email
  supports:
    - Business Email Routing
feature:
  name: POST /api/classify-email
  implements:
    - Classify Incoming Email
```
---
## 8.2 Linking Heuristic
A capability supports an ability if:
```text
Removing the capability would weaken the repository's ability to deliver that usefulness.
```
A feature implements a capability if:
```text
The feature is an interface, component, or code location through which the behavior is performed or exposed.
```
---
# 9. Confidence Scoring
Use a simple additive model first.
## 9.1 Candidate Confidence Factors
```yaml
confidence_factors:
  explicit_doc_claim: +0.30
  example_present: +0.20
  test_present: +0.25
  implementation_location_found: +0.15
  api_or_cli_exposed: +0.15
  multiple_source_agreement: +0.20
  inferred_from_names_only: -0.25
  no_evidence: -0.30
```
Normalize to:
```text
0.0 - 1.0
```
## 9.2 Confidence Labels
```yaml
0.80 - 1.00: high
0.50 - 0.79: medium
0.20 - 0.49: low
0.00 - 0.19: speculative
```
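The additive model and the labels can be sketched in a few lines. The factor weights and label thresholds are copied from the tables above; reading "normalize" as clamping the sum into the 0.0 to 1.0 range is an assumption, as are the function names `score` and `label`.

```python
# Weights copied from section 9.1.
CONFIDENCE_FACTORS = {
    "explicit_doc_claim": 0.30,
    "example_present": 0.20,
    "test_present": 0.25,
    "implementation_location_found": 0.15,
    "api_or_cli_exposed": 0.15,
    "multiple_source_agreement": 0.20,
    "inferred_from_names_only": -0.25,
    "no_evidence": -0.30,
}


def score(signals: set[str]) -> float:
    """Sum the weights of the observed signals and clamp to [0.0, 1.0]."""
    total = sum(CONFIDENCE_FACTORS[name] for name in signals)
    return round(min(1.0, max(0.0, total)), 2)


def label(value: float) -> str:
    """Map a clamped score to the labels from section 9.2."""
    if value >= 0.80:
        return "high"
    if value >= 0.50:
        return "medium"
    if value >= 0.20:
        return "low"
    return "speculative"
```

An unknown signal name raises `KeyError`, which is deliberate in this sketch: a misspelled factor should fail loudly rather than silently score zero.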
---
# 10. Classification Rules
## 10.1 Is it an Ability?
Ask:
```text
Would a user search for this as a desired outcome?
```
If yes, probably ability.
Example:
```text
“I need document classification.”
```
Ability.
---
## 10.2 Is it a Capability?
Ask:
```text
Can this behavior be tested with input/output expectations?
```
If yes, probably capability.
Example:
```text
“Classify document into category.”
```
Capability.
---
## 10.3 Is it a Feature?
Ask:
```text
Is this a concrete interface, option, component, or implementation artifact?
```
If yes, probably feature.
Example:
```text
“POST /api/classify-document”
```
Feature.
---
# 11. Anti-Heuristics
Things the extractor should avoid.
## 11.1 Do Not Treat Dependencies as Capabilities
Bad:
```yaml
capability: Uses OpenAI
```
Better:
```yaml
feature: OpenAI provider integration
capability: Generate Text Summary
```
---
## 11.2 Do Not Treat Technology as Ability
Bad:
```yaml
ability: FastAPI
```
Better:
```yaml
feature: FastAPI REST interface
```
---
## 11.3 Do Not Treat Internal Helpers as Capabilities
Bad:
```yaml
capability: Parse YAML Config
```
Unless parsing YAML config is a user-visible behavior.
---
## 11.4 Avoid Vendor-Hype Terms
Bad:
```text
intelligent automation
next-gen AI
enterprise-ready transformation
```
Convert into testable candidates:
```text
Classify Documents
Generate Reports
Route Tasks
```
---
# 12. Extraction Pipeline v0.1
## Step 1 — Repository Intake
Collect:
```text
README
docs
examples
tests
package files
source tree
API routes
CLI definitions
```
---
## Step 2 — Structural Summary
Produce:
```yaml
repository_summary:
  languages: []
  frameworks: []
  interfaces: []
  docs_found: []
  tests_found: []
  examples_found: []
```
---
## Step 3 — Candidate Ability Extraction
From README/docs/package descriptions.
Output:
```yaml
candidate_abilities:
- name
- description
- confidence
- supporting_sources
```
---
## Step 4 — Candidate Capability Extraction
From APIs, tests, examples, public modules.
Output:
```yaml
candidate_capabilities:
- name
- description
- inputs
- outputs
- linked_abilities
- confidence
- supporting_sources
```
---
## Step 5 — Candidate Feature Extraction
From endpoints, CLI commands, config files, UI components, modules.
Output:
```yaml
candidate_features:
- name
- type
- location
- linked_capabilities
- confidence
```
---
## Step 6 — Evidence Linking
Attach evidence:
```yaml
evidence:
- type
- path
- supports
- strength
```
---
## Step 7 — Review Package
Generate a curator-friendly review view:
```text
Ability
Capability
Feature
Evidence
```
---
# 13. Example Extraction
Given README:
```text
MailRouter helps companies automatically classify incoming emails and route them to the right department.
```
Given route:
```text
POST /api/classify-email
```
Given test:
```text
tests/test_email_classification.py
```
Output:
```yaml
abilities:
  - id: ability.business_email_routing
    name: Business Email Routing
    confidence: 0.9
capabilities:
  - id: capability.classify_incoming_email
    name: Classify Incoming Email
    ability_refs:
      - ability.business_email_routing
    confidence: 0.85
features:
  - id: feature.classify_email_endpoint
    name: POST /api/classify-email
    type: REST endpoint
    location: src/routes/classify_email.py
    capability_refs:
      - capability.classify_incoming_email
evidence:
  - type: unit_test
    path: tests/test_email_classification.py
    supports:
      - capability.classify_incoming_email
    strength: strong
```
```
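Because every entry in the example output refers to others by id, a reviewer-facing check for dangling references is cheap to add. The sketch below assumes the output has already been parsed into plain dictionaries shaped like the example; the function name `dangling_refs` is hypothetical.

```python
# The example output above, reduced to the fields needed for reference checks.
registry = {
    "abilities": [{"id": "ability.business_email_routing"}],
    "capabilities": [{
        "id": "capability.classify_incoming_email",
        "ability_refs": ["ability.business_email_routing"],
    }],
    "features": [{
        "id": "feature.classify_email_endpoint",
        "capability_refs": ["capability.classify_incoming_email"],
    }],
    "evidence": [{
        "path": "tests/test_email_classification.py",
        "supports": ["capability.classify_incoming_email"],
    }],
}


def dangling_refs(reg: dict) -> list[str]:
    """List every reference that points at an id missing from the registry."""
    ability_ids = {a["id"] for a in reg.get("abilities", [])}
    capability_ids = {c["id"] for c in reg.get("capabilities", [])}
    problems = []
    for cap in reg.get("capabilities", []):
        problems += [f"{cap['id']} -> {ref}"
                     for ref in cap.get("ability_refs", []) if ref not in ability_ids]
    for feat in reg.get("features", []):
        problems += [f"{feat['id']} -> {ref}"
                     for ref in feat.get("capability_refs", []) if ref not in capability_ids]
    for ev in reg.get("evidence", []):
        problems += [f"{ev['path']} -> {ref}"
                     for ref in ev.get("supports", []) if ref not in capability_ids]
    return problems
```

An empty result means the candidate graph is at least internally consistent; it says nothing about whether the claims themselves are correct.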
---
# 14. MVP Principle
The extractor should be:
```text
conservative
explainable
reviewable
source-linked
```
Not magical.
The best first version is not the one that extracts everything.
It is the one where the user says:
> “Yes, I understand why the system proposed this.”
wiki/ArchitectureSketch.md

@@ -0,0 +1,575 @@
ArchitectureSketch
*Repository Ability Registry — Architecture v0.1*
# Repository Ability Registry — Architecture v0.1
## 1. Core architectural idea
Use a **pipeline + registry + inspection UI** architecture.
```text
Git Repo
→ Ingestion
→ Analysis Pipeline
→ Candidate Registry Entries
→ Human Review / Approval
→ Searchable Ability Registry
→ Web UI / API / CLI
```
The system should not pretend the first analysis is truth. It produces **reviewable candidates**.
---
## 2. Main components
### 1. Registry Web App
Purpose:
* register repositories
* trigger analysis
* review results
* inspect ability maps
* search repos
Could be a normal web app with:
```text
Frontend + Backend API + Database
```
---
### 2. Git Ingestion Service
Responsibilities:
* clone/pull repositories
* checkout commit
* store snapshot metadata
* detect repo structure
Outputs:
```yaml
repo_snapshot:
  repo_id
  commit_hash
  branch
  file_tree
  metadata_files
```
---
### 3. Repository Analyzer
This is the heart.
Pipeline stages:
```text
Structure Scanner
Documentation Scanner
Interface Scanner
Test Scanner
LLM-Assisted Extractor
Evidence Linker
Confidence Scorer
```
Important: split deterministic scanners from LLM extraction.
---
### 4. Candidate Registry Store
Stores unapproved results:
```text
candidate abilities
candidate capabilities
candidate features
candidate evidence
source references
confidence scores
```
These are editable.
---
### 5. Curator Review Layer
Allows a human or agent to:
* accept
* reject
* rename
* merge
* relink
* approve
This turns candidates into official registry entries.
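A minimal sketch of the candidate-to-official transition as a small state machine. The status names (`candidate`, `approved`, `rejected`) and the rule that approved or rejected entries are final are assumptions made for illustration; the document itself only says that approval turns candidates into official entries.

```python
from enum import Enum


class Status(Enum):
    CANDIDATE = "candidate"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed transitions (assumption): only candidates can change status.
ALLOWED = {
    Status.CANDIDATE: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: set(),
    Status.REJECTED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the review step is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Rename, merge, and relink would be edits on a candidate that leave its status unchanged, so they are not modelled here.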
---
### 6. Search / Query Layer
Supports:
```text
natural language search
ability search
capability search
repo search
feature search
evidence search
```
Use both:
* relational filters
* vector/semantic search
---
### 7. Public/Agent API
Expose structured access:
```http
GET /repos
GET /repos/{id}
GET /abilities
GET /capabilities
GET /search?q=...
GET /repos/{id}/ability-map
```
Later this becomes MCP-friendly.
---
## 3. Suggested storage architecture
Use a hybrid model:
### PostgreSQL
For canonical structured data:
```text
repositories
snapshots
abilities
capabilities
features
evidence
links
analysis_runs
review_status
```
### Vector index
For semantic search over:
```text
README chunks
docs chunks
ability descriptions
capability descriptions
feature descriptions
```
Start simple with `pgvector` inside PostgreSQL.
### Object/file storage
For:
```text
repo snapshots
analysis artifacts
parsed file summaries
exported registry YAML
```
Local filesystem is fine for MVP.
---
## 4. Data model sketch
```text
Repository
  has many Snapshots
  has many AnalysisRuns
  has many RegistryEntries

Snapshot
  commit_hash
  branch
  file_tree
  extracted_documents

AnalysisRun
  snapshot_id
  status
  started_at
  completed_at
  model_used
  analyzer_version

Ability
  repo_id
  name
  description
  confidence
  status

Capability
  repo_id
  ability_id
  name
  description
  inputs
  outputs
  confidence
  status

Feature
  repo_id
  capability_id
  name
  type
  location
  confidence
  status

Evidence
  repo_id
  capability_id
  type
  path
  strength
```
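The foreign keys in the sketch above are enough to rebuild the ability map. The following is a reduced illustration, not the schema: it keeps only the fields needed for nesting, and the integer `id` fields and the function name `ability_map` are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass
class Ability:
    id: int  # primary key, assumed; not listed in the sketch above
    name: str


@dataclass
class Capability:
    id: int
    ability_id: int
    name: str


@dataclass
class Feature:
    id: int
    capability_id: int
    name: str
    location: str


def ability_map(abilities, capabilities, features) -> dict:
    """Nest feature locations under capabilities under abilities, keyed by name."""
    tree = {}
    for ability in abilities:
        caps = {}
        for cap in capabilities:
            if cap.ability_id != ability.id:
                continue
            caps[cap.name] = {
                f.name: f.location for f in features if f.capability_id == cap.id
            }
        tree[ability.name] = caps
    return tree


tree = ability_map(
    [Ability(1, "Business Email Routing")],
    [Capability(10, 1, "Classify Incoming Email")],
    [Feature(100, 10, "POST /api/classify-email", "src/routes/classify_email.py")],
)
```

In a PostgreSQL-backed implementation the same nesting would come from joins on `ability_id` and `capability_id`; the in-memory version only shows the shape of the result.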
---
## 5. Analysis pipeline
### Step 1 — Clone / update repo
```text
git clone
checkout commit
record commit hash
```
---
### Step 2 — Deterministic scan
Detect:
```text
languages
frameworks
package managers
entrypoints
routes
CLI commands
tests
docs
examples
config files
```
This should be deterministic code, not LLM.
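As a sketch of what "deterministic code" can mean here, the function below classifies repository-relative paths with fixed rules only. The suffix table, the directory conventions (`docs`, `tests`, `examples`), and the name `summarize_paths` are assumptions for illustration; framework, route, and CLI detection are left out.

```python
from pathlib import PurePosixPath

# Suffix-to-language table is illustrative, not exhaustive (assumption).
LANGUAGE_BY_SUFFIX = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".rs": "Rust"}


def summarize_paths(paths: list[str]) -> dict:
    """Classify repository-relative paths using fixed rules only (no LLM)."""
    languages: set[str] = set()
    summary = {"docs_found": [], "tests_found": [], "examples_found": []}
    for raw in sorted(paths):
        path = PurePosixPath(raw)
        if not path.parts:
            continue  # skip empty entries
        top = path.parts[0].lower()
        name = path.name.lower()
        if path.suffix in LANGUAGE_BY_SUFFIX:
            languages.add(LANGUAGE_BY_SUFFIX[path.suffix])
        if top == "docs" or name.startswith("readme"):
            summary["docs_found"].append(raw)
        elif top in {"tests", "test"} or name.startswith("test_"):
            summary["tests_found"].append(raw)
        elif top == "examples":
            summary["examples_found"].append(raw)
    summary["languages"] = sorted(languages)
    return summary


summary = summarize_paths([
    "README.md",
    "docs/overview.md",
    "src/routes/classify_email.py",
    "tests/test_email_classification.py",
    "examples/invoice_extraction_demo.py",
])
```

The same input always yields the same summary, which is what makes this stage safe to treat as a source of observed facts.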
---
### Step 3 — Content chunking
Create chunks from:
```text
README
docs
examples
tests
API specs
selected source files
```
Each chunk keeps:
```text
file path
line range
content type
semantic role
```
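A chunk record carrying the four attributes above might look like the sketch below. Fixed-size line windows are only one possible chunking strategy, chosen here because they make the line range trivial to compute; the `text` field, the `max_lines` default, and the example values for `content_type` and `semantic_role` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    file_path: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    content_type: str
    semantic_role: str
    text: str


def chunk_lines(file_path: str, text: str, content_type: str,
                semantic_role: str, max_lines: int = 40) -> list[Chunk]:
    """Split text into consecutive windows of at most max_lines lines."""
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        chunks.append(Chunk(file_path, start + 1, start + len(block),
                            content_type, semantic_role, "\n".join(block)))
    return chunks


sample = "\n".join(f"line {i}" for i in range(1, 6))
chunks = chunk_lines("README.md", sample, "markdown", "overview", max_lines=2)
```

Keeping the path and line range on every chunk is what later allows an extracted claim to link back to its exact source.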
---
### Step 4 — LLM-assisted extraction
Ask the model separately for:
```text
candidate abilities
candidate capabilities
candidate features
evidence mappings
```
Do not ask one giant prompt to do everything.
---
### Step 5 — Confidence scoring
Combine:
```text
LLM confidence
source quality
tests present
examples present
implementation found
multiple-source agreement
```
---
### Step 6 — Candidate graph generation
Output:
```text
Ability
  → Capability
    → Feature
      → Evidence
```
---
### Step 7 — Review
Only approved entries become canonical.
---
## 6. Important design decision
Separate:
```text
Observed facts
```
from:
```text
Interpreted claims
```
Example:
### Observed fact
```yaml
file: src/routes/classify.py
route: POST /classify
```
### Interpreted claim
```yaml
capability: Classify Incoming Email
```
The first is source-derived.
The second is inferred.
This distinction is crucial for trust.
---
## 7. MVP technology suggestion
A very pragmatic stack:
```text
Backend: Python FastAPI
DB: PostgreSQL + pgvector
Worker: Celery/RQ or simple background jobs
Git analysis: GitPython / subprocess git
Frontend: React / Next.js or simple server-rendered app
LLM extraction: provider-abstracted interface
```
Given the broader agent/tooling context, Python is probably the best choice for the analyzer.
---
## 8. Web UI structure
### Page 1 — Repository List
Shows:
```text
name
description
status
last analyzed
top abilities
```
### Page 2 — Register Repository
Input:
```text
Git URL
branch
access token optional
```
### Page 3 — Analysis Run
Shows:
```text
scan progress
detected structure
candidate entries
warnings
```
### Page 4 — Review
Tree view:
```text
Ability
Capability
Feature
Evidence
```
Actions:
```text
approve
edit
reject
merge
relink
```
### Page 5 — Repository Profile
Final inspectable view.
### Page 6 — Search
Natural-language search with filters:
```text
domain
language
framework
capability type
maturity
evidence strength
```
---
## 9. Internal API boundaries
Keep clean module boundaries:
```text
repo_ingestion
repo_scanning
content_indexing
llm_extraction
candidate_graph
review_workflow
registry_query
web_api
```
This prevents the analyzer from becoming a ball of mud.
---
## 10. What to avoid in v0.1
Do not build yet:
```text
continuous GitHub app integration
full static code analysis
full ontology engine
automatic truth claims
complex permission system
benchmark execution
marketplace functionality
```
MVP should prove:
> Can we register repos, extract useful maps, review them, and search them?
---
## 11. Recommended first implementation path
### Milestone 1 — Manual Registry
Create schema + UI where entries can be entered manually.
### Milestone 2 — Deterministic Scanner
Add repo clone + README/docs/tests/interface detection.
### Milestone 3 — LLM Candidate Extraction
Generate candidate ability/capability/feature graph.
### Milestone 4 — Review Workflow
Approve/edit/reject extracted entries.
### Milestone 5 — Search
Add semantic search over approved registry entries.
---
## 12. Architecture principle
> Deterministic scanners establish facts.
> LLMs propose interpretations.
> Humans or trusted agents approve registry truth.
That should be the backbone.

@@ -0,0 +1,439 @@
FunctionalRequirementsSpecV0.1
*Repository Ability Registry Functionality*
This **Functional Requirements Specification (FRS)** for the **Repository Ability Registry (v0.1)** focuses strictly on **externally observable system behavior** and is aligned with the architecture sketch and the PRD.
---
# **Functional Requirements Specification (FRS)**
## **Repository Ability Registry v0.1**
---
# 1. **Scope**
This FRS defines the externally visible functionality of a system that:
* registers Git repositories
* analyzes repository content
* extracts and manages **abilities, capabilities, features, and evidence**
* enables **search and inspection** of repositories via UI and API
---
# 2. **System Overview**
The system provides:
```text
Repository Registration
→ Analysis & Extraction
→ Review & Approval
→ Search & Inspection
→ API Access
```
---
# 3. **Actors**
| Actor | Description |
| ------- | ------------------------------------------ |
| User | Registers repositories, searches, inspects |
| Curator | Reviews and approves extracted data |
| System | Performs automated analysis |
| Agent | External system interacting via API |
---
# 4. **Functional Areas**
---
# 4.1 Repository Management
## FR-001 Register Repository
The system shall allow a user to register a repository by providing a Git URL.
## FR-002 Validate Repository Access
The system shall validate that the repository is accessible.
## FR-003 Store Repository Metadata
The system shall store:
* repository URL
* name
* description (optional)
* branch (default: main)
## FR-004 List Registered Repositories
The system shall allow users to view all registered repositories.
## FR-005 View Repository Details
The system shall display repository metadata and status.
---
# 4.2 Repository Analysis
## FR-010 Trigger Analysis
The system shall allow a user or system process to trigger repository analysis.
## FR-011 Clone Repository
The system shall retrieve repository contents for analysis.
## FR-012 Detect Repository Structure
The system shall identify:
* programming languages
* frameworks (if detectable)
* directory structure
* documentation files
* test files
* example files
## FR-013 Extract Content Sources
The system shall extract text from:
* README
* documentation
* examples
* selected source files
* test files
## FR-014 Record Analysis Run
The system shall record each analysis execution with:
* timestamp
* repository snapshot reference
* status
---
# 4.3 Ability Extraction
## FR-020 Generate Candidate Abilities
The system shall generate candidate abilities from repository content.
## FR-021 Ability Attributes
Each ability shall include:
* name
* description
* confidence score
* source references
## FR-022 List Candidate Abilities
The system shall display candidate abilities for a repository.
---
# 4.4 Capability Extraction
## FR-030 Generate Candidate Capabilities
The system shall generate candidate capabilities.
## FR-031 Capability Attributes
Each capability shall include:
* name
* description
* inputs (if detectable)
* outputs (if detectable)
* linked ability references
* confidence score
## FR-032 List Candidate Capabilities
The system shall display candidate capabilities.
---
# 4.5 Feature Extraction
## FR-040 Generate Candidate Features
The system shall generate candidate features.
## FR-041 Feature Attributes
Each feature shall include:
* name
* type (e.g. API, CLI, config)
* implementation location (file path)
* linked capability references
* confidence score
## FR-042 List Candidate Features
The system shall display candidate features.
---
# 4.6 Evidence Handling
## FR-050 Detect Evidence Sources
The system shall identify evidence such as:
* test files
* examples
* documentation
## FR-051 Associate Evidence with Capabilities
The system shall link evidence to relevant capabilities.
## FR-052 Evidence Attributes
Each evidence item shall include:
* type
* file path or reference
* associated capability
* strength classification
---
# 4.7 Review and Curation
## FR-060 View Analysis Results
The system shall allow users to view extracted abilities, capabilities, features, and evidence.
## FR-061 Edit Entries
The system shall allow users to:
* modify names
* modify descriptions
* adjust relationships
## FR-062 Accept Entries
The system shall allow users to mark entries as approved.
## FR-063 Reject Entries
The system shall allow users to remove candidate entries.
## FR-064 Merge Entries
The system shall allow users to merge duplicate or overlapping entries.
## FR-065 Persist Approved Entries
The system shall store approved entries as canonical registry data.
---
# 4.8 Search
## FR-070 Natural Language Search
The system shall allow users to search repositories using free-text queries.
## FR-071 Semantic Matching
The system shall match queries to:
* abilities
* capabilities
* repository descriptions
## FR-072 Display Search Results
Search results shall include:
* repository name
* matching ability/capability
* confidence indicator
## FR-073 Filter Results
The system shall allow filtering by:
* language
* framework (if available)
* ability/capability presence
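The search requirements above can be illustrated with a deliberately simple stand-in: ranking by shared words instead of true semantic matching, plus an optional language filter. Everything here is an assumption for illustration, including the entry fields, the function name `search`, and the second repository `docsort`, which is hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, entries: list[dict], language: str = "") -> list[str]:
    """Rank repositories by word overlap between the query and their entries."""
    query_tokens = _tokens(query)
    scored = []
    for entry in entries:
        if language and entry["language"] != language:
            continue  # FR-073: filter by language
        overlap = len(query_tokens & _tokens(entry["ability"] + " " + entry["capability"]))
        if overlap:
            scored.append((overlap, entry["repository"]))
    # Highest overlap first; repository name breaks ties for stable output.
    return [repo for _, repo in sorted(scored, key=lambda item: (-item[0], item[1]))]


entries = [
    {"repository": "mailrouter", "language": "Python",
     "ability": "Business Email Routing", "capability": "Classify Incoming Email"},
    {"repository": "docsort", "language": "Go",
     "ability": "Document Classification", "capability": "Classify Document into Category"},
]
```

Word overlap misses synonyms and paraphrases, which is exactly the gap that embedding-based matching (FR-071) is meant to close.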
---
# 4.9 Repository Inspection
## FR-080 View Repository Profile
The system shall display an inspectable repository view.
## FR-081 Display Ability Map
The system shall display:
```text
Ability → Capability → Feature → Code Location
```
## FR-082 Drill-down Navigation
The system shall allow users to navigate:
* from ability to capabilities
* from capability to features
* from feature to code location
## FR-083 Display Evidence
The system shall display evidence associated with capabilities.
## FR-084 Display Confidence
The system shall display confidence levels for abilities, capabilities, and features.
---
# 4.10 API Access
## FR-090 Repository API
The system shall provide an API to retrieve repository metadata.
## FR-091 Ability API
The system shall provide an API to retrieve abilities.
## FR-092 Capability API
The system shall provide an API to retrieve capabilities.
## FR-093 Search API
The system shall provide an API for search queries.
## FR-094 Inspection API
The system shall provide an API to retrieve full ability maps.
---
# 4.11 Updates
## FR-100 Re-run Analysis
The system shall allow re-analysis of a repository.
## FR-101 Track Changes
The system shall record differences between analysis runs.
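One simple reading of FR-101 is a set difference over entry identifiers from two analysis runs. The sketch assumes that entries carry stable ids across runs (as in `capability.classify_incoming_email`); the function name `diff_runs` is hypothetical.

```python
def diff_runs(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare the entry ids of two analysis runs."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
        "unchanged": sorted(previous & current),
    }


changes = diff_runs(
    {"capability.classify_incoming_email", "capability.parse_yaml_config"},
    {"capability.classify_incoming_email", "capability.generate_routing_explanation"},
)
```

This only detects entries appearing or disappearing; tracking renamed or re-scored entries would need a field-level comparison on top.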
---
# 5. **Data Handling Requirements**
## FR-110 Store Repository Snapshots
The system shall maintain references to analyzed repository versions.
## FR-111 Maintain Versioned Entries
The system shall retain historical analysis results.
---
# 6. **Error Handling**
## FR-120 Analysis Failure Handling
The system shall report analysis failures without corrupting existing data.
## FR-121 Partial Results
The system shall allow partial extraction results to be stored and displayed.
---
# 7. **Security & Access (MVP)**
## FR-130 Basic Access Control
The system shall support:
* public repositories
* optional restricted access
(No advanced role model required in v0.1)
---
# 8. **User Interface Requirements**
## FR-140 Repository Registration UI
The system shall provide a form to register repositories.
## FR-141 Analysis View UI
The system shall display analysis progress and results.
## FR-142 Review UI
The system shall allow editing and approval of entries.
## FR-143 Search UI
The system shall provide a search interface.
## FR-144 Inspection UI
The system shall display the ability-capability-feature hierarchy.
---
# 9. **Acceptance Criteria (MVP)**
The system shall be considered functional when:
1. A repository can be registered and analyzed
2. Candidate abilities, capabilities, and features are generated
3. A user can review and approve extracted entries
4. A user can search for repositories using natural language
5. A user can inspect a repository via an ability map
6. A user can navigate from abstraction to code location
---
# 10. **Summary**
The system provides:
> A structured, inspectable, and searchable mapping from repository usefulness (abilities) to concrete implementation (features and code), supported by evidence and reviewable extraction.
---
## Next steps
To move toward implementation:
* define the **API schema** (OpenAPI spec)
* design the **database schema** in SQL
* build a **first analyzer prototype**
wiki/LandingPage.md

@@ -0,0 +1,104 @@
LandingPage
*Introducing the Repository Ability Registry*
# **Repository Ability Registry**
## **Understand What Code Can Do. Instantly.**
### *From raw repositories to real understanding.*
---
## **Code is everywhere. Understanding it isn't.**
Every day, developers and architects face the same problem:
* Repositories are hard to navigate
* READMEs are incomplete or misleading
* Useful functionality is hidden deep in code
* Reuse decisions rely on guesswork
> You don't need more code.
> You need to understand the code you already have.
---
## **What if repositories could explain themselves?**
The **Repository Ability Registry** transforms Git repositories into **structured, inspectable knowledge**.
It reveals:
* **Abilities** — what a repository is useful for
* **Capabilities** — what it can actually do
* **Features** — how those behaviors are implemented
* **Evidence** — why you can trust those claims
---
## **How it works**
### 🔗 **1. Register a Repository**
Add any Git repository to the registry.
---
### 🧠 **2. Analyze & Extract**
The system scans code, documentation, tests, and interfaces to generate a structured map.
---
### 🛠 **3. Review & Refine**
You or your team validate and improve the extracted understanding.
---
### 🔍 **4. Search & Inspect**
Explore repositories through a powerful interface:
* Search by need (“classify documents”, “route emails”)
* Compare repositories
* Drill down from abstraction to code
---
## **From Structure to Understanding**
Most tools show you:
> files, folders, and dependencies
We show you:
> **what the repository can actually help you do—and where to find it**
---
## **Built for Humans and Agents**
Whether you're:
* a developer looking for reusable components
* an architect mapping system capabilities
* an AI agent navigating codebases
The registry provides a shared, machine-readable understanding.
---
## **The Missing Layer**
Between raw code and real-world usefulness, there is a gap.
The **Repository Ability Registry** fills it.
---
### **Start turning code into capability.**

@@ -0,0 +1,495 @@
ProductRequirementsSpecV0.1
*Repository Ability Registry Requirements*
# **Product Requirements Document (PRD)**
## **Repository Ability Registry (v0.1)**
---
# 1. **Purpose**
The **Repository Ability Registry** provides a structured, inspectable orientation layer for code repositories by mapping them from **abilities → capabilities → features → implementation → evidence**.
It enables developers, architects, and agents to:
* understand what a repository is useful for
* inspect how it implements that usefulness
* compare repositories based on real functionality
* search repositories using natural language
---
# 2. **Problem Statement**
Modern code repositories suffer from **orientation opacity**:
* usefulness is unclear from structure alone
* README files are incomplete or misleading
* functionality is scattered across code
* reuse decisions rely on tribal knowledge
This results in:
* wasted time exploring repositories
* duplicated work
* poor architectural decisions
* low reuse of existing assets
---
# 3. **Goals**
## 3.1 Primary Goal
> Enable users to **register a repository and understand its usefulness within minutes** via an inspectable ability map.
---
## 3.2 Secondary Goals
* enable **natural language search** across repositories
* provide **traceability from abstraction to code**
* establish a **foundation for proof-of-ability integration**
* support both **human and agent interaction**
---
## 3.3 Non-Goals (v0.1)
* full automated benchmarking
* advanced ontology enforcement
* marketplace features
* monetization features
* full CI/CD integration
---
# 4. **Users**
## 4.1 Developer / Architect
* needs to find reusable functionality
* needs to understand unfamiliar repos quickly
## 4.2 Repository Owner
* wants their repo to be understandable and discoverable
## 4.3 Registry Curator
* ensures quality and correctness of extracted data
## 4.4 Agent / Automation
* queries registry programmatically
---
# 5. **Core Concepts**
```text
Ability = why the repository is useful
Capability = what behavior it provides
Feature = how the behavior is exposed/implemented
Evidence = why the capability can be trusted
```
---
# 6. **Product Scope (MVP)**
## 6.1 Included
* repository registration
* repository analysis (semi-automated)
* ability/capability/feature extraction (assisted)
* human review and correction
* searchable registry
* inspectable repository view
---
## 6.2 Excluded
* automated full-code understanding engine
* distributed indexing
* deep static analysis
* real-time synchronization
* full dependency graph modeling
---
# 7. **Functional Requirements**
---
## 7.1 Repository Registration
### FR-01
User can register a repository via Git URL.
### FR-02
System validates repository access.
### FR-03
System stores repository metadata.
---
## 7.2 Repository Analysis
### FR-04
System clones or accesses repository contents.
### FR-05
System extracts:
* languages
* structure
* interfaces (API, CLI, etc.)
* documentation files
* tests
---
## 7.3 Ability Extraction
### FR-06
System generates candidate abilities based on:
* README
* documentation
* examples
### FR-07
Each ability includes:
* name
* description
* confidence
* source references
---
## 7.4 Capability Extraction
### FR-08
System identifies candidate capabilities from:
* modules
* APIs
* tests
* code structure
### FR-09
Capabilities include:
* inputs
* outputs
* linked abilities
* description
---
## 7.5 Feature Extraction
### FR-10
System identifies concrete features:
* endpoints
* CLI commands
* modules
* configuration interfaces
### FR-11
Each feature includes:
* type
* implementation location
* linked capability
---
## 7.6 Evidence Linking
### FR-12
System identifies potential evidence:
* tests
* examples
* benchmarks
* docs
### FR-13
User can manually attach evidence to capabilities.
---
## 7.7 Review and Curation
### FR-14
User can review extracted abilities, capabilities, features.
### FR-15
User can edit, delete, or merge entries.
### FR-16
User can approve registry entry.
---
## 7.8 Search
### FR-17
User can search using natural language.
### FR-18
System maps queries to abilities/capabilities.
### FR-19
Search results include:
* repository name
* matching ability
* matching capability
* confidence
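
As a baseline for FR-17 to FR-19, a query can be mapped to entries by token overlap. This is a deliberately simple sketch: the entry layout (`repository`, `ability`, `capability`) is assumed, and the `score` it computes is a lexical match score, not the extraction confidence listed in FR-19.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, entries: list[dict], limit: int = 10) -> list[dict]:
    """Rank entries by the share of query tokens found in ability/capability text."""
    query_tokens = tokenize(query)
    results = []
    for entry in entries:
        entry_tokens = tokenize(entry["ability"] + " " + entry["capability"])
        overlap = query_tokens & entry_tokens
        if overlap:
            score = len(overlap) / len(query_tokens)
            results.append({**entry, "score": round(score, 2)})
    return sorted(results, key=lambda r: r["score"], reverse=True)[:limit]
```

Exact token matching misses inflected forms such as "classify" versus "classification", so a production search would likely need stemming or embeddings.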
---
## 7.9 Inspection UI
### FR-20
User can view repository profile.
### FR-21
UI shows hierarchical structure:
```text
Ability → Capability → Feature → Code
```
### FR-22
User can drill down to:
* feature details
* code locations
* evidence
---
## 7.10 API Access
### FR-23
System exposes API endpoints for:
* search
* repository retrieval
* capability lookup
---
# 8. **Non-Functional Requirements**
---
## 8.1 Performance
* search results returned in < 2 seconds
* repository analysis completes in < 2 minutes for medium-sized repositories
---
## 8.2 Scalability
* support 100 to 1,000 repositories (MVP)
* architecture allows later horizontal scaling
---
## 8.3 Usability
* repository can be registered in < 2 minutes
* inspection view understandable within 1 minute
---
## 8.4 Extensibility
* schema versioned
* ability to extend capability model later
---
## 8.5 Reliability
* analysis failures do not corrupt registry
* partial results allowed
---
# 9. **Data Model (Simplified)**
```yaml
Repository:
  - id
  - name
  - url
  - description
  - metadata
Ability:
  - id
  - name
  - description
  - confidence
Capability:
  - id
  - name
  - description
  - inputs
  - outputs
  - ability_refs
Feature:
  - id
  - name
  - type
  - location
  - capability_refs
Evidence:
  - id
  - type
  - reference
  - capability_refs
```
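
The model above maps naturally onto typed records. The sketch below covers three of the five entities (Repository and Evidence are omitted for brevity); the default values and the `dangling_refs` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ability:
    id: str
    name: str
    description: str = ""
    confidence: float = 0.0


@dataclass
class Capability:
    id: str
    name: str
    description: str = ""
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    ability_refs: list[str] = field(default_factory=list)


@dataclass
class Feature:
    id: str
    name: str
    type: str
    location: str
    capability_refs: list[str] = field(default_factory=list)


def dangling_refs(abilities, capabilities, features) -> list[str]:
    """Return references that point at ids missing from the registry."""
    ability_ids = {a.id for a in abilities}
    capability_ids = {c.id for c in capabilities}
    missing = [r for c in capabilities for r in c.ability_refs if r not in ability_ids]
    missing += [r for f in features for r in f.capability_refs if r not in capability_ids]
    return missing
```

A check like `dangling_refs` is useful after edits or merges during review, when a referenced entry may have been deleted.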
---
# 10. **User Experience**
---
## 10.1 Key Screens
### 1. Repository Registration
* input URL
* confirm metadata
### 2. Analysis View
* detected structure
* proposed abilities/capabilities/features
### 3. Review Interface
* edit/approve entries
### 4. Search Page
* natural language search
* result list
### 5. Repository Profile Page
* ability map
* drill-down navigation
* code links
---
# 11. **Success Metrics**
---
## 11.1 Product Metrics
* time to understand a repository < 5 minutes
* number of repositories indexed
* number of successful searches
* user engagement with inspection view
---
## 11.2 Qualitative Metrics
* “I understand what this repo does now”
* “I found something reusable”
* “This saved me time”
---
# 12. **Risks**
---
## 12.1 Extraction Quality Risk
* automated inference may be inaccurate
**Mitigation:** human review required
---
## 12.2 Over-Complexity Risk
* ontology becomes too heavy
**Mitigation:** keep schema minimal in v0.1
---
## 12.3 Adoption Risk
* users may not contribute metadata
**Mitigation:** provide value via auto-analysis first
---
# 13. **Future Extensions**
* integration with CI/CD for live updates
* ability benchmarking integration (ties back to GIL)
* capability maturity scoring
* dependency graph across repositories
* marketplace / discovery layer
* automated code reasoning agents
---
# 14. **Positioning**
> The Repository Ability Registry is the missing orientation layer between raw code and practical reuse—making repositories understandable, comparable, and actionable.

@@ -0,0 +1,7 @@
RepositoryRegistry
*Exploring repositories by ability, capability and features*
Our mission is to make code understandable and usable by transforming repositories into transparent, structured representations of their abilities, capabilities, and features—so humans and intelligent systems can reliably discover, evaluate, and build upon what already exists.
wiki/UseCaseCatalog.md Normal file

@@ -0,0 +1,610 @@
UseCaseCatalog
*Register, analyse and explore git repos*
# Use Case Catalog
## Repository Ability Registry
## 1. Scope
The **Repository Ability Registry** allows users to register Git repositories, analyze their contents, extract or maintain structured descriptions of their **abilities, capabilities, features, evidence, and implementation locations**, and make this information searchable through efficient interfaces and a web UI.
Core promise:
> Register a repository. Understand what it can do. Inspect the evidence. Find what is useful.
---
# 2. Primary Actors
## Repository Owner
Maintains a Git repository and wants it represented accurately in the registry.
## Registry Curator
Reviews, improves, corrects, and approves extracted ability/capability/feature metadata.
## Developer / Architect
Searches across registered repositories to find reusable functionality or understand system structure.
## Agent / Automation
Uses API/CLI access to query repositories, capabilities, evidence, and implementation locations.
## Viewer / Explorer
Uses the web UI to browse and inspect registered repositories.
---
# 3. Core Domain Objects
```text
Repository
Ability
Capability
Feature
Evidence
Analysis Run
Registry Entry
Inspection View
Search Query
```
---
# 4. Use Case Overview
| ID | Use Case | Primary Actor |
| ----- | --------------------------------- | ------------------------ |
| UC-01 | Register Git Repository | Repository Owner |
| UC-02 | Import Repository Metadata | System |
| UC-03 | Analyze Repository Structure | System |
| UC-04 | Extract Candidate Abilities | System / Agent |
| UC-05 | Extract Candidate Capabilities | System / Agent |
| UC-06 | Extract Candidate Features | System / Agent |
| UC-07 | Link Features to Code Locations | System |
| UC-08 | Attach Evidence to Capabilities | System / Curator |
| UC-09 | Review and Approve Analysis | Registry Curator |
| UC-10 | Search Repositories by Need | Developer / Architect |
| UC-11 | Inspect Repository Ability Map | Developer / Architect |
| UC-12 | Compare Repositories | Developer / Architect |
| UC-13 | Detect Capability Gaps | Architect / Agent |
| UC-14 | Expose Registry via API | Agent / Automation |
| UC-15 | Update Registry After Repo Change | System |
| UC-16 | Export Registry Entry | Repository Owner / Agent |
---
# 5. Detailed Use Cases
## UC-01 — Register Git Repository
### Goal
Add a Git repository to the registry.
### Primary Actor
Repository Owner
### Preconditions
The actor has a repository URL and access rights if the repo is private.
### Main Flow
1. Actor opens “Register Repository”.
2. Actor enters Git URL.
3. Actor provides optional metadata:
* name
* description
* owner
* domain tags
* visibility
4. System validates access.
5. System creates a repository record.
6. System queues initial analysis.
### Postconditions
Repository exists in the registry with status `registered`.
---
## UC-02 — Import Repository Metadata
### Goal
Collect basic metadata from the Git repository.
### Primary Actor
System
### Main Flow
1. System clones or accesses the repository.
2. System reads metadata files:
* README
* package manifests
* build files
* license
* docs
* existing registry metadata
3. System stores detected metadata.
4. System prepares repository for deeper analysis.
### Postconditions
Repository has initial metadata and source snapshot reference.
---
## UC-03 — Analyze Repository Structure
### Goal
Create a structural map of the repository.
### Primary Actor
System
### Main Flow
1. System detects languages and frameworks.
2. System identifies major folders and modules.
3. System detects APIs, CLIs, services, tests, examples, docs.
4. System stores structural findings.
5. System generates a repository structure summary.
### Output
Example:
```yaml
languages:
- Python
interfaces:
- REST API
- CLI
tests:
- pytest
documentation:
- README.md
- docs/
```
---
## UC-04 — Extract Candidate Abilities
### Goal
Infer high-level usefulness from the repository.
### Primary Actor
System / Agent
### Main Flow
1. System analyzes README, docs, examples, names, and tests.
2. System proposes candidate abilities.
3. Each ability receives:
* name
* description
* confidence
* supporting sources
4. Candidate abilities are shown for review.
### Example Output
```yaml
abilities:
  - name: Business Email Routing
    confidence: 0.82
    rationale: README and examples describe routing inbound messages.
```
---
## UC-05 — Extract Candidate Capabilities
### Goal
Identify bounded behaviors the repository provides.
### Primary Actor
System / Agent
### Main Flow
1. System inspects APIs, modules, functions, tests, examples.
2. System proposes capabilities.
3. System links capabilities to abilities.
4. System records confidence and source evidence.
### Example
```yaml
capabilities:
  - name: Email Intent Classification
    ability_refs:
      - Business Email Routing
    inputs:
      - email subject
      - email body
    outputs:
      - intent category
      - confidence
```
---
## UC-06 — Extract Candidate Features
### Goal
Identify concrete exposed or implemented features.
### Primary Actor
System / Agent
### Main Flow
1. System detects endpoints, commands, UI components, config options, public functions, modules.
2. System creates candidate feature entries.
3. System links features to capabilities.
4. System records implementation locations.
### Example
```yaml
features:
  - name: Classify Email Endpoint
    type: REST endpoint
    location: src/api/routes/classify.py
    capability_refs:
      - Email Intent Classification
```
---
## UC-07 — Link Features to Code Locations
### Goal
Make registry entries inspectable down to source code.
### Primary Actor
System
### Main Flow
1. System identifies file paths and symbols.
2. System links registry entries to code locations.
3. UI exposes links from ability → capability → feature → source.
4. Actor can inspect relevant files without browsing the whole repo.
### Postconditions
Features and capabilities are traceable to implementation.
---
## UC-08 — Attach Evidence to Capabilities
### Goal
Support trust in capability claims.
### Primary Actor
System / Curator
### Main Flow
1. System detects tests, examples, benchmarks, demos, docs.
2. System proposes evidence links.
3. Curator confirms or edits evidence.
4. Evidence is attached to capabilities.
### Evidence Types
```text
unit test
integration test
example
demo
benchmark
documentation
production usage note
manual review
```
---
## UC-09 — Review and Approve Analysis
### Goal
Allow human correction before registry metadata becomes authoritative.
### Primary Actor
Registry Curator
### Main Flow
1. Curator opens analysis results.
2. Curator reviews proposed abilities, capabilities, features, and evidence.
3. Curator accepts, edits, rejects, or merges entries.
4. System saves approved registry entry.
5. Repository status changes to `indexed`.
### Postconditions
Repository has a reviewed registry profile.
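
The catalog names two repository statuses, `registered` (UC-01) and `indexed` (UC-09). A small transition table can guard against skipping review. In this sketch the intermediate `analyzed` state and the allowed transitions are assumptions, not part of the catalog.

```python
from enum import Enum


class RepoStatus(Enum):
    REGISTERED = "registered"   # named in UC-01
    ANALYZED = "analyzed"       # assumed intermediate state
    INDEXED = "indexed"         # named in UC-09


# Assumed transition rules: analysis must precede indexing.
ALLOWED = {
    RepoStatus.REGISTERED: {RepoStatus.ANALYZED},
    RepoStatus.ANALYZED: {RepoStatus.ANALYZED, RepoStatus.INDEXED},
    RepoStatus.INDEXED: {RepoStatus.ANALYZED},  # re-analysis after a repo change
}


def advance(current: RepoStatus, target: RepoStatus) -> RepoStatus:
    """Return the target status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```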
---
## UC-10 — Search Repositories by Need
### Goal
Find repositories using everyday language.
### Primary Actor
Developer / Architect
### Main Flow
1. Actor enters a query such as:
> “I need something that can classify incoming customer emails.”
2. System maps query to possible abilities and capabilities.
3. System returns matching repositories.
4. Results show:
* matching ability
* matching capability
* confidence
* maturity
* evidence level
### Postconditions
Actor can identify candidate repositories without knowing their names.
---
## UC-11 — Inspect Repository Ability Map
### Goal
Understand what a repository is useful for.
### Primary Actor
Developer / Architect
### Main Flow
1. Actor opens repository profile.
2. UI displays:
* repository summary
* abilities
* capabilities
* features
* evidence
* code links
3. Actor drills down from high-level ability to implementation details.
### Key UI Concept
```text
Ability
→ Capability
→ Feature
→ Code Location
→ Evidence
```
---
## UC-12 — Compare Repositories
### Goal
Compare multiple repositories by abilities, capabilities, maturity, and evidence.
### Primary Actor
Developer / Architect
### Main Flow
1. Actor selects two or more repositories.
2. System shows comparison matrix.
3. Actor compares:
* overlapping abilities
* unique capabilities
* maturity
* evidence quality
* interfaces
4. Actor identifies best fit or complementarity.
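
Two rows of the comparison matrix, overlapping abilities and unique capabilities, reduce to set operations. The sketch assumes each repository is represented by sets of ability and capability names; maturity and evidence quality are not modeled here.

```python
def compare(repos: dict[str, dict[str, set[str]]]) -> dict:
    """repos maps a repository name to {"abilities": {...}, "capabilities": {...}}."""
    names = list(repos)
    if len(names) < 2:
        raise ValueError("comparison needs two or more repositories")
    # Abilities present in every selected repository.
    shared_abilities = set.intersection(*(repos[n]["abilities"] for n in names))
    unique_capabilities = {}
    for name in names:
        # Capabilities offered by any of the other repositories.
        others = set().union(*(repos[o]["capabilities"] for o in names if o != name))
        unique_capabilities[name] = repos[name]["capabilities"] - others
    return {"shared_abilities": shared_abilities, "unique_capabilities": unique_capabilities}
```

Comparing by name only works if curators keep ability and capability names consistent across repositories.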
---
## UC-13 — Detect Capability Gaps
### Goal
Identify missing, weak, or unsupported capabilities.
### Primary Actor
Architect / Agent
### Main Flow
1. Actor defines desired ability or target capability map.
2. System compares desired map with registered repositories.
3. System reports:
* missing capabilities
* weakly evidenced capabilities
* duplicate capabilities
* abandoned repositories
* features without mapped capability
4. Actor uses results for planning.
### Example Output
```text
Gap: Document classification ability exists, but no repository provides benchmarked German-language evaluation.
```
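
Three of the reported gap types (missing, weakly evidenced, duplicate) can be sketched as follows. The registry shape used here, repository name to capability name to a list of evidence references, is an assumption made for the example.

```python
def find_gaps(desired: set[str], registry: dict[str, dict[str, list[str]]]) -> dict:
    """registry maps repo -> {capability name -> list of evidence references}."""
    providers: dict[str, list[str]] = {}
    for repo, capabilities in registry.items():
        for capability in capabilities:
            providers.setdefault(capability, []).append(repo)
    provided = set(providers)
    # A desired capability is weak if no providing repository has any evidence for it.
    weak = {
        capability
        for capability in desired & provided
        if not any(registry[repo][capability] for repo in providers[capability])
    }
    return {
        "missing": desired - provided,
        "weakly_evidenced": weak,
        "duplicates": {c for c, repos in providers.items() if len(repos) > 1},
    }
```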
---
## UC-14 — Expose Registry via API
### Goal
Allow agents and external tools to query the registry.
### Primary Actor
Agent / Automation
### Main Flow
1. Agent calls registry API.
2. API supports queries such as:
* find repositories by ability
* show capability details
* list features of repo
* retrieve evidence links
3. API returns structured JSON.
### Example
```http
GET /api/capabilities?query=email-routing
```
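
A handler for the request above might look like the following sketch. The in-memory `CAPABILITIES` list, the response shape, and the treatment of hyphens as word separators are assumptions; the catalog only specifies that the API returns structured JSON.

```python
import json
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory store standing in for the registry.
CAPABILITIES = [
    {"name": "Email Intent Classification", "ability_refs": ["Business Email Routing"]},
    {"name": "Invoice Field Parsing", "ability_refs": ["Invoice Data Extraction"]},
]


def handle_capability_query(request_target: str) -> str:
    """Return a JSON body for a request target like /api/capabilities?query=..."""
    params = parse_qs(urlparse(request_target).query)
    # Treat hyphens as spaces so "email-routing" becomes two search terms.
    terms = params.get("query", [""])[0].lower().replace("-", " ").split()
    matches = [
        c for c in CAPABILITIES
        # Every term must appear in the capability name or its linked abilities.
        # An empty query has no terms and therefore returns all capabilities.
        if all(t in (c["name"] + " " + " ".join(c["ability_refs"])).lower() for t in terms)
    ]
    return json.dumps({"query": " ".join(terms), "results": matches})
```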
---
## UC-15 — Update Registry After Repo Change
### Goal
Keep registry entries aligned with repository changes.
### Primary Actor
System
### Main Flow
1. System detects repository change.
2. System runs incremental analysis.
3. System compares old and new findings.
4. System flags changed abilities/capabilities/features.
5. Curator reviews differences.
6. Registry entry is updated.
### Postconditions
Registry reflects current repository state.
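
Step 3 of the flow, comparing old and new findings, can be sketched as a keyed diff. This assumes every finding has a stable identifier across analysis runs, which the catalog does not guarantee.

```python
def diff_findings(old: dict[str, dict], new: dict[str, dict]) -> dict[str, list[str]]:
    """Compare entries keyed by a stable id; values are plain dicts of fields."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Present in both runs but with different field values.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}
```

Only the entries in these three lists need to be shown to the curator in step 5.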
---
## UC-16 — Export Registry Entry
### Goal
Allow registry data to travel with the repository.
### Primary Actor
Repository Owner / Agent
### Main Flow
1. Actor requests export.
2. System generates `repo-abilities.yaml`.
3. Actor commits file into repository.
4. Future analyses can use it as a prior.
### Output
```text
/.well-known/repo-abilities.yaml
```
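
A minimal export sketch using only the standard library. It relies on the fact that JSON documents are also valid YAML 1.2, so the file is written as JSON under the `.yaml` name; the function name `export_entry` and the entry layout are assumptions.

```python
import json
from pathlib import Path


def export_entry(entry: dict, repo_root: Path) -> Path:
    """Write the registry entry to .well-known/repo-abilities.yaml in the repo."""
    target = repo_root / ".well-known" / "repo-abilities.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    # JSON output is also valid YAML 1.2, so no third-party YAML library is needed.
    target.write_text(json.dumps(entry, indent=2) + "\n", encoding="utf-8")
    return target
```

A block-style YAML emitter would be easier to read in code review, at the cost of an extra dependency.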
---
# 6. MVP Use Cases
For the first version, implement only these:
```text
UC-01 Register Git Repository
UC-02 Import Repository Metadata
UC-03 Analyze Repository Structure
UC-04 Extract Candidate Abilities
UC-05 Extract Candidate Capabilities
UC-06 Extract Candidate Features
UC-09 Review and Approve Analysis
UC-10 Search Repositories by Need
UC-11 Inspect Repository Ability Map
```
Everything else can follow.
---
# 7. Core MVP User Journey
```text
Register repo
Analyze repo
Review extracted ability/capability/feature map
Publish registry profile
Search and inspect repos through web UI/API
```
---
# 8. Product Principle
The registry should not merely answer:
> “What files are in this repository?”
It should answer:
> **“What can this repository help me do, how does it do it, and where can I inspect the proof?”**