generated from coulomb/repo-seed
added initial concept documents
822
wiki/AbilityExtractionHeuristics.md
Normal file
@@ -0,0 +1,822 @@
AbilityExtractionHeuristics

*How repositories will be explored*

# Ability / Capability Extraction Heuristics v0.1

## Repository Ability Registry

# 1. Purpose

The extraction engine should answer:

> “What is this repository useful for, what bounded behaviors does it provide, and where are those behaviors implemented?”

It should produce **candidate entries**, not final truth. Human/agent review remains part of the workflow.

---

# 2. Extraction Layers

```text
Ability → usefulness / problem class
Capability → bounded behavior
Feature → concrete interface or implementation
Evidence → reason to believe the claim
```

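The four layers can be sketched as a small data model. This is a minimal illustration, not a prescribed schema: the class and field names are assumptions, and nesting the layers inside each other is only one option (the worked example in section 13 links them by id instead).

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    type: str      # e.g. "unit_test"
    path: str      # where the evidence lives in the repository
    strength: str  # "strong", "medium", or "weak"


@dataclass
class Feature:
    name: str      # concrete interface or implementation
    type: str
    location: str


@dataclass
class Capability:
    name: str      # bounded behavior
    features: list[Feature] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class Ability:
    name: str      # usefulness / problem class
    capabilities: list[Capability] = field(default_factory=list)


# Hypothetical instance built from the MailRouter example used in this document.
ability = Ability(
    name="Business Email Routing",
    capabilities=[
        Capability(
            name="Classify Incoming Email",
            features=[Feature("POST /api/classify-email", "REST endpoint",
                              "src/routes/classify_email.py")],
            evidence=[Evidence("unit_test", "tests/test_email_classification.py",
                               "strong")],
        )
    ],
)
```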
---

# 3. Source Priority

Not all repository signals are equally trustworthy.

## Priority 1 — High Trust

Use these first:

```text
README
docs/
examples/
tests/
API specs
CLI help
package metadata
```

These usually express intended usage.

## Priority 2 — Medium Trust

```text
module names
function names
class names
route names
config files
workflow files
```

These show implemented structure.

## Priority 3 — Low Trust

```text
comments
commit messages
dependency names
directory names alone
```

Useful as supporting signals, but not enough by themselves.

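One way to encode the three tiers is a lookup from signal kind to priority, plus a check that Priority 3 signals never stand alone. The signal-kind identifiers (`readme`, `api_spec`, and so on) and both function names are hypothetical labels chosen for this sketch.

```python
# Signal kinds per trust tier, taken from the three lists above.
# The snake_case identifiers are assumptions made for illustration.
TRUST_TIERS = {
    1: {"readme", "docs", "examples", "tests", "api_spec", "cli_help",
        "package_metadata"},
    2: {"module_name", "function_name", "class_name", "route_name",
        "config_file", "workflow_file"},
    3: {"comment", "commit_message", "dependency_name", "directory_name"},
}


def trust_tier(signal_kind: str) -> int:
    """Return 1 (high trust), 2 (medium), or 3 (low) for a signal kind."""
    for tier, kinds in TRUST_TIERS.items():
        if signal_kind in kinds:
            return tier
    raise ValueError(f"unknown signal kind: {signal_kind}")


def is_sufficient(signal_kinds) -> bool:
    """Low-trust signals alone are not enough to propose a candidate."""
    return any(trust_tier(kind) < 3 for kind in signal_kinds)
```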
---

# 4. Ability Extraction Heuristics

Abilities describe **why the repository is useful**.

## 4.1 Ability Signal Patterns

Look for phrases like:

```text
"helps users..."
"enables..."
"automates..."
"provides a way to..."
"used for..."
"designed to..."
"allows..."
"supports..."
```

Example:

```text
"This library helps route incoming business emails."
```

Candidate ability:

```yaml
name: Business Email Routing
```

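The phrase list above could be applied with a simple regular expression over README sentences. This is a rough sketch: the sentence splitter is naive, the function name is hypothetical, and `"helps users"` is loosened to `"helps"` (an assumption) so that the example sentence matches. Turning a matched sentence into a name such as "Business Email Routing" is left to a reviewer or a later step.

```python
import re

# Signal phrases from the list above; "helps users" is relaxed to "helps".
SIGNAL_PHRASES = [
    "helps", "enables", "automates", "provides a way to",
    "used for", "designed to", "allows", "supports",
]
SIGNAL_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in SIGNAL_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_ability_sentences(text: str) -> list[str]:
    """Return the sentences that contain at least one ability signal phrase."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if SIGNAL_RE.search(s)]


readme = "This library helps route incoming business emails. It is written in Python."
candidates = find_ability_sentences(readme)
```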
---

## 4.2 Ability Naming Rule

Ability names should be:

```text
Domain + Problem Class
```

Good:

```text
Business Email Routing
Document Classification
Invoice Data Extraction
Kubernetes Deployment Inspection
Agent Workflow Orchestration
```

Bad:

```text
Fast API
Email Button
Classifier
Uses GPT
```

---

## 4.3 Ability Extraction Sources

Best sources for abilities:

```text
README intro
project tagline
docs overview
examples index
package description
```

Ability is usually described in prose, not code.

---

## 4.4 Ability Confidence

Assign confidence based on signal quality:

```yaml
confidence:
  high:
    - explicitly stated in README/docs
    - supported by examples
    - supported by tests or APIs

  medium:
    - inferred from multiple capabilities/features
    - visible in examples but not stated

  low:
    - inferred from names only
    - based on dependencies or folder structure
```

---

# 5. Capability Extraction Heuristics

Capabilities describe **bounded behavior**.

## 5.1 Capability Signal Patterns

Look for verbs applied to objects:

```text
classify email
extract invoice data
summarize document
validate schema
generate response
deploy service
monitor cluster
route ticket
convert format
```

Pattern:

```text
Verb + Object
```

Examples:

```text
Classify Email Intent
Extract Invoice Metadata
Generate Routing Explanation
Validate Repository Metadata
```

---

## 5.2 Capability Naming Rule

Capability names should be:

```text
Action Verb + Domain Object
```

Good:

```text
Classify Incoming Email
Extract PDF Metadata
Generate API Client
Validate Kubernetes Manifest
Detect Broken Links
```

Bad:

```text
Email Capability
Parser
Smart Document Stuff
Endpoint
```

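The "Action Verb + Domain Object" rule can be approximated with a first-word check. The verb set below is collected from the examples in 5.1 and 5.2 and is certainly incomplete; the function name is hypothetical. A real extractor would need a larger verb list or a part-of-speech tagger.

```python
# Verbs that appear in the examples of sections 5.1 and 5.2.
ACTION_VERBS = {
    "classify", "extract", "summarize", "validate", "generate",
    "deploy", "monitor", "route", "convert", "detect",
}


def is_well_formed_capability_name(name: str) -> bool:
    """True if the name starts with a known action verb followed by an object."""
    words = name.split()
    return len(words) >= 2 and words[0].lower() in ACTION_VERBS
```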
---

## 5.3 Capability Sources

Best sources:

```text
API route names
CLI commands
public functions
service classes
tests
examples
docs tutorials
```

Capability is often visible in code and tests.

---

## 5.4 Capability Boundary Rule

A capability should be small enough to test.

Good:

```text
Extract invoice date from PDF
Classify email into intent category
Generate markdown from DOCX
```

Too broad:

```text
Manage documents
Automate business
Understand everything
```

Too narrow:

```text
Read config variable
Call helper function
Trim whitespace
```

Rule of thumb:

> If you can write a meaningful acceptance test for it, it is probably a capability.

---

# 6. Feature Extraction Heuristics

Features describe **how the capability is exposed or implemented**.

## 6.1 Feature Signal Patterns

Look for concrete affordances:

```text
REST endpoint
CLI command
UI component
configuration option
SDK method
background job
database migration
import/export format
plugin hook
```

Examples:

```yaml
features:
  - name: /classify-email endpoint
  - name: classify-email CLI command
  - name: department-rules.yaml config
  - name: JSON result export
```

---

## 6.2 Feature Naming Rule

Feature names should be concrete and inspectable.

Good:

```text
POST /api/classify-email
classify-email CLI command
Rule Configuration File
PDF Upload Component
```

Bad:

```text
AI routing
Document understanding
Magic extraction
```

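A concrete, inspectable name usually reveals its feature type. The sketch below guesses a type from three surface patterns only (HTTP method plus path, a "CLI command" suffix, a config-file extension). The patterns and the function name are assumptions, and a `None` result may indicate a name that is too vague, such as "AI routing".

```python
import re
from typing import Optional

# Surface patterns chosen for illustration; they are not exhaustive.
HTTP_ROUTE = re.compile(r"^(GET|POST|PUT|PATCH|DELETE) /\S*$")
CONFIG_FILE = re.compile(r"\.(ya?ml|json|toml)\b", re.IGNORECASE)


def infer_feature_type(name: str) -> Optional[str]:
    """Guess a feature type from its name, or None if nothing concrete matches."""
    if HTTP_ROUTE.match(name):
        return "REST endpoint"
    if name.endswith("CLI command"):
        return "CLI command"
    if CONFIG_FILE.search(name):
        return "configuration option"
    return None
```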
---

# 7. Evidence Extraction Heuristics

Evidence supports claims.

## 7.1 Evidence Types

```yaml
evidence_types:
  - unit_test
  - integration_test
  - example
  - demo
  - benchmark
  - documentation
  - API specification
  - production usage note
  - manual review
```

---

## 7.2 Evidence Mapping

Map evidence to the nearest capability.

Example:

```text
tests/test_email_classifier.py
```

Supports:

```text
Classify Incoming Email
```

Example:

```text
examples/invoice_extraction_demo.py
```

Supports:

```text
Extract Invoice Metadata
```

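"Nearest" is not defined above. One plausible reading, assumed here, is token overlap between the evidence file name and the capability name. The sketch compares crude five-character prefixes so that `classifier` and `Classify` count as a match; the stop-word list, the prefix length, and the function names are all illustrative choices.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# File-name words that say nothing about the behavior under test (assumption).
STOPWORDS = {"test", "tests", "demo", "example", "examples"}


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, reduced to a 5-character prefix."""
    words = re.split(r"[^a-z0-9]+", text.lower())
    return {w[:5] for w in words if w and w not in STOPWORDS}


def nearest_capability(path: str, capabilities: list[str]) -> Optional[str]:
    """Return the capability sharing the most tokens with the file name."""
    file_tokens = _tokens(PurePosixPath(path).stem)
    best = max(capabilities, key=lambda c: len(file_tokens & _tokens(c)),
               default=None)
    if best is None or not file_tokens & _tokens(best):
        return None  # no overlap at all: leave the evidence unmapped
    return best


capabilities = ["Classify Incoming Email", "Extract Invoice Metadata"]
```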
---

## 7.3 Evidence Strength

```yaml
evidence_strength:
  strong:
    - automated tests
    - benchmark results
    - executable examples
    - integration tests

  medium:
    - documentation
    - tutorials
    - screenshots
    - sample output

  weak:
    - README claim only
    - comments
    - filename hints
```

---

# 8. Ability–Capability–Feature Linking

## 8.1 Link Rule

```text
Ability explains why.
Capability explains what.
Feature explains how/where.
```

Example:

```yaml
ability:
  name: Business Email Routing

capability:
  name: Classify Incoming Email
  supports:
    - Business Email Routing

feature:
  name: POST /api/classify-email
  implements:
    - Classify Incoming Email
```

---

## 8.2 Linking Heuristic

A capability supports an ability if:

```text
Removing the capability would weaken the repository’s ability to deliver that usefulness.
```

A feature implements a capability if:

```text
The feature is an interface, component, or code location through which the behavior is performed or exposed.
```

---

# 9. Confidence Scoring

Use a simple additive model first.

## 9.1 Candidate Confidence Factors

```yaml
confidence_factors:
  explicit_doc_claim: +0.30
  example_present: +0.20
  test_present: +0.25
  implementation_location_found: +0.15
  api_or_cli_exposed: +0.15
  multiple_source_agreement: +0.20
  inferred_from_names_only: -0.25
  no_evidence: -0.30
```

Normalize to:

```text
0.0 – 1.0
```

## 9.2 Confidence Labels

```yaml
0.80 - 1.00: high
0.50 - 0.79: medium
0.20 - 0.49: low
0.00 - 0.19: speculative
```

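The additive model and the label bands can be sketched in a few lines. The text does not say how to normalize; since the positive factors sum to 1.25, this sketch assumes normalization means clamping the raw sum to the 0.0 to 1.0 range and rounding to two decimals so that scores fall cleanly into the bands. The function names are hypothetical.

```python
# Weights copied from section 9.1.
CONFIDENCE_FACTORS = {
    "explicit_doc_claim": 0.30,
    "example_present": 0.20,
    "test_present": 0.25,
    "implementation_location_found": 0.15,
    "api_or_cli_exposed": 0.15,
    "multiple_source_agreement": 0.20,
    "inferred_from_names_only": -0.25,
    "no_evidence": -0.30,
}


def confidence_score(factors: set[str]) -> float:
    """Sum the weights of the observed factors and clamp to [0.0, 1.0]."""
    raw = sum(CONFIDENCE_FACTORS[name] for name in factors)
    return round(min(1.0, max(0.0, raw)), 2)  # clamping is an assumption


def confidence_label(score: float) -> str:
    """Map a normalized score to the labels from section 9.2."""
    if score >= 0.80:
        return "high"
    if score >= 0.50:
        return "medium"
    if score >= 0.20:
        return "low"
    return "speculative"


score = confidence_score({"explicit_doc_claim", "test_present", "api_or_cli_exposed"})
label = confidence_label(score)
```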
---

# 10. Classification Rules

## 10.1 Is it an Ability?

Ask:

```text
Would a user search for this as a desired outcome?
```

If yes, probably ability.

Example:

```text
“I need document classification.”
```

Ability.

---

## 10.2 Is it a Capability?

Ask:

```text
Can this behavior be tested with input/output expectations?
```

If yes, probably capability.

Example:

```text
“Classify document into category.”
```

Capability.

---

## 10.3 Is it a Feature?

Ask:

```text
Is this a concrete interface, option, component, or implementation artifact?
```

If yes, probably feature.

Example:

```text
“POST /api/classify-document”
```

Feature.

---

# 11. Anti-Heuristics

Things the extractor should avoid.

## 11.1 Do Not Treat Dependencies as Capabilities

Bad:

```yaml
capability: Uses OpenAI
```

Better:

```yaml
feature: OpenAI provider integration
capability: Generate Text Summary
```

---

## 11.2 Do Not Treat Technology as Ability

Bad:

```yaml
ability: FastAPI
```

Better:

```yaml
feature: FastAPI REST interface
```

---

## 11.3 Do Not Treat Internal Helpers as Capabilities

Bad:

```yaml
capability: Parse YAML Config
```

Unless parsing YAML config is a user-visible behavior.

---

## 11.4 Avoid Vendor-Hype Terms

Bad:

```text
intelligent automation
next-gen AI
enterprise-ready transformation
```

Convert into testable candidates:

```text
Classify Documents
Generate Reports
Route Tasks
```

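Converting a hype phrase into a testable candidate needs judgment, but flagging such phrases for review is mechanical. A minimal sketch, assuming a fixed deny-list seeded with the three bad examples above; the function name is hypothetical.

```python
# Deny-list seeded from the "Bad" examples above; extend as needed.
HYPE_TERMS = (
    "intelligent automation",
    "next-gen ai",
    "enterprise-ready transformation",
)


def split_hype(candidates: list[str]) -> tuple[list[str], list[str]]:
    """Separate candidate names into (kept, flagged_for_rewrite)."""
    kept, flagged = [], []
    for name in candidates:
        is_hype = any(term in name.lower() for term in HYPE_TERMS)
        (flagged if is_hype else kept).append(name)
    return kept, flagged


kept, flagged = split_hype(
    ["Classify Documents", "Next-Gen AI Platform", "Route Tasks"]
)
```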
---

# 12. Extraction Pipeline v0.1

## Step 1 — Repository Intake

Collect:

```text
README
docs
examples
tests
package files
source tree
API routes
CLI definitions
```

---

## Step 2 — Structural Summary

Produce:

```yaml
repository_summary:
  languages: []
  frameworks: []
  interfaces: []
  docs_found: []
  tests_found: []
  examples_found: []
```

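Part of the summary can be filled from file paths alone. The sketch below assumes the conventional top-level folders `docs/`, `tests/`, and `examples/`, and uses a small, hypothetical extension-to-language table. `frameworks` and `interfaces` stay empty because they require reading file contents.

```python
from pathlib import PurePosixPath

# Illustrative mapping only; a real extractor would cover more languages.
LANGUAGE_BY_SUFFIX = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".rs": "Rust"}


def summarize(paths: list[str]) -> dict:
    """Build a repository_summary dict from repository-relative POSIX paths."""
    summary = {
        "languages": [], "frameworks": [], "interfaces": [],
        "docs_found": [], "tests_found": [], "examples_found": [],
    }
    languages = set()
    for raw in sorted(paths):
        path = PurePosixPath(raw)
        top = path.parts[0] if path.parts else ""
        if path.suffix in LANGUAGE_BY_SUFFIX:
            languages.add(LANGUAGE_BY_SUFFIX[path.suffix])
        if top == "docs" or path.name.lower().startswith("readme"):
            summary["docs_found"].append(raw)
        elif top == "tests":
            summary["tests_found"].append(raw)
        elif top == "examples":
            summary["examples_found"].append(raw)
    summary["languages"] = sorted(languages)
    return summary


summary = summarize([
    "README.md",
    "docs/overview.md",
    "src/routes/classify_email.py",
    "tests/test_email_classification.py",
    "examples/demo.py",
])
```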
---

## Step 3 — Candidate Ability Extraction

From README/docs/package descriptions.

Output:

```yaml
candidate_abilities:
  - name
  - description
  - confidence
  - supporting_sources
```

---

## Step 4 — Candidate Capability Extraction

From APIs, tests, examples, public modules.

Output:

```yaml
candidate_capabilities:
  - name
  - description
  - inputs
  - outputs
  - linked_abilities
  - confidence
  - supporting_sources
```

---

## Step 5 — Candidate Feature Extraction

From endpoints, CLI commands, config files, UI components, modules.

Output:

```yaml
candidate_features:
  - name
  - type
  - location
  - linked_capabilities
  - confidence
```

---

## Step 6 — Evidence Linking

Attach evidence:

```yaml
evidence:
  - type
  - path
  - supports
  - strength
```

---

## Step 7 — Review Package

Generate a curator-friendly review view:

```text
Ability
Capability
Feature
Evidence
```

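The review view could be rendered as indented plain text, one line per item. This sketch assumes the four layers nest (abilities contain capabilities, which carry features and evidence) and that candidates arrive as plain dictionaries; the key names and the function name are hypothetical.

```python
def render_review(abilities: list[dict]) -> str:
    """Render candidates as an indented Ability > Capability > Feature/Evidence tree."""
    lines = []
    for ability in abilities:
        lines.append(f"Ability: {ability['name']}")
        for capability in ability.get("capabilities", []):
            lines.append(f"  Capability: {capability['name']}")
            for feature in capability.get("features", []):
                lines.append(f"    Feature: {feature}")
            for evidence in capability.get("evidence", []):
                lines.append(f"    Evidence: {evidence}")
    return "\n".join(lines)


review = render_review([
    {
        "name": "Business Email Routing",
        "capabilities": [
            {
                "name": "Classify Incoming Email",
                "features": ["POST /api/classify-email"],
                "evidence": ["tests/test_email_classification.py"],
            }
        ],
    }
])
```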
---

# 13. Example Extraction

Given README:

```text
MailRouter helps companies automatically classify incoming emails and route them to the right department.
```

Given route:

```text
POST /api/classify-email
```

Given test:

```text
tests/test_email_classification.py
```

Output:

```yaml
abilities:
  - id: ability.business_email_routing
    name: Business Email Routing
    confidence: 0.9

capabilities:
  - id: capability.classify_incoming_email
    name: Classify Incoming Email
    ability_refs:
      - ability.business_email_routing
    confidence: 0.85

features:
  - id: feature.classify_email_endpoint
    name: POST /api/classify-email
    type: REST endpoint
    location: src/routes/classify_email.py
    capability_refs:
      - capability.classify_incoming_email

evidence:
  - type: unit_test
    path: tests/test_email_classification.py
    supports:
      - capability.classify_incoming_email
    strength: strong
```

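Because the output links entries by id, a quick consistency check can catch dangling references before review. The sketch below assumes the output has already been parsed into a dictionary with the same keys as the YAML above; the function name is hypothetical.

```python
def dangling_refs(registry: dict) -> list[str]:
    """Return 'source -> target' strings for references to unknown ids."""
    ability_ids = {a["id"] for a in registry.get("abilities", [])}
    capability_ids = {c["id"] for c in registry.get("capabilities", [])}
    problems = []
    for capability in registry.get("capabilities", []):
        for ref in capability.get("ability_refs", []):
            if ref not in ability_ids:
                problems.append(f"{capability['id']} -> {ref}")
    for feature in registry.get("features", []):
        for ref in feature.get("capability_refs", []):
            if ref not in capability_ids:
                problems.append(f"{feature['id']} -> {ref}")
    for item in registry.get("evidence", []):
        for ref in item.get("supports", []):
            # Assumption: evidence may support a capability or an ability.
            if ref not in capability_ids and ref not in ability_ids:
                problems.append(f"{item['path']} -> {ref}")
    return problems


registry = {
    "abilities": [{"id": "ability.business_email_routing"}],
    "capabilities": [{"id": "capability.classify_incoming_email",
                      "ability_refs": ["ability.business_email_routing"]}],
    "features": [{"id": "feature.classify_email_endpoint",
                  "capability_refs": ["capability.classify_incoming_email"]}],
    "evidence": [{"path": "tests/test_email_classification.py",
                  "supports": ["capability.classify_incoming_email"]}],
}
```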
---

# 14. MVP Principle

The extractor should be:

```text
conservative
explainable
reviewable
source-linked
```

Not magical.

The best first version is not the one that extracts everything.

It is the one where the user says:

> “Yes, I understand why the system proposed this.”