AbilityExtractionHeuristics

*How repositories will be explored*

# Ability / Capability Extraction Heuristics v0.1

## Repository Scoping

# 1. Purpose

The extraction engine should answer:

> “What is this repository useful for, what bounded behaviors does it provide, and where are those behaviors implemented?”

It should produce **candidate entries**, not final truth. Human/agent review remains part of the workflow.

---

# 2. Extraction Layers

```text
Ability → usefulness / problem class
Capability → bounded behavior
Feature → concrete interface or implementation
Evidence → reason to believe the claim
```

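One way to picture the four layers is as nested records. The sketch below is illustrative only: the class and field names are assumptions, not a schema defined by this document, and evidence is attached to the capability it supports.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """A reason to believe a claim, such as a test file or docs page."""
    type: str
    path: str


@dataclass
class Feature:
    """A concrete interface or implementation artifact."""
    name: str


@dataclass
class Capability:
    """A bounded behavior, exposed by features and backed by evidence."""
    name: str
    features: list[Feature] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class Ability:
    """A usefulness or problem class, supported by capabilities."""
    name: str
    capabilities: list[Capability] = field(default_factory=list)


# Hypothetical instance built from the examples used later in this document.
routing = Ability(
    name="Business Email Routing",
    capabilities=[
        Capability(
            name="Classify Incoming Email",
            features=[Feature("POST /api/classify-email")],
            evidence=[Evidence("unit_test", "tests/test_email_classifier.py")],
        )
    ],
)
```
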
---

# 3. Source Priority

Not all repository signals are equally trustworthy.

## Priority 1 — High Trust

Use these first:

```text
README
docs/
examples/
tests/
API specs
CLI help
package metadata
```

These usually express intended usage.

## Priority 2 — Medium Trust

```text
module names
function names
class names
route names
config files
workflow files
```

These show implemented structure.

## Priority 3 — Low Trust

```text
comments
commit messages
dependency names
directory names alone
```

Useful as supporting signals, but not enough by themselves.

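The three tiers can be applied as a simple lookup so that higher-trust signals are read first. This is a minimal sketch: the signal-kind keys (`readme`, `route_name`, and so on) are hypothetical labels for the entries above, and treating unknown kinds as lowest trust is an assumption.

```python
# Hypothetical signal-kind labels for the three tiers listed above.
TRUST_TIERS = {
    1: {"readme", "docs", "examples", "tests", "api_spec", "cli_help", "package_metadata"},
    2: {"module_name", "function_name", "class_name", "route_name", "config_file", "workflow_file"},
    3: {"comment", "commit_message", "dependency_name", "directory_name"},
}


def trust_tier(signal_kind: str) -> int:
    """Return 1 (high), 2 (medium), or 3 (low) for a signal kind."""
    for tier, kinds in TRUST_TIERS.items():
        if signal_kind in kinds:
            return tier
    return 3  # assumption: unknown kinds get the lowest trust


def order_signals(signals):
    """Sort (kind, value) pairs so the most trusted kinds come first."""
    return sorted(signals, key=lambda pair: trust_tier(pair[0]))


ordered = order_signals([("comment", "x"), ("readme", "y"), ("route_name", "z")])
```
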
---

# 4. Ability Extraction Heuristics

Abilities describe **why the repository is useful**.

## 4.1 Ability Signal Patterns

Look for phrases like:

```text
"helps users..."
"enables..."
"automates..."
"provides a way to..."
"used for..."
"designed to..."
"allows..."
"supports..."
```

Example:

```text
"This library helps route incoming business emails."
```

Candidate ability:

```yaml
name: Business Email Routing
```

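A first pass over README prose could flag sentences that contain these phrases. The sketch below is an assumption-laden illustration: the function name is hypothetical, the sentence splitter is deliberately naive, and "helps users..." is loosened to "helps" so that the example sentence above matches.

```python
import re

# Signal phrases from the list above; "helps users" is loosened to "helps".
ABILITY_PHRASES = [
    "helps", "enables", "automates", "provides a way to",
    "used for", "designed to", "allows", "supports",
]
PHRASE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in ABILITY_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_ability_sentences(text: str) -> list[str]:
    """Return sentences that contain at least one ability signal phrase."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if PHRASE_PATTERN.search(s)]


readme = "This library helps route incoming business emails. It is written in Python."
hits = find_ability_sentences(readme)
```

The flagged sentences are only candidates; turning one into a name such as `Business Email Routing` still needs review.
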
---

## 4.2 Ability Naming Rule

Ability names should be:

```text
Domain + Problem Class
```

Good:

```text
Business Email Routing
Document Classification
Invoice Data Extraction
Kubernetes Deployment Inspection
Agent Workflow Orchestration
```

Bad:

```text
Fast API
Email Button
Classifier
Uses GPT
```

---

## 4.3 Ability Extraction Sources

Best sources for abilities:

```text
README intro
project tagline
docs overview
examples index
package description
```

Ability is usually described in prose, not code.

---

## 4.4 Ability Confidence

Assign confidence based on signal quality:

```yaml
confidence:
  high:
    - explicitly stated in README/docs
    - supported by examples
    - supported by tests or APIs

  medium:
    - inferred from multiple capabilities/features
    - visible in examples but not stated

  low:
    - inferred from names only
    - based on dependencies or folder structure
```

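The rubric above does not say whether "high" needs all of its signals or only some. The sketch below picks one reading as an assumption: an explicit statement plus at least one supporting example, test, or API gives "high"; a statement alone, or inference from capabilities or examples, gives "medium". The signal keys are hypothetical.

```python
def ability_confidence(signals: set[str]) -> str:
    """Map a set of observed signal keys to high / medium / low."""
    stated = "stated_in_docs" in signals
    backed = bool(signals & {"supported_by_examples", "supported_by_tests_or_api"})
    if stated and backed:
        return "high"
    # Assumption: a bare statement, or inference without a statement, is medium.
    inferred = bool(signals & {"inferred_from_capabilities", "supported_by_examples"})
    if stated or inferred:
        return "medium"
    # Names, dependencies, or folder structure alone fall through to low.
    return "low"
```
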
---

# 5. Capability Extraction Heuristics

Capabilities describe **bounded behavior**.

## 5.1 Capability Signal Patterns

Look for verbs applied to objects:

```text
classify email
extract invoice data
summarize document
validate schema
generate response
deploy service
monitor cluster
route ticket
convert format
```

Pattern:

```text
Verb + Object
```

Examples:

```text
Classify Email Intent
Extract Invoice Metadata
Generate Routing Explanation
Validate Repository Metadata
```

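The Verb + Object pattern can be checked mechanically against identifiers such as function or command names. This is a rough sketch under stated assumptions: the verb list is just the verbs from the examples above, and the function names are hypothetical.

```python
import re

# Verbs taken from the examples above; a real list would be longer.
KNOWN_VERBS = {
    "classify", "extract", "summarize", "validate", "generate",
    "deploy", "monitor", "route", "convert",
}


def split_identifier(name: str) -> list[str]:
    """Split snake_case, kebab-case, or camelCase into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [w.lower() for w in re.split(r"[\s_\-]+", spaced) if w]


def candidate_capability_name(identifier: str):
    """Return a 'Verb Object' name, or None if the pattern does not fit."""
    words = split_identifier(identifier)
    if len(words) < 2 or words[0] not in KNOWN_VERBS:
        return None
    return " ".join(w.capitalize() for w in words)
```

For example, `classify_email` yields `Classify Email`, while `EmailParser` is rejected because it does not start with a known verb.
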
---

## 5.2 Capability Naming Rule

Capability names should be:

```text
Action Verb + Domain Object
```

Good:

```text
Classify Incoming Email
Extract PDF Metadata
Generate API Client
Validate Kubernetes Manifest
Detect Broken Links
```

Bad:

```text
Email Capability
Parser
Smart Document Stuff
Endpoint
```

---

## 5.3 Capability Sources

Best sources:

```text
API route names
CLI commands
public functions
service classes
tests
examples
docs tutorials
```

Capability is often visible in code and tests.

---

## 5.4 Capability Boundary Rule

A capability should be small enough to test.

Good:

```text
Extract invoice date from PDF
Classify email into intent category
Generate markdown from DOCX
```

Too broad:

```text
Manage documents
Automate business
Understand everything
```

Too narrow:

```text
Read config variable
Call helper function
Trim whitespace
```

Rule of thumb:

> If you can write a meaningful acceptance test for it, it is probably a capability.

---

# 6. Feature Extraction Heuristics

Features describe **how the capability is exposed or implemented**.

## 6.1 Feature Signal Patterns

Look for concrete affordances:

```text
REST endpoint
CLI command
UI component
configuration option
SDK method
background job
database migration
import/export format
plugin hook
```

Examples:

```yaml
features:
  - name: /classify-email endpoint
  - name: classify-email CLI command
  - name: department-rules.yaml config
  - name: JSON result export
```

---

## 6.2 Feature Naming Rule

Feature names should be concrete and inspectable.

Good:

```text
POST /api/classify-email
classify-email CLI command
Rule Configuration File
PDF Upload Component
```

Bad:

```text
AI routing
Document understanding
Magic extraction
```

---

# 7. Evidence Extraction Heuristics

Evidence supports claims.

## 7.1 Evidence Types

```yaml
evidence_types:
  - unit_test
  - integration_test
  - example
  - demo
  - benchmark
  - documentation
  - API specification
  - production usage note
  - manual review
```

---

## 7.2 Evidence Mapping

Map evidence to the nearest capability.

Example:

```text
tests/test_email_classifier.py
```

Supports:

```text
Classify Incoming Email
```

Example:

```text
examples/invoice_extraction_demo.py
```

Supports:

```text
Extract Invoice Metadata
```

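"Nearest" is not defined here. One plausible reading, sketched below, is word overlap between the evidence file name and each capability name. The five-character prefix used as a crude stem (so that `classifier` matches `Classify`) and the ignored words are assumptions, not part of this specification.

```python
import re
from pathlib import PurePosixPath

# Words that describe the kind of file rather than the behavior (assumption).
IGNORED = {"test", "tests", "demo", "example", "examples"}


def stems(text: str) -> set[str]:
    """Lowercase words reduced to a crude 5-character prefix stem."""
    words = re.split(r"[^a-z0-9]+", text.lower())
    return {w[:5] for w in words if w and w not in IGNORED}


def nearest_capability(evidence_path: str, capabilities: list[str]):
    """Return the capability sharing the most stems with the file name."""
    file_stems = stems(PurePosixPath(evidence_path).stem)
    best, best_score = None, 0
    for capability in capabilities:
        score = len(file_stems & stems(capability))
        if score > best_score:
            best, best_score = capability, score
    return best  # None when nothing overlaps


caps = ["Classify Incoming Email", "Extract Invoice Metadata"]
match = nearest_capability("tests/test_email_classifier.py", caps)
```

A result of `None` is useful in its own right: it marks evidence that a reviewer has to place by hand.
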
---

## 7.3 Evidence Strength

```yaml
evidence_strength:
  strong:
    - automated tests
    - benchmark results
    - executable examples
    - integration tests

  medium:
    - documentation
    - tutorials
    - screenshots
    - sample output

  weak:
    - README claim only
    - comments
    - filename hints
```

---

# 8. Ability–Capability–Feature Linking

## 8.1 Link Rule

```text
Ability explains why.
Capability explains what.
Feature explains how/where.
```

Example:

```yaml
ability:
  name: Business Email Routing

capability:
  name: Classify Incoming Email
  supports:
    - Business Email Routing

feature:
  name: POST /api/classify-email
  implements:
    - Classify Incoming Email
```

---

## 8.2 Linking Heuristic

A capability supports an ability if:

```text
Removing the capability would weaken the repository’s ability to deliver that usefulness.
```

A feature implements a capability if:

```text
The feature is an interface, component, or code location through which the behavior is performed or exposed.
```

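Whether a link is justified is a judgment call, but whether it points at something that exists can be checked mechanically. The sketch below assumes candidates are plain dictionaries using the `name`, `supports`, and `implements` keys from the example in 8.1; the function name is hypothetical.

```python
def find_broken_links(abilities, capabilities, features):
    """Return (source, relation, target) tuples for links to unknown names."""
    ability_names = {a["name"] for a in abilities}
    capability_names = {c["name"] for c in capabilities}
    problems = []
    for cap in capabilities:
        for target in cap.get("supports", []):
            if target not in ability_names:
                problems.append((cap["name"], "supports", target))
    for feat in features:
        for target in feat.get("implements", []):
            if target not in capability_names:
                problems.append((feat["name"], "implements", target))
    return problems


abilities = [{"name": "Business Email Routing"}]
capabilities = [
    {"name": "Classify Incoming Email", "supports": ["Business Email Routing"]}
]
features = [
    {"name": "POST /api/classify-email", "implements": ["Classify Incoming Email"]}
]
broken = find_broken_links(abilities, capabilities, features)
```
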
---

# 9. Confidence Scoring

Use a simple additive model first.

## 9.1 Candidate Confidence Factors

```yaml
confidence_factors:
  explicit_doc_claim: +0.30
  example_present: +0.20
  test_present: +0.25
  implementation_location_found: +0.15
  api_or_cli_exposed: +0.15
  multiple_source_agreement: +0.20
  inferred_from_names_only: -0.25
  no_evidence: -0.30
```

The summed factors can fall below 0.0 or exceed 1.0, so normalize the result to:

```text
0.0 – 1.0
```

## 9.2 Confidence Labels

```yaml
0.80 - 1.00: high
0.50 - 0.79: medium
0.20 - 0.49: low
0.00 - 0.19: speculative
```

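The additive model and the label bands can be sketched in a few lines. The weights and thresholds come from 9.1 and 9.2; clamping as the normalization step, rounding to two decimals, and counting each factor at most once are assumptions.

```python
CONFIDENCE_FACTORS = {
    "explicit_doc_claim": 0.30,
    "example_present": 0.20,
    "test_present": 0.25,
    "implementation_location_found": 0.15,
    "api_or_cli_exposed": 0.15,
    "multiple_source_agreement": 0.20,
    "inferred_from_names_only": -0.25,
    "no_evidence": -0.30,
}

# Lower bound of each band from 9.2, highest first.
LABELS = [(0.80, "high"), (0.50, "medium"), (0.20, "low"), (0.00, "speculative")]


def confidence_score(observed) -> float:
    """Sum the weights of the observed factors and clamp to 0.0 .. 1.0."""
    total = sum(CONFIDENCE_FACTORS[name] for name in set(observed))
    return round(min(1.0, max(0.0, total)), 2)


def confidence_label(score: float) -> str:
    """Return the label of the first band whose lower bound is reached."""
    for threshold, label in LABELS:
        if score >= threshold:
            return label
    return "speculative"


score = confidence_score(["explicit_doc_claim", "test_present", "api_or_cli_exposed"])
```

Here the three observed factors sum to 0.70, which falls in the medium band.
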
---

# 10. Classification Rules

## 10.1 Is it an Ability?

Ask:

```text
Would a user search for this as a desired outcome?
```

If yes, probably ability.

Example:

```text
“I need document classification.”
```

Ability.

---

## 10.2 Is it a Capability?

Ask:

```text
Can this behavior be tested with input/output expectations?
```

If yes, probably capability.

Example:

```text
“Classify document into category.”
```

Capability.

---

## 10.3 Is it a Feature?

Ask:

```text
Is this a concrete interface, option, component, or implementation artifact?
```

If yes, probably feature.

Example:

```text
“POST /api/classify-document”
```

Feature.

---

# 11. Anti-Heuristics

Things the extractor should avoid.

## 11.1 Do Not Treat Dependencies as Capabilities

Bad:

```yaml
capability: Uses OpenAI
```

Better:

```yaml
feature: OpenAI provider integration
capability: Generate Text Summary
```

---

## 11.2 Do Not Treat Technology as Ability

Bad:

```yaml
ability: FastAPI
```

Better:

```yaml
feature: FastAPI REST interface
```

---

## 11.3 Do Not Treat Internal Helpers as Capabilities

Bad:

```yaml
capability: Parse YAML Config
```

This is acceptable only if parsing YAML config is a user-visible behavior.

---

## 11.4 Avoid Vendor-Hype Terms

Bad:

```text
intelligent automation
next-gen AI
enterprise-ready transformation
```

Convert into testable candidates:

```text
Classify Documents
Generate Reports
Route Tasks
```

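Several of these anti-heuristics can run as a cheap pre-filter before candidates reach a reviewer. The sketch below is illustrative: the hype list is only the three terms above, the "uses ..." prefix check and the caller-supplied technology list are assumptions, and rule 11.3 is left out because it needs human judgment.

```python
# The three hype terms listed above, lowercased for comparison.
HYPE_TERMS = {"intelligent automation", "next-gen ai", "enterprise-ready transformation"}


def rejection_reason(candidate_name: str, known_technologies: set[str]):
    """Return why a candidate name should be rejected, or None to keep it."""
    lowered = candidate_name.strip().lower()
    if lowered in HYPE_TERMS:
        return "vendor-hype term"
    if lowered.startswith("uses "):
        return "dependency, not a behavior"
    if lowered in {t.lower() for t in known_technologies}:
        return "technology, not an ability or capability"
    return None


tech = {"FastAPI", "OpenAI"}  # hypothetical list, e.g. from package metadata
reason = rejection_reason("Uses OpenAI", tech)
```
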
---

# 12. Extraction Pipeline v0.1

## Step 1 — Repository Intake

Collect:

```text
README
docs
examples
tests
package files
source tree
API routes
CLI definitions
```

---

## Step 2 — Structural Summary

Produce:

```yaml
repository_summary:
  languages: []
  frameworks: []
  interfaces: []
  docs_found: []
  tests_found: []
  examples_found: []
```

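Part of this summary can be filled in from the file tree alone. The sketch below is a minimal illustration: the suffix-to-language table and the `docs/`, `tests/`, `examples/` directory conventions are assumptions, and `frameworks` and `interfaces` are left empty because they need deeper inspection.

```python
from pathlib import Path

# Hypothetical, deliberately short suffix table.
LANGUAGE_BY_SUFFIX = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".rs": "Rust"}


def summarize_repository(root) -> dict:
    """Build the Step 2 summary from file paths under `root`."""
    root = Path(root)
    rel = [p.relative_to(root) for p in root.rglob("*") if p.is_file()]

    def under(top: str) -> list[str]:
        return sorted(p.as_posix() for p in rel if p.parts[0] == top)

    languages = sorted(
        {LANGUAGE_BY_SUFFIX[p.suffix] for p in rel if p.suffix in LANGUAGE_BY_SUFFIX}
    )
    docs = sorted(
        p.as_posix()
        for p in rel
        if p.parts[0] == "docs" or p.name.lower().startswith("readme")
    )
    return {
        "repository_summary": {
            "languages": languages,
            "frameworks": [],   # not detected in this sketch
            "interfaces": [],   # not detected in this sketch
            "docs_found": docs,
            "tests_found": under("tests"),
            "examples_found": under("examples"),
        }
    }
```
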
---

## Step 3 — Candidate Ability Extraction

From README/docs/package descriptions.

Output:

```yaml
candidate_abilities:
  - name
  - description
  - confidence
  - supporting_sources
```

---

## Step 4 — Candidate Capability Extraction

From APIs, tests, examples, public modules.

Output:

```yaml
candidate_capabilities:
  - name
  - description
  - inputs
  - outputs
  - linked_abilities
  - confidence
  - supporting_sources
```

---

## Step 5 — Candidate Feature Extraction

From endpoints, CLI commands, config files, UI components, modules.

Output:

```yaml
candidate_features:
  - name
  - type
  - location
  - linked_capabilities
  - confidence
```

---

## Step 6 — Evidence Linking

Attach evidence:

```yaml
evidence:
  - type
  - path
  - supports
  - strength
```

---

## Step 7 — Review Package

Generate a curator-friendly review view:

```text
Ability
Capability
Feature
Evidence
```

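The review view could be as simple as an indented text tree. The sketch below assumes nested dictionaries with hypothetical keys (`capabilities`, `features`, `evidence`); the actual review format is not specified here.

```python
def render_review(abilities: list[dict]) -> str:
    """Render abilities, capabilities, features, and evidence as an indented tree."""
    lines = []
    for ability in abilities:
        lines.append("Ability: " + ability["name"])
        for cap in ability.get("capabilities", []):
            lines.append("  Capability: " + cap["name"])
            for feature in cap.get("features", []):
                lines.append("    Feature: " + feature)
            for ev in cap.get("evidence", []):
                lines.append("    Evidence: " + ev["path"] + " (" + ev["strength"] + ")")
    return "\n".join(lines)


view = render_review([
    {
        "name": "Business Email Routing",
        "capabilities": [
            {
                "name": "Classify Incoming Email",
                "features": ["POST /api/classify-email"],
                "evidence": [
                    {"path": "tests/test_email_classification.py", "strength": "strong"}
                ],
            }
        ],
    }
])
```
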
---

# 13. Example Extraction

Given README:

```text
MailRouter helps companies automatically classify incoming emails and route them to the right department.
```

Given route:

```text
POST /api/classify-email
```

Given test:

```text
tests/test_email_classification.py
```

Output:

```yaml
abilities:
  - id: ability.business_email_routing
    name: Business Email Routing
    confidence: 0.9

capabilities:
  - id: capability.classify_incoming_email
    name: Classify Incoming Email
    ability_refs:
      - ability.business_email_routing
    confidence: 0.85

features:
  - id: feature.classify_email_endpoint
    name: POST /api/classify-email
    type: REST endpoint
    location: src/routes/classify_email.py
    capability_refs:
      - capability.classify_incoming_email

evidence:
  - type: unit_test
    path: tests/test_email_classification.py
    supports:
      - capability.classify_incoming_email
    strength: strong
```

---

# 14. MVP Principle

The extractor should be:

```text
conservative
explainable
reviewable
source-linked
```

Not magical.

The best first version is not the one that extracts everything.

It is the one where the user says:

> “Yes, I understand why the system proposed this.”