added initial concept documents

2026-04-25 21:15:17 +02:00
parent 4bcd22a518
commit 8d3d5aab42
7 changed files with 3052 additions and 0 deletions
--- a/wiki/AbilityExtractionHeuristics.md
+++ b/wiki/AbilityExtractionHeuristics.md
@@ -0,0 +1,822 @@
+AbilityExtractionHeuristics
+
+*How repositories will be explored*
+
+# Ability / Capability Extraction Heuristics v0.1
+
+## Repository Ability Registry
+
+# 1. Purpose
+
+The extraction engine should answer:
+
+> “What is this repository useful for, what bounded behaviors does it provide, and where are those behaviors implemented?”
+
+It should produce **candidate entries**, not final truth. Human/agent review remains part of the workflow.
+
+---
+
+# 2. Extraction Layers
+
+```text
+Ability      → usefulness / problem class
+Capability   → bounded behavior
+Feature      → concrete interface or implementation
+Evidence     → reason to believe the claim
+```
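One way to picture the four layers is as nested records. The sketch below is an assumption about shape only: the class and field names are illustrative and are not a schema defined by this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    type: str      # e.g. "unit_test"
    path: str      # where the evidence lives in the repository
    strength: str  # "strong", "medium", or "weak"


@dataclass
class Feature:
    name: str      # concrete interface, e.g. "POST /api/classify-email"
    location: str  # file or module that exposes or implements it


@dataclass
class Capability:
    name: str  # bounded behavior
    features: List[Feature] = field(default_factory=list)
    evidence: List[Evidence] = field(default_factory=list)


@dataclass
class Ability:
    name: str  # usefulness / problem class
    capabilities: List[Capability] = field(default_factory=list)


# Illustrative instance; values are placeholders for a candidate entry.
ability = Ability(
    name="Business Email Routing",
    capabilities=[
        Capability(
            name="Classify Incoming Email",
            features=[Feature("POST /api/classify-email", "src/routes/classify_email.py")],
            evidence=[Evidence("unit_test", "tests/test_email_classification.py", "strong")],
        )
    ],
)
```

Nesting evidence under the capability mirrors the idea that evidence backs a bounded behavior rather than a broad ability.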
+
+---
+
+# 3. Source Priority
+
+Not all repository signals are equally trustworthy.
+
+## Priority 1 — High Trust
+
+Use these first:
+
+```text
+README
+docs/
+examples/
+tests/
+API specs
+CLI help
+package metadata
+```
+
+These usually express intended usage.
+
+## Priority 2 — Medium Trust
+
+```text
+module names
+function names
+class names
+route names
+config files
+workflow files
+```
+
+These show implemented structure.
+
+## Priority 3 — Low Trust
+
+```text
+comments
+commit messages
+dependency names
+directory names alone
+```
+
+These are useful as supporting signals but are not sufficient on their own.
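The priority tiers could be approximated from file paths alone. This is a minimal sketch: the directory names, metadata file names, and suffix list are assumptions chosen for illustration, and signals that are not files (function names, commit messages) are out of scope here.

```python
from pathlib import PurePosixPath

# Assumed conventions; a real repository may name these differently.
HIGH_TRUST_DIRS = {"docs", "examples", "tests"}
PACKAGE_METADATA = {"pyproject.toml", "package.json", "setup.cfg"}
MEDIUM_TRUST_SUFFIXES = {".py", ".yaml", ".yml", ".toml", ".json"}


def trust_tier(path: str) -> int:
    """Return 1 (high), 2 (medium), or 3 (low) for a repository path."""
    p = PurePosixPath(path)
    if p.name.lower().startswith("readme") or p.name in PACKAGE_METADATA:
        return 1
    if p.parts and p.parts[0] in HIGH_TRUST_DIRS:
        return 1
    if p.suffix in MEDIUM_TRUST_SUFFIXES:
        return 2  # source, config, and workflow files show implemented structure
    return 3
```

For example, `trust_tier("tests/test_email_classifier.py")` yields tier 1, while `trust_tier("src/routes/classify_email.py")` yields tier 2.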
+
+---
+
+# 4. Ability Extraction Heuristics
+
+Abilities describe **why the repository is useful**.
+
+## 4.1 Ability Signal Patterns
+
+Look for phrases like:
+
+```text
+"helps users..."
+"enables..."
+"automates..."
+"provides a way to..."
+"used for..."
+"designed to..."
+"allows..."
+"supports..."
+```
+
+Example:
+
+```text
+"This library helps route incoming business emails."
+```
+
+Candidate ability:
+
+```yaml
+name: Business Email Routing
+```
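A rough way to find such phrases is a regular expression over prose. The sketch below is an assumption, not the extractor's actual matcher; it treats "helps" with or without "users" as a signal so that the example sentence above matches, and it leaves the step from claim to ability name to a reviewer.

```python
import re
from typing import List, Tuple

# Signal phrases from the list above, followed by the rest of the sentence.
SIGNAL_RE = re.compile(
    r"\b(helps(?: users)?|enables|automates|provides a way to|used for|"
    r"designed to|allows|supports)\b\s+(.+?)[.!?]",
    re.IGNORECASE,
)


def ability_claims(text: str) -> List[Tuple[str, str]]:
    """Return (signal phrase, claimed usefulness) pairs found in prose."""
    return [(m.group(1).lower(), m.group(2).strip()) for m in SIGNAL_RE.finditer(text)]


claims = ability_claims("This library helps route incoming business emails.")
```

Here `claims` holds one pair, `("helps", "route incoming business emails")`, which a curator could name "Business Email Routing".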
+
+---
+
+## 4.2 Ability Naming Rule
+
+Ability names should be:
+
+```text
+Domain + Problem Class
+```
+
+Good:
+
+```text
+Business Email Routing
+Document Classification
+Invoice Data Extraction
+Kubernetes Deployment Inspection
+Agent Workflow Orchestration
+```
+
+Bad:
+
+```text
+Fast API
+Email Button
+Classifier
+Uses GPT
+```
+
+---
+
+## 4.3 Ability Extraction Sources
+
+Best sources for abilities:
+
+```text
+README intro
+project tagline
+docs overview
+examples index
+package description
+```
+
+Abilities are usually described in prose, not code.
+
+---
+
+## 4.4 Ability Confidence
+
+Assign confidence based on signal quality:
+
+```yaml
+confidence:
+  high:
+    - explicitly stated in README/docs
+    - supported by examples
+    - supported by tests or APIs
+
+  medium:
+    - inferred from multiple capabilities/features
+    - visible in examples but not stated
+
+  low:
+    - inferred from names only
+    - based on dependencies or folder structure
+```
+
+---
+
+# 5. Capability Extraction Heuristics
+
+Capabilities describe **bounded behavior**.
+
+## 5.1 Capability Signal Patterns
+
+Look for verbs applied to objects:
+
+```text
+classify email
+extract invoice data
+summarize document
+validate schema
+generate response
+deploy service
+monitor cluster
+route ticket
+convert format
+```
+
+Pattern:
+
+```text
+Verb + Object
+```
+
+Examples:
+
+```text
+Classify Email Intent
+Extract Invoice Metadata
+Generate Routing Explanation
+Validate Repository Metadata
+```
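The Verb + Object pattern can be applied mechanically to public identifiers. This sketch assumes snake_case names and uses only the verbs listed above; the function name and the rule that a leading underscore marks a private helper are illustrative assumptions.

```python
from typing import Optional

CAPABILITY_VERBS = {
    "classify", "extract", "summarize", "validate", "generate",
    "deploy", "monitor", "route", "convert",
}


def capability_candidate(identifier: str) -> Optional[str]:
    """Turn a snake_case identifier into a 'Verb Object' name, or None."""
    if identifier.startswith("_"):
        return None  # assumed private helper, not a user-visible behavior
    words = identifier.lower().split("_")
    if len(words) < 2 or words[0] not in CAPABILITY_VERBS:
        return None  # needs a known verb followed by at least one object word
    return " ".join(word.capitalize() for word in words)
```

For instance, `capability_candidate("classify_email_intent")` gives `"Classify Email Intent"`, while `capability_candidate("trim_whitespace")` gives `None`.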
+
+---
+
+## 5.2 Capability Naming Rule
+
+Capability names should be:
+
+```text
+Action Verb + Domain Object
+```
+
+Good:
+
+```text
+Classify Incoming Email
+Extract PDF Metadata
+Generate API Client
+Validate Kubernetes Manifest
+Detect Broken Links
+```
+
+Bad:
+
+```text
+Email Capability
+Parser
+Smart Document Stuff
+Endpoint
+```
+
+---
+
+## 5.3 Capability Sources
+
+Best sources:
+
+```text
+API route names
+CLI commands
+public functions
+service classes
+tests
+examples
+docs tutorials
+```
+
+Capabilities are often visible in code and tests.
+
+---
+
+## 5.4 Capability Boundary Rule
+
+A capability should be small enough to test.
+
+Good:
+
+```text
+Extract invoice date from PDF
+Classify email into intent category
+Generate markdown from DOCX
+```
+
+Too broad:
+
+```text
+Manage documents
+Automate business
+Understand everything
+```
+
+Too narrow:
+
+```text
+Read config variable
+Call helper function
+Trim whitespace
+```
+
+Rule of thumb:
+
+> If you can write a meaningful acceptance test for it, it is probably a capability.
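To make the rule of thumb concrete, here is what an acceptance test for "Classify email into intent category" might look like. `classify_email_intent` is a hypothetical stand-in built on keyword rules so the example runs; it is not an implementation from any repository.

```python
def classify_email_intent(body: str) -> str:
    """Hypothetical stand-in: keyword rules instead of a real classifier."""
    text = body.lower()
    if "invoice" in text or "payment" in text:
        return "billing"
    if "password" in text or "login" in text:
        return "support"
    return "general"


def test_classify_email_into_intent_category() -> None:
    # Acceptance test: a given input email must map to a specific category.
    assert classify_email_intent("Please resend the invoice for March.") == "billing"
    assert classify_email_intent("I cannot reset my password.") == "support"
    assert classify_email_intent("Hello there.") == "general"


test_classify_email_into_intent_category()
```

By contrast, "Trim whitespace" would only produce a unit test of a helper, and "Manage documents" has no single input/output pair to assert on.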
+
+---
+
+# 6. Feature Extraction Heuristics
+
+Features describe **how the capability is exposed or implemented**.
+
+## 6.1 Feature Signal Patterns
+
+Look for concrete affordances:
+
+```text
+REST endpoint
+CLI command
+UI component
+configuration option
+SDK method
+background job
+database migration
+import/export format
+plugin hook
+```
+
+Examples:
+
+```yaml
+features:
+  - name: /classify-email endpoint
+  - name: classify-email CLI command
+  - name: department-rules.yaml config
+  - name: JSON result export
+```
+
+---
+
+## 6.2 Feature Naming Rule
+
+Feature names should be concrete and inspectable.
+
+Good:
+
+```text
+POST /api/classify-email
+classify-email CLI command
+Rule Configuration File
+PDF Upload Component
+```
+
+Bad:
+
+```text
+AI routing
+Document understanding
+Magic extraction
+```
+
+---
+
+# 7. Evidence Extraction Heuristics
+
+Evidence supports claims.
+
+## 7.1 Evidence Types
+
+```yaml
+evidence_types:
+  - unit_test
+  - integration_test
+  - example
+  - demo
+  - benchmark
+  - documentation
+  - API specification
+  - production usage note
+  - manual review
+```
+
+---
+
+## 7.2 Evidence Mapping
+
+Map evidence to the nearest capability.
+
+Example:
+
+```text
+tests/test_email_classifier.py
+```
+
+Supports:
+
+```text
+Classify Incoming Email
+```
+
+Example:
+
+```text
+examples/invoice_extraction_demo.py
+```
+
+Supports:
+
+```text
+Extract Invoice Metadata
+```
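"Nearest" is not defined above, so the following sketch assumes a crude notion of it: count capability words that share a five-character prefix with a word in the evidence file name. The prefix length and the ignored-token list are illustrative choices.

```python
import re
from pathlib import PurePosixPath
from typing import List, Optional, Set

IGNORED_TOKENS = {"test", "tests", "demo", "example", "examples"}


def tokens(text: str) -> Set[str]:
    """Lowercase alphabetic tokens, minus words that only mark evidence type."""
    return {t for t in re.split(r"[^a-z]+", text.lower()) if t and t not in IGNORED_TOKENS}


def overlap(path: str, capability: str) -> int:
    """Count capability words sharing a 5-character prefix with a path word."""
    path_tokens = tokens(PurePosixPath(path).stem)
    return sum(any(p[:5] == c[:5] for p in path_tokens) for c in tokens(capability))


def nearest_capability(path: str, capabilities: List[str]) -> Optional[str]:
    best = max(capabilities, key=lambda name: overlap(path, name), default=None)
    if best is None or overlap(path, best) == 0:
        return None  # no lexical link; leave the evidence unmapped for review
    return best
```

With the two capabilities above, `tests/test_email_classifier.py` maps to "Classify Incoming Email" and `examples/invoice_extraction_demo.py` maps to "Extract Invoice Metadata". A filename match alone is a weak signal, so the mapping should still be reviewed.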
+
+---
+
+## 7.3 Evidence Strength
+
+```yaml
+evidence_strength:
+  strong:
+    - automated tests
+    - benchmark results
+    - executable examples
+    - integration tests
+
+  medium:
+    - documentation
+    - tutorials
+    - screenshots
+    - sample output
+
+  weak:
+    - README claim only
+    - comments
+    - filename hints
+```
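The strength table lends itself to a simple lookup. In this sketch the type identifiers are assumed snake_case spellings of the entries above, and types the table does not rank are left unrated rather than guessed.

```python
from typing import Iterable, Optional

EVIDENCE_STRENGTH = {
    "unit_test": "strong", "integration_test": "strong",
    "benchmark": "strong", "example": "strong",
    "documentation": "medium", "tutorial": "medium",
    "screenshot": "medium", "sample_output": "medium",
    "readme_claim": "weak", "comment": "weak", "filename_hint": "weak",
}
STRENGTH_ORDER = {"weak": 1, "medium": 2, "strong": 3}


def strongest(evidence_types: Iterable[str]) -> Optional[str]:
    """Return the best strength among the given evidence types, if any is rated."""
    rated = [EVIDENCE_STRENGTH[t] for t in evidence_types if t in EVIDENCE_STRENGTH]
    return max(rated, key=STRENGTH_ORDER.__getitem__, default=None)
```

For example, `strongest(["documentation", "unit_test"])` returns `"strong"`, and an unrated type such as `"manual_review"` returns `None` so a reviewer can decide.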
+
+---
+
+# 8. Ability–Capability–Feature Linking
+
+## 8.1 Link Rule
+
+```text
+Ability explains why.
+Capability explains what.
+Feature explains how/where.
+```
+
+Example:
+
+```yaml
+ability:
+  name: Business Email Routing
+
+capability:
+  name: Classify Incoming Email
+  supports:
+    - Business Email Routing
+
+feature:
+  name: POST /api/classify-email
+  implements:
+    - Classify Incoming Email
+```
+
+---
+
+## 8.2 Linking Heuristic
+
+A capability supports an ability if:
+
+```text
+Removing the capability would weaken the repository’s ability to deliver that usefulness.
+```
+
+A feature implements a capability if:
+
+```text
+The feature is an interface, component, or code location through which the behavior is performed or exposed.
+```
+
+---
+
+# 9. Confidence Scoring
+
+Use a simple additive model first.
+
+## 9.1 Candidate Confidence Factors
+
+```yaml
+confidence_factors:
+  explicit_doc_claim: +0.30
+  example_present: +0.20
+  test_present: +0.25
+  implementation_location_found: +0.15
+  api_or_cli_exposed: +0.15
+  multiple_source_agreement: +0.20
+  inferred_from_names_only: -0.25
+  no_evidence: -0.30
+```
+
+Normalize the summed score to the range:
+
+```text
+0.0 – 1.0
+```
+
+## 9.2 Confidence Labels
+
+```yaml
+0.80 - 1.00: high
+0.50 - 0.79: medium
+0.20 - 0.49: low
+0.00 - 0.19: speculative
+```
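The additive model and the labels can be sketched in a few lines. The normalization method is not specified above; this sketch assumes simple clamping to the 0.0 to 1.0 range, and the function names are illustrative.

```python
from typing import Set

CONFIDENCE_FACTORS = {
    "explicit_doc_claim": 0.30,
    "example_present": 0.20,
    "test_present": 0.25,
    "implementation_location_found": 0.15,
    "api_or_cli_exposed": 0.15,
    "multiple_source_agreement": 0.20,
    "inferred_from_names_only": -0.25,
    "no_evidence": -0.30,
}


def confidence_score(signals: Set[str]) -> float:
    """Sum the factors for the observed signals, then clamp (assumed) to [0, 1]."""
    raw = sum(CONFIDENCE_FACTORS[s] for s in signals)
    return round(min(1.0, max(0.0, raw)), 2)


def confidence_label(score: float) -> str:
    if score >= 0.80:
        return "high"
    if score >= 0.50:
        return "medium"
    if score >= 0.20:
        return "low"
    return "speculative"


score = confidence_score({
    "explicit_doc_claim", "test_present",
    "implementation_location_found", "api_or_cli_exposed",
})
```

Here 0.30 + 0.25 + 0.15 + 0.15 gives a `score` of 0.85, which `confidence_label` maps to "high". All six positive factors together sum to 1.25, which is why some form of capping is needed.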
+
+---
+
+# 10. Classification Rules
+
+## 10.1 Is it an Ability?
+
+Ask:
+
+```text
+Would a user search for this as a desired outcome?
+```
+
+If yes, it is probably an ability.
+
+Example:
+
+```text
+“I need document classification.”
+```
+
+Ability.
+
+---
+
+## 10.2 Is it a Capability?
+
+Ask:
+
+```text
+Can this behavior be tested with input/output expectations?
+```
+
+If yes, it is probably a capability.
+
+Example:
+
+```text
+“Classify document into category.”
+```
+
+Capability.
+
+---
+
+## 10.3 Is it a Feature?
+
+Ask:
+
+```text
+Is this a concrete interface, option, component, or implementation artifact?
+```
+
+If yes, it is probably a feature.
+
+Example:
+
+```text
+“POST /api/classify-document”
+```
+
+Feature.
+
+---
+
+# 11. Anti-Heuristics
+
+Things the extractor should avoid.
+
+## 11.1 Do Not Treat Dependencies as Capabilities
+
+Bad:
+
+```yaml
+capability: Uses OpenAI
+```
+
+Better:
+
+```yaml
+feature: OpenAI provider integration
+capability: Generate Text Summary
+```
+
+---
+
+## 11.2 Do Not Treat Technology as Ability
+
+Bad:
+
+```yaml
+ability: FastAPI
+```
+
+Better:
+
+```yaml
+feature: FastAPI REST interface
+```
+
+---
+
+## 11.3 Do Not Treat Internal Helpers as Capabilities
+
+Bad:
+
+```yaml
+capability: Parse YAML Config
+```
+
+The exception is when parsing YAML config is itself a user-visible behavior.
+
+---
+
+## 11.4 Avoid Vendor-Hype Terms
+
+Bad:
+
+```text
+intelligent automation
+next-gen AI
+enterprise-ready transformation
+```
+
+Convert into testable candidates:
+
+```text
+Classify Documents
+Generate Reports
+Route Tasks
+```
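Two of these anti-heuristics can be checked on candidate names before review. The sketch below is deliberately shallow: the hype-term list and the "Uses ..." prefix rule are assumptions, and a flagged candidate should be rewritten into a testable name, not silently dropped.

```python
from typing import Optional

# Assumed, non-exhaustive list of marketing words.
HYPE_TERMS = {"intelligent", "next-gen", "enterprise-ready", "magic", "smart"}


def rejection_reason(name: str) -> Optional[str]:
    """Return why a candidate name looks suspect, or None if it passes."""
    words = name.lower().split()
    if words[:1] == ["uses"]:
        return "names a dependency, not a behavior"
    if any(word in HYPE_TERMS for word in words):
        return "vendor-hype term"
    return None
```

For example, `rejection_reason("Uses OpenAI")` flags a dependency, `rejection_reason("next-gen AI")` flags hype, and `rejection_reason("Generate Text Summary")` returns `None`.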
+
+---
+
+# 12. Extraction Pipeline v0.1
+
+## Step 1 — Repository Intake
+
+Collect:
+
+```text
+README
+docs
+examples
+tests
+package files
+source tree
+API routes
+CLI definitions
+```
+
+---
+
+## Step 2 — Structural Summary
+
+Produce:
+
+```yaml
+repository_summary:
+  languages: []
+  frameworks: []
+  interfaces: []
+  docs_found: []
+  tests_found: []
+  examples_found: []
+```
+
+---
+
+## Step 3 — Candidate Ability Extraction
+
+From README/docs/package descriptions.
+
+Output:
+
+```yaml
+candidate_abilities:
+  - name
+  - description
+  - confidence
+  - supporting_sources
+```
+
+---
+
+## Step 4 — Candidate Capability Extraction
+
+From APIs, tests, examples, public modules.
+
+Output:
+
+```yaml
+candidate_capabilities:
+  - name
+  - description
+  - inputs
+  - outputs
+  - linked_abilities
+  - confidence
+  - supporting_sources
+```
+
+---
+
+## Step 5 — Candidate Feature Extraction
+
+From endpoints, CLI commands, config files, UI components, modules.
+
+Output:
+
+```yaml
+candidate_features:
+  - name
+  - type
+  - location
+  - linked_capabilities
+  - confidence
+```
+
+---
+
+## Step 6 — Evidence Linking
+
+Attach evidence:
+
+```yaml
+evidence:
+  - type
+  - path
+  - supports
+  - strength
+```
+
+---
+
+## Step 7 — Review Package
+
+Generate a curator-friendly review view:
+
+```text
+Ability
+  Capability
+    Feature
+    Evidence
+```
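The review view could be rendered as indented text from nested records. This is a sketch under assumed dictionary keys (`name`, `capabilities`, `features`, `evidence`); the bracketed tags are an illustrative way to tell features from evidence at the same indent level.

```python
from typing import Dict, List


def render_review(abilities: List[Dict]) -> str:
    """Render abilities, capabilities, features, and evidence as an indented tree."""
    lines: List[str] = []
    for ability in abilities:
        lines.append(ability["name"])
        for capability in ability.get("capabilities", []):
            lines.append("  " + capability["name"])
            for feature in capability.get("features", []):
                lines.append("    [feature] " + feature)
            for item in capability.get("evidence", []):
                lines.append("    [evidence] " + item)
    return "\n".join(lines)


review = render_review([
    {
        "name": "Business Email Routing",
        "capabilities": [
            {
                "name": "Classify Incoming Email",
                "features": ["POST /api/classify-email"],
                "evidence": ["tests/test_email_classification.py"],
            }
        ],
    }
])
print(review)
```

Keeping evidence next to the feature it backs lets a curator accept or reject a capability without opening the repository.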
+
+---
+
+# 13. Example Extraction
+
+Given README:
+
+```text
+MailRouter helps companies automatically classify incoming emails and route them to the right department.
+```
+
+Given route:
+
+```text
+POST /api/classify-email
+```
+
+Given test:
+
+```text
+tests/test_email_classification.py
+```
+
+Output:
+
+```yaml
+abilities:
+  - id: ability.business_email_routing
+    name: Business Email Routing
+    confidence: 0.9
+
+capabilities:
+  - id: capability.classify_incoming_email
+    name: Classify Incoming Email
+    ability_refs:
+      - ability.business_email_routing
+    confidence: 0.85
+
+features:
+  - id: feature.classify_email_endpoint
+    name: POST /api/classify-email
+    type: REST endpoint
+    location: src/routes/classify_email.py
+    capability_refs:
+      - capability.classify_incoming_email
+
+evidence:
+  - type: unit_test
+    path: tests/test_email_classification.py
+    supports:
+      - capability.classify_incoming_email
+    strength: strong
+```
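Because the output links entries by id, a consistency check is cheap to add. The sketch below mirrors the YAML above as a Python dictionary to stay dependency-free; `dangling_refs` is a hypothetical helper name.

```python
from typing import Dict, List


def dangling_refs(record: Dict) -> List[str]:
    """Return every referenced id that is not defined anywhere in the record."""
    known = {
        item["id"]
        for key in ("abilities", "capabilities", "features")
        for item in record.get(key, [])
    }
    missing: List[str] = []
    for capability in record.get("capabilities", []):
        missing += [r for r in capability.get("ability_refs", []) if r not in known]
    for feature in record.get("features", []):
        missing += [r for r in feature.get("capability_refs", []) if r not in known]
    for item in record.get("evidence", []):
        missing += [r for r in item.get("supports", []) if r not in known]
    return missing


record = {
    "abilities": [{"id": "ability.business_email_routing"}],
    "capabilities": [{
        "id": "capability.classify_incoming_email",
        "ability_refs": ["ability.business_email_routing"],
    }],
    "features": [{
        "id": "feature.classify_email_endpoint",
        "capability_refs": ["capability.classify_incoming_email"],
    }],
    "evidence": [{
        "type": "unit_test",
        "supports": ["capability.classify_incoming_email"],
    }],
}
```

For the record above, `dangling_refs(record)` returns an empty list; a misspelled reference would be returned so it can be fixed before the review package is generated.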
+
+---
+
+# 14. MVP Principle
+
+The extractor should be:
+
+```text
+conservative
+explainable
+reviewable
+source-linked
+```
+
+Not magical.
+
+The best first version is not the one that extracts everything.
+
+It is the one where the user says:
+
+> “Yes, I understand why the system proposed this.”
+
+