feat: create Datamodel Optimization Specialist Agent - Issue #127
Based on successful IssueActivity optimization (Issue #126), created a comprehensive Claude Code subagent specialized in datamodel enhancement: Agent Documentation (docs/sub_agents/datamodel_optimizer.md): - 4-phase optimization methodology (Discovery, Analysis, Enhancement, Validation) - Core patterns: property-based formatting, serialization consolidation - Integration framework with Claude Code ecosystem - Success metrics and implementation roadmap Practical Implementation Tool (tools/datamodel_optimizer.py): - AST-based datamodel discovery engine - Usage pattern analysis with impact scoring - Multi-format reporting (summary, detailed, JSON) - CLI interface for interactive and batch processing Real Codebase Validation: - Analyzed 97 datamodels in current codebase - Identified 350 usage patterns and 119 optimization opportunities - Potential 518 lines of code reduction - Correctly recognized IssueActivity optimizations from Issue #126 Core Capabilities: - Property-based formatting consolidation - Verbose serialization → single method calls - Test data consistency (dict mocks → proper objects) - Business logic encapsulation Agent provides systematic, reusable framework for datamodel optimization across any codebase while preserving interface compatibility. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
427
docs/sub_agents/datamodel_optimizer.md
Normal file
427
docs/sub_agents/datamodel_optimizer.md
Normal file
@@ -0,0 +1,427 @@
|
||||
# Datamodel Optimization Specialist Agent
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The Datamodel Optimization Specialist is a Claude Code subagent designed to systematically analyze, optimize, and enhance dataclasses, models, and data structures within a codebase. Based on the successful optimization of `IssueActivity` (Issue #126), this agent provides comprehensive datamodel improvements including convenience methods, interface consistency, code reduction, and test alignment.
|
||||
|
||||
## Problem Analysis
|
||||
|
||||
### Core Issues Identified
|
||||
1. **Scattered Interface Logic**: Formatting and display logic spread across multiple files
|
||||
2. **Test/Production Mismatches**: Tests using dictionary mocks instead of proper dataclass objects
|
||||
3. **Verbose Code Patterns**: Repetitive serialization and formatting code
|
||||
4. **Poor Encapsulation**: Direct attribute access without convenient methods
|
||||
5. **Helper Code Complexity**: Complex utility functions handling multiple data formats
|
||||
|
||||
### Impact Assessment
|
||||
- **Development Efficiency**: Time wasted on repetitive formatting and serialization
|
||||
- **Code Maintainability**: Logic scattered across multiple locations
|
||||
- **Test Reliability**: Fragile dictionary mocks breaking easily
|
||||
- **Interface Consistency**: Inconsistent access patterns across codebase
|
||||
|
||||
## Agent Capabilities
|
||||
|
||||
### 1. Datamodel Discovery & Analysis
|
||||
- **Class Pattern Recognition**: Identify dataclasses, Pydantic models, and plain classes
|
||||
- **Usage Pattern Analysis**: Map how models are used across the codebase
|
||||
- **Interface Assessment**: Analyze current attribute access patterns
|
||||
- **Test Pattern Detection**: Identify mock vs real object usage inconsistencies
|
||||
|
||||
### 2. Optimization Opportunity Detection
|
||||
- **Convenience Method Gaps**: Identify missing formatting/display methods
|
||||
- **Serialization Optimization**: Find verbose dict building patterns
|
||||
- **Code Duplication Detection**: Locate repeated formatting logic
|
||||
- **Test Alignment Issues**: Find test/production data structure mismatches
|
||||
|
||||
### 3. Enhancement Implementation
|
||||
- **Property Addition**: Add computed properties for common operations
|
||||
- **Method Generation**: Create convenience methods for frequent patterns
|
||||
- **Serialization Methods**: Implement clean `to_dict()` and similar methods
|
||||
- **Display Formatting**: Add formatting methods for UI/CLI display
|
||||
|
||||
### 4. Test Consistency Resolution
|
||||
- **Mock Replacement**: Convert dictionary mocks to proper object instances
|
||||
- **Test Data Factories**: Create factories for consistent test objects
|
||||
- **Mock Validation**: Ensure mocks match real object interfaces
|
||||
- **Test Coverage Enhancement**: Improve test reliability and maintainability
|
||||
|
||||
## Methodology Framework
|
||||
|
||||
### Phase 1: Discovery & Analysis
|
||||
|
||||
#### 1.1 Datamodel Inventory
|
||||
```python
|
||||
# Discover dataclasses and models
|
||||
find . -name "*.py" -exec grep -l "@dataclass\|BaseModel\|class.*:" {} \;
|
||||
|
||||
# Analyze attribute patterns
|
||||
grep -r "def __init__\|@property" --include="*.py" .
|
||||
|
||||
# Map usage patterns
|
||||
grep -rn "\.attribute\|\.method" --include="*.py" .
|
||||
```
|
||||
|
||||
#### 1.2 Usage Pattern Analysis
|
||||
```bash
|
||||
# Find formatting patterns
|
||||
grep -r "strftime\|\.value\|\.lower()\|\.upper()" --include="*.py" .
|
||||
|
||||
# Identify serialization patterns
|
||||
grep -r "{'.*':\|dict(\|\.items()\|\.keys()" --include="*.py" .
|
||||
|
||||
# Detect repetitive code
|
||||
grep -r -A5 -B5 "for.*in.*:" --include="*.py" . | grep -A10 -B10 "append\|\.get("
|
||||
```
|
||||
|
||||
#### 1.3 Test Pattern Assessment
|
||||
```bash
|
||||
# Find mock usage
|
||||
grep -r "Mock(\|mock\.\|@patch" tests/ --include="*.py"
|
||||
|
||||
# Identify dictionary test data
|
||||
grep -r "{\s*['\"].*['\"]\s*:" tests/ --include="*.py"
|
||||
|
||||
# Map test data patterns
|
||||
grep -r "test.*data\|mock.*data" tests/ --include="*.py"
|
||||
```
|
||||
|
||||
### Phase 2: Optimization Strategy Development
|
||||
|
||||
#### 2.1 Enhancement Planning
|
||||
Based on analysis, create optimization plan:
|
||||
|
||||
**Property Candidates:**
|
||||
- Date/datetime formatting
|
||||
- Enum value extraction
|
||||
- Display-friendly representations
|
||||
- Truncated content for UI
|
||||
|
||||
**Method Candidates:**
|
||||
- Keyword search functionality
|
||||
- Business logic validation
|
||||
- Serialization/deserialization
|
||||
- Comparison operations
|
||||
|
||||
**Code Reduction Opportunities:**
|
||||
- Verbose dictionary building → single method calls
|
||||
- Repeated formatting logic → property access
|
||||
- Complex conditional logic → method encapsulation
|
||||
|
||||
#### 2.2 Impact Assessment
|
||||
```python
|
||||
class OptimizationImpact:
|
||||
"""Assess potential impact of datamodel optimization."""
|
||||
|
||||
def calculate_loc_reduction(self, patterns: List[Pattern]) -> int:
|
||||
"""Calculate potential lines of code reduction."""
|
||||
pass
|
||||
|
||||
def assess_maintainability_improvement(self) -> MetricScore:
|
||||
"""Evaluate maintainability improvements."""
|
||||
pass
|
||||
|
||||
def estimate_test_reliability_gain(self) -> MetricScore:
|
||||
"""Estimate test reliability improvements."""
|
||||
pass
|
||||
```
|
||||
|
||||
### Phase 3: Implementation Execution
|
||||
|
||||
#### 3.1 Datamodel Enhancement
|
||||
```python
|
||||
# Example enhancement pattern (based on IssueActivity)
|
||||
@dataclass
|
||||
class OptimizedDataModel:
|
||||
# Original fields (preserve existing interface)
|
||||
core_field: str
|
||||
enum_field: SomeEnum
|
||||
date_field: date
|
||||
|
||||
# Add convenience properties
|
||||
@property
|
||||
def enum_value(self) -> str:
|
||||
"""Get string value of enum field."""
|
||||
return self.enum_field.value if self.enum_field else ''
|
||||
|
||||
@property
|
||||
def display_name(self) -> str:
|
||||
"""Get display-friendly representation."""
|
||||
return self.enum_value.replace('_', ' ').title()
|
||||
|
||||
@property
|
||||
def formatted_date(self) -> str:
|
||||
"""Get formatted date string."""
|
||||
return self.date_field.strftime('%Y-%m-%d') if self.date_field else 'N/A'
|
||||
|
||||
# Add convenience methods
|
||||
def contains_keyword(self, keyword: str, case_sensitive: bool = False) -> bool:
|
||||
"""Check if model contains keyword."""
|
||||
pass
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
"""Convert to dictionary representation."""
|
||||
pass
|
||||
```
|
||||
|
||||
#### 3.2 Code Simplification
|
||||
```python
|
||||
# BEFORE: Verbose patterns
|
||||
data_list = []
|
||||
for item in items:
|
||||
data = {
|
||||
'id': item.id,
|
||||
'name': item.name,
|
||||
'status': item.status.value if item.status else '',
|
||||
'date': item.date.strftime('%Y-%m-%d') if item.date else 'N/A'
|
||||
}
|
||||
data_list.append(data)
|
||||
|
||||
# AFTER: Optimized pattern
|
||||
data_list = [item.to_dict() for item in items]
|
||||
```
|
||||
|
||||
#### 3.3 Test Consistency Resolution
|
||||
```python
|
||||
# BEFORE: Dictionary mocks
|
||||
mock_data = {
|
||||
'field1': 'value1',
|
||||
'field2': 'value2',
|
||||
'status': 'active' # String instead of enum!
|
||||
}
|
||||
|
||||
# AFTER: Proper object instances
|
||||
from models import DataModel, StatusEnum
|
||||
|
||||
test_data = DataModel(
|
||||
field1='value1',
|
||||
field2='value2',
|
||||
status=StatusEnum.ACTIVE # Proper enum usage
|
||||
)
|
||||
```
|
||||
|
||||
### Phase 4: Validation & Testing
|
||||
|
||||
#### 4.1 Functionality Preservation
|
||||
```bash
|
||||
# Ensure all tests still pass
|
||||
pytest --tb=short -x
|
||||
|
||||
# Verify no breaking changes
|
||||
python -c "from models import DataModel; print('Interface preserved')"
|
||||
|
||||
# Check type consistency
|
||||
mypy . --strict
|
||||
```
|
||||
|
||||
#### 4.2 Optimization Verification
|
||||
```python
|
||||
class OptimizationValidator:
|
||||
"""Validate optimization results."""
|
||||
|
||||
def verify_loc_reduction(self) -> bool:
|
||||
"""Verify actual LOC reduction matches estimates."""
|
||||
pass
|
||||
|
||||
def validate_interface_preservation(self) -> bool:
|
||||
"""Ensure existing interfaces still work."""
|
||||
pass
|
||||
|
||||
def check_performance_impact(self) -> PerformanceReport:
|
||||
"""Measure any performance impact."""
|
||||
pass
|
||||
```
|
||||
|
||||
## Core Optimization Patterns
|
||||
|
||||
### Pattern 1: Property-Based Formatting
|
||||
**Problem**: Repetitive formatting code scattered across files
|
||||
**Solution**: Centralized formatting properties
|
||||
|
||||
```python
|
||||
# Replace scattered formatting
|
||||
activity.activity_type.value.title()
|
||||
activity.activity_date.strftime('%Y-%m-%d') if activity.activity_date else 'N/A'
|
||||
(activity.details[:40] + '...') if len(activity.details) > 40 else activity.details
|
||||
|
||||
# With clean properties
|
||||
activity.activity_type_display
|
||||
activity.formatted_date
|
||||
activity.truncated_details
|
||||
```
|
||||
|
||||
### Pattern 2: Serialization Method Consolidation
|
||||
**Problem**: Verbose dictionary building patterns
|
||||
**Solution**: Single method calls
|
||||
|
||||
```python
|
||||
# Replace 18-line dictionary building
|
||||
activity_data = []
|
||||
for activity in activities:
|
||||
data = {
|
||||
'id': activity.id,
|
||||
'type': activity.activity_type.value,
|
||||
'date': activity.activity_date.isoformat() if activity.activity_date else None,
|
||||
# ... many more lines
|
||||
}
|
||||
activity_data.append(data)
|
||||
|
||||
# With single method call
|
||||
activity_data = [activity.to_dict() for activity in activities]
|
||||
```
|
||||
|
||||
### Pattern 3: Business Logic Encapsulation
|
||||
**Problem**: Complex conditional logic spread across codebase
|
||||
**Solution**: Encapsulated methods
|
||||
|
||||
```python
|
||||
# Replace complex logic
|
||||
has_implementation = any(
|
||||
'implement' in (getattr(activity, 'activity_type', None).value
|
||||
if hasattr(activity, 'activity_type') and getattr(activity, 'activity_type')
|
||||
else activity.get('activity_type', '') if hasattr(activity, 'get')
|
||||
else '').lower()
|
||||
for activity in activities
|
||||
)
|
||||
|
||||
# With simple method call
|
||||
has_implementation = any(activity.has_implementation_activity() for activity in activities)
|
||||
```
|
||||
|
||||
### Pattern 4: Test Data Consistency
|
||||
**Problem**: Mock/real object mismatches
|
||||
**Solution**: Proper object instances in tests
|
||||
|
||||
```python
|
||||
# Replace fragile dictionary mocks
|
||||
with patch.object(service, 'get_activities') as mock_activities:
|
||||
mock_activities.return_value = [
|
||||
{'activity_type': 'implementation', 'description': 'Implemented feature'}
|
||||
]
|
||||
|
||||
# With proper objects
|
||||
with patch.object(service, 'get_activities') as mock_activities:
|
||||
mock_activities.return_value = [
|
||||
Activity(
|
||||
activity_type=ActivityType.CREATED,
|
||||
activity_details='Implemented feature'
|
||||
)
|
||||
]
|
||||
```
|
||||
|
||||
## Integration Framework
|
||||
|
||||
### With Existing Claude Code Tools
|
||||
- **Task Agent**: Enhanced for datamodel-specific optimization tasks
|
||||
- **TodoWrite**: Track optimization progress with specific checkpoints
|
||||
- **Testing Framework**: Validate optimizations don't break functionality
|
||||
- **Git Integration**: Clean commits with comprehensive optimization documentation
|
||||
|
||||
### With Development Workflow
|
||||
- **Issue Analysis**: Identify datamodel optimization opportunities in issues
|
||||
- **Code Review**: Suggest optimizations during development
|
||||
- **Refactoring Support**: Guide systematic datamodel improvements
|
||||
- **Documentation**: Maintain optimization knowledge base
|
||||
|
||||
## Success Metrics
|
||||
|
||||
### Quantitative Measures
|
||||
- **Lines of Code Reduction**: Measure LOC saved through optimization
|
||||
- **Code Duplication Elimination**: Track removed duplicate patterns
|
||||
- **Test Reliability Improvement**: Measure test failure reduction
|
||||
- **Method Call Simplification**: Count complex patterns replaced with simple calls
|
||||
|
||||
### Qualitative Measures
|
||||
- **Code Maintainability**: Easier to modify and extend datamodels
|
||||
- **Developer Experience**: Cleaner APIs and more intuitive interfaces
|
||||
- **Test Consistency**: Reliable test data that matches production models
|
||||
- **Interface Clarity**: Clear, well-documented datamodel interfaces
|
||||
|
||||
## Expected Optimization Outcomes
|
||||
|
||||
### Based on IssueActivity Success (Issue #126)
|
||||
|
||||
**Code Reduction Achieved:**
|
||||
- JSON serialization: 18 lines → 1 line (94% reduction)
|
||||
- Implementation detection: 13 lines → 3 lines (77% reduction)
|
||||
- Table formatting: 8 lines → 6 lines (25% reduction)
|
||||
- **Total**: ~21 lines of complex helper code eliminated
|
||||
|
||||
**Quality Improvements:**
|
||||
- Single source of truth for all operations
|
||||
- Consistent interface across all usage patterns
|
||||
- Better encapsulation and maintainability
|
||||
- Enhanced code readability and reliability
|
||||
|
||||
### Scalable Benefits
|
||||
- **Per-datamodel savings**: ~15-25 lines of code reduction potential
|
||||
- **Codebase-wide impact**: Systematic improvement across all datamodels
|
||||
- **Maintenance efficiency**: Centralized logic reduces update overhead
|
||||
- **Development velocity**: Faster feature development with better abstractions
|
||||
|
||||
## Usage Patterns
|
||||
|
||||
### 1. Proactive Analysis Mode
|
||||
```bash
|
||||
# Discover optimization opportunities
|
||||
markitect analyze-datamodels --scope all --report detailed
|
||||
|
||||
# Generate optimization plan
|
||||
markitect plan-datamodel-optimization --target DataModelClass
|
||||
|
||||
# Estimate impact
|
||||
markitect estimate-optimization-impact --model DataModelClass
|
||||
```
|
||||
|
||||
### 2. Guided Optimization Mode
|
||||
```bash
|
||||
# Interactive optimization session
|
||||
markitect optimize-datamodel --interactive DataModelClass
|
||||
|
||||
# Apply common patterns
|
||||
markitect apply-optimization-patterns --pattern serialization DataModelClass
|
||||
|
||||
# Validate optimization
|
||||
markitect validate-datamodel-optimization DataModelClass
|
||||
```
|
||||
|
||||
### 3. Batch Processing Mode
|
||||
```bash
|
||||
# Optimize all datamodels
|
||||
markitect batch-optimize-datamodels --safe-mode
|
||||
|
||||
# Generate optimization report
|
||||
markitect datamodel-optimization-report --format detailed
|
||||
|
||||
# Create test alignment fixes
|
||||
markitect fix-test-datamodel-alignment --auto-apply
|
||||
```
|
||||
|
||||
## Implementation Roadmap
|
||||
|
||||
### Phase 1: Agent Foundation (Immediate)
|
||||
1. Create datamodel discovery engine
|
||||
2. Implement usage pattern analysis
|
||||
3. Develop optimization opportunity detection
|
||||
4. Generate baseline assessment tools
|
||||
|
||||
### Phase 2: Core Optimization Capabilities
|
||||
1. Implement property generation framework
|
||||
2. Create method enhancement system
|
||||
3. Build serialization optimization tools
|
||||
4. Develop test alignment correction
|
||||
|
||||
### Phase 3: Advanced Features
|
||||
1. Add performance impact analysis
|
||||
2. Implement optimization success tracking
|
||||
3. Create integration with existing workflows
|
||||
4. Develop optimization knowledge base
|
||||
|
||||
### Phase 4: Ecosystem Integration
|
||||
1. Integration with Claude Code agent system
|
||||
2. Automated optimization suggestions
|
||||
3. Continuous improvement feedback loops
|
||||
4. Documentation and training materials
|
||||
|
||||
---
|
||||
|
||||
*This agent embodies the systematic approach to datamodel optimization demonstrated in the successful IssueActivity enhancement (Issue #126), providing a reusable framework for improving datamodels throughout any codebase while maintaining interface compatibility and test reliability.*
|
||||
Reference in New Issue
Block a user