Based on the successful IssueActivity optimization (Issue #126), created a comprehensive Claude Code subagent specialized in datamodel enhancement.

Agent Documentation (docs/sub_agents/datamodel_optimizer.md):
- 4-phase optimization methodology (Discovery, Analysis, Enhancement, Validation)
- Core patterns: property-based formatting, serialization consolidation
- Integration framework with Claude Code ecosystem
- Success metrics and implementation roadmap

Practical Implementation Tool (tools/datamodel_optimizer.py):
- AST-based datamodel discovery engine
- Usage pattern analysis with impact scoring
- Multi-format reporting (summary, detailed, JSON)
- CLI interface for interactive and batch processing

Real Codebase Validation:
- Analyzed 97 datamodels in current codebase
- Identified 350 usage patterns and 119 optimization opportunities
- Potential 518 lines of code reduction
- Correctly recognized IssueActivity optimizations from Issue #126

Core Capabilities:
- Property-based formatting consolidation
- Verbose serialization → single method calls
- Test data consistency (dict mocks → proper objects)
- Business logic encapsulation

The agent provides a systematic, reusable framework for datamodel optimization across any codebase while preserving interface compatibility.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Datamodel Optimization Specialist Agent
Executive Summary
The Datamodel Optimization Specialist is a Claude Code subagent designed to systematically analyze, optimize, and enhance dataclasses, models, and data structures within a codebase. Based on the successful optimization of IssueActivity (Issue #126), this agent provides comprehensive datamodel improvements including convenience methods, interface consistency, code reduction, and test alignment.
Problem Analysis
Core Issues Identified
- Scattered Interface Logic: Formatting and display logic spread across multiple files
- Test/Production Mismatches: Tests using dictionary mocks instead of proper dataclass objects
- Verbose Code Patterns: Repetitive serialization and formatting code
- Poor Encapsulation: Direct attribute access without convenient methods
- Helper Code Complexity: Complex utility functions handling multiple data formats
Impact Assessment
- Development Efficiency: Time wasted on repetitive formatting and serialization
- Code Maintainability: Logic scattered across multiple locations
- Test Reliability: Fragile dictionary mocks breaking easily
- Interface Consistency: Inconsistent access patterns across codebase
Agent Capabilities
1. Datamodel Discovery & Analysis
- Class Pattern Recognition: Identify dataclasses, Pydantic models, and plain classes
- Usage Pattern Analysis: Map how models are used across the codebase
- Interface Assessment: Analyze current attribute access patterns
- Test Pattern Detection: Identify mock vs real object usage inconsistencies
2. Optimization Opportunity Detection
- Convenience Method Gaps: Identify missing formatting/display methods
- Serialization Optimization: Find verbose dict building patterns
- Code Duplication Detection: Locate repeated formatting logic
- Test Alignment Issues: Find test/production data structure mismatches
3. Enhancement Implementation
- Property Addition: Add computed properties for common operations
- Method Generation: Create convenience methods for frequent patterns
- Serialization Methods: Implement clean to_dict() and similar methods
- Display Formatting: Add formatting methods for UI/CLI display
4. Test Consistency Resolution
- Mock Replacement: Convert dictionary mocks to proper object instances
- Test Data Factories: Create factories for consistent test objects
- Mock Validation: Ensure mocks match real object interfaces
- Test Coverage Enhancement: Improve test reliability and maintainability
Methodology Framework
Phase 1: Discovery & Analysis
1.1 Datamodel Inventory
# Discover dataclasses and models
find . -name "*.py" -exec grep -l "@dataclass\|BaseModel\|class.*:" {} \;
# Analyze attribute patterns
grep -r "def __init__\|@property" --include="*.py" .
# Map usage patterns
grep -rn "\.attribute\|\.method" --include="*.py" .
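The grep inventory above matches text, so it also hits comments and strings. As a rough illustration of a more precise approach, here is a minimal sketch that walks the syntax tree with Python's standard `ast` module. The function name `find_dataclasses` and the sample source are assumptions made for this example; this is not the code in tools/datamodel_optimizer.py.

```python
import ast
from typing import List


def find_dataclasses(source: str) -> List[str]:
    """Return names of classes decorated with @dataclass in the given source."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for deco in node.decorator_list:
            # Handle @dataclass, @dataclass(...), and @dataclasses.dataclass
            target = deco.func if isinstance(deco, ast.Call) else deco
            if isinstance(target, ast.Attribute):
                name = target.attr
            else:
                name = getattr(target, "id", None)
            if name == "dataclass":
                names.append(node.name)
                break
    return names


# Hypothetical sample module used only to exercise the function.
SAMPLE = '''
from dataclasses import dataclass

@dataclass
class IssueActivity:
    id: int

@dataclass(frozen=True)
class Tag:
    name: str

class Helper:
    pass
'''

discovered = find_dataclasses(SAMPLE)
print(discovered)
```

Pydantic models could be detected the same way by inspecting `node.bases` instead of `node.decorator_list`.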
1.2 Usage Pattern Analysis
# Find formatting patterns
grep -r "strftime\|\.value\|\.lower()\|\.upper()" --include="*.py" .
# Identify serialization patterns
grep -r "{'.*':\|dict(\|\.items()\|\.keys()" --include="*.py" .
# Detect repetitive code
grep -r -A5 -B5 "for.*in.*:" --include="*.py" . | grep -A10 -B10 "append\|\.get("
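The formatting search above can also be expressed as a count per attribute name, which gives a crude signal of where a convenience property would pay off. The following is a sketch under assumptions: the helper name `count_formatting_patterns` and the tracked attribute set are chosen for illustration and simply mirror the grep pattern.

```python
import ast
from collections import Counter

# Attribute names mirrored from the grep pattern above.
TRACKED = {"strftime", "value", "lower", "upper"}


def count_formatting_patterns(source: str) -> Counter:
    """Count attribute accesses whose name suggests inline formatting."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr in TRACKED:
            counts[node.attr] += 1
    return counts


# Hypothetical caller code with repeated inline formatting.
SAMPLE = '''
label = item.status.value.upper()
day = item.date.strftime('%Y-%m-%d')
other = item.status.value
'''

pattern_counts = count_formatting_patterns(SAMPLE)
print(pattern_counts)
```

A high count for one attribute on one model is a candidate for a property on that model.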
1.3 Test Pattern Assessment
# Find mock usage
grep -r "Mock(\|mock\.\|@patch" tests/ --include="*.py"
# Identify dictionary test data
grep -r "{\s*['\"].*['\"]\s*:" tests/ --include="*.py"
# Map test data patterns
grep -r "test.*data\|mock.*data" tests/ --include="*.py"
Phase 2: Optimization Strategy Development
2.1 Enhancement Planning
Based on analysis, create optimization plan:
Property Candidates:
- Date/datetime formatting
- Enum value extraction
- Display-friendly representations
- Truncated content for UI
Method Candidates:
- Keyword search functionality
- Business logic validation
- Serialization/deserialization
- Comparison operations
Code Reduction Opportunities:
- Verbose dictionary building → single method calls
- Repeated formatting logic → property access
- Complex conditional logic → method encapsulation
2.2 Impact Assessment
from __future__ import annotations  # placeholder types below are never evaluated

from typing import List


class OptimizationImpact:
    """Assess potential impact of datamodel optimization."""

    # Pattern and MetricScore are placeholder types not defined in this document.
    def calculate_loc_reduction(self, patterns: List[Pattern]) -> int:
        """Calculate potential lines of code reduction."""
        pass

    def assess_maintainability_improvement(self) -> MetricScore:
        """Evaluate maintainability improvements."""
        pass

    def estimate_test_reliability_gain(self) -> MetricScore:
        """Estimate test reliability improvements."""
        pass
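One way the LOC estimate could be computed is sketched below. The `Pattern` record and its fields (`occurrences`, `lines_before`, `lines_after`) are hypothetical, since the document leaves that type undefined. The calculation multiplies each pattern's per-occurrence saving by how often it appears.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pattern:
    """Hypothetical record of one repeated code pattern."""
    name: str
    occurrences: int
    lines_before: int  # lines per occurrence today
    lines_after: int   # lines per occurrence after optimization


def calculate_loc_reduction(patterns: List[Pattern]) -> int:
    """Sum the lines saved across all occurrences of every pattern."""
    return sum(p.occurrences * (p.lines_before - p.lines_after) for p in patterns)


# Illustrative numbers only, not measurements from any codebase.
patterns = [
    Pattern("dict building", occurrences=3, lines_before=8, lines_after=1),
    Pattern("date formatting", occurrences=5, lines_before=1, lines_after=1),
]
total_saved = calculate_loc_reduction(patterns)
print(total_saved)
```

Patterns that only shorten an expression without removing lines contribute zero here, so a real score would likely weight readability separately.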
Phase 3: Implementation Execution
3.1 Datamodel Enhancement
# Example enhancement pattern (based on IssueActivity)
from __future__ import annotations  # SomeEnum is a placeholder and is never evaluated

from dataclasses import dataclass
from datetime import date
from typing import Any, Dict


@dataclass
class OptimizedDataModel:
    # Original fields (preserve existing interface)
    core_field: str
    enum_field: SomeEnum
    date_field: date

    # Add convenience properties
    @property
    def enum_value(self) -> str:
        """Get string value of enum field."""
        return self.enum_field.value if self.enum_field else ''

    @property
    def display_name(self) -> str:
        """Get display-friendly representation."""
        return self.enum_value.replace('_', ' ').title()

    @property
    def formatted_date(self) -> str:
        """Get formatted date string."""
        return self.date_field.strftime('%Y-%m-%d') if self.date_field else 'N/A'

    # Add convenience methods
    def contains_keyword(self, keyword: str, case_sensitive: bool = False) -> bool:
        """Check if model contains keyword."""
        pass

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary representation."""
        pass
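The two stubbed methods could be filled in along the following lines. This is a sketch, not the IssueActivity implementation: the `Status` enum, the `Example` class, and the choice to serialize enums by value and dates as ISO strings are all assumptions for illustration.

```python
from dataclasses import dataclass, fields
from datetime import date
from enum import Enum
from typing import Any, Dict, Optional


class Status(Enum):  # hypothetical enum
    IN_PROGRESS = "in_progress"


@dataclass
class Example:  # hypothetical model mirroring the field layout above
    core_field: str
    enum_field: Optional[Status]
    date_field: Optional[date]

    def to_dict(self) -> Dict[str, Any]:
        """Serialize every field, flattening enums and dates to plain values."""
        result: Dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, Enum):
                value = value.value
            elif isinstance(value, date):
                value = value.isoformat()
            result[f.name] = value
        return result

    def contains_keyword(self, keyword: str, case_sensitive: bool = False) -> bool:
        """Search the serialized field values for the keyword."""
        haystack = " ".join(str(v) for v in self.to_dict().values() if v is not None)
        if not case_sensitive:
            haystack, keyword = haystack.lower(), keyword.lower()
        return keyword in haystack


item = Example("Fix login bug", Status.IN_PROGRESS, date(2024, 1, 15))
as_dict = item.to_dict()
print(as_dict)
```

Iterating over `dataclasses.fields` keeps `to_dict()` in step with the class definition, so adding a field does not require touching the serializer.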
3.2 Code Simplification
# BEFORE: Verbose patterns
data_list = []
for item in items:
    data = {
        'id': item.id,
        'name': item.name,
        'status': item.status.value if item.status else '',
        'date': item.date.strftime('%Y-%m-%d') if item.date else 'N/A'
    }
    data_list.append(data)

# AFTER: Optimized pattern
data_list = [item.to_dict() for item in items]
3.3 Test Consistency Resolution
# BEFORE: Dictionary mocks
mock_data = {
    'field1': 'value1',
    'field2': 'value2',
    'status': 'active'  # String instead of enum!
}

# AFTER: Proper object instances
from models import DataModel, StatusEnum

test_data = DataModel(
    field1='value1',
    field2='value2',
    status=StatusEnum.ACTIVE  # Proper enum usage
)
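A small factory function is one way to keep such test objects consistent across a suite. The sketch below assumes stand-in definitions of `DataModel` and `StatusEnum` (the real ones would come from the project's models module), and the factory name `make_data_model` is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class StatusEnum(Enum):  # stand-in for the project's enum
    ACTIVE = "active"
    INACTIVE = "inactive"


@dataclass
class DataModel:  # stand-in for the project's model
    field1: str
    field2: str
    status: StatusEnum


def make_data_model(**overrides) -> DataModel:
    """Build a valid DataModel, letting each test override only what it cares about."""
    defaults = dict(field1="value1", field2="value2", status=StatusEnum.ACTIVE)
    defaults.update(overrides)
    # Unknown keys fail here with TypeError, unlike a dictionary mock.
    return DataModel(**defaults)


default_obj = make_data_model()
inactive_obj = make_data_model(status=StatusEnum.INACTIVE)
print(default_obj, inactive_obj)
```

Because the factory goes through the real constructor, a renamed or removed field breaks the tests immediately instead of silently passing against a stale dictionary.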
Phase 4: Validation & Testing
4.1 Functionality Preservation
# Ensure all tests still pass
pytest --tb=short -x
# Verify no breaking changes
python -c "from models import DataModel; print('Interface preserved')"
# Check type consistency
mypy . --strict
4.2 Optimization Verification
from __future__ import annotations  # PerformanceReport is a placeholder and is never evaluated


class OptimizationValidator:
    """Validate optimization results."""

    def verify_loc_reduction(self) -> bool:
        """Verify actual LOC reduction matches estimates."""
        pass

    def validate_interface_preservation(self) -> bool:
        """Ensure existing interfaces still work."""
        pass

    # PerformanceReport is a placeholder type not defined in this document.
    def check_performance_impact(self) -> PerformanceReport:
        """Measure any performance impact."""
        pass
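Interface preservation could be checked by comparing a recorded baseline of attribute names against the current class. The following sketch assumes such a baseline exists as a plain list of names; `missing_attributes` and the sample `DataModel` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Iterable, List


def missing_attributes(cls: type, expected: Iterable[str]) -> List[str]:
    """Return expected names that the dataclass no longer exposes."""
    # Fields without defaults are not class attributes, so check fields() too.
    field_names = {f.name for f in fields(cls)}
    return [
        name for name in expected
        if name not in field_names and not hasattr(cls, name)
    ]


@dataclass
class DataModel:  # hypothetical model after optimization
    field1: str
    status: str

    @property
    def display_name(self) -> str:
        return self.field1.title()


# Names callers relied on before the optimization (illustrative).
baseline = ["field1", "status", "display_name", "legacy_code"]
missing = missing_attributes(DataModel, baseline)
print(missing)
```

An empty result means every name in the baseline is still reachable; it does not prove that behavior is unchanged, which is what the test suite run above is for.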
Core Optimization Patterns
Pattern 1: Property-Based Formatting
Problem: Repetitive formatting code scattered across files
Solution: Centralized formatting properties
# Replace scattered formatting
activity.activity_type.value.title()
activity.activity_date.strftime('%Y-%m-%d') if activity.activity_date else 'N/A'
(activity.details[:40] + '...') if len(activity.details) > 40 else activity.details
# With clean properties
activity.activity_type_display
activity.formatted_date
activity.truncated_details
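The three properties named above could be defined as in this sketch. The `Activity` class and the `ActivityType` member are stand-ins written for illustration; the property bodies are the scattered expressions from the first half of the example, moved onto the model unchanged.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ActivityType(Enum):  # hypothetical member
    CREATED = "created"


@dataclass
class Activity:  # stand-in model, not the project's IssueActivity
    activity_type: ActivityType
    activity_date: Optional[date]
    details: str

    @property
    def activity_type_display(self) -> str:
        return self.activity_type.value.title()

    @property
    def formatted_date(self) -> str:
        return self.activity_date.strftime("%Y-%m-%d") if self.activity_date else "N/A"

    @property
    def truncated_details(self) -> str:
        # Keep the first 40 characters and mark the cut with an ellipsis.
        return self.details[:40] + "..." if len(self.details) > 40 else self.details


activity = Activity(ActivityType.CREATED, None, "x" * 50)
dated = Activity(ActivityType.CREATED, date(2024, 3, 9), "short note")
print(activity.activity_type_display, activity.formatted_date, dated.formatted_date)
```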
Pattern 2: Serialization Method Consolidation
Problem: Verbose dictionary building patterns
Solution: Single method calls
# Replace 18-line dictionary building
activity_data = []
for activity in activities:
    data = {
        'id': activity.id,
        'type': activity.activity_type.value,
        'date': activity.activity_date.isoformat() if activity.activity_date else None,
        # ... many more lines
    }
    activity_data.append(data)

# With single method call
activity_data = [activity.to_dict() for activity in activities]
Pattern 3: Business Logic Encapsulation
Problem: Complex conditional logic spread across codebase
Solution: Encapsulated methods
# Replace complex logic
has_implementation = any(
    'implement' in (getattr(activity, 'activity_type', None).value
                    if hasattr(activity, 'activity_type') and getattr(activity, 'activity_type')
                    else activity.get('activity_type', '') if hasattr(activity, 'get')
                    else '').lower()
    for activity in activities
)

# With simple method call
has_implementation = any(activity.has_implementation_activity() for activity in activities)
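Once every element is a real object, the method itself can be very short, because the dictionary fallback branches disappear. A minimal sketch follows, assuming a stand-in `Activity` class and hypothetical `ActivityType` members; the substring test on the enum value is carried over from the original expression.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActivityType(Enum):  # hypothetical members
    CREATED = "created"
    IMPLEMENTATION_STARTED = "implementation_started"


@dataclass
class Activity:  # stand-in model
    activity_type: Optional[ActivityType]

    def has_implementation_activity(self) -> bool:
        """True when the activity type's value mentions 'implement'."""
        if self.activity_type is None:
            return False
        return "implement" in self.activity_type.value.lower()


with_impl = [Activity(ActivityType.CREATED), Activity(ActivityType.IMPLEMENTATION_STARTED)]
without_impl = [Activity(ActivityType.CREATED), Activity(None)]

found = any(a.has_implementation_activity() for a in with_impl)
not_found = any(a.has_implementation_activity() for a in without_impl)
print(found, not_found)
```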
Pattern 4: Test Data Consistency
Problem: Mock/real object mismatches
Solution: Proper object instances in tests
# Replace fragile dictionary mocks
with patch.object(service, 'get_activities') as mock_activities:
    mock_activities.return_value = [
        {'activity_type': 'implementation', 'description': 'Implemented feature'}
    ]

# With proper objects
with patch.object(service, 'get_activities') as mock_activities:
    mock_activities.return_value = [
        Activity(
            activity_type=ActivityType.CREATED,
            activity_details='Implemented feature'
        )
    ]
Integration Framework
With Existing Claude Code Tools
- Task Agent: Enhanced for datamodel-specific optimization tasks
- TodoWrite: Track optimization progress with specific checkpoints
- Testing Framework: Validate optimizations don't break functionality
- Git Integration: Clean commits with comprehensive optimization documentation
With Development Workflow
- Issue Analysis: Identify datamodel optimization opportunities in issues
- Code Review: Suggest optimizations during development
- Refactoring Support: Guide systematic datamodel improvements
- Documentation: Maintain optimization knowledge base
Success Metrics
Quantitative Measures
- Lines of Code Reduction: Measure LOC saved through optimization
- Code Duplication Elimination: Track removed duplicate patterns
- Test Reliability Improvement: Measure test failure reduction
- Method Call Simplification: Count complex patterns replaced with simple calls
Qualitative Measures
- Code Maintainability: Easier to modify and extend datamodels
- Developer Experience: Cleaner APIs and more intuitive interfaces
- Test Consistency: Reliable test data that matches production models
- Interface Clarity: Clear, well-documented datamodel interfaces
Expected Optimization Outcomes
Based on IssueActivity Success (Issue #126)
Code Reduction Achieved:
- JSON serialization: 18 lines → 1 line (94% reduction)
- Implementation detection: 13 lines → 3 lines (77% reduction)
- Table formatting: 8 lines → 6 lines (25% reduction)
- Total: ~21 lines of complex helper code eliminated
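The per-item percentages above follow from the before and after line counts. As a quick check, this sketch recomputes them; the helper name `reduction_percent` is an assumption, and only the three line-count pairs listed above are used.

```python
def reduction_percent(before: int, after: int) -> int:
    """Percentage of lines removed, rounded to the nearest whole number."""
    return round(100 * (before - after) / before)


# (before, after) line counts as reported in the list above.
reported = {
    "JSON serialization": (18, 1),
    "Implementation detection": (13, 3),
    "Table formatting": (8, 6),
}

percentages = {name: reduction_percent(b, a) for name, (b, a) in reported.items()}
print(percentages)
```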
Quality Improvements:
- Single source of truth for all operations
- Consistent interface across all usage patterns
- Better encapsulation and maintainability
- Enhanced code readability and reliability
Scalable Benefits
- Per-datamodel savings: ~15-25 lines of code reduction potential
- Codebase-wide impact: Systematic improvement across all datamodels
- Maintenance efficiency: Centralized logic reduces update overhead
- Development velocity: Faster feature development with better abstractions
Usage Patterns
1. Proactive Analysis Mode
# Discover optimization opportunities
markitect analyze-datamodels --scope all --report detailed
# Generate optimization plan
markitect plan-datamodel-optimization --target DataModelClass
# Estimate impact
markitect estimate-optimization-impact --model DataModelClass
2. Guided Optimization Mode
# Interactive optimization session
markitect optimize-datamodel --interactive DataModelClass
# Apply common patterns
markitect apply-optimization-patterns --pattern serialization DataModelClass
# Validate optimization
markitect validate-datamodel-optimization DataModelClass
3. Batch Processing Mode
# Optimize all datamodels
markitect batch-optimize-datamodels --safe-mode
# Generate optimization report
markitect datamodel-optimization-report --format detailed
# Create test alignment fixes
markitect fix-test-datamodel-alignment --auto-apply
Implementation Roadmap
Phase 1: Agent Foundation (Immediate)
- Create datamodel discovery engine
- Implement usage pattern analysis
- Develop optimization opportunity detection
- Generate baseline assessment tools
Phase 2: Core Optimization Capabilities
- Implement property generation framework
- Create method enhancement system
- Build serialization optimization tools
- Develop test alignment correction
Phase 3: Advanced Features
- Add performance impact analysis
- Implement optimization success tracking
- Create integration with existing workflows
- Develop optimization knowledge base
Phase 4: Ecosystem Integration
- Integration with Claude Code agent system
- Automated optimization suggestions
- Continuous improvement feedback loops
- Documentation and training materials
This agent embodies the systematic approach to datamodel optimization demonstrated in the successful IssueActivity enhancement (Issue #126), providing a reusable framework for improving datamodels throughout any codebase while maintaining interface compatibility and test reliability.