# Gameplan: Issue #141 Asset Management - Variant B Implementation
**Date**: October 8, 2025
**Issue**: #141 - Asset Management Concepts
**Variant**: B - Content-Addressable Package System with Symlinks
**Status**: 📋 **IMPLEMENTATION GAMEPLAN**
## Executive Summary
This gameplan outlines the implementation of **Variant B** from Issue #141, which provides a **Content-Addressable Package System with Symlinks** for managing images and file includes in markitect. The implementation focuses on:
1. **Package-based document storage** (.mdpkg ZIP files)
2. **Symlink-based deduplication** with shared asset library
3. **CLI integration** with markitect commands
4. **Gradual rollout** with backward compatibility
## Architecture Overview
```
markitect_packages/
├── packages/                  # Generated .mdpkg files
│   ├── document_a.mdpkg
│   └── document_b.mdpkg
├── shared_assets/             # Deduplicated asset library
│   ├── images/
│   │   ├── content_hash_1.png
│   │   └── content_hash_2.jpg
│   └── registry.json          # Asset registry
└── workspace/                 # Working directory with symlinks
    ├── document_a/
    │   ├── index.md
    │   └── assets/            # Symlinks to shared_assets
    │       └── logo.png → ../../../shared_assets/images/content_hash_1.png
    └── document_b/
```
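The layout above relies on two operations: storing a file under a name derived from its content, and pointing a document-local name at that stored file. The following is a minimal sketch of both, assuming SHA-256 as the content hash and relative symlinks; the function names `content_hash`, `store_asset`, and `link_asset` are hypothetical and not part of the existing codebase.

```python
import hashlib
import os
import shutil
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_asset(source: Path, shared_images: Path) -> Path:
    """Copy `source` into the shared library under its content hash.

    Byte-identical files map to the same stored name, so adding the
    same content twice does not create a second copy.
    """
    shared_images.mkdir(parents=True, exist_ok=True)
    stored = shared_images / f"{content_hash(source)}{source.suffix.lower()}"
    if not stored.exists():
        shutil.copy2(source, stored)
    return stored


def link_asset(stored: Path, document_assets: Path, virtual_name: str) -> Path:
    """Create a relative symlink `document_assets/virtual_name` -> `stored`."""
    document_assets.mkdir(parents=True, exist_ok=True)
    link = document_assets / virtual_name
    # A relative target keeps the link valid if the whole tree is moved.
    link.symlink_to(os.path.relpath(stored, start=document_assets))
    return link
```

Computing the link target with `os.path.relpath` avoids hard-coding the number of `..` segments, which depends on where the workspace sits relative to the shared library.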
## Current Markitect Integration Points
Based on analysis of the existing codebase:
### Existing Modules
- **CLI Framework**: `/markitect/cli.py` - Main Click-based CLI with 247KB of commands
- **Module Structure**: Organized in packages (finance, issues, legacy, etc.)
- **Database Integration**: `/markitect/database.py` - SQLite-based storage
- **Configuration**: `/markitect/config_manager.py` - Centralized config management
- **Batch Processing**: `/markitect/batch_processor.py` - File processing pipeline
### Integration Strategy
- Follow existing patterns in `/markitect/finance/` and `/markitect/issues/`
- Use Click command groups for asset management commands
- Leverage existing `DatabaseManager` for metadata storage
- Integrate with `ConfigurationManager` for user settings
## Implementation Phases
### Phase 1: Core Asset Management Module (Week 1-2)
**Deliverables:**
1. **`/markitect/assets/` module structure**
2. **Asset registry and deduplication engine**
3. **Basic CLI commands**
4. **Unit tests**
**Components:**
```
markitect/assets/
├── __init__.py        # Module exports
├── registry.py        # AssetRegistry class
├── deduplicator.py    # AssetDeduplicator class
├── packager.py        # MarkdownPackager class
├── manager.py         # AssetManager class
├── cli.py             # Click command group
├── exceptions.py      # Asset-specific exceptions
└── constants.py       # Configuration constants
```
**Key Classes:**
- `AssetRegistry` - JSON-based asset metadata storage
- `AssetDeduplicator` - Symlink-based deduplication
- `MarkdownPackager` - .mdpkg creation/extraction
- `AssetManager` - High-level API coordinator
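To make the registry's role concrete, here is a minimal sketch of a JSON-backed `AssetRegistry` keyed by content hash. The method names (`register`, `add_reference`, `unused`, `save`) and the entry layout are assumptions made for illustration; the real class may look different.

```python
import json
from pathlib import Path


class AssetRegistry:
    """Illustrative JSON-backed registry keyed by content hash."""

    def __init__(self, registry_file: Path):
        self.registry_file = registry_file
        self.assets = {}
        if registry_file.exists():
            self.assets = json.loads(registry_file.read_text(encoding="utf-8"))

    def register(self, content_hash: str, original_name: str, stored_path: str) -> dict:
        """Add an asset, or return the existing entry for the same hash."""
        return self.assets.setdefault(
            content_hash,
            {"original_name": original_name, "stored_path": stored_path, "references": []},
        )

    def add_reference(self, content_hash: str, document: str, virtual_name: str) -> None:
        """Record that `document` uses the asset under `virtual_name`."""
        ref = {"document": document, "virtual_name": virtual_name}
        refs = self.assets[content_hash]["references"]
        if ref not in refs:  # keep references unique
            refs.append(ref)

    def unused(self) -> list:
        """Return hashes of assets that no document references."""
        return [h for h, entry in self.assets.items() if not entry["references"]]

    def save(self) -> None:
        """Write to a temporary file first, then rename it over the registry."""
        tmp = self.registry_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.assets, indent=2, sort_keys=True), encoding="utf-8")
        tmp.replace(self.registry_file)
```

Writing to a temporary file and renaming it means an interrupted save leaves the previous `registry.json` intact rather than a half-written one.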
### Phase 2: CLI Integration (Week 3)
**Deliverables:**
1. **Full CLI command suite**
2. **Integration with existing markitect CLI**
3. **Configuration management**
4. **User documentation**
**CLI Commands:**
```bash
# Asset Management
markitect asset add <file> <document> [--name NAME]
markitect asset list [--document DOC] [--unused]
markitect asset dedupe [--dry-run]
markitect asset stats
markitect asset cleanup [--orphaned]
# Package Management
markitect package create <document-dir> <package-name>
markitect package extract <package-file> [--name NAME]
markitect package list
markitect package validate <package-file>
# Workspace Management
markitect workspace init [--template TEMPLATE]
markitect workspace status
markitect workspace sync [--document DOC]
```
### Phase 3: Advanced Features (Week 4-5)
**Deliverables:**
1. **Batch processing integration**
2. **Database schema extensions**
3. **Performance optimizations**
4. **Integration tests**
**Features:**
- **Batch Import**: Process entire directories of assets
- **Auto-discovery**: Scan markdown files for asset references
- **Format Optimization**: Automatic image compression/conversion
- **Workspace Templates**: Pre-configured project structures
- **Asset Search**: Content-based asset discovery
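Auto-discovery can be sketched as a line-by-line scan for Markdown image references. The example below is a simplified illustration: it assumes only inline image syntax (`![alt](target)`) needs to be found, skips fenced code blocks, and ignores remote URLs. The function name `find_local_references` is hypothetical, and reference-style links or HTML `<img>` tags are not handled.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker

# Inline image syntax: ![alt](target) or ![alt](target "title")
_IMAGE_REF = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")
# Anything that starts with a URL scheme such as https: or data:
_HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_local_references(markdown_text: str) -> list:
    """Return (line_number, target) pairs for images that point at local files."""
    found = []
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on every opening or closing fence
            continue
        if in_fence:
            continue  # image syntax inside code samples is not a real reference
        for match in _IMAGE_REF.finditer(line):
            target = match.group(1)
            if _HAS_SCHEME.match(target):
                continue  # remote or embedded resource, not a local asset
            found.append((number, target))
    return found
```

The line numbers returned here could feed a per-reference record such as a `markdown_line` field, so a later tool can rewrite the exact line when an asset is renamed.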
### Phase 4: Production Readiness (Week 6)
**Deliverables:**
1. **Error handling and recovery**
2. **Configuration validation**
3. **Performance benchmarking**
4. **Documentation completion**
**Production Features:**
- **Rollback Support**: Undo asset operations
- **Conflict Resolution**: Handle symlink/file conflicts
- **Cross-platform Support**: Windows symlink alternatives
- **Migration Tools**: Import from existing asset workflows
## Technical Specifications
### Module Structure
**`markitect/assets/__init__.py`**
```python
"""Asset Management for Markitect - Issue #141 Variant B Implementation."""

from .registry import AssetRegistry
from .deduplicator import AssetDeduplicator
from .packager import MarkdownPackager
from .manager import AssetManager
from .exceptions import AssetError, DuplicationError, PackageError

__all__ = [
    'AssetRegistry',
    'AssetDeduplicator',
    'MarkdownPackager',
    'AssetManager',
    'AssetError',
    'DuplicationError',
    'PackageError',
]
```
**CLI Integration Pattern**
```python
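# In markitect/cli.py
from .assets.cli import asset_commands

# `asset_commands` is the Click group defined in markitect/assets/cli.py.
# Register it once on the main `cli` group under the name "asset";
# declaring a second `asset` group here would collide with it.
cli.add_command(asset_commands, name='asset')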
```
### Database Schema Extensions
**Asset Metadata Table**
```sql
CREATE TABLE asset_metadata (
    content_hash TEXT PRIMARY KEY,
    original_name TEXT,
    file_size INTEGER,
    mime_type TEXT,
    stored_path TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP,
    reference_count INTEGER DEFAULT 0
);

CREATE TABLE asset_references (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content_hash TEXT,
    document_path TEXT,
    virtual_name TEXT,
    markdown_line INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (content_hash) REFERENCES asset_metadata(content_hash)
);

CREATE INDEX idx_asset_refs_document ON asset_references(document_path);
CREATE INDEX idx_asset_refs_hash ON asset_references(content_hash);
```
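As a rough illustration of how this schema might be used from Python, the sketch below creates both tables with the standard `sqlite3` module, adds a reference while keeping `reference_count` in step, and lists unreferenced assets. It talks to SQLite directly rather than through markitect's `DatabaseManager`, whose API is not shown here; the helper names `open_asset_db`, `add_reference`, and `unused_assets` are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS asset_metadata (
    content_hash TEXT PRIMARY KEY,
    original_name TEXT,
    file_size INTEGER,
    mime_type TEXT,
    stored_path TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP,
    reference_count INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS asset_references (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content_hash TEXT,
    document_path TEXT,
    virtual_name TEXT,
    markdown_line INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (content_hash) REFERENCES asset_metadata(content_hash)
);
"""


def open_asset_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite only enforces FOREIGN KEY constraints when this pragma is
    # switched on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_reference(conn, content_hash, document_path, virtual_name, line=None):
    """Insert a reference and bump reference_count in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO asset_references "
            "(content_hash, document_path, virtual_name, markdown_line) "
            "VALUES (?, ?, ?, ?)",
            (content_hash, document_path, virtual_name, line),
        )
        conn.execute(
            "UPDATE asset_metadata SET reference_count = reference_count + 1 "
            "WHERE content_hash = ?",
            (content_hash,),
        )


def unused_assets(conn):
    """Return content hashes that no document references."""
    rows = conn.execute(
        "SELECT content_hash FROM asset_metadata "
        "WHERE reference_count = 0 ORDER BY content_hash"
    )
    return [row[0] for row in rows]
```

Because `reference_count` duplicates information already held in `asset_references`, updating both inside one transaction keeps them from drifting apart.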
### Configuration Schema
**Asset Management Settings**
```yaml
# markitect.yaml
asset_management:
  enabled: true
  workspace_path: "./markitect_workspace"
  shared_assets_path: "./markitect_workspace/shared_assets"
  packages_path: "./markitect_workspace/packages"

  # Deduplication settings
  auto_dedupe: true
  symlink_preferred: true
  fallback_to_copy: true  # Windows compatibility

  # Package settings
  compression_level: 6
  include_manifest: true
  validate_on_create: true

  # Performance settings
  cache_enabled: true
  batch_size: 100
  max_file_size_mb: 50
```
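One possible way to validate these settings after the YAML has been parsed into a dictionary is a small dataclass with range checks. This is a sketch only: it does not use markitect's `ConfigurationManager`, the `AssetConfig` and `load_asset_config` names are hypothetical, and the specific limits (other than the 0-9 ZIP compression range) are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class AssetConfig:
    enabled: bool = True
    workspace_path: str = "./markitect_workspace"
    shared_assets_path: str = "./markitect_workspace/shared_assets"
    packages_path: str = "./markitect_workspace/packages"
    auto_dedupe: bool = True
    symlink_preferred: bool = True
    fallback_to_copy: bool = True
    compression_level: int = 6
    include_manifest: bool = True
    validate_on_create: bool = True
    cache_enabled: bool = True
    batch_size: int = 100
    max_file_size_mb: int = 50

    def __post_init__(self):
        # Reject values that would fail later, deep inside an operation.
        if not 0 <= self.compression_level <= 9:
            raise ValueError("compression_level must be between 0 and 9")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.max_file_size_mb <= 0:
            raise ValueError("max_file_size_mb must be positive")


def load_asset_config(raw: dict) -> AssetConfig:
    """Build an AssetConfig from the parsed `asset_management` mapping."""
    section = raw.get("asset_management", {})
    known = {f.name for f in fields(AssetConfig)}
    unknown = set(section) - known
    if unknown:
        # Catch misspelled keys instead of silently ignoring them.
        raise ValueError(f"unknown asset_management keys: {sorted(unknown)}")
    return AssetConfig(**section)
```

Rejecting unknown keys is a deliberate choice in this sketch: a typo such as `batch_sise` would otherwise fall back to the default without any warning.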
## CLI Command Specifications
### Asset Commands
**`markitect asset add`**
```bash
# Basic usage
markitect asset add logo.png ./project_a --name company_logo.png
# Arguments
<file> # Asset file to add (required)
<document> # Target document directory (required)
# Options
--name NAME # Virtual name in document (default: original filename)
--force # Overwrite existing virtual name
--no-symlink # Force file copy instead of symlink
```
**`markitect asset list`**
```bash
# List all assets
markitect asset list
# Filter by document
markitect asset list --document ./project_a
# Show unused assets
markitect asset list --unused
# Output formats
markitect asset list --format json
markitect asset list --format table
```
**`markitect asset dedupe`**
```bash
# Dry run (show what would be deduplicated)
markitect asset dedupe --dry-run
# Execute deduplication
markitect asset dedupe
# Force deduplication of all assets
markitect asset dedupe --force
```
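The `--dry-run` behaviour could be built on a planning step that finds duplicates without touching any files. The sketch below groups files by SHA-256 and reports which copies would be replaced and how many bytes that would save; `find_duplicates` and `plan_dedupe` are hypothetical names, and choosing the first path in sorted order as the copy to keep is an arbitrary assumption.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> dict:
    """Group regular files under `root` by SHA-256; keep groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        # Skip directories and existing symlinks (those are already deduplicated).
        if path.is_symlink() or not path.is_file():
            continue
        # read_bytes() loads the whole file; fine for a sketch, but large
        # assets would be better hashed in chunks.
        groups[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def plan_dedupe(root: Path) -> tuple:
    """Return (actions, bytes_saved); each action is a (duplicate, keeper) pair.

    Nothing on disk is modified, which is what a dry run needs.
    """
    actions, saved = [], 0
    for paths in find_duplicates(root).values():
        keeper, *duplicates = paths
        for dup in duplicates:
            actions.append((dup, keeper))
            saved += dup.stat().st_size
    return actions, saved
```

Executing the plan would then mean replacing each `duplicate` with a link to its `keeper`, and the same `bytes_saved` figure could feed a storage-savings report.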
### Package Commands
**`markitect package create`**
```bash
# Create package from document directory
markitect package create ./project_a project_a
# Options
--output PATH # Output directory (default: workspace/packages)
--compression LEVEL # ZIP compression level 0-9 (default: 6)
--exclude PATTERN # Exclude files matching pattern
--include-sources # Include source markdown files
```
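Package creation can be sketched with the standard `zipfile` module. The example assumes a `.mdpkg` is a plain ZIP archive with a `manifest.json` at its root that maps each member to its SHA-256 digest; that manifest layout, the `"mdpkg-sketch"` format tag, and the `create_package` name are illustrative assumptions, not a defined format.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def create_package(document_dir: Path, package_file: Path, compression_level: int = 6) -> dict:
    """Write `document_dir` into a ZIP-based package and return its manifest.

    Symlinked assets are stored as regular file content, so the package
    stays self-contained when it is moved to another machine.
    `package_file` is assumed to live outside `document_dir`.
    """
    manifest = {"format": "mdpkg-sketch", "files": {}}
    with zipfile.ZipFile(
        package_file,
        "w",
        compression=zipfile.ZIP_DEFLATED,
        compresslevel=compression_level,
    ) as zf:
        for path in sorted(document_dir.rglob("*")):
            if not path.is_file():  # is_file() follows symlinks
                continue
            data = path.read_bytes()
            arcname = path.relative_to(document_dir).as_posix()
            zf.writestr(arcname, data)
            manifest["files"][arcname] = hashlib.sha256(data).hexdigest()
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

Resolving symlinks at packaging time trades archive size for portability: the package no longer depends on the shared asset library of the machine that created it.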
**`markitect package extract`**
```bash
# Extract package to workspace
markitect package extract project_a.mdpkg
# Extract with custom name
markitect package extract project_a.mdpkg --name project_a_v2
# Options
--output PATH # Output directory (default: workspace/documents)
--overwrite # Overwrite existing directory
--no-dedupe # Skip deduplication during extraction
```
## Testing Strategy
### Unit Tests
**Test Coverage Areas:**
- **Asset Registry**: JSON persistence, hash calculations, metadata management
- **Deduplicator**: Content hashing, symlink creation, fallback mechanisms
- **Packager**: ZIP creation/extraction, manifest handling, asset resolution
- **CLI Commands**: Command parsing, error handling, output formatting
**Test Structure:**
```
tests/
├── test_assets/
│   ├── test_registry.py
│   ├── test_deduplicator.py
│   ├── test_packager.py
│   └── test_cli.py
├── fixtures/
│   ├── test_images/
│   ├── test_documents/
│   └── test_packages/
└── integration/
    ├── test_full_workflow.py
    └── test_cross_platform.py
```
### Integration Tests
**Workflow Tests:**
1. **Complete Asset Lifecycle**: Add → Dedupe → Package → Extract
2. **Cross-Document Sharing**: Multiple docs referencing same assets
3. **Package Portability**: Create on one system, extract on another
4. **Error Recovery**: Broken symlinks, missing files, corrupted packages
### Performance Tests
**Benchmarking Scenarios:**
- **Large Asset Libraries**: 1000+ assets, multiple documents
- **Batch Processing**: Importing entire directories
- **Package Operations**: Creating/extracting large packages
- **Deduplication Efficiency**: Storage savings measurement
## Risk Mitigation
### Technical Risks
**Symlink Compatibility**
- **Risk**: Symlinks fail on Windows or restricted filesystems
- **Mitigation**: Automatic fallback to file copying
- **Detection**: Platform detection and permission testing
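The fallback described here can be sketched as "try the symlink, copy if the platform refuses". The helper name `link_or_copy` and its return values are hypothetical; the approach attempts the operation and reacts to failure instead of guessing from the platform name.

```python
import os
import shutil
from pathlib import Path


def link_or_copy(stored: Path, link: Path, allow_copy: bool = True) -> str:
    """Point `link` at `stored`; copy the file if symlinks are unavailable.

    Returns "symlink" or "copy" so the caller can record which strategy
    was actually used for this asset.
    """
    link.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.symlink(os.path.relpath(stored, start=link.parent), link)
        return "symlink"
    except FileExistsError:
        # A name collision is a conflict to resolve, not a reason to copy.
        raise
    except (OSError, NotImplementedError):
        # Typical causes: Windows without Developer Mode or elevated
        # rights, or a filesystem with no symlink support.
        if not allow_copy:
            raise
        shutil.copy2(stored, link)
        return "copy"
```

Recording the returned strategy matters because copied assets no longer track the shared library, so a later sync step would need to refresh them explicitly.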
**Package Corruption**
- **Risk**: ZIP files become corrupted during transfer
- **Mitigation**: Built-in validation and checksum verification
- **Recovery**: Package repair tools and backup strategies
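Validation and checksum verification might combine ZIP's built-in CRC check with a manifest comparison. The sketch below assumes the package carries a `manifest.json` whose `files` object maps member names to SHA-256 digests; that layout and the `validate_package` name are assumptions for illustration.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def validate_package(package_file: Path) -> list:
    """Return a list of problems; an empty list means the package passed."""
    if not zipfile.is_zipfile(package_file):
        return ["not a valid ZIP archive"]
    problems = []
    with zipfile.ZipFile(package_file) as zf:
        # testzip() reads every member and returns the first name whose
        # CRC-32 does not match, or None if all members are intact.
        bad_member = zf.testzip()
        if bad_member is not None:
            return [f"CRC check failed for {bad_member}"]
        names = set(zf.namelist())
        if "manifest.json" not in names:
            return ["manifest.json is missing"]
        manifest = json.loads(zf.read("manifest.json"))
        for name, expected in manifest.get("files", {}).items():
            if name not in names:
                problems.append(f"missing file: {name}")
            elif hashlib.sha256(zf.read(name)).hexdigest() != expected:
                problems.append(f"checksum mismatch: {name}")
    return problems
```

The CRC check catches transfer damage, while the manifest comparison additionally catches files that were removed or swapped after the package was built.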
**Storage Scalability**
- **Risk**: Asset libraries become too large to manage efficiently
- **Mitigation**: Lazy loading, pagination, and cleanup tools
- **Monitoring**: Storage usage tracking and alerts
### User Experience Risks
**Learning Curve**
- **Risk**: Users find asset management complex
- **Mitigation**: Progressive disclosure, good defaults, clear documentation
- **Support**: Interactive tutorials and example workflows
**Data Loss**
- **Risk**: Assets accidentally deleted or corrupted
- **Mitigation**: Confirmation prompts, soft deletion, backup recommendations
- **Recovery**: Asset history tracking and restore capabilities
## Success Metrics
### Technical Metrics
- **Storage Efficiency**: 30%+ reduction in duplicate asset storage
- **Performance**: Asset operations complete in <100ms for typical workloads
- **Reliability**: 99.9%+ success rate for package operations
- **Compatibility**: Works on Windows, macOS, Linux
### User Adoption Metrics
- **CLI Usage**: Asset commands represent 10%+ of total markitect usage
- **Package Creation**: Users create 5+ packages per month on average
- **Error Rates**: <1% of asset operations result in user-visible errors
- **Documentation**: Asset management docs have 95%+ user satisfaction
## Implementation Timeline
**Week 1-2: Core Module**
- [ ] Asset registry implementation
- [ ] Deduplication engine with symlinks
- [ ] Basic package creation/extraction
- [ ] Unit test suite (80%+ coverage)
**Week 3: CLI Integration**
- [ ] Complete CLI command suite
- [ ] Integration with main markitect CLI
- [ ] Configuration management
- [ ] User documentation
**Week 4-5: Advanced Features**
- [ ] Batch processing capabilities
- [ ] Database integration
- [ ] Performance optimizations
- [ ] Integration test suite
**Week 6: Production Readiness**
- [ ] Error handling and recovery
- [ ] Cross-platform testing
- [ ] Performance benchmarking
- [ ] Release preparation
## Dependencies
### Internal Dependencies
- **markitect.database**: Metadata storage integration
- **markitect.config_manager**: Configuration management
- **markitect.cli**: Command registration and parsing
- **markitect.batch_processor**: Bulk operation support
### External Dependencies
- **Click**: CLI framework (existing dependency)
- **pathlib**: Path manipulation (standard library)
- **zipfile**: Package creation (standard library)
- **hashlib**: Content hashing (standard library)
- **json**: Metadata serialization (standard library)
- **os**: Symlink operations (standard library)
### Optional Dependencies
- **Pillow**: Image processing and optimization
- **Send2Trash**: Safe file deletion
- **watchdog**: File system monitoring
## Next Steps
1. **Review and Approval**: Get stakeholder sign-off on this gameplan
2. **Environment Setup**: Prepare development environment and test fixtures
3. **Phase 1 Kickoff**: Begin core module implementation
4. **Continuous Integration**: Set up automated testing pipeline
5. **Documentation**: Start user guide and API documentation
This gameplan lays out a four-phase, six-week path for implementing Issue #141 Variant B while keeping existing markitect workflows backward compatible.
---
**Status**: 📋 **Ready for Implementation - Awaiting Approval**