Gameplan: Issue #141 Asset Management - Variant B Implementation
Date: October 8, 2025
Issue: #141 - Asset Management Concepts
Variant: B - Content-Addressable Package System with Symlinks
Status: 📋 IMPLEMENTATION GAMEPLAN
Executive Summary
This gameplan outlines the implementation of Variant B from Issue #141, which provides a Content-Addressable Package System with Symlinks for managing images and file includes in markitect. The implementation focuses on:
- Package-based document storage (.mdpkg ZIP files)
- Symlink-based deduplication with shared asset library
- CLI integration with markitect commands
- Gradual rollout with backward compatibility
Architecture Overview
markitect_packages/
├── packages/                  # Generated .mdpkg files
│   ├── document_a.mdpkg
│   └── document_b.mdpkg
├── shared_assets/             # Deduplicated asset library
│   ├── images/
│   │   ├── content_hash_1.png
│   │   └── content_hash_2.jpg
│   └── registry.json          # Asset registry
└── workspace/                 # Working directory with symlinks
    ├── document_a/
    │   ├── index.md
    │   └── assets/            # Symlinks to shared_assets
    │       └── logo.png → ../../../shared_assets/images/content_hash_1.png
    └── document_b/
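The layout above can be sketched in a few lines of Python. This is a minimal illustration, not the planned implementation: the function names `content_hash`, `store_asset`, and `link_into_document` are hypothetical, and SHA-256 is assumed as the content hash because the gameplan does not name an algorithm.

```python
import hashlib
import os
import shutil
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_asset(source: Path, shared_images: Path) -> Path:
    """Copy source into the shared library under its hash; reuse it if present."""
    shared_images.mkdir(parents=True, exist_ok=True)
    stored = shared_images / f"{content_hash(source)}{source.suffix.lower()}"
    if not stored.exists():
        shutil.copy2(source, stored)
    return stored


def link_into_document(stored: Path, assets_dir: Path, virtual_name: str) -> Path:
    """Create assets_dir/virtual_name as a relative symlink to the stored asset."""
    assets_dir.mkdir(parents=True, exist_ok=True)
    link = assets_dir / virtual_name
    # A relative target keeps the tree relocatable as a whole.
    link.symlink_to(os.path.relpath(stored, start=assets_dir))
    return link
```

Because the stored filename is derived from the file contents, adding the same image to a second document reuses the existing copy and only creates a new symlink.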
Current Markitect Integration Points
Based on analysis of the existing codebase:
Existing Modules
- CLI Framework: /markitect/cli.py - Main Click-based CLI with 247KB of commands
- Module Structure: Organized in packages (finance, issues, legacy, etc.)
- Database Integration: /markitect/database.py - SQLite-based storage
- Configuration: /markitect/config_manager.py - Centralized config management
- Batch Processing: /markitect/batch_processor.py - File processing pipeline
Integration Strategy
- Follow existing patterns in /markitect/finance/ and /markitect/issues/
- Use Click command groups for asset management commands
- Leverage existing DatabaseManager for metadata storage
- Integrate with ConfigurationManager for user settings
Implementation Phases
Phase 1: Core Asset Management Module (Week 1-2)
Deliverables:
- /markitect/assets/ module structure
- Asset registry and deduplication engine
- Basic CLI commands
- Unit tests
Components:
markitect/assets/
├── __init__.py       # Module exports
├── registry.py       # AssetRegistry class
├── deduplicator.py   # AssetDeduplicator class
├── packager.py       # MarkdownPackager class
├── manager.py        # AssetManager class
├── cli.py            # Click command group
├── exceptions.py     # Asset-specific exceptions
└── constants.py      # Configuration constants
Key Classes:
- AssetRegistry - JSON-based asset metadata storage
- AssetDeduplicator - Symlink-based deduplication
- MarkdownPackager - .mdpkg creation/extraction
- AssetManager - High-level API coordinator
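To make the registry's role concrete, here is a rough sketch of a JSON-backed AssetRegistry. The method names (`register`, `unused`, `save`) and the entry fields are assumptions chosen to resemble the metadata described elsewhere in this gameplan; the real class may look quite different.

```python
import json
from pathlib import Path


class AssetRegistry:
    """Hypothetical JSON-backed registry keyed by content hash."""

    def __init__(self, registry_path: Path):
        self.registry_path = registry_path
        self.entries = {}
        if registry_path.exists():
            self.entries = json.loads(registry_path.read_text(encoding="utf-8"))

    def register(self, content_hash: str, original_name: str,
                 stored_path: str, file_size: int) -> dict:
        """Add an asset, or bump its reference count if the hash is already known."""
        entry = self.entries.get(content_hash)
        if entry is None:
            entry = {
                "original_name": original_name,
                "stored_path": stored_path,
                "file_size": file_size,
                "reference_count": 0,
            }
            self.entries[content_hash] = entry
        entry["reference_count"] += 1
        return entry

    def unused(self) -> list:
        """Return hashes that no document references any more."""
        return sorted(h for h, e in self.entries.items() if e["reference_count"] == 0)

    def save(self) -> None:
        """Write to a temporary file, then rename it over the registry file."""
        tmp = self.registry_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.entries, indent=2, sort_keys=True), encoding="utf-8")
        tmp.replace(self.registry_path)
```

Writing to a temporary file and renaming it means an interrupted save should leave the previous registry.json in place rather than a half-written one.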
Phase 2: CLI Integration (Week 3)
Deliverables:
- Full CLI command suite
- Integration with existing markitect CLI
- Configuration management
- User documentation
CLI Commands:
# Asset Management
markitect asset add <file> <document> [--name NAME]
markitect asset list [--document DOC] [--unused]
markitect asset dedupe [--dry-run]
markitect asset stats
markitect asset cleanup [--orphaned]
# Package Management
markitect package create <document-dir> <package-name>
markitect package extract <package-file> [--name NAME]
markitect package list
markitect package validate <package-file>
# Workspace Management
markitect workspace init [--template TEMPLATE]
markitect workspace status
markitect workspace sync [--document DOC]
Phase 3: Advanced Features (Week 4-5)
Deliverables:
- Batch processing integration
- Database schema extensions
- Performance optimizations
- Integration tests
Features:
- Batch Import: Process entire directories of assets
- Auto-discovery: Scan markdown files for asset references
- Format Optimization: Automatic image compression/conversion
- Workspace Templates: Pre-configured project structures
- Asset Search: Content-based asset discovery
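As one possible shape for the auto-discovery feature, the sketch below scans markdown text for inline image references and records their line numbers. The function name `find_asset_references` is hypothetical, and the regular expression covers only the inline `![alt](path)` form; reference-style images, HTML `<img>` tags, and references inside fenced code blocks are not handled.

```python
import re

# Inline image syntax: ![alt](path "optional title")
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')
# Anything that starts with a URL scheme (https:, data:, ...) is not a local asset.
SCHEME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_asset_references(markdown_text: str) -> list:
    """Return (line_number, path) pairs for local image references."""
    references = []
    for line_number, line in enumerate(markdown_text.splitlines(), start=1):
        for match in IMAGE_PATTERN.finditer(line):
            path = match.group(1)
            if SCHEME_PATTERN.match(path):
                continue
            references.append((line_number, path))
    return references
```

The line numbers could feed a per-reference record such as the markdown_line column in the database schema.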
Phase 4: Production Readiness (Week 6)
Deliverables:
- Error handling and recovery
- Configuration validation
- Performance benchmarking
- Documentation completion
Production Features:
- Rollback Support: Undo asset operations
- Conflict Resolution: Handle symlink/file conflicts
- Cross-platform Support: Windows symlink alternatives
- Migration Tools: Import from existing asset workflows
Technical Specifications
Module Structure
markitect/assets/__init__.py
"""Asset Management for Markitect - Issue #141 Variant B Implementation."""
from .registry import AssetRegistry
from .deduplicator import AssetDeduplicator
from .packager import MarkdownPackager
from .manager import AssetManager
from .exceptions import AssetError, DuplicationError, PackageError
__all__ = [
    'AssetRegistry',
    'AssetDeduplicator',
    'MarkdownPackager',
    'AssetManager',
    'AssetError',
    'DuplicationError',
    'PackageError',
]
CLI Integration Pattern
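# In markitect/cli.py
from .assets.cli import asset_commands

# asset_commands is already a Click group, so register it once under the
# name 'asset'. Also declaring an @cli.group() called 'asset' would register
# the same name twice.
cli.add_command(asset_commands, name='asset')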
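For orientation, this is roughly what the other side of that registration might look like. It is a self-contained sketch: the `cli` group here is a stand-in for the main markitect group, and the `dedupe` body only echoes a message. The `asset_commands` name and the `--dry-run` flag come from this gameplan; everything else is illustrative.

```python
import click


@click.group(name="asset")
def asset_commands():
    """Asset management commands."""


@asset_commands.command(name="dedupe")
@click.option("--dry-run", is_flag=True, help="Show what would be deduplicated.")
def dedupe(dry_run):
    """Deduplicate assets in the shared library."""
    if dry_run:
        click.echo("Dry run: no files will be changed.")
    else:
        click.echo("Deduplicating assets...")


@click.group()
def cli():
    """Stand-in for the main markitect CLI group."""


# One registration is enough; the group's own name ('asset') is used.
cli.add_command(asset_commands)

if __name__ == "__main__":
    cli()
```

Keeping the group definition in markitect/assets/cli.py and only the `add_command` call in the main CLI module would keep the 247KB cli.py from growing further.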
Database Schema Extensions
Asset Metadata Table
CREATE TABLE asset_metadata (
    content_hash TEXT PRIMARY KEY,
    original_name TEXT,
    file_size INTEGER,
    mime_type TEXT,
    stored_path TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP,
    reference_count INTEGER DEFAULT 0
);

CREATE TABLE asset_references (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content_hash TEXT,
    document_path TEXT,
    virtual_name TEXT,
    markdown_line INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (content_hash) REFERENCES asset_metadata(content_hash)
);

CREATE INDEX idx_asset_refs_document ON asset_references(document_path);
CREATE INDEX idx_asset_refs_hash ON asset_references(content_hash);
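The following sketch shows how the two tables might be used together from Python's standard sqlite3 module. The helper names `open_database` and `add_reference` are hypothetical, and the schema is repeated with IF NOT EXISTS so the script can be rerun. One detail worth noting: SQLite only enforces the FOREIGN KEY clause when foreign keys are switched on for the connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS asset_metadata (
    content_hash TEXT PRIMARY KEY,
    original_name TEXT,
    file_size INTEGER,
    mime_type TEXT,
    stored_path TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP,
    reference_count INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS asset_references (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content_hash TEXT,
    document_path TEXT,
    virtual_name TEXT,
    markdown_line INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (content_hash) REFERENCES asset_metadata(content_hash)
);
"""


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with foreign keys enforced and the schema in place."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def add_reference(conn, content_hash, document_path, virtual_name, markdown_line=None):
    """Insert a reference row and bump reference_count in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO asset_references "
            "(content_hash, document_path, virtual_name, markdown_line) "
            "VALUES (?, ?, ?, ?)",
            (content_hash, document_path, virtual_name, markdown_line),
        )
        conn.execute(
            "UPDATE asset_metadata SET reference_count = reference_count + 1, "
            "last_accessed = CURRENT_TIMESTAMP WHERE content_hash = ?",
            (content_hash,),
        )
```

Updating reference_count in the same transaction as the insert keeps the counter from drifting away from the actual number of rows in asset_references.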
Configuration Schema
Asset Management Settings
# markitect.yaml
asset_management:
  enabled: true
  workspace_path: "./markitect_workspace"
  shared_assets_path: "./markitect_workspace/shared_assets"
  packages_path: "./markitect_workspace/packages"

  # Deduplication settings
  auto_dedupe: true
  symlink_preferred: true
  fallback_to_copy: true  # Windows compatibility

  # Package settings
  compression_level: 6
  include_manifest: true
  validate_on_create: true

  # Performance settings
  cache_enabled: true
  batch_size: 100
  max_file_size_mb: 50
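Configuration validation (a Phase 4 deliverable) could start from something like the sketch below. It assumes the YAML has already been parsed into a dictionary, for example by the existing configuration manager; the `AssetSettings` dataclass and `load_asset_settings` function are hypothetical names, and the validation rules are illustrative rather than specified by this gameplan.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AssetSettings:
    """Defaults mirror the asset_management keys in markitect.yaml."""
    enabled: bool = True
    workspace_path: str = "./markitect_workspace"
    shared_assets_path: str = "./markitect_workspace/shared_assets"
    packages_path: str = "./markitect_workspace/packages"
    auto_dedupe: bool = True
    symlink_preferred: bool = True
    fallback_to_copy: bool = True
    compression_level: int = 6
    include_manifest: bool = True
    validate_on_create: bool = True
    cache_enabled: bool = True
    batch_size: int = 100
    max_file_size_mb: int = 50

    def __post_init__(self):
        if not 0 <= self.compression_level <= 9:
            raise ValueError("compression_level must be between 0 and 9")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.max_file_size_mb <= 0:
            raise ValueError("max_file_size_mb must be positive")


def load_asset_settings(config: dict) -> AssetSettings:
    """Build settings from the parsed asset_management mapping, rejecting unknown keys."""
    section = config.get("asset_management") or {}
    known = {f.name for f in fields(AssetSettings)}
    unknown = set(section) - known
    if unknown:
        raise ValueError(f"Unknown asset_management keys: {sorted(unknown)}")
    return AssetSettings(**section)
```

Rejecting unknown keys catches typos such as `compresion_level` early instead of silently falling back to the default.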
CLI Command Specifications
Asset Commands
markitect asset add
# Basic usage
markitect asset add logo.png ./project_a --name company_logo.png
# Arguments
<file> # Asset file to add
<document> # Target document directory (required)
# Options
--name NAME # Virtual name in document (default: original filename)
--force # Overwrite existing virtual name
--no-symlink # Force file copy instead of symlink
markitect asset list
# List all assets
markitect asset list
# Filter by document
markitect asset list --document ./project_a
# Show unused assets
markitect asset list --unused
# Output formats
markitect asset list --format json
markitect asset list --format table
markitect asset dedupe
# Dry run (show what would be deduplicated)
markitect asset dedupe --dry-run
# Execute deduplication
markitect asset dedupe
# Force deduplication of all assets
markitect asset dedupe --force
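A dry run mostly needs to answer two questions: which files are duplicates, and how much space would be reclaimed. The sketch below shows one way to compute both, assuming SHA-256 as the content hash; `find_duplicates` and `reclaimable_bytes` are hypothetical helper names, and reading each file fully into memory is a simplification that a real implementation would likely replace with chunked hashing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> dict:
    """Group regular files under root by SHA-256; keep only groups with 2+ files."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        # Existing symlinks are already deduplicated, so skip them.
        if path.is_file() and not path.is_symlink():
            by_hash[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


def reclaimable_bytes(duplicates: dict) -> int:
    """Bytes freed if every duplicate group kept a single copy."""
    return sum(paths[0].stat().st_size * (len(paths) - 1) for paths in duplicates.values())
```

The same numbers could also back the storage-efficiency measurement listed under the success metrics.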
Package Commands
markitect package create
# Create package from document directory
markitect package create ./project_a project_a
# Options
--output PATH # Output directory (default: workspace/packages)
--compression LEVEL # ZIP compression level 0-9 (default: 6)
--exclude PATTERN # Exclude files matching pattern
--include-sources # Include source markdown files
markitect package extract
# Extract package to workspace
markitect package extract project_a.mdpkg
# Extract with custom name
markitect package extract project_a.mdpkg --name project_a_v2
# Options
--output PATH # Output directory (default: workspace/documents)
--overwrite # Overwrite existing directory
--no-dedupe # Skip deduplication during extraction
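The create and extract commands could be backed by something like the following sketch, built on the standard zipfile module. The function names and the manifest layout (a `files` mapping from relative path to SHA-256 digest in manifest.json) are assumptions for illustration; the gameplan does not define the manifest format.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def create_package(document_dir: Path, package_path: Path, compression_level: int = 6) -> Path:
    """Zip document_dir into package_path, storing symlinked assets as regular files."""
    files = sorted(p for p in document_dir.rglob("*") if p.is_file())
    manifest = {"files": {}}
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED,
                         compresslevel=compression_level) as archive:
        for path in files:
            name = path.relative_to(document_dir).as_posix()
            data = path.read_bytes()  # follows symlinks, so the package is self-contained
            archive.writestr(name, data)
            manifest["files"][name] = hashlib.sha256(data).hexdigest()
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return package_path


def extract_package(package_path: Path, output_dir: Path, overwrite: bool = False) -> Path:
    """Extract a package, refusing to touch an existing directory unless asked."""
    if output_dir.exists() and not overwrite:
        raise FileExistsError(f"{output_dir} already exists")
    with zipfile.ZipFile(package_path) as archive:
        archive.extractall(output_dir)
    return output_dir
```

Resolving symlinks at packaging time is what makes a .mdpkg portable: the archive carries real file contents, and deduplication can be reapplied after extraction on the target system.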
Testing Strategy
Unit Tests
Test Coverage Areas:
- Asset Registry: JSON persistence, hash calculations, metadata management
- Deduplicator: Content hashing, symlink creation, fallback mechanisms
- Packager: ZIP creation/extraction, manifest handling, asset resolution
- CLI Commands: Command parsing, error handling, output formatting
Test Structure:
tests/
├── test_assets/
│   ├── test_registry.py
│   ├── test_deduplicator.py
│   ├── test_packager.py
│   └── test_cli.py
├── fixtures/
│   ├── test_images/
│   ├── test_documents/
│   └── test_packages/
└── integration/
    ├── test_full_workflow.py
    └── test_cross_platform.py
Integration Tests
Workflow Tests:
- Complete Asset Lifecycle: Add → Dedupe → Package → Extract
- Cross-Document Sharing: Multiple docs referencing same assets
- Package Portability: Create on one system, extract on another
- Error Recovery: Broken symlinks, missing files, corrupted packages
Performance Tests
Benchmarking Scenarios:
- Large Asset Libraries: 1000+ assets, multiple documents
- Batch Processing: Importing entire directories
- Package Operations: Creating/extracting large packages
- Deduplication Efficiency: Storage savings measurement
Risk Mitigation
Technical Risks
Symlink Compatibility
- Risk: Symlinks fail on Windows or restricted filesystems
- Mitigation: Automatic fallback to file copying
- Detection: Platform detection and permission testing
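A possible shape for the fallback and detection logic is sketched below. Rather than inspecting the platform name, it probes by attempting to create a symlink, since symlink support can depend on privileges and filesystem as well as operating system. The names `symlinks_supported` and `link_or_copy` are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def symlinks_supported(directory: Path) -> bool:
    """Probe by actually creating a throwaway symlink inside directory."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        try:
            os.symlink("target", os.path.join(scratch, "probe"))
        except (OSError, NotImplementedError):
            return False
    return True


def link_or_copy(stored: Path, link: Path, allow_copy: bool = True) -> str:
    """Link to the stored asset, copying instead if symlinks fail. Returns the mode used."""
    link.parent.mkdir(parents=True, exist_ok=True)
    try:
        link.symlink_to(os.path.relpath(stored, start=link.parent))
        return "symlink"
    except FileExistsError:
        raise  # a name conflict is not a platform limitation; let the caller decide
    except (OSError, NotImplementedError):
        if not allow_copy:
            raise
        shutil.copy2(stored, link)
        return "copy"
```

The `allow_copy` parameter corresponds loosely to the fallback_to_copy setting; returning the mode lets the caller record whether an asset is actually deduplicated on disk.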
Package Corruption
- Risk: ZIP files become corrupted during transfer
- Mitigation: Built-in validation and checksum verification
- Recovery: Package repair tools and backup strategies
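Validation and checksum verification might be combined along these lines. The sketch assumes a hypothetical manifest.json inside the package with a `files` mapping from member name to SHA-256 digest; the gameplan does not specify the manifest format, so treat the layout and the `validate_package` name as placeholders.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def validate_package(package_path: Path) -> list:
    """Return a list of problems; an empty list means every check passed."""
    if not zipfile.is_zipfile(package_path):
        return ["not a ZIP archive"]
    problems = []
    try:
        with zipfile.ZipFile(package_path) as archive:
            bad_member = archive.testzip()  # first member failing its CRC, or None
            if bad_member is not None:
                return [f"CRC mismatch: {bad_member}"]
            names = set(archive.namelist())
            if "manifest.json" not in names:
                return ["missing manifest.json"]
            manifest = json.loads(archive.read("manifest.json"))
            for name, expected in manifest.get("files", {}).items():
                if name not in names:
                    problems.append(f"missing file: {name}")
                elif hashlib.sha256(archive.read(name)).hexdigest() != expected:
                    problems.append(f"checksum mismatch: {name}")
    except zipfile.BadZipFile as error:
        problems.append(f"corrupted archive: {error}")
    return problems
```

The ZIP CRC check catches transfer damage, while the manifest digests additionally catch members that were replaced or removed after the package was created.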
Storage Scalability
- Risk: Asset libraries become too large to manage efficiently
- Mitigation: Lazy loading, pagination, and cleanup tools
- Monitoring: Storage usage tracking and alerts
User Experience Risks
Learning Curve
- Risk: Users find asset management complex
- Mitigation: Progressive disclosure, good defaults, clear documentation
- Support: Interactive tutorials and example workflows
Data Loss
- Risk: Assets accidentally deleted or corrupted
- Mitigation: Confirmation prompts, soft deletion, backup recommendations
- Recovery: Asset history tracking and restore capabilities
Success Metrics
Technical Metrics
- Storage Efficiency: 30%+ reduction in duplicate asset storage
- Performance: Asset operations complete in <100ms for typical workloads
- Reliability: 99.9%+ success rate for package operations
- Compatibility: Works on Windows, macOS, Linux
User Adoption Metrics
- CLI Usage: Asset commands represent 10%+ of total markitect usage
- Package Creation: Users create 5+ packages per month on average
- Error Rates: <1% of asset operations result in user-visible errors
- Documentation: Asset management docs have 95%+ user satisfaction
Implementation Timeline
Week 1-2: Core Module
- Asset registry implementation
- Deduplication engine with symlinks
- Basic package creation/extraction
- Unit test suite (80%+ coverage)
Week 3: CLI Integration
- Complete CLI command suite
- Integration with main markitect CLI
- Configuration management
- User documentation
Week 4-5: Advanced Features
- Batch processing capabilities
- Database integration
- Performance optimizations
- Integration test suite
Week 6: Production Readiness
- Error handling and recovery
- Cross-platform testing
- Performance benchmarking
- Release preparation
Dependencies
Internal Dependencies
- markitect.database: Metadata storage integration
- markitect.config_manager: Configuration management
- markitect.cli: Command registration and parsing
- markitect.batch_processor: Bulk operation support
External Dependencies
- Click: CLI framework (existing dependency)
- pathlib: Path manipulation (standard library)
- zipfile: Package creation (standard library)
- hashlib: Content hashing (standard library)
- json: Metadata serialization (standard library)
- os: Symlink operations (standard library)
Optional Dependencies
- Pillow: Image processing and optimization
- Send2Trash: Safe file deletion
- watchdog: File system monitoring
Next Steps
- Review and Approval: Get stakeholder sign-off on this gameplan
- Environment Setup: Prepare development environment and test fixtures
- Phase 1 Kickoff: Begin core module implementation
- Continuous Integration: Set up automated testing pipeline
- Documentation: Start user guide and API documentation
This gameplan provides a comprehensive roadmap for implementing Issue #141 Variant B, ensuring robust asset management capabilities while maintaining compatibility with existing markitect workflows.
Status: 📋 Ready for Implementation - Awaiting Approval