Gameplan: Issue #141 Asset Management - Variant B Implementation

Date: October 8, 2025
Issue: #141 - Asset Management Concepts
Variant: B - Content-Addressable Package System with Symlinks
Status: 📋 IMPLEMENTATION GAMEPLAN

Executive Summary

This gameplan outlines the implementation of Variant B from Issue #141, which provides a Content-Addressable Package System with Symlinks for managing images and file includes in markitect. The implementation focuses on:

  1. Package-based document storage (.mdpkg ZIP files)
  2. Symlink-based deduplication with shared asset library
  3. CLI integration with markitect commands
  4. Gradual rollout with backward compatibility

Architecture Overview

markitect_packages/
├── packages/                 # Generated .mdpkg files
│   ├── document_a.mdpkg
│   └── document_b.mdpkg
├── shared_assets/           # Deduplicated asset library
│   ├── images/
│   │   ├── content_hash_1.png
│   │   └── content_hash_2.jpg
│   └── registry.json       # Asset registry
└── workspace/               # Working directory with symlinks
    ├── document_a/
    │   ├── index.md
    │   └── assets/          # Symlinks to shared_assets
    │       └── logo.png → ../../../shared_assets/images/content_hash_1.png
    └── document_b/
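
The registry.json file in the layout above could be as simple as a mapping from content hash to asset metadata. The following is a minimal sketch, not the planned implementation: it assumes SHA-256 as the content hash and a hypothetical entry layout (original_name, file_size, reference_count), neither of which is fixed by this gameplan.

```python
import hashlib
import json
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class AssetRegistry:
    """Hypothetical JSON-backed registry keyed by content hash."""

    def __init__(self, registry_path: Path):
        self.registry_path = registry_path
        self.entries = {}
        if registry_path.exists():
            self.entries = json.loads(registry_path.read_text(encoding="utf-8"))

    def register(self, asset: Path) -> str:
        """Record one reference to an asset and return its content hash.

        Files with identical bytes share a single entry; only the
        reference count grows.
        """
        key = content_hash(asset)
        entry = self.entries.setdefault(
            key,
            {
                "original_name": asset.name,
                "file_size": asset.stat().st_size,
                "reference_count": 0,
            },
        )
        entry["reference_count"] += 1
        return key

    def save(self) -> None:
        """Write the registry back to disk as indented JSON."""
        self.registry_path.write_text(
            json.dumps(self.entries, indent=2), encoding="utf-8"
        )
```

Keying entries by hash rather than by filename is what lets two documents refer to the same bytes under different virtual names.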

Current Markitect Integration Points

Based on analysis of the existing codebase:

Existing Modules

  • CLI Framework: /markitect/cli.py - Main Click-based CLI with 247KB of commands
  • Module Structure: Organized in packages (finance, issues, legacy, etc.)
  • Database Integration: /markitect/database.py - SQLite-based storage
  • Configuration: /markitect/config_manager.py - Centralized config management
  • Batch Processing: /markitect/batch_processor.py - File processing pipeline

Integration Strategy

  • Follow existing patterns in /markitect/finance/ and /markitect/issues/
  • Use Click command groups for asset management commands
  • Leverage existing DatabaseManager for metadata storage
  • Integrate with ConfigurationManager for user settings

Implementation Phases

Phase 1: Core Asset Management Module (Week 1-2)

Deliverables:

  1. /markitect/assets/ module structure
  2. Asset registry and deduplication engine
  3. Basic CLI commands
  4. Unit tests

Components:

markitect/assets/
├── __init__.py              # Module exports
├── registry.py              # AssetRegistry class
├── deduplicator.py          # AssetDeduplicator class
├── packager.py              # MarkdownPackager class
├── manager.py               # AssetManager class
├── cli.py                   # Click command group
├── exceptions.py            # Asset-specific exceptions
└── constants.py             # Configuration constants

Key Classes:

  • AssetRegistry - JSON-based asset metadata storage
  • AssetDeduplicator - Symlink-based deduplication
  • MarkdownPackager - .mdpkg creation/extraction
  • AssetManager - High-level API coordinator
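
To make the symlink-based deduplication concrete, here is a rough sketch of the core operation an AssetDeduplicator might perform. The function name add_asset, its parameters, and the choice of SHA-256 are illustrative assumptions; the gameplan does not define them.

```python
import hashlib
import os
import shutil
from pathlib import Path


def add_asset(source: Path, shared_dir: Path, document_assets: Path,
              virtual_name: str) -> Path:
    """Store `source` once under its content hash and expose it in a document.

    Hypothetical helper: the first time some content is seen it is copied
    into `shared_dir`; every document then gets a symlink (or, where
    symlinks are unavailable, a plain copy) under its own virtual name.
    """
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    shared_dir.mkdir(parents=True, exist_ok=True)
    document_assets.mkdir(parents=True, exist_ok=True)

    # Content-addressed location: identical bytes always map to one file.
    stored = shared_dir / f"{digest}{source.suffix.lower()}"
    if not stored.exists():
        shutil.copy2(source, stored)

    link = document_assets / virtual_name
    if link.is_symlink() or link.exists():
        link.unlink()  # replace an earlier entry with the same virtual name

    # A relative target keeps the links valid if the whole tree is moved.
    target = os.path.relpath(stored, start=document_assets)
    try:
        os.symlink(target, link)
    except (OSError, NotImplementedError):
        shutil.copy2(stored, link)  # fallback, e.g. Windows without privileges
    return link
```

Whether the document entry ends up as a symlink or a copy, reading it yields the same bytes, so callers do not need to know which path was taken.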

Phase 2: CLI Integration (Week 3)

Deliverables:

  1. Full CLI command suite
  2. Integration with existing markitect CLI
  3. Configuration management
  4. User documentation

CLI Commands:

# Asset Management
markitect asset add <file> <document> [--name NAME]
markitect asset list [--document DOC] [--unused]
markitect asset dedupe [--dry-run]
markitect asset stats
markitect asset cleanup [--orphaned]

# Package Management
markitect package create <document-dir> <package-name>
markitect package extract <package-file> [--name NAME]
markitect package list
markitect package validate <package-file>

# Workspace Management
markitect workspace init [--template TEMPLATE]
markitect workspace status
markitect workspace sync [--document DOC]

Phase 3: Advanced Features (Week 4-5)

Deliverables:

  1. Batch processing integration
  2. Database schema extensions
  3. Performance optimizations
  4. Integration tests

Features:

  • Batch Import: Process entire directories of assets
  • Auto-discovery: Scan markdown files for asset references
  • Format Optimization: Automatic image compression/conversion
  • Workspace Templates: Pre-configured project structures
  • Asset Search: Content-based asset discovery
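
As an illustration of the auto-discovery feature, the sketch below scans markdown text for inline image references and reports the line each one appears on. The regular expression and the function name find_asset_references are assumptions made for this example; reference-style images, HTML img tags, and file includes are not handled.

```python
import re

# Matches inline images such as ![alt](path/to/file.png "optional title").
IMAGE_PATTERN = re.compile(
    r'!\[[^\]]*\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)'
)

# Targets with these prefixes point outside the local file system.
REMOTE_PREFIXES = ("http://", "https://", "data:", "//")


def find_asset_references(markdown_text):
    """Return (line_number, target) pairs for local image references."""
    references = []
    for line_number, line in enumerate(markdown_text.splitlines(), start=1):
        for match in IMAGE_PATTERN.finditer(line):
            target = match.group(1)
            if target.startswith(REMOTE_PREFIXES):
                continue  # remote or embedded image, nothing to manage
            references.append((line_number, target))
    return references
```

The line numbers returned here correspond to what the markdown_line column in the schema extensions is meant to record.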

Phase 4: Production Readiness (Week 6)

Deliverables:

  1. Error handling and recovery
  2. Configuration validation
  3. Performance benchmarking
  4. Documentation completion

Production Features:

  • Rollback Support: Undo asset operations
  • Conflict Resolution: Handle symlink/file conflicts
  • Cross-platform Support: Windows symlink alternatives
  • Migration Tools: Import from existing asset workflows

Technical Specifications

Module Structure

markitect/assets/__init__.py

"""Asset Management for Markitect - Issue #141 Variant B Implementation."""

from .registry import AssetRegistry
from .deduplicator import AssetDeduplicator
from .packager import MarkdownPackager
from .manager import AssetManager
from .exceptions import AssetError, DuplicationError, PackageError

__all__ = [
    'AssetRegistry',
    'AssetDeduplicator',
    'MarkdownPackager',
    'AssetManager',
    'AssetError',
    'DuplicationError',
    'PackageError'
]

CLI Integration Pattern

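# In markitect/cli.py
from .assets.cli import asset_commands

# asset_commands is the Click group defined in markitect/assets/cli.py.
# Register it once under the name "asset"; declaring a second @cli.group()
# called "asset" here would be overwritten by this registration.
cli.add_command(asset_commands, name='asset')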

Database Schema Extensions

Asset Metadata and Reference Tables

CREATE TABLE asset_metadata (
    content_hash TEXT PRIMARY KEY,
    original_name TEXT,
    file_size INTEGER,
    mime_type TEXT,
    stored_path TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP,
    reference_count INTEGER DEFAULT 0
);

CREATE TABLE asset_references (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content_hash TEXT,
    document_path TEXT,
    virtual_name TEXT,
    markdown_line INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (content_hash) REFERENCES asset_metadata(content_hash)
);

CREATE INDEX idx_asset_refs_document ON asset_references(document_path);
CREATE INDEX idx_asset_refs_hash ON asset_references(content_hash);
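
The schema above can be exercised with Python's built-in sqlite3 module. This is a sketch only: the helper names (open_db, add_reference, unused_assets) are hypothetical and do not describe the existing DatabaseManager API. Note that SQLite enforces foreign keys only when the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE asset_metadata (
    content_hash TEXT PRIMARY KEY,
    original_name TEXT,
    file_size INTEGER,
    mime_type TEXT,
    stored_path TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP,
    reference_count INTEGER DEFAULT 0
);
CREATE TABLE asset_references (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content_hash TEXT,
    document_path TEXT,
    virtual_name TEXT,
    markdown_line INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (content_hash) REFERENCES asset_metadata(content_hash)
);
CREATE INDEX idx_asset_refs_document ON asset_references(document_path);
CREATE INDEX idx_asset_refs_hash ON asset_references(content_hash);
"""


def open_db(path=":memory:"):
    """Open a database, enable foreign-key enforcement, create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def add_reference(conn, content_hash, document_path, virtual_name):
    """Record a document reference and bump the asset's reference count."""
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "INSERT INTO asset_references (content_hash, document_path, virtual_name) "
            "VALUES (?, ?, ?)",
            (content_hash, document_path, virtual_name),
        )
        conn.execute(
            "UPDATE asset_metadata SET reference_count = reference_count + 1 "
            "WHERE content_hash = ?",
            (content_hash,),
        )


def unused_assets(conn):
    """Return hashes of assets no document refers to."""
    rows = conn.execute(
        "SELECT content_hash FROM asset_metadata "
        "WHERE reference_count = 0 ORDER BY content_hash"
    ).fetchall()
    return [row[0] for row in rows]
```

A query like unused_assets is the kind of lookup that `markitect asset list --unused` and `markitect asset cleanup --orphaned` would presumably rely on.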

Configuration Schema

Asset Management Settings

# markitect.yaml
asset_management:
  enabled: true
  workspace_path: "./markitect_workspace"
  shared_assets_path: "./markitect_workspace/shared_assets"
  packages_path: "./markitect_workspace/packages"

  # Deduplication settings
  auto_dedupe: true
  symlink_preferred: true
  fallback_to_copy: true  # Windows compatibility

  # Package settings
  compression_level: 6
  include_manifest: true
  validate_on_create: true

  # Performance settings
  cache_enabled: true
  batch_size: 100
  max_file_size_mb: 50
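
One way to validate these settings after the YAML file has been parsed into a dictionary is a small dataclass whose defaults mirror the values above. This is a sketch under assumptions: AssetSettings and load_settings are hypothetical names, and how ConfigurationManager actually loads markitect.yaml is not covered here.

```python
from dataclasses import dataclass, fields


@dataclass
class AssetSettings:
    """Hypothetical typed view of the asset_management section."""
    enabled: bool = True
    workspace_path: str = "./markitect_workspace"
    shared_assets_path: str = "./markitect_workspace/shared_assets"
    packages_path: str = "./markitect_workspace/packages"
    auto_dedupe: bool = True
    symlink_preferred: bool = True
    fallback_to_copy: bool = True
    compression_level: int = 6
    include_manifest: bool = True
    validate_on_create: bool = True
    cache_enabled: bool = True
    batch_size: int = 100
    max_file_size_mb: int = 50

    def __post_init__(self):
        # Reject values the rest of the system could not act on.
        if not 0 <= self.compression_level <= 9:
            raise ValueError("compression_level must be between 0 and 9")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.max_file_size_mb <= 0:
            raise ValueError("max_file_size_mb must be positive")


def load_settings(raw: dict) -> AssetSettings:
    """Build settings from an already-parsed config dictionary."""
    section = raw.get("asset_management", {})
    known = {field.name for field in fields(AssetSettings)}
    unknown = set(section) - known
    if unknown:
        raise ValueError(f"unknown asset_management keys: {sorted(unknown)}")
    return AssetSettings(**section)
```

Rejecting unknown keys catches typos such as `compresion_level` early instead of silently falling back to a default.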

CLI Command Specifications

Asset Commands

markitect asset add

# Basic usage
markitect asset add logo.png ./project_a --name company_logo.png

# Arguments
<file>                # Asset file to add (required)
<document>            # Target document directory (required)

# Options
--name NAME           # Virtual name in document (default: original filename)
--force               # Overwrite existing virtual name
--no-symlink          # Force file copy instead of symlink

markitect asset list

# List all assets
markitect asset list

# Filter by document
markitect asset list --document ./project_a

# Show unused assets
markitect asset list --unused

# Output formats
markitect asset list --format json
markitect asset list --format table

markitect asset dedupe

# Dry run (show what would be deduplicated)
markitect asset dedupe --dry-run

# Execute deduplication
markitect asset dedupe

# Force deduplication of all assets
markitect asset dedupe --force

Package Commands

markitect package create

# Create package from document directory
markitect package create ./project_a project_a

# Options
--output PATH         # Output directory (default: packages_path from markitect.yaml)
--compression LEVEL   # ZIP compression level 0-9 (default: 6)
--exclude PATTERN     # Exclude files matching pattern
--include-sources     # Include source markdown files

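Package creation could look roughly like the sketch below, which writes a document directory into a ZIP archive and resolves asset symlinks to real file content so the package is self-contained. The function name create_package and the manifest layout (a "files" mapping from archive path to SHA-256 digest stored in manifest.json) are assumptions for illustration; the .mdpkg manifest format is not yet specified.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def create_package(document_dir: Path, output_path: Path,
                   compression_level: int = 6) -> dict:
    """Write `document_dir` into a ZIP package and return its manifest.

    Hypothetical sketch: `output_path` should live outside `document_dir`,
    and a document file named manifest.json would clash with the manifest.
    """
    manifest = {"files": {}}
    with zipfile.ZipFile(
        output_path, "w",
        compression=zipfile.ZIP_DEFLATED,
        compresslevel=compression_level,
    ) as archive:
        for path in sorted(document_dir.rglob("*")):
            if not path.is_file():  # is_file() follows symlinks to files
                continue
            # read_bytes() follows symlinks, so the archive stores the
            # actual asset content rather than a link into shared_assets.
            data = path.read_bytes()
            arcname = path.relative_to(document_dir).as_posix()
            archive.writestr(arcname, data)
            manifest["files"][arcname] = hashlib.sha256(data).hexdigest()
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

Storing real bytes instead of links is what makes the package portable to systems that do not share the original asset library.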
markitect package extract

# Extract package to workspace
markitect package extract project_a.mdpkg

# Extract with custom name
markitect package extract project_a.mdpkg --name project_a_v2

# Options
--output PATH         # Output directory (default: workspace_path from markitect.yaml)
--overwrite           # Overwrite existing directory
--no-dedupe           # Skip deduplication during extraction

Testing Strategy

Unit Tests

Test Coverage Areas:

  • Asset Registry: JSON persistence, hash calculations, metadata management
  • Deduplicator: Content hashing, symlink creation, fallback mechanisms
  • Packager: ZIP creation/extraction, manifest handling, asset resolution
  • CLI Commands: Command parsing, error handling, output formatting

Test Structure:

tests/
├── test_assets/
│   ├── test_registry.py
│   ├── test_deduplicator.py
│   ├── test_packager.py
│   └── test_cli.py
├── fixtures/
│   ├── test_images/
│   ├── test_documents/
│   └── test_packages/
└── integration/
    ├── test_full_workflow.py
    └── test_cross_platform.py

Integration Tests

Workflow Tests:

  1. Complete Asset Lifecycle: Add → Dedupe → Package → Extract
  2. Cross-Document Sharing: Multiple docs referencing same assets
  3. Package Portability: Create on one system, extract on another
  4. Error Recovery: Broken symlinks, missing files, corrupted packages

Performance Tests

Benchmarking Scenarios:

  • Large Asset Libraries: 1000+ assets, multiple documents
  • Batch Processing: Importing entire directories
  • Package Operations: Creating/extracting large packages
  • Deduplication Efficiency: Storage savings measurement
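
Storage savings can be estimated before any deduplication runs by grouping files by content hash. The sketch below is one possible measurement, with a hypothetical function name and report layout; it skips symlinks so already-deduplicated assets are not counted twice.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def dedupe_savings(root: Path) -> dict:
    """Estimate how many bytes deduplication would save under `root`."""
    sizes_by_hash = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            sizes_by_hash[digest].append(path.stat().st_size)

    total = sum(sum(sizes) for sizes in sizes_by_hash.values())
    # After deduplication, each distinct content is stored exactly once.
    unique = sum(sizes[0] for sizes in sizes_by_hash.values())
    saved = total - unique
    return {
        "total_bytes": total,
        "unique_bytes": unique,
        "saved_bytes": saved,
        "saved_percent": 100.0 * saved / total if total else 0.0,
    }
```

The saved_percent figure is the number to compare against the storage-efficiency target in the success metrics.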

Risk Mitigation

Technical Risks

Symlink Compatibility

  • Risk: Symlinks fail on Windows or restricted filesystems
  • Mitigation: Automatic fallback to file copying
  • Detection: Platform detection and permission testing
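
Permission testing is generally more reliable than checking the platform name, since symlink support depends on the filesystem and on user privileges as well as the operating system. A minimal probe, with the hypothetical name symlinks_supported, might try to create a throwaway link in the target directory:

```python
import os
import tempfile
from pathlib import Path


def symlinks_supported(directory: Path) -> bool:
    """Return True if a symlink can actually be created inside `directory`."""
    # The scratch directory and everything in it is removed on exit.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        target = Path(scratch) / "target"
        link = Path(scratch) / "link"
        target.write_bytes(b"")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            return False  # caller should fall back to copying files
        return link.is_symlink()
```

The result could be cached per workspace so the probe runs once rather than on every asset operation.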

Package Corruption

  • Risk: ZIP files become corrupted during transfer
  • Mitigation: Built-in validation and checksum verification
  • Recovery: Package repair tools and backup strategies
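
A validation pass along these lines could back `markitect package validate`. The sketch assumes a manifest.json inside the package that maps each archive path to a SHA-256 digest under a "files" key; that layout, like the function name validate_package, is an assumption rather than a defined format.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def validate_package(package_path: Path) -> list:
    """Return a list of problems found; an empty list means the package is valid."""
    if not zipfile.is_zipfile(package_path):
        return ["not a readable ZIP archive"]

    problems = []
    with zipfile.ZipFile(package_path) as archive:
        # testzip() re-reads every member and checks its stored CRC.
        corrupt = archive.testzip()
        if corrupt is not None:
            return [f"CRC mismatch in {corrupt}"]

        try:
            manifest = json.loads(archive.read("manifest.json"))
        except KeyError:
            return ["manifest.json is missing"]

        names = set(archive.namelist())
        for name, expected in manifest.get("files", {}).items():
            if name not in names:
                problems.append(f"missing file: {name}")
            elif hashlib.sha256(archive.read(name)).hexdigest() != expected:
                problems.append(f"checksum mismatch: {name}")
    return problems
```

The CRC check catches transfer damage, while the manifest comparison also catches files that were swapped or removed by a tool that rewrote the archive cleanly.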

Storage Scalability

  • Risk: Asset libraries become too large to manage efficiently
  • Mitigation: Lazy loading, pagination, and cleanup tools
  • Monitoring: Storage usage tracking and alerts

User Experience Risks

Learning Curve

  • Risk: Users find asset management complex
  • Mitigation: Progressive disclosure, good defaults, clear documentation
  • Support: Interactive tutorials and example workflows

Data Loss

  • Risk: Assets accidentally deleted or corrupted
  • Mitigation: Confirmation prompts, soft deletion, backup recommendations
  • Recovery: Asset history tracking and restore capabilities

Success Metrics

Technical Metrics

  • Storage Efficiency: 30%+ reduction in duplicate asset storage
  • Performance: Asset operations complete in <100ms for typical workloads
  • Reliability: 99.9%+ success rate for package operations
  • Compatibility: Works on Windows, macOS, Linux

User Adoption Metrics

  • CLI Usage: Asset commands represent 10%+ of total markitect usage
  • Package Creation: Users create 5+ packages per month on average
  • Error Rates: <1% of asset operations result in user-visible errors
  • Documentation: Asset management docs have 95%+ user satisfaction

Implementation Timeline

Week 1-2: Core Module

  • Asset registry implementation
  • Deduplication engine with symlinks
  • Basic package creation/extraction
  • Unit test suite (80%+ coverage)

Week 3: CLI Integration

  • Complete CLI command suite
  • Integration with main markitect CLI
  • Configuration management
  • User documentation

Week 4-5: Advanced Features

  • Batch processing capabilities
  • Database integration
  • Performance optimizations
  • Integration test suite

Week 6: Production Readiness

  • Error handling and recovery
  • Cross-platform testing
  • Performance benchmarking
  • Release preparation

Dependencies

Internal Dependencies

  • markitect.database: Metadata storage integration
  • markitect.config_manager: Configuration management
  • markitect.cli: Command registration and parsing
  • markitect.batch_processor: Bulk operation support

External Dependencies

  • Click: CLI framework (existing dependency)
  • pathlib: Path manipulation (standard library)
  • zipfile: Package creation (standard library)
  • hashlib: Content hashing (standard library)
  • json: Metadata serialization (standard library)
  • os: Symlink operations (standard library)

Optional Dependencies

  • Pillow: Image processing and optimization
  • Send2Trash: Safe file deletion
  • watchdog: File system monitoring

Next Steps

  1. Review and Approval: Get stakeholder sign-off on this gameplan
  2. Environment Setup: Prepare development environment and test fixtures
  3. Phase 1 Kickoff: Begin core module implementation
  4. Continuous Integration: Set up automated testing pipeline
  5. Documentation: Start user guide and API documentation

This gameplan provides a comprehensive roadmap for implementing Issue #141 Variant B, ensuring robust asset management capabilities while maintaining compatibility with existing markitect workflows.


Status: 📋 Ready for Implementation - Awaiting Approval