# MarkiTect Project Structure

This document describes the current project layout, architectural decisions, and the reorganization plan for the Information Space Service evolution.

## Overview

MarkiTect is a markdown processing toolkit with transclusion, schema validation, asset management, and multi-format output capabilities. The project follows a hybrid layout that is being incrementally consolidated.

## Current Directory Structure

```
markitect-main/
├── markitect/                     # Main package
│   ├── [34 root-level .py files]  # Core functionality (see below)
│   ├── assets/                    # Asset discovery, management, caching (21 files)
│   ├── finance/                   # Cost tracking, work time management (9 files)
│   ├── plugins/                   # Plugin system with base classes (7 files)
│   ├── packaging/                 # Asset packaging, MDZ variants (7 files)
│   ├── production/                # Deployment validation, benchmarks (6 files)
│   ├── legacy/                    # Legacy compatibility layer (8 files)
│   ├── explode_variants/          # Document expansion, variants (9 files)
│   ├── query_paradigms/           # Query paradigm implementations (4 files)
│   ├── validators/                # Content/link/section validation (4 files)
│   ├── matter_frontmatter/        # Front matter parsing (4 files)
│   ├── matter_contentmatter/      # Content matter parsing (4 files)
│   ├── matter_tailmatter/         # Tail matter parsing (4 files)
│   ├── profile/                   # User profile management (4 files)
│   ├── graphql/                   # GraphQL query implementation (4 files)
│   ├── template/                  # Template management (3 files)
│   ├── themes/                    # Theme system with subdirectories (1 file)
│   └── schemas/                   # Built-in schema definitions (9 files)
├── application/                   # Application layer services
├── domain/                        # Domain models
├── infrastructure/                # Infrastructure implementations
├── tests/                         # Test suite (90+ test files)
│   ├── unit/                      # Unit tests
│   ├── integration/               # Integration tests
│   ├── e2e/                       # End-to-end tests
│   └── fixtures/                  # Test data
├── docs/                          # Documentation (12+ subdirectories)
├── src/                           # JavaScript/frontend components
└── roadmap/                       # Project roadmap
```

## Root-Level Modules (/markitect/)

The 34 root-level Python files are organized by function:

### Core Infrastructure

| File | Lines | Purpose |
|------|-------|---------|
| `parser.py` | ~50 | Markdown AST parsing using markdown-it |
| `serializer.py` | ~360 | AST serialization back to Markdown |
| `document_manager.py` | ~100 | Wrapper around CleanDocumentManager |
| `clean_document_manager.py` | ~2000 | Clean document management implementation |
| `workspace.py` | ~200 | Workspace management |
| `database.py` | ~400 | SQLite database management |

### Schema Management (6 files, 99KB total)

| File | Lines | Purpose |
|------|-------|---------|
| `schema_generator.py` | ~600 | JSON schema generation from markdown AST |
| `schema_analyzer.py` | ~450 | Schema rigidity analysis with phase classification |
| `schema_loader.py` | ~600 | Schema loading from markdown with frontmatter |
| `schema_refiner.py` | ~600 | Automatic schema refinement using loosening rules |
| `schema_validator.py` | ~900 | Comprehensive schema validation |
| `schema_naming.py` | ~300 | Schema naming convention enforcement |

### Configuration & Services

| File | Purpose |
|------|---------|
| `config_manager.py` | Configuration file management |
| `frontmatter.py` | YAML frontmatter parsing |
| `exceptions.py` | Custom exception classes |
| `ast_service.py` | AST service layer |
| `cache_service.py` | Caching functionality |
| `ast_cache.py` | AST caching implementation |
| `performance_tracker.py` | Performance metrics |

### Validation & Analysis

| File | Purpose |
|------|---------|
| `semantic_validator.py` | Semantic validation layer |
| `validation_error.py` | Validation error handling |
| `metaschema.py` | Metaschema validation for custom extensions |

### CLI & Commands

| File | Purpose |
|------|---------|
| `cli.py` | Main CLI interface (274KB, comprehensive) |
| `cli_utils.py` | CLI utilities |
| `asset_commands.py` | Asset-related CLI commands |
| `draft_generator.py` | Draft generation functionality |

### Utilities

| File | Purpose |
|------|---------|
| `batch_processor.py` | Batch processing operations |
| `associated_files.py` | Associated file tracking |
| `legacy_compat.py` | Legacy compatibility layer |
| `legacy_integration_example.py` | Integration examples |
| `_version.py`, `__version__.py` | Version management |

## Subpackages

### assets/ (21 files)

Complete asset management system including discovery, analytics, caching, deduplication, and packaging. Key files:

- `repository.py` - Asset repository pattern
- `discovery.py` - Asset discovery algorithms
- `cache.py` - Asset caching layer
- `analytics.py` - Asset usage analytics

### finance/ (9 files)

Cost tracking and work time management:

- `models.py` - Financial data models
- `cost_tracker.py` - Cost tracking implementation
- `period_tracker.py` - Period-based tracking
- `report_generator.py` - Financial reports

### plugins/ (7 files)

Extensible plugin system:

- `base.py` - Plugin base classes and types
- `registry.py` - Plugin registry
- `builtin/` - Built-in plugin implementations

### packaging/ (7 files)

Asset packaging and MDZ format support:

- `mdz_packager.py` - MDZ package creation
- `transclusion.py` - Transclusion handling
- `variant_factory.py` - Variant generation

### production/ (6 files)

Deployment and production validation:

- `deployment_validator.py` - Deployment checks
- `performance_benchmark.py` - Performance testing
- `cross_platform_validator.py` - Platform compatibility

### legacy/ (8 files)

Backward compatibility layer:

- `compatibility.py` - Compatibility wrappers
- `deprecation.py` - Deprecation warnings
- `git_tracker.py` - Git integration (useful for Phase 8)

## Test Structure

```
tests/
├── conftest.py                    # Shared pytest configuration
├── fixtures/                      # Test data files
│   ├── content_test_files/
│   ├── contentmatter_test_files/
│   ├── frontmatter_test_files/
│   └── tailmatter_test_files/
├── unit/                          # Unit tests by domain
│   ├── application/
│   └── infrastructure/
├── integration/                   # Integration tests
│   └── repositories/
└── e2e/                           # End-to-end tests
    ├── cli/
    └── performance/
```

---

## Planned Reorganization

### Motivation

The current layout has grown organically, resulting in:

1. **34 files at root level** - Too many modules at package root
2. **No clear grouping** - Schema tools, core infrastructure, and utilities mixed
3. **Hybrid architecture** - Mix of root packages and monolithic /markitect/

### Target Structure

After reorganization, the /markitect/ package will have clearer structure:

```
markitect/
├── core/                          # Core infrastructure (NEW)
│   ├── __init__.py
│   ├── parser.py                  # (from markitect/)
│   ├── serializer.py              # (from markitect/)
│   ├── document_manager.py        # (from markitect/)
│   └── workspace.py               # (from markitect/)
├── schema/                        # Schema management (NEW)
│   ├── __init__.py
│   ├── validator.py               # (from schema_validator.py)
│   ├── generator.py               # (from schema_generator.py)
│   ├── loader.py                  # (from schema_loader.py)
│   ├── analyzer.py                # (from schema_analyzer.py)
│   ├── refiner.py                 # (from schema_refiner.py)
│   └── naming.py                  # (from schema_naming.py)
├── storage/                       # Storage concerns (NEW)
│   ├── __init__.py
│   ├── database.py                # (from markitect/)
│   └── cache.py                   # (consolidated)
├── spaces/                        # Information spaces (Phase 1+)
│   ├── models.py
│   ├── events/
│   ├── repositories/
│   ├── transclusion/
│   ├── rendering/
│   ├── sync/
│   └── services/
└── [existing subpackages]         # assets/, plugins/, etc.
```

### Backward Compatibility

Original import paths will continue to work through re-exports:

```python
# Old import (still works)
from markitect.parser import parse_markdown

# New import (preferred)
from markitect.core.parser import parse_markdown
```

### Migration Strategy

1. Create new subpackages with copied content
2. Update internal imports to new paths
3. Add deprecation warnings to old paths
4. Re-export from original locations for compatibility
5. Verify all tests pass
6. Update documentation

---

## Information Space Service Architecture

The reorganization prepares for the Information Space Service evolution, which adds:

### Phase 1-3: Foundation

- `InformationSpace` entity with lifecycle management
- `SpaceRepository` for persistence
- Event system for change tracking
- Persistent transclusion context

### Phase 4-5: Modes

- HTML rendering mode with caching
- Directory mode with bidirectional sync

### Phase 6-7: API & Composability

- GraphQL schema extensions
- CLI commands for space management
- Space references and inheritance

### Phase 8: Git History (Optional)

- Git-based version control for spaces
- Event-driven commits
- Version navigation

See [docs/roadmap/information-space-service/](./roadmap/information-space-service/) for the complete workplan.

---

## Key Dependencies

From `pyproject.toml`:

- Python >=3.8 (tested on 3.12)
- markdown-it-py - Markdown parsing
- PyYAML - YAML/frontmatter handling
- click - CLI framework
- tabulate - Table formatting
- jsonpath-ng - JSON path queries
- aiohttp - Async HTTP

## Version Information

- Current version is managed in `_version.py` and `__version__.py`
- Follows semantic versioning
- CHANGELOG.md tracks all changes

---

## Related Documentation

- [CLI Tutorial](CLI_TUTORIAL.md) - CLI usage guide
- [Plugin System](PLUGIN_SYSTEM.md) - Plugin architecture
- [Schema Management Guide](SCHEMA_MANAGEMENT_GUIDE.md) - Schema workflows
- [Asset Management Guide](ASSET_MANAGEMENT_USER_GUIDE.md) - Asset system
- [Error Handling Strategy](ERROR_HANDLING_STRATEGY.md) - Error patterns
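
---

## Example: Deprecation Shim for Old Import Paths

The Backward Compatibility and Migration Strategy sections describe re-exporting from the original locations while emitting deprecation warnings. One way this could be done is with a module-level `__getattr__` (PEP 562), sketched below. This is an illustration only, not MarkiTect's actual implementation: the helper `make_deprecated_shim` is hypothetical, and the two modules are in-memory stand-ins built with `types.ModuleType` so that the snippet runs without the package installed.

```python
import types
import warnings


def make_deprecated_shim(old_name, new_module):
    """Return a module object that forwards lookups to new_module with a warning.

    Hypothetical helper for illustration; not part of MarkiTect.
    """
    shim = types.ModuleType(old_name)

    def __getattr__(attr):
        # Python calls this only when normal attribute lookup on the module fails.
        value = getattr(new_module, attr)
        warnings.warn(
            f"{old_name}.{attr} is deprecated; import it from {new_module.__name__}",
            DeprecationWarning,
            stacklevel=2,
        )
        return value

    # Storing the function in the module namespace enables the PEP 562 hook.
    shim.__getattr__ = __getattr__
    return shim


# Stand-in for the relocated module (assumed new home: markitect/core/parser.py).
new_parser = types.ModuleType("markitect.core.parser")
# Placeholder function; the real parse_markdown builds a Markdown AST.
new_parser.parse_markdown = lambda text: text.splitlines()

# Stand-in for the old location (markitect/parser.py) after migration.
old_parser = make_deprecated_shim("markitect.parser", new_parser)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parse = old_parser.parse_markdown  # old path still resolves, but warns
    result = parse("# Title\nBody")
```

In a real package, the same effect would come from defining `def __getattr__(name)` directly at the top level of the old module file rather than building module objects at runtime. Forwarding lazily keeps the old path working while making each remaining use visible in test output.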