MarkiTect Project Structure

This document describes the current project layout, architectural decisions, and the reorganization plan for the Information Space Service evolution.

Overview

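MarkiTect is a Markdown processing toolkit with transclusion, schema validation, asset management, and multi-format output capabilities. The project follows a hybrid layout, with top-level application/, domain/, and infrastructure/ packages alongside a monolithic markitect/ package, that is being incrementally consolidated.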

Current Directory Structure

markitect-main/
├── markitect/                    # Main package
│   ├── [34 root-level .py files] # Core functionality (see below)
│   ├── assets/                   # Asset discovery, management, caching (21 files)
│   ├── finance/                  # Cost tracking, work time management (9 files)
│   ├── plugins/                  # Plugin system with base classes (7 files)
│   ├── packaging/                # Asset packaging, MDZ variants (7 files)
│   ├── production/               # Deployment validation, benchmarks (6 files)
│   ├── legacy/                   # Legacy compatibility layer (8 files)
│   ├── explode_variants/         # Document expansion, variants (9 files)
│   ├── query_paradigms/          # Query paradigm implementations (4 files)
│   ├── validators/               # Content/link/section validation (4 files)
│   ├── matter_frontmatter/       # Front matter parsing (4 files)
│   ├── matter_contentmatter/     # Content matter parsing (4 files)
│   ├── matter_tailmatter/        # Tail matter parsing (4 files)
│   ├── profile/                  # User profile management (4 files)
│   ├── graphql/                  # GraphQL query implementation (4 files)
│   ├── template/                 # Template management (3 files)
│   ├── themes/                   # Theme system with subdirectories (1 file)
│   └── schemas/                  # Built-in schema definitions (9 files)
├── application/                  # Application layer services
├── domain/                       # Domain models
├── infrastructure/               # Infrastructure implementations
├── tests/                        # Test suite (90+ test files)
│   ├── unit/                     # Unit tests
│   ├── integration/              # Integration tests
│   ├── e2e/                      # End-to-end tests
│   └── fixtures/                 # Test data
├── docs/                         # Documentation (12+ subdirectories)
├── src/                          # JavaScript/frontend components
└── roadmap/                      # Project roadmap

Root-Level Modules (/markitect/)

The 34 root-level Python files are organized by function:

Core Infrastructure

| File | Lines | Purpose |
| --- | --- | --- |
| parser.py | ~50 | Markdown AST parsing using markdown-it |
| serializer.py | ~360 | AST serialization back to Markdown |
| document_manager.py | ~100 | Wrapper around CleanDocumentManager |
| clean_document_manager.py | ~2000 | Clean document management implementation |
| workspace.py | ~200 | Workspace management |
| database.py | ~400 | SQLite database management |

Schema Management (6 files, 99 KB total)

| File | Lines | Purpose |
| --- | --- | --- |
| schema_generator.py | ~600 | JSON schema generation from Markdown AST |
| schema_analyzer.py | ~450 | Schema rigidity analysis with phase classification |
| schema_loader.py | ~600 | Schema loading from Markdown with frontmatter |
| schema_refiner.py | ~600 | Automatic schema refinement using loosening rules |
| schema_validator.py | ~900 | Comprehensive schema validation |
| schema_naming.py | ~300 | Schema naming convention enforcement |

Configuration & Services

| File | Purpose |
| --- | --- |
| config_manager.py | Configuration file management |
| frontmatter.py | YAML frontmatter parsing |
| exceptions.py | Custom exception classes |
| ast_service.py | AST service layer |
| cache_service.py | Caching functionality |
| ast_cache.py | AST caching implementation |
| performance_tracker.py | Performance metrics |

Validation & Analysis

| File | Purpose |
| --- | --- |
| semantic_validator.py | Semantic validation layer |
| validation_error.py | Validation error handling |
| metaschema.py | Metaschema validation for custom extensions |

CLI & Commands

| File | Purpose |
| --- | --- |
| cli.py | Main CLI interface (274 KB, comprehensive) |
| cli_utils.py | CLI utilities |
| asset_commands.py | Asset-related CLI commands |
| draft_generator.py | Draft generation functionality |

Utilities

| File | Purpose |
| --- | --- |
| batch_processor.py | Batch processing operations |
| associated_files.py | Associated file tracking |
| legacy_compat.py | Legacy compatibility layer |
| legacy_integration_example.py | Integration examples |
| _version.py, __version__.py | Version management |

Subpackages

assets/ (21 files)

Complete asset management system including discovery, analytics, caching, deduplication, and packaging. Key files:

  • repository.py - Asset repository pattern
  • discovery.py - Asset discovery algorithms
  • cache.py - Asset caching layer
  • analytics.py - Asset usage analytics

finance/ (9 files)

Cost tracking and work time management:

  • models.py - Financial data models
  • cost_tracker.py - Cost tracking implementation
  • period_tracker.py - Period-based tracking
  • report_generator.py - Financial reports

plugins/ (7 files)

Extensible plugin system:

  • base.py - Plugin base classes and types
  • registry.py - Plugin registry
  • builtin/ - Built-in plugin implementations

packaging/ (7 files)

Asset packaging and MDZ format support:

  • mdz_packager.py - MDZ package creation
  • transclusion.py - Transclusion handling
  • variant_factory.py - Variant generation

production/ (6 files)

Deployment and production validation:

  • deployment_validator.py - Deployment checks
  • performance_benchmark.py - Performance testing
  • cross_platform_validator.py - Platform compatibility

legacy/ (8 files)

Backward compatibility layer:

  • compatibility.py - Compatibility wrappers
  • deprecation.py - Deprecation warnings
  • git_tracker.py - Git integration (useful for Phase 8, Git History)

Test Structure

tests/
├── conftest.py              # Shared pytest configuration
├── fixtures/                # Test data files
│   ├── content_test_files/
│   ├── contentmatter_test_files/
│   ├── frontmatter_test_files/
│   └── tailmatter_test_files/
├── unit/                    # Unit tests by domain
│   ├── application/
│   └── infrastructure/
├── integration/             # Integration tests
│   └── repositories/
└── e2e/                     # End-to-end tests
    ├── cli/
    └── performance/

Planned Reorganization

Motivation

The current layout has grown organically, resulting in:

  1. 34 files at root level - Too many modules at package root
  2. No clear grouping - Schema tools, core infrastructure, and utilities are mixed together
  3. Hybrid architecture - Mix of top-level packages (application/, domain/, infrastructure/) and a monolithic markitect/ package
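
The file counts quoted in this document can be rechecked as the layout changes. The following is a minimal sketch using only the standard library; the function name count_python_files is hypothetical, and whether the documented counts include __init__.py files is not stated here, so the numbers it reports may differ slightly.

```python
from pathlib import Path
from typing import Dict, Tuple


def count_python_files(package_dir: Path) -> Tuple[int, Dict[str, int]]:
    """Return (root-level .py count, {subpackage name: recursive .py count})."""
    # Modules that sit directly in the package root.
    root_count = sum(
        1 for p in package_dir.iterdir() if p.is_file() and p.suffix == ".py"
    )
    # Every immediate subdirectory, skipping hidden and dunder dirs such as __pycache__.
    per_subpackage = {
        child.name: sum(1 for _ in child.rglob("*.py"))
        for child in sorted(package_dir.iterdir())
        if child.is_dir() and not child.name.startswith((".", "__"))
    }
    return root_count, per_subpackage


# Usage from the repository root (path taken from the tree in this document):
#   root, subs = count_python_files(Path("markitect"))
#   print(root, subs)
```

Running it before and after each migration step gives a quick signal of how many modules still sit at the package root.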

Target Structure

After reorganization, the markitect/ package will have a clearer structure:

markitect/
├── core/                    # Core infrastructure (NEW)
│   ├── __init__.py
│   ├── parser.py           # (from markitect/)
│   ├── serializer.py       # (from markitect/)
│   ├── document_manager.py # (from markitect/)
│   └── workspace.py        # (from markitect/)
├── schema/                  # Schema management (NEW)
│   ├── __init__.py
│   ├── validator.py        # (from schema_validator.py)
│   ├── generator.py        # (from schema_generator.py)
│   ├── loader.py           # (from schema_loader.py)
│   ├── analyzer.py         # (from schema_analyzer.py)
│   ├── refiner.py          # (from schema_refiner.py)
│   └── naming.py           # (from schema_naming.py)
├── storage/                 # Storage concerns (NEW)
│   ├── __init__.py
│   ├── database.py         # (from markitect/)
│   └── cache.py            # (consolidated)
├── spaces/                  # Information spaces (Phase 1+)
│   ├── models.py
│   ├── events/
│   ├── repositories/
│   ├── transclusion/
│   ├── rendering/
│   ├── sync/
│   └── services/
└── [existing subpackages]   # assets/, plugins/, etc.

Backward Compatibility

Original import paths will continue to work through re-exports:

# Old import (still works)
from markitect.parser import parse_markdown

# New import (preferred)
from markitect.core.parser import parse_markdown
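
One possible way to keep an old path importable while warning about it is a module-level `__getattr__` (PEP 562) that forwards to the new module. The sketch below is an assumption about how such a shim could look, not the project's actual implementation: make_deprecated_alias is a hypothetical helper, and the modules demo_core_parser and demo_parser are in-memory stand-ins for markitect.core.parser and markitect.parser so the example runs on its own.

```python
import sys
import types
import warnings


def make_deprecated_alias(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Register a shim under old_name that forwards attribute access to new_module."""
    shim = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # PEP 562: called only when normal module attribute lookup fails.
        if attr.startswith("__"):
            raise AttributeError(attr)  # do not forward import-machinery dunders
        value = getattr(new_module, attr)  # raises AttributeError if missing
        warnings.warn(
            f"{old_name}.{attr} is deprecated; import it from {new_module.__name__}",
            DeprecationWarning,
            stacklevel=2,
        )
        return value

    shim.__getattr__ = __getattr__
    sys.modules[old_name] = shim
    return shim


# Stand-in for the new location; this parse_markdown body is a placeholder.
new_parser = types.ModuleType("demo_core_parser")
new_parser.parse_markdown = lambda text: text.splitlines()
sys.modules["demo_core_parser"] = new_parser

# Stand-in for the old location.
make_deprecated_alias("demo_parser", new_parser)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_parser import parse_markdown  # old path still resolves
```

In a real package the same `__getattr__` function would live directly in the old module file, which covers both the re-export and the deprecation warning in one place.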

Migration Strategy

  1. Create new subpackages with copied content
  2. Update internal imports to new paths
  3. Add deprecation warnings to old paths
  4. Re-export from original locations for compatibility
  5. Verify all tests pass
  6. Update documentation
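
Step 5 could be backed by a small check that each old import path and its new counterpart resolve to the same object. This is a sketch under assumptions: resolve and find_mismatches are hypothetical helper names, and a standard-library pair stands in for the project's modules so the example runs anywhere.

```python
import importlib
from typing import Iterable, List, Tuple


def resolve(dotted: str):
    """Import 'package.module:attribute' and return the attribute."""
    module_name, _, attr = dotted.partition(":")
    return getattr(importlib.import_module(module_name), attr)


def find_mismatches(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (old, new) pairs whose two paths resolve to different objects."""
    return [(old, new) for old, new in pairs if resolve(old) is not resolve(new)]


# For this project a pair would look like:
#   ("markitect.parser:parse_markdown", "markitect.core.parser:parse_markdown")
# The standard-library pair below is a stand-in: json re-exports JSONDecoder
# from json.decoder, which mirrors the re-export pattern described above.
PAIRS = [("json:JSONDecoder", "json.decoder:JSONDecoder")]
mismatches = find_mismatches(PAIRS)
```

An empty result means every old path still points at the relocated object rather than at a stale copy.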

Information Space Service Architecture

The reorganization prepares for the Information Space Service evolution, which adds:

Phase 1-3: Foundation

  • InformationSpace entity with lifecycle management
  • SpaceRepository for persistence
  • Event system for change tracking
  • Persistent transclusion context
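
To make the foundation items more concrete, here is an illustrative sketch of how an entity, its events, and a repository might fit together. Only the names InformationSpace and SpaceRepository come from this document; the lifecycle states, the transition rules, the SpaceEvent fields, and the in-memory repository are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List, Optional


class SpaceState(Enum):
    # Hypothetical lifecycle states; the workplan may define different ones.
    DRAFT = "draft"
    ACTIVE = "active"
    ARCHIVED = "archived"


# Allowed lifecycle transitions (an assumption made for this sketch).
_ALLOWED = {
    SpaceState.DRAFT: {SpaceState.ACTIVE},
    SpaceState.ACTIVE: {SpaceState.ARCHIVED},
    SpaceState.ARCHIVED: set(),
}


@dataclass(frozen=True)
class SpaceEvent:
    """Immutable record of one change to a space."""
    space_id: str
    kind: str
    at: datetime


@dataclass
class InformationSpace:
    space_id: str
    state: SpaceState = SpaceState.DRAFT
    events: List[SpaceEvent] = field(default_factory=list)

    def transition(self, new_state: SpaceState) -> SpaceEvent:
        """Move to new_state if allowed and record the change as an event."""
        if new_state not in _ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        event = SpaceEvent(self.space_id, f"state:{new_state.value}",
                           datetime.now(timezone.utc))
        self.events.append(event)
        return event


class InMemorySpaceRepository:
    """Dictionary-backed stand-in for the planned SpaceRepository."""

    def __init__(self) -> None:
        self._spaces: Dict[str, InformationSpace] = {}

    def save(self, space: InformationSpace) -> None:
        self._spaces[space.space_id] = space

    def get(self, space_id: str) -> Optional[InformationSpace]:
        return self._spaces.get(space_id)


repo = InMemorySpaceRepository()
space = InformationSpace("handbook")
space.transition(SpaceState.ACTIVE)
repo.save(space)
```

Keeping events as plain immutable records leaves room for a later phase to replay or persist them without changing the entity.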

Phase 4-5: Modes

  • HTML rendering mode with caching
  • Directory mode with bidirectional sync

Phase 6-7: API & Composability

  • GraphQL schema extensions
  • CLI commands for space management
  • Space references and inheritance

Phase 8: Git History (Optional)

  • Git-based version control for spaces
  • Event-driven commits
  • Version navigation

See docs/roadmap/information-space-service/ for the complete workplan.


Key Dependencies

From pyproject.toml:

  • Python >=3.8 (tested on 3.12)
  • markdown-it-py - Markdown parsing
  • PyYAML - YAML/frontmatter handling
  • click - CLI framework
  • tabulate - Table formatting
  • jsonpath-ng - JSON path queries
  • aiohttp - Async HTTP
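
A quick way to confirm that these distributions are installed in the active environment is to query their metadata. This is a minimal sketch; missing_distributions is a hypothetical helper, and the names in DEPENDENCIES are copied from the list above rather than read from pyproject.toml.

```python
from importlib import metadata
from typing import Iterable, List


def missing_distributions(names: Iterable[str]) -> List[str]:
    """Return the distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Distribution names as listed in this document.
DEPENDENCIES = ["markdown-it-py", "PyYAML", "click", "tabulate", "jsonpath-ng", "aiohttp"]
print("missing:", missing_distributions(DEPENDENCIES))
```

importlib.metadata is part of the standard library from Python 3.8, which matches the minimum version listed above.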

Version Information

  • Current version is managed in _version.py and __version__.py
  • Follows semantic versioning
  • CHANGELOG.md tracks all changes