markitect-main/docs/PROJECT_STRUCTURE.md
tegwick b7e11461f4 chore: rename markitect_project to markitect-main across project
2026-04-21 01:57:35 +02:00

# MarkiTect Project Structure
This document describes the current project layout, architectural decisions, and the reorganization plan for the Information Space Service evolution.
## Overview
MarkiTect is a markdown processing toolkit with transclusion, schema validation, asset management, and multi-format output capabilities. The project follows a hybrid layout that is being incrementally consolidated.
## Current Directory Structure
```
markitect-main/
├── markitect/ # Main package
│ ├── [34 root-level .py files] # Core functionality (see below)
│ ├── assets/ # Asset discovery, management, caching (21 files)
│ ├── finance/ # Cost tracking, work time management (9 files)
│ ├── plugins/ # Plugin system with base classes (7 files)
│ ├── packaging/ # Asset packaging, MDZ variants (7 files)
│ ├── production/ # Deployment validation, benchmarks (6 files)
│ ├── legacy/ # Legacy compatibility layer (8 files)
│ ├── explode_variants/ # Document expansion, variants (9 files)
│ ├── query_paradigms/ # Query paradigm implementations (4 files)
│ ├── validators/ # Content/link/section validation (4 files)
│ ├── matter_frontmatter/ # Front matter parsing (4 files)
│ ├── matter_contentmatter/ # Content matter parsing (4 files)
│ ├── matter_tailmatter/ # Tail matter parsing (4 files)
│ ├── profile/ # User profile management (4 files)
│ ├── graphql/ # GraphQL query implementation (4 files)
│ ├── template/ # Template management (3 files)
│ ├── themes/ # Theme system with subdirectories (1 file)
│ └── schemas/ # Built-in schema definitions (9 files)
├── application/ # Application layer services
├── domain/ # Domain models
├── infrastructure/ # Infrastructure implementations
├── tests/ # Test suite (90+ test files)
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ ├── e2e/ # End-to-end tests
│ └── fixtures/ # Test data
├── docs/ # Documentation (12+ subdirectories)
├── src/ # JavaScript/frontend components
└── roadmap/ # Project roadmap
```
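The per-directory file counts in the tree above can drift as the package changes. As a rough aid for keeping them current, here is a minimal sketch that tallies `.py` files per top-level subdirectory. The function name `count_python_files` is hypothetical, and counting recursively is an assumption; the figures in the tree may have been produced differently.

```python
from collections import Counter
from pathlib import Path


def count_python_files(package_root):
    """Count .py files under package_root, grouped by first-level directory.

    Files that sit directly in package_root are reported under the key ".".
    Nested files are attributed to their top-level subpackage (assumption).
    """
    root = Path(package_root)
    counts = Counter()
    for path in root.rglob("*.py"):
        relative = path.relative_to(root)
        # The first path component names the subpackage; bare files belong to the root.
        key = relative.parts[0] if len(relative.parts) > 1 else "."
        counts[key] += 1
    return dict(counts)
```

Calling `count_python_files("markitect")` from the repository root would return a dictionary such as `{".": ..., "assets": ..., ...}` that can be compared against the numbers listed here.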
## Root-Level Modules (/markitect/)
The 34 root-level Python files are organized by function:
### Core Infrastructure
| File | Lines | Purpose |
|------|-------|---------|
| `parser.py` | ~50 | Markdown AST parsing using markdown-it |
| `serializer.py` | ~360 | AST serialization back to Markdown |
| `document_manager.py` | ~100 | Wrapper around CleanDocumentManager |
| `clean_document_manager.py` | ~2000 | Clean document management implementation |
| `workspace.py` | ~200 | Workspace management |
| `database.py` | ~400 | SQLite database management |
### Schema Management (6 files, 99 KB total)
| File | Lines | Purpose |
|------|-------|---------|
| `schema_generator.py` | ~600 | JSON schema generation from markdown AST |
| `schema_analyzer.py` | ~450 | Schema rigidity analysis with phase classification |
| `schema_loader.py` | ~600 | Schema loading from markdown with frontmatter |
| `schema_refiner.py` | ~600 | Automatic schema refinement using loosening rules |
| `schema_validator.py` | ~900 | Comprehensive schema validation |
| `schema_naming.py` | ~300 | Schema naming convention enforcement |
### Configuration & Services
| File | Purpose |
|------|---------|
| `config_manager.py` | Configuration file management |
| `frontmatter.py` | YAML frontmatter parsing |
| `exceptions.py` | Custom exception classes |
| `ast_service.py` | AST service layer |
| `cache_service.py` | Caching functionality |
| `ast_cache.py` | AST caching implementation |
| `performance_tracker.py` | Performance metrics |
### Validation & Analysis
| File | Purpose |
|------|---------|
| `semantic_validator.py` | Semantic validation layer |
| `validation_error.py` | Validation error handling |
| `metaschema.py` | Metaschema validation for custom extensions |
### CLI & Commands
| File | Purpose |
|------|---------|
| `cli.py` | Main CLI interface (274 KB, comprehensive) |
| `cli_utils.py` | CLI utilities |
| `asset_commands.py` | Asset-related CLI commands |
| `draft_generator.py` | Draft generation functionality |
### Utilities
| File | Purpose |
|------|---------|
| `batch_processor.py` | Batch processing operations |
| `associated_files.py` | Associated file tracking |
| `legacy_compat.py` | Legacy compatibility layer |
| `legacy_integration_example.py` | Integration examples |
| `_version.py`, `__version__.py` | Version management |
## Subpackages
### assets/ (21 files)
Complete asset management system including discovery, analytics, caching, deduplication, and packaging. Key files:
- `repository.py` - Asset repository pattern
- `discovery.py` - Asset discovery algorithms
- `cache.py` - Asset caching layer
- `analytics.py` - Asset usage analytics
### finance/ (9 files)
Cost tracking and work time management:
- `models.py` - Financial data models
- `cost_tracker.py` - Cost tracking implementation
- `period_tracker.py` - Period-based tracking
- `report_generator.py` - Financial reports
### plugins/ (7 files)
Extensible plugin system:
- `base.py` - Plugin base classes and types
- `registry.py` - Plugin registry
- `builtin/` - Built-in plugin implementations
### packaging/ (7 files)
Asset packaging and MDZ format support:
- `mdz_packager.py` - MDZ package creation
- `transclusion.py` - Transclusion handling
- `variant_factory.py` - Variant generation
### production/ (6 files)
Deployment and production validation:
- `deployment_validator.py` - Deployment checks
- `performance_benchmark.py` - Performance testing
- `cross_platform_validator.py` - Platform compatibility
### legacy/ (8 files)
Backward compatibility layer:
- `compatibility.py` - Compatibility wrappers
- `deprecation.py` - Deprecation warnings
- `git_tracker.py` - Git integration (useful for Phase 8)
## Test Structure
```
tests/
├── conftest.py # Shared pytest configuration
├── fixtures/ # Test data files
│ ├── content_test_files/
│ ├── contentmatter_test_files/
│ ├── frontmatter_test_files/
│ └── tailmatter_test_files/
├── unit/ # Unit tests by domain
│ ├── application/
│ └── infrastructure/
├── integration/ # Integration tests
│ └── repositories/
└── e2e/ # End-to-end tests
  ├── cli/
  └── performance/
```
---
## Planned Reorganization
### Motivation
The current layout has grown organically, resulting in:
1. **34 files at root level** - Too many modules at package root
2. **No clear grouping** - Schema tools, core infrastructure, and utilities mixed
3. **Hybrid architecture** - Top-level `application/`, `domain/`, and `infrastructure/` packages sit alongside a monolithic /markitect/ package
### Target Structure
After reorganization, the /markitect/ package will have a clearer structure:
```
markitect/
├── core/ # Core infrastructure (NEW)
│ ├── __init__.py
│ ├── parser.py # (from markitect/)
│ ├── serializer.py # (from markitect/)
│ ├── document_manager.py # (from markitect/)
│ └── workspace.py # (from markitect/)
├── schema/ # Schema management (NEW)
│ ├── __init__.py
│ ├── validator.py # (from schema_validator.py)
│ ├── generator.py # (from schema_generator.py)
│ ├── loader.py # (from schema_loader.py)
│ ├── analyzer.py # (from schema_analyzer.py)
│ ├── refiner.py # (from schema_refiner.py)
│ └── naming.py # (from schema_naming.py)
├── storage/ # Storage concerns (NEW)
│ ├── __init__.py
│ ├── database.py # (from markitect/)
│ └── cache.py # (consolidated)
├── spaces/ # Information spaces (Phase 1+)
│ ├── models.py
│ ├── events/
│ ├── repositories/
│ ├── transclusion/
│ ├── rendering/
│ ├── sync/
│ └── services/
└── [existing subpackages] # assets/, plugins/, etc.
```
### Backward Compatibility
Original import paths will continue to work through re-exports:
```python
# Old import (still works)
from markitect.parser import parse_markdown
# New import (preferred)
from markitect.core.parser import parse_markdown
```
### Migration Strategy
1. Create new subpackages with copied content
2. Update internal imports to new paths
3. Add deprecation warnings to old paths
4. Re-export from original locations for compatibility
5. Verify all tests pass
6. Update documentation
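
Steps 3 and 4 can be combined in a single shim left at each old path. The following is a minimal sketch, not the project's actual implementation: it uses a module-level `__getattr__` (PEP 562, available on Python 3.7+) to forward lookups to the new location and emit a `DeprecationWarning`. To stay self-contained it builds both modules in memory with `types.ModuleType`; in the real package the forwarding function would instead be defined as `__getattr__` inside `markitect/parser.py`. The body of `parse_markdown` and the helper name `_forward_with_warning` are placeholders.

```python
import types
import warnings

# Stand-in for the new home of the code (markitect/core/parser.py in the
# target layout). The function body is a placeholder, not the real parser.
new_parser = types.ModuleType("markitect.core.parser")


def parse_markdown(text):
    return text.splitlines()


new_parser.parse_markdown = parse_markdown

# Stand-in for the old module path (markitect/parser.py) after the move.
old_parser = types.ModuleType("markitect.parser")


def _forward_with_warning(name):
    """Module-level __getattr__ (PEP 562).

    Runs only when `name` is not found on the old module, so every moved
    symbol passes through here and triggers the warning.
    """
    try:
        value = getattr(new_parser, name)
    except AttributeError:
        raise AttributeError(
            f"module {old_parser.__name__!r} has no attribute {name!r}"
        ) from None
    warnings.warn(
        f"{old_parser.__name__}.{name} is deprecated; "
        f"import it from {new_parser.__name__} instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not the shim
    )
    return value


old_parser.__getattr__ = _forward_with_warning

# Accessing the symbol through the old path still works, but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_parse = old_parser.parse_markdown

print(legacy_parse is parse_markdown)
print(caught[0].category.__name__)
```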
---
## Information Space Service Architecture
The reorganization prepares for the Information Space Service evolution, which adds:
### Phase 1-3: Foundation
- `InformationSpace` entity with lifecycle management
- `SpaceRepository` for persistence
- Event system for change tracking
- Persistent transclusion context
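
To make the foundation pieces more concrete, here is a rough sketch of how an entity, a repository, and change-tracking events might fit together. Only the names `InformationSpace` and `SpaceRepository` come from this document; the lifecycle states, the `SpaceEvent` shape, the method names, and the in-memory storage are all assumptions made for illustration, and the actual design is defined in the workplan.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class SpaceState(Enum):
    """Hypothetical lifecycle states (not specified in this document)."""
    DRAFT = "draft"
    ACTIVE = "active"
    ARCHIVED = "archived"


@dataclass
class SpaceEvent:
    """Assumed minimal change-tracking record."""
    space_id: str
    kind: str


@dataclass
class InformationSpace:
    space_id: str
    name: str
    state: SpaceState = SpaceState.DRAFT
    events: List[SpaceEvent] = field(default_factory=list)

    def activate(self) -> None:
        # Only a draft space may become active in this sketch.
        if self.state is not SpaceState.DRAFT:
            raise ValueError(f"cannot activate a space in state {self.state.value}")
        self.state = SpaceState.ACTIVE
        self.events.append(SpaceEvent(self.space_id, "activated"))

    def archive(self) -> None:
        if self.state is SpaceState.ARCHIVED:
            raise ValueError("space is already archived")
        self.state = SpaceState.ARCHIVED
        self.events.append(SpaceEvent(self.space_id, "archived"))


class InMemorySpaceRepository:
    """Dict-backed stand-in for SpaceRepository; real persistence is out of scope."""

    def __init__(self) -> None:
        self._spaces: Dict[str, InformationSpace] = {}

    def save(self, space: InformationSpace) -> None:
        self._spaces[space.space_id] = space

    def get(self, space_id: str) -> Optional[InformationSpace]:
        return self._spaces.get(space_id)
```

Recording an event on every state transition keeps the change history next to the entity, which is one plausible way for later phases (sync, Git history) to replay changes.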
### Phase 4-5: Modes
- HTML rendering mode with caching
- Directory mode with bidirectional sync
### Phase 6-7: API & Composability
- GraphQL schema extensions
- CLI commands for space management
- Space references and inheritance
### Phase 8: Git History (Optional)
- Git-based version control for spaces
- Event-driven commits
- Version navigation
See [docs/roadmap/information-space-service/](./roadmap/information-space-service/) for the complete workplan.

---
## Key Dependencies
From `pyproject.toml`:
- Python >=3.8 (tested on 3.12)
- markdown-it-py - Markdown parsing
- PyYAML - YAML/frontmatter handling
- click - CLI framework
- tabulate - Table formatting
- jsonpath-ng - JSON path queries
- aiohttp - Async HTTP
## Version Information
- Current version is managed in `_version.py` and `__version__.py`
- Follows semantic versioning
- CHANGELOG.md tracks all changes
---
## Related Documentation
- [CLI Tutorial](CLI_TUTORIAL.md) - CLI usage guide
- [Plugin System](PLUGIN_SYSTEM.md) - Plugin architecture
- [Schema Management Guide](SCHEMA_MANAGEMENT_GUIDE.md) - Schema workflows
- [Asset Management Guide](ASSET_MANAGEMENT_USER_GUIDE.md) - Asset system
- [Error Handling Strategy](ERROR_HANDLING_STRATEGY.md) - Error patterns