feat(spaces): implement Phase 0-1 of Information Space Service
Phase 0 - Project Organization:
- Create docs/PROJECT_STRUCTURE.md documenting codebase layout
- Create markitect/core/ with parser, serializer, document_manager, workspace
- Create markitect/schema/ consolidating 6 schema_*.py modules
- Create markitect/storage/ with database module
- Maintain backward compatibility via re-exports from original locations
- Add docs/roadmap/information-space-service/ with README and WORKPLAN

Phase 1 - Foundation (Weeks 1-3):
- Week 1: Core domain models (InformationSpace, SpaceDocument, SpaceConfig, SpaceMetadata, SpaceVariable, TransclusionReference, SpaceStatus)
- Week 2: Repository layer with interfaces (ISpaceRepository, IDocumentAssociationRepository, IVariableRepository, IReferenceRepository) and SQLite implementations with foreign key cascade deletes
- Week 3: SpaceService orchestration layer with full CRUD, document, variable, and reference tracking operations

Test coverage: 124 tests (25 model + 63 repository + 36 integration)

Capabilities delivered:
- CAP-001: InformationSpace entity with lifecycle management
- CAP-002: SpaceRepository CRUD with SQLite backing
- CAP-003: Document-Space associations with path-based organization
- CAP-004: Space metadata and configuration schemas
- CAP-005: Database schema with migrations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
286
docs/PROJECT_STRUCTURE.md
Normal file
@@ -0,0 +1,286 @@
|
||||
# MarkiTect Project Structure
|
||||
|
||||
This document describes the current project layout, architectural decisions, and the reorganization plan for the Information Space Service evolution.
|
||||
|
||||
## Overview
|
||||
|
||||
MarkiTect is a markdown processing toolkit with transclusion, schema validation, asset management, and multi-format output capabilities. The project follows a hybrid layout that is being incrementally consolidated.
|
||||
|
||||
## Current Directory Structure
|
||||
|
||||
```
|
||||
markitect_project/
|
||||
├── markitect/ # Main package
|
||||
│ ├── [34 root-level .py files] # Core functionality (see below)
|
||||
│ ├── assets/ # Asset discovery, management, caching (21 files)
|
||||
│ ├── finance/ # Cost tracking, work time management (9 files)
|
||||
│ ├── plugins/ # Plugin system with base classes (7 files)
|
||||
│ ├── packaging/ # Asset packaging, MDZ variants (7 files)
|
||||
│ ├── production/ # Deployment validation, benchmarks (6 files)
|
||||
│ ├── legacy/ # Legacy compatibility layer (8 files)
|
||||
│ ├── explode_variants/ # Document expansion, variants (9 files)
|
||||
│ ├── query_paradigms/ # Query paradigm implementations (4 files)
|
||||
│ ├── validators/ # Content/link/section validation (4 files)
|
||||
│ ├── matter_frontmatter/ # Front matter parsing (4 files)
|
||||
│ ├── matter_contentmatter/ # Content matter parsing (4 files)
|
||||
│ ├── matter_tailmatter/ # Tail matter parsing (4 files)
|
||||
│ ├── profile/ # User profile management (4 files)
|
||||
│ ├── graphql/ # GraphQL query implementation (4 files)
|
||||
│ ├── template/ # Template management (3 files)
|
||||
│ ├── themes/ # Theme system with subdirectories (1 file)
|
||||
│ └── schemas/ # Built-in schema definitions (9 files)
|
||||
├── application/ # Application layer services
|
||||
├── domain/ # Domain models
|
||||
├── infrastructure/ # Infrastructure implementations
|
||||
├── tests/ # Test suite (90+ test files)
|
||||
│ ├── unit/ # Unit tests
|
||||
│ ├── integration/ # Integration tests
|
||||
│ ├── e2e/ # End-to-end tests
|
||||
│ └── fixtures/ # Test data
|
||||
├── docs/ # Documentation (12+ subdirectories)
|
||||
├── src/ # JavaScript/frontend components
|
||||
└── roadmap/ # Project roadmap
|
||||
```
|
||||
|
||||
## Root-Level Modules (/markitect/)
|
||||
|
||||
The 34 root-level Python files are organized by function:
|
||||
|
||||
### Core Infrastructure
|
||||
| File | Lines | Purpose |
|------|-------|---------|
| `parser.py` | ~50 | Markdown AST parsing using markdown-it |
| `serializer.py` | ~360 | AST serialization back to Markdown |
| `document_manager.py` | ~100 | Wrapper around CleanDocumentManager |
| `clean_document_manager.py` | ~2000 | Clean document management implementation |
| `workspace.py` | ~200 | Workspace management |
| `database.py` | ~400 | SQLite database management |
|
||||
|
||||
### Schema Management (6 files, 99KB total)
|
||||
| File | Lines | Purpose |
|------|-------|---------|
| `schema_generator.py` | ~600 | JSON schema generation from markdown AST |
| `schema_analyzer.py` | ~450 | Schema rigidity analysis with phase classification |
| `schema_loader.py` | ~600 | Schema loading from markdown with frontmatter |
| `schema_refiner.py` | ~600 | Automatic schema refinement using loosening rules |
| `schema_validator.py` | ~900 | Comprehensive schema validation |
| `schema_naming.py` | ~300 | Schema naming convention enforcement |
|
||||
|
||||
### Configuration & Services
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `config_manager.py` | Configuration file management |
|
||||
| `frontmatter.py` | YAML frontmatter parsing |
|
||||
| `exceptions.py` | Custom exception classes |
|
||||
| `ast_service.py` | AST service layer |
|
||||
| `cache_service.py` | Caching functionality |
|
||||
| `ast_cache.py` | AST caching implementation |
|
||||
| `performance_tracker.py` | Performance metrics |
|
||||
|
||||
### Validation & Analysis
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `semantic_validator.py` | Semantic validation layer |
|
||||
| `validation_error.py` | Validation error handling |
|
||||
| `metaschema.py` | Metaschema validation for custom extensions |
|
||||
|
||||
### CLI & Commands
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `cli.py` | Main CLI interface (274KB, comprehensive) |
|
||||
| `cli_utils.py` | CLI utilities |
|
||||
| `asset_commands.py` | Asset-related CLI commands |
|
||||
| `draft_generator.py` | Draft generation functionality |
|
||||
|
||||
### Utilities
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `batch_processor.py` | Batch processing operations |
|
||||
| `associated_files.py` | Associated file tracking |
|
||||
| `legacy_compat.py` | Legacy compatibility layer |
|
||||
| `legacy_integration_example.py` | Integration examples |
|
||||
| `_version.py`, `__version__.py` | Version management |
|
||||
|
||||
## Subpackages
|
||||
|
||||
### assets/ (21 files)
|
||||
Complete asset management system including discovery, analytics, caching, deduplication, and packaging. Key files:
|
||||
- `repository.py` - Asset repository pattern
|
||||
- `discovery.py` - Asset discovery algorithms
|
||||
- `cache.py` - Asset caching layer
|
||||
- `analytics.py` - Asset usage analytics
|
||||
|
||||
### finance/ (9 files)
|
||||
Cost tracking and work time management:
|
||||
- `models.py` - Financial data models
|
||||
- `cost_tracker.py` - Cost tracking implementation
|
||||
- `period_tracker.py` - Period-based tracking
|
||||
- `report_generator.py` - Financial reports
|
||||
|
||||
### plugins/ (7 files)
|
||||
Extensible plugin system:
|
||||
- `base.py` - Plugin base classes and types
|
||||
- `registry.py` - Plugin registry
|
||||
- `builtin/` - Built-in plugin implementations
|
||||
|
||||
### packaging/ (7 files)
|
||||
Asset packaging and MDZ format support:
|
||||
- `mdz_packager.py` - MDZ package creation
|
||||
- `transclusion.py` - Transclusion handling
|
||||
- `variant_factory.py` - Variant generation
|
||||
|
||||
### production/ (6 files)
|
||||
Deployment and production validation:
|
||||
- `deployment_validator.py` - Deployment checks
|
||||
- `performance_benchmark.py` - Performance testing
|
||||
- `cross_platform_validator.py` - Platform compatibility
|
||||
|
||||
### legacy/ (8 files)
|
||||
Backward compatibility layer:
|
||||
- `compatibility.py` - Compatibility wrappers
|
||||
- `deprecation.py` - Deprecation warnings
|
||||
- `git_tracker.py` - Git integration (useful for Phase 8)
|
||||
|
||||
## Test Structure
|
||||
|
||||
```
tests/
├── conftest.py              # Shared pytest configuration
├── fixtures/                # Test data files
│   ├── content_test_files/
│   ├── contentmatter_test_files/
│   ├── frontmatter_test_files/
│   └── tailmatter_test_files/
├── unit/                    # Unit tests by domain
│   ├── application/
│   └── infrastructure/
├── integration/             # Integration tests
│   └── repositories/
└── e2e/                     # End-to-end tests
    ├── cli/
    └── performance/
```
|
||||
|
||||
---
|
||||
|
||||
## Planned Reorganization
|
||||
|
||||
### Motivation
|
||||
|
||||
The current layout has grown organically, resulting in:
|
||||
1. **34 files at root level** - Too many modules at package root
|
||||
2. **No clear grouping** - Schema tools, core infrastructure, and utilities mixed
|
||||
3. **Hybrid architecture** - Mix of root packages and monolithic /markitect/
|
||||
|
||||
### Target Structure
|
||||
|
||||
After reorganization, the /markitect/ package will have clearer structure:
|
||||
|
||||
```
|
||||
markitect/
|
||||
├── core/ # Core infrastructure (NEW)
|
||||
│ ├── __init__.py
|
||||
│ ├── parser.py # (from markitect/)
|
||||
│ ├── serializer.py # (from markitect/)
|
||||
│ ├── document_manager.py # (from markitect/)
|
||||
│ └── workspace.py # (from markitect/)
|
||||
├── schema/ # Schema management (NEW)
|
||||
│ ├── __init__.py
|
||||
│ ├── validator.py # (from schema_validator.py)
|
||||
│ ├── generator.py # (from schema_generator.py)
|
||||
│ ├── loader.py # (from schema_loader.py)
|
||||
│ ├── analyzer.py # (from schema_analyzer.py)
|
||||
│ ├── refiner.py # (from schema_refiner.py)
|
||||
│ └── naming.py # (from schema_naming.py)
|
||||
├── storage/ # Storage concerns (NEW)
|
||||
│ ├── __init__.py
|
||||
│ ├── database.py # (from markitect/)
|
||||
│ └── cache.py # (consolidated)
|
||||
├── spaces/ # Information spaces (Phase 1+)
|
||||
│ ├── models.py
|
||||
│ ├── events/
|
||||
│ ├── repositories/
|
||||
│ ├── transclusion/
|
||||
│ ├── rendering/
|
||||
│ ├── sync/
|
||||
│ └── services/
|
||||
└── [existing subpackages] # assets/, plugins/, etc.
|
||||
```
|
||||
|
||||
### Backward Compatibility
|
||||
|
||||
Original import paths will continue to work through re-exports:
|
||||
|
||||
```python
# Old import (still works)
from markitect.parser import parse_markdown

# New import (preferred)
from markitect.core.parser import parse_markdown
```
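One way the old import path could keep working while emitting a deprecation warning is a module-level `__getattr__` (PEP 562). The sketch below is only an illustration: it builds two stand-in modules in memory rather than touching the real `markitect` package, and the module names `demo_core_parser` and `demo_legacy_parser`, as well as the body of `parse_markdown`, are assumptions.

```python
import sys
import types
import warnings

# Stand-in for the relocated module (plays the role of markitect.core.parser).
new_module = types.ModuleType("demo_core_parser")


def parse_markdown(text):
    """Placeholder parser (assumption): return the non-empty lines of the input."""
    return [line for line in text.splitlines() if line.strip()]


new_module.parse_markdown = parse_markdown
sys.modules[new_module.__name__] = new_module

# Stand-in for the original location (plays the role of markitect.parser).
legacy_module = types.ModuleType("demo_legacy_parser")


def _forward(name):
    """Module-level __getattr__: forward the lookup to the new module and warn."""
    try:
        value = getattr(new_module, name)
    except AttributeError:
        raise AttributeError(
            f"module {legacy_module.__name__!r} has no attribute {name!r}"
        ) from None
    warnings.warn(
        f"{legacy_module.__name__}.{name} has moved to {new_module.__name__}.{name}",
        DeprecationWarning,
        stacklevel=2,
    )
    return value


# Python calls a module's __getattr__ only when normal attribute lookup fails.
legacy_module.__getattr__ = _forward
sys.modules[legacy_module.__name__] = legacy_module
```

In a real package the same `__getattr__` function would simply live in the old module file. A plain `from new_location import *` re-export also preserves compatibility, but it cannot warn on use.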
|
||||
|
||||
### Migration Strategy
|
||||
|
||||
1. Create new subpackages with copied content
|
||||
2. Update internal imports to new paths
|
||||
3. Add deprecation warnings to old paths
|
||||
4. Re-export from original locations for compatibility
|
||||
5. Verify all tests pass
|
||||
6. Update documentation
|
||||
|
||||
---
|
||||
|
||||
## Information Space Service Architecture
|
||||
|
||||
The reorganization prepares for the Information Space Service evolution, which adds:
|
||||
|
||||
### Phase 1-3: Foundation
|
||||
- `InformationSpace` entity with lifecycle management
|
||||
- `SpaceRepository` for persistence
|
||||
- Event system for change tracking
|
||||
- Persistent transclusion context
|
||||
|
||||
### Phase 4-5: Modes
|
||||
- HTML rendering mode with caching
|
||||
- Directory mode with bidirectional sync
|
||||
|
||||
### Phase 6-7: API & Composability
|
||||
- GraphQL schema extensions
|
||||
- CLI commands for space management
|
||||
- Space references and inheritance
|
||||
|
||||
### Phase 8: Git History (Optional)
|
||||
- Git-based version control for spaces
|
||||
- Event-driven commits
|
||||
- Version navigation
|
||||
|
||||
See [docs/roadmap/information-space-service/](./roadmap/information-space-service/) for the complete workplan.
|
||||
|
||||
---
|
||||
|
||||
## Key Dependencies
|
||||
|
||||
From `pyproject.toml`:
|
||||
- Python >=3.8 (tested on 3.12)
|
||||
- markdown-it-py - Markdown parsing
|
||||
- PyYAML - YAML/frontmatter handling
|
||||
- click - CLI framework
|
||||
- tabulate - Table formatting
|
||||
- jsonpath-ng - JSON path queries
|
||||
- aiohttp - Async HTTP
|
||||
|
||||
## Version Information
|
||||
|
||||
- Current version is managed in `_version.py` and `__version__.py`
|
||||
- Follows semantic versioning
|
||||
- CHANGELOG.md tracks all changes
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [CLI Tutorial](CLI_TUTORIAL.md) - CLI usage guide
|
||||
- [Plugin System](PLUGIN_SYSTEM.md) - Plugin architecture
|
||||
- [Schema Management Guide](SCHEMA_MANAGEMENT_GUIDE.md) - Schema workflows
|
||||
- [Asset Management Guide](ASSET_MANAGEMENT_USER_GUIDE.md) - Asset system
|
||||
- [Error Handling Strategy](ERROR_HANDLING_STRATEGY.md) - Error patterns
|
||||
142
docs/roadmap/information-space-service/README.md
Normal file
@@ -0,0 +1,142 @@
|
||||
# Headless Information Space Service Evolution
|
||||
|
||||
## Vision
|
||||
|
||||
Evolve markitect into a headless markdown transclusion-based information space service that supports:
|
||||
|
||||
1. **HTML Rendering Mode** - Render markdown to HTML, track changes, update space
|
||||
2. **Directory Structure Mode** - Represent information as canonical directory with markdown files
|
||||
3. **Multiple Frontends** - Support different interaction modes via clean API layer
|
||||
|
||||
## What is an Information Space?
|
||||
|
||||
An Information Space is a first-class abstraction that:
|
||||
|
||||
- Contains a collection of documents with transclusion relationships
|
||||
- Maintains persistent context for variable resolution
|
||||
- Tracks document dependencies for cache invalidation
|
||||
- Can be rendered to HTML or exported to directory structure
|
||||
- Supports event-driven updates and subscriptions
|
||||
- Can reference other spaces (composability)
|
||||
|
||||
## Key Capabilities
|
||||
|
||||
| Phase | Capability | Description |
|-------|-----------|-------------|
| 1 | InformationSpace Entity | Space abstraction with identity, metadata, lifecycle |
| 2 | Event System | In-process pub/sub for space events |
| 3 | Persistent Transclusion | Store context state, track references |
| 4 | HTML Rendering | Render resolved markdown to HTML with caching |
| 5 | Directory Mode | Bidirectional sync with filesystem |
| 6 | API Layer | GraphQL, REST, CLI interfaces |
| 7 | Composability | Space references and inheritance |
| 8 | Git History | Optional git-based version control |
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ API Layer │
|
||||
│ GraphQL Schema │ REST Endpoints │ CLI Commands │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Service Layer │
|
||||
│ SpaceService │ RenderService │ SyncService │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Domain Layer │
|
||||
│ InformationSpace │ SpaceDocument │ SpaceEvent │ EventBus │
|
||||
│ PersistentTransclusionContext │ ReferenceGraph │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Storage Layer │
|
||||
│ SpaceRepository │ EventStore │ Cache Backend │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Existing Markitect │
|
||||
│ DatabaseManager │ TransclusionEngine │ VariantFactory │
|
||||
│ PluginRegistry │ QueryParadigms │ ASTService │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Integration Strategy
|
||||
|
||||
The Information Space Service builds on existing markitect infrastructure:
|
||||
|
||||
| Existing Component | Integration |
|-------------------|-------------|
| `TransclusionContext` | Extended with `PersistentTransclusionContext` |
| `VariantFactory` | Used for directory export/import |
| `PluginRegistry` | Add SPACE_RENDERER, SPACE_SYNC, EVENT_HANDLER types |
| `DatabaseManager` | Add space-related tables |
| `GraphQL Schema` | Extend with Space types and mutations |
|
||||
|
||||
## Project Status
|
||||
|
||||
- [ ] **Phase 0**: Project Organization (prerequisite cleanup)
|
||||
- [ ] **Phase 1**: Foundation (Space entity, repository)
|
||||
- [ ] **Phase 2**: Event System
|
||||
- [ ] **Phase 3**: Persistent Transclusion
|
||||
- [ ] **Phase 4**: HTML Rendering Mode
|
||||
- [ ] **Phase 5**: Directory Mode
|
||||
- [ ] **Phase 6**: API Layer
|
||||
- [ ] **Phase 7**: Composability
|
||||
- [ ] **Phase 8**: Git History (optional)
|
||||
|
||||
## Documentation
|
||||
|
||||
- [WORKPLAN.md](WORKPLAN.md) - Detailed implementation workplan
|
||||
- [PROJECT_STRUCTURE.md](../../PROJECT_STRUCTURE.md) - Current project structure
|
||||
|
||||
## Usage Example (Target State)
|
||||
|
||||
```python
import asyncio

from markitect.spaces import SpaceService, InformationSpace


async def main():
    # Create a space
    service = SpaceService()
    space = await service.create_space("my-docs", description="Documentation space")

    # Add documents
    await service.add_document(space, "/intro.md", content="# Introduction")
    await service.add_document(space, "/getting-started.md", content="# Getting Started")

    # Render to HTML
    html_output = await service.render(space, theme="default")

    # Export to directory
    await service.export_to_directory(space, "./output/")

    # Watch for changes
    async for event in service.subscribe(space):
        print(f"Change detected: {event.type} on {event.document_path}")


asyncio.run(main())
```
|
||||
|
||||
## CLI Commands (Target State)
|
||||
|
||||
```bash
# Space management
markitect space create my-space --description "My documentation"
markitect space list
markitect space show my-space

# Document management
markitect space add-doc my-space --path "/intro.md" --file ./intro.md
markitect space list-docs my-space

# Rendering
markitect space render my-space --output ./html/ --theme default

# Directory sync
markitect space sync my-space --directory ./my-space-dir/ --bidirectional

# History (Phase 8)
markitect space history log my-space
markitect space history diff my-space --rev HEAD~1
```
|
||||
|
||||
## Contributing
|
||||
|
||||
See the main project CONTRIBUTING.md for guidelines. For this initiative specifically:
|
||||
|
||||
1. Follow the phased implementation order
|
||||
2. Write tests before implementing features
|
||||
3. Update documentation as you go
|
||||
4. Use the event system for loose coupling
|
||||
5. Maintain backward compatibility
|
||||
599
docs/roadmap/information-space-service/WORKPLAN.md
Normal file
@@ -0,0 +1,599 @@
|
||||
# Headless Information Space Service - Implementation Workplan
|
||||
|
||||
## Overview
|
||||
|
||||
This workplan details the implementation phases for evolving markitect into a headless markdown transclusion-based information space service.
|
||||
|
||||
---
|
||||
|
||||
## Phase 0: Project Organization (Prerequisite)
|
||||
|
||||
### Current State Issues
|
||||
- **Hybrid layout** - Mix of root-level packages and monolithic `/markitect/`
|
||||
- **Flat root in markitect** - 34 .py files at `/markitect/` root level
|
||||
- **No structure documentation** - Missing PROJECT_STRUCTURE.md (now created)
|
||||
|
||||
### Reorganization Tasks
|
||||
|
||||
| ID | Task | Description | Status |
|----|------|-------------|--------|
| ORG-001 | Create PROJECT_STRUCTURE.md | Document current layout and rationale | Done |
| ORG-002 | Create `/markitect/core/` | Move parser, serializer, document_manager | Pending |
| ORG-003 | Create `/markitect/schema/` | Consolidate 6 schema_*.py files | Pending |
| ORG-004 | Create `/markitect/storage/` | Group database.py, cache modules | Pending |
| ORG-005 | Update imports | Fix all import statements after moves | Pending |
| ORG-006 | Verify tests | Ensure all tests pass after moves | Pending |
|
||||
|
||||
### Target Structure After Phase 0
|
||||
|
||||
```
|
||||
markitect/
|
||||
├── core/ # Core infrastructure
|
||||
│ ├── __init__.py
|
||||
│ ├── parser.py # (from markitect/)
|
||||
│ ├── serializer.py # (from markitect/)
|
||||
│ ├── document_manager.py # (from markitect/)
|
||||
│ └── workspace.py # (from markitect/)
|
||||
├── schema/ # Schema management
|
||||
│ ├── __init__.py
|
||||
│ ├── validator.py # (from schema_validator.py)
|
||||
│ ├── generator.py # (from schema_generator.py)
|
||||
│ ├── loader.py # (from schema_loader.py)
|
||||
│ ├── analyzer.py # (from schema_analyzer.py)
|
||||
│ ├── refiner.py # (from schema_refiner.py)
|
||||
│ └── naming.py # (from schema_naming.py)
|
||||
├── storage/ # Storage concerns
|
||||
│ ├── __init__.py
|
||||
│ └── database.py # (from markitect/)
|
||||
└── spaces/ # Information spaces (Phase 1+)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Foundation
|
||||
|
||||
### Capability Requirements
|
||||
|
||||
| ID | Capability | Description | Priority |
|----|-----------|-------------|----------|
| CAP-001 | InformationSpace Entity | First-class space abstraction with identity, metadata, lifecycle | Critical |
| CAP-002 | SpaceRepository | CRUD operations for spaces with SQLite backing | Critical |
| CAP-003 | Document-Space Association | Link documents to spaces with membership tracking | Critical |
| CAP-004 | Space Metadata Schema | Extensible metadata schema for space configuration | High |
| CAP-005 | Database Migrations | Schema evolution for space-related tables | High |
|
||||
|
||||
### Implementation Tasks
|
||||
|
||||
**Week 1: Core Models**
- Create `markitect/spaces/models.py`
  - `InformationSpace` dataclass with id, name, description, metadata, config
  - `SpaceDocument` dataclass for document membership
  - `SpaceConfig` dataclass for space settings
- Create `markitect/spaces/repositories/interfaces.py`
- Unit tests for models
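The Week 1 models could be sketched roughly as follows. Only the class names and the `id`, `name`, `description`, `metadata`, and `config` fields come from the plan; everything else here (the `default_theme` setting, the `SpaceStatus` values, UUID identifiers, the empty-name check) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict
from uuid import uuid4


class SpaceStatus(Enum):
    # Hypothetical lifecycle states; the plan names the enum but not its members.
    ACTIVE = "active"
    ARCHIVED = "archived"


@dataclass
class SpaceConfig:
    # Hypothetical setting, used only to give the config something to hold.
    default_theme: str = "default"


@dataclass
class InformationSpace:
    name: str
    description: str = ""
    id: str = field(default_factory=lambda: str(uuid4()))
    # default_factory gives every instance its own dict and config object.
    metadata: Dict[str, Any] = field(default_factory=dict)
    config: SpaceConfig = field(default_factory=SpaceConfig)
    status: SpaceStatus = SpaceStatus.ACTIVE
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("space name must not be empty")


@dataclass
class SpaceDocument:
    """Membership of one document in one space, addressed by its space path."""

    space_id: str
    document_id: str
    space_path: str
    order_index: int = 0
```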
|
||||
|
||||
**Week 2: Repository Implementation**
|
||||
- Create `markitect/spaces/repositories/sqlite.py`
|
||||
- Implement `ISpaceRepository` for SQLite
|
||||
- Implement `IDocumentAssociationRepository`
|
||||
- Database migration scripts
|
||||
- Repository unit tests
|
||||
|
||||
**Week 3: Basic SpaceService**
|
||||
- Create `markitect/spaces/services/space_service.py`
|
||||
- CRUD operations for spaces
|
||||
- Document add/remove operations
|
||||
- Integration tests
|
||||
|
||||
### Database Schema
|
||||
|
||||
```sql
CREATE TABLE spaces (
    id TEXT PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    description TEXT,
    metadata JSON,
    config JSON,
    parent_space_id TEXT REFERENCES spaces(id),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE space_documents (
    id TEXT PRIMARY KEY,
    space_id TEXT NOT NULL REFERENCES spaces(id),
    document_id TEXT NOT NULL,
    space_path TEXT NOT NULL,
    order_index INTEGER DEFAULT 0,
    metadata JSON,
    added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(space_id, space_path)
);
```
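The commit message mentions foreign key cascade deletes, while the schema above declares plain `REFERENCES` clauses. A minimal sketch of how cascading could behave in SQLite follows. It assumes an `ON DELETE CASCADE` clause is added to `space_documents.space_id`, uses a trimmed copy of the schema, and introduces a hypothetical `open_database` helper. SQLite leaves foreign key enforcement off by default on each connection, so the `PRAGMA` is required.

```python
import sqlite3

# Trimmed copy of the schema; ON DELETE CASCADE is an assumption, not in the plan.
SCHEMA = """
CREATE TABLE spaces (
    id TEXT PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE space_documents (
    id TEXT PRIMARY KEY,
    space_id TEXT NOT NULL REFERENCES spaces(id) ON DELETE CASCADE,
    document_id TEXT NOT NULL,
    space_path TEXT NOT NULL,
    UNIQUE(space_id, space_path)
);
"""


def open_database(path=":memory:"):
    """Open a connection with foreign key enforcement switched on."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # per-connection setting
    conn.executescript(SCHEMA)
    return conn


conn = open_database()
conn.execute("INSERT INTO spaces (id, name) VALUES (?, ?)", ("s1", "my-docs"))
conn.execute(
    "INSERT INTO space_documents (id, space_id, document_id, space_path) "
    "VALUES (?, ?, ?, ?)",
    ("a1", "s1", "d1", "/intro.md"),
)

# Deleting the space also removes its document associations.
conn.execute("DELETE FROM spaces WHERE id = ?", ("s1",))
remaining = conn.execute("SELECT COUNT(*) FROM space_documents").fetchone()[0]
```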
|
||||
|
||||
### Verification
|
||||
```bash
|
||||
pytest tests/unit/spaces/test_*_model.py
|
||||
pytest tests/unit/spaces/test_*_repository.py
|
||||
pytest tests/integration/spaces/test_space_service_integration.py
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Event System
|
||||
|
||||
### Capability Requirements
|
||||
|
||||
| ID | Capability | Description | Priority |
|----|-----------|-------------|----------|
| CAP-010 | SpaceEvent Base | Event dataclass with type, payload, timestamp | Critical |
| CAP-011 | Event Bus | In-process publish/subscribe for space events | Critical |
| CAP-012 | Event Handlers Registry | Register/unregister event handlers by type | High |
| CAP-013 | Change Detection | Detect document changes via content hash comparison | High |
| CAP-014 | Event Persistence | Store events for replay/audit | Medium |
|
||||
|
||||
### Implementation Tasks
|
||||
|
||||
**Week 4: Event Infrastructure**
- Create `markitect/spaces/events/models.py`
  - `SpaceEvent` dataclass with event_id, type, space_id, payload, timestamp
  - `SpaceEventType` enum (DOCUMENT_ADDED, DOCUMENT_UPDATED, DOCUMENT_REMOVED, etc.)
- Create `markitect/spaces/events/bus.py`
  - `EventBus` with sync/async handler support
  - Handler registration by event type
- Unit tests for event bus
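An in-process bus that accepts both sync and async handlers could look roughly like this. It is a sketch, not the planned `EventBus` API: the method names `subscribe`, `unsubscribe`, and `publish`, and the use of plain strings as event types, are assumptions.

```python
import asyncio
import inspect
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], Any]


class EventBus:
    """Dispatch payloads to handlers registered per event type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def unsubscribe(self, event_type: str, handler: Handler) -> None:
        handlers = self._handlers.get(event_type, [])
        if handler in handlers:
            handlers.remove(handler)

    async def publish(self, event_type: str, payload: Any) -> None:
        # Iterate over a copy so handlers may unsubscribe while being called.
        for handler in list(self._handlers.get(event_type, [])):
            result = handler(payload)
            # Async handlers return an awaitable; sync handlers have already run.
            if inspect.isawaitable(result):
                await result
```

Handlers run sequentially in registration order here, which keeps ordering predictable. Running async handlers concurrently with `asyncio.gather` would be an alternative design.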
|
||||
|
||||
**Week 5: Integration**
|
||||
- Wire events into SpaceService (emit on document operations)
|
||||
- Implement change detection (content hash comparison)
|
||||
- Optional: event persistence table
|
||||
- Integration tests for event flow
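Change detection by content hash comparison can be sketched as below. The `ChangeDetector` class and its in-memory hash table are hypothetical; a real implementation would presumably persist hashes alongside the document records. The returned strings reuse the `document.added` and `document.updated` event values from this plan.

```python
import hashlib
from typing import Dict, Optional


def content_hash(content: str) -> str:
    """Return the SHA-256 hex digest of the document text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class ChangeDetector:
    """Remember the last seen hash per document and classify each new version."""

    def __init__(self) -> None:
        self._hashes: Dict[str, str] = {}

    def check(self, document_id: str, content: str) -> Optional[str]:
        """Return 'document.added', 'document.updated', or None if unchanged."""
        new_hash = content_hash(content)
        old_hash = self._hashes.get(document_id)
        self._hashes[document_id] = new_hash
        if old_hash is None:
            return "document.added"
        if old_hash != new_hash:
            return "document.updated"
        return None
```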
|
||||
|
||||
### Event Types
|
||||
|
||||
```python
from enum import Enum


class SpaceEventType(Enum):
    SPACE_CREATED = "space.created"
    SPACE_UPDATED = "space.updated"
    SPACE_DELETED = "space.deleted"
    DOCUMENT_ADDED = "document.added"
    DOCUMENT_UPDATED = "document.updated"
    DOCUMENT_REMOVED = "document.removed"
    DOCUMENT_MOVED = "document.moved"
    VARIABLE_SET = "variable.set"
    RENDER_COMPLETED = "render.completed"
    SYNC_COMPLETED = "sync.completed"
```
|
||||
|
||||
### Verification
|
||||
```bash
|
||||
pytest tests/unit/spaces/test_event_bus.py
|
||||
pytest tests/integration/spaces/test_event_propagation.py
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Persistent Transclusion Context
|
||||
|
||||
### Capability Requirements
|
||||
|
||||
| ID | Capability | Description | Priority |
|----|-----------|-------------|----------|
| CAP-020 | Persistent TransclusionContext | Store context state in database | Critical |
| CAP-021 | Cross-Space References | Resolve transclusions across space boundaries | High |
| CAP-022 | Reference Graph | Track document dependencies for invalidation | High |
| CAP-023 | Variable Scope Layers | Space-level, document-level, request-level variables | Medium |
| CAP-024 | Transclusion Cache Invalidation | Invalidate rendered content on dependency change | High |
|
||||
|
||||
### Implementation Tasks
|
||||
|
||||
**Week 6: Persistent Context**
|
||||
- Create `markitect/spaces/transclusion/persistent_context.py`
|
||||
- Extend existing `TransclusionContext` with DB persistence
|
||||
- Space-scoped variable storage
|
||||
|
||||
**Week 7: Reference Graph**
|
||||
- Implement reference tracking during transclusion resolution
|
||||
- Cross-space reference resolution with space:// protocol
|
||||
- Variable scope layers (space → document → request)
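The layered variable lookup can be illustrated with `collections.ChainMap`. This sketch assumes that narrower scopes override broader ones (request over document over space), which the plan does not state explicitly, and `build_scope` is a hypothetical helper name.

```python
from collections import ChainMap
from typing import Any, Mapping


def build_scope(
    space_vars: Mapping[str, Any],
    document_vars: Mapping[str, Any],
    request_vars: Mapping[str, Any],
) -> ChainMap:
    """Combine the three layers; lookups try request, then document, then space."""
    # ChainMap searches its maps left to right and returns the first match.
    return ChainMap(dict(request_vars), dict(document_vars), dict(space_vars))


scope = build_scope(
    space_vars={"title": "Space title", "author": "docs team"},
    document_vars={"title": "Document title"},
    request_vars={"lang": "en"},
)
```

With these inputs, `scope["title"]` resolves from the document layer, `scope["author"]` falls through to the space layer, and `scope["lang"]` comes from the request.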
|
||||
|
||||
**Week 8: Cache Invalidation**
|
||||
- Wire change events to cache invalidation
|
||||
- Dependency-aware cache clearing
|
||||
- Integration tests
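Dependency-aware cache clearing amounts to finding every document that directly or transitively transcludes the changed one. A minimal sketch with a breadth-first walk over reverse edges follows. The `ReferenceGraph` API shown is an assumption; the `source_doc_id` and `target_doc_id` names mirror the `transclusion_references` columns in this plan.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Set


class ReferenceGraph:
    """Track which documents transclude which, for cache invalidation."""

    def __init__(self) -> None:
        # Maps a target document to the documents that transclude it.
        self._dependents: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_reference(self, source_doc_id: str, target_doc_id: str) -> None:
        """Record that source_doc_id transcludes target_doc_id."""
        self._dependents[target_doc_id].add(source_doc_id)

    def affected_by(self, changed_doc_id: str) -> Set[str]:
        """Return all documents whose rendered output depends on the change."""
        affected: Set[str] = set()
        queue = deque([changed_doc_id])
        while queue:
            current = queue.popleft()
            for dependent in self._dependents.get(current, ()):
                if dependent not in affected:
                    affected.add(dependent)
                    queue.append(dependent)
        # The changed document is handled by the caller, even inside a cycle.
        affected.discard(changed_doc_id)
        return affected
```

The `affected` set doubles as the visited set, so circular transclusions terminate.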
|
||||
|
||||
### Database Schema Additions
|
||||
|
||||
```sql
CREATE TABLE space_variables (
    space_id TEXT NOT NULL REFERENCES spaces(id),
    name TEXT NOT NULL,
    value JSON,
    scope TEXT DEFAULT 'space',
    PRIMARY KEY(space_id, name)
);

CREATE TABLE transclusion_references (
    source_doc_id TEXT NOT NULL,
    target_doc_id TEXT NOT NULL,
    space_id TEXT NOT NULL REFERENCES spaces(id),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY(source_doc_id, target_doc_id, space_id)
);
```
|
||||
|
||||
### Verification
|
||||
```bash
|
||||
pytest tests/unit/spaces/test_persistent_context.py
|
||||
pytest tests/unit/spaces/test_reference_graph.py
|
||||
pytest tests/integration/spaces/test_transclusion_persistence.py
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: HTML Rendering Mode
|
||||
|
||||
### Capability Requirements
|
||||
|
||||
| ID | Capability | Description | Priority |
|
||||
|----|-----------|-------------|----------|
|
||||
| CAP-030 | SpaceRenderer Base | Abstract renderer interface | Critical |
|
||||
| CAP-031 | MarkdownToHTMLRenderer | Render resolved markdown to HTML | Critical |
|
||||
| CAP-032 | Rendering Cache | Cache rendered output with invalidation | High |
|
||||
| CAP-033 | Theme Support | Apply themes to rendered HTML | Medium |
|
||||
| CAP-034 | Incremental Rendering | Re-render only changed documents | Medium |
|
||||
|
||||
### Implementation Tasks
|
||||
|
||||
**Week 9: Renderer Base**
|
||||
- Create `markitect/spaces/rendering/base.py` - SpaceRenderer ABC
|
||||
- Create `markitect/spaces/rendering/html_renderer.py` - MarkdownToHTMLRenderer
|
||||
- Integrate with existing `CleanDocumentManager`
|
||||
|
||||
**Week 10: Caching and Themes**
|
||||
- Implement render output caching (keyed by content hash)
|
||||
- Theme integration using existing theme system
|
||||
- Invalidation on dependency change via events
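A render cache keyed by content hash might be sketched as follows. The `RenderCache` class is hypothetical, and the renderer is passed in as a plain callable so the sketch does not depend on any particular rendering API. It assumes the hash is taken over the resolved markdown (after transclusion), so a change in a transcluded dependency produces a new key without explicit invalidation.

```python
import hashlib
from typing import Callable, Dict


class RenderCache:
    """Cache rendered HTML by a hash of the resolved markdown and the theme."""

    def __init__(self, render: Callable[[str, str], str]) -> None:
        self._render = render
        self._entries: Dict[str, str] = {}
        self.misses = 0  # counts how often the renderer actually ran

    @staticmethod
    def key(resolved_markdown: str, theme: str) -> str:
        digest = hashlib.sha256()
        digest.update(theme.encode("utf-8"))
        digest.update(b"\0")  # separator so theme and content cannot run together
        digest.update(resolved_markdown.encode("utf-8"))
        return digest.hexdigest()

    def get(self, resolved_markdown: str, theme: str = "default") -> str:
        cache_key = self.key(resolved_markdown, theme)
        if cache_key not in self._entries:
            self.misses += 1
            self._entries[cache_key] = self._render(resolved_markdown, theme)
        return self._entries[cache_key]
```

Old entries are never evicted in this sketch, so event-driven invalidation would still be needed to bound memory use.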
|
||||
|
||||
**Week 11: Incremental Rendering**
|
||||
- Re-render only affected documents on change
|
||||
- Rendering events emission
|
||||
- E2E tests for render workflow
|
||||
|
||||
### Verification
|
||||
```bash
|
||||
pytest tests/e2e/spaces/test_html_rendering_workflow.py
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: Directory Mode
|
||||
|
||||
### Capability Requirements
|
||||
|
||||
| ID | Capability | Description | Priority |
|
||||
|----|-----------|-------------|----------|
|
||||
| CAP-040 | SpaceToDirectory Exporter | Export space to canonical directory structure | Critical |
|
||||
| CAP-041 | DirectoryToSpace Importer | Import directory structure as space | Critical |
|
||||
| CAP-042 | Bidirectional Sync | Detect and sync changes both directions | High |
|
||||
| CAP-043 | Filesystem Watcher | Watch directory for external changes | Medium |
|
||||
| CAP-044 | Conflict Resolution | Handle conflicts in bidirectional sync | Medium |
|
||||
|
||||
### Implementation Tasks
|
||||
|
||||
**Week 12: Export**
|
||||
- Create `markitect/spaces/sync/directory_exporter.py`
|
||||
- Integrate with existing `VariantFactory`
|
||||
- Support flat/hierarchical/semantic variants
|
||||
|
||||
**Week 13: Import and Sync**
|
||||
- Create `markitect/spaces/sync/directory_importer.py`
|
||||
- Create `markitect/spaces/sync/bidirectional.py`
|
||||
- Conflict detection (modification time, content hash)
|
||||
|
||||
**Week 14: Filesystem Watcher**
|
||||
- Implement watcher using `watchdog` library
|
||||
- Sync events emission
|
||||
- E2E tests for bidirectional sync
|
||||
|
||||
### Canonical Directory Structure
|
||||
|
||||
```
.markitect/spaces/{space-name}/
├── .space.yaml              # Space metadata and config
├── documents/               # Document files
│   ├── intro.md
│   ├── getting-started.md
│   └── advanced/
│       └── topics.md
└── assets/                  # Associated assets
```
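Mapping a document's space path onto this layout could be done along these lines. The `export_path` helper is hypothetical, and rejecting `..` segments is an added safety assumption so that an export cannot write outside the space's `documents/` directory.

```python
from pathlib import Path, PurePosixPath


def export_path(root: Path, space_name: str, space_path: str) -> Path:
    """Return the file location for a space path such as '/advanced/topics.md'."""
    relative = PurePosixPath(space_path.lstrip("/"))
    if not relative.parts or ".." in relative.parts:
        raise ValueError(f"invalid space path: {space_path!r}")
    documents_dir = root / ".markitect" / "spaces" / space_name / "documents"
    return documents_dir.joinpath(*relative.parts)


target = export_path(Path("workspace"), "my-docs", "/advanced/topics.md")
```

Here `target` points at `workspace/.markitect/spaces/my-docs/documents/advanced/topics.md`.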
|
||||
|
||||
### Verification
|
||||
```bash
|
||||
pytest tests/e2e/spaces/test_directory_mode_workflow.py
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: API Layer

### Capability Requirements

| ID | Capability | Description | Priority |
|----|-----------|-------------|----------|
| CAP-060 | SpaceService | Service layer orchestrating space operations | Critical |
| CAP-061 | GraphQL Space Schema | Extend existing GraphQL with space types | High |
| CAP-062 | REST Endpoints | Alternative REST API for spaces | Medium |
| CAP-063 | WebSocket Subscriptions | Real-time event subscriptions | Medium |
| CAP-064 | CLI Space Commands | CLI commands for space management | High |

### Implementation Tasks

**Week 15: GraphQL Extension**
- Extend `markitect/graphql/schema.py` with Space types
- Add mutations: createSpace, updateSpace, deleteSpace
- Add queries: space, spaces, spaceDocuments
- Add subscriptions: onSpaceEvent

**Week 16: CLI Commands**
- Add to `markitect/cli.py`:
  - `markitect space create/list/show/delete`
  - `markitect space add-doc/remove-doc/list-docs`
  - `markitect space render`
  - `markitect space sync`

**Week 17: WebSocket and Polish**
- WebSocket subscriptions for real-time events
- Documentation updates
- Final integration testing

### GraphQL Schema Extensions

```graphql
type InformationSpace {
  id: ID!
  name: String!
  description: String
  documents: [SpaceDocument!]!
  config: SpaceConfig!
  parentSpace: InformationSpace
  createdAt: DateTime!
  updatedAt: DateTime!
}

type SpaceDocument {
  id: ID!
  spacePath: String!
  content: String!
  metadata: JSON
}

type Mutation {
  createSpace(input: CreateSpaceInput!): InformationSpace!
  addDocument(spaceId: ID!, input: AddDocumentInput!): SpaceDocument!
  renderSpace(spaceId: ID!, options: RenderOptions): RenderResult!
}

type Subscription {
  onSpaceEvent(spaceId: ID!): SpaceEvent!
}
```

### Verification

```bash
pytest tests/integration/spaces/
pytest tests/e2e/spaces/
markitect space --help  # Verify CLI
```

---

## Phase 7: Composability

### Capability Requirements

| ID | Capability | Description | Priority |
|----|-----------|-------------|----------|
| CAP-050 | Space References | Spaces can reference other spaces | High |
| CAP-051 | Space Inheritance | Child spaces inherit parent context | Medium |
| CAP-053 | Space Access Control | Basic permission model for space access | Medium |

### Implementation Tasks

**Week 18-19: Space References**
- Space-to-space references via `space://` protocol
- Variable inheritance from parent spaces
- Basic access control (read/write/admin)

**Week 20: Final Integration**
- Complete E2E test suite
- Performance testing
- User documentation

### Space Reference Protocol

```markdown
<!-- Reference document from another space -->
{{transclude space://other-space/path/to/doc.md}}

<!-- Reference with variable override -->
{{transclude space://shared-components/header.md | title="My Page"}}
```

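One possible way to split a `space://` target into its space name and document path is sketched below, using the standard library URL parser. `SpaceReference` and `parse_space_reference` are hypothetical names, and the sketch deliberately ignores the `| title="My Page"` override syntax, whose grammar the workplan does not define.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SpaceReference:
    """Hypothetical parsed form of a space:// transclusion target."""
    space: str
    path: str


def parse_space_reference(target: str) -> SpaceReference:
    """Split 'space://<space-name>/<path>' into space name and document path."""
    parts = urlsplit(target)
    if parts.scheme != "space":
        raise ValueError(f"Not a space reference: {target!r}")
    path = parts.path.lstrip("/")
    if not parts.netloc or not path:
        raise ValueError(f"Space reference needs a space name and a path: {target!r}")
    return SpaceReference(space=parts.netloc, path=path)
```

Treating the space name as the URL authority keeps the document path identical to the `spacePath` used inside the target space.
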
---

## Phase 8: Git History Tracking (Optional)

### Capability Requirements

| ID | Capability | Description | Priority |
|----|-----------|-------------|----------|
| CAP-070 | History Configuration | Per-space history tracking configuration | High |
| CAP-071 | HistoryBackend Interface | Abstract interface for history backends | High |
| CAP-072 | GitHistoryBackend | Git implementation of history backend | High |
| CAP-073 | Canonical Directory Binding | Bind space to canonical directory for git | High |
| CAP-074 | Event-Driven Commits | Commit on document change events | Medium |
| CAP-075 | History Query API | Query commits, diffs, branches | High |
| CAP-076 | History CLI Commands | CLI for log, diff, restore, checkout | High |
| CAP-077 | Versioned Read/Render | Read/render documents at specific versions | Medium |

### Implementation Tasks

**Week 21: History Infrastructure**
- Create `markitect/spaces/history/interfaces.py` - IHistoryBackend ABC
- Create `markitect/spaces/history/models.py` - Commit, HistoryEntry dataclasses
- Add `SpaceConfig` fields: history_enabled, history_backend, history_options
- Add `SPACE_SYNC` to PluginType enum

**Week 22: Git Backend**
- Create `markitect/spaces/history/git_backend.py`
- Leverage existing `legacy/git_tracker.py` patterns
- Create event handlers for auto-commit on document changes
- Integration tests

**Week 23: API and CLI**
- History query service for log, diff, branches
- CLI commands: `markitect space history log/diff/restore/checkout`
- Extend read/render with `--version` option
- E2E tests

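The `IHistoryBackend` ABC and `Commit` dataclass named in Week 21 might take a shape like the following. The method names, fields, and the `InMemoryHistoryBackend` stand-in are all assumptions made for illustration; the workplan only names the interface and the dataclass, not their members.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """Assumed minimal commit record; the real model may carry more fields."""
    id: str
    message: str
    timestamp: datetime


class IHistoryBackend(ABC):
    """Assumed shape of the abstract history backend."""

    @abstractmethod
    def commit(self, message: str) -> Commit:
        """Record the current state of the space and return the new commit."""

    @abstractmethod
    def log(self, limit: Optional[int] = None) -> List[Commit]:
        """Return commits, newest first, optionally capped at `limit`."""


class InMemoryHistoryBackend(IHistoryBackend):
    """Test double that keeps commits in a list instead of a git repository."""

    def __init__(self) -> None:
        self._commits: List[Commit] = []

    def commit(self, message: str) -> Commit:
        entry = Commit(
            id=f"c{len(self._commits) + 1}",
            message=message,
            timestamp=datetime.now(timezone.utc),
        )
        self._commits.append(entry)
        return entry

    def log(self, limit: Optional[int] = None) -> List[Commit]:
        newest_first = list(reversed(self._commits))
        return newest_first if limit is None else newest_first[:limit]
```

An in-memory implementation of this kind would let the history query service and CLI be tested without a git checkout, with the git-backed class substituted through the `history_backend` setting.
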
### Integration Diagram

```
Document Update Flow (with history enabled):

User updates document
         │
         ▼
┌───────────────────┐
│ SpaceService      │
│ update_document() │
└────────┬──────────┘
         │ emit event
         ▼
┌───────────────────┐     ┌──────────────────────┐
│ Event Bus         │────▶│ GitHistoryHandler    │
│ DOCUMENT_UPDATED  │     │ (subscribed)         │
└───────────────────┘     └──────────┬───────────┘
                                     │
                          ┌──────────▼───────────┐
                          │ DirectorySyncService │
                          └──────────┬───────────┘
                                     │ writes to
                          ┌──────────▼───────────┐
                          │ Canonical Directory  │
                          │ .markitect/spaces/X/ │
                          └──────────┬───────────┘
                                     │
                          ┌──────────▼───────────┐
                          │ GitHistoryBackend    │
                          │ git add && git commit│
                          └──────────────────────┘
```

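The flow in the diagram can be approximated with a small synchronous publish/subscribe sketch. Everything here is illustrative: the `EventBus` API, the event payload keys, and the constructor of `GitHistoryHandler` are assumptions, and the export and commit steps are passed in as plain callables rather than the real `DirectorySyncService` and `GitHistoryBackend`.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """Minimal synchronous publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, event: Event) -> None:
        # Copy the list so handlers may subscribe while we iterate
        for handler in list(self._handlers.get(event_type, [])):
            handler(event)


class GitHistoryHandler:
    """Exports the changed space to its canonical directory, then commits."""

    def __init__(self, export: Callable[[str], None], commit: Callable[[str], None]) -> None:
        self._export = export
        self._commit = commit

    def __call__(self, event: Event) -> None:
        self._export(event["space"])
        self._commit(f"Update {event['path']}")


# Record the order of operations instead of touching the filesystem or git
steps: List[str] = []
bus = EventBus()
bus.subscribe(
    "DOCUMENT_UPDATED",
    GitHistoryHandler(
        export=lambda space: steps.append(f"export {space}"),
        commit=lambda message: steps.append(f"commit: {message}"),
    ),
)
bus.emit("DOCUMENT_UPDATED", {"space": "my-space", "path": "doc.md"})
```

Keeping the export step ahead of the commit mirrors the diagram: the canonical directory has to reflect the update before `git add && git commit` runs.
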
### Verification

```bash
markitect space create my-space --history-enabled
markitect space add-doc my-space --content "# V1"
markitect space update-doc my-space/doc.md --content "# V2"
markitect space history log my-space
markitect space history diff my-space --rev HEAD~1
```

---

## Timeline Summary

| Phase | Focus | Duration |
|-------|-------|----------|
| 0 | Project Organization | 1 week |
| 1 | Foundation | 3 weeks |
| 2 | Event System | 2 weeks |
| 3 | Persistent Transclusion | 3 weeks |
| 4 | HTML Rendering Mode | 3 weeks |
| 5 | Directory Mode | 3 weeks |
| 6 | API Layer | 3 weeks |
| 7 | Composability | 3 weeks |
| 8 | Git History (Optional) | 3 weeks |

**Total: 21-24 weeks** (5-6 months)

### Parallel Work Opportunities
- Phase 4 (HTML) and Phase 5 (Directory) can run in parallel after Phase 3
- Phase 8 can start in parallel with Phase 7
- Documentation can be written incrementally
- CLI commands can start in parallel with Phase 4/5

---

## Files to Create

### Phase 0
```
docs/PROJECT_STRUCTURE.md              # Done
roadmap/information-space-service/     # Done
├── README.md                          # Done
└── WORKPLAN.md                        # This file
markitect/core/                        # To do
markitect/schema/                      # To do
markitect/storage/                     # To do
```

### Phase 1+
```
markitect/spaces/
├── __init__.py
├── models.py
├── events/
│   ├── __init__.py
│   ├── models.py
│   └── bus.py
├── repositories/
│   ├── __init__.py
│   ├── interfaces.py
│   └── sqlite.py
├── transclusion/
│   ├── __init__.py
│   └── persistent_context.py
├── rendering/
│   ├── __init__.py
│   ├── base.py
│   └── html_renderer.py
├── sync/
│   ├── __init__.py
│   ├── directory_exporter.py
│   ├── directory_importer.py
│   └── bidirectional.py
├── history/                    # Phase 8
│   ├── __init__.py
│   ├── interfaces.py
│   ├── models.py
│   ├── git_backend.py
│   ├── events.py
│   └── queries.py
└── services/
    ├── __init__.py
    └── space_service.py
```

### Test Files
```
tests/unit/spaces/
tests/integration/spaces/
tests/e2e/spaces/
tests/fixtures/spaces.py
```

---

## Success Criteria

1. Phase 0 complete: project reorganized with docs/PROJECT_STRUCTURE.md
2. All phases complete with passing tests
3. HTML rendering mode fully functional
4. Directory mode with bidirectional sync working
5. GraphQL API exposing all space operations
6. CLI commands operational
7. Events propagating correctly
8. Cross-space transclusion resolving
50
markitect/core/__init__.py
Normal file
@@ -0,0 +1,50 @@
"""
Core infrastructure modules for MarkiTect.

This package contains the fundamental building blocks:
- Parser: Markdown to AST conversion
- Serializer: AST to Markdown serialization
- DocumentManager: Document ingestion and management
- Workspace: Workspace and project management
"""

from .parser import parse_markdown_to_ast
from .serializer import ASTSerializer
from .document_manager import DocumentManager, CleanDocumentManager
from .workspace import (
    WorkspaceManager,
    WorkspaceTemplate,
    TemplateMetadata,
    TemplateResult,
    WorkspaceCreationResult,
    ProjectResult,
    SyncResult,
    BackupResult,
    RestoreResult,
    WorkspaceState,
    ConflictInfo,
    MergeResult,
)

__all__ = [
    # Parser
    "parse_markdown_to_ast",
    # Serializer
    "ASTSerializer",
    # Document Manager
    "DocumentManager",
    "CleanDocumentManager",
    # Workspace
    "WorkspaceManager",
    "WorkspaceTemplate",
    "TemplateMetadata",
    "TemplateResult",
    "WorkspaceCreationResult",
    "ProjectResult",
    "SyncResult",
    "BackupResult",
    "RestoreResult",
    "WorkspaceState",
    "ConflictInfo",
    "MergeResult",
]
98
markitect/core/document_manager.py
Normal file
@@ -0,0 +1,98 @@
"""
Document manager - Clean implementation.

This module provides the DocumentManager class which is now a wrapper around
the CleanDocumentManager for backward compatibility.
"""

from markitect.clean_document_manager import CleanDocumentManager
from .parser import parse_markdown_to_ast
from markitect.frontmatter import FrontMatterParser


class DocumentManager(CleanDocumentManager):
    """
    Document manager for backward compatibility.

    This class extends CleanDocumentManager to maintain compatibility
    with existing code while using the clean implementation.
    """

    def __init__(self, db_manager=None):
        super().__init__(db_manager)

    def ingest_file(self, file_path: str):
        """
        Ingest a markdown file for processing.

        This method provides compatibility for tests expecting the ingest_file interface.
        """
        import time
        import json
        from pathlib import Path

        file_path = Path(file_path)
        if not file_path.exists():
            raise FileNotFoundError(f"File not found: {file_path}")

        # Read file content
        content = file_path.read_text(encoding='utf-8')

        # Extract front matter
        start_time = time.time()
        parser = FrontMatterParser()
        front_matter_data, content_without_front_matter = parser.parse(content)

        # Parse to AST
        ast = parse_markdown_to_ast(content)
        parse_time = time.time() - start_time

        # Extract title - first try front matter, then the first H1 heading;
        # otherwise keep the "Unknown" default
        title = "Unknown"
        if front_matter_data and 'title' in front_matter_data:
            title = front_matter_data['title']
        elif isinstance(ast, list):
            # Look for first H1 heading in AST tokens
            for idx, token in enumerate(ast):
                if token.get('type') == 'heading_open' and token.get('tag') == 'h1':
                    # The heading text lives in the inline token that follows
                    next_idx = idx + 1
                    if next_idx < len(ast) and ast[next_idx].get('type') == 'inline':
                        title = ast[next_idx].get('content', 'Unknown')
                        break

        # Create actual cache file for compatibility
        cache_dir = Path(file_path.parent) / '.ast_cache'
        cache_dir.mkdir(exist_ok=True)
        cache_file = cache_dir / f"{file_path.stem}_ast.json"

        # Write AST to cache file
        with open(cache_file, 'w', encoding='utf-8') as f:
            json.dump(ast, f, indent=2)

        # Store document in database if db_manager exists
        if hasattr(self, 'db_manager') and self.db_manager:
            try:
                # Store using the clean document manager's method
                self.store_document(str(file_path), content, ast, front_matter_data)
            except Exception:
                # If storage fails, continue without error for test compatibility
                pass

        return {
            'ast': ast,
            'content': content,
            'metadata': {
                'filename': file_path.name,
                'title': title,
                'size': len(content),
                'path': str(file_path)
            },
            'ast_cache_path': cache_file,
            'parse_time': parse_time,
            'cache_time': 0  # Mock cache time for compatibility
        }


# For backward compatibility, also export the clean document manager directly
__all__ = ['DocumentManager', 'CleanDocumentManager']
47
markitect/core/parser.py
Normal file
@@ -0,0 +1,47 @@
"""
Markdown AST Parser.

This module provides functionality to parse markdown content into an
Abstract Syntax Tree (AST) using the markdown-it library.
"""

from markdown_it import MarkdownIt


def parse_markdown_to_ast(md_content: str):
    """
    Parse markdown content into a JSON-serializable AST.

    Args:
        md_content: Markdown text to parse

    Returns:
        List of token dictionaries representing the AST

    Example:
        ast = parse_markdown_to_ast("# Hello\\n\\nWorld")
    """
    # Enable table parsing and other common plugins
    md = MarkdownIt("commonmark", {"tables": True}).enable(['table'])
    tokens = md.parse(md_content)

    # Convert to a JSON-serializable list of dicts
    def token_to_dict(token):
        d = {
            'type': token.type,
            'tag': token.tag,
            'attrs': token.attrs,
            'map': token.map,
            'nesting': token.nesting,
            'level': token.level,
            'children': [token_to_dict(child) if child else None for child in token.children] if token.children else None,
            'content': token.content,
            'markup': token.markup,
            'info': token.info,
            'meta': token.meta,
            'block': token.block,
            'hidden': token.hidden
        }
        return {k: v for k, v in d.items() if v is not None}  # Remove None values

    return [token_to_dict(token) for token in tokens]
359
markitect/core/serializer.py
Normal file
@@ -0,0 +1,359 @@
|
||||
"""
|
||||
AST to Markdown Serialization - Issue #2 Completion
|
||||
|
||||
This module provides functionality to serialize markdown-it AST tokens back into
|
||||
markdown format, enabling roundtrip validation and document manipulation.
|
||||
|
||||
Key Features:
|
||||
- Convert AST tokens back to markdown text
|
||||
- Preserve front matter during serialization
|
||||
- Support for content manipulation operations
|
||||
- Roundtrip integrity validation
|
||||
"""
|
||||
|
||||
from typing import List, Dict, Any, Optional
|
||||
import yaml
|
||||
|
||||
|
||||
class ASTSerializer:
|
||||
"""
|
||||
Serializes markdown-it AST tokens back to markdown format.
|
||||
|
||||
Provides roundtrip capability: markdown -> AST -> markdown
|
||||
Supports front matter preservation and content manipulation.
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the AST serializer."""
|
||||
pass
|
||||
|
||||
def serialize_to_markdown(self, ast: List[Dict[str, Any]], front_matter: Optional[Dict[str, Any]] = None) -> str:
|
||||
"""
|
||||
Convert AST tokens back to markdown format.
|
||||
|
||||
Args:
|
||||
ast: List of markdown-it AST tokens
|
||||
front_matter: Optional YAML front matter dictionary
|
||||
|
||||
Returns:
|
||||
Markdown text with optional front matter
|
||||
|
||||
Example:
|
||||
serializer = ASTSerializer()
|
||||
markdown = serializer.serialize_to_markdown(ast, front_matter)
|
||||
"""
|
||||
markdown_parts = []
|
||||
|
||||
# Add front matter if present
|
||||
if front_matter and isinstance(front_matter, dict) and front_matter:
|
||||
yaml_content = yaml.dump(front_matter, default_flow_style=False).strip()
|
||||
markdown_parts.append(f"---\n{yaml_content}\n---\n\n")
|
||||
|
||||
# Process AST tokens
|
||||
markdown_content = self._process_tokens(ast)
|
||||
markdown_parts.append(markdown_content)
|
||||
|
||||
return ''.join(markdown_parts)
|
||||
|
||||
def _process_tokens(self, tokens: List[Dict[str, Any]]) -> str:
|
||||
"""
|
||||
Process a list of AST tokens into markdown text.
|
||||
|
||||
Args:
|
||||
tokens: List of markdown-it tokens
|
||||
|
||||
Returns:
|
||||
Markdown text representation
|
||||
"""
|
||||
markdown_lines = []
|
||||
current_line = ""
|
||||
list_level = 0
|
||||
|
||||
for token in tokens:
|
||||
token_type = token.get('type', '')
|
||||
content = token.get('content', '')
|
||||
markup = token.get('markup', '')
|
||||
tag = token.get('tag', '')
|
||||
nesting = token.get('nesting', 0)
|
||||
level = token.get('level', 0)
|
||||
|
||||
# Handle different token types
|
||||
if token_type == 'heading_open':
|
||||
heading_level = int(tag[1]) if tag.startswith('h') else 1
|
||||
current_line = '#' * heading_level + ' '
|
||||
elif token_type == 'heading_close':
|
||||
if current_line:
|
||||
markdown_lines.append(current_line.rstrip())
|
||||
current_line = ""
|
||||
markdown_lines.append("") # Empty line after heading
|
||||
|
||||
elif token_type == 'paragraph_open':
|
||||
pass # Start of paragraph
|
||||
elif token_type == 'paragraph_close':
|
||||
if current_line:
|
||||
markdown_lines.append(current_line.rstrip())
|
||||
current_line = ""
|
||||
markdown_lines.append("") # Empty line after paragraph
|
||||
|
||||
elif token_type == 'inline':
|
||||
# Process inline content and children
|
||||
if content:
|
||||
current_line += content
|
||||
elif 'children' in token:
|
||||
current_line += self._process_inline_children(token['children'])
|
||||
|
||||
elif token_type == 'list_item_open':
|
||||
# Handle list items
|
||||
indent = ' ' * (level // 2)
|
||||
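                if markup in ('-', '*', '+'):
                    current_line = indent + '- '
                elif markup in ('.', ')') or markup.isdigit():
                    # markdown-it stores the delimiter ('.' or ')') in `markup`
                    # for ordered items, not the item number
                    current_line = indent + '1. '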
|
||||
elif token_type == 'list_item_close':
|
||||
if current_line:
|
||||
markdown_lines.append(current_line.rstrip())
|
||||
current_line = ""
|
||||
|
||||
elif token_type in ('bullet_list_open', 'ordered_list_open'):
|
||||
list_level += 1
|
||||
elif token_type in ('bullet_list_close', 'ordered_list_close'):
|
||||
list_level -= 1
|
||||
if list_level == 0:
|
||||
markdown_lines.append("") # Empty line after list
|
||||
|
||||
elif token_type == 'blockquote_open':
|
||||
pass
|
||||
elif token_type == 'blockquote_close':
|
||||
markdown_lines.append("")
|
||||
|
||||
elif token_type == 'code_block':
|
||||
markdown_lines.append(f"```{token.get('info', '')}")
|
||||
markdown_lines.append(content.rstrip())
|
||||
markdown_lines.append("```")
|
||||
markdown_lines.append("")
|
||||
|
||||
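            elif token_type == 'fence':
                # markdown-it emits a single fence token (nesting 0): `info`
                # holds the language tag and `content` holds the code body
                fence_markup = markup or '```'
                markdown_lines.append(f"{fence_markup}{token.get('info', '')}")
                markdown_lines.append(content.rstrip())
                markdown_lines.append(fence_markup)
                markdown_lines.append("")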
|
||||
|
||||
elif token_type == 'hr':
|
||||
markdown_lines.append("---")
|
||||
markdown_lines.append("")
|
||||
|
||||
elif token_type == 'text':
|
||||
current_line += content
|
||||
|
||||
# Add any remaining content
|
||||
if current_line:
|
||||
markdown_lines.append(current_line.rstrip())
|
||||
|
||||
# Clean up extra empty lines at the end
|
||||
while markdown_lines and markdown_lines[-1] == "":
|
||||
markdown_lines.pop()
|
||||
|
||||
return '\n'.join(markdown_lines)
|
||||
|
||||
def _process_inline_children(self, children: List[Dict[str, Any]]) -> str:
|
||||
"""
|
||||
Process inline children tokens (emphasis, strong, links, etc.).
|
||||
|
||||
Args:
|
||||
children: List of inline token children
|
||||
|
||||
Returns:
|
||||
Processed inline markdown text
|
||||
"""
|
||||
        result = ""
        link_href = ""  # href of the link currently being emitted

        for child in children:
            token_type = child.get('type', '')
            content = child.get('content', '')
            markup = child.get('markup', '')

            if token_type == 'text':
                result += content
            elif token_type == 'code_inline':
                result += f"`{content}`"
            elif token_type == 'em_open':
                result += markup or '*'
            elif token_type == 'em_close':
                result += markup or '*'
            elif token_type == 'strong_open':
                result += markup or '**'
            elif token_type == 'strong_close':
                result += markup or '**'
            elif token_type == 'link_open':
                # Remember the href so link_close can emit it; attrs may be a
                # dict or a list of [name, value] pairs depending on the source
                attrs = child.get('attrs') or {}
                if isinstance(attrs, dict):
                    link_href = str(attrs.get('href', ''))
                else:
                    link_href = next((a[1] for a in attrs if a[0] == 'href'), '')
                result += "["
            elif token_type == 'link_close':
                result += f"]({link_href})"
                link_href = ""
|
||||
elif token_type == 'softbreak':
|
||||
result += '\n'
|
||||
elif token_type == 'hardbreak':
|
||||
result += ' \n'
|
||||
|
||||
return result
|
||||
|
||||
def modify_ast_content(self, ast: List[Dict[str, Any]], modifications: Dict[str, Any]) -> List[Dict[str, Any]]:
|
||||
"""
|
||||
Modify AST content based on provided modifications.
|
||||
|
||||
Args:
|
||||
ast: Original AST tokens
|
||||
modifications: Dictionary of modifications to apply
|
||||
|
||||
Returns:
|
||||
Modified AST tokens
|
||||
|
||||
Supported modifications:
|
||||
- add_section: Add a new section with title and content
|
||||
- update_front_matter: Update front matter values
|
||||
"""
|
||||
modified_ast = ast.copy()
|
||||
|
||||
# Handle adding sections
|
||||
if 'add_section' in modifications:
|
||||
section_data = modifications['add_section']
|
||||
title = section_data.get('title', 'New Section')
|
||||
content = section_data.get('content', '')
|
||||
level = section_data.get('level', 2)
|
||||
|
||||
# Create new section tokens
|
||||
new_tokens = [
|
||||
{
|
||||
"type": "heading_open",
|
||||
"tag": f"h{level}",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 1,
|
||||
"level": 0,
|
||||
"content": "",
|
||||
"markup": "#" * level,
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
},
|
||||
{
|
||||
"type": "inline",
|
||||
"tag": "",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 0,
|
||||
"level": 1,
|
||||
"children": [
|
||||
{
|
||||
"type": "text",
|
||||
"tag": "",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 0,
|
||||
"level": 0,
|
||||
"content": title,
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": False,
|
||||
"hidden": False
|
||||
}
|
||||
],
|
||||
"content": title,
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
},
|
||||
{
|
||||
"type": "heading_close",
|
||||
"tag": f"h{level}",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": -1,
|
||||
"level": 0,
|
||||
"content": "",
|
||||
"markup": "#" * level,
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
}
|
||||
]
|
||||
|
||||
if content:
|
||||
new_tokens.extend([
|
||||
{
|
||||
"type": "paragraph_open",
|
||||
"tag": "p",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 1,
|
||||
"level": 0,
|
||||
"content": "",
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
},
|
||||
{
|
||||
"type": "inline",
|
||||
"tag": "",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 0,
|
||||
"level": 1,
|
||||
"children": [
|
||||
{
|
||||
"type": "text",
|
||||
"tag": "",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 0,
|
||||
"level": 0,
|
||||
"content": content,
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": False,
|
||||
"hidden": False
|
||||
}
|
||||
],
|
||||
"content": content,
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
},
|
||||
{
|
||||
"type": "paragraph_close",
|
||||
"tag": "p",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": -1,
|
||||
"level": 0,
|
||||
"content": "",
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
}
|
||||
])
|
||||
|
||||
# Add to end of AST
|
||||
modified_ast.extend(new_tokens)
|
||||
|
||||
return modified_ast
|
||||
475
markitect/core/workspace.py
Normal file
@@ -0,0 +1,475 @@
|
||||
"""
|
||||
Workspace management functionality for Issue #144.
|
||||
|
||||
This module provides workspace templates, multi-project support, and
|
||||
collaborative workspace features.
|
||||
"""
|
||||
|
||||
import json
|
||||
import yaml
|
||||
import shutil
|
||||
import zipfile
|
||||
import hashlib
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List, Optional
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
|
||||
from markitect.assets import AssetManager
|
||||
|
||||
|
||||
@dataclass
|
||||
class TemplateMetadata:
|
||||
"""Metadata for workspace templates."""
|
||||
name: str
|
||||
description: str
|
||||
version: str
|
||||
created_at: datetime
|
||||
asset_count: int
|
||||
author: str = "Unknown"
|
||||
tags: List[str] = field(default_factory=list)
|
||||
|
||||
|
||||
@dataclass
|
||||
class TemplateResult:
|
||||
"""Result of template creation."""
|
||||
success: bool
|
||||
template_path: Path
|
||||
template_name: str
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class WorkspaceCreationResult:
|
||||
"""Result of workspace creation from template."""
|
||||
success: bool
|
||||
workspace_path: Path
|
||||
project_name: str
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class ProjectResult:
|
||||
"""Result of project operations."""
|
||||
success: bool
|
||||
project_path: Path
|
||||
project_name: str
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class SyncResult:
|
||||
"""Result of workspace synchronization."""
|
||||
synchronized_count: int
|
||||
skipped_count: int
|
||||
error_count: int
|
||||
errors: List[Exception] = field(default_factory=list)
|
||||
|
||||
|
||||
@dataclass
|
||||
class BackupResult:
|
||||
"""Result of workspace backup."""
|
||||
success: bool
|
||||
backup_path: Path
|
||||
backup_size: int
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class RestoreResult:
|
||||
"""Result of workspace restore."""
|
||||
success: bool
|
||||
restored_path: Path
|
||||
files_restored: int
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class WorkspaceState:
|
||||
"""Snapshot of workspace state."""
|
||||
timestamp: datetime
|
||||
file_checksums: Dict[str, str]
|
||||
directory_structure: List[str]
|
||||
asset_hashes: List[str]
|
||||
|
||||
|
||||
@dataclass
|
||||
class ConflictInfo:
|
||||
"""Information about a workspace conflict."""
|
||||
file_path: Path
|
||||
conflict_type: str
|
||||
local_timestamp: datetime
|
||||
remote_timestamp: datetime
|
||||
|
||||
|
||||
@dataclass
|
||||
class MergeResult:
|
||||
"""Result of conflict resolution."""
|
||||
resolved_conflicts: int
|
||||
unresolved_conflicts: int
|
||||
merge_strategy: str
|
||||
|
||||
|
||||
class WorkspaceTemplate:
|
||||
"""Workspace template management."""
|
||||
|
||||
def __init__(self, template_path: Path):
|
||||
"""Initialize workspace template."""
|
||||
self.template_path = template_path
|
||||
self.metadata_file = template_path / "template.json"
|
||||
|
||||
def get_metadata(self) -> TemplateMetadata:
|
||||
"""Get template metadata."""
|
||||
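        if self.metadata_file.exists():
            metadata_dict = json.loads(self.metadata_file.read_text())
            # created_at is written as an ISO 8601 string; restore the datetime
            if isinstance(metadata_dict.get("created_at"), str):
                metadata_dict["created_at"] = datetime.fromisoformat(metadata_dict["created_at"])
            return TemplateMetadata(**metadata_dict)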
|
||||
else:
|
||||
return TemplateMetadata(
|
||||
name="Unknown",
|
||||
description="No description",
|
||||
version="1.0.0",
|
||||
created_at=datetime.now(),
|
||||
asset_count=0
|
||||
)
|
||||
|
||||
|
||||
class WorkspaceManager:
|
||||
"""Workspace management system."""
|
||||
|
||||
def __init__(self, templates_dir: Optional[Path] = None):
|
||||
"""Initialize workspace manager."""
|
||||
self.templates_dir = templates_dir or Path.home() / ".markitect" / "templates"
|
||||
self.templates_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
def create_template(self, name: str, source_path: Path, description: str = "",
|
||||
include_assets: bool = True, configuration: Optional[Dict] = None) -> TemplateResult:
|
||||
"""Create a workspace template from existing workspace."""
|
||||
try:
|
||||
template_path = self.templates_dir / name
|
||||
template_path.mkdir(exist_ok=True)
|
||||
|
||||
# Copy workspace structure
|
||||
self._copy_workspace_structure(source_path, template_path, include_assets)
|
||||
|
||||
# Count assets
|
||||
asset_count = 0
|
||||
if include_assets and (source_path / "assets").exists():
|
||||
asset_count = len(list((source_path / "assets").rglob("*")))
|
||||
|
||||
# Create template metadata
|
||||
metadata = {
|
||||
"name": name,
|
||||
"description": description,
|
||||
"version": "1.0.0",
|
||||
"created_at": datetime.now().isoformat(),
|
||||
"asset_count": asset_count,
|
||||
"author": "Unknown",
|
||||
"tags": []
|
||||
}
|
||||
|
||||
metadata_file = template_path / "template.json"
|
||||
metadata_file.write_text(json.dumps(metadata, indent=2))
|
||||
|
||||
# Save configuration if provided
|
||||
if configuration:
|
||||
config_file = template_path / "markitect.yaml"
|
||||
config_file.write_text(yaml.dump(configuration, indent=2))
|
||||
|
||||
return TemplateResult(
|
||||
success=True,
|
||||
template_path=template_path,
|
||||
template_name=name
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return TemplateResult(
|
||||
success=False,
|
||||
template_path=Path(),
|
||||
template_name=name,
|
||||
error=e
|
||||
)
|
||||
|
||||
def get_template_metadata(self, template_name: str) -> TemplateMetadata:
|
||||
"""Get metadata for a specific template."""
|
||||
template_path = self.templates_dir / template_name
|
||||
template = WorkspaceTemplate(template_path)
|
||||
return template.get_metadata()
|
||||
|
||||
+    def create_workspace_from_template(self, template_name: str, target_path: Path,
+                                       project_name: str) -> WorkspaceCreationResult:
+        """Create a new workspace from a template."""
+        try:
+            template_path = self.templates_dir / template_name
+
+            if not template_path.exists():
+                raise FileNotFoundError(f"Template '{template_name}' not found")
+
+            # Create target directory
+            target_path.mkdir(parents=True, exist_ok=True)
+
+            # Copy template contents
+            self._copy_workspace_structure(template_path, target_path, include_assets=True)
+
+            # Update project-specific files
+            self._customize_workspace(target_path, project_name)
+
+            return WorkspaceCreationResult(
+                success=True,
+                workspace_path=target_path,
+                project_name=project_name
+            )
+
+        except Exception as e:
+            return WorkspaceCreationResult(
+                success=False,
+                workspace_path=target_path,
+                project_name=project_name,
+                error=e
+            )
+
+    def initialize_multi_project_workspace(self, workspace_root: Path):
+        """Initialize a multi-project workspace."""
+        workspace_root.mkdir(parents=True, exist_ok=True)
+
+        # Create shared directories
+        (workspace_root / "shared_assets").mkdir(exist_ok=True)
+        (workspace_root / "templates").mkdir(exist_ok=True)
+        (workspace_root / "config").mkdir(exist_ok=True)
+
+        # Create workspace configuration
+        config = {
+            "workspace_type": "multi_project",
+            "shared_assets_enabled": True,
+            "project_isolation": True,
+            "created_at": datetime.now().isoformat()
+        }
+
+        config_file = workspace_root / "workspace.yaml"
+        config_file.write_text(yaml.dump(config, indent=2))
+
+    def add_project(self, workspace_root: Path, project_name: str,
+                    template: Optional[str] = None) -> ProjectResult:
+        """Add a project to multi-project workspace."""
+        try:
+            project_path = workspace_root / project_name
+            project_path.mkdir(exist_ok=True)
+
+            if template:
+                # Use template if specified
+                result = self.create_workspace_from_template(template, project_path, project_name)
+                if not result.success:
+                    raise result.error or Exception("Template creation failed")
+            else:
+                # Create basic project structure
+                (project_path / "docs").mkdir(exist_ok=True)
+                (project_path / "assets").mkdir(exist_ok=True)
+
+            return ProjectResult(
+                success=True,
+                project_path=project_path,
+                project_name=project_name
+            )
+
+        except Exception as e:
+            return ProjectResult(
+                success=False,
+                project_path=workspace_root / project_name,
+                project_name=project_name,
+                error=e
+            )
+
+    def get_shared_asset_library(self, workspace_root: Path) -> Optional[AssetManager]:
+        """Get shared asset library for multi-project workspace."""
+        shared_assets_path = workspace_root / "shared_assets"
+        if shared_assets_path.exists():
+            return AssetManager(storage_path=shared_assets_path)
+        return None
+
+    def initialize_workspace(self, workspace_path: Path):
+        """Initialize a single workspace."""
+        workspace_path.mkdir(parents=True, exist_ok=True)
+        (workspace_path / "assets").mkdir(exist_ok=True)
+        (workspace_path / "docs").mkdir(exist_ok=True)
+
+    def synchronize_assets(self, source_workspace: Path, target_workspace: Path,
+                           sync_mode: str = "incremental") -> SyncResult:
+        """Synchronize assets between workspaces."""
+        result = SyncResult(
+            synchronized_count=0,
+            skipped_count=0,
+            error_count=0
+        )
+
+        try:
+            source_assets = source_workspace / "assets"
+            target_assets = target_workspace / "assets"
+
+            if not source_assets.exists():
+                return result
+
+            target_assets.mkdir(exist_ok=True)
+
+            # Simple synchronization (copy new files)
+            for asset_file in source_assets.rglob("*"):
+                if asset_file.is_file():
+                    relative_path = asset_file.relative_to(source_assets)
+                    target_file = target_assets / relative_path
+
+                    if not target_file.exists() or sync_mode == "overwrite":
+                        target_file.parent.mkdir(parents=True, exist_ok=True)
+                        shutil.copy2(asset_file, target_file)
+                        result.synchronized_count += 1
+                    else:
+                        result.skipped_count += 1
+
+        except Exception as e:
+            result.error_count += 1
+            result.errors.append(e)
+
+        return result
+
+    def create_backup(self, workspace_path: Path, backup_path: Path,
+                      include_assets: bool = True, compression_level: int = 6) -> BackupResult:
+        """Create a backup of workspace."""
+        try:
+            with zipfile.ZipFile(backup_path, 'w', zipfile.ZIP_DEFLATED, compresslevel=compression_level) as backup_zip:
+                for file_path in workspace_path.rglob("*"):
+                    if file_path.is_file():
+                        # Skip assets if not included
+                        if not include_assets and "assets" in file_path.parts:
+                            continue
+
+                        arc_name = file_path.relative_to(workspace_path)
+                        backup_zip.write(file_path, arc_name)
+
+            backup_size = backup_path.stat().st_size
+
+            return BackupResult(
+                success=True,
+                backup_path=backup_path,
+                backup_size=backup_size
+            )
+
+        except Exception as e:
+            return BackupResult(
+                success=False,
+                backup_path=backup_path,
+                backup_size=0,
+                error=e
+            )
+
+    def restore_from_backup(self, backup_path: Path, target_path: Path) -> RestoreResult:
+        """Restore workspace from backup."""
+        try:
+            target_path.mkdir(parents=True, exist_ok=True)
+
+            files_restored = 0
+            with zipfile.ZipFile(backup_path, 'r') as backup_zip:
+                backup_zip.extractall(target_path)
+                files_restored = len(backup_zip.namelist())
+
+            return RestoreResult(
+                success=True,
+                restored_path=target_path,
+                files_restored=files_restored
+            )
+
+        except Exception as e:
+            return RestoreResult(
+                success=False,
+                restored_path=target_path,
+                files_restored=0,
+                error=e
+            )
+
+    def capture_workspace_state(self, workspace_path: Path) -> WorkspaceState:
+        """Capture current state of workspace."""
+        file_checksums = {}
+        directory_structure = []
+        asset_hashes = []
+
+        for item_path in workspace_path.rglob("*"):
+            relative_path = str(item_path.relative_to(workspace_path))
+
+            if item_path.is_file():
+                # Calculate file checksum
+                content = item_path.read_bytes()
+                checksum = hashlib.md5(content).hexdigest()
+                file_checksums[relative_path] = checksum
+
+                # Track asset hashes
+                if "assets" in item_path.parts:
+                    asset_hashes.append(checksum)
+
+            directory_structure.append(relative_path)
+
+        return WorkspaceState(
+            timestamp=datetime.now(),
+            file_checksums=file_checksums,
+            directory_structure=directory_structure,
+            asset_hashes=asset_hashes
+        )
+
+    def detect_conflicts(self, state1: WorkspaceState, state2: WorkspaceState) -> List[ConflictInfo]:
+        """Detect conflicts between workspace states."""
+        conflicts = []
+
+        # Find files that exist in both states but have different checksums
+        for file_path, checksum1 in state1.file_checksums.items():
+            if file_path in state2.file_checksums:
+                checksum2 = state2.file_checksums[file_path]
+                if checksum1 != checksum2:
+                    conflict = ConflictInfo(
+                        file_path=Path(file_path),
+                        conflict_type="content_conflict",
+                        local_timestamp=state1.timestamp,
+                        remote_timestamp=state2.timestamp
+                    )
+                    conflicts.append(conflict)
+
+        return conflicts
+
+    def resolve_conflicts(self, conflicts: List[ConflictInfo],
+                          resolution_strategy: str = "manual") -> MergeResult:
+        """Resolve workspace conflicts."""
+        # Mock conflict resolution
+        result = MergeResult(
+            resolved_conflicts=len(conflicts),
+            unresolved_conflicts=0,
+            merge_strategy=resolution_strategy
+        )
+
+        return result
+
+    def _copy_workspace_structure(self, source: Path, target: Path, include_assets: bool):
+        """Copy workspace structure from source to target."""
+        for item in source.rglob("*"):
+            if item.is_file():
+                relative_path = item.relative_to(source)
+
+                # Skip assets if not included
+                if not include_assets and "assets" in relative_path.parts:
+                    continue
+
+                # Skip template metadata
+                if item.name == "template.json":
+                    continue
+
+                target_path = target / relative_path
+                target_path.parent.mkdir(parents=True, exist_ok=True)
+                shutil.copy2(item, target_path)
+
+    def _customize_workspace(self, workspace_path: Path, project_name: str):
+        """Customize workspace for specific project."""
+        # Update any configuration files with project name
+        config_files = list(workspace_path.glob("*.yaml")) + list(workspace_path.glob("*.yml"))
+
+        for config_file in config_files:
+            try:
+                content = config_file.read_text()
+                # Replace placeholder project names
+                content = content.replace("{{PROJECT_NAME}}", project_name)
+                content = content.replace("New Project", project_name)
+                config_file.write_text(content)
+            except Exception:
+                pass  # Ignore errors in customization
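Review note on `capture_workspace_state` and `detect_conflicts` above: the comparison reduces to diffing two path-to-digest maps. A minimal standalone sketch of that idea follows, assuming nothing about the real `WorkspaceState` class. The helper names `snapshot`, `changed_in_both`, and `demo` are hypothetical, and SHA-256 is swapped in for the MD5 used in the diff.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file's POSIX-style relative path to a SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_in_both(before: dict, after: dict) -> list:
    """Return paths present in both snapshots whose digests differ."""
    return sorted(
        path for path, digest in before.items()
        if path in after and after[path] != digest
    )


def demo() -> list:
    """Build a throwaway workspace, edit one file, and report the conflict."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs").mkdir()
        (root / "docs" / "a.md").write_text("one")
        (root / "docs" / "b.md").write_text("two")
        before = snapshot(root)

        (root / "docs" / "a.md").write_text("one, edited")  # modified in place
        (root / "docs" / "c.md").write_text("three")        # new file, not a conflict
        after = snapshot(root)

        return changed_in_both(before, after)
```

Files that exist in only one snapshot are ignored here, which mirrors the behaviour of `detect_conflicts` in the diff; additions and deletions would need separate handling.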
@@ -1,444 +1,11 @@
 """
-Database management functionality for MarkiTect.
+Database management - Backward Compatibility Module.
 
-This module provides SQLite database initialization, markdown file storage
-with front matter support, and JSON schema storage (Issue #3).
+This module re-exports from markitect.storage.database for backward compatibility.
+New code should import from markitect.storage.database directly.
 """
 
-import sqlite3
-import json
-import os
-from datetime import datetime
-from pathlib import Path
-from typing import Optional, Dict, Any
+# Re-export from storage package for backward compatibility
+from markitect.storage.database import DatabaseManager
 
-from .frontmatter import FrontMatterParser
-
-
-class DatabaseManager:
-    """Manager for SQLite database operations."""
-
-    def __init__(self, db_path: str):
-        """
-        Initialize database manager.
-
-        Args:
-            db_path: Path to SQLite database file
-        """
-        self.db_path = db_path
-        self.front_matter_parser = FrontMatterParser()
-
-    def initialize_database(self) -> None:
-        """
-        Initialize SQLite database with required tables.
-
-        Creates the markdown_files table with the following schema:
-        - id: INTEGER PRIMARY KEY
-        - filename: TEXT NOT NULL
-        - front_matter: TEXT (JSON)
-        - content: TEXT
-        - created_at: TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-
-        Also initializes finance schema if finance module is available.
-        """
-        # Ensure directory exists
-        db_dir = os.path.dirname(self.db_path)
-        if db_dir and not os.path.exists(db_dir):
-            os.makedirs(db_dir)
-
-        conn = sqlite3.connect(self.db_path)
-        cursor = conn.cursor()
-
-        # Create markdown_files table
-        cursor.execute('''
-            CREATE TABLE IF NOT EXISTS markdown_files (
-                id INTEGER PRIMARY KEY AUTOINCREMENT,
-                filename TEXT NOT NULL,
-                front_matter TEXT,
-                content TEXT,
-                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-            )
-        ''')
-
-        # Create schemas table for Issue #3
-        cursor.execute('''
-            CREATE TABLE IF NOT EXISTS schemas (
-                id INTEGER PRIMARY KEY AUTOINCREMENT,
-                filename TEXT NOT NULL UNIQUE,
-                title TEXT,
-                description TEXT,
-                schema_content TEXT NOT NULL,
-                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-            )
-        ''')
-
-        conn.commit()
-        conn.close()
-
-        # Initialize finance schema if available
-        self.initialize_finance_schema()
-
-    def initialize_finance_schema(self) -> None:
-        """
-        Initialize finance schema for cost tracking (Issue #88).
-
-        This method is called automatically during database initialization
-        to set up cost tracking tables if the finance module is available.
-        """
-        try:
-            from .finance.models import FinanceModels
-            finance_models = FinanceModels(self.db_path)
-            finance_models.initialize_finance_schema()
-        except ImportError:
-            # Finance module not available, skip initialization
-            pass
-        except Exception as e:
-            # Silently ignore finance schema initialization errors for CLI compatibility
-            pass
-
-    def store_markdown_file(self, filename: str, content: str) -> Optional[int]:
-        """
-        Store a markdown file in the database.
-
-        Args:
-            filename: Name of the markdown file
-            content: Raw markdown content with optional front matter
-
-        Returns:
-            ID of the inserted record, or None if insertion failed
-        """
-        # Parse front matter and content
-        front_matter, markdown_content = self.front_matter_parser.parse(content)
-
-        # Convert front matter to JSON string
-        front_matter_json = json.dumps(front_matter) if front_matter else '{}'
-
-        conn = sqlite3.connect(self.db_path)
-        cursor = conn.cursor()
-
-        try:
-            cursor.execute('''
-                INSERT INTO markdown_files (filename, front_matter, content, created_at)
-                VALUES (?, ?, ?, ?)
-            ''', (filename, front_matter_json, markdown_content, datetime.now().isoformat()))
-
-            record_id = cursor.lastrowid
-            conn.commit()
-            return record_id
-
-        except sqlite3.Error:
-            conn.rollback()
-            return None
-
-        finally:
-            conn.close()
-
-    def get_markdown_file(self, filename: str) -> Optional[Dict[str, Any]]:
-        """
-        Retrieve a markdown file from the database.
-
-        Args:
-            filename: Name of the markdown file to retrieve
-
-        Returns:
-            Dictionary containing file data, or None if not found
-        """
-        conn = sqlite3.connect(self.db_path)
-        cursor = conn.cursor()
-
-        cursor.execute('''
-            SELECT id, filename, front_matter, content, created_at
-            FROM markdown_files
-            WHERE filename = ?
-        ''', (filename,))
-
-        row = cursor.fetchone()
-        conn.close()
-
-        if row:
-            return {
-                'id': row[0],
-                'filename': row[1],
-                'front_matter': json.loads(row[2]) if row[2] else {},
-                'content': row[3],
-                'created_at': row[4]
-            }
-
-        return None
-
-    def list_markdown_files(self) -> list:
-        """
-        List all markdown files in the database.
-
-        Returns:
-            List of dictionaries containing file metadata
-        """
-        conn = sqlite3.connect(self.db_path)
-        cursor = conn.cursor()
-
-        cursor.execute('''
-            SELECT id, filename, front_matter, created_at
-            FROM markdown_files
-            ORDER BY created_at DESC
-        ''')
-
-        rows = cursor.fetchall()
-        conn.close()
-
-        files = []
-        for row in rows:
-            files.append({
-                'id': row[0],
-                'filename': row[1],
-                'front_matter': json.loads(row[2]) if row[2] else {},
-                'created_at': row[3]
-            })
-
-        return files
-
-    def execute_query(self, sql: str) -> list:
-        """
-        Execute a read-only SQL query against the database.
-
-        Args:
-            sql: SQL query string (SELECT operations only)
-
-        Returns:
-            List of dictionaries representing query results
-
-        Raises:
-            ValueError: If query contains non-SELECT operations
-            sqlite3.Error: If query execution fails
-        """
-        # Security check: only allow SELECT queries
-        sql_upper = sql.strip().upper()
-        if not sql_upper.startswith('SELECT'):
-            allowed_starts = ['SELECT', 'WITH']  # Allow WITH for CTEs
-            if not any(sql_upper.startswith(start) for start in allowed_starts):
-                raise ValueError("Only SELECT and WITH queries are allowed for safety")
-
-        # Additional safety checks for dangerous keywords (as whole words)
-        dangerous_keywords = [
-            'DROP', 'DELETE', 'UPDATE', 'INSERT', 'CREATE', 'ALTER',
-            'TRUNCATE', 'REPLACE', 'PRAGMA'
-        ]
-        import re
-        for keyword in dangerous_keywords:
-            # Use word boundaries to match only complete words
-            pattern = r'\b' + keyword + r'\b'
-            if re.search(pattern, sql_upper):
-                raise ValueError(f"Query contains dangerous keyword: {keyword}")
-
-        conn = sqlite3.connect(self.db_path)
-        conn.row_factory = sqlite3.Row  # Enable column access by name
-        cursor = conn.cursor()
-
-        try:
-            cursor.execute(sql)
-            rows = cursor.fetchall()
-
-            # Convert rows to dictionaries
-            results = []
-            for row in rows:
-                results.append(dict(row))
-
-            conn.close()
-            return results
-
-        except sqlite3.Error as e:
-            conn.close()
-            raise e
-
-    def get_schema(self) -> dict:
-        """
-        Get database schema information.
-
-        Returns:
-            Dictionary containing table schemas with column information
-        """
-        conn = sqlite3.connect(self.db_path)
-        cursor = conn.cursor()
-
-        schema = {}
-
-        try:
-            # Get all table names
-            cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
-            tables = cursor.fetchall()
-
-            for table_row in tables:
-                table_name = table_row[0]
-
-                # Get column information for each table
-                cursor.execute(f"PRAGMA table_info({table_name})")
-                columns = cursor.fetchall()
-
-                column_info = []
-                for col in columns:
-                    column_info.append({
-                        'name': col[1],
-                        'type': col[2],
-                        'nullable': not bool(col[3]),  # notnull flag
-                        'default_value': col[4],
-                        'primary_key': bool(col[5])
-                    })
-
-                schema[table_name] = {
-                    'columns': column_info
-                }
-
-            conn.close()
-            return schema
-
-        except sqlite3.Error as e:
-            conn.close()
-            raise e
-
-    # Schema management methods for Issue #3
-    def store_schema_file(self, filename: str, schema_content: str) -> Optional[int]:
-        """
-        Store a JSON schema file in the database.
-
-        Args:
-            filename: Name of the schema file
-            schema_content: JSON schema content as string
-
-        Returns:
-            ID of the inserted/updated record, or None if operation failed
-        """
-        try:
-            # Parse and validate JSON schema
-            schema_data = json.loads(schema_content)
-            title = schema_data.get('title', filename)
-            description = schema_data.get('description', '')
-        except json.JSONDecodeError:
-            return None
-
-        conn = sqlite3.connect(self.db_path)
-        cursor = conn.cursor()
-
-        try:
-            # Check if schema already exists
-            cursor.execute('SELECT id FROM schemas WHERE filename = ?', (filename,))
-            existing = cursor.fetchone()
-
-            if existing:
-                # Update existing schema
-                cursor.execute('''
-                    UPDATE schemas
-                    SET title = ?, description = ?, schema_content = ?, updated_at = ?
-                    WHERE filename = ?
-                ''', (title, description, schema_content, datetime.now().isoformat(), filename))
-                record_id = existing[0]
-            else:
-                # Insert new schema
-                cursor.execute('''
-                    INSERT INTO schemas (filename, title, description, schema_content, created_at, updated_at)
-                    VALUES (?, ?, ?, ?, ?, ?)
-                ''', (filename, title, description, schema_content,
-                      datetime.now().isoformat(), datetime.now().isoformat()))
-                record_id = cursor.lastrowid
-
-            conn.commit()
-            return record_id
-
-        except sqlite3.Error:
-            conn.rollback()
-            return None
-
-        finally:
-            conn.close()
-
-    def get_schema_file(self, filename: str) -> Optional[Dict[str, Any]]:
-        """
-        Retrieve a schema file from the database.
-
-        Args:
-            filename: Name of the schema file to retrieve
-
-        Returns:
-            Dictionary containing schema data, or None if not found
-        """
-        conn = sqlite3.connect(self.db_path)
-        cursor = conn.cursor()
-
-        cursor.execute('''
-            SELECT id, filename, title, description, schema_content, created_at, updated_at
-            FROM schemas
-            WHERE filename = ?
-        ''', (filename,))
-
-        row = cursor.fetchone()
-        conn.close()
-
-        if row:
-            return {
-                'id': row[0],
-                'filename': row[1],
-                'title': row[2],
-                'description': row[3],
-                'schema_content': row[4],
-                'created_at': row[5],
-                'updated_at': row[6]
-            }
-
-        return None
-
-    def list_schema_files(self) -> list:
-        """
-        List all schema files in the database.
-
-        Returns:
-            List of dictionaries containing schema metadata
-        """
-        conn = sqlite3.connect(self.db_path)
-        cursor = conn.cursor()
-
-        cursor.execute('''
-            SELECT id, filename, title, description, created_at, updated_at
-            FROM schemas
-            ORDER BY updated_at DESC
-        ''')
-
-        rows = cursor.fetchall()
-        conn.close()
-
-        schemas = []
-        for row in rows:
-            schemas.append({
-                'id': row[0],
-                'filename': row[1],
-                'title': row[2],
-                'description': row[3],
-                'created_at': row[4],
-                'updated_at': row[5]
-            })
-
-        return schemas
-
-    def delete_schema_file(self, filename: str) -> bool:
-        """
-        Delete a schema file from the database.
-
-        Args:
-            filename: Name of the schema file to delete
-
-        Returns:
-            True if deletion was successful, False otherwise
-        """
-        conn = sqlite3.connect(self.db_path)
-        cursor = conn.cursor()
-
-        try:
-            cursor.execute('DELETE FROM schemas WHERE filename = ?', (filename,))
-            success = cursor.rowcount > 0
-            conn.commit()
-            return success
-
-        except sqlite3.Error:
-            conn.rollback()
-            return False
-
-        finally:
-            conn.close()
+__all__ = ['DatabaseManager']
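Review note on the re-export pattern used for the database module above: a shim module that only imports and re-exports a name hands callers the very same class object as the new location, so `isinstance` checks keep working across both import paths. The sketch below demonstrates this with throwaway in-memory modules. The module names `demo_storage_database` and `demo_database` and the function `shim_preserves_identity` are hypothetical stand-ins, not part of MarkiTect.

```python
import sys
import types


def shim_preserves_identity() -> bool:
    """Register a fake canonical module and a re-export shim, then compare imports."""
    # Stand-in for the new canonical location of the class.
    canonical = types.ModuleType("demo_storage_database")

    class DatabaseManager:
        """Placeholder for the real implementation."""

    canonical.DatabaseManager = DatabaseManager
    sys.modules[canonical.__name__] = canonical

    # Stand-in for the legacy module: it contains only the re-export.
    legacy = types.ModuleType("demo_database")
    shim_source = (
        "from demo_storage_database import DatabaseManager\n"
        "__all__ = ['DatabaseManager']\n"
    )
    exec(shim_source, legacy.__dict__)
    sys.modules[legacy.__name__] = legacy

    # Both import paths should resolve to one and the same class object.
    from demo_database import DatabaseManager as via_legacy
    from demo_storage_database import DatabaseManager as via_canonical
    return via_legacy is via_canonical
```

Because the shim holds a reference rather than a copy, any subclass or monkeypatch applied through one path is visible through the other.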
@@ -1,98 +1,11 @@
 """
-Document manager - Clean implementation.
+Document manager - Backward Compatibility Module.
 
-This module provides the DocumentManager class which is now a wrapper around
-the CleanDocumentManager for backward compatibility.
+This module re-exports from markitect.core.document_manager for backward compatibility.
+New code should import from markitect.core.document_manager directly.
 """
 
-from .clean_document_manager import CleanDocumentManager
+# Re-export from core for backward compatibility
+from markitect.core.document_manager import DocumentManager, CleanDocumentManager
 
-
-class DocumentManager(CleanDocumentManager):
-    """
-    Document manager for backward compatibility.
-
-    This class extends CleanDocumentManager to maintain compatibility
-    with existing code while using the clean implementation.
-    """
-
-    def __init__(self, db_manager=None):
-        super().__init__(db_manager)
-
-    def ingest_file(self, file_path: str):
-        """
-        Ingest a markdown file for processing.
-
-        This method provides compatibility for tests expecting the ingest_file interface.
-        """
-        import time
-        from pathlib import Path
-        from .parser import parse_markdown_to_ast
-        from .frontmatter import FrontMatterParser
-
-        file_path = Path(file_path)
-        if not file_path.exists():
-            raise FileNotFoundError(f"File not found: {file_path}")
-
-        # Read file content
-        content = file_path.read_text(encoding='utf-8')
-
-        # Extract front matter
-        start_time = time.time()
-        parser = FrontMatterParser()
-        front_matter_data, content_without_front_matter = parser.parse(content)
-
-        # Parse to AST
-        ast = parse_markdown_to_ast(content)
-        parse_time = time.time() - start_time
-
-        # Extract title - first try front matter, then first heading, then filename
-        title = "Unknown"
-        if front_matter_data and 'title' in front_matter_data:
-            title = front_matter_data['title']
-        elif isinstance(ast, list):
-            # Look for first H1 heading in AST tokens
-            for token in ast:
-                if token.get('type') == 'heading_open' and token.get('tag') == 'h1':
-                    # Find the next inline token with content
-                    idx = ast.index(token) + 1
-                    if idx < len(ast) and ast[idx].get('type') == 'inline':
-                        title = ast[idx].get('content', 'Unknown')
-                        break
-
-        # Create actual cache file for compatibility
-        cache_dir = Path(file_path.parent) / '.ast_cache'
-        cache_dir.mkdir(exist_ok=True)
-        cache_file = cache_dir / f"{file_path.stem}_ast.json"
-
-        # Write AST to cache file
-        import json
-        with open(cache_file, 'w', encoding='utf-8') as f:
-            json.dump(ast, f, indent=2)
-
-        # Store document in database if db_manager exists
-        if hasattr(self, 'db_manager') and self.db_manager:
-            try:
-                # Store using the clean document manager's method
-                self.store_document(str(file_path), content, ast, front_matter_data)
-            except Exception:
-                # If storage fails, continue without error for test compatibility
-                pass
-
-        return {
-            'ast': ast,
-            'content': content,
-            'metadata': {
-                'filename': file_path.name,
-                'title': title,
-                'size': len(content),
-                'path': str(file_path)
-            },
-            'ast_cache_path': cache_file,
-            'parse_time': parse_time,
-            'cache_time': 0  # Mock cache time for compatibility
-        }
-
-
-# For backward compatibility, also export the clean document manager directly
-__all__ = ['DocumentManager', 'CleanDocumentManager']
+__all__ = ['DocumentManager', 'CleanDocumentManager']
@@ -1,26 +1,11 @@
-from markdown_it import MarkdownIt
+"""
+Markdown AST Parser - Backward Compatibility Module.
 
-def parse_markdown_to_ast(md_content: str):
-    # Enable table parsing and other common plugins
-    md = MarkdownIt("commonmark", {"tables": True}).enable(['table'])
-    tokens = md.parse(md_content)
-    # Convert to a JSON-serializable list of dicts (tokens are objects, so we dict-ify them recursively)
-    def token_to_dict(token):
-        d = {
-            'type': token.type,
-            'tag': token.tag,
-            'attrs': token.attrs,
-            'map': token.map,
-            'nesting': token.nesting,
-            'level': token.level,
-            'children': [token_to_dict(child) if child else None for child in token.children] if token.children else None,
-            'content': token.content,
-            'markup': token.markup,
-            'info': token.info,
-            'meta': token.meta,
-            'block': token.block,
-            'hidden': token.hidden
-        }
-        return {k: v for k, v in d.items() if v is not None}  # Remove None values for cleanliness
+This module re-exports from markitect.core.parser for backward compatibility.
+New code should import from markitect.core.parser directly.
+"""
 
-    return [token_to_dict(token) for token in tokens]
+# Re-export from core for backward compatibility
+from markitect.core.parser import parse_markdown_to_ast
+
+__all__ = ['parse_markdown_to_ast']
72
markitect/schema/__init__.py
Normal file
@@ -0,0 +1,72 @@
+"""
+Schema management modules for MarkiTect.
+
+This package contains the schema-related functionality:
+- Validator: Validate markdown documents against JSON schemas
+- Generator: Generate JSON schemas from markdown structures
+- Loader: Load schemas from markdown files with embedded JSON
+- Analyzer: Analyze schemas for rigidity issues
+- Refiner: Refine rigid schemas with loosening rules
+- Naming: Schema filename convention validation
+
+All modules are re-exported from their original schema_*.py locations
+for backward compatibility.
+"""
+
+from .validator import SchemaValidator
+from .generator import SchemaGenerator
+from .loader import (
+    MarkdownSchemaLoader,
+    SchemaLoaderError,
+    InvalidSchemaFormatError,
+    SchemaNotFoundError,
+)
+from .analyzer import (
+    SchemaAnalyzer,
+    SchemaAnalysisResult,
+    SchemaIssue,
+    IssueType,
+    IssueSeverity,
+)
+from .refiner import (
+    SchemaRefiner,
+    RefinementResult,
+    RefinementAction,
+)
+from .naming import (
+    validate_schema_filename,
+    suggest_valid_filename,
+    extract_schema_domain,
+    get_schema_version,
+    SchemaFilenameError,
+    SCHEMA_FILENAME_PATTERN,
+)
+
+__all__ = [
+    # Validator
+    "SchemaValidator",
+    # Generator
+    "SchemaGenerator",
+    # Loader
+    "MarkdownSchemaLoader",
+    "SchemaLoaderError",
+    "InvalidSchemaFormatError",
+    "SchemaNotFoundError",
+    # Analyzer
+    "SchemaAnalyzer",
+    "SchemaAnalysisResult",
+    "SchemaIssue",
+    "IssueType",
+    "IssueSeverity",
+    # Refiner
+    "SchemaRefiner",
+    "RefinementResult",
+    "RefinementAction",
+    # Naming
+    "validate_schema_filename",
+    "suggest_valid_filename",
+    "extract_schema_domain",
+    "get_schema_version",
+    "SchemaFilenameError",
+    "SCHEMA_FILENAME_PATTERN",
+]
352
markitect/schema/analyzer.py
Normal file
@@ -0,0 +1,352 @@
+"""
+Schema Analyzer for Phase 2: Schema Refinement Tools
+
+Analyzes JSON schemas to detect rigidity issues and provide suggestions
+for improvement using the Phase 1 classification system.
+"""
+
+from pathlib import Path
+from typing import Dict, Any, List, Optional, Tuple
+import json
+from dataclasses import dataclass, field
+from enum import Enum
+
+
+class IssueType(Enum):
+    """Types of schema rigidity issues."""
+    EXACT_COUNT = "exact_count"
+    MISSING_CLASSIFICATIONS = "missing_classifications"
+    MISSING_CONTENT_INSTRUCTIONS = "missing_content_instructions"
+    OVERLY_SPECIFIC = "overly_specific"
+    NO_FLEXIBILITY = "no_flexibility"
+    DEPRECATED_EXTENSIONS = "deprecated_extensions"
+
+
+class IssueSeverity(Enum):
+    """Severity levels for schema issues."""
+    INFO = "info"
+    WARNING = "warning"
+    ERROR = "error"
+
+
+@dataclass
+class SchemaIssue:
+    """Represents a detected schema issue."""
+    issue_type: IssueType
+    severity: IssueSeverity
+    path: str
+    message: str
+    suggestion: str
+    current_value: Any = None
+    suggested_value: Any = None
+
+
+@dataclass
+class SchemaAnalysisResult:
+    """Results of schema analysis."""
+    is_rigid: bool
+    rigidity_score: int  # 0-100, higher = more rigid
+    issues: List[SchemaIssue] = field(default_factory=list)
+    has_classifications: bool = False
+    has_content_control: bool = False
+    uses_deprecated_extensions: bool = False
+
+    @property
+    def issue_count_by_severity(self) -> Dict[IssueSeverity, int]:
+        """Count issues by severity."""
+        counts = {severity: 0 for severity in IssueSeverity}
+        for issue in self.issues:
+            counts[issue.severity] += 1
+        return counts
+
+
class SchemaAnalyzer:
|
||||
"""Analyzes schemas for rigidity and suggests improvements."""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the schema analyzer."""
|
||||
self.deprecated_extensions = [
|
||||
"x-markitect-required-sections",
|
||||
"x-markitect-recommended-sections",
|
||||
"x-markitect-optional-sections"
|
||||
]
|
||||
|
||||
def analyze_schema(self, schema: Dict[str, Any]) -> SchemaAnalysisResult:
|
||||
"""
|
||||
Analyze a schema for rigidity issues.
|
||||
|
||||
Args:
|
||||
schema: The JSON schema to analyze
|
||||
|
||||
Returns:
|
||||
SchemaAnalysisResult with detected issues and suggestions
|
||||
"""
|
||||
result = SchemaAnalysisResult(is_rigid=False, rigidity_score=0)
|
||||
|
||||
# Check for Phase 1 features
|
||||
result.has_classifications = "x-markitect-sections" in schema
|
||||
result.has_content_control = "x-markitect-content-control" in schema
|
||||
|
||||
# Check for deprecated extensions
|
||||
for deprecated in self.deprecated_extensions:
|
||||
if deprecated in schema:
|
||||
result.uses_deprecated_extensions = True
|
||||
result.issues.append(SchemaIssue(
|
||||
issue_type=IssueType.DEPRECATED_EXTENSIONS,
|
||||
severity=IssueSeverity.WARNING,
|
||||
path=deprecated,
|
||||
message=f"Using deprecated extension '{deprecated}'",
|
||||
suggestion="Migrate to 'x-markitect-sections' with classification system"
|
||||
))
|
||||
|
||||
# Analyze properties for rigidity
|
||||
if "properties" in schema:
|
||||
self._analyze_properties(schema["properties"], result, "properties")
|
||||
|
||||
# Check for missing classifications
|
||||
if not result.has_classifications:
|
||||
result.issues.append(SchemaIssue(
|
||||
issue_type=IssueType.MISSING_CLASSIFICATIONS,
|
||||
severity=IssueSeverity.INFO,
|
||||
path="root",
|
||||
message="Schema does not use section classification system",
|
||||
suggestion="Add 'x-markitect-sections' to classify sections as required/recommended/optional/discouraged/improper"
|
||||
))
|
||||
|
||||
# Check for missing content control
|
||||
if not result.has_content_control:
|
||||
result.issues.append(SchemaIssue(
|
||||
issue_type=IssueType.MISSING_CONTENT_INSTRUCTIONS,
|
||||
severity=IssueSeverity.INFO,
|
||||
path="root",
|
||||
message="Schema does not provide content control",
|
||||
suggestion="Add 'x-markitect-content-control' for pattern validation and quality metrics"
|
||||
))
|
||||
|
||||
# Calculate rigidity score
|
||||
result.rigidity_score = self._calculate_rigidity_score(result)
|
||||
result.is_rigid = result.rigidity_score > 50
|
||||
|
||||
return result
|
||||
|
||||
def _analyze_properties(self, properties: Dict[str, Any], result: SchemaAnalysisResult, path: str):
|
||||
"""Analyze schema properties for rigidity issues."""
|
||||
for prop_name, prop_def in properties.items():
|
||||
prop_path = f"{path}.{prop_name}"
|
||||
|
||||
if not isinstance(prop_def, dict):
|
||||
continue
|
||||
|
||||
# Check for exact counts (const)
|
||||
if "const" in prop_def:
|
||||
result.issues.append(SchemaIssue(
|
||||
issue_type=IssueType.EXACT_COUNT,
|
||||
severity=IssueSeverity.WARNING,
|
||||
path=prop_path,
|
||||
message=f"Property '{prop_name}' requires exact value",
|
||||
suggestion="Consider using a range or removing the constraint for flexibility",
|
||||
current_value=prop_def["const"]
|
||||
))
|
||||
|
||||
# Check for arrays with exact counts
|
||||
if prop_def.get("type") == "array":
|
||||
min_items = prop_def.get("minItems")
|
||||
max_items = prop_def.get("maxItems")
|
||||
|
||||
if min_items is not None and max_items is not None and min_items == max_items:
|
||||
result.issues.append(SchemaIssue(
|
||||
issue_type=IssueType.EXACT_COUNT,
|
||||
severity=IssueSeverity.WARNING,
|
||||
path=prop_path,
|
||||
message=f"Array '{prop_name}' requires exactly {min_items} items",
|
||||
suggestion=f"Use a range like minItems: {max(0, min_items - 2)}, maxItems: {min_items + 5}",
|
||||
current_value={"minItems": min_items, "maxItems": max_items},
|
||||
suggested_value={
|
||||
"minItems": max(0, min_items - 2),
|
||||
"maxItems": min_items + 5
|
||||
}
|
||||
))
|
||||
|
||||
# Check for overly specific counts (large numbers)
|
||||
if min_items is not None and min_items > 50:
|
||||
result.issues.append(SchemaIssue(
|
||||
issue_type=IssueType.OVERLY_SPECIFIC,
|
||||
severity=IssueSeverity.INFO,
|
||||
path=prop_path,
|
||||
message=f"Array '{prop_name}' has very specific minItems: {min_items}",
|
||||
suggestion=f"Consider rounding to {(min_items // 10) * 10} for flexibility",
|
||||
current_value=min_items,
|
||||
suggested_value=(min_items // 10) * 10
|
||||
))
|
||||
|
||||
# Check for overly specific integer constraints
|
||||
if prop_def.get("type") == "integer":
|
||||
if "minimum" in prop_def and "maximum" in prop_def:
|
||||
min_val = prop_def["minimum"]
|
||||
max_val = prop_def["maximum"]
|
||||
range_size = max_val - min_val
|
||||
|
||||
if range_size < 3:
|
||||
result.issues.append(SchemaIssue(
|
||||
issue_type=IssueType.NO_FLEXIBILITY,
|
||||
severity=IssueSeverity.INFO,
|
||||
path=prop_path,
|
||||
message=f"Integer '{prop_name}' has very narrow range: {min_val}-{max_val}",
|
||||
suggestion="Consider widening the range for flexibility",
|
||||
current_value={"minimum": min_val, "maximum": max_val}
|
||||
))
|
||||
|
||||
# Recursively check nested properties
|
||||
if "properties" in prop_def:
|
||||
self._analyze_properties(prop_def["properties"], result, prop_path)
|
||||
|
||||
# Check items schema for arrays
|
||||
if "items" in prop_def and isinstance(prop_def["items"], dict):
|
||||
if "properties" in prop_def["items"]:
|
||||
self._analyze_properties(
|
||||
prop_def["items"]["properties"],
|
||||
result,
|
||||
f"{prop_path}.items"
|
||||
)
|
||||
|
||||
def _calculate_rigidity_score(self, result: SchemaAnalysisResult) -> int:
|
||||
"""
|
||||
Calculate overall rigidity score (0-100).
|
||||
|
||||
Higher score = more rigid schema.
|
||||
"""
|
||||
score = 0
|
||||
|
||||
# Count issues by type with weighted scores
|
||||
weights = {
|
||||
IssueType.EXACT_COUNT: 15,
|
||||
IssueType.OVERLY_SPECIFIC: 10,
|
||||
IssueType.NO_FLEXIBILITY: 8,
|
||||
IssueType.MISSING_CLASSIFICATIONS: 5,
|
||||
IssueType.MISSING_CONTENT_INSTRUCTIONS: 3,
|
||||
IssueType.DEPRECATED_EXTENSIONS: 5
|
||||
}
|
||||
|
||||
for issue in result.issues:
|
||||
score += weights.get(issue.issue_type, 5)
|
||||
|
||||
# Cap at 100
|
||||
return min(100, score)
|
||||
|
||||
def analyze_schema_file(self, schema_path: Path) -> SchemaAnalysisResult:
|
||||
"""
|
||||
Analyze a schema file.
|
||||
|
||||
Args:
|
||||
schema_path: Path to JSON schema file
|
||||
|
||||
Returns:
|
||||
SchemaAnalysisResult
|
||||
"""
|
||||
with open(schema_path, encoding="utf-8") as f:
|
||||
schema = json.load(f)
|
||||
|
||||
return self.analyze_schema(schema)
|
||||
|
||||
def format_analysis_report(self, result: SchemaAnalysisResult, verbose: bool = False) -> str:
|
||||
"""
|
||||
Format analysis results as a human-readable report.
|
||||
|
||||
Args:
|
||||
result: Analysis results
|
||||
verbose: Include detailed information
|
||||
|
||||
Returns:
|
||||
Formatted report string
|
||||
"""
|
||||
lines = []
|
||||
|
||||
# Header
|
||||
lines.append("=" * 70)
|
||||
lines.append("Schema Analysis Report")
|
||||
lines.append("=" * 70)
|
||||
lines.append("")
|
||||
|
||||
# Overall assessment
|
||||
rigidity_level = "HIGH" if result.rigidity_score > 70 else "MEDIUM" if result.rigidity_score > 40 else "LOW"
|
||||
lines.append(f"Rigidity Score: {result.rigidity_score}/100 ({rigidity_level})")
|
||||
lines.append(f"Status: {'RIGID - Needs refinement' if result.is_rigid else 'FLEXIBLE - Good'}")
|
||||
lines.append("")
|
||||
|
||||
# Features check
|
||||
lines.append("Phase 1 Features:")
|
||||
lines.append(f" - Classifications: {'Yes' if result.has_classifications else 'No'}")
|
||||
lines.append(f" - Content Control: {'Yes' if result.has_content_control else 'No'}")
|
||||
if result.uses_deprecated_extensions:
|
||||
lines.append(" - Deprecated Extensions: Yes (needs migration)")
|
||||
lines.append("")
|
||||
|
||||
# Issue summary
|
||||
counts = result.issue_count_by_severity
|
||||
lines.append(f"Issues Found: {len(result.issues)} total")
|
||||
lines.append(f" - Errors: {counts[IssueSeverity.ERROR]}")
|
||||
lines.append(f" - Warnings: {counts[IssueSeverity.WARNING]}")
|
||||
lines.append(f" - Info: {counts[IssueSeverity.INFO]}")
|
||||
lines.append("")
|
||||
|
||||
# List issues
|
||||
if result.issues:
|
||||
lines.append("Detected Issues:")
|
||||
lines.append("-" * 70)
|
||||
|
||||
for i, issue in enumerate(result.issues, 1):
|
||||
severity_icon = "ERROR" if issue.severity == IssueSeverity.ERROR else "WARN" if issue.severity == IssueSeverity.WARNING else "INFO"
|
||||
lines.append(f"{i}. [{severity_icon}] {issue.message}")
|
||||
lines.append(f" Path: {issue.path}")
|
||||
lines.append(f" Suggestion: {issue.suggestion}")
|
||||
|
||||
if verbose and issue.current_value is not None:
|
||||
lines.append(f" Current: {json.dumps(issue.current_value)}")
|
||||
if verbose and issue.suggested_value is not None:
|
||||
lines.append(f" Suggested: {json.dumps(issue.suggested_value)}")
|
||||
|
||||
lines.append("")
|
||||
else:
|
||||
lines.append("No issues found - schema is well-designed!")
|
||||
lines.append("")
|
||||
|
||||
# Recommendations
|
||||
if result.is_rigid:
|
||||
lines.append("Recommendations:")
|
||||
lines.append("-" * 70)
|
||||
lines.append("Run: markitect schema-refine <schema-file> --loosen-counts")
|
||||
lines.append(" to automatically apply suggested improvements")
|
||||
lines.append("")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def analyze_schema_cli(schema_path: str, verbose: bool = False) -> int:
|
||||
"""
|
||||
CLI entry point for schema analysis.
|
||||
|
||||
Args:
|
||||
schema_path: Path to schema file
|
||||
verbose: Show detailed information
|
||||
|
||||
Returns:
|
||||
Exit code (0 = schema is flexible, 1 = rigid schema found, 2 = file missing, invalid JSON, or other error)
|
||||
"""
|
||||
analyzer = SchemaAnalyzer()
|
||||
|
||||
try:
|
||||
result = analyzer.analyze_schema_file(Path(schema_path))
|
||||
report = analyzer.format_analysis_report(result, verbose=verbose)
|
||||
print(report)
|
||||
|
||||
return 1 if result.is_rigid else 0
|
||||
|
||||
except FileNotFoundError:
|
||||
print(f"Error: Schema file not found: {schema_path}")
|
||||
return 2
|
||||
except json.JSONDecodeError as e:
|
||||
print(f"Error: Invalid JSON in schema file: {e}")
|
||||
return 2
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
return 2
|
||||
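A worked example may help make the scoring in `analyzer.py` concrete. The sketch below is a standalone approximation, not the module itself: it reuses the weights from `_calculate_rigidity_score` and the `> 50` threshold from `analyze_schema`, but the helper names `count_exact_arrays`, `rigidity_score`, and `exact_array` are hypothetical, and only the exact-count array check and the two root-level INFO checks are reproduced.

```python
from typing import Any, Dict

# Weights copied from _calculate_rigidity_score; only the three issue
# types exercised by this sketch are listed.
EXACT_COUNT = 15
MISSING_CLASSIFICATIONS = 5
MISSING_CONTENT_INSTRUCTIONS = 3


def count_exact_arrays(properties: Dict[str, Any]) -> int:
    """Count arrays whose minItems equals maxItems, recursing into nested properties."""
    found = 0
    for prop_def in properties.values():
        if not isinstance(prop_def, dict):
            continue
        if prop_def.get("type") == "array":
            min_items = prop_def.get("minItems")
            if min_items is not None and min_items == prop_def.get("maxItems"):
                found += 1
        found += count_exact_arrays(prop_def.get("properties", {}))
    return found


def rigidity_score(schema: Dict[str, Any]) -> int:
    """Hypothetical helper: weighted sum of the reproduced checks, capped at 100."""
    score = EXACT_COUNT * count_exact_arrays(schema.get("properties", {}))
    if "x-markitect-sections" not in schema:
        score += MISSING_CLASSIFICATIONS
    if "x-markitect-content-control" not in schema:
        score += MISSING_CONTENT_INSTRUCTIONS
    return min(100, score)


def exact_array(n: int) -> Dict[str, Any]:
    """Build an array property pinned to exactly n items (illustration only)."""
    return {"type": "array", "minItems": n, "maxItems": n}


one_pinned = {"properties": {"paragraphs": exact_array(4)}}
three_pinned = {
    "properties": {
        "paragraphs": exact_array(4),
        "lists": exact_array(2),
        "headings": {"type": "object", "properties": {"level_2": exact_array(3)}},
    }
}

print(rigidity_score(one_pinned))    # 15 + 5 + 3 = 23, below the > 50 threshold
print(rigidity_score(three_pinned))  # 3 * 15 + 5 + 3 = 53, reported as rigid
```

Under these assumptions a single pinned array is not enough to mark a schema as rigid; it takes roughly three exact-count findings plus the two missing-feature notices to cross the threshold.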
466
markitect/schema/generator.py
Normal file
@@ -0,0 +1,466 @@
|
||||
"""
|
||||
Schema Generator for Issue #5: Generate a Schema from a Markdown File.
|
||||
|
||||
This module provides functionality to analyze markdown AST structures and generate
|
||||
JSON schemas that describe the document's structural elements with configurable
|
||||
depth limitations for architectural documentation analysis.
|
||||
"""
|
||||
|
||||
import json
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from markitect.core.parser import parse_markdown_to_ast
|
||||
from markitect.exceptions import FileNotFoundError, InvalidDepthError, InvalidInstructionTypeError
|
||||
|
||||
|
||||
class SchemaGenerator:
|
||||
"""
|
||||
Generates JSON schemas from markdown file AST structures.
|
||||
|
||||
Analyzes the structural elements of markdown documents and creates
|
||||
JSON schemas that can be used for validation and compliance checking
|
||||
in architecture documentation workflows.
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the schema generator."""
|
||||
self.default_schema_url = "http://json-schema.org/draft-07/schema#"
|
||||
|
||||
def generate_schema_from_file(
|
||||
self,
|
||||
file_path: Path,
|
||||
max_depth: Optional[int] = None,
|
||||
mode: Optional[str] = None,
|
||||
outline_depth: Optional[int] = None,
|
||||
capture_heading_text: bool = False,
|
||||
include_content_instructions: bool = False,
|
||||
instruction_type: str = 'description'
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Generate a JSON schema from a markdown file's AST structure.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
max_depth: Maximum heading depth to include (None = unlimited)
|
||||
mode: Generation mode ('outline' for structure-focused schemas)
|
||||
outline_depth: Depth limit for outline mode
|
||||
capture_heading_text: Whether to capture exact heading text as constraints
|
||||
include_content_instructions: Whether to include content instruction fields
|
||||
instruction_type: Type of content instructions ('description', 'example', 'constraint', 'template')
|
||||
|
||||
Returns:
|
||||
JSON schema as a dictionary
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the markdown file doesn't exist
|
||||
InvalidDepthError: If max_depth is invalid (< 1)
InvalidInstructionTypeError: If instruction_type is not one of the supported values
|
||||
"""
|
||||
# Validate inputs
|
||||
if not file_path.exists():
|
||||
raise FileNotFoundError(f"Markdown file not found: {file_path}")
|
||||
|
||||
if max_depth is not None and max_depth < 1:
|
||||
raise InvalidDepthError(f"max_depth must be >= 1, got: {max_depth}")
|
||||
|
||||
# Validate instruction type
|
||||
valid_instruction_types = {'description', 'example', 'constraint', 'template'}
|
||||
if instruction_type not in valid_instruction_types:
|
||||
raise InvalidInstructionTypeError(f"Invalid instruction type '{instruction_type}'. Must be one of: {', '.join(sorted(valid_instruction_types))}")
|
||||
|
||||
# Read and parse the markdown file
|
||||
content = file_path.read_text(encoding='utf-8')
|
||||
ast_tokens = parse_markdown_to_ast(content)
|
||||
|
||||
# Analyze the AST structure
|
||||
structure_analysis = self._analyze_ast_structure(ast_tokens, max_depth)
|
||||
|
||||
# Generate the JSON schema
|
||||
schema = self._create_json_schema(
|
||||
structure_analysis,
|
||||
file_path.name,
|
||||
mode=mode,
|
||||
outline_depth=outline_depth,
|
||||
capture_heading_text=capture_heading_text,
|
||||
include_content_instructions=include_content_instructions,
|
||||
instruction_type=instruction_type
|
||||
)
|
||||
|
||||
return schema
|
||||
|
||||
def _analyze_ast_structure(self, tokens: List[Dict[str, Any]], max_depth: Optional[int]) -> Dict[str, Any]:
|
||||
"""
|
||||
Analyze AST tokens to extract structural patterns.
|
||||
|
||||
Args:
|
||||
tokens: List of AST tokens from markdown-it
|
||||
max_depth: Maximum heading depth to analyze
|
||||
|
||||
Returns:
|
||||
Dictionary containing structural analysis
|
||||
"""
|
||||
analysis = {
|
||||
'headings': defaultdict(list),
|
||||
'paragraphs': [],
|
||||
'lists': [],
|
||||
'code_blocks': [],
|
||||
'blockquotes': [],
|
||||
'tables': [],
|
||||
'links': [],
|
||||
'images': [],
|
||||
'emphasis': [],
|
||||
'structure_types': set()
|
||||
}
|
||||
|
||||
current_heading_level = 0
|
||||
i = 0
|
||||
|
||||
while i < len(tokens):
|
||||
token = tokens[i]
|
||||
token_type = token.get('type', '')
|
||||
|
||||
# Track all structural types found
|
||||
analysis['structure_types'].add(token_type)
|
||||
|
||||
# Analyze headings with depth filtering
|
||||
if token_type == 'heading_open':
|
||||
level = self._extract_heading_level(token.get('tag', ''))
|
||||
if max_depth is None or level <= max_depth:
|
||||
heading_content = self._extract_heading_content(tokens, i)
|
||||
analysis['headings'][f'level_{level}'].append({
|
||||
'content': heading_content,
|
||||
'level': level,
|
||||
'position': i
|
||||
})
|
||||
current_heading_level = level
|
||||
|
||||
# Analyze paragraphs
|
||||
elif token_type == 'paragraph_open':
|
||||
paragraph_content = self._extract_paragraph_content(tokens, i)
|
||||
analysis['paragraphs'].append({
|
||||
'content': paragraph_content,
|
||||
'position': i,
|
||||
'under_heading_level': current_heading_level
|
||||
})
|
||||
|
||||
# Analyze lists
|
||||
elif token_type in ['bullet_list_open', 'ordered_list_open']:
|
||||
list_structure = self._extract_list_structure(tokens, i)
|
||||
analysis['lists'].append({
|
||||
'type': 'bullet' if token_type == 'bullet_list_open' else 'ordered',
|
||||
'structure': list_structure,
|
||||
'position': i,
|
||||
'under_heading_level': current_heading_level
|
||||
})
|
||||
|
||||
# Analyze code blocks
|
||||
elif token_type == 'code_block' or token_type == 'fence':
|
||||
code_info = self._extract_code_block_info(token)
|
||||
analysis['code_blocks'].append({
|
||||
'language': code_info.get('language', ''),
|
||||
'content_length': len(code_info.get('content', '')),
|
||||
'position': i,
|
||||
'under_heading_level': current_heading_level
|
||||
})
|
||||
|
||||
# Analyze blockquotes
|
||||
elif token_type == 'blockquote_open':
|
||||
quote_content = self._extract_blockquote_content(tokens, i)
|
||||
analysis['blockquotes'].append({
|
||||
'content': quote_content,
|
||||
'position': i,
|
||||
'under_heading_level': current_heading_level
|
||||
})
|
||||
|
||||
# Analyze tables
|
||||
elif token_type == 'table_open':
|
||||
table_structure = self._extract_table_structure(tokens, i)
|
||||
analysis['tables'].append({
|
||||
'columns': table_structure.get('columns', 0),
|
||||
'rows': table_structure.get('rows', 0),
|
||||
'position': i,
|
||||
'under_heading_level': current_heading_level
|
||||
})
|
||||
|
||||
# Analyze inline elements
|
||||
elif token_type == 'inline':
|
||||
inline_analysis = self._analyze_inline_content(token)
|
||||
analysis['links'].extend(inline_analysis.get('links', []))
|
||||
analysis['images'].extend(inline_analysis.get('images', []))
|
||||
analysis['emphasis'].extend(inline_analysis.get('emphasis', []))
|
||||
|
||||
i += 1
|
||||
|
||||
# Convert sets to lists for JSON serialization
|
||||
analysis['structure_types'] = sorted(analysis['structure_types'])  # sorted so the generated schema is deterministic
|
||||
|
||||
return analysis
|
||||
|
||||
def _create_json_schema(
|
||||
self,
|
||||
analysis: Dict[str, Any],
|
||||
filename: str,
|
||||
mode: Optional[str] = None,
|
||||
outline_depth: Optional[int] = None,
|
||||
capture_heading_text: bool = False,
|
||||
include_content_instructions: bool = False,
|
||||
instruction_type: str = 'description'
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Create a JSON schema from structural analysis.
|
||||
|
||||
Args:
|
||||
analysis: Structural analysis of the document
|
||||
filename: Name of the source file
|
||||
mode: Generation mode ('outline' for structure-focused schemas)
|
||||
outline_depth: Depth limit for outline mode
|
||||
capture_heading_text: Whether to capture exact heading text as constraints
|
||||
include_content_instructions: Whether to include content instruction fields
|
||||
instruction_type: Type of content instructions to generate
|
||||
|
||||
Returns:
|
||||
JSON schema dictionary
|
||||
"""
|
||||
# Determine title format based on mode
|
||||
title_preposition = "from" if mode == "outline" else "for"
|
||||
|
||||
schema = {
|
||||
"$schema": self.default_schema_url,
|
||||
"type": "object",
|
||||
"title": f"Schema {title_preposition} {filename}",
|
||||
"description": f"JSON schema describing the structure of {filename}",
|
||||
"properties": {}
|
||||
}
|
||||
|
||||
# Add metaschema extensions for outline mode
|
||||
if mode == "outline":
|
||||
schema["x-markitect-outline-mode"] = True
|
||||
if outline_depth is not None:
|
||||
schema["x-markitect-outline-depth"] = outline_depth
|
||||
|
||||
# Add metaschema extension for heading text capture
|
||||
if capture_heading_text:
|
||||
schema["x-markitect-heading-text-capture"] = True
|
||||
|
||||
# Add metaschema extension for content instructions
|
||||
if include_content_instructions:
|
||||
schema["x-markitect-content-instructions-enabled"] = True
|
||||
|
||||
# Add heading structure
|
||||
if analysis['headings']:
|
||||
heading_properties = {}
|
||||
for level_key, headings in analysis['headings'].items():
|
||||
if headings: # Only include levels that have content
|
||||
# Configure content property based on heading text capture
|
||||
if capture_heading_text:
|
||||
# Extract actual heading texts in document order
|
||||
heading_texts = [heading['content'] for heading in headings]
|
||||
content_property = {"enum": heading_texts}
|
||||
else:
|
||||
content_property = {"type": "string"}
|
||||
|
||||
# Build properties for the heading item
|
||||
item_properties = {
|
||||
"content": content_property,
|
||||
"level": {"type": "integer"},
|
||||
"position": {"type": "integer"}
|
||||
}
|
||||
|
||||
# Add content instruction fields if enabled
|
||||
if include_content_instructions:
|
||||
# Generate appropriate instruction text based on heading level
|
||||
level_num = int(level_key.split('_')[1])
|
||||
section_name = f"level {level_num} heading"
|
||||
instruction_text = self._generate_content_instruction(section_name, instruction_type)
|
||||
|
||||
item_properties["x-markitect-content-instructions"] = {
|
||||
"type": "string",
|
||||
"const": instruction_text
|
||||
}
|
||||
|
||||
item_properties["x-markitect-instruction-type"] = {
|
||||
"type": "string",
|
||||
"enum": [instruction_type]
|
||||
}
|
||||
|
||||
heading_properties[level_key] = {
|
||||
"type": "array",
|
||||
"description": f"Headings at {level_key.replace('_', ' ')}",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": item_properties,
|
||||
"required": ["content", "level"]
|
||||
},
|
||||
"minItems": len(headings),
|
||||
"maxItems": len(headings)
|
||||
}
|
||||
|
||||
if heading_properties:
|
||||
schema["properties"]["headings"] = {
|
||||
"type": "object",
|
||||
"description": "Document heading structure",
|
||||
"properties": heading_properties
|
||||
}
|
||||
|
||||
# Add other structural elements
|
||||
structural_elements = {
|
||||
"paragraphs": ("Text paragraphs", analysis['paragraphs']),
|
||||
"lists": ("Lists (ordered and unordered)", analysis['lists']),
|
||||
"code_blocks": ("Code blocks and fenced code", analysis['code_blocks']),
|
||||
"blockquotes": ("Block quotations", analysis['blockquotes']),
|
||||
"tables": ("Tables with rows and columns", analysis['tables']),
|
||||
"links": ("Links to external resources", analysis['links']),
|
||||
"images": ("Embedded images", analysis['images']),
|
||||
"emphasis": ("Text emphasis (bold, italic)", analysis['emphasis'])
|
||||
}
|
||||
|
||||
for element_name, (description, element_list) in structural_elements.items():
|
||||
if element_list:
|
||||
# Build base schema for the element
|
||||
element_schema = {
|
||||
"type": "array",
|
||||
"description": description,
|
||||
"minItems": len(element_list),
|
||||
"maxItems": len(element_list)
|
||||
}
|
||||
|
||||
# Add content instructions for paragraphs and lists if enabled
|
||||
if include_content_instructions and element_name in ["paragraphs", "lists"]:
|
||||
element_schema["items"] = {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"content": {"type": "string"},
|
||||
"x-markitect-content-instructions": {
|
||||
"type": "string",
|
||||
"const": self._generate_content_instruction(element_name, instruction_type)
|
||||
},
|
||||
"x-markitect-instruction-type": {
|
||||
"type": "string",
|
||||
"enum": [instruction_type]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
schema["properties"][element_name] = element_schema
|
||||
|
||||
# Add metadata
|
||||
schema["properties"]["metadata"] = {
|
||||
"type": "object",
|
||||
"description": "Document structure metadata",
|
||||
"properties": {
|
||||
"total_elements": {
|
||||
"type": "integer",
|
||||
"const": sum(len(v) if isinstance(v, list) else 0 for v in analysis.values())
|
||||
},
|
||||
"structure_types": {
|
||||
"type": "array",
|
||||
"items": {"type": "string"},
|
||||
"description": "All structural element types found",
|
||||
"const": analysis['structure_types']
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return schema
|
||||
|
||||
def _extract_heading_level(self, tag: str) -> int:
|
||||
"""Extract heading level from HTML tag (h1, h2, etc.)."""
|
||||
if tag.startswith('h') and len(tag) == 2:
|
||||
try:
|
||||
return int(tag[1])
|
||||
except ValueError:
|
||||
pass
|
||||
return 1
|
||||
|
||||
def _extract_heading_content(self, tokens: List[Dict[str, Any]], start_index: int) -> str:
|
||||
"""Extract text content from heading tokens."""
|
||||
# Look for the inline token that contains the heading text
|
||||
for i in range(start_index, min(start_index + 3, len(tokens))):
|
||||
token = tokens[i]
|
||||
if token.get('type') == 'inline':
|
||||
return token.get('content', '')
|
||||
return ''
|
||||
|
||||
def _extract_paragraph_content(self, tokens: List[Dict[str, Any]], start_index: int) -> str:
|
||||
"""Extract text content from paragraph tokens."""
|
||||
# Look for the inline token that contains the paragraph text
|
||||
for i in range(start_index, min(start_index + 3, len(tokens))):
|
||||
token = tokens[i]
|
||||
if token.get('type') == 'inline':
|
||||
return token.get('content', '')
|
||||
return ''
|
||||
|
||||
def _extract_list_structure(self, tokens: List[Dict[str, Any]], start_index: int) -> Dict[str, Any]:
|
||||
"""Extract list structure information."""
|
||||
# This is a simplified implementation
|
||||
# In a full implementation, we'd parse the nested list structure
|
||||
return {
|
||||
"type": "list",
|
||||
"estimated_items": 1 # Placeholder - would need more complex parsing
|
||||
}
|
||||
|
||||
def _extract_code_block_info(self, token: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Extract code block information."""
|
||||
return {
|
||||
"language": (token.get('info') or '').split()[0] if (token.get('info') or '').strip() else '',
|
||||
"content": token.get('content', '')
|
||||
}
|
||||
|
||||
def _extract_blockquote_content(self, tokens: List[Dict[str, Any]], start_index: int) -> str:
|
||||
"""Extract blockquote content."""
|
||||
# Simplified implementation
|
||||
return "blockquote content"
|
||||
|
||||
def _extract_table_structure(self, tokens: List[Dict[str, Any]], start_index: int) -> Dict[str, Any]:
|
||||
"""Extract table structure information."""
|
||||
# Simplified implementation
|
||||
return {
|
||||
"columns": 2, # Placeholder
|
||||
"rows": 1 # Placeholder
|
||||
}
|
||||
|
||||
def _analyze_inline_content(self, token: Dict[str, Any]) -> Dict[str, List[Any]]:
|
||||
"""Analyze inline content for links, images, emphasis."""
|
||||
result = {
|
||||
"links": [],
|
||||
"images": [],
|
||||
"emphasis": []
|
||||
}
|
||||
|
||||
# Analyze children tokens if they exist
|
||||
children = token.get('children', [])
|
||||
for child in children:
|
||||
if child and isinstance(child, dict):
|
||||
child_type = child.get('type', '')
|
||||
if child_type == 'link_open':
|
||||
result['links'].append({"type": "link"})
|
||||
elif child_type == 'image':
|
||||
result['images'].append({"type": "image"})
|
||||
elif child_type in ['em_open', 'strong_open']:
|
||||
result['emphasis'].append({"type": child_type})
|
||||
|
||||
return result
|
||||
|
||||
def _generate_content_instruction(self, heading_text: str, instruction_type: str) -> str:
|
||||
"""
|
||||
Generate appropriate content instruction text based on heading and instruction type.
|
||||
|
||||
Args:
|
||||
heading_text: Label for the section (callers pass e.g. "level 2 heading" or "paragraphs", not the literal heading text)
|
||||
instruction_type: Type of instruction to generate
|
||||
|
||||
Returns:
|
||||
Instruction text for the content field
|
||||
"""
|
||||
if instruction_type == "description":
|
||||
return f"Provide content for the '{heading_text}' section"
|
||||
elif instruction_type == "example":
|
||||
return f"Example content for the '{heading_text}' section"
|
||||
elif instruction_type == "constraint":
|
||||
return f"Content must be relevant to '{heading_text}'"
|
||||
elif instruction_type == "template":
|
||||
return f"Template content for '{heading_text}' section"
|
||||
else:
|
||||
# Default fallback
|
||||
return f"Content for the '{heading_text}' section"
|
||||
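To see what the heading branch of `_create_json_schema` emits, the following standalone sketch rebuilds just that fragment for one heading level. The function name `heading_level_fragment` and the sample headings are assumptions made for illustration; the field names and the `minItems`/`maxItems` pinning follow the generator code above, with the content-instruction fields left out.

```python
from typing import Any, Dict, List


def heading_level_fragment(
    level_key: str, headings: List[Dict[str, Any]], capture_heading_text: bool = False
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the per-level schema built in _create_json_schema."""
    if capture_heading_text:
        # Exact heading texts, in document order, become the allowed values.
        content_property: Dict[str, Any] = {"enum": [h["content"] for h in headings]}
    else:
        content_property = {"type": "string"}
    return {
        "type": "array",
        "description": f"Headings at {level_key.replace('_', ' ')}",
        "items": {
            "type": "object",
            "properties": {
                "content": content_property,
                "level": {"type": "integer"},
                "position": {"type": "integer"},
            },
            "required": ["content", "level"],
        },
        # The generated schema pins the count to what the source document had.
        "minItems": len(headings),
        "maxItems": len(headings),
    }


sample = [
    {"content": "Overview", "level": 2, "position": 3},
    {"content": "Usage", "level": 2, "position": 9},
]
fragment = heading_level_fragment("level_2", sample, capture_heading_text=True)
print(fragment["items"]["properties"]["content"])  # {'enum': ['Overview', 'Usage']}
print(fragment["minItems"], fragment["maxItems"])   # 2 2
```

Because `minItems` equals `maxItems` here, a schema produced this way is the kind of input that the exact-count check in `SchemaAnalyzer` would be expected to flag, which is presumably why the two tools are meant to be used together.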
610
markitect/schema/loader.py
Normal file
@@ -0,0 +1,610 @@
|
||||
"""
|
||||
Schema Loader - Extract JSON schemas from markdown files.
|
||||
|
||||
This module provides functionality to load schemas from markdown files that
|
||||
contain embedded JSON schemas in code blocks, along with YAML frontmatter
|
||||
metadata and rich documentation.
|
||||
|
||||
Markdown Schema Format:
|
||||
---
|
||||
schema-id: "https://markitect.dev/schemas/domain/v1"
|
||||
version: "1.0.0"
|
||||
status: "stable|draft|deprecated"
|
||||
---
|
||||
|
||||
# Schema Title v1.0
|
||||
|
||||
## Documentation sections...
|
||||
|
||||
## Schema Definition
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "http://json-schema.org/draft-07/schema#",
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
This enables:
|
||||
- Rich documentation alongside schemas
|
||||
- Version history in same file
|
||||
- Human-readable schema files
|
||||
- Markdown-first approach aligned with MarkiTect philosophy
|
||||
"""
|
||||
|
||||
import re
|
||||
import json
|
||||
import yaml
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, Optional, List, Tuple
|
||||
|
||||
|
||||
class SchemaLoaderError(Exception):
|
||||
"""Base exception for schema loading errors."""
|
||||
pass
|
||||
|
||||
|
||||
class InvalidSchemaFormatError(SchemaLoaderError):
|
||||
"""Schema file format is invalid."""
|
||||
pass
|
||||
|
||||
|
||||
class SchemaNotFoundError(SchemaLoaderError):
|
||||
"""No JSON schema found in markdown file."""
|
||||
pass
|
||||
|
||||
|
||||
class MarkdownSchemaLoader:
|
||||
"""
|
||||
Load and parse markdown schema files.
|
||||
|
||||
Supports:
|
||||
- YAML frontmatter for metadata
|
||||
- JSON code blocks for schema definition
|
||||
- Validation of schema structure
|
||||
- Metadata merging
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> schema_data = loader.load_schema(Path("manpage-schema-v1.0.md"))
|
||||
>>> schema = schema_data['schema']
|
||||
>>> metadata = schema_data['metadata']
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the schema loader with regex patterns."""
|
||||
# Pattern to match YAML frontmatter
|
||||
# Matches: --- ... --- at start of file
|
||||
self.frontmatter_pattern = re.compile(
|
||||
r'^---\s*\n(.*?)\n---\s*\n',
|
||||
re.DOTALL  # no re.MULTILINE, so ^ anchors only at the start of the file
|
||||
)
|
||||
|
||||
# Pattern to match JSON code blocks
|
||||
# Matches: ```json ... ```
|
||||
self.json_code_block_pattern = re.compile(
|
||||
r'```json\s*\n(.*?)\n```',
|
||||
re.DOTALL | re.MULTILINE
|
||||
)
|
||||
|
||||
# Pattern to find Schema Definition section
|
||||
# This helps us find the right JSON block if there are multiple
|
||||
self.schema_section_pattern = re.compile(
|
||||
r'##\s+Schema Definition\s*\n',
|
||||
re.MULTILINE
|
||||
)
|
||||
|
||||
def load_schema(self, md_path: Path) -> Dict[str, Any]:
|
||||
"""
|
||||
Load schema from markdown file.
|
||||
|
||||
Args:
|
||||
md_path: Path to markdown schema file
|
||||
|
||||
Returns:
|
||||
Dictionary containing:
|
||||
- schema: Extracted JSON schema (dict)
|
||||
- metadata: Frontmatter metadata (dict)
|
||||
- documentation: Full markdown content (str)
|
||||
- source_file: Source file path (str)
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If schema file doesn't exist
|
||||
InvalidSchemaFormatError: If file format is invalid
|
||||
SchemaNotFoundError: If no JSON schema found
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> data = loader.load_schema(Path("manpage-schema-v1.0.md"))
|
||||
>>> print(data['schema']['title'])
|
||||
Unix Manual Page Schema
|
||||
"""
|
||||
if not md_path.exists():
|
||||
raise FileNotFoundError(f"Schema file not found: {md_path}")
|
||||
|
||||
# Read file content
|
||||
try:
|
||||
content = md_path.read_text(encoding='utf-8')
|
||||
except Exception as e:
|
||||
raise InvalidSchemaFormatError(f"Failed to read schema file: {e}")
|
||||
|
||||
# Extract frontmatter
|
||||
metadata = self._extract_frontmatter(content)
|
||||
|
||||
# Extract JSON schema
|
||||
schema = self._extract_json_schema(content)
|
||||
|
||||
if not schema:
|
||||
raise SchemaNotFoundError(
|
||||
f"No JSON schema found in {md_path}. "
|
||||
f"Expected a ```json code block with schema definition."
|
||||
)
|
||||
|
||||
# Merge metadata into schema
|
||||
schema = self._merge_metadata(schema, metadata, md_path)
|
||||
|
||||
return {
|
||||
'schema': schema,
|
||||
'metadata': metadata,
|
||||
'documentation': content,
|
||||
'source_file': str(md_path)
|
||||
}
|
||||
|
||||
def _extract_frontmatter(self, content: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Extract YAML frontmatter from markdown content.
|
||||
|
||||
Args:
|
||||
content: Markdown file content
|
||||
|
||||
Returns:
|
||||
Dictionary of frontmatter metadata (empty if none found)
|
||||
|
||||
Raises:
|
||||
InvalidSchemaFormatError: If YAML is malformed
|
||||
"""
|
||||
match = self.frontmatter_pattern.search(content)
|
||||
if not match:
|
||||
return {}
|
||||
|
||||
yaml_content = match.group(1)
|
||||
try:
|
||||
metadata = yaml.safe_load(yaml_content) or {}
|
||||
if not isinstance(metadata, dict):
|
||||
raise InvalidSchemaFormatError(
|
||||
f"Frontmatter must be a YAML dictionary, got {type(metadata)}"
|
||||
)
|
||||
return metadata
|
||||
except yaml.YAMLError as e:
|
||||
raise InvalidSchemaFormatError(f"Invalid YAML frontmatter: {e}")
|
||||
|
||||
def _extract_json_schema(self, content: str) -> Optional[Dict[str, Any]]:
|
||||
"""
|
||||
Extract JSON schema from markdown code blocks.
|
||||
|
||||
Prefers JSON blocks under "## Schema Definition" section,
|
||||
but will use first JSON block if no Schema Definition section found.
|
||||
|
||||
Args:
|
||||
content: Markdown file content
|
||||
|
||||
Returns:
|
||||
JSON schema dictionary or None if not found
|
||||
|
||||
Raises:
|
||||
InvalidSchemaFormatError: If JSON is malformed
|
||||
"""
|
||||
# Find all JSON code blocks
|
||||
json_blocks = self.json_code_block_pattern.findall(content)
|
||||
|
||||
if not json_blocks:
|
||||
return None
|
||||
|
||||
# Try to find the Schema Definition section
|
||||
schema_section_match = self.schema_section_pattern.search(content)
|
||||
|
||||
if schema_section_match:
|
||||
# Find JSON block that comes after Schema Definition section
|
||||
section_pos = schema_section_match.end()
|
||||
|
||||
# Re-search for JSON blocks starting from section position
|
||||
remaining_content = content[section_pos:]
|
||||
section_json_blocks = self.json_code_block_pattern.findall(remaining_content)
|
||||
|
||||
if section_json_blocks:
|
||||
json_text = section_json_blocks[0]
|
||||
else:
|
||||
# Fallback to first JSON block in entire document
|
||||
json_text = json_blocks[0]
|
||||
else:
|
||||
# No Schema Definition section, use first JSON block
|
||||
json_text = json_blocks[0]
|
||||
|
||||
# Parse JSON
|
||||
try:
|
||||
schema = json.loads(json_text)
|
||||
if not isinstance(schema, dict):
|
||||
raise InvalidSchemaFormatError(
|
||||
f"Schema must be a JSON object, got {type(schema)}"
|
||||
)
|
||||
return schema
|
||||
except json.JSONDecodeError as e:
|
||||
raise InvalidSchemaFormatError(f"Invalid JSON schema: {e}")
|
||||
|
||||
    def _merge_metadata(
        self,
        schema: Dict[str, Any],
        metadata: Dict[str, Any],
        source_file: Path
    ) -> Dict[str, Any]:
        """
        Merge frontmatter metadata into schema.

        Adds the x-markitect-source extension with file info and metadata.
        Overrides the schema's version, $id, and status with the frontmatter
        values when those keys are present.

        Args:
            schema: JSON schema dictionary
            metadata: Frontmatter metadata dictionary
            source_file: Path to source file

        Returns:
            Schema with merged metadata
        """
        # Shallow copy: top-level keys can be reassigned without touching the input
        merged_schema = schema.copy()

        # Add MarkiTect-specific source metadata
        merged_schema['x-markitect-source'] = {
            'file': str(source_file),
            'filename': source_file.name,
            'format': 'markdown',
            'frontmatter': metadata
        }

        # Override schema fields with frontmatter if present
        # This allows frontmatter to be the source of truth for metadata
        if 'version' in metadata:
            merged_schema['version'] = metadata['version']

        if 'schema-id' in metadata:
            merged_schema['$id'] = metadata['schema-id']

        if 'status' in metadata:
            # Rebuild the nested dict instead of mutating it in place: the
            # shallow copy above still shares nested objects with the input.
            merged_schema['x-markitect-metadata'] = {
                **merged_schema.get('x-markitect-metadata', {}),
                'status': metadata['status'],
            }

        return merged_schema
|
||||
|
||||
def save_schema(
|
||||
self,
|
||||
schema: Dict[str, Any],
|
||||
md_path: Path,
|
||||
template: Optional[str] = None,
|
||||
frontmatter: Optional[Dict[str, Any]] = None
|
||||
):
|
||||
"""
|
||||
Save schema as markdown file.
|
||||
|
||||
Args:
|
||||
schema: JSON schema dictionary to save
|
||||
md_path: Output path for markdown file
|
||||
template: Optional markdown template string
|
||||
frontmatter: Optional frontmatter metadata (extracted from schema if not provided)
|
||||
|
||||
Raises:
|
||||
InvalidSchemaFormatError: If schema is invalid
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> loader.save_schema(
|
||||
... schema={'title': 'My Schema', ...},
|
||||
... md_path=Path('my-schema-v1.0.md')
|
||||
... )
|
||||
"""
|
||||
if template:
|
||||
# Use provided template
|
||||
content = self._render_template(template, schema, frontmatter)
|
||||
else:
|
||||
# Generate basic markdown
|
||||
content = self._generate_markdown(schema, frontmatter)
|
||||
|
||||
# Create parent directory if needed
|
||||
md_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Write file
|
||||
try:
|
||||
md_path.write_text(content, encoding='utf-8')
|
||||
except Exception as e:
|
||||
raise InvalidSchemaFormatError(f"Failed to write schema file: {e}")
|
||||
|
||||
def _generate_markdown(
|
||||
self,
|
||||
schema: Dict[str, Any],
|
||||
frontmatter: Optional[Dict[str, Any]] = None
|
||||
) -> str:
|
||||
"""
|
||||
Generate markdown from schema.
|
||||
|
||||
Args:
|
||||
schema: JSON schema dictionary
|
||||
frontmatter: Optional frontmatter metadata
|
||||
|
||||
Returns:
|
||||
Markdown content as string
|
||||
"""
|
||||
# Extract metadata from schema
|
||||
title = schema.get('title', 'Untitled Schema')
|
||||
version = schema.get('version', '1.0.0')
|
||||
description = schema.get('description', '')
|
||||
schema_id = schema.get('$id', '')
|
||||
|
||||
# Build frontmatter
|
||||
        # Work on a copy so the defaults below do not mutate the caller's dict
        frontmatter = dict(frontmatter or {})
|
||||
|
||||
# Set defaults
|
||||
if 'schema-id' not in frontmatter and schema_id:
|
||||
frontmatter['schema-id'] = schema_id
|
||||
if 'version' not in frontmatter:
|
||||
frontmatter['version'] = version
|
||||
if 'status' not in frontmatter:
|
||||
frontmatter['status'] = 'draft'
|
||||
|
||||
# Generate frontmatter YAML
|
||||
frontmatter_yaml = yaml.dump(
|
||||
frontmatter,
|
||||
default_flow_style=False,
|
||||
allow_unicode=True
|
||||
).strip()
|
||||
|
||||
# Generate JSON (pretty-printed)
|
||||
schema_json = json.dumps(schema, indent=2, ensure_ascii=False)
|
||||
|
||||
# Build markdown content
|
||||
md_content = f"""---
|
||||
{frontmatter_yaml}
|
||||
---
|
||||
|
||||
# {title} v{version}
|
||||
|
||||
## Overview
|
||||
|
||||
{description}
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
markitect validate document.md --schema {Path(frontmatter.get('schema-id', 'schema')).name}
|
||||
```
|
||||
|
||||
## Schema Definition
|
||||
|
||||
```json
|
||||
{schema_json}
|
||||
```
|
||||
|
||||
## Version History
|
||||
|
||||
### v{version}
|
||||
- Initial version
|
||||
"""
|
||||
|
||||
return md_content
|
||||
|
||||
def _render_template(
|
||||
self,
|
||||
template: str,
|
||||
schema: Dict[str, Any],
|
||||
frontmatter: Optional[Dict[str, Any]] = None
|
||||
) -> str:
|
||||
"""
|
||||
Render markdown from template.
|
||||
|
||||
Simple template rendering using string formatting.
|
||||
For complex templates, consider using Jinja2 or similar.
|
||||
|
||||
Args:
|
||||
template: Template string
|
||||
schema: JSON schema dictionary
|
||||
frontmatter: Optional frontmatter metadata
|
||||
|
||||
Returns:
|
||||
Rendered markdown content
|
||||
"""
|
||||
# Build context for template
|
||||
context = {
|
||||
'title': schema.get('title', 'Untitled'),
|
||||
'version': schema.get('version', '1.0.0'),
|
||||
'description': schema.get('description', ''),
|
||||
'schema_id': schema.get('$id', ''),
|
||||
'schema_json': json.dumps(schema, indent=2, ensure_ascii=False),
|
||||
'frontmatter': frontmatter or {},
|
||||
}
|
||||
|
||||
# Simple template rendering
|
||||
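        try:
            return template.format(**context)
        except (KeyError, IndexError, ValueError) as e:
            # KeyError: unknown named field; IndexError: positional field such as {0};
            # ValueError: unbalanced or malformed braces in the template
            raise InvalidSchemaFormatError(f"Template rendering failed: {e}") from e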
|
||||
|
||||
def list_json_blocks(self, content: str) -> List[Tuple[int, str]]:
|
||||
"""
|
||||
List all JSON code blocks in markdown content.
|
||||
|
||||
Useful for debugging or when multiple JSON blocks exist.
|
||||
|
||||
Args:
|
||||
content: Markdown file content
|
||||
|
||||
Returns:
|
||||
List of (position, json_content) tuples
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> content = Path('schema.md').read_text()
|
||||
>>> blocks = loader.list_json_blocks(content)
|
||||
>>> print(f"Found {len(blocks)} JSON blocks")
|
||||
"""
|
||||
blocks = []
|
||||
for match in self.json_code_block_pattern.finditer(content):
|
||||
blocks.append((match.start(), match.group(1)))
|
||||
return blocks
|
||||
|
||||
def validate_schema_structure(self, schema: Dict[str, Any]) -> List[str]:
|
||||
"""
|
||||
Validate basic schema structure.
|
||||
|
||||
Checks for required JSON Schema fields and MarkiTect conventions.
|
||||
|
||||
Args:
|
||||
schema: JSON schema dictionary
|
||||
|
||||
Returns:
|
||||
List of warning/error messages (empty if valid)
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> issues = loader.validate_schema_structure(schema)
|
||||
>>> if issues:
|
||||
... print("Schema issues:", issues)
|
||||
"""
|
||||
issues = []
|
||||
|
||||
# Check required JSON Schema fields
|
||||
if '$schema' not in schema:
|
||||
issues.append("Missing required field: $schema")
|
||||
|
||||
if 'type' not in schema:
|
||||
issues.append("Missing recommended field: type")
|
||||
|
||||
if 'title' not in schema:
|
||||
issues.append("Missing recommended field: title")
|
||||
|
||||
if 'description' not in schema:
|
||||
issues.append("Missing recommended field: description")
|
||||
|
||||
# Check MarkiTect conventions
|
||||
if 'version' not in schema:
|
||||
issues.append("Missing MarkiTect convention: version field")
|
||||
|
||||
if '$id' not in schema:
|
||||
issues.append("Missing recommended field: $id")
|
||||
|
||||
# Check $id format if present
|
||||
if '$id' in schema:
|
||||
schema_id = schema['$id']
|
||||
if not isinstance(schema_id, str):
|
||||
issues.append("$id must be a string")
|
||||
elif not schema_id.startswith('https://'):
|
||||
issues.append("$id should be a full HTTPS URL")
|
||||
|
||||
return issues
|
||||
|
||||
|
||||
def auto_ingest_schemas(db_manager=None, schema_dir: Optional[Path] = None, verbose: bool = False) -> Dict[str, Any]:
|
||||
"""Automatically ingest schemas from markitect/schemas/ directory.
|
||||
|
||||
This function scans the schemas directory for .md schema files and ingests
|
||||
any that are not already in the database. Useful for post-install setup
|
||||
or automatic schema registration.
|
||||
|
||||
Args:
|
||||
db_manager: DatabaseManager instance (optional, will create if not provided)
|
||||
schema_dir: Directory containing schemas (defaults to markitect/schemas/)
|
||||
verbose: If True, print detailed progress messages
|
||||
|
||||
Returns:
|
||||
Dictionary with ingestion results:
|
||||
{
|
||||
'ingested': [list of schema names that were ingested],
|
||||
'skipped': [list of schema names that were already present],
|
||||
'failed': [list of (schema_name, error) tuples for failures]
|
||||
}
|
||||
|
||||
Example:
|
||||
>>> from markitect.schema.loader import auto_ingest_schemas
|
||||
>>> results = auto_ingest_schemas(verbose=True)
|
||||
>>> print(f"Ingested {len(results['ingested'])} schemas")
|
||||
"""
|
||||
# Determine schema directory
|
||||
if schema_dir is None:
|
||||
schema_dir = Path(__file__).parent.parent / "schemas"
|
||||
|
||||
if not schema_dir.exists():
|
||||
if verbose:
|
||||
print(f"Warning: Schema directory not found: {schema_dir}")
|
||||
return {'ingested': [], 'skipped': [], 'failed': []}
|
||||
|
||||
# Initialize database manager if not provided
|
||||
if db_manager is None:
|
||||
from markitect.database import DatabaseManager
|
||||
db_path = Path.home() / '.markitect' / 'markitect.db'
|
||||
db_manager = DatabaseManager(str(db_path))
|
||||
db_manager.initialize_database()
|
||||
|
||||
# Get list of already ingested schemas
|
||||
try:
|
||||
existing_schemas = {schema['name'] for schema in db_manager.list_schemas()}
|
||||
except Exception as e:
|
||||
if verbose:
|
||||
print(f"Error listing existing schemas: {e}")
|
||||
return {'ingested': [], 'skipped': [], 'failed': []}
|
||||
|
||||
results = {
|
||||
'ingested': [],
|
||||
'skipped': [],
|
||||
'failed': []
|
||||
}
|
||||
|
||||
# Find all schema files
|
||||
schema_files = list(schema_dir.glob("*-schema-v*.md"))
|
||||
|
||||
if verbose and schema_files:
|
||||
print(f"Found {len(schema_files)} schema file(s) in {schema_dir}")
|
||||
|
||||
loader = MarkdownSchemaLoader()
|
||||
|
||||
for schema_file in sorted(schema_files):
|
||||
schema_name = schema_file.name
|
||||
|
||||
# Skip if already ingested
|
||||
if schema_name in existing_schemas:
|
||||
results['skipped'].append(schema_name)
|
||||
if verbose:
|
||||
print(f"Skipping {schema_name} (already ingested)")
|
||||
continue
|
||||
|
||||
# Try to ingest
|
||||
try:
|
||||
# Load schema
|
||||
schema_data_full = loader.load_schema(schema_file)
|
||||
schema_data = schema_data_full['schema']
|
||||
|
||||
# Store in database
|
||||
schema_content = json.dumps(schema_data, indent=2)
|
||||
record_id = db_manager.store_schema_file(schema_name, schema_content)
|
||||
|
||||
if record_id:
|
||||
results['ingested'].append(schema_name)
|
||||
if verbose:
|
||||
title = schema_data.get('title', schema_name)
|
||||
print(f"Ingested {schema_name} (title: {title})")
|
||||
else:
|
||||
results['failed'].append((schema_name, "Failed to store in database"))
|
||||
if verbose:
|
||||
print(f"Failed to store {schema_name} in database")
|
||||
|
||||
except Exception as e:
|
||||
results['failed'].append((schema_name, str(e)))
|
||||
if verbose:
|
||||
print(f"Failed to ingest {schema_name}: {e}")
|
||||
|
||||
if verbose:
|
||||
print("\nAuto-ingestion complete:")
|
||||
print(f" Ingested: {len(results['ingested'])}")
|
||||
print(f" Skipped: {len(results['skipped'])}")
|
||||
print(f" Failed: {len(results['failed'])}")
|
||||
|
||||
return results
|
||||
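The block-selection rule in `_extract_json_schema` (prefer the first JSON block after the "## Schema Definition" heading, otherwise fall back to the first JSON block in the document) can be sketched on its own. This is a minimal standalone illustration and not the loader's code: `pick_schema_block`, `FENCE`, and the sample document are hypothetical names introduced here, and the fence marker is built with string multiplication only so that the example can sit inside a fenced block.

```python
import json
import re
from typing import Any, Dict, Optional

FENCE = "`" * 3  # three backticks, built indirectly (assumption for readability only)

# Same shapes as the loader's patterns: a json-tagged fenced block and the section heading
JSON_BLOCK = re.compile(FENCE + r"json\s*\n(.*?)\n" + FENCE, re.DOTALL)
SCHEMA_SECTION = re.compile(r"##\s+Schema Definition\s*\n")


def pick_schema_block(content: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON block after the section heading, else the first block overall."""
    section = SCHEMA_SECTION.search(content)
    if section:
        # Start searching where the heading ends, so earlier blocks are ignored
        match = JSON_BLOCK.search(content, section.end())
        if match:
            return json.loads(match.group(1))
    # No heading, or no block after it: fall back to the first block in the document
    match = JSON_BLOCK.search(content)
    return json.loads(match.group(1)) if match else None


sample = "\n".join([
    "# Demo",
    FENCE + "json",
    '{"title": "example payload"}',
    FENCE,
    "## Schema Definition",
    FENCE + "json",
    '{"title": "real schema", "type": "object"}',
    FENCE,
])

print(pick_schema_block(sample)["title"])
```

Passing a start position to `search` avoids slicing the document, which keeps any reported match offsets relative to the full content.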
369
markitect/schema/naming.py
Normal file
@@ -0,0 +1,369 @@
|
||||
"""
|
||||
Schema Naming Validation - Enforce filename conventions for schemas.
|
||||
|
||||
This module provides validation and utilities for schema filename conventions
|
||||
to ensure consistency across the MarkiTect schema ecosystem.
|
||||
|
||||
Naming Convention:
|
||||
Format: {domain}-schema-v{major}.{minor}.md
|
||||
|
||||
Components:
|
||||
- domain: lowercase, hyphen-separated identifier (e.g., "manpage", "api-documentation")
|
||||
- schema: literal string "schema"
|
||||
- version: SemVer major.minor (e.g., "v1.0", "v2.1")
|
||||
- extension: ".md" (markdown)
|
||||
|
||||
Valid Examples:
|
||||
- manpage-schema-v1.0.md
|
||||
- terminology-schema-v1.0.md
|
||||
- api-documentation-schema-v1.0.md
|
||||
- my-custom-type-schema-v2.1.md
|
||||
|
||||
Invalid Examples:
|
||||
- manpage.json (missing version and wrong extension)
|
||||
- manpage-v1.md (missing "schema" keyword)
|
||||
- ManPage-Schema-v1.0.md (wrong case - must be lowercase)
|
||||
- manpage-schema-1.0.md (missing 'v' prefix)
|
||||
- manpage-schema-v1.md (missing minor version)
|
||||
"""
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Tuple, Optional, Dict, Any
|
||||
|
||||
|
||||
# Regex pattern for schema filename validation
|
||||
# Matches: {domain}-schema-v{major}.{minor}.md
|
||||
# Where domain is lowercase letters/numbers/hyphens starting with letter
|
||||
SCHEMA_FILENAME_PATTERN = re.compile(
|
||||
r'^(?P<domain>[a-z][a-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$'
|
||||
)
|
||||
|
||||
|
||||
class SchemaFilenameError(Exception):
|
||||
"""Exception raised for invalid schema filenames."""
|
||||
pass
|
||||
|
||||
|
||||
def validate_schema_filename(filename: str) -> Tuple[bool, Optional[Dict[str, Any]]]:
|
||||
"""
|
||||
Validate schema filename against naming convention.
|
||||
|
||||
Args:
|
||||
filename: The filename to validate (e.g., "manpage-schema-v1.0.md")
|
||||
|
||||
Returns:
|
||||
Tuple of (is_valid, metadata_dict or None)
|
||||
|
||||
If valid, metadata_dict contains:
|
||||
- domain: str - The domain identifier
|
||||
- version: str - Full version string (e.g., "1.0")
|
||||
- major: int - Major version number
|
||||
- minor: int - Minor version number
|
||||
- filename: str - The original filename
|
||||
|
||||
If invalid, metadata_dict is None
|
||||
|
||||
Examples:
|
||||
>>> validate_schema_filename("manpage-schema-v1.0.md")
|
||||
(True, {'domain': 'manpage', 'version': '1.0', ...})
|
||||
|
||||
>>> validate_schema_filename("invalid.json")
|
||||
(False, None)
|
||||
"""
|
||||
match = SCHEMA_FILENAME_PATTERN.match(filename)
|
||||
|
||||
if not match:
|
||||
return False, None
|
||||
|
||||
return True, {
|
||||
'domain': match.group('domain'),
|
||||
'version': f"{match.group('major')}.{match.group('minor')}",
|
||||
'major': int(match.group('major')),
|
||||
'minor': int(match.group('minor')),
|
||||
'filename': filename
|
||||
}
|
||||
|
||||
|
||||
def suggest_valid_filename(
|
||||
domain: str,
|
||||
version: str = "1.0",
|
||||
normalize: bool = True
|
||||
) -> str:
|
||||
"""
|
||||
Generate a valid schema filename from domain and version.
|
||||
|
||||
Args:
|
||||
domain: The schema domain (e.g., "manpage", "API Documentation")
|
||||
version: Version string in format "major.minor" (default: "1.0")
|
||||
normalize: Whether to normalize domain to lowercase/hyphenated
|
||||
|
||||
Returns:
|
||||
Valid schema filename
|
||||
|
||||
Raises:
|
||||
ValueError: If domain or version format is invalid
|
||||
|
||||
Examples:
|
||||
>>> suggest_valid_filename("manpage", "1.0")
|
||||
'manpage-schema-v1.0.md'
|
||||
|
||||
>>> suggest_valid_filename("API Documentation", "2.1")
|
||||
'api-documentation-schema-v2.1.md'
|
||||
|
||||
>>> suggest_valid_filename("My_Custom_Type", "1.0")
|
||||
'my-custom-type-schema-v1.0.md'
|
||||
"""
|
||||
if not domain:
|
||||
raise ValueError("Domain cannot be empty")
|
||||
|
||||
if normalize:
|
||||
# Normalize domain: lowercase, replace spaces/underscores with hyphens
|
||||
domain_clean = domain.lower()
|
||||
domain_clean = domain_clean.replace(' ', '-').replace('_', '-')
|
||||
# Remove consecutive hyphens
|
||||
domain_clean = re.sub(r'-+', '-', domain_clean)
|
||||
# Remove leading/trailing hyphens
|
||||
domain_clean = domain_clean.strip('-')
|
||||
else:
|
||||
domain_clean = domain
|
||||
|
||||
# Validate domain format (must start with letter, contain only lowercase, numbers, hyphens)
|
||||
if not re.match(r'^[a-z][a-z0-9-]*$', domain_clean):
|
||||
raise ValueError(
|
||||
f"Invalid domain '{domain_clean}': must start with lowercase letter "
|
||||
"and contain only lowercase letters, numbers, and hyphens"
|
||||
)
|
||||
|
||||
# Parse and validate version
|
||||
version_parts = version.split('.')
|
||||
if len(version_parts) != 2:
|
||||
raise ValueError(
|
||||
f"Invalid version '{version}': must be in format 'major.minor' (e.g., '1.0')"
|
||||
)
|
||||
|
||||
try:
|
||||
major = int(version_parts[0])
|
||||
minor = int(version_parts[1])
|
||||
except ValueError:
|
||||
raise ValueError(
|
||||
f"Invalid version '{version}': major and minor must be integers"
|
||||
)
|
||||
|
||||
if major < 0 or minor < 0:
|
||||
raise ValueError(
|
||||
f"Invalid version '{version}': major and minor must be non-negative"
|
||||
)
|
||||
|
||||
return f"{domain_clean}-schema-v{major}.{minor}.md"
|
||||
|
||||
|
||||
# Alias for backward compatibility
|
||||
suggest_schema_filename = suggest_valid_filename
|
||||
|
||||
|
||||
def extract_schema_domain(filename: str) -> str:
|
||||
"""
|
||||
Extract the domain from a valid schema filename.
|
||||
|
||||
Args:
|
||||
filename: Schema filename to parse
|
||||
|
||||
Returns:
|
||||
The domain identifier
|
||||
|
||||
Raises:
|
||||
SchemaFilenameError: If filename is invalid
|
||||
|
||||
Examples:
|
||||
>>> extract_schema_domain("manpage-schema-v1.0.md")
|
||||
'manpage'
|
||||
"""
|
||||
is_valid, metadata = validate_schema_filename(filename)
|
||||
|
||||
if not is_valid:
|
||||
raise SchemaFilenameError(
|
||||
f"Invalid schema filename: {filename}\n"
|
||||
f"Expected format: {{domain}}-schema-v{{major}}.{{minor}}.md"
|
||||
)
|
||||
|
||||
return metadata['domain']
|
||||
|
||||
|
||||
def get_schema_version(filename: str) -> str:
|
||||
"""
|
||||
Get the version string from a valid schema filename.
|
||||
|
||||
Args:
|
||||
filename: Schema filename to parse
|
||||
|
||||
Returns:
|
||||
Version string (e.g., "1.0")
|
||||
|
||||
Raises:
|
||||
SchemaFilenameError: If filename is invalid
|
||||
|
||||
Examples:
|
||||
>>> get_schema_version("manpage-schema-v1.0.md")
|
||||
'1.0'
|
||||
"""
|
||||
is_valid, metadata = validate_schema_filename(filename)
|
||||
|
||||
if not is_valid:
|
||||
raise SchemaFilenameError(
|
||||
f"Invalid schema filename: {filename}\n"
|
||||
f"Expected format: {{domain}}-schema-v{{major}}.{{minor}}.md"
|
||||
)
|
||||
|
||||
return metadata['version']
|
||||
|
||||
|
||||
def extract_schema_metadata(filename: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Extract metadata from a valid schema filename.
|
||||
|
||||
Args:
|
||||
filename: Schema filename to parse
|
||||
|
||||
Returns:
|
||||
Dictionary with metadata
|
||||
|
||||
Raises:
|
||||
SchemaFilenameError: If filename is invalid
|
||||
|
||||
Examples:
|
||||
>>> extract_schema_metadata("manpage-schema-v1.0.md")
|
||||
{'domain': 'manpage', 'version': '1.0', 'major': 1, 'minor': 0, 'filename': 'manpage-schema-v1.0.md'}
|
||||
"""
|
||||
is_valid, metadata = validate_schema_filename(filename)
|
||||
|
||||
if not is_valid:
|
||||
raise SchemaFilenameError(
|
||||
f"Invalid schema filename: {filename}\n"
|
||||
f"Expected format: {{domain}}-schema-v{{major}}.{{minor}}.md"
|
||||
)
|
||||
|
||||
return metadata
|
||||
|
||||
|
||||
def get_validation_errors(filename: str) -> list:
|
||||
"""
|
||||
Get detailed validation errors for a filename.
|
||||
|
||||
Args:
|
||||
filename: Filename to validate
|
||||
|
||||
Returns:
|
||||
List of error messages (empty if valid)
|
||||
|
||||
Examples:
|
||||
>>> get_validation_errors("manpage-schema-v1.0.md")
|
||||
[]
|
||||
|
||||
>>> get_validation_errors("invalid.json")
|
||||
['Filename does not match pattern: {domain}-schema-v{major}.{minor}.md', ...]
|
||||
"""
|
||||
errors = []
|
||||
|
||||
# Check basic pattern match
|
||||
is_valid, _ = validate_schema_filename(filename)
|
||||
if is_valid:
|
||||
return errors
|
||||
|
||||
# Provide detailed feedback
|
||||
errors.append(
|
||||
f"Filename does not match pattern: {{domain}}-schema-v{{major}}.{{minor}}.md"
|
||||
)
|
||||
|
||||
# Check extension
|
||||
if not filename.endswith('.md'):
|
||||
errors.append(f"Extension must be '.md', got: {Path(filename).suffix}")
|
||||
|
||||
# Check for version
|
||||
if '-v' not in filename:
|
||||
errors.append("Missing version: filename must include '-v{major}.{minor}'")
|
||||
elif not re.search(r'-v\d+\.\d+', filename):
|
||||
errors.append(
|
||||
"Invalid version format: must be '-v{major}.{minor}' (e.g., '-v1.0')"
|
||||
)
|
||||
|
||||
# Check for schema keyword
|
||||
if '-schema-' not in filename:
|
||||
errors.append("Missing '-schema-' keyword in filename")
|
||||
|
||||
# Check for uppercase (must be lowercase)
|
||||
if any(c.isupper() for c in filename):
|
||||
errors.append("Filename must be lowercase")
|
||||
|
||||
# Check domain format (if we can isolate it)
|
||||
parts = filename.split('-schema-')
|
||||
if len(parts) >= 2:  # the domain can only be isolated when '-schema-' is present
|
||||
domain = parts[0]
|
||||
if domain and not re.match(r'^[a-z][a-z0-9-]*$', domain):
|
||||
errors.append(
|
||||
f"Invalid domain '{domain}': must start with lowercase letter "
|
||||
"and contain only lowercase letters, numbers, and hyphens"
|
||||
)
|
||||
|
||||
return errors
|
||||
|
||||
|
||||
def is_valid_schema_filename(filename: str) -> bool:
|
||||
"""
|
||||
Check if filename is valid (convenience function).
|
||||
|
||||
Args:
|
||||
filename: Filename to check
|
||||
|
||||
Returns:
|
||||
True if valid, False otherwise
|
||||
|
||||
Examples:
|
||||
>>> is_valid_schema_filename("manpage-schema-v1.0.md")
|
||||
True
|
||||
|
||||
>>> is_valid_schema_filename("invalid.json")
|
||||
False
|
||||
"""
|
||||
is_valid, _ = validate_schema_filename(filename)
|
||||
return is_valid
|
||||
|
||||
|
||||
def format_validation_message(filename: str) -> str:
|
||||
"""
|
||||
Format a user-friendly validation message.
|
||||
|
||||
Args:
|
||||
filename: Filename that failed validation
|
||||
|
||||
Returns:
|
||||
Formatted error message with suggestions
|
||||
|
||||
Examples:
|
||||
>>> print(format_validation_message("manpage.json"))
|
||||
❌ Invalid schema filename: manpage.json
|
||||
...
|
||||
"""
|
||||
errors = get_validation_errors(filename)
|
||||
|
||||
if not errors:
|
||||
return f"\u2705 Valid schema filename: {filename}"
|
||||
|
||||
message = f"\u274c Invalid schema filename: {filename}\n\n"
|
||||
message += "Errors:\n"
|
||||
for i, error in enumerate(errors, 1):
|
||||
message += f" {i}. {error}\n"
|
||||
|
||||
message += "\nExpected format: {domain}-schema-v{major}.{minor}.md\n"
|
||||
message += "Example: manpage-schema-v1.0.md\n"
|
||||
|
||||
# Try to suggest a corrected filename
|
||||
try:
|
||||
# Extract domain guess (everything before first hyphen or dot)
|
||||
domain_guess = filename.split('-')[0].split('.')[0]
|
||||
suggestion = suggest_valid_filename(domain_guess, "1.0")
|
||||
message += f"\nSuggested filename: {suggestion}\n"
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return message
|
||||
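One reason the naming convention exposes `major` and `minor` as integers is that version strings do not sort correctly as text ("1.10" sorts before "1.9"). The sketch below reuses the same filename pattern to pick the newest file per domain. It is an illustration under assumptions: `parse_schema_filename` and `newest_per_domain` are hypothetical helpers, not functions from `markitect/schema/naming.py`.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Same pattern as SCHEMA_FILENAME_PATTERN in the module above
PATTERN = re.compile(
    r'^(?P<domain>[a-z][a-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$'
)


def parse_schema_filename(filename: str) -> Optional[Tuple[str, int, int]]:
    """Return (domain, major, minor) for a conforming filename, else None."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    return match.group('domain'), int(match.group('major')), int(match.group('minor'))


def newest_per_domain(filenames: Iterable[str]) -> Dict[str, str]:
    """Map each domain to its highest-versioned filename, skipping invalid names."""
    best: Dict[str, Tuple[int, int, str]] = {}
    for name in filenames:
        parsed = parse_schema_filename(name)
        if parsed is None:
            continue  # non-conforming names are ignored in this sketch
        domain, major, minor = parsed
        # Compare versions as integer tuples, not as strings
        if domain not in best or (major, minor) > best[domain][:2]:
            best[domain] = (major, minor, name)
    return {domain: entry[2] for domain, entry in best.items()}


files = [
    "manpage-schema-v1.9.md",
    "manpage-schema-v1.10.md",
    "ManPage-Schema-v2.0.md",  # rejected: uppercase
    "api-documentation-schema-v2.1.md",
]
print(newest_per_domain(files))
```

Comparing `(major, minor)` tuples keeps `v1.10` ahead of `v1.9`, which a plain string sort of the filenames would get wrong.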
530
markitect/schema/refiner.py
Normal file
@@ -0,0 +1,530 @@
|
||||
"""
|
||||
Schema Refiner for Phase 2: Schema Refinement Tools
|
||||
|
||||
Automatically refines rigid schemas by applying loosening rules and fixes.
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List, Optional, Tuple
|
||||
import json
|
||||
import copy
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
from .analyzer import SchemaAnalyzer, SchemaIssue, IssueType, IssueSeverity
|
||||
|
||||
|
||||
@dataclass
|
||||
class RefinementAction:
|
||||
"""Represents a refinement action taken on the schema."""
|
||||
issue_type: IssueType
|
||||
path: str
|
||||
description: str
|
||||
old_value: Any = None
|
||||
new_value: Any = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class RefinementResult:
|
||||
"""Results of schema refinement."""
|
||||
success: bool
|
||||
actions_taken: List[RefinementAction] = field(default_factory=list)
|
||||
refined_schema: Optional[Dict[str, Any]] = None
|
||||
error_message: Optional[str] = None
|
||||
|
||||
|
||||
class SchemaRefiner:
|
||||
"""Refines rigid schemas by applying loosening rules."""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the schema refiner."""
|
||||
self.analyzer = SchemaAnalyzer()
|
||||
|
||||
    def _navigate_to_path(self, schema: Dict[str, Any], path: str) -> Optional[Tuple[Dict[str, Any], str]]:
        """
        Navigate to a dot-separated path in the schema, handling nested 'properties' objects.

        Returns (parent_object, property_name), or None if the path doesn't exist.
        """
        path_parts = path.split('.')
        obj = schema

        # Navigate through all but the last part
        for part in path_parts[:-1]:
            # A non-dict node (list, string, number) cannot contain named children
            if not isinstance(obj, dict):
                return None
            # Try direct access first
            if part in obj:
                obj = obj[part]
            # If not found and obj has 'properties', try there
            elif "properties" in obj and part in obj["properties"]:
                obj = obj["properties"][part]
            else:
                return None

        if not isinstance(obj, dict):
            return None

        # For the final part, check if we need to descend into 'properties'
        prop_name = path_parts[-1]
        if prop_name in obj:
            return (obj, prop_name)
        elif "properties" in obj and prop_name in obj["properties"]:
            return (obj["properties"], prop_name)
        else:
            return None
|
||||
|
||||
def refine_schema_interactive(
|
||||
self,
|
||||
schema: Dict[str, Any],
|
||||
loosen_counts: bool = True,
|
||||
migrate_deprecated: bool = False,
|
||||
round_numbers: bool = True
|
||||
) -> RefinementResult:
|
||||
"""
|
||||
Refine a schema interactively, prompting for each fix.
|
||||
|
||||
Args:
|
||||
schema: The JSON schema to refine
|
||||
loosen_counts: Enable fixes for exact counts
|
||||
migrate_deprecated: Enable migration of deprecated extensions
|
||||
round_numbers: Enable rounding of overly specific numbers
|
||||
|
||||
Returns:
|
||||
RefinementResult with actions taken and refined schema
|
||||
"""
|
||||
result = RefinementResult(success=False)
|
||||
|
||||
try:
|
||||
# Analyze the schema first
|
||||
analysis = self.analyzer.analyze_schema(schema)
|
||||
|
||||
print(f"\nFound {len(analysis.issues)} issue(s) to review\n")
|
||||
|
||||
# Deep copy to avoid modifying original
|
||||
refined = copy.deepcopy(schema)
|
||||
|
||||
# Process each issue interactively
|
||||
for i, issue in enumerate(analysis.issues, 1):
|
||||
print(f"Issue {i}/{len(analysis.issues)}")
|
||||
print(f" Type: {issue.issue_type.value}")
|
||||
print(f" Path: {issue.path}")
|
||||
print(f" {issue.message}")
|
||||
print(f" Suggestion: {issue.suggestion}")
|
||||
|
||||
if issue.current_value is not None:
|
||||
print(f" Current: {json.dumps(issue.current_value)}")
|
||||
if issue.suggested_value is not None:
|
||||
print(f" Suggested: {json.dumps(issue.suggested_value)}")
|
||||
|
||||
# Ask user if they want to apply the fix
|
||||
response = input("\nApply this fix? [y/N/q]: ").strip().lower()
|
||||
|
||||
if response == 'q':
|
||||
print("Refinement cancelled by user")
|
||||
result.success = False
|
||||
return result
|
||||
elif response == 'y':
|
||||
action = None
|
||||
|
||||
if loosen_counts and issue.issue_type == IssueType.EXACT_COUNT:
|
||||
action = self._fix_exact_count(refined, issue)
|
||||
|
||||
elif round_numbers and issue.issue_type == IssueType.OVERLY_SPECIFIC:
|
||||
action = self._fix_overly_specific(refined, issue)
|
||||
|
||||
elif loosen_counts and issue.issue_type == IssueType.NO_FLEXIBILITY:
|
||||
action = self._fix_no_flexibility(refined, issue)
|
||||
|
||||
elif migrate_deprecated and issue.issue_type == IssueType.DEPRECATED_EXTENSIONS:
|
||||
action = self._fix_deprecated_extension(refined, issue)
|
||||
|
||||
if action:
|
||||
result.actions_taken.append(action)
|
||||
print(f" ✓ Applied")
|
||||
else:
|
||||
print(f" ✗ Could not apply fix")
|
||||
else:
|
||||
print(f" - Skipped")
|
||||
|
||||
print()
|
||||
|
||||
result.refined_schema = refined
|
||||
result.success = True
|
||||
|
||||
except Exception as e:
|
||||
result.error_message = str(e)
|
||||
|
||||
return result
|
||||
|
||||
def refine_schema(
|
||||
self,
|
||||
schema: Dict[str, Any],
|
||||
loosen_counts: bool = True,
|
||||
migrate_deprecated: bool = False,
|
||||
round_numbers: bool = True
|
||||
) -> RefinementResult:
|
||||
"""
|
||||
Refine a schema by applying fixes for detected issues.
|
||||
|
||||
Args:
|
||||
schema: The JSON schema to refine
|
||||
loosen_counts: Apply fixes for exact counts
|
||||
migrate_deprecated: Migrate deprecated extensions
|
||||
round_numbers: Round overly specific numbers
|
||||
|
||||
Returns:
|
||||
RefinementResult with actions taken and refined schema
|
||||
"""
|
||||
result = RefinementResult(success=False)
|
||||
|
||||
try:
|
||||
# Analyze the schema first
|
||||
analysis = self.analyzer.analyze_schema(schema)
|
||||
|
||||
# Deep copy to avoid modifying original
|
||||
refined = copy.deepcopy(schema)
|
||||
|
||||
# Apply fixes based on issues found
|
||||
for issue in analysis.issues:
|
||||
action = None
|
||||
|
||||
if loosen_counts and issue.issue_type == IssueType.EXACT_COUNT:
|
||||
action = self._fix_exact_count(refined, issue)
|
||||
|
||||
elif round_numbers and issue.issue_type == IssueType.OVERLY_SPECIFIC:
|
||||
action = self._fix_overly_specific(refined, issue)
|
||||
|
||||
elif loosen_counts and issue.issue_type == IssueType.NO_FLEXIBILITY:
|
||||
action = self._fix_no_flexibility(refined, issue)
|
||||
|
||||
elif migrate_deprecated and issue.issue_type == IssueType.DEPRECATED_EXTENSIONS:
|
||||
action = self._fix_deprecated_extension(refined, issue)
|
||||
|
||||
if action:
|
||||
result.actions_taken.append(action)
|
||||
|
||||
result.refined_schema = refined
|
||||
result.success = True
|
||||
|
||||
except Exception as e:
|
||||
result.error_message = str(e)
|
||||
|
||||
return result
|
||||
|
||||
def _fix_exact_count(self, schema: Dict[str, Any], issue: SchemaIssue) -> Optional[RefinementAction]:
|
||||
"""Fix exact count constraints by converting to ranges."""
|
||||
nav_result = self._navigate_to_path(schema, issue.path)
|
||||
if not nav_result:
|
||||
return None
|
||||
|
||||
obj, prop_name = nav_result
|
||||
prop_def = obj[prop_name]
|
||||
old_value = copy.deepcopy(prop_def)
|
||||
|
||||
# Check if it's an array with exact minItems/maxItems
|
||||
if isinstance(prop_def, dict) and prop_def.get("type") == "array":
|
||||
min_items = prop_def.get("minItems")
|
||||
max_items = prop_def.get("maxItems")
|
||||
|
||||
if min_items is not None and max_items is not None and min_items == max_items:
|
||||
# Apply suggested loosening
|
||||
new_min = max(0, min_items - 2)
|
||||
new_max = min_items + 5
|
||||
|
||||
prop_def["minItems"] = new_min
|
||||
prop_def["maxItems"] = new_max
|
||||
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.EXACT_COUNT,
|
||||
path=issue.path,
|
||||
description=f"Loosened array count from exactly {min_items} to range {new_min}-{new_max}",
|
||||
old_value={"minItems": min_items, "maxItems": max_items},
|
||||
new_value={"minItems": new_min, "maxItems": new_max}
|
||||
)
|
||||
|
||||
# Check if it's a const value
|
||||
if isinstance(prop_def, dict) and "const" in prop_def:
|
||||
const_value = prop_def["const"]
|
||||
del prop_def["const"]
|
||||
|
||||
# If it's an integer (bool excluded, since bool subclasses int), convert to a range
|
||||
if isinstance(const_value, int) and not isinstance(const_value, bool):
|
||||
prop_def["minimum"] = const_value - 1
|
||||
prop_def["maximum"] = const_value + 1
|
||||
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.EXACT_COUNT,
|
||||
path=issue.path,
|
||||
description=f"Converted const {const_value} to range {const_value-1}-{const_value+1}",
|
||||
old_value=const_value,
|
||||
new_value={"minimum": const_value - 1, "maximum": const_value + 1}
|
||||
)
|
||||
else:
|
||||
# For non-numeric constants, just remove the constraint
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.EXACT_COUNT,
|
||||
path=issue.path,
|
||||
description=f"Removed const constraint: {const_value}",
|
||||
old_value=const_value,
|
||||
new_value=None
|
||||
)
|
||||
|
||||
return None
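The array branch of `_fix_exact_count` can be sketched in isolation as follows. This is a minimal standalone illustration, not the refiner's own code: the helper name `loosen_exact_count` and its `below`/`above` parameters are hypothetical, and their defaults of 2 and 5 simply mirror the offsets used in the method above.

```python
import copy
from typing import Any, Dict


def loosen_exact_count(
    prop_def: Dict[str, Any], below: int = 2, above: int = 5
) -> Dict[str, Any]:
    """Return a copy of prop_def with an exact array count widened to a range.

    Hypothetical helper: only array definitions whose minItems equals
    maxItems are changed; anything else is returned as an unchanged copy.
    """
    loosened = copy.deepcopy(prop_def)
    min_items = loosened.get("minItems")
    max_items = loosened.get("maxItems")

    if (
        loosened.get("type") == "array"
        and min_items is not None
        and min_items == max_items
    ):
        # Never let the lower bound drop below zero.
        loosened["minItems"] = max(0, min_items - below)
        loosened["maxItems"] = min_items + above

    return loosened


exact = {"type": "array", "minItems": 3, "maxItems": 3}
loosened = loosen_exact_count(exact)
print(loosened)
```

Working on a deep copy keeps the caller's definition untouched, which matches how `refine_schema` copies the schema before applying fixes.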
|
||||
|
||||
def _fix_overly_specific(self, schema: Dict[str, Any], issue: SchemaIssue) -> Optional[RefinementAction]:
|
||||
"""Fix overly specific number constraints by rounding."""
|
||||
if issue.suggested_value is None:
|
||||
return None
|
||||
|
||||
nav_result = self._navigate_to_path(schema, issue.path)
|
||||
if not nav_result:
|
||||
return None
|
||||
|
||||
obj, prop_name = nav_result
|
||||
prop_def = obj[prop_name]
|
||||
|
||||
# Round the minItems value
|
||||
if isinstance(prop_def, dict) and "minItems" in prop_def:
|
||||
old_value = prop_def["minItems"]
|
||||
new_value = issue.suggested_value
|
||||
prop_def["minItems"] = new_value
|
||||
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.OVERLY_SPECIFIC,
|
||||
path=issue.path,
|
||||
description=f"Rounded minItems from {old_value} to {new_value}",
|
||||
old_value=old_value,
|
||||
new_value=new_value
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
def _fix_no_flexibility(self, schema: Dict[str, Any], issue: SchemaIssue) -> Optional[RefinementAction]:
|
||||
"""Fix narrow ranges by widening them."""
|
||||
nav_result = self._navigate_to_path(schema, issue.path)
|
||||
if not nav_result:
|
||||
return None
|
||||
|
||||
obj, prop_name = nav_result
|
||||
prop_def = obj[prop_name]
|
||||
|
||||
if isinstance(prop_def, dict) and "minimum" in prop_def and "maximum" in prop_def:
|
||||
old_min = prop_def["minimum"]
|
||||
old_max = prop_def["maximum"]
|
||||
|
||||
# Widen the range
|
||||
new_min = old_min - 5
|
||||
new_max = old_max + 5
|
||||
|
||||
prop_def["minimum"] = new_min
|
||||
prop_def["maximum"] = new_max
|
||||
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.NO_FLEXIBILITY,
|
||||
path=issue.path,
|
||||
description=f"Widened range from {old_min}-{old_max} to {new_min}-{new_max}",
|
||||
old_value={"minimum": old_min, "maximum": old_max},
|
||||
new_value={"minimum": new_min, "maximum": new_max}
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
def _fix_deprecated_extension(self, schema: Dict[str, Any], issue: SchemaIssue) -> Optional[RefinementAction]:
|
||||
"""Flag a deprecated extension without removing it (migration requires manual work)."""
|
||||
# For now, just document that manual migration is needed
|
||||
# Full migration would require understanding the old format
|
||||
|
||||
deprecated_key = issue.path
|
||||
if deprecated_key in schema:
|
||||
old_value = schema[deprecated_key]
|
||||
# Don't actually remove it automatically - too risky
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.DEPRECATED_EXTENSIONS,
|
||||
path=issue.path,
|
||||
description="Detected deprecated extension (manual migration recommended)",
|
||||
old_value=old_value,
|
||||
new_value=None
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
def refine_schema_file(
|
||||
self,
|
||||
input_path: Path,
|
||||
output_path: Optional[Path] = None,
|
||||
loosen_counts: bool = True,
|
||||
migrate_deprecated: bool = False,
|
||||
round_numbers: bool = True
|
||||
) -> RefinementResult:
|
||||
"""
|
||||
Refine a schema file.
|
||||
|
||||
Args:
|
||||
input_path: Path to input schema file
|
||||
output_path: Path to output file (if None, overwrites input)
|
||||
loosen_counts: Apply fixes for exact counts
|
||||
migrate_deprecated: Migrate deprecated extensions
|
||||
round_numbers: Round overly specific numbers
|
||||
|
||||
Returns:
|
||||
RefinementResult
|
||||
"""
|
||||
with open(input_path, encoding='utf-8') as f:
|
||||
schema = json.load(f)
|
||||
|
||||
result = self.refine_schema(
|
||||
schema,
|
||||
loosen_counts=loosen_counts,
|
||||
migrate_deprecated=migrate_deprecated,
|
||||
round_numbers=round_numbers
|
||||
)
|
||||
|
||||
if result.success and result.refined_schema:
|
||||
output = output_path or input_path
|
||||
with open(output, 'w', encoding='utf-8') as f:
|
||||
json.dump(result.refined_schema, f, indent=2)
|
||||
|
||||
return result
|
||||
|
||||
def format_refinement_report(self, result: RefinementResult) -> str:
|
||||
"""
|
||||
Format refinement results as a human-readable report.
|
||||
|
||||
Args:
|
||||
result: Refinement results
|
||||
|
||||
Returns:
|
||||
Formatted report string
|
||||
"""
|
||||
lines = []
|
||||
|
||||
# Header
|
||||
lines.append("=" * 70)
|
||||
lines.append("Schema Refinement Report")
|
||||
lines.append("=" * 70)
|
||||
lines.append("")
|
||||
|
||||
if not result.success:
|
||||
lines.append(f"❌ Refinement failed: {result.error_message}")
|
||||
return "\n".join(lines)
|
||||
|
||||
# Summary
|
||||
action_count = len(result.actions_taken)
|
||||
if action_count == 0:
|
||||
lines.append("✅ No refinements needed - schema is already flexible")
|
||||
else:
|
||||
lines.append(f"✅ Applied {action_count} refinement(s)")
|
||||
lines.append("")
|
||||
|
||||
# List actions
|
||||
if result.actions_taken:
|
||||
lines.append("Actions Taken:")
|
||||
lines.append("-" * 70)
|
||||
|
||||
for i, action in enumerate(result.actions_taken, 1):
|
||||
lines.append(f"{i}. {action.description}")
|
||||
lines.append(f" Path: {action.path}")
|
||||
|
||||
if action.old_value is not None:
|
||||
lines.append(f" Before: {json.dumps(action.old_value)}")
|
||||
if action.new_value is not None:
|
||||
lines.append(f" After: {json.dumps(action.new_value)}")
|
||||
|
||||
lines.append("")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def refine_schema_cli(
|
||||
schema_path: str,
|
||||
output: Optional[str] = None,
|
||||
loosen_counts: bool = True,
|
||||
migrate_deprecated: bool = False,
|
||||
round_numbers: bool = True,
|
||||
dry_run: bool = False,
|
||||
interactive: bool = False
|
||||
) -> int:
|
||||
"""
|
||||
CLI entry point for schema refinement.
|
||||
|
||||
Args:
|
||||
schema_path: Path to schema file
|
||||
output: Output path (None = overwrite input)
|
||||
loosen_counts: Apply count loosening fixes
|
||||
migrate_deprecated: Migrate deprecated extensions
|
||||
round_numbers: Round overly specific numbers
|
||||
dry_run: Show changes without applying
|
||||
interactive: Prompt for each fix
|
||||
|
||||
Returns:
|
||||
Exit code (0 = success, 1 = no changes needed, 2 = error)
|
||||
"""
|
||||
refiner = SchemaRefiner()
|
||||
|
||||
try:
|
||||
input_path = Path(schema_path)
|
||||
output_path = Path(output) if output else None
|
||||
|
||||
# Load schema
|
||||
with open(input_path, encoding='utf-8') as f:
|
||||
schema = json.load(f)
|
||||
|
||||
if interactive:
|
||||
# Interactive mode - prompt for each fix
|
||||
print(f"Refining schema: {schema_path}")
|
||||
result = refiner.refine_schema_interactive(
|
||||
schema,
|
||||
loosen_counts=loosen_counts,
|
||||
migrate_deprecated=migrate_deprecated,
|
||||
round_numbers=round_numbers
|
||||
)
|
||||
|
||||
if result.success and result.refined_schema and not dry_run:
|
||||
# Write the refined schema
|
||||
output = output_path or input_path
|
||||
with open(output, 'w', encoding='utf-8') as f:
|
||||
json.dump(result.refined_schema, f, indent=2)
|
||||
print(f"\nRefined schema written to: {output}")
|
||||
|
||||
elif dry_run:
|
||||
# Just analyze and show what would be done
|
||||
result = refiner.refine_schema(
|
||||
schema,
|
||||
loosen_counts=loosen_counts,
|
||||
migrate_deprecated=migrate_deprecated,
|
||||
round_numbers=round_numbers
|
||||
)
|
||||
|
||||
print("DRY RUN - No changes will be made")
|
||||
print()
|
||||
else:
|
||||
result = refiner.refine_schema_file(
|
||||
input_path,
|
||||
output_path,
|
||||
loosen_counts=loosen_counts,
|
||||
migrate_deprecated=migrate_deprecated,
|
||||
round_numbers=round_numbers
|
||||
)
|
||||
|
||||
# Only print full report if not in interactive mode (user already saw changes)
|
||||
if not interactive:
|
||||
report = refiner.format_refinement_report(result)
|
||||
print(report)
|
||||
elif result.success:
|
||||
# Just print summary for interactive mode
|
||||
print(f"\n{'='*70}")
|
||||
print(f"Refinement complete: {len(result.actions_taken)} change(s) applied")
|
||||
print(f"{'='*70}")
|
||||
|
||||
if result.success and len(result.actions_taken) > 0:
|
||||
return 0 # Success with changes
|
||||
elif result.success:
|
||||
return 1 # Success but no changes needed
|
||||
else:
|
||||
return 2 # Error
|
||||
|
||||
except FileNotFoundError:
|
||||
print(f"Error: Schema file not found: {schema_path}")
|
||||
return 2
|
||||
except json.JSONDecodeError as e:
|
||||
print(f"Error: Invalid JSON in schema file: {e}")
|
||||
return 2
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
return 2
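The exit-code convention documented in `refine_schema_cli` (0 = changes applied, 1 = nothing to change, 2 = error) can be summarized by a small mapping function. This is only a sketch of that convention; `exit_code_for` is a hypothetical name and is not part of the module.

```python
def exit_code_for(success: bool, actions_applied: int) -> int:
    """Map a refinement outcome to the CLI exit codes described above.

    Hypothetical helper: 0 = refinement succeeded and changed something,
    1 = refinement succeeded but nothing needed changing, 2 = error.
    """
    if not success:
        return 2
    return 0 if actions_applied > 0 else 1


for outcome in [(True, 3), (True, 0), (False, 0)]:
    print(outcome, "->", exit_code_for(*outcome))
```

Because "no changes needed" is reported as a non-zero code, a calling script that treats every non-zero exit as a failure would probably need to special-case 1.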
|
||||
679
markitect/schema/validator.py
Normal file
@@ -0,0 +1,679 @@
|
||||
"""
|
||||
Schema Validator for Issue #7: Validate a Markdown File Against a Schema.
|
||||
|
||||
This module provides functionality to validate markdown documents against JSON schemas
|
||||
for arc42 architecture documentation compliance checking - essential for intelligent
|
||||
document analysis and plan-actual comparison capabilities.
|
||||
"""
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any
|
||||
|
||||
try:
|
||||
import jsonschema
|
||||
from jsonschema import SchemaError
|
||||
JSONSCHEMA_AVAILABLE = True
|
||||
except ImportError:
|
||||
# Fallback to basic validation without full JSON Schema validation
|
||||
JSONSCHEMA_AVAILABLE = False
|
||||
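SchemaError = type("SchemaError", (Exception,), {})  # distinct placeholder so `except SchemaError` does not also catch InvalidSchemaError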
|
||||
|
||||
from markitect.core.parser import parse_markdown_to_ast
|
||||
from .generator import SchemaGenerator
|
||||
from markitect.validation_error import ValidationErrorCollector, ValidationErrorType
|
||||
from markitect.exceptions import FileNotFoundError, SchemaValidationError, InvalidSchemaError
|
||||
|
||||
|
||||
class SchemaValidator:
|
||||
"""
|
||||
Validates markdown documents against JSON schemas for arc42 compliance checking.
|
||||
|
||||
This service provides boolean validation results for markdown documents against
|
||||
schemas, enabling strict compliance checking for architectural documentation
|
||||
templates and intelligent plan-actual comparison.
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the schema validator."""
|
||||
self.schema_generator = SchemaGenerator()
|
||||
self.jsonschema_available = JSONSCHEMA_AVAILABLE
|
||||
|
||||
def validate_file_against_schema(self, file_path: Path, schema: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Validate a markdown file against a JSON schema.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema: JSON schema dictionary to validate against
|
||||
|
||||
Returns:
|
||||
True if the document matches the schema, False otherwise
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the markdown file doesn't exist
|
||||
InvalidSchemaError: If the schema is invalid
|
||||
"""
|
||||
# Validate inputs
|
||||
if not file_path.exists():
|
||||
raise FileNotFoundError(f"Markdown file not found: {file_path}")
|
||||
|
||||
# Validate the schema itself
|
||||
self._validate_schema(schema)
|
||||
|
||||
# Generate the document's current structure
|
||||
try:
|
||||
document_schema = self.schema_generator.generate_schema_from_file(file_path)
|
||||
except Exception as e:
|
||||
raise SchemaValidationError(f"Failed to generate document schema: {e}") from e
|
||||
|
||||
# Check if the expected schema has heading text constraints
|
||||
if self._has_heading_text_constraints(schema):
|
||||
# For heading text validation, we need to extract actual content and compare against enum constraints
|
||||
return self._validate_with_heading_text_constraints(file_path, schema, document_schema)
|
||||
|
||||
# Use standard structure comparison for backward compatibility
|
||||
return self._compare_structures(document_schema, schema)
|
||||
|
||||
def validate_file_against_schema_string(self, file_path: Path, schema_json: str) -> bool:
|
||||
"""
|
||||
Validate a markdown file against a JSON schema provided as a string.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema_json: JSON schema as a string
|
||||
|
||||
Returns:
|
||||
True if the document matches the schema, False otherwise
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the markdown file doesn't exist
|
||||
InvalidSchemaError: If the schema is invalid JSON or schema
|
||||
"""
|
||||
try:
|
||||
schema = json.loads(schema_json)
|
||||
except json.JSONDecodeError as e:
|
||||
raise InvalidSchemaError(f"Invalid JSON schema string: {e}") from e
|
||||
|
||||
return self.validate_file_against_schema(file_path, schema)
|
||||
|
||||
def validate_file_against_schema_file(self, file_path: Path, schema_file_path: Path) -> bool:
|
||||
"""
|
||||
Validate a markdown file against a schema stored in a file.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema_file_path: Path to the JSON schema file
|
||||
|
||||
Returns:
|
||||
True if the document matches the schema, False otherwise
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If either file doesn't exist
|
||||
InvalidSchemaError: If the schema file is invalid
|
||||
"""
|
||||
if not schema_file_path.exists():
|
||||
raise FileNotFoundError(f"Schema file not found: {schema_file_path}")
|
||||
|
||||
try:
|
||||
schema_content = schema_file_path.read_text(encoding='utf-8')
|
||||
schema = json.loads(schema_content)
|
||||
except (IOError, json.JSONDecodeError) as e:
|
||||
raise InvalidSchemaError(f"Failed to load schema file {schema_file_path}: {e}") from e
|
||||
|
||||
return self.validate_file_against_schema(file_path, schema)
|
||||
|
||||
def _validate_schema(self, schema: Dict[str, Any]) -> None:
|
||||
"""
|
||||
Validate that a schema is a valid JSON Schema.
|
||||
|
||||
Args:
|
||||
schema: Schema dictionary to validate
|
||||
|
||||
Raises:
|
||||
InvalidSchemaError: If the schema is invalid
|
||||
"""
|
||||
try:
|
||||
# Check basic schema structure
|
||||
if not isinstance(schema, dict):
|
||||
raise InvalidSchemaError("Schema must be a dictionary")
|
||||
|
||||
# Basic schema validation
|
||||
if not schema.get('$schema') or not schema.get('type'):
|
||||
raise InvalidSchemaError("Schema must have '$schema' and 'type' fields")
|
||||
|
||||
# If jsonschema library is available, use it for full validation
|
||||
if self.jsonschema_available:
|
||||
jsonschema.validators.validator_for(schema).check_schema(schema)
|
||||
|
||||
except (SchemaError, TypeError, AttributeError) as e:
|
||||
raise InvalidSchemaError(f"Invalid JSON schema: {e}") from e
|
||||
|
||||
def _compare_structures(self, document_schema: Dict[str, Any], expected_schema: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Compare a document's actual structure against expected schema requirements.
|
||||
|
||||
This method performs the core validation logic by analyzing whether the
|
||||
document's generated schema satisfies the requirements defined in the
|
||||
expected schema.
|
||||
|
||||
Args:
|
||||
document_schema: Schema generated from the actual document
|
||||
expected_schema: Expected schema requirements
|
||||
|
||||
Returns:
|
||||
True if the document satisfies the expected schema requirements
|
||||
"""
|
||||
try:
|
||||
# Extract actual document structure
|
||||
doc_properties = document_schema.get('properties', {})
|
||||
expected_properties = expected_schema.get('properties', {})
|
||||
|
||||
# Check all required properties are present
|
||||
required_properties = expected_schema.get('required', [])
|
||||
for prop in required_properties:
|
||||
if prop not in doc_properties:
|
||||
return False
|
||||
|
||||
# Validate heading structure if specified
|
||||
if 'headings' in expected_properties and 'headings' in doc_properties:
|
||||
if not self._validate_heading_structure(
|
||||
doc_properties['headings'],
|
||||
expected_properties['headings']
|
||||
):
|
||||
return False
|
||||
|
||||
# Validate other structural elements
|
||||
structural_elements = ['paragraphs', 'lists', 'code_blocks', 'blockquotes', 'tables']
|
||||
for element in structural_elements:
|
||||
if element in expected_properties:
|
||||
if not self._validate_structural_element(
|
||||
doc_properties.get(element),
|
||||
expected_properties[element]
|
||||
):
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
except Exception:
|
||||
# If comparison fails for any reason, consider validation failed
|
||||
return False
|
||||
|
||||
def _validate_heading_structure(self, actual_headings: Dict[str, Any], expected_headings: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Validate heading structure against expected requirements.
|
||||
|
||||
Args:
|
||||
actual_headings: Actual heading structure from document
|
||||
expected_headings: Expected heading requirements
|
||||
|
||||
Returns:
|
||||
True if heading structure meets requirements
|
||||
"""
|
||||
actual_heading_props = actual_headings.get('properties', {})
|
||||
expected_heading_props = expected_headings.get('properties', {})
|
||||
required_heading_levels = expected_headings.get('required', [])
|
||||
|
||||
# Check required heading levels are present
|
||||
for level in required_heading_levels:
|
||||
if level not in actual_heading_props:
|
||||
return False
|
||||
|
||||
# Check each expected heading level meets requirements
|
||||
for level, expected_spec in expected_heading_props.items():
|
||||
if level not in actual_heading_props:
|
||||
# If level is not required, skip it
|
||||
if level not in required_heading_levels:
|
||||
continue
|
||||
return False
|
||||
|
||||
actual_spec = actual_heading_props[level]
|
||||
|
||||
# Check minimum and maximum item requirements
|
||||
if not self._validate_array_constraints(actual_spec, expected_spec):
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def _validate_structural_element(self, actual_element: Dict[str, Any], expected_element: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Validate a structural element (paragraphs, lists, etc.) against requirements.
|
||||
|
||||
Args:
|
||||
actual_element: Actual element structure from document
|
||||
expected_element: Expected element requirements
|
||||
|
||||
Returns:
|
||||
True if element meets requirements
|
||||
"""
|
||||
if actual_element is None:
|
||||
# Element doesn't exist in document
|
||||
return False
|
||||
|
||||
return self._validate_array_constraints(actual_element, expected_element)
|
||||
|
||||
def _validate_array_constraints(self, actual: Dict[str, Any], expected: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Validate array constraints (minItems, maxItems) for structural elements.
|
||||
|
||||
Args:
|
||||
actual: Actual element specification
|
||||
expected: Expected element specification
|
||||
|
||||
Returns:
|
||||
True if constraints are satisfied
|
||||
"""
|
||||
# Get actual count from the schema specification
|
||||
# For generated schemas, we use minItems/maxItems which represent actual counts
|
||||
actual_min = actual.get('minItems', 0)
|
||||
actual_max = actual.get('maxItems', actual_min)
|
||||
actual_count = actual_max # In our generated schemas, min=max=actual count
|
||||
|
||||
# Check against expected constraints
|
||||
expected_min = expected.get('minItems', 0)
|
||||
expected_max = expected.get('maxItems', float('inf'))
|
||||
|
||||
return expected_min <= actual_count <= expected_max
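The bounds check in `_validate_array_constraints` can be tried on its own with plain dictionaries. The sketch below assumes, as the comments above state, that a generated document schema stores the actual count in `minItems`/`maxItems`; the function name `count_within_bounds` is hypothetical.

```python
from typing import Any, Dict


def count_within_bounds(actual: Dict[str, Any], expected: Dict[str, Any]) -> bool:
    """Check a document's element count against expected minItems/maxItems.

    Hypothetical standalone version of the check above: the actual count is
    read from maxItems (falling back to minItems, then 0), and a missing
    expected bound is treated as unbounded.
    """
    actual_min = actual.get("minItems", 0)
    actual_count = actual.get("maxItems", actual_min)

    expected_min = expected.get("minItems", 0)
    expected_max = expected.get("maxItems", float("inf"))

    return expected_min <= actual_count <= expected_max


four_paragraphs = {"type": "array", "minItems": 4, "maxItems": 4}
print(count_within_bounds(four_paragraphs, {"minItems": 2, "maxItems": 10}))
print(count_within_bounds(four_paragraphs, {"minItems": 5}))
```

With these inputs the first call is satisfied (4 lies between 2 and 10) and the second is not (4 is below the minimum of 5).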
|
||||
|
||||
# Issue #8: Detailed Error Reporting Methods
|
||||
|
||||
def validate_file_with_errors(self, file_path: Path, schema: Dict[str, Any]) -> ValidationErrorCollector:
|
||||
"""
|
||||
Validate a markdown file against a JSON schema and collect detailed errors.
|
||||
|
||||
This method provides comprehensive error reporting for Issue #8, enabling
|
||||
users to understand exactly how their documents deviate from schemas.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema: JSON schema dictionary to validate against
|
||||
|
||||
Returns:
|
||||
ValidationErrorCollector with all validation errors
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the markdown file doesn't exist
|
||||
InvalidSchemaError: If the schema is invalid
|
||||
"""
|
||||
# Validate inputs
|
||||
if not file_path.exists():
|
||||
raise FileNotFoundError(f"Markdown file not found: {file_path}")
|
||||
|
||||
# Validate the schema itself
|
||||
self._validate_schema(schema)
|
||||
|
||||
# Initialize error collector
|
||||
error_collector = ValidationErrorCollector()
|
||||
|
||||
# Generate the document's current structure
|
||||
try:
|
||||
document_schema = self.schema_generator.generate_schema_from_file(file_path)
|
||||
except Exception as e:
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.STRUCTURAL_VIOLATION,
|
||||
f"Failed to generate document schema: {e}",
|
||||
"document.structure",
|
||||
suggestion="Check if the markdown file is properly formatted"
|
||||
)
|
||||
return error_collector
|
||||
|
||||
# Compare the document's structure against the expected schema and collect errors
|
||||
if self._has_heading_text_constraints(schema):
|
||||
# For heading text validation, we need to handle enum constraints specially
|
||||
self._compare_structures_with_errors(document_schema, schema, error_collector)
|
||||
self._validate_heading_text_constraints_with_errors(file_path, schema, error_collector)
|
||||
else:
|
||||
# Use standard structure comparison for backward compatibility
|
||||
self._compare_structures_with_errors(document_schema, schema, error_collector)
|
||||
|
||||
return error_collector
|
||||
|
||||
def validate_file_with_errors_string(self, file_path: Path, schema_json: str) -> ValidationErrorCollector:
|
||||
"""
|
||||
Validate a markdown file against a JSON schema string and collect detailed errors.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema_json: JSON schema as a string
|
||||
|
||||
Returns:
|
||||
ValidationErrorCollector with all validation errors
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the markdown file doesn't exist
|
||||
InvalidSchemaError: If the schema is invalid JSON or schema
|
||||
"""
|
||||
try:
|
||||
schema = json.loads(schema_json)
|
||||
except json.JSONDecodeError as e:
|
||||
raise InvalidSchemaError(f"Invalid JSON schema string: {e}") from e
|
||||
|
||||
return self.validate_file_with_errors(file_path, schema)
|
||||
|
||||
def validate_file_with_errors_file(self, file_path: Path, schema_file_path: Path) -> ValidationErrorCollector:
|
||||
"""
|
||||
Validate a markdown file against a schema file and collect detailed errors.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema_file_path: Path to the JSON schema file
|
||||
|
||||
Returns:
|
||||
ValidationErrorCollector with all validation errors
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If either file doesn't exist
|
||||
InvalidSchemaError: If the schema file is invalid
|
||||
"""
|
||||
if not schema_file_path.exists():
|
||||
raise FileNotFoundError(f"Schema file not found: {schema_file_path}")
|
||||
|
||||
try:
|
||||
schema_content = schema_file_path.read_text(encoding='utf-8')
|
||||
schema = json.loads(schema_content)
|
||||
except (IOError, json.JSONDecodeError) as e:
|
||||
raise InvalidSchemaError(f"Failed to load schema file {schema_file_path}: {e}") from e
|
||||
|
||||
return self.validate_file_with_errors(file_path, schema)
|
||||
|
||||
def _compare_structures_with_errors(
|
||||
self,
|
||||
document_schema: Dict[str, Any],
|
||||
expected_schema: Dict[str, Any],
|
||||
error_collector: ValidationErrorCollector
|
||||
) -> None:
|
||||
"""
|
||||
Compare document structure against expected schema and collect detailed errors.
|
||||
|
||||
This method performs comprehensive validation analysis, collecting specific
|
||||
errors about missing headings, incorrect content counts, and structural violations.
|
||||
|
||||
Args:
|
||||
document_schema: Schema generated from the actual document
|
||||
expected_schema: Expected schema requirements
|
||||
error_collector: Collector to accumulate validation errors
|
||||
"""
|
||||
try:
|
||||
# Extract actual document structure
|
||||
doc_properties = document_schema.get('properties', {})
|
||||
expected_properties = expected_schema.get('properties', {})
|
||||
|
||||
# Check all required properties are present
|
||||
required_properties = expected_schema.get('required', [])
|
||||
for prop in required_properties:
|
||||
if prop not in doc_properties:
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.MISSING_REQUIRED_SECTION,
|
||||
f"Missing required section: '{prop}'",
|
||||
f"document.{prop}",
|
||||
expected=f"Section '{prop}' is required by schema",
|
||||
actual="Section not found",
|
||||
suggestion=f"Add the '{prop}' section to your document"
|
||||
)
|
||||
|
||||
# Validate heading structure if specified
|
||||
if 'headings' in expected_properties and 'headings' in doc_properties:
|
||||
self._validate_heading_structure_with_errors(
|
||||
doc_properties['headings'],
|
||||
expected_properties['headings'],
|
||||
error_collector
|
||||
)
|
||||
|
||||
# Validate other structural elements
|
||||
structural_elements = ['paragraphs', 'lists', 'code_blocks', 'blockquotes', 'tables']
|
||||
for element in structural_elements:
|
||||
if element in expected_properties:
|
||||
self._validate_structural_element_with_errors(
|
||||
doc_properties.get(element),
|
||||
expected_properties[element],
|
||||
element,
|
||||
error_collector
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.STRUCTURAL_VIOLATION,
|
||||
f"Error during structure comparison: {e}",
|
||||
"document.structure",
|
||||
suggestion="Check if both the document and schema are properly formatted"
|
||||
)
|
||||

    def _validate_heading_structure_with_errors(
        self,
        actual_headings: Dict[str, Any],
        expected_headings: Dict[str, Any],
        error_collector: ValidationErrorCollector
    ) -> None:
        """
        Validate heading structure and collect detailed errors.

        Args:
            actual_headings: Actual heading structure from document
            expected_headings: Expected heading requirements
            error_collector: Collector for validation errors
        """
        actual_heading_props = actual_headings.get('properties', {})
        expected_heading_props = expected_headings.get('properties', {})
        required_heading_levels = expected_headings.get('required', [])

        # Check required heading levels are present
        for level in required_heading_levels:
            if level not in actual_heading_props:
                level_num = level.replace('level_', '')
                error_collector.add_error(
                    ValidationErrorType.MISSING_REQUIRED_HEADING,
                    f"Missing required heading level {level_num}",
                    f"headings.{level}",
                    expected=f"At least one heading at level {level_num}",
                    actual="No headings found at this level",
                    suggestion=f"Add heading(s) at level {level_num} (e.g., {'#' * int(level_num)} Heading)"
                )

        # Check each expected heading level meets requirements
        for level, expected_spec in expected_heading_props.items():
            if level not in actual_heading_props:
                # If level is not required, skip it
                if level not in required_heading_levels:
                    continue
                # Already handled above in required check

            else:
                actual_spec = actual_heading_props[level]
                level_num = level.replace('level_', '')

                # Check minimum and maximum item requirements
                self._validate_array_constraints_with_errors(
                    actual_spec,
                    expected_spec,
                    f"headings.{level}",
                    f"level {level_num} headings",
                    error_collector
                )

    def _validate_structural_element_with_errors(
        self,
        actual_element: Dict[str, Any],
        expected_element: Dict[str, Any],
        element_name: str,
        error_collector: ValidationErrorCollector
    ) -> None:
        """
        Validate a structural element and collect errors.

        Args:
            actual_element: Actual element structure from document
            expected_element: Expected element requirements
            element_name: Name of the structural element (for error messages)
            error_collector: Collector for validation errors
        """
        if actual_element is None:
            error_collector.add_error(
                ValidationErrorType.MISSING_REQUIRED_SECTION,
                f"Missing required structural element: {element_name}",
                f"content.{element_name}",
                expected=f"Document should contain {element_name}",
                actual="Element not found",
                suggestion=f"Add {element_name} to your document"
            )
            return

        self._validate_array_constraints_with_errors(
            actual_element,
            expected_element,
            f"content.{element_name}",
            element_name,
            error_collector
        )

    def _validate_array_constraints_with_errors(
        self,
        actual: Dict[str, Any],
        expected: Dict[str, Any],
        path: str,
        element_description: str,
        error_collector: ValidationErrorCollector
    ) -> None:
        """
        Validate array constraints and collect specific errors.

        Args:
            actual: Actual element specification
            expected: Expected element specification
            path: JSON path for error location
            element_description: Human-readable element description
            error_collector: Collector for validation errors
        """
        # Get actual count from the schema specification
        actual_min = actual.get('minItems', 0)
        actual_max = actual.get('maxItems', actual_min)
        actual_count = actual_max  # In our generated schemas, min=max=actual count

        # Check against expected constraints
        expected_min = expected.get('minItems', 0)
        expected_max = expected.get('maxItems', float('inf'))

        # Check minimum constraint
        if actual_count < expected_min:
            error_collector.add_error(
                ValidationErrorType.INSUFFICIENT_CONTENT,
                f"Insufficient {element_description}: found {actual_count}, required at least {expected_min}",
                path,
                expected=f"At least {expected_min} {element_description}",
                actual=f"{actual_count} {element_description}",
                suggestion=f"Add {expected_min - actual_count} more {element_description}"
            )

        # Check maximum constraint
        if expected_max != float('inf') and actual_count > expected_max:
            error_collector.add_error(
                ValidationErrorType.EXCESS_CONTENT,
                f"Too many {element_description}: found {actual_count}, maximum allowed {expected_max}",
                path,
                expected=f"At most {expected_max} {element_description}",
                actual=f"{actual_count} {element_description}",
                suggestion=f"Remove {actual_count - expected_max} {element_description}"
            )

    def _has_heading_text_constraints(self, schema: Dict[str, Any]) -> bool:
        """
        Check if the schema has heading text constraints (enum values on heading content).

        Args:
            schema: JSON schema to check

        Returns:
            True if schema has heading text constraints
        """
        headings_props = schema.get('properties', {}).get('headings', {}).get('properties', {})

        for level_props in headings_props.values():
            items = level_props.get('items', {})
            content_prop = items.get('properties', {}).get('content', {})
            if 'enum' in content_prop:
                return True

        return False

    def _validate_with_heading_text_constraints(
        self,
        file_path: Path,
        expected_schema: Dict[str, Any],
        document_schema: Dict[str, Any]
    ) -> bool:
        """
        Validate document with heading text constraints by comparing actual content against enum values.

        Args:
            file_path: Path to the markdown file
            expected_schema: Schema with heading text constraints
            document_schema: Generated schema from the actual document

        Returns:
            True if document meets all constraints including heading text
        """
        # First check standard structure compliance
        if not self._compare_structures(document_schema, expected_schema):
            return False

        # Then check heading text constraints
        expected_headings = expected_schema.get('properties', {}).get('headings', {}).get('properties', {})

        # Generate document analysis with actual heading content
        content = file_path.read_text(encoding='utf-8')
        ast_tokens = parse_markdown_to_ast(content)
        structure_analysis = self.schema_generator._analyze_ast_structure(ast_tokens, None)

        for level_key, expected_level_spec in expected_headings.items():
            content_constraints = expected_level_spec.get('items', {}).get('properties', {}).get('content', {})

            if 'enum' in content_constraints:
                allowed_texts = content_constraints['enum']
                actual_headings = structure_analysis['headings'].get(level_key, [])

                for heading in actual_headings:
                    actual_text = heading['content']
                    if actual_text not in allowed_texts:
                        return False

        return True

    def _validate_heading_text_constraints_with_errors(
        self,
        file_path: Path,
        expected_schema: Dict[str, Any],
        error_collector: ValidationErrorCollector
    ) -> None:
        """
        Validate heading text constraints and collect detailed errors.

        Args:
            file_path: Path to the markdown file
            expected_schema: Schema with heading text constraints
            error_collector: Collector for validation errors
        """
        expected_headings = expected_schema.get('properties', {}).get('headings', {}).get('properties', {})

        # Generate document analysis with actual heading content
        content = file_path.read_text(encoding='utf-8')
        ast_tokens = parse_markdown_to_ast(content)
        structure_analysis = self.schema_generator._analyze_ast_structure(ast_tokens, None)

        for level_key, expected_level_spec in expected_headings.items():
            content_constraints = expected_level_spec.get('items', {}).get('properties', {}).get('content', {})

            if 'enum' in content_constraints:
                allowed_texts = content_constraints['enum']
                actual_headings = structure_analysis['headings'].get(level_key, [])

                for i, heading in enumerate(actual_headings):
                    actual_text = heading['content']
                    if actual_text not in allowed_texts:
                        # Add detailed error about heading text mismatch
                        error_collector.add_error(
                            ValidationErrorType.HEADING_COUNT_MISMATCH,
                            f"Heading text mismatch at {level_key.replace('_', ' ')} #{i+1}: expected one of {allowed_texts}, found '{actual_text}'",
                            f"headings.{level_key}[{i}].content",
                            expected=f"One of: {allowed_texts}",
                            actual=actual_text,
                            suggestion=f"Change heading text to one of the allowed values: {', '.join(allowed_texts)}"
                        )
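The heading text check at the end of this file can be tried in isolation. The following is a minimal sketch, not the project's API: `find_heading_text_mismatches` and its `headings_by_level` argument are hypothetical stand-ins for the validator method and for the `structure_analysis['headings']` mapping, reduced to plain heading strings so that no file, parser, or error collector is needed.

```python
from typing import Any, Dict, List, Tuple


def find_heading_text_mismatches(
    expected_schema: Dict[str, Any],
    headings_by_level: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return (path, text) pairs for headings whose text is not in the allowed enum.

    Hypothetical helper: mirrors the enum lookup shown above, but takes the
    heading texts directly instead of parsing a markdown file.
    """
    expected = (
        expected_schema.get("properties", {})
        .get("headings", {})
        .get("properties", {})
    )
    mismatches: List[Tuple[str, str]] = []
    for level_key, level_spec in expected.items():
        content = level_spec.get("items", {}).get("properties", {}).get("content", {})
        if "enum" not in content:
            continue  # no text constraint at this heading level
        for i, text in enumerate(headings_by_level.get(level_key, [])):
            if text not in content["enum"]:
                mismatches.append((f"headings.{level_key}[{i}].content", text))
    return mismatches


# Schema fragment in the shape produced with heading text capture enabled.
schema = {
    "properties": {
        "headings": {
            "properties": {
                "level_2": {
                    "items": {
                        "properties": {"content": {"enum": ["Overview", "Usage"]}}
                    }
                }
            }
        }
    }
}

print(find_heading_text_mismatches(schema, {"level_2": ["Overview", "Notes"]}))
```

Only the second level-2 heading is reported, with the same `headings.<level>[<index>].content` path format that the validator passes to the error collector.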
@@ -1,352 +1,25 @@
 """
-Schema Analyzer for Phase 2: Schema Refinement Tools
+Schema Analyzer - Backward Compatibility Module.
 
-Analyzes JSON schemas to detect rigidity issues and provide suggestions
-for improvement using the Phase 1 classification system.
+This module re-exports from markitect.schema.analyzer for backward compatibility.
+New code should import from markitect.schema.analyzer directly.
 """
 
-from pathlib import Path
-from typing import Dict, Any, List, Optional, Tuple
-import json
-from dataclasses import dataclass, field
-from enum import Enum
+# Re-export from schema package for backward compatibility
+from markitect.schema.analyzer import (
+    SchemaAnalyzer,
+    SchemaAnalysisResult,
+    SchemaIssue,
+    IssueType,
+    IssueSeverity,
+    analyze_schema_cli,
+)
 
-
-class IssueType(Enum):
-    """Types of schema rigidity issues."""
-    EXACT_COUNT = "exact_count"
-    MISSING_CLASSIFICATIONS = "missing_classifications"
-    MISSING_CONTENT_INSTRUCTIONS = "missing_content_instructions"
-    OVERLY_SPECIFIC = "overly_specific"
-    NO_FLEXIBILITY = "no_flexibility"
-    DEPRECATED_EXTENSIONS = "deprecated_extensions"
-
-
-class IssueSeverity(Enum):
-    """Severity levels for schema issues."""
-    INFO = "info"
-    WARNING = "warning"
-    ERROR = "error"
-
-
-@dataclass
-class SchemaIssue:
-    """Represents a detected schema issue."""
-    issue_type: IssueType
-    severity: IssueSeverity
-    path: str
-    message: str
-    suggestion: str
-    current_value: Any = None
-    suggested_value: Any = None
-
-
-@dataclass
-class SchemaAnalysisResult:
-    """Results of schema analysis."""
-    is_rigid: bool
-    rigidity_score: int  # 0-100, higher = more rigid
-    issues: List[SchemaIssue] = field(default_factory=list)
-    has_classifications: bool = False
-    has_content_control: bool = False
-    uses_deprecated_extensions: bool = False
-
-    @property
-    def issue_count_by_severity(self) -> Dict[IssueSeverity, int]:
-        """Count issues by severity."""
-        counts = {severity: 0 for severity in IssueSeverity}
-        for issue in self.issues:
-            counts[issue.severity] += 1
-        return counts
-
-
-class SchemaAnalyzer:
-    """Analyzes schemas for rigidity and suggests improvements."""
-
-    def __init__(self):
-        """Initialize the schema analyzer."""
-        self.deprecated_extensions = [
-            "x-markitect-required-sections",
-            "x-markitect-recommended-sections",
-            "x-markitect-optional-sections"
-        ]
-
-    def analyze_schema(self, schema: Dict[str, Any]) -> SchemaAnalysisResult:
-        """
-        Analyze a schema for rigidity issues.
-
-        Args:
-            schema: The JSON schema to analyze
-
-        Returns:
-            SchemaAnalysisResult with detected issues and suggestions
-        """
-        result = SchemaAnalysisResult(is_rigid=False, rigidity_score=0)
-
-        # Check for Phase 1 features
-        result.has_classifications = "x-markitect-sections" in schema
-        result.has_content_control = "x-markitect-content-control" in schema
-
-        # Check for deprecated extensions
-        for deprecated in self.deprecated_extensions:
-            if deprecated in schema:
-                result.uses_deprecated_extensions = True
-                result.issues.append(SchemaIssue(
-                    issue_type=IssueType.DEPRECATED_EXTENSIONS,
-                    severity=IssueSeverity.WARNING,
-                    path=deprecated,
-                    message=f"Using deprecated extension '{deprecated}'",
-                    suggestion=f"Migrate to 'x-markitect-sections' with classification system"
-                ))
-
-        # Analyze properties for rigidity
-        if "properties" in schema:
-            self._analyze_properties(schema["properties"], result, "properties")
-
-        # Check for missing classifications
-        if not result.has_classifications:
-            result.issues.append(SchemaIssue(
-                issue_type=IssueType.MISSING_CLASSIFICATIONS,
-                severity=IssueSeverity.INFO,
-                path="root",
-                message="Schema does not use section classification system",
-                suggestion="Add 'x-markitect-sections' to classify sections as required/recommended/optional/discouraged/improper"
-            ))
-
-        # Check for missing content control
-        if not result.has_content_control:
-            result.issues.append(SchemaIssue(
-                issue_type=IssueType.MISSING_CONTENT_INSTRUCTIONS,
-                severity=IssueSeverity.INFO,
-                path="root",
-                message="Schema does not provide content control",
-                suggestion="Add 'x-markitect-content-control' for pattern validation and quality metrics"
-            ))
-
-        # Calculate rigidity score
-        result.rigidity_score = self._calculate_rigidity_score(result)
-        result.is_rigid = result.rigidity_score > 50
-
-        return result
-
-    def _analyze_properties(self, properties: Dict[str, Any], result: SchemaAnalysisResult, path: str):
-        """Analyze schema properties for rigidity issues."""
-        for prop_name, prop_def in properties.items():
-            prop_path = f"{path}.{prop_name}"
-
-            if not isinstance(prop_def, dict):
-                continue
-
-            # Check for exact counts (const)
-            if "const" in prop_def:
-                result.issues.append(SchemaIssue(
-                    issue_type=IssueType.EXACT_COUNT,
-                    severity=IssueSeverity.WARNING,
-                    path=prop_path,
-                    message=f"Property '{prop_name}' requires exact value",
-                    suggestion=f"Consider using a range or removing constraint for flexibility",
-                    current_value=prop_def["const"]
-                ))
-
-            # Check for arrays with exact counts
-            if prop_def.get("type") == "array":
-                min_items = prop_def.get("minItems")
-                max_items = prop_def.get("maxItems")
-
-                if min_items is not None and max_items is not None and min_items == max_items:
-                    result.issues.append(SchemaIssue(
-                        issue_type=IssueType.EXACT_COUNT,
-                        severity=IssueSeverity.WARNING,
-                        path=prop_path,
-                        message=f"Array '{prop_name}' requires exactly {min_items} items",
-                        suggestion=f"Use a range like minItems: {max(0, min_items - 2)}, maxItems: {min_items + 5}",
-                        current_value={"minItems": min_items, "maxItems": max_items},
-                        suggested_value={
-                            "minItems": max(0, min_items - 2),
-                            "maxItems": min_items + 5
-                        }
-                    ))
-
-                # Check for overly specific counts (large numbers)
-                if min_items is not None and min_items > 50:
-                    result.issues.append(SchemaIssue(
-                        issue_type=IssueType.OVERLY_SPECIFIC,
-                        severity=IssueSeverity.INFO,
-                        path=prop_path,
-                        message=f"Array '{prop_name}' has very specific minItems: {min_items}",
-                        suggestion=f"Consider rounding to {(min_items // 10) * 10} for flexibility",
-                        current_value=min_items,
-                        suggested_value=(min_items // 10) * 10
-                    ))
-
-            # Check for overly specific integer constraints
-            if prop_def.get("type") == "integer":
-                if "minimum" in prop_def and "maximum" in prop_def:
-                    min_val = prop_def["minimum"]
-                    max_val = prop_def["maximum"]
-                    range_size = max_val - min_val
-
-                    if range_size < 3:
-                        result.issues.append(SchemaIssue(
-                            issue_type=IssueType.NO_FLEXIBILITY,
-                            severity=IssueSeverity.INFO,
-                            path=prop_path,
-                            message=f"Integer '{prop_name}' has very narrow range: {min_val}-{max_val}",
-                            suggestion=f"Consider widening range for flexibility",
-                            current_value={"minimum": min_val, "maximum": max_val}
-                        ))
-
-            # Recursively check nested properties
-            if "properties" in prop_def:
-                self._analyze_properties(prop_def["properties"], result, prop_path)
-
-            # Check items schema for arrays
-            if "items" in prop_def and isinstance(prop_def["items"], dict):
-                if "properties" in prop_def["items"]:
-                    self._analyze_properties(
-                        prop_def["items"]["properties"],
-                        result,
-                        f"{prop_path}.items"
-                    )
-
-    def _calculate_rigidity_score(self, result: SchemaAnalysisResult) -> int:
-        """
-        Calculate overall rigidity score (0-100).
-
-        Higher score = more rigid schema.
-        """
-        score = 0
-
-        # Count issues by type with weighted scores
-        weights = {
-            IssueType.EXACT_COUNT: 15,
-            IssueType.OVERLY_SPECIFIC: 10,
-            IssueType.NO_FLEXIBILITY: 8,
-            IssueType.MISSING_CLASSIFICATIONS: 5,
-            IssueType.MISSING_CONTENT_INSTRUCTIONS: 3,
-            IssueType.DEPRECATED_EXTENSIONS: 5
-        }
-
-        for issue in result.issues:
-            score += weights.get(issue.issue_type, 5)
-
-        # Cap at 100
-        return min(100, score)
-
-    def analyze_schema_file(self, schema_path: Path) -> SchemaAnalysisResult:
-        """
-        Analyze a schema file.
-
-        Args:
-            schema_path: Path to JSON schema file
-
-        Returns:
-            SchemaAnalysisResult
-        """
-        with open(schema_path) as f:
-            schema = json.load(f)
-
-        return self.analyze_schema(schema)
-
-    def format_analysis_report(self, result: SchemaAnalysisResult, verbose: bool = False) -> str:
-        """
-        Format analysis results as a human-readable report.
-
-        Args:
-            result: Analysis results
-            verbose: Include detailed information
-
-        Returns:
-            Formatted report string
-        """
-        lines = []
-
-        # Header
-        lines.append("=" * 70)
-        lines.append("Schema Analysis Report")
-        lines.append("=" * 70)
-        lines.append("")
-
-        # Overall assessment
-        rigidity_level = "HIGH" if result.rigidity_score > 70 else "MEDIUM" if result.rigidity_score > 40 else "LOW"
-        lines.append(f"Rigidity Score: {result.rigidity_score}/100 ({rigidity_level})")
-        lines.append(f"Status: {'RIGID - Needs refinement' if result.is_rigid else 'FLEXIBLE - Good'}")
-        lines.append("")
-
-        # Features check
-        lines.append("Phase 1 Features:")
-        lines.append(f" ✓ Classifications: {'Yes' if result.has_classifications else 'No'}")
-        lines.append(f" ✓ Content Control: {'Yes' if result.has_content_control else 'No'}")
-        if result.uses_deprecated_extensions:
-            lines.append(f" ⚠ Deprecated Extensions: Yes (needs migration)")
-        lines.append("")
-
-        # Issue summary
-        counts = result.issue_count_by_severity
-        lines.append(f"Issues Found: {len(result.issues)} total")
-        lines.append(f" - Errors: {counts[IssueSeverity.ERROR]}")
-        lines.append(f" - Warnings: {counts[IssueSeverity.WARNING]}")
-        lines.append(f" - Info: {counts[IssueSeverity.INFO]}")
-        lines.append("")
-
-        # List issues
-        if result.issues:
-            lines.append("Detected Issues:")
-            lines.append("-" * 70)
-
-            for i, issue in enumerate(result.issues, 1):
-                severity_icon = "❌" if issue.severity == IssueSeverity.ERROR else "⚠️ " if issue.severity == IssueSeverity.WARNING else "ℹ️ "
-                lines.append(f"{i}. {severity_icon} {issue.message}")
-                lines.append(f" Path: {issue.path}")
-                lines.append(f" Suggestion: {issue.suggestion}")
-
-                if verbose and issue.current_value is not None:
-                    lines.append(f" Current: {json.dumps(issue.current_value)}")
-                if verbose and issue.suggested_value is not None:
-                    lines.append(f" Suggested: {json.dumps(issue.suggested_value)}")
-
-                lines.append("")
-        else:
-            lines.append("✅ No issues found - schema is well-designed!")
-            lines.append("")
-
-        # Recommendations
-        if result.is_rigid:
-            lines.append("Recommendations:")
-            lines.append("-" * 70)
-            lines.append("Run: markitect schema-refine <schema-file> --loosen-counts")
-            lines.append(" to automatically apply suggested improvements")
-            lines.append("")
-
-        return "\n".join(lines)
-
-
-def analyze_schema_cli(schema_path: str, verbose: bool = False) -> int:
-    """
-    CLI entry point for schema analysis.
-
-    Args:
-        schema_path: Path to schema file
-        verbose: Show detailed information
-
-    Returns:
-        Exit code (0 = success, 1 = rigid schema found)
-    """
-    analyzer = SchemaAnalyzer()
-
-    try:
-        result = analyzer.analyze_schema_file(Path(schema_path))
-        report = analyzer.format_analysis_report(result, verbose=verbose)
-        print(report)
-
-        return 1 if result.is_rigid else 0
-
-    except FileNotFoundError:
-        print(f"Error: Schema file not found: {schema_path}")
-        return 2
-    except json.JSONDecodeError as e:
-        print(f"Error: Invalid JSON in schema file: {e}")
-        return 2
-    except Exception as e:
-        print(f"Error: {e}")
-        return 2
+__all__ = [
+    'SchemaAnalyzer',
+    'SchemaAnalysisResult',
+    'SchemaIssue',
+    'IssueType',
+    'IssueSeverity',
+    'analyze_schema_cli',
+]
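The rigidity scoring that this hunk moves out of the module is easy to reason about with a small standalone calculation. The sketch below is illustrative only: `rigidity_score` and `is_rigid` are hypothetical free functions, and issues are represented by the string values of `IssueType` rather than by `SchemaIssue` objects. The weights, the default of 5 for unknown types, the cap at 100, and the `> 50` threshold are taken from the code above.

```python
from typing import Dict, List

# Weights copied from _calculate_rigidity_score; keys are IssueType values.
WEIGHTS: Dict[str, int] = {
    "exact_count": 15,
    "overly_specific": 10,
    "no_flexibility": 8,
    "missing_classifications": 5,
    "missing_content_instructions": 3,
    "deprecated_extensions": 5,
}


def rigidity_score(issue_types: List[str]) -> int:
    """Sum the per-issue weights (unknown types count as 5) and cap at 100."""
    return min(100, sum(WEIGHTS.get(issue_type, 5) for issue_type in issue_types))


def is_rigid(issue_types: List[str]) -> bool:
    """A schema is reported as rigid once its score exceeds 50."""
    return rigidity_score(issue_types) > 50


# Three arrays with minItems == maxItems, and neither extension key present.
issues = ["exact_count"] * 3 + [
    "missing_classifications",
    "missing_content_instructions",
]
print(rigidity_score(issues), is_rigid(issues))  # prints: 53 True
```

Because generated schemas pin `minItems` and `maxItems` to the observed count, a document with three or more populated element types will typically cross the threshold before any refinement.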
@@ -1,466 +1,11 @@
 """
-Schema Generator for Issue #5: Generate a Schema from a Markdown File.
+Schema Generator - Backward Compatibility Module.
 
-This module provides functionality to analyze markdown AST structures and generate
-JSON schemas that describe the document's structural elements with configurable
-depth limitations for architectural documentation analysis.
+This module re-exports from markitect.schema.generator for backward compatibility.
+New code should import from markitect.schema.generator directly.
 """
 
-import json
-from collections import defaultdict
-from pathlib import Path
-from typing import Dict, List, Any, Optional, Set
+# Re-export from schema package for backward compatibility
+from markitect.schema.generator import SchemaGenerator
 
-from .parser import parse_markdown_to_ast
-from .exceptions import FileNotFoundError, InvalidDepthError, InvalidInstructionTypeError
-
-
-class SchemaGenerator:
-    """
-    Generates JSON schemas from markdown file AST structures.
-
-    Analyzes the structural elements of markdown documents and creates
-    JSON schemas that can be used for validation and compliance checking
-    in architecture documentation workflows.
-    """
-
-    def __init__(self):
-        """Initialize the schema generator."""
-        self.default_schema_url = "http://json-schema.org/draft-07/schema#"
-
-    def generate_schema_from_file(
-        self,
-        file_path: Path,
-        max_depth: Optional[int] = None,
-        mode: Optional[str] = None,
-        outline_depth: Optional[int] = None,
-        capture_heading_text: bool = False,
-        include_content_instructions: bool = False,
-        instruction_type: str = 'description'
-    ) -> Dict[str, Any]:
-        """
-        Generate a JSON schema from a markdown file's AST structure.
-
-        Args:
-            file_path: Path to the markdown file
-            max_depth: Maximum heading depth to include (None = unlimited)
-            mode: Generation mode ('outline' for structure-focused schemas)
-            outline_depth: Depth limit for outline mode
-            capture_heading_text: Whether to capture exact heading text as constraints
-            include_content_instructions: Whether to include content instruction fields
-            instruction_type: Type of content instructions ('description', 'example', 'constraint', 'template')
-
-        Returns:
-            JSON schema as a dictionary
-
-        Raises:
-            FileNotFoundError: If the markdown file doesn't exist
-            InvalidDepthError: If max_depth is invalid (< 1)
-        """
-        # Validate inputs
-        if not file_path.exists():
-            raise FileNotFoundError(f"Markdown file not found: {file_path}")
-
-        if max_depth is not None and max_depth < 1:
-            raise InvalidDepthError(f"max_depth must be >= 1, got: {max_depth}")
-
-        # Validate instruction type
-        valid_instruction_types = {'description', 'example', 'constraint', 'template'}
-        if instruction_type not in valid_instruction_types:
-            raise InvalidInstructionTypeError(f"Invalid instruction type '{instruction_type}'. Must be one of: {', '.join(valid_instruction_types)}")
-
-        # Read and parse the markdown file
-        content = file_path.read_text(encoding='utf-8')
-        ast_tokens = parse_markdown_to_ast(content)
-
-        # Analyze the AST structure
-        structure_analysis = self._analyze_ast_structure(ast_tokens, max_depth)
-
-        # Generate the JSON schema
-        schema = self._create_json_schema(
-            structure_analysis,
-            file_path.name,
-            mode=mode,
-            outline_depth=outline_depth,
-            capture_heading_text=capture_heading_text,
-            include_content_instructions=include_content_instructions,
-            instruction_type=instruction_type
-        )
-
-        return schema
-
-    def _analyze_ast_structure(self, tokens: List[Dict[str, Any]], max_depth: Optional[int]) -> Dict[str, Any]:
-        """
-        Analyze AST tokens to extract structural patterns.
-
-        Args:
-            tokens: List of AST tokens from markdown-it
-            max_depth: Maximum heading depth to analyze
-
-        Returns:
-            Dictionary containing structural analysis
-        """
-        analysis = {
-            'headings': defaultdict(list),
-            'paragraphs': [],
-            'lists': [],
-            'code_blocks': [],
-            'blockquotes': [],
-            'tables': [],
-            'links': [],
-            'images': [],
-            'emphasis': [],
-            'structure_types': set()
-        }
-
-        current_heading_level = 0
-        i = 0
-
-        while i < len(tokens):
-            token = tokens[i]
-            token_type = token.get('type', '')
-
-            # Track all structural types found
-            analysis['structure_types'].add(token_type)
-
-            # Analyze headings with depth filtering
-            if token_type == 'heading_open':
-                level = self._extract_heading_level(token.get('tag', ''))
-                if max_depth is None or level <= max_depth:
-                    heading_content = self._extract_heading_content(tokens, i)
-                    analysis['headings'][f'level_{level}'].append({
-                        'content': heading_content,
-                        'level': level,
-                        'position': i
-                    })
-                    current_heading_level = level
-
-            # Analyze paragraphs
-            elif token_type == 'paragraph_open':
-                paragraph_content = self._extract_paragraph_content(tokens, i)
-                analysis['paragraphs'].append({
-                    'content': paragraph_content,
-                    'position': i,
-                    'under_heading_level': current_heading_level
-                })
-
-            # Analyze lists
-            elif token_type in ['bullet_list_open', 'ordered_list_open']:
-                list_structure = self._extract_list_structure(tokens, i)
-                analysis['lists'].append({
-                    'type': 'bullet' if token_type == 'bullet_list_open' else 'ordered',
-                    'structure': list_structure,
-                    'position': i,
-                    'under_heading_level': current_heading_level
-                })
-
-            # Analyze code blocks
-            elif token_type == 'code_block' or token_type == 'fence':
-                code_info = self._extract_code_block_info(token)
-                analysis['code_blocks'].append({
-                    'language': code_info.get('language', ''),
-                    'content_length': len(code_info.get('content', '')),
-                    'position': i,
-                    'under_heading_level': current_heading_level
-                })
-
-            # Analyze blockquotes
-            elif token_type == 'blockquote_open':
-                quote_content = self._extract_blockquote_content(tokens, i)
-                analysis['blockquotes'].append({
-                    'content': quote_content,
-                    'position': i,
-                    'under_heading_level': current_heading_level
-                })
-
-            # Analyze tables
-            elif token_type == 'table_open':
-                table_structure = self._extract_table_structure(tokens, i)
-                analysis['tables'].append({
-                    'columns': table_structure.get('columns', 0),
-                    'rows': table_structure.get('rows', 0),
-                    'position': i,
-                    'under_heading_level': current_heading_level
-                })
-
-            # Analyze inline elements
-            elif token_type == 'inline':
-                inline_analysis = self._analyze_inline_content(token)
-                analysis['links'].extend(inline_analysis.get('links', []))
-                analysis['images'].extend(inline_analysis.get('images', []))
-                analysis['emphasis'].extend(inline_analysis.get('emphasis', []))
-
-            i += 1
-
-        # Convert sets to lists for JSON serialization
-        analysis['structure_types'] = list(analysis['structure_types'])
-
-        return analysis
-
-    def _create_json_schema(
-        self,
-        analysis: Dict[str, Any],
-        filename: str,
-        mode: Optional[str] = None,
-        outline_depth: Optional[int] = None,
-        capture_heading_text: bool = False,
-        include_content_instructions: bool = False,
-        instruction_type: str = 'description'
-    ) -> Dict[str, Any]:
-        """
-        Create a JSON schema from structural analysis.
-
-        Args:
-            analysis: Structural analysis of the document
-            filename: Name of the source file
-            mode: Generation mode ('outline' for structure-focused schemas)
-            outline_depth: Depth limit for outline mode
-            capture_heading_text: Whether to capture exact heading text as constraints
-            include_content_instructions: Whether to include content instruction fields
-            instruction_type: Type of content instructions to generate
-
-        Returns:
-            JSON schema dictionary
-        """
-        # Determine title format based on mode
-        title_preposition = "from" if mode == "outline" else "for"
-
-        schema = {
-            "$schema": self.default_schema_url,
-            "type": "object",
-            "title": f"Schema {title_preposition} {filename}",
-            "description": f"JSON schema describing the structure of {filename}",
-            "properties": {}
-        }
-
-        # Add metaschema extensions for outline mode
-        if mode == "outline":
-            schema["x-markitect-outline-mode"] = True
-            if outline_depth is not None:
-                schema["x-markitect-outline-depth"] = outline_depth
-
-        # Add metaschema extension for heading text capture
-        if capture_heading_text:
-            schema["x-markitect-heading-text-capture"] = True
-
-        # Add metaschema extension for content instructions
-        if include_content_instructions:
-            schema["x-markitect-content-instructions-enabled"] = True
-
-        # Add heading structure
-        if analysis['headings']:
-            heading_properties = {}
-            for level_key, headings in analysis['headings'].items():
-                if headings:  # Only include levels that have content
-                    # Configure content property based on heading text capture
-                    if capture_heading_text:
-                        # Extract actual heading texts in document order
-                        heading_texts = [heading['content'] for heading in headings]
-                        content_property = {"enum": heading_texts}
-                    else:
-                        content_property = {"type": "string"}
-
-                    # Build properties for the heading item
-                    item_properties = {
-                        "content": content_property,
-                        "level": {"type": "integer"},
-                        "position": {"type": "integer"}
-                    }
-
-                    # Add content instruction fields if enabled
-                    if include_content_instructions:
-                        # Generate appropriate instruction text based on heading level
-                        level_num = int(level_key.split('_')[1])
-                        section_name = f"level {level_num} heading"
-                        instruction_text = self._generate_content_instruction(section_name, instruction_type)
-
-                        item_properties["x-markitect-content-instructions"] = {
-                            "type": "string",
-                            "const": instruction_text
-                        }
-
-                        item_properties["x-markitect-instruction-type"] = {
-                            "type": "string",
-                            "enum": [instruction_type]
-                        }
-
-                    heading_properties[level_key] = {
-                        "type": "array",
-                        "description": f"Headings at {level_key.replace('_', ' ')}",
-                        "items": {
-                            "type": "object",
-                            "properties": item_properties,
-                            "required": ["content", "level"]
-                        },
-                        "minItems": len(headings),
-                        "maxItems": len(headings)
-                    }
-
-            if heading_properties:
-                schema["properties"]["headings"] = {
-                    "type": "object",
-                    "description": "Document heading structure",
-                    "properties": heading_properties
-                }
-
# Add other structural elements
|
||||
structural_elements = {
|
||||
"paragraphs": ("Text paragraphs", analysis['paragraphs']),
|
||||
"lists": ("Lists (ordered and unordered)", analysis['lists']),
|
||||
"code_blocks": ("Code blocks and fenced code", analysis['code_blocks']),
|
||||
"blockquotes": ("Block quotations", analysis['blockquotes']),
|
||||
"tables": ("Tables with rows and columns", analysis['tables']),
|
||||
"links": ("Links to external resources", analysis['links']),
|
||||
"images": ("Embedded images", analysis['images']),
|
||||
"emphasis": ("Text emphasis (bold, italic)", analysis['emphasis'])
|
||||
}
|
||||
|
||||
for element_name, (description, element_list) in structural_elements.items():
|
||||
if element_list:
|
||||
# Build base schema for the element
|
||||
element_schema = {
|
||||
"type": "array",
|
||||
"description": description,
|
||||
"minItems": len(element_list),
|
||||
"maxItems": len(element_list)
|
||||
}
|
||||
|
||||
# Add content instructions for paragraphs and lists if enabled
|
||||
if include_content_instructions and element_name in ["paragraphs", "lists"]:
|
||||
element_schema["items"] = {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"content": {"type": "string"},
|
||||
"x-markitect-content-instructions": {
|
||||
"type": "string",
|
||||
"const": self._generate_content_instruction(element_name, instruction_type)
|
||||
},
|
||||
"x-markitect-instruction-type": {
|
||||
"type": "string",
|
||||
"enum": [instruction_type]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
schema["properties"][element_name] = element_schema
|
||||
|
||||
# Add metadata
|
||||
schema["properties"]["metadata"] = {
|
||||
"type": "object",
|
||||
"description": "Document structure metadata",
|
||||
"properties": {
|
||||
"total_elements": {
|
||||
"type": "integer",
|
||||
"const": sum(len(v) if isinstance(v, list) else 0 for v in analysis.values())
|
||||
},
|
||||
"structure_types": {
|
||||
"type": "array",
|
||||
"items": {"type": "string"},
|
||||
"description": "All structural element types found",
|
||||
"const": analysis['structure_types']
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return schema
|
||||
|
||||
def _extract_heading_level(self, tag: str) -> int:
|
||||
"""Extract heading level from HTML tag (h1, h2, etc.)."""
|
||||
if tag.startswith('h') and len(tag) == 2:
|
||||
try:
|
||||
return int(tag[1])
|
||||
except ValueError:
|
||||
pass
|
||||
return 1
|
||||
|
||||
def _extract_heading_content(self, tokens: List[Dict[str, Any]], start_index: int) -> str:
|
||||
"""Extract text content from heading tokens."""
|
||||
# Look for the inline token that contains the heading text
|
||||
for i in range(start_index, min(start_index + 3, len(tokens))):
|
||||
token = tokens[i]
|
||||
if token.get('type') == 'inline':
|
||||
return token.get('content', '')
|
||||
return ''
|
||||
|
||||
def _extract_paragraph_content(self, tokens: List[Dict[str, Any]], start_index: int) -> str:
|
||||
"""Extract text content from paragraph tokens."""
|
||||
# Look for the inline token that contains the paragraph text
|
||||
for i in range(start_index, min(start_index + 3, len(tokens))):
|
||||
token = tokens[i]
|
||||
if token.get('type') == 'inline':
|
||||
return token.get('content', '')
|
||||
return ''
|
||||
|
||||
def _extract_list_structure(self, tokens: List[Dict[str, Any]], start_index: int) -> Dict[str, Any]:
|
||||
"""Extract list structure information."""
|
||||
# This is a simplified implementation
|
||||
# In a full implementation, we'd parse the nested list structure
|
||||
return {
|
||||
"type": "list",
|
||||
"estimated_items": 1 # Placeholder - would need more complex parsing
|
||||
}
|
||||
|
||||
def _extract_code_block_info(self, token: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Extract code block information."""
|
||||
return {
|
||||
"language": token.get('info', '').split()[0] if token.get('info') else '',
|
||||
"content": token.get('content', '')
|
||||
}
|
||||
|
||||
def _extract_blockquote_content(self, tokens: List[Dict[str, Any]], start_index: int) -> str:
|
||||
"""Extract blockquote content."""
|
||||
# Simplified implementation
|
||||
return "blockquote content"
|
||||
|
||||
def _extract_table_structure(self, tokens: List[Dict[str, Any]], start_index: int) -> Dict[str, Any]:
|
||||
"""Extract table structure information."""
|
||||
# Simplified implementation
|
||||
return {
|
||||
"columns": 2, # Placeholder
|
||||
"rows": 1 # Placeholder
|
||||
}
|
||||
|
||||
def _analyze_inline_content(self, token: Dict[str, Any]) -> Dict[str, List[Any]]:
|
||||
"""Analyze inline content for links, images, emphasis."""
|
||||
result = {
|
||||
"links": [],
|
||||
"images": [],
|
||||
"emphasis": []
|
||||
}
|
||||
|
||||
# Analyze children tokens if they exist
|
||||
children = token.get('children', [])
|
||||
for child in children:
|
||||
if child and isinstance(child, dict):
|
||||
child_type = child.get('type', '')
|
||||
if child_type == 'link_open':
|
||||
result['links'].append({"type": "link"})
|
||||
elif child_type == 'image':
|
||||
result['images'].append({"type": "image"})
|
||||
elif child_type in ['em_open', 'strong_open']:
|
||||
result['emphasis'].append({"type": child_type})
|
||||
|
||||
return result
|
||||
|
||||
def _generate_content_instruction(self, heading_text: str, instruction_type: str) -> str:
|
||||
"""
|
||||
Generate appropriate content instruction text based on heading and instruction type.
|
||||
|
||||
Args:
|
||||
heading_text: The text of the heading
|
||||
instruction_type: Type of instruction to generate
|
||||
|
||||
Returns:
|
||||
Instruction text for the content field
|
||||
"""
|
||||
if instruction_type == "description":
|
||||
return f"Provide content for the '{heading_text}' section"
|
||||
elif instruction_type == "example":
|
||||
return f"Example content for the '{heading_text}' section"
|
||||
elif instruction_type == "constraint":
|
||||
return f"Content must be relevant to '{heading_text}'"
|
||||
elif instruction_type == "template":
|
||||
return f"Template content for '{heading_text}' section"
|
||||
else:
|
||||
# Default fallback
|
||||
return f"Content for the '{heading_text}' section"
|
||||
__all__ = ['SchemaGenerator']
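As a rough illustration of the heading portion of the generated schema, the following self-contained sketch mirrors the logic above outside the class. The function name `build_heading_properties` and the sample input are assumptions made for illustration; they are not part of MarkiTect.

```python
from typing import Any, Dict, List


def build_heading_properties(
    headings_by_level: Dict[str, List[Dict[str, Any]]],
    capture_heading_text: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build per-level heading schemas from analysis data."""
    heading_properties: Dict[str, Any] = {}
    for level_key, headings in headings_by_level.items():
        if not headings:  # skip levels with no headings
            continue
        if capture_heading_text:
            # Pin the allowed heading texts to those seen in the document
            content_property: Dict[str, Any] = {"enum": [h["content"] for h in headings]}
        else:
            content_property = {"type": "string"}
        heading_properties[level_key] = {
            "type": "array",
            "description": f"Headings at {level_key.replace('_', ' ')}",
            "items": {
                "type": "object",
                "properties": {
                    "content": content_property,
                    "level": {"type": "integer"},
                    "position": {"type": "integer"},
                },
                "required": ["content", "level"],
            },
            # Exact count: only the same number of headings is accepted
            "minItems": len(headings),
            "maxItems": len(headings),
        }
    return heading_properties


# Assumed sample input in the shape of analysis['headings']
sample = {
    "level_1": [{"content": "Title", "level": 1, "position": 0}],
    "level_2": [
        {"content": "Overview", "level": 2, "position": 1},
        {"content": "Usage", "level": 2, "position": 2},
    ],
    "level_3": [],
}
props = build_heading_properties(sample, capture_heading_text=True)
```

Because `minItems` and `maxItems` are both set to the observed count, a document validates only if it has exactly as many headings at each level as the analyzed one.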
|
||||
|
||||
@@ -1,610 +1,23 @@
|
||||
"""
|
||||
Schema Loader - Extract JSON schemas from markdown files.
|
||||
Schema Loader - Backward Compatibility Module.
|
||||
|
||||
This module provides functionality to load schemas from markdown files that
|
||||
contain embedded JSON schemas in code blocks, along with YAML frontmatter
|
||||
metadata and rich documentation.
|
||||
|
||||
Markdown Schema Format:
|
||||
---
|
||||
schema-id: "https://markitect.dev/schemas/domain/v1"
|
||||
version: "1.0.0"
|
||||
status: "stable|draft|deprecated"
|
||||
---
|
||||
|
||||
# Schema Title v1.0
|
||||
|
||||
## Documentation sections...
|
||||
|
||||
## Schema Definition
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "http://json-schema.org/draft-07/schema#",
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
This enables:
|
||||
- Rich documentation alongside schemas
|
||||
- Version history in same file
|
||||
- Human-readable schema files
|
||||
- Markdown-first approach aligned with MarkiTect philosophy
|
||||
This module re-exports from markitect.schema.loader for backward compatibility.
|
||||
New code should import from markitect.schema.loader directly.
|
||||
"""
|
||||
|
||||
import re
|
||||
import json
|
||||
import yaml
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, Optional, List, Tuple
|
||||
|
||||
|
||||
class SchemaLoaderError(Exception):
|
||||
"""Base exception for schema loading errors."""
|
||||
pass
|
||||
|
||||
|
||||
class InvalidSchemaFormatError(SchemaLoaderError):
|
||||
"""Schema file format is invalid."""
|
||||
pass
|
||||
|
||||
|
||||
class SchemaNotFoundError(SchemaLoaderError):
|
||||
"""No JSON schema found in markdown file."""
|
||||
pass
|
||||
|
||||
|
||||
class MarkdownSchemaLoader:
|
||||
"""
|
||||
Load and parse markdown schema files.
|
||||
|
||||
Supports:
|
||||
- YAML frontmatter for metadata
|
||||
- JSON code blocks for schema definition
|
||||
- Validation of schema structure
|
||||
- Metadata merging
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> schema_data = loader.load_schema(Path("manpage-schema-v1.0.md"))
|
||||
>>> schema = schema_data['schema']
|
||||
>>> metadata = schema_data['metadata']
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the schema loader with regex patterns."""
|
||||
# Pattern to match YAML frontmatter
|
||||
# Matches: --- ... --- at start of file
|
||||
self.frontmatter_pattern = re.compile(
|
||||
r'\A---\s*\n(.*?)\n---\s*\n',
|
||||
re.DOTALL
|
||||
)
|
||||
|
||||
# Pattern to match JSON code blocks
|
||||
# Matches: ```json ... ```
|
||||
self.json_code_block_pattern = re.compile(
|
||||
r'```json\s*\n(.*?)\n```',
|
||||
re.DOTALL | re.MULTILINE
|
||||
)
|
||||
|
||||
# Pattern to find Schema Definition section
|
||||
# This helps us find the right JSON block if there are multiple
|
||||
self.schema_section_pattern = re.compile(
|
||||
r'##\s+Schema Definition\s*\n',
|
||||
re.MULTILINE
|
||||
)
|
||||
|
||||
def load_schema(self, md_path: Path) -> Dict[str, Any]:
|
||||
"""
|
||||
Load schema from markdown file.
|
||||
|
||||
Args:
|
||||
md_path: Path to markdown schema file
|
||||
|
||||
Returns:
|
||||
Dictionary containing:
|
||||
- schema: Extracted JSON schema (dict)
|
||||
- metadata: Frontmatter metadata (dict)
|
||||
- documentation: Full markdown content (str)
|
||||
- source_file: Source file path (str)
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If schema file doesn't exist
|
||||
InvalidSchemaFormatError: If file format is invalid
|
||||
SchemaNotFoundError: If no JSON schema found
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> data = loader.load_schema(Path("manpage-schema-v1.0.md"))
|
||||
>>> print(data['schema']['title'])
|
||||
'Unix Manual Page Schema'
|
||||
"""
|
||||
if not md_path.exists():
|
||||
raise FileNotFoundError(f"Schema file not found: {md_path}")
|
||||
|
||||
# Read file content
|
||||
try:
|
||||
content = md_path.read_text(encoding='utf-8')
|
||||
except Exception as e:
|
||||
raise InvalidSchemaFormatError(f"Failed to read schema file: {e}") from e
|
||||
|
||||
# Extract frontmatter
|
||||
metadata = self._extract_frontmatter(content)
|
||||
|
||||
# Extract JSON schema
|
||||
schema = self._extract_json_schema(content)
|
||||
|
||||
if not schema:
|
||||
raise SchemaNotFoundError(
|
||||
f"No JSON schema found in {md_path}. "
|
||||
f"Expected a ```json code block with schema definition."
|
||||
)
|
||||
|
||||
# Merge metadata into schema
|
||||
schema = self._merge_metadata(schema, metadata, md_path)
|
||||
|
||||
return {
|
||||
'schema': schema,
|
||||
'metadata': metadata,
|
||||
'documentation': content,
|
||||
'source_file': str(md_path)
|
||||
}
|
||||
|
||||
def _extract_frontmatter(self, content: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Extract YAML frontmatter from markdown content.
|
||||
|
||||
Args:
|
||||
content: Markdown file content
|
||||
|
||||
Returns:
|
||||
Dictionary of frontmatter metadata (empty if none found)
|
||||
|
||||
Raises:
|
||||
InvalidSchemaFormatError: If YAML is malformed
|
||||
"""
|
||||
match = self.frontmatter_pattern.search(content)
|
||||
if not match:
|
||||
return {}
|
||||
|
||||
yaml_content = match.group(1)
|
||||
try:
|
||||
metadata = yaml.safe_load(yaml_content) or {}
|
||||
if not isinstance(metadata, dict):
|
||||
raise InvalidSchemaFormatError(
|
||||
f"Frontmatter must be a YAML dictionary, got {type(metadata)}"
|
||||
)
|
||||
return metadata
|
||||
except yaml.YAMLError as e:
|
||||
raise InvalidSchemaFormatError(f"Invalid YAML frontmatter: {e}") from e
|
||||
|
||||
def _extract_json_schema(self, content: str) -> Optional[Dict[str, Any]]:
|
||||
"""
|
||||
Extract JSON schema from markdown code blocks.
|
||||
|
||||
Prefers JSON blocks under "## Schema Definition" section,
|
||||
but will use first JSON block if no Schema Definition section found.
|
||||
|
||||
Args:
|
||||
content: Markdown file content
|
||||
|
||||
Returns:
|
||||
JSON schema dictionary or None if not found
|
||||
|
||||
Raises:
|
||||
InvalidSchemaFormatError: If JSON is malformed
|
||||
"""
|
||||
# Find all JSON code blocks
|
||||
json_blocks = self.json_code_block_pattern.findall(content)
|
||||
|
||||
if not json_blocks:
|
||||
return None
|
||||
|
||||
# Try to find the Schema Definition section
|
||||
schema_section_match = self.schema_section_pattern.search(content)
|
||||
|
||||
if schema_section_match:
|
||||
# Find JSON block that comes after Schema Definition section
|
||||
section_pos = schema_section_match.end()
|
||||
|
||||
# Re-search for JSON blocks starting from section position
|
||||
remaining_content = content[section_pos:]
|
||||
section_json_blocks = self.json_code_block_pattern.findall(remaining_content)
|
||||
|
||||
if section_json_blocks:
|
||||
json_text = section_json_blocks[0]
|
||||
else:
|
||||
# Fallback to first JSON block in entire document
|
||||
json_text = json_blocks[0]
|
||||
else:
|
||||
# No Schema Definition section, use first JSON block
|
||||
json_text = json_blocks[0]
|
||||
|
||||
# Parse JSON
|
||||
try:
|
||||
schema = json.loads(json_text)
|
||||
if not isinstance(schema, dict):
|
||||
raise InvalidSchemaFormatError(
|
||||
f"Schema must be a JSON object, got {type(schema)}"
|
||||
)
|
||||
return schema
|
||||
except json.JSONDecodeError as e:
|
||||
raise InvalidSchemaFormatError(f"Invalid JSON schema: {e}") from e
|
||||
|
||||
def _merge_metadata(
|
||||
self,
|
||||
schema: Dict[str, Any],
|
||||
metadata: Dict[str, Any],
|
||||
source_file: Path
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Merge frontmatter metadata into schema.
|
||||
|
||||
Adds x-markitect-source extension with file info and metadata.
|
||||
Optionally overrides schema fields with frontmatter values.
|
||||
|
||||
Args:
|
||||
schema: JSON schema dictionary
|
||||
metadata: Frontmatter metadata dictionary
|
||||
source_file: Path to source file
|
||||
|
||||
Returns:
|
||||
Schema with merged metadata
|
||||
"""
|
||||
# Create a copy to avoid modifying original
|
||||
merged_schema = schema.copy()
|
||||
|
||||
# Add MarkiTect-specific source metadata
|
||||
merged_schema['x-markitect-source'] = {
|
||||
'file': str(source_file),
|
||||
'filename': source_file.name,
|
||||
'format': 'markdown',
|
||||
'frontmatter': metadata
|
||||
}
|
||||
|
||||
# Override schema fields with frontmatter if present
|
||||
# This allows frontmatter to be the source of truth for metadata
|
||||
if 'version' in metadata:
|
||||
merged_schema['version'] = metadata['version']
|
||||
|
||||
if 'schema-id' in metadata:
|
||||
merged_schema['$id'] = metadata['schema-id']
|
||||
|
||||
if 'status' in metadata:
|
||||
# Rebuild the nested dict so the caller's schema is not mutated through the shallow copy
|
||||
merged_schema['x-markitect-metadata'] = {**merged_schema.get('x-markitect-metadata', {}), 'status': metadata['status']}
|
||||
|
||||
return merged_schema
|
||||
|
||||
def save_schema(
|
||||
self,
|
||||
schema: Dict[str, Any],
|
||||
md_path: Path,
|
||||
template: Optional[str] = None,
|
||||
frontmatter: Optional[Dict[str, Any]] = None
|
||||
):
|
||||
"""
|
||||
Save schema as markdown file.
|
||||
|
||||
Args:
|
||||
schema: JSON schema dictionary to save
|
||||
md_path: Output path for markdown file
|
||||
template: Optional markdown template string
|
||||
frontmatter: Optional frontmatter metadata (extracted from schema if not provided)
|
||||
|
||||
Raises:
|
||||
InvalidSchemaFormatError: If schema is invalid
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> loader.save_schema(
|
||||
... schema={'title': 'My Schema', ...},
|
||||
... md_path=Path('my-schema-v1.0.md')
|
||||
... )
|
||||
"""
|
||||
if template:
|
||||
# Use provided template
|
||||
content = self._render_template(template, schema, frontmatter)
|
||||
else:
|
||||
# Generate basic markdown
|
||||
content = self._generate_markdown(schema, frontmatter)
|
||||
|
||||
# Create parent directory if needed
|
||||
md_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Write file
|
||||
try:
|
||||
md_path.write_text(content, encoding='utf-8')
|
||||
except Exception as e:
|
||||
raise InvalidSchemaFormatError(f"Failed to write schema file: {e}") from e
|
||||
|
||||
def _generate_markdown(
|
||||
self,
|
||||
schema: Dict[str, Any],
|
||||
frontmatter: Optional[Dict[str, Any]] = None
|
||||
) -> str:
|
||||
"""
|
||||
Generate markdown from schema.
|
||||
|
||||
Args:
|
||||
schema: JSON schema dictionary
|
||||
frontmatter: Optional frontmatter metadata
|
||||
|
||||
Returns:
|
||||
Markdown content as string
|
||||
"""
|
||||
# Extract metadata from schema
|
||||
title = schema.get('title', 'Untitled Schema')
|
||||
version = schema.get('version', '1.0.0')
|
||||
description = schema.get('description', '')
|
||||
schema_id = schema.get('$id', '')
|
||||
|
||||
# Build frontmatter
|
||||
frontmatter = dict(frontmatter or {})  # copy so the defaults below do not mutate the caller's dict
|
||||
|
||||
# Set defaults
|
||||
if 'schema-id' not in frontmatter and schema_id:
|
||||
frontmatter['schema-id'] = schema_id
|
||||
if 'version' not in frontmatter:
|
||||
frontmatter['version'] = version
|
||||
if 'status' not in frontmatter:
|
||||
frontmatter['status'] = 'draft'
|
||||
|
||||
# Generate frontmatter YAML
|
||||
frontmatter_yaml = yaml.dump(
|
||||
frontmatter,
|
||||
default_flow_style=False,
|
||||
allow_unicode=True
|
||||
).strip()
|
||||
|
||||
# Generate JSON (pretty-printed)
|
||||
schema_json = json.dumps(schema, indent=2, ensure_ascii=False)
|
||||
|
||||
# Build markdown content
|
||||
md_content = f"""---
|
||||
{frontmatter_yaml}
|
||||
---
|
||||
|
||||
# {title} v{version}
|
||||
|
||||
## Overview
|
||||
|
||||
{description}
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
markitect validate document.md --schema {Path(frontmatter.get('schema-id', 'schema')).name}
|
||||
```
|
||||
|
||||
## Schema Definition
|
||||
|
||||
```json
|
||||
{schema_json}
|
||||
```
|
||||
|
||||
## Version History
|
||||
|
||||
### v{version}
|
||||
- Initial version
|
||||
"""
|
||||
|
||||
return md_content
|
||||
|
||||
def _render_template(
|
||||
self,
|
||||
template: str,
|
||||
schema: Dict[str, Any],
|
||||
frontmatter: Optional[Dict[str, Any]] = None
|
||||
) -> str:
|
||||
"""
|
||||
Render markdown from template.
|
||||
|
||||
Simple template rendering using string formatting.
|
||||
For complex templates, consider using Jinja2 or similar.
|
||||
|
||||
Args:
|
||||
template: Template string
|
||||
schema: JSON schema dictionary
|
||||
frontmatter: Optional frontmatter metadata
|
||||
|
||||
Returns:
|
||||
Rendered markdown content
|
||||
"""
|
||||
# Build context for template
|
||||
context = {
|
||||
'title': schema.get('title', 'Untitled'),
|
||||
'version': schema.get('version', '1.0.0'),
|
||||
'description': schema.get('description', ''),
|
||||
'schema_id': schema.get('$id', ''),
|
||||
'schema_json': json.dumps(schema, indent=2, ensure_ascii=False),
|
||||
'frontmatter': frontmatter or {},
|
||||
}
|
||||
|
||||
# Simple template rendering
|
||||
try:
|
||||
return template.format(**context)
|
||||
except KeyError as e:
|
||||
raise InvalidSchemaFormatError(f"Template missing key: {e}") from e
|
||||
|
||||
def list_json_blocks(self, content: str) -> List[Tuple[int, str]]:
|
||||
"""
|
||||
List all JSON code blocks in markdown content.
|
||||
|
||||
Useful for debugging or when multiple JSON blocks exist.
|
||||
|
||||
Args:
|
||||
content: Markdown file content
|
||||
|
||||
Returns:
|
||||
List of (position, json_content) tuples
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> content = Path('schema.md').read_text()
|
||||
>>> blocks = loader.list_json_blocks(content)
|
||||
>>> print(f"Found {len(blocks)} JSON blocks")
|
||||
"""
|
||||
blocks = []
|
||||
for match in self.json_code_block_pattern.finditer(content):
|
||||
blocks.append((match.start(), match.group(1)))
|
||||
return blocks
|
||||
|
||||
def validate_schema_structure(self, schema: Dict[str, Any]) -> List[str]:
|
||||
"""
|
||||
Validate basic schema structure.
|
||||
|
||||
Checks for required JSON Schema fields and MarkiTect conventions.
|
||||
|
||||
Args:
|
||||
schema: JSON schema dictionary
|
||||
|
||||
Returns:
|
||||
List of warning/error messages (empty if valid)
|
||||
|
||||
Example:
|
||||
>>> loader = MarkdownSchemaLoader()
|
||||
>>> issues = loader.validate_schema_structure(schema)
|
||||
>>> if issues:
|
||||
... print("Schema issues:", issues)
|
||||
"""
|
||||
issues = []
|
||||
|
||||
# Check required JSON Schema fields
|
||||
if '$schema' not in schema:
|
||||
issues.append("Missing required field: $schema")
|
||||
|
||||
if 'type' not in schema:
|
||||
issues.append("Missing recommended field: type")
|
||||
|
||||
if 'title' not in schema:
|
||||
issues.append("Missing recommended field: title")
|
||||
|
||||
if 'description' not in schema:
|
||||
issues.append("Missing recommended field: description")
|
||||
|
||||
# Check MarkiTect conventions
|
||||
if 'version' not in schema:
|
||||
issues.append("Missing MarkiTect convention: version field")
|
||||
|
||||
if '$id' not in schema:
|
||||
issues.append("Missing recommended field: $id")
|
||||
|
||||
# Check $id format if present
|
||||
if '$id' in schema:
|
||||
schema_id = schema['$id']
|
||||
if not isinstance(schema_id, str):
|
||||
issues.append("$id must be a string")
|
||||
elif not schema_id.startswith('https://'):
|
||||
issues.append("$id should be a full HTTPS URL")
|
||||
|
||||
return issues
|
||||
|
||||
|
||||
def auto_ingest_schemas(db_manager=None, schema_dir: Optional[Path] = None, verbose: bool = False) -> Dict[str, Any]:
|
||||
"""Automatically ingest schemas from markitect/schemas/ directory.
|
||||
|
||||
This function scans the schemas directory for .md schema files and ingests
|
||||
any that are not already in the database. Useful for post-install setup
|
||||
or automatic schema registration.
|
||||
|
||||
Args:
|
||||
db_manager: DatabaseManager instance (optional, will create if not provided)
|
||||
schema_dir: Directory containing schemas (defaults to markitect/schemas/)
|
||||
verbose: If True, print detailed progress messages
|
||||
|
||||
Returns:
|
||||
Dictionary with ingestion results:
|
||||
{
|
||||
'ingested': [list of schema names that were ingested],
|
||||
'skipped': [list of schema names that were already present],
|
||||
'failed': [list of (schema_name, error) tuples for failures]
|
||||
}
|
||||
|
||||
Example:
|
||||
>>> from markitect.schema_loader import auto_ingest_schemas
|
||||
>>> results = auto_ingest_schemas(verbose=True)
|
||||
>>> print(f"Ingested {len(results['ingested'])} schemas")
|
||||
"""
|
||||
# Determine schema directory
|
||||
if schema_dir is None:
|
||||
schema_dir = Path(__file__).parent / "schemas"
|
||||
|
||||
if not schema_dir.exists():
|
||||
if verbose:
|
||||
print(f"⚠️ Schema directory not found: {schema_dir}")
|
||||
return {'ingested': [], 'skipped': [], 'failed': []}
|
||||
|
||||
# Initialize database manager if not provided
|
||||
if db_manager is None:
|
||||
from .database import DatabaseManager
|
||||
db_path = Path.home() / '.markitect' / 'markitect.db'
|
||||
db_manager = DatabaseManager(str(db_path))
|
||||
db_manager.initialize_database()
|
||||
|
||||
# Get list of already ingested schemas
|
||||
try:
|
||||
existing_schemas = {schema['name'] for schema in db_manager.list_schemas()}
|
||||
except Exception as e:
|
||||
if verbose:
|
||||
print(f"❌ Error listing existing schemas: {e}")
|
||||
return {'ingested': [], 'skipped': [], 'failed': []}
|
||||
|
||||
results = {
|
||||
'ingested': [],
|
||||
'skipped': [],
|
||||
'failed': []
|
||||
}
|
||||
|
||||
# Find all schema files
|
||||
schema_files = list(schema_dir.glob("*-schema-v*.md"))
|
||||
|
||||
if verbose and schema_files:
|
||||
print(f"🔍 Found {len(schema_files)} schema file(s) in {schema_dir}")
|
||||
|
||||
loader = MarkdownSchemaLoader()
|
||||
|
||||
for schema_file in sorted(schema_files):
|
||||
schema_name = schema_file.name
|
||||
|
||||
# Skip if already ingested
|
||||
if schema_name in existing_schemas:
|
||||
results['skipped'].append(schema_name)
|
||||
if verbose:
|
||||
print(f"⏭️ Skipping {schema_name} (already ingested)")
|
||||
continue
|
||||
|
||||
# Try to ingest
|
||||
try:
|
||||
# Load schema
|
||||
schema_data_full = loader.load_schema(schema_file)
|
||||
schema_data = schema_data_full['schema']
|
||||
|
||||
# Store in database
|
||||
schema_content = json.dumps(schema_data, indent=2)
|
||||
record_id = db_manager.store_schema_file(schema_name, schema_content)
|
||||
|
||||
if record_id:
|
||||
results['ingested'].append(schema_name)
|
||||
if verbose:
|
||||
title = schema_data.get('title', schema_name)
|
||||
print(f"✅ Ingested {schema_name} (title: {title})")
|
||||
else:
|
||||
results['failed'].append((schema_name, "Failed to store in database"))
|
||||
if verbose:
|
||||
print(f"❌ Failed to store {schema_name} in database")
|
||||
|
||||
except Exception as e:
|
||||
results['failed'].append((schema_name, str(e)))
|
||||
if verbose:
|
||||
print(f"❌ Failed to ingest {schema_name}: {e}")
|
||||
|
||||
if verbose:
|
||||
print("\n📊 Auto-ingestion complete:")
|
||||
print(f" Ingested: {len(results['ingested'])}")
|
||||
print(f" Skipped: {len(results['skipped'])}")
|
||||
print(f" Failed: {len(results['failed'])}")
|
||||
|
||||
return results
|
||||
# Re-export from schema package for backward compatibility
|
||||
from markitect.schema.loader import (
|
||||
MarkdownSchemaLoader,
|
||||
SchemaLoaderError,
|
||||
InvalidSchemaFormatError,
|
||||
SchemaNotFoundError,
|
||||
auto_ingest_schemas,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
'MarkdownSchemaLoader',
|
||||
'SchemaLoaderError',
|
||||
'InvalidSchemaFormatError',
|
||||
'SchemaNotFoundError',
|
||||
'auto_ingest_schemas',
|
||||
]
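The block-selection rule used by the loader (prefer the first JSON code block after a `## Schema Definition` heading, otherwise fall back to the first JSON block in the file) can be sketched with the standard library alone. `pick_schema_block` and the sample document are hypothetical names used only for this illustration, and `ValueError` stands in for the loader's own exception type.

```python
import json
import re
from typing import Any, Dict, Optional

FENCE = "`" * 3  # built programmatically so this example stays fence-safe

# Same shape as the loader's patterns: a json-tagged fenced block and the section heading
JSON_BLOCK = re.compile(FENCE + r"json\s*\n(.*?)\n" + FENCE, re.DOTALL)
SCHEMA_SECTION = re.compile(r"##\s+Schema Definition\s*\n")


def pick_schema_block(content: str) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return the parsed schema block, or None if absent."""
    all_blocks = JSON_BLOCK.findall(content)
    if not all_blocks:
        return None
    json_text = all_blocks[0]  # fallback: first block in the document
    section = SCHEMA_SECTION.search(content)
    if section:
        # Prefer the first block that follows the section heading
        later_blocks = JSON_BLOCK.findall(content[section.end():])
        if later_blocks:
            json_text = later_blocks[0]
    parsed = json.loads(json_text)
    if not isinstance(parsed, dict):
        # The original raises InvalidSchemaFormatError; ValueError stands in here
        raise ValueError("schema must be a JSON object")
    return parsed


# Assumed sample: an example payload block precedes the real schema block
doc = "\n".join([
    "# Demo Schema v1.0",
    "",
    "## Example",
    "",
    FENCE + "json",
    '{"title": "example payload"}',
    FENCE,
    "",
    "## Schema Definition",
    "",
    FENCE + "json",
    '{"title": "Demo Schema", "type": "object"}',
    FENCE,
    "",
])
schema = pick_schema_block(doc)
```

In this sample the example payload appears first, so a naive "first block wins" rule would pick the wrong block; searching from the section heading onward selects the schema instead.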
|
||||
|
||||
@@ -1,309 +1,35 @@
|
||||
"""
|
||||
Schema Naming Validation - Enforce filename conventions for schemas.
|
||||
Schema Naming - Backward Compatibility Module.
|
||||
|
||||
This module provides validation and utilities for schema filename conventions
|
||||
to ensure consistency across the MarkiTect schema ecosystem.
|
||||
|
||||
Naming Convention:
|
||||
Format: {domain}-schema-v{major}.{minor}.md
|
||||
|
||||
Components:
|
||||
- domain: lowercase, hyphen-separated identifier (e.g., "manpage", "api-documentation")
|
||||
- schema: literal string "schema"
|
||||
- version: SemVer major.minor (e.g., "v1.0", "v2.1")
|
||||
- extension: ".md" (markdown)
|
||||
|
||||
Valid Examples:
|
||||
✓ manpage-schema-v1.0.md
|
||||
✓ terminology-schema-v1.0.md
|
||||
✓ api-documentation-schema-v1.0.md
|
||||
✓ my-custom-type-schema-v2.1.md
|
||||
|
||||
Invalid Examples:
|
||||
✗ manpage.json (missing version and wrong extension)
|
||||
✗ manpage-v1.md (missing "schema" keyword)
|
||||
✗ ManPage-Schema-v1.0.md (wrong case - must be lowercase)
|
||||
✗ manpage-schema-1.0.md (missing 'v' prefix)
|
||||
✗ manpage-schema-v1.md (missing minor version)
|
||||
This module re-exports from markitect.schema.naming for backward compatibility.
|
||||
New code should import from markitect.schema.naming directly.
|
||||
"""
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Tuple, Optional, Dict, Any
|
||||
|
||||
|
||||
# Regex pattern for schema filename validation
|
||||
# Matches: {domain}-schema-v{major}.{minor}.md
|
||||
# Where domain is lowercase letters/numbers/hyphens starting with letter
|
||||
SCHEMA_FILENAME_PATTERN = re.compile(
|
||||
r'^(?P<domain>[a-z][a-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$'
|
||||
# Re-export from schema package for backward compatibility
|
||||
from markitect.schema.naming import (
|
||||
validate_schema_filename,
|
||||
suggest_valid_filename,
|
||||
suggest_schema_filename,
|
||||
extract_schema_domain,
|
||||
get_schema_version,
|
||||
extract_schema_metadata,
|
||||
get_validation_errors,
|
||||
is_valid_schema_filename,
|
||||
format_validation_message,
|
||||
SchemaFilenameError,
|
||||
SCHEMA_FILENAME_PATTERN,
|
||||
)
|
||||
|
||||
|
||||
class SchemaFilenameError(Exception):
|
||||
"""Exception raised for invalid schema filenames."""
|
||||
pass
|
||||
|
||||
|
||||
def validate_schema_filename(filename: str) -> Tuple[bool, Optional[Dict[str, Any]]]:
|
||||
"""
|
||||
Validate schema filename against naming convention.
|
||||
|
||||
Args:
|
||||
filename: The filename to validate (e.g., "manpage-schema-v1.0.md")
|
||||
|
||||
Returns:
|
||||
Tuple of (is_valid, metadata_dict or None)
|
||||
|
||||
If valid, metadata_dict contains:
|
||||
- domain: str - The domain identifier
|
||||
- version: str - Full version string (e.g., "1.0")
|
||||
- major: int - Major version number
|
||||
- minor: int - Minor version number
|
||||
- filename: str - The original filename
|
||||
|
||||
If invalid, metadata_dict is None
|
||||
|
||||
Examples:
|
||||
>>> validate_schema_filename("manpage-schema-v1.0.md")
|
||||
(True, {'domain': 'manpage', 'version': '1.0', ...})
|
||||
|
||||
>>> validate_schema_filename("invalid.json")
|
||||
(False, None)
|
||||
"""
|
||||
match = SCHEMA_FILENAME_PATTERN.match(filename)
|
||||
|
||||
if not match:
|
||||
return False, None
|
||||
|
||||
return True, {
|
||||
'domain': match.group('domain'),
|
||||
'version': f"{match.group('major')}.{match.group('minor')}",
|
||||
'major': int(match.group('major')),
|
||||
'minor': int(match.group('minor')),
|
||||
'filename': filename
|
||||
}
|
||||
|
||||
|
||||
def suggest_schema_filename(
|
||||
domain: str,
|
||||
version: str = "1.0",
|
||||
normalize: bool = True
|
||||
) -> str:
|
||||
"""
|
||||
Generate a valid schema filename from domain and version.
|
||||
|
||||
Args:
|
||||
domain: The schema domain (e.g., "manpage", "API Documentation")
|
||||
version: Version string in format "major.minor" (default: "1.0")
|
||||
normalize: Whether to normalize domain to lowercase/hyphenated
|
||||
|
||||
Returns:
|
||||
Valid schema filename
|
||||
|
||||
Raises:
|
||||
ValueError: If domain or version format is invalid
|
||||
|
||||
Examples:
|
||||
>>> suggest_schema_filename("manpage", "1.0")
|
||||
'manpage-schema-v1.0.md'
|
||||
|
||||
>>> suggest_schema_filename("API Documentation", "2.1")
|
||||
'api-documentation-schema-v2.1.md'
|
||||
|
||||
>>> suggest_schema_filename("My_Custom_Type", "1.0")
|
||||
'my-custom-type-schema-v1.0.md'
|
||||
"""
|
||||
if not domain:
|
||||
raise ValueError("Domain cannot be empty")
|
||||
|
||||
if normalize:
|
||||
# Normalize domain: lowercase, replace spaces/underscores with hyphens
|
||||
domain_clean = domain.lower()
|
||||
domain_clean = domain_clean.replace(' ', '-').replace('_', '-')
|
||||
# Remove consecutive hyphens
|
||||
domain_clean = re.sub(r'-+', '-', domain_clean)
|
||||
# Remove leading/trailing hyphens
|
||||
domain_clean = domain_clean.strip('-')
|
||||
else:
|
||||
domain_clean = domain
|
||||
|
||||
# Validate domain format (must start with letter, contain only lowercase, numbers, hyphens)
|
||||
if not re.match(r'^[a-z][a-z0-9-]*$', domain_clean):
|
||||
raise ValueError(
|
||||
f"Invalid domain '{domain_clean}': must start with lowercase letter "
|
||||
"and contain only lowercase letters, numbers, and hyphens"
|
||||
)
|
||||
|
||||
# Parse and validate version
|
||||
version_parts = version.split('.')
|
||||
if len(version_parts) != 2:
|
||||
raise ValueError(
|
||||
f"Invalid version '{version}': must be in format 'major.minor' (e.g., '1.0')"
|
||||
)
|
||||
|
||||
try:
|
||||
major = int(version_parts[0])
|
||||
minor = int(version_parts[1])
|
||||
except ValueError:
|
||||
raise ValueError(
|
||||
f"Invalid version '{version}': major and minor must be integers"
|
||||
) from None
|
||||
|
||||
if major < 0 or minor < 0:
|
||||
raise ValueError(
|
||||
f"Invalid version '{version}': major and minor must be non-negative"
|
||||
)
|
||||
|
||||
return f"{domain_clean}-schema-v{major}.{minor}.md"
|
||||
|
||||
|
||||
def extract_schema_metadata(filename: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Extract metadata from a valid schema filename.
|
||||
|
||||
Args:
|
||||
filename: Schema filename to parse
|
||||
|
||||
Returns:
|
||||
Dictionary with metadata
|
||||
|
||||
Raises:
|
||||
SchemaFilenameError: If filename is invalid
|
||||
|
||||
Examples:
|
||||
>>> extract_schema_metadata("manpage-schema-v1.0.md")
|
||||
{'domain': 'manpage', 'version': '1.0', 'major': 1, 'minor': 0}
|
||||
"""
|
||||
is_valid, metadata = validate_schema_filename(filename)
|
||||
|
||||
if not is_valid:
|
||||
raise SchemaFilenameError(
|
||||
f"Invalid schema filename: {filename}\n"
|
||||
f"Expected format: {{domain}}-schema-v{{major}}.{{minor}}.md"
|
||||
)
|
||||
|
||||
return metadata
|
||||
|
||||
|
||||
def get_validation_errors(filename: str) -> list:
|
||||
"""
|
||||
Get detailed validation errors for a filename.
|
||||
|
||||
Args:
|
||||
filename: Filename to validate
|
||||
|
||||
Returns:
|
||||
List of error messages (empty if valid)
|
||||
|
||||
Examples:
|
||||
>>> get_validation_errors("manpage-schema-v1.0.md")
|
||||
[]
|
||||
|
||||
>>> get_validation_errors("invalid.json")
|
||||
['Filename does not match pattern: {domain}-schema-v{major}.{minor}.md', ...]
|
||||
"""
|
||||
errors = []
|
||||
|
||||
# Check basic pattern match
|
||||
is_valid, _ = validate_schema_filename(filename)
|
||||
if is_valid:
|
||||
return errors
|
||||
|
||||
# Provide detailed feedback
|
||||
errors.append(
|
||||
f"Filename does not match pattern: {{domain}}-schema-v{{major}}.{{minor}}.md"
|
||||
)
|
||||
|
||||
# Check extension
|
||||
if not filename.endswith('.md'):
|
||||
errors.append(f"Extension must be '.md', got: {Path(filename).suffix}")
|
||||
|
||||
# Check for version
|
||||
if '-v' not in filename:
|
||||
errors.append("Missing version: filename must include '-v{major}.{minor}'")
|
||||
elif not re.search(r'-v\d+\.\d+', filename):
|
||||
errors.append(
|
||||
"Invalid version format: must be '-v{major}.{minor}' (e.g., '-v1.0')"
|
||||
)
|
||||
|
||||
# Check for schema keyword
|
||||
if '-schema-' not in filename:
|
||||
errors.append("Missing '-schema-' keyword in filename")
|
||||
|
||||
# Check for uppercase (must be lowercase)
|
||||
if any(c.isupper() for c in filename):
|
||||
errors.append("Filename must be lowercase")
|
||||
|
||||
# Check domain format (if we can isolate it)
|
||||
parts = filename.split('-schema-')
|
||||
if len(parts) >= 2:
|
||||
domain = parts[0]
|
||||
if domain and not re.match(r'^[a-z][a-z0-9-]*$', domain):
|
||||
errors.append(
|
||||
f"Invalid domain '{domain}': must start with lowercase letter "
|
||||
"and contain only lowercase letters, numbers, and hyphens"
|
||||
)
|
||||
|
||||
return errors
|
||||
|
||||
|
||||
def is_valid_schema_filename(filename: str) -> bool:
|
||||
"""
|
||||
Check if filename is valid (convenience function).
|
||||
|
||||
Args:
|
||||
filename: Filename to check
|
||||
|
||||
Returns:
|
||||
True if valid, False otherwise
|
||||
|
||||
Examples:
|
||||
>>> is_valid_schema_filename("manpage-schema-v1.0.md")
|
||||
True
|
||||
|
||||
>>> is_valid_schema_filename("invalid.json")
|
||||
False
|
||||
"""
|
||||
is_valid, _ = validate_schema_filename(filename)
|
||||
return is_valid
|
||||
|
||||
|
||||
def format_validation_message(filename: str) -> str:
|
||||
"""
|
||||
Format a user-friendly validation message.
|
||||
|
||||
Args:
|
||||
filename: Filename that failed validation
|
||||
|
||||
Returns:
|
||||
Formatted error message with suggestions
|
||||
|
||||
Examples:
|
||||
>>> print(format_validation_message("manpage.json"))
|
||||
❌ Invalid schema filename: manpage.json
|
||||
...
|
||||
"""
|
||||
errors = get_validation_errors(filename)
|
||||
|
||||
if not errors:
|
||||
return f"✅ Valid schema filename: {filename}"
|
||||
|
||||
message = f"❌ Invalid schema filename: {filename}\n\n"
|
||||
message += "Errors:\n"
|
||||
for i, error in enumerate(errors, 1):
|
||||
message += f" {i}. {error}\n"
|
||||
|
||||
message += "\nExpected format: {domain}-schema-v{major}.{minor}.md\n"
|
||||
message += "Example: manpage-schema-v1.0.md\n"
|
||||
|
||||
# Try to suggest a corrected filename
|
||||
try:
|
||||
# Extract domain guess (everything before first hyphen or dot)
|
||||
domain_guess = filename.split('-')[0].split('.')[0]
|
||||
suggestion = suggest_schema_filename(domain_guess, "1.0")
|
||||
message += f"\nSuggested filename: {suggestion}\n"
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return message
|
||||
__all__ = [
|
||||
'validate_schema_filename',
|
||||
'suggest_valid_filename',
|
||||
'suggest_schema_filename',
|
||||
'extract_schema_domain',
|
||||
'get_schema_version',
|
||||
'extract_schema_metadata',
|
||||
'get_validation_errors',
|
||||
'is_valid_schema_filename',
|
||||
'format_validation_message',
|
||||
'SchemaFilenameError',
|
||||
'SCHEMA_FILENAME_PATTERN',
|
||||
]
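The naming convention used throughout this module, `{domain}-schema-v{major}.{minor}.md`, can be illustrated with a small standalone sketch. This is not the module's implementation: `FILENAME_RE`, `build_schema_filename`, and `parse_schema_filename` are hypothetical names, and the regular expression is an assumed stand-in for `SCHEMA_FILENAME_PATTERN`, whose real definition is not visible in this part of the diff.

```python
import re
from typing import Dict, Optional, Union

# Assumed stand-in for SCHEMA_FILENAME_PATTERN (the real pattern is not shown here).
FILENAME_RE = re.compile(
    r'^(?P<domain>[a-z][a-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$'
)


def build_schema_filename(domain: str, version: str = "1.0") -> str:
    """Normalize the domain and assemble '{domain}-schema-v{major}.{minor}.md'."""
    # Lowercase, turn spaces/underscores into hyphens, collapse and trim hyphens.
    cleaned = domain.lower().replace(' ', '-').replace('_', '-')
    cleaned = re.sub(r'-+', '-', cleaned).strip('-')
    if not re.match(r'^[a-z][a-z0-9-]*$', cleaned):
        raise ValueError(f"Invalid domain '{cleaned}'")

    parts = version.split('.')
    if len(parts) != 2:
        raise ValueError(f"Invalid version '{version}': expected 'major.minor'")
    # int() raises ValueError on non-numeric parts.
    major, minor = int(parts[0]), int(parts[1])
    if major < 0 or minor < 0:
        raise ValueError(f"Invalid version '{version}': parts must be non-negative")
    return f"{cleaned}-schema-v{major}.{minor}.md"


def parse_schema_filename(filename: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return domain/version metadata, or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    major, minor = int(match["major"]), int(match["minor"])
    return {
        "domain": match["domain"],
        "version": f"{major}.{minor}",
        "major": major,
        "minor": minor,
    }
```

Under these assumptions, `build_schema_filename("API Documentation", "2.1")` yields `api-documentation-schema-v2.1.md`, matching the docstring example above, and `parse_schema_filename("invalid.json")` returns `None` instead of raising.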
|
||||
|
||||
@@ -1,530 +1,19 @@
|
||||
"""
|
||||
Schema Refiner for Phase 2: Schema Refinement Tools
|
||||
Schema Refiner - Backward Compatibility Module.
|
||||
|
||||
Automatically refines rigid schemas by applying loosening rules and fixes.
|
||||
This module re-exports from markitect.schema.refiner for backward compatibility.
|
||||
New code should import from markitect.schema.refiner directly.
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List, Optional, Tuple
|
||||
import json
|
||||
import copy
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
from .schema_analyzer import SchemaAnalyzer, SchemaIssue, IssueType, IssueSeverity
|
||||
|
||||
|
||||
@dataclass
|
||||
class RefinementAction:
|
||||
"""Represents a refinement action taken on the schema."""
|
||||
issue_type: IssueType
|
||||
path: str
|
||||
description: str
|
||||
old_value: Any = None
|
||||
new_value: Any = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class RefinementResult:
|
||||
"""Results of schema refinement."""
|
||||
success: bool
|
||||
actions_taken: List[RefinementAction] = field(default_factory=list)
|
||||
refined_schema: Optional[Dict[str, Any]] = None
|
||||
error_message: Optional[str] = None
|
||||
|
||||
|
||||
class SchemaRefiner:
|
||||
"""Refines rigid schemas by applying loosening rules."""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the schema refiner."""
|
||||
self.analyzer = SchemaAnalyzer()
|
||||
|
||||
def _navigate_to_path(self, schema: Dict[str, Any], path: str) -> Optional[Tuple[Dict[str, Any], str]]:
|
||||
"""
|
||||
Navigate to a path in the schema, handling nested 'properties' objects.
|
||||
|
||||
Returns (parent_object, property_name) or None if path doesn't exist.
|
||||
"""
|
||||
path_parts = path.split('.')
|
||||
obj = schema
|
||||
|
||||
# Navigate through all but the last part
|
||||
for i, part in enumerate(path_parts[:-1]):
|
||||
# Try direct access first
|
||||
if part in obj:
|
||||
obj = obj[part]
|
||||
# If not found and obj has 'properties', try there
|
||||
elif isinstance(obj, dict) and "properties" in obj and part in obj["properties"]:
|
||||
obj = obj["properties"][part]
|
||||
else:
|
||||
return None
|
||||
|
||||
# For the final part, check if we need to descend into 'properties'
|
||||
prop_name = path_parts[-1]
|
||||
if prop_name in obj:
|
||||
return (obj, prop_name)
|
||||
elif isinstance(obj, dict) and "properties" in obj and prop_name in obj["properties"]:
|
||||
return (obj["properties"], prop_name)
|
||||
else:
|
||||
return None
|
||||
|
||||
def refine_schema_interactive(
|
||||
self,
|
||||
schema: Dict[str, Any],
|
||||
loosen_counts: bool = True,
|
||||
migrate_deprecated: bool = False,
|
||||
round_numbers: bool = True
|
||||
) -> RefinementResult:
|
||||
"""
|
||||
Refine a schema interactively, prompting for each fix.
|
||||
|
||||
Args:
|
||||
schema: The JSON schema to refine
|
||||
loosen_counts: Enable fixes for exact counts
|
||||
migrate_deprecated: Enable migration of deprecated extensions
|
||||
round_numbers: Enable rounding of overly specific numbers
|
||||
|
||||
Returns:
|
||||
RefinementResult with actions taken and refined schema
|
||||
"""
|
||||
result = RefinementResult(success=False)
|
||||
|
||||
try:
|
||||
# Analyze the schema first
|
||||
analysis = self.analyzer.analyze_schema(schema)
|
||||
|
||||
print(f"\nFound {len(analysis.issues)} issue(s) to review\n")
|
||||
|
||||
# Deep copy to avoid modifying original
|
||||
refined = copy.deepcopy(schema)
|
||||
|
||||
# Process each issue interactively
|
||||
for i, issue in enumerate(analysis.issues, 1):
|
||||
print(f"Issue {i}/{len(analysis.issues)}")
|
||||
print(f" Type: {issue.issue_type.value}")
|
||||
print(f" Path: {issue.path}")
|
||||
print(f" {issue.message}")
|
||||
print(f" Suggestion: {issue.suggestion}")
|
||||
|
||||
if issue.current_value is not None:
|
||||
print(f" Current: {json.dumps(issue.current_value)}")
|
||||
if issue.suggested_value is not None:
|
||||
print(f" Suggested: {json.dumps(issue.suggested_value)}")
|
||||
|
||||
# Ask user if they want to apply the fix
|
||||
response = input("\nApply this fix? [y/N/q]: ").strip().lower()
|
||||
|
||||
if response == 'q':
|
||||
print("Refinement cancelled by user")
|
||||
result.success = False
|
||||
return result
|
||||
elif response == 'y':
|
||||
action = None
|
||||
|
||||
if loosen_counts and issue.issue_type == IssueType.EXACT_COUNT:
|
||||
action = self._fix_exact_count(refined, issue)
|
||||
|
||||
elif round_numbers and issue.issue_type == IssueType.OVERLY_SPECIFIC:
|
||||
action = self._fix_overly_specific(refined, issue)
|
||||
|
||||
elif loosen_counts and issue.issue_type == IssueType.NO_FLEXIBILITY:
|
||||
action = self._fix_no_flexibility(refined, issue)
|
||||
|
||||
elif migrate_deprecated and issue.issue_type == IssueType.DEPRECATED_EXTENSIONS:
|
||||
action = self._fix_deprecated_extension(refined, issue)
|
||||
|
||||
if action:
|
||||
result.actions_taken.append(action)
|
||||
print(f" ✓ Applied")
|
||||
else:
|
||||
print(f" ✗ Could not apply fix")
|
||||
else:
|
||||
print(f" - Skipped")
|
||||
|
||||
print()
|
||||
|
||||
result.refined_schema = refined
|
||||
result.success = True
|
||||
|
||||
except Exception as e:
|
||||
result.error_message = str(e)
|
||||
|
||||
return result
|
||||
|
||||
def refine_schema(
|
||||
self,
|
||||
schema: Dict[str, Any],
|
||||
loosen_counts: bool = True,
|
||||
migrate_deprecated: bool = False,
|
||||
round_numbers: bool = True
|
||||
) -> RefinementResult:
|
||||
"""
|
||||
Refine a schema by applying fixes for detected issues.
|
||||
|
||||
Args:
|
||||
schema: The JSON schema to refine
|
||||
loosen_counts: Apply fixes for exact counts
|
||||
migrate_deprecated: Migrate deprecated extensions
|
||||
round_numbers: Round overly specific numbers
|
||||
|
||||
Returns:
|
||||
RefinementResult with actions taken and refined schema
|
||||
"""
|
||||
result = RefinementResult(success=False)
|
||||
|
||||
try:
|
||||
# Analyze the schema first
|
||||
analysis = self.analyzer.analyze_schema(schema)
|
||||
|
||||
# Deep copy to avoid modifying original
|
||||
refined = copy.deepcopy(schema)
|
||||
|
||||
# Apply fixes based on issues found
|
||||
for issue in analysis.issues:
|
||||
action = None
|
||||
|
||||
if loosen_counts and issue.issue_type == IssueType.EXACT_COUNT:
|
||||
action = self._fix_exact_count(refined, issue)
|
||||
|
||||
elif round_numbers and issue.issue_type == IssueType.OVERLY_SPECIFIC:
|
||||
action = self._fix_overly_specific(refined, issue)
|
||||
|
||||
elif loosen_counts and issue.issue_type == IssueType.NO_FLEXIBILITY:
|
||||
action = self._fix_no_flexibility(refined, issue)
|
||||
|
||||
elif migrate_deprecated and issue.issue_type == IssueType.DEPRECATED_EXTENSIONS:
|
||||
action = self._fix_deprecated_extension(refined, issue)
|
||||
|
||||
if action:
|
||||
result.actions_taken.append(action)
|
||||
|
||||
result.refined_schema = refined
|
||||
result.success = True
|
||||
|
||||
except Exception as e:
|
||||
result.error_message = str(e)
|
||||
|
||||
return result
|
||||
|
||||
def _fix_exact_count(self, schema: Dict[str, Any], issue: SchemaIssue) -> Optional[RefinementAction]:
|
||||
"""Fix exact count constraints by converting to ranges."""
|
||||
nav_result = self._navigate_to_path(schema, issue.path)
|
||||
if not nav_result:
|
||||
return None
|
||||
|
||||
obj, prop_name = nav_result
|
||||
prop_def = obj[prop_name]
|
||||
old_value = copy.deepcopy(prop_def)
|
||||
|
||||
# Check if it's an array with exact minItems/maxItems
|
||||
if isinstance(prop_def, dict) and prop_def.get("type") == "array":
|
||||
min_items = prop_def.get("minItems")
|
||||
max_items = prop_def.get("maxItems")
|
||||
|
||||
if min_items is not None and max_items is not None and min_items == max_items:
|
||||
# Apply suggested loosening
|
||||
new_min = max(0, min_items - 2)
|
||||
new_max = min_items + 5
|
||||
|
||||
prop_def["minItems"] = new_min
|
||||
prop_def["maxItems"] = new_max
|
||||
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.EXACT_COUNT,
|
||||
path=issue.path,
|
||||
description=f"Loosened array count from exactly {min_items} to range {new_min}-{new_max}",
|
||||
old_value={"minItems": min_items, "maxItems": max_items},
|
||||
new_value={"minItems": new_min, "maxItems": new_max}
|
||||
)
|
||||
|
||||
# Check if it's a const value
|
||||
if isinstance(prop_def, dict) and "const" in prop_def:
|
||||
const_value = prop_def["const"]
|
||||
del prop_def["const"]
|
||||
|
||||
# If it's a number, convert to a range
|
||||
if isinstance(const_value, int) and not isinstance(const_value, bool):
|
||||
prop_def["minimum"] = const_value - 1
|
||||
prop_def["maximum"] = const_value + 1
|
||||
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.EXACT_COUNT,
|
||||
path=issue.path,
|
||||
description=f"Converted const {const_value} to range {const_value-1}-{const_value+1}",
|
||||
old_value=const_value,
|
||||
new_value={"minimum": const_value - 1, "maximum": const_value + 1}
|
||||
)
|
||||
else:
|
||||
# For non-numeric constants, just remove the constraint
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.EXACT_COUNT,
|
||||
path=issue.path,
|
||||
description=f"Removed const constraint: {const_value}",
|
||||
old_value=const_value,
|
||||
new_value=None
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
def _fix_overly_specific(self, schema: Dict[str, Any], issue: SchemaIssue) -> Optional[RefinementAction]:
|
||||
"""Fix overly specific number constraints by rounding."""
|
||||
if issue.suggested_value is None:
|
||||
return None
|
||||
|
||||
nav_result = self._navigate_to_path(schema, issue.path)
|
||||
if not nav_result:
|
||||
return None
|
||||
|
||||
obj, prop_name = nav_result
|
||||
prop_def = obj[prop_name]
|
||||
|
||||
# Round the minItems value
|
||||
if isinstance(prop_def, dict) and "minItems" in prop_def:
|
||||
old_value = prop_def["minItems"]
|
||||
new_value = issue.suggested_value
|
||||
prop_def["minItems"] = new_value
|
||||
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.OVERLY_SPECIFIC,
|
||||
path=issue.path,
|
||||
description=f"Rounded minItems from {old_value} to {new_value}",
|
||||
old_value=old_value,
|
||||
new_value=new_value
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
def _fix_no_flexibility(self, schema: Dict[str, Any], issue: SchemaIssue) -> Optional[RefinementAction]:
|
||||
"""Fix narrow ranges by widening them."""
|
||||
nav_result = self._navigate_to_path(schema, issue.path)
|
||||
if not nav_result:
|
||||
return None
|
||||
|
||||
obj, prop_name = nav_result
|
||||
prop_def = obj[prop_name]
|
||||
|
||||
if isinstance(prop_def, dict) and "minimum" in prop_def and "maximum" in prop_def:
|
||||
old_min = prop_def["minimum"]
|
||||
old_max = prop_def["maximum"]
|
||||
range_size = old_max - old_min
|
||||
|
||||
# Widen the range
|
||||
new_min = old_min - 5
|
||||
new_max = old_max + 5
|
||||
|
||||
prop_def["minimum"] = new_min
|
||||
prop_def["maximum"] = new_max
|
||||
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.NO_FLEXIBILITY,
|
||||
path=issue.path,
|
||||
description=f"Widened range from {old_min}-{old_max} to {new_min}-{new_max}",
|
||||
old_value={"minimum": old_min, "maximum": old_max},
|
||||
new_value={"minimum": new_min, "maximum": new_max}
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
def _fix_deprecated_extension(self, schema: Dict[str, Any], issue: SchemaIssue) -> Optional[RefinementAction]:
|
||||
"""Flag a deprecated extension without removing it (migration requires manual work)."""
|
||||
# For now, just document that manual migration is needed
|
||||
# Full migration would require understanding the old format
|
||||
|
||||
deprecated_key = issue.path
|
||||
if deprecated_key in schema:
|
||||
old_value = schema[deprecated_key]
|
||||
# Don't actually remove it automatically - too risky
|
||||
return RefinementAction(
|
||||
issue_type=IssueType.DEPRECATED_EXTENSIONS,
|
||||
path=issue.path,
|
||||
description=f"Detected deprecated extension (manual migration recommended)",
|
||||
old_value=old_value,
|
||||
new_value=None
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
def refine_schema_file(
|
||||
self,
|
||||
input_path: Path,
|
||||
output_path: Optional[Path] = None,
|
||||
loosen_counts: bool = True,
|
||||
migrate_deprecated: bool = False,
|
||||
round_numbers: bool = True
|
||||
) -> RefinementResult:
|
||||
"""
|
||||
Refine a schema file.
|
||||
|
||||
Args:
|
||||
input_path: Path to input schema file
|
||||
output_path: Path to output file (if None, overwrites input)
|
||||
loosen_counts: Apply fixes for exact counts
|
||||
migrate_deprecated: Migrate deprecated extensions
|
||||
round_numbers: Round overly specific numbers
|
||||
|
||||
Returns:
|
||||
RefinementResult
|
||||
"""
|
||||
with open(input_path) as f:
|
||||
schema = json.load(f)
|
||||
|
||||
result = self.refine_schema(
|
||||
schema,
|
||||
loosen_counts=loosen_counts,
|
||||
migrate_deprecated=migrate_deprecated,
|
||||
round_numbers=round_numbers
|
||||
)
|
||||
|
||||
if result.success and result.refined_schema:
|
||||
output = output_path or input_path
|
||||
with open(output, 'w') as f:
|
||||
json.dump(result.refined_schema, f, indent=2)
|
||||
|
||||
return result
|
||||
|
||||
def format_refinement_report(self, result: RefinementResult) -> str:
|
||||
"""
|
||||
Format refinement results as a human-readable report.
|
||||
|
||||
Args:
|
||||
result: Refinement results
|
||||
|
||||
Returns:
|
||||
Formatted report string
|
||||
"""
|
||||
lines = []
|
||||
|
||||
# Header
|
||||
lines.append("=" * 70)
|
||||
lines.append("Schema Refinement Report")
|
||||
lines.append("=" * 70)
|
||||
lines.append("")
|
||||
|
||||
if not result.success:
|
||||
lines.append(f"❌ Refinement failed: {result.error_message}")
|
||||
return "\n".join(lines)
|
||||
|
||||
# Summary
|
||||
action_count = len(result.actions_taken)
|
||||
if action_count == 0:
|
||||
lines.append("✅ No refinements needed - schema is already flexible")
|
||||
else:
|
||||
lines.append(f"✅ Applied {action_count} refinement(s)")
|
||||
lines.append("")
|
||||
|
||||
# List actions
|
||||
if result.actions_taken:
|
||||
lines.append("Actions Taken:")
|
||||
lines.append("-" * 70)
|
||||
|
||||
for i, action in enumerate(result.actions_taken, 1):
|
||||
lines.append(f"{i}. {action.description}")
|
||||
lines.append(f" Path: {action.path}")
|
||||
|
||||
if action.old_value is not None:
|
||||
lines.append(f" Before: {json.dumps(action.old_value)}")
|
||||
if action.new_value is not None:
|
||||
lines.append(f" After: {json.dumps(action.new_value)}")
|
||||
|
||||
lines.append("")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def refine_schema_cli(
|
||||
schema_path: str,
|
||||
output: Optional[str] = None,
|
||||
loosen_counts: bool = True,
|
||||
migrate_deprecated: bool = False,
|
||||
round_numbers: bool = True,
|
||||
dry_run: bool = False,
|
||||
interactive: bool = False
|
||||
) -> int:
|
||||
"""
|
||||
CLI entry point for schema refinement.
|
||||
|
||||
Args:
|
||||
schema_path: Path to schema file
|
||||
output: Output path (None = overwrite input)
|
||||
loosen_counts: Apply count loosening fixes
|
||||
migrate_deprecated: Migrate deprecated extensions
|
||||
round_numbers: Round overly specific numbers
|
||||
dry_run: Show changes without applying
|
||||
interactive: Prompt for each fix
|
||||
|
||||
Returns:
|
||||
Exit code (0 = success, 1 = no changes needed, 2 = error)
|
||||
"""
|
||||
refiner = SchemaRefiner()
|
||||
|
||||
try:
|
||||
input_path = Path(schema_path)
|
||||
output_path = Path(output) if output else None
|
||||
|
||||
# Load schema
|
||||
with open(input_path) as f:
|
||||
schema = json.load(f)
|
||||
|
||||
if interactive:
|
||||
# Interactive mode - prompt for each fix
|
||||
print(f"Refining schema: {schema_path}")
|
||||
result = refiner.refine_schema_interactive(
|
||||
schema,
|
||||
loosen_counts=loosen_counts,
|
||||
migrate_deprecated=migrate_deprecated,
|
||||
round_numbers=round_numbers
|
||||
)
|
||||
|
||||
if result.success and result.refined_schema and not dry_run:
|
||||
# Write the refined schema
|
||||
output = output_path or input_path
|
||||
with open(output, 'w') as f:
|
||||
json.dump(result.refined_schema, f, indent=2)
|
||||
print(f"\nRefined schema written to: {output}")
|
||||
|
||||
elif dry_run:
|
||||
# Just analyze and show what would be done
|
||||
result = refiner.refine_schema(
|
||||
schema,
|
||||
loosen_counts=loosen_counts,
|
||||
migrate_deprecated=migrate_deprecated,
|
||||
round_numbers=round_numbers
|
||||
)
|
||||
|
||||
print("DRY RUN - No changes will be made")
|
||||
print()
|
||||
else:
|
||||
result = refiner.refine_schema_file(
|
||||
input_path,
|
||||
output_path,
|
||||
loosen_counts=loosen_counts,
|
||||
migrate_deprecated=migrate_deprecated,
|
||||
round_numbers=round_numbers
|
||||
)
|
||||
|
||||
# Only print full report if not in interactive mode (user already saw changes)
|
||||
if not interactive:
|
||||
report = refiner.format_refinement_report(result)
|
||||
print(report)
|
||||
elif result.success:
|
||||
# Just print summary for interactive mode
|
||||
print(f"\n{'='*70}")
|
||||
print(f"Refinement complete: {len(result.actions_taken)} change(s) applied")
|
||||
print(f"{'='*70}")
|
||||
|
||||
if result.success and len(result.actions_taken) > 0:
|
||||
return 0 # Success with changes
|
||||
elif result.success:
|
||||
return 1 # Success but no changes needed
|
||||
else:
|
||||
return 2 # Error
|
||||
|
||||
except FileNotFoundError:
|
||||
print(f"Error: Schema file not found: {schema_path}")
|
||||
return 2
|
||||
except json.JSONDecodeError as e:
|
||||
print(f"Error: Invalid JSON in schema file: {e}")
|
||||
return 2
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
return 2
|
||||
# Re-export from schema package for backward compatibility
|
||||
from markitect.schema.refiner import (
|
||||
SchemaRefiner,
|
||||
RefinementResult,
|
||||
RefinementAction,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
'SchemaRefiner',
|
||||
'RefinementResult',
|
||||
'RefinementAction',
|
||||
]
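The refiner logic behind this shim loosens exact array counts into ranges. The following is a minimal standalone sketch of that one rule, based on the `_fix_exact_count` code shown earlier in this diff (lower bound `max(0, n - 2)`, upper bound `n + 5`). `loosen_exact_count` is a hypothetical helper name, not part of `markitect.schema.refiner`, and unlike the original method it returns a modified copy instead of editing the schema in place.

```python
import copy
from typing import Any, Dict


def loosen_exact_count(prop_def: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an array property with an exact item count widened.

    Properties that are not arrays, or whose minItems and maxItems differ,
    are returned unchanged (as a copy).
    """
    loosened = copy.deepcopy(prop_def)
    min_items = loosened.get("minItems")
    max_items = loosened.get("maxItems")

    is_exact = (
        loosened.get("type") == "array"
        and min_items is not None
        and min_items == max_items
    )
    if is_exact:
        # Same bounds as the diffed code: two below (floored at 0), five above.
        loosened["minItems"] = max(0, min_items - 2)
        loosened["maxItems"] = min_items + 5
    return loosened
```

For example, a property requiring exactly 3 items would come back with `minItems` 1 and `maxItems` 8, while the input dictionary stays untouched.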
|
||||
|
||||
@@ -1,679 +1,11 @@
|
||||
"""
|
||||
Schema Validator for Issue #7: Validate a Markdown File Against a Schema.
|
||||
Schema Validator - Backward Compatibility Module.
|
||||
|
||||
This module provides functionality to validate markdown documents against JSON schemas
|
||||
for arc42 architecture documentation compliance checking - essential for intelligent
|
||||
document analysis and plan-actual comparison capabilities.
|
||||
This module re-exports from markitect.schema.validator for backward compatibility.
|
||||
New code should import from markitect.schema.validator directly.
|
||||
"""
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any
|
||||
# Re-export from schema package for backward compatibility
|
||||
from markitect.schema.validator import SchemaValidator
|
||||
|
||||
try:
|
||||
import jsonschema
|
||||
from jsonschema import SchemaError
|
||||
JSONSCHEMA_AVAILABLE = True
|
||||
except ImportError:
|
||||
# Fallback to basic validation without full JSON Schema validation
|
||||
JSONSCHEMA_AVAILABLE = False
|
||||
SchemaError = Exception
|
||||
|
||||
from .parser import parse_markdown_to_ast
|
||||
from .schema_generator import SchemaGenerator
|
||||
from .validation_error import ValidationErrorCollector, ValidationErrorType
|
||||
from .exceptions import FileNotFoundError, SchemaValidationError, InvalidSchemaError
|
||||
|
||||
|
||||
class SchemaValidator:
|
||||
"""
|
||||
Validates markdown documents against JSON schemas for arc42 compliance checking.
|
||||
|
||||
This service provides boolean validation results for markdown documents against
|
||||
schemas, enabling strict compliance checking for architectural documentation
|
||||
templates and intelligent plan-actual comparison.
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the schema validator."""
|
||||
self.schema_generator = SchemaGenerator()
|
||||
self.jsonschema_available = JSONSCHEMA_AVAILABLE
|
||||
|
||||
def validate_file_against_schema(self, file_path: Path, schema: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Validate a markdown file against a JSON schema.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema: JSON schema dictionary to validate against
|
||||
|
||||
Returns:
|
||||
True if the document matches the schema, False otherwise
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the markdown file doesn't exist
|
||||
InvalidSchemaError: If the schema is invalid
|
||||
"""
|
||||
# Validate inputs
|
||||
if not file_path.exists():
|
||||
raise FileNotFoundError(f"Markdown file not found: {file_path}")
|
||||
|
||||
# Validate the schema itself
|
||||
self._validate_schema(schema)
|
||||
|
||||
# Generate the document's current structure
|
||||
try:
|
||||
document_schema = self.schema_generator.generate_schema_from_file(file_path)
|
||||
except Exception as e:
|
||||
raise SchemaValidationError(f"Failed to generate document schema: {e}") from e
|
||||
|
||||
# Check if the expected schema has heading text constraints
|
||||
if self._has_heading_text_constraints(schema):
|
||||
# For heading text validation, we need to extract actual content and compare against enum constraints
|
||||
return self._validate_with_heading_text_constraints(file_path, schema, document_schema)
|
||||
|
||||
# Use standard structure comparison for backward compatibility
|
||||
return self._compare_structures(document_schema, schema)
|
||||
|
||||
def validate_file_against_schema_string(self, file_path: Path, schema_json: str) -> bool:
|
||||
"""
|
||||
Validate a markdown file against a JSON schema provided as a string.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema_json: JSON schema as a string
|
||||
|
||||
Returns:
|
||||
True if the document matches the schema, False otherwise
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the markdown file doesn't exist
|
||||
InvalidSchemaError: If the schema is invalid JSON or schema
|
||||
"""
|
||||
try:
|
||||
schema = json.loads(schema_json)
|
||||
except json.JSONDecodeError as e:
|
||||
raise InvalidSchemaError(f"Invalid JSON schema string: {e}") from e
|
||||
|
||||
return self.validate_file_against_schema(file_path, schema)
|
||||
|
||||
def validate_file_against_schema_file(self, file_path: Path, schema_file_path: Path) -> bool:
|
||||
"""
|
||||
Validate a markdown file against a schema stored in a file.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema_file_path: Path to the JSON schema file
|
||||
|
||||
Returns:
|
||||
True if the document matches the schema, False otherwise
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If either file doesn't exist
|
||||
InvalidSchemaError: If the schema file is invalid
|
||||
"""
|
||||
if not schema_file_path.exists():
|
||||
raise FileNotFoundError(f"Schema file not found: {schema_file_path}")
|
||||
|
||||
try:
|
||||
schema_content = schema_file_path.read_text(encoding='utf-8')
|
||||
schema = json.loads(schema_content)
|
||||
except (IOError, json.JSONDecodeError) as e:
|
||||
raise InvalidSchemaError(f"Failed to load schema file {schema_file_path}: {e}") from e
|
||||
|
||||
return self.validate_file_against_schema(file_path, schema)
|
||||
|
||||
def _validate_schema(self, schema: Dict[str, Any]) -> None:
|
||||
"""
|
||||
Validate that a schema is a valid JSON Schema.
|
||||
|
||||
Args:
|
||||
schema: Schema dictionary to validate
|
||||
|
||||
Raises:
|
||||
InvalidSchemaError: If the schema is invalid
|
||||
"""
|
||||
try:
|
||||
# Check basic schema structure
|
||||
if not isinstance(schema, dict):
|
||||
raise InvalidSchemaError("Schema must be a dictionary")
|
||||
|
||||
# Basic schema validation
|
||||
if not schema.get('$schema') or not schema.get('type'):
|
||||
raise InvalidSchemaError("Schema must have '$schema' and 'type' fields")
|
||||
|
||||
# If jsonschema library is available, use it for full validation
|
||||
if self.jsonschema_available:
|
||||
jsonschema.validators.validator_for(schema).check_schema(schema)
|
||||
|
||||
except (SchemaError, TypeError, AttributeError) as e:
|
||||
raise InvalidSchemaError(f"Invalid JSON schema: {e}") from e
|
||||
|
||||
def _compare_structures(self, document_schema: Dict[str, Any], expected_schema: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Compare a document's actual structure against expected schema requirements.
|
||||
|
||||
This method performs the core validation logic by analyzing whether the
|
||||
document's generated schema satisfies the requirements defined in the
|
||||
expected schema.
|
||||
|
||||
Args:
|
||||
document_schema: Schema generated from the actual document
|
||||
expected_schema: Expected schema requirements
|
||||
|
||||
Returns:
|
||||
True if the document satisfies the expected schema requirements
|
||||
"""
|
||||
try:
|
||||
# Extract actual document structure
|
||||
doc_properties = document_schema.get('properties', {})
|
||||
expected_properties = expected_schema.get('properties', {})
|
||||
|
||||
# Check all required properties are present
|
||||
required_properties = expected_schema.get('required', [])
|
||||
for prop in required_properties:
|
||||
if prop not in doc_properties:
|
||||
return False
|
||||
|
||||
# Validate heading structure if specified
|
||||
if 'headings' in expected_properties and 'headings' in doc_properties:
|
||||
if not self._validate_heading_structure(
|
||||
doc_properties['headings'],
|
||||
expected_properties['headings']
|
||||
):
|
||||
return False
|
||||
|
||||
# Validate other structural elements
|
||||
structural_elements = ['paragraphs', 'lists', 'code_blocks', 'blockquotes', 'tables']
|
||||
for element in structural_elements:
|
||||
if element in expected_properties:
|
||||
if not self._validate_structural_element(
|
||||
doc_properties.get(element),
|
||||
expected_properties[element]
|
||||
):
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
except Exception:
|
||||
# If comparison fails for any reason, consider validation failed
|
||||
return False
|
||||
|
||||
def _validate_heading_structure(self, actual_headings: Dict[str, Any], expected_headings: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Validate heading structure against expected requirements.
|
||||
|
||||
Args:
|
||||
actual_headings: Actual heading structure from document
|
||||
expected_headings: Expected heading requirements
|
||||
|
||||
Returns:
|
||||
True if heading structure meets requirements
|
||||
"""
|
||||
actual_heading_props = actual_headings.get('properties', {})
|
||||
expected_heading_props = expected_headings.get('properties', {})
|
||||
required_heading_levels = expected_headings.get('required', [])
|
||||
|
||||
# Check required heading levels are present
|
||||
for level in required_heading_levels:
|
||||
if level not in actual_heading_props:
|
||||
return False
|
||||
|
||||
# Check each expected heading level meets requirements
|
||||
for level, expected_spec in expected_heading_props.items():
|
||||
if level not in actual_heading_props:
|
||||
# If level is not required, skip it
|
||||
if level not in required_heading_levels:
|
||||
continue
|
||||
return False
|
||||
|
||||
actual_spec = actual_heading_props[level]
|
||||
|
||||
# Check minimum and maximum item requirements
|
||||
if not self._validate_array_constraints(actual_spec, expected_spec):
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def _validate_structural_element(self, actual_element: Dict[str, Any], expected_element: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Validate a structural element (paragraphs, lists, etc.) against requirements.
|
||||
|
||||
Args:
|
||||
actual_element: Actual element structure from document
|
||||
expected_element: Expected element requirements
|
||||
|
||||
Returns:
|
||||
True if element meets requirements
|
||||
"""
|
||||
if actual_element is None:
|
||||
# Element doesn't exist in document
|
||||
return False
|
||||
|
||||
return self._validate_array_constraints(actual_element, expected_element)
|
||||
|
||||
def _validate_array_constraints(self, actual: Dict[str, Any], expected: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Validate array constraints (minItems, maxItems) for structural elements.
|
||||
|
||||
Args:
|
||||
actual: Actual element specification
|
||||
expected: Expected element specification
|
||||
|
||||
Returns:
|
||||
True if constraints are satisfied
|
||||
"""
|
||||
# Get actual count from the schema specification
|
||||
# For generated schemas, we use minItems/maxItems which represent actual counts
|
||||
actual_min = actual.get('minItems', 0)
|
||||
actual_max = actual.get('maxItems', actual_min)
|
||||
actual_count = actual_max # In our generated schemas, min=max=actual count
|
||||
|
||||
# Check against expected constraints
|
||||
expected_min = expected.get('minItems', 0)
|
||||
expected_max = expected.get('maxItems', float('inf'))
|
||||
|
||||
return expected_min <= actual_count <= expected_max
|
||||
|
||||
# Issue #8: Detailed Error Reporting Methods
|
||||
|
||||
def validate_file_with_errors(self, file_path: Path, schema: Dict[str, Any]) -> ValidationErrorCollector:
|
||||
"""
|
||||
Validate a markdown file against a JSON schema and collect detailed errors.
|
||||
|
||||
This method provides comprehensive error reporting for Issue #8, enabling
|
||||
users to understand exactly how their documents deviate from schemas.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema: JSON schema dictionary to validate against
|
||||
|
||||
Returns:
|
||||
ValidationErrorCollector with all validation errors
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the markdown file doesn't exist
|
||||
InvalidSchemaError: If the schema is invalid
|
||||
"""
|
||||
# Validate inputs
|
||||
if not file_path.exists():
|
||||
raise FileNotFoundError(f"Markdown file not found: {file_path}")
|
||||
|
||||
# Validate the schema itself
|
||||
self._validate_schema(schema)
|
||||
|
||||
# Initialize error collector
|
||||
error_collector = ValidationErrorCollector()
|
||||
|
||||
# Generate the document's current structure
|
||||
try:
|
||||
document_schema = self.schema_generator.generate_schema_from_file(file_path)
|
||||
except Exception as e:
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.STRUCTURAL_VIOLATION,
|
||||
f"Failed to generate document schema: {e}",
|
||||
"document.structure",
|
||||
suggestion="Check if the markdown file is properly formatted"
|
||||
)
|
||||
return error_collector
|
||||
|
||||
# Compare the document's structure against the expected schema and collect errors
|
||||
if self._has_heading_text_constraints(schema):
|
||||
# For heading text validation, we need to handle enum constraints specially
|
||||
self._compare_structures_with_errors(document_schema, schema, error_collector)
|
||||
self._validate_heading_text_constraints_with_errors(file_path, schema, error_collector)
|
||||
else:
|
||||
# Use standard structure comparison for backward compatibility
|
||||
self._compare_structures_with_errors(document_schema, schema, error_collector)
|
||||
|
||||
return error_collector
|
||||
|
||||
def validate_file_with_errors_string(self, file_path: Path, schema_json: str) -> ValidationErrorCollector:
|
||||
"""
|
||||
Validate a markdown file against a JSON schema string and collect detailed errors.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema_json: JSON schema as a string
|
||||
|
||||
Returns:
|
||||
ValidationErrorCollector with all validation errors
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the markdown file doesn't exist
|
||||
InvalidSchemaError: If the schema is invalid JSON or schema
|
||||
"""
|
||||
try:
|
||||
schema = json.loads(schema_json)
|
||||
except json.JSONDecodeError as e:
|
||||
raise InvalidSchemaError(f"Invalid JSON schema string: {e}") from e
|
||||
|
||||
return self.validate_file_with_errors(file_path, schema)
|
||||
|
||||
def validate_file_with_errors_file(self, file_path: Path, schema_file_path: Path) -> ValidationErrorCollector:
|
||||
"""
|
||||
Validate a markdown file against a schema file and collect detailed errors.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
schema_file_path: Path to the JSON schema file
|
||||
|
||||
Returns:
|
||||
ValidationErrorCollector with all validation errors
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If either file doesn't exist
|
||||
InvalidSchemaError: If the schema file is invalid
|
||||
"""
|
||||
if not schema_file_path.exists():
|
||||
raise FileNotFoundError(f"Schema file not found: {schema_file_path}")
|
||||
|
||||
try:
|
||||
schema_content = schema_file_path.read_text(encoding='utf-8')
|
||||
schema = json.loads(schema_content)
|
||||
except (IOError, json.JSONDecodeError) as e:
|
||||
raise InvalidSchemaError(f"Failed to load schema file {schema_file_path}: {e}") from e
|
||||
|
||||
return self.validate_file_with_errors(file_path, schema)
|
||||
|
||||
def _compare_structures_with_errors(
|
||||
self,
|
||||
document_schema: Dict[str, Any],
|
||||
expected_schema: Dict[str, Any],
|
||||
error_collector: ValidationErrorCollector
|
||||
) -> None:
|
||||
"""
|
||||
Compare document structure against expected schema and collect detailed errors.
|
||||
|
||||
This method performs comprehensive validation analysis, collecting specific
|
||||
errors about missing headings, incorrect content counts, and structural violations.
|
||||
|
||||
Args:
|
||||
document_schema: Schema generated from the actual document
|
||||
expected_schema: Expected schema requirements
|
||||
error_collector: Collector to accumulate validation errors
|
||||
"""
|
||||
try:
|
||||
# Extract actual document structure
|
||||
doc_properties = document_schema.get('properties', {})
|
||||
expected_properties = expected_schema.get('properties', {})
|
||||
|
||||
# Check all required properties are present
|
||||
required_properties = expected_schema.get('required', [])
|
||||
for prop in required_properties:
|
||||
if prop not in doc_properties:
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.MISSING_REQUIRED_SECTION,
|
||||
f"Missing required section: '{prop}'",
|
||||
f"document.{prop}",
|
||||
expected=f"Section '{prop}' is required by schema",
|
||||
actual="Section not found",
|
||||
suggestion=f"Add the '{prop}' section to your document"
|
||||
)
|
||||
|
||||
# Validate heading structure if specified
|
||||
if 'headings' in expected_properties and 'headings' in doc_properties:
|
||||
self._validate_heading_structure_with_errors(
|
||||
doc_properties['headings'],
|
||||
expected_properties['headings'],
|
||||
error_collector
|
||||
)
|
||||
|
||||
# Validate other structural elements
|
||||
structural_elements = ['paragraphs', 'lists', 'code_blocks', 'blockquotes', 'tables']
|
||||
for element in structural_elements:
|
||||
if element in expected_properties:
|
||||
self._validate_structural_element_with_errors(
|
||||
doc_properties.get(element),
|
||||
expected_properties[element],
|
||||
element,
|
||||
error_collector
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.STRUCTURAL_VIOLATION,
|
||||
f"Error during structure comparison: {e}",
|
||||
"document.structure",
|
||||
suggestion="Check if both the document and schema are properly formatted"
|
||||
)
|
||||
|
||||
def _validate_heading_structure_with_errors(
|
||||
self,
|
||||
actual_headings: Dict[str, Any],
|
||||
expected_headings: Dict[str, Any],
|
||||
error_collector: ValidationErrorCollector
|
||||
) -> None:
|
||||
"""
|
||||
Validate heading structure and collect detailed errors.
|
||||
|
||||
Args:
|
||||
actual_headings: Actual heading structure from document
|
||||
expected_headings: Expected heading requirements
|
||||
error_collector: Collector for validation errors
|
||||
"""
|
||||
actual_heading_props = actual_headings.get('properties', {})
|
||||
expected_heading_props = expected_headings.get('properties', {})
|
||||
required_heading_levels = expected_headings.get('required', [])
|
||||
|
||||
# Check required heading levels are present
|
||||
for level in required_heading_levels:
|
||||
if level not in actual_heading_props:
|
||||
level_num = level.replace('level_', '')
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.MISSING_REQUIRED_HEADING,
|
||||
f"Missing required heading level {level_num}",
|
||||
f"headings.{level}",
|
||||
expected=f"At least one heading at level {level_num}",
|
||||
actual="No headings found at this level",
|
||||
suggestion=f"Add heading(s) at level {level_num} (e.g., {'#' * int(level_num)} Heading)"
|
||||
)
|
||||
|
||||
# Check each expected heading level meets requirements
|
||||
for level, expected_spec in expected_heading_props.items():
|
||||
if level not in actual_heading_props:
|
||||
# If level is not required, skip it
|
||||
if level not in required_heading_levels:
|
||||
continue
|
||||
# Already handled above in required check
|
||||
|
||||
else:
|
||||
actual_spec = actual_heading_props[level]
|
||||
level_num = level.replace('level_', '')
|
||||
|
||||
# Check minimum and maximum item requirements
|
||||
self._validate_array_constraints_with_errors(
|
||||
actual_spec,
|
||||
expected_spec,
|
||||
f"headings.{level}",
|
||||
f"level {level_num} headings",
|
||||
error_collector
|
||||
)
|
||||
|
||||
def _validate_structural_element_with_errors(
|
||||
self,
|
||||
actual_element: Dict[str, Any],
|
||||
expected_element: Dict[str, Any],
|
||||
element_name: str,
|
||||
error_collector: ValidationErrorCollector
|
||||
) -> None:
|
||||
"""
|
||||
Validate a structural element and collect errors.
|
||||
|
||||
Args:
|
||||
actual_element: Actual element structure from document
|
||||
expected_element: Expected element requirements
|
||||
element_name: Name of the structural element (for error messages)
|
||||
error_collector: Collector for validation errors
|
||||
"""
|
||||
if actual_element is None:
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.MISSING_REQUIRED_SECTION,
|
||||
f"Missing required structural element: {element_name}",
|
||||
f"content.{element_name}",
|
||||
expected=f"Document should contain {element_name}",
|
||||
actual="Element not found",
|
||||
suggestion=f"Add {element_name} to your document"
|
||||
)
|
||||
return
|
||||
|
||||
self._validate_array_constraints_with_errors(
|
||||
actual_element,
|
||||
expected_element,
|
||||
f"content.{element_name}",
|
||||
element_name,
|
||||
error_collector
|
||||
)
|
||||
|
||||
def _validate_array_constraints_with_errors(
|
||||
self,
|
||||
actual: Dict[str, Any],
|
||||
expected: Dict[str, Any],
|
||||
path: str,
|
||||
element_description: str,
|
||||
error_collector: ValidationErrorCollector
|
||||
) -> None:
|
||||
"""
|
||||
Validate array constraints and collect specific errors.
|
||||
|
||||
Args:
|
||||
actual: Actual element specification
|
||||
expected: Expected element specification
|
||||
path: JSON path for error location
|
||||
element_description: Human-readable element description
|
||||
error_collector: Collector for validation errors
|
||||
"""
|
||||
# Get actual count from the schema specification
|
||||
actual_min = actual.get('minItems', 0)
|
||||
actual_max = actual.get('maxItems', actual_min)
|
||||
actual_count = actual_max # In our generated schemas, min=max=actual count
|
||||
|
||||
# Check against expected constraints
|
||||
expected_min = expected.get('minItems', 0)
|
||||
expected_max = expected.get('maxItems', float('inf'))
|
||||
|
||||
# Check minimum constraint
|
||||
if actual_count < expected_min:
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.INSUFFICIENT_CONTENT,
|
||||
f"Insufficient {element_description}: found {actual_count}, required at least {expected_min}",
|
||||
path,
|
||||
expected=f"At least {expected_min} {element_description}",
|
||||
actual=f"{actual_count} {element_description}",
|
||||
suggestion=f"Add {expected_min - actual_count} more {element_description}"
|
||||
)
|
||||
|
||||
# Check maximum constraint
|
||||
if expected_max != float('inf') and actual_count > expected_max:
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.EXCESS_CONTENT,
|
||||
f"Too many {element_description}: found {actual_count}, maximum allowed {expected_max}",
|
||||
path,
|
||||
expected=f"At most {expected_max} {element_description}",
|
||||
actual=f"{actual_count} {element_description}",
|
||||
suggestion=f"Remove {actual_count - expected_max} {element_description}"
|
||||
)
|
||||
|
||||
def _has_heading_text_constraints(self, schema: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Check if the schema has heading text constraints (enum values on heading content).
|
||||
|
||||
Args:
|
||||
schema: JSON schema to check
|
||||
|
||||
Returns:
|
||||
True if schema has heading text constraints
|
||||
"""
|
||||
headings_props = schema.get('properties', {}).get('headings', {}).get('properties', {})
|
||||
|
||||
for level_props in headings_props.values():
|
||||
items = level_props.get('items', {})
|
||||
content_prop = items.get('properties', {}).get('content', {})
|
||||
if 'enum' in content_prop:
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
def _validate_with_heading_text_constraints(
|
||||
self,
|
||||
file_path: Path,
|
||||
expected_schema: Dict[str, Any],
|
||||
document_schema: Dict[str, Any]
|
||||
) -> bool:
|
||||
"""
|
||||
Validate document with heading text constraints by comparing actual content against enum values.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
expected_schema: Schema with heading text constraints
|
||||
document_schema: Generated schema from the actual document
|
||||
|
||||
Returns:
|
||||
True if document meets all constraints including heading text
|
||||
"""
|
||||
# First check standard structure compliance
|
||||
if not self._compare_structures(document_schema, expected_schema):
|
||||
return False
|
||||
|
||||
# Then check heading text constraints
|
||||
expected_headings = expected_schema.get('properties', {}).get('headings', {}).get('properties', {})
|
||||
|
||||
# Generate document analysis with actual heading content
|
||||
content = file_path.read_text(encoding='utf-8')
|
||||
ast_tokens = parse_markdown_to_ast(content)
|
||||
structure_analysis = self.schema_generator._analyze_ast_structure(ast_tokens, None)
|
||||
|
||||
for level_key, expected_level_spec in expected_headings.items():
|
||||
content_constraints = expected_level_spec.get('items', {}).get('properties', {}).get('content', {})
|
||||
|
||||
if 'enum' in content_constraints:
|
||||
allowed_texts = content_constraints['enum']
|
||||
actual_headings = structure_analysis['headings'].get(level_key, [])
|
||||
|
||||
for heading in actual_headings:
|
||||
actual_text = heading['content']
|
||||
if actual_text not in allowed_texts:
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def _validate_heading_text_constraints_with_errors(
|
||||
self,
|
||||
file_path: Path,
|
||||
expected_schema: Dict[str, Any],
|
||||
error_collector: ValidationErrorCollector
|
||||
) -> None:
|
||||
"""
|
||||
Validate heading text constraints and collect detailed errors.
|
||||
|
||||
Args:
|
||||
file_path: Path to the markdown file
|
||||
expected_schema: Schema with heading text constraints
|
||||
error_collector: Collector for validation errors
|
||||
"""
|
||||
expected_headings = expected_schema.get('properties', {}).get('headings', {}).get('properties', {})
|
||||
|
||||
# Generate document analysis with actual heading content
|
||||
content = file_path.read_text(encoding='utf-8')
|
||||
ast_tokens = parse_markdown_to_ast(content)
|
||||
structure_analysis = self.schema_generator._analyze_ast_structure(ast_tokens, None)
|
||||
|
||||
for level_key, expected_level_spec in expected_headings.items():
|
||||
content_constraints = expected_level_spec.get('items', {}).get('properties', {}).get('content', {})
|
||||
|
||||
if 'enum' in content_constraints:
|
||||
allowed_texts = content_constraints['enum']
|
||||
actual_headings = structure_analysis['headings'].get(level_key, [])
|
||||
|
||||
for i, heading in enumerate(actual_headings):
|
||||
actual_text = heading['content']
|
||||
if actual_text not in allowed_texts:
|
||||
# Add detailed error about heading text mismatch
|
||||
error_collector.add_error(
|
||||
ValidationErrorType.HEADING_COUNT_MISMATCH,
|
||||
f"Heading text mismatch at {level_key.replace('_', ' ')} #{i+1}: expected one of {allowed_texts}, found '{actual_text}'",
|
||||
f"headings.{level_key}[{i}].content",
|
||||
expected=f"One of: {allowed_texts}",
|
||||
actual=actual_text,
|
||||
suggestion=f"Change heading text to one of the allowed values: {', '.join(map(str, allowed_texts))}"
|
||||
)
|
||||
__all__ = ['SchemaValidator']
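Review note: the count comparison shared by `_validate_array_constraints` and `_validate_array_constraints_with_errors` can be exercised in isolation. The following is a minimal standalone sketch, not the module's code. `check_array_constraints` is a hypothetical free function that mirrors the logic shown in this diff, under the same assumption the source states: generated schemas encode the observed count as `minItems == maxItems`.

```python
from typing import Any, Dict, List


def check_array_constraints(actual: Dict[str, Any], expected: Dict[str, Any]) -> List[str]:
    """Return human-readable violations; an empty list means the constraints hold.

    Hypothetical helper for illustration only; it is not part of SchemaValidator.
    """
    # Generated schemas encode the observed count as minItems == maxItems,
    # so maxItems (falling back to minItems, then 0) is read as the actual count.
    actual_min = actual.get("minItems", 0)
    actual_count = actual.get("maxItems", actual_min)

    # A missing bound in the expected schema means "unbounded" on that side.
    expected_min = expected.get("minItems", 0)
    expected_max = expected.get("maxItems", float("inf"))

    problems: List[str] = []
    if actual_count < expected_min:
        problems.append(f"found {actual_count}, required at least {expected_min}")
    if actual_count > expected_max:
        problems.append(f"found {actual_count}, maximum allowed {expected_max}")
    return problems
```

Returning a list rather than a bool lets one function serve both the pass/fail path (`not problems`) and the detailed-error path, which may be worth considering if the two validator variants are ever merged.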
|
||||
|
||||
@@ -1,359 +1,11 @@
|
||||
"""
|
||||
AST to Markdown Serialization - Issue #2 Completion
|
||||
AST to Markdown Serialization - Backward Compatibility Module.
|
||||
|
||||
This module provides functionality to serialize markdown-it AST tokens back into
|
||||
markdown format, enabling roundtrip validation and document manipulation.
|
||||
|
||||
Key Features:
|
||||
- Convert AST tokens back to markdown text
|
||||
- Preserve front matter during serialization
|
||||
- Support for content manipulation operations
|
||||
- Roundtrip integrity validation
|
||||
This module re-exports from markitect.core.serializer for backward compatibility.
|
||||
New code should import from markitect.core.serializer directly.
|
||||
"""
|
||||
|
||||
from typing import List, Dict, Any, Optional
|
||||
import yaml
|
||||
# Re-export from core for backward compatibility
|
||||
from markitect.core.serializer import ASTSerializer
|
||||
|
||||
|
||||
class ASTSerializer:
|
||||
"""
|
||||
Serializes markdown-it AST tokens back to markdown format.
|
||||
|
||||
Provides roundtrip capability: markdown → AST → markdown
|
||||
Supports front matter preservation and content manipulation.
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the AST serializer."""
|
||||
pass
|
||||
|
||||
def serialize_to_markdown(self, ast: List[Dict[str, Any]], front_matter: Optional[Dict[str, Any]] = None) -> str:
|
||||
"""
|
||||
Convert AST tokens back to markdown format.
|
||||
|
||||
Args:
|
||||
ast: List of markdown-it AST tokens
|
||||
front_matter: Optional YAML front matter dictionary
|
||||
|
||||
Returns:
|
||||
Markdown text with optional front matter
|
||||
|
||||
Example:
|
||||
serializer = ASTSerializer()
|
||||
markdown = serializer.serialize_to_markdown(ast, front_matter)
|
||||
"""
|
||||
markdown_parts = []
|
||||
|
||||
# Add front matter if present
|
||||
if front_matter and isinstance(front_matter, dict) and front_matter:
|
||||
yaml_content = yaml.dump(front_matter, default_flow_style=False).strip()
|
||||
markdown_parts.append(f"---\n{yaml_content}\n---\n\n")
|
||||
|
||||
# Process AST tokens
|
||||
markdown_content = self._process_tokens(ast)
|
||||
markdown_parts.append(markdown_content)
|
||||
|
||||
return ''.join(markdown_parts)
|
||||
|
||||
def _process_tokens(self, tokens: List[Dict[str, Any]]) -> str:
|
||||
"""
|
||||
Process a list of AST tokens into markdown text.
|
||||
|
||||
Args:
|
||||
tokens: List of markdown-it tokens
|
||||
|
||||
Returns:
|
||||
Markdown text representation
|
||||
"""
|
||||
markdown_lines = []
|
||||
current_line = ""
|
||||
list_level = 0
|
||||
|
||||
for token in tokens:
|
||||
token_type = token.get('type', '')
|
||||
content = token.get('content', '')
|
||||
markup = token.get('markup', '')
|
||||
tag = token.get('tag', '')
|
||||
nesting = token.get('nesting', 0)
|
||||
level = token.get('level', 0)
|
||||
|
||||
# Handle different token types
|
||||
if token_type == 'heading_open':
|
||||
heading_level = int(tag[1]) if tag.startswith('h') else 1
|
||||
current_line = '#' * heading_level + ' '
|
||||
elif token_type == 'heading_close':
|
||||
if current_line:
|
||||
markdown_lines.append(current_line.rstrip())
|
||||
current_line = ""
|
||||
markdown_lines.append("") # Empty line after heading
|
||||
|
||||
elif token_type == 'paragraph_open':
|
||||
pass # Start of paragraph
|
||||
elif token_type == 'paragraph_close':
|
||||
if current_line:
|
||||
markdown_lines.append(current_line.rstrip())
|
||||
current_line = ""
|
||||
markdown_lines.append("") # Empty line after paragraph
|
||||
|
||||
elif token_type == 'inline':
|
||||
# Process inline content and children
|
||||
if content:
|
||||
current_line += content
|
||||
elif 'children' in token:
|
||||
current_line += self._process_inline_children(token['children'])
|
||||
|
||||
elif token_type == 'list_item_open':
|
||||
# Handle list items
|
||||
indent = ' ' * (level // 2)
|
||||
if markup in ('-', '*'):
|
||||
current_line = indent + '- '
|
||||
elif markup.isdigit():
|
||||
current_line = indent + '1. '
|
||||
elif token_type == 'list_item_close':
|
||||
if current_line:
|
||||
markdown_lines.append(current_line.rstrip())
|
||||
current_line = ""
|
||||
|
||||
elif token_type in ('bullet_list_open', 'ordered_list_open'):
|
||||
list_level += 1
|
||||
elif token_type in ('bullet_list_close', 'ordered_list_close'):
|
||||
list_level -= 1
|
||||
if list_level == 0:
|
||||
markdown_lines.append("") # Empty line after list
|
||||
|
||||
elif token_type == 'blockquote_open':
|
||||
pass
|
||||
elif token_type == 'blockquote_close':
|
||||
markdown_lines.append("")
|
||||
|
||||
elif token_type == 'code_block':
|
||||
markdown_lines.append(f"```{token.get('info', '')}")
|
||||
markdown_lines.append(content.rstrip())
|
||||
markdown_lines.append("```")
|
||||
markdown_lines.append("")
|
||||
|
||||
elif token_type == 'fence':
|
||||
if nesting == 1: # Opening fence
|
||||
markdown_lines.append(f"```{token.get('info', '')}")
|
||||
else: # Closing fence
|
||||
markdown_lines.append("```")
|
||||
markdown_lines.append("")
|
||||
|
||||
elif token_type == 'hr':
|
||||
markdown_lines.append("---")
|
||||
markdown_lines.append("")
|
||||
|
||||
elif token_type == 'text':
|
||||
current_line += content
|
||||
|
||||
# Add any remaining content
|
||||
if current_line:
|
||||
markdown_lines.append(current_line.rstrip())
|
||||
|
||||
# Clean up extra empty lines at the end
|
||||
while markdown_lines and markdown_lines[-1] == "":
|
||||
markdown_lines.pop()
|
||||
|
||||
return '\n'.join(markdown_lines)
|
||||
|
||||
def _process_inline_children(self, children: List[Dict[str, Any]]) -> str:
|
||||
"""
|
||||
Process inline children tokens (emphasis, strong, links, etc.).
|
||||
|
||||
Args:
|
||||
children: List of inline token children
|
||||
|
||||
Returns:
|
||||
Processed inline markdown text
|
||||
"""
|
||||
result = ""
|
||||
|
||||
for child in children:
|
||||
token_type = child.get('type', '')
|
||||
content = child.get('content', '')
|
||||
markup = child.get('markup', '')
|
||||
|
||||
if token_type == 'text':
|
||||
result += content
|
||||
elif token_type == 'code_inline':
|
||||
result += f"`{content}`"
|
||||
elif token_type == 'em_open':
|
||||
result += markup or '*'
|
||||
elif token_type == 'em_close':
|
||||
result += markup or '*'
|
||||
elif token_type == 'strong_open':
|
||||
result += markup or '**'
|
||||
elif token_type == 'strong_close':
|
||||
result += markup or '**'
|
||||
elif token_type == 'link_open':
|
||||
# Extract href from attrs
|
||||
href = ""
|
||||
if 'attrs' in child and child['attrs']:
|
||||
for attr in child['attrs']:
|
||||
if attr[0] == 'href':
|
||||
href = attr[1]
|
||||
break
|
||||
result += "["
|
||||
elif token_type == 'link_close':
|
||||
# This is tricky - we need to get the href from the opening token
|
||||
# For now, we'll use a placeholder approach
|
||||
result += "](#)"
|
||||
elif token_type == 'softbreak':
|
||||
result += '\n'
|
||||
elif token_type == 'hardbreak':
|
||||
result += ' \n'
|
||||
|
||||
return result
|
||||
|
||||
def modify_ast_content(self, ast: List[Dict[str, Any]], modifications: Dict[str, Any]) -> List[Dict[str, Any]]:
|
||||
"""
|
||||
Modify AST content based on provided modifications.
|
||||
|
||||
Args:
|
||||
ast: Original AST tokens
|
||||
modifications: Dictionary of modifications to apply
|
||||
|
||||
Returns:
|
||||
Modified AST tokens
|
||||
|
||||
Supported modifications:
|
||||
- add_section: Add a new section with title and content
|
||||
- update_front_matter: Update front matter values
|
||||
"""
|
||||
modified_ast = ast.copy()
|
||||
|
||||
# Handle adding sections
|
||||
if 'add_section' in modifications:
|
||||
section_data = modifications['add_section']
|
||||
title = section_data.get('title', 'New Section')
|
||||
content = section_data.get('content', '')
|
||||
level = section_data.get('level', 2)
|
||||
|
||||
# Create new section tokens
|
||||
new_tokens = [
|
||||
{
|
||||
"type": "heading_open",
|
||||
"tag": f"h{level}",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 1,
|
||||
"level": 0,
|
||||
"content": "",
|
||||
"markup": "#" * level,
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
},
|
||||
{
|
||||
"type": "inline",
|
||||
"tag": "",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 0,
|
||||
"level": 1,
|
||||
"children": [
|
||||
{
|
||||
"type": "text",
|
||||
"tag": "",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 0,
|
||||
"level": 0,
|
||||
"content": title,
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": False,
|
||||
"hidden": False
|
||||
}
|
||||
],
|
||||
"content": title,
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
},
|
||||
{
|
||||
"type": "heading_close",
|
||||
"tag": f"h{level}",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": -1,
|
||||
"level": 0,
|
||||
"content": "",
|
||||
"markup": "#" * level,
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
}
|
||||
]
|
||||
|
||||
if content:
|
||||
new_tokens.extend([
|
||||
{
|
||||
"type": "paragraph_open",
|
||||
"tag": "p",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 1,
|
||||
"level": 0,
|
||||
"content": "",
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
},
|
||||
{
|
||||
"type": "inline",
|
||||
"tag": "",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 0,
|
||||
"level": 1,
|
||||
"children": [
|
||||
{
|
||||
"type": "text",
|
||||
"tag": "",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": 0,
|
||||
"level": 0,
|
||||
"content": content,
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": False,
|
||||
"hidden": False
|
||||
}
|
||||
],
|
||||
"content": content,
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
},
|
||||
{
|
||||
"type": "paragraph_close",
|
||||
"tag": "p",
|
||||
"attrs": {},
|
||||
"map": None,
|
||||
"nesting": -1,
|
||||
"level": 0,
|
||||
"content": "",
|
||||
"markup": "",
|
||||
"info": "",
|
||||
"meta": {},
|
||||
"block": True,
|
||||
"hidden": False
|
||||
}
|
||||
])
|
||||
|
||||
# Add to end of AST
|
||||
modified_ast.extend(new_tokens)
|
||||
|
||||
return modified_ast
|
||||
__all__ = ['ASTSerializer']
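Review note: the `_process_inline_children` body removed from this module reads the `href` on `link_open` but then emits the placeholder `](#)` on `link_close`, so link targets do not survive a roundtrip. One possible fix is to keep a stack of pending hrefs. The sketch below is an assumption-laden illustration, not the code in `markitect.core.serializer`: `render_inline_links` and `_href_of` are hypothetical names, and `attrs` is assumed to be either a dict or a list of `[name, value]` pairs, since both shapes appear in this diff.

```python
from typing import Any, Dict, List


def _href_of(token: Dict[str, Any]) -> str:
    """Read href from attrs given either as a dict or as [name, value] pairs."""
    attrs = token.get("attrs") or {}
    if isinstance(attrs, dict):
        return str(attrs.get("href", ""))
    for name, value in attrs:
        if name == "href":
            return str(value)
    return ""


def render_inline_links(children: List[Dict[str, Any]]) -> str:
    """Render text and link tokens, pairing each link_close with its link_open."""
    result = ""
    href_stack: List[str] = []  # hrefs of links that are open but not yet closed
    for child in children:
        token_type = child.get("type", "")
        if token_type == "text":
            result += child.get("content", "")
        elif token_type == "link_open":
            href_stack.append(_href_of(child))
            result += "["
        elif token_type == "link_close":
            # An unmatched close falls back to an empty target instead of raising.
            href = href_stack.pop() if href_stack else ""
            result += f"]({href})"
    return result
```

Only `text`, `link_open`, and `link_close` are handled here; the emphasis, code, and break branches of the original would stay as they are.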
|
||||
|
||||
76
markitect/spaces/__init__.py
Normal file
@@ -0,0 +1,76 @@
"""
Information Spaces package for MarkiTect.

This package provides the Information Space abstraction, enabling:
- First-class space entities with identity, metadata, and lifecycle
- Event-driven change tracking and notifications
- Persistent transclusion context with cross-space references
- HTML rendering with caching and theme support
- Bidirectional directory synchronization
- Composable space hierarchies

Package Structure:
- models: Core domain models (InformationSpace, SpaceDocument, SpaceConfig)
- events: Event system (SpaceEvent, EventBus, handlers)
- repositories: Data access layer (ISpaceRepository, SqliteSpaceRepository)
- transclusion: Persistent transclusion context and reference tracking
- rendering: Space rendering (HTML, themes)
- sync: Directory synchronization (export, import, bidirectional)
- services: Business logic (SpaceService)
- history: Optional git-based version control

Usage:
    from markitect.spaces import SpaceService, InformationSpace

    service = SpaceService()
    space = await service.create_space("my-docs")
    await service.add_document(space, "/intro.md", content="# Intro")
    await service.render(space, output_dir="./html/")
"""

# Phase 1: Foundation
from .models import (
    InformationSpace,
    SpaceDocument,
    SpaceConfig,
    SpaceMetadata,
    SpaceVariable,
    TransclusionReference,
    SpaceStatus,
)
from .services import SpaceService
from .repositories import (
    ISpaceRepository,
    IDocumentAssociationRepository,
    IVariableRepository,
    IReferenceRepository,
    SqliteSpaceRepository,
    SqliteDocumentRepository,
    SqliteVariableRepository,
    SqliteReferenceRepository,
    initialize_space_tables,
)

__all__ = [
    # Models
    "InformationSpace",
    "SpaceDocument",
    "SpaceConfig",
    "SpaceMetadata",
    "SpaceVariable",
    "TransclusionReference",
    "SpaceStatus",
    # Services
    "SpaceService",
    # Repository Interfaces
    "ISpaceRepository",
    "IDocumentAssociationRepository",
    "IVariableRepository",
    "IReferenceRepository",
    # SQLite Implementations
    "SqliteSpaceRepository",
    "SqliteDocumentRepository",
    "SqliteVariableRepository",
    "SqliteReferenceRepository",
    "initialize_space_tables",
]
16
markitect/spaces/events/__init__.py
Normal file
@@ -0,0 +1,16 @@
"""
Event system for Information Spaces.

This package provides event-driven architecture for space operations:
- SpaceEvent: Event dataclass with type, payload, timestamp
- EventBus: In-process publish/subscribe for space events
- Event handlers and registration

Events emitted:
- SPACE_CREATED, SPACE_UPDATED, SPACE_DELETED
- DOCUMENT_ADDED, DOCUMENT_UPDATED, DOCUMENT_REMOVED
- RENDER_COMPLETED, SYNC_COMPLETED
"""

# Events will be implemented in Phase 2
__all__ = []
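The events package is a placeholder until Phase 2, so none of the names in its docstring exist in this commit. As a rough illustration of the in-process publish/subscribe idea the docstring describes, a minimal sketch might look like the following. `SpaceEvent`, `EventBus`, `subscribe`, and `publish` are hypothetical names and signatures chosen here for illustration; the eventual Phase 2 API may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable, DefaultDict, Dict, List


@dataclass
class SpaceEvent:
    """Hypothetical event record: a type, a payload, and a timestamp."""
    type: str
    payload: Dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=datetime.now)


class EventBus:
    """Hypothetical synchronous, in-process publish/subscribe bus."""

    def __init__(self) -> None:
        # event type -> handlers registered for that type
        self._handlers: DefaultDict[str, List[Callable[[SpaceEvent], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[SpaceEvent], None]) -> None:
        """Register a handler for one event type."""
        self._handlers[event_type].append(handler)

    def publish(self, event: SpaceEvent) -> int:
        """Call every handler registered for the event type; return how many ran."""
        handlers = list(self._handlers.get(event.type, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
received: List[SpaceEvent] = []
bus.subscribe("SPACE_CREATED", received.append)

delivered = bus.publish(SpaceEvent("SPACE_CREATED", {"name": "my-docs"}))
ignored = bus.publish(SpaceEvent("SPACE_DELETED", {"name": "my-docs"}))
```

Handlers run synchronously in the publisher's call stack here, which keeps the sketch small; an async or queued bus would be a different design.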
13
markitect/spaces/history/__init__.py
Normal file
@@ -0,0 +1,13 @@
"""
Git history tracking for Information Spaces (Optional Phase 8).

This package provides version control integration:
- IHistoryBackend: Abstract history backend interface
- GitHistoryBackend: Git implementation
- Event-driven commit triggers
- History query API (log, diff, branches)
- Versioned read/render operations
"""

# History tracking will be implemented in Phase 8
__all__ = []
329
markitect/spaces/models.py
Normal file
@@ -0,0 +1,329 @@
"""
Core domain models for Information Spaces.

This module provides the foundational data models for the Information Space
abstraction, including the space entity, document associations, and configuration.
"""

import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Any, List, Optional
from enum import Enum


class SpaceStatus(Enum):
    """Lifecycle status of an Information Space."""
    DRAFT = "draft"
    ACTIVE = "active"
    ARCHIVED = "archived"
    DELETED = "deleted"


@dataclass
class SpaceMetadata:
    """
    Extensible metadata for an Information Space.

    Attributes:
        tags: List of tags for categorization
        author: Author identifier
        custom: Dictionary for custom metadata fields
    """
    tags: List[str] = field(default_factory=list)
    author: Optional[str] = None
    custom: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        """Convert metadata to dictionary for serialization."""
        return {
            "tags": self.tags,
            "author": self.author,
            "custom": self.custom,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SpaceMetadata":
        """Create metadata from dictionary."""
        return cls(
            tags=data.get("tags", []),
            author=data.get("author"),
            custom=data.get("custom", {}),
        )


@dataclass
class SpaceConfig:
    """
    Configuration settings for an Information Space.

    Attributes:
        default_variant: Default directory variant for export (flat/hierarchical/semantic)
        enable_caching: Whether to enable render caching
        theme: Theme name for HTML rendering
        history_enabled: Whether git history tracking is enabled (Phase 8)
        history_backend: History backend type (default: "git")
        history_options: Additional history backend options
        variable_scope: Default variable scope resolution strategy
    """
    default_variant: str = "hierarchical"
    enable_caching: bool = True
    theme: Optional[str] = None
    history_enabled: bool = False
    history_backend: str = "git"
    history_options: Dict[str, Any] = field(default_factory=dict)
    variable_scope: str = "space"  # space, document, request

    def to_dict(self) -> Dict[str, Any]:
        """Convert config to dictionary for serialization."""
        return {
            "default_variant": self.default_variant,
            "enable_caching": self.enable_caching,
            "theme": self.theme,
            "history_enabled": self.history_enabled,
            "history_backend": self.history_backend,
            "history_options": self.history_options,
            "variable_scope": self.variable_scope,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SpaceConfig":
        """Create config from dictionary."""
        return cls(
            default_variant=data.get("default_variant", "hierarchical"),
            enable_caching=data.get("enable_caching", True),
            theme=data.get("theme"),
            history_enabled=data.get("history_enabled", False),
            history_backend=data.get("history_backend", "git"),
            history_options=data.get("history_options", {}),
            variable_scope=data.get("variable_scope", "space"),
        )


@dataclass
class SpaceDocument:
    """
    Represents a document's membership in an Information Space.

    Attributes:
        id: Unique document membership identifier
        space_id: ID of the containing space
        document_id: Reference to the actual document
        space_path: Path within the space (e.g., "/intro.md")
        order_index: Ordering within the space
        metadata: Document-specific metadata
        content_hash: Hash of document content for change detection
        added_at: Timestamp when document was added
    """
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    space_id: str = ""
    document_id: str = ""
    space_path: str = ""
    order_index: int = 0
    metadata: Dict[str, Any] = field(default_factory=dict)
    content_hash: Optional[str] = None
    added_at: datetime = field(default_factory=datetime.now)

    def to_dict(self) -> Dict[str, Any]:
        """Convert document association to dictionary."""
        return {
            "id": self.id,
            "space_id": self.space_id,
            "document_id": self.document_id,
            "space_path": self.space_path,
            "order_index": self.order_index,
            "metadata": self.metadata,
            "content_hash": self.content_hash,
            "added_at": self.added_at.isoformat(),
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SpaceDocument":
        """Create document association from dictionary."""
        added_at = data.get("added_at")
        if isinstance(added_at, str):
            added_at = datetime.fromisoformat(added_at)
        elif added_at is None:
            added_at = datetime.now()

        return cls(
            id=data.get("id", str(uuid.uuid4())),
            space_id=data.get("space_id", ""),
            document_id=data.get("document_id", ""),
            space_path=data.get("space_path", ""),
            order_index=data.get("order_index", 0),
            metadata=data.get("metadata", {}),
            content_hash=data.get("content_hash"),
            added_at=added_at,
        )


@dataclass
class InformationSpace:
    """
    First-class Information Space abstraction.

    An Information Space is a container for documents with transclusion
    relationships, persistent context, and lifecycle management.

    Attributes:
        id: Unique space identifier
        name: Human-readable unique name
        description: Optional description
        metadata: Extensible metadata
        config: Space configuration
        parent_space_id: Optional parent space for inheritance
        status: Current lifecycle status
        created_at: Creation timestamp
        updated_at: Last update timestamp

    Example:
        space = InformationSpace(
            name="api-docs",
            description="API Documentation",
            config=SpaceConfig(theme="technical")
        )
    """
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    name: str = ""
    description: Optional[str] = None
    metadata: SpaceMetadata = field(default_factory=SpaceMetadata)
    config: SpaceConfig = field(default_factory=SpaceConfig)
    parent_space_id: Optional[str] = None
    status: SpaceStatus = SpaceStatus.DRAFT
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)

    def __post_init__(self):
        """Validate space after initialization."""
        if not self.name:
            raise ValueError("Space name is required")

    def to_dict(self) -> Dict[str, Any]:
        """Convert space to dictionary for serialization."""
        return {
            "id": self.id,
            "name": self.name,
            "description": self.description,
            "metadata": self.metadata.to_dict() if isinstance(self.metadata, SpaceMetadata) else self.metadata,
            "config": self.config.to_dict() if isinstance(self.config, SpaceConfig) else self.config,
            "parent_space_id": self.parent_space_id,
            "status": self.status.value if isinstance(self.status, SpaceStatus) else self.status,
            "created_at": self.created_at.isoformat(),
            "updated_at": self.updated_at.isoformat(),
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "InformationSpace":
        """Create space from dictionary."""
        created_at = data.get("created_at")
        if isinstance(created_at, str):
            created_at = datetime.fromisoformat(created_at)
        elif created_at is None:
            created_at = datetime.now()

        updated_at = data.get("updated_at")
        if isinstance(updated_at, str):
            updated_at = datetime.fromisoformat(updated_at)
        elif updated_at is None:
            updated_at = datetime.now()

        status = data.get("status", "draft")
        if isinstance(status, str):
            status = SpaceStatus(status)

        metadata = data.get("metadata", {})
        if isinstance(metadata, dict):
            metadata = SpaceMetadata.from_dict(metadata)

        config = data.get("config", {})
        if isinstance(config, dict):
            config = SpaceConfig.from_dict(config)

        return cls(
            id=data.get("id", str(uuid.uuid4())),
            name=data["name"],
            description=data.get("description"),
            metadata=metadata,
            config=config,
            parent_space_id=data.get("parent_space_id"),
            status=status,
            created_at=created_at,
            updated_at=updated_at,
        )

    def activate(self) -> None:
        """Activate the space."""
        self.status = SpaceStatus.ACTIVE
        self.updated_at = datetime.now()

    def archive(self) -> None:
        """Archive the space."""
        self.status = SpaceStatus.ARCHIVED
        self.updated_at = datetime.now()

    def touch(self) -> None:
        """Update the last modified timestamp."""
        self.updated_at = datetime.now()


@dataclass
class SpaceVariable:
    """
    Variable stored at space level for transclusion context.

    Attributes:
        space_id: ID of the containing space
        name: Variable name
        value: Variable value (JSON-serializable)
        scope: Variable scope (space, document, request)
    """
    space_id: str
    name: str
    value: Any
    scope: str = "space"

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary."""
        return {
            "space_id": self.space_id,
            "name": self.name,
            "value": self.value,
            "scope": self.scope,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SpaceVariable":
        """Create from dictionary."""
        return cls(
            space_id=data["space_id"],
            name=data["name"],
            value=data["value"],
            scope=data.get("scope", "space"),
        )


@dataclass
class TransclusionReference:
    """
    Tracks a transclusion reference between documents for cache invalidation.

    Attributes:
        source_doc_id: ID of the document containing the transclusion
        target_doc_id: ID of the transcluded document
        space_id: ID of the space containing the reference
        created_at: When the reference was created
    """
    source_doc_id: str
    target_doc_id: str
    space_id: str
    created_at: datetime = field(default_factory=datetime.now)

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary."""
        return {
            "source_doc_id": self.source_doc_id,
            "target_doc_id": self.target_doc_id,
            "space_id": self.space_id,
            "created_at": self.created_at.isoformat(),
        }
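Unlike the other models in this file, `TransclusionReference` defines `to_dict` but no `from_dict`. If a round trip is wanted, a counterpart could follow the timestamp handling that `SpaceDocument.from_dict` already uses. The sketch below is self-contained, so it redefines a stand-in `TransclusionReference`; the `from_dict` method is an assumption for illustration and is not part of this commit.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict


@dataclass
class TransclusionReference:
    """Stand-in mirroring the model in the diff, plus a hypothetical from_dict."""
    source_doc_id: str
    target_doc_id: str
    space_id: str
    created_at: datetime = field(default_factory=datetime.now)

    def to_dict(self) -> Dict[str, Any]:
        """Serialize, writing the timestamp as an ISO 8601 string."""
        return {
            "source_doc_id": self.source_doc_id,
            "target_doc_id": self.target_doc_id,
            "space_id": self.space_id,
            "created_at": self.created_at.isoformat(),
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TransclusionReference":
        """Hypothetical inverse of to_dict (not in the original commit)."""
        created_at = data.get("created_at")
        if isinstance(created_at, str):
            # ISO string produced by to_dict
            created_at = datetime.fromisoformat(created_at)
        elif created_at is None:
            # Missing timestamp: fall back to the current time
            created_at = datetime.now()
        return cls(
            source_doc_id=data["source_doc_id"],
            target_doc_id=data["target_doc_id"],
            space_id=data["space_id"],
            created_at=created_at,
        )


original = TransclusionReference("doc-a", "doc-b", "space-1")
restored = TransclusionReference.from_dict(original.to_dict())
```

Because dataclasses compare field by field and `isoformat()` preserves microseconds, `restored` compares equal to `original` in this sketch.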
12
markitect/spaces/rendering/__init__.py
Normal file
@@ -0,0 +1,12 @@
"""
Rendering system for Information Spaces.

This package provides space rendering capabilities:
- SpaceRenderer: Abstract renderer interface
- MarkdownToHTMLRenderer: HTML output renderer
- Theme support and customization
- Render caching with invalidation
"""

# Rendering will be implemented in Phase 4
__all__ = []
38
markitect/spaces/repositories/__init__.py
Normal file
@@ -0,0 +1,38 @@
"""
Repository layer for Information Spaces.

This package provides data access abstractions:
- ISpaceRepository: Abstract interface for space CRUD operations
- SqliteSpaceRepository: SQLite implementation
- IDocumentAssociationRepository: Document-space association storage
- IVariableRepository: Space variable storage
- IReferenceRepository: Transclusion reference tracking
"""

from .interfaces import (
    ISpaceRepository,
    IDocumentAssociationRepository,
    IVariableRepository,
    IReferenceRepository,
)
from .sqlite import (
    SqliteSpaceRepository,
    SqliteDocumentRepository,
    SqliteVariableRepository,
    SqliteReferenceRepository,
    initialize_space_tables,
)

__all__ = [
    # Interfaces
    "ISpaceRepository",
    "IDocumentAssociationRepository",
    "IVariableRepository",
    "IReferenceRepository",
    # SQLite implementations
    "SqliteSpaceRepository",
    "SqliteDocumentRepository",
    "SqliteVariableRepository",
    "SqliteReferenceRepository",
    "initialize_space_tables",
]
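The commit message mentions foreign key cascade deletes in the SQLite implementations. SQLite only enforces foreign keys on connections where `PRAGMA foreign_keys = ON` has been issued, so the cascade depends on that pragma being set for every connection. A minimal sketch against an in-memory database, using a trimmed-down two-table schema assumed for illustration (not the full schema from this commit):

```python
import sqlite3

# Trimmed-down schema, assumed for illustration only.
SCHEMA = """
CREATE TABLE spaces (
    id TEXT PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE space_documents (
    id TEXT PRIMARY KEY,
    space_id TEXT NOT NULL REFERENCES spaces(id) ON DELETE CASCADE,
    space_path TEXT NOT NULL,
    UNIQUE(space_id, space_path)
);
"""


def remaining_documents(enable_foreign_keys: bool) -> int:
    """Delete a space and count the document rows left behind."""
    conn = sqlite3.connect(":memory:")
    try:
        # Set the pragma explicitly in both cases, since the compiled-in
        # default can vary between SQLite builds.
        if enable_foreign_keys:
            conn.execute("PRAGMA foreign_keys = ON")
        else:
            conn.execute("PRAGMA foreign_keys = OFF")
        conn.executescript(SCHEMA)
        conn.execute("INSERT INTO spaces (id, name) VALUES ('s1', 'my-docs')")
        conn.execute(
            "INSERT INTO space_documents (id, space_id, space_path) "
            "VALUES ('d1', 's1', '/intro.md')"
        )
        conn.execute("DELETE FROM spaces WHERE id = 's1'")
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM space_documents").fetchone()[0]
    finally:
        conn.close()


with_cascade = remaining_documents(enable_foreign_keys=True)
without_cascade = remaining_documents(enable_foreign_keys=False)
```

With enforcement on, deleting the space removes its document row; with it off, the document row is left orphaned.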
409
markitect/spaces/repositories/interfaces.py
Normal file
@@ -0,0 +1,409 @@
"""
Repository interfaces for Information Spaces.

This module defines abstract base classes for space data access,
following the repository pattern for clean separation of concerns.
"""

from abc import ABC, abstractmethod
from typing import List, Optional, Dict, Any
from ..models import (
    InformationSpace,
    SpaceDocument,
    SpaceVariable,
    TransclusionReference,
)


class ISpaceRepository(ABC):
    """
    Abstract repository interface for InformationSpace persistence.

    Implementations should handle CRUD operations for spaces,
    including proper transaction management and error handling.
    """

    @abstractmethod
    def create(self, space: InformationSpace) -> InformationSpace:
        """
        Create a new space in the repository.

        Args:
            space: The space to create

        Returns:
            The created space with any generated fields populated

        Raises:
            ValueError: If space with same name already exists
        """
        pass

    @abstractmethod
    def get_by_id(self, space_id: str) -> Optional[InformationSpace]:
        """
        Retrieve a space by its ID.

        Args:
            space_id: The unique space identifier

        Returns:
            The space if found, None otherwise
        """
        pass

    @abstractmethod
    def get_by_name(self, name: str) -> Optional[InformationSpace]:
        """
        Retrieve a space by its unique name.

        Args:
            name: The space name

        Returns:
            The space if found, None otherwise
        """
        pass

    @abstractmethod
    def list_all(self, include_archived: bool = False) -> List[InformationSpace]:
        """
        List all spaces in the repository.

        Args:
            include_archived: Whether to include archived spaces

        Returns:
            List of all spaces
        """
        pass

    @abstractmethod
    def update(self, space: InformationSpace) -> InformationSpace:
        """
        Update an existing space.

        Args:
            space: The space with updated values

        Returns:
            The updated space

        Raises:
            ValueError: If space does not exist
        """
        pass

    @abstractmethod
    def delete(self, space_id: str) -> bool:
        """
        Delete a space by ID.

        Args:
            space_id: The space ID to delete

        Returns:
            True if deleted, False if not found
        """
        pass

    @abstractmethod
    def exists(self, space_id: str) -> bool:
        """
        Check if a space exists.

        Args:
            space_id: The space ID to check

        Returns:
            True if exists, False otherwise
        """
        pass

    @abstractmethod
    def get_children(self, parent_space_id: str) -> List[InformationSpace]:
        """
        Get all child spaces of a parent space.

        Args:
            parent_space_id: The parent space ID

        Returns:
            List of child spaces
        """
        pass


class IDocumentAssociationRepository(ABC):
    """
    Abstract repository interface for SpaceDocument associations.

    Manages the relationship between documents and spaces.
    """

    @abstractmethod
    def add_document(self, document: SpaceDocument) -> SpaceDocument:
        """
        Add a document to a space.

        Args:
            document: The document association to create

        Returns:
            The created document association

        Raises:
            ValueError: If document path already exists in space
        """
        pass

    @abstractmethod
    def get_document(self, document_id: str) -> Optional[SpaceDocument]:
        """
        Get a document association by ID.

        Args:
            document_id: The document association ID

        Returns:
            The document if found, None otherwise
        """
        pass

    @abstractmethod
    def get_by_space_path(self, space_id: str, space_path: str) -> Optional[SpaceDocument]:
        """
        Get a document by its path within a space.

        Args:
            space_id: The space ID
            space_path: The path within the space (e.g., "/intro.md")

        Returns:
            The document if found, None otherwise
        """
        pass

    @abstractmethod
    def list_by_space(self, space_id: str) -> List[SpaceDocument]:
        """
        List all documents in a space.

        Args:
            space_id: The space ID

        Returns:
            List of documents in the space, ordered by order_index
        """
        pass

    @abstractmethod
    def update_document(self, document: SpaceDocument) -> SpaceDocument:
        """
        Update a document association.

        Args:
            document: The document with updated values

        Returns:
            The updated document

        Raises:
            ValueError: If document does not exist
        """
        pass

    @abstractmethod
    def remove_document(self, document_id: str) -> bool:
        """
        Remove a document from a space.

        Args:
            document_id: The document association ID

        Returns:
            True if removed, False if not found
        """
        pass

    @abstractmethod
    def move_document(self, document_id: str, new_space_path: str) -> SpaceDocument:
        """
        Move a document to a new path within the space.

        Args:
            document_id: The document association ID
            new_space_path: The new path within the space

        Returns:
            The updated document

        Raises:
            ValueError: If new path already exists
        """
        pass

    @abstractmethod
    def reorder_documents(self, space_id: str, document_ids: List[str]) -> None:
        """
        Reorder documents within a space.

        Args:
            space_id: The space ID
            document_ids: Ordered list of document IDs
        """
        pass

    @abstractmethod
    def update_content_hash(self, document_id: str, content_hash: str) -> None:
        """
        Update the content hash for change detection.

        Args:
            document_id: The document association ID
            content_hash: New content hash
        """
        pass


class IVariableRepository(ABC):
    """
    Abstract repository interface for SpaceVariable storage.

    Manages space-level variables for transclusion context.
    """

    @abstractmethod
    def set_variable(self, variable: SpaceVariable) -> SpaceVariable:
        """
        Set a variable value.

        Args:
            variable: The variable to set

        Returns:
            The saved variable
        """
        pass

    @abstractmethod
    def get_variable(self, space_id: str, name: str) -> Optional[SpaceVariable]:
        """
        Get a variable by name.

        Args:
            space_id: The space ID
            name: Variable name

        Returns:
            The variable if found, None otherwise
        """
        pass

    @abstractmethod
    def list_variables(self, space_id: str, scope: Optional[str] = None) -> List[SpaceVariable]:
        """
        List all variables in a space.

        Args:
            space_id: The space ID
            scope: Optional scope filter

        Returns:
            List of variables
        """
        pass

    @abstractmethod
    def delete_variable(self, space_id: str, name: str) -> bool:
        """
        Delete a variable.

        Args:
            space_id: The space ID
            name: Variable name

        Returns:
            True if deleted, False if not found
        """
        pass


class IReferenceRepository(ABC):
    """
    Abstract repository interface for TransclusionReference tracking.

    Manages the dependency graph for cache invalidation.
    """

    @abstractmethod
    def add_reference(self, reference: TransclusionReference) -> TransclusionReference:
        """
        Add a transclusion reference.

        Args:
            reference: The reference to add

        Returns:
            The saved reference
        """
        pass

    @abstractmethod
    def get_references_from(self, source_doc_id: str, space_id: str) -> List[TransclusionReference]:
        """
        Get all references from a source document.

        Args:
            source_doc_id: The source document ID
            space_id: The space ID

        Returns:
            List of references from this document
        """
        pass

    @abstractmethod
    def get_references_to(self, target_doc_id: str, space_id: str) -> List[TransclusionReference]:
        """
        Get all references to a target document.

        Args:
            target_doc_id: The target document ID
            space_id: The space ID

        Returns:
            List of references to this document
        """
        pass

    @abstractmethod
    def clear_references_from(self, source_doc_id: str, space_id: str) -> int:
        """
        Clear all references from a source document.

        Args:
            source_doc_id: The source document ID
            space_id: The space ID

        Returns:
            Number of references deleted
        """
        pass

    @abstractmethod
    def get_dependents(self, document_id: str, space_id: str) -> List[str]:
        """
        Get all documents that depend on a given document.

        Used for cache invalidation - returns documents that need
        to be re-rendered when the target document changes.

        Args:
            document_id: The document ID
            space_id: The space ID

        Returns:
            List of dependent document IDs
        """
        pass
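`get_dependents` is documented as returning the documents that need re-rendering when a target changes, but the interface does not say whether that means direct dependents only or the transitive set. As a sketch of the transitive reading, here is a breadth-first walk over reverse references held in memory. `InMemoryReferenceIndex` and its methods are hypothetical stand-ins for illustration, not the repository classes from this commit.

```python
from collections import defaultdict, deque
from typing import DefaultDict, List, Set, Tuple


class InMemoryReferenceIndex:
    """Hypothetical in-memory stand-in for transclusion reference tracking."""

    def __init__(self) -> None:
        # (space_id, target_doc_id) -> source_doc_ids that transclude the target
        self._sources_by_target: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_reference(self, source_doc_id: str, target_doc_id: str, space_id: str) -> None:
        """Record that source_doc_id transcludes target_doc_id within space_id."""
        self._sources_by_target[(space_id, target_doc_id)].add(source_doc_id)

    def get_dependents(self, document_id: str, space_id: str) -> List[str]:
        """Return every document that directly or indirectly transcludes document_id."""
        seen: Set[str] = set()
        queue = deque([document_id])
        while queue:
            current = queue.popleft()
            for source in self._sources_by_target.get((space_id, current), ()):
                # The seen set also guards against cycles in the reference graph.
                if source not in seen and source != document_id:
                    seen.add(source)
                    queue.append(source)
        return sorted(seen)


index = InMemoryReferenceIndex()
index.add_reference("chapter", "snippet", "space-1")  # chapter transcludes snippet
index.add_reference("book", "chapter", "space-1")     # book transcludes chapter
index.add_reference("other", "snippet", "space-2")    # different space

dependents = index.get_dependents("snippet", "space-1")
```

If only direct dependents are intended, the walk reduces to a single lookup of the sources recorded for the target.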
713
markitect/spaces/repositories/sqlite.py
Normal file
@@ -0,0 +1,713 @@
|
||||
"""
|
||||
SQLite implementation of space repositories.
|
||||
|
||||
This module provides SQLite-backed implementations of the repository
|
||||
interfaces for persistent storage of Information Spaces.
|
||||
"""
|
||||
|
||||
import sqlite3
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import List, Optional
|
||||
from datetime import datetime
|
||||
|
||||
from .interfaces import (
|
||||
ISpaceRepository,
|
||||
IDocumentAssociationRepository,
|
||||
IVariableRepository,
|
||||
IReferenceRepository,
|
||||
)
|
||||
from ..models import (
|
||||
InformationSpace,
|
||||
SpaceDocument,
|
||||
SpaceVariable,
|
||||
TransclusionReference,
|
||||
SpaceStatus,
|
||||
SpaceMetadata,
|
||||
SpaceConfig,
|
||||
)
|
||||
|
||||
|
||||
# SQL Schema for space tables
|
||||
SPACE_TABLES_SQL = """
|
||||
-- Information Spaces table
|
||||
CREATE TABLE IF NOT EXISTS spaces (
|
||||
id TEXT PRIMARY KEY,
|
||||
name TEXT UNIQUE NOT NULL,
|
||||
description TEXT,
|
||||
metadata JSON,
|
||||
config JSON,
|
||||
parent_space_id TEXT REFERENCES spaces(id),
|
||||
status TEXT DEFAULT 'draft',
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
);
|
||||
|
||||
-- Space documents association table
|
||||
CREATE TABLE IF NOT EXISTS space_documents (
|
||||
id TEXT PRIMARY KEY,
|
||||
space_id TEXT NOT NULL REFERENCES spaces(id) ON DELETE CASCADE,
|
||||
document_id TEXT NOT NULL,
|
||||
space_path TEXT NOT NULL,
|
||||
order_index INTEGER DEFAULT 0,
|
||||
metadata JSON,
|
||||
content_hash TEXT,
|
||||
added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
UNIQUE(space_id, space_path)
|
||||
);
|
||||
|
||||
-- Space variables for transclusion context
|
||||
CREATE TABLE IF NOT EXISTS space_variables (
|
||||
space_id TEXT NOT NULL REFERENCES spaces(id) ON DELETE CASCADE,
|
||||
name TEXT NOT NULL,
|
||||
value JSON,
|
||||
scope TEXT DEFAULT 'space',
|
||||
PRIMARY KEY(space_id, name)
|
||||
);
|
||||
|
||||
-- Transclusion reference tracking for cache invalidation
|
||||
CREATE TABLE IF NOT EXISTS transclusion_references (
|
||||
source_doc_id TEXT NOT NULL,
|
||||
target_doc_id TEXT NOT NULL,
|
||||
space_id TEXT NOT NULL REFERENCES spaces(id) ON DELETE CASCADE,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
PRIMARY KEY(source_doc_id, target_doc_id, space_id)
|
||||
);
|
||||
|
||||
-- Indexes for performance
|
||||
CREATE INDEX IF NOT EXISTS idx_spaces_name ON spaces(name);
|
||||
CREATE INDEX IF NOT EXISTS idx_spaces_parent ON spaces(parent_space_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_spaces_status ON spaces(status);
|
||||
CREATE INDEX IF NOT EXISTS idx_space_documents_space ON space_documents(space_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_space_documents_path ON space_documents(space_id, space_path);
|
||||
CREATE INDEX IF NOT EXISTS idx_transclusion_refs_source ON transclusion_references(source_doc_id, space_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_transclusion_refs_target ON transclusion_references(target_doc_id, space_id);
|
||||
"""
|
||||
|
||||
|
||||
def initialize_space_tables(db_path: str) -> None:
|
||||
"""
|
||||
Initialize the space-related database tables.
|
||||
|
||||
Args:
|
||||
db_path: Path to the SQLite database file
|
||||
"""
|
||||
# Ensure directory exists
|
||||
db_dir = Path(db_path).parent
|
||||
if db_dir and not db_dir.exists():
|
||||
db_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
conn = sqlite3.connect(db_path)
|
||||
try:
|
||||
cursor = conn.cursor()
|
||||
cursor.executescript(SPACE_TABLES_SQL)
|
||||
conn.commit()
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
class SqliteSpaceRepository(ISpaceRepository):
|
||||
"""
|
||||
SQLite implementation of the space repository.
|
||||
|
||||
Provides persistent storage for InformationSpace entities
|
||||
using SQLite as the backend.
|
||||
"""
|
||||
|
||||
def __init__(self, db_path: str):
|
||||
"""
|
||||
Initialize the repository.
|
||||
|
||||
Args:
|
||||
db_path: Path to the SQLite database file
|
||||
"""
|
||||
self.db_path = db_path
|
||||
initialize_space_tables(db_path)
|
||||
|
||||
def _get_connection(self) -> sqlite3.Connection:
|
||||
"""Get a database connection with foreign keys enabled."""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
conn.row_factory = sqlite3.Row
|
||||
conn.execute("PRAGMA foreign_keys = ON")
|
||||
return conn
|
||||
|
||||
def _row_to_space(self, row: sqlite3.Row) -> InformationSpace:
|
||||
"""Convert a database row to an InformationSpace."""
|
||||
metadata_dict = json.loads(row["metadata"]) if row["metadata"] else {}
|
||||
config_dict = json.loads(row["config"]) if row["config"] else {}
|
||||
|
||||
return InformationSpace(
|
||||
id=row["id"],
|
||||
name=row["name"],
|
||||
description=row["description"],
|
||||
metadata=SpaceMetadata.from_dict(metadata_dict),
|
||||
config=SpaceConfig.from_dict(config_dict),
|
||||
parent_space_id=row["parent_space_id"],
|
||||
status=SpaceStatus(row["status"]),
|
||||
created_at=datetime.fromisoformat(row["created_at"]) if row["created_at"] else datetime.now(),
|
||||
updated_at=datetime.fromisoformat(row["updated_at"]) if row["updated_at"] else datetime.now(),
|
||||
)
|
||||
|
||||
def create(self, space: InformationSpace) -> InformationSpace:
|
||||
"""Create a new space."""
|
||||
conn = self._get_connection()
|
||||
try:
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Check if name already exists
|
||||
cursor.execute("SELECT id FROM spaces WHERE name = ?", (space.name,))
|
||||
if cursor.fetchone():
|
||||
raise ValueError(f"Space with name '{space.name}' already exists")
|
||||
|
||||
cursor.execute(
|
||||
"""
|
||||
INSERT INTO spaces (id, name, description, metadata, config, parent_space_id, status, created_at, updated_at)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
space.id,
|
||||
space.name,
|
||||
space.description,
|
||||
json.dumps(space.metadata.to_dict() if isinstance(space.metadata, SpaceMetadata) else space.metadata),
|
||||
json.dumps(space.config.to_dict() if isinstance(space.config, SpaceConfig) else space.config),
|
||||
space.parent_space_id,
|
||||
space.status.value if isinstance(space.status, SpaceStatus) else space.status,
|
||||
space.created_at.isoformat(),
|
||||
space.updated_at.isoformat(),
|
||||
),
|
||||
)
|
||||
conn.commit()
|
||||
return space
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
def get_by_id(self, space_id: str) -> Optional[InformationSpace]:
|
||||
"""Get a space by ID."""
|
||||
conn = self._get_connection()
|
||||
try:
|
||||
cursor = conn.cursor()
|
||||
cursor.execute("SELECT * FROM spaces WHERE id = ?", (space_id,))
|
||||
row = cursor.fetchone()
|
||||
return self._row_to_space(row) if row else None
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
def get_by_name(self, name: str) -> Optional[InformationSpace]:
|
||||
"""Get a space by name."""
|
||||
conn = self._get_connection()
|
||||
try:
|
||||
cursor = conn.cursor()
|
||||
cursor.execute("SELECT * FROM spaces WHERE name = ?", (name,))
|
||||
row = cursor.fetchone()
|
||||
return self._row_to_space(row) if row else None
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
def list_all(self, include_archived: bool = False) -> List[InformationSpace]:
        """List spaces ordered by name.

        Deleted spaces are always excluded; archived spaces are included
        only when include_archived is True.
        """
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            if include_archived:
                cursor.execute("SELECT * FROM spaces WHERE status != 'deleted' ORDER BY name")
            else:
                cursor.execute("SELECT * FROM spaces WHERE status NOT IN ('archived', 'deleted') ORDER BY name")
            return [self._row_to_space(row) for row in cursor.fetchall()]
        finally:
            conn.close()

    def update(self, space: InformationSpace) -> InformationSpace:
        """Update a space."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()

            # Check if space exists
            cursor.execute("SELECT id FROM spaces WHERE id = ?", (space.id,))
            if not cursor.fetchone():
                raise ValueError(f"Space with id '{space.id}' does not exist")

            space.touch()  # Update timestamp

            cursor.execute(
                """
                UPDATE spaces SET
                    name = ?,
                    description = ?,
                    metadata = ?,
                    config = ?,
                    parent_space_id = ?,
                    status = ?,
                    updated_at = ?
                WHERE id = ?
                """,
                (
                    space.name,
                    space.description,
                    json.dumps(space.metadata.to_dict() if isinstance(space.metadata, SpaceMetadata) else space.metadata),
                    json.dumps(space.config.to_dict() if isinstance(space.config, SpaceConfig) else space.config),
                    space.parent_space_id,
                    space.status.value if isinstance(space.status, SpaceStatus) else space.status,
                    space.updated_at.isoformat(),
                    space.id,
                ),
            )
            conn.commit()
            return space
        finally:
            conn.close()

    def delete(self, space_id: str) -> bool:
        """Delete a space."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute("DELETE FROM spaces WHERE id = ?", (space_id,))
            conn.commit()
            return cursor.rowcount > 0
        finally:
            conn.close()

    def exists(self, space_id: str) -> bool:
        """Check if a space exists."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute("SELECT 1 FROM spaces WHERE id = ?", (space_id,))
            return cursor.fetchone() is not None
        finally:
            conn.close()

    def get_children(self, parent_space_id: str) -> List[InformationSpace]:
        """Get child spaces."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT * FROM spaces WHERE parent_space_id = ? ORDER BY name",
                (parent_space_id,),
            )
            return [self._row_to_space(row) for row in cursor.fetchall()]
        finally:
            conn.close()


class SqliteDocumentRepository(IDocumentAssociationRepository):
    """
    SQLite implementation of the document association repository.
    """

    def __init__(self, db_path: str):
        """Initialize the repository."""
        self.db_path = db_path
        initialize_space_tables(db_path)

    def _get_connection(self) -> sqlite3.Connection:
        """Get a database connection with foreign keys enabled."""
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        conn.execute("PRAGMA foreign_keys = ON")
        return conn

    def _row_to_document(self, row: sqlite3.Row) -> SpaceDocument:
        """Convert a database row to a SpaceDocument."""
        metadata_dict = json.loads(row["metadata"]) if row["metadata"] else {}

        return SpaceDocument(
            id=row["id"],
            space_id=row["space_id"],
            document_id=row["document_id"],
            space_path=row["space_path"],
            order_index=row["order_index"],
            metadata=metadata_dict,
            content_hash=row["content_hash"],
            added_at=datetime.fromisoformat(row["added_at"]) if row["added_at"] else datetime.now(),
        )

    def add_document(self, document: SpaceDocument) -> SpaceDocument:
        """Add a document to a space."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()

            # Check if path already exists in space
            cursor.execute(
                "SELECT id FROM space_documents WHERE space_id = ? AND space_path = ?",
                (document.space_id, document.space_path),
            )
            if cursor.fetchone():
                raise ValueError(f"Document path '{document.space_path}' already exists in space")

            cursor.execute(
                """
                INSERT INTO space_documents (id, space_id, document_id, space_path, order_index, metadata, content_hash, added_at)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
                """,
                (
                    document.id,
                    document.space_id,
                    document.document_id,
                    document.space_path,
                    document.order_index,
                    json.dumps(document.metadata),
                    document.content_hash,
                    document.added_at.isoformat(),
                ),
            )
            conn.commit()
            return document
        finally:
            conn.close()

    def get_document(self, document_id: str) -> Optional[SpaceDocument]:
        """Get a document by ID."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute("SELECT * FROM space_documents WHERE id = ?", (document_id,))
            row = cursor.fetchone()
            return self._row_to_document(row) if row else None
        finally:
            conn.close()

    def get_by_space_path(self, space_id: str, space_path: str) -> Optional[SpaceDocument]:
        """Get a document by its path within a space."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT * FROM space_documents WHERE space_id = ? AND space_path = ?",
                (space_id, space_path),
            )
            row = cursor.fetchone()
            return self._row_to_document(row) if row else None
        finally:
            conn.close()

    def list_by_space(self, space_id: str) -> List[SpaceDocument]:
        """List all documents in a space."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT * FROM space_documents WHERE space_id = ? ORDER BY order_index, space_path",
                (space_id,),
            )
            return [self._row_to_document(row) for row in cursor.fetchall()]
        finally:
            conn.close()

    def update_document(self, document: SpaceDocument) -> SpaceDocument:
        """Update a document."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()

            cursor.execute("SELECT id FROM space_documents WHERE id = ?", (document.id,))
            if not cursor.fetchone():
                raise ValueError(f"Document with id '{document.id}' does not exist")

            cursor.execute(
                """
                UPDATE space_documents SET
                    document_id = ?,
                    space_path = ?,
                    order_index = ?,
                    metadata = ?,
                    content_hash = ?
                WHERE id = ?
                """,
                (
                    document.document_id,
                    document.space_path,
                    document.order_index,
                    json.dumps(document.metadata),
                    document.content_hash,
                    document.id,
                ),
            )
            conn.commit()
            return document
        finally:
            conn.close()

    def remove_document(self, document_id: str) -> bool:
        """Remove a document from a space."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute("DELETE FROM space_documents WHERE id = ?", (document_id,))
            conn.commit()
            return cursor.rowcount > 0
        finally:
            conn.close()

    def move_document(self, document_id: str, new_space_path: str) -> SpaceDocument:
        """Move a document to a new path."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()

            # Get current document
            cursor.execute("SELECT * FROM space_documents WHERE id = ?", (document_id,))
            row = cursor.fetchone()
            if not row:
                raise ValueError(f"Document with id '{document_id}' does not exist")

            # Check if new path already exists
            cursor.execute(
                "SELECT id FROM space_documents WHERE space_id = ? AND space_path = ? AND id != ?",
                (row["space_id"], new_space_path, document_id),
            )
            if cursor.fetchone():
                raise ValueError(f"Document path '{new_space_path}' already exists")

            cursor.execute(
                "UPDATE space_documents SET space_path = ? WHERE id = ?",
                (new_space_path, document_id),
            )
            conn.commit()

            document = self._row_to_document(row)
            document.space_path = new_space_path
            return document
        finally:
            conn.close()

    def reorder_documents(self, space_id: str, document_ids: List[str]) -> None:
        """Reorder documents within a space.

        Each document's order_index is set to its position in document_ids.
        """
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            for index, doc_id in enumerate(document_ids):
                cursor.execute(
                    "UPDATE space_documents SET order_index = ? WHERE id = ? AND space_id = ?",
                    (index, doc_id, space_id),
                )
            conn.commit()
        finally:
            conn.close()

    def update_content_hash(self, document_id: str, content_hash: str) -> None:
        """Update the content hash."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "UPDATE space_documents SET content_hash = ? WHERE id = ?",
                (content_hash, document_id),
            )
            conn.commit()
        finally:
            conn.close()


class SqliteVariableRepository(IVariableRepository):
    """
    SQLite implementation of the variable repository.
    """

    def __init__(self, db_path: str):
        """Initialize the repository."""
        self.db_path = db_path
        initialize_space_tables(db_path)

    def _get_connection(self) -> sqlite3.Connection:
        """Get a database connection with foreign keys enabled."""
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        conn.execute("PRAGMA foreign_keys = ON")
        return conn

    def set_variable(self, variable: SpaceVariable) -> SpaceVariable:
        """Set a variable value."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                """
                INSERT OR REPLACE INTO space_variables (space_id, name, value, scope)
                VALUES (?, ?, ?, ?)
                """,
                (
                    variable.space_id,
                    variable.name,
                    json.dumps(variable.value),
                    variable.scope,
                ),
            )
            conn.commit()
            return variable
        finally:
            conn.close()

    def get_variable(self, space_id: str, name: str) -> Optional[SpaceVariable]:
        """Get a variable by name."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT * FROM space_variables WHERE space_id = ? AND name = ?",
                (space_id, name),
            )
            row = cursor.fetchone()
            if not row:
                return None
            return SpaceVariable(
                space_id=row["space_id"],
                name=row["name"],
                value=json.loads(row["value"]) if row["value"] else None,
                scope=row["scope"],
            )
        finally:
            conn.close()

    def list_variables(self, space_id: str, scope: Optional[str] = None) -> List[SpaceVariable]:
        """List variables in a space."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            if scope:
                cursor.execute(
                    "SELECT * FROM space_variables WHERE space_id = ? AND scope = ?",
                    (space_id, scope),
                )
            else:
                cursor.execute(
                    "SELECT * FROM space_variables WHERE space_id = ?",
                    (space_id,),
                )
            return [
                SpaceVariable(
                    space_id=row["space_id"],
                    name=row["name"],
                    value=json.loads(row["value"]) if row["value"] else None,
                    scope=row["scope"],
                )
                for row in cursor.fetchall()
            ]
        finally:
            conn.close()

    def delete_variable(self, space_id: str, name: str) -> bool:
        """Delete a variable."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "DELETE FROM space_variables WHERE space_id = ? AND name = ?",
                (space_id, name),
            )
            conn.commit()
            return cursor.rowcount > 0
        finally:
            conn.close()


class SqliteReferenceRepository(IReferenceRepository):
    """
    SQLite implementation of the reference repository.
    """

    def __init__(self, db_path: str):
        """Initialize the repository."""
        self.db_path = db_path
        initialize_space_tables(db_path)

    def _get_connection(self) -> sqlite3.Connection:
        """Get a database connection with foreign keys enabled."""
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        conn.execute("PRAGMA foreign_keys = ON")
        return conn

    def add_reference(self, reference: TransclusionReference) -> TransclusionReference:
        """Add a transclusion reference."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                """
                INSERT OR REPLACE INTO transclusion_references (source_doc_id, target_doc_id, space_id, created_at)
                VALUES (?, ?, ?, ?)
                """,
                (
                    reference.source_doc_id,
                    reference.target_doc_id,
                    reference.space_id,
                    reference.created_at.isoformat(),
                ),
            )
            conn.commit()
            return reference
        finally:
            conn.close()

    def get_references_from(self, source_doc_id: str, space_id: str) -> List[TransclusionReference]:
        """Get references from a source document."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT * FROM transclusion_references WHERE source_doc_id = ? AND space_id = ?",
                (source_doc_id, space_id),
            )
            return [
                TransclusionReference(
                    source_doc_id=row["source_doc_id"],
                    target_doc_id=row["target_doc_id"],
                    space_id=row["space_id"],
                    created_at=datetime.fromisoformat(row["created_at"]) if row["created_at"] else datetime.now(),
                )
                for row in cursor.fetchall()
            ]
        finally:
            conn.close()

    def get_references_to(self, target_doc_id: str, space_id: str) -> List[TransclusionReference]:
        """Get references to a target document."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT * FROM transclusion_references WHERE target_doc_id = ? AND space_id = ?",
                (target_doc_id, space_id),
            )
            return [
                TransclusionReference(
                    source_doc_id=row["source_doc_id"],
                    target_doc_id=row["target_doc_id"],
                    space_id=row["space_id"],
                    created_at=datetime.fromisoformat(row["created_at"]) if row["created_at"] else datetime.now(),
                )
                for row in cursor.fetchall()
            ]
        finally:
            conn.close()

    def clear_references_from(self, source_doc_id: str, space_id: str) -> int:
        """Clear references from a source document."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "DELETE FROM transclusion_references WHERE source_doc_id = ? AND space_id = ?",
                (source_doc_id, space_id),
            )
            conn.commit()
            return cursor.rowcount
        finally:
            conn.close()

    def get_dependents(self, document_id: str, space_id: str) -> List[str]:
        """Get documents that depend on this document."""
        conn = self._get_connection()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT DISTINCT source_doc_id FROM transclusion_references WHERE target_doc_id = ? AND space_id = ?",
                (document_id, space_id),
            )
            return [row["source_doc_id"] for row in cursor.fetchall()]
        finally:
            conn.close()
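Each repository in this file opens a fresh connection and issues `PRAGMA foreign_keys = ON`, since SQLite leaves foreign key enforcement off by default on every new connection. The following is a minimal sketch of why that matters for cascade deletes. The two-table schema is a simplified assumption made for illustration (the schema created by `initialize_space_tables` is not shown here), and `open_connection` is a hypothetical helper that only mirrors `_get_connection`.

```python
import sqlite3


def open_connection(path: str) -> sqlite3.Connection:
    """Hypothetical helper mirroring the repositories' _get_connection()."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    # Enforcement is per connection, so it is enabled on every connect.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


conn = open_connection(":memory:")
# Assumed, simplified schema; not the project's real table definitions.
conn.executescript(
    """
    CREATE TABLE spaces (id TEXT PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE space_documents (
        id TEXT PRIMARY KEY,
        space_id TEXT NOT NULL REFERENCES spaces(id) ON DELETE CASCADE,
        space_path TEXT NOT NULL,
        UNIQUE (space_id, space_path)
    );
    """
)
conn.execute("INSERT INTO spaces (id, name) VALUES (?, ?)", ("s1", "my-docs"))
conn.execute(
    "INSERT INTO space_documents (id, space_id, space_path) VALUES (?, ?, ?)",
    ("d1", "s1", "/intro.md"),
)
conn.commit()

# Deleting the parent row removes its documents through the cascade.
conn.execute("DELETE FROM spaces WHERE id = ?", ("s1",))
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM space_documents").fetchone()[0]
conn.close()
```

Without the pragma, the same `DELETE` would leave the `space_documents` row behind, which is presumably why every `_get_connection` repeats it.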
14
markitect/spaces/services/__init__.py
Normal file
@@ -0,0 +1,14 @@
"""
Service layer for Information Spaces.

This package provides the orchestration services:
- SpaceService: Main API for space operations
- RenderService: Rendering orchestration (planned, Phase 4)
- SyncService: Synchronization coordination (planned, Phase 5)
"""

from .space_service import SpaceService

__all__ = [
    "SpaceService",
]
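Because `SpaceService` receives its repositories through the constructor, the service layer can in principle be exercised without SQLite. The sketch below illustrates that idea with stand-ins only: `SpaceExistenceRepo`, `InMemorySpaceRepo`, and `TinyService` are hypothetical names invented for this example, and they mimic just the `exists` check and path normalization rather than the real interfaces.

```python
from typing import Dict, Protocol


class SpaceExistenceRepo(Protocol):
    """Hypothetical slice of a repository interface: an existence check only."""

    def exists(self, space_id: str) -> bool: ...


class InMemorySpaceRepo:
    """Test double that keeps space ids and names in a dict instead of SQLite."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def add(self, space_id: str, name: str) -> None:
        self._names[space_id] = name

    def exists(self, space_id: str) -> bool:
        return space_id in self._names


class TinyService:
    """Hypothetical service that depends only on the protocol above."""

    def __init__(self, space_repo: SpaceExistenceRepo) -> None:
        self._space_repo = space_repo

    def normalize_path(self, space_id: str, space_path: str) -> str:
        # Guard on the space first, then force a leading slash on the path.
        if not self._space_repo.exists(space_id):
            raise ValueError(f"Space '{space_id}' not found")
        return space_path if space_path.startswith("/") else "/" + space_path


repo = InMemorySpaceRepo()
repo.add("s1", "my-docs")
service = TinyService(repo)
normalized = service.normalize_path("s1", "intro.md")

try:
    service.normalize_path("missing", "intro.md")
    missing_error = ""
except ValueError as exc:
    missing_error = str(exc)
```

Swapping the in-memory double for a SQLite-backed repository would not change `TinyService`, which is the point of injecting the dependency.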
659
markitect/spaces/services/space_service.py
Normal file
@@ -0,0 +1,659 @@
"""
SpaceService - Main orchestration service for Information Spaces.

This module provides the primary API for space operations, coordinating
between repositories, event handling, and transclusion context management.
"""

from typing import List, Optional, Dict, Any
from pathlib import Path

from ..models import (
    InformationSpace,
    SpaceDocument,
    SpaceVariable,
    TransclusionReference,
    SpaceStatus,
    SpaceConfig,
    SpaceMetadata,
)
from ..repositories.interfaces import (
    ISpaceRepository,
    IDocumentAssociationRepository,
    IVariableRepository,
    IReferenceRepository,
)


class SpaceService:
    """
    Main orchestration service for Information Space operations.

    Provides a high-level API for managing spaces, documents, variables,
    and transclusion references. This service coordinates between the
    repository layer and future event/rendering systems.

    Usage:
        service = SpaceService(
            space_repo=SqliteSpaceRepository(db_path),
            document_repo=SqliteDocumentRepository(db_path),
            variable_repo=SqliteVariableRepository(db_path),
            reference_repo=SqliteReferenceRepository(db_path),
        )

        # Create a space
        space = service.create_space(name="my-docs", description="My documentation")

        # Add documents
        service.add_document(space.id, "/intro.md", document_id="doc-1")
    """

    def __init__(
        self,
        space_repo: ISpaceRepository,
        document_repo: IDocumentAssociationRepository,
        variable_repo: IVariableRepository,
        reference_repo: IReferenceRepository,
    ):
        """
        Initialize the SpaceService.

        Args:
            space_repo: Repository for space CRUD operations
            document_repo: Repository for document associations
            variable_repo: Repository for space variables
            reference_repo: Repository for transclusion references
        """
        self._space_repo = space_repo
        self._document_repo = document_repo
        self._variable_repo = variable_repo
        self._reference_repo = reference_repo

    # =========================================================================
    # Space CRUD Operations
    # =========================================================================

    def create_space(
        self,
        name: str,
        description: Optional[str] = None,
        config: Optional[SpaceConfig] = None,
        metadata: Optional[SpaceMetadata] = None,
        parent_space_id: Optional[str] = None,
    ) -> InformationSpace:
        """
        Create a new information space.

        Args:
            name: Unique name for the space
            description: Optional description
            config: Optional configuration (defaults provided if None)
            metadata: Optional metadata (defaults provided if None)
            parent_space_id: Optional parent space for hierarchy

        Returns:
            The created InformationSpace

        Raises:
            ValueError: If name is empty or already exists, or if the
                parent space is not found
        """
        if not name or not name.strip():
            raise ValueError("Space name cannot be empty")

        # Validate parent exists if specified
        if parent_space_id:
            parent = self._space_repo.get_by_id(parent_space_id)
            if not parent:
                raise ValueError(f"Parent space '{parent_space_id}' not found")

        space = InformationSpace(
            name=name.strip(),
            description=description,
            config=config or SpaceConfig(),
            metadata=metadata or SpaceMetadata(),
            parent_space_id=parent_space_id,
        )

        return self._space_repo.create(space)

    def get_space(self, space_id: str) -> Optional[InformationSpace]:
        """
        Get a space by its ID.

        Args:
            space_id: The space ID

        Returns:
            The space if found, None otherwise
        """
        return self._space_repo.get_by_id(space_id)

    def get_space_by_name(self, name: str) -> Optional[InformationSpace]:
        """
        Get a space by its name.

        Args:
            name: The space name

        Returns:
            The space if found, None otherwise
        """
        return self._space_repo.get_by_name(name)

    def list_spaces(self, include_archived: bool = False) -> List[InformationSpace]:
        """
        List all spaces.

        Args:
            include_archived: Whether to include archived spaces

        Returns:
            List of spaces
        """
        return self._space_repo.list_all(include_archived=include_archived)

    def update_space(
        self,
        space_id: str,
        name: Optional[str] = None,
        description: Optional[str] = None,
        config: Optional[SpaceConfig] = None,
        metadata: Optional[SpaceMetadata] = None,
    ) -> InformationSpace:
        """
        Update a space's properties.

        Args:
            space_id: The space ID to update
            name: New name (optional)
            description: New description (optional)
            config: New config (optional)
            metadata: New metadata (optional)

        Returns:
            The updated space

        Raises:
            ValueError: If space not found, or if the new name is empty
                or already taken
        """
        space = self._space_repo.get_by_id(space_id)
        if not space:
            raise ValueError(f"Space '{space_id}' not found")

        if name is not None:
            # Strip once so the lookup, the error message, and the stored
            # value all use the same normalized name.
            new_name = name.strip()
            if not new_name:
                raise ValueError("Space name cannot be empty")
            # Check if name is taken by another space
            existing = self._space_repo.get_by_name(new_name)
            if existing and existing.id != space_id:
                raise ValueError(f"Space name '{new_name}' already exists")
            space.name = new_name

        if description is not None:
            space.description = description

        if config is not None:
            space.config = config

        if metadata is not None:
            space.metadata = metadata

        return self._space_repo.update(space)

    def delete_space(self, space_id: str, cascade: bool = True) -> bool:
        """
        Delete a space.

        Args:
            space_id: The space ID to delete
            cascade: If True, delete all child spaces too

        Returns:
            True if deleted, False if not found

        Raises:
            ValueError: If space has children and cascade is False
        """
        space = self._space_repo.get_by_id(space_id)
        if not space:
            return False

        children = self._space_repo.get_children(space_id)
        if children and not cascade:
            raise ValueError(
                f"Space '{space_id}' has {len(children)} child spaces. "
                "Set cascade=True to delete them."
            )

        # Delete children first (if cascade)
        if cascade:
            for child in children:
                self.delete_space(child.id, cascade=True)

        return self._space_repo.delete(space_id)

    def activate_space(self, space_id: str) -> InformationSpace:
        """
        Activate a space (change status from draft to active).

        Args:
            space_id: The space ID

        Returns:
            The updated space

        Raises:
            ValueError: If space not found
        """
        space = self._space_repo.get_by_id(space_id)
        if not space:
            raise ValueError(f"Space '{space_id}' not found")

        space.activate()
        return self._space_repo.update(space)

    def archive_space(self, space_id: str) -> InformationSpace:
        """
        Archive a space.

        Args:
            space_id: The space ID

        Returns:
            The updated space

        Raises:
            ValueError: If space not found
        """
        space = self._space_repo.get_by_id(space_id)
        if not space:
            raise ValueError(f"Space '{space_id}' not found")

        space.archive()
        return self._space_repo.update(space)

    def get_child_spaces(self, parent_space_id: str) -> List[InformationSpace]:
        """
        Get all child spaces of a parent.

        Args:
            parent_space_id: The parent space ID

        Returns:
            List of child spaces
        """
        return self._space_repo.get_children(parent_space_id)

    # =========================================================================
    # Document Operations
    # =========================================================================

    def add_document(
        self,
        space_id: str,
        space_path: str,
        document_id: Optional[str] = None,
        order_index: int = 0,
        metadata: Optional[Dict[str, Any]] = None,
        content_hash: Optional[str] = None,
    ) -> SpaceDocument:
        """
        Add a document to a space.

        Args:
            space_id: The space ID
            space_path: Path within the space (e.g., "/intro.md")
            document_id: External document ID (optional)
            order_index: Position in space ordering
            metadata: Document metadata
            content_hash: Content hash for change detection

        Returns:
            The created document association

        Raises:
            ValueError: If space not found or path already exists
        """
        if not self._space_repo.exists(space_id):
            raise ValueError(f"Space '{space_id}' not found")

        # Normalize path
        if not space_path.startswith("/"):
            space_path = "/" + space_path

        document = SpaceDocument(
            space_id=space_id,
            document_id=document_id or "",
            space_path=space_path,
            order_index=order_index,
            metadata=metadata or {},
            content_hash=content_hash,
        )

        return self._document_repo.add_document(document)

    def get_document(self, document_id: str) -> Optional[SpaceDocument]:
        """
        Get a document by its association ID.

        Args:
            document_id: The document association ID

        Returns:
            The document if found, None otherwise
        """
        return self._document_repo.get_document(document_id)

    def get_document_by_path(
        self, space_id: str, space_path: str
    ) -> Optional[SpaceDocument]:
        """
        Get a document by its path within a space.

        Args:
            space_id: The space ID
            space_path: The path within the space

        Returns:
            The document if found, None otherwise
        """
        # Normalize path
        if not space_path.startswith("/"):
            space_path = "/" + space_path
        return self._document_repo.get_by_space_path(space_id, space_path)

    def list_documents(self, space_id: str) -> List[SpaceDocument]:
        """
        List all documents in a space.

        Args:
            space_id: The space ID

        Returns:
            List of documents ordered by order_index
        """
        return self._document_repo.list_by_space(space_id)

    def remove_document(self, document_id: str) -> bool:
        """
        Remove a document from a space.

        Transclusion references recorded from this document are cleared
        before the association itself is removed.

        Args:
            document_id: The document association ID

        Returns:
            True if removed, False if not found
        """
        # Clear any references from this document first
        document = self._document_repo.get_document(document_id)
        if document:
            self._reference_repo.clear_references_from(document_id, document.space_id)

        return self._document_repo.remove_document(document_id)

    def move_document(self, document_id: str, new_path: str) -> SpaceDocument:
        """
        Move a document to a new path within its space.

        Args:
            document_id: The document association ID
            new_path: The new path

        Returns:
            The updated document

        Raises:
            ValueError: If document not found or new path exists
        """
        if not new_path.startswith("/"):
            new_path = "/" + new_path
        return self._document_repo.move_document(document_id, new_path)

    def reorder_documents(self, space_id: str, document_ids: List[str]) -> None:
        """
        Reorder documents within a space.

        Args:
            space_id: The space ID
            document_ids: Ordered list of document IDs
        """
        self._document_repo.reorder_documents(space_id, document_ids)

    def update_document_hash(self, document_id: str, content_hash: str) -> None:
        """
        Update the content hash for a document.

        Args:
            document_id: The document association ID
            content_hash: The new content hash
        """
        self._document_repo.update_content_hash(document_id, content_hash)

    # =========================================================================
    # Variable Operations
    # =========================================================================

    def set_variable(
        self,
        space_id: str,
        name: str,
        value: Any,
        scope: str = "space",
    ) -> SpaceVariable:
        """
        Set a variable in a space.

        Args:
            space_id: The space ID
            name: Variable name
            value: Variable value (any JSON-serializable value)
            scope: Variable scope ("space" or "document")

        Returns:
            The saved variable

        Raises:
            ValueError: If space not found
        """
        if not self._space_repo.exists(space_id):
            raise ValueError(f"Space '{space_id}' not found")

        variable = SpaceVariable(
            space_id=space_id,
            name=name,
            value=value,
            scope=scope,
        )

        return self._variable_repo.set_variable(variable)

    def get_variable(self, space_id: str, name: str) -> Optional[SpaceVariable]:
        """
        Get a variable by name.

        Args:
            space_id: The space ID
            name: Variable name

        Returns:
            The variable if found, None otherwise
        """
        return self._variable_repo.get_variable(space_id, name)

    def list_variables(
        self, space_id: str, scope: Optional[str] = None
    ) -> List[SpaceVariable]:
        """
        List all variables in a space.

        Args:
            space_id: The space ID
            scope: Optional scope filter

        Returns:
            List of variables
        """
        return self._variable_repo.list_variables(space_id, scope)

    def delete_variable(self, space_id: str, name: str) -> bool:
        """
        Delete a variable.

        Args:
            space_id: The space ID
            name: Variable name

        Returns:
            True if deleted, False if not found
        """
        return self._variable_repo.delete_variable(space_id, name)

    def get_variables_dict(self, space_id: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Get all variables as a dictionary.
|
||||
|
||||
Useful for transclusion context.
|
||||
|
||||
Args:
|
||||
space_id: The space ID
|
||||
|
||||
Returns:
|
||||
Dictionary of variable names to values
|
||||
"""
|
||||
variables = self._variable_repo.list_variables(space_id)
|
||||
return {var.name: var.value for var in variables}
|
||||
|
||||
# =========================================================================
|
||||
# Reference Operations
|
||||
# =========================================================================
|
||||
|
||||
def add_reference(
|
||||
self,
|
||||
source_doc_id: str,
|
||||
target_doc_id: str,
|
||||
space_id: str,
|
||||
) -> TransclusionReference:
|
||||
"""
|
||||
Add a transclusion reference.
|
||||
|
||||
Args:
|
||||
source_doc_id: The source document ID
|
||||
target_doc_id: The target document ID
|
||||
space_id: The space ID
|
||||
|
||||
Returns:
|
||||
The created reference
|
||||
"""
|
||||
reference = TransclusionReference(
|
||||
source_doc_id=source_doc_id,
|
||||
target_doc_id=target_doc_id,
|
||||
space_id=space_id,
|
||||
)
|
||||
return self._reference_repo.add_reference(reference)
|
||||
|
||||
def get_references_from(
|
||||
self, source_doc_id: str, space_id: str
|
||||
) -> List[TransclusionReference]:
|
||||
"""
|
||||
Get all references from a source document.
|
||||
|
||||
Args:
|
||||
source_doc_id: The source document ID
|
||||
space_id: The space ID
|
||||
|
||||
Returns:
|
||||
List of references
|
||||
"""
|
||||
return self._reference_repo.get_references_from(source_doc_id, space_id)
|
||||
|
||||
def get_references_to(
|
||||
self, target_doc_id: str, space_id: str
|
||||
) -> List[TransclusionReference]:
|
||||
"""
|
||||
Get all references to a target document.
|
||||
|
||||
Args:
|
||||
target_doc_id: The target document ID
|
||||
space_id: The space ID
|
||||
|
||||
Returns:
|
||||
List of references
|
||||
"""
|
||||
return self._reference_repo.get_references_to(target_doc_id, space_id)
|
||||
|
||||
def clear_references_from(self, source_doc_id: str, space_id: str) -> int:
|
||||
"""
|
||||
Clear all references from a source document.
|
||||
|
||||
Args:
|
||||
source_doc_id: The source document ID
|
||||
space_id: The space ID
|
||||
|
||||
Returns:
|
||||
Number of references cleared
|
||||
"""
|
||||
return self._reference_repo.clear_references_from(source_doc_id, space_id)
|
||||
|
||||
def get_dependents(self, document_id: str, space_id: str) -> List[str]:
|
||||
"""
|
||||
Get all documents that depend on a given document.
|
||||
|
||||
Used for cache invalidation - returns documents that need
|
||||
to be re-rendered when the target document changes.
|
||||
|
||||
Args:
|
||||
document_id: The document ID
|
||||
space_id: The space ID
|
||||
|
||||
Returns:
|
||||
List of dependent document IDs
|
||||
"""
|
||||
return self._reference_repo.get_dependents(document_id, space_id)
|
||||
|
||||
# =========================================================================
|
||||
# Convenience Methods
|
||||
# =========================================================================
|
||||
|
||||
def space_exists(self, space_id: str) -> bool:
|
||||
"""
|
||||
Check if a space exists.
|
||||
|
||||
Args:
|
||||
space_id: The space ID
|
||||
|
||||
Returns:
|
||||
True if exists, False otherwise
|
||||
"""
|
||||
return self._space_repo.exists(space_id)
|
||||
|
||||
def get_space_stats(self, space_id: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Get statistics about a space.
|
||||
|
||||
Args:
|
||||
space_id: The space ID
|
||||
|
||||
Returns:
|
||||
Dictionary with statistics
|
||||
|
||||
Raises:
|
||||
ValueError: If space not found
|
||||
"""
|
||||
space = self._space_repo.get_by_id(space_id)
|
||||
if not space:
|
||||
raise ValueError(f"Space '{space_id}' not found")
|
||||
|
||||
documents = self._document_repo.list_by_space(space_id)
|
||||
variables = self._variable_repo.list_variables(space_id)
|
||||
children = self._space_repo.get_children(space_id)
|
||||
|
||||
return {
|
||||
"space_id": space_id,
|
||||
"name": space.name,
|
||||
"status": space.status.value,
|
||||
"document_count": len(documents),
|
||||
"variable_count": len(variables),
|
||||
"child_space_count": len(children),
|
||||
"created_at": space.created_at.isoformat(),
|
||||
"updated_at": space.updated_at.isoformat(),
|
||||
}
|
||||
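The variable and reference helpers above can be illustrated with a minimal, in-memory sketch. `Variable`, `Reference`, `variables_to_dict`, and `direct_dependents` are hypothetical stand-ins written for this example, not the real `SpaceVariable`, `TransclusionReference`, or repository classes. The sketch assumes dependents are looked up one level deep, which the repository's `get_dependents` may or may not do.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Variable:
    """Hypothetical stand-in for SpaceVariable (only the fields used here)."""
    space_id: str
    name: str
    value: Any
    scope: str = "space"


@dataclass
class Reference:
    """Hypothetical stand-in for TransclusionReference."""
    source_doc_id: str
    target_doc_id: str
    space_id: str


def variables_to_dict(variables: List[Variable]) -> Dict[str, Any]:
    # Same comprehension as get_variables_dict: if two variables share a
    # name, the one listed later overwrites the earlier one.
    return {var.name: var.value for var in variables}


def direct_dependents(refs: List[Reference], document_id: str, space_id: str) -> List[str]:
    # Documents in `space_id` that reference `document_id` directly
    # (assumption: one level only, no transitive closure).
    dependents: List[str] = []
    for ref in refs:
        if ref.space_id == space_id and ref.target_doc_id == document_id:
            if ref.source_doc_id not in dependents:
                dependents.append(ref.source_doc_id)
    return dependents
```

One consequence worth noting: because `get_variables_dict` does not filter by scope, a document-scoped variable and a space-scoped variable with the same name collapse into a single key, and which one survives depends on the order the repository returns them in.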
13
markitect/spaces/sync/__init__.py
Normal file
@@ -0,0 +1,13 @@
|
||||
"""
|
||||
Directory synchronization for Information Spaces.
|
||||
|
||||
This package provides filesystem integration:
|
||||
- SpaceToDirectory exporter using VariantFactory
|
||||
- DirectoryToSpace importer
|
||||
- Bidirectional sync coordinator
|
||||
- Filesystem watcher for external changes
|
||||
- Conflict detection and resolution
|
||||
"""
|
||||
|
||||
# Directory sync will be implemented in Phase 5
|
||||
__all__ = []
|
||||
12
markitect/spaces/transclusion/__init__.py
Normal file
@@ -0,0 +1,12 @@
|
||||
"""
|
||||
Persistent transclusion context for Information Spaces.
|
||||
|
||||
This package extends the existing TransclusionContext with:
|
||||
- Database persistence for context state
|
||||
- Cross-space reference resolution
|
||||
- Reference graph for dependency tracking
|
||||
- Variable scope layers (space, document, request)
|
||||
"""
|
||||
|
||||
# Transclusion extensions will be implemented in Phase 3
|
||||
__all__ = []
|
||||
10
markitect/storage/__init__.py
Normal file
@@ -0,0 +1,10 @@
|
||||
"""
|
||||
Storage modules for MarkiTect.
|
||||
|
||||
This package contains storage-related functionality:
|
||||
- Database: SQLite database management for documents and schemas
|
||||
"""
|
||||
|
||||
from .database import DatabaseManager
|
||||
|
||||
__all__ = ['DatabaseManager']
|
||||
444
markitect/storage/database.py
Normal file
@@ -0,0 +1,444 @@
|
||||
"""
|
||||
Database management functionality for MarkiTect.
|
||||
|
||||
This module provides SQLite database initialization, markdown file storage
|
||||
with front matter support, and JSON schema storage (Issue #3).
|
||||
"""
|
||||
|
||||
import sqlite3
|
||||
import json
|
||||
import os
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Optional, Dict, Any
|
||||
|
||||
from markitect.frontmatter import FrontMatterParser
|
||||
|
||||
|
||||
class DatabaseManager:
|
||||
"""Manager for SQLite database operations."""
|
||||
|
||||
def __init__(self, db_path: str):
|
||||
"""
|
||||
Initialize database manager.
|
||||
|
||||
Args:
|
||||
db_path: Path to SQLite database file
|
||||
"""
|
||||
self.db_path = db_path
|
||||
self.front_matter_parser = FrontMatterParser()
|
||||
|
||||
def initialize_database(self) -> None:
|
||||
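        """
        Initialize SQLite database with required tables.

        Creates the markdown_files table with the following schema:
        - id: INTEGER PRIMARY KEY AUTOINCREMENT
        - filename: TEXT NOT NULL
        - front_matter: TEXT (JSON)
        - content: TEXT
        - created_at: TIMESTAMP DEFAULT CURRENT_TIMESTAMP

        Creates the schemas table (Issue #3) with the following schema:
        - id: INTEGER PRIMARY KEY AUTOINCREMENT
        - filename: TEXT NOT NULL UNIQUE
        - title: TEXT
        - description: TEXT
        - schema_content: TEXT NOT NULL
        - created_at: TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        - updated_at: TIMESTAMP DEFAULT CURRENT_TIMESTAMP

        Also initializes finance schema if finance module is available.
        """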
|
||||
# Ensure directory exists
|
||||
db_dir = os.path.dirname(self.db_path)
|
||||
if db_dir and not os.path.exists(db_dir):
|
||||
os.makedirs(db_dir)
|
||||
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Create markdown_files table
|
||||
cursor.execute('''
|
||||
CREATE TABLE IF NOT EXISTS markdown_files (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
filename TEXT NOT NULL,
|
||||
front_matter TEXT,
|
||||
content TEXT,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
)
|
||||
''')
|
||||
|
||||
# Create schemas table for Issue #3
|
||||
cursor.execute('''
|
||||
CREATE TABLE IF NOT EXISTS schemas (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
filename TEXT NOT NULL UNIQUE,
|
||||
title TEXT,
|
||||
description TEXT,
|
||||
schema_content TEXT NOT NULL,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
)
|
||||
''')
|
||||
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
# Initialize finance schema if available
|
||||
self.initialize_finance_schema()
|
||||
|
||||
    def initialize_finance_schema(self) -> None:
        """
        Initialize finance schema for cost tracking (Issue #88).

        This method is called automatically during database initialization
        to set up cost tracking tables if the finance module is available.
        Initialization is best-effort: any failure is ignored.
        """
        try:
            from .finance.models import FinanceModels
            finance_models = FinanceModels(self.db_path)
            finance_models.initialize_finance_schema()
        except ImportError:
            # Finance module not available, skip initialization
            pass
        except Exception:
            # Silently ignore finance schema initialization errors for CLI compatibility
            pass
|
||||
|
||||
def store_markdown_file(self, filename: str, content: str) -> Optional[int]:
|
||||
"""
|
||||
Store a markdown file in the database.
|
||||
|
||||
Args:
|
||||
filename: Name of the markdown file
|
||||
content: Raw markdown content with optional front matter
|
||||
|
||||
Returns:
|
||||
ID of the inserted record, or None if insertion failed
|
||||
"""
|
||||
# Parse front matter and content
|
||||
front_matter, markdown_content = self.front_matter_parser.parse(content)
|
||||
|
||||
# Convert front matter to JSON string
|
||||
front_matter_json = json.dumps(front_matter) if front_matter else '{}'
|
||||
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
try:
|
||||
cursor.execute('''
|
||||
INSERT INTO markdown_files (filename, front_matter, content, created_at)
|
||||
VALUES (?, ?, ?, ?)
|
||||
''', (filename, front_matter_json, markdown_content, datetime.now().isoformat()))
|
||||
|
||||
record_id = cursor.lastrowid
|
||||
conn.commit()
|
||||
return record_id
|
||||
|
||||
except sqlite3.Error:
|
||||
conn.rollback()
|
||||
return None
|
||||
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
def get_markdown_file(self, filename: str) -> Optional[Dict[str, Any]]:
|
||||
"""
|
||||
Retrieve a markdown file from the database.
|
||||
|
||||
Args:
|
||||
filename: Name of the markdown file to retrieve
|
||||
|
||||
Returns:
|
||||
Dictionary containing file data, or None if not found
|
||||
"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute('''
|
||||
SELECT id, filename, front_matter, content, created_at
|
||||
FROM markdown_files
|
||||
WHERE filename = ?
|
||||
''', (filename,))
|
||||
|
||||
row = cursor.fetchone()
|
||||
conn.close()
|
||||
|
||||
if row:
|
||||
return {
|
||||
'id': row[0],
|
||||
'filename': row[1],
|
||||
'front_matter': json.loads(row[2]) if row[2] else {},
|
||||
'content': row[3],
|
||||
'created_at': row[4]
|
||||
}
|
||||
|
||||
return None
|
||||
|
||||
def list_markdown_files(self) -> list:
|
||||
"""
|
||||
List all markdown files in the database.
|
||||
|
||||
Returns:
|
||||
List of dictionaries containing file metadata
|
||||
"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute('''
|
||||
SELECT id, filename, front_matter, created_at
|
||||
FROM markdown_files
|
||||
ORDER BY created_at DESC
|
||||
''')
|
||||
|
||||
rows = cursor.fetchall()
|
||||
conn.close()
|
||||
|
||||
files = []
|
||||
for row in rows:
|
||||
files.append({
|
||||
'id': row[0],
|
||||
'filename': row[1],
|
||||
'front_matter': json.loads(row[2]) if row[2] else {},
|
||||
'created_at': row[3]
|
||||
})
|
||||
|
||||
return files
|
||||
|
||||
    def execute_query(self, sql: str) -> list:
        """
        Execute a read-only SQL query against the database.

        Args:
            sql: SQL query string (SELECT, or WITH for CTEs)

        Returns:
            List of dictionaries representing query results

        Raises:
            ValueError: If the query does not start with SELECT/WITH or
                contains a blocked keyword
            sqlite3.Error: If query execution fails
        """
        import re

        # Security check: only allow SELECT queries, plus WITH for CTEs
        sql_upper = sql.strip().upper()
        if not sql_upper.startswith(('SELECT', 'WITH')):
            raise ValueError("Only SELECT and WITH queries are allowed for safety")

        # Additional safety check for dangerous keywords (matched as whole words)
        dangerous_keywords = [
            'DROP', 'DELETE', 'UPDATE', 'INSERT', 'CREATE', 'ALTER',
            'TRUNCATE', 'REPLACE', 'PRAGMA'
        ]
        for keyword in dangerous_keywords:
            # Use word boundaries to match only complete words
            pattern = r'\b' + keyword + r'\b'
            if re.search(pattern, sql_upper):
                raise ValueError(f"Query contains dangerous keyword: {keyword}")

        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row  # Enable column access by name
        try:
            cursor = conn.cursor()
            cursor.execute(sql)
            # Convert rows to dictionaries
            return [dict(row) for row in cursor.fetchall()]
        finally:
            # Close the connection on success and on any error
            conn.close()
|
||||
|
||||
    def get_schema(self) -> dict:
        """
        Get database schema information.

        Returns:
            Dictionary containing table schemas with column information
        """
        conn = sqlite3.connect(self.db_path)
        schema = {}

        try:
            cursor = conn.cursor()

            # Get all table names
            cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
            tables = cursor.fetchall()

            for table_row in tables:
                table_name = table_row[0]

                # Get column information for each table; quote the name so
                # identifiers with spaces or reserved words do not break the PRAGMA
                quoted_name = '"' + table_name.replace('"', '""') + '"'
                cursor.execute(f"PRAGMA table_info({quoted_name})")
                columns = cursor.fetchall()

                column_info = []
                for col in columns:
                    column_info.append({
                        'name': col[1],
                        'type': col[2],
                        'nullable': not bool(col[3]),  # notnull flag
                        'default_value': col[4],
                        'primary_key': bool(col[5])
                    })

                schema[table_name] = {
                    'columns': column_info
                }

            return schema

        finally:
            # Close the connection on success and on any error
            conn.close()
|
||||
|
||||
# Schema management methods for Issue #3
|
||||
def store_schema_file(self, filename: str, schema_content: str) -> Optional[int]:
|
||||
"""
|
||||
Store a JSON schema file in the database.
|
||||
|
||||
Args:
|
||||
filename: Name of the schema file
|
||||
schema_content: JSON schema content as string
|
||||
|
||||
Returns:
|
||||
ID of the inserted/updated record, or None if operation failed
|
||||
"""
|
||||
try:
|
||||
# Parse and validate JSON schema
|
||||
schema_data = json.loads(schema_content)
|
||||
title = schema_data.get('title', filename)
|
||||
description = schema_data.get('description', '')
|
||||
except json.JSONDecodeError:
|
||||
return None
|
||||
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
try:
|
||||
# Check if schema already exists
|
||||
cursor.execute('SELECT id FROM schemas WHERE filename = ?', (filename,))
|
||||
existing = cursor.fetchone()
|
||||
|
||||
if existing:
|
||||
# Update existing schema
|
||||
cursor.execute('''
|
||||
UPDATE schemas
|
||||
SET title = ?, description = ?, schema_content = ?, updated_at = ?
|
||||
WHERE filename = ?
|
||||
''', (title, description, schema_content, datetime.now().isoformat(), filename))
|
||||
record_id = existing[0]
|
||||
else:
|
||||
# Insert new schema
|
||||
cursor.execute('''
|
||||
INSERT INTO schemas (filename, title, description, schema_content, created_at, updated_at)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
''', (filename, title, description, schema_content,
|
||||
datetime.now().isoformat(), datetime.now().isoformat()))
|
||||
record_id = cursor.lastrowid
|
||||
|
||||
conn.commit()
|
||||
return record_id
|
||||
|
||||
except sqlite3.Error:
|
||||
conn.rollback()
|
||||
return None
|
||||
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
def get_schema_file(self, filename: str) -> Optional[Dict[str, Any]]:
|
||||
"""
|
||||
Retrieve a schema file from the database.
|
||||
|
||||
Args:
|
||||
filename: Name of the schema file to retrieve
|
||||
|
||||
Returns:
|
||||
Dictionary containing schema data, or None if not found
|
||||
"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute('''
|
||||
SELECT id, filename, title, description, schema_content, created_at, updated_at
|
||||
FROM schemas
|
||||
WHERE filename = ?
|
||||
''', (filename,))
|
||||
|
||||
row = cursor.fetchone()
|
||||
conn.close()
|
||||
|
||||
if row:
|
||||
return {
|
||||
'id': row[0],
|
||||
'filename': row[1],
|
||||
'title': row[2],
|
||||
'description': row[3],
|
||||
'schema_content': row[4],
|
||||
'created_at': row[5],
|
||||
'updated_at': row[6]
|
||||
}
|
||||
|
||||
return None
|
||||
|
||||
def list_schema_files(self) -> list:
|
||||
"""
|
||||
List all schema files in the database.
|
||||
|
||||
Returns:
|
||||
List of dictionaries containing schema metadata
|
||||
"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute('''
|
||||
SELECT id, filename, title, description, created_at, updated_at
|
||||
FROM schemas
|
||||
ORDER BY updated_at DESC
|
||||
''')
|
||||
|
||||
rows = cursor.fetchall()
|
||||
conn.close()
|
||||
|
||||
schemas = []
|
||||
for row in rows:
|
||||
schemas.append({
|
||||
'id': row[0],
|
||||
'filename': row[1],
|
||||
'title': row[2],
|
||||
'description': row[3],
|
||||
'created_at': row[4],
|
||||
'updated_at': row[5]
|
||||
})
|
||||
|
||||
return schemas
|
||||
|
||||
def delete_schema_file(self, filename: str) -> bool:
|
||||
"""
|
||||
Delete a schema file from the database.
|
||||
|
||||
Args:
|
||||
filename: Name of the schema file to delete
|
||||
|
||||
Returns:
|
||||
True if deletion was successful, False otherwise
|
||||
"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
try:
|
||||
cursor.execute('DELETE FROM schemas WHERE filename = ?', (filename,))
|
||||
success = cursor.rowcount > 0
|
||||
conn.commit()
|
||||
return success
|
||||
|
||||
except sqlite3.Error:
|
||||
conn.rollback()
|
||||
return False
|
||||
|
||||
finally:
|
||||
conn.close()
|
||||
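The read-only guard in `execute_query` can be sketched as a standalone check, which makes its behavior easier to see in isolation. `check_read_only` and `run_read_only` are hypothetical names introduced for this example. The keyword list is copied from the method above, and the connection is supplied by the caller rather than opened from `db_path`.

```python
import re
import sqlite3
from typing import Any, Dict, List

# Keyword list copied from DatabaseManager.execute_query
DANGEROUS_KEYWORDS = (
    "DROP", "DELETE", "UPDATE", "INSERT", "CREATE", "ALTER",
    "TRUNCATE", "REPLACE", "PRAGMA",
)


def check_read_only(sql: str) -> None:
    """Raise ValueError unless `sql` looks like a plain SELECT/WITH query."""
    sql_upper = sql.strip().upper()
    if not sql_upper.startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT and WITH queries are allowed for safety")
    for keyword in DANGEROUS_KEYWORDS:
        # \b keeps column names such as updated_at or created_at from matching
        if re.search(r"\b" + keyword + r"\b", sql_upper):
            raise ValueError(f"Query contains dangerous keyword: {keyword}")


def run_read_only(conn: sqlite3.Connection, sql: str) -> List[Dict[str, Any]]:
    """Validate `sql`, run it, and return each row as a dictionary."""
    check_read_only(sql)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    return [dict(row) for row in conn.execute(sql).fetchall()]
```

A keyword filter like this is conservative rather than exact: it also rejects harmless queries whose string literals contain a blocked word, such as `SELECT 'update'`. Where stronger guarantees are wanted, one option is to open the database with SQLite's read-only URI mode (`sqlite3.connect("file:...?mode=ro", uri=True)`), so writes fail at the engine level.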
@@ -1,477 +1,37 @@
|
||||
"""
|
||||
Workspace management functionality for Issue #144.
|
||||
Workspace management - Backward Compatibility Module.
|
||||
|
||||
This module provides workspace templates, multi-project support, and
|
||||
collaborative workspace features.
|
||||
This module re-exports from markitect.core.workspace for backward compatibility.
|
||||
New code should import from markitect.core.workspace directly.
|
||||
"""
|
||||
|
||||
import json
|
||||
import yaml
|
||||
import shutil
|
||||
import zipfile
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List, Optional
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
|
||||
from markitect.assets import AssetManager
|
||||
|
||||
|
||||
@dataclass
|
||||
class TemplateMetadata:
|
||||
"""Metadata for workspace templates."""
|
||||
name: str
|
||||
description: str
|
||||
version: str
|
||||
created_at: datetime
|
||||
asset_count: int
|
||||
author: str = "Unknown"
|
||||
tags: List[str] = field(default_factory=list)
|
||||
|
||||
|
||||
@dataclass
|
||||
class TemplateResult:
|
||||
"""Result of template creation."""
|
||||
success: bool
|
||||
template_path: Path
|
||||
template_name: str
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class WorkspaceCreationResult:
|
||||
"""Result of workspace creation from template."""
|
||||
success: bool
|
||||
workspace_path: Path
|
||||
project_name: str
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class ProjectResult:
|
||||
"""Result of project operations."""
|
||||
success: bool
|
||||
project_path: Path
|
||||
project_name: str
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class SyncResult:
|
||||
"""Result of workspace synchronization."""
|
||||
synchronized_count: int
|
||||
skipped_count: int
|
||||
error_count: int
|
||||
errors: List[Exception] = field(default_factory=list)
|
||||
|
||||
|
||||
@dataclass
|
||||
class BackupResult:
|
||||
"""Result of workspace backup."""
|
||||
success: bool
|
||||
backup_path: Path
|
||||
backup_size: int
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class RestoreResult:
|
||||
"""Result of workspace restore."""
|
||||
success: bool
|
||||
restored_path: Path
|
||||
files_restored: int
|
||||
error: Optional[Exception] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class WorkspaceState:
|
||||
"""Snapshot of workspace state."""
|
||||
timestamp: datetime
|
||||
file_checksums: Dict[str, str]
|
||||
directory_structure: List[str]
|
||||
asset_hashes: List[str]
|
||||
|
||||
|
||||
@dataclass
|
||||
class ConflictInfo:
|
||||
"""Information about a workspace conflict."""
|
||||
file_path: Path
|
||||
conflict_type: str
|
||||
local_timestamp: datetime
|
||||
remote_timestamp: datetime
|
||||
|
||||
|
||||
@dataclass
|
||||
class MergeResult:
|
||||
"""Result of conflict resolution."""
|
||||
resolved_conflicts: int
|
||||
unresolved_conflicts: int
|
||||
merge_strategy: str
|
||||
|
||||
|
||||
class WorkspaceTemplate:
|
||||
"""Workspace template management."""
|
||||
|
||||
def __init__(self, template_path: Path):
|
||||
"""Initialize workspace template."""
|
||||
self.template_path = template_path
|
||||
self.metadata_file = template_path / "template.json"
|
||||
|
||||
def get_metadata(self) -> TemplateMetadata:
|
||||
"""Get template metadata."""
|
||||
if self.metadata_file.exists():
|
||||
metadata_dict = json.loads(self.metadata_file.read_text())
|
||||
return TemplateMetadata(**metadata_dict)
|
||||
else:
|
||||
return TemplateMetadata(
|
||||
name="Unknown",
|
||||
description="No description",
|
||||
version="1.0.0",
|
||||
created_at=datetime.now(),
|
||||
asset_count=0
|
||||
)
|
||||
|
||||
|
||||
class WorkspaceManager:
|
||||
"""Workspace management system."""
|
||||
|
||||
def __init__(self, templates_dir: Optional[Path] = None):
|
||||
"""Initialize workspace manager."""
|
||||
self.templates_dir = templates_dir or Path.home() / ".markitect" / "templates"
|
||||
self.templates_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
def create_template(self, name: str, source_path: Path, description: str = "",
|
||||
include_assets: bool = True, configuration: Optional[Dict] = None) -> TemplateResult:
|
||||
"""Create a workspace template from existing workspace."""
|
||||
try:
|
||||
template_path = self.templates_dir / name
|
||||
template_path.mkdir(exist_ok=True)
|
||||
|
||||
# Copy workspace structure
|
||||
self._copy_workspace_structure(source_path, template_path, include_assets)
|
||||
|
||||
# Count assets
|
||||
asset_count = 0
|
||||
if include_assets and (source_path / "assets").exists():
|
||||
asset_count = len(list((source_path / "assets").rglob("*")))
|
||||
|
||||
# Create template metadata
|
||||
metadata = {
|
||||
"name": name,
|
||||
"description": description,
|
||||
"version": "1.0.0",
|
||||
"created_at": datetime.now().isoformat(),
|
||||
"asset_count": asset_count,
|
||||
"author": "Unknown",
|
||||
"tags": []
|
||||
}
|
||||
|
||||
metadata_file = template_path / "template.json"
|
||||
metadata_file.write_text(json.dumps(metadata, indent=2))
|
||||
|
||||
# Save configuration if provided
|
||||
if configuration:
|
||||
config_file = template_path / "markitect.yaml"
|
||||
config_file.write_text(yaml.dump(configuration, indent=2))
|
||||
|
||||
return TemplateResult(
|
||||
success=True,
|
||||
template_path=template_path,
|
||||
template_name=name
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return TemplateResult(
|
||||
success=False,
|
||||
template_path=Path(),
|
||||
template_name=name,
|
||||
error=e
|
||||
)
|
||||
|
||||
def get_template_metadata(self, template_name: str) -> TemplateMetadata:
|
||||
"""Get metadata for a specific template."""
|
||||
template_path = self.templates_dir / template_name
|
||||
template = WorkspaceTemplate(template_path)
|
||||
return template.get_metadata()
|
||||
|
||||
def create_workspace_from_template(self, template_name: str, target_path: Path,
|
||||
project_name: str) -> WorkspaceCreationResult:
|
||||
"""Create a new workspace from a template."""
|
||||
try:
|
||||
template_path = self.templates_dir / template_name
|
||||
|
||||
if not template_path.exists():
|
||||
raise FileNotFoundError(f"Template '{template_name}' not found")
|
||||
|
||||
# Create target directory
|
||||
target_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Copy template contents
|
||||
self._copy_workspace_structure(template_path, target_path, include_assets=True)
|
||||
|
||||
# Update project-specific files
|
||||
self._customize_workspace(target_path, project_name)
|
||||
|
||||
return WorkspaceCreationResult(
|
||||
success=True,
|
||||
workspace_path=target_path,
|
||||
project_name=project_name
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return WorkspaceCreationResult(
|
||||
success=False,
|
||||
workspace_path=target_path,
|
||||
project_name=project_name,
|
||||
error=e
|
||||
)
|
||||
|
||||
def initialize_multi_project_workspace(self, workspace_root: Path):
|
||||
"""Initialize a multi-project workspace."""
|
||||
workspace_root.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Create shared directories
|
||||
(workspace_root / "shared_assets").mkdir(exist_ok=True)
|
||||
(workspace_root / "templates").mkdir(exist_ok=True)
|
||||
(workspace_root / "config").mkdir(exist_ok=True)
|
||||
|
||||
# Create workspace configuration
|
||||
config = {
|
||||
"workspace_type": "multi_project",
|
||||
"shared_assets_enabled": True,
|
||||
"project_isolation": True,
|
||||
"created_at": datetime.now().isoformat()
|
||||
}
|
||||
|
||||
config_file = workspace_root / "workspace.yaml"
|
||||
config_file.write_text(yaml.dump(config, indent=2))
|
||||
|
||||
def add_project(self, workspace_root: Path, project_name: str,
|
||||
template: Optional[str] = None) -> ProjectResult:
|
||||
"""Add a project to multi-project workspace."""
|
||||
try:
|
||||
project_path = workspace_root / project_name
|
||||
project_path.mkdir(exist_ok=True)
|
||||
|
||||
if template:
|
||||
# Use template if specified
|
||||
result = self.create_workspace_from_template(template, project_path, project_name)
|
||||
if not result.success:
|
||||
raise result.error or Exception("Template creation failed")
|
||||
else:
|
||||
# Create basic project structure
|
||||
(project_path / "docs").mkdir(exist_ok=True)
|
||||
(project_path / "assets").mkdir(exist_ok=True)
|
||||
|
||||
return ProjectResult(
|
||||
success=True,
|
||||
project_path=project_path,
|
||||
project_name=project_name
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return ProjectResult(
|
||||
success=False,
|
||||
project_path=workspace_root / project_name,
|
||||
project_name=project_name,
|
||||
error=e
|
||||
)
|
||||
|
||||
def get_shared_asset_library(self, workspace_root: Path) -> Optional[AssetManager]:
|
||||
"""Get shared asset library for multi-project workspace."""
|
||||
shared_assets_path = workspace_root / "shared_assets"
|
||||
if shared_assets_path.exists():
|
||||
return AssetManager(storage_path=shared_assets_path)
|
||||
return None
|
||||
|
||||
def initialize_workspace(self, workspace_path: Path):
|
||||
"""Initialize a single workspace."""
|
||||
workspace_path.mkdir(parents=True, exist_ok=True)
|
||||
(workspace_path / "assets").mkdir(exist_ok=True)
|
||||
(workspace_path / "docs").mkdir(exist_ok=True)
|
||||
|
||||
def synchronize_assets(self, source_workspace: Path, target_workspace: Path,
|
||||
sync_mode: str = "incremental") -> SyncResult:
|
||||
"""Synchronize assets between workspaces."""
|
||||
result = SyncResult(
|
||||
synchronized_count=0,
|
||||
skipped_count=0,
|
||||
error_count=0
|
||||
)
|
||||
|
||||
try:
|
||||
source_assets = source_workspace / "assets"
|
||||
target_assets = target_workspace / "assets"
|
||||
|
||||
if not source_assets.exists():
|
||||
return result
|
||||
|
||||
target_assets.mkdir(exist_ok=True)
|
||||
|
||||
# Simple synchronization (copy new files)
|
||||
for asset_file in source_assets.rglob("*"):
|
||||
if asset_file.is_file():
|
||||
relative_path = asset_file.relative_to(source_assets)
|
||||
target_file = target_assets / relative_path
|
||||
|
||||
if not target_file.exists() or sync_mode == "overwrite":
|
||||
target_file.parent.mkdir(parents=True, exist_ok=True)
|
||||
shutil.copy2(asset_file, target_file)
|
||||
result.synchronized_count += 1
|
||||
else:
|
||||
result.skipped_count += 1
|
||||
|
||||
except Exception as e:
|
||||
result.error_count += 1
|
||||
result.errors.append(e)
|
||||
|
||||
return result
|
||||
|
||||
def create_backup(self, workspace_path: Path, backup_path: Path,
|
||||
include_assets: bool = True, compression_level: int = 6) -> BackupResult:
|
||||
"""Create a backup of workspace."""
|
||||
try:
|
||||
with zipfile.ZipFile(backup_path, 'w', zipfile.ZIP_DEFLATED, compresslevel=compression_level) as backup_zip:
|
||||
for file_path in workspace_path.rglob("*"):
|
||||
if file_path.is_file():
|
||||
# Skip assets if not included
|
||||
if not include_assets and "assets" in file_path.parts:
|
||||
continue
|
||||
|
||||
arc_name = file_path.relative_to(workspace_path)
|
||||
backup_zip.write(file_path, arc_name)
|
||||
|
||||
backup_size = backup_path.stat().st_size
|
||||
|
||||
return BackupResult(
|
||||
success=True,
|
||||
backup_path=backup_path,
|
||||
backup_size=backup_size
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return BackupResult(
|
||||
success=False,
|
||||
backup_path=backup_path,
|
||||
backup_size=0,
|
||||
error=e
|
||||
)
|
||||
|
||||
def restore_from_backup(self, backup_path: Path, target_path: Path) -> RestoreResult:
|
||||
"""Restore workspace from backup."""
|
||||
try:
|
||||
target_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
files_restored = 0
|
||||
with zipfile.ZipFile(backup_path, 'r') as backup_zip:
|
||||
backup_zip.extractall(target_path)
|
||||
files_restored = len(backup_zip.namelist())
|
||||
|
||||
return RestoreResult(
|
||||
success=True,
|
||||
restored_path=target_path,
|
||||
files_restored=files_restored
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return RestoreResult(
|
||||
success=False,
|
||||
restored_path=target_path,
|
||||
files_restored=0,
|
||||
error=e
|
||||
)
|
||||
|
||||
    def capture_workspace_state(self, workspace_path: Path) -> WorkspaceState:
        """Capture current state of workspace."""
        import hashlib

        file_checksums = {}
        directory_structure = []
        asset_hashes = []

        for item_path in workspace_path.rglob("*"):
            relative = item_path.relative_to(workspace_path)
            relative_path = str(relative)

            if item_path.is_file():
                # Calculate file checksum
                content = item_path.read_bytes()
                checksum = hashlib.md5(content).hexdigest()
                file_checksums[relative_path] = checksum

                # Track asset hashes (matched against the workspace-relative
                # path, consistent with _copy_workspace_structure)
                if "assets" in relative.parts:
                    asset_hashes.append(checksum)

            directory_structure.append(relative_path)

        return WorkspaceState(
            timestamp=datetime.now(),
            file_checksums=file_checksums,
            directory_structure=directory_structure,
            asset_hashes=asset_hashes
        )

    def detect_conflicts(self, state1: WorkspaceState, state2: WorkspaceState) -> List[ConflictInfo]:
        """Detect conflicts between workspace states."""
        conflicts = []

        # Find files that exist in both states but have different checksums
        for file_path, checksum1 in state1.file_checksums.items():
            if file_path in state2.file_checksums:
                checksum2 = state2.file_checksums[file_path]
                if checksum1 != checksum2:
                    conflict = ConflictInfo(
                        file_path=Path(file_path),
                        conflict_type="content_conflict",
                        local_timestamp=state1.timestamp,
                        remote_timestamp=state2.timestamp
                    )
                    conflicts.append(conflict)

        return conflicts

    def resolve_conflicts(self, conflicts: List[ConflictInfo],
                          resolution_strategy: str = "manual") -> MergeResult:
        """Resolve workspace conflicts."""
        # Mock conflict resolution
        result = MergeResult(
            resolved_conflicts=len(conflicts),
            unresolved_conflicts=0,
            merge_strategy=resolution_strategy
        )

        return result

    def _copy_workspace_structure(self, source: Path, target: Path, include_assets: bool):
        """Copy workspace structure from source to target."""
        for item in source.rglob("*"):
            if item.is_file():
                relative_path = item.relative_to(source)

                # Skip assets if not included
                if not include_assets and "assets" in relative_path.parts:
                    continue

                # Skip template metadata
                if item.name == "template.json":
                    continue

                target_path = target / relative_path
                target_path.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(item, target_path)

    def _customize_workspace(self, workspace_path: Path, project_name: str):
        """Customize workspace for specific project."""
        # Update any configuration files with project name
        config_files = list(workspace_path.glob("*.yaml")) + list(workspace_path.glob("*.yml"))

        for config_file in config_files:
            try:
                content = config_file.read_text()
                # Replace placeholder project names
                content = content.replace("{{PROJECT_NAME}}", project_name)
                content = content.replace("New Project", project_name)
                config_file.write_text(content)
            except Exception:
                pass  # Ignore errors in customization
# Re-export from core for backward compatibility
from markitect.core.workspace import (
    WorkspaceManager,
    WorkspaceTemplate,
    TemplateMetadata,
    TemplateResult,
    WorkspaceCreationResult,
    ProjectResult,
    SyncResult,
    BackupResult,
    RestoreResult,
    WorkspaceState,
    ConflictInfo,
    MergeResult,
)

__all__ = [
    'WorkspaceManager',
    'WorkspaceTemplate',
    'TemplateMetadata',
    'TemplateResult',
    'WorkspaceCreationResult',
    'ProjectResult',
    'SyncResult',
    'BackupResult',
    'RestoreResult',
    'WorkspaceState',
    'ConflictInfo',
    'MergeResult',
]
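The checksum comparison used by `capture_workspace_state` and `detect_conflicts` can be tried in isolation. The following is a minimal sketch, not the project's implementation: the helper names `capture_checksums` and `changed_files` are hypothetical, and SHA-256 is swapped in for the MD5 digest used above.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def capture_checksums(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    checksums: Dict[str, str] = {}
    for item in sorted(root.rglob("*")):
        if item.is_file():
            relative = item.relative_to(root).as_posix()
            checksums[relative] = hashlib.sha256(item.read_bytes()).hexdigest()
    return checksums


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return paths present in both snapshots whose digests differ."""
    shared = before.keys() & after.keys()
    return sorted(path for path in shared if before[path] != after[path])


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("one")
    (root / "b.md").write_text("two")
    before = capture_checksums(root)

    (root / "a.md").write_text("changed")  # modified in both snapshots
    (root / "c.md").write_text("new")      # only in the second snapshot
    after = capture_checksums(root)

    conflicts = changed_files(before, after)
```

As in `detect_conflicts`, files that exist in only one snapshot are not reported; only files present in both with differing digests count as conflicts.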
0
tests/integration/spaces/__init__.py
Normal file
570
tests/integration/spaces/test_space_service_integration.py
Normal file
@@ -0,0 +1,570 @@
"""
Integration tests for SpaceService.

Tests the full workflow of space operations including:
- Space creation and lifecycle management
- Document operations within spaces
- Variable management
- Reference tracking for cache invalidation
"""

import pytest
import tempfile
import os

from markitect.spaces import (
    SpaceService,
    InformationSpace,
    SpaceDocument,
    SpaceConfig,
    SpaceMetadata,
    SpaceStatus,
    SqliteSpaceRepository,
    SqliteDocumentRepository,
    SqliteVariableRepository,
    SqliteReferenceRepository,
)


@pytest.fixture
def temp_db():
    """Create a temporary database file for testing."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    yield path
    if os.path.exists(path):
        os.unlink(path)


@pytest.fixture
def space_service(temp_db):
    """Create a fully wired SpaceService for testing."""
    return SpaceService(
        space_repo=SqliteSpaceRepository(temp_db),
        document_repo=SqliteDocumentRepository(temp_db),
        variable_repo=SqliteVariableRepository(temp_db),
        reference_repo=SqliteReferenceRepository(temp_db),
    )


class TestSpaceLifecycle:
    """Tests for space lifecycle operations."""

    def test_create_and_retrieve_space(self, space_service):
        """Test creating and retrieving a space."""
        space = space_service.create_space(
            name="my-docs",
            description="My documentation",
        )

        assert space.name == "my-docs"
        assert space.description == "My documentation"
        assert space.status == SpaceStatus.DRAFT

        # Retrieve by ID
        retrieved = space_service.get_space(space.id)
        assert retrieved is not None
        assert retrieved.name == "my-docs"

        # Retrieve by name
        by_name = space_service.get_space_by_name("my-docs")
        assert by_name is not None
        assert by_name.id == space.id

    def test_create_space_with_config_and_metadata(self, space_service):
        """Test creating a space with custom config and metadata."""
        config = SpaceConfig(
            theme="dark",
            history_enabled=True,
            enable_caching=False,
        )
        metadata = SpaceMetadata(
            tags=["api", "v1"],
            author="tester",
            custom={"version": "1.0"},
        )

        space = space_service.create_space(
            name="configured-space",
            config=config,
            metadata=metadata,
        )

        assert space.config.theme == "dark"
        assert space.config.history_enabled is True
        assert space.metadata.tags == ["api", "v1"]
        assert space.metadata.author == "tester"

    def test_update_space(self, space_service):
        """Test updating a space."""
        space = space_service.create_space(name="original")

        updated = space_service.update_space(
            space.id,
            name="updated",
            description="New description",
        )

        assert updated.name == "updated"
        assert updated.description == "New description"

        # Verify persisted
        retrieved = space_service.get_space(space.id)
        assert retrieved.name == "updated"

    def test_space_lifecycle_transitions(self, space_service):
        """Test space status transitions."""
        space = space_service.create_space(name="lifecycle-test")
        assert space.status == SpaceStatus.DRAFT

        # Activate
        activated = space_service.activate_space(space.id)
        assert activated.status == SpaceStatus.ACTIVE

        # Archive
        archived = space_service.archive_space(space.id)
        assert archived.status == SpaceStatus.ARCHIVED

    def test_delete_space(self, space_service):
        """Test deleting a space."""
        space = space_service.create_space(name="to-delete")

        result = space_service.delete_space(space.id)
        assert result is True

        # Verify deleted
        retrieved = space_service.get_space(space.id)
        assert retrieved is None

    def test_list_spaces_excludes_archived(self, space_service):
        """Test that list_spaces excludes archived by default."""
        space1 = space_service.create_space(name="active")
        space2 = space_service.create_space(name="archived")
        space_service.archive_space(space2.id)

        spaces = space_service.list_spaces()
        assert len(spaces) == 1
        assert spaces[0].name == "active"

        # Include archived
        all_spaces = space_service.list_spaces(include_archived=True)
        assert len(all_spaces) == 2


class TestSpaceHierarchy:
    """Tests for space hierarchy operations."""

    def test_create_child_space(self, space_service):
        """Test creating a child space."""
        parent = space_service.create_space(name="parent")
        child = space_service.create_space(
            name="child",
            parent_space_id=parent.id,
        )

        assert child.parent_space_id == parent.id

        children = space_service.get_child_spaces(parent.id)
        assert len(children) == 1
        assert children[0].id == child.id

    def test_create_nested_hierarchy(self, space_service):
        """Test creating a nested space hierarchy."""
        root = space_service.create_space(name="root")
        level1 = space_service.create_space(name="level1", parent_space_id=root.id)
        level2 = space_service.create_space(name="level2", parent_space_id=level1.id)

        # Verify hierarchy
        root_children = space_service.get_child_spaces(root.id)
        assert len(root_children) == 1
        assert root_children[0].id == level1.id

        level1_children = space_service.get_child_spaces(level1.id)
        assert len(level1_children) == 1
        assert level1_children[0].id == level2.id

    def test_delete_space_with_children_cascade(self, space_service):
        """Test deleting a space cascades to children."""
        parent = space_service.create_space(name="parent")
        child = space_service.create_space(name="child", parent_space_id=parent.id)

        space_service.delete_space(parent.id, cascade=True)

        assert space_service.get_space(parent.id) is None
        assert space_service.get_space(child.id) is None

    def test_delete_space_with_children_no_cascade_raises(self, space_service):
        """Test deleting a space with children raises if cascade=False."""
        parent = space_service.create_space(name="parent")
        space_service.create_space(name="child", parent_space_id=parent.id)

        with pytest.raises(ValueError, match="has 1 child"):
            space_service.delete_space(parent.id, cascade=False)


class TestDocumentOperations:
    """Tests for document operations within spaces."""

    def test_add_and_list_documents(self, space_service):
        """Test adding and listing documents."""
        space = space_service.create_space(name="doc-space")

        doc1 = space_service.add_document(
            space.id,
            space_path="/intro.md",
            document_id="doc-1",
        )
        doc2 = space_service.add_document(
            space.id,
            space_path="/api/endpoints.md",
            document_id="doc-2",
        )

        docs = space_service.list_documents(space.id)
        assert len(docs) == 2

    def test_get_document_by_path(self, space_service):
        """Test getting a document by its path."""
        space = space_service.create_space(name="doc-space")
        space_service.add_document(space.id, "/intro.md", document_id="doc-1")

        doc = space_service.get_document_by_path(space.id, "/intro.md")
        assert doc is not None
        assert doc.document_id == "doc-1"

        # Also works without leading slash
        doc2 = space_service.get_document_by_path(space.id, "intro.md")
        assert doc2 is not None

    def test_move_document(self, space_service):
        """Test moving a document to a new path."""
        space = space_service.create_space(name="doc-space")
        doc = space_service.add_document(space.id, "/old-path.md")

        moved = space_service.move_document(doc.id, "/new-path.md")
        assert moved.space_path == "/new-path.md"

        # Old path should not exist
        old_doc = space_service.get_document_by_path(space.id, "/old-path.md")
        assert old_doc is None

        # New path should work
        new_doc = space_service.get_document_by_path(space.id, "/new-path.md")
        assert new_doc is not None

    def test_remove_document(self, space_service):
        """Test removing a document."""
        space = space_service.create_space(name="doc-space")
        doc = space_service.add_document(space.id, "/to-remove.md")

        result = space_service.remove_document(doc.id)
        assert result is True

        # Verify removed
        retrieved = space_service.get_document(doc.id)
        assert retrieved is None

    def test_reorder_documents(self, space_service):
        """Test reordering documents."""
        space = space_service.create_space(name="doc-space")
        doc1 = space_service.add_document(space.id, "/a.md", order_index=0)
        doc2 = space_service.add_document(space.id, "/b.md", order_index=1)
        doc3 = space_service.add_document(space.id, "/c.md", order_index=2)

        # Reorder: c, a, b
        space_service.reorder_documents(space.id, [doc3.id, doc1.id, doc2.id])

        docs = space_service.list_documents(space.id)
        assert docs[0].id == doc3.id
        assert docs[1].id == doc1.id
        assert docs[2].id == doc2.id

    def test_document_with_metadata(self, space_service):
        """Test document with custom metadata."""
        space = space_service.create_space(name="doc-space")
        doc = space_service.add_document(
            space.id,
            "/api.md",
            metadata={"title": "API Reference", "order": 5},
        )

        retrieved = space_service.get_document(doc.id)
        assert retrieved.metadata["title"] == "API Reference"
        assert retrieved.metadata["order"] == 5

    def test_update_document_hash(self, space_service):
        """Test updating document content hash."""
        space = space_service.create_space(name="doc-space")
        doc = space_service.add_document(space.id, "/content.md")

        space_service.update_document_hash(doc.id, "hash123abc")

        retrieved = space_service.get_document(doc.id)
        assert retrieved.content_hash == "hash123abc"


class TestVariableOperations:
    """Tests for variable operations within spaces."""

    def test_set_and_get_variable(self, space_service):
        """Test setting and getting a variable."""
        space = space_service.create_space(name="var-space")

        var = space_service.set_variable(space.id, "version", "1.0.0")
        assert var.value == "1.0.0"

        retrieved = space_service.get_variable(space.id, "version")
        assert retrieved is not None
        assert retrieved.value == "1.0.0"

    def test_list_variables(self, space_service):
        """Test listing variables."""
        space = space_service.create_space(name="var-space")
        space_service.set_variable(space.id, "var1", "value1")
        space_service.set_variable(space.id, "var2", "value2")

        variables = space_service.list_variables(space.id)
        assert len(variables) == 2

    def test_list_variables_by_scope(self, space_service):
        """Test listing variables filtered by scope."""
        space = space_service.create_space(name="var-space")
        space_service.set_variable(space.id, "global", "g", scope="space")
        space_service.set_variable(space.id, "local", "l", scope="document")

        space_vars = space_service.list_variables(space.id, scope="space")
        assert len(space_vars) == 1
        assert space_vars[0].name == "global"

    def test_delete_variable(self, space_service):
        """Test deleting a variable."""
        space = space_service.create_space(name="var-space")
        space_service.set_variable(space.id, "temp", "value")

        result = space_service.delete_variable(space.id, "temp")
        assert result is True

        retrieved = space_service.get_variable(space.id, "temp")
        assert retrieved is None

    def test_get_variables_dict(self, space_service):
        """Test getting variables as a dictionary."""
        space = space_service.create_space(name="var-space")
        space_service.set_variable(space.id, "api_url", "https://api.example.com")
        space_service.set_variable(space.id, "version", "2.0")

        variables_dict = space_service.get_variables_dict(space.id)
        assert variables_dict == {
            "api_url": "https://api.example.com",
            "version": "2.0",
        }

    def test_variable_with_complex_value(self, space_service):
        """Test variable with complex JSON value."""
        space = space_service.create_space(name="var-space")
        complex_value = {
            "endpoints": ["/api/v1", "/api/v2"],
            "config": {"timeout": 30},
        }
        space_service.set_variable(space.id, "api_config", complex_value)

        retrieved = space_service.get_variable(space.id, "api_config")
        assert retrieved.value == complex_value


class TestReferenceTracking:
    """Tests for transclusion reference tracking."""

    def test_add_and_get_references(self, space_service):
        """Test adding and getting references."""
        space = space_service.create_space(name="ref-space")

        space_service.add_reference("doc-1", "shared-component", space.id)
        space_service.add_reference("doc-2", "shared-component", space.id)

        refs = space_service.get_references_to("shared-component", space.id)
        assert len(refs) == 2

    def test_get_references_from(self, space_service):
        """Test getting references from a source document."""
        space = space_service.create_space(name="ref-space")

        space_service.add_reference("doc-1", "component-a", space.id)
        space_service.add_reference("doc-1", "component-b", space.id)

        refs = space_service.get_references_from("doc-1", space.id)
        assert len(refs) == 2
        targets = [r.target_doc_id for r in refs]
        assert "component-a" in targets
        assert "component-b" in targets

    def test_get_dependents(self, space_service):
        """Test getting dependent documents."""
        space = space_service.create_space(name="ref-space")

        space_service.add_reference("doc-1", "shared", space.id)
        space_service.add_reference("doc-2", "shared", space.id)
        space_service.add_reference("doc-3", "shared", space.id)

        dependents = space_service.get_dependents("shared", space.id)
        assert len(dependents) == 3
        assert set(dependents) == {"doc-1", "doc-2", "doc-3"}

    def test_clear_references_from(self, space_service):
        """Test clearing references from a source document."""
        space = space_service.create_space(name="ref-space")

        space_service.add_reference("doc-1", "a", space.id)
        space_service.add_reference("doc-1", "b", space.id)
        space_service.add_reference("doc-2", "a", space.id)

        count = space_service.clear_references_from("doc-1", space.id)
        assert count == 2

        # doc-1 refs should be gone
        refs1 = space_service.get_references_from("doc-1", space.id)
        assert len(refs1) == 0

        # doc-2 refs should still exist
        refs2 = space_service.get_references_from("doc-2", space.id)
        assert len(refs2) == 1

    def test_remove_document_clears_references(self, space_service):
        """Test that removing a document clears its references."""
        space = space_service.create_space(name="ref-space")
        doc = space_service.add_document(space.id, "/source.md")

        # Add reference from this document
        space_service.add_reference(doc.id, "target", space.id)

        # Verify reference exists
        refs = space_service.get_references_from(doc.id, space.id)
        assert len(refs) == 1

        # Remove document
        space_service.remove_document(doc.id)

        # References should be cleared
        refs = space_service.get_references_from(doc.id, space.id)
        assert len(refs) == 0


class TestFullWorkflow:
    """End-to-end workflow tests."""

    def test_documentation_space_workflow(self, space_service):
        """Test a complete documentation space workflow."""
        # Create a documentation space
        space = space_service.create_space(
            name="api-docs",
            description="API Documentation",
            config=SpaceConfig(theme="minimal"),
            metadata=SpaceMetadata(tags=["api", "v2"]),
        )

        # Add documents
        intro = space_service.add_document(
            space.id,
            "/intro.md",
            order_index=0,
            metadata={"title": "Introduction"},
        )
        endpoints = space_service.add_document(
            space.id,
            "/api/endpoints.md",
            order_index=1,
            metadata={"title": "API Endpoints"},
        )
        auth = space_service.add_document(
            space.id,
            "/api/auth.md",
            order_index=2,
            metadata={"title": "Authentication"},
        )

        # Add variables for transclusion
        space_service.set_variable(space.id, "api_base_url", "https://api.example.com")
        space_service.set_variable(space.id, "version", "2.0")

        # Track references (e.g., endpoints includes auth)
        space_service.add_reference(endpoints.id, auth.id, space.id)

        # Activate the space
        space_service.activate_space(space.id)

        # Get stats
        stats = space_service.get_space_stats(space.id)
        assert stats["document_count"] == 3
        assert stats["variable_count"] == 2
        assert stats["status"] == "active"

        # Verify the space
        retrieved = space_service.get_space(space.id)
        assert retrieved.status == SpaceStatus.ACTIVE

        # List documents in order
        docs = space_service.list_documents(space.id)
        assert len(docs) == 3
        assert docs[0].space_path == "/intro.md"

        # Get transclusion context
        context = space_service.get_variables_dict(space.id)
        assert context["api_base_url"] == "https://api.example.com"

        # Check dependencies for cache invalidation
        dependents = space_service.get_dependents(auth.id, space.id)
        assert endpoints.id in dependents

    def test_space_stats(self, space_service):
        """Test getting space statistics."""
        space = space_service.create_space(name="stats-test")
        space_service.add_document(space.id, "/doc1.md")
        space_service.add_document(space.id, "/doc2.md")
        space_service.set_variable(space.id, "var1", "value1")
        space_service.create_space(name="child", parent_space_id=space.id)

        stats = space_service.get_space_stats(space.id)

        assert stats["name"] == "stats-test"
        assert stats["document_count"] == 2
        assert stats["variable_count"] == 1
        assert stats["child_space_count"] == 1


class TestErrorHandling:
    """Tests for error handling scenarios."""

    def test_create_space_empty_name_raises(self, space_service):
        """Test that empty name raises ValueError."""
        with pytest.raises(ValueError, match="cannot be empty"):
            space_service.create_space(name="")

        with pytest.raises(ValueError, match="cannot be empty"):
            space_service.create_space(name=" ")

    def test_create_space_duplicate_name_raises(self, space_service):
        """Test that duplicate name raises ValueError."""
        space_service.create_space(name="taken")

        with pytest.raises(ValueError, match="already exists"):
            space_service.create_space(name="taken")

    def test_update_nonexistent_space_raises(self, space_service):
        """Test that updating non-existent space raises ValueError."""
        with pytest.raises(ValueError, match="not found"):
            space_service.update_space("non-existent", name="new-name")

    def test_add_document_to_nonexistent_space_raises(self, space_service):
        """Test that adding document to non-existent space raises."""
        with pytest.raises(ValueError, match="not found"):
            space_service.add_document("non-existent", "/doc.md")

    def test_set_variable_in_nonexistent_space_raises(self, space_service):
        """Test that setting variable in non-existent space raises."""
        with pytest.raises(ValueError, match="not found"):
            space_service.set_variable("non-existent", "var", "value")

    def test_create_child_with_nonexistent_parent_raises(self, space_service):
        """Test that creating child with non-existent parent raises."""
        with pytest.raises(ValueError, match="Parent space.*not found"):
            space_service.create_space(name="orphan", parent_space_id="non-existent")
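The reference-tracking tests above rely on a reverse lookup: given a target document, find every source that transcludes it so dependent caches can be invalidated. The sketch below illustrates that lookup with an in-memory index. `InMemoryReferenceIndex` is a hypothetical stand-in written for illustration; it is not the `SqliteReferenceRepository` from this commit, and only the method names mirror the calls exercised in the tests.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class InMemoryReferenceIndex:
    """Hypothetical in-memory reference index (illustration only)."""

    def __init__(self) -> None:
        # (space_id, source_doc_id) -> set of target document ids
        self._targets: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_reference(self, source_doc_id: str, target_doc_id: str, space_id: str) -> None:
        self._targets[(space_id, source_doc_id)].add(target_doc_id)

    def get_dependents(self, target_doc_id: str, space_id: str) -> List[str]:
        """Return the sources in this space that reference the target."""
        return sorted(
            source
            for (sid, source), targets in self._targets.items()
            if sid == space_id and target_doc_id in targets
        )

    def clear_references_from(self, source_doc_id: str, space_id: str) -> int:
        """Drop all references from one source; return how many were removed."""
        return len(self._targets.pop((space_id, source_doc_id), set()))


index = InMemoryReferenceIndex()
index.add_reference("doc-1", "shared", "space-a")
index.add_reference("doc-2", "shared", "space-a")
index.add_reference("doc-1", "other", "space-a")

dependents = index.get_dependents("shared", "space-a")
removed = index.clear_references_from("doc-1", "space-a")
remaining = index.get_dependents("shared", "space-a")
```

Keying the index by `(space_id, source_doc_id)` makes clearing one document's outgoing references a single dictionary removal, which matches the re-parse workflow where a document's references are cleared and re-added.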
1
tests/unit/spaces/__init__.py
Normal file
@@ -0,0 +1 @@
"""Unit tests for the spaces package."""
299
tests/unit/spaces/test_models.py
Normal file
@@ -0,0 +1,299 @@
"""
Unit tests for Information Space models.

Tests the core domain models: InformationSpace, SpaceDocument, SpaceConfig, SpaceMetadata.
"""

import pytest
from datetime import datetime
from markitect.spaces.models import (
    InformationSpace,
    SpaceDocument,
    SpaceConfig,
    SpaceMetadata,
    SpaceVariable,
    TransclusionReference,
    SpaceStatus,
)


class TestSpaceMetadata:
    """Tests for SpaceMetadata dataclass."""

    def test_default_metadata(self):
        """Test default metadata values."""
        metadata = SpaceMetadata()
        assert metadata.tags == []
        assert metadata.author is None
        assert metadata.custom == {}

    def test_metadata_with_values(self):
        """Test metadata with custom values."""
        metadata = SpaceMetadata(
            tags=["api", "docs"],
            author="test-user",
            custom={"version": "1.0"}
        )
        assert metadata.tags == ["api", "docs"]
        assert metadata.author == "test-user"
        assert metadata.custom["version"] == "1.0"

    def test_metadata_to_dict(self):
        """Test metadata serialization."""
        metadata = SpaceMetadata(tags=["test"], author="user")
        data = metadata.to_dict()
        assert data["tags"] == ["test"]
        assert data["author"] == "user"

    def test_metadata_from_dict(self):
        """Test metadata deserialization."""
        data = {"tags": ["api"], "author": "admin", "custom": {"key": "value"}}
        metadata = SpaceMetadata.from_dict(data)
        assert metadata.tags == ["api"]
        assert metadata.author == "admin"
        assert metadata.custom["key"] == "value"


class TestSpaceConfig:
    """Tests for SpaceConfig dataclass."""

    def test_default_config(self):
        """Test default configuration values."""
        config = SpaceConfig()
        assert config.default_variant == "hierarchical"
        assert config.enable_caching is True
        assert config.theme is None
        assert config.history_enabled is False
        assert config.history_backend == "git"

    def test_config_with_history_enabled(self):
        """Test config with git history enabled."""
        config = SpaceConfig(history_enabled=True, history_backend="git")
        assert config.history_enabled is True
        assert config.history_backend == "git"

    def test_config_to_dict(self):
        """Test config serialization."""
        config = SpaceConfig(theme="dark", enable_caching=False)
        data = config.to_dict()
        assert data["theme"] == "dark"
        assert data["enable_caching"] is False

    def test_config_from_dict(self):
        """Test config deserialization."""
        data = {"default_variant": "flat", "history_enabled": True}
        config = SpaceConfig.from_dict(data)
        assert config.default_variant == "flat"
        assert config.history_enabled is True


class TestSpaceDocument:
    """Tests for SpaceDocument dataclass."""

    def test_default_document(self):
        """Test default document values."""
        doc = SpaceDocument()
        assert doc.id is not None
        assert doc.space_path == ""
        assert doc.order_index == 0
        assert doc.metadata == {}

    def test_document_with_values(self):
        """Test document with custom values."""
        doc = SpaceDocument(
            space_id="space-1",
            document_id="doc-1",
            space_path="/intro.md",
            order_index=1,
            content_hash="abc123"
        )
        assert doc.space_id == "space-1"
        assert doc.space_path == "/intro.md"
        assert doc.content_hash == "abc123"

    def test_document_to_dict(self):
        """Test document serialization."""
        doc = SpaceDocument(space_path="/test.md")
        data = doc.to_dict()
        assert data["space_path"] == "/test.md"
        assert "id" in data
        assert "added_at" in data

    def test_document_from_dict(self):
        """Test document deserialization."""
        data = {
            "id": "doc-123",
            "space_id": "space-1",
            "space_path": "/api.md",
            "order_index": 5
        }
        doc = SpaceDocument.from_dict(data)
        assert doc.id == "doc-123"
        assert doc.space_path == "/api.md"
        assert doc.order_index == 5


class TestInformationSpace:
    """Tests for InformationSpace dataclass."""

    def test_space_requires_name(self):
        """Test that space name is required."""
        with pytest.raises(ValueError, match="Space name is required"):
            InformationSpace(name="")

    def test_space_default_values(self):
        """Test default space values."""
        space = InformationSpace(name="test-space")
        assert space.name == "test-space"
        assert space.id is not None
        assert space.status == SpaceStatus.DRAFT
        assert space.description is None
        assert space.parent_space_id is None

    def test_space_with_config(self):
        """Test space with custom config."""
        config = SpaceConfig(theme="minimal", history_enabled=True)
        space = InformationSpace(
            name="docs",
            description="Documentation space",
            config=config
        )
        assert space.config.theme == "minimal"
        assert space.config.history_enabled is True

    def test_space_activation(self):
        """Test space lifecycle transitions."""
        space = InformationSpace(name="test")
        assert space.status == SpaceStatus.DRAFT

        space.activate()
        assert space.status == SpaceStatus.ACTIVE

        space.archive()
        assert space.status == SpaceStatus.ARCHIVED

    def test_space_touch_updates_timestamp(self):
        """Test that touch() updates the timestamp."""
        space = InformationSpace(name="test")
        original_updated = space.updated_at

        import time
        time.sleep(0.01)  # Small delay to ensure timestamp changes

        space.touch()
        assert space.updated_at >= original_updated

    def test_space_to_dict(self):
        """Test space serialization."""
        space = InformationSpace(
            name="api-docs",
            description="API Documentation"
        )
        data = space.to_dict()

        assert data["name"] == "api-docs"
        assert data["description"] == "API Documentation"
        assert data["status"] == "draft"
        assert "id" in data
        assert "created_at" in data

    def test_space_from_dict(self):
        """Test space deserialization."""
        data = {
            "id": "space-123",
            "name": "my-space",
            "description": "Test space",
            "status": "active",
            "config": {"history_enabled": True},
            "metadata": {"tags": ["test"]}
        }
        space = InformationSpace.from_dict(data)

        assert space.id == "space-123"
        assert space.name == "my-space"
        assert space.status == SpaceStatus.ACTIVE
        assert space.config.history_enabled is True
        assert space.metadata.tags == ["test"]

    def test_space_roundtrip_serialization(self):
        """Test that to_dict and from_dict are inverse operations."""
        original = InformationSpace(
            name="roundtrip-test",
            description="Testing serialization",
            config=SpaceConfig(theme="dark", history_enabled=True),
            metadata=SpaceMetadata(tags=["test", "roundtrip"])
        )
        original.activate()

        data = original.to_dict()
        restored = InformationSpace.from_dict(data)

        assert restored.name == original.name
        assert restored.description == original.description
        assert restored.status == original.status
        assert restored.config.theme == original.config.theme
        assert restored.metadata.tags == original.metadata.tags


class TestSpaceVariable:
    """Tests for SpaceVariable dataclass."""

    def test_variable_creation(self):
        """Test variable creation."""
        var = SpaceVariable(
            space_id="space-1",
            name="version",
            value="1.0.0"
        )
        assert var.name == "version"
        assert var.value == "1.0.0"
        assert var.scope == "space"

    def test_variable_to_dict(self):
        """Test variable serialization."""
        var = SpaceVariable(
            space_id="space-1",
            name="config",
            value={"key": "value"},
            scope="document"
        )
        data = var.to_dict()
        assert data["name"] == "config"
        assert data["scope"] == "document"


class TestTransclusionReference:
    """Tests for TransclusionReference dataclass."""

    def test_reference_creation(self):
        """Test transclusion reference creation."""
        ref = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-2",
            space_id="space-1"
        )
        assert ref.source_doc_id == "doc-1"
        assert ref.target_doc_id == "doc-2"
        assert ref.created_at is not None

    def test_reference_to_dict(self):
        """Test reference serialization."""
        ref = TransclusionReference(
            source_doc_id="a",
            target_doc_id="b",
            space_id="s"
        )
        data = ref.to_dict()
        assert "created_at" in data
        assert data["source_doc_id"] == "a"


class TestSpaceStatus:
    """Tests for SpaceStatus enum."""

    def test_status_values(self):
        """Test status enum values."""
        assert SpaceStatus.DRAFT.value == "draft"
        assert SpaceStatus.ACTIVE.value == "active"
        assert SpaceStatus.ARCHIVED.value == "archived"
        assert SpaceStatus.DELETED.value == "deleted"
901
tests/unit/spaces/test_repositories.py
Normal file
@@ -0,0 +1,901 @@
"""
Unit tests for space repositories.

Tests the SQLite implementations of:
- ISpaceRepository (SqliteSpaceRepository)
- IDocumentAssociationRepository (SqliteDocumentRepository)
- IVariableRepository (SqliteVariableRepository)
- IReferenceRepository (SqliteReferenceRepository)
"""

import pytest
import tempfile
import os
from datetime import datetime

from markitect.spaces.models import (
    InformationSpace,
    SpaceDocument,
    SpaceVariable,
    TransclusionReference,
    SpaceStatus,
    SpaceConfig,
    SpaceMetadata,
)
from markitect.spaces.repositories.sqlite import (
    SqliteSpaceRepository,
    SqliteDocumentRepository,
    SqliteVariableRepository,
    SqliteReferenceRepository,
    initialize_space_tables,
)


@pytest.fixture
def temp_db():
    """Create a temporary database file for testing."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    yield path
    if os.path.exists(path):
        os.unlink(path)


@pytest.fixture
def space_repo(temp_db):
    """Create a SqliteSpaceRepository for testing."""
    return SqliteSpaceRepository(temp_db)


@pytest.fixture
def doc_repo(temp_db):
    """Create a SqliteDocumentRepository for testing."""
    return SqliteDocumentRepository(temp_db)


@pytest.fixture
def var_repo(temp_db):
    """Create a SqliteVariableRepository for testing."""
    return SqliteVariableRepository(temp_db)


@pytest.fixture
def ref_repo(temp_db):
    """Create a SqliteReferenceRepository for testing."""
    return SqliteReferenceRepository(temp_db)


class TestInitializeSpaceTables:
    """Tests for initialize_space_tables function."""

    def test_creates_tables(self, temp_db):
        """Test that initialize_space_tables creates all required tables."""
        import sqlite3

        initialize_space_tables(temp_db)

        conn = sqlite3.connect(temp_db)
        cursor = conn.cursor()

        # Check that all tables exist
        cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
        tables = {row[0] for row in cursor.fetchall()}

        assert "spaces" in tables
        assert "space_documents" in tables
        assert "space_variables" in tables
        assert "transclusion_references" in tables

        conn.close()

    def test_idempotent(self, temp_db):
        """Test that initialize_space_tables can be called multiple times."""
        initialize_space_tables(temp_db)
        initialize_space_tables(temp_db)  # Should not raise

    def test_creates_parent_directory(self):
        """Test that initialize_space_tables creates parent directories."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "subdir", "nested", "test.db")
            initialize_space_tables(db_path)
            assert os.path.exists(db_path)


class TestSqliteSpaceRepository:
    """Tests for SqliteSpaceRepository."""

    def test_create_space(self, space_repo):
        """Test creating a new space."""
        space = InformationSpace(name="test-space", description="A test space")
        created = space_repo.create(space)

        assert created.id == space.id
        assert created.name == "test-space"
        assert created.description == "A test space"

    def test_create_space_duplicate_name_raises(self, space_repo):
        """Test that creating a space with duplicate name raises ValueError."""
        space1 = InformationSpace(name="duplicate")
        space_repo.create(space1)

        space2 = InformationSpace(name="duplicate")
        with pytest.raises(ValueError, match="already exists"):
            space_repo.create(space2)

    def test_get_by_id(self, space_repo):
        """Test retrieving a space by ID."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        retrieved = space_repo.get_by_id(space.id)
        assert retrieved is not None
        assert retrieved.id == space.id
        assert retrieved.name == "test-space"

    def test_get_by_id_not_found(self, space_repo):
        """Test that get_by_id returns None for non-existent space."""
        result = space_repo.get_by_id("non-existent-id")
        assert result is None

    def test_get_by_name(self, space_repo):
        """Test retrieving a space by name."""
        space = InformationSpace(name="named-space")
        space_repo.create(space)

        retrieved = space_repo.get_by_name("named-space")
        assert retrieved is not None
        assert retrieved.name == "named-space"

    def test_get_by_name_not_found(self, space_repo):
        """Test that get_by_name returns None for non-existent space."""
        result = space_repo.get_by_name("non-existent")
        assert result is None

    def test_list_all_empty(self, space_repo):
        """Test listing spaces when none exist."""
        spaces = space_repo.list_all()
        assert spaces == []

    def test_list_all(self, space_repo):
        """Test listing all spaces."""
        space1 = InformationSpace(name="alpha")
        space2 = InformationSpace(name="beta")
        space_repo.create(space1)
        space_repo.create(space2)

        spaces = space_repo.list_all()
        assert len(spaces) == 2
        names = [s.name for s in spaces]
        assert "alpha" in names
        assert "beta" in names

    def test_list_all_excludes_archived_by_default(self, space_repo):
        """Test that list_all excludes archived spaces by default."""
        space1 = InformationSpace(name="active-space")
        space2 = InformationSpace(name="archived-space")
        space2.archive()

        space_repo.create(space1)
        space_repo.create(space2)

        spaces = space_repo.list_all()
        assert len(spaces) == 1
        assert spaces[0].name == "active-space"

    def test_list_all_includes_archived_when_requested(self, space_repo):
        """Test that list_all includes archived spaces when requested."""
        space1 = InformationSpace(name="active-space")
        space2 = InformationSpace(name="archived-space")
        space2.archive()

        space_repo.create(space1)
        space_repo.create(space2)

        spaces = space_repo.list_all(include_archived=True)
        assert len(spaces) == 2

    def test_update_space(self, space_repo):
        """Test updating a space."""
        space = InformationSpace(name="original")
        space_repo.create(space)

        space.description = "Updated description"
        updated = space_repo.update(space)

        assert updated.description == "Updated description"

        # Verify persisted
        retrieved = space_repo.get_by_id(space.id)
        assert retrieved.description == "Updated description"

    def test_update_nonexistent_raises(self, space_repo):
        """Test that updating a non-existent space raises ValueError."""
        space = InformationSpace(name="non-existent")
        with pytest.raises(ValueError, match="does not exist"):
            space_repo.update(space)

    def test_delete_space(self, space_repo):
        """Test deleting a space."""
        space = InformationSpace(name="to-delete")
        space_repo.create(space)

        result = space_repo.delete(space.id)
        assert result is True

        # Verify deleted
        retrieved = space_repo.get_by_id(space.id)
        assert retrieved is None

    def test_delete_nonexistent(self, space_repo):
        """Test that deleting a non-existent space returns False."""
        result = space_repo.delete("non-existent-id")
        assert result is False

    def test_exists(self, space_repo):
        """Test checking if a space exists."""
        space = InformationSpace(name="existing")
        space_repo.create(space)

        assert space_repo.exists(space.id) is True
        assert space_repo.exists("non-existent") is False

    def test_get_children(self, space_repo):
        """Test getting child spaces."""
        parent = InformationSpace(name="parent")
        space_repo.create(parent)

        child1 = InformationSpace(name="child1", parent_space_id=parent.id)
        child2 = InformationSpace(name="child2", parent_space_id=parent.id)
        space_repo.create(child1)
        space_repo.create(child2)

        children = space_repo.get_children(parent.id)
        assert len(children) == 2
        names = [c.name for c in children]
        assert "child1" in names
        assert "child2" in names

    def test_get_children_empty(self, space_repo):
        """Test getting children when none exist."""
        parent = InformationSpace(name="lonely-parent")
        space_repo.create(parent)

        children = space_repo.get_children(parent.id)
        assert children == []

    def test_space_with_config_and_metadata(self, space_repo):
        """Test creating and retrieving a space with config and metadata."""
        config = SpaceConfig(theme="dark", history_enabled=True)
        metadata = SpaceMetadata(tags=["api", "docs"], author="tester")
        space = InformationSpace(
            name="configured-space",
            config=config,
            metadata=metadata,
        )
        space_repo.create(space)

        retrieved = space_repo.get_by_id(space.id)
        assert retrieved.config.theme == "dark"
        assert retrieved.config.history_enabled is True
        assert retrieved.metadata.tags == ["api", "docs"]
        assert retrieved.metadata.author == "tester"

    def test_space_status_persistence(self, space_repo):
        """Test that space status is persisted correctly."""
        space = InformationSpace(name="lifecycle-test")
        space.activate()
        space_repo.create(space)

        retrieved = space_repo.get_by_id(space.id)
        assert retrieved.status == SpaceStatus.ACTIVE


class TestSqliteDocumentRepository:
    """Tests for SqliteDocumentRepository."""

    def test_add_document(self, doc_repo, space_repo):
        """Test adding a document to a space."""
        # First create a space
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc = SpaceDocument(
            space_id=space.id,
            document_id="doc-123",
            space_path="/intro.md",
        )
        added = doc_repo.add_document(doc)

        assert added.id == doc.id
        assert added.space_path == "/intro.md"

    def test_add_document_duplicate_path_raises(self, doc_repo, space_repo):
        """Test that adding a document with duplicate path raises ValueError."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc1 = SpaceDocument(space_id=space.id, space_path="/same.md")
        doc_repo.add_document(doc1)

        doc2 = SpaceDocument(space_id=space.id, space_path="/same.md")
        with pytest.raises(ValueError, match="already exists"):
            doc_repo.add_document(doc2)

    def test_get_document(self, doc_repo, space_repo):
        """Test getting a document by ID."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc = SpaceDocument(space_id=space.id, space_path="/test.md")
        doc_repo.add_document(doc)

        retrieved = doc_repo.get_document(doc.id)
        assert retrieved is not None
        assert retrieved.space_path == "/test.md"

    def test_get_document_not_found(self, doc_repo):
        """Test that get_document returns None for non-existent document."""
        result = doc_repo.get_document("non-existent")
        assert result is None

    def test_get_by_space_path(self, doc_repo, space_repo):
        """Test getting a document by space path."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc = SpaceDocument(space_id=space.id, space_path="/api/docs.md")
        doc_repo.add_document(doc)

        retrieved = doc_repo.get_by_space_path(space.id, "/api/docs.md")
        assert retrieved is not None
        assert retrieved.id == doc.id

    def test_get_by_space_path_not_found(self, doc_repo, space_repo):
        """Test that get_by_space_path returns None for non-existent path."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        result = doc_repo.get_by_space_path(space.id, "/non-existent.md")
        assert result is None

    def test_list_by_space(self, doc_repo, space_repo):
        """Test listing documents in a space."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc1 = SpaceDocument(space_id=space.id, space_path="/first.md", order_index=0)
        doc2 = SpaceDocument(space_id=space.id, space_path="/second.md", order_index=1)
        doc_repo.add_document(doc1)
        doc_repo.add_document(doc2)

        docs = doc_repo.list_by_space(space.id)
        assert len(docs) == 2
        assert docs[0].space_path == "/first.md"
        assert docs[1].space_path == "/second.md"

    def test_list_by_space_empty(self, doc_repo, space_repo):
        """Test listing documents when none exist."""
        space = InformationSpace(name="empty-space")
        space_repo.create(space)

        docs = doc_repo.list_by_space(space.id)
        assert docs == []

    def test_update_document(self, doc_repo, space_repo):
        """Test updating a document."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc = SpaceDocument(space_id=space.id, space_path="/old.md")
        doc_repo.add_document(doc)

        doc.content_hash = "newhash123"
        updated = doc_repo.update_document(doc)
        assert updated.content_hash == "newhash123"

        # Verify persisted
        retrieved = doc_repo.get_document(doc.id)
        assert retrieved.content_hash == "newhash123"

    def test_update_nonexistent_raises(self, doc_repo):
        """Test that updating a non-existent document raises ValueError."""
        doc = SpaceDocument(space_path="/non-existent.md")
        with pytest.raises(ValueError, match="does not exist"):
            doc_repo.update_document(doc)

    def test_remove_document(self, doc_repo, space_repo):
        """Test removing a document."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc = SpaceDocument(space_id=space.id, space_path="/to-remove.md")
        doc_repo.add_document(doc)

        result = doc_repo.remove_document(doc.id)
        assert result is True

        # Verify removed
        retrieved = doc_repo.get_document(doc.id)
        assert retrieved is None

    def test_remove_nonexistent(self, doc_repo):
        """Test that removing a non-existent document returns False."""
        result = doc_repo.remove_document("non-existent")
        assert result is False

    def test_move_document(self, doc_repo, space_repo):
        """Test moving a document to a new path."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc = SpaceDocument(space_id=space.id, space_path="/old-path.md")
        doc_repo.add_document(doc)

        moved = doc_repo.move_document(doc.id, "/new-path.md")
        assert moved.space_path == "/new-path.md"

        # Verify old path no longer works
        old_result = doc_repo.get_by_space_path(space.id, "/old-path.md")
        assert old_result is None

        # Verify new path works
        new_result = doc_repo.get_by_space_path(space.id, "/new-path.md")
        assert new_result is not None

    def test_move_document_to_existing_path_raises(self, doc_repo, space_repo):
        """Test that moving to an existing path raises ValueError."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc1 = SpaceDocument(space_id=space.id, space_path="/first.md")
        doc2 = SpaceDocument(space_id=space.id, space_path="/second.md")
        doc_repo.add_document(doc1)
        doc_repo.add_document(doc2)

        with pytest.raises(ValueError, match="already exists"):
            doc_repo.move_document(doc1.id, "/second.md")

    def test_move_nonexistent_raises(self, doc_repo):
        """Test that moving a non-existent document raises ValueError."""
        with pytest.raises(ValueError, match="does not exist"):
            doc_repo.move_document("non-existent", "/new-path.md")

    def test_reorder_documents(self, doc_repo, space_repo):
        """Test reordering documents within a space."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc1 = SpaceDocument(space_id=space.id, space_path="/a.md", order_index=0)
        doc2 = SpaceDocument(space_id=space.id, space_path="/b.md", order_index=1)
        doc3 = SpaceDocument(space_id=space.id, space_path="/c.md", order_index=2)
        doc_repo.add_document(doc1)
        doc_repo.add_document(doc2)
        doc_repo.add_document(doc3)

        # Reorder: c, a, b
        doc_repo.reorder_documents(space.id, [doc3.id, doc1.id, doc2.id])

        docs = doc_repo.list_by_space(space.id)
        assert docs[0].id == doc3.id
        assert docs[1].id == doc1.id
        assert docs[2].id == doc2.id

    def test_update_content_hash(self, doc_repo, space_repo):
        """Test updating content hash."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc = SpaceDocument(space_id=space.id, space_path="/test.md")
        doc_repo.add_document(doc)

        doc_repo.update_content_hash(doc.id, "newhash456")

        retrieved = doc_repo.get_document(doc.id)
        assert retrieved.content_hash == "newhash456"

    def test_document_with_metadata(self, doc_repo, space_repo):
        """Test document with custom metadata."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        doc = SpaceDocument(
            space_id=space.id,
            space_path="/with-meta.md",
            metadata={"title": "Test Document", "version": "1.0"},
        )
        doc_repo.add_document(doc)

        retrieved = doc_repo.get_document(doc.id)
        assert retrieved.metadata["title"] == "Test Document"
        assert retrieved.metadata["version"] == "1.0"


class TestSqliteVariableRepository:
    """Tests for SqliteVariableRepository."""

    def test_set_variable(self, var_repo, space_repo):
        """Test setting a variable."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        var = SpaceVariable(
            space_id=space.id,
            name="version",
            value="1.0.0",
        )
        result = var_repo.set_variable(var)
        assert result.name == "version"
        assert result.value == "1.0.0"

    def test_set_variable_overwrites(self, var_repo, space_repo):
        """Test that setting a variable with same name overwrites."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        var1 = SpaceVariable(space_id=space.id, name="config", value="old")
        var_repo.set_variable(var1)

        var2 = SpaceVariable(space_id=space.id, name="config", value="new")
        var_repo.set_variable(var2)

        retrieved = var_repo.get_variable(space.id, "config")
        assert retrieved.value == "new"

    def test_get_variable(self, var_repo, space_repo):
        """Test getting a variable."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        var = SpaceVariable(space_id=space.id, name="api_key", value="secret123")
        var_repo.set_variable(var)

        retrieved = var_repo.get_variable(space.id, "api_key")
        assert retrieved is not None
        assert retrieved.value == "secret123"

    def test_get_variable_not_found(self, var_repo, space_repo):
        """Test that get_variable returns None for non-existent variable."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        result = var_repo.get_variable(space.id, "non-existent")
        assert result is None

    def test_list_variables(self, var_repo, space_repo):
        """Test listing variables in a space."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        var1 = SpaceVariable(space_id=space.id, name="var1", value="a")
        var2 = SpaceVariable(space_id=space.id, name="var2", value="b")
        var_repo.set_variable(var1)
        var_repo.set_variable(var2)

        variables = var_repo.list_variables(space.id)
        assert len(variables) == 2
        names = [v.name for v in variables]
        assert "var1" in names
        assert "var2" in names

    def test_list_variables_empty(self, var_repo, space_repo):
        """Test listing variables when none exist."""
        space = InformationSpace(name="empty-space")
        space_repo.create(space)

        variables = var_repo.list_variables(space.id)
        assert variables == []

    def test_list_variables_with_scope_filter(self, var_repo, space_repo):
        """Test listing variables filtered by scope."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        var1 = SpaceVariable(space_id=space.id, name="global", value="x", scope="space")
        var2 = SpaceVariable(space_id=space.id, name="local", value="y", scope="document")
        var_repo.set_variable(var1)
        var_repo.set_variable(var2)

        space_vars = var_repo.list_variables(space.id, scope="space")
        assert len(space_vars) == 1
        assert space_vars[0].name == "global"

        doc_vars = var_repo.list_variables(space.id, scope="document")
        assert len(doc_vars) == 1
        assert doc_vars[0].name == "local"

    def test_delete_variable(self, var_repo, space_repo):
        """Test deleting a variable."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        var = SpaceVariable(space_id=space.id, name="to-delete", value="bye")
        var_repo.set_variable(var)

        result = var_repo.delete_variable(space.id, "to-delete")
        assert result is True

        # Verify deleted
        retrieved = var_repo.get_variable(space.id, "to-delete")
        assert retrieved is None

    def test_delete_nonexistent(self, var_repo, space_repo):
        """Test that deleting a non-existent variable returns False."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        result = var_repo.delete_variable(space.id, "non-existent")
        assert result is False

    def test_variable_with_complex_value(self, var_repo, space_repo):
        """Test variable with complex JSON value."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        complex_value = {
            "endpoints": [
                {"url": "/api/v1", "methods": ["GET", "POST"]},
                {"url": "/api/v2", "methods": ["GET"]},
            ],
            "config": {"timeout": 30, "retries": 3},
        }
        var = SpaceVariable(space_id=space.id, name="api_config", value=complex_value)
        var_repo.set_variable(var)

        retrieved = var_repo.get_variable(space.id, "api_config")
        assert retrieved.value == complex_value
        assert retrieved.value["endpoints"][0]["url"] == "/api/v1"


class TestSqliteReferenceRepository:
    """Tests for SqliteReferenceRepository."""

    def test_add_reference(self, ref_repo, space_repo):
        """Test adding a transclusion reference."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        ref = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-2",
            space_id=space.id,
        )
        result = ref_repo.add_reference(ref)
        assert result.source_doc_id == "doc-1"
        assert result.target_doc_id == "doc-2"

    def test_add_reference_overwrites(self, ref_repo, space_repo):
        """Test that adding same reference overwrites (no duplicates)."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        ref1 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-2",
            space_id=space.id,
        )
        ref_repo.add_reference(ref1)

        # Add same reference again (should not raise)
        ref2 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-2",
            space_id=space.id,
        )
        ref_repo.add_reference(ref2)

        # Should still only have one reference
        refs = ref_repo.get_references_from("doc-1", space.id)
        assert len(refs) == 1

    def test_get_references_from(self, ref_repo, space_repo):
        """Test getting references from a source document."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        ref1 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-2",
            space_id=space.id,
        )
        ref2 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-3",
            space_id=space.id,
        )
        ref_repo.add_reference(ref1)
        ref_repo.add_reference(ref2)

        refs = ref_repo.get_references_from("doc-1", space.id)
        assert len(refs) == 2
        targets = [r.target_doc_id for r in refs]
        assert "doc-2" in targets
        assert "doc-3" in targets

    def test_get_references_from_empty(self, ref_repo, space_repo):
        """Test getting references when none exist."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        refs = ref_repo.get_references_from("non-existent", space.id)
        assert refs == []

    def test_get_references_to(self, ref_repo, space_repo):
        """Test getting references to a target document."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        ref1 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="shared-doc",
            space_id=space.id,
        )
        ref2 = TransclusionReference(
            source_doc_id="doc-2",
            target_doc_id="shared-doc",
            space_id=space.id,
        )
        ref_repo.add_reference(ref1)
        ref_repo.add_reference(ref2)

        refs = ref_repo.get_references_to("shared-doc", space.id)
        assert len(refs) == 2
        sources = [r.source_doc_id for r in refs]
        assert "doc-1" in sources
        assert "doc-2" in sources

    def test_clear_references_from(self, ref_repo, space_repo):
        """Test clearing references from a source document."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        ref1 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-2",
            space_id=space.id,
        )
        ref2 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-3",
            space_id=space.id,
        )
        ref_repo.add_reference(ref1)
        ref_repo.add_reference(ref2)

        count = ref_repo.clear_references_from("doc-1", space.id)
        assert count == 2

        # Verify cleared
        refs = ref_repo.get_references_from("doc-1", space.id)
        assert refs == []

    def test_clear_references_from_empty(self, ref_repo, space_repo):
        """Test clearing references when none exist."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        count = ref_repo.clear_references_from("non-existent", space.id)
        assert count == 0

    def test_get_dependents(self, ref_repo, space_repo):
        """Test getting dependent documents."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        # doc-1 and doc-2 both reference shared-component
        ref1 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="shared-component",
            space_id=space.id,
        )
        ref2 = TransclusionReference(
            source_doc_id="doc-2",
            target_doc_id="shared-component",
            space_id=space.id,
        )
        ref_repo.add_reference(ref1)
        ref_repo.add_reference(ref2)

        dependents = ref_repo.get_dependents("shared-component", space.id)
        assert len(dependents) == 2
        assert "doc-1" in dependents
        assert "doc-2" in dependents

    def test_get_dependents_empty(self, ref_repo, space_repo):
        """Test getting dependents when none exist."""
        space = InformationSpace(name="test-space")
        space_repo.create(space)

        dependents = ref_repo.get_dependents("orphan-doc", space.id)
        assert dependents == []

    def test_references_isolated_by_space(self, ref_repo, space_repo):
        """Test that references are isolated by space."""
        space1 = InformationSpace(name="space-1")
        space2 = InformationSpace(name="space-2")
        space_repo.create(space1)
        space_repo.create(space2)

        # Same source/target IDs in different spaces
        ref1 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-2",
            space_id=space1.id,
        )
        ref2 = TransclusionReference(
            source_doc_id="doc-1",
            target_doc_id="doc-2",
            space_id=space2.id,
        )
        ref_repo.add_reference(ref1)
        ref_repo.add_reference(ref2)

        # Each space should have its own reference
        refs1 = ref_repo.get_references_from("doc-1", space1.id)
        refs2 = ref_repo.get_references_from("doc-1", space2.id)
|
||||
|
||||
assert len(refs1) == 1
|
||||
assert len(refs2) == 1
|
||||
assert refs1[0].space_id == space1.id
|
||||
assert refs2[0].space_id == space2.id
|
||||
|
||||
|
||||
class TestCascadeDelete:
|
||||
"""Test cascade delete behavior."""
|
||||
|
||||
def test_deleting_space_cascades_to_documents(self, temp_db):
|
||||
"""Test that deleting a space also deletes its documents."""
|
||||
space_repo = SqliteSpaceRepository(temp_db)
|
||||
doc_repo = SqliteDocumentRepository(temp_db)
|
||||
|
||||
space = InformationSpace(name="test-space")
|
||||
space_repo.create(space)
|
||||
|
||||
doc = SpaceDocument(space_id=space.id, space_path="/test.md")
|
||||
doc_repo.add_document(doc)
|
||||
|
||||
# Delete space
|
||||
space_repo.delete(space.id)
|
||||
|
||||
# Document should also be gone
|
||||
retrieved = doc_repo.get_document(doc.id)
|
||||
assert retrieved is None
|
||||
|
||||
def test_deleting_space_cascades_to_variables(self, temp_db):
|
||||
"""Test that deleting a space also deletes its variables."""
|
||||
space_repo = SqliteSpaceRepository(temp_db)
|
||||
var_repo = SqliteVariableRepository(temp_db)
|
||||
|
||||
space = InformationSpace(name="test-space")
|
||||
space_repo.create(space)
|
||||
|
||||
var = SpaceVariable(space_id=space.id, name="var", value="val")
|
||||
var_repo.set_variable(var)
|
||||
|
||||
# Delete space
|
||||
space_repo.delete(space.id)
|
||||
|
||||
# Variable should also be gone
|
||||
retrieved = var_repo.get_variable(space.id, "var")
|
||||
assert retrieved is None
|
||||
|
||||
def test_deleting_space_cascades_to_references(self, temp_db):
|
||||
"""Test that deleting a space also deletes its references."""
|
||||
space_repo = SqliteSpaceRepository(temp_db)
|
||||
ref_repo = SqliteReferenceRepository(temp_db)
|
||||
|
||||
space = InformationSpace(name="test-space")
|
||||
space_repo.create(space)
|
||||
|
||||
ref = TransclusionReference(
|
||||
source_doc_id="doc-1",
|
||||
target_doc_id="doc-2",
|
||||
space_id=space.id,
|
||||
)
|
||||
ref_repo.add_reference(ref)
|
||||
|
||||
# Delete space
|
||||
space_repo.delete(space.id)
|
||||
|
||||
# Reference should also be gone
|
||||
refs = ref_repo.get_references_from("doc-1", space.id)
|
||||
assert refs == []
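
The cascade tests above depend on SQLite removing child rows when the parent space row is deleted. The following is a minimal sketch of that mechanism using only the standard `sqlite3` module. The `spaces` and `space_documents` tables and their columns are hypothetical, chosen for illustration, and are not taken from the repository implementations in this commit. The point it shows is that, in default builds, SQLite enforces `ON DELETE CASCADE` only on connections where `PRAGMA foreign_keys = ON` has been set.

```python
import sqlite3


def open_connection(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with foreign key enforcement switched on."""
    conn = sqlite3.connect(path)
    # Foreign key enforcement is off by default in standard SQLite builds
    # and has to be enabled separately on every new connection.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical parent/child schema (illustrative names only)."""
    conn.executescript(
        """
        CREATE TABLE spaces (
            id   TEXT PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE space_documents (
            id         TEXT PRIMARY KEY,
            space_id   TEXT NOT NULL
                       REFERENCES spaces(id) ON DELETE CASCADE,
            space_path TEXT NOT NULL
        );
        """
    )


def count_documents(conn: sqlite3.Connection, space_id: str) -> int:
    """Return how many document rows belong to the given space."""
    row = conn.execute(
        "SELECT COUNT(*) FROM space_documents WHERE space_id = ?",
        (space_id,),
    ).fetchone()
    return row[0]


conn = open_connection()
create_schema(conn)
conn.execute("INSERT INTO spaces VALUES ('s1', 'test-space')")
conn.execute("INSERT INTO space_documents VALUES ('d1', 's1', '/test.md')")

before = count_documents(conn, "s1")
# Deleting the parent row removes the dependent document row as well.
conn.execute("DELETE FROM spaces WHERE id = 's1'")
after = count_documents(conn, "s1")
```

If the pragma were omitted, the delete would still succeed but the document row would be left behind, which is presumably the kind of regression `TestCascadeDelete` is meant to catch.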