Files
markitect-main/tests/test_issue_5_schema_generation.py
tegwick 0acde1e840 feat: Complete Issue #5 - Schema Generation Foundation for arc42 Architecture Documentation
CRITICAL MILESTONE: Establish schema-driven architecture foundation that unlocks the entire
pathway to HolyGrailRequirement - intelligent arc42 architecture documentation with AI-supported
plan-actual comparison capabilities.

Major Components Implemented:

🎯 SCHEMA GENERATION SERVICE:
• SchemaGenerator class with sophisticated AST analysis capabilities
• Depth-limited heading extraction for arc42 section-specific schemas
• Comprehensive structural element detection (headings, paragraphs, lists, code blocks, etc.)
• JSON Schema Draft 7 compliant output with proper validation metadata
• Robust error handling with domain-specific exceptions (FileNotFoundError, InvalidDepthError)

🖥️ CLI INTEGRATION:
• generate-schema command with full argument and option support
• Multiple output formats (JSON, YAML) with stdout or file output
• Configurable depth limiting for architectural document analysis
• User-friendly summaries and progress feedback
• Integration with existing CLI framework and error handling patterns

📊 COMPREHENSIVE TESTING:
• 6 comprehensive test scenarios covering core functionality and edge cases
• Perfect integration with architectural test system (71 service layer tests passing)
• Test coverage for schema generation, depth limiting, error handling, and JSON compliance
• Architectural layer L4 (Service) test placement following reverse dependency principles

🏗️ STRATEGIC ARCHITECTURE:
• Leverages existing AST processing infrastructure for maximum efficiency
• Builds on proven markdown-it parsing with intelligent caching
• Seamless integration with existing CLI framework and configuration system
• Foundation for Issues #7 (Schema Validation) and #8 (Validation Errors)

Technical Excellence:
- Full JSON Schema Draft 7 specification compliance for validator compatibility
- Sophisticated AST token analysis with structural pattern recognition
- Configurable depth filtering essential for arc42 template compliance
- Comprehensive metadata extraction for architectural analysis
- Robust exception handling with actionable error messages

Strategic Value:
- 🎯 33% completion of critical path Phase 1 (Schema Foundation)
- 🔑 Unlocks schema validation and error reporting capabilities
- 🏛️ Essential building block for arc42 architectural documentation intelligence
- 🚀 Direct pathway to AI-supported plan-actual comparison capabilities

This implementation transforms MarkiTect from advanced markdown processor toward intelligent
architecture documentation platform, establishing the schema-driven foundation critical for
achieving the HolyGrailRequirement of arc42 compliance with AI intelligence.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-29 14:53:05 +02:00

306 lines
9.2 KiB
Python

"""
Test for Issue #5: Generate a Schema from a Markdown File.
Tests the ability to create JSON schemas from markdown file AST structures
with configurable depth limitations for structural analysis.
"""
import json
import pytest
from pathlib import Path
from tempfile import NamedTemporaryFile
from markitect.schema_generator import SchemaGenerator
from markitect.exceptions import FileNotFoundError, InvalidDepthError
class TestIssue5SchemaGeneration:
"""Test suite for schema generation from markdown files."""
def setup_method(self):
"""Set up test environment."""
self.schema_generator = SchemaGenerator()
def teardown_method(self):
"""Clean up after tests."""
pass
def test_generate_schema_from_simple_markdown(self):
"""
ISSUE #5: Test basic schema generation from simple markdown structure.
Verifies that a simple markdown file generates a valid JSON schema
that captures heading structure and basic elements.
"""
# Arrange - Simple markdown with clear structure
markdown_content = """# Main Heading
This is a paragraph.
## Sub Heading
- List item 1
- List item 2
Some text here.
"""
with NamedTemporaryFile(mode='w', suffix='.md', delete=False) as f:
f.write(markdown_content)
temp_file = Path(f.name)
try:
# Act - Generate schema with unlimited depth
result = self.schema_generator.generate_schema_from_file(temp_file)
# Assert - Schema should be valid JSON and contain expected structure
assert isinstance(result, dict)
assert "$schema" in result
assert "type" in result
assert result["type"] == "object"
# Should capture heading structure
properties = result.get("properties", {})
assert "headings" in properties
# Should define heading levels found in the document
heading_properties = properties["headings"]["properties"]
assert "level_1" in heading_properties # # Main Heading
assert "level_2" in heading_properties # ## Sub Heading
# Should capture other structural elements
assert "paragraphs" in properties
assert "lists" in properties
finally:
temp_file.unlink()
def test_generate_schema_with_depth_limitation(self):
"""
ISSUE #5: Test schema generation with depth limitation.
Verifies that depth parameter correctly limits which heading levels
are included in the generated schema.
"""
# Arrange - Markdown with multiple heading levels
markdown_content = """# Level 1
Content here.
## Level 2
More content.
### Level 3
Deep content.
#### Level 4
Very deep content.
"""
with NamedTemporaryFile(mode='w', suffix='.md', delete=False) as f:
f.write(markdown_content)
temp_file = Path(f.name)
try:
# Act - Generate schema with depth limit of 2
result = self.schema_generator.generate_schema_from_file(temp_file, max_depth=2)
# Assert - Only levels 1 and 2 should be included
properties = result.get("properties", {})
heading_properties = properties["headings"]["properties"]
assert "level_1" in heading_properties
assert "level_2" in heading_properties
assert "level_3" not in heading_properties # Should be excluded
assert "level_4" not in heading_properties # Should be excluded
finally:
temp_file.unlink()
def test_generate_schema_from_complex_document(self):
"""
ISSUE #5: Test schema generation from complex markdown document.
Verifies handling of complex markdown structures including
code blocks, blockquotes, links, and nested lists.
"""
# Arrange - Complex markdown with various elements
markdown_content = """# Documentation
## Overview
This is an **important** document with *emphasis*.
### Features
- Feature 1 with [link](https://example.com)
- Feature 2
- Nested item A
- Nested item B
### Code Examples
```python
def hello():
print("Hello, World!")
```
> This is a blockquote with important information.
## API Reference
| Method | Description |
|--------|-------------|
| GET | Retrieve data |
| POST | Create data |
### Error Handling
1. Check input parameters
2. Validate data types
3. Handle exceptions
#### Implementation Details
Some implementation notes here.
"""
with NamedTemporaryFile(mode='w', suffix='.md', delete=False) as f:
f.write(markdown_content)
temp_file = Path(f.name)
try:
# Act - Generate schema
result = self.schema_generator.generate_schema_from_file(temp_file)
# Assert - Schema should capture complex structures
properties = result.get("properties", {})
# Should have all major structural elements
expected_elements = ["headings", "paragraphs", "lists", "code_blocks", "blockquotes", "tables"]
for element in expected_elements:
assert element in properties, f"Missing {element} in schema"
# Should capture heading hierarchy
heading_properties = properties["headings"]["properties"]
assert "level_1" in heading_properties
assert "level_2" in heading_properties
assert "level_3" in heading_properties
assert "level_4" in heading_properties
finally:
temp_file.unlink()
def test_generate_schema_file_not_found(self):
"""
ISSUE #5: Test error handling when markdown file doesn't exist.
"""
# Arrange - Non-existent file path
non_existent_file = Path("/tmp/non_existent_file.md")
# Act & Assert - Should raise appropriate exception
with pytest.raises(FileNotFoundError):
self.schema_generator.generate_schema_from_file(non_existent_file)
def test_generate_schema_invalid_depth(self):
"""
ISSUE #5: Test error handling for invalid depth parameters.
"""
# Arrange - Simple markdown file
markdown_content = "# Test\n\nContent here."
with NamedTemporaryFile(mode='w', suffix='.md', delete=False) as f:
f.write(markdown_content)
temp_file = Path(f.name)
try:
# Act & Assert - Invalid depth values should raise exceptions
with pytest.raises(InvalidDepthError):
self.schema_generator.generate_schema_from_file(temp_file, max_depth=0)
with pytest.raises(InvalidDepthError):
self.schema_generator.generate_schema_from_file(temp_file, max_depth=-1)
finally:
temp_file.unlink()
def test_generate_schema_empty_file(self):
"""
ISSUE #5: Test schema generation from empty markdown file.
"""
# Arrange - Empty markdown file
with NamedTemporaryFile(mode='w', suffix='.md', delete=False) as f:
f.write("")
temp_file = Path(f.name)
try:
# Act - Generate schema from empty file
result = self.schema_generator.generate_schema_from_file(temp_file)
# Assert - Should generate valid but minimal schema
assert isinstance(result, dict)
assert "$schema" in result
assert "type" in result
# Should have empty or minimal structure
properties = result.get("properties", {})
if "headings" in properties:
heading_properties = properties["headings"].get("properties", {})
assert len(heading_properties) == 0 # No headings in empty file
finally:
temp_file.unlink()
def test_schema_format_compliance(self):
"""
ISSUE #5: Test that generated schema follows JSON Schema specification.
Verifies the output is a valid JSON Schema that could be used
for validation by standard JSON Schema validators.
"""
# Arrange - Standard markdown structure
markdown_content = """# Title
## Section
Content with **formatting**.
- List item
### Subsection
More content.
"""
with NamedTemporaryFile(mode='w', suffix='.md', delete=False) as f:
f.write(markdown_content)
temp_file = Path(f.name)
try:
# Act - Generate schema
result = self.schema_generator.generate_schema_from_file(temp_file)
# Assert - Should be valid JSON Schema format
assert result.get("$schema") == "http://json-schema.org/draft-07/schema#"
assert result.get("type") == "object"
assert "properties" in result
assert "title" in result
assert "description" in result
# Should be serializable as JSON
json_string = json.dumps(result, indent=2)
assert len(json_string) > 0
# Should be deserializable back to same structure
deserialized = json.loads(json_string)
assert deserialized == result
finally:
temp_file.unlink()
if __name__ == '__main__':
pytest.main([__file__, '-v'])