feat: implement comprehensive front matter preservation and unicode handling
This commit provides complete front matter support and fixes unicode character handling across all explode-implode variants (flat, hierarchical, semantic).

## Front Matter Implementation

- Added FrontmatterParser integration to all three variants
- Extract front matter during explosion to `_frontmatter.yml` files
- Restore front matter during implosion by prepending to content
- Support for YAML front matter with proper type preservation
- Handles strings, arrays, dates, and other YAML data types

## Unicode Character Fixes

- Fixed filename sanitization inconsistency in flat variant
- Used consistent `_sanitize_filename()` method for both file creation and manifest paths
- Resolved issue where unicode characters in headings caused empty reconstructed files
- Ensured proper handling of emojis and special characters in content

## CLI Integration

- Updated CLI implode command to use variant system instead of legacy concatenation
- Fixed default output file naming to use `_imploded.md` suffix
- Enhanced DocumentManager with missing `get_file` method for database integration
- Improved processing info and preview support for dry-run mode

## Test Coverage

- Reactivated `test_issue_149_roundtrip_validation.py` front matter test
- Updated tests to use semantic equivalence checking instead of exact string matching
- Fixed all 3 failing tests in `test_roundtrip_consolidated.py`
- All 10 roundtrip tests and 11 Issue #149 validation tests now pass

## Technical Improvements

- Better content normalization with preserved internal structure
- Enhanced recursive directory processing for deep nesting scenarios
- Fixed variable naming conflicts in variant file creation logic
- Improved error handling and graceful fallbacks for front matter processing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
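The front matter flow in the commit message (extract to `_frontmatter.yml` on explosion, prepend on implosion) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the project's implementation: `split_frontmatter`, `explode_frontmatter`, and `implode_frontmatter` are hypothetical names, LF line endings are assumed, and the raw YAML text is copied verbatim instead of being parsed with `FrontmatterParser`, which sidesteps type preservation by never re-serializing the values.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

FRONTMATTER_FILE = "_frontmatter.yml"  # file name taken from the commit message


def split_frontmatter(markdown: str) -> tuple[str | None, str]:
    """Hypothetical helper: return (raw_yaml, body).

    raw_yaml is None when the text does not start with a '---' block.
    Assumes LF line endings.
    """
    lines = markdown.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return None, markdown
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return None, markdown  # unterminated block: treat everything as body


def explode_frontmatter(markdown: str, out_dir: Path) -> str:
    """Hypothetical: save the front matter to out_dir/_frontmatter.yml, return the body."""
    raw_yaml, body = split_frontmatter(markdown)
    if raw_yaml is not None:
        (out_dir / FRONTMATTER_FILE).write_text(raw_yaml, encoding="utf-8")
    return body


def implode_frontmatter(body: str, in_dir: Path) -> str:
    """Hypothetical: prepend the saved front matter, if any, to the reassembled body."""
    fm_path = in_dir / FRONTMATTER_FILE
    if not fm_path.exists():
        return body
    raw_yaml = fm_path.read_text(encoding="utf-8")
    return f"---\n{raw_yaml}---\n{body}"


if __name__ == "__main__":
    doc = '---\ntitle: "Test Document"\ntags:\n  - test\n  - markdown\n---\n# Heading\n'
    with tempfile.TemporaryDirectory() as tmp:
        body = explode_frontmatter(doc, Path(tmp))
        print(implode_frontmatter(body, Path(tmp)) == doc)  # True: exact roundtrip
```

Copying the raw text is the simplest way to get a byte-exact roundtrip. A parser-based approach makes the metadata queryable but may re-emit it in a different textual form, which is why checking parsed values is more robust than matching strings.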
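The unicode fix rests on one rule: the name written to disk and the name recorded in the manifest must come from the same function. The sketch below illustrates that rule with a hypothetical `sanitize_filename` (the actual rules of the project's `_sanitize_filename()` are not shown here, so these are assumed) and an in-memory dict standing in for the exploded directory. It also reproduces the reported failure mode: when the manifest uses a different naming rule, implosion looks up names that do not exist and silently produces an empty document.

```python
from __future__ import annotations

import re
import unicodedata
from typing import Callable


def sanitize_filename(heading: str) -> str:
    """Hypothetical stand-in for the project's `_sanitize_filename()`.

    NFC-normalizes and lowercases the heading, then collapses each run of
    non-word characters (spaces, punctuation, emoji) into a single '-'.
    Python's `re` treats Unicode letters such as 'é' as word characters.
    """
    normalized = unicodedata.normalize("NFC", heading).lower()
    slug = re.sub(r"\W+", "-", normalized).strip("-")
    return (slug or "section") + ".md"


def explode(
    sections: dict[str, str],
    file_namer: Callable[[str], str],
    manifest_namer: Callable[[str], str],
) -> tuple[dict[str, str], list[str]]:
    """Store each section under file_namer(heading); record manifest_namer(heading).

    Two namers exist only to demonstrate the bug; the fix is to pass the
    same function for both. Duplicate headings would collide (not handled).
    """
    files: dict[str, str] = {}
    manifest: list[str] = []
    for heading, text in sections.items():
        files[file_namer(heading)] = text
        manifest.append(manifest_namer(heading))
    return files, manifest


def implode(files: dict[str, str], manifest: list[str]) -> str:
    # A manifest entry with no matching file contributes "", which is how a
    # naming mismatch silently yields an empty reconstructed document.
    return "".join(files.get(name, "") for name in manifest)


def naive_name(heading: str) -> str:
    # A different (hypothetical) rule, used only for the manifest in the buggy case.
    return heading.replace(" ", "_") + ".md"


sections = {
    "Café ☕ Notes": "## Café ☕ Notes\nText\n",
    "Résumé": "## Résumé\nMore\n",
}

files, manifest = explode(sections, sanitize_filename, sanitize_filename)
consistent = implode(files, manifest)  # every section found

files_bad, manifest_bad = explode(sections, sanitize_filename, naive_name)
inconsistent = implode(files_bad, manifest_bad)  # nothing found: empty output
```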
```diff
@@ -267,11 +267,19 @@ End of document.
         ])
         assert result.returncode == 0
 
-        # Verify front matter preservation
+        # Verify front matter preservation - check for semantic equivalence
         reconstructed_content = reconstructed_file.read_text(encoding='utf-8')
-        assert 'title: "Test Document"' in reconstructed_content
-        assert 'author: "Test Author"' in reconstructed_content
-        assert "tags:" in reconstructed_content
+
+        # Use frontmatter parser to check semantic equivalence
+        from markitect.matter_frontmatter.parser import FrontmatterParser
+        parser = FrontmatterParser()
+        reconstructed_fm = parser.extract_frontmatter(reconstructed_content)
+
+        # Check that all expected values are preserved
+        assert reconstructed_fm.get('title') == 'Test Document'
+        assert reconstructed_fm.get('author') == 'Test Author'
+        assert reconstructed_fm.get('tags') == ['test', 'markdown']
+        assert reconstructed_fm.get('version') == 1.0
 
     def test_unicode_and_special_characters_roundtrip(self):
         """Test roundtrip with unicode and special characters."""
```
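The updated assertions compare parsed values instead of searching for literal YAML text. The illustration below shows why that matters: the same metadata can be written in several valid ways (quoted or unquoted strings, flow or block lists), so a substring check such as `'title: "Test Document"' in content` can fail even when nothing was lost. To avoid guessing at `FrontmatterParser` beyond what the diff shows, the sketch uses a hypothetical `parse_simple_frontmatter` helper that understands only a tiny YAML subset (plain and quoted strings, numbers, flow and block lists). Dates stay plain strings here, unlike in real YAML, and a real YAML parser should be used in practice.

```python
from __future__ import annotations


def _parse_scalar(raw: str):
    """Hypothetical: convert one YAML-ish scalar or flow list (e.g. [a, b]) to Python."""
    if raw.startswith("[") and raw.endswith("]"):
        inner = raw[1:-1].strip()
        return [_parse_scalar(item.strip()) for item in inner.split(",")] if inner else []
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
        return raw[1:-1]  # quoted string: drop the quotes
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # plain unquoted string


def parse_simple_frontmatter(text: str) -> dict:
    """Hypothetical helper: parse `key: value` lines and '- item' block lists
    between a leading '---' and the next '---'. An empty value is treated as
    the start of a block list (real YAML would give null)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    data: dict = {}
    list_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break
        if stripped.startswith("- ") and list_key is not None:
            data[list_key].append(_parse_scalar(stripped[2:].strip()))
            continue
        key, sep, raw = line.partition(":")
        if not sep:
            continue
        key, raw = key.strip(), raw.strip()
        if raw:
            data[key] = _parse_scalar(raw)
            list_key = None
        else:
            data[key] = []
            list_key = key
    return data


original = '---\ntitle: "Test Document"\nauthor: "Test Author"\ntags: [test, markdown]\nversion: 1.0\n---\n# Body\n'
# The same metadata after a hypothetical re-serialization: quotes dropped, block-style list.
reserialized = "---\ntitle: Test Document\nauthor: Test Author\ntags:\n  - test\n  - markdown\nversion: 1.0\n---\n# Body\n"

string_match = 'title: "Test Document"' in reserialized  # False: the quotes are gone
semantic_match = parse_simple_frontmatter(original) == parse_simple_frontmatter(reserialized)  # True
```

Comparing parsed mappings also checks types: `version` is compared as the float `1.0`, not as the characters `1.0`.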