feat: Complete Issue #52 - Capture actual heading text in schemas

Implement heading text capture so that generated schemas can enforce specific
heading text through enum constraints:

• New CLI option: --capture-heading-text flag for exact text constraints
• Schema generation with heading text as enum constraints (not just structure)
• Advanced validation engine that enforces heading text requirements
• Metaschema extension: x-markitect-heading-text-capture marker
• Full integration with Issue #51 outline mode capabilities
• Comprehensive error reporting for heading text mismatches
• Complete backward compatibility with existing schema generation

Technical implementation:
- Extended SchemaGenerator with capture_heading_text parameter
- Enhanced validation system to check enum constraints on heading content
- Added _validate_heading_text_constraints_with_errors for detailed reporting
- Integrated with existing metaschema validation from Issue #50
- Preserved document order of headings in enum constraints
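
A minimal sketch of the enum construction described above, assuming each heading is a dict with a `content` key as in the diff below. The helper name `build_content_property` is hypothetical and used only for illustration; in the commit the logic sits inline in `_create_json_schema`.

```python
from typing import Any, Dict, List


def build_content_property(
    headings: List[Dict[str, Any]],
    capture_heading_text: bool = False,
) -> Dict[str, Any]:
    """Return the JSON Schema fragment for a heading's `content` field.

    Hypothetical helper: it mirrors the branch added in this commit.
    """
    if capture_heading_text:
        # Keep the heading texts in the order they appear in the document.
        return {"enum": [heading["content"] for heading in headings]}
    # Default behaviour: any string is accepted as heading text.
    return {"type": "string"}


headings = [{"content": "Introduction"}, {"content": "Usage"}]
print(build_content_property(headings, capture_heading_text=True))
print(build_content_property(headings))
```

Note that two headings with identical text at the same level would produce duplicate enum entries; whether that matters depends on how strictly the consuming validator treats `enum`.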

Key features:
- Schemas can now specify required heading text via enum constraints
- Validation rejects documents with incorrect heading text
- Detailed error messages show expected vs actual heading text
- Works seamlessly with outline mode depth controls
- Remains compatible with all 513 existing tests
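
A rough sketch of how expected vs actual heading text could be reported, assuming the per-level schema shape shown in the diff below (`items.properties.content.enum`). The function `find_heading_text_errors` and its message format are assumptions for illustration, not the actual `_validate_heading_text_constraints_with_errors` implementation.

```python
from typing import Any, Dict, List


def find_heading_text_errors(
    level_schema: Dict[str, Any],
    headings: List[Dict[str, Any]],
) -> List[str]:
    """Collect one message per heading whose text is not in the enum.

    Hypothetical helper; the message wording is an assumption.
    """
    content_schema = (
        level_schema.get("items", {}).get("properties", {}).get("content", {})
    )
    allowed = content_schema.get("enum")
    if allowed is None:
        # No enum constraint: heading text was not captured for this level.
        return []

    errors = []
    for heading in headings:
        actual = heading["content"]
        if actual not in allowed:
            errors.append(
                f"Heading text mismatch: expected one of {allowed!r}, got {actual!r}"
            )
    return errors


level_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"content": {"enum": ["Introduction", "Usage"]}},
    },
}
document_headings = [{"content": "Introduction"}, {"content": "Usages"}]
for message in find_heading_text_errors(level_schema, document_headings):
    print(message)
```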

Usage examples:
  markitect schema-generate --capture-heading-text document.md
  markitect schema-generate --mode outline --capture-heading-text --depth 2 document.md

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 08:03:11 +02:00
parent b5f510f9c7
commit 0f37900222
4 changed files with 534 additions and 10 deletions


@@ -33,7 +33,8 @@ class SchemaGenerator:
file_path: Path,
max_depth: Optional[int] = None,
mode: Optional[str] = None,
-outline_depth: Optional[int] = None
+outline_depth: Optional[int] = None,
+capture_heading_text: bool = False
) -> Dict[str, Any]:
"""
Generate a JSON schema from a markdown file's AST structure.
@@ -43,6 +44,7 @@ class SchemaGenerator:
max_depth: Maximum heading depth to include (None = unlimited)
mode: Generation mode ('outline' for structure-focused schemas)
outline_depth: Depth limit for outline mode
+capture_heading_text: Whether to capture exact heading text as constraints
Returns:
JSON schema as a dictionary
@@ -66,7 +68,7 @@ class SchemaGenerator:
structure_analysis = self._analyze_ast_structure(ast_tokens, max_depth)
# Generate the JSON schema
-schema = self._create_json_schema(structure_analysis, file_path.name, mode=mode, outline_depth=outline_depth)
+schema = self._create_json_schema(structure_analysis, file_path.name, mode=mode, outline_depth=outline_depth, capture_heading_text=capture_heading_text)
return schema
@@ -183,7 +185,8 @@ class SchemaGenerator:
analysis: Dict[str, Any],
filename: str,
mode: Optional[str] = None,
-outline_depth: Optional[int] = None
+outline_depth: Optional[int] = None,
+capture_heading_text: bool = False
) -> Dict[str, Any]:
"""
Create a JSON schema from structural analysis.
@@ -193,6 +196,7 @@ class SchemaGenerator:
filename: Name of the source file
mode: Generation mode ('outline' for structure-focused schemas)
outline_depth: Depth limit for outline mode
+capture_heading_text: Whether to capture exact heading text as constraints
Returns:
JSON schema dictionary
@@ -214,18 +218,30 @@ class SchemaGenerator:
if outline_depth is not None:
schema["x-markitect-outline-depth"] = outline_depth
+# Add metaschema extension for heading text capture
+if capture_heading_text:
+schema["x-markitect-heading-text-capture"] = True
# Add heading structure
if analysis['headings']:
heading_properties = {}
for level_key, headings in analysis['headings'].items():
if headings: # Only include levels that have content
+# Configure content property based on heading text capture
+if capture_heading_text:
+# Extract actual heading texts in document order
+heading_texts = [heading['content'] for heading in headings]
+content_property = {"enum": heading_texts}
+else:
+content_property = {"type": "string"}
heading_properties[level_key] = {
"type": "array",
"description": f"Headings at {level_key.replace('_', ' ')}",
"items": {
"type": "object",
"properties": {
-"content": {"type": "string"},
+"content": content_property,
"level": {"type": "integer"},
"position": {"type": "integer"}
},