feat: implement Phase 1 - Enhanced Schema Format with Classifications

Complete Phase 1 of the Schema Evolution Workplan, implementing flexible
content control and a section classification system.

## New Features

### 1. x-markitect-sections Extension
- Five classification levels: required, recommended, optional, discouraged, improper
- Per-section content constraints (paragraphs, code blocks, lists)
- Position hints for section ordering
- Custom error/warning messages
- Alternative section names support
- Content instructions for authors
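
As a rough illustration of the extension described above, the following sketch builds an `x-markitect-sections` fragment as a Python dict. The field names come from the metaschema added in this commit; the section names, messages, and numeric values are hypothetical and only show the shape.

```python
import json

# Hypothetical manpage-style sections. Field names follow the metaschema
# in this commit; section names and values are illustrative assumptions.
SECTIONS = {
    "NAME": {
        "classification": "required",
        "heading_level": 2,
        "position": "after_title",
        "max_paragraphs": 1,
        "error_message": "Every manpage needs a NAME section",
    },
    "EXAMPLES": {
        "classification": "recommended",
        "min_code_blocks": 1,
        "alternatives": ["EXAMPLE", "USAGE EXAMPLES"],
        "warning_if_missing": "Consider adding at least one example",
        "content_instruction": "Show one common invocation per code block",
    },
    "HISTORY": {"classification": "optional"},
    "BUGS": {"classification": "discouraged"},
    "TODO": {"classification": "improper"},
}

schema_fragment = {"x-markitect-sections": SECTIONS}
print(json.dumps(schema_fragment, indent=2))
```

Only `classification` is mandatory per section; every other field is optional.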

### 2. x-markitect-content-control Extension
- Required/discouraged/forbidden pattern matching
- Content quality metrics (word count, readability target, sentence count)
- Content instruction arrays
- Link validation configuration
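
A minimal sketch of how the pattern rules might be applied to section text. The field names match the metaschema in this commit; the `synopsis` key, the patterns, and the `check_patterns` helper are hypothetical. Treating a missing required pattern as an error is an assumption, since the metaschema only says the pattern "must appear".

```python
import re
from typing import Dict, List, Tuple

# Hypothetical content-control rules for one section (lowercase key,
# matching the metaschema's key pattern for this extension).
CONTENT_CONTROL = {
    "synopsis": {
        "required_patterns": [r"\[options\]"],
        "discouraged_patterns": [r"\bTODO\b"],
        "forbidden_patterns": [r"(?i)lorem ipsum"],
        "content_quality": {
            "min_words": 3,
            "max_words": 120,
            "readability_target": "technical",
        },
        "content_instructions": ["Show the command line in a code block"],
        "link_validation": {
            "check_internal": True,
            "check_external": False,
            "allow_fragments": True,
        },
    }
}


def check_patterns(text: str, rules: Dict) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (errors, warnings) for the pattern rules."""
    errors: List[str] = []
    warnings: List[str] = []
    for pattern in rules.get("required_patterns", []):
        if not re.search(pattern, text):
            errors.append(f"required pattern not found: {pattern}")
    for pattern in rules.get("discouraged_patterns", []):
        if re.search(pattern, text):
            warnings.append(f"discouraged pattern found: {pattern}")
    for pattern in rules.get("forbidden_patterns", []):
        if re.search(pattern, text):
            errors.append(f"forbidden pattern found: {pattern}")
    return errors, warnings


errors, warnings = check_patterns(
    "mytool [options] FILE (TODO: list flags)", CONTENT_CONTROL["synopsis"]
)
print(errors, warnings)
```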

### 3. Metaschema Validation
- Updated markitect-metaschema.json with complete validation rules
- Enhanced metaschema.py with validation methods for both extensions
- Comprehensive validation of all extension properties
- Clear error messages for invalid schemas
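
The metaschema keys section definitions by the pattern `^[A-Z][A-Z0-9_ ]*$`. The sketch below checks candidate names against that pattern with Python's `re` module; the helper name is hypothetical. In JSON Schema, keys that match no `patternProperties` entry are left unvalidated rather than rejected unless `additionalProperties` restricts them, and the excerpt in this commit shows no such restriction, so a non-matching name would presumably go unchecked by the JSON metaschema rather than fail it.

```python
import re

# Pattern copied from the x-markitect-sections patternProperties key.
SECTION_NAME_PATTERN = re.compile(r"^[A-Z][A-Z0-9_ ]*$")


def is_conventional_section_name(name: str) -> bool:
    """Hypothetical helper: True if name matches the metaschema key pattern."""
    return SECTION_NAME_PATTERN.search(name) is not None


for candidate in ["SYNOPSIS", "SEE ALSO", "EXIT_STATUS", "Synopsis", "2FA"]:
    print(f"{candidate!r}: {is_conventional_section_name(candidate)}")
```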

### 4. Documentation & Examples
- Complete specification in docs/specifications/schema-extensions-spec.md
- Enhanced manpage schema demonstrating all 5 classification levels
- API documentation schema showing alternative patterns
- Detailed usage examples and validation behavior

## Implementation Details

**Files Modified:**
- markitect/schemas/markitect-metaschema.json: Added extension definitions
- markitect/metaschema.py: Added _validate_sections() and _validate_content_control()

**Files Created:**
- docs/specifications/schema-extensions-spec.md: Complete specification (v1.0)
- examples/manpages/enhanced-manpage-schema.json: Demonstrates all classifications
- examples/manpages/api-documentation-schema.json: Shows API doc patterns

## Validation Behavior

**Classification Levels:**
- required: Missing = ERROR (validation fails)
- recommended: Missing = WARNING (validation succeeds with warnings)
- optional: No validation impact
- discouraged: Present = WARNING (validation succeeds with warnings)
- improper: Present = ERROR (validation fails)
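
The mapping above can be sketched as a small function. This is an illustration under stated assumptions, not the validation engine itself (that is listed as Phase 3): `classify_outcome` is a hypothetical name, and treating any entry in `alternatives` as equivalent to the section name follows the metaschema description of that field.

```python
from typing import Dict, Iterable, List, Tuple


def classify_outcome(
    sections: Dict[str, dict], present: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (errors, warnings) for the sections found."""
    present_set = set(present)
    errors: List[str] = []
    warnings: List[str] = []
    for name, definition in sections.items():
        # A section counts as present if its name or any alternative appears.
        accepted = {name, *definition.get("alternatives", [])}
        found = bool(accepted & present_set)
        level = definition["classification"]
        if level == "required" and not found:
            errors.append(definition.get("error_message", f"missing required section: {name}"))
        elif level == "recommended" and not found:
            warnings.append(definition.get("warning_if_missing", f"missing recommended section: {name}"))
        elif level == "discouraged" and found:
            warnings.append(f"discouraged section present: {name}")
        elif level == "improper" and found:
            errors.append(definition.get("error_message", f"improper section present: {name}"))
        # "optional" has no validation impact either way.
    return errors, warnings


example_sections = {
    "NAME": {"classification": "required"},
    "EXAMPLES": {"classification": "recommended", "alternatives": ["USAGE"]},
    "HISTORY": {"classification": "optional"},
    "BUGS": {"classification": "discouraged"},
    "TODO": {"classification": "improper"},
}

errors, warnings = classify_outcome(example_sections, ["NAME", "USAGE", "BUGS"])
print("valid:", not errors, "| warnings:", warnings)
```

Validation succeeds whenever the error list is empty, regardless of how many warnings were collected.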

## Next Steps

Phase 2: Schema Refinement Tools (schema-analyze, schema-refine, schema-compose)
Phase 3: Enhanced Validation Engine (classification-aware validation, quality metrics)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 21:02:51 +01:00
parent b51999582e
commit d68e762612
5 changed files with 1466 additions and 0 deletions


@@ -112,6 +112,8 @@ class MetaschemaValidator:
            "x-markitect-instruction-type": self._validate_instruction_type,
            "x-markitect-generation-mode": self._validate_generation_mode,
            "x-markitect-generated-from": self._validate_generated_from,
            "x-markitect-sections": self._validate_sections,
            "x-markitect-content-control": self._validate_content_control,
        }
        # Apply validation rules
@@ -193,4 +195,190 @@ class MetaschemaValidator:
                "x-markitect-generated-from must be a string",
                property_name
            )
        return None

    def _validate_sections(self, value: Any, property_name: str) -> Optional[ValidationError]:
        """Validate x-markitect-sections property."""
        if not isinstance(value, dict):
            return ValidationError(
                "x-markitect-sections must be an object",
                property_name
            )

        # Validate each section definition
        for section_name, section_def in value.items():
            # Section names are UPPERCASE by convention; only the type is checked here
            if not isinstance(section_name, str):
                return ValidationError(
                    f"Section name must be a string: {section_name}",
                    f"{property_name}.{section_name}"
                )
            if not isinstance(section_def, dict):
                return ValidationError(
                    f"Section definition must be an object: {section_name}",
                    f"{property_name}.{section_name}"
                )

            # Validate required 'classification' field
            if "classification" not in section_def:
                return ValidationError(
                    f"Section '{section_name}' missing required 'classification' field",
                    f"{property_name}.{section_name}"
                )
            classification = section_def["classification"]
            valid_classifications = ["required", "recommended", "optional", "discouraged", "improper"]
            if classification not in valid_classifications:
                return ValidationError(
                    f"Section '{section_name}' has invalid classification '{classification}'. "
                    f"Must be one of {valid_classifications}",
                    f"{property_name}.{section_name}.classification"
                )

            # Validate optional fields if present
            if "heading_level" in section_def:
                level = section_def["heading_level"]
                if not isinstance(level, int) or level < 1 or level > 6:
                    return ValidationError(
                        f"Section '{section_name}' heading_level must be integer 1-6, got {level}",
                        f"{property_name}.{section_name}.heading_level"
                    )
            if "position" in section_def:
                position = section_def["position"]
                valid_positions = ["after_title", "before_section_name", "after_section_name", "anywhere"]
                if position not in valid_positions:
                    return ValidationError(
                        f"Section '{section_name}' has invalid position '{position}'. "
                        f"Must be one of {valid_positions}",
                        f"{property_name}.{section_name}.position"
                    )

            # Validate content constraints are non-negative integers
            for constraint in ["min_paragraphs", "max_paragraphs", "min_code_blocks",
                               "max_code_blocks", "min_lists", "max_lists"]:
                if constraint in section_def:
                    value_check = section_def[constraint]
                    if not isinstance(value_check, int) or value_check < 0:
                        return ValidationError(
                            f"Section '{section_name}' {constraint} must be non-negative integer, got {value_check}",
                            f"{property_name}.{section_name}.{constraint}"
                        )

            # Validate alternatives is array of strings
            if "alternatives" in section_def:
                alternatives = section_def["alternatives"]
                if not isinstance(alternatives, list):
                    return ValidationError(
                        f"Section '{section_name}' alternatives must be an array",
                        f"{property_name}.{section_name}.alternatives"
                    )
                for alt in alternatives:
                    if not isinstance(alt, str):
                        return ValidationError(
                            f"Section '{section_name}' alternative names must be strings",
                            f"{property_name}.{section_name}.alternatives"
                        )
        return None

    def _validate_content_control(self, value: Any, property_name: str) -> Optional[ValidationError]:
        """Validate x-markitect-content-control property."""
        if not isinstance(value, dict):
            return ValidationError(
                "x-markitect-content-control must be an object",
                property_name
            )

        # Validate each section's content control rules
        for section_name, control_def in value.items():
            if not isinstance(section_name, str):
                return ValidationError(
                    f"Content control section name must be a string: {section_name}",
                    f"{property_name}.{section_name}"
                )
            if not isinstance(control_def, dict):
                return ValidationError(
                    f"Content control definition must be an object: {section_name}",
                    f"{property_name}.{section_name}"
                )

            # Validate pattern arrays
            for pattern_type in ["required_patterns", "discouraged_patterns", "forbidden_patterns"]:
                if pattern_type in control_def:
                    patterns = control_def[pattern_type]
                    if not isinstance(patterns, list):
                        return ValidationError(
                            f"Content control '{section_name}' {pattern_type} must be an array",
                            f"{property_name}.{section_name}.{pattern_type}"
                        )
                    for pattern in patterns:
                        if not isinstance(pattern, str):
                            return ValidationError(
                                f"Content control '{section_name}' pattern must be string",
                                f"{property_name}.{section_name}.{pattern_type}"
                            )

            # Validate content_quality object
            if "content_quality" in control_def:
                quality = control_def["content_quality"]
                if not isinstance(quality, dict):
                    return ValidationError(
                        f"Content control '{section_name}' content_quality must be an object",
                        f"{property_name}.{section_name}.content_quality"
                    )
                # Validate word/sentence counts
                for count_field in ["min_words", "max_words", "min_sentences", "max_sentences"]:
                    if count_field in quality:
                        count = quality[count_field]
                        if not isinstance(count, int) or count < 0:
                            return ValidationError(
                                f"Content quality '{section_name}' {count_field} must be non-negative integer",
                                f"{property_name}.{section_name}.content_quality.{count_field}"
                            )
                # Validate readability_target
                if "readability_target" in quality:
                    target = quality["readability_target"]
                    valid_targets = ["simple", "general", "technical", "advanced"]
                    if target not in valid_targets:
                        return ValidationError(
                            f"Content quality '{section_name}' readability_target must be one of {valid_targets}",
                            f"{property_name}.{section_name}.content_quality.readability_target"
                        )

            # Validate content_instructions array
            if "content_instructions" in control_def:
                instructions = control_def["content_instructions"]
                if not isinstance(instructions, list):
                    return ValidationError(
                        f"Content control '{section_name}' content_instructions must be an array",
                        f"{property_name}.{section_name}.content_instructions"
                    )
                for instruction in instructions:
                    if not isinstance(instruction, str):
                        return ValidationError(
                            f"Content control '{section_name}' instruction must be string",
                            f"{property_name}.{section_name}.content_instructions"
                        )

            # Validate link_validation object
            if "link_validation" in control_def:
                link_val = control_def["link_validation"]
                if not isinstance(link_val, dict):
                    return ValidationError(
                        f"Content control '{section_name}' link_validation must be an object",
                        f"{property_name}.{section_name}.link_validation"
                    )
                for field in ["check_internal", "check_external", "allow_fragments"]:
                    if field in link_val:
                        if not isinstance(link_val[field], bool):
                            return ValidationError(
                                f"Content control '{section_name}' link_validation.{field} must be boolean",
                                f"{property_name}.{section_name}.link_validation.{field}"
                            )
        return None


@@ -40,6 +40,163 @@
      "type": "string",
      "enum": ["outline", "full"],
      "description": "Mode used to generate this schema"
    },
    "x-markitect-sections": {
      "type": "object",
      "description": "Section classification and content control for document sections",
      "patternProperties": {
        "^[A-Z][A-Z0-9_ ]*$": {
          "type": "object",
          "description": "Section definition with classification and constraints",
          "properties": {
            "classification": {
              "type": "string",
              "enum": ["required", "recommended", "optional", "discouraged", "improper"],
              "description": "Classification level determining validation behavior"
            },
            "heading_level": {
              "type": "integer",
              "minimum": 1,
              "maximum": 6,
              "description": "Expected heading level (H1-H6) for this section"
            },
            "position": {
              "type": "string",
              "enum": ["after_title", "before_section_name", "after_section_name", "anywhere"],
              "description": "Where this section should appear in the document"
            },
            "content_instruction": {
              "type": "string",
              "description": "Human-readable instruction for section content"
            },
            "min_paragraphs": {
              "type": "integer",
              "minimum": 0,
              "description": "Minimum number of paragraphs in this section"
            },
            "max_paragraphs": {
              "type": "integer",
              "minimum": 0,
              "description": "Maximum number of paragraphs in this section"
            },
            "min_code_blocks": {
              "type": "integer",
              "minimum": 0,
              "description": "Minimum number of code blocks in this section"
            },
            "max_code_blocks": {
              "type": "integer",
              "minimum": 0,
              "description": "Maximum number of code blocks in this section"
            },
            "min_lists": {
              "type": "integer",
              "minimum": 0,
              "description": "Minimum number of lists in this section"
            },
            "max_lists": {
              "type": "integer",
              "minimum": 0,
              "description": "Maximum number of lists in this section"
            },
            "warning_if_missing": {
              "type": "string",
              "description": "Custom warning message for missing recommended sections"
            },
            "error_message": {
              "type": "string",
              "description": "Custom error message for required/improper section violations"
            },
            "alternatives": {
              "type": "array",
              "items": {"type": "string"},
              "description": "Alternative section names that satisfy the requirement"
            }
          },
          "required": ["classification"]
        }
      }
    },
    "x-markitect-content-control": {
      "type": "object",
      "description": "Content validation rules including patterns and quality metrics",
      "patternProperties": {
        "^[a-z][a-z0-9_]*$": {
          "type": "object",
          "description": "Content control rules for a specific section",
          "properties": {
            "required_patterns": {
              "type": "array",
              "items": {"type": "string"},
              "description": "Regex patterns that must appear in section content"
            },
            "discouraged_patterns": {
              "type": "array",
              "items": {"type": "string"},
              "description": "Regex patterns that should not appear in content (warning)"
            },
            "forbidden_patterns": {
              "type": "array",
              "items": {"type": "string"},
              "description": "Regex patterns that must not appear in content (error)"
            },
            "content_quality": {
              "type": "object",
              "description": "Quality metrics for section content",
              "properties": {
                "min_words": {
                  "type": "integer",
                  "minimum": 0,
                  "description": "Minimum word count"
                },
                "max_words": {
                  "type": "integer",
                  "minimum": 0,
                  "description": "Maximum word count"
                },
                "readability_target": {
                  "type": "string",
                  "enum": ["simple", "general", "technical", "advanced"],
                  "description": "Target readability level"
                },
                "min_sentences": {
                  "type": "integer",
                  "minimum": 0,
                  "description": "Minimum sentence count"
                },
                "max_sentences": {
                  "type": "integer",
                  "minimum": 0,
                  "description": "Maximum sentence count"
                }
              }
            },
            "content_instructions": {
              "type": "array",
              "items": {"type": "string"},
              "description": "Array of human-readable content creation instructions"
            },
            "link_validation": {
              "type": "object",
              "description": "Link checking configuration",
              "properties": {
                "check_internal": {
                  "type": "boolean",
                  "description": "Validate internal document links"
                },
                "check_external": {
                  "type": "boolean",
                  "description": "Validate external URLs"
                },
                "allow_fragments": {
                  "type": "boolean",
                  "description": "Allow fragment-only links like #section"
                }
              }
            }
          }
        }
      }
    }
  },
  "patternProperties": {