Complete Phase 1 of Schema Evolution Workplan implementing flexible content control and section classification system.

## New Features

### 1. x-markitect-sections Extension
- Five classification levels: required, recommended, optional, discouraged, improper
- Per-section content constraints (paragraphs, code blocks, lists)
- Position hints for section ordering
- Custom error/warning messages
- Alternative section names support
- Content instructions for authors

### 2. x-markitect-content-control Extension
- Required/discouraged/forbidden pattern matching
- Content quality metrics (word count, readability target, sentence count)
- Content instruction arrays
- Link validation configuration

### 3. Metaschema Validation
- Updated markitect-metaschema.json with complete validation rules
- Enhanced metaschema.py with validation methods for both extensions
- Comprehensive validation of all extension properties
- Clear error messages for invalid schemas

### 4. Documentation & Examples
- Complete specification in docs/specifications/schema-extensions-spec.md
- Enhanced manpage schema demonstrating all 5 classification levels
- API documentation schema showing alternative patterns
- Detailed usage examples and validation behavior

## Implementation Details

**Files Modified:**
- markitect/schemas/markitect-metaschema.json: Added extension definitions
- markitect/metaschema.py: Added _validate_sections() and _validate_content_control()

**Files Created:**
- docs/specifications/schema-extensions-spec.md: Complete specification (v1.0)
- examples/manpages/enhanced-manpage-schema.json: Demonstrates all classifications
- examples/manpages/api-documentation-schema.json: Shows API doc patterns

## Validation Behavior

**Classification Levels:**
- required: Missing = ERROR (validation fails)
- recommended: Missing = WARNING (validation succeeds with warnings)
- optional: No validation impact
- discouraged: Present = WARNING (validation succeeds with warnings)
- improper: Present = ERROR (validation fails)

## Next Steps

Phase 2: Schema Refinement Tools (schema-analyze, schema-refine, schema-compose)
Phase 3: Enhanced Validation Engine (classification-aware validation, quality metrics)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
MarkiTect Schema Extensions Specification v1.0
Status: Draft - Phase 1 Implementation
Overview
This specification defines MarkiTect-specific extensions to JSON Schema (draft-07) for markdown document validation with content control, section classification, and flexible structural constraints.
Design Principles
- Backward Compatibility: Existing schemas without extensions continue to work
- Namespace Isolation: All extensions prefixed with x-markitect-
- Progressive Enhancement: Extensions add capabilities without breaking standard JSON Schema
- Clear Semantics: Each extension has well-defined validation behavior
- Metaschema Validation: All extensions validated by MarkiTect metaschema
Extension: x-markitect-sections
Purpose
Define document sections with classification levels (required, recommended, optional, discouraged, improper) and content control specifications.
Schema Location
Applied at the root level of the schema or within properties that represent document sections.
Format
{
"x-markitect-sections": {
"SECTION_NAME": {
"classification": "required|recommended|optional|discouraged|improper",
"heading_level": 1|2|3|4|5|6,
"position": "after_title|before_section_name|after_section_name|anywhere",
"content_instruction": "string",
"min_paragraphs": integer,
"max_paragraphs": integer,
"min_code_blocks": integer,
"max_code_blocks": integer,
"min_lists": integer,
"max_lists": integer,
"warning_if_missing": "string",
"error_message": "string",
"alternatives": ["SECTION_NAME_1", "SECTION_NAME_2"]
}
}
}
Property Definitions
classification (required)
Classification level determining validation behavior:
- required: Section MUST be present. Validation fails if missing.
- recommended: Section SHOULD be present. Warning if missing, but validation succeeds.
- optional: Section MAY be present. No validation impact either way.
- discouraged: Section SHOULD NOT be present. Warning if present, but validation succeeds.
- improper: Section MUST NOT be present. Validation fails if present.
Type: String enum
Required: Yes
Values: ["required", "recommended", "optional", "discouraged", "improper"]
heading_level (optional)
The heading level (H1-H6) for this section.
Type: Integer
Range: 1-6
Default: 2 (for standard sections)
position (optional)
Where this section should appear relative to other sections.
Type: String enum
Values:
- "after_title" - Immediately after document title (H1)
- "before_section_name" - Before another named section
- "after_section_name" - After another named section
- "anywhere" - No position constraint (default)
Default: "anywhere"
content_instruction (optional)
Human-readable instruction describing what content belongs in this section.
Type: String
Usage: Displayed in validation warnings, generated templates, and documentation
Example:
"content_instruction": "Brief command syntax showing all options and arguments"
Content Constraints (optional)
Minimum and maximum counts for content elements within the section:
- min_paragraphs: Minimum paragraph count (integer ≥ 0)
- max_paragraphs: Maximum paragraph count (integer ≥ min_paragraphs)
- min_code_blocks: Minimum code block count (integer ≥ 0)
- max_code_blocks: Maximum code block count (integer ≥ min_code_blocks)
- min_lists: Minimum list count (integer ≥ 0)
- max_lists: Maximum list count (integer ≥ min_lists)
Type: Integer
Default: No constraint if omitted
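Each maximum is meant to be at least its matching minimum, a cross-field rule that a per-property `minimum` keyword cannot express on its own. A minimal sketch of such a check follows; the helper name `check_constraint_bounds` is hypothetical and is not taken from markitect/metaschema.py.

```python
from typing import Any, Dict, List

# Element kinds taken from the property names in this specification.
ELEMENT_KINDS = ("paragraphs", "code_blocks", "lists")


def check_constraint_bounds(section_spec: Dict[str, Any]) -> List[str]:
    """Return one message per min/max pair whose maximum is below its minimum.

    Hypothetical helper for illustration only.
    """
    problems: List[str] = []
    for kind in ELEMENT_KINDS:
        low = section_spec.get(f"min_{kind}")
        high = section_spec.get(f"max_{kind}")
        # An omitted bound means "no constraint", so compare only when both exist.
        if low is not None and high is not None and high < low:
            problems.append(f"max_{kind} ({high}) is less than min_{kind} ({low})")
    return problems
```

A section that sets only one side of a pair passes this check, since an omitted bound imposes no constraint.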
warning_if_missing (optional)
Custom warning message when a recommended section is missing.
Type: String
Applies to: classification: "recommended" only
Example:
"warning_if_missing": "Examples greatly improve documentation usability"
error_message (optional)
Custom error message when validation fails.
Type: String
Applies to: classification: "required" or "improper"
Example:
"error_message": "Internal notes must not appear in published documentation"
alternatives (optional)
Array of alternative section names that satisfy the requirement.
Type: Array of strings
Usage: If any alternative is present, requirement is satisfied
Example:
{
"classification": "required",
"alternatives": ["EXAMPLES", "USAGE", "TUTORIAL"]
}
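The alternatives rule can be sketched as a single membership test. This assumes exact, case-sensitive matching of section names, which the specification does not state; `requirement_satisfied` is a hypothetical name.

```python
from typing import Any, Dict, Iterable


def requirement_satisfied(name: str, spec: Dict[str, Any], present: Iterable[str]) -> bool:
    """Return True if the section itself or any listed alternative is present.

    Assumes exact, case-sensitive name comparison (an assumption of this sketch).
    """
    present_set = set(present)
    candidates = [name, *spec.get("alternatives", [])]
    return any(candidate in present_set for candidate in candidates)
```

With the example above attached to an EXAMPLES section, a document containing only a USAGE heading would satisfy the requirement.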
Example: Manpage Schema with Sections
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Unix Manpage Schema",
"x-markitect-sections": {
"SYNOPSIS": {
"classification": "required",
"heading_level": 2,
"position": "after_title",
"content_instruction": "Brief command syntax with options and arguments",
"min_paragraphs": 1,
"max_paragraphs": 5,
"min_code_blocks": 0,
"max_code_blocks": 3,
"error_message": "SYNOPSIS section is mandatory for all manpages"
},
"DESCRIPTION": {
"classification": "required",
"heading_level": 2,
"position": "after_section_name",
"content_instruction": "Detailed explanation of what the command does",
"min_paragraphs": 2,
"error_message": "DESCRIPTION section is mandatory for all manpages"
},
"EXAMPLES": {
"classification": "recommended",
"heading_level": 2,
"content_instruction": "Practical usage examples with explanations",
"min_code_blocks": 3,
"warning_if_missing": "Examples greatly improve manpage usability"
},
"SEE ALSO": {
"classification": "recommended",
"heading_level": 2,
"content_instruction": "Related commands and documentation references",
"warning_if_missing": "Cross-references help users discover related functionality"
},
"BUGS": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "Known issues and bug reporting information"
},
"DEPRECATED": {
"classification": "discouraged",
"heading_level": 2,
"warning_if_missing": "Consider moving deprecated content to historical documentation"
},
"INTERNAL_NOTES": {
"classification": "improper",
"heading_level": 2,
"error_message": "Internal notes must not appear in published manpages"
}
}
}
Validation Behavior
Required Sections
"SYNOPSIS": {"classification": "required"}
Validation:
- Section missing → ERROR → is_valid = False
- Section present → Continue validation
- Custom error_message used if provided
Recommended Sections
"EXAMPLES": {"classification": "recommended"}
Validation:
- Section missing → WARNING → is_valid = True (with warnings)
- Section present → Continue validation
- Custom warning_if_missing used if provided
Optional Sections
"BUGS": {"classification": "optional"}
Validation:
- Section missing → No impact
- Section present → Continue validation
- No messages generated
Discouraged Sections
"DEPRECATED": {"classification": "discouraged"}
Validation:
- Section missing → No impact
- Section present → WARNING → is_valid = True (with warnings)
- Custom warning message used if provided
Improper Sections
"INTERNAL_NOTES": {"classification": "improper"}
Validation:
- Section missing → No impact
- Section present → ERROR → is_valid = False
- Custom error_message used if provided
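The five behaviors above can be summarized in one loop. The following is a rough sketch, not the code in markitect/metaschema.py: the function name `classify_sections` and the default message texts are assumptions, and discouraged sections always get a default message here because no property for a custom one is defined above.

```python
from typing import Any, Dict, List, Set, Tuple


def classify_sections(
    sections: Dict[str, Dict[str, Any]], present: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for an x-markitect-sections mapping.

    `present` holds the section names found in the document.
    Hypothetical helper; default message texts are invented for illustration.
    """
    errors: List[str] = []
    warnings: List[str] = []
    for name, spec in sections.items():
        level = spec["classification"]
        # A listed alternative counts as the section being present.
        found = name in present or any(a in present for a in spec.get("alternatives", []))
        if level == "required" and not found:
            errors.append(spec.get("error_message", f"Required section missing: {name}"))
        elif level == "recommended" and not found:
            warnings.append(spec.get("warning_if_missing", f"Recommended section missing: {name}"))
        elif level == "discouraged" and name in present:
            warnings.append(f"Discouraged section present: {name}")
        elif level == "improper" and name in present:
            errors.append(spec.get("error_message", f"Improper section present: {name}"))
        # "optional" never produces a message.
    return errors, warnings
```

A caller would treat the document as valid when the returned error list is empty, whether or not there are warnings.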
Extension: x-markitect-content-control
Purpose
Define content validation rules for document sections including pattern matching, quality metrics, and semantic constraints.
Schema Location
Applied at root level or within specific section properties.
Format
{
"x-markitect-content-control": {
"section_name": {
"required_patterns": ["regex_pattern_1", "regex_pattern_2"],
"discouraged_patterns": ["regex_pattern_1"],
"forbidden_patterns": ["regex_pattern_1"],
"content_quality": {
"min_words": integer,
"max_words": integer,
"readability_target": "technical|general|simple|advanced",
"min_sentences": integer,
"max_sentences": integer
},
"content_instructions": ["instruction_1", "instruction_2"],
"link_validation": {
"check_internal": boolean,
"check_external": boolean,
"allow_fragments": boolean
}
}
}
}
Property Definitions
required_patterns (optional)
Array of regex patterns that MUST appear in section content.
Type: Array of strings (valid regex patterns)
Validation: ERROR if any pattern missing
Example (a bold command name, then options in brackets):
"required_patterns": [
"\\*\\*[a-z-]+\\*\\*",
"\\[.*\\]"
]
discouraged_patterns (optional)
Array of regex patterns that SHOULD NOT appear in content.
Type: Array of strings (valid regex patterns)
Validation: WARNING if any pattern found
Example:
"discouraged_patterns": [
"TODO",
"FIXME",
"\\bWIP\\b"
]
forbidden_patterns (optional)
Array of regex patterns that MUST NOT appear in content.
Type: Array of strings (valid regex patterns)
Validation: ERROR if any pattern found
Example (hard-coded passwords, then hard-coded API keys):
"forbidden_patterns": [
"password\\s*=\\s*[\"'].*[\"']",
"api[_-]?key\\s*=\\s*[\"'].*[\"']"
]
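Pattern matching itself is listed under future enhancements, so the following is only a sketch of how the three pattern lists could be evaluated. It assumes Python `re` syntax, a search anywhere in the section text, and case-sensitive matching; `check_patterns` and its message texts are hypothetical.

```python
import re
from typing import Any, Dict, List, Tuple


def check_patterns(control: Dict[str, Any], content: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one section's pattern rules.

    Hypothetical helper; uses re.search, so a pattern may match anywhere.
    """
    errors: List[str] = []
    warnings: List[str] = []
    for pattern in control.get("required_patterns", []):
        if re.search(pattern, content) is None:
            errors.append(f"Required pattern not found: {pattern}")
    for pattern in control.get("discouraged_patterns", []):
        if re.search(pattern, content):
            warnings.append(f"Discouraged pattern found: {pattern}")
    for pattern in control.get("forbidden_patterns", []):
        if re.search(pattern, content):
            errors.append(f"Forbidden pattern found: {pattern}")
    return errors, warnings
```

Patterns written for another regex dialect may behave differently under Python's `re`, so a real implementation would need to settle on one dialect.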
content_quality (optional)
Quality metrics for section content:
Sub-properties:
- min_words: Minimum word count (integer ≥ 0)
- max_words: Maximum word count (integer ≥ min_words)
- readability_target: Target readability level (enum)
  - "simple" - Elementary school level
  - "general" - General audience
  - "technical" - Technical audience
  - "advanced" - Expert/academic level
- min_sentences: Minimum sentence count (integer ≥ 0)
- max_sentences: Maximum sentence count (integer ≥ min_sentences)
Example:
"content_quality": {
"min_words": 50,
"max_words": 300,
"readability_target": "technical",
"min_sentences": 3
}
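As a rough illustration of the word and sentence limits, the sketch below counts whitespace-separated words and splits sentences naively on terminal punctuation. Both counting rules are assumptions, `check_length` is a hypothetical name, and readability_target is not evaluated at all.

```python
import re
from typing import Any, Dict, List


def check_length(quality: Dict[str, Any], content: str) -> List[str]:
    """Compare word and sentence counts against a content_quality mapping.

    Hypothetical helper; counting rules are deliberately naive.
    """
    words = len(content.split())
    # Split on runs of ., ! or ? followed by whitespace or end of text.
    parts = re.split(r"[.!?]+(?:\s+|$)", content)
    sentences = len([part for part in parts if part.strip()])
    counts = {"words": words, "sentences": sentences}

    problems: List[str] = []
    for unit, count in counts.items():
        low = quality.get(f"min_{unit}")
        high = quality.get(f"max_{unit}")
        if low is not None and count < low:
            problems.append(f"{count} {unit}, minimum is {low}")
        if high is not None and count > high:
            problems.append(f"{count} {unit}, maximum is {high}")
    return problems
```

Abbreviations and decimal numbers would be miscounted by this splitter; a production implementation would likely need a proper sentence tokenizer.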
content_instructions (optional)
Array of human-readable instructions for content creation.
Type: Array of strings
Usage: Displayed in templates, validation reports, and documentation
Example:
"content_instructions": [
"Show command name in bold",
"Include all major options",
"Use italic for arguments and placeholders",
"Keep syntax examples concise (1-3 lines)"
]
link_validation (optional)
Link checking configuration:
Sub-properties:
- check_internal: Validate internal document links (boolean)
- check_external: Validate external URLs (boolean)
- allow_fragments: Allow fragment-only links like #section (boolean)
Default: All false (no link validation)
Example:
"link_validation": {
"check_internal": true,
"check_external": false,
"allow_fragments": true
}
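Before any of these flags can be applied, links have to be sorted into the three kinds the flags refer to. The sketch below shows one possible grouping; the regular expression, the rule that any target with a URL scheme counts as external, and the name `classify_links` are all assumptions rather than specified behavior.

```python
import re
from typing import Dict, List

# Inline Markdown links only: [text](target). Reference-style links are ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def classify_links(content: str) -> Dict[str, List[str]]:
    """Group inline link targets into fragment, external, and internal.

    Hypothetical helper for illustration only.
    """
    groups: Dict[str, List[str]] = {"fragment": [], "external": [], "internal": []}
    for target in LINK_RE.findall(content):
        if target.startswith("#"):
            groups["fragment"].append(target)
        elif SCHEME_RE.match(target):
            groups["external"].append(target)
        else:
            groups["internal"].append(target)
    return groups
```

Under this grouping, a validator could flag every entry in the fragment group when allow_fragments is false, and run existence checks on the other two groups only when the matching flag is true.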
Example: Content Control for API Documentation
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "API Documentation Schema",
"x-markitect-content-control": {
"synopsis": {
"required_patterns": [
"\\*\\*[A-Z]+\\*\\*",
"`/api/.*`"
],
"content_quality": {
"min_words": 10,
"max_words": 100,
"readability_target": "technical"
},
"content_instructions": [
"Start with HTTP method in bold (e.g., **GET**)",
"Show endpoint path in code format",
"Include brief one-line description"
]
},
"request_parameters": {
"required_patterns": [
"\\*\\*[a-z_]+\\*\\*.*\\*[A-Za-z]+\\*"
],
"content_instructions": [
"Use bold for parameter names",
"Use italic for parameter types",
"Include description for each parameter",
"Mark required parameters clearly"
]
},
"description": {
"discouraged_patterns": [
"TODO",
"FIXME",
"TBD"
],
"forbidden_patterns": [
"password\\s*=",
"secret\\s*=",
"token\\s*="
],
"content_quality": {
"min_words": 50,
"max_words": 500,
"readability_target": "technical",
"min_sentences": 3
},
"link_validation": {
"check_internal": true,
"check_external": true,
"allow_fragments": true
}
}
}
}
Validation Result Structure
Enhanced ValidationResult Class
from typing import Any, Dict, List, Literal


class ValidationResult:
    """Result of schema validation with classification support."""

    status: Literal["valid", "valid_with_warnings", "invalid"]
    errors: List["ValidationError"]  # Required/improper violations
    warnings: List["ValidationWarning"]  # Recommended/discouraged violations
    suggestions: List[str]  # Optional improvements
    quality_metrics: Dict[str, Any]  # Content quality scores
Validation Status Values
- "valid": No errors, no warnings. Document fully conforms.
- "valid_with_warnings": No errors, but has warnings. Document acceptable but improvable.
- "invalid": Has errors. Document does not conform to schema.
Error Types
from enum import Enum


class ValidationErrorType(Enum):
    MISSING_REQUIRED_SECTION = "missing_required_section"
    IMPROPER_SECTION_PRESENT = "improper_section_present"
    CONTENT_PATTERN_MISSING = "content_pattern_missing"
    CONTENT_PATTERN_FORBIDDEN = "content_pattern_forbidden"
    CONTENT_TOO_SHORT = "content_too_short"
    CONTENT_TOO_LONG = "content_too_long"
    INVALID_LINK = "invalid_link"
    STRUCTURE_MISMATCH = "structure_mismatch"
Warning Types
from enum import Enum


class ValidationWarningType(Enum):
    MISSING_RECOMMENDED_SECTION = "missing_recommended_section"
    DISCOURAGED_SECTION_PRESENT = "discouraged_section_present"
    CONTENT_PATTERN_DISCOURAGED = "content_pattern_discouraged"
    CONTENT_QUALITY_BELOW_TARGET = "content_quality_below_target"
    READABILITY_MISMATCH = "readability_mismatch"
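The three status values follow directly from whether the error and warning lists are empty. A minimal sketch of that derivation, using a hypothetical function name `derive_status`:

```python
from typing import Sequence


def derive_status(errors: Sequence[object], warnings: Sequence[object]) -> str:
    """Map error and warning lists to one of the three status strings.

    Errors take precedence: any error makes the result "invalid",
    regardless of how many warnings accompany it.
    """
    if errors:
        return "invalid"
    if warnings:
        return "valid_with_warnings"
    return "valid"
```

Computing the status this way, rather than storing it independently, keeps it from drifting out of sync with the two lists.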
Metaschema Validation
Extension Validation Rules
The MarkiTect metaschema validates these extensions:
{
"x-markitect-sections": {
"type": "object",
"patternProperties": {
"^[A-Z][A-Z0-9_ ]*$": {
"type": "object",
"properties": {
"classification": {
"type": "string",
"enum": ["required", "recommended", "optional", "discouraged", "improper"]
},
"heading_level": {
"type": "integer",
"minimum": 1,
"maximum": 6
},
"position": {
"type": "string",
"enum": ["after_title", "before_section_name", "after_section_name", "anywhere"]
},
"content_instruction": {"type": "string"},
"min_paragraphs": {"type": "integer", "minimum": 0},
"max_paragraphs": {"type": "integer", "minimum": 0},
"min_code_blocks": {"type": "integer", "minimum": 0},
"max_code_blocks": {"type": "integer", "minimum": 0},
"min_lists": {"type": "integer", "minimum": 0},
"max_lists": {"type": "integer", "minimum": 0},
"warning_if_missing": {"type": "string"},
"error_message": {"type": "string"},
"alternatives": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["classification"]
}
}
},
"x-markitect-content-control": {
"type": "object",
"patternProperties": {
"^[a-z][a-z0-9_]*$": {
"type": "object",
"properties": {
"required_patterns": {
"type": "array",
"items": {"type": "string", "format": "regex"}
},
"discouraged_patterns": {
"type": "array",
"items": {"type": "string", "format": "regex"}
},
"forbidden_patterns": {
"type": "array",
"items": {"type": "string", "format": "regex"}
},
"content_quality": {
"type": "object",
"properties": {
"min_words": {"type": "integer", "minimum": 0},
"max_words": {"type": "integer", "minimum": 0},
"readability_target": {
"type": "string",
"enum": ["simple", "general", "technical", "advanced"]
},
"min_sentences": {"type": "integer", "minimum": 0},
"max_sentences": {"type": "integer", "minimum": 0}
}
},
"content_instructions": {
"type": "array",
"items": {"type": "string"}
},
"link_validation": {
"type": "object",
"properties": {
"check_internal": {"type": "boolean"},
"check_external": {"type": "boolean"},
"allow_fragments": {"type": "boolean"}
}
}
}
}
}
}
}
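As written, these rules use patternProperties without additionalProperties, so under standard JSON Schema semantics a key that matches neither name pattern (for example a mixed-case section name) is accepted without being checked against the property rules. The sketch below surfaces such keys; the helper name `unmatched_keys` is hypothetical, and the two patterns are copied from the rules above.

```python
import re
from typing import Any, Dict, List

# Key patterns copied from the metaschema rules; fullmatch supplies the anchoring.
SECTION_KEY = re.compile(r"[A-Z][A-Z0-9_ ]*")
CONTROL_KEY = re.compile(r"[a-z][a-z0-9_]*")


def unmatched_keys(schema: Dict[str, Any]) -> List[str]:
    """List extension keys that no patternProperties entry would validate.

    Hypothetical helper for illustration only.
    """
    checks = (
        ("x-markitect-sections", SECTION_KEY),
        ("x-markitect-content-control", CONTROL_KEY),
    )
    problems: List[str] = []
    for extension, key_re in checks:
        for key in schema.get(extension, {}):
            if key_re.fullmatch(key) is None:
                problems.append(f"{extension}: {key}")
    return problems
```

Whether such keys should be rejected or merely reported is a design decision this specification leaves open.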
Implementation Notes
Phase 1 Scope
- Define and document extension formats ✓
- Update metaschema to validate extensions
- Implement basic classification validation (required/recommended/optional/discouraged/improper)
- Create example schemas demonstrating all features
- Update CLI to report errors vs warnings separately
Future Enhancements (Phase 2+)
- Content pattern matching implementation
- Quality metrics calculation
- Link validation
- Readability scoring
- Position constraints enforcement
Version History
- v1.0 (Draft) - Initial specification for Phase 1 implementation
  - x-markitect-sections extension defined
  - x-markitect-content-control extension defined
  - Validation result structure defined
  - Metaschema validation rules defined
References
- JSON Schema Draft-07: https://json-schema.org/draft-07/schema
- MarkiTect Schema Evolution Workplan: examples/manpages/SCHEMA_EVOLUTION_WORKPLAN.md
- Existing Metaschema: markitect/schemas/markitect-metaschema.json
- Metaschema Validator: markitect/metaschema.py