# MarkiTect Schema Extensions Specification v1.0

## Status: Draft - Phase 1 Implementation

## Overview

This specification defines MarkiTect-specific extensions to JSON Schema (draft-07) for markdown document validation with content control, section classification, and flexible structural constraints.

## Design Principles

1. **Backward Compatibility**: Existing schemas without extensions continue to work
2. **Namespace Isolation**: All extensions prefixed with `x-markitect-`
3. **Progressive Enhancement**: Extensions add capabilities without breaking standard JSON Schema
4. **Clear Semantics**: Each extension has well-defined validation behavior
5. **Metaschema Validation**: All extensions validated by MarkiTect metaschema

---

## Extension: `x-markitect-sections`

### Purpose

Define document sections with classification levels (required, recommended, optional, discouraged, improper) and content control specifications.

### Schema Location

Applied at the **root level** of the schema or within **properties** that represent document sections.

### Format

```json
{
  "x-markitect-sections": {
    "SECTION_NAME": {
      "classification": "required|recommended|optional|discouraged|improper",
      "heading_level": 1|2|3|4|5|6,
      "position": "after_title|before_section_name|after_section_name|anywhere",
      "content_instruction": "string",
      "min_paragraphs": integer,
      "max_paragraphs": integer,
      "min_code_blocks": integer,
      "max_code_blocks": integer,
      "min_lists": integer,
      "max_lists": integer,
      "warning_if_missing": "string",
      "error_message": "string",
      "alternatives": ["SECTION_NAME_1", "SECTION_NAME_2"]
    }
  }
}
```

### Property Definitions

#### `classification` (required)

Classification level determining validation behavior:

- **`required`**: Section MUST be present. Validation fails if missing.
- **`recommended`**: Section SHOULD be present. Warning if missing, but validation succeeds.
- **`optional`**: Section MAY be present. No validation impact either way.
- **`discouraged`**: Section SHOULD NOT be present. Warning if present, but validation succeeds.
- **`improper`**: Section MUST NOT be present. Validation fails if present.

**Type**: String enum
**Required**: Yes
**Values**: `["required", "recommended", "optional", "discouraged", "improper"]`

#### `heading_level` (optional)

The heading level (H1-H6) for this section.

**Type**: Integer
**Range**: 1-6
**Default**: 2 (for standard sections)

#### `position` (optional)

Where this section should appear relative to other sections.

**Type**: String enum
**Values**:
- `"after_title"` - Immediately after document title (H1)
- `"before_section_name"` - Before another named section
- `"after_section_name"` - After another named section
- `"anywhere"` - No position constraint (default)

**Default**: `"anywhere"`

#### `content_instruction` (optional)

Human-readable instruction describing what content belongs in this section.

**Type**: String
**Usage**: Displayed in validation warnings, generated templates, and documentation
**Example**:
```json
"content_instruction": "Brief command syntax showing all options and arguments"
```

#### Content Constraints (optional)

Minimum and maximum counts for content elements within the section:

- **`min_paragraphs`**: Minimum paragraph count (integer ≥ 0)
- **`max_paragraphs`**: Maximum paragraph count (integer ≥ min_paragraphs)
- **`min_code_blocks`**: Minimum code block count (integer ≥ 0)
- **`max_code_blocks`**: Maximum code block count (integer ≥ min_code_blocks)
- **`min_lists`**: Minimum list count (integer ≥ 0)
- **`max_lists`**: Maximum list count (integer ≥ min_lists)

**Type**: Integer
**Default**: No constraint if omitted

#### `warning_if_missing` (optional)

Custom warning message when a recommended section is missing.
**Type**: String
**Applies to**: `classification: "recommended"` only
**Example**:
```json
"warning_if_missing": "Examples greatly improve documentation usability"
```

#### `error_message` (optional)

Custom error message when validation fails.

**Type**: String
**Applies to**: `classification: "required"` or `"improper"`
**Example**:
```json
"error_message": "Internal notes must not appear in published documentation"
```

#### `alternatives` (optional)

Array of alternative section names that satisfy the requirement.

**Type**: Array of strings
**Usage**: If any alternative is present, requirement is satisfied
**Example**:
```json
{
  "classification": "required",
  "alternatives": ["EXAMPLES", "USAGE", "TUTORIAL"]
}
```

### Example: Manpage Schema with Sections

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Unix Manpage Schema",
  "x-markitect-sections": {
    "SYNOPSIS": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_title",
      "content_instruction": "Brief command syntax with options and arguments",
      "min_paragraphs": 1,
      "max_paragraphs": 5,
      "min_code_blocks": 0,
      "max_code_blocks": 3,
      "error_message": "SYNOPSIS section is mandatory for all manpages"
    },
    "DESCRIPTION": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_section_name",
      "content_instruction": "Detailed explanation of what the command does",
      "min_paragraphs": 2,
      "error_message": "DESCRIPTION section is mandatory for all manpages"
    },
    "EXAMPLES": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Practical usage examples with explanations",
      "min_code_blocks": 3,
      "warning_if_missing": "Examples greatly improve manpage usability"
    },
    "SEE ALSO": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Related commands and documentation references",
      "warning_if_missing": "Cross-references help users discover related functionality"
    },
    "BUGS": {
      "classification": "optional",
      "heading_level": 2,
      "content_instruction": "Known issues and bug reporting information"
    },
    "DEPRECATED": {
      "classification": "discouraged",
      "heading_level": 2,
      "warning_if_missing": "Consider moving deprecated content to historical documentation"
    },
    "INTERNAL_NOTES": {
      "classification": "improper",
      "heading_level": 2,
      "error_message": "Internal notes must not appear in published manpages"
    }
  }
}
```

### Validation Behavior

#### Required Sections

```json
"SYNOPSIS": {"classification": "required"}
```

**Validation**:
- Section missing → **ERROR** → `is_valid = False`
- Section present → Continue validation
- Custom `error_message` used if provided

#### Recommended Sections

```json
"EXAMPLES": {"classification": "recommended"}
```

**Validation**:
- Section missing → **WARNING** → `is_valid = True` (with warnings)
- Section present → Continue validation
- Custom `warning_if_missing` used if provided

#### Optional Sections

```json
"BUGS": {"classification": "optional"}
```

**Validation**:
- Section missing → No impact
- Section present → Continue validation
- No messages generated

#### Discouraged Sections

```json
"DEPRECATED": {"classification": "discouraged"}
```

**Validation**:
- Section missing → No impact
- Section present → **WARNING** → `is_valid = True` (with warnings)
- Custom warning message used if provided

#### Improper Sections

```json
"INTERNAL_NOTES": {"classification": "improper"}
```

**Validation**:
- Section missing → No impact
- Section present → **ERROR** → `is_valid = False`
- Custom `error_message` used if provided

---

## Extension: `x-markitect-content-control`

### Purpose

Define content validation rules for document sections including pattern matching, quality metrics, and semantic constraints.

### Schema Location

Applied at **root level** or within specific **section properties**.
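
As a rough illustration of the two locations named above, the following sketch gathers every `x-markitect-content-control` block from the schema root and from entries under `properties`. The helper name `find_content_control` and the reading of "section properties" as entries of the standard JSON Schema `properties` keyword are assumptions made for this example, not part of the specification.

```python
from typing import Any, Dict, List, Tuple

EXTENSION_KEY = "x-markitect-content-control"


def find_content_control(schema: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (location, rules) pairs for each content-control block found.

    Hypothetical helper: checks the schema root first, then each entry
    under `properties` (an assumed reading of "section properties").
    """
    found: List[Tuple[str, Dict[str, Any]]] = []
    if EXTENSION_KEY in schema:
        found.append(("root", schema[EXTENSION_KEY]))
    for name, prop in schema.get("properties", {}).items():
        if isinstance(prop, dict) and EXTENSION_KEY in prop:
            found.append((f"properties.{name}", prop[EXTENSION_KEY]))
    return found


# Example schema with one root-level block and one property-level block.
schema = {
    EXTENSION_KEY: {"synopsis": {"required_patterns": ["\\*\\*[A-Z]+\\*\\*"]}},
    "properties": {
        "description": {
            "type": "object",
            EXTENSION_KEY: {"description": {"discouraged_patterns": ["TODO"]}},
        }
    },
}
locations = [where for where, _ in find_content_control(schema)]
print(locations)
```

The sketch only reports where blocks were found; how root-level and property-level rules for the same section should be combined is left open here.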
### Format

```json
{
  "x-markitect-content-control": {
    "section_name": {
      "required_patterns": ["regex_pattern_1", "regex_pattern_2"],
      "discouraged_patterns": ["regex_pattern_1"],
      "forbidden_patterns": ["regex_pattern_1"],
      "content_quality": {
        "min_words": integer,
        "max_words": integer,
        "readability_target": "technical|general|simple|advanced",
        "min_sentences": integer,
        "max_sentences": integer
      },
      "content_instructions": ["instruction_1", "instruction_2"],
      "link_validation": {
        "check_internal": boolean,
        "check_external": boolean,
        "allow_fragments": boolean
      }
    }
  }
}
```

### Property Definitions

#### `required_patterns` (optional)

Array of regex patterns that MUST appear in section content.

**Type**: Array of strings (valid regex patterns)
**Validation**: ERROR if any pattern missing
**Example**:
```json
"required_patterns": [
  "\\*\\*[a-z-]+\\*\\*",  // Bold command name
  "\\[.*\\]"              // Options in brackets
]
```

#### `discouraged_patterns` (optional)

Array of regex patterns that SHOULD NOT appear in content.

**Type**: Array of strings (valid regex patterns)
**Validation**: WARNING if any pattern found
**Example**:
```json
"discouraged_patterns": [
  "TODO",
  "FIXME",
  "\\bWIP\\b"
]
```

#### `forbidden_patterns` (optional)

Array of regex patterns that MUST NOT appear in content.
**Type**: Array of strings (valid regex patterns)
**Validation**: ERROR if any pattern found
**Example**:
```json
"forbidden_patterns": [
  "password\\s*=\\s*[\"'].*[\"']",     // Hard-coded passwords
  "api[_-]?key\\s*=\\s*[\"'].*[\"']"   // Hard-coded API keys
]
```

#### `content_quality` (optional)

Quality metrics for section content:

**Sub-properties**:
- **`min_words`**: Minimum word count (integer ≥ 0)
- **`max_words`**: Maximum word count (integer ≥ min_words)
- **`readability_target`**: Target readability level (enum)
  - `"simple"` - Elementary school level
  - `"general"` - General audience
  - `"technical"` - Technical audience
  - `"advanced"` - Expert/academic level
- **`min_sentences`**: Minimum sentence count (integer ≥ 0)
- **`max_sentences`**: Maximum sentence count (integer ≥ min_sentences)

**Example**:
```json
"content_quality": {
  "min_words": 50,
  "max_words": 300,
  "readability_target": "technical",
  "min_sentences": 3
}
```

#### `content_instructions` (optional)

Array of human-readable instructions for content creation.
**Type**: Array of strings
**Usage**: Displayed in templates, validation reports, and documentation
**Example**:
```json
"content_instructions": [
  "Show command name in bold",
  "Include all major options",
  "Use italic for arguments and placeholders",
  "Keep syntax examples concise (1-3 lines)"
]
```

#### `link_validation` (optional)

Link checking configuration:

**Sub-properties**:
- **`check_internal`**: Validate internal document links (boolean)
- **`check_external`**: Validate external URLs (boolean)
- **`allow_fragments`**: Allow fragment-only links like `#section` (boolean)

**Default**: All false (no link validation)

**Example**:
```json
"link_validation": {
  "check_internal": true,
  "check_external": false,
  "allow_fragments": true
}
```

### Example: Content Control for API Documentation

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "API Documentation Schema",
  "x-markitect-content-control": {
    "synopsis": {
      "required_patterns": [
        "\\*\\*[A-Z]+\\*\\*",
        "`/api/.*`"
      ],
      "content_quality": {
        "min_words": 10,
        "max_words": 100,
        "readability_target": "technical"
      },
      "content_instructions": [
        "Start with HTTP method in bold (e.g., **GET**)",
        "Show endpoint path in code format",
        "Include brief one-line description"
      ]
    },
    "request_parameters": {
      "required_patterns": [
        "\\*\\*[a-z_]+\\*\\*.*\\*[A-Za-z]+\\*"
      ],
      "content_instructions": [
        "Use bold for parameter names",
        "Use italic for parameter types",
        "Include description for each parameter",
        "Mark required parameters clearly"
      ]
    },
    "description": {
      "discouraged_patterns": [
        "TODO",
        "FIXME",
        "TBD"
      ],
      "forbidden_patterns": [
        "password\\s*=",
        "secret\\s*=",
        "token\\s*="
      ],
      "content_quality": {
        "min_words": 50,
        "max_words": 500,
        "readability_target": "technical",
        "min_sentences": 3
      },
      "link_validation": {
        "check_internal": true,
        "check_external": true,
        "allow_fragments": true
      }
    }
  }
}
```

The `synopsis` patterns match an HTTP method in bold and an endpoint path in code format; the `request_parameters` pattern matches a bold parameter name followed by an italic type.

---

## Validation Result Structure

### Enhanced ValidationResult Class

```python
from typing import Any, Dict, List, Literal


class ValidationResult:
    """Result of schema validation with classification support."""

    status: Literal["valid", "valid_with_warnings", "invalid"]
    errors: List["ValidationError"]      # Required/improper violations
    warnings: List["ValidationWarning"]  # Recommended/discouraged violations
    suggestions: List[str]               # Optional improvements
    quality_metrics: Dict[str, Any]      # Content quality scores
```

### Validation Status Values

- **`"valid"`**: No errors, no warnings. Document fully conforms.
- **`"valid_with_warnings"`**: No errors, but has warnings. Document acceptable but improvable.
- **`"invalid"`**: Has errors. Document does not conform to schema.

### Error Types

```python
from enum import Enum


class ValidationErrorType(Enum):
    MISSING_REQUIRED_SECTION = "missing_required_section"
    IMPROPER_SECTION_PRESENT = "improper_section_present"
    CONTENT_PATTERN_MISSING = "content_pattern_missing"
    CONTENT_PATTERN_FORBIDDEN = "content_pattern_forbidden"
    CONTENT_TOO_SHORT = "content_too_short"
    CONTENT_TOO_LONG = "content_too_long"
    INVALID_LINK = "invalid_link"
    STRUCTURE_MISMATCH = "structure_mismatch"
```

### Warning Types

```python
from enum import Enum


class ValidationWarningType(Enum):
    MISSING_RECOMMENDED_SECTION = "missing_recommended_section"
    DISCOURAGED_SECTION_PRESENT = "discouraged_section_present"
    CONTENT_PATTERN_DISCOURAGED = "content_pattern_discouraged"
    CONTENT_QUALITY_BELOW_TARGET = "content_quality_below_target"
    READABILITY_MISMATCH = "readability_mismatch"
```

---

## Metaschema Validation

### Extension Validation Rules

The MarkiTect metaschema validates these extensions:

```json
{
  "x-markitect-sections": {
    "type": "object",
    "patternProperties": {
      "^[A-Z][A-Z0-9_ ]*$": {
        "type": "object",
        "properties": {
          "classification": {
            "type": "string",
            "enum": ["required", "recommended", "optional", "discouraged", "improper"]
          },
          "heading_level": {
            "type": "integer",
            "minimum": 1,
            "maximum": 6
          },
          "position": {
            "type": "string",
            "enum": ["after_title", "before_section_name", "after_section_name", "anywhere"]
          },
          "content_instruction": {"type": "string"},
          "min_paragraphs": {"type": "integer", "minimum": 0},
          "max_paragraphs": {"type": "integer", "minimum": 0},
          "min_code_blocks": {"type": "integer", "minimum": 0},
          "max_code_blocks": {"type": "integer", "minimum": 0},
          "min_lists": {"type": "integer", "minimum": 0},
          "max_lists": {"type": "integer", "minimum": 0},
          "warning_if_missing": {"type": "string"},
          "error_message": {"type": "string"},
          "alternatives": {
            "type": "array",
            "items": {"type": "string"}
          }
        },
        "required": ["classification"]
      }
    }
  },
  "x-markitect-content-control": {
    "type": "object",
    "patternProperties": {
      "^[a-z][a-z0-9_]*$": {
        "type": "object",
        "properties": {
          "required_patterns": {
            "type": "array",
            "items": {"type": "string", "format": "regex"}
          },
          "discouraged_patterns": {
            "type": "array",
            "items": {"type": "string", "format": "regex"}
          },
          "forbidden_patterns": {
            "type": "array",
            "items": {"type": "string", "format": "regex"}
          },
          "content_quality": {
            "type": "object",
            "properties": {
              "min_words": {"type": "integer", "minimum": 0},
              "max_words": {"type": "integer", "minimum": 0},
              "readability_target": {
                "type": "string",
                "enum": ["simple", "general", "technical", "advanced"]
              },
              "min_sentences": {"type": "integer", "minimum": 0},
              "max_sentences": {"type": "integer", "minimum": 0}
            }
          },
          "content_instructions": {
            "type": "array",
            "items": {"type": "string"}
          },
          "link_validation": {
            "type": "object",
            "properties": {
              "check_internal": {"type": "boolean"},
              "check_external": {"type": "boolean"},
              "allow_fragments": {"type": "boolean"}
            }
          }
        }
      }
    }
  }
}
```

---

## Implementation Notes

### Phase 1 Scope

1. Define and document extension formats ✓
2. Update metaschema to validate extensions
3. Implement basic classification validation (required/recommended/optional/discouraged/improper)
4. Create example schemas demonstrating all features
5. Update CLI to report errors vs warnings separately

### Future Enhancements (Phase 2+)

- Content pattern matching implementation
- Quality metrics calculation
- Link validation
- Readability scoring
- Position constraints enforcement

---

## Version History

- **v1.0 (Draft)** - Initial specification for Phase 1 implementation
  - `x-markitect-sections` extension defined
  - `x-markitect-content-control` extension defined
  - Validation result structure defined
  - Metaschema validation rules defined

---

## References

- JSON Schema Draft-07: https://json-schema.org/draft-07/schema
- MarkiTect Schema Evolution Workplan: `examples/manpages/SCHEMA_EVOLUTION_WORKPLAN.md`
- Existing Metaschema: `markitect/schemas/markitect-metaschema.json`
- Metaschema Validator: `markitect/metaschema.py`
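
---

## Illustrative Sketch: Classification Validation

The classification rules under Validation Behavior, together with the three status values, can be sketched in a few lines. This is a minimal illustration and not the MarkiTect implementation: the function name `classify_sections`, its tuple return value, and the default message strings are hypothetical, and it assumes `alternatives` may satisfy both `required` and `recommended` sections. The set of section names present in the document is taken as given.

```python
from typing import Dict, List, Set, Tuple


def classify_sections(
    sections_spec: Dict[str, dict], present: Set[str]
) -> Tuple[str, List[str], List[str]]:
    """Return (status, errors, warnings) for the sections found in a document."""
    errors: List[str] = []
    warnings: List[str] = []
    for name, spec in sections_spec.items():
        level = spec["classification"]
        # Assumption: a listed alternative counts as the section being present.
        satisfied = name in present or any(
            alt in present for alt in spec.get("alternatives", [])
        )
        if level == "required" and not satisfied:
            errors.append(spec.get("error_message", f"Missing required section: {name}"))
        elif level == "recommended" and not satisfied:
            warnings.append(
                spec.get("warning_if_missing", f"Missing recommended section: {name}")
            )
        elif level == "discouraged" and name in present:
            warnings.append(f"Discouraged section present: {name}")
        elif level == "improper" and name in present:
            errors.append(spec.get("error_message", f"Improper section present: {name}"))
        # "optional" sections never produce a message.
    if errors:
        status = "invalid"
    elif warnings:
        status = "valid_with_warnings"
    else:
        status = "valid"
    return status, errors, warnings


spec = {
    "SYNOPSIS": {"classification": "required"},
    "EXAMPLES": {"classification": "recommended"},
    "BUGS": {"classification": "optional"},
    "INTERNAL_NOTES": {"classification": "improper"},
}
status, errors, warnings = classify_sections(spec, {"SYNOPSIS"})
print(status, errors, warnings)
```

Keeping errors and warnings in separate lists mirrors the Phase 1 goal of reporting them separately; content constraints and position checks are deliberately left out of this sketch.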