markitect-main/docs/specifications/schema-extensions-spec.md
tegwick d68e762612
feat: implement Phase 1 - Enhanced Schema Format with Classifications
Complete Phase 1 of Schema Evolution Workplan implementing flexible content
control and section classification system.

## New Features

### 1. x-markitect-sections Extension
- Five classification levels: required, recommended, optional, discouraged, improper
- Per-section content constraints (paragraphs, code blocks, lists)
- Position hints for section ordering
- Custom error/warning messages
- Alternative section names support
- Content instructions for authors

### 2. x-markitect-content-control Extension
- Required/discouraged/forbidden pattern matching
- Content quality metrics (word count, readability target, sentence count)
- Content instruction arrays
- Link validation configuration

### 3. Metaschema Validation
- Updated markitect-metaschema.json with complete validation rules
- Enhanced metaschema.py with validation methods for both extensions
- Comprehensive validation of all extension properties
- Clear error messages for invalid schemas

### 4. Documentation & Examples
- Complete specification in docs/specifications/schema-extensions-spec.md
- Enhanced manpage schema demonstrating all 5 classification levels
- API documentation schema showing alternative patterns
- Detailed usage examples and validation behavior

## Implementation Details

**Files Modified:**
- markitect/schemas/markitect-metaschema.json: Added extension definitions
- markitect/metaschema.py: Added _validate_sections() and _validate_content_control()

**Files Created:**
- docs/specifications/schema-extensions-spec.md: Complete specification (v1.0)
- examples/manpages/enhanced-manpage-schema.json: Demonstrates all classifications
- examples/manpages/api-documentation-schema.json: Shows API doc patterns

## Validation Behavior

**Classification Levels:**
- required: Missing = ERROR (validation fails)
- recommended: Missing = WARNING (validation succeeds with warnings)
- optional: No validation impact
- discouraged: Present = WARNING (validation succeeds with warnings)
- improper: Present = ERROR (validation fails)

## Next Steps

Phase 2: Schema Refinement Tools (schema-analyze, schema-refine, schema-compose)
Phase 3: Enhanced Validation Engine (classification-aware validation, quality metrics)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 21:02:51 +01:00


MarkiTect Schema Extensions Specification v1.0

Status: Draft - Phase 1 Implementation

Overview

This specification defines MarkiTect-specific extensions to JSON Schema (draft-07) for markdown document validation with content control, section classification, and flexible structural constraints.

Design Principles

  1. Backward Compatibility: Existing schemas without extensions continue to work
  2. Namespace Isolation: All extensions prefixed with x-markitect-
  3. Progressive Enhancement: Extensions add capabilities without breaking standard JSON Schema
  4. Clear Semantics: Each extension has well-defined validation behavior
  5. Metaschema Validation: All extensions validated by MarkiTect metaschema

Extension: x-markitect-sections

Purpose

Define document sections with classification levels (required, recommended, optional, discouraged, improper) and content control specifications.

Schema Location

Applied at the root level of the schema or within properties that represent document sections.

Format

{
  "x-markitect-sections": {
    "SECTION_NAME": {
      "classification": "required|recommended|optional|discouraged|improper",
      "heading_level": 1|2|3|4|5|6,
      "position": "after_title|before_section_name|after_section_name|anywhere",
      "content_instruction": "string",
      "min_paragraphs": integer,
      "max_paragraphs": integer,
      "min_code_blocks": integer,
      "max_code_blocks": integer,
      "min_lists": integer,
      "max_lists": integer,
      "warning_if_missing": "string",
      "error_message": "string",
      "alternatives": ["SECTION_NAME_1", "SECTION_NAME_2"]
    }
  }
}

Property Definitions

classification (required)

Classification level determining validation behavior:

  • required: Section MUST be present. Validation fails if missing.
  • recommended: Section SHOULD be present. Warning if missing, but validation succeeds.
  • optional: Section MAY be present. No validation impact either way.
  • discouraged: Section SHOULD NOT be present. Warning if present, but validation succeeds.
  • improper: Section MUST NOT be present. Validation fails if present.

Type: String enum
Required: Yes
Values: ["required", "recommended", "optional", "discouraged", "improper"]
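
The five levels reduce to a small lookup on two inputs: the classification and whether the section is present. The sketch below is illustrative only; the function name `section_outcome` and the string outcomes "error" and "warning" are assumptions, not part of the MarkiTect API.

```python
from typing import Optional

# classification -> (outcome when the section is missing, outcome when present)
OUTCOMES = {
    "required": ("error", None),
    "recommended": ("warning", None),
    "optional": (None, None),
    "discouraged": (None, "warning"),
    "improper": (None, "error"),
}


def section_outcome(classification: str, present: bool) -> Optional[str]:
    """Return "error", "warning", or None for a single section."""
    if classification not in OUTCOMES:
        raise ValueError(f"unknown classification: {classification!r}")
    when_missing, when_present = OUTCOMES[classification]
    return when_present if present else when_missing
```

Only "error" outcomes fail validation; "warning" outcomes leave the document valid.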

heading_level (optional)

The heading level (H1-H6) for this section.

Type: Integer
Range: 1-6
Default: 2 (for standard sections)

position (optional)

Where this section should appear relative to other sections.

Type: String enum
Values:

  • "after_title" - Immediately after document title (H1)
  • "before_section_name" - Before another named section
  • "after_section_name" - After another named section
  • "anywhere" - No position constraint (default)

Default: "anywhere"

content_instruction (optional)

Human-readable instruction describing what content belongs in this section.

Type: String
Usage: Displayed in validation warnings, generated templates, and documentation

Example:

"content_instruction": "Brief command syntax showing all options and arguments"

Content Constraints (optional)

Minimum and maximum counts for content elements within the section:

  • min_paragraphs: Minimum paragraph count (integer ≥ 0)
  • max_paragraphs: Maximum paragraph count (integer ≥ min_paragraphs)
  • min_code_blocks: Minimum code block count (integer ≥ 0)
  • max_code_blocks: Maximum code block count (integer ≥ min_code_blocks)
  • min_lists: Minimum list count (integer ≥ 0)
  • max_lists: Maximum list count (integer ≥ min_lists)

Type: Integer
Default: No constraint if omitted
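
The constraints above imply two separate checks: a schema-level check that each maximum is not below its minimum, and a document-level check of observed counts against the bounds. The following is a minimal sketch under stated assumptions; the function names and the `counts` dictionary shape are hypothetical, and how MarkiTect counts paragraphs, code blocks, and lists is not specified here.

```python
from typing import Dict, List

ELEMENTS = ("paragraphs", "code_blocks", "lists")


def check_constraint_bounds(section: Dict[str, int]) -> List[str]:
    """Schema-level check: every max_* must be >= its min_* when both are set."""
    problems = []
    for element in ELEMENTS:
        low = section.get(f"min_{element}")
        high = section.get(f"max_{element}")
        if low is not None and high is not None and high < low:
            problems.append(f"max_{element} ({high}) is less than min_{element} ({low})")
    return problems


def check_counts(section: Dict[str, int], counts: Dict[str, int]) -> List[str]:
    """Document-level check: compare observed element counts to the bounds."""
    problems = []
    for element in ELEMENTS:
        found = counts.get(element, 0)
        low = section.get(f"min_{element}")
        high = section.get(f"max_{element}")
        # An omitted bound means "no constraint", matching the default above.
        if low is not None and found < low:
            problems.append(f"{element}: found {found}, need at least {low}")
        if high is not None and found > high:
            problems.append(f"{element}: found {found}, allowed at most {high}")
    return problems
```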

warning_if_missing (optional)

Custom warning message when a recommended section is missing.

Type: String
Applies to: classification: "recommended" only

Example:

"warning_if_missing": "Examples greatly improve documentation usability"

error_message (optional)

Custom error message when validation fails.

Type: String
Applies to: classification: "required" or "improper"

Example:

"error_message": "Internal notes must not appear in published documentation"

alternatives (optional)

Array of alternative section names that satisfy the requirement.

Type: Array of strings
Usage: If any alternative is present, requirement is satisfied

Example:

{
  "classification": "required",
  "alternatives": ["EXAMPLES", "USAGE", "TUTORIAL"]
}
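
One way to read the alternatives rule is sketched below. It assumes exact, case-sensitive name matching and that the section's own name counts alongside its alternatives; neither detail is stated in this specification, and `requirement_satisfied` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable


def requirement_satisfied(
    name: str, spec: Dict[str, Any], present_sections: Iterable[str]
) -> bool:
    """True if the section itself or any listed alternative is present."""
    present = set(present_sections)
    candidates = [name, *spec.get("alternatives", [])]
    return any(candidate in present for candidate in candidates)
```

With the example above, a document containing only a USAGE section would satisfy a required EXAMPLES entry.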

Example: Manpage Schema with Sections

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Unix Manpage Schema",
  "x-markitect-sections": {
    "SYNOPSIS": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_title",
      "content_instruction": "Brief command syntax with options and arguments",
      "min_paragraphs": 1,
      "max_paragraphs": 5,
      "min_code_blocks": 0,
      "max_code_blocks": 3,
      "error_message": "SYNOPSIS section is mandatory for all manpages"
    },
    "DESCRIPTION": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_section_name",
      "content_instruction": "Detailed explanation of what the command does",
      "min_paragraphs": 2,
      "error_message": "DESCRIPTION section is mandatory for all manpages"
    },
    "EXAMPLES": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Practical usage examples with explanations",
      "min_code_blocks": 3,
      "warning_if_missing": "Examples greatly improve manpage usability"
    },
    "SEE ALSO": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Related commands and documentation references",
      "warning_if_missing": "Cross-references help users discover related functionality"
    },
    "BUGS": {
      "classification": "optional",
      "heading_level": 2,
      "content_instruction": "Known issues and bug reporting information"
    },
    "DEPRECATED": {
      "classification": "discouraged",
      "heading_level": 2,
      "warning_if_missing": "Consider moving deprecated content to historical documentation"
    },
    "INTERNAL_NOTES": {
      "classification": "improper",
      "heading_level": 2,
      "error_message": "Internal notes must not appear in published manpages"
    }
  }
}

Validation Behavior

Required Sections

"SYNOPSIS": {"classification": "required"}

Validation:

  • Section missing → ERROR (is_valid = False)
  • Section present → Continue validation
  • Custom error_message used if provided

Recommended Sections

"EXAMPLES": {"classification": "recommended"}

Validation:

  • Section missing → WARNING (is_valid = True, with warnings)
  • Section present → Continue validation
  • Custom warning_if_missing used if provided

Optional Sections

"BUGS": {"classification": "optional"}

Validation:

  • Section missing → No impact
  • Section present → Continue validation
  • No messages generated

Discouraged Sections

"DEPRECATED": {"classification": "discouraged"}

Validation:

  • Section missing → No impact
  • Section present → WARNING (is_valid = True, with warnings)
  • Custom warning message used if provided

Improper Sections

"INTERNAL_NOTES": {"classification": "improper"}

Validation:

  • Section missing → No impact
  • Section present → ERROR (is_valid = False)
  • Custom error_message used if provided
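
Taken together, the five behaviors can be sketched as one pass over the section definitions. This is an illustration, not the MarkiTect validator: `validate_sections` and the default message texts are assumptions, alternatives are ignored for brevity, and discouraged sections get a fixed message because the specification does not name the property that carries a custom one.

```python
from typing import Any, Dict, List, Set, Tuple


def validate_sections(
    sections: Dict[str, Dict[str, Any]], present: Set[str]
) -> Tuple[bool, List[str], List[str]]:
    """Return (is_valid, errors, warnings) for the sections found in a document."""
    errors: List[str] = []
    warnings: List[str] = []
    for name, spec in sections.items():
        level = spec["classification"]
        found = name in present
        if level == "required" and not found:
            errors.append(spec.get("error_message", f"Required section missing: {name}"))
        elif level == "recommended" and not found:
            warnings.append(
                spec.get("warning_if_missing", f"Recommended section missing: {name}")
            )
        elif level == "discouraged" and found:
            warnings.append(f"Discouraged section present: {name}")
        elif level == "improper" and found:
            errors.append(spec.get("error_message", f"Improper section present: {name}"))
        # "optional" never produces a message.
    return (not errors, errors, warnings)
```

Warnings never affect `is_valid`; only errors do.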

Extension: x-markitect-content-control

Purpose

Define content validation rules for document sections including pattern matching, quality metrics, and semantic constraints.

Schema Location

Applied at root level or within specific section properties.

Format

{
  "x-markitect-content-control": {
    "section_name": {
      "required_patterns": ["regex_pattern_1", "regex_pattern_2"],
      "discouraged_patterns": ["regex_pattern_1"],
      "forbidden_patterns": ["regex_pattern_1"],
      "content_quality": {
        "min_words": integer,
        "max_words": integer,
        "readability_target": "technical|general|simple|advanced",
        "min_sentences": integer,
        "max_sentences": integer
      },
      "content_instructions": ["instruction_1", "instruction_2"],
      "link_validation": {
        "check_internal": boolean,
        "check_external": boolean,
        "allow_fragments": boolean
      }
    }
  }
}

Property Definitions

required_patterns (optional)

Array of regex patterns that MUST appear in section content.

Type: Array of strings (valid regex patterns)
Validation: ERROR if any pattern missing

Example:

"required_patterns": [
  "\\*\\*[a-z-]+\\*\\*",  // Bold command name
  "\\[.*\\]"              // Options in brackets
]

discouraged_patterns (optional)

Array of regex patterns that SHOULD NOT appear in content.

Type: Array of strings (valid regex patterns)
Validation: WARNING if any pattern found

Example:

"discouraged_patterns": [
  "TODO",
  "FIXME",
  "\\bWIP\\b"
]

forbidden_patterns (optional)

Array of regex patterns that MUST NOT appear in content.

Type: Array of strings (valid regex patterns)
Validation: ERROR if any pattern found

Example:

"forbidden_patterns": [
  "password\\s*=\\s*[\"'].*[\"']",  // Hard-coded passwords
  "api[_-]?key\\s*=\\s*[\"'].*[\"']"  // Hard-coded API keys
]
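
The three pattern properties can be evaluated with Python's standard `re` module, as sketched below. The sketch assumes "appear" means a match anywhere in the section text (`re.search`) and uses Python regex syntax; the specification does not fix the regex dialect, and `check_patterns` is a hypothetical name.

```python
import re
from typing import Any, Dict, List, Tuple


def check_patterns(control: Dict[str, Any], content: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one section's pattern rules."""
    errors: List[str] = []
    warnings: List[str] = []
    for pattern in control.get("required_patterns", []):
        if not re.search(pattern, content):
            errors.append(f"required pattern not found: {pattern}")
    for pattern in control.get("discouraged_patterns", []):
        if re.search(pattern, content):
            warnings.append(f"discouraged pattern found: {pattern}")
    for pattern in control.get("forbidden_patterns", []):
        if re.search(pattern, content):
            errors.append(f"forbidden pattern found: {pattern}")
    return errors, warnings
```

Note that once the JSON is parsed, a pattern written as "\\bWIP\\b" in the schema reaches the regex engine as `\bWIP\b`.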

content_quality (optional)

Quality metrics for section content:

Sub-properties:

  • min_words: Minimum word count (integer ≥ 0)
  • max_words: Maximum word count (integer ≥ min_words)
  • readability_target: Target readability level (enum)
    • "simple" - Elementary school level
    • "general" - General audience
    • "technical" - Technical audience
    • "advanced" - Expert/academic level
  • min_sentences: Minimum sentence count (integer ≥ 0)
  • max_sentences: Maximum sentence count (integer ≥ min_sentences)

Example:

"content_quality": {
  "min_words": 50,
  "max_words": 300,
  "readability_target": "technical",
  "min_sentences": 3
}
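
The word and sentence bounds could be checked roughly as follows. This is a sketch with deliberately naive counting: words are whitespace-separated tokens and sentences are runs of text ending in ".", "!" or "?". The specification does not define either counting rule, does not say whether a violation is an error or a warning, and `readability_target` is not scored here.

```python
import re
from typing import Any, Dict, List


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_sentences(text: str) -> int:
    """Count non-empty chunks between sentence-ending punctuation."""
    return sum(1 for part in re.split(r"[.!?]+", text) if part.strip())


def check_quality(quality: Dict[str, Any], text: str) -> List[str]:
    """Return one finding per violated min_*/max_* bound."""
    observed = {"words": count_words(text), "sentences": count_sentences(text)}
    findings = []
    for unit, found in observed.items():
        low = quality.get(f"min_{unit}")
        high = quality.get(f"max_{unit}")
        if low is not None and found < low:
            findings.append(f"{found} {unit}, below min_{unit}={low}")
        if high is not None and found > high:
            findings.append(f"{found} {unit}, above max_{unit}={high}")
    return findings
```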

content_instructions (optional)

Array of human-readable instructions for content creation.

Type: Array of strings
Usage: Displayed in templates, validation reports, and documentation

Example:

"content_instructions": [
  "Show command name in bold",
  "Include all major options",
  "Use italic for arguments and placeholders",
  "Keep syntax examples concise (1-3 lines)"
]

link_validation (optional)

Link checking configuration:

Sub-properties:

  • check_internal: Validate internal document links (boolean)
  • check_external: Validate external URLs (boolean)
  • allow_fragments: Allow fragment-only links like #section (boolean)

Default: All false (no link validation)

Example:

"link_validation": {
  "check_internal": true,
  "check_external": false,
  "allow_fragments": true
}
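
A sketch of how these flags might select which links to check is shown below. Everything beyond the three flag names is an assumption: the inline-link regex, the classification of targets into fragment, external, and internal, and the helper names. Fragment-only links are returned separately because the specification does not say what happens to them when `allow_fragments` is false.

```python
import re
from typing import Any, Dict, List

# Inline Markdown links of the form [text](target); reference-style links are ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def classify_link(target: str) -> str:
    """Classify a link target as "fragment", "external", or "internal"."""
    if target.startswith("#"):
        return "fragment"
    if re.match(r"[A-Za-z][A-Za-z0-9+.-]*://", target):
        return "external"
    return "internal"


def select_links(config: Dict[str, Any], markdown: str) -> Dict[str, List[str]]:
    """Split link targets into those to verify and fragment-only links."""
    selected: Dict[str, List[str]] = {"to_check": [], "fragments": []}
    for target in LINK_RE.findall(markdown):
        kind = classify_link(target)
        if kind == "fragment":
            selected["fragments"].append(target)
        elif config.get(f"check_{kind}", False):  # flags default to false
            selected["to_check"].append(target)
    return selected
```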

Example: Content Control for API Documentation

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "API Documentation Schema",
  "x-markitect-content-control": {
    "synopsis": {
      "required_patterns": [
        "\\*\\*[A-Z]+\\*\\*",
        "`/api/.*`"
      ],
      "content_quality": {
        "min_words": 10,
        "max_words": 100,
        "readability_target": "technical"
      },
      "content_instructions": [
        "Start with HTTP method in bold (e.g., **GET**)",
        "Show endpoint path in code format",
        "Include brief one-line description"
      ]
    },
    "request_parameters": {
      "required_patterns": [
        "\\*\\*[a-z_]+\\*\\*.*\\*[A-Za-z]+\\*"
      ],
      "content_instructions": [
        "Use bold for parameter names",
        "Use italic for parameter types",
        "Include description for each parameter",
        "Mark required parameters clearly"
      ]
    },
    "description": {
      "discouraged_patterns": [
        "TODO",
        "FIXME",
        "TBD"
      ],
      "forbidden_patterns": [
        "password\\s*=",
        "secret\\s*=",
        "token\\s*="
      ],
      "content_quality": {
        "min_words": 50,
        "max_words": 500,
        "readability_target": "technical",
        "min_sentences": 3
      },
      "link_validation": {
        "check_internal": true,
        "check_external": true,
        "allow_fragments": true
      }
    }
  }
}

Validation Result Structure

Enhanced ValidationResult Class

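from __future__ import annotations  # ValidationError/ValidationWarning are not defined in this spec

from typing import Any, Dict, List, Literal


class ValidationResult: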
    """Result of schema validation with classification support."""

    status: Literal["valid", "valid_with_warnings", "invalid"]
    errors: List[ValidationError]      # Required/improper violations
    warnings: List[ValidationWarning]  # Recommended/discouraged violations
    suggestions: List[str]             # Optional improvements
    quality_metrics: Dict[str, Any]    # Content quality scores

Validation Status Values

  • "valid": No errors, no warnings. Document fully conforms.
  • "valid_with_warnings": No errors, but has warnings. Document acceptable but improvable.
  • "invalid": Has errors. Document does not conform to schema.
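
The three status values follow directly from the error and warning lists. A minimal sketch, assuming a hypothetical helper named `derive_status`:

```python
from typing import Sequence


def derive_status(errors: Sequence[object], warnings: Sequence[object]) -> str:
    """Map the collected errors and warnings to a status value."""
    if errors:
        return "invalid"  # any error fails validation, whatever the warnings
    if warnings:
        return "valid_with_warnings"
    return "valid"
```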

Error Types

from enum import Enum


class ValidationErrorType(Enum):
    MISSING_REQUIRED_SECTION = "missing_required_section"
    IMPROPER_SECTION_PRESENT = "improper_section_present"
    CONTENT_PATTERN_MISSING = "content_pattern_missing"
    CONTENT_PATTERN_FORBIDDEN = "content_pattern_forbidden"
    CONTENT_TOO_SHORT = "content_too_short"
    CONTENT_TOO_LONG = "content_too_long"
    INVALID_LINK = "invalid_link"
    STRUCTURE_MISMATCH = "structure_mismatch"

Warning Types

from enum import Enum


class ValidationWarningType(Enum):
    MISSING_RECOMMENDED_SECTION = "missing_recommended_section"
    DISCOURAGED_SECTION_PRESENT = "discouraged_section_present"
    CONTENT_PATTERN_DISCOURAGED = "content_pattern_discouraged"
    CONTENT_QUALITY_BELOW_TARGET = "content_quality_below_target"
    READABILITY_MISMATCH = "readability_mismatch"

Metaschema Validation

Extension Validation Rules

The MarkiTect metaschema validates these extensions:

{
  "x-markitect-sections": {
    "type": "object",
    "patternProperties": {
      "^[A-Z][A-Z0-9_ ]*$": {
        "type": "object",
        "properties": {
          "classification": {
            "type": "string",
            "enum": ["required", "recommended", "optional", "discouraged", "improper"]
          },
          "heading_level": {
            "type": "integer",
            "minimum": 1,
            "maximum": 6
          },
          "position": {
            "type": "string",
            "enum": ["after_title", "before_section_name", "after_section_name", "anywhere"]
          },
          "content_instruction": {"type": "string"},
          "min_paragraphs": {"type": "integer", "minimum": 0},
          "max_paragraphs": {"type": "integer", "minimum": 0},
          "min_code_blocks": {"type": "integer", "minimum": 0},
          "max_code_blocks": {"type": "integer", "minimum": 0},
          "min_lists": {"type": "integer", "minimum": 0},
          "max_lists": {"type": "integer", "minimum": 0},
          "warning_if_missing": {"type": "string"},
          "error_message": {"type": "string"},
          "alternatives": {
            "type": "array",
            "items": {"type": "string"}
          }
        },
        "required": ["classification"]
      }
    }
  },
  "x-markitect-content-control": {
    "type": "object",
    "patternProperties": {
      "^[a-z][a-z0-9_]*$": {
        "type": "object",
        "properties": {
          "required_patterns": {
            "type": "array",
            "items": {"type": "string", "format": "regex"}
          },
          "discouraged_patterns": {
            "type": "array",
            "items": {"type": "string", "format": "regex"}
          },
          "forbidden_patterns": {
            "type": "array",
            "items": {"type": "string", "format": "regex"}
          },
          "content_quality": {
            "type": "object",
            "properties": {
              "min_words": {"type": "integer", "minimum": 0},
              "max_words": {"type": "integer", "minimum": 0},
              "readability_target": {
                "type": "string",
                "enum": ["simple", "general", "technical", "advanced"]
              },
              "min_sentences": {"type": "integer", "minimum": 0},
              "max_sentences": {"type": "integer", "minimum": 0}
            }
          },
          "content_instructions": {
            "type": "array",
            "items": {"type": "string"}
          },
          "link_validation": {
            "type": "object",
            "properties": {
              "check_internal": {"type": "boolean"},
              "check_external": {"type": "boolean"},
              "allow_fragments": {"type": "boolean"}
            }
          }
        }
      }
    }
  }
}
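
To make the x-markitect-sections rules concrete, the sketch below hand-checks a few of them using only the standard library. It is not the code in markitect/metaschema.py, and `check_sections_extension` is a hypothetical name. It mirrors one property of the rules as written: patternProperties constrains only names that match the pattern, so a non-matching name is skipped rather than rejected.

```python
import re
from typing import Any, Dict, List

# Same character classes as the metaschema pattern ^[A-Z][A-Z0-9_ ]*$, applied via fullmatch.
SECTION_NAME = re.compile(r"[A-Z][A-Z0-9_ ]*")
CLASSIFICATIONS = {"required", "recommended", "optional", "discouraged", "improper"}


def check_sections_extension(sections: Dict[str, Dict[str, Any]]) -> List[str]:
    """Check classification and heading_level for each matching section name."""
    problems: List[str] = []
    for name, spec in sections.items():
        if not SECTION_NAME.fullmatch(name):
            continue  # not covered by patternProperties, so left unconstrained
        if "classification" not in spec:
            problems.append(f"{name}: missing required property 'classification'")
        elif spec["classification"] not in CLASSIFICATIONS:
            problems.append(f"{name}: invalid classification {spec['classification']!r}")
        level = spec.get("heading_level")
        if level is not None:
            is_int = isinstance(level, int) and not isinstance(level, bool)
            if not is_int or not 1 <= level <= 6:
                problems.append(f"{name}: heading_level must be an integer from 1 to 6")
    return problems
```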

Implementation Notes

Phase 1 Scope

  1. Define and document extension formats ✓
  2. Update metaschema to validate extensions
  3. Implement basic classification validation (required/recommended/optional/discouraged/improper)
  4. Create example schemas demonstrating all features
  5. Update CLI to report errors vs warnings separately

Future Enhancements (Phase 2+)

  • Content pattern matching implementation
  • Quality metrics calculation
  • Link validation
  • Readability scoring
  • Position constraints enforcement

Version History

  • v1.0 (Draft) - Initial specification for Phase 1 implementation
    • x-markitect-sections extension defined
    • x-markitect-content-control extension defined
    • Validation result structure defined
    • Metaschema validation rules defined

References

  • JSON Schema Draft-07: https://json-schema.org/draft-07/schema
  • MarkiTect Schema Evolution Workplan: examples/manpages/SCHEMA_EVOLUTION_WORKPLAN.md
  • Existing Metaschema: markitect/schemas/markitect-metaschema.json
  • Metaschema Validator: markitect/metaschema.py