markitect-main/examples/manpages/markdown-schema-validation.1.md
tegwick c46d9f7a0b docs: update schema validation manual with Phase 1 features
Comprehensively document the new classification system and content control
features added in Phase 1.

## Documentation Updates

### New Content Added

**1. Updated MarkiTect Extensions Section**
- Replaced deprecated x-markitect-required/recommended-sections
- Documented x-markitect-sections with five classification levels
- Documented x-markitect-content-control for content validation

**2. Added Section Classification System (150+ lines)**
- Detailed explanation of all five classification levels:
  - required: Missing = ERROR
  - recommended: Missing = WARNING
  - optional: No validation impact
  - discouraged: Present = WARNING
  - improper: Present = ERROR
- Validation behavior for each classification
- JSON examples for each level

**3. Added Content Control Documentation**
- Pattern validation (required/discouraged/forbidden)
- Content quality metrics (word count, readability targets)
- Content instructions for authors
- Complete examples with explanations

**4. Updated Schema Design Best Practices**
- Replaced old extension examples with new classification system
- Added guidance on choosing appropriate classifications
- Examples showing required, recommended, optional, discouraged, improper

**5. Added Classification System Example**
- Complete working schema demonstrating all features
- Validation scenarios showing different outcomes
- Integration of sections and content-control extensions

## Changes Summary

**Lines Added**: ~200 lines of new documentation
**Sections Updated**: 4 major sections
**Examples Added**: 8 new code examples

**Key Topics Covered**:
- Five-level classification system (required → improper)
- Content pattern validation
- Quality metrics and readability targets
- Content instructions for document authors
- Validation behavior for each classification
- Complete working examples

## Validation

- Manual validates against improved markdown-manpage-schema.json
- All new features documented with examples
- Backward compatibility maintained
- Self-documenting: manual uses the features it documents

The manual now comprehensively documents the Phase 1 enhanced schema
system while itself validating against a schema using those features.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 21:20:27 +01:00


markdown-schema-validation(7) - Structured Document Validation with JSON Schema

SYNOPSIS

markitect schema-generate SOURCE_FILE [--output SCHEMA_FILE]

markitect schema-ingest SCHEMA_FILE

markitect validate DOCUMENT SCHEMA

markitect generate-stub SCHEMA [--output FILE]

DESCRIPTION

Markdown Schema Validation is MarkiTect's system for enforcing structural consistency in markdown documents. Unlike traditional markdown linters that check syntax, schema validation ensures documents conform to predefined structural patterns by validating their Abstract Syntax Tree (AST) representation against JSON Schema definitions.

This approach enables content management workflows where document structure is as important as content, making it ideal for technical documentation, business documents, and any scenario requiring consistent document templates.

How Schema Validation Works

MarkiTect parses markdown files into an AST representation, then validates the AST structure against JSON schemas. The validation process checks:

  • Heading hierarchy - Required heading levels and counts
  • Content elements - Minimum and maximum paragraph counts
  • Structural patterns - Presence of lists, code blocks, tables
  • Section organization - Required and optional document sections

Schemas validate structure, not semantics. A document can pass validation while containing incorrect content, as long as the structure matches the schema.
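As a rough illustration of the count-based checks listed above, the sketch below summarizes a markdown string and compares the counts with minItems/maxItems bounds. It is not MarkiTect's implementation: `summarize` and `check_bounds` are hypothetical helpers, and the line-based scan is a simplified stand-in for a real AST parse.

```python
import re


def summarize(markdown: str) -> dict:
    """Count headings (levels 1-3), paragraphs, and fenced code blocks.

    Simplified: any run of non-blank, non-heading lines counts as one paragraph.
    """
    counts = {"level_1": 0, "level_2": 0, "level_3": 0,
              "paragraphs": 0, "code_blocks": 0}
    in_fence = False
    in_paragraph = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            if not in_fence:
                counts["code_blocks"] += 1  # count each opening fence once
            in_fence = not in_fence
            in_paragraph = False
            continue
        if in_fence:
            continue  # ignore everything inside a code block
        heading = re.match(r"^(#{1,3})\s+\S", stripped)
        if heading:
            counts[f"level_{len(heading.group(1))}"] += 1
            in_paragraph = False
        elif not stripped:
            in_paragraph = False
        elif not in_paragraph:
            counts["paragraphs"] += 1
            in_paragraph = True
    return counts


def check_bounds(counts: dict, bounds: dict) -> list:
    """Return one message per count outside its minItems/maxItems range."""
    problems = []
    for name, limit in bounds.items():
        found = counts.get(name, 0)
        if found < limit.get("minItems", 0):
            problems.append(f"{name}: found {found}, minimum {limit['minItems']}")
        if "maxItems" in limit and found > limit["maxItems"]:
            problems.append(f"{name}: found {found}, maximum {limit['maxItems']}")
    return problems
```

Note that the check only looks at counts: a document whose paragraphs are factually wrong still passes, which is the structure-versus-semantics distinction made above.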

SCHEMA STRUCTURE

JSON Schema Format

MarkiTect schemas are standard JSON Schema (draft-07) documents with custom extensions for markdown-specific validation.

Standard Properties

properties.headings
    Defines heading structure by level (level_1, level_2, level_3)
    Each level specifies minItems, maxItems, and content patterns
properties.paragraphs
    Array constraints for paragraph counts
    Validates document length and content density
properties.code_blocks
    Array constraints for code examples
    Ensures technical documentation includes examples
properties.lists
    Array constraints for list elements
    Validates presence of structured information
properties.emphasis
    Array constraints for bold and italic text
    Ensures appropriate use of emphasis

MarkiTect Extensions

MarkiTect extends JSON Schema with custom properties prefixed with x-markitect-:

x-markitect-sections
    Section classification and content control system
    Defines sections with five classification levels:
      • required: Must be present (validation fails if missing)
      • recommended: Should be present (warning if missing)
      • optional: May be present (no validation impact)
      • discouraged: Should not be present (warning if present)
      • improper: Must not be present (validation fails if present)
    Each section can specify content instructions, constraints, and custom messages
x-markitect-content-control
    Content validation rules for section content
    Defines required/discouraged/forbidden patterns
    Specifies content quality metrics (word count, readability)
    Provides content instructions for authors
x-markitect-outline-mode
    Boolean enabling outline-only validation
    Focuses on heading structure without content validation
x-markitect-heading-text-capture
    Boolean enabling exact heading text validation
    Enforces specific section names

COMMANDS

Schema Generation

markitect schema-generate SOURCE_FILE
    Analyzes markdown file AST and generates JSON schema
    Schema describes actual structure found in source document
--output SCHEMA_FILE
    Write schema to file instead of stdout
    Default: outputs to terminal
--max-depth N
    Limit heading analysis to depth N
    Useful for outline-focused schemas

Schema Management

markitect schema-ingest SCHEMA_FILE
    Store schema in MarkiTect database
    Registers schema for reuse with validation commands
markitect schema-list
    Display all stored schemas
    Shows schema names and metadata
markitect schema-get SCHEMA_NAME
    Retrieve stored schema
    Outputs JSON schema to stdout
markitect schema-delete SCHEMA_NAME
    Remove schema from database
    Permanently deletes schema definition

Document Validation

markitect validate DOCUMENT SCHEMA
    Validate markdown document against schema
    Returns exit code 0 for valid, 4 for invalid
--detailed-errors
    Show detailed validation error messages
    Includes suggestions for fixing violations
--quiet
    Suppress output, exit code only
    Useful for scripting and automation

Template Generation

markitect generate-stub SCHEMA
    Generate markdown template from schema
    Creates document outline following schema structure
--output FILE
    Write template to file
    Default: outputs to stdout

WORKFLOW

Schema-Driven Development Workflow

The typical workflow for schema-based document management:

1. Generate Schema from Example

Create or identify an exemplar document with the desired structure, then generate its schema:

markitect schema-generate exemplar.md --output doc-schema.json

2. Refine Schema

Edit the generated schema to adjust constraints:

  • Change minItems/maxItems for flexibility
  • Add x-markitect-sections classifications
  • Adjust heading patterns
  • Add content instructions

3. Store Schema

Register schema for reuse:

markitect schema-ingest doc-schema.json

4. Generate Templates

Create document templates from schema:

markitect generate-stub doc-schema.json --output template.md

5. Create Documents

Write new documents using the template as a starting point, or use existing documents.

6. Validate Documents

Ensure documents conform to schema:

markitect validate new-document.md doc-schema.json

markitect validate new-document.md doc-schema.json --detailed-errors

7. Iterate

Fix validation errors and re-validate until document passes.

Batch Validation Workflow

For managing multiple documents:

for doc in docs/*.md; do
    markitect validate "$doc" doc-schema.json --quiet || echo "Failed: $doc"
done
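The same loop could be driven from Python when the list of failures is needed as data. This is a minimal sketch that relies only on the documented `markitect validate DOCUMENT SCHEMA --quiet` invocation and its zero/non-zero exit code; the `failed_documents` helper and its injectable `run` parameter are illustrative assumptions, not part of MarkiTect.

```python
import subprocess


def failed_documents(docs, schema, run=subprocess.run):
    """Return the documents for which `markitect validate` exits non-zero.

    `run` defaults to subprocess.run and can be swapped out for testing.
    """
    failures = []
    for doc in docs:
        result = run(["markitect", "validate", str(doc), str(schema), "--quiet"])
        if result.returncode != 0:
            failures.append(str(doc))
    return failures


# Example use (requires markitect on PATH):
#   from pathlib import Path
#   for name in failed_documents(sorted(Path("docs").glob("*.md")), "doc-schema.json"):
#       print(f"Failed: {name}")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.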

VALIDATION RULES

Heading Validation

Schemas validate heading structure through the headings property:

level_1 headings must appear exactly once (document title)

level_2 headings represent major sections (minItems/maxItems set bounds)

level_3 headings provide subsections (often optional with minItems: 0)

Heading content can be validated with pattern or enum constraints for exact section names.

Content Element Validation

Paragraphs - Validates document has sufficient descriptive content

Code blocks - Ensures technical documents include examples

Lists - Validates structured information presence

Emphasis - Checks for appropriate use of bold/italic formatting

Constraints use minItems and maxItems to set acceptable ranges.

Metadata Validation

The metadata property validates overall document characteristics:

total_elements - Total AST node count

structure_types - Array of AST node types present

Use const for exact matches or ranges for flexibility.

Section Classification System

MarkiTect provides a five-level classification system for document sections through x-markitect-sections:

Required Sections

Sections marked as required must be present in the document. Validation fails with an error if missing.

"SYNOPSIS": {
  "classification": "required",
  "error_message": "SYNOPSIS section is mandatory for all manpages"
}

Validation Behavior:

  • Missing → ERROR → validation fails
  • Present → Continue validation

Recommended Sections

Sections marked as recommended should be present. A warning is generated if missing, but validation succeeds.

"EXAMPLES": {
  "classification": "recommended",
  "warning_if_missing": "Examples improve documentation usability"
}

Validation Behavior:

  • Missing → WARNING → validation succeeds with warnings
  • Present → Continue validation

Optional Sections

Sections marked as optional may or may not be present with no validation impact.

"BUGS": {
  "classification": "optional",
  "content_instruction": "Known issues and bug reporting"
}

Validation Behavior:

  • Missing → No impact
  • Present → Continue validation

Discouraged Sections

Sections marked as discouraged should not be present. A warning is generated if found, but validation succeeds.

"DEPRECATED": {
  "classification": "discouraged",
  "warning_if_missing": "Move deprecated content to HISTORY section"
}

Validation Behavior:

  • Missing → No impact
  • Present → WARNING → validation succeeds with warnings

Improper Sections

Sections marked as improper must not be present. Validation fails with an error if found.

"TODO": {
  "classification": "improper",
  "error_message": "TODO sections must be removed before publication"
}

Validation Behavior:

  • Missing → No impact
  • Present → ERROR → validation fails
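The five behaviors above reduce to a small decision table. The sketch below is one possible reading of it, not MarkiTect's code: `classify_sections` is a hypothetical helper, treating an unspecified classification as optional is an assumption, and the fallback messages are invented for illustration. Only the `error_message` and `warning_if_missing` keys are taken from the examples above.

```python
def classify_sections(sections: dict, present: set) -> tuple:
    """Return (errors, warnings) given the section names found in a document."""
    errors, warnings = [], []
    for name, rule in sections.items():
        level = rule.get("classification", "optional")  # assumed default
        found = name in present
        if level == "required" and not found:
            errors.append(rule.get("error_message",
                                   f"Required section '{name}' not found"))
        elif level == "recommended" and not found:
            warnings.append(rule.get("warning_if_missing",
                                     f"Recommended section '{name}' not found"))
        elif level == "discouraged" and found:
            warnings.append(f"Discouraged section '{name}' is present")
        elif level == "improper" and found:
            errors.append(rule.get("error_message",
                                   f"Improper section '{name}' is present"))
        # optional sections never produce a message
    return errors, warnings


SECTIONS = {
    "SYNOPSIS": {"classification": "required",
                 "error_message": "SYNOPSIS section is mandatory for all manpages"},
    "EXAMPLES": {"classification": "recommended",
                 "warning_if_missing": "Examples improve documentation usability"},
    "BUGS": {"classification": "optional"},
    "DEPRECATED": {"classification": "discouraged"},
    "TODO": {"classification": "improper",
             "error_message": "TODO sections must be removed before publication"},
}

errors, warnings = classify_sections(SECTIONS, {"SYNOPSIS", "TODO"})
is_valid = not errors  # warnings alone do not fail validation
```

In this reading a document is valid exactly when the error list is empty, which matches the "validation succeeds with warnings" outcome described for recommended and discouraged sections.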

Content Control

The x-markitect-content-control extension enables content-level validation:

Pattern Validation

required_patterns - Array of regex patterns that must appear in content:

"required_patterns": ["\\*\\*command\\*\\*", "\\[.*\\]"]

discouraged_patterns - Patterns that should not appear (generates warnings):

"discouraged_patterns": ["TODO", "FIXME", "\\bWIP\\b"]

forbidden_patterns - Patterns that must not appear (validation fails):

"forbidden_patterns": ["password\\s*=", "api[_-]?key\\s*="]

Content Quality Metrics

Validate content length and readability:

"content_quality": {
  "min_words": 50,
  "max_words": 1000,
  "readability_target": "technical",
  "min_sentences": 3
}

Readability Targets:

  • simple - Elementary school level
  • general - General audience
  • technical - Technical audience (default for documentation)
  • advanced - Expert/academic level
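To make the pattern rules and length metrics above concrete, here is a minimal sketch of how they might be evaluated for one section's text. `check_content` is a hypothetical helper; treating missing required patterns and length violations as errors is an assumption, the sentence splitter is deliberately naive, and `readability_target` is not evaluated because the scoring method is not described here. The patterns are the JSON examples above after string decoding.

```python
import re


def check_content(text: str, control: dict) -> tuple:
    """Return (errors, warnings) for one section's text."""
    errors, warnings = [], []
    for pattern in control.get("required_patterns", []):
        if not re.search(pattern, text):
            errors.append(f"Required pattern not found: {pattern}")
    for pattern in control.get("forbidden_patterns", []):
        if re.search(pattern, text):
            errors.append(f"Forbidden pattern found: {pattern}")
    for pattern in control.get("discouraged_patterns", []):
        if re.search(pattern, text):
            warnings.append(f"Discouraged pattern found: {pattern}")

    quality = control.get("content_quality", {})
    words = len(text.split())
    # Naive sentence count: split on ., ! or ? followed by whitespace or end of text.
    sentences = len([s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()])
    if words < quality.get("min_words", 0):
        errors.append(f"Too few words (found {words}, minimum {quality['min_words']})")
    if "max_words" in quality and words > quality["max_words"]:
        errors.append(f"Too many words (found {words}, maximum {quality['max_words']})")
    if sentences < quality.get("min_sentences", 0):
        errors.append(
            f"Too few sentences (found {sentences}, minimum {quality['min_sentences']})")
    return errors, warnings


CONTROL = {
    "required_patterns": [r"\*\*command\*\*", r"\[.*\]"],
    "discouraged_patterns": ["TODO", "FIXME", r"\bWIP\b"],
    "forbidden_patterns": [r"password\s*=", r"api[_-]?key\s*="],
    "content_quality": {"min_words": 3, "max_words": 10, "min_sentences": 1},
}
```

The `\b` word boundaries in the WIP pattern matter: without them, a word such as "SWIPE" would trigger the warning.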

Content Instructions

Provide guidance for content authors:

"content_instructions": [
  "Show command name in bold",
  "Use brackets [] for optional arguments",
  "Keep synopsis concise (1-5 lines)"
]

These instructions appear in validation reports and generated templates.

ERROR HANDLING

Common Validation Errors

Missing Required Section

Error: Required section 'SYNOPSIS' not found
Suggestion: Add H2 heading '## SYNOPSIS' near document start

Insufficient Content

Error: Too few paragraphs (found 3, minimum 5 required)
Suggestion: Add descriptive content to meet minimum paragraph count

Heading Count Mismatch

Error: Too many H2 headings (found 15, maximum 13 allowed)
Suggestion: Combine related sections or adjust schema maxItems

Structure Type Mismatch

Error: Expected structure types not found: code_blocks
Suggestion: Add code examples using fenced code blocks

Using Detailed Error Mode

Enable detailed errors for actionable feedback:

markitect validate document.md schema.json --detailed-errors

Output includes:

  • Specific constraint violations
  • Location information when available
  • Suggestions for fixes
  • Schema path to failing constraint

SCHEMA DESIGN

Best Practices

Start with Real Documents

Generate schemas from actual documents rather than writing from scratch. Real documents provide realistic constraints.

Use Ranges, Not Exact Counts

Allow flexibility with minItems/maxItems ranges:

"paragraphs": {
  "minItems": 10,
  "maxItems": 100
}

Avoid exact counts (const) unless structure is truly rigid.

Section Classification

Use the five-level classification system to define section requirements:

"x-markitect-sections": {
  "SYNOPSIS": {
    "classification": "required",
    "content_instruction": "Brief command syntax",
    "error_message": "SYNOPSIS is mandatory"
  },
  "EXAMPLES": {
    "classification": "recommended",
    "warning_if_missing": "Examples improve usability"
  },
  "BUGS": {
    "classification": "optional"
  }
}

Choose classifications based on importance:

  • required for essential sections (SYNOPSIS, DESCRIPTION)
  • recommended for important sections (EXAMPLES, SEE ALSO)
  • optional for nice-to-have sections (BUGS, AUTHORS)
  • discouraged for sections that should be elsewhere (DEPRECATED)
  • improper for sections that must not appear (TODO, INTERNAL_NOTES)

Heading Patterns

Use regex patterns for flexible heading validation:

"pattern": "^[A-Z][A-Z ]+$"

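Matches any section name of two or more characters written in uppercase letters and spaces, so new sections can be added without listing each name in the schema.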
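A quick way to see what this pattern accepts is to try it against a few headings. The sketch uses Python's `re` module, whose behavior for this particular pattern should match the regex dialect JSON Schema uses; `is_valid_heading` is a hypothetical helper name.

```python
import re

# The heading pattern from the schema fragment above.
HEADING_PATTERN = re.compile(r"^[A-Z][A-Z ]+$")


def is_valid_heading(text: str) -> bool:
    """True when the heading consists only of uppercase letters and spaces."""
    return HEADING_PATTERN.match(text) is not None


accepted = [h for h in ["SYNOPSIS", "SEE ALSO", "Synopsis", "EXIT STATUS 2", "A"]
            if is_valid_heading(h)]
```

Mixed-case names, names containing digits, and single-letter names are rejected, since the pattern requires one uppercase letter followed by at least one more uppercase letter or space.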

Progressive Refinement

Start with loose constraints, tighten based on validation experience with real documents.

Anti-Patterns

Over-Specification

Avoid schemas that are too specific:

"paragraphs": { "const": 47 }

This requires exactly 47 paragraphs, which is too rigid for most use cases.

Under-Specification

Avoid schemas that validate nothing:

"paragraphs": { "minItems": 0 }

Provide meaningful constraints that ensure document quality.

Semantic Validation

Schemas validate structure, not content. Don't expect schemas to validate:

  • Correct grammar or spelling
  • Factual accuracy
  • Code correctness
  • Logical flow

Use other tools for semantic validation.

INTEGRATION

CI/CD Integration

Validate documentation in continuous integration:

markitect validate README.md readme-schema.json --quiet
exit_code=$?

if [ $exit_code -eq 0 ]; then
    echo "Documentation valid"
else
    echo "Documentation validation failed"
    markitect validate README.md readme-schema.json --detailed-errors
    exit 1
fi

Git Hooks

Pre-commit hook for automatic validation:

changed_docs=$(git diff --cached --name-only --diff-filter=ACM | grep '\.md$')

for doc in $changed_docs; do
    schema="${doc%.md}-schema.json"
    if [ -f "$schema" ]; then
        markitect validate "$doc" "$schema" || exit 1
    fi
done
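The hook pairs each staged markdown file with a schema named after it. The mapping it relies on can be sketched as below; `staged_markdown` and `schema_for` are hypothetical helpers that mirror the `grep` filter and the `${doc%.md}-schema.json` shell expansion, nothing more.

```python
def staged_markdown(names):
    """Keep only paths ending in a literal '.md'."""
    return [name for name in names if name.endswith(".md")]


def schema_for(doc: str) -> str:
    """Mirror "${doc%.md}-schema.json": strip a trailing '.md', then add the suffix."""
    stem = doc[:-3] if doc.endswith(".md") else doc
    return f"{stem}-schema.json"


pairs = [(doc, schema_for(doc))
         for doc in staged_markdown(["docs/guide.md", "src/main.py", "README.md"])]
```

Documents without a matching schema file are skipped by the hook, so adding a `-schema.json` file next to a document is what opts it in to validation.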

Build Systems

Makefile integration:

.PHONY: validate-docs
validate-docs:
	@for doc in docs/*.md; do \
		markitect validate "$$doc" doc-schema.json || exit 1; \
	done

.PHONY: build
build: validate-docs
	# Build process continues only if docs validate

EXAMPLES

Generate Schema from Document

markitect schema-generate examples/invoice.md --output invoice-schema.json

Store Schema for Reuse

markitect schema-ingest invoice-schema.json
markitect schema-list

Validate Single Document

markitect validate draft-invoice.md invoice-schema.json

markitect validate draft-invoice.md invoice-schema.json --detailed-errors

Batch Validation

for invoice in invoices/*.md; do
    markitect validate "$invoice" invoice-schema.json --quiet
    if [ $? -ne 0 ]; then
        echo "Invalid: $invoice"
        markitect validate "$invoice" invoice-schema.json --detailed-errors
    fi
done

Template Generation

markitect generate-stub invoice-schema.json --output new-invoice-template.md

cat new-invoice-template.md

markitect validate new-invoice-template.md invoice-schema.json

Schema Refinement Workflow

markitect schema-generate example.md --output v1-schema.json

markitect validate test-doc.md v1-schema.json --detailed-errors

markitect schema-generate example.md --max-depth 2 --output v2-schema.json

markitect validate test-doc.md v2-schema.json

Schema with Classification System

Create a schema with section classifications and content control:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Technical Documentation Schema",
  "x-markitect-sections": {
    "OVERVIEW": {
      "classification": "required",
      "heading_level": 2,
      "content_instruction": "High-level description of the system",
      "min_paragraphs": 2,
      "error_message": "OVERVIEW section is required"
    },
    "EXAMPLES": {
      "classification": "recommended",
      "heading_level": 2,
      "min_code_blocks": 2,
      "warning_if_missing": "Examples help users understand usage"
    },
    "REFERENCES": {
      "classification": "optional",
      "heading_level": 2,
      "content_instruction": "External documentation and resources"
    },
    "TODO": {
      "classification": "improper",
      "error_message": "Remove TODO sections before publishing"
    }
  },
  "x-markitect-content-control": {
    "overview": {
      "discouraged_patterns": ["TODO", "FIXME"],
      "forbidden_patterns": ["password", "secret"],
      "content_quality": {
        "min_words": 100,
        "max_words": 500,
        "readability_target": "technical"
      }
    }
  },
  "properties": {
    "headings": {
      "properties": {
        "level_1": {"minItems": 1, "maxItems": 1},
        "level_2": {"minItems": 2, "maxItems": 20}
      }
    },
    "paragraphs": {"minItems": 10, "maxItems": 200},
    "code_blocks": {"minItems": 1}
  }
}

Validate documents against this schema:

# Missing required section = ERROR
markitect validate doc-without-overview.md tech-schema.json
# Result: INVALID - missing required section OVERVIEW

# Missing recommended section = WARNING
markitect validate doc-without-examples.md tech-schema.json
# Result: VALID (with warnings) - missing recommended section EXAMPLES

# Improper section present = ERROR
markitect validate doc-with-todo.md tech-schema.json
# Result: INVALID - improper section TODO must not be present

FILES

*.json
    JSON schema files defining document structure
    Standard JSON Schema draft-07 format with MarkiTect extensions
markitect.db
    Database storing ingested schemas
    SQLite database in current directory or specified path
.markitect.yml
    Configuration file for default schemas
    YAML format with schema paths and validation rules

EXIT STATUS

0
    Success - document is valid
1
    General error - file not found, invalid arguments
2
    Configuration error - invalid schema file
3
    Database error - schema storage/retrieval failed
4
    Validation error - document does not conform to schema
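Scripts that wrap the command may want to distinguish "the document is invalid" from "the tool could not run". A small lookup built from the table above is one way to do that; `describe_exit` and `is_validation_failure` are hypothetical helper names.

```python
# Exit codes as listed in EXIT STATUS.
EXIT_STATUS = {
    0: "Success: document is valid",
    1: "General error: file not found, invalid arguments",
    2: "Configuration error: invalid schema file",
    3: "Database error: schema storage/retrieval failed",
    4: "Validation error: document does not conform to schema",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into the description from the table above."""
    return EXIT_STATUS.get(code, f"Unknown exit status {code}")


def is_validation_failure(code: int) -> bool:
    """True only when the document itself is invalid (exit code 4)."""
    return code == 4
```

A CI job could, for example, fail the build on code 4 but report codes 1 through 3 as infrastructure problems.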

ENVIRONMENT

MARKITECT_DATABASE
    Path to database file for schema storage
    Default: markitect.db in current directory
MARKITECT_SCHEMA_PATH
    Search path for schema files
    Colon-separated list of directories
MARKITECT_VALIDATION_STRICT
    Enable strict validation mode
    Any non-empty value enables strict mode
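The following sketch shows one way these variables could be interpreted. It is an assumption-laden illustration rather than MarkiTect's lookup code: the helper names are hypothetical, and searching the directories left to right and returning the first match is assumed, not documented.

```python
import os
from pathlib import Path


def database_path(env=os.environ) -> Path:
    """MARKITECT_DATABASE if set and non-empty, else markitect.db in the current directory."""
    return Path(env.get("MARKITECT_DATABASE") or "markitect.db")


def find_schema(name: str, env=os.environ):
    """Search the colon-separated MARKITECT_SCHEMA_PATH for a schema file.

    Returns the first match, or None. Left-to-right order is an assumption.
    """
    for directory in env.get("MARKITECT_SCHEMA_PATH", "").split(":"):
        if directory:
            candidate = Path(directory) / name
            if candidate.is_file():
                return candidate
    return None


def strict_mode(env=os.environ) -> bool:
    """Any non-empty MARKITECT_VALIDATION_STRICT value enables strict mode."""
    return bool(env.get("MARKITECT_VALIDATION_STRICT"))
```

Accepting the environment as a parameter keeps the helpers easy to exercise with a plain dictionary instead of the real process environment.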

SEE ALSO

markitect(1), json-schema(7), markdown-it(7)

Related documentation:

  • JSON Schema Specification (https://json-schema.org/)
  • MarkiTect Schema Reference
  • AST Structure Documentation
  • Template System Guide

LIMITATIONS

Schema validation has inherent limitations:

Structure Only

Schemas validate document structure, not content semantics. Cannot validate:

  • Factual correctness
  • Code functionality
  • Logical consistency
  • Language quality

AST-Based

Validation operates on parsed AST, not raw markdown. Some markdown formatting details may not be preserved or validated.

Performance

Validating large documents against complex schemas can be slow. AST caching mitigates this for repeated validations.

Schema Complexity

Very complex schemas can become difficult to maintain. Keep schemas as simple as possible while meeting requirements.

BUGS

Report bugs at: https://github.com/markitect/markitect/issues

Known issues:

  • Schema generation from very large documents may be slow
  • Some edge cases in heading pattern matching
  • Limited support for custom markdown extensions

AUTHORS

MarkiTect development team

Schema validation system designed for structured content management and documentation consistency.

Copyright (c) 2025 MarkiTect Project. Licensed under MIT License.

VERSION

This manual documents schema validation in MarkiTect version 1.0 and later.