markdown-schema-validation(7) - Structured Document Validation with JSON Schema
SYNOPSIS
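markitect schema-generate SOURCE_FILE [--output SCHEMA_FILE] [--max-depth N]
markitect schema-ingest SCHEMA_FILE
markitect schema-list
markitect schema-get SCHEMA_NAME
markitect schema-delete SCHEMA_NAME
markitect validate DOCUMENT SCHEMA [--detailed-errors] [--quiet]
markitect generate-stub SCHEMA [--output FILE]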
DESCRIPTION
Markdown Schema Validation is MarkiTect's system for enforcing structural consistency in markdown documents. Unlike traditional markdown linters that check syntax, schema validation ensures documents conform to predefined structural patterns by validating their Abstract Syntax Tree (AST) representation against JSON Schema definitions.
This approach supports content management workflows in which document structure matters as much as content. It suits technical documentation, business documents, and any scenario that requires consistent document templates.
How Schema Validation Works
MarkiTect parses markdown files into an AST representation, then validates the AST structure against JSON schemas. The validation process checks:
- Heading hierarchy - Required heading levels and counts
- Content elements - Minimum and maximum paragraph counts
- Structural patterns - Presence of lists, code blocks, tables
- Section organization - Required and optional document sections
Schemas validate structure, not semantics. A document can pass validation while containing incorrect content, as long as the structure matches the schema.
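To illustrate what "structure" means here, the following Python sketch reduces a markdown string to heading titles by level plus counts of paragraphs, code blocks, and lists. It is a line-based approximation, not MarkiTect's parser. The function name `summarize` and the shape of its result are hypothetical, and plain counts stand in for the arrays that the schema properties constrain.

```python
import re

FENCE = "`" * 3  # a fenced code block delimiter


def summarize(markdown: str) -> dict:
    """Approximate the structural summary a schema is checked against.

    Hypothetical helper: a line-based scan, not MarkiTect's AST parser.
    """
    headings = {"level_1": [], "level_2": [], "level_3": []}
    paragraphs = code_blocks = lists = 0
    in_fence = in_paragraph = in_list = False

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            # An opening fence starts a code block; the closing fence ends it.
            if not in_fence:
                code_blocks += 1
            in_fence = not in_fence
            in_paragraph = in_list = False
            continue
        if in_fence:
            continue  # code block contents are not inspected
        heading = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if heading:
            level = f"level_{len(heading.group(1))}"
            headings.setdefault(level, []).append(heading.group(2))
            in_paragraph = in_list = False
        elif re.match(r"^\s*(?:[-*+]|\d+\.)\s+", line):
            # Consecutive list items count as one list.
            if not in_list:
                lists += 1
            in_list, in_paragraph = True, False
        elif line.strip():
            # Consecutive non-blank text lines count as one paragraph.
            if not in_paragraph:
                paragraphs += 1
            in_paragraph, in_list = True, False
        else:
            in_paragraph = in_list = False  # a blank line ends both

    return {
        "headings": headings,
        "paragraphs": paragraphs,
        "code_blocks": code_blocks,
        "lists": lists,
    }
```

A real AST parser also handles cases this scan ignores, such as setext headings, nested lists, and indented code blocks.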
SCHEMA STRUCTURE
JSON Schema Format
MarkiTect schemas are standard JSON Schema (draft-07) documents with custom extensions for markdown-specific validation.
Standard Properties
- properties.headings
- Defines heading structure by level (level_1, level_2, level_3)
- Each level specifies minItems, maxItems, and content patterns
- properties.paragraphs
- Array constraints for paragraph counts
- Validates document length and content density
- properties.code_blocks
- Array constraints for code examples
- Ensures technical documentation includes examples
- properties.lists
- Array constraints for list elements
- Validates presence of structured information
- properties.emphasis
- Array constraints for bold and italic text
- Ensures appropriate use of emphasis
MarkiTect Extensions
MarkiTect extends JSON Schema with custom properties prefixed with x-markitect-:
- x-markitect-required-sections
- Array of required H2 section names
- Example: ["SYNOPSIS", "DESCRIPTION", "EXAMPLES"]
- x-markitect-recommended-sections
- Array of recommended but optional section names
- Generates warnings when missing
- x-markitect-outline-mode
- Boolean enabling outline-only validation
- Focuses on heading structure without content validation
- x-markitect-heading-text-capture
- Boolean enabling exact heading text validation
- Enforces specific section names
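The two section-list extensions differ only in severity. The Python sketch below assumes section names are compared as exact, case-sensitive H2 text. It reports missing required sections as errors and missing recommended ones as warnings. The function name and the warning wording are hypothetical; the error wording follows the example under ERROR HANDLING. Standard draft-07 validators ignore keywords they do not recognize, so x-markitect-* properties need tool-specific handling of this kind.

```python
from typing import List, Tuple


def check_sections(schema: dict, h2_titles: List[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the section-list extensions (sketch)."""
    present = set(h2_titles)
    errors = [
        f"Required section '{name}' not found"
        for name in schema.get("x-markitect-required-sections", [])
        if name not in present
    ]
    # Recommended sections only produce warnings (hypothetical wording).
    warnings = [
        f"Recommended section '{name}' not found"
        for name in schema.get("x-markitect-recommended-sections", [])
        if name not in present
    ]
    return errors, warnings


manpage_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "x-markitect-required-sections": ["SYNOPSIS", "DESCRIPTION"],
    "x-markitect-recommended-sections": ["EXAMPLES"],
}

errors, warnings = check_sections(manpage_schema, ["SYNOPSIS", "OPTIONS"])
# errors   == ["Required section 'DESCRIPTION' not found"]
# warnings == ["Recommended section 'EXAMPLES' not found"]
```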
COMMANDS
Schema Generation
- markitect schema-generate SOURCE_FILE
- Analyzes markdown file AST and generates JSON schema
- Schema describes actual structure found in source document
- --output SCHEMA_FILE
- Write schema to file instead of stdout
- Default: outputs to terminal
- --max-depth N
- Limit heading analysis to depth N
- Useful for outline-focused schemas
Schema Management
- markitect schema-ingest SCHEMA_FILE
- Store schema in MarkiTect database
- Registers schema for reuse with validation commands
- markitect schema-list
- Display all stored schemas
- Shows schema names and metadata
- markitect schema-get SCHEMA_NAME
- Retrieve stored schema
- Outputs JSON schema to stdout
- markitect schema-delete SCHEMA_NAME
- Remove schema from database
- Permanently deletes schema definition
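This manual does not describe how markitect.db is laid out. The Python code below is only a sketch of what ingest, list, get, and delete amount to: it keeps each schema as JSON text in an SQLite table. The table name, the columns, explicit schema names, and replace-on-reingest behaviour are all assumptions, not MarkiTect's actual storage format.

```python
import json
import sqlite3
from typing import List, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a schema store; the table layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schemas (name TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def ingest(conn: sqlite3.Connection, name: str, schema: dict) -> None:
    # Assumption: ingesting an existing name replaces the stored definition.
    conn.execute(
        "INSERT OR REPLACE INTO schemas (name, body) VALUES (?, ?)",
        (name, json.dumps(schema)),
    )
    conn.commit()


def list_names(conn: sqlite3.Connection) -> List[str]:
    return [row[0] for row in conn.execute("SELECT name FROM schemas ORDER BY name")]


def get(conn: sqlite3.Connection, name: str) -> Optional[dict]:
    row = conn.execute("SELECT body FROM schemas WHERE name = ?", (name,)).fetchone()
    return json.loads(row[0]) if row else None


def delete(conn: sqlite3.Connection, name: str) -> bool:
    # Deletion is permanent: the row is gone once committed.
    cur = conn.execute("DELETE FROM schemas WHERE name = ?", (name,))
    conn.commit()
    return cur.rowcount > 0
```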
Document Validation
- markitect validate DOCUMENT SCHEMA
- Validate markdown document against schema
- Returns exit code 0 for valid, 4 for invalid
- --detailed-errors
- Show detailed validation error messages
- Includes suggestions for fixing violations
- --quiet
- Suppress output, exit code only
- Useful for scripting and automation
Template Generation
- markitect generate-stub SCHEMA
- Generate markdown template from schema
- Creates document outline following schema structure
- --output FILE
- Write template to file
- Default: outputs to stdout
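One way to picture generate-stub is as the inverse of section validation: walk the schema's section lists and emit a heading for each. The Python sketch below does only that. The placeholder text and the `title` parameter are hypothetical, and MarkiTect's real stubs may also reflect heading patterns and element counts.

```python
from typing import List


def generate_stub(schema: dict, title: str = "TITLE") -> str:
    """Build a markdown outline from a schema's section lists (sketch)."""
    lines: List[str] = [f"# {title}", ""]
    sections = [
        (name, "required")
        for name in schema.get("x-markitect-required-sections", [])
    ] + [
        (name, "recommended")
        for name in schema.get("x-markitect-recommended-sections", [])
    ]
    for name, kind in sections:
        # Hypothetical placeholder paragraph marking what must be written.
        lines += [f"## {name}", "", f"TODO: write the {kind} {name} section.", ""]
    return "\n".join(lines)
```

A stub built this way passes the section checks but can still fail content constraints such as a paragraphs minItems bound, so it is worth validating a freshly generated template before relying on it.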
WORKFLOW
Schema-Driven Development Workflow
The typical workflow for schema-based document management:
1. Generate Schema from Example
Create or identify an exemplar document with the desired structure, then generate its schema:
markitect schema-generate exemplar.md --output doc-schema.json
2. Refine Schema
Edit the generated schema to adjust constraints:
- Change minItems/maxItems for flexibility
- Add x-markitect-required-sections and x-markitect-recommended-sections extensions
- Adjust heading patterns
- Add content instructions
3. Store Schema
Register schema for reuse:
markitect schema-ingest doc-schema.json
4. Generate Templates
Create document templates from schema:
markitect generate-stub doc-schema.json --output template.md
5. Create Documents
Write new documents using the template as a starting point, or bring existing documents under the schema.
6. Validate Documents
Ensure documents conform to schema:
markitect validate new-document.md doc-schema.json
markitect validate new-document.md doc-schema.json --detailed-errors
7. Iterate
Fix validation errors and re-validate until the document passes.
Batch Validation Workflow
For managing multiple documents:
for doc in docs/*.md; do
    markitect validate "$doc" doc-schema.json --quiet || echo "Failed: $doc"
done
VALIDATION RULES
Heading Validation
Schemas validate heading structure through the headings property:
- level_1 headings must appear exactly once (document title)
- level_2 headings represent major sections (minItems/maxItems set bounds)
- level_3 headings provide subsections (often optional with minItems: 0)
Heading text can be validated with pattern constraints (flexible matching) or enum constraints (exact section names).
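A small Python sketch shows how the two constraint styles differ. It assumes the constraint sits in a heading level's `items` subschema, as it would in standard JSON Schema, and it hand-checks `enum` and `pattern` instead of calling a validator library. A JSON Schema `pattern` is not implicitly anchored, which is why the sketch uses `re.search`. For a simple pattern like the one below, Python's regex dialect and the ECMA-262 dialect that JSON Schema specifies behave the same. The function name is hypothetical.

```python
import re
from typing import List


def check_heading_text(rules: dict, titles: List[str]) -> List[str]:
    """Check heading text against an items subschema's enum/pattern (sketch)."""
    errors = []
    allowed = rules.get("enum")
    pattern = rules.get("pattern")
    for title in titles:
        # enum: the heading must be one of the listed names exactly.
        if allowed is not None and title not in allowed:
            errors.append(f"Heading '{title}' is not one of {allowed}")
        # pattern: unanchored search, so anchors must be written explicitly.
        if pattern is not None and not re.search(pattern, title):
            errors.append(f"Heading '{title}' does not match pattern {pattern!r}")
    return errors


exact = {"enum": ["SYNOPSIS", "DESCRIPTION", "EXAMPLES"]}
flexible = {"pattern": "^[A-Z][A-Z ]+$"}

check_heading_text(flexible, ["SEE ALSO", "EXIT STATUS"])  # -> []
```

The enum rejects "Synopsis" outright, while the pattern accepts any uppercase name, including ones the schema author never listed.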
Content Element Validation
- Paragraphs - Validates that the document has sufficient descriptive content
- Code blocks - Ensures technical documents include examples
- Lists - Validates the presence of structured information
- Emphasis - Checks for appropriate use of bold and italic formatting
Constraints use minItems and maxItems to set acceptable ranges.
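Applied to an element count, the minItems/maxItems bounds reduce to two comparisons. This Python sketch, whose function name is hypothetical, produces messages in the same form as the examples under ERROR HANDLING.

```python
from typing import List


def check_count(name: str, found: int, rules: dict) -> List[str]:
    """Apply minItems/maxItems bounds to an element count (sketch)."""
    errors = []
    low = rules.get("minItems")
    high = rules.get("maxItems")
    if low is not None and found < low:
        errors.append(f"Too few {name} (found {found}, minimum {low} required)")
    if high is not None and found > high:
        errors.append(f"Too many {name} (found {found}, maximum {high} allowed)")
    return errors


check_count("paragraphs", 3, {"minItems": 5})
# -> ["Too few paragraphs (found 3, minimum 5 required)"]
```

Note that a bare {"minItems": 0} can never fail, since no count is below zero; this is the under-specification pitfall described under Anti-Patterns.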
Metadata Validation
The metadata property validates overall document characteristics:
- total_elements - Total AST node count
- structure_types - Array of AST node types present
Use const for exact matches, or minimum and maximum bounds on numeric values such as total_elements for flexibility.
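The trade-off between const and a range is easiest to see with numbers. The Python sketch below hand-checks only `const`, `minimum`, and `maximum`; the function name and the node type names in the example metadata are made up. An exact total_elements breaks as soon as the document gains or loses a single node, while a range tolerates ordinary edits.

```python
from typing import List


def check_metadata(rules: dict, metadata: dict) -> List[str]:
    """Hand-check const / minimum / maximum on metadata values (sketch)."""
    errors = []
    for key, rule in rules.items():
        value = metadata.get(key)
        if value is None:
            errors.append(f"{key}: missing from document metadata")
            continue
        if "const" in rule and value != rule["const"]:
            errors.append(f"{key}: expected {rule['const']!r}, found {value!r}")
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{key}: {value} is below minimum {rule['minimum']}")
        if "maximum" in rule and value > rule["maximum"]:
            errors.append(f"{key}: {value} is above maximum {rule['maximum']}")
    return errors


# Hypothetical metadata for a document that has grown by two nodes.
metadata = {"total_elements": 120, "structure_types": ["heading", "paragraph", "list"]}

check_metadata({"total_elements": {"const": 118}}, metadata)  # exact: fails
check_metadata({"total_elements": {"minimum": 50, "maximum": 500}}, metadata)  # range: passes
```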
ERROR HANDLING
Common Validation Errors
Missing Required Section
Error: Required section 'SYNOPSIS' not found
Suggestion: Add H2 heading '## SYNOPSIS' near document start
Insufficient Content
Error: Too few paragraphs (found 3, minimum 5 required)
Suggestion: Add descriptive content to meet minimum paragraph count
Heading Count Mismatch
Error: Too many H2 headings (found 15, maximum 13 allowed)
Suggestion: Combine related sections or adjust schema maxItems
Structure Type Mismatch
Error: Expected structure types not found: code_blocks
Suggestion: Add code examples using fenced code blocks
Using Detailed Error Mode
Enable detailed errors for actionable feedback:
markitect validate document.md schema.json --detailed-errors
Output includes:
- Specific constraint violations
- Location information when available
- Suggestions for fixes
- Schema path to failing constraint
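The manual lists what detailed output contains but not its exact layout. As a sketch only, the Python record below bundles those four pieces: message, location, suggestion, and schema path, written here as a JSON Pointer. The class name, field names, and rendered layout are assumptions, not MarkiTect's actual --detailed-errors format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationError:
    """One constraint violation, as a detailed report might carry it (sketch)."""

    message: str
    schema_path: str            # JSON Pointer to the failing constraint
    suggestion: str = ""
    line: Optional[int] = None  # location, when the parser can supply one

    def render(self) -> str:
        where = f" (line {self.line})" if self.line is not None else ""
        out = [f"Error: {self.message}{where}", f"  Schema path: {self.schema_path}"]
        if self.suggestion:
            out.append(f"  Suggestion: {self.suggestion}")
        return "\n".join(out)


err = ValidationError(
    "Too few paragraphs (found 3, minimum 5 required)",
    "/properties/paragraphs/minItems",
    "Add descriptive content to meet minimum paragraph count",
)
print(err.render())
```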
SCHEMA DESIGN
Best Practices
Start with Real Documents
Generate schemas from actual documents rather than writing them from scratch; real documents yield realistic constraints.
Use Ranges, Not Exact Counts
Allow flexibility with minItems/maxItems ranges:
"paragraphs": {
  "minItems": 10,
  "maxItems": 100
}
Avoid exact counts (minItems equal to maxItems) unless the structure is truly rigid.
Required vs Optional Sections
Use x-markitect-required-sections for essential sections like SYNOPSIS and DESCRIPTION.
Use x-markitect-recommended-sections for important but optional sections like EXAMPLES.
Heading Patterns
Use regex patterns for flexible heading validation:
"pattern": "^[A-Z][A-Z ]+$"
Matches section names of two or more characters built from uppercase letters and spaces, such as SEE ALSO or EXIT STATUS, without fixing the exact text.
Progressive Refinement
Start with loose constraints, then tighten them as validation against real documents shows which structure is stable.
Anti-Patterns
Over-Specification
Avoid schemas that are too specific:
"paragraphs": { "minItems": 47, "maxItems": 47 }
This requires exactly 47 paragraphs, which is too rigid for most use cases.
Under-Specification
Avoid schemas that validate nothing:
"paragraphs": { "minItems": 0 }
Provide meaningful constraints that ensure document quality.
Semantic Validation
Schemas validate structure, not content. Don't expect schemas to validate:
- Correct grammar or spelling
- Factual accuracy
- Code correctness
- Logical flow
Use other tools for semantic validation.
INTEGRATION
CI/CD Integration
Validate documentation in continuous integration:
markitect validate README.md readme-schema.json --quiet
exit_code=$?
if [ $exit_code -eq 0 ]; then
    echo "Documentation valid"
else
    echo "Documentation validation failed"
    markitect validate README.md readme-schema.json --detailed-errors
    exit 1
fi
Git Hooks
Pre-commit hook for automatic validation:
#!/bin/sh
# .git/hooks/pre-commit: validate staged markdown files that have a sibling schema
changed_docs=$(git diff --cached --name-only --diff-filter=ACM | grep '\.md$')
for doc in $changed_docs; do
    schema="${doc%.md}-schema.json"
    if [ -f "$schema" ]; then
        markitect validate "$doc" "$schema" || exit 1
    fi
done
Build Systems
Makefile integration:
.PHONY: validate-docs
validate-docs:
	@for doc in docs/*.md; do \
		markitect validate "$$doc" doc-schema.json || exit 1; \
	done

.PHONY: build
build: validate-docs
	# Build process continues only if docs validate
EXAMPLES
Generate Schema from Document
markitect schema-generate examples/invoice.md --output invoice-schema.json
Store Schema for Reuse
markitect schema-ingest invoice-schema.json
markitect schema-list
Validate Single Document
markitect validate draft-invoice.md invoice-schema.json
markitect validate draft-invoice.md invoice-schema.json --detailed-errors
Batch Validation
for invoice in invoices/*.md; do
    if ! markitect validate "$invoice" invoice-schema.json --quiet; then
        echo "Invalid: $invoice"
        markitect validate "$invoice" invoice-schema.json --detailed-errors
    fi
done
Template Generation
markitect generate-stub invoice-schema.json --output new-invoice-template.md
cat new-invoice-template.md
markitect validate new-invoice-template.md invoice-schema.json
Schema Refinement Workflow
markitect schema-generate example.md --output v1-schema.json
markitect validate test-doc.md v1-schema.json --detailed-errors
markitect schema-generate example.md --max-depth 2 --output v2-schema.json
markitect validate test-doc.md v2-schema.json
FILES
- *.json
- JSON schema files defining document structure
- Standard JSON Schema draft-07 format with MarkiTect extensions
- markitect.db
- Database storing ingested schemas
- SQLite database in current directory or specified path
- .markitect.yml
- Configuration file for default schemas
- YAML format with schema paths and validation rules
EXIT STATUS
- 0
- Success - document is valid
- 1
- General error - file not found, invalid arguments
- 2
- Configuration error - invalid schema file
- 3
- Database error - schema storage/retrieval failed
- 4
- Validation error - document does not conform to schema
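Codes 1 through 3 signal a problem with the tool or its setup rather than with the document, so scripts can treat them differently from code 4. The Python sketch below runs any validation command and names its exit status using the table above. The helper name and the stderr message are hypothetical.

```python
import subprocess
import sys
from typing import List, Tuple

# Meanings taken from the EXIT STATUS table.
EXIT_MEANINGS = {
    0: "valid",
    1: "general error",
    2: "configuration error",
    3: "database error",
    4: "validation error",
}


def run_check(cmd: List[str]) -> Tuple[int, str]:
    """Run a validation command and classify its exit status.

    cmd would typically be something like
    ["markitect", "validate", "doc.md", "schema.json", "--quiet"].
    """
    code = subprocess.run(cmd).returncode
    meaning = EXIT_MEANINGS.get(code, "unknown status")
    if code != 0:
        print(f"{cmd[0]} exited with {code} ({meaning})", file=sys.stderr)
    return code, meaning
```

In CI, a code of 4 might trigger a rerun with --detailed-errors, while codes 1 to 3 point at paths, schema files, or the database.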
ENVIRONMENT
- MARKITECT_DATABASE
- Path to database file for schema storage
- Default: markitect.db in current directory
- MARKITECT_SCHEMA_PATH
- Search path for schema files
- Colon-separated list of directories
- MARKITECT_VALIDATION_STRICT
- Enable strict validation mode
- Any non-empty value enables strict mode
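A Python sketch of how these variables might be interpreted follows. The defaults and the non-empty rule come from the descriptions above. Searching MARKITECT_SCHEMA_PATH left to right with the first match winning is an assumption, because the manual does not state a precedence. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def database_path(env: Mapping[str, str] = os.environ) -> str:
    # Default: markitect.db in the current directory.
    return env.get("MARKITECT_DATABASE") or "markitect.db"


def strict_mode(env: Mapping[str, str] = os.environ) -> bool:
    # Any non-empty value enables strict mode.
    return bool(env.get("MARKITECT_VALIDATION_STRICT"))


def find_schema(name: str, env: Mapping[str, str] = os.environ) -> Optional[Path]:
    # Assumption: directories are searched in order and the first match wins.
    for directory in env.get("MARKITECT_SCHEMA_PATH", "").split(":"):
        if directory:
            candidate = Path(directory) / name
            if candidate.is_file():
                return candidate
    return None
```

Under the non-empty rule, even MARKITECT_VALIDATION_STRICT=0 enables strict mode.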
SEE ALSO
markitect(1), json-schema(7), markdown-it(7)
Related documentation:
- JSON Schema Specification (https://json-schema.org/)
- MarkiTect Schema Reference
- AST Structure Documentation
- Template System Guide
LIMITATIONS
Schema validation has inherent limitations:
Structure Only
Schemas validate document structure, not content semantics. Cannot validate:
- Factual correctness
- Code functionality
- Logical consistency
- Language quality
AST-Based
Validation operates on parsed AST, not raw markdown. Some markdown formatting details may not be preserved or validated.
Performance
Validating large documents against complex schemas can be slow. AST caching reduces this cost for repeated validations.
Schema Complexity
Very complex schemas can become difficult to maintain. Keep schemas as simple as possible while meeting requirements.
BUGS
Report bugs at: https://github.com/markitect/markitect/issues
Known issues:
- Schema generation from very large documents may be slow
- Some edge cases in heading pattern matching
- Limited support for custom markdown extensions
AUTHORS
MarkiTect development team
Schema validation system designed for structured content management and documentation consistency.
COPYRIGHT
Copyright (c) 2025 MarkiTect Project. Licensed under MIT License.
VERSION
This manual documents schema validation in MarkiTect version 1.0 and later.