MarkiTect Schema Evolution Workplan
Executive Summary
Current State: MarkiTect validates document structure via JSON Schema, but is too rigid (exact counts) and structure-only (no content guidance).
Target State: A flexible schema system with content control, section classification, multi-schema conformance, and blueprint-based document generation.
Timeline: 5 phases, 18-23 development sessions (the sum of the per-phase estimates below), approximately 8-10 weeks.
Problem Analysis
Current Limitations
1. Structural Rigidity
Problem: Auto-generated schemas use exact counts
"paragraphs": { "minItems": 86, "maxItems": 86 }
Impact: Schemas are document-specific, not reusable patterns.
2. Binary Structure Validation
Problem: Elements are either valid or invalid, with no intermediate classification.
Need: Required, Recommended, Optional, Discouraged, and Improper classifications.
3. No Content Guidance
Problem: Schemas validate that structure exists, not what content belongs there.
Need: Content instructions, semantic patterns, quality expectations.
4. Single Schema Limitation
Problem: Documents can only conform to one schema.
Need: Multi-schema conformance (e.g., "manpage" + "API reference" + "tutorial").
5. Template Generation Gap
Problem: generate-stub creates an outline but provides no content guidance or data binding.
Need: Blueprint system with content instructions and data templates.
Proposed Architecture
Three-Layer System
┌─────────────────────────────────────────────┐
│ BLUEPRINT LAYER │
│ (Multi-schema + Content + Data Templates) │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ SCHEMA LAYER (Enhanced) │
│ (Structure + Classification + Instructions) │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ VALIDATION LAYER │
│ (AST Validation + Content Analysis) │
└─────────────────────────────────────────────┘
Key Concepts
1. Schema Classification System
- Required: Must be present, validation fails if missing
- Recommended: Should be present, warning if missing
- Optional: May be present, no validation impact
- Discouraged: Should not be present, warning if present
- Improper: Must not be present, validation fails if present
2. Content Control
- Content Instructions: Human-readable guidance for section content
- Content Patterns: Regex/template patterns for content validation
- Content Quality Metrics: Word count, readability, completeness scoring
3. Multi-Schema Conformance
- Documents can conform to multiple schemas simultaneously
- Schema composition and inheritance
- Conflict resolution strategies
4. Blueprint System
- Schemas + Instructions + Data Templates = Blueprints
- Blueprints generate documents with content guidance
- Data binding for dynamic document generation
Phase 1: Enhanced Schema Format
Goal: Extend JSON Schema with MarkiTect-specific content control extensions.
1.1 Schema Classification Extensions
New Properties:
{
"x-markitect-sections": {
"SYNOPSIS": {
"classification": "required",
"heading_level": 2,
"position": "after_title",
"content_instruction": "Brief command syntax showing all options",
"min_code_blocks": 1,
"max_code_blocks": 3
},
"EXAMPLES": {
"classification": "recommended",
"heading_level": 2,
"content_instruction": "Practical usage examples with explanations",
"min_code_blocks": 3,
"warning_if_missing": "Examples greatly improve documentation usability"
},
"DEPRECATED": {
"classification": "discouraged",
"heading_level": 2,
"warning_message": "DEPRECATED sections should be moved to historical docs"
},
"INTERNAL_NOTES": {
"classification": "improper",
"heading_level": 2,
"error_message": "Internal notes must not appear in published documentation"
}
}
}
1.2 Content Control Extensions
New Properties:
{
"x-markitect-content-control": {
"synopsis_section": {
"min_paragraphs": 1,
"max_paragraphs": 3,
"required_patterns": [
"\\*\\*[a-z-]+\\*\\*.*\\[.*\\]" // Bold command with args
],
"content_quality": {
"min_words": 10,
"max_words": 100,
"readability_target": "technical"
},
"content_instructions": [
"Show command name in bold",
"Include all major options in synopsis",
"Use italic for arguments and placeholders"
]
}
}
}
1.3 Flexible Structure Constraints
Replace rigid counts with ranges and classifications:
{
"properties": {
"headings": {
"properties": {
"level_2": {
"items": {
"properties": {
"content": {
"oneOf": [
{"const": "SYNOPSIS", "x-markitect-classification": "required"},
{"const": "DESCRIPTION", "x-markitect-classification": "required"},
{"const": "EXAMPLES", "x-markitect-classification": "recommended"},
{"const": "SEE ALSO", "x-markitect-classification": "optional"}
]
}
}
},
"minItems": 2, // At least required sections
"maxItems": 30 // Reasonable upper bound
}
}
}
}
}
Tasks
- Task 1.1: Define x-markitect-sections schema extension format
- Task 1.2: Define x-markitect-content-control schema extension format
- Task 1.3: Update metaschema to validate new extensions
- Task 1.4: Create schema examples demonstrating all classifications
- Task 1.5: Document schema extension format
Duration: 3-4 sessions
Dependencies: None
Deliverables: Enhanced schema format specification, updated metaschema
Phase 2: Schema Refinement Tools
Goal: Tools to transform rigid auto-generated schemas into flexible, classified schemas.
2.1 Schema Analysis Tool
Command: markitect schema-analyze
Analyzes existing schema and suggests improvements:
markitect schema-analyze rigid-schema.json
# Output:
⚠️ Exact counts detected (86 paragraphs)
Suggestion: Use range 50-150 for flexibility
⚠️ All sections unclassified
Suggestion: Classify sections as required/recommended/optional
⚠️ No content instructions
Suggestion: Add content guidance for key sections
✨ Run: markitect schema-refine rigid-schema.json
2.2 Schema Refinement Tool
Command: markitect schema-refine
Interactive or automated schema refinement:
# Automated: Apply common refinements
markitect schema-refine rigid-schema.json \
--loosen-counts \
--add-classifications \
--output flexible-schema.json
# Interactive: Guided refinement
markitect schema-refine rigid-schema.json --interactive
Refinement Operations:
- Convert exact counts to ranges (configurable tolerance)
- Classify sections based on conventions
- Add content instructions from templates
- Merge multiple schemas for common patterns
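The first refinement operation, converting exact counts to ranges, could be sketched roughly as follows. This is only an illustration under stated assumptions: `loosen_count`, `loosen_schema`, and the `tolerance` parameter are hypothetical names, not part of any existing MarkiTect API, and the symmetric percentage rule is just one possible way to pick a range.

```python
import math


def loosen_count(exact: int, tolerance: float = 0.5) -> dict:
    """Turn an exact item count into a min/max range.

    `tolerance` is a hypothetical knob: 0.5 allows 50% fewer or more items.
    """
    low = max(0, math.floor(exact * (1 - tolerance)))
    high = math.ceil(exact * (1 + tolerance))
    return {"minItems": low, "maxItems": high}


def loosen_schema(node):
    """Recursively widen every node whose minItems equals its maxItems."""
    if isinstance(node, dict):
        out = {key: loosen_schema(value) for key, value in node.items()}
        low, high = node.get("minItems"), node.get("maxItems")
        if isinstance(low, int) and low == high:
            out.update(loosen_count(low))
        return out
    if isinstance(node, list):
        return [loosen_schema(value) for value in node]
    return node


rigid = {"paragraphs": {"minItems": 86, "maxItems": 86}}
print(loosen_schema(rigid))
```

Nodes that already carry a range (for example `minItems: 2, maxItems: 30`) are left untouched, so running the function twice does not widen a schema further.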
2.3 Schema Composition Tool
Command: markitect schema-compose
Combine multiple schemas:
# Create composite schema
markitect schema-compose \
--base manpage-schema.json \
--extend api-reference-schema.json \
--extend tutorial-schema.json \
--output composite-schema.json
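How composition resolves conflicting classifications is not specified in this workplan. One possible strategy, sketched below purely as an assumption, is "strictest wins" within the same direction (required beats recommended, improper beats discouraged) and a hard failure when one schema wants a section and another forbids it. The names `RANK`, `merge_classification`, and `compose_sections` are hypothetical.

```python
# Hypothetical ranking: negative values discourage a section, positive values ask for it.
RANK = {"improper": -2, "discouraged": -1, "optional": 0, "recommended": 1, "required": 2}


def merge_classification(a: str, b: str) -> str:
    """Pick the stricter classification, or fail if the two contradict each other."""
    rank_a, rank_b = RANK[a], RANK[b]
    if rank_a * rank_b < 0:
        raise ValueError(f"conflicting classifications: {a} vs {b}")
    return a if abs(rank_a) >= abs(rank_b) else b


def compose_sections(*schemas: dict) -> dict:
    """Merge the x-markitect-sections blocks of several schemas, in order."""
    merged = {}
    for schema in schemas:
        for name, spec in schema.get("x-markitect-sections", {}).items():
            if name not in merged:
                merged[name] = dict(spec)
                continue
            current = merged[name]
            level = merge_classification(
                current.get("classification", "optional"),
                spec.get("classification", "optional"),
            )
            current.update(spec)  # assumption: later schemas win for other fields
            current["classification"] = level
    return {"x-markitect-sections": merged}


base = {"x-markitect-sections": {"EXAMPLES": {"classification": "recommended"}}}
extension = {"x-markitect-sections": {"EXAMPLES": {"classification": "required"}}}
print(compose_sections(base, extension))
```

Failing loudly on a required/improper clash keeps the composite schema satisfiable; other strategies (base wins, last wins) would be equally easy to swap in.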
Tasks
- Task 2.1: Implement schema-analyze command
- Task 2.2: Implement schema-refine command with loosening logic
- Task 2.3: Implement schema-refine --interactive mode
- Task 2.4: Implement schema-compose command
- Task 2.5: Create schema refinement rule library
Duration: 3-4 sessions
Dependencies: Phase 1 complete
Deliverables: Schema analysis, refinement, and composition tools
Phase 3: Enhanced Validation Engine
Goal: Validate classification levels, content patterns, and multi-schema conformance.
3.1 Classification-Aware Validation
Validation Levels:
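from dataclasses import dataclass
from typing import List, Literal

@dataclass
class ValidationResult:
    # ValidationError and ValidationWarning are forward references to types defined elsewhere
    status: Literal["valid", "valid_with_warnings", "invalid"]
    errors: List["ValidationError"]  # Required/Improper violations
    warnings: List["ValidationWarning"]  # Recommended/Discouraged violations
    suggestions: List[str]  # Optional improvements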
Example Output:
markitect validate document.md schema.json --detailed-errors
❌ ERRORS (validation failed)
- Missing required section: SYNOPSIS
- Improper section present: INTERNAL_NOTES
⚠️ WARNINGS
- Missing recommended section: EXAMPLES
- Discouraged section present: DEPRECATED
💡 SUGGESTIONS
- Consider adding optional section: PERFORMANCE
- Content quality: DESCRIPTION section below recommended word count (45/100)
Status: INVALID (2 errors, 2 warnings)
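The mapping from classification to errors and warnings shown in the output above can be sketched in a few lines. This is a minimal illustration, assuming the set of section headings present in the document has already been extracted; `Report` and `check_sections` are hypothetical names and plain strings stand in for richer error objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Report:
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def status(self) -> str:
        if self.errors:
            return "invalid"
        return "valid_with_warnings" if self.warnings else "valid"


def check_sections(sections: Dict[str, dict], present: Set[str]) -> Report:
    """Compare the headings present in a document against classified sections."""
    report = Report()
    for name, spec in sections.items():
        level = spec.get("classification", "optional")
        found = name in present
        if level == "required" and not found:
            report.errors.append(f"Missing required section: {name}")
        elif level == "improper" and found:
            report.errors.append(f"Improper section present: {name}")
        elif level == "recommended" and not found:
            report.warnings.append(f"Missing recommended section: {name}")
        elif level == "discouraged" and found:
            report.warnings.append(f"Discouraged section present: {name}")
    return report


sections = {
    "SYNOPSIS": {"classification": "required"},
    "EXAMPLES": {"classification": "recommended"},
    "DEPRECATED": {"classification": "discouraged"},
    "INTERNAL_NOTES": {"classification": "improper"},
}
report = check_sections(sections, present={"DEPRECATED", "INTERNAL_NOTES"})
print(report.status, report.errors, report.warnings)
```

Optional sections never contribute to either list, which matches the "no validation impact" rule in the classification system.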
3.2 Content Pattern Validation
Validate content patterns:
# Schema specifies required patterns
"synopsis_section": {
"required_patterns": [
r"\*\*command\*\*", # Bold command name
r"\[.*\]" # Options in brackets
],
"discouraged_patterns": [
r"TODO", # No TODOs in published docs
r"FIXME"
]
}
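Checking a section's text against such a rule set needs little more than the standard `re` module. The sketch below assumes the section text has already been extracted as a string; `check_patterns` and the keys of its return value are hypothetical names chosen for illustration.

```python
import re
from typing import Dict, List


def check_patterns(text: str, rules: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Report required patterns that are absent and discouraged patterns that are present."""
    missing = [p for p in rules.get("required_patterns", []) if not re.search(p, text)]
    found = [p for p in rules.get("discouraged_patterns", []) if re.search(p, text)]
    return {"missing_required": missing, "found_discouraged": found}


rules = {
    "required_patterns": [r"\*\*command\*\*", r"\[.*\]"],
    "discouraged_patterns": [r"TODO", r"FIXME"],
}
result = check_patterns("**command** [OPTIONS] file  TODO: document flags", rules)
print(result)
```

In a classification-aware validator, missing required patterns would presumably surface as errors and discouraged matches as warnings.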
3.3 Multi-Schema Validation
Command: markitect validate --schemas
# Validate against multiple schemas
markitect validate api-doc.md \
--schemas manpage.json,api-reference.json,tutorial.json \
--require-all
# Output shows conformance to each schema
✅ manpage.json: VALID
✅ api-reference.json: VALID (2 warnings)
❌ tutorial.json: INVALID (missing required section: GETTING STARTED)
Overall: INVALID (must conform to all schemas)
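The overall verdict in the output above is an aggregation of per-schema results. A minimal sketch follows, assuming each schema has already produced a status string; `overall_status` is a hypothetical name, and treating the absence of --require-all as "at least one schema must pass" is an assumption, since the workplan does not define that case.

```python
from typing import Dict


def overall_status(per_schema: Dict[str, str], require_all: bool = True) -> str:
    """Combine per-schema statuses into one verdict.

    Warnings do not fail a schema; only "invalid" does.
    """
    passed = [status != "invalid" for status in per_schema.values()]
    ok = all(passed) if require_all else any(passed)
    return "valid" if ok else "invalid"


results = {
    "manpage.json": "valid",
    "api-reference.json": "valid_with_warnings",
    "tutorial.json": "invalid",
}
print(overall_status(results, require_all=True))
```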
3.4 Content Quality Metrics
Validate content quality:
markitect validate document.md schema.json --quality-check
📊 Content Quality Report
- Word count: 487 (target: 300-1000) ✅
- Code examples: 3 (minimum: 3) ✅
- Readability: Technical (appropriate) ✅
- Link validity: 12/12 valid ✅
- Heading hierarchy: Valid ✅
Quality Score: 95/100
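Two of the metrics in this report, word count and code example count, can be computed with a single pass over the Markdown source. The sketch below is a simplification under stated assumptions: it recognizes only backtick-fenced code blocks, counts words with a plain regular expression, and does not attempt readability, link checking, or the overall score. `quality_metrics` and `within` are hypothetical names.

```python
import re

FENCE = "`" * 3  # opening/closing marker of a fenced code block
WORD = re.compile(r"[A-Za-z0-9']+")


def quality_metrics(markdown: str) -> dict:
    """Count prose words and fenced code blocks, ignoring words inside code."""
    words = 0
    code_blocks = 0
    in_code = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            if not in_code:
                code_blocks += 1
            in_code = not in_code
            continue
        if not in_code:
            words += len(WORD.findall(line))
    return {"word_count": words, "code_blocks": code_blocks}


def within(value: int, low: int, high: int) -> bool:
    """True when a metric falls inside its target range (inclusive)."""
    return low <= value <= high


sample = "\n".join([
    "## EXAMPLES",
    "Run the tool:",
    FENCE + "bash",
    "markitect validate doc.md schema.json",
    FENCE,
])
metrics = quality_metrics(sample)
print(metrics, within(metrics["word_count"], 300, 1000))
```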
Tasks
- Task 3.1: Implement classification-aware validator
- Task 3.2: Implement content pattern validation
- Task 3.3: Implement multi-schema validation
- Task 3.4: Implement content quality metrics
- Task 3.5: Enhanced error reporting with suggestions
Duration: 4-5 sessions
Dependencies: Phase 1 complete
Deliverables: Enhanced validation engine, quality metrics
Phase 4: Blueprint System
Goal: Document generation system with schemas + content instructions + data templates.
4.1 Blueprint Format
Blueprint Structure:
{
"$blueprint": "1.0",
"name": "api-documentation-blueprint",
"description": "Blueprint for API endpoint documentation",
"schemas": [
"manpage-schema.json",
"api-reference-schema.json"
],
"content_model": {
"synopsis": {
"template": "**{{command}}** [*OPTIONS*] *{{primary_argument}}*",
"data_source": "command_metadata.json",
"instruction": "Brief command syntax"
},
"description": {
"template": "{{description}}\n\nThis endpoint {{purpose}}.",
"min_paragraphs": 2,
"instruction": "Explain what the endpoint does and why to use it"
},
"parameters": {
"template": "{{#each parameters}}\n**{{name}}** *{{type}}*\n: {{description}}\n{{/each}}",
"data_source": "parameters",
"instruction": "Document all parameters with types and descriptions"
}
},
"data_schema": {
"type": "object",
"properties": {
"command": {"type": "string"},
"primary_argument": {"type": "string"},
"description": {"type": "string"},
"purpose": {"type": "string"},
"parameters": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"type": {"type": "string"},
"description": {"type": "string"}
}
}
}
}
},
"generation_rules": {
"heading_style": "atx",
"code_fence_style": "backticks",
"line_length": 80,
"include_metadata": true
}
}
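To make the data binding concrete, here is a deliberately small sketch of how a `content_model` template might be filled from data. It handles only simple `{{name}}` placeholders; block helpers such as `{{#each}}` are left untouched and would need a real Handlebars-compatible engine. `render` is a hypothetical helper, and the value `ENDPOINT` is made up for the example.

```python
import re

# Matches simple {{identifier}} placeholders only; block helpers such as {{#each}} do not match.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, data: dict) -> str:
    """Substitute {{name}} placeholders with values from `data`."""
    def substitute(match: "re.Match") -> str:
        key = match.group(1)
        if key not in data:
            raise KeyError(f"missing template value: {key}")
        return str(data[key])

    return _PLACEHOLDER.sub(substitute, template)


synopsis = render(
    "**{{command}}** [*OPTIONS*] *{{primary_argument}}*",
    {"command": "api-call", "primary_argument": "ENDPOINT"},
)
print(synopsis)  # **api-call** [*OPTIONS*] *ENDPOINT*
```

Raising on a missing key, rather than emitting an empty string, makes gaps in the data file visible at generation time; checking the data against `data_schema` beforehand would catch the same problem earlier.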
4.2 Blueprint Commands
Create Blueprint:
# From existing schema
markitect blueprint-create --from-schema api-schema.json \
--output api-blueprint.json
# Interactive creation
markitect blueprint-create --interactive
Generate from Blueprint:
# Generate with data file
markitect blueprint-generate api-blueprint.json \
--data endpoint-data.json \
--output api-doc.md
# Generate with inline data
markitect blueprint-generate api-blueprint.json \
--data '{"command": "api-call", "description": "Make API call"}' \
--output api-doc.md
# Batch generation
markitect blueprint-generate-batch api-blueprint.json \
--data-dir ./endpoints/ \
--output-dir ./docs/api/
Validate Blueprint:
# Validate blueprint format
markitect blueprint-validate api-blueprint.json
# Test blueprint generation
markitect blueprint-test api-blueprint.json \
--sample-data test-data.json
4.3 Template Engine Integration
Handlebars-style templates with MarkiTect extensions:
# {{command}}(1) - {{title}}
## SYNOPSIS
**{{command}}** {{#each options}}[*{{this}}*] {{/each}}*{{argument}}*
## DESCRIPTION
{{description}}
{{#markitect-section "technical-details"}}
Technical implementation details for {{command}}.
{{/markitect-section}}
## PARAMETERS
{{#each parameters}}
**--{{name}}** *{{type}}*
: {{description}}
: {{#if default}}Default: `{{default}}`{{/if}}
{{/each}}
{{#markitect-code-block "bash"}}
# Example usage
{{command}} {{#each examples.[0].args}}{{this}} {{/each}}
{{/markitect-code-block}}
Tasks
- Task 4.1: Define blueprint format specification
- Task 4.2: Implement blueprint-create command
- Task 4.3: Implement blueprint-generate command
- Task 4.4: Implement template engine with Handlebars
- Task 4.5: Implement blueprint-validate command
- Task 4.6: Implement batch generation
- Task 4.7: Create blueprint library (common patterns)
Duration: 5-6 sessions
Dependencies: Phases 1 and 3 complete
Deliverables: Blueprint system, template engine, generation commands
Phase 5: Documentation and Integration
Goal: Comprehensive documentation, examples, and ecosystem integration.
5.1 Documentation Suite
Documents to Create:
- Schema Evolution Guide (why and how)
- Schema Classification Reference
- Content Control Specification
- Blueprint System Guide
- Schema Design Best Practices
- Migration Guide (old schemas → new format)
- API Reference for programmatic usage
5.2 Example Gallery
Create comprehensive examples:
- Manpage blueprint (already started)
- API documentation blueprint
- Tutorial document blueprint
- Architecture Decision Record (ADR) blueprint
- RFC/specification blueprint
- Meeting notes blueprint
- Project README blueprint
5.3 CLI Integration
Update existing commands:
# schema-generate with classification
markitect schema-generate example.md \
--classify-sections \
--add-instructions \
--flexible \
--output smart-schema.json
# validate with multiple schemas
markitect validate doc.md \
--schemas schema1.json,schema2.json \
--classification-aware \
--quality-check
# generate-stub enhanced
markitect generate-stub schema.json \
--include-instructions \
--sample-content \
--output template.md
5.4 CI/CD Integration Templates
Provide ready-to-use integrations:
GitHub Actions:
- name: Validate Documentation
  uses: markitect/validate-action@v1
  with:
    schemas: docs/schemas/*.json
    files: docs/**/*.md
    classification-aware: true
    fail-on: errors
    warn-on: missing-recommended
Pre-commit hook:
#!/bin/bash
markitect validate-changed --schemas docs/schemas/ \
--classification-aware \
--fail-on errors
Tasks
- Task 5.1: Write comprehensive documentation suite
- Task 5.2: Create example gallery with 7+ blueprints
- Task 5.3: Update all CLI commands for new features
- Task 5.4: Create CI/CD integration templates
- Task 5.5: Write migration guide for existing schemas
- Task 5.6: Create video tutorials/screencasts
Duration: 3-4 sessions
Dependencies: All previous phases complete
Deliverables: Complete documentation, examples, integrations
Implementation Strategy
Development Approach
1. Test-Driven Development
- Write tests for each classification level
- Test schema refinement transformations
- Test blueprint generation with various data
- Test multi-schema validation
2. Backward Compatibility
- Existing schemas continue to work
- New features are opt-in via extensions
- Clear migration path documented
3. Incremental Rollout
- Phase 1: Can be used immediately after completion
- Each phase delivers user value independently
- Later phases build on earlier phases
4. Community Feedback
- Alpha release after Phase 1
- Beta release after Phase 3
- Stable release after Phase 5
Technical Considerations
Schema Format:
- JSON Schema draft-07 as foundation
- MarkiTect extensions namespaced with x-markitect-
- Validation via metaschema
- Clear upgrade path to future JSON Schema versions
Performance:
- Cache compiled schemas
- Lazy validation for large documents
- Parallel validation for multiple schemas
- Optimize content pattern matching
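Two of these points, schema caching and parallel multi-schema validation, can be illustrated with the standard library alone. This is a sketch under assumptions, not a proposed design: `load_schema` and `validate_all` are hypothetical names, and the `validate` callable is a stand-in for whatever validator the engine ends up exposing.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable, Dict, Iterable


@lru_cache(maxsize=None)
def load_schema(path: str) -> dict:
    """Parse a schema file once; later calls with the same path reuse the result."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def validate_all(
    document: str,
    schema_paths: Iterable[str],
    validate: Callable[[str, dict], object],
) -> Dict[str, object]:
    """Run `validate` against every schema concurrently and collect results by path."""
    paths = list(schema_paths)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda p: validate(document, load_schema(p)), paths)
        return dict(zip(paths, results))
```

Because the cache hands back the same dict object on every call, callers should treat loaded schemas as read-only. Threads mainly help when validation waits on I/O; for CPU-bound checks a process pool may be the better fit.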
API Design:
- Programmatic access to all features
- Python API for schema manipulation
- Plugin system for custom validators
- Extensible template engine
Success Metrics
Phase 1 Success
- ✅ Schema with all 5 classifications validates correctly
- ✅ Content instructions appear in generated stubs
- ✅ Metaschema validates all extension formats
Phase 2 Success
- ✅ Rigid schema refined to flexible schema automatically
- ✅ Multiple schemas composed without conflicts
- ✅ Interactive refinement completes end-to-end
Phase 3 Success
- ✅ Validation distinguishes errors from warnings
- ✅ Content patterns detected and reported
- ✅ Multi-schema validation works with 3+ schemas
- ✅ Quality metrics provide actionable feedback
Phase 4 Success
- ✅ Blueprint generates valid document from data
- ✅ Generated document validates against source schemas
- ✅ Batch generation processes 100+ documents
- ✅ Template engine supports complex logic
Phase 5 Success
- ✅ Documentation covers all features
- ✅ 7+ working blueprint examples
- ✅ CI/CD integrations work in real projects
- ✅ Migration guide successfully upgrades old schemas
Risk Assessment
Technical Risks
Risk: Schema format complexity
Mitigation: Clear examples, validation tools, gradual adoption
Risk: Performance degradation with complex schemas
Mitigation: Caching, optimization, benchmarking
Risk: Template engine security (code injection)
Mitigation: Sandboxed execution, no eval, strict parsing
Adoption Risks
Risk: Breaking changes to existing workflows
Mitigation: Full backward compatibility, opt-in features
Risk: Learning curve for new features
Mitigation: Excellent documentation, examples, tutorials
Risk: Feature bloat
Mitigation: Keep core simple, advanced features optional
Future Enhancements (Post-MVP)
Potential Future Features
1. Semantic Validation
- AI-powered content quality checking
- Grammar and style validation
- Factual consistency checking
- Link and reference validation
2. Visual Schema Editor
- Web-based GUI for schema creation
- Visual blueprint designer
- Live preview of generated documents
- Drag-and-drop section arrangement
3. Schema Marketplace
- Community schema repository
- Reusable blueprint library
- Rating and reviews system
- Version management
4. Advanced Blueprint Features
- Conditional sections based on data
- Dynamic schema selection
- Multi-language support
- Custom helper functions
5. Integration Ecosystem
- IDE plugins (VS Code, JetBrains)
- Documentation platforms (Read the Docs, Docusaurus)
- CMS integrations (Contentful, Strapi)
- Static site generators (Hugo, Jekyll)
Conclusion
This workplan transforms MarkiTect from a structural validator to a comprehensive document control system:
Current: Rigid structure validation
Target: Flexible content control with blueprints
Key Improvements:
- ✨ Classification system (required → improper)
- ✨ Content guidance and instructions
- ✨ Multi-schema conformance
- ✨ Blueprint-based generation
- ✨ Quality metrics and analysis
Timeline: ~8-10 weeks for full implementation
Value: Complete CMS-like document control for markdown
The system remains true to MarkiTect's philosophy of treating markdown as structured data while adding the flexibility and guidance needed for real-world content management.
Next Steps
- Review and refine this workplan
- Prioritize phases based on user needs
- Create detailed specifications for Phase 1
- Set up development environment for new features
- Begin implementation with TDD approach
First Implementation Task: Define x-markitect-sections format specification