# MarkiTect Schema Evolution Workplan

## Executive Summary

**Current State**: MarkiTect validates document structure via JSON Schema, but its schemas are too rigid (exact counts) and structure-only (no content guidance).

**Target State**: A flexible schema system with content control, section classification, multi-schema conformance, and blueprint-based document generation.

**Timeline**: 5 phases, 18-23 development sessions (the sum of the per-phase estimates), approximately 8-10 weeks.

---

## Problem Analysis

### Current Limitations

#### 1. Structural Rigidity

**Problem**: Auto-generated schemas use exact counts.

```json
{
  "paragraphs": {
    "minItems": 86,
    "maxItems": 86
  }
}
```

**Impact**: Schemas are document-specific, not reusable patterns.

#### 2. Binary Structure Validation

**Problem**: Elements are either valid or invalid, with no classification in between.

**Need**: Required, Recommended, Optional, Discouraged, and Improper classifications.

#### 3. No Content Guidance

**Problem**: Schemas validate that structure exists, not what content belongs there.

**Need**: Content instructions, semantic patterns, quality expectations.

#### 4. Single Schema Limitation

**Problem**: Documents can only conform to one schema.

**Need**: Multi-schema conformance (e.g., "manpage" + "API reference" + "tutorial").

#### 5. Template Generation Gap

**Problem**: `generate-stub` creates an outline, but offers no content guidance or data binding.

**Need**: Blueprint system with content instructions and data templates.
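
To make the structural-rigidity limitation concrete, here is a minimal sketch of how an exact count could be widened into a range. The function name `loosen_exact_counts` and the `tolerance` parameter are hypothetical and not part of MarkiTect today; the sketch only assumes that counts are expressed with `minItems`/`maxItems`, as in the schema fragment under limitation 1.

```python
import copy
import math


def loosen_exact_counts(schema: dict, tolerance: float = 0.5) -> dict:
    """Return a copy of `schema` where every exact count becomes a range.

    Hypothetical helper: an "exact count" is any object whose
    minItems and maxItems are the same integer.
    """
    result = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            lo, hi = node.get("minItems"), node.get("maxItems")
            if isinstance(lo, int) and lo == hi:
                # Widen symmetrically around the observed count.
                node["minItems"] = max(0, math.floor(lo * (1 - tolerance)))
                node["maxItems"] = math.ceil(hi * (1 + tolerance))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    return result


rigid = {"paragraphs": {"minItems": 86, "maxItems": 86}}
flexible = loosen_exact_counts(rigid, tolerance=0.5)
print(flexible)  # {'paragraphs': {'minItems': 43, 'maxItems': 129}}
```

The input schema is left untouched, so a rigid schema and its loosened variant can be compared side by side before either is adopted.
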

---

## Proposed Architecture

### Three-Layer System

```
┌─────────────────────────────────────────────┐
│               BLUEPRINT LAYER               │
│  (Multi-schema + Content + Data Templates)  │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│           SCHEMA LAYER (Enhanced)           │
│ (Structure + Classification + Instructions) │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│              VALIDATION LAYER               │
│     (AST Validation + Content Analysis)     │
└─────────────────────────────────────────────┘
```

### Key Concepts

**1. Schema Classification System**

- **Required**: Must be present, validation fails if missing
- **Recommended**: Should be present, warning if missing
- **Optional**: May be present, no validation impact
- **Discouraged**: Should not be present, warning if present
- **Improper**: Must not be present, validation fails if present

**2. Content Control**

- **Content Instructions**: Human-readable guidance for section content
- **Content Patterns**: Regex/template patterns for content validation
- **Content Quality Metrics**: Word count, readability, completeness scoring

**3. Multi-Schema Conformance**

- Documents can conform to multiple schemas simultaneously
- Schema composition and inheritance
- Conflict resolution strategies

**4. Blueprint System**

- Schemas + Instructions + Data Templates = Blueprints
- Blueprints generate documents with content guidance
- Data binding for dynamic document generation

---

## Phase 1: Enhanced Schema Format

**Goal**: Extend JSON Schema with MarkiTect-specific content control extensions.
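
The five classification levels listed under Key Concepts reduce to three outcomes per section: an error, a warning, or no finding. The following is one possible encoding of that mapping, offered as a sketch rather than MarkiTect's actual implementation; `Classification`, `outcome`, and `check_sections` are hypothetical names, and the rules and headings are illustrative data.

```python
from enum import Enum
from typing import Dict, List, Optional, Set


class Classification(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"
    DISCOURAGED = "discouraged"
    IMPROPER = "improper"


def outcome(classification: Classification, present: bool) -> Optional[str]:
    """Return "error", "warning", or None for a single section."""
    if classification is Classification.REQUIRED:
        return None if present else "error"
    if classification is Classification.IMPROPER:
        return "error" if present else None
    if classification is Classification.RECOMMENDED:
        return None if present else "warning"
    if classification is Classification.DISCOURAGED:
        return "warning" if present else None
    return None  # OPTIONAL never affects validation


def check_sections(rules: Dict[str, str], headings: Set[str]) -> Dict[str, List[str]]:
    """Group section names by the outcome their classification produces."""
    report: Dict[str, List[str]] = {"error": [], "warning": []}
    for name, value in rules.items():
        result = outcome(Classification(value), name in headings)
        if result is not None:
            report[result].append(name)
    return report


# Hypothetical rules and document headings, for illustration only.
rules = {
    "SYNOPSIS": "required",
    "EXAMPLES": "recommended",
    "SEE ALSO": "optional",
    "DEPRECATED": "discouraged",
    "INTERNAL_NOTES": "improper",
}
report = check_sections(rules, {"DESCRIPTION", "DEPRECATED", "INTERNAL_NOTES"})
print(report)
```

Keeping the presence check separate from the severity mapping means content-level checks could later reuse the same `outcome` function without touching the section logic.
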

### 1.1 Schema Classification Extensions

**New Properties**:

```json
{
  "x-markitect-sections": {
    "SYNOPSIS": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_title",
      "content_instruction": "Brief command syntax showing all options",
      "min_code_blocks": 1,
      "max_code_blocks": 3
    },
    "EXAMPLES": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Practical usage examples with explanations",
      "min_code_blocks": 3,
      "warning_if_missing": "Examples greatly improve documentation usability"
    },
    "DEPRECATED": {
      "classification": "discouraged",
      "heading_level": 2,
      "warning_message": "DEPRECATED sections should be moved to historical docs"
    },
    "INTERNAL_NOTES": {
      "classification": "improper",
      "heading_level": 2,
      "error_message": "Internal notes must not appear in published documentation"
    }
  }
}
```

### 1.2 Content Control Extensions

**New Properties**:

```json
{
  "x-markitect-content-control": {
    "synopsis_section": {
      "min_paragraphs": 1,
      "max_paragraphs": 3,
      "required_patterns": [
        "\\*\\*[a-z-]+\\*\\*.*\\[.*\\]"
      ],
      "content_quality": {
        "min_words": 10,
        "max_words": 100,
        "readability_target": "technical"
      },
      "content_instructions": [
        "Show command name in bold",
        "Include all major options in synopsis",
        "Use italic for arguments and placeholders"
      ]
    }
  }
}
```

The `required_patterns` entry matches a bold command name followed by bracketed arguments.

### 1.3 Flexible Structure Constraints

**Replace rigid counts with ranges and classifications**:

```json
{
  "properties": {
    "headings": {
      "properties": {
        "level_2": {
          "items": {
            "properties": {
              "content": {
                "oneOf": [
                  {"const": "SYNOPSIS", "x-markitect-classification": "required"},
                  {"const": "DESCRIPTION", "x-markitect-classification": "required"},
                  {"const": "EXAMPLES", "x-markitect-classification": "recommended"},
                  {"const": "SEE ALSO", "x-markitect-classification": "optional"}
                ]
              }
            }
          },
          "minItems": 2,
          "maxItems": 30
        }
      }
    }
  }
}
```

Here `minItems` is 2 so that at least the required sections are present, and `maxItems` of 30 serves as a reasonable upper bound.

### Tasks

- [ ] **Task 1.1**: Define `x-markitect-sections` schema extension
format
- [ ] **Task 1.2**: Define `x-markitect-content-control` schema extension format
- [ ] **Task 1.3**: Update metaschema to validate new extensions
- [ ] **Task 1.4**: Create schema examples demonstrating all classifications
- [ ] **Task 1.5**: Document schema extension format

**Duration**: 3-4 sessions
**Dependencies**: None
**Deliverables**: Enhanced schema format specification, updated metaschema

---

## Phase 2: Schema Refinement Tools

**Goal**: Tools to transform rigid auto-generated schemas into flexible, classified schemas.

### 2.1 Schema Analysis Tool

**Command**: `markitect schema-analyze`

Analyzes an existing schema and suggests improvements:

```bash
markitect schema-analyze rigid-schema.json

# Output:
⚠️  Exact counts detected (86 paragraphs)
    Suggestion: Use range 50-150 for flexibility

⚠️  All sections unclassified
    Suggestion: Classify sections as required/recommended/optional

⚠️  No content instructions
    Suggestion: Add content guidance for key sections

✨ Run: markitect schema-refine rigid-schema.json
```

### 2.2 Schema Refinement Tool

**Command**: `markitect schema-refine`

Interactive or automated schema refinement:

```bash
# Automated: Apply common refinements
markitect schema-refine rigid-schema.json \
  --loosen-counts \
  --add-classifications \
  --output flexible-schema.json

# Interactive: Guided refinement
markitect schema-refine rigid-schema.json --interactive
```

**Refinement Operations**:

- Convert exact counts to ranges (configurable tolerance)
- Classify sections based on conventions
- Add content instructions from templates
- Merge multiple schemas for common patterns

### 2.3 Schema Composition Tool

**Command**: `markitect schema-compose`

Combine multiple schemas:

```bash
# Create composite schema
markitect schema-compose \
  --base manpage-schema.json \
  --extend api-reference-schema.json \
  --extend tutorial-schema.json \
  --output composite-schema.json
```

### Tasks

- [ ] **Task 2.1**: Implement `schema-analyze` command
- [ ] **Task 2.2**: Implement
`schema-refine` command with loosening logic
- [ ] **Task 2.3**: Implement `schema-refine --interactive` mode
- [ ] **Task 2.4**: Implement `schema-compose` command
- [ ] **Task 2.5**: Create schema refinement rule library

**Duration**: 3-4 sessions
**Dependencies**: Phase 1 complete
**Deliverables**: Schema analysis, refinement, and composition tools

---

## Phase 3: Enhanced Validation Engine

**Goal**: Validate classification levels, content patterns, and multi-schema conformance.

### 3.1 Classification-Aware Validation

**Validation Levels**:

```python
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass
class ValidationResult:
    status: Literal["valid", "valid_with_warnings", "invalid"]
    errors: List["ValidationError"] = field(default_factory=list)      # Required/Improper violations
    warnings: List["ValidationWarning"] = field(default_factory=list)  # Recommended/Discouraged violations
    suggestions: List[str] = field(default_factory=list)               # Optional improvements
```

**Example Output**:

```bash
markitect validate document.md schema.json --detailed-errors

❌ ERRORS (validation failed)
  - Missing required section: SYNOPSIS
  - Improper section present: INTERNAL_NOTES

⚠️  WARNINGS
  - Missing recommended section: EXAMPLES
  - Discouraged section present: DEPRECATED

💡 SUGGESTIONS
  - Consider adding optional section: PERFORMANCE
  - Content quality: DESCRIPTION section below recommended word count (45/100)

Status: INVALID (2 errors, 2 warnings)
```

### 3.2 Content Pattern Validation

**Validate content patterns**:

```python
# Schema specifies required patterns
content_control = {
    "synopsis_section": {
        "required_patterns": [
            r"\*\*command\*\*",  # Bold command name
            r"\[.*\]",           # Options in brackets
        ],
        "discouraged_patterns": [
            r"TODO",   # No TODOs in published docs
            r"FIXME",
        ],
    }
}
```

### 3.3 Multi-Schema Validation

**Command**: `markitect validate --schemas`

```bash
# Validate against multiple schemas
markitect validate api-doc.md \
  --schemas manpage.json,api-reference.json,tutorial.json \
  --require-all

# Output shows conformance to each schema
✅ manpage.json: VALID
✅ api-reference.json: VALID (2 warnings)
❌ tutorial.json: INVALID (missing required section: GETTING STARTED)

Overall: INVALID (must conform to all schemas)
```

### 3.4 Content Quality Metrics

**Validate content quality**:

```bash
markitect validate document.md schema.json --quality-check

📊 Content Quality Report
  - Word count: 487 (target: 300-1000) ✅
  - Code examples: 3 (minimum: 3) ✅
  - Readability: Technical (appropriate) ✅
  - Link validity: 12/12 valid ✅
  - Heading hierarchy: Valid ✅

Quality Score: 95/100
```

### Tasks

- [ ] **Task 3.1**: Implement classification-aware validator
- [ ] **Task 3.2**: Implement content pattern validation
- [ ] **Task 3.3**: Implement multi-schema validation
- [ ] **Task 3.4**: Implement content quality metrics
- [ ] **Task 3.5**: Implement enhanced error reporting with suggestions

**Duration**: 4-5 sessions
**Dependencies**: Phase 1 complete
**Deliverables**: Enhanced validation engine, quality metrics

---

## Phase 4: Blueprint System

**Goal**: Document generation system with schemas + content instructions + data templates.

### 4.1 Blueprint Format

**Blueprint Structure**:

```json
{
  "$blueprint": "1.0",
  "name": "api-documentation-blueprint",
  "description": "Blueprint for API endpoint documentation",
  "schemas": [
    "manpage-schema.json",
    "api-reference-schema.json"
  ],
  "content_model": {
    "synopsis": {
      "template": "**{{command}}** [*OPTIONS*] *{{primary_argument}}*",
      "data_source": "command_metadata.json",
      "instruction": "Brief command syntax"
    },
    "description": {
      "template": "{{description}}\n\nThis endpoint {{purpose}}.",
      "min_paragraphs": 2,
      "instruction": "Explain what the endpoint does and why to use it"
    },
    "parameters": {
      "template": "{{#each parameters}}\n**{{name}}** *{{type}}*\n: {{description}}\n{{/each}}",
      "data_source": "parameters",
      "instruction": "Document all parameters with types and descriptions"
    }
  },
  "data_schema": {
    "type": "object",
    "properties": {
      "command": {"type": "string"},
      "primary_argument": {"type": "string"},
      "description": {"type": "string"},
      "purpose": {"type": "string"},
      "parameters": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "type": {"type": "string"},
            "description": {"type": "string"}
          }
        }
      }
    }
  },
  "generation_rules": {
    "heading_style": "atx",
    "code_fence_style": "backticks",
    "line_length": 80,
    "include_metadata": true
  }
}
```

### 4.2 Blueprint Commands

**Create Blueprint**:

```bash
# From existing schema
markitect blueprint-create --from-schema api-schema.json \
  --output api-blueprint.json

# Interactive creation
markitect blueprint-create --interactive
```

**Generate from Blueprint**:

```bash
# Generate with data file
markitect blueprint-generate api-blueprint.json \
  --data endpoint-data.json \
  --output api-doc.md

# Generate with inline data
markitect blueprint-generate api-blueprint.json \
  --data '{"command": "api-call", "description": "Make API call"}' \
  --output api-doc.md

# Batch generation
markitect blueprint-generate-batch api-blueprint.json \
  --data-dir ./endpoints/ \
  --output-dir ./docs/api/
```

**Validate Blueprint**:

```bash
# Validate blueprint format
markitect blueprint-validate api-blueprint.json

# Test blueprint generation
markitect blueprint-test api-blueprint.json \
  --sample-data test-data.json
```

### 4.3 Template Engine Integration

**Handlebars-style templates with MarkiTect extensions**:

```markdown
# {{command}}(1) - {{title}}

## SYNOPSIS

**{{command}}** {{#each options}}[*{{this}}*] {{/each}}*{{argument}}*

## DESCRIPTION

{{description}}

{{#markitect-section "technical-details"}}
Technical implementation details for {{command}}.
{{/markitect-section}}

## PARAMETERS

{{#each parameters}}
**--{{name}}** *{{type}}*
: {{description}}
: {{#if default}}Default: `{{default}}`{{/if}}
{{/each}}

{{#markitect-code-block "bash"}}
# Example usage
{{command}} {{#each examples.[0].args}}{{this}} {{/each}}
{{/markitect-code-block}}
```

### Tasks

- [ ] **Task 4.1**: Define blueprint format specification
- [ ] **Task 4.2**: Implement `blueprint-create` command
- [ ] **Task 4.3**: Implement `blueprint-generate` command
- [ ] **Task 4.4**: Implement template engine with Handlebars
- [ ] **Task 4.5**: Implement `blueprint-validate` command
- [ ] **Task 4.6**: Implement batch generation
- [ ] **Task 4.7**: Create blueprint library (common patterns)

**Duration**: 5-6 sessions
**Dependencies**: Phases 1 and 3 complete
**Deliverables**: Blueprint system, template engine, generation commands

---

## Phase 5: Documentation and Integration

**Goal**: Comprehensive documentation, examples, and ecosystem integration.

### 5.1 Documentation Suite

**Documents to Create**:

- [ ] Schema Evolution Guide (why and how)
- [ ] Schema Classification Reference
- [ ] Content Control Specification
- [ ] Blueprint System Guide
- [ ] Schema Design Best Practices
- [ ] Migration Guide (old schemas → new format)
- [ ] API Reference for programmatic usage

### 5.2 Example Gallery

**Create comprehensive examples**:

- [ ] Manpage blueprint (already started)
- [ ] API documentation blueprint
- [ ] Tutorial document blueprint
- [ ] Architecture Decision Record (ADR) blueprint
- [ ] RFC/specification blueprint
- [ ] Meeting notes blueprint
- [ ] Project README blueprint

### 5.3 CLI Integration

**Update existing commands**:

```bash
# schema-generate with classification
markitect schema-generate example.md \
  --classify-sections \
  --add-instructions \
  --flexible \
  --output smart-schema.json

# validate with multiple schemas
markitect validate doc.md \
  --schemas schema1.json,schema2.json \
  --classification-aware \
  --quality-check

# generate-stub enhanced
markitect generate-stub schema.json \
  --include-instructions \
  --sample-content \
  --output template.md
```

### 5.4 CI/CD Integration Templates

**Provide ready-to-use integrations**:

GitHub Actions:

```yaml
- name: Validate Documentation
  uses: markitect/validate-action@v1
  with:
    schemas: docs/schemas/*.json
    files: docs/**/*.md
    classification-aware: true
    fail-on: errors
    warn-on: missing-recommended
```

Pre-commit hook:

```bash
#!/bin/bash
markitect validate-changed --schemas docs/schemas/ \
  --classification-aware \
  --fail-on errors
```

### Tasks

- [ ] **Task 5.1**: Write comprehensive documentation suite
- [ ] **Task 5.2**: Create example gallery with 7+ blueprints
- [ ] **Task 5.3**: Update all CLI commands for new features
- [ ] **Task 5.4**: Create CI/CD integration templates
- [ ] **Task 5.5**: Write migration guide for existing schemas
- [ ] **Task 5.6**: Create video tutorials/screencasts

**Duration**: 3-4 sessions
**Dependencies**: All previous phases complete
**Deliverables**: Complete documentation, examples, integrations

---

## Implementation Strategy

### Development Approach

**1. Test-Driven Development**

- Write tests for each classification level
- Test schema refinement transformations
- Test blueprint generation with various data
- Test multi-schema validation

**2. Backward Compatibility**

- Existing schemas continue to work
- New features are opt-in via extensions
- Clear migration path documented

**3. Incremental Rollout**

- Phase 1: Can be used immediately after completion
- Each phase delivers user value independently
- Later phases build on earlier phases

**4.
Community Feedback**

- Alpha release after Phase 1
- Beta release after Phase 3
- Stable release after Phase 5

### Technical Considerations

**Schema Format**:

- JSON Schema draft-07 as foundation
- MarkiTect extensions namespaced with `x-markitect-`
- Validation via metaschema
- Clear upgrade path to future JSON Schema versions

**Performance**:

- Cache compiled schemas
- Lazy validation for large documents
- Parallel validation for multiple schemas
- Optimize content pattern matching

**API Design**:

- Programmatic access to all features
- Python API for schema manipulation
- Plugin system for custom validators
- Extensible template engine

---

## Success Metrics

### Phase 1 Success

- ✅ Schema with all 5 classifications validates correctly
- ✅ Content instructions appear in generated stubs
- ✅ Metaschema validates all extension formats

### Phase 2 Success

- ✅ Rigid schema refined to flexible schema automatically
- ✅ Multiple schemas composed without conflicts
- ✅ Interactive refinement completes end-to-end

### Phase 3 Success

- ✅ Validation distinguishes errors from warnings
- ✅ Content patterns detected and reported
- ✅ Multi-schema validation works with 3+ schemas
- ✅ Quality metrics provide actionable feedback

### Phase 4 Success

- ✅ Blueprint generates valid document from data
- ✅ Generated document validates against source schemas
- ✅ Batch generation processes 100+ documents
- ✅ Template engine supports complex logic

### Phase 5 Success

- ✅ Documentation covers all features
- ✅ 7+ working blueprint examples
- ✅ CI/CD integrations work in real projects
- ✅ Migration guide successfully upgrades old schemas

---

## Risk Assessment

### Technical Risks

**Risk**: Schema format complexity
**Mitigation**: Clear examples, validation tools, gradual adoption

**Risk**: Performance degradation with complex schemas
**Mitigation**: Caching, optimization, benchmarking

**Risk**: Template engine security (code injection)
**Mitigation**: Sandboxed execution, no eval, strict
parsing

### Adoption Risks

**Risk**: Breaking changes to existing workflows
**Mitigation**: Full backward compatibility, opt-in features

**Risk**: Learning curve for new features
**Mitigation**: Excellent documentation, examples, tutorials

**Risk**: Feature bloat
**Mitigation**: Keep core simple, advanced features optional

---

## Future Enhancements (Post-MVP)

### Potential Future Features

**1. Semantic Validation**

- AI-powered content quality checking
- Grammar and style validation
- Factual consistency checking
- Link and reference validation

**2. Visual Schema Editor**

- Web-based GUI for schema creation
- Visual blueprint designer
- Live preview of generated documents
- Drag-and-drop section arrangement

**3. Schema Marketplace**

- Community schema repository
- Reusable blueprint library
- Rating and reviews system
- Version management

**4. Advanced Blueprint Features**

- Conditional sections based on data
- Dynamic schema selection
- Multi-language support
- Custom helper functions

**5. Integration Ecosystem**

- IDE plugins (VS Code, JetBrains)
- Documentation platforms (Read the Docs, Docusaurus)
- CMS integrations (Contentful, Strapi)
- Static site generators (Hugo, Jekyll)

---

## Conclusion

This workplan transforms MarkiTect from a structural validator to a comprehensive document control system:

**Current**: Rigid structure validation

**Target**: Flexible content control with blueprints

**Key Improvements**:

1. ✨ Classification system (required → improper)
2. ✨ Content guidance and instructions
3. ✨ Multi-schema conformance
4. ✨ Blueprint-based generation
5. ✨ Quality metrics and analysis

**Timeline**: ~8-10 weeks for full implementation

**Value**: Complete CMS-like document control for markdown

The system remains true to MarkiTect's philosophy of treating markdown as structured data while adding the flexibility and guidance needed for real-world content management.

---

## Next Steps

1. **Review and refine** this workplan
2. **Prioritize phases** based on user needs
3. **Create detailed specifications** for Phase 1
4. **Set up development environment** for new features
5. **Begin implementation** with TDD approach

**First Implementation Task**: Define `x-markitect-sections` format specification
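
As a possible starting point for that first task, the sketch below checks the shape of a single `x-markitect-sections` entry using the property names from section 1.1. It is an illustration under assumptions, not the specification itself: `check_section_spec` is a hypothetical helper, the restriction of `heading_level` to 1 through 6 is an assumption based on Markdown's heading levels, and the real format may well be enforced through the metaschema instead.

```python
from typing import List

ALLOWED_CLASSIFICATIONS = {
    "required", "recommended", "optional", "discouraged", "improper",
}


def check_section_spec(name: str, spec: dict) -> List[str]:
    """Return a list of problems found in one x-markitect-sections entry."""
    problems = []

    classification = spec.get("classification")
    if classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"{name}: unknown classification {classification!r}")

    # Assumption: heading levels follow Markdown's range of 1 to 6.
    level = spec.get("heading_level")
    if level is not None and (not isinstance(level, int) or not 1 <= level <= 6):
        problems.append(f"{name}: heading_level must be an integer from 1 to 6")

    # A range is only meaningful when its lower bound does not exceed the upper.
    lo, hi = spec.get("min_code_blocks"), spec.get("max_code_blocks")
    if lo is not None and hi is not None and lo > hi:
        problems.append(f"{name}: min_code_blocks exceeds max_code_blocks")

    return problems


synopsis = {
    "classification": "required",
    "heading_level": 2,
    "min_code_blocks": 1,
    "max_code_blocks": 3,
}
print(check_section_spec("SYNOPSIS", synopsis))  # []
```

Returning a list of problems rather than raising on the first one mirrors the workplan's goal of reporting all findings for a document in a single pass.
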