MarkiTect Schema Evolution Workplan

Executive Summary

Current State: MarkiTect validates document structure via JSON Schema, but its schemas are too rigid (exact element counts) and structure-only (no content guidance).

Target State: A flexible schema system with content control, section classification, multi-schema conformance, and blueprint-based document generation.

Timeline: 5 phases, 18-23 development sessions (the sum of the per-phase estimates), approximately 8-10 weeks.


Problem Analysis

Current Limitations

1. Structural Rigidity

Problem: Auto-generated schemas use exact counts, for example:

"paragraphs": { "minItems": 86, "maxItems": 86 }

Impact: Schemas are document-specific, not reusable patterns.

2. Binary Structure Validation

Problem: Elements are either valid or invalid, with no classification in between.

Need: Required, Recommended, Optional, Discouraged, and Improper classifications.

3. No Content Guidance

Problem: Schemas validate that structure exists, not what content belongs there.

Need: Content instructions, semantic patterns, quality expectations.

4. Single Schema Limitation

Problem: Documents can only conform to one schema.

Need: Multi-schema conformance (e.g., "manpage" + "API reference" + "tutorial").

5. Template Generation Gap

Problem: generate-stub creates an outline but provides no content guidance or data binding.

Need: Blueprint system with content instructions and data templates.


Proposed Architecture

Three-Layer System

┌─────────────────────────────────────────────┐
│         BLUEPRINT LAYER                      │
│  (Multi-schema + Content + Data Templates)   │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│         SCHEMA LAYER (Enhanced)              │
│  (Structure + Classification + Instructions) │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│         VALIDATION LAYER                     │
│  (AST Validation + Content Analysis)         │
└─────────────────────────────────────────────┘

Key Concepts

1. Schema Classification System

  • Required: Must be present, validation fails if missing
  • Recommended: Should be present, warning if missing
  • Optional: May be present, no validation impact
  • Discouraged: Should not be present, warning if present
  • Improper: Must not be present, validation fails if present
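The five classifications above reduce to a small decision table. The following is a minimal sketch of that table, not the MarkiTect implementation; the names `Classification` and `outcome` are hypothetical.

```python
from enum import Enum


class Classification(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"
    DISCOURAGED = "discouraged"
    IMPROPER = "improper"


def outcome(classification: Classification, present: bool) -> str:
    """Return "error", "warning", or "ok" for a single section."""
    if classification is Classification.REQUIRED and not present:
        return "error"    # must be present
    if classification is Classification.IMPROPER and present:
        return "error"    # must not be present
    if classification is Classification.RECOMMENDED and not present:
        return "warning"  # should be present
    if classification is Classification.DISCOURAGED and present:
        return "warning"  # should not be present
    return "ok"           # optional sections never affect validation
```

For example, `outcome(Classification.DISCOURAGED, present=True)` yields `"warning"`, while an optional section yields `"ok"` whether or not it is present.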

2. Content Control

  • Content Instructions: Human-readable guidance for section content
  • Content Patterns: Regex/template patterns for content validation
  • Content Quality Metrics: Word count, readability, completeness scoring

3. Multi-Schema Conformance

  • Documents can conform to multiple schemas simultaneously
  • Schema composition and inheritance
  • Conflict resolution strategies

4. Blueprint System

  • Schemas + Instructions + Data Templates = Blueprints
  • Blueprints generate documents with content guidance
  • Data binding for dynamic document generation

Phase 1: Enhanced Schema Format

Goal: Extend JSON Schema with MarkiTect-specific content control extensions.

1.1 Schema Classification Extensions

New Properties:

{
  "x-markitect-sections": {
    "SYNOPSIS": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_title",
      "content_instruction": "Brief command syntax showing all options",
      "min_code_blocks": 1,
      "max_code_blocks": 3
    },
    "EXAMPLES": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Practical usage examples with explanations",
      "min_code_blocks": 3,
      "warning_if_missing": "Examples greatly improve documentation usability"
    },
    "DEPRECATED": {
      "classification": "discouraged",
      "heading_level": 2,
      "warning_message": "DEPRECATED sections should be moved to historical docs"
    },
    "INTERNAL_NOTES": {
      "classification": "improper",
      "heading_level": 2,
      "error_message": "Internal notes must not appear in published documentation"
    }
  }
}
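Until the metaschema from Task 1.3 exists, the extension format can be sanity-checked by hand. The sketch below assumes only the property names shown in the example above; the function name `check_sections_extension` and the specific rules (heading levels 1-6, min not exceeding max) are illustrative assumptions.

```python
ALLOWED = {"required", "recommended", "optional", "discouraged", "improper"}


def check_sections_extension(schema: dict) -> list[str]:
    """Return a list of format problems found in x-markitect-sections."""
    problems = []
    for name, spec in schema.get("x-markitect-sections", {}).items():
        classification = spec.get("classification")
        if classification not in ALLOWED:
            problems.append(f"{name}: unknown classification {classification!r}")
        level = spec.get("heading_level")
        if level is not None and level not in range(1, 7):
            problems.append(f"{name}: heading_level must be 1-6")
        low, high = spec.get("min_code_blocks"), spec.get("max_code_blocks")
        if low is not None and high is not None and low > high:
            problems.append(f"{name}: min_code_blocks exceeds max_code_blocks")
    return problems
```

An empty list means the extension block passed these basic checks; it says nothing about whether a document conforms to the schema.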

1.2 Content Control Extensions

New Properties (the first entry in required_patterns matches a bold command name followed by bracketed arguments):

{
  "x-markitect-content-control": {
    "synopsis_section": {
      "min_paragraphs": 1,
      "max_paragraphs": 3,
      "required_patterns": [
        "\\*\\*[a-z-]+\\*\\*.*\\[.*\\]"
      ],
      "content_quality": {
        "min_words": 10,
        "max_words": 100,
        "readability_target": "technical"
      },
      "content_instructions": [
        "Show command name in bold",
        "Include all major options in synopsis",
        "Use italic for arguments and placeholders"
      ]
    }
  }
}
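The paragraph and word limits in a content-control entry could be enforced along these lines. This is a sketch under stated assumptions: paragraphs are separated by blank lines, words are whitespace-separated tokens (Markdown markup included), and `check_content_limits` is a hypothetical name.

```python
import re


def check_content_limits(text: str, rules: dict) -> list[str]:
    """Compare paragraph and word counts of a section against its rules."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    word_count = len(text.split())
    quality = rules.get("content_quality", {})
    checks = [
        ("paragraphs", len(paragraphs),
         rules.get("min_paragraphs"), rules.get("max_paragraphs")),
        ("words", word_count,
         quality.get("min_words"), quality.get("max_words")),
    ]
    issues = []
    for label, actual, low, high in checks:
        if low is not None and actual < low:
            issues.append(f"too few {label}: {actual} < {low}")
        if high is not None and actual > high:
            issues.append(f"too many {label}: {actual} > {high}")
    return issues
```

The `readability_target` property is deliberately left out, since the workplan does not define how readability is measured.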

1.3 Flexible Structure Constraints

Replace rigid counts with ranges and classifications:

{
  "properties": {
    "headings": {
      "properties": {
        "level_2": {
          "items": {
            "properties": {
              "content": {
                "oneOf": [
                  {"const": "SYNOPSIS", "x-markitect-classification": "required"},
                  {"const": "DESCRIPTION", "x-markitect-classification": "required"},
                  {"const": "EXAMPLES", "x-markitect-classification": "recommended"},
                  {"const": "SEE ALSO", "x-markitect-classification": "optional"}
                ]
              }
            }
          },
          "$comment": "minItems covers the required sections; maxItems is a reasonable upper bound",
          "minItems": 2,
          "maxItems": 30
        }
      }
    }
  }
}

Tasks

  • Task 1.1: Define x-markitect-sections schema extension format
  • Task 1.2: Define x-markitect-content-control schema extension format
  • Task 1.3: Update metaschema to validate new extensions
  • Task 1.4: Create schema examples demonstrating all classifications
  • Task 1.5: Document schema extension format

Duration: 3-4 sessions
Dependencies: None
Deliverables: Enhanced schema format specification, updated metaschema


Phase 2: Schema Refinement Tools

Goal: Tools to transform rigid auto-generated schemas into flexible, classified schemas.

2.1 Schema Analysis Tool

Command: markitect schema-analyze

Analyzes existing schema and suggests improvements:

markitect schema-analyze rigid-schema.json

# Output:
⚠️  Exact counts detected (86 paragraphs)
   Suggestion: Use range 50-150 for flexibility

⚠️  All sections unclassified
   Suggestion: Classify sections as required/recommended/optional

⚠️  No content instructions
   Suggestion: Add content guidance for key sections

✨ Run: markitect schema-refine rigid-schema.json

2.2 Schema Refinement Tool

Command: markitect schema-refine

Interactive or automated schema refinement:

# Automated: Apply common refinements
markitect schema-refine rigid-schema.json \
    --loosen-counts \
    --add-classifications \
    --output flexible-schema.json

# Interactive: Guided refinement
markitect schema-refine rigid-schema.json --interactive

Refinement Operations:

  • Convert exact counts to ranges (configurable tolerance)
  • Classify sections based on conventions
  • Add content instructions from templates
  • Merge multiple schemas for common patterns
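The first refinement operation, converting exact counts to ranges, might look like the sketch below. The function name `loosen_counts` and the symmetric percentage tolerance are assumptions for illustration; the real tool may derive ranges differently (the analyzer output above suggests 50-150 for a count of 86, which this rule does not reproduce).

```python
import math


def loosen_counts(schema, tolerance: float = 0.5):
    """Return a copy in which every exact count (minItems == maxItems)
    is widened to a range of +/- tolerance around the original value."""
    if isinstance(schema, list):
        return [loosen_counts(item, tolerance) for item in schema]
    if not isinstance(schema, dict):
        return schema
    result = {key: loosen_counts(value, tolerance) for key, value in schema.items()}
    low, high = result.get("minItems"), result.get("maxItems")
    if isinstance(low, int) and low == high:
        result["minItems"] = max(0, math.floor(low * (1 - tolerance)))
        result["maxItems"] = math.ceil(high * (1 + tolerance))
    return result
```

The input schema is never modified, so the rigid original can be kept alongside the refined version for comparison.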

2.3 Schema Composition Tool

Command: markitect schema-compose

Combine multiple schemas:

# Create composite schema
markitect schema-compose \
    --base manpage-schema.json \
    --extend api-reference-schema.json \
    --extend tutorial-schema.json \
    --output composite-schema.json
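Composition needs a conflict resolution strategy when two schemas classify the same section differently. The workplan does not fix one, so the sketch below assumes the simplest option: the earlier schema (the base) wins, and every disagreement is reported. `compose_sections` is a hypothetical helper, not the `schema-compose` implementation.

```python
def compose_sections(*schemas: dict) -> tuple[dict, list[str]]:
    """Merge x-markitect-sections blocks; earlier schemas win on conflict."""
    merged: dict = {}
    conflicts: list[str] = []
    for schema in schemas:
        for name, spec in schema.get("x-markitect-sections", {}).items():
            previous = merged.get(name)
            if previous is not None and (
                previous.get("classification") != spec.get("classification")
            ):
                conflicts.append(
                    f"{name}: {previous.get('classification')} "
                    f"vs {spec.get('classification')}"
                )
                continue  # keep the entry from the earlier schema
            # No conflict: combine properties, later values filling in details.
            merged[name] = {**(previous or {}), **spec}
    return merged, conflicts
```

Returning the conflict list separately lets a caller decide whether a disagreement is fatal or merely worth a warning.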

Tasks

  • Task 2.1: Implement schema-analyze command
  • Task 2.2: Implement schema-refine command with loosening logic
  • Task 2.3: Implement schema-refine --interactive mode
  • Task 2.4: Implement schema-compose command
  • Task 2.5: Create schema refinement rule library

Duration: 3-4 sessions
Dependencies: Phase 1 complete
Deliverables: Schema analysis, refinement, and composition tools


Phase 3: Enhanced Validation Engine

Goal: Validate classification levels, content patterns, and multi-schema conformance.

3.1 Classification-Aware Validation

Validation Levels:

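from dataclasses import dataclass, field
from typing import List, Literal

@dataclass
class ValidationResult:
    status: Literal["valid", "valid_with_warnings", "invalid"]
    errors: List["ValidationError"] = field(default_factory=list)      # Required/Improper violations
    warnings: List["ValidationWarning"] = field(default_factory=list)  # Recommended/Discouraged violations
    suggestions: List[str] = field(default_factory=list)               # Optional improvements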

Example Output:

markitect validate document.md schema.json --detailed-errors

❌ ERRORS (validation failed)
  - Missing required section: SYNOPSIS
  - Improper section present: INTERNAL_NOTES

⚠️  WARNINGS
  - Missing recommended section: EXAMPLES
  - Discouraged section present: DEPRECATED

💡 SUGGESTIONS
  - Consider adding optional section: PERFORMANCE
  - Content quality: DESCRIPTION section below recommended word count (45/100)

Status: INVALID (2 errors, 2 warnings)
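The final status line follows directly from the error and warning lists. A minimal sketch, assuming suggestions never affect the status; `overall_status` and `summary_line` are hypothetical names.

```python
def overall_status(errors: list[str], warnings: list[str]) -> str:
    """Derive the three-valued status from collected findings."""
    if errors:
        return "invalid"
    if warnings:
        return "valid_with_warnings"
    return "valid"


def summary_line(errors: list[str], warnings: list[str]) -> str:
    """Format the closing line of the report."""
    status = overall_status(errors, warnings).upper()
    return f"Status: {status} ({len(errors)} errors, {len(warnings)} warnings)"
```

With the two errors and two warnings from the example output, `summary_line` produces `Status: INVALID (2 errors, 2 warnings)`.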

3.2 Content Pattern Validation

Validate content patterns:

# Schema specifies required patterns
content_control = {
    "synopsis_section": {
        "required_patterns": [
            r"\*\*command\*\*",  # Bold command name
            r"\[.*\]",           # Options in brackets
        ],
        "discouraged_patterns": [
            r"TODO",             # No TODOs in published docs
            r"FIXME",
        ],
    }
}
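Applying such patterns to a section's text is a pair of regex searches. The sketch below assumes a missing required pattern is an error and a matched discouraged pattern is a warning, which mirrors the classification levels but is not stated explicitly in the workplan; `check_patterns` is a hypothetical name.

```python
import re


def check_patterns(text: str, rules: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one section's text."""
    errors = [
        f"missing required pattern: {pattern}"
        for pattern in rules.get("required_patterns", [])
        if not re.search(pattern, text)
    ]
    warnings = [
        f"discouraged pattern found: {pattern}"
        for pattern in rules.get("discouraged_patterns", [])
        if re.search(pattern, text)
    ]
    return errors, warnings
```

`re.search` is used rather than `re.match` so that a pattern may occur anywhere in the section, not only at its start.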

3.3 Multi-Schema Validation

Command: markitect validate --schemas

# Validate against multiple schemas
markitect validate api-doc.md \
    --schemas manpage.json,api-reference.json,tutorial.json \
    --require-all

# Output shows conformance to each schema
✅ manpage.json: VALID
✅ api-reference.json: VALID (2 warnings)
❌ tutorial.json: INVALID (missing required section: GETTING STARTED)

Overall: INVALID (must conform to all schemas)
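The overall verdict is an aggregation over per-schema results. In the sketch below, `--require-all` maps to "every schema must pass"; the behavior without that flag (any one schema passing is enough) is an assumption, as the workplan does not specify it. `overall_verdict` is a hypothetical name.

```python
def overall_verdict(results: dict[str, str], require_all: bool = True) -> str:
    """Combine per-schema statuses ("valid", "valid_with_warnings",
    "invalid") into a single VALID/INVALID verdict."""
    passed = [status != "invalid" for status in results.values()]
    ok = all(passed) if require_all else any(passed)
    return "VALID" if ok else "INVALID"
```

Warnings do not cause a schema to fail here, matching the example in which api-reference.json is reported as VALID with 2 warnings.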

3.4 Content Quality Metrics

Validate content quality:

markitect validate document.md schema.json --quality-check

📊 Content Quality Report
  - Word count: 487 (target: 300-1000) ✅
  - Code examples: 3 (minimum: 3) ✅
  - Readability: Technical (appropriate) ✅
  - Link validity: 12/12 valid ✅
  - Heading hierarchy: Valid ✅

Quality Score: 95/100

Tasks

  • Task 3.1: Implement classification-aware validator
  • Task 3.2: Implement content pattern validation
  • Task 3.3: Implement multi-schema validation
  • Task 3.4: Implement content quality metrics
  • Task 3.5: Implement enhanced error reporting with suggestions

Duration: 4-5 sessions
Dependencies: Phase 1 complete
Deliverables: Enhanced validation engine, quality metrics


Phase 4: Blueprint System

Goal: Document generation system with schemas + content instructions + data templates.

4.1 Blueprint Format

Blueprint Structure:

{
  "$blueprint": "1.0",
  "name": "api-documentation-blueprint",
  "description": "Blueprint for API endpoint documentation",

  "schemas": [
    "manpage-schema.json",
    "api-reference-schema.json"
  ],

  "content_model": {
    "synopsis": {
      "template": "**{{command}}** [*OPTIONS*] *{{primary_argument}}*",
      "data_source": "command_metadata.json",
      "instruction": "Brief command syntax"
    },
    "description": {
      "template": "{{description}}\n\nThis endpoint {{purpose}}.",
      "min_paragraphs": 2,
      "instruction": "Explain what the endpoint does and why to use it"
    },
    "parameters": {
      "template": "{{#each parameters}}\n**{{name}}** *{{type}}*\n: {{description}}\n{{/each}}",
      "data_source": "parameters",
      "instruction": "Document all parameters with types and descriptions"
    }
  },

  "data_schema": {
    "type": "object",
    "properties": {
      "command": {"type": "string"},
      "primary_argument": {"type": "string"},
      "description": {"type": "string"},
      "purpose": {"type": "string"},
      "parameters": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "type": {"type": "string"},
            "description": {"type": "string"}
          }
        }
      }
    }
  },

  "generation_rules": {
    "heading_style": "atx",
    "code_fence_style": "backticks",
    "line_length": 80,
    "include_metadata": true
  }
}
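One consistency check a blueprint validator could run is that every placeholder used in content_model is declared in data_schema. This is a sketch of that idea only; `undeclared_placeholders` is a hypothetical name, and names used inside `{{#each}}` blocks are skipped because they refer to item properties rather than top-level data.

```python
import re

EACH_BLOCK = re.compile(r"\{\{#each\s+([\w.]+)\s*\}\}.*?\{\{/each\}\}", re.DOTALL)
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)\s*\}\}")


def undeclared_placeholders(blueprint: dict) -> dict[str, list[str]]:
    """Map each content_model section to the top-level placeholders it
    uses that are missing from data_schema.properties."""
    declared = set(blueprint.get("data_schema", {}).get("properties", {}))
    report = {}
    for section, model in blueprint.get("content_model", {}).items():
        template = model.get("template", "")
        # The list iterated by an each-block counts as a used name.
        used = set(EACH_BLOCK.findall(template))
        # Remaining placeholders are looked up outside each-blocks only.
        used |= set(PLACEHOLDER.findall(EACH_BLOCK.sub("", template)))
        missing = sorted(used - declared)
        if missing:
            report[section] = missing
    return report
```

For the blueprint shown above, the report would be empty, since command, primary_argument, description, purpose, and parameters are all declared.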

4.2 Blueprint Commands

Create Blueprint:

# From existing schema
markitect blueprint-create --from-schema api-schema.json \
    --output api-blueprint.json

# Interactive creation
markitect blueprint-create --interactive

Generate from Blueprint:

# Generate with data file
markitect blueprint-generate api-blueprint.json \
    --data endpoint-data.json \
    --output api-doc.md

# Generate with inline data
markitect blueprint-generate api-blueprint.json \
    --data '{"command": "api-call", "description": "Make API call"}' \
    --output api-doc.md

# Batch generation
markitect blueprint-generate-batch api-blueprint.json \
    --data-dir ./endpoints/ \
    --output-dir ./docs/api/

Validate Blueprint:

# Validate blueprint format
markitect blueprint-validate api-blueprint.json

# Test blueprint generation
markitect blueprint-test api-blueprint.json \
    --sample-data test-data.json

4.3 Template Engine Integration

Handlebars-style templates with MarkiTect extensions:

# {{command}}(1) - {{title}}

## SYNOPSIS

**{{command}}** {{#each options}}[*{{this}}*] {{/each}}*{{argument}}*

## DESCRIPTION

{{description}}

{{#markitect-section "technical-details"}}
Technical implementation details for {{command}}.
{{/markitect-section}}

## PARAMETERS

{{#each parameters}}
**--{{name}}** *{{type}}*
: {{description}}
{{#if default}}: Default: `{{default}}`{{/if}}

{{/each}}

{{#markitect-code-block "bash"}}
# Example usage
{{command}} {{#each examples.[0].args}}{{this}} {{/each}}
{{/markitect-code-block}}
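To make the template semantics concrete, here is a deliberately small renderer covering only `{{name}}` substitution and non-nested `{{#each list}}` blocks. It is a sketch, not a Handlebars implementation: `{{#if}}`, path expressions, escaping, and the `markitect-*` helpers are not handled, and `render` is a hypothetical name.

```python
import re

VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")
TOKEN = re.compile(
    r"\{\{#each\s+(\w+)\s*\}\}(.*?)\{\{/each\}\}|\{\{\s*(\w+)\s*\}\}",
    re.DOTALL,
)


def _fill(template: str, scope: dict) -> str:
    """Replace {{name}} with scope[name]; unknown names become empty."""
    return VAR.sub(lambda m: str(scope.get(m.group(1), "")), template)


def render(template: str, data: dict) -> str:
    """Render variables and each-blocks in a single pass."""
    def replace(match: re.Match) -> str:
        if match.group(1) is not None:  # an each-block
            body = match.group(2)
            parts = []
            for item in data.get(match.group(1), []):
                # Dict items expose their keys; scalars are bound to "this".
                scope = item if isinstance(item, dict) else {"this": item}
                parts.append(_fill(body, {**data, **scope}))
            return "".join(parts)
        return str(data.get(match.group(3), ""))
    return TOKEN.sub(replace, template)
```

Because substituted values are never rescanned, a data value that itself contains `{{...}}` is emitted literally rather than expanded, which is one small step toward the injection concerns noted under Risk Assessment.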

Tasks

  • Task 4.1: Define blueprint format specification
  • Task 4.2: Implement blueprint-create command
  • Task 4.3: Implement blueprint-generate command
  • Task 4.4: Implement template engine with Handlebars
  • Task 4.5: Implement blueprint-validate command
  • Task 4.6: Implement batch generation
  • Task 4.7: Create blueprint library (common patterns)

Duration: 5-6 sessions
Dependencies: Phases 1 and 3 complete
Deliverables: Blueprint system, template engine, generation commands


Phase 5: Documentation and Integration

Goal: Comprehensive documentation, examples, and ecosystem integration.

5.1 Documentation Suite

Documents to Create:

  • Schema Evolution Guide (why and how)
  • Schema Classification Reference
  • Content Control Specification
  • Blueprint System Guide
  • Schema Design Best Practices
  • Migration Guide (old schemas → new format)
  • API Reference for programmatic usage

5.2 Example Gallery

Create comprehensive examples:

  • Manpage blueprint (already started)
  • API documentation blueprint
  • Tutorial document blueprint
  • Architecture Decision Record (ADR) blueprint
  • RFC/specification blueprint
  • Meeting notes blueprint
  • Project README blueprint

5.3 CLI Integration

Update existing commands:

# schema-generate with classification
markitect schema-generate example.md \
    --classify-sections \
    --add-instructions \
    --flexible \
    --output smart-schema.json

# validate with multiple schemas
markitect validate doc.md \
    --schemas schema1.json,schema2.json \
    --classification-aware \
    --quality-check

# generate-stub enhanced
markitect generate-stub schema.json \
    --include-instructions \
    --sample-content \
    --output template.md

5.4 CI/CD Integration Templates

Provide ready-to-use integrations:

GitHub Actions:

- name: Validate Documentation
  uses: markitect/validate-action@v1
  with:
    schemas: docs/schemas/*.json
    files: docs/**/*.md
    classification-aware: true
    fail-on: errors
    warn-on: missing-recommended

Pre-commit hook:

#!/bin/bash
markitect validate-changed --schemas docs/schemas/ \
    --classification-aware \
    --fail-on errors

Tasks

  • Task 5.1: Write comprehensive documentation suite
  • Task 5.2: Create example gallery with 7+ blueprints
  • Task 5.3: Update all CLI commands for new features
  • Task 5.4: Create CI/CD integration templates
  • Task 5.5: Write migration guide for existing schemas
  • Task 5.6: Create video tutorials/screencasts

Duration: 3-4 sessions
Dependencies: All previous phases complete
Deliverables: Complete documentation, examples, integrations


Implementation Strategy

Development Approach

1. Test-Driven Development

  • Write tests for each classification level
  • Test schema refinement transformations
  • Test blueprint generation with various data
  • Test multi-schema validation

2. Backward Compatibility

  • Existing schemas continue to work
  • New features are opt-in via extensions
  • Clear migration path documented

3. Incremental Rollout

  • Phase 1: Can be used immediately after completion
  • Each phase delivers user value independently
  • Later phases build on earlier phases

4. Community Feedback

  • Alpha release after Phase 1
  • Beta release after Phase 3
  • Stable release after Phase 5

Technical Considerations

Schema Format:

  • JSON Schema draft-07 as foundation
  • MarkiTect extensions namespaced with x-markitect-
  • Validation via metaschema
  • Clear upgrade path to future JSON Schema versions

Performance:

  • Cache compiled schemas
  • Lazy validation for large documents
  • Parallel validation for multiple schemas
  • Optimize content pattern matching
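Two of the performance items above, caching and parallel multi-schema validation, can be sketched with the standard library alone. The names `load_schema` and `validate_parallel` are hypothetical, and `validate_one` stands in for whatever single-schema validator MarkiTect provides.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable


@lru_cache(maxsize=128)
def load_schema(path: str) -> dict:
    """Parse a schema file once and reuse the result on later calls."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def validate_parallel(
    validate_one: Callable[[str, str], str],
    document: str,
    schemas: list[str],
) -> dict[str, str]:
    """Run validate_one(document, schema) for every schema concurrently
    and return the results keyed by schema, in input order."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda schema: validate_one(document, schema), schemas)
        return dict(zip(schemas, results))
```

The cached dictionary is shared between callers, so it should be treated as read-only. Threads mainly help when validation is I/O-bound; CPU-heavy validation in CPython would likely need processes instead.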

API Design:

  • Programmatic access to all features
  • Python API for schema manipulation
  • Plugin system for custom validators
  • Extensible template engine

Success Metrics

Phase 1 Success

  • Schema with all 5 classifications validates correctly
  • Content instructions appear in generated stubs
  • Metaschema validates all extension formats

Phase 2 Success

  • Rigid schema refined to flexible schema automatically
  • Multiple schemas composed without conflicts
  • Interactive refinement completes end-to-end

Phase 3 Success

  • Validation distinguishes errors from warnings
  • Content patterns detected and reported
  • Multi-schema validation works with 3+ schemas
  • Quality metrics provide actionable feedback

Phase 4 Success

  • Blueprint generates valid document from data
  • Generated document validates against source schemas
  • Batch generation processes 100+ documents
  • Template engine supports complex logic

Phase 5 Success

  • Documentation covers all features
  • 7+ working blueprint examples
  • CI/CD integrations work in real projects
  • Migration guide successfully upgrades old schemas

Risk Assessment

Technical Risks

Risk: Schema format complexity
Mitigation: Clear examples, validation tools, gradual adoption

Risk: Performance degradation with complex schemas
Mitigation: Caching, optimization, benchmarking

Risk: Template engine security (code injection)
Mitigation: Sandboxed execution, no eval, strict parsing

Adoption Risks

Risk: Breaking changes to existing workflows
Mitigation: Full backward compatibility, opt-in features

Risk: Learning curve for new features
Mitigation: Excellent documentation, examples, tutorials

Risk: Feature bloat
Mitigation: Keep core simple, advanced features optional


Future Enhancements (Post-MVP)

Potential Future Features

1. Semantic Validation

  • AI-powered content quality checking
  • Grammar and style validation
  • Factual consistency checking
  • Link and reference validation

2. Visual Schema Editor

  • Web-based GUI for schema creation
  • Visual blueprint designer
  • Live preview of generated documents
  • Drag-and-drop section arrangement

3. Schema Marketplace

  • Community schema repository
  • Reusable blueprint library
  • Rating and reviews system
  • Version management

4. Advanced Blueprint Features

  • Conditional sections based on data
  • Dynamic schema selection
  • Multi-language support
  • Custom helper functions

5. Integration Ecosystem

  • IDE plugins (VS Code, JetBrains)
  • Documentation platforms (Read the Docs, Docusaurus)
  • CMS integrations (Contentful, Strapi)
  • Static site generators (Hugo, Jekyll)

Conclusion

This workplan transforms MarkiTect from a structural validator to a comprehensive document control system:

Current: Rigid structure validation
Target: Flexible content control with blueprints

Key Improvements:

  1. Classification system (required → improper)
  2. Content guidance and instructions
  3. Multi-schema conformance
  4. Blueprint-based generation
  5. Quality metrics and analysis

Timeline: ~8-10 weeks for full implementation
Value: Complete CMS-like document control for markdown

The system remains true to MarkiTect's philosophy of treating markdown as structured data while adding the flexibility and guidance needed for real-world content management.


Next Steps

  1. Review and refine this workplan
  2. Prioritize phases based on user needs
  3. Create detailed specifications for Phase 1
  4. Set up development environment for new features
  5. Begin implementation with TDD approach

First Implementation Task: Define x-markitect-sections format specification