markitect-main/roadmap/SCHEMA_EVOLUTION_WORKPLAN.md
tegwick b6f95066a3 chore: establish schema-of-schemas workplan and reorganize roadmap
This commit sets up the comprehensive workplan for implementing a
markdown-first schema management system with naming conventions,
versioning, and self-validation capabilities.

## Directory Reorganization

- Renamed `todo/` → `roadmap/` for better organization
- Created `roadmap/schema-of-schemas/` subdirectory
- Moved schema management planning artifacts to dedicated directory

## Planning Artifacts Created

### Workplan & Documentation
- **WORKPLAN.md** (19KB) - Comprehensive 6-phase implementation plan
- **SCHEMA_MANAGEMENT_PROPOSAL.md** - Full analysis with 4 options
- **SCHEMA_MANAGEMENT_SUMMARY.md** - Executive summary
- **README.md** - Quick reference guide

### Example Schema
- **examples/schemas/manpage-schema-v1.md** - Demonstrates markdown format

## Schema Management System Design

### Naming Convention
**Format:** `{domain}-schema-v{major}.{minor}.md`
**Examples:**
- `manpage-schema-v1.0.md`
- `terminology-schema-v1.0.md`
- `api-documentation-schema-v1.0.md`

### Markdown-First Format
Schemas will be markdown files with:
- YAML frontmatter for metadata
- Rich documentation sections
- Embedded JSON schema in code block
- Version history and examples

### Implementation Phases (8-10 days)

**Phase 0:** Planning & Setup  (0.5 days) - COMPLETE
**Phase 1:** Filename Convention (1 day) - NEXT
**Phase 2:** Markdown Loader (2-3 days)
**Phase 3:** Schema-for-Schemas (2 days)
**Phase 4:** Schema Migration (1-2 days)
**Phase 5:** CLI & Documentation (1 day)
**Phase 6:** Testing & Validation (1 day)

### Goals

1.  Establish naming convention
2.  Implement filename validation
3.  Create markdown schema loader
4.  Build schema-for-schemas metaschema
5.  Migrate 5 existing schemas (remove 2 duplicates)
6.  Update CLI and documentation

## Updated Tracking

### TODO.md
- Added Schema-of-Schemas as active work item
- Documented Phase 1 tasks and timeline
- Paused capability extraction work

### CHANGELOG.md
- Added schema management system to [Unreleased]
- Documented directory reorganization
- Added "In Progress" section for current work

## Next Steps

Begin Phase 1:
1. Implement schema_naming.py with validation
2. Add unit tests
3. Update CLI schema-ingest command
4. Create naming specification document

## Files Changed

- CHANGELOG.md - Added unreleased schema management features
- TODO.md - Updated active work tracking
- roadmap/ - Reorganized from todo/
- roadmap/schema-of-schemas/ - New planning directory
- examples/schemas/ - Example markdown schema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 23:47:02 +01:00


# MarkiTect Schema Evolution Workplan
## Executive Summary
**Current State**: MarkiTect validates document structure via JSON Schema, but the generated schemas are too rigid (exact counts) and structure-only (no content guidance).
**Target State**: A flexible schema system with content control, section classification, multi-schema conformance, and blueprint-based document generation.
**Timeline**: 5 phases, 18-23 development sessions (the sum of the per-phase estimates below), approximately 8-10 weeks.
---
## Problem Analysis
### Current Limitations
#### 1. Structural Rigidity
**Problem**: Auto-generated schemas use exact counts:
```json
"paragraphs": { "minItems": 86, "maxItems": 86 }
```
**Impact**: Schemas are document-specific, not reusable patterns.
#### 2. Binary Structure Validation
**Problem**: Elements are either valid or invalid; there is no graded classification.
**Need**: Required, Recommended, Optional, Discouraged, Improper classifications.
#### 3. No Content Guidance
**Problem**: Schemas validate that structure exists, not what content belongs in it.
**Need**: Content instructions, semantic patterns, quality expectations.
#### 4. Single Schema Limitation
**Problem**: Documents can only conform to one schema.
**Need**: Multi-schema conformance (e.g., "manpage" + "API reference" + "tutorial").
#### 5. Template Generation Gap
**Problem**: `generate-stub` creates an outline but provides no content guidance or data binding.
**Need**: Blueprint system with content instructions and data templates.
---
## Proposed Architecture
### Three-Layer System
```
┌─────────────────────────────────────────────┐
│ BLUEPRINT LAYER │
│ (Multi-schema + Content + Data Templates) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ SCHEMA LAYER (Enhanced) │
│ (Structure + Classification + Instructions) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ VALIDATION LAYER │
│ (AST Validation + Content Analysis) │
└─────────────────────────────────────────────┘
```
### Key Concepts
**1. Schema Classification System**
- **Required**: Must be present, validation fails if missing
- **Recommended**: Should be present, warning if missing
- **Optional**: May be present, no validation impact
- **Discouraged**: Should not be present, warning if present
- **Improper**: Must not be present, validation fails if present
**2. Content Control**
- **Content Instructions**: Human-readable guidance for section content
- **Content Patterns**: Regex/template patterns for content validation
- **Content Quality Metrics**: Word count, readability, completeness scoring
**3. Multi-Schema Conformance**
- Documents can conform to multiple schemas simultaneously
- Schema composition and inheritance
- Conflict resolution strategies
**4. Blueprint System**
- Schemas + Instructions + Data Templates = Blueprints
- Blueprints generate documents with content guidance
- Data binding for dynamic document generation
---
## Phase 1: Enhanced Schema Format
**Goal**: Extend JSON Schema with MarkiTect-specific content control extensions.
### 1.1 Schema Classification Extensions
**New Properties**:
```json
{
  "x-markitect-sections": {
    "SYNOPSIS": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_title",
      "content_instruction": "Brief command syntax showing all options",
      "min_code_blocks": 1,
      "max_code_blocks": 3
    },
    "EXAMPLES": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Practical usage examples with explanations",
      "min_code_blocks": 3,
      "warning_if_missing": "Examples greatly improve documentation usability"
    },
    "DEPRECATED": {
      "classification": "discouraged",
      "heading_level": 2,
      "warning_message": "DEPRECATED sections should be moved to historical docs"
    },
    "INTERNAL_NOTES": {
      "classification": "improper",
      "heading_level": 2,
      "error_message": "Internal notes must not appear in published documentation"
    }
  }
}
```
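Task 1.3 calls for a metaschema that validates these extensions. As a rough illustration of the kind of checks involved, the sketch below validates `x-markitect-sections` entries in plain Python. The function name `check_sections_extension` and the heading-level bounds (1 to 6, as in Markdown ATX headings) are assumptions made for this example, not part of MarkiTect.

```python
from typing import Any, Dict, List

# The five classifications defined in this workplan.
CLASSIFICATIONS = {"required", "recommended", "optional", "discouraged", "improper"}


def check_sections_extension(schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the x-markitect-sections block."""
    problems: List[str] = []
    for name, spec in schema.get("x-markitect-sections", {}).items():
        classification = spec.get("classification")
        if classification not in CLASSIFICATIONS:
            problems.append(f"{name}: unknown classification {classification!r}")

        level = spec.get("heading_level")
        # heading_level is treated as optional here (an assumption);
        # bool is an int subclass, so it is excluded explicitly.
        if level is not None and (
            isinstance(level, bool) or not isinstance(level, int) or not 1 <= level <= 6
        ):
            problems.append(f"{name}: heading_level must be an integer from 1 to 6")

        low, high = spec.get("min_code_blocks"), spec.get("max_code_blocks")
        if low is not None and high is not None and low > high:
            problems.append(f"{name}: min_code_blocks exceeds max_code_blocks")
    return problems
```

An empty list means the block passed these particular checks; a real metaschema would cover the remaining properties as well.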
### 1.2 Content Control Extensions
**New Properties** (the required pattern matches a bold command name followed by bracketed arguments):
```json
{
  "x-markitect-content-control": {
    "synopsis_section": {
      "min_paragraphs": 1,
      "max_paragraphs": 3,
      "required_patterns": [
        "\\*\\*[a-z-]+\\*\\*.*\\[.*\\]"
      ],
      "content_quality": {
        "min_words": 10,
        "max_words": 100,
        "readability_target": "technical"
      },
      "content_instructions": [
        "Show command name in bold",
        "Include all major options in synopsis",
        "Use italic for arguments and placeholders"
      ]
    }
  }
}
```
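The paragraph and word-count limits above could be enforced roughly as follows. This is a minimal sketch, assuming paragraphs are separated by blank lines and words by whitespace; `check_content_quality` and its message format are hypothetical, and `readability_target` is not evaluated.

```python
import re
from typing import Any, Dict, List, Optional


def _check_range(label: str, value: int, low: Optional[int],
                 high: Optional[int], issues: List[str]) -> None:
    """Append an issue if value falls outside the optional [low, high] range."""
    if low is not None and value < low:
        issues.append(f"{label}: {value} is below the minimum of {low}")
    if high is not None and value > high:
        issues.append(f"{label}: {value} is above the maximum of {high}")


def check_content_quality(text: str, control: Dict[str, Any]) -> List[str]:
    """Check one section's text against its content-control entry."""
    issues: List[str] = []
    # Paragraphs are runs of text separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    words = text.split()

    _check_range("paragraphs", len(paragraphs),
                 control.get("min_paragraphs"), control.get("max_paragraphs"), issues)
    quality = control.get("content_quality", {})
    _check_range("words", len(words),
                 quality.get("min_words"), quality.get("max_words"), issues)
    return issues
```

Keeping the range check in one helper means further count-based limits can reuse the same reporting.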
### 1.3 Flexible Structure Constraints
**Replace rigid counts with ranges and classifications** (`minItems` covers at least the two required sections; `maxItems` sets a reasonable upper bound):
```json
{
  "properties": {
    "headings": {
      "properties": {
        "level_2": {
          "items": {
            "properties": {
              "content": {
                "oneOf": [
                  {"const": "SYNOPSIS", "x-markitect-classification": "required"},
                  {"const": "DESCRIPTION", "x-markitect-classification": "required"},
                  {"const": "EXAMPLES", "x-markitect-classification": "recommended"},
                  {"const": "SEE ALSO", "x-markitect-classification": "optional"}
                ]
              }
            }
          },
          "minItems": 2,
          "maxItems": 30
        }
      }
    }
  }
}
```
### Tasks
- [ ] **Task 1.1**: Define `x-markitect-sections` schema extension format
- [ ] **Task 1.2**: Define `x-markitect-content-control` schema extension format
- [ ] **Task 1.3**: Update metaschema to validate new extensions
- [ ] **Task 1.4**: Create schema examples demonstrating all classifications
- [ ] **Task 1.5**: Document schema extension format
**Duration**: 3-4 sessions
**Dependencies**: None
**Deliverables**: Enhanced schema format specification, updated metaschema
---
## Phase 2: Schema Refinement Tools
**Goal**: Tools to transform rigid auto-generated schemas into flexible, classified schemas.
### 2.1 Schema Analysis Tool
**Command**: `markitect schema-analyze`
Analyzes an existing schema and suggests improvements:
```bash
markitect schema-analyze rigid-schema.json
# Output:
⚠️ Exact counts detected (86 paragraphs)
Suggestion: Use range 50-150 for flexibility
⚠️ All sections unclassified
Suggestion: Classify sections as required/recommended/optional
⚠️ No content instructions
Suggestion: Add content guidance for key sections
✨ Run: markitect schema-refine rigid-schema.json
```
### 2.2 Schema Refinement Tool
**Command**: `markitect schema-refine`
Interactive or automated schema refinement:
```bash
# Automated: Apply common refinements
markitect schema-refine rigid-schema.json \
--loosen-counts \
--add-classifications \
--output flexible-schema.json
# Interactive: Guided refinement
markitect schema-refine rigid-schema.json --interactive
```
**Refinement Operations**:
- Convert exact counts to ranges (configurable tolerance)
- Classify sections based on conventions
- Add content instructions from templates
- Merge multiple schemas for common patterns
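The first operation, converting exact counts to ranges, might look like the sketch below. It assumes an "exact count" means `minItems` equal to `maxItems` and that the tolerance is a fraction applied symmetrically; the function name `loosen_counts` and the default of 0.5 are illustrative choices, not the behaviour of `--loosen-counts`.

```python
import math
from typing import Any


def loosen_counts(node: Any, tolerance: float = 0.5) -> Any:
    """Return a copy of a schema with exact item counts widened into ranges."""
    if isinstance(node, list):
        return [loosen_counts(item, tolerance) for item in node]
    if not isinstance(node, dict):
        return node

    # Recurse first so nested constraints are loosened as well.
    result = {key: loosen_counts(value, tolerance) for key, value in node.items()}
    low, high = result.get("minItems"), result.get("maxItems")
    # Only exact counts (minItems == maxItems) are touched; real ranges are kept.
    if isinstance(low, int) and low == high:
        result["minItems"] = max(0, math.floor(low * (1 - tolerance)))
        result["maxItems"] = math.ceil(high * (1 + tolerance))
    return result
```

Because the function builds new dictionaries rather than editing in place, the rigid input schema stays available for comparison.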
### 2.3 Schema Composition Tool
**Command**: `markitect schema-compose`
Combine multiple schemas:
```bash
# Create composite schema
markitect schema-compose \
--base manpage-schema.json \
--extend api-reference-schema.json \
--extend tutorial-schema.json \
--output composite-schema.json
```
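The workplan leaves the conflict-resolution strategy for composition open. One possible strategy, sketched here purely as an assumption, is to keep the stronger classification when two schemas agree on direction (presence or absence) and to reject the composition when they disagree. `compose_sections` and the `RANK` table are hypothetical names.

```python
from typing import Any, Dict

# Assumed ranking: positive values ask for presence, negative for absence.
RANK = {"required": 2, "recommended": 1, "optional": 0,
        "discouraged": -1, "improper": -2}


def compose_sections(base: Dict[str, Any], *extensions: Dict[str, Any]) -> Dict[str, Any]:
    """Merge x-markitect-sections blocks, keeping the stronger classification."""
    merged = {name: dict(spec)
              for name, spec in base.get("x-markitect-sections", {}).items()}
    for schema in extensions:
        for name, spec in schema.get("x-markitect-sections", {}).items():
            if name not in merged:
                merged[name] = dict(spec)
                continue
            old = RANK[merged[name]["classification"]]
            new = RANK[spec["classification"]]
            if old * new < 0:
                # One schema wants the section present, the other wants it absent.
                raise ValueError(f"conflicting classifications for section {name!r}")
            if abs(new) > abs(old):
                # The extension is stricter: its keys override, other keys are kept.
                merged[name] = {**merged[name], **spec}
    return {"x-markitect-sections": merged}
```

Raising on opposite directions surfaces the conflict at composition time rather than producing a schema no document can satisfy.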
### Tasks
- [ ] **Task 2.1**: Implement `schema-analyze` command
- [ ] **Task 2.2**: Implement `schema-refine` command with loosening logic
- [ ] **Task 2.3**: Implement `schema-refine --interactive` mode
- [ ] **Task 2.4**: Implement `schema-compose` command
- [ ] **Task 2.5**: Create schema refinement rule library
**Duration**: 3-4 sessions
**Dependencies**: Phase 1 complete
**Deliverables**: Schema analysis, refinement, and composition tools
---
## Phase 3: Enhanced Validation Engine
**Goal**: Validate classification levels, content patterns, and multi-schema conformance.
### 3.1 Classification-Aware Validation
**Validation Levels**:
```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class ValidationResult:
    status: Literal["valid", "valid_with_warnings", "invalid"]
    errors: List["ValidationError"]      # Required/Improper violations
    warnings: List["ValidationWarning"]  # Recommended/Discouraged violations
    suggestions: List[str]               # Optional improvements
```
**Example Output**:
```bash
markitect validate document.md schema.json --detailed-errors
❌ ERRORS (validation failed)
- Missing required section: SYNOPSIS
- Improper section present: INTERNAL_NOTES
⚠️ WARNINGS
- Missing recommended section: EXAMPLES
- Discouraged section present: DEPRECATED
💡 SUGGESTIONS
- Consider adding optional section: PERFORMANCE
- Content quality: DESCRIPTION section below recommended word count (45/100)
Status: INVALID (2 errors, 2 warnings)
```
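The mapping from classifications to errors and warnings shown in the output above can be sketched as follows. This is an illustration only: `SectionReport` and `validate_sections` are hypothetical names, and sections are matched by exact heading text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class SectionReport:
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def status(self) -> str:
        if self.errors:
            return "invalid"
        return "valid_with_warnings" if self.warnings else "valid"


def validate_sections(present: Iterable[str], sections: Dict[str, Any]) -> SectionReport:
    """Apply the five classification rules to the section headings found."""
    found = set(present)
    report = SectionReport()
    for name, spec in sections.items():
        kind = spec["classification"]
        if kind == "required" and name not in found:
            report.errors.append(f"Missing required section: {name}")
        elif kind == "improper" and name in found:
            report.errors.append(f"Improper section present: {name}")
        elif kind == "recommended" and name not in found:
            report.warnings.append(f"Missing recommended section: {name}")
        elif kind == "discouraged" and name in found:
            report.warnings.append(f"Discouraged section present: {name}")
        # "optional" sections never affect the result.
    return report
```

Only required and improper violations make the status `invalid`; recommended and discouraged violations downgrade it to `valid_with_warnings`.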
### 3.2 Content Pattern Validation
**Validate content patterns**:
```python
# Schema specifies required and discouraged patterns
content_control = {
    "synopsis_section": {
        "required_patterns": [
            r"\*\*command\*\*",  # Bold command name
            r"\[.*\]",           # Options in brackets
        ],
        "discouraged_patterns": [
            r"TODO",   # No TODOs in published docs
            r"FIXME",
        ],
    }
}
```
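Checking a section against such patterns could be as simple as the sketch below. It assumes a missing required pattern is reported as an error and a matched discouraged pattern as a warning, by analogy with the classification levels; `check_patterns` is a hypothetical helper.

```python
import re
from typing import Any, Dict, List, Tuple


def check_patterns(text: str, rules: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one section's text."""
    errors = [
        f"Required pattern not found: {pattern}"
        for pattern in rules.get("required_patterns", [])
        if not re.search(pattern, text)
    ]
    warnings = [
        f"Discouraged pattern found: {pattern}"
        for pattern in rules.get("discouraged_patterns", [])
        if re.search(pattern, text)
    ]
    return errors, warnings
```

`re.search` is used rather than `re.match` so a pattern may occur anywhere in the section text.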
### 3.3 Multi-Schema Validation
**Command**: `markitect validate --schemas`
```bash
# Validate against multiple schemas
markitect validate api-doc.md \
--schemas manpage.json,api-reference.json,tutorial.json \
--require-all
# Output shows conformance to each schema
✅ manpage.json: VALID
✅ api-reference.json: VALID (2 warnings)
❌ tutorial.json: INVALID (missing required section: GETTING STARTED)
Overall: INVALID (must conform to all schemas)
```
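Combining the per-schema results into the overall verdict might be sketched like this. With `require_all` every schema must pass, as in the example output; treating the absence of `--require-all` as "at least one schema must pass" is an assumption, since the workplan does not define that case. `overall_status` is a hypothetical name.

```python
from typing import Dict


def overall_status(per_schema: Dict[str, str], require_all: bool = True) -> str:
    """Combine per-schema statuses ("valid", "valid_with_warnings", "invalid")."""
    # Warnings do not fail a schema; only "invalid" does.
    passed = [status != "invalid" for status in per_schema.values()]
    ok = all(passed) if require_all else any(passed)
    return "VALID" if ok else "INVALID"
```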
### 3.4 Content Quality Metrics
**Validate content quality**:
```bash
markitect validate document.md schema.json --quality-check
📊 Content Quality Report
- Word count: 487 (target: 300-1000)
- Code examples: 3 (minimum: 3)
- Readability: Technical (appropriate)
- Link validity: 12/12 valid ✅
- Heading hierarchy: Valid ✅
Quality Score: 95/100
```
### Tasks
- [ ] **Task 3.1**: Implement classification-aware validator
- [ ] **Task 3.2**: Implement content pattern validation
- [ ] **Task 3.3**: Implement multi-schema validation
- [ ] **Task 3.4**: Implement content quality metrics
- [ ] **Task 3.5**: Enhanced error reporting with suggestions
**Duration**: 4-5 sessions
**Dependencies**: Phase 1 complete
**Deliverables**: Enhanced validation engine, quality metrics
---
## Phase 4: Blueprint System
**Goal**: Document generation system with schemas + content instructions + data templates.
### 4.1 Blueprint Format
**Blueprint Structure**:
```json
{
  "$blueprint": "1.0",
  "name": "api-documentation-blueprint",
  "description": "Blueprint for API endpoint documentation",
  "schemas": [
    "manpage-schema.json",
    "api-reference-schema.json"
  ],
  "content_model": {
    "synopsis": {
      "template": "**{{command}}** [*OPTIONS*] *{{primary_argument}}*",
      "data_source": "command_metadata.json",
      "instruction": "Brief command syntax"
    },
    "description": {
      "template": "{{description}}\n\nThis endpoint {{purpose}}.",
      "min_paragraphs": 2,
      "instruction": "Explain what the endpoint does and why to use it"
    },
    "parameters": {
      "template": "{{#each parameters}}\n**{{name}}** *{{type}}*\n: {{description}}\n{{/each}}",
      "data_source": "parameters",
      "instruction": "Document all parameters with types and descriptions"
    }
  },
  "data_schema": {
    "type": "object",
    "properties": {
      "command": {"type": "string"},
      "primary_argument": {"type": "string"},
      "description": {"type": "string"},
      "purpose": {"type": "string"},
      "parameters": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "type": {"type": "string"},
            "description": {"type": "string"}
          }
        }
      }
    }
  },
  "generation_rules": {
    "heading_style": "atx",
    "code_fence_style": "backticks",
    "line_length": 80,
    "include_metadata": true
  }
}
```
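To make the data-binding idea concrete, the sketch below fills the simple `{{name}}` placeholders used in the `synopsis` and `description` templates. It is deliberately not a Handlebars implementation: block helpers such as `{{#each}}` are left untouched, and `render_simple` is a hypothetical helper rather than a MarkiTect API.

```python
import re
from typing import Any, Dict

# Matches {{name}} with optional inner whitespace; block helpers such as
# {{#each ...}} and {{/each}} do not match and are left as-is.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_simple(template: str, data: Dict[str, Any]) -> str:
    """Fill {{name}} placeholders; raise KeyError if the data lacks a name."""

    def substitute(match):
        return str(data[match.group(1)])

    return PLACEHOLDER.sub(substitute, template)
```

Failing loudly on a missing key is a design choice for this sketch: a blueprint that silently renders empty fields would produce documents that still need manual review.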
### 4.2 Blueprint Commands
**Create Blueprint**:
```bash
# From existing schema
markitect blueprint-create --from-schema api-schema.json \
--output api-blueprint.json
# Interactive creation
markitect blueprint-create --interactive
```
**Generate from Blueprint**:
```bash
# Generate with data file
markitect blueprint-generate api-blueprint.json \
--data endpoint-data.json \
--output api-doc.md
# Generate with inline data
markitect blueprint-generate api-blueprint.json \
--data '{"command": "api-call", "description": "Make API call"}' \
--output api-doc.md
# Batch generation
markitect blueprint-generate-batch api-blueprint.json \
--data-dir ./endpoints/ \
--output-dir ./docs/api/
```
**Validate Blueprint**:
```bash
# Validate blueprint format
markitect blueprint-validate api-blueprint.json
# Test blueprint generation
markitect blueprint-test api-blueprint.json \
--sample-data test-data.json
```
### 4.3 Template Engine Integration
**Handlebars-style templates with MarkiTect extensions**:
```markdown
# {{command}}(1) - {{title}}
## SYNOPSIS
**{{command}}** {{#each options}}[*{{this}}*] {{/each}}*{{argument}}*
## DESCRIPTION
{{description}}
{{#markitect-section "technical-details"}}
Technical implementation details for {{command}}.
{{/markitect-section}}
## PARAMETERS
{{#each parameters}}
**--{{name}}** *{{type}}*
: {{description}}
: {{#if default}}Default: `{{default}}`{{/if}}
{{/each}}
{{#markitect-code-block "bash"}}
# Example usage
{{command}} {{#each examples.[0].args}}{{this}} {{/each}}
{{/markitect-code-block}}
```
### Tasks
- [ ] **Task 4.1**: Define blueprint format specification
- [ ] **Task 4.2**: Implement `blueprint-create` command
- [ ] **Task 4.3**: Implement `blueprint-generate` command
- [ ] **Task 4.4**: Implement template engine with Handlebars
- [ ] **Task 4.5**: Implement `blueprint-validate` command
- [ ] **Task 4.6**: Implement batch generation
- [ ] **Task 4.7**: Create blueprint library (common patterns)
**Duration**: 5-6 sessions
**Dependencies**: Phases 1 and 3 complete
**Deliverables**: Blueprint system, template engine, generation commands
---
## Phase 5: Documentation and Integration
**Goal**: Comprehensive documentation, examples, and ecosystem integration.
### 5.1 Documentation Suite
**Documents to Create**:
- [ ] Schema Evolution Guide (why and how)
- [ ] Schema Classification Reference
- [ ] Content Control Specification
- [ ] Blueprint System Guide
- [ ] Schema Design Best Practices
- [ ] Migration Guide (old schemas → new format)
- [ ] API Reference for programmatic usage
### 5.2 Example Gallery
**Create comprehensive examples**:
- [ ] Manpage blueprint (already started)
- [ ] API documentation blueprint
- [ ] Tutorial document blueprint
- [ ] Architecture Decision Record (ADR) blueprint
- [ ] RFC/specification blueprint
- [ ] Meeting notes blueprint
- [ ] Project README blueprint
### 5.3 CLI Integration
**Update existing commands**:
```bash
# schema-generate with classification
markitect schema-generate example.md \
--classify-sections \
--add-instructions \
--flexible \
--output smart-schema.json
# validate with multiple schemas
markitect validate doc.md \
--schemas schema1.json,schema2.json \
--classification-aware \
--quality-check
# generate-stub enhanced
markitect generate-stub schema.json \
--include-instructions \
--sample-content \
--output template.md
```
### 5.4 CI/CD Integration Templates
**Provide ready-to-use integrations**:
GitHub Actions:
```yaml
- name: Validate Documentation
  uses: markitect/validate-action@v1
  with:
    schemas: docs/schemas/*.json
    files: docs/**/*.md
    classification-aware: true
    fail-on: errors
    warn-on: missing-recommended
```
Pre-commit hook:
```bash
#!/bin/bash
markitect validate-changed --schemas docs/schemas/ \
--classification-aware \
--fail-on errors
```
### Tasks
- [ ] **Task 5.1**: Write comprehensive documentation suite
- [ ] **Task 5.2**: Create example gallery with 7+ blueprints
- [ ] **Task 5.3**: Update all CLI commands for new features
- [ ] **Task 5.4**: Create CI/CD integration templates
- [ ] **Task 5.5**: Write migration guide for existing schemas
- [ ] **Task 5.6**: Create video tutorials/screencasts
**Duration**: 3-4 sessions
**Dependencies**: All previous phases complete
**Deliverables**: Complete documentation, examples, integrations
---
## Implementation Strategy
### Development Approach
**1. Test-Driven Development**
- Write tests for each classification level
- Test schema refinement transformations
- Test blueprint generation with various data
- Test multi-schema validation
**2. Backward Compatibility**
- Existing schemas continue to work
- New features are opt-in via extensions
- Clear migration path documented
**3. Incremental Rollout**
- Phase 1: Can be used immediately after completion
- Each phase delivers user value independently
- Later phases build on earlier phases
**4. Community Feedback**
- Alpha release after Phase 1
- Beta release after Phase 3
- Stable release after Phase 5
### Technical Considerations
**Schema Format**:
- JSON Schema draft-07 as foundation
- MarkiTect extensions namespaced with `x-markitect-`
- Validation via metaschema
- Clear upgrade path to future JSON Schema versions
**Performance**:
- Cache compiled schemas
- Lazy validation for large documents
- Parallel validation for multiple schemas
- Optimize content pattern matching
**API Design**:
- Programmatic access to all features
- Python API for schema manipulation
- Plugin system for custom validators
- Extensible template engine
---
## Success Metrics
### Phase 1 Success
- ✅ Schema with all 5 classifications validates correctly
- ✅ Content instructions appear in generated stubs
- ✅ Metaschema validates all extension formats
### Phase 2 Success
- ✅ Rigid schema refined to flexible schema automatically
- ✅ Multiple schemas composed without conflicts
- ✅ Interactive refinement completes end-to-end
### Phase 3 Success
- ✅ Validation distinguishes errors from warnings
- ✅ Content patterns detected and reported
- ✅ Multi-schema validation works with 3+ schemas
- ✅ Quality metrics provide actionable feedback
### Phase 4 Success
- ✅ Blueprint generates valid document from data
- ✅ Generated document validates against source schemas
- ✅ Batch generation processes 100+ documents
- ✅ Template engine supports complex logic
### Phase 5 Success
- ✅ Documentation covers all features
- ✅ 7+ working blueprint examples
- ✅ CI/CD integrations work in real projects
- ✅ Migration guide successfully upgrades old schemas
---
## Risk Assessment
### Technical Risks
- **Risk**: Schema format complexity. **Mitigation**: Clear examples, validation tools, gradual adoption
- **Risk**: Performance degradation with complex schemas. **Mitigation**: Caching, optimization, benchmarking
- **Risk**: Template engine security (code injection). **Mitigation**: Sandboxed execution, no eval, strict parsing
### Adoption Risks
- **Risk**: Breaking changes to existing workflows. **Mitigation**: Full backward compatibility, opt-in features
- **Risk**: Learning curve for new features. **Mitigation**: Excellent documentation, examples, tutorials
- **Risk**: Feature bloat. **Mitigation**: Keep core simple, advanced features optional
---
## Future Enhancements (Post-MVP)
### Potential Future Features
**1. Semantic Validation**
- AI-powered content quality checking
- Grammar and style validation
- Factual consistency checking
- Link and reference validation
**2. Visual Schema Editor**
- Web-based GUI for schema creation
- Visual blueprint designer
- Live preview of generated documents
- Drag-and-drop section arrangement
**3. Schema Marketplace**
- Community schema repository
- Reusable blueprint library
- Rating and reviews system
- Version management
**4. Advanced Blueprint Features**
- Conditional sections based on data
- Dynamic schema selection
- Multi-language support
- Custom helper functions
**5. Integration Ecosystem**
- IDE plugins (VS Code, JetBrains)
- Documentation platforms (Read the Docs, Docusaurus)
- CMS integrations (Contentful, Strapi)
- Static site generators (Hugo, Jekyll)
---
## Conclusion
This workplan transforms MarkiTect from a structural validator to a comprehensive document control system:
**Current**: Rigid structure validation
**Target**: Flexible content control with blueprints
**Key Improvements**:
1. ✨ Classification system (required → improper)
2. ✨ Content guidance and instructions
3. ✨ Multi-schema conformance
4. ✨ Blueprint-based generation
5. ✨ Quality metrics and analysis
**Timeline**: ~8-10 weeks for full implementation
**Value**: Complete CMS-like document control for markdown
The system remains true to MarkiTect's philosophy of treating markdown as structured data while adding the flexibility and guidance needed for real-world content management.
---
## Next Steps
1. **Review and refine** this workplan
2. **Prioritize phases** based on user needs
3. **Create detailed specifications** for Phase 1
4. **Set up development environment** for new features
5. **Begin implementation** with TDD approach
**First Implementation Task**: Define `x-markitect-sections` format specification