# Schema-of-Schemas Implementation Workplan

**Project:** Implement Markdown-First Schema System with Self-Description
**Created:** 2026-01-04
**Status:** Planning
**Duration:** 8-10 days
**Priority:** High - Foundation for all schema work

## Executive Summary

This workplan implements a comprehensive schema management system:

1. Filename conventions and versioning
2. Markdown-first schema format (`.md` with embedded JSON)
3. Schema-for-schemas (metaschema) for validation
4. Migration of existing schemas
5. Cleanup of legacy schemas from registry

## Project Goals

### Primary Goals

- [x] Establish filename convention: `{domain}-schema-v{version}.md`
- [ ] Implement markdown schema parser (extract JSON from markdown)
- [ ] Create schema-for-schemas to validate all schemas
- [ ] Migrate existing schemas to new format
- [ ] Remove legacy/duplicate schemas from registry

### Success Criteria

- ✅ All schemas follow naming convention
- ✅ Schemas stored as markdown files with embedded JSON
- ✅ Schema-for-schemas validates all schemas successfully
- ✅ No duplicate schemas in registry
- ✅ CLI commands work with `.md` schema files
- ✅ Documentation updated

## Architecture Overview

### Current State

```
Schemas: JSON files (.json)
Naming: Inconsistent (api-documentation, markdown-manpage-schema.json)
Versioning: None
Documentation: Separate or missing
Registry: Database with 5 schemas (3 duplicates)
```

### Target State

```
Schemas: Markdown files (.md) with embedded JSON
Naming: {domain}-schema-v{major}.{minor}.md
Versioning: SemVer in filename and metadata
Documentation: Inline with schema
Registry: Clean, versioned, no duplicates
Validation: Schema-for-schemas validates all schemas
```

### Components to Build

```
markitect/
├── schema_loader.py          # NEW: Load schemas from markdown
├── schema_validator.py       # UPDATED: Support .md schemas
├── cli.py                    # UPDATED: Accept .md schema files
└── schemas/
    ├── schema-schema-v1.md   # NEW: Schema-for-schemas
    └── ...versioned schemas...
examples/schemas/             # Markdown schema examples
└── manpage-schema-v1.md      # Already created

roadmap/schema-of-schemas/    # Planning artifacts
├── WORKPLAN.md               # This file
├── SCHEMA_NAMING_SPEC.md     # Naming convention spec
└── IMPLEMENTATION_LOG.md     # Progress tracking
```

## Phase Breakdown

### Phase 0: Planning & Setup ✅ (0.5 days)

**Goal:** Establish project structure and specifications

**Tasks:**
- [x] Create roadmap/schema-of-schemas directory
- [x] Move planning documents to roadmap
- [ ] Write naming convention specification
- [ ] Document schema metadata standard
- [ ] Create implementation checklist

**Deliverables:**
- [x] Directory structure
- [ ] SCHEMA_NAMING_SPEC.md
- [ ] SCHEMA_METADATA_SPEC.md
- [ ] This workplan

**Duration:** 0.5 days
**Status:** In Progress

---

### Phase 1: Filename Convention & Validation (1 day)

**Goal:** Establish and enforce filename conventions

**1.1 Define Naming Convention**

**Specification:**
```
Format: {domain}-schema-v{major}.{minor}.md

Components:
- domain: lowercase, hyphen-separated (e.g., "manpage", "api-documentation")
- schema: literal string "schema"
- version: SemVer major.minor (e.g., "v1.0", "v2.1")
- extension: ".md" (markdown)

Examples:
✓ manpage-schema-v1.0.md
✓ terminology-schema-v1.0.md
✓ api-documentation-schema-v1.0.md
✗ manpage.json (missing version)
✗ manpage-v1.md (missing "schema")
✗ ManPage-Schema-v1.0.md (wrong case)
```

**1.2 Implement Validation Function**

**File:** `markitect/schema_naming.py` (NEW)

```python
import re
from pathlib import Path
from typing import Tuple, Optional

# Named groups (domain, major, minor) are read back by validate_schema_filename.
SCHEMA_FILENAME_PATTERN = re.compile(
    r'^(?P<domain>[a-z][a-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$'
)

def validate_schema_filename(filename: str) -> Tuple[bool, Optional[dict]]:
    """
    Validate schema filename against convention.
    Returns:
        (is_valid, metadata_dict)
    """
    match = SCHEMA_FILENAME_PATTERN.match(filename)
    if not match:
        return False, None

    return True, {
        'domain': match.group('domain'),
        'version': f"{match.group('major')}.{match.group('minor')}",
        'major': int(match.group('major')),
        'minor': int(match.group('minor'))
    }

def suggest_schema_filename(domain: str, version: str) -> str:
    """Generate correct schema filename from domain and version."""
    # Normalize domain: lowercase, replace spaces and underscores with hyphens
    domain_clean = domain.lower().replace(' ', '-').replace('_', '-')
    return f"{domain_clean}-schema-v{version}.md"
```

**1.3 Add CLI Validation**

**Update:** `markitect/cli.py` - schema-ingest command

```python
@cli.command('schema-ingest')
@click.argument('schema_file', type=click.Path(exists=True, path_type=Path))
@click.option('--force', is_flag=True, help='Skip filename validation')
def schema_ingest(config, schema_file, force):
    """Ingest schema file with filename validation."""
    import sys
    from .schema_naming import validate_schema_filename, suggest_schema_filename

    filename = schema_file.name
    is_valid, metadata = validate_schema_filename(filename)

    if not is_valid and not force:
        click.echo(f"❌ Invalid schema filename: {filename}", err=True)
        click.echo("\nExpected format: {domain}-schema-v{major}.{minor}.md")
        click.echo("Example: manpage-schema-v1.0.md")

        # Try to suggest correct name
        # ... extract domain/version from file content ...
        # (`domain` and `version` below come from the elided extraction step)
        suggestion = suggest_schema_filename(domain, version)
        click.echo(f"\nSuggested filename: {suggestion}")

        click.echo("\nUse --force to skip validation")
        sys.exit(1)

    # Continue with ingestion...
```

**Tasks:**
- [ ] Write `markitect/schema_naming.py`
- [ ] Add unit tests for filename validation
- [ ] Update `schema-ingest` command with validation
- [ ] Test with valid and invalid filenames

**Deliverables:**
- [ ] schema_naming.py with validation logic
- [ ] Unit tests (tests/test_schema_naming.py)
- [ ] Updated CLI with validation
- [ ] SCHEMA_NAMING_SPEC.md documentation

**Duration:** 1 day

---

### Phase 2: Markdown Schema Loader (2-3 days)

**Goal:** Parse markdown files to extract JSON schemas

**2.1 Design Markdown Schema Format**

**Format Specification:**

````markdown
---
schema-id: "https://markitect.dev/schemas/{domain}/v{major}"
version: "{major}.{minor}.{patch}"
status: "stable|draft|deprecated"
---

# {Title} v{version}

## Overview
[Human-readable description]

## Usage
[Examples of how to use this schema]

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://markitect.dev/schemas/{domain}/v{major}",
  "version": "{major}.{minor}.{patch}",
  ...
}
```

## Validation Rules
[Explanation of schema rules]

## Version History
[Changelog]
````

**2.2 Implement Markdown Schema Loader**

**File:** `markitect/schema_loader.py` (NEW)

```python
"""
Schema Loader - Extract JSON schemas from markdown files.

Supports:
- YAML frontmatter for metadata
- JSON code block for schema definition
- Validation of schema structure
"""

import re
import json
import yaml
from pathlib import Path
from typing import Dict, Any, Optional, Tuple


class MarkdownSchemaLoader:
    """Load and parse markdown schema files."""

    def __init__(self):
        # \A anchors the match to the very start of the file, so a pair of
        # `---` horizontal rules later in the document is not mistaken for
        # frontmatter.
        self.frontmatter_pattern = re.compile(
            r'\A---\s*\n(.*?)\n---\s*\n',
            re.DOTALL
        )
        self.json_code_block_pattern = re.compile(
            r'```json\s*\n(.*?)\n```',
            re.DOTALL | re.MULTILINE
        )

    def load_schema(self, md_path: Path) -> Dict[str, Any]:
        """
        Load schema from markdown file.

        Returns:
            {
                'schema': {...},          # Extracted JSON schema
                'metadata': {...},        # Frontmatter metadata
                'documentation': '...'    # Full markdown content
            }
        """
        if not md_path.exists():
            raise FileNotFoundError(f"Schema file not found: {md_path}")

        content = md_path.read_text(encoding='utf-8')

        # Extract frontmatter
        metadata = self._extract_frontmatter(content)

        # Extract JSON schema
        schema = self._extract_json_schema(content)

        if not schema:
            raise ValueError(f"No JSON schema found in {md_path}")

        # Merge metadata into schema
        schema = self._merge_metadata(schema, metadata, md_path)

        return {
            'schema': schema,
            'metadata': metadata,
            'documentation': content,
            'source_file': str(md_path)
        }

    def _extract_frontmatter(self, content: str) -> Dict[str, Any]:
        """Extract YAML frontmatter from markdown."""
        match = self.frontmatter_pattern.search(content)
        if not match:
            return {}

        try:
            return yaml.safe_load(match.group(1)) or {}
        except yaml.YAMLError as e:
            raise ValueError(f"Invalid YAML frontmatter: {e}")

    def _extract_json_schema(self, content: str) -> Optional[Dict[str, Any]]:
        """Extract JSON schema from code block."""
        matches = self.json_code_block_pattern.findall(content)

        if not matches:
            return None

        # Use the first JSON code block as schema
        # (or could look for specific heading like "## Schema Definition")
        try:
            return json.loads(matches[0])
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid JSON schema: {e}")

    def _merge_metadata(
        self,
        schema: Dict[str, Any],
        metadata: Dict[str, Any],
        source_file: Path
    ) -> Dict[str, Any]:
        """Merge frontmatter metadata into schema."""
        # Add MarkiTect-specific metadata
        schema['x-markitect-source'] = {
            'file': str(source_file),
            'format': 'markdown',
            'frontmatter': metadata
        }

        # Override schema fields with frontmatter if present
        if 'version' in metadata:
            schema['version'] = metadata['version']

        if 'schema-id' in metadata:
            schema['$id'] = metadata['schema-id']

        return schema

    def save_schema(
        self,
        schema: Dict[str, Any],
        md_path: Path,
        template: Optional[str] = None
    ):
        """
        Save schema as markdown file.
        Args:
            schema: JSON schema dict
            md_path: Output path
            template: Optional markdown template
        """
        if template:
            # Use provided template
            # NOTE: _render_template is not defined in this plan yet
            content = self._render_template(template, schema)
        else:
            # Generate basic markdown
            content = self._generate_markdown(schema)

        md_path.write_text(content, encoding='utf-8')

    def _generate_markdown(self, schema: Dict[str, Any]) -> str:
        """Generate markdown from schema."""
        title = schema.get('title', 'Untitled Schema')
        version = schema.get('version', '1.0.0')
        description = schema.get('description', '')

        # Generate frontmatter
        frontmatter = yaml.dump({
            'schema-id': schema.get('$id', ''),
            'version': version,
            'status': 'draft'
        }, default_flow_style=False)

        # Build the code fence at runtime so this listing contains no
        # literal fence line of its own.
        fence = '`' * 3

        # Generate markdown
        md = f"""---
{frontmatter}---

# {title} v{version}

## Overview

{description}

## Schema Definition

{fence}json
{json.dumps(schema, indent=2)}
{fence}

## Version History

### v{version}
- Initial version
"""
        return md


class SchemaLoaderError(Exception):
    """Base exception for schema loading errors."""
    pass
```

**2.3 Update Schema Validator**

**Update:** `markitect/schema_validator.py`

```python
import json
from pathlib import Path

from .schema_loader import MarkdownSchemaLoader

class SchemaValidator:
    def __init__(self):
        self.schema_generator = SchemaGenerator()
        self.jsonschema_available = JSONSCHEMA_AVAILABLE
        self.md_loader = MarkdownSchemaLoader()  # NEW

    def validate_file_against_schema_file(
        self,
        file_path: Path,
        schema_file_path: Path
    ) -> bool:
        """Validate file against schema (supports .json and .md)."""

        # Detect schema file format
        if schema_file_path.suffix == '.md':
            # Load from markdown
            schema_data = self.md_loader.load_schema(schema_file_path)
            schema = schema_data['schema']
        else:
            # Load from JSON (legacy)
            schema_content = schema_file_path.read_text(encoding='utf-8')
            schema = json.loads(schema_content)

        return self.validate_file_against_schema(file_path, schema)
```

**Tasks:**
- [ ] Implement MarkdownSchemaLoader class
- [ ] Add frontmatter extraction (YAML)
- [ ] Add JSON code block extraction
- [ ] Add metadata merging logic
- [ ] Write comprehensive unit tests
- [ ] Update SchemaValidator to use loader
- [ ] Test with example markdown schemas

**Deliverables:**
- [ ] schema_loader.py implementation
- [ ] Unit tests (tests/test_schema_loader.py)
- [ ] Updated schema_validator.py
- [ ] Integration tests

**Duration:** 2-3 days

---

### Phase 3: Schema-for-Schemas (2 days)

**Goal:** Create metaschema to validate all schema files

**3.1 Design Schema-for-Schemas**

**File:** `markitect/schemas/schema-schema-v1.md`

**Purpose:** Validates that schema files follow MarkiTect conventions

**Validates:**
- Required fields ($schema, $id, version, title, description)
- Version format (SemVer)
- $id URL format
- x-markitect-* extensions
- Section classifications
- Content control structures

**3.2 Implement Schema-for-Schemas**

See separate file: `roadmap/schema-of-schemas/schema-schema-v1.md` (to be created)

**3.3 Add Schema Validation Command**

**New CLI command:** `markitect schema-validate`

```python
@cli.command('schema-validate')
@click.argument('schema_file', type=click.Path(exists=True, path_type=Path))
@click.option('--detailed-errors', is_flag=True)
def schema_validate(config, schema_file, detailed_errors):
    """
    Validate a schema file against the schema-for-schemas.

    Ensures schema files follow MarkiTect conventions and standards.
    """
    import sys
    from .schema_loader import MarkdownSchemaLoader
    from .schema_validator import SchemaValidator

    loader = MarkdownSchemaLoader()
    validator = SchemaValidator()

    # Load the schema
    try:
        schema_data = loader.load_schema(schema_file)
        schema = schema_data['schema']
    except Exception as e:
        click.echo(f"❌ Failed to load schema: {e}", err=True)
        sys.exit(1)

    # Load schema-for-schemas
    metaschema_path = Path(__file__).parent / 'schemas' / 'schema-schema-v1.md'
    metaschema_data = loader.load_schema(metaschema_path)
    metaschema = metaschema_data['schema']

    # Validate
    # (validate_schema_against_metaschema is to be added to SchemaValidator
    # as part of this phase's "Add schema validation logic" task)
    is_valid = validator.validate_schema_against_metaschema(schema, metaschema)

    if is_valid:
        click.echo(f"✅ Schema is valid: {schema_file.name}")
        click.echo(f"   Title: {schema.get('title')}")
        click.echo(f"   Version: {schema.get('version')}")
    else:
        click.echo(f"❌ Schema validation failed: {schema_file.name}", err=True)
        if detailed_errors:
            # Show detailed errors
            pass
        sys.exit(1)
```

**Tasks:**
- [ ] Design schema-for-schemas structure
- [ ] Implement schema-schema-v1.md
- [ ] Add schema validation logic
- [ ] Create `schema-validate` CLI command
- [ ] Test all existing schemas against metaschema
- [ ] Document validation rules

**Deliverables:**
- [ ] schema-schema-v1.md (metaschema)
- [ ] schema-validate command
- [ ] Validation documentation
- [ ] Test suite

**Duration:** 2 days

---

### Phase 4: Schema Migration (1-2 days)

**Goal:** Convert existing schemas to new format

**4.1 Inventory Current Schemas**

Current schemas in database:
```
1. terminology-schema.json → terminology-schema-v1.0.md
2. api-documentation → api-documentation-schema-v1.0.md
3. enhanced-manpage → manpage-schema-v2.0.md
4. markdown-manpage → manpage-schema-v1.0.md (DUPLICATE)
5. markdown-manpage-schema.json → manpage-schema-v1.0.md (DUPLICATE)
```

**Decision matrix:**
- Keep enhanced-manpage as v2.0 (has classifications)
- Merge markdown-manpage variants into v1.0
- Update terminology to v1.0
- Update api-documentation to v1.0

**4.2 Create Migration Script**

**File:** `scripts/migrate_schemas.py`

```python
#!/usr/bin/env python3
"""Migrate schemas to markdown format with versioning."""

import json
from pathlib import Path
from markitect.schema_loader import MarkdownSchemaLoader
from markitect.database import DatabaseManager

def migrate_schema(
    db_manager: DatabaseManager,
    old_name: str,
    new_name: str,
    version: str,
    domain: str
):
    """Migrate single schema to new format."""
    # Get old schema from database
    old_schema = db_manager.get_schema_file(old_name)
    if not old_schema:
        print(f"❌ Schema not found: {old_name}")
        return

    schema_json = json.loads(old_schema['schema_content'])

    # Update metadata
    schema_json['version'] = version
    schema_json['$id'] = f"https://markitect.dev/schemas/{domain}/v{version.split('.')[0]}"

    # Save as markdown
    loader = MarkdownSchemaLoader()
    md_path = Path(f"markitect/schemas/{new_name}")
    loader.save_schema(schema_json, md_path)

    print(f"✓ Migrated: {old_name} → {new_name}")

    # Ingest new schema
    # ... ingest markdown schema to database ...
    return md_path

def main():
    migrations = [
        ('terminology-schema.json', 'terminology-schema-v1.0.md', '1.0.0', 'terminology'),
        ('api-documentation', 'api-documentation-schema-v1.0.md', '1.0.0', 'api-documentation'),
        ('enhanced-manpage', 'manpage-schema-v2.0.md', '2.0.0', 'manpage'),
        ('markdown-manpage', 'manpage-schema-v1.0.md', '1.0.0', 'manpage'),
    ]

    db = DatabaseManager('markitect.db')

    for old, new, version, domain in migrations:
        migrate_schema(db, old, new, version, domain)

if __name__ == '__main__':
    main()
```

**4.3 Execute Migration**

```bash
# Run migration script
python scripts/migrate_schemas.py

# Validate all new schemas
for schema in markitect/schemas/*-schema-v*.md; do
    markitect schema-validate "$schema"
done

# Ingest new schemas
for schema in markitect/schemas/*-schema-v*.md; do
    markitect schema-ingest "$schema"
done
```

**4.4 Clean Up Registry**

```bash
# Remove old schemas from database
markitect schema-delete markdown-manpage
markitect schema-delete markdown-manpage-schema.json
markitect schema-delete api-documentation
markitect schema-delete enhanced-manpage
markitect schema-delete terminology-schema.json

# Verify cleanup
markitect schema-list
# Should show only versioned .md schemas
```

**Tasks:**
- [ ] Create schema inventory
- [ ] Write migration script
- [ ] Test migration on one schema
- [ ] Execute full migration
- [ ] Validate all migrated schemas
- [ ] Remove old schemas from database
- [ ] Update examples to use new schema names

**Deliverables:**
- [ ] scripts/migrate_schemas.py
- [ ] All schemas migrated to markdown
- [ ] Clean registry (no duplicates)
- [ ] Migration report

**Duration:** 1-2 days

---

### Phase 5: CLI & Documentation Updates (1 day)

**Goal:** Update CLI and documentation for new system

**5.1 Update CLI Commands**

Commands to update:
- `schema-ingest` - Accept .md files, validate filename
- `schema-list` - Show version in output
- `schema-get` - Export as .md or .json
- `validate` - Accept .md schema files
- `generate-stub` - Work with .md schemas
- `schema-generate` - Output .md format option
- NEW: `schema-validate` - Validate against metaschema

**5.2 Update Documentation**

Files to update:
- README.md - Mention markdown schemas
- examples/terminology/README.md - Use new schema name
- docs/specifications/schema-extensions-spec.md - Document markdown format
- Create: docs/guides/schema-authoring-guide.md

**5.3 Add Schema Templates**

**File:** `templates/schema-template-v1.md`

````markdown
---
schema-id: "https://markitect.dev/schemas/DOMAIN/v1"
version: "1.0.0"
status: "draft"
---

# TITLE Schema v1.0

## Overview

[Description of what this schema validates]

## Document Types

- [Document type 1]
- [Document type 2]

## Usage

```bash
markitect validate document.md --schema DOMAIN-schema-v1.0.md
```

## Examples

See [examples/DOMAIN/example.md](../../examples/DOMAIN/example.md)

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://markitect.dev/schemas/DOMAIN/v1",
  "version": "1.0.0",
  "title": "TITLE Schema",
  "description": "Schema for validating DESCRIPTION",
  "type": "object",
  "properties": {
    "headings": {
      "type": "object"
    }
  },
  "x-markitect-sections": {},
  "x-markitect-content-control": {}
}
```

## Validation Rules

### Required Sections
- **SECTION** - Description

### Optional Sections
- **SECTION** - Description

## Version History

### v1.0.0 (YYYY-MM-DD)
- Initial release
````

**Tasks:**
- [ ] Update all CLI commands for .md support
- [ ] Update documentation
- [ ] Create schema authoring guide
- [ ] Add schema template
- [ ] Update examples
- [ ] Test all workflows end-to-end

**Deliverables:**
- [ ] Updated CLI commands
- [ ] Schema authoring guide
- [ ] Schema template
- [ ] Updated examples
- [ ] End-to-end tests

**Duration:** 1 day

---

### Phase 6: Testing & Validation (1 day)

**Goal:** Comprehensive testing of new system

**6.1 Unit Tests**

Test coverage for:
- `schema_naming.py` - Filename validation
- `schema_loader.py` - Markdown parsing
- `schema_validator.py` - Validation with .md schemas

**6.2 Integration Tests**

End-to-end workflows:
1. Create new schema in markdown format
2. Validate schema against schema-for-schemas
3. Ingest schema to database
4. Use schema to validate documents
5. Generate stub from schema
6. Export schema

**6.3 Regression Tests**

Ensure existing functionality still works:
- JSON schemas still load (backward compatibility)
- All existing documents validate
- Schema generation still works
- Stub generation still works

**Tasks:**
- [ ] Write unit tests for new modules
- [ ] Create integration test suite
- [ ] Run regression tests
- [ ] Fix any issues found
- [ ] Achieve >80% code coverage
- [ ] Document test procedures

**Deliverables:**
- [ ] Unit tests (>80% coverage)
- [ ] Integration tests
- [ ] Regression test suite
- [ ] Test documentation

**Duration:** 1 day

---

## Timeline

```
Week 1:
  Day 1:    Phase 0 (Planning) + Phase 1 (Naming Convention)
  Day 2-3:  Phase 2 (Markdown Loader)
  Day 4-5:  Phase 3 (Schema-for-Schemas)

Week 2:
  Day 6-7:  Phase 4 (Migration)
  Day 8:    Phase 5 (CLI & Docs)
  Day 9:    Phase 6 (Testing)
  Day 10:   Buffer for issues/refinement
```

**Total:** 8-10 days

## Risks & Mitigation

### Risk 1: Parsing Complexity

**Risk:** Markdown parsing is more complex than expected
**Probability:** Medium
**Impact:** High
**Mitigation:**
- Start with simple regex-based parser
- Test extensively with edge cases
- Have fallback to simpler format

### Risk 2: Backward Compatibility

**Risk:** Breaking existing workflows
**Probability:** Low
**Impact:** High
**Mitigation:**
- Support both .json and .md during transition
- Provide migration script
- Test thoroughly with existing documents

### Risk 3: Schema-for-Schemas Complexity

**Risk:** Self-referential validation is complex
**Probability:** Medium
**Impact:** Medium
**Mitigation:**
- Start with simple metaschema
- Iterate based on actual schemas
- Don't over-engineer initially

## Success Metrics

- [ ] All schemas follow naming convention (5/5)
- [ ] All schemas in markdown format (5/5)
- [ ] All schemas validate against metaschema (5/5)
- [ ] Zero duplicate schemas in registry
- [ ] CLI commands work with .md schemas
- [ ] Documentation comprehensive
- [ ] Test coverage >80%
- [ ] No regression in existing functionality

## Deliverables Checklist

### Code
- [ ] markitect/schema_naming.py
- [ ] markitect/schema_loader.py
- [ ] markitect/schemas/schema-schema-v1.md
- [ ] scripts/migrate_schemas.py
- [ ] Updated CLI commands
- [ ] Unit tests
- [ ] Integration tests

### Documentation
- [ ] SCHEMA_NAMING_SPEC.md
- [ ] SCHEMA_METADATA_SPEC.md
- [ ] Schema authoring guide
- [ ] Migration guide
- [ ] Updated examples
- [ ] IMPLEMENTATION_LOG.md

### Schemas
- [ ] terminology-schema-v1.0.md
- [ ] api-documentation-schema-v1.0.md
- [ ] manpage-schema-v1.0.md
- [ ] manpage-schema-v2.0.md
- [ ] schema-schema-v1.0.md

### Registry
- [ ] Clean schema database
- [ ] Updated schema-catalog.yaml
- [ ] No duplicates

## Next Steps

1. **Review this workplan** - Get approval
2. **Phase 0** - Complete planning artifacts
3. **Phase 1** - Implement naming validation
4. **Checkpoint** - Review progress after Phase 1
5. **Continue** - Execute remaining phases

## Approval

- [ ] Workplan reviewed
- [ ] Approach approved
- [ ] Ready to begin implementation

---

**Status:** Awaiting approval
**Next Action:** Complete Phase 0 planning artifacts
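
**Illustrative sketch for Phase 3.1 (not part of the approved design):** Phase 3.1 lists the checks the metaschema should enforce (required fields, SemVer version, `$id` URL format) but the metaschema itself is still to be written. The snippet below is one possible way to express the first three checks in plain Python so they can be prototyped before `schema-schema-v1.md` exists. The function name `check_schema_conventions`, the two regular expressions, and the reading of "SemVer" as bare `MAJOR.MINOR.PATCH` are all assumptions made for illustration; the returned list of messages is one option for what `--detailed-errors` could print.

```python
import re
from typing import Any, Dict, List

# Fields listed as required in Phase 3.1.
REQUIRED_FIELDS = ("$schema", "$id", "version", "title", "description")

# Assumption: "SemVer" means bare MAJOR.MINOR.PATCH, no pre-release or build suffix.
SEMVER_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")

# Assumption: $id follows the URL shape used in this plan's examples.
ID_PATTERN = re.compile(r"^https://markitect\.dev/schemas/[a-z][a-z0-9-]*/v\d+$")


def check_schema_conventions(schema: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    errors: List[str] = []

    # 1. Required top-level fields
    for field in REQUIRED_FIELDS:
        if field not in schema:
            errors.append(f"missing required field: {field}")

    # 2. Version format (only checked when the field is present)
    version = schema.get("version")
    if version is not None and not SEMVER_PATTERN.match(str(version)):
        errors.append(f"version is not MAJOR.MINOR.PATCH: {version!r}")

    # 3. $id URL format (only checked when the field is present)
    schema_id = schema.get("$id")
    if schema_id is not None and not ID_PATTERN.match(str(schema_id)):
        errors.append(f"$id does not match the expected URL shape: {schema_id!r}")

    return errors


example = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "https://markitect.dev/schemas/manpage/v1",
    "version": "1.0.0",
    "title": "Manpage Schema",
    "description": "Schema for validating manpage documents",
}

print(check_schema_conventions(example))  # []
print(check_schema_conventions({**example, "version": "1.0"}))
```

Returning a list rather than a boolean keeps every failure visible in one pass, which may be useful when validating all migrated schemas in Phase 4. Once the real metaschema exists, these checks would presumably move into it and this helper could be dropped.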