# Markdown Schema Loader - User Guide

**Version:** 1.0

**Status:** Implemented

**Created:** 2026-01-04

## Overview

The Markdown Schema Loader enables MarkiTect to load JSON schemas from markdown files, combining rich documentation with machine-readable validation rules. This aligns with MarkiTect's markdown-first philosophy while maintaining JSON Schema compatibility.

## Markdown Schema Format

A markdown schema file consists of three parts:

1. **YAML Frontmatter**: Metadata about the schema
2. **Documentation**: Rich markdown content explaining the schema
3. **Schema Definition**: JSON schema in a code block

### Example Structure

````markdown
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---

# Schema Title v1.0

## Overview

Description of what this schema validates...

## Usage

How to use this schema...

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "My Schema",
  "type": "object",
  ...
}
```

## Version History

- v1.0.0 - Initial version
````

## Frontmatter Metadata

### Recommended Fields

No frontmatter fields are strictly required, but these are recommended:

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `schema-id` | string | Canonical URI for the schema | `https://markitect.dev/schemas/manpage/v1.0` |
| `version` | string | SemVer version | `1.0.0` |
| `status` | string | Lifecycle status | `stable`, `draft`, `deprecated` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `domain` | string | Schema domain name |
| `description` | string | Brief schema description |
| `authors` | array | List of authors |
| `created` | string | Creation date (ISO 8601) |
| `updated` | string | Last update date (ISO 8601) |

### Metadata Merging

Frontmatter metadata takes precedence over the corresponding schema fields:

- `schema-id` → `$id` in the schema
- `version` → `version` in the schema
- `status` → `x-markitect-metadata.status` in the schema

All frontmatter is preserved in `x-markitect-source.frontmatter`.

## JSON Schema Extraction

### Schema Definition Section

The loader prefers JSON blocks under a `## Schema Definition` heading:

````markdown
## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  ...
}
```
````

### Fallback Behavior

If no `## Schema Definition` section exists, the loader uses the **first** JSON code block in the file.

### Multiple JSON Blocks

You can include multiple JSON blocks in the documentation:

````markdown
## Example Usage

```json
{
  "name": "example",
  "version": "1.0"
}
```

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "name": {"type": "string"},
    "version": {"type": "string"}
  }
}
```
````

Here the loader uses the block under the `## Schema Definition` heading, not the earlier example block.

## Using the Loader

### Python API

```python
from pathlib import Path

from markitect.schema_loader import MarkdownSchemaLoader

# Create a loader instance
loader = MarkdownSchemaLoader()

# Load a schema from markdown
schema_data = loader.load_schema(Path("manpage-schema-v1.0.md"))

# Access components
schema = schema_data['schema']            # JSON Schema dict
metadata = schema_data['metadata']        # Frontmatter dict
docs = schema_data['documentation']       # Full markdown content
source = schema_data['source_file']       # Source file path

# Use the schema
print(f"Loaded: {schema['title']}")
print(f"Version: {schema['version']}")
print(f"Status: {metadata['status']}")
```

### Loading from Markdown

```python
# Load the schema (reuses `loader` from the previous example)
schema_data = loader.load_schema(Path("my-schema-v1.0.md"))

# Check for structural issues
issues = loader.validate_schema_structure(schema_data['schema'])
if issues:
    for issue in issues:
        print(f"⚠️ {issue}")
```

### Saving to Markdown

```python
# Create a schema
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "My Schema",
    "version": "1.0.0",
    "type": "object",
    "properties": {
        "name": {"type": "string"}
    }
}

# Save as markdown
loader.save_schema(
    schema=schema,
    md_path=Path("my-schema-v1.0.md"),
    frontmatter={
        "schema-id": "https://example.com/schemas/my-schema/v1.0",
        "status": "draft"
    }
)
```

### Round-Trip Conversion

```python
import json

# Load an existing JSON schema
json_schema = json.loads(Path("old-schema.json").read_text(encoding="utf-8"))

# Save as markdown
loader.save_schema(
    schema=json_schema,
    md_path=Path("new-schema-v1.0.md")
)

# Load it back
schema_data = loader.load_schema(Path("new-schema-v1.0.md"))

# Core fields survive the round trip; the loaded schema also carries
# merged metadata such as x-markitect-source
assert schema_data['schema']['title'] == json_schema['title']
```

## Advanced Features

### Listing JSON Blocks

This is useful for debugging when a file contains multiple JSON blocks:

```python
content = Path("schema.md").read_text(encoding="utf-8")
blocks = loader.list_json_blocks(content)

print(f"Found {len(blocks)} JSON blocks:")
for position, json_content in blocks:
    print(f"  Position {position}: {len(json_content)} chars")
```

### Schema Structure Validation

Check for recommended fields and conventions:

```python
issues = loader.validate_schema_structure(schema)
for issue in issues:
    print(f"⚠️ {issue}")

# Example output:
# ⚠️ Missing recommended field: $id
# ⚠️ Missing MarkiTect convention: version field
```

### Custom Templates

Use a custom markdown template when saving schemas:

````python
template = """---
{frontmatter_yaml}
---

# {title}

{description}

## Schema

```json
{schema_json}
```
"""

loader.save_schema(
    schema=schema,
    md_path=Path("custom-schema-v1.0.md"),
    template=template
)
````

## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `FileNotFoundError` | Schema file doesn't exist | Check the file path |
| `SchemaNotFoundError` | No JSON block in the markdown | Add a `` ```json `` code block |
| `InvalidSchemaFormatError` | Invalid JSON or YAML | Check the syntax |
| `SchemaFilenameError` | Invalid filename format | Use `{domain}-schema-v{major}.{minor}.md` |

### Example Error Handling

```python
from pathlib import Path

from markitect.schema_loader import (
    MarkdownSchemaLoader,
    SchemaNotFoundError,
    InvalidSchemaFormatError,
)

loader = MarkdownSchemaLoader()

try:
    schema_data = loader.load_schema(Path("my-schema.md"))
except FileNotFoundError as e:
    print(f"❌ File not found: {e}")
except SchemaNotFoundError as e:
    print(f"❌ No schema in file: {e}")
except InvalidSchemaFormatError as e:
    print(f"❌ Invalid format: {e}")
```

## Best Practices

### 1. Use the Schema Definition Section

Always place the main schema under `## Schema Definition`:

````markdown
## Schema Definition

```json
{...}
```
````

### 2. Include Frontmatter

Provide metadata for better discoverability:

```yaml
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---
```

### 3. Add Rich Documentation

Explain the schema's purpose and usage, and include examples:

````markdown
## Overview

This schema validates...

## Usage

```bash
markitect validate doc.md --schema my-schema-v1.0
```

## Examples

...
````

### 4. Version Your Schemas

Follow the naming convention:

- Initial: `my-schema-v1.0.md`
- Minor update: `my-schema-v1.1.md`
- Breaking change: `my-schema-v2.0.md`

### 5. Validate Structure

Always check for common issues:

```python
issues = loader.validate_schema_structure(schema)
if not issues:
    print("✅ Schema structure is valid")
```

## Integration with MarkiTect

### CLI Usage (Future)

Once the loader is integrated with the CLI, you'll be able to:

```bash
# Ingest a markdown schema
markitect schema-ingest manpage-schema-v1.0.md

# Validate against a markdown schema
markitect validate document.md --schema manpage-schema-v1.0

# Export a schema
markitect schema-get manpage-schema-v1.0 --output json
```

### Validator Integration

The `SchemaValidator` will automatically detect `.md` schemas:

```python
from markitect.validator import SchemaValidator

validator = SchemaValidator()
validator.validate(
    document="my-doc.md",
    schema="manpage-schema-v1.0.md"  # .md extension auto-detected
)
```

## Markdown Schema Template

Here's a complete template for creating new schemas:

````markdown
---
schema-id: "https://markitect.dev/schemas/YOUR-DOMAIN/v1.0"
version: "1.0.0"
status: "draft"
domain: "YOUR-DOMAIN"
description: "Brief description of what this schema validates"
authors:
  - "Your Name"
created: "2026-01-04"
---

# YOUR-DOMAIN Schema v1.0

## Overview

Detailed description of what this schema validates and why it exists.

## Features

- Feature 1
- Feature 2
- Feature 3

## Usage

### Validating Documents

```bash
markitect validate document.md --schema YOUR-DOMAIN-schema-v1.0
```

### Common Validation Errors

1. **Error Type 1**: Description and solution
2. **Error Type 2**: Description and solution

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "YOUR DOMAIN Schema",
  "description": "Schema description",
  "type": "object",
  "properties": {
    "field1": {
      "type": "string",
      "description": "Description of field1"
    }
  },
  "required": ["field1"]
}
```

## Examples

### Valid Document

```markdown
Example of valid content...
```

### Invalid Document

```markdown
Example of invalid content...
```

## Version History

### v1.0.0 (2026-01-04)

- Initial version
- Feature A
- Feature B

## Related Documentation

- [Related Schema 1](../other-schema-v1.0.md)
- [MarkiTect Documentation](../../README.md)
````

## Testing

The loader has comprehensive test coverage:

```bash
# Run all loader tests
pytest tests/test_schema_loader.py -v

# Run a specific test class
pytest tests/test_schema_loader.py::TestMarkdownSchemaLoader -v

# Check coverage
pytest tests/test_schema_loader.py --cov=markitect.schema_loader
```

**Test Results**: 35/35 tests passing (100%)

## Implementation Details

### Regex Patterns

The loader uses these regex patterns:

```python
# Frontmatter pattern
r'^---\s*\n(.*?)\n---\s*\n'

# JSON code block pattern
r'```json\s*\n(.*?)\n```'

# Schema Definition section pattern
r'##\s+Schema Definition\s*\n'
```

### Metadata Merging

The `_merge_metadata` method:

1. Copies the original schema
2. Adds `x-markitect-source` with file metadata
3. Merges frontmatter fields:
   - `schema-id` → `$id`
   - `version` → `version`
   - `status` → `x-markitect-metadata.status`

### File Encoding

All files are read and written as UTF-8. Invalid UTF-8 sequences raise `InvalidSchemaFormatError`.
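The following minimal sketch shows how the regex patterns and merge rules above might fit together. It uses only the standard library and is not the loader's actual code. It strips the frontmatter, extracts a JSON block by preferring `## Schema Definition` and falling back to the first block, and then applies the documented metadata mapping. The following are assumptions made for illustration:

- The patterns are compiled with `re.DOTALL`.
- A naive `key: value` reader replaces real YAML parsing, so it does not handle lists or nesting.
- `ValueError` stands in for `SchemaNotFoundError`.
- The helper names `parse_frontmatter`, `extract_schema` and `merge_metadata` are hypothetical, as is the `file` key inside `x-markitect-source`.

```python
import json
import re

# Patterns from "Regex Patterns" above; compiling with re.DOTALL is an
# assumption of this sketch so that (.*?) can span several lines.
FRONTMATTER_RE = re.compile(r'^---\s*\n(.*?)\n---\s*\n', re.DOTALL)
JSON_BLOCK_RE = re.compile(r'```json\s*\n(.*?)\n```', re.DOTALL)
SECTION_RE = re.compile(r'##\s+Schema Definition\s*\n')


def parse_frontmatter(content):
    """Hypothetical helper: split off frontmatter, returning (metadata, body).

    A naive 'key: value' reader stands in for YAML; lists and nesting are not handled.
    """
    match = FRONTMATTER_RE.match(content)
    if match is None:
        return {}, content
    metadata = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(':')  # split on the first colon only
        if sep and key.strip():
            metadata[key.strip()] = value.strip().strip('"')
    return metadata, content[match.end():]


def extract_schema(body):
    """Hypothetical helper: prefer the first JSON block after '## Schema Definition',
    otherwise fall back to the first JSON block anywhere in the body."""
    section = SECTION_RE.search(body)
    match = JSON_BLOCK_RE.search(body, section.end()) if section else None
    if match is None:
        match = JSON_BLOCK_RE.search(body)
    if match is None:
        # ValueError stands in for the loader's SchemaNotFoundError
        raise ValueError("No JSON schema found")
    return json.loads(match.group(1))


def merge_metadata(schema, frontmatter, source_file):
    """Hypothetical helper mirroring the documented mapping; frontmatter wins."""
    merged = dict(schema)  # shallow copy; the caller's dict is left untouched
    # The 'file' key is an assumption; 'frontmatter' is the documented location.
    merged['x-markitect-source'] = {'file': source_file,
                                    'frontmatter': dict(frontmatter)}
    if 'schema-id' in frontmatter:
        merged['$id'] = frontmatter['schema-id']
    if 'version' in frontmatter:
        merged['version'] = frontmatter['version']
    if 'status' in frontmatter:
        extra = dict(merged.get('x-markitect-metadata', {}))
        extra['status'] = frontmatter['status']
        merged['x-markitect-metadata'] = extra
    return merged


# Demo document; FENCE avoids a literal triple backtick at the start of a line
FENCE = "`" * 3
demo = "\n".join([
    "---",
    'schema-id: "https://markitect.dev/schemas/domain/v1.0"',
    'version: "1.0.0"',
    'status: "stable"',
    "---",
    "# Demo Schema",
    "",
    FENCE + "json",
    '{"name": "example"}',
    FENCE,
    "",
    "## Schema Definition",
    "",
    FENCE + "json",
    '{"$schema": "http://json-schema.org/draft-07/schema#", "type": "object"}',
    FENCE,
    "",
])

metadata, body = parse_frontmatter(demo)
merged = merge_metadata(extract_schema(body), metadata, "demo-schema-v1.0.md")
print(merged["$id"], merged["x-markitect-metadata"]["status"])
# prints: https://markitect.dev/schemas/domain/v1.0 stable
```

The section search starts where the `## Schema Definition` heading match ends, so the earlier `{"name": "example"}` block is skipped. This matches the behaviour described under Multiple JSON Blocks.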
## Troubleshooting

### Schema Not Found

**Problem**: `SchemaNotFoundError: No JSON schema found`

**Solutions**:

- Ensure you have a `` ```json `` code block
- Check the JSON syntax is valid
- Verify the code block is properly closed with `` ``` ``

### Invalid YAML Frontmatter

**Problem**: `InvalidSchemaFormatError: Invalid YAML frontmatter`

**Solutions**:

- Check YAML syntax (indentation, colons, quotes)
- Ensure frontmatter is between `---` delimiters
- Verify frontmatter is at the start of the file

### Binary File Error

**Problem**: `InvalidSchemaFormatError: Failed to read schema file`

**Solutions**:

- Ensure the file is text, not binary
- Check the file encoding is UTF-8
- Verify the file isn't corrupted

## See Also

- [Schema Naming Specification](SCHEMA_NAMING_SPEC.md)
- [Schema Management Workplan](WORKPLAN.md)
- [Phase 2 Documentation](WORKPLAN.md#phase-2-markdown-schema-loader)
- [Example Markdown Schema](../../markitect/schemas/manpage-schema-v1.0.md)

## Changelog

### v1.0.0 (2026-01-04)

- Initial implementation
- 35 unit tests (100% passing)
- Frontmatter extraction with YAML parsing
- JSON code block extraction with section preference
- Metadata merging with x-markitect-source tracking
- Schema saving with template support
- Round-trip save/load capability
- Helper methods for validation and debugging