feat: implement Phase 2 - Markdown Schema Loader

Completed Phase 2 of the schema-of-schemas implementation with full markdown schema support. This enables schemas to be authored as markdown files with rich documentation and embedded JSON schemas.

Core Implementation (markitect/schema_loader.py):
- MarkdownSchemaLoader class with comprehensive parsing capabilities
- YAML frontmatter extraction with error handling
- JSON code block extraction with section preference (## Schema Definition)
- Metadata merging with x-markitect-source tracking
- Schema saving with template support and round-trip capability
- Helper methods: list_json_blocks(), validate_schema_structure()

Test Coverage (tests/test_schema_loader.py):
- 35 comprehensive unit tests (100% passing)
- Tests for loading, parsing, saving, round-trip conversion
- Edge case handling (empty files, binary files, malformed blocks)
- Fixed binary file test to use invalid UTF-8 sequences

Example Schema (markitect/schemas/manpage-schema-v1.0.md):
- First markdown schema following naming convention
- Complete manpage schema with frontmatter + documentation + JSON
- Demonstrates section classification and content control
- Shows proper structure for future schema authors

Documentation (roadmap/schema-of-schemas/SCHEMA_LOADER_GUIDE.md):
- Comprehensive user guide (579 lines)
- API reference with examples
- Best practices and troubleshooting
- Integration patterns for CLI and validator

Progress Tracking:
- Updated TODO.md with Phase 2 completion
- Updated CHANGELOG.md with implementation details
- Next: Phase 3 - Schema-for-Schemas Metaschema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
579
roadmap/schema-of-schemas/SCHEMA_LOADER_GUIDE.md
Normal file
@@ -0,0 +1,579 @@
# Markdown Schema Loader - User Guide

**Version:** 1.0
**Status:** Implemented
**Created:** 2026-01-04

## Overview

The Markdown Schema Loader enables MarkiTect to load JSON schemas from markdown files, combining rich documentation with machine-readable validation rules. This aligns with MarkiTect's markdown-first philosophy while maintaining JSON Schema compatibility.

## Markdown Schema Format

A markdown schema file consists of three parts:

1. **YAML Frontmatter**: Metadata about the schema
2. **Documentation**: Rich markdown content explaining the schema
3. **Schema Definition**: JSON schema in a code block

### Example Structure

````markdown
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---

# Schema Title v1.0

## Overview
Description of what this schema validates...

## Usage
How to use this schema...

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "My Schema",
  "type": "object",
  ...
}
```

## Version History
- v1.0.0 - Initial version
````

## Frontmatter Metadata

### Recommended Fields

No field is strictly required, but these are recommended:

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `schema-id` | string | Canonical URI for the schema | `https://markitect.dev/schemas/manpage/v1.0` |
| `version` | string | SemVer version | `1.0.0` |
| `status` | string | Lifecycle status | `stable`, `draft`, `deprecated` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `domain` | string | Schema domain name |
| `description` | string | Brief schema description |
| `authors` | array | List of authors |
| `created` | string | Creation date (ISO 8601) |
| `updated` | string | Last update date (ISO 8601) |

### Metadata Merging

Frontmatter metadata takes precedence over the corresponding fields in the JSON schema:

- `schema-id` → `$id` in the schema
- `version` → `version` in the schema
- `status` → `x-markitect-metadata.status` in the schema

All frontmatter is preserved in `x-markitect-source.frontmatter`.

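The mapping above can be sketched as a small standalone function. This is only an illustration of the documented precedence rules, not the loader's own `_merge_metadata`; the name `merge_frontmatter` and the sample values are assumptions.

```python
import copy


def merge_frontmatter(schema: dict, frontmatter: dict) -> dict:
    """Hypothetical helper: return a copy of `schema` with frontmatter merged in."""
    merged = copy.deepcopy(schema)  # leave the caller's schema untouched

    # Frontmatter wins over whatever the JSON block already contains.
    if "schema-id" in frontmatter:
        merged["$id"] = frontmatter["schema-id"]
    if "version" in frontmatter:
        merged["version"] = frontmatter["version"]
    if "status" in frontmatter:
        merged.setdefault("x-markitect-metadata", {})["status"] = frontmatter["status"]

    # Keep the complete frontmatter for traceability.
    merged.setdefault("x-markitect-source", {})["frontmatter"] = dict(frontmatter)
    return merged


schema = {"$id": "https://example.com/old-id", "title": "My Schema"}
frontmatter = {
    "schema-id": "https://markitect.dev/schemas/domain/v1.0",
    "version": "1.0.0",
    "status": "stable",
}
merged = merge_frontmatter(schema, frontmatter)
```

Here `merged["$id"]` takes the frontmatter value, while the input `schema` keeps its original `$id` because the function works on a deep copy.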
## JSON Schema Extraction

### Schema Definition Section

The loader prefers JSON blocks under a `## Schema Definition` heading:

````markdown
## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  ...
}
```
````

### Fallback Behavior

If no `## Schema Definition` section exists, the loader uses the **first** JSON code block in the file.

### Multiple JSON Blocks

You can include multiple JSON blocks in documentation:

````markdown
## Example Usage

```json
{
  "name": "example",
  "version": "1.0"
}
```

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "name": {"type": "string"},
    "version": {"type": "string"}
  }
}
```
````

The loader uses the block under the `## Schema Definition` heading and ignores the example block above it.

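The preference order described in this section can be sketched with the standard `re` and `json` modules. This is a minimal illustration, not the loader's code: the function name `select_schema_block`, the regex flags, and the rule "first JSON block after the heading" are assumptions.

```python
import json
import re

# Patterns modeled on the ones this guide lists; the flags are an assumption.
JSON_BLOCK = re.compile(r"```json\s*\n(.*?)\n```", re.DOTALL)
SCHEMA_HEADING = re.compile(r"^##\s+Schema Definition[ \t]*$", re.MULTILINE)


def select_schema_block(markdown: str) -> dict:
    """Hypothetical helper: pick a JSON block using the documented preference order."""
    heading = SCHEMA_HEADING.search(markdown)
    # Preferred: the first JSON block that follows the Schema Definition heading.
    match = JSON_BLOCK.search(markdown, heading.end()) if heading else None
    if match is None:
        # Fallback: the first JSON block anywhere in the file.
        match = JSON_BLOCK.search(markdown)
    if match is None:
        raise ValueError("no JSON code block found")
    return json.loads(match.group(1))


FENCE = "`" * 3  # built at runtime so the sample nests cleanly inside this guide

example_block = [FENCE + "json", '{"name": "example"}', FENCE]
schema_block = [FENCE + "json", '{"type": "object"}', FENCE]

with_section = "\n".join(
    ["## Example Usage", ""] + example_block + ["", "## Schema Definition", ""] + schema_block
)
without_section = "\n".join(["## Example Usage", ""] + example_block)

preferred = select_schema_block(with_section)
fallback = select_schema_block(without_section)
```

In the first document the example block is skipped in favor of the one under the heading; in the second, the only block present is used.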
## Using the Loader

### Python API

```python
from pathlib import Path
from markitect.schema_loader import MarkdownSchemaLoader

# Create loader instance
loader = MarkdownSchemaLoader()

# Load schema from markdown
schema_data = loader.load_schema(Path("manpage-schema-v1.0.md"))

# Access components
schema = schema_data['schema']        # JSON Schema dict
metadata = schema_data['metadata']    # Frontmatter dict
docs = schema_data['documentation']   # Full markdown content
source = schema_data['source_file']   # Source file path

# Use the schema
print(f"Loaded: {schema['title']}")
print(f"Version: {schema['version']}")
print(f"Status: {metadata['status']}")
```

### Loading from Markdown

```python
# Load schema
schema_data = loader.load_schema(Path("my-schema-v1.0.md"))

# Check for issues
issues = loader.validate_schema_structure(schema_data['schema'])
if issues:
    for issue in issues:
        print(f"⚠️ {issue}")
```

### Saving to Markdown

```python
# Create a schema
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "My Schema",
    "version": "1.0.0",
    "type": "object",
    "properties": {
        "name": {"type": "string"}
    }
}

# Save as markdown
loader.save_schema(
    schema=schema,
    md_path=Path("my-schema-v1.0.md"),
    frontmatter={
        "schema-id": "https://example.com/schemas/my-schema/v1.0",
        "status": "draft"
    }
)
```

### Round-Trip Conversion

```python
import json

# Load existing JSON schema
json_schema = json.loads(Path("old-schema.json").read_text())

# Save as markdown
loader.save_schema(
    schema=json_schema,
    md_path=Path("new-schema-v1.0.md")
)

# Load it back
schema_data = loader.load_schema(Path("new-schema-v1.0.md"))

# Spot-check that the title survived the round trip
assert schema_data['schema']['title'] == json_schema['title']
```

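A stricter round-trip check compares whole schemas rather than one field. Because the loader adds `x-markitect-*` keys during loading, those need to be set aside first. The sketch below assumes the additions are top-level keys with that prefix; `strip_loader_metadata` is a hypothetical helper, and the dictionaries stand in for real loader output.

```python
def strip_loader_metadata(schema: dict) -> dict:
    """Hypothetical helper: drop top-level keys that start with 'x-markitect-'."""
    return {key: value for key, value in schema.items() if not key.startswith("x-markitect-")}


# Stand-ins for the original JSON schema and for what a loader might return.
original = {"title": "My Schema", "type": "object"}
loaded = {
    "title": "My Schema",
    "type": "object",
    "x-markitect-source": {"frontmatter": {}},
}

round_trip_ok = strip_loader_metadata(loaded) == original
```

If frontmatter overrides `$id` or `version` during loading, those fields would differ as well and would need the same treatment.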
## Advanced Features

### Listing JSON Blocks

Useful for debugging when multiple JSON blocks exist:

```python
content = Path("schema.md").read_text()
blocks = loader.list_json_blocks(content)

print(f"Found {len(blocks)} JSON blocks:")
for position, json_content in blocks:
    print(f"  Position {position}: {len(json_content)} chars")
```

### Schema Structure Validation

Check for recommended fields and conventions:

```python
issues = loader.validate_schema_structure(schema)

for issue in issues:
    print(f"⚠️ {issue}")

# Example output:
# ⚠️ Missing recommended field: $id
# ⚠️ Missing MarkiTect convention: version field
```

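To make the shape of those issues concrete, here is a rough sketch of a checker that returns a list of messages. It is not the loader's `validate_schema_structure`; the function name and the set of fields checked are assumptions, and only the two message formats come from the example output above.

```python
def check_schema_structure(schema: dict) -> list:
    """Hypothetical checker: return human-readable issues, empty if none."""
    issues = []
    # Which fields count as "recommended" is an assumption for this sketch.
    for field in ("$schema", "$id", "title"):
        if field not in schema:
            issues.append(f"Missing recommended field: {field}")
    if "version" not in schema:
        issues.append("Missing MarkiTect convention: version field")
    return issues


issues = check_schema_structure({
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "My Schema",
})
```

Returning a list instead of raising lets callers treat the findings as warnings and still use the schema.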
### Custom Templates

Use custom markdown templates for saving schemas:

````python
template = """---
{frontmatter_yaml}
---

# {title}

{description}

## Schema

```json
{schema_json}
```
"""

loader.save_schema(
    schema=schema,
    md_path=Path("custom-schema-v1.0.md"),
    template=template
)
````

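How the four placeholders might be filled can be sketched with `str.format`. This is an assumption about the mechanism, not a description of `save_schema`; `render_schema_markdown` is a hypothetical name, and the frontmatter is written as JSON-style scalars (which are also valid YAML) to avoid a YAML dependency.

```python
import json

FENCE = "`" * 3  # built at runtime so the template nests cleanly inside this guide

TEMPLATE = "\n".join([
    "---",
    "{frontmatter_yaml}",
    "---",
    "",
    "# {title}",
    "",
    "{description}",
    "",
    "## Schema",
    "",
    FENCE + "json",
    "{schema_json}",
    FENCE,
    "",
])


def render_schema_markdown(schema: dict, frontmatter: dict, template: str = TEMPLATE) -> str:
    """Hypothetical renderer: fill the template placeholders from a schema."""
    frontmatter_yaml = "\n".join(
        f"{key}: {json.dumps(value)}" for key, value in frontmatter.items()
    )
    # Braces inside the substituted JSON are safe: str.format only parses the template.
    return template.format(
        frontmatter_yaml=frontmatter_yaml,
        title=schema.get("title", "Untitled Schema"),
        description=schema.get("description", ""),
        schema_json=json.dumps(schema, indent=2),
    )


rendered = render_schema_markdown(
    {"title": "My Schema", "description": "Demo schema.", "type": "object"},
    {"status": "draft"},
)
```

A custom template that itself contains literal braces outside the placeholders would need them doubled (`{{` and `}}`) under this approach.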
## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `FileNotFoundError` | Schema file doesn't exist | Check file path |
| `SchemaNotFoundError` | No JSON block in markdown | Add a `json` fenced code block |
| `InvalidSchemaFormatError` | Invalid JSON or YAML | Check syntax |
| `SchemaFilenameError` | Invalid filename format | Use `{domain}-schema-v{major}.{minor}.md` |

### Example Error Handling

```python
from pathlib import Path

from markitect.schema_loader import (
    MarkdownSchemaLoader,
    SchemaNotFoundError,
    InvalidSchemaFormatError
)

loader = MarkdownSchemaLoader()

try:
    schema_data = loader.load_schema(Path("my-schema-v1.0.md"))
except FileNotFoundError as e:
    print(f"❌ File not found: {e}")
except SchemaNotFoundError as e:
    print(f"❌ No schema in file: {e}")
except InvalidSchemaFormatError as e:
    print(f"❌ Invalid format: {e}")
```

## Best Practices

### 1. Use Schema Definition Section

Always place the main schema under `## Schema Definition`:

````markdown
## Schema Definition

```json
{...}
```
````

### 2. Include Frontmatter

Provide metadata for better discoverability:

```yaml
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---
```

### 3. Add Rich Documentation

Explain the schema purpose, usage, and examples:

````markdown
## Overview
This schema validates...

## Usage
```bash
markitect validate doc.md --schema my-schema-v1.0
```

## Examples
...
````

### 4. Version Your Schemas

Follow the naming convention:

- Initial: `my-schema-v1.0.md`
- Minor update: `my-schema-v1.1.md`
- Breaking change: `my-schema-v2.0.md`

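The `{domain}-schema-v{major}.{minor}.md` convention lends itself to a quick check before a file is saved or ingested. The sketch below is an assumption-laden illustration: `parse_schema_filename` is a hypothetical helper, and the rule that domains are lowercase letters, digits, and hyphens is a guess rather than something the naming specification is quoted as saying.

```python
import re

# Assumed character set for the domain part: lowercase letters, digits, hyphens.
FILENAME_PATTERN = re.compile(
    r"^(?P<domain>[a-z0-9]+(?:-[a-z0-9]+)*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$"
)


def parse_schema_filename(filename: str) -> tuple:
    """Hypothetical helper: split a schema filename into (domain, major, minor)."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a valid schema filename: {filename!r}")
    return match["domain"], int(match["major"]), int(match["minor"])


initial = parse_schema_filename("my-schema-v1.0.md")
breaking = parse_schema_filename("my-schema-v2.0.md")
is_breaking_change = breaking[1] > initial[1]  # a major-version bump
```

Parsing the version into integers makes comparisons such as "is this a breaking change" a plain numeric test.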
### 5. Validate Structure

Always check for common issues:

```python
issues = loader.validate_schema_structure(schema)
if not issues:
    print("✅ Schema structure is valid")
```

## Integration with MarkiTect

### CLI Usage (Future)

Once integrated with the CLI, you'll be able to:

```bash
# Ingest markdown schema
markitect schema-ingest manpage-schema-v1.0.md

# Validate against markdown schema
markitect validate document.md --schema manpage-schema-v1.0

# Export schema
markitect schema-get manpage-schema-v1.0 --output json
```

### Validator Integration

The `SchemaValidator` will automatically detect `.md` schemas:

```python
from markitect.validator import SchemaValidator

validator = SchemaValidator()
validator.validate(
    document="my-doc.md",
    schema="manpage-schema-v1.0.md"  # .md extension auto-detected
)
```

## Markdown Schema Template

Here's a complete template for creating new schemas:

````markdown
---
schema-id: "https://markitect.dev/schemas/YOUR-DOMAIN/v1.0"
version: "1.0.0"
status: "draft"
domain: "YOUR-DOMAIN"
description: "Brief description of what this schema validates"
authors:
  - "Your Name <email@example.com>"
created: "2026-01-04"
---

# YOUR-DOMAIN Schema v1.0

## Overview

Detailed description of what this schema validates and why it exists.

## Features

- Feature 1
- Feature 2
- Feature 3

## Usage

### Validating Documents

```bash
markitect validate document.md --schema YOUR-DOMAIN-schema-v1.0
```

### Common Validation Errors

1. **Error Type 1**: Description and solution
2. **Error Type 2**: Description and solution

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "YOUR DOMAIN Schema",
  "description": "Schema description",
  "type": "object",
  "properties": {
    "field1": {
      "type": "string",
      "description": "Description of field1"
    }
  },
  "required": ["field1"]
}
```

## Examples

### Valid Document

```markdown
Example of valid content...
```

### Invalid Document

```markdown
Example of invalid content...
```

## Version History

### v1.0.0 (2026-01-04)
- Initial version
- Feature A
- Feature B

## Related Documentation

- [Related Schema 1](../other-schema-v1.0.md)
- [MarkiTect Documentation](../../README.md)
````

## Testing

The loader has comprehensive test coverage:

```bash
# Run all loader tests
pytest tests/test_schema_loader.py -v

# Run specific test class
pytest tests/test_schema_loader.py::TestMarkdownSchemaLoader -v

# Check coverage
pytest tests/test_schema_loader.py --cov=markitect.schema_loader
```

**Test Results**: 35/35 tests passing (100%)

## Implementation Details

### Regex Patterns

The loader uses these regex patterns:

```python
# Frontmatter pattern
r'^---\s*\n(.*?)\n---\s*\n'

# JSON code block pattern
r'```json\s*\n(.*?)\n```'

# Schema Definition section pattern
r'##\s+Schema Definition\s*\n'
```

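As a rough illustration of the frontmatter pattern in use, the sketch below compiles it with `re.DOTALL`, which lets `.*?` span the several lines a frontmatter block normally has (by default `.` does not match a newline). The flag choice and the helper name `split_frontmatter` are assumptions; the guide does not state which flags the loader passes.

```python
import re

# Same pattern text as listed above; re.DOTALL is an assumption for this sketch.
FRONTMATTER = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def split_frontmatter(markdown: str) -> tuple:
    """Hypothetical helper: return (raw_frontmatter, remaining_markdown)."""
    match = FRONTMATTER.match(markdown)  # match() anchors at the start of the file
    if match is None:
        return "", markdown
    return match.group(1), markdown[match.end():]


text = '---\nversion: "1.0.0"\nstatus: "stable"\n---\n\n# Schema Title v1.0\n'
raw_frontmatter, body = split_frontmatter(text)
no_frontmatter = split_frontmatter("# Schema Title v1.0\n")
```

The raw frontmatter text would then be handed to a YAML parser; that step is left out here to keep the example free of third-party dependencies.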
### Metadata Merging

The `_merge_metadata` method:

1. Copies the original schema
2. Adds `x-markitect-source` with file metadata
3. Merges frontmatter fields:
   - `schema-id` → `$id`
   - `version` → `version`
   - `status` → `x-markitect-metadata.status`

### File Encoding

All files are read/written as UTF-8. Invalid UTF-8 sequences raise `InvalidSchemaFormatError`.

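The encoding rule can be pictured as a strict UTF-8 read that converts a decoding failure into the loader's error type. The sketch below is an assumption about the general approach: `read_schema_text` is a hypothetical helper, and the exception class is a local stand-in defined so the example runs without MarkiTect installed.

```python
import tempfile
from pathlib import Path


class InvalidSchemaFormatError(Exception):
    """Local stand-in for the loader's exception of the same name."""


def read_schema_text(path: Path) -> str:
    """Hypothetical helper: read a schema file as strict UTF-8."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        # Surface decoding problems as a schema-format error, keeping the cause.
        raise InvalidSchemaFormatError(f"Failed to read schema file: {path}") from exc


error_message = None
with tempfile.TemporaryDirectory() as tmp:
    bad_file = Path(tmp) / "broken-schema-v1.0.md"
    bad_file.write_bytes(b"\xff\xfe not valid UTF-8")  # 0xFF can never appear in UTF-8
    try:
        read_schema_text(bad_file)
    except InvalidSchemaFormatError as exc:
        error_message = str(exc)
```

Passing `encoding="utf-8"` explicitly matters because the default text encoding otherwise depends on the platform.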
## Troubleshooting

### Schema Not Found

**Problem**: `SchemaNotFoundError: No JSON schema found`

**Solutions**:
- Ensure you have a `` ```json `` code block
- Check the JSON syntax is valid
- Verify the code block is properly closed with `` ``` ``

### Invalid YAML Frontmatter

**Problem**: `InvalidSchemaFormatError: Invalid YAML frontmatter`

**Solutions**:
- Check YAML syntax (indentation, colons, quotes)
- Ensure frontmatter is between `---` delimiters
- Verify frontmatter is at the start of file

### Binary File Error

**Problem**: `InvalidSchemaFormatError: Failed to read schema file`

**Solutions**:
- Ensure file is text, not binary
- Check file encoding is UTF-8
- Verify file isn't corrupted

## See Also

- [Schema Naming Specification](SCHEMA_NAMING_SPEC.md)
- [Schema Management Workplan](WORKPLAN.md)
- [Phase 2 Documentation](WORKPLAN.md#phase-2-markdown-schema-loader)
- [Example Markdown Schema](../../markitect/schemas/manpage-schema-v1.0.md)

## Changelog

### v1.0.0 (2026-01-04)
- Initial implementation
- 35 unit tests (100% passing)
- Frontmatter extraction with YAML parsing
- JSON code block extraction with section preference
- Metadata merging with x-markitect-source tracking
- Schema saving with template support
- Round-trip save/load capability
- Helper methods for validation and debugging