# Markdown Schema Loader - User Guide

**Version:** 1.0
**Status:** Implemented
**Created:** 2026-01-04

## Overview

The Markdown Schema Loader enables MarkiTect to load JSON schemas from markdown files, combining rich documentation with machine-readable validation rules. This aligns with MarkiTect's markdown-first philosophy while maintaining JSON Schema compatibility.

## Markdown Schema Format

A markdown schema file consists of three parts:

1. **YAML Frontmatter**: Metadata about the schema
2. **Documentation**: Rich markdown content explaining the schema
3. **Schema Definition**: JSON schema in a code block

### Example Structure

````markdown
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---

# Schema Title v1.0

## Overview
Description of what this schema validates...

## Usage
How to use this schema...

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "My Schema",
  "type": "object",
  ...
}
```

## Version History
- v1.0.0 - Initial version
````

## Frontmatter Metadata

### Required Fields

None are strictly required, but these are recommended:

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `schema-id` | string | Canonical URI for the schema | `https://markitect.dev/schemas/manpage/v1.0` |
| `version` | string | SemVer version | `1.0.0` |
| `status` | string | Lifecycle status | `stable`, `draft`, `deprecated` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `domain` | string | Schema domain name |
| `description` | string | Brief schema description |
| `authors` | array | List of authors |
| `created` | string | Creation date (ISO 8601) |
| `updated` | string | Last update date (ISO 8601) |

### Metadata Merging

When the same information appears in both places, frontmatter metadata takes precedence over the fields in the JSON schema. The mapping is:

- `schema-id` → `$id` in the schema
- `version` → `version` in the schema
- `status` → `x-markitect-metadata.status` in the schema

All frontmatter is preserved in `x-markitect-source.frontmatter`.

## JSON Schema Extraction

### Schema Definition Section

The loader prefers JSON blocks under a `## Schema Definition` heading:

````markdown
## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  ...
}
```
````

### Fallback Behavior

If no `## Schema Definition` section exists, the loader uses the **first** JSON code block in the file.

### Multiple JSON Blocks

You can include multiple JSON blocks in documentation:

````markdown
## Example Usage

```json
{
  "name": "example",
  "version": "1.0"
}
```

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "name": {"type": "string"},
    "version": {"type": "string"}
  }
}
```
````

In this case the loader uses the block under the `## Schema Definition` heading and ignores the example block above it.

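
The selection rule described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the loader's actual code: `pick_schema_block` is a hypothetical helper, the regex flags are guesses, and the patterns are modeled on the ones listed under Implementation Details.

```python
import json
import re

# Patterns modeled on the guide's "Regex Patterns" list; the DOTALL and
# MULTILINE flags are assumptions of this sketch.
JSON_BLOCK = re.compile(r"```json\s*\n(.*?)\n```", re.DOTALL)
SCHEMA_HEADING = re.compile(r"^##\s+Schema Definition\s*\n", re.MULTILINE)


def pick_schema_block(markdown: str) -> dict:
    """Prefer the JSON block after '## Schema Definition', else the first one."""
    heading = SCHEMA_HEADING.search(markdown)
    match = JSON_BLOCK.search(markdown, heading.end()) if heading else None
    if match is None:
        # Fallback: first JSON block anywhere in the file.
        match = JSON_BLOCK.search(markdown)
    if match is None:
        raise ValueError("no JSON code block found")
    return json.loads(match.group(1))


# Build the sample without literal fences so this block nests cleanly.
FENCE = "`" * 3
sample = "\n".join([
    "## Example Usage",
    "",
    FENCE + "json",
    '{"name": "example"}',
    FENCE,
    "",
    "## Schema Definition",
    "",
    FENCE + "json",
    '{"type": "object"}',
    FENCE,
    "",
])

print(pick_schema_block(sample))  # {'type': 'object'}
```

The example block comes first in the sample, yet the block under the heading is returned. Removing the heading from the sample would make the sketch fall back to the first block.
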
## Using the Loader

### Python API

```python
from pathlib import Path
from markitect.schema_loader import MarkdownSchemaLoader

# Create loader instance
loader = MarkdownSchemaLoader()

# Load schema from markdown
schema_data = loader.load_schema(Path("manpage-schema-v1.0.md"))

# Access components
schema = schema_data['schema']           # JSON Schema dict
metadata = schema_data['metadata']       # Frontmatter dict
docs = schema_data['documentation']      # Full markdown content
source = schema_data['source_file']      # Source file path

# Use the schema
print(f"Loaded: {schema['title']}")
print(f"Version: {schema['version']}")
print(f"Status: {metadata['status']}")
```

### Loading from Markdown

```python
# Load schema
schema_data = loader.load_schema(Path("my-schema-v1.0.md"))

# Check for issues
issues = loader.validate_schema_structure(schema_data['schema'])
if issues:
    for issue in issues:
        print(f"⚠️ {issue}")
```

### Saving to Markdown

```python
# Create a schema
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "My Schema",
    "version": "1.0.0",
    "type": "object",
    "properties": {
        "name": {"type": "string"}
    }
}

# Save as markdown
loader.save_schema(
    schema=schema,
    md_path=Path("my-schema-v1.0.md"),
    frontmatter={
        "schema-id": "https://example.com/schemas/my-schema/v1.0",
        "status": "draft"
    }
)
```

### Round-Trip Conversion

```python
# Load existing JSON schema
import json
json_schema = json.loads(Path("old-schema.json").read_text())

# Save as markdown
loader.save_schema(
    schema=json_schema,
    md_path=Path("new-schema-v1.0.md")
)

# Load it back
schema_data = loader.load_schema(Path("new-schema-v1.0.md"))

# The schema content survives the round trip (checked here via the title)
assert schema_data['schema']['title'] == json_schema['title']
```

## Advanced Features

### Listing JSON Blocks

Useful for debugging when multiple JSON blocks exist:

```python
content = Path("schema.md").read_text()
blocks = loader.list_json_blocks(content)

print(f"Found {len(blocks)} JSON blocks:")
for position, json_content in blocks:
    print(f"  Position {position}: {len(json_content)} chars")
```

### Schema Structure Validation

Check for recommended fields and conventions:

```python
issues = loader.validate_schema_structure(schema)

for issue in issues:
    print(f"⚠️ {issue}")

# Example output:
# ⚠️ Missing recommended field: $id
# ⚠️ Missing MarkiTect convention: version field
```

### Custom Templates

Use custom markdown templates for saving schemas:

````python
template = """---
{frontmatter_yaml}
---

# {title}

{description}

## Schema

```json
{schema_json}
```
"""

loader.save_schema(
    schema=schema,
    md_path=Path("custom-schema-v1.0.md"),
    template=template
)
````

## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `FileNotFoundError` | Schema file doesn't exist | Check file path |
| `SchemaNotFoundError` | No JSON block in markdown | Add a fenced `json` code block |
| `InvalidSchemaFormatError` | Invalid JSON or YAML | Check syntax |
| `SchemaFilenameError` | Invalid filename format | Use `{domain}-schema-v{major}.{minor}.md` |

### Example Error Handling

```python
from pathlib import Path

from markitect.schema_loader import (
    MarkdownSchemaLoader,
    SchemaNotFoundError,
    InvalidSchemaFormatError
)

loader = MarkdownSchemaLoader()

try:
    schema_data = loader.load_schema(Path("my-schema.md"))
except FileNotFoundError as e:
    print(f"❌ File not found: {e}")
except SchemaNotFoundError as e:
    print(f"❌ No schema in file: {e}")
except InvalidSchemaFormatError as e:
    print(f"❌ Invalid format: {e}")
```

## Best Practices

### 1. Use Schema Definition Section

Always place the main schema under `## Schema Definition`:

````markdown
## Schema Definition

```json
{...}
```
````

### 2. Include Frontmatter

Provide metadata for better discoverability:

```yaml
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---
```

### 3. Add Rich Documentation

Explain the schema purpose, usage, and examples:

````markdown
## Overview
This schema validates...

## Usage
```bash
markitect validate doc.md --schema my-schema-v1.0
```

## Examples
...
````

### 4. Version Your Schemas

Follow the naming convention:

- Initial: `my-schema-v1.0.md`
- Minor update: `my-schema-v1.1.md`
- Breaking change: `my-schema-v2.0.md`

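
As an illustration of the naming convention, the sketch below parses a filename of the form `{domain}-schema-v{major}.{minor}.md` and compares versions. The regex is one assumed reading of that convention and `parse_schema_filename` is a hypothetical helper; the check the loader actually performs may be stricter or looser.

```python
import re
from typing import NamedTuple

# Assumed interpretation of `{domain}-schema-v{major}.{minor}.md`.
FILENAME = re.compile(
    r"^(?P<domain>[a-z0-9][a-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$"
)


class SchemaName(NamedTuple):
    domain: str
    major: int
    minor: int


def parse_schema_filename(name: str) -> SchemaName:
    """Split a schema filename into domain, major, and minor parts."""
    match = FILENAME.match(name)
    if match is None:
        raise ValueError(f"not a valid schema filename: {name!r}")
    return SchemaName(match["domain"], int(match["major"]), int(match["minor"]))


old = parse_schema_filename("my-schema-v1.1.md")
new = parse_schema_filename("my-schema-v2.0.md")

print(old)                    # SchemaName(domain='my', major=1, minor=1)
print(new.major > old.major)  # True: a major bump signals a breaking change
```

Parsing the version into integers rather than comparing strings keeps `v1.10` ordered after `v1.9`.
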
### 5. Validate Structure

Always check for common issues:

```python
issues = loader.validate_schema_structure(schema)
if not issues:
    print("✅ Schema structure is valid")
```

## Integration with MarkiTect

### CLI Usage (Future)

Once integrated with the CLI, you'll be able to:

```bash
# Ingest markdown schema
markitect schema-ingest manpage-schema-v1.0.md

# Validate against markdown schema
markitect validate document.md --schema manpage-schema-v1.0

# Export schema
markitect schema-get manpage-schema-v1.0 --output json
```

### Validator Integration

The SchemaValidator will automatically detect `.md` schemas:

```python
from markitect.validator import SchemaValidator

validator = SchemaValidator()
validator.validate(
    document="my-doc.md",
    schema="manpage-schema-v1.0.md"  # .md extension auto-detected
)
```

## Markdown Schema Template

Here's a complete template for creating new schemas:

````markdown
---
schema-id: "https://markitect.dev/schemas/YOUR-DOMAIN/v1.0"
version: "1.0.0"
status: "draft"
domain: "YOUR-DOMAIN"
description: "Brief description of what this schema validates"
authors:
  - "Your Name <email@example.com>"
created: "2026-01-04"
---

# YOUR-DOMAIN Schema v1.0

## Overview

Detailed description of what this schema validates and why it exists.

## Features

- Feature 1
- Feature 2
- Feature 3

## Usage

### Validating Documents

```bash
markitect validate document.md --schema YOUR-DOMAIN-schema-v1.0
```

### Common Validation Errors

1. **Error Type 1**: Description and solution
2. **Error Type 2**: Description and solution

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "YOUR DOMAIN Schema",
  "description": "Schema description",
  "type": "object",
  "properties": {
    "field1": {
      "type": "string",
      "description": "Description of field1"
    }
  },
  "required": ["field1"]
}
```

## Examples

### Valid Document

```markdown
Example of valid content...
```

### Invalid Document

```markdown
Example of invalid content...
```

## Version History

### v1.0.0 (2026-01-04)
- Initial version
- Feature A
- Feature B

## Related Documentation

- [Related Schema 1](../other-schema-v1.0.md)
- [MarkiTect Documentation](../../README.md)
````

## Testing

The loader is covered by a dedicated unit test suite in `tests/test_schema_loader.py`:

```bash
# Run all loader tests
pytest tests/test_schema_loader.py -v

# Run specific test class
pytest tests/test_schema_loader.py::TestMarkdownSchemaLoader -v

# Check coverage
pytest tests/test_schema_loader.py --cov=markitect.schema_loader
```

**Test Results**: 35/35 tests passing (100%)

## Implementation Details

### Regex Patterns

The loader uses these regex patterns:

```python
# Frontmatter pattern
r'^---\s*\n(.*?)\n---\s*\n'

# JSON code block pattern
r'```json\s*\n(.*?)\n```'

# Schema Definition section pattern
r'##\s+Schema Definition\s*\n'
```

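
To show how the frontmatter pattern above behaves on real input, here is a small sketch that splits a document into its raw YAML text and the remaining body. The `re.DOTALL` flag and the `split_frontmatter` helper are assumptions made for illustration; the guide does not state which flags the loader passes.

```python
import re
from typing import Tuple

# Frontmatter pattern from the list above; re.DOTALL is assumed so that `.`
# can span the several lines of the YAML block.
FRONTMATTER = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Return (raw_yaml, body); raw_yaml is '' when no frontmatter is present."""
    match = FRONTMATTER.match(text)
    if match is None:
        return "", text
    return match.group(1), text[match.end():]


doc = '---\nversion: "1.0.0"\nstatus: "draft"\n---\n\n# Title\n'
raw_yaml, body = split_frontmatter(doc)

print(repr(raw_yaml))  # 'version: "1.0.0"\nstatus: "draft"'
print(repr(body))      # '# Title\n'
```

The raw YAML string would then be handed to a YAML parser (for example PyYAML's `yaml.safe_load`). Because the pattern is anchored with `^` and applied with `match`, frontmatter is only recognized at the very start of the file.
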
### Metadata Merging

The `_merge_metadata` method:

1. Copies the original schema
2. Adds `x-markitect-source` with file metadata
3. Merges frontmatter fields:
   - `schema-id` → `$id`
   - `version` → `version`
   - `status` → `x-markitect-metadata.status`

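
The three steps above can be sketched as follows. This is an illustration, not the body of `_merge_metadata`: the function name `merge_metadata_sketch` and the `file` key inside `x-markitect-source` are assumptions, while the `frontmatter` key and the field mapping come from this guide.

```python
import copy
from typing import Any, Dict


def merge_metadata_sketch(
    schema: Dict[str, Any], frontmatter: Dict[str, Any], source_file: str
) -> Dict[str, Any]:
    """Illustrative merge of frontmatter into a schema dict."""
    # Step 1: copy, so the caller's schema is left untouched.
    merged = copy.deepcopy(schema)

    # Step 2: record where the schema came from ("file" is an assumed key).
    merged["x-markitect-source"] = {
        "file": source_file,
        "frontmatter": dict(frontmatter),
    }

    # Step 3: frontmatter values override same-purpose schema fields.
    if "schema-id" in frontmatter:
        merged["$id"] = frontmatter["schema-id"]
    if "version" in frontmatter:
        merged["version"] = frontmatter["version"]
    if "status" in frontmatter:
        merged.setdefault("x-markitect-metadata", {})["status"] = frontmatter["status"]
    return merged


original = {"title": "My Schema", "version": "0.9.0"}
result = merge_metadata_sketch(
    original,
    {"schema-id": "https://example.com/schemas/my/v1.0", "version": "1.0.0", "status": "draft"},
    "my-schema-v1.0.md",
)

print(result["version"])                         # 1.0.0
print(result["x-markitect-metadata"]["status"])  # draft
print(original["version"])                       # 0.9.0
```

The last line shows why the copy in step 1 matters: the frontmatter version wins in the merged result, but the input dictionary keeps its own value.
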
### File Encoding

All files are read/written as UTF-8. Invalid UTF-8 sequences raise `InvalidSchemaFormatError`.

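
A minimal sketch of this behavior, assuming the loader converts Python's `UnicodeDecodeError` into its own exception. The `InvalidSchemaFormatError` class below is a local stand-in for the one exported by `markitect.schema_loader`, and `read_schema_text` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


class InvalidSchemaFormatError(Exception):
    """Local stand-in for the exception exported by markitect.schema_loader."""


def read_schema_text(path: Path) -> str:
    """Read a schema file as UTF-8, mapping decode failures to a loader error."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise InvalidSchemaFormatError(f"Failed to read schema file: {path}") from exc


with tempfile.TemporaryDirectory() as tmp:
    bad = Path(tmp) / "binary-schema-v1.0.md"
    bad.write_bytes(b"\xff\xfe\x00\x80")  # 0xff can never start a UTF-8 sequence
    try:
        read_schema_text(bad)
    except InvalidSchemaFormatError as err:
        print(f"Rejected: {err}")
```

Chaining with `from exc` keeps the original decode error, including the offending byte offset, available in the traceback.
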
## Troubleshooting

### Schema Not Found

**Problem**: `SchemaNotFoundError: No JSON schema found`

**Solutions**:
- Ensure the file contains a fenced `json` code block
- Check that the JSON syntax is valid
- Verify the code block is properly closed with `` ``` ``

### Invalid YAML Frontmatter

**Problem**: `InvalidSchemaFormatError: Invalid YAML frontmatter`

**Solutions**:
- Check YAML syntax (indentation, colons, quotes)
- Ensure the frontmatter is between `---` delimiters
- Verify the frontmatter is at the start of the file

### Binary File Error

**Problem**: `InvalidSchemaFormatError: Failed to read schema file`

**Solutions**:
- Ensure the file is text, not binary
- Check that the file encoding is UTF-8
- Verify the file isn't corrupted

## See Also

- [Schema Naming Specification](SCHEMA_NAMING_SPEC.md)
- [Schema Management Workplan](WORKPLAN.md)
- [Phase 2 Documentation](WORKPLAN.md#phase-2-markdown-schema-loader)
- [Example Markdown Schema](../../markitect/schemas/manpage-schema-v1.0.md)

## Changelog

### v1.0.0 (2026-01-04)
- Initial implementation
- 35 unit tests (100% passing)
- Frontmatter extraction with YAML parsing
- JSON code block extraction with section preference
- Metadata merging with x-markitect-source tracking
- Schema saving with template support
- Round-trip save/load capability
- Helper methods for validation and debugging