feat: implement Phase 2 - Markdown Schema Loader

Completed Phase 2 of the schema-of-schemas implementation with full
markdown schema support. This enables schemas to be authored as
markdown files with rich documentation and embedded JSON schemas.

Core Implementation (markitect/schema_loader.py):
- MarkdownSchemaLoader class with comprehensive parsing capabilities
- YAML frontmatter extraction with error handling
- JSON code block extraction with section preference (## Schema Definition)
- Metadata merging with x-markitect-source tracking
- Schema saving with template support and round-trip capability
- Helper methods: list_json_blocks(), validate_schema_structure()

Test Coverage (tests/test_schema_loader.py):
- 35 comprehensive unit tests (100% passing)
- Tests for loading, parsing, saving, round-trip conversion
- Edge case handling (empty files, binary files, malformed blocks)
- Fixed binary file test to use invalid UTF-8 sequences

Example Schema (markitect/schemas/manpage-schema-v1.0.md):
- First markdown schema following naming convention
- Complete manpage schema with frontmatter + documentation + JSON
- Demonstrates section classification and content control
- Shows proper structure for future schema authors

Documentation (roadmap/schema-of-schemas/SCHEMA_LOADER_GUIDE.md):
- Comprehensive user guide (~580 lines)
- API reference with examples
- Best practices and troubleshooting
- Integration patterns for CLI and validator

Progress Tracking:
- Updated TODO.md with Phase 2 completion
- Updated CHANGELOG.md with implementation details
- Next: Phase 3 - Schema-for-Schemas Metaschema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-05 00:02:15 +01:00
parent 14108533fb
commit b81ce5631d
6 changed files with 2151 additions and 14 deletions


@@ -0,0 +1,579 @@
# Markdown Schema Loader - User Guide
**Version:** 1.0
**Status:** Implemented
**Created:** 2026-01-04
## Overview
The Markdown Schema Loader enables MarkiTect to load JSON schemas from markdown files, combining rich documentation with machine-readable validation rules. This aligns with MarkiTect's markdown-first philosophy while maintaining JSON Schema compatibility.
## Markdown Schema Format
A markdown schema file consists of three parts:
1. **YAML Frontmatter**: Metadata about the schema
2. **Documentation**: Rich markdown content explaining the schema
3. **Schema Definition**: JSON schema in a code block
### Example Structure
````markdown
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---
# Schema Title v1.0
## Overview
Description of what this schema validates...
## Usage
How to use this schema...
## Schema Definition
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "My Schema",
  "type": "object",
  ...
}
```
## Version History
- v1.0.0 - Initial version
````
## Frontmatter Metadata
### Recommended Fields
None of these fields is strictly required, but all three are recommended:
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `schema-id` | string | Canonical URI for the schema | `https://markitect.dev/schemas/manpage/v1.0` |
| `version` | string | SemVer version | `1.0.0` |
| `status` | string | Lifecycle status | `stable`, `draft`, `deprecated` |
### Optional Fields
| Field | Type | Description |
|-------|------|-------------|
| `domain` | string | Schema domain name |
| `description` | string | Brief schema description |
| `authors` | array | List of authors |
| `created` | string | Creation date (ISO 8601) |
| `updated` | string | Last update date (ISO 8601) |
### Metadata Merging
Frontmatter metadata takes precedence over schema fields:
- `schema-id` → `$id` in the schema
- `version` → `version` in the schema
- `status` → `x-markitect-metadata.status` in the schema
All frontmatter is preserved in `x-markitect-source.frontmatter`.
## JSON Schema Extraction
### Schema Definition Section
The loader prefers JSON blocks under a `## Schema Definition` heading:
````markdown
## Schema Definition
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  ...
}
```
````
### Fallback Behavior
If no `## Schema Definition` section exists, the loader uses the **first** JSON code block in the file.
### Multiple JSON Blocks
You can include multiple JSON blocks in documentation:
````markdown
## Example Usage
```json
{
  "name": "example",
  "version": "1.0"
}
```
## Schema Definition
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "name": {"type": "string"},
    "version": {"type": "string"}
  }
}
```
````
Here the loader uses the block under the `## Schema Definition` heading, not the earlier example block.
## Using the Loader
### Python API
```python
from pathlib import Path
from markitect.schema_loader import MarkdownSchemaLoader
# Create loader instance
loader = MarkdownSchemaLoader()
# Load schema from markdown
schema_data = loader.load_schema(Path("manpage-schema-v1.0.md"))
# Access components
schema = schema_data['schema'] # JSON Schema dict
metadata = schema_data['metadata'] # Frontmatter dict
docs = schema_data['documentation'] # Full markdown content
source = schema_data['source_file'] # Source file path
# Use the schema
print(f"Loaded: {schema['title']}")
print(f"Version: {schema['version']}")
print(f"Status: {metadata['status']}")
```
### Loading from Markdown
```python
# Load schema
schema_data = loader.load_schema(Path("my-schema-v1.0.md"))
# Check for issues
issues = loader.validate_schema_structure(schema_data['schema'])
if issues:
    for issue in issues:
        print(f"⚠️ {issue}")
```
### Saving to Markdown
```python
# Create a schema
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "My Schema",
    "version": "1.0.0",
    "type": "object",
    "properties": {
        "name": {"type": "string"}
    }
}
# Save as markdown
loader.save_schema(
    schema=schema,
    md_path=Path("my-schema-v1.0.md"),
    frontmatter={
        "schema-id": "https://example.com/schemas/my-schema/v1.0",
        "status": "draft"
    }
)
```
### Round-Trip Conversion
```python
import json
from pathlib import Path
# Load an existing JSON schema
json_schema = json.loads(Path("old-schema.json").read_text(encoding="utf-8"))
# Save it as markdown
loader.save_schema(
    schema=json_schema,
    md_path=Path("new-schema-v1.0.md")
)
# Load it back
schema_data = loader.load_schema(Path("new-schema-v1.0.md"))
# The original fields survive the round trip; the loaded schema
# additionally carries x-markitect-source metadata
assert schema_data['schema']['title'] == json_schema['title']
```
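This guide does not spell out what `save_schema` writes beyond the template placeholders. As a rough, stdlib-only sketch of the save side of a round trip, the hypothetical `render_markdown_schema()` below lays out frontmatter, a title, and a `## Schema Definition` JSON block. The layout and the use of JSON-encoded values as YAML scalars are assumptions, not MarkiTect's actual default template.

```python
import json
from typing import Any, Dict, Optional


def render_markdown_schema(schema: Dict[str, Any],
                           frontmatter: Optional[Dict[str, Any]] = None) -> str:
    """Render a JSON schema as a markdown schema document (hypothetical layout)."""
    # Build the fence at runtime so this example's own code fence stays intact.
    fence = "`" * 3
    lines = []
    if frontmatter:
        lines.append("---")
        for key, value in frontmatter.items():
            # JSON-encoded strings, numbers and lists are also valid YAML in common cases.
            lines.append(f"{key}: {json.dumps(value)}")
        lines.append("---")
    lines.append(f"# {schema.get('title', 'Untitled Schema')}")
    if schema.get("description"):
        lines.append(schema["description"])
    # Put the schema under the heading the loader prefers.
    lines.append("## Schema Definition")
    lines.append(fence + "json")
    lines.append(json.dumps(schema, indent=2))
    lines.append(fence)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_markdown_schema(
        {"title": "My Schema", "type": "object"},
        {"schema-id": "https://example.com/schemas/my-schema/v1.0", "status": "draft"},
    ))
```

Because the JSON block sits under `## Schema Definition`, a loader that prefers that section would select it again on load.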
## Advanced Features
### Listing JSON Blocks
Useful for debugging when multiple JSON blocks exist:
```python
content = Path("schema.md").read_text(encoding="utf-8")
blocks = loader.list_json_blocks(content)
print(f"Found {len(blocks)} JSON blocks:")
for position, json_content in blocks:
    print(f"  Position {position}: {len(json_content)} chars")
```
### Schema Structure Validation
Check for recommended fields and conventions:
```python
issues = loader.validate_schema_structure(schema)
for issue in issues:
    print(f"⚠️ {issue}")
# Example output:
# ⚠️ Missing recommended field: $id
# ⚠️ Missing MarkiTect convention: version field
```
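The two example messages suggest a simple presence check. A minimal sketch, assuming `$schema`, `$id`, and `title` as the recommended fields and `version` as the only MarkiTect convention; the hypothetical `check_schema_structure()` is not the loader's method, and the real `validate_schema_structure()` may check more or different things:

```python
from typing import Any, Dict, List

# Assumed rule sets for illustration; the real checks may differ.
RECOMMENDED_FIELDS = ("$schema", "$id", "title")
MARKITECT_CONVENTIONS = ("version",)


def check_schema_structure(schema: Dict[str, Any]) -> List[str]:
    """Return human-readable issues; an empty list means nothing was flagged."""
    issues = []
    for field in RECOMMENDED_FIELDS:
        if field not in schema:
            issues.append(f"Missing recommended field: {field}")
    for field in MARKITECT_CONVENTIONS:
        if field not in schema:
            issues.append(f"Missing MarkiTect convention: {field} field")
    return issues


if __name__ == "__main__":
    schema = {"$schema": "http://json-schema.org/draft-07/schema#", "title": "My Schema"}
    for issue in check_schema_structure(schema):
        print(f"⚠️ {issue}")
```

For a schema that has only `$schema` and `title`, this yields exactly the two example messages above.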
### Custom Templates
Use custom markdown templates for saving schemas:
````python
template = """---
{frontmatter_yaml}
---
# {title}
{description}
## Schema
```json
{schema_json}
```
"""
loader.save_schema(
    schema=schema,
    md_path=Path("custom-schema-v1.0.md"),
    template=template
)
````
## Error Handling
### Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `FileNotFoundError` | Schema file doesn't exist | Check file path |
| `SchemaNotFoundError` | No JSON block in markdown | Add a fenced code block tagged `json` |
| `InvalidSchemaFormatError` | Invalid JSON or YAML | Check syntax |
| `SchemaFilenameError` | Invalid filename format | Use `{domain}-schema-v{major}.{minor}.md` |
### Example Error Handling
```python
from pathlib import Path
from markitect.schema_loader import (
    MarkdownSchemaLoader,
    SchemaNotFoundError,
    InvalidSchemaFormatError
)
loader = MarkdownSchemaLoader()
try:
    schema_data = loader.load_schema(Path("my-schema-v1.0.md"))
except FileNotFoundError as e:
    print(f"❌ File not found: {e}")
except SchemaNotFoundError as e:
    print(f"❌ No schema in file: {e}")
except InvalidSchemaFormatError as e:
    print(f"❌ Invalid format: {e}")
```
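The `SchemaFilenameError` row in the table refers to the `{domain}-schema-v{major}.{minor}.md` convention. One way such a check could look, as a sketch: `parse_schema_filename()` is hypothetical, the local `SchemaFilenameError` class only stands in for the library's exception, and the allowed domain characters (letters, digits, hyphens) are an assumption.

```python
import re
from typing import Tuple


class SchemaFilenameError(ValueError):
    """Local stand-in for the library's SchemaFilenameError."""


# {domain}-schema-v{major}.{minor}.md; the domain character set is an assumption.
SCHEMA_FILENAME_RE = re.compile(
    r"(?P<domain>[A-Za-z0-9][A-Za-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md"
)


def parse_schema_filename(filename: str) -> Tuple[str, int, int]:
    """Split a schema filename into (domain, major, minor) or raise SchemaFilenameError."""
    match = SCHEMA_FILENAME_RE.fullmatch(filename)
    if match is None:
        raise SchemaFilenameError(
            f"{filename!r} does not match {{domain}}-schema-v{{major}}.{{minor}}.md"
        )
    return match["domain"], int(match["major"]), int(match["minor"])


if __name__ == "__main__":
    print(parse_schema_filename("manpage-schema-v1.0.md"))
```

With this rule, `manpage-schema-v1.0.md` parses to `("manpage", 1, 0)`, while `my-schema.md` raises the error because it lacks the `-v{major}.{minor}` suffix.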
## Best Practices
### 1. Use Schema Definition Section
Always place the main schema under `## Schema Definition`:
````markdown
## Schema Definition
```json
{...}
```
````
### 2. Include Frontmatter
Provide metadata for better discoverability:
```yaml
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---
```
### 3. Add Rich Documentation
Explain the schema purpose, usage, and examples:
````markdown
## Overview
This schema validates...
## Usage
```bash
markitect validate doc.md --schema my-schema-v1.0
```
## Examples
...
````
### 4. Version Your Schemas
Follow the naming convention:
- Initial: `my-schema-v1.0.md`
- Minor update: `my-schema-v1.1.md`
- Breaking change: `my-schema-v2.0.md`
### 5. Validate Structure
Always check for common issues:
```python
issues = loader.validate_schema_structure(schema)
if not issues:
    print("✅ Schema structure is valid")
```
## Integration with MarkiTect
### CLI Usage (Future)
Once integrated with the CLI, you'll be able to:
```bash
# Ingest markdown schema
markitect schema-ingest manpage-schema-v1.0.md
# Validate against markdown schema
markitect validate document.md --schema manpage-schema-v1.0
# Export schema
markitect schema-get manpage-schema-v1.0 --output json
```
### Validator Integration
The SchemaValidator will automatically detect `.md` schemas:
```python
from markitect.validator import SchemaValidator
validator = SchemaValidator()
validator.validate(
document="my-doc.md",
schema="manpage-schema-v1.0.md" # .md extension auto-detected
)
```
## Markdown Schema Template
Here's a complete template for creating new schemas:
````markdown
---
schema-id: "https://markitect.dev/schemas/YOUR-DOMAIN/v1.0"
version: "1.0.0"
status: "draft"
domain: "YOUR-DOMAIN"
description: "Brief description of what this schema validates"
authors:
  - "Your Name <email@example.com>"
created: "2026-01-04"
---
# YOUR-DOMAIN Schema v1.0
## Overview
Detailed description of what this schema validates and why it exists.
## Features
- Feature 1
- Feature 2
- Feature 3
## Usage
### Validating Documents
```bash
markitect validate document.md --schema YOUR-DOMAIN-schema-v1.0
```
### Common Validation Errors
1. **Error Type 1**: Description and solution
2. **Error Type 2**: Description and solution
## Schema Definition
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "YOUR DOMAIN Schema",
  "description": "Schema description",
  "type": "object",
  "properties": {
    "field1": {
      "type": "string",
      "description": "Description of field1"
    }
  },
  "required": ["field1"]
}
```
## Examples
### Valid Document
```markdown
Example of valid content...
```
### Invalid Document
```markdown
Example of invalid content...
```
## Version History
### v1.0.0 (2026-01-04)
- Initial version
- Feature A
- Feature B
## Related Documentation
- [Related Schema 1](../other-schema-v1.0.md)
- [MarkiTect Documentation](../../README.md)
````
## Testing
The loader's unit tests live in `tests/test_schema_loader.py`:
```bash
# Run all loader tests
pytest tests/test_schema_loader.py -v
# Run specific test class
pytest tests/test_schema_loader.py::TestMarkdownSchemaLoader -v
# Check coverage (requires the pytest-cov plugin)
pytest tests/test_schema_loader.py --cov=markitect.schema_loader
```
**Test Results**: 35/35 tests passing (100%)
## Implementation Details
### Regex Patterns
The loader uses these regex patterns:
```python
# Frontmatter pattern
r'^---\s*\n(.*?)\n---\s*\n'
# JSON code block pattern
r'```json\s*\n(.*?)\n```'
# Schema Definition section pattern
r'##\s+Schema Definition\s*\n'
```
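These three patterns can be combined into the extraction flow described under JSON Schema Extraction. The sketch below is a standalone approximation, not the loader's code. It assumes the patterns are compiled with `re.DOTALL` (without it, `.` stops at newlines and multi-line blocks never match), returns the frontmatter as raw YAML text instead of parsing it (the loader parses YAML; a library such as PyYAML's `yaml.safe_load` would do that step), and raises a plain `LookupError` where the loader raises `SchemaNotFoundError`. The hypothetical helpers `split_frontmatter()` and `select_schema()` also fall back to the first block when the section exists but has no block after it, which is an assumption.

```python
import json
import re
from typing import Any, Dict, Tuple

# The patterns above, compiled with re.DOTALL (an assumption) so that
# "." also matches newlines inside multi-line frontmatter and JSON blocks.
FRONTMATTER_RE = re.compile(r'^---\s*\n(.*?)\n---\s*\n', re.DOTALL)
JSON_BLOCK_RE = re.compile(r'```json\s*\n(.*?)\n```', re.DOTALL)
SECTION_RE = re.compile(r'##\s+Schema Definition\s*\n')


def split_frontmatter(content: str) -> Tuple[str, str]:
    """Return (raw_yaml, body); raw_yaml is empty when there is no frontmatter."""
    match = FRONTMATTER_RE.match(content)
    if match is None:
        return "", content
    return match.group(1), content[match.end():]


def select_schema(body: str) -> Dict[str, Any]:
    """Parse the JSON block under '## Schema Definition', else the first JSON block."""
    match = None
    section = SECTION_RE.search(body)
    if section is not None:
        # Search only from the end of the heading onwards.
        match = JSON_BLOCK_RE.search(body, section.end())
    if match is None:
        # Fallback: the first JSON block anywhere in the body.
        match = JSON_BLOCK_RE.search(body)
    if match is None:
        raise LookupError("no json code block found")
    return json.loads(match.group(1))
```

Applied to the "Multiple JSON Blocks" example, `select_schema()` skips the block under `## Example Usage` and returns the one under `## Schema Definition`.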
### Metadata Merging
The `_merge_metadata` method:
1. Copies the original schema
2. Adds `x-markitect-source` with file metadata
3. Merges frontmatter fields:
   - `schema-id` → `$id`
   - `version` → `version`
   - `status` → `x-markitect-metadata.status`
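The three steps might look like the following sketch. `merge_metadata()` is a hypothetical standalone function, and the `file` key inside `x-markitect-source` is an assumption; only the `frontmatter` key and the three field mappings come from this guide.

```python
import copy
from typing import Any, Dict


def merge_metadata(schema: Dict[str, Any], frontmatter: Dict[str, Any],
                   source_file: str) -> Dict[str, Any]:
    """Return a copy of schema with frontmatter merged in (frontmatter wins)."""
    # 1. Copy the original schema so the caller's dict is never mutated.
    merged = copy.deepcopy(schema)
    # 2. Record provenance; the "file" key is an assumed name.
    merged["x-markitect-source"] = {
        "file": source_file,
        "frontmatter": dict(frontmatter),
    }
    # 3. Map frontmatter fields onto schema fields, overriding existing values.
    if "schema-id" in frontmatter:
        merged["$id"] = frontmatter["schema-id"]
    if "version" in frontmatter:
        merged["version"] = frontmatter["version"]
    if "status" in frontmatter:
        merged.setdefault("x-markitect-metadata", {})["status"] = frontmatter["status"]
    return merged


if __name__ == "__main__":
    print(merge_metadata({"title": "T", "version": "0.9.0"},
                         {"version": "1.0.0", "status": "stable"},
                         "t-schema-v1.0.md"))
```

The deep copy is a design choice for the sketch: it keeps a nested `x-markitect-metadata` object in the input untouched when `status` is written into the result.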
### File Encoding
All files are read/written as UTF-8. Invalid UTF-8 sequences raise `InvalidSchemaFormatError`.
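A minimal sketch of that encoding rule, assuming the read goes through `Path.read_text`: the hypothetical `read_schema_text()` decodes strictly as UTF-8 and converts `UnicodeDecodeError` into a local stand-in for `InvalidSchemaFormatError`, while a missing file still surfaces as `FileNotFoundError`.

```python
from pathlib import Path


class InvalidSchemaFormatError(Exception):
    """Local stand-in for markitect.schema_loader.InvalidSchemaFormatError."""


def read_schema_text(path: Path) -> str:
    """Read a schema file as strict UTF-8, reporting undecodable bytes as a format error."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        # FileNotFoundError is deliberately not caught and propagates unchanged.
        raise InvalidSchemaFormatError(f"Failed to read schema file {path}: {exc}") from exc
```

A test in the spirit of the binary-file case can write bytes such as `b"\xff\xfe"` to a temporary file and assert that the error is raised.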
## Troubleshooting
### Schema Not Found
**Problem**: `SchemaNotFoundError: No JSON schema found`
**Solutions**:
- Ensure the file contains a fenced code block tagged `json` (opened with ` ```json `)
- Check the JSON syntax is valid
- Verify the code block is closed with a line containing only ` ``` `
### Invalid YAML Frontmatter
**Problem**: `InvalidSchemaFormatError: Invalid YAML frontmatter`
**Solutions**:
- Check YAML syntax (indentation, colons, quotes)
- Ensure frontmatter is between `---` delimiters
- Verify frontmatter is at the start of file
### Binary File Error
**Problem**: `InvalidSchemaFormatError: Failed to read schema file`
**Solutions**:
- Ensure file is text, not binary
- Check file encoding is UTF-8
- Verify file isn't corrupted
## See Also
- [Schema Naming Specification](SCHEMA_NAMING_SPEC.md)
- [Schema Management Workplan](WORKPLAN.md)
- [Phase 2 Documentation](WORKPLAN.md#phase-2-markdown-schema-loader)
- [Example Markdown Schema](../../markitect/schemas/manpage-schema-v1.0.md)
## Changelog
### v1.0.0 (2026-01-04)
- Initial implementation
- 35 unit tests (100% passing)
- Frontmatter extraction with YAML parsing
- JSON code block extraction with section preference
- Metadata merging with x-markitect-source tracking
- Schema saving with template support
- Round-trip save/load capability
- Helper methods for validation and debugging