markitect-main/roadmap/schema-of-schemas/SCHEMA_LOADER_GUIDE.md
tegwick b81ce5631d feat: implement Phase 2 - Markdown Schema Loader
Completed Phase 2 of the schema-of-schemas implementation with full
markdown schema support. This enables schemas to be authored as
markdown files with rich documentation and embedded JSON schemas.

Core Implementation (markitect/schema_loader.py):
- MarkdownSchemaLoader class with comprehensive parsing capabilities
- YAML frontmatter extraction with error handling
- JSON code block extraction with section preference (## Schema Definition)
- Metadata merging with x-markitect-source tracking
- Schema saving with template support and round-trip capability
- Helper methods: list_json_blocks(), validate_schema_structure()

Test Coverage (tests/test_schema_loader.py):
- 35 comprehensive unit tests (100% passing)
- Tests for loading, parsing, saving, round-trip conversion
- Edge case handling (empty files, binary files, malformed blocks)
- Fixed binary file test to use invalid UTF-8 sequences

Example Schema (markitect/schemas/manpage-schema-v1.0.md):
- First markdown schema following naming convention
- Complete manpage schema with frontmatter + documentation + JSON
- Demonstrates section classification and content control
- Shows proper structure for future schema authors

Documentation (roadmap/schema-of-schemas/SCHEMA_LOADER_GUIDE.md):
- Comprehensive user guide (600+ lines)
- API reference with examples
- Best practices and troubleshooting
- Integration patterns for CLI and validator

Progress Tracking:
- Updated TODO.md with Phase 2 completion
- Updated CHANGELOG.md with implementation details
- Next: Phase 3 - Schema-for-Schemas Metaschema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-05 00:02:15 +01:00


# Markdown Schema Loader - User Guide

**Version:** 1.0 | **Status:** Implemented | **Created:** 2026-01-04

## Overview

The Markdown Schema Loader enables MarkiTect to load JSON schemas from markdown files, combining rich documentation with machine-readable validation rules. This aligns with MarkiTect's markdown-first philosophy while maintaining JSON Schema compatibility.

## Markdown Schema Format

A markdown schema file consists of three parts:

1. **YAML Frontmatter**: Metadata about the schema
2. **Documentation**: Rich markdown content explaining the schema
3. **Schema Definition**: JSON schema in a code block

### Example Structure

````markdown
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---

# Schema Title v1.0

## Overview
Description of what this schema validates...

## Usage
How to use this schema...

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "My Schema",
  "type": "object",
  ...
}
```

## Version History

- v1.0.0 - Initial version
````

## Frontmatter Metadata

### Required Fields

None are strictly required, but these are recommended:

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `schema-id` | string | Canonical URI for the schema | `https://markitect.dev/schemas/manpage/v1.0` |
| `version` | string | Semantic version (SemVer) | `1.0.0` |
| `status` | string | Lifecycle status | `stable`, `draft`, `deprecated` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `domain` | string | Schema domain name |
| `description` | string | Brief schema description |
| `authors` | array | List of authors |
| `created` | string | Creation date (ISO 8601) |
| `updated` | string | Last update date (ISO 8601) |

### Metadata Merging

Frontmatter metadata takes precedence over schema fields:

- `schema-id` → `$id` in the schema
- `version` → `version` in the schema
- `status` → `x-markitect-metadata.status` in the schema

All frontmatter is preserved in `x-markitect-source.frontmatter`.
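
The mapping above can be illustrated with a minimal sketch. This is not the loader's actual code: `merge_frontmatter` is a hypothetical helper, and the `file` key inside `x-markitect-source` is an assumption made for illustration. Only the three field mappings and the `x-markitect-source.frontmatter` location come from this guide.

```python
import copy


def merge_frontmatter(schema: dict, frontmatter: dict, source_file: str) -> dict:
    """Hypothetical stand-in for the loader's merge step (illustration only)."""
    merged = copy.deepcopy(schema)  # leave the caller's schema untouched

    # Frontmatter wins over values already present in the schema.
    if "schema-id" in frontmatter:
        merged["$id"] = frontmatter["schema-id"]
    if "version" in frontmatter:
        merged["version"] = frontmatter["version"]
    if "status" in frontmatter:
        merged.setdefault("x-markitect-metadata", {})["status"] = frontmatter["status"]

    # Preserve the complete frontmatter; the "file" key is an assumed name.
    merged["x-markitect-source"] = {
        "file": source_file,
        "frontmatter": dict(frontmatter),
    }
    return merged


schema = {"$id": "https://example.com/old", "title": "Manpage", "version": "0.9.0"}
frontmatter = {
    "schema-id": "https://markitect.dev/schemas/manpage/v1.0",
    "version": "1.0.0",
    "status": "stable",
}
merged = merge_frontmatter(schema, frontmatter, "manpage-schema-v1.0.md")
print(merged["$id"], merged["version"], merged["x-markitect-metadata"]["status"])
```

Working on a deep copy keeps the schema that was parsed from the JSON block available unchanged, which makes it easier to see which values were overridden by frontmatter.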

## JSON Schema Extraction

### Schema Definition Section

The loader prefers JSON blocks under a `## Schema Definition` heading:

````markdown
## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  ...
}
```
````

### Fallback Behavior

If no `## Schema Definition` section exists, the loader uses the **first** JSON code block in the file.
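
The preference and fallback rules can be sketched as follows. This is an illustration under stated assumptions, not the loader's implementation: `select_schema_block` is a hypothetical name, the patterns are the ones listed under Regex Patterns, and compiling the block pattern with `re.DOTALL` is an assumption. The fence string is built programmatically only to avoid nesting code fences in this guide.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built here to avoid nested fences

# Patterns taken from the Regex Patterns section; re.DOTALL is assumed.
JSON_BLOCK = re.compile(FENCE + r"json\s*\n(.*?)\n" + FENCE, re.DOTALL)
SCHEMA_SECTION = re.compile(r"##\s+Schema Definition\s*\n")


def select_schema_block(content: str) -> dict:
    """Prefer the first JSON block after the section heading, else the first block."""
    section = SCHEMA_SECTION.search(content)
    if section:
        match = JSON_BLOCK.search(content, section.end())
        if match:
            return json.loads(match.group(1))
    match = JSON_BLOCK.search(content)
    if match is None:
        raise ValueError("no JSON code block found")
    return json.loads(match.group(1))


doc_with_section = "\n".join([
    "## Example Usage", "",
    FENCE + "json", '{"name": "example"}', FENCE, "",
    "## Schema Definition", "",
    FENCE + "json", '{"type": "object"}', FENCE, "",
])
doc_without_section = "\n".join([
    "# Notes", "",
    FENCE + "json", '{"type": "string"}', FENCE, "",
])

print(select_schema_block(doc_with_section))
print(select_schema_block(doc_without_section))
```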

### Multiple JSON Blocks

You can include multiple JSON blocks in the documentation:

````markdown
## Example Usage

```json
{
  "name": "example",
  "version": "1.0"
}
```

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "name": {"type": "string"},
    "version": {"type": "string"}
  }
}
```
````

The loader uses the schema under the `## Schema Definition` heading and ignores the example block above it.

## Using the Loader

### Python API

```python
from pathlib import Path
from markitect.schema_loader import MarkdownSchemaLoader

# Create loader instance
loader = MarkdownSchemaLoader()

# Load schema from markdown
schema_data = loader.load_schema(Path("manpage-schema-v1.0.md"))

# Access components
schema = schema_data['schema']         # JSON Schema dict
metadata = schema_data['metadata']     # Frontmatter dict
docs = schema_data['documentation']    # Full markdown content
source = schema_data['source_file']    # Source file path

# Use the schema
print(f"Loaded: {schema['title']}")
print(f"Version: {schema['version']}")
print(f"Status: {metadata['status']}")
```

### Loading from Markdown

```python
# Load schema
schema_data = loader.load_schema(Path("my-schema-v1.0.md"))

# Check for issues
issues = loader.validate_schema_structure(schema_data['schema'])
if issues:
    for issue in issues:
        print(f"⚠️  {issue}")
```

### Saving to Markdown

```python
# Create a schema
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "My Schema",
    "version": "1.0.0",
    "type": "object",
    "properties": {
        "name": {"type": "string"}
    }
}

# Save as markdown
loader.save_schema(
    schema=schema,
    md_path=Path("my-schema-v1.0.md"),
    frontmatter={
        "schema-id": "https://example.com/schemas/my-schema/v1.0",
        "status": "draft"
    }
)
```

### Round-Trip Conversion

```python
import json

# Load existing JSON schema
json_schema = json.loads(Path("old-schema.json").read_text(encoding="utf-8"))

# Save as markdown
loader.save_schema(
    schema=json_schema,
    md_path=Path("new-schema-v1.0.md")
)

# Load it back
schema_data = loader.load_schema(Path("new-schema-v1.0.md"))

# The schema content survives the round trip
# (the loaded copy additionally carries x-markitect-source)
assert schema_data['schema']['title'] == json_schema['title']
```

## Advanced Features

### Listing JSON Blocks

Useful for debugging when multiple JSON blocks exist:

```python
content = Path("schema.md").read_text()
blocks = loader.list_json_blocks(content)

print(f"Found {len(blocks)} JSON blocks:")
for position, json_content in blocks:
    print(f"  Position {position}: {len(json_content)} chars")
```
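
For readers curious how such a listing could be produced, here is a minimal sketch. It assumes that `position` is the character offset at which each block starts; the guide does not define it, so treat that as a guess. `sketch_list_json_blocks` is a hypothetical name and not the loader's method.

```python
import re

FENCE = "`" * 3  # three backticks, built here to avoid nested fences

# JSON code block pattern from the Regex Patterns section; re.DOTALL is assumed.
JSON_BLOCK = re.compile(FENCE + r"json\s*\n(.*?)\n" + FENCE, re.DOTALL)


def sketch_list_json_blocks(content: str) -> list[tuple[int, str]]:
    """Return (start offset, block text) for every JSON code block, in file order."""
    return [(m.start(), m.group(1)) for m in JSON_BLOCK.finditer(content)]


sample = "\n".join([
    FENCE + "json", '{"a": 1}', FENCE,
    "",
    "Some prose between the blocks.",
    "",
    FENCE + "json", '{"b": 2}', FENCE,
    "",
])
blocks = sketch_list_json_blocks(sample)
for position, json_content in blocks:
    print(f"Position {position}: {len(json_content)} chars")
```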

### Schema Structure Validation

Check for recommended fields and conventions:

```python
issues = loader.validate_schema_structure(schema)

for issue in issues:
    print(f"⚠️  {issue}")

# Example output:
# ⚠️  Missing recommended field: $id
# ⚠️  Missing MarkiTect convention: version field
```
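
A check of this kind might look like the sketch below. The set of recommended fields is an assumption chosen for illustration (only `$id` and `version` appear in the example output above), and `check_structure` is a hypothetical name, so the real method may report more or different issues.

```python
# Assumed list of recommended top-level fields; only "$id" is confirmed by the guide.
RECOMMENDED_FIELDS = ("$schema", "$id", "title")


def check_structure(schema: dict) -> list[str]:
    """Collect human-readable warnings instead of raising on the first problem."""
    issues = [
        f"Missing recommended field: {field}"
        for field in RECOMMENDED_FIELDS
        if field not in schema
    ]
    if "version" not in schema:
        issues.append("Missing MarkiTect convention: version field")
    return issues


partial = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "My Schema",
}
for issue in check_structure(partial):
    print(issue)
```

Returning a list rather than raising lets callers decide whether a missing convention is fatal or only worth a warning.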

### Custom Templates

Use custom markdown templates for saving schemas:

````python
template = """---
{frontmatter_yaml}
---

# {title}

{description}

## Schema

```json
{schema_json}
```
"""

loader.save_schema(
    schema=schema,
    md_path=Path("custom-schema-v1.0.md"),
    template=template
)
````


## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `FileNotFoundError` | Schema file doesn't exist | Check file path |
| `SchemaNotFoundError` | No JSON block in markdown | Add a fenced `json` code block |
| `InvalidSchemaFormatError` | Invalid JSON or YAML | Check syntax |
| `SchemaFilenameError` | Invalid filename format | Use `{domain}-schema-v{major}.{minor}.md` |

### Example Error Handling

```python
from pathlib import Path

from markitect.schema_loader import (
    MarkdownSchemaLoader,
    SchemaNotFoundError,
    InvalidSchemaFormatError
)

loader = MarkdownSchemaLoader()

try:
    schema_data = loader.load_schema(Path("my-schema-v1.0.md"))
except FileNotFoundError as e:
    print(f"❌ File not found: {e}")
except SchemaNotFoundError as e:
    print(f"❌ No schema in file: {e}")
except InvalidSchemaFormatError as e:
    print(f"❌ Invalid format: {e}")
```

## Best Practices

### 1. Use Schema Definition Section

Always place the main schema under `## Schema Definition`:

````markdown
## Schema Definition

```json
{...}
```
````

### 2. Include Frontmatter

Provide metadata for better discoverability:

```yaml
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---
```

### 3. Add Rich Documentation

Explain the schema purpose, usage, and examples:

````markdown
## Overview
This schema validates...

## Usage
```bash
markitect validate doc.md --schema my-schema-v1.0
```

## Examples

...
````


### 4. Version Your Schemas

Follow the naming convention:

- Initial: `my-schema-v1.0.md`
- Minor update: `my-schema-v1.1.md`
- Breaking change: `my-schema-v2.0.md`
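
The naming convention `{domain}-schema-v{major}.{minor}.md` can be checked with a small regular expression. The sketch below is hypothetical: `parse_schema_filename` is not part of the loader, and the character set allowed in the domain part (lowercase letters, digits, hyphens) is an assumption.

```python
import re

# Assumed domain alphabet: lowercase letters, digits and hyphens.
FILENAME = re.compile(
    r"(?P<domain>[a-z0-9][a-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md"
)


def parse_schema_filename(name: str) -> tuple[str, int, int]:
    """Split a schema filename into (domain, major, minor) or raise ValueError."""
    match = FILENAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid schema filename: {name!r}")
    return match["domain"], int(match["major"]), int(match["minor"])


print(parse_schema_filename("manpage-schema-v1.0.md"))
print(parse_schema_filename("my-schema-v2.0.md"))
```

Comparing the parsed major numbers of two files is then enough to tell a breaking change from a minor update.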

### 5. Validate Structure

Always check for common issues:

```python
issues = loader.validate_schema_structure(schema)
if not issues:
    print("✅ Schema structure is valid")
```

## Integration with MarkiTect

### CLI Usage (Future)

Once integrated with the CLI, you'll be able to:

```bash
# Ingest markdown schema
markitect schema-ingest manpage-schema-v1.0.md

# Validate against markdown schema
markitect validate document.md --schema manpage-schema-v1.0

# Export schema
markitect schema-get manpage-schema-v1.0 --output json
```

### Validator Integration

The `SchemaValidator` will automatically detect `.md` schemas:

```python
from markitect.validator import SchemaValidator

validator = SchemaValidator()
validator.validate(
    document="my-doc.md",
    schema="manpage-schema-v1.0.md"  # .md extension auto-detected
)
```

## Markdown Schema Template

Here's a complete template for creating new schemas:

````markdown
---
schema-id: "https://markitect.dev/schemas/YOUR-DOMAIN/v1.0"
version: "1.0.0"
status: "draft"
domain: "YOUR-DOMAIN"
description: "Brief description of what this schema validates"
authors:
  - "Your Name <email@example.com>"
created: "2026-01-04"
---

# YOUR-DOMAIN Schema v1.0

## Overview

Detailed description of what this schema validates and why it exists.

## Features

- Feature 1
- Feature 2
- Feature 3

## Usage

### Validating Documents

```bash
markitect validate document.md --schema YOUR-DOMAIN-schema-v1.0
```

### Common Validation Errors

1. **Error Type 1**: Description and solution
2. **Error Type 2**: Description and solution

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "YOUR DOMAIN Schema",
  "description": "Schema description",
  "type": "object",
  "properties": {
    "field1": {
      "type": "string",
      "description": "Description of field1"
    }
  },
  "required": ["field1"]
}
```

## Examples

### Valid Document

Example of valid content...

### Invalid Document

Example of invalid content...

## Version History

### v1.0.0 (2026-01-04)

- Initial version
- Feature A
- Feature B
````

## Testing

The loader has comprehensive test coverage:

```bash
# Run all loader tests
pytest tests/test_schema_loader.py -v

# Run specific test class
pytest tests/test_schema_loader.py::TestMarkdownSchemaLoader -v

# Check coverage
pytest tests/test_schema_loader.py --cov=markitect.schema_loader
```

**Test Results:** 35/35 tests passing (100%)

## Implementation Details

### Regex Patterns

The loader uses these regex patterns:

````python
# Frontmatter pattern
r'^---\s*\n(.*?)\n---\s*\n'

# JSON code block pattern
r'```json\s*\n(.*?)\n```'

# Schema Definition section pattern
r'##\s+Schema Definition\s*\n'
````

### Metadata Merging

The `_merge_metadata` method:

1. Copies the original schema
2. Adds `x-markitect-source` with file metadata
3. Merges frontmatter fields:
   - `schema-id` → `$id`
   - `version` → `version`
   - `status` → `x-markitect-metadata.status`

### File Encoding

All files are read and written as UTF-8. Invalid UTF-8 sequences raise `InvalidSchemaFormatError`.
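
The encoding rule can be sketched as a strict UTF-8 read that converts decoding failures into the loader's error type. This is an assumption about the approach rather than the actual implementation; `read_schema_text` is a hypothetical helper, and the exception class is redefined locally so the example runs on its own.

```python
import tempfile
from pathlib import Path


class InvalidSchemaFormatError(Exception):
    """Local stand-in for the exception of the same name in markitect.schema_loader."""


def read_schema_text(path: Path) -> str:
    """Read a schema file as UTF-8 and report undecodable bytes as a format error."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise InvalidSchemaFormatError(f"Failed to read schema file: {path}") from exc


with tempfile.TemporaryDirectory() as tmp:
    bad_file = Path(tmp) / "binary-schema-v1.0.md"
    bad_file.write_bytes(b"\xff\xfe\x00 not valid UTF-8")  # 0xFF never occurs in UTF-8
    try:
        read_schema_text(bad_file)
        raised = False
    except InvalidSchemaFormatError as err:
        raised = True
        print(f"Rejected: {err.__cause__.__class__.__name__}")
```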

## Troubleshooting

### Schema Not Found

**Problem:** `SchemaNotFoundError: No JSON schema found`

**Solutions:**

- Ensure the file contains a fenced `json` code block
- Check that the JSON syntax is valid
- Verify the code block is closed with a matching fence of three backticks

### Invalid YAML Frontmatter

**Problem:** `InvalidSchemaFormatError: Invalid YAML frontmatter`

**Solutions:**

- Check YAML syntax (indentation, colons, quotes)
- Ensure the frontmatter is between `---` delimiters
- Verify the frontmatter is at the start of the file

### Binary File Error

**Problem:** `InvalidSchemaFormatError: Failed to read schema file`

**Solutions:**

- Ensure the file is text, not binary
- Check that the file encoding is UTF-8
- Verify the file isn't corrupted

## Changelog

### v1.0.0 (2026-01-04)

- Initial implementation
- 35 unit tests (100% passing)
- Frontmatter extraction with YAML parsing
- JSON code block extraction with section preference
- Metadata merging with `x-markitect-source` tracking
- Schema saving with template support
- Round-trip save/load capability
- Helper methods for validation and debugging