5 Commits

Author SHA1 Message Date
f3aaec99bb feat: implement Phase 3 - Schema-for-Schemas Metaschema
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Completed Phase 3 of the schema-of-schemas implementation with a
comprehensive metaschema that validates all MarkiTect schema files
against conventions and standards.

Metaschema Implementation (schema-schema-v1.0.md - 650+ lines):
- Validates core JSON Schema fields ($schema, $id, title, description)
- Validates MarkiTect version field (SemVer: major.minor.patch)
- Validates $id URL format (HTTPS with version path)
- Validates MarkiTect extensions:
  - x-markitect-sections: section classifications and content rules
  - x-markitect-content-control: pattern and quality validation
  - x-markitect-metadata: status, authors, tags
  - x-markitect-source: loader metadata (auto-added)
- Section classification validation (required, recommended, optional,
  discouraged, improper)
- Content control pattern validation
- Comprehensive documentation with examples and usage guides

CLI Command (markitect schema-validate):
- Validates schema files against metaschema
- Supports both markdown and JSON schema files
- Detailed error reporting with schema paths
- Structure validation recommendations
- Exit codes for CI/CD integration

Test Coverage (tests/test_schema_metaschema.py - 12 tests, 100% passing):
- Metaschema self-validation
- Manpage schema validation
- Required fields enforcement
- Version format validation (valid and invalid cases)
- $id format validation (valid and invalid cases)
- Section classification validation
- Complete schema with all extensions

Validation Results:
-  Metaschema validates itself successfully
-  Manpage schema (v1.0.md) validates successfully
- ⚠️  Terminology schema needs migration (missing version, incorrect $id)

Progress Tracking:
- Updated TODO.md with Phase 3 completion
- Updated CHANGELOG.md with implementation details
- Next: Phase 4 - Schema Migration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-05 03:10:49 +01:00
b81ce5631d feat: implement Phase 2 - Markdown Schema Loader
Completed Phase 2 of the schema-of-schemas implementation with full
markdown schema support. This enables schemas to be authored as
markdown files with rich documentation and embedded JSON schemas.

Core Implementation (markitect/schema_loader.py):
- MarkdownSchemaLoader class with comprehensive parsing capabilities
- YAML frontmatter extraction with error handling
- JSON code block extraction with section preference (## Schema Definition)
- Metadata merging with x-markitect-source tracking
- Schema saving with template support and round-trip capability
- Helper methods: list_json_blocks(), validate_schema_structure()

Test Coverage (tests/test_schema_loader.py):
- 35 comprehensive unit tests (100% passing)
- Tests for loading, parsing, saving, round-trip conversion
- Edge case handling (empty files, binary files, malformed blocks)
- Fixed binary file test to use invalid UTF-8 sequences

Example Schema (markitect/schemas/manpage-schema-v1.0.md):
- First markdown schema following naming convention
- Complete manpage schema with frontmatter + documentation + JSON
- Demonstrates section classification and content control
- Shows proper structure for future schema authors

Documentation (roadmap/schema-of-schemas/SCHEMA_LOADER_GUIDE.md):
- Comprehensive user guide (600+ lines)
- API reference with examples
- Best practices and troubleshooting
- Integration patterns for CLI and validator

Progress Tracking:
- Updated TODO.md with Phase 2 completion
- Updated CHANGELOG.md with implementation details
- Next: Phase 3 - Schema-for-Schemas Metaschema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-05 00:02:15 +01:00
14108533fb feat: implement schema filename validation (Phase 1 complete)
Implements filename convention enforcement for schema files as part of
the schema-of-schemas implementation. All schemas must now follow the
naming pattern: {domain}-schema-v{major}.{minor}.md

## Phase 1 Deliverables

### Schema Naming Module
**File:** `markitect/schema_naming.py` (380 lines)

**Functions:**
- `validate_schema_filename()` - Validate filename against pattern
- `suggest_schema_filename()` - Generate valid filename from domain/version
- `extract_schema_metadata()` - Extract domain and version from filename
- `get_validation_errors()` - Detailed error messages for invalid filenames
- `is_valid_schema_filename()` - Simple boolean validation
- `format_validation_message()` - User-friendly error formatting

**Features:**
- Regex-based pattern matching
- Automatic normalization (spaces → hyphens, lowercase)
- Detailed error reporting
- Domain validation (must start with letter)
- Version validation (major.minor format)

### Comprehensive Test Suite
**File:** `tests/test_schema_naming.py` (500+ lines, 50 tests)

**Test Coverage:**
-  Valid filename variations (simple, hyphenated, with numbers)
-  Invalid filenames (wrong extension, missing components, wrong case)
-  Filename suggestion with normalization
-  Metadata extraction
-  Error message generation
-  Edge cases (long names, many hyphens, large versions)
-  Pattern regex validation

**Results:** 50/50 tests passing (100%)

### Specification Document
**File:** `roadmap/schema-of-schemas/SCHEMA_NAMING_SPEC.md`

**Contents:**
- Formal specification of naming convention
- Regular expression pattern with explanation
- Valid and invalid examples
- Version numbering guidelines
- Domain naming best practices
- Normalization rules
- Migration strategy from legacy naming
- Implementation guide

## Naming Convention

### Format
```
{domain}-schema-v{major}.{minor}.md
```

### Examples
```
✓ manpage-schema-v1.0.md
✓ api-documentation-schema-v1.0.md
✓ terminology-schema-v1.0.md
✓ arc42-schema-v2.1.md

✗ manpage.json (wrong extension)
✗ ManPage-schema-v1.0.md (uppercase)
✗ manpage-v1.0.md (missing 'schema')
✗ manpage-schema-v1.md (missing minor version)
```

### Components
- **domain**: Lowercase, hyphen-separated, starts with letter
- **schema**: Literal keyword
- **version**: v{major}.{minor} (SemVer simplified)
- **extension**: .md (markdown)

## Implementation Highlights

### Automatic Normalization
```python
suggest_schema_filename("API Documentation", "2.1")
# → "api-documentation-schema-v2.1.md"

suggest_schema_filename("My_Custom Type", "1.0")
# → "my-custom-type-schema-v1.0.md"
```

### Detailed Error Reporting
```python
format_validation_message("invalid.json")
# → Detailed error list + suggested fix
```

### Metadata Extraction
```python
extract_schema_metadata("manpage-schema-v1.0.md")
# → {'domain': 'manpage', 'version': '1.0', 'major': 1, 'minor': 0}
```

## Migration Plan

Current schemas will be renamed:
```
Old                           → New
────────────────────────────────────────────────────────
terminology-schema.json       → terminology-schema-v1.0.md
api-documentation             → api-documentation-schema-v1.0.md
enhanced-manpage              → manpage-schema-v2.0.md
markdown-manpage              → DELETE (duplicate)
markdown-manpage-schema.json  → DELETE (duplicate)
```

## Phase 1 Status:  COMPLETE

### Completed
- [x] Schema naming module implementation
- [x] Comprehensive test suite (50 tests, 100% passing)
- [x] Specification document
- [x] TODO.md updated

### Next: Phase 2
- [ ] Update CLI schema-ingest with validation
- [ ] Implement markdown schema loader
- [ ] Parse frontmatter and JSON code blocks
- [ ] Update SchemaValidator for .md support

## Testing

```bash
# Run tests
pytest tests/test_schema_naming.py -v
# → 50 passed in 0.48s

# Test interactively
python -c "
from markitect.schema_naming import validate_schema_filename
print(validate_schema_filename('manpage-schema-v1.0.md'))
"
# → (True, {'domain': 'manpage', 'version': '1.0', ...})
```

## Files Changed

- markitect/schema_naming.py (NEW, 380 lines)
- tests/test_schema_naming.py (NEW, 500+ lines)
- roadmap/schema-of-schemas/SCHEMA_NAMING_SPEC.md (NEW)
- TODO.md (updated progress tracking)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 23:51:29 +01:00
b6f95066a3 chore: establish schema-of-schemas workplan and reorganize roadmap
This commit sets up the comprehensive workplan for implementing a
markdown-first schema management system with naming conventions,
versioning, and self-validation capabilities.

## Directory Reorganization

- Renamed `todo/` → `roadmap/` for better organization
- Created `roadmap/schema-of-schemas/` subdirectory
- Moved schema management planning artifacts to dedicated directory

## Planning Artifacts Created

### Workplan & Documentation
- **WORKPLAN.md** (19KB) - Comprehensive 6-phase implementation plan
- **SCHEMA_MANAGEMENT_PROPOSAL.md** - Full analysis with 4 options
- **SCHEMA_MANAGEMENT_SUMMARY.md** - Executive summary
- **README.md** - Quick reference guide

### Example Schema
- **examples/schemas/manpage-schema-v1.md** - Demonstrates markdown format

## Schema Management System Design

### Naming Convention
**Format:** `{domain}-schema-v{major}.{minor}.md`
**Examples:**
- `manpage-schema-v1.0.md`
- `terminology-schema-v1.0.md`
- `api-documentation-schema-v1.0.md`

### Markdown-First Format
Schemas will be markdown files with:
- YAML frontmatter for metadata
- Rich documentation sections
- Embedded JSON schema in code block
- Version history and examples

### Implementation Phases (8-10 days)

**Phase 0:** Planning & Setup  (0.5 days) - COMPLETE
**Phase 1:** Filename Convention (1 day) - NEXT
**Phase 2:** Markdown Loader (2-3 days)
**Phase 3:** Schema-for-Schemas (2 days)
**Phase 4:** Schema Migration (1-2 days)
**Phase 5:** CLI & Documentation (1 day)
**Phase 6:** Testing & Validation (1 day)

### Goals

1.  Establish naming convention
2.  Implement filename validation
3.  Create markdown schema loader
4.  Build schema-for-schemas metaschema
5.  Migrate 5 existing schemas (remove 2 duplicates)
6.  Update CLI and documentation

## Updated Tracking

### TODO.md
- Added Schema-of-Schemas as active work item
- Documented Phase 1 tasks and timeline
- Paused capability extraction work

### CHANGELOG.md
- Added schema management system to [Unreleased]
- Documented directory reorganization
- Added "In Progress" section for current work

## Next Steps

Begin Phase 1:
1. Implement schema_naming.py with validation
2. Add unit tests
3. Update CLI schema-ingest command
4. Create naming specification document

## Files Changed

- CHANGELOG.md - Added unreleased schema management features
- TODO.md - Updated active work tracking
- roadmap/ - Reorganized from todo/
- roadmap/schema-of-schemas/ - New planning directory
- examples/schemas/ - Example markdown schema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 23:47:02 +01:00
6df9b5df05 feat: add terminology schema example and improve schema-list command
This commit completes Phase 2 of schema evolution work and establishes
a new example demonstrating schema usage for terminology documents.

## New Features

### Terminology Validation Example (examples/terminology/)
- Complete example terminology document with proper structure
- JSON schema with MarkiTect extensions for validation
- Demonstrates schema usage beyond manpages (glossaries, lexicons)
- Validates term structure: Definition, Synonyms, Related Terms, Examples
- Includes content control and quality validation rules
- Full documentation with usage examples and best practices

### Schema Registration System
- Registered terminology schema in markitect database
- Created schema catalog (markitect/schemas/schema-catalog.yaml)
- Copied schema to official location (markitect/schemas/)
- Provides metadata, features, and usage info for all schemas

### Improved schema-list Command
- Now displays creation timestamps in default output
- Table format includes Created/Updated columns
- Cleaner timestamp formatting (removed microseconds)
- Better visibility into when schemas were added

## Files Changed

Added:
- examples/terminology/README.md - Complete documentation
- examples/terminology/terminology-example.md - Example glossary
- examples/terminology/terminology-schema.json - Validation schema
- markitect/schemas/terminology-schema.json - Registered schema
- markitect/schemas/schema-catalog.yaml - Schema registry

Modified:
- markitect/cli.py - Enhanced schema-list with timestamps
- TODO.md - Documented Phase 2 completion and new example

Moved:
- SCHEMA_EVOLUTION_WORKPLAN.md → todo/ directory

## Schema Features Demonstrated

- Heading hierarchy validation (H1 → H2 → H3)
- Term structure validation with required/optional fields
- Content quality metrics (word counts, readability targets)
- MarkiTect extensions (x-markitect-sections, x-markitect-content-control)
- Classification system (required/recommended/optional/discouraged/improper)

## Usage

```bash
# List schemas with timestamps
markitect schema-list

# Validate terminology document
markitect validate glossary.md --schema terminology-schema.json

# View in table format
markitect schema-list --format table
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 23:07:36 +01:00
25 changed files with 7263 additions and 5 deletions

View File

@@ -8,9 +8,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
- **Schema Management System**: Comprehensive schema management infrastructure with naming conventions and versioning
- Naming convention: `{domain}-schema-v{major}.{minor}.md` for all schemas
- Markdown-first schema format with embedded JSON (documentation + schema in one file)
- Schema catalog (`markitect/schemas/schema-catalog.yaml`) for metadata and discovery
- Terminology validation example (`examples/terminology/`) demonstrating schema usage beyond manpages
- Schema-for-schemas workplan in `roadmap/schema-of-schemas/` directory
- **Enhanced schema-list Command**: Now displays creation timestamps in all output formats
- Default format shows timestamps inline: `schema-name.json (added: 2026-01-04T23:01:19)`
- Table format includes Created/Updated columns
- Cleaner timestamp formatting (removed microseconds)
- Enhanced control panel UI with better resize handle positioning for improved user interaction
### Changed
- **Directory Reorganization**: Renamed `todo/``roadmap/` for better organization of planning documents
- Created `roadmap/schema-of-schemas/` subdirectory for schema management planning artifacts
- Moved schema management proposals and workplan to dedicated directory
- Refactored contents control architecture to use base class pattern properly for better code organization
- Updated all file references and paths to point to single source of truth in capabilities/testdrive-jsui/js/controls/ directory
@@ -21,6 +34,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Removed
- **BREAKING**: Legacy DocumentControls component from TestDrive JSUI plugin system - all control panel functionality now provided by enhanced control panels (ContentsControl, StatusControl, DebugControl, EditControl) with Reset All button functionality moved to EditControl for better maintainability and elimination of code duplication
### In Progress
- **Schema-of-Schemas Implementation** (Phase 3 of 6 - Completed ✅)
- ✅ Phase 1: Filename validation for schema naming convention (`markitect/schema_naming.py`, 50 tests)
- ✅ Phase 2: Markdown schema loader to parse `.md` schema files (`markitect/schema_loader.py`, 35 tests)
- ✅ Phase 3: Schema-for-schemas metaschema for schema validation (`schema-schema-v1.0.md`, 12 tests)
- ⏳ Phase 4: Migration of 5 existing schemas to new format (will remove 2 duplicates)
- ⏳ Phase 5: CLI updates and documentation
- ⏳ Phase 6: Integration testing and validation
## [0.9.0] - 2025-11-14
### Added

107
TODO.md
View File

@@ -12,9 +12,51 @@ The structure organizes **future tasks** by their impact, just as a changelog or
This section is for tasks currently being discussed with or worked on by the coding assistant. These are the ephemeral, flow-of-thought tasks.
0. the file TODO.html is legacy i think and can be removed
### Schema-of-Schemas Implementation (Active - Phase 3)
### Extract Capability-Capability from Issue-Facade
**Status:** Phase 3 - Schema-for-Schemas Metaschema (Completed ✅)
**Workplan:** See `roadmap/schema-of-schemas/WORKPLAN.md`
**Current Goals:**
1. ✅ Establish naming convention: `{domain}-schema-v{major}.{minor}.md`
2. ✅ Implement filename validation logic
3. ✅ Create markdown schema loader
4. ✅ Create example markdown schema
5. ✅ Build schema-for-schemas metaschema
6. ⏳ Migrate existing schemas to new format (Next: Phase 4)
**Phase 1 Tasks (Completed ✅):**
- [x] Write `markitect/schema_naming.py` with validation logic
- [x] Add unit tests for filename validation (50 tests, 100% passing)
- [x] Create SCHEMA_NAMING_SPEC.md documentation
**Phase 2 Tasks (Completed ✅):**
- [x] Implement MarkdownSchemaLoader class (markitect/schema_loader.py, 515 lines)
- [x] Add frontmatter extraction (YAML)
- [x] Add JSON code block extraction with section preference
- [x] Add metadata merging with x-markitect-source tracking
- [x] Write comprehensive unit tests (35 tests, 100% passing)
- [x] Create example markdown schema (manpage-schema-v1.0.md)
- [x] Create SCHEMA_LOADER_GUIDE.md documentation
**Phase 3 Tasks (Completed ✅):**
- [x] Design schema-for-schemas metaschema (schema-schema-v1.0.md)
- [x] Implement metaschema with validation rules for MarkiTect conventions
- [x] Add schema-validate CLI command with detailed error reporting
- [x] Write comprehensive unit tests (12 tests, 100% passing)
- [x] Test metaschema self-validation
- [x] Validate existing schemas against metaschema
**Next Phases:**
- Phase 4: Schema Migration (1-2 days)
- Phase 5: CLI & Documentation Updates (1 day)
- Phase 6: Testing & Validation (1 day)
**Expected Completion:** 4-5 days remaining
---
### Extract Capability-Capability from Issue-Facade (Paused)
**Context:** Issue-facade currently provides two capabilities:
1. **issue-tracking** (explicit in CAPABILITY-issue-tracking.yaml) - Issue management across platforms
@@ -76,7 +118,7 @@ The **capability-capability** includes:
*Recent completed tasks have been documented in _issue-tracking/issue-facade/CHANGELOG.md following Keep a Changelog format.*
### 2026-01-04 - Phase 2: Schema Refinement Tools
### 2026-01-04 - Phase 2: Schema Refinement Tools & Terminology Example
- ✅ Implemented schema-analyze command to detect rigidity issues
- ✅ Implemented schema-refine command with automatic loosening logic
- ✅ Added interactive mode to schema-refine for fine-grained control
@@ -84,6 +126,8 @@ The **capability-capability** includes:
- ✅ Wrote user guide documentation with examples and workflows
- ✅ Successfully tested on example schemas (reduced rigidity from 60/100 to 24/100)
- ✅ Integrated into CLI with proper exit codes and error handling
- ✅ Moved SCHEMA_EVOLUTION_WORKPLAN.md to todo/ directory
- ✅ Created terminology validation example (examples/terminology/)
**Key Features Delivered:**
- Rigidity score calculation (0-100 scale)
@@ -93,6 +137,63 @@ The **capability-capability** includes:
- Interactive approval workflow
- Comprehensive reporting (normal and verbose modes)
**Terminology Example:**
- Complete terminology document structure (terminology-example.md)
- JSON schema with MarkiTect extensions (terminology-schema.json)
- Demonstrates schema usage for non-manpage documents
- Validates term definitions, synonyms, related terms, examples
- Includes content control and validation rules
- Full documentation and usage examples (README.md)
### 2026-01-04 - Phase 2: Markdown Schema Loader
- ✅ Implemented MarkdownSchemaLoader class (markitect/schema_loader.py, 515 lines)
- ✅ YAML frontmatter extraction with validation
- ✅ JSON code block extraction with "Schema Definition" section preference
- ✅ Metadata merging with x-markitect-source tracking
- ✅ Schema saving with template support and round-trip capability
- ✅ Comprehensive test suite (35 unit tests, 100% passing)
- ✅ Created example markdown schema (manpage-schema-v1.0.md)
- ✅ Created SCHEMA_LOADER_GUIDE.md with complete usage documentation
**Key Features Delivered:**
- Markdown-first schema format with embedded JSON
- Frontmatter metadata merges into schema ($id, version, status)
- Automatic detection of multiple JSON blocks
- Schema structure validation helper
- Error handling for binary files and invalid formats
- List JSON blocks helper for debugging
- Full round-trip save/load capability
**Example Markdown Schema:**
- manpage-schema-v1.0.md demonstrating complete format
- Includes frontmatter, documentation, and JSON schema
- Shows section classification and content control
- Follows naming convention: {domain}-schema-v{major}.{minor}.md
### 2026-01-04 - Phase 3: Schema-for-Schemas Metaschema
- ✅ Created schema-schema-v1.0.md metaschema (650+ lines)
- ✅ Validates core JSON Schema fields ($schema, $id, title, description)
- ✅ Validates MarkiTect version field (SemVer: major.minor.patch)
- ✅ Validates $id URL format (HTTPS with version)
- ✅ Validates MarkiTect extensions (x-markitect-sections, x-markitect-content-control, x-markitect-metadata)
- ✅ Implemented schema-validate CLI command with detailed error reporting
- ✅ Comprehensive test suite (12 unit tests, 100% passing)
- ✅ Metaschema self-validation successful
**Key Features Delivered:**
- Complete metaschema for validating all MarkiTect schemas
- Section classification validation (required, recommended, optional, discouraged, improper)
- Content control pattern validation
- Version format enforcement (SemVer)
- $id URL format enforcement (HTTPS with version)
- CLI command for easy schema validation
- Detailed error messages with schema paths
**Validation Results:**
- ✅ Metaschema validates itself
- ✅ Manpage schema validates successfully
- ⚠️ Terminology schema needs migration (missing version field, incorrect $id format)
### 2025-12-17 - Architecture Refactoring
- ✅ Implemented ReusableCapabilitiesArchitecture v0.1
- ✅ Added feedback capability to issue-facade

View File

@@ -0,0 +1,309 @@
# Manpage Schema v1.0
**Schema ID:** `https://markitect.dev/schemas/manpage/v1`
**Version:** 1.0.0
**Status:** Stable
**Created:** 2026-01-04
**Document Types:** Manual pages, CLI documentation
## Overview
This schema validates Unix/Linux manual page documentation following standard conventions. Manual pages (man pages) are the traditional form of documentation for Unix commands and system calls.
## Typical Structure
A well-formed manual page includes:
1. **NAME** - Command name and one-line description
2. **SYNOPSIS** - Command syntax showing all options
3. **DESCRIPTION** - Detailed explanation of what the command does
4. **OPTIONS** - Description of each command-line option
5. **EXAMPLES** - Practical usage examples
6. **SEE ALSO** - Related commands or documentation
## Usage
### Validate a Manpage
```bash
markitect validate mycommand.1.md --schema manpage-schema-v1
```
### Generate Manpage Template
```bash
markitect generate-stub manpage-schema-v1.json --output newcommand.1.md
```
### Refine Existing Schema
```bash
markitect schema-refine manpage-schema-v1.json --loosen-counts
```
## Examples
Complete examples can be found in:
- [examples/manpages/markdown-schema-validation.1.md](../manpages/markdown-schema-validation.1.md)
## Validation Rules
### Required Sections (Level 2 Headings)
**NAME**
- **Classification:** Required
- **Content:** Command name in bold followed by brief description
- **Format:** `**command** - one line description`
- **Example:** `**markitect** - validate and analyze markdown documents`
**SYNOPSIS**
- **Classification:** Required
- **Content:** Command syntax with all options
- **Min code blocks:** 1
- **Max paragraphs:** 3
- **Example:**
```bash
markitect [OPTIONS] COMMAND [ARGS]
```
**DESCRIPTION**
- **Classification:** Required
- **Content:** Detailed explanation of the command
- **Min paragraphs:** 2
- **Min words:** 50
### Recommended Sections
**OPTIONS**
- **Classification:** Recommended
- **Content:** Definition list or table of command-line options
- **Format:** Each option as `**--option** *type*` followed by description
**EXAMPLES**
- **Classification:** Recommended
- **Content:** Practical usage examples with explanations
- **Min code blocks:** 1
- **Benefit:** Greatly improves documentation usability
### Optional Sections
**ENVIRONMENT**
- Environment variables affecting command behavior
**FILES**
- Important files used or created by the command
**EXIT STATUS**
- Explanation of exit codes
**BUGS**
- Known issues or limitations
**AUTHORS**
- Command authors and contributors
**SEE ALSO**
- Related commands or documentation
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/manpage/v1",
"version": "1.0.0",
"title": "Unix Manual Page Schema",
"description": "Schema for validating Unix/Linux manual page documentation",
"type": "object",
"properties": {
"headings": {
"type": "object",
"properties": {
"level_1": {
"type": "array",
"description": "Document title (command name)",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"pattern": ".*"
}
}
},
"minItems": 1,
"maxItems": 1
},
"level_2": {
"type": "array",
"description": "Section headings",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"enum": [
"NAME",
"SYNOPSIS",
"DESCRIPTION",
"OPTIONS",
"EXAMPLES",
"ENVIRONMENT",
"FILES",
"EXIT STATUS",
"BUGS",
"AUTHORS",
"SEE ALSO"
]
}
}
},
"minItems": 3,
"maxItems": 20
}
},
"required": ["level_1", "level_2"]
},
"paragraphs": {
"type": "array",
"description": "Paragraph content",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"minLength": 1
}
}
},
"minItems": 3
},
"code_blocks": {
"type": "array",
"description": "Code examples and command syntax",
"items": {
"type": "object",
"properties": {
"language": {
"type": "string"
},
"content": {
"type": "string"
}
}
},
"minItems": 1
}
},
"required": ["headings", "paragraphs"],
"x-markitect-sections": {
"NAME": {
"classification": "required",
"heading_level": 2,
"content_instruction": "Command name in bold followed by brief description",
"pattern": "\\*\\*[a-z-]+\\*\\*.*",
"error_if_missing": "NAME section is required in all manual pages"
},
"SYNOPSIS": {
"classification": "required",
"heading_level": 2,
"content_instruction": "Command syntax showing all options and arguments",
"min_code_blocks": 1,
"error_if_missing": "SYNOPSIS section is required to show command usage"
},
"DESCRIPTION": {
"classification": "required",
"heading_level": 2,
"content_instruction": "Detailed explanation of what the command does and when to use it",
"min_paragraphs": 2,
"min_words": 50,
"error_if_missing": "DESCRIPTION section is required for detailed explanation"
},
"OPTIONS": {
"classification": "recommended",
"heading_level": 2,
"content_instruction": "Description of each command-line option with syntax and explanation",
"warning_if_missing": "OPTIONS section recommended for commands with options"
},
"EXAMPLES": {
"classification": "recommended",
"heading_level": 2,
"content_instruction": "Practical usage examples with explanations",
"min_code_blocks": 1,
"warning_if_missing": "EXAMPLES greatly improve documentation usability"
},
"ENVIRONMENT": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "Environment variables that affect command behavior"
},
"FILES": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "Important files used or created by the command"
},
"SEE ALSO": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "Related commands or documentation references"
}
},
"x-markitect-content-control": {
"name_section": {
"required_pattern": "\\*\\*[a-z-]+\\*\\*",
"description": "Command name must be in bold",
"example": "**markitect** - validate and analyze markdown documents"
},
"synopsis_section": {
"min_code_blocks": 1,
"code_block_language": "bash",
"description": "Must include at least one code block showing command syntax"
},
"description_section": {
"min_paragraphs": 2,
"min_words": 50,
"description": "Detailed description with at least 50 words across 2+ paragraphs"
}
},
"x-markitect-metadata": {
"schema-type": "document-schema",
"domain": "manpage",
"document-types": ["manual-page", "man-page", "cli-documentation"],
"version": "1.0.0",
"author": "MarkiTect Project",
"created": "2026-01-04",
"updated": "2026-01-04",
"example": "examples/manpages/markdown-schema-validation.1.md",
"supersedes": null,
"superseded-by": null
}
}
```
## Version History
### v1.0.0 (2026-01-04)
- Initial stable release
- Validates standard manpage sections
- Classification system (required/recommended/optional)
- Content control for NAME, SYNOPSIS, DESCRIPTION
- MarkiTect extensions for improved validation
## Future Enhancements
Planned for v2.0:
- Multi-language support (internationalization)
- Extended sections (DIAGNOSTICS, SECURITY, STANDARDS)
- Cross-reference validation
- Version-specific section variants (man1, man5, man8)
## Contributing
To suggest improvements to this schema:
1. Open an issue describing the enhancement
2. Provide example documents demonstrating the need
3. Propose specific schema changes
## License
This schema is part of the MarkiTect project and follows the same license.

View File

@@ -0,0 +1,287 @@
# Terminology Document Example
This example demonstrates how to use MarkiTect schemas to validate terminology and glossary documents.
## Overview
Terminology documents (glossaries, dictionaries, lexicons) benefit from consistent structure and validation. This example shows how to:
1. Structure terminology documents with clear categories and term definitions
2. Validate terminology documents using JSON schemas
3. Use MarkiTect's schema extensions for content control
## Files
- **terminology-example.md** - Example terminology document with proper structure
- **terminology-schema.json** - JSON schema for validating terminology documents
- **README.md** - This file
## Terminology Document Structure
A well-structured terminology document includes:
### Required Elements
1. **Main Title (Level 1 Heading)**
- Should include keywords: "Terminology", "Glossary", "Terms", or "Definitions"
2. **Category Sections (Level 2 Headings)**
- Organize terms into logical groups
- Examples: "Core Concepts", "Document Types", "Process Terms"
3. **Term Definitions (Level 3 Headings)**
- Each term as a level 3 heading
- Followed by structured content
### Term Structure
Each term should include:
**Required:**
- **Definition:** Clear, concise explanation of the term
**Optional (but recommended):**
- **Synonyms:** Alternative names or abbreviations
- **Related Terms:** Links to related concepts
- **Example:** Practical usage example
- **Use Cases:** Common scenarios
- **Format:** For document type terms
- **Components:** For complex concepts
- **Steps:** For process terms
## Usage
### Using the Registered Schema
The terminology schema is registered in markitect's database and can be used by name:
```bash
# List all registered schemas (terminology-schema.json should appear)
markitect schema-list
# Validate using the registered schema
markitect validate my-glossary.md --schema terminology-schema.json
# Or use the local file directly
markitect validate my-glossary.md --schema examples/terminology/terminology-schema.json
```
### Validate with Detailed Errors
```bash
markitect validate my-glossary.md --schema terminology-schema.json --detailed-errors
```
### Register the Schema (if needed)
If the schema isn't already registered, you can add it to markitect's database:
```bash
markitect schema-ingest markitect/schemas/terminology-schema.json
```
### Generate Schema from Example
```bash
markitect schema-generate terminology-example.md --output my-terminology-schema.json
```
## Schema Features
This schema demonstrates several MarkiTect features:
### 1. Structural Validation
- Enforces consistent heading hierarchy (H1 → H2 → H3)
- Validates minimum term count
- Ensures proper document structure
### 2. Content Pattern Validation
- Validates title pattern (must contain terminology-related keywords)
- Checks for required field labels (Definition:, Synonyms:, etc.)
- Enforces consistent formatting
### 3. MarkiTect Extensions
The schema uses MarkiTect-specific extensions:
#### `x-markitect-sections`
Defines section classifications and requirements:
- `document_title` (required)
- `category_sections` (required, min 1)
- `term_definitions` (required, min 1)
#### `x-markitect-content-control`
Specifies content requirements:
- Required vs optional components
- Content quality metrics (word counts)
- Content instructions for authors
#### `x-markitect-validation-rules`
Custom validation rules:
- Minimum term count (3 required, 10+ recommended)
- Category balance (min 2 terms per category)
- Definition quality checks
- Consistency validation
## Best Practices
### 1. Use Consistent Field Labels
Always use the same labels for metadata:
```markdown
**Definition:** ...
**Synonyms:** ...
**Related Terms:** ...
```
### 2. Write Clear Definitions
- Start with the term's primary meaning
- Use 10-200 words
- Be self-contained (don't require reading other terms)
- Avoid circular definitions
### 3. Group Related Terms
Organize terms into logical categories:
- Core Concepts
- Document Types
- Process Terms
- Quality Attributes
- Deprecated Terms
### 4. Include Examples
Add practical examples for complex terms:
```markdown
**Example:**
\`\`\`markdown
# Heading
Paragraph text
\`\`\`
```
### 5. Link Related Terms
Use **Related Terms:** to create a terminology graph:
```markdown
**Related Terms:** Parser, Token, Node
```
## Extending the Schema
You can customize the schema for your project:
### Add Custom Field Labels
Extend the `bold_text` enum:
```json
"enum": [
"Definition:",
"Synonyms:",
"Your Custom Label:"
]
```
### Adjust Quality Metrics
Modify content quality requirements:
```json
"content_quality": {
"min_words_per_definition": 20,
"max_words_per_definition": 300,
"readability_target": "business"
}
```
### Add Domain-Specific Validation
Include specialized validation rules:
```json
"x-markitect-validation-rules": {
"domain_specific": {
"require_acronym_expansion": true,
"require_source_citations": true
}
}
```
## Use Cases
### Documentation Projects
- Software project glossaries
- API terminology reference
- Architecture decision records (ADR) glossary
- Domain-driven design (DDD) ubiquitous language
### Technical Writing
- Standards documentation
- Compliance documentation (ISO, SOC2)
- Technical specifications
- Research papers
### Knowledge Management
- Company wikis
- Team handbooks
- Onboarding documentation
- Training materials
## Integration with CI/CD
### Pre-commit Hook
```bash
#!/bin/bash
# .git/hooks/pre-commit
markitect validate docs/glossary.md --schema schemas/terminology-schema.json
```
### GitHub Actions
```yaml
name: Validate Terminology
on: [push, pull_request]
jobs:
validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install MarkiTect
run: pip install markitect
- name: Validate Glossary
run: |
markitect validate docs/glossary.md \
--schema schemas/terminology-schema.json \
--detailed-errors
```
## Related Examples
- **manpages/** - Manual page documentation validation
- **templates/** - Document template examples
- **design-patterns/** - Software pattern documentation
## Learn More
- [Schema Extensions Specification](../../docs/specifications/schema-extensions-spec.md)
- [MarkiTect Documentation](../../README.md)
- [JSON Schema Documentation](https://json-schema.org/)
## Contributing
To improve this example:
1. Add more terms to demonstrate edge cases
2. Enhance the schema with additional validation rules
3. Create alternative terminology document styles
4. Add multilingual terminology examples
## License
This example is part of the MarkiTect project and follows the same license.

View File

@@ -0,0 +1,91 @@
# Project Terminology
A glossary of key terms and concepts for this project.
## Core Concepts
### Abstract Syntax Tree
**Definition:** A tree representation of the abstract syntactic structure of source code or markup, where each node represents a construct occurring in the source.
**Synonyms:** AST, Parse Tree
**Related Terms:** Parser, Token, Node
**Example:**
```markdown
# Heading
Paragraph text
```
The AST representation would include nodes for heading (level 1) and paragraph elements.
### Schema Validation
**Definition:** The process of verifying that a document's structure conforms to a predefined schema specification.
**Synonyms:** Structural Validation, Schema Conformance
**Related Terms:** JSON Schema, Validator, Metaschema
**Use Cases:**
- Ensuring documentation completeness
- Enforcing content standards
- Automated quality checks
## Document Types
### Manpage
**Definition:** A manual page document following Unix/Linux documentation conventions, typically including sections like SYNOPSIS, DESCRIPTION, and OPTIONS.
**Format:** Markdown with specific heading structure
**Related Terms:** Documentation, Manual, Help Text
### Blueprint
**Definition:** A template specification that combines schemas, content instructions, and data templates for generating documents.
**Components:**
- Schema references
- Content model
- Data schema
- Generation rules
## Process Terms
### Schema Refinement
**Definition:** The process of transforming a rigid, auto-generated schema into a flexible, classification-aware schema with content guidance.
**Steps:**
1. Analyze existing schema for rigidity
2. Loosen exact constraints to ranges
3. Add classification metadata
4. Include content instructions
**Tools:** `markitect schema-analyze`, `markitect schema-refine`
## Quality Attributes
### Classification Levels
**Definition:** A hierarchical system for categorizing document elements based on their importance and requirements.
**Levels:**
- **Required**: Must be present (validation fails if missing)
- **Recommended**: Should be present (warning if missing)
- **Optional**: May be present (no impact)
- **Discouraged**: Should not be present (warning if present)
- **Improper**: Must not be present (validation fails if present)
## Deprecated Terms
### Rigid Schema
**Status:** DEPRECATED - Use "Unrefined Schema" instead
**Definition:** A schema with exact count constraints that make it unusable as a pattern.
**Migration:** Use schema refinement tools to convert to flexible schemas.

View File

@@ -0,0 +1,214 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/terminology-v1.json",
"title": "Terminology Document Schema",
"description": "Schema for validating terminology and glossary documents with consistent structure",
"type": "object",
"properties": {
"headings": {
"type": "object",
"properties": {
"level_1": {
"type": "array",
"description": "Main document title",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"pattern": ".*(Terminology|Glossary|Terms|Definitions).*"
}
}
},
"minItems": 1,
"maxItems": 1
},
"level_2": {
"type": "array",
"description": "Category headings (Core Concepts, Document Types, etc.)",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"minLength": 1
}
}
},
"minItems": 1,
"maxItems": 20
},
"level_3": {
"type": "array",
"description": "Individual term headings",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"minLength": 1,
"description": "Term name - should be title case"
}
}
},
"minItems": 1
}
},
"required": ["level_1", "level_2", "level_3"]
},
"paragraphs": {
"type": "array",
"description": "Content paragraphs including definitions and descriptions",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"minLength": 10
}
}
},
"minItems": 3
},
"bold_text": {
"type": "array",
"description": "Bold text used for field labels (Definition, Synonyms, etc.)",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"enum": [
"Definition:",
"Synonyms:",
"Related Terms:",
"Example:",
"Examples:",
"Use Cases:",
"Usage:",
"Format:",
"Components:",
"Steps:",
"Tools:",
"Levels:",
"Status:",
"Migration:",
"Required:",
"Recommended:",
"Optional:",
"Discouraged:",
"Improper:"
]
}
}
},
"minItems": 1
}
},
"required": ["headings", "paragraphs"],
"x-markitect-sections": {
"document_title": {
"classification": "required",
"heading_level": 1,
"content_instruction": "Main title should include words like 'Terminology', 'Glossary', or 'Definitions'",
"pattern": ".*(Terminology|Glossary|Terms|Definitions).*"
},
"category_sections": {
"classification": "required",
"heading_level": 2,
"min_sections": 1,
"content_instruction": "Organize terms into logical categories (e.g., Core Concepts, Document Types, Process Terms)"
},
"term_definitions": {
"classification": "required",
"heading_level": 3,
"min_sections": 1,
"content_instruction": "Each term should be a level 3 heading followed by its definition and optional metadata"
}
},
"x-markitect-content-control": {
"term_structure": {
"required_components": [
{
"label": "Definition:",
"type": "bold_text",
"description": "Clear, concise definition of the term"
}
],
"optional_components": [
{
"label": "Synonyms:",
"type": "bold_text",
"description": "Alternative names or abbreviations"
},
{
"label": "Related Terms:",
"type": "bold_text",
"description": "Links to related concepts"
},
{
"label": "Example:",
"type": "bold_text_or_code",
"description": "Practical example demonstrating the term"
},
{
"label": "Use Cases:",
"type": "list",
"description": "Common scenarios where term applies"
}
],
"content_quality": {
"min_words_per_definition": 10,
"max_words_per_definition": 200,
"readability_target": "technical"
},
"content_instructions": [
"Start each term with a level 3 heading containing the term name",
"Follow immediately with 'Definition:' in bold",
"Provide a clear, self-contained definition",
"Add optional fields (Synonyms, Related Terms, Examples) as needed",
"Use consistent formatting across all terms",
"Group related terms under category headings (level 2)"
]
},
"definition_pattern": {
"description": "Each definition should follow: Term heading (###) → Definition: (bold) → Definition text",
"validation": {
"heading_level_3_followed_by": "bold_text_starting_with_Definition",
"definition_length": {
"min_words": 10,
"max_words": 200
}
}
},
"deprecated_terms": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "Optional section for deprecated terms with migration guidance",
"required_fields": [
"Status: DEPRECATED",
"Migration:"
]
}
},
"x-markitect-validation-rules": {
"term_count": {
"min": 3,
"recommended_min": 10,
"description": "Terminology document should define at least 3 terms, 10+ recommended"
},
"category_balance": {
"description": "Each category should have at least 2 terms",
"min_terms_per_category": 2
},
"definition_quality": {
"all_terms_must_have_definition": true,
"definition_must_follow_term_heading": true,
"definition_min_words": 10
},
"consistency": {
"use_consistent_field_labels": true,
"maintain_heading_hierarchy": true
}
}
}

View File

@@ -1740,15 +1740,46 @@ def schema_list(config, output_format, names_only):
click.echo()
for schema_info in schemas:
click.echo(f"🔧 {schema_info['filename']}")
# Format timestamp for display (remove microseconds)
created = schema_info['created_at']
if created:
# Format: YYYY-MM-DD HH:MM:SS (remove microseconds if present)
if '.' in created:
created_display = created.split('.')[0]
else:
created_display = created
click.echo(f"🔧 {schema_info['filename']:<40} (added: {created_display})")
else:
click.echo(f"🔧 {schema_info['filename']}")
if config.get('verbose'):
click.echo(f" Title: {schema_info['title']}")
click.echo(f" Created: {schema_info['created_at']}")
click.echo(f" Updated: {schema_info['updated_at']}")
if schema_info['description']:
click.echo(f" Description: {schema_info['description']}")
click.echo()
elif output_format == 'table':
# Custom table format for better readability
table_data = []
for schema in schemas:
# Format timestamps (remove microseconds)
created_date = schema['created_at'].split('.')[0] if schema['created_at'] and '.' in schema['created_at'] else schema['created_at']
updated_date = schema['updated_at'].split('.')[0] if schema['updated_at'] and '.' in schema['updated_at'] else schema['updated_at']
table_data.append({
'Name': schema['filename'],
'Title': schema['title'] or '',
'Created': created_date or '',
'Updated': updated_date or ''
})
if table_data:
headers = ['Name', 'Title', 'Created', 'Updated']
rows = [[row[h] for h in headers] for row in table_data]
click.echo(tabulate(rows, headers=headers, tablefmt='simple'))
else:
# Use structured format (table, json, yaml)
# Use structured format (json, yaml)
formatted_output = format_output(schemas, output_format)
click.echo(formatted_output)
@@ -1872,6 +1903,110 @@ def schema_delete(config, schema_name, confirm):
sys.exit(1)
@cli.command('schema-validate')
@click.argument('schema_file', type=click.Path(exists=True, path_type=Path))
@click.option('--detailed-errors', is_flag=True, help='Show detailed validation errors')
@pass_config
def schema_validate_cmd(config, schema_file, detailed_errors):
"""
Validate a schema file against the schema-for-schemas metaschema.
Ensures schema files follow MarkiTect conventions and standards:
- Required fields ($schema, $id, title, description, version)
- Version format (SemVer: major.minor.patch)
- $id URL format (HTTPS with version)
- MarkiTect extensions (x-markitect-*)
- Section classification structures
SCHEMA_FILE: Path to the schema file to validate (markdown or JSON)
Examples:
markitect schema-validate manpage-schema-v1.0.md
markitect schema-validate my-schema-v2.0.md --detailed-errors
"""
try:
from .schema_loader import MarkdownSchemaLoader
try:
import jsonschema
from jsonschema import Draft7Validator, ValidationError
except ImportError:
click.echo("❌ Error: jsonschema package not installed", err=True)
click.echo("Install it with: pip install jsonschema", err=True)
sys.exit(1)
loader = MarkdownSchemaLoader()
# Load the schema to validate
click.echo(f"Loading schema: {schema_file.name}")
try:
if schema_file.suffix == '.md':
schema_data = loader.load_schema(schema_file)
schema = schema_data['schema']
else:
# Assume JSON
schema = json.loads(schema_file.read_text())
except Exception as e:
click.echo(f"❌ Failed to load schema: {e}", err=True)
sys.exit(1)
# Load metaschema
metaschema_path = Path(__file__).parent / 'schemas' / 'schema-schema-v1.0.md'
if not metaschema_path.exists():
click.echo(f"❌ Metaschema not found: {metaschema_path}", err=True)
sys.exit(1)
try:
metaschema_data = loader.load_schema(metaschema_path)
metaschema = metaschema_data['schema']
except Exception as e:
click.echo(f"❌ Failed to load metaschema: {e}", err=True)
sys.exit(1)
# Validate schema against metaschema
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(schema))
if not errors:
click.echo(f"✅ Schema is valid: {schema_file.name}")
click.echo(f" Title: {schema.get('title', 'N/A')}")
click.echo(f" Version: {schema.get('version', 'N/A')}")
click.echo(f" $id: {schema.get('$id', 'N/A')}")
# Additional structure validation
issues = loader.validate_schema_structure(schema)
if issues:
click.echo(f"\n⚠️ Structure recommendations:")
for issue in issues:
click.echo(f" - {issue}")
else:
click.echo(f"❌ Schema validation failed: {schema_file.name}", err=True)
click.echo(f"\nFound {len(errors)} validation error(s):\n", err=True)
for i, error in enumerate(errors, 1):
path = ''.join(str(p) for p in error.path) if error.path else 'root'
click.echo(f"{i}. At {path}:", err=True)
click.echo(f" {error.message}", err=True)
if detailed_errors and error.context:
click.echo(f" Context:", err=True)
for ctx_error in error.context:
click.echo(f" - {ctx_error.message}", err=True)
if detailed_errors:
click.echo(f" Schema path: {''.join(str(p) for p in error.schema_path)}", err=True)
click.echo()
sys.exit(1)
except Exception as e:
click.echo(f"❌ Schema validation error: {e}", err=True)
if config and config.get('verbose'):
import traceback
click.echo(traceback.format_exc(), err=True)
sys.exit(1)
@cli.command('schema-analyze')
@click.argument('schema_file', type=click.Path(exists=True))
@click.option('--verbose', '-v', is_flag=True, help='Show detailed analysis')

503
markitect/schema_loader.py Normal file
View File

@@ -0,0 +1,503 @@
"""
Schema Loader - Extract JSON schemas from markdown files.
This module provides functionality to load schemas from markdown files that
contain embedded JSON schemas in code blocks, along with YAML frontmatter
metadata and rich documentation.
Markdown Schema Format:
---
schema-id: "https://markitect.dev/schemas/domain/v1"
version: "1.0.0"
status: "stable|draft|deprecated"
---
# Schema Title v1.0
## Documentation sections...
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
...
}
```
This enables:
- Rich documentation alongside schemas
- Version history in same file
- Human-readable schema files
- Markdown-first approach aligned with MarkiTect philosophy
"""
import re
import json
import yaml
from pathlib import Path
from typing import Dict, Any, Optional, List, Tuple
class SchemaLoaderError(Exception):
"""Base exception for schema loading errors."""
pass
class InvalidSchemaFormatError(SchemaLoaderError):
"""Schema file format is invalid."""
pass
class SchemaNotFoundError(SchemaLoaderError):
"""No JSON schema found in markdown file."""
pass
class MarkdownSchemaLoader:
"""
Load and parse markdown schema files.
Supports:
- YAML frontmatter for metadata
- JSON code blocks for schema definition
- Validation of schema structure
- Metadata merging
Example:
>>> loader = MarkdownSchemaLoader()
>>> schema_data = loader.load_schema(Path("manpage-schema-v1.0.md"))
>>> schema = schema_data['schema']
>>> metadata = schema_data['metadata']
"""
def __init__(self):
"""Initialize the schema loader with regex patterns."""
# Pattern to match YAML frontmatter
# Matches: --- ... --- at start of file
self.frontmatter_pattern = re.compile(
r'^---\s*\n(.*?)\n---\s*\n',
re.DOTALL | re.MULTILINE
)
# Pattern to match JSON code blocks
# Matches: ```json ... ```
self.json_code_block_pattern = re.compile(
r'```json\s*\n(.*?)\n```',
re.DOTALL | re.MULTILINE
)
# Pattern to find Schema Definition section
# This helps us find the right JSON block if there are multiple
self.schema_section_pattern = re.compile(
r'##\s+Schema Definition\s*\n',
re.MULTILINE
)
def load_schema(self, md_path: Path) -> Dict[str, Any]:
"""
Load schema from markdown file.
Args:
md_path: Path to markdown schema file
Returns:
Dictionary containing:
- schema: Extracted JSON schema (dict)
- metadata: Frontmatter metadata (dict)
- documentation: Full markdown content (str)
- source_file: Source file path (str)
Raises:
FileNotFoundError: If schema file doesn't exist
InvalidSchemaFormatError: If file format is invalid
SchemaNotFoundError: If no JSON schema found
Example:
>>> loader = MarkdownSchemaLoader()
>>> data = loader.load_schema(Path("manpage-schema-v1.0.md"))
>>> print(data['schema']['title'])
'Unix Manual Page Schema'
"""
if not md_path.exists():
raise FileNotFoundError(f"Schema file not found: {md_path}")
# Read file content
try:
content = md_path.read_text(encoding='utf-8')
except Exception as e:
raise InvalidSchemaFormatError(f"Failed to read schema file: {e}")
# Extract frontmatter
metadata = self._extract_frontmatter(content)
# Extract JSON schema
schema = self._extract_json_schema(content)
if not schema:
raise SchemaNotFoundError(
f"No JSON schema found in {md_path}. "
f"Expected a ```json code block with schema definition."
)
# Merge metadata into schema
schema = self._merge_metadata(schema, metadata, md_path)
return {
'schema': schema,
'metadata': metadata,
'documentation': content,
'source_file': str(md_path)
}
def _extract_frontmatter(self, content: str) -> Dict[str, Any]:
"""
Extract YAML frontmatter from markdown content.
Args:
content: Markdown file content
Returns:
Dictionary of frontmatter metadata (empty if none found)
Raises:
InvalidSchemaFormatError: If YAML is malformed
"""
match = self.frontmatter_pattern.search(content)
if not match:
return {}
yaml_content = match.group(1)
try:
metadata = yaml.safe_load(yaml_content) or {}
if not isinstance(metadata, dict):
raise InvalidSchemaFormatError(
f"Frontmatter must be a YAML dictionary, got {type(metadata)}"
)
return metadata
except yaml.YAMLError as e:
raise InvalidSchemaFormatError(f"Invalid YAML frontmatter: {e}")
def _extract_json_schema(self, content: str) -> Optional[Dict[str, Any]]:
"""
Extract JSON schema from markdown code blocks.
Prefers JSON blocks under "## Schema Definition" section,
but will use first JSON block if no Schema Definition section found.
Args:
content: Markdown file content
Returns:
JSON schema dictionary or None if not found
Raises:
InvalidSchemaFormatError: If JSON is malformed
"""
# Find all JSON code blocks
json_blocks = self.json_code_block_pattern.findall(content)
if not json_blocks:
return None
# Try to find the Schema Definition section
schema_section_match = self.schema_section_pattern.search(content)
if schema_section_match:
# Find JSON block that comes after Schema Definition section
section_pos = schema_section_match.end()
# Re-search for JSON blocks starting from section position
remaining_content = content[section_pos:]
section_json_blocks = self.json_code_block_pattern.findall(remaining_content)
if section_json_blocks:
json_text = section_json_blocks[0]
else:
# Fallback to first JSON block in entire document
json_text = json_blocks[0]
else:
# No Schema Definition section, use first JSON block
json_text = json_blocks[0]
# Parse JSON
try:
schema = json.loads(json_text)
if not isinstance(schema, dict):
raise InvalidSchemaFormatError(
f"Schema must be a JSON object, got {type(schema)}"
)
return schema
except json.JSONDecodeError as e:
raise InvalidSchemaFormatError(f"Invalid JSON schema: {e}")
def _merge_metadata(
self,
schema: Dict[str, Any],
metadata: Dict[str, Any],
source_file: Path
) -> Dict[str, Any]:
"""
Merge frontmatter metadata into schema.
Adds x-markitect-source extension with file info and metadata.
Optionally overrides schema fields with frontmatter values.
Args:
schema: JSON schema dictionary
metadata: Frontmatter metadata dictionary
source_file: Path to source file
Returns:
Schema with merged metadata
"""
# Create a copy to avoid modifying original
merged_schema = schema.copy()
# Add MarkiTect-specific source metadata
merged_schema['x-markitect-source'] = {
'file': str(source_file),
'filename': source_file.name,
'format': 'markdown',
'frontmatter': metadata
}
# Override schema fields with frontmatter if present
# This allows frontmatter to be the source of truth for metadata
if 'version' in metadata:
merged_schema['version'] = metadata['version']
if 'schema-id' in metadata:
merged_schema['$id'] = metadata['schema-id']
if 'status' in metadata:
if 'x-markitect-metadata' not in merged_schema:
merged_schema['x-markitect-metadata'] = {}
merged_schema['x-markitect-metadata']['status'] = metadata['status']
return merged_schema
def save_schema(
self,
schema: Dict[str, Any],
md_path: Path,
template: Optional[str] = None,
frontmatter: Optional[Dict[str, Any]] = None
):
"""
Save schema as markdown file.
Args:
schema: JSON schema dictionary to save
md_path: Output path for markdown file
template: Optional markdown template string
frontmatter: Optional frontmatter metadata (extracted from schema if not provided)
Raises:
InvalidSchemaFormatError: If schema is invalid
Example:
>>> loader = MarkdownSchemaLoader()
>>> loader.save_schema(
... schema={'title': 'My Schema', ...},
... md_path=Path('my-schema-v1.0.md')
... )
"""
if template:
# Use provided template
content = self._render_template(template, schema, frontmatter)
else:
# Generate basic markdown
content = self._generate_markdown(schema, frontmatter)
# Create parent directory if needed
md_path.parent.mkdir(parents=True, exist_ok=True)
# Write file
try:
md_path.write_text(content, encoding='utf-8')
except Exception as e:
raise InvalidSchemaFormatError(f"Failed to write schema file: {e}")
def _generate_markdown(
self,
schema: Dict[str, Any],
frontmatter: Optional[Dict[str, Any]] = None
) -> str:
"""
Generate markdown from schema.
Args:
schema: JSON schema dictionary
frontmatter: Optional frontmatter metadata
Returns:
Markdown content as string
"""
# Extract metadata from schema
title = schema.get('title', 'Untitled Schema')
version = schema.get('version', '1.0.0')
description = schema.get('description', '')
schema_id = schema.get('$id', '')
# Build frontmatter
if frontmatter is None:
frontmatter = {}
# Set defaults
if 'schema-id' not in frontmatter and schema_id:
frontmatter['schema-id'] = schema_id
if 'version' not in frontmatter:
frontmatter['version'] = version
if 'status' not in frontmatter:
frontmatter['status'] = 'draft'
# Generate frontmatter YAML
frontmatter_yaml = yaml.dump(
frontmatter,
default_flow_style=False,
allow_unicode=True
).strip()
# Generate JSON (pretty-printed)
schema_json = json.dumps(schema, indent=2, ensure_ascii=False)
# Build markdown content
md_content = f"""---
{frontmatter_yaml}
---
# {title} v{version}
## Overview
{description}
## Usage
```bash
markitect validate document.md --schema {Path(frontmatter.get('schema-id', 'schema')).name}
```
## Schema Definition
```json
{schema_json}
```
## Version History
### v{version}
- Initial version
"""
return md_content
def _render_template(
self,
template: str,
schema: Dict[str, Any],
frontmatter: Optional[Dict[str, Any]] = None
) -> str:
"""
Render markdown from template.
Simple template rendering using string formatting.
For complex templates, consider using Jinja2 or similar.
Args:
template: Template string
schema: JSON schema dictionary
frontmatter: Optional frontmatter metadata
Returns:
Rendered markdown content
"""
# Build context for template
context = {
'title': schema.get('title', 'Untitled'),
'version': schema.get('version', '1.0.0'),
'description': schema.get('description', ''),
'schema_id': schema.get('$id', ''),
'schema_json': json.dumps(schema, indent=2, ensure_ascii=False),
'frontmatter': frontmatter or {},
}
# Simple template rendering
try:
return template.format(**context)
except KeyError as e:
raise InvalidSchemaFormatError(f"Template missing key: {e}")
def list_json_blocks(self, content: str) -> List[Tuple[int, str]]:
"""
List all JSON code blocks in markdown content.
Useful for debugging or when multiple JSON blocks exist.
Args:
content: Markdown file content
Returns:
List of (position, json_content) tuples
Example:
>>> loader = MarkdownSchemaLoader()
>>> content = Path('schema.md').read_text()
>>> blocks = loader.list_json_blocks(content)
>>> print(f"Found {len(blocks)} JSON blocks")
"""
blocks = []
for match in self.json_code_block_pattern.finditer(content):
blocks.append((match.start(), match.group(1)))
return blocks
def validate_schema_structure(self, schema: Dict[str, Any]) -> List[str]:
"""
Validate basic schema structure.
Checks for required JSON Schema fields and MarkiTect conventions.
Args:
schema: JSON schema dictionary
Returns:
List of warning/error messages (empty if valid)
Example:
>>> loader = MarkdownSchemaLoader()
>>> issues = loader.validate_schema_structure(schema)
>>> if issues:
... print("Schema issues:", issues)
"""
issues = []
# Check required JSON Schema fields
if '$schema' not in schema:
issues.append("Missing required field: $schema")
if 'type' not in schema:
issues.append("Missing recommended field: type")
if 'title' not in schema:
issues.append("Missing recommended field: title")
if 'description' not in schema:
issues.append("Missing recommended field: description")
# Check MarkiTect conventions
if 'version' not in schema:
issues.append("Missing MarkiTect convention: version field")
if '$id' not in schema:
issues.append("Missing recommended field: $id")
# Check $id format if present
if '$id' in schema:
schema_id = schema['$id']
if not isinstance(schema_id, str):
issues.append("$id must be a string")
elif not schema_id.startswith('https://'):
issues.append("$id should be a full HTTPS URL")
return issues

309
markitect/schema_naming.py Normal file
View File

@@ -0,0 +1,309 @@
"""
Schema Naming Validation - Enforce filename conventions for schemas.
This module provides validation and utilities for schema filename conventions
to ensure consistency across the MarkiTect schema ecosystem.
Naming Convention:
Format: {domain}-schema-v{major}.{minor}.md
Components:
- domain: lowercase, hyphen-separated identifier (e.g., "manpage", "api-documentation")
- schema: literal string "schema"
- version: SemVer major.minor (e.g., "v1.0", "v2.1")
- extension: ".md" (markdown)
Valid Examples:
✓ manpage-schema-v1.0.md
✓ terminology-schema-v1.0.md
✓ api-documentation-schema-v1.0.md
✓ my-custom-type-schema-v2.1.md
Invalid Examples:
✗ manpage.json (missing version and wrong extension)
✗ manpage-v1.md (missing "schema" keyword)
✗ ManPage-Schema-v1.0.md (wrong case - must be lowercase)
✗ manpage-schema-1.0.md (missing 'v' prefix)
✗ manpage-schema-v1.md (missing minor version)
"""
import re
from pathlib import Path
from typing import Tuple, Optional, Dict, Any
# Regex pattern for schema filename validation
# Matches: {domain}-schema-v{major}.{minor}.md
# Where domain is lowercase letters/numbers/hyphens starting with letter
SCHEMA_FILENAME_PATTERN = re.compile(
r'^(?P<domain>[a-z][a-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$'
)
class SchemaFilenameError(Exception):
"""Exception raised for invalid schema filenames."""
pass
def validate_schema_filename(filename: str) -> Tuple[bool, Optional[Dict[str, Any]]]:
"""
Validate schema filename against naming convention.
Args:
filename: The filename to validate (e.g., "manpage-schema-v1.0.md")
Returns:
Tuple of (is_valid, metadata_dict or None)
If valid, metadata_dict contains:
- domain: str - The domain identifier
- version: str - Full version string (e.g., "1.0")
- major: int - Major version number
- minor: int - Minor version number
- filename: str - The original filename
If invalid, metadata_dict is None
Examples:
>>> validate_schema_filename("manpage-schema-v1.0.md")
(True, {'domain': 'manpage', 'version': '1.0', ...})
>>> validate_schema_filename("invalid.json")
(False, None)
"""
match = SCHEMA_FILENAME_PATTERN.match(filename)
if not match:
return False, None
return True, {
'domain': match.group('domain'),
'version': f"{match.group('major')}.{match.group('minor')}",
'major': int(match.group('major')),
'minor': int(match.group('minor')),
'filename': filename
}
def suggest_schema_filename(
domain: str,
version: str = "1.0",
normalize: bool = True
) -> str:
"""
Generate a valid schema filename from domain and version.
Args:
domain: The schema domain (e.g., "manpage", "API Documentation")
version: Version string in format "major.minor" (default: "1.0")
normalize: Whether to normalize domain to lowercase/hyphenated
Returns:
Valid schema filename
Raises:
ValueError: If domain or version format is invalid
Examples:
>>> suggest_schema_filename("manpage", "1.0")
'manpage-schema-v1.0.md'
>>> suggest_schema_filename("API Documentation", "2.1")
'api-documentation-schema-v2.1.md'
>>> suggest_schema_filename("My_Custom_Type", "1.0")
'my-custom-type-schema-v1.0.md'
"""
if not domain:
raise ValueError("Domain cannot be empty")
if normalize:
# Normalize domain: lowercase, replace spaces/underscores with hyphens
domain_clean = domain.lower()
domain_clean = domain_clean.replace(' ', '-').replace('_', '-')
# Remove consecutive hyphens
domain_clean = re.sub(r'-+', '-', domain_clean)
# Remove leading/trailing hyphens
domain_clean = domain_clean.strip('-')
else:
domain_clean = domain
# Validate domain format (must start with letter, contain only lowercase, numbers, hyphens)
if not re.match(r'^[a-z][a-z0-9-]*$', domain_clean):
raise ValueError(
f"Invalid domain '{domain_clean}': must start with lowercase letter "
"and contain only lowercase letters, numbers, and hyphens"
)
# Parse and validate version
version_parts = version.split('.')
if len(version_parts) != 2:
raise ValueError(
f"Invalid version '{version}': must be in format 'major.minor' (e.g., '1.0')"
)
try:
major = int(version_parts[0])
minor = int(version_parts[1])
except ValueError:
raise ValueError(
f"Invalid version '{version}': major and minor must be integers"
)
if major < 0 or minor < 0:
raise ValueError(
f"Invalid version '{version}': major and minor must be non-negative"
)
return f"{domain_clean}-schema-v{major}.{minor}.md"
def extract_schema_metadata(filename: str) -> Dict[str, Any]:
"""
Extract metadata from a valid schema filename.
Args:
filename: Schema filename to parse
Returns:
Dictionary with metadata
Raises:
SchemaFilenameError: If filename is invalid
Examples:
>>> extract_schema_metadata("manpage-schema-v1.0.md")
{'domain': 'manpage', 'version': '1.0', 'major': 1, 'minor': 0}
"""
is_valid, metadata = validate_schema_filename(filename)
if not is_valid:
raise SchemaFilenameError(
f"Invalid schema filename: {filename}\n"
f"Expected format: {{domain}}-schema-v{{major}}.{{minor}}.md"
)
return metadata
def get_validation_errors(filename: str) -> list:
"""
Get detailed validation errors for a filename.
Args:
filename: Filename to validate
Returns:
List of error messages (empty if valid)
Examples:
>>> get_validation_errors("manpage-schema-v1.0.md")
[]
>>> get_validation_errors("invalid.json")
['Filename does not match pattern: {domain}-schema-v{major}.{minor}.md', ...]
"""
errors = []
# Check basic pattern match
is_valid, _ = validate_schema_filename(filename)
if is_valid:
return errors
# Provide detailed feedback
errors.append(
f"Filename does not match pattern: {{domain}}-schema-v{{major}}.{{minor}}.md"
)
# Check extension
if not filename.endswith('.md'):
errors.append(f"Extension must be '.md', got: {Path(filename).suffix}")
# Check for version
if '-v' not in filename:
errors.append("Missing version: filename must include '-v{major}.{minor}'")
elif not re.search(r'-v\d+\.\d+', filename):
errors.append(
"Invalid version format: must be '-v{major}.{minor}' (e.g., '-v1.0')"
)
# Check for schema keyword
if '-schema-' not in filename:
errors.append("Missing '-schema-' keyword in filename")
# Check for uppercase (must be lowercase)
if any(c.isupper() for c in filename):
errors.append("Filename must be lowercase")
# Check domain format (if we can isolate it)
parts = filename.split('-schema-')
if len(parts) >= 1:
domain = parts[0]
if domain and not re.match(r'^[a-z][a-z0-9-]*$', domain):
errors.append(
f"Invalid domain '{domain}': must start with lowercase letter "
"and contain only lowercase letters, numbers, and hyphens"
)
return errors
def is_valid_schema_filename(filename: str) -> bool:
"""
Check if filename is valid (convenience function).
Args:
filename: Filename to check
Returns:
True if valid, False otherwise
Examples:
>>> is_valid_schema_filename("manpage-schema-v1.0.md")
True
>>> is_valid_schema_filename("invalid.json")
False
"""
is_valid, _ = validate_schema_filename(filename)
return is_valid
def format_validation_message(filename: str) -> str:
"""
Format a user-friendly validation message.
Args:
filename: Filename that failed validation
Returns:
Formatted error message with suggestions
Examples:
>>> print(format_validation_message("manpage.json"))
❌ Invalid schema filename: manpage.json
...
"""
errors = get_validation_errors(filename)
if not errors:
return f"✅ Valid schema filename: {filename}"
message = f"❌ Invalid schema filename: {filename}\n\n"
message += "Errors:\n"
for i, error in enumerate(errors, 1):
message += f" {i}. {error}\n"
message += "\nExpected format: {domain}-schema-v{major}.{minor}.md\n"
message += "Example: manpage-schema-v1.0.md\n"
# Try to suggest a corrected filename
try:
# Extract domain guess (everything before first hyphen or dot)
domain_guess = filename.split('-')[0].split('.')[0]
suggestion = suggest_schema_filename(domain_guess, "1.0")
message += f"\nSuggested filename: {suggestion}\n"
except Exception:
pass
return message

View File

@@ -0,0 +1,333 @@
---
schema-id: "https://markitect.dev/schemas/manpage/v1.0"
version: "1.0.0"
status: "stable"
domain: "manpage"
description: "JSON schema for Unix-style manual pages with section classification and content control"
---
# Unix Manual Page Schema v1.0
## Overview
This schema defines the structure and validation rules for Unix-style manual pages (manpages) in MarkiTect's markdown format. It includes comprehensive section classification, content control patterns, and quality guidelines to ensure consistent, high-quality documentation.
## Features
- **Section Classification System**: Categorizes manpage sections as required, recommended, optional, discouraged, or improper
- **Content Control**: Validates content patterns, quality metrics, and structural requirements
- **Flexible Section Names**: Supports alternative section names (e.g., "FLAGS" as alternative to "OPTIONS")
- **Quality Enforcement**: Minimum/maximum content requirements for paragraphs, code blocks, and words
## Section Classifications
### Required Sections
- **SYNOPSIS**: Brief command syntax with all options and arguments
- **DESCRIPTION**: Detailed explanation of command purpose and functionality
### Recommended Sections
- **EXAMPLES**: Practical usage examples demonstrating common use cases
- **OPTIONS**: Detailed option descriptions with all flags and behaviors
- **SEE ALSO**: Related commands and documentation references
### Optional Sections
- **BUGS**: Known issues and bug reporting information
- **AUTHORS**: Contributors and maintainers
- **COPYRIGHT**: License information
- **HISTORY**: Historical development information
### Discouraged Sections
- **DEPRECATED**: Legacy content (should move to HISTORY)
- **OLD_SYNTAX**: Outdated syntax (should move to HISTORY or be removed)
### Improper Sections
- **INTERNAL_NOTES**: Development notes (must not appear in published docs)
- **TODO**: Development tasks (remove before publication)
- **DRAFT**: Draft markers (remove before publication)
## Usage
### Validating a Manpage
```bash
markitect validate my-command.1.md --schema manpage-schema-v1.0
```
### Common Validation Errors
1. **Missing Required Sections**: Ensure SYNOPSIS and DESCRIPTION are present
2. **Content Too Brief**: DESCRIPTION should have at least 50 words
3. **No Examples**: While optional, EXAMPLES are highly recommended
4. **Improper Sections**: Remove TODO, DRAFT, and INTERNAL_NOTES before publication
## Content Quality Guidelines
### SYNOPSIS Section
- Show command name in bold: `**command**`
- Use brackets `[]` for optional arguments
- Use italic `*ARG*` for required arguments
- Keep concise (1-5 lines maximum)
- Include 5-150 words
### DESCRIPTION Section
- Start with what the command does
- Explain why users would use it
- Describe main functionality and features
- Minimum 50 words, maximum 1000 words
- At least 3 sentences
### EXAMPLES Section
- Use bash code blocks for commands
- Include comments explaining each example
- Start simple, progress to complex
- Show actual output when helpful
- Cover common use cases first
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Enhanced Markdown Manpage Schema with Classifications",
"description": "JSON schema for Unix-style manual pages with section classification and content control",
"x-markitect-sections": {
"SYNOPSIS": {
"classification": "required",
"heading_level": 2,
"position": "after_title",
"content_instruction": "Brief command syntax showing all options and arguments in standard format",
"min_paragraphs": 1,
"max_paragraphs": 5,
"min_code_blocks": 0,
"max_code_blocks": 3,
"error_message": "SYNOPSIS section is mandatory for all manpages per Unix conventions"
},
"DESCRIPTION": {
"classification": "required",
"heading_level": 2,
"content_instruction": "Detailed explanation of what the command does, its purpose, and main functionality",
"min_paragraphs": 2,
"max_paragraphs": 50,
"error_message": "DESCRIPTION section is mandatory for all manpages"
},
"EXAMPLES": {
"classification": "recommended",
"heading_level": 2,
"content_instruction": "Practical usage examples with explanations demonstrating common use cases",
"min_code_blocks": 3,
"max_code_blocks": 20,
"warning_if_missing": "Examples greatly improve manpage usability - highly recommended"
},
"SEE ALSO": {
"classification": "recommended",
"heading_level": 2,
"content_instruction": "Related commands, configuration files, and documentation references",
"min_paragraphs": 1,
"warning_if_missing": "Cross-references help users discover related functionality"
},
"OPTIONS": {
"classification": "recommended",
"heading_level": 2,
"content_instruction": "Detailed option descriptions with all flags and their behaviors",
"alternatives": ["GLOBAL OPTIONS", "COMMAND OPTIONS", "FLAGS"],
"warning_if_missing": "Documenting command options helps users understand available functionality"
},
"BUGS": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "Known issues, limitations, and bug reporting information"
},
"AUTHORS": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "List of contributors and maintainers"
},
"COPYRIGHT": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "Copyright statement and license information"
},
"HISTORY": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "Historical information about command development"
},
"DEPRECATED": {
"classification": "discouraged",
"heading_level": 2,
"warning_if_missing": "Consider moving deprecated content to historical documentation or HISTORY section"
},
"OLD_SYNTAX": {
"classification": "discouraged",
"heading_level": 2,
"warning_if_missing": "Old syntax should be documented in HISTORY or removed entirely"
},
"INTERNAL_NOTES": {
"classification": "improper",
"heading_level": 2,
"error_message": "Internal notes must not appear in published manpages - move to developer documentation"
},
"TODO": {
"classification": "improper",
"heading_level": 2,
"error_message": "TODO sections are for development only - remove before publication"
},
"DRAFT": {
"classification": "improper",
"heading_level": 2,
"error_message": "DRAFT markers must be removed before publication"
}
},
"x-markitect-content-control": {
"synopsis": {
"required_patterns": [
"\\*\\*[a-z][a-z0-9-]*\\*\\*",
"\\[.*\\]"
],
"discouraged_patterns": [
"TODO",
"FIXME",
"TBD"
],
"content_quality": {
"min_words": 5,
"max_words": 150,
"readability_target": "technical"
},
"content_instructions": [
"Show command name in bold (e.g., **command**)",
"Use brackets [] for optional arguments",
"Use italic *ARG* for required arguments",
"Keep synopsis concise (1-5 lines maximum)",
"Use ellipsis ... to indicate repeatable arguments"
]
},
"description": {
"discouraged_patterns": [
"TODO",
"FIXME",
"\\bWIP\\b",
"\\bXXX\\b"
],
"forbidden_patterns": [
"password\\s*=\\s*[\"'].*[\"']",
"api[_-]?key\\s*=\\s*[\"'].*[\"']",
"secret\\s*=\\s*[\"'].*[\"']"
],
"content_quality": {
"min_words": 50,
"max_words": 1000,
"readability_target": "technical",
"min_sentences": 3
},
"content_instructions": [
"Start with what the command does",
"Explain why users would use it",
"Describe main functionality and features",
"Mention any prerequisites or requirements",
"Keep technical but accessible"
],
"link_validation": {
"check_internal": true,
"check_external": false,
"allow_fragments": true
}
},
"examples": {
"required_patterns": [
"```",
"#"
],
"content_quality": {
"min_words": 100,
"max_words": 2000,
"readability_target": "general"
},
"content_instructions": [
"Use bash code blocks for command examples",
"Include comments explaining what each example does",
"Start with simple examples, progress to complex",
"Show actual output when helpful",
"Cover common use cases first"
]
}
},
"type": "object",
"properties": {
"headings": {
"type": "object",
"description": "Document heading structure",
"properties": {
"level_1": {
"type": "array",
"description": "Title heading in format: command(section) - description",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"pattern": "^[a-z0-9-]+\\([0-9]\\) - .+"
}
}
},
"minItems": 1,
"maxItems": 1
},
"level_2": {
"type": "array",
"description": "Main section headings",
"minItems": 3,
"maxItems": 30
},
"level_3": {
"type": "array",
"description": "Subsection headings",
"minItems": 0,
"maxItems": 50
}
},
"required": ["level_1", "level_2"]
},
"paragraphs": {
"type": "array",
"description": "Text paragraphs",
"minItems": 10,
"maxItems": 500
},
"code_blocks": {
"type": "array",
"description": "Code examples",
"minItems": 1,
"maxItems": 50
},
"lists": {
"type": "array",
"description": "Lists for options and structured information",
"minItems": 0,
"maxItems": 100
},
"emphasis": {
"type": "array",
"description": "Bold and italic text for commands and arguments",
"minItems": 20,
"maxItems": 500
}
},
"required": ["headings", "paragraphs", "code_blocks", "emphasis"]
}
```
## Version History
### v1.0.0 (2026-01-04)
- Initial markdown schema version
- Migrated from enhanced-manpage JSON schema
- Added comprehensive documentation
- Implemented section classification system
- Added content control and quality guidelines
## Related Documentation
- [Schema Naming Specification](../../roadmap/schema-of-schemas/SCHEMA_NAMING_SPEC.md)
- [Schema Management Workplan](../../roadmap/schema-of-schemas/WORKPLAN.md)
- [MarkiTect Documentation](../../README.md)

View File

@@ -0,0 +1,77 @@
# MarkiTect Schema Catalog
#
# This catalog provides metadata about available schemas for markdown document validation.
# Schemas can be referenced by name or loaded from their file path.
version: "1.0"
description: "Catalog of registered MarkiTect schemas for document validation"
schemas:
- id: "markitect-metaschema"
name: "MarkiTect Metaschema"
file: "markitect-metaschema.json"
version: "1.0"
description: "Metaschema for validating MarkiTect schema extensions"
type: "metaschema"
usage: "Used internally to validate schema files with MarkiTect-specific extensions"
tags:
- internal
- validation
- metaschema
- id: "terminology-v1"
name: "Terminology Document Schema"
file: "terminology-schema.json"
version: "1.0"
description: "Schema for validating terminology and glossary documents"
type: "document-schema"
usage: "Validates technical glossaries, terminology documents, and definition lists"
document_types:
- glossary
- terminology
- lexicon
- dictionary
features:
- Heading hierarchy validation (H1 → H2 → H3)
- Term structure validation (Definition, Synonyms, Related Terms, etc.)
- Content quality metrics (word counts, readability)
- MarkiTect extensions (x-markitect-sections, x-markitect-content-control)
- Classification system (required/recommended/optional)
example: "examples/terminology/terminology-example.md"
tags:
- documentation
- glossary
- terminology
- definitions
related_schemas: []
author: "MarkiTect Project"
created: "2026-01-04"
updated: "2026-01-04"
# Future schemas to add:
#
# - id: "manpage-v1"
# name: "Unix Manual Page Schema"
# description: "Schema for Unix/Linux manual page documentation"
#
# - id: "api-reference-v1"
# name: "API Reference Schema"
# description: "Schema for API endpoint documentation"
#
# - id: "arc42-v1"
# name: "arc42 Architecture Documentation Schema"
# description: "Schema for arc42 architecture documentation template"
#
# - id: "adr-v1"
# name: "Architecture Decision Record Schema"
# description: "Schema for ADR (Architecture Decision Record) documents"
#
# - id: "rfc-v1"
# name: "RFC/Specification Schema"
# description: "Schema for RFC-style specification documents"
# Schema discovery paths:
# - Built-in: markitect/schemas/*.json
# - User-defined: ~/.markitect/schemas/*.json
# - Project-specific: .markitect/schemas/*.json
# - Custom paths via MARKITECT_SCHEMA_PATH environment variable

View File

@@ -0,0 +1,519 @@
---
schema-id: "https://markitect.dev/schemas/schema/v1.0"
version: "1.0.0"
status: "stable"
domain: "schema"
description: "Metaschema for validating MarkiTect schema files"
---
# Schema-for-Schemas v1.0
## Overview
This metaschema validates that MarkiTect schema files follow conventions and standards. It ensures schemas are well-formed, properly versioned, and include required MarkiTect extensions.
**Purpose**: Quality assurance for schema authors
**Validates**:
- Core JSON Schema fields (title, description, $schema, $id)
- Version format (SemVer: major.minor.patch)
- $id URL format (HTTPS with version)
- MarkiTect extensions (x-markitect-*)
- Section classification structures
- Content control patterns
## Schema Conventions
### Required Fields
Every MarkiTect schema MUST include:
1. **$schema**: JSON Schema version (draft-07)
2. **$id**: Canonical HTTPS URL with version
3. **title**: Human-readable schema name
4. **description**: Brief explanation of what the schema validates
5. **version**: SemVer version string (major.minor.patch)
### Recommended Fields
Schemas SHOULD include:
- **type**: Root schema type (usually "object")
- **properties**: Object properties definition
- **required**: Array of required property names
### MarkiTect Extensions
#### x-markitect-sections
Defines document sections with classifications and content rules.
**Structure**:
```json
{
"SECTION_NAME": {
"classification": "required|recommended|optional|discouraged|improper",
"heading_level": 2,
"content_instruction": "What this section should contain",
"min_paragraphs": 1,
"max_paragraphs": 10,
"min_code_blocks": 0,
"max_code_blocks": 5,
"alternatives": ["ALTERNATIVE_NAME"],
"error_message": "Error if validation fails",
"warning_if_missing": "Warning if section absent"
}
}
```
**Classifications**:
- `required`: Section must be present
- `recommended`: Section should be present (warning if missing)
- `optional`: Section may be present
- `discouraged`: Section should be avoided (warning if present)
- `improper`: Section must not be present (error if present)
#### x-markitect-content-control
Defines content patterns and quality metrics.
**Structure**:
```json
{
"section_name": {
"required_patterns": ["regex1", "regex2"],
"discouraged_patterns": ["regex3"],
"forbidden_patterns": ["regex4"],
"content_quality": {
"min_words": 50,
"max_words": 1000,
"readability_target": "technical|general",
"min_sentences": 3
},
"content_instructions": ["instruction1", "instruction2"],
"link_validation": {
"check_internal": true,
"check_external": false,
"allow_fragments": true
}
}
}
```
#### x-markitect-metadata
Additional schema metadata.
**Structure**:
```json
{
"status": "stable|draft|deprecated",
"authors": ["Author Name <email@example.com>"],
"created": "2026-01-04",
"updated": "2026-01-04",
"tags": ["tag1", "tag2"]
}
```
#### x-markitect-source
Automatically added by schema loader (not in schema file).
**Structure**:
```json
{
"file": "/path/to/schema-v1.0.md",
"filename": "schema-v1.0.md",
"format": "markdown",
"frontmatter": {...}
}
```
## Validation Rules
### $id Format
Must be HTTPS URL with version:
```
https://markitect.dev/schemas/{domain}/v{major}
```
**Examples**:
-`https://markitect.dev/schemas/manpage/v1.0`
-`https://markitect.dev/schemas/api-documentation/v2.0`
-`http://example.com/schema` (not HTTPS)
-`https://markitect.dev/schemas/manpage` (no version)
### Version Format
Must be SemVer (major.minor.patch):
```
{major}.{minor}.{patch}
```
**Examples**:
-`1.0.0`
-`2.5.3`
-`1.0` (missing patch)
-`v1.0.0` (has 'v' prefix)
### Title Format
Should be descriptive and end with "Schema":
**Examples**:
- ✅ "Unix Manual Page Schema"
- ✅ "API Documentation Schema"
- ❌ "Schema" (too generic)
## Usage
### Validating a Schema
```bash
# Validate a schema file
markitect schema-validate manpage-schema-v1.0.md
# Show detailed errors
markitect schema-validate manpage-schema-v1.0.md --detailed-errors
```
### Programmatic Usage
```python
from pathlib import Path
from markitect.schema_loader import MarkdownSchemaLoader
# Load schema to validate
loader = MarkdownSchemaLoader()
schema_data = loader.load_schema(Path("my-schema-v1.0.md"))
# Check structure
issues = loader.validate_schema_structure(schema_data['schema'])
if issues:
for issue in issues:
print(f"⚠️ {issue}")
```
## Common Validation Errors
### Missing Required Fields
**Error**: `Missing required field: $schema`
**Solution**: Add `$schema` field:
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
...
}
```
### Invalid $id Format
**Error**: `$id should be a full HTTPS URL`
**Solution**: Use proper format:
```json
{
"$id": "https://markitect.dev/schemas/my-domain/v1.0",
...
}
```
### Invalid Version Format
**Error**: `version must be in SemVer format (major.minor.patch)`
**Solution**: Use three-part version:
```json
{
"version": "1.0.0",
...
}
```
### Invalid Section Classification
**Error**: `Invalid classification value: 'mandatory'`
**Solution**: Use valid classification:
```json
{
"x-markitect-sections": {
"SYNOPSIS": {
"classification": "required",
...
}
}
}
```
Valid values: `required`, `recommended`, `optional`, `discouraged`, `improper`
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/schema/v1.0",
"title": "MarkiTect Schema-for-Schemas",
"description": "Metaschema for validating MarkiTect schema files",
"version": "1.0.0",
"type": "object",
"required": ["$schema", "$id", "title", "description", "version"],
"properties": {
"$schema": {
"type": "string",
"const": "http://json-schema.org/draft-07/schema#",
"description": "JSON Schema version (must be draft-07)"
},
"$id": {
"type": "string",
"pattern": "^https://[a-z0-9.-]+/schemas/[a-z0-9-]+/v[0-9]+\\.[0-9]+$",
"description": "Canonical schema URI with HTTPS and version"
},
"title": {
"type": "string",
"minLength": 5,
"maxLength": 200,
"description": "Human-readable schema name"
},
"description": {
"type": "string",
"minLength": 10,
"maxLength": 500,
"description": "Brief explanation of what this schema validates"
},
"version": {
"type": "string",
"pattern": "^[0-9]+\\.[0-9]+\\.[0-9]+$",
"description": "Semantic version (major.minor.patch)"
},
"type": {
"type": "string",
"enum": ["object", "array", "string", "number", "boolean", "null"],
"description": "Root schema type"
},
"properties": {
"type": "object",
"description": "Object property definitions"
},
"required": {
"type": "array",
"items": {"type": "string"},
"description": "Required property names"
},
"x-markitect-sections": {
"type": "object",
"description": "Section definitions with classifications",
"patternProperties": {
"^[A-Z][A-Z0-9_ ]*$": {
"type": "object",
"required": ["classification", "heading_level"],
"properties": {
"classification": {
"type": "string",
"enum": ["required", "recommended", "optional", "discouraged", "improper"],
"description": "Section requirement level"
},
"heading_level": {
"type": "integer",
"minimum": 1,
"maximum": 6,
"description": "Markdown heading level (1-6)"
},
"position": {
"type": "string",
"enum": ["after_title", "before_title", "anywhere"],
"description": "Section position constraint"
},
"content_instruction": {
"type": "string",
"description": "What this section should contain"
},
"min_paragraphs": {
"type": "integer",
"minimum": 0,
"description": "Minimum paragraph count"
},
"max_paragraphs": {
"type": "integer",
"minimum": 1,
"description": "Maximum paragraph count"
},
"min_code_blocks": {
"type": "integer",
"minimum": 0,
"description": "Minimum code block count"
},
"max_code_blocks": {
"type": "integer",
"minimum": 0,
"description": "Maximum code block count"
},
"alternatives": {
"type": "array",
"items": {"type": "string"},
"description": "Alternative section names"
},
"error_message": {
"type": "string",
"description": "Error message if validation fails"
},
"warning_if_missing": {
"type": "string",
"description": "Warning message if section absent"
}
}
}
}
},
"x-markitect-content-control": {
"type": "object",
"description": "Content pattern and quality rules",
"patternProperties": {
"^[a-z_]+$": {
"type": "object",
"properties": {
"required_patterns": {
"type": "array",
"items": {"type": "string"},
"description": "Required regex patterns"
},
"discouraged_patterns": {
"type": "array",
"items": {"type": "string"},
"description": "Patterns to warn about"
},
"forbidden_patterns": {
"type": "array",
"items": {"type": "string"},
"description": "Patterns that cause errors"
},
"content_quality": {
"type": "object",
"properties": {
"min_words": {
"type": "integer",
"minimum": 0,
"description": "Minimum word count"
},
"max_words": {
"type": "integer",
"minimum": 1,
"description": "Maximum word count"
},
"readability_target": {
"type": "string",
"enum": ["technical", "general"],
"description": "Target readability level"
},
"min_sentences": {
"type": "integer",
"minimum": 1,
"description": "Minimum sentence count"
}
}
},
"content_instructions": {
"type": "array",
"items": {"type": "string"},
"description": "Content writing guidelines"
},
"link_validation": {
"type": "object",
"properties": {
"check_internal": {
"type": "boolean",
"description": "Validate internal links"
},
"check_external": {
"type": "boolean",
"description": "Validate external links"
},
"allow_fragments": {
"type": "boolean",
"description": "Allow fragment identifiers"
}
}
}
}
}
}
},
"x-markitect-metadata": {
"type": "object",
"description": "Additional schema metadata",
"properties": {
"status": {
"type": "string",
"enum": ["stable", "draft", "deprecated"],
"description": "Schema lifecycle status"
},
"authors": {
"type": "array",
"items": {"type": "string"},
"description": "Schema authors"
},
"created": {
"type": "string",
"format": "date",
"description": "Creation date (ISO 8601)"
},
"updated": {
"type": "string",
"format": "date",
"description": "Last update date (ISO 8601)"
},
"tags": {
"type": "array",
"items": {"type": "string"},
"description": "Schema tags for categorization"
}
}
},
"x-markitect-source": {
"type": "object",
"description": "Source file metadata (added by loader)",
"properties": {
"file": {
"type": "string",
"description": "Full file path"
},
"filename": {
"type": "string",
"description": "File name only"
},
"format": {
"type": "string",
"enum": ["markdown", "json"],
"description": "Source file format"
},
"frontmatter": {
"type": "object",
"description": "YAML frontmatter from markdown"
}
}
}
},
"additionalProperties": true
}
```
## Version History
### v1.0.0 (2026-01-04)
- Initial metaschema version
- Validates core JSON Schema fields
- Validates MarkiTect extensions
- Supports section classifications
- Supports content control patterns
- SemVer version validation
- HTTPS $id URL validation
## Related Documentation
- [Schema Naming Specification](../../roadmap/schema-of-schemas/SCHEMA_NAMING_SPEC.md)
- [Schema Loader Guide](../../roadmap/schema-of-schemas/SCHEMA_LOADER_GUIDE.md)
- [Schema Management Workplan](../../roadmap/schema-of-schemas/WORKPLAN.md)

View File

@@ -0,0 +1,214 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/terminology-v1.json",
"title": "Terminology Document Schema",
"description": "Schema for validating terminology and glossary documents with consistent structure",
"type": "object",
"properties": {
"headings": {
"type": "object",
"properties": {
"level_1": {
"type": "array",
"description": "Main document title",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"pattern": ".*(Terminology|Glossary|Terms|Definitions).*"
}
}
},
"minItems": 1,
"maxItems": 1
},
"level_2": {
"type": "array",
"description": "Category headings (Core Concepts, Document Types, etc.)",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"minLength": 1
}
}
},
"minItems": 1,
"maxItems": 20
},
"level_3": {
"type": "array",
"description": "Individual term headings",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"minLength": 1,
"description": "Term name - should be title case"
}
}
},
"minItems": 1
}
},
"required": ["level_1", "level_2", "level_3"]
},
"paragraphs": {
"type": "array",
"description": "Content paragraphs including definitions and descriptions",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"minLength": 10
}
}
},
"minItems": 3
},
"bold_text": {
"type": "array",
"description": "Bold text used for field labels (Definition, Synonyms, etc.)",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"enum": [
"Definition:",
"Synonyms:",
"Related Terms:",
"Example:",
"Examples:",
"Use Cases:",
"Usage:",
"Format:",
"Components:",
"Steps:",
"Tools:",
"Levels:",
"Status:",
"Migration:",
"Required:",
"Recommended:",
"Optional:",
"Discouraged:",
"Improper:"
]
}
}
},
"minItems": 1
}
},
"required": ["headings", "paragraphs"],
"x-markitect-sections": {
"document_title": {
"classification": "required",
"heading_level": 1,
"content_instruction": "Main title should include words like 'Terminology', 'Glossary', or 'Definitions'",
"pattern": ".*(Terminology|Glossary|Terms|Definitions).*"
},
"category_sections": {
"classification": "required",
"heading_level": 2,
"min_sections": 1,
"content_instruction": "Organize terms into logical categories (e.g., Core Concepts, Document Types, Process Terms)"
},
"term_definitions": {
"classification": "required",
"heading_level": 3,
"min_sections": 1,
"content_instruction": "Each term should be a level 3 heading followed by its definition and optional metadata"
}
},
"x-markitect-content-control": {
"term_structure": {
"required_components": [
{
"label": "Definition:",
"type": "bold_text",
"description": "Clear, concise definition of the term"
}
],
"optional_components": [
{
"label": "Synonyms:",
"type": "bold_text",
"description": "Alternative names or abbreviations"
},
{
"label": "Related Terms:",
"type": "bold_text",
"description": "Links to related concepts"
},
{
"label": "Example:",
"type": "bold_text_or_code",
"description": "Practical example demonstrating the term"
},
{
"label": "Use Cases:",
"type": "list",
"description": "Common scenarios where term applies"
}
],
"content_quality": {
"min_words_per_definition": 10,
"max_words_per_definition": 200,
"readability_target": "technical"
},
"content_instructions": [
"Start each term with a level 3 heading containing the term name",
"Follow immediately with 'Definition:' in bold",
"Provide a clear, self-contained definition",
"Add optional fields (Synonyms, Related Terms, Examples) as needed",
"Use consistent formatting across all terms",
"Group related terms under category headings (level 2)"
]
},
"definition_pattern": {
"description": "Each definition should follow: Term heading (###) → Definition: (bold) → Definition text",
"validation": {
"heading_level_3_followed_by": "bold_text_starting_with_Definition",
"definition_length": {
"min_words": 10,
"max_words": 200
}
}
},
"deprecated_terms": {
"classification": "optional",
"heading_level": 2,
"content_instruction": "Optional section for deprecated terms with migration guidance",
"required_fields": [
"Status: DEPRECATED",
"Migration:"
]
}
},
"x-markitect-validation-rules": {
"term_count": {
"min": 3,
"recommended_min": 10,
"description": "Terminology document should define at least 3 terms, 10+ recommended"
},
"category_balance": {
"description": "Each category should have at least 2 terms",
"min_terms_per_category": 2
},
"definition_quality": {
"all_terms_must_have_definition": true,
"definition_must_follow_term_heading": true,
"definition_min_words": 10
},
"consistency": {
"use_consistent_field_labels": true,
"maintain_heading_hierarchy": true
}
}
}

View File

@@ -0,0 +1,94 @@
# Schema-of-Schemas Implementation
**Project:** Markdown-First Schema System
**Status:** Planning → Implementation
**Timeline:** 8-10 days
## Quick Links
- **[WORKPLAN.md](./WORKPLAN.md)** - Detailed implementation plan with phases
- **[SCHEMA_MANAGEMENT_PROPOSAL.md](./SCHEMA_MANAGEMENT_PROPOSAL.md)** - Full analysis and options
- **[SCHEMA_MANAGEMENT_SUMMARY.md](./SCHEMA_MANAGEMENT_SUMMARY.md)** - Executive summary
## What We're Building
### Goals
1. ✅ Filename convention: `{domain}-schema-v{version}.md`
2. ✅ Markdown-first schema format (documentation + embedded JSON)
3. ✅ Schema-for-schemas to validate all schemas
4. ✅ Migrate existing schemas to new format
5. ✅ Clean up duplicate/legacy schemas
### Why
- **Consistency:** Enforced naming and versioning
- **Alignment:** Markdown-first matches MarkiTect philosophy
- **Documentation:** Rich docs alongside schemas
- **Validation:** Schema-for-schemas ensures quality
- **Maintainability:** Clear versions and structure
## Implementation Phases
1. **Phase 0:** Planning & Setup (0.5 days) ← **Current**
2. **Phase 1:** Filename Convention (1 day)
3. **Phase 2:** Markdown Loader (2-3 days)
4. **Phase 3:** Schema-for-Schemas (2 days)
5. **Phase 4:** Schema Migration (1-2 days)
6. **Phase 5:** CLI & Docs (1 day)
7. **Phase 6:** Testing (1 day)
## Current Status
### Completed
- [x] Directory structure created
- [x] Planning documents moved to roadmap
- [x] Comprehensive workplan written
- [x] Example markdown schema created
### Next Steps
1. Complete Phase 0 planning artifacts
2. Begin Phase 1 implementation
3. Checkpoint review after Phase 1
## Key Decisions
### Naming Convention
**Format:** `{domain}-schema-v{major}.{minor}.md`
**Example:** `manpage-schema-v1.0.md`
### Schema Format
**Markdown with embedded JSON:**
```markdown
---
schema-id: "https://markitect.dev/schemas/manpage/v1"
version: "1.0.0"
---
# Manpage Schema v1.0
[Documentation...]
## Schema Definition
```json
{ ... JSON schema ... }
```
```
### Schema Migration Plan
```
Old → New
──────────────────────────────────────────────────
terminology-schema.json → terminology-schema-v1.0.md
api-documentation → api-documentation-schema-v1.0.md
enhanced-manpage → manpage-schema-v2.0.md
markdown-manpage → REMOVE (duplicate)
markdown-manpage-schema.json → REMOVE (duplicate)
```
## Progress Tracking
Track progress in: `roadmap/schema-of-schemas/IMPLEMENTATION_LOG.md` (to be created)
## Questions?
See the full workplan for detailed implementation steps, risks, and mitigation strategies.

View File

@@ -0,0 +1,579 @@
# Markdown Schema Loader - User Guide
**Version:** 1.0
**Status:** Implemented
**Created:** 2026-01-04
## Overview
The Markdown Schema Loader enables MarkiTect to load JSON schemas from markdown files, combining rich documentation with machine-readable validation rules. This aligns with MarkiTect's markdown-first philosophy while maintaining JSON Schema compatibility.
## Markdown Schema Format
A markdown schema file consists of three parts:
1. **YAML Frontmatter**: Metadata about the schema
2. **Documentation**: Rich markdown content explaining the schema
3. **Schema Definition**: JSON schema in a code block
### Example Structure
```markdown
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---
# Schema Title v1.0
## Overview
Description of what this schema validates...
## Usage
How to use this schema...
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "My Schema",
"type": "object",
...
}
```
## Version History
- v1.0.0 - Initial version
```
## Frontmatter Metadata
### Required Fields
None are strictly required, but these are recommended:
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `schema-id` | string | Canonical URI for the schema | `https://markitect.dev/schemas/manpage/v1.0` |
| `version` | string | SemVer version | `1.0.0` |
| `status` | string | Lifecycle status | `stable`, `draft`, `deprecated` |
### Optional Fields
| Field | Type | Description |
|-------|------|-------------|
| `domain` | string | Schema domain name |
| `description` | string | Brief schema description |
| `authors` | array | List of authors |
| `created` | string | Creation date (ISO 8601) |
| `updated` | string | Last update date (ISO 8601) |
### Metadata Merging
Frontmatter metadata takes precedence over schema fields:
- `schema-id` → `$id` in the schema
- `version` → `version` in the schema
- `status` → `x-markitect-metadata.status` in the schema
All frontmatter is preserved in `x-markitect-source.frontmatter`.
## JSON Schema Extraction
### Schema Definition Section
The loader prefers JSON blocks under a `## Schema Definition` heading:
```markdown
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
...
}
```
```
### Fallback Behavior
If no `## Schema Definition` section exists, the loader uses the **first** JSON code block in the file.
### Multiple JSON Blocks
You can include multiple JSON blocks in documentation:
```markdown
## Example Usage
```json
{
"name": "example",
"version": "1.0"
}
```
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"properties": {
"name": {"type": "string"},
"version": {"type": "string"}
}
}
```
```
The loader will use the schema under `## Schema Definition` heading.
## Using the Loader
### Python API
```python
from pathlib import Path
from markitect.schema_loader import MarkdownSchemaLoader
# Create loader instance
loader = MarkdownSchemaLoader()
# Load schema from markdown
schema_data = loader.load_schema(Path("manpage-schema-v1.0.md"))
# Access components
schema = schema_data['schema'] # JSON Schema dict
metadata = schema_data['metadata'] # Frontmatter dict
docs = schema_data['documentation'] # Full markdown content
source = schema_data['source_file'] # Source file path
# Use the schema
print(f"Loaded: {schema['title']}")
print(f"Version: {schema['version']}")
print(f"Status: {metadata['status']}")
```
### Loading from Markdown
```python
# Load schema
schema_data = loader.load_schema(Path("my-schema-v1.0.md"))
# Check for issues
issues = loader.validate_schema_structure(schema_data['schema'])
if issues:
for issue in issues:
print(f"⚠️ {issue}")
```
### Saving to Markdown
```python
# Create a schema
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "My Schema",
"version": "1.0.0",
"type": "object",
"properties": {
"name": {"type": "string"}
}
}
# Save as markdown
loader.save_schema(
schema=schema,
md_path=Path("my-schema-v1.0.md"),
frontmatter={
"schema-id": "https://example.com/schemas/my-schema/v1.0",
"status": "draft"
}
)
```
### Round-Trip Conversion
```python
# Load existing JSON schema
import json
json_schema = json.loads(Path("old-schema.json").read_text())
# Save as markdown
loader.save_schema(
schema=json_schema,
md_path=Path("new-schema-v1.0.md")
)
# Load it back
schema_data = loader.load_schema(Path("new-schema-v1.0.md"))
# Schemas are equivalent
assert schema_data['schema']['title'] == json_schema['title']
```
## Advanced Features
### Listing JSON Blocks
Useful for debugging when multiple JSON blocks exist:
```python
content = Path("schema.md").read_text()
blocks = loader.list_json_blocks(content)
print(f"Found {len(blocks)} JSON blocks:")
for position, json_content in blocks:
print(f" Position {position}: {len(json_content)} chars")
```
### Schema Structure Validation
Check for recommended fields and conventions:
```python
issues = loader.validate_schema_structure(schema)
for issue in issues:
print(f"⚠️ {issue}")
# Example output:
# ⚠️ Missing recommended field: $id
# ⚠️ Missing MarkiTect convention: version field
```
### Custom Templates
Use custom markdown templates for saving schemas:
```python
template = """---
{frontmatter_yaml}
---
# {title}
{description}
## Schema
```json
{schema_json}
```
"""
loader.save_schema(
schema=schema,
md_path=Path("custom-schema-v1.0.md"),
template=template
)
```
## Error Handling
### Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `FileNotFoundError` | Schema file doesn't exist | Check file path |
| `SchemaNotFoundError` | No JSON block in markdown | Add ```json code block |
| `InvalidSchemaFormatError` | Invalid JSON or YAML | Check syntax |
| `SchemaFilenameError` | Invalid filename format | Use `{domain}-schema-v{major}.{minor}.md` |
### Example Error Handling
```python
from markitect.schema_loader import (
MarkdownSchemaLoader,
SchemaNotFoundError,
InvalidSchemaFormatError
)
loader = MarkdownSchemaLoader()
try:
schema_data = loader.load_schema(Path("my-schema.md"))
except FileNotFoundError as e:
print(f"❌ File not found: {e}")
except SchemaNotFoundError as e:
print(f"❌ No schema in file: {e}")
except InvalidSchemaFormatError as e:
print(f"❌ Invalid format: {e}")
```
## Best Practices
### 1. Use Schema Definition Section
Always place the main schema under `## Schema Definition`:
```markdown
## Schema Definition
```json
{...}
```
```
### 2. Include Frontmatter
Provide metadata for better discoverability:
```yaml
---
schema-id: "https://markitect.dev/schemas/domain/v1.0"
version: "1.0.0"
status: "stable"
---
```
### 3. Add Rich Documentation
Explain the schema purpose, usage, and examples:
```markdown
## Overview
This schema validates...
## Usage
```bash
markitect validate doc.md --schema my-schema-v1.0
```
## Examples
...
```
### 4. Version Your Schemas
Follow the naming convention:
- Initial: `my-schema-v1.0.md`
- Minor update: `my-schema-v1.1.md`
- Breaking change: `my-schema-v2.0.md`
### 5. Validate Structure
Always check for common issues:
```python
issues = loader.validate_schema_structure(schema)
if not issues:
print("✅ Schema structure is valid")
```
## Integration with MarkiTect
### CLI Usage (Future)
Once integrated with the CLI, you'll be able to:
```bash
# Ingest markdown schema
markitect schema-ingest manpage-schema-v1.0.md
# Validate against markdown schema
markitect validate document.md --schema manpage-schema-v1.0
# Export schema
markitect schema-get manpage-schema-v1.0 --output json
```
### Validator Integration
The SchemaValidator will automatically detect `.md` schemas:
```python
from markitect.validator import SchemaValidator
validator = SchemaValidator()
validator.validate(
document="my-doc.md",
schema="manpage-schema-v1.0.md" # .md extension auto-detected
)
```
## Markdown Schema Template
Here's a complete template for creating new schemas:
```markdown
---
schema-id: "https://markitect.dev/schemas/YOUR-DOMAIN/v1.0"
version: "1.0.0"
status: "draft"
domain: "YOUR-DOMAIN"
description: "Brief description of what this schema validates"
authors:
- "Your Name <email@example.com>"
created: "2026-01-04"
---
# YOUR-DOMAIN Schema v1.0
## Overview
Detailed description of what this schema validates and why it exists.
## Features
- Feature 1
- Feature 2
- Feature 3
## Usage
### Validating Documents
```bash
markitect validate document.md --schema YOUR-DOMAIN-schema-v1.0
```
### Common Validation Errors
1. **Error Type 1**: Description and solution
2. **Error Type 2**: Description and solution
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "YOUR DOMAIN Schema",
"description": "Schema description",
"type": "object",
"properties": {
"field1": {
"type": "string",
"description": "Description of field1"
}
},
"required": ["field1"]
}
```
## Examples
### Valid Document
```markdown
Example of valid content...
```
### Invalid Document
```markdown
Example of invalid content...
```
## Version History
### v1.0.0 (2026-01-04)
- Initial version
- Feature A
- Feature B
## Related Documentation
- [Related Schema 1](../other-schema-v1.0.md)
- [MarkiTect Documentation](../../README.md)
```
## Testing
The loader has comprehensive test coverage:
```bash
# Run all loader tests
pytest tests/test_schema_loader.py -v
# Run specific test class
pytest tests/test_schema_loader.py::TestMarkdownSchemaLoader -v
# Check coverage
pytest tests/test_schema_loader.py --cov=markitect.schema_loader
```
**Test Results**: 35/35 tests passing (100%)
## Implementation Details
### Regex Patterns
The loader uses these regex patterns:
```python
# Frontmatter pattern
r'^---\s*\n(.*?)\n---\s*\n'
# JSON code block pattern
r'```json\s*\n(.*?)\n```'
# Schema Definition section pattern
r'##\s+Schema Definition\s*\n'
```
### Metadata Merging
The `_merge_metadata` method:
1. Copies the original schema
2. Adds `x-markitect-source` with file metadata
3. Merges frontmatter fields:
- `schema-id``$id`
- `version``version`
- `status``x-markitect-metadata.status`
### File Encoding
All files are read/written as UTF-8. Invalid UTF-8 sequences raise `InvalidSchemaFormatError`.
## Troubleshooting
### Schema Not Found
**Problem**: `SchemaNotFoundError: No JSON schema found`
**Solutions**:
- Ensure you have a ```json code block
- Check the JSON syntax is valid
- Verify the code block is properly closed with ```
### Invalid YAML Frontmatter
**Problem**: `InvalidSchemaFormatError: Invalid YAML frontmatter`
**Solutions**:
- Check YAML syntax (indentation, colons, quotes)
- Ensure frontmatter is between `---` delimiters
- Verify frontmatter is at the start of file
### Binary File Error
**Problem**: `InvalidSchemaFormatError: Failed to read schema file`
**Solutions**:
- Ensure file is text, not binary
- Check file encoding is UTF-8
- Verify file isn't corrupted
## See Also
- [Schema Naming Specification](SCHEMA_NAMING_SPEC.md)
- [Schema Management Workplan](WORKPLAN.md)
- [Phase 2 Documentation](WORKPLAN.md#phase-2-markdown-schema-loader)
- [Example Markdown Schema](../../markitect/schemas/manpage-schema-v1.0.md)
## Changelog
### v1.0.0 (2026-01-04)
- Initial implementation
- 35 unit tests (100% passing)
- Frontmatter extraction with YAML parsing
- JSON code block extraction with section preference
- Metadata merging with x-markitect-source tracking
- Schema saving with template support
- Round-trip save/load capability
- Helper methods for validation and debugging

View File

@@ -0,0 +1,569 @@
# Schema Management Proposal
**Status:** Draft
**Created:** 2026-01-04
**Author:** Analysis of current state and proposed improvements
## Problem Statement
### 1. Inconsistent Schema Naming
**Current State:**
```
terminology-schema.json ← Has ".json" suffix
api-documentation ← No suffix
enhanced-manpage ← No suffix
markdown-manpage ← No suffix, duplicate title
markdown-manpage-schema.json ← Has ".json" suffix, duplicate title
```
**Issues:**
- No naming convention enforced
- Duplicate schemas (3 manpage schemas!)
- Mix of suffixed (.json) and non-suffixed names
- No way to distinguish versions
### 2. Missing Versioning
**Current State:**
- No version in filenames
- No version in schema metadata (beyond optional `$id`)
- No way to track schema evolution
- Breaking changes not apparent
**Issues:**
- Can't have multiple versions simultaneously
- No migration path when schemas change
- Unclear which schema version a document uses
### 3. Format Mismatch: JSON vs Markdown
**The Philosophical Problem:**
> MarkiTect is a markdown-centric tool, yet schemas are JSON files.
> This creates a conceptual and practical mismatch.
**Current State:**
- Documents: Markdown (.md)
- Schemas: JSON (.json)
- No unified format for documentation + schema
- Schemas lack rich documentation capabilities
## Proposed Solutions
### Part 1: Naming Convention & Versioning
#### Option A: Filename-Based Versioning (Recommended)
**Format:** `{domain}-{type}-schema-v{major}.{minor}.json`
**Examples:**
```
manpage-schema-v1.0.json # Manpage schema v1.0
manpage-schema-v2.0.json # Breaking change → v2.0
terminology-schema-v1.0.json # Terminology schema
api-documentation-schema-v1.0.json
arc42-schema-v1.0.json
```
**Benefits:**
- Clear versioning in filename
- Easy to see multiple versions
- SemVer compatible (major.minor)
- Searchable/sortable
**Migration Strategy:**
```bash
# Rename existing schemas
markdown-manpage → manpage-schema-v1.0.json
enhanced-manpage → manpage-schema-v2.0.json # (breaking changes)
terminology-schema.json → terminology-schema-v1.0.json
```
#### Option B: $id-Based Versioning
**Keep simple filenames, use `$id` for versioning:**
```json
{
"$id": "https://markitect.dev/schemas/manpage/v1",
"$schema": "http://json-schema.org/draft-07/schema#",
"version": "1.0.0",
...
}
```
**Filenames:** `manpage-schema.json`, `terminology-schema.json`
**Benefits:**
- Clean filenames
- Versioning in metadata
- Follows JSON Schema best practices
**Drawbacks:**
- Can't have multiple versions in same database
- Harder to see versions at a glance
#### Recommendation: **Hybrid Approach**
Combine both for maximum clarity:
```json
// File: manpage-schema-v1.json
{
"$id": "https://markitect.dev/schemas/manpage/v1",
"$schema": "http://json-schema.org/draft-07/schema#",
"version": "1.0.0",
"title": "Unix Manual Page Schema",
...
}
```
### Part 2: Schema Metadata Standard
Add required metadata to all schemas:
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/{domain}/v{major}",
// Required metadata
"version": "1.0.0", // SemVer
"title": "Human Readable Title",
"description": "Detailed description",
// Optional metadata
"x-markitect-schema-type": "document-schema",
"x-markitect-version": {
"major": 1,
"minor": 0,
"patch": 0
},
"x-markitect-author": "MarkiTect Project",
"x-markitect-created": "2026-01-04",
"x-markitect-updated": "2026-01-04",
"x-markitect-deprecated": false,
"x-markitect-superseded-by": null,
"x-markitect-document-types": ["manpage", "manual"],
"x-markitect-example": "examples/manpages/example.md",
// Schema content
"type": "object",
"properties": { ... }
}
```
### Part 3: Format Mismatch Solutions
#### Option 1: Markdown-First with Embedded JSON (Recommended)
**File Format:** Markdown with frontmatter and JSON code block
```markdown
---
schema-version: "1.0.0"
schema-id: "https://markitect.dev/schemas/manpage/v1"
document-type: manpage
status: stable
---
# Manpage Schema v1.0
## Overview
This schema validates Unix/Linux manual page documentation following
standard conventions (SYNOPSIS, DESCRIPTION, OPTIONS, etc.).
## Document Types
- Manual pages (man pages)
- CLI command documentation
- API reference pages
## Usage
\`\`\`bash
markitect validate mycommand.1.md --schema manpage-schema-v1
\`\`\`
## Examples
See [examples/manpages/](../../examples/manpages/) for complete examples.
## Schema Definition
\`\`\`json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/manpage/v1",
"version": "1.0.0",
"title": "Unix Manual Page Schema",
"type": "object",
"properties": {
"headings": {
"type": "object",
"properties": {
"level_1": { ... }
}
}
},
"x-markitect-sections": { ... }
}
\`\`\`
## Validation Rules
### Required Sections
- **NAME** - Command name and brief description
- **SYNOPSIS** - Command syntax
- **DESCRIPTION** - Detailed description
### Optional Sections
- **OPTIONS** - Command-line options
- **EXAMPLES** - Usage examples
- **SEE ALSO** - Related commands
## Version History
### v1.0.0 (2026-01-04)
- Initial release
- Basic manpage structure validation
```
**Implementation:**
```python
class MarkdownSchemaLoader:
"""Load schemas from markdown files with embedded JSON."""
def load_schema_from_markdown(self, md_path: Path) -> dict:
"""Extract JSON schema from markdown file."""
content = md_path.read_text()
# Parse frontmatter
frontmatter = self._extract_frontmatter(content)
# Extract JSON from code block
schema_json = self._extract_json_from_code_block(content)
# Merge metadata
schema = json.loads(schema_json)
schema['x-markitect-metadata'] = frontmatter
return schema
def save_schema_to_markdown(self, schema: dict, md_path: Path):
"""Save schema as markdown with embedded JSON."""
# Generate markdown documentation
doc = self._generate_schema_documentation(schema)
# Embed JSON schema
json_block = f"```json\n{json.dumps(schema, indent=2)}\n```"
# Combine
full_content = f"{doc}\n\n## Schema Definition\n\n{json_block}"
md_path.write_text(full_content)
```
**Benefits:**
- ✅ Markdown-first (aligns with MarkiTect philosophy)
- ✅ Rich documentation alongside schema
- ✅ Human-readable and editable
- ✅ Version history in same file
- ✅ Examples and usage inline
- ✅ Can extract JSON when needed
**Drawbacks:**
- ⚠️ Requires parsing logic
- ⚠️ Two sources of truth (markdown + embedded JSON)
- ⚠️ More complex than pure JSON
#### Option 2: Markdown Documentation Generator
**Keep JSON schemas, auto-generate markdown docs:**
```
schemas/
manpage-schema-v1.json # Source of truth
manpage-schema-v1.md # Auto-generated docs
```
**Command:**
```bash
markitect schema-document manpage-schema-v1.json
# Generates: manpage-schema-v1.md
```
**Benefits:**
- ✅ Simple implementation
- ✅ JSON remains source of truth
- ✅ Auto-generated docs always in sync
**Drawbacks:**
- ⚠️ Two files to manage
- ⚠️ Can't hand-edit documentation (gets overwritten)
#### Option 3: Markdown Schema Language (DSL)
**Define schemas in markdown-native syntax:**
```markdown
# Manpage Schema v1.0
## Document Structure
### Required Sections (Level 1 Heading)
**NAME**
- Classification: required
- Content: Command name in bold, followed by description
- Pattern: `**command** - description`
**SYNOPSIS**
- Classification: required
- Content: Command syntax with options
- Min paragraphs: 1
- Max paragraphs: 3
### Optional Sections
**OPTIONS**
- Classification: recommended
- Content: Definition list of command-line options
```
**Parser generates JSON schema from markdown:**
```bash
markitect schema-compile manpage-schema-v1.md --output manpage-schema-v1.json
```
**Benefits:**
- ✅ Pure markdown
- ✅ Human-friendly syntax
- ✅ No JSON editing needed
**Drawbacks:**
- ⚠️ Complex parser implementation
- ⚠️ Limited to MarkiTect-specific features
- ⚠️ Can't use standard JSON Schema tools
#### Option 4: Literate Schema Programming
**Inspired by literate programming, mix documentation and schema:**
```markdown
# Manpage Schema v1.0
Manual pages follow a standard structure. The NAME section is required:
<<define-name-section>>=
{
"NAME": {
"classification": "required",
"heading_level": 2,
"content_instruction": "Command name and brief description"
}
}
The SYNOPSIS section shows command syntax:
<<define-synopsis-section>>=
{
"SYNOPSIS": {
"classification": "required",
"heading_level": 2,
"min_code_blocks": 1
}
}
Complete schema:
<<manpage-schema.json>>=
{
"$schema": "http://json-schema.org/draft-07/schema#",
"x-markitect-sections": {
<<define-name-section>>,
<<define-synopsis-section>>
}
}
```
**Benefits:**
- ✅ Documentation and schema interleaved
- ✅ Literate programming benefits
- ✅ Reusable schema fragments
**Drawbacks:**
- ⚠️ Complex tangling/weaving
- ⚠️ Unfamiliar paradigm
- ⚠️ Overkill for simple schemas
## Recommendations
### Short-Term (Immediate)
1. **Naming Convention:**
- Format: `{domain}-schema-v{major}.{minor}.json`
- Example: `manpage-schema-v1.0.json`
2. **Schema Metadata:**
- Add required `version`, `title`, `description` fields
- Add `x-markitect-*` metadata extensions
- Document in schema-catalog.yaml
3. **Duplicate Cleanup:**
- Consolidate 3 manpage schemas into versioned series
- Keep enhanced-manpage as v2.0 (breaking changes)
- Archive old schemas
### Medium-Term (Next Phase)
4. **Markdown Schema Format (Option 1):**
- Implement markdown-first schema format
- Markdown file with embedded JSON in code block
- Parser extracts JSON for validation
- Rich documentation alongside schema
5. **Schema Documentation Generator:**
- Auto-generate markdown docs from JSON schemas
- Include examples, usage, version history
- Link to example documents
### Long-Term (Future)
6. **Schema DSL (Option 3):**
- Evaluate markdown schema language
- Prototype parser for common patterns
- Consider if DSL adds value over JSON
7. **Schema Registry API:**
- REST API for schema discovery
- Version negotiation
- Schema evolution tracking
## Implementation Plan
### Phase 1: Naming & Versioning (1-2 days)
**Tasks:**
1. Define naming convention spec
2. Create schema metadata template
3. Rename existing schemas
4. Update schema-catalog.yaml
5. Update documentation
**Deliverables:**
- Schema naming convention spec
- Migrated schemas with versions
- Updated catalog
### Phase 2: Markdown Schema Format (3-5 days)
**Tasks:**
1. Design markdown schema format
2. Implement parser (extract JSON from markdown)
3. Implement generator (create markdown from JSON)
4. Convert existing schemas to markdown format
5. Update CLI to support .md schemas
6. Write documentation and examples
**Deliverables:**
- Markdown schema parser/generator
- All schemas in markdown format
- Updated CLI commands
- Migration guide
### Phase 3: Schema Validation (2-3 days)
**Tasks:**
1. Create metaschema for validating schemas
2. Add schema validation command
3. Validate all existing schemas
4. Add CI check for schema validity
**Deliverables:**
- Schema-for-schemas (metaschema)
- Validation command
- CI integration
## Cost-Benefit Analysis
### Option 1: Markdown-First (Recommended)
**Cost:**
- Parser implementation: ~200 lines
- CLI updates: ~100 lines
- Migration effort: 2-3 days
- Testing: 1 day
**Benefit:**
- Aligned with markdown philosophy ⭐⭐⭐⭐⭐
- Rich documentation ⭐⭐⭐⭐⭐
- Version history inline ⭐⭐⭐⭐
- Human-friendly ⭐⭐⭐⭐⭐
- Lower barrier to entry ⭐⭐⭐⭐
**Total:** High value for reasonable cost
### Option 2: Documentation Generator
**Cost:**
- Generator implementation: ~150 lines
- Template design: 1 day
- Testing: 0.5 days
**Benefit:**
- Simple implementation ⭐⭐⭐⭐
- Auto-sync docs ⭐⭐⭐⭐
- JSON remains source ⭐⭐⭐
**Total:** Good value, lower cost
### Option 3: Schema DSL
**Cost:**
- DSL design: 2-3 days
- Parser implementation: ~500 lines
- Compiler: ~300 lines
- Testing: 2 days
- Documentation: 1 day
**Benefit:**
- Pure markdown ⭐⭐⭐⭐⭐
- No JSON editing ⭐⭐⭐⭐
- Limited ecosystem ⭐⭐
**Total:** High cost, uncertain value
## Decision Matrix
| Criterion | Option 1: Markdown-First | Option 2: Doc Generator | Option 3: DSL |
|-----------|-------------------------|------------------------|---------------|
| Markdown alignment | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Implementation cost | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Documentation quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Tool ecosystem | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Maintainability | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| User-friendliness | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
## Recommendation Summary
1. **Immediate:** Implement naming convention and versioning
2. **Short-term:** Choose **Option 1 (Markdown-First)** for schema format
3. **Fallback:** If Option 1 proves too complex, use **Option 2 (Doc Generator)**
4. **Future:** Evaluate DSL if community demand emerges
## Next Steps
1. Review and approve this proposal
2. Create naming convention specification
3. Prototype markdown schema parser
4. Migrate one schema as proof-of-concept
5. Gather feedback and iterate
6. Full migration of all schemas
---
## Appendix: Example Markdown Schema
See `examples/schemas/manpage-schema-v1.md` for a complete example of the proposed format.

View File

@@ -0,0 +1,154 @@
# Schema Management: Executive Summary
**TL;DR:** Implement naming conventions, versioning, and markdown-first schema format to solve current schema management issues.
## Problems Identified
1. **Inconsistent Naming** - Mix of `schema.json` suffix and no suffix
2. **No Versioning** - Can't track schema evolution or maintain multiple versions
3. **Duplicate Schemas** - 3 manpage schemas with similar content
4. **Format Mismatch** - JSON schemas in markdown-centric tool
## Recommended Solution
### 1. Naming Convention (Immediate)
**Format:** `{domain}-schema-v{major}.{minor}.json` or `.md`
**Examples:**
```
manpage-schema-v1.0.json
terminology-schema-v1.0.json
api-documentation-schema-v1.0.json
```
**Migration:**
```
markdown-manpage → manpage-schema-v1.0.json
enhanced-manpage → manpage-schema-v2.0.json (breaking changes)
terminology-schema.json → terminology-schema-v1.0.json
```
### 2. Markdown-First Format (Short-term)
**Proposal:** Store schemas as markdown files with embedded JSON
**Benefits:**
- Aligns with markdown philosophy ✅
- Rich documentation alongside schema ✅
- Version history in same file ✅
- Examples and usage inline ✅
- Lower barrier to entry ✅
**Example:** See `examples/schemas/manpage-schema-v1.md`
**Format:**
```markdown
# Schema Title v1.0
## Documentation sections...
## Schema Definition
\`\`\`json
{ schema here }
\`\`\`
```
### 3. Schema Metadata Standard (Immediate)
**Required fields:**
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/{domain}/v{major}",
"version": "1.0.0",
"title": "Human Readable Title",
"description": "Detailed description",
"x-markitect-metadata": {
"domain": "manpage",
"document-types": ["manual-page"],
"created": "2026-01-04",
"example": "examples/manpages/example.md"
}
}
```
## Implementation Phases
### Phase 1: Foundation (1-2 days)
- [x] Analyze current state
- [ ] Define naming convention spec
- [ ] Create schema metadata template
- [ ] Rename existing schemas
- [ ] Update schema-catalog.yaml
### Phase 2: Markdown Format (3-5 days)
- [ ] Design markdown schema format
- [ ] Implement parser (extract JSON from markdown)
- [ ] Convert 1 schema as proof-of-concept
- [ ] Test and iterate
- [ ] Migrate all schemas
### Phase 3: Tooling (2-3 days)
- [ ] Update CLI to support .md schemas
- [ ] Add schema validation command
- [ ] Create migration guide
- [ ] Update documentation
## Cost-Benefit Analysis
**Cost:** 6-10 days total effort
**Benefits:**
- Professional schema management ⭐⭐⭐⭐⭐
- Better discoverability ⭐⭐⭐⭐
- Easier maintenance ⭐⭐⭐⭐⭐
- Markdown alignment ⭐⭐⭐⭐⭐
- Version tracking ⭐⭐⭐⭐⭐
**ROI:** High - Foundational improvement that benefits all future schema work
## Alternative Considered
**Alternative:** Keep JSON, generate markdown docs automatically
**Pros:**
- Simpler implementation (2-3 days)
- JSON remains source of truth
- Standard tooling works
**Cons:**
- Doesn't solve format mismatch
- Documentation generated, not authored
- Two files to manage
**Verdict:** Markdown-first better aligns with project philosophy
## Quick Wins (Today)
1. **Rename schemas** with versioned names (30 minutes)
2. **Add metadata** to existing schemas (1 hour)
3. **Update catalog** with proper versioning (30 minutes)
## Questions to Resolve
1. **File extension:** `.md` or `.schema.md` for markdown schemas?
2. **JSON extraction:** Real-time or pre-compiled cache?
3. **Backward compatibility:** Support both formats during transition?
4. **CLI changes:** `--schema file.md` or auto-detect format?
## Next Steps
1. **Review** this proposal and example (examples/schemas/manpage-schema-v1.md)
2. **Decide** on markdown-first vs generated docs approach
3. **Prototype** parser for markdown schemas
4. **Migrate** one schema as proof-of-concept
5. **Iterate** based on feedback
6. **Full rollout** to all schemas
## References
- Full proposal: [SCHEMA_MANAGEMENT_PROPOSAL.md](./SCHEMA_MANAGEMENT_PROPOSAL.md)
- Example markdown schema: [examples/schemas/manpage-schema-v1.md](../../examples/schemas/manpage-schema-v1.md)
- Current schema catalog: [markitect/schemas/schema-catalog.yaml](../../markitect/schemas/schema-catalog.yaml)

View File

@@ -0,0 +1,408 @@
# Schema Naming Convention Specification
**Version:** 1.0
**Status:** Implemented
**Created:** 2026-01-04
## Overview
This specification defines the filename convention for all MarkiTect schema files to ensure consistency, discoverability, and version tracking across the schema ecosystem.
## Filename Format
### Standard Format
```
{domain}-schema-v{major}.{minor}.md
```
### Components
| Component | Description | Rules | Examples |
|-----------|-------------|-------|----------|
| **domain** | Schema domain identifier | - Lowercase only<br>- Start with letter<br>- Letters, numbers, hyphens<br>- No consecutive hyphens<br>- No leading/trailing hyphens | `manpage`<br>`api-documentation`<br>`arc42` |
| **schema** | Literal keyword | - Must be exactly `schema` | `schema` |
| **version** | SemVer major.minor | - Format: `v{major}.{minor}`<br>- Non-negative integers<br>- Must include both major and minor | `v1.0`<br>`v2.5`<br>`v10.25` |
| **extension** | File extension | - Must be `.md` (markdown) | `.md` |
### Regular Expression
```regex
^[a-z][a-z0-9-]*-schema-v\d+\.\d+\.md$
```
**Breakdown:**
- `^[a-z]` - Start with lowercase letter
- `[a-z0-9-]*` - Followed by lowercase letters, numbers, or hyphens
- `-schema-` - Literal string
- `v\d+\.\d+` - Version (v + digits + dot + digits)
- `\.md$` - Extension
## Valid Examples
### Simple Domains
```
manpage-schema-v1.0.md
terminology-schema-v1.0.md
glossary-schema-v1.0.md
```
### Multi-Word Domains
```
api-documentation-schema-v1.0.md
architecture-decision-record-schema-v1.0.md
software-requirements-specification-schema-v1.0.md
```
### With Numbers
```
arc42-schema-v1.0.md
rfc2119-keywords-schema-v1.0.md
iso27001-schema-v1.0.md
```
### Version Variations
```
manpage-schema-v1.0.md # Initial version
manpage-schema-v1.1.md # Minor update
manpage-schema-v2.0.md # Breaking change
manpage-schema-v10.25.md # Double-digit versions
```
## Invalid Examples
### Wrong Extension
```
❌ manpage-schema-v1.0.json # Must be .md
❌ manpage-schema-v1.0.yaml # Must be .md
❌ manpage-schema-v1.0 # Missing extension
```
### Missing Components
```
❌ manpage-v1.0.md # Missing "schema" keyword
❌ manpage-schema.md # Missing version
❌ manpage.md # Missing "schema" and version
```
### Version Format Errors
```
❌ manpage-schema-1.0.md # Missing 'v' prefix
❌ manpage-schema-v1.md # Missing minor version
❌ manpage-schema-v1.0.0.md # Too many version parts (patch not used)
❌ manpage-schema-v1-0.md # Hyphen instead of dot
```
### Case Errors
```
❌ ManPage-schema-v1.0.md # Uppercase in domain
❌ manpage-Schema-v1.0.md # Uppercase in keyword
❌ MANPAGE-SCHEMA-V1.0.MD # All uppercase
```
### Domain Format Errors
```
❌ 42answers-schema-v1.0.md # Starts with number
❌ -manpage-schema-v1.0.md # Starts with hyphen
❌ man_page-schema-v1.0.md # Underscore (use hyphen)
❌ man page-schema-v1.0.md # Space (use hyphen)
❌ my--schema-v1.0.md # Consecutive hyphens
```
## Version Numbering Guidelines
### Semantic Versioning
We use simplified SemVer with major.minor only:
**Major Version (X.0):**
- Breaking changes to schema structure
- Incompatible with previous version
- Documents validated against v1.0 may fail v2.0
**Examples:**
- `manpage-schema-v1.0.md``manpage-schema-v2.0.md` (breaking change)
- `api-schema-v1.0.md``api-schema-v2.0.md` (new required sections)
**Minor Version (X.Y):**
- Backward-compatible additions
- New optional sections or fields
- Relaxed constraints
- Documents validated against v1.0 still validate against v1.1
**Examples:**
- `manpage-schema-v1.0.md``manpage-schema-v1.1.md` (new optional section)
- `api-schema-v2.0.md``api-schema-v2.1.md` (additional metadata)
### Version Incrementing
```
v1.0 → v1.1 → v1.2 → ... → v1.9 → v1.10 → v1.11
v2.0 (breaking change)
```
### Initial Version
All new schemas start at `v1.0.md`:
```bash
# New schema
my-new-type-schema-v1.0.md
```
## Domain Naming Guidelines
### Good Domain Names
**Descriptive and Specific:**
```
✓ manpage-schema-v1.0.md # Clear: Unix manual pages
✓ api-documentation-schema-v1.0.md # Clear: API docs
✓ architecture-decision-record-schema-v1.0.md # Full ADR name
```
**Concise but Meaningful:**
```
✓ adr-schema-v1.0.md # Common abbreviation
✓ rfc-schema-v1.0.md # Well-known acronym
✓ arc42-schema-v1.0.md # Standard name
```
### Poor Domain Names
**Too Generic:**
```
❌ document-schema-v1.0.md # Too vague
❌ markdown-schema-v1.0.md # All schemas are markdown
❌ schema-schema-v1.0.md # Redundant (use "metaschema")
```
**Too Verbose:**
```
❌ my-custom-documentation-template-for-apis-v1.0.md # Too long
→ api-documentation-schema-v1.0.md # Better
```
**Unclear Abbreviations:**
```
❌ mt-schema-v1.0.md # What is "mt"?
❌ doc-schema-v1.0.md # Too generic
```
## Normalization Rules
When converting arbitrary strings to valid domain names:
1. **Convert to lowercase**
- `API Documentation``api documentation`
2. **Replace separators with hyphens**
- Spaces: `api documentation``api-documentation`
- Underscores: `my_type``my-type`
- Multiple separators: `my type``my--type`
3. **Remove consecutive hyphens**
- `my--type``my-type`
4. **Remove leading/trailing hyphens**
- `-my-type-``my-type`
5. **Validate result**
- Must start with letter
- Only lowercase letters, numbers, hyphens
### Example Normalizations
```python
"API Documentation" "api-documentation-schema-v1.0.md"
"My_Custom_Type" "my-custom-type-schema-v1.0.md"
"arc42 Architecture" "arc42-architecture-schema-v1.0.md"
"--leading-hyphen" "leading-hyphen-schema-v1.0.md"
```
## Implementation
### Validation Function
The naming convention is enforced by `markitect.schema_naming.validate_schema_filename()`:
```python
from markitect.schema_naming import validate_schema_filename
is_valid, metadata = validate_schema_filename("manpage-schema-v1.0.md")
if is_valid:
print(f"Domain: {metadata['domain']}")
print(f"Version: {metadata['version']}")
print(f"Major: {metadata['major']}, Minor: {metadata['minor']}")
```
### Suggestion Function
Generate valid filenames from arbitrary input:
```python
from markitect.schema_naming import suggest_schema_filename
# From clean input
filename = suggest_schema_filename("manpage", "1.0")
# → "manpage-schema-v1.0.md"
# From messy input (with normalization)
filename = suggest_schema_filename("API Documentation", "2.1")
# → "api-documentation-schema-v1.0.md"
```
### CLI Integration
The `schema-ingest` command validates filenames:
```bash
# Valid filename - accepted
$ markitect schema-ingest manpage-schema-v1.0.md
✅ Schema stored successfully
# Invalid filename - rejected (unless --force)
$ markitect schema-ingest manpage.json
❌ Invalid schema filename: manpage.json
Expected format: {domain}-schema-v{major}.{minor}.md
Example: manpage-schema-v1.0.md
Suggested filename: manpage-schema-v1.0.md
Use --force to skip validation
```
## Migration from Legacy Naming
### Current State Analysis
Existing schemas with inconsistent naming:
```
terminology-schema.json # Has .json extension
api-documentation # No version, no extension
enhanced-manpage # No version, no extension, unclear name
markdown-manpage # No version, no extension
markdown-manpage-schema.json # Has .json extension
```
### Migration Strategy
1. **Identify domain and version**
2. **Apply naming convention**
3. **Update database registration**
4. **Remove legacy entries**
### Migration Mapping
```
Old Name → New Name
────────────────────────────────────────────────────────────────
terminology-schema.json → terminology-schema-v1.0.md
api-documentation → api-documentation-schema-v1.0.md
enhanced-manpage → manpage-schema-v2.0.md
markdown-manpage → (DELETE - duplicate)
markdown-manpage-schema.json → (DELETE - duplicate)
```
**Rationale:**
- `enhanced-manpage` → v2.0 (has breaking changes: classification system)
- `markdown-manpage` variants → DELETE (superseded by v1.0 and v2.0)
## Special Cases
### Metaschema
The schema-for-schemas follows the same convention:
```
schema-schema-v1.0.md
```
Domain is `schema`, indicating it validates schemas themselves.
### Multiple Schemas for Same Domain
Use version numbers to distinguish:
```
manpage-schema-v1.0.md # Original
manpage-schema-v2.0.md # Enhanced with classifications
```
Or use more specific domain names:
```
manpage-simple-schema-v1.0.md # Simplified variant
manpage-extended-schema-v1.0.md # Extended variant
```
## Validation Testing
All schemas should pass the naming convention validation:
```bash
# Test a filename
python -c "
from markitect.schema_naming import is_valid_schema_filename
print(is_valid_schema_filename('manpage-schema-v1.0.md'))
"
# → True
# Get detailed errors
python -c "
from markitect.schema_naming import get_validation_errors
errors = get_validation_errors('invalid.json')
for error in errors:
print(error)
"
```
## Benefits
### Consistency
- All schemas follow same pattern
- Easy to recognize schema files
- Predictable naming
### Versioning
- Clear version tracking
- Multiple versions can coexist
- Breaking changes explicit (major version bump)
### Discoverability
- Glob patterns work: `*-schema-v*.md`
- Easy to list all schemas: `ls *-schema-*.md`
- Domain easily extractable
### Tooling
- Programmatic validation
- Automatic suggestion
- Migration support
## References
- **Implementation:** `markitect/schema_naming.py`
- **Tests:** `tests/test_schema_naming.py`
- **Workplan:** `roadmap/schema-of-schemas/WORKPLAN.md`
- **Examples:** `examples/schemas/manpage-schema-v1.0.md`
## Changelog
### v1.0 (2026-01-04)
- Initial specification
- Implemented validation and suggestion functions
- 50 unit tests (100% passing)
- CLI integration planned

View File

@@ -0,0 +1,962 @@
# Schema-of-Schemas Implementation Workplan
**Project:** Implement Markdown-First Schema System with Self-Description
**Created:** 2026-01-04
**Status:** Planning
**Duration:** 6-10 days
**Priority:** High - Foundation for all schema work
## Executive Summary
This workplan implements a comprehensive schema management system:
1. Filename conventions and versioning
2. Markdown-first schema format (`.md` with embedded JSON)
3. Schema-for-schemas (metaschema) for validation
4. Migration of existing schemas
5. Cleanup of legacy schemas from registry
## Project Goals
### Primary Goals
- [x] Establish filename convention: `{domain}-schema-v{version}.md`
- [ ] Implement markdown schema parser (extract JSON from markdown)
- [ ] Create schema-for-schemas to validate all schemas
- [ ] Migrate existing schemas to new format
- [ ] Remove legacy/duplicate schemas from registry
### Success Criteria
- ✅ All schemas follow naming convention
- ✅ Schemas stored as markdown files with embedded JSON
- ✅ Schema-for-schemas validates all schemas successfully
- ✅ No duplicate schemas in registry
- ✅ CLI commands work with `.md` schema files
- ✅ Documentation updated
## Architecture Overview
### Current State
```
Schemas: JSON files (.json)
Naming: Inconsistent (api-documentation, markdown-manpage-schema.json)
Versioning: None
Documentation: Separate or missing
Registry: Database with 5 schemas (3 duplicates)
```
### Target State
```
Schemas: Markdown files (.md) with embedded JSON
Naming: {domain}-schema-v{major}.{minor}.md
Versioning: SemVer in filename and metadata
Documentation: Inline with schema
Registry: Clean, versioned, no duplicates
Validation: Schema-for-schemas validates all schemas
```
### Components to Build
```
markitect/
├── schema_loader.py # NEW: Load schemas from markdown
├── schema_validator.py # UPDATED: Support .md schemas
├── cli.py # UPDATED: Accept .md schema files
└── schemas/
├── schema-schema-v1.md # NEW: Schema-for-schemas
└── ...versioned schemas...
examples/schemas/ # Markdown schema examples
└── manpage-schema-v1.md # Already created
roadmap/schema-of-schemas/ # Planning artifacts
├── WORKPLAN.md # This file
├── SCHEMA_NAMING_SPEC.md # Naming convention spec
└── IMPLEMENTATION_LOG.md # Progress tracking
```
## Phase Breakdown
### Phase 0: Planning & Setup ✅ (0.5 days)
**Goal:** Establish project structure and specifications
**Tasks:**
- [x] Create roadmap/schema-of-schemas directory
- [x] Move planning documents to roadmap
- [ ] Write naming convention specification
- [ ] Document schema metadata standard
- [ ] Create implementation checklist
**Deliverables:**
- [x] Directory structure
- [ ] SCHEMA_NAMING_SPEC.md
- [ ] SCHEMA_METADATA_SPEC.md
- [ ] This workplan
**Duration:** 0.5 days
**Status:** In Progress
---
### Phase 1: Filename Convention & Validation (1 day)
**Goal:** Establish and enforce filename conventions
**1.1 Define Naming Convention**
**Specification:**
```
Format: {domain}-schema-v{major}.{minor}.md
Components:
- domain: lowercase, hyphen-separated (e.g., "manpage", "api-documentation")
- schema: literal string "schema"
- version: SemVer major.minor (e.g., "v1.0", "v2.1")
- extension: ".md" (markdown)
Examples:
✓ manpage-schema-v1.0.md
✓ terminology-schema-v1.0.md
✓ api-documentation-schema-v1.0.md
✗ manpage.json (missing version)
✗ manpage-v1.md (missing "schema")
✗ ManPage-Schema-v1.0.md (wrong case)
```
**1.2 Implement Validation Function**
**File:** `markitect/schema_naming.py` (NEW)
```python
import re
from pathlib import Path
from typing import Tuple, Optional
SCHEMA_FILENAME_PATTERN = re.compile(
r'^(?P<domain>[a-z][a-z0-9-]*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$'
)
def validate_schema_filename(filename: str) -> Tuple[bool, Optional[dict]]:
"""
Validate schema filename against convention.
Returns:
(is_valid, metadata_dict)
"""
match = SCHEMA_FILENAME_PATTERN.match(filename)
if not match:
return False, None
return True, {
'domain': match.group('domain'),
'version': f"{match.group('major')}.{match.group('minor')}",
'major': int(match.group('major')),
'minor': int(match.group('minor'))
}
def suggest_schema_filename(domain: str, version: str) -> str:
"""Generate correct schema filename from domain and version."""
# Normalize domain: lowercase, replace spaces with hyphens
domain_clean = domain.lower().replace(' ', '-').replace('_', '-')
return f"{domain_clean}-schema-v{version}.md"
```
**1.3 Add CLI Validation**
**Update:** `markitect/cli.py` - schema-ingest command
```python
@cli.command('schema-ingest')
@click.argument('schema_file', type=click.Path(exists=True, path_type=Path))
@click.option('--force', is_flag=True, help='Skip filename validation')
def schema_ingest(config, schema_file, force):
"""Ingest schema file with filename validation."""
from .schema_naming import validate_schema_filename, suggest_schema_filename
filename = schema_file.name
is_valid, metadata = validate_schema_filename(filename)
if not is_valid and not force:
click.echo(f"❌ Invalid schema filename: {filename}", err=True)
click.echo("\nExpected format: {domain}-schema-v{major}.{minor}.md")
click.echo("Example: manpage-schema-v1.0.md")
# Try to suggest correct name
# ... extract domain/version from file content ...
suggestion = suggest_schema_filename(domain, version)
click.echo(f"\nSuggested filename: {suggestion}")
click.echo("\nUse --force to skip validation")
sys.exit(1)
# Continue with ingestion...
```
**Tasks:**
- [ ] Write `markitect/schema_naming.py`
- [ ] Add unit tests for filename validation
- [ ] Update `schema-ingest` command with validation
- [ ] Test with valid and invalid filenames
**Deliverables:**
- [ ] schema_naming.py with validation logic
- [ ] Unit tests (tests/test_schema_naming.py)
- [ ] Updated CLI with validation
- [ ] SCHEMA_NAMING_SPEC.md documentation
**Duration:** 1 day
---
### Phase 2: Markdown Schema Loader (2-3 days)
**Goal:** Parse markdown files to extract JSON schemas
**2.1 Design Markdown Schema Format**
**Format Specification:**
```markdown
---
schema-id: "https://markitect.dev/schemas/{domain}/v{major}"
version: "{major}.{minor}.{patch}"
status: "stable|draft|deprecated"
---
# {Title} v{version}
## Overview
[Human-readable description]
## Usage
[Examples of how to use this schema]
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/{domain}/v{major}",
"version": "{major}.{minor}.{patch}",
...
}
\```
## Validation Rules
[Explanation of schema rules]
## Version History
[Changelog]
```
**2.2 Implement Markdown Schema Loader**
**File:** `markitect/schema_loader.py` (NEW)
```python
"""
Schema Loader - Extract JSON schemas from markdown files.
Supports:
- YAML frontmatter for metadata
- JSON code block for schema definition
- Validation of schema structure
"""
import re
import json
import yaml
from pathlib import Path
from typing import Dict, Any, Optional, Tuple
class MarkdownSchemaLoader:
"""Load and parse markdown schema files."""
def __init__(self):
self.frontmatter_pattern = re.compile(
r'^---\s*\n(.*?)\n---\s*\n',
re.DOTALL | re.MULTILINE
)
self.json_code_block_pattern = re.compile(
r'```json\s*\n(.*?)\n```',
re.DOTALL | re.MULTILINE
)
def load_schema(self, md_path: Path) -> Dict[str, Any]:
"""
Load schema from markdown file.
Returns:
{
'schema': {...}, # Extracted JSON schema
'metadata': {...}, # Frontmatter metadata
'documentation': '...' # Full markdown content
}
"""
if not md_path.exists():
raise FileNotFoundError(f"Schema file not found: {md_path}")
content = md_path.read_text(encoding='utf-8')
# Extract frontmatter
metadata = self._extract_frontmatter(content)
# Extract JSON schema
schema = self._extract_json_schema(content)
if not schema:
raise ValueError(f"No JSON schema found in {md_path}")
# Merge metadata into schema
schema = self._merge_metadata(schema, metadata, md_path)
return {
'schema': schema,
'metadata': metadata,
'documentation': content,
'source_file': str(md_path)
}
def _extract_frontmatter(self, content: str) -> Dict[str, Any]:
"""Extract YAML frontmatter from markdown."""
match = self.frontmatter_pattern.search(content)
if not match:
return {}
try:
return yaml.safe_load(match.group(1)) or {}
except yaml.YAMLError as e:
raise ValueError(f"Invalid YAML frontmatter: {e}")
def _extract_json_schema(self, content: str) -> Optional[Dict[str, Any]]:
"""Extract JSON schema from code block."""
matches = self.json_code_block_pattern.findall(content)
if not matches:
return None
# Use the first JSON code block as schema
# (or could look for specific heading like "## Schema Definition")
try:
return json.loads(matches[0])
except json.JSONDecodeError as e:
raise ValueError(f"Invalid JSON schema: {e}")
def _merge_metadata(
self,
schema: Dict[str, Any],
metadata: Dict[str, Any],
source_file: Path
) -> Dict[str, Any]:
"""Merge frontmatter metadata into schema."""
# Add MarkiTect-specific metadata
schema['x-markitect-source'] = {
'file': str(source_file),
'format': 'markdown',
'frontmatter': metadata
}
# Override schema fields with frontmatter if present
if 'version' in metadata:
schema['version'] = metadata['version']
if 'schema-id' in metadata:
schema['$id'] = metadata['schema-id']
return schema
def save_schema(
self,
schema: Dict[str, Any],
md_path: Path,
template: Optional[str] = None
):
"""
Save schema as markdown file.
Args:
schema: JSON schema dict
md_path: Output path
template: Optional markdown template
"""
if template:
# Use provided template
content = self._render_template(template, schema)
else:
# Generate basic markdown
content = self._generate_markdown(schema)
md_path.write_text(content, encoding='utf-8')
def _generate_markdown(self, schema: Dict[str, Any]) -> str:
"""Generate markdown from schema."""
title = schema.get('title', 'Untitled Schema')
version = schema.get('version', '1.0.0')
description = schema.get('description', '')
# Generate frontmatter
frontmatter = yaml.dump({
'schema-id': schema.get('$id', ''),
'version': version,
'status': 'draft'
}, default_flow_style=False)
# Generate markdown
md = f"""---
{frontmatter}---
# {title} v{version}
## Overview
{description}
## Schema Definition
```json
{json.dumps(schema, indent=2)}
```
## Version History
### v{version}
- Initial version
"""
return md
class SchemaLoaderError(Exception):
"""Base exception for schema loading errors."""
pass
```
**2.3 Update Schema Validator**
**Update:** `markitect/schema_validator.py`
```python
from .schema_loader import MarkdownSchemaLoader
class SchemaValidator:
def __init__(self):
self.schema_generator = SchemaGenerator()
self.jsonschema_available = JSONSCHEMA_AVAILABLE
self.md_loader = MarkdownSchemaLoader() # NEW
def validate_file_against_schema_file(
self,
file_path: Path,
schema_file_path: Path
) -> bool:
"""Validate file against schema (supports .json and .md)."""
# Detect schema file format
if schema_file_path.suffix == '.md':
# Load from markdown
schema_data = self.md_loader.load_schema(schema_file_path)
schema = schema_data['schema']
else:
# Load from JSON (legacy)
schema_content = schema_file_path.read_text(encoding='utf-8')
schema = json.loads(schema_content)
return self.validate_file_against_schema(file_path, schema)
```
**Tasks:**
- [ ] Implement MarkdownSchemaLoader class
- [ ] Add frontmatter extraction (YAML)
- [ ] Add JSON code block extraction
- [ ] Add metadata merging logic
- [ ] Write comprehensive unit tests
- [ ] Update SchemaValidator to use loader
- [ ] Test with example markdown schemas
**Deliverables:**
- [ ] schema_loader.py implementation
- [ ] Unit tests (tests/test_schema_loader.py)
- [ ] Updated schema_validator.py
- [ ] Integration tests
**Duration:** 2-3 days
---
### Phase 3: Schema-for-Schemas (2 days)
**Goal:** Create metaschema to validate all schema files
**3.1 Design Schema-for-Schemas**
**File:** `markitect/schemas/schema-schema-v1.md`
**Purpose:** Validates that schema files follow MarkiTect conventions
**Validates:**
- Required fields ($schema, $id, version, title, description)
- Version format (SemVer)
- $id URL format
- x-markitect-* extensions
- Section classifications
- Content control structures
**3.2 Implement Schema-for-Schemas**
See separate file: `roadmap/schema-of-schemas/schema-schema-v1.md` (to be created)
**3.3 Add Schema Validation Command**
**New CLI command:** `markitect schema-validate`
```python
@cli.command('schema-validate')
@click.argument('schema_file', type=click.Path(exists=True, path_type=Path))
@click.option('--detailed-errors', is_flag=True)
def schema_validate(config, schema_file, detailed_errors):
"""
Validate a schema file against the schema-for-schemas.
Ensures schema files follow MarkiTect conventions and standards.
"""
from .schema_loader import MarkdownSchemaLoader
from .schema_validator import SchemaValidator
loader = MarkdownSchemaLoader()
validator = SchemaValidator()
# Load the schema
try:
schema_data = loader.load_schema(schema_file)
schema = schema_data['schema']
except Exception as e:
click.echo(f"❌ Failed to load schema: {e}", err=True)
sys.exit(1)
# Load schema-for-schemas
metaschema_path = Path(__file__).parent / 'schemas' / 'schema-schema-v1.md'
metaschema_data = loader.load_schema(metaschema_path)
metaschema = metaschema_data['schema']
# Validate
is_valid = validator.validate_schema_against_metaschema(schema, metaschema)
if is_valid:
click.echo(f"✅ Schema is valid: {schema_file.name}")
click.echo(f" Title: {schema.get('title')}")
click.echo(f" Version: {schema.get('version')}")
else:
click.echo(f"❌ Schema validation failed: {schema_file.name}", err=True)
if detailed_errors:
# Show detailed errors
pass
sys.exit(1)
```
**Tasks:**
- [ ] Design schema-for-schemas structure
- [ ] Implement schema-schema-v1.md
- [ ] Add schema validation logic
- [ ] Create `schema-validate` CLI command
- [ ] Test all existing schemas against metaschema
- [ ] Document validation rules
**Deliverables:**
- [ ] schema-schema-v1.md (metaschema)
- [ ] schema-validate command
- [ ] Validation documentation
- [ ] Test suite
**Duration:** 2 days
---
### Phase 4: Schema Migration (1-2 days)
**Goal:** Convert existing schemas to new format
**4.1 Inventory Current Schemas**
Current schemas in database:
```
1. terminology-schema.json → terminology-schema-v1.0.md
2. api-documentation → api-documentation-schema-v1.0.md
3. enhanced-manpage → manpage-schema-v2.0.md
4. markdown-manpage → manpage-schema-v1.0.md (DUPLICATE)
5. markdown-manpage-schema.json → manpage-schema-v1.0.md (DUPLICATE)
```
**Decision matrix:**
- Keep enhanced-manpage as v2.0 (has classifications)
- Merge markdown-manpage variants into v1.0
- Update terminology to v1.0
- Update api-documentation to v1.0
**4.2 Create Migration Script**
**File:** `scripts/migrate_schemas.py`
```python
#!/usr/bin/env python3
"""Migrate schemas to markdown format with versioning."""
from pathlib import Path
from markitect.schema_loader import MarkdownSchemaLoader
from markitect.database import DatabaseManager
def migrate_schema(
db_manager: DatabaseManager,
old_name: str,
new_name: str,
version: str,
domain: str
):
"""Migrate single schema to new format."""
# Get old schema from database
old_schema = db_manager.get_schema_file(old_name)
if not old_schema:
print(f"❌ Schema not found: {old_name}")
return
schema_json = json.loads(old_schema['schema_content'])
# Update metadata
schema_json['version'] = version
schema_json['$id'] = f"https://markitect.dev/schemas/{domain}/v{version.split('.')[0]}"
# Save as markdown
loader = MarkdownSchemaLoader()
md_path = Path(f"markitect/schemas/{new_name}")
loader.save_schema(schema_json, md_path)
print(f"✓ Migrated: {old_name}{new_name}")
# Ingest new schema
# ... ingest markdown schema to database ...
return md_path
def main():
migrations = [
('terminology-schema.json', 'terminology-schema-v1.0.md', '1.0.0', 'terminology'),
('api-documentation', 'api-documentation-schema-v1.0.md', '1.0.0', 'api-documentation'),
('enhanced-manpage', 'manpage-schema-v2.0.md', '2.0.0', 'manpage'),
('markdown-manpage', 'manpage-schema-v1.0.md', '1.0.0', 'manpage'),
]
db = DatabaseManager('markitect.db')
for old, new, version, domain in migrations:
migrate_schema(db, old, new, version, domain)
```
**4.3 Execute Migration**
```bash
# Run migration script
python scripts/migrate_schemas.py
# Validate all new schemas
for schema in markitect/schemas/*-schema-v*.md; do
markitect schema-validate "$schema"
done
# Ingest new schemas
for schema in markitect/schemas/*-schema-v*.md; do
markitect schema-ingest "$schema"
done
```
**4.4 Clean Up Registry**
```bash
# Remove old schemas from database
markitect schema-delete markdown-manpage
markitect schema-delete markdown-manpage-schema.json
markitect schema-delete api-documentation
markitect schema-delete enhanced-manpage
markitect schema-delete terminology-schema.json
# Verify cleanup
markitect schema-list
# Should show only versioned .md schemas
```
**Tasks:**
- [ ] Create schema inventory
- [ ] Write migration script
- [ ] Test migration on one schema
- [ ] Execute full migration
- [ ] Validate all migrated schemas
- [ ] Remove old schemas from database
- [ ] Update examples to use new schema names
**Deliverables:**
- [ ] scripts/migrate_schemas.py
- [ ] All schemas migrated to markdown
- [ ] Clean registry (no duplicates)
- [ ] Migration report
**Duration:** 1-2 days
---
### Phase 5: CLI & Documentation Updates (1 day)
**Goal:** Update CLI and documentation for new system
**5.1 Update CLI Commands**
Commands to update:
- `schema-ingest` - Accept .md files, validate filename
- `schema-list` - Show version in output
- `schema-get` - Export as .md or .json
- `validate` - Accept .md schema files
- `generate-stub` - Work with .md schemas
- `schema-generate` - Output .md format option
- NEW: `schema-validate` - Validate against metaschema
**5.2 Update Documentation**
Files to update:
- README.md - Mention markdown schemas
- examples/terminology/README.md - Use new schema name
- docs/specifications/schema-extensions-spec.md - Document markdown format
- Create: docs/guides/schema-authoring-guide.md
**5.3 Add Schema Templates**
**File:** `templates/schema-template-v1.md`
```markdown
---
schema-id: "https://markitect.dev/schemas/DOMAIN/v1"
version: "1.0.0"
status: "draft"
---
# TITLE Schema v1.0
## Overview
[Description of what this schema validates]
## Document Types
- [Document type 1]
- [Document type 2]
## Usage
\`\`\`bash
markitect validate document.md --schema DOMAIN-schema-v1.0.md
\`\`\`
## Examples
See [examples/DOMAIN/example.md](../../examples/DOMAIN/example.md)
## Schema Definition
\`\`\`json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/DOMAIN/v1",
"version": "1.0.0",
"title": "TITLE Schema",
"description": "Schema for validating DESCRIPTION",
"type": "object",
"properties": {
"headings": {
"type": "object"
}
},
"x-markitect-sections": {},
"x-markitect-content-control": {}
}
\`\`\`
## Validation Rules
### Required Sections
- **SECTION** - Description
### Optional Sections
- **SECTION** - Description
## Version History
### v1.0.0 (YYYY-MM-DD)
- Initial release
```
**Tasks:**
- [ ] Update all CLI commands for .md support
- [ ] Update documentation
- [ ] Create schema authoring guide
- [ ] Add schema template
- [ ] Update examples
- [ ] Test all workflows end-to-end
**Deliverables:**
- [ ] Updated CLI commands
- [ ] Schema authoring guide
- [ ] Schema template
- [ ] Updated examples
- [ ] End-to-end tests
**Duration:** 1 day
---
### Phase 6: Testing & Validation (1 day)
**Goal:** Comprehensive testing of new system
**6.1 Unit Tests**
Test coverage for:
- `schema_naming.py` - Filename validation
- `schema_loader.py` - Markdown parsing
- `schema_validator.py` - Validation with .md schemas
**6.2 Integration Tests**
End-to-end workflows:
1. Create new schema in markdown format
2. Validate schema against schema-for-schemas
3. Ingest schema to database
4. Use schema to validate documents
5. Generate stub from schema
6. Export schema
**6.3 Regression Tests**
Ensure existing functionality still works:
- JSON schemas still load (backward compatibility)
- All existing documents validate
- Schema generation still works
- Stub generation still works
**Tasks:**
- [ ] Write unit tests for new modules
- [ ] Create integration test suite
- [ ] Run regression tests
- [ ] Fix any issues found
- [ ] Achieve >80% code coverage
- [ ] Document test procedures
**Deliverables:**
- [ ] Unit tests (>80% coverage)
- [ ] Integration tests
- [ ] Regression test suite
- [ ] Test documentation
**Duration:** 1 day
---
## Timeline
```
Week 1:
Day 1: Phase 0 (Planning) + Phase 1 (Naming Convention)
Day 2-3: Phase 2 (Markdown Loader)
Day 4-5: Phase 3 (Schema-for-Schemas)
Week 2:
Day 6-7: Phase 4 (Migration)
Day 8: Phase 5 (CLI & Docs)
Day 9: Phase 6 (Testing)
Day 10: Buffer for issues/refinement
```
**Total:** 8-10 days
## Risks & Mitigation
### Risk 1: Parsing Complexity
**Risk:** Markdown parsing more complex than expected
**Probability:** Medium
**Impact:** High
**Mitigation:**
- Start with simple regex-based parser
- Test extensively with edge cases
- Have fallback to simpler format
### Risk 2: Backward Compatibility
**Risk:** Breaking existing workflows
**Probability:** Low
**Impact:** High
**Mitigation:**
- Support both .json and .md during transition
- Provide migration script
- Test thoroughly with existing documents
### Risk 3: Schema-for-Schemas Complexity
**Risk:** Self-referential validation complex
**Probability:** Medium
**Impact:** Medium
**Mitigation:**
- Start with simple metaschema
- Iterate based on actual schemas
- Don't over-engineer initially
## Success Metrics
- [ ] All schemas follow naming convention (5/5)
- [ ] All schemas in markdown format (5/5)
- [ ] All schemas validate against metaschema (5/5)
- [ ] Zero duplicate schemas in registry
- [ ] CLI commands work with .md schemas
- [ ] Documentation comprehensive
- [ ] Test coverage >80%
- [ ] No regression in existing functionality
## Deliverables Checklist
### Code
- [ ] markitect/schema_naming.py
- [ ] markitect/schema_loader.py
- [ ] markitect/schemas/schema-schema-v1.md
- [ ] scripts/migrate_schemas.py
- [ ] Updated CLI commands
- [ ] Unit tests
- [ ] Integration tests
### Documentation
- [ ] SCHEMA_NAMING_SPEC.md
- [ ] SCHEMA_METADATA_SPEC.md
- [ ] Schema authoring guide
- [ ] Migration guide
- [ ] Updated examples
- [ ] IMPLEMENTATION_LOG.md
### Schemas
- [ ] terminology-schema-v1.0.md
- [ ] api-documentation-schema-v1.0.md
- [ ] manpage-schema-v1.0.md
- [ ] manpage-schema-v2.0.md
- [ ] schema-schema-v1.0.md
### Registry
- [ ] Clean schema database
- [ ] Updated schema-catalog.yaml
- [ ] No duplicates
## Next Steps
1. **Review this workplan** - Get approval
2. **Phase 0** - Complete planning artifacts
3. **Phase 1** - Implement naming validation
4. **Checkpoint** - Review progress after Phase 1
5. **Continue** - Execute remaining phases
## Approval
- [ ] Workplan reviewed
- [ ] Approach approved
- [ ] Ready to begin implementation
---
**Status:** Awaiting approval
**Next Action:** Complete Phase 0 planning artifacts

688
tests/test_schema_loader.py Normal file
View File

@@ -0,0 +1,688 @@
"""
Unit tests for schema_loader.py - Markdown schema loading.
Tests the markdown schema loader functionality including:
- Frontmatter extraction (YAML)
- JSON schema extraction from code blocks
- Metadata merging
- Schema saving
- Error handling
"""
import pytest
import json
import yaml
from pathlib import Path
from markitect.schema_loader import (
MarkdownSchemaLoader,
SchemaLoaderError,
InvalidSchemaFormatError,
SchemaNotFoundError
)
# Test fixtures
@pytest.fixture
def temp_schema_dir(tmp_path):
"""Create temporary directory for schema files."""
schema_dir = tmp_path / "schemas"
schema_dir.mkdir()
return schema_dir
@pytest.fixture
def simple_schema_md():
"""Simple valid markdown schema content."""
return """---
schema-id: "https://markitect.dev/schemas/test/v1"
version: "1.0.0"
status: "stable"
---
# Test Schema v1.0
## Overview
This is a test schema for validation.
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/test/v1",
"version": "1.0.0",
"title": "Test Schema",
"description": "Schema for testing",
"type": "object",
"properties": {
"name": {"type": "string"}
}
}
```
## Version History
### v1.0.0
- Initial version
"""
@pytest.fixture
def schema_without_frontmatter():
"""Schema without YAML frontmatter."""
return """# Test Schema v1.0
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Test Schema",
"type": "object"
}
```
"""
@pytest.fixture
def schema_multiple_json_blocks():
"""Schema with multiple JSON code blocks."""
return """---
version: "1.0.0"
---
# Test Schema
## Example Usage
```json
{
"example": "This is not the schema"
}
```
## Schema Definition
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Test Schema",
"type": "object"
}
```
## More Examples
```json
{
"another": "example"
}
```
"""
class TestMarkdownSchemaLoader:
"""Tests for MarkdownSchemaLoader class."""
def test_init(self):
"""Test loader initialization."""
loader = MarkdownSchemaLoader()
assert loader is not None
assert hasattr(loader, 'frontmatter_pattern')
assert hasattr(loader, 'json_code_block_pattern')
def test_load_simple_schema(self, temp_schema_dir, simple_schema_md):
"""Test loading a simple valid schema."""
schema_file = temp_schema_dir / "test-schema-v1.0.md"
schema_file.write_text(simple_schema_md)
loader = MarkdownSchemaLoader()
result = loader.load_schema(schema_file)
assert 'schema' in result
assert 'metadata' in result
assert 'documentation' in result
assert 'source_file' in result
# Check schema content
schema = result['schema']
assert schema['title'] == 'Test Schema'
assert schema['version'] == '1.0.0'
assert schema['type'] == 'object'
# Check metadata
metadata = result['metadata']
assert metadata['version'] == '1.0.0'
assert metadata['status'] == 'stable'
# Check source tracking
assert result['source_file'] == str(schema_file)
assert 'x-markitect-source' in schema
assert schema['x-markitect-source']['format'] == 'markdown'
def test_load_schema_file_not_found(self):
"""Test loading non-existent file raises FileNotFoundError."""
loader = MarkdownSchemaLoader()
with pytest.raises(FileNotFoundError, match="Schema file not found"):
loader.load_schema(Path("/nonexistent/schema.md"))
def test_load_schema_without_json(self, temp_schema_dir):
"""Test loading markdown without JSON schema raises error."""
schema_file = temp_schema_dir / "no-schema.md"
schema_file.write_text("# Just a heading\n\nNo schema here.")
loader = MarkdownSchemaLoader()
with pytest.raises(SchemaNotFoundError, match="No JSON schema found"):
loader.load_schema(schema_file)
def test_load_schema_invalid_json(self, temp_schema_dir):
"""Test loading markdown with invalid JSON raises error."""
content = """# Test
```json
{invalid json}
```
"""
schema_file = temp_schema_dir / "invalid.md"
schema_file.write_text(content)
loader = MarkdownSchemaLoader()
with pytest.raises(InvalidSchemaFormatError, match="Invalid JSON"):
loader.load_schema(schema_file)
class TestExtractFrontmatter:
"""Tests for frontmatter extraction."""
def test_extract_valid_frontmatter(self, simple_schema_md):
"""Test extracting valid YAML frontmatter."""
loader = MarkdownSchemaLoader()
metadata = loader._extract_frontmatter(simple_schema_md)
assert metadata['schema-id'] == 'https://markitect.dev/schemas/test/v1'
assert metadata['version'] == '1.0.0'
assert metadata['status'] == 'stable'
def test_extract_no_frontmatter(self, schema_without_frontmatter):
"""Test extracting from content without frontmatter returns empty dict."""
loader = MarkdownSchemaLoader()
metadata = loader._extract_frontmatter(schema_without_frontmatter)
assert metadata == {}
def test_extract_invalid_yaml_frontmatter(self):
"""Test extracting invalid YAML raises error."""
content = """---
invalid: yaml: syntax: error
---
# Content
"""
loader = MarkdownSchemaLoader()
with pytest.raises(InvalidSchemaFormatError, match="Invalid YAML"):
loader._extract_frontmatter(content)
def test_extract_non_dict_frontmatter(self):
"""Test extracting non-dictionary YAML raises error."""
content = """---
- list
- not
- dict
---
# Content
"""
loader = MarkdownSchemaLoader()
with pytest.raises(InvalidSchemaFormatError, match="must be a YAML dictionary"):
loader._extract_frontmatter(content)
def test_extract_complex_frontmatter(self):
"""Test extracting complex frontmatter with nested structures."""
content = """---
schema-id: "https://example.com/schema"
version: "1.0.0"
tags:
- documentation
- schema
metadata:
author: "Test Author"
created: "2026-01-04"
---
# Content
"""
loader = MarkdownSchemaLoader()
metadata = loader._extract_frontmatter(content)
assert metadata['tags'] == ['documentation', 'schema']
assert metadata['metadata']['author'] == 'Test Author'
class TestExtractJsonSchema:
"""Tests for JSON schema extraction."""
def test_extract_single_json_block(self, schema_without_frontmatter):
"""Test extracting single JSON block."""
loader = MarkdownSchemaLoader()
schema = loader._extract_json_schema(schema_without_frontmatter)
assert schema is not None
assert schema['title'] == 'Test Schema'
assert schema['type'] == 'object'
def test_extract_from_schema_definition_section(self, schema_multiple_json_blocks):
"""Test preferring JSON block under Schema Definition heading."""
loader = MarkdownSchemaLoader()
schema = loader._extract_json_schema(schema_multiple_json_blocks)
assert schema is not None
assert schema['title'] == 'Test Schema'
# Should get the schema from Schema Definition section, not the example
def test_extract_no_json_block(self):
"""Test extracting from content with no JSON blocks returns None."""
content = "# Just text\n\nNo code blocks here."
loader = MarkdownSchemaLoader()
schema = loader._extract_json_schema(content)
assert schema is None
def test_extract_invalid_json_block(self):
"""Test extracting invalid JSON raises error."""
content = """# Test
```json
{invalid}
```
"""
loader = MarkdownSchemaLoader()
with pytest.raises(InvalidSchemaFormatError, match="Invalid JSON"):
loader._extract_json_schema(content)
def test_extract_non_object_json(self):
"""Test extracting JSON array (non-object) raises error."""
content = """# Test
```json
["array", "not", "object"]
```
"""
loader = MarkdownSchemaLoader()
with pytest.raises(InvalidSchemaFormatError, match="must be a JSON object"):
loader._extract_json_schema(content)
class TestMergeMetadata:
"""Tests for metadata merging."""
def test_merge_basic_metadata(self):
"""Test merging frontmatter into schema."""
loader = MarkdownSchemaLoader()
schema = {
'title': 'Test Schema',
'type': 'object'
}
metadata = {
'version': '2.0.0',
'schema-id': 'https://example.com/v2',
'status': 'draft'
}
merged = loader._merge_metadata(schema, metadata, Path('test.md'))
# Version should be overridden
assert merged['version'] == '2.0.0'
# $id should be set from schema-id
assert merged['$id'] == 'https://example.com/v2'
# Status should be in x-markitect-metadata
assert merged['x-markitect-metadata']['status'] == 'draft'
# Source tracking should be added
assert merged['x-markitect-source']['file'] == 'test.md'
assert merged['x-markitect-source']['format'] == 'markdown'
def test_merge_preserves_schema_fields(self):
"""Test merging doesn't remove existing schema fields."""
loader = MarkdownSchemaLoader()
schema = {
'title': 'Test',
'type': 'object',
'properties': {'name': {'type': 'string'}}
}
merged = loader._merge_metadata(schema, {}, Path('test.md'))
assert merged['title'] == 'Test'
assert merged['type'] == 'object'
assert 'properties' in merged
def test_merge_frontmatter_takes_precedence(self):
"""Test frontmatter overrides schema values."""
loader = MarkdownSchemaLoader()
schema = {
'version': '1.0.0',
'$id': 'old-id'
}
metadata = {
'version': '2.0.0',
'schema-id': 'new-id'
}
merged = loader._merge_metadata(schema, metadata, Path('test.md'))
assert merged['version'] == '2.0.0'
assert merged['$id'] == 'new-id'
class TestSaveSchema:
"""Tests for saving schemas to markdown."""
def test_save_simple_schema(self, temp_schema_dir):
"""Test saving a schema to markdown file."""
loader = MarkdownSchemaLoader()
schema = {
'$schema': 'http://json-schema.org/draft-07/schema#',
'$id': 'https://example.com/schema/v1',
'version': '1.0.0',
'title': 'Test Schema',
'description': 'A test schema',
'type': 'object'
}
output_file = temp_schema_dir / 'output-schema-v1.0.md'
loader.save_schema(schema, output_file)
assert output_file.exists()
# Verify content
content = output_file.read_text()
assert '---' in content # Frontmatter
assert 'Test Schema v1.0.0' in content # Title
assert '```json' in content # JSON block
assert '"title": "Test Schema"' in content
def test_save_creates_parent_directory(self, temp_schema_dir):
"""Test saving creates parent directories if needed."""
loader = MarkdownSchemaLoader()
schema = {'title': 'Test', 'type': 'object'}
output_file = temp_schema_dir / 'nested' / 'dir' / 'schema.md'
loader.save_schema(schema, output_file)
assert output_file.exists()
assert output_file.parent.exists()
def test_save_with_custom_frontmatter(self, temp_schema_dir):
"""Test saving with custom frontmatter."""
loader = MarkdownSchemaLoader()
schema = {'title': 'Test', 'type': 'object'}
frontmatter = {
'schema-id': 'https://custom.com/schema',
'status': 'experimental',
'tags': ['test', 'custom']
}
output_file = temp_schema_dir / 'custom.md'
loader.save_schema(schema, output_file, frontmatter=frontmatter)
content = output_file.read_text()
assert 'experimental' in content
assert 'https://custom.com/schema' in content
def test_save_and_reload_roundtrip(self, temp_schema_dir):
"""Test saving and reloading produces same schema."""
loader = MarkdownSchemaLoader()
original_schema = {
'$schema': 'http://json-schema.org/draft-07/schema#',
'version': '1.0.0',
'title': 'Roundtrip Test',
'type': 'object',
'properties': {
'name': {'type': 'string'},
'age': {'type': 'integer'}
}
}
schema_file = temp_schema_dir / 'roundtrip-schema-v1.0.md'
loader.save_schema(original_schema, schema_file)
# Reload
loaded = loader.load_schema(schema_file)
loaded_schema = loaded['schema']
# Compare key fields (ignoring x-markitect-source added during load)
assert loaded_schema['title'] == original_schema['title']
assert loaded_schema['type'] == original_schema['type']
assert loaded_schema['properties'] == original_schema['properties']
class TestGenerateMarkdown:
"""Tests for markdown generation."""
def test_generate_basic_markdown(self):
"""Test generating basic markdown from schema."""
loader = MarkdownSchemaLoader()
schema = {
'title': 'Test Schema',
'version': '1.0.0',
'description': 'Test description',
'type': 'object'
}
md = loader._generate_markdown(schema)
assert 'Test Schema v1.0.0' in md
assert 'Test description' in md
assert '```json' in md
assert '"title": "Test Schema"' in md
assert '---' in md # Frontmatter
def test_generate_includes_frontmatter(self):
"""Test generated markdown includes frontmatter."""
loader = MarkdownSchemaLoader()
schema = {
'$id': 'https://example.com/schema',
'title': 'Test',
'version': '2.0.0',
'type': 'object'
}
md = loader._generate_markdown(schema)
# Parse frontmatter
lines = md.split('\n')
assert lines[0] == '---'
# Find end of frontmatter
end_idx = lines[1:].index('---') + 1
frontmatter_yaml = '\n'.join(lines[1:end_idx])
frontmatter = yaml.safe_load(frontmatter_yaml)
assert frontmatter['version'] == '2.0.0'
assert frontmatter['schema-id'] == 'https://example.com/schema'
class TestListJsonBlocks:
"""Tests for listing JSON blocks."""
def test_list_single_block(self, schema_without_frontmatter):
"""Test listing single JSON block."""
loader = MarkdownSchemaLoader()
blocks = loader.list_json_blocks(schema_without_frontmatter)
assert len(blocks) == 1
assert '"title": "Test Schema"' in blocks[0][1]
def test_list_multiple_blocks(self, schema_multiple_json_blocks):
"""Test listing multiple JSON blocks."""
loader = MarkdownSchemaLoader()
blocks = loader.list_json_blocks(schema_multiple_json_blocks)
assert len(blocks) == 3
# First block
assert '"example"' in blocks[0][1]
# Second block (schema)
assert '"title": "Test Schema"' in blocks[1][1]
# Third block
assert '"another"' in blocks[2][1]
def test_list_no_blocks(self):
"""Test listing with no JSON blocks."""
loader = MarkdownSchemaLoader()
blocks = loader.list_json_blocks("# Just text\n\nNo code blocks.")
assert len(blocks) == 0
class TestValidateSchemaStructure:
"""Tests for schema structure validation."""
def test_validate_complete_schema(self):
"""Test validating complete schema returns no issues."""
loader = MarkdownSchemaLoader()
schema = {
'$schema': 'http://json-schema.org/draft-07/schema#',
'$id': 'https://example.com/schema',
'version': '1.0.0',
'title': 'Test Schema',
'description': 'Test description',
'type': 'object'
}
issues = loader.validate_schema_structure(schema)
assert len(issues) == 0
def test_validate_missing_required_fields(self):
"""Test validation detects missing required fields."""
loader = MarkdownSchemaLoader()
schema = {'type': 'object'}
issues = loader.validate_schema_structure(schema)
assert len(issues) > 0
assert any('$schema' in issue for issue in issues)
assert any('title' in issue for issue in issues)
assert any('description' in issue for issue in issues)
def test_validate_missing_version(self):
"""Test validation detects missing version field."""
loader = MarkdownSchemaLoader()
schema = {
'$schema': 'http://json-schema.org/draft-07/schema#',
'title': 'Test',
'type': 'object'
}
issues = loader.validate_schema_structure(schema)
assert any('version' in issue for issue in issues)
def test_validate_invalid_id_format(self):
"""Test validation detects non-HTTPS $id."""
loader = MarkdownSchemaLoader()
schema = {
'$schema': 'http://json-schema.org/draft-07/schema#',
'$id': 'http://example.com/schema', # HTTP not HTTPS
'version': '1.0.0',
'title': 'Test',
'type': 'object'
}
issues = loader.validate_schema_structure(schema)
assert any('HTTPS' in issue for issue in issues)
class TestEdgeCases:
"""Tests for edge cases and error conditions."""
def test_load_empty_file(self, temp_schema_dir):
"""Test loading empty file raises error."""
schema_file = temp_schema_dir / 'empty.md'
schema_file.write_text('')
loader = MarkdownSchemaLoader()
with pytest.raises(SchemaNotFoundError):
loader.load_schema(schema_file)
def test_load_binary_file(self, temp_schema_dir):
"""Test loading binary file with invalid UTF-8 raises error."""
schema_file = temp_schema_dir / 'binary.md'
# Use invalid UTF-8 sequences that will trigger UnicodeDecodeError
schema_file.write_bytes(b'\xff\xfe\x00\x00\x80\x81\x82')
loader = MarkdownSchemaLoader()
with pytest.raises(InvalidSchemaFormatError):
loader.load_schema(schema_file)
def test_malformed_code_block(self, temp_schema_dir):
"""Test handling malformed code block delimiters."""
content = """# Test
```json
{"valid": "json"
# Missing closing backticks
"""
schema_file = temp_schema_dir / 'malformed.md'
schema_file.write_text(content)
loader = MarkdownSchemaLoader()
with pytest.raises(SchemaNotFoundError):
loader.load_schema(schema_file)
def test_very_large_schema(self, temp_schema_dir):
"""Test loading very large schema."""
# Create large schema with many properties
large_schema = {
'$schema': 'http://json-schema.org/draft-07/schema#',
'title': 'Large Schema',
'type': 'object',
'properties': {
f'prop_{i}': {'type': 'string'}
for i in range(1000)
}
}
content = f"""# Large Schema
```json
{json.dumps(large_schema, indent=2)}
```
"""
schema_file = temp_schema_dir / 'large.md'
schema_file.write_text(content)
loader = MarkdownSchemaLoader()
result = loader.load_schema(schema_file)
assert len(result['schema']['properties']) == 1000

View File

@@ -0,0 +1,300 @@
"""
Unit tests for schema metaschema validation.
Tests that schemas validate correctly against the schema-for-schemas metaschema.
"""
import pytest
import json
from pathlib import Path
try:
from jsonschema import Draft7Validator
JSONSCHEMA_AVAILABLE = True
except ImportError:
JSONSCHEMA_AVAILABLE = False
from markitect.schema_loader import MarkdownSchemaLoader
@pytest.fixture
def loader():
"""Create a schema loader instance."""
return MarkdownSchemaLoader()
@pytest.fixture
def metaschema(loader):
"""Load the metaschema."""
metaschema_path = Path(__file__).parent.parent / 'markitect' / 'schemas' / 'schema-schema-v1.0.md'
metaschema_data = loader.load_schema(metaschema_path)
return metaschema_data['schema']
@pytest.mark.skipif(not JSONSCHEMA_AVAILABLE, reason="jsonschema not installed")
class TestMetaschemaValidation:
"""Tests for validating schemas against the metaschema."""
def test_metaschema_self_validation(self, loader, metaschema):
"""Test that metaschema validates itself."""
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(metaschema))
assert len(errors) == 0, f"Metaschema should validate itself, but got errors: {errors}"
def test_manpage_schema_validation(self, loader, metaschema):
"""Test that manpage schema validates against metaschema."""
manpage_path = Path(__file__).parent.parent / 'markitect' / 'schemas' / 'manpage-schema-v1.0.md'
manpage_data = loader.load_schema(manpage_path)
manpage_schema = manpage_data['schema']
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(manpage_schema))
assert len(errors) == 0, f"Manpage schema should be valid, but got errors: {[e.message for e in errors]}"
def test_required_fields_enforced(self, metaschema):
"""Test that metaschema enforces required fields."""
# Schema missing required fields
invalid_schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
# Missing: $id, title, description, version
}
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(invalid_schema))
# Should have errors for missing required fields
assert len(errors) > 0
error_messages = [e.message for e in errors]
# Check that required fields are mentioned
required_fields = ['$id', 'title', 'description', 'version']
for field in required_fields:
assert any(field in msg for msg in error_messages), \
f"Should report missing required field: {field}"
def test_version_format_validation(self, metaschema):
"""Test that metaschema validates version format."""
# Invalid version formats
invalid_versions = [
"1.0", # Missing patch
"v1.0.0", # Has v prefix
"1", # Only major
"1.0.0.0", # Too many parts
]
for invalid_version in invalid_versions:
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/test/v1.0",
"title": "Test Schema",
"description": "Test schema for validation",
"version": invalid_version
}
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(schema))
assert len(errors) > 0, f"Should reject invalid version: {invalid_version}"
assert any('pattern' in str(e.schema_path) for e in errors), \
f"Should be a pattern error for version: {invalid_version}"
def test_valid_version_format(self, metaschema):
"""Test that valid version formats are accepted."""
valid_versions = [
"1.0.0",
"2.5.3",
"10.25.99",
"0.0.1",
]
for valid_version in valid_versions:
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/test/v1.0",
"title": "Test Schema",
"description": "Test schema for validation",
"version": valid_version
}
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(schema))
# Filter out errors not related to version
version_errors = [e for e in errors if 'version' in str(e.path)]
assert len(version_errors) == 0, \
f"Should accept valid version: {valid_version}, but got errors: {version_errors}"
def test_id_format_validation(self, metaschema):
"""Test that metaschema validates $id format."""
invalid_ids = [
"http://example.com/schema", # Not HTTPS
"https://example.com/schema", # No version
"schema/v1.0", # Not a URL
"https://example.com/schemas/test/1.0", # No 'v' prefix
]
for invalid_id in invalid_ids:
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": invalid_id,
"title": "Test Schema",
"description": "Test schema for validation",
"version": "1.0.0"
}
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(schema))
assert len(errors) > 0, f"Should reject invalid $id: {invalid_id}"
def test_valid_id_format(self, metaschema):
"""Test that valid $id formats are accepted."""
valid_ids = [
"https://markitect.dev/schemas/test/v1.0",
"https://example.com/schemas/my-schema/v2.5",
"https://api.example.com/schemas/domain/v10.25",
]
for valid_id in valid_ids:
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": valid_id,
"title": "Test Schema",
"description": "Test schema for validation",
"version": "1.0.0"
}
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(schema))
# Filter out errors not related to $id
id_errors = [e for e in errors if '$id' in str(e.path)]
assert len(id_errors) == 0, \
f"Should accept valid $id: {valid_id}, but got errors: {id_errors}"
def test_section_classification_validation(self, metaschema):
"""Test that section classifications are validated."""
# Invalid classification
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/test/v1.0",
"title": "Test Schema",
"description": "Test schema for validation",
"version": "1.0.0",
"x-markitect-sections": {
"SYNOPSIS": {
"classification": "mandatory", # Invalid, should be 'required'
"heading_level": 2
}
}
}
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(schema))
assert len(errors) > 0
# Should have error about invalid enum value
assert any('enum' in str(e.schema_path) or 'mandatory' in e.message for e in errors)
def test_valid_section_classifications(self, metaschema):
"""Test that valid section classifications are accepted."""
classifications = ['required', 'recommended', 'optional', 'discouraged', 'improper']
for classification in classifications:
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/test/v1.0",
"title": "Test Schema",
"description": "Test schema for validation",
"version": "1.0.0",
"x-markitect-sections": {
"TEST_SECTION": {
"classification": classification,
"heading_level": 2
}
}
}
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(schema))
# Filter errors related to classification
classification_errors = [e for e in errors if 'classification' in str(e.path)]
assert len(classification_errors) == 0, \
f"Should accept valid classification: {classification}"
def test_schema_with_all_extensions(self, metaschema):
"""Test schema with all MarkiTect extensions."""
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/test/v1.0",
"title": "Complete Test Schema",
"description": "Schema with all MarkiTect extensions",
"version": "1.0.0",
"type": "object",
"x-markitect-sections": {
"SYNOPSIS": {
"classification": "required",
"heading_level": 2,
"content_instruction": "Brief overview",
"min_paragraphs": 1,
"max_paragraphs": 3
}
},
"x-markitect-content-control": {
"synopsis": {
"required_patterns": ["\\*\\*.*\\*\\*"],
"content_quality": {
"min_words": 10,
"max_words": 100,
"readability_target": "technical"
}
}
},
"x-markitect-metadata": {
"status": "stable",
"authors": ["Test Author <test@example.com>"],
"tags": ["test", "example"]
}
}
validator = Draft7Validator(metaschema)
errors = list(validator.iter_errors(schema))
assert len(errors) == 0, f"Complete schema should be valid, but got errors: {[e.message for e in errors]}"
class TestSchemaLoaderIntegration:
"""Integration tests for schema loader with metaschema."""
def test_load_and_validate_manpage_schema(self, loader, metaschema):
"""Test loading and validating manpage schema."""
manpage_path = Path(__file__).parent.parent / 'markitect' / 'schemas' / 'manpage-schema-v1.0.md'
# Load schema
schema_data = loader.load_schema(manpage_path)
schema = schema_data['schema']
# Check metadata was merged
assert 'x-markitect-source' in schema
assert schema['x-markitect-source']['format'] == 'markdown'
assert schema['x-markitect-source']['filename'] == 'manpage-schema-v1.0.md'
# Validate structure
issues = loader.validate_schema_structure(schema)
# Should have no critical issues
assert all('Missing' not in issue or 'recommended' in issue.lower() for issue in issues)
def test_metaschema_structure_validation(self, loader):
"""Test metaschema structure with loader's validator."""
metaschema_path = Path(__file__).parent.parent / 'markitect' / 'schemas' / 'schema-schema-v1.0.md'
metaschema_data = loader.load_schema(metaschema_path)
metaschema = metaschema_data['schema']
# Validate structure
issues = loader.validate_schema_structure(metaschema)
# Metaschema should have minimal issues
critical_issues = [i for i in issues if 'Missing required field' in i]
assert len(critical_issues) == 0, f"Metaschema has critical issues: {critical_issues}"

390
tests/test_schema_naming.py Normal file
View File

@@ -0,0 +1,390 @@
"""
Unit tests for schema_naming.py - Schema filename validation.
Tests the schema naming convention enforcement including:
- Valid filename validation
- Invalid filename detection
- Metadata extraction
- Filename suggestion
- Error message generation
"""
import pytest
from markitect.schema_naming import (
validate_schema_filename,
suggest_schema_filename,
extract_schema_metadata,
get_validation_errors,
is_valid_schema_filename,
format_validation_message,
SchemaFilenameError,
SCHEMA_FILENAME_PATTERN
)
class TestValidateSchemaFilename:
"""Tests for validate_schema_filename function."""
def test_valid_simple_schema(self):
"""Test validation of simple valid schema filename."""
is_valid, metadata = validate_schema_filename("manpage-schema-v1.0.md")
assert is_valid is True
assert metadata is not None
assert metadata['domain'] == 'manpage'
assert metadata['version'] == '1.0'
assert metadata['major'] == 1
assert metadata['minor'] == 0
assert metadata['filename'] == 'manpage-schema-v1.0.md'
def test_valid_hyphenated_domain(self):
"""Test validation with multi-word hyphenated domain."""
is_valid, metadata = validate_schema_filename("api-documentation-schema-v1.0.md")
assert is_valid is True
assert metadata['domain'] == 'api-documentation'
assert metadata['version'] == '1.0'
def test_valid_with_numbers_in_domain(self):
"""Test validation with numbers in domain name."""
is_valid, metadata = validate_schema_filename("arc42-schema-v1.0.md")
assert is_valid is True
assert metadata['domain'] == 'arc42'
def test_valid_higher_version(self):
"""Test validation with version > 1.0."""
is_valid, metadata = validate_schema_filename("manpage-schema-v2.5.md")
assert is_valid is True
assert metadata['version'] == '2.5'
assert metadata['major'] == 2
assert metadata['minor'] == 5
def test_valid_double_digit_version(self):
"""Test validation with double-digit version numbers."""
is_valid, metadata = validate_schema_filename("manpage-schema-v10.25.md")
assert is_valid is True
assert metadata['major'] == 10
assert metadata['minor'] == 25
def test_invalid_wrong_extension(self):
"""Test that .json extension is invalid."""
is_valid, metadata = validate_schema_filename("manpage-schema-v1.0.json")
assert is_valid is False
assert metadata is None
def test_invalid_no_extension(self):
"""Test that filename without extension is invalid."""
is_valid, metadata = validate_schema_filename("manpage-schema-v1.0")
assert is_valid is False
assert metadata is None
def test_invalid_missing_schema_keyword(self):
"""Test that filename without 'schema' keyword is invalid."""
is_valid, metadata = validate_schema_filename("manpage-v1.0.md")
assert is_valid is False
assert metadata is None
def test_invalid_missing_version(self):
"""Test that filename without version is invalid."""
is_valid, metadata = validate_schema_filename("manpage-schema.md")
assert is_valid is False
assert metadata is None
def test_invalid_wrong_version_format(self):
"""Test that version without 'v' prefix is invalid."""
is_valid, metadata = validate_schema_filename("manpage-schema-1.0.md")
assert is_valid is False
assert metadata is None
def test_invalid_missing_minor_version(self):
"""Test that version without minor number is invalid."""
is_valid, metadata = validate_schema_filename("manpage-schema-v1.md")
assert is_valid is False
assert metadata is None
def test_invalid_uppercase_letters(self):
"""Test that uppercase letters make filename invalid."""
is_valid, metadata = validate_schema_filename("ManPage-Schema-v1.0.md")
assert is_valid is False
assert metadata is None
def test_invalid_starting_with_number(self):
"""Test that domain starting with number is invalid."""
is_valid, metadata = validate_schema_filename("42answers-schema-v1.0.md")
assert is_valid is False
assert metadata is None
def test_invalid_starting_with_hyphen(self):
"""Test that domain starting with hyphen is invalid."""
is_valid, metadata = validate_schema_filename("-manpage-schema-v1.0.md")
assert is_valid is False
assert metadata is None
def test_invalid_special_characters(self):
"""Test that special characters in domain are invalid."""
is_valid, metadata = validate_schema_filename("man_page-schema-v1.0.md")
assert is_valid is False
assert metadata is None
class TestSuggestSchemaFilename:
"""Tests for suggest_schema_filename function."""
def test_suggest_simple_domain(self):
"""Test suggestion for simple domain."""
filename = suggest_schema_filename("manpage", "1.0")
assert filename == "manpage-schema-v1.0.md"
def test_suggest_with_spaces(self):
"""Test suggestion normalizes spaces to hyphens."""
filename = suggest_schema_filename("API Documentation", "1.0")
assert filename == "api-documentation-schema-v1.0.md"
def test_suggest_with_underscores(self):
"""Test suggestion normalizes underscores to hyphens."""
filename = suggest_schema_filename("my_custom_type", "1.0")
assert filename == "my-custom-type-schema-v1.0.md"
def test_suggest_with_uppercase(self):
"""Test suggestion converts to lowercase."""
filename = suggest_schema_filename("MyCustomType", "1.0")
assert filename == "mycustomtype-schema-v1.0.md"
def test_suggest_mixed_normalization(self):
"""Test suggestion with mixed case and separators."""
filename = suggest_schema_filename("My_Custom Type", "1.0")
assert filename == "my-custom-type-schema-v1.0.md"
def test_suggest_higher_version(self):
"""Test suggestion with version > 1.0."""
filename = suggest_schema_filename("manpage", "2.5")
assert filename == "manpage-schema-v2.5.md"
def test_suggest_double_digit_version(self):
"""Test suggestion with double-digit version."""
filename = suggest_schema_filename("manpage", "10.25")
assert filename == "manpage-schema-v10.25.md"
def test_suggest_consecutive_hyphens(self):
"""Test suggestion removes consecutive hyphens."""
filename = suggest_schema_filename("my--custom---type", "1.0")
assert filename == "my-custom-type-schema-v1.0.md"
def test_suggest_leading_trailing_hyphens(self):
"""Test suggestion removes leading/trailing hyphens."""
filename = suggest_schema_filename("-my-type-", "1.0")
assert filename == "my-type-schema-v1.0.md"
def test_suggest_default_version(self):
"""Test suggestion uses default version 1.0."""
filename = suggest_schema_filename("manpage")
assert filename == "manpage-schema-v1.0.md"
def test_suggest_empty_domain_raises_error(self):
"""Test that empty domain raises ValueError."""
with pytest.raises(ValueError, match="Domain cannot be empty"):
suggest_schema_filename("", "1.0")
def test_suggest_invalid_version_format_raises_error(self):
"""Test that invalid version format raises ValueError."""
with pytest.raises(ValueError, match="must be in format 'major.minor'"):
suggest_schema_filename("manpage", "1")
def test_suggest_invalid_version_parts_raises_error(self):
"""Test that non-integer version parts raise ValueError."""
with pytest.raises(ValueError, match="major and minor must be integers"):
suggest_schema_filename("manpage", "1.x")
def test_suggest_negative_version_raises_error(self):
"""Test that negative version numbers raise ValueError."""
with pytest.raises(ValueError, match="must be non-negative"):
suggest_schema_filename("manpage", "-1.0")
def test_suggest_without_normalization(self):
"""Test suggestion without normalization (must already be valid)."""
filename = suggest_schema_filename("manpage", "1.0", normalize=False)
assert filename == "manpage-schema-v1.0.md"
def test_suggest_without_normalization_invalid_raises_error(self):
"""Test that invalid domain without normalization raises ValueError."""
with pytest.raises(ValueError, match="Invalid domain"):
suggest_schema_filename("My Custom Type", "1.0", normalize=False)
class TestExtractSchemaMetadata:
"""Tests for extract_schema_metadata function."""
def test_extract_valid_metadata(self):
"""Test metadata extraction from valid filename."""
metadata = extract_schema_metadata("manpage-schema-v1.0.md")
assert metadata['domain'] == 'manpage'
assert metadata['version'] == '1.0'
assert metadata['major'] == 1
assert metadata['minor'] == 0
def test_extract_invalid_raises_error(self):
"""Test that invalid filename raises SchemaFilenameError."""
with pytest.raises(SchemaFilenameError, match="Invalid schema filename"):
extract_schema_metadata("invalid.json")
class TestGetValidationErrors:
"""Tests for get_validation_errors function."""
def test_valid_filename_no_errors(self):
"""Test that valid filename returns empty error list."""
errors = get_validation_errors("manpage-schema-v1.0.md")
assert errors == []
def test_wrong_extension_error(self):
"""Test error for wrong file extension."""
errors = get_validation_errors("manpage-schema-v1.0.json")
assert len(errors) > 0
assert any("Extension must be '.md'" in e for e in errors)
def test_missing_version_error(self):
"""Test error for missing version."""
errors = get_validation_errors("manpage-schema.md")
assert len(errors) > 0
assert any("Missing version" in e for e in errors)
def test_missing_schema_keyword_error(self):
"""Test error for missing schema keyword."""
errors = get_validation_errors("manpage-v1.0.md")
assert len(errors) > 0
assert any("Missing '-schema-'" in e for e in errors)
def test_uppercase_letters_error(self):
"""Test error for uppercase letters."""
errors = get_validation_errors("ManPage-schema-v1.0.md")
assert len(errors) > 0
assert any("must be lowercase" in e for e in errors)
def test_invalid_domain_error(self):
"""Test error for invalid domain format."""
errors = get_validation_errors("42answer-schema-v1.0.md")
assert len(errors) > 0
# Should detect that domain doesn't start with letter
class TestIsValidSchemaFilename:
"""Tests for is_valid_schema_filename convenience function."""
def test_is_valid_returns_true(self):
"""Test that valid filename returns True."""
assert is_valid_schema_filename("manpage-schema-v1.0.md") is True
def test_is_valid_returns_false(self):
"""Test that invalid filename returns False."""
assert is_valid_schema_filename("invalid.json") is False
class TestFormatValidationMessage:
"""Tests for format_validation_message function."""
def test_format_message_valid_filename(self):
"""Test formatting message for valid filename."""
message = format_validation_message("manpage-schema-v1.0.md")
assert "✅ Valid" in message
assert "manpage-schema-v1.0.md" in message
def test_format_message_invalid_filename(self):
"""Test formatting message for invalid filename."""
message = format_validation_message("invalid.json")
assert "❌ Invalid" in message
assert "Errors:" in message
assert "Expected format:" in message
assert "Example:" in message
def test_format_message_includes_suggestion(self):
"""Test that message includes filename suggestion."""
message = format_validation_message("manpage.json")
assert "Suggested filename:" in message
# Should suggest something like manpage-schema-v1.0.md
class TestSchemaFilenamePattern:
"""Tests for the regex pattern itself."""
def test_pattern_matches_valid_filenames(self):
"""Test that pattern matches all valid filename variations."""
valid_filenames = [
"manpage-schema-v1.0.md",
"api-documentation-schema-v1.0.md",
"arc42-schema-v1.0.md",
"a-schema-v1.0.md", # Single letter domain
"my-long-domain-name-schema-v1.0.md",
"manpage-schema-v10.25.md", # Double digit versions
]
for filename in valid_filenames:
match = SCHEMA_FILENAME_PATTERN.match(filename)
assert match is not None, f"Pattern should match {filename}"
def test_pattern_rejects_invalid_filenames(self):
"""Test that pattern rejects invalid filenames."""
invalid_filenames = [
"manpage-schema-v1.0.json", # Wrong extension
"manpage-v1.0.md", # Missing schema keyword
"manpage-schema.md", # Missing version
"ManPage-schema-v1.0.md", # Uppercase
"42answer-schema-v1.0.md", # Starts with number
"-manpage-schema-v1.0.md", # Starts with hyphen
"man_page-schema-v1.0.md", # Underscore in domain
"manpage-schema-1.0.md", # Missing 'v' prefix
"manpage-schema-v1.md", # Missing minor version
]
for filename in invalid_filenames:
match = SCHEMA_FILENAME_PATTERN.match(filename)
assert match is None, f"Pattern should not match {filename}"
class TestEdgeCases:
"""Tests for edge cases and boundary conditions."""
def test_very_long_domain_name(self):
"""Test with very long domain name."""
long_domain = "a" * 100
filename = suggest_schema_filename(long_domain, "1.0")
assert is_valid_schema_filename(filename)
def test_domain_with_many_hyphens(self):
"""Test domain with multiple hyphens."""
filename = suggest_schema_filename("my-very-long-domain-name", "1.0")
assert filename == "my-very-long-domain-name-schema-v1.0.md"
assert is_valid_schema_filename(filename)
def test_version_zero_zero(self):
"""Test with version 0.0."""
filename = suggest_schema_filename("manpage", "0.0")
assert filename == "manpage-schema-v0.0.md"
assert is_valid_schema_filename(filename)
def test_large_version_numbers(self):
"""Test with large version numbers."""
filename = suggest_schema_filename("manpage", "999.999")
assert filename == "manpage-schema-v999.999.md"
assert is_valid_schema_filename(filename)