markitect-main/docs/specifications/schema-extensions-spec.md
tegwick d68e762612
feat: implement Phase 1 - Enhanced Schema Format with Classifications
Complete Phase 1 of Schema Evolution Workplan implementing flexible content
control and section classification system.

## New Features

### 1. x-markitect-sections Extension
- Five classification levels: required, recommended, optional, discouraged, improper
- Per-section content constraints (paragraphs, code blocks, lists)
- Position hints for section ordering
- Custom error/warning messages
- Alternative section names support
- Content instructions for authors

### 2. x-markitect-content-control Extension
- Required/discouraged/forbidden pattern matching
- Content quality metrics (word count, readability target, sentence count)
- Content instruction arrays
- Link validation configuration

### 3. Metaschema Validation
- Updated markitect-metaschema.json with complete validation rules
- Enhanced metaschema.py with validation methods for both extensions
- Comprehensive validation of all extension properties
- Clear error messages for invalid schemas

### 4. Documentation & Examples
- Complete specification in docs/specifications/schema-extensions-spec.md
- Enhanced manpage schema demonstrating all 5 classification levels
- API documentation schema showing alternative patterns
- Detailed usage examples and validation behavior

## Implementation Details

**Files Modified:**
- markitect/schemas/markitect-metaschema.json: Added extension definitions
- markitect/metaschema.py: Added _validate_sections() and _validate_content_control()

**Files Created:**
- docs/specifications/schema-extensions-spec.md: Complete specification (v1.0)
- examples/manpages/enhanced-manpage-schema.json: Demonstrates all classifications
- examples/manpages/api-documentation-schema.json: Shows API doc patterns

## Validation Behavior

**Classification Levels:**
- required: Missing = ERROR (validation fails)
- recommended: Missing = WARNING (validation succeeds with warnings)
- optional: No validation impact
- discouraged: Present = WARNING (validation succeeds with warnings)
- improper: Present = ERROR (validation fails)

## Next Steps

Phase 2: Schema Refinement Tools (schema-analyze, schema-refine, schema-compose)
Phase 3: Enhanced Validation Engine (classification-aware validation, quality metrics)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 21:02:51 +01:00


# MarkiTect Schema Extensions Specification v1.0
## Status: Draft - Phase 1 Implementation
## Overview
This specification defines MarkiTect-specific extensions to JSON Schema (draft-07) for markdown document validation with content control, section classification, and flexible structural constraints.
## Design Principles
1. **Backward Compatibility**: Existing schemas without extensions continue to work
2. **Namespace Isolation**: All extensions prefixed with `x-markitect-`
3. **Progressive Enhancement**: Extensions add capabilities without breaking standard JSON Schema
4. **Clear Semantics**: Each extension has well-defined validation behavior
5. **Metaschema Validation**: All extensions validated by MarkiTect metaschema
---
## Extension: `x-markitect-sections`
### Purpose
Define document sections with classification levels (required, recommended, optional, discouraged, improper) and content control specifications.
### Schema Location
Applied at the **root level** of the schema or within **properties** that represent document sections.
### Format
```json
{
  "x-markitect-sections": {
    "SECTION_NAME": {
      "classification": "required|recommended|optional|discouraged|improper",
      "heading_level": 1|2|3|4|5|6,
      "position": "after_title|before_section_name|after_section_name|anywhere",
      "content_instruction": "string",
      "min_paragraphs": integer,
      "max_paragraphs": integer,
      "min_code_blocks": integer,
      "max_code_blocks": integer,
      "min_lists": integer,
      "max_lists": integer,
      "warning_if_missing": "string",
      "error_message": "string",
      "alternatives": ["SECTION_NAME_1", "SECTION_NAME_2"]
    }
  }
}
```
### Property Definitions
#### `classification` (required)
Classification level determining validation behavior:
- **`required`**: Section MUST be present. Validation fails if missing.
- **`recommended`**: Section SHOULD be present. Warning if missing, but validation succeeds.
- **`optional`**: Section MAY be present. No validation impact either way.
- **`discouraged`**: Section SHOULD NOT be present. Warning if present, but validation succeeds.
- **`improper`**: Section MUST NOT be present. Validation fails if present.
**Type**: String enum
**Required**: Yes
**Values**: `["required", "recommended", "optional", "discouraged", "improper"]`
#### `heading_level` (optional)
The heading level (H1-H6) for this section.
**Type**: Integer
**Range**: 1-6
**Default**: 2 (for standard sections)
#### `position` (optional)
Where this section should appear relative to other sections.
**Type**: String enum
**Values**:
- `"after_title"` - Immediately after document title (H1)
- `"before_section_name"` - Before another named section
- `"after_section_name"` - After another named section
- `"anywhere"` - No position constraint (default)
**Default**: `"anywhere"`
#### `content_instruction` (optional)
Human-readable instruction describing what content belongs in this section.
**Type**: String
**Usage**: Displayed in validation warnings, generated templates, and documentation
**Example**:
```json
"content_instruction": "Brief command syntax showing all options and arguments"
```
#### Content Constraints (optional)
Minimum and maximum counts for content elements within the section:
- **`min_paragraphs`**: Minimum paragraph count (integer ≥ 0)
- **`max_paragraphs`**: Maximum paragraph count (integer ≥ min_paragraphs)
- **`min_code_blocks`**: Minimum code block count (integer ≥ 0)
- **`max_code_blocks`**: Maximum code block count (integer ≥ min_code_blocks)
- **`min_lists`**: Minimum list count (integer ≥ 0)
- **`max_lists`**: Maximum list count (integer ≥ min_lists)
**Type**: Integer
**Default**: No constraint if omitted
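The metaschema fragment later in this specification types each bound as a non-negative integer, but the `max ≥ min` relationship spans two properties, so it likely needs a separate check. A minimal sketch of such a check follows; the function name `check_min_max_pairs` is hypothetical and is not taken from `markitect/metaschema.py`.

```python
from typing import List

# Element kinds that have min_/max_ constraint pairs in this specification.
ELEMENT_KINDS = ("paragraphs", "code_blocks", "lists")


def check_min_max_pairs(section_spec: dict) -> List[str]:
    """Report every max_* bound that is smaller than its min_* partner.

    Hypothetical helper for illustration; an omitted bound means no constraint.
    """
    problems = []
    for kind in ELEMENT_KINDS:
        low = section_spec.get(f"min_{kind}")
        high = section_spec.get(f"max_{kind}")
        if low is not None and high is not None and high < low:
            problems.append(f"max_{kind} ({high}) is less than min_{kind} ({low})")
    return problems


ok = check_min_max_pairs({"min_paragraphs": 1, "max_paragraphs": 5})
bad = check_min_max_pairs({"min_lists": 4, "max_lists": 2})
```

Here `ok` is an empty list, while `bad` contains one message describing the inverted `lists` bounds.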
#### `warning_if_missing` (optional)
Custom warning message when a recommended section is missing.
**Type**: String
**Applies to**: `classification: "recommended"` only
**Example**:
```json
"warning_if_missing": "Examples greatly improve documentation usability"
```
#### `error_message` (optional)
Custom error message when validation fails.
**Type**: String
**Applies to**: `classification: "required"` or `"improper"`
**Example**:
```json
"error_message": "Internal notes must not appear in published documentation"
```
#### `alternatives` (optional)
Array of alternative section names that satisfy the requirement.
**Type**: Array of strings
**Usage**: If any alternative is present, requirement is satisfied
**Example**:
```json
{
  "classification": "required",
  "alternatives": ["EXAMPLES", "USAGE", "TUTORIAL"]
}
```
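One way to read the `alternatives` rule is sketched below. The helper `requirement_satisfied` is hypothetical, and the sketch assumes that the section's own name also satisfies the requirement, which the text above does not state explicitly.

```python
from typing import Iterable, Set


def requirement_satisfied(name: str, alternatives: Iterable[str], present: Set[str]) -> bool:
    """True if the section itself or any listed alternative is in the document.

    Hypothetical helper; `present` is the set of section names found in the document.
    """
    return name in present or any(alt in present for alt in alternatives)


# A document with USAGE but no EXAMPLES section still meets the requirement.
satisfied = requirement_satisfied("EXAMPLES", ["USAGE", "TUTORIAL"], {"SYNOPSIS", "USAGE"})
unsatisfied = requirement_satisfied("EXAMPLES", ["USAGE", "TUTORIAL"], {"SYNOPSIS"})
```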
### Example: Manpage Schema with Sections
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Unix Manpage Schema",
  "x-markitect-sections": {
    "SYNOPSIS": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_title",
      "content_instruction": "Brief command syntax with options and arguments",
      "min_paragraphs": 1,
      "max_paragraphs": 5,
      "min_code_blocks": 0,
      "max_code_blocks": 3,
      "error_message": "SYNOPSIS section is mandatory for all manpages"
    },
    "DESCRIPTION": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_section_name",
      "content_instruction": "Detailed explanation of what the command does",
      "min_paragraphs": 2,
      "error_message": "DESCRIPTION section is mandatory for all manpages"
    },
    "EXAMPLES": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Practical usage examples with explanations",
      "min_code_blocks": 3,
      "warning_if_missing": "Examples greatly improve manpage usability"
    },
    "SEE ALSO": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Related commands and documentation references",
      "warning_if_missing": "Cross-references help users discover related functionality"
    },
    "BUGS": {
      "classification": "optional",
      "heading_level": 2,
      "content_instruction": "Known issues and bug reporting information"
    },
    "DEPRECATED": {
      "classification": "discouraged",
      "heading_level": 2,
      "warning_if_missing": "Consider moving deprecated content to historical documentation"
    },
    "INTERNAL_NOTES": {
      "classification": "improper",
      "heading_level": 2,
      "error_message": "Internal notes must not appear in published manpages"
    }
  }
}
```
### Validation Behavior
#### Required Sections
```json
"SYNOPSIS": {"classification": "required"}
```
**Validation**:
- Section missing → **ERROR**, `is_valid = False`
- Section present → Continue validation
- Custom `error_message` used if provided
#### Recommended Sections
```json
"EXAMPLES": {"classification": "recommended"}
```
**Validation**:
- Section missing → **WARNING**, `is_valid = True` (with warnings)
- Section present → Continue validation
- Custom `warning_if_missing` used if provided
#### Optional Sections
```json
"BUGS": {"classification": "optional"}
```
**Validation**:
- Section missing → No impact
- Section present → Continue validation
- No messages generated
#### Discouraged Sections
```json
"DEPRECATED": {"classification": "discouraged"}
```
**Validation**:
- Section missing → No impact
- Section present → **WARNING**, `is_valid = True` (with warnings)
- Custom warning message used if provided
#### Improper Sections
```json
"INTERNAL_NOTES": {"classification": "improper"}
```
**Validation**:
- Section missing → No impact
- Section present → **ERROR**, `is_valid = False`
- Custom `error_message` used if provided
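The five classification rules above can be summarized in a short Python sketch. The function name `check_sections`, its return shape, and the default message texts are hypothetical and chosen for illustration; they are not taken from the MarkiTect code. The sketch also uses a fixed message for discouraged sections, because this specification does not name a dedicated property for that case.

```python
from typing import Dict, List, Set, Tuple


def check_sections(
    sections: Dict[str, dict], present: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the section names found in a document.

    `sections` is the value of "x-markitect-sections"; `present` holds the
    section names found in the document. Hypothetical helper, not MarkiTect API.
    """
    errors: List[str] = []
    warnings: List[str] = []
    for name, spec in sections.items():
        level = spec["classification"]
        found = name in present
        if level == "required" and not found:
            errors.append(spec.get("error_message", f"Missing required section: {name}"))
        elif level == "improper" and found:
            errors.append(spec.get("error_message", f"Improper section present: {name}"))
        elif level == "recommended" and not found:
            warnings.append(spec.get("warning_if_missing", f"Missing recommended section: {name}"))
        elif level == "discouraged" and found:
            warnings.append(f"Discouraged section present: {name}")
        # "optional" sections never produce a message.
    return errors, warnings


errors, warnings = check_sections(
    {
        "SYNOPSIS": {"classification": "required"},
        "EXAMPLES": {"classification": "recommended"},
        "BUGS": {"classification": "optional"},
        "INTERNAL_NOTES": {"classification": "improper"},
    },
    present={"SYNOPSIS", "INTERNAL_NOTES"},
)
is_valid = not errors  # warnings alone do not fail validation
```

In this run the document fails validation because `INTERNAL_NOTES` is present, and the missing `EXAMPLES` section only adds a warning.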
---
## Extension: `x-markitect-content-control`
### Purpose
Define content validation rules for document sections including pattern matching, quality metrics, and semantic constraints.
### Schema Location
Applied at **root level** or within specific **section properties**.
### Format
```json
{
  "x-markitect-content-control": {
    "section_name": {
      "required_patterns": ["regex_pattern_1", "regex_pattern_2"],
      "discouraged_patterns": ["regex_pattern_1"],
      "forbidden_patterns": ["regex_pattern_1"],
      "content_quality": {
        "min_words": integer,
        "max_words": integer,
        "readability_target": "technical|general|simple|advanced",
        "min_sentences": integer,
        "max_sentences": integer
      },
      "content_instructions": ["instruction_1", "instruction_2"],
      "link_validation": {
        "check_internal": boolean,
        "check_external": boolean,
        "allow_fragments": boolean
      }
    }
  }
}
```
### Property Definitions
#### `required_patterns` (optional)
Array of regex patterns that MUST appear in section content.
**Type**: Array of strings (valid regex patterns)
**Validation**: ERROR if any pattern missing
**Example** (the first pattern matches a bold command name, the second matches options in brackets):
```json
"required_patterns": [
  "\\*\\*[a-z-]+\\*\\*",
  "\\[.*\\]"
]
```
#### `discouraged_patterns` (optional)
Array of regex patterns that SHOULD NOT appear in content.
**Type**: Array of strings (valid regex patterns)
**Validation**: WARNING if any pattern found
**Example**:
```json
"discouraged_patterns": [
  "TODO",
  "FIXME",
  "\\bWIP\\b"
]
```
#### `forbidden_patterns` (optional)
Array of regex patterns that MUST NOT appear in content.
**Type**: Array of strings (valid regex patterns)
**Validation**: ERROR if any pattern found
**Example** (the patterns match hard-coded passwords and hard-coded API keys, respectively):
```json
"forbidden_patterns": [
  "password\\s*=\\s*[\"'].*[\"']",
  "api[_-]?key\\s*=\\s*[\"'].*[\"']"
]
```
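The three pattern lists map onto the ERROR and WARNING outcomes described above. The sketch below shows one possible evaluation using Python's `re` module; the function `check_patterns` and its message texts are hypothetical, and treating each pattern as a case-sensitive search over the whole section text is an assumption, not something this specification defines.

```python
import re
from typing import Dict, List, Tuple


def check_patterns(rules: Dict[str, list], text: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one section's pattern rules.

    Hypothetical helper: a missing required pattern or a matched forbidden
    pattern is an error; a matched discouraged pattern is a warning.
    """
    errors: List[str] = []
    warnings: List[str] = []
    for pattern in rules.get("required_patterns", []):
        if not re.search(pattern, text):
            errors.append(f"required pattern not found: {pattern}")
    for pattern in rules.get("forbidden_patterns", []):
        if re.search(pattern, text):
            errors.append(f"forbidden pattern found: {pattern}")
    for pattern in rules.get("discouraged_patterns", []):
        if re.search(pattern, text):
            warnings.append(f"discouraged pattern found: {pattern}")
    return errors, warnings


rules = {
    "required_patterns": [r"\*\*[a-z-]+\*\*"],  # bold command name
    "discouraged_patterns": ["TODO"],
    "forbidden_patterns": [r"password\s*="],
}
errors, warnings = check_patterns(rules, "**ls** lists files. TODO: add options.")
```

The sample text satisfies the required pattern and contains no forbidden content, so only the `TODO` marker is reported, as a warning.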
#### `content_quality` (optional)
Quality metrics for section content:
**Sub-properties**:
- **`min_words`**: Minimum word count (integer ≥ 0)
- **`max_words`**: Maximum word count (integer ≥ min_words)
- **`readability_target`**: Target readability level (enum)
  - `"simple"` - Elementary school level
  - `"general"` - General audience
  - `"technical"` - Technical audience
  - `"advanced"` - Expert/academic level
- **`min_sentences`**: Minimum sentence count (integer ≥ 0)
- **`max_sentences`**: Maximum sentence count (integer ≥ min_sentences)
**Example**:
```json
"content_quality": {
  "min_words": 50,
  "max_words": 300,
  "readability_target": "technical",
  "min_sentences": 3
}
```
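The word and sentence bounds could be checked roughly as follows. This is a sketch under stated assumptions: words are whitespace-separated tokens, sentences are split on `.`, `!` and `?`, and the helper `check_counts` is hypothetical. The specification does not say how either unit is counted, nor whether a violation is reported as an error or a warning, so the sketch returns neutral messages.

```python
import re
from typing import Dict, List


def check_counts(quality: Dict[str, int], text: str) -> List[str]:
    """Compare word and sentence counts against the content_quality bounds.

    Hypothetical helper; counting rules here are simple approximations.
    """
    counts = {
        "words": len(text.split()),
        "sentences": len([s for s in re.split(r"[.!?]+", text) if s.strip()]),
    }
    problems = []
    for unit, count in counts.items():
        low = quality.get(f"min_{unit}")
        high = quality.get(f"max_{unit}")
        if low is not None and count < low:
            problems.append(f"{count} {unit}, minimum is {low}")
        if high is not None and count > high:
            problems.append(f"{count} {unit}, maximum is {high}")
    return problems


problems = check_counts(
    {"min_words": 5, "min_sentences": 3},
    "Lists files. Sorted by name.",
)
```

The sample text has five words and two sentences, so only the sentence minimum is reported.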
#### `content_instructions` (optional)
Array of human-readable instructions for content creation.
**Type**: Array of strings
**Usage**: Displayed in templates, validation reports, and documentation
**Example**:
```json
"content_instructions": [
  "Show command name in bold",
  "Include all major options",
  "Use italic for arguments and placeholders",
  "Keep syntax examples concise (1-3 lines)"
]
```
#### `link_validation` (optional)
Link checking configuration:
**Sub-properties**:
- **`check_internal`**: Validate internal document links (boolean)
- **`check_external`**: Validate external URLs (boolean)
- **`allow_fragments`**: Allow fragment-only links like `#section` (boolean)
**Default**: All false (no link validation)
**Example**:
```json
"link_validation": {
  "check_internal": true,
  "check_external": false,
  "allow_fragments": true
}
```
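A sketch of how the three flags might select links for checking is given below. Everything in it is an assumption made for illustration: the helper names, the regular expression for inline Markdown links, the rule that `http`/`https` targets are external, and the reading of `allow_fragments: false` as rejecting fragment-only links. Link validation is listed under future enhancements, so actual behavior may differ.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse

# Inline Markdown links of the form [text](target); reference links are ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def classify_link(target: str) -> str:
    """Classify a link target as "fragment", "external" or "internal"."""
    if target.startswith("#"):
        return "fragment"
    if urlparse(target).scheme in ("http", "https"):
        return "external"
    return "internal"


def triage_links(text: str, config: dict) -> Tuple[List[str], List[str]]:
    """Return (links to check, rejected fragment-only links)."""
    to_check: List[str] = []
    rejected: List[str] = []
    for target in LINK_RE.findall(text):
        kind = classify_link(target)
        if kind == "fragment":
            if not config.get("allow_fragments", False):
                rejected.append(target)
        elif config.get(f"check_{kind}", False):
            to_check.append(target)
    return to_check, rejected


text = "See [intro](#intro), [guide](docs/guide.md) and [site](https://example.com)."
config = {"check_internal": True, "check_external": False, "allow_fragments": True}
to_check, rejected = triage_links(text, config)
```

With the configuration from the example above, only the internal link `docs/guide.md` is selected for checking.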
### Example: Content Control for API Documentation
In this example, the `synopsis` patterns match an HTTP method in bold and an endpoint path in code formatting; the `request_parameters` pattern matches a bold parameter name followed by an italic type.
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "API Documentation Schema",
  "x-markitect-content-control": {
    "synopsis": {
      "required_patterns": [
        "\\*\\*[A-Z]+\\*\\*",
        "`/api/.*`"
      ],
      "content_quality": {
        "min_words": 10,
        "max_words": 100,
        "readability_target": "technical"
      },
      "content_instructions": [
        "Start with HTTP method in bold (e.g., **GET**)",
        "Show endpoint path in code format",
        "Include brief one-line description"
      ]
    },
    "request_parameters": {
      "required_patterns": [
        "\\*\\*[a-z_]+\\*\\*.*\\*[A-Za-z]+\\*"
      ],
      "content_instructions": [
        "Use bold for parameter names",
        "Use italic for parameter types",
        "Include description for each parameter",
        "Mark required parameters clearly"
      ]
    },
    "description": {
      "discouraged_patterns": [
        "TODO",
        "FIXME",
        "TBD"
      ],
      "forbidden_patterns": [
        "password\\s*=",
        "secret\\s*=",
        "token\\s*="
      ],
      "content_quality": {
        "min_words": 50,
        "max_words": 500,
        "readability_target": "technical",
        "min_sentences": 3
      },
      "link_validation": {
        "check_internal": true,
        "check_external": true,
        "allow_fragments": true
      }
    }
  }
}
```
---
## Validation Result Structure
### Enhanced ValidationResult Class
```python
from __future__ import annotations  # defer evaluation of the forward references below

from typing import Any, Dict, List, Literal


class ValidationResult:
    """Result of schema validation with classification support."""

    status: Literal["valid", "valid_with_warnings", "invalid"]
    errors: List[ValidationError]  # Required/improper violations
    warnings: List[ValidationWarning]  # Recommended/discouraged violations
    suggestions: List[str]  # Optional improvements
    quality_metrics: Dict[str, Any]  # Content quality scores
```
### Validation Status Values
- **`"valid"`**: No errors, no warnings. Document fully conforms.
- **`"valid_with_warnings"`**: No errors, but has warnings. Document acceptable but improvable.
- **`"invalid"`**: Has errors. Document does not conform to schema.
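The three status values follow directly from whether the error and warning lists are empty. A minimal sketch, with `derive_status` as a hypothetical helper name:

```python
from typing import List


def derive_status(errors: List[object], warnings: List[object]) -> str:
    """Map error and warning lists to one of the three status values."""
    if errors:
        return "invalid"  # any error fails validation, regardless of warnings
    if warnings:
        return "valid_with_warnings"
    return "valid"


clean = derive_status([], [])
improvable = derive_status([], ["Missing recommended section: EXAMPLES"])
failed = derive_status(["Missing required section: SYNOPSIS"], ["some warning"])
```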
### Error Types
```python
from enum import Enum


class ValidationErrorType(Enum):
    """Error categories; any error makes the document invalid."""

    MISSING_REQUIRED_SECTION = "missing_required_section"
    IMPROPER_SECTION_PRESENT = "improper_section_present"
    CONTENT_PATTERN_MISSING = "content_pattern_missing"
    CONTENT_PATTERN_FORBIDDEN = "content_pattern_forbidden"
    CONTENT_TOO_SHORT = "content_too_short"
    CONTENT_TOO_LONG = "content_too_long"
    INVALID_LINK = "invalid_link"
    STRUCTURE_MISMATCH = "structure_mismatch"
```
### Warning Types
```python
from enum import Enum


class ValidationWarningType(Enum):
    """Warning categories; warnings alone do not make the document invalid."""

    MISSING_RECOMMENDED_SECTION = "missing_recommended_section"
    DISCOURAGED_SECTION_PRESENT = "discouraged_section_present"
    CONTENT_PATTERN_DISCOURAGED = "content_pattern_discouraged"
    CONTENT_QUALITY_BELOW_TARGET = "content_quality_below_target"
    READABILITY_MISMATCH = "readability_mismatch"
```
---
## Metaschema Validation
### Extension Validation Rules
The MarkiTect metaschema validates these extensions:
```json
{
  "x-markitect-sections": {
    "type": "object",
    "patternProperties": {
      "^[A-Z][A-Z0-9_ ]*$": {
        "type": "object",
        "properties": {
          "classification": {
            "type": "string",
            "enum": ["required", "recommended", "optional", "discouraged", "improper"]
          },
          "heading_level": {
            "type": "integer",
            "minimum": 1,
            "maximum": 6
          },
          "position": {
            "type": "string",
            "enum": ["after_title", "before_section_name", "after_section_name", "anywhere"]
          },
          "content_instruction": {"type": "string"},
          "min_paragraphs": {"type": "integer", "minimum": 0},
          "max_paragraphs": {"type": "integer", "minimum": 0},
          "min_code_blocks": {"type": "integer", "minimum": 0},
          "max_code_blocks": {"type": "integer", "minimum": 0},
          "min_lists": {"type": "integer", "minimum": 0},
          "max_lists": {"type": "integer", "minimum": 0},
          "warning_if_missing": {"type": "string"},
          "error_message": {"type": "string"},
          "alternatives": {
            "type": "array",
            "items": {"type": "string"}
          }
        },
        "required": ["classification"]
      }
    }
  },
  "x-markitect-content-control": {
    "type": "object",
    "patternProperties": {
      "^[a-z][a-z0-9_]*$": {
        "type": "object",
        "properties": {
          "required_patterns": {
            "type": "array",
            "items": {"type": "string", "format": "regex"}
          },
          "discouraged_patterns": {
            "type": "array",
            "items": {"type": "string", "format": "regex"}
          },
          "forbidden_patterns": {
            "type": "array",
            "items": {"type": "string", "format": "regex"}
          },
          "content_quality": {
            "type": "object",
            "properties": {
              "min_words": {"type": "integer", "minimum": 0},
              "max_words": {"type": "integer", "minimum": 0},
              "readability_target": {
                "type": "string",
                "enum": ["simple", "general", "technical", "advanced"]
              },
              "min_sentences": {"type": "integer", "minimum": 0},
              "max_sentences": {"type": "integer", "minimum": 0}
            }
          },
          "content_instructions": {
            "type": "array",
            "items": {"type": "string"}
          },
          "link_validation": {
            "type": "object",
            "properties": {
              "check_internal": {"type": "boolean"},
              "check_external": {"type": "boolean"},
              "allow_fragments": {"type": "boolean"}
            }
          }
        }
      }
    }
  }
}
```
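The two `patternProperties` keys above decide which entries are validated at all. The sketch below applies the same two regular expressions with Python's `re` module to a few sample names; the sample names and variable names are illustrative only.

```python
import re

# Key patterns copied from the metaschema fragment above.
SECTION_KEY = re.compile(r"^[A-Z][A-Z0-9_ ]*$")
CONTROL_KEY = re.compile(r"^[a-z][a-z0-9_]*$")

section_names = ["SYNOPSIS", "SEE ALSO", "INTERNAL_NOTES", "Examples"]
matched_sections = [name for name in section_names if SECTION_KEY.search(name)]

control_names = ["synopsis", "request_parameters", "SYNOPSIS"]
matched_controls = [name for name in control_names if CONTROL_KEY.search(name)]
```

Upper-case names, including ones with spaces or underscores, match the section pattern, while `Examples` does not. In JSON Schema, a key that matches no `patternProperties` pattern is not checked against the subschema, and because the fragment as shown sets no `additionalProperties: false`, such a key would be accepted without validation.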
---
## Implementation Notes
### Phase 1 Scope
1. Define and document extension formats ✓
2. Update metaschema to validate extensions
3. Implement basic classification validation (required/recommended/optional/discouraged/improper)
4. Create example schemas demonstrating all features
5. Update CLI to report errors vs warnings separately
### Future Enhancements (Phase 2+)
- Content pattern matching implementation
- Quality metrics calculation
- Link validation
- Readability scoring
- Position constraints enforcement
---
## Version History
- **v1.0 (Draft)** - Initial specification for Phase 1 implementation
  - `x-markitect-sections` extension defined
  - `x-markitect-content-control` extension defined
  - Validation result structure defined
  - Metaschema validation rules defined
---
## References
- JSON Schema Draft-07: https://json-schema.org/draft-07/schema
- MarkiTect Schema Evolution Workplan: `examples/manpages/SCHEMA_EVOLUTION_WORKPLAN.md`
- Existing Metaschema: `markitect/schemas/markitect-metaschema.json`
- Metaschema Validator: `markitect/metaschema.py`