# Schema Management Guide
Complete guide to managing schemas in MarkiTect using the Schema-of-Schemas system.
## Overview
MarkiTect provides a comprehensive schema management system with:
- Markdown-first schema format with embedded JSON
- Strict naming conventions for consistency
- Metaschema validation for all schemas
- Multi-schema batch validation
- Schema registry with version tracking
## Quick Start
### 1. Create a New Schema
Create a markdown file following the naming convention: `{domain}-schema-v{major}.{minor}.md`
```bash
# Example: blog-post-schema-v1.0.md
```
**Template:**
````markdown
---
schema-id: https://markitect.dev/schemas/blog-post/v1.0
version: 1.0.0
status: stable
domain: blog-post
description: Schema for blog post documents
---
# Blog Post Schema v1.0.0
## Overview
This schema validates blog post documents with frontmatter and content sections.
## Schema Definition
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://markitect.dev/schemas/blog-post/v1.0",
  "title": "Blog Post Schema",
  "description": "Schema for blog post documents",
  "version": "1.0.0",
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "minLength": 1
    },
    "author": {
      "type": "string"
    },
    "date": {
      "type": "string",
      "format": "date"
    }
  },
  "required": ["title", "author"]
}
```
````
### 2. Validate Your Schema
Validate against the metaschema to ensure it follows MarkiTect conventions:
```bash
# Validate a single schema file
markitect schema-validate ./blog-post-schema-v1.0.md
# See detailed errors
markitect schema-validate ./blog-post-schema-v1.0.md --detailed-errors
```
### 3. Ingest into Registry
Add your schema to the registry:
```bash
markitect schema-ingest blog-post-schema-v1.0.md
```
### 4. List Registered Schemas
View all schemas with numbered references:
```bash
# Simple format (default)
markitect schema-list
# Table format
markitect schema-list --format table
# JSON format
markitect schema-list --format json
```
**Output:**
```
Found 4 schema(s):
[1] 🔧 blog-post-schema-v1.0.md (added: 2026-01-05T10:30:00)
[2] 🔧 schema-schema-v1.0.md (added: 2026-01-05T03:33:42)
[3] 🔧 manpage-schema-v1.0.md (added: 2026-01-05T03:33:42)
[4] 🔧 api-documentation-schema-v1.0.md (added: 2026-01-05T03:33:35)
```
## Schema Validation
### Single Schema Validation
**By number:**
```bash
markitect schema-validate 1
```
**By filename (from registry):**
```bash
markitect schema-validate blog-post-schema-v1.0.md
```
**By filesystem path:**
```bash
markitect schema-validate ./my-schema.md
```
### Batch Validation
**Validate a range:**
```bash
markitect schema-validate 1-3
```
**Validate specific schemas:**
```bash
markitect schema-validate 1,3,5
```
**Validate all schemas:**
```bash
markitect schema-validate --all
```
**Output:**
```
Validating 4 schema(s)...
Results:
#    Schema                            Status    Details
---  --------------------------------  --------  ---------
1    blog-post-schema-v1.0.md          ✅ Valid  v1.0.0
2    schema-schema-v1.0.md             ✅ Valid  v1.0.0
3    manpage-schema-v1.0.md            ✅ Valid  v1.0.0
4    api-documentation-schema-v1.0.md  ✅ Valid  v1.0.0
Summary: 4 valid, 0 failed
```
## Document Validation (Semantic)
### Validate Documents Against Schemas
Beyond validating schema structure, MarkiTect can validate actual markdown documents against schemas, checking both structural (AST) and semantic (x-markitect extensions) aspects.
**Validate a document:**
```bash
# Full validation (structural + semantic)
markitect validate my-document.md --schema manpage-schema-v1.0.md
# Only structural validation (classic mode)
markitect validate my-document.md --schema schema.json --no-semantic
# With external link checking (may be slow)
markitect validate my-document.md --schema manpage-schema-v1.0.md --check-links
# Strict mode (warnings become errors)
markitect validate my-document.md --schema manpage-schema-v1.0.md --strict
```
### What is Validated
**Structural Validation** (always enabled):
- Document AST structure matches JSON Schema properties
- Heading counts, paragraph counts, code block counts
- Element types and nesting
**Semantic Validation** (enabled by default; pass `--no-semantic` to turn it off):
- **Section Classifications**: Checks that required sections are present and improper sections are absent
  - REQUIRED sections must be present (ERROR if missing)
  - RECOMMENDED sections should be present (WARNING if missing)
  - IMPROPER sections must not be present (ERROR if found)
  - DISCOURAGED sections should not be present (WARNING if found)
  - OPTIONAL sections may or may not be present (no check)
- **Content Patterns**: Validates content against regex patterns
  - `required_patterns`: Content must match (ERROR if missing)
  - `forbidden_patterns`: Content must not match (ERROR if found)
  - `discouraged_patterns`: Content should not match (WARNING if found)
- **Quality Metrics**: Checks word and sentence counts
  - `min_words`, `max_words`: Word count requirements (WARNING)
  - `min_sentences`: Minimum sentence count (WARNING)
- **Link Validation**: Validates internal links by default and external links on request
  - Internal links: Checked by default when semantic validation is enabled
    - Fragment links (`#section-name`) are verified to exist (ERROR if broken)
    - Relative file paths are checked for existence (ERROR if broken)
  - External links: Opt-in with the `--check-links` flag (may be slow)
    - HTTP/HTTPS URLs are validated with HEAD requests (WARNING if broken)
  - Email validation: Checks `mailto:` link format (WARNING if invalid)
  - Fragment policy: Configurable to allow or disallow fragment identifiers
### Validation Output
```
Validation result: VALID
File: my-command.1.md
Schema: schema file: manpage-schema-v1.0.md
✅ Document structure matches schema requirements
============================================================
Semantic Validation Results:
============================================================
Section Validation:
✅ SYNOPSIS - Present (required)
✅ DESCRIPTION - Present (required)
✅ EXAMPLES - Present (recommended)
Content Validation:
✅ All content requirements met
Link Validation:
✅ All 12 links valid
Summary:
Sections checked: 3
Sections found: 5
Errors: 0
Warnings: 0
Status: PASSED ✅
```
### Common Validation Scenarios
**Example 1: Missing Required Section**
```bash
$ markitect validate doc.md --schema manpage-schema-v1.0.md
❌ Document validation failed
Section Validation:
❌ SYNOPSIS - SYNOPSIS section is mandatory
✅ DESCRIPTION - Present (required)
Errors: 1
Status: FAILED ❌
```
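The severity rules for section classifications can be sketched in a few lines of Python. This is an illustration only, not MarkiTect's implementation: the `RULES` table, the `Finding` class, and `check_sections` are hypothetical names, and the input shapes (a dict of classifications and a set of section names) are assumptions.

```python
from dataclasses import dataclass

# Hypothetical rule table derived from the classifications in this guide:
# classification -> (severity when present, severity when missing)
RULES = {
    "required": (None, "ERROR"),
    "recommended": (None, "WARNING"),
    "optional": (None, None),
    "discouraged": ("WARNING", None),
    "improper": ("ERROR", None),
}


@dataclass
class Finding:
    section: str
    severity: str
    message: str


def check_sections(classifications, present):
    """Compare classified sections against the sections found in a document.

    classifications: dict of section name -> classification (assumed shape).
    present: set of section names found in the document.
    """
    findings = []
    for name, kind in classifications.items():
        when_present, when_missing = RULES[kind]
        if name in present and when_present:
            findings.append(Finding(name, when_present, f"{kind} section found"))
        elif name not in present and when_missing:
            findings.append(Finding(name, when_missing, f"{kind} section missing"))
    return findings


findings = check_sections(
    {"SYNOPSIS": "required", "EXAMPLES": "recommended", "TODO": "improper"},
    present={"DESCRIPTION", "TODO"},
)
for finding in findings:
    print(finding.severity, finding.section, finding.message)
```

Keeping the rules in a table makes the five classifications easy to compare side by side; OPTIONAL maps to no finding in either case.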
**Example 2: Forbidden Pattern Found**
```bash
$ markitect validate doc.md --schema manpage-schema-v1.0.md
Content Validation:
❌ SYNOPSIS - Forbidden pattern found: 'TODO'
Errors: 1
Status: FAILED ❌
```
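As a rough sketch of how the three pattern lists could be evaluated, the following Python applies each regex to a block of text. The function name `check_patterns` and its tuple-based return value are assumptions for illustration; the real validator may report findings differently.

```python
import re


def check_patterns(text, required=(), forbidden=(), discouraged=()):
    """Return (severity, message) tuples for pattern violations in text."""
    findings = []
    for pattern in required:
        # A required pattern that never matches is an error.
        if not re.search(pattern, text):
            findings.append(("ERROR", f"Required pattern missing: {pattern!r}"))
    for pattern in forbidden:
        # A forbidden pattern that matches anywhere is an error.
        if re.search(pattern, text):
            findings.append(("ERROR", f"Forbidden pattern found: {pattern!r}"))
    for pattern in discouraged:
        # A discouraged pattern only produces a warning.
        if re.search(pattern, text):
            findings.append(("WARNING", f"Discouraged pattern found: {pattern!r}"))
    return findings


synopsis = "mytool [OPTIONS] FILE\nTODO: document the flags"
results = check_patterns(
    synopsis,
    required=[r"\[OPTIONS\]"],
    forbidden=[r"\bTODO\b"],
    discouraged=[r"\bsimply\b"],
)
for severity, message in results:
    print(severity, message)
```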
**Example 3: Content Too Short (Warning)**
```bash
$ markitect validate doc.md --schema manpage-schema-v1.0.md
Content Validation:
⚠️ DESCRIPTION - Content too short (25 words, minimum 50)
Warnings: 1
Status: PASSED ✅
# With --strict flag, this would fail:
$ markitect validate doc.md --schema manpage-schema-v1.0.md --strict
Status: FAILED ❌ (warnings treated as errors)
```
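The interaction between warnings and `--strict` can be illustrated with a short Python sketch. How MarkiTect actually counts words is not documented here, so the `\w+` tokenization below is an assumption, as are the helper names `count_words`, `check_length`, and `status`.

```python
import re


def count_words(text):
    # Assumption: a "word" is any run of word characters.
    return len(re.findall(r"\b\w+\b", text))


def check_length(text, min_words=None, max_words=None):
    """Return warning messages for word-count violations."""
    warnings = []
    n = count_words(text)
    if min_words is not None and n < min_words:
        warnings.append(f"Content too short ({n} words, minimum {min_words})")
    if max_words is not None and n > max_words:
        warnings.append(f"Content too long ({n} words, maximum {max_words})")
    return warnings


def status(errors, warnings, strict=False):
    """Errors always fail; warnings fail only in strict mode."""
    failed = bool(errors) or (strict and bool(warnings))
    return "FAILED" if failed else "PASSED"


description = "word " * 25
warnings = check_length(description, min_words=50)
print(warnings)
print(status([], warnings))
print(status([], warnings, strict=True))
```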
**Example 4: Broken Internal Link**
```bash
$ markitect validate doc.md --schema manpage-schema-v1.0.md
Link Validation:
❌ #nonexistent-section - Internal link target not found: #nonexistent-section
Errors: 1
Status: FAILED ❌
```
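A fragment check of this kind can be approximated by collecting heading anchors and comparing them with the `#...` targets used in links. The sketch below assumes GitHub-style slugs (lowercase, punctuation dropped, spaces turned into hyphens); the anchor algorithm MarkiTect uses is not specified in this guide, and `slugify` and `broken_fragments` are hypothetical helpers. It also ignores code fences, so a `# comment` line inside a fenced block would be treated as a heading.

```python
import re

# ATX headings such as "## Quick Start".
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
# Inline links whose target is a bare fragment, e.g. [text](#anchor).
FRAGMENT_LINK = re.compile(r"\[[^\]]*\]\(#([^)\s]+)\)")


def slugify(heading):
    """Assumed GitHub-style anchor: lowercase, no punctuation, hyphens for spaces."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)
    return re.sub(r"\s+", "-", slug)


def broken_fragments(markdown):
    """Return fragment targets that do not correspond to any heading."""
    anchors = {slugify(h) for h in HEADING.findall(markdown)}
    return [frag for frag in FRAGMENT_LINK.findall(markdown) if frag not in anchors]


doc = (
    "# Guide\n\n"
    "## Quick Start\n\n"
    "See [start](#quick-start) and [missing](#nonexistent-section).\n"
)
print(broken_fragments(doc))
```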
**Example 5: External Link Validation**
```bash
# Enable external link checking (may be slow)
$ markitect validate doc.md --schema manpage-schema-v1.0.md --check-links
Link Validation:
✅ http://example.com - Valid
⚠️ http://broken-link.invalid - External link unreachable: Name or service not known
Warnings: 1
Status: PASSED ✅
```
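Before any link is checked it has to be sorted into one of the link types described earlier (fragment, internal file path, external URL, email). The following Python is a minimal sketch of that classification step using the standard library; `classify_link` and `should_check` are hypothetical names, and the `"other"` bucket for unrecognized URL schemes is an assumption that this guide does not describe.

```python
from collections import Counter
from urllib.parse import urlparse


def classify_link(target):
    """Sort a link target into fragment, email, external, internal, or other."""
    if target.startswith("#"):
        return "fragment"
    scheme = urlparse(target).scheme.lower()
    if scheme == "mailto":
        return "email"
    if scheme in ("http", "https"):
        return "external"
    if scheme == "":
        return "internal"  # relative or absolute file path
    return "other"  # assumption: schemes such as ftp: are not validated


def should_check(kind, check_links=False):
    """External links are only checked when explicitly requested."""
    if kind == "external":
        return check_links
    return kind in ("fragment", "internal", "email")


links = [
    "#overview",
    "../schemas/schema-schema-v1.0.md",
    "https://markitect.dev/schemas/blog-post/v1.0",
    "mailto:docs@example.com",
]
stats = Counter(classify_link(link) for link in links)
print(dict(stats))
```

Counting by category in this way mirrors the kind of per-type statistics a link validator might report.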
## Schema Naming Conventions
All schema filenames must follow this pattern:
```
{domain}-schema-v{major}.{minor}.md
```
### Rules
- **Domain**: Lowercase letters, numbers, and hyphens only
- **Version**: Major.minor format (e.g., `v1.0`, `v2.3`)
- **Extension**: Must be `.md`
- **No spaces**: Use hyphens for separation
### Valid Examples
- `blog-post-schema-v1.0.md`
- `api-documentation-schema-v2.1.md`
- `user-profile-schema-v1.0.md`
### Invalid Examples
- `BlogPost-schema-v1.0.md` (uppercase)
- `blog_post-schema-v1.0.md` (underscore)
- `blog-post-v1.0.md` (missing "schema")
- `blog-post-schema-v1.md` (missing minor version)
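The naming rules above translate naturally into a regular expression. The Python below is one possible encoding, written for illustration; the pattern and the `parse_schema_filename` helper are not taken from MarkiTect's source, and details such as whether a domain may start with a digit are assumptions.

```python
import re

# {domain}-schema-v{major}.{minor}.md, with a lowercase hyphenated domain.
SCHEMA_FILENAME = re.compile(
    r"^(?P<domain>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$"
)


def parse_schema_filename(name):
    """Return (domain, major, minor), or None if the name breaks the convention."""
    match = SCHEMA_FILENAME.match(name)
    if match is None:
        return None
    return match["domain"], int(match["major"]), int(match["minor"])


for name in [
    "blog-post-schema-v1.0.md",
    "api-documentation-schema-v2.1.md",
    "BlogPost-schema-v1.0.md",
    "blog_post-schema-v1.0.md",
    "blog-post-v1.0.md",
    "blog-post-schema-v1.md",
]:
    print(name, "->", parse_schema_filename(name))
```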
## Required Schema Fields
All schemas must include these fields:
### Frontmatter (YAML)
```yaml
---
schema-id: https://markitect.dev/schemas/{domain}/v{major}.{minor}
version: {major}.{minor}.{patch}
status: draft|stable|deprecated
domain: {domain}
description: Brief description
---
```
### JSON Schema
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://markitect.dev/schemas/{domain}/v{major}.{minor}",
"title": "Schema Title",
"description": "Schema description",
"version": "{major}.{minor}.{patch}"
}
```
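A quick pre-flight check for these required fields might look like the sketch below. It is not the metaschema validator: it only looks for missing keys, parses the frontmatter as flat `key: value` lines rather than full YAML, and every function name in it is hypothetical.

```python
import json
import re

REQUIRED_FRONTMATTER = ("schema-id", "version", "status", "domain", "description")
REQUIRED_JSON = ("$schema", "$id", "title", "description", "version")

# Built indirectly so this example can itself live inside a fenced block.
FENCE = "`" * 3


def parse_frontmatter(markdown):
    """Parse flat `key: value` lines between the leading --- markers."""
    match = re.match(r"^---\n(.*?)\n---\n", markdown, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def extract_json_schema(markdown):
    """Load the first fenced json block, or return an empty dict."""
    match = re.search(FENCE + r"json\n(.*?)\n" + FENCE, markdown, re.DOTALL)
    return json.loads(match.group(1)) if match else {}


def missing_fields(markdown):
    """Return (missing frontmatter keys, missing JSON Schema keys)."""
    frontmatter = parse_frontmatter(markdown)
    schema = extract_json_schema(markdown)
    return (
        [key for key in REQUIRED_FRONTMATTER if key not in frontmatter],
        [key for key in REQUIRED_JSON if key not in schema],
    )


doc = "\n".join([
    "---",
    "schema-id: https://markitect.dev/schemas/blog-post/v1.0",
    "version: 1.0.0",
    "status: stable",
    "domain: blog-post",
    "---",
    "# Blog Post Schema v1.0.0",
    FENCE + "json",
    '{"$schema": "http://json-schema.org/draft-07/schema#", "title": "Blog Post Schema"}',
    FENCE,
])
print(missing_fields(doc))
```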
## Common Workflows
### Revalidate All Schemas After Metaschema Changes
When you update the metaschema, revalidate all registered schemas:
```bash
markitect schema-validate --all
```
### Check Schema Rigidity
Analyze a schema for overly rigid constraints:
```bash
markitect schema-analyze my-schema.md
```
### Refine a Rigid Schema
Automatically loosen overly specific constraints:
```bash
# Dry run (preview changes)
markitect schema-refine my-schema.md --dry-run
# Apply changes
markitect schema-refine my-schema.md
# Interactive mode
markitect schema-refine my-schema.md --interactive
```
### Get Schema Details
View schema metadata:
```bash
markitect schema-get blog-post-schema-v1.0.md
```
### Delete a Schema
Remove a schema from the registry:
```bash
markitect schema-delete blog-post-schema-v1.0.md --confirm
```
## Resolution Precedence
When resolving a schema argument, MarkiTect uses this order:
1. **Registry (by filename)**: Exact filename match in the registry database
2. **Filesystem (fallback)**: Used when the name is not in the registry, or directly when the argument looks like a path (contains `/`)
### Examples
```bash
# Looks up in registry first
markitect schema-validate blog-post-schema-v1.0.md
# Forces filesystem lookup (contains /)
markitect schema-validate ./blog-post-schema-v1.0.md
# Also forces filesystem
markitect schema-validate ../schemas/blog-post-schema-v1.0.md
```
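The resolution order can be summarized in a short Python sketch. Here a plain dict stands in for the registry database, and `resolve_schema` and its `root` parameter are hypothetical; the actual lookup code may differ.

```python
from pathlib import Path


def resolve_schema(argument, registry, root=Path(".")):
    """Resolve a schema argument to ("registry", name) or ("filesystem", path).

    registry: dict of filename -> stored schema (stand-in for the database).
    root: directory that relative paths are resolved against (assumption).
    """
    looks_like_path = "/" in argument
    # Registry lookup is skipped entirely when the argument contains a slash.
    if not looks_like_path and argument in registry:
        return ("registry", argument)
    candidate = root / argument
    if candidate.is_file():
        return ("filesystem", str(candidate))
    raise FileNotFoundError(
        f"Schema '{argument}' not found in registry or filesystem"
    )


registry = {"blog-post-schema-v1.0.md": "..."}
print(resolve_schema("blog-post-schema-v1.0.md", registry))
```

Treating any argument with a `/` as a path gives users a simple way to bypass a registry entry that has the same filename as a local file.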
## Best Practices
### Schema Development
1. **Start with a template**: Use an existing schema as a starting point
2. **Validate early**: Validate against the metaschema before ingesting
3. **Use semantic versioning**: Major.minor.patch in the `version` field; filenames carry only major.minor
4. **Document thoroughly**: Include overview, usage, and examples
5. **Test with real documents**: Validate actual documents against your schema
### Version Management
- **Increment major version**: Breaking changes to schema structure
- **Increment minor version**: Backward-compatible additions
- **Increment patch version**: Bug fixes and clarifications
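
To make the increment rules concrete, here is a small Python sketch that bumps a `major.minor.patch` string and derives the matching filename. Both `bump` and `filename_for` are hypothetical helpers, not MarkiTect commands.

```python
def bump(version, part):
    """Increment one part of a major.minor.patch version string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # compatible addition: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # fix or clarification
    raise ValueError(f"unknown part: {part!r}")


def filename_for(domain, version):
    """Filenames carry only major.minor, per the naming convention."""
    major, minor, _patch = version.split(".")
    return f"{domain}-schema-v{major}.{minor}.md"


new_version = bump("1.0.0", "minor")
print(new_version, filename_for("blog-post", new_version))
```

Under this scheme a patch bump leaves the filename unchanged, while a major or minor bump produces a new file.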
### Schema Organization
```
markitect/schemas/
├── schema-schema-v1.0.md # Metaschema
├── manpage-schema-v1.0.md # Man page documents
├── api-documentation-schema-v1.0.md
├── terminology-schema-v1.0.md
└── blog-post-schema-v1.0.md # Your schemas
```
## Troubleshooting
### Schema Not Found
```
❌ Schema 'my-schema.md' not found in registry or filesystem
```
**Solution:** Use `markitect schema-list` to see available schemas, or provide a path: `./my-schema.md`
### Validation Fails
```
❌ Schema validation failed: my-schema.md
Found 2 validation error(s):
```
**Solution:** Check error messages and compare with metaschema requirements. Use `--detailed-errors` for more context.
### Invalid Selector
```
❌ Invalid selector: Range 1-10 is out of bounds. Valid range: 1-4
```
**Solution:** Use `markitect schema-list` to see valid numbers, or check your range syntax.
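For reference, the selector syntax (`1-3`, `1,3,5`) and its bounds check can be modeled as follows. This is a sketch under assumptions: `parse_selector` is a hypothetical function, and the handling of reversed ranges is a guess, since this guide only shows the out-of-bounds message.

```python
def parse_selector(selector, count):
    """Expand '1-3' or '1,3,5' into a sorted list of 1-based indices.

    count: number of schemas currently listed by `markitect schema-list`.
    """
    indices = set()
    for part in selector.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start > end:
            # Assumption: reversed ranges are rejected rather than swapped.
            raise ValueError(f"Range {start}-{end} is reversed")
        if start < 1 or end > count:
            raise ValueError(
                f"Range {start}-{end} is out of bounds. Valid range: 1-{count}"
            )
        indices.update(range(start, end + 1))
    return sorted(indices)


print(parse_selector("1-3", count=4))
print(parse_selector("1,3,4", count=4))
```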
## Advanced Usage
### Scripting with Schema Commands
Validate schemas in CI/CD:
```bash
#!/bin/bash
# Validate all schemas and exit with error if any fail
if ! markitect schema-validate --all; then
    echo "Schema validation failed!"
    exit 1
fi
echo "All schemas valid"
```
### Batch Operations
```bash
# Validate recently added schemas
markitect schema-validate 1-3
# Validate specific critical schemas
markitect schema-validate 1,5,8
# Check just the metaschema
markitect schema-validate 2
```
## Schema Extensions
MarkiTect supports custom extensions in schemas:
- `x-markitect-sections`: Section classification (required, recommended, optional, discouraged, improper)
- `x-markitect-content-control`: Content validation rules and patterns
- `x-markitect-metadata`: Additional metadata for MarkiTect processing
See existing schemas for examples of these extensions.
## Future Enhancements
Planned features:
- Wildcard/globbing support: `markitect schema-validate */manpage*`
- Schema diff tool: Compare schema versions
- Schema migration assistant: Help upgrade documents to new schema versions
## Related Documentation
- [Schema Naming Specification](../history/2026-01-05-schema-of-schemas/SCHEMA_NAMING_SPEC.md)
- [Schema Loader Guide](../history/2026-01-05-schema-of-schemas/SCHEMA_LOADER_GUIDE.md)
- [Metaschema Reference](../markitect/schemas/schema-schema-v1.0.md)
- [Implementation Workplan](../history/2026-01-05-schema-of-schemas/WORKPLAN.md) (archived)
## Support
For issues or questions:
- Check existing schemas as examples
- Review metaschema validation errors carefully
- Use `--detailed-errors` for more context
- Consult the metaschema for requirements