feat: add terminology schema example and improve schema-list command

This commit completes Phase 2 of schema evolution work and establishes
a new example demonstrating schema usage for terminology documents.

## New Features

### Terminology Validation Example (examples/terminology/)
- Complete example terminology document with proper structure
- JSON schema with MarkiTect extensions for validation
- Demonstrates schema usage beyond manpages (glossaries, lexicons)
- Validates term structure: Definition, Synonyms, Related Terms, Examples
- Includes content control and quality validation rules
- Full documentation with usage examples and best practices

### Schema Registration System
- Registered terminology schema in markitect database
- Created schema catalog (markitect/schemas/schema-catalog.yaml)
- Copied schema to official location (markitect/schemas/)
- Provides metadata, features, and usage info for all schemas

### Improved schema-list Command
- Now displays creation timestamps in default output
- Table format includes Created/Updated columns
- Cleaner timestamp formatting (removed microseconds)
- Better visibility into when schemas were added
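Below is a minimal sketch of formatting a timestamp without microseconds. It assumes the schema timestamps reach the CLI as Python `datetime` objects; the storage type used in `markitect/cli.py` is not shown in this commit. `format_timestamp` is a hypothetical helper name:

```python
from datetime import datetime


def format_timestamp(ts: datetime) -> str:
    """Render a schema timestamp for schema-list output without microseconds.

    Hypothetical helper: assumes ts is a datetime, which this commit does not confirm.
    """
    # timespec="seconds" drops the fractional part; sep=" " replaces the ISO "T".
    return ts.isoformat(sep=" ", timespec="seconds")


print(format_timestamp(datetime(2026, 1, 4, 23, 7, 36, 123456)))
# prints: 2026-01-04 23:07:36
```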

## Files Changed

Added:
- examples/terminology/README.md - Complete documentation
- examples/terminology/terminology-example.md - Example glossary
- examples/terminology/terminology-schema.json - Validation schema
- markitect/schemas/terminology-schema.json - Registered schema
- markitect/schemas/schema-catalog.yaml - Schema registry

Modified:
- markitect/cli.py - Enhanced schema-list with timestamps
- TODO.md - Documented Phase 2 completion and new example

Moved:
- SCHEMA_EVOLUTION_WORKPLAN.md → todo/ directory

## Schema Features Demonstrated

- Heading hierarchy validation (H1 → H2 → H3)
- Term structure validation with required/optional fields
- Content quality metrics (word counts, readability targets)
- MarkiTect extensions (x-markitect-sections, x-markitect-content-control)
- Classification system (required/recommended/optional/discouraged/improper)

## Usage

```bash
# List schemas with timestamps
markitect schema-list

# Validate terminology document
markitect validate glossary.md --schema terminology-schema.json

# View in table format
markitect schema-list --format table
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 23:07:36 +01:00
parent 82c1a3ab65
commit 6df9b5df05
8 changed files with 927 additions and 3 deletions

@@ -1,787 +0,0 @@
# MarkiTect Schema Evolution Workplan
## Executive Summary
**Current State**: MarkiTect validates document structure via JSON Schema, but is too rigid (exact counts) and structure-only (no content guidance).
**Target State**: A flexible schema system with content control, section classification, multi-schema conformance, and blueprint-based document generation.
**Timeline**: 5 phases, 18-23 development sessions (the sum of the per-phase estimates below), approximately 8-10 weeks.
---
## Problem Analysis
### Current Limitations
#### 1. Structural Rigidity
**Problem**: Auto-generated schemas use exact counts
```json
"paragraphs": { "minItems": 86, "maxItems": 86 }
```
**Impact**: Schemas are document-specific, not reusable patterns.
#### 2. Binary Structure Validation
**Problem**: Elements are either valid or invalid, no classification.
**Need**: Required, Recommended, Optional, Discouraged, Improper classifications.
#### 3. No Content Guidance
**Problem**: Schemas validate structure exists, not what content belongs there.
**Need**: Content instructions, semantic patterns, quality expectations.
#### 4. Single Schema Limitation
**Problem**: Documents can only conform to one schema.
**Need**: Multi-schema conformance (e.g., "manpage" + "API reference" + "tutorial").
#### 5. Template Generation Gap
**Problem**: `generate-stub` creates outline, but no content guidance or data binding.
**Need**: Blueprint system with content instructions and data templates.
---
## Proposed Architecture
### Three-Layer System
```
┌─────────────────────────────────────────────┐
│               BLUEPRINT LAYER               │
│  (Multi-schema + Content + Data Templates)  │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│           SCHEMA LAYER (Enhanced)           │
│ (Structure + Classification + Instructions) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│              VALIDATION LAYER               │
│     (AST Validation + Content Analysis)     │
└─────────────────────────────────────────────┘
```
### Key Concepts
**1. Schema Classification System**
- **Required**: Must be present, validation fails if missing
- **Recommended**: Should be present, warning if missing
- **Optional**: May be present, no validation impact
- **Discouraged**: Should not be present, warning if present
- **Improper**: Must not be present, validation fails if present
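These five levels reduce to a lookup from two inputs, a section's classification and whether the section is present, to an outcome. Below is a minimal Python sketch. `Classification` and `outcome` are hypothetical names, not an existing MarkiTect API:

```python
from enum import Enum
from typing import Optional


class Classification(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"
    DISCOURAGED = "discouraged"
    IMPROPER = "improper"


def outcome(classification: Classification, present: bool) -> Optional[str]:
    """Return 'error', 'warning', or None for one section."""
    if classification is Classification.REQUIRED and not present:
        return "error"    # required but missing
    if classification is Classification.IMPROPER and present:
        return "error"    # forbidden but present
    if classification is Classification.RECOMMENDED and not present:
        return "warning"  # should be present
    if classification is Classification.DISCOURAGED and present:
        return "warning"  # should not be present
    return None           # optional sections, or any satisfied rule


print(outcome(Classification.REQUIRED, present=False))    # error
print(outcome(Classification.DISCOURAGED, present=True))  # warning
print(outcome(Classification.OPTIONAL, present=True))     # None
```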
**2. Content Control**
- **Content Instructions**: Human-readable guidance for section content
- **Content Patterns**: Regex/template patterns for content validation
- **Content Quality Metrics**: Word count, readability, completeness scoring
**3. Multi-Schema Conformance**
- Documents can conform to multiple schemas simultaneously
- Schema composition and inheritance
- Conflict resolution strategies
**4. Blueprint System**
- Schemas + Instructions + Data Templates = Blueprints
- Blueprints generate documents with content guidance
- Data binding for dynamic document generation
---
## Phase 1: Enhanced Schema Format
**Goal**: Extend JSON Schema with MarkiTect-specific content control extensions.
### 1.1 Schema Classification Extensions
**New Properties**:
```json
{
  "x-markitect-sections": {
    "SYNOPSIS": {
      "classification": "required",
      "heading_level": 2,
      "position": "after_title",
      "content_instruction": "Brief command syntax showing all options",
      "min_code_blocks": 1,
      "max_code_blocks": 3
    },
    "EXAMPLES": {
      "classification": "recommended",
      "heading_level": 2,
      "content_instruction": "Practical usage examples with explanations",
      "min_code_blocks": 3,
      "warning_if_missing": "Examples greatly improve documentation usability"
    },
    "DEPRECATED": {
      "classification": "discouraged",
      "heading_level": 2,
      "warning_message": "DEPRECATED sections should be moved to historical docs"
    },
    "INTERNAL_NOTES": {
      "classification": "improper",
      "heading_level": 2,
      "error_message": "Internal notes must not appear in published documentation"
    }
  }
}
```
### 1.2 Content Control Extensions
**New Properties**:
```json
{
  "x-markitect-content-control": {
    "synopsis_section": {
      "min_paragraphs": 1,
      "max_paragraphs": 3,
      "required_patterns": [
        "\\*\\*[a-z-]+\\*\\*.*\\[.*\\]"
      ],
      "$comment": "required_patterns: bold command name followed by bracketed arguments",
      "content_quality": {
        "min_words": 10,
        "max_words": 100,
        "readability_target": "technical"
      },
      "content_instructions": [
        "Show command name in bold",
        "Include all major options in synopsis",
        "Use italic for arguments and placeholders"
      ]
    }
  }
}
```
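Below is a sketch of how the paragraph and word-count limits above might be checked. `check_content_quality` is hypothetical. Words are counted with a naive whitespace split, and `readability_target` is not evaluated:

```python
def check_content_quality(paragraphs: list[str], rules: dict) -> list[str]:
    """Check paragraph and word counts against one x-markitect-content-control entry."""
    problems = []
    n = len(paragraphs)
    if n < rules.get("min_paragraphs", 0):
        problems.append(f"too few paragraphs: {n} < {rules['min_paragraphs']}")
    if "max_paragraphs" in rules and n > rules["max_paragraphs"]:
        problems.append(f"too many paragraphs: {n} > {rules['max_paragraphs']}")

    quality = rules.get("content_quality", {})
    words = sum(len(p.split()) for p in paragraphs)  # naive whitespace word count
    if words < quality.get("min_words", 0):
        problems.append(f"too few words: {words} < {quality['min_words']}")
    if "max_words" in quality and words > quality["max_words"]:
        problems.append(f"too many words: {words} > {quality['max_words']}")
    return problems


synopsis_rules = {
    "min_paragraphs": 1,
    "max_paragraphs": 3,
    "content_quality": {"min_words": 10, "max_words": 100},
}
print(check_content_quality(["**markitect** [*OPTIONS*] *FILE*"], synopsis_rules))
# ['too few words: 3 < 10']
```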
### 1.3 Flexible Structure Constraints
**Replace rigid counts with ranges and classifications**:
```json
{
  "properties": {
    "headings": {
      "properties": {
        "level_2": {
          "items": {
            "properties": {
              "content": {
                "oneOf": [
                  {"const": "SYNOPSIS", "x-markitect-classification": "required"},
                  {"const": "DESCRIPTION", "x-markitect-classification": "required"},
                  {"const": "EXAMPLES", "x-markitect-classification": "recommended"},
                  {"const": "SEE ALSO", "x-markitect-classification": "optional"}
                ]
              }
            }
          },
          "minItems": 2,
          "maxItems": 30,
          "$comment": "minItems: at least the required sections; maxItems: a reasonable upper bound"
        }
      }
    }
  }
}
```
### Tasks
- [ ] **Task 1.1**: Define `x-markitect-sections` schema extension format
- [ ] **Task 1.2**: Define `x-markitect-content-control` schema extension format
- [ ] **Task 1.3**: Update metaschema to validate new extensions
- [ ] **Task 1.4**: Create schema examples demonstrating all classifications
- [ ] **Task 1.5**: Document schema extension format
**Duration**: 3-4 sessions
**Dependencies**: None
**Deliverables**: Enhanced schema format specification, updated metaschema
---
## Phase 2: Schema Refinement Tools
**Goal**: Tools to transform rigid auto-generated schemas into flexible, classified schemas.
### 2.1 Schema Analysis Tool
**Command**: `markitect schema-analyze`
Analyzes existing schema and suggests improvements:
```bash
markitect schema-analyze rigid-schema.json
# Output:
⚠️ Exact counts detected (86 paragraphs)
Suggestion: Use range 50-150 for flexibility
⚠️ All sections unclassified
Suggestion: Classify sections as required/recommended/optional
⚠️ No content instructions
Suggestion: Add content guidance for key sections
✨ Run: markitect schema-refine rigid-schema.json
```
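The first warning above (exact counts) can be detected by walking the schema and reporting every array constraint where `minItems` equals `maxItems`. Below is a sketch with a hypothetical `find_exact_counts` helper:

```python
def find_exact_counts(schema, path="$"):
    """Yield (path, count) for every constraint where minItems == maxItems."""
    if isinstance(schema, dict):
        lo, hi = schema.get("minItems"), schema.get("maxItems")
        if lo is not None and lo == hi:
            yield path, lo
        for key, value in schema.items():
            yield from find_exact_counts(value, f"{path}.{key}")
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            yield from find_exact_counts(item, f"{path}[{i}]")


rigid = {"properties": {"paragraphs": {"type": "array", "minItems": 86, "maxItems": 86}}}
for path, count in find_exact_counts(rigid):
    print(f"Exact count detected at {path}: {count}")
# Exact count detected at $.properties.paragraphs: 86
```

This is a heuristic. Some exact counts are intentional, such as requiring exactly one H1 title, so an analyzer would likely report them as suggestions rather than errors.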
### 2.2 Schema Refinement Tool
**Command**: `markitect schema-refine`
Interactive or automated schema refinement:
```bash
# Automated: Apply common refinements
markitect schema-refine rigid-schema.json \
--loosen-counts \
--add-classifications \
--output flexible-schema.json
# Interactive: Guided refinement
markitect schema-refine rigid-schema.json --interactive
```
**Refinement Operations**:
- Convert exact counts to ranges (configurable tolerance)
- Classify sections based on conventions
- Add content instructions from templates
- Merge multiple schemas for common patterns
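Below is a sketch of the first operation: converting exact counts to ranges with a configurable tolerance. `loosen_counts` and its 50% default tolerance are assumptions for illustration. For 86 paragraphs it produces 43-129, which differs from the 50-150 suggested in the `schema-analyze` example; that example may use another rule. Counts of 0 or 1 are left exact:

```python
import copy
import math


def loosen_counts(schema, tolerance=0.5):
    """Return a copy of schema with exact minItems/maxItems widened to a range.

    An exact count n > 1 becomes [floor(n * (1 - tolerance)), ceil(n * (1 + tolerance))].
    """
    result = copy.deepcopy(schema)  # never mutate the caller's schema

    def walk(node):
        if isinstance(node, dict):
            lo, hi = node.get("minItems"), node.get("maxItems")
            if lo is not None and lo == hi and lo > 1:
                node["minItems"] = math.floor(lo * (1 - tolerance))
                node["maxItems"] = math.ceil(hi * (1 + tolerance))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    return result


rigid = {"paragraphs": {"type": "array", "minItems": 86, "maxItems": 86}}
print(loosen_counts(rigid)["paragraphs"])
# {'type': 'array', 'minItems': 43, 'maxItems': 129}
```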
### 2.3 Schema Composition Tool
**Command**: `markitect schema-compose`
Combine multiple schemas:
```bash
# Create composite schema
markitect schema-compose \
--base manpage-schema.json \
--extend api-reference-schema.json \
--extend tutorial-schema.json \
--output composite-schema.json
```
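One straightforward reading of composition is JSON Schema's `allOf`, which requires a document to satisfy every listed schema. The workplan does not specify an approach, so the sketch below assumes this one. It does not reconcile conflicting `x-markitect-*` extensions, and `compose_schemas` is hypothetical:

```python
import json


def compose_schemas(base: dict, *extensions: dict) -> dict:
    """Build a composite schema that a document must satisfy in full (allOf)."""
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "allOf": [base, *extensions],
    }


manpage = {"required": ["headings"]}
api_ref = {"required": ["code_blocks"]}
print(json.dumps(compose_schemas(manpage, api_ref), indent=2))
```

Because `allOf` is standard draft-07, any conforming validator can check documents against the composite without knowing about MarkiTect.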
### Tasks
- [ ] **Task 2.1**: Implement `schema-analyze` command
- [ ] **Task 2.2**: Implement `schema-refine` command with loosening logic
- [ ] **Task 2.3**: Implement `schema-refine --interactive` mode
- [ ] **Task 2.4**: Implement `schema-compose` command
- [ ] **Task 2.5**: Create schema refinement rule library
**Duration**: 3-4 sessions
**Dependencies**: Phase 1 complete
**Deliverables**: Schema analysis, refinement, and composition tools
---
## Phase 3: Enhanced Validation Engine
**Goal**: Validate classification levels, content patterns, and multi-schema conformance.
### 3.1 Classification-Aware Validation
**Validation Levels**:
```python
from __future__ import annotations
from typing import List, Literal

class ValidationResult:
    status: Literal["valid", "valid_with_warnings", "invalid"]
    errors: List[ValidationError]      # Required/Improper violations
    warnings: List[ValidationWarning]  # Recommended/Discouraged violations
    suggestions: List[str]             # Optional improvements
```
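The comments above imply a simple status rule: any error fails validation, warnings only downgrade a pass, and suggestions never affect the status. Below is a sketch in which plain strings stand in for the `ValidationError` and `ValidationWarning` objects. `derive_status` is a hypothetical name:

```python
from typing import List, Literal

Status = Literal["valid", "valid_with_warnings", "invalid"]


def derive_status(errors: List[str], warnings: List[str]) -> Status:
    """Errors (required/improper) fail validation; warnings only downgrade it."""
    if errors:
        return "invalid"
    if warnings:
        return "valid_with_warnings"
    return "valid"


errors = ["Missing required section: SYNOPSIS", "Improper section present: INTERNAL_NOTES"]
warnings = ["Missing recommended section: EXAMPLES", "Discouraged section present: DEPRECATED"]
status = derive_status(errors, warnings)
print(f"Status: {status.upper()} ({len(errors)} errors, {len(warnings)} warnings)")
# Status: INVALID (2 errors, 2 warnings)
```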
**Example Output**:
```bash
markitect validate document.md schema.json --detailed-errors
❌ ERRORS (validation failed)
- Missing required section: SYNOPSIS
- Improper section present: INTERNAL_NOTES
⚠️ WARNINGS
- Missing recommended section: EXAMPLES
- Discouraged section present: DEPRECATED
💡 SUGGESTIONS
- Consider adding optional section: PERFORMANCE
- Content quality: DESCRIPTION section below recommended word count (45/100)
Status: INVALID (2 errors, 2 warnings)
```
### 3.2 Content Pattern Validation
**Validate content patterns**:
```python
# Schema specifies required patterns
synopsis_section = {
    "required_patterns": [
        r"\*\*command\*\*",  # Bold command name
        r"\[.*\]",           # Options in brackets
    ],
    "discouraged_patterns": [
        r"TODO",  # No TODOs in published docs
        r"FIXME",
    ],
}
```
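Below is a sketch of applying such a pattern block with Python's `re` module. It assumes a pattern may match anywhere in the section, so it uses `re.search` rather than a full-string match. `check_patterns` is hypothetical:

```python
import re


def check_patterns(text: str, rules: dict) -> dict:
    """Report required patterns that are missing and discouraged patterns that are present."""
    missing = [p for p in rules.get("required_patterns", []) if not re.search(p, text)]
    found = [p for p in rules.get("discouraged_patterns", []) if re.search(p, text)]
    return {"missing_required": missing, "discouraged_present": found}


synopsis_section = {
    "required_patterns": [r"\*\*command\*\*", r"\[.*\]"],
    "discouraged_patterns": [r"TODO", r"FIXME"],
}
print(check_patterns("**command** [*OPTIONS*] TODO: add args", synopsis_section))
# {'missing_required': [], 'discouraged_present': ['TODO']}
```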
### 3.3 Multi-Schema Validation
**Command**: `markitect validate --schemas`
```bash
# Validate against multiple schemas
markitect validate api-doc.md \
--schemas manpage.json,api-reference.json,tutorial.json \
--require-all
# Output shows conformance to each schema
✅ manpage.json: VALID
✅ api-reference.json: VALID (2 warnings)
❌ tutorial.json: INVALID (missing required section: GETTING STARTED)
Overall: INVALID (must conform to all schemas)
```
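The overall line can be derived from the per-schema results. The sketch below assumes `--require-all` means every schema must pass, and that without the flag passing any one schema suffices; the workplan only shows the `--require-all` case. Schemas that pass with warnings count as passing. `overall_status` is hypothetical:

```python
def overall_status(results: dict[str, bool], require_all: bool = True) -> str:
    """Combine per-schema pass/fail results into one overall verdict."""
    ok = all(results.values()) if require_all else any(results.values())
    return "VALID" if ok else "INVALID"


results = {"manpage.json": True, "api-reference.json": True, "tutorial.json": False}
print(overall_status(results))                     # INVALID
print(overall_status(results, require_all=False))  # VALID
```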
### 3.4 Content Quality Metrics
**Validate content quality**:
```bash
markitect validate document.md schema.json --quality-check
📊 Content Quality Report
- Word count: 487 (target: 300-1000)
- Code examples: 3 (minimum: 3)
- Readability: Technical (appropriate)
- Link validity: 12/12 valid ✅
- Heading hierarchy: Valid ✅
Quality Score: 95/100
```
### Tasks
- [ ] **Task 3.1**: Implement classification-aware validator
- [ ] **Task 3.2**: Implement content pattern validation
- [ ] **Task 3.3**: Implement multi-schema validation
- [ ] **Task 3.4**: Implement content quality metrics
- [ ] **Task 3.5**: Enhanced error reporting with suggestions
**Duration**: 4-5 sessions
**Dependencies**: Phase 1 complete
**Deliverables**: Enhanced validation engine, quality metrics
---
## Phase 4: Blueprint System
**Goal**: Document generation system with schemas + content instructions + data templates.
### 4.1 Blueprint Format
**Blueprint Structure**:
```json
{
  "$blueprint": "1.0",
  "name": "api-documentation-blueprint",
  "description": "Blueprint for API endpoint documentation",
  "schemas": [
    "manpage-schema.json",
    "api-reference-schema.json"
  ],
  "content_model": {
    "synopsis": {
      "template": "**{{command}}** [*OPTIONS*] *{{primary_argument}}*",
      "data_source": "command_metadata.json",
      "instruction": "Brief command syntax"
    },
    "description": {
      "template": "{{description}}\n\nThis endpoint {{purpose}}.",
      "min_paragraphs": 2,
      "instruction": "Explain what the endpoint does and why to use it"
    },
    "parameters": {
      "template": "{{#each parameters}}\n**{{name}}** *{{type}}*\n: {{description}}\n{{/each}}",
      "data_source": "parameters",
      "instruction": "Document all parameters with types and descriptions"
    }
  },
  "data_schema": {
    "type": "object",
    "properties": {
      "command": {"type": "string"},
      "primary_argument": {"type": "string"},
      "description": {"type": "string"},
      "purpose": {"type": "string"},
      "parameters": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "type": {"type": "string"},
            "description": {"type": "string"}
          }
        }
      }
    }
  },
  "generation_rules": {
    "heading_style": "atx",
    "code_fence_style": "backticks",
    "line_length": 80,
    "include_metadata": true
  }
}
```
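To show how `content_model` templates consume `data_schema` fields, here is a deliberately minimal renderer for plain `{{name}}` placeholders. It is a sketch, not the Handlebars engine proposed in Phase 4. Block helpers such as `{{#each}}` are left untouched, and no template code is ever evaluated. `render` is a hypothetical name:

```python
import re

# Matches {{name}} (optionally padded with spaces); "#each" and "/each" do not match \w+.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, data: dict) -> str:
    """Substitute simple {{name}} placeholders; unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(data[m.group(1)]), template)


synopsis = "**{{command}}** [*OPTIONS*] *{{primary_argument}}*"
print(render(synopsis, {"command": "api-call", "primary_argument": "ENDPOINT"}))
# **api-call** [*OPTIONS*] *ENDPOINT*
```

The renderer raises on unknown placeholders instead of rendering them empty. That surfaces data that does not match `data_schema` early, and plain string substitution keeps the no-eval property listed under Risk Assessment.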
### 4.2 Blueprint Commands
**Create Blueprint**:
```bash
# From existing schema
markitect blueprint-create --from-schema api-schema.json \
--output api-blueprint.json
# Interactive creation
markitect blueprint-create --interactive
```
**Generate from Blueprint**:
```bash
# Generate with data file
markitect blueprint-generate api-blueprint.json \
--data endpoint-data.json \
--output api-doc.md
# Generate with inline data
markitect blueprint-generate api-blueprint.json \
--data '{"command": "api-call", "description": "Make API call"}' \
--output api-doc.md
# Batch generation
markitect blueprint-generate-batch api-blueprint.json \
--data-dir ./endpoints/ \
--output-dir ./docs/api/
```
**Validate Blueprint**:
```bash
# Validate blueprint format
markitect blueprint-validate api-blueprint.json
# Test blueprint generation
markitect blueprint-test api-blueprint.json \
--sample-data test-data.json
```
### 4.3 Template Engine Integration
**Handlebars-style templates with MarkiTect extensions**:
```markdown
# {{command}}(1) - {{title}}
## SYNOPSIS
**{{command}}** {{#each options}}[*{{this}}*] {{/each}}*{{argument}}*
## DESCRIPTION
{{description}}
{{#markitect-section "technical-details"}}
Technical implementation details for {{command}}.
{{/markitect-section}}
## PARAMETERS
{{#each parameters}}
**--{{name}}** *{{type}}*
: {{description}}
: {{#if default}}Default: `{{default}}`{{/if}}
{{/each}}
{{#markitect-code-block "bash"}}
# Example usage
{{command}} {{#each examples.[0].args}}{{this}} {{/each}}
{{/markitect-code-block}}
```
### Tasks
- [ ] **Task 4.1**: Define blueprint format specification
- [ ] **Task 4.2**: Implement `blueprint-create` command
- [ ] **Task 4.3**: Implement `blueprint-generate` command
- [ ] **Task 4.4**: Implement template engine with Handlebars
- [ ] **Task 4.5**: Implement `blueprint-validate` command
- [ ] **Task 4.6**: Implement batch generation
- [ ] **Task 4.7**: Create blueprint library (common patterns)
**Duration**: 5-6 sessions
**Dependencies**: Phases 1 and 3 complete
**Deliverables**: Blueprint system, template engine, generation commands
---
## Phase 5: Documentation and Integration
**Goal**: Comprehensive documentation, examples, and ecosystem integration.
### 5.1 Documentation Suite
**Documents to Create**:
- [ ] Schema Evolution Guide (why and how)
- [ ] Schema Classification Reference
- [ ] Content Control Specification
- [ ] Blueprint System Guide
- [ ] Schema Design Best Practices
- [ ] Migration Guide (old schemas → new format)
- [ ] API Reference for programmatic usage
### 5.2 Example Gallery
**Create comprehensive examples**:
- [ ] Manpage blueprint (already started)
- [ ] API documentation blueprint
- [ ] Tutorial document blueprint
- [ ] Architecture Decision Record (ADR) blueprint
- [ ] RFC/specification blueprint
- [ ] Meeting notes blueprint
- [ ] Project README blueprint
### 5.3 CLI Integration
**Update existing commands**:
```bash
# schema-generate with classification
markitect schema-generate example.md \
--classify-sections \
--add-instructions \
--flexible \
--output smart-schema.json
# validate with multiple schemas
markitect validate doc.md \
--schemas schema1.json,schema2.json \
--classification-aware \
--quality-check
# generate-stub enhanced
markitect generate-stub schema.json \
--include-instructions \
--sample-content \
--output template.md
```
### 5.4 CI/CD Integration Templates
**Provide ready-to-use integrations**:
GitHub Actions:
```yaml
- name: Validate Documentation
  uses: markitect/validate-action@v1
  with:
    schemas: docs/schemas/*.json
    files: docs/**/*.md
    classification-aware: true
    fail-on: errors
    warn-on: missing-recommended
```
Pre-commit hook:
```bash
#!/bin/bash
markitect validate-changed --schemas docs/schemas/ \
--classification-aware \
--fail-on errors
```
### Tasks
- [ ] **Task 5.1**: Write comprehensive documentation suite
- [ ] **Task 5.2**: Create example gallery with 7+ blueprints
- [ ] **Task 5.3**: Update all CLI commands for new features
- [ ] **Task 5.4**: Create CI/CD integration templates
- [ ] **Task 5.5**: Write migration guide for existing schemas
- [ ] **Task 5.6**: Create video tutorials/screencasts
**Duration**: 3-4 sessions
**Dependencies**: All previous phases complete
**Deliverables**: Complete documentation, examples, integrations
---
## Implementation Strategy
### Development Approach
**1. Test-Driven Development**
- Write tests for each classification level
- Test schema refinement transformations
- Test blueprint generation with various data
- Test multi-schema validation
**2. Backward Compatibility**
- Existing schemas continue to work
- New features are opt-in via extensions
- Clear migration path documented
**3. Incremental Rollout**
- Phase 1: Can be used immediately after completion
- Each phase delivers user value independently
- Later phases build on earlier phases
**4. Community Feedback**
- Alpha release after Phase 1
- Beta release after Phase 3
- Stable release after Phase 5
### Technical Considerations
**Schema Format**:
- JSON Schema draft-07 as foundation
- MarkiTect extensions namespaced with `x-markitect-`
- Validation via metaschema
- Clear upgrade path to future JSON Schema versions
**Performance**:
- Cache compiled schemas
- Lazy validation for large documents
- Parallel validation for multiple schemas
- Optimize content pattern matching
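For the first point, below is a sketch that uses `functools.lru_cache` to compile each content pattern once. Python's `re` module already keeps a small internal cache, so this mainly makes reuse explicit. The same decorator could wrap schema compilation if schemas are keyed by a hashable value such as a file path. `compiled` and `matches` are hypothetical names:

```python
import re
from functools import lru_cache


@lru_cache(maxsize=256)
def compiled(pattern: str) -> re.Pattern:
    """Compile a content pattern once and reuse it across documents."""
    return re.compile(pattern)


def matches(pattern: str, text: str) -> bool:
    return compiled(pattern).search(text) is not None


for doc in ["TODO: write docs", "Finished section", "FIXME later"]:
    matches(r"TODO|FIXME", doc)
print(compiled.cache_info())
# CacheInfo(hits=2, misses=1, maxsize=256, currsize=1)
```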
**API Design**:
- Programmatic access to all features
- Python API for schema manipulation
- Plugin system for custom validators
- Extensible template engine
---
## Success Metrics
### Phase 1 Success
- ✅ Schema with all 5 classifications validates correctly
- ✅ Content instructions appear in generated stubs
- ✅ Metaschema validates all extension formats
### Phase 2 Success
- ✅ Rigid schema refined to flexible schema automatically
- ✅ Multiple schemas composed without conflicts
- ✅ Interactive refinement completes end-to-end
### Phase 3 Success
- ✅ Validation distinguishes errors from warnings
- ✅ Content patterns detected and reported
- ✅ Multi-schema validation works with 3+ schemas
- ✅ Quality metrics provide actionable feedback
### Phase 4 Success
- ✅ Blueprint generates valid document from data
- ✅ Generated document validates against source schemas
- ✅ Batch generation processes 100+ documents
- ✅ Template engine supports complex logic
### Phase 5 Success
- ✅ Documentation covers all features
- ✅ 7+ working blueprint examples
- ✅ CI/CD integrations work in real projects
- ✅ Migration guide successfully upgrades old schemas
---
## Risk Assessment
### Technical Risks
**Risk**: Schema format complexity
**Mitigation**: Clear examples, validation tools, gradual adoption
**Risk**: Performance degradation with complex schemas
**Mitigation**: Caching, optimization, benchmarking
**Risk**: Template engine security (code injection)
**Mitigation**: Sandboxed execution, no eval, strict parsing
### Adoption Risks
**Risk**: Breaking changes to existing workflows
**Mitigation**: Full backward compatibility, opt-in features
**Risk**: Learning curve for new features
**Mitigation**: Excellent documentation, examples, tutorials
**Risk**: Feature bloat
**Mitigation**: Keep core simple, advanced features optional
---
## Future Enhancements (Post-MVP)
### Potential Future Features
**1. Semantic Validation**
- AI-powered content quality checking
- Grammar and style validation
- Factual consistency checking
- Link and reference validation
**2. Visual Schema Editor**
- Web-based GUI for schema creation
- Visual blueprint designer
- Live preview of generated documents
- Drag-and-drop section arrangement
**3. Schema Marketplace**
- Community schema repository
- Reusable blueprint library
- Rating and reviews system
- Version management
**4. Advanced Blueprint Features**
- Conditional sections based on data
- Dynamic schema selection
- Multi-language support
- Custom helper functions
**5. Integration Ecosystem**
- IDE plugins (VS Code, JetBrains)
- Documentation platforms (Read the Docs, Docusaurus)
- CMS integrations (Contentful, Strapi)
- Static site generators (Hugo, Jekyll)
---
## Conclusion
This workplan transforms MarkiTect from a structural validator to a comprehensive document control system:
**Current**: Rigid structure validation
**Target**: Flexible content control with blueprints
**Key Improvements**:
1. ✨ Classification system (required → improper)
2. ✨ Content guidance and instructions
3. ✨ Multi-schema conformance
4. ✨ Blueprint-based generation
5. ✨ Quality metrics and analysis
**Timeline**: ~8-10 weeks for full implementation
**Value**: Complete CMS-like document control for markdown
The system remains true to MarkiTect's philosophy of treating markdown as structured data while adding the flexibility and guidance needed for real-world content management.
---
## Next Steps
1. **Review and refine** this workplan
2. **Prioritize phases** based on user needs
3. **Create detailed specifications** for Phase 1
4. **Set up development environment** for new features
5. **Begin implementation** with TDD approach
**First Implementation Task**: Define `x-markitect-sections` format specification

@@ -0,0 +1,287 @@
# Terminology Document Example
This example demonstrates how to use MarkiTect schemas to validate terminology and glossary documents.
## Overview
Terminology documents (glossaries, dictionaries, lexicons) benefit from consistent structure and validation. This example shows how to:
1. Structure terminology documents with clear categories and term definitions
2. Validate terminology documents using JSON schemas
3. Use MarkiTect's schema extensions for content control
## Files
- **terminology-example.md** - Example terminology document with proper structure
- **terminology-schema.json** - JSON schema for validating terminology documents
- **README.md** - This file
## Terminology Document Structure
A well-structured terminology document includes:
### Required Elements
1. **Main Title (Level 1 Heading)**
- Should include keywords: "Terminology", "Glossary", "Terms", or "Definitions"
2. **Category Sections (Level 2 Headings)**
- Organize terms into logical groups
- Examples: "Core Concepts", "Document Types", "Process Terms"
3. **Term Definitions (Level 3 Headings)**
- Each term as a level 3 heading
- Followed by structured content
### Term Structure
Each term should include:
**Required:**
- **Definition:** Clear, concise explanation of the term
**Optional (but recommended):**
- **Synonyms:** Alternative names or abbreviations
- **Related Terms:** Links to related concepts
- **Example:** Practical usage example
- **Use Cases:** Common scenarios
- **Format:** For document type terms
- **Components:** For complex concepts
- **Steps:** For process terms
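Below is a sketch of extracting these labeled fields from one term section and enforcing that only **Definition:** is required. It assumes each label is bold text ending in a colon at the start of a line, as in the example document. `term_fields` and `check_term` are hypothetical:

```python
import re

# A field label is bold text ending in a colon at line start, e.g. "**Definition:** ...".
# [ \t]* (not \s*) keeps a label-only line from swallowing the next line.
FIELD_RE = re.compile(r"^\*\*([^*\n]+):\*\*[ \t]*(.*)$", re.MULTILINE)


def term_fields(section: str) -> dict[str, str]:
    """Map each field label in a term section to the text on the same line."""
    return dict(FIELD_RE.findall(section))


def check_term(section: str) -> list[str]:
    """Only Definition is required; every other label is optional."""
    return [] if term_fields(section).get("Definition") else ["missing **Definition:**"]


term = """### Abstract Syntax Tree
**Definition:** A tree representation of the abstract syntactic structure of source code or markup.
**Synonyms:** AST, Parse Tree
**Related Terms:** Parser, Token, Node
"""
print(term_fields(term)["Synonyms"])  # AST, Parse Tree
print(check_term(term))               # []
```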
## Usage
### Using the Registered Schema
The terminology schema is registered in markitect's database and can be used by name:
```bash
# List all registered schemas (terminology-schema.json should appear)
markitect schema-list
# Validate using the registered schema
markitect validate my-glossary.md --schema terminology-schema.json
# Or use the local file directly
markitect validate my-glossary.md --schema examples/terminology/terminology-schema.json
```
### Validate with Detailed Errors
```bash
markitect validate my-glossary.md --schema terminology-schema.json --detailed-errors
```
### Register the Schema (if needed)
If the schema isn't already registered, you can add it to markitect's database:
```bash
markitect schema-ingest markitect/schemas/terminology-schema.json
```
### Generate Schema from Example
```bash
markitect schema-generate terminology-example.md --output my-terminology-schema.json
```
## Schema Features
This schema demonstrates several MarkiTect features:
### 1. Structural Validation
- Enforces consistent heading hierarchy (H1 → H2 → H3)
- Validates minimum term count
- Ensures proper document structure
### 2. Content Pattern Validation
- Validates title pattern (must contain terminology-related keywords)
- Checks for required field labels (Definition:, Synonyms:, etc.)
- Enforces consistent formatting
### 3. MarkiTect Extensions
The schema uses MarkiTect-specific extensions:
#### `x-markitect-sections`
Defines section classifications and requirements:
- `document_title` (required)
- `category_sections` (required, min 1)
- `term_definitions` (required, min 1)
#### `x-markitect-content-control`
Specifies content requirements:
- Required vs optional components
- Content quality metrics (word counts)
- Content instructions for authors
#### `x-markitect-validation-rules`
Custom validation rules:
- Minimum term count (3 required, 10+ recommended)
- Category balance (min 2 terms per category)
- Definition quality checks
- Consistency validation
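The first two rules can be checked directly from the heading structure. The sketch below counts H3 terms under each H2 category and skips fenced code. Its default thresholds come from the list above, but this commit does not show how MarkiTect itself evaluates `x-markitect-validation-rules`. `terms_per_category` and `check_balance` are hypothetical:

```python
from collections import Counter

FENCE = "`" * 3  # triple backtick that opens or closes a fenced code block


def terms_per_category(markdown: str) -> Counter:
    """Count H3 terms under each H2 category, ignoring fenced code blocks."""
    counts, category, in_fence = Counter(), None, False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence      # toggle on opening/closing fence
        elif in_fence:
            continue                     # headings inside code are not terms
        elif line.startswith("## "):
            category = line[3:].strip()
            counts[category] += 0        # record categories that have no terms yet
        elif line.startswith("### ") and category is not None:
            counts[category] += 1
    return counts


def check_balance(markdown: str, min_terms: int = 3, min_per_category: int = 2) -> list[str]:
    """Apply the minimum-term-count and category-balance rules."""
    counts = terms_per_category(markdown)
    total = sum(counts.values())
    problems = [f"only {total} terms (minimum {min_terms})"] if total < min_terms else []
    for category, n in counts.items():
        if n < min_per_category:
            problems.append(f"category '{category}' has {n} term(s) (minimum {min_per_category})")
    return problems


doc = """# Project Terminology
## Core Concepts
### Abstract Syntax Tree
### Schema Validation
## Process Terms
### Schema Refinement
"""
print(dict(terms_per_category(doc)))  # {'Core Concepts': 2, 'Process Terms': 1}
print(check_balance(doc))
# ["category 'Process Terms' has 1 term(s) (minimum 2)"]
```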
## Best Practices
### 1. Use Consistent Field Labels
Always use the same labels for metadata:
```markdown
**Definition:** ...
**Synonyms:** ...
**Related Terms:** ...
```
### 2. Write Clear Definitions
- Start with the term's primary meaning
- Use 10-200 words
- Be self-contained (don't require reading other terms)
- Avoid circular definitions
### 3. Group Related Terms
Organize terms into logical categories:
- Core Concepts
- Document Types
- Process Terms
- Quality Attributes
- Deprecated Terms
### 4. Include Examples
Add practical examples for complex terms:
````markdown
**Example:**
```markdown
# Heading
Paragraph text
```
````
### 5. Link Related Terms
Use **Related Terms:** to create a terminology graph:
```markdown
**Related Terms:** Parser, Token, Node
```
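Collecting these lines yields an adjacency list, which makes it easy to find related terms that are never defined in the document. The sketch below assumes each term has one **Related Terms:** line with comma-separated names. `related_graph` and `dangling` are hypothetical:

```python
import re

TERM_RE = re.compile(r"^### (.+)$", re.MULTILINE)
RELATED_RE = re.compile(r"^\*\*Related Terms:\*\*[ \t]*(.*)$", re.MULTILINE)


def related_graph(markdown: str) -> dict[str, list[str]]:
    """Map each H3 term to the names on its Related Terms line."""
    graph = {}
    # With a capture group, re.split yields [preamble, term1, body1, term2, body2, ...].
    parts = TERM_RE.split(markdown)
    for term, body in zip(parts[1::2], parts[2::2]):
        match = RELATED_RE.search(body)
        names = match.group(1).split(",") if match else []
        graph[term.strip()] = [n.strip() for n in names if n.strip()]
    return graph


def dangling(graph: dict[str, list[str]]) -> set[str]:
    """Related terms that are not themselves defined as H3 terms."""
    return {name for names in graph.values() for name in names} - set(graph)


doc = """## Core Concepts
### Abstract Syntax Tree
**Related Terms:** Parser, Token, Node
### Schema Validation
**Related Terms:** JSON Schema, Validator, Metaschema
"""
graph = related_graph(doc)
print(graph["Abstract Syntax Tree"])  # ['Parser', 'Token', 'Node']
print(sorted(dangling(graph)))
# ['JSON Schema', 'Metaschema', 'Node', 'Parser', 'Token', 'Validator']
```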
## Extending the Schema
You can customize the schema for your project:
### Add Custom Field Labels
Extend the `bold_text` enum:
```json
"enum": [
  "Definition:",
  "Synonyms:",
  "Your Custom Label:"
]
```
### Adjust Quality Metrics
Modify content quality requirements:
```json
"content_quality": {
  "min_words_per_definition": 20,
  "max_words_per_definition": 300,
  "readability_target": "business"
}
```
### Add Domain-Specific Validation
Include specialized validation rules:
```json
"x-markitect-validation-rules": {
  "domain_specific": {
    "require_acronym_expansion": true,
    "require_source_citations": true
  }
}
```
## Use Cases
### Documentation Projects
- Software project glossaries
- API terminology reference
- Architecture decision records (ADR) glossary
- Domain-driven design (DDD) ubiquitous language
### Technical Writing
- Standards documentation
- Compliance documentation (ISO, SOC2)
- Technical specifications
- Research papers
### Knowledge Management
- Company wikis
- Team handbooks
- Onboarding documentation
- Training materials
## Integration with CI/CD
### Pre-commit Hook
```bash
#!/bin/bash
# .git/hooks/pre-commit
markitect validate docs/glossary.md --schema schemas/terminology-schema.json
```
### GitHub Actions
```yaml
name: Validate Terminology
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install MarkiTect
        run: pip install markitect
      - name: Validate Glossary
        run: |
          markitect validate docs/glossary.md \
            --schema schemas/terminology-schema.json \
            --detailed-errors
```
## Related Examples
- **manpages/** - Manual page documentation validation
- **templates/** - Document template examples
- **design-patterns/** - Software pattern documentation
## Learn More
- [Schema Extensions Specification](../../docs/specifications/schema-extensions-spec.md)
- [MarkiTect Documentation](../../README.md)
- [JSON Schema Documentation](https://json-schema.org/)
## Contributing
To improve this example:
1. Add more terms to demonstrate edge cases
2. Enhance the schema with additional validation rules
3. Create alternative terminology document styles
4. Add multilingual terminology examples
## License
This example is part of the MarkiTect project and follows the same license.

@@ -0,0 +1,91 @@
# Project Terminology
A glossary of key terms and concepts for this project.
## Core Concepts
### Abstract Syntax Tree
**Definition:** A tree representation of the abstract syntactic structure of source code or markup, where each node represents a construct occurring in the source.
**Synonyms:** AST, Parse Tree
**Related Terms:** Parser, Token, Node
**Example:**
```markdown
# Heading
Paragraph text
```
The AST representation would include nodes for heading (level 1) and paragraph elements.
### Schema Validation
**Definition:** The process of verifying that a document's structure conforms to a predefined schema specification.
**Synonyms:** Structural Validation, Schema Conformance
**Related Terms:** JSON Schema, Validator, Metaschema
**Use Cases:**
- Ensuring documentation completeness
- Enforcing content standards
- Automated quality checks
## Document Types
### Manpage
**Definition:** A manual page document following Unix/Linux documentation conventions, typically including sections like SYNOPSIS, DESCRIPTION, and OPTIONS.
**Format:** Markdown with specific heading structure
**Related Terms:** Documentation, Manual, Help Text
### Blueprint
**Definition:** A template specification that combines schemas, content instructions, and data templates for generating documents.
**Components:**
- Schema references
- Content model
- Data schema
- Generation rules
## Process Terms
### Schema Refinement
**Definition:** The process of transforming a rigid, auto-generated schema into a flexible, classification-aware schema with content guidance.
**Steps:**
1. Analyze existing schema for rigidity
2. Loosen exact constraints to ranges
3. Add classification metadata
4. Include content instructions
**Tools:** `markitect schema-analyze`, `markitect schema-refine`
## Quality Attributes
### Classification Levels
**Definition:** A hierarchical system for categorizing document elements based on their importance and requirements.
**Levels:**
- **Required**: Must be present (validation fails if missing)
- **Recommended**: Should be present (warning if missing)
- **Optional**: May be present (no impact)
- **Discouraged**: Should not be present (warning if present)
- **Improper**: Must not be present (validation fails if present)

## Deprecated Terms

### Rigid Schema

**Definition:** A schema with exact count constraints that make it unusable as a pattern.

**Status:** DEPRECATED - Use "Unrefined Schema" instead

**Migration:** Use schema refinement tools to convert to flexible schemas.

@@ -0,0 +1,214 @@
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://markitect.dev/schemas/terminology-v1.json",
  "title": "Terminology Document Schema",
  "description": "Schema for validating terminology and glossary documents with consistent structure",
  "type": "object",
  "properties": {
    "headings": {
      "type": "object",
      "properties": {
        "level_1": {
          "type": "array",
          "description": "Main document title",
          "items": {
            "type": "object",
            "properties": {
              "content": {
                "type": "string",
                "pattern": ".*(Terminology|Glossary|Terms|Definitions).*"
              }
            }
          },
          "minItems": 1,
          "maxItems": 1
        },
        "level_2": {
          "type": "array",
          "description": "Category headings (Core Concepts, Document Types, etc.)",
          "items": {
            "type": "object",
            "properties": {
              "content": {
                "type": "string",
                "minLength": 1
              }
            }
          },
          "minItems": 1,
          "maxItems": 20
        },
        "level_3": {
          "type": "array",
          "description": "Individual term headings",
          "items": {
            "type": "object",
            "properties": {
              "content": {
                "type": "string",
                "minLength": 1,
                "description": "Term name - should be title case"
              }
            }
          },
          "minItems": 1
        }
      },
      "required": ["level_1", "level_2", "level_3"]
    },
    "paragraphs": {
      "type": "array",
      "description": "Content paragraphs including definitions and descriptions",
      "items": {
        "type": "object",
        "properties": {
          "content": {
            "type": "string",
            "minLength": 10
          }
        }
      },
      "minItems": 3
    },
    "bold_text": {
      "type": "array",
      "description": "Bold text used for field labels (Definition, Synonyms, etc.)",
      "items": {
        "type": "object",
        "properties": {
          "content": {
            "type": "string",
            "enum": [
              "Definition:",
              "Synonyms:",
              "Related Terms:",
              "Example:",
              "Examples:",
              "Use Cases:",
              "Usage:",
              "Format:",
              "Components:",
              "Steps:",
              "Tools:",
              "Levels:",
              "Status:",
              "Migration:",
              "Required:",
              "Recommended:",
              "Optional:",
              "Discouraged:",
              "Improper:"
            ]
          }
        }
      },
      "minItems": 1
    }
  },
  "required": ["headings", "paragraphs"],
  "x-markitect-sections": {
    "document_title": {
      "classification": "required",
      "heading_level": 1,
      "content_instruction": "Main title should include words like 'Terminology', 'Glossary', or 'Definitions'",
      "pattern": ".*(Terminology|Glossary|Terms|Definitions).*"
    },
    "category_sections": {
      "classification": "required",
      "heading_level": 2,
      "min_sections": 1,
      "content_instruction": "Organize terms into logical categories (e.g., Core Concepts, Document Types, Process Terms)"
    },
    "term_definitions": {
      "classification": "required",
      "heading_level": 3,
      "min_sections": 1,
      "content_instruction": "Each term should be a level 3 heading followed by its definition and optional metadata"
    }
  },
  "x-markitect-content-control": {
    "term_structure": {
      "required_components": [
        {
          "label": "Definition:",
          "type": "bold_text",
          "description": "Clear, concise definition of the term"
        }
      ],
      "optional_components": [
        {
          "label": "Synonyms:",
          "type": "bold_text",
          "description": "Alternative names or abbreviations"
        },
        {
          "label": "Related Terms:",
          "type": "bold_text",
          "description": "Links to related concepts"
        },
        {
          "label": "Example:",
          "type": "bold_text_or_code",
          "description": "Practical example demonstrating the term"
        },
        {
          "label": "Use Cases:",
          "type": "list",
          "description": "Common scenarios where term applies"
        }
      ],
      "content_quality": {
        "min_words_per_definition": 10,
        "max_words_per_definition": 200,
        "readability_target": "technical"
      },
      "content_instructions": [
        "Start each term with a level 3 heading containing the term name",
        "Follow immediately with 'Definition:' in bold",
        "Provide a clear, self-contained definition",
        "Add optional fields (Synonyms, Related Terms, Examples) as needed",
        "Use consistent formatting across all terms",
        "Group related terms under category headings (level 2)"
      ]
    },
    "definition_pattern": {
      "description": "Each definition should follow: Term heading (###) → Definition: (bold) → Definition text",
      "validation": {
        "heading_level_3_followed_by": "bold_text_starting_with_Definition",
        "definition_length": {
          "min_words": 10,
          "max_words": 200
        }
      }
    },
    "deprecated_terms": {
      "classification": "optional",
      "heading_level": 2,
      "content_instruction": "Optional section for deprecated terms with migration guidance",
      "required_fields": [
        "Status: DEPRECATED",
        "Migration:"
      ]
    }
  },
  "x-markitect-validation-rules": {
    "term_count": {
      "min": 3,
      "recommended_min": 10,
      "description": "Terminology document should define at least 3 terms, 10+ recommended"
    },
    "category_balance": {
      "description": "Each category should have at least 2 terms",
      "min_terms_per_category": 2
    },
    "definition_quality": {
      "all_terms_must_have_definition": true,
      "definition_must_follow_term_heading": true,
      "definition_min_words": 10
    },
    "consistency": {
      "use_consistent_field_labels": true,
      "maintain_heading_hierarchy": true
    }
  }
}
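
The `definition_pattern` and `x-markitect-validation-rules` entries above can be checked with a line-based scan. This is a minimal sketch under stated assumptions, not MarkiTect's implementation:

- Any `###` heading is treated as a term, and the nearest preceding `##` heading as its category.
- `**Definition:**` is expected as the first non-blank line after the term heading.
- Fenced code blocks are not skipped.
- The default thresholds (3 terms, 2 terms per category, 10 to 200 definition words) are copied from the schema rather than read from it.

The names `Term`, `parse_terms`, and `check_rules` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
DEFINITION = re.compile(r"^\*\*Definition:\*\*\s*(.*)$")


@dataclass
class Term:
    name: str
    category: str
    definition: Optional[str] = None


def parse_terms(markdown: str) -> list[Term]:
    """Collect ### terms, their ## category, and a directly following Definition."""
    terms: list[Term] = []
    category = ""
    expect_definition = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines between heading and definition are allowed
        heading = HEADING.match(stripped)
        if heading:
            level, text = len(heading.group(1)), heading.group(2)
            if level == 2:
                category = text
            elif level == 3:
                terms.append(Term(text, category))
            expect_definition = level == 3
            continue
        match = DEFINITION.match(stripped)
        if expect_definition and match:
            terms[-1].definition = match.group(1)
        expect_definition = False  # only the first line after a term heading counts
    return terms


def check_rules(terms: list[Term], min_terms: int = 3, min_per_category: int = 2,
                min_words: int = 10, max_words: int = 200) -> list[str]:
    """Apply term_count, category_balance, and definition-length rules."""
    problems: list[str] = []
    if len(terms) < min_terms:
        problems.append(f"only {len(terms)} terms; at least {min_terms} required")
    per_category: dict[str, int] = {}
    for term in terms:
        per_category[term.category] = per_category.get(term.category, 0) + 1
        if term.definition is None:
            problems.append(f"{term.name}: no Definition: directly after heading")
            continue
        words = len(term.definition.split())
        if not min_words <= words <= max_words:
            problems.append(f"{term.name}: definition has {words} words")
    for category, count in per_category.items():
        if count < min_per_category:
            problems.append(f"{category}: only {count} term(s)")
    return problems


sample = """# Sample Glossary
## Core Concepts
### Schema Validation
**Definition:** The process of verifying that a document's structure conforms to a predefined schema specification.
### Manpage
**Format:** Markdown with a specific heading structure
"""

for problem in check_rules(parse_terms(sample)):
    print(problem)
# only 2 terms; at least 3 required
# Manpage: no Definition: directly after heading
```

A fuller checker would take the thresholds from the loaded schema's `x-markitect-validation-rules` object instead of hard-coding them as defaults.
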