markitect-main/roadmap/schema-of-schemas/SCHEMA_MANAGEMENT_PROPOSAL.md
tegwick b6f95066a3 chore: establish schema-of-schemas workplan and reorganize roadmap
This commit sets up the comprehensive workplan for implementing a
markdown-first schema management system with naming conventions,
versioning, and self-validation capabilities.

## Directory Reorganization

- Renamed `todo/` → `roadmap/` for better organization
- Created `roadmap/schema-of-schemas/` subdirectory
- Moved schema management planning artifacts to dedicated directory

## Planning Artifacts Created

### Workplan & Documentation
- **WORKPLAN.md** (19KB) - Comprehensive 6-phase implementation plan
- **SCHEMA_MANAGEMENT_PROPOSAL.md** - Full analysis with 4 options
- **SCHEMA_MANAGEMENT_SUMMARY.md** - Executive summary
- **README.md** - Quick reference guide

### Example Schema
- **examples/schemas/manpage-schema-v1.md** - Demonstrates markdown format

## Schema Management System Design

### Naming Convention
**Format:** `{domain}-schema-v{major}.{minor}.md`
**Examples:**
- `manpage-schema-v1.0.md`
- `terminology-schema-v1.0.md`
- `api-documentation-schema-v1.0.md`

### Markdown-First Format
Schemas will be markdown files with:
- YAML frontmatter for metadata
- Rich documentation sections
- Embedded JSON schema in code block
- Version history and examples

### Implementation Phases (8-10 days)

**Phase 0:** Planning & Setup (0.5 days) - COMPLETE
**Phase 1:** Filename Convention (1 day) - NEXT
**Phase 2:** Markdown Loader (2-3 days)
**Phase 3:** Schema-for-Schemas (2 days)
**Phase 4:** Schema Migration (1-2 days)
**Phase 5:** CLI & Documentation (1 day)
**Phase 6:** Testing & Validation (1 day)

### Goals

1. Establish naming convention
2. Implement filename validation
3. Create markdown schema loader
4. Build schema-for-schemas metaschema
5. Migrate 5 existing schemas (remove 2 duplicates)
6. Update CLI and documentation

## Updated Tracking

### TODO.md
- Added Schema-of-Schemas as active work item
- Documented Phase 1 tasks and timeline
- Paused capability extraction work

### CHANGELOG.md
- Added schema management system to [Unreleased]
- Documented directory reorganization
- Added "In Progress" section for current work

## Next Steps

Begin Phase 1:
1. Implement schema_naming.py with validation
2. Add unit tests
3. Update CLI schema-ingest command
4. Create naming specification document

## Files Changed

- CHANGELOG.md - Added unreleased schema management features
- TODO.md - Updated active work tracking
- roadmap/ - Reorganized from todo/
- roadmap/schema-of-schemas/ - New planning directory
- examples/schemas/ - Example markdown schema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 23:47:02 +01:00


# Schema Management Proposal

**Status:** Draft

**Created:** 2026-01-04

**Author:** Analysis of current state and proposed improvements

## Problem Statement

### 1. Inconsistent Schema Naming

**Current State:**

```
terminology-schema.json        ← Has ".json" suffix
api-documentation              ← No suffix
enhanced-manpage               ← No suffix
markdown-manpage               ← No suffix, duplicate title
markdown-manpage-schema.json   ← Has ".json" suffix, duplicate title
```

**Issues:**

- No naming convention enforced
- Duplicate schemas (3 manpage schemas!)
- Mix of suffixed (`.json`) and non-suffixed names
- No way to distinguish versions

### 2. Missing Versioning

**Current State:**

- No version in filenames
- No version in schema metadata (beyond the optional `$id`)
- No way to track schema evolution
- Breaking changes not apparent

**Issues:**

- Can't have multiple versions simultaneously
- No migration path when schemas change
- Unclear which schema version a document uses

### 3. Format Mismatch: JSON vs Markdown

**The Philosophical Problem:**

> MarkiTect is a markdown-centric tool, yet schemas are JSON files.
> This creates a conceptual and practical mismatch.

**Current State:**

- Documents: Markdown (`.md`)
- Schemas: JSON (`.json`)
- No unified format for documentation + schema
- Schemas lack rich documentation capabilities

## Proposed Solutions

### Part 1: Naming Convention & Versioning

#### Option A: Filename-Based Versioning (Recommended)

**Format:** `{domain}-schema-v{major}.{minor}.json`

**Examples:**

```
manpage-schema-v1.0.json             # Manpage schema v1.0
manpage-schema-v2.0.json             # Breaking change → v2.0
terminology-schema-v1.0.json         # Terminology schema
api-documentation-schema-v1.0.json
arc42-schema-v1.0.json
```

**Benefits:**

- Clear versioning in filename
- Easy to see multiple versions
- SemVer-compatible (major.minor; the patch level can live in the schema's `version` field)
- Searchable/sortable (compare versions numerically: as plain strings, `v1.10` sorts before `v1.9`)

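
The convention is simple enough to enforce with a regular expression. The sketch below is hypothetical (the helper names are illustrative, not the API of the planned `schema_naming.py`): it parses a filename into domain and version, returns `None` for names that break the convention, and picks the newest minor release within one major version, comparing minors as integers rather than strings. Domains are assumed to be lowercase alphanumeric words joined by hyphens.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for `{domain}-schema-v{major}.{minor}.json`;
# the domain may itself contain hyphens (e.g. "api-documentation").
SCHEMA_NAME_RE = re.compile(
    r"^(?P<domain>[a-z0-9]+(?:-[a-z0-9]+)*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.json$"
)


class SchemaName(NamedTuple):
    domain: str
    major: int
    minor: int


def parse_schema_name(filename: str) -> Optional[SchemaName]:
    """Return the parsed name, or None if the filename violates the convention."""
    match = SCHEMA_NAME_RE.match(filename)
    if match is None:
        return None
    return SchemaName(match["domain"], int(match["major"]), int(match["minor"]))


def latest_for_major(filenames: list[str], domain: str, major: int) -> Optional[str]:
    """Pick the highest minor version of `domain` within one major version."""
    best: Optional[SchemaName] = None
    best_name: Optional[str] = None
    for name in filenames:
        parsed = parse_schema_name(name)
        if parsed is None or parsed.domain != domain or parsed.major != major:
            continue
        # Integer comparison, so v1.10 correctly outranks v1.9.
        if best is None or parsed.minor > best.minor:
            best, best_name = parsed, name
    return best_name
```

If schemas move to the markdown format proposed in Part 3, only the `.json` extension in the pattern needs to change.
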
**Migration Strategy:**

```
# Rename existing schemas
markdown-manpage        → manpage-schema-v1.0.json
enhanced-manpage        → manpage-schema-v2.0.json   # (breaking changes)
terminology-schema.json → terminology-schema-v1.0.json
```

#### Option B: $id-Based Versioning

**Keep simple filenames, use `$id` for versioning:**

```json
{
  "$id": "https://markitect.dev/schemas/manpage/v1",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "version": "1.0.0",
  ...
}
```

**Filenames:** `manpage-schema.json`, `terminology-schema.json`

**Benefits:**

- Clean filenames
- Versioning in metadata
- Follows JSON Schema best practices

**Drawbacks:**

- Can't have multiple versions in same database
- Harder to see versions at a glance

#### Recommendation: **Hybrid Approach**

Combine both for maximum clarity:

```json
// File: manpage-schema-v1.json
{
  "$id": "https://markitect.dev/schemas/manpage/v1",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "version": "1.0.0",
  "title": "Unix Manual Page Schema",
  ...
}
```

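
With the hybrid approach the major version is recorded three times (filename, `$id`, and `version`), so a cheap consistency check guards against them drifting apart. A minimal sketch, assuming the `$id` ends in `/v{major}` and the filename ends in `-v{major}.json` or `-v{major}.{minor}.json`; `version_mismatches` is a hypothetical name.

```python
import re


def version_mismatches(filename: str, schema: dict) -> list[str]:
    """Report places where filename, $id, and version disagree."""
    version = schema.get("version", "")
    semver = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if semver is None:
        return [f"version {version!r} is not MAJOR.MINOR.PATCH"]
    major = int(semver.group(1))
    problems = []

    # $id is assumed to end in /v{major}, e.g. .../schemas/manpage/v1
    id_match = re.search(r"/v(\d+)$", schema.get("$id", ""))
    if id_match is None or int(id_match.group(1)) != major:
        problems.append(f"$id does not end in /v{major}")

    # Filename carries -v{major} and optionally .{minor} before .json
    file_match = re.search(r"-v(\d+)(?:\.(\d+))?\.json$", filename)
    if file_match is None or int(file_match.group(1)) != major:
        problems.append(f"filename does not carry major version {major}")
    elif file_match.group(2) is not None and int(file_match.group(2)) != int(semver.group(2)):
        problems.append("filename minor version differs from version")
    return problems
```

An empty list means all three agree; a CI step could fail on any non-empty result.
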
### Part 2: Schema Metadata Standard

Add required metadata to all schemas:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://markitect.dev/schemas/{domain}/v{major}",

  // Required metadata
  "version": "1.0.0",  // SemVer
  "title": "Human Readable Title",
  "description": "Detailed description",

  // Optional metadata
  "x-markitect-schema-type": "document-schema",
  "x-markitect-version": {
    "major": 1,
    "minor": 0,
    "patch": 0
  },
  "x-markitect-author": "MarkiTect Project",
  "x-markitect-created": "2026-01-04",
  "x-markitect-updated": "2026-01-04",
  "x-markitect-deprecated": false,
  "x-markitect-superseded-by": null,
  "x-markitect-document-types": ["manpage", "manual"],
  "x-markitect-example": "examples/manpages/example.md",

  // Schema content
  "type": "object",
  "properties": { ... }
}
```

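
The required part of this standard can be checked without a full JSON Schema validator. A minimal sketch, assuming the schema has already been parsed into a dict (the `//` annotations above would not appear in a real JSON file). It checks the three required fields, that `version` is strict SemVer, and that the optional `x-markitect-version` object agrees with it. `check_metadata` is a hypothetical name; the metaschema planned for Phase 3 would eventually subsume checks like these.

```python
import re

REQUIRED_FIELDS = ("version", "title", "description")
# Strict SemVer core: three numeric parts, no leading zeros.
SEMVER_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def check_metadata(schema: dict) -> list[str]:
    """Return human-readable metadata problems (empty list if none)."""
    problems = [f"missing required field: {field}"
                for field in REQUIRED_FIELDS if not schema.get(field)]

    version = schema.get("version")
    match = SEMVER_RE.fullmatch(version) if isinstance(version, str) else None
    if version and match is None:
        problems.append(f"version {version!r} is not SemVer MAJOR.MINOR.PATCH")

    # x-markitect-version is optional, but must agree with `version` when present.
    parts = schema.get("x-markitect-version")
    if parts is not None and match is not None:
        expected = {"major": int(match[1]), "minor": int(match[2]), "patch": int(match[3])}
        if parts != expected:
            problems.append("x-markitect-version disagrees with version")
    return problems
```
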
### Part 3: Format Mismatch Solutions

#### Option 1: Markdown-First with Embedded JSON (Recommended)

**File Format:** Markdown with frontmatter and JSON code block

````markdown
---
schema-version: "1.0.0"
schema-id: "https://markitect.dev/schemas/manpage/v1"
document-type: manpage
status: stable
---

# Manpage Schema v1.0

## Overview

This schema validates Unix/Linux manual page documentation following
standard conventions (SYNOPSIS, DESCRIPTION, OPTIONS, etc.).

## Document Types

- Manual pages (man pages)
- CLI command documentation
- API reference pages

## Usage

```bash
markitect validate mycommand.1.md --schema manpage-schema-v1
```

## Examples

See [examples/manpages/](../../examples/manpages/) for complete examples.

## Schema Definition

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://markitect.dev/schemas/manpage/v1",
  "version": "1.0.0",
  "title": "Unix Manual Page Schema",
  "type": "object",
  "properties": {
    "headings": {
      "type": "object",
      "properties": {
        "level_1": { ... }
      }
    }
  },
  "x-markitect-sections": { ... }
}
```

## Validation Rules

### Required Sections

- **NAME** - Command name and brief description
- **SYNOPSIS** - Command syntax
- **DESCRIPTION** - Detailed description

### Optional Sections

- **OPTIONS** - Command-line options
- **EXAMPLES** - Usage examples
- **SEE ALSO** - Related commands

## Version History

### v1.0.0 (2026-01-04)

- Initial release
- Basic manpage structure validation
````

**Implementation:**

```python
import json
from pathlib import Path


class MarkdownSchemaLoader:
    """Load schemas from markdown files with embedded JSON.

    The _extract_frontmatter, _extract_json_from_code_block, and
    _generate_schema_documentation helpers are not shown here.
    """

    def load_schema_from_markdown(self, md_path: Path) -> dict:
        """Extract JSON schema from markdown file."""
        content = md_path.read_text(encoding="utf-8")
        # Parse frontmatter (the leading --- block)
        frontmatter = self._extract_frontmatter(content)
        # Extract JSON from the json code block
        schema_json = self._extract_json_from_code_block(content)
        # Merge frontmatter into the schema as an extension keyword
        schema = json.loads(schema_json)
        schema["x-markitect-metadata"] = frontmatter
        return schema

    def save_schema_to_markdown(self, schema: dict, md_path: Path) -> None:
        """Save schema as markdown with embedded JSON."""
        # Generate markdown documentation
        doc = self._generate_schema_documentation(schema)
        # Embed JSON schema; x-markitect-metadata, if present, is serialized
        # into the JSON here rather than written back as frontmatter
        json_block = f"```json\n{json.dumps(schema, indent=2)}\n```"
        # Combine
        full_content = f"{doc}\n\n## Schema Definition\n\n{json_block}\n"
        md_path.write_text(full_content, encoding="utf-8")
```

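
The two extraction helpers the loader relies on are not spelled out above. A minimal sketch of what they might do, written as standalone functions and assuming (1) frontmatter is a leading `---` block of flat `key: value` lines and (2) the schema is the first fenced block tagged `json`. A real implementation would likely parse frontmatter with a YAML library (for example PyYAML's `yaml.safe_load`) instead of the line splitting shown here.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid nesting a literal fence
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
JSON_BLOCK_RE = re.compile("^" + FENCE + r"json\n(.*?)^" + FENCE, re.DOTALL | re.MULTILINE)


def extract_frontmatter(content: str) -> dict:
    """Parse flat `key: value` frontmatter lines into a dict of strings."""
    match = FRONTMATTER_RE.match(content)
    if match is None:
        return {}
    metadata = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            metadata[key.strip()] = value.strip().strip('"')
    return metadata


def extract_json_from_code_block(content: str) -> str:
    """Return the body of the first fenced block tagged `json`."""
    match = JSON_BLOCK_RE.search(content)
    if match is None:
        raise ValueError("no json code block found")
    return match.group(1)


def load_embedded_schema(content: str) -> dict:
    """String-based counterpart of load_schema_from_markdown."""
    schema = json.loads(extract_json_from_code_block(content))
    schema["x-markitect-metadata"] = extract_frontmatter(content)
    return schema
```

Because the JSON block is located by its `json` tag, other fenced blocks in the file (such as the `bash` usage example) are skipped.
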
**Benefits:**

- ✅ Markdown-first (aligns with MarkiTect philosophy)
- ✅ Rich documentation alongside schema
- ✅ Human-readable and editable
- ✅ Version history in same file
- ✅ Examples and usage inline
- ✅ Can extract JSON when needed

**Drawbacks:**

- ⚠️ Requires parsing logic
- ⚠️ Two sources of truth (markdown + embedded JSON)
- ⚠️ More complex than pure JSON

#### Option 2: Markdown Documentation Generator

**Keep JSON schemas, auto-generate markdown docs:**

```
schemas/
  manpage-schema-v1.json    # Source of truth
  manpage-schema-v1.md      # Auto-generated docs
```

**Command:**

```bash
markitect schema-document manpage-schema-v1.json
# Generates: manpage-schema-v1.md
```

**Benefits:**

- ✅ Simple implementation
- ✅ JSON remains source of truth
- ✅ Auto-generated docs always in sync

**Drawbacks:**

- ⚠️ Two files to manage
- ⚠️ Can't hand-edit documentation (gets overwritten)

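
Option 2's generator could start as little more than a template over the schema's metadata. A minimal sketch: `schema_to_markdown` is a hypothetical name, and the page layout (title, version, description, a table of top-level properties, then the embedded JSON) is an assumption rather than a settled format.

```python
import json

FENCE = "`" * 3  # built programmatically to avoid nesting a literal fence


def schema_to_markdown(schema: dict) -> str:
    """Render a JSON schema dict as a markdown reference page."""
    lines = [f"# {schema.get('title', 'Untitled Schema')}", ""]
    if "version" in schema:
        lines += [f"**Version:** {schema['version']}", ""]
    if "description" in schema:
        lines += [schema["description"], ""]

    # One table row per top-level property; nested detail stays in the JSON.
    properties = schema.get("properties", {})
    if properties:
        lines += ["## Properties", "",
                  "| Name | Type | Description |",
                  "|------|------|-------------|"]
        for name, spec in properties.items():
            lines.append(f"| `{name}` | {spec.get('type', 'any')} | {spec.get('description', '')} |")
        lines.append("")

    # Embed the full schema; the .json file remains the source of truth.
    lines += ["## Schema Definition", "", FENCE + "json", json.dumps(schema, indent=2), FENCE]
    return "\n".join(lines) + "\n"
```

Since the output is regenerated from the JSON every time, any hand edits to the `.md` file are lost, which is exactly the drawback listed above.
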
#### Option 3: Markdown Schema Language (DSL)

**Define schemas in markdown-native syntax:**

```markdown
# Manpage Schema v1.0

## Document Structure

### Required Sections (Level 1 Heading)

**NAME**

- Classification: required
- Content: Command name in bold, followed by description
- Pattern: `**command** - description`

**SYNOPSIS**

- Classification: required
- Content: Command syntax with options
- Min paragraphs: 1
- Max paragraphs: 3

### Optional Sections

**OPTIONS**

- Classification: recommended
- Content: Definition list of command-line options
```

**Parser generates JSON schema from markdown:**

```bash
markitect schema-compile manpage-schema-v1.md --output manpage-schema-v1.json
```

**Benefits:**

- ✅ Pure markdown
- ✅ Human-friendly syntax
- ✅ No JSON editing needed

**Drawbacks:**

- ⚠️ Complex parser implementation
- ⚠️ Limited to MarkiTect-specific features
- ⚠️ Can't use standard JSON Schema tools

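
To gauge the parser effort, here is a minimal sketch that handles only the subset shown in the example: a bold line names a section, `- Key: value` bullets become its properties, and any heading closes the current section. The output keys (lowercased, spaces turned into underscores) and the function name are assumptions; mapping this structure onto real JSON Schema is the compiler step, where most of the estimated effort would go.

```python
import re

SECTION_RE = re.compile(r"^\*\*(?P<name>[^*]+)\*\*$")              # **NAME**
PROPERTY_RE = re.compile(r"^- (?P<key>[^:]+):\s*(?P<value>.+)$")   # - Key: value


def parse_schema_dsl(text: str) -> dict[str, dict]:
    """Collect bold-named sections and their `- Key: value` bullets."""
    sections: dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            current = None  # a heading ends the previous section
        elif match := SECTION_RE.match(line):
            current = sections.setdefault(match["name"], {})
        elif current is not None and (match := PROPERTY_RE.match(line)):
            key = match["key"].strip().lower().replace(" ", "_")
            value = match["value"].strip()
            # Numeric limits such as "Max paragraphs: 3" become integers.
            current[key] = int(value) if value.isdigit() else value
    return sections
```

Even this toy version shows the cost: every new construct (tables, nested lists, heading levels) needs its own recognition rule.
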
#### Option 4: Literate Schema Programming

**Inspired by literate programming, mix documentation and schema:**

```markdown
# Manpage Schema v1.0

Manual pages follow a standard structure. The NAME section is required:

<<define-name-section>>=
"NAME": {
  "classification": "required",
  "heading_level": 2,
  "content_instruction": "Command name and brief description"
}
@

The SYNOPSIS section shows command syntax:

<<define-synopsis-section>>=
"SYNOPSIS": {
  "classification": "required",
  "heading_level": 2,
  "min_code_blocks": 1
}
@

Complete schema:

<<manpage-schema.json>>=
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "x-markitect-sections": {
    <<define-name-section>>,
    <<define-synopsis-section>>
  }
}
@
```

**Benefits:**

- ✅ Documentation and schema interleaved
- ✅ Literate programming benefits
- ✅ Reusable schema fragments

**Drawbacks:**

- ⚠️ Complex tangling/weaving
- ⚠️ Unfamiliar paradigm
- ⚠️ Overkill for simple schemas

## Recommendations

### Short-Term (Immediate)

1. **Naming Convention:**
   - Format: `{domain}-schema-v{major}.{minor}.json`
   - Example: `manpage-schema-v1.0.json`
2. **Schema Metadata:**
   - Add required `version`, `title`, `description` fields
   - Add `x-markitect-*` metadata extensions
   - Document in `schema-catalog.yaml`
3. **Duplicate Cleanup:**
   - Consolidate 3 manpage schemas into versioned series
   - Keep enhanced-manpage as v2.0 (breaking changes)
   - Archive old schemas

### Medium-Term (Next Phase)

4. **Markdown Schema Format (Option 1):**
   - Implement markdown-first schema format
   - Markdown file with embedded JSON in code block
   - Parser extracts JSON for validation
   - Rich documentation alongside schema
5. **Schema Documentation Generator:**
   - Auto-generate markdown docs from JSON schemas
   - Include examples, usage, version history
   - Link to example documents

### Long-Term (Future)

6. **Schema DSL (Option 3):**
   - Evaluate markdown schema language
   - Prototype parser for common patterns
   - Consider if DSL adds value over JSON
7. **Schema Registry API:**
   - REST API for schema discovery
   - Version negotiation
   - Schema evolution tracking

## Implementation Plan

### Phase 1: Naming & Versioning (1-2 days)

**Tasks:**

1. Define naming convention spec
2. Create schema metadata template
3. Rename existing schemas
4. Update `schema-catalog.yaml`
5. Update documentation

**Deliverables:**

- Schema naming convention spec
- Migrated schemas with versions
- Updated catalog

### Phase 2: Markdown Schema Format (3-5 days)

**Tasks:**

1. Design markdown schema format
2. Implement parser (extract JSON from markdown)
3. Implement generator (create markdown from JSON)
4. Convert existing schemas to markdown format
5. Update CLI to support `.md` schemas
6. Write documentation and examples

**Deliverables:**

- Markdown schema parser/generator
- All schemas in markdown format
- Updated CLI commands
- Migration guide

### Phase 3: Schema Validation (2-3 days)

**Tasks:**

1. Create metaschema for validating schemas
2. Add schema validation command
3. Validate all existing schemas
4. Add CI check for schema validity

**Deliverables:**

- Schema-for-schemas (metaschema)
- Validation command
- CI integration

## Cost-Benefit Analysis

### Option 1: Markdown-First (Recommended)

**Cost:**

- Parser implementation: ~200 lines
- CLI updates: ~100 lines
- Migration effort: 2-3 days
- Testing: 1 day

**Benefit:**

- Aligned with markdown philosophy ⭐⭐⭐⭐⭐
- Rich documentation ⭐⭐⭐⭐⭐
- Version history inline ⭐⭐⭐⭐
- Human-friendly ⭐⭐⭐⭐⭐
- Lower barrier to entry ⭐⭐⭐⭐

**Total:** High value for reasonable cost

### Option 2: Documentation Generator

**Cost:**

- Generator implementation: ~150 lines
- Template design: 1 day
- Testing: 0.5 days

**Benefit:**

- Simple implementation ⭐⭐⭐⭐
- Auto-sync docs ⭐⭐⭐⭐
- JSON remains source ⭐⭐⭐

**Total:** Good value, lower cost

### Option 3: Schema DSL

**Cost:**

- DSL design: 2-3 days
- Parser implementation: ~500 lines
- Compiler: ~300 lines
- Testing: 2 days
- Documentation: 1 day

**Benefit:**

- Pure markdown ⭐⭐⭐⭐⭐
- No JSON editing ⭐⭐⭐⭐
- Limited ecosystem ⭐⭐

**Total:** High cost, uncertain value

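## Decision Matrix

More stars are better in every row; for *Implementation cost*, more stars mean a cheaper implementation.
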
| Criterion | Option 1: Markdown-First | Option 2: Doc Generator | Option 3: DSL |
|-----------|-------------------------|------------------------|---------------|
| Markdown alignment | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Implementation cost | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Documentation quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Tool ecosystem | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Maintainability | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| User-friendliness | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |

## Recommendation Summary

1. **Immediate:** Implement naming convention and versioning
2. **Short-term:** Choose **Option 1 (Markdown-First)** for schema format
3. **Fallback:** If Option 1 proves too complex, use **Option 2 (Doc Generator)**
4. **Future:** Evaluate DSL if community demand emerges

## Next Steps

1. Review and approve this proposal
2. Create naming convention specification
3. Prototype markdown schema parser
4. Migrate one schema as proof-of-concept
5. Gather feedback and iterate
6. Full migration of all schemas

---

## Appendix: Example Markdown Schema

See `examples/schemas/manpage-schema-v1.md` for a complete example of the proposed format.