markitect-main/roadmap/schema-of-schemas/SCHEMA_MANAGEMENT_PROPOSAL.md
tegwick b6f95066a3 chore: establish schema-of-schemas workplan and reorganize roadmap
This commit sets up the comprehensive workplan for implementing a
markdown-first schema management system with naming conventions,
versioning, and self-validation capabilities.

## Directory Reorganization

- Renamed `todo/` → `roadmap/` for better organization
- Created `roadmap/schema-of-schemas/` subdirectory
- Moved schema management planning artifacts to dedicated directory

## Planning Artifacts Created

### Workplan & Documentation
- **WORKPLAN.md** (19KB) - Comprehensive 6-phase implementation plan
- **SCHEMA_MANAGEMENT_PROPOSAL.md** - Full analysis with 4 options
- **SCHEMA_MANAGEMENT_SUMMARY.md** - Executive summary
- **README.md** - Quick reference guide

### Example Schema
- **examples/schemas/manpage-schema-v1.md** - Demonstrates markdown format

## Schema Management System Design

### Naming Convention
**Format:** `{domain}-schema-v{major}.{minor}.md`
**Examples:**
- `manpage-schema-v1.0.md`
- `terminology-schema-v1.0.md`
- `api-documentation-schema-v1.0.md`

### Markdown-First Format
Schemas will be markdown files with:
- YAML frontmatter for metadata
- Rich documentation sections
- Embedded JSON schema in code block
- Version history and examples

### Implementation Phases (8-10 days)

**Phase 0:** Planning & Setup (0.5 days) - COMPLETE
**Phase 1:** Filename Convention (1 day) - NEXT
**Phase 2:** Markdown Loader (2-3 days)
**Phase 3:** Schema-for-Schemas (2 days)
**Phase 4:** Schema Migration (1-2 days)
**Phase 5:** CLI & Documentation (1 day)
**Phase 6:** Testing & Validation (1 day)

### Goals

1. Establish naming convention
2. Implement filename validation
3. Create markdown schema loader
4. Build schema-for-schemas metaschema
5. Migrate 5 existing schemas (remove 2 duplicates)
6. Update CLI and documentation

## Updated Tracking

### TODO.md
- Added Schema-of-Schemas as active work item
- Documented Phase 1 tasks and timeline
- Paused capability extraction work

### CHANGELOG.md
- Added schema management system to [Unreleased]
- Documented directory reorganization
- Added "In Progress" section for current work

## Next Steps

Begin Phase 1:
1. Implement schema_naming.py with validation
2. Add unit tests
3. Update CLI schema-ingest command
4. Create naming specification document

## Files Changed

- CHANGELOG.md - Added unreleased schema management features
- TODO.md - Updated active work tracking
- roadmap/ - Reorganized from todo/
- roadmap/schema-of-schemas/ - New planning directory
- examples/schemas/ - Example markdown schema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 23:47:02 +01:00


Schema Management Proposal

Status: Draft
Created: 2026-01-04
Author: Analysis of current state and proposed improvements

Problem Statement

1. Inconsistent Schema Naming

Current State:

terminology-schema.json       ← Has ".json" suffix
api-documentation             ← No suffix
enhanced-manpage              ← No suffix
markdown-manpage              ← No suffix, duplicate title
markdown-manpage-schema.json  ← Has ".json" suffix, duplicate title

Issues:

  • No naming convention enforced
  • Duplicate schemas (3 manpage schemas!)
  • Mix of suffixed (.json) and non-suffixed names
  • No way to distinguish versions

2. Missing Versioning

Current State:

  • No version in filenames
  • No version in schema metadata (beyond optional $id)
  • No way to track schema evolution
  • Breaking changes not apparent

Issues:

  • Can't have multiple versions simultaneously
  • No migration path when schemas change
  • Unclear which schema version a document uses

3. Format Mismatch: JSON vs Markdown

The Philosophical Problem:

MarkiTect is a markdown-centric tool, yet schemas are JSON files. This creates a conceptual and practical mismatch.

Current State:

  • Documents: Markdown (.md)
  • Schemas: JSON (.json)
  • No unified format for documentation + schema
  • Schemas lack rich documentation capabilities

Proposed Solutions

Part 1: Naming Convention & Versioning

Option A: Filename-Based Versioning

Format: {domain}-schema-v{major}.{minor}.json

Examples:

manpage-schema-v1.0.json          # Manpage schema v1.0
manpage-schema-v2.0.json          # Breaking change → v2.0
terminology-schema-v1.0.json      # Terminology schema
api-documentation-schema-v1.0.json
arc42-schema-v1.0.json

Benefits:

  • Clear versioning in filename
  • Easy to see multiple versions
  • Follows SemVer's major.minor (patch level omitted from the filename)
  • Searchable/sortable

Migration Strategy:

# Rename existing schemas
markdown-manpage → manpage-schema-v1.0.json
enhanced-manpage → manpage-schema-v2.0.json  # (breaking changes)
terminology-schema.json → terminology-schema-v1.0.json
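
A minimal sketch of how the filename convention could be checked, assuming the `{domain}-schema-v{major}.{minor}.json` format and a domain made of lowercase words joined by hyphens. The names `SCHEMA_NAME_RE`, `SchemaName`, `parse_schema_name`, and `build_schema_name` are hypothetical and not part of MarkiTect.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for {domain}-schema-v{major}.{minor}.json. The allowed
# domain characters (lowercase letters, digits, hyphens) are an assumption.
SCHEMA_NAME_RE = re.compile(
    r"^(?P<domain>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.json$"
)


class SchemaName(NamedTuple):
    domain: str
    major: int
    minor: int


def parse_schema_name(filename: str) -> Optional[SchemaName]:
    """Return the parsed parts, or None if the name breaks the convention."""
    match = SCHEMA_NAME_RE.match(filename)
    if match is None:
        return None
    return SchemaName(match["domain"], int(match["major"]), int(match["minor"]))


def build_schema_name(domain: str, major: int, minor: int = 0) -> str:
    """Build a conforming filename from its parts."""
    return f"{domain}-schema-v{major}.{minor}.json"
```

Under this sketch, `terminology-schema.json` and `markdown-manpage` are rejected, while `api-documentation-schema-v1.0.json` parses with the domain `api-documentation`.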

Option B: $id-Based Versioning

Keep simple filenames, use $id for versioning:

{
  "$id": "https://markitect.dev/schemas/manpage/v1",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "version": "1.0.0",
  ...
}

Filenames: manpage-schema.json, terminology-schema.json

Benefits:

  • Clean filenames
  • Versioning in metadata
  • Follows JSON Schema best practices

Drawbacks:

  • Can't have multiple versions in same database
  • Harder to see versions at a glance

Recommendation: Hybrid Approach

Combine both for maximum clarity:

// File: manpage-schema-v1.0.json
{
  "$id": "https://markitect.dev/schemas/manpage/v1",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "version": "1.0.0",
  "title": "Unix Manual Page Schema",
  ...
}
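
With the hybrid approach the version appears in three places (filename, `$id`, and `version`), so they can drift apart. The sketch below shows one way such drift might be detected. It assumes `$id` ends in `/v{major}` and that `version` is a `major.minor.patch` string, as in the examples above; `check_version_consistency` is a hypothetical helper.

```python
import re


def check_version_consistency(filename: str, schema: dict) -> list:
    """Return a list of human-readable mismatches (empty means consistent)."""
    problems = []

    # Accept both "-v1.json" and "-v1.0.json" style filenames.
    name_match = re.search(r"-v(\d+)(?:\.(\d+))?\.json$", filename)
    if name_match is None:
        return [f"{filename}: no version found in filename"]
    file_major = int(name_match.group(1))

    # "$id" is assumed to end in /v{major}.
    id_match = re.search(r"/v(\d+)$", schema.get("$id", ""))
    if id_match is None:
        problems.append("$id does not end in /v<major>")
    elif int(id_match.group(1)) != file_major:
        problems.append(
            f"$id major {id_match.group(1)} != filename major {file_major}"
        )

    # "version" is assumed to be a SemVer string such as "1.0.0".
    version = str(schema.get("version", ""))
    ver_match = re.match(r"^(\d+)\.(\d+)\.(\d+)$", version)
    if ver_match is None:
        problems.append(f"version {version!r} is not major.minor.patch")
    elif int(ver_match.group(1)) != file_major:
        problems.append(
            f"version major {ver_match.group(1)} != filename major {file_major}"
        )

    return problems
```

Only the major component is compared here, because `$id` carries nothing finer than the major version.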

Part 2: Schema Metadata Standard

Add required metadata to all schemas:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://markitect.dev/schemas/{domain}/v{major}",

  // Required metadata
  "version": "1.0.0",                    // SemVer
  "title": "Human Readable Title",
  "description": "Detailed description",

  // Optional metadata
  "x-markitect-schema-type": "document-schema",
  "x-markitect-version": {
    "major": 1,
    "minor": 0,
    "patch": 0
  },
  "x-markitect-author": "MarkiTect Project",
  "x-markitect-created": "2026-01-04",
  "x-markitect-updated": "2026-01-04",
  "x-markitect-deprecated": false,
  "x-markitect-superseded-by": null,
  "x-markitect-document-types": ["manpage", "manual"],
  "x-markitect-example": "examples/manpages/example.md",

  // Schema content
  "type": "object",
  "properties": { ... }
}
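
As a rough illustration of enforcing the required metadata, the sketch below checks for the three required fields and compares `version` with the optional `x-markitect-version` object. `find_metadata_problems` is a hypothetical name, and the rule that the two version forms must agree is an assumption rather than something this proposal specifies.

```python
import re

REQUIRED_FIELDS = ("version", "title", "description")
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")


def find_metadata_problems(schema: dict) -> list:
    """Return a list of metadata problems (empty means the schema passes)."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not schema.get(name)
    ]

    version = schema.get("version")
    if version and not SEMVER_RE.match(str(version)):
        problems.append(f"version {version!r} is not major.minor.patch")

    # Assumption: when x-markitect-version is present it should spell out
    # the same version as the "version" string.
    parts = schema.get("x-markitect-version")
    if version and isinstance(parts, dict):
        split = f"{parts.get('major')}.{parts.get('minor')}.{parts.get('patch')}"
        if split != str(version):
            problems.append(
                f"x-markitect-version {split} does not match version {version}"
            )

    return problems
```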

Part 3: Format Mismatch Solutions

Option 1: Markdown-First Schema Format

File Format: Markdown with frontmatter and JSON code block

---
schema-version: "1.0.0"
schema-id: "https://markitect.dev/schemas/manpage/v1"
document-type: manpage
status: stable
---

# Manpage Schema v1.0

## Overview

This schema validates Unix/Linux manual page documentation following
standard conventions (SYNOPSIS, DESCRIPTION, OPTIONS, etc.).

## Document Types

- Manual pages (man pages)
- CLI command documentation
- API reference pages

## Usage

\`\`\`bash
markitect validate mycommand.1.md --schema manpage-schema-v1
\`\`\`

## Examples

See [examples/manpages/](../../examples/manpages/) for complete examples.

## Schema Definition

\`\`\`json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://markitect.dev/schemas/manpage/v1",
  "version": "1.0.0",
  "title": "Unix Manual Page Schema",
  "type": "object",
  "properties": {
    "headings": {
      "type": "object",
      "properties": {
        "level_1": { ... }
      }
    }
  },
  "x-markitect-sections": { ... }
}
\`\`\`

## Validation Rules

### Required Sections
- **NAME** - Command name and brief description
- **SYNOPSIS** - Command syntax
- **DESCRIPTION** - Detailed description

### Optional Sections
- **OPTIONS** - Command-line options
- **EXAMPLES** - Usage examples
- **SEE ALSO** - Related commands

## Version History

### v1.0.0 (2026-01-04)
- Initial release
- Basic manpage structure validation

Implementation:

import json
from pathlib import Path


class MarkdownSchemaLoader:
    """Load schemas from markdown files with embedded JSON."""

    # The _extract_* and _generate_* helpers are not shown in this proposal.

    def load_schema_from_markdown(self, md_path: Path) -> dict:
        """Extract JSON schema from markdown file."""
        content = md_path.read_text(encoding="utf-8")

        # Parse frontmatter
        frontmatter = self._extract_frontmatter(content)

        # Extract JSON from code block
        schema_json = self._extract_json_from_code_block(content)

        # Merge metadata
        schema = json.loads(schema_json)
        schema['x-markitect-metadata'] = frontmatter

        return schema

    def save_schema_to_markdown(self, schema: dict, md_path: Path) -> None:
        """Save schema as markdown with embedded JSON."""
        # Generate markdown documentation
        doc = self._generate_schema_documentation(schema)

        # Embed the JSON schema, leaving out the frontmatter merged in by
        # load_schema_from_markdown so a load/save round trip does not
        # duplicate it inside the JSON block
        embedded = {k: v for k, v in schema.items() if k != 'x-markitect-metadata'}
        json_block = f"```json\n{json.dumps(embedded, indent=2)}\n```"

        # Combine
        full_content = f"{doc}\n\n## Schema Definition\n\n{json_block}"
        md_path.write_text(full_content, encoding="utf-8")
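
The loader above relies on `_extract_frontmatter` and `_extract_json_from_code_block`, which are not defined in this proposal. The sketch below shows one possible shape for them as standalone functions with hypothetical names. It assumes flat `key: value` frontmatter (a real loader would more likely use a YAML parser) and takes the first code block tagged `json`.

```python
import json
import re

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3


def extract_frontmatter(content: str) -> dict:
    """Parse flat 'key: value' pairs between the leading '---' lines."""
    lines = content.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip().strip('"')
    return {}  # no closing delimiter: treat as having no frontmatter


def extract_json_from_code_block(content: str) -> str:
    """Return the body of the first fenced block tagged 'json'."""
    pattern = re.compile(
        rf"^{FENCE}json[ \t]*\n(.*?)\n{FENCE}[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(content)
    if match is None:
        raise ValueError("no json code block found")
    return match.group(1)


def load_schema(content: str) -> dict:
    """Combine both steps, mirroring load_schema_from_markdown."""
    schema = json.loads(extract_json_from_code_block(content))
    schema["x-markitect-metadata"] = extract_frontmatter(content)
    return schema
```

Taking the first `json` block is a simplification. A schema file that also shows JSON examples would need a stricter rule, such as requiring the block to sit under the "Schema Definition" heading.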

Benefits:

  • Markdown-first (aligns with MarkiTect philosophy)
  • Rich documentation alongside schema
  • Human-readable and editable
  • Version history in same file
  • Examples and usage inline
  • Can extract JSON when needed

Drawbacks:

  • ⚠️ Requires parsing logic
  • ⚠️ Two sources of truth (markdown + embedded JSON)
  • ⚠️ More complex than pure JSON

Option 2: Markdown Documentation Generator

Keep JSON schemas, auto-generate markdown docs:

schemas/
  manpage-schema-v1.json          # Source of truth
  manpage-schema-v1.md            # Auto-generated docs

Command:

markitect schema-document manpage-schema-v1.json
# Generates: manpage-schema-v1.md

Benefits:

  • Simple implementation
  • JSON remains source of truth
  • Auto-generated docs always in sync

Drawbacks:

  • ⚠️ Two files to manage
  • ⚠️ Can't hand-edit documentation (gets overwritten)
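
To give a sense of the effort behind Option 2, here is a minimal sketch of a generator that renders a markdown page from a schema's metadata. `render_schema_doc` and the page layout are illustrative assumptions and do not describe the behavior of `markitect schema-document`.

```python
def render_schema_doc(schema: dict) -> str:
    """Render a short markdown page from a schema's metadata."""
    title = schema.get("title", "Untitled Schema")
    version = schema.get("version", "unversioned")
    lines = [f"# {title} v{version}", ""]

    if schema.get("description"):
        lines += ["## Overview", "", schema["description"], ""]

    # List top-level properties with their declared JSON Schema type.
    properties = schema.get("properties", {})
    if properties:
        lines += ["## Properties", ""]
        for name, spec in sorted(properties.items()):
            lines.append(f"- **{name}** ({spec.get('type', 'any')})")
        lines.append("")

    return "\n".join(lines)
```

Because the output is derived entirely from the JSON, hand edits to the generated file would be lost on the next run, which is the drawback noted above.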

Option 3: Markdown Schema Language (DSL)

Define schemas in markdown-native syntax:

# Manpage Schema v1.0

## Document Structure

### Required Sections (Level 1 Heading)

**NAME**
- Classification: required
- Content: Command name in bold, followed by description
- Pattern: `**command** - description`

**SYNOPSIS**
- Classification: required
- Content: Command syntax with options
- Min paragraphs: 1
- Max paragraphs: 3

### Optional Sections

**OPTIONS**
- Classification: recommended
- Content: Definition list of command-line options

Parser generates JSON schema from markdown:

markitect schema-compile manpage-schema-v1.md --output manpage-schema-v1.json

Benefits:

  • Pure markdown
  • Human-friendly syntax
  • No JSON editing needed

Drawbacks:

  • ⚠️ Complex parser implementation
  • ⚠️ Limited to MarkiTect-specific features
  • ⚠️ Can't use standard JSON Schema tools
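
The parser cost is easier to judge with a concrete starting point. The sketch below compiles the bold section names and `- Key: value` lines from the DSL example into an `x-markitect-sections` dictionary. The function name, the key normalization, and the output shape are all assumptions. A full compiler would also need to handle headings, nesting, and error reporting.

```python
import re

SECTION_RE = re.compile(r"^\*\*(?P<name>[A-Z][A-Z ]*)\*\*$")
FIELD_RE = re.compile(r"^- (?P<key>[^:]+): (?P<value>.+)$")


def compile_sections(markdown: str) -> dict:
    """Turn '**NAME**' blocks and '- Key: value' lines into a dict."""
    sections = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        header = SECTION_RE.match(line)
        if header:
            # A bold, upper-case line starts a new section.
            current = sections.setdefault(header["name"], {})
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            # "Min paragraphs" becomes "min_paragraphs"; digits become ints.
            key = field["key"].strip().lower().replace(" ", "_")
            value = field["value"].strip()
            current[key] = int(value) if value.isdigit() else value
    return {"x-markitect-sections": sections}
```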

Option 4: Literate Schema Programming

Inspired by literate programming, mix documentation and schema:

# Manpage Schema v1.0

Manual pages follow a standard structure. The NAME section is required:

<<define-name-section>>=
"NAME": {
  "classification": "required",
  "heading_level": 2,
  "content_instruction": "Command name and brief description"
}

The SYNOPSIS section shows command syntax:

<<define-synopsis-section>>=
"SYNOPSIS": {
  "classification": "required",
  "heading_level": 2,
  "min_code_blocks": 1
}

Complete schema:

<<manpage-schema.json>>=
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "x-markitect-sections": {
    <<define-name-section>>,
    <<define-synopsis-section>>
  }
}

Benefits:

  • Documentation and schema interleaved
  • Literate programming benefits
  • Reusable schema fragments

Drawbacks:

  • ⚠️ Complex tangling/weaving
  • ⚠️ Unfamiliar paradigm
  • ⚠️ Overkill for simple schemas
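
The tangling step is the part of Option 4 that needs custom tooling. The sketch below expands `<<chunk>>` references recursively, assuming chunks have already been collected into a dictionary. `tangle` is a hypothetical name, and the chunk bodies in the usage note are key/value fragments chosen so that the expanded result is valid JSON.

```python
import re

REF_RE = re.compile(r"<<([^<>]+)>>")


def tangle(chunks: dict, root: str, _stack: tuple = ()) -> str:
    """Expand every <<name>> reference inside chunks[root], recursively."""
    if root in _stack:
        raise ValueError(f"circular chunk reference: {root}")
    if root not in chunks:
        raise KeyError(f"undefined chunk: {root}")
    return REF_RE.sub(
        # Each reference is replaced by the fully expanded named chunk.
        lambda m: tangle(chunks, m.group(1), _stack + (root,)),
        chunks[root],
    )
```

For example, with `chunks = {"name": '"NAME": {"classification": "required"}', "root": '{"x-markitect-sections": {<<name>>}}'}`, calling `tangle(chunks, "root")` yields a string that `json.loads` accepts.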

Recommendations

Short-Term (Immediate)

  1. Naming Convention:

    • Format: {domain}-schema-v{major}.{minor}.json
    • Example: manpage-schema-v1.0.json
  2. Schema Metadata:

    • Add required version, title, description fields
    • Add x-markitect-* metadata extensions
    • Document in schema-catalog.yaml
  3. Duplicate Cleanup:

    • Consolidate 3 manpage schemas into versioned series
    • Keep enhanced-manpage as v2.0 (breaking changes)
    • Archive old schemas

Medium-Term (Next Phase)

  1. Markdown Schema Format (Option 1):

    • Implement markdown-first schema format
    • Markdown file with embedded JSON in code block
    • Parser extracts JSON for validation
    • Rich documentation alongside schema
  2. Schema Documentation Generator:

    • Auto-generate markdown docs from JSON schemas
    • Include examples, usage, version history
    • Link to example documents

Long-Term (Future)

  1. Schema DSL (Option 3):

    • Evaluate markdown schema language
    • Prototype parser for common patterns
    • Consider if DSL adds value over JSON
  2. Schema Registry API:

    • REST API for schema discovery
    • Version negotiation
    • Schema evolution tracking
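
Version negotiation for a registry could start from something as small as the sketch below: given the available filenames, pick the highest minor version within a requested major. `resolve_schema` is a hypothetical function, and the rule itself (latest minor within a major is compatible) is an assumption borrowed from SemVer, not a decision made in this proposal.

```python
import re
from typing import Iterable, Optional

NAME_RE = re.compile(
    r"^(?P<domain>.+)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.json$"
)


def resolve_schema(
    filenames: Iterable[str], domain: str, major: int
) -> Optional[str]:
    """Pick the highest minor version of `domain` within the given major."""
    best = None
    best_minor = -1
    for name in filenames:
        match = NAME_RE.match(name)
        if not match or match["domain"] != domain or int(match["major"]) != major:
            continue
        # Compare minors numerically so that v1.10 ranks above v1.2.
        minor = int(match["minor"])
        if minor > best_minor:
            best, best_minor = name, minor
    return best
```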

Implementation Plan

Phase 1: Naming & Versioning (1-2 days)

Tasks:

  1. Define naming convention spec
  2. Create schema metadata template
  3. Rename existing schemas
  4. Update schema-catalog.yaml
  5. Update documentation

Deliverables:

  • Schema naming convention spec
  • Migrated schemas with versions
  • Updated catalog

Phase 2: Markdown Schema Format (3-5 days)

Tasks:

  1. Design markdown schema format
  2. Implement parser (extract JSON from markdown)
  3. Implement generator (create markdown from JSON)
  4. Convert existing schemas to markdown format
  5. Update CLI to support .md schemas
  6. Write documentation and examples

Deliverables:

  • Markdown schema parser/generator
  • All schemas in markdown format
  • Updated CLI commands
  • Migration guide

Phase 3: Schema Validation (2-3 days)

Tasks:

  1. Create metaschema for validating schemas
  2. Add schema validation command
  3. Validate all existing schemas
  4. Add CI check for schema validity

Deliverables:

  • Schema-for-schemas (metaschema)
  • Validation command
  • CI integration

Cost-Benefit Analysis

Option 1: Markdown-First

Cost:

  • Parser implementation: ~200 lines
  • CLI updates: ~100 lines
  • Migration effort: 2-3 days
  • Testing: 1 day

Benefit:

  • Aligned with markdown philosophy
  • Rich documentation
  • Version history inline
  • Human-friendly
  • Lower barrier to entry

Total: High value for reasonable cost

Option 2: Documentation Generator

Cost:

  • Generator implementation: ~150 lines
  • Template design: 1 day
  • Testing: 0.5 days

Benefit:

  • Simple implementation
  • Auto-sync docs
  • JSON remains source

Total: Good value, lower cost

Option 3: Schema DSL

Cost:

  • DSL design: 2-3 days
  • Parser implementation: ~500 lines
  • Compiler: ~300 lines
  • Testing: 2 days
  • Documentation: 1 day

Benefit:

  • Pure markdown
  • No JSON editing
  • ⚠️ Limited ecosystem (caveat: standard JSON Schema tools cannot be used)

Total: High cost, uncertain value

Decision Matrix

Criterion | Option 1: Markdown-First | Option 2: Doc Generator | Option 3: DSL
Markdown alignment
Implementation cost
Documentation quality
Tool ecosystem
Maintainability
User-friendliness

Recommendation Summary

  1. Immediate: Implement naming convention and versioning
  2. Short-term: Choose Option 1 (Markdown-First) for schema format
  3. Fallback: If Option 1 proves too complex, use Option 2 (Doc Generator)
  4. Future: Evaluate DSL if community demand emerges

Next Steps

  1. Review and approve this proposal
  2. Create naming convention specification
  3. Prototype markdown schema parser
  4. Migrate one schema as proof-of-concept
  5. Gather feedback and iterate
  6. Full migration of all schemas

Appendix: Example Markdown Schema

See examples/schemas/manpage-schema-v1.md for a complete example of the proposed format.