markitect-main/docs/user-guides/SCHEMA_REFINEMENT_TOOLS.md
tegwick da34303057 docs: add comprehensive Phase 2 documentation and mark completion
Created detailed user guide for schema refinement tools:
- Command reference for schema-analyze and schema-refine
- Complete options and examples
- Issue type explanations with before/after examples
- Workflow guides (basic, interactive, CI/CD, migration)
- Best practices and troubleshooting
- Integration examples (Git hooks, Makefile, Python)
- Rigidity score interpretation table

Updated TODO.md to mark Phase 2 completion:
- Documented all delivered features
- Listed key capabilities (rigidity detection, auto-refine, interactive mode)
- Noted test coverage (33 tests, 100% passing)
- Added example results (60/100 → 24/100 rigidity reduction)

Phase 2 is now complete and fully documented.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 21:35:24 +01:00

Schema Refinement Tools - User Guide

Overview

MarkiTect Phase 2 introduces schema refinement tools that analyze and improve the JSON schemas used for markdown validation. The tools detect rigidity issues, such as exact item counts and overly specific numbers, and can automatically apply fixes that make schemas more flexible and reusable.

Quick Start

# Analyze a schema for rigidity issues
markitect schema-analyze examples/manpages/markdown-manpage-schema.json

# Refine a schema automatically
markitect schema-refine examples/manpages/markdown-manpage-schema.json --output refined-schema.json

# Review each fix interactively
markitect schema-refine examples/manpages/markdown-manpage-schema.json --interactive

Commands

schema-analyze

Analyzes a JSON schema to detect rigidity issues and calculate a rigidity score (0-100).

Usage

markitect schema-analyze <schema-file> [OPTIONS]

Options

  • --verbose, -v: Show detailed analysis with current and suggested values

Examples

# Basic analysis
markitect schema-analyze schema.json

# Verbose output with details
markitect schema-analyze schema.json --verbose

Output

The analyzer provides:

  • Rigidity Score (0-100): Higher scores indicate more rigid schemas

    • 0-40: LOW - Flexible, good design
    • 41-70: MEDIUM - Some rigidity detected
    • 71-100: HIGH - Very rigid, needs refinement
  • Phase 1 Features: Checks for classification system and content control

  • Issue Count: Breakdown by severity (Errors, Warnings, Info)

  • Detected Issues: List of problems with suggestions

Exit Codes

  • 0: Schema is flexible (score ≤ 50)
  • 1: Schema is rigid (score > 50)
  • 2: Error occurred
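
When the analyzer is called from a script, the exit codes above can be turned into a readable status with a small lookup. This is a minimal sketch; the function name `describe_analyze_exit` is hypothetical and not part of MarkiTect.

```python
def describe_analyze_exit(returncode: int) -> str:
    """Map a schema-analyze exit code to the meaning documented above."""
    meanings = {
        0: "flexible",  # score <= 50
        1: "rigid",     # score > 50
        2: "error",     # the analysis itself failed
    }
    # Any other code is undocumented, so report it instead of guessing.
    return meanings.get(returncode, f"unknown exit code {returncode}")
```

Keeping "rigid" and "error" apart matters in automation: a rigid schema calls for refinement, while an error usually points to a missing file or invalid JSON.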

schema-refine

Automatically refines rigid schemas by applying fixes for detected issues.

Usage

markitect schema-refine <schema-file> [OPTIONS]

Options

  • --output, -o PATH: Output file (default: overwrite input file)
  • --loosen-counts: Convert exact counts to flexible ranges (default: enabled)
  • --no-loosen-counts: Disable count loosening
  • --round-numbers: Round overly specific numbers (default: enabled)
  • --no-round-numbers: Disable number rounding
  • --migrate-deprecated: Document deprecated extensions (default: disabled)
  • --dry-run: Show changes without applying them
  • --interactive, -i: Prompt for each refinement interactively
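
For scripted use, the options above can be assembled into an argument list before being handed to `subprocess.run`. The sketch below uses only the flags listed in this guide; the helper name `build_refine_command` and its parameter names are hypothetical.

```python
from typing import List, Optional


def build_refine_command(
    schema: str,
    output: Optional[str] = None,
    loosen_counts: bool = True,
    round_numbers: bool = True,
    migrate_deprecated: bool = False,
    dry_run: bool = False,
    interactive: bool = False,
) -> List[str]:
    """Build a schema-refine command line from keyword options."""
    cmd = ["markitect", "schema-refine", schema]
    if output is not None:
        cmd += ["--output", output]
    # Loosening and rounding are on by default, so only the
    # negative flags need to be passed explicitly.
    if not loosen_counts:
        cmd.append("--no-loosen-counts")
    if not round_numbers:
        cmd.append("--no-round-numbers")
    if migrate_deprecated:
        cmd.append("--migrate-deprecated")
    if dry_run:
        cmd.append("--dry-run")
    if interactive:
        cmd.append("--interactive")
    return cmd
```

The defaults in the signature mirror the documented defaults, so calling the helper with only a schema path produces the plain in-place invocation.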

Examples

# Refine schema in place
markitect schema-refine schema.json

# Preview changes without applying
markitect schema-refine schema.json --dry-run

# Save refined schema to new file
markitect schema-refine schema.json --output refined-schema.json

# Review each fix interactively
markitect schema-refine schema.json --interactive

# Disable specific refinements
markitect schema-refine schema.json --no-loosen-counts

Refinement Actions

The refiner automatically applies these fixes:

  1. Exact Count Loosening: Converts exact counts to flexible ranges

    • Before: "minItems": 5, "maxItems": 5
    • After: "minItems": 3, "maxItems": 10
  2. Const Value Conversion: Replaces exact value constraints with ranges

    • Before: "const": 1
    • After: "minimum": 0, "maximum": 2
  3. Number Rounding: Rounds overly specific numbers

    • Before: "minItems": 73
    • After: "minItems": 70
  4. Range Widening: Expands narrow integer ranges

    • Before: "minimum": 5, "maximum": 6
    • After: "minimum": 0, "maximum": 11

Exit Codes

  • 0: Success with changes applied
  • 1: Success but no changes needed
  • 2: Error occurred
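
Note that exit code 1 is not a failure here, so a check such as `returncode == 0` would misreport a schema that simply needed no changes. A small sketch of how a caller might encode this; the names `RefineExit` and `refine_succeeded` are hypothetical.

```python
from enum import IntEnum


class RefineExit(IntEnum):
    """Exit codes of schema-refine as documented above."""
    CHANGED = 0    # success, changes applied
    UNCHANGED = 1  # success, no changes needed
    ERROR = 2      # an error occurred


def refine_succeeded(returncode: int) -> bool:
    """Treat both 'changed' and 'unchanged' as success."""
    return returncode in (RefineExit.CHANGED, RefineExit.UNCHANGED)
```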

Issue Types

Exact Count (WARNING)

Problem: The schema requires an exact number of items (minItems equals maxItems), leaving no flexibility.

Example:

{
  "type": "array",
  "minItems": 5,
  "maxItems": 5
}

Fix: Convert to a range

{
  "type": "array",
  "minItems": 3,
  "maxItems": 10
}
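
To illustrate what this check looks for, the sketch below walks a schema and reports every node where `minItems` equals `maxItems`. It is an illustration of the idea, not MarkiTect's implementation; the function name and the dotted path format are assumptions.

```python
from typing import Any, Iterator, Tuple


def find_exact_counts(node: Any, path: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (path, count) for every schema node where minItems == maxItems."""
    if isinstance(node, dict):
        lo, hi = node.get("minItems"), node.get("maxItems")
        if isinstance(lo, int) and lo == hi:
            yield (path or "<root>", lo)
        # Recurse into nested subschemas (properties, items, and so on).
        for key, child in node.items():
            yield from find_exact_counts(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_exact_counts(child, f"{path}[{index}]")
```

For a schema whose `properties.tags` array has `minItems` and `maxItems` both set to 5, the generator yields the single pair `("properties.tags", 5)`.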

Const Value (WARNING)

Problem: The property must have one exact value.

Example:

{
  "type": "integer",
  "const": 1
}

Fix: Replace with range for numeric values

{
  "type": "integer",
  "minimum": 0,
  "maximum": 2
}
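
The conversion in this example can be sketched as replacing an integer `const` with a range of one below to one above the value. The margin of 1 is inferred from this single example and is an assumption; MarkiTect may choose its ranges differently. The function name `const_to_range` is hypothetical, and the sketch handles integers only.

```python
def const_to_range(schema: dict, margin: int = 1) -> dict:
    """Return a copy of `schema` with an integer `const` turned into a range."""
    value = schema.get("const")
    # Leave non-integer constants alone. bool is excluded explicitly
    # because it is a subclass of int in Python.
    if not isinstance(value, int) or isinstance(value, bool):
        return dict(schema)
    refined = {key: val for key, val in schema.items() if key != "const"}
    refined["minimum"] = value - margin  # assumed margin, see above
    refined["maximum"] = value + margin
    return refined
```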

Overly Specific Numbers (INFO)

Problem: Numbers are too specific (like 73 instead of 70).

Example:

{
  "type": "array",
  "minItems": 73
}

Fix: Round to nearest 10

{
  "type": "array",
  "minItems": 70
}
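
Rounding to the nearest 10 can be written with integer arithmetic. This is a sketch, not the tool's code; how MarkiTect handles ties (values ending in 5) is not documented, so rounding ties upward is an assumption here.

```python
def round_to_nearest_ten(n: int) -> int:
    """Round a non-negative integer to the nearest multiple of 10.

    Ties (values ending in 5) round up; this tie rule is an assumption.
    """
    return (n + 5) // 10 * 10
```

Explicit arithmetic is used instead of Python's built-in `round(n, -1)` because the built-in rounds ties to the even multiple, which can be surprising in this context.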

No Flexibility (INFO)

Problem: Integer range is too narrow.

Example:

{
  "type": "integer",
  "minimum": 5,
  "maximum": 6
}

Fix: Widen the range

{
  "type": "integer",
  "minimum": 0,
  "maximum": 11
}
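
A check for this issue might compare the width of the integer range against a threshold. The guide does not state what counts as "too narrow", so the default threshold of 1 below is an assumption chosen to match the example; the function and parameter names are hypothetical.

```python
def is_narrow_integer_range(schema: dict, max_width: int = 1) -> bool:
    """Return True if minimum and maximum are at most `max_width` apart."""
    lo, hi = schema.get("minimum"), schema.get("maximum")
    # Both bounds must be present and integral to judge the width.
    if not (isinstance(lo, int) and isinstance(hi, int)):
        return False
    return hi - lo <= max_width
```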

Missing Classifications (INFO)

Problem: Schema doesn't use the Phase 1 classification system.

Suggestion: Add x-markitect-sections to classify sections as required/recommended/optional/discouraged/improper.

Missing Content Control (INFO)

Problem: Schema lacks content validation patterns and quality metrics.

Suggestion: Add x-markitect-content-control for pattern validation and quality requirements.

Deprecated Extensions (WARNING)

Problem: Schema uses old extension format.

Example: x-markitect-required-sections

Suggestion: Migrate to x-markitect-sections with classification system.
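
The three extension-related checks above come down to looking for specific keys in the schema. The sketch below searches the whole document because this guide does not say where the extension keys must appear; that choice, and the names `contains_key` and `extension_report`, are assumptions.

```python
from typing import Any, Dict, List

PHASE1_KEYS = ["x-markitect-sections", "x-markitect-content-control"]
# Only one deprecated extension is named in this guide.
DEPRECATED_KEYS = ["x-markitect-required-sections"]


def contains_key(node: Any, key: str) -> bool:
    """Return True if `key` appears anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(v, key) for v in node)
    return False


def extension_report(schema: dict) -> Dict[str, List[str]]:
    """List missing Phase 1 extensions and deprecated ones still in use."""
    return {
        "missing": [k for k in PHASE1_KEYS if not contains_key(schema, k)],
        "deprecated": [k for k in DEPRECATED_KEYS if contains_key(schema, k)],
    }
```

A schema in the old format would typically show up with both Phase 1 keys under "missing" and the old key under "deprecated".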

Workflows

Basic Workflow: Analyze and Refine

  1. Analyze your schema to understand issues:

    markitect schema-analyze my-schema.json --verbose
    
  2. Preview refinements before applying:

    markitect schema-refine my-schema.json --dry-run
    
  3. Apply refinements:

    markitect schema-refine my-schema.json --output my-schema-refined.json
    
  4. Verify improvements:

    markitect schema-analyze my-schema-refined.json
    

Interactive Workflow

For fine-grained control, use interactive mode:

markitect schema-refine my-schema.json --interactive

The tool will:

  1. Show each detected issue
  2. Display current and suggested values
  3. Prompt for confirmation (y/N/q)
  4. Apply only approved fixes

Example session:

Issue 1/4
  Type: exact_count
  Path: properties.headings.level_1
  Array 'level_1' requires exactly 1 items
  Suggestion: Use a range like minItems: 0, maxItems: 6
  Current: {"minItems": 1, "maxItems": 1}
  Suggested: {"minItems": 0, "maxItems": 6}

Apply this fix? [y/N/q]: y
  ✓ Applied
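
The confirmation loop can be sketched as follows. The answers are passed in as a sequence so the logic can be tested without a terminal. Two behaviors are assumptions: that any answer other than `y` counts as "no" (suggested by the capital N in the prompt), and that `q` stops the review while keeping fixes already approved. The function name `review_fixes` is hypothetical.

```python
from typing import Iterable, List


def review_fixes(issues: List[dict], answers: Iterable[str]) -> List[dict]:
    """Return the issues whose fixes were approved.

    `answers` stands in for keyboard input, one entry per prompt.
    """
    approved = []
    for issue, answer in zip(issues, answers):
        choice = answer.strip().lower()
        if choice == "q":
            break  # quit: remaining issues are left untouched
        if choice == "y":
            approved.append(issue)
        # Anything else, including empty input, is treated as "N".
    return approved
```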

CI/CD Integration

Use exit codes to enforce schema quality in your pipeline:

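#!/bin/bash

# Analyze the schema and capture the exit code
markitect schema-analyze schema.json
status=$?

if [ "$status" -eq 1 ]; then
    # Exit code 1: rigidity score above 50
    echo "Schema is too rigid (score > 50)"
    echo "Run: markitect schema-refine schema.json"
    exit 1
elif [ "$status" -ne 0 ]; then
    # Exit code 2: the analysis itself failed (missing file, invalid JSON, ...)
    echo "Schema analysis failed (exit code $status)"
    exit "$status"
fi

echo "Schema quality check passed"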

Schema Migration Workflow

To migrate a schema from the old extension format to the Phase 1 format:

  1. Analyze to identify deprecated extensions:

    markitect schema-analyze old-schema.json
    
  2. Document deprecated extensions:

    markitect schema-refine old-schema.json --migrate-deprecated
    
  3. Manually migrate to new format (automatic migration not implemented due to complexity)

Best Practices

When to Use schema-analyze

  • Before committing schemas to version control
  • During code review to ensure quality
  • When creating new schemas from examples
  • To understand why a schema fails validation

When to Use schema-refine

  • After auto-generating schemas from documents
  • When inheriting legacy schemas
  • To quickly fix common rigidity issues
  • Before publishing schemas for reuse

When to Use --interactive

  • When you need fine-grained control
  • For schemas with domain-specific requirements
  • When learning about schema design
  • To review fixes before applying

For most use cases:

# Balanced refinement (default)
markitect schema-refine schema.json

# Conservative (preserve more constraints)
markitect schema-refine schema.json --no-round-numbers

# Maximum flexibility (both flags are enabled by default, so this matches the balanced form)
markitect schema-refine schema.json --loosen-counts --round-numbers

Understanding Rigidity Scores

The rigidity score is calculated by weighting detected issues:

Issue Type                 Weight
Exact Count                15
Overly Specific            10
No Flexibility             8
Missing Classifications    5
Deprecated Extensions      5
Missing Content Control    3

Score Interpretation:

  • 0-20: Excellent - Well-designed, flexible schema
  • 21-40: Good - Minor improvements possible
  • 41-60: Fair - Moderate rigidity, refinement recommended
  • 61-80: Poor - Significant rigidity, refinement needed
  • 81-100: Very Poor - Highly rigid, manual review recommended
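
As a worked example of how the weights and the interpretation bands fit together, the sketch below sums weight times issue count and caps the result at 100. The guide only says that issues are weighted, so this aggregation is an assumption, as are the dictionary keys (only `exact_count` appears in the example session) and the function names. Const Value is omitted because the table lists no weight for it.

```python
WEIGHTS = {
    "exact_count": 15,
    "overly_specific": 10,
    "no_flexibility": 8,
    "missing_classifications": 5,
    "deprecated_extensions": 5,
    "missing_content_control": 3,
}


def rigidity_score(issue_counts: dict) -> int:
    """Sum weight * count per issue type, capped at 100 (assumed formula)."""
    total = sum(WEIGHTS[name] * count for name, count in issue_counts.items())
    return min(total, 100)


def interpret(score: int) -> str:
    """Map a score to the interpretation bands listed above."""
    for upper, label in [(20, "Excellent"), (40, "Good"), (60, "Fair"), (80, "Poor")]:
        if score <= upper:
            return label
    return "Very Poor"
```

Under these assumptions, a schema with two exact counts, one overly specific number, and both Phase 1 features missing would score 2 × 15 + 10 + 5 + 3 = 48, which falls in the "Fair" band.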

Integration Examples

Git Pre-commit Hook

#!/bin/bash
# .git/hooks/pre-commit

# Check every staged JSON file; reading line by line keeps paths with spaces intact
while IFS= read -r schema; do
    markitect schema-analyze "$schema" > /dev/null 2>&1
    # Exit code 1 means the rigidity score is above 50
    if [ $? -eq 1 ]; then
        echo "Error: $schema is too rigid"
        echo "Run: markitect schema-refine $schema"
        exit 1
    fi
done < <(git diff --cached --name-only --diff-filter=ACM | grep '\.json$')

Makefile Target

# Note: recipe lines must be indented with a tab character, not spaces
.PHONY: check-schemas
check-schemas:
	@for schema in schemas/*.json; do \
		echo "Checking $$schema..."; \
		markitect schema-analyze "$$schema" || exit 1; \
	done

.PHONY: refine-schemas
refine-schemas:
	@for schema in schemas/*.json; do \
		echo "Refining $$schema..."; \
		markitect schema-refine "$$schema"; \
	done

Python Integration

import subprocess

def analyze_schema(schema_path):
    """Analyze a schema and return its rigidity score, or None if no score is found."""
    result = subprocess.run(
        ["markitect", "schema-analyze", schema_path],
        capture_output=True,
        text=True
    )

    # Parse the score from the "Rigidity Score:" line of the output
    for line in result.stdout.splitlines():
        if 'Rigidity Score:' in line:
            return int(line.split(':')[1].split('/')[0].strip())
    return None

def refine_schema(schema_path, output_path):
    """Refine a schema and save to output path."""
    result = subprocess.run(
        ["markitect", "schema-refine", schema_path, "-o", output_path],
        capture_output=True,
        text=True
    )
    # Exit codes 0 (changes applied) and 1 (no changes needed) both mean success
    return result.returncode in (0, 1)

# Usage
score = analyze_schema("schema.json")
if score is None:
    # No score line in the output, e.g. because the analysis failed
    print("Could not determine rigidity score")
elif score > 50:
    print(f"Schema is rigid (score: {score})")
    refine_schema("schema.json", "schema-refined.json")

Troubleshooting

Schema Not Found

Error: Error: Schema file not found: schema.json

Solution: Check file path and ensure file exists.

Invalid JSON

Error: Error: Invalid JSON in schema file

Solution: Validate JSON syntax using jsonlint or similar tool.

No Changes Applied

Output: No refinements needed - schema is already flexible

Reason: The schema has no detectable rigidity issues, or its rigidity score is 50 or lower (the threshold schema-analyze uses for a flexible schema).

Action: Run markitect schema-analyze with --verbose to see all issues, including INFO-level ones.

Refinement Broke Schema

Problem: Refined schema is too permissive.

Solution:

  1. Use --interactive to selectively apply fixes
  2. Use --no-loosen-counts or --no-round-numbers to preserve constraints
  3. Manually adjust ranges after refinement

See Also

Support

For issues, questions, or feature requests: