# MarkiTect Schema Generation Capability Outline - GAMEPLAN

## 🎯 Mission: Transform MarkiTect from Static Analysis to Dynamic Generation

**Parent Issue**: [#46 - Schema generation capability outline](http://gitea.coulomb.social/coulomb/markitect_project/issues/46)

**Vision**: Enable users to generate document variations from example documents through schema-driven templates with content instructions and data automation.

---

## 📋 Issue Breakdown & Implementation Order

### **🏗️ Phase 1: Foundation (HIGH PRIORITY)**

#### Issue #50: Define metaschema for JSON schema structure
- **Priority**: High
- **Status**: Ready to start
- **Dependencies**: Current schema generation (Issue #5), JSON Schema validation (Issue #7)
- **Goal**: Create a JSON Schema specification that extends standard JSON Schema with MarkiTect-specific features
- **Key Features**:
  - Heading text capture support
  - Content field instruction support
  - Outline structure representation
  - Backward compatibility with existing schemas
- **Start Command**: `make tdd-start NUM=50`

---

### **🔧 Phase 2: Core Features (HIGH-MEDIUM PRIORITY)**

#### Issue #51: Add outline mode to schema generation
- **Priority**: High
- **Dependencies**: Metaschema definition (Issue #50)
- **Goal**: `markitect schema-generate --mode outline --depth 3 --outfile invoice.json example.md`
- **Key Features**:
  - New `--mode outline` option
  - `--depth` parameter to control how many heading levels are captured
  - Schema title: "Schema from example.md" (not "Schema for example.md")
  - Actual heading text capture

#### Issue #52: Capture actual heading text in schemas
- **Priority**: Medium
- **Dependencies**: Metaschema (Issue #50), Current schema generation (Issue #5)
- **Goal**: Preserve exact heading text in schemas for validation
- **Key Features**:
  - Store heading text alongside structure
  - Enable heading text validation
  - Meaningful error messages for mismatches

---

### **📝 Phase 3: Content Instructions (MEDIUM PRIORITY)**

#### Issue #54: Add content field instruction capabilities
- **Priority**: Medium
- **Dependencies**: Metaschema (Issue #50), Heading text capture (Issue #52)
- **Goal**: Include guidance for content authors in schemas
- **Key Features**:
  - Instructions for each section/content area
  - Support for different content types
  - Optional/required instruction flags
  - CLI support for adding instructions

---

### **🚀 Phase 4: Generation Pipeline (MEDIUM PRIORITY)**

#### Issue #55: Schema-based draft generation
- **Priority**: Medium
- **Dependencies**: All previous issues, Current stub generation (Issue #6)
- **Goal**: Generate document templates from schemas with instructions
- **Key Features**:
  - New CLI command for draft generation
  - Proper heading hierarchy from schema
  - Content instruction placeholders
  - Schema reference for future validation

---

### **🤖 Phase 5: Data Automation (LOW PRIORITY)**

#### Issue #56: Data-driven multiple draft generation
- **Priority**: Low
- **Dependencies**: Schema-based draft generation (Issue #55)
- **Goal**: Batch document generation from data sources
- **Key Features**:
  - Multiple data formats (JSON, CSV)
  - Field mapping from data to schema
  - Batch generation capabilities
  - Data validation against schema

---

## 🛣️ Complete User Workflow (Target State)

```bash
# 1. Generate schema from example document
markitect schema-generate --mode outline --depth 3 --outfile requirements_schema.json example_requirements.md

# 2. Tune the schema (manual editing)
#    - Remove overly specific elements
#    - Add content instructions
#    - Refine outline structure

# 3. Generate drafts from schema
markitect generate-draft requirements_schema.json --outfile new_requirements.md

# 4. Data-driven batch generation (future)
markitect generate-batch requirements_schema.json --data projects.csv --output-dir ./generated/

# 5. Validate generated documents
markitect validate new_requirements.md requirements_schema.json
```

---

## 🎯 Implementation Strategy

### **Foundation-First Approach**
1. **Start with Issue #50**: the metaschema is a prerequisite for everything else
2. **Parallel development** is possible for Issues #51 and #52 once #50 is done
3. **Sequential dependencies** apply to Issues #54, #55, and #56

### **TDD Workflow Integration**
- Use `make tdd-start NUM=X` for each issue
- Write tests first, implement features second
- Maintain backward compatibility throughout

### **Testing Strategy**
- Comprehensive test coverage for each issue
- Integration tests for the end-to-end workflow
- Performance testing for batch generation

### **Documentation Requirements**
- CLI help updates for new options
- User guide for the complete workflow
- API documentation for new schema features

---

## 📊 Success Metrics

### **Phase 1 Success**: Metaschema Defined
- ✅ Extended JSON Schema with MarkiTect features
- ✅ Backward compatibility maintained
- ✅ Validation rules implemented

### **Phase 2 Success**: Outline Mode Working
- ✅ `--mode outline` generates proper schemas
- ✅ Heading text captured accurately
- ✅ Depth control functional

### **Phase 3 Success**: Instructions Integrated
- ✅ Content instructions in schemas
- ✅ Instructions appear in generated drafts
- ✅ Validation includes instruction compliance

### **Phase 4 Success**: Draft Generation
- ✅ Schema-to-document generation working
- ✅ Structured templates with placeholders
- ✅ Round-trip validation (generate → validate)

### **Phase 5 Success**: Data Automation
- ✅ Batch generation from data sources
- ✅ Field mapping functionality
- ✅ Production-ready automation pipeline

---

## 🚦 Current Status

**Active Phase**: Ready to start Phase 1

**Next Action**: `make tdd-start NUM=50`

**Estimated Timeline**: 6-8 development sessions across phases

**Risk Level**: Low (builds on the existing schema generation, JSON Schema validation, and stub generation features)

---

## 📝 Notes
- This gameplan turns Issue #46 from a concept into an implementation roadmap
- Each phase delivers user value incrementally
- The foundation-first approach ensures a stable architecture
- TDD methodology maintains quality throughout development
- End result: a powerful document automation pipeline for MarkiTect users

**Last Updated**: 2025-01-26

**Status**: Active Gameplan
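Illustrative sketch for Issue #50 (metaschema): one way to add MarkiTect features without breaking existing schemas is to carry them in extension keywords, which standard JSON Schema validators generally ignore. The keyword names (`x-markitect-heading`, `x-markitect-level`, `x-markitect-instruction`) and the `check_markitect_extensions` helper are hypothetical assumptions, not a confirmed MarkiTect format. The check shows the backward-compatibility property: a legacy schema produces no problems, while malformed extension values are reported with a JSON-path-style location.

```python
from typing import Any

# Hypothetical extension keywords; the real metaschema (Issue #50) may choose other names.
HEADING = "x-markitect-heading"
LEVEL = "x-markitect-level"
INSTRUCTION = "x-markitect-instruction"


def check_markitect_extensions(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return human-readable problems with MarkiTect extension keywords.

    A schema that uses no extension keywords yields no problems, which is
    the backward-compatibility property the metaschema needs to preserve.
    """
    problems: list[str] = []
    if HEADING in schema and not isinstance(schema[HEADING], str):
        problems.append(f"{path}: {HEADING} must be a string")
    if LEVEL in schema:
        level = schema[LEVEL]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(level, bool) or not isinstance(level, int) or not 1 <= level <= 6:
            problems.append(f"{path}: {LEVEL} must be an integer from 1 to 6")
    if INSTRUCTION in schema and not isinstance(schema[INSTRUCTION], str):
        problems.append(f"{path}: {INSTRUCTION} must be a string")
    # Recurse into nested sections declared under "properties"
    for name, sub in schema.get("properties", {}).items():
        if isinstance(sub, dict):
            problems.extend(check_markitect_extensions(sub, f"{path}.properties.{name}"))
    return problems


legacy = {"type": "object", "properties": {"title": {"type": "string"}}}
extended = {
    "title": "Schema from example.md",
    "type": "object",
    "properties": {
        "overview": {HEADING: "Overview", LEVEL: 2, INSTRUCTION: "Summarize the project."},
        "scope": {HEADING: "Scope", LEVEL: 9},
    },
}
print(check_markitect_extensions(legacy))    # []
print(check_markitect_extensions(extended))  # ['$.properties.scope: x-markitect-level must be an integer from 1 to 6']
```

Using prefixed keywords keeps the extension data inside ordinary JSON Schema documents, so existing validation (Issue #7) can keep working while MarkiTect-specific tooling reads the extra fields.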
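Illustrative sketch for Issue #51 (outline mode): the behavior described for `--mode outline --depth 3` amounts to walking the document's headings, nesting each one under the nearest shallower heading, and skipping anything deeper than the depth limit. The function `outline_schema` and the extension keyword names are hypothetical; only ATX headings (`# Title`) are recognized, and setext headings and closing `#` runs are not handled. The schema title follows the "Schema from example.md" convention from Issue #51.

```python
import re

# Hypothetical sketch: ATX headings only; setext headings and closing "#" runs are not handled.
ATX_HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$")


def outline_schema(markdown: str, source_name: str, depth: int = 3) -> dict:
    """Build a nested outline schema from a document's headings, up to `depth` levels."""
    root = {"title": f"Schema from {source_name}", "type": "object", "properties": {}}
    stack = [(0, root)]  # (heading level, schema node); level 0 is the document root
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # headings inside fenced code are not part of the outline
            continue
        match = None if in_fence else ATX_HEADING.match(line)
        if match is None:
            continue
        level, text = len(match.group(1)), match.group(2)
        if level > depth:
            continue  # the depth limit controls how many heading levels are captured
        while stack[-1][0] >= level:
            stack.pop()  # close sections at the same or a deeper level
        parent = stack[-1][1]["properties"]
        base = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_") or "section"
        key, n = base, 2
        while key in parent:  # keep repeated headings instead of overwriting them
            key, n = f"{base}_{n}", n + 1
        node = {"type": "object", "x-markitect-heading": text,
                "x-markitect-level": level, "properties": {}}
        parent[key] = node
        stack.append((level, node))
    return root


example = "\n".join([
    "# Requirements",
    "## Overview",
    "Some text.",
    "### Goals",
    "#### Too deep for depth 3",
    "~~~",
    "# not a heading (inside a fence)",
    "~~~",
    "## Scope",
])
schema = outline_schema(example, "example.md", depth=3)
print(list(schema["properties"]["requirements"]["properties"]))  # ['overview', 'scope']
```

Storing the exact heading text next to each node is what later makes heading validation (Issue #52) and draft generation (Issue #55) possible from the same schema.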
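Illustrative sketch for Issue #52 ("meaningful error messages for mismatches"): once a schema stores heading text and level, validation can report a missing heading together with the closest existing heading, and separately flag headings that exist at the wrong level. The `heading_errors` helper and the flat `(level, text)` input are hypothetical simplifications; suggestions come from the standard library's `difflib.get_close_matches`.

```python
import difflib
import re

ATX_HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$")


def heading_errors(markdown: str, expected: list[tuple[int, str]]) -> list[str]:
    """Check a document against the (level, text) headings captured in a schema.

    Hypothetical helper: ignores fenced code and setext headings for brevity.
    """
    found = []  # (level, text, line number) for every ATX heading in the document
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = ATX_HEADING.match(line)
        if match:
            found.append((len(match.group(1)), match.group(2), lineno))
    found_texts = [text for _, text, _ in found]

    errors = []
    for level, text in expected:
        hits = [(lvl, lineno) for lvl, t, lineno in found if t == text]
        if not hits:
            # Suggest the closest existing heading so typos are easy to spot
            close = difflib.get_close_matches(text, found_texts, n=1)
            hint = f"; did you mean '{close[0]}'?" if close else ""
            errors.append(f"missing heading '{'#' * level} {text}'{hint}")
        elif all(lvl != level for lvl, _ in hits):
            lvl, lineno = hits[0]
            errors.append(f"line {lineno}: '{text}' is level {lvl}, schema expects level {level}")
    return errors


doc = "# Requirements\n## Overveiw\n### Scope\n"
expected = [(1, "Requirements"), (2, "Overview"), (2, "Scope")]
for message in heading_errors(doc, expected):
    print(message)
```

For the sample document this reports the misspelled "Overview" heading with a suggestion and flags "Scope" as being one level too deep.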
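Illustrative sketch for Issues #54 and #55 (content instructions and draft generation): a draft can be rendered by walking the outline schema in order, emitting each heading at its stored level, turning each instruction into a placeholder, and recording the schema path at the top for later validation. The `render_draft` function, the extension keyword names, and the choice of HTML comments for placeholders are assumptions; HTML comments have the convenient property of staying hidden in most rendered Markdown views.

```python
def render_draft(schema: dict, schema_path: str) -> str:
    """Render a Markdown draft from an outline schema (hypothetical keyword names)."""
    lines = [f"<!-- schema: {schema_path} -->", ""]  # schema reference for later validation

    def walk(node: dict) -> None:
        for section in node.get("properties", {}).values():
            heading = section.get("x-markitect-heading")
            if heading is None:
                continue  # plain JSON Schema property without outline metadata
            lines.append(f"{'#' * section['x-markitect-level']} {heading}")
            lines.append("")
            instruction = section.get("x-markitect-instruction")
            if instruction:
                lines.append(f"<!-- TODO: {instruction} -->")  # placeholder for the author
                lines.append("")
            walk(section)  # child sections follow their parent, preserving hierarchy

    walk(schema)
    return "\n".join(lines).rstrip() + "\n"


schema = {
    "title": "Schema from example_requirements.md",
    "type": "object",
    "properties": {
        "requirements": {
            "x-markitect-heading": "Requirements",
            "x-markitect-level": 1,
            "properties": {
                "overview": {
                    "x-markitect-heading": "Overview",
                    "x-markitect-level": 2,
                    "x-markitect-instruction": "Summarize the project in 2-3 sentences.",
                },
                "scope": {"x-markitect-heading": "Scope", "x-markitect-level": 2},
            },
        }
    },
}
print(render_draft(schema, "requirements_schema.json"))
```

Because the draft keeps the exact heading text and levels from the schema, validating it against the same schema should succeed, which is the round-trip check listed under Phase 4.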
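Illustrative sketch for Issue #56 (data-driven batch generation): field mapping can be modeled as a dictionary from template placeholder to data column, with required fields checked before anything is rendered. The `generate_batch` function, its mapping format, and the `draft_NNN.md` naming are hypothetical; it uses the standard library's `csv.DictReader` and `string.Template`. A JSON source that loads as a list of objects could be passed as `rows` in the same way.

```python
import csv
import io
from collections.abc import Iterable
from string import Template


def generate_batch(template: str, rows: Iterable[dict[str, str]],
                   field_map: dict[str, str], required: set[str]) -> dict[str, str]:
    """Render one draft per data row (hypothetical mapping format).

    field_map maps a template placeholder to a data column. A row missing a
    required field raises ValueError instead of producing a half-filled draft.
    """
    drafts = {}
    for i, row in enumerate(rows, start=1):
        # Missing columns or empty cells become "" so the required check catches them
        values = {field: (row.get(column) or "").strip() for field, column in field_map.items()}
        missing = sorted(f for f in required if not values.get(f))
        if missing:
            raise ValueError(f"row {i}: missing required field(s) {', '.join(missing)}")
        drafts[f"draft_{i:03d}.md"] = Template(template).substitute(values)
    return drafts


template = "# $project Requirements\n\n## Overview\n\nOwner: $owner\n"
data = io.StringIO("name,lead\nApollo,Ada\nHermes,Grace\n")
drafts = generate_batch(template, csv.DictReader(data),
                        field_map={"project": "name", "owner": "lead"},
                        required={"project"})
for filename, text in drafts.items():
    print(filename, text.splitlines()[0])
```

Returning a filename-to-content mapping keeps rendering separate from writing files, which makes the batch step straightforward to test before any output directory is involved.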