docs: add comprehensive LLM integration gameplan for issues #98 & #99

- Created detailed implementation strategy for OpenRoute integration (issue #98) - Designed auto-fill templates system with LLM assistance (issue #99) - Analyzed existing infrastructure and identified reusable components - Provided 4-6 week phased development plan with clear priorities - Included technical architecture, database schemas, and testing strategy - Added risk assessment, success metrics, and requirements engineering guidance - Recommended starting with OpenRoute client as shared foundation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-04 00:16:17 +02:00
parent 5143864a86
commit f63101cad8
1 changed files with 366 additions and 0 deletions
--- a/todo/LLM_INTEGRATION_GAMEPLAN.md
+++ b/todo/LLM_INTEGRATION_GAMEPLAN.md
@@ -0,0 +1,366 @@
+# LLM Integration Gameplan - Issues #98 & #99
+
+**Date**: 2025-10-03
+**Status**: REQUIREMENTS ANALYSIS
+**Priority**: HIGH
+**Estimated Effort**: 4-6 weeks development
+
+## 🎯 Executive Summary
+
+Two complementary features that will transform MarkiTect from a content management system into an **AI-powered knowledge assistant**:
+
+- **Issue #98**: OpenRoute Integration - Enable LLM queries against MarkiTect content
+- **Issue #99**: Auto Fill Templates - LLM-powered interactive template completion
+
+## 📋 Current State Analysis
+
+### ✅ Existing Infrastructure (Ready to Leverage)
+- **Template System**: Full template engine with parsing and rendering (`markitect/template/`)
+- **Configuration Manager**: Extensible config system with CLI integration
+- **Query Paradigms**: Natural Language paradigm exists (documented only)
+- **CLI Framework**: Click-based with established patterns
+- **Database**: SQLite with full metadata and content indexing
+- **FTS Search**: Full text search capabilities for content discovery
+
+### 🏗️ Infrastructure Gaps (Need Development)
+- **LLM Client**: No OpenRouter integration exists
+- **Profile System**: No user profile management
+- **Interactive UI**: No terminal questionnaire system
+- **Context Building**: No intelligent content selection for LLM queries
+
+## 🚀 Issue #98: OpenRoute Integration
+
+### Requirements Analysis
+```yaml
+Goal: "Use MarkiTect ingested content as context for interacting with LLMs flexibly and conveniently"
+User Story: "As a user, I want to ask natural language questions about my content and get intelligent responses with source citations"
+Integration: "Allow users to connect with an existing OpenRouter account"
+```
+
+### Technical Implementation Plan
+
+#### Phase 1: Core LLM Infrastructure (Week 1)
+1. **OpenRouter Client Development**
+   ```python
+   # markitect/llm/openrouter_client.py
+   class OpenRouterClient:
+       - API key management
+       - Model selection (GPT-4, Claude, etc.)
+       - Request/response handling
+       - Rate limiting and error handling
+       - Cost tracking
+   ```
+
+2. **Configuration Integration**
+   ```bash
+   markitect config-set openrouter.api_key sk-or-...
+   markitect config-set openrouter.default_model openai/gpt-4-turbo
+   markitect config-show --show-sensitive  # Show API keys
+   ```
+
+3. **Basic CLI Commands**
+   ```bash
+   markitect llm test                    # Test OpenRouter connection
+   markitect llm models                  # List available models
+   markitect llm ask "Simple question"   # Basic LLM interaction
+   ```
+
+#### Phase 2: Content Context Integration (Week 2)
+4. **Context Builder System**
+   ```python
+   # markitect/llm/context_builder.py
+   class ContextBuilder:
+       - Extract relevant content from database
+       - Use FTS search for content discovery
+       - Build context within token limits
+       - Include metadata and relationships
+   ```
+
+5. **Enhanced Natural Language Paradigm**
+   ```python
+   # Update markitect/query_paradigms/paradigms/natural_language_paradigm.py
+   class NaturalLanguageQueryParadigm:
+       - Integrate OpenRouter for real LLM processing
+       - Build context from MarkiTect content
+       - Return structured responses with citations
+   ```
+
+6. **Advanced CLI Integration**
+   ```bash
+   markitect paradigms exec "Natural Language" "What are the main API concepts?"
+   markitect llm chat                                    # Interactive mode
+   markitect llm ask "Summarize docs tagged tutorial"   # Filtered context
+   ```
+
+#### Phase 3: Advanced Features (Week 3)
+7. **Smart Context Selection**
+   - Relevance scoring for content inclusion
+   - Context size optimization
+   - Source citation tracking
+
+8. **Response Enhancement**
+   - Markdown formatting
+   - Source links back to MarkiTect files
+   - Follow-up question suggestions
+
+### Success Criteria
+- ✅ OpenRouter integration working with API key configuration
+- ✅ Natural language queries return relevant, contextualized responses
+- ✅ Responses include source citations linking to MarkiTect files
+- ✅ Context building intelligently selects relevant content
+- ✅ CLI commands integrated with existing paradigm system
+
+## 📝 Issue #99: Auto Fill Templates
+
+### Requirements Analysis
+```yaml
+Goal: "Use Markdown Templates to capture data with terminal questionnaire and LLM auto-fill"
+User Story: "As a user, I want to fill templates interactively, with the system auto-completing fields based on my profile"
+LLM Integration: "Provided the user has a profile, an LLM should autofill based on the profile provided"
+```
+
+### Technical Implementation Plan
+
+#### Phase 1: Enhanced Template System (Week 1)
+1. **Template Field Analysis**
+   ```python
+   # markitect/template/field_analyzer.py
+   class TemplateFieldAnalyzer:
+       - Parse template annotations: {{name:string:Your full name}}
+       - Extract field types, descriptions, validation rules
+       - Identify required vs optional fields
+       - Support nested field structures
+   ```
+
+2. **Interactive Questionnaire Engine**
+   ```python
+   # markitect/template/questionnaire.py
+   class TemplateQuestionnaire:
+       - Terminal-based interactive data collection
+       - Support input types: text, choice, date, number, boolean
+       - Field validation and re-prompting
+       - Progress tracking and partial save
+   ```
+
+3. **Basic CLI Commands**
+   ```bash
+   markitect template-fill template.md            # Interactive questionnaire
+   markitect template-analyze template.md         # Show template fields
+   markitect template-validate template.md        # Validate template syntax
+   ```
+
+#### Phase 2: User Profile System (Week 2)
+4. **Profile Management**
+   ```python
+   # markitect/profile/manager.py
+   class ProfileManager:
+       - Create, read, update, delete profiles
+       - Support multiple profiles (personal, work, etc.)
+       - Profile inheritance and templates
+       - Database storage integration
+   ```
+
+5. **Profile Schema System**
+   ```python
+   # markitect/profile/schema.py
+   - Standard profile fields (personal, professional, technical)
+   - Custom field extensions
+   - JSON Schema validation
+   - Field type definitions and constraints
+   ```
+
+6. **Profile CLI Commands**
+   ```bash
+   markitect profile create personal
+   markitect profile set personal.name "John Doe"
+   markitect profile set personal.email "john@example.com"
+   markitect profile show personal
+   markitect profile list
+   markitect profile export personal profile.json
+   ```
+
+#### Phase 3: LLM-Powered Auto-Fill (Week 3)
+7. **Smart Field Completion**
+   ```python
+   # markitect/template/auto_filler.py
+   class LLMAutoFiller:
+       - Use OpenRouter LLM for field suggestions
+       - Context-aware completions based on template purpose
+       - Profile-informed field values
+       - Learning from user corrections
+   ```
+
+8. **Advanced Template Fill Modes**
+   ```bash
+   markitect template-fill template.md --auto              # Auto-fill from profile
+   markitect template-fill template.md --guided            # Mix auto + questions
+   markitect template-fill template.md --profile=work      # Use specific profile
+   markitect template-fill template.md --learn             # Learn from corrections
+   ```
+
+#### Phase 4: Advanced Features (Week 4)
+9. **Field Intelligence**
+   - Template field learning and preferences
+   - Content generation for complex fields
+   - Multi-step form workflows
+   - Field dependencies and conditional logic
+
+10. **Integration Features**
+    - Template field suggestions based on existing content
+    - Auto-population from MarkiTect database
+    - Template version control and updates
+
+### Success Criteria
+- ✅ Interactive terminal questionnaire for template completion
+- ✅ User profile system with multiple profile support
+- ✅ LLM-powered auto-fill suggestions based on user profile
+- ✅ Enhanced template parser supporting field metadata
+- ✅ Seamless integration with existing template rendering system
+
+## 🔗 Shared Infrastructure Requirements
+
+### Database Schema Extensions
+```sql
+-- User profiles table
+CREATE TABLE user_profiles (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL UNIQUE,
+    data JSON NOT NULL,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+-- LLM interaction logs (optional)
+CREATE TABLE llm_interactions (
+    id INTEGER PRIMARY KEY,
+    query TEXT NOT NULL,
+    response TEXT NOT NULL,
+    model TEXT NOT NULL,
+    tokens_used INTEGER,
+    cost REAL,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+-- Template usage history
+CREATE TABLE template_usage (
+    id INTEGER PRIMARY KEY,
+    template_path TEXT NOT NULL,
+    field_data JSON NOT NULL,
+    profile_used TEXT,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+```
+
+### Configuration Extensions
+```yaml
+# .markitect.yml additions
+openrouter:
+  api_key: "sk-or-..."
+  default_model: "openai/gpt-4-turbo"
+  max_tokens: 4096
+  temperature: 0.7
+
+profiles:
+  default_profile: "personal"
+  auto_save: true
+
+templates:
+  auto_fill_mode: "guided"  # auto, interactive, guided
+  learn_from_corrections: true
+```
+
+## 📊 Implementation Priority Matrix
+
+| Component | Issue | Priority | Effort | Dependencies |
+|-----------|-------|----------|--------|--------------|
+| OpenRouter Client | #98 | HIGH | 2 days | Config system |
+| Context Builder | #98 | HIGH | 3 days | FTS, Database |
+| Profile Manager | #99 | HIGH | 2 days | Database |
+| Template Field Parser | #99 | HIGH | 3 days | Template system |
+| Interactive Questionnaire | #99 | MEDIUM | 4 days | Profile system |
+| LLM Auto-Fill | #99 | MEDIUM | 4 days | OpenRouter, Profiles |
+| Natural Language Enhancement | #98 | MEDIUM | 2 days | OpenRouter, Context |
+| Advanced Context | #98 | LOW | 3 days | Basic LLM working |
+
+## 🧪 Testing Strategy
+
+### Unit Tests Required
+- OpenRouter client error handling and retries
+- Template field parsing and validation
+- Profile CRUD operations
+- Context building with different content types
+- LLM response formatting and citation extraction
+
+### Integration Tests Required
+- End-to-end template filling workflow
+- Natural language queries with MarkiTect context
+- Profile-based auto-fill accuracy
+- CLI command integration
+
+### Manual Testing Scenarios
+1. **OpenRouter Setup**: User configures API key and tests connection
+2. **Template Creation**: User creates template with various field types
+3. **Profile Management**: User creates and manages multiple profiles
+4. **Interactive Fill**: User completes template via questionnaire
+5. **Auto-Fill**: System suggests field values based on profile
+6. **LLM Queries**: User asks questions about their content
+7. **Context Accuracy**: Verify LLM responses cite correct sources
+
+## 🎯 Success Metrics & KPIs
+
+### Quantitative Metrics
+- **Template Completion Time**: Reduce by 60% with auto-fill
+- **Query Response Accuracy**: >90% relevant context inclusion
+- **User Satisfaction**: >8/10 rating for LLM responses
+- **Profile Usage**: >75% of template fills use profile data
+
+### Qualitative Metrics
+- **User Experience**: Seamless workflow integration
+- **Content Discovery**: Users find value in LLM-powered content exploration
+- **Productivity**: Templates become preferred method for document creation
+- **Accuracy**: LLM suggestions match user intent and context
+
+## 🚧 Risk Assessment & Mitigation
+
+### Technical Risks
+1. **OpenRouter API Changes**: Mitigate with versioned API client and error handling
+2. **Token Limits**: Implement intelligent context truncation and chunking
+3. **LLM Response Quality**: Add response validation and fallback mechanisms
+4. **Performance**: Cache common queries and optimize context building
+
+### User Experience Risks
+1. **Complex Configuration**: Provide setup wizard and clear documentation
+2. **Learning Curve**: Include examples and guided tutorials
+3. **Profile Privacy**: Implement secure storage and optional features
+4. **Cost Concerns**: Add usage tracking and budget controls
+
+## 📝 Requirements Engineering Notes
+
+**FOR REQUIREMENTS ENGINEER:**
+
+1. **User Research Needed**:
+   - Survey existing MarkiTect users about LLM integration preferences
+   - Gather template usage patterns and pain points
+   - Validate profile data schema with target users
+
+2. **Technical Validation Required**:
+   - Verify OpenRouter API capabilities and limitations
+   - Test LLM response quality with MarkiTect content types
+   - Validate template field parsing edge cases
+
+3. **Feature Prioritization**:
+   - Consider implementing #98 first for immediate value
+   - #99 can follow as enhanced template experience
+   - Both share OpenRouter infrastructure investment
+
+4. **Alternative Approaches**:
+   - Consider other LLM providers (Anthropic direct, Azure OpenAI)
+   - Evaluate local LLM options for privacy-conscious users
+   - Template auto-fill could work without LLM (rule-based initially)
+
+5. **Integration Points**:
+   - Leverage existing Query Paradigm system for #98
+   - Build on solid template foundation for #99
+   - Utilize configuration manager for seamless setup
+
+**RECOMMENDATION**: Proceed with implementation in phases, starting with OpenRouter client and basic LLM integration for #98, then expanding to template auto-fill for #99. The shared infrastructure investment will benefit both features significantly.