markitect-main/history/LLM_INTEGRATION_GAMEPLAN.md

# LLM Integration Gameplan - Issues #98 & #99

**Date**: 2025-10-03
**Status**: REQUIREMENTS ANALYSIS
**Priority**: HIGH
**Estimated Effort**: 4-6 weeks development

## 🎯 Executive Summary

Two complementary features that will transform MarkiTect from a content management system into an **AI-powered knowledge assistant**:

- **Issue #98**: OpenRoute Integration - Enable LLM queries against MarkiTect content
- **Issue #99**: Auto Fill Templates - LLM-powered interactive template completion

## 📋 Current State Analysis

### ✅ Existing Infrastructure (Ready to Leverage)
- **Template System**: Full template engine with parsing and rendering (`markitect/template/`)
- **Configuration Manager**: Extensible config system with CLI integration
- **Query Paradigms**: Natural Language paradigm exists (documented only)
- **CLI Framework**: Click-based with established patterns
- **Database**: SQLite with full metadata and content indexing
- **FTS Search**: Full text search capabilities for content discovery

### 🏗️ Infrastructure Gaps (Need Development)
- **LLM Client**: No OpenRouter integration exists
- **Profile System**: No user profile management
- **Interactive UI**: No terminal questionnaire system
- **Context Building**: No intelligent content selection for LLM queries

## 🚀 Issue #98: OpenRoute Integration

### Requirements Analysis
```yaml
Goal: "Use MarkiTect ingested content as context for interacting with LLMs flexibly and conveniently"
User Story: "As a user, I want to ask natural language questions about my content and get intelligent responses with source citations"
Integration: "Allow users to connect with an existing OpenRouter account"
```

### Technical Implementation Plan

#### Phase 1: Core LLM Infrastructure (Week 1)
1. **OpenRouter Client Development**
   ```python
   # markitect/llm/openrouter_client.py
   class OpenRouterClient:
       - API key management
       - Model selection (GPT-4, Claude, etc.)
       - Request/response handling
       - Rate limiting and error handling
       - Cost tracking
   ```

2. **Configuration Integration**
   ```bash
   markitect config-set openrouter.api_key sk-or-...
   markitect config-set openrouter.default_model openai/gpt-4-turbo
   markitect config-show --show-sensitive  # Show API keys
   ```

3. **Basic CLI Commands**
   ```bash
   markitect llm test                    # Test OpenRouter connection
   markitect llm models                  # List available models
   markitect llm ask "Simple question"   # Basic LLM interaction
   ```

#### Phase 2: Content Context Integration (Week 2)
4. **Context Builder System**
   ```python
   # markitect/llm/context_builder.py
   class ContextBuilder:
       - Extract relevant content from database
       - Use FTS search for content discovery
       - Build context within token limits
       - Include metadata and relationships
   ```

5. **Enhanced Natural Language Paradigm**
   ```python
   # Update markitect/query_paradigms/paradigms/natural_language_paradigm.py
   class NaturalLanguageQueryParadigm:
       - Integrate OpenRouter for real LLM processing
       - Build context from MarkiTect content
       - Return structured responses with citations
   ```

6. **Advanced CLI Integration**
   ```bash
   markitect paradigms exec "Natural Language" "What are the main API concepts?"
   markitect llm chat                                    # Interactive mode
   markitect llm ask "Summarize docs tagged tutorial"   # Filtered context
   ```

#### Phase 3: Advanced Features (Week 3)
7. **Smart Context Selection**
   - Relevance scoring for content inclusion
   - Context size optimization
   - Source citation tracking

8. **Response Enhancement**
   - Markdown formatting
   - Source links back to MarkiTect files
   - Follow-up question suggestions

### Success Criteria
- ✅ OpenRouter integration working with API key configuration
- ✅ Natural language queries return relevant, contextualized responses
- ✅ Responses include source citations linking to MarkiTect files
- ✅ Context building intelligently selects relevant content
- ✅ CLI commands integrated with existing paradigm system

## 📝 Issue #99: Auto Fill Templates

### Requirements Analysis
```yaml
Goal: "Use Markdown Templates to capture data with terminal questionnaire and LLM auto-fill"
User Story: "As a user, I want to fill templates interactively, with the system auto-completing fields based on my profile"
LLM Integration: "Provided the user has a profile, an LLM should autofill based on the profile provided"
```

### Technical Implementation Plan

#### Phase 1: Enhanced Template System (Week 1)
1. **Template Field Analysis**
   ```python
   # markitect/template/field_analyzer.py
   class TemplateFieldAnalyzer:
       - Parse template annotations: {{name:string:Your full name}}
       - Extract field types, descriptions, validation rules
       - Identify required vs optional fields
       - Support nested field structures
   ```

2. **Interactive Questionnaire Engine**
   ```python
   # markitect/template/questionnaire.py
   class TemplateQuestionnaire:
       - Terminal-based interactive data collection
       - Support input types: text, choice, date, number, boolean
       - Field validation and re-prompting
       - Progress tracking and partial save
   ```

3. **Basic CLI Commands**
   ```bash
   markitect template-fill template.md            # Interactive questionnaire
   markitect template-analyze template.md         # Show template fields
   markitect template-validate template.md        # Validate template syntax
   ```

#### Phase 2: User Profile System (Week 2)
4. **Profile Management**
   ```python
   # markitect/profile/manager.py
   class ProfileManager:
       - Create, read, update, delete profiles
       - Support multiple profiles (personal, work, etc.)
       - Profile inheritance and templates
       - Database storage integration
   ```

5. **Profile Schema System**
   ```python
   # markitect/profile/schema.py
   - Standard profile fields (personal, professional, technical)
   - Custom field extensions
   - JSON Schema validation
   - Field type definitions and constraints
   ```

6. **Profile CLI Commands**
   ```bash
   markitect profile create personal
   markitect profile set personal.name "John Doe"
   markitect profile set personal.email "john@example.com"
   markitect profile show personal
   markitect profile list
   markitect profile export personal profile.json
   ```

#### Phase 3: LLM-Powered Auto-Fill (Week 3)
7. **Smart Field Completion**
   ```python
   # markitect/template/auto_filler.py
   class LLMAutoFiller:
       - Use OpenRouter LLM for field suggestions
       - Context-aware completions based on template purpose
       - Profile-informed field values
       - Learning from user corrections
   ```

8. **Advanced Template Fill Modes**
   ```bash
   markitect template-fill template.md --auto              # Auto-fill from profile
   markitect template-fill template.md --guided            # Mix auto + questions
   markitect template-fill template.md --profile=work      # Use specific profile
   markitect template-fill template.md --learn             # Learn from corrections
   ```

#### Phase 4: Advanced Features (Week 4)
9. **Field Intelligence**
   - Template field learning and preferences
   - Content generation for complex fields
   - Multi-step form workflows
   - Field dependencies and conditional logic

10. **Integration Features**
    - Template field suggestions based on existing content
    - Auto-population from MarkiTect database
    - Template version control and updates

### Success Criteria
- ✅ Interactive terminal questionnaire for template completion
- ✅ User profile system with multiple profile support
- ✅ LLM-powered auto-fill suggestions based on user profile
- ✅ Enhanced template parser supporting field metadata
- ✅ Seamless integration with existing template rendering system

## 🔗 Shared Infrastructure Requirements

### Database Schema Extensions
```sql
-- User profiles table
CREATE TABLE user_profiles (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    data JSON NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- LLM interaction logs (optional)
CREATE TABLE llm_interactions (
    id INTEGER PRIMARY KEY,
    query TEXT NOT NULL,
    response TEXT NOT NULL,
    model TEXT NOT NULL,
    tokens_used INTEGER,
    cost REAL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Template usage history
CREATE TABLE template_usage (
    id INTEGER PRIMARY KEY,
    template_path TEXT NOT NULL,
    field_data JSON NOT NULL,
    profile_used TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

### Configuration Extensions
```yaml
# .markitect.yml additions
openrouter:
  api_key: "sk-or-..."
  default_model: "openai/gpt-4-turbo"
  max_tokens: 4096
  temperature: 0.7

profiles:
  default_profile: "personal"
  auto_save: true

templates:
  auto_fill_mode: "guided"  # auto, interactive, guided
  learn_from_corrections: true
```

## 📊 Implementation Priority Matrix

| Component | Issue | Priority | Effort | Dependencies |
|-----------|-------|----------|--------|--------------|
| OpenRouter Client | #98 | HIGH | 2 days | Config system |
| Context Builder | #98 | HIGH | 3 days | FTS, Database |
| Profile Manager | #99 | HIGH | 2 days | Database |
| Template Field Parser | #99 | HIGH | 3 days | Template system |
| Interactive Questionnaire | #99 | MEDIUM | 4 days | Profile system |
| LLM Auto-Fill | #99 | MEDIUM | 4 days | OpenRouter, Profiles |
| Natural Language Enhancement | #98 | MEDIUM | 2 days | OpenRouter, Context |
| Advanced Context | #98 | LOW | 3 days | Basic LLM working |

## 🧪 Testing Strategy

### Unit Tests Required
- OpenRouter client error handling and retries
- Template field parsing and validation
- Profile CRUD operations
- Context building with different content types
- LLM response formatting and citation extraction

### Integration Tests Required
- End-to-end template filling workflow
- Natural language queries with MarkiTect context
- Profile-based auto-fill accuracy
- CLI command integration

### Manual Testing Scenarios
1. **OpenRouter Setup**: User configures API key and tests connection
2. **Template Creation**: User creates template with various field types
3. **Profile Management**: User creates and manages multiple profiles
4. **Interactive Fill**: User completes template via questionnaire
5. **Auto-Fill**: System suggests field values based on profile
6. **LLM Queries**: User asks questions about their content
7. **Context Accuracy**: Verify LLM responses cite correct sources

## 🎯 Success Metrics & KPIs

### Quantitative Metrics
- **Template Completion Time**: Reduce by 60% with auto-fill
- **Query Response Accuracy**: >90% relevant context inclusion
- **User Satisfaction**: >8/10 rating for LLM responses
- **Profile Usage**: >75% of template fills use profile data

### Qualitative Metrics
- **User Experience**: Seamless workflow integration
- **Content Discovery**: Users find value in LLM-powered content exploration
- **Productivity**: Templates become preferred method for document creation
- **Accuracy**: LLM suggestions match user intent and context

## 🚧 Risk Assessment & Mitigation

### Technical Risks
1. **OpenRouter API Changes**: Mitigate with versioned API client and error handling
2. **Token Limits**: Implement intelligent context truncation and chunking
3. **LLM Response Quality**: Add response validation and fallback mechanisms
4. **Performance**: Cache common queries and optimize context building

### User Experience Risks
1. **Complex Configuration**: Provide setup wizard and clear documentation
2. **Learning Curve**: Include examples and guided tutorials
3. **Profile Privacy**: Implement secure storage and optional features
4. **Cost Concerns**: Add usage tracking and budget controls

## 📝 Requirements Engineering Notes

**FOR REQUIREMENTS ENGINEER:**

1. **User Research Needed**:
   - Survey existing MarkiTect users about LLM integration preferences
   - Gather template usage patterns and pain points
   - Validate profile data schema with target users

2. **Technical Validation Required**:
   - Verify OpenRouter API capabilities and limitations
   - Test LLM response quality with MarkiTect content types
   - Validate template field parsing edge cases

3. **Feature Prioritization**:
   - Consider implementing #98 first for immediate value
   - #99 can follow as enhanced template experience
   - Both share OpenRouter infrastructure investment

4. **Alternative Approaches**:
   - Consider other LLM providers (Anthropic direct, Azure OpenAI)
   - Evaluate local LLM options for privacy-conscious users
   - Template auto-fill could work without LLM (rule-based initially)

5. **Integration Points**:
   - Leverage existing Query Paradigm system for #98
   - Build on solid template foundation for #99
   - Utilize configuration manager for seamless setup

**RECOMMENDATION**: Proceed with implementation in phases, starting with OpenRouter client and basic LLM integration for #98, then expanding to template auto-fill for #99. The shared infrastructure investment will benefit both features significantly.