diff --git a/todo/LLM_INTEGRATION_GAMEPLAN.md b/todo/LLM_INTEGRATION_GAMEPLAN.md
new file mode 100644
index 00000000..be0f6eee
--- /dev/null
+++ b/todo/LLM_INTEGRATION_GAMEPLAN.md
@@ -0,0 +1,366 @@
+# LLM Integration Gameplan - Issues #98 & #99
+
+**Date**: 2025-10-03
+**Status**: REQUIREMENTS ANALYSIS
+**Priority**: HIGH
+**Estimated Effort**: 4-6 weeks development
+
+## 🎯 Executive Summary
+
+Two complementary features that will transform MarkiTect from a content management system into an **AI-powered knowledge assistant**:
+
+- **Issue #98**: OpenRoute Integration - Enable LLM queries against MarkiTect content
+- **Issue #99**: Auto Fill Templates - LLM-powered interactive template completion
+
+## 📋 Current State Analysis
+
+### ✅ Existing Infrastructure (Ready to Leverage)
+- **Template System**: Full template engine with parsing and rendering (`markitect/template/`)
+- **Configuration Manager**: Extensible config system with CLI integration
+- **Query Paradigms**: Natural Language paradigm exists (documented only)
+- **CLI Framework**: Click-based with established patterns
+- **Database**: SQLite with full metadata and content indexing
+- **FTS Search**: Full text search capabilities for content discovery
+
+### 🏗️ Infrastructure Gaps (Need Development)
+- **LLM Client**: No OpenRouter integration exists
+- **Profile System**: No user profile management
+- **Interactive UI**: No terminal questionnaire system
+- **Context Building**: No intelligent content selection for LLM queries
+
+## 🚀 Issue #98: OpenRoute Integration
+
+### Requirements Analysis
+```yaml
+Goal: "Use MarkiTect ingested content as context for interacting with LLMs flexibly and conveniently"
+User Story: "As a user, I want to ask natural language questions about my content and get intelligent responses with source citations"
+Integration: "Allow users to connect with an existing OpenRouter account"
+```
+
+### Technical Implementation Plan
+
+#### Phase 1: Core LLM Infrastructure (Week 1)
+1. **OpenRouter Client Development**
+   ```python
+   # markitect/llm/openrouter_client.py
+   class OpenRouterClient:
+       - API key management
+       - Model selection (GPT-4, Claude, etc.)
+       - Request/response handling
+       - Rate limiting and error handling
+       - Cost tracking
+   ```
+
+2. **Configuration Integration**
+   ```bash
+   markitect config-set openrouter.api_key sk-or-...
+   markitect config-set openrouter.default_model openai/gpt-4-turbo
+   markitect config-show --show-sensitive  # Show API keys
+   ```
+
+3. **Basic CLI Commands**
+   ```bash
+   markitect llm test                   # Test OpenRouter connection
+   markitect llm models                 # List available models
+   markitect llm ask "Simple question"  # Basic LLM interaction
+   ```
+
+#### Phase 2: Content Context Integration (Week 2)
+4. **Context Builder System**
+   ```python
+   # markitect/llm/context_builder.py
+   class ContextBuilder:
+       - Extract relevant content from database
+       - Use FTS search for content discovery
+       - Build context within token limits
+       - Include metadata and relationships
+   ```
+
+5. **Enhanced Natural Language Paradigm**
+   ```python
+   # Update markitect/query_paradigms/paradigms/natural_language_paradigm.py
+   class NaturalLanguageQueryParadigm:
+       - Integrate OpenRouter for real LLM processing
+       - Build context from MarkiTect content
+       - Return structured responses with citations
+   ```
+
+6. **Advanced CLI Integration**
+   ```bash
+   markitect paradigms exec "Natural Language" "What are the main API concepts?"
+   markitect llm chat  # Interactive mode
+   markitect llm ask "Summarize docs tagged tutorial"  # Filtered context
+   ```
+
+#### Phase 3: Advanced Features (Week 3)
+7. **Smart Context Selection**
+   - Relevance scoring for content inclusion
+   - Context size optimization
+   - Source citation tracking
+
+8. **Response Enhancement**
+   - Markdown formatting
+   - Source links back to MarkiTect files
+   - Follow-up question suggestions
+
+### Success Criteria
+- ✅ OpenRouter integration working with API key configuration
+- ✅ Natural language queries return relevant, contextualized responses
+- ✅ Responses include source citations linking to MarkiTect files
+- ✅ Context building intelligently selects relevant content
+- ✅ CLI commands integrated with existing paradigm system
+
+## 📝 Issue #99: Auto Fill Templates
+
+### Requirements Analysis
+```yaml
+Goal: "Use Markdown Templates to capture data with terminal questionnaire and LLM auto-fill"
+User Story: "As a user, I want to fill templates interactively, with the system auto-completing fields based on my profile"
+LLM Integration: "Provided the user has a profile, an LLM should autofill based on the profile provided"
+```
+
+### Technical Implementation Plan
+
+#### Phase 1: Enhanced Template System (Week 1)
+1. **Template Field Analysis**
+   ```python
+   # markitect/template/field_analyzer.py
+   class TemplateFieldAnalyzer:
+       - Parse template annotations: {{name:string:Your full name}}
+       - Extract field types, descriptions, validation rules
+       - Identify required vs optional fields
+       - Support nested field structures
+   ```
+
+2. **Interactive Questionnaire Engine**
+   ```python
+   # markitect/template/questionnaire.py
+   class TemplateQuestionnaire:
+       - Terminal-based interactive data collection
+       - Support input types: text, choice, date, number, boolean
+       - Field validation and re-prompting
+       - Progress tracking and partial save
+   ```
+
+3. **Basic CLI Commands**
+   ```bash
+   markitect template-fill template.md      # Interactive questionnaire
+   markitect template-analyze template.md   # Show template fields
+   markitect template-validate template.md  # Validate template syntax
+   ```
+
+#### Phase 2: User Profile System (Week 2)
+4. **Profile Management**
+   ```python
+   # markitect/profile/manager.py
+   class ProfileManager:
+       - Create, read, update, delete profiles
+       - Support multiple profiles (personal, work, etc.)
+       - Profile inheritance and templates
+       - Database storage integration
+   ```
+
+5. **Profile Schema System**
+   ```python
+   # markitect/profile/schema.py
+   - Standard profile fields (personal, professional, technical)
+   - Custom field extensions
+   - JSON Schema validation
+   - Field type definitions and constraints
+   ```
+
+6. **Profile CLI Commands**
+   ```bash
+   markitect profile create personal
+   markitect profile set personal.name "John Doe"
+   markitect profile set personal.email "john@example.com"
+   markitect profile show personal
+   markitect profile list
+   markitect profile export personal profile.json
+   ```
+
+#### Phase 3: LLM-Powered Auto-Fill (Week 3)
+7. **Smart Field Completion**
+   ```python
+   # markitect/template/auto_filler.py
+   class LLMAutoFiller:
+       - Use OpenRouter LLM for field suggestions
+       - Context-aware completions based on template purpose
+       - Profile-informed field values
+       - Learning from user corrections
+   ```
+
+8. **Advanced Template Fill Modes**
+   ```bash
+   markitect template-fill template.md --auto          # Auto-fill from profile
+   markitect template-fill template.md --guided        # Mix auto + questions
+   markitect template-fill template.md --profile=work  # Use specific profile
+   markitect template-fill template.md --learn         # Learn from corrections
+   ```
+
+#### Phase 4: Advanced Features (Week 4)
+9. **Field Intelligence**
+   - Template field learning and preferences
+   - Content generation for complex fields
+   - Multi-step form workflows
+   - Field dependencies and conditional logic
+
+10. **Integration Features**
+    - Template field suggestions based on existing content
+    - Auto-population from MarkiTect database
+    - Template version control and updates
+
+### Success Criteria
+- ✅ Interactive terminal questionnaire for template completion
+- ✅ User profile system with multiple profile support
+- ✅ LLM-powered auto-fill suggestions based on user profile
+- ✅ Enhanced template parser supporting field metadata
+- ✅ Seamless integration with existing template rendering system
+
+## 🔗 Shared Infrastructure Requirements
+
+### Database Schema Extensions
+```sql
+-- User profiles table
+CREATE TABLE user_profiles (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL UNIQUE,
+    data JSON NOT NULL,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+-- LLM interaction logs (optional)
+CREATE TABLE llm_interactions (
+    id INTEGER PRIMARY KEY,
+    query TEXT NOT NULL,
+    response TEXT NOT NULL,
+    model TEXT NOT NULL,
+    tokens_used INTEGER,
+    cost REAL,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+-- Template usage history
+CREATE TABLE template_usage (
+    id INTEGER PRIMARY KEY,
+    template_path TEXT NOT NULL,
+    field_data JSON NOT NULL,
+    profile_used TEXT,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+```
+
+### Configuration Extensions
+```yaml
+# .markitect.yml additions
+openrouter:
+  api_key: "sk-or-..."
+  default_model: "openai/gpt-4-turbo"
+  max_tokens: 4096
+  temperature: 0.7
+
+profiles:
+  default_profile: "personal"
+  auto_save: true
+
+templates:
+  auto_fill_mode: "guided"  # auto, interactive, guided
+  learn_from_corrections: true
+```
+
+## 📊 Implementation Priority Matrix
+
+| Component | Issue | Priority | Effort | Dependencies |
+|-----------|-------|----------|--------|--------------|
+| OpenRouter Client | #98 | HIGH | 2 days | Config system |
+| Context Builder | #98 | HIGH | 3 days | FTS, Database |
+| Profile Manager | #99 | HIGH | 2 days | Database |
+| Template Field Parser | #99 | HIGH | 3 days | Template system |
+| Interactive Questionnaire | #99 | MEDIUM | 4 days | Profile system |
+| LLM Auto-Fill | #99 | MEDIUM | 4 days | OpenRouter, Profiles |
+| Natural Language Enhancement | #98 | MEDIUM | 2 days | OpenRouter, Context |
+| Advanced Context | #98 | LOW | 3 days | Basic LLM working |
+
+## 🧪 Testing Strategy
+
+### Unit Tests Required
+- OpenRouter client error handling and retries
+- Template field parsing and validation
+- Profile CRUD operations
+- Context building with different content types
+- LLM response formatting and citation extraction
+
+### Integration Tests Required
+- End-to-end template filling workflow
+- Natural language queries with MarkiTect context
+- Profile-based auto-fill accuracy
+- CLI command integration
+
+### Manual Testing Scenarios
+1. **OpenRouter Setup**: User configures API key and tests connection
+2. **Template Creation**: User creates template with various field types
+3. **Profile Management**: User creates and manages multiple profiles
+4. **Interactive Fill**: User completes template via questionnaire
+5. **Auto-Fill**: System suggests field values based on profile
+6. **LLM Queries**: User asks questions about their content
+7. **Context Accuracy**: Verify LLM responses cite correct sources
+
+## 🎯 Success Metrics & KPIs
+
+### Quantitative Metrics
+- **Template Completion Time**: Reduce by 60% with auto-fill
+- **Query Response Accuracy**: >90% relevant context inclusion
+- **User Satisfaction**: >8/10 rating for LLM responses
+- **Profile Usage**: >75% of template fills use profile data
+
+### Qualitative Metrics
+- **User Experience**: Seamless workflow integration
+- **Content Discovery**: Users find value in LLM-powered content exploration
+- **Productivity**: Templates become preferred method for document creation
+- **Accuracy**: LLM suggestions match user intent and context
+
+## 🚧 Risk Assessment & Mitigation
+
+### Technical Risks
+1. **OpenRouter API Changes**: Mitigate with versioned API client and error handling
+2. **Token Limits**: Implement intelligent context truncation and chunking
+3. **LLM Response Quality**: Add response validation and fallback mechanisms
+4. **Performance**: Cache common queries and optimize context building
+
+### User Experience Risks
+1. **Complex Configuration**: Provide setup wizard and clear documentation
+2. **Learning Curve**: Include examples and guided tutorials
+3. **Profile Privacy**: Implement secure storage and optional features
+4. **Cost Concerns**: Add usage tracking and budget controls
+
+## 📝 Requirements Engineering Notes
+
+**FOR REQUIREMENTS ENGINEER:**
+
+1. **User Research Needed**:
+   - Survey existing MarkiTect users about LLM integration preferences
+   - Gather template usage patterns and pain points
+   - Validate profile data schema with target users
+
+2. **Technical Validation Required**:
+   - Verify OpenRouter API capabilities and limitations
+   - Test LLM response quality with MarkiTect content types
+   - Validate template field parsing edge cases
+
+3. **Feature Prioritization**:
+   - Consider implementing #98 first for immediate value
+   - #99 can follow as enhanced template experience
+   - Both share OpenRouter infrastructure investment
+
+4. **Alternative Approaches**:
+   - Consider other LLM providers (Anthropic direct, Azure OpenAI)
+   - Evaluate local LLM options for privacy-conscious users
+   - Template auto-fill could work without LLM (rule-based initially)
+
+5. **Integration Points**:
+   - Leverage existing Query Paradigm system for #98
+   - Build on solid template foundation for #99
+   - Utilize configuration manager for seamless setup
+
+**RECOMMENDATION**: Proceed with implementation in phases, starting with OpenRouter client and basic LLM integration for #98, then expanding to template auto-fill for #99. The shared infrastructure investment will benefit both features significantly.
\ No newline at end of file
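
The plan's Template Field Analysis step names one concrete annotation format, `{{name:string:Your full name}}`, and lists "template field parsing edge cases" as something to validate. A minimal sketch of how such annotations might be extracted is shown here, assuming every annotation has exactly three colon-separated parts (name, type, description). `FIELD_PATTERN`, `TemplateField`, and `extract_fields` are hypothetical names chosen for illustration; they are not part of the existing MarkiTect template engine, and the planned `TemplateFieldAnalyzer` may end up looking quite different.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed annotation shape: {{name:type:description}}.
# This pattern is an illustration, not MarkiTect's actual syntax definition.
FIELD_PATTERN = re.compile(r"\{\{\s*(\w+)\s*:\s*(\w+)\s*:\s*([^}]*?)\s*\}\}")


@dataclass(frozen=True)
class TemplateField:
    """One annotated field found in a template (hypothetical structure)."""

    name: str
    field_type: str
    description: str


def extract_fields(template_text: str) -> list[TemplateField]:
    """Return annotated fields in order of first appearance, skipping repeats."""
    seen: set[str] = set()
    fields: list[TemplateField] = []
    for match in FIELD_PATTERN.finditer(template_text):
        name, field_type, description = match.groups()
        if name in seen:
            # The same field may appear several times; ask for it only once.
            continue
        seen.add(name)
        fields.append(TemplateField(name, field_type, description))
    return fields


if __name__ == "__main__":
    sample = (
        "Dear {{name:string:Your full name}},\n"
        "Start date: {{start:date:First working day}}\n"
        "Signed, {{name:string:Your full name}}\n"
    )
    for field in extract_fields(sample):
        print(field)
```

Deduplicating by field name means a questionnaire built on top of this would prompt once per field even when the template repeats it. Validation rules, required versus optional markers, and nested fields from the plan are deliberately left out of this sketch.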