docs: add comprehensive LLM integration gameplan for issues #98 & #99

- Created detailed implementation strategy for OpenRoute integration (issue #98)
- Designed auto-fill templates system with LLM assistance (issue #99)
- Analyzed existing infrastructure and identified reusable components
- Provided 4-6 week phased development plan with clear priorities
- Included technical architecture, database schemas, and testing strategy
- Added risk assessment, success metrics, and requirements engineering guidance
- Recommended starting with OpenRoute client as shared foundation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
2025-10-04 00:16:17 +02:00
parent 5143864a86
commit f63101cad8

View File

@@ -0,0 +1,366 @@
# LLM Integration Gameplan - Issues #98 & #99
**Date**: 2025-10-03
**Status**: REQUIREMENTS ANALYSIS
**Priority**: HIGH
**Estimated Effort**: 4-6 weeks development
## 🎯 Executive Summary
Two complementary features that will transform MarkiTect from a content management system into an **AI-powered knowledge assistant**:
- **Issue #98**: OpenRoute Integration - Enable LLM queries against MarkiTect content
- **Issue #99**: Auto Fill Templates - LLM-powered interactive template completion
## 📋 Current State Analysis
### ✅ Existing Infrastructure (Ready to Leverage)
- **Template System**: Full template engine with parsing and rendering (`markitect/template/`)
- **Configuration Manager**: Extensible config system with CLI integration
- **Query Paradigms**: Natural Language paradigm exists (documented only)
- **CLI Framework**: Click-based with established patterns
- **Database**: SQLite with full metadata and content indexing
- **FTS Search**: Full text search capabilities for content discovery
### 🏗️ Infrastructure Gaps (Need Development)
- **LLM Client**: No OpenRouter integration exists
- **Profile System**: No user profile management
- **Interactive UI**: No terminal questionnaire system
- **Context Building**: No intelligent content selection for LLM queries
## 🚀 Issue #98: OpenRoute Integration
### Requirements Analysis
```yaml
Goal: "Use MarkiTect ingested content as context for interacting with LLMs flexibly and conveniently"
User Story: "As a user, I want to ask natural language questions about my content and get intelligent responses with source citations"
Integration: "Allow users to connect with an existing OpenRouter account"
```
### Technical Implementation Plan
#### Phase 1: Core LLM Infrastructure (Week 1)
1. **OpenRouter Client Development**
```python
# markitect/llm/openrouter_client.py
class OpenRouterClient:
- API key management
- Model selection (GPT-4, Claude, etc.)
- Request/response handling
- Rate limiting and error handling
- Cost tracking
```
2. **Configuration Integration**
```bash
markitect config-set openrouter.api_key sk-or-...
markitect config-set openrouter.default_model openai/gpt-4-turbo
markitect config-show --show-sensitive # Show API keys
```
3. **Basic CLI Commands**
```bash
markitect llm test # Test OpenRouter connection
markitect llm models # List available models
markitect llm ask "Simple question" # Basic LLM interaction
```
#### Phase 2: Content Context Integration (Week 2)
4. **Context Builder System**
```python
# markitect/llm/context_builder.py
class ContextBuilder:
- Extract relevant content from database
- Use FTS search for content discovery
- Build context within token limits
- Include metadata and relationships
```
5. **Enhanced Natural Language Paradigm**
```python
# Update markitect/query_paradigms/paradigms/natural_language_paradigm.py
class NaturalLanguageQueryParadigm:
- Integrate OpenRouter for real LLM processing
- Build context from MarkiTect content
- Return structured responses with citations
```
6. **Advanced CLI Integration**
```bash
markitect paradigms exec "Natural Language" "What are the main API concepts?"
markitect llm chat # Interactive mode
markitect llm ask "Summarize docs tagged tutorial" # Filtered context
```
#### Phase 3: Advanced Features (Week 3)
7. **Smart Context Selection**
- Relevance scoring for content inclusion
- Context size optimization
- Source citation tracking
8. **Response Enhancement**
- Markdown formatting
- Source links back to MarkiTect files
- Follow-up question suggestions
### Success Criteria
- ✅ OpenRouter integration working with API key configuration
- ✅ Natural language queries return relevant, contextualized responses
- ✅ Responses include source citations linking to MarkiTect files
- ✅ Context building intelligently selects relevant content
- ✅ CLI commands integrated with existing paradigm system
## 📝 Issue #99: Auto Fill Templates
### Requirements Analysis
```yaml
Goal: "Use Markdown Templates to capture data with terminal questionnaire and LLM auto-fill"
User Story: "As a user, I want to fill templates interactively, with the system auto-completing fields based on my profile"
LLM Integration: "Provided the user has a profile, an LLM should autofill based on the profile provided"
```
### Technical Implementation Plan
#### Phase 1: Enhanced Template System (Week 1)
1. **Template Field Analysis**
```python
# markitect/template/field_analyzer.py
class TemplateFieldAnalyzer:
- Parse template annotations: {{name:string:Your full name}}
- Extract field types, descriptions, validation rules
- Identify required vs optional fields
- Support nested field structures
```
2. **Interactive Questionnaire Engine**
```python
# markitect/template/questionnaire.py
class TemplateQuestionnaire:
- Terminal-based interactive data collection
- Support input types: text, choice, date, number, boolean
- Field validation and re-prompting
- Progress tracking and partial save
```
3. **Basic CLI Commands**
```bash
markitect template-fill template.md # Interactive questionnaire
markitect template-analyze template.md # Show template fields
markitect template-validate template.md # Validate template syntax
```
#### Phase 2: User Profile System (Week 2)
4. **Profile Management**
```python
# markitect/profile/manager.py
class ProfileManager:
- Create, read, update, delete profiles
- Support multiple profiles (personal, work, etc.)
- Profile inheritance and templates
- Database storage integration
```
5. **Profile Schema System**
```python
# markitect/profile/schema.py
- Standard profile fields (personal, professional, technical)
- Custom field extensions
- JSON Schema validation
- Field type definitions and constraints
```
6. **Profile CLI Commands**
```bash
markitect profile create personal
markitect profile set personal.name "John Doe"
markitect profile set personal.email "john@example.com"
markitect profile show personal
markitect profile list
markitect profile export personal profile.json
```
#### Phase 3: LLM-Powered Auto-Fill (Week 3)
7. **Smart Field Completion**
```python
# markitect/template/auto_filler.py
class LLMAutoFiller:
- Use OpenRouter LLM for field suggestions
- Context-aware completions based on template purpose
- Profile-informed field values
- Learning from user corrections
```
8. **Advanced Template Fill Modes**
```bash
markitect template-fill template.md --auto # Auto-fill from profile
markitect template-fill template.md --guided # Mix auto + questions
markitect template-fill template.md --profile=work # Use specific profile
markitect template-fill template.md --learn # Learn from corrections
```
#### Phase 4: Advanced Features (Week 4)
9. **Field Intelligence**
- Template field learning and preferences
- Content generation for complex fields
- Multi-step form workflows
- Field dependencies and conditional logic
10. **Integration Features**
- Template field suggestions based on existing content
- Auto-population from MarkiTect database
- Template version control and updates
### Success Criteria
- ✅ Interactive terminal questionnaire for template completion
- ✅ User profile system with multiple profile support
- ✅ LLM-powered auto-fill suggestions based on user profile
- ✅ Enhanced template parser supporting field metadata
- ✅ Seamless integration with existing template rendering system
## 🔗 Shared Infrastructure Requirements
### Database Schema Extensions
```sql
-- User profiles table
CREATE TABLE user_profiles (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
data JSON NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- LLM interaction logs (optional)
CREATE TABLE llm_interactions (
id INTEGER PRIMARY KEY,
query TEXT NOT NULL,
response TEXT NOT NULL,
model TEXT NOT NULL,
tokens_used INTEGER,
cost REAL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Template usage history
CREATE TABLE template_usage (
id INTEGER PRIMARY KEY,
template_path TEXT NOT NULL,
field_data JSON NOT NULL,
profile_used TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
### Configuration Extensions
```yaml
# .markitect.yml additions
openrouter:
api_key: "sk-or-..."
default_model: "openai/gpt-4-turbo"
max_tokens: 4096
temperature: 0.7
profiles:
default_profile: "personal"
auto_save: true
templates:
auto_fill_mode: "guided" # auto, interactive, guided
learn_from_corrections: true
```
## 📊 Implementation Priority Matrix
| Component | Issue | Priority | Effort | Dependencies |
|-----------|-------|----------|--------|--------------|
| OpenRouter Client | #98 | HIGH | 2 days | Config system |
| Context Builder | #98 | HIGH | 3 days | FTS, Database |
| Profile Manager | #99 | HIGH | 2 days | Database |
| Template Field Parser | #99 | HIGH | 3 days | Template system |
| Interactive Questionnaire | #99 | MEDIUM | 4 days | Profile system |
| LLM Auto-Fill | #99 | MEDIUM | 4 days | OpenRouter, Profiles |
| Natural Language Enhancement | #98 | MEDIUM | 2 days | OpenRouter, Context |
| Advanced Context | #98 | LOW | 3 days | Basic LLM working |
## 🧪 Testing Strategy
### Unit Tests Required
- OpenRouter client error handling and retries
- Template field parsing and validation
- Profile CRUD operations
- Context building with different content types
- LLM response formatting and citation extraction
### Integration Tests Required
- End-to-end template filling workflow
- Natural language queries with MarkiTect context
- Profile-based auto-fill accuracy
- CLI command integration
### Manual Testing Scenarios
1. **OpenRouter Setup**: User configures API key and tests connection
2. **Template Creation**: User creates template with various field types
3. **Profile Management**: User creates and manages multiple profiles
4. **Interactive Fill**: User completes template via questionnaire
5. **Auto-Fill**: System suggests field values based on profile
6. **LLM Queries**: User asks questions about their content
7. **Context Accuracy**: Verify LLM responses cite correct sources
## 🎯 Success Metrics & KPIs
### Quantitative Metrics
- **Template Completion Time**: Reduce by 60% with auto-fill
- **Query Response Accuracy**: >90% relevant context inclusion
- **User Satisfaction**: >8/10 rating for LLM responses
- **Profile Usage**: >75% of template fills use profile data
### Qualitative Metrics
- **User Experience**: Seamless workflow integration
- **Content Discovery**: Users find value in LLM-powered content exploration
- **Productivity**: Templates become preferred method for document creation
- **Accuracy**: LLM suggestions match user intent and context
## 🚧 Risk Assessment & Mitigation
### Technical Risks
1. **OpenRouter API Changes**: Mitigate with versioned API client and error handling
2. **Token Limits**: Implement intelligent context truncation and chunking
3. **LLM Response Quality**: Add response validation and fallback mechanisms
4. **Performance**: Cache common queries and optimize context building
### User Experience Risks
1. **Complex Configuration**: Provide setup wizard and clear documentation
2. **Learning Curve**: Include examples and guided tutorials
3. **Profile Privacy**: Implement secure storage and optional features
4. **Cost Concerns**: Add usage tracking and budget controls
## 📝 Requirements Engineering Notes
**FOR REQUIREMENTS ENGINEER:**
1. **User Research Needed**:
- Survey existing MarkiTect users about LLM integration preferences
- Gather template usage patterns and pain points
- Validate profile data schema with target users
2. **Technical Validation Required**:
- Verify OpenRouter API capabilities and limitations
- Test LLM response quality with MarkiTect content types
- Validate template field parsing edge cases
3. **Feature Prioritization**:
- Consider implementing #98 first for immediate value
- #99 can follow as enhanced template experience
- Both share OpenRouter infrastructure investment
4. **Alternative Approaches**:
- Consider other LLM providers (Anthropic direct, Azure OpenAI)
- Evaluate local LLM options for privacy-conscious users
- Template auto-fill could work without LLM (rule-based initially)
5. **Integration Points**:
- Leverage existing Query Paradigm system for #98
- Build on solid template foundation for #99
- Utilize configuration manager for seamless setup
**RECOMMENDATION**: Proceed with implementation in phases, starting with OpenRouter client and basic LLM integration for #98, then expanding to template auto-fill for #99. The shared infrastructure investment will benefit both features significantly.