tegwick 8179929a4a feat: implement lightweight full text search plugin using SQLite FTS5 (issue #83)
2025-10-03 17:03:11 +02:00
# Full Text Search - Issue #83
MarkiTect provides full text search through SQLite's FTS5 extension, implemented as a lightweight plugin.
## Features
- **SQLite FTS5**: Leverages SQLite's built-in FTS5 virtual tables for high-performance search
- **No Dependencies**: Uses only SQLite; no additional search libraries are required
- **Real-time Indexing**: Automatic index updates when content changes
- **Advanced Queries**: Support for phrase search, boolean operators, and proximity search
- **CLI Integration**: Complete command-line interface for search operations
- **Fallback Support**: Graceful degradation to simple LIKE queries if FTS5 is unavailable
## Quick Start
### Initialize Search
First, initialize the search indexes:
```bash
markitect search init
```
This creates FTS5 virtual tables and sets up automatic indexing triggers.
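To illustrate what this step might involve, here is a minimal sketch using Python's standard `sqlite3` module. Only the `fts_files` name comes from this guide; the `files` table, its columns, the trigger names, and the helper functions are assumptions made for the example, and MarkiTect's real schema may differ. It requires a SQLite build with FTS5 enabled.

```python
import sqlite3

# Hypothetical schema: a `files` content table plus the `fts_files` index.
# The external-content FTS5 table stores only the index and reads text from
# `files`; the three triggers mirror inserts, deletes, and updates into it.
SCHEMA = """
CREATE TABLE files (id INTEGER PRIMARY KEY, filename TEXT, content TEXT);

CREATE VIRTUAL TABLE fts_files USING fts5(
    filename, content, content='files', content_rowid='id'
);

CREATE TRIGGER files_ai AFTER INSERT ON files BEGIN
    INSERT INTO fts_files(rowid, filename, content)
    VALUES (new.id, new.filename, new.content);
END;

CREATE TRIGGER files_ad AFTER DELETE ON files BEGIN
    INSERT INTO fts_files(fts_files, rowid, filename, content)
    VALUES ('delete', old.id, old.filename, old.content);
END;

CREATE TRIGGER files_au AFTER UPDATE ON files BEGIN
    INSERT INTO fts_files(fts_files, rowid, filename, content)
    VALUES ('delete', old.id, old.filename, old.content);
    INSERT INTO fts_files(rowid, filename, content)
    VALUES (new.id, new.filename, new.content);
END;
"""


def init_search(conn):
    """Create the content table, the FTS5 index, and the sync triggers."""
    conn.executescript(SCHEMA)


def matching_files(conn, query):
    """Return filenames whose indexed text matches an FTS5 query."""
    rows = conn.execute(
        "SELECT filename FROM fts_files WHERE fts_files MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
init_search(conn)
conn.execute(
    "INSERT INTO files(filename, content) VALUES (?, ?)",
    ("api.md", "GraphQL API documentation"),
)
print(matching_files(conn, "graphql"))
```

Because the triggers run inside the same transaction as the write, the index never lags behind the content table, which is what makes the indexing "real-time".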
### Rebuild Indexes
To rebuild indexes from scratch:
```bash
markitect search rebuild --optimize
```
### Check Status
View search system status:
```bash
markitect search status
```
### Perform Searches
Search across all content:
```bash
markitect search query "API documentation"
```
Search only files:
```bash
markitect search query "graphql" --type files --limit 5
```
Search only schemas:
```bash
markitect search query "user" --type schemas
```
## Query Syntax
### Simple Queries
```bash
# Single word - automatically adds wildcard
markitect search query "api" # Finds: api, apis, apiKey, etc.
# Multiple words - implicit AND
markitect search query "api documentation" # Finds documents with both terms
```
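The two rules above could be implemented along the following lines. This is only a sketch: `to_fts5_query` is a hypothetical name, not MarkiTect's `QueryParser` API, and it does no escaping of special characters.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}


def to_fts5_query(user_query: str) -> str:
    """Hypothetical conversion of a simple query into FTS5 MATCH syntax."""
    words = user_query.split()
    # Leave anything that already looks like explicit FTS5 syntax untouched.
    if any(w in OPERATORS for w in words) or re.search(r'["():*]', user_query):
        return user_query
    if len(words) == 1:
        # Single word: append the FTS5 prefix operator.
        return f"{words[0]}*"
    # Several words: make the implicit AND explicit.
    return " AND ".join(words)


print(to_fts5_query("api"))
print(to_fts5_query("api documentation"))
```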
### Phrase Search
```bash
# Exact phrase matching
markitect search query '"GraphQL mutation"'
```
### Boolean Operators
```bash
# AND operator
markitect search query "api AND documentation"
# OR operator
markitect search query "rest OR graphql"
# NOT operator
markitect search query "api NOT deprecated"
```
### Advanced Features
```bash
# Proximity search (terms separated by at most 10 tokens)
markitect search query "NEAR(api documentation, 10)"
# Column-specific search
markitect search query "filename:readme"
```
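The phrase, boolean, proximity, and column syntax shown in this section is standard FTS5 `MATCH` syntax, so it can be tried directly against SQLite. The sketch below assumes the queries reach FTS5 unchanged; the `docs` table and the sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative index with a `filename` column, so `filename:` filters work.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(filename, content)")
conn.executemany(
    "INSERT INTO docs(filename, content) VALUES (?, ?)",
    [
        ("readme.md", "Overview of the REST API and its documentation"),
        ("mutations.md", "Writing a GraphQL mutation resolver"),
        ("legacy.md", "The deprecated API, kept for reference"),
    ],
)


def search(match_expr):
    """Return matching filenames in alphabetical order."""
    rows = conn.execute(
        "SELECT filename FROM docs WHERE docs MATCH ? ORDER BY filename",
        (match_expr,),
    )
    return [row[0] for row in rows]


for expr in (
    '"graphql mutation"',            # exact phrase
    "api NOT deprecated",            # boolean exclusion
    "NEAR(api documentation, 10)",   # proximity
    "filename:readme",               # column filter
):
    print(expr, "->", search(expr))
```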
## CLI Commands
### `markitect search init`
Initialize search indexes and FTS5 tables.
**Options:**
- `--rebuild` - Rebuild existing indexes during initialization
**Examples:**
```bash
markitect search init
markitect search init --rebuild
```
### `markitect search query`
Perform full text search queries.
**Arguments:**
- `QUERY` - Search query string
**Options:**
- `--type [all|files|schemas]` - Content type to search (default: all)
- `--limit INTEGER` - Maximum number of results (default: 20)
- `--offset INTEGER` - Result offset for pagination (default: 0)
- `--format [table|json|yaml]` - Output format (default: table)
- `--no-highlight` - Disable result highlighting
**Examples:**
```bash
markitect search query "documentation"
markitect search query "api" --type files --limit 10
markitect search query "schema" --format json
markitect search query "user" --offset 20 --limit 10 # Pagination: results 21-30
```
### `markitect search status`
Show search index status and statistics.
**Options:**
- `--format [table|json|yaml]` - Output format (default: table)
**Examples:**
```bash
markitect search status
markitect search status --format json
```
### `markitect search rebuild`
Rebuild search indexes from scratch.
**Options:**
- `--optimize` - Optimize indexes after rebuild
**Examples:**
```bash
markitect search rebuild
markitect search rebuild --optimize
```
## Architecture
### Plugin System
The search functionality is implemented as a plugin within MarkiTect's plugin architecture:
- **FTSSearchPlugin**: Main search plugin class
- **SearchIndexer**: Handles FTS5 table creation and maintenance
- **QueryParser**: Parses and optimizes search queries
### Database Integration
- **FTS5 Virtual Tables**: `fts_files` and `fts_schemas` for content indexing
- **Automatic Triggers**: Database triggers keep indexes synchronized
- **Fallback Queries**: LIKE-based search when FTS5 is unavailable
### Search Process
1. **Indexing**: Content automatically indexed via database triggers
2. **Query Parsing**: User queries converted to FTS5-compatible syntax
3. **Search Execution**: FTS5 performs ranked full text search
4. **Result Processing**: Results formatted with highlighting and metadata
5. **Fallback**: Simple LIKE queries if FTS5 fails
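Steps 3 to 5 can be sketched as a single function. This is an illustration rather than the plugin's actual code: `search_files` is a hypothetical helper, and it assumes a `files(filename, content)` table alongside an `fts_files` FTS5 table over the same two columns.

```python
import sqlite3


def search_files(conn, query, limit=20, offset=0):
    """Ranked FTS5 search with highlighting, falling back to LIKE.

    Returns a (backend, rows) pair so callers can tell which path ran.
    Table and column names are assumptions made for this sketch.
    """
    try:
        rows = conn.execute(
            # highlight() marks matches in column 1 (content); ordering by
            # the built-in `rank` column puts the best matches first.
            "SELECT filename, highlight(fts_files, 1, '[', ']') "
            "FROM fts_files WHERE fts_files MATCH ? "
            "ORDER BY rank LIMIT ? OFFSET ?",
            (query, limit, offset),
        ).fetchall()
        return "fts5", rows
    except sqlite3.OperationalError:
        # FTS5 missing, index absent, or query not parsable by FTS5:
        # fall back to an unranked substring match.
        rows = conn.execute(
            "SELECT filename, content FROM files "
            "WHERE content LIKE ? ORDER BY filename LIMIT ? OFFSET ?",
            (f"%{query}%", limit, offset),
        ).fetchall()
        return "like", rows
```

The fallback loses ranking and highlighting, so returning the backend name lets a caller adjust how results are presented.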
## Performance Considerations
### Index Optimization
```bash
# Periodically optimize indexes for better performance
markitect search rebuild --optimize
```
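At the SQLite level, FTS5 exposes rebuilding and optimizing as special `INSERT` commands. The sketch below shows how a `rebuild --optimize` could map onto them; whether MarkiTect implements the command this way is an assumption, and `rebuild_index` is a hypothetical helper.

```python
import sqlite3


def rebuild_index(conn, table="fts_files", optimize=False):
    """Rebuild an FTS5 index from its content, optionally optimizing it.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    # 'rebuild' discards the index and re-reads the table's content
    # (not available for contentless FTS5 tables).
    conn.execute(f"INSERT INTO {table}({table}) VALUES ('rebuild')")
    if optimize:
        # 'optimize' merges the index's internal b-trees into one.
        conn.execute(f"INSERT INTO {table}({table}) VALUES ('optimize')")
    conn.commit()
```

Optimizing rewrites the whole index, so it is better suited to occasional maintenance than to every write.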
### Query Performance
- Use specific content types (`--type files`) when possible
- Limit results with `--limit` for large result sets
- Use phrase queries for exact matches
- Prefer explicit boolean operators to long natural-language queries, since every extra word becomes another implicit AND term
### Storage Impact
- FTS5 indexes require additional disk space (typically 30-50% of content size)
- Indexes are maintained automatically; no manual intervention is needed
- Use `markitect search status` to monitor index sizes
## Troubleshooting
### FTS5 Not Available
If your SQLite build lacks FTS5 support, the status command reports it:
```bash
markitect search status
# Shows: FTS5 Full Text Search: Disabled
```
The system automatically falls back to simple LIKE-based search.
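To check independently whether the SQLite library linked into a Python interpreter supports FTS5, a probe like the following can be used. It is a generic sketch, not MarkiTect's own detection logic, and `fts5_available` is a hypothetical name.

```python
import sqlite3


def fts5_available():
    """Return True if this SQLite build can create FTS5 virtual tables."""
    probe = sqlite3.connect(":memory:")
    try:
        probe.execute("CREATE VIRTUAL TABLE probe USING fts5(x)")
        return True
    except sqlite3.OperationalError:
        # Typically "no such module: fts5" on builds without the extension.
        return False
    finally:
        probe.close()


print("FTS5 available:", fts5_available())
```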
### Database Lock Errors
If you see database lock errors:
```bash
# Wait for other operations to complete, then retry
markitect search rebuild
```
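When driving the database from a script, the same "wait and retry" advice can be automated. The helper below is a hypothetical sketch that retries an operation with exponential backoff when SQLite reports a lock. Note that `sqlite3.connect(path, timeout=...)` already makes SQLite wait for locks before raising, so raising that timeout is often the simpler fix.

```python
import sqlite3
import time


def with_lock_retry(operation, attempts=5, base_delay=0.1):
    """Run operation(), retrying with exponential backoff on lock errors."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            locked = "locked" in str(exc).lower()
            # Re-raise anything that is not a lock error, and give up
            # after the final attempt.
            if not locked or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```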
### Index Corruption
To fix corrupted indexes:
```bash
# Rebuild from scratch
markitect search rebuild --optimize
```
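Before rebuilding, it can be useful to confirm that an index really is damaged. FTS5 has a built-in `integrity-check` command for this; the sketch below wraps it in a hypothetical helper, and it is an assumption that MarkiTect's indexes can be checked this way.

```python
import sqlite3


def index_is_consistent(conn, table="fts_files"):
    """Run FTS5's integrity-check; return False if it reports corruption.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    try:
        conn.execute(f"INSERT INTO {table}({table}) VALUES ('integrity-check')")
        return True
    except sqlite3.DatabaseError:
        return False
```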
### No Results Found
Check if content is indexed:
```bash
markitect search status
# Check document counts for fts_files and fts_schemas
```
If no documents are indexed:
```bash
markitect search rebuild
```
## Integration with GraphQL
The search functionality is exposed through MarkiTect's existing GraphQL search resolver, which offers both FTS5-powered and fallback search.
## Examples
### Content Discovery
Find all API-related documentation:
```bash
markitect search query "api documentation" --limit 10
```
### Schema Exploration
Find user-related schemas:
```bash
markitect search query "user" --type schemas --format json
```
### Comprehensive Search
Search with pagination:
```bash
# First page
markitect search query "graphql" --limit 5 --offset 0
# Second page
markitect search query "graphql" --limit 5 --offset 5
```
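When scripting pagination, the offset for a given page follows from the page size. The helper below is a hypothetical sketch that builds the argument list for the `query` command shown above, using 1-based page numbers; it only constructs the command and does not run it.

```python
def query_command(query, page=1, page_size=5):
    """Build the CLI arguments for one page of search results."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Page 1 starts at offset 0, page 2 at page_size, and so on.
    offset = (page - 1) * page_size
    return [
        "markitect", "search", "query", query,
        "--limit", str(page_size),
        "--offset", str(offset),
    ]


print(" ".join(query_command("graphql", page=2)))
```

The resulting list can be passed to `subprocess.run` without shell quoting concerns.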
### Advanced Queries
Complex boolean search:
```bash
markitect search query "api AND (rest OR graphql) NOT deprecated"
```
Exact phrase, restricted to files:
```bash
markitect search query '"mutation resolver"' --type files
```
The full text search system provides powerful, lightweight search capabilities that scale with your MarkiTect content repository.