Added comprehensive full text search capabilities as a lightweight plugin.

Key features:
- SQLite FTS5-based search engine with no external dependencies
- Automatic indexing via database triggers for real-time updates
- Advanced query support: phrase search, boolean operators, proximity search
- Complete CLI interface with search commands
- Graceful fallback to LIKE queries when FTS5 unavailable
- Plugin architecture integration for extensibility

CLI Commands:
- `markitect search init` - Initialize search indexes
- `markitect search query` - Perform full text searches
- `markitect search status` - View index statistics
- `markitect search rebuild` - Rebuild indexes from scratch

Search Features:
- Content type filtering (files, schemas, all)
- Result pagination and formatting options
- Query validation and syntax assistance
- Performance optimization and index maintenance

Technical Implementation:
- FTSSearchPlugin: Main search plugin class
- SearchIndexer: FTS5 table management and indexing
- QueryParser: Query optimization and FTS5 syntax conversion
- Comprehensive error handling and fallback mechanisms
- 25 test cases covering all functionality

Documentation includes complete usage guide and examples.

Resolves issue #83: Full text search

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
307 lines
6.6 KiB
Markdown
# Full Text Search - Issue #83

MarkiTect provides powerful full text search capabilities using SQLite's FTS5 extension, implemented as a lightweight plugin system.

## Features

- **SQLite FTS5**: Leverages SQLite's built-in FTS5 virtual tables for high-performance search
- **No Dependencies**: Uses only SQLite, no additional search libraries required
- **Real-time Indexing**: Automatic index updates when content changes
- **Advanced Queries**: Support for phrase search, boolean operators, and proximity search
- **CLI Integration**: Complete command-line interface for search operations
- **Fallback Support**: Graceful degradation to simple LIKE queries if FTS5 unavailable

## Quick Start

### Initialize Search

First, initialize the search indexes:

```bash
markitect search init
```

This creates FTS5 virtual tables and sets up automatic indexing triggers.

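To make the idea of trigger-based indexing concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not MarkiTect's actual schema: the `files` source table, its columns, and the trigger names are assumptions made for illustration. Only the `fts_files` table name comes from this guide. The trigger pattern itself is the standard one for FTS5 external-content tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source table; the real MarkiTect schema may differ.
CREATE TABLE files (id INTEGER PRIMARY KEY, filename TEXT, content TEXT);

-- External-content FTS5 table that indexes the rows of `files`.
CREATE VIRTUAL TABLE fts_files USING fts5(
    filename, content, content='files', content_rowid='id'
);

-- Keep the index in sync on INSERT, DELETE and UPDATE.
CREATE TRIGGER files_ai AFTER INSERT ON files BEGIN
    INSERT INTO fts_files(rowid, filename, content)
    VALUES (new.id, new.filename, new.content);
END;
CREATE TRIGGER files_ad AFTER DELETE ON files BEGIN
    INSERT INTO fts_files(fts_files, rowid, filename, content)
    VALUES ('delete', old.id, old.filename, old.content);
END;
CREATE TRIGGER files_au AFTER UPDATE ON files BEGIN
    INSERT INTO fts_files(fts_files, rowid, filename, content)
    VALUES ('delete', old.id, old.filename, old.content);
    INSERT INTO fts_files(rowid, filename, content)
    VALUES (new.id, new.filename, new.content);
END;
""")

# Inserting into the source table makes the row searchable immediately.
conn.execute(
    "INSERT INTO files(filename, content) VALUES (?, ?)",
    ("readme.md", "GraphQL API documentation"),
)
hits = conn.execute(
    "SELECT rowid FROM fts_files WHERE fts_files MATCH 'graphql'"
).fetchall()

# Updating the row removes the old terms from the index.
conn.execute("UPDATE files SET content = 'REST notes' WHERE id = 1")
stale = conn.execute(
    "SELECT rowid FROM fts_files WHERE fts_files MATCH 'graphql'"
).fetchall()
```

With this layout the application never writes to the index directly, which is what allows updates to be indexed in real time.
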
### Rebuild Indexes

To rebuild indexes from scratch:

```bash
markitect search rebuild --optimize
```

### Check Status

View search system status:

```bash
markitect search status
```

### Perform Searches

Search across all content:

```bash
markitect search query "API documentation"
```

Search only files:

```bash
markitect search query "graphql" --type files --limit 5
```

Search only schemas:

```bash
markitect search query "user" --type schemas
```

## Query Syntax

### Simple Queries

```bash
# Single word - automatically adds wildcard
markitect search query "api" # Finds: api, apis, apiKey, etc.

# Multiple words - implicit AND
markitect search query "api documentation" # Finds documents with both terms
```

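The two rewriting rules above can be sketched as a small function. This is an illustration only, not the real `QueryParser`: the name `to_fts5_query` is hypothetical, and the choice to pass queries through unchanged when they already contain FTS5 syntax is an assumption.

```python
import re


def to_fts5_query(user_query: str) -> str:
    """Hypothetical helper: turn a plain user query into FTS5 MATCH syntax."""
    text = user_query.strip()
    # Assumption: queries that already use FTS5 syntax (quotes, parentheses,
    # column filters, wildcards, operators) are passed through untouched.
    if re.search(r'["():*]|\b(AND|OR|NOT|NEAR)\b', text):
        return text
    words = text.split()
    if len(words) == 1:
        # Single word: append a prefix wildcard so "api" also matches "apis".
        return f"{words[0]}*"
    # Multiple words: join with an explicit AND.
    return " AND ".join(words)


single = to_fts5_query("api")
multi = to_fts5_query("api documentation")
phrase = to_fts5_query('"GraphQL mutation"')
```

Here `single` is `api*`, `multi` is `api AND documentation`, and `phrase` is returned unchanged.
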
### Phrase Search

```bash
# Exact phrase matching
markitect search query '"GraphQL mutation"'
```

### Boolean Operators

```bash
# AND operator
markitect search query "api AND documentation"

# OR operator
markitect search query "rest OR graphql"

# NOT operator
markitect search query "api NOT deprecated"
```

### Advanced Features

```bash
# Proximity search (terms within 10 words)
markitect search query "NEAR(api documentation, 10)"

# Column-specific search
markitect search query "filename:readme"
```

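To show how these two query forms behave in FTS5 itself, here is a small self-contained sketch with Python's `sqlite3` module. The two-column table layout and the sample rows are assumptions for illustration. The `NEAR(...)` and `column:term` forms are standard FTS5 query syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per file, with a filename and a content column.
conn.execute("CREATE VIRTUAL TABLE fts_files USING fts5(filename, content)")
conn.executemany(
    "INSERT INTO fts_files(filename, content) VALUES (?, ?)",
    [
        ("readme.md", "the api documentation lives here"),
        # 17 tokens separate "api" and "documentation" in this row.
        ("guide.md", "the api is covered first " + "filler " * 12 + "then the documentation"),
    ],
)


def match(query: str) -> list[str]:
    """Return the filenames of rows matching an FTS5 query, in insertion order."""
    rows = conn.execute(
        "SELECT filename FROM fts_files WHERE fts_files MATCH ? ORDER BY rowid",
        (query,),
    )
    return [name for (name,) in rows]


near_hits = match("NEAR(api documentation, 10)")  # terms close together only
column_hits = match("filename:guide")             # restrict to one column
and_hits = match("api AND documentation")         # distance is ignored
```

Only `readme.md` satisfies the proximity query, while the plain `AND` query matches both rows.
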
## CLI Commands

### `markitect search init`

Initialize search indexes and FTS5 tables.

**Options:**
- `--rebuild` - Rebuild existing indexes during initialization

**Examples:**
```bash
markitect search init
markitect search init --rebuild
```

### `markitect search query`

Perform full text search queries.

**Arguments:**
- `QUERY` - Search query string

**Options:**
- `--type [all|files|schemas]` - Content type to search (default: all)
- `--limit INTEGER` - Maximum number of results (default: 20)
- `--offset INTEGER` - Result offset for pagination (default: 0)
- `--format [table|json|yaml]` - Output format (default: table)
- `--no-highlight` - Disable result highlighting

**Examples:**
```bash
markitect search query "documentation"
markitect search query "api" --type files --limit 10
markitect search query "schema" --format json
markitect search query "user" --offset 20 --limit 10 # Pagination
```

### `markitect search status`

Show search index status and statistics.

**Options:**
- `--format [table|json|yaml]` - Output format (default: table)

**Examples:**
```bash
markitect search status
markitect search status --format json
```

### `markitect search rebuild`

Rebuild search indexes from scratch.

**Options:**
- `--optimize` - Optimize indexes after rebuild

**Examples:**
```bash
markitect search rebuild
markitect search rebuild --optimize
```

## Architecture

### Plugin System

The search functionality is implemented as a plugin within MarkiTect's plugin architecture:

- **FTSSearchPlugin**: Main search plugin class
- **SearchIndexer**: Handles FTS5 table creation and maintenance
- **QueryParser**: Parses and optimizes search queries

### Database Integration

- **FTS5 Virtual Tables**: `fts_files` and `fts_schemas` for content indexing
- **Automatic Triggers**: Database triggers keep indexes synchronized
- **Fallback Queries**: LIKE-based search when FTS5 unavailable

### Search Process

1. **Indexing**: Content automatically indexed via database triggers
2. **Query Parsing**: User queries converted to FTS5-compatible syntax
3. **Search Execution**: FTS5 performs ranked full text search
4. **Result Processing**: Results formatted with highlighting and metadata
5. **Fallback**: Simple LIKE queries if FTS5 fails

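Steps 3 to 5 of the search process can be sketched as follows. This is a simplified illustration, not the plugin's code: the `search_files` function, the `files` table, and the column layout are assumptions. `highlight()` and the `rank` column are built-in FTS5 features.

```python
import sqlite3


def search_files(conn, query, limit=20, offset=0):
    """Hypothetical helper: ranked FTS5 search with a LIKE fallback."""
    try:
        rows = conn.execute(
            """
            SELECT filename, highlight(fts_files, 1, '[', ']')
            FROM fts_files
            WHERE fts_files MATCH ?
            ORDER BY rank
            LIMIT ? OFFSET ?
            """,
            (query, limit, offset),
        ).fetchall()
        return "fts5", rows
    except sqlite3.OperationalError:
        # FTS5 is missing or rejected the query: fall back to substring search.
        rows = conn.execute(
            "SELECT filename, content FROM files WHERE content LIKE ? LIMIT ? OFFSET ?",
            (f"%{query}%", limit, offset),
        ).fetchall()
        return "like", rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (filename TEXT, content TEXT)")
conn.execute("INSERT INTO files VALUES ('api.md', 'GraphQL mutation resolver')")
try:
    conn.execute("CREATE VIRTUAL TABLE fts_files USING fts5(filename, content)")
    conn.execute(
        "INSERT INTO fts_files(filename, content) SELECT filename, content FROM files"
    )
except sqlite3.OperationalError:
    pass  # This SQLite build has no FTS5; search_files will use LIKE.

backend, rows = search_files(conn, "mutation")
```

When FTS5 is available the matched term comes back wrapped in the highlight markers. In the fallback path the raw content is returned without ranking or highlighting.
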
## Performance Considerations

### Index Optimization

```bash
# Periodically optimize indexes for better performance
markitect search rebuild --optimize
```

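At the SQLite level, rebuilding and optimizing an FTS5 index are special `INSERT` commands. The sketch below shows them directly. Whether `markitect search rebuild --optimize` issues exactly these statements is an assumption, and the sample table contents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_files USING fts5(filename, content)")
conn.executemany(
    "INSERT INTO fts_files(filename, content) VALUES (?, ?)",
    [(f"doc{i}.md", f"api note {i}") for i in range(100)],
)

# 'rebuild' discards the index and re-creates it from the stored content.
conn.execute("INSERT INTO fts_files(fts_files) VALUES ('rebuild')")
# 'optimize' merges the index's internal b-trees into a single structure.
conn.execute("INSERT INTO fts_files(fts_files) VALUES ('optimize')")
conn.commit()

# The index still answers queries after maintenance.
count = conn.execute(
    "SELECT count(*) FROM fts_files WHERE fts_files MATCH 'api'"
).fetchone()[0]
```

Both commands can be slow on large indexes, which is one reason to run them periodically rather than after every change.
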
### Query Performance

- Use specific content types (`--type files`) when possible
- Limit results with `--limit` for large result sets
- Use phrase queries for exact matches
- Prefer explicit boolean operators (`AND`, `OR`, `NOT`) over long natural-language queries

### Storage Impact

- FTS5 indexes require additional disk space (typically 30-50% of content size)
- Indexes are maintained automatically; no manual intervention is needed
- Use `markitect search status` to monitor index sizes

## Troubleshooting

### FTS5 Not Available

If SQLite doesn't have FTS5 support:

```bash
markitect search status
# Shows: FTS5 Full Text Search: Disabled
```

The system automatically falls back to simple LIKE-based search.

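If you want to check FTS5 support yourself, one common approach is to try creating a throwaway FTS5 table. The sketch below uses Python's `sqlite3` module. The function name and the probe table name are hypothetical, and this is not necessarily how MarkiTect performs its own check.

```python
import sqlite3


def fts5_available(conn: sqlite3.Connection) -> bool:
    """Return True if this SQLite build can create FTS5 virtual tables."""
    try:
        # Create the probe in the temp schema so the main database is untouched.
        conn.execute("CREATE VIRTUAL TABLE temp.fts5_probe USING fts5(x)")
        conn.execute("DROP TABLE temp.fts5_probe")
        return True
    except sqlite3.OperationalError:
        # Typically "no such module: fts5".
        return False


conn = sqlite3.connect(":memory:")
supported = fts5_available(conn)
print("FTS5 available:", supported)
```

Note that the result describes the SQLite library linked into the Python interpreter that runs the check, which may differ from the `sqlite3` command-line tool on the same machine.
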
### Database Lock Errors

If you see database lock errors:

```bash
# Wait for other operations to complete, then retry
markitect search rebuild
```

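For scripts that access the database directly, the same wait-and-retry advice can be automated. The following is a sketch under assumptions: `run_with_retry` is a hypothetical helper, and the retry counts and delays are arbitrary. The `timeout` argument of `sqlite3.connect` is a real parameter that makes SQLite wait for a lock before raising an error.

```python
import sqlite3
import time


def run_with_retry(db_path, sql, attempts=5, delay=0.5):
    """Hypothetical helper: retry a statement while the database is locked."""
    for attempt in range(attempts):
        # timeout= lets SQLite itself wait up to 5 seconds for a lock.
        conn = sqlite3.connect(db_path, timeout=5.0)
        try:
            rows = conn.execute(sql).fetchall()
            conn.commit()
            return rows
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not a lock error, or the final failure.
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear backoff
        finally:
            conn.close()


result = run_with_retry(":memory:", "SELECT 1")
```

Long-running writers are the usual cause of lock errors, so a generous timeout is often enough without any explicit retry loop.
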
### Index Corruption

To fix corrupted indexes:

```bash
# Rebuild from scratch
markitect search rebuild --optimize
```

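Before rebuilding, it can be useful to confirm that an index is actually damaged. FTS5 provides an `integrity-check` command for this. The sketch below wraps it in a hypothetical `index_is_healthy` helper and assumes direct access to the database. It is not a MarkiTect command.

```python
import sqlite3


def index_is_healthy(conn, table="fts_files"):
    """Hypothetical helper: run FTS5's integrity-check on one index.

    The table name is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    try:
        conn.execute(f"INSERT INTO {table}({table}) VALUES ('integrity-check')")
        return True
    except sqlite3.DatabaseError:
        # SQLite reports an inconsistent index as a database error.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_files USING fts5(filename, content)")
conn.execute("INSERT INTO fts_files(filename, content) VALUES ('a.md', 'api notes')")
healthy = index_is_healthy(conn)
```

If the check fails, rebuilding the index from scratch as shown above is the usual remedy.
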
### No Results Found

Check if content is indexed:

```bash
markitect search status
# Check document counts for fts_files and fts_schemas
```

If no documents are indexed:

```bash
markitect search rebuild
```

## Integration with GraphQL

The search functionality integrates with MarkiTect's GraphQL interface through the existing search resolver, providing both FTS5-powered and fallback search capabilities through the GraphQL API.

## Examples

### Content Discovery

Find all API-related documentation:

```bash
markitect search query "api documentation" --limit 10
```

### Schema Exploration

Find user-related schemas:

```bash
markitect search query "user" --type schemas --format json
```

### Comprehensive Search

Search with pagination:

```bash
# First page
markitect search query "graphql" --limit 5 --offset 0

# Second page
markitect search query "graphql" --limit 5 --offset 5
```

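The limit/offset pattern above generalizes to a loop that keeps fetching until a page comes back short. The sketch below is independent of MarkiTect: `iter_pages` and the `fetch` callback are hypothetical, and the callback stands in for whatever actually runs the search.

```python
from typing import Callable, Iterator, Sequence


def iter_pages(
    fetch: Callable[[int, int], Sequence], limit: int = 5
) -> Iterator[Sequence]:
    """Yield result pages by advancing the offset until results run out."""
    offset = 0
    while True:
        page = fetch(limit, offset)
        if not page:
            return
        yield page
        if len(page) < limit:
            return  # a short page means there is nothing left
        offset += limit


# Stand-in for a real search: 12 fake results, sliced like LIMIT/OFFSET.
results = [f"doc{i}.md" for i in range(12)]
pages = list(iter_pages(lambda limit, offset: results[offset:offset + limit]))
page_sizes = [len(page) for page in pages]
```

With 12 results and a limit of 5, this produces pages of 5, 5 and 2 items.
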
### Advanced Queries

Complex boolean search:

```bash
markitect search query "api AND (rest OR graphql) NOT deprecated"
```

Exact phrase with context:

```bash
markitect search query '"mutation resolver"' --type files
```

The full text search system provides powerful, lightweight search capabilities that scale with your MarkiTect content repository.