markitect-main/docs/search.md
tegwick 8179929a4a feat: implement lightweight full text search plugin using SQLite FTS5 (issue #83)
Added comprehensive full text search capabilities as a lightweight plugin.

Key features:
- SQLite FTS5-based search engine with no external dependencies
- Automatic indexing via database triggers for real-time updates
- Advanced query support: phrase search, boolean operators, proximity search
- Complete CLI interface with search commands
- Graceful fallback to LIKE queries when FTS5 unavailable
- Plugin architecture integration for extensibility

CLI Commands:
- `markitect search init` - Initialize search indexes
- `markitect search query` - Perform full text searches
- `markitect search status` - View index statistics
- `markitect search rebuild` - Rebuild indexes from scratch

Search Features:
- Content type filtering (files, schemas, all)
- Result pagination and formatting options
- Query validation and syntax assistance
- Performance optimization and index maintenance

Technical Implementation:
- FTSSearchPlugin: Main search plugin class
- SearchIndexer: FTS5 table management and indexing
- QueryParser: Query optimization and FTS5 syntax conversion
- Comprehensive error handling and fallback mechanisms
- 25 test cases covering all functionality

Documentation includes complete usage guide and examples.

Resolves issue #83: Full text search

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-03 17:03:11 +02:00


Full Text Search - Issue #83

MarkiTect provides full text search over files and schemas using SQLite's FTS5 extension, implemented as a lightweight plugin.

Features

  • SQLite FTS5: Uses SQLite's FTS5 virtual tables (included in most SQLite builds) for fast, ranked search
  • No Dependencies: Uses only SQLite; no additional search libraries are required
  • Real-time Indexing: Indexes update automatically when content changes
  • Advanced Queries: Support for phrase search, boolean operators, and proximity search
  • CLI Integration: Complete command-line interface for search operations
  • Fallback Support: Graceful degradation to simple LIKE queries if FTS5 is unavailable

Quick Start

First, initialize the search indexes:

markitect search init

This creates FTS5 virtual tables and sets up automatic indexing triggers.
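The snippet below is a minimal sketch of what `search init` might set up. It assumes a hypothetical `files(id, filename, content)` table, and MarkiTect's actual schema, column names and trigger names may differ. It pairs an FTS5 external-content table with the three sync triggers that the SQLite FTS5 documentation recommends for that setup.

```python
import sqlite3

# Hypothetical schema: the real MarkiTect tables and columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    content TEXT NOT NULL
);

-- External-content FTS5 table: it stores only the index and reads text from `files`.
CREATE VIRTUAL TABLE IF NOT EXISTS fts_files USING fts5(
    filename, content, content='files', content_rowid='id'
);

-- Triggers keep the index in sync with INSERT, DELETE and UPDATE on `files`.
CREATE TRIGGER IF NOT EXISTS files_ai AFTER INSERT ON files BEGIN
    INSERT INTO fts_files(rowid, filename, content)
    VALUES (new.id, new.filename, new.content);
END;
CREATE TRIGGER IF NOT EXISTS files_ad AFTER DELETE ON files BEGIN
    INSERT INTO fts_files(fts_files, rowid, filename, content)
    VALUES ('delete', old.id, old.filename, old.content);
END;
CREATE TRIGGER IF NOT EXISTS files_au AFTER UPDATE ON files BEGIN
    INSERT INTO fts_files(fts_files, rowid, filename, content)
    VALUES ('delete', old.id, old.filename, old.content);
    INSERT INTO fts_files(rowid, filename, content)
    VALUES (new.id, new.filename, new.content);
END;
"""


def init_search(conn: sqlite3.Connection) -> None:
    """Create the FTS5 table and its sync triggers; safe to run more than once."""
    conn.executescript(SCHEMA)


def match_ids(conn: sqlite3.Connection, query: str) -> list:
    """Return the rowids of `files` rows whose indexed text matches `query`."""
    rows = conn.execute(
        "SELECT rowid FROM fts_files WHERE fts_files MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
init_search(conn)
conn.execute("INSERT INTO files (filename, content) VALUES (?, ?)",
             ("readme.md", "GraphQL API documentation"))
print(match_ids(conn, "graphql"))  # [1]: indexed by the INSERT trigger
conn.execute("UPDATE files SET content = 'REST API notes' WHERE id = 1")
print(match_ids(conn, "graphql"), match_ids(conn, "rest"))  # [] [1]
```

The external-content design avoids storing every document twice. The cost is that the index is only correct while the triggers (or a rebuild) keep it in step with `files`.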

Rebuild Indexes

To rebuild indexes from scratch:

markitect search rebuild --optimize

Check Status

View search system status:

markitect search status

Perform Searches

Search across all content:

markitect search query "API documentation"

Search only files:

markitect search query "graphql" --type files --limit 5

Search only schemas:

markitect search query "user" --type schemas

Query Syntax

Simple Queries

# Single word - automatically adds wildcard
markitect search query "api"        # Finds: api, apis, apiKey, etc.

# Multiple words - implicit AND
markitect search query "api documentation"  # Finds documents with both terms

# Exact phrase matching
markitect search query '"GraphQL mutation"'
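The sketch below shows one plausible way a `QueryParser` could apply these rules. The function name `to_fts5_query` is hypothetical, and so are its exact rules: pass through anything that already looks like FTS5 syntax, quote plain terms so punctuation cannot break the query, and add `*` to a lone word. None of this is MarkiTect's documented behavior.

```python
import re

# Hypothetical rule: quotes, uppercase operators or a column filter mean "raw FTS5".
_FTS5_SYNTAX = re.compile(r'"|\b(?:AND|OR|NOT|NEAR)\b|:')


def to_fts5_query(user_query: str) -> str:
    """Convert a simple user query into an FTS5 MATCH expression (sketch)."""
    query = user_query.strip()
    if not query:
        raise ValueError("search query must not be empty")
    if _FTS5_SYNTAX.search(query):
        return query  # explicit FTS5 syntax is passed through unchanged
    # Quote each term so characters such as '-' or '.' are not parsed as syntax.
    terms = [f'"{term}"' for term in query.split()]
    if len(terms) == 1:
        return terms[0] + "*"  # single word: prefix match (api -> api, apis, apikey)
    return " AND ".join(terms)  # several words: every term must appear


print(to_fts5_query("api"))                 # "api"*
print(to_fts5_query("api documentation"))   # "api" AND "documentation"
print(to_fts5_query('"GraphQL mutation"'))  # "GraphQL mutation"
```

Quoting is a defensive choice. FTS5 barewords are limited to letters, digits, underscores and non-ASCII characters, so an unquoted term such as `api-docs` would typically be rejected as a syntax error. FTS5's default tokenizer also folds case, which is why `apiKey` is matched by the prefix `api`.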

Boolean Operators

# AND operator
markitect search query "api AND documentation"

# OR operator
markitect search query "rest OR graphql"

# NOT operator
markitect search query "api NOT deprecated"
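These operators are FTS5's own query syntax, so you can check their behavior directly against SQLite. The demo uses a throwaway `docs` table, not MarkiTect's real tables. Note that in FTS5 `NOT` binds tighter than `AND`, which binds tighter than `OR`. `NOT` is also a binary operator, so a query cannot begin with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Standalone FTS5 table used only for this demo (not MarkiTect's schema).
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
conn.executemany("INSERT INTO docs (rowid, body) VALUES (?, ?)", [
    (1, "REST api documentation"),
    (2, "GraphQL api guide, deprecated"),
    (3, "GraphQL api documentation"),
    (4, "release notes"),
])


def search(expr: str) -> list:
    """Return the rowids matching a raw FTS5 expression."""
    return [r[0] for r in conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid", (expr,))]


print(search("api AND documentation"))                    # [1, 3]
print(search("rest OR graphql"))                          # [1, 2, 3]
print(search("api NOT deprecated"))                       # [1, 3]
print(search("api AND (rest OR graphql) NOT deprecated"))  # [1, 3]
```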

Advanced Features

# Proximity search (at most 10 tokens between the terms)
markitect search query "NEAR(api documentation, 10)"

# Column-specific search
markitect search query "filename:readme"
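Proximity and column filters are also plain FTS5 syntax. The sketch below assumes the index has a column named `filename`, as the example query implies. The demo table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Demo table; the `filename` column name is an assumption about the real index.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(filename, body)")
conn.executemany("INSERT INTO docs (rowid, filename, body) VALUES (?, ?, ?)", [
    (1, "readme.md", "The api has documentation online"),
    (2, "guide.md", "api " + "filler " * 20 + "documentation"),
    (3, "notes.md", "see the readme for setup"),
])


def search(expr: str) -> list:
    """Return the rowids matching a raw FTS5 expression."""
    return [r[0] for r in conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid", (expr,))]


print(search("NEAR(api documentation, 10)"))  # [1]: row 2 has 20 tokens in between
print(search("readme"))                       # [1, 3]: any column
print(search("filename:readme"))              # [1]: filename column only
```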

CLI Commands

markitect search init

Initialize search indexes and FTS5 tables.

Options:

  • --rebuild - Rebuild existing indexes during initialization

Examples:

markitect search init
markitect search init --rebuild

markitect search query

Perform full text search queries.

Arguments:

  • QUERY - Search query string

Options:

  • --type [all|files|schemas] - Content type to search (default: all)
  • --limit INTEGER - Maximum number of results (default: 20)
  • --offset INTEGER - Result offset for pagination (default: 0)
  • --format [table|json|yaml] - Output format (default: table)
  • --no-highlight - Disable result highlighting

Examples:

markitect search query "documentation"
markitect search query "api" --type files --limit 10
markitect search query "schema" --format json
markitect search query "user" --offset 20 --limit 10  # Pagination
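The `--limit`, `--offset` and highlighting options map naturally onto SQL `LIMIT`/`OFFSET`, FTS5's built-in `rank` column (BM25 by default) and its `highlight()` auxiliary function. The sketch below assumes that mapping rather than reproducing MarkiTect's source. The `query` helper, the `docs` table and the `[`/`]` markers are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(filename, body)")
conn.executemany(
    "INSERT INTO docs (filename, body) VALUES (?, ?)",
    [(f"doc{i}.md", f"user guide part {i}") for i in range(25)],
)


def query(expr: str, limit: int = 20, offset: int = 0, highlight: bool = True) -> list:
    """Ranked, paginated search mirroring --limit, --offset and --no-highlight (sketch)."""
    # highlight(table, column_index, open, close) wraps matched tokens in column 1 (body).
    body = "highlight(docs, 1, '[', ']')" if highlight else "body"
    sql = (
        f"SELECT filename, {body} FROM docs WHERE docs MATCH ? "
        "ORDER BY rank, rowid LIMIT ? OFFSET ?"  # rowid breaks ties between equal scores
    )
    return conn.execute(sql, (expr, limit, offset)).fetchall()


page_two = query("user", limit=10, offset=10)
print(page_two[0])  # ('doc10.md', '[user] guide part 10')
```

The secondary `rowid` sort matters for pagination. Without it, rows with equal scores may come back in any order, so consecutive pages could overlap or skip results.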

markitect search status

Show search index status and statistics.

Options:

  • --format [table|json|yaml] - Output format (default: table)

Examples:

markitect search status
markitect search status --format json
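A query like the one below could back `search status --format json`. It is a sketch under two assumptions. First, it counts rows in FTS5's `<table>_docsize` shadow table, which holds one row per indexed document while the default `columnsize=1` option is in effect. Second, the JSON field names are invented for illustration.

```python
import json
import sqlite3


def search_status(conn: sqlite3.Connection, tables=("fts_files", "fts_schemas")) -> dict:
    """Report whether each FTS5 table exists and how many documents it has indexed."""
    status = {}
    for name in tables:
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
        ).fetchone() is not None
        documents = None
        if exists:
            # %_docsize holds one row per indexed document (default columnsize=1).
            documents = conn.execute(f'SELECT count(*) FROM "{name}_docsize"').fetchone()[0]
        status[name] = {"exists": exists, "documents": documents}
    return status


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_files USING fts5(filename, content)")
conn.executemany("INSERT INTO fts_files (filename, content) VALUES (?, ?)",
                 [("a.md", "api notes"), ("b.md", "graphql schema")])
conn.commit()
print(json.dumps(search_status(conn), indent=2))
```

Counting `_docsize` rather than the FTS table itself matters for external-content indexes. A plain `SELECT count(*)` on such a table reads the content table, so it reports documents whether or not they were ever indexed.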

markitect search rebuild

Rebuild search indexes from scratch.

Options:

  • --optimize - Optimize indexes after rebuild

Examples:

markitect search rebuild
markitect search rebuild --optimize
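FTS5 exposes both operations as special `INSERT` commands on the table itself. `'rebuild'` discards the index and re-reads every row from the content table, and `'optimize'` merges the index b-trees into one. The sketch below shows how `search rebuild [--optimize]` might issue them. It assumes an external-content `fts_files` table backed by a hypothetical `files` table.

```python
import sqlite3


def rebuild_index(conn: sqlite3.Connection, table: str = "fts_files", optimize: bool = False) -> None:
    """Rebuild (and optionally optimize) an FTS5 index using FTS5's special commands."""
    # `table` is interpolated, so it must be a trusted identifier, never user input.
    conn.execute(f"INSERT INTO {table}({table}) VALUES ('rebuild')")
    if optimize:
        conn.execute(f"INSERT INTO {table}({table}) VALUES ('optimize')")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, filename TEXT, content TEXT);
CREATE VIRTUAL TABLE fts_files USING fts5(
    filename, content, content='files', content_rowid='id'
);
""")
# No sync triggers here, so this row is stored but not yet indexed.
conn.execute("INSERT INTO files (filename, content) VALUES ('a.md', 'graphql schema')")


def hits(query: str) -> list:
    return [r[0] for r in conn.execute(
        "SELECT rowid FROM fts_files WHERE fts_files MATCH ?", (query,))]


print(hits("graphql"))  # [] before the rebuild
rebuild_index(conn, optimize=True)
print(hits("graphql"))  # [1] after the rebuild
```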

Architecture

Plugin System

The search functionality is implemented as a plugin within MarkiTect's plugin architecture:

  • FTSSearchPlugin: Main search plugin class
  • SearchIndexer: Handles FTS5 table creation and maintenance
  • QueryParser: Parses and optimizes search queries

Database Integration

  • FTS5 Virtual Tables: fts_files and fts_schemas for content indexing
  • Automatic Triggers: Database triggers keep indexes synchronized
  • Fallback Queries: LIKE-based search when FTS5 is unavailable

Search Process

  1. Indexing: Content automatically indexed via database triggers
  2. Query Parsing: User queries converted to FTS5-compatible syntax
  3. Search Execution: FTS5 performs ranked full text search
  4. Result Processing: Results formatted with highlighting and metadata
  5. Fallback: Simple LIKE queries if FTS5 fails
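Steps 3 to 5 can be condensed into a single function. This is a sketch, not MarkiTect's implementation. It assumes a hypothetical `files(id, filename, content)` table and an `fts_files` index with a `filename` column. It treats any `sqlite3.OperationalError` (a missing table or an FTS5 syntax error) as the signal to fall back.

```python
import sqlite3


def search(conn: sqlite3.Connection, user_query: str, limit: int = 20) -> tuple:
    """Return (engine, filenames): FTS5 ranked results, or LIKE results as a fallback."""
    try:
        rows = conn.execute(
            "SELECT filename FROM fts_files WHERE fts_files MATCH ? ORDER BY rank LIMIT ?",
            (user_query, limit),
        ).fetchall()
        return "fts5", [row[0] for row in rows]
    except sqlite3.OperationalError:
        # e.g. "no such table: fts_files" or an "fts5: syntax error" message.
        # Note: '%' and '_' in user_query act as LIKE wildcards in this simple sketch.
        pattern = f"%{user_query}%"
        rows = conn.execute(
            "SELECT filename FROM files WHERE filename LIKE ? OR content LIKE ? "
            "ORDER BY id LIMIT ?",
            (pattern, pattern, limit),
        ).fetchall()
        return "like", [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, filename TEXT, content TEXT)")
conn.execute("INSERT INTO files (filename, content) VALUES ('api.md', 'GraphQL API docs')")
print(search(conn, "api"))  # ('like', ['api.md']): fts_files does not exist yet
```

SQLite's `LIKE` is case-insensitive only for ASCII letters and cannot rank results, which is why the fallback simply orders by `id`.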

Performance Considerations

Index Optimization

# Periodically optimize indexes for better performance
markitect search rebuild --optimize

Query Performance

  • Use specific content types (--type files) when possible
  • Limit results with --limit for large result sets
  • Use phrase queries for exact matches
  • Prefer precise boolean expressions to long natural-language queries, since every extra word becomes another required term

Storage Impact

  • FTS5 indexes require additional disk space (typically 30-50% of content size)
  • Indexes are maintained automatically by triggers; apart from occasional optimization, no manual intervention is needed
  • Use markitect search status to monitor index sizes
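To see how much space the index itself uses, one option is to sum the BLOBs in FTS5's `<table>_data` shadow table, which stores the index segments. This is only an approximation, because it ignores page overhead and the other shadow tables. The helper name is hypothetical.

```python
import sqlite3


def fts_index_bytes(conn: sqlite3.Connection, table: str = "fts_files") -> int:
    """Approximate FTS5 index size: total bytes stored in the %_data shadow table."""
    row = conn.execute(
        f'SELECT coalesce(sum(length(block)), 0) FROM "{table}_data"'
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_files USING fts5(filename, content)")
empty = fts_index_bytes(conn)
conn.executemany(
    "INSERT INTO fts_files (filename, content) VALUES (?, ?)",
    [(f"doc{i}.md", f"api documentation section {i}") for i in range(100)],
)
conn.commit()  # committing ensures FTS5's buffered index data is written to %_data
print(empty, fts_index_bytes(conn))
```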

Troubleshooting

FTS5 Not Available

If your SQLite build lacks FTS5 support, the status command reports it as disabled:

markitect search status
# Shows: FTS5 Full Text Search: Disabled

The system automatically falls back to simple LIKE-based search.
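One way to detect FTS5 up front, in the spirit of what the status command reports, is to try creating a throwaway FTS5 table in the `temp` schema. The probe approach and the helper name are assumptions for illustration. Checking `PRAGMA compile_options` for `ENABLE_FTS5` is an alternative, but it misses FTS5 loaded as a run-time extension.

```python
import sqlite3


def fts5_available(conn: sqlite3.Connection) -> bool:
    """Return True if this SQLite build can create FTS5 tables."""
    try:
        # A throwaway table in the temp schema leaves the main database untouched.
        conn.execute("CREATE VIRTUAL TABLE temp.fts5_probe USING fts5(x)")
        conn.execute("DROP TABLE temp.fts5_probe")
        return True
    except sqlite3.OperationalError:  # e.g. "no such module: fts5"
        return False


conn = sqlite3.connect(":memory:")
print("FTS5 Full Text Search:", "Enabled" if fts5_available(conn) else "Disabled")
```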

Database Lock Errors

If you see database lock errors:

# Wait for other operations to complete, then retry
markitect search rebuild
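Python's `sqlite3.connect()` accepts a `timeout` argument (in seconds, default 5.0) that makes SQLite wait for a lock instead of failing immediately. For longer jobs such as a rebuild, a retry wrapper with exponential backoff is another option. The sketch below is hypothetical and is not part of MarkiTect's CLI.

```python
import sqlite3
import time


def with_lock_retry(fn, attempts: int = 5, delay: float = 0.1):
    """Call fn(), retrying with exponential backoff while SQLite reports a lock."""
    for attempt in range(attempts):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not a lock error, or the final failure.
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay * 2 ** attempt)


# Simulated operation that is locked twice before succeeding.
calls = []


def flaky_rebuild():
    calls.append(1)
    if len(calls) < 3:
        raise sqlite3.OperationalError("database is locked")
    return "rebuilt"


print(with_lock_retry(flaky_rebuild, delay=0.01))  # rebuilt
print(len(calls))  # 3
```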

Index Corruption

To fix corrupted indexes:

# Rebuild from scratch
markitect search rebuild --optimize
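Before rebuilding, FTS5's built-in `'integrity-check'` command can confirm whether the index is actually damaged, because it raises an error when the index structures are inconsistent. The helper below is a hypothetical wrapper around that command, shown on a fresh table.

```python
import sqlite3


def index_is_healthy(conn: sqlite3.Connection, table: str = "fts_files") -> bool:
    """Run FTS5's 'integrity-check' command; False means a rebuild is warranted."""
    try:
        conn.execute(f"INSERT INTO {table}({table}) VALUES ('integrity-check')")
        return True
    except sqlite3.OperationalError:
        raise  # e.g. a missing table: an error, but not a corruption signal
    except sqlite3.DatabaseError:
        return False  # index corruption surfaces as a DatabaseError


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_files USING fts5(filename, content)")
conn.execute("INSERT INTO fts_files (filename, content) VALUES ('a.md', 'api notes')")
conn.commit()
print(index_is_healthy(conn))  # True
```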

No Results Found

Check if content is indexed:

markitect search status
# Check document counts for fts_files and fts_schemas

If no documents are indexed:

markitect search rebuild

Integration with GraphQL

The search functionality also integrates with MarkiTect's GraphQL interface: the existing search resolver exposes both FTS5-powered search and the LIKE-based fallback through the GraphQL API.

Examples

Content Discovery

Find all API-related documentation:

markitect search query "api documentation" --limit 10

Schema Exploration

Find user-related schemas:

markitect search query "user" --type schemas --format json

Search with pagination:

# First page
markitect search query "graphql" --limit 5 --offset 0

# Second page
markitect search query "graphql" --limit 5 --offset 5

Advanced Queries

Complex boolean search:

markitect search query "api AND (rest OR graphql) NOT deprecated"

Exact phrase with context:

markitect search query '"mutation resolver"' --type files

The full text search system adds fast, lightweight search to a MarkiTect content repository with no dependencies beyond SQLite.