repo-scoping/docs/semantic-retrieval.md
2026-04-26 16:05:27 +02:00


# Semantic Retrieval Notes
T02 introduces semantic retrieval as an optional layer above the existing SQLite
text search. The default service path remains text-only so existing callers keep
stable result sets and ordering.
## Local provider
`HashingEmbeddingProvider` is the offline provider used for tests and local
development. It produces deterministic token-bucket vectors without any network
dependency. Configure it with:
```bash
REPO_REGISTRY_EMBEDDING_PROVIDER=hashing
```
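
These notes do not describe the exact bucketing scheme of `HashingEmbeddingProvider`, so what follows is only a minimal sketch of the general feature-hashing technique. It assumes lowercase word tokens, a hypothetical 256-bucket vector, and L2 normalization. `HashingEmbeddingSketch` and `provider_from_env` are illustrative names, not the registry's actual API.

```python
import hashlib
import math
import os
import re
from typing import List, Optional


class HashingEmbeddingSketch:
    """Hypothetical stand-in for HashingEmbeddingProvider (feature hashing)."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim  # assumed bucket count, not the real provider's setting

    def embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"\w+", text.lower()):
            # Stable digest so the same token lands in the same bucket in every process.
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            vec[int.from_bytes(digest, "big") % self.dim] += 1.0
        # L2-normalize so cosine similarity reduces to a dot product.
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec


def provider_from_env() -> Optional[HashingEmbeddingSketch]:
    """Hypothetical helper that reads REPO_REGISTRY_EMBEDDING_PROVIDER."""
    name = os.environ.get("REPO_REGISTRY_EMBEDDING_PROVIDER", "").strip().lower()
    if not name:
        return None  # unset: keep the text-only default path
    if name == "hashing":
        return HashingEmbeddingSketch()
    raise ValueError(f"unknown embedding provider: {name!r}")
```

The sketch uses a BLAKE2 digest rather than Python's built-in `hash()` because CPython randomizes string hashing per process unless `PYTHONHASHSEED` is fixed. That randomization would break the determinism tests rely on.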
When enabled, search combines:
- text match score from the existing SQLite search path
- vector score from approved ability/capability entries and content chunks
- approved confidence as a small ranking prior
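
The notes do not spell out the rank formula. Below is a minimal sketch of one way to combine the three signals. It assumes text scores are already normalized to [0, 1], uses cosine similarity for the vector score, and applies hypothetical linear weights that keep confidence a small prior. `Candidate` and `rank` are illustrative names only.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical weights: the real hybrid formula is not given in these notes.
TEXT_WEIGHT = 0.6
VECTOR_WEIGHT = 0.35
CONFIDENCE_WEIGHT = 0.05  # kept small so confidence acts only as a prior


@dataclass
class Candidate:
    source_id: int
    text_score: float                     # assumed normalized to [0, 1]
    confidence: float                     # approved confidence, assumed in [0, 1]
    vector: Optional[List[float]] = None  # stored embedding, if any


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(candidates: List[Candidate],
         query_vector: Optional[List[float]] = None) -> List[Candidate]:
    if query_vector is None:
        # Text-only default: return the incoming text-search order unchanged.
        return list(candidates)

    def hybrid(c: Candidate) -> float:
        vec = cosine(query_vector, c.vector) if c.vector is not None else 0.0
        return (TEXT_WEIGHT * c.text_score
                + VECTOR_WEIGHT * max(vec, 0.0)  # ignore negative similarity
                + CONFIDENCE_WEIGHT * c.confidence)

    # sorted() is stable (also with reverse=True), so ties keep text-search order.
    return sorted(candidates, key=hybrid, reverse=True)
```

With no query vector, `rank` returns candidates in their incoming order, which mirrors the text-only default path. Because the sort is stable, equal hybrid scores also fall back to the text-search order.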
## PostgreSQL / pgvector path
SQLite dev mode should remain the lowest-friction path. A production PostgreSQL
deployment can add pgvector without changing the registry API by introducing an
embedding table keyed by source entity:
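```sql
-- pgvector must be enabled once per database before the vector type exists.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE registry_embeddings (
    id bigserial PRIMARY KEY,
    repository_id bigint NOT NULL,
    source_table text NOT NULL,
    source_id bigint NOT NULL,
    provider text NOT NULL,
    vector vector(768) NOT NULL,  -- dimension must match the provider's output
    updated_at timestamptz NOT NULL DEFAULT now(),
    UNIQUE (source_table, source_id, provider)
);

-- Optional ANN index for cosine-distance (<=>) queries; HNSW needs pgvector 0.5.0+.
CREATE INDEX registry_embeddings_vector_idx
    ON registry_embeddings USING hnsw (vector vector_cosine_ops);
```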
```
The search service can then replace runtime embedding of stored text with indexed
nearest-neighbor lookup, while retaining the current hybrid rank formula and the
same response schema.
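
To illustrate what the table enables, here is an in-memory stand-in for `registry_embeddings`. The source table names in the usage are hypothetical. Rows are keyed like the `UNIQUE (source_table, source_id, provider)` constraint. `nearest` performs an exact cosine-distance scan, roughly what `ORDER BY vector <=> $1 LIMIT k` computes in pgvector. An HNSW or IVFFlat index makes that lookup approximate but much faster. `EmbeddingStoreSketch` is an illustrative name, not part of the registry.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

# (source_table, source_id, provider): mirrors the UNIQUE constraint above.
Key = Tuple[str, int, str]


class EmbeddingStoreSketch:
    """In-memory stand-in for registry_embeddings (illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[Key, Tuple[int, List[float]]] = {}  # key -> (repository_id, vector)

    def upsert(self, repository_id: int, source_table: str, source_id: int,
               provider: str, vector: Sequence[float]) -> None:
        # Like INSERT ... ON CONFLICT (source_table, source_id, provider) DO UPDATE.
        self._rows[(source_table, source_id, provider)] = (repository_id, list(vector))

    def nearest(self, query: Sequence[float], provider: str, k: int = 5,
                repository_id: Optional[int] = None) -> List[Tuple[str, int, float]]:
        q_norm = math.sqrt(sum(x * x for x in query))

        def cosine_distance(v: List[float]) -> float:
            norm = q_norm * math.sqrt(sum(x * x for x in v))
            if not norm:
                return 1.0  # sketch choice: treat zero vectors as orthogonal
            return 1.0 - sum(a * b for a, b in zip(query, v)) / norm

        scored = (
            (cosine_distance(vec), table, sid)
            for (table, sid, prov), (repo, vec) in self._rows.items()
            if prov == provider and (repository_id is None or repo == repository_id)
        )
        # Exact top-k scan; an ANN index would approximate this ordering.
        return [(table, sid, dist) for dist, table, sid in heapq.nsmallest(k, scored)]
```

Filtering on `provider` matters because vectors from different providers live in different spaces and cannot be meaningfully compared.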