Semantic Retrieval Notes
T02 introduces semantic retrieval as an optional layer above the existing SQLite text search. The default service path remains text-only so existing callers keep stable result sets and ordering.
Local provider
HashingEmbeddingProvider is the offline provider used for tests and local
development. It produces deterministic token-bucket vectors without any network
dependency. Configure it with:
REPO_REGISTRY_EMBEDDING_PROVIDER=hashing
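To make "deterministic token-bucket vectors" concrete, here is a minimal sketch of how such a provider could work. It is not the actual HashingEmbeddingProvider code: the function names, the 64-dimension default, the tokenizer regex, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import math
import re


def hashing_embedding(text, dims=64):
    """Hypothetical sketch: map text to an L2-normalized token-bucket vector."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Use a stable digest rather than Python's built-in hash(), which is
        # salted per process and would break determinism across runs.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Text with no tokens stays an all-zero vector.
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))
```

Because each token always lands in the same bucket, the same text yields the same vector on every machine, which is what makes this kind of provider convenient for tests. The trade-off is that unrelated tokens can collide in one bucket, so scores are only a rough signal.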
When enabled, search combines:
- text match score from the existing SQLite search path
- vector score from approved ability/capability entries and content chunks
- approved confidence as a small ranking prior
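The three signals above could be blended as a weighted sum. The sketch below is illustrative only: the `Candidate` type, the function names, and the weights are assumptions, not the registry's actual rank formula, and it assumes all three inputs are already normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    entity_id: int
    text_score: float    # assumed normalized SQLite text match score
    vector_score: float  # assumed cosine similarity clamped to [0, 1]
    confidence: float    # approved confidence in [0, 1]


def hybrid_score(c, w_text=0.6, w_vector=0.35, w_conf=0.05):
    """Weighted sum; the small w_conf keeps confidence a prior, not a driver."""
    return w_text * c.text_score + w_vector * c.vector_score + w_conf * c.confidence


def rank(candidates):
    """Sort by descending hybrid score, breaking ties by entity_id for stable order."""
    return sorted(candidates, key=lambda c: (-hybrid_score(c), c.entity_id))
```

With these hypothetical weights, confidence can reorder candidates whose text and vector scores are nearly equal, but it cannot lift a weak match above a strong one.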
PostgreSQL / pgvector path
SQLite dev mode should remain the lowest-friction path. A production PostgreSQL deployment can add pgvector without changing the registry API by introducing an embedding table keyed by source entity:
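-- The vector column type is provided by the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE registry_embeddings (
    id            bigserial PRIMARY KEY,
    repository_id bigint NOT NULL,
    source_table  text NOT NULL,
    source_id     bigint NOT NULL,
    provider      text NOT NULL,
    vector        vector(768) NOT NULL,  -- dimension must match the provider's output
    updated_at    timestamptz NOT NULL DEFAULT now(),
    UNIQUE (source_table, source_id, provider)
);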
The search service can then replace runtime embedding of stored text with indexed nearest-neighbor lookup, while retaining the current hybrid rank formula and the same response schema.
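As a rough sketch of what the nearest-neighbor lookup might look like from the service side, the snippet below builds a parameterized query against the `registry_embeddings` table using pgvector's `<=>` cosine-distance operator. The helper names, the psycopg-style named placeholders, and the default limit are assumptions. The snippet only prepares the SQL and parameters and does not open a database connection.

```python
def to_vector_literal(values):
    """Format floats as pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{v:.6f}" for v in values) + "]"


# <=> is pgvector's cosine distance, so 1 - distance gives cosine similarity.
NEAREST_SQL = """
SELECT e.source_table,
       e.source_id,
       1 - (e.vector <=> %(query)s::vector) AS vector_score
FROM registry_embeddings AS e
WHERE e.provider = %(provider)s
ORDER BY e.vector <=> %(query)s::vector
LIMIT %(limit)s
"""


def nearest_neighbor_params(query_vec, provider, limit=20):
    """Build the parameter mapping for NEAREST_SQL (hypothetical helper)."""
    return {
        "query": to_vector_literal(query_vec),
        "provider": provider,
        "limit": limit,
    }
```

The resulting `vector_score` could feed the existing hybrid rank in place of a score computed at request time. For the lookup to actually use an index, the table would also need a pgvector index such as HNSW or IVFFlat with a cosine operator class on the `vector` column.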