generated from coulomb/repo-seed
45 lines
1.4 KiB
Markdown
45 lines
1.4 KiB
Markdown
# Semantic Retrieval Notes
|
|
|
|
T02 introduces semantic retrieval as an optional layer above the existing SQLite
|
|
text search. The default service path remains text-only so existing callers keep
|
|
stable result sets and ordering.
|
|
|
|
## Local provider
|
|
|
|
`HashingEmbeddingProvider` is the offline provider used for tests and local
|
|
development. It produces deterministic token-bucket vectors without any network
|
|
dependency. Configure it with:
|
|
|
|
```bash
|
|
REPO_REGISTRY_EMBEDDING_PROVIDER=hashing
|
|
```
|
|
|
|
When enabled, search combines:
|
|
|
|
- text match score from the existing SQLite search path
|
|
- vector score from approved ability/capability entries and content chunks
|
|
- approved confidence as a small ranking prior
|
|
|
|
## PostgreSQL / pgvector path
|
|
|
|
SQLite dev mode should remain the lowest-friction path. A production PostgreSQL
|
|
deployment can add pgvector without changing the registry API by introducing an
|
|
embedding table keyed by source entity:
|
|
|
|
```sql
|
|
CREATE TABLE registry_embeddings (
|
|
id bigserial PRIMARY KEY,
|
|
repository_id bigint NOT NULL,
|
|
source_table text NOT NULL,
|
|
source_id bigint NOT NULL,
|
|
provider text NOT NULL,
|
|
vector vector(768) NOT NULL,
|
|
updated_at timestamptz NOT NULL DEFAULT now(),
|
|
UNIQUE (source_table, source_id, provider)
|
|
);
|
|
```
|
|
|
|
The search service can then replace runtime embedding of stored text with indexed
|
|
nearest-neighbor lookup, while retaining the current hybrid rank formula and the
|
|
same response schema.
|