generated from coulomb/repo-seed
optional semantic retrieval
44
docs/semantic-retrieval.md
Normal file
@@ -0,0 +1,44 @@
# Semantic Retrieval Notes

T02 introduces semantic retrieval as an optional layer above the existing SQLite
text search. The default service path remains text-only so existing callers keep
stable result sets and ordering.

## Local provider

`HashingEmbeddingProvider` is the offline provider used for tests and local
development. It produces deterministic token-bucket vectors without any network
dependency. Configure it with:

```bash
REPO_REGISTRY_EMBEDDING_PROVIDER=hashing
```
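
To illustrate what "deterministic token-bucket vectors" can mean in practice, here is a minimal sketch. It is not the repository's `HashingEmbeddingProvider`; the class name `HashingEmbeddingSketch`, the `embed` method, the 64-bucket default, and the `provider_from_env` helper are all assumptions made for this example. Only the environment variable name comes from the notes above.

```python
import hashlib
import math
import os
import re


class HashingEmbeddingSketch:
    """Hypothetical stand-in, not the real HashingEmbeddingProvider."""

    def __init__(self, dimensions: int = 64) -> None:
        self.dimensions = dimensions

    def embed(self, text: str) -> list[float]:
        vector = [0.0] * self.dimensions
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # A stable digest (rather than Python's per-process salted hash())
            # keeps bucket assignments identical across runs and machines.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dimensions
            vector[bucket] += 1.0
        # L2-normalize so a dot product of two vectors equals cosine similarity.
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector


def provider_from_env(environ=os.environ):
    """Return a provider when the variable selects hashing, else None (text-only)."""
    name = environ.get("REPO_REGISTRY_EMBEDDING_PROVIDER", "")
    return HashingEmbeddingSketch() if name == "hashing" else None
```

Because the buckets depend only on the token bytes, the same text always yields the same vector, which is what makes a provider of this kind suitable for tests.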

When enabled, search combines:

- text match score from the existing SQLite search path
- vector score from approved ability/capability entries and content chunks
- approved confidence as a small ranking prior
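
The three signals listed above could be blended roughly as follows. This is a sketch under stated assumptions, not the registry's actual rank formula: the `Candidate` type, the weights (0.6 / 0.35 / 0.05), the assumption that every score is already normalized to [0, 1], and the tie-break on `entry_id` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entry_id: int
    text_score: float    # assumed normalized to [0, 1]
    vector_score: float  # assumed cosine similarity in [0, 1]
    confidence: float    # approved confidence, assumed in [0, 1]


def hybrid_score(c, text_weight=0.6, vector_weight=0.35, confidence_weight=0.05):
    """Weighted sum; the small confidence weight acts only as a ranking prior."""
    return (
        text_weight * c.text_score
        + vector_weight * c.vector_score
        + confidence_weight * c.confidence
    )


def rank(candidates):
    """Sort best-first, breaking ties by entry_id so ordering stays deterministic."""
    return sorted(candidates, key=lambda c: (-hybrid_score(c), c.entry_id))
```

Keeping the confidence weight small means it can reorder near-ties without overriding a clearly better text or vector match.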

## PostgreSQL / pgvector path

SQLite dev mode should remain the lowest-friction path. A production PostgreSQL
deployment can add pgvector without changing the registry API by introducing an
embedding table keyed by source entity:

```sql
CREATE TABLE registry_embeddings (
    id bigserial PRIMARY KEY,
    repository_id bigint NOT NULL,
    source_table text NOT NULL,
    source_id bigint NOT NULL,
    provider text NOT NULL,
    vector vector(768) NOT NULL,
    updated_at timestamptz NOT NULL DEFAULT now(),
    UNIQUE (source_table, source_id, provider)
);
```

The search service can then replace runtime embedding of stored text with indexed
nearest-neighbor lookup, while retaining the current hybrid rank formula and the
same response schema.
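
A nearest-neighbor lookup against `registry_embeddings` might be built like this. The sketch assumes cosine distance (pgvector's `<=>` operator), psycopg-style named parameters, and a query vector with the same 768 dimensions as the column; the helper names `to_pgvector_literal` and `nearest_neighbor_query` are hypothetical. For the lookup to be indexed rather than a sequential scan, the table would also need a pgvector index, for example an HNSW index using `vector_cosine_ops`.

```python
def to_pgvector_literal(values):
    """Format floats in pgvector's text input form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# `<=>` is pgvector's cosine distance, so 1 - distance gives cosine similarity,
# which can feed the vector term of the hybrid rank unchanged.
NEAREST_SQL = """
SELECT source_table, source_id,
       1 - (vector <=> %(query)s::vector) AS vector_score
FROM registry_embeddings
WHERE provider = %(provider)s
ORDER BY vector <=> %(query)s::vector
LIMIT %(limit)s
"""


def nearest_neighbor_query(query_vector, provider, limit=20):
    """Return (sql, params) for a driver that accepts named parameters."""
    params = {
        "query": to_pgvector_literal(query_vector),
        "provider": provider,
        "limit": int(limit),
    }
    return NEAREST_SQL, params
```

Filtering on `provider` keeps vectors from different embedding providers from being compared with each other, which matches the `UNIQUE (source_table, source_id, provider)` constraint in the table definition.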