I really dig more content about how vector databases/tools handle problems like this!
In sqlite-vec, there's only a flat brute-force index (though DiskANN/IVF will be coming soon). But we do have a concept of partition keys[0], which allows you to "internally shard" the vector index based on a user_id or any other value.
Then at query time, any WHERE constraints on a partition key are pushed-down, so only matching vectors are searched instead of the entire index.
select
document_id,
user_id,
distance
from vec_documents
where contents_embedding match :query
and k = 20
and user_id = 123; -- only search documents from user 123
Definitely not as performant as a proper vector index, but a lot of real-world application have these natural groups anyway. Like "search only English documents" or "search entries in this workspace only", or even "search comments only from the past 30 days."
Internally sqlite-vec stores vectors in "chunks", so when partition keys are definds, every chunk is associated with a unique combination of all partition keys. Kinda hard to describe, but if you create a vec0 virtual table and insert some values, you can inspect the internal "shadow tables" in the SQLite database to see how it's all stored.
Well, I have a brute force strategy for pgvector working reasonably well. Individual, partial indexes. It works for all those queries with category_id=<> clauses. You only need an index for larger categories, for categories with rows below a threshold you dont need index a KNN/dot product would work.
In sqlite-vec, there's only a flat brute-force index (though DiskANN/IVF will be coming soon). But we do have a concept of partition keys[0], which allows you to "internally shard" the vector index based on a user_id or any other value.
Then at query time, any WHERE constraints on a partition key are pushed-down, so only matching vectors are searched instead of the entire index. Definitely not as performant as a proper vector index, but a lot of real-world application have these natural groups anyway. Like "search only English documents" or "search entries in this workspace only", or even "search comments only from the past 30 days."Internally sqlite-vec stores vectors in "chunks", so when partition keys are definds, every chunk is associated with a unique combination of all partition keys. Kinda hard to describe, but if you create a vec0 virtual table and insert some values, you can inspect the internal "shadow tables" in the SQLite database to see how it's all stored.
[0] https://alexgarcia.xyz/sqlite-vec/features/vec0.html#partiti...