## Description
Unless we have an open-source LLM with a 1M+ token context length, we need vector search for the assistant API: #1273 (comment)
Even with a very large context length, it is still far cheaper to use vector search with embeddings, and it can all easily be done on the CPU.
## Implementation
I see three main options for adding vector search:
1. Simple in-memory brute-force search, regenerating the embeddings on startup instead of saving them to storage.
2. Add one or more vector databases as a backend.
3. Connect to an external database.
The first is easy to implement and has no upkeep because everything is flushed on restart; changing the chunk size or any other hyperparameter costs no more than a restart does. There is plenty of prior art in Go:
- https://github.com/marekgalovic/anndb
- https://github.com/aws-samples/gofast-hnsw/?tab=readme-ov-file#brute-search-performance
- Milvus, Weaviate, Gorse
Even implementing HNSW or Annoy would not be difficult; the main problems I see are the classic database issues. So I am in favor of the first or third option, with no in-between. Although saving embeddings to a flat file could be OK, just not in the first iteration.
I did experiment with BadgerDB, but talked myself out of it: https://github.com/richiejp/badger-cybertron-vector/blob/main/main.go. The problem is that it complicates comparing the vectors, and then we also have to maintain state between restarts.
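By contrast, the flat-file variant mentioned above could be little more than round-tripping the whole index with `encoding/gob`. This is only a sketch under the assumption of an in-memory `Chunk` slice (the type and file path are invented here); on any load error we can simply fall back to re-embedding everything, which preserves the flush-on-restart behavior:

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
)

// Chunk is a hypothetical index entry: a text fragment plus its embedding.
type Chunk struct {
	Text      string
	Embedding []float32
}

// SaveChunks writes the whole index to a single flat file with encoding/gob.
func SaveChunks(path string, chunks []Chunk) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return gob.NewEncoder(f).Encode(chunks)
}

// LoadChunks reads the index back. If this fails for any reason the
// caller can just regenerate the embeddings from scratch.
func LoadChunks(path string) ([]Chunk, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var chunks []Chunk
	err = gob.NewDecoder(f).Decode(&chunks)
	return chunks, err
}

func main() {
	in := []Chunk{{"hello", []float32{0.1, 0.2}}}
	if err := SaveChunks("/tmp/index.gob", in); err != nil {
		panic(err)
	}
	out, err := LoadChunks("/tmp/index.gob")
	if err != nil {
		panic(err)
	}
	fmt.Println(out[0].Text)
}
```

Unlike the BadgerDB route, the vectors stay in ordinary slices once loaded, so the comparison code is untouched; the file is only a cache.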
## API
Obviously we will follow the OpenAI API as in #1273, but I think it would also make sense to have some API for doing simple search without an LLM, just so people can do fuzzy search with LocalAI instead of reaching for another tool. Suggestions for how this API should look are welcome.