How to add a custom analyzer/query (More generally: How does bleve work/How can I extend it?) #2161
-
|
So I am building an app launcher and came across a problem... Many app launchers (fzf as well) allow for fuzzy search that matches the first letter of each word in a sequence of words.
Now here's the thing... The docs are pretty sparse on this topic. There are a couple of gists that I've found (Gist 1, Gist 2), but I still don't get how this works or whether it is even a good solution. I've even pondered whether analyzers are a good fit, as they seem to process strings at index time and allow for a "one time cost". Buuuuttt I do not get at all how I could create anything custom like that...
A nice thorough explanation/walk-through would be much appreciated! How this project indexes stuff, what tools (analyzers, queries, etc.) I have in my toolbox, etc.
I have also contemplated implementing my own fuzzy search algorithm, as bleve's fuzzy search is not really fulfilling my needs. It's just: I. Do. Not. Know. How. The docs are too empty. I've also tried looking at the source for FuzzyQuery, but sadly the code is spread out across multiple files and it is really hard to piece it all together to understand it.
So here I am asking for help, because I thought that this would be the fastest and most efficient way to reach a solution. Thanks in advance 🙏🏼 |
Replies: 3 comments 7 replies
-
|
Also I just found out what vector searching is about (I didn't know it was called that). How would I get that working? It's not mentioned at all in the docs... |
-
|
You should find documentation, aided by code-level commentary, here: https://pkg.go.dev/github.com/blevesearch/bleve/v2
To your specific questions:
|
-
|
That's right - you will not be able to add your own algorithms into search. You can only leverage what the library offers.
Keep in mind, we're not talking about semantic search (via nearest neighbor over vectors) here. It seems we're talking about letting users' mistakes still match what was indexed.
Fuzziness is how you can achieve this, albeit with a limitation: we support a maximum fuzziness of 2 per token. Each character added, removed, or replaced counts as one unit of fuzziness.
Here's a sample -
The analyzer rules here tokenize text on whitespace and store the lower-case version of the tokens in the index. |
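To make the fuzziness limit concrete, here is a minimal standard-library Go sketch. It is not bleve's implementation and not the sample referred to above; the names `analyze`, `editDistance`, and `fuzzyMatch` are hypothetical. It assumes that "fuzziness" corresponds to plain Levenshtein edit distance (one step per character added, removed, or replaced) and mimics an analyzer that splits on whitespace and lower-cases tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// analyze mimics the analyzer described above (an assumption, not bleve code):
// lower-case the text and split it into tokens on whitespace.
func analyze(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance returns the Levenshtein distance between a and b:
// the minimum number of single-character insertions, deletions,
// or substitutions needed to turn a into b.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // turning "" into rb[:j] takes j insertions
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i // turning ra[:i] into "" takes i deletions
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(rb)]
}

// fuzzyMatch reports whether any indexed token lies within maxEdits
// of the lower-cased query term.
func fuzzyMatch(indexed []string, term string, maxEdits int) bool {
	term = strings.ToLower(term)
	for _, tok := range indexed {
		if editDistance(tok, term) <= maxEdits {
			return true
		}
	}
	return false
}

func main() {
	tokens := analyze("Mozilla Firefox")
	fmt.Println(tokens)                          // [mozilla firefox]
	fmt.Println(fuzzyMatch(tokens, "firefx", 2)) // true: one character missing
	fmt.Println(fuzzyMatch(tokens, "frfx", 2))   // false: three characters missing
}
```

Under these assumptions, a typo such as `firefx` stays within the limit of 2, while an abbreviation such as `frfx` needs three edits and does not match. This is why per-token fuzziness alone does not cover first-letter or abbreviation-style matching.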