How to add a custom analyzer/query (More generally: How does bleve work/How can I extend it?) #2161

Eclextic · 2025-03-08T01:02:59Z

So I am building an app launcher and came across a problem... Many app launchers (fzf as well) allow fuzzy search that matches the first letter of each word in a sequence of words.
E.g. "CSGO" should match with Counter Strike Global Offensive.

Now here's the thing... The docs are pretty sparse on this topic, and there are a couple of gists I've found (Gist 1, Gist 2), but I still don't get how this works or whether it's even a good solution.

I've even wondered whether analyzers are a good fit, as they seem to process strings at index time, making it a "one-time cost". Buuuuttt I don't get at all how I could create anything custom like that...

A nice thorough explanation/walk-through would be much appreciated! How this project indexes stuff, what tools (analyzers, queries, etc.) I have in my toolbox that I can use, etc.

I have also contemplated implementing my own fuzzy search algorithm as bleve's fuzzy search is not really fulfilling my needs, it's just: I. Do. Not. Know. How. The docs are too empty.

I've also tried looking at the source for FuzzyQuery but sadly the code is spread out across multiple files and it is really hard to piece it all together to understand it.

So here I am asking for help, because I thought that this would be the fastest and most efficient way to reach a solution.

Thanks in advance 🙏🏼

Answered by abhinavdangeti

Mar 11, 2025

Eclextic · 2025-03-08T13:27:40Z · Author

Also I just found out what vector searching is about (I didn't know it was called that).

How would I get that working? It's not mentioned at all in the docs...

abhinavdangeti · 2025-03-10T21:45:40Z · Maintainer

You should find documentation aided by code level commentary here - https://pkg.go.dev/github.com/blevesearch/bleve/v2

To your specific questions:

The following queries support fuzziness: fuzzy, match and match_phrase, but only up to a fuzziness (edit distance) of 2.
Here's an example of how to set up a custom analyzer - https://pkg.go.dev/github.com/blevesearch/bleve/v2@v2.4.4/mapping#IndexMappingImpl.AddCustomAnalyzer
You can find guidelines to how to build a vector index and run a nearest neighbor search over that data here - https://github.com/blevesearch/bleve/blob/v2.4.4/docs/vectors.md

Eclextic Mar 10, 2025
Author

Oh nice. I didn't expect an answer so soon!

First of all, regarding vector search: one would need to bring their own model, correct? This is kind of new to me, gonna be honest, and I am still learning about it. Is this how photos have lately become searchable using natural language, e.g. Apple's Photos search in iOS 18?

Also, I know that the fuzzy, match and match_phrase queries support fuzzy searching; it's just that the current implementation isn't satisfactory. So here's my question to you: can I generate an extra field that gets created at runtime and added to the index (is that what's called an analyzer, or is that a tokenizer)? If so, I can just index the first letters of a name and forget about implementing my own custom fuzzy search for now. If not, a custom query/algorithm would be more flexible and appreciated. How would I create my own query type?

Also, how come the main docs are on pkg.go.dev? Funnily enough, I almost always ignore pkg.go.dev, because for most packages the docs are either just the comments in the code or not available at all.

Thank you so much for your help, and sorry if I am asking way too many questions! I just haven't been able to find any good information, and you seem to be the most reliable source!

abhinavdangeti Mar 10, 2025
Maintainer

One would need to find their own model correct?

Correct, a bleve index over a vector-typed field is a vector store that enables you to run nearest neighbor search. Embedding models (e.g. from OpenAI, or Llama-based ones) can be used to vectorize your data.

Can I generate an extra field that gets created on runtime and added into the index

Yes, but this has nothing to do with fuzziness or custom analyzers. I assume that when you say fields, you're referring to keys in JSON content, which is what your index mapping will represent. You can create a dynamic index mapping where you do not limit your index to a certain set of fields, but instead index everything in your documents; any new fields you add to documents later will automatically get indexed.

If so, then I can just index the starting letters of a name and for now forget implementing my own custom fuzzy search.

See, this is not in line with your previous comment, or at least I don't follow. For this you can devise a custom analyzer that uses the truncate_token token filter to index truncated prefixes of words, but I'm still not sure what you're going for with this.

If not, then a custom query/algorithm would be more flexible and appreciated. How would I create my own query type?

You will be limited to the query types we support, but existing queries are configurable. For example, with analytic queries (match and match_phrase), the analyzer associated with the field in the index mapping is applied to your search text as well.

Eclextic Mar 11, 2025
Author

OK, so let me be very explicit...
Like I explained earlier I am creating an app launcher. One of the many functionalities of an app launcher would be really good fuzzy finding. And here are the challenges I am having right now:

I need the closest match no matter what. That means that even when the user has a stroke while trying to search for their app, they need to get their app. E.g. They type "A dnec fo foir amnd ic" they should get "A dance of fire and ice".
Almost all good app launchers I know implement an algorithm where the first letters are used for identification. E.g. "A dance of fire and ice" should be searchable using "adofai", as those are the first letters of each word in the name "A dance of fire and ice".
The user should be able to customize their fuzzy searching algorithm, e.g. enable or disable case sensitivity.
And lastly natural language searching capabilities would be the cherry on top for functionality. You explained this well enough for me and the docs are extensive regarding this feature. I won't implement it this early into the project, but I'll keep it in mind. Thanks :D
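Requirements 2 and 3 can be prototyped outside bleve with a simple ordered-subsequence check, similar in spirit to what fuzzy finders like fzf do (fzf also scores matches, which this sketch does not). `matchesInOrder` and its `caseSensitive` flag are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesInOrder reports whether every character of query appears in
// candidate in the same order (not necessarily adjacent). Spaces in the
// query are ignored.
func matchesInOrder(query, candidate string, caseSensitive bool) bool {
	if !caseSensitive {
		query = strings.ToLower(query)
		candidate = strings.ToLower(candidate)
	}
	c := []rune(candidate)
	j := 0
	for _, q := range query {
		if q == ' ' {
			continue
		}
		for j < len(c) && c[j] != q {
			j++
		}
		if j == len(c) {
			return false
		}
		j++ // consume the matched character
	}
	return true
}

func main() {
	cases := []struct {
		query, name   string
		caseSensitive bool
	}{
		{"adofai", "A dance of fire and ice", false},
		{"CSGO", "Counter Strike Global Offensive", false},
		{"csgo", "Counter Strike Global Offensive", true},
	}
	for _, c := range cases {
		fmt.Printf("%q in %q (case-sensitive=%v): %v\n",
			c.query, c.name, c.caseSensitive, matchesInOrder(c.query, c.name, c.caseSensitive))
	}
}
```

A plain subsequence test is very permissive (long names contain almost any short query in order), so a launcher would normally rank candidates, e.g. by preferring characters that start words; that ranking is left out here.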

Now it seems to me, if I understood this correctly, that querying support is kind of limited... Creating a custom algorithm for searching isn't possible? What about the Gists that I linked in my original post? They don't seem to do any searching themselves, but I assumed that because they are "queries", that I could implement my own algorithm...

Also it seems that this project might not fit my needs after all? Because if it is this limited in customizability, then adding any new feature might be a hassle...

Eclextic Mar 11, 2025
Author

The reason I suggested analyzers is that, for the second issue I just mentioned ("adofai"), I expected two ways in which I could add this functionality. Either I:

Create a custom query algorithm that searches differently, allowing me to implement matching on the first letter of each word at the algorithm level.
Or I add a custom field (key-value pair) with the key "subsequence" and the value "adofai", so that bleve can find it natively. But I would still like this to be generated dynamically at index time, i.e. I have a struct called Application with fields name string, description string, etc. I index it using bleve. Bleve generates a new field subsequence string whose value is "adofai" if the name was "A dance of fire and ice".
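The second option can be done in application code before the document is handed to bleve: compute the derived value and store it in an ordinary field. This is a minimal sketch under that assumption; the `Initials` field, its `initials` JSON key and the `initials`/`prepare` helpers are hypothetical, and bleve itself is not called (the comment marks where indexing would happen).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Application is the document type; Initials is a derived field
// (hypothetical name) filled in before indexing.
type Application struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	Initials    string `json:"initials"`
}

// initials returns the lowercased first letter of every word in s.
func initials(s string) string {
	var b strings.Builder
	for _, w := range strings.Fields(s) {
		b.WriteRune(unicode.ToLower([]rune(w)[0]))
	}
	return b.String()
}

// prepare derives the extra field; call it just before indexing.
func prepare(app Application) Application {
	app.Initials = initials(app.Name)
	return app
}

func main() {
	app := prepare(Application{Name: "A dance of fire and ice", Description: "rhythm game"})
	fmt.Println(app.Initials) // adofai
	// With bleve, the prepared value would then be indexed as usual,
	// e.g. idx.Index(someID, app), and queried on the initials field.
}
```

Because the derived value is a single lowercase word, a match query on that field should find it; with fuzziness 1, a near miss such as "adofi" would presumably also match, since it is one insertion away from "adofai".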

abhinavdangeti · 2025-03-11T21:11:14Z · Maintainer

That's right - you will not be able to add your own algorithms into search. You can only leverage what the library offers.

Keep in mind, we're not talking about semantic search (via nearest neighbor over vectors) here - but it seems we're talking about allowing mistakes from users to still be able to match what was indexed.

Using fuzziness is the way to achieve this, albeit with a limitation: we support a maximum fuzziness of 2 edits per token. Adding, removing or replacing a character each counts as one edit.
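To see what "2 edits per token" means in practice, here is a standard dynamic-programming edit distance in Go, applied to token pairs from the sample that follows. With only insertions, deletions and substitutions, `dnec` is 3 edits from `dance`, yet the sample's output reports a match at fuzziness 2. That appears consistent with bleve also counting a swap of two adjacent characters as one edit; this is an assumption, not something confirmed in this thread. The `transpositions` flag lets you compare both measures; `editDistance` and `min3` are hypothetical helpers, not bleve's code.

```go
package main

import "fmt"

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// editDistance counts the single-character insertions, deletions and
// substitutions needed to turn a into b. With transpositions=true, swapping
// two adjacent characters also costs 1 (optimal string alignment distance).
func editDistance(a, b string, transpositions bool) int {
	ra, rb := []rune(a), []rune(b)
	d := make([][]int, len(ra)+1)
	for i := range d {
		d[i] = make([]int, len(rb)+1)
		d[i][0] = i
	}
	for j := range d[0] {
		d[0][j] = j
	}
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			d[i][j] = min3(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
			if transpositions && i > 1 && j > 1 &&
				ra[i-1] == rb[j-2] && ra[i-2] == rb[j-1] && d[i-2][j-2]+1 < d[i][j] {
				d[i][j] = d[i-2][j-2] + 1
			}
		}
	}
	return d[len(ra)][len(rb)]
}

func main() {
	pairs := [][2]string{{"dnc", "dance"}, {"fo", "of"}, {"dnec", "dance"}, {"foir", "fire"}, {"amnd", "and"}}
	for _, p := range pairs {
		fmt.Printf("%-5s -> %-5s plain=%d with-swaps=%d\n",
			p[0], p[1], editDistance(p[0], p[1], false), editDistance(p[0], p[1], true))
	}
}
```

For these pairs, only `dnec -> dance` (3 vs 2) and `fo -> of` (2 vs 1) differ between the two measures.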

Here's a sample -

package main

import (
	"fmt"
	"os"

	"github.com/blevesearch/bleve/v2"
	"github.com/blevesearch/bleve/v2/analysis/analyzer/custom"
	"github.com/blevesearch/bleve/v2/analysis/token/lowercase"
	"github.com/blevesearch/bleve/v2/analysis/tokenizer/whitespace"
	"github.com/blevesearch/bleve/v2/search/query"
)

func main() {
	// Custom analyzer "xyz": split text on whitespace, then lowercase each token.
	idxMapping := bleve.NewIndexMapping()
	if err := idxMapping.AddCustomAnalyzer("xyz", map[string]interface{}{
		"type":          custom.Name,
		"tokenizer":     whitespace.Name,
		"token_filters": []string{lowercase.Name},
	}); err != nil {
		fmt.Println("ERROR SETTING UP CUSTOM ANALYZER", err)
		return
	}

	// Apply "xyz" to every field that doesn't name its own analyzer.
	idxMapping.DefaultAnalyzer = "xyz"

	tmpIndexPath, err := os.MkdirTemp("", "tmp.bleve")
	if err != nil {
		fmt.Println("ERROR CREATING TEMP DIR", err)
		return
	}
	defer os.RemoveAll(tmpIndexPath)

	// Check the error before deferring Close, so a failed New can't cause a nil dereference.
	idx, err := bleve.New(tmpIndexPath, idxMapping)
	if err != nil {
		fmt.Println("ERROR CREATING INDEX", err)
		return
	}
	defer idx.Close()

	doc := map[string]interface{}{
		"fieldX": "a dance of ice and fire",
	}

	if err := idx.Index("doc1", doc); err != nil {
		fmt.Println("ERROR INDEXING", err)
		return
	}
	fmt.Printf("Indexed: `%s`\n", doc["fieldX"])
	fmt.Println("-------------------------------------")

	for _, text := range []string{
		"a dance of ice and fire",
		"fire and ice of dance a",
		"a dnc o ic n fir",
		"A dnec fo Foir amnd Ic",
		"fir n ice",
		"dance",
		"game of thrones",
	} {
		q := bleve.NewMatchQuery(text)
		// Allow up to 2 edits per token (the maximum supported).
		q.Fuzziness = 2
		// Every query token must match for the document to be a hit.
		q.Operator = query.MatchQueryOperatorAnd
		res, err := idx.Search(bleve.NewSearchRequest(q))

		if err != nil || res.Total != 1 {
			fmt.Printf("Failed search for `%s`\n", text)
		} else {
			fmt.Printf("Successful search for `%s`\n", text)
		}
	}
}

The analyzer rules here tokenize text on whitespace and store the lowercase version of each token in the index.
At search time, the same analyzer rules are applied to the search text, and every resulting token (because of the AND operator) must be found in the index, in any order.
If that limit of 2 edits per token does not work for you, this solution won't fly.
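The behaviour just described (analyze both sides, then require every query token to be within the edit limit of some indexed token, in any order) can be modelled in a few lines of plain Go. This is only a model: `analyze`, `osa` and `fuzzyAndMatch` are hypothetical helpers, a real index does not compare the query against every term one by one, and counting adjacent swaps as one edit is an assumption about bleve's fuzzy matching.

```go
package main

import (
	"fmt"
	"strings"
)

// analyze mirrors the "xyz" analyzer: whitespace tokenizer + lowercase filter.
func analyze(s string) []string { return strings.Fields(strings.ToLower(s)) }

// osa is the optimal-string-alignment distance: insertions, deletions,
// substitutions and adjacent swaps each cost 1.
func osa(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	d := make([][]int, len(ra)+1)
	for i := range d {
		d[i] = make([]int, len(rb)+1)
		d[i][0] = i
	}
	for j := range d[0] {
		d[0][j] = j
	}
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			cost := 0
			if ra[i-1] != rb[j-1] {
				cost = 1
			}
			best := d[i-1][j-1] + cost
			if v := d[i-1][j] + 1; v < best {
				best = v
			}
			if v := d[i][j-1] + 1; v < best {
				best = v
			}
			if i > 1 && j > 1 && ra[i-1] == rb[j-2] && ra[i-2] == rb[j-1] && d[i-2][j-2]+1 < best {
				best = d[i-2][j-2] + 1
			}
			d[i][j] = best
		}
	}
	return d[len(ra)][len(rb)]
}

// fuzzyAndMatch reports whether every analyzed query token is within
// maxEdits of at least one analyzed document token (order is ignored).
func fuzzyAndMatch(doc, query string, maxEdits int) bool {
	terms := analyze(doc)
	for _, q := range analyze(query) {
		found := false
		for _, t := range terms {
			if osa(q, t) <= maxEdits {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	doc := "a dance of ice and fire"
	for _, q := range []string{"fire and ice of dance a", "A dnec fo Foir amnd Ic", "game of thrones"} {
		fmt.Printf("%-26q matched=%v\n", q, fuzzyAndMatch(doc, q, 2))
	}
}
```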

abhinavdangeti Mar 11, 2025
Maintainer

$ go run test.go
Indexed: `a dance of ice and fire`
-------------------------------------
Successful search for `a dance of ice and fire`
Successful search for `fire and ice of dance a`
Successful search for `a dnc o ic n fir`
Successful search for `A dnec fo Foir amnd Ic`
Successful search for `fir n ice`
Successful search for `dance`
Failed search for `game of thrones`

Eclextic Mar 11, 2025
Author

Alright you've been very helpful, thank you. I think I'll close this thread and try to work with bleve per your guidance.

But something like searching for "adofai" and getting "A dance of fire and ice" is not possible, then?

If not, I'll figure something out. Thank you for everything.

abhinavdangeti Mar 11, 2025
Maintainer

Worth mentioning here - in addition to fuzziness, you could look into using our synonyms support (coming with v2.5.0) - to declare any number of acceptable definitions for a given word or phrase.

https://github.com/blevesearch/bleve/blob/master/docs/synonyms.md
