diff --git a/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc b/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc index 133927b74c1..97d06a91a4b 100644 --- a/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc +++ b/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc @@ -67,7 +67,7 @@ $ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:applicat "type":"pdate", "stored":true } - ] + ] }' ---- @@ -81,22 +81,22 @@ $ bin/solr post -c films example/films/films.json ---- === Let's do some Vector searches -Before making the queries, we define an example target vector, simulating a person that -watched 3 movies: _Finding Nemo_, _Bee Movie_, and _Harry Potter and the Chamber of Secrets_. -We get the vector of each movie, then calculate the resulting average vector, which will +Before making the queries, we define an example target vector, simulating a person that +watched 3 movies: _Finding Nemo_, _Bee Movie_, and _Harry Potter and the Chamber of Secrets_. +We get the vector of each movie, then calculate the resulting average vector, which will be used as the input vector for all the following example queries. - + ``` [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789] ``` [NOTE] ==== -Interested in calculating the vector using Solr's xref:query-guide:streaming-expressions.adoc[streaming capability]? +Interested in calculating the vector using Solr's xref:query-guide:streaming-expressions.adoc[streaming capability]? 
Here is an example of a streaming expression that you can run via the xref:query-guide:stream-screen.adoc[Solr Admin Stream UI]: ``` let( - a=select( + a=select( search(films, qt="/select", q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"", @@ -141,43 +141,54 @@ The output is: // Solr URL examples below all have [ and ] characters which, when used with Curl, causes encoding issues so just putting plain http links -Search for the top 10 movies most similar to the target vector that we previously calculated (KNN Query for recommendation): +**KNN Query for recommendation** - Search for the top 10 movies most similar to the target vector that we previously calculated: + + http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789] + +- Notice that among the results, there are some animation family movies, such as _Curious George_ and _Bambi_, which makes sense, since the target vector was created with two other animation family movies (_Finding Nemo_ and _Bee Movie_). +- We also notice that among the results there are two movies that the person already watched. In the next example we will filter them out. 
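
The comment in this hunk notes that the `[` and `]` characters in these URLs cause encoding issues with curl. One way around that is to percent-encode the query string before sending it. The following is a minimal Python sketch, not part of the patch: `knn_query_url` is a hypothetical helper name, and the base URL, field name, and target vector are copied from the tutorial text. It only builds the URL and does not contact Solr.

```python
import urllib.parse

# Target vector copied from the tutorial text.
TARGET_VECTOR = [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148,
                 -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]


def knn_query_url(base_url, field, top_k, vector, **extra_params):
    """Build a percent-encoded URL for a {!knn} query (hypothetical helper)."""
    vector_text = "[" + ",".join(str(v) for v in vector) + "]"
    params = {"q": f"{{!knn f={field} topK={top_k}}}{vector_text}"}
    # Additional request parameters such as fq are encoded the same way.
    params.update(extra_params)
    return base_url + "?" + urllib.parse.urlencode(params)


url = knn_query_url(
    "http://localhost:8983/solr/films/query",  # assumes a local Solr, as in the tutorial
    "film_vector",
    10,
    TARGET_VECTOR,
)
print(url)
```

`urlencode` replaces spaces with `+` and escapes the braces, brackets, and commas, so the resulting string can be passed to curl inside single quotes. A filter query can be added by passing it as a keyword argument, for example `fq='-id:("/en/finding_nemo")'`.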
+ +**KNN query with Filter Query** - Search for the top 10 movies most similar to the resulting vector, excluding the movies already watched: + + http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq=-id:("/en/finding_nemo" "/en/bee_movie" "/en/harry_potter_and_the_chamber_of_secrets_2002") + +**KNN as Filter Query** - Search for movies with "cinderella" in the name among the top 50 movies most similar to the target vector: + + http://localhost:8983/solr/films/query?q=name:cinderella&fq={!knn f=film_vector topK=50}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789] - 'http://localhost:8983/solr/films/query?q={%21knn%20f=film_vector%20topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]' +- There are 3 "cinderella" movies in the index, but only 1 is among the top 50 most similar to the target vector (_Cinderella III: A Twist in Time_). -* Notice that among the results, there are some animation family movies, such as _Curious George_ and _Bambi_, which makes sense, since the target vector was created with two other animation family movies (_Finding Nemo_ and _Bee Movie_). -* We also notice that among the results there are two movies that the person already watched. In the next example we will filter them out. 
+*KNN with SeededQuery* - Search for the top 10 movies most similar to the target vector, guided by a seed lexical query on the `genre` field, which provides the initial entry points in the vector graph search: -Search for the top 10 movies most similar to the resulting vector, excluding the movies already watched (KNN query with Filter Query): + http://localhost:8983/solr/films/query?seedQuery=genre:Family&q={!knn f=film_vector topK=10 seedQuery=$seedQuery}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789] - http://localhost:8983/solr/films/query?q={!knn%20f=film_vector%20topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq=-id:("%2Fen%2Ffinding_nemo"%20"%2Fen%2Fbee_movie"%20"%2Fen%2Fharry_potter_and_the_chamber_of_secrets_2002") +- This allows the KNN algorithm to start the similarity exploration from documents that already match the lexical criteria, potentially improving relevance and reducing search time. - - Search for movies with "cinderella" in the name among the top 50 movies most similar to the target vector (KNN as Filter Query): +*KNN with EarlyTermination* - Search for the top 10 movies most similar to the target vector, allowing the KNN search to stop early for lower latency: - http://localhost:8983/solr/films/query?q=name:cinderella&fq={!knn%20f=film_vector%20topK=50}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789] + http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10 earlyTermination=true}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789] - * There are 3 "cinderella" movies in the index, but only 1 is among the top 50 most similar to the target vector (_Cinderella III: A Twist in Time_). +- This allows Solr to return results faster by stopping the graph search once a good enough set of neighbors is found, instead of exploring all nodes in the vector index. 
- - Search for movies with "animation" in the genre, and rerank the top 5 documents by combining (sum) the original query score with twice (2x) the similarity to the target vector (KNN with ReRanking): +**KNN with ReRanking** - Search for movies with "animation" in the genre, and rerank the top 5 documents by combining (sum) the original query score with twice (2x) the similarity to the target vector: - http://localhost:8983/solr/films/query?q=genre:animation&rqq={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=5%20reRankWeight=2} + http://localhost:8983/solr/films/query?q=genre:animation&rqq={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&rq={!rerank reRankQuery=$rqq reRankDocs=5 reRankWeight=2} - * To guarantee we calculate the vector similarity score for all the movies, we set `topK=10000`, a number higher than the total number of documents (`1100`). +- To guarantee we calculate the vector similarity score for all the movies, we set `topK=10000`, a number higher than the total number of documents (`1100`). - * It's possible to combine the vector similarity scores with other scores, by using Sub-query, - xref:query-guide:function-queries.adoc[Function Queries] and xref:query-guide:local-params.adoc#parameter-dereferencing[Parameter Dereferencing] Solr features: +It's possible to combine the vector similarity scores with other scores, by using Sub-query, xref:query-guide:function-queries.adoc[Function Queries] and xref:query-guide:local-params.adoc#parameter-dereferencing[Parameter Dereferencing] Solr features: - Search for "harry potter" movies, ranking the results by the similarity to the target vector instead of the lexical query score. 
Beside the `q` parameter, we define a "sub-query" named `q_vector`, that will calculate the similarity score between all the movies (since we set `topK=10000`). Then we use the sub-query parameter name as input for the `sort`, specifying that we want to rank descending according to the vector similarity score (`sort=$q_vector desc`): - http://localhost:8983/solr/films/query?q=name:"harry%20potter"&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&sort=$q_vector%20desc + http://localhost:8983/solr/films/query?q=name:"harry potter"&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&sort=$q_vector desc - Search for movies with "the" in the name, keeping the original lexical query ranking, but returning only movies with similarity to the target vector of 0.8 or higher. Like previously, we define the sub-query `q_vector`, but this time we use it as input for the `frange` filter, specifying that we want documents with at least 0.8 of vector similarity score: - http://localhost:8983/solr/films/query?q=name:the&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq={!frange%20l=0.8}$q_vector + http://localhost:8983/solr/films/query?q=name:the&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq={!frange l=0.8}$q_vector - Search for "batman" movies, ranking the results by combining 70% of the original lexical query score and 30% of the similarity to the target vector. Besides the `q` main query and the `q_vector` sub-query, we also specify the `q_lexical` query, which will hold the lexical score of the main `q` query. Then we specify a parameter variable called `score_combined`, which scales the lexical and similarity scores, applies the 0.7 and 0.3 weights, then sum the result. 
We set the `sort` parameter to order according the combined score, and also set the `fl` parameter so that we can view the intermediary and the combined score values in the response: - http://localhost:8983/solr/films/query?q=name:batman&q_lexical={!edismax%20v=$q}&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&score_combined=sum(mul(scale($q_lexical,0,1),0.7),mul(scale($q_vector,0,1),0.3))&sort=$score_combined%20desc&fl=name,score,$q_lexical,$q_vector,$score_combined + http://localhost:8983/solr/films/query?q=name:batman&q_lexical={!edismax v=$q}&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&score_combined=sum(mul(scale($q_lexical,0,1),0.7),mul(scale($q_vector,0,1),0.3))&sort=$score_combined desc&fl=name,score,$q_lexical,$q_vector,$score_combined === Exercise 5 Wrap Up
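
The `score_combined` expression in the last query, `sum(mul(scale($q_lexical,0,1),0.7),mul(scale($q_vector,0,1),0.3))`, rescales each score to the range 0 to 1 and then takes a weighted sum. The arithmetic can be sketched in plain Python as below. This is an illustration only, not Solr's implementation: the function names are hypothetical, the input scores are made up, and mapping a constant score list to the lower bound is this sketch's own choice.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly rescale values into [lo, hi], in the spirit of scale(x, lo, hi)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption of this sketch: a constant input maps to the lower bound.
        return [lo for _ in values]
    return [lo + (v - v_min) * (hi - lo) / (v_max - v_min) for v in values]


def combine_scores(lexical, vector, w_lexical=0.7, w_vector=0.3):
    """Weighted sum of the rescaled lexical and vector scores, one value per document."""
    lex_scaled = min_max_scale(lexical)
    vec_scaled = min_max_scale(vector)
    return [w_lexical * l + w_vector * v for l, v in zip(lex_scaled, vec_scaled)]


# Made-up scores for three hypothetical documents, for illustration only.
lexical_scores = [2.0, 4.0, 6.0]
vector_scores = [0.9, 0.5, 0.7]

combined = combine_scores(lexical_scores, vector_scores)

# Document indices ordered by combined score, highest first (like sort=... desc).
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(combined, ranking)
```

Because both inputs are rescaled before weighting, the 0.7 and 0.3 weights apply to comparable ranges even when the raw lexical scores are much larger than the similarity scores.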