MB-35347: Synonym Search #2090
Merged
38 commits
f6170c3 real first draft (CascadingRadium)
263e990 fix bug (CascadingRadium)
c5fc548 small fix the first draft (CascadingRadium)
542d34a glue code for indexing path (CascadingRadium)
6627dcb unit test (CascadingRadium)
32c67af query path first draft (CascadingRadium)
784f45b minor fixes (CascadingRadium)
f19cedc bug fixes and unit tests for single index implementation (CascadingRadium)
91be472 remove regex optimization (CascadingRadium)
9baf914 add default synonym sources (CascadingRadium)
dd692bf refactor code (CascadingRadium)
4d4440e alias path code (CascadingRadium)
7b86533 Presearch Code Refactor (CascadingRadium)
d58474f fix comment (CascadingRadium)
3908df3 Add ExtractFields API with unit test (CascadingRadium)
31973e0 bug fix (CascadingRadium)
2130f3c final fixes to alias query path (CascadingRadium)
5494ea1 Merge branch 'presearchRefactor' into synonyms (CascadingRadium)
59c3193 rebase (CascadingRadium)
cad9e79 fix bug (CascadingRadium)
f3d0ac5 optimization (CascadingRadium)
e3b1d5b bleve APIs (CascadingRadium)
b469373 minor fix (CascadingRadium)
67815ad bug fix (CascadingRadium)
55f6d4c make default_synonym_source omitempty (CascadingRadium)
1a6dd1e fix bugs (CascadingRadium)
ee71211 refactor bleve APIs (CascadingRadium)
a4d83ac add additional methods to interface (CascadingRadium)
1cf2bfd update interface name (CascadingRadium)
302147b go.mod update (CascadingRadium)
2971072 merge master (CascadingRadium)
0db25b4 Merge branch 'master' into synonyms (CascadingRadium)
e062cd7 reposition (CascadingRadium)
ccb4a71 Merge branch 'master' into synonyms (CascadingRadium)
41fc99e refactor (CascadingRadium)
0f11d73 minor fix (CascadingRadium)
c731844 test fix (CascadingRadium)
148d32a Bump up zap/v16 (abhinavdangeti)
@@ -0,0 +1,149 @@

```go
//  Copyright (c) 2024 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// 		http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

import (
	"reflect"

	"github.com/blevesearch/bleve/v2/analysis"
	"github.com/blevesearch/bleve/v2/size"
	index "github.com/blevesearch/bleve_index_api"
)

var reflectStaticSizeSynonymField int

func init() {
	var f SynonymField
	reflectStaticSizeSynonymField = int(reflect.TypeOf(f).Size())
}

const DefaultSynonymIndexingOptions = index.IndexField

type SynonymField struct {
	name              string
	analyzer          analysis.Analyzer
	options           index.FieldIndexingOptions
	input             []string
	synonyms          []string
	numPlainTextBytes uint64

	// populated during analysis
	synonymMap map[string][]string
}

func (s *SynonymField) Size() int {
	return reflectStaticSizeSynonymField + size.SizeOfPtr +
		len(s.name)
}

func (s *SynonymField) Name() string {
	return s.name
}

func (s *SynonymField) ArrayPositions() []uint64 {
	return nil
}

func (s *SynonymField) Options() index.FieldIndexingOptions {
	return s.options
}

func (s *SynonymField) NumPlainTextBytes() uint64 {
	return s.numPlainTextBytes
}

func (s *SynonymField) AnalyzedLength() int {
	return 0
}

func (s *SynonymField) EncodedFieldType() byte {
	return 'y'
}

func (s *SynonymField) AnalyzedTokenFrequencies() index.TokenFrequencies {
	return nil
}

func (s *SynonymField) Analyze() {
	var analyzedInput []string
	if len(s.input) > 0 {
		analyzedInput = make([]string, 0, len(s.input))
		for _, term := range s.input {
			analyzedTerm := analyzeSynonymTerm(term, s.analyzer)
			if analyzedTerm != "" {
				analyzedInput = append(analyzedInput, analyzedTerm)
			}
		}
	}
	analyzedSynonyms := make([]string, 0, len(s.synonyms))
	for _, syn := range s.synonyms {
		analyzedTerm := analyzeSynonymTerm(syn, s.analyzer)
		if analyzedTerm != "" {
			analyzedSynonyms = append(analyzedSynonyms, analyzedTerm)
		}
	}
	s.synonymMap = processSynonymData(analyzedInput, analyzedSynonyms)
}

func (s *SynonymField) Value() []byte {
	return nil
}

func (s *SynonymField) IterateSynonyms(visitor func(term string, synonyms []string)) {
	for term, synonyms := range s.synonymMap {
		visitor(term, synonyms)
	}
}

func NewSynonymField(name string, analyzer analysis.Analyzer, input []string, synonyms []string) *SynonymField {
	return &SynonymField{
		name:     name,
		analyzer: analyzer,
		options:  DefaultSynonymIndexingOptions,
		input:    input,
		synonyms: synonyms,
	}
}

func processSynonymData(input []string, synonyms []string) map[string][]string {
	var synonymMap map[string][]string
	if len(input) > 0 {
		// Map each term to the same list of synonyms.
		synonymMap = make(map[string][]string, len(input))
		for _, term := range input {
			synonymMap[term] = synonyms
		}
	} else {
		synonymMap = make(map[string][]string, len(synonyms))
		// Precompute a map where each synonym points to all other synonyms.
		for i, elem := range synonyms {
			synonymMap[elem] = make([]string, 0, len(synonyms)-1)
			for j, otherElem := range synonyms {
				if i != j {
					synonymMap[elem] = append(synonymMap[elem], otherElem)
				}
			}
		}
	}
	return synonymMap
}

func analyzeSynonymTerm(term string, analyzer analysis.Analyzer) string {
	tokenStream := analyzer.Analyze([]byte(term))
	if len(tokenStream) == 1 {
		return string(tokenStream[0].Term)
	}
	return ""
}
```
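
To make the behaviour of `Analyze` and `processSynonymData` easier to review, here is a minimal standalone sketch of the two mapping modes and the single-token rule from `analyzeSynonymTerm`. It is not part of the PR and does not use bleve: `analyzeFn`, `lowercaseWhitespace`, `keepSingleTokens`, and `buildSynonymMap` are hypothetical names, and the toy analyzer (lowercase, split on whitespace) is an assumption standing in for a real `analysis.Analyzer`.

```go
package main

import (
	"fmt"
	"strings"
)

// analyzeFn is a hypothetical stand-in for analysis.Analyzer: it maps a raw
// term to zero or more tokens.
type analyzeFn func(term string) []string

// lowercaseWhitespace is an assumed toy analyzer: lowercase the term and
// split it on whitespace.
func lowercaseWhitespace(term string) []string {
	return strings.Fields(strings.ToLower(term))
}

// keepSingleTokens applies the same rule as analyzeSynonymTerm in the diff:
// a term survives only if analysis yields exactly one token.
func keepSingleTokens(terms []string, analyze analyzeFn) []string {
	kept := make([]string, 0, len(terms))
	for _, term := range terms {
		if tokens := analyze(term); len(tokens) == 1 {
			kept = append(kept, tokens[0])
		}
	}
	return kept
}

// buildSynonymMap follows the two branches of processSynonymData.
func buildSynonymMap(input, synonyms []string) map[string][]string {
	if len(input) > 0 {
		// One-way mapping: every input term points at the same synonym list.
		m := make(map[string][]string, len(input))
		for _, term := range input {
			m[term] = synonyms
		}
		return m
	}
	// Equivalent synonyms: each term points at all the other terms.
	m := make(map[string][]string, len(synonyms))
	for i, elem := range synonyms {
		others := make([]string, 0, len(synonyms)-1)
		for j, other := range synonyms {
			if i != j {
				others = append(others, other)
			}
		}
		m[elem] = others
	}
	return m
}

func main() {
	// "United States" analyzes to two tokens, so it is dropped.
	input := keepSingleTokens([]string{"USA"}, lowercaseWhitespace)
	synonyms := keepSingleTokens([]string{"America", "United States"}, lowercaseWhitespace)
	fmt.Println(buildSynonymMap(input, synonyms))
	// prints: map[usa:[america]]

	// No input terms: all synonyms are treated as equivalent.
	equivalent := keepSingleTokens([]string{"fast", "quick", "rapid"}, lowercaseWhitespace)
	fmt.Println(buildSynonymMap(nil, equivalent))
	// prints: map[fast:[quick rapid] quick:[fast rapid] rapid:[fast quick]]
}
```

One consequence worth checking during review, assuming the sketch mirrors the diff faithfully: if every input term is dropped by the single-token rule, the analyzed input is empty and the equivalent-synonyms branch is taken, even though the caller supplied input terms.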