@CascadingRadium CascadingRadium commented Oct 10, 2024

  • Add Thesaurus API to find equivalent terms for a given term.
  • Enable Synonym Document objects with Synonym Field objects to provide the synonym
    definitions used to create the thesaurus in the search index.
  • Add a synonym section to handle synonym document processing, persist the synonym index in
    segments (separate from the inverted and vector indexes), and manage synonym index
    merging during segment merges.
  • Add command line tooling to access the thesaurus by parsing the segment file.
  • Update zap.md to reflect the index file format for thesaurus support.
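
For readers new to the idea, a thesaurus lookup of the kind described in the first bullet can be sketched in a few lines of Go. This is only an in-memory illustration under stated assumptions: the `Thesaurus` type and its `AddEquivalent` and `Synonyms` methods are hypothetical names invented for this sketch, and they do not reflect the API or the on-disk segment format added by this pull request.

```go
package main

import (
	"fmt"
	"sort"
)

// Thesaurus maps a term to the set of terms treated as equivalent to it.
// Hypothetical type for illustration only; not the structure used in the PR.
type Thesaurus map[string]map[string]struct{}

// AddEquivalent records that every term in the group is a synonym of
// every other term in the same group.
func (t Thesaurus) AddEquivalent(terms ...string) {
	for _, a := range terms {
		for _, b := range terms {
			if a == b {
				continue
			}
			if t[a] == nil {
				t[a] = make(map[string]struct{})
			}
			t[a][b] = struct{}{}
		}
	}
}

// Synonyms returns the terms equivalent to term, sorted for stable output.
// An unknown term yields an empty slice.
func (t Thesaurus) Synonyms(term string) []string {
	out := make([]string, 0, len(t[term]))
	for s := range t[term] {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	th := Thesaurus{}
	th.AddEquivalent("quick", "fast", "rapid")
	fmt.Println(th.Synonyms("fast")) // prints [quick rapid]
}
```

A map of sets keeps the sketch short; a persisted thesaurus would more likely need a compact, mergeable encoding, which is what the synonym section in this change is concerned with.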

@CascadingRadium CascadingRadium added the enhancement New feature or request label Oct 10, 2024
@CascadingRadium CascadingRadium self-assigned this Oct 10, 2024
@CascadingRadium CascadingRadium changed the title add a thesaurus datatype with its own section Add a thesaurus datatype with its own section Oct 16, 2024
@CascadingRadium CascadingRadium marked this pull request as ready for review October 16, 2024 09:52
@CascadingRadium CascadingRadium removed the request for review from moshaad7 November 5, 2024 12:33
@CascadingRadium CascadingRadium changed the title Add a thesaurus datatype with its own section Add Thesaurus API and Synonym Index Handling in Search Dec 12, 2024

@abhinavdangeti abhinavdangeti left a comment

A few more comments.


@abhinavdangeti abhinavdangeti left a comment

Looks mostly ok to me. One comment around naming/commentary.


@abhinavdangeti abhinavdangeti left a comment

Thanks @CascadingRadium . Let's get this in for now and incrementally improve the area as and when needed.

@abhinavdangeti abhinavdangeti merged commit 82553cd into master Dec 19, 2024
6 checks passed
@abhinavdangeti abhinavdangeti deleted the synonyms branch December 19, 2024 16:04
abhinavdangeti added a commit to blevesearch/bleve that referenced this pull request Dec 19, 2024
- Allow setting up `synonym_sources` in the index mapping. Each source
follows its own ingest pipeline, ingesting special synonym definitions
through the `IndexSynonym()` API.
- A `synonym_source` can be set like an analyzer to a field mapping and
can be set as a default option at the document mapping or the index
mapping level.
- Each `synonym_source` can have its own analyzer, making it flexible to
allow for compatibility with the language analyzer specified for its
corresponding mapping.
- Compatible with every term-based query: the term is expanded to
include its synonyms at query time.
- Dependencies:
  - blevesearch/bleve_index_api@v1.2.0 - blevesearch/bleve_index_api#57
  - blevesearch/scorch_segment_api@v2.3.0 - blevesearch/scorch_segment_api#46
  - blevesearch/vellum@v1.1.0 - blevesearch/vellum#22
  - blevesearch/zapx@v16@latest - blevesearch/zapx#268

---------

Co-authored-by: Abhinav Dangeti <abhinav@couchbase.com>
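
The query-time expansion mentioned in the commit message can be pictured with a small Go sketch. It assumes a plain in-memory synonym table; `expandTerm` and `matches` are hypothetical helpers written for this illustration and are not bleve functions.

```go
package main

import "fmt"

// expandTerm returns the original term followed by its synonyms, without
// duplicates. The result is the set of terms an OR-style match would accept.
func expandTerm(term string, synonyms map[string][]string) []string {
	expanded := []string{term}
	seen := map[string]bool{term: true}
	for _, s := range synonyms[term] {
		if !seen[s] {
			seen[s] = true
			expanded = append(expanded, s)
		}
	}
	return expanded
}

// matches reports whether any document token equals one of the expanded terms.
func matches(docTokens, expanded []string) bool {
	accepted := make(map[string]bool, len(expanded))
	for _, t := range expanded {
		accepted[t] = true
	}
	for _, tok := range docTokens {
		if accepted[tok] {
			return true
		}
	}
	return false
}

func main() {
	synonyms := map[string][]string{"car": {"automobile", "vehicle"}}
	q := expandTerm("car", synonyms)
	fmt.Println(q)                                         // prints [car automobile vehicle]
	fmt.Println(matches([]string{"red", "automobile"}, q)) // prints true
}
```

Expanding at query time, as opposed to rewriting documents at index time, means the synonym definitions can change without reindexing the documents themselves.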

@CascadingRadium CascadingRadium commented

Thanks for merging @abhinavdangeti

