Migrate inverted index partitions into index segments

Inverted indexes are currently organized into partitions.  Training may create several partitions.  When new data is added, we build a partition over the new data and add it to the index.  At search time we search the partitions in parallel, normalize the scores, and then take the top-k.
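To make the search flow above concrete, here is a minimal sketch in Python.  It is not the actual implementation: `Partition`, `search_index`, and the IDF-style normalization are hypothetical stand-ins, and the issue does not specify how scores are really normalized.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


class Partition:
    """Hypothetical toy partition: postings map a term to {row_id: term_frequency}."""

    def __init__(self, num_docs, postings):
        self.num_docs = num_docs
        self.postings = postings

    def search(self, term):
        # Return local matches plus the local doc count needed for normalization.
        return self.postings.get(term, {}), self.num_docs


def search_index(partitions, term, k):
    # 1. Search every partition in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: p.search(term), partitions))

    # 2. Normalize: weight local term frequencies by an IDF computed from
    #    index-wide counts (an assumed scheme, chosen only for illustration).
    total_docs = sum(num_docs for _, num_docs in results)
    doc_freq = sum(len(hits) for hits, _ in results)
    if doc_freq == 0:
        return []
    idf = math.log(1 + (total_docs - doc_freq + 0.5) / (doc_freq + 0.5))

    # 3. Merge all partitions and keep the global top-k by score.
    scored = (
        (tf * idf, row_id)
        for hits, _ in results
        for row_id, tf in hits.items()
    )
    return [(row_id, score) for score, row_id in heapq.nlargest(k, scored)]


partitions = [
    Partition(2, {"lance": {0: 1, 1: 3}}),
    Partition(2, {"lance": {2: 2}}),
]
top = search_index(partitions, "lance", k=2)
```

The point of the sketch is that no partition can rank its hits correctly alone: the merge step needs statistics from all of them before the top-k is taken.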

However, because partitions are not segments, they cannot stand on their own.  We cannot create a new partition and commit it by itself.  Instead we have to copy all the existing partitions and then add the new partition.  This leads to a lot of write amplification during the training process.  In addition, we've had to add special handling such as `deleted_fragments` because partitions don't have their own fragment bitmap, don't get remapped, etc.  Finally, it would be difficult to build a distributed search architecture that works the same way for vector search and full-text search.
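The write amplification can be estimated with a rough model.  The sketch below assumes equal-sized partitions committed one at a time, and that every commit in the current scheme rewrites all earlier partitions; `bytes_written` is a hypothetical helper, not part of the codebase.

```python
def bytes_written(num_partitions, partition_bytes, copy_existing):
    """Total bytes written after committing `num_partitions` one by one."""
    total = 0
    for existing in range(num_partitions):
        # The new partition is always written once.
        total += partition_bytes
        if copy_existing:
            # Current scheme (assumed): all earlier partitions are copied again.
            total += existing * partition_bytes
    return total


with_copy = bytes_written(10, 1, copy_existing=True)      # partitions today
standalone = bytes_written(10, 1, copy_existing=False)    # independent segments
```

Under these assumptions the copying scheme grows quadratically with the number of commits, while standalone segments grow linearly.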

As a result, we should migrate from the "partitions" concept to the already established "segments" concept that exists in the table format today.  Each partition can become a segment.  This keeps the benefits partitions give us today without the downsides described above.

Migrate inverted index partitions into index segments #6291

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Migrate inverted index partitions into index segments #6291

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions