
Migrate inverted index partitions into index segments #6291

@westonpace


Inverted indexes today are built around the concept of partitions. Training may create several partitions. When new data is added, we build a partition from the new data and add it to the index. At search time we search the partitions in parallel, normalize the scores, and then take the top-k across all partitions.
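To make the search flow above concrete, here is a minimal Python sketch, not the actual implementation. The in-memory `Partition` type, the function names, and in particular the normalization (dividing by each partition's top score) are assumptions made for illustration; the issue does not say how scores are normalized.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a partition: term -> {row_id: raw_score}.
Partition = dict[str, dict[int, float]]


def search_partition(partition: Partition, term: str, k: int) -> list[tuple[int, float]]:
    """Return up to k (row_id, raw_score) hits from a single partition."""
    hits = partition.get(term, {})
    return heapq.nlargest(k, hits.items(), key=lambda hit: hit[1])


def normalize(hits: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Placeholder normalization (an assumption): scale by the partition's top score."""
    if not hits:
        return []
    top = max(score for _, score in hits)
    if top == 0:
        return hits
    return [(row_id, score / top) for row_id, score in hits]


def search(partitions: list[Partition], term: str, k: int) -> list[tuple[int, float]]:
    """Search all partitions in parallel, normalize per partition, merge to a global top-k."""
    with ThreadPoolExecutor() as pool:
        per_partition = list(pool.map(lambda p: search_partition(p, term, k), partitions))
    merged = [hit for hits in per_partition for hit in normalize(hits)]
    return heapq.nlargest(k, merged, key=lambda hit: hit[1])
```

The point of the sketch is only the shape of the pipeline: each partition is searched independently, so the same fan-out and merge would apply if each partition were a segment.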

However, because partitions are not segments, they cannot stand on their own. We cannot create a new partition and commit it by itself. Instead we have to copy all the existing partitions and then add the new partition, which leads to a lot of write amplification during the training process. In addition, we've had to add special handling such as deleted_fragments because partitions don't have their own fragment bitmap, don't get remapped, etc. Finally, it would be difficult to build a distributed search architecture that works the same way for vector search and full-text search.
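A rough back-of-the-envelope sketch of the write amplification described above. It assumes, purely for illustration, that every commit adds exactly one partition and that all partitions are the same size; the function names are hypothetical.

```python
def partitions_written_copy_all(num_commits: int) -> int:
    """Current scheme: commit i copies the i - 1 existing partitions and writes 1 new one."""
    return sum(existing + 1 for existing in range(num_commits))


def partitions_written_segments(num_commits: int) -> int:
    """Segment scheme: each commit writes only its own new segment."""
    return num_commits


if __name__ == "__main__":
    for commits in (1, 10, 100):
        print(commits, partitions_written_copy_all(commits), partitions_written_segments(commits))
```

Under these assumptions the copy-all scheme writes n(n+1)/2 partitions over n commits, while independently committed segments write n.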

As a result, we should migrate from the "partitions" concept to the already established "segments" concept that exists in the table format today. Each partition can become a segment. This keeps the benefits partitions give us today (incremental additions and parallel search) without the downsides.
