To track this issue: #5117 (comment)
Copied here:
- The partition sources differ between single-machine and distributed mode:
  - single machine: the data comes from memory;
  - distributed: the data is persisted to storage.
- File writer initialization also differs:
  - single machine: the writer is initialized in the `merge_partitions` method;
  - distributed: it is implemented via `init_writer_for_flat/pq/sq` for the different vector index types.
- The merger logic also differs, because the partition data sources differ.
Because the partition sources differ, it would be better to abstract them behind a `PartitionSource` trait. With that in place, we can introduce a `UnifiedPartitionMerger` that performs a general merge, and a `StorageWriterFactory` that creates the different writers.
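A rough sketch of what the `PartitionSource` abstraction could look like. Only the trait name comes from this proposal; the method names (`partition_ids`, `load`), the `InMemorySource` and `PersistedSource` types, the `merge_all` helper, and the use of plain row ids as partition data are all hypothetical, chosen to keep the example small:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one partition's data: just a list of row ids.
type PartitionData = Vec<u64>;

/// Hypothetical trait that hides where partition data comes from.
trait PartitionSource {
    /// Sorted ids of all partitions this source can provide.
    fn partition_ids(&self) -> Vec<u32>;
    /// Rows of one partition, or `None` if the source has no rows for it.
    fn load(&self, partition_id: u32) -> Option<PartitionData>;
}

/// Single-machine mode: partitions already live in memory.
struct InMemorySource {
    partitions: HashMap<u32, PartitionData>,
}

impl PartitionSource for InMemorySource {
    fn partition_ids(&self) -> Vec<u32> {
        let mut ids: Vec<u32> = self.partitions.keys().copied().collect();
        ids.sort_unstable();
        ids
    }

    fn load(&self, partition_id: u32) -> Option<PartitionData> {
        self.partitions.get(&partition_id).cloned()
    }
}

/// Distributed mode: partitions were persisted in pieces.
/// A Vec of (partition id, rows) pairs stands in for files in storage.
struct PersistedSource {
    files: Vec<(u32, PartitionData)>,
}

impl PartitionSource for PersistedSource {
    fn partition_ids(&self) -> Vec<u32> {
        let mut ids: Vec<u32> = self.files.iter().map(|(id, _)| *id).collect();
        ids.sort_unstable();
        ids.dedup();
        ids
    }

    fn load(&self, partition_id: u32) -> Option<PartitionData> {
        // Concatenate every persisted piece that belongs to this partition.
        let rows: Vec<u64> = self
            .files
            .iter()
            .filter(|(id, _)| *id == partition_id)
            .flat_map(|(_, rows)| rows.iter().copied())
            .collect();
        if rows.is_empty() { None } else { Some(rows) }
    }
}

/// Merge logic that only depends on the trait, not on the data origin.
fn merge_all(source: &dyn PartitionSource) -> Vec<(u32, PartitionData)> {
    source
        .partition_ids()
        .into_iter()
        .filter_map(|id| source.load(id).map(|rows| (id, rows)))
        .collect()
}
```

With this shape, the same `merge_all` call works for both modes, which is the point of the abstraction: the merger no longer needs a separate code path per data source.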
For a merger, the generic logic can be split into four common steps:
- create merger;
- instantiate merger;
- merger#merge();
- write final metadata;
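The four steps above might map onto code roughly as follows. This is a sketch under assumptions, not the actual implementation: `UnifiedPartitionMerger` and `StorageWriterFactory` are the names proposed here, but every method (`new`, `init`, `merge`, `finish`, `create`), the `StorageWriter` trait, the `IndexKind` enum, and the `RecordingWriter` that only logs what it would write are hypothetical. "Instantiate merger" is read here as "set the merger up with its writer".

```rust
/// Hypothetical index kinds, mirroring the flat/pq/sq writers mentioned above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IndexKind {
    Flat,
    Pq,
    Sq,
}

/// Hypothetical writer interface; a real writer would target storage.
trait StorageWriter {
    fn write_partition(&mut self, partition_id: u32, rows: &[u64]);
    fn write_metadata(&mut self, num_partitions: usize);
    fn log(&self) -> &[String];
}

/// Test double that records what it was asked to write.
struct RecordingWriter {
    kind: IndexKind,
    log: Vec<String>,
}

impl StorageWriter for RecordingWriter {
    fn write_partition(&mut self, partition_id: u32, rows: &[u64]) {
        self.log
            .push(format!("{:?}: partition {} ({} rows)", self.kind, partition_id, rows.len()));
    }

    fn write_metadata(&mut self, num_partitions: usize) {
        self.log
            .push(format!("{:?}: metadata for {} partitions", self.kind, num_partitions));
    }

    fn log(&self) -> &[String] {
        &self.log
    }
}

/// Creates the writer for a given index kind in one place.
struct StorageWriterFactory;

impl StorageWriterFactory {
    fn create(kind: IndexKind) -> Box<dyn StorageWriter> {
        Box::new(RecordingWriter { kind, log: Vec::new() })
    }
}

struct UnifiedPartitionMerger {
    partitions: Vec<(u32, Vec<u64>)>,
    writer: Option<Box<dyn StorageWriter>>,
}

impl UnifiedPartitionMerger {
    /// Step 1: create the merger.
    fn new(partitions: Vec<(u32, Vec<u64>)>) -> Self {
        Self { partitions, writer: None }
    }

    /// Step 2: set the merger up with a writer for the index kind.
    fn init(&mut self, kind: IndexKind) {
        self.writer = Some(StorageWriterFactory::create(kind));
    }

    /// Step 3: write every partition; returns how many were written.
    fn merge(&mut self) -> Result<usize, String> {
        let writer = self.writer.as_mut().ok_or("init must be called before merge")?;
        for (id, rows) in &self.partitions {
            writer.write_partition(*id, rows);
        }
        Ok(self.partitions.len())
    }

    /// Step 4: write the final metadata and return the writer's log.
    fn finish(&mut self) -> Result<Vec<String>, String> {
        let writer = self.writer.as_mut().ok_or("init must be called before finish")?;
        writer.write_metadata(self.partitions.len());
        Ok(writer.log().to_vec())
    }
}
```

A caller would run `new`, `init`, `merge`, and `finish` in that order. Keeping writer creation inside the factory means the per-index-type setup is the only part that varies between flat, PQ, and SQ indexes.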