perf: merge partitions in stream style #5754
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
PR Review: perf: merge partitions in stream style

This PR improves memory efficiency by streaming partition loads during merge operations instead of loading all partitions upfront. The claimed 25% reduction in peak memory is meaningful for large indices.

P0/P1 Issues

1. (P1) Edge case in single-partition path for lazy loading

In the single-partition branch (lines 225-242 in the new code), partitions are copied without being loaded first. The code accesses `part.store`:

```rust
part.store
    .copy_index_file(&token_file_path(part.id), self.dest_store)
    .await?;
```

This is correct, but ensure that …

2. (P1) Potential off-by-one in …

In …:

```rust
next_id: max_id + 1,
```

Then in …:

```rust
self.builder = Some(InnerBuilder::new(
    self.next_id,
    with_position,
    self.token_set_format,
));
self.next_id += 1;
```

And in …:

```rust
let next_id = builder.id() + 1;
// ...
self.next_id = next_id + 1; // This sets next_id to builder.id() + 2
```

After flush, …

Minor Observations (not blocking)
Overall, this is a solid memory optimization. Please verify the …
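To make the ID arithmetic in issue 2 concrete, here is a small, self-contained Rust model of the three quoted snippets. It is a sketch under assumptions, not the lance code: `Merger`, `start_builder`, `flush`, `assigned_ids`, and the `skip_extra` toggle are hypothetical names, `InnerBuilder` is reduced to its ID (the real constructor also takes `with_position` and `token_set_format`), and it assumes nothing else consumes the ID `builder.id() + 1` during a flush. Under that assumption, the quoted `self.next_id = next_id + 1` leaves a one-ID gap after every flush.

```rust
/// Hypothetical stand-in for the real builder; only its ID matters here.
struct InnerBuilder {
    id: u64,
}

impl InnerBuilder {
    fn new(id: u64) -> Self {
        Self { id }
    }

    fn id(&self) -> u64 {
        self.id
    }
}

/// Hypothetical model of the merger's partition-ID bookkeeping.
struct Merger {
    next_id: u64,
    builder: Option<InnerBuilder>,
    /// IDs of the partitions that were flushed, in order.
    flushed: Vec<u64>,
}

impl Merger {
    /// Mirrors `next_id: max_id + 1`.
    fn new(max_id: u64) -> Self {
        Self { next_id: max_id + 1, builder: None, flushed: Vec::new() }
    }

    /// Mirrors the quoted snippet: hand out `next_id`, then advance it.
    fn start_builder(&mut self) {
        self.builder = Some(InnerBuilder::new(self.next_id));
        self.next_id += 1;
    }

    /// Mirrors the quoted flush logic; `skip_extra` toggles the extra `+ 1`.
    fn flush(&mut self, skip_extra: bool) {
        if let Some(builder) = self.builder.take() {
            self.flushed.push(builder.id());
            let next_id = builder.id() + 1;
            self.next_id = if skip_extra { next_id + 1 } else { next_id };
        }
    }
}

/// Start and flush `n` builders in turn, returning the IDs they received.
fn assigned_ids(max_id: u64, n: usize, skip_extra: bool) -> Vec<u64> {
    let mut merger = Merger::new(max_id);
    for _ in 0..n {
        merger.start_builder();
        merger.flush(skip_extra);
    }
    merger.flushed
}

fn main() {
    // As quoted: each flush skips one ID.
    println!("{:?}", assigned_ids(9, 3, true)); // [10, 12, 14]
    // Without the extra `+ 1`, IDs stay contiguous.
    println!("{:?}", assigned_ids(9, 3, false)); // [10, 11, 12]
}
```

If the real flush path reserves `builder.id() + 1` for something else, the extra increment is intentional. The model only shows what the quoted lines do in isolation.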
Codecov Report: ❌ Patch coverage is …
This avoids reading all token sets and doc sets into memory and reduces the peak memory footprint by 25%.

related: lance-format#5754

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
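The idea in this commit message, not materializing every partition's token set and doc set before merging, can be illustrated with a toy Rust model that contrasts eager and stream-style merging. Everything here is hypothetical and not a lance API: `PartitionRef`, `Loaded`, `MemTracker`, `merge_eager`, and `merge_streaming` are illustrative names, each partition's data is reduced to a plain token list merged into a `BTreeSet`, and "memory" is approximated by counting how many decoded partitions are alive at the same time.

```rust
use std::cell::Cell;
use std::collections::BTreeSet;

/// Hypothetical counter of decoded partitions alive at once (a stand-in for memory use).
#[derive(Default)]
struct MemTracker {
    live: Cell<usize>,
    peak: Cell<usize>,
}

/// A partition decoded into memory; its token list stands in for token/doc sets.
struct Loaded<'a> {
    tokens: Vec<String>,
    tracker: &'a MemTracker,
}

impl Drop for Loaded<'_> {
    // Releasing a decoded partition lowers the live count.
    fn drop(&mut self) {
        self.tracker.live.set(self.tracker.live.get() - 1);
    }
}

/// A cheap handle to an on-disk partition; data is only materialized by `load`.
struct PartitionRef {
    tokens: Vec<&'static str>,
}

impl PartitionRef {
    fn load<'a>(&self, tracker: &'a MemTracker) -> Loaded<'a> {
        let live = tracker.live.get() + 1;
        tracker.live.set(live);
        tracker.peak.set(tracker.peak.get().max(live));
        Loaded {
            tokens: self.tokens.iter().map(|t| t.to_string()).collect(),
            tracker,
        }
    }
}

/// Eager style: decode every partition up front, then merge.
fn merge_eager(parts: &[PartitionRef], tracker: &MemTracker) -> BTreeSet<String> {
    let loaded: Vec<Loaded<'_>> = parts.iter().map(|p| p.load(tracker)).collect();
    let mut merged = BTreeSet::new();
    for part in &loaded {
        merged.extend(part.tokens.iter().cloned());
    }
    merged // all decoded partitions in `loaded` are only freed when this returns
}

/// Stream style: decode, merge, and drop one partition at a time.
fn merge_streaming(parts: &[PartitionRef], tracker: &MemTracker) -> BTreeSet<String> {
    let mut merged = BTreeSet::new();
    for p in parts {
        let part = p.load(tracker);
        merged.extend(part.tokens.iter().cloned());
        // `part` is dropped at the end of this iteration, before the next load.
    }
    merged
}

fn main() {
    let parts = vec![
        PartitionRef { tokens: vec!["apple", "banana"] },
        PartitionRef { tokens: vec!["banana", "cherry"] },
        PartitionRef { tokens: vec!["date"] },
    ];
    let eager = MemTracker::default();
    let streaming = MemTracker::default();
    // Both strategies produce the same merged token set.
    assert_eq!(merge_eager(&parts, &eager), merge_streaming(&parts, &streaming));
    // prints "eager peak = 3, streaming peak = 1"
    println!("eager peak = {}, streaming peak = {}", eager.peak.get(), streaming.peak.get());
}
```

The model only shows why peak usage stops scaling with the number of partitions held at once; it does not reproduce the measured 25% figure, which depends on the real index data.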