perf!: remove shuffle buffer #5912

Merged

westonpace merged 3 commits into lance-format:main from wkalt:task/remove-shuffle-buffer
Feb 10, 2026

Conversation

Contributor

@wkalt commented Feb 8, 2026

This removes a buffer in the shuffler that accumulated batches for batched writes to temporary storage. The buffer was configured with a public buffer_size parameter, hence the breaking change.

Previously, when we shuffled data we accumulated buffer_size batches for each partition in memory and then flushed them all to disk at once. This may have been intended as an optimization in the original implementation of the shuffler, which supported external shuffling through arbitrary object storage. However, the shuffler was subsequently hardcoded to use local disk (where this kind of buffering provides no benefit), and even on remote object storage we already have a layer of buffering in the storage writer.

Instead of buffering batches, just write them directly to the FileWriter. This results in much more predictable memory usage and also faster index builds.
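
For context, here is a minimal, hypothetical sketch of the two shapes. Batch, PartitionWriter, and write_batch are illustrative stand-ins, not the actual shuffler or FileWriter API: the old path buffered batches per partition and flushed in bulk, the new path writes each batch through immediately.

// Hypothetical sketch only, not the shuffler's actual code. Batch,
// PartitionWriter, and write_batch stand in for the real types.

struct Batch {
    num_rows: usize,
}

struct PartitionWriter {
    rows_written: usize,
}

impl PartitionWriter {
    fn write_batch(&mut self, batch: &Batch) {
        self.rows_written += batch.num_rows;
    }
}

// Before: accumulate batches per partition and flush in bulk, so peak
// memory grows with buffer_size * num_partitions.
fn shuffle_buffered(
    batches: Vec<(usize, Batch)>,
    writers: &mut [PartitionWriter],
    buffer_size: usize,
) {
    let mut buffers: Vec<Vec<Batch>> = (0..writers.len()).map(|_| Vec::new()).collect();
    for (part_id, batch) in batches {
        buffers[part_id].push(batch);
        if buffers[part_id].len() >= buffer_size {
            for b in buffers[part_id].drain(..) {
                writers[part_id].write_batch(&b);
            }
        }
    }
    // Flush whatever is left over at the end.
    for (part_id, buf) in buffers.iter_mut().enumerate() {
        for b in buf.drain(..) {
            writers[part_id].write_batch(&b);
        }
    }
}

// After: hand each batch straight to its partition's writer; any buffering
// that is actually needed happens inside the writer / storage layer.
fn shuffle_direct(batches: Vec<(usize, Batch)>, writers: &mut [PartitionWriter]) {
    for (part_id, batch) in batches {
        writers[part_id].write_batch(&batch);
    }
}

In the write-through shape, memory no longer scales with the buffer setting, which is what makes usage predictable.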

Contributor Author

wkalt commented Feb 8, 2026

[attached chart: "progress"]

Here is the result on an index build over 100M 384d vectors. Note that this chart still shows a slow leak over the course of the shuffle. I assume that's unrelated and haven't looked at it yet.


codecov Bot commented Feb 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Collaborator

@Xuanwo left a comment


Nice change!

Member

@westonpace left a comment


Nice work! The writers will already do their own buffering if they need to, so I agree this extra layer of buffering is not needed.

    if !batches.is_empty() {
        partition_sizes[part_id] += batches.iter().map(|b| b.num_rows()).sum::<usize>();
-       futs.push(writer.write_batches(batches.iter()));
+       writers[part_id].write_batches(batches.iter()).await?;
Member


We can do this in a follow-up, but it might be nice to still do all the writes in parallel, e.g. keep the futs Vec. shuffled is a Vec and not any kind of stream / iterator, so the data is all in memory already. (I think the important point is getting rid of the if counter % self.buffer_size == 0 check.)

let mut futs = vec![];
if !batches.is_empty() {
    futs.push(writers[part_id].write_batches(batches.iter()));
}
try_join_all(futs).await?;
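
For illustration, a rough sketch of how that per-partition fan-out might look once the buffer_size check is gone, using the futures crate's try_join_all. FileWriter, RecordBatch, and write_batches here are simplified stand-ins rather than the real lance types:

// Illustrative sketch only, not the merged change: collect one write future
// per non-empty partition, then drive them all concurrently. FileWriter,
// RecordBatch, and write_batches are simplified stand-ins for the real types.
use futures::future::try_join_all;

struct RecordBatch;
struct FileWriter;

impl FileWriter {
    async fn write_batches<'a>(
        &mut self,
        batches: impl Iterator<Item = &'a RecordBatch>,
    ) -> std::io::Result<usize> {
        Ok(batches.count())
    }
}

async fn flush_all_partitions(
    shuffled: &[Vec<RecordBatch>],   // already fully materialized in memory
    writers: &mut [FileWriter],      // one writer per partition
) -> std::io::Result<()> {
    // One future per non-empty partition; nothing runs until awaited.
    let futs: Vec<_> = writers
        .iter_mut()
        .zip(shuffled.iter())
        .filter(|(_, batches)| !batches.is_empty())
        .map(|(writer, batches)| writer.write_batches(batches.iter()))
        .collect();

    // Await all partition writes together instead of one at a time.
    try_join_all(futs).await?;
    Ok(())
}

The single await at the end lets the per-partition writes overlap without reintroducing any batch buffering.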

Contributor Author


thanks, updated

@westonpace merged commit 0b2c9e3 into lance-format:main on Feb 10, 2026
30 checks passed