
perf: use 8KB buffer for local ObjectWriter #5907

Closed

wkalt wants to merge 1 commit into lance-format:main from wkalt:task/local-writer-smaller-buffer

Conversation

@wkalt
Contributor

@wkalt wkalt commented Feb 7, 2026

We pick the buffer size for object writers according to caller configuration, or default to 5MB in order to guarantee a multipart write in object storage. For local storage, the 5MB buffer is not applicable and can be wasteful if many writers are open simultaneously. We encounter that situation during the shuffle stage of an IVF-PQ index build.

Change the object writer to use an 8KB buffer when the object store in use is local.
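The selection logic described above can be sketched as follows. This is an illustrative sketch, not Lance's actual API: the function and constant names are invented for the example.

```rust
/// Minimum part size that object stores such as S3 require for
/// multipart uploads; using it as the buffer size guarantees each
/// flushed part is a valid multipart write.
const MULTIPART_MIN_PART_SIZE: usize = 5 * 1024 * 1024; // 5 MB

/// A small buffer suffices for local filesystems, which have no
/// multipart minimum and where the OS page cache absorbs small writes.
const LOCAL_BUFFER_SIZE: usize = 8 * 1024; // 8 KB

/// Hypothetical helper: pick the write-buffer capacity for a writer.
fn buffer_capacity(is_local: bool, configured: Option<usize>) -> usize {
    match configured {
        // Caller configuration always wins.
        Some(size) => size,
        // Otherwise choose a default suited to the store.
        None if is_local => LOCAL_BUFFER_SIZE,
        None => MULTIPART_MIN_PART_SIZE,
    }
}

fn main() {
    assert_eq!(buffer_capacity(true, None), 8 * 1024);
    assert_eq!(buffer_capacity(false, None), 5 * 1024 * 1024);
    assert_eq!(buffer_capacity(true, Some(1 << 20)), 1 << 20);
    println!("ok");
}
```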

@wkalt
Contributor Author

wkalt commented Feb 7, 2026

[chart: progress]

@westonpace I see a promising improvement in the memory usage pattern during an IVF-PQ index build. This relates to some of the work you are doing.

I don't yet fully understand why the previous code grows from peak to peak, though.

@codecov

codecov Bot commented Feb 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@westonpace
Member

I'm surprised this has much impact at all as the file writer is going to do its own buffering and shouldn't really be sending small writes to the object writer in the first place. I don't think we can easily get rid of the file writer's default buffering as we want to avoid tiny pages for read performance reasons.

However, we could override the file writer's default buffering in the shuffler when we open the file writers because we don't care that much about read performance (since we are only going to read it once to write the final index).

is_local: bool,
) -> Bytes {
let new_capacity = if is_local {
8 * 1024 // 8 KB for local filesystem
Member

Is this going to chop up large writes into tiny 8KiB writes? From a syscall perspective that may not be the best approach. We should probably just send the entire buffer to the OS?
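A minimal sketch of the pattern being suggested (invented names, not Lance code): flush whatever is buffered and hand any write at least as large as the buffer straight to the underlying sink in one call, rather than copying it through the small buffer in slices. This is also how `std::io::BufWriter` behaves for oversized writes.

```rust
use std::io::{self, Write};

/// Hypothetical writer with a small userspace buffer that forwards
/// large writes directly instead of chopping them up.
struct SmallBufWriter<W: Write> {
    inner: W,
    buf: Vec<u8>,
    capacity: usize,
}

impl<W: Write> SmallBufWriter<W> {
    fn new(inner: W, capacity: usize) -> Self {
        Self { inner, buf: Vec::with_capacity(capacity), capacity }
    }

    fn flush_buf(&mut self) -> io::Result<()> {
        if !self.buf.is_empty() {
            self.inner.write_all(&self.buf)?;
            self.buf.clear();
        }
        Ok(())
    }

    fn write(&mut self, data: &[u8]) -> io::Result<()> {
        if data.len() >= self.capacity {
            // Large write: flush pending bytes, then bypass the buffer
            // entirely so the OS sees one large write, not many small ones.
            self.flush_buf()?;
            self.inner.write_all(data)
        } else {
            if self.buf.len() + data.len() > self.capacity {
                self.flush_buf()?;
            }
            self.buf.extend_from_slice(data);
            Ok(())
        }
    }
}

fn main() -> io::Result<()> {
    let mut w = SmallBufWriter::new(Vec::new(), 8 * 1024);
    w.write(&[0u8; 1024])?; // small write: stays in the buffer
    assert_eq!(w.inner.len(), 0);
    w.write(&vec![1u8; 64 * 1024])?; // large write: flushed + passed through whole
    assert_eq!(w.inner.len(), 1024 + 64 * 1024);
    Ok(())
}
```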

Contributor Author

hmm, actually this is way worse than I thought. It is going to do a simulated multipart write on the local FS.

Contributor

I prototyped a specialized local writer here that doesn't do the multipart simulation. I didn't see an improvement in write throughput, so I set it aside, but feel free to play around with it: wjones127@7d7e30a
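The core idea of such a specialized local writer (a sketch of the concept, not the linked prototype) is to write straight to a file with a small userspace buffer and skip the simulated multipart upload entirely:

```rust
use std::fs::File;
use std::io::{BufWriter, Result, Write};
use std::path::Path;

/// Hypothetical constructor: a plain buffered file writer for local
/// storage, with no multipart simulation in between.
fn local_writer(path: &Path) -> Result<BufWriter<File>> {
    // 8 KB buffer batches tiny writes into reasonable syscalls;
    // BufWriter forwards writes larger than the buffer directly.
    Ok(BufWriter::with_capacity(8 * 1024, File::create(path)?))
}

fn main() -> Result<()> {
    let path = std::env::temp_dir().join("local_writer_demo.bin");
    let mut w = local_writer(&path)?;
    w.write_all(b"hello")?;
    w.flush()?;
    assert_eq!(std::fs::read(&path)?, b"hello");
    std::fs::remove_file(&path)?;
    Ok(())
}
```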

Contributor Author

this works perfectly 👍

@wkalt
Contributor Author

wkalt commented Feb 10, 2026

@westonpace here is what the heap dumps show:

main:

      flat  flat%        cum   cum%  function
20470.93MB  71.5% 20471.43MB  71.5%  lance_io::object_writer::ObjectWriter::new::{{closure}}
 4263.99MB  14.9%  4264.99MB  14.9%  alloc::boxed::Box<T>::new
 2131.86MB   7.4%  6846.11MB  23.9%  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<...
  811.36MB   2.8%   811.36MB   2.8%  irallocx_prof
  384.00MB   1.3%   384.00MB   1.3%  <lance_file::io::LanceEncodingsIo as lance_encoding::EncodingsIo>::submit_req...
  272.00MB   1.0%   272.00MB   1.0%  bytes::bytes_mut::BytesMut::with_capacity
  124.00MB   0.4%   124.00MB   0.4%  mallocx
   70.00MB   0.2%    70.00MB   0.2%  prost::message::Message::encode_to_vec
   19.50MB   0.1%   286.31MB   1.0%  lance_file::writer::FileWriter::write_page::{{closure}}
   12.50MB   0.0%    12.50MB   0.0%  alloc::sync::Arc<[T],A>::allocate_for_slice_in::{{closure}}
   10.00MB   0.0%    10.00MB   0.0%  lance_io::object_writer::ObjectWriter::next_part_buffer
    9.50MB   0.0% 26813.49MB  93.7%  <lance_index::vector::v3::shuffler::IvfShuffler as lance_index::vector::v3::s...
    9.00MB   0.0%     9.50MB   0.0%  alloc::boxed::Box<T,A>::try_new_uninit_in
    7.52MB   0.0%     7.52MB   0.0%  lance_encoding::data::encode_flat_data
    7.50MB   0.0%    15.00MB   0.1%  prost::encoding::message::merge_repeated
    4.34MB   0.0%     4.34MB   0.0%  <T as alloc::vec::spec_from_elem::SpecFromElem>::from_elem
    4.00MB   0.0%     4.00MB   0.0%  prost::encoding::<impl prost::encoding::sealed::BytesAdapter for alloc::vec::...
    2.50MB   0.0%     2.50MB   0.0%  prost::encoding::uint64::merge_repeated::{{closure}}
    1.50MB   0.0%     1.50MB   0.0%  hashbrown::raw::alloc::inner::do_alloc
    1.02MB   0.0%     1.02MB   0.0%  lance_table::utils::stream::apply_row_id_and_deletes

patch:

      flat  flat%        cum   cum%  function
 4222.48MB  52.3%  4225.48MB  52.3%  alloc::boxed::Box<T>::new
 2020.79MB  25.0%  6702.79MB  83.0%  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<...
  828.50MB  10.3%   828.50MB  10.3%  irallocx_prof
  384.00MB   4.8%   384.00MB   4.8%  <lance_file::io::LanceEncodingsIo as lance_encoding::EncodingsIo>::submit_req...
  272.00MB   3.4%   272.00MB   3.4%  bytes::bytes_mut::BytesMut::with_capacity
  149.50MB   1.9%   149.50MB   1.9%  mallocx
   92.01MB   1.1%    92.01MB   1.1%  prost::message::Message::encode_to_vec
   30.74MB   0.4%    30.74MB   0.4%  lance_io::object_writer::ObjectWriter::next_part_buffer
   20.50MB   0.3%   345.69MB   4.3%  lance_file::writer::FileWriter::write_page::{{closure}}
   13.50MB   0.2%    17.01MB   0.2%  prost::encoding::message::merge_repeated
   12.50MB   0.2%    12.50MB   0.2%  alloc::sync::Arc<[T],A>::allocate_for_slice_in::{{closure}}
    9.03MB   0.1%     9.03MB   0.1%  lance_encoding::data::encode_flat_data
    8.50MB   0.1%    10.50MB   0.1%  alloc::boxed::Box<T,A>::try_new_uninit_in
    2.50MB   0.0%     2.50MB   0.0%  calloc
    2.00MB   0.0%     2.00MB   0.0%  prost::encoding::<impl prost::encoding::sealed::BytesAdapter for alloc::vec::...
    1.60MB   0.0%     1.60MB   0.0%  lance_table::utils::stream::apply_row_id_and_deletes
    1.50MB   0.0%     1.50MB   0.0%  hashbrown::raw::alloc::inner::do_alloc
    1.00MB   0.0%     1.00MB   0.0%  prost::encoding::uint64::merge_repeated::{{closure}}
    1.00MB   0.0%     1.00MB   0.0%  <T as alloc::slice::<impl [T]>::to_vec_in::ConvertVec>::to_vec
    0.75MB   0.0%     0.75MB   0.0%  lance::io::exec::filtered_read::FilteredReadStream::plan_scan::{{closure}}::{...

Those allocations in main are all the 5MB buffers in ObjectWriter::new; there is one per partition (these are during IVF shuffle for a 100M row dataset).

I agree that reducing the page size to 8k is not good. Do you have any thoughts on how best to accomplish this?

> We should probably just send the entire buffer to the OS?

Change ObjectWriter to support that? or bypass ObjectWriter?

I now have a chart with all of my optimizations, with and without this change, so I can disentangle it from the other change that removes the buffer. The difference is much more significant and may indicate something that needs to be looked at.

[chart: progress]

There are 4096 partitions here, which is 20GB when multiplied by 5MB.

edit: There was some time between those last two thoughts... given the 20GB, this result actually seems reasonable (for the 5MB setting) and the initial allocation/increase is the buffers becoming resident.
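The arithmetic behind those numbers, spelled out (with one writer buffer per partition, peak buffer memory scales linearly in the partition count):

```rust
fn main() {
    let partitions: u64 = 4096;
    let mib: u64 = 1024 * 1024;
    // 5 MB default buffers: 4096 * 5 MiB = 20 GiB, matching the ~20470 MB
    // flat allocation seen in ObjectWriter::new in the main heap dump.
    assert_eq!(partitions * 5 * mib, 20 * 1024 * mib);
    // 8 KB local buffers: 4096 * 8 KiB = 32 MiB total.
    assert_eq!(partitions * 8 * 1024, 32 * mib);
    println!("ok");
}
```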

@wkalt
Contributor Author

wkalt commented Feb 11, 2026

superseded by #5939

@wkalt wkalt closed this Feb 11, 2026