perf: replace flatmap in build_distance_table #5898
This replaces a pattern in build_distance_table where we unnecessarily destructured batches of distances using flat_map. Those batches are now appended directly to a preallocated vector.
50% benchmark improvement for 128d f32 vectors. The gain might be bigger for bigger vectors as well.
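As a rough illustration of the change described above, the sketch below contrasts the two patterns. It is not the Lance implementation: `batch_distances`, `build_with_flat_map`, and `build_with_extend` are hypothetical names, and the squaring step is only a stand-in for the real distance computation.

```rust
/// Hypothetical stand-in for computing one batch of distances.
fn batch_distances(batch: &[f32]) -> Vec<f32> {
    batch.iter().map(|x| x * x).collect()
}

/// Old pattern: flatten every batch element by element, then collect.
/// The final length is not known to `collect` up front.
fn build_with_flat_map(batches: &[Vec<f32>]) -> Vec<f32> {
    batches.iter().flat_map(|b| batch_distances(b)).collect()
}

/// New pattern: size the output once, then append each batch as a block.
fn build_with_extend(batches: &[Vec<f32>]) -> Vec<f32> {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    let mut out = Vec::with_capacity(total);
    for b in batches {
        // Copies the whole batch at once instead of pushing single items.
        out.extend_from_slice(&batch_distances(b));
    }
    out
}
```

Both functions return the same values in the same order. Only the way the output vector is allocated and filled differs.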
BubbleCal
left a comment
Impressive!
I'm a little bit surprised that flat_map slows this down.
This is worrying. The fact that this improves performance suggests exact_size() isn't working as expected. We use that in more places than this, so I think it's worth examining why.
lance/rust/lance-table/src/utils.rs
Lines 26 to 29 in aef1490
Specifying exact_size should be exactly equivalent to pre-sizing the vector. size_hint() should propagate to .collect() and be used to presize.
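To make the size_hint() point concrete, here is a minimal sketch of how a flattened iterator reports its length and how a wrapper can override that report. `WithExactSize`, `with_exact_size`, and `flattened` are hypothetical names written for this example. They are an assumption about the general technique, not the code in lance-table/src/utils.rs.

```rust
/// Hypothetical wrapper that reports a caller-supplied remaining length.
struct WithExactSize<I> {
    inner: I,
    remaining: usize,
}

impl<I: Iterator> Iterator for WithExactSize<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let item = self.inner.next();
        if item.is_some() {
            self.remaining = self.remaining.saturating_sub(1);
        }
        item
    }

    // `collect` reads the lower bound from here when reserving space.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}

fn with_exact_size<I: Iterator>(inner: I, size: usize) -> WithExactSize<I> {
    WithExactSize { inner, remaining: size }
}

/// A plain flat_map over slices of unknown length; before iteration starts
/// it cannot know how many items the inner iterators will yield.
fn flattened(batches: &[Vec<f32>]) -> impl Iterator<Item = f32> + '_ {
    batches.iter().flat_map(|b| b.iter().copied())
}
```

Calling `flattened(&batches).size_hint()` on a fresh iterator gives a lower bound of 0, while `with_exact_size(flattened(&batches), total).size_hint()` reports `total`. Whether the reported size actually saves work inside `collect` in a given build is the open question in this thread, and the sketch does not settle it.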
@BubbleCal @wjones127 what tipped me off was 32% of my CPU profile being spent in extend_desugared. I think that does indeed indicate the size is not propagating.
I'm not doubting the result of your benchmark. The improvement is good. But that means that
Codecov Report: ✅ All modified and coverable lines are covered by tests.
