
perf: replace flatmap in build_distance_table #5898

Merged: LuQQiu merged 2 commits into lance-format:main from wkalt:perf/replace-flatmap-build-distance-table on Feb 5, 2026

perf: replace flatmap in build_distance_table#5898
LuQQiu merged 2 commits intolance-format:mainfrom
wkalt:perf/replace-flatmap-build-distance-table

Conversation

@wkalt (Contributor) commented Feb 5, 2026

This replaces a pattern in build_distance_table where we unnecessarily destructure batches of distances using flat_map; instead, the batches are appended directly to a preallocated vector.

@wkalt (Contributor, Author) commented Feb 5, 2026

50% benchmark improvement for 128d f32 vectors. The gain may be even larger for higher-dimensional vectors:

[~/work/sophon/src/lance] (detached v2.0.0-rc.4) $ cargo bench -p lance-index --bench pq_dist_table -- "construct_dist_table" --baseline before
warning: lance-linalg@2.0.0-rc.4: fp16kernels feature is not enabled, skipping build of fp16 kernels
warning: lance-linalg@2.0.0-rc.4: fp16kernels feature is not enabled, skipping build of fp16 kernels
   Compiling lance-index v2.0.0-rc.4 (/mnt/work/home/wyatt/work/sophon/src/lance/rust/lance-index)
    Finished `bench` profile [optimized + debuginfo] target(s) in 27.62s
     Running benches/pq_dist_table.rs (target/release/deps/pq_dist_table-c24171fd2fc66882)
construct_dist_table: l2,PQ=16,DIM=128,type=f16
                        time:   [26.269 µs 26.313 µs 26.369 µs]
                        change: [-17.611% -17.457% -17.305%] (p = 0.00 < 0.10)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

construct_dist_table: dot,PQ=16,DIM=128,type=f16
                        time:   [29.001 µs 29.047 µs 29.083 µs]
                        change: [-14.388% -14.179% -13.987%] (p = 0.00 < 0.10)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

construct_dist_table: l2,PQ=16,DIM=128,type=f32
                        time:   [7.5369 µs 7.5457 µs 7.5548 µs]
                        change: [-52.469% -52.383% -52.298%] (p = 0.00 < 0.10)
                        Performance has improved.

construct_dist_table: dot,PQ=16,DIM=128,type=f32
                        time:   [12.795 µs 12.815 µs 12.830 µs]
                        change: [-32.818% -32.661% -32.524%] (p = 0.00 < 0.10)
                        Performance has improved.

construct_dist_table: l2,PQ=16,DIM=128,type=f64
                        time:   [7.4224 µs 7.4262 µs 7.4294 µs]
                        change: [-41.902% -41.804% -41.715%] (p = 0.00 < 0.10)
                        Performance has improved.

construct_dist_table: dot,PQ=16,DIM=128,type=f64
                        time:   [11.138 µs 11.150 µs 11.163 µs]
                        change: [-30.590% -30.377% -30.131%] (p = 0.00 < 0.10)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

@BubbleCal (Contributor) left a comment

Impressive!

I'm a little surprised that flat_map slows this down.

@wjones127 (Contributor) left a comment

This is worrying. The fact that this improves performance suggests exact_size() isn't working as expected. We use that in more places than this, so I think it's worth examining why.

pub struct ExactSize<I> {
    inner: I,
    size: usize,
}

Specifying exact_size should be exactly equivalent to pre-sizing the vector: size_hint() should propagate through .collect() and be used to presize the allocation.
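One plausible mechanism for the discrepancy (a standalone sketch, not the lance code): `flat_map` cannot know the total length of its flattened output up front, so its `size_hint()` lower bound collapses to 0 for an unstarted iterator, and `collect()` cannot preallocate exactly even if a wrapper further up the chain reported an exact size.

```rust
fn main() {
    // 16 batches of 256 distances each, standing in for per-sub-vector tables.
    let batches = vec![vec![1.0f32; 256]; 16];

    // The outer slice iterator reports an exact size.
    let outer = batches.iter().size_hint();
    println!("outer size_hint: {:?}", outer);

    // flat_map only knows the bounds of inner iterators it has already
    // created; before iteration starts, its lower bound is 0 and the upper
    // bound is unknown, so collect() may begin with an empty allocation
    // and grow repeatedly.
    let flat = batches.iter().flat_map(|b| b.iter()).size_hint();
    println!("flat_map size_hint: {:?}", flat);

    assert_eq!(flat, (0, None));
}
```

If the real pipeline wraps the `flat_map` in an exact-size adapter, that adapter's hint would need to survive every subsequent combinator to reach `collect()`; any intermediate step that discards it would reproduce the slow path observed here.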

@wkalt (Contributor, Author) commented Feb 5, 2026

@BubbleCal @wjones127 What tipped me off was that 32% of my CPU profile was in extend_desugared. I think that does indeed indicate the size is not propagating.

@wjones127 (Contributor) commented
> what tipped me off was 32% of my CPU profile in extend_desugared.

I'm not doubting the result of your benchmark. The improvement is good.

But that means that .exact_size() has a bug, and it would be worth fixing. If you don't want to do that now, we can open a follow-up issue.

@wkalt (Contributor, Author) commented Feb 5, 2026

Oh yeah, I agree with you. I wasn't intending to take this patch further myself, but I agree it's worth investigating. FWIW, I think the overhead is not just allocation; the iteration flat_map does likely has other costs too.

@codecov codecov bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@LuQQiu merged commit 97157a5 into lance-format:main on Feb 5, 2026
30 checks passed