chore: simplify dot implementation to use auto-vectorization #2645
Merged
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #2645 +/- ##
==========================================
- Coverage 79.81% 79.73% -0.08%
==========================================
Files 224 224
Lines 65871 65827 -44
Branches 65871 65827 -44
==========================================
- Hits 52572 52489 -83
- Misses 10225 10256 +31
- Partials 3074 3082 +8
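As a quick arithmetic check on the coverage diff, the sketch below recomputes both percentages. It assumes two things. First, coverage is Hits / Lines, with partials not counted as covered. Second, Codecov's default `round: down` at `precision: 2` applies. The helper names are hypothetical. Under these assumptions the PR branch comes out at 79.7378...%. Truncation gives the reported 79.73%, whereas round-to-nearest would give 79.74%.

```rust
/// Coverage in hundredths of a percent, truncated toward zero.
/// Assumption: this mirrors Codecov's default `round: down`, `precision: 2`.
pub fn coverage_hundredths(hits: u64, lines: u64) -> u64 {
    // Integer division floors for non-negative operands, so no float error.
    hits * 10_000 / lines
}

/// Formats hundredths of a percent as e.g. "79.81%".
pub fn format_pct(hundredths: u64) -> String {
    format!("{}.{:02}%", hundredths / 100, hundredths % 100)
}
```

With the figures from the table, `coverage_hundredths(52_572, 65_871)` gives 7981 (79.81%) and `coverage_hundredths(52_489, 65_827)` gives 7973 (79.73%). The difference of 8 hundredths matches the reported -0.08%.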
broccoliSpicy approved these changes on Jul 27, 2024
d2fbabd to e33a1c5
62228e5 to 3480166
3480166 to 3b0a32e
wjones127 approved these changes on Nov 10, 2025
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request on Jan 21, 2026
…ormat#2645)

This change makes the auto-vectorization version of dot(f32) as fast as manually written SIMD.

Run benchmarks via

```
export RUSTFLAGS="-C target-cpu=native"
git checkout main
cargo bench --bench dot -- --save-baseline dot_main f32
git checkout lei/simplify_dot
cargo bench --bench dot -- --baseline dot_main f32
```

On MacBook M2 Max

```
Dot(f32, auto-vectorization)
                        time:   [88.812 ms 89.654 ms 90.306 ms]
                        change: [-2.5819% -1.6876% -0.6964%] (p = 0.01 < 0.10)
                        Change within noise threshold.
```

AMD 5900X

```
Dot(f32, auto-vectorization)
                        time:   [172.50 ms 176.41 ms 179.41 ms]
                        change: [-2.3545% +0.6133% +3.5448%] (p = 0.69 > 0.10)
                        No change in performance detected.
```

Intel Sapphire

```
Dot(f32, auto-vectorization)
                        time:   [331.36 ms 331.62 ms 331.93 ms]
                        change: [-2.3160% -1.1226% -0.3451%] (p = 0.04 < 0.10)
                        Change within noise threshold.
```

Graviton3

```
Benchmarking Dot(f32, auto-vectorization): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.8s or enable flat sampling.
Dot(f32, auto-vectorization)
                        time:   [160.62 ms 160.70 ms 160.76 ms]
                        change: [-1.1157% -0.6868% -0.2951%] (p = 0.00 < 0.10)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
```
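The commit message does not include the new code. The minimal Rust sketch below illustrates the general technique the title names. Its names (`LANES`, `dot_naive`, `dot_lanes`) and the lane count of 16 are hypothetical and are not taken from Lance. The key point is that a single running `f32` sum fixes the order of additions. Because floating-point addition is not associative, the compiler generally cannot split that reduction across SIMD lanes on its own. Keeping several independent partial sums makes the regrouping explicit. LLVM can then map the inner loop onto vector multiply/add instructions without hand-written intrinsics.

```rust
/// Number of independent accumulators (hypothetical choice for illustration).
const LANES: usize = 16;

/// Naive dot product with one running sum. The strict left-to-right order of
/// f32 additions must be preserved, which usually blocks a vectorized reduction.
pub fn dot_naive(x: &[f32], y: &[f32]) -> f32 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Auto-vectorization-friendly dot product (sketch, not Lance's implementation).
pub fn dot_lanes(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len(), "dot requires equal-length inputs");
    let x_chunks = x.chunks_exact(LANES);
    let y_chunks = y.chunks_exact(LANES);

    // Elements that do not fill a whole chunk are summed with a scalar loop.
    let tail: f32 = x_chunks
        .remainder()
        .iter()
        .zip(y_chunks.remainder())
        .map(|(a, b)| a * b)
        .sum();

    // One partial sum per lane; each lane's additions are independent, so the
    // loop body can become a handful of SIMD fused/unfused multiply-adds.
    let mut acc = [0.0f32; LANES];
    for (xc, yc) in x_chunks.zip(y_chunks) {
        for i in 0..LANES {
            acc[i] += xc[i] * yc[i];
        }
    }

    // Combine the lanes, then add the scalar tail.
    acc.iter().sum::<f32>() + tail
}
```

Building with `RUSTFLAGS="-C target-cpu=native"`, as in the benchmark commands above, lets the compiler target the widest SIMD instructions the host supports. For general inputs, `dot_lanes` may differ from `dot_naive` in the last few bits because the additions are grouped differently.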