chore: simplify dot implementation to use auto-vectorization by eddyxu · Pull Request #2645 · lance-format/lance

eddyxu · 2024-07-26T17:25:30Z

This change simplifies the f32 dot product so that it relies on compiler auto-vectorization, and makes that version as fast as the manually written SIMD implementation.
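A scalar loop that adds into a single `f32` accumulator usually does not auto-vectorize. Floating-point addition is not associative, so the compiler may not reorder that one running sum into SIMD lanes. A common way to make auto-vectorization work is to keep several independent partial sums. The sketch below illustrates that pattern under stated assumptions: it is not the code merged in this PR, `dot_f32` and `LANES` are hypothetical names, and whether it actually compiles to SIMD depends on target flags such as `-C target-cpu=native`.

```rust
/// Hypothetical auto-vectorization-friendly dot product (not the Lance code).
pub fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
    // Assumed lane count: 16 independent accumulators. Because each lane is
    // summed separately, the compiler may map them onto SIMD registers.
    const LANES: usize = 16;
    assert_eq!(x.len(), y.len(), "dot_f32: slices must have equal length");

    let x_chunks = x.chunks_exact(LANES);
    let y_chunks = y.chunks_exact(LANES);

    // Elements left over after the last full chunk, handled as a scalar loop.
    let tail: f32 = x_chunks
        .remainder()
        .iter()
        .zip(y_chunks.remainder())
        .map(|(a, b)| a * b)
        .sum();

    // One partial sum per lane; zipping iterators avoids per-element bounds checks.
    let mut acc = [0.0f32; LANES];
    for (xc, yc) in x_chunks.zip(y_chunks) {
        for ((s, a), b) in acc.iter_mut().zip(xc).zip(yc) {
            *s += a * b;
        }
    }

    // Horizontal reduction of the lane accumulators, plus the scalar tail.
    acc.iter().sum::<f32>() + tail
}
```

Because the summation order differs from a naive left-to-right loop, results can differ from it in the last bits for general inputs.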

Run the benchmarks with:

```
export RUSTFLAGS="-C target-cpu=native"
git checkout main
cargo bench --bench dot -- --save-baseline dot_main f32
git checkout lei/simplify_dot
cargo bench --bench dot -- --baseline dot_main f32
```

On MacBook M2 Max

```
Dot(f32, auto-vectorization)
                        time:   [88.812 ms 89.654 ms 90.306 ms]
                        change: [-2.5819% -1.6876% -0.6964%] (p = 0.01 < 0.10)
                        Change within noise threshold.
```

AMD 5900X

```
Dot(f32, auto-vectorization)
                        time:   [172.50 ms 176.41 ms 179.41 ms]
                        change: [-2.3545% +0.6133% +3.5448%] (p = 0.69 > 0.10)
                        No change in performance detected.
```

Intel Sapphire

```
Dot(f32, auto-vectorization)
                        time:   [331.36 ms 331.62 ms 331.93 ms]
                        change: [-2.3160% -1.1226% -0.3451%] (p = 0.04 < 0.10)
                        Change within noise threshold.
```

Graviton3

```
Benchmarking Dot(f32, auto-vectorization): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.8s or enable flat sampling.
Dot(f32, auto-vectorization)
                        time:   [160.62 ms 160.70 ms 160.76 ms]
                        change: [-1.1157% -0.6868% -0.2951%] (p = 0.00 < 0.10)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
```
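The verdict lines follow from the printed numbers. As we understand criterion's comparison logic, a change is only considered when p is below the significance level (0.10 here, as printed), and it is reported as an improvement or regression only when the whole confidence interval of the change lies beyond the noise threshold; otherwise it is "within noise threshold". The sketch below restates that rule and checks it against the four runs above. `classify` and `Verdict` are hypothetical names, and the 1% noise threshold is criterion's default, assumed rather than read from this benchmark's configuration.

```rust
/// Hypothetical restatement of how criterion labels a baseline comparison.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// `ci` is the (lower, upper) bound of the relative change in mean time,
/// as fractions: -0.025819 means -2.5819%.
fn classify(p_value: f64, ci: (f64, f64), significance: f64, noise: f64) -> Verdict {
    // Not statistically significant: no further classification.
    if p_value >= significance {
        return Verdict::NoChange;
    }
    let (lower, upper) = ci;
    if upper < -noise {
        // Entire interval is faster than the noise band.
        Verdict::Improved
    } else if lower > noise {
        // Entire interval is slower than the noise band.
        Verdict::Regressed
    } else {
        // Interval overlaps [-noise, +noise].
        Verdict::WithinNoise
    }
}
```

For example, the M2 Max run is significant (p = 0.01), but its upper bound of -0.6964% is inside the assumed ±1% band, so it is labeled "within noise threshold" rather than an improvement.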

codecov-commenter · 2024-07-26T19:42:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.73%. Comparing base (3edfa50) to head (62228e5).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2645      +/-   ##
==========================================
- Coverage   79.81%   79.73%   -0.08%     
==========================================
  Files         224      224              
  Lines       65871    65827      -44     
  Branches    65871    65827      -44     
==========================================
- Hits        52572    52489      -83     
- Misses      10225    10256      +31     
- Partials     3074     3082       +8
```

Flag	Coverage Δ
unittests	`79.73% <100.00%> (-0.08%)`	⬇️

Flags with carried forward coverage won't be shown.
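The coverage percentages can be reproduced from the hit, miss, and partial counts. Codecov computes coverage as hits / (hits + misses + partials), so partially covered lines count against the ratio. Assuming its default `precision: 2` and `round: down` settings, the sketch below recovers the reported 79.81% (main), 79.73% (head), and -0.08% change. `coverage_hundredths` is a hypothetical helper name.

```rust
/// Coverage in hundredths of a percent, rounded down (7973 means 79.73%).
/// Hypothetical helper; partial lines are counted as not covered.
fn coverage_hundredths(hits: u64, misses: u64, partials: u64) -> u64 {
    let lines = hits + misses + partials;
    // Integer division truncates, matching a "round down" policy.
    hits * 10_000 / lines
}
```

Working in integer hundredths avoids floating-point rounding surprises when truncating to two decimals.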



github-actions Bot added the chore label Jul 26, 2024

eddyxu marked this pull request as ready for review July 27, 2024 19:16

broccoliSpicy approved these changes Jul 27, 2024


eddyxu force-pushed the lei/simplify_dot branch 2 times, most recently from d2fbabd to e33a1c5 on July 29, 2024 15:35

github-actions Bot added the Stale label Nov 6, 2025

simplify dot implemetation

4efe649

wjones127 force-pushed the lei/simplify_dot branch from 62228e5 to 3480166 on November 10, 2025 18:42

fix duration import

3b0a32e

wjones127 force-pushed the lei/simplify_dot branch from 3480166 to 3b0a32e on November 10, 2025 18:54

wjones127 approved these changes Nov 10, 2025


wjones127 merged commit 96cfdf2 into main Nov 10, 2025
30 of 31 checks passed

wjones127 deleted the lei/simplify_dot branch November 10, 2025 19:52

andrea-reale mentioned this pull request Mar 30, 2026

emilk/fix write starvation rerun-io/lance#12

Closed

wjones127 merged 2 commits into main from lei/simplify_dot

4 participants
