
perf: add explicit SIMD types and distance kernels for f64 #6540

Merged
BubbleCal merged 2 commits into lance-format:main from justinrmiller:jm/f64-simd-distance on Apr 16, 2026

@justinrmiller
Contributor

Summary

  • Adds f64x4 and f64x8 SIMD types to lance-linalg with support for x86_64 (AVX2/AVX-512), aarch64 (NEON), and loongarch64 (LASX)
  • Replaces auto-vectorization-dependent f64 distance functions with explicit SIMD using two-level unrolling (f64x8 + f64x4 + scalar tail)
  • Updates norm_l2, dot, L2, and cosine distance for f64
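
The two-level unrolling in the second bullet can be sketched as follows. This is a minimal, portable illustration under stated assumptions, not the lance-linalg code: plain fixed-size arrays stand in for the f64x8 and f64x4 registers, and the function name `dot_two_level` is hypothetical.

```rust
/// Dot product with two-level unrolling: an 8-lane main loop, at most one
/// 4-lane step, then a scalar tail. The arrays `acc8` and `acc4` are
/// stand-ins for f64x8 / f64x4 registers (hypothetical sketch).
fn dot_two_level(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "inputs must have equal length");
    let n = x.len();
    let mut i = 0;

    // Level 1: eight lanes per iteration (f64x8 stand-in).
    let mut acc8 = [0.0f64; 8];
    while i + 8 <= n {
        for lane in 0..8 {
            acc8[lane] += x[i + lane] * y[i + lane];
        }
        i += 8;
    }

    // Level 2: fewer than 8 elements remain, so at most one 4-lane step.
    let mut acc4 = [0.0f64; 4];
    if i + 4 <= n {
        for lane in 0..4 {
            acc4[lane] = x[i + lane] * y[i + lane];
        }
        i += 4;
    }

    // Horizontal reduction of both accumulators.
    let mut sum: f64 = acc8.iter().sum::<f64>() + acc4.iter().sum::<f64>();

    // Level 3: scalar tail for the last 0..=3 elements.
    while i < n {
        sum += x[i] * y[i];
        i += 1;
    }
    sum
}
```

With a 15-element input, the 8-lane loop, one 4-lane step and a 3-element scalar tail all run. Because the lanes accumulate independently, the summation order differs from a plain left-to-right loop, so results on non-integer data can differ in the last bits.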

Benchmark Results (Apple M-series, aarch64 NEON)

1M vectors × 1024 dimensions:

| Benchmark | Before | After | Change |
|---|---|---|---|
| NormL2(f64, auto-vec) | 117.76 ms | 116.04 ms | ~same |
| NormL2(f64, SIMD) | N/A (TODO) | 119.16 ms | new |
| Dot(f64, auto-vec) | 129.36 ms | 130.23 ms | ~same |
| L2(f64, auto-vec) | 132.53 ms | 135.15 ms | ~same |
| Cosine(f64, auto-vec) | 202.52 ms | 139.23 ms | -31.4% |

The biggest win is cosine distance. f64 previously had an empty `impl Cosine for f64 {}`, so it fell back to the scalar path. The explicit SIMD implementation cuts the benchmark time by about 31% (202.52 ms to 139.23 ms).
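
Cosine distance reduces three quantities over the same pair of vectors (x·y, |x|² and |y|²), so one pass can accumulate all of them together. A minimal sketch, assuming the usual definition 1 - (x·y) / (|x| |y|) and using 8-lane arrays in place of real SIMD registers; `cosine_distance` is a hypothetical name, not the lance-linalg API.

```rust
/// Cosine distance 1 - (x.y) / (|x| |y|), computed in a single pass that
/// accumulates x.y, |x|^2 and |y|^2 side by side in 8-lane arrays
/// (array stand-ins for SIMD registers; hypothetical sketch).
fn cosine_distance(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "inputs must have equal length");
    const LANES: usize = 8;
    let mut dot = [0.0f64; LANES];
    let mut xx = [0.0f64; LANES];
    let mut yy = [0.0f64; LANES];

    let xc = x.chunks_exact(LANES);
    let yc = y.chunks_exact(LANES);
    // `remainder` borrows the underlying slices, not the iterators.
    let (x_tail, y_tail) = (xc.remainder(), yc.remainder());

    // Main loop: all three reductions share each load of a and b.
    for (a, b) in xc.zip(yc) {
        for lane in 0..LANES {
            dot[lane] += a[lane] * b[lane];
            xx[lane] += a[lane] * a[lane];
            yy[lane] += b[lane] * b[lane];
        }
    }

    // Horizontal reductions, then the scalar tail.
    let mut d: f64 = dot.iter().sum();
    let mut nx: f64 = xx.iter().sum();
    let mut ny: f64 = yy.iter().sum();
    for (a, b) in x_tail.iter().zip(y_tail) {
        d += a * b;
        nx += a * a;
        ny += b * b;
    }
    1.0 - d / (nx.sqrt() * ny.sqrt())
}
```

An all-zero input yields NaN here; this sketch does not special-case it.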

For norm_l2, dot, and L2, LLVM's auto-vectorization with the LANES=8 hint was already producing good code on this platform, so those timings are essentially unchanged. The explicit SIMD versions are meant to keep performance consistent across compilers and platforms instead of relying on fragile auto-vectorization hints.
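
For comparison, the auto-vectorization-dependent style looks roughly like this: a const generic lane count sizes an array of independent accumulators, and the compiler is trusted to turn the inner loop into vector instructions. This is an illustrative sketch of the pattern, not the previous lance-linalg source; `l2_squared_auto_vec` is a hypothetical name, and it computes the squared Euclidean distance (no square root).

```rust
/// Squared L2 distance in the "lane hint" style: the const generic LANES
/// sizes an array of independent accumulators and the compiler is left to
/// vectorize the inner loop. No SIMD types or intrinsics are used.
/// Panics if LANES == 0 (chunks_exact rejects a zero chunk size).
fn l2_squared_auto_vec<const LANES: usize>(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "inputs must have equal length");
    let xc = x.chunks_exact(LANES);
    let yc = y.chunks_exact(LANES);

    // Elements left after the last full chunk, handled as a scalar tail.
    let tail: f64 = xc
        .remainder()
        .iter()
        .zip(yc.remainder())
        .map(|(a, b)| (a - b) * (a - b))
        .sum();

    // One accumulator per lane keeps the additions independent; without
    // this, strict floating-point ordering would block vectorization.
    let mut acc = [0.0f64; LANES];
    for (a, b) in xc.zip(yc) {
        for lane in 0..LANES {
            let d = a[lane] - b[lane];
            acc[lane] += d * d;
        }
    }
    acc.iter().sum::<f64>() + tail
}
```

Calling it as `l2_squared_auto_vec::<8>(&x, &y)` mirrors the LANES=8 hint. Whether the loop actually becomes vector code depends on the compiler version, target features and optimization level, which is the fragility the explicit SIMD types are meant to remove.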

Test plan

  • All 59 lance-linalg tests pass
  • Clippy clean (-D warnings)
  • cargo fmt clean
  • CI passes on all platforms

🤖 Generated with Claude Code

Adds f64x4 (256-bit) and f64x8 (512-bit / 2x256-bit) SIMD types with
full support for x86_64 (AVX2/AVX-512), aarch64 (NEON), and
loongarch64 (LASX). Replaces auto-vectorization-dependent f64 distance
functions with explicit SIMD implementations using two-level unrolling
(f64x8 main loop + f64x4 remainder + scalar tail).
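
The f64x8 type above is described as either one 512-bit register or a pair of 256-bit registers. A portable sketch of the second layout, in which an 8-lane type is just two 4-lane halves and every operation is applied to both; `F64x4`, `F64x8` and their methods are hypothetical array-backed stand-ins, not the lance-linalg definitions.

```rust
use std::ops::{Add, Mul};

/// Stand-in for a 256-bit register holding four f64 lanes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F64x4([f64; 4]);

impl F64x4 {
    /// Broadcast one value into every lane.
    fn splat(v: f64) -> Self {
        F64x4([v; 4])
    }
    /// Load the first four elements of `s` (panics if `s` is shorter).
    fn load(s: &[f64]) -> Self {
        F64x4([s[0], s[1], s[2], s[3]])
    }
    /// Horizontal sum of the four lanes.
    fn reduce_sum(self) -> f64 {
        self.0.iter().sum()
    }
}

impl Add for F64x4 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        F64x4(std::array::from_fn(|i| self.0[i] + rhs.0[i]))
    }
}

impl Mul for F64x4 {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        F64x4(std::array::from_fn(|i| self.0[i] * rhs.0[i]))
    }
}

/// Eight f64 lanes stored as two 256-bit halves (the "2x256-bit" layout).
#[derive(Clone, Copy, Debug, PartialEq)]
struct F64x8(F64x4, F64x4);

impl F64x8 {
    fn splat(v: f64) -> Self {
        F64x8(F64x4::splat(v), F64x4::splat(v))
    }
    /// Load the first eight elements of `s` (panics if `s` is shorter).
    fn load(s: &[f64]) -> Self {
        F64x8(F64x4::load(&s[..4]), F64x4::load(&s[4..8]))
    }
    /// Add the halves lane-wise, then reduce the remaining four lanes.
    fn reduce_sum(self) -> f64 {
        (self.0 + self.1).reduce_sum()
    }
}

impl Add for F64x8 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        F64x8(self.0 + rhs.0, self.1 + rhs.1)
    }
}

impl Mul for F64x8 {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        F64x8(self.0 * rhs.0, self.1 * rhs.1)
    }
}
```

On a target with 512-bit registers, the same interface could wrap a single register instead, and code written against it would not need to change.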

Functions updated:
- norm_l2 for f64 (was TODO in benchmarks)
- dot product for f64
- L2 distance for f64
- cosine distance for f64 (was empty impl, falling back to scalar)
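
On x86_64, a 256-bit f64x4 maps onto AVX registers. A minimal sketch of an explicit-SIMD norm_l2 using `std::arch` intrinsics, with runtime feature detection and a scalar fallback. The function names are hypothetical, the sketch assumes norm_l2 means sqrt(sum of squares), and it uses plain AVX multiply and add rather than whatever instruction mix the PR actually emits.

```rust
/// L2 norm sqrt(sum x_i^2): AVX path on x86_64 when the CPU supports it,
/// scalar fallback everywhere else (hypothetical sketch).
fn norm_l2_sketch(x: &[f64]) -> f64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed at runtime just above.
            return unsafe { norm_l2_avx(x) };
        }
    }
    norm_l2_scalar(x)
}

fn norm_l2_scalar(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum::<f64>().sqrt()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn norm_l2_avx(x: &[f64]) -> f64 {
    use std::arch::x86_64::{
        _mm256_add_pd, _mm256_loadu_pd, _mm256_mul_pd, _mm256_setzero_pd, _mm256_storeu_pd,
    };
    let chunks = x.chunks_exact(4);
    let tail = chunks.remainder();
    let mut lanes = [0.0f64; 4];
    // SAFETY: the caller guarantees AVX; each load reads exactly four
    // in-bounds f64 values and the store writes exactly four into `lanes`.
    unsafe {
        let mut acc = _mm256_setzero_pd();
        for c in chunks {
            // Unaligned load of four consecutive f64 values.
            let v = _mm256_loadu_pd(c.as_ptr());
            acc = _mm256_add_pd(acc, _mm256_mul_pd(v, v));
        }
        // Spill the four lanes for a horizontal sum.
        _mm256_storeu_pd(lanes.as_mut_ptr(), acc);
    }
    let mut sum: f64 = lanes.iter().sum();
    // Scalar tail for the last 0..=3 elements.
    for v in tail {
        sum += v * v;
    }
    sum.sqrt()
}
```

Dispatching through `is_x86_feature_detected!` keeps one binary usable on CPUs without AVX; the `unsafe` call is sound only because the feature check runs first.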

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 77.19298% with 78 lines in your changes missing coverage. Please review.

| File with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-linalg/src/simd/f64.rs | 66.52% | 78 Missing ⚠️ |


BubbleCal merged commit c913ff8 into lance-format:main on Apr 16, 2026
29 checks passed