perf: add explicit SIMD types and distance kernels for f64 by justinrmiller · Pull Request #6540 · lance-format/lance

justinrmiller · 2026-04-16T08:08:14Z

Summary

Adds f64x4 and f64x8 SIMD types to lance-linalg with support for x86_64 (AVX2/AVX-512), aarch64 (NEON), and loongarch64 (LASX)
Replaces auto-vectorization-dependent f64 distance functions with explicit SIMD using two-level unrolling (f64x8 + f64x4 + scalar tail)
Updates norm_l2, dot, L2, and cosine distance for f64
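The two-level unrolling described above can be sketched in portable Rust. This is only an illustration of the loop structure, not the PR's code: the function name dot_f64 is hypothetical, and plain [f64; 8] and [f64; 4] arrays stand in for the f64x8 and f64x4 types, whose actual API is not shown in this description.

```rust
/// Hypothetical sketch of a two-level unrolled dot product.
/// Plain arrays stand in for the f64x8 / f64x4 SIMD types (assumption).
pub fn dot_f64(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "inputs must have equal length");
    let n = x.len();

    // Level 1: main loop over blocks of 8 elements (the f64x8 role).
    let main_end = n - n % 8;
    let mut acc8 = [0.0f64; 8];
    for (a, b) in x[..main_end]
        .chunks_exact(8)
        .zip(y[..main_end].chunks_exact(8))
    {
        for i in 0..8 {
            acc8[i] += a[i] * b[i];
        }
    }
    let mut sum: f64 = acc8.iter().sum();

    // Level 2: at most one block of 4 elements (the f64x4 role).
    let mut pos = main_end;
    if n - pos >= 4 {
        let mut acc4 = [0.0f64; 4];
        for i in 0..4 {
            acc4[i] = x[pos + i] * y[pos + i];
        }
        sum += acc4.iter().sum::<f64>();
        pos += 4;
    }

    // Scalar tail: the remaining 0 to 3 elements.
    for i in pos..n {
        sum += x[i] * y[i];
    }
    sum
}
```

With a 13-element input, the main loop consumes 8 elements, the 4-wide step consumes 4, and the scalar tail handles the last one. Keeping independent per-lane accumulators is what lets a compiler or an explicit SIMD type map each block onto vector registers.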

Benchmark Results (Apple M-series, aarch64 NEON)

1M vectors × 1024 dimensions:

Benchmark	Before	After	Change
NormL2(f64, auto-vec)	117.76 ms	116.04 ms	~same
NormL2(f64, SIMD)	N/A (TODO)	119.16 ms	new
Dot(f64, auto-vec)	129.36 ms	130.23 ms	~same
L2(f64, auto-vec)	132.53 ms	135.15 ms	~same
Cosine(f64, auto-vec)	202.52 ms	139.23 ms	-31.4%

The biggest win is cosine distance. Previously, impl Cosine for f64 {} was empty, so f64 fell back to the scalar path. The explicit SIMD implementation cuts runtime by about 31% (202.52 ms to 139.23 ms).
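As a rough illustration of what a vectorized cosine kernel computes, the sketch below accumulates the three sums it needs (x·y, x·x, y·y) in a single pass with 4-wide lane accumulators. The name cosine_distance_f64 is hypothetical, the arrays stand in for the f64x4 type, and the definition of cosine distance as 1 minus cosine similarity is an assumption about the convention in use. It does not special-case zero-norm inputs.

```rust
/// Hypothetical single-pass cosine distance: 1 - (x.y) / (|x| * |y|).
/// [f64; 4] accumulators stand in for an f64x4 SIMD type (assumption).
pub fn cosine_distance_f64(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "inputs must have equal length");
    let n = x.len();
    let main_end = n - n % 4;

    // One accumulator per lane for each of the three running sums.
    let mut xy = [0.0f64; 4];
    let mut xx = [0.0f64; 4];
    let mut yy = [0.0f64; 4];
    for (a, b) in x[..main_end]
        .chunks_exact(4)
        .zip(y[..main_end].chunks_exact(4))
    {
        for i in 0..4 {
            xy[i] += a[i] * b[i];
            xx[i] += a[i] * a[i];
            yy[i] += b[i] * b[i];
        }
    }

    // Horizontal reduction of the lanes.
    let mut sxy: f64 = xy.iter().sum();
    let mut sxx: f64 = xx.iter().sum();
    let mut syy: f64 = yy.iter().sum();

    // Scalar tail for the remaining 0 to 3 elements.
    for i in main_end..n {
        sxy += x[i] * y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
    }

    // A zero-norm input yields NaN here; a real kernel may handle it differently.
    1.0 - sxy / (sxx.sqrt() * syy.sqrt())
}
```

Reading both inputs once and updating all three sums per block avoids separate passes for the dot product and the two norms.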

For norm_l2, dot, and L2, LLVM's auto-vectorization with the LANES=8 hint was already producing good code on this platform. The explicit SIMD ensures consistent performance across compilers and platforms rather than relying on fragile auto-vectorization hints.

Test plan

All 59 lance-linalg tests pass
Clippy clean (-D warnings)
cargo fmt clean
CI passes on all platforms

🤖 Generated with Claude Code

Adds f64x4 (256-bit) and f64x8 (512-bit / 2x256-bit) SIMD types with full support for x86_64 (AVX2/AVX-512), aarch64 (NEON), and loongarch64 (LASX).

Replaces auto-vectorization-dependent f64 distance functions with explicit SIMD implementations using two-level unrolling (f64x8 main loop + f64x4 remainder + scalar tail).

Functions updated:
- norm_l2 for f64 (was TODO in benchmarks)
- dot product for f64
- L2 distance for f64
- cosine distance for f64 (was empty impl, falling back to scalar)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-04-16T08:44:51Z

Codecov Report

❌ Patch coverage is 77.19298% with 78 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-linalg/src/simd/f64.rs	66.52%	78 Missing ⚠️


github-actions Bot added the performance label Apr 16, 2026

Merge branch 'main' into jm/f64-simd-distance

268bd03

BubbleCal approved these changes Apr 16, 2026


BubbleCal merged commit c913ff8 into lance-format:main Apr 16, 2026
29 checks passed

BubbleCal merged 2 commits into lance-format:main from justinrmiller:jm/f64-simd-distance