perf: in-register lookup table & SIMD for 4bit PQ #3178
BubbleCal merged 30 commits into lance-format:main
Conversation
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #3178      +/-   ##
==========================================
- Coverage   78.62%   78.51%    -0.12%
==========================================
  Files         243      244        +1
  Lines       82889    83213      +324
  Branches    82889    83213      +324
==========================================
+ Hits        65170    65331      +161
- Misses      14933    15099      +166
+ Partials     2786     2783        -3
```
```rust
// let qmax = distance_table
//     .chunks(NUM_CENTROIDS)
//     .tuple_windows()
//     .map(|(a, b)| {

// Quantize the distance table to u8
// returns quantized_distance_table
// used for only 4bit PQ so num_centroids must be 16
```
Can you add a comment about what the returns are?
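For context, here is a hedged scalar sketch of what a `quantize_distance_table` as described in the comments above might return; the function name and exact rounding are assumptions, not the actual lance implementation:

```rust
// Sketch only: quantize an f32 distance table into u8 so 4-bit PQ can use
// byte-sized lookups. Returns (qmin, qmax, quantized_table); qmin and qmax
// are kept so the accumulated u8 distances can be mapped back to f32 later.
fn quantize_distance_table(table: &[f32]) -> (f32, f32, Vec<u8>) {
    let qmin = table.iter().copied().fold(f32::INFINITY, f32::min);
    let qmax = table.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Guard against a constant table to avoid dividing by zero.
    let scale = if qmax > qmin { 255.0 / (qmax - qmin) } else { 0.0 };
    let quantized = table
        .iter()
        .map(|&d| ((d - qmin) * scale).round() as u8)
        .collect();
    (qmin, qmax, quantized)
}
```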
```diff
 let pq_codes = UInt8Array::from_iter_values(pq_codes);
 let transposed_codes = transpose(&pq_codes, num_vectors, num_sub_vectors);
-let distances = compute_l2_distance(
+let distances = compute_pq_distance(
```
compute_l2_distance and compute_dot_distance are the same, so keep only one; the difference is in building the distance table.
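The `transpose` call in the hunk above reorders the codes so that each sub-vector's codes become contiguous, which is what lets one SIMD lookup serve many vectors at once. A minimal scalar sketch, assuming a vector-major input layout and sub-vector-major output (not the actual lance implementation, which operates on Arrow arrays):

```rust
// Sketch: transpose PQ codes from [num_vectors x num_sub_vectors]
// (vector-major) to [num_sub_vectors x num_vectors] (sub-vector-major),
// so all codes for one sub-vector are contiguous for SIMD lookups.
fn transpose(codes: &[u8], num_vectors: usize, num_sub_vectors: usize) -> Vec<u8> {
    let mut out = vec![0u8; codes.len()];
    for v in 0..num_vectors {
        for s in 0..num_sub_vectors {
            out[s * num_vectors + v] = codes[v * num_sub_vectors + s];
        }
    }
    out
}
```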
```rust
#[derive(Clone, Copy)]
pub struct u8x16(pub __m128i);

/// 16 of 32-bit `f32` values. Use 512-bit SIMD if possible.
```
```rust
}

#[inline]
pub fn right_shift_4(self) -> Self {
```
Is this API compatible with portable_simd?

I didn't see a bit-shifting operation in portable_simd.
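For illustration, the scalar equivalent of what `right_shift_4` is for: each packed byte holds two 4-bit PQ codes, the low nibble comes out with a `0x0F` mask, and a 4-bit right shift exposes the high nibble. This is a sketch of the nibble semantics, not the SIMD implementation (the function name is hypothetical):

```rust
// Sketch: split each packed byte into its two 4-bit PQ codes.
// Low nibble = b & 0x0F, high nibble = b >> 4; the SIMD version does the
// same lane-wise across 16 bytes at once.
fn split_nibbles(packed: &[u8]) -> (Vec<u8>, Vec<u8>) {
    let low: Vec<u8> = packed.iter().map(|&b| b & 0x0F).collect();
    let high: Vec<u8> = packed.iter().map(|&b| b >> 4).collect();
    (low, high)
}
```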
```rust
}
#[cfg(target_arch = "loongarch64")]
unsafe {
    Self(lasx_xvfrsh_b(self.0, 4))
```
Huh, you figured out how to use loongarch?

lol no, there's no way to test it; let me remove all the loongarch code for u8x16.
```rust
unsafe {
    Self(vandq_u8(self.0, vdupq_n_u8(mask)))
}
#[cfg(target_arch = "loongarch64")]
```
Shall we always have a fallback for the non-SIMD route?
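A scalar fallback keeps the code correct on targets without the cfg-gated intrinsics, and the compiler can often auto-vectorize it anyway. A hedged sketch of what such a fallback for the lane-wise AND could look like (the name and array-based signature are assumptions for illustration):

```rust
// Sketch: scalar fallback with the same lane-wise semantics as
// vandq_u8 / _mm_and_si128, compilable on any target.
fn bitwise_and_fallback(lanes: &[u8; 16], mask: u8) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, l) in out.iter_mut().zip(lanes.iter()) {
        *o = l & mask; // identical result to the SIMD lane-wise AND
    }
    out
}
```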
```rust
fn reduce_min(&self) -> u8 {
    #[cfg(target_arch = "x86_64")]
    unsafe {
        let low = _mm_and_si128(self.0, _mm_set1_epi8(0xFF_u8 as i8));
```
Is this only using SSE? Curious whether there is AVX2-related code that could make this even faster.

I didn't find an AVX2 intrinsic to do this, but reduce_min is not used for now.

Let's just delete reduce_sum and reduce_min if they are not used.
```rust
#[cfg(target_arch = "aarch64")]
unsafe {
    Self(vminq_u8(self.0, rhs.0))
}
```
Let's always have a fallback route.
```rust
#[case(4, DistanceType::L2, 0.9)]
#[case(4, DistanceType::Cosine, 0.9)]
#[case(4, DistanceType::Dot, 0.8)]
#[case(4, DistanceType::L2, 0.75)]
```
You mentioned the new algorithm can have decent recall? Should we bump this up?
```rust
let (qmin, qmax, distance_table) = quantize_distance_table(distance_table);
let num_vectors = code.len() * 2 / num_sub_vectors;
let mut distances = vec![0.0_f32; num_vectors];
// store the distances in u32 to avoid overflow
```
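Given the `(qmin, qmax, distance_table)` tuple above, the u32 accumulator can be mapped back to an f32 distance at the end. A sketch under the assumption that each u8 entry encodes `(d - qmin) * 255 / (qmax - qmin)` (the function name is hypothetical); accumulating in u32 matters because a sum over many sub-vectors of values up to 255 overflows u8 and can overflow u16:

```rust
// Sketch: dequantize a summed u8 distance back to f32. Each u8 entry is
// (d - qmin) * 255 / (qmax - qmin), so summing num_sub_vectors entries
// and inverting the affine map recovers an approximate f32 distance.
fn dequantize_sum(sum: u32, qmin: f32, qmax: f32, num_sub_vectors: usize) -> f32 {
    qmin * num_sub_vectors as f32 + sum as f32 * (qmax - qmin) / 255.0
}
```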
```rust
debug_assert_eq!(dist_table.as_array(), origin_dist_table.as_array());

// compute next distances
let next_indices = vec_indices.right_shift_4();
```
Should we just implement Shr for u8x16? This interface looks weird.
```rust
fn shuffle(&self, indices: u8x16) -> Self {
    #[cfg(target_arch = "x86_64")]
    unsafe {
        Self(_mm_shuffle_epi8(self.0, indices.0))
```
Would it be faster if we used _mm256_shuffle_epi8 (u8x32)? https://doc.rust-lang.org/beta/core/arch/x86_64/fn._mm256_shuffle_epi8.html

Yeah, I believe so. I chose u8x16 because it fits in an ARM register.

We could implement u8x32 as two 128-bit registers on ARM. In general this can speed up older x86 CPUs a lot, similar to https://github.com/lancedb/lance/blob/main/rust/lance-linalg/src/simd/f32.rs#L462

Doable, but let's do this in the next PR? It would require changing the computation logic as well, because there are only 16 centroids in distance_table for each sub-vector.
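To make the in-register lookup concrete: with 16 centroids per sub-vector, the entire u8 distance table for one sub-vector fits in a single 128-bit register, and `_mm_shuffle_epi8` then performs 16 table lookups in one instruction. A scalar model of that one shuffle (illustrative only; the real code does this lane-wise in a register):

```rust
// Sketch: scalar model of _mm_shuffle_epi8 used as a 16-entry table
// lookup. `table` is one sub-vector's quantized distance table (16
// centroids); `codes` are sixteen 4-bit PQ codes, one per vector.
fn lookup16(table: &[u8; 16], codes: &[u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &c) in out.iter_mut().zip(codes.iter()) {
        // Only the low 4 bits select a table entry, as in 4-bit PQ.
        *o = table[(c & 0x0F) as usize];
    }
    out
}
```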
```rust
.into_iter()
.zip(distances.iter_mut())
.for_each(|(d, sum)| {
    *sum += d as f32;
```
No, `sum` is `distances[i]`.
4-bit PQ is 3x faster than before:

Posting the 8-bit PQ results here for comparison; in short, 4-bit PQ is about 2x faster with the same index params: