test: add regression test for ivf/pq search#5476

Merged
westonpace merged 1 commit intolance-format:mainfrom
westonpace:test/ivf-pq-regression
Jan 5, 2026
westonpace commented Dec 15, 2025

The test covers 5 parameters (cache, k, nprobes, refine factor, dataset size) for a total of 64 tests. It takes ~3-4 minutes to run on my system.
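
As a rough illustration of how five parameters can expand to 64 cases, the sketch below builds a full cross product with `itertools.product`. The parameter names follow the description above, but the specific values and the `expand_grid` helper are assumptions chosen only so that the counts multiply to 64; they are not taken from the test in this PR.

```python
import itertools

# Hypothetical value lists: the real test's values are not shown in this
# thread. These are chosen only so that 2 * 2 * 2 * 2 * 4 == 64.
PARAM_GRID = {
    "use_cache": [False, True],
    "k": [10, 100],
    "nprobes": [1, 20],
    "refine_factor": [None, 10],
    "num_rows": [10_000, 100_000, 1_000_000, 10_000_000],
}


def expand_grid(grid):
    """Return one dict per combination in the cross product of `grid`."""
    names = list(grid)
    return [
        dict(zip(names, values))
        for values in itertools.product(*grid.values())
    ]


cases = expand_grid(PARAM_GRID)
print(len(cases))
```

With pytest, each dict could be handed to `pytest.mark.parametrize` so that every combination is reported as its own test.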

There are other parameters (distance type, number of dimensions, PQ vs. SQ, etc.) but these should only really affect compute time and/or recall and are probably better tested in Rust benchmarks. Recall benchmarks will be done separately as they require real data.

Contributor

Two dimensions not tested here that seem worthwhile are:

  • num_bits in PQ (4-bit vs 8-bit)
  • data_type in the vector (f16 vs f32)

These settings trigger different code paths, which I think are important to cover in regression testing.

Some settings here, like k, don't really hit different code paths so much as change a variable within the same code path.
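
To make the `num_bits` point concrete, here is a minimal sketch of why 4-bit and 8-bit PQ usually end up on different code paths: with 8 bits each sub-vector code occupies a whole byte, while with 4 bits two codes share one byte and must be unpacked before the distance-table lookup. The function names and the nibble order are assumptions for illustration; this is not Lance's implementation.

```python
def pack_codes(codes, num_bits):
    """Pack per-sub-vector PQ codes into bytes (hypothetical helper)."""
    if num_bits not in (4, 8):
        raise ValueError("only 4-bit and 8-bit codes are sketched here")
    if any(not 0 <= c < (1 << num_bits) for c in codes):
        raise ValueError("code out of range for num_bits")
    if num_bits == 8:
        # One code per byte.
        return bytes(codes)
    if len(codes) % 2:
        raise ValueError("4-bit packing needs an even number of codes")
    # Two codes per byte: even-indexed code in the low nibble, odd-indexed
    # code in the high nibble (assumed order; real layouts vary).
    return bytes(
        codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2)
    )


def unpack_codes(packed, num_bits):
    """Inverse of pack_codes."""
    if num_bits == 8:
        return list(packed)
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes


def approx_distance(packed, dist_table, num_bits):
    """Sum dist_table[m][code] over sub-vectors m (the PQ table lookup)."""
    codes = unpack_codes(packed, num_bits)
    return sum(dist_table[m][code] for m, code in enumerate(codes))
```

The lookup itself is the same in both cases, but the 4-bit path needs the extra unpacking step and only 16 table entries per sub-vector instead of 256.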

Member Author

I think Rust-level benchmarks would capture those parameters sufficiently (and probably with more fidelity), since they primarily affect the compute-driven PQ search time. Testing them here would also require training completely new indexes, and index training is what dominates the benchmark time.

Member Author

We have pq_dist_table and 4bit_pq_dist_table, which cover num_bits and distance_type. However, these only look at f32. I'll open an issue to add data_type to these benchmarks (I want to review them anyway, since the lance-index benches don't pass reliably or finish quickly enough for me).

Member Author

@bench-bot benchmark

westonpace commented Dec 19, 2025

Benchmark Results for PR #5476

Commit: e026964
Baseline: Up to 20 most recent historical results per benchmark

Summary

  • Total benchmarks: 123
  • 🚀 Improvements: 0
  • ⚠️ Regressions: 0
  • Stable: 123
  • Insufficient data: 0

✅ All Benchmarks Within Normal Range

No benchmarks had |z-score| > 2.0
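
The report does not say how the z-score is computed. A common definition, assumed here, is (PR result minus baseline mean) divided by the baseline standard deviation, with a result flagged when |z| > 2.0. The function names and the choice of the sample standard deviation are assumptions, not bench-bot's actual code.

```python
import math
import statistics

Z_THRESHOLD = 2.0  # the |z-score| > 2.0 cutoff quoted in the report


def z_score(pr_result, baseline):
    """Standard score of pr_result against historical baseline timings."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # sample std dev (an assumption)
    if stdev == 0:
        # All baseline runs identical: any difference is infinitely many
        # standard deviations away.
        if pr_result == mean:
            return 0.0
        return math.copysign(math.inf, pr_result - mean)
    return (pr_result - mean) / stdev


def classify(pr_result, baseline, threshold=Z_THRESHOLD):
    """Label a timing result; lower is better, so z > 0 means slower."""
    if len(baseline) < 2:
        return "insufficient data"
    z = z_score(pr_result, baseline)
    if z > threshold:
        return "regression"
    if z < -threshold:
        return "improvement"
    return "stable"
```

Under this definition, a row such as Cosine(f32, auto-vectorized) (102.97 ms against a 153.39 ms mean, z = -1.19) would imply a baseline standard deviation of roughly 42 ms, which would explain why a sizeable drop in time is still reported as within the normal range.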

All Results

View all 123 benchmark results
Benchmark PR Result Baseline Mean Baseline N Z-Score Status
Cosine(f32, auto-vectorized) 102.97 ms 153.39 ms 20 -1.19
Cosine(f32, scalar) 681.69 ms 820.47 ms 20 -0.79
Cosine(f64, auto-vectorized) 290.17 ms 404.36 ms 20 -1.02
Cosine(f64, scalar) 691.42 ms 836.72 ms 20 -0.79
Cosine(half::bfloat::bf16, auto-vectorized) 268.18 ms 362.38 ms 20 -0.86
Cosine(half::bfloat::bf16, scalar) 7.25 s 9.15 s 20 -0.86
Cosine(half::binary16::f16, auto-vectorized) 127.35 ms 154.34 ms 20 -0.80
Cosine(half::binary16::f16, scalar) 6.28 s 7.79 s 20 -0.80
Cosine(simd,f32x8) rng seed 2.81 ms 3.73 ms 20 -0.93
Dot(bf16, auto-vectorization) 409.46 ms 546.63 ms 20 -0.83
Dot(f16, SIMD) 70.48 ms 91.43 ms 20 -0.95
Dot(f32, SIMD) 101.22 ms 144.37 ms 20 -1.14
Dot(f32, arrow_artiy) 637.93 ms 792.48 ms 20 -0.91
Dot(f32, auto-vectorization) 96.77 ms 150.78 ms 20 -1.45
Dot(f64, arrow_artiy) 637.73 ms 797.02 ms 20 -0.90
Dot(f64, auto-vectorization) 188.91 ms 287.17 ms 20 -1.22
Dot(half::binary16::f16, arrow_artiy) 728.39 ms 870.45 ms 20 -0.90
Dot(half::binary16::f16, auto-vectorization) 70.01 ms 92.24 ms 20 -1.09
L2(f32, auto-vectorization) 104.57 ms 149.21 ms 20 -1.03
L2(f32, scalar) 659.71 ms 804.28 ms 20 -0.84
L2(f32, simd) 104.29 ms 146.38 ms 20 -1.00
L2(f64, auto-vectorization) 198.41 ms 297.23 ms 20 -1.21
L2(f64, scalar) 656.48 ms 815.34 ms 20 -0.89
L2(half::binary16::f16, auto-vectorization) 75.95 ms 94.50 ms 20 -0.89
L2(half::binary16::f16, scalar) 2.82 s 3.56 s 20 -0.86
L2(simd,f32x8) 4.60 ms 5.62 ms 20 -0.84
L2(uint8, auto-vectorization) 54.67 ms 66.65 ms 20 -0.95
L2(uint8, scalar) 50.17 ms 61.07 ms 20 -1.10
NormL2(f32, SIMD) 97.66 ms 142.88 ms 20 -1.10
NormL2(f32, auto-vectorization) 136.33 ms 142.08 ms 20 -0.14
NormL2(f32, scalar) 659.87 ms 787.33 ms 20 -0.73
NormL2(f64, auto-vectorization) 264.31 ms 296.55 ms 20 -0.34
NormL2(f64, scalar) 670.35 ms 807.86 ms 20 -0.75
NormL2(half::bfloat::bf16, auto-vectorization) 261.19 ms 339.64 ms 20 -0.80
NormL2(half::bfloat::bf16, scalar) 2.79 s 3.32 s 20 -0.77
NormL2(half::binary16::f16, SIMD) 71.74 ms 84.97 ms 20 -0.57
NormL2(half::binary16::f16, auto-vectorization) 72.01 ms 84.56 ms 20 -0.54
NormL2(half::binary16::f16, scalar) 679.61 ms 810.05 ms 20 -0.73
argmin(arrow) 4.83 ms 4.64 ms 20 +1.02
decode_fsl/float32_128_v2.0_nullfalse 32.17 ms 48.35 ms 20 -0.83
decode_fsl/float32_128_v2.1_nullfalse 10.73 ms 15.65 ms 20 -1.96
decode_fsl/float32_128_v2.1_nulltrue 22.12 ms 32.22 ms 20 -1.61
decode_fsl/float32_16_v2.0_nullfalse 30.40 ms 48.20 ms 20 -0.91
decode_fsl/float32_16_v2.1_nullfalse 17.12 ms 24.29 ms 20 -1.22
decode_fsl/float32_16_v2.1_nulltrue 28.48 ms 37.70 ms 20 -1.00
decode_fsl/float32_32_v2.0_nullfalse 30.45 ms 48.07 ms 20 -0.90
decode_fsl/float32_32_v2.1_nullfalse 18.69 ms 24.28 ms 20 -0.96
decode_fsl/float32_32_v2.1_nulltrue 24.01 ms 34.03 ms 20 -1.24
decode_fsl/float32_4_v2.0_nullfalse 30.61 ms 48.43 ms 20 -0.91
decode_fsl/float32_4_v2.1_nullfalse 17.46 ms 25.27 ms 20 -1.43
decode_fsl/float32_4_v2.1_nulltrue 47.58 ms 63.80 ms 20 -1.08
decode_fsl/float32_64_v2.0_nullfalse 30.62 ms 48.08 ms 20 -0.90
decode_fsl/float32_64_v2.1_nullfalse 11.69 ms 16.29 ms 20 -1.23
decode_fsl/float32_64_v2.1_nulltrue 25.29 ms 33.65 ms 20 -1.32
decode_fsl/int8_128_v2.0_nullfalse 30.41 ms 48.36 ms 20 -0.92
decode_fsl/int8_128_v2.1_nullfalse 17.43 ms 25.29 ms 20 -1.49
decode_fsl/int8_128_v2.1_nulltrue 24.19 ms 33.89 ms 20 -1.18
decode_fsl/int8_16_v2.0_nullfalse 30.25 ms 48.20 ms 20 -0.92
decode_fsl/int8_16_v2.1_nullfalse 17.42 ms 25.00 ms 20 -1.39
decode_fsl/int8_16_v2.1_nulltrue 47.81 ms 63.93 ms 20 -1.13
decode_fsl/int8_32_v2.0_nullfalse 30.14 ms 48.15 ms 20 -0.93
decode_fsl/int8_32_v2.1_nullfalse 17.01 ms 24.92 ms 20 -1.46
decode_fsl/int8_32_v2.1_nulltrue 34.96 ms 47.09 ms 20 -1.10
decode_fsl/int8_4_v2.0_nullfalse 30.06 ms 48.17 ms 20 -0.94
decode_fsl/int8_4_v2.1_nullfalse 19.46 ms 26.52 ms 20 -1.23
decode_fsl/int8_4_v2.1_nulltrue 131.40 ms 174.89 ms 20 -1.09
decode_fsl/int8_64_v2.0_nullfalse 30.03 ms 48.54 ms 20 -0.94
decode_fsl/int8_64_v2.1_nullfalse 17.60 ms 24.32 ms 20 -1.17
decode_fsl/int8_64_v2.1_nulltrue 27.24 ms 37.85 ms 20 -1.19
decode_primitive/date32 11.51 ms 15.89 ms 20 -1.55
decode_primitive/date64 11.98 ms 15.97 ms 20 -1.38
decode_primitive/decimal128(10, 10) 14.50 ms 16.43 ms 20 -0.72
decode_primitive/decimal256(10, 10) 29.05 ms 34.60 ms 20 -1.18
decode_primitive/duration(second) 11.60 ms 16.97 ms 20 -1.29
decode_primitive/fixed-utf8 65.79 µs 95.95 µs 20 -1.20
decode_primitive/float16 13.44 ms 16.00 ms 20 -0.95
decode_primitive/float32 13.69 ms 16.34 ms 20 -0.89
decode_primitive/float64 14.20 ms 16.49 ms 20 -0.64
decode_primitive/int16 11.91 ms 15.82 ms 20 -1.33
decode_primitive/int32 13.20 ms 15.91 ms 20 -0.92
decode_primitive/int64 12.02 ms 16.24 ms 20 -1.37
decode_primitive/int8 12.06 ms 15.99 ms 20 -1.35
decode_primitive/struct 136.30 µs 186.10 µs 20 -1.39
decode_primitive/time32(second) 11.99 ms 16.39 ms 20 -1.68
decode_primitive/time64(nanosecond) 12.19 ms 16.37 ms 20 -1.53
decode_primitive/timestamp(nanosecond, none) 12.27 ms 16.25 ms 20 -1.33
decode_primitive/uint16 13.33 ms 16.22 ms 20 -0.76
decode_primitive/uint32 12.86 ms 16.31 ms 20 -1.13
decode_primitive/uint64 13.78 ms 16.30 ms 20 -0.71
decode_primitive/uint8 12.64 ms 15.65 ms 20 -1.00
decode_primitive/utf8 752.31 µs 954.59 µs 20 -0.97
from_elem/full_read,parallel=1,read_size=1048576 36.39 ms 165.77 ms 20 -0.88
from_elem/full_read,parallel=1,read_size=16384 262.56 ms 326.30 ms 20 -0.66
from_elem/full_read,parallel=1,read_size=4096 938.26 ms 1.15 s 20 -0.56
from_elem/full_read,parallel=16,read_size=1048576 35.74 ms 165.92 ms 20 -0.88
from_elem/full_read,parallel=16,read_size=16384 261.63 ms 318.14 ms 20 -0.64
from_elem/full_read,parallel=16,read_size=4096 924.90 ms 1.14 s 20 -0.59
from_elem/full_read,parallel=32,read_size=1048576 35.81 ms 165.97 ms 20 -0.88
from_elem/full_read,parallel=32,read_size=16384 258.51 ms 318.46 ms 20 -0.71
from_elem/full_read,parallel=32,read_size=4096 929.44 ms 1.14 s 20 -0.57
from_elem/full_read,parallel=64,read_size=1048576 35.82 ms 166.02 ms 20 -0.88
from_elem/full_read,parallel=64,read_size=16384 258.62 ms 317.58 ms 20 -0.69
from_elem/full_read,parallel=64,read_size=4096 919.57 ms 1.15 s 20 -0.63
from_elem/random_read,parallel=1,item_size=1024 4.13 ms 4.16 ms 20 -0.04
from_elem/random_read,parallel=1,item_size=4096 4.66 ms 6.95 ms 20 -0.83
from_elem/random_read,parallel=1,item_size=8 1.81 ms 1.85 ms 20 -0.12
from_elem/random_read,parallel=16,item_size=1024 4.15 ms 4.11 ms 20 +0.05
from_elem/random_read,parallel=16,item_size=4096 4.64 ms 6.94 ms 20 -0.82
from_elem/random_read,parallel=16,item_size=8 1.82 ms 1.84 ms 20 -0.05
from_elem/random_read,parallel=32,item_size=1024 4.11 ms 4.10 ms 20 +0.02
from_elem/random_read,parallel=32,item_size=4096 4.91 ms 6.93 ms 20 -0.72
from_elem/random_read,parallel=32,item_size=8 1.83 ms 1.85 ms 20 -0.05
from_elem/random_read,parallel=64,item_size=1024 4.11 ms 4.10 ms 20 +0.02
from_elem/random_read,parallel=64,item_size=4096 4.73 ms 6.91 ms 20 -0.77
from_elem/random_read,parallel=64,item_size=8 1.83 ms 1.84 ms 20 -0.02
hamming,auto_vec 74.65 ms 96.16 ms 20 -0.90
hamming,scalar 118.43 ms 160.09 ms 20 -0.87
zip_1024Ki/2_2_2_zip_into_6 6.20 ms 7.98 ms 20 -0.82
zip_1024Ki/2_4_zip_into_6 4.61 ms 5.65 ms 20 -0.85
zip_32Ki/2_2_2_zip_into_6 247.84 µs 254.99 µs 20 -0.10
zip_32Ki/2_4_zip_into_6 154.40 µs 178.00 µs 20 -0.65
zip_8Ki/2_2_2_zip_into_6 42.65 µs 67.75 µs 20 -1.42
zip_8Ki/2_4_zip_into_6 34.56 µs 45.52 µs 20 -1.08

Generated by bench-bot 🤖

@westonpace westonpace merged commit c641403 into lance-format:main Jan 5, 2026
14 checks passed
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026