test: add regression test for ivf/pq search #5476
westonpace merged 1 commit into lance-format:main from
Two dimensions not tested here that seem worthwhile are:
- num_bits in PQ (4-bit vs 8-bit)
- data_type in the vector (f16 vs f32)
These settings trigger different code paths, which I think are important to cover in regression testing.
There are some settings here, like k, which I don't think hit different code paths so much as change a variable in the same code path.
I think Rust-level benchmarks would capture those parameters sufficiently (and probably with more fidelity), since they primarily affect the compute-driven PQ search time. Covering these parameters here would also require training completely new indexes, and index training is what dominates the benchmark time.
We have pq_dist_table and 4bit_pq_dist_table, which cover num_bits and distance_type. However, these only look at f32. I'll add an issue to add data_type to these benchmarks (I want to review them anyway, since the lance-index benches don't pass reliably or quickly enough for me).
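To illustrate why num_bits changes more than a single variable, here is a minimal pure-Python sketch of a PQ distance table. It is not Lance's implementation and is not taken from the pq_dist_table benchmarks; the function names, the squared-L2 metric, and the random codebooks are all assumptions made for illustration. The one fact it relies on is that a PQ codebook has 2^num_bits centroids per sub-vector, so the table has 16 entries per sub-vector at 4 bits and 256 at 8 bits.

```python
import random


def random_codebooks(num_sub, sub_dim, num_bits, seed=0):
    """Hypothetical stand-in for trained codebooks: 2**num_bits centroids per sub-vector."""
    rng = random.Random(seed)
    return [
        [[rng.random() for _ in range(sub_dim)] for _ in range(2 ** num_bits)]
        for _ in range(num_sub)
    ]


def build_dist_table(query, codebooks):
    """Squared-L2 distance from each query sub-vector to every centroid of its codebook."""
    sub_dim = len(query) // len(codebooks)
    table = []
    for m, centroids in enumerate(codebooks):
        q_sub = query[m * sub_dim:(m + 1) * sub_dim]
        table.append([sum((a - b) ** 2 for a, b in zip(q_sub, c)) for c in centroids])
    return table


def approx_distance(table, codes):
    """Approximate distance to an encoded vector: one table lookup per sub-vector."""
    return sum(table[m][code] for m, code in enumerate(codes))


query = [0.1 * i for i in range(8)]
for num_bits in (4, 8):
    table = build_dist_table(query, random_codebooks(4, 2, num_bits))
    print(num_bits, len(table), len(table[0]))
```

In this sketch the number of sub-vectors is fixed at 4, so only the per-sub-vector table width changes with num_bits. A real implementation may also pack two 4-bit codes into each byte, which is one reason the two settings can take separate code paths.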
@bench-bot benchmark
Benchmark Results for PR #5476
Commit:
Summary
✅ All Benchmarks Within Normal Range
No benchmarks had |z-score| > 2.0
All Results: 123 benchmark results
Generated by bench-bot 🤖
The test covers 5 parameters (cache, k, nprobes, refine factor, dataset size) for a total of 64 tests. It takes ~3-4 minutes to run on my system.
There are other parameters (distance type, number of dimensions, PQ vs. SQ, etc.), but these should only really affect compute time and/or recall and are probably better tested in Rust benchmarks. Recall benchmarks will be done separately, as they require real data.
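A parameter sweep like the one described above can be sketched as a Cartesian product over the five parameters. The concrete values below are hypothetical: the description does not list them, and they were picked only so that the grid has 64 combinations. The dictionary keys and the expand_grid helper are likewise illustrative names, not the ones used in the PR.

```python
import itertools

# Hypothetical values; the PR description names the parameters but not their settings.
PARAM_GRID = {
    "use_cache": [True, False],
    "k": [10, 100],
    "nprobes": [10, 50],
    "refine_factor": [None, 2],
    "num_rows": [10_000, 100_000, 1_000_000, 10_000_000],
}


def expand_grid(grid):
    """Return one dict per combination of parameter values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in itertools.product(*grid.values())]


CASES = expand_grid(PARAM_GRID)
print(len(CASES))  # 2 * 2 * 2 * 2 * 4 = 64
```

Each dict in CASES could then be handed to a test runner as one case, for example through pytest's parametrize decorator. Adding a sixth parameter multiplies the case count, which is why parameters that require retraining an index are expensive to include.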