perf: fast rotation for RQ quantization #6024
PR Review: Fast Rotation for RQ Quantization
This PR introduces a new FHT-Kac rotation algorithm for RaBitQ, improving time complexity from O(d²) to O(d log d) and reducing memory from d² floats to d bits. The implementation looks solid overall.
P1 Issues
Observations (not blocking)
The performance improvements are impressive. LGTM pending the panic-to-error change.
…g/rq-fast-rotation-option
    codes.fill(0);
    for (bit_idx, value) in rotated.iter().enumerate() {
        if value.is_sign_positive() {
            codes[bit_idx / u8::BITS as usize] |= 1u8 << (bit_idx % u8::BITS as usize);
It looks like we could hit a panic here if dim is not byte-aligned, for example if dim is 129 (is this possible?).
I asked Codex to produce a repro:
    use std::sync::Arc;

    use arrow::datatypes::Float32Type;
    use arrow_array::{ArrayRef, FixedSizeListArray, Float32Array};
    use arrow_schema::{DataType, Field};
    use lance_index::vector::bq::builder::RabitQuantizer;
    use lance_index::vector::bq::RQRotationType;
    use lance_index::vector::quantizer::Quantization;

    fn main() {
        let dim = 129; // non-byte-aligned dimension
        let values = Float32Array::from(vec![1.0f32; dim]);
        let field = Arc::new(Field::new("item", DataType::Float32, true));
        let vectors =
            FixedSizeListArray::try_new(field, dim as i32, Arc::new(values) as ArrayRef, None)
                .unwrap();

        // Panic is nondeterministic (depends on random rotation signs), loop to trigger reliably.
        for i in 0..5000 {
            let q =
                RabitQuantizer::new_with_rotation::<Float32Type>(1, dim as i32, RQRotationType::Fast);
            let _ = q.quantize(&vectors).unwrap();
            if i % 1000 == 0 {
                println!("iter={i}");
            }
        }
        println!("done");
    }

This logic will panic with an index-out-of-bounds error.
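To illustrate why the failure would be intermittent, here is a minimal standalone sketch of the same packing loop with a checked write. It assumes the code buffer is sized with floor division (`dim / 8`), which is a guess at the cause rather than something the diff shows, and `pack_sign_bits_checked` is a hypothetical name, not a function in the PR.

```rust
/// Hypothetical checked variant of the packing loop from the diff.
/// Returns the first bit index that does not fit in `codes` instead of panicking.
fn pack_sign_bits_checked(rotated: &[f32], codes: &mut [u8]) -> Result<(), usize> {
    codes.fill(0);
    for (bit_idx, value) in rotated.iter().enumerate() {
        // Only positive values write a bit, so an overflow is hit only
        // when a value past the end of the buffer happens to be positive.
        if value.is_sign_positive() {
            let byte_idx = bit_idx / u8::BITS as usize;
            match codes.get_mut(byte_idx) {
                Some(byte) => *byte |= 1u8 << (bit_idx % u8::BITS as usize),
                None => return Err(bit_idx),
            }
        }
    }
    Ok(())
}
```

Under that assumption, `dim = 129` gives a 16-byte buffer (128 bits). Bit 128 maps to byte 16, so the write is out of bounds only when the last rotated value is positive, which would match the repro needing many iterations.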
Good catch.
I will add a requirement that `dimension % 8 == 0`. I think that's fine because real-world vector dimensions are almost always divisible by 8.
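A minimal sketch of what such a requirement could look like, returning an error rather than panicking as the review asks. `validate_rq_dim` and the `String` error type are hypothetical, chosen only for illustration; the actual change would presumably use the crate's own error type and constructor.

```rust
/// Hypothetical helper: accepts only dimensions that fill whole bytes
/// at one bit per dimension, and returns the code length in bytes.
fn validate_rq_dim(dim: usize) -> Result<usize, String> {
    let bits_per_byte = u8::BITS as usize;
    if dim == 0 || dim % bits_per_byte != 0 {
        return Err(format!(
            "RQ dimension must be a positive multiple of 8, got {dim}"
        ));
    }
    Ok(dim / bits_per_byte)
}
```

With this check, a dimension of 128 maps to 16 code bytes, while 129 is rejected up front instead of failing partway through quantization.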
This introduces a new rotation algorithm that performs the rotation in `O(d log d)` time (`O(d^2)` before) and reduces the memory footprint of the rotation matrix from d^2 floats to d bits.

This doesn't reduce recall, and it significantly improves performance:

<img width="1116" height="839" alt="image" src="https://github.com/user-attachments/assets/308240b5-e38c-4e2a-aea6-c4171c28687a" />

Query performance is 93% faster, and 307% faster than IVF_PQ.
Indexing is 5.4x faster, and 11.9x faster than IVF_PQ.
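For readers unfamiliar with where the `O(d log d)` time and d-bit storage can come from, here is a generic sketch of a sign-flip followed by a fast Walsh-Hadamard transform. It is not the FHT-Kac code from this PR; the function names are hypothetical, and it assumes d is a power of two and that the sign pattern is stored as one bit per dimension.

```rust
/// In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so that it
/// is orthonormal (preserves vector norms). Runs log2(n) butterfly passes.
fn fwht_normalized(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        for block in data.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt();
    for v in data.iter_mut() {
        *v *= scale;
    }
}

/// Negates `data[i]` when bit `i` of `sign_bits` is set, then applies the
/// transform. The only stored state is `sign_bits`: d bits in total.
fn rotate(data: &mut [f32], sign_bits: &[u8]) {
    assert_eq!(sign_bits.len() * 8, data.len(), "one sign bit per dimension");
    for (i, v) in data.iter_mut().enumerate() {
        if (sign_bits[i / 8] >> (i % 8)) & 1 == 1 {
            *v = -*v;
        }
    }
    fwht_normalized(data);
}
```

Each of the `log2(d)` passes touches every element once, which gives the `O(d log d)` cost, and nothing resembling a d×d matrix is ever materialized.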