perf: fast rotation for RQ quantization#6024

Merged
BubbleCal merged 8 commits into main from yang/rq-fast-rotation-option
Feb 27, 2026
@BubbleCal
Contributor

This introduces a new rotation algorithm that performs the rotation in O(d log d) time (previously O(d^2)) and reduces the memory footprint of the rotation matrix from d^2 floats to d bits.

It does not reduce recall, and it significantly improves performance:
[image: benchmark results]

Query performance is 93% faster than before, and 307% faster than IVF_PQ.
Indexing is 5.4x faster than before, and 11.9x faster than IVF_PQ.
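
For readers unfamiliar with how a rotation can run in O(d log d) time with only d bits of state, here is a minimal sketch of the general idea: flip signs according to a bit-packed mask, then apply an orthonormal fast Walsh-Hadamard transform. This is not the code from this PR. The function names `fht_in_place` and `rotate_in_place` are hypothetical, the sketch assumes the dimension is a power of two and a multiple of 8, and the actual implementation may differ (for example, in how it handles other dimensions or how many rounds it applies).

```rust
/// In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so that the
/// transform is orthonormal (it preserves vector norms).
/// Assumption of this sketch: `data.len()` is a power of two.
fn fht_in_place(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");

    // log2(n) passes, each touching every element once: O(n log n) overall.
    let mut h = 1;
    while h < n {
        for block in data.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        h *= 2;
    }

    let scale = 1.0 / (n as f32).sqrt();
    for v in data.iter_mut() {
        *v *= scale;
    }
}

/// Hypothetical rotation: negate element `i` when bit `i` of `sign_bits` is
/// set, then apply the transform. The only stored state is one bit per
/// dimension, instead of a dense d x d matrix.
fn rotate_in_place(data: &mut [f32], sign_bits: &[u8]) {
    assert_eq!(sign_bits.len() * 8, data.len(), "need one sign bit per dimension");
    for (i, v) in data.iter_mut().enumerate() {
        if (sign_bits[i / 8] >> (i % 8)) & 1 == 1 {
            *v = -*v;
        }
    }
    fht_in_place(data);
}
```

Because both the sign flips and the scaled transform are orthogonal, the rotated vector has the same length as the input, which is the property a quantizer relies on.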

@github-actions
Contributor

PR Review: Fast Rotation for RQ Quantization

This PR introduces a new FHT-Kac rotation algorithm for RaBitQ, improving time complexity from O(d²) to O(d log d) and reducing memory from d² floats to d bits. The implementation looks solid overall.

P1 Issues

  1. Potential panic in builder.rs:444-445

    let input = column
        .as_slice()
        .expect("RabitQ input vectors should be contiguous");

    Using .expect() here can cause a runtime panic if data isn't contiguous. This should return a proper Result error instead.

  2. Minor: Python documentation typo (python/dataset.py:2866)
    Documentation says "Rabit Quantization" but should be "RaBitQ" for consistency with the rest of the codebase.
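
The panic-to-error change suggested in item 1 could look roughly like the sketch below. It is only an illustration: `QuantizeError` and `require_contiguous` are hypothetical names, and the real fix would presumably use the crate's own error type rather than this stand-in.

```rust
use std::fmt;

/// Hypothetical error type standing in for the crate's real error enum.
#[derive(Debug, PartialEq)]
enum QuantizeError {
    NonContiguousInput,
}

impl fmt::Display for QuantizeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QuantizeError::NonContiguousInput => {
                write!(f, "RabitQ input vectors should be contiguous")
            }
        }
    }
}

/// Converts the `Option` returned by an `as_slice()`-style call into a
/// `Result`, so the caller can propagate the failure with `?` instead of
/// panicking through `.expect()`.
fn require_contiguous<T>(slice: Option<&[T]>) -> Result<&[T], QuantizeError> {
    slice.ok_or(QuantizeError::NonContiguousInput)
}
```

At the call site, `column.as_slice().expect(...)` would then become something like `require_contiguous(column.as_slice())?`, assuming the enclosing function returns a compatible `Result`.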

Observations (not blocking)

  • The default_rotation_type_compat() returning Matrix for backwards compatibility with older metadata is good.
  • SIMD AVX2 optimization with scalar fallback is well implemented.
  • Test coverage for rotation type preservation through optimize operations is appreciated.

The performance improvements are impressive. LGTM pending the panic-to-error change.

@github-actions github-actions Bot added the java label Feb 26, 2026
@codecov

codecov Bot commented Feb 26, 2026

codes.fill(0);
for (bit_idx, value) in rotated.iter().enumerate() {
    if value.is_sign_positive() {
        codes[bit_idx / u8::BITS as usize] |= 1u8 << (bit_idx % u8::BITS as usize);
Collaborator

It looks like this could panic if dim is not byte-aligned, for example if dim is 129 (is that possible?).

I asked Codex to produce a repro:

use std::sync::Arc;

use arrow::datatypes::Float32Type;
use arrow_array::{ArrayRef, FixedSizeListArray, Float32Array};
use arrow_schema::{DataType, Field};
use lance_index::vector::bq::builder::RabitQuantizer;
use lance_index::vector::bq::RQRotationType;
use lance_index::vector::quantizer::Quantization;

fn main() {
    let dim = 129; // non-byte-aligned dimension
    let values = Float32Array::from(vec![1.0f32; dim]);
    let field = Arc::new(Field::new("item", DataType::Float32, true));
    let vectors =
        FixedSizeListArray::try_new(field, dim as i32, Arc::new(values) as ArrayRef, None).unwrap();

    // Panic is nondeterministic (depends on random rotation signs), loop to trigger reliably.
    for i in 0..5000 {
        let q =
            RabitQuantizer::new_with_rotation::<Float32Type>(1, dim as i32, RQRotationType::Fast);
        let _ = q.quantize(&vectors).unwrap();
        if i % 1000 == 0 {
            println!("iter={i}");
        }
    }

    println!("done");
}

This logic panics with an index-out-of-bounds error.

Contributor Author

Good catch.
I will add a requirement that dimension % 8 == 0. I think that's fine because real-world vector dimensions are almost always divisible by 8.
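
A rough sketch of what such a guard might look like around the sign-bit packing loop quoted above. The function name `pack_sign_bits` and the `String` error are assumptions made for illustration; the merged code may perform the check elsewhere (for example, when the quantizer is constructed) and with a different error type.

```rust
/// Packs the sign of each rotated value into one bit of `codes`
/// (bit set = sign-positive value), rejecting shapes that would
/// otherwise index past the end of `codes`.
fn pack_sign_bits(rotated: &[f32], codes: &mut [u8]) -> Result<(), String> {
    const BITS: usize = u8::BITS as usize;

    // The requirement discussed above: dimension must be a multiple of 8.
    if rotated.len() % BITS != 0 {
        return Err(format!(
            "dimension {} is not a multiple of {}",
            rotated.len(),
            BITS
        ));
    }
    if codes.len() != rotated.len() / BITS {
        return Err(format!(
            "expected {} code bytes, got {}",
            rotated.len() / BITS,
            codes.len()
        ));
    }

    codes.fill(0);
    for (bit_idx, value) in rotated.iter().enumerate() {
        if value.is_sign_positive() {
            codes[bit_idx / BITS] |= 1u8 << (bit_idx % BITS);
        }
    }
    Ok(())
}
```

With dim = 129 and a 16-byte code buffer, the unchecked loop writes to `codes[16]` only when the 129th rotated value happens to be sign-positive, which matches the nondeterministic panic noted in the repro. The check turns that case into an error up front.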

@BubbleCal BubbleCal merged commit 4c81f5d into main Feb 27, 2026
29 checks passed
@BubbleCal BubbleCal deleted the yang/rq-fast-rotation-option branch February 27, 2026 06:11
wjones127 pushed a commit to wjones127/lance that referenced this pull request Mar 4, 2026