perf: improve SQ query speed #5596

Merged: wjones127 merged 5 commits into main from improve-sq-query-speed on Jan 1, 2026
@BubbleCal
Contributor

create_hnsw_sq(100000x128)
                        time:   [7.1499 s 7.1644 s 7.1840 s]
                        change: [-1.1794% -0.9172% -0.6107%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

search_hnsw_sq100000x128
                        time:   [253.49 µs 253.87 µs 254.24 µs]
                        change: [-3.6161% -3.4660% -3.3038%] (p = 0.00 < 0.05)
                        Performance has improved.

@github-actions
Contributor

Code Review

P1: Inconsistent optimization in distance_all() method

The PR optimizes distance() by pre-computing the scale factor and using direct multiplication (dist * self.scale), but the distance_all() method at lines 446-468 (in the new file) still uses inverse_scalar_dist() which allocates a Vec and recomputes the scale on every call.

For consistency and to get the full performance benefit, distance_all() should also use the pre-computed self.scale field:

fn distance_all(&self, _k_hint: usize) -> Vec<f32> {
    match self.storage.distance_type {
        DistanceType::L2 | DistanceType::Cosine => self
            .storage
            .chunks
            .iter()
            .flat_map(|c| {
                c.sq_codes
                    .values()
                    .chunks_exact(c.dim())
                    .map(|sq_codes| l2_distance_uint_scalar(sq_codes, &self.query_sq_code) * self.scale)
            })
            .collect(),
        DistanceType::Dot => self
            .storage
            .chunks
            .iter()
            .flat_map(|c| {
                c.sq_codes
                    .values()
                    .chunks_exact(c.dim())
                    .map(|sq_codes| dot_distance(sq_codes, &self.query_sq_code) * self.scale)
            })
            .collect(),
        _ => panic!("We should not reach here: sq distance can only be L2 or Dot"),
    }
}

This would:

  1. Eliminate the per-call scale computation in inverse_scalar_dist
  2. Avoid the intermediate Vec allocation from collect_vec() inside inverse_scalar_dist
  3. Maintain consistency between the two distance calculation paths
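The effect of points 1 and 2 can be illustrated with a small standalone sketch. It is not the Lance implementation: `SqDistCalc`, `l2_u8`, and the `min`/`max` bounds are hypothetical names, and the scale formula (the squared quantization step, `((max - min) / 255)^2`) is an assumption about how 8-bit scalar quantization maps code-space L2 distances back to the original space. Only the L2 case is covered. The point is that the scale is computed once in the constructor and applied inside the iterator, so no intermediate `Vec` is needed.

```rust
/// Squared L2 distance between two u8 code vectors (hypothetical helper).
fn l2_u8(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as i32 - y as i32;
            (d * d) as u32
        })
        .sum()
}

/// Hypothetical distance calculator over a flat buffer of SQ codes.
struct SqDistCalc<'a> {
    codes: &'a [u8],
    dim: usize,
    query: Vec<u8>,
    /// Pre-computed once; assumed to be the squared quantization step.
    scale: f32,
}

impl<'a> SqDistCalc<'a> {
    fn new(codes: &'a [u8], dim: usize, query: Vec<u8>, min: f32, max: f32) -> Self {
        // Assumption: codes are spread linearly over [min, max] in 255 steps.
        let step = (max - min) / 255.0;
        Self { codes, dim, query, scale: step * step }
    }

    /// Distances from the query to every stored vector, scaled in the same
    /// pass that computes them (single allocation: the result itself).
    fn distance_all(&self) -> Vec<f32> {
        self.codes
            .chunks_exact(self.dim)
            .map(|c| l2_u8(c, &self.query) as f32 * self.scale)
            .collect()
    }
}
```

With three 2-dimensional code vectors `[0, 0]`, `[255, 255]`, `[3, 4]`, a query of `[0, 0]`, and bounds `0.0..510.0`, the step is 2 and every code-space distance is multiplied by 4.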

Otherwise the changes look good - the dot_u8_scalar optimization with unrolling and the pre-computed scale factor are solid performance improvements.
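For readers unfamiliar with the unrolling mentioned above, here is a rough sketch of the general technique. It is not the PR's `dot_u8_scalar`; the function name `dot_u8_unrolled` and the unroll factor of 4 are illustrative assumptions. Four independent accumulators let the compiler and CPU overlap the multiply-adds instead of serializing them through a single running sum.

```rust
/// Dot product of two u8 slices, manually unrolled by 4 (illustrative sketch).
/// Accumulates in u32, so it assumes inputs short enough not to overflow
/// (each term is at most 255 * 255 = 65025).
fn dot_u8_unrolled(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "slices must have equal length");

    let mut acc = [0u32; 4];
    let mut a_chunks = a.chunks_exact(4);
    let mut b_chunks = b.chunks_exact(4);

    // Main loop: four independent multiply-adds per iteration.
    for (ca, cb) in (&mut a_chunks).zip(&mut b_chunks) {
        acc[0] += ca[0] as u32 * cb[0] as u32;
        acc[1] += ca[1] as u32 * cb[1] as u32;
        acc[2] += ca[2] as u32 * cb[2] as u32;
        acc[3] += ca[3] as u32 * cb[3] as u32;
    }

    // Tail: the 0 to 3 elements that did not fill a whole chunk.
    let tail: u32 = a_chunks
        .remainder()
        .iter()
        .zip(b_chunks.remainder())
        .map(|(&x, &y)| x as u32 * y as u32)
        .sum();

    acc[0] + acc[1] + acc[2] + acc[3] + tail
}
```

Whether manual unrolling beats the compiler's auto-vectorization depends on the target and build flags, so a change like this is best validated with a benchmark such as the one in the PR description.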

@codecov

codecov Bot commented Dec 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@wjones127 wjones127 merged commit 781110a into main Jan 1, 2026
30 checks passed
@wjones127 wjones127 deleted the improve-sq-query-speed branch January 1, 2026 20:04
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026