
perf: various btree performance improvements #5446

Merged
westonpace merged 9 commits into lance-format:main from westonpace:perf/improve-btree-perf
Dec 10, 2025

@westonpace (Member)

The main improvements are:

  • Sort each page by row id when loading it
  • Don't search a page if the entire page matches; instead, use a precomputed "all_ids" bitmap
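A minimal Rust sketch of the "all_ids" shortcut, assuming a page is a list of (value, row_id) pairs with known min and max values. `Page`, `search_range`, and the use of `std::collections::BTreeSet` as a stand-in for the index's real row-id bitmap are hypothetical; this is not the PR's code. When a range query covers the page's whole [min, max] span, the precomputed set is borrowed instead of scanning the page.

```rust
use std::borrow::Cow;
use std::collections::BTreeSet;

/// Hypothetical in-memory leaf page: (value, row_id) pairs plus the
/// precomputed set of every row id on the page. `BTreeSet<u64>` stands in
/// for the index's real row-id bitmap type (an assumption).
pub struct Page {
    entries: Vec<(i64, u64)>, // (value, row_id)
    min: i64,
    max: i64,
    all_ids: BTreeSet<u64>, // computed once, when the page is loaded
}

impl Page {
    pub fn new(entries: Vec<(i64, u64)>) -> Self {
        assert!(!entries.is_empty(), "page must not be empty");
        let min = entries.iter().map(|&(v, _)| v).min().unwrap();
        let max = entries.iter().map(|&(v, _)| v).max().unwrap();
        let all_ids: BTreeSet<u64> = entries.iter().map(|&(_, id)| id).collect();
        Page { entries, min, max, all_ids }
    }

    /// Row ids whose value lies in the inclusive range [lo, hi].
    pub fn search_range(&self, lo: i64, hi: i64) -> Cow<'_, BTreeSet<u64>> {
        if lo <= self.min && self.max <= hi {
            // Every value on the page is in range: skip the search
            // entirely and borrow the precomputed set.
            return Cow::Borrowed(&self.all_ids);
        }
        // Otherwise scan the page (empty if the ranges don't overlap).
        let ids: BTreeSet<u64> = self
            .entries
            .iter()
            .filter(|&&(v, _)| lo <= v && v <= hi)
            .map(|&(_, id)| id)
            .collect();
        Cow::Owned(ids)
    }
}
```

Returning `Cow` lets the fully matched case hand back the precomputed set without copying it; only partially matched pages allocate a new set.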

@westonpace (Member, Author)

Other potential improvements:

  • Sort pages by row id on write (this would avoid the minor cold-search penalty, but would be a backwards-incompatible change to the index)
  • Precompute ids for ranges of pages
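One possible reading of the "ranges of pages" idea, sketched under assumptions: consecutive pages are grouped into fixed-size, aligned groups whose row-id unions are precomputed, so a query that fully covers many pages merges a few group sets instead of one set per page. `GroupedIds`, `ids_for_pages`, and the fixed group size are hypothetical, and `BTreeSet` again stands in for a real bitmap.

```rust
use std::collections::BTreeSet;

/// Hypothetical: per-page "all_ids" sets plus precomputed unions for
/// aligned groups of `group_size` consecutive pages.
pub struct GroupedIds {
    group_size: usize,
    page_ids: Vec<BTreeSet<u64>>,
    group_ids: Vec<BTreeSet<u64>>,
}

impl GroupedIds {
    pub fn new(page_ids: Vec<BTreeSet<u64>>, group_size: usize) -> Self {
        assert!(group_size > 0, "group_size must be positive");
        // Union each chunk of consecutive pages once, up front.
        let group_ids: Vec<BTreeSet<u64>> = page_ids
            .chunks(group_size)
            .map(|chunk| chunk.iter().flatten().copied().collect::<BTreeSet<u64>>())
            .collect();
        GroupedIds { group_size, page_ids, group_ids }
    }

    /// Row ids for pages `first..=last`, all assumed to match entirely.
    /// Also returns how many precomputed sets were merged.
    pub fn ids_for_pages(&self, first: usize, last: usize) -> (BTreeSet<u64>, usize) {
        assert!(first <= last && last < self.page_ids.len());
        let mut out = BTreeSet::new();
        let mut sets_used = 0;
        let mut page = first;
        while page <= last {
            let aligned = page % self.group_size == 0;
            if aligned && page + self.group_size - 1 <= last {
                // The whole group lies inside the range: use its union.
                out.extend(self.group_ids[page / self.group_size].iter().copied());
                page += self.group_size;
            } else {
                // Partial group at either edge: use the page's own set.
                out.extend(self.page_ids[page].iter().copied());
                page += 1;
            }
            sets_used += 1;
        }
        (out, sets_used)
    }
}
```

With a group size of 2, a query fully covering pages 1 through 5 merges three sets (page 1, then groups {2, 3} and {4, 5}) instead of five.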

@westonpace (Member, Author)

This drops the btree_range_most/int_unique/cached benchmark on my system from 19.5 ms to 1.5 ms, roughly a 13x speedup.

github-actions bot added the python label Dec 9, 2025
westonpace marked this pull request as ready for review December 9, 2025 18:28
@BubbleCal (Contributor) left a comment

Great work! Just some questions.

Comment thread on rust/lance-index/src/scalar/btree.rs (outdated)
  - fn pages_null(&self) -> Vec<u32> {
  -     self.null_pages.clone()
  + fn pages_null(&self) -> Vec<Matches> {
  +     // TODO: We could keep track of all-null pages and return Matches::All for those.
@BubbleCal (Contributor)

Why can't we do this now?

@westonpace (Member, Author)

We can; I just didn't want to 😆. Let me add that.

@westonpace (Member, Author)

Done.
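A hypothetical sketch of the change discussed in this thread: returning `Matches::All` for pages that are entirely null, so an IS NULL query can use those pages' precomputed ids without searching them. Only the `Matches` name and its `All` variant come from the diff; the `Partial` variant, the page-number payloads, `PageStats`, and the free-function form of `pages_null` are assumptions for illustration.

```rust
/// Hypothetical per-page result. Only `Matches` and its `All` variant
/// appear in the PR; `Partial` and the page-number payloads are assumed.
#[derive(Debug, PartialEq)]
pub enum Matches {
    /// Every row on this page matches; use the page's precomputed ids.
    All(u32),
    /// Some rows on this page may match; the page must be searched.
    Partial(u32),
}

/// Hypothetical per-page null statistics.
pub struct PageStats {
    pub page_number: u32,
    pub null_count: u64,
    pub row_count: u64,
}

/// Pages relevant to an IS NULL query. Pages without nulls are skipped,
/// and all-null pages are reported as `Matches::All` so they are never searched.
pub fn pages_null(stats: &[PageStats]) -> Vec<Matches> {
    stats
        .iter()
        .filter(|s| s.null_count > 0)
        .map(|s| {
            if s.null_count == s.row_count {
                Matches::All(s.page_number)
            } else {
                Matches::Partial(s.page_number)
            }
        })
        .collect()
}
```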

  impl FlatIndex {
      pub fn try_new(data: RecordBatch) -> Result<Self> {
          // Sort by row id to make bitmap construction more efficient
          let data = data.sort_by_column(1, None)?;
@BubbleCal (Contributor)

Isn't this already sorted?

@westonpace (Member, Author)

It's sorted by value. This sorts it by row id.
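A minimal sketch of why re-sorting a value-ordered page by row id helps: once the page is in row-id order, matching row ids come out in ascending order, so they can be appended in a single pass (here, compressed into inclusive runs) rather than inserted in random order. `sort_page_by_row_id` and `matching_runs` are hypothetical helpers, not the PR's code; real bitmap libraries offer similar sorted-input constructors, for example `RoaringBitmap::from_sorted_iter` in the `roaring` crate.

```rust
/// Reorder a page of (value, row_id) pairs, stored sorted by value,
/// into row-id order. Done once, when the page is loaded.
pub fn sort_page_by_row_id(mut page: Vec<(i64, u64)>) -> Vec<(i64, u64)> {
    page.sort_unstable_by_key(|&(_, row_id)| row_id);
    page
}

/// Collect the row ids whose value is in [lo, hi] as inclusive
/// (start, end) runs in a single pass. This relies on `page` being sorted
/// by row id, so every match lands after the previous one.
pub fn matching_runs(page: &[(i64, u64)], lo: i64, hi: i64) -> Vec<(u64, u64)> {
    let mut runs: Vec<(u64, u64)> = Vec::new();
    for &(value, row_id) in page {
        if value < lo || value > hi {
            continue;
        }
        if let Some(last) = runs.last_mut() {
            if last.1 + 1 == row_id {
                // Consecutive row id: extend the current run.
                last.1 = row_id;
                continue;
            }
        }
        // Gap in row ids (or first match): start a new run.
        runs.push((row_id, row_id));
    }
    runs
}
```

The sort costs O(n log n) once per page at load time, which is the minor cold-search penalty that sorting on write would avoid.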

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 88.37719% with 53 lines in your changes missing coverage. Please review.

File                                          Patch %   Lines missing coverage
rust/lance-index/src/scalar/btree/flat.rs     89.00%    17 missing, 14 partials ⚠️
rust/lance-index/src/scalar/btree.rs          92.25%    6 missing, 5 partials ⚠️
rust/lance-core/src/utils/mask.rs             66.66%    6 missing ⚠️
rust/lance-arrow/src/lib.rs                   54.54%    4 missing, 1 partial ⚠️


westonpace force-pushed the perf/improve-btree-perf branch from 70d409e to ea59263 December 10, 2025 16:04
westonpace force-pushed the perf/improve-btree-perf branch from ea59263 to 8b4c11e December 10, 2025 21:48
westonpace merged commit 9782e9a into lance-format:main Dec 10, 2025 (28 checks passed)
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026


2 participants