Skip to content

fix: fix remap so that it handles deletions correctly#5828

Merged
westonpace merged 2 commits intolance-format:mainfrom
westonpace:fix/btree-map-deletions
Jan 28, 2026
Merged

fix: fix remap so that it handles deletions correctly#5828
westonpace merged 2 commits intolance-format:mainfrom
westonpace:fix/btree-map-deletions

Conversation

@westonpace
Copy link
Copy Markdown
Member

@westonpace westonpace commented Jan 27, 2026

The previous remap implementation assumed it was safe to just copy the lookup unchanged. This is only safe if there are no deletions. If there are deletions then the page boundaries have moved and so the min/max has moved as well. We need to recreate both the lookup and the data when we remap.

Closes #5826

@github-actions github-actions Bot added bug Something isn't working python labels Jan 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Code Review

P0: Debug print statements in production code

The remap function contains println! statements that should be removed before merging:

  • rust/lance-index/src/scalar/btree.rs lines with:
    • println!("Remapping part_id={:?} page_file={}", part_id, page_file);
    • println!("Remapped {} rows into {} rows", ...);

These will pollute stdout in production.

P0: Commented-out code in compression.rs

The file rust/lance-encoding/src/compression.rs has 19 lines of code commented out (lines 592-610). This appears to be unrelated to the btree remap fix and should either:

  1. Be reverted if unintentional
  2. Removed entirely if the logic is no longer needed
  3. Explained in the PR description if intentional

P1: Missing lookup file handling for basic index

In the new remap implementation, the lookup file is only handled for range-based indices:

if let Some(ranges_to_files) = &self.ranges_to_files {
    // ... merge lookups
}

For basic indices (when ranges_to_files is None), the lookup file is not copied or regenerated. The old code explicitly copied it:

// Copy the lookup file as-is (OLD CODE)
self.store
    .copy_index_file(BTREE_LOOKUP_NAME, dest_store)
    .await?;

Since train_btree_index generates lookup files, this might be working correctly for basic indices via the part_id=None case, but should be verified with a test for basic (non-range) btree index remap with deletions.


The core fix (retraining rather than just copying when remapping) is sound. Good test coverage for the deletion scenario.

@westonpace westonpace marked this pull request as ready for review January 28, 2026 14:47
@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 91.50943% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-index/src/scalar/btree.rs 91.50% 8 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should backport this? looks like a serious correctness bug

@westonpace westonpace merged commit cf1bd20 into lance-format:main Jan 28, 2026
31 checks passed
@westonpace
Copy link
Copy Markdown
Member Author

I think we should backport this? looks like a serious correctness bug

Agreed.

jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 29, 2026
)

The previous remap implementation assumed it was safe to just copy the
lookup unchanged. This is only safe if there are no deletions. If there
are deletions then the page boundaries have moved and so the min/max has
moved as well. We need to recreate both the lookup and the data when we
remap.

Closes lance-format#5826
jackye1995 pushed a commit that referenced this pull request Jan 29, 2026
The previous remap implementation assumed it was safe to just copy the
lookup unchanged. This is only safe if there are no deletions. If there
are deletions then the page boundaries have moved and so the min/max has
moved as well. We need to recreate both the lookup and the data when we
remap.

Closes #5826
vivek-bharathan pushed a commit to vivek-bharathan/lance that referenced this pull request Feb 2, 2026
)

The previous remap implementation assumed it was safe to just copy the
lookup unchanged. This is only safe if there are no deletions. If there
are deletions then the page boundaries have moved and so the min/max has
moved as well. We need to recreate both the lookup and the data when we
remap.

Closes lance-format#5826
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Index corruption in btree index when compacting across deletions

2 participants