fix: invalidate index fragment bitmaps after data replacement and stale merge (#5929)
Conversation
fix: invalidate index fragment bitmaps after data replacement and stale merge

After a `merge_insert` that modifies indexed columns (RewriteColumns path), followed by compaction + `optimize_indices`, subsequent queries could fail with "non-existent fragment" errors. Two root causes:

1. `optimize_indices` merged stale old B-tree data: `merge_indices_with_unindexed_frags` used the raw `fragment_bitmap` instead of `effective_fragment_bitmap`, carrying entries for pruned fragments. When the effective bitmap was empty (fully pruned), `index.update()` still called `combine_old_new()`, which merged stale entries. Now uses the effective bitmap and rebuilds from scratch when old data is fully stale.

2. DataReplacement had no index invalidation: the DataReplacement handler in `build_manifest()` didn't call `prune_updated_fields_from_indices`, so replacing an indexed column's data left the index bitmap unchanged.

Fixes lance-format#5321

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
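The first fix described above can be sketched as a small decision function. This is a hypothetical simplification, not Lance's actual code: fragment bitmaps are modeled as `HashSet<u32>` (the real implementation uses roaring bitmaps), and the names `FragmentBitmap`, `MergePlan`, and `plan_merge` are illustrative.

```rust
use std::collections::HashSet;

/// Stand-in for a fragment bitmap (the real code uses a roaring bitmap).
type FragmentBitmap = HashSet<u32>;

/// How to merge an existing index with newly unindexed fragments.
#[derive(Debug, PartialEq)]
enum MergePlan {
    /// Old index data is entirely stale: rebuild the index from scratch.
    Rebuild,
    /// Some old data is still valid: incremental update over these fragments.
    Update(FragmentBitmap),
}

/// Compute the *effective* bitmap (raw bitmap intersected with fragments
/// that still exist in the dataset) and decide between an incremental
/// update and a full rebuild.
fn plan_merge(raw: &FragmentBitmap, live_fragments: &FragmentBitmap) -> MergePlan {
    // Using the effective bitmap rather than the raw one ensures entries
    // for pruned fragments never flow into the merged index.
    let effective: FragmentBitmap = raw.intersection(live_fragments).copied().collect();
    if effective.is_empty() {
        // Fully pruned: calling combine_old_new() here would merge stale
        // entries, so rebuild instead.
        MergePlan::Rebuild
    } else {
        MergePlan::Update(effective)
    }
}
```

The key design point is that the "fully pruned" case must short-circuit to a rebuild, since any merge with the old stored data would resurrect invalidated rows.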
**Code Review**

This PR fixes index fragment bitmap corruption after data replacement and stale merges.

**P0: Potential Issue in Scalar Index Rebuild Logic**

In the rebuild path:

```rust
super::scalar::build_scalar_index(
    dataset.as_ref(),
    column.name.as_str(),
    &new_uuid.to_string(),
    &params,
    true, // include_row_id
    None, // fragments (None means all)
    Some(new_data_stream), // additional_data_stream
)
.await?
```

However, the […]

**Recommendation:** Verify that […]

**P1: Replaced Fields Extraction**

In:

```rust
let replaced_fields: Vec<u32> = new_datafiles
    .first()
    .map(|f| {
        f.fields.iter().filter(...).collect()
    })
    .unwrap_or_default();
```

This is safe given the earlier validation (lines 2037-2058) that ensures all new data files have the same fields. However, adding a comment here would make this dependency explicit.

**Tests**

The test coverage is thorough: […]
Overall, the fix logic is sound and addresses the reported issue correctly.
When `optimize_indices` calls `index.update()`, BTreeIndex's `combine_old_new()` reads all stored data including entries for invalidated fragments, causing "non-existent fragment" errors. The bitmap pruning from the previous fix only affects the bitmap, not the stored index data. Pass the valid fragment bitmap into `ScalarIndex::update()` so BTreeIndex can filter old data rows whose fragment ID (upper 32 bits of the row address) is not in the bitmap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
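The filtering described in this commit can be sketched as follows. This is a minimal illustration, not Lance's actual `ScalarIndex::update()` code: row addresses are plain `u64`s here (the real code filters Arrow record batches streamed out of the stored B-tree), and `filter_stale_rows` is an illustrative name.

```rust
use std::collections::HashSet;

/// A row address packs the fragment id into the upper 32 bits and the
/// row offset into the lower 32 bits.
fn fragment_of(row_addr: u64) -> u32 {
    (row_addr >> 32) as u32
}

/// Keep only old index rows whose fragment is still in the valid bitmap,
/// dropping entries for invalidated fragments before they are merged.
fn filter_stale_rows(row_addrs: &[u64], valid_fragments: &HashSet<u32>) -> Vec<u64> {
    row_addrs
        .iter()
        .copied()
        .filter(|addr| valid_fragments.contains(&fragment_of(*addr)))
        .collect()
}
```

Because the fragment id lives in the upper half of the address, a single shift is enough to test membership against the valid-fragment bitmap during the merge.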
westonpace
left a comment
Thanks for figuring this out. It makes me wonder if the update path is even worth the complexity it brings at this point. Btree is the only one that really makes use of it.
```rust
///
/// If `valid_old_fragments` is provided, old index data for fragments not in the bitmap
/// will be filtered out during the merge.
```
So this only matters for btree because it is the only index that scans the old portion of the index instead of rescanning the whole column?
Actually, bitmap does this too. I need to fix that.
Fixed bitmap. Other indices should be done as a follow-up, but I'm not aware of a code path where they cause a bug yet. Many of them, like ZoneMap and BloomFilter, give inexact results.
```rust
let filtered = stream.map(move |batch_result| {
    let batch = batch_result?;
    let row_ids = batch
        .column(1)
```
Minor nit: I've been trying to slowly switch over to using string constants like BTREE_IDS_COLUMN and avoiding integral indexing just because it seems less prone to error in the future.
That's a fair point. There's probably not a huge advantage to scanning the index vs scanning the fragments.
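The nit above about named column constants vs integral indexing can be illustrated with a toy batch type. This is a sketch only: the real code resolves columns on Arrow `RecordBatch` schemas, and this `Batch` struct is hypothetical; `BTREE_IDS_COLUMN` mirrors the constant mentioned in the comment, though its value here (`"_rowid"`) is an assumption.

```rust
/// Illustrative name constant; looking columns up by name survives schema
/// reordering, while positional access silently breaks.
const BTREE_IDS_COLUMN: &str = "_rowid";

/// Toy stand-in for an Arrow RecordBatch.
struct Batch {
    names: Vec<&'static str>,
    columns: Vec<Vec<u64>>,
}

impl Batch {
    /// Positional access: brittle if the schema is ever reordered.
    fn column(&self, i: usize) -> &Vec<u64> {
        &self.columns[i]
    }

    /// Lookup by name: robust against future schema changes.
    fn column_by_name(&self, name: &str) -> Option<&Vec<u64>> {
        self.names
            .iter()
            .position(|n| *n == name)
            .map(|i| &self.columns[i])
    }
}
```

With `column_by_name(BTREE_IDS_COLUMN)`, the call site documents which column it wants, and a schema reorder becomes a lookup miss rather than a silent wrong-column read.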
This is basically the same problem encountered in #5929. However, the solution there (and in other indices fixed since then) has been to prune indices during the update phase to remove old fragments. Unfortunately, FTS indices may be too large for this pruning to make sense (it would require a complete scan through the old index), though I haven't verified how bad the performance would be. This PR instead keeps track of which fragments have been invalidated, then uses this as a block-list at search time.
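The block-list approach for FTS can be sketched like this. All names here (`FtsIndex`, `postings`, `invalidated_fragments`, `search`) are illustrative and not Lance's actual API; a real FTS index stores far more than `(row_addr, score)` pairs.

```rust
use std::collections::HashSet;

/// Sketch: rather than rewriting a large FTS index to drop stale entries,
/// record invalidated fragments and filter hits at query time.
struct FtsIndex {
    /// Posting entries as (row address, score); heavily simplified.
    postings: Vec<(u64, f32)>,
    /// Block-list consulted at search time.
    invalidated_fragments: HashSet<u32>,
}

impl FtsIndex {
    /// O(1) invalidation: no scan of the stored index is needed.
    fn invalidate_fragment(&mut self, frag_id: u32) {
        self.invalidated_fragments.insert(frag_id);
    }

    /// Return hits, skipping rows whose fragment (upper 32 bits of the
    /// row address) is on the block-list.
    fn search(&self) -> Vec<(u64, f32)> {
        self.postings
            .iter()
            .copied()
            .filter(|(addr, _)| {
                let frag = (addr >> 32) as u32;
                !self.invalidated_fragments.contains(&frag)
            })
            .collect()
    }
}
```

The trade-off versus update-time pruning: invalidation becomes cheap, at the cost of a small per-query filter and stale bytes remaining on disk until the next full rebuild.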
fix: invalidate index fragment bitmaps after data replacement and stale merge (lance-format#5929)

This fixes a bug where queries or merge insert could result in the error `this fragment does not exist in the dataset`. This would happen because the index referenced row addresses that were no longer in the dataset.

Here's one way this would happen:

* Create a BTREE index covering 2 fragments.
* Do a merge insert that rewrites the index column in place. This removes the updated data from the index's fragment bitmap, so that data is now hidden.
* Do compaction. This will move the rows to new fragments.
* Optimize the BTREE index: this would build the btree index with the old data, not filtering out the invalidated rows.

After doing this, the invalidated rows, previously masked out, are part of the index. But since compaction happened, those old invalidated rows moved.

This PR makes sure:

1. When incrementally updating indexes, we only read row ids for data in the fragment bitmap.
2. We remove fragments from the index fragment bitmap if we do a DataReplacement on those fields.

These rules are described in the spec at https://lance.org/format/table/index/#loading-an-index

Fixes lance-format#5321

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
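Rule 2 above can be sketched as a pruning pass over index metadata. This is a simplified illustration of what `prune_updated_fields_from_indices` (named in the fix) would do; the `IndexMeta` struct and plain `u32` sets are hypothetical stand-ins for Lance's manifest types.

```rust
use std::collections::HashSet;

/// Simplified index metadata: which fields the index covers and which
/// fragments it currently claims to index.
struct IndexMeta {
    fields: HashSet<u32>,
    fragment_bitmap: HashSet<u32>,
}

/// When a DataReplacement rewrites `replaced_fields` in `replaced_fragments`,
/// any index covering one of those fields must drop those fragments from
/// its bitmap, so stale index entries are masked out.
fn prune_updated_fields_from_indices(
    indices: &mut [IndexMeta],
    replaced_fields: &HashSet<u32>,
    replaced_fragments: &HashSet<u32>,
) {
    for index in indices.iter_mut() {
        // Only indices that cover a replaced field are affected.
        if !index.fields.is_disjoint(replaced_fields) {
            for frag in replaced_fragments {
                index.fragment_bitmap.remove(frag);
            }
        }
    }
}
```

Indices on untouched fields keep their bitmaps intact, which is what makes the per-field check (rather than blanket invalidation) worthwhile.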