
feat(rust): support refresh frag bitmap for index after updating when… #4589

Merged
yanghua merged 4 commits into lance-format:main from yanghua:stable-row-id-frag-bitmap
Sep 18, 2025

Collaborator

@yanghua yanghua commented Aug 28, 2025

… enable stable rowid

@github-actions github-actions Bot added the enhancement New feature or request label Aug 28, 2025
@yanghua yanghua changed the title feat(rust): support refresh frag bitmap for index after updating when… [WIP] feat(rust): support refresh frag bitmap for index after updating when… Aug 28, 2025
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@codecov-commenter

codecov-commenter commented Aug 30, 2025

Codecov Report

❌ Patch coverage is 96.94915% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.73%. Comparing base (419ac97) to head (df64a1c).
⚠️ Report is 13 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/transaction.rs | 83.51% | 13 Missing and 2 partials ⚠️ |
| rust/lance/src/dataset/write/update.rs | 98.84% | 0 Missing and 3 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4589      +/-   ##
==========================================
+ Coverage   80.61%   80.73%   +0.11%     
==========================================
  Files         320      321       +1     
  Lines      123816   125624    +1808     
  Branches   123816   125624    +1808     
==========================================
+ Hits        99814   101420    +1606     
- Misses      20437    20605     +168     
- Partials     3565     3599      +34     

| Flag | Coverage Δ |
|---|---|
| unittests | 80.73% <96.94%> (+0.11%) ⬆️ |

@yanghua yanghua force-pushed the stable-row-id-frag-bitmap branch from b07dd7c to 143db56 Compare August 30, 2025 13:59
@yanghua yanghua force-pushed the stable-row-id-frag-bitmap branch from 4a9f1f4 to 1cfb738 Compare August 31, 2025 11:01
Comment thread rust/lance/src/dataset/transaction.rs
fields_modified,
value_updated_fields: vec![],
mem_wal_to_flush: self.params.mem_wal_to_flush,
update_mode: Some(VerticalFullSchema),
Collaborator Author

Actually, this belongs to VerticalPartialSchema.

Collaborator Author

@wjones127 It seems that even for a non-full-schema merge-insert, we still cannot distinguish the updated fields (which may include the `on` field).

@yanghua yanghua force-pushed the stable-row-id-frag-bitmap branch from 740a2c3 to 0d27710 Compare September 1, 2025 08:57
Comment thread rust/lance/src/index/vector/ivf/v2.rs Outdated
@yanghua
Collaborator Author

yanghua commented Sep 3, 2025

@wjones127 When you have time, I hope you'll be able to answer. Much appreciated!

@jbapple
Contributor

jbapple commented Sep 8, 2025

@yanghua this is still marked as "Draft". Is that intentional?

@yanghua
Collaborator Author

yanghua commented Sep 9, 2025

> @yanghua this is still marked as "Draft". Is that intentional?

@jbapple Yeah, I still have some open questions and am waiting for answers from @wjones127.

@yanghua yanghua force-pushed the stable-row-id-frag-bitmap branch 4 times, most recently from 67b72d8 to a72cc51 Compare September 11, 2025 13:22
Comment thread rust/lance/src/dataset/transaction.rs
Comment thread protos/transaction.proto
Comment thread rust/lance/src/index/vector/ivf/v2.rs Outdated
Comment thread rust/lance/src/dataset/transaction.rs Outdated
Comment thread rust/lance/src/index/vector/ivf/v2.rs Outdated
@yanghua yanghua force-pushed the stable-row-id-frag-bitmap branch 4 times, most recently from c57603d to e86c883 Compare September 14, 2025 12:35
@yanghua yanghua marked this pull request as ready for review September 15, 2025 05:54
@yanghua yanghua changed the title [WIP] feat(rust): support refresh frag bitmap for index after updating when… feat(rust): support refresh frag bitmap for index after updating when… Sep 15, 2025
mem_wal_to_merge: Option<MemWal>,
/// The fields that used to judge whether to preserve the new frag's id into
/// the frag bitmap of the specified indices.
fields_for_preserving_frag_bitmap: Vec<u32>,
Collaborator Author

To avoid confusion, I renamed the field to this.

@yanghua yanghua force-pushed the stable-row-id-frag-bitmap branch from 3dfceda to 32c4b4e Compare September 15, 2025 06:17
@yanghua
Collaborator Author

yanghua commented Sep 15, 2025

@wjones127 This PR is ready for review, PTAL.

@yanghua
Collaborator Author

yanghua commented Sep 17, 2025

@wjones127 Any input?

Contributor

@wjones127 wjones127 left a comment

This seems adequate for now. But I think before we say this is production ready, I'd recommend broader test coverage.

Comment thread rust/lance/src/dataset/write/update.rs Outdated
Comment on lines +4660 to +4669
// the frag bitmap of the index for 'value' field should not been updated
let value_bitmap = updated_value_index.fragment_bitmap.as_ref().unwrap();
assert_eq!(value_bitmap.len(), 2);
assert!(value_bitmap.contains(0));
assert!(value_bitmap.contains(1));

let vec_bitmap = updated_vec_index.fragment_bitmap.as_ref().unwrap();
assert_eq!(vec_bitmap.len(), 2);
assert!(vec_bitmap.contains(0));
assert!(vec_bitmap.contains(1));
Contributor

It seems like, as written, we'll never update these for upsert, right? Originally, I was thinking if stable row ids were enabled, then updated rows and new rows would be written to separate fragments so that we could do this.

There is a case for merge_insert where we only do updates. This could be just because there aren't any new rows. Or it could be we set WhenNotMatched::DoNothing. In either of these cases, do we never update the bitmap?

Collaborator Author

@yanghua yanghua Sep 18, 2025

> It seems like, as written, we'll never update these for upsert, right?

Yes. In this test, that is because in full-schema mode we cannot figure out which columns were updated. So I put all fields into fields_for_preserving_frag_bitmap, which prevents the frag bitmap from being updated.

> Originally, I was thinking if stable row ids were enabled, then updated rows and new rows would be written to separate fragments so that we could do this.

IMO, it depends on whether we can distinguish which fields are updated for each row in each fragment, right? That is hard to do in RewriteRow mode. Correct me if I am wrong.

> In either of these cases, do we never update the bitmap?

The current test cases are pure updates, as you mentioned.

Currently, for merge insert, none of the cases (update, upsert) update the frag bitmap. This is because in RewriteRow mode it is hard to tell which fields were updated. The detailed logic is here:

let index_covers_modified_field = index.fields.iter().any(|field_id| {
    value_updated_field_set.contains(&u32::try_from(*field_id).unwrap())
});
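
To make the effect of that predicate concrete, here is a minimal, self-contained sketch of how it could gate the bitmap refresh. It is an illustration under stated assumptions, not the code merged in this PR: `IndexMeta`, `refresh_fragment_bitmaps`, and `example` are hypothetical names, a `BTreeSet<u32>` stands in for the real fragment bitmap, and the rule "add the new fragment ids only when the index covers none of the listed fields" is inferred from the discussion above.

```rust
use std::collections::{BTreeSet, HashSet};
use std::convert::TryFrom;

/// Hypothetical, simplified stand-in for index metadata (not Lance's real type).
#[derive(Debug, Clone)]
pub struct IndexMeta {
    pub name: String,
    /// Field ids covered by the index.
    pub fields: Vec<i32>,
    /// Fragment ids the index covers; a BTreeSet stands in for a bitmap here.
    pub fragment_bitmap: Option<BTreeSet<u32>>,
}

/// Adds `new_frag_ids` to the bitmap of every index that covers none of the
/// fields listed in `fields_for_preserving_frag_bitmap`.
pub fn refresh_fragment_bitmaps(
    indices: &mut [IndexMeta],
    new_frag_ids: &[u32],
    fields_for_preserving_frag_bitmap: &[u32],
) {
    let value_updated_field_set: HashSet<u32> =
        fields_for_preserving_frag_bitmap.iter().copied().collect();

    for index in indices.iter_mut() {
        // Mirrors the predicate quoted above, except that a negative field id
        // is treated as "no match" instead of panicking.
        let index_covers_modified_field = index.fields.iter().any(|field_id| {
            u32::try_from(*field_id)
                .map(|id| value_updated_field_set.contains(&id))
                .unwrap_or(false)
        });
        if index_covers_modified_field {
            // Indexed values may have changed, so the index must not claim
            // the newly written fragments.
            continue;
        }
        if let Some(bitmap) = index.fragment_bitmap.as_mut() {
            bitmap.extend(new_frag_ids.iter().copied());
        }
    }
}

/// Fragments 0 and 1 are indexed; an update rewrites rows into fragment 2
/// and changes only field 1.
pub fn example() -> Vec<IndexMeta> {
    let mut indices = vec![
        IndexMeta {
            name: "value_idx".to_string(),
            fields: vec![1],
            fragment_bitmap: Some(BTreeSet::from([0, 1])),
        },
        IndexMeta {
            name: "vec_idx".to_string(),
            fields: vec![2],
            fragment_bitmap: Some(BTreeSet::from([0, 1])),
        },
    ];
    refresh_fragment_bitmaps(&mut indices, &[2], &[1]);
    indices
}
```

In `example`, only the index on field 2 picks up fragment 2. Passing every field id as the third argument, as the full-schema case described above does, leaves both bitmaps unchanged.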

Comment on lines +4672 to +4673
#[tokio::test]
async fn test_sub_schema_upsert_fragment_bitmap() {
Contributor

Eventually, I'd like to see a lot more test cases for this. Unfortunately, right now the ratio of lines of code to useful test cases is pretty high. I'd suggest thinking about how you can parameterize the tests to cover more useful situations. For example, you might use the same original test data, and vary the merge insert parameters, input data, and index states.
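
One possible shape for such parameterization is a plain case table, sketched below. This is only an illustration: `index_keeps_new_fragments`, `Case`, `CASES`, and `failing_cases` are hypothetical names, and the predicate is a stand-in for the real bitmap logic. Real tests would run against an actual dataset and vary the merge insert parameters, input data, and index states; a crate such as `rstest` could express the same table as attributes.

```rust
/// Hypothetical pure helper: does an index pick up fragments written by an
/// update, given the fields the update reports as value-updated?
pub fn index_keeps_new_fragments(index_fields: &[u32], value_updated_fields: &[u32]) -> bool {
    !index_fields.iter().any(|f| value_updated_fields.contains(f))
}

/// One row of the case table.
pub struct Case {
    pub name: &'static str,
    pub index_fields: &'static [u32],
    pub value_updated_fields: &'static [u32],
    pub expect_refreshed: bool,
}

/// Shared table: each new scenario costs one row rather than a new test body.
pub const CASES: &[Case] = &[
    Case {
        name: "update touches the indexed field",
        index_fields: &[1],
        value_updated_fields: &[1],
        expect_refreshed: false,
    },
    Case {
        name: "update touches a different field",
        index_fields: &[2],
        value_updated_fields: &[1],
        expect_refreshed: true,
    },
    Case {
        name: "full-schema rewrite lists every field",
        index_fields: &[2],
        value_updated_fields: &[0, 1, 2],
        expect_refreshed: false,
    },
    Case {
        name: "multi-field index with one updated field",
        index_fields: &[1, 2],
        value_updated_fields: &[2],
        expect_refreshed: false,
    },
    Case {
        name: "no fields reported as updated",
        index_fields: &[1, 2],
        value_updated_fields: &[],
        expect_refreshed: true,
    },
];

/// Runs every case and returns the names of those whose outcome differs
/// from the expectation.
pub fn failing_cases() -> Vec<&'static str> {
    CASES
        .iter()
        .filter(|c| {
            index_keeps_new_fragments(c.index_fields, c.value_updated_fields) != c.expect_refreshed
        })
        .map(|c| c.name)
        .collect()
}
```

A single test can then assert that `failing_cases()` is empty, and a failure reports the scenario by name.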

Collaborator Author

> Eventually, I'd like to see a lot more test cases for this.

Currently, sub-schema merge insert as a whole is not affected by this PR. It belongs to the RewriteColumn mode, which updates in place. For a pure update, the row ids and fragments would not change, right?

@yanghua
Collaborator Author

yanghua commented Sep 18, 2025

> But I think before we say this is production ready, I'd recommend broader test coverage.

Absolutely, we can move faster. I will add more test cases and refactor them. Some logic may also need refactoring, e.g. #4695.

@yanghua
Collaborator Author

yanghua commented Sep 18, 2025

I will merge it. If there are any issues, I will fix them in follow-up PRs.

@yanghua yanghua merged commit a05d78d into lance-format:main Sep 18, 2025
26 checks passed
jackye1995 pushed a commit that referenced this pull request Sep 25, 2025
…4788)

This PR enhances the Java bindings for Operation::Update as a follow-up
to #4589.

While #4589 introduced new fields to the Rust Operation::Update struct,
the Java bindings were still missing support not only for those new
fields but also for the existing `fields_modified` field, which is
crucial for handling column modifications.

Specifically, it adds support for the following fields:

```rust
  Update {
      // ...
      /// The fields that have been modified
      fields_modified: Vec<u32>,
      // 👇 New fields from #4589
      /// The fields that used to judge whether to preserve the new frag's id into
      /// the frag bitmap of the specified indices.
      fields_for_preserving_frag_bitmap: Vec<u32>,
      /// The mode of update
      update_mode: Option<UpdateMode>,
  }
```

---------

Co-authored-by: Weiren <litaiwei.lwt@antgroup.com>
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
Labels

enhancement New feature or request java python