feat(rust): support refresh frag bitmap for index after updating when…#4589
feat(rust): support refresh frag bitmap for index after updating when…#4589yanghua merged 4 commits intolance-format:mainfrom
Conversation
|
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4589 +/- ##
==========================================
+ Coverage 80.61% 80.73% +0.11%
==========================================
Files 320 321 +1
Lines 123816 125624 +1808
Branches 123816 125624 +1808
==========================================
+ Hits 99814 101420 +1606
- Misses 20437 20605 +168
- Partials 3565 3599 +34
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
b07dd7c to
143db56
Compare
4a9f1f4 to
1cfb738
Compare
| fields_modified, | ||
| value_updated_fields: vec![], | ||
| mem_wal_to_flush: self.params.mem_wal_to_flush, | ||
| update_mode: Some(VerticalFullSchema), |
There was a problem hiding this comment.
Actually, it belongs VerticalPartialSchema
There was a problem hiding this comment.
@wjones127 It seems that even if it's a non-full-schema merge-insert, we still can not distinguish the updated fields(may contain the on field).
740a2c3 to
0d27710
Compare
|
@wjones127 When you have time, I hope you'll be able to answer. Very appreciated! |
|
@yanghua this is still marked as "Draft". Is that intentional? |
@jbapple Yeah, still need some questions, I am waiting for the answer from @wjones127 |
67b72d8 to
a72cc51
Compare
… enable stable rowid
c57603d to
e86c883
Compare
| mem_wal_to_merge: Option<MemWal>, | ||
| /// The fields that used to judge whether to preserve the new frag's id into | ||
| /// the frag bitmap of the specified indices. | ||
| fields_for_preserving_frag_bitmap: Vec<u32>, |
There was a problem hiding this comment.
To avoid confusion, name to this.
3dfceda to
32c4b4e
Compare
|
@wjones127 This PR is ready for review, PTAL. |
|
@wjones127 Any input? |
wjones127
left a comment
There was a problem hiding this comment.
This seems adequate for now. But I think before we say this is production ready, I'd recommend broader test coverage.
| // the frag bitmap of the index for 'value' field should not been updated | ||
| let value_bitmap = updated_value_index.fragment_bitmap.as_ref().unwrap(); | ||
| assert_eq!(value_bitmap.len(), 2); | ||
| assert!(value_bitmap.contains(0)); | ||
| assert!(value_bitmap.contains(1)); | ||
|
|
||
| let vec_bitmap = updated_vec_index.fragment_bitmap.as_ref().unwrap(); | ||
| assert_eq!(vec_bitmap.len(), 2); | ||
| assert!(vec_bitmap.contains(0)); | ||
| assert!(vec_bitmap.contains(1)); |
There was a problem hiding this comment.
It seems like, as written, we'll never update these for upsert, right? Originally, I was thinking if stable row ids were enabled, then updated rows and new rows would be written to separate fragments so that we could do this.
There is a case for merge_insert where we only do updates. This could be just because there aren't any new rows. Or it could be we set WhenNotMatched::DoNothing. In either of these cases, do we never update the bitmap?
There was a problem hiding this comment.
It seems like, as written, we'll never update these for upsert, right?
Yes, here(in this test) is because in full-schema mode, we did not figure out which columns are updated. So I set all fields to the fields_for_preserving_frag_bitmap, and it would cause the update for the frag bitmap not to happen.
Originally, I was thinking if stable row ids were enabled, then updated rows and new rows would be written to separate fragments so that we could do this.
IMO, it depends on whether we can distinguish which fields to update for each row in each fragment, right? It's hard to do this for RewriteRow mode. Correct me if I am wrong.
In either of these cases, do we never update the bitmap?
The current test cases are pure updates as you mentioned.
Currently, for merge insert, all the cases(update, upsert) would not update the frag bitmap. This is due to ReWriteRow mode, it's hard to split which fields would be updated. The detailed logic is here:
let index_covers_modified_field = index.fields.iter().any(|field_id| {
value_updated_field_set.contains(&u32::try_from(*field_id).unwrap())
});
| #[tokio::test] | ||
| async fn test_sub_schema_upsert_fragment_bitmap() { |
There was a problem hiding this comment.
Eventually, I'd like see a lot more tests cases for this. Unfortunately, right now the ratio of lines of code to useful tests cases is pretty high. I'd suggest thinking about how you can parameterize the tests to cover more useful situations. For example, you might use the same original test data, and very the merge insert parameters, input data, and index states.
There was a problem hiding this comment.
Eventually, I'd like see a lot more tests cases for this.
Currently, the whole sub-schema merge insert would not be affected by this PR. It belongs to the RewriteColumn mode. It would update in-place. For pure update, the row id, fragment would not be updated, right?
Absolutely, we can move faster. I would add more test cases and refactor them. Some logic may need to refactor, e.g. #4695 . |
|
Will merge it, if there are any issues, will fix them in the further PRs. |
…4788) This PR enhances the Java bindings for Operation::Update as a follow-up to #4589. While #4589 introduced new fields to the Rust Operation::Update struct, the Java bindings were still missing support not only for those new fields but also for the existing `fields_modified field`, which is crucial for handling column modifications. Specifically, it adds support for the following fields: ```rust Update { // ... /// The fields that have been modified fields_modified: Vec<u32>, // 👇 New fields from #4589 /// The fields that used to judge whether to preserve the new frag's id into /// the frag bitmap of the specified indices. fields_for_preserving_frag_bitmap: Vec<u32>, /// The mode of update update_mode: Option<UpdateMode>, } ``` --------- Co-authored-by: Weiren <litaiwei.lwt@antgroup.com>
… enable stable rowid (lance-format#4589) … enable stable rowid
…ance-format#4788) This PR enhances the Java bindings for Operation::Update as a follow-up to lance-format#4589. While lance-format#4589 introduced new fields to the Rust Operation::Update struct, the Java bindings were still missing support not only for those new fields but also for the existing `fields_modified field`, which is crucial for handling column modifications. Specifically, it adds support for the following fields: ```rust Update { // ... /// The fields that have been modified fields_modified: Vec<u32>, // 👇 New fields from lance-format#4589 /// The fields that used to judge whether to preserve the new frag's id into /// the frag bitmap of the specified indices. fields_for_preserving_frag_bitmap: Vec<u32>, /// The mode of update update_mode: Option<UpdateMode>, } ``` --------- Co-authored-by: Weiren <litaiwei.lwt@antgroup.com>
… enable stable rowid