fix: merge struct array use wrong child values #5106
westonpace merged 2 commits into lance-format:main
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #5106 +/- ##
==========================================
+ Coverage 81.87% 81.91% +0.03%
==========================================
Files 341 341
Lines 140539 140667 +128
Branches 140539 140667 +128
==========================================
+ Hits 115072 115224 +152
+ Misses 21661 21637 -24
Partials 3806 3806
Hi @westonpace, @Xuanwo, could you help review when you have time? Thanks very much~
8244fff to 8276f46
westonpace left a comment
One super tiny nit which is that we should re-use the existing trimmed_values function. Otherwise, this looks great
fn get_list_values<O: OffsetSizeTrait>(list_array: &GenericListArray<O>) -> ArrayRef {
    let offsets = list_array.value_offsets();
    let start = offsets[0].to_usize().expect("offset overflow");
    let end = offsets[list_array.len()]
        .to_usize()
        .expect("offset overflow");
    list_array.values().slice(start, end - start)
}
There is already a function trimmed_values that you can use in rust/lance-arrow/src/list.rs in ListArrayExt.
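To illustrate why the values need trimming at all, here is a minimal sketch using only the standard library. `ToyList` is a hypothetical stand-in for an Arrow list array, not the real `GenericListArray` or the `ListArrayExt` implementation; it only assumes the usual layout in which `offsets[i]..offsets[i + 1]` selects the values of list `i`, and that slicing narrows the offsets while sharing the full values buffer.

```rust
/// Toy stand-in for an Arrow list array (hypothetical type, not the Arrow API):
/// `offsets[i]..offsets[i + 1]` is the range of `values` owned by list `i`.
struct ToyList<'a> {
    offsets: &'a [usize],
    values: &'a [i64],
}

impl<'a> ToyList<'a> {
    /// Number of lists, which is one fewer than the number of offsets.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Zero-copy slice: narrows the offsets but keeps the whole values buffer.
    fn slice(&self, start: usize, len: usize) -> ToyList<'a> {
        ToyList {
            offsets: &self.offsets[start..=start + len],
            values: self.values,
        }
    }

    /// Only the values that the current offsets actually reference.
    fn trimmed_values(&self) -> &'a [i64] {
        let start = self.offsets[0];
        let end = self.offsets[self.len()];
        &self.values[start..end]
    }
}

fn main() {
    let values: [i64; 7] = [10, 11, 12, 13, 14, 15, 16];
    let offsets: [usize; 4] = [0, 2, 3, 7];
    let full = ToyList { offsets: &offsets, values: &values };

    // Keep only the middle list, which owns a single value.
    let sliced = full.slice(1, 1);
    println!(
        "untrimmed: {}, trimmed: {}",
        sliced.values.len(),
        sliced.trimmed_values().len()
    );
}
```

In this toy model the sliced list still exposes all 7 values unless they are trimmed by the offsets, which is the same kind of length mismatch as the "expected 5 got 3019" error reported in this PR.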
    .unwrap();
let merged_values = merge_list_child_values(
    child_field.as_ref(),
    left_list.values().clone(),
This change is definitely needed, good catch.
8276f46 to b5b69fd
Hi @westonpace, thanks for your suggestion. I've updated the PR, please review when you have time. Thanks very much!
When I worked with a dataset using the struct column below, I got the error
'Incorrect array length for StructArray field type, expected 5 got
3019'.
```
Total rows: 1000
Total items of 'type': 3019
Schema: map_data: list<item: struct<lane_dir: large_string, type: int64, xyz: list<item: list<item: double>>>>
  child 0, item: struct<lane_dir: large_string, type: int64, xyz: list<item: list<item: double>>>
    child 0, lane_dir: large_string
    child 1, type: int64
    child 2, xyz: list<item: list<item: double>>
      child 0, item: list<item: double>
        child 0, item: double
```
It is caused by the values not being read correctly when processing the
child list.
---------
Co-authored-by: lijinglun <lijinglun@bytedance.com>