Skip to content

Conversation

@Weijun-H
Copy link
Member

@Weijun-H Weijun-H commented Jan 4, 2024

Which issue does this PR close?

Closes #7194

Rationale for this change

  • Support array_resize
  • Support column for this function
  • Support LargeList

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Jan 4, 2024
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I have a different idea about array_size.
We can apply the combination of existing array_length, array_slice, array_repeat and array_concat to resize the array. But I'm not sure is it faster or slower, is it easier to maintain or not. How do you think about this approach @Weijun-H ?

array_concat(array_slice(array, 0, min(array_length, count)), array_repeat(default_array, max(count-array_length), 0))


for (index, arr) in array.iter().enumerate() {
let count = count_array.value(index).to_usize().ok_or_else(|| {
DataFusionError::Execution(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: exec_err

array: &GenericListArray<O>,
count_array: &Int64Array,
field: &FieldRef,
new_element: Option<ArrayRef>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_element might be more concise?

@Weijun-H
Copy link
Member Author

Weijun-H commented Jan 5, 2024

array_concat(array_slice(array, 0, min(array_length, count)), array_repeat(default_array, max(count-array_length), 0))

I think this approach is not a better idea, as supporting the column for max(count-array_length) in an argument is hard. However, the MutableArray could be another direction to consider, just like you did for other functions. I will try it as a follow-up PR. @jayzhan211

@comphead
Copy link
Contributor

comphead commented Jan 7, 2024

thanks @Weijun-H I'll review it tomorrow more closely if no one else beats me.

let new_rows = vec![&default_value; count - remain_count];
rows.extend(new_rows);

let last_offset = offsets.last().copied().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please handle the potential error on .unwrap

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Weijun-H tests are awesome.
Im just wondering could we just concat 2 arrays(old+new one with default values) to do a resize?

I have slight concern that performance on current implementation, maybe I'm wrong

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Weijun-H and @comphead -- this is looking good

@jayzhan211

This comment was marked as outdated.


for (row_index, offset_window) in array.offsets().windows(2).enumerate() {
let count = count_array.value(row_index).to_usize().ok_or_else(|| {
exec_datafusion_err!("array_resize: failed to convert size to usize")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does type conversion more like internal_err?

wasn't expected/anticipated by the implementation and that is most likely a bug (the error message even encourages users to open a bug report)

unlike array.len() != number, which is error from user.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better with MutableArray! LGTM

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks for using MutableBuffer @Weijun-H it looks nicer now
Will also wait for @alamb as he had some comments as well

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thanks @Weijun-H and @comphead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement array_resize function

4 participants