[SPARK-22715][SQL] Reuse the same array in CreateNamedStruct #19910
LGTM
|$values = new Object[${valExprs.size}];
|$valuesCode
|final InternalRow ${ev.value} = new $rowClass($values);
|$values = null;
We create the object array every time because GenericInternalRow doesn't copy the given array. If the parent operator keeps the produced rows without copying them, this change may cause wrong results.
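To make the concern concrete, here is a minimal Java sketch of the aliasing problem. The `Row` class is a hypothetical stand-in that, like GenericInternalRow as described above, wraps the array it is given without copying it. The class and method names are assumptions for illustration and are not Spark code.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayReuseHazard {
    // Hypothetical stand-in for a row that wraps the given array without copying it.
    public static final class Row {
        private final Object[] values;
        public Row(Object[] values) { this.values = values; }
        public Object get(int i) { return values[i]; }
    }

    // Allocates a fresh array for every produced row; buffered rows stay independent.
    public static List<Row> freshArrayPerRow(int n) {
        List<Row> buffered = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Object[] values = new Object[1];
            values[0] = i;
            buffered.add(new Row(values)); // parent keeps the row without copying
        }
        return buffered;
    }

    // Reuses one array for every produced row; all buffered rows alias the same storage.
    public static List<Row> reusedArray(int n) {
        List<Row> buffered = new ArrayList<>();
        Object[] values = new Object[1];
        for (int i = 0; i < n; i++) {
            values[0] = i; // overwrites what earlier rows see
            buffered.add(new Row(values));
        }
        return buffered;
    }

    public static void main(String[] args) {
        System.out.println(freshArrayPerRow(3).get(0).get(0)); // prints 0
        System.out.println(reusedArray(3).get(0).get(0));      // prints 2
    }
}
```

With a reused array, every buffered row reports the last value written, which is the kind of wrong result a parent operator would see if it holds rows without copying.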
I had the same feeling. But then I thought: isn't it the same for every other value?
I think the contract is: if we need to buffer some values, we need to copy them. BTW CreateArray also reuses the array.
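A rough Java sketch of that contract, under the assumption that the producer reuses its array and the consumer is responsible for copying before it buffers. `Row` and its `copy()` method here are hypothetical illustrations, not the Spark classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CopyBeforeBuffering {
    // Hypothetical row that wraps the given array without copying it.
    public static final class Row {
        private final Object[] values;
        public Row(Object[] values) { this.values = values; }
        public Object get(int i) { return values[i]; }
        // Returns a row backed by its own private copy of the array.
        public Row copy() { return new Row(Arrays.copyOf(values, values.length)); }
    }

    // Producer reuses one array; the buffering consumer copies each row it keeps.
    public static List<Row> bufferWithCopy(int n) {
        Object[] shared = new Object[1];
        List<Row> buffered = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            shared[0] = i;
            Row produced = new Row(shared);
            buffered.add(produced.copy()); // copy before holding on to the row
        }
        return buffered;
    }
}
```

Under this contract the producer is free to reuse storage, but every operator that buffers rows has to remember to copy them.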
Actually CreateArray only reuses the array if the element type is not primitive; we should also fix it in another PR. @mgaido91
IIUC I think CreateArray never reuses the array. Why are you saying it does?
damn it was reverted in https://github.com/apache/spark/pull/19797/files#diff-c1758d627a06084e577be0d33d47f44eL97 and I was reading the old code. Let's bring it back.
Also cc @kiszk, was it by accident?
#19797 avoided keeping an object in a global variable in several places, since reusing an object that way looks hacky.
I intentionally avoided the reuse of an array.
Sorry, I don't remember it very clearly. What was the gain of not reusing the array? Saving a global variable slot doesn't seem worth it.
Test build #84554 has finished for PR 19910 at commit
@cloud-fan the UT failures are caused by this PR. Therefore I think @viirya is right and also for
I did a quick search; it seems we do follow the rule of not reusing the data array. Let's restore the change in #19896.
What changes were proposed in this pull request?
The PR reuses the same object in CreateNamedStruct instead of instantiating a new object every time.
The idea of this PR was suggested by @cloud-fan here (#19896 (comment)).
How was this patch tested?
existing UTs