Skip to content

Conversation

@mgaido91
Copy link
Contributor

@mgaido91 mgaido91 commented Dec 6, 2017

What changes were proposed in this pull request?

The PR reuses the same object in CreateNamedStruct instead of instantiating a new object every time.

The idea of this PR was suggested by @cloud-fan here (#19896 (comment)).

How was this patch tested?

existing UTs

@cloud-fan
Copy link
Contributor

LGTM

|$values = new Object[${valExprs.size}];
|$valuesCode
|final InternalRow ${ev.value} = new $rowClass($values);
|$values = null;
Copy link
Member

@viirya viirya Dec 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create the object array every time is because GenericInternalRow doesn't copy the given array. If the parent operator keeps the produced rows without copying them, this change may cause wrong result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same feeling. But then I thought: isn't the same for each other value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the contractor is, if we need to buffer some values, we need to copy them. BTW CreateArray also reuse the array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually CreateArray only reuse the array if element type is not primitive, we should also fix it in another PR. @mgaido91

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC I think CreateArray never reuses the array. Why are you saying it does?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

damn it was reverted in https://github.com/apache/spark/pull/19797/files#diff-c1758d627a06084e577be0d33d47f44eL97 and I was reading the old code. Let's bring it back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also cc @kiszk , is it by accident?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#19797 addressed to avoid using an object in a global variable in several places since to reuse an object looks a hacky way.
I intentionally avoided the reuse of an array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I didn't remember it very clearly, what was the gain of not reusing the array? Saving a global variable slot seems not worth.

@SparkQA
Copy link

SparkQA commented Dec 6, 2017

Test build #84554 has finished for PR 19910 at commit e995a0e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

mgaido91 commented Dec 6, 2017

@cloud-fan the UT failures are caused by this PR. Therefore I think @viirya is right and also for CreateArray we shouldn't reuse the object to avoid bad results. Likely this happens only in some particular cases, but this can generate bad results. Then I think the best option is to close this PR and to restore the change in #19896 , what do you think?

@cloud-fan
Copy link
Contributor

I did a quick search, seems we do follow the rule that avoids reusing the data array. Let's restore the change in #19896

@mgaido91 mgaido91 closed this Dec 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants