[SPARK-14092] [SQL] move shouldStop() to end of while loop #11912
We would need to dump the JITed assembly to understand what's going on.
    | while ($idx < numRows) {
    |   int $rowidx = $idx++;
    |   ${consume(ctx, columns1).trim}
    |   if (shouldStop()) return;
Can we add some comment somewhere to explain why shouldStop needs to be here? It'd be great to reference the JIRA ticket.
It's here in the beginning.
where is it?
BTW, I'm not sure, but I suspect this has to do with loop unrolling: the JIT stops unrolling the loop when shouldStop() is part of the loop's termination condition.
Can we add a comment around line 248 saying this loop is very perf sensitive and changes to it should be measured carefully?
Test build #53905 has finished for PR 11912 at commit
LGTM
Added comment, merging this into master.
What changes were proposed in this pull request?
This PR rolls back some of the changes in #11274, which introduced a performance regression when doing a simple aggregation on a Parquet scan with one integer column.
I don't really understand how this change has such a large impact; it may be related to how the JIT compiler inlines functions (I saw very different stats from profiling).
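For readers unfamiliar with the generated code, the two loop shapes under discussion can be sketched roughly as follows. This is a minimal, hand-written illustration and not the actual generated code: the class `LoopShapes`, the `stopAfter` threshold, and the bodies of `consume` and `shouldStop` are hypothetical stand-ins, and the "before" shape is assumed from the review discussion of `shouldStop()` being part of the loop condition.

```java
// Hypothetical stand-in for the generated scan loop; not Spark's real code.
class LoopShapes {
    private final int[] column;  // a single integer column (assumed input)
    private final int stopAfter; // assumed: stop once this many rows are consumed
    private int idx = 0;
    private int consumed = 0;
    private long sum = 0;

    LoopShapes(int[] column, int stopAfter) {
        this.column = column;
        this.stopAfter = stopAfter;
    }

    // Stand-in for shouldStop(): true once enough rows have been consumed.
    private boolean shouldStop() {
        return consumed >= stopAfter;
    }

    // Stand-in for the consume(...) body: a simple sum aggregation.
    private void consume(int value) {
        sum += value;
        consumed++;
    }

    // Assumed "before" shape: shouldStop() is part of the loop condition.
    long scanCheckInCondition() {
        while (!shouldStop() && idx < column.length) {
            int rowIdx = idx++;
            consume(column[rowIdx]);
        }
        return sum;
    }

    // Shape after this PR: shouldStop() is checked at the end of the loop body.
    long scanCheckAtEnd() {
        while (idx < column.length) {
            int rowIdx = idx++;
            consume(column[rowIdx]);
            if (shouldStop()) return sum;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        System.out.println(new LoopShapes(data, 3).scanCheckInCondition()); // prints 6
        System.out.println(new LoopShapes(data, 3).scanCheckAtEnd());       // prints 6
    }
}
```

Both shapes produce the same result when `shouldStop()` is false on entry; the only difference is where the check sits relative to the loop condition. Whether that placement actually changes unrolling or inlining would have to be confirmed by inspecting the JIT output, as noted in the conversation.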
How was this patch tested?
Manually ran the Parquet reader benchmark. Before this change:
After this change: