-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-40415: [C++] Extract the primitive operations implementing Take and make them instantiable in multiple scenarios #40416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
|
@ursabot please benchmark |
|
Benchmark runs are scheduled for commit d972946c31d22c710310ec5a914753bcaaefb4c1. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
|
Results of benchmarks I ran on my laptop:
My goal was to factor out the abstractions for use in multiple contexts, but the performance results are very good. Details
|
|
Thanks for your patience. Conbench analyzed the 7 benchmarking runs that have been run so far on PR commit d972946c31d22c710310ec5a914753bcaaefb4c1. There were 21 benchmark results indicating a performance regression:
The full Conbench report has more details. |
|
Ran the benchmarks locally (gcc 12.3.0, AMD Ryzen 9 3900X). There are a number of improved benchmarks and also a couple sizable regressions: Details
|
|
I spent a lot of time trying to figure out what could possibly be causing these regressions, but it turns out to be randomness caused by memory operations and branch predictions. When running the benchmarks on a commit that doesn't even touch the code I get sizeable regressions too.
UPDATE: with all my changes applied:
|
|
Ok, there's a lot of C++ to decipher here but I'd rather see an explanation of how this is different from what already exists. Apart from calling the primitive "gather" than "take" and adding a bunch of abstractions that seem to emulate what already exists, I'm not sure what this brings exactly. |
|
My intuition is that this is primarily avoiding the cost of |
b97938c to
c692832
Compare
…nulls to a FixedSizeListBuilder
…locatePrimitiveArrayData
Rationale for this change
This can help fixing #39798 and #39566, but is also improving the performance of
Take(specially in the no-nulls case).What changes are included in this PR?
Takeimplementation for values/indices without nullsTakeinto a single oneARROW_COMPILER_ASSUMEthat allows reducing the binary size because in the loop body we know if the validity bitmap is null or notAre these changes tested?
Takeguarantees null values are zeroed out