GH-34388: [C++] Build core compute kernels unconditionally #34295
One thing I haven't decided is how to deal with the compute unit tests, since most of them make heavy use of the extra kernels, so a good chunk of them will fail without them. Easiest option would be force (Also, I'll look into the unity build failures; not quite sure what's going wrong there...)

cc @felipecrv

I've read the PR description more carefully now.

Well, to be fair, that might still be the right move, as it'd be easy to make that a follow-up PR if we decide to go down that road (assuming we get the library boundaries right in this one).

The goal is to eventually reduce the size of the "core" library, right? Do we have any idea how slim this set of baseline kernels is compared to the full set?
westonpace left a comment
Seems like a reasonable set of minimal kernels. Do you want to add a CI job (or does one already exist?) that builds without ARROW_COMPUTE to ensure basic functionality (e.g. parquet reading/writing and csv reading/writing) still works?
Here's what would be present in the default build:

There are currently 240 kernels in the full set, so it's a pretty deep cut.
That would be a good idea, yes. AFAIK none of the existing jobs build without ARROW_COMPUTE. Even if they did, the CSV writer/STL tests wouldn't be included and libparquet wouldn't be built at all.

This failure looks related: https://github.com/apache/arrow/actions/runs/4247620924/jobs/7385865305 Perhaps just a change in include order has angered the unity build gremlins in some way. Given that the two implementations are identical, maybe we could put them in util_internal.h?
I still need to add the CI job, but in preparation I set things up so that certain tests won't be built without the complete kernel registry, so we won't need any special ctest flags to avoid expected failures. The unity build redefinition errors should be fixed now. Most of the problematic code in scalar_round.cc was actually completely unused, so I just removed it.
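The test gating described in that comment can be modeled in a few lines. This is a minimal C++ sketch under stated assumptions. A hypothetical `EXAMPLE_FULL_KERNELS` macro stands in for an `ARROW_COMPUTE=ON` build, and a hypothetical `ToyTest` struct stands in for a test target. It is not Arrow's build configuration or gtest. The comment says such tests are not built at all. The sketch mirrors only the selection rule, and it applies that rule at runtime for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical test descriptor (not gtest): a name plus whether the test
// relies on kernels outside the core set.
struct ToyTest {
  std::string name;
  bool needs_full_registry;
};

// Hypothetical macro standing in for a build with ARROW_COMPUTE=ON.
#ifdef EXAMPLE_FULL_KERNELS
constexpr bool kHaveFullRegistry = true;
#else
constexpr bool kHaveFullRegistry = false;
#endif

// Keeps only the tests that can pass in this build configuration, so a
// core-only build never includes tests that are expected to fail.
std::vector<ToyTest> SelectTests(const std::vector<ToyTest>& all) {
  std::vector<ToyTest> selected;
  for (const ToyTest& test : all) {
    assert(!test.name.empty());  // every test needs a name to be reported
    if (!test.needs_full_registry || kHaveFullRegistry) {
      selected.push_back(test);
    }
  }
  return selected;
}
```

Leaving the tests out of the build, rather than marking them as expected failures, means a core-only configuration does not need any special ctest flags. That matches the rationale given in the comment.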
Co-authored-by: Weston Pace <weston.pace@gmail.com>
@felipecrv / @lidavidm I haven't been following the discussion on #25025 very closely. However, this change seems good. I assume we want to proceed with it?

Yes, either way, I think this is a necessary first step before we can unbundle the kernels, and I hope it is easier to review this way.
Benchmark runs are scheduled for baseline = 4b31aa4 and contender = be7a763. be7a763 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…37409)

### Rationale for this change

GH-34295 changed the meaning of `ARROW_COMPUTE`: `ARROW_COMPUTE=ON` now means "all compute kernels are enabled", not "the compute module is enabled". `arrow-compute.pc` is used to detect `ARROW_COMPUTE`, so `arrow-compute.pc` should be installed only when `ARROW_COMPUTE=ON`.

### What changes are included in this PR?

Add the missing `if (ARROW_COMPUTE)`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: #37408

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
This includes the core compute machinery in libarrow by default, along with all cast kernels and several other kernels that are either dependencies of `cast` (`take`) or used in libarrow/libparquet (`unique`, `filter`). The remaining kernels won't be built/registered unless `ARROW_COMPUTE=ON`. Note that this would slightly change the option's meaning, since currently nothing in arrow/compute is built unless it's set.

Initially this was more substantial, as the original goal was to build the extra kernels as a shared library (suggested in the original issue). After some discussion in the issue thread, I opted not to do that, primarily because I can't personally see the utility of a separate lib here, even ignoring the complexity it introduces. However, there may be a good reason that simply hasn't occurred to me.
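The split between always-registered and optional kernels can be illustrated with a toy registry. This is a minimal C++ sketch under stated assumptions. `ToyRegistry`, `RegisterCoreFunctions`, `RegisterExtraFunctions`, `MakeRegistry`, the `EXAMPLE_FULL_KERNELS` macro, and the `extra_kernel_*` names are all hypothetical. This is not Arrow's `FunctionRegistry` or its actual registration code. Only the core function names (`cast`, `take`, `unique`, `filter`) come from the description above, and here `cast` stands in for all of the cast kernels.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a function registry: it only records function names and
// is NOT Arrow's arrow::compute::FunctionRegistry.
using ToyRegistry = std::set<std::string>;

// Core set, always registered: "cast" (standing in for all cast kernels),
// "take" (a dependency of cast) and "unique"/"filter" (used by
// libarrow/libparquet).
void RegisterCoreFunctions(ToyRegistry* registry) {
  for (const char* name : {"cast", "take", "unique", "filter"}) {
    registry->emplace(name);
  }
}

// Extra set, compiled in only when the hypothetical EXAMPLE_FULL_KERNELS
// macro is defined; it plays the role of ARROW_COMPUTE=ON. The names are
// placeholders, not a list taken from the PR.
void RegisterExtraFunctions(ToyRegistry* registry) {
#ifdef EXAMPLE_FULL_KERNELS
  for (const char* name : {"extra_kernel_a", "extra_kernel_b"}) {
    registry->emplace(name);
  }
#else
  (void)registry;  // nothing to register in a core-only build
#endif
}

// Builds the registry the way a default (core-only) or full build would.
ToyRegistry MakeRegistry() {
  ToyRegistry registry;
  RegisterCoreFunctions(&registry);
  RegisterExtraFunctions(&registry);
  // The core functions must be present regardless of the build flavor.
  assert(registry.count("cast") == 1 && registry.count("take") == 1);
  return registry;
}
```

With the macro undefined, `MakeRegistry()` holds exactly the four core names. Compiling with `-DEXAMPLE_FULL_KERNELS` adds the placeholder extras. Keeping the optional set behind one build-time switch inside the same library is in the spirit of the decision above not to ship a separate shared library.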