Skip to content

Conversation

@ildipo
Copy link
Contributor

@ildipo ildipo commented Mar 23, 2023

Rationale for this change

See the linked issue

What changes are included in this PR?

C++:

  • remove all compute/exec/* from libarrow
  • rename compute/exec -> acero and make libarrow_acero
  • add new ARROW_ACERO option, required if ARROW_DATASET is on
  • libarrow_dataset now depends on libarrow_acero

c_glib: add the new libarrow_acero dependency - we disallow building glib without it

python: added PYARROW_BUILD_ACERO, set to on if DATASETS are built

Are these changes tested?

All the standard tests do work properly.

I manually compiled C++ with:

  • no ARROW_ACERO
  • ARROW_ACERO and no ARROW_DATASET
  • ARROW_ACERO and ARROW_DATASET and no ARROW_SUBSTRAIT

I manually compiled python without ACERO & DATASET and with ACERO and without DATASET

Are there any user-facing changes?

If users include compute/exec files directly then they'll have to update their code.

@github-actions
Copy link

@kou kou changed the title GH-15280: [C++,Python,C_GLIB] add libarrow_acero containing everything previously in compute/exec GH-15280: [C++][Python][GLib] add libarrow_acero containing everything previously in compute/exec Mar 23, 2023
@ildipo
Copy link
Contributor Author

ildipo commented Mar 23, 2023

@westonpace I did not move acero components in their own namespace.
Do you think it is worth doing that as well?

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting review Awaiting review awaiting committer review Awaiting committer review and removed awaiting review Awaiting review awaiting changes Awaiting changes labels Mar 23, 2023
@ildipo ildipo requested a review from assignUser as a code owner March 24, 2023 16:34
@westonpace
Copy link
Member

@westonpace I did not move acero components in their own namespace.
Do you think it is worth doing that as well?

Hmm, let's leave it as it is for now. That would be a more significant breaking change for anyone using Acero today and I don't know that it's justified.

@westonpace
Copy link
Member

Hmm, let's leave it as it is for now. That would be a more significant breaking change for anyone using Acero today and I don't know that it's justified.

On the other hand...they're going to have to change all their include statements. So if we are going to make the change at some point, now is probably the time to do it.

@ildipo ildipo requested a review from raulcd as a code owner March 24, 2023 18:10
@ildipo ildipo force-pushed the refactor_10 branch 2 times, most recently from 72620f7 to d551aa0 Compare March 27, 2023 13:58
@kou
Copy link
Member

kou commented Apr 1, 2023

@icexelloss It seems that this breaks the "AMD64 Windows R 4.2 RTools 42" job:

https://github.com/apache/arrow/actions/runs/4582748548/jobs/8093221704#step:12:90

g++ -shared -s -static-libgcc -o arrow.dll tmp.def RTasks.o altrep.o array.o array_to_vector.o arraydata.o arrowExports.o bridge.o buffer.o chunkedarray.o compression.o compute-exec.o compute.o config.o csv.o dataset.o datatype.o expression.o extension-impl.o feather.o field.o filesystem.o io.o json.o memorypool.o message.o parquet.o r_to_arrow.o recordbatch.o recordbatchreader.o recordbatchwriter.o safe-call-into-r-impl.o scalar.o schema.o symbols.o table.o threadpool.o type_infer.o -L../windows/arrow-11.0.0.9000/lib-10.4.0/x64 -L../windows/arrow-11.0.0.9000/lib/x64-ucrt -larrow_dataset -lparquet -larrow -larrow_bundled_dependencies -lutf8proc -lthrift -lsnappy -lz -lzstd -llz4 -lbz2 -lbrotlienc -lbrotlidec -lbrotlicommon -lole32 -lbcrypt -lpsapi -lcrypto -lcrypt32 -lre2 -laws-cpp-sdk-config -laws-cpp-sdk-transfer -laws-cpp-sdk-identity-management -laws-cpp-sdk-cognito-identity -laws-cpp-sdk-sts -laws-cpp-sdk-s3 -laws-cpp-sdk-core -laws-c-event-stream -laws-checksums -laws-c-common -lUserenv -lversion -lws2_32 -lBcrypt -lWininet -lwinhttp -lcurl -lnormaliz -lssh2 -lgdi32 -lssl -lcrypto -lcrypt32 -lwldap32 -lz -lws2_32 -lnghttp2 -Lc:/rtools42/x86_64-w64-mingw32.static.posix/lib/x64 -Lc:/rtools42/x86_64-w64-mingw32.static.posix/lib -LC:/R/bin/x64 -lR
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x139e): undefined reference to `__imp__ZNK5arrow5acero8ExecPlan8ToStringB5cxx11Ev'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1637): undefined reference to `__imp__ZN5arrow5acero8ExecPlan14StartProducingEv'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1649): undefined reference to `__imp__ZN5arrow5acero8ExecPlan8finishedEv'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1c3e): undefined reference to `__imp__ZN5arrow5acero8ExecPlan4MakeEPNS_7compute11ExecContextESt10shared_ptrIKNS_16KeyValueMetadataEE'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1de0): undefined reference to `__imp__ZN5arrow5acero29default_exec_factory_registryEv'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x2127): undefined reference to `__imp__ZTVN5arrow5acero34RecordBatchReaderSourceNodeOptionsE'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x2281): undefined reference to `__imp__ZTVN5arrow5acero22TableSourceNodeOptionsE'
...

Could you check this and open an issue for this if needed?

@assignUser
Copy link
Member

Could we run crossbow jobs before merging big changes like this? I don't see any in this PR at all.

@assignUser
Copy link
Member

assignUser commented Apr 1, 2023

@github-actions crossbow submit -g r -g cpp

@github-actions
Copy link

github-actions bot commented Apr 1, 2023

Revision: 769da29

Submitted crossbow builds: ursacomputing/crossbow @ actions-c171914822

Task Status
conda-linux-aarch64-cpu-r41 Azure
conda-linux-aarch64-cpu-r42 Azure
conda-linux-x64-cpu-r41 Azure
conda-linux-x64-cpu-r42 Azure
conda-osx-arm64-cpu-r41 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-x64-cpu-r41 Azure
conda-osx-x64-cpu-r42 Azure
conda-win-x64-cpu-r41 Azure
homebrew-r-autobrew Github Actions
homebrew-r-brew Github Actions
r-binary-packages Github Actions
test-alpine-linux-cpp Github Actions
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp Github Actions
test-debian-10-cpp-amd64 Github Actions
test-debian-10-cpp-i386 Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-fedora-35-cpp Github Actions
test-fedora-r-clang-sanitizer Azure
test-r-arrow-backwards-compatibility Github Actions
test-r-depsource-bundled Azure
test-r-depsource-system Github Actions
test-r-dev-duckdb Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-gcc-12 Github Actions
test-r-install-local Github Actions
test-r-install-local-minsizerel Github Actions
test-r-library-r-base-latest Azure
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal Github Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rstudio-r-base-4.1-opensuse153 Azure
test-r-rstudio-r-base-4.2-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 Github Actions
test-r-versions Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-20 Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-minimal-with-formats Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-22.04-cpp Github Actions
test-ubuntu-r-sanitizer Azure

@ursabot
Copy link

ursabot commented Apr 1, 2023

Benchmark runs are scheduled for baseline = 7e19111 and contender = f137f29. f137f29 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.56% ⬆️0.0%] test-mac-arm
[Finished ⬇️3.83% ⬆️0.0%] ursa-i9-9960x
[Failed ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] f137f29f ec2-t3-xlarge-us-east-2
[Failed] f137f29f test-mac-arm
[Finished] f137f29f ursa-i9-9960x
[Failed] f137f29f ursa-thinkcentre-m75q
[Finished] 7e19111f ec2-t3-xlarge-us-east-2
[Failed] 7e19111f test-mac-arm
[Finished] 7e19111f ursa-i9-9960x
[Failed] 7e19111f ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@icexelloss
Copy link
Contributor

Could we run crossbow jobs before merging big changes like this?

Sorry I wasn't aware of crossbow jobs needs to be manually invoked before merging. Certainly will do next time.

For my knowledge, what is the difference between crossbow jobs and the github CI?

@icexelloss
Copy link
Contributor

@icexelloss It seems that this breaks the "AMD64 Windows R 4.2 RTools 42" job:

https://github.com/apache/arrow/actions/runs/4582748548/jobs/8093221704#step:12:90

g++ -shared -s -static-libgcc -o arrow.dll tmp.def RTasks.o altrep.o array.o array_to_vector.o arraydata.o arrowExports.o bridge.o buffer.o chunkedarray.o compression.o compute-exec.o compute.o config.o csv.o dataset.o datatype.o expression.o extension-impl.o feather.o field.o filesystem.o io.o json.o memorypool.o message.o parquet.o r_to_arrow.o recordbatch.o recordbatchreader.o recordbatchwriter.o safe-call-into-r-impl.o scalar.o schema.o symbols.o table.o threadpool.o type_infer.o -L../windows/arrow-11.0.0.9000/lib-10.4.0/x64 -L../windows/arrow-11.0.0.9000/lib/x64-ucrt -larrow_dataset -lparquet -larrow -larrow_bundled_dependencies -lutf8proc -lthrift -lsnappy -lz -lzstd -llz4 -lbz2 -lbrotlienc -lbrotlidec -lbrotlicommon -lole32 -lbcrypt -lpsapi -lcrypto -lcrypt32 -lre2 -laws-cpp-sdk-config -laws-cpp-sdk-transfer -laws-cpp-sdk-identity-management -laws-cpp-sdk-cognito-identity -laws-cpp-sdk-sts -laws-cpp-sdk-s3 -laws-cpp-sdk-core -laws-c-event-stream -laws-checksums -laws-c-common -lUserenv -lversion -lws2_32 -lBcrypt -lWininet -lwinhttp -lcurl -lnormaliz -lssh2 -lgdi32 -lssl -lcrypto -lcrypt32 -lwldap32 -lz -lws2_32 -lnghttp2 -Lc:/rtools42/x86_64-w64-mingw32.static.posix/lib/x64 -Lc:/rtools42/x86_64-w64-mingw32.static.posix/lib -LC:/R/bin/x64 -lR
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x139e): undefined reference to `__imp__ZNK5arrow5acero8ExecPlan8ToStringB5cxx11Ev'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1637): undefined reference to `__imp__ZN5arrow5acero8ExecPlan14StartProducingEv'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1649): undefined reference to `__imp__ZN5arrow5acero8ExecPlan8finishedEv'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1c3e): undefined reference to `__imp__ZN5arrow5acero8ExecPlan4MakeEPNS_7compute11ExecContextESt10shared_ptrIKNS_16KeyValueMetadataEE'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1de0): undefined reference to `__imp__ZN5arrow5acero29default_exec_factory_registryEv'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x2127): undefined reference to `__imp__ZTVN5arrow5acero34RecordBatchReaderSourceNodeOptionsE'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x2281): undefined reference to `__imp__ZTVN5arrow5acero22TableSourceNodeOptionsE'
...

Could you check this and open an issue for this if needed?

Terribly sorry about that. Will do.

@assignUser
Copy link
Member

For my knowledge, what is the difference between crossbow jobs and the github CI?

Crossbow jobs are e.g. the nightly jobs which are run in a different repo, they have to be triggered manually via comment (as seen above). Which should be triggered by the reviewers on such a large change. (And imo any change that touches the build system in a significant way). This is a bit poorly documented tbh so don't worry ^^.

@icexelloss
Copy link
Contributor

For my knowledge, what is the difference between crossbow jobs and the github CI?

Crossbow jobs are e.g. the nightly jobs which are run in a different repo, they have to be triggered manually via comment (as seen above). Which should be triggered by the reviewers on such a large change. (And imo any change that touches the build system in a significant way). This is a bit poorly documented tbh so don't worry ^^.

Got it - Will do next time. Thanks!

@icexelloss
Copy link
Contributor

@icexelloss It seems that this breaks the "AMD64 Windows R 4.2 RTools 42" job:
https://github.com/apache/arrow/actions/runs/4582748548/jobs/8093221704#step:12:90

g++ -shared -s -static-libgcc -o arrow.dll tmp.def RTasks.o altrep.o array.o array_to_vector.o arraydata.o arrowExports.o bridge.o buffer.o chunkedarray.o compression.o compute-exec.o compute.o config.o csv.o dataset.o datatype.o expression.o extension-impl.o feather.o field.o filesystem.o io.o json.o memorypool.o message.o parquet.o r_to_arrow.o recordbatch.o recordbatchreader.o recordbatchwriter.o safe-call-into-r-impl.o scalar.o schema.o symbols.o table.o threadpool.o type_infer.o -L../windows/arrow-11.0.0.9000/lib-10.4.0/x64 -L../windows/arrow-11.0.0.9000/lib/x64-ucrt -larrow_dataset -lparquet -larrow -larrow_bundled_dependencies -lutf8proc -lthrift -lsnappy -lz -lzstd -llz4 -lbz2 -lbrotlienc -lbrotlidec -lbrotlicommon -lole32 -lbcrypt -lpsapi -lcrypto -lcrypt32 -lre2 -laws-cpp-sdk-config -laws-cpp-sdk-transfer -laws-cpp-sdk-identity-management -laws-cpp-sdk-cognito-identity -laws-cpp-sdk-sts -laws-cpp-sdk-s3 -laws-cpp-sdk-core -laws-c-event-stream -laws-checksums -laws-c-common -lUserenv -lversion -lws2_32 -lBcrypt -lWininet -lwinhttp -lcurl -lnormaliz -lssh2 -lgdi32 -lssl -lcrypto -lcrypt32 -lwldap32 -lz -lws2_32 -lnghttp2 -Lc:/rtools42/x86_64-w64-mingw32.static.posix/lib/x64 -Lc:/rtools42/x86_64-w64-mingw32.static.posix/lib -LC:/R/bin/x64 -lR
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x139e): undefined reference to `__imp__ZNK5arrow5acero8ExecPlan8ToStringB5cxx11Ev'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1637): undefined reference to `__imp__ZN5arrow5acero8ExecPlan14StartProducingEv'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1649): undefined reference to `__imp__ZN5arrow5acero8ExecPlan8finishedEv'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1c3e): undefined reference to `__imp__ZN5arrow5acero8ExecPlan4MakeEPNS_7compute11ExecContextESt10shared_ptrIKNS_16KeyValueMetadataEE'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x1de0): undefined reference to `__imp__ZN5arrow5acero29default_exec_factory_registryEv'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x2127): undefined reference to `__imp__ZTVN5arrow5acero34RecordBatchReaderSourceNodeOptionsE'
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: compute-exec.o:compute-exec.c:(.text+0x2281): undefined reference to `__imp__ZTVN5arrow5acero22TableSourceNodeOptionsE'
...

Could you check this and open an issue for this if needed?

Terribly sorry about that. Will do.

Created #34843

@thisisnic
Copy link
Member

It's not just the R Windows builds; test-r-minimal-build and test-r-offline-minimal are also failing now.

assignUser pushed a commit that referenced this pull request Apr 5, 2023
### Rationale for this change
There was an issue with window R build introduced by this PR:
#34711

### What changes are included in this PR?
Many changes to the build files

### Are these changes tested?
crossbow builds

### Are there any user-facing changes?

* Closes: #34843

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Li Jin <ice.xelloss@gmail.com>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
kou added a commit that referenced this pull request Apr 6, 2023
### Rationale for this change

We need to update Linux packages for #34711.

### What changes are included in this PR?

Add Acero related packages.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* Closes: #34914

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
### Rationale for this change

We need to update Linux packages for apache#34711.

### What changes are included in this PR?

Add Acero related packages.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* Closes: apache#34914

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…rything previously in compute/exec (apache#34711)

### Rationale for this change

See the linked issue

### What changes are included in this PR?

C++:
* remove all compute/exec/* from libarrow
* rename compute/exec -> acero and make libarrow_acero
* add new ARROW_ACERO option, required if ARROW_DATASET is on
* libarrow_dataset now depends on libarrow_acero

c_glib: add the new libarrow_acero dependency - we disallow building glib without it

python: added PYARROW_BUILD_ACERO, set to on if DATASETS are built

### Are these changes tested?

All the standard tests do work properly.

I manually compiled C++ with:
* no ARROW_ACERO
* ARROW_ACERO and no ARROW_DATASET
* ARROW_ACERO and ARROW_DATASET and no ARROW_SUBSTRAIT 

I manually compiled python without ACERO & DATASET and with ACERO and without DATASET

### Are there any user-facing changes?

If users include compute/exec files directly then they'll have to update their code.

* Closes: apache#15280

Lead-authored-by: Davide Pasetto <dpasetto69@gmail.com>
Co-authored-by: Li Jin <ice.xelloss@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Li Jin <ice.xelloss@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…he#34844)

### Rationale for this change
There was an issue with window R build introduced by this PR:
apache#34711

### What changes are included in this PR?
Many changes to the build files

### Are these changes tested?
crossbow builds

### Are there any user-facing changes?

* Closes: apache#34843

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Li Jin <ice.xelloss@gmail.com>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
### Rationale for this change

We need to update Linux packages for apache#34711.

### What changes are included in this PR?

Add Acero related packages.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* Closes: apache#34914

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Move Acero out of libarrow

9 participants