-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components #9610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks like the right direction. I'm not certain that the configure script changes are exactly what we want but I'll need to think some more about it.
|
Oh, one other thing: in |
r/tools/autobrew
Outdated
| @@ -48,7 +48,8 @@ fi | |||
| # Hardcode this for my custom autobrew build | |||
| rm -f $BREWDIR/lib/*.dylib | |||
| AWS_LIBS="-laws-cpp-sdk-config -laws-cpp-sdk-transfer -laws-cpp-sdk-identity-management -laws-cpp-sdk-cognito-identity -laws-cpp-sdk-sts -laws-cpp-sdk-s3 -laws-cpp-sdk-core -laws-c-event-stream -laws-checksums -laws-c-common -lpthread -lcurl" | |||
| PKG_LIBS="-L$BREWDIR/lib -lparquet -larrow_dataset -larrow -larrow_bundled_dependencies -lthrift -llz4 -lsnappy -lzstd $AWS_LIBS" | |||
| PKG_LIBS="-lparquet -larrow_dataset -larrow -larrow_bundled_dependencies -lthrift -llz4 -lsnappy -lzstd $AWS_LIBS" | |||
| PKG_DIRS="-L$BREWDIR/lib" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to remember to upstream this change when we release to CRAN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the relationship of this script (r/tools/autobrew) to https://github.com/autobrew/scripts/blob/master/apache-arrow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's based on an earlier version of Jeroen's autobrew, and it's closest now to how he actually builds the C++ dependencies. It looks like the new autobrew script is based on some artifacts that are bundled up in a separate process (see https://github.com/autobrew/bundler/blob/master/apache-arrow.sh).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created ARROW-11861 to remind us of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out we also needed to add -DARROW_R_WITH_PARQUET and -DARROW_R_WITH_DATASET to the cflags in https://github.com/autobrew/scripts/blob/master/apache-arrow (ARROW-12604)
Done in 8c5b235dddac47833b229323b4a216049eae5fea |
|
@github-actions crossbow submit test-r-minimal-build |
|
@nealrichardson do you want |
Correct, we don't want to change that meaning (yet). |
|
Revision: 6f70c9911262d5d956c7f4ded49647269af848d9 Submitted crossbow builds: ursacomputing/crossbow @ actions-175
|
dev/tasks/r/azure.linux.yml
Outdated
| @@ -47,6 +47,9 @@ jobs: | |||
| export R_ORG={{ r_org }} | |||
| export R_IMAGE={{ r_image }} | |||
| export R_TAG={{ r_tag }} | |||
| export ARROW_DATASET={{ arrow_dataset|default("ON") }} | |||
| export ARROW_PARQUET={{ arrow_parquet|default("ON") }} | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this didn't pass through all the way: https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=1721&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=6c939d89-0d1a-51f2-8b30-091a7a82e98c&l=240
Setting them here affects the environment in which docker-compose is called, but it doesn't forward into the docker environment.
I believe what you instead want is to pass them into docker-compose below, something like docker-compose -e ARROW_DATASET={{ arrow_dataset|default("ON") }} -e ARROW_PARQUET={{ arrow_parquet|default("ON") }} -e ARROW_S3={{ arrow_s3|default("ON") }} r
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks. Let's see if it works to add them to .env (066ce065124cf7dcaa08225cae83177594da3965) and if not then I'll do it with -e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, need to append not overwrite. Fixed in 318371fbaa07f1b6fcdf71f58d7d5460a931da13
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only danger to appending to the .env file is that if you ran this locally, wouldn't it modify the file locally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if folks might be running these commands locally then I can see how that would cause damage.
Looks like it didn't work anyhow. I will investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, I think it's the same problem--note (for comparison) that R_ORG et al. are in the .env file, and they're environment variables for docker-compose, not passed into the docker container.
The fact that ARROW_R_DEV is in the env file, and then passed through in docker-compose.yml to the build environments, may just be due to my ignorance of all of this at the time I added it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks. I tried a few different things just for kicks, then used -e per your suggestion in 9479c6ce079c7230b0e290dcb5658436d1d62fde
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The build looks good after 9479c6c:
https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-184-azure-test-r-minimal-build
|
@github-actions crossbow submit test-r-minimal-build |
|
Revision: 318371fbaa07f1b6fcdf71f58d7d5460a931da13 Submitted crossbow builds: ursacomputing/crossbow @ actions-177
|
|
@github-actions crossbow submit test-r-minimal-build |
|
Revision: 27c514cb1b14e7125e8ba50714c6480c93dc231a Submitted crossbow builds: ursacomputing/crossbow @ actions-181
|
|
@github-actions crossbow submit test-r-minimal-build |
|
Revision: a810b3fce373c7f5090e367e5a6afc70db60414a Submitted crossbow builds: ursacomputing/crossbow @ actions-182
|
|
@github-actions crossbow submit test-r-minimal-build |
|
Revision: 9479c6ce079c7230b0e290dcb5658436d1d62fde Submitted crossbow builds: ursacomputing/crossbow @ actions-184
|
|
@github-actions crossbow submit -g r |
|
@github-actions crossbow submit test-r-without-dataset-parquet-s3 |
|
Revision: 6eda5896c638ca1c7af577c02ec56bd802a2fe42 Submitted crossbow builds: ursacomputing/crossbow @ actions-185 |
|
Revision: 6eda5896c638ca1c7af577c02ec56bd802a2fe42 Submitted crossbow builds: ursacomputing/crossbow @ actions-186
|
|
The environment variables are being set in the container now, and Arrow is building with Dataset and Parquet off, but S3 is remaining on despite the environment variable. |
Done |
|
@github-actions crossbow submit test-r-minimal-build |
|
Revision: b5ed1c0 Submitted crossbow builds: ursacomputing/crossbow @ actions-197
|
|
@github-actions crossbow submit test-r-minimal-build |
|
Revision: 96e9104 Submitted crossbow builds: ursacomputing/crossbow @ actions-198
|
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks for doing this.
I'm going to try one more thing with the crossbow jobs, will revert if it's no good, and then merge.
|
@github-actions crossbow submit test-r-rstudio* test-r-minimal-build |
|
Revision: 1b1680f Submitted crossbow builds: ursacomputing/crossbow @ actions-199 |
Not implemented for Windows yet (that's ARROW-11884)