Skip to content

Conversation

@thisisnic
Copy link
Member

No description provided.

@github-actions
Copy link

@thisisnic thisisnic changed the title ARROW-11460: [R] Use system libraries if present on Linux [WIP] ARROW-11460: [R] Use system libraries if present on Linux Jul 13, 2021
@thisisnic
Copy link
Member Author

@github-actions crossbow submit test-r-depsource

@github-actions
Copy link

Revision: 58dc9db0deb842189306982ee87629d00faf0d86

Submitted crossbow builds: ursacomputing/crossbow @ actions-595

Task Status
test-r-depsource Github Actions

@thisisnic
Copy link
Member Author

@github-actions crossbow submit test-r-depsource

@github-actions
Copy link

Revision: e3ffb69de9799801d5c7434405ca907867c2795e

Submitted crossbow builds: ursacomputing/crossbow @ actions-596

Task Status
test-r-depsource Github Actions

@thisisnic
Copy link
Member Author

@github-actions crossbow submit test-r-depsource

@github-actions
Copy link

Revision: 55e28b6a55c417124d7a3b48c571eb473fa63136

Submitted crossbow builds: ursacomputing/crossbow @ actions-597

Task Status
test-r-depsource Github Actions

@thisisnic
Copy link
Member Author

@github-actions crossbow submit test-r-depsource

1 similar comment
@thisisnic
Copy link
Member Author

@github-actions crossbow submit test-r-depsource

@github-actions
Copy link

Revision: 309842e1e8e21c0cf8b226bf8b4ba93b4fea5667

Submitted crossbow builds: ursacomputing/crossbow @ actions-598

Task Status
test-r-depsource Github Actions

@thisisnic
Copy link
Member Author

@github-actions crossbow submit test-r-depsource

@github-actions
Copy link

Revision: aae65707cd2beed2ab38147b82c43a7f08af4ee6

Submitted crossbow builds: ursacomputing/crossbow @ actions-599

Task Status
test-r-depsource Github Actions

@jonkeane
Copy link
Member

@github-actions crossbow submit test-r-depsource

@github-actions
Copy link

Revision: 525abeafa20107eb521a34ee7247d3e8284c8b9c

Submitted crossbow builds: ursacomputing/crossbow @ actions-643

Task Status
test-r-depsource Github Actions

@thisisnic
Copy link
Member Author

@github-actions crossbow submit test-r-depsource

@thisisnic thisisnic marked this pull request as ready for review July 27, 2021 10:26
@github-actions
Copy link

Revision: 16b5bbc0b936621d1cc3f2cb75f81ca7c05c129a

Submitted crossbow builds: ursacomputing/crossbow @ actions-662

Task Status
test-r-depsource Github Actions

@thisisnic
Copy link
Member Author

@github-actions crossbow submit test-r-depsource

@github-actions
Copy link

Revision: 0846bd785632cfc3c78d3f8a09dde4d40d50e1d9

Submitted crossbow builds: ursacomputing/crossbow @ actions-663

Task Status
test-r-depsource Github Actions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to try and concoct a CI configuration that hits this else, but I'm curious if you've tried to do it locally and check that you get the expected behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not but happy to do so to test it. What do you think is the most straightforward approach?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just spin up a docker image?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this I would ARROW_R_DEV=TRUE R_STATIC_DEPENDENCY_SOURCE=AUTO docker-compose build r and then interactively enter the docker container that is built there, uninstall pkg-config and then run /arrow/ci/scripts/r_test.sh or try installing manually

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, didn't see your second comment before I sent mine, yeah, exactly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want ARROW_R_DEV=TRUE docker run -it apache/arrow-dev:r-rhub-ubuntu-gcc-release-latest -e ARROW_DEPENDENCY_SOURCE=AUTO /bin/bash (get the right variable name, and pass it into the runtime environment with -e)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh you also need -v $(pwd):/arrow or something like that, that's what mounts the /arrow directory based on your local dir

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Neal!

Copy link
Member Author

@thisisnic thisisnic Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify then, the reason why ARROW_DEPENDENCY_SOURCE needs to be passed in via -e but ARROW_R_DEV doesn't is:

  • 'r' (defined in docker-compose.yml) is a service
  • specifying environment variables ahead of calling docker run allows you to set env vars for running that service
  • with ARROW_R_DEV, it doesn't matter whether you specify it via -e or via prefixing the call to docker run as it gets passed through anyway via the environment field of the r service
  • with ARROW_DEPENDENCY_SOURCE, it must be set via -e as there's nothing defined in the service to explicitly pass it through

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now tested and working as expected when pkg-config isn't installed

.env Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this here. We have to put env vars here that affect what the Dockerfiles build, but this is a runtime env var, so we can just pass it with -e and don't need to declare it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add to that, here's one example of how that's done for a different CI job: https://github.com/apache/arrow/blob/master/.github/workflows/r-without-arrow.yml#L76

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for here, I would do ~the opposite of what I said yesterday, we want something like -e ARROW_DEPENDENCY_SOURCE=... in the docker call since that's the environment variable that we advertise and use throughout Arrow to determine that source.

Yesterday I mentioned calling it something like R_STATIC_DEP... in the .env file, which if we needed to pass it there would be good to mark that it only was getting passed to the R builds and wasn't a parameter that would help on a python build. But Neal's right that we should just put it in the docker command in the GHA workflow file and not worry about passing it around in docker-compose or the .env file. Sorry for my misdirection there yesterday!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right/complete, and I think if we look at what is required to be complete, we'll see that we don't actually want this logic in this script. Working backwards, the point of Kou's change is that the pkgconfig files that the arrow C++ build does now contain all of the -l libs that the build requires from the system (which shouldn't be any if we're doing a BUNDLED build, but with AUTO there will be libs in it). So we need to make sure the PKG_LIBS env var in r/configure gets them. And since this script, and even r/tools/nixlibs.R, can't return anything or set anything in the env that r/configure is running in, we need to call pkg-config on the arrow C++ install dir from r/configure. (This is the missing par.) But that will only work if pkg-config exists, so we need to check for it like you do here--but I think we need to check for it in r/configure so that that script know whether to set AUTO or BUNDLED.

r/configure Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want this change, this is for checking whether libarrow already exists on your system. (If you want to make a change here, edit the comments to clarify which case this is.)

Suggested change
if [ $? -eq 0 ] && [ "$ARROW_USE_PKG_CONFIG" != "false" ] && [ "$ARROW_DEPENDENCY_SOURCE" == "AUTO" ]; then
if [ $? -eq 0 ] && [ "$ARROW_USE_PKG_CONFIG" != "false" ]; then

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, you may want to pull Kou's suggestion from #10404 (comment) in here (though I'm not certain about the --static part in this block, may have to experiment locally)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these comments are still valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree! I'm still working on this - converted it to a draft for the moment whilst I go read up on things as I realised I was just changing code as per all the helpful suggestions, but not understanding why.

r/configure Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want this here. This is only relevant if you've run nixlibs.R to build libarrow. Put this somewhere around L176.

Copy link
Member

@nealrichardson nealrichardson Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, before L160 (the call to nixlibs.R), add something like:

if [ "${ARROW_DEPENDENCY_SOURCE}" = "" ]; then
  # TODO: BUNDLED is still default for now, but we plan to change it to AUTO
  ARROW_DEPENDENCY_SOURCE=BUNDLED; export ARROW_DEPENDENCY_SOURCE
fi
if [ "${ARROW_DEPENDENCY_SOURCE}" = "AUTO" ]; then
  which pkg-config >/dev/null 2>&1
  if [ $? -eq 0 ]; then
    export ARROW_DEPENDENCY_SOURCE=BUNDLED
    # echo something?
  fi
fi

Finally, you also need to edit build_arrow_static.sh to take ARROW_DEPENDENCY_SOURCE from the environment (default arg there BUNDLED)

r/configure Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want something like this, though we may need to adapt PKG_CONFIG_NAME based on whether dataset and parquet are built (i.e. listed in BUNDLED_LIBS, defined right above where this goes)

Suggested change
PKG_LIBS = "pkgconfig ${LIB_DIR} $PKG_LIBS"
PKG_LIBS="$PKG_LIBS `PKG_CONFIG_PATH=${LIB_DIR} pkg-config --libs-only-l --libs-only-other --static --silence-errors ${PKG_CONFIG_NAME}`"

r/configure Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you still need #10710 (comment) here, and also adding a comment explaining what's up would help our future selves.

@thisisnic thisisnic marked this pull request as draft July 30, 2021 15:02
@jonkeane
Copy link
Member

@github-actions crossbow submit conda-linux-gcc-py36-cpu-r40 conda-linux-gcc-py37-cpu-r41 test-r-depsource-auto test-r-depsource-system test-r-ubuntu-21.04 test-r-gcc-11

@github-actions
Copy link

Revision: 2cf3a26

Submitted crossbow builds: ursacomputing/crossbow @ actions-781

Task Status
conda-linux-gcc-py36-cpu-r40 Azure
conda-linux-gcc-py37-cpu-r41 Azure
test-r-depsource-auto Azure
test-r-depsource-system Github Actions
test-r-gcc-11 Github Actions
test-r-ubuntu-21.04 Github Actions

@jonkeane
Copy link
Member

@github-actions crossbow -g r

@github-actions
Copy link

No such option: -g

@jonkeane
Copy link
Member

@github-actions crossbow submit -g r

@github-actions
Copy link

Revision: 2cf3a26

Submitted crossbow builds: ursacomputing/crossbow @ actions-782

Task Status
conda-linux-gcc-py36-cpu-r40 Azure
conda-linux-gcc-py37-cpu-r41 Azure
conda-osx-clang-py36-r40 Azure
conda-osx-clang-py37-r41 Azure
conda-win-vs2017-py36-r40 Azure
conda-win-vs2017-py37-r41 Azure
homebrew-r-autobrew Github Actions
test-r-depsource-auto Azure
test-r-depsource-system Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-3.6-centos8 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-r-ubuntu-21.04 Github Actions
test-r-version-compatibility Github Actions
test-r-versions Github Actions
test-r-without-arrow Azure
test-ubuntu-18.04-r-sanitizer Azure

@jonkeane
Copy link
Member

AFAICT, conda-osx-clang-py36-r40, conda-osx-clang-py37-r41, homebrew-r-autobrew, test-r-rhub-debian-gcc-devel-lto-latest, and test-ubuntu-18.04-r-sanitizer are all failing on crossbow already independent of this change.

@nealrichardson
Copy link
Member

Ok, I checked the test-r-depsource-auto build output. It does seem to be picking up system libz and libbz2, though that's it. We do have a few tests that are skip_if_not_available("gzip"), and those tests were not skipped, so that confirms that we've successfully used the system libz.

So, +1 from me again on this. Will merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants