From dc3d4ffc98e09e87ccc9363444670ebe95cad20f Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Fri, 13 Aug 2021 13:19:28 +0100
Subject: [PATCH 01/31] Split content out some more
---
r/vignettes/developing.Rmd | 179 +++++++++++++++++++------------------
1 file changed, 90 insertions(+), 89 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index f5435c06797..884052efa18 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -9,13 +9,11 @@ vignette: >
```{r setup-options, include=FALSE}
knitr::opts_chunk$set(error = TRUE, eval = FALSE)
-
# Get environment variables describing what to evaluate
run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
-
# Update the source knit_hook to save the chunk (if it is marked to be saved)
knit_hooks_source <- knitr::knit_hooks$get("source")
knitr::knit_hooks$set(source = function(x, options) {
@@ -40,18 +38,26 @@ set -e
set -x
```
-If you're looking to contribute to `arrow`, this document can help you set up a development environment that will enable you to write code and run tests locally. It outlines how to build the various components that make up the Arrow project and R package, as well as some common troubleshooting and workflows developers use. Many contributions can be accomplished with the instructions in [R-only development](#r-only-development). But if you're working on both the C++ library and the R package, the [Developer environment setup](#-developer-environment-setup) section will guide you through setting up a developer environment.
+If you're looking to contribute to arrow, this vignette can help you set up a development environment that will enable you to write code and run tests locally. It outlines:
+* how to build the components that make up the Arrow project and R package
+* some common troubleshooting and workflows that developers use
+
+Many contributions can be accomplished with the instructions in [R-only development](#r-only-development), but if you're working on both the C++ library and the R package, the [Developer environment setup](#-developer-environment-setup) section will guide you through setting up a developer environment.
This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation); Linux users can find more details on building from source at `vignette("install", package = "arrow")`.
-This document is a work in progress and will grow + change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but certain custom configurations might conflict with these instructions and there are differences of opinion across developers about if and what the one true way to set up development environments like this is. We also solicit any feedback you have about things that are confusing or additions you would like to see here. Please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see anything that is confusing, odd, or just plain wrong.
+This document is a work in progress and will grow and change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but custom configurations might conflict with these instructions and there are differences of opinion across developers about how to set up development environments like this.
-## R-only development
+We welcome any feedback you have about things that are confusing or additions you would like to see here. Please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see anything that is confusing, odd, or just plain wrong.
+
+# R-only developer environment setup
Windows and macOS users who wish to contribute to the R package and
-don’t need to alter the Arrow C++ library may be able to obtain a
-recent version of the library without building from source. On macOS,
-you may install the C++ library using [Homebrew](https://brew.sh/):
+don't need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source.
+
+## macOS
+On macOS, you can install the C++ library using [Homebrew](https://brew.sh/):
``` shell
# For the released version:
@@ -60,11 +66,10 @@ brew install apache-arrow
brew install apache-arrow --HEAD
```
+## Windows and Linux
+
On Windows and Linux, you can download a .zip file with the arrow dependencies from the
nightly repository.
-Windows users then can set the `RWINLIB_LOCAL` environment variable to point to that
-zip file before installing the `arrow` R package. On Linux, you'll need to create a `libarrow` directory inside the R package directory and unzip that file into it. Version numbers in that
-repository correspond to dates, and you will likely want the most recent.
To see what nightlies are available, you can use Arrow's (or any other S3 client's) S3 listing functionality to see what is in the bucket `s3://arrow-r-nightly/libarrow/bin`:
@@ -72,58 +77,78 @@ To see what nightlies are available, you can use Arrow's (or any other S3 client
nightly <- s3_bucket("arrow-r-nightly")
nightly$ls("libarrow/bin")
```
+Version numbers in that repository correspond to dates.
-## Developer environment setup
+### Windows
-If you need to alter both the Arrow C++ library and the R package code, or if you can’t get a binary version of the latest C++ library elsewhere, you’ll need to build it from source too. This section discusses how to set up a C++ build configured to work with the R package. For more general resources, see the [Arrow C++ developer
-guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+Windows users then can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing the arrow dependencies before installing the arrow R package.
+
+### Linux
+
+On Linux, you'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the arrow dependencies into it.
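The Linux step above can be sketched in a shell as follows. This is a minimal sketch, not a definitive recipe: the archive name `arrow-nightly.zip` and the checkout path `arrow/r` are assumptions for illustration, so substitute the file you actually downloaded and the location of your own checkout.

```shell
# Hypothetical names: adjust both to the file you downloaded and your checkout.
nightly_zip="${NIGHTLY_ZIP:-arrow-nightly.zip}"
r_pkg_dir="${R_PKG_DIR:-arrow/r}"

# Create the libarrow directory inside the R package directory
mkdir -p "$r_pkg_dir/libarrow"

# Unpack the nightly archive into it, if the archive is present
if [ -f "$nightly_zip" ]; then
  unzip -o "$nightly_zip" -d "$r_pkg_dir/libarrow"
else
  echo "No archive found at $nightly_zip; nothing unpacked" >&2
fi
```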
-There are four major steps to the process — the first three are relevant to all Arrow developers, and the last one is specific to the R bindings:
+# R and C++ developer environment setup
-1. Configuring the Arrow library build (using `cmake`) — this specifies how you want the build to go, what features to include, etc.
-2. Building the Arrow library — this actually compiles the Arrow library
-3. Install the Arrow library — this organizes and moves the compiled Arrow library files into the location specified in the configuration
-4. Building the R package — this builds the C++ code in the R package, and installs the R package for you
+If you need to alter both the Arrow C++ library and the R package code, or if you can't get a binary version of the latest C++ library elsewhere, you'll need to build it from source. This section discusses how to set up a C++ build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).
-### Install dependencies {.tabset}
+There are four major steps to the process. The first three are relevant to all Arrow developers, and the last one is specific to developers making changes to the R package:
-The Arrow C++ library will by default use system dependencies if suitable versions are found; if they are not present, it will build them during its own build process. The only dependencies that one needs to install outside of the build process are `cmake` (for configuring the build) and `openssl` if you are building with S3 support.
+1. Install dependencies
+2. Configure the Arrow library build (using `cmake`): this specifies how you want the build to go, what features to include, etc.
+3. Build and install the Arrow library: this compiles the Arrow library, then organizes and moves the compiled files into the location specified in the configuration
+4. Build the R package: this builds the C++ code in the R package, and installs the R package for you
-For a faster build, you may choose to install on the system more C++ library dependencies (such as `lz4`, `zstd`, etc.) so that they don't need to be built from source in the Arrow build. This is optional.
+## Step 1 - Install dependencies
-#### macOS
+The Arrow C++ library will by default use system dependencies if suitable versions are found. If system dependencies are not present, the Arrow C++ library will build them during its own build process. The only dependencies that you need to install _outside_ of the build process are `cmake` (for configuring the build) and `openssl` if you are building with S3 support.
+
+For a faster build, you may choose to pre-install more C++ library dependencies (such as `lz4`, `zstd`, etc.) on the system so that they don't need to be built from source in the Arrow build.
+
+### macOS
```{bash, save=run & macos}
brew install cmake openssl
```
-#### Ubuntu
+### Ubuntu
```{bash, save=run & ubuntu}
sudo apt install -y cmake libcurl4-openssl-dev libssl-dev
```
-### Configure the Arrow build {.tabset}
+### Windows
+
+Currently, the R package cannot be made to work with a locally-built Arrow C++ library. This will be resolved in a future release.
+
+## Step 2 - Configure the Arrow build
+
+### Build location
-You can choose to build and then install the Arrow library into a user-defined directory or into a system-level directory. You only need to do one of these two options.
+There are two different ways that you can choose to build and then install the Arrow library:
-It is recommended that you install the arrow library to a user-level directory to be used in development. This is so that the development version you are using doesn't overwrite a released version of Arrow you may have installed. You are also able to have more than one version of the Arrow library to link to with this approach (by using different `ARROW_HOME` directories for the different versions). This approach also matches the recommendations for other Arrow bindings like [Python](http://arrow.apache.org/docs/developers/python.html).
+1. into a user-defined directory
+2. into a system-level directory
+
+You only need to do one of these options.
+
+It is recommended that you install the arrow library to a user-level directory to be used in development. This is so that the development version you are using doesn't overwrite a released version of Arrow you may have installed. You are also able to have more than one version of the Arrow library to link to with this approach (by using different `ARROW_HOME` directories for the different versions). This approach also matches the recommendations for other Arrow bindings like [Python](http://arrow.apache.org/docs/developers/python.html).
#### Configure for installing to a user directory
-In this example we will install it to a directory called `dist` that has the same parent as our `arrow` checkout, but it could be named or located anywhere you would like. However, note that your installation of the Arrow R package will point to this directory and need it to remain intact for the package to continue to work. This is one reason we recommend *not* placing it inside of the arrow git checkout.
+In this example we will install the Arrow C++ library to a directory called `dist` that has the same parent directory as our `arrow` checkout, but your installation of the Arrow R package can point to any directory with any name. However, we recommend *not* placing it inside of the arrow git checkout directory, as unwanted changes could stop it working properly.
```{bash, save=run & !sys_install}
export ARROW_HOME=$(pwd)/dist
mkdir $ARROW_HOME
```
-_Special instructions on Linux:_ You will need to set `LD_LIBRARY_PATH` to the `lib` directory that is under where we set `$ARROW_HOME`, before launching R and using Arrow. One way to do this is to add it to your profile (we use `~/.bash_profile` here, but you might need to put this in a different file depending on your setup, e.g. if you use a shell other than `bash`). On macOS we do not need to do this because the macOS shared library paths are hardcoded to their locations during build time.
+_Special instructions on Linux:_ You will need to set `LD_LIBRARY_PATH` to the `lib` directory that is under where you set `$ARROW_HOME`, before launching R and using Arrow. One way to do this is to add it to your profile (we use `~/.bash_profile` here, but you might need to put this in a different file depending on your setup, e.g. if you use a shell other than `bash`). On macOS you do not need to do this because the macOS shared library paths are hardcoded to their locations during build time.
```{bash, save=run & ubuntu & !sys_install}
export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH" >> ~/.bash_profile
```
-Now we can move into the arrow repository to start the build process. You will need to create a directory into which the C++ build will put its contents. It is recommended to make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). And then, change directories to be inside `cpp/build`:
+Now you can move into the arrow repository to start the build process. You will need to create a directory into which the C++ build will put its contents. It is recommended to make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). And then, change directories to be inside `cpp/build`:
```{bash, save=run & !sys_install}
pushd arrow
@@ -131,7 +156,7 @@ mkdir -p cpp/build
pushd cpp/build
```
-You’ll first call `cmake` to configure the build and then `make install`. For the R package, you’ll need to enable several features in the C++ library using `-D` flags:
+You'll first call `cmake` to configure the build and then `make install`. For the R package, you'll need to enable several features in the C++ library using `-D` flags:
```{bash, save=run & !sys_install}
cmake \
@@ -157,7 +182,7 @@ cmake \
If you would like to install Arrow as a system library you can do that as well. This is in some respects simpler, but if you already have Arrow libraries installed there, it would disrupt them and possibly require `sudo` permissions.
-Now we can move into the arrow repository to start the build process. You will need to create a directory into which the C++ build will put its contents. It is recommended to make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). And then, change directories to be inside `cpp/build`:
+Now you can move into the arrow repository to start the build process. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). And then, change directories to be inside `cpp/build`:
```{bash, save=run & sys_install}
pushd arrow
@@ -165,7 +190,7 @@ mkdir -p cpp/build
pushd cpp/build
```
-You’ll first call `cmake` to configure the build and then `make install`. For the R package, you’ll need to enable several features in the C++ library using `-D` flags:
+You'll first call `cmake` to configure the build and then `make install`. For the R package, you'll need to enable several features in the C++ library using `-D` flags:
```{bash, save=run & sys_install}
cmake \
@@ -185,7 +210,7 @@ cmake \
`..` refers to the C++ source directory: we're in `cpp/build`, and the source is in `cpp`.
-### More Arrow features
+## More Arrow features
To enable optional features including: S3 support, an alternative memory allocator, and additional compression libraries, add some or all of these flags (the trailing `\` makes them easier to paste into a bash shell on a new line):
@@ -202,11 +227,12 @@ To enable optional features including: S3 support, an alternative memory allocat
Other flags that may be useful:
* `-DBoost_SOURCE=BUNDLED` and `-DThrift_SOURCE=bundled`, for example, or any other dependency `*_SOURCE`, if you have a system version of a C++ dependency that doesn't work correctly with Arrow. This tells the build to compile its own version of the dependency from source.
+
* `-DCMAKE_BUILD_TYPE=debug` or `-DCMAKE_BUILD_TYPE=relwithdebinfo` can be useful for debugging. You probably don't want to do this generally because a debug build is much slower at runtime than the default `release` build.
-_Note_ `cmake` is particularly sensitive to whitespacing, if you see errors, check that you don't have any errant whitespace around
+_Note:_ `cmake` is particularly sensitive to whitespace. If you see errors, check that you don't have any errant whitespace.
-### Build Arrow
+## Step 3 - Building Arrow
You can add `-j#` between `make` and `install` here too to speed up compilation by running in parallel (where `#` is the number of cores you have available).
@@ -221,19 +247,16 @@ need to use `sudo`:
sudo make install
```
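As a sketch of the `-j#` option mentioned above, the number of parallel jobs can be taken from the machine's core count rather than typed by hand. The variable name `cores` is an assumption for illustration, and the `make` call is left as a comment because it only makes sense from inside `cpp/build` after the configure step.

```shell
# Use nproc where available (GNU coreutils), fall back to sysctl (macOS),
# and assume a single core if neither works.
cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
echo "Building with ${cores} parallel jobs"

# Run from cpp/build after the cmake configure step:
# make -j"${cores}" install
```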
+## Step 4 - Build the Arrow R package
-### Build the Arrow R package
-
-Once you’ve built the C++ library, you can install the R package and its
+Once you've built the C++ library, you can install the R package and its
dependencies, along with additional dev dependencies, from the git
checkout:
```{bash, save=run}
popd # To go back to the root directory of the project, from cpp/build
-
pushd r
R -e 'install.packages("remotes"); remotes::install_deps(dependencies = TRUE)'
-
R CMD INSTALL .
```
@@ -280,17 +303,17 @@ cmake \
..
```
-
+
### Documentation
-The documentation for the R package uses features of `roxygen2` that haven't yet been released on CRAN, such as conditional inclusion of examples via the `@examplesIf` tag. If you are making changes which require updating the documentation, please install the development version of `roxygen2` from GitHub.
+The documentation for the R package uses features of `roxygen2` that haven't yet been released on CRAN, such as conditional inclusion of examples via the `@examplesIf` tag. If you are making changes which require updating the documentation, please install the development version of `roxygen2` from GitHub.
```{r}
remotes::install_github("r-lib/roxygen2")
```
-## Troubleshooting
+# Troubleshooting
Note that after any change to the C++ library, you must reinstall it and
run `make clean` or `git clean -fdx .` to remove any cached object code
@@ -299,12 +322,12 @@ only necessary if you make changes to the C++ library source; you do not
need to manually purge object files if you are only editing R or C++
code inside `r/`.
-### Arrow library-R package mismatches
+## Arrow library-R package mismatches
If the Arrow library and the R package have diverged, you will see errors like:
```
-Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...):
+Error: package or namespace load failed for 'arrow' in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so, 6): Symbol not found: __ZN5arrow2io16RandomAccessFile9ReadAsyncERKNS0_9IOContextExx
Referenced from: /Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
@@ -322,7 +345,7 @@ To resolve this, try rebuilding the Arrow library from [Building Arrow above](#b
If rebuilding the Arrow library doesn't work, and you are [installing from a user-level directory](#installing-to-another-directory) and already have a previous installation of libarrow in a system directory, you may get errors like the following when you install the R package:
```
-Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...):
+Error: package or namespace load failed for 'arrow' in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: /usr/local/lib/libarrow.400.dylib
Referenced from: /usr/local/lib/libparquet.400.dylib
@@ -372,19 +395,19 @@ If the package fails to install/load with an error like this:
ensure that `-DARROW_INSTALL_NAME_RPATH=OFF` was passed (this is important on
macOS to prevent problems at link time and is a no-op on other platforms).
Alternatively, try setting the environment variable `R_LD_LIBRARY_PATH` to
-wherever Arrow C++ was put in `make install`, e.g. `export
+wherever Arrow C++ was put in `make install`, e.g. `export
R_LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package.
When installing from source, if the R and C++ library versions do not
-match, installation may fail. If you’ve previously installed the
-libraries and want to upgrade the R package, you’ll need to update the
+match, installation may fail. If you've previously installed the
+libraries and want to upgrade the R package, you'll need to update the
Arrow C++ library first.
For any other build/configuration challenges, see the [C++ developer
guide](https://arrow.apache.org/docs/developers/cpp/building.html).
-## Using `remotes::install_github(...)`
+# Using `remotes::install_github(...)`
If you need an Arrow installation from a specific repository or at a specific ref,
`remotes::install_github("apache/arrow/r", build = FALSE)`
@@ -408,7 +431,7 @@ separate from another Arrow development environment or system installation
* Setting the environment variable `FORCE_BUNDLED_BUILD` to `true` will skip the `pkg-config` search for Arrow libraries and attempt to build from the same source at the repository+ref given.
* You may also need to set the Makevars `CPPFLAGS` and `LDFLAGS` to `""` in order to prevent the installation process from attempting to link to already installed system versions of Arrow. One way to do this temporarily is wrapping your `remotes::install_github()` call like so: `withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github(...))`.
-## What happens when you `R CMD INSTALL`?
+# What happens when you `R CMD INSTALL`?
There are a number of scripts that are triggered when you run `R CMD INSTALL .`. For Arrow users, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host) so the installation process is easy. However, knowing about these scripts can help you troubleshoot if something goes wrong in them or elsewhere in the install:
@@ -418,34 +441,21 @@ There are a number of scripts that are triggered when `R CMD INSTALL .`. For Arr
* Check if a binary is available from our hosted unofficial builds.
* Download the Arrow source and build the Arrow Library from source.
* `*** Proceed without C++` dependencies (this is an error and the package will not work, but if you see this message you know the previous steps have not succeeded/were not enabled)
-* `inst/build_arrow_static.sh` this script builds Arrow for a bundled, static build. It is called by `tools/nixlibs.R` when the Arrow library is being built. (If you're looking at this script, and you've gotten this far, it should look _incredibly_ familiar: it's basically the contents of this guide in script form — with a few important changes)
-
-## Styling and linting of the R code in the R package
-
-The R code in the package follows [the tidyverse style](https://style.tidyverse.org/). On PR submission (and on pushes) our CI will run linting and will flag possible errors on the pull request with annotations.
-
-To run the [lintr](https://github.com/jimhester/lintr) locally, install the lintr package (note, we currently use a fork that includes fixes not yet accepted upstream, see how lintr is being installed in the file `ci/docker/linux-apt-lint.dockerfile` for the current status) and then run `lintr::lint_package("arrow/r")`.
+* `inst/build_arrow_static.sh` this script builds Arrow for a bundled, static build. It is called by `tools/nixlibs.R` when the Arrow library is being built. (If you're looking at this script, and you've gotten this far, it might look incredibly familiar: it's basically the contents of this guide in script form — with a few important changes)
-One can automatically change the formatting of the code in the package using the [styler](https://styler.r-lib.org/) package. There are two ways to do this:
+# Editing C++ code in the R package
-1. Use the comment bot to do this automatically with the command `@github-actions autotune` on a PR and commit it back to the branch.
-2. Locally, with the command `make style` (for only the files changed), `make style-all` (for all files), or use `styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))` note the two excluded files which should not be styled.
-
-The styler package will fix many styling errors, thought not all lintr errors are automatically fixable with styler. The list of files we habitually do not style is in `r/.styler_excludes.R`.
-
-## Editing C++ code in the R package
-
-The `arrow` package uses some customized tools on top of `cpp11` to prepare its
-C++ code in `src/`. This is because we have some features that are only enabled
+The arrow package uses some customized tools on top of `cpp11` to prepare its
+C++ code in `src/`. This is because there are some features that are only enabled
and built conditionally during build time. If you change C++ code in the R
package, you will need to set the `ARROW_R_DEV` environment variable to `true`
(optionally, add it to your `~/.Renviron` file to persist across sessions) so
-that the `data-raw/codegen.R` file is used for code generation. The `Makefile`
+that the `data-raw/codegen.R` file is used for code generation. The `Makefile`
commands also handle this automatically.
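One way to set `ARROW_R_DEV` from a shell, and optionally persist it, is sketched below. It assumes your user-level R environment file lives at `~/.Renviron`, as mentioned above; the `renviron` variable name is only for illustration.

```shell
# Enable code generation via data-raw/codegen.R for this shell session
export ARROW_R_DEV=true

# Optionally persist it for future R sessions (appends only if not already present)
renviron="$HOME/.Renviron"
grep -qx 'ARROW_R_DEV=true' "$renviron" 2>/dev/null || echo 'ARROW_R_DEV=true' >> "$renviron"
```

The `grep` guard keeps the line from being appended again each time the snippet is run.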
We use Google C++ style in our C++ code. The easiest way to accomplish this is
-use an editors/IDE that formats your code for you. Many popular editors/IDEs
-have support for running `clang-format` on C++ files when you save them.
+to use an editor/IDE that formats your code for you. Many popular editors/IDEs
+have support for running `clang-format` on C++ files when you save them.
Installing/enabling the appropriate plugin may save you much frustration.
Check for style errors with
@@ -461,25 +471,21 @@ Fix any style issues before committing with
```
The lint script requires Python 3 and `clang-format-8`. If the command
-isn’t found, you can explicitly provide the path to it like
+isn't found, you can explicitly provide the path to it like
`CLANG_FORMAT=$(which clang-format-8) ./lint.sh`. On macOS, you can get
this by installing LLVM via Homebrew and running the script as
`CLANG_FORMAT=$(brew --prefix llvm@8)/bin/clang-format ./lint.sh`
-_Note_ that the lint script requires Python 3 and the Python dependencies
+_Note_ that the lint script requires Python 3 and the Python dependencies
(note that `cmake_format` is pinned to a specific version):
-
* autopep8
* flake8
* cmake_format==0.5.2
-
-## Running tests
-
+# Running tests
Some tests are conditionally enabled based on the availability of certain
features in the package build (S3 support, compression libraries, etc.).
Others are generally skipped by default but can be enabled with environment
variables or other settings:
-
* All tests are skipped on Linux if the package builds without the C++ libarrow.
To make the build fail if libarrow is not available (as in, to test that
the C++ build was successful), set `TEST_R_WITH_ARROW=true`
@@ -494,7 +500,7 @@ variables or other settings:
settings, you can set `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, and
`MINIO_PORT` to override the defaults.
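The environment variables named in the list above can be set in the shell before running the tests. This is a minimal sketch: the MinIO values shown are hypothetical placeholders chosen only to illustrate overriding the defaults, not values the test suite expects.

```shell
# Fail instead of skipping when the package was built without libarrow
export TEST_R_WITH_ARROW=true

# Hypothetical MinIO settings, shown only to illustrate overriding the defaults
export MINIO_ACCESS_KEY="example-access-key"
export MINIO_SECRET_KEY="example-secret-key"
export MINIO_PORT="9001"

# Then, from the arrow/r directory:
# make test
```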
-## Github workflows
+# GitHub workflows
On a pull request, there are some actions you can trigger by commenting on the PR. We have additional CI checks that run nightly and can be requested on demand using an internal tool called [crossbow](https://arrow.apache.org/docs/developers/crossbow.html). A few important GitHub comment commands include:
@@ -503,40 +509,35 @@ On a pull request, there are some actions you can trigger by commenting on the P
* `@github-actions autotune` will fix C++ linting errors, rebuild the R documentation (among other cleanup tasks), and commit the results to the branch
-## Useful functions for Arrow developers
+# Useful functions for Arrow developers
Within an R session, these can help with package development:
``` r
# Load the dev package
devtools::load_all()
-
# Run the test suite, optionally filtering file names
devtools::test(filter="^regexp$")
# or the Makefile alternative from the arrow/r directory in a shell:
make test file=regexp
-
# Update roxygen documentation
devtools::document()
-
# To preview the documentation website
pkgdown::build_site()
-
# All package checks; see also below
devtools::check()
-
# See test coverage statistics
covr::report()
covr::package_coverage()
```
Any of those can be run from the command line by wrapping them in `R -e
-'$COMMAND'`. There’s also a `Makefile` to help with some common tasks
+'$COMMAND'`. There's also a `Makefile` to help with some common tasks
from the command line (`make test`, `make doc`, `make clean`, etc.)
-### Full package validation
+## Full package validation
``` shell
R CMD build .
R CMD check arrow_*.tar.gz --as-cran
-```
+```
\ No newline at end of file
From 1cbd7b1dacb8ef55dcabf3d80e42e960d79d08be Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Fri, 13 Aug 2021 13:28:06 +0100
Subject: [PATCH 02/31] Add styling/linting section back in
---
r/vignettes/developing.Rmd | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 884052efa18..e658868ac51 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -407,7 +407,7 @@ For any other build/configuration challenges, see the [C++ developer
guide](https://arrow.apache.org/docs/developers/cpp/building.html).
-# Using `remotes::install_github(...)`
+## Using `remotes::install_github(...)`
If you need an Arrow installation from a specific repository or at a specific ref,
`remotes::install_github("apache/arrow/r", build = FALSE)`
@@ -431,7 +431,7 @@ separate from another Arrow development environment or system installation
* Setting the environment variable `FORCE_BUNDLED_BUILD` to `true` will skip the `pkg-config` search for Arrow libraries and attempt to build from the same source at the repository+ref given.
* You may also need to set the Makevars `CPPFLAGS` and `LDFLAGS` to `""` in order to prevent the installation process from attempting to link to already installed system versions of Arrow. One way to do this temporarily is wrapping your `remotes::install_github()` call like so: `withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github(...))`.
-# What happens when you `R CMD INSTALL`?
+## What happens when you `R CMD INSTALL`?
There are a number of scripts that are triggered when you run `R CMD INSTALL .`. For Arrow users, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host) so the installation process is easy. However, knowing about these scripts can help you troubleshoot if something goes wrong in them or elsewhere in the install:
@@ -443,6 +443,14 @@ There are a number of scripts that are triggered when `R CMD INSTALL .`. For Arr
* `*** Proceed without C++` dependencies (this is an error and the package will not work, but if you see this message you know the previous steps have not succeeded/were not enabled)
* `inst/build_arrow_static.sh`: this script builds Arrow for a bundled, static build. It is called by `tools/nixlibs.R` when the Arrow library is being built. (If you're looking at this script, and you've gotten this far, it might look incredibly familiar: it's basically the contents of this guide in script form, with a few important changes.)
+## Styling and linting of the R code in the R package
+
+The R code in the package follows [the tidyverse style](https://style.tidyverse.org/). On PR submission (and on pushes) our CI will run linting and will flag possible errors on the pull request with annotations.
+
+To run the [lintr](https://github.com/jimhester/lintr) locally, install the lintr package (note, we currently use a fork that includes fixes not yet accepted upstream, see how lintr is being installed in the file `ci/docker/linux-apt-lint.dockerfile` for the current status) and then run `lintr::lint_package("arrow/r")`.
+
+
+
# Editing C++ code in the R package
The arrow package uses some customized tools on top of `cpp11` to prepare its
@@ -481,6 +489,7 @@ _Note_ that the lint script requires Python 3 and the Python dependencies
* autopep8
* flake8
* cmake_format==0.5.2
+
# Running tests
Some tests are conditionally enabled based on the availability of certain
features in the package build (S3 support, compression libraries, etc.).
From 3b14d84463993c82f3ac4484667f539065cb2c3d Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Fri, 13 Aug 2021 13:31:21 +0100
Subject: [PATCH 03/31] Add back in other missing bit
---
r/vignettes/developing.Rmd | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index e658868ac51..df4ba30e9b0 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -305,7 +305,7 @@ cmake \
-### Documentation
+## Documentation
The documentation for the R package uses features of `roxygen2` that haven't yet been released on CRAN, such as conditional inclusion of examples via the `@examplesIf` tag. If you are making changes which require updating the documentation, please install the development version of `roxygen2` from GitHub.
@@ -449,7 +449,13 @@ The R code in the package follows [the tidyverse style](https://style.tidyverse.
To run the [lintr](https://github.com/jimhester/lintr) locally, install the lintr package (note, we currently use a fork that includes fixes not yet accepted upstream, see how lintr is being installed in the file `ci/docker/linux-apt-lint.dockerfile` for the current status) and then run `lintr::lint_package("arrow/r")`.
+You can automatically change the formatting of the code in the package using the [styler](https://styler.r-lib.org/) package. There are two ways to do this:
+1. Use the comment bot to do this automatically with the command `@github-actions autotune` on a PR and commit it back to the branch.
+
+2. Locally, with the command `make style` (for only the files changed), `make style-all` (for all files), or use `styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))` note the two excluded files which should not be styled.
+
+The styler package will fix many styling errors, though not all lintr errors are automatically fixable with styler. The list of files we habitually do not style is in `r/.styler_excludes.R`.
# Editing C++ code in the R package
From 68e912f64e22976972c327a82ec39f567803290a Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Fri, 13 Aug 2021 13:39:02 +0100
Subject: [PATCH 04/31] Small tweaks
---
r/vignettes/developing.Rmd | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index df4ba30e9b0..0c66c8154d5 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -451,11 +451,11 @@ To run the [lintr](https://github.com/jimhester/lintr) locally, install the lint
You can automatically change the formatting of the code in the package using the [styler](https://styler.r-lib.org/) package. There are two ways to do this:
-1. Use the comment bot to do this automatically with the command `@github-actions autotune` on a PR and commit it back to the branch.
+1. Use the comment bot to do this automatically with the command `@github-actions autotune` on a PR, and commit it back to the branch.
-2. Locally, with the command `make style` (for only the files changed), `make style-all` (for all files), or use `styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))` note the two excluded files which should not be styled.
+2. Run the styler locally with the command `make style` (for only the files changed), `make style-all` (for all files), or use `styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))` - note the two excluded files which should not be styled.
-The styler package will fix many styling errors, thought not all lintr errors are automatically fixable with styler. The list of files we habitually do not style is in `r/.styler_excludes.R`.
+The styler package will fix many styling errors, though not all lintr errors are automatically fixable with styler. The list of files we intentionally do not style is in `r/.styler_excludes.R`.
# Editing C++ code in the R package
From 606c527ac95feb71260b2188cb425413b79137f0 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Mon, 16 Aug 2021 09:52:38 +0100
Subject: [PATCH 05/31] Add extra headings to make it easier to navigate the
docs, reorder some content
---
r/vignettes/developing.Rmd | 312 ++++++++++++++++++++-----------------
1 file changed, 170 insertions(+), 142 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 0c66c8154d5..f50d52d48a6 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -1,10 +1,14 @@
---
title: "Arrow R Developer Guide"
-output: rmarkdown::html_vignette
+output:
+ html_document:
+ toc: true
+ toc_depth: 2
vignette: >
%\VignetteIndexEntry{Arrow R Developer Guide}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
+
---
```{r setup-options, include=FALSE}
@@ -39,24 +43,25 @@ set -x
```
If you're looking to contribute to arrow, this vignette can help you set up a development environment that will enable you to write code and run tests locally. It outlines:
+
* how to build the components that make up the Arrow project and R package
* some common troubleshooting and workflows that developers use
-Many contributions can be accomplished with the instructions in [R-only development](#r-only-development), but if you're working on both the C++ library and the R package, the [Developer environment setup](#-developer-environment-setup) section will guide you through setting up a developer environment.
-
This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation); Linux users can find more details on building from source at `vignette("install", package = "arrow")`.
This document is a work in progress and will grow and change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but custom configurations might conflict with these instructions, and developers differ in how they prefer to set up development environments like this one.
We welcome any feedback you have about things that are confusing or additions you would like to see here. Please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see anything that is confusing, odd, or just plain wrong.
-# R-only developer environment setup
+# Developer environment setup
+
+## R-only
Windows and macOS users who wish to contribute to the R package and
don't need to alter the Arrow C++ library may be able to obtain a
recent version of the library without building from source.
-## macOS
+### macOS
On macOS, you can install the C++ library using [Homebrew](https://brew.sh/):
``` shell
@@ -66,7 +71,7 @@ brew install apache-arrow
brew install apache-arrow --HEAD
```
-## Windows and Linux
+### Windows and Linux
On Windows and Linux, you can download a .zip file with the arrow dependencies from the
nightly repository.
@@ -79,15 +84,15 @@ nightly$ls("libarrow/bin")
```
Version numbers in that repository correspond to dates.
-### Windows
+#### Windows
Windows users then can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing the arrow dependencies before installing the arrow R package.
-### Linux
+#### Linux
On Linux, you'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the arrow dependencies into it.
-# R and C++ developer environment setup
+## R and C++
If you need to alter both the Arrow C++ library and the R package code, or if you can't get a binary version of the latest C++ library elsewhere, you'll need to build it from source. This section discusses how to set up a C++ build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).
@@ -99,29 +104,29 @@ There are five major steps to the process — the first four are relevant to all
4. Install the Arrow library — this organizes and moves the compiled Arrow library files into the location specified in the configuration
5. Building the R package — this builds the C++ code in the R package, and installs the R package for you
-## Step 1 - Install dependencies
+### Step 1 - Install dependencies
The Arrow C++ library will by default use system dependencies if suitable versions are found. If system dependencies are not present, the Arrow C++ library will build them during its own build process. The only dependencies that you need to install _outside_ of the build process are `cmake` (for configuring the build) and `openssl` if you are building with S3 support.
For a faster build, you may choose to pre-install more C++ library dependencies (such as `lz4`, `zstd`, etc.) on the system so that they don't need to be built from source in the Arrow build.
-### macOS
+#### macOS
```{bash, save=run & macos}
brew install cmake openssl
```
-### Ubuntu
+#### Ubuntu
```{bash, save=run & ubuntu}
sudo apt install -y cmake libcurl4-openssl-dev libssl-dev
```
-### Windows
+#### Windows
Currently, the R package cannot be made to work with a locally-built Arrow C++ library. This will be resolved in a future release.
-## Step 2 - Configure the Arrow build
+### Step 2 - Configure the Arrow build
-### Build location
+#### Build location
There are two different ways that you can choose to build and then install the Arrow library:
@@ -132,7 +137,7 @@ You only need to do one of these options.
It is recommended that you install the arrow library to a user-level directory to be used in development. This is so that the development version you are using doesn't overwrite a released version of Arrow you may have installed. You are also able to have more than one version of the Arrow library to link to with this approach (by using different `ARROW_HOME` directories for the different versions). This approach also matches the recommendations for other Arrow bindings like [Python](http://arrow.apache.org/docs/developers/python.html).
-#### Configure for installing to a user directory
+##### Configure for installing to a user directory
In this example we will install the Arrow C++ library to a directory called `dist` that has the same parent directory as our `arrow` checkout but your installation of the Arrow R package can point to any directory with any name. However, we recommend *not* placing it inside of the arrow git checkout directory as unwanted changes could stop it working properly.
@@ -178,7 +183,7 @@ cmake \
`..` refers to the C++ source directory: we're in `cpp/build`, and the source is in `cpp`.
-#### Configure to install to a system directory
+##### Configure to install to a system directory
If you would like to install Arrow as a system library you can do that as well. This is in some respects simpler, but if you already have Arrow libraries installed there, it would disrupt them and possibly require `sudo` permissions.
@@ -210,7 +215,7 @@ cmake \
`..` refers to the C++ source directory: we're in `cpp/build`, and the source is in `cpp`.
-## More Arrow features
+#### Enabling more Arrow features
To enable optional features including: S3 support, an alternative memory allocator, and additional compression libraries, add some or all of these flags (the trailing `\` makes them easier to paste into a bash shell on a new line):
@@ -232,7 +237,7 @@ Other flags that may be useful:
_Note_: `cmake` is particularly sensitive to whitespace; if you see errors, check that you don't have any errant whitespace.
-## Step 3 - Building Arrow
+### Step 3 - Building Arrow
You can add `-j#` between `make` and `install` here too to speed up compilation by running in parallel (where `#` is the number of cores you have available).
@@ -247,7 +252,7 @@ need to use `sudo`:
sudo make install
```
-## Step 4 - Build the Arrow R package
+### Step 4 - Build the Arrow R package
Once you've built the C++ library, you can install the R package and its
dependencies, along with additional dev dependencies, from the git
@@ -260,7 +265,7 @@ R -e 'install.packages("remotes"); remotes::install_deps(dependencies = TRUE)'
R CMD INSTALL .
```
-### Compilation flags
+#### Compilation flags
If you need to set any compilation flags while building the C++
extensions, you can use the `ARROW_R_CXXFLAGS` environment variable. For
@@ -305,111 +310,9 @@ cmake \
-## Documentation
-
-The documentation for the R package uses features of `roxygen2` that haven't yet been released on CRAN, such as conditional inclusion of examples via the `@examplesIf` tag. If you are making changes which require updating the documentation, please install the development version of `roxygen2` from GitHub.
-
-```{r}
-remotes::install_github("r-lib/roxygen2")
-```
-
-# Troubleshooting
-
-Note that after any change to the C++ library, you must reinstall it and
-run `make clean` or `git clean -fdx .` to remove any cached object code
-in the `r/src/` directory before reinstalling the R package. This is
-only necessary if you make changes to the C++ library source; you do not
-need to manually purge object files if you are only editing R or C++
-code inside `r/`.
-
-## Arrow library-R package mismatches
-
-If the Arrow library and the R package have diverged, you will see errors like:
-
-```
-Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
- unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
- dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so, 6): Symbol not found: __ZN5arrow2io16RandomAccessFile9ReadAsyncERKNS0_9IOContextExx
- Referenced from: /Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
- Expected in: flat namespace
- in /Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
-Error: loading failed
-Execution halted
-ERROR: loading failed
-```
-
-To resolve this, try rebuilding the Arrow library from [Building Arrow above](#building-arrow).
-
-### Multiple versions of Arrow library
-
-If rebuilding the Arrow library doesn't work and you are [installing from a user-level directory](#installing-to-another-directory) and you already have a previous installation of libarrow in a system directory or you get you may get errors like the following when you install the R package:
-
-```
-Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
- unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
- dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: /usr/local/lib/libarrow.400.dylib
- Referenced from: /usr/local/lib/libparquet.400.dylib
- Reason: image not found
-```
-
-You need to make sure that you don't let R link to your system library when building arrow. You can do this a number of different ways:
-
-* Setting the `MAKEFLAGS` environment variable to `"LDFLAGS="` (see below for an example) this is the recommended way to accomplish this
-* Using {withr}'s `with_makevars(list(LDFLAGS = ""), ...)`
-* adding `LDFLAGS=` to your `~/.R/Makevars` file (the least recommended way, though it is a common debugging approach suggested online)
-
-```{bash, save=run & !sys_install & macos, hide=TRUE}
-# Setup troubleshooting section
-# install a system-level arrow on macOS
-brew install apache-arrow
-```
-
+# Installing a version of the R package with a specific git reference
-```{bash, save=run & !sys_install & ubuntu, hide=TRUE}
-# Setup troubleshooting section
-# install a system-level arrow on Ubuntu
-sudo apt update
-sudo apt install -y -V ca-certificates lsb-release wget
-wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
-sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
-sudo apt update
-sudo apt install -y -V libarrow-dev
-```
-
-```{bash, save=run & !sys_install & macos}
-MAKEFLAGS="LDFLAGS=" R CMD INSTALL .
-```
-
-
-### `rpath` issues
-
-If the package fails to install/load with an error like this:
-
-```
- ** testing if installed package can be loaded from temporary location
- Error: package or namespace load failed for 'arrow' in dyn.load(file, DLLpath = DLLpath, ...):
- unable to load shared object '/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so':
- dlopen(/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: @rpath/libarrow.14.dylib
-```
-
-ensure that `-DARROW_INSTALL_NAME_RPATH=OFF` was passed (this is important on
-macOS to prevent problems at link time and is a no-op on other platforms).
-Alternatively, try setting the environment variable `R_LD_LIBRARY_PATH` to
-wherever Arrow C++ was put in `make install`, e.g. `export
-R_LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package.
-
-When installing from source, if the R and C++ library versions do not
-match, installation may fail. If you've previously installed the
-libraries and want to upgrade the R package, you'll need to update the
-Arrow C++ library first.
-
-For any other build/configuration challenges, see the [C++ developer
-guide](https://arrow.apache.org/docs/developers/cpp/building.html).
-
-
-## Using `remotes::install_github(...)`
-
-If you need an Arrow installation from a specific repository or at a specific ref,
+If you need an arrow installation from a specific repository or at a specific ref,
`remotes::install_github("apache/arrow/r", build = FALSE)`
should work on most platforms (with the notable exception of Windows).
The `build = FALSE` argument is important so that the installation can access the
@@ -426,24 +329,26 @@ remotes::install_github("apache/arrow/r@bugfix", build = FALSE)
Developers may wish to use this method of installing a specific commit
separate from another Arrow development environment or system installation
-(e.g. we use this in [arrowbench](https://github.com/ursacomputing/arrowbench) to install development versions of arrow isolated from the system install). If you already have Arrow C++ libraries installed system-wide, you may need to set some additional variables in order to isolate this build from your system libraries:
+(e.g. we use this in [arrowbench](https://github.com/ursacomputing/arrowbench)
+to install development versions of arrow isolated from the system install). If
+you already have Arrow C++ libraries installed system-wide, you may need to set
+some additional variables in order to isolate this build from your system libraries:
* Setting the environment variable `FORCE_BUNDLED_BUILD` to `true` will skip the `pkg-config` search for Arrow libraries and attempt to build from the same source at the repository+ref given.
+
* You may also need to set the Makevars `CPPFLAGS` and `LDFLAGS` to `""` in order to prevent the installation process from attempting to link to already installed system versions of Arrow. One way to do this temporarily is wrapping your `remotes::install_github()` call like so: `withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github(...))`.
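
As a rough sketch of how the two settings above might be combined from a shell, the following composes the install command and prints it instead of running it. The variable name `INSTALL_EXPR` is introduced here purely for illustration, and `apache/arrow/r` is the repository path used earlier in this section; substitute your own repository and ref as needed.

```shell
# Ask the install process to build from the bundled source rather than
# searching for a system-wide libarrow via pkg-config.
export FORCE_BUNDLED_BUILD=true

# R expression that clears CPPFLAGS and LDFLAGS for the duration of the install.
# INSTALL_EXPR is an illustrative variable name, not part of any arrow tooling.
INSTALL_EXPR='withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github("apache/arrow/r", build = FALSE))'

# Printed rather than executed here; drop the leading `echo` to run the install.
echo Rscript -e "$INSTALL_EXPR"
```

Printing the command first makes it easy to confirm which repository and ref will be installed before starting a long source build.
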
-## What happens when you `R CMD INSTALL`?
+# Rebuilding the documentation
-There are a number of scripts that are triggered when `R CMD INSTALL .`. For Arrow users, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host) so the installation process is easy. However knowing about these scripts can help troubleshoot if things go wrong in them or things go wrong in an install:
+The R documentation uses the [`@examplesIf`](https://roxygen2.r-lib.org/articles/rd.html#functions) tag introduced in `roxygen2` version 7.1.1.9001, which hasn't yet been released on CRAN at the time of writing. If you are making changes which require updating the documentation, please install the development version of `roxygen2` from GitHub.
-* `configure` and `configure.win` These scripts are triggered during `R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They handle finding the Arrow library, setting up the build variables necessary, and writing the package Makevars file that is used to compile the C++ code in the R package.
-* `tools/nixlibs.R` This script is sometimes called by `configure` on Linux (or on any non-windows OS with the environment variable `FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled builds (which is the default on linux). The operative logic is at the end of the script, but it will do the following (and it will stop with the first one that succeeds and some of the steps are only checked if they are enabled via an environment variable):
- * Check if there is an already built libarrow in `arrow/r/libarrow-{version}`, use that to link against if it exists.
- * Check if a binary is available from our hosted unofficial builds.
- * Download the Arrow source and build the Arrow Library from source.
- * `*** Proceed without C++` dependencies (this is an error and the package will not work, but if you see this message you know the previous steps have not succeeded/were not enabled)
-* `inst/build_arrow_static.sh` this script builds Arrow for a bundled, static build. It is called by `tools/nixlibs.R` when the Arrow library is being built. (If you're looking at this script, and you've gotten this far, it might look incredibly familiar: it's basically the contents of this guide in script form — with a few important changes)
+```{r}
+remotes::install_github("r-lib/roxygen2")
+```
+
+# Styling and linting
-## Styling and linting of the R code in the R package
+## R code
The R code in the package follows [the tidyverse style](https://style.tidyverse.org/). On PR submission (and on pushes) our CI will run linting and will flag possible errors on the pull request with annotations.
@@ -457,7 +362,7 @@ You can automatically change the formatting of the code in the package using the
The styler package will fix many styling errors, though not all lintr errors are automatically fixable with styler. The list of files we intentionally do not style is in `r/.styler_excludes.R`.
-# Editing C++ code in the R package
+## C++ code
The arrow package uses some customized tools on top of `cpp11` to prepare its
C++ code in `src/`. This is because there are some features that are only enabled
@@ -492,6 +397,7 @@ this by installing LLVM via Homebrew and running the script as
_Note_ that the lint script requires Python 3 and the Python dependencies
(note that `cmake_format` is pinned to a specific version):
+
* autopep8
* flake8
* cmake_format==0.5.2
@@ -501,27 +407,56 @@ Some tests are conditionally enabled based on the availability of certain
features in the package build (S3 support, compression libraries, etc.).
Others are generally skipped by default but can be enabled with environment
variables or other settings:
+
* All tests are skipped on Linux if the package builds without the C++ libarrow.
To make the build fail if libarrow is not available (as in, to test that
the C++ build was successful), set `TEST_R_WITH_ARROW=true`
+
* Some tests are disabled unless `ARROW_R_DEV=true`
+
* Tests that require allocating >2GB of memory to test Large types are disabled
unless `ARROW_LARGE_MEMORY_TESTS=true`
+
* Integration tests against a real S3 bucket are disabled unless credentials
are set in `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; these are available
on request
+
* S3 tests using [MinIO](https://min.io/) locally are enabled if the
`minio server` process is found running. If you're running MinIO with custom
settings, you can set `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, and
`MINIO_PORT` to override the defaults.
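
To illustrate the switches listed above, here is a minimal shell sketch that opts in to the optional test groups before a test run. The MinIO values are hypothetical placeholders rather than project defaults, and the final `make test` step (left as a comment) assumes you are in the `r/` directory of the checkout.

```shell
# Opt in to test groups that are skipped by default.
export TEST_R_WITH_ARROW=true          # fail rather than skip when libarrow is unavailable
export ARROW_R_DEV=true                # enable developer-only tests
export ARROW_LARGE_MEMORY_TESTS=true   # enable tests that allocate >2GB of memory

# Placeholder MinIO settings (hypothetical values); only needed when MinIO
# is running with custom settings.
export MINIO_ACCESS_KEY="example-access-key"
export MINIO_SECRET_KEY="example-secret-key"
export MINIO_PORT="9000"

# Show the switches the test run will see.
env | grep -E '^(TEST_R_WITH_ARROW|ARROW_R_DEV|ARROW_LARGE_MEMORY_TESTS|MINIO_PORT)=' | sort

# Then, from the r/ directory of the checkout:
# make test
```

Because these are ordinary environment variables, they can also be set for a single command only, which avoids leaving the extended tests enabled in the rest of the session.
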
-# Github workflows
+# Running additional CI checks
On a pull request, there are some actions you can trigger by commenting on the PR. We have additional CI checks that run nightly and can be requested on demand using an internal tool called [crossbow](https://arrow.apache.org/docs/developers/crossbow.html). A few important GitHub comment commands include:
-* `@github-actions crossbow submit -g r` for all extended R CI tests
-* `@github-actions crossbow submit {task-name}` for running a specific task. See the `r:` group definition near the beginning of the [crossbow configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml) for a list of glob expression patterns that match names of items in the `tasks:` list below it.
-* `@github-actions autotune` will run and fix lint c++ linting errors + run R documentation (among other cleanup tasks) and commit them to the branch
+### Run all extended R CI tasks
+`@github-actions crossbow submit -g r`
+
+This runs each of the R-related CI tasks.
+
+### Run a specific task
+`@github-actions crossbow submit {task-name}`
+
+See the `r:` group definition near the beginning of the [crossbow configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml) for a list of glob expression patterns that match names of items in the `tasks:` list below it.
+
+### Run linting and documentation building tasks
+
+`@github-actions autotune`
+
+This will fix C++ linting errors, rebuild the R documentation (among other cleanup tasks), and commit the results to the branch.
+
+# What happens when you `R CMD INSTALL`?
+
+There are a number of scripts that are triggered when you run `R CMD INSTALL .`. For Arrow users, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host) so the installation process is easy. However, knowing about these scripts can help you troubleshoot if something goes wrong in them or elsewhere in an install:
+
+* `configure` and `configure.win`: these scripts are triggered during `R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They handle finding the Arrow library, setting up the build variables necessary, and writing the package Makevars file that is used to compile the C++ code in the R package.
+* `tools/nixlibs.R`: this script is sometimes called by `configure` on Linux (or on any non-Windows OS with the environment variable `FORCE_BUNDLED_BUILD=true`). It sets up the build process for our bundled builds (which is the default on Linux). The operative logic is at the end of the script, but it will do the following (stopping at the first step that succeeds; some of the steps are only checked if they are enabled via an environment variable):
+ * Check if there is an already built libarrow in `arrow/r/libarrow-{version}`, use that to link against if it exists.
+ * Check if a binary is available from our hosted unofficial builds.
+ * Download the Arrow source and build the Arrow Library from source.
+ * `*** Proceed without C++` dependencies (this is an error and the package will not work, but if you see this message you know the previous steps have not succeeded/were not enabled)
+* `inst/build_arrow_static.sh`: this script builds Arrow for a bundled, static build. It is called by `tools/nixlibs.R` when the Arrow library is being built. (If you're looking at this script, and you've gotten this far, it might look incredibly familiar: it's basically the contents of this guide in script form, with a few important changes.)
# Useful functions for Arrow developers
@@ -555,4 +490,97 @@ from the command line (`make test`, `make doc`, `make clean`, etc.)
``` shell
R CMD build .
R CMD check arrow_*.tar.gz --as-cran
-```
\ No newline at end of file
+```
+
+# Troubleshooting
+
+Note that after any change to the C++ library, you must reinstall it and
+run `make clean` or `git clean -fdx .` to remove any cached object code
+in the `r/src/` directory before reinstalling the R package. This is
+only necessary if you make changes to the C++ library source; you do not
+need to manually purge object files if you are only editing R or C++
+code inside `r/`.
+
+## Arrow library-R package mismatches
+
+If the Arrow library and the R package have diverged, you will see errors like:
+
+```
+Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
+ unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
+ dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so, 6): Symbol not found: __ZN5arrow2io16RandomAccessFile9ReadAsyncERKNS0_9IOContextExx
+ Referenced from: /Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
+ Expected in: flat namespace
+ in /Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
+Error: loading failed
+Execution halted
+ERROR: loading failed
+```
+
+To resolve this, try rebuilding the Arrow library from [Building Arrow above](#building-arrow).
+
+## Multiple versions of Arrow library
+
+If rebuilding the Arrow library doesn't work, and you are [installing from a user-level directory](#installing-to-another-directory) while a previous installation of libarrow already exists in a system directory, you may get errors like the following when you install the R package:
+
+```
+Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
+ unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
+ dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: /usr/local/lib/libarrow.400.dylib
+ Referenced from: /usr/local/lib/libparquet.400.dylib
+ Reason: image not found
+```
+
+You need to make sure that you don't let R link to your system library when building arrow. You can do this in a number of different ways:
+
+* Set the `MAKEFLAGS` environment variable to `"LDFLAGS="` (see below for an example); this is the recommended approach
+* Use {withr}'s `with_makevars(list(LDFLAGS = ""), ...)`
+* Add `LDFLAGS=` to your `~/.R/Makevars` file (the least recommended way, though it is a common debugging approach suggested online)
+
+```{bash, save=run & !sys_install & macos, hide=TRUE}
+# Setup troubleshooting section
+# install a system-level arrow on macOS
+brew install apache-arrow
+```
+
+
+```{bash, save=run & !sys_install & ubuntu, hide=TRUE}
+# Setup troubleshooting section
+# install a system-level arrow on Ubuntu
+sudo apt update
+sudo apt install -y -V ca-certificates lsb-release wget
+wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+sudo apt update
+sudo apt install -y -V libarrow-dev
+```
+
+```{bash, save=run & !sys_install & macos}
+MAKEFLAGS="LDFLAGS=" R CMD INSTALL .
+```
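
As a rough illustration of why the `MAKEFLAGS` prefix form is convenient: a variable assigned in front of a command is visible to that command only and is not left behind in the shell. The sketch below uses `sh -c` as a stand-in for `R CMD INSTALL .` so that it runs without R; the `during` and `after` variable names are illustrative only.

```shell
# Start from a clean state so the comparison is meaningful.
unset MAKEFLAGS

# Prefix assignment: the child process (standing in for R CMD INSTALL) sees it.
during=$(MAKEFLAGS="LDFLAGS=" sh -c 'echo "${MAKEFLAGS:-unset}"')

# A later command in the same shell does not see it.
after=$(sh -c 'echo "${MAKEFLAGS:-unset}"')

echo "during install: $during"
echo "afterwards:     $after"
```

By contrast, an entry in `~/.R/Makevars` stays in effect for every later package build until it is removed.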
+
+
+## `rpath` issues
+
+If the package fails to install/load with an error like this:
+
+```
+ ** testing if installed package can be loaded from temporary location
+ Error: package or namespace load failed for 'arrow' in dyn.load(file, DLLpath = DLLpath, ...):
+ unable to load shared object '/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so':
+ dlopen(/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: @rpath/libarrow.14.dylib
+```
+
+ensure that `-DARROW_INSTALL_NAME_RPATH=OFF` was passed (this is important on
+macOS to prevent problems at link time and is a no-op on other platforms).
+Alternatively, try setting the environment variable `R_LD_LIBRARY_PATH` to
+wherever Arrow C++ was put in `make install`, e.g. `export
+R_LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package.
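
Before exporting `R_LD_LIBRARY_PATH`, it can help to confirm that the directory really contains a libarrow shared library. The following is a minimal sketch under stated assumptions: the `has_libarrow` helper is hypothetical, and falling back from `$ARROW_HOME/lib` to `/usr/local/lib` is only one plausible choice of candidate directory.

```shell
# Succeed if the directory given as $1 contains a libarrow shared library.
# (has_libarrow is a hypothetical helper, not part of arrow.)
has_libarrow() {
  dir=$1
  for f in "$dir"/libarrow.*; do
    case "$f" in
      *.so|*.so.*|*.dylib) [ -e "$f" ] && return 0 ;;
    esac
  done
  return 1
}

# Assumption: prefer $ARROW_HOME/lib when ARROW_HOME is set, else /usr/local/lib.
candidate=${ARROW_HOME:+$ARROW_HOME/lib}
candidate=${candidate:-/usr/local/lib}

if has_libarrow "$candidate"; then
  export R_LD_LIBRARY_PATH="$candidate"
  echo "R_LD_LIBRARY_PATH set to $candidate"
else
  echo "no libarrow shared library found in $candidate" >&2
fi
```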
+
+When installing from source, if the R and C++ library versions do not
+match, installation may fail. If you've previously installed the
+libraries and want to upgrade the R package, you'll need to update the
+Arrow C++ library first.
+
+For any other build/configuration challenges, see the [C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
\ No newline at end of file
From 01d28f35ec534f249d8bad3b637609d34b64201b Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Mon, 16 Aug 2021 10:07:26 +0100
Subject: [PATCH 06/31] Fix internal links
---
r/vignettes/developing.Rmd | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index f50d52d48a6..bd371cd0b92 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -47,7 +47,7 @@ If you're looking to contribute to arrow, this vignette can help you set up a de
* how to build the components that make up the Arrow project and R package
* some common troubleshooting and workflows that developers use
-This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation); Linux users can find more details on building from source at `vignette("install", package = "arrow")`.
+This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation).
This document is a work in progress and will grow and change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but custom configurations might conflict with these instructions and there are differences of opinion across developers about how to set up development environments like this is.
@@ -96,13 +96,7 @@ On Linux, you'll need to create a `libarrow` directory inside the R package dire
If you need to alter both the Arrow C++ library and the R package code, or if you can't get a binary version of the latest C++ library elsewhere, you'll need to build it from source. This section discusses how to set up a C++ build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).
-There are five major steps to the process — the first four are relevant to all Arrow developers, and the last one is specific to developers making changes to the R package:
-
-1. Install dependencies
-2. Configuring the Arrow library build (using `cmake`) — this specifies how you want the build to go, what features to include, etc.
-3. Building the Arrow library — this actually compiles the Arrow library
-4. Install the Arrow library — this organizes and moves the compiled Arrow library files into the location specified in the configuration
-5. Building the R package — this builds the C++ code in the R package, and installs the R package for you
+There are five major steps to the process — the first four are relevant to all Arrow developers, and the last one is specific to developers making changes to the R package.
### Step 1 - Install dependencies
@@ -276,7 +270,7 @@ need to set
export ARROW_R_CXXFLAGS=-fno-omit-frame-pointer
```
-### Developer Experience
+#### Recompiling the C++ code
With the setups described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterated and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the Arrow library C++ has changed and there is a mismatch between the Arrow Library and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your set up.
@@ -517,11 +511,11 @@ Execution halted
ERROR: loading failed
```
-To resolve this, try rebuilding the Arrow library from [Building Arrow above](#building-arrow).
+To resolve this, try rebuilding the Arrow library from [Building Arrow above](#step-3-building-arrow).
## Multiple versions of Arrow library
-If rebuilding the Arrow library doesn't work and you are [installing from a user-level directory](#installing-to-another-directory) and you already have a previous installation of libarrow in a system directory or you get you may get errors like the following when you install the R package:
+If rebuilding the Arrow library doesn't work, and you are [installing from a user-level directory](#configure-for-installing-to-a-user-directory) while a previous installation of libarrow already exists in a system directory, you may get errors like the following when you install the R package:
```
Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
From d7ff3dda8a0f5e7394f8b1dc4071fcf89ed8a3c9 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Mon, 16 Aug 2021 10:10:15 +0100
Subject: [PATCH 07/31] Update pkgdown command to show preview
---
r/vignettes/developing.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index bd371cd0b92..0b11301fddc 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -467,7 +467,7 @@ make test file=regexp
# Update roxygen documentation
devtools::document()
# To preview the documentation website
-pkgdown::build_site()
+pkgdown::build_site(preview=TRUE)
# All package checks; see also below
devtools::check()
# See test coverage statistics
From 2e204c59b7253b27411c6d7c44f05132e05076a3 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Mon, 16 Aug 2021 10:12:37 +0100
Subject: [PATCH 08/31] Remove section as it's not Arrow-specific knowledge
---
r/vignettes/developing.Rmd | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 0b11301fddc..0c78afd35a1 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -455,7 +455,7 @@ There are a number of scripts that are triggered when `R CMD INSTALL .`. For Arr
# Useful functions for Arrow developers
-Within an R session, these can help with package development:
+Within an R session, these functions can help with package development:
``` r
# Load the dev package
@@ -475,16 +475,7 @@ covr::report()
covr::package_coverage()
```
-Any of those can be run from the command line by wrapping them in `R -e
-'$COMMAND'`. There's also a `Makefile` to help with some common tasks
-from the command line (`make test`, `make doc`, `make clean`, etc.)
-
-## Full package validation
-
-``` shell
-R CMD build .
-R CMD check arrow_*.tar.gz --as-cran
-```
+There's also a `Makefile` to help with some common tasks from the command line (`make test`, `make doc`, `make clean`, etc.)
# Troubleshooting
From e6df2962897173d447e9c927b7eed414cad3addb Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Mon, 16 Aug 2021 12:04:21 +0100
Subject: [PATCH 09/31] Rename section headers and move some of the info around
installation into install vignette
---
r/vignettes/developing.Rmd | 101 ++++++++++++++++++++-----------------
r/vignettes/install.Rmd | 33 ++++++++++++
2 files changed, 88 insertions(+), 46 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 0c78afd35a1..226f25271b4 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -304,7 +304,7 @@ cmake \
-# Installing a version of the R package with a specific git reference
+## Installing a version of the R package with a specific git reference
If you need an arrow installation from a specific repository or at a specific ref,
`remotes::install_github("apache/arrow/r", build = FALSE)`
@@ -332,7 +332,15 @@ some additional variables in order to isolate this build from your system librar
* You may also need to set the Makevars `CPPFLAGS` and `LDFLAGS` to `""` in order to prevent the installation process from attempting to link to already installed system versions of Arrow. One way to do this temporarily is wrapping your `remotes::install_github()` call like so: `withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github(...))`.
-# Rebuilding the documentation
+# Common developer workflow tasks
+
+The `arrow/r` directory contains a `Makefile` to help with some common tasks from the command line (e.g. `make test`, `make doc`, `make clean`, etc.).
+
+## Loading arrow
+
+You can load the R package via `devtools::load_all()`.
+
+## Rebuilding the documentation
The R documentation uses the [`@examplesIf`](https://roxygen2.r-lib.org/articles/rd.html#functions) tag introduced in `roxygen2` version 7.1.1.9001, which hasn't yet been released on CRAN at the time of writing. If you are making changes which require updating the documentation, please install the development version of `roxygen2` from GitHub.
@@ -340,9 +348,19 @@ The R documentation uses the [`@examplesIf`](https://roxygen2.r-lib.org/articles
remotes::install_github("r-lib/roxygen2")
```
-# Styling and linting
+You can use `devtools::document()` and `pkgdown::build_site()` to rebuild the documentation and preview the results.
+
+```r
+# Update roxygen documentation
+devtools::document()
+
+# To preview the documentation website
+pkgdown::build_site(preview=TRUE)
+```
+
+## Styling and linting
-## R code
+### R code
The R code in the package follows [the tidyverse style](https://style.tidyverse.org/). On PR submission (and on pushes) our CI will run linting and will flag possible errors on the pull request with annotations.
@@ -356,7 +374,7 @@ You can automatically change the formatting of the code in the package using the
The styler package will fix many styling errors, thought not all lintr errors are automatically fixable with styler. The list of files we intentionally do not style is in `r/.styler_excludes.R`.
-## C++ code
+### C++ code
The arrow package uses some customized tools on top of `cpp11` to prepare its
C++ code in `src/`. This is because there are some features that are only enabled
@@ -396,7 +414,18 @@ _Note_ that the lint script requires Python 3 and the Python dependencies
* flake8
* cmake_format==0.5.2
-# Running tests
+## Running tests
+
+Tests can be run either using `devtools::test()` or the Makefile alternative.
+
+```r
+# Run the test suite, optionally filtering file names
+devtools::test(filter="^regexp$")
+
+# or the Makefile alternative from the arrow/r directory in a shell:
+make test file=regexp
+```
+
Some tests are conditionally enabled based on the availability of certain
features in the package build (S3 support, compression libraries, etc.).
Others are generally skipped by default but can be enabled with environment
@@ -420,62 +449,38 @@ variables or other settings:
settings, you can set `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, and
`MINIO_PORT` to override the defaults.
-# Running additional CI checks
+## Running checks
+
+You can run package checks by using `devtools::check()` or the Makefile alternative.
+
+```
+# All package checks; see also below
+devtools::check()
+```
+
+
+## Running additional CI checks
On a pull request, there are some actions you can trigger by commenting on the PR. We have additional CI checks that run nightly and can be requested on demand using an internal tool called [crosssbow](https://arrow.apache.org/docs/developers/crossbow.html). A few important GitHub comment commands include:
-### Run all extended R CI tasks
+#### Run all extended R CI tasks
`@github-actions crossbow submit -g r`
This runs each of the R-related CI tasks.
-### Run a specific task
+#### Run a specific task
`@github-actions crossbow submit {task-name}`
See the `r:` group definition near the beginning of the [crossbow configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml) for a list of glob expression patterns that match names of items in the `tasks:` list below it.
-### Run linting and documentation building tasks
+#### Run linting and documentation building tasks
`@github-actions autotune`
-This will run and fix lint c++ linting errors + run R documentation (among other cleanup tasks) and commit them to the branch.
+This will run and fix lint C++ linting errors, run R documentation (among other cleanup tasks), and commit the resulting updates to the branch.
-# What happens when you `R CMD INSTALL`?
-There are a number of scripts that are triggered when `R CMD INSTALL .`. For Arrow users, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host) so the installation process is easy. However knowing about these scripts can help troubleshoot if things go wrong in them or things go wrong in an install:
-
-* `configure` and `configure.win` These scripts are triggered during `R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They handle finding the Arrow library, setting up the build variables necessary, and writing the package Makevars file that is used to compile the C++ code in the R package.
-* `tools/nixlibs.R` This script is sometimes called by `configure` on Linux (or on any non-windows OS with the environment variable `FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled builds (which is the default on linux). The operative logic is at the end of the script, but it will do the following (and it will stop with the first one that succeeds and some of the steps are only checked if they are enabled via an environment variable):
- * Check if there is an already built libarrow in `arrow/r/libarrow-{version}`, use that to link against if it exists.
- * Check if a binary is available from our hosted unofficial builds.
- * Download the Arrow source and build the Arrow Library from source.
- * `*** Proceed without C++` dependencies (this is an error and the package will not work, but if you see this message you know the previous steps have not succeeded/were not enabled)
-* `inst/build_arrow_static.sh` this script builds Arrow for a bundled, static build. It is called by `tools/nixlibs.R` when the Arrow library is being built. (If you're looking at this script, and you've gotten this far, it might look incredibly familiar: it's basically the contents of this guide in script form — with a few important changes)
-
-
-# Useful functions for Arrow developers
-
-Within an R session, these functions can help with package development:
-
-``` r
-# Load the dev package
-devtools::load_all()
-# Run the test suite, optionally filtering file names
-devtools::test(filter="^regexp$")
-# or the Makefile alternative from the arrow/r directory in a shell:
-make test file=regexp
-# Update roxygen documentation
-devtools::document()
-# To preview the documentation website
-pkgdown::build_site(preview=TRUE)
-# All package checks; see also below
-devtools::check()
-# See test coverage statistics
-covr::report()
-covr::package_coverage()
-```
-There's also a `Makefile` to help with some common tasks from the command line (`make test`, `make doc`, `make clean`, etc.)
# Troubleshooting
@@ -568,4 +573,8 @@ libraries and want to upgrade the R package, you'll need to update the
Arrow C++ library first.
For any other build/configuration challenges, see the [C++ developer
-guide](https://arrow.apache.org/docs/developers/cpp/building.html).
\ No newline at end of file
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+## Other installation issues
+
+There are a number of scripts that are triggered when the arrow R package is installed. For package users who are not interacting with the underlying code, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host). However, knowing about these scripts can help package developers troubleshoot problems in the scripts themselves or elsewhere in an install. See [the installation vignette](./install.html) for more information.
\ No newline at end of file
diff --git a/r/vignettes/install.Rmd b/r/vignettes/install.Rmd
index 47ae8944b71..5fc487c85b7 100644
--- a/r/vignettes/install.Rmd
+++ b/r/vignettes/install.Rmd
@@ -123,6 +123,39 @@ you'll need to reinstall the package in order to enable S3 support.
# How dependencies are resolved
+There are a number of scripts that are triggered when `R CMD INSTALL .` is run.
+For Arrow users, these should all just work without configuration and pull in
+the most complete pieces (e.g. official binaries that we host).
+
+An overview of these scripts is shown below:
+
+* `configure` and `configure.win` - these scripts are triggered during
+`R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They
+handle finding the Arrow library, setting up the build variables necessary, and
+writing the package Makevars file that is used to compile the C++ code in the R
+package.
+
+* `tools/nixlibs.R` - this script is sometimes called by `configure` on Linux
+(or on any non-Windows OS with the environment variable
+`FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled
+builds (which is the default on Linux). The operative logic is at the end of
+the script. It tries the following steps in order and stops at the first one
+that succeeds; some of the steps are only checked if they are enabled via an
+environment variable:
+ * Check if there is an already built libarrow in `arrow/r/libarrow-{version}`,
+ use that to link against if it exists.
+ * Check if a binary is available from our hosted unofficial builds.
+ * Download the Arrow source and build the Arrow Library from source.
+ * `*** Proceed without C++ dependencies` (this is an error and the package
+ will not work, but if you see this message you know the previous steps have
+ not succeeded/were not enabled)
+
+* `inst/build_arrow_static.sh` - this script builds Arrow for a bundled, static
+build. It is called by `tools/nixlibs.R` when the Arrow library is being built.
+(If you're looking at this script, and you've gotten this far, it might look
+incredibly familiar: it's basically the contents of this guide in script form —
+with a few important changes)
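
The "stop at the first step that succeeds" behaviour described for `tools/nixlibs.R` can be sketched as a simple fallback chain. This is an illustration only, not the script's actual logic: the step functions, the `libarrow-local` directory name, and the `binary_enabled` / `source_enabled` switches are all hypothetical placeholders.

```shell
# Hypothetical stand-ins for the real checks.
use_local_libarrow() { [ -d "libarrow-local" ]; }
download_binary()    { [ "${binary_enabled:-false}" = "true" ]; }
build_from_source()  { [ "${source_enabled:-true}" = "true" ]; }

# Try each step in order; print the name of the first one that succeeds.
resolve_libarrow() {
  for step in use_local_libarrow download_binary build_from_source; do
    if "$step"; then
      echo "$step"
      return 0
    fi
  done
  # Nothing succeeded or was enabled: mirror the error message quoted above.
  echo "*** Proceed without C++ dependencies" >&2
  return 1
}
```

Calling `resolve_libarrow` with different switch values shows which step would be chosen, and a non-zero exit status corresponds to the error case in the list above.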
+
In order for the `arrow` R package to work, it needs the Arrow C++ library.
There are a number of ways you can get it: a system package; a library you've
built yourself outside of the context of installing the R package;
From 0b8261ebb1d311060015e8aa63654f94b4cdebba Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Mon, 16 Aug 2021 15:18:08 +0100
Subject: [PATCH 10/31] Add back in the full validation bit
---
r/vignettes/developing.Rmd | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 226f25271b4..754132d2248 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -451,11 +451,23 @@ variables or other settings:
## Running checks
-You can run package checks by using `devtools::check()` or the Makefile alternative.
+You can run package checks by using `devtools::check()` and check test coverage with `covr::package_coverage()`.
-```
-# All package checks; see also below
+```r
+# All package checks
devtools::check()
+
+# See test coverage statistics
+covr::report()
+covr::package_coverage()
+
+```
+
+For full package validation, you can run the following commands from a terminal.
+
+```
+R CMD build .
+R CMD check arrow_*.tar.gz --as-cran
```
@@ -479,9 +491,6 @@ See the `r:` group definition near the beginning of the [crossbow configuration]
This will run and fix lint C++ linting errors, run R documentation (among other cleanup tasks), and commit the resulting updates to the branch.
-
-
-
# Troubleshooting
Note that after any change to the C++ library, you must reinstall it and
From fd45fd794239db38139134a2a1d04bc35bd8fc17 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Mon, 16 Aug 2021 15:22:55 +0100
Subject: [PATCH 11/31] Rephrase "specific ref"
---
r/vignettes/developing.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 754132d2248..02e8619167f 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -306,7 +306,7 @@ cmake \
## Installing a version of the R package with a specific git reference
-If you need an arrow installation from a specific repository or at a specific ref,
+If you need an arrow installation from a specific repository or git reference,
`remotes::install_github("apache/arrow/r", build = FALSE)`
should work on most platforms (with the notable exception of Windows).
The `build = FALSE` argument is important so that the installation can access the
From c72fc8facbf86eba08f8ed6bbfb84c5a53a4c09e Mon Sep 17 00:00:00 2001
From: Nic
Date: Wed, 25 Aug 2021 08:22:06 +0000
Subject: [PATCH 12/31] Update r/vignettes/developing.Rmd
Co-authored-by: Jonathan Keane
---
r/vignettes/developing.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 02e8619167f..ef59c5b6d04 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -90,7 +90,7 @@ Windows users then can set the `RWINLIB_LOCAL` environment variable to point to
#### Linux
-On Linux, you'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the arrow dependencies into it.
+On Linux, you'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the compiled arrow binary files into it.
## R and C++
From 62913d54d08c1006742aa8c00ca1b16fb32e4635 Mon Sep 17 00:00:00 2001
From: Nic
Date: Wed, 25 Aug 2021 08:24:43 +0000
Subject: [PATCH 13/31] Update r/vignettes/developing.Rmd
Co-authored-by: Jonathan Keane
---
r/vignettes/developing.Rmd | 1 -
1 file changed, 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index ef59c5b6d04..d9d2d52f3a5 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -460,7 +460,6 @@ devtools::check()
# See test coverage statistics
covr::report()
covr::package_coverage()
-
```
For full package validation, you can run the following commands from a terminal.
From e10c29d69fe70308d770469f331f893f5dc05edd Mon Sep 17 00:00:00 2001
From: Nic
Date: Wed, 25 Aug 2021 08:25:58 +0000
Subject: [PATCH 14/31] Update r/vignettes/developing.Rmd
Co-authored-by: Jonathan Keane
---
r/vignettes/developing.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index d9d2d52f3a5..9bf70ecd6f9 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -488,7 +488,7 @@ See the `r:` group definition near the beginning of the [crossbow configuration]
`@github-actions autotune`
-This will run and fix lint C++ linting errors, run R documentation (among other cleanup tasks), and commit the resulting updates to the branch.
+This will fix C++ linting errors, rebuild the R documentation (among other cleanup tasks), run styler on any changed R code, and commit the resulting updates to the branch.
# Troubleshooting
From 7d6f025050164ab1bba463f45de31a5150a512e4 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Wed, 25 Aug 2021 17:25:11 +0100
Subject: [PATCH 15/31] Remove newline and revert to using html_vignette
---
r/vignettes/developing.Rmd | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 9bf70ecd6f9..c9349c185bd 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -1,14 +1,10 @@
---
title: "Arrow R Developer Guide"
-output:
- html_document:
- toc: true
- toc_depth: 2
+output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Arrow R Developer Guide}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
-
---
```{r setup-options, include=FALSE}
From 4800da9b9c96ca4db4a01b247da4aa61c6b9745d Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Wed, 25 Aug 2021 18:19:15 +0100
Subject: [PATCH 16/31] Add tabsets back in and a bit of rephrasing
---
r/vignettes/developing.Rmd | 63 ++++++++++++++++++++++----------------
1 file changed, 36 insertions(+), 27 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index c9349c185bd..f3399c3fa86 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -49,14 +49,29 @@ This document is a work in progress and will grow and change as the Apache Arrow
We welcome any feedback you have about things that are confusing or additions you would like to see here. Please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see anything that is confusing, odd, or just plain wrong.
-# Developer environment setup
+# Developer environment setup
-## R-only
+## R-only {.tabset}
Windows and macOS users who wish to contribute to the R package and
don't need to alter the Arrow C++ library may be able to obtain a
recent version of the library without building from source.
+### Linux
+
+On Linux, you can download a .zip file with the arrow dependencies from the
+nightly repository.
+
+To see what nightlies are available, you can use Arrow's (or any other S3 client's) S3 listing functionality to see what is in the bucket `s3://arrow-r-nightly/libarrow/bin`:
+
+```
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+Version numbers in that repository correspond to dates.
+
+You'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the compiled arrow binary files into it.
+
### macOS
On macOS, you can install the C++ library using [Homebrew](https://brew.sh/):
@@ -67,10 +82,9 @@ brew install apache-arrow
brew install apache-arrow --HEAD
```
-### Windows and Linux
+### Windows
-On Windows and Linux, you can download a .zip file with the arrow dependencies from the
-nightly repository.
+On Windows, you can download a .zip file with the arrow dependencies from the nightly repository.
To see what nightlies are available, you can use Arrow's (or any other S3 client's) S3 listing functionality to see what is in the bucket `s3://arrow-r-nightly/libarrow/bin`:
@@ -80,43 +94,38 @@ nightly$ls("libarrow/bin")
```
Version numbers in that repository correspond to dates.
-#### Windows
+You can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing the arrow dependencies before installing the arrow R package.
-Windows users then can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing the arrow dependencies before installing the arrow R package.
-
-#### Linux
-
-On Linux, you'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the compiled arrow binary files into it.
## R and C++
If you need to alter both the Arrow C++ library and the R package code, or if you can't get a binary version of the latest C++ library elsewhere, you'll need to build it from source. This section discusses how to set up a C++ build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).
-There are five major steps to the process — the first four are relevant to all Arrow developers, and the last one is specific to developers making changes to the R package.
+There are five major steps to the process.
-### Step 1 - Install dependencies
+### Step 1 - Install dependencies {.tabset}
The Arrow C++ library will by default use system dependencies if suitable versions are found. If system dependencies are not present, the Arrow C++ library will build them during its own build process. The only dependencies that you need to install _outside_ of the build process are `cmake` (for configuring the build) and `openssl` if you are building with S3 support.
For a faster build, you may choose to pre-install more C++ library dependencies (such as `lz4`, `zstd`, etc.) on the system so that they don't need to be built from source in the Arrow build.
-#### macOS
-```{bash, save=run & macos}
-brew install cmake openssl
-```
-
#### Ubuntu
```{bash, save=run & ubuntu}
sudo apt install -y cmake libcurl4-openssl-dev libssl-dev
```
+#### macOS
+```{bash, save=run & macos}
+brew install cmake openssl
+```
+
#### Windows
Currently, the R package cannot be made to work with a locally-built Arrow C++ library. This will be resolved in a future release.
### Step 2 - Configure the Arrow build
-#### Build location
+#### Build location {.tabset}
There are two different ways that you can choose to build and then install the Arrow library:
@@ -125,11 +134,11 @@ There are two different ways that you can choose to build and then install the A
You only need to do one of these options.
-It is recommended that you install the arrow library to a user-level directory to be used in development. This is so that the development version you are using doesn't overwrite a released version of Arrow you may have installed. You are also able to have more than one version of the Arrow library to link to with this approach (by using different `ARROW_HOME` directories for the different versions). This approach also matches the recommendations for other Arrow bindings like [Python](http://arrow.apache.org/docs/developers/python.html).
+We recommend that you configure the arrow library to be built to a user-level directory rather than a system directory for your development work. This is so that the development version you are using doesn't overwrite a released version of Arrow you may have installed. You are also able to have more than one version of the Arrow library to link to with this approach (by using different `ARROW_HOME` directories for the different versions). This approach also matches the recommendations for other Arrow bindings like [Python](http://arrow.apache.org/docs/developers/python.html).
##### Configure for installing to a user directory
-In this example we will install the Arrow C++ library to a directory called `dist` that has the same parent directory as our `arrow` checkout but your installation of the Arrow R package can point to any directory with any name. However, we recommend *not* placing it inside of the arrow git checkout directory as unwanted changes could stop it working properly.
+In the example below, the Arrow C++ library is installed to a directory called `dist` that has the same parent directory as the `arrow` checkout, but your installation of the Arrow R package can point to any directory with any name. However, we recommend *not* placing it inside of the arrow git checkout directory as unwanted changes could stop it working properly.
```{bash, save=run & !sys_install}
export ARROW_HOME=$(pwd)/dist
@@ -143,7 +152,7 @@ export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH" >> ~/.bash_profile
```
-Now you can move into the arrow repository to start the build process. You will need to create a directory into which the C++ build will put its contents. It is recommended to make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). And then, change directories to be inside `cpp/build`:
+Start by navigating in a terminal to the arrow repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). Next, change directories to be inside `cpp/build`:
```{bash, save=run & !sys_install}
pushd arrow
@@ -171,13 +180,13 @@ cmake \
..
```
-`..` refers to the C++ source directory: we're in `cpp/build`, and the source is in `cpp`.
+`..` refers to the C++ source directory: you're in `cpp/build`, and the source is in `cpp`.
##### Configure to install to a system directory
-If you would like to install Arrow as a system library you can do that as well. This is in some respects simpler, but if you already have Arrow libraries installed there, it would disrupt them and possibly require `sudo` permissions.
+It is possible to install Arrow as a system library. This is, in some respects, simpler than installing it to a user-level directory. However, if you already have a previous Arrow installation, this may disrupt it and could require `sudo` permissions to run the commands below.
-Now you can move into the arrow repository to start the build process. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). And then, change directories to be inside `cpp/build`:
+Start by navigating in a terminal to the arrow repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). Next, change directories to be inside `cpp/build`:
```{bash, save=run & sys_install}
pushd arrow
@@ -185,7 +194,7 @@ mkdir -p cpp/build
pushd cpp/build
```
-You'll first call `cmake` to configure the build and then `make install`. For the R package, you'll need to enable several features in the C++ library using `-D` flags:
+First call `cmake` to configure the build and then `make install` to install the library. For the R package, you'll need to enable several features in the C++ library using `-D` flags:
```{bash, save=run & sys_install}
cmake \
@@ -203,7 +212,7 @@ cmake \
..
```
-`..` refers to the C++ source directory: we're in `cpp/build`, and the source is in `cpp`.
+`..` refers to the C++ source directory: you're in `cpp/build`, and the source is in `cpp`.
#### Enabling more Arrow features
From 55b9f46242c711f6bec0590fafb56d65f32edbcc Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Wed, 25 Aug 2021 18:25:04 +0100
Subject: [PATCH 17/31] Remove system install instructions
---
r/vignettes/developing.Rmd | 58 +++-----------------------------------
1 file changed, 4 insertions(+), 54 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index f3399c3fa86..7eca8122756 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -125,20 +125,9 @@ Currently, the R package cannot be made to work with a locally-built Arrow C++ l
### Step 2 - Configure the Arrow build
-#### Build location {.tabset}
+We recommend that you configure the arrow library to be built to a user-level directory rather than a system directory for your development work. This is so that the development version you are using doesn't overwrite a released version of Arrow you may already have installed, and so that you are also able work with more than one version of the Arrow library (by using different `ARROW_HOME` directories for the different versions).
-There are two different ways that you can choose to build and then install the Arrow library:
-
-1. into a user-defined directory
-2. into a system-level directory
-
-You only need to do one of these options.
-
-We recommend that you configure the arrow library to be built to a user-level directory rather than a system directory for your development work. This is so that the development version you are using doesn't overwrite a released version of Arrow you may have installed. You are also able to have more than one version of the Arrow library to link to with this approach (by using different `ARROW_HOME` directories for the different versions). This approach also matches the recommendations for other Arrow bindings like [Python](http://arrow.apache.org/docs/developers/python.html).
-
-##### Configure for installing to a user directory
-
-In the example below, the Arrow C++ library is installed to a directory called `dist` that has the same parent directory as the `arrow` checkout, but your installation of the Arrow R package can point to any directory with any name. However, we recommend *not* placing it inside of the arrow git checkout directory as unwanted changes could stop it working properly.
+In the example below, the Arrow C++ library is installed to a directory called `dist` that has the same parent directory as the `arrow` checkout. Your installation of the Arrow R package can point to any directory with any name, though we recommend *not* placing it inside of the arrow git checkout directory as unwanted changes could stop it working properly.
```{bash, save=run & !sys_install}
export ARROW_HOME=$(pwd)/dist
@@ -182,41 +171,9 @@ cmake \
`..` refers to the C++ source directory: you're in `cpp/build`, and the source is in `cpp`.
-##### Configure to install to a system directory
-
-It is possible to install Arrow as a system library. This is, in some respects, simpler than installing it to a user-level directory. However, if you already have a previous Arrow installation, this may disrupt it and could require `sudo` permissions to run the commands below.
-
-Start by navigating in a terminal to the arrow repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). Next, change directories to be inside `cpp/build`:
-
-```{bash, save=run & sys_install}
-pushd arrow
-mkdir -p cpp/build
-pushd cpp/build
-```
-
-First call `cmake` to configure the build and then `make install` to install the library. For the R package, you'll need to enable several features in the C++ library using `-D` flags:
-
-```{bash, save=run & sys_install}
-cmake \
- -DARROW_COMPUTE=ON \
- -DARROW_CSV=ON \
- -DARROW_DATASET=ON \
- -DARROW_EXTRA_ERROR_CONTEXT=ON \
- -DARROW_FILESYSTEM=ON \
- -DARROW_INSTALL_NAME_RPATH=OFF \
- -DARROW_JEMALLOC=ON \
- -DARROW_JSON=ON \
- -DARROW_PARQUET=ON \
- -DARROW_WITH_SNAPPY=ON \
- -DARROW_WITH_ZLIB=ON \
- ..
-```
-
-`..` refers to the C++ source directory: you're in `cpp/build`, and the source is in `cpp`.
-
#### Enabling more Arrow features
-To enable optional features including: S3 support, an alternative memory allocator, and additional compression libraries, add some or all of these flags (the trailing `\` makes them easier to paste into a bash shell on a new line):
+To enable optional features including: S3 support, an alternative memory allocator, and additional compression libraries, add some or all of these flags to your call to `cmake` (the trailing `\` makes them easier to paste into a bash shell on a new line):
``` shell
-DARROW_MIMALLOC=ON \
@@ -234,7 +191,7 @@ Other flags that may be useful:
* `-DCMAKE_BUILD_TYPE=debug` or `-DCMAKE_BUILD_TYPE=relwithdebinfo` can be useful for debugging. You probably don't want to do this generally because a debug build is much slower at runtime than the default `release` build.
-_Note_ `cmake` is particularly sensitive to whitespacing, if you see errors, check that you don't have any errant whitespace
+_Note_ `cmake` is particularly sensitive to whitespacing, if you see errors, check that you don't have any errant whitespace.
### Step 3 - Building Arrow
@@ -244,13 +201,6 @@ You can add `-j#` between `make` and `install` here too to speed up compilation
make -j8 install
```
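As a minimal sketch of choosing the `-j` value, the snippet below detects the core count instead of hard-coding `8`. The variable name `NCORES` is hypothetical and not part of the vignette. The sketch assumes `nproc` (GNU coreutils) is available on Linux and `sysctl -n hw.ncpu` on macOS.

```shell
# Detect the number of available cores: nproc on Linux, sysctl on macOS;
# fall back to 1 if neither is available.
if command -v nproc >/dev/null 2>&1; then
  NCORES=$(nproc)
else
  NCORES=$(sysctl -n hw.ncpu 2>/dev/null || echo 1)
fi

# Print the resulting command rather than running it, so the sketch is safe to paste.
echo "make -j${NCORES} install"
```

If the build runs out of memory, a smaller value than the full core count may work better.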
-If you are installing on linux, and you are installing to the system, you may
-need to use `sudo`:
-
-```{bash, save=run & sys_install & ubuntu}
-sudo make install
-```
-
### Step 4 - Build the Arrow R package
Once you've built the C++ library, you can install the R package and its
From 53d7e837bf6ba57f67870845fbd412d1045408e5 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Wed, 25 Aug 2021 19:23:22 +0100
Subject: [PATCH 18/31] Plural to singular
---
r/vignettes/developing.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 7eca8122756..f8e9d6d2657 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -227,7 +227,7 @@ export ARROW_R_CXXFLAGS=-fno-omit-frame-pointer
#### Recompiling the C++ code
-With the setups described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterated and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the Arrow library C++ has changed and there is a mismatch between the Arrow Library and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your set up.
+With the setup described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterated and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the Arrow library C++ has changed and there is a mismatch between the Arrow Library and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your set up.
For a full build: a `cmake` command with all of the R-relevant optional dependencies turned on. Development with other languages might require different flags as well. For example, to develop Python, you would need to also add `-DARROW_PYTHON=ON` (though all of the other flags used for Python are already included here).
From edf659d2bfb346e29c82c798102199ffea5d6ced Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Wed, 25 Aug 2021 21:05:33 +0100
Subject: [PATCH 19/31] Rephrase longer sentence
---
r/vignettes/developing.Rmd | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index f8e9d6d2657..ab638860ac9 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -474,7 +474,7 @@ To resolve this, try rebuilding the Arrow library from [Building Arrow above](#s
## Multiple versions of Arrow library
-If rebuilding the Arrow library doesn't work and you are [installing from a user-level directory](#configure-for-installing-to-a-user-directory) and you already have a previous installation of libarrow in a system directory or you get you may get errors like the following when you install the R package:
+If you are installing from a user-level directory, and you already have a previous installation of libarrow in a system directory, you get you may get errors like the following when you install the R package:
```
Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
@@ -484,7 +484,7 @@ Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath
Reason: image not found
```
-You need to make sure that you don't let R link to your system library when building arrow. You can do this a number of different ways:
+If this happens, you need to make sure that you don't let R link to your system library when building arrow. You can do this a number of different ways:
* Setting the `MAKEFLAGS` environment variable to `"LDFLAGS="` (see below for an example) this is the recommended way to accomplish this
* Using {withr}'s `with_makevars(list(LDFLAGS = ""), ...)`
From 8fe5faa03c64d1a163736c1af5f5f0375fa416ad Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Wed, 25 Aug 2021 21:12:46 +0100
Subject: [PATCH 20/31] Link to install vignette
---
r/vignettes/developing.Rmd | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index ab638860ac9..cfa79511af7 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -406,7 +406,8 @@ variables or other settings:
## Running checks
-You can run package checks by using `devtools::check()` and check test coverage with `covr::package_coverage()`.
+You can run package checks by using `devtools::check()` and check test coverage
+with `covr::package_coverage()`.
```r
# All package checks
@@ -427,7 +428,11 @@ R CMD check arrow_*.tar.gz --as-cran
## Running additional CI checks
-On a pull request, there are some actions you can trigger by commenting on the PR. We have additional CI checks that run nightly and can be requested on demand using an internal tool called [crosssbow](https://arrow.apache.org/docs/developers/crossbow.html). A few important GitHub comment commands include:
+On a pull request, there are some actions you can trigger by commenting on the
+PR. We have additional CI checks that run nightly and can be requested on demand
+using an internal tool called
+[crossbow](https://arrow.apache.org/docs/developers/crossbow.html).
+A few important GitHub comment commands are shown below.
#### Run all extended R CI tasks
`@github-actions crossbow submit -g r`
@@ -437,13 +442,17 @@ This runs each of the R-related CI tasks.
#### Run a specific task
`@github-actions crossbow submit {task-name}`
-See the `r:` group definition near the beginning of the [crossbow configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml) for a list of glob expression patterns that match names of items in the `tasks:` list below it.
+See the `r:` group definition near the beginning of the [crossbow configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml)
+for a list of glob expression patterns that match names of items in the `tasks:`
+list below it.
#### Run linting and documentation building tasks
`@github-actions autotune`
-This will run and fix lint C++ linting errors, run R documentation (among other cleanup tasks), run styler on any changed R code, and commit the resulting updates to the branch.
+This will run and fix C++ linting errors, run R documentation (among other
+cleanup tasks), run styler on any changed R code, and commit the resulting
+updates to the branch.
# Troubleshooting
@@ -474,7 +483,9 @@ To resolve this, try rebuilding the Arrow library from [Building Arrow above](#s
## Multiple versions of Arrow library
-If you are installing from a user-level directory, and you already have a previous installation of libarrow in a system directory, you get you may get errors like the following when you install the R package:
+If you are installing from a user-level directory, and you already have a
+previous installation of libarrow in a system directory, you may get
+errors like the following when you install the R package:
```
Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
@@ -484,7 +495,8 @@ Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath
Reason: image not found
```
-If this happens, you need to make sure that you don't let R link to your system library when building arrow. You can do this a number of different ways:
+If this happens, you need to make sure that you don't let R link to your system
+library when building arrow. You can do this a number of different ways:
* Setting the `MAKEFLAGS` environment variable to `"LDFLAGS="` (see below for an example) this is the recommended way to accomplish this
* Using {withr}'s `with_makevars(list(LDFLAGS = ""), ...)`
@@ -540,4 +552,4 @@ guide](https://arrow.apache.org/docs/developers/cpp/building.html).
## Other installation issues
-There are a number of scripts that are triggered when the arrow R package is installed. For package users who are not interacting with the underlying code, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host). However, knowing about these scripts can help package developers troubleshoot if things go wrong in them or things go wrong in an install. See [the installation vignette](./install.html) for more information.
\ No newline at end of file
+There are a number of scripts that are triggered when the arrow R package is installed. For package users who are not interacting with the underlying code, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host). However, knowing about these scripts can help package developers troubleshoot if things go wrong in them or things go wrong in an install. See [the installation vignette](./install.html#how-dependencies-are-resolved) for more information.
\ No newline at end of file
From 9315f6f69ecab094c6af10db7535b570a062cb9f Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Wed, 25 Aug 2021 22:15:08 +0100
Subject: [PATCH 21/31] Link to cpp11
---
r/vignettes/developing.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index cfa79511af7..404129ab4f2 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -331,7 +331,7 @@ The styler package will fix many styling errors, thought not all lintr errors ar
### C++ code
-The arrow package uses some customized tools on top of `cpp11` to prepare its
+The arrow package uses some customized tools on top of [cpp11](https://cpp11.r-lib.org/) to prepare its
C++ code in `src/`. This is because there are some features that are only enabled
and built conditionally during build time. If you change C++ code in the R
package, you will need to set the `ARROW_R_DEV` environment variable to `true`
From 95aacf7f0933ae015f9c8005e26ec41b4153ddb9 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 26 Aug 2021 09:02:50 +0100
Subject: [PATCH 22/31] arrow terminology
---
r/vignettes/developing.Rmd | 72 +++++++++++++++++++++-----------------
1 file changed, 39 insertions(+), 33 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 404129ab4f2..26114d4f70f 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -45,6 +45,12 @@ If you're looking to contribute to arrow, this vignette can help you set up a de
This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation).
+For clarity, a quick note on terminology used in this vignette:
+* "Apache Arrow" or "Arrow" - the Apache Arrow project, including implementations in different languages
+* "libarrow" or "the Arrow C++ library" - Arrow's C++ library (N.B. this term might not be used in documentation in other parts of the Arrow project)
+* "arrow" or "the Arrow R package" - the R package
+* `arrow` - a directory into which the Arrow GitHub repo has been checked out
+
This document is a work in progress and will grow and change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but custom configurations might conflict with these instructions and there are differences of opinion across developers about how to set up development environments like this is.
We welcome any feedback you have about things that are confusing or additions you would like to see here. Please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see anything that is confusing, odd, or just plain wrong.
@@ -54,15 +60,15 @@ We welcome any feedback you have about things that are confusing or additions yo
## R-only {.tabset}
Windows and macOS users who wish to contribute to the R package and
-don't need to alter the Arrow C++ library may be able to obtain a
+don't need to alter libarrow may be able to obtain a
recent version of the library without building from source.
### Linux
-On Linux, you can download a .zip file with the arrow dependencies from the
+On Linux, you can download a .zip file containing libarrow from the
nightly repository.
-To see what nightlies are available, you can use Arrow's (or any other S3 client's) S3 listing functionality to see what is in the bucket `s3://arrow-r-nightly/libarrow/bin`:
+To see what nightlies are available, you can use arrow's (or any other S3 client's) S3 listing functionality to see what is in the bucket `s3://arrow-r-nightly/libarrow/bin`:
```
nightly <- s3_bucket("arrow-r-nightly")
@@ -70,10 +76,10 @@ nightly$ls("libarrow/bin")
```
Version numbers in that repository correspond to dates.
-You'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the compiled arrow binary files into it.
+You'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the compiled libarrow binary files into it.
### macOS
-On macOS, you can install the C++ library using [Homebrew](https://brew.sh/):
+On macOS, you can install libarrow using [Homebrew](https://brew.sh/):
``` shell
# For the released version:
@@ -84,9 +90,9 @@ brew install apache-arrow --HEAD
### Windows
-On Windows, you can download a .zip file with the arrow dependencies from the nightly repository.
+On Windows, you can download a .zip file containing libarrow from the nightly repository.
-To see what nightlies are available, you can use Arrow's (or any other S3 client's) S3 listing functionality to see what is in the bucket `s3://arrow-r-nightly/libarrow/bin`:
+To see what nightlies are available, you can use arrow's (or any other S3 client's) S3 listing functionality to see what is in the bucket `s3://arrow-r-nightly/libarrow/bin`:
```
nightly <- s3_bucket("arrow-r-nightly")
@@ -94,20 +100,20 @@ nightly$ls("libarrow/bin")
```
Version numbers in that repository correspond to dates.
-You can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing the arrow dependencies before installing the arrow R package.
+You can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing libarrow before installing the arrow R package.
## R and C++
-If you need to alter both the Arrow C++ library and the R package code, or if you can't get a binary version of the latest C++ library elsewhere, you'll need to build it from source. This section discusses how to set up a C++ build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+If you need to alter both libarrow and the R package code, or if you can't get a binary version of the latest libarrow elsewhere, you'll need to build it from source. This section discusses how to set up a C++ libarrow build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).
There are five major steps to the process.
### Step 1 - Install dependencies {.tabset}
-The Arrow C++ library will by default use system dependencies if suitable versions are found. If system dependencies are not present, the Arrow C++ library will build them during its own build process. The only dependencies that you need to install _outside_ of the build process are `cmake` (for configuring the build) and `openssl` if you are building with S3 support.
+When building libarrow, by default, system dependencies will be used if suitable versions are found. If system dependencies are not present, libarrow will build them during its own build process. The only dependencies that you need to install _outside_ of the build process are `cmake` (for configuring the build) and `openssl` if you are building with S3 support.
-For a faster build, you may choose to pre-install more C++ library dependencies (such as `lz4`, `zstd`, etc.) on the system so that they don't need to be built from source in the Arrow build.
+For a faster build, you may choose to pre-install more C++ library dependencies (such as `lz4`, `zstd`, etc.) on the system so that they don't need to be built from source in the libarrow build.
#### Ubuntu
```{bash, save=run & ubuntu}
@@ -121,27 +127,27 @@ brew install cmake openssl
#### Windows
-Currently, the R package cannot be made to work with a locally-built Arrow C++ library. This will be resolved in a future release.
+Currently, the R package cannot be made to work with a local libarrow build. This will be resolved in a future release.
-### Step 2 - Configure the Arrow build
+### Step 2 - Configure the libarrow build
-We recommend that you configure the arrow library to be built to a user-level directory rather than a system directory for your development work. This is so that the development version you are using doesn't overwrite a released version of Arrow you may already have installed, and so that you are also able work with more than one version of the Arrow library (by using different `ARROW_HOME` directories for the different versions).
+We recommend that you configure libarrow to be built to a user-level directory rather than a system directory for your development work. This is so that the development version you are using doesn't overwrite a released version of libarrow you may already have installed, and so that you are also able to work with more than one version of libarrow (by using different `ARROW_HOME` directories for the different versions).
-In the example below, the Arrow C++ library is installed to a directory called `dist` that has the same parent directory as the `arrow` checkout. Your installation of the Arrow R package can point to any directory with any name, though we recommend *not* placing it inside of the arrow git checkout directory as unwanted changes could stop it working properly.
+In the example below, libarrow is installed to a directory called `dist` that has the same parent directory as the `arrow` checkout. Your installation of the Arrow R package can point to any directory with any name, though we recommend *not* placing it inside of the `arrow` git checkout directory as unwanted changes could stop it working properly.
```{bash, save=run & !sys_install}
export ARROW_HOME=$(pwd)/dist
mkdir $ARROW_HOME
```
-_Special instructions on Linux:_ You will need to set `LD_LIBRARY_PATH` to the `lib` directory that is under where you set `$ARROW_HOME`, before launching R and using Arrow. One way to do this is to add it to your profile (we use `~/.bash_profile` here, but you might need to put this in a different file depending on your setup, e.g. if you use a shell other than `bash`). On macOS you do not need to do this because the macOS shared library paths are hardcoded to their locations during build time.
+_Special instructions on Linux:_ You will need to set `LD_LIBRARY_PATH` to the `lib` directory that is under where you set `$ARROW_HOME`, before launching R and using arrow. One way to do this is to add it to your profile (we use `~/.bash_profile` here, but you might need to put this in a different file depending on your setup, e.g. if you use a shell other than `bash`). On macOS you do not need to do this because the macOS shared library paths are hardcoded to their locations during build time.
```{bash, save=run & ubuntu & !sys_install}
export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH" >> ~/.bash_profile
```
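The block above prepends `$ARROW_HOME/lib` every time it runs. As a hedged sketch (not part of the vignette), the variant below only prepends the directory when it is not already on `LD_LIBRARY_PATH`, which avoids duplicate entries when the commands are re-run in the same shell. The fallback to `$(pwd)/dist` is an assumption made so that the snippet is self-contained.

```shell
# Assumption: ARROW_HOME was exported earlier; fall back to ./dist for this sketch.
ARROW_HOME="${ARROW_HOME:-$(pwd)/dist}"

# Prepend $ARROW_HOME/lib only if it is not already on the path.
case ":${LD_LIBRARY_PATH:-}:" in
  *":${ARROW_HOME}/lib:"*) ;;  # already present, nothing to do
  *) export LD_LIBRARY_PATH="${ARROW_HOME}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
esac

# Show the resulting value for inspection.
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```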
-Start by navigating in a terminal to the arrow repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). Next, change directories to be inside `cpp/build`:
+Start by navigating in a terminal to the `arrow` repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). Next, change directories to be inside `cpp/build`:
```{bash, save=run & !sys_install}
pushd arrow
@@ -149,7 +155,7 @@ mkdir -p cpp/build
pushd cpp/build
```
-You'll first call `cmake` to configure the build and then `make install`. For the R package, you'll need to enable several features in the C++ library using `-D` flags:
+You'll first call `cmake` to configure the build and then `make install`. For the R package, you'll need to enable several features in libarrow using `-D` flags:
```{bash, save=run & !sys_install}
cmake \
@@ -187,13 +193,13 @@ To enable optional features including: S3 support, an alternative memory allocat
Other flags that may be useful:
-* `-DBoost_SOURCE=BUNDLED` and `-DThrift_SOURCE=bundled`, for example, or any other dependency `*_SOURCE`, if you have a system version of a C++ dependency that doesn't work correctly with Arrow. This tells the build to compile its own version of the dependency from source.
+* `-DBoost_SOURCE=BUNDLED` and `-DThrift_SOURCE=BUNDLED`, for example, or any other dependency `*_SOURCE`, if you have a system version of a C++ dependency that doesn't work correctly with Arrow. This tells the build to compile its own version of the dependency from source.
* `-DCMAKE_BUILD_TYPE=debug` or `-DCMAKE_BUILD_TYPE=relwithdebinfo` can be useful for debugging. You probably don't want to do this generally because a debug build is much slower at runtime than the default `release` build.
_Note_ `cmake` is particularly sensitive to whitespacing, if you see errors, check that you don't have any errant whitespace.
-### Step 3 - Building Arrow
+### Step 3 - Building libarrow
You can add `-j#` between `make` and `install` here too to speed up compilation by running in parallel (where `#` is the number of cores you have available).
@@ -203,7 +209,7 @@ make -j8 install
### Step 4 - Build the Arrow R package
-Once you've built the C++ library, you can install the R package and its
+Once you've built libarrow, you can install the R package and its
dependencies, along with additional dev dependencies, from the git
checkout:
@@ -227,7 +233,7 @@ export ARROW_R_CXXFLAGS=-fno-omit-frame-pointer
#### Recompiling the C++ code
-With the setup described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterated and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the Arrow library C++ has changed and there is a mismatch between the Arrow Library and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your set up.
+With the setup described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterated and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the libarrow C++ has changed and there is a mismatch between libarrow and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your set up.
For a full build: a `cmake` command with all of the R-relevant optional dependencies turned on. Development with other languages might require different flags as well. For example, to develop Python, you would need to also add `-DARROW_PYTHON=ON` (though all of the other flags used for Python are already included here).
@@ -279,13 +285,13 @@ remotes::install_github("apache/arrow/r@bugfix", build = FALSE)
Developers may wish to use this method of installing a specific commit
separate from another Arrow development environment or system installation
(e.g. we use this in [arrowbench](https://github.com/ursacomputing/arrowbench)
-to install development versions of arrow isolated from the system install). If
-you already have Arrow C++ libraries installed system-wide, you may need to set
+to install development versions of libarrow isolated from the system install). If
+you already have libarrow installed system-wide, you may need to set
some additional variables in order to isolate this build from your system libraries:
-* Setting the environment variable `FORCE_BUNDLED_BUILD` to `true` will skip the `pkg-config` search for Arrow libraries and attempt to build from the same source at the repository+ref given.
+* Setting the environment variable `FORCE_BUNDLED_BUILD` to `true` will skip the `pkg-config` search for libarrow and attempt to build from the same source at the repository+ref given.
-* You may also need to set the Makevars `CPPFLAGS` and `LDFLAGS` to `""` in order to prevent the installation process from attempting to link to already installed system versions of Arrow. One way to do this temporarily is wrapping your `remotes::install_github()` call like so: `withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github(...))`.
+* You may also need to set the Makevars `CPPFLAGS` and `LDFLAGS` to `""` in order to prevent the installation process from attempting to link to already installed system versions of libarrow. One way to do this temporarily is wrapping your `remotes::install_github()` call like so: `withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github(...))`.
# Common developer workflow tasks
@@ -431,7 +437,7 @@ R CMD check arrow_*.tar.gz --as-cran
On a pull request, there are some actions you can trigger by commenting on the
PR. We have additional CI checks that run nightly and can be requested on demand
using an internal tool called
-[crosssbow](https://arrow.apache.org/docs/developers/crossbow.html).
+[crossbow](https://arrow.apache.org/docs/developers/crossbow.html).
A few important GitHub comment commands are shown below.
#### Run all extended R CI tasks
@@ -456,16 +462,16 @@ updates to the branch.
# Troubleshooting
-Note that after any change to the C++ library, you must reinstall it and
+Note that after any change to libarrow, you must reinstall it and
run `make clean` or `git clean -fdx .` to remove any cached object code
in the `r/src/` directory before reinstalling the R package. This is
-only necessary if you make changes to the C++ library source; you do not
+only necessary if you make changes to libarrow source; you do not
need to manually purge object files if you are only editing R or C++
code inside `r/`.
-## Arrow library-R package mismatches
+## Arrow library - R package mismatches
-If the Arrow library and the R package have diverged, you will see errors like:
+If libarrow and the R package have diverged, you will see errors like:
```
Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
@@ -479,9 +485,9 @@ Execution halted
ERROR: loading failed
```
-To resolve this, try rebuilding the Arrow library from [Building Arrow above](#step-3-building-arrow).
+To resolve this, try [rebuilding the Arrow library](#step-3-building-arrow).
-## Multiple versions of Arrow library
+## Multiple versions of libarrow
If you are installing from a user-level directory, and you already have a
previous installation of libarrow in a system directory, you may get
From 2461b2741816b8860dcf4e1328a81ef6ad5b9816 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 26 Aug 2021 09:14:59 +0100
Subject: [PATCH 23/31] Move the install script subsection and add a sentence
about who it's for
---
r/vignettes/install.Rmd | 70 ++++++++++++++++++++++-------------------
1 file changed, 37 insertions(+), 33 deletions(-)
diff --git a/r/vignettes/install.Rmd b/r/vignettes/install.Rmd
index 5fc487c85b7..7be477b53e6 100644
--- a/r/vignettes/install.Rmd
+++ b/r/vignettes/install.Rmd
@@ -123,39 +123,6 @@ you'll need to reinstall the package in order to enable S3 support.
# How dependencies are resolved
-There are a number of scripts that are triggered when `R CMD INSTALL .` is run.
-For Arrow users, these should all just work without configuration and pull in
-the most complete pieces (e.g. official binaries that we host).
-
-An overview of these scripts is shown below:
-
-* `configure` and `configure.win` - these scripts are triggered during
-`R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They
-handle finding the Arrow library, setting up the build variables necessary, and
-writing the package Makevars file that is used to compile the C++ code in the R
-package.
-
-* `tools/nixlibs.R` - this script is sometimes called by `configure` on Linux
-(or on any non-windows OS with the environment variable
-`FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled
-builds (which is the default on linux). The operative logic is at the end of
-the script, but it will do the following (and it will stop with the first one
-that succeeds and some of the steps are only checked if they are enabled via an
-environment variable):
- * Check if there is an already built libarrow in `arrow/r/libarrow-{version}`,
- use that to link against if it exists.
- * Check if a binary is available from our hosted unofficial builds.
- * Download the Arrow source and build the Arrow Library from source.
- * `*** Proceed without C++` dependencies (this is an error and the package
- will not work, but if you see this message you know the previous steps have
- not succeeded/were not enabled)
-
-* `inst/build_arrow_static.sh` - this script builds Arrow for a bundled, static
-build. It is called by `tools/nixlibs.R` when the Arrow library is being built.
-(If you're looking at this script, and you've gotten this far, it might look
-incredibly familiar: it's basically the contents of this guide in script form —
-with a few important changes)
-
In order for the `arrow` R package to work, it needs the Arrow C++ library.
There are a number of ways you can get it: a system package; a library you've
built yourself outside of the context of installing the R package;
@@ -198,9 +165,46 @@ unless you've set the `NOT_CRAN=true` environment variable.
For the mechanics of how all this works, see the R package `configure` script,
which calls `tools/nixlibs.R`.
+
If the C++ library is built from source, `inst/build_arrow_static.sh` is executed.
This build script is also what is used to generate the prebuilt binaries.
+## How the package is installed - advanced
+
+This subsection contains information that is mostly relevant to Arrow
+developers; Arrow users do not need it to install Arrow.
+
+There are a number of scripts that are triggered when `R CMD INSTALL .` is run.
+For Arrow users, these should all just work without configuration and pull in
+the most complete pieces (e.g. official binaries that we host).
+
+An overview of these scripts is shown below:
+
+* `configure` and `configure.win` - these scripts are triggered during
+`R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They
+handle finding the Arrow library, setting up the build variables necessary, and
+writing the package Makevars file that is used to compile the C++ code in the R
+package.
+
+* `tools/nixlibs.R` - this script is sometimes called by `configure` on Linux
+(or on any non-Windows OS with the environment variable
+`FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled
+builds (which is the default on Linux). The operative logic is at the end of
+the script. It tries the following steps in order and stops at the first one
+that succeeds; some of the steps are only checked if they are enabled via an
+environment variable:
+ * Check if there is an already built libarrow in `arrow/r/libarrow-{version}`,
+ use that to link against if it exists.
+ * Check if a binary is available from our hosted unofficial builds.
+ * Download the Arrow source and build the Arrow Library from source.
+ * `*** Proceed without C++` dependencies (this is an error and the package
+ will not work, but if you see this message you know the previous steps have
+ not succeeded/were not enabled)
+
+* `inst/build_arrow_static.sh` - called by `tools/nixlibs.R` when the Arrow
+library is being built. It builds Arrow for a bundled, static build, and
+mirrors the steps described in the ["Arrow R Developer Guide" vignette](./developing.html).
+
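The "stop at the first step that succeeds" order described for `tools/nixlibs.R` can be sketched as a simple fallback chain. This is an illustration only, not the script's actual logic: the three strategy functions and the `LOCAL_LIBARROW_DIR` and `CAN_BUILD` variables are hypothetical stand-ins, and the `LIBARROW_BINARY` check is deliberately simplified.

```bash
# Hypothetical stand-ins for the three strategies; each returns 0 on success.
use_local_libarrow() { [ -d "${LOCAL_LIBARROW_DIR:-}" ]; }
download_binary()    { [ "${LIBARROW_BINARY:-false}" != "false" ]; }
build_from_source()  { [ "${CAN_BUILD:-true}" = "true" ]; }

# Try each strategy in order and report the first one that succeeds.
resolve_libarrow() {
  for strategy in use_local_libarrow download_binary build_from_source; do
    if "$strategy"; then
      echo "$strategy"
      return 0
    fi
  done
  # Corresponds to the "proceed without C++ dependencies" error case.
  echo "none"
  return 1
}
```

Calling `resolve_libarrow` prints the name of the strategy that would be used, which makes the ordering easy to experiment with by toggling the variables.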
# Troubleshooting
The intent is that `install.packages("arrow")` will just work and handle all C++
From 3cc6df7f02442f0363a614d9a6227ff406266a43 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 26 Aug 2021 09:16:50 +0100
Subject: [PATCH 24/31] I have no idea where these changes came from
---
r/vignettes/install.Rmd | 54 ++++++++++++++++++++---------------------
1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/r/vignettes/install.Rmd b/r/vignettes/install.Rmd
index 7be477b53e6..d7588302e70 100644
--- a/r/vignettes/install.Rmd
+++ b/r/vignettes/install.Rmd
@@ -20,8 +20,8 @@ Our goal is to make `install.packages("arrow")` "just work" for as many Linux di
versions, and configurations as possible.
This document describes how it works and the options for fine-tuning Linux installation.
The intended audience for this document is `arrow` R package users on Linux, not developers.
-If you're contributing to the Arrow project,
-you'll probably want to manage your C++ installation more directly.
+If you're contributing to the Arrow project, see `vignette("developing", package = "arrow")` for guidance on setting up your development environment.
+
Note also that if you use `conda` to manage your R environment, this document does not apply.
You can `conda install -c conda-forge --strict-channel-priority r-arrow` and you'll get the latest official
release of the R package along with any C++ dependencies.
@@ -72,7 +72,7 @@ Alternatively, you can set
export LIBARROW_MINIMAL=false
```
-to build the Arrow libraries with optional features such as compression libraries
+to build the Arrow libraries from source with optional features such as compression libraries
enabled. This will increase the build time but provides many useful features.
Prebuilt binaries are built with this flag enabled, so you get the full
functionality by using them as well.
@@ -80,6 +80,8 @@ functionality by using them as well.
Both of these variables are also set this way if you have the `NOT_CRAN=true`
environment variable set.
+## Helper function: install_arrow()
+
If you already have `arrow` installed and want to upgrade to a different version,
install a development build, or try to reinstall and fix issues with Linux
C++ binaries, you can call `install_arrow()`.
@@ -130,7 +132,7 @@ or, if you don't already have it, the R package will attempt to resolve it
automatically when it installs.
If you are authorized to install system packages and you're installing a CRAN release,
-you may want to use the official Apache Arrow release packages corresponding to the R package version.
+you may want to use the official Apache Arrow release packages corresponding to the R package version (though there are some drawbacks: see "Troubleshooting" below).
See the [Arrow project installation page](https://arrow.apache.org/install/)
to find pre-compiled binary packages for some common Linux distributions,
including Debian, Ubuntu, and CentOS.
@@ -148,7 +150,7 @@ If no Arrow C++ libraries are found on the system,
the R package installation script will next attempt to download
prebuilt static Arrow C++ libraries
that match both your local operating system and `arrow` R package version.
-C++ libraries (source or binary) will only be retrieved if you have set the environment variable
+C++ binaries will only be retrieved if you have set the environment variable
`LIBARROW_BINARY` or `NOT_CRAN`.
If found, they will be downloaded and bundled when your R package compiles.
For a list of supported distributions and versions,
@@ -158,12 +160,9 @@ If no binary is found, it will download the Arrow C++ source that matches the R
(CRAN release or nightly build) and attempt to build it locally.
If no matching source bundle is found, it will also look to see if you are in
a checkout of the `apache/arrow` git repository and thus have the C++ source there.
-Depending on your system, building Arrow C++ from source likely will be slow;
-consequently, it is designed to happen only when you
-run `install.packages("arrow")` or `R CMD INSTALL` but not when running `R CMD check`,
-unless you've set the `NOT_CRAN=true` environment variable.
+Depending on your system, building Arrow C++ from source may be slow.
-For the mechanics of how all this works, see the R package `configure` script,
+For the specific mechanics of how all this works, see the R package `configure` script,
which calls `tools/nixlibs.R`.
If the C++ library is built from source, `inst/build_arrow_static.sh` is executed.
@@ -211,23 +210,26 @@ The intent is that `install.packages("arrow")` will just work and handle all C++
dependencies, but depending on your system, you may have better results if you
tune one of several parameters. Here are some known complications and ways to address them.
-## Package installed without C++ dependencies
+## Package failed to build C++ dependencies
-If you get an error like
+If you see a message like
```
-Cannot call io___MemoryMappedFile__Open(). See https://arrow.apache.org/docs/r/articles/install.html for help installing Arrow C++ libraries.
+------------------------- NOTE ---------------------------
+There was an issue preparing the Arrow C++ libraries.
+See https://arrow.apache.org/docs/r/articles/install.html
+---------------------------------------------------------
```
-for every `arrow` function you call,
-that means that installing the package failed to retrieve or build C++ libraries
+in the output when the package fails to install,
+that means that installation failed to retrieve or build C++ libraries
compatible with the current version of the R package.
It is expected that C++ dependencies should be built successfully
on all Linux distributions, so you should not see this message. If you do,
please check the "Known installation issues" below to see if any apply.
-If none apply, retry the installation with `arrow::install_arrow(verbose = TRUE)`
-so that details on what failed are shown, then
+If none apply, set the environment variable `ARROW_R_DEV=TRUE`
+so that details on what failed are shown, and try installing again. Then,
please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues)
and include the full verbose installation output.
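
One way to capture the verbose output for an issue report is sketched below. It assumes `Rscript` is on your `PATH`; the CRAN mirror URL, the log file name, and the `RUN_INSTALL` and `ARROW_INSTALL_LOG` variables are illustrative choices, not part of the package. By default the snippet only prints what it would do.

```bash
# Ask the build script for more verbose messaging.
export ARROW_R_DEV=TRUE

# Hypothetical log location; override with ARROW_INSTALL_LOG if desired.
log_file="${ARROW_INSTALL_LOG:-arrow-install.log}"

# Reinstall arrow and keep a copy of everything printed during installation.
install_arrow_verbose() {
  Rscript -e 'install.packages("arrow", repos = "https://cloud.r-project.org")' 2>&1 | tee "$1"
}

# Only run the (potentially slow) installation when explicitly requested.
if [ "${RUN_INSTALL:-false}" = "true" ]; then
  install_arrow_verbose "$log_file"
else
  echo "Dry run: would install arrow with ARROW_R_DEV=${ARROW_R_DEV}, logging to ${log_file}"
fi
```

Set `RUN_INSTALL=true` to perform the installation, then attach the log file to your report.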
@@ -291,12 +293,11 @@ because they're known to work.
## Building C++ from source
If building the C++ library from source fails, check the error message.
-The install script attempts to install any necessary build dependencies,
-but it's possible that some operating systems may require additional ones.
-You may be able to install them and retry.
-Regardless, if the C++ library fails to compile,
+(If you don't see an error message, only the `----- NOTE -----`,
+set the environment variable `ARROW_R_DEV=TRUE` to increase verbosity and retry installation.)
+The install script should work everywhere, so if the C++ library fails to compile,
please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues)
-so that we can attempt to improve the script.
+so that we can improve the script.
## Known installation issues
@@ -325,8 +326,8 @@ See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
Some features are optional when you build Arrow from source. With the exception of `ARROW_S3`, these are all `ON` by default in the bundled C++ build, but you can set them to `OFF` to disable them.
-* `ARROW_S3`: If set to `ON` S3 support will be built as long as the
- dependencies are met; if they are not met, the build script will turn this `OFF`
+* `ARROW_S3`: If set to `ON` S3 support will be built as long as the
+ dependencies are met; if they are not met, the build script will turn this `OFF`
* `ARROW_JEMALLOC` for the `jemalloc` memory allocator
* `ARROW_PARQUET`
* `ARROW_DATASET`
@@ -368,7 +369,7 @@ By default, these are all unset. All boolean variables are case-insensitive.
* `ARROW_R_DEV`: If set to `true`, more verbose messaging will be printed
in the build script. `arrow::install_arrow(verbose = TRUE)` sets this.
This variable also is needed if you're modifying C++
- code in the package: see "Editing C++ code" in the README.
+ code in the package: see the developer guide vignette.
* `LIBARROW_DEBUG_DIR`: If the C++ library building from source fails (`cmake`),
there may be messages telling you to check some log file in the build directory.
However, when the library is built during R package installation,
@@ -378,8 +379,7 @@ By default, these are all unset. All boolean variables are case-insensitive.
The directory will be created if it does not exist.
* `CMAKE`: When building the C++ library from source, you can specify a
`/path/to/cmake` to use a different version than whatever is found on the `$PATH`
-
-
+
# Contributing
As mentioned above, please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues)
From 6a34060f5ecd15eda22c2154bbc05d873ede9758 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 26 Aug 2021 10:47:31 +0100
Subject: [PATCH 25/31] Shell to bash for nicer highlighting, split out inline
code
---
r/vignettes/developing.Rmd | 79 +++++++++++++++++++++++++++-----------
1 file changed, 57 insertions(+), 22 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 26114d4f70f..2f531561ff6 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -46,6 +46,7 @@ If you're looking to contribute to arrow, this vignette can help you set up a de
This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation).
For clarity, a quick note on terminology used in this vignette:
+
* "Apache Arrow" or "Arrow" - the Apache Arrow project, including implementations in different languages
* "libarrow" or "the Arrow C++ library" - Arrow's C++ library (N.B. this term might not be used in documentation in other parts of the Arrow project)
* "arrow" or "the Arrow R package" - the R package
@@ -81,7 +82,7 @@ You'll need to create a `libarrow` directory inside the R package directory and
### macOS
On macOS, you can install libarrow using [Homebrew](https://brew.sh/):
-``` shell
+```bash
# For the released version:
brew install apache-arrow
# Or for a development version, you can try:
@@ -111,9 +112,9 @@ There are five major steps to the process.
### Step 1 - Install dependencies {.tabset}
-When building libarrow, by default, system dependencies will be used if suitable versions are found. If system dependencies are not present, libarrow will build them during its own build process. The only dependencies that you need to install _outside_ of the build process are `cmake` (for configuring the build) and `openssl` if you are building with S3 support.
+When building libarrow, by default, system dependencies will be used if suitable versions are found. If system dependencies are not present, libarrow will build them during its own build process. The only dependencies that you need to install _outside_ of the build process are [cmake](https://cmake.org/) (for configuring the build) and [openssl](https://www.openssl.org/) if you are building with S3 support.
-For a faster build, you may choose to pre-install more C++ library dependencies (such as `lz4`, `zstd`, etc.) on the system so that they don't need to be built from source in the libarrow build.
+For a faster build, you may choose to pre-install more C++ library dependencies (such as [lz4](http://lz4.github.io/lz4/), [zstd](https://facebook.github.io/zstd/), etc.) on the system so that they don't need to be built from source in the libarrow build.
#### Ubuntu
```{bash, save=run & ubuntu}
@@ -181,7 +182,7 @@ cmake \
To enable optional features including: S3 support, an alternative memory allocator, and additional compression libraries, add some or all of these flags to your call to `cmake` (the trailing `\` makes them easier to paste into a bash shell on a new line):
-``` shell
+```bash
-DARROW_MIMALLOC=ON \
-DARROW_S3=ON \
-DARROW_WITH_BROTLI=ON \
@@ -227,7 +228,7 @@ extensions, you can use the `ARROW_R_CXXFLAGS` environment variable. For
example, if you are using `perf` to profile the R extensions, you may
need to set
-``` shell
+```bash
export ARROW_R_CXXFLAGS=-fno-omit-frame-pointer
```
@@ -239,7 +240,7 @@ With the setup described here, you should not need to rebuild the Arrow library
For a full build: a `cmake` command with all of the R-relevant optional dependencies turned on. Development with other languages might require different flags as well. For example, to develop Python, you would need to also add `-DARROW_PYTHON=ON` (though all of the other flags used for Python are already included here).
-``` shell
+```bash
cmake \
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=lib \
@@ -267,15 +268,18 @@ cmake \
## Installing a version of the R package with a specific git reference
-If you need an arrow installation from a specific repository or git reference,
-`remotes::install_github("apache/arrow/r", build = FALSE)`
-should work on most platforms (with the notable exception of Windows).
+If you need an arrow installation from a specific repository or git reference, on most platforms except Windows, you can run:
+
+```{r}
+remotes::install_github("apache/arrow/r", build = FALSE)
+```
+
The `build = FALSE` argument is important so that the installation can access the
C++ source in the `cpp/` directory in `apache/arrow`.
As with other installation methods, setting the environment variables `LIBARROW_MINIMAL=false` and `ARROW_R_DEV=true` will provide a more full-featured version of Arrow and provide more verbose output, respectively.
-For example, to install from the (fictional) branch `bugfix` from `apache/arrow` one could:
+For example, to install from the (fictional) branch `bugfix` from `apache/arrow` you could run:
```r
Sys.setenv(LIBARROW_MINIMAL="false")
@@ -291,7 +295,10 @@ some additional variables in order to isolate this build from your system librar
* Setting the environment variable `FORCE_BUNDLED_BUILD` to `true` will skip the `pkg-config` search for libarrow and attempt to build from the same source at the repository+ref given.
-* You may also need to set the Makevars `CPPFLAGS` and `LDFLAGS` to `""` in order to prevent the installation process from attempting to link to already installed system versions of libarrow. One way to do this temporarily is wrapping your `remotes::install_github()` call like so: `withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github(...))`.
+* You may also need to set the Makevars `CPPFLAGS` and `LDFLAGS` to `""` in order to prevent the installation process from attempting to link to already installed system versions of libarrow. One way to do this temporarily is wrapping your `remotes::install_github()` call like so:
+```{r}
+withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github(...))
+```
# Common developer workflow tasks
@@ -325,13 +332,29 @@ pkgdown::build_site(preview=TRUE)
The R code in the package follows [the tidyverse style](https://style.tidyverse.org/). On PR submission (and on pushes) our CI will run linting and will flag possible errors on the pull request with annotations.
-To run the [lintr](https://github.com/jimhester/lintr) locally, install the lintr package (note, we currently use a fork that includes fixes not yet accepted upstream, see how lintr is being installed in the file `ci/docker/linux-apt-lint.dockerfile` for the current status) and then run `lintr::lint_package("arrow/r")`.
+To run [lintr](https://github.com/jimhester/lintr) locally, install the lintr package (note: we currently use a fork that includes fixes not yet accepted upstream; see how lintr is installed in the file `ci/docker/linux-apt-lint.dockerfile` for the current status) and then run:
+
+```{r}
+lintr::lint_package("arrow/r")
+```
You can automatically change the formatting of the code in the package using the [styler](https://styler.r-lib.org/) package. There are two ways to do this:
1. Use the comment bot to do this automatically with the command `@github-actions autotune` on a PR, and commit it back to the branch.
-2. Run the styler locally with the command `make style` (for only the files changed), `make style-all` (for all files), or use `styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))` - note the two excluded files which should not be styled.
+2. Run the styler locally either via Makefile commands:
+
+```bash
+make style # (for only the files changed)
+make style-all # (for all files)
+```
+
+on in R:
+
+```{r}
+# note the two excluded files which should not be styled
+styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))
+```
The styler package will fix many styling errors, though not all lintr errors are automatically fixable with styler. The list of files we intentionally do not style is in `r/.styler_excludes.R`.
@@ -352,21 +375,27 @@ Installing/enabling the appropriate plugin may save you much frustration.
Check for style errors with
-``` shell
+```bash
./lint.sh
```
Fix any style issues before committing with
-``` shell
+```bash
./lint.sh --fix
```
The lint script requires Python 3 and `clang-format-8`. If the command
-isn't found, you can explicitly provide the path to it like
-`CLANG_FORMAT=$(which clang-format-8) ./lint.sh`. On macOS, you can get
-this by installing LLVM via Homebrew and running the script as
-`CLANG_FORMAT=$(brew --prefix llvm@8)/bin/clang-format ./lint.sh`
+isn't found, you can explicitly provide the path to it like:
+
+```bash
+CLANG_FORMAT=$(which clang-format-8) ./lint.sh
+```
+
+On macOS, you can get this by installing LLVM via Homebrew and running the script as:
+```bash
+CLANG_FORMAT=$(brew --prefix llvm@8)/bin/clang-format ./lint.sh
+```
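
The two lookups above can be combined into one fallback, sketched here under the assumption that either `clang-format-8` is on your `PATH` or LLVM 8 was installed via Homebrew. The helper name `find_clang_format` is hypothetical, and the snippet prints the `lint.sh` invocation rather than running it.

```bash
# Hypothetical helper: locate clang-format, preferring an explicit CLANG_FORMAT.
find_clang_format() {
  if [ -n "${CLANG_FORMAT:-}" ]; then
    echo "$CLANG_FORMAT"
  elif command -v clang-format-8 >/dev/null 2>&1; then
    command -v clang-format-8
  elif command -v brew >/dev/null 2>&1 &&
       [ -x "$(brew --prefix llvm@8 2>/dev/null)/bin/clang-format" ]; then
    echo "$(brew --prefix llvm@8)/bin/clang-format"
  else
    return 1
  fi
}

if clang_format_path="$(find_clang_format)"; then
  echo "CLANG_FORMAT=${clang_format_path} ./lint.sh"
else
  echo "clang-format-8 not found; install it or set CLANG_FORMAT" >&2
fi
```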
_Note_ that the lint script requires Python 3 and the Python dependencies
(note that `cmake_format` is pinned to a specific version):
@@ -441,12 +470,16 @@ using an internal tool called
A few important GitHub comment commands are shown below.
#### Run all extended R CI tasks
-`@github-actions crossbow submit -g r`
+```
+@github-actions crossbow submit -g r
+```
This runs each of the R-related CI tasks.
#### Run a specific task
-`@github-actions crossbow submit {task-name}`
+```
+@github-actions crossbow submit {task-name}
+```
See the `r:` group definition near the beginning of the [crossbow configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml)
for a list of glob expression patterns that match names of items in the `tasks:`
@@ -454,7 +487,9 @@ list below it.
#### Run linting and documentation building tasks
-`@github-actions autotune`
+```
+@github-actions autotune
+```
This will find and fix C++ linting errors, run R documentation (among other
cleanup tasks), run styler on any changed R code, and commit the resulting
From 0e90fe66b9980ed6adb01839ec10a3d5a5c461bc Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Wed, 1 Sep 2021 15:12:38 +0100
Subject: [PATCH 26/31] Fix typo and whitespace
---
r/vignettes/developing.Rmd | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 2f531561ff6..400a457c081 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -349,11 +349,11 @@ make style # (for only the files changed)
make style-all # (for all files)
```
-on in R:
+or in R:
```{r}
# note the two excluded files which should not be styled
-styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))
+styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))
```
The styler package will fix many styling errors, though not all lintr errors are automatically fixable with styler. The list of files we intentionally do not style is in `r/.styler_excludes.R`.
From aa3c9b23d4fdf6d26fb71700efc0aca82086e4b1 Mon Sep 17 00:00:00 2001
From: Nic
Date: Thu, 2 Sep 2021 11:42:02 +0000
Subject: [PATCH 27/31] Update r/vignettes/developing.Rmd
Co-authored-by: Jonathan Keane
---
r/vignettes/developing.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 400a457c081..6d860b9d8c8 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -176,7 +176,7 @@ cmake \
..
```
-`..` refers to the C++ source directory: you're in `cpp/build`, and the source is in `cpp`.
+`..` refers to the C++ source directory: you're in `cpp/build` and the source is in `cpp`.
#### Enabling more Arrow features
From dc434f5c263228cd5167bbfd8891c7e69f59af90 Mon Sep 17 00:00:00 2001
From: Nic
Date: Thu, 2 Sep 2021 11:42:27 +0000
Subject: [PATCH 28/31] Update r/vignettes/developing.Rmd
Co-authored-by: Jonathan Keane
---
r/vignettes/developing.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 6d860b9d8c8..b73b72ee68b 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -234,7 +234,7 @@ export ARROW_R_CXXFLAGS=-fno-omit-frame-pointer
#### Recompiling the C++ code
-With the setup described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterated and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the libarrow C++ has changed and there is a mismatch between libarrow and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your set up.
+With the setup described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterate and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the libarrow C++ has changed and there is a mismatch between libarrow and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your setup.
For a full build: a `cmake` command with all of the R-relevant optional dependencies turned on. Development with other languages might require different flags as well. For example, to develop Python, you would need to also add `-DARROW_PYTHON=ON` (though all of the other flags used for Python are already included here).
From 1f44aca84bd858627a0d35feddf352d374fb640d Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 2 Sep 2021 12:55:12 +0100
Subject: [PATCH 29/31] Move definition to footnote
---
r/vignettes/developing.Rmd | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index 400a457c081..af91f74f468 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -45,13 +45,6 @@ If you're looking to contribute to arrow, this vignette can help you set up a de
This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation).
-For clarity, a quick note on terminology used in this vignette:
-
-* "Apache Arrow" or "Arrow" - the Apache Arrow project, including implementations in different languages
-* "libarrow" or "the Arrow C++ library" - Arrow's C++ library (N.B. this term might not be used in documentation in other parts of the Arrow project)
-* "arrow" or "the Arrow R package" - the R package
-* `arrow` - a directory into which the Arrow GitHub repo has been checked out
-
This document is a work in progress and will grow and change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but custom configurations might conflict with these instructions and there are differences of opinion across developers about how to set up development environments like this is.
We welcome any feedback you have about things that are confusing or additions you would like to see here. Please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see anything that is confusing, odd, or just plain wrong.
@@ -61,7 +54,7 @@ We welcome any feedback you have about things that are confusing or additions yo
## R-only {.tabset}
Windows and macOS users who wish to contribute to the R package and
-don't need to alter libarrow may be able to obtain a
+don't need to alter libarrow^[Arrow's C++ library] may be able to obtain a
recent version of the library without building from source.
### Linux
From fd55285db91b3942b03683dbb1ba790142c9ada4 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 2 Sep 2021 13:00:31 +0100
Subject: [PATCH 30/31] Refactor a few things for grammar and clarity
---
r/vignettes/developing.Rmd | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index af91f74f468..3d8ed1e0c15 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -41,20 +41,21 @@ set -x
If you're looking to contribute to arrow, this vignette can help you set up a development environment that will enable you to write code and run tests locally. It outlines:
* how to build the components that make up the Arrow project and R package
-* some common troubleshooting and workflows that developers use
+* workflows that developers use
+* some common troubleshooting steps and solutions
-This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation).
+This document is intended only for **developers** of Apache Arrow or the Arrow R package. R package users do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation).
-This document is a work in progress and will grow and change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but custom configurations might conflict with these instructions and there are differences of opinion across developers about how to set up development environments like this is.
+This document is a work in progress and will grow and change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but custom configurations might conflict with these instructions and there are differences of opinion across developers about how to set up development environments like this.
-We welcome any feedback you have about things that are confusing or additions you would like to see here. Please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see anything that is confusing, odd, or just plain wrong.
+We welcome any feedback you have about things that are confusing or additions you would like to see here - please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if you have any suggestions or requests.
# Developer environment setup
## R-only {.tabset}
Windows and macOS users who wish to contribute to the R package and
-don't need to alter libarrow^[Arrow's C++ library] may be able to obtain a
+don't need to alter libarrow (Arrow's C++ library) may be able to obtain a
recent version of the library without building from source.
### Linux
From f3e355051fa2a492a5ef96560d092a6e68f50c75 Mon Sep 17 00:00:00 2001
From: Jonathan Keane
Date: Mon, 6 Sep 2021 15:15:03 -0500
Subject: [PATCH 31/31] Update r/vignettes/developing.Rmd
---
r/vignettes/developing.Rmd | 1 +
1 file changed, 1 insertion(+)
diff --git a/r/vignettes/developing.Rmd b/r/vignettes/developing.Rmd
index ff4994aa74d..59c231724aa 100644
--- a/r/vignettes/developing.Rmd
+++ b/r/vignettes/developing.Rmd
@@ -348,6 +348,7 @@ or in R:
```{r}
# note the two excluded files which should not be styled
styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))
+
```
The styler package will fix many styling errors, though not all lintr errors are automatically fixable with styler. The list of files we intentionally do not style is in `r/.styler_excludes.R`.