ARROW-848: [Python] Another pass on conda dev guide #562
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,36 +16,41 @@ | |
|
|
||
| ### Linux and macOS | ||
|
|
||
| First, set up your thirdparty C++ toolchain using libraries from conda-forge: | ||
| #### System Requirements | ||
|
|
||
| On macOS, any modern XCode (6.4 or higher; the current version is 8.3.1) is | ||
| sufficient. | ||
|
|
||
| On Linux, for this guide, we recommend using gcc 4.8 or 4.9, or clang 3.7 or | ||
| higher. You can check your version by running | ||
|
|
||
| ```shell | ||
| conda config --add channels conda-forge | ||
| $ gcc --version | ||
| ``` | ||
|
|
||
| export ARROW_BUILD_TYPE=Release | ||
| On Ubuntu 16.04 and higher, you can obtain gcc 4.9 with: | ||
|
|
||
| export CPP_TOOLCHAIN=$HOME/cpp-toolchain | ||
| export LD_LIBRARY_PATH=$CPP_TOOLCHAIN/lib:$LD_LIBRARY_PATH | ||
| ```shell | ||
| $ sudo apt-get install g++-4.9 | ||
| ``` | ||
|
|
||
| export BOOST_ROOT=$CPP_TOOLCHAIN | ||
| export FLATBUFFERS_HOME=$CPP_TOOLCHAIN | ||
| export RAPIDJSON_HOME=$CPP_TOOLCHAIN | ||
| export THRIFT_HOME=$CPP_TOOLCHAIN | ||
| export ZLIB_HOME=$CPP_TOOLCHAIN | ||
| export SNAPPY_HOME=$CPP_TOOLCHAIN | ||
| export BROTLI_HOME=$CPP_TOOLCHAIN | ||
| export JEMALLOC_HOME=$CPP_TOOLCHAIN | ||
| export ARROW_HOME=$CPP_TOOLCHAIN | ||
| export PARQUET_HOME=$CPP_TOOLCHAIN | ||
| Finally, set gcc 4.9 as the active compiler using: | ||
|
|
||
| conda create -y -q -p $CPP_TOOLCHAIN \ | ||
| flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib brotli jemalloc | ||
| ```shell | ||
| export CC=gcc-4.9 | ||
| export CXX=g++-4.9 | ||
| ``` | ||
|
|
||
| Now, activate a conda environment containing your target Python version and | ||
| NumPy installed: | ||
| #### Environment Setup and Build | ||
|
|
||
| First, let's create a conda environment with all the C++ build and Python | ||
| dependencies from conda-forge: | ||
|
|
||
| ```shell | ||
| conda create -y -q -n pyarrow-dev python=3.6 numpy | ||
| conda create -y -q -n pyarrow-dev \ | ||
| python=3.6 numpy six setuptools cython pandas pytest \ | ||
| cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \ | ||
| brotli jemalloc -c conda-forge | ||
| source activate pyarrow-dev | ||
| ``` | ||
|
|
||
|
|
@@ -67,14 +72,34 @@ drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 arrow/ | |
| drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 parquet-cpp/ | ||
| ``` | ||
|
|
||
| We need to set a number of environment variables to let Arrow's build system | ||
| know about our build toolchain: | ||
|
|
||
| ``` | ||
| export ARROW_BUILD_TYPE=release | ||
|
|
||
| export BOOST_ROOT=$CONDA_PREFIX | ||
|
Member
We should definitely invest in collapsing all of those variables into a single one. Even better, we could detect whether we're in a conda env and prefer the libraries installed there.
Member
We could e.g. just try and search first for all dependencies in
Member
Author
What if we had an $ARROW_BUILD_TOOLCHAIN that automatically set all of the *_HOME variables?
Member
Yes, that is in scope I would hope for :)
Member
Author
See ARROW-849 and PARQUET-957. |
||
| export BOOST_LIBRARYDIR=$CONDA_PREFIX/lib | ||
|
Contributor
Will this need to be
Member
Author
conda-forge hardcodes lib: https://github.com/conda-forge/boost-cpp-feedstock/blob/master/recipe/build.sh#L14
Contributor
Cool. |
||
|
|
||
| export FLATBUFFERS_HOME=$CONDA_PREFIX | ||
| export RAPIDJSON_HOME=$CONDA_PREFIX | ||
| export THRIFT_HOME=$CONDA_PREFIX | ||
| export ZLIB_HOME=$CONDA_PREFIX | ||
| export SNAPPY_HOME=$CONDA_PREFIX | ||
| export BROTLI_HOME=$CONDA_PREFIX | ||
| export JEMALLOC_HOME=$CONDA_PREFIX | ||
| export ARROW_HOME=$CONDA_PREFIX | ||
| export PARQUET_HOME=$CONDA_PREFIX | ||
| ``` | ||
|
|
||
| Now build and install the Arrow C++ libraries: | ||
|
|
||
| ```shell | ||
| mkdir arrow/cpp/build | ||
| pushd arrow/cpp/build | ||
|
|
||
| cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \ | ||
| -DCMAKE_INSTALL_PREFIX=$CPP_TOOLCHAIN \ | ||
| -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \ | ||
| -DARROW_PYTHON=on \ | ||
| -DARROW_BUILD_TESTS=OFF \ | ||
|
Contributor
Tiny nit: Maybe make the case consistent here. |
||
| .. | ||
|
|
@@ -90,7 +115,7 @@ mkdir parquet-cpp/build | |
| pushd parquet-cpp/build | ||
|
|
||
| cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \ | ||
| -DCMAKE_INSTALL_PREFIX=$CPP_TOOLCHAIN \ | ||
| -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \ | ||
| -DPARQUET_BUILD_BENCHMARKS=off \ | ||
| -DPARQUET_BUILD_EXECUTABLES=off \ | ||
| -DPARQUET_ZLIB_VENDORED=off \ | ||
|
|
@@ -102,11 +127,9 @@ make install | |
| popd | ||
| ``` | ||
|
|
||
| Now, install requisite build requirements for pyarrow, then build: | ||
| Now, build pyarrow: | ||
|
|
||
| ```shell | ||
| conda install -y -q six setuptools cython pandas pytest | ||
|
|
||
| cd arrow/python | ||
| python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --inplace | ||
| ``` | ||
|
|
||
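A rough sketch of the single-variable idea raised in the review thread above, for discussion only. `ARROW_BUILD_TOOLCHAIN` is the name floated in the comments; the function `set_arrow_toolchain` is hypothetical, and nothing in the build system at the time of this PR is assumed to read either. It falls back to `$CONDA_PREFIX` when no root is given, which is one way to "prefer the libraries installed" in an active conda env.

```shell
# Hypothetical helper, not part of Arrow: derive every toolchain variable
# listed in the guide from a single root directory.
# Written for POSIX sh, so `toolchain` and `name` are plain (global) variables.
set_arrow_toolchain() {
  # Precedence: explicit argument, then ARROW_BUILD_TOOLCHAIN, then CONDA_PREFIX.
  toolchain="${1:-${ARROW_BUILD_TOOLCHAIN:-${CONDA_PREFIX:-}}}"
  if [ -z "$toolchain" ]; then
    echo "no toolchain root given and CONDA_PREFIX is unset" >&2
    return 1
  fi

  export ARROW_BUILD_TOOLCHAIN="$toolchain"
  export BOOST_ROOT="$toolchain"
  export BOOST_LIBRARYDIR="$toolchain/lib"

  # The remaining variables in the guide all point at the same prefix.
  for name in FLATBUFFERS RAPIDJSON THRIFT ZLIB SNAPPY BROTLI JEMALLOC ARROW PARQUET; do
    export "${name}_HOME=$toolchain"
  done
}
```

With a conda env active, calling `set_arrow_toolchain` with no arguments would stand in for the block of `export` lines in the guide; ARROW-849 and PARQUET-957 track the actual build-system change.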
Should this be export CC=$(which gcc-4.9)? (and same for CXX)

Either works, but I'll change to the more explicit variety.

The output of $(which FOO) doesn't produce an executable command for me when FOO is found multiple times in the path. I'll leave this as is for now.

Sounds good.
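For anyone who still wants absolute compiler paths, one possible alternative to `which` is the POSIX `command -v` builtin, which prints only the first match on `PATH` for an executable. This is a sketch, not what the PR does: `resolve_tool` is a hypothetical name, and whether it avoids the multiple-match behavior described above depends on the shell and on any aliases or functions shadowing the tool name.

```shell
# Hypothetical helper: resolve a tool name to the first match on PATH.
# Returns non-zero (and prints nothing) when the tool is not found.
resolve_tool() {
  command -v "$1"
}

# Export CC/CXX only when the requested compilers are actually installed;
# otherwise leave any existing values untouched.
if cc_path="$(resolve_tool gcc-4.9)"; then
  export CC="$cc_path"
fi
if cxx_path="$(resolve_tool g++-4.9)"; then
  export CXX="$cxx_path"
fi
```

The plain `export CC=gcc-4.9` form kept in the guide remains the simpler choice when the compiler is already on `PATH`.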