73 changes: 48 additions & 25 deletions python/DEVELOPMENT.md
@@ -16,36 +16,41 @@

### Linux and macOS

-First, set up your thirdparty C++ toolchain using libraries from conda-forge:
+#### System Requirements
+
+On macOS, any modern XCode (6.4 or higher; the current version is 8.3.1) is
+sufficient.
+
+On Linux, for this guide, we recommend using gcc 4.8 or 4.9, or clang 3.7 or
+higher. You can check your version by running

```shell
-conda config --add channels conda-forge
+$ gcc --version
+```

-export ARROW_BUILD_TYPE=Release
+On Ubuntu 16.04 and higher, you can obtain gcc 4.9 with:

-export CPP_TOOLCHAIN=$HOME/cpp-toolchain
-export LD_LIBRARY_PATH=$CPP_TOOLCHAIN/lib:$LD_LIBRARY_PATH
+```shell
+$ sudo apt-get install g++-4.9
+```

-export BOOST_ROOT=$CPP_TOOLCHAIN
-export FLATBUFFERS_HOME=$CPP_TOOLCHAIN
-export RAPIDJSON_HOME=$CPP_TOOLCHAIN
-export THRIFT_HOME=$CPP_TOOLCHAIN
-export ZLIB_HOME=$CPP_TOOLCHAIN
-export SNAPPY_HOME=$CPP_TOOLCHAIN
-export BROTLI_HOME=$CPP_TOOLCHAIN
-export JEMALLOC_HOME=$CPP_TOOLCHAIN
-export ARROW_HOME=$CPP_TOOLCHAIN
-export PARQUET_HOME=$CPP_TOOLCHAIN
+Finally, set gcc 4.9 as the active compiler using:

-conda create -y -q -p $CPP_TOOLCHAIN \
-flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib brotli jemalloc
+```shell
+export CC=gcc-4.9
Contributor

Should this be `export CC=$(which gcc-4.9)`? (and same for `CXX`)

Member Author

Either works, but I'll change to the more explicit variety.

Member Author

The output of `$(which FOO)` doesn't produce an executable command for me when `FOO` is found multiple times in the path. I'll leave this as is for now.

$ which gcc
gcc is /home/wesm/.conda/envs/$NAME/bin/gcc
gcc is /usr/bin/gcc

Contributor

Sounds good.

+export CXX=g++-4.9
```
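
The `$(which gcc-4.9)` question in the thread above comes down to getting exactly one path per compiler. As a minimal sketch, assuming a POSIX shell, `command -v` prints a single result for a name found on `PATH`. The `resolve_compiler` helper is a hypothetical name used for illustration and is not part of this change:

```shell
# Hypothetical helper: resolve a compiler name to a single path.
# `command -v` is POSIX and prints one result (the first match on PATH).
resolve_compiler() {
  command -v "$1" 2>/dev/null || {
    echo "compiler not found: $1" >&2
    return 1
  }
}

# Assumes gcc-4.9 and g++-4.9 are installed, as in the guide above.
# If either is missing, CC and CXX are left unchanged.
if cc_path=$(resolve_compiler gcc-4.9) && cxx_path=$(resolve_compiler g++-4.9); then
  export CC="$cc_path" CXX="$cxx_path"
fi
```

Note that for shell aliases or functions `command -v` prints the alias definition or name rather than a path, so this only helps when the compiler is a regular executable.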

-Now, activate a conda environment containing your target Python version and
-NumPy installed:
+#### Environment Setup and Build
+
+First, let's create a conda environment with all the C++ build and Python
+dependencies from conda-forge:

```shell
-conda create -y -q -n pyarrow-dev python=3.6 numpy
+conda create -y -q -n pyarrow-dev \
+python=3.6 numpy six setuptools cython pandas pytest \
+cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
+brotli jemalloc -c conda-forge
source activate pyarrow-dev
```

@@ -67,14 +72,34 @@ drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 arrow/
drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 parquet-cpp/
```

+We need to set a number of environment variables to let Arrow's build system
+know about our build toolchain:
+
+```
+export ARROW_BUILD_TYPE=release
+
+export BOOST_ROOT=$CONDA_PREFIX
Member

We should definitely invest in consolidating all of those variables into a single one. Even better, we could detect whether we're in a conda env and prefer the libraries installed there.

Member

We could, e.g., first search for all dependencies in `$CMAKE_INSTALL_PREFIX`.

Member Author

@wesm, Apr 18, 2017

What if we had an $ARROW_BUILD_TOOLCHAIN that automatically set all of the *_HOME variables?

Member

Yes, I would hope that is in scope :)

Member Author

See ARROW-849 and PARQUET-957.

+export BOOST_LIBRARYDIR=$CONDA_PREFIX/lib
Contributor

Will this need to be `lib64` when the system is 64-bit?

Member Author

Contributor

Cool.


+export FLATBUFFERS_HOME=$CONDA_PREFIX
+export RAPIDJSON_HOME=$CONDA_PREFIX
+export THRIFT_HOME=$CONDA_PREFIX
+export ZLIB_HOME=$CONDA_PREFIX
+export SNAPPY_HOME=$CONDA_PREFIX
+export BROTLI_HOME=$CONDA_PREFIX
+export JEMALLOC_HOME=$CONDA_PREFIX
+export ARROW_HOME=$CONDA_PREFIX
+export PARQUET_HOME=$CONDA_PREFIX
+```
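
The review thread above proposes a single `$ARROW_BUILD_TOOLCHAIN` variable that sets all of the `*_HOME` variables. A rough sketch of that idea in shell follows. It is only an illustration of the proposal, not something this change implements: the `set_toolchain_homes` function is a hypothetical name, and the `/usr/local` fallback is an arbitrary placeholder for when neither variable is set.

```shell
# Hypothetical helper: derive every toolchain variable exported in this
# guide from a single prefix passed as the first argument.
set_toolchain_homes() {
  for lib in FLATBUFFERS RAPIDJSON THRIFT ZLIB SNAPPY BROTLI JEMALLOC ARROW PARQUET; do
    export "${lib}_HOME=$1"
  done
  # Boost uses differently named variables in the guide above.
  export BOOST_ROOT="$1"
  export BOOST_LIBRARYDIR="$1/lib"
}

# Prefer an explicit toolchain, then the active conda environment.
# The /usr/local fallback is an assumption made for this sketch only.
ARROW_BUILD_TOOLCHAIN="${ARROW_BUILD_TOOLCHAIN:-${CONDA_PREFIX:-/usr/local}}"
set_toolchain_homes "$ARROW_BUILD_TOOLCHAIN"
```

Defaulting to `$CONDA_PREFIX` also covers the other suggestion in the thread, preferring the libraries of an active conda environment, without any detection logic in the build system itself.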

Now build and install the Arrow C++ libraries:

```shell
mkdir arrow/cpp/build
pushd arrow/cpp/build

cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
--DCMAKE_INSTALL_PREFIX=$CPP_TOOLCHAIN \
+-DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
-DARROW_PYTHON=on \
-DARROW_BUILD_TESTS=OFF \
Contributor

Tiny nit: Maybe make the case consistent here.

..
@@ -90,7 +115,7 @@ mkdir parquet-cpp/build
pushd parquet-cpp/build

cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
--DCMAKE_INSTALL_PREFIX=$CPP_TOOLCHAIN \
+-DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
-DPARQUET_BUILD_BENCHMARKS=off \
-DPARQUET_BUILD_EXECUTABLES=off \
-DPARQUET_ZLIB_VENDORED=off \
@@ -102,11 +127,9 @@ make install
popd
```

-Now, install requisite build requirements for pyarrow, then build:
+Now, build pyarrow:

```shell
-conda install -y -q six setuptools cython pandas pytest
-
cd arrow/python
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --inplace
```
2 changes: 1 addition & 1 deletion python/setup.py
@@ -155,7 +155,7 @@ def _run_cmake(self):
cmake_options.append('-DPYARROW_BUNDLE_ARROW_CPP=ON')

cmake_options.append('-DCMAKE_BUILD_TYPE={0}'
-.format(self.build_type))
+.format(self.build_type.lower()))

if sys.platform != 'win32':
cmake_command = (['cmake', self.extra_cmake_args] +
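
The `setup.py` change above lowercases the build type before it is passed to CMake, so `Release` and `release` produce the same `-DCMAKE_BUILD_TYPE` value. The same normalization in a shell wrapper might look like the sketch below. The `normalize_build_type` function is a hypothetical name for illustration, and the `Release` default is only an assumption for when `ARROW_BUILD_TYPE` is unset:

```shell
# Hypothetical helper: lowercase a build type string, mirroring the
# `.lower()` call added in setup.py.
normalize_build_type() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Falls back to "Release" when ARROW_BUILD_TYPE is unset (assumption).
ARROW_BUILD_TYPE=$(normalize_build_type "${ARROW_BUILD_TYPE:-Release}")
echo "-DCMAKE_BUILD_TYPE=${ARROW_BUILD_TYPE}"
```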