33 changes: 15 additions & 18 deletions docs/source/developers/python.rst
@@ -252,7 +252,8 @@ folder as the repositories and a target installation folder:

virtualenv pyarrow
source ./pyarrow/bin/activate
- pip install six numpy pandas cython pytest hypothesis
+ pip install -r arrow/python/requirements-build.txt \
+ -r arrow/python/requirements-test.txt

# This is the folder where we will install the Arrow libraries during
# development
@@ -281,9 +282,6 @@ Now build and install the Arrow C++ libraries:

cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=lib \
- -DARROW_FLIGHT=ON \
- -DARROW_GANDIVA=ON \
- -DARROW_ORC=ON \
-DARROW_WITH_BZ2=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
@@ -292,24 +290,23 @@ Now build and install the Arrow C++ libraries:
-DARROW_WITH_BROTLI=ON \
-DARROW_PARQUET=ON \
-DARROW_PYTHON=ON \
- -DARROW_PLASMA=ON \
-DARROW_BUILD_TESTS=ON \
..
make -j4
make install
popd

- Many of these components are optional, and can be switched off by setting them
- to ``OFF``:
+ There are a number of optional components that can be switched on by setting
+ the corresponding flag to ``ON``:

* ``ARROW_FLIGHT``: RPC framework
* ``ARROW_GANDIVA``: LLVM-based expression compiler
* ``ARROW_ORC``: Support for Apache ORC file format
* ``ARROW_PARQUET``: Support for Apache Parquet file format
* ``ARROW_PLASMA``: Shared memory object store
- * ``ARROW_WITH_{BZ2, ZLIB, ZSTD, LZ4, SNAPPY, BROTLI}``: Build support for
- compression libraries, used for reading and writing Parquet files and other
- things.

+ Any component set to ``ON`` above can also be turned off by setting it to
+ ``OFF``. Note that some compression libraries are needed for Parquet support.
+
If multiple versions of Python are installed in your environment, you may have
to pass additional parameters to cmake so that it can find the right
@@ -339,9 +336,6 @@ Now, build pyarrow:
.. code-block:: shell

pushd arrow/python
- export PYARROW_WITH_FLIGHT=1
- export PYARROW_WITH_GANDIVA=1
- export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
python setup.py build_ext --inplace
popd
@@ -361,6 +355,14 @@ libraries), one can set ``--bundle-arrow-cpp``:
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
--bundle-arrow-cpp bdist_wheel

+ Docker examples
+ ~~~~~~~~~~~~~~~
+
+ If you are having difficulty building the Python library from source, take a
+ look at the ``python/examples/minimal_build`` directory, which illustrates a
+ complete build and test from source with both the conda and pip/virtualenv
+ build methods.
+
Building with CUDA support
~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -462,7 +464,6 @@ Let's configure, build and install the Arrow C++ libraries:
cmake -G "%PYARROW_CMAKE_GENERATOR%" ^
-DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-DARROW_CXXFLAGS="/WX /MP" ^
- -DARROW_GANDIVA=on ^
-DARROW_PARQUET=on ^
-DARROW_PYTHON=on ^
..
@@ -474,7 +475,6 @@ Now, we can build pyarrow:
.. code-block:: shell

pushd arrow\python
- set PYARROW_WITH_GANDIVA=1
set PYARROW_WITH_PARQUET=1
python setup.py build_ext --inplace
popd
@@ -528,7 +528,6 @@ configuration of the Arrow C++ library build:
cmake -G "%PYARROW_CMAKE_GENERATOR%" ^
-DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-DARROW_CXXFLAGS="/WX /MP" ^
- -DARROW_GANDIVA=on ^
-DARROW_PARQUET=on ^
-DARROW_PYTHON=on ^
-DARROW_BUILD_TESTS=ON ^
@@ -557,8 +556,6 @@ To run all tests of the Arrow C++ library, you can also run ``ctest``:
ctest
popd

-
-
Windows Caveats
---------------

31 changes: 31 additions & 0 deletions python/examples/minimal_build/Dockerfile.fedora
@@ -0,0 +1,31 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

FROM fedora:31

RUN dnf update -y && \
dnf install -y \
autoconf \
gcc \
gcc-c++ \
git \
wget \
make \
cmake \
ninja-build \
python3-devel \
python3-virtualenv
38 changes: 38 additions & 0 deletions python/examples/minimal_build/Dockerfile.ubuntu
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

FROM ubuntu:bionic

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update -y -q && \
apt-get install -y -q --no-install-recommends \
apt-transport-https \
software-properties-common \
wget && \
apt-get install -y -q --no-install-recommends \
build-essential \
cmake \
git \
ninja-build \
python3-dev \
python3-pip && \
apt-get clean && rm -rf /var/lib/apt/lists/*

RUN pip3 install wheel && \
pip3 install -U setuptools && \
pip3 install virtualenv
61 changes: 61 additions & 0 deletions python/examples/minimal_build/README.md
@@ -0,0 +1,61 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Minimal Python source build on Linux

This directory shows how to bootstrap a local build from source on Linux with
an eye toward maximum portability across different Linux distributions. It
may help contributors who are debugging build issues caused by their local
environments.

## Fedora 31

Build image:

```
docker build -t arrow_fedora_minimal -f Dockerfile.fedora .
```

Build with conda or pip/virtualenv:

```
# With pip/virtualenv
docker run --rm -t -i -v $PWD:/io arrow_fedora_minimal /io/build_venv.sh

# With conda
docker run --rm -t -i -v $PWD:/io arrow_fedora_minimal /io/build_conda.sh
```

## Ubuntu 18.04

Build image:

```
docker build -t arrow_ubuntu_minimal -f Dockerfile.ubuntu .
```
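
Both images can also be built in one pass. The function below is a sketch, not a script shipped in this directory: `build_minimal_images` is a hypothetical name, and it assumes it is run from this directory, tagging each `Dockerfile.<distro>` as `arrow_<distro>_minimal` to match the tags used above. Setting `DRY_RUN=1` prints the `docker build` commands instead of running them.

```shell
#!/bin/bash
# Hypothetical helper (not part of this directory): build one image per
# Dockerfile.<distro> found in the current directory.
# The body runs in a subshell, so `shopt` and variables do not leak out.
build_minimal_images() (
  shopt -s nullglob  # no Dockerfile.* files means zero loop iterations
  for dockerfile in Dockerfile.*; do
    distro="${dockerfile#Dockerfile.}"
    # Same tag scheme as the commands above; "." is the build context
    cmd=(docker build -t "arrow_${distro}_minimal" -f "$dockerfile" .)
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"          # preview only
    else
      "${cmd[@]}" || exit 1     # exits the subshell, i.e. returns 1
    fi
  done
)
```

For example, `DRY_RUN=1 build_minimal_images` previews the two commands, and `build_minimal_images` on its own runs them.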

Build with conda or pip/virtualenv:

```
# With pip/virtualenv
docker run --rm -t -i -v $PWD:/io arrow_ubuntu_minimal /io/build_venv.sh

# With conda
docker run --rm -t -i -v $PWD:/io arrow_ubuntu_minimal /io/build_conda.sh
```
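
`build_conda.sh` enables each optional component in two places: `-DARROW_<NAME>=ON` for the C++ CMake build and `PYARROW_WITH_<NAME>=1` for the pyarrow build (for example `-DARROW_FLIGHT=ON` and `PYARROW_WITH_FLIGHT=1`). A pyarrow component generally needs its C++ counterpart to have been built, so the two lists must agree. A minimal sketch of deriving both from one list, assuming the component names follow that pattern; `arrow_components` and `cmake_component_flags` are hypothetical names, not used by the scripts here:

```shell
#!/bin/bash
# Sketch (not part of build_conda.sh): one list of component names drives
# both the C++ CMake flags and the pyarrow build environment variables.
arrow_components=(FLIGHT PARQUET)

cmake_component_flags=()
for name in "${arrow_components[@]}"; do
  # C++ side, e.g. -DARROW_FLIGHT=ON
  cmake_component_flags+=("-DARROW_${name}=ON")
  # Python side, e.g. PYARROW_WITH_FLIGHT=1, read when building pyarrow
  export "PYARROW_WITH_${name}=1"
done

echo "cmake flags: ${cmake_component_flags[*]}"
```

The array would then be appended to the existing `cmake` arguments as `"${cmake_component_flags[@]}"`, and the exported variables are visible to the later `python setup.py` step in the same shell.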
119 changes: 119 additions & 0 deletions python/examples/minimal_build/build_conda.sh
@@ -0,0 +1,119 @@
#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

set -e

#----------------------------------------------------------------------
# Change this to whatever makes sense for your system

HOME=  # empty: the paths below resolve under / (e.g. /miniconda-for-arrow), fine in a container
MINICONDA=$HOME/miniconda-for-arrow
LIBRARY_INSTALL_DIR=$HOME/local-libs
CPP_BUILD_DIR=$HOME/arrow-cpp-build
ARROW_ROOT=/arrow
PYTHON=3.7

git clone https://github.com/apache/arrow.git $ARROW_ROOT

#----------------------------------------------------------------------
# Run these only once

function setup_miniconda() {
MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
wget -O miniconda.sh $MINICONDA_URL
bash miniconda.sh -b -p $MINICONDA
rm -f miniconda.sh
LOCAL_PATH=$PATH
export PATH="$MINICONDA/bin:$PATH"

conda update -y -q conda
conda config --set auto_update_conda false
conda info -a

conda config --set show_channel_urls True
conda config --add channels https://repo.continuum.io/pkgs/free
conda config --add channels conda-forge

conda create -y -n pyarrow-$PYTHON -c conda-forge \
--file $ARROW_ROOT/ci/conda_env_unix.yml \
--file $ARROW_ROOT/ci/conda_env_cpp.yml \
--file $ARROW_ROOT/ci/conda_env_python.yml \
compilers \
python=$PYTHON \
pandas

export PATH=$LOCAL_PATH
}

setup_miniconda

#----------------------------------------------------------------------
# Activate conda in bash and activate conda environment

. $MINICONDA/etc/profile.d/conda.sh
conda activate pyarrow-$PYTHON
export ARROW_HOME=$CONDA_PREFIX

#----------------------------------------------------------------------
# Build C++ library

mkdir -p $CPP_BUILD_DIR
pushd $CPP_BUILD_DIR

cmake -GNinja \
-DCMAKE_BUILD_TYPE=DEBUG \
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=lib \
-DARROW_FLIGHT=ON \
-DARROW_WITH_BZ2=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_SNAPPY=ON \
-DARROW_WITH_BROTLI=ON \
-DARROW_PARQUET=ON \
-DARROW_PLASMA=ON \
-DARROW_PYTHON=ON \
$ARROW_ROOT/cpp

ninja install

popd

#----------------------------------------------------------------------
# Build and test Python library
pushd $ARROW_ROOT/python

rm -rf build/ # remove any pesky pre-existing build directory

export PYARROW_BUILD_TYPE=Debug
export PYARROW_CMAKE_GENERATOR=Ninja
export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_PARQUET=1

# Either "develop" or "build_ext --inplace" works; use whichever you prefer

# python setup.py build_ext --inplace
python setup.py develop

# git submodules are required for unit tests
git submodule update --init
export PARQUET_TEST_DATA="$ARROW_ROOT/cpp/submodules/parquet-testing/data"
export ARROW_TEST_DATA="$ARROW_ROOT/testing/data"

py.test pyarrow