diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst index 7cb340dbbb0..50eba0a3d76 100644 --- a/docs/source/developers/python.rst +++ b/docs/source/developers/python.rst @@ -252,7 +252,8 @@ folder as the repositories and a target installation folder: virtualenv pyarrow source ./pyarrow/bin/activate - pip install six numpy pandas cython pytest hypothesis + pip install -r arrow/python/requirements-build.txt \ + -r arrow/python/requirements-test.txt # This is the folder where we will install the Arrow libraries during # development @@ -281,9 +282,6 @@ Now build and install the Arrow C++ libraries: cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \ -DCMAKE_INSTALL_LIBDIR=lib \ - -DARROW_FLIGHT=ON \ - -DARROW_GANDIVA=ON \ - -DARROW_ORC=ON \ -DARROW_WITH_BZ2=ON \ -DARROW_WITH_ZLIB=ON \ -DARROW_WITH_ZSTD=ON \ @@ -292,24 +290,23 @@ Now build and install the Arrow C++ libraries: -DARROW_WITH_BROTLI=ON \ -DARROW_PARQUET=ON \ -DARROW_PYTHON=ON \ - -DARROW_PLASMA=ON \ -DARROW_BUILD_TESTS=ON \ .. make -j4 make install popd -Many of these components are optional, and can be switched off by setting them -to ``OFF``: +There are a number of optional components that can be switched ON by +adding flags with ``ON``: * ``ARROW_FLIGHT``: RPC framework * ``ARROW_GANDIVA``: LLVM-based expression compiler * ``ARROW_ORC``: Support for Apache ORC file format * ``ARROW_PARQUET``: Support for Apache Parquet file format * ``ARROW_PLASMA``: Shared memory object store -* ``ARROW_WITH_{BZ2, ZLIB, ZSTD, LZ4, SNAPPY, BROTLI}``: Build support for - compression libraries, used for reading and writing Parquet files and other - things. + +Anything set to ``ON`` above can also be turned off. Note that some compression +libraries are needed for Parquet support. If multiple versions of Python are installed in your environment, you may have to pass additional parameters to cmake so that it can find the right @@ -339,9 +336,6 @@ Now, build pyarrow: ..
code-block:: shell pushd arrow/python - export PYARROW_WITH_FLIGHT=1 - export PYARROW_WITH_GANDIVA=1 - export PYARROW_WITH_ORC=1 export PYARROW_WITH_PARQUET=1 python setup.py build_ext --inplace popd @@ -361,6 +355,14 @@ libraries), one can set ``--bundle-arrow-cpp``: python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \ --bundle-arrow-cpp bdist_wheel +Docker examples +~~~~~~~~~~~~~~~ + +If you are having difficulty building the Python library from source, take a +look at the ``python/examples/minimal_build`` directory which illustrates a +complete build and test from source both with the conda and pip/virtualenv +build methods. + Building with CUDA support ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -462,7 +464,6 @@ Let's configure, build and install the Arrow C++ libraries: cmake -G "%PYARROW_CMAKE_GENERATOR%" ^ -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^ -DARROW_CXXFLAGS="/WX /MP" ^ - -DARROW_GANDIVA=on ^ -DARROW_PARQUET=on ^ -DARROW_PYTHON=on ^ .. @@ -474,7 +475,6 @@ Now, we can build pyarrow: .. code-block:: shell pushd arrow\python - set PYARROW_WITH_GANDIVA=1 set PYARROW_WITH_PARQUET=1 python setup.py build_ext --inplace popd @@ -528,7 +528,6 @@ configuration of the Arrow C++ library build: cmake -G "%PYARROW_CMAKE_GENERATOR%" ^ -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^ -DARROW_CXXFLAGS="/WX /MP" ^ - -DARROW_GANDIVA=on ^ -DARROW_PARQUET=on ^ -DARROW_PYTHON=on ^ -DARROW_BUILD_TESTS=ON ^ @@ -557,8 +556,6 @@ To run all tests of the Arrow C++ library, you can also run ``ctest``: ctest popd - - Windows Caveats --------------- diff --git a/python/examples/minimal_build/Dockerfile.fedora b/python/examples/minimal_build/Dockerfile.fedora new file mode 100644 index 00000000000..7dc329193c9 --- /dev/null +++ b/python/examples/minimal_build/Dockerfile.fedora @@ -0,0 +1,31 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +FROM fedora:31 + +RUN dnf update -y && \ + dnf install -y \ + autoconf \ + gcc \ + gcc-g++ \ + git \ + wget \ + make \ + cmake \ + ninja-build \ + python3-devel \ + python3-virtualenv \ No newline at end of file diff --git a/python/examples/minimal_build/Dockerfile.ubuntu b/python/examples/minimal_build/Dockerfile.ubuntu new file mode 100644 index 00000000000..d7b84085e90 --- /dev/null +++ b/python/examples/minimal_build/Dockerfile.ubuntu @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +FROM ubuntu:bionic + +ENV DEBIAN_FRONTEND=noninteractive + +RUN apt-get update -y -q && \ + apt-get install -y -q --no-install-recommends \ + apt-transport-https \ + software-properties-common \ + wget && \ + apt-get install -y -q --no-install-recommends \ + build-essential \ + cmake \ + git \ + ninja-build \ + python3-dev \ + python3-pip && \ + apt-get clean && rm -rf /var/lib/apt/lists* + +RUN pip3 install wheel && \ + pip3 install -U setuptools && \ + pip3 install wheel virtualenv \ No newline at end of file diff --git a/python/examples/minimal_build/README.md b/python/examples/minimal_build/README.md new file mode 100644 index 00000000000..5164bd819de --- /dev/null +++ b/python/examples/minimal_build/README.md @@ -0,0 +1,61 @@ + + +# Minimal Python source build on Linux + +This directory shows how to bootstrap a local build from source on Linux with +an eye toward maximum portability across different Linux distributions. This +may help contributors debug build issues caused by their local +environments.
+ +## Fedora 31 + +Build image: + +``` +docker build -t arrow_fedora_minimal -f Dockerfile.fedora . +``` + +Build with conda or pip/virtualenv: + +``` +# With pip/virtualenv +docker run --rm -t -i -v $PWD:/io arrow_fedora_minimal /io/build_venv.sh + +# With conda +docker run --rm -t -i -v $PWD:/io arrow_fedora_minimal /io/build_conda.sh +``` + +## Ubuntu 18.04 + +Build image: + +``` +docker build -t arrow_ubuntu_minimal -f Dockerfile.ubuntu . +``` + +Build with conda or pip/virtualenv: + +``` +# With pip/virtualenv +docker run --rm -t -i -v $PWD:/io arrow_ubuntu_minimal /io/build_venv.sh + +# With conda +docker run --rm -t -i -v $PWD:/io arrow_ubuntu_minimal /io/build_conda.sh +``` diff --git a/python/examples/minimal_build/build_conda.sh b/python/examples/minimal_build/build_conda.sh new file mode 100755 index 00000000000..6f93ebd5647 --- /dev/null +++ b/python/examples/minimal_build/build_conda.sh @@ -0,0 +1,119 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License.
+ +set -e + +#---------------------------------------------------------------------- +# Change this to whatever makes sense for your system + +HOME= +MINICONDA=$HOME/miniconda-for-arrow +LIBRARY_INSTALL_DIR=$HOME/local-libs +CPP_BUILD_DIR=$HOME/arrow-cpp-build +ARROW_ROOT=/arrow +PYTHON=3.7 + +git clone https://github.com/apache/arrow.git /arrow + +#---------------------------------------------------------------------- +# Run these only once + +function setup_miniconda() { + MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh" + wget -O miniconda.sh $MINICONDA_URL + bash miniconda.sh -b -p $MINICONDA + rm -f miniconda.sh + LOCAL_PATH=$PATH + export PATH="$MINICONDA/bin:$PATH" + + conda update -y -q conda + conda config --set auto_update_conda false + conda info -a + + conda config --set show_channel_urls True + conda config --add channels https://repo.continuum.io/pkgs/free + conda config --add channels conda-forge + + conda create -y -n pyarrow-$PYTHON -c conda-forge \ + --file arrow/ci/conda_env_unix.yml \ + --file arrow/ci/conda_env_cpp.yml \ + --file arrow/ci/conda_env_python.yml \ + compilers \ + python=3.7 \ + pandas + + export PATH=$LOCAL_PATH +} + +setup_miniconda + +#---------------------------------------------------------------------- +# Activate conda in bash and activate conda environment + +. 
$MINICONDA/etc/profile.d/conda.sh +conda activate pyarrow-$PYTHON +export ARROW_HOME=$CONDA_PREFIX + +#---------------------------------------------------------------------- +# Build C++ library + +mkdir -p $CPP_BUILD_DIR +pushd $CPP_BUILD_DIR + +cmake -GNinja \ + -DCMAKE_BUILD_TYPE=DEBUG \ + -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \ + -DCMAKE_INSTALL_LIBDIR=lib \ + -DARROW_FLIGHT=ON \ + -DARROW_WITH_BZ2=ON \ + -DARROW_WITH_ZLIB=ON \ + -DARROW_WITH_ZSTD=ON \ + -DARROW_WITH_LZ4=ON \ + -DARROW_WITH_SNAPPY=ON \ + -DARROW_WITH_BROTLI=ON \ + -DARROW_PARQUET=ON \ + -DARROW_PLASMA=ON \ + -DARROW_PYTHON=ON \ + $ARROW_ROOT/cpp + +ninja install + +popd + +#---------------------------------------------------------------------- +# Build and test Python library +pushd $ARROW_ROOT/python + +rm -rf build/ # remove any pesky pre-existing build directory + +export PYARROW_BUILD_TYPE=Debug +export PYARROW_CMAKE_GENERATOR=Ninja +export PYARROW_WITH_FLIGHT=1 +export PYARROW_WITH_PARQUET=1 + +# You can run either "develop" or "build_ext --inplace". Your pick + +# python setup.py build_ext --inplace +python setup.py develop + +# git submodules are required for unit tests +git submodule update --init +export PARQUET_TEST_DATA="$ARROW_ROOT/cpp/submodules/parquet-testing/data" +export ARROW_TEST_DATA="$ARROW_ROOT/testing/data" + +py.test pyarrow diff --git a/python/examples/minimal_build/build_venv.sh b/python/examples/minimal_build/build_venv.sh new file mode 100755 index 00000000000..b7dfc2061da --- /dev/null +++ b/python/examples/minimal_build/build_venv.sh @@ -0,0 +1,83 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +set -e + +#---------------------------------------------------------------------- +# Change this to whatever makes sense for your system + +MINICONDA=$HOME/miniconda-for-arrow +LIBRARY_INSTALL_DIR=$HOME/local-libs +CPP_BUILD_DIR=$HOME/arrow-cpp-build +ARROW_ROOT=/arrow +export ARROW_HOME=/dist +export LD_LIBRARY_PATH=/dist/lib:$LD_LIBRARY_PATH + +git clone https://github.com/apache/arrow.git /arrow + +virtualenv /venv +source /venv/bin/activate + +pip install -r /arrow/python/requirements-build.txt \ + -r /arrow/python/requirements-test.txt + +#---------------------------------------------------------------------- +# Build C++ library + +mkdir -p $CPP_BUILD_DIR +pushd $CPP_BUILD_DIR + +cmake -GNinja \ + -DCMAKE_BUILD_TYPE=DEBUG \ + -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \ + -DCMAKE_INSTALL_LIBDIR=lib \ + -DARROW_WITH_BZ2=ON \ + -DARROW_WITH_ZLIB=ON \ + -DARROW_WITH_ZSTD=ON \ + -DARROW_WITH_LZ4=ON \ + -DARROW_WITH_SNAPPY=ON \ + -DARROW_WITH_BROTLI=ON \ + -DARROW_PARQUET=ON \ + -DARROW_PYTHON=ON \ + $ARROW_ROOT/cpp + +ninja install + +popd + +#---------------------------------------------------------------------- +# Build and test Python library +pushd $ARROW_ROOT/python + +rm -rf build/ # remove any pesky pre-existing build directory + +export PYARROW_BUILD_TYPE=Debug +export PYARROW_CMAKE_GENERATOR=Ninja +export PYARROW_WITH_PARQUET=1 + +# You can run either "develop" or "build_ext --inplace". 
Your pick + +# python setup.py build_ext --inplace +python setup.py develop + +# git submodules are required for unit tests +git submodule update --init +export PARQUET_TEST_DATA="$ARROW_ROOT/cpp/submodules/parquet-testing/data" +export ARROW_TEST_DATA="$ARROW_ROOT/testing/data" + +py.test pyarrow
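
Note on the optional components: this patch drops Flight, Gandiva and ORC from the default instructions, so each one now has to be enabled twice, once as a `-DARROW_<NAME>=ON` CMake flag for the C++ build and once as a `PYARROW_WITH_<NAME>=1` variable for `setup.py`. The sketch below shows one way to keep the two in step. It is only an illustration: the helper name `optional_component_flags` is hypothetical and not part of the Arrow repository, and it assumes a component uses the same name on both sides, as the removed lines of this diff show for Flight, Gandiva and ORC.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not part of Arrow):
# for each component name, print the CMake flag and the matching setup.py
# environment variable on one line.
optional_component_flags() {
  local name
  for name in "$@"; do
    # Upper-case the name so "flight" becomes "FLIGHT".
    name=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
    echo "-DARROW_${name}=ON PYARROW_WITH_${name}=1"
  done
}

# Example: list the settings needed to opt back in to Flight and ORC.
optional_component_flags flight orc
```

The first column would be appended to the `cmake` invocation and the second exported before `python setup.py build_ext --inplace`. Enabling a component on only one side is a likely source of build or import errors, which is why the sketch derives both from a single list.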