From e51bce7f673f9da7b8ac9d7fa7f72cfd6fbc61ac Mon Sep 17 00:00:00 2001
From: MechCoder
Date: Wed, 13 Jul 2016 17:24:26 -0700
Subject: [PATCH 1/3] [ARROW-240] Update installation instructions for pyarrow

---
 python/README.md | 115 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 113 insertions(+), 2 deletions(-)

diff --git a/python/README.md b/python/README.md
index c79fa9786f4..5065fa33c07 100644
--- a/python/README.md
+++ b/python/README.md
@@ -4,11 +4,122 @@ This library provides a Pythonic API wrapper for the reference Arrow C++
 implementation, along with tools for interoperability with pandas, NumPy, and
 other traditional Python scientific computing packages.
 
-#### Development details
+### Development details
 
 This project is layered in two pieces:
 
 * pyarrow, a C++ library for easier interoperability between Arrow C++, NumPy,
   and pandas
 * Cython extensions and pure Python code under arrow/ which expose Arrow C++
-  and pyarrow to pure Python users
\ No newline at end of file
+  and pyarrow to pure Python users
+
+#### PyArrow Installation
+These are instructions on how to install PyArrow from scratch on Linux (assuming arrow is not yet installed)
+
+1. **g++ and gcc**
+
+   Make sure the latest versions of g++ and gcc are installed.
+   ```bash
+   sudo add-apt-repository ppa:ubuntu-toolchain-r/test
+   sudo apt-get update
+   sudo apt-get install gcc-4.9 g++-4.9
+   ```
+
+2. **cmake**
+   ```bash
+   sudo add-apt-repository ppa:george-edison55/cmake-3.x
+   sudo apt-get update
+   sudo apt-get install cmake
+   ```
+
+3. **boost**
+   ```bash
+   sudo apt-get install build-essential g++ python-dev autotools-dev libicu-dev build-essential libbz2-dev
+   wget -O boost_1_60_0.tar.gz http://sourceforge.net/projects/boost/files/boost/1.60.0/boost_1_60_0.tar.gz/download
+   tar xzvf boost_1_60_0.tar.gz
+   cd boost_1_60_0
+   ./bootstrap.sh --prefix=/usr/local
+   sudo ./b2 install
+   ```
+
+3. **miniconda (optional)**
+
+   Installing mininconda is optional but is recommended to easily install parquet-cpp and other dependencies (see below). Skip this if you prefer installing parquet-cpp from source.
+   ```bash
+   # Assumes that you are in the root of the arrow project.
+   export HOME=$(pwd)
+   # Change these if you would like to mininconda to reside in a different location.
+   export MINICONDA=$HOME/miniconda
+
+   wget -O miniconda.sh https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh
+
+   bash miniconda.sh -b -p $MINICONDA
+   export PATH="$MINICONDA/bin:$PATH"
+
+   conda update -y -q conda
+   conda config --set show_channel_urls True
+   conda config --add channels https://repo.continuum.io/pkgs/free
+   conda config --add channels conda-forge
+   conda config --add channels apache
+   conda info -a
+
+   conda install --yes conda-build jinja2 anaconda-client
+   conda install -y nomkl
+   ```
+
+5. **Parquet-cpp**
+
+   If you would like to install parquet-cpp from source, (https://github.com/apache/parquet-cpp/blob/master/README.md) is a better place to look at. You need to set the ``PARQUET_HOME`` environment variable to where parquet-cpp is installed.
+   ```bash
+   conda install -y --channel apache/channel/dev parquet-cpp
+   export PARQUET_HOME=$MINICONDA
+   ```
+
+6. **Arrow-cpp and its dependencies***
+
+   We need arrow-cpp for its python port. If you have already have arrow-cpp, just remember to set the environment variable
+   ``ARROW_CPP_INSTALL`` to wherever it is installed.
+   ```bash
+   export CPP_BUILD_DIR=$HOME/cpp-build
+   mkdir $CPP_BUILD_DIR
+   cd $CPP_BUILD_DIR
+   export CPP_DIR=$HOME/cpp
+
+   cp -r $CPP_DIR/thirdparty .
+   cp $CPP_DIR/setup_build_env.sh .
+   source setup_build_env.sh
+
+   # Change this if you want arrow to be installed elsewhere.
+   export ARROW_CPP_INSTALL=$HOME/cpp-install
+
+   CMAKE_COMMON_FLAGS="\
+   -DARROW_BUILD_BENCHMARKS=ON \
+   -DARROW_PARQUET=ON \
+   -DARROW_HDFS=on \
+   -DCMAKE_INSTALL_PREFIX=$ARROW_CPP_INSTALL"
+   cmake -DARROW_TEST_MEMCHECK=on \
+   $CMAKE_COMMON_FLAGS \
+   -DCMAKE_CXX_FLAGS="-Werror" \
+   $CPP_DIR
+
+   make -j4
+   make install
+
+   cd..
+   ```
+
+7. **Install python dependencies**
+
+   If you have installed miniconda follow this. Else you could install them as you wish.
+   ```bash
+   PYTHON_DIR=$HOME/python
+   export LD_LIBRARY_PATH="$MINICONDA/lib:LD_LIBRARY_PATH"
+   conda install -y numpy pandas cython pytest
+   pushd $PYTHON_DIR
+   ```
+
+8. **Build pyarrow**
+
+   ```bash
+   python setup.py build_ext --inplace
+   ```

From c2533e9a1d81d3fa1a3dba6a1ced2c017d5ac5de Mon Sep 17 00:00:00 2001
From: MechCoder
Date: Fri, 22 Jul 2016 12:24:12 -0700
Subject: [PATCH 2/3] Update installation instructions

---
 python/README.md | 112 +++++++----------------------------------------
 1 file changed, 15 insertions(+), 97 deletions(-)

diff --git a/python/README.md b/python/README.md
index 5065fa33c07..195ecb0457f 100644
--- a/python/README.md
+++ b/python/README.md
@@ -13,113 +13,31 @@ This project is layered in two pieces:
 * Cython extensions and pure Python code under arrow/ which expose Arrow C++
   and pyarrow to pure Python users
 
-#### PyArrow Installation
-These are instructions on how to install PyArrow from scratch on Linux (assuming arrow is not yet installed)
-
-1. **g++ and gcc**
-
-   Make sure the latest versions of g++ and gcc are installed.
-   ```bash
-   sudo add-apt-repository ppa:ubuntu-toolchain-r/test
-   sudo apt-get update
-   sudo apt-get install gcc-4.9 g++-4.9
-   ```
-
-2. **cmake**
-   ```bash
-   sudo add-apt-repository ppa:george-edison55/cmake-3.x
-   sudo apt-get update
-   sudo apt-get install cmake
-   ```
+#### PyArrow Dependencies:
+These are the various projects that PyArrow depends on.
 
+1. **g++ and gcc Version >= 4.9**
+2. **cmake >= 3.2**
 3. **boost**
-   ```bash
-   sudo apt-get install build-essential g++ python-dev autotools-dev libicu-dev build-essential libbz2-dev
-   wget -O boost_1_60_0.tar.gz http://sourceforge.net/projects/boost/files/boost/1.60.0/boost_1_60_0.tar.gz/download
-   tar xzvf boost_1_60_0.tar.gz
-   cd boost_1_60_0
-   ./bootstrap.sh --prefix=/usr/local
-   sudo ./b2 install
-   ```
-
-3. **miniconda (optional)**
+4. **Parquet-cpp**
 
-   Installing mininconda is optional but is recommended to easily install parquet-cpp and other dependencies (see below). Skip this if you prefer installing parquet-cpp from source.
-   ```bash
-   # Assumes that you are in the root of the arrow project.
-   export HOME=$(pwd)
-   # Change these if you would like to mininconda to reside in a different location.
-   export MINICONDA=$HOME/miniconda
-
-   wget -O miniconda.sh https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh
-
-   bash miniconda.sh -b -p $MINICONDA
-   export PATH="$MINICONDA/bin:$PATH"
-
-   conda update -y -q conda
-   conda config --set show_channel_urls True
-   conda config --add channels https://repo.continuum.io/pkgs/free
-   conda config --add channels conda-forge
-   conda config --add channels apache
-   conda info -a
-
-   conda install --yes conda-build jinja2 anaconda-client
-   conda install -y nomkl
-   ```
-
-5. **Parquet-cpp**
-
-   If you would like to install parquet-cpp from source, (https://github.com/apache/parquet-cpp/blob/master/README.md) is a better place to look at. You need to set the ``PARQUET_HOME`` environment variable to where parquet-cpp is installed.
+   The preferred way to install parquet-cpp is to use conda.
+   You need to set the ``PARQUET_HOME`` environment variable to where parquet-cpp is installed.
    ```bash
    conda install -y --channel apache/channel/dev parquet-cpp
-   export PARQUET_HOME=$MINICONDA
-   ```
-
-6. **Arrow-cpp and its dependencies***
-
-   We need arrow-cpp for its python port. If you have already have arrow-cpp, just remember to set the environment variable
-   ``ARROW_CPP_INSTALL`` to wherever it is installed.
-   ```bash
-   export CPP_BUILD_DIR=$HOME/cpp-build
-   mkdir $CPP_BUILD_DIR
-   cd $CPP_BUILD_DIR
-   export CPP_DIR=$HOME/cpp
-
-   cp -r $CPP_DIR/thirdparty .
-   cp $CPP_DIR/setup_build_env.sh .
-   source setup_build_env.sh
-
-   # Change this if you want arrow to be installed elsewhere.
-   export ARROW_CPP_INSTALL=$HOME/cpp-install
-
-   CMAKE_COMMON_FLAGS="\
-   -DARROW_BUILD_BENCHMARKS=ON \
-   -DARROW_PARQUET=ON \
-   -DARROW_HDFS=on \
-   -DCMAKE_INSTALL_PREFIX=$ARROW_CPP_INSTALL"
-   cmake -DARROW_TEST_MEMCHECK=on \
-   $CMAKE_COMMON_FLAGS \
-   -DCMAKE_CXX_FLAGS="-Werror" \
-   $CPP_DIR
-
-   make -j4
-   make install
-
-   cd..
    ```
+5. **Arrow-cpp and its dependencies***
 
-7. **Install python dependencies**
-
-   If you have installed miniconda follow this. Else you could install them as you wish.
+   The Arrow C++ library must be built with all options enabled and installed with ``ARROW_HOME`` environment variable set to
+   the installation location. Look at (https://github.com/apache/arrow/blob/master/cpp/README.md) for
+   instructions. Alternatively you could just install arrow-cpp
+   from conda.
    ```bash
-   PYTHON_DIR=$HOME/python
-   export LD_LIBRARY_PATH="$MINICONDA/lib:LD_LIBRARY_PATH"
-   conda install -y numpy pandas cython pytest
-   pushd $PYTHON_DIR
+   conda install arrow-cpp -c apache/channel/dev
    ```
+6. **Python dependencies: numpy, pandas, cython, pytest**
 
-8. **Build pyarrow**
-
+#### Install pyarrow
    ```bash
    python setup.py build_ext --inplace
    ```

From d4582e0e4b109659f0b15b4f31a4182ac7c33da9 Mon Sep 17 00:00:00 2001
From: MechCoder
Date: Mon, 25 Jul 2016 11:24:58 -0700
Subject: [PATCH 3/3] update gcc and cmake

---
 python/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/README.md b/python/README.md
index 195ecb0457f..bafe71b05ec 100644
--- a/python/README.md
+++ b/python/README.md
@@ -16,8 +16,8 @@ This project is layered in two pieces:
 #### PyArrow Dependencies:
 These are the various projects that PyArrow depends on.
 
-1. **g++ and gcc Version >= 4.9**
-2. **cmake >= 3.2**
+1. **g++ and gcc Version >= 4.8**
+2. **cmake > 2.8.6**
 3. **boost**
 4. **Parquet-cpp**
 
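
After this series, the dependency list in python/README.md gives minimum versions: g++ and gcc >= 4.8, and cmake > 2.8.6. It also says that `PARQUET_HOME` and `ARROW_HOME` must point to the parquet-cpp and Arrow C++ install locations. One way to check those conditions before running `python setup.py build_ext --inplace` is sketched below. It assumes a Linux shell with GNU coreutils (`sort -V`, `printenv`). The function names `version_ge`, `version_gt`, `check_env` and `check_toolchain` are hypothetical and are not part of pyarrow or its build.

```bash
# Hypothetical pre-build check (not part of pyarrow). Written to run under
# POSIX sh or bash; assumes GNU coreutils for `sort -V` and `printenv`.

# Succeeds if version $1 >= version $2, comparing numeric components via sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds if version $1 is strictly greater than version $2.
version_gt() {
  [ "$1" != "$2" ] && version_ge "$1" "$2"
}

# Succeeds if every named variable is exported and non-empty, i.e. visible
# to a child process such as `python setup.py`.
check_env() {
  _missing=0
  for _var in "$@"; do
    if [ -z "$(printenv "$_var")" ]; then
      echo "error: $_var is not exported" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Checks gcc/g++ >= 4.8 and cmake > 2.8.6 for the first binaries on PATH.
check_toolchain() {
  for _cc in gcc g++; do
    _ver=$("$_cc" -dumpversion 2>/dev/null) || { echo "error: $_cc not found" >&2; return 1; }
    version_ge "$_ver" 4.8 || { echo "error: $_cc $_ver is older than 4.8" >&2; return 1; }
  done
  # `cmake --version` prints e.g. "cmake version 3.5.1" on its first line.
  _ver=$(cmake --version 2>/dev/null | head -n 1 | awk '{print $3}')
  version_gt "$_ver" 2.8.6 || { echo "error: cmake ${_ver:-not found} is not newer than 2.8.6" >&2; return 1; }
}
```

For example, `check_toolchain && check_env ARROW_HOME PARQUET_HOME && python setup.py build_ext --inplace` stops before the build if a prerequisite is missing. The script uses `sort -V` rather than a plain string comparison, so 2.8.12 correctly counts as newer than 2.8.6.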