From cd8e093240fefa978788224f7bb84bf9e9f1c59e Mon Sep 17 00:00:00 2001
From: ARF1
Date: Thu, 5 Sep 2019 16:03:56 +0200
Subject: [PATCH 1/6] ARROW-6465: [Python] Improvement to Windows build instructions

The current instructions for building the pyarrow python extension are
incomplete. Problems include:

- missing re2, llvm, clang prerequisites
- missing info on which MSVC toolsets are supported
- missing info on how to adapt the build commands to different MSVC toolsets
- missing warning about currently broken Windows build config

This amends the python developer documentation with the above.
---
 docs/source/developers/python.rst | 66 ++++++++++++++++++++++---------
 1 file changed, 48 insertions(+), 18 deletions(-)

diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst
index f25e030fd0e..30dcd18dd44 100644
--- a/docs/source/developers/python.rst
+++ b/docs/source/developers/python.rst
@@ -379,8 +379,25 @@ debugging a C++ unitttest, for example:
 Building on Windows
 ===================

-First, we bootstrap a conda environment similar to above, but skipping some of
-the Linux/macOS-only packages:
+.. warning::
+   Building on Windows is currently broken. The issue is being worked on
+   under the `Github PR #5247`.
+
+Building on Windows requires one of the following
+
+- `Build Tools for Visual Studio 2017
Date: Thu, 5 Sep 2019 18:00:55 +0200
Subject: [PATCH 2/6] fix rst syntax issues

---
 docs/source/developers/python.rst | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst
index 30dcd18dd44..d51a394c3c1 100644
--- a/docs/source/developers/python.rst
+++ b/docs/source/developers/python.rst
@@ -380,21 +380,19 @@ Building on Windows
 ===================

 .. warning::
-   Building on Windows is currently broken.
The issue is being worked on
+   under the `Github PR #5247 `_.

-Building on Windows requires one of the following
+Building on Windows requires one of the following compilers to be installed:

-- `Build Tools for Visual Studio 2017`_
+- `Microsoft Build Tools 2015 `_, or
 - Visual Studio 2015
 - Visual Studio 2017

-to be installed.
-
 During the setup of Build Tools ensure at least one Windows SDK is selected.

-Visual Studio 2019 and it's build tools are currently not supported.
+Visual Studio 2019 and its build tools are currently not supported.

 We bootstrap a conda environment similar to above, but skipping some of the
 Linux/macOS-only packages:
@@ -416,8 +414,8 @@ First, starting from fresh clones of Apache Arrow:

 Now, we build and install Arrow C++ libraries.

-The CMake parameters will need to be adjusted according to the Build Tools or
-Visual Studio version installed.
+The CMake parameters need to be adjusted according to the Build Tools or Visual
+Studio version installed.

 For Build Tools for Visual Studio 2017 and Visual Studio 2017:

@@ -437,7 +435,7 @@ For Build Tools for Visual Studio 2017 and Visual Studio 2017:
    popd

 For Microsoft Build Tools 2015 and Visual Studio 2015, replace the string after
-`cmake -G` with `"Visual Studio 14 2015 Win64"`
+``cmake -G`` with ``"Visual Studio 14 2015 Win64"``.

 After that, we must put the install directory's bin path in our ``%PATH%``:

From 927c6cc3a4b51ac9fd72cb44442732da6164feff Mon Sep 17 00:00:00 2001
From: ARF1
Date: Thu, 5 Sep 2019 21:20:21 +0200
Subject: [PATCH 3/6] fix ARROW_HOME location

---
 docs/source/developers/python.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst
index d51a394c3c1..fa62115832f 100644
--- a/docs/source/developers/python.rst
+++ b/docs/source/developers/python.rst
@@ -422,8 +422,8 @@ For Build Tools for Visual Studio 2017 and Visual Studio 2017:
 ..
code-block:: shell

    mkdir arrow\cpp\build
-   pushd arrow\cpp\build
    set ARROW_HOME=%cd%\arrow-dist
+   pushd arrow\cpp\build
    cmake -G "Visual Studio 15 2017 Win64" ^
          -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
          -DARROW_CXXFLAGS="/WX /MP" ^
From 121faff0246baf5ad6425f7a7d24a3b8dbbfa6b0 Mon Sep 17 00:00:00 2001
From: ARF1
Date: Sat, 7 Sep 2019 12:35:21 +0200
Subject: [PATCH 4/6] incorporate suggestions from @wesm

---
 docs/source/developers/python.rst | 99 ++++++++++++++++++++++++-------
 1 file changed, 77 insertions(+), 22 deletions(-)

diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst
index fa62115832f..67d1301dc3d 100644
--- a/docs/source/developers/python.rst
+++ b/docs/source/developers/python.rst
@@ -379,14 +379,10 @@ debugging a C++ unitttest, for example:
 Building on Windows
 ===================

-.. warning::
-   Building on Windows is currently broken. The issue is being worked on
-   under the `Github PR #5247 `_.
-
 Building on Windows requires one of the following compilers to be installed:

 - `Build Tools for Visual Studio 2017 `_
-- `Microsoft Build Tools 2015 `_, or
+- `Microsoft Build Tools 2015 `_
 - Visual Studio 2015
 - Visual Studio 2017

@@ -414,17 +410,33 @@ First, starting from fresh clones of Apache Arrow:

 Now, we build and install Arrow C++ libraries.

-The CMake parameters need to be adjusted according to the Build Tools or Visual
-Studio version installed.
+We set a number of environment variables:

-For Build Tools for Visual Studio 2017 and Visual Studio 2017:
+- the path of the installation directory of the Arrow C++ library as arrow
+  ``ARROW_HOME``
+- add the path of its built .dll-libraris to ``PATH``
+- and choose the compiler to be used

 .. code-block:: shell

-   mkdir arrow\cpp\build
    set ARROW_HOME=%cd%\arrow-dist
+   set PATH=%ARROW_HOME%\bin;%PATH%
+   set PYARROW_CMAKE_GENERATOR=Visual Studio 15 2017 Win64
+
+This assumes Visual Studio 2017 or its build tools are used.
For Visual Studio
+2015 and its build tools use the following instead:
+
+.. code-block:: shell
+
+   set PYARROW_CMAKE_GENERATOR=Visual Studio 14 2015 Win64
+
+Let's configure, build and install the Arrow C++ libraries:
+
+.. code-block:: shell
+
+   mkdir arrow\cpp\build
    pushd arrow\cpp\build
-   cmake -G "Visual Studio 15 2017 Win64" ^
+   cmake -G "%PYARROW_CMAKE_GENERATOR%" ^
          -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
          -DARROW_CXXFLAGS="/WX /MP" ^
          -DARROW_GANDIVA=on ^
@@ -434,15 +446,6 @@ For Build Tools for Visual Studio 2017 and Visual Studio 2017:
    cmake --build . --target INSTALL --config Release
    popd

-For Microsoft Build Tools 2015 and Visual Studio 2015, replace the string after
-``cmake -G`` with ``"Visual Studio 14 2015 Win64"``.
-
-After that, we must put the install directory's bin path in our ``%PATH%``:
-
-.. code-block:: shell
-
-   set PATH=%ARROW_HOME%\bin;%PATH%
-
 Now, we can build pyarrow:

 .. code-block:: shell
@@ -453,6 +456,11 @@ Now, we can build pyarrow:
    python setup.py build_ext --inplace
    popd

+.. note::
+
+   For building pyarrow, the above defined environment variables need to also
+   be set. Remember this if to want to re-build ``pyarrow`` after your initial build.
+
 Then run the unit tests with:

 .. code-block:: shell
@@ -461,17 +469,64 @@ Then run the unit tests with:
    py.test pyarrow -v
    popd

+.. note::
+
+   With the above instructions the Arrow C++ libraries are not bundled with
+   the python extension. This is recommended for development as it allows the
+   C++ libraries to be re-built separately.
+
+   As a consequence however, ``python setup.py install`` will also not install
+   the Arrow C++ libraries. Therefore, to use ``pyarrow`` in python, ``PATH``
+   must contain the directory with the Arrow .dll-files.
+   If you want to bundle the Arrow C++ libraries with ``pyarrow`` add
+   ``--bundle-arrow-cpp`` as build parameter:
+
+   ``python setup.py build_ext --inplace --bundle_arrow_cpp``
+
 Running C++ unit tests for Python integration
 ---------------------------------------------

-Getting ``python-test.exe`` to run is a bit tricky because your
-``%PYTHONHOME%`` must be configured to point to the active conda environment:
+Running C++ unit tests should not be necessary for most developers. If you do
+want to run them, you need to pass ``-DARROW_BUILD_TESTS=ON`` during
+configuration of the Arrow C++ library build:
+
+.. code-block:: shell
+
+   pushd arrow\cpp\build
+   cmake -G "%PYARROW_CMAKE_GENERATOR%" ^
+         -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
+         -DARROW_CXXFLAGS="/WX /MP" ^
+         -DARROW_GANDIVA=on ^
+         -DARROW_PARQUET=on ^
+         -DARROW_PYTHON=on ^
+         -DARROW_BUILD_TESTS=ON ^
+         ..
+   cmake --build . --target INSTALL --config Release
+   popd
+
+
+Getting ``arrow-python-test.exe`` (C++ unit tests for python integration) to
+run is a bit tricky because your ``%PYTHONHOME%`` must be configured to point
+to the active conda environment:
+
+.. code-block:: shell
+
+   set PYTHONHOME=%CONDA_PREFIX%
+   pushd arrow\cpp\build\release\Release
+   arrow-python-test.exe
+   popd
+
+To run all tests of the Arrow C++ library, you can also run ``ctest``:

 .. code-block:: shell

    set PYTHONHOME=%CONDA_PREFIX%
+   pushd arrow\cpp\build
+   ctest
+   popd
+
+

-Now ``python-test.exe`` or simply ``ctest`` (to run all tests) should work.
 Windows Caveats
 ---------------
From f2ea51b07b7dc02e5545d382d21846609abfcb1c Mon Sep 17 00:00:00 2001
From: ARF1
Date: Sat, 7 Sep 2019 20:14:23 +0200
Subject: [PATCH 5/6] fix spelling of --bundle-arrow-cpp

---
 docs/source/developers/python.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst
index 67d1301dc3d..6a3b5f39bcb 100644
--- a/docs/source/developers/python.rst
+++ b/docs/source/developers/python.rst
@@ -482,7 +482,7 @@ Then run the unit tests with:
    If you want to bundle the Arrow C++ libraries with ``pyarrow`` add
    ``--bundle-arrow-cpp`` as build parameter:

-   ``python setup.py build_ext --inplace --bundle_arrow_cpp``
+   ``python setup.py build_ext --inplace --bundle-arrow-cpp``

 Running C++ unit tests for Python integration
 ---------------------------------------------
From a57ea7d192f0ecc6db7d11b4d6d8e9f8f34aae5b Mon Sep 17 00:00:00 2001
From: ARF1
Date: Sun, 8 Sep 2019 11:50:56 +0200
Subject: [PATCH 6/6] warning about side-effects of --bundle-arrow-cpp

---
 docs/source/developers/python.rst | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst
index 6a3b5f39bcb..b76a37eadfa 100644
--- a/docs/source/developers/python.rst
+++ b/docs/source/developers/python.rst
@@ -412,9 +412,9 @@ Now, we build and install Arrow C++ libraries.

 We set a number of environment variables:

-- the path of the installation directory of the Arrow C++ library as arrow
+- the path of the installation directory of the Arrow C++ libraries as
   ``ARROW_HOME``
-- add the path of its built .dll-libraris to ``PATH``
+- add the path of installed DLL libraries to ``PATH``
 - and choose the compiler to be used

 .. code-block:: shell
@@ -472,27 +472,35 @@ Then run the unit tests with:
 .. note::

    With the above instructions the Arrow C++ libraries are not bundled with
-   the python extension.
This is recommended for development as it allows the
+   the Python extension. This is recommended for development as it allows the
    C++ libraries to be re-built separately.

    As a consequence however, ``python setup.py install`` will also not install
-   the Arrow C++ libraries. Therefore, to use ``pyarrow`` in python, ``PATH``
+   the Arrow C++ libraries. Therefore, to use ``pyarrow`` in python, ``PATH``
    must contain the directory with the Arrow .dll-files.

-   If you want to bundle the Arrow C++ libraries with ``pyarrow`` add
+   If you want to bundle the Arrow C++ libraries with ``pyarrow`` add
    ``--bundle-arrow-cpp`` as build parameter:

-   ``python setup.py build_ext --inplace --bundle-arrow-cpp``
+   ``python setup.py build_ext --bundle-arrow-cpp``
+
+   Important: If you combine ``--bundle-arrow-cpp`` with ``--inplace`` the
+   Arrow C++ libraries get copied to the python source tree and are not cleared
+   by ``python setup.py clean``. They remain in place and will take precedence
+   over any later Arrow C++ libraries contained in ``PATH``. This can lead to
+   incompatibilities when ``pyarrow`` is later built without
+   ``--bundle-arrow-cpp``.

 Running C++ unit tests for Python integration
 ---------------------------------------------

 Running C++ unit tests should not be necessary for most developers. If you do
-want to run them, you need to pass ``-DARROW_BUILD_TESTS=ON`` during
+want to run them, you need to pass ``-DARROW_BUILD_TESTS=ON`` during
 configuration of the Arrow C++ library build:

 .. code-block:: shell

+   mkdir arrow\cpp\build
    pushd arrow\cpp\build
    cmake -G "%PYARROW_CMAKE_GENERATOR%" ^
          -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^