139 changes: 115 additions & 24 deletions docs/source/developers/python.rst
@@ -379,8 +379,19 @@ debugging a C++ unitttest, for example:
Building on Windows
===================

First, we bootstrap a conda environment similar to above, but skipping some of
the Linux/macOS-only packages:
Building on Windows requires one of the following compilers to be installed:

- `Build Tools for Visual Studio 2017 <https://download.visualstudio.microsoft.com/download/pr/3e542575-929e-4297-b6c6-bef34d0ee648/639c868e1219c651793aff537a1d3b77/vs_buildtools.exe>`_
- `Microsoft Build Tools 2015 <http://download.microsoft.com/download/5/F/7/5F7ACAEB-8363-451F-9425-68A90F98B238/visualcppbuildtools_full.exe>`_
- Visual Studio 2015
- Visual Studio 2017

During the setup of the Build Tools, ensure that at least one Windows SDK is
selected.

Visual Studio 2019 and its build tools are currently not supported.

We bootstrap a conda environment similar to above, but skipping some of the
Linux/macOS-only packages:

First, starting from fresh clones of Apache Arrow:

@@ -390,60 +401,140 @@ First, starting from fresh clones of Apache Arrow:

.. code-block:: shell

conda create -y -n pyarrow-dev -c conda-forge ^
--file arrow\ci\conda_env_cpp.yml ^
--file arrow\ci\conda_env_python.yml ^
python=3.7
conda create -y -n pyarrow-dev -c conda-forge ^
--file arrow\ci\conda_env_cpp.yml ^
--file arrow\ci\conda_env_python.yml ^
--file arrow\ci\conda_env_gandiva.yml ^
python=3.7
conda activate pyarrow-dev

Now, we build and install Arrow C++ libraries
Now, we build and install Arrow C++ libraries.

We set a number of environment variables:

- ``ARROW_HOME``: the installation directory of the Arrow C++ libraries
- ``PATH``: extended with the directory of the installed DLL libraries
- ``PYARROW_CMAKE_GENERATOR``: the compiler to be used

.. code-block:: shell

mkdir cpp\build
cd cpp\build
set ARROW_HOME=C:\thirdparty
cmake -G "Visual Studio 14 2015 Win64" ^
-DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-DARROW_CXXFLAGS="/WX /MP" ^
-DARROW_GANDIVA=on ^
-DARROW_PARQUET=on ^
-DARROW_PYTHON=on ..
cmake --build . --target INSTALL --config Release
cd ..\..
set ARROW_HOME=%cd%\arrow-dist
set PATH=%ARROW_HOME%\bin;%PATH%
set PYARROW_CMAKE_GENERATOR=Visual Studio 15 2017 Win64

After that, we must put the install directory's bin path in our ``%PATH%``:
This assumes that Visual Studio 2017 or its build tools are used. For Visual
Studio 2015 and its build tools, use the following instead:

.. code-block:: shell

set PATH=%ARROW_HOME%\bin;%PATH%
set PYARROW_CMAKE_GENERATOR=Visual Studio 14 2015 Win64
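
The choice between the two generator strings can be sketched in Python. The
helper below is hypothetical (``generator_for`` and its lookup table are not
part of pyarrow). It only encodes the two generator strings given above and
rejects every other release, including Visual Studio 2019, which this guide
lists as unsupported.

```python
# Hypothetical helper, not part of pyarrow: maps a Visual Studio release year
# to the CMake generator string used in this guide.
SUPPORTED_GENERATORS = {
    2015: "Visual Studio 14 2015 Win64",
    2017: "Visual Studio 15 2017 Win64",
}


def generator_for(vs_year):
    """Return the value to assign to PYARROW_CMAKE_GENERATOR."""
    try:
        return SUPPORTED_GENERATORS[vs_year]
    except KeyError:
        # Any release not listed above is treated as unsupported.
        raise ValueError(
            "Visual Studio %s is not covered by these instructions" % vs_year
        ) from None
```

For example, ``"set PYARROW_CMAKE_GENERATOR=" + generator_for(2017)`` yields the
line used in the first block above.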

Let's configure, build and install the Arrow C++ libraries:

.. code-block:: shell

mkdir arrow\cpp\build
pushd arrow\cpp\build
cmake -G "%PYARROW_CMAKE_GENERATOR%" ^
-DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-DARROW_CXXFLAGS="/WX /MP" ^
-DARROW_GANDIVA=on ^
-DARROW_PARQUET=on ^
-DARROW_PYTHON=on ^
..
cmake --build . --target INSTALL --config Release
popd
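
For scripted builds, the configure command above could be assembled
programmatically. The following is a minimal sketch, assuming a hypothetical
helper named ``cmake_configure_args`` (not part of Arrow). It only reuses the
flags shown in this guide and does not run anything itself.

```python
def cmake_configure_args(generator, install_prefix, build_tests=False):
    """Build the argument list for the Arrow C++ configure step.

    Hypothetical helper: the flags mirror the command shown in this guide.
    """
    args = [
        "cmake",
        "-G", generator,
        "-DCMAKE_INSTALL_PREFIX=" + install_prefix,
        # Passed as one list element, so no shell quoting is needed.
        "-DARROW_CXXFLAGS=/WX /MP",
        "-DARROW_GANDIVA=on",
        "-DARROW_PARQUET=on",
        "-DARROW_PYTHON=on",
    ]
    if build_tests:
        # Only needed for the C++ unit tests, which are off by default.
        args.append("-DARROW_BUILD_TESTS=ON")
    args.append("..")  # source directory, relative to arrow\cpp\build
    return args
```

The resulting list can be handed to ``subprocess.run(args, cwd=r"arrow\cpp\build", check=True)``
from a prompt in which ``pyarrow-dev`` is activated.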

Now, we can build pyarrow:

.. code-block:: shell

cd python
pushd arrow\python
set PYARROW_WITH_GANDIVA=1
set PYARROW_WITH_PARQUET=1
python setup.py build_ext --inplace
popd

.. note::

To build pyarrow, the environment variables defined above must also be set.
Remember this if you want to re-build ``pyarrow`` after your initial build.
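
A quick check before a re-build can catch a forgotten variable. This is a
minimal sketch: the function name ``missing_build_vars`` is hypothetical, and
the list of names is an assumption about which variables the note refers to
(those set earlier in this guide, apart from ``PATH``).

```python
import os

# Variables set earlier in this guide; treating exactly these as required is
# an assumption made for this sketch.
REQUIRED_VARS = (
    "ARROW_HOME",
    "PYARROW_CMAKE_GENERATOR",
    "PYARROW_WITH_GANDIVA",
    "PYARROW_WITH_PARQUET",
)


def missing_build_vars(env=None):
    """Return the names from REQUIRED_VARS that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling ``missing_build_vars()`` in the prompt used for the build returns an
empty list when everything is in place.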

Then run the unit tests with:

.. code-block:: shell

pushd arrow\python
py.test pyarrow -v

Member

@kiszk kiszk Sep 6, 2019

I followed these steps on a fresh Windows 10 instance. Most of them work well.

One issue: when I used Build Tools for Visual Studio 2017, this command showed the error below. When I change this line to `self.cmake_generator = 'Visual Studio 15 2017 Win64'`, it works well.

Should we update this document or should we update the setup.py?

running build_ext
creating build
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
-- Running cmake for pyarrow
C:\Users\abc\Miniconda3\envs\pyarrow-dev\Library\bin\cmake.exe -DPYTHON_EXECUTABLE=C:\Users\abc\Miniconda3\envs\pyarrow-dev\python.exe  -G "Visual Studio 14 2015 Win64" -DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_BUILD_GANDIVA=on -DCMAKE_BUILD_TYPE=release C:\arrow\arrow\python
-- Selecting Windows SDK version  to target Windows 10.0.15063.
CMake Error at CMakeLists.txt:22 (project):
  Failed to run MSBuild command:

    MSBuild.exe

  to get the value of VCTargetsPath:

    Microsoft (R) Build Engine version 15.9.21+g9802d43bc3 for .NET Framework
    Copyright (C) Microsoft Corporation. All rights reserved.

    Build started 9/6/2019 9:55:02 AM.
    Project "C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj" on node 1 (default targets).
    C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\Common7\IDE\VC\VCTargets\Microsoft.Cpp.Platform.targets(67,5): error MSB8020: The build tools for v140 (Platform Toolset = 'v140') cannot be found. To build using the v140 build tools, please install v140 build tools.  Alternatively, you may upgrade to the current Visual Studio tools by selecting the Project menu or right-click the solution, and then selecting "Retarget solution". [C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj]
    Done Building Project "C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj" (default targets) -- FAILED.

    Build FAILED.

    "C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj" (default target) (1) ->
    (PlatformPrepareForBuild target) ->
      C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\Common7\IDE\VC\VCTargets\Microsoft.Cpp.Platform.targets(67,5): error MSB8020: The build tools for v140 (Platform Toolset = 'v140') cannot be found. To build using the v140 build tools, please install v140 build tools.  Alternatively, you may upgrade to the current Visual Studio tools by selecting the Project menu or right-click the solution, and then selecting "Retarget solution". [C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj]

        0 Warning(s)
        1 Error(s)

    Time Elapsed 00:00:00.94


  Exit code: 1



-- Configuring incomplete, errors occurred!
See also "C:/arrow/arrow/python/build/temp.win-amd64-3.7/Release/CMakeFiles/CMakeOutput.log".
error: command 'C:\\Users\\abc\\Miniconda3\\envs\\pyarrow-dev\\Library\\bin\\cmake.exe' failed with exit status 1

(pyarrow-dev) C:\arrow\arrow\python>

Contributor Author

@ARF1 ARF1 Sep 6, 2019

I have stumbled across this error as well since my last commit.

From what I can tell, this does not happen when one follows the recipe in one go, i.e. configure the Arrow library, build the Arrow library, then build the extension.

I encountered this error when I wanted to rebuild the Python extension without reconfiguring and rebuilding the Arrow library. For this I ran `python setup.py clean`. After that, when I run `python setup.py build_ext --inplace`, the issue you described pops up.

Simpler than changing the line you referenced would be to introduce the following step into the build recipe:
set PYARROW_CMAKE_GENERATOR=Visual Studio 15 2017 Win64

One could then change the cmake line to use this once defined variable like so:
cmake -G "%PYARROW_CMAKE_GENERATOR%" ^

What do you think?

Member

Thank you for your update. I agree with your proposal. This change also works for my environment.

popd

Member

At line 474, I cannot execute python-test.exe or ctest in arrow\python. I may be missing a step needed to build python-test.exe.

(pyarrow-dev) C:\arrow\arrow\python>ctest
*********************************
No test configuration file found!
*********************************
Usage

  ctest [options]


(pyarrow-dev) C:\arrow\arrow\python>python-test
'python-test' is not recognized as an internal or external command,
operable program or batch file.

(pyarrow-dev) C:\arrow\arrow\python>dir python-test*
 Volume in drive C is OS
 Volume Serial Number is 7884-B5DB

 Directory of C:\arrow\arrow\python

File Not Found

Contributor Author

I had not tried this yet. Can confirm your observations.

Could this be an obsolete part of the instructions? In the Linux instructions, neither python-test nor ctest seems to make an appearance.

Member

You need to pass -DARROW_BUILD_TESTS=ON to build arrow-python-test.exe; the unit tests are now toggled off by default. I don't think most developers need to build this executable, but this should go in a separate section for completeness.

Contributor Author

@wesm Thanks. ctest runs now but it must be run from the cpp/build directory.

Also, arrow-python-test fails:

      Start 58: arrow-python-test
58/83 Test #58: arrow-python-test .........................Exit code 0xc0000409
***Exception:   0.51 sec

I can find an arrow-python-test.exe in cpp/build/release/Release. Unfortunately, running it directly does not work either.

(pyarrow-dev) Z:\dev\arrow\cpp\build\release\Release>arrow-python-test.exe
Fatal Python error: initfsencoding: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00002390 (most recent call first):

Contributor Author

@wesm Apologies. ctest and arrow-python-test.exe do run ok.

I forgot the set PYTHONHOME=%CONDA_PREFIX%...

Member

Thank you. This change also works well for my environment.

Member

@wesm wesm Sep 9, 2019

It's not necessary for most developers to run the C++-based unit test, so we should put this off in a separate Advanced section

Member

I see this is already done


.. note::

With the above instructions, the Arrow C++ libraries are not bundled with
the Python extension. This is recommended for development as it allows the
C++ libraries to be re-built separately.

As a consequence, however, ``python setup.py install`` will not install the
Arrow C++ libraries either. Therefore, to use ``pyarrow`` in Python, ``PATH``
must contain the directory with the Arrow DLL files.

If you want to bundle the Arrow C++ libraries with ``pyarrow``, add
``--bundle-arrow-cpp`` as a build parameter:

``python setup.py build_ext --bundle-arrow-cpp``

Important: If you combine ``--bundle-arrow-cpp`` with ``--inplace``, the
Arrow C++ libraries get copied to the Python source tree and are not cleared
by ``python setup.py clean``. They remain in place and will take precedence
over any later Arrow C++ libraries contained in ``PATH``. This can lead to
incompatibilities when ``pyarrow`` is later built without
``--bundle-arrow-cpp``.
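
Whether ``PATH`` already contains the DLL directory can be checked with a few
lines of Python. This is a minimal sketch under stated assumptions: the
function name ``dll_dir_on_path`` is hypothetical, and the DLLs are assumed to
live in the ``bin`` directory under ``ARROW_HOME``, as set up earlier in this
guide. It uses ``ntpath`` so that Windows path rules apply on any platform.

```python
import ntpath  # Windows path rules; os.path resolves to this on Windows


def dll_dir_on_path(arrow_home, path_value):
    """Return True if the bin directory under arrow_home is in path_value."""

    def canonical(entry):
        # Drop surrounding quotes and trailing separators, then fold case
        # and forward slashes so that equivalent spellings compare equal.
        return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))

    wanted = canonical(ntpath.join(arrow_home, "bin"))
    # PATH entries are separated by semicolons on Windows.
    entries = [e for e in path_value.split(";") if e.strip()]
    return any(canonical(e) == wanted for e in entries)
```

In an activated prompt, ``dll_dir_on_path(os.environ["ARROW_HOME"], os.environ["PATH"])``
should return ``True`` before ``import pyarrow`` is attempted.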

Running C++ unit tests for Python integration
---------------------------------------------

Getting ``python-test.exe`` to run is a bit tricky because your
``%PYTHONHOME%`` must be configured to point to the active conda environment:
Running C++ unit tests should not be necessary for most developers. If you do
want to run them, you need to pass ``-DARROW_BUILD_TESTS=ON`` during
configuration of the Arrow C++ library build:

.. code-block:: shell

mkdir arrow\cpp\build
pushd arrow\cpp\build

Member

nit: Is it better to add mkdir arrow\cpp\build?

Contributor Author

I think you are quite right. Just in case somebody jumps directly to this section.

cmake -G "%PYARROW_CMAKE_GENERATOR%" ^
-DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-DARROW_CXXFLAGS="/WX /MP" ^
-DARROW_GANDIVA=on ^
-DARROW_PARQUET=on ^
-DARROW_PYTHON=on ^
-DARROW_BUILD_TESTS=ON ^
..
cmake --build . --target INSTALL --config Release
popd


Getting ``arrow-python-test.exe`` (C++ unit tests for Python integration) to
run is a bit tricky because your ``%PYTHONHOME%`` must be configured to point
to the active conda environment:

.. code-block:: shell

set PYTHONHOME=%CONDA_PREFIX%
pushd arrow\cpp\build\release\Release
arrow-python-test.exe
popd

To run all tests of the Arrow C++ library, you can also run ``ctest``:

.. code-block:: shell

set PYTHONHOME=%CONDA_PREFIX%
pushd arrow\cpp\build
ctest
popd
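
When the tests are launched from a script instead of the prompt, the same
``PYTHONHOME`` setting can be applied to the child process only. This is a
minimal sketch; the helper name ``cpp_test_env`` is hypothetical, and it
assumes that ``CONDA_PREFIX`` is set by the activated ``pyarrow-dev``
environment.

```python
import os


def cpp_test_env(base_env=None):
    """Return a copy of the environment with PYTHONHOME set to CONDA_PREFIX."""
    env = dict(os.environ if base_env is None else base_env)
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        # Without an active conda environment there is nothing to point to.
        raise RuntimeError("CONDA_PREFIX is not set; activate pyarrow-dev first")
    env["PYTHONHOME"] = prefix
    return env
```

A call such as ``subprocess.run(["ctest"], cwd=r"arrow\cpp\build", env=cpp_test_env(), check=True)``
then runs the tests without changing ``PYTHONHOME`` in the calling prompt.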


Now ``python-test.exe`` or simply ``ctest`` (to run all tests) should work.

Windows Caveats
---------------