Skip to content

Conversation

@ARF1
Copy link
Contributor

@ARF1 ARF1 commented Sep 5, 2019

The current instructions for building the pyarrow python extension are incomplete.

Problems include:

  • missing re2, llvm, clang prerequisites
  • missing info on which MSVC toolsets are supported
  • missing info on how the build commands to different MSVC toolsets

This amends the python developer documentation with above.

ARF1 added 3 commits September 5, 2019 16:03
The current instructions for building the pyarrow python extension are incomplete.

Problems include:

 - missing re2, llvm, clang prerequisites
 - missing info on which MSVC toolsets are supported
 - missing info on how the build commands to different MSVC toolsets
 - missing warning about currently broken Windows build config

This amends the python developer documentation with above.
the Linux/macOS-only packages:
.. warning::
Building on Windows is currently broken. The issue is being worked on
under the `Github PR #5247 <https://github.com/apache/arrow/pull/5247>`_.
Copy link
Member

@wesm wesm Sep 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this. Now that https://issues.apache.org/jira/browse/ARROW-6457 is merged it's building fine -- since our CI has been continuously green the issue seems to be localized to certain build configurations and systems

@codecov-io
Copy link

codecov-io commented Sep 6, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@d2be6a5). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #5294   +/-   ##
=========================================
  Coverage          ?   65.16%           
=========================================
  Files             ?      497           
  Lines             ?    68417           
  Branches          ?        0           
=========================================
  Hits              ?    44581           
  Misses            ?    23836           
  Partials          ?        0

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d2be6a5...38afa1e. Read the comment docs.

.. code-block:: shell
pushd arrow\python
py.test pyarrow -v
Copy link
Member

@kiszk kiszk Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed these steps on a fresh Windows 10 instance. Most of them work well.

An issue is that when I used Build Tools for Visual Studio 2017, this command shows the attached error. When I update this line as `self.cmake_generator = 'Visual Studio 15 2017 Win64', it works well.

Should we update this document or should we update the setup.py?

running build_ext
creating build
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
-- Running cmake for pyarrow
C:\Users\abc\Miniconda3\envs\pyarrow-dev\Library\bin\cmake.exe -DPYTHON_EXECUTABLE=C:\Users\abc\Miniconda3\envs\pyarrow-dev\python.exe  -G "Visual Studio 14 2015 Win64" -DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_BUILD_GANDIVA=on -DCMAKE_BUILD_TYPE=release C:\arrow\arrow\python
-- Selecting Windows SDK version  to target Windows 10.0.15063.
CMake Error at CMakeLists.txt:22 (project):
  Failed to run MSBuild command:

    MSBuild.exe

  to get the value of VCTargetsPath:

    Microsoft (R) Build Engine version 15.9.21+g9802d43bc3 for .NET Framework
    Copyright (C) Microsoft Corporation. All rights reserved.

    Build started 9/6/2019 9:55:02 AM.
    Project "C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj" on node 1 (default targets).
    C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\Common7\IDE\VC\VCTargets\Microsoft.Cpp.Platform.targets(67,5): error MSB8020: The build tools for v140 (Platform Toolset = 'v140') cannot be found. To build using the v140 build tools, please install v140 build tools.  Alternatively, you may upgrade to the current Visual Studio tools by selecting the Project menu or right-click the solution, and then selecting "Retarget solution". [C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj]
    Done Building Project "C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj" (default targets) -- FAILED.

    Build FAILED.

    "C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj" (default target) (1) ->
    (PlatformPrepareForBuild target) ->
      C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\Common7\IDE\VC\VCTargets\Microsoft.Cpp.Platform.targets(67,5): error MSB8020: The build tools for v140 (Platform Toolset = 'v140') cannot be found. To build using the v140 build tools, please install v140 build tools.  Alternatively, you may upgrade to the current Visual Studio tools by selecting the Project menu or right-click the solution, and then selecting "Retarget solution". [C:\arrow\arrow\python\build\temp.win-amd64-3.7\Release\CMakeFiles\3.15.3\VCTargetsPath.vcxproj]

        0 Warning(s)
        1 Error(s)

    Time Elapsed 00:00:00.94


  Exit code: 1



-- Configuring incomplete, errors occurred!
See also "C:/arrow/arrow/python/build/temp.win-amd64-3.7/Release/CMakeFiles/CMakeOutput.log".
error: command 'C:\\Users\\abc\\Miniconda3\\envs\\pyarrow-dev\\Library\\bin\\cmake.exe' failed with exit status 1

(pyarrow-dev) C:\arrow\arrow\python>

Copy link
Contributor Author

@ARF1 ARF1 Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have stumbled across this error as well since my last commit.

From what I can tell, this does not happen, when one follows the recipe in one go: i.e. configure arrow library, build arrow library then build extension.

I encountered this error when I wanted to rebuild the python extension without reconfiguring and re-building the arrow library. For this I did python setup.py clean. After that when I run python setup.py build_ext --inplace the issue you described pops up.

Simpler than changing the line you referenced would be to introduce into the build recipe the step:
set PYARROW_CMAKE_GENERATOR=Visual Studio 17 2017 Win64

One could then change the cmake line to use this once defined variable like so:
cmake -G "%PYARROW_CMAKE_GENERATOR%" ^

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your update. I agree with your proposal. This change also works for my environment.

pushd arrow\python
py.test pyarrow -v
popd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At line 474, I cannot execute python-test.exe or ctest at arrow\python. I may miss something to build python-test.exe

(pyarrow-dev) C:\arrow\arrow\python>ctest
*********************************
No test configuration file found!
*********************************
Usage

  ctest [options]


(pyarrow-dev) C:\arrow\arrow\python>python-test
'python-test' is not recognized as an internal or external command,
operable program or batch file.

(pyarrow-dev) C:\arrow\arrow\python>dir python-test*
 Volume in drive C is OS
 Volume Serial Number is 7884-B5DB

 Directory of C:\arrow\arrow\python

File Not Found

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not tried this yet. Can confirm your observations.

Could this be an obsolete part of the instructions? In the linux instructions neither python-test nor ctest seem to make an appearance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to pass -DARROW_BUILD_TESTS=ON to build arrow-python-test.exe, the unit tests are now toggled off by default. I don't think most developers need to build this executable but this should go in a separate section for completeness

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm Thanks. ctest runs now but it must be run from the cpp/build directory.

Also, arrow-python-test fails:

      Start 58: arrow-python-test
58/83 Test #58: arrow-python-test .........................Exit code 0xc0000409
***Exception:   0.51 sec

I can find a arrow-python-test.exe in cpp/build/release/Release. Unfortunately running it directly is not working either.

(pyarrow-dev) Z:\dev\arrow\cpp\build\release\Release>arrow-python-test.exe
Fatal Python error: initfsencoding: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00002390 (most recent call first):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm Apologies. ctest and arrow-python-test.exe do run ok.

I forgot the set PYTHONHOME=%CONDA_PREFIX%...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This change also works well for my environment.

Copy link
Member

@wesm wesm Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary for most developers to run the C++-based unit test, so we should put this off in a separate Advanced section

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is already done

@ARF1
Copy link
Contributor Author

ARF1 commented Sep 6, 2019

@kiszk I am encountering further issues:

Issue 1:
If I do not follow the entire build recipe, but only want to re-build the extension, I need to set the env variables ARROW_HOME and PATH=%ARROW_HOME%\bin;%PATH% if my anaconda prompt was closed since the first build. This should probably also go in the instructions or is that self-evident?

Issue 2:
After I run python setup.py install, close my anaconda propt, re-open the anaconda prompt and try to import pyarrow in python I get the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "Z:\Systemdateien\Miniconda3\envs\pyarrow-dev\lib\site-packages\pyarrow-0.14.1.dev430+g1c42e6f1b.d20190906-py3.7-win-amd64.egg\pyarrow\__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: DLL load failed: The specified module could not be found.

To fix this, I again need to set PATH=%ARROW_HOME%\bin;%PATH%. Should the DLLs not be installed during python setup.py install?

@wesm
Copy link
Member

wesm commented Sep 6, 2019

Issue 1:
If I do not follow the entire build recipe, but only want to re-build the extension, I need to set the env variables ARROW_HOME and PATH=%ARROW_HOME%\bin;%PATH% if my anaconda prompt was closed since the first build. This should probably also go in the instructions or is that self-evident?

It may or may not be self-evident depending on the person. Probably a good idea to mention it explicitly in the document.

To fix this, I again need to set PATH=%ARROW_HOME%\bin;%PATH%. Should the DLLs not be installed during python setup.py install?

In general yes this is the recommendation for development.

You can try using --bundle-arrow-cpp (and this may work, I haven't tried doing a local install with bundling, we only use this for wheel installs)

You can mention both options

@ARF1
Copy link
Contributor Author

ARF1 commented Sep 7, 2019

@wesm I updated my local copy from #5297 to the current master and am now encountering new problems with the build:

>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "Z:\dev\arrow\python\pyarrow\__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: DLL load failed: The specified procedure could not be found.

I cherry-picked 314e9f0 onto 2164e3b (common ancestor of master and #5297) and that works ok. So somewhere since 2164e3b an issue seems to have cropped up.

Can you guess where this was introduced? Or shall I try building at different points in between 2164e3b and master to identify which commit is causing the issue?

I thought, that maybe a dll was no longer getting built, but that does not seem to be the issue.


With 314e9f0:

>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "Z:\dev\arrow\python\pyarrow\__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: DLL load failed: The specified procedure could not be found.
>>> quit()


(pyarrow-dev) Z:\dev\arrow-dist\bin>dir
 Volume in drive Z has no label.
 Volume Serial Number is 08E2-3231

 Directory of Z:\dev\arrow-dist\bin

07/09/2019  20:32    <DIR>          .
07/09/2019  20:32    <DIR>          ..
07/09/2019  20:27         5.631.488 arrow.dll
07/09/2019  20:28            76.288 arrow_dataset.dll
07/09/2019  20:29           765.440 arrow_python.dll
07/09/2019  20:31        22.038.528 gandiva.dll
07/09/2019  20:28         1.305.088 parquet.dll
               5 File(s)     29.816.832 bytes
               2 Dir(s)  216.331.890.688 bytes free

With #5239:

>>> import pyarrow
>>> quit()

(pyarrow-dev) Z:\dev\arrow-dist\bin>dir
 Volume in drive Z has no label.
 Volume Serial Number is 08E2-3231

 Directory of Z:\dev\arrow-dist\bin

07/09/2019  21:45    <DIR>          .
07/09/2019  21:45    <DIR>          ..
07/09/2019  21:39         5.629.440 arrow.dll
07/09/2019  21:40            76.288 arrow_dataset.dll
07/09/2019  21:42           764.416 arrow_python.dll
07/09/2019  21:44        22.038.528 gandiva.dll
07/09/2019  21:40         1.306.112 parquet.dll
               5 File(s)     29.814.784 bytes
               2 Dir(s)  216.336.334.848 bytes free


.. code-block:: shell
pushd arrow\cpp\build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is it better to add mkdir arrow\cpp\build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are quite right. Just in case somebody jumps directly to this section.

@ARF1
Copy link
Contributor Author

ARF1 commented Sep 8, 2019

@wesm I updated my local copy from #5297 to the current master and am now encountering new problems with the build:

>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "Z:\dev\arrow\python\pyarrow\__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: DLL load failed: The specified procedure could not be found.

Never mind, I finally found the culprit: When I tested --bundle-cpp-arrow, I combined it with --inplace. This copied the .dll-files to the python source tree and they do not get cleaned out with python setup.py clean.

In my later builds without --bundle-arrow-cpp they thus remained in place and took precedence over the newer .dll-files in PATH.

I added a note to this effect in the updated instructions.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me. Were you planning to do other changes or can this be merged?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "libraries".

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I fixed a couple typos. Merging this, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants