Skip to content

fix windows cuda support for python 3.8 + #6046

Merged
jywu-msft merged 4 commits intomasterfrom
jywu_py_fix
Dec 7, 2020
Merged

fix windows cuda support for python 3.8 + #6046
jywu-msft merged 4 commits intomasterfrom
jywu_py_fix

Conversation

@jywu-msft
Copy link
Member

#5956 doesn't work.
os.add_dll_directory() must be called before onnxruntime.capi._pybind_state gets imported.
otherwise you will get

import onnxruntime as ort
C:\onnxruntime\build\Windows\Release\Release\onnxruntime\capi_pybind_state.py:22: UserWarning: Cannot load onnxruntime.capi. Error: 'DLL load failed while importing onnxruntime_pybind11_state: The specified module could not be found.'.
warnings.warn("Cannot load onnxruntime.capi. Error: '{0}'.".format(str(e)))
Traceback (most recent call last):
File "", line 1, in
File "C:\onnxruntime\build\Windows\Release\Release\onnxruntime_init_.py", line 16, in
from onnxruntime.capi._pybind_state import get_device
ImportError: cannot import name 'get_device' from 'onnxruntime.capi._pybind_state' (C:\onnxruntime\build\Windows\Release\Release\onnxruntime\capi_pybind_state.py)

@jywu-msft jywu-msft requested a review from a team as a code owner December 5, 2020 00:38
@jywu-msft jywu-msft merged commit 020efc9 into master Dec 7, 2020
@jywu-msft jywu-msft deleted the jywu_py_fix branch December 7, 2020 18:09
@snnn
Copy link
Contributor

snnn commented Dec 7, 2020

Had a discussion with George, he said:

  1. If user had multiple cuda installs, it's their responsibility to set CUDA_PATH to the version they want to use
  2. Nvidia instructions say to install cudnn libs in same location as cuda libs

Therefore, to make the pipelines fully work, we need to update the CI build machines to

  1. Remove CUDA_PATH from the system wide environment variable lists, to avoid confusion. And set it up at the beginning of each Windows GPU build.
  2. Update cudnn's installation location. And we should update build.py and make the "--cudnn_home" build option deprecated on Windows.

@jywu-msft
Copy link
Member Author

Had a discussion with George, he said:

  1. If user had multiple cuda installs, it's their responsibility to set CUDA_PATH to the version they want to use
  2. Nvidia instructions say to install cudnn libs in same location as cuda libs

Therefore, to make the pipelines fully work, we need to update the CI build machines to

  1. Remove CUDA_PATH from the system wide environment variable lists, to avoid confusion. And set it up at the beginning of each Windows GPU build.
  2. Update cudnn's installation location. And we should update build.py and make the "--cudnn_home" build option deprecated on Windows.

for consistency, i don't think we need to deprecate cudnn_home just for windows. it can point to same location as cuda_home
otherwise, we should deprecate it for all platforms.

@ivanst0
Copy link
Member

ivanst0 commented Dec 7, 2020

#5956 doesn't work.
os.add_dll_directory() must be called before onnxruntime.capi._pybind_state gets imported.

I don't think this is true.

#5956 did work fine for onnxruntime-gpu when CUDA/cuDNN are properly installed. The problem with that fix was that it didn't work for CPU version of onnxruntime package (complaining about missing CUDA/cuDNN), which wasn't caught because the relevant tests were disabled at the time.

The second statement should be

os.add_dll_directory() must be called before onnxruntime.capi._pybind_state onnxruntime.capi.onnxruntime_pybind11_state gets imported.

Therefore, there was no reason to move os.add_dll_directory() to from onnxruntime/python/_pybind_state.py to onnxruntime/__init__.py.

@jywu-msft
Copy link
Member Author

jywu-msft commented Dec 7, 2020

#5956 doesn't work.
os.add_dll_directory() must be called before onnxruntime.capi._pybind_state gets imported.

I don't think this is true.

#5956 did work fine for onnxruntime-gpu when CUDA/cuDNN are properly installed. The problem with that fix was that it didn't work for CPU version of onnxruntime package (complaining about missing CUDA/cuDNN), which wasn't caught because the relevant tests were disabled at the time.

The second statement should be

os.add_dll_directory() must be called before onnxruntime.capi._pybind_state onnxruntime.capi.onnxruntime_pybind11_state gets imported.

Therefore, there was no reason to move os.add_dll_directory() to from onnxruntime/python/_pybind_state.py to onnxruntime/__init__.py.

yes. it is true. #5956 did not work for a python cuda build with python 3.8+ on Windows.
I tested it.
when you import onnxruntime, it calls __init__.py , which imports from capi._pybind_state
see: https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/__init__.py#L24

and if you look at capi._pybind_state.py ,
https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/_pybind_state.py#L12
it imports * from onnxruntime.capi.onnxruntime_pybind11_state

and that is where you will get the error i pasted above

import onnxruntime as ort
C:\onnxruntime\build\Windows\Release\Release\onnxruntime\capi_pybind_state.py:22: UserWarning: Cannot load onnxruntime.capi. Error: 'DLL load failed while importing onnxruntime_pybind11_state: The specified module could not be found.'.
warnings.warn("Cannot load onnxruntime.capi. Error: '{0}'.".format(str(e)))
Traceback (most recent call last):
File "", line 1, in
File "C:\onnxruntime\build\Windows\Release\Release\onnxruntime_init_.py", line 16, in
from onnxruntime.capi._pybind_state import get_device
ImportError: cannot import name 'get_device' from 'onnxruntime.capi._pybind_state' (C:\onnxruntime\build\Windows\Release\Release\onnxruntime\capi_pybind_state.py)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants