This repository was archived by the owner on Jan 22, 2024. It is now read-only.
system has unsupported display driver / cuda driver combination #1256
Closed
apache/mxnet#18186
Description
1. Issue or feature description
CUDA: Check failed: e == cudaSuccess (803 vs. 0) : system has unsupported display driver / cuda driver combination
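For anyone triaging a similar log, the number in the check failure is a CUDA runtime error code. Below is a minimal sketch, not part of MXNet, that pulls the code out of a message of this shape and maps it to its `cudaError_t` name. The helper name `cuda_error_name` and the small lookup table are hypothetical and only cover a few codes; 803 corresponds to `cudaErrorSystemDriverMismatch`.

```python
import re

# Small, deliberately incomplete subset of the cudaError_t enum.
CUDA_ERRORS = {
    0: "cudaSuccess",
    35: "cudaErrorInsufficientDriver",
    100: "cudaErrorNoDevice",
    803: "cudaErrorSystemDriverMismatch",
}

# Matches the "(<code> vs. 0)" part of an "e == cudaSuccess" check failure.
CHECK_RE = re.compile(r"e == cudaSuccess \((\d+) vs\. 0\)")


def cuda_error_name(message):
    """Return (code, name) for a CUDA check-failure message, or None if it does not match."""
    match = CHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CUDA_ERRORS.get(code, "unknown")


msg = (
    "CUDA: Check failed: e == cudaSuccess (803 vs. 0) : "
    "system has unsupported display driver / cuda driver combination"
)
print(cuda_error_name(msg))
```

Codes that are not in the table come back as `"unknown"` rather than being guessed.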
2. Steps to reproduce the issue
Too lengthy/not possible to share.
3. Information to attach (optional if deemed irrelevant)
- Some nvidia-container information:
nvidia-container-cli -k -d /dev/tty info
WARNING, the following logs are for debugging purposes only --
I0424 07:05:10.961294 3010 nvc.c:281] initializing library context (version=1.0.7, build=b71f87c04b8eca8a16bf60995506c35c937347d9)
I0424 07:05:10.961331 3010 nvc.c:255] using root /
I0424 07:05:10.961341 3010 nvc.c:256] using ldcache /etc/ld.so.cache
I0424 07:05:10.961348 3010 nvc.c:257] using unprivileged user 1000:1000
W0424 07:05:10.962501 3011 nvc.c:186] failed to set inheritable capabilities
W0424 07:05:10.962538 3011 nvc.c:187] skipping kernel modules load due to failure
I0424 07:05:10.962720 3012 driver.c:133] starting driver service
I0424 07:05:10.987696 3010 nvc_info.c:438] requesting driver information with ''
I0424 07:05:10.987894 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.440.33.01
I0424 07:05:10.987949 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.440.33.01
I0424 07:05:10.987987 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.33.01 over /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.440.33.01
I0424 07:05:10.988024 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.33.01
I0424 07:05:10.988073 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.33.01
I0424 07:05:10.988125 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.33.01
I0424 07:05:10.988180 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.33.01
I0424 07:05:10.988216 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.33.01
I0424 07:05:10.988268 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.440.33.01
I0424 07:05:10.988319 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.33.01
I0424 07:05:10.988354 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.33.01
I0424 07:05:10.988390 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.33.01
I0424 07:05:10.988430 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.440.33.01
I0424 07:05:10.988479 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.33.01
I0424 07:05:10.988521 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.33.01
I0424 07:05:10.988570 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.33.01
I0424 07:05:10.988606 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.33.01
I0424 07:05:10.988643 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.33.01
I0424 07:05:10.988694 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.33.01
I0424 07:05:10.988730 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.33.01
I0424 07:05:10.988853 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01
I0424 07:05:10.988939 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.440.33.01
I0424 07:05:10.988977 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.440.33.01
I0424 07:05:10.989014 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.440.33.01
I0424 07:05:10.989050 3010 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.440.33.01
W0424 07:05:10.989072 3010 nvc_info.c:303] missing library libvdpau_nvidia.so
W0424 07:05:10.989079 3010 nvc_info.c:307] missing compat32 library libnvidia-ml.so
W0424 07:05:10.989087 3010 nvc_info.c:307] missing compat32 library libnvidia-cfg.so
W0424 07:05:10.989094 3010 nvc_info.c:307] missing compat32 library libcuda.so
W0424 07:05:10.989099 3010 nvc_info.c:307] missing compat32 library libnvidia-opencl.so
W0424 07:05:10.989104 3010 nvc_info.c:307] missing compat32 library libnvidia-ptxjitcompiler.so
W0424 07:05:10.989108 3010 nvc_info.c:307] missing compat32 library libnvidia-fatbinaryloader.so
W0424 07:05:10.989119 3010 nvc_info.c:307] missing compat32 library libnvidia-compiler.so
W0424 07:05:10.989125 3010 nvc_info.c:307] missing compat32 library libvdpau_nvidia.so
W0424 07:05:10.989132 3010 nvc_info.c:307] missing compat32 library libnvidia-encode.so
W0424 07:05:10.989138 3010 nvc_info.c:307] missing compat32 library libnvidia-opticalflow.so
W0424 07:05:10.989147 3010 nvc_info.c:307] missing compat32 library libnvcuvid.so
W0424 07:05:10.989158 3010 nvc_info.c:307] missing compat32 library libnvidia-eglcore.so
W0424 07:05:10.989165 3010 nvc_info.c:307] missing compat32 library libnvidia-glcore.so
W0424 07:05:10.989175 3010 nvc_info.c:307] missing compat32 library libnvidia-tls.so
W0424 07:05:10.989180 3010 nvc_info.c:307] missing compat32 library libnvidia-glsi.so
W0424 07:05:10.989188 3010 nvc_info.c:307] missing compat32 library libnvidia-fbc.so
W0424 07:05:10.989198 3010 nvc_info.c:307] missing compat32 library libnvidia-ifr.so
W0424 07:05:10.989204 3010 nvc_info.c:307] missing compat32 library libnvidia-rtcore.so
W0424 07:05:10.989211 3010 nvc_info.c:307] missing compat32 library libnvoptix.so
W0424 07:05:10.989215 3010 nvc_info.c:307] missing compat32 library libGLX_nvidia.so
W0424 07:05:10.989222 3010 nvc_info.c:307] missing compat32 library libEGL_nvidia.so
W0424 07:05:10.989230 3010 nvc_info.c:307] missing compat32 library libGLESv2_nvidia.so
W0424 07:05:10.989240 3010 nvc_info.c:307] missing compat32 library libGLESv1_CM_nvidia.so
W0424 07:05:10.989250 3010 nvc_info.c:307] missing compat32 library libnvidia-glvkspirv.so
W0424 07:05:10.989256 3010 nvc_info.c:307] missing compat32 library libnvidia-cbl.so
I0424 07:05:10.989472 3010 nvc_info.c:233] selecting /usr/bin/nvidia-smi
I0424 07:05:10.989496 3010 nvc_info.c:233] selecting /usr/bin/nvidia-debugdump
I0424 07:05:10.989515 3010 nvc_info.c:233] selecting /usr/bin/nvidia-persistenced
I0424 07:05:10.989540 3010 nvc_info.c:233] selecting /usr/bin/nvidia-cuda-mps-control
I0424 07:05:10.989561 3010 nvc_info.c:233] selecting /usr/bin/nvidia-cuda-mps-server
I0424 07:05:10.989589 3010 nvc_info.c:370] listing device /dev/nvidiactl
I0424 07:05:10.989598 3010 nvc_info.c:370] listing device /dev/nvidia-uvm
I0424 07:05:10.989607 3010 nvc_info.c:370] listing device /dev/nvidia-uvm-tools
I0424 07:05:10.989617 3010 nvc_info.c:370] listing device /dev/nvidia-modeset
I0424 07:05:10.989650 3010 nvc_info.c:274] listing ipc /run/nvidia-persistenced/socket
W0424 07:05:10.989668 3010 nvc_info.c:278] missing ipc /tmp/nvidia-mps
I0424 07:05:10.989675 3010 nvc_info.c:494] requesting device information with ''
I0424 07:05:10.995334 3010 nvc_info.c:524] listing device /dev/nvidia0 (GPU-4cfe4f25-9d56-b1f1-edb8-dfa13fc461ae at 00000000:00:1e.0)
NVRM version: 440.33.01
CUDA version: 10.2
Device Index: 0
Device Minor: 0
Model: Tesla T4
Brand: Tesla
GPU UUID: GPU-4cfe4f25-9d56-b1f1-edb8-dfa13fc461ae
Bus Location: 00000000:00:1e.0
Architecture: 7.5
- Kernel version from
uname -a
Linux ip-172-31-32-87 4.15.0-1057-aws #59-Ubuntu SMP Wed Dec 4 10:02:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Any relevant kernel output lines from
dmesg
- Driver information from
nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Fri Apr 24 07:06:35 2020
Driver Version : 440.33.01
CUDA Version : 10.2
Attached GPUs : 1
GPU 00000000:00:1E.0
Product Name : Tesla T4
Product Brand : Tesla
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1561719002810
GPU UUID : GPU-4cfe4f25-9d56-b1f1-edb8-dfa13fc461ae
Minor Number : 0
VBIOS Version : 90.04.84.00.06
MultiGPU Board : No
Board ID : 0x1e
GPU Part Number : 900-2G183-0000-001
Inforom Version
Image Version : G183.0200.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : Pass-Through
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x00
Device : 0x1E
Domain : 0x0000
Device Id : 0x1EB810DE
Bus Id : 00000000:00:1E.0
Sub System Id : 0x12A210DE
GPU Link Info
PCIe Generation
- Docker version from
docker version
docker version
Client: Docker Engine - Community
Version: 19.03.8
API version: 1.40
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:25:46 2020
OS/Arch: linux/amd64
Experimental: false
- NVIDIA packages version from
dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
dpkg -l '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===========================-==================-==================-============================================================
un libgldispatch0-nvidia <none> <none> (no description available)
ii libnvidia-cfg1-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any <none> <none> (no description available)
un libnvidia-common <none> <none> (no description available)
ii libnvidia-common-440 440.33.01-0ubuntu1 all Shared files used by the NVIDIA libraries
ii libnvidia-compute-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA libcompute package
ii libnvidia-container-tools 1.0.7-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.0.7-1 amd64 NVIDIA container runtime library
un libnvidia-decode <none> <none> (no description available)
ii libnvidia-decode-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA Video Decoding runtime libraries
un libnvidia-encode <none> <none> (no description available)
ii libnvidia-encode-440:amd64 440.33.01-0ubuntu1 amd64 NVENC Video Encoding runtime library
un libnvidia-fbc1 <none> <none> (no description available)
ii libnvidia-fbc1-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
un libnvidia-gl <none> <none> (no description available)
ii libnvidia-gl-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un libnvidia-ifr1 <none> <none> (no description available)
ii libnvidia-ifr1-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA OpenGL-based Inband Frame Readback runtime library
un libnvidia-ml1 <none> <none> (no description available)
un nvidia-304 <none> <none> (no description available)
un nvidia-340 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-390 <none> <none> (no description available)
ii nvidia-compute-utils-440 440.33.01-0ubuntu1 amd64 NVIDIA compute utilities
un nvidia-container-runtime <none> <none> (no description available)
un nvidia-container-runtime-ho <none> <none> (no description available)
ii nvidia-container-toolkit 1.0.5-1 amd64 NVIDIA container runtime hook
ii nvidia-dkms-440 440.33.01-0ubuntu1 amd64 NVIDIA DKMS package
un nvidia-dkms-kernel <none> <none> (no description available)
ii nvidia-driver-440 440.33.01-0ubuntu1 amd64 NVIDIA driver metapackage
un nvidia-driver-binary <none> <none> (no description available)
un nvidia-kernel-common <none> <none> (no description available)
ii nvidia-kernel-common-440 440.33.01-0ubuntu1 amd64 Shared files used with the kernel module
un nvidia-kernel-source <none> <none> (no description available)
ii nvidia-kernel-source-440 440.33.01-0ubuntu1 amd64 NVIDIA kernel source package
un nvidia-legacy-340xx-vdpau-d <none> <none> (no description available)
ii nvidia-modprobe 440.33.01-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
un nvidia-opencl-icd <none> <none> (no description available)
un nvidia-persistenced <none> <none> (no description available)
ii nvidia-prime 0.8.8.2 all Tools to enable NVIDIA's Prime
ii nvidia-settings 440.33.01-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
un nvidia-settings-binary <none> <none> (no description available)
un nvidia-smi <none> <none> (no description available)
un nvidia-utils <none> <none> (no description available)
ii nvidia-utils-440 440.33.01-0ubuntu1 amd64 NVIDIA driver support binaries
un nvidia-vdpau-driver <none> <none> (no description available)
ii xserver-xorg-video-nvidia-4 440.33.01-0ubuntu1 amd64 NVIDIA binary Xorg driver
- NVIDIA container library version from
nvidia-container-cli -V
version: 1.0.7
build date: 2020-01-21T18:59+00:00
build revision: b71f87c04b8eca8a16bf60995506c35c937347d9
build compiler: x86_64-linux-gnu-gcc-7 7.4.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
- Docker command, image and tag used
The incubator-mxnet repo uses the nvidia/cuda-10-1 Docker image as its base image.
The command that fails for me:
sudo docker run --gpus all -v /home/ubuntu/incubator-mxnet:/work/mxnet -v /home/ubuntu/incubator-mxnet/build:/work/build mxnetci/build.ubuntu_gpu_cu101 /work/runtime_functions.sh integrationtest_ubuntu_gpu_python
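One quick sanity check on the information above is whether the host driver reports a CUDA version at least as new as the toolkit in the image. The sketch below parses the `NVRM version` and `CUDA version` fields from `nvidia-container-cli info` output and compares them against a toolkit version. The function names are hypothetical, and the toolkit version `10.1` is an assumption read off the image name `build.ubuntu_gpu_cu101`.

```python
import re


def parse_versions(info_text):
    """Extract the 'NVRM version' and 'CUDA version' fields from `nvidia-container-cli info` output."""
    fields = {}
    for line in info_text.splitlines():
        match = re.match(r"\s*(NVRM|CUDA) version:\s*(\S+)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def as_tuple(version):
    """Turn a dotted version string such as '10.2' into a comparable tuple (10, 2)."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(info_text, toolkit_version):
    """True if the driver-reported CUDA version is >= the toolkit version used in the image."""
    driver_cuda = parse_versions(info_text)["CUDA"]
    return as_tuple(driver_cuda) >= as_tuple(toolkit_version)


# Values copied from the report above.
sample = """NVRM version: 440.33.01
CUDA version: 10.2
Device Index: 0"""

print(parse_versions(sample))
print(driver_supports(sample, "10.1"))  # "10.1" is assumed from the cu101 image name
```

With the values from this report the check passes, so the host driver version alone does not look like the problem. This only compares version strings and says nothing about which `libcuda` the process inside the container actually loads.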