This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Running the MXNet-Horovod example incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py on mxnet1.8-cuda11.0 with Python 3.7 ends in a segmentation fault. The segfault occurs after the example script has finished.
The same script works fine on mxnet1.8-cuda10.2 with Python 3.7 and on mxnet1.8-cuda11.0 with Python 3.6.
To Reproduce
Steps to reproduce
1. Launch an EC2 p3.8xlarge GPU instance with the DLAMI: ami-02440419a5afe47ab
2. Build mx1.8-cu110 from source
3. Install Horovod: python3 -m pip install horovod
4. To reproduce the error, run: LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH python3 incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py
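Because the crash happens only after the script has finished, it can help to run the example from a small wrapper that reports how the process actually ended. The following is a minimal sketch, not part of the original report: the helper names describe_exit and run_and_report are hypothetical, it assumes a POSIX system (where a negative return code means the child was killed by a signal), and it relies on CPython's standard -X faulthandler option to dump a Python traceback if a segfault occurs. The child inherits the caller's environment, so LD_LIBRARY_PATH should be set as in the steps above before using it.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short description."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


def run_and_report(script_args):
    """Run a Python script with faulthandler enabled and describe its exit.

    script_args is the script path followed by its arguments (hypothetical
    helper; the child process inherits the current environment).
    """
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)
```

For the case in this report, one would call run_and_report(["incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py"]); a result of "killed by SIGSEGV" confirms that the interpreter crashed during shutdown rather than exiting normally.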