For bugs or installation issues, please provide the following information.
The more information you provide, the more likely people will be able to help you.
Environment info
Operating System:
Ubuntu 16.04.3 LTS
Compiler:
hipcc / hcc (clang 6, see version output below)
hipcc --version
HIP version: 1.4.17494
HCC clang version 6.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 42ceed861a212d9bd0aef883ee7981144f3ecc02) (ssh://gerritgit/compute/ec/hcc-tot/llvm 23e086be6f627e6e983c6789d2e77da6bf85ebb6) (based on HCC 1.1.17493-2f85d8a-42ceed8-23e086b )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
Package used (Python/R/Scala/Julia):
MXNet version:
Or if installed from source:
MXNet commit hash (git rev-parse HEAD):
d053ae8
If you are using python package, please provide
Python version and distribution:
If you are using R package, please provide
R sessionInfo():
Error Message:
Please paste the full error message, including stack trace.
The initial issue was that, with the latest ROCm (1.7.60) installed from the repositories, there was a problem with rocBLAS and hcRNG was missing, so I built both from git. hcFFT was available as expected. At this point MXNet appears to compile, but multiple errors are reported. I have attached a build log from the second build attempt so that it is less noisy.
I am also using CUDA 9.1, but I did try CUDA 8, which also failed. The environment variables in both cases were:
LD_LIBRARY_PATH=/usr/local/cuda/lib64 (this is symlinked to CUDA 8 or 9.1, depending on which is installed)
HIP_PLATFORM=hcc
The current git version of MXNet also does not need the Makefile modification presented, since it is already there.
build.log
Minimum reproducible example
if you are using your own code, please provide a short script that reproduces the error.
Steps to reproduce
or if you are running standard examples, please provide the commands you have run that lead to the error.
1. make -j $(nproc)
What have you tried to solve it?
The first stoppage in the log...
41 warnings and 2 errors generated.
Died at /opt/rocm/bin/hipcc line 500
...refers to a line in the hipcc script...
495 if ($runCmd) {
496 if ($HIP_PLATFORM eq "hcc" and exists($hipConfig{'HCC_VERSION'}) and $HCC_VERSION ne $hipConfig{'HCC_VERSION'}) {
497 print ("HIP ($HIP_PATH) was built using hcc $hipConfig{'HCC_VERSION'}, but you are using $HCC_HOME/hcc with version $HCC_VERSION from hipcc. Please rebuild HIP including cmake or update HCC_HOME variable.\n") ;
498 die unless $ENV{'HIP_IGNORE_HCC_VERSION'};
499 }
500 system ("$CMD") and die ();
501 }
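Note that line 500 is the `system` call itself rather than the version check on line 498, so the `Died` message most likely just reflects the compile command exiting non-zero because of the 2 errors reported above it. To locate those errors among the warnings in build.log, here is a minimal sketch. It assumes the log contains clang-style `file:line:col: error: message` diagnostics. `first_errors` is a hypothetical helper written for illustration, not part of hipcc or MXNet.

```python
import re

# Assumption: diagnostics follow the usual clang layout, e.g.
#   path/to/file.cu:42:7: error: some message
# Warnings and notes are skipped on purpose; only errors are kept.
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?:fatal error|error): (?P<msg>.*)$"
)


def first_errors(log_text, limit=5):
    """Return up to `limit` (file, line, message) tuples for error diagnostics."""
    found = []
    for raw in log_text.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match:
            found.append((match["file"], int(match["line"]), match["msg"]))
            if len(found) >= limit:
                break
    return found
```

Reading the attached log with `first_errors(open("build.log", errors="replace").read())` should list the first few failing source locations, which are usually more informative than the final `Died at` line.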
However, my HIP configuration appears to be good...
hipconfig
HIP version : 1.4.17494
== hipconfig
HIP_PATH : /opt/rocm
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/include -I/opt/rocm/hcc/include
== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 6.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 42ceed861a212d9bd0aef883ee7981144f3ecc02) (ssh://gerritgit/compute/ec/hcc-tot/llvm 23e086be6f627e6e983c6789d2e77da6bf85ebb6) (based on HCC 1.1.17493-2f85d8a-42ceed8-23e086b )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
LLVM version 6.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: znver1
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc-1.0/include -I/opt/rocm/include
HCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc-1.0/lib -Wl,--rpath=/opt/rocm/hcc-1.0/lib -ldl -lm -lpthread -lunwind -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive
=== Environment Variables
PATH=/opt/rocm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH=/usr/local/cuda/lib64
HIP_PLATFORM=hcc
== Linux Kernel
Hostname :
Linux 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
~ ~ ~
I'm not sure what to try next. My guess is that there are some function differences between the MXNet code and the libraries it depends on, but I don't know how to resolve that.