Description
Extremely negative softmax inputs output NaN. This is an error caught detected in MKLDNN already (uxlfoundation/oneDNN#106 ) with a fix (https://gist.github.com/emfomenk/0386c529c5df21ae308b00d16454c48e ) in MKLDNN v0.15+ (we are v0.14).
The fix is either to:
patch MKLDNN v0.14 with the earlier fix
to upgrade the MKLDNN version in mxnet (Update MKL-DNN dependency #12953 ).
Environment info (Required)
ubuntu@ip-172-31-3-217:~$ python diagnose.py
----------Python Info----------
Version : 3.6.4
Compiler : GCC 7.2.0
Build : ('default', 'Jan 16 2018 18:10:19')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 9.0.1
Directory : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Version : 1.3.0
Directory : /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet
Commit Hash : b3be92f4a48bce62a5a8424271871c2f81c8f7f1
----------System Info----------
Platform : Linux-4.4.0-1065-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-31-3-217
release : 4.4.0-1065-aws
version : #75-Ubuntu SMP Fri Aug 10 11:14:32 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping: 3
CPU MHz: 3000.000
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0012 sec, LOAD: 0.4806 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1717 sec, LOAD: 0.5293 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1596 sec, LOAD: 0.3734 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0262 sec, LOAD: 0.1173 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0013 sec, LOAD: 0.3264 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0118 sec, LOAD: 0.0690 sec.
Package used (Python/R/Scala/Julia):
Python
For Scala user, please provide:
Java version: (java -version)
Maven version: (mvn -version)
Scala runtime if applicable: (scala -version)
For R user, please provide R sessionInfo():
Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio):
MXNet commit hash:
6b5d9f9
Build config:
MKLDNN (pip install mxnet-mkl)
Error Message:
ubuntu@ip-172-31-3-217:~/incubator-mxnet$ python tt.py
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
[
[[[[nan nan]]]]
<NDArray 1x1x1x2 @cpu(0)>]
Minimum reproducible example
import mxnet as mx
input_data = mx.nd.array([[[[-1e30,-1e30]]]])
data = mx.sym.Variable('data')
out1 = data.softmax(axis=1)
exec1 = out1.bind(mx.cpu(), args={'data': input_data, 'softmax_label': mx.nd.ones([1]), 'fc_weight': mx.nd.ones([2,2]), 'fc1_weight': mx.nd.ones([2,2])})
exec1.forward()[0].wait_to_read()
print(exec1.outputs)
Steps to reproduce
Run the following script.
What have you tried to solve it?
Applying this one line fix (https://gist.github.com/emfomenk/0386c529c5df21ae308b00d16454c48e ) in mkldnn fixes the issue.
Description
Extremely negative softmax inputs output NaN. This is an error caught detected in MKLDNN already (uxlfoundation/oneDNN#106) with a fix (https://gist.github.com/emfomenk/0386c529c5df21ae308b00d16454c48e) in MKLDNN v0.15+ (we are v0.14).
The fix is either to:
Environment info (Required)
Package used (Python/R/Scala/Julia):
Python
For Scala user, please provide:
java -version)mvn -version)scala -version)For R user, please provide R
sessionInfo():Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio):
MXNet commit hash:
6b5d9f9
Build config:
MKLDNN (pip install mxnet-mkl)
Error Message:
Minimum reproducible example
Steps to reproduce
Run the following script.
What have you tried to solve it?
Applying this one line fix (https://gist.github.com/emfomenk/0386c529c5df21ae308b00d16454c48e) in mkldnn fixes the issue.