This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Autograd fails when using take operator repeatedly #11599
Closed
Description
Here is a strange example in which a backward pass on the accumulated sum of a 2-D NDArray `sc` may cause autograd to fail, depending on the shape of `sc`.
Minimal reproducible example
TL;DR: this code computes `x = sum(sc[i, j] for all i, j)` and then invokes a backward pass with `x.backward()`. The error message says the gradient could not be calculated w.r.t. some variables, even though I never required gradients for non-differentiable variables like `i` or `j`.
```python
import mxnet as mx


def _array(shape):
    """Create an NDArray with random entries and the given shape"""
    return mx.nd.random.uniform(-1.0, 1.0, shape=shape, dtype="float32")


def index(sc, i, j):
    """Equivalent to sc[i, j] in numpy"""
    return sc.take(i).squeeze(axis=0) \
             .take(j).squeeze(axis=0)


# tweaking `sc` in different shapes, the code sometimes fails, sometimes not
row_len, col_len = 2, 8

# the scanned variable
sc = _array((row_len, col_len))
sc.attach_grad()  # we only require the gradient w.r.t. `sc`

# `i`, `j` are loop variables which don't require grad
i = mx.nd.array([0], dtype="int64")
j = mx.nd.array([0], dtype="int64")

with mx.autograd.record(train_mode=True):
    xs = []
    for _ in range(row_len):
        x_i = []
        for _ in range(col_len):
            x_ij = index(sc, i, j)
            x_i.append(x_ij)
            j = j + 1
        i = i + 1
        j = j - col_len  # reset j
        xs.append(mx.nd.stack(*x_i))
    x = mx.nd.stack(*xs)
    x = x.sum()

x.backward()
print(sc.grad.asnumpy())  # the expected result should be all `1`s
```

Error Message

```
>> python2 test.py
/home/ubuntu/anaconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "test.py", line 33, in <module>
print(sc.grad.asnumpy())
File "/home/ubuntu/Projects/mxnet/python/mxnet/ndarray/ndarray.py", line 1910, in asnumpy
ctypes.c_size_t(data.size)))
File "/home/ubuntu/Projects/mxnet/python/mxnet/base.py", line 210, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [06:43:28] src/operator/tensor/./indexing_op.h:829: Check failed: req[take_::kIdx] == kNullOp (1 vs. 0) take layer doesn't support gradient into index
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f740cb6231b]
[bt] (1) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f740cb62b58]
[bt] (2) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::op::TakeOpBackward<mshadow::cpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x220) [0x7f740e46c710]
[bt] (3) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x291) [0x7f740f26a7b1]
[bt] (4) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::engine::ThreadedEngine::BulkAppend(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x68) [0x7f740f7a7538]
[bt] (5) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::engine::ThreadedEngine::BulkAppend(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x47) [0x7f740f7a7517]
[bt] (6) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::engine::ThreadedEngine::BulkAppend(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x47) [0x7f740f7a7517]
[bt] (7) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::engine::ThreadedEngine::BulkAppend(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x47) [0x7f740f7a7517]
[bt] (8) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::engine::ThreadedEngine::BulkAppend(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x47) [0x7f740f7a7517]
[bt] (9) /home/ubuntu/Projects/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::engine::ThreadedEngine::BulkAppend(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x47) [0x7f740f7a7517]
```

Steps to reproduce
- Save the code as `test.py` and run `python2 test.py`.
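A workaround that appears to avoid the check failure (my own sketch, not a confirmed fix): keep the index arithmetic out of the recorded graph by rebuilding `i` and `j` from plain Python ints on every iteration, so `take` never sees indices that carry autograd history.

```python
# Workaround sketch (assumption, not a confirmed fix): build the indices from
# plain Python ints on every iteration. The fresh NDArrays carry no recorded
# history, so `take` should never be asked for a gradient into its indices.
import mxnet as mx

row_len, col_len = 2, 8
sc = mx.nd.random.uniform(-1.0, 1.0, shape=(row_len, col_len), dtype="float32")
sc.attach_grad()

with mx.autograd.record(train_mode=True):
    xs = []
    for ii in range(row_len):
        x_i = []
        for jj in range(col_len):
            i = mx.nd.array([ii], dtype="int64")  # fresh, untracked index
            j = mx.nd.array([jj], dtype="int64")
            x_i.append(sc.take(i).squeeze(axis=0).take(j).squeeze(axis=0))
        xs.append(mx.nd.stack(*x_i))
    x = mx.nd.stack(*xs).sum()

x.backward()
print(sc.grad.asnumpy())  # should be all ones if the workaround holds
```

With no recorded history on the indices, the backward pass has no reason to request a gradient into `take`'s index input.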
What have you tried to solve it?
I have no idea where this comes from. It seems weird to me that the behavior changes when `row_len` and `col_len` are tweaked. I have only observed the following (a probe script that sweeps these shapes is sketched after the list):
- When `row_len = 1`, it never fails.
- When `row_len = 2` and `col_len < 8`, it doesn't fail; when `col_len >= 8`, it fails.
- When `row_len = 3` and `col_len < 7`, it doesn't fail; when `col_len >= 7`, it fails.
- When `row_len = 4` and `col_len < 7`, it doesn't fail; when `col_len >= 7`, it fails.
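The pattern above was found by hand. A probe along these lines (my own sketch, wrapping the reproducer in a function) sweeps `row_len` and `col_len` and reports which shapes trigger the error:

```python
# Probe sketch (assumption: the reproducer above, wrapped in a function) that
# sweeps row_len/col_len and reports which shapes trigger the MXNetError.
import mxnet as mx
from mxnet.base import MXNetError

def run_case(row_len, col_len):
    sc = mx.nd.random.uniform(-1.0, 1.0, shape=(row_len, col_len), dtype="float32")
    sc.attach_grad()
    i = mx.nd.array([0], dtype="int64")
    j = mx.nd.array([0], dtype="int64")
    with mx.autograd.record(train_mode=True):
        xs = []
        for _ in range(row_len):
            x_i = []
            for _ in range(col_len):
                x_i.append(sc.take(i).squeeze(axis=0).take(j).squeeze(axis=0))
                j = j + 1
            i = i + 1
            j = j - col_len  # reset j
            xs.append(mx.nd.stack(*x_i))
        x = mx.nd.stack(*xs).sum()
    x.backward()
    sc.grad.asnumpy()  # force a sync so any engine-side error surfaces here

for r in range(1, 5):
    for c in range(1, 10):
        try:
            run_case(r, c)
            print(r, c, "ok")
        except MXNetError:
            print(r, c, "FAILS")
```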
Environment info
Probably not relevant, but included for completeness.
```
----------Python Info----------
('Version :', '2.7.15')
('Compiler :', 'GCC 7.2.0')
('Build :', ('default', 'May 1 2018 23:32:55'))
('Arch :', ('64bit', ''))
------------Pip Info-----------
('Version :', '10.0.1')
('Directory :', '/home/ubuntu/anaconda2/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
/home/ubuntu/anaconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
('Version :', '1.3.0')
('Directory :', '/home/ubuntu/Projects/junru-mxnet/python/mxnet')
Hashtag not found. Not installed from pre-built package.
----------System Info----------
('Platform :', 'Linux-4.4.0-1062-aws-x86_64-with-debian-stretch-sid')
('system :', 'Linux')
('node :', 'ip-172-31-42-30')
('release :', '4.4.0-1062-aws')
('version :', '#71-Ubuntu SMP Fri Jun 15 10:07:39 UTC 2018')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping: 3
CPU MHz: 3000.000
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0019 sec, LOAD: 0.5934 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0070 sec, LOAD: 0.1024 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0099 sec, LOAD: 0.5567 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0035 sec, LOAD: 0.0305 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0994 sec, LOAD: 0.5750 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1169 sec, LOAD: 0.5358 sec.
```

Build info
Compiler: gcc
MXNet commit hash: dd954b4 (current HEAD)
Build config: default