This repository was archived by the owner on Nov 17, 2023. It is now read-only.
MKLDNN batch_norm doesn't work with Large Tensor Support #19065
Closed
Description
When running batch_norm with large inputs, e.g.:
import mxnet as mx
from mxnet import np, npx
A = np.ones((2, 1000000000))
gamma = np.ones((2))
beta = np.zeros((2))
mov_mean = np.ones((2))
mov_var = np.ones((2))
A.attach_grad()
with mx.autograd.record():
    B = npx.batch_norm(A, gamma, beta, mov_mean, mov_var)
print("output={}".format(B))
B.backward()
print("gradient={}".format(A.grad))
the program aborts with the following error:
(pytest) ubuntu@ip-172-31-0-156 ~/workspace/incubator-mxnet (mx2lts) $ python test_batch_norm.py
curr_path=/home/ubuntu/workspace/incubator-mxnet
sys_path=['/home/ubuntu/workspace/incubator-mxnet', '/home/ubuntu/workspace/incubator-mxnet/python', '/home/ubuntu/anaconda3/envs/pytest/lib/python38.zip', '/home/ubuntu/anaconda3/envs/pytest/lib/python3.8', '/home/ubuntu/anaconda3/envs/pytest/lib/python3.8/lib-dynload', '/home/ubuntu/anaconda3/envs/pytest/lib/python3.8/site-packages', '/home/ubuntu/workspace/incubator-mxnet/tests/python/unittest/']
[15:27:26] ../src/storage/storage.cc:198: Using Pooled (Naive) StorageManager for CPU
malloc_consolidate(): invalid chunk size
Aborted (core dumped)
To Reproduce
Build MXNet from source with Large Tensor Support enabled (set the build flag USE_INT64_TENSOR_SIZE to ON), then run the sample Python script above.
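Before running the reproduction, it may help to confirm how large the input really is and whether the build actually has Large Tensor Support. The sketch below is illustrative only: the helper names `tensor_footprint` and `large_tensor_build_enabled` are hypothetical, and the 4-byte element size assumes the default float32 dtype. It does not attempt to identify the cause of the crash.

```python
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def tensor_footprint(shape, itemsize=4):
    """Return (element_count, byte_count) for a dense tensor.

    itemsize=4 assumes float32 elements (an assumption, not taken from the report).
    """
    count = math.prod(shape)
    return count, count * itemsize


def large_tensor_build_enabled():
    """Report the INT64_TENSOR_SIZE build feature, or None if mxnet cannot be loaded."""
    try:
        from mxnet.runtime import Features
    except (ImportError, OSError):
        return None
    return Features().is_enabled("INT64_TENSOR_SIZE")


# Shape of A from the sample script in this report.
count, nbytes = tensor_footprint((2, 1000000000))
print("elements:", count, "exceeds int32:", count > INT32_MAX)
print("bytes:", nbytes, "exceeds int32:", nbytes > INT32_MAX)
print("INT64_TENSOR_SIZE enabled:", large_tensor_build_enabled())
```

Under these assumptions, the input alone needs roughly 8 GB of host memory, and the output and gradient arrays are the same size, so the machine needs several times that to run the script at all.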
Environment
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python
(pytest) ubuntu@ip-172-31-0-156 ~/workspace/incubator-mxnet (mx2lts) $ curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python
----------Python Info----------
Version : 3.8.5
Compiler : GCC 7.3.0
Build : ('default', 'Aug 5 2020 08:36:46')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 20.2.2
Directory : /home/ubuntu/anaconda3/envs/pytest/lib/python3.8/site-packages/pip
----------MXNet Info-----------
Version : 2.0.0
Directory : /home/ubuntu/workspace/incubator-mxnet/python/mxnet
Commit hash file "/home/ubuntu/workspace/incubator-mxnet/python/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library : ['/home/ubuntu/workspace/incubator-mxnet/python/mxnet/../../build/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✔ F16C
✖ JEMALLOC
✔ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ DIST_KVSTORE
✔ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✔ DEBUG
✖ TVM_OP
----------System Info----------
Platform : Linux-5.3.0-1032-aws-x86_64-with-glibc2.10
system : Linux
node : ip-172-31-0-156
release : 5.3.0-1032-aws
version : #34~18.04.2-Ubuntu SMP Fri Jul 24 10:06:28 UTC 2020
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2456.934
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.00
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0028 sec, LOAD: 0.4208 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1488 sec, LOAD: 0.2241 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.2317 sec, LOAD: 0.4453 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0043 sec, LOAD: 0.1646 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0047 sec, LOAD: 0.1006 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.01151585578918457 sec.
----------Environment----------