Relocation truncation issues #17045
Description
Depending on the compile options, libmxnet.so can grow so large that linking fails. This was observed before on CI with test coverage enabled (#15971), but it also happens in builds without test coverage, such as a -DUSE_INT64_TENSOR_SIZE=ON build.
I first observed this in #17031 (http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-17031/runs/6/nodes/52/steps/84/log/?start=0), but it is easy to reproduce on the master branch when building with GCC 7.4.
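For context on why binary size matters here: relocation types such as R_X86_64_PC32 store a signed 32-bit displacement, so the target has to lie within roughly 2 GiB of the referencing instruction. The following is a minimal sketch of that range check, not the linker's actual code; the function names and the addresses are made up for illustration.

```python
# Sketch of the range check behind "relocation truncated to fit".
# For a PC-relative relocation the linker stores S + A - P
# (symbol address + addend - address of the place being patched)
# in a signed 32-bit field.

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def pc32_value(symbol_addr: int, addend: int, place_addr: int) -> int:
    """Displacement the linker would store: S + A - P."""
    return symbol_addr + addend - place_addr


def fits_in_pc32(symbol_addr: int, addend: int, place_addr: int) -> bool:
    """True if the displacement fits in a signed 32-bit field."""
    value = pc32_value(symbol_addr, addend, place_addr)
    return INT32_MIN <= value <= INT32_MAX


if __name__ == "__main__":
    # Hypothetical addresses, chosen only to show both outcomes.
    place = 0x1000
    near_target = place + 1 * 2**30  # 1 GiB away
    far_target = place + 3 * 2**30   # 3 GiB away
    print(fits_in_pc32(near_target, -4, place))  # True
    print(fits_in_pc32(far_target, -4, place))   # False
```

Once code and data in the link output span more than about 2 GiB, some reference of this kind can no longer reach its target, which is consistent with the errors below.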
Error Message
From the CI
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libmxnet.so
crtstuff.c:(.text+0x1e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x43): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x4a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libmxnet.so
crtstuff.c:(.text+0x6b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9c): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
crtstuff.c:(.text+0xaa): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o
crtstuff.c:(.text+0xbb): additional relocation overflows omitted from the output
libmxnet.so: PC-relative offset overflow in PLT entry for `_ZN5mxnet2op8mxnet_op6KernelINS0_9pick_gradILi3ELb0EEEN7mshadow3gpuEE6LaunchIJPdS9_PfiiNS5_5ShapeILi3EEESC_EEEvPNS5_6StreamIS6_EEiDpT_'
collect2: error: ld returned 1 exit status
FAILED: : && /tmp/ccache-redirects/g++ -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -std=c++11 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fopenmp -std=c++0x -O3 -DNDEBUG tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/engine_shutdown_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/thread_local_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/threaded_engine_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/kvstore/gpu_topology_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/libinfo_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/activation_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/batchnorm_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/coreop_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/dropout_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/fully_conn_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/krprod_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/mkldnn_operator_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/mkldnn_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/runner/core_op_runner_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/slice_channel_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/tune/operator_tune_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/storage/storage_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/test_main.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cmake_device_link.o -o tests/mxnet_unit_tests -L/usr/local/cuda/lib64 -L/work/build/3rdparty/tvm -L/usr/local/cuda/targets/x86_64-linux/lib -Wl,-rpath,/usr/local/cuda/lib64:/work/build/3rdparty/openmp/runtime/src:/work/build/3rdparty/tvm lib/libgtest.a -Wl,--whole-archive libmxnet.a -Wl,--no-whole-archive 3rdparty/dmlc-core/libdmlc.a /usr/local/cuda/lib64/libnvToolsExt.so 
/usr/lib/libopenblas.so /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libjemalloc.so /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9 /usr/lib/x86_64-linux-gnu/libopencv_imgproc.so.2.4.9 3rdparty/openmp/runtime/src/libomp.so -lpthread -llapack /usr/lib/x86_64-linux-gnu/libjemalloc.so /usr/lib/x86_64-linux-gnu/libcudnn.so -lcublas -lcufft -lcusolver -lcurand -lnvrtc -lcuda /usr/lib/x86_64-linux-gnu/libprotobuf.so /usr/lib/x86_64-linux-gnu/libzmq.so 3rdparty/ps-lite/libpslite.a -lprotobuf -ltvm_runtime /usr/lib/x86_64-linux-gnu/libzmq.so 3rdparty/ps-lite/libpslite.a -lprotobuf -lrt -lpthread -llapack /usr/lib/x86_64-linux-gnu/libcudnn.so -lcublas -lcufft -lcusolver -lcurand -lnvrtc -lcuda /usr/lib/x86_64-linux-gnu/libprotobuf.so /usr/lib/x86_64-linux-gnu/libzmq.so -lprotobuf -ltvm_runtime /usr/lib/x86_64-linux-gnu/libzmq.so -lprotobuf -ltvm_runtime /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4.9 -ldl -lpthread -lcudadevrt -lcudart_static -lrt -lpthread -ldl && :
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x8): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `register_tm_clones':
crtstuff.c:(.text+0x49): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x82): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x95): relocation truncated to fit: R_X86_64_PC32 against `.bss'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/engine_shutdown_test.cc.o: In function `EngineShutdown_stop_without_crashing_Test::TestBody()':
engine_shutdown_test.cc:(.text+0xf8): relocation truncated to fit: R_X86_64_PC32 against `.bss'
engine_shutdown_test.cc:(.text+0x130): relocation truncated to fit: R_X86_64_PC32 against `.bss'
engine_shutdown_test.cc:(.text+0x137): relocation truncated to fit: R_X86_64_PC32 against `.bss'
engine_shutdown_test.cc:(.text+0x15d): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libpthread.so.0
engine_shutdown_test.cc:(.text+0x18d): additional relocation overflows omitted from the output
tests/mxnet_unit_tests: PC-relative offset overflow in PLT entry for `nvrtcGetPTX@@libnvrtc.so.10.1'
collect2: error: ld returned 1 exit status
Compiling the master version with GCC on Ubuntu 18.04 (Deep Learning AMI) gives an equivalent error message (though with slightly different wording due to GCC vs Clang).
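To compare logs from different toolchains, it can help to tally which relocation types overflowed. This is a small sketch that assumes the linker output has been saved as text; the helper name `count_truncated_relocations` is hypothetical, and the sample lines are copied from the CI log above.

```python
import re
from collections import Counter

# Matches the relocation type in GNU ld's "relocation truncated to fit" messages.
_RELOC_RE = re.compile(r"relocation truncated to fit: (R_[A-Z0-9_]+) against")


def count_truncated_relocations(log_text: str) -> Counter:
    """Count overflowing relocations in linker output, keyed by relocation type."""
    return Counter(_RELOC_RE.findall(log_text))


if __name__ == "__main__":
    # Three lines taken from the CI log in this issue.
    sample = """\
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
"""
    for reloc_type, count in count_truncated_relocations(sample).most_common():
        print(f"{reloc_type}: {count}")
```

Note that ld stops listing overflows after a few entries ("additional relocation overflows omitted from the output"), so the counts are a lower bound.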
To Reproduce
cmake -DUSE_SIGNAL_HANDLER=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DPython3_EXECUTABLE=/usr/bin/python3 -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_MKLDNN=OFF -DUSE_DIST_KVSTORE=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN=52,70 -DUSE_INT64_TENSOR_SIZE=ON ..
Run the command above on Ubuntu 18.04 (gcc 7.4, ld 2.3). The CMake options are taken from the build_ubuntu_gpu_large_tensor CI run.
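If a build does produce a library, one way to see how close it is to the limit is to look at the address span of its sections. The sketch below parses the SysV-format output of binutils `size -A`; the helper names are hypothetical, and comparing the span with 2 GiB is only a rough heuristic, since what actually matters is the distance between each reference and its target.

```python
import subprocess
import sys

PC32_REACH = 2**31  # bytes covered by a signed 32-bit displacement in one direction


def parse_size_sysv(text: str) -> dict:
    """Parse `size -A` output into {section_name: (size, addr)}."""
    sections = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have three fields: name, decimal size, decimal address.
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            sections[parts[0]] = (int(parts[1]), int(parts[2]))
    return sections


def address_span(sections: dict) -> int:
    """Bytes from the lowest to the highest mapped address (sections with addr 0 are skipped)."""
    mapped = [(size, addr) for size, addr in sections.values() if addr != 0]
    if not mapped:
        return 0
    low = min(addr for _, addr in mapped)
    high = max(addr + size for size, addr in mapped)
    return high - low


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is an example): python check_span.py libmxnet.so
    output = subprocess.run(
        ["size", "-A", sys.argv[1]], check=True, capture_output=True, text=True
    ).stdout
    span = address_span(parse_size_sysv(output))
    print(f"address span: {span} bytes; exceeds 2 GiB: {span > PC32_REACH}")
```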
Environment
Environment used for reproducing the error with master version of MXNet.
----------Python Info----------
Version : 3.8.0
Compiler : GCC 7.4.0
Build : ('default', 'Dec 8 2019 08:07:09')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 19.2.3
Directory : /home/ubuntu/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /home/ubuntu/src/mxnet-dc/python/mxnet
Num GPUs : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Linux-4.15.0-1056-aws-x86_64-with-glibc2.27
system : Linux
node : ip-172-31-26-35
release : 4.15.0-1056-aws
version : #58-Ubuntu SMP Tue Nov 26 15:14:34 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Stepping: 7
CPU MHz: 3600.024
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0021 sec, LOAD: 0.3891 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0003 sec, LOAD: 0.3134 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0450 sec, LOAD: 0.0738 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0034 sec, LOAD: 0.0103 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0159 sec, LOAD: 0.1406 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0432 sec, LOAD: 0.3530 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0021 sec, LOAD: 0.0701 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0313 sec, LOAD: 0.1727 sec.