This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Unit test failed: ThreadSafety.CachedOpFullModel #17833

@sl1pkn07

Description

The unit test ThreadSafety.CachedOpFullModel fails.

Error Message

(Paste the complete error message. Please also include stack trace by setting environment variable DMLC_LOG_STACK_TRACE_DEPTH=10 before running your script.)

Details
 
1: [ RUN      ] ThreadSafety.CachedOpFullModel
1: [18:04:39] /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/tests/cpp/thread_safety/thread_safety_test.cc:175: Running inference for imagenet1k-resnet-18 num_threads: 20 num_inf_per_thread: 1 random_sleep: 1 static_alloc: 0 static_shape: 0
1: unknown file: Failure
1: C++ exception with description "[18:04:39] /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/tests/../cpp-package/include/mxnet-cpp/symbol.hpp:102: Check failed: (MXSymbolCreateFromFile(file_name.c_str(), &(handle))) == (0) 
1: 
1: " thrown in the test body.
1: [  FAILED  ] ThreadSafety.CachedOpFullModel (2 ms)
1: [----------] 2 tests from ThreadSafety (161 ms total)
1: 
1: [----------] Global test environment tear-down
1: [==========] 102 tests from 21 test cases ran. (774184 ms total)
1: [  PASSED  ] 101 tests.
1: [  FAILED  ] 1 test, listed below:
1: [  FAILED  ] ThreadSafety.CachedOpFullModel
1: 
1:  1 FAILED TEST
1:   YOU HAVE 1 DISABLED TEST
1: 
1: 
1/1 Test #1: AllTestsInmxnetUnitTests .........***Failed  775.58 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 775.59 sec

The following tests FAILED:
          1 - AllTestsInmxnetUnitTests (Failed)
Errors while running CTest
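
The failing check is on MXSymbolCreateFromFile, and the exception carries no further message, so one possible cause (not confirmed here) is that the symbol file for imagenet1k-resnet-18 is missing or unreadable in the directory the test runs from. A minimal sketch for ruling that out follows. The helper name check_symbol_file and the file name imagenet1k-resnet-18-symbol.json are assumptions based on MXNet's usual prefix-symbol.json naming; they are not taken from the test source.

```python
import json
from pathlib import Path


def check_symbol_file(path):
    """Classify a candidate symbol file as 'missing', 'unreadable' or 'ok'.

    Hypothetical helper: it only checks that the file exists and parses
    as JSON. It does not verify that MXNet can actually load the graph.
    """
    candidate = Path(path)
    if not candidate.is_file():
        return "missing"
    try:
        json.loads(candidate.read_text())
    except (ValueError, OSError):
        # ValueError covers both JSON syntax errors and decoding errors.
        return "unreadable"
    return "ok"
```

For example, check_symbol_file("imagenet1k-resnet-18-symbol.json") run from the directory where ctest was started would show whether the assumed file is present at all.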

NOTE: I also get the following error, but it does not make the test run fail:

Details
1: [----------] 3 tests from ACTIVATION_PERF
1: [ RUN      ] ACTIVATION_PERF.ExecuteBidirectional
1: terminate called after throwing an instance of 'dmlc::Error'
1:   what():  [17:56:41] /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mshadow/./stream_gpu-inl.h:184: Check failed: e == cudaSuccess: CUDA: initialization error
1: 
1: 
1: [       OK ] OMPBehaviour.after_fork (298654 ms)

To Reproduce

(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)

Steps to reproduce

(Paste the commands you ran that produced the error.)

  mkdir -p build
  cd build
  cmake ../incubator-mxnet \
    -DCMAKE_BUILD_TYPE=None \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DCMAKE_INSTALL_DOCDIR=share/doc/mxnet \
    -DENABLE_CUDA_RTC=ON \
    -DBUILD_CPP_EXAMPLES=OFF \
    -DUSE_CCACHE=OFF \
    -DUSE_CPP_PACKAGE=ON \
    -DUSE_CUDNN=ON \
    -DUSE_NCCL=ON \
    -DUSE_OPENCV=ON \
    -DUSE_OPENMP=ON \
    -DUSE_MKL_IF_AVAILABLE=OFF \
    -DUSE_MKLDNN=OFF \
    -DUSE_LAPACK=ON \
    -DUSE_LIBJPEG_TURBO=ON \
    -DUSE_JEMALLOC=OFF \
    -DUSE_GPERFTOOLS=OFF \
    -DUSE_DIST_KVSTORE=OFF \
    -DNCCL_ROOT=/usr \
    -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF \
    -DCUDA_HOST_COMPILER=/opt/cuda/bin/gcc \
    -DCMAKE_CUDA_FLAGS=--expt-relaxed-constexpr \
    -DCMAKE_C_COMPILER=/usr/bin/cc-8 \
    -DCMAKE_C_COMPILER_AR=/usr/bin/gcc-ar-8 \
    -DCMAKE_C_COMPILER_RANLIB=/usr/bin/gcc-ranlib-8 \
    -DCMAKE_CXX_COMPILER=/usr/bin/c++-8 \
    -DCMAKE_CXX_COMPILER_AR=/usr/bin/gcc-ar-8 \
    -DCMAKE_CXX_COMPILER_RANLIB=/usr/bin/gcc-ranlib-8

  LC_ALL=C make # VERBOSE=1
  cd tests
  MXNET_LIBRARY_PATH="$(pwd)/../build/libmxnet.so" GTEST_COLOR=1 ctest --verbose
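
The issue template asks for a stack trace via DMLC_LOG_STACK_TRACE_DEPTH=10, which the run above did not set. A minimal sketch of rerunning the same ctest command with that variable added is shown here. The helper names ctest_env and run_ctest are hypothetical, and the library path is whatever was passed as MXNET_LIBRARY_PATH in the steps above.

```python
import os
import subprocess


def ctest_env(library_path, stack_depth=10, base_env=None):
    """Return a copy of the environment with the variables used in this report."""
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_LIBRARY_PATH"] = library_path
    env["GTEST_COLOR"] = "1"
    # Requested by the issue template so the log includes a stack trace.
    env["DMLC_LOG_STACK_TRACE_DEPTH"] = str(stack_depth)
    return env


def run_ctest(tests_dir, library_path):
    """Run `ctest --verbose` inside tests_dir and return its exit code."""
    result = subprocess.run(
        ["ctest", "--verbose"],
        cwd=tests_dir,
        env=ctest_env(library_path),
    )
    return result.returncode
```

Calling run_ctest with the tests directory and the path to libmxnet.so reproduces the last command above with stack traces enabled; a nonzero return value means ctest reported a failure.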

Environment

Arch Linux
Linux sL1pKn07 5.5.4-arch1-1 #1 SMP PREEMPT Sat, 15 Feb 2020 00:36:29 +0000 x86_64 GNU/Linux

MXNet from git (master)
CUDA 10.2
GCC 8.4.0
NVIDIA RTX 2060
OpenCV 4.2.0 (built with CUDA support)

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

Details
----------Python Info----------
Version      : 3.8.2
Compiler     : GCC 9.2.1 20200130
Build        : ('default', 'Feb 26 2020 22:21:03')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 20.0.2
Directory    : /usr/lib/python3.8/site-packages/pip
----------MXNet Info-----------
Version      : 2.0.0
Directory    : /usr/lib/python3.8/site-packages/mxnet
Num GPUs     : 1
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-5.5.4-arch1-1-x86_64-with-glibc2.2.5
system       : Linux
node         : sL1pKn07
release      : 5.5.4-arch1-1
version      : #1 SMP PREEMPT Sat, 15 Feb 2020 00:36:29 +0000
----------Hardware Info----------
machine      : x86_64
processor    : 
Architecture:                        x86_64
CPU op-mode(s):                      32-bit, 64-bit
Byte Order:                          Little Endian
Address sizes:                       46 bits physical, 48 bits virtual
CPU(s):                              48
On-line CPU(s) list:                 0-47
Thread(s) per core:                  2
Core(s) per socket:                  12
Socket(s):                           2
NUMA node(s):                        2
Vendor ID:                           GenuineIntel
CPU family:                          6
Model:                               79
Model name:                          Genuine Intel(R) CPU 0000 @ 2.20GHz
Stepping:                            0
CPU MHz:                             2195.314
CPU max MHz:                         2200,0000
CPU min MHz:                         1200,0000
BogoMIPS:                            4392.60
Virtualization:                      VT-x
L1d cache:                           768 KiB
L1i cache:                           768 KiB
L2 cache:                            6 MiB
L3 cache:                            60 MiB
NUMA node0 CPU(s):                   0-11,24-35
NUMA node1 CPU(s):                   12-23,36-47
---secret ;)--- 
Flags:                               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0165 sec, LOAD: 0.7566 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0177 sec, LOAD: 0.6921 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1040 sec, LOAD: 0.7131 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0366 sec, LOAD: 0.4594 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0356 sec, LOAD: 0.3307 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1042 sec, LOAD: 0.7952 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0169 sec, LOAD: 0.6182 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0204 sec, LOAD: 0.2750 sec.
