This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Failing KVStore test dist-kvstore tests GPU #11441

@marcoabreu

Description

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11433/1/pipeline/1569
The dist-kvstore GPU test fails in this run with many errors. One worker aborts with `duplicate init of key 3`:

+ ../../tools/launch.py -n 7 --launcher local python dist_sync_kvstore.py

worker 0 is initialized

worker 3 is initialized

worker 4 is initialized

worker 2 is initialized

worker 5 is initialized

worker 1 is initialized

worker 6 is initialized

worker 6 is done with non compression tests

worker 1 is done with non compression tests

Traceback (most recent call last):

  File "dist_sync_kvstore.py", line 384, in <module>

    kv = init_kv()

  File "dist_sync_kvstore.py", line 72, in init_kv

    kv.init(keys_shape, [mx.nd.ones(shape)] * len(keys_shape))

  File "../../python/mxnet/kvstore.py", line 154, in init

    check_call(_LIB.MXKVStoreInitEx(self.handle, mx_uint(len(ckeys)), ckeys, cvals))

  File "../../python/mxnet/base.py", line 210, in check_call

    raise MXNetError(py_str(_LIB.MXGetLastError()))

mxnet.base.MXNetError: [23:38:17] src/kvstore/./kvstore_local.h:84: Check failed: str_key_dict_.find(str_key) == str_key_dict_.end() duplicate init of key 3



Stack trace returned 10 entries:

[bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f2efc0db08b]

[bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f2efc0dbbf8]

[bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::kvstore::KVStoreLocal::Init(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x50d) [0x7f2efedb375d]

[bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXKVStoreInitEx+0x4be) [0x7f2efed2721e]

[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f2f5d8e7e40]

[bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f2f5d8e78ab]

[bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7f2f5daf73df]

[bt] (7) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7f2f5dafbd82]

[bt] (8) python(PyEval_EvalFrameEx+0x578f) [0x4c15bf]

[bt] (9) python(PyEval_EvalCodeEx+0x306) [0x4b9ab6]
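
For anyone reading the trace: the failed check is `str_key_dict_.find(str_key) == str_key_dict_.end()`, so the local KVStore refuses to `init` a string key that it has already registered. Below is a minimal Python sketch of that kind of guard. `KeyRegistry` and `DuplicateKeyError` are hypothetical names for illustration, and the id assignment is an assumption. This is not MXNet's actual implementation, and it does not explain why the key was initialized twice in this run.

```python
class DuplicateKeyError(RuntimeError):
    """Raised when a key is initialized more than once (hypothetical name)."""


class KeyRegistry:
    """Toy stand-in for string-key bookkeeping; not MXNet's actual code."""

    def __init__(self):
        self._str_key_dict = {}  # string key -> integer id (assumed layout)
        self._next_id = 0

    def init(self, keys):
        for key in keys:
            # Same idea as the failed check: the key must not be registered yet.
            if key in self._str_key_dict:
                raise DuplicateKeyError("duplicate init of key %s" % key)
            self._str_key_dict[key] = self._next_id
            self._next_id += 1


registry = KeyRegistry()
registry.init(["3", "5", "7"])      # first init succeeds

try:
    registry.init(["3"])            # second init of the same key is rejected
except DuplicateKeyError as err:
    message = str(err)
    print(message)
```

In the sketch, calling `init` twice with the same key on one registry object triggers the error. Whether the test is reusing a store or hitting something else is not determined by the log above.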
