This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[v1.x] test_gluon_data unit tests failing #19877

@josephevans

Description

On the v1.x pipeline, the following tests in tests/python/unittest/test_gluon_data.py are failing consistently:

test_multi_worker_dataloader_release_pool
test_multi_worker_forked_data_loader

Occurrences

https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-19872/7/pipeline/293/#step-776-log-1725
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-19872/4/pipeline/296

Test failure logs:

[2021-02-10T01:39:46.205Z] test_gluon_data.test_multi_worker_dataloader_release_pool ... terminate called after throwing an instance of 'dmlc::Error'
[2021-02-10T01:39:46.205Z]   what():  [01:39:41] src/storage/./cpu_shared_storage_manager.h:218: Check failed: count >= 0 (-2 vs. 0) : 
[2021-02-10T01:39:46.205Z] Stack trace:
[2021-02-10T01:39:46.205Z]   [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x61) [0x7f191fc63b61]
[2021-02-10T01:39:46.205Z]   [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::storage::CPUSharedStorageManager::FreeImpl(mxnet::Storage::Handle const&)+0xd3) [0x7f192522fdf3]
[2021-02-10T01:39:46.205Z]   [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::storage::CPUSharedStorageManager::Free(mxnet::Storage::Handle)+0x98) [0x7f1925237348]
[2021-02-10T01:39:46.205Z]   [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::StorageImpl::Free(mxnet::Storage::Handle)+0x69) [0x7f1925232ce9]
[2021-02-10T01:39:46.205Z]   [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x5ade409) [0x7f1924b21409]
[2021-02-10T01:39:46.205Z]   [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x61d3c50) [0x7f1925216c50]
[2021-02-10T01:39:46.205Z]   [bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0xa50) [0x7f1925210440]
[2021-02-10T01:39:46.205Z]   [bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)+0x349) [0x7f192522c9d9]
[2021-02-10T01:39:46.205Z]   [bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::Push(mxnet::engine::Opr*, mxnet::Context, int, bool)+0x42b) [0x7f1925219f5b]
[2021-02-10T01:39:46.205Z]   [bt] (9) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool)+0xd8) [0x7f1925216948]
[2021-02-10T01:39:46.461Z] /work/runtime_functions.sh: line 1008:     6 Aborted                 (core dumped) nosetests-3.4 $NOSE_COVERAGE_ARGUMENTS $NOSE_TIMER_ARGUMENTS --with-xunit --xunit-file nosetests_unittest.xml --verbose 
[2021-02-09T22:11:59.574Z] ======================================================================
[2021-02-09T22:11:59.574Z] ERROR: test_gluon_data.test_multi_worker_forked_data_loader
[2021-02-09T22:11:59.574Z] ----------------------------------------------------------------------
[2021-02-09T22:11:59.574Z] Traceback (most recent call last):
[2021-02-09T22:11:59.574Z]   File "/usr/local/lib/python3.7/dist-packages/nose/case.py", line 198, in runTest
[2021-02-09T22:11:59.574Z]     self.test(*self.arg)
[2021-02-09T22:11:59.574Z]   File "/work/mxnet/tests/python/unittest/common.py", line 226, in test_new
[2021-02-09T22:11:59.574Z]     mx.nd.waitall()
[2021-02-09T22:11:59.574Z]   File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 211, in waitall
[2021-02-09T22:11:59.574Z]     check_call(_LIB.MXNDArrayWaitAll())
[2021-02-09T22:11:59.574Z]   File "/work/mxnet/python/mxnet/base.py", line 246, in check_call
[2021-02-09T22:11:59.574Z]     raise get_last_ffi_error()
[2021-02-09T22:11:59.574Z] mxnet.base.MXNetError: Traceback (most recent call last):
[2021-02-09T22:11:59.574Z]   [bt] (9) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool)+0xd8) [0x7f0df6da1c48]
[2021-02-09T22:11:59.574Z]   [bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::Push(mxnet::engine::Opr*, mxnet::Context, int, bool)+0x42b) [0x7f0df6da525b]
[2021-02-09T22:11:59.574Z]   [bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)+0x349) [0x7f0df6db7e69]
[2021-02-09T22:11:59.574Z]   [bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0xa50) [0x7f0df6d9b740]
[2021-02-09T22:11:59.574Z]   [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x63dbf50) [0x7f0df6da1f50]
[2021-02-09T22:11:59.574Z]   [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x5cde545) [0x7f0df66a4545]
[2021-02-09T22:11:59.574Z]   [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::StorageImpl::Free(mxnet::Storage::Handle)+0x69) [0x7f0df6dbe0b9]
[2021-02-09T22:11:59.574Z]   [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::storage::CPUSharedStorageManager::Free(mxnet::Storage::Handle)+0x98) [0x7f0df6dc2718]
[2021-02-09T22:11:59.574Z]   [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::storage::CPUSharedStorageManager::FreeImpl(mxnet::Storage::Handle const&)+0xcf) [0x7f0df6dbb27f]
[2021-02-09T22:11:59.574Z]   [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x61) [0x7f0df16c59e1]
[2021-02-09T22:11:59.574Z]   File "src/storage/./cpu_shared_storage_manager.h", line 218
[2021-02-09T22:11:59.574Z] MXNetError: Check failed: count >= 0 (-1 vs. 0) : 
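
Both logs stop at the same check, `count >= 0`, in `CPUSharedStorageManager::FreeImpl`, with values of -2 and -1. For readers unfamiliar with this kind of failure, here is a minimal sketch of how a counter guarded by such a check can go negative. It assumes the counter is a reference count that is decremented on each free. The class `SharedSegment` and its methods are hypothetical and are not MXNet code. The sketch does not identify the root cause of these failures.

```python
class SharedSegment:
    """Hypothetical stand-in for a reference-counted shared-memory segment.

    This is an illustration only; it is not the MXNet implementation.
    """

    def __init__(self):
        self.count = 1  # one owner when the segment is created

    def acquire(self):
        # Another owner starts using the segment.
        self.count += 1

    def release(self):
        # An owner frees the segment; the count must never drop below zero.
        self.count -= 1
        if self.count < 0:
            # Same shape as the message in the logs above.
            raise RuntimeError(
                "Check failed: count >= 0 ({} vs. 0)".format(self.count)
            )
        return self.count


seg = SharedSegment()
seg.release()  # balanced free: count goes 1 -> 0

try:
    seg.release()  # second free without a matching acquire: 0 -> -1
except RuntimeError as err:
    message = str(err)
```

In this toy model, one free more than the number of owners trips the check. Whether the failing tests hit an extra free, a missed increment across forked workers, or something else would need to be confirmed against the MXNet source.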
