This repository was archived by the owner on Nov 17, 2023. It is now read-only.

A potential race condition in the executor or engine. #10865

@zheng-da

Description

Previously, we encountered a memory error. It was caused by a race condition in which the MKLDNN memory in an output NDArray was removed while an MKLDNN operator was trying to read the MKLDNN memory from its input arrays. The error was temporarily fixed in #10651.

This error can be reproduced with the following commands:

export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
export MXNET_TEST_SEED=11
export MXNET_MODULE_SEED=812478194
export MXNET_TEST_COUNT=10000
nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape

However, this race condition shouldn't happen. The execution engine schedules computation based on data dependencies: while an operator is scheduled to write to an output NDArray, no operator that reads from that NDArray should be scheduled for execution. Yet we actually observe that the input array of an operator is modified while the operator is running. This suggests that the race condition can corrupt data in an input NDArray even without MKLDNN, where it is just harder to notice.
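To make the expected scheduling rule concrete, here is a minimal toy model of dependency-based scheduling. It is not MXNet's engine code: the function `ready_ops`, the operator names, and the array names `x`, `y`, `z` are all hypothetical and chosen only for illustration. It assumes operators are pushed in order, each with a set of arrays it reads and a set it writes, and that an operator may start only if no earlier unfinished operator conflicts with it.

```python
def ready_ops(ops, done):
    """Return the names of ops that may start, given the set of finished ops.

    ops:  list of (name, reads, writes) tuples in push order (toy model).
    done: set of names of ops that have already completed.

    An op is blocked by an earlier unfinished op if that earlier op
      - writes an array this op reads or writes, or
      - reads an array this op writes.
    """
    ready = []
    for i, (name, reads, writes) in enumerate(ops):
        if name in done:
            continue
        blocked = False
        for prev_name, prev_reads, prev_writes in ops[:i]:
            if prev_name in done:
                continue
            if prev_writes & (reads | writes) or prev_reads & writes:
                blocked = True
                break
        if not blocked:
            ready.append(name)
    return ready


# Hypothetical pipeline: op_a writes y, op_b reads y, op_c overwrites y.
ops = [
    ("op_a", {"x"}, {"y"}),
    ("op_b", {"y"}, {"z"}),
    ("op_c", set(), {"y"}),
]

print(ready_ops(ops, set()))             # ['op_a']
print(ready_ops(ops, {"op_a"}))          # ['op_b']
print(ready_ops(ops, {"op_a", "op_b"}))  # ['op_c']
```

Under this rule `op_c` can never overwrite `y` while `op_b` is still reading it. The behavior reported here, an input array changing while its reader is running, would correspond to the writer and the reader both being runnable at the same time, which this model forbids.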
