This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Gluon DataLoader cannot release the processes in the pool #13521

@YutingZhang

Description

https://github.com/apache/incubator-mxnet/blob/f2dcd7c7b8676b55d912997fc3f9c62c55915307/python/mxnet/gluon/data/dataloader.py#L532-L533

Logically, when a DataLoader is garbage-collected, its _worker_pool should be garbage-collected as well, and the pool's terminate() method should be called right away. However, that does not happen.

Each time I delete a DataLoader, its worker processes are left dangling.
I guess this is a bug in Python's multiprocessing.Pool. Anyway, I think we can patch it by explicitly calling _worker_pool.terminate().
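A minimal sketch of the proposed patch follows. It uses a hypothetical `PooledLoader` class, not MXNet's actual `DataLoader` code. The owner keeps a `multiprocessing` pool in `_worker_pool` and calls `terminate()` and then `join()` on it in `__del__`, so the workers are released as soon as the owner is deleted instead of whenever the pool object itself gets cleaned up. It assumes a POSIX system where the `fork` start method is available, matching the Linux setup in this report.

```python
import multiprocessing


class PooledLoader(object):
    """Hypothetical stand-in for a loader that owns a worker pool.

    This is not MXNet's DataLoader; it only illustrates the explicit
    terminate() call proposed above.
    """

    def __init__(self, num_workers=2):
        self._worker_pool = None
        if num_workers > 0:
            # Assumption: POSIX system, so the 'fork' start method exists.
            ctx = multiprocessing.get_context("fork")
            self._worker_pool = ctx.Pool(num_workers)

    def __del__(self):
        # Do not rely on the Pool object's own cleanup; stop the worker
        # processes explicitly as soon as the owner is deleted.
        pool = getattr(self, "_worker_pool", None)
        if pool is not None:
            pool.terminate()
            pool.join()
            self._worker_pool = None


if __name__ == "__main__":
    loader = PooledLoader(num_workers=2)
    print(len(multiprocessing.active_children()))  # 2
    del loader  # in CPython, __del__ runs here because the refcount drops to zero
    print(len(multiprocessing.active_children()))  # 0
```

Calling `join()` after `terminate()` waits for the worker processes to actually exit, so nothing is still running once `__del__` returns.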

Minimal code to reproduce the error:

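```python
import mxnet as mx
import numpy as np

A = np.random.rand(999, 2000)  # 999 samples, 2000 features each
D = mx.gluon.data.DataLoader(A, batch_size=8, num_workers=2)  # batches loaded by 2 worker processes
the_iter = iter(D)
next(the_iter)  # fetch one batch through the workers
del the_iter
del D  # per this report, the worker processes are left running after this
```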
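Instead of watching a process monitor, the dangling workers can also be listed from inside Python. The sketch below assumes the workers are ordinary `multiprocessing` children of the main process, as `multiprocessing.Pool` workers are. `live_worker_pids` is a hypothetical helper, not part of MXNet, and the demo uses a plain pool with the POSIX `fork` start method as a stand-in for the DataLoader's pool.

```python
import multiprocessing


def live_worker_pids():
    """Hypothetical helper: PIDs of child processes started through
    multiprocessing (e.g. Pool workers) that are still alive."""
    # active_children() also reaps children that have already exited.
    return sorted(p.pid for p in multiprocessing.active_children())


if __name__ == "__main__":
    # Stand-in for the DataLoader's pool; assumes a POSIX 'fork' start method.
    pool = multiprocessing.get_context("fork").Pool(2)
    print(len(live_worker_pids()))  # 2
    pool.terminate()
    pool.join()
    print(live_worker_pids())  # []
```

With the reproduction above, one would call `live_worker_pids()` right after `del D`; if the pool is not released, the worker PIDs would presumably still be listed.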

I recorded a video demo for this bug: https://drive.google.com/open?id=1q4CmU_F1vAtxoZ_KUmrIEfVRk3RsQfv8

Environment: the latest mxnet pip package at the time of writing, Python 3.6, on a p3 instance.