This repository was archived by the owner on Nov 17, 2023. It is now read-only.

MXNet gets stuck when running example/image-classification/train_mnist.py #1468

@Azure-Vani


The process hangs when I run train_mnist.py on Ubuntu 15.04. The only change I made before building MXNet was editing config.mk to use OpenBLAS as the BLAS backend.

The last few lines of the output of strace python train_mnist.py are:

clone(child_stack=0x7f71d3ffeff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f71d3fff9d0, tls=0x7f71d3fff700, child_tidptr=0x7f71d3fff9d0) = 2337
futex(0x24533cc, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x24533c8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
brk(0x292f000) = 0x292f000
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=414, ...}) = 0
write(2, "2016-02-14 18:32:45,836 Node[0] "..., 612016-02-14 18:32:45,836 Node[0] Start training with [cpu(0)]
) = 61
brk(0x2992000) = 0x2992000
brk(0x29f4000) = 0x29f4000
brk(0x2a5e000) = 0x2a5e000
brk(0x2acd000) = 0x2acd000
futex(0x23e0aa4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x23e0aa0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x23e0a70, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x23e0ad0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x23e0ad4, FUTEX_WAIT_PRIVATE, 3, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x23e0a70, FUTEX_WAKE_PRIVATE, 1) = 0
brk(0x2aee000) = 0x2aee000
futex(0x7f71dc001260, FUTEX_WAKE_PRIVATE, 1) = 1
brk(0x2b50000) = 0x2b50000
futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x7f71dc000e40, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x23e0aa4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x23e0aa0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 3, NULL) = 0
futex(0x7f71dc000e40, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x23e0aa4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x23e0aa0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 5, NULL) = 0
futex(0x7f71dc000e40, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 7, NULL) = 0
futex(0x7f71dc000e40, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 9, NULL) = 0
futex(0x7f71dc000e40, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x23e0aa4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x23e0aa0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 11, NULL) = 0
futex(0x7f71dc000e40, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 13, NULL) = 0
futex(0x7f71dc000e40, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 15, NULL) = 0
futex(0x7f71dc000e40, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x23e0aa4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x23e0aa0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 17, NULL

It seems to be stuck waiting on a mutex. Does anyone know what is happening?
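To make the "stuck on a mutex" reading of the trace easier to check, here is a minimal sketch that scans strace output and reports the last system call that never returned. The helper name `last_unfinished_call` and both regular expressions are hypothetical, written for this illustration. It assumes that every completed call ends with `) = <return value>` and that a call's line may be interrupted by the program's own stderr output, as happens with the write(2, ...) line above.

```python
import re

# A line that begins a syscall, e.g. "futex(0x7f71dc000e6c, ...".
SYSCALL = re.compile(r"^(\w+)\(")
# Assumption: a completed call ends with ") = <number|hex|?>", optionally
# followed by an errno description such as "EAGAIN (Resource ...)".
COMPLETED = re.compile(r"\)\s*=\s*(-?\d+|0x[0-9a-fA-F]+|\?)(\s.*)?$")


def last_unfinished_call(lines):
    """Return (syscall_name, line) for the call still pending at the end
    of the trace, or None if every call returned."""
    pending = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        started = SYSCALL.match(line)
        finished = COMPLETED.search(line)
        if started and not finished:
            # The call was entered, but no return value has been seen yet.
            pending = (started.group(1), line)
        elif finished:
            # Either a complete one-line call, or the ") = N" tail of a
            # call whose line was split by interleaved program output.
            pending = None
    return pending


if __name__ == "__main__":
    import sys
    # Example: strace python train_mnist.py 2>&1 | python this_script.py
    print(last_unfinished_call(sys.stdin))
```

Run against the trace above, this should point at the final futex(0x7f71dc000e6c, FUTEX_WAIT_PRIVATE, 17, NULL call, which only shows that the traced thread is blocked in a futex wait. Note that strace follows only the main thread unless it is started with -f, so the thread that should issue the wake-up is not visible in this trace.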
