This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Slow CPU inference in Gluon GRU module #13634

@marekjg

Description

Gluon.GRU is much slower on the CPU than ndarray.RNN in GRU mode for the same input.

Environment info

Deep Learning AMI 19, Tesla V100

----------Python Info----------
Version      : 3.7.1
Compiler     : GCC 7.3.0
Build        : ('default', 'Oct 23 2018 19:19:42')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.1
Directory    : /home/ec2-user/anaconda3/envs/gmarek_mx13/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/ec2-user/anaconda3/envs/gmarek_mx13/lib/python3.7/site-packages/mxnet
Commit Hash   : b45e1273ece8eba1a011107ce12032af58efe661
----------System Info----------
Platform     : Linux-4.14.77-70.59.amzn1.x86_64-x86_64-with-glibc2.10
system       : Linux
node         : ip-172-31-44-214
release      : 4.14.77-70.59.amzn1.x86_64
version      : #1 SMP Mon Nov 12 22:02:45 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2701.073
BogoMIPS:              4600.18
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-7
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0018 sec, LOAD: 0.7860 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0006 sec, LOAD: 0.5938 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0006 sec, LOAD: 0.0175 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0004 sec, LOAD: 1.0119 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0114 sec, LOAD: 0.4352 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0004 sec, LOAD: 0.0866 sec.

Minimum reproducible example

from time import time

import mxnet as mx
from mxnet import nd
from mxnet import gluon
from mxnet.gluon import rnn

inp_dim = 1024
hid_dim = 1024
n_layers = 1
n_parameters = (inp_dim * hid_dim + hid_dim + hid_dim * hid_dim + hid_dim) * 3
n_steps = 100

for ctx in [mx.cpu(), mx.gpu()]:
    gru_params = nd.random.uniform(low=-1, high=1, shape=(n_parameters,), ctx=ctx)
    gru_ndarray = lambda x, h_0: nd.RNN(x, gru_params, h_0, num_layers=n_layers,
                                        state_size=hid_dim, mode='gru', state_outputs=True)
    gru_gluon = rnn.GRU(hid_dim, n_layers, input_size=inp_dim)
    gru_gluon.collect_params().initialize(ctx=ctx)
    gru_gluon.hybridize()

    x = nd.random_normal(0, 1, (1, 1, inp_dim), ctx=ctx)
    h_0 = x

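    # Warm-up (just in case): run the hybridized Gluon block once so that
    # one-time graph construction is not included in the timing below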
    _, _ = gru_gluon(x, h_0)
    nd.waitall()

    for method, gru in [('ndarray', gru_ndarray), ('gluon', gru_gluon)]:
        h = h_0
        start = time()
        for step in range(n_steps):
            _, h = gru(x, h)
            if method == 'gluon':
                h = h[0]
        nd.waitall()
        dt = time() - start
        print(ctx, method, dt)
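
For readers checking the `n_parameters` expression in the script, here is a small sketch of how that count breaks down. It assumes a single-layer, unidirectional GRU with three gates, each having an input-to-hidden weight, a hidden-to-hidden weight, and two bias vectors. The helper name `gru_param_count` is hypothetical and not part of MXNet; the sketch only reproduces the arithmetic and makes no claim about how `nd.RNN` orders values inside the flat vector.

```python
def gru_param_count(input_size, hidden_size, num_gates=3):
    """Count parameters of a single-layer, unidirectional GRU.

    Hypothetical helper for illustration; mirrors the expression
    used for n_parameters in the script above.
    """
    i2h_weight = input_size * hidden_size   # input-to-hidden weights per gate
    h2h_weight = hidden_size * hidden_size  # hidden-to-hidden weights per gate
    i2h_bias = hidden_size                  # input-to-hidden bias per gate
    h2h_bias = hidden_size                  # hidden-to-hidden bias per gate
    return num_gates * (i2h_weight + i2h_bias + h2h_weight + h2h_bias)


inp_dim = 1024
hid_dim = 1024
n_parameters = gru_param_count(inp_dim, hid_dim)
print(n_parameters)
```

With `inp_dim = hid_dim = 1024` this gives the same value as the expression in the script, so the flat parameter vector passed to `nd.RNN` has the size the operator expects for this configuration.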

Steps to reproduce

Run the script above with Python.

Output

Gluon.GRU is significantly slower than ndarray.RNN, most of all on the CPU.
Columns are device, method, and total time in seconds for 100 steps:
cpu(0) ndarray 0.07194805145263672
cpu(0) gluon 4.735473394393921
gpu(0) ndarray 0.013593673706054688
gpu(0) gluon 0.04437994956970215
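
To put the numbers above in proportion, the following sketch computes the Gluon-to-ndarray slowdown per device from the four reported timings. The dictionary layout and the `slowdown` helper are only illustrative; the values are copied from the output above. On these figures the gap is roughly 66x on the CPU versus roughly 3x on the GPU.

```python
# Timings in seconds for 100 steps, copied from the output above.
timings = {
    ("cpu(0)", "ndarray"): 0.07194805145263672,
    ("cpu(0)", "gluon"): 4.735473394393921,
    ("gpu(0)", "ndarray"): 0.013593673706054688,
    ("gpu(0)", "gluon"): 0.04437994956970215,
}


def slowdown(device):
    """Return how many times slower gluon is than ndarray on a device."""
    return timings[(device, "gluon")] / timings[(device, "ndarray")]


for device in ("cpu(0)", "gpu(0)"):
    print(f"{device}: gluon is {slowdown(device):.1f}x slower than ndarray")
```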
