This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Excessive memory allocation without static_alloc #12116

@safrooze

Description

The change in #11951, which fixes nested calls on CachedOp, causes excessive memory allocation when hybridize() is called with the default static_alloc=False. In my specific case, memory allocation grows from 1.5GB to over 10GB.

Environment info (Required)

----------Python Info----------
Version      : 3.4.5
Compiler     : GCC 4.4.7 20120313 (Red Hat 4.4.7-1)
Build        : ('default', 'Jul  2 2016 17:47:47')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 18.0
Directory    : /home/ec2-user/anaconda3/envs/mxnet_p34/lib/python3.4/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.0
Directory    : /home/ec2-user/src/mxnet/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.9.93-41.60.amzn1.x86_64-x86_64-with-glibc2.2.5
system       : Linux
node         : ip-172-31-73-235
release      : 4.9.93-41.60.amzn1.x86_64
version      : #1 SMP Fri Apr 13 21:58:27 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2699.945
BogoMIPS:              4600.11
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-7
----------Network Test----------
Setting timeout: 10
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0023 sec, LOAD: 0.0982 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1248 sec, LOAD: 0.4074 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0031 sec, LOAD: 0.1043 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1631 sec, LOAD: 0.4245 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0184 sec, LOAD: 0.5672 sec.
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0014 sec, LOAD: 0.4493 sec.

I'm using the Python package.

Build info (Required if built from source)

This is reproducible with the package from PyPI: pip install --pre -U mxnet-cu90mkl==1.3.0b20180801. The previous version (1.3.0b20180726) does not have this problem.

I also confirmed that the exact commit in #11951 introduced this regression by building from source both before and at that commit.

Compiler: gcc
MXNet commit hash - BROKEN: ed20304
MXNet commit hash - GOOD: 98a41af

Build config:
(Paste the content of config.mk, or the build command.)

Minimum reproducible example

Don't have one yet. Might be able to come up with one if it's necessary.

Steps to reproduce

  1. Build a hybrid network.
  2. Call hybridize().
  3. Check memory usage using nvidia-smi.
  4. Repeat without calling hybridize(), or with hybridize(static_alloc=True), and the problem goes away.
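
The steps above could be scripted roughly as follows. This is only a sketch of the measurement procedure, not a confirmed reproduction: the two-layer network, the input shape, and the helper names (`parse_memory_used_mib`, `gpu_memory_used_mib`, `run_once`) are hypothetical placeholders, and a network this small may not show the regression. It assumes a CUDA build of MXNet and that `nvidia-smi` is on the PATH.

```python
import subprocess


def parse_memory_used_mib(smi_output):
    """Parse the text printed by
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    into a list of integers (MiB), one entry per GPU."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpu_memory_used_mib():
    """Ask nvidia-smi for the memory currently in use on each GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        universal_newlines=True)
    return parse_memory_used_mib(out)


def run_once(mode, gpu_id=0):
    """Run one forward pass and return GPU memory in use afterwards (MiB).

    mode is one of:
      "none"    - do not call hybridize()
      "default" - hybridize() with the default static_alloc=False
      "static"  - hybridize(static_alloc=True)
    """
    # Imported lazily so the parsing helpers work without MXNet installed.
    import mxnet as mx
    from mxnet.gluon import nn

    ctx = mx.gpu(gpu_id)

    # Placeholder network (an assumption); substitute the real hybrid model.
    net = nn.HybridSequential()
    net.add(nn.Dense(256, activation="relu"), nn.Dense(10))
    net.initialize(ctx=ctx)

    if mode == "default":
        net.hybridize()
    elif mode == "static":
        net.hybridize(static_alloc=True)

    out = net(mx.nd.ones((32, 128), ctx=ctx))
    out.wait_to_read()  # block until the asynchronous computation finishes

    # Assumes nvidia-smi's GPU ordering matches MXNet's device ids.
    return gpu_memory_used_mib()[gpu_id]
```

Because MXNet keeps freed GPU memory in its own pool, each mode is best measured in a fresh Python process (for example, one `run_once("default")` call per invocation) so that earlier runs do not inflate later readings.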

What have you tried to solve it?

Nothing yet.
