This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Description
- The MXNet nightly benchmark, which runs CV and NLP models on the MXNet nightly pip wheel and reports metrics, showed a performance regression in GPU memory usage.
- After bisecting the PRs, the 120-130 MB increase in GPU memory was traced to PR #17767. I ran SSD training with and without the PR commit and observed an increase of around 120 MB in GPU memory. I have not run all of the listed models personally, but the reproduction script is minimal and applies to all of them.
- The affected models are listed below. The GPU memory numbers come from our internal benchmarking system:
| Model | GPU memory (from → to) |
| --- | --- |
| VGG16_training | 9.63k → 9.78k |
| SSD_training | 4.75k → 4.91k |
| LSTM_inference: Gluon and Module | G: 1.55k → 1.71k; M: 1.47k → 1.64k |
| Caffenet_inference: Gluon and Module | G: 2.17k → 2.31k; M: 2.0k → 2.16k |
| YoloV3_GPU: Gluon and Module | G: 1.87k → 2.03k; M: 1.84k → 2.0k |
| Resnet50_v2_FP16_inference_GPU: Gluon and Module | G: 1.58k → 1.74k; M: 1.55k → 1.71k |
| Resnet50_v2_inference_GPU: Gluon and Module | G: 1.55k → 1.71k; M: 1.48k → 1.64k |
| Resnet152_v2_inference_GPU: Gluon and Module | G: 1.92k → 2.08k; M: 1.86k → 2.02k |
| Inception_Inference_GPU: Gluon and Module | G: 1.43k → 1.59k; M: 1.40k → 1.56k |
| SSD_inference_GPU: Gluon and Module | G: 1.65k → 1.81k; M: 1.54k → 1.70k |
| A3C_inference_GPU: Gluon and Module | G: 1.32k → 1.48k; M: 1.31k → 1.47k |
| word_language_model_hybrid_p3.16_training | 2.25k → 2.43k |
| Mobile_pose_training | 2.38k → 2.54k |
To Reproduce
- Run the lines of code below and monitor GPU usage with the `nvidia-smi` command:

```python
import mxnet as mx
a = mx.nd.zeros((1,), ctx=mx.gpu())
```
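To capture the memory delta while the script above runs, one option is to poll `nvidia-smi` programmatically instead of watching it by hand. The helpers below are an illustrative sketch, not part of MXNet or this issue's tooling; `query_gpu_memory` assumes `nvidia-smi` is on the PATH, and `parse_memory_usage` simply extracts the figures from an output line in the `1279MiB / 16160MiB` style shown below:

```python
import re
import subprocess

def parse_memory_usage(line):
    """Extract (used_mib, total_mib) from an nvidia-smi style string,
    e.g. 'gpu(0): 1279MiB / 16160MiB' -> (1279, 16160)."""
    m = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if m is None:
        raise ValueError(f"no memory figures found in: {line!r}")
    return int(m.group(1)), int(m.group(2))

def query_gpu_memory():
    """Ask nvidia-smi for per-GPU (used, total) memory in MiB.
    Requires an NVIDIA driver and nvidia-smi on the PATH."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [tuple(int(v) for v in row.split(","))
            for row in out.strip().splitlines()]
```

Running `query_gpu_memory()` before and after the `mx.nd.zeros` call and subtracting the `used` figures gives the per-commit footprint reported below.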
Output (MXNet built from source, instance: p3.16xlarge):

```
Without PR #17767 (commit f882de0c7ecd6ff1f0fdba492865afc6d7e29271):
GPU usage: gpu(0): 1279MiB / 16160MiB

With PR #17767 (commit 5542d03695b4a2589afb88acf128d4ba8ac94d0d):
GPU usage: gpu(0): 1407MiB / 16160MiB
```

That is a delta of 128 MiB, consistent with the 120-130 MB regression reported above.
- Tagging the PR author here: @ptrendx
- Thanks @ptrendx for sharing the simple reproduction script.