[RFC] Moving MXNet-AMP to core #18896
MXNet already has experimental AMP (Automatic Mixed Precision) support, exposed in the mxnet.contrib package. It automatically casts models to float16 or bfloat16. This RFC covers moving it into core and making it a first-class feature, as well as further development.
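For context, typical usage of the existing contrib API looks roughly like the sketch below (float16 training with dynamic loss scaling). The model, optimizer settings, and dummy data are placeholders, and a GPU is assumed:

```python
import mxnet as mx
from mxnet import autograd, gluon
from mxnet.contrib import amp

amp.init(target_dtype='float16')      # patch ops before the model is built

ctx = mx.gpu(0)
net = gluon.model_zoo.vision.resnet50_v1()
net.initialize(ctx=ctx)
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
amp.init_trainer(trainer)             # enable dynamic loss scaling for this trainer

data = mx.nd.random.uniform(shape=(8, 3, 224, 224), ctx=ctx)
label = mx.nd.zeros((8,), ctx=ctx)

with autograd.record():
    loss = loss_fn(net(data), label)
    with amp.scale_loss(loss, trainer) as scaled_loss:
        autograd.backward(scaled_loss)
trainer.step(8)                       # gradient unscaling is handled internally
```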
Here's a rough task breakdown for the initial move:
- Need to ensure AMP works with numpy ops, i.e. every op appears in one of the lists - done in AMP support for Numpy ops #19036
- API change: make loss scale public (Make loss scale public in AMP #17507) - done in AMP support for Numpy ops #19036
- Transparent / lazy AMP initialization? (Got "kFlag == type_flag_: TBlob.get_with_shape: data type do not match specified type. Expected: 0 v.s. given 2" when training with amp. #18902 (comment)) - a warning is now emitted when amp.init() is called and a model already exists, added in AMP support for Numpy ops #19036
- A number of issues have to be resolved to improve the user experience:
  - Cannot load trainer with AMP (Cannot load trainer with AMP #16858) - fixed in Get rid of monkey patching in LossScaler overflow handling #18959
  - CUDA crash (IMA) in amp_multicast that happens on some models (Yolo3) - fixed in Fix possible IMA in amp_multicast fusion #19318
  - AMP not reusing weights on recursive networks (AMP not reusing weights on recursive networks #19019)
- Actually moving the code around and updating import paths (a possible compatibility shim is sketched below)
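One possible shape for the import-path change, purely as a sketch (the final package layout is not decided here): the implementation would live under mxnet.amp, and mxnet.contrib.amp would become a thin deprecation shim along these lines:

```python
# Hypothetical contents of mxnet/contrib/amp/__init__.py after the move.
import warnings

from mxnet.amp import *  # noqa: F401,F403 -- re-export the relocated implementation

warnings.warn(
    'mxnet.contrib.amp has moved to mxnet.amp; please update your imports. '
    'The contrib alias is kept only for backward compatibility.',
    DeprecationWarning,
)
```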
Post move:
- Layout optimization - upstreaming a feature that already exists in the NVIDIA NGC container. It improves convolution performance by automatically casting between NCHW and NHWC layouts.
- Explore alternatives to monkey-patching front-end ops (AMP for mx2 #18697); a simplified sketch of the current patching approach follows this list.
- Add a way for the user to turn AMP off, and to control AMP settings via a context manager (see the hypothetical sketch below).
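For reference on the monkey-patching item, today's approach boils down to something like the following simplified illustration (not the actual AMP code): module-level ops are replaced with wrappers that cast eligible float32 inputs to float16 before dispatching.

```python
# Simplified illustration of front-end op monkey-patching, not the real AMP code.
import numpy as np
import mxnet as mx

_original_convolution = mx.nd.Convolution

def _fp16_convolution(*args, **kwargs):
    # Cast float32 NDArray arguments to float16; leave everything else untouched.
    def cast(x):
        if isinstance(x, mx.nd.NDArray) and x.dtype == np.float32:
            return x.astype('float16')
        return x
    args = [cast(a) for a in args]
    kwargs = {k: cast(v) for k, v in kwargs.items()}
    return _original_convolution(*args, **kwargs)

mx.nd.Convolution = _fp16_convolution  # all later calls go through the wrapper
```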
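For the off switch / context manager item, here is a minimal sketch of how such a control could be implemented; off() and the _amp_enabled flag are hypothetical names used only for illustration, not an agreed-upon API:

```python
# Hypothetical sketch of an AMP off-switch context manager.
from contextlib import contextmanager

_amp_enabled = True   # hypothetical global flag consulted by the casting logic

@contextmanager
def off():
    """Temporarily disable AMP casting inside a with-block."""
    global _amp_enabled
    previous, _amp_enabled = _amp_enabled, False
    try:
        yield
    finally:
        _amp_enabled = previous
```

Patched ops (or a backend casting pass) would consult _amp_enabled and skip the float16 cast while it is False, so users could write `with amp.off(): out = net(x)` around precision-sensitive code.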