This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Batch_dot does not support FP16 well #11796

Description

@szhengac

batch_dot does not support FP16 well and can make training slower than using FP32. This was tested with the Transformer model in GluonNLP. The feature has already been added in NVIDIA's MXNet fork, so I think it would be good to enable it on master as well.
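For reference, a minimal sketch of the kind of call affected: batch_dot on FP16 inputs, as used in the Transformer attention layers in GluonNLP. The shapes and context here are illustrative assumptions, not taken from the actual benchmark.

```python
# Hypothetical repro sketch: batch_dot on FP16 tensors on GPU.
# Shapes are made up for illustration; the GluonNLP Transformer uses similar
# batched matrix multiplications in its attention layers.
import mxnet as mx

ctx = mx.gpu(0)
# (batch, M, K) x (batch, K, N) -> (batch, M, N)
a = mx.nd.random.uniform(shape=(64, 128, 64), ctx=ctx).astype('float16')
b = mx.nd.random.uniform(shape=(64, 64, 128), ctx=ctx).astype('float16')

out = mx.nd.batch_dot(a, b)  # computes in FP16, but without the optimized kernels
mx.nd.waitall()              # block until the async computation finishes
print(out.dtype, out.shape)
```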
