BatchNorm cannot converge with scale=False

## Description
Training does not converge when the BatchNorm operator is used with ``scale=False``.
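
For readers unfamiliar with the flag, here is a minimal pure-Python sketch of what ``scale=False`` conventionally means in batch normalization: the learnable per-feature multiplier (gamma) is not applied, so the normalized output is only shifted by beta. This is an illustration under that assumption, not MXNet's implementation; the function name ``batch_norm`` and its arguments are hypothetical.

```python
import math

def batch_norm(rows, gamma=None, beta=None, scale=True, eps=1e-5):
    """Training-mode batch normalization over a batch of feature vectors.

    rows  : list of equal-length lists, shape (batch, features)
    gamma : per-feature multiplier; treated as 1.0 when scale=False
    beta  : per-feature shift; defaults to 0.0
    """
    n, d = len(rows), len(rows[0])
    # Assumption: scale=False means gamma is ignored, i.e. fixed to 1.
    gamma = [1.0] * d if (gamma is None or not scale) else gamma
    beta = [0.0] * d if beta is None else beta
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # biased batch variance
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = (col[i] - mean) * inv_std * gamma[j] + beta[j]
    return out

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
gamma = [0.5, 2.0]
with_scale = batch_norm(batch, gamma=gamma, scale=True)
without_scale = batch_norm(batch, gamma=gamma, scale=False)
```

In this sketch, ``without_scale`` has roughly unit variance in every feature regardless of ``gamma``, while ``with_scale`` is the same output multiplied per feature by ``gamma``.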

### Error Message
There is no error message, but the loss value and training accuracy are abnormal compared with a BatchNorm that uses ``scale=True``.

## To Reproduce
Train ArcFace with ``https://github.com/nttstar/arcface.np``, adding one BatchNorm op with ``scale=False`` after the final embedding layer.


## What have you tried to solve it?

1. Set ``scale=True``. Training then works, but with slightly worse test accuracy.

## Environment

----------Python Info----------
Version      : 3.6.9
Compiler     : GCC 7.3.0
Build        : ('default', 'Jul 30 2019 19:07:31')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.3.1
Directory    : /root/anaconda2/envs/py36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 2.0.0
Directory    : /root/anaconda2/envs/py36/lib/python3.6/site-packages/mxnet
Num GPUs     : 8
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-3.10.0-327.el7.x86_64-x86_64-with-centos-7.5.1804-Core
system       : Linux
node         : gpu06
release      : 3.10.0-327.el7.x86_64
version      : #1 SMP Thu Nov 19 22:10:57 UTC 2015


BatchNorm can not converge with scale=False #18475
