Bad data scaling and/or bad learning rates often cause training to diverge, producing NaN values and losses that quickly propagate throughout the net.
While occasional divergence is a fact of life, Caffe could do a better job of handling this situation, both in terms of user experience (there is no point in continuing to perform gradient descent on a NaN loss) and in terms of code correctness (the pooling layer, and probably others, does not handle NaN comparisons correctly, which actually causes memory to be written out of bounds because the argmax index defaults to -1).
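On the user-experience side, the kind of guard being suggested is roughly the following. This is a minimal sketch, not Caffe's actual Solver code; the loop structure and all names are illustrative assumptions:

```cpp
// Sketch of a NaN/Inf guard around a training loop (illustrative only,
// not Caffe's Solver API).
#include <cmath>
#include <iostream>
#include <limits>

// Returns true if training should continue, false if the loss has diverged.
bool LossIsFinite(double loss, int iter) {
  if (std::isfinite(loss)) return true;
  std::cerr << "Loss is NaN/Inf at iteration " << iter
            << "; stopping (consider lowering the learning rate "
            << "or rescaling the input data)." << std::endl;
  return false;
}

int main() {
  // Simulated losses from a run that diverges at iteration 3.
  const double losses[] = {2.3, 1.9, 1.5,
                           std::numeric_limits<double>::quiet_NaN(), 1.0};
  for (int iter = 0; iter < 5; ++iter) {
    if (!LossIsFinite(losses[iter], iter)) break;  // halt instead of continuing
    // ... backward pass and weight update would happen here ...
  }
  return 0;
}
```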
See the issue with the pooling layer at #1333 (and thanks @tleyden for pointing out the crash).
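For clarity, here is a simplified sketch of the failure mode described above (not Caffe's actual PoolingLayer code): every `x > current_max` comparison is false when `x` is NaN, so the argmax keeps its initial value of -1, and the backward pass would then index the gradient buffer at -1, i.e. out of bounds:

```cpp
// Illustrative reproduction of the NaN/argmax pitfall in max pooling.
#include <iostream>
#include <limits>
#include <vector>

int main() {
  // One pooling window whose inputs are all NaN (e.g. after divergence).
  std::vector<float> window(4, std::numeric_limits<float>::quiet_NaN());

  float max_val = -std::numeric_limits<float>::max();
  int max_idx = -1;  // default argmax, as in the issue description
  for (int i = 0; i < static_cast<int>(window.size()); ++i) {
    if (window[i] > max_val) {  // NaN > anything is false, so never taken
      max_val = window[i];
      max_idx = i;
    }
  }
  std::cout << "argmax after pooling over NaNs: " << max_idx << std::endl;  // -1

  // In the backward pass, this index would be used to scatter the gradient:
  //   bottom_diff[max_idx] += top_diff;   // writes before the buffer start
  // which is the out-of-bounds write reported in #1333.
  return 0;
}
```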