
Handle divergence/NaNs gracefully #1349

@longjon

Description


Bad data scaling and/or bad learning rates often cause training to diverge, producing NaN values and losses that quickly propagate throughout the net.

While occasional divergence is a fact of life, Caffe could do a better job of handling this situation, both in terms of user experience (there is no point in continuing gradient descent on a NaN loss) and in terms of code correctness (the pooling layer, and probably others, does not handle NaN comparisons correctly, which causes memory to be written out of bounds due to a default argmax index of -1).
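For the user-experience side, here is a minimal sketch of the kind of guard a solver step could run; this is illustrative only (the function name and call site are hypothetical, not Caffe's actual solver code):

```cpp
#include <cmath>
#include <glog/logging.h>

// Hypothetical guard run once per solver iteration: if the loss has become
// NaN or infinite, further gradient descent is pointless, so abort loudly
// instead of silently continuing to train on garbage.
void CheckLossIsFinite(float loss, int iter) {
  if (std::isnan(loss) || std::isinf(loss)) {
    LOG(FATAL) << "Loss is " << loss << " at iteration " << iter
               << "; training has diverged, aborting.";
  }
}
```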

See the issue with the pooling layer at #1333 (and thanks @tleyden for pointing out the crash).
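For reference, a simplified sketch of the failure mode (not Caffe's exact pooling code): with the argmax initialized to -1, every comparison against a NaN input is false, so the index is never updated, and a backward pass that scatters gradients through that index writes out of bounds.

```cpp
#include <cfloat>
#include <vector>

// Simplified max pooling over one window, mimicking the usual pattern:
// track the max value and the index (argmax) that produced it.
int PoolWindowArgmax(const std::vector<float>& bottom, int start, int end) {
  float max_val = -FLT_MAX;
  int max_idx = -1;  // default argmax
  for (int i = start; i < end; ++i) {
    // If bottom[i] is NaN, this comparison is always false, so max_idx
    // stays -1 even though the window is non-empty.
    if (bottom[i] > max_val) {
      max_val = bottom[i];
      max_idx = i;
    }
  }
  // A backward pass that writes bottom_diff[max_idx] without checking
  // for -1 then accesses memory out of bounds.
  return max_idx;
}
```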
