
Handle divergence/NaNs gracefully #1349

@longjon

Description


Bad data scaling and/or bad learning rates often cause training to diverge, producing NaN values and losses that quickly propagate throughout the net.

While occasional divergence is a fact of life, Caffe could do a better job of handling this situation, both in terms of user experience (there is no point in continuing gradient descent on a NaN loss) and in terms of code correctness (the pooling layer, and probably others, does not handle NaN comparisons correctly, which causes memory to be written out of bounds due to a default argmax index of -1).
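For the user-experience side, here is a minimal sketch of the kind of guard a solver step could run; this is illustrative only (the function name and call site are hypothetical, not Caffe's actual solver code):

```cpp
#include <cmath>
#include <glog/logging.h>

// Hypothetical guard run once per solver iteration: if the loss has become
// NaN or infinite, further gradient descent is pointless, so abort loudly
// instead of silently continuing to train on garbage.
void CheckLossIsFinite(float loss, int iter) {
  if (std::isnan(loss) || std::isinf(loss)) {
    LOG(FATAL) << "Loss is " << loss << " at iteration " << iter
               << "; training has diverged, aborting.";
  }
}
```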

See the issue with the pooling layer at #1333 (and thanks @tleyden for pointing out the crash).
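For reference, a simplified sketch of the failure mode (not Caffe's exact pooling code): with the argmax initialized to -1, every comparison against a NaN input is false, so the index is never updated, and a backward pass that scatters gradients through that index writes out of bounds.

```cpp
#include <cfloat>
#include <vector>

// Simplified max pooling over one window, mimicking the usual pattern:
// track the max value and the index (argmax) that produced it.
int PoolWindowArgmax(const std::vector<float>& bottom, int start, int end) {
  float max_val = -FLT_MAX;
  int max_idx = -1;  // default argmax
  for (int i = start; i < end; ++i) {
    // If bottom[i] is NaN, this comparison is always false, so max_idx
    // stays -1 even though the window is non-empty.
    if (bottom[i] > max_val) {
      max_val = bottom[i];
      max_idx = i;
    }
  }
  // A backward pass that writes bottom_diff[max_idx] without checking
  // for -1 then accesses memory out of bounds.
  return max_idx;
}
```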
