
When training with Transformer, loss becomes NaN after backpropagation. #37

@sooftware

Description


Two models are currently implemented, Seq2seq and Transformer. When training with the Transformer, the loss keeps becoming NaN after backpropagation. I have tried debugging, but I have not yet found which part is wrong. If you have had a similar experience or have any guesses, I would appreciate your help.
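The repository code is not shown in this issue, but since the models are PyTorch implementations, a minimal sketch of common NaN safeguards may help narrow the problem down: enabling autograd anomaly detection to locate the op that first produces NaN/Inf, checking that the loss is finite before calling backward, ignoring padding tokens in the criterion, and clipping exploding gradients (a frequent cause of NaN in Transformer training). All model, tensor, and hyperparameter names below are illustrative, not taken from the project.

```python
import torch
import torch.nn as nn

# Raise an error at the exact op that first produces NaN/Inf in backward.
torch.autograd.set_detect_anomaly(True)

# Stand-in model and data; the real project uses a Transformer.
model = nn.Linear(8, 4)
criterion = nn.CrossEntropyLoss(ignore_index=0)  # make sure pad tokens are ignored
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

inputs = torch.randn(16, 8)
targets = torch.randint(1, 4, (16,))  # labels 1..3; 0 is reserved for padding

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
assert torch.isfinite(loss).item(), "loss is NaN/Inf before backward"
loss.backward()
# Clip exploding gradients, a common cause of NaN in Transformer training.
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

If the anomaly detector fires inside softmax or log, typical suspects are an attention mask that leaves an all-masked row (softmax over all -inf), a learning rate that is too high without warmup, or unmasked padding positions contributing to the loss.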

Labels: help wanted (Extra attention is needed)
