I'm not completely sure whether this is a problem in this library, but maybe you could help.
When I try to run T5-large from Hugging Face's library together with the DeepSpeed library, I get a strange result: as soon as I switch to fp16 mode, the training loss becomes NaN, as do some of the tensors in the model's feature output. Could this be a Transformers library fault? The original example I started from uses pytorch_pretrained_bert, and it works well.
Training with FP32 does not produce any NaNs.
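To narrow down where the NaN/Inf values first appear under fp16, one option (not part of my repro, just a hedged sketch using plain PyTorch forward hooks on a toy model, not T5 itself) is to run a single forward pass and report the first modules whose output is non-finite:

```python
# Hypothetical diagnostic: register forward hooks to find which modules
# produce NaN/Inf outputs under fp16. Demonstrated on a toy model whose
# second stage overflows the fp16 range (max ~65504).
import torch
import torch.nn as nn


def find_nonfinite_outputs(model, *inputs):
    """Run one forward pass; return names of modules whose output
    contains NaN or Inf values."""
    bad = []
    hooks = []

    def make_hook(name):
        def hook(module, inp, out):
            tensors = out if isinstance(out, (tuple, list)) else (out,)
            for t in tensors:
                if torch.is_tensor(t) and not torch.isfinite(t).all():
                    bad.append(name)
                    break
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            hooks.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for h in hooks:
            h.remove()
    return bad


class Scale(nn.Module):
    """Toy stand-in for a layer with large activations."""
    def __init__(self, s):
        super().__init__()
        self.s = s

    def forward(self, x):
        return x * self.s


# 1.0 * 1000 = 1e3 is fine in fp16; 1e3 * 1000 = 1e6 overflows to Inf.
toy = nn.Sequential(Scale(1000.0), Scale(1000.0))
x = torch.ones(2, dtype=torch.float16)
print(find_nonfinite_outputs(toy, x))  # → ['1']: the second stage overflows
```

Running something like this on the fp16 model should show whether the NaNs originate in a specific T5 block (e.g. an overflow in an attention or feed-forward layer) rather than in the loss computation.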
I have some code, adapted from the DeepSpeedExamples code:
https://github.com/exelents/try_t5
If somebody would like to help and try to run it, here is the compiled binary dataset:
https://drive.google.com/file/d/1oxCxYCuCWebmaUQ_s9il7EDBkisL7x-_/view?usp=sharing
https://drive.google.com/file/d/1WCzxAnp2bEllbQ0_2d_6hoq5tQjxBFXh/view?usp=sharing