Clamping hidden state values to allow FP16 #19229
patrickvonplaten merged 7 commits into huggingface:main from
Conversation
The documentation is not available anymore as the PR was closed or merged.

Hi, it seems that I have a test error; however, I didn't change the code that is failing.

Thanks @SSamDav. And sorry, I forgot to mention: in order to avoid the test failure you saw yesterday, you can rebase the working branch on
Force-pushed from 7aa8470 to 2b1376c
Ok for me if we know that it helps for inference. In T5 it didn't really help for training at all in the end, so if this was implemented to enable training in fp16, I'm not sure it's a good idea (I think @patil-suraj won't have time to look into it, by the way). Also cc @ArthurZucker here just FYI.

Since @patrickvonplaten prefers to have an issue that this PR would solve, I think we are not going to merge this PR at this moment. Let's see if there will be such issues reported for And regarding
In my tests, when I run a finetuned version of the

Hi, so it is running the inference, right? Is that finetuned checkpoint uploaded to the Hub?

Yes.
No, it was trained on confidential data.

Got it. However, it would be really nice if we had a publicly available checkpoint (on another dataset) that could show the issue and the effect of the fix. I understand that it may not be easy to obtain another such checkpoint, and it is potentially time consuming. @patrickvonplaten Any further comment?
Hi!
Thanks a lot @SSamDav for the fix!
I managed to reproduce the initial issue with the following snippet:
```python
import torch
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained(
    "google/long-t5-tglobal-base", torch_dtype=torch.float16
).to(0)

inputs = tokenizer(100 * "studies have shown that owning a dog is good for you ", return_tensors="pt")
input_ids = inputs.input_ids

outputs = model.encoder(input_ids.to(0))
print(outputs.last_hidden_state.isnan().any())
```
However, it seems that the root cause of this issue is in the `LongT5LayerFF` layer, exactly at this line. It seems that adding the previous hidden states to the GeLU-ed hidden states (`forwarded_states`) causes overflow issues. I tried adding `scores = torch.max(scores, torch.tensor(torch.finfo(scores.dtype).min))` here, but this does not seem to help, as the overflow comes after the attention layer. I propose to use these changes for now, as they definitely help to get inference in fp16 working. Maybe we should still add the line `scores = torch.max(scores, torch.tensor(torch.finfo(scores.dtype).min))`, but I can see that the attention scores are cast to fp32 before the softmax, so maybe it's not necessary - cc @ydshieh 🙏
I think that we should also add a small slow test reproducing the behavior of the snippet above!
```python
@slow
def test_fp16_inference(self):
    tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
    model = LongT5ForConditionalGeneration.from_pretrained(
        "google/long-t5-tglobal-base", torch_dtype=torch.float16
    ).to(0)

    inputs = tokenizer(100 * "studies have shown that owning a dog is good for you ", return_tensors="pt")
    input_ids = inputs.input_ids

    outputs = model.encoder(input_ids.to(0))
    self.assertFalse(outputs.last_hidden_state.isnan().any())
```
I also propose to change the comments and explicitly state that this helps for fp16 inference, not training, as mentioned by @patrickvonplaten.
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Hey @SSamDav, if the PR as it is now solves your problem for inference, it's good for me to merge! I don't think it'll fix problems with fine-tuning though.

Hey @patrickvonplaten, good, thanks for the help!
What does this PR do?
Fixes # (issue)
Following the discussion in #9295 and the solution proposed in #9487, this PR implements the solution that enables FP16 for LongT5 models.
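As a rough sketch of where the clamp sits (assumed structure only; the real `LongT5LayerFF` in transformers also has layer norm and dropout, and the class below is a toy stand-in):

```python
import torch
import torch.nn as nn

class ClampedFF(nn.Module):
    """Toy feed-forward block showing where the fp16 clamp goes (illustrative only)."""

    def __init__(self, d_model: int = 8, d_ff: int = 16):
        super().__init__()
        self.wi = nn.Linear(d_model, d_ff, bias=False)
        self.wo = nn.Linear(d_ff, d_model, bias=False)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        forwarded_states = self.wo(self.act(self.wi(hidden_states)))
        # The residual addition is where the fp16 overflow was observed.
        hidden_states = hidden_states + forwarded_states
        if hidden_states.dtype == torch.float16:
            clamp_value = torch.finfo(hidden_states.dtype).max - 1000
            hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
        return hidden_states

out = ClampedFF()(torch.randn(2, 4, 8))  # fp32 path here; in fp16 the clamp kicks in
print(out.shape)
```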
Who can review:
@patrickvonplaten, @patil-suraj