Fix XGLMModelLanguageGenerationTest.test_batched_nan_fp16 #19473
ydshieh merged 3 commits into huggingface:main
- # embed positions
- positions = self.embed_positions(input_ids, inputs_embeds, past_key_values_length)
+ # embed positions, cast from float32 to `inputs_embeds.dtype`
+ positions = self.embed_positions(input_ids, inputs_embeds, past_key_values_length).to(inputs_embeds.dtype)
`positions` (and `embed_positions`'s weights) are in float32 even if we load the model in float16. We need this cast to make the later LayerNorm layer work.
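The dtype drift described here can be sketched with plain tensors (a hypothetical minimal reproduction with made-up shapes, not the actual XGLM code): float16 + float32 promotes to float32, and the explicit cast keeps the sum in half precision.

```python
import torch

# Hypothetical stand-ins for the model tensors: inputs_embeds is float16
# (model loaded with torch_dtype=torch.float16), while the sinusoidal
# position table was created in float32.
inputs_embeds = torch.zeros(2, 4, 8, dtype=torch.float16)
positions = torch.zeros(2, 4, 8, dtype=torch.float32)

# Without the cast, type promotion silently upcasts the sum to float32.
hidden_states = inputs_embeds + positions
assert hidden_states.dtype == torch.float32

# With the cast from this PR, hidden_states stays float16.
hidden_states = inputs_embeds + positions.to(inputs_embeds.dtype)
assert hidden_states.dtype == torch.float16
```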
I think the right fix would be to make sure the weights have the correct dtype: the embedding layer is the biggest one, so the memory savings are very important.
sgugger left a comment
Thanks for looking into this. While this fixes the issue, I'm not sure if it's the right fix.
@sgugger OK, let me check if I can do something for the (not real) weights defined by `self.register_buffer("weights", emb_weights)`.
  emb[padding_idx, :] = 0

- return emb
+ return emb.to(torch.get_default_dtype())
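A sketch of why casting to `torch.get_default_dtype()` works (assuming, as in transformers, that `from_pretrained(torch_dtype=...)` temporarily sets the default dtype while the model is built):

```python
import torch

# float32 table, as built at init time.
emb = torch.zeros(10, 8, dtype=torch.float32)

# Under a float16 default dtype, get_default_dtype() returns float16, so
# the cast downcasts the table; under the stock default it is a no-op.
prev = torch.get_default_dtype()
torch.set_default_dtype(torch.float16)
try:
    emb_cast = emb.to(torch.get_default_dtype())
finally:
    torch.set_default_dtype(prev)

assert emb_cast.dtype == torch.float16
assert emb.to(torch.get_default_dtype()).dtype == torch.float32
```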
The test involved in this PR uses `from_pretrained(model_name, torch_dtype=torch.float16, ...)`, but at init time the model uses float32 to create some tensors that are then registered as buffers (and stay float32).
sgugger left a comment
Thanks, this feels like a much better fix!
What does this PR do?
#18057 added this test to test running with fp16.
However, `from_pretrained(model_name, torch_dtype=torch.float16)` seems unable to change the dtype of the weights registered below:
transformers/src/transformers/models/xglm/modeling_xglm.py, lines 168 to 176 in a7bc422
and `hidden_states` becomes `float32` again (because `positions` is) at
transformers/src/transformers/models/xglm/modeling_xglm.py, line 715 in a7bc422
and finally fails at `hidden_states = self.self_attn_layer_norm(hidden_states)` with `RuntimeError: expected scalar type Float but found Half`.