Fix XGLMModelLanguageGenerationTest.test_batched_nan_fp16 #19473
ydshieh merged 3 commits into huggingface:main
- # embed positions
- positions = self.embed_positions(input_ids, inputs_embeds, past_key_values_length)
+ # embed positions, cast from float32 to `inputs_embeds.dtype`
+ positions = self.embed_positions(input_ids, inputs_embeds, past_key_values_length).to(inputs_embeds.dtype)
`positions` (and `embed_positions`'s weights) are in float32 even if we load the model in float16. We need this cast to make the later LayerNorm layer work.
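The dtype drift described here can be sketched with plain tensors (a hypothetical minimal reproduction with made-up shapes, not the actual XGLM code): float16 + float32 promotes to float32, and the explicit cast keeps the sum in half precision.

```python
import torch

# Hypothetical stand-ins for the model tensors: inputs_embeds is float16
# (model loaded with torch_dtype=torch.float16), while the sinusoidal
# position table was created in float32.
inputs_embeds = torch.zeros(2, 4, 8, dtype=torch.float16)
positions = torch.zeros(2, 4, 8, dtype=torch.float32)

# Without the cast, type promotion silently upcasts the sum to float32.
hidden_states = inputs_embeds + positions
assert hidden_states.dtype == torch.float32

# With the cast from this PR, hidden_states stays float16.
hidden_states = inputs_embeds + positions.to(inputs_embeds.dtype)
assert hidden_states.dtype == torch.float16
```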
I think the right fix would be to make sure the weights have the correct dtype: the embedding layer is the biggest one, so the memory savings are very important.
sgugger left a comment
Thanks for looking into this. While this fixes the issue, I'm not sure if it's the right fix.
@sgugger OK, let me check if I can do something for the (not real) weights defined by `self.register_buffer("weights", emb_weights)`.
  emb[padding_idx, :] = 0

- return emb
+ return emb.to(torch.get_default_dtype())
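A sketch of why casting to `torch.get_default_dtype()` works (assuming, as in transformers, that `from_pretrained(torch_dtype=...)` temporarily sets the default dtype while the model is built):

```python
import torch

# float32 table, as built at init time.
emb = torch.zeros(10, 8, dtype=torch.float32)

# Under a float16 default dtype, get_default_dtype() returns float16, so
# the cast downcasts the table; under the stock default it is a no-op.
prev = torch.get_default_dtype()
torch.set_default_dtype(torch.float16)
try:
    emb_cast = emb.to(torch.get_default_dtype())
finally:
    torch.set_default_dtype(prev)

assert emb_cast.dtype == torch.float16
assert emb.to(torch.get_default_dtype()).dtype == torch.float32
```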
The test involved in this PR uses `from_pretrained(model_name, torch_dtype=torch.float16, ...)`, but at init time the model uses float32 to create some tensors that are then registered as buffers (and stay float32).
sgugger left a comment
Thanks, this feels like a much better fix!
What does this PR do?
#18057 added this test to test running with fp16.
However, `from_pretrained(model_name, torch_dtype=torch.float16)` seems unable to change the dtype of the weights registered below:
transformers/src/transformers/models/xglm/modeling_xglm.py, lines 168 to 176 in a7bc422
and `hidden_states` becomes `float32` again (because `positions` is) at
transformers/src/transformers/models/xglm/modeling_xglm.py, line 715 in a7bc422
and finally fails at `hidden_states = self.self_attn_layer_norm(hidden_states)` with `RuntimeError: expected scalar type Float but found Half`.