cohere_asr: fix bug for model_parallel_beam_search test case #45214
ydshieh merged 5 commits into huggingface:main
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
# Fixed sinusoidal position embedding added to token embeddings, then layernorm
pos_emb = self.pos_emb(position_ids.squeeze(0))
pos_emb = pos_emb.to(inputs_embeds.device)
Could you share the (full) error log you have for this test, please 🙏 ? Thanks.
I don't see this model's tests using device_map = "auto", so it's a bit strange we have the device issue.
When I use 4 cards (export CUDA_VISIBLE_DEVICES=0,4,5,6), this test case throws the following error:
        encoder_hidden_states = self.proj(encoder_hidden_states)

        if (input_ids is None) ^ (inputs_embeds is not None):
            raise ValueError("You must specify exactly one of input_ids or inputs_embeds")

        if inputs_embeds is None:
            inputs_embeds = self.embed_tokens(input_ids)

        if use_cache and past_key_values is None:
            past_key_values = EncoderDecoderCache(DynamicCache(config=self.config), DynamicCache(config=self.config))

        if position_ids is None:
            past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
            position_ids = torch.arange(inputs_embeds.shape[1], device=inputs_embeds.device) + past_seen_tokens
            position_ids = position_ids.unsqueeze(0)

        # Fixed sinusoidal position embedding added to token embeddings, then layernorm
        pos_emb = self.pos_emb(position_ids.squeeze(0))
>       inputs_embeds = self.embedding_layernorm(inputs_embeds + pos_emb)
                                                 ^^^^^^^^^^^^^^^^^^^^^^^
E       RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:3!

src/transformers/models/cohere_asr/modeling_cohere_asr.py:388: RuntimeError
OK, my bad: this test actually does use "auto", via new_model = model_class.from_pretrained(tmp_dir, device_map="auto")
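For readers following the thread, here is a minimal sketch of the pattern the fix relies on: moving the position embedding onto the device of the token embeddings before adding them. `TinyDecoderEmbeddings` and its sizes are hypothetical stand-ins, not the real `cohere_asr` code, and a learned `nn.Embedding` is assumed in place of the fixed sinusoidal table. On a single device the `.to(...)` call is a no-op; it only matters when `device_map="auto"` has placed submodules on different cards.

```python
import torch
from torch import nn


class TinyDecoderEmbeddings(nn.Module):
    """Hypothetical stand-in for the decoder embedding step (not the real model)."""

    def __init__(self, vocab_size=32, hidden_size=8, max_positions=16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        # Assumption: a plain embedding table stands in for the sinusoidal one.
        self.pos_emb = nn.Embedding(max_positions, hidden_size)
        self.embedding_layernorm = nn.LayerNorm(hidden_size)

    def forward(self, input_ids):
        inputs_embeds = self.embed_tokens(input_ids)
        position_ids = torch.arange(
            inputs_embeds.shape[1], device=inputs_embeds.device
        ).unsqueeze(0)
        pos_emb = self.pos_emb(position_ids.squeeze(0))
        # Under a multi-device placement, pos_emb can come back on another
        # device than inputs_embeds; align it before the addition.
        pos_emb = pos_emb.to(inputs_embeds.device)
        return self.embedding_layernorm(inputs_embeds + pos_emb)


model = TinyDecoderEmbeddings()
hidden = model(torch.tensor([[1, 2, 3]]))
```

The sketch runs on CPU, so it cannot reproduce the `cuda:1` / `cuda:3` mismatch from the log above; it only shows where the device alignment sits relative to the addition.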
run-slow: cohere_asr
[For maintainers] Suggested jobs to run (before merge): run-slow: cohere_asr
This comment contains models: ["models/cohere_asr"]
CI Results
Model CI Report: ❌ 1 new failed tests from this PR 😭
run-slow: cohere_asr
This comment contains models: ["models/cohere_asr"]
…uggingface#45214)
* cohere_asr: fix bug for model_parallel_beam_search test case
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* fix
* fix
---------
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
This PR fixes the failing test case
tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrModelTest::test_model_parallel_beam_search, and adds some adjustments so that the test cases pass on Intel XPU devices. @ydshieh please help review, thanks!