Skip `test_prompt_lookup_decoding_matches_greedy_search` for `voxtral` by ydshieh · Pull Request #40643 · huggingface/transformers

ydshieh · 2025-09-03T08:43:24Z

What does this PR do?

`tests/models/voxtral/test_modeling_voxtral.py::VoxtralForConditionalGenerationModelTest::test_prompt_lookup_decoding_matches_greedy_search`

is flaky, so this PR skips it for `voxtral`. Example failing CI job:

https://app.circleci.com/pipelines/github/huggingface/transformers/144445/workflows/829f1347-398e-493f-a531-36c3178da153/jobs/1909491

    @can_return_tuple
    @auto_docstring
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        input_features: Optional[torch.FloatTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Cache] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        cache_position: Optional[torch.LongTensor] = None,
        logits_to_keep: Union[int, torch.Tensor] = 0,
        **kwargs: Unpack[TransformersKwargs],
    ) -> CausalLMOutputWithPast:
        if inputs_embeds is None:
            inputs_embeds = self.get_input_embeddings()(input_ids)
    
        if input_features is not None:
            audio_embeds = self.get_audio_embeds(input_features)
    
            # replace text-audio token placeholders with audio embeddings
            audio_token_mask = input_ids == self.config.audio_token_id
>           inputs_embeds[audio_token_mask] = audio_embeds
E           RuntimeError: shape mismatch: value tensor of shape [30, 32] cannot be broadcast to indexing result of shape [32, 32]

    /usr/local/lib/python3.9/site-packages/transformers/models/voxtral/modeling_voxtral.py:512: RuntimeError
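Reading the error message: the boolean mask selected 32 placeholder positions in `input_ids` (the indexing result is `[32, 32]`, i.e. 32 positions by hidden size 32), while `get_audio_embeds` returned only 30 rows (`[30, 32]`). The following is a minimal pure-Python sketch of that count comparison, not code from `modeling_voxtral.py`. The helper name `check_audio_placeholder_count` and the token id `24` are hypothetical and chosen only for illustration.

```python
def check_audio_placeholder_count(input_ids, audio_token_id, num_audio_embeds):
    """Count audio placeholder tokens and compare with the number of audio embedding rows.

    Hypothetical helper for illustration only; it is not part of transformers.
    `input_ids` is a batch given as a list of lists of token ids.
    Returns (num_placeholders, counts_match).
    """
    num_placeholders = sum(
        token == audio_token_id for row in input_ids for token in row
    )
    return num_placeholders, num_placeholders == num_audio_embeds


# Assumed values mirroring the shapes in the traceback:
# 32 placeholder tokens in the prompt, but only 30 audio embedding rows.
AUDIO_TOKEN_ID = 24  # hypothetical id, not taken from the Voxtral config
batch = [[AUDIO_TOKEN_ID] * 32 + [1, 2, 3]]

count, ok = check_audio_placeholder_count(batch, AUDIO_TOKEN_ID, num_audio_embeds=30)
print(count, ok)
```

When the two counts differ, the masked assignment `inputs_embeds[audio_token_mask] = audio_embeds` cannot succeed, which is the `RuntimeError` shown above.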

HuggingFaceDocBuilderDev · 2025-09-03T08:52:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp

Thanks for fixing, it's been annoying me for a while as well!

Now we have audio models and the list is growing huge 🙃 Maybe we can update this by checking if the config has any of audio/image/video token id and skip it?
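A rough sketch of what such a config-based check could look like. This is not the change made in this PR. The helper name `has_multimodal_token_id` is hypothetical, `audio_token_id` appears in the traceback above, and `image_token_id` and `video_token_id` are assumed attribute names that would need to be verified against the actual model configs.

```python
from types import SimpleNamespace

# Attribute names to probe. Only `audio_token_id` is confirmed by the traceback;
# the other two are assumptions for illustration.
MULTIMODAL_TOKEN_ID_ATTRS = ("audio_token_id", "image_token_id", "video_token_id")


def has_multimodal_token_id(config):
    """Return True if the config defines any audio/image/video placeholder token id."""
    return any(
        getattr(config, name, None) is not None for name in MULTIMODAL_TOKEN_ID_ATTRS
    )


# Stand-in configs; a real test would pass the model's config object instead.
audio_config = SimpleNamespace(audio_token_id=24, vocab_size=99)
text_config = SimpleNamespace(vocab_size=99)

print(has_multimodal_token_id(audio_config))
print(has_multimodal_token_id(text_config))
```

Inside a `unittest.TestCase` method, the result could drive `self.skipTest(...)` so that new multimodal models would not each need a manual skip entry.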

ydshieh · 2025-09-03T11:33:19Z

> Maybe we can update this by checking if the config has any of audio/image/video token id and skip it?

Let me add a TODO as comment and do it later 🙏

too many things this week 😢

fix

3b71b04

ydshieh requested review from gante and zucchini-nlp September 3, 2025 08:43

fix

8975419

zucchini-nlp approved these changes Sep 3, 2025

fix

714377e

ydshieh enabled auto-merge (squash) September 3, 2025 11:36

ydshieh merged commit c485c52 into main Sep 3, 2025
25 checks passed

ydshieh deleted the fix_voxtral branch September 3, 2025 11:45

ydshieh mentioned this pull request Sep 3, 2025

Skip test_prompt_lookup_decoding_matches_greedy_search for qwen2_audio #40664

Merged

ydshieh merged 3 commits into main from fix_voxtral
