Skip test_prompt_lookup_decoding_matches_greedy_search for voxtral #40643

Merged
ydshieh merged 3 commits into main from fix_voxtral
Sep 3, 2025

ydshieh commented Sep 3, 2025

What does this PR do?

`tests/models/voxtral/test_modeling_voxtral.py::VoxtralForConditionalGenerationModelTest::test_prompt_lookup_decoding_matches_greedy_search` is flaky, so this PR skips it for Voxtral. Example of a failing CI run:

https://app.circleci.com/pipelines/github/huggingface/transformers/144445/workflows/829f1347-398e-493f-a531-36c3178da153/jobs/1909491

```
    @can_return_tuple
    @auto_docstring
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        input_features: Optional[torch.FloatTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Cache] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        cache_position: Optional[torch.LongTensor] = None,
        logits_to_keep: Union[int, torch.Tensor] = 0,
        **kwargs: Unpack[TransformersKwargs],
    ) -> CausalLMOutputWithPast:
        if inputs_embeds is None:
            inputs_embeds = self.get_input_embeddings()(input_ids)

        if input_features is not None:
            audio_embeds = self.get_audio_embeds(input_features)

            # replace text-audio token placeholders with audio embeddings
            audio_token_mask = input_ids == self.config.audio_token_id
>           inputs_embeds[audio_token_mask] = audio_embeds
E           RuntimeError: shape mismatch: value tensor of shape [30, 32] cannot be broadcast to indexing result of shape [32, 32]

/usr/local/lib/python3.9/site-packages/transformers/models/voxtral/modeling_voxtral.py:512: RuntimeError
```
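
The traceback boils down to a count mismatch: the boolean mask selects 32 placeholder positions, but only 30 audio embedding rows are available to fill them. Below is a minimal pure-Python sketch of that consistency check. The helper names `count_audio_placeholders` and `check_audio_alignment` and the token id `24` are hypothetical, chosen for illustration only; they are not part of the Voxtral implementation.

```python
def count_audio_placeholders(input_ids, audio_token_id):
    """Count positions in a batch of token-id lists that hold the audio placeholder."""
    return sum(tok == audio_token_id for row in input_ids for tok in row)


def check_audio_alignment(input_ids, audio_token_id, num_audio_embeds):
    """Raise if the number of placeholders differs from the number of embedding rows."""
    num_placeholders = count_audio_placeholders(input_ids, audio_token_id)
    if num_placeholders != num_audio_embeds:
        raise ValueError(
            f"{num_placeholders} audio placeholders but {num_audio_embeds} audio embeddings"
        )
    return num_placeholders


AUDIO_TOKEN_ID = 24  # hypothetical placeholder id, for illustration only
# One sequence with 32 placeholders and a few text tokens, mirroring the traceback counts
input_ids = [[1, 2] + [AUDIO_TOKEN_ID] * 32 + [3]]

try:
    check_audio_alignment(input_ids, AUDIO_TOKEN_ID, num_audio_embeds=30)
except ValueError as err:
    mismatch_message = str(err)
```

The sketch only reproduces the counts from the error message; it does not explain why the test sometimes produces more placeholders than audio embeddings.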

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@zucchini-nlp zucchini-nlp left a comment

Thanks for fixing, it's been annoying me for a while as well!

Now we have audio models and the list is growing huge 🙃 Maybe we can update this by checking if the config has any of audio/image/video token id and skip it?
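
The suggestion above could look roughly like the sketch below: rather than maintaining a list of model names, the test inspects the config for multimodal placeholder token ids and skips when any is set. `audio_token_id` appears in the traceback; `image_token_id` and `video_token_id`, the helper `has_multimodal_token_id`, and the `SimpleNamespace` stand-in config are assumptions for illustration, not the code merged in this PR.

```python
import unittest
from types import SimpleNamespace

# Attribute names assumed to mark multimodal placeholder tokens on a config
MULTIMODAL_TOKEN_ATTRS = ("audio_token_id", "image_token_id", "video_token_id")


def has_multimodal_token_id(config):
    """Return True if the config defines any audio/image/video placeholder token id."""
    return any(getattr(config, name, None) is not None for name in MULTIMODAL_TOKEN_ATTRS)


class PromptLookupSkipExample(unittest.TestCase):
    # Stand-in for a real model config; only the checked attribute is set
    config = SimpleNamespace(audio_token_id=24)

    def test_prompt_lookup_decoding_matches_greedy_search(self):
        if has_multimodal_token_id(self.config):
            self.skipTest("Model uses multimodal placeholder tokens")
        # The actual prompt-lookup vs. greedy comparison would go here.
```

This sketch only checks top-level attributes; a config that nests the token ids in a sub-config would need an extra lookup.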

Collaborator Author

ydshieh commented Sep 3, 2025

> Maybe we can update this by checking if the config has any of audio/image/video token id and skip it?

Let me add a TODO as comment and do it later 🙏

too many things this week 😢

@ydshieh ydshieh enabled auto-merge (squash) September 3, 2025 11:36
@ydshieh ydshieh merged commit c485c52 into main Sep 3, 2025
25 checks passed
@ydshieh ydshieh deleted the fix_voxtral branch September 3, 2025 11:45