Fix slow test_moshika_greedy_unconditional_fp16 #39251
manueldeprada wants to merge 5 commits into huggingface:main
Conversation
run-slow: moshi

This comment contains run-slow, running the specified jobs: models: ['models/moshi']

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
eustlb left a comment
Thanks for the update @manueldeprada, still some work to come to fix moshi...
Is this PR still relevant concerning the changes to cache_utils.py and generation/utils.py? If not, can you resolve the merge conflicts?
 def _prepare_attention_mask_for_generation(
     self,
-    input_ids: torch.LongTensor,
+    inputs_tensor: torch.Tensor,
     generation_config: GenerationConfig,
-    kwargs: dict[str, Any],
+    model_kwargs: dict[str, Any],
 ) -> torch.LongTensor:
-    pad_token_id = generation_config.pad_token_id
-    eos_token_id = generation_config.eos_token_id
-
-    default_attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device)
-    if pad_token_id is None:
-        return default_attention_mask
-
-    is_pad_token_in_inputs = (pad_token_id is not None) and torch.isin(input_ids, pad_token_id).any()
-    is_pad_token_not_equal_to_eos_token_id = (eos_token_id is None) or ~torch.isin(
-        eos_token_id, pad_token_id
-    ).any()
-    can_infer_attention_mask = is_pad_token_in_inputs * is_pad_token_not_equal_to_eos_token_id
-    attention_mask_from_padding = input_ids.ne(pad_token_id).long()
-
-    attention_mask = (
-        attention_mask_from_padding * can_infer_attention_mask + default_attention_mask * ~can_infer_attention_mask
-    )
-    return attention_mask
+    return super()._prepare_attention_mask_for_generation(
+        inputs_tensor=inputs_tensor,
+        generation_config=generation_config,
+        model_kwargs={},
+    )
can't we just remove _prepare_attention_mask_for_generation override here?
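For context, the logic the override duplicates, and which the base implementation in generation/utils.py is meant to cover, is the pad-token-based mask inference. A minimal, self-contained sketch of that inference with toy values (a hypothetical simplification, not the actual transformers code):

import torch

def infer_attention_mask(input_ids, pad_token_id, eos_token_id):
    # Simplified version of the logic in the removed override above; illustration only.
    default_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device)
    if pad_token_id is None:
        return default_mask
    has_pad = torch.isin(input_ids, torch.tensor(pad_token_id)).any()
    pad_is_not_eos = eos_token_id is None or not torch.isin(
        torch.tensor(eos_token_id), torch.tensor(pad_token_id)
    ).any()
    if has_pad and pad_is_not_eos:
        # Every non-pad position is attended to.
        return input_ids.ne(pad_token_id).long()
    return default_mask

ids = torch.tensor([[0, 0, 5, 6], [3, 4, 5, 6]])  # left-padded batch, pad_token_id=0
print(infer_attention_mask(ids, pad_token_id=0, eos_token_id=2))
# tensor([[0, 0, 1, 1],
#         [1, 1, 1, 1]])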
 slicing = torch.arange(max_cache_len, device=value_states.device)
 current_seq_len = cache_position[-1] + 1  # Use last position to determine current length
-to_shift = current_seq_len > max_cache_len
+to_shift = current_seq_len >= max_cache_len
this will break other models no?
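For a concrete picture of the off-by-one this touches, here is a toy loop (hypothetical numbers, not the real cache_utils.py code) showing when each predicate starts shifting a 4-slot window:

import torch

max_cache_len = 4
for step in range(6):                         # one decoded token per step
    cache_position = torch.tensor([step])     # position of the token being written
    current_seq_len = cache_position[-1] + 1  # tokens processed so far
    shifts_gt = bool(current_seq_len > max_cache_len)
    shifts_ge = bool(current_seq_len >= max_cache_len)
    print(f"step={step}: '>' shifts={shifts_gt}, '>=' shifts={shifts_ge}")
# '>=' starts shifting one step earlier (as soon as current_seq_len == max_cache_len),
# which is exactly the behavioural change the review comment above worries about.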
 # - different models have a different cache name expected by the model (default = "past_key_values")
 # - `max_length`, prepared above, is used to determine the maximum cache length
-max_cache_length = generation_config.max_length - 1
+max_cache_length = generation_config.max_length
this will break other models no?
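A quick numeric sketch of the sizing difference (illustrative values only; the original `- 1` presumably reflects that the KV of the very last generated token is never read again):

max_length = 10                        # prompt + generated tokens, e.g. GenerationConfig.max_length
old_max_cache_length = max_length - 1  # 9 slots: the token at position 9 is produced last and never attended to
new_max_cache_length = max_length      # 10 slots: one extra slot for every model using a fixed-size cache
print(old_max_cache_length, new_max_cache_length)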
Fix #38725
Coming from #38725: previously, e18f233 attempted to fix the default attention mask issue that appeared with #34464, but the following slow test was still failing (a reproduction command is given after the bisect history below):
tests/models/moshi/test_modeling_moshi.py::MoshiIntegrationTests::test_moshika_greedy_unconditional_fp16

History from git bisect:
- transformers/src/transformers/modeling_utils.py, line 1413 in 84a6789
- transformers/src/transformers/generation/utils.py, line 2090 in 36bf1d2
- transformers/src/transformers/cache_utils.py, line 1740 in 1b22290
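The failing test can be reproduced with the usual slow-test invocation (assuming a GPU machine with the slow test requirements installed):

RUN_SLOW=1 python -m pytest tests/models/moshi/test_modeling_moshi.py::MoshiIntegrationTests::test_moshika_greedy_unconditional_fp16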
Setting cache_implementation="dynamic" makes the test pass, but the sliding window cache should not behave differently. I believe this is due to the depth decoder having a sliding window of 8 by default, but the audio side is still confusing to me.
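For reference, the workaround can be applied like this (a sketch only; the checkpoint name is a placeholder, not the one used in the actual slow test):

import torch
from transformers import MoshiForConditionalGeneration

model = MoshiForConditionalGeneration.from_pretrained("<moshika-checkpoint>", torch_dtype=torch.float16)
model.generation_config.cache_implementation = "dynamic"  # opt out of the sliding-window cache path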
This PR is not a fix: the modeling code should be changed to accommodate what I highlight in the diff.
cc @eustlb @ydshieh