Gemma4: fix failed test cases #45568

Open

kaixuanliu wants to merge 8 commits into huggingface:main from kaixuanliu:gemma4-fix

Conversation

@kaixuanliu (Contributor) commented Apr 22, 2026

What does this PR do?

This PR does several things:

  1. Skip some test cases that are not suitable for the Gemma4 model
  2. Fix a bug when attention_mask is None (tests/models/gemma4/test_modeling_gemma4.py::Gemma4Audio2TextModelTest::test_eager_matches_fa2_generate)
  3. Fix some failed test cases related to test_flash_attn_x_from_config
  4. Add XPU-related Expectations (see the sketch below)
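
A rough sketch of what an XPU entry might look like, assuming the Expectations helper in transformers.testing_utils (the device keys and strings here are illustrative, not the actual values added by this PR):

import torch
from transformers.testing_utils import Expectations

# Expectations maps (device_type, major_version) keys to expected values,
# so XPU runners can carry their own reference outputs alongside CUDA ones.
expected_texts = Expectations(
    {
        ("cuda", 8): "expected output on A100-class GPUs",  # illustrative
        ("xpu", 3): "expected output on Intel XPU",         # illustrative
    }
)
EXPECTED_TEXT = expected_texts.get_expectation()  # picks the entry for the current device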

Fixes # (issue)

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Who can review?

@ydshieh, please help review.

@kaixuanliu changed the title from "Gemma4 fix" to "Gemma4: fix failed test cases" on Apr 22, 2026
@kaixuanliu marked this pull request as ready for review on April 22, 2026 at 09:25
@github-actions (Contributor) commented:

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma4

Comment on lines +1944 to +1945
if attention_mask is not None:
    attention_mask = self._convert_4d_mask_to_blocked_5d(attention_mask)
Collaborator commented:

@Cyrilvallez any opinion?

From the PR description:

  Fix a bug when attention_mask is None (tests/models/gemma4/test_modeling_gemma4.py::Gemma4Audio2TextModelTest::test_eager_matches_fa2_generate)
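
For context, a minimal sketch of the failure mode being fixed (names are illustrative stand-ins, not the actual Gemma4 modeling code): generate() under FA2 can run without a padding mask, so the conversion has to be guarded against None.

import torch

def convert_4d_mask_to_blocked_5d(attention_mask: torch.Tensor) -> torch.Tensor:
    # Illustrative stand-in for the real helper: any tensor op here
    # fails on a None input (None has no .shape).
    bsz, heads, q_len, kv_len = attention_mask.shape
    return attention_mask.view(bsz, heads, 1, q_len, kv_len)

def maybe_convert(attention_mask):
    # Mirrors the guard in the diff above: attention_mask is None when
    # generate() runs without padding (as in test_eager_matches_fa2_generate),
    # so only convert when a mask was actually provided.
    if attention_mask is not None:
        attention_mask = convert_4d_mask_to_blocked_5d(attention_mask)
    return attention_mask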

Comment on lines +135 to +147
@unittest.skip(
    "Under non-bf16 dtypes, MoE grouped_mm falls back to "
    "_grouped_mm_fallback_backward which is incompatible with torch.compile."
)
def test_flash_attn_2_can_compile_with_attention_mask_None_without_graph_break(self):
    pass

@unittest.skip(
    "Under non-bf16 dtypes, MoE grouped_mm falls back to "
    "_grouped_mm_fallback_backward which is incompatible with torch.compile."
)
def test_torch_compile_for_training(self):
    pass
Collaborator commented:

OK for me. Just let one of @Cyrilvallez or @vasqu also validate or comment.

Collaborator commented:

They are indeed failing on our CI too

Contributor commented:

Hmm, IIRC the fallback should be compile-compatible. cc @IlyasMoutawwakil

Contributor commented:

Which torch version is the CI on now, by the way?

Member commented:

It is torch.compile-compatible, just not in any mode that uses CUDA graphs (like max-autotune). torch.grouped_mm also fails on < sm90, where it likewise takes the fallback path.
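
A rough illustration of that constraint (hypothetical helper; the sm90 threshold comes from the comment above): the fast grouped_mm path needs a Hopper-class GPU, so anything relying on CUDA-graph compile modes should be gated on compute capability.

import torch

def grouped_mm_fast_path_available() -> bool:
    # Hypothetical gate: assume the fast grouped_mm kernel requires
    # compute capability >= 9.0 (sm90 / Hopper); older GPUs take the
    # fallback path, which breaks cuda-graph modes like max-autotune.
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 9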

Comment on lines +464 to +478
@require_flash_attn
@require_torch_accelerator
@mark.flash_attn_test
@slow
def test_flash_attn_2_from_config(self):
    # Gemma4 requires mm_token_type_ids in train mode, so we test in eval mode
    self.flash_attn_from_config(attn_implementation="flash_attention_2", test_fwd_in_train=False)

@require_flash_attn_3
@require_torch_gpu
@mark.flash_attn_3_test
@slow
def test_flash_attn_3_from_config(self):
    # Gemma4 requires mm_token_type_ids in train mode, so we test in eval mode
    self.flash_attn_from_config(attn_implementation="flash_attention_3", test_fwd_in_train=False)
Collaborator commented:

@kaixuanliu I didn't see these two failing on our Flash Attn CI job.

Could you share more info / error logs?

Contributor commented:

Our flash attn CI doesn't have FA3; I think it's hard to install because you need to compile from source, and it takes much longer than the FA2 build from source.

Maybe we could add a separate FA4 CI; not sure how stable it is though, since it's still in beta.

Contributor (Author) commented:

Well, FA3 and FA4 are skipped in my env as well. I can delete these two.

Contributor commented:

Ah no, see my comment below #45568 (comment)

Collaborator commented:

No, I mean for

test_flash_attn_2_from_config

our CI passes. So I am not sure why we need this fix, at least for FA2.

Our CI runner doesn't have FA3 or FA4, so those are skipped. But the question may still be valid: do we really need this fix?

    pass

@unittest.skip("The base test does not pass image_position_ids and mm_token_type_ids required by Gemma4")
def test_flash_attn_4_inference_equivalence_right_padding(self):
Contributor commented:

Can we have something like

def skip_non_greedy_generate(self):
    skippable_tests = [
        "test_sample_generate_dict_output",  # return sequences > 1
        "test_beam",
        "test_contrastive",
        "test_assisted",
        "test_prompt_lookup",
        "test_model_parallel_beam_search",
        "test_generate_without_input_ids",
    ]
    for test in skippable_tests:
        if self._testMethodName.startswith(test):
            self.skipTest(reason="Dia only supports greedy search / sampling with one sequence.")

but for FA? IMO it will always be quite a lot to skip these manually like that.
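
One possible shape for that, as a sketch only (the helper name and prefix list are assumptions, echoing the test names quoted in this thread rather than an actual Gemma4 test suite):

def skip_fa_tests_missing_multimodal_inputs(self):
    # Hypothetical helper mirroring skip_non_greedy_generate above: skip
    # FA tests whose base implementation does not pass the
    # image_position_ids / mm_token_type_ids inputs Gemma4 requires.
    skippable_prefixes = [
        "test_flash_attn_2_inference_equivalence",  # assumed prefix
        "test_flash_attn_3_inference_equivalence",  # assumed prefix
        "test_flash_attn_4_inference_equivalence",  # assumed prefix
    ]
    for prefix in skippable_prefixes:
        if self._testMethodName.startswith(prefix):
            self.skipTest(reason="Base FA test does not pass image_position_ids and mm_token_type_ids required by Gemma4.")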
