fix(x_clip): fix 8 failed test cases by kaixuanliu · Pull Request #45394 · huggingface/transformers

kaixuanliu · 2026-04-13T08:28:21Z

No description provided.

Fixed 8 test(s):

- tests/models/x_clip/test_modeling_x_clip.py::XCLIPVisionModelTest::test_flash_attn_2_inference_equivalence
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPVisionModelTest::test_flash_attn_2_inference_equivalence_right_padding
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPVisionModelTest::test_model_parallelism
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelTest::test_flash_attn_2_inference_equivalence
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelTest::test_flash_attn_2_inference_equivalence_right_padding
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelTest::test_model_parallelism
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelIntegrationTest::test_inference
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelIntegrationTest::test_inference_interpolate_pos_encoding

kaixuanliu · 2026-04-13T12:38:53Z

This PR fixes 8 failing test cases for the x_clip model:

- tests/models/x_clip/test_modeling_x_clip.py::XCLIPVisionModelTest::test_flash_attn_2_inference_equivalence
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPVisionModelTest::test_flash_attn_2_inference_equivalence_right_padding
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPVisionModelTest::test_model_parallelism
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelTest::test_flash_attn_2_inference_equivalence
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelTest::test_flash_attn_2_inference_equivalence_right_padding
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelTest::test_model_parallelism
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelIntegrationTest::test_inference
- tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelIntegrationTest::test_inference_interpolate_pos_encoding

@ydshieh @molbap pls help review, thx!

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

kaixuanliu · 2026-04-17T01:05:51Z

@zucchini-nlp Hi, can you help review? Thx!

zucchini-nlp · 2026-04-17T10:39:48Z

+    @unittest.skip(
+        reason="X-CLIP's hidden_states are nested in sub-outputs (text_model_output, vision_model_output), not at root level"
+    )
+    def test_flash_attn_2_inference_equivalence(self):
+        pass
+


We can change this part to check whether the output has logits_per_video. I guess X-CLIP has no images, which is why it fails now.

transformers/tests/test_modeling_common.py

Lines 3357 to 3366 in 69448db

def _get_output_logits(outputs):
    if "hidden_states" in outputs:
        return outputs.hidden_states[-1]
    elif model.config.is_encoder_decoder:
        return outputs.decoder_hidden_states[-1]
    elif "logits_per_image" in outputs:
        return outputs.logits_per_image
    else:
        return outputs.logits

I tried this, but the test still failed for this case. Because the value of logits_per_video is very sensitive to the vision_model_output's hidden_states, the tolerance is not sufficient.

Oops, on CUDA it is OK, I will update the code here.
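For readers following this thread, here is a minimal sketch of what the suggested check could look like. It is not the merged change: `FakeOutput` is a hypothetical stand-in for a model output object, `get_output_logits` takes `is_encoder_decoder` as a plain argument instead of reading `model.config`, and the position of the `logits_per_video` branch is an assumption.

```python
class FakeOutput(dict):
    """Hypothetical stand-in for a model output: a dict with attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


def get_output_logits(outputs, is_encoder_decoder=False):
    # Same branch order as the quoted helper, plus one extra branch
    # (an assumption based on the review suggestion) for video-text models.
    if "hidden_states" in outputs:
        return outputs.hidden_states[-1]
    elif is_encoder_decoder:
        return outputs.decoder_hidden_states[-1]
    elif "logits_per_image" in outputs:
        return outputs.logits_per_image
    elif "logits_per_video" in outputs:
        return outputs.logits_per_video
    else:
        return outputs.logits


# Illustrative values only; real outputs would hold tensors.
xclip_like = FakeOutput(logits_per_video=[[0.1, 0.9]], logits_per_text=[[0.1], [0.9]])
clip_like = FakeOutput(logits_per_image=[[0.3, 0.7]])
selected_video = get_output_logits(xclip_like)
selected_image = get_output_logits(clip_like)
```

With this ordering, an output that exposes neither `hidden_states` nor `logits_per_image` at the root falls through to `logits_per_video` before the generic `logits` fallback.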

zucchini-nlp · 2026-04-17T10:42:33Z

+    @unittest.skip(reason="X-CLIP needs batch size to match frames, can't crop and create new dummy inputs")
+    def test_flash_attn_2_inference_equivalence(self):
+        pass
+


I suppose this is already failing on main, but I'm not sure we want to just skip it. @ydshieh to review.
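The skip reason suggests that the vision model sees video frames flattened into the batch dimension, so the leading dimension has to stay a multiple of the frame count. A minimal sketch of that constraint follows, assuming this reading is correct. `split_batch_time` is a hypothetical helper, not a transformers function, and the numbers are illustrative.

```python
def split_batch_time(batch_time, num_frames):
    """Return the number of videos implied by a flattened (video * frame) batch."""
    if batch_time % num_frames != 0:
        raise ValueError(
            f"leading dimension {batch_time} is not a multiple of num_frames={num_frames}"
        )
    return batch_time // num_frames


# Two videos of eight frames each flatten to a leading dimension of 16.
num_videos = split_batch_time(batch_time=16, num_frames=8)

# Cropping a single row away (as a generic test might do) breaks the grouping.
try:
    split_batch_time(batch_time=15, num_frames=8)
    cropped_ok = True
except ValueError:
    cropped_ok = False
```

Under this assumption, a test that crops or rebuilds dummy inputs along the batch dimension cannot keep the frames of one video together.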

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

github-actions · 2026-04-20T13:48:05Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: x_clip

ydshieh · 2026-04-28T14:18:12Z

+    def __call__(self, images=None, text=None, videos=None, **kwargs):
+        # X-CLIP uses the image_processor for video frames. Map videos to images
+        # so the base class processes them through image_processor.
+        if videos is not None and images is None:
+            images = videos
+        return super().__call__(images=images, text=text, **kwargs)


cc @yonigozlan and @zucchini-nlp if you remember the history and could judge the change here.

Yes, I made those changes. X-CLIP actually processes videos the old-style way, via an image processor, so this is a valid fix.

I don't mind it, so we don't have to override the whole __call__

Thanks a lot
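The delegation pattern in the diff above can be sketched in isolation. `BaseProcessor` and `VideoAsImagesProcessor` are hypothetical stand-ins, not the transformers classes, and the base class here only echoes its inputs rather than running a real image processor or tokenizer.

```python
class BaseProcessor:
    """Hypothetical stand-in for the shared processor base class."""

    def __call__(self, images=None, text=None, **kwargs):
        result = {}
        if images is not None:
            # Stand-in for running the image processor on each frame.
            result["pixel_values"] = list(images)
        if text is not None:
            # Stand-in for tokenization: record each string's length.
            result["input_ids"] = [len(t) for t in text]
        return result


class VideoAsImagesProcessor(BaseProcessor):
    def __call__(self, images=None, text=None, videos=None, **kwargs):
        # Route video frames through the image path, mirroring the diff:
        # videos are used only when no images were passed explicitly.
        if videos is not None and images is None:
            images = videos
        return super().__call__(images=images, text=text, **kwargs)


processor = VideoAsImagesProcessor()
batch = processor(videos=["frame0", "frame1"], text=["playing sports"])
both = processor(images=["img"], videos=["frame0"])
```

The override stays small because only the argument mapping changes; everything else is left to the base class, which matches the point made in the review that the whole `__call__` does not need to be overridden.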

ydshieh · 2026-04-28T14:53:18Z

run inside CI runner, all ✅

Sai-Suraj-27 mentioned this pull request Apr 13, 2026

Fix failing XCLIPModelIntegrationTest #45053

Closed

6 tasks

kaixuanliu marked this pull request as ready for review April 13, 2026 12:26

github-actions Bot requested review from molbap and ydshieh April 13, 2026 12:27

kaixuanliu changed the title ~~fix(x_clip): auto-fix failing tests~~ fix(x_clip): fix 8 failed test cases Apr 14, 2026

kaixuanliu added 2 commits April 14, 2026 07:43

update

2a47f0b

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

Merge branch 'main' into fix-model-tests-x_clip-20260413

67d7202

kaixuanliu marked this pull request as draft April 16, 2026 02:19

kaixuanliu added 2 commits April 16, 2026 03:02

update skip reason

e4367a5

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

update skip reason

d6ed15e

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

kaixuanliu changed the title ~~fix(x_clip): fix 8 failed test cases~~ fix(x_clip): fix 7 failed test cases Apr 16, 2026

kaixuanliu marked this pull request as ready for review April 16, 2026 03:29

update no_split_modules

d5a88c6

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

kaixuanliu changed the title ~~fix(x_clip): fix 7 failed test cases~~ fix(x_clip): fix 8 failed test cases Apr 16, 2026

zucchini-nlp reviewed Apr 17, 2026


kaixuanliu added 2 commits April 17, 2026 15:05

update code

c61fd38

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

Merge branch 'main' into fix-model-tests-x_clip-20260413

de7cc05

ydshieh reviewed Apr 28, 2026


ydshieh approved these changes Apr 28, 2026


ydshieh merged commit 5d7ff43 into huggingface:main Apr 28, 2026
24 checks passed

evalstate mentioned this pull request Apr 28, 2026

Cumulative defect fixes from recent Transformers PRs evalstate/transformers#41

Open

ydshieh merged 8 commits into huggingface:main from kaixuanliu:fix-model-tests-x_clip-20260413


