Refactor CLIP-like models #44431
Conversation
Let me wrap up #43590 first, but this is very much needed 🙏 I've seen similar stuff not only on CLIP tbh, but can't remember where.
Thanks for reopening @zucchini-nlp! Sorry I couldn't get to it first, happy to review if needed!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: altclip, chinese_clip, clip, clipseg, kosmos2, metaclip_2, siglip, siglip2

This comment contains models: ["models/altclip", "models/chinese_clip", "models/clip", "models/clipseg", "models/kosmos2", "models/metaclip_2", "models/siglip", "models/siglip2"]

CI Results · Commit Info

Model CI Report: ❌ 3 new failed tests from this PR 😭
vasqu
left a comment

Some initial comments, just to make sure: we do not need any conversions?
Other than these, super nice work. We need this so much!
    f"`text_config_dict` is provided which will be used to initialize `AltCLIPTextConfig`. The "
    f"`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The "

I don't like that modular didn't change this.
    @auto_docstring
    class Aimv2Model(Aimv2PreTrainedModel):
        config: Aimv2Config
        _no_split_modules = ["Aimv2TextEmbeddings", "Aimv2EncoderLayer", "Aimv2VisionEmbeddings"]

Already defined in Aimv2PreTrainedModel and cleaned up thanks to modular.
    class MetaClip2TextTransformer(MetaClip2PreTrainedModel):

This was a PreTrainedModel, but without a high-level import available. I don't know if anyone was importing and using it, so this may be breaking.
Do we want BC for this model?
    super().__init__(config)
    self.vision_model = MLCDVisionTransformer(config)
    # Initialize weights and apply final processing

With the base model prefix, the model should be loadable back. Deleting this because MLCDVisionModel has no function except being a wrapper around MLCDVisionTransformer.
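The base-model-prefix mechanism mentioned above can be sketched with a toy loader. This is a hypothetical simplification, not the actual `from_pretrained` logic; `reconcile_prefix` and all key names here are made up for illustration:

```python
def reconcile_prefix(model_keys, checkpoint, base_model_prefix):
    """Toy prefix reconciliation: if a checkpoint was saved from the bare
    base model but is loaded into a wrapper (or vice versa), add or strip
    `base_model_prefix` so the keys line up with the target model."""
    prefix = base_model_prefix + "."
    out = {}
    for key, value in checkpoint.items():
        if key in model_keys:                  # exact match, nothing to do
            out[key] = value
        elif prefix + key in model_keys:       # checkpoint lacks the prefix
            out[prefix + key] = value
        elif key.startswith(prefix) and key[len(prefix):] in model_keys:
            out[key[len(prefix):]] = value     # checkpoint has an extra prefix
    return out

# A checkpoint saved from the bare transformer still loads into the wrapper:
loaded = reconcile_prefix({"vision_model.encoder.weight"}, {"encoder.weight": 1}, "vision_model")
```

This is why deleting a wrapper that only forwards to the inner transformer can stay load-compatible, as long as the prefix bookkeeping matches.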
@vasqu this is ready and modular now. The CI will be red for a while for unrelated reasons.
vasqu
left a comment

The core is super solid, I'm just nitpicky as always. I think there are a few valid comments, but most are definitely not major.
Looks like a bad rebase 👀

Which is why I love rebasing in GitHub haha, will force-squash all commits then.
Force-pushed from 4f60399 to 56a76c2.
    # Copyright 2022 WenXiang ZhongzhiCheng LedellWu LiuGuang BoWenZhang and The HuggingFace Inc. team. All rights reserved.
    #

Model and config had different headers. I guess BAAI is the right one, since the models are released under BAAI.
run-slow: clip, clipseg, chinese_clip, altclip, siglip, siglip2, metaclip_2

This comment contains models: ["models/altclip", "models/chinese_clip", "models/clip", "models/clipseg", "models/metaclip_2", "models/siglip", "models/siglip2"]

CI Results · Commit Info

Model CI Report: ❌ 9 new failed tests from this PR 😭
run-slow: altclip, chinese_clip, clip, clipseg, metaclip_2, siglip, siglip2, x_clip

This comment contains models: ["models/altclip", "models/chinese_clip", "models/clip", "models/clipseg", "models/metaclip_2", "models/siglip", "models/siglip2", "models/x_clip"]

CI Results · Commit Info

The test failure analysis could not be completed. Please check the workflow run for details.

Slow tests are passing, checked with the latest nightly.
[For maintainers] Suggested jobs to run (before merge): run-slow: aimv2, altclip, blip, chinese_clip, clip, clipseg, clvp, git, groupvit, idefics, kosmos2, metaclip_2, mlcd, owlv2, owlvit, siglip
vasqu
left a comment

Flaky test only, so merging as it's quite an important PR to have.
@vladmandic pretty sure this will break sd.next

I'll check, thanks for the heads-up. But yes, most likely.

@zucchini-nlp @vasqu this breaks all existing checkpoints for clip and other models.

We absolutely cannot change the module graph without adding the necessary mappings or something.
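One common way to handle such mappings is a key-rename pass over the checkpoint's state dict at load time. A minimal sketch, assuming a hypothetical `prefix_map` (this is not the actual mapping mechanism in transformers):

```python
def remap_state_dict(state_dict, prefix_map):
    """Rename checkpoint keys whose dotted prefix changed when modules moved."""
    remapped = {}
    for key, value in state_dict.items():
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                key = new_prefix + key[len(old_prefix):]
                break  # apply at most one rename per key
        remapped[key] = value
    return remapped

# Old checkpoints nested the text tower one level deeper:
old_sd = {"text_model.text_model.embeddings.weight": 0}
new_sd = remap_state_dict(old_sd, {"text_model.text_model.": "text_model."})
```

With such a map shipped alongside the refactor, old checkpoints keep loading into the flattened module graph.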
FYI Cyril: I am aware and working on it (got swayed by something else), so we don't have duplicate PRs.
* squash!
* fix copies for losses
* reverse mapping test
* xclip
* fxi repo
* altclip needs eager attn to pass the test?! and ofc non-causal masl
* final fix repo
* layernorm typo / docsting
* clips can't agree on causality
* ugh, skip xclip
* fix repo
* comments
* and more
* fixing stuff
* i didn't push it yesterday?
* style
* break out of infinte dependency loop
* fxi repo and hopefully merge today
* again
What does this PR do?
Re-opening a PR on cleaning up CLIP-like models' backbones. Let's merge it now; I've been seeing quite a lot of people reporting it, and I am not sure when it would be resolved by the big vision refactor.

Basically, it just removes unused modules and gets rid of the nested `text_model.text_model`. That allows `capture_outputs` to work easily without patching.
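The nesting removal can be sketched with plain classes and a toy analogue of dotted-path parameter naming (none of this is the actual transformers code; the classes and `named_leaves` helper are made up for illustration):

```python
class TextTransformer:
    def __init__(self):
        self.embeddings = "embeddings"  # stand-in for real parameters

# Before: a wrapper re-attaches the transformer under another `text_model`,
# so paths come out as "text_model.text_model...".
class NestedWrapper:
    def __init__(self):
        self.text_model = TextTransformer()

class OldClipLike:
    def __init__(self):
        self.text_model = NestedWrapper()

# After: the transformer hangs directly off the top-level model.
class NewClipLike:
    def __init__(self):
        self.text_model = TextTransformer()

def named_leaves(obj, prefix=""):
    """Walk attributes the way named_parameters walks modules, yielding dotted paths."""
    for name, value in vars(obj).items():
        path = f"{prefix}{name}"
        if hasattr(value, "__dict__") and vars(value):
            yield from named_leaves(value, path + ".")
        else:
            yield path

old_paths = list(named_leaves(OldClipLike()))  # doubled "text_model" segment
new_paths = list(named_leaves(NewClipLike()))  # single, predictable prefix
```

Hook-based tools address submodules by dotted path, so a stable, un-nested path like `text_model.embeddings` is much easier to target than one with a doubled `text_model` segment.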