Refactor CLIP-like models #44431

Merged
vasqu merged 26 commits into huggingface:main from zucchini-nlp:siglip-architecture
Apr 9, 2026

@zucchini-nlp
Member

What does this PR do?

Re-opening a PR that cleans up the backbones of CLIP-like models. Let's merge it now: I've been seeing quite a lot of people reporting it, and I'm not sure when it will be resolved by the big vision refactor.

Basically, it just removes unused modules and gets rid of the nested `text_model.text_model`. That allows `capture_outputs` to work easily without patching.
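
To make the nesting concrete, here is a minimal sketch in plain Python, not the actual transformers code. `ToyModule` is a hypothetical stand-in for `torch.nn.Module`, and every class name below is invented for illustration. It only shows that a wrapper holding an inner `text_model` produces `text_model.text_model.*` parameter keys, while the flattened layout produces `text_model.*`; which real classes used which layout is not something this sketch claims.

```python
class ToyModule:
    """Hypothetical stand-in for torch.nn.Module: tracks children and parameter names."""

    def __init__(self, param_names=()):
        self.param_names = list(param_names)

    def children(self):
        # Any attribute that is itself a ToyModule counts as a submodule.
        return {k: v for k, v in vars(self).items() if isinstance(v, ToyModule)}

    def parameter_keys(self, prefix=""):
        # Own parameters first, then recurse into submodules with a dotted prefix.
        keys = [prefix + name for name in self.param_names]
        for child_name, child in self.children().items():
            keys += child.parameter_keys(prefix + child_name + ".")
        return keys


class ToyTextTransformer(ToyModule):
    def __init__(self):
        super().__init__(["embeddings.weight", "final_layer_norm.weight"])


class ToyTextModelWrapper(ToyModule):
    """Nested layout: a wrapper whose only job is to hold the real transformer."""

    def __init__(self):
        super().__init__()
        self.text_model = ToyTextTransformer()


class NestedToyCLIP(ToyModule):
    def __init__(self):
        super().__init__()
        self.text_model = ToyTextModelWrapper()  # keys start with text_model.text_model.


class FlatToyCLIP(ToyModule):
    def __init__(self):
        super().__init__()
        self.text_model = ToyTextTransformer()  # keys start with text_model.


nested_keys = NestedToyCLIP().parameter_keys()
flat_keys = FlatToyCLIP().parameter_keys()
print(nested_keys)
print(flat_keys)
```

The two printed lists differ only in the duplicated `text_model.` segment, which is the part of the module path that any hook or output-capturing utility would otherwise have to special-case.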

@zucchini-nlp zucchini-nlp requested a review from vasqu March 4, 2026 10:02
@vasqu
Contributor

vasqu commented Mar 4, 2026

Let me wrap up #43590 first but very much needed 🙏 I've seen similar stuff not only on clip tbh but can't remember where

@yonigozlan
Member

Thanks for reopening @zucchini-nlp ! Sorry I couldn't get to it first, happy to review if needed!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member Author

run-slow: altclip, chinese_clip, clip, clipseg, kosmos2, metaclip_2, siglip, siglip2

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/altclip", "models/chinese_clip", "models/clip", "models/clipseg", "models/kosmos2", "models/metaclip_2", "models/siglip", "models/siglip2"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 75c3b9d4 | workflow commit (merge commit) |
| PR | e77ab154 | branch commit (from PR) |
| main | 2548d0db | base commit (on main) |

Model CI Report

3 new failed tests from this PR 😭

  • altclip:
    tests/models/altclip/test_modeling_altclip.py::AltCLIPModelTest::test_batching_equivalence (❌ ⟹ ❌)

  • chinese_clip:
    tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPModelTest::test_batching_equivalence (❌ ⟹ ❌)

  • clip:
    tests/models/clip/test_modeling_clip.py::CLIPModelTest::test_batching_equivalence (❌ ⟹ ❌)

Contributor

@vasqu vasqu left a comment

Some initial comments, just to make sure: We do not need any conversions?

Other than these, super nice work. We need this so much

Comment thread src/transformers/models/altclip/modeling_altclip.py Outdated
Comment thread src/transformers/models/altclip/modeling_altclip.py
Comment thread src/transformers/models/altclip/modeling_altclip.py
Comment thread src/transformers/models/chinese_clip/modeling_chinese_clip.py Outdated
Comment thread src/transformers/models/clip/modeling_clip.py Outdated
Comment thread src/transformers/models/metaclip_2/modular_metaclip_2.py
Comment thread tests/test_modeling_common.py
Comment thread utils/check_repo.py
Comment on lines -180 to +203
- f"`text_config_dict` is provided which will be used to initialize `AltCLIPTextConfig`. The "
+ f"`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The "
Member Author

i don't like that modular didn't change this

Comment thread src/transformers/models/clip/modeling_clip.py
@auto_docstring
class Aimv2Model(Aimv2PreTrainedModel):
    config: Aimv2Config
    _no_split_modules = ["Aimv2TextEmbeddings", "Aimv2EncoderLayer", "Aimv2VisionEmbeddings"]
Member Author

@zucchini-nlp zucchini-nlp Mar 17, 2026

already defined in Aimv2PreTrainedModel and cleaned thanks to modular

)


class MetaClip2TextTransformer(MetaClip2PreTrainedModel):
Member Author

This was a `PreTrainedModel`, but it had no top-level import available. I don't know if anyone was importing and using it, so this may be breaking.

Do we want to keep backward compatibility (BC) for this model?

Comment on lines -520 to -522
        super().__init__(config)
        self.vision_model = MLCDVisionTransformer(config)
        # Initialize weights and apply final processing
Member Author

With the base model prefix, the model should still be loadable. Deleting this because `MLCDVisionModel` does nothing except wrap `MLCDVisionTransformer`.
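
As a rough illustration of why a prefix can keep old checkpoints loadable after a wrapper is removed, here is a simplified sketch in plain Python. It is not the transformers loading logic: `align_keys` is a hypothetical helper, and using `"vision_model"` as the prefix for this model is an assumption made only for the example.

```python
def align_keys(checkpoint_keys, expected_keys, base_model_prefix):
    """Map checkpoint keys onto the keys a model expects, tolerating a missing
    or extra `<base_model_prefix>.` at the front (simplified illustration)."""
    expected = set(expected_keys)
    dotted = base_model_prefix + "."
    mapping = {}
    for key in checkpoint_keys:
        if key in expected:
            mapping[key] = key  # already matches
        elif key.startswith(dotted) and key[len(dotted):] in expected:
            mapping[key] = key[len(dotted):]  # checkpoint has the prefix, model does not
        elif dotted + key in expected:
            mapping[key] = dotted + key  # model has the prefix, checkpoint does not
        else:
            mapping[key] = None  # unexpected key; a real loader would report it
    return mapping


# Checkpoint saved by a wrapper that stored the transformer under `vision_model`
# (assumed prefix, for illustration only).
wrapper_checkpoint = ["vision_model.embeddings.weight", "vision_model.encoder.weight"]
# The bare transformer, loaded directly, expects un-prefixed keys.
bare_expected = ["embeddings.weight", "encoder.weight"]

result = align_keys(wrapper_checkpoint, bare_expected, base_model_prefix="vision_model")
print(result)
```

Under these assumptions every wrapper-style key resolves to its un-prefixed counterpart, so removing the thin wrapper class does not by itself orphan the saved weights.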

@zucchini-nlp zucchini-nlp requested a review from vasqu March 17, 2026 13:13
@zucchini-nlp
Member Author

@vasqu this is ready and modular now. The CI will be red for a while for unrelated reasons

Contributor

@vasqu vasqu left a comment

The core is super solid, I'm just nitpicky as always. I think there are a few valid comments but most are definitely not major

Comment thread src/transformers/models/altclip/modular_altclip.py
Comment thread src/transformers/models/altclip/modular_altclip.py Outdated
Comment thread src/transformers/models/altclip/modular_altclip.py Outdated
Comment thread src/transformers/models/altclip/modular_altclip.py Outdated
Comment thread src/transformers/models/altclip/modular_altclip.py Outdated
Comment thread src/transformers/models/siglip2/modular_siglip2.py
Comment thread tests/models/altclip/test_modeling_altclip.py
Comment thread tests/models/gemma3n/test_modeling_gemma3n.py
Comment thread tests/models/sam3/test_modeling_sam3.py
Comment thread tests/models/siglip2/test_modeling_siglip2.py
@vasqu
Contributor

vasqu commented Mar 27, 2026

Looks like a bad rebase 👀

@zucchini-nlp
Member Author

Which is why I love rebasing in github haha, will force squash all commits then

Comment on lines -1 to -2
# Copyright 2022 WenXiang ZhongzhiCheng LedellWu LiuGuang BoWenZhang and The HuggingFace Inc. team. All rights reserved.
#
Member Author

The model and config files had different headers. I guess BAAI is the right one, since the models are released under BAAI.

@zucchini-nlp
Member Author

run-slow: clip, clipseg, chinese_clip, altclip, siglip, siglip2, metaclip_2

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/altclip", "models/chinese_clip", "models/clip", "models/clipseg", "models/metaclip_2", "models/siglip", "models/siglip2"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | c53ec6d5 | workflow commit (merge commit) |
| PR | 6a5c1e8c | branch commit (from PR) |
| main | 689f52ce | base commit (on main) |

Model CI Report

9 new failed tests from this PR 😭

  • altclip:
    tests/models/altclip/test_modeling_altclip.py::AltCLIPModelIntegrationTest::test_inference (✅ ⟹ ❌)

  • chinese_clip:
    tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPTextModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels (✅ ⟹ ❌)
    tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPTextModelTest::test_eager_matches_sdpa_inference_05_fp16_pad_right (✅ ⟹ ❌)
    tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPModelTest::test_batching_equivalence (❌ ⟹ ❌)
    tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPModelIntegrationTest::test_inference (✅ ⟹ ❌)

  • clip:
    tests/models/clip/test_modeling_clip.py::CLIPModelTest::test_batching_equivalence (❌ ⟹ ❌)

  • clipseg:
    tests/models/clipseg/test_modeling_clipseg.py::CLIPSegModelTest::test_batching_equivalence (❌ ⟹ ❌)
    tests/models/clipseg/test_modeling_clipseg.py::CLIPSegModelTest::test_sdpa_can_compile_dynamic (✅ ⟹ ❌)

  • metaclip_2:
    tests/models/metaclip_2/test_modeling_metaclip_2.py::MetaClip2ModelTest::test_batching_equivalence (❌ ⟹ ❌)

@zucchini-nlp
Member Author

run-slow: altclip, chinese_clip, clip, clipseg, metaclip_2, siglip, siglip2, x_clip

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/altclip", "models/chinese_clip", "models/clip", "models/clipseg", "models/metaclip_2", "models/siglip", "models/siglip2", "models/x_clip"]
quantizations: []

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | b480eb10 | workflow commit (merge commit) |
| PR | 175e7eff | branch commit (from PR) |
| main | abc417a4 | base commit (on main) |

⚠️ Model CI failed to report results

The test failure analysis could not be completed. Please check the workflow run for details.

@zucchini-nlp
Member Author

Slow tests are passing, checked with latest nightly

@zucchini-nlp zucchini-nlp enabled auto-merge April 2, 2026 14:59
@github-actions
Contributor

github-actions Bot commented Apr 9, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, altclip, blip, chinese_clip, clip, clipseg, clvp, git, groupvit, idefics, kosmos2, metaclip_2, mlcd, owlv2, owlvit, siglip

Contributor

@vasqu vasqu left a comment

Flaky test only, so merging as it's quite an important PR to have

@vasqu vasqu disabled auto-merge April 9, 2026 14:22
@vasqu vasqu merged commit 353a9dc into huggingface:main Apr 9, 2026
25 of 28 checks passed
@biship

biship commented Apr 10, 2026

@vladmandic pretty sure this will break sd.next

@vladmandic

vladmandic commented Apr 10, 2026

> @vladmandic pretty sure this will break sd.next

i'll check, thanks for heads-up. but yes, most likely.

@Cyrilvallez
Member

@zucchini-nlp @vasqu this breaks all existing checkpoints for clip and other models

@Cyrilvallez
Member

We absolutely cannot change the module graph without adding the necessary mappings or something.
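
For readers wondering what such a mapping could look like, below is a minimal sketch in plain Python. It is not the conversion mechanism transformers uses: `OLD_TO_NEW` and `rename_keys` are hypothetical names, and the two patterns are assumptions based only on the `text_model.text_model` nesting described in this PR.

```python
import re

# Hypothetical rename rules (old checkpoint layout -> new module layout).
OLD_TO_NEW = [
    (r"^text_model\.text_model\.", "text_model."),
    (r"^vision_model\.vision_model\.", "vision_model."),
]


def rename_keys(state_dict, rules):
    """Return a new dict with each key rewritten by the first matching rule."""
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in rules:
            new_key, count = re.subn(pattern, replacement, key)
            if count:
                break  # first matching rule wins
        if new_key in renamed:
            # Two old keys collapsing onto one new key would silently drop weights.
            raise ValueError(f"rename collision on {new_key!r}")
        renamed[new_key] = value
    return renamed


# Toy checkpoint: integers stand in for tensors.
old_checkpoint = {
    "text_model.text_model.embeddings.weight": 1,
    "vision_model.vision_model.embeddings.weight": 2,
    "logit_scale": 3,
}
new_state = rename_keys(old_checkpoint, OLD_TO_NEW)
print(sorted(new_state))
```

Keys that match no rule (here `logit_scale`) pass through unchanged, and the collision check guards against two old keys mapping to the same new name.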

@zucchini-nlp
Member Author

FYI Cyril: I am aware and working on it (got sidetracked by something else), so we don't have duplicate PRs

sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
* squash!

* fix copies for losses

* reverse mapping test

* xclip

* fxi repo

* altclip needs eager attn to pass the test?! and ofc non-causal masl

* final fix repo

* layernorm typo / docsting

* clips can't agree on causality

* ugh, skip xclip

* fix repo

* comments

* and more

* fixing stuff

* i didn't push it yesterday?

* style

* break out of infinte dependency loop

* fxi repo and hopefully merge today

* again

7 participants