Simplify `get_*_features` methods + update docs by qubvel · Pull Request #40555 · huggingface/transformers

qubvel · 2025-08-29T17:31:12Z

What does this PR do?

As per the title, this PR simplifies the `get_*_features` methods by removing the `output_*` arguments, which make no sense for these methods.

As a bonus, the snippets are updated to use the `load_image` function instead of PIL + requests.

HuggingFaceDocBuilderDev · 2025-08-29T17:45:46Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

SangbumChoi · 2025-08-30T13:16:40Z

Just saw this after writing #40563 (not perfectly aligned with this PR).
Would love to see this idea merged.

zucchini-nlp

Thanks, much cleaner this way! I think we would still need to allow kwargs to be backward compatible

Also I would like a less breaking change on BLIP with a small deprecation cycle, since its users might be relying on get_xx_feats to get the whole output dict

zucchini-nlp · 2025-09-01T09:04:52Z


        self.post_init()

+    @filter_out_non_signature_kwargs()


Doesn't this require a function that accepts **kwargs? Otherwise it will be a breaking change for users who pass output_attentions/output_hidden_states

This is the idea of filter_out_non_signature_kwargs: to avoid **kwargs. Even if the output_attentions argument is passed, it is filtered out and a warning is issued.
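The filtering behaviour described in this reply can be sketched in a few lines. This is only an illustration, not the library's implementation: `drop_unknown_kwargs` and the toy `get_image_features` below are hypothetical names, and the warning text is made up for the example.

```python
import functools
import inspect
import warnings


def drop_unknown_kwargs(func):
    """Hypothetical stand-in: drop keyword arguments that `func` does not declare."""
    accepted = set(inspect.signature(func).parameters)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        unknown = [name for name in kwargs if name not in accepted]
        if unknown:
            # Warn instead of raising, so old call sites keep working.
            warnings.warn(
                f"{func.__name__} ignores unexpected keyword arguments: {unknown}",
                UserWarning,
                stacklevel=2,
            )
        kept = {name: value for name, value in kwargs.items() if name in accepted}
        return func(*args, **kept)

    return wrapper


@drop_unknown_kwargs
def get_image_features(pixel_values):
    # Toy body (assumption): a real model would run its vision tower here.
    return [2 * value for value in pixel_values]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The removed argument is still accepted at the call site, then filtered out.
    features = get_image_features(pixel_values=[1, 2], output_attentions=True)
```

With a wrapper like this, the method signature stays free of `**kwargs`, while a caller that still passes `output_attentions` gets a warning rather than a `TypeError`.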

zucchini-nlp · 2025-09-01T09:13:09Z

-        pixel_values: Optional[torch.FloatTensor] = None,
-        output_attentions: Optional[bool] = None,
-        output_hidden_states: Optional[bool] = None,
+        pixel_values: torch.FloatTensor,


same here, might need to use kwargs

zucchini-nlp · 2025-09-01T09:15:10Z

-        image_features = vision_outputs[1]  # pooled_output
-
+        vision_outputs = self.vision_model(pixel_values=pixel_values)
+        image_features = vision_outputs.pooler_output


should we pass return_dict=True explicitly or is it guaranteed that the output will be dict?

I suppose get_image_features is a relatively rare use case, and return_dict=False within the config is also a rare use case. Therefore, the combination to catch the error is extremely rare, if it even exists anywhere. I would not overwhelm the code with explicit return_dict=True everywhere, but I might be wrong
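The edge case under discussion can be illustrated with a toy stand-in. This is a sketch under assumptions: `ToyVisionOutput` and `toy_vision_model` are hypothetical and only mimic the dict-style versus tuple-style return convention; they are not the library's classes.

```python
from dataclasses import dataclass


@dataclass
class ToyVisionOutput:
    last_hidden_state: list
    pooler_output: list


def toy_vision_model(pixel_values, return_dict=True):
    # Toy computation (assumption): the "pooled" output is just the mean.
    hidden = [float(value) for value in pixel_values]
    pooled = [sum(hidden) / len(hidden)]
    if return_dict:
        return ToyVisionOutput(last_hidden_state=hidden, pooler_output=pooled)
    # Tuple-style output: fields are only reachable by position.
    return (hidden, pooled)


dict_style = toy_vision_model([1, 2, 3])
tuple_style = toy_vision_model([1, 2, 3], return_dict=False)

pooled_by_name = dict_style.pooler_output   # attribute access, as in the new code
pooled_by_index = tuple_style[1]            # positional access, as in the old code
tuple_has_attribute = hasattr(tuple_style, "pooler_output")
```

Attribute access would fail only in the tuple-style case, which is the rare combination the reply refers to.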

zucchini-nlp · 2025-09-01T09:17:40Z

            )

-        return text_outputs
+        return text_outputs.logits


kind of breaking, as it will not return the whole text output dict after this. BLIP is still used commonly in certain cases, so I would prefer to not break it

ok, fixed in f8b6854

zucchini-nlp · 2025-09-01T09:18:33Z

+        vision_features = vision_outputs.pooler_output

-        return vision_outputs
+        return vision_features


zucchini-nlp · 2025-09-01T09:30:15Z


        return pooled_output

-    @auto_docstring


Is deleting auto_docstring intended? I think it removes pixel_values from the docs

ahh, that's a modular trick! good catch, thanks

fixed in 7ff87f8

github-actions · 2025-09-01T10:10:40Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, align, altclip, blip_2, chinese_clip, clap, clip, clipseg, flava, groupvit, metaclip_2, owlv2, owlvit, siglip, siglip2, vision_text_dual_encoder

zucchini-nlp

Thanks!

qubvel added 18 commits August 29, 2025 15:25

siglip

b9af202

clip

00802b6

aimv2

947b93e

metaclip_2

5c0bf30

align

47ca558

align fixup

e389bb5

altclip

afc81a7

blip2 (make consistent)

b5060f4

chineese clip

7ffbb9f

clipseg

b117f50

flava

6d754d2

groupvit

11a54c2

owlv2

699725a

owlvit

731ef17

vision_encoder

8b4ca27

clap

053bf6a

x_clip

e51ec01

fixup

60200d6

qubvel requested a review from zucchini-nlp August 29, 2025 17:38

qubvel marked this pull request as ready for review August 29, 2025 17:39

zucchini-nlp reviewed Sep 1, 2025


qubvel added 2 commits September 1, 2025 09:50

fix siglip2

7ff87f8

blip2

f8b6854

qubvel added 2 commits September 1, 2025 10:13

fix blip2 tests (revert to original)

fa882ae

fix docs

0d5311c

qubvel requested a review from zucchini-nlp September 1, 2025 10:21

zucchini-nlp approved these changes Sep 1, 2025


qubvel merged commit 2537ed4 into huggingface:main Sep 1, 2025
20 checks passed

JiangJQ2000 mentioned this pull request Nov 24, 2025

Fix ChineseCLIPModel.get_text_features #42351

Merged

5 tasks

4 participants

qubvel commented Aug 29, 2025 •

edited

Loading

SangbumChoi commented Aug 30, 2025 •

edited

Loading

qubvel Sep 1, 2025 •

edited

Loading

qubvel Sep 1, 2025 •

edited

Loading