Skip to content

Simplify get_*_features methods + update docs#40555

Merged
qubvel merged 22 commits intohuggingface:mainfrom
qubvel:fix-get-features
Sep 1, 2025
Merged

Simplify get_*_features methods + update docs#40555
qubvel merged 22 commits intohuggingface:mainfrom
qubvel:fix-get-features

Conversation

@qubvel
Copy link
Copy Markdown
Contributor

@qubvel qubvel commented Aug 29, 2025

What does this PR do?

As per the title, unbloating the following methods to remove output_* arguments that have no sense for these methods.

As a bonus: updating snippets to use load_image function instead of PIL + requests

@qubvel qubvel requested a review from zucchini-nlp August 29, 2025 17:38
@qubvel qubvel marked this pull request as ready for review August 29, 2025 17:39
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SangbumChoi
Copy link
Copy Markdown
Contributor

SangbumChoi commented Aug 30, 2025

Just saw this after writing this #40563 (Not perfectly aligned with this PR)
Love this idea to be merged.

Copy link
Copy Markdown
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, much cleaner this way! I think we would still need to allow kwargs to be backward compatible

Also I would like a less breaking change on BLIP with a small deprecation cycle, since its users might be relying on get_xx_feats to get the whole output dict


self.post_init()

@filter_out_non_signature_kwargs()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it not require a function that accepts **kwargs? Otherwise it will be a breaking change for users to pass output_attentions/output_hidden_states

Copy link
Copy Markdown
Contributor Author

@qubvel qubvel Sep 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the idea of filter_out_non_signature_kwargs - to avoid **kwargs. Even if output_attentions arg is passed it would be filtered out and the warning will be issued

pixel_values: Optional[torch.FloatTensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
pixel_values: torch.FloatTensor,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, might need to use kwargs

image_features = vision_outputs[1] # pooled_output

vision_outputs = self.vision_model(pixel_values=pixel_values)
image_features = vision_outputs.pooler_output
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we pass return_dict=True explicitly or is it guaranteed that the output will be dict?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose get_image_features is a relatively rare use case, and return_dict=False within the config is also a rare use case. Therefore, the combination to catch the error is extremely rare, if it even exists anywhere. I would not overwhelm the code with explicit return_dict=True everywhere, but I might be wrong

)

return text_outputs
return text_outputs.logits
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind of breaking, as it will not return the whole text output dict after this. BLIP is still used commonly in certain cases, so I would prefer to not break it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fixed in f8b6854

vision_features = vision_outputs.pooler_output

return vision_outputs
return vision_features
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here


return pooled_output

@auto_docstring
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is deleting auto_docstring intended, i think it removes pixel_values from docs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, that's a modular trick! good catch, thanks

Copy link
Copy Markdown
Contributor Author

@qubvel qubvel Sep 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 7ff87f8

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Sep 1, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, align, altclip, blip_2, chinese_clip, clap, clip, clipseg, flava, groupvit, metaclip_2, owlv2, owlvit, siglip, siglip2, vision_text_dual_encoder

@qubvel qubvel requested a review from zucchini-nlp September 1, 2025 10:21
Copy link
Copy Markdown
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@qubvel qubvel merged commit 2537ed4 into huggingface:main Sep 1, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants