Simplify get_*_features methods + update docs #40555
qubvel merged 22 commits into huggingface:main
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Just saw this after writing #40563 (not perfectly aligned with this PR).
zucchini-nlp left a comment
Thanks, much cleaner this way! I think we would still need to allow kwargs to stay backward compatible.
Also, I would like a less breaking change on BLIP, with a small deprecation cycle, since its users might be relying on get_xx_feats to get the whole output dict.
  self.post_init()

+ @filter_out_non_signature_kwargs()
Does it not require a function that accepts **kwargs? Otherwise it will be a breaking change for users who pass output_attentions/output_hidden_states.
This is the idea of filter_out_non_signature_kwargs: to avoid **kwargs. Even if the output_attentions arg is passed, it is filtered out and a warning is issued.
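To make the behaviour described in this reply concrete, here is a minimal sketch of a decorator that drops keyword arguments the wrapped function does not declare and warns about them. It is not the transformers implementation; the name `filter_unknown_kwargs` and the toy `get_image_features` body are assumptions made up for illustration.

```python
import functools
import inspect
import warnings


def filter_unknown_kwargs(func):
    """Hypothetical stand-in: drop keyword arguments that `func` does not declare."""
    accepted = set(inspect.signature(func).parameters)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        unknown = [name for name in kwargs if name not in accepted]
        if unknown:
            # Warn instead of raising, so old call sites keep working.
            warnings.warn(
                f"{func.__name__} ignores unsupported arguments: {unknown}",
                UserWarning,
                stacklevel=2,
            )
        kept = {name: value for name, value in kwargs.items() if name in accepted}
        return func(*args, **kept)

    return wrapper


@filter_unknown_kwargs
def get_image_features(pixel_values):
    # Toy body; a real model would run its vision tower here.
    return [2 * value for value in pixel_values]
```

With this shape, a caller that still passes `output_attentions=True` gets a warning rather than a `TypeError`, which is the backward-compatibility point raised in the question above.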
- pixel_values: Optional[torch.FloatTensor] = None,
- output_attentions: Optional[bool] = None,
- output_hidden_states: Optional[bool] = None,
+ pixel_values: torch.FloatTensor,
Same here, might need to use kwargs.
- image_features = vision_outputs[1]  # pooled_output

+ vision_outputs = self.vision_model(pixel_values=pixel_values)
+ image_features = vision_outputs.pooler_output
Should we pass return_dict=True explicitly, or is it guaranteed that the output will be a dict?
I suppose get_image_features is a relatively rare use case, and return_dict=False in the config is also rare. The combination needed to hit the error is therefore extremely rare, if it exists anywhere at all. I would not clutter the code with an explicit return_dict=True everywhere, but I might be wrong.
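The concern in this exchange can be shown with a toy model: when a forward call returns a plain tuple, attribute access such as `.pooler_output` is not available and only index access works. The classes and functions below are hypothetical stand-ins, not transformers code.

```python
from dataclasses import dataclass


@dataclass
class ToyVisionOutput:
    """Hypothetical output container with named fields."""
    last_hidden_state: list
    pooler_output: list

    def to_tuple(self):
        return (self.last_hidden_state, self.pooler_output)


def toy_vision_model(pixel_values, return_dict=True):
    # Stand-in for a vision tower; the arithmetic is arbitrary.
    hidden = [value + 1 for value in pixel_values]
    pooled = [sum(hidden)]
    output = ToyVisionOutput(last_hidden_state=hidden, pooler_output=pooled)
    return output if return_dict else output.to_tuple()


as_object = toy_vision_model([1, 2, 3], return_dict=True)
as_tuple = toy_vision_model([1, 2, 3], return_dict=False)

pooled_by_name = as_object.pooler_output   # works on the named output
pooled_by_index = as_tuple[1]              # the only option on a tuple
tuple_has_attribute = hasattr(as_tuple, "pooler_output")
```

Passing `return_dict=True` at the call site would remove the dependency on the config, at the cost of the extra argument the author prefers to avoid.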
| ) | ||
|
|
||
| return text_outputs | ||
| return text_outputs.logits |
There was a problem hiding this comment.
kind of breaking, as it will not return the whole text output dict after this. BLIP is still used commonly in certain cases, so I would prefer to not break it
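One way a small deprecation cycle could look is sketched below: the old return value stays the default for now, with a warning, while callers can opt in to the new one. The `legacy_output` flag, the warning text, and the toy outputs are all assumptions invented for this sketch; nothing here is taken from the PR.

```python
import warnings


def get_text_features(input_ids, legacy_output=None):
    # `legacy_output` is a hypothetical flag, not a real transformers argument.
    outputs = {"logits": [0.5 * token for token in input_ids], "hidden_states": None}

    if legacy_output is None:
        # Default call: keep the old behaviour, but announce the change.
        warnings.warn(
            "get_text_features will return only the features in a future "
            "release; pass legacy_output=False to opt in now.",
            FutureWarning,
            stacklevel=2,
        )
        legacy_output = True

    return outputs if legacy_output else outputs["logits"]
```

Existing code that reads the full output dict keeps working during the cycle, and the default can be flipped once the warning has been out for a release or two.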
+ vision_features = vision_outputs.pooler_output

- return vision_outputs
+ return vision_features

  return pooled_output

- @auto_docstring
Is deleting auto_docstring intended? I think it removes pixel_values from the docs.
Ahh, that's a modular trick! Good catch, thanks.
[For maintainers] Suggested jobs to run (before merge): run-slow: aimv2, align, altclip, blip_2, chinese_clip, clap, clip, clipseg, flava, groupvit, metaclip_2, owlv2, owlvit, siglip, siglip2, vision_text_dual_encoder
What does this PR do?
As per the title, this unbloats the get_*_features methods by removing the output_* arguments, which make no sense for these methods.
As a bonus, it updates the snippets to use the load_image function instead of PIL + requests.