Support MetaCLIP 2 #39821
Conversation

[For maintainers] Suggested jobs to run (before merge): run-slow: clip

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
last_hidden_state = self.final_layer_norm(last_hidden_state)
```

```python
if self.eos_token_id == 2:
    if torch.all(input_ids.max(dim=-1).values == 49407).item():
```
I can't be 100% sure whether there are CLIP models on the Hub with `self.eos_token_id == 2` whose tokenizer's EOS token is not 49407, or which have already added new tokens.
Also, 49407 is hardcoded.
Should we start raising a warning and deprecate an incorrect `eos`, so this code block can be removed in the future? It's super hacky right now.
That would require updating the configs of the OpenAI CLIP models, as well as models like FashionCLIP, BioCLIP, etc. (probably thousands of models).
As discussed internally with Arthur, let's not touch this part of the code and simply use modular.
@NielsRogge suggests running some tests to avoid this kind of situation in the future.
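For context on why the hack above works at all: a minimal sketch (in plain Python, with illustrative names — not the actual `transformers` implementation) of the two EOS-pooling paths under discussion. The legacy path relies on 49407 being the largest token ID in CLIP's vocabulary, so an argmax over the token IDs lands on the EOS position even though the config wrongly says `eos_token_id == 2`.

```python
LEGACY_EOS = 49407  # CLIP's EOS token, which is also the highest id in its vocab


def eos_position(input_ids, eos_token_id):
    """Return the index used to pool the text embedding for one sequence."""
    if eos_token_id == 2:
        # Legacy path: old configs wrongly set eos_token_id == 2, so the code
        # falls back to argmax over the raw token ids, which finds 49407
        # because it is the largest id in the vocabulary -- the hack this
        # thread is about.
        return max(range(len(input_ids)), key=lambda i: input_ids[i])
    # Fixed path (post-#24773): locate the configured EOS id directly
    # (first occurrence, matching argmax semantics).
    return input_ids.index(eos_token_id)


tokens = [49406, 320, 1125, 539, 320, 2368, 49407, 0, 0]  # <bos> ... <eos> pad
print(eos_position(tokens, 2))      # legacy config: argmax finds 49407 at 6
print(eos_position(tokens, 49407))  # corrected config: also 6
```

Both paths agree only as long as no added token gets an ID above 49407, which is exactly the fragility raised above.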
What does this PR do?
Meta just released MetaCLIP 2 (worldwide), new CLIP models trained on 300+ languages.
However, when making them compatible with `modeling_clip.py`, I noticed there's a mistake with the original OpenAI CLIP models: the `CLIPTokenizer` uses EOS token ID 49407, while the `CLIPTextConfig` wrongly set the EOS token ID to 2. This was fixed in #24773. For backwards compatibility, a block of code was kept, and it only runs in case a model has the wrongly set EOS token ID == 2.

cc @ydshieh
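To make the config/tokenizer mismatch concrete, here is a hypothetical sketch of the kind of consistency check that could catch this situation; the dicts stand in for real `CLIPTextConfig` / `CLIPTokenizer` objects, and the function name is illustrative.

```python
def check_eos_consistency(config, tokenizer):
    """Return True when the config's eos_token_id matches the tokenizer's."""
    return config["eos_token_id"] == tokenizer["eos_token_id"]


# The original OpenAI CLIP situation: tokenizer EOS is 49407, but the
# config (before #24773) wrongly said 2.
legacy_config = {"eos_token_id": 2}
fixed_config = {"eos_token_id": 49407}
tokenizer = {"eos_token_id": 49407}

print(check_eos_consistency(legacy_config, tokenizer))  # False -> would warn
print(check_eos_consistency(fixed_config, tokenizer))   # True
```

A test along these lines, run against real config/tokenizer pairs from the Hub, would flag the mismatch instead of silently relying on the argmax fallback.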