2 changes: 1 addition & 1 deletion src/transformers/models/clip/modeling_clip.py
@@ -634,7 +634,7 @@ def forward(
last_hidden_state = encoder_outputs.last_hidden_state
last_hidden_state = self.final_layer_norm(last_hidden_state)

-        if self.eos_token_id == 2:
+        if torch.all(input_ids.max(dim=-1).values == 49407).item():
Collaborator
Can't be 100% sure whether there are CLIP models on the Hub with `self.eos_token_id == 2` but a tokenizer whose eos != 49407, or models that have already added new tokens.

Collaborator

Also, 49407 is hardcoded.
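The hardcoded check from the diff can be reproduced in isolation to see what it actually tests. This is a standalone sketch (the function name is mine, not from `transformers`); 49407 is the eos token id of the original OpenAI CLIP tokenizer, hardcoded exactly as in the diff, which is what makes the check fragile once tokens with higher ids are added:

```python
import torch

def looks_like_original_clip_eos(input_ids: torch.Tensor) -> bool:
    # True when every sequence's largest token id equals 49407,
    # i.e. no added tokens with a higher id appear in the batch.
    return torch.all(input_ids.max(dim=-1).values == 49407).item()

# A padded CLIP-style batch: 49406 = bos, 49407 = eos/pad.
batch = torch.tensor([[49406, 320, 1125, 49407],
                      [49406, 2368, 49407, 49407]])
print(looks_like_original_clip_eos(batch))  # True

# One added token with id > 49407 flips the result for the whole batch.
batch_with_added_token = torch.tensor([[49406, 320, 49500, 49407]])
print(looks_like_original_clip_eos(batch_with_added_token))  # False
```

Note that the check is batch-global: a single sequence containing an added token changes the branch taken for every sequence in the batch.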

Member

Should we start raising a warning and deprecate the incorrect `eos`, so this code block can be removed in the future? Super hacky right now.

Collaborator Author

That would require updating the configs of the OpenAI CLIP models, as well as models like FashionCLIP, BioCLIP, etc. (probably thousands of models).

Collaborator

As discussed internally with Arthur, let's not touch this part of the code; just use modular.

@NielsRogge suggests running some tests to avoid this kind of situation in the future.

# The `eos_token_id` was incorrect before PR #24773: let's keep what has been done here.
# A CLIP model with such an `eos_token_id` in the config can't work correctly with extra new tokens added
# ------------------------------------------------------------
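The failure mode the comment above describes can be sketched directly. CLIP pools the text embedding from the hidden state at the eos position; the sketch below is illustrative only (names are mine, not the actual `transformers` implementation) and shows how an `eos_token_id` that never occurs in the sequence silently pools the wrong position:

```python
import torch

def pool_at_eos(last_hidden_state, input_ids, eos_token_id):
    # Take the hidden state at the first position equal to eos_token_id.
    # If eos_token_id never occurs, argmax over an all-False mask is 0,
    # so we silently pool the first (bos) position instead.
    eos_pos = (input_ids == eos_token_id).int().argmax(dim=-1)
    return last_hidden_state[torch.arange(input_ids.shape[0]), eos_pos]

hidden = torch.randn(1, 4, 8)
ids = torch.tensor([[49406, 320, 1125, 49407]])

correct = pool_at_eos(hidden, ids, 49407)  # pools position 3 (the real eos)
wrong = pool_at_eos(hidden, ids, 2)        # 2 is absent -> pools position 0
```

No error is raised in the wrong case, which is why a bad config value went unnoticed for so long: the model still produces an embedding, just from the wrong token.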