🚨 Generic Sequence Classifier works for multimodal models#44664
zucchini-nlp wants to merge 3 commits into huggingface:main from
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
class Qwen3_5ForSequenceClassification(GenericForSequenceClassification, Qwen3_5PreTrainedModel):
    config: Qwen3_5TextConfig
    config_class = AutoConfig
```
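To illustrate the idea behind the snippet above, here is a minimal, self-contained sketch (all class names besides `GenericForSequenceClassification` are illustrative, not the actual transformers API) of a generic classification head that works whether it is given a plain text config or a multimodal config that nests one under `text_config`:

```python
# Hypothetical sketch: a single generic class resolves the effective text
# config, so it can be loaded from an LLM config or a VLM config alike.

class TextConfig:
    """Stand-in for a plain LLM config."""
    def __init__(self, hidden_size, num_labels=2):
        self.hidden_size = hidden_size
        self.num_labels = num_labels

class MultimodalConfig:
    """Stand-in for a VLM config that nests the text config."""
    def __init__(self, text_config):
        self.text_config = text_config

class GenericForSequenceClassification:
    def __init__(self, config):
        # Multimodal configs nest the text config; plain configs are used as-is.
        text_config = getattr(config, "text_config", config)
        self.hidden_size = text_config.hidden_size
        self.num_labels = text_config.num_labels

# Works with a plain LLM config...
llm = GenericForSequenceClassification(TextConfig(hidden_size=64))
# ...and with a multimodal config wrapping the same text config.
vlm = GenericForSequenceClassification(MultimodalConfig(TextConfig(hidden_size=64)))
```

The `getattr(config, "text_config", config)` resolution is the key trick: the class signature stays the same, and only the config lookup branches on modality.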
The main change: allow a single class to be loaded as an LLM (text_config) or as a multimodal LM (VLM config).
Not sure if it aligns well with transformers, though.
Some initial comments. I have some doubts because the signature does not properly reflect VLMs. Not sure if it would make more sense to have a dedicated class instead.
Also, are these really the only ones that use it? A bit surprising, but oh well.
Force-pushed from 5968cdc to 19ab5fb
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, gemma3, qwen3_5
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44664&sha=277261
The failing test is unrelated and due to a bad regex, same as in #45313. The previous fix didn't cover all cases 😢
vasqu left a comment
Let's add 🚨 to the title; I hope not too many relied on the exact class name for Qwen 3.5 text here.
But I think this is the best solution for now. Sad that the override didn't work out 😢
Fixes #44625 and #44406 (comment)

After a long discussion, I guess this is the easiest and most straightforward way for us to use `Generic` classes with multimodals and keep their signature. Overloading didn't work as I expected 😓 And I am quite reluctant to use TypedDicts, as we would have a lot of unrelated args even after filtering by modality.

Working example:

This should work for ST
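For context on the rejected TypedDict alternative mentioned above, here is a hypothetical sketch (all names are illustrative, not transformers code) of why it is unattractive: even after filtering keys by modality, a per-model kwargs TypedDict still carries args that are irrelevant for any given model.

```python
# Hypothetical sketch of the rejected alternative: one TypedDict describing
# forward kwargs across modalities, filtered down at call time.
from typing import TypedDict

class MultimodalForwardKwargs(TypedDict, total=False):
    input_ids: list
    attention_mask: list
    pixel_values: list     # vision-only
    image_grid_thw: list   # vision-only
    input_features: list   # audio-only

def filter_by_modality(kwargs: dict, modality: str) -> dict:
    """Drop keys belonging to the other modality; text keys always pass."""
    vision_keys = {"pixel_values", "image_grid_thw"}
    audio_keys = {"input_features"}
    drop = audio_keys if modality == "vision" else vision_keys
    return {k: v for k, v in kwargs.items() if k not in drop}

filtered = filter_by_modality(
    {"input_ids": [1], "pixel_values": [2], "input_features": [3]},
    modality="vision",
)
```

Even after filtering, a vision model's signature would still advertise every vision key whether or not that particular architecture uses it, which is the "unrelated args" problem the description refers to.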