[qwen2-vl] fix vision attention scaling#39043
Conversation
run-slow: qwen2_vl,qwen2_5_vl,qwen2_5_omni
This comment contains run-slow, running the specified jobs: models: ['models/qwen2_5_omni', 'models/qwen2_5_vl', 'models/qwen2_vl']
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ydshieh left a comment
Thanks! Since the CI is green and the fix is so simple, looks great to me!
Ah, @zucchini-nlp, there is also … which is failing due to #38930, but it is also failing with this PR.
Previously at this block we had …, but after #38930 …
ArthurZucker left a comment
Much needed, thanks for catching this!
@ydshieh the ColQwen test is failing for me even before the Qwen2-VL vision refactor; maybe it started failing due to another PR?
I will check again (inside the CI runner, it's that PR causing the problem. Do you check locally or inside an SSH runner?)
Only locally for now
Hmm, it is passing now on main, and even on the commit of this merged PR. Either I made a mistake when checking back then, or something strange is going on. Sorry to bother you. We are ✅
When I say pass, I mean on an A10.
scale lost its `-` when refactoring
What does this PR do?
As per the title, after the refactor the scaling was accidentally changed from `1 / math.sqrt(head_dim)` to `math.sqrt(head_dim)`.
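For context, here is a minimal sketch of what the fix amounts to (the tensor shapes and variable names below are illustrative assumptions, not the actual transformers vision-attention code): scaled dot-product attention divides the query-key logits by sqrt(head_dim), i.e. multiplies by `head_dim ** -0.5`, so dropping the `-` in the exponent turns the intended division into a multiplication by sqrt(head_dim).

```python
import math
import torch

# Illustrative shapes/names only, not the exact transformers code.
head_dim = 64
q = torch.randn(1, 8, 16, head_dim)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, head_dim)

# Broken (after the refactor): the `-` was lost, so the logits get
# *multiplied* by sqrt(head_dim) instead of divided by it.
scale_broken = head_dim ** 0.5   # == math.sqrt(head_dim)

# Fixed: standard scaled dot-product attention divides by sqrt(head_dim).
scale_fixed = head_dim ** -0.5   # == 1 / math.sqrt(head_dim)

attn_logits = torch.matmul(q, k.transpose(-2, -1)) * scale_fixed
```

Because the wrong scale inflates the logits by a factor of head_dim, the softmax saturates and the vision attention outputs degrade, which is why the one-character fix matters.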