[qwen2-vl] fix vision attention scaling#39043
Conversation
run-slow: qwen2_vl,qwen2_5_vl,qwen2_5_omni
This comment contains run-slow, running the specified jobs: models: ['models/qwen2_5_omni', 'models/qwen2_5_vl', 'models/qwen2_vl']
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ydshieh left a comment
Thanks! Since the CI is green and the fix is so simple, looks great to me!
Ah, @zucchini-nlp, there is also … which is failing due to #38930, but it is also failing with this PR.
Previously at this block we had …, but after #38930 …
ArthurZucker left a comment
Much needed, thanks for catching this!
@ydshieh the ColQwen test is failing for me even before the Qwen2-VL vision refactor; maybe it started failing due to another PR?
I will check again (inside the CI runner, it's that PR causing the problem. Do you check locally or inside an SSH runner?)
Only locally for now
Hmm, it is passing now on main, and even on the commit of this merged PR. Either I made a mistake when checking back then, or something strange is going on. Sorry to bother you. We are ✅
When I say pass, I mean on an A10.
scale lost its `-` when refactoring
What does this PR do?
As per the title, after the refactor the scaling was accidentally changed from `1 / math.sqrt(head_dim)` to `math.sqrt(head_dim)`.
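For context, here is a minimal sketch of what the fix amounts to (the tensor shapes and variable names below are illustrative assumptions, not the actual transformers vision-attention code): scaled dot-product attention divides the query-key logits by sqrt(head_dim), i.e. multiplies by `head_dim ** -0.5`, so dropping the `-` in the exponent turns the intended division into a multiplication by sqrt(head_dim).

```python
import math
import torch

# Illustrative shapes/names only, not the exact transformers code.
head_dim = 64
q = torch.randn(1, 8, 16, head_dim)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, head_dim)

# Broken (after the refactor): the `-` was lost, so the logits get
# *multiplied* by sqrt(head_dim) instead of divided by it.
scale_broken = head_dim ** 0.5   # == math.sqrt(head_dim)

# Fixed: standard scaled dot-product attention divides by sqrt(head_dim).
scale_fixed = head_dim ** -0.5   # == 1 / math.sqrt(head_dim)

attn_logits = torch.matmul(q, k.transpose(-2, -1)) * scale_fixed
```

Because the wrong scale inflates the logits by a factor of head_dim, the softmax saturates and the vision attention outputs degrade, which is why the one-character fix matters.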