Revert text_positions in Qwen25VL #40177
pengzhenghao wants to merge 2 commits into huggingface:main
Conversation
|
Note that the reach of this issue might not be limited to Qwen2.5VL. But I am not familiar with the other models, so the only thing I do here is start the investigation into this issue. @rahul-tuli let's see if this can get the performance back. |
|
You only modified the modular file, but you need to apply the changes to the modeling file as well, i.e. modeling_qwen2_5_vl.py. |
|
Thanks for investigating, it helps a lot! We are going to include #40161 in a patch for safety, but the final fixes (maybe including this one) will potentially have to wait for 4.56.x. Let's see whether the original author of the MMMU issue can chime in and give feedback |
|
@pengzhenghao Could you please add the issue numbers to the PR description? I guess that it should be OK to write something like this: |
zucchini-nlp left a comment
Thanks for digging into the issue. I think we should not be deleting the textual position ids here, which are still needed for generation with FA2. Just removing the text positions will not solve it, because the root of the issue is in
transformers/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py
Lines 872 to 898 in cf58d70
We need to allow packed sequences with FA2, which have been requested several times by the community, and instead define that
text_pos_ids = None if the position ids do not contain 4 position types in the first dimension
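A minimal sketch of that check, assuming position_ids has shape (num_position_types, batch, seq_len); the helper name split_position_ids is illustrative, not the actual transformers API:

```python
import torch

def split_position_ids(position_ids):
    """Illustrative helper (not the real modeling code): keep textual
    position ids only when the tensor carries 4 position types."""
    if position_ids is not None and position_ids.ndim == 3 and position_ids.shape[0] == 4:
        # Packed-sequence/FA2 path: row 0 holds the textual position ids,
        # rows 1-3 hold the 3D RoPE (temporal, height, width) positions.
        return position_ids[0], position_ids[1:]
    # Plain 3D RoPE input without a textual row: text_pos_ids stays None.
    return None, position_ids
```

This would keep FA2 generation (which needs the textual positions) working while letting non-packed callers pass plain 3D RoPE position ids.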
|
Can you also check if these tests are still passing on the models? transformers/tests/test_modeling_common.py Lines 4265 to 4284 in 2914cec |
You need to update the whole Qwen-VL family, including Qwen-Omni; the issue is relevant to all models using 3D RoPE
|
Hi! Are there any updates on this? I am trying to finetune Qwen2.5-VL and I am noticing significant variations in the loss when using the latest transformers version. I tried a few previous versions, and the latest one that seems to yield reasonable loss during each step is
Thanks, |
|
@pengzhenghao hey, do you still want to merge this PR? I can take it up, if you are busy |
|
I will assume you are busy and take over the issue tomorrow, because Qwen-VL models have high usage |
|
Thank you so much for following this issue! @zucchini-nlp I am indeed busy, and I am also not familiar with the other Qwen-VL models & the FA2 implementation. It would be great if you could take over!
I am a little confused here.
Seems like you are suggesting:
So, I think this naturally implies:
Do I understand correctly? |
|
@pengzhenghao yeah right, since we don't want to force all users to use packed sequences with FA2, we can just set text_pos_ids = None in that case. I will check it tomorrow, and we definitely need more tests in Qwen given its specific RoPE type
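For illustration, the two shapes this distinction would handle, reusing the split_position_ids sketch from the earlier review comment (shapes are assumptions drawn from the discussion, not from the actual implementation):

```python
import torch

# Regular 3D RoPE: (3, batch, seq) -> no textual row, text ids stay None.
pos_3d = torch.zeros(3, 1, 128, dtype=torch.long)
text_ids, rope_ids = split_position_ids(pos_3d)
assert text_ids is None and rope_ids.shape[0] == 3

# Packed sequences with FA2: (4, batch, seq) -> extra leading textual row
# that gets split off, leaving the 3 RoPE rows.
pos_4d = torch.zeros(4, 1, 128, dtype=torch.long)
text_ids, rope_ids = split_position_ids(pos_4d)
assert text_ids is not None and rope_ids.shape[0] == 3
```
|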
Force-pushed from cf58d70 to 9b13d94
|
[For maintainers] Suggested jobs to run (before merge): run-slow: qwen2_5_vl |
|
@pengzhenghao I located the bug, and it is not in using packed or text position ids. The bug was actually in how position ids are computed when generating, which was not the same as computing in
Thanks for your PR and for investigating it
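One way to smoke-test this class of bug (the checkpoint and setup below are illustrative assumptions, not taken from the PR): if generation-time position ids diverge from the prefill computation, greedily re-scoring the full sequence in a single forward pass will not reproduce the tokens that generate() picked.

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"  # illustrative checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id)
model.eval()

inputs = processor(text=["Describe the weather today."], return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    # Re-score prompt + generated tokens in one forward pass; the argmax at
    # each generated position should reproduce what generate() emitted.
    logits = model(input_ids=out).logits

prefix_len = inputs["input_ids"].shape[1]
redecoded = logits[:, prefix_len - 1 : -1].argmax(-1)
assert torch.equal(redecoded, out[:, prefix_len:])
```

Running in the default float32 keeps numerical noise from flipping the argmax, so a failed assert points at inconsistent position handling rather than precision.
|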

What does this PR do?
Removes the problematic text_position_ids preparation when generating with the Qwen2.5-VL model.
Fixes #40154 (The Qwen 2.5 VL text position ID issue)
Fixes #40136 (Qwen2.5VL performance drop)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.