
fix bug for videomt model device mismatch #45204

Merged
ydshieh merged 5 commits into huggingface:main from kaixuanliu:videomt-xpu on Apr 10, 2026


@kaixuanliu
Contributor

@ydshieh pls help review, thx!

```python
query_tokens = self.query_updater(propagated_query).to(frame_hidden_states.device) + self.query.weight[
    None, :, :
].to(frame_hidden_states.device)
```
Collaborator


Since we have this change now, we don't need the `query_tokens.to(frame_hidden_states.device)` below. It's no big deal, but let's remove it.

Contributor Author


Thx for the advice. Done.

@ydshieh
Collaborator

ydshieh commented Apr 9, 2026

@kaixuanliu

Could you share which tests are failing without this PR, as well as the (full) error log you have for them, please 🙏 ? Thanks

@kaixuanliu
Contributor Author

Sure. Before this PR, when I used 4 cards to run a test case such as `tests/models/videomt/test_modeling_videomt.py::VideomtForUniversalSegmentationIntegrationTest::test_instance_segmentation_inference`, it failed with the following error:

```
        for frame_idx in range(num_frames):
            frame_hidden_states = hidden_states[:, frame_idx]

            if propagated_query is None:
                query_tokens = self.query.weight[None, :, :].expand(batch_size, -1, -1)
            else:
>               query_tokens = self.query_updater(propagated_query) + self.query.weight[None, :, :].to(
                    frame_hidden_states.device
                )
E               RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:2!

src/transformers/models/videomt/modeling_videomt.py:1181: RuntimeError
```
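The failure can be illustrated without any GPUs using a toy stand-in for tensors. The sketch below is purely illustrative and is not the transformers code: `FakeTensor`, `query_tokens_before`, and `query_tokens_after` are hypothetical names, and the class only mimics two PyTorch behaviors (an elementwise op requires both operands on the same device, and `.to()` returns the same object when the tensor is already on the target device). The device assignment (updater output on `cuda:3`, frame hidden states on `cuda:2`) mirrors the log above; where `device_map="auto"` actually places each submodule, including the assumed `cuda:0` for the query embedding, is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: a value plus the device it lives on."""

    value: float
    device: str

    def to(self, device: str) -> "FakeTensor":
        # Like torch.Tensor.to, return self unchanged when already on the target device.
        return self if device == self.device else FakeTensor(self.value, device)

    def __add__(self, other: "FakeTensor") -> "FakeTensor":
        # Mimic PyTorch's same-device requirement for elementwise ops.
        if self.device != other.device:
            raise RuntimeError(
                "Expected all tensors to be on the same device, but found at least "
                f"two devices, {self.device} and {other.device}!"
            )
        return FakeTensor(self.value + other.value, self.device)


def query_tokens_before(updater_out, query_weight, frame_device):
    # Pre-fix pattern: only the query embedding is moved to the frame's device.
    return updater_out + query_weight.to(frame_device)


def query_tokens_after(updater_out, query_weight, frame_device):
    # Post-fix pattern: both operands are moved to the frame's device.
    return updater_out.to(frame_device) + query_weight.to(frame_device)


# Assumed placement (hypothetical): updater output on cuda:3, frame on cuda:2.
updater_out = FakeTensor(1.0, "cuda:3")
query_weight = FakeTensor(2.0, "cuda:0")
frame_device = "cuda:2"

try:
    query_tokens_before(updater_out, query_weight, frame_device)
except RuntimeError as err:
    print("before:", err)

fixed = query_tokens_after(updater_out, query_weight, frame_device)
print("after:", fixed)
# The follow-up .to() that the review asked to drop is now a no-op.
print(fixed.to(frame_device) is fixed)
```

Moving both operands to `frame_hidden_states.device` lets the current frame define the target device for each iteration. Because `torch.Tensor.to` returns `self` when the tensor already has the requested device and dtype, any later `.to(frame_hidden_states.device)` on the sum does nothing, which is why the reviewer suggested removing it.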

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
@ydshieh
Collaborator

ydshieh commented Apr 10, 2026

Thank you for the information. Indeed, this model's integration tests use `model_kwargs = {"device_map": "auto"}`, which leads to this kind of issue. So the fix makes a lot of sense (for users too, not just for CI).

@ydshieh
Collaborator

ydshieh commented Apr 10, 2026

run-slow: videomt

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/videomt"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
|---------|----------|--------------------------------|
| RUN | cb3e97d4 | workflow commit (merge commit) |
| PR | 0db70bb0 | branch commit (from PR) |
| main | f6ff4ed8 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

@ydshieh
Collaborator

ydshieh commented Apr 10, 2026

run-slow: videomt

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: videomt

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/videomt"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
|---------|----------|--------------------------------|
| RUN | 6a456549 | workflow commit (merge commit) |
| PR | 64fda37c | branch commit (from PR) |
| main | a9f5b3a8 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

ydshieh merged commit 4fd862f into huggingface:main on Apr 10, 2026
18 of 20 checks passed
@kaixuanliu kaixuanliu deleted the videomt-xpu branch April 13, 2026 02:40
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
* fix bug for videomt model device mismatch

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

2 participants