
align xpu's autocast behavior w/ cuda by using device agnostic torch APIs #38284

Merged
ydshieh merged 25 commits into huggingface:main from yao-matrix:autocast-xpu
Jun 19, 2025

Conversation

@yao-matrix
Contributor

@ArthurZucker, please help review, thanks very much.

@Rocketknight1
Member

cc @IlyasMoutawwakil

@yao-matrix
Copy link
Copy Markdown
Contributor Author

The CI failure doesn't seem to be caused by my PR.


 # Upcast (turn off autocast) and reorder (Scale K by 1 / root(dk))
-with torch.amp.autocast(query.device.type, enabled=False):
+with torch.autocast(query.device.type, enabled=False):
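As a minimal sketch of what this device-agnostic call buys (tensor names and shapes here are illustrative, not taken from the PR): `torch.autocast(device_type, enabled=False)` accepts the string from `query.device.type` directly, so the same context manager works on "cpu", "cuda", or "xpu" without backend-specific branches.

```python
import torch

# Hypothetical stand-ins for attention inputs; shapes are illustrative only.
query = torch.randn(1, 4, 8)
key = torch.randn(1, 4, 8)

# query.device.type is "cpu", "cuda", or "xpu", so this line is
# device-agnostic. enabled=False turns autocast OFF, so the matmul
# below runs in full float32 precision.
with torch.autocast(query.device.type, enabled=False):
    # Upcast to float32 and scale K by 1 / sqrt(d_k) before the matmul
    scores = torch.matmul(
        query.float(), (key.float() / key.shape[-1] ** 0.5).transpose(-1, -2)
    )

print(scores.dtype)  # torch.float32
```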
Contributor Author


Aligned to match other modeling code in the models directory.

@yao-matrix
Contributor Author

@ArthurZucker @IlyasMoutawwakil, could you help review and comment? Thanks very much.

@yao-matrix yao-matrix changed the title align xpu's autocast behavior w/ cuda by using device agnostic torch.autocast align xpu's autocast behavior w/ cuda by using device agnostic torch APIs May 29, 2025
@yao-matrix
Contributor Author

The CI failure may be due to an unstable CI environment.

@Rocketknight1
Member

CI seems clear now! cc @IlyasMoutawwakil

@yao-matrix
Contributor Author

@IlyasMoutawwakil, could you help review? Thanks very much.

input_dtype = query_states.dtype
device_type = (
    query_states.device.type
    if isinstance(query_states.device.type, str) and query_states.device.type != "mps"
    else "cpu"
)
Member


Why would query_states.device.type be anything other than a str?

Member


And what's the problem with mps, exactly?

Contributor Author

@yao-matrix yao-matrix Jun 12, 2025


I don't know; it's an existing practice in the original code:

device_type = device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"

I reused it because I don't have an mps device, so I just followed the existing behavior. I can also see it in the chameleon and recurrent_gemma modeling code since their first PR, so I cannot retrieve the history of why it's used. Maybe @ArthurZucker and @zucchini-nlp know the reason.

Member


Yeah, I think it comes from this comment: #29285 (comment). The changes were first added in that PR.

Contributor Author


Thanks @zucchini-nlp, that PR explains why disabling casting is needed.
And @IlyasMoutawwakil, this PR #29439 explains why mps is excluded: mps doesn't support amp.
As for when query_states.device.type is not a str, I guess it's backward-compatible behavior: before PT 2.0 there were only two values for torch.device.type, "cpu" and "cuda", so for devices like "mps" (supported since PT 1.12), the tensor device type may have returned None in PT 1.13, hence the guard. But I don't have a device to confirm; that's just my guess.
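The guard discussed above can be sketched as a plain-Python helper (the function name is my own, not from the PR; the expression mirrors the one-liner in the modeling code):

```python
def resolve_autocast_device_type(device_type):
    """Pick a device type string safe to pass to torch.autocast.

    Falls back to "cpu" when the value is not a plain str or is "mps",
    since mps does not support amp (see PR #29439). Hypothetical helper
    wrapping the guard expression from the modeling code.
    """
    return device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"

print(resolve_autocast_device_type("xpu"))  # xpu
print(resolve_autocast_device_type("mps"))  # cpu
```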

Member


Thanks, so I guess we can remove the str check, since support for torch 1.x was dropped a while ago.

Contributor Author


@IlyasMoutawwakil, done, please help review again, thanks.

Member

@IlyasMoutawwakil IlyasMoutawwakil left a comment


LGTM

@yao-matrix
Contributor Author

@Rocketknight1, do you know who else needs to review this PR after Ilyas approved? Thanks.

@IlyasMoutawwakil IlyasMoutawwakil requested a review from ydshieh June 19, 2025 08:41
Collaborator

@ydshieh ydshieh left a comment


LVGTM, thank you

@ydshieh
Collaborator

ydshieh commented Jun 19, 2025

run-slow: qwen2_5_omni, gemma, phimoe, qwen2_moe, gpt2, distilbert

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/distilbert', 'models/gemma', 'models/gpt2', 'models/phimoe', 'models/qwen2_5_omni', 'models/qwen2_moe']
quantizations: [] ...

@ydshieh ydshieh enabled auto-merge (squash) June 19, 2025 11:36
@ydshieh ydshieh merged commit a9ce8c6 into huggingface:main Jun 19, 2025
20 of 21 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yao-matrix yao-matrix deleted the autocast-xpu branch June 19, 2025 23:57