fix bug when getting the real accelerator's device number #2874
faaany wants to merge 3 commits into huggingface:main
Conversation
OK for me.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
SunMarc left a comment
Sounds good! I left a comment about MLU, since I'm not sure it is safe to change the `expected_device_type`.
elif is_mlu_available():
    expected_device_type = "mlu"
The issue you shared @faaany is about the `to()` method for MLU. I'm not sure whether `torch.device(d).type` will really return `cuda` with MLU. Do you have any insights @huismiling, since you added the MLU support? To be safe, I would suggest reverting this change.
Good idea, I will update it. But I am also very curious about the behavior on NPU:
Hi @statelesshz, could you help us verify what `torch.device(0)` returns on NPU? Is it `cuda` or `npu`? Thanks a lot!
@faaany Hi, the MLU device type is `mlu`:
>>> torch.device(0).type
'mlu'
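The thread establishes that resolving a bare integer with `torch.device(int)` gives a backend-dependent type: `'mlu'` on MLU builds, but (per the PR description below) `'cuda'` on XPU and likely NPU builds. A minimal sketch of a safer normalization, assuming a hypothetical helper name (`normalize_device` is not part of accelerate or PyTorch), could avoid relying on that resolution entirely:

```python
# Hypothetical helper: normalize a device spec to (type, index) without
# calling torch.device(int).type, whose result depends on the backend
# build ('cuda' on CUDA/XPU builds, 'mlu' on MLU builds per this thread).
def normalize_device(d, default_type):
    if isinstance(d, int):
        # A bare integer carries no type information, so trust the
        # caller-supplied accelerator type instead of torch's resolution.
        return default_type, d
    dev_type, _, idx = str(d).partition(":")
    # Specs like "xpu:1" carry an explicit index; "cpu" defaults to 0.
    return dev_type, int(idx) if idx else 0
```

For example, `normalize_device(1, "xpu")` yields `("xpu", 1)` regardless of which backend PyTorch was built for.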
That's great @faaany!
Nice :) |
What does this PR do?
This PR is a follow-up fix for PR #2826, and I want to correct my statement in that PR that `torch.device(d).type == "xpu"` is enough to check for the XPU device, just like in the NPU and MLU cases. This was my mistake. In fact, `torch.device(0).type` will always return `"cuda"` on XPU, as can be seen from the PyTorch code and the PyTorch official doc, at least for now. We are working on a PR to support this in a future PyTorch version. Also for the NPU path, I think `torch.device(0).type` will return `cuda`, as can be seen here.

In addition, users might pass a device id that exceeds the available device count. In this case, we should not count that incorrect id toward `num_devices` when calculating the balanced memory. So this PR actually fixes 2 issues:

- `num_devices` for non-CUDA devices will always be 0
- `num_devices` will include device indices that are larger than the available device count

Who can review?
@SunMarc and @muellerzr
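The two issues the PR describes can be sketched together. This is a hypothetical illustration, not the actual accelerate implementation; the function name `count_usable_devices` and its signature are assumptions for the example:

```python
# Sketch of the two fixes described in the PR (not accelerate's real code):
# (1) don't classify a bare integer via torch.device(int).type, which
#     returns 'cuda' on XPU builds even for XPU devices, and
# (2) skip device indices beyond the number of actually available devices.
def count_usable_devices(device_specs, expected_device_type, num_available):
    devices = set()
    for d in device_specs:
        if isinstance(d, int):
            # Fix 1: treat a bare index as the expected accelerator type.
            dev_type, index = expected_device_type, d
        else:
            dev_type, _, idx = str(d).partition(":")
            index = int(idx) if idx else 0
        # Fix 2: only count valid indices of the expected accelerator.
        if dev_type == expected_device_type and index < num_available:
            devices.add(index)
    return len(devices)
```

With this logic, `count_usable_devices([0, 1, 5, "cpu"], "xpu", 2)` counts only indices 0 and 1: the out-of-range index 5 and the `cpu` entry are excluded from `num_devices`.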