Conversation

@Disty0 (Contributor) commented Jul 7, 2023

What does this PR do?

Add a gpu argument to enable_sequential_cpu_offload and enable_model_cpu_offload instead of assuming the device is always cuda.
We can replace torch.cuda.empty_cache() with another function from the outside, but we can't easily change the target GPU without this PR.
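
For illustration, a minimal sketch of the intended usage after this change (model id and dtype are just examples; the gpu/gpu_id names follow this PR's proposal):

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# The offload hooks currently always target "cuda"; with this PR the target
# device type and index can be selected explicitly, e.g. an Intel XPU:
pipe.enable_model_cpu_offload(gpu="xpu", gpu_id=0)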

Before submitting

  • [N] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [Y] Did you read the contributor guideline?
  • [Y] Did you read our philosophy doc (important for complex PRs)?
  • [N] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [N] Did you make sure to update the documentation with your changes?
    Note: I couldn't find a doc about arguments for cpu offload.
    Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [N] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Core library:

@sayakpaul (Member)

It's quite a large PR. We would maybe like to first discuss the impact and then proceed to reviewing it. Pinging @pcuenca @patrickvonplaten.

@patrickvonplaten (Contributor)

I'm generally fine with it, but what other devices are you targeting exactly? @pcuenca wdyt?

@pcuenca (Member) commented Jul 12, 2023

I tested on mps and it works. However, I'm not sure it helps because memory is unified in M1/M2 computers, so it doesn't matter that we move the model between cpu and the mps device: if the model is too large we'll run out of memory no matter what. I could see two potential benefits, but I think neither can be realized now:

  • Offload to disk instead of cpu, to allow running very large models. I don't think this use case is currently supported by either enable_sequential_cpu_offload or enable_model_cpu_offload (but I might be wrong).
  • Use this on a computer with several mps devices. I'm currently not aware of such a system.

What type of hardware do you have in mind for this, @Disty0? Could you maybe clarify the use case here so we can better understand the issue? Thank you!

@Disty0 (Contributor, Author) commented Jul 12, 2023

I am targeting xpu devices (Intel Arc GPUs) and possibly the dml device (DirectML GPUs) too.

enable_model_cpu_offload works fine on xpu with this PR.
But enable_sequential_cpu_offload doesn't work yet, since PyTorch IPEX doesn't play nicely with the meta device.

@lshqqytiger

enable_sequential_cpu_offload works with DirectML very well. It would be nice if cpu offloading were available for more devices, not only cuda.

@pcuenca (Member) commented Jul 13, 2023

> enable_sequential_cpu_offload works with DirectML very well.

That's interesting! Could you please provide an example of use?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@Disty0 (Contributor, Author) commented Jul 14, 2023

> enable_sequential_cpu_offload works with DirectML very well.

> That's interesting! Could you please provide an example of use?

Something along these lines for Intel ARC GPUs:

torch.cuda.empty_cache = torch.xpu.empty_cache
pipeline.enable_model_cpu_offload(gpu="xpu", gpu_id=0)
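
For a more self-contained version of the same idea, a sketch assuming intel_extension_for_pytorch is installed (it is what provides the torch.xpu namespace for Arc GPUs), with pipeline being any loaded diffusers pipeline:

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers torch.xpu

# diffusers calls torch.cuda.empty_cache() internally during offloading, so it
# is swapped for the XPU equivalent from the outside, as noted above.
torch.cuda.empty_cache = torch.xpu.empty_cache

# Target the first Intel Arc GPU instead of the default cuda device.
pipeline.enable_model_cpu_offload(gpu="xpu", gpu_id=0)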

@pcuenca (Member) commented Jul 15, 2023

Interesting! So I would be in favor of reviewing and merging this; it sounds like it could be useful for some hardware architectures.

@lshqqytiger

> enable_sequential_cpu_offload works with DirectML very well.

> That's interesting! Could you please provide an example of use?

Similar to xpu.

pipeline.enable_sequential_cpu_offload(gpu="privateuseone", gpu_id=0)

But I would prefer to provide a torch.device itself, like below.

device = torch_directml.device(0)
pipeline.enable_sequential_cpu_offload(device=device)
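
For completeness, the same thing with the import included (a sketch assuming the torch-directml package; the device= keyword follows the interface proposed in #4114 rather than this PR's gpu/gpu_id pair):

import torch_directml

# torch_directml exposes DirectML GPUs through PyTorch's 'privateuseone' backend.
device = torch_directml.device(0)
pipeline.enable_sequential_cpu_offload(device=device)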

@patrickvonplaten (Contributor)

Agree with @lshqqytiger here - that's also what I added here: #4114. Should we maybe first go with #4114 and then possibly refactor enable_model_cpu_offload in a similar way?
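
For reference, a rough sketch of the device-based call shape being discussed (illustrative only; #4114 is the authoritative change for enable_sequential_cpu_offload):

import torch

# Accept a torch.device (or a device string) instead of a separate gpu/gpu_id pair.
pipeline.enable_sequential_cpu_offload(device=torch.device("cuda", 0))
pipeline.enable_sequential_cpu_offload(device="xpu")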

@Disty0 (Contributor, Author) commented Jul 17, 2023

> Agree with @lshqqytiger here - that's also what I added here: #4114. Should we maybe first go with #4114 and then possibly refactor enable_model_cpu_offload in a similar way?

Using device would be better; I agree with that too.

#4114 seems like a better approach than this PR. We can close this one, continue with #4114, and refactor enable_model_cpu_offload after that.

@patrickvonplaten (Contributor) commented Jul 18, 2023

Sorry for the duplicated work here @Disty0. Should we maybe try to do the same refactor for enable_model_cpu_offload by making every class define a class attribute describing the order in which the hooks should be applied? cc @williamberman @pcuenca wdyt?

@Disty0 (Contributor, Author) commented Jul 18, 2023

> Sorry for the duplicated work here @Disty0. Should we maybe try to do the same refactor for enable_model_cpu_offload by making every class define a class attribute describing the order in which the hooks should be applied? cc @williamberman @pcuenca wdyt?

Yes, that would be better. That way we won't have to change 68+ files if we want to change something in the future.
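
As a hypothetical illustration of that class-attribute idea (names are made up here, not the final API), each pipeline could declare its hook order once and a single generic implementation could consume it:

import torch


class DiffusionPipelineBase:
    # Each concrete pipeline declares the order in which its sub-models should
    # receive offload hooks, instead of reimplementing enable_model_cpu_offload
    # in 68+ pipeline files.
    model_cpu_offload_seq: str = ""

    def enable_model_cpu_offload(self, gpu_id: int = 0, device: str = "cuda") -> None:
        execution_device = torch.device(device, gpu_id)
        for name in self.model_cpu_offload_seq.split("->"):
            component = getattr(self, name)
            # The real implementation would attach an accelerate cpu-offload
            # hook here; this sketch only walks the declared order.
            _ = (component, execution_device)


class ExamplePipeline(DiffusionPipelineBase):
    model_cpu_offload_seq = "text_encoder->unet->vae"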
