Add gpu option to cpu offload #3990
Conversation
It's quite a large PR. We would maybe like to first discuss the impact and then proceed to reviewing it. Pinging @pcuenca @patrickvonplaten.
I'm generally fine with it, but what other devices are you targeting exactly? @pcuenca wdyt?
I tested on Intel ARC GPUs.
What type of hardware do you have in mind for this, @Disty0? Could you maybe clarify the use case here so we can better understand the issue? Thank you!
I am targeting Intel ARC GPUs.
That's interesting! Could you please provide an example of use?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Something along these lines for Intel ARC GPUs:
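(The example itself is sketched below under stated assumptions: the `gpu`/`gpu_id` argument names this PR proposes, a PyTorch build with Intel's xpu backend via `intel_extension_for_pytorch`, and an illustrative checkpoint.)

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" backend (assumed installed)
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Keep weights in CPU RAM and move each submodule to the Intel GPU only while
# it runs, rather than assuming the accelerator is CUDA:
pipe.enable_sequential_cpu_offload(gpu="xpu", gpu_id=0)

image = pipe("an astronaut riding a horse").images[0]
```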
Interesting! So I would be in favor of reviewing and merging this; it sounds like it could be useful for some hardware architectures.
Similar to xpu:

```python
pipeline.enable_sequential_cpu_offload(gpu="privateuseone", gpu_id=0)
```

But I prefer to provide a device instead:

```python
device = torch_directml.device(0)
pipeline.enable_sequential_cpu_offload(device=device)
```
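(For comparison, a minimal sketch of the device-based form under discussion, assuming the `device` argument proposed in #4114; the checkpoint name is illustrative.)

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Any torch.device works here; no per-vendor "gpu" string plumbing is needed:
pipe.enable_sequential_cpu_offload(device=torch.device("cuda", 0))
```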
Agree with @lshqqytiger here - that's also what I added in #4114. Should we maybe first go with #4114 and then possibly refactor
Using `device` will be better; I agree with that too. #4114 seems like a better approach than this PR.
Sorry for the duplicated work here, @Disty0. Should we maybe try to do the same refactor for
Yes, that would be better. That way, we won't have to change 68+ files if we want to change something in the future.
What does this PR do?
Add a gpu argument to enable_sequential_cpu_offload and enable_model_cpu_offload instead of assuming the device is CUDA only.
We can replace torch.cuda.empty_cache() with another function from outside, but we can't easily change the target GPU without this PR.
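A minimal sketch of the intended usage, assuming the `gpu`/`gpu_id` argument names proposed here (not a merged API) and an illustrative checkpoint; the xpu calls require an xpu-enabled PyTorch build:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Today the offload hooks assume CUDA:
pipe.enable_model_cpu_offload()  # effectively gpu="cuda", gpu_id=0

# With the proposed arguments, the same hook can target another backend:
pipe.enable_model_cpu_offload(gpu="xpu", gpu_id=0)

# Cache clearing stays backend-specific and can already be swapped from outside:
torch.xpu.empty_cache()  # xpu counterpart of torch.cuda.empty_cache() (xpu builds only)
```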
Before submitting
Note: I couldn't find documentation about the arguments for CPU offload.
Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Core library: