HPU support #36424
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@ArthurZucker @muellerzr PR is ready for review. I made sure the (trainer, fsdp, deepspeed) tests ran successfully on both Gaudi1 and Gaudi2 in single- and multi-device settings.
```python
# the file doesn't exist in the repo
if not os.path.exists("utils/testing_scripts/fsdp_cpu_offloading.py"):
    raise unittest.SkipTest("FSDP CPU offloading script not found!")
```
Couldn't find this file; is this test still relevant?
I think it's meant to be:

```python
from functools import partial

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator

# verify we have FSDP activation support ready by importing:
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    checkpoint_wrapper,
    CheckpointImpl,
    apply_activation_checkpointing,
)
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

model_id = "HuggingFaceM4/tiny-random-Llama3ForCausalLM"
model = AutoModelForCausalLM.from_pretrained(model_id)
model.train()
model.gradient_checkpointing_enable()

accelerator = Accelerator()
model = accelerator.prepare(model)

check_fn = lambda submodule: isinstance(submodule, LlamaDecoderLayer)
non_reentrant_wrapper = partial(
    checkpoint_wrapper,
    offload_to_cpu=False,
    checkpoint_impl=CheckpointImpl.NO_REENTRANT,
)
apply_activation_checkpointing(
    model, checkpoint_wrapper_fn=non_reentrant_wrapper, check_fn=check_fn
)

print(model)
rand_input = torch.LongTensor([[0, 1, 0, 1]]).to(0)
model(rand_input)
```

Was referenced in #31161 but never actually added? 😅
Should I leave it for another PR? The file path `utils/testing_scripts/fsdp_cpu_offloading.py` doesn't make sense in the transformers repo.
ArthurZucker
left a comment
Nice! Still missing for me is a bit of doc on:
- what HPU is
- how anyone could run on HPU

But that's it!
muellerzr
left a comment
Thanks! Added a note for our apparent missing test file 👀
muellerzr
left a comment
Everything looks good from the Trainer side in my eyes. The only thing we may want is to add an accelerate import check to flag it as a requirement (the release will go live tonight).
Added! The target version is 1.50, right? @muellerzr
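For reference, a minimal sketch of the kind of version gate being discussed (the helper name and the naive numeric comparison are illustrative assumptions, not the PR's code; transformers has its own version-checking utilities):

```python
import importlib.metadata


def meets_min_version(package: str, minimum: str) -> bool:
    """Return True iff `package` is installed at version >= `minimum`.

    Naive numeric comparison for illustration only; a real check would
    use packaging.version to handle pre-releases and dev builds.
    """
    try:
        installed = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Package not installed at all: the gate fails.
        return False

    def to_tuple(v: str) -> tuple:
        # "1.5.0" -> (1, 5, 0); non-numeric parts are skipped.
        return tuple(int(part) for part in v.split(".") if part.isdigit())

    return to_tuple(installed) >= to_tuple(minimum)


# A missing package simply fails the gate:
print(meets_min_version("definitely-not-installed-xyz", "1.0.0"))  # False
```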
What does this PR do?
This PR introduces upstream support for the HPU torch device/backend.
This PR focuses on enabling out-of-the-box support in eager mode (`PT_HPU_LAZY_MODE=0`), while `optimum-habana` will continue to enable optimized paths making use of the lazy mode and advanced features of the SynapseAI software stack.

This is part of three PRs:
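As a rough illustration of what "out of the box" means here, a sketch under two assumptions not stated in this PR: the Gaudi software stack registers an `hpu` torch backend, and `PT_HPU_LAZY_MODE=0` should be set before the plugin loads to select eager mode. On a machine without HPU hardware this simply falls back to CUDA or CPU:

```python
import os

# Select eager mode before the Habana torch plugin loads (assumption,
# based on the PT_HPU_LAZY_MODE=0 flag described above).
os.environ.setdefault("PT_HPU_LAZY_MODE", "0")

import torch


def pick_device() -> torch.device:
    # Prefer HPU when the Gaudi plugin has registered the backend, then
    # fall back to CUDA, then CPU. `torch.hpu` only exists once the
    # Habana integration is available, hence the hasattr guard.
    if hasattr(torch, "hpu") and torch.hpu.is_available():
        return torch.device("hpu")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()
x = torch.ones(2, 2, device=device)
print(device.type, (x + x).sum().item())
```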
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.