[Bug] Fix init issue for layer_norm in sequence_parallel for non-CUDA device. #450
Closed
ys950902 wants to merge 1 commit into deepspeedai:main from
Conversation
Author
The layernorm change from #429 is added in this PR. Hi @polisettyvarma, could you please also take a look at this PR and confirm the layernorm change is okay with you? Many thanks!
@ys950902 Will this feature work correctly? Have you done any accuracy checks for this?
tjruwase
reviewed
Sep 30, 2024
 else:
     from .rmsnorm import RMSNorm
-    from torch.nn import LayerNorm
+    from .layernorm import LayerNorm
Can you please share the failure that is being fixed? I have two concerns about this change:
- It is quite subtle since it does not show the connection to sequence-parallelism
- It is unclear to me that the new LayerNorm is equivalent to torch.nn.LayerNorm for the non sequence-parallel case. Maintaining parity with torch.nn.LayerNorm imposes extra development burden.
So, I would like to further understand the problem and explore alternative solutions. Thanks!
Closing for lack of response. Please re-open if needed.
Sequence-parallel support was added for layernorm in Megatron-DeepSpeed, but the current implementation uses from torch.nn import LayerNorm on non-CUDA devices. torch.nn.LayerNorm has no attribute named sequence_parallel, so initialization fails on non-CUDA devices.
This PR fixes that issue.
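The failure mode and the shape of the fix can be sketched as follows. This is a minimal illustration, not the actual code from this PR: it assumes the sequence-parallel code path only needs a sequence_parallel attribute on the module and its parameters, and the class name and constructor signature here are hypothetical.

```python
import torch
from torch import nn

# Plain torch.nn.LayerNorm carries no sequence_parallel attribute,
# which is what breaks initialization on non-CUDA devices.
base = nn.LayerNorm(8)
assert not hasattr(base, "sequence_parallel")

class LayerNorm(nn.LayerNorm):
    """Hypothetical sketch: a LayerNorm that carries a sequence_parallel flag."""

    def __init__(self, normalized_shape, eps=1e-5, sequence_parallel=False):
        super().__init__(normalized_shape, eps=eps)
        self.sequence_parallel = sequence_parallel
        # Tag the parameters as well, so sequence-parallel gradient
        # handling (e.g. an extra all-reduce) can identify them later.
        setattr(self.weight, "sequence_parallel", sequence_parallel)
        setattr(self.bias, "sequence_parallel", sequence_parallel)

ln = LayerNorm(8, sequence_parallel=True)
print(ln.sequence_parallel, ln.weight.sequence_parallel)
```

Because the subclass only adds attributes on top of torch.nn.LayerNorm, its forward computation stays identical to the stock layer, which is the parity concern raised in the review above.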