[Bug]Fix init issue for layer_norm in sequence_parallel for non-CUDA device. by ys950902 · Pull Request #450 · deepspeedai/Megatron-DeepSpeed

ys950902 · 2024-09-29T06:21:17Z

sequence_parallel was recently added to layernorm in Megatron-DeepSpeed. In the current implementation, non-CUDA devices get their layernorm from torch.nn (from torch.nn import LayerNorm), which has no attribute named sequence_parallel, so initialization fails on non-CUDA devices.

This PR fixes that issue.
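The exact traceback was not posted in this thread. As a rough illustration of the kind of mismatch described above, the sketch below shows that torch.nn.LayerNorm neither accepts a sequence_parallel keyword nor exposes such an attribute on its parameters. The hidden size of 8 and the variable names are arbitrary choices for the example, and which of the two paths the Megatron-DeepSpeed code actually hits is an assumption.

```python
from torch import nn

# torch.nn.LayerNorm has no sequence_parallel keyword, so passing one
# is rejected at construction time with a TypeError.
try:
    nn.LayerNorm(8, sequence_parallel=True)
    ctor_error = None
except TypeError as exc:
    ctor_error = exc

# Its parameters also carry no sequence_parallel attribute, so code that
# reads param.sequence_parallel without a default would fail as well.
plain = nn.LayerNorm(8)
has_flag = hasattr(plain.weight, "sequence_parallel")
```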

ys950902 · 2024-09-29T06:22:37Z

The layernorm change from #429 is added in this PR. Hi @polisettyvarma, could you please also take a look at this PR and confirm whether the layernorm change is okay for you? Many thanks!

polisettyvarma · 2024-09-29T15:08:35Z

@ys950902 Will this feature work correctly? Have you done any accuracy checks for it?

tjruwase · 2024-09-30T13:30:46Z

    else:
        from .rmsnorm import RMSNorm
-    from torch.nn import LayerNorm
+    from .layernorm import LayerNorm


Can you please share the failure that is being fixed? I have two concerns about this change:

1. It is quite subtle, since it does not show the connection to sequence parallelism.

2. It is unclear to me that the new LayerNorm is equivalent to torch.nn.LayerNorm for the non-sequence-parallel case. Maintaining parity with torch.nn.LayerNorm imposes an extra development burden.

So, I would like to further understand the problem and explore alternative solutions. Thanks!
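One possible shape for such an alternative, offered only as a sketch and not something proposed in this thread, is a thin subclass that keeps torch.nn.LayerNorm's forward pass untouched and only adds the flag. The class name SequenceParallelLayerNorm is hypothetical, and tagging the weight and bias with the attribute assumes the calling code looks for it on the parameters.

```python
import torch
from torch import nn


class SequenceParallelLayerNorm(nn.LayerNorm):
    """Hypothetical wrapper: torch.nn.LayerNorm plus a sequence_parallel flag."""

    def __init__(self, normalized_shape, eps=1e-5, sequence_parallel=False):
        super().__init__(normalized_shape, eps=eps)
        self.sequence_parallel = sequence_parallel
        # Assumption: callers inspect param.sequence_parallel, so tag both
        # learnable parameters with the flag.
        setattr(self.weight, "sequence_parallel", sequence_parallel)
        setattr(self.bias, "sequence_parallel", sequence_parallel)


torch.manual_seed(0)
x = torch.randn(2, 4, 8)
wrapped = SequenceParallelLayerNorm(8, sequence_parallel=True)
reference = nn.LayerNorm(8)

# forward() is inherited unchanged and both modules start from the default
# initialization, so the outputs should agree.
same = torch.allclose(wrapped(x), reference(x))
```

Because the computation is inherited rather than reimplemented, parity with torch.nn.LayerNorm in the non-sequence-parallel case would not need separate maintenance. Whether the flag alone is enough for correct sequence-parallel training is not established here.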

tjruwase · 2024-12-14T00:53:18Z

Closing for lack of response. Please re-open if needed.

Fix init issue for layer_norm in sequence_parallel.

3ad2e8f

ys950902 requested review from GuanhuaWang, arashb, awan-10, duli2012, eltonzheng and tjruwase as code owners September 29, 2024 06:21

ys950902 changed the title ~~Fix init issue for layer_norm in sequence_parallel for non-CUDA device.~~ [Bug]Fix init issue for layer_norm in sequence_parallel for non-CUDA device. Sep 29, 2024

tjruwase removed request for GuanhuaWang, arashb, awan-10, duli2012 and eltonzheng September 30, 2024 13:19

tjruwase reviewed Sep 30, 2024

tjruwase closed this Dec 14, 2024

ys950902 wants to merge 1 commit into deepspeedai:main from ys950902:layernorm_init
