Make fused normalization functions backward-compatible #1760

Merged: crcrpar merged 1 commit into NVIDIA:master on Jan 1, 2024
Conversation
Signed-off-by: Tim Moon <tmoon@nvidia.com>
timmoon10 added a commit to timmoon10/NeMo that referenced this pull request on Dec 21, 2023:
See NVIDIA/apex#1760. Signed-off-by: Tim Moon <tmoon@nvidia.com>
RuiWang1998 (Contributor) commented:
Hi @timmoon10, I just thought people may not be using the Function directly and forgot about Megatron. I believe it would be best to submit another PR to Megatron-LM in tandem with this one, since Megatron-DeepSpeed already has this feature (deepspeedai/Megatron-DeepSpeed#277) and it would be great if Megatron-LM had it as well.
timmoon10 (Contributor, Author) replied:
@RuiWang1998 That's nifty; it will be convenient to just reuse that existing work. These two approaches aren't mutually exclusive, so I don't think there is any harm in merging. This change won't break the newer code that uses `memory_efficient`.
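For concreteness, here is a minimal sketch of the two call styles this change keeps working. The call sites are hypothetical and assume the `FusedLayerNormAffineFunction.apply` signature that Megatron-LM relies on (linked in the description below), plus a CUDA build of Apex.

```python
import torch
from apex.normalization.fused_layer_norm import FusedLayerNormAffineFunction

hidden = 1024
x = torch.randn(8, hidden, device="cuda", requires_grad=True)
weight = torch.ones(hidden, device="cuda", requires_grad=True)
bias = torch.zeros(hidden, device="cuda", requires_grad=True)

# Old-style call (what Megatron-LM does today): no memory_efficient argument.
y_old = FusedLayerNormAffineFunction.apply(x, weight, bias, (hidden,), 1e-5)

# Newer-style call: opt in to the memory-efficient path explicitly.
y_new = FusedLayerNormAffineFunction.apply(x, weight, bias, (hidden,), 1e-5, True)
```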
crcrpar approved these changes on Jan 1, 2024.
ericharper added a commit to NVIDIA-NeMo/NeMo that referenced this pull request on Jan 12, 2024:
* Add distopt support for FP8 params and BF16 optimizer state
* Removed unused import
* Update PyTorch container in Jenkins pipeline
* Use custom container with Apex bugfixes (see NVIDIA/apex#1760)
* Upgrade to PyTorch 23.11 container
* Update Apex commit
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
minitu pushed a commit to minitu/NeMo that referenced this pull request on Jan 19, 2024, with the same commit message as above.
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request on Feb 15, 2024, with the same commit message as above plus an additional Signed-off-by: Sasha Meister <ameister@nvidia.com>.
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request on Jun 25, 2024, with the same commit message as above.
#1715 makes breaking API changes to some fused normalization functions, in particular adding `memory_efficient` as a positional argument. This PR makes `memory_efficient` a keyword argument to ensure backward compatibility. This change is motivated by the fact that Megatron-LM uses the old API:
https://github.com/NVIDIA/Megatron-LM/blob/2bc6cd307a11423928c675f741e79e03df23e721/megatron/core/fusions/fused_layer_norm.py#L147
This prevents NeMo from upgrading from the 23.09 to 23.11 PyTorch container. See NVIDIA-NeMo/NeMo#7909 (comment).
Feedback would be appreciated. An alternative approach is to update Megatron-LM, but this seems simpler. Pinging @RuiWang1998.
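For illustration, here is a minimal sketch of the compatibility pattern under discussion: giving the new flag a trailing default so the old five-argument call keeps working while newer callers can still pass it. The class name mirrors the Apex Function, but the body is placeholder math standing in for the fused CUDA kernel; it is not the Apex implementation.

```python
import torch
import torch.nn.functional as F


class FusedLayerNormAffineFunction(torch.autograd.Function):
    """Sketch only: the real Apex Function dispatches to a fused CUDA kernel."""

    @staticmethod
    def forward(ctx, input, weight, bias, normalized_shape, eps,
                memory_efficient=False):
        # The trailing default keeps the pre-#1715 five-argument call valid.
        ctx.save_for_backward(input, weight, bias)
        ctx.normalized_shape = normalized_shape
        ctx.eps = eps
        ctx.memory_efficient = memory_efficient
        return F.layer_norm(input, normalized_shape, weight, bias, eps)


x = torch.randn(4, 16, requires_grad=True)
w, b = torch.ones(16), torch.zeros(16)

# Old API (e.g. Megatron-LM): memory_efficient is omitted and defaults to False.
y_old = FusedLayerNormAffineFunction.apply(x, w, b, (16,), 1e-5)

# New API: the flag is supplied explicitly.
y_new = FusedLayerNormAffineFunction.apply(x, w, b, (16,), 1e-5, True)
```

Because Python fills the trailing default for positional calls, existing Megatron-LM call sites need no change, while callers that already pass the flag (Megatron-DeepSpeed style) continue to work as well.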