Make fused normalization functions backward-compatible #1760

Merged
crcrpar merged 1 commit into NVIDIA:master from timmoon10:memory-efficient-layer-norm-bugfix on Jan 1, 2024

Conversation

@timmoon10 (Contributor) commented on Dec 21, 2023

#1715 makes breaking API changes to some fused normalization functions, in particular adding memory_efficient as a positional argument. This PR makes memory_efficient a keyword argument to ensure backward compatibility.
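As a rough sketch of the pattern (illustrative only; the function name and surrounding signature here are assumptions, not Apex's exact API), the new option gets a default value and is expected to be passed by keyword, so legacy positional call sites are unaffected:

```python
import torch.nn.functional as F

# Illustrative sketch only, not Apex's actual implementation: the new
# `memory_efficient` option is appended after the existing parameters
# with a default, so callers that pass only the original positional
# arguments keep working.
def fused_layer_norm_affine(input, weight, bias, normalized_shape,
                            eps=1e-6, memory_efficient=False):
    # In the real fused kernels, `memory_efficient` would select a backward
    # pass that recomputes activations instead of saving them; here it only
    # demonstrates the signature.
    return F.layer_norm(input, normalized_shape, weight, bias, eps)
```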

This change is motivated by the fact that Megatron-LM uses the old API:
https://github.com/NVIDIA/Megatron-LM/blob/2bc6cd307a11423928c675f741e79e03df23e721/megatron/core/fusions/fused_layer_norm.py#L147
This prevents NeMo from upgrading from the 23.09 to the 23.11 PyTorch container. See NVIDIA-NeMo/NeMo#7909 (comment).

Feedback would be appreciated. An alternative approach is to update Megatron-LM, but this seems simpler. Pinging @RuiWang1998.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
timmoon10 added a commit to timmoon10/NeMo that referenced this pull request Dec 21, 2023
See NVIDIA/apex#1760.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@RuiWang1998 (Contributor)

Hi @timmoon10,

I had thought people might not be using the Function directly, and forgot about Megatron. I believe it would be best to submit another PR to Megatron-LM in tandem with this one: Megatron-DeepSpeed already has this feature (deepspeedai/Megatron-DeepSpeed#277), and it would be great if Megatron-LM had it as well.

@timmoon10 (Contributor, Author)

@RuiWang1998 That's nifty; it'll be convenient to just reuse that existing work.

These two approaches aren't mutually exclusive, so I don't see any harm in merging this PR. This change won't break newer code that uses memory_efficient.
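For instance, continuing the hypothetical sketch above, both call styles coexist:

```python
import torch

hidden = 64
x = torch.randn(2, 8, hidden)
weight = torch.ones(hidden)
bias = torch.zeros(hidden)

# Legacy-style call (as in Megatron-LM): positional arguments only.
y_old = fused_layer_norm_affine(x, weight, bias, (hidden,), 1e-5)

# Newer call that opts into the memory-efficient path by keyword.
y_new = fused_layer_norm_affine(x, weight, bias, (hidden,), 1e-5,
                                memory_efficient=True)
```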

@crcrpar merged commit c07a4cf into NVIDIA:master on Jan 1, 2024
ericharper added a commit to NVIDIA-NeMo/NeMo that referenced this pull request Jan 12, 2024
* Add distopt support for FP8 params and BF16 optimizer state

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Removed unused import

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update PyTorch container in Jenkins pipeline

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Use custom container with Apex bugfixes

See NVIDIA/apex#1760.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Upgrade to PyTorch 23.11 container

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update Apex commit

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
minitu pushed a commit to minitu/NeMo that referenced this pull request Jan 19, 2024 (same commit message as above).
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024 (same commit message as above, with an additional sign-off from Sasha Meister <ameister@nvidia.com>).
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024 (same commit message as above).