Improve communication overlapping in FP8 distributed optimizer #8221
Merged
timmoon10 merged 14 commits into NVIDIA-NeMo:main on Feb 8, 2024
Conversation
Signed-off-by: Tim Moon <tmoon@nvidia.com>
timmoon10 (Collaborator, Author) commented: jenkins
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Avoid unnecessary FP8 weight transposes. Signed-off-by: Tim Moon <tmoon@nvidia.com>
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request on Feb 15, 2024

Improve communication overlapping in FP8 distributed optimizer (NVIDIA-NeMo#8221)

* Only reduce amaxes after fp8 cast for last distopt bucket
* Handle case with FP8 and contiguous param buffer
* Support distopt buckets with mixed dtypes
* Fix bug where fp8 casts were being skipped
* Debug FP8 params with contiguous param buffer
* Separate non-FP8 params into leftover distopt bucket
* Debug FP8 params with contiguous param buffer
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Make sure to update FP8 transpose cache
* Update Apex commit, avoiding unnecessary FP8 weight transposes

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <ameister@nvidia.com>
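One of the commits above separates non-FP8 params into a leftover distopt bucket, i.e. parameters are grouped into buckets by dtype rather than mixed together. A toy sketch of that grouping idea in plain Python (the function and the fixed-size bucketing are illustrative stand-ins, not the Apex implementation):

```python
def build_buckets(params, bucket_size=2):
    """Group FP8 params into fixed-size buckets; all other dtypes go
    into one leftover bucket. Toy model of the bucketing change, not
    Apex's actual DistributedFusedAdam logic."""
    fp8 = [name for name, dtype in params if dtype == "fp8"]
    other = [name for name, dtype in params if dtype != "fp8"]
    # Chunk the FP8 params into uniform buckets.
    buckets = [fp8[i:i + bucket_size] for i in range(0, len(fp8), bucket_size)]
    if other:
        buckets.append(other)  # leftover bucket for non-FP8 params
    return buckets

params = [("w1", "fp8"), ("b1", "fp32"), ("w2", "fp8"), ("w3", "fp8")]
print(build_buckets(params))
# [['w1', 'w2'], ['w3'], ['b1']]
```

Keeping each bucket homogeneous lets the optimizer handle FP8-specific work (amax reductions, transpose caches) per bucket without special-casing individual params.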
vasunvidia pushed a commit to vasunvidia/NeMo that referenced this pull request on Feb 19, 2024 (same squashed commit message as above).
vasunvidia added a commit to vasunvidia/NeMo that referenced this pull request on Feb 19, 2024, reverting NVIDIA-NeMo#8221 (reverts commit 5521687).
layalir added a commit to layalir/NeMo that referenced this pull request on Feb 28, 2024, reverting NVIDIA-NeMo#8221 (reverts commit c84121a).
layalir added a commit to layalir/NeMo that referenced this pull request on Feb 29, 2024, reverting NVIDIA-NeMo#8221 (reverts commit c84121a).
ftxj pushed a commit to ftxj/NeMo that referenced this pull request on Feb 29, 2024 (same squashed commit message as above).
minitu pushed a commit to minitu/NeMo that referenced this pull request on Mar 7, 2024 (same squashed commit message as above).
pablo-garay pushed a commit that referenced this pull request on Mar 19, 2024 (same squashed commit message as above, with an added Signed-off-by: Pablo Garay <pagaray@nvidia.com>).
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request on Jun 25, 2024 (same squashed commit message as above).
What does this PR do?

When training GPT, the Apex distributed Adam optimizer overlaps its first parameter all-gather with the optimizer step. With this PR, the optimization applies to both FP8 and non-FP8 models.
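The core idea, launching the parameter all-gather for one bucket while compute for the next bucket proceeds, can be sketched in plain Python. The thread-based "communication" below is an illustrative stand-in for NCCL collectives on a side CUDA stream; none of these names are NeMo or Apex APIs:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def all_gather(bucket):
    # Stand-in for an NCCL all-gather; real code would launch this on a
    # separate CUDA stream so it overlaps with compute.
    time.sleep(0.01)
    return f"gathered:{bucket}"

def optimizer_step(bucket):
    # Stand-in for the Adam update on one bucket of parameters.
    return f"updated:{bucket}"

def step_with_overlap(buckets):
    """Update each bucket, overlapping the previous bucket's all-gather
    with the current bucket's optimizer step."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = None
        for bucket in buckets:
            updated = optimizer_step(bucket)      # compute current bucket
            if pending is not None:
                results.append(pending.result())  # wait on previous gather
            pending = comm.submit(all_gather, updated)  # overlap next gather
        if pending is not None:
            results.append(pending.result())
    return results

print(step_with_overlap(["b0", "b1"]))
# ['gathered:updated:b0', 'gathered:updated:b1']
```

The pipelining pattern is the point: communication for bucket N runs while compute for bucket N+1 proceeds, hiding most of the all-gather latency.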
Collection: NLP
Changelog
Usage
Run GPT, e.g. with the config at https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml.
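Taken together, the FP8 and distributed-optimizer options described in this section can be passed as Hydra overrides. An illustrative launch command (the script path follows the NeMo examples layout and is an assumption, as is omitting any cluster-specific flags):

```shell
# Illustrative only: script path assumed from the NeMo examples tree;
# the overrides are the ones listed in this PR's usage notes.
python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
    model.fp8=True \
    model.fp8_params=True \
    model.optim.name=distributed_fused_adam \
    model.optim.overlap_param_sync=True
```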
Enable FP8 support with model.fp8=True, FP8 parameters with model.fp8_params=True, the distributed optimizer with model.optim.name=distributed_fused_adam, and overlapped param all-gathers with model.optim.overlap_param_sync=True.

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information