Distributed optimizer support for experimental FP8 tensors by timmoon10 · Pull Request #7469 · NVIDIA-NeMo/NeMo

timmoon10 · 2023-09-20T20:02:22Z

What does this PR do?

This PR integrates with experimental Float8Tensors from the Transformer Engine float8tensor_experiments branch. This allows the model to store only FP8 weight matrices. The distributed optimizer stores an FP32 master copy of the weights and performs param all-gathers in FP8.
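A minimal, pure-Python sketch of the scheme described above, under stated assumptions: each "rank" keeps a master shard (Python floats, which are double precision, stand in for the FP32 copy), applies an update, agrees on one per-tensor scale from the global amax, casts only its own shard to FP8 E4M3, and the FP8 shards are all-gathered (simulated here by concatenation). The E4M3 rounding is emulated by hand, `scale = 448 / amax` is a simplified delayed-scaling choice, and plain SGD stands in for Adam; none of these function names are Transformer Engine, Apex, or NeMo APIs.

```python
import math

E4M3_MAX = 448.0  # largest finite FP8 E4M3 magnitude


def round_to_e4m3(x: float) -> float:
    """Emulate rounding a float to FP8 E4M3 (nearest, ties to even), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)          # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)          # unbiased exponent; -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)       # 3 mantissa bits; stays 2**-9 in the subnormal range
    q = round(a / step) * step    # Python's round() uses ties-to-even
    return sign * min(q, E4M3_MAX)


def cast_shard_to_fp8(shard, scale):
    """Scale then round each element; results are E4M3-representable values."""
    return [round_to_e4m3(v * scale) for v in shard]


def step_and_gather_fp8(master_shards, grad_shards, lr):
    """One simulated optimizer step followed by an FP8 param all-gather.

    master_shards[r] and grad_shards[r] are the slices owned by rank r.
    """
    # 1. Each rank updates only its own master shard (plain SGD stands in for Adam).
    updated = [
        [w - lr * g for w, g in zip(ws, gs)]
        for ws, gs in zip(master_shards, grad_shards)
    ]
    # 2. All ranks must share one per-tensor scale, so amax is reduced with max
    #    across shards (a max all-reduce in a real multi-GPU run).
    amax = max(abs(w) for shard in updated for w in shard)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    # 3. Each rank casts its shard to FP8; concatenation simulates the all-gather.
    gathered = [q for shard in updated for q in cast_shard_to_fp8(shard, scale)]
    return updated, gathered, scale


masters = [[1.0, -2.0], [0.5, 3.0]]  # two "ranks", two elements each
grads = [[0.0, 0.0], [0.0, 0.0]]
updated, fp8_weights, scale = step_and_gather_fp8(masters, grads, lr=0.1)
dequantized = [q / scale for q in fp8_weights]  # what a consumer recovers using 1/scale
print(fp8_weights)
```

Gathering in FP8 moves one byte per element instead of two for BF16. Keeping the higher-precision master copy on the optimizer side means small updates accumulate at full precision instead of being rounded away at FP8 resolution.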

Collection: NLP

Changelog

Add logic to initialize GPT with FP8 weight matrices
Add distributed optimizer support for FP8 weight matrices, including FP8 param all-gathers

Usage

Enable FP8 support:
https://github.com/NVIDIA/NeMo/blob/19a3b7015fe353199af97903df1814e3a470b503/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L169

Use Megatron-core model:
https://github.com/NVIDIA/NeMo/blob/19a3b7015fe353199af97903df1814e3a470b503/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L49

Set the optimizer to distributed_fused_adam:

https://github.com/NVIDIA/NeMo/blob/f8be40b75ee1f8437b56fcc9602dc2aaddfb0643/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L228
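The three settings can also be passed as Hydra overrides on the command line instead of editing the YAML. The sketch below only builds and prints such a command. The key names are assumed from the linked config lines (`model.fp8`, `model.mcore_gpt`, `model.optim.name`); `model.transformer_engine=True` is an additional assumption, and the training-script path should be checked against the NeMo checkout in use.

```python
# Assumed key names, read from the linked lines of megatron_gpt_config.yaml.
# "model.transformer_engine" is an extra assumption (FP8 runs through Transformer Engine).
FP8_DISTOPT_OVERRIDES = {
    "model.mcore_gpt": True,
    "model.transformer_engine": True,
    "model.fp8": True,
    "model.optim.name": "distributed_fused_adam",
}


def to_hydra_overrides(overrides):
    """Render {dotted.key: value} pairs as Hydra command-line overrides."""
    def fmt(value):
        if isinstance(value, bool):
            return "True" if value else "False"
        return str(value)
    return [f"{key}={fmt(value)}" for key, value in overrides.items()]


cmd = [
    "python",
    "examples/nlp/language_modeling/megatron_gpt_pretraining.py",
    *to_hydra_overrides(FP8_DISTOPT_OVERRIDES),
]
print(" ".join(cmd))
```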

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Pinging @sudhakarsingh27.

Additional Information

Depends on https://github.com/sudhakarsingh27/transformerengine/tree/float8tensor_experiments, but import guards handle the case where it isn't available.
Depends on NVIDIA/apex#1723 (Distributed optimizer infrastructure for FP8 parameters)
Depends on https://gitlab-master.nvidia.com/ADLR/megatron-lm/-/tree/run_fp8_poc_with_nemo/
Behavior is unchanged for non-FP8 models.
If any FP8 params are detected, overlapping is disabled for non-FP8 grad reductions and param all-gathers.
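One reading of the overlap rule above, as a small pure-Python sketch. The `Bucket` type and `overlap_plan` helper are hypothetical, and the PR text does not say whether FP8 communication keeps overlapping, so the sketch assumes it keeps whatever overlap was requested.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Bucket:
    """Hypothetical stand-in for a distributed-optimizer communication bucket."""
    name: str
    is_fp8: bool


def overlap_plan(
    buckets: List[Bucket],
    overlap_grad_sync: bool = True,
    overlap_param_sync: bool = True,
) -> Dict[str, Tuple[bool, bool]]:
    """Map bucket name -> (overlap grad reduction, overlap param all-gather)."""
    has_fp8 = any(b.is_fp8 for b in buckets)
    plan = {}
    for b in buckets:
        # No FP8 params anywhere: keep the requested settings (behavior unchanged).
        # FP8 params present: non-FP8 buckets fall back to non-overlapped communication.
        # Assumption: FP8 buckets keep the requested overlap settings.
        allowed = (not has_fp8) or b.is_fp8
        plan[b.name] = (overlap_grad_sync and allowed, overlap_param_sync and allowed)
    return plan


print(overlap_plan([Bucket("fp8_weights", True), Bucket("biases", False)]))
```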

Signed-off-by: Tim Moon <tmoon@nvidia.com>


nemo/core/optim/distributed_adam.py

+            bucket_id = fragment.bucket_id
+            bucket_start, bucket_end = fragment.bucket_range
+            param_start, param_end = fragment.param_range
+            if param_end <= param_start or bucket_id not in self._params_buckets:
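The skip condition in this excerpt matters for FP8 scaling: a parameter can be split into fragments spread over several buckets, and its amax has to cover all of them (compare the commit "Correctly accumulate amax when param is split across buckets" below). This pure-Python sketch assumes a simplified, hypothetical `Fragment` type and buckets held as plain lists; in a multi-GPU run the per-rank result would still need a max all-reduce across ranks.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Fragment:
    """Hypothetical stand-in for the fragment fields read in the diff excerpt."""
    bucket_id: int
    bucket_range: Tuple[int, int]  # [start, end) within the bucket buffer
    param_range: Tuple[int, int]   # [start, end) within the flattened parameter


def local_param_amax(fragments: List[Fragment], buckets: Dict[int, List[float]]) -> float:
    """Max |value| over every locally held fragment of one parameter."""
    amax = 0.0
    for fragment in fragments:
        bucket_start, bucket_end = fragment.bucket_range
        param_start, param_end = fragment.param_range
        # Same skip test as the excerpt: empty fragment, or bucket not held here.
        if param_end <= param_start or fragment.bucket_id not in buckets:
            continue
        values = buckets[fragment.bucket_id][bucket_start:bucket_end]
        # Fold each fragment in with max() instead of overwriting amax,
        # so a parameter split across buckets gets one amax covering all parts.
        amax = max(amax, max((abs(v) for v in values), default=0.0))
    return amax


buckets = {0: [0.1, -0.5, 2.0], 1: [-0.25, 0.2]}
param_fragments = [
    Fragment(bucket_id=0, bucket_range=(1, 3), param_range=(0, 2)),
    Fragment(bucket_id=1, bucket_range=(0, 2), param_range=(2, 4)),
    Fragment(bucket_id=2, bucket_range=(0, 4), param_range=(4, 8)),  # bucket not local: skipped
    Fragment(bucket_id=0, bucket_range=(0, 0), param_range=(8, 8)),  # empty: skipped
]
print(local_param_amax(param_fragments, buckets))
```

Overwriting `amax` per fragment instead of folding with `max()` would report 0.25 here (the last non-skipped fragment) rather than 2.0.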


nemo/core/optim/distributed_adam.py

+HAVE_TE_FP8TENSOR = False
+try:
+    from transformer_engine.pytorch import Float8Tensor
+    from transformer_engine.pytorch.cpp_extensions import cast_to_fp8


github-actions · 2023-10-16T01:45:01Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions · 2023-10-24T01:44:42Z

This PR was closed because it has been inactive for 7 days since being marked as stale.

sudhakarsingh27 and others added 5 commits September 7, 2023 16:11

fp8 poc usage with megatron-core

b049f83

Add FP8 support to distopt

48cc1d1

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Correctly accumulate amax when param is split across buckets

5b9db8c

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Debug FP8 casts in distopt

5c76317

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Optimize distopt handling of FP8 scaling factors

d85ab40

Signed-off-by: Tim Moon <tmoon@nvidia.com>

github-actions bot added the core (Changes to NeMo Core) and NLP labels Sep 20, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

3d22729

for more information, see https://pre-commit.ci

github-advanced-security bot found potential problems Sep 27, 2023


This was referenced Sep 28, 2023

Distributed optimizer support for experimental FP8 tensors (r1.20.0 branch) #7565

Closed

[PyTorch] Experimental FP8 tensor class NVIDIA/TransformerEngine#452

Merged

timmoon10 marked this pull request as draft October 1, 2023 21:22

github-actions bot added the stale label Oct 16, 2023

github-actions bot closed this Oct 24, 2023

This was referenced Nov 14, 2023

Distributed optimizer support for experimental FP8 tensors #7885

Closed

Add distopt support for FP8 params and BF16 optimizer state #7909

Merged


timmoon10 wants to merge 6 commits into NVIDIA-NeMo:main from timmoon10:fp8-distopt
