
Distributed optimizer support for contiguous param buffer with FP8 params #1749

Merged
crcrpar merged 3 commits into NVIDIA:master from timmoon10:fp8-distopt-bugfix on Nov 20, 2023

Conversation

@timmoon10
Contributor

#1723 added distopt infrastructure to support FP8 parameters in NeMo, but I found a bug with contiguous_param_buffer=True. In the non-FP8 case, the local shards of the updated params are views into the contiguous buffer: the Adam kernel outputs directly to the buffer, we do in-place all-gathers, and the params are ready for fprop. However, the FP8 case must go through a temporary buffer, since the Adam kernel doesn't support FP8: the kernel outputs to a temporary FP32 buffer, and we then cast the result to FP8 in the contiguous param buffer.
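
For illustration, here is a minimal single-process PyTorch sketch of the two paths described above. It is not the actual apex implementation: `adam_step` and `cast_to_fp8` are hypothetical stand-ins for the fused Adam kernel and the FP8 cast, and uint8 stands in for FP8 storage.

```python
import torch

def adam_step(shard_out, shard_in, grad, lr=1e-3):
    # Stand-in for the fused Adam kernel: writes the updated values into shard_out in place.
    shard_out.copy_(shard_in - lr * grad)

# Non-FP8 path: the local param shard is a view into the contiguous buffer,
# so the Adam kernel writes straight into it and an in-place all-gather over
# the buffer makes the updated params ready for fprop.
param_buffer = torch.zeros(8, dtype=torch.float32)  # contiguous param buffer
local_shard = param_buffer[0:4]                     # view, no copy
grad_shard = torch.randn(4)
adam_step(local_shard, local_shard.clone(), grad_shard)

# FP8 path: the Adam kernel can't output FP8, so it writes to a temporary FP32
# shard first, which is then cast into the FP8 contiguous param buffer before
# the in-place all-gather.
def cast_to_fp8(dst_uint8, src_fp32):
    # Hypothetical cast; the real code uses an FP8 cast kernel with proper scaling.
    dst_uint8.copy_((src_fp32.abs().clamp(max=1) * 255).round().to(torch.uint8))

fp8_param_buffer = torch.zeros(8, dtype=torch.uint8)  # FP8 storage stand-in
tmp_fp32_shard = torch.empty(4, dtype=torch.float32)  # temporary FP32 shard
adam_step(tmp_fp32_shard, torch.randn(4), grad_shard)
cast_to_fp8(fp8_param_buffer[0:4], tmp_fp32_shard)     # write into the contiguous buffer
```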

@crcrpar
Collaborator

would it be easily feasible to add a test case?

@timmoon10
Contributor Author

👍 Done
