[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer#183
Merged: hyunwoongko merged 2 commits into EleutherAI:main on May 25, 2023
Conversation

KKIEEK approved these changes on May 8, 2023
dyanos pushed a commit that referenced this pull request on Jun 8, 2023
…O optimizer (#183)

## Title

- [zero] Suggests a minor change to confusing variable names in the ZeRO optimizer

## Description

It seems that the variable names related to the mixed-precision parameter group do not comprehensively cover its characteristics, so I suggest a few changes. These changes are very trivial, but hopefully they will alleviate some of the confusion for beginners like me.

Currently, the entire parameter group is named `fp16_param_groups`, and the parts managed by the GPU at the current rank are described as `fp32_flat_param_groups_of_current_rank`. These names are accurate when the master weight is a half-precision tensor, or when the dtype specified in the `__init__` method is fp16. In other cases, however, the names no longer match the actual dtypes.

I would like to propose the alternative terms `working_param` and `master_param`, which are more closely tied to the mixed-precision training context. Using `working_param` and `master_param` would create a clear distinction between the two types of parameters and help avoid confusion.

To summarize my suggestions:

- `fp16` -> `working`
- `fp32` -> `master`

## Linked Issues

- N/A

## Reference

- hpcaitech/ColossalAI#3173
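To make the working/master distinction concrete, here is a minimal sketch (not the actual ZeRO implementation; `ToyMixedPrecisionSGD` is a hypothetical class, with NumPy arrays standing in for tensors). The "working" parameters are the low-precision copies the model computes with, while the "master" parameters are the full-precision copies the optimizer actually updates:

```python
import numpy as np

class ToyMixedPrecisionSGD:
    """Hypothetical illustration of working vs. master parameters."""

    def __init__(self, params, lr=0.1):
        # working params: what the forward/backward pass uses (e.g. fp16)
        self.working_params = [p.astype(np.float16) for p in params]
        # master params: full-precision copies owned by the optimizer
        self.master_params = [p.astype(np.float32) for p in params]
        self.lr = lr

    def step(self, grads):
        for master, working, grad in zip(self.master_params,
                                         self.working_params, grads):
            # update in fp32 so small updates are not lost to fp16 rounding
            master -= self.lr * grad.astype(np.float32)
            # copy the updated master weights back into the working copy
            working[...] = master.astype(np.float16)

opt = ToyMixedPrecisionSGD([np.array([1.0, 2.0])])
opt.step([np.array([0.5, 0.5], dtype=np.float16)])
print(opt.working_params[0].dtype)  # float16
print(opt.master_params[0].dtype)   # float32
```

Under the proposed naming, the two lists describe their *roles* in mixed-precision training rather than dtypes that may not hold for every configuration.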