[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. #3173
ver217 merged 8 commits into hpcaitech:main from
Conversation
Hi @yhna940 Thanks for your contribution. But the way you named them is a little confusing. For each
The code coverage for the changed files is 85%. Click me to view the complete report |
Thank you for your feedback. I understand the concern about the naming conventions and appreciate your suggestion. However, I would like to propose alternative terms, `working_param` and `master_param`, which are more closely related to the mixed-precision training context. I hope this explanation clarifies my reasoning. To summarize my suggestions: `fp16` -> `working`, `fp32` -> `master`.
Hi @1SAA Based on our previous discussions, I have renamed the variables that were causing confusion related to Mixed Precision. Could you please review them once again? Thank you!
Hi @yhna940 Thanks for your contribution, but there are some conflicts in this PR. Could you please solve them first? Thanks.
Hello @binmakeswell, the conflict has been resolved. Thank you!
Hi @yhna940 I am willing to merge your PR once the CI tests pass.
Hi @1SAA GitHub Actions CI Logs / Local Test Log
Hi @yhna940 It seems there is a memory leak in our tests. I think this problem is not caused by your code. I will fix this error soon.
@yhna940 Can you sync the updates from our main branch first? We've already fixed the CI test, so after that we can merge this PR. Thanks.
@ver217 I have synced the main branch updates to this PR. Thanks for letting me know that the CI test issue has been resolved.
The code coverage for the changed files is 86%. Click me to view the complete report |
…O optimizer (#183) ## Title - [zero] Suggests a minor change to confusing variable names in the ZeRO optimizer ## Description It seems that the variable names related to the mixed precision parameter group do not comprehensively cover its characteristics, so I suggest a few changes. These changes are very trivial, but hopefully they will alleviate some of the confusion for beginners like me. Currently, the entire parameter group is named `fp16_param_groups`, and the parts managed by the GPU at the current rank are described as `fp32_flat_param_groups_of_current_rank`. This state perfectly represents the characteristics when the master weight is a half-tensor or the dtype specified in the `__init__` method is fp16. In other cases, however, its characteristics do not correspond to the variable names. I would like to propose alternative terms, `working_param` and `master_param`, which are more closely related to the concept of mixed-precision training. Using `working_param` and `master_param` would create a clear distinction between the two types of parameters and help avoid confusion. To summarize my suggestions: - `fp16` -> `working` - `fp32` -> `master` ## Linked Issues - N/A ## Reference - hpcaitech/ColossalAI#3173
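For context on what the proposed names mean, here is a minimal sketch of the working/master pattern in mixed-precision training. The names and structure are illustrative assumptions, not ColossalAI's actual implementation:

```python
import torch

# Minimal sketch of the working/master parameter pattern (hypothetical names).

# Working params: low-precision copies used for the forward/backward pass.
working_params = [torch.randn(4, 4, dtype=torch.float16, requires_grad=True)]

# Master params: full-precision copies that the optimizer actually updates.
master_params = [p.detach().clone().float() for p in working_params]
optimizer = torch.optim.SGD(master_params, lr=0.1)

# One training step: forward/backward runs in low precision.
loss = (working_params[0] * working_params[0]).sum()
loss.backward()

for working, master in zip(working_params, master_params):
    master.grad = working.grad.float()  # promote grads to full precision
optimizer.step()                        # update the master copy
optimizer.zero_grad()

for working, master in zip(working_params, master_params):
    working.data.copy_(master.data)     # write updated weights back
```

Note how neither tensor's role is tied to a fixed dtype: the working copy could just as well be bf16, which is the ambiguity the rename addresses.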
📌 Checklist before creating the PR
[doc/gemini/tensor/...]: A concise description
🚨 Issue number
N/A
📝 What does this PR do?
It seems that the variable names related to the mixed precision parameter group do not comprehensively cover its characteristics, so I suggest a few changes. These changes are very trivial, but hopefully they will alleviate some of the confusion for beginners like me.
Currently, the entire parameter group is named `fp16_param_groups`, and the parts managed by the GPU at the current rank are described as `fp32_flat_param_groups_of_current_rank`. This state perfectly represents the characteristics when the master weight is a half-tensor or the dtype specified in the `__init__` method is fp16. In other cases, however, its characteristics do not correspond to the variable names. So I would like them to be renamed according to the sharding state rather than the data type, following the `fsdp` convention of `pytorch` (with names like `flatten_sharded_optim_state_dict` and `full_optim_state_dict`). This is a related but even more trivial point: it seems that the `param_store` methods don't even need to specify fp16. Thank you :)
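To illustrate what a name like `fp32_flat_param_groups_of_current_rank` is describing, here is a sketch of ZeRO-style sharding, where each rank keeps only its shard of a flattened master copy. The values and names are hypothetical, not ColossalAI's actual code:

```python
import torch

# Hypothetical values; normally these come from torch.distributed.
world_size, rank = 4, 1

# Working params in low precision (here bf16, to show the dtype can vary).
working_params = [torch.randn(8, dtype=torch.bfloat16),
                  torch.randn(8, dtype=torch.bfloat16)]

# Flatten the working params into one contiguous full-precision buffer...
flat_master = torch.cat([p.flatten().float() for p in working_params])

# ...and keep only this rank's shard of it.
shard_size = flat_master.numel() // world_size
master_shard = flat_master[rank * shard_size:(rank + 1) * shard_size]

# With bf16 working params (or a half-precision master copy), neither "fp16"
# nor "fp32" describes these tensors accurately; "working" and "master" do.
print(master_shard.shape)  # torch.Size([4])
```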
💥 Checklist before requesting a review
⭐️ Do you enjoy contributing to Colossal-AI?
Tell us more if you don't enjoy contributing to Colossal-AI.