[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer#183
Merged: hyunwoongko merged 2 commits into EleutherAI:main on May 25, 2023
Conversation

KKIEEK approved these changes on May 8, 2023
dyanos pushed a commit that referenced this pull request on Jun 8, 2023
…O optimizer (#183)

## Title

- [zero] Suggests a minor change to confusing variable names in the ZeRO optimizer

## Description

It seems that the variable names related to the mixed-precision parameter group do not comprehensively cover its characteristics, so I suggest a few changes. These changes are very trivial, but hopefully they will alleviate some of the confusion for beginners like me.

Currently, the entire parameter group is named `fp16_param_groups`, and the parts managed by the GPU at the current rank are described as `fp32_flat_param_groups_of_current_rank`. These names are accurate when the master weight is a half-precision tensor, or when the dtype specified in the `__init__` method is fp16. In other cases, however, the names no longer match the actual dtypes.

I would like to propose the alternative terms `working_param` and `master_param`, which are more closely tied to the mixed-precision training context. Using `working_param` and `master_param` would create a clear distinction between the two types of parameters and help avoid confusion.

To summarize my suggestions:

- `fp16` -> `working`
- `fp32` -> `master`

## Linked Issues

- N/A

## Reference

- hpcaitech/ColossalAI#3173
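To make the working/master distinction concrete, here is a minimal sketch (not the actual ZeRO implementation; `ToyMixedPrecisionSGD` is a hypothetical class, with NumPy arrays standing in for tensors). The "working" parameters are the low-precision copies the model computes with, while the "master" parameters are the full-precision copies the optimizer actually updates:

```python
import numpy as np

class ToyMixedPrecisionSGD:
    """Hypothetical illustration of working vs. master parameters."""

    def __init__(self, params, lr=0.1):
        # working params: what the forward/backward pass uses (e.g. fp16)
        self.working_params = [p.astype(np.float16) for p in params]
        # master params: full-precision copies owned by the optimizer
        self.master_params = [p.astype(np.float32) for p in params]
        self.lr = lr

    def step(self, grads):
        for master, working, grad in zip(self.master_params,
                                         self.working_params, grads):
            # update in fp32 so small updates are not lost to fp16 rounding
            master -= self.lr * grad.astype(np.float32)
            # copy the updated master weights back into the working copy
            working[...] = master.astype(np.float16)

opt = ToyMixedPrecisionSGD([np.array([1.0, 2.0])])
opt.step([np.array([0.5, 0.5], dtype=np.float16)])
print(opt.working_params[0].dtype)  # float16
print(opt.master_params[0].dtype)   # float32
```

Under the proposed naming, the two lists describe their *roles* in mixed-precision training rather than dtypes that may not hold for every configuration.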