
[BUG]: Fine-tuning with the gemini plugin appears to not load model weights correctly #5034

@hmzo


🐛 Describe the bug

Environment

------------ Environment ------------
Colossal-AI version: 0.3.4
PyTorch version: 2.0.1
System CUDA version: 11.7
CUDA version required by PyTorch: 11.7

Bug details

The fine-tuning code is adapted from: https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/llama2/finetune.py

Model loading: https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/llama2/finetune.py#L237

With every variable kept identical except the plugin type, the loss behaves normally with zero2, whereas with gemini it looks as if optimization is starting from randomly initialized weights.

With zero2, the loss starts at a fairly low level and decreases normally:
[image: zero2 loss curve]

With gemini, the loss starts from a very high level and then decreases:
[image: gemini loss curve]
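For context, a minimal sketch of the comparison described above, loosely adapted from the linked finetune.py. The checkpoint path and plugin arguments here are illustrative assumptions, not the exact values used in the report:

```python
# Sketch of the setup being compared: identical training code, only the
# plugin differs. Argument values and the checkpoint path are illustrative.
from transformers import LlamaForCausalLM
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin
from colossalai.nn.optimizer import HybridAdam

USE_GEMINI = True  # flip to switch between the two plugins

if USE_GEMINI:
    # Reported symptom: loss starts very high, as if from random init
    plugin = GeminiPlugin(precision="bf16", initial_scale=2**16)
else:
    # zero2: loss starts low and decreases normally
    plugin = LowLevelZeroPlugin(stage=2, precision="bf16")

booster = Booster(plugin=plugin)

# Pretrained weights are loaded before booster.boost(), as in the example
model = LlamaForCausalLM.from_pretrained("path/to/llama2")  # hypothetical path
optimizer = HybridAdam(model.parameters(), lr=3e-4)

# After boosting, the gemini run behaves as if the loaded weights were lost
model, optimizer, *_ = booster.boost(model, optimizer)
```

If `booster.boost()` re-initializes or re-shards parameters without preserving the weights loaded by `from_pretrained`, that would match the loss curves shown above.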

