Open
Description
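The SFT command you provided has two parameters: --per_device_train_batch_size 1 --gradient_accumulation_steps 2
So the actual batch size is 1 × 2 × the number of GPUs. I tried to increase the training batch size by raising per_device_train_batch_size, but that raises an error; from what I found by searching, it seems Open R1 does not recommend changing this parameter. If I increase gradient_accumulation_steps instead, will that affect performance?
I am using 4 GPUs. With the original parameters the batch size is only 8. Is that too small?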
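For reference, the batch-size arithmetic in this question can be sketched as follows. This is only a minimal illustration: the helper name `effective_batch_size` is hypothetical and not part of Open R1 or any library, and it assumes plain data-parallel training where each GPU processes its own micro-batch.

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int) -> int:
    """Samples contributing to one optimizer step (hypothetical helper)."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus


# Original settings from the SFT command, on 4 GPUs.
print(effective_batch_size(1, 2, 4))

# Same per-device batch size, more accumulation steps (illustrative value).
print(effective_batch_size(1, 8, 4))
```

Under these assumptions the first call gives 8 and the second gives 32, so the per-step batch can be grown through gradient_accumulation_steps alone while per_device_train_batch_size stays at 1.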