🐛 Describe the bug
Training with ColossalAI/examples/language/llama2/pretrain.py works fine, but saving the lr_scheduler state while writing a checkpoint raises an error:
Traceback (most recent call last):
  File "/scratch/users/nus/tomyoung/Colossal-llama2/ColossalAI/examples/language/llama2/pretrain.py", line 320, in <module>
    main()
  File "/scratch/users/nus/tomyoung/Colossal-llama2/ColossalAI/examples/language/llama2/pretrain.py", line 309, in main
    save(booster, model, optimizer, lr_scheduler, epoch, step + 1, args.batch_size, coordinator,
  File "/scratch/users/nus/tomyoung/Colossal-llama2/ColossalAI/examples/language/llama2/pretrain.py", line 89, in save
    booster.save_lr_scheduler(lr_scheduler, os.path.join(save_dir, 'lr_scheduler'))
  File "/scratch/users/nus/tomyoung/Colossal-llama2/ColossalAI/colossalai/booster/booster.py", line 293, in save_lr_scheduler
    self.checkpoint_io.save_lr_scheduler(lr_scheduler, checkpoint)
  File "/scratch/users/nus/tomyoung/Colossal-llama2/ColossalAI/colossalai/booster/plugin/torch_ddp_plugin.py", line 50, in save_lr_scheduler
    super().save_lr_scheduler(lr_scheduler, checkpoint)
  File "/scratch/users/nus/tomyoung/Colossal-llama2/ColossalAI/colossalai/checkpoint_io/checkpoint_io_base.py", line 321, in save_lr_scheduler
    torch.save(lr_scheduler.state_dict(), checkpoint)
  File "/scratch/users/nus/tomyoung/Colossal-llama2/ColossalAI/colossalai/nn/lr_scheduler/delayed.py", line 94, in state_dict
    raise NotImplementedError()
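To illustrate the failure mode, here is a minimal pure-Python sketch. The class names and the workaround are hypothetical stand-ins, not the actual ColossalAI implementation: the point is that a wrapper scheduler whose state_dict() is left unimplemented will crash any checkpoint path that calls it, while the wrapped scheduler's own state is still retrievable.

```python
class _BaseScheduler:
    """Stand-in for a torch.optim LR scheduler with a working state_dict()."""

    def __init__(self, lr):
        self.lr = lr

    def state_dict(self):
        return {"lr": self.lr}


class DelayedScheduler:
    """Mimics the pattern in colossalai/nn/lr_scheduler/delayed.py:
    the wrapper delegates stepping to an inner scheduler but never
    implements state_dict()."""

    def __init__(self, inner):
        self.inner = inner

    def state_dict(self):
        raise NotImplementedError()  # <- the line the traceback points at


sched = DelayedScheduler(_BaseScheduler(lr=3e-4))

# booster.save_lr_scheduler() effectively does torch.save(sched.state_dict(), ...),
# so the call below is where checkpointing blows up.
try:
    sched.state_dict()
    failed = False
except NotImplementedError:
    failed = True

# Possible interim workaround (an assumption, not verified against the real
# ColossalAI API): serialize the wrapped scheduler's state directly.
inner_state = sched.inner.state_dict()
print(failed, inner_state)
```

A proper fix would presumably implement state_dict()/load_state_dict() on the delayed wrapper itself, so the Booster checkpoint path works unchanged.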

Environment
CUDA 11.6
torch.__version__: '2.0.0+cu117'