🐛 Describe the bug

When I run pretraining for 1000 steps with the hybrid parallel plugin and the run tries to save the model, checkpoint saving fails with the traceback below. I didn't change any code in pretrain.py.
```
Saving checkpoint
Traceback (most recent call last):
  File "/mnt/hwfile/songyixin/ColossalAI/examples/language/llama2/pretrain.py", line 316, in <module>
    main()
  File "/mnt/hwfile/songyixin/ColossalAI/examples/language/llama2/pretrain.py", line 305, in main
    save(booster, model, optimizer, lr_scheduler, epoch, step + 1, args.batch_size, coordinator,
  File "/mnt/hwfile/songyixin/ColossalAI/examples/language/llama2/pretrain.py", line 87, in save
    booster.save_model(model, os.path.join(save_dir, 'model'), shard=True)
  File "/mnt/hwfile/songyixin/ColossalAI/colossalai/booster/booster.py", line 242, in save_model
    self.checkpoint_io.save_model(model,
  File "/mnt/hwfile/songyixin/ColossalAI/colossalai/checkpoint_io/checkpoint_io_base.py", line 140, in save_model
    self.save_sharded_model(model, checkpoint, gather_dtensor, prefix, size_per_shard, use_safetensors)
  File "/mnt/hwfile/songyixin/ColossalAI/colossalai/checkpoint_io/hybrid_parallel_checkpoint_io.py", line 231, in save_sharded_model
    total_size = save_state_dict_shards(sharded_state_dict=state_dict_shard,
  File "/mnt/hwfile/songyixin/ColossalAI/colossalai/checkpoint_io/utils.py", line 256, in save_state_dict_shards
    save_state_dict(shard, checkpoint_file_path, use_safetensors=use_safetensors)
  File "/mnt/hwfile/songyixin/ColossalAI/colossalai/checkpoint_io/utils.py", line 324, in save_state_dict
    torch.save(state_dict, checkpoint_file_path)
  File "/mnt/petrelfs/songyixin/miniconda3/envs/cai/lib/python3.9/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/mnt/petrelfs/songyixin/miniconda3/envs/cai/lib/python3.9/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object
```
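For what it's worth, the error itself reproduces outside ColossalAI with a minimal script (my own sketch, distilled from the traceback, not code from the repo): torch.save pickles every value in the dict it is given, and a ProcessGroup handle is not picklable, so any state dict shard that still carries one fails exactly this way.

```python
import torch
import torch.distributed as dist

# Single-process setup just so we can create a real ProcessGroup object.
dist.init_process_group(backend="gloo",
                        init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)
pg = dist.new_group(ranks=[0])  # any ProcessGroup handle triggers the failure

# torch.save pickles every value; the tensor is fine, the ProcessGroup is not.
state_dict = {"weight": torch.zeros(2, 2), "process_group": pg}
torch.save(state_dict, "repro.pt")
# TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object
```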
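As a temporary local workaround (a hedged sketch of my own, not an upstream fix; the helper name drop_unpicklable is hypothetical), I can get checkpoints to write by filtering entries that pickle refuses to serialize before the torch.save call in colossalai/checkpoint_io/utils.py, though this silently drops whatever object leaked into the shard:

```python
import pickle
import torch

def drop_unpicklable(state_dict: dict) -> dict:
    """Return a copy of state_dict without entries pickle cannot serialize.

    Diagnostic sketch only: pickling every value twice is wasteful, but it
    pinpoints which key carries the ProcessGroup handle.
    """
    clean = {}
    for key, value in state_dict.items():
        try:
            pickle.dumps(value)
            clean[key] = value
        except TypeError:
            print(f"skipping unpicklable entry: {key} ({type(value).__name__})")
    return clean

# e.g. replacing the failing call in save_state_dict:
# torch.save(drop_unpicklable(state_dict), checkpoint_file_path)
```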
Environment
No response