🐛 Describe the bug
import torch
from transformers import BloomForCausalLM
from colossalai.tensor import ProcessGroup, ShardSpec
from colossalai.utils import get_current_device
from colossalai.utils.model.colo_init_context import ColoInitContext

world_size = torch.distributed.get_world_size()
shard_pg = ProcessGroup(tp_degree=world_size) if args.shardinit else None
default_dist_spec = ShardSpec([-1], [world_size]) if args.shardinit else None
with ColoInitContext(device=get_current_device(),
                     dtype=torch.half,
                     default_dist_spec=default_dist_spec,
                     default_pg=shard_pg):
    model = BloomForCausalLM.from_pretrained(args.model_name_or_path)
When using shardinit, the model is first sharded across multiple GPUs, and only then is the Hugging Face pretrained checkpoint loaded, so a checkpoint shape mismatch occurs:
RuntimeError: Error(s) in loading state_dict for BloomForCausalLM:
    size mismatch for transformer.word_embeddings.weight: copying a param with shape torch.Size([46145, 4096]) from checkpoint, the shape in current model is torch.Size([46145, 512]).
    size mismatch for transformer.word_embeddings_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for transformer.word_embeddings_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for transformer.h.0.input_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([512]).
I would like to know how to load the Hugging Face pretrained weights successfully when using shardinit; it seems necessary when fine-tuning a very large model.
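For reference, 4096 / 512 = 8, so the default ShardSpec([-1], [world_size]) is slicing the last dimension of every parameter across what appear to be 8 ranks, which matches the traceback above exactly. Below is a minimal sketch of the kind of workaround I have in mind, assuming every parameter really is sharded along its last dimension and that load_state_dict will copy plain tensors into the sharded parameters; shard_state_dict_last_dim is an illustrative helper, not a ColossalAI API, and materializing the full checkpoint on CPU first is expensive for very large models, which is the point of this question.

import torch
import torch.distributed as dist
from transformers import BloomForCausalLM

def shard_state_dict_last_dim(full_state_dict, rank, world_size):
    # Keep only this rank's slice of each tensor along dim=-1,
    # matching a default ShardSpec([-1], [world_size]) layout.
    return {
        name: torch.chunk(tensor, world_size, dim=-1)[rank].contiguous()
        for name, tensor in full_state_dict.items()
    }

rank = dist.get_rank()
# Full pretrained weights materialized once on CPU, then sliced so each
# rank copies only its own shard into the already-sharded model above.
full_sd = BloomForCausalLM.from_pretrained(args.model_name_or_path).state_dict()
model.load_state_dict(shard_state_dict_last_dim(full_sd, rank, world_size))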
Environment
No response