[BUG]: Llama2 HybridParallelPlugin train failed when pp_size>1 #4705

@xs1997zju

Description

🐛 Describe the bug

Llama2 HybridParallelPlugin train failed when pp_size>1
We modified the Llama training example to use the HybridParallelPlugin, but training fails with the following error:
`hidden_states` is None at colossalai/shardformer/modeling/llama.py:61 in `llama_model_forward`.
[screenshot of the traceback]
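For context, a minimal sketch of the failure mode (illustrative only; the function and variable names below are stand-ins, not ColossalAI's actual API): under pipeline parallelism, only the first stage embeds `input_ids`, and every later stage must receive `hidden_states` from the previous stage. If the schedule fails to pass them along, the non-first stage sees `hidden_states=None` and the forward crashes, matching the error above.

```python
def stage_forward(stage_index, input_ids=None, hidden_states=None):
    """Toy pipeline stage: stage 0 embeds tokens, later stages consume
    the previous stage's hidden states."""
    if stage_index == 0:
        # first stage: embed the token ids (stand-in for the real embedding)
        hidden_states = [float(t) for t in input_ids]
    elif hidden_states is None:
        # this is the failure mode reported in this issue
        raise ValueError("hidden_states is None on a non-first pipeline stage")
    # stand-in for this stage's transformer layers
    return [h + 1.0 for h in hidden_states]

# correct hand-off: stage 0's output feeds stage 1
out0 = stage_forward(0, input_ids=[1, 2, 3])   # [2.0, 3.0, 4.0]
out1 = stage_forward(1, hidden_states=out0)    # [3.0, 4.0, 5.0]
```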

Environment

torch 1.13.1+cu117, transformers 4.32.0, ColossalAI release v0.3.2

Labels: bug (Something isn't working)
