[BUG]: sequence parallel test failed #4888

@littsk

Description

🐛 Describe the bug

When I enable sequence parallelism in `test_shard_bert`, the test fails with the following error:
```
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/mnt/vepfs/lctzw/ColossalAI/tests/test_shardformer/test_model/test_shard_bert.py", line 183, in check_bert
    run_bert_test()
  File "/mnt/vepfs/lctzw/ColossalAI/colossalai/testing/utils.py", line 62, in _execute_function_by_param
    partial_func(**kwargs)
  File "/mnt/vepfs/lctzw/ColossalAI/tests/test_shardformer/test_model/test_shard_bert.py", line 139, in run_bert_test
    check_forward_backward(model_fn, data_gen_fn, output_transform_fn, loss_fn, test_config)
  File "/mnt/vepfs/lctzw/ColossalAI/tests/test_shardformer/test_model/test_shard_bert.py", line 27, in check_forward_backward
    org_loss, org_output, sharded_loss, sharded_output = run_forward_backward_with_hybrid_plugin(
  File "/mnt/vepfs/lctzw/ColossalAI/tests/test_shardformer/test_model/_utils.py", line 182, in run_forward_backward_with_hybrid_plugin
    sharded_output = sharded_model(**data)
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/vepfs/lctzw/ColossalAI/colossalai/booster/plugin/hybrid_parallel_plugin.py", line 126, in forward
    return super().forward(*args, **kwargs)
  File "/mnt/vepfs/lctzw/ColossalAI/colossalai/interface/model.py", line 25, in forward
    return self.module(*args, **kwargs)
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1672, in forward
    outputs = self.bert(
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/vepfs/lctzw/ColossalAI/colossalai/shardformer/modeling/bert.py", line 1235, in forward
    embedding_output = self.embeddings(
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 238, in forward
    embeddings += position_embeddings
RuntimeError: The size of tensor a (1600) must match the size of tensor b (512) at non-singleton dimension 1
```
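The numbers in the error message suggest that the input reaching `BertEmbeddings.forward` has a sequence length of 1600, while BERT's learned position embeddings only cover `max_position_embeddings = 512`, so the in-place add at `modeling_bert.py:238` cannot broadcast. A minimal standalone sketch of the failing addition, with hypothetical batch and hidden sizes (only 1600 and 512 come from the actual error):

```python
import torch

# Hypothetical sizes; only the sequence lengths (1600 vs. 512) are taken
# from the error message above.
batch, hidden = 2, 768
embeddings = torch.zeros(batch, 1600, hidden)      # tensor a: seq_len 1600 at dim 1
position_embeddings = torch.zeros(1, 512, hidden)  # tensor b: seq_len 512 at dim 1

try:
    # Same operation as transformers/models/bert/modeling_bert.py, line 238.
    embeddings += position_embeddings
except RuntimeError as e:
    print(e)
```

This reproduces the same `RuntimeError`, which points at the sequence-parallel path feeding a sequence to the embedding layer that is longer than the model's position-embedding table rather than a per-rank shard of it.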

Environment

No response

Labels: bug (Something isn't working)