🐛 Describe the bug
When enabling sequence parallelism in test_shard_bert, I encountered the following error.
```
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/mnt/vepfs/lctzw/ColossalAI/tests/test_shardformer/test_model/test_shard_bert.py", line 183, in check_bert
    run_bert_test()
  File "/mnt/vepfs/lctzw/ColossalAI/colossalai/testing/utils.py", line 62, in _execute_function_by_param
    partial_func(**kwargs)
  File "/mnt/vepfs/lctzw/ColossalAI/tests/test_shardformer/test_model/test_shard_bert.py", line 139, in run_bert_test
    check_forward_backward(model_fn, data_gen_fn, output_transform_fn, loss_fn, test_config)
  File "/mnt/vepfs/lctzw/ColossalAI/tests/test_shardformer/test_model/test_shard_bert.py", line 27, in check_forward_backward
    org_loss, org_output, sharded_loss, sharded_output = run_forward_backward_with_hybrid_plugin(
  File "/mnt/vepfs/lctzw/ColossalAI/tests/test_shardformer/test_model/_utils.py", line 182, in run_forward_backward_with_hybrid_plugin
    sharded_output = sharded_model(**data)
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/vepfs/lctzw/ColossalAI/colossalai/booster/plugin/hybrid_parallel_plugin.py", line 126, in forward
    return super().forward(*args, **kwargs)
  File "/mnt/vepfs/lctzw/ColossalAI/colossalai/interface/model.py", line 25, in forward
    return self.module(*args, **kwargs)
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1672, in forward
    outputs = self.bert(
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/vepfs/lctzw/ColossalAI/colossalai/shardformer/modeling/bert.py", line 1235, in forward
    embedding_output = self.embeddings(
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/vepfs/lctzw/colossal_env/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 238, in forward
    embeddings += position_embeddings
RuntimeError: The size of tensor a (1600) must match the size of tensor b (512) at non-singleton dimension 1
```
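For reference, the shape mismatch can be reproduced in isolation: BERT's learned position embeddings are bounded by `max_position_embeddings` (512 by default), so the broadcast add in `modeling_bert.py` (`embeddings += position_embeddings`) fails as soon as the sequence dimension of the hidden states exceeds that limit. A minimal standalone sketch, using plain tensors rather than the actual test harness (the sequence length 1600 is taken from the error message; the hidden size 768 is an assumed BERT-base value):

```python
import torch

batch_size, hidden_size = 1, 768  # assumed BERT-base hidden size
seq_len = 1600                    # size of tensor a in the error message
max_positions = 512               # BertConfig.max_position_embeddings default

word_embeddings = torch.randn(batch_size, seq_len, hidden_size)
position_embeddings = torch.randn(1, max_positions, hidden_size)

# Mirrors `embeddings += position_embeddings` in modeling_bert.py:
# raises RuntimeError: The size of tensor a (1600) must match the size of
# tensor b (512) at non-singleton dimension 1
word_embeddings += position_embeddings
```

This suggests that with sequence parallelism enabled, the embedding layer is being handed a full (or over-gathered) sequence rather than one bounded by the model's maximum position count.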
Environment
No response