
[booster] implement Gemini plugin #3352

Merged
ver217 merged 12 commits into hpcaitech:main from ver217:feature/booster-gemini
Mar 31, 2023

Conversation

ver217 (Contributor) commented Mar 30, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number


Closes #3351

📝 What does this PR do?


This PR adds a booster plugin for Gemini. I tested the forward/backward/optimizer steps against our model zoo.

Checkpoint IO is not tested.

Gemini is incompatible with a number of models; a compatibility report is attached below, and the failing models will be temporarily skipped.
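For reviewers, here is a minimal sketch of how the new plugin is meant to be driven through the Booster API. The class and method names (`GeminiPlugin`, `Booster.boost`, `Booster.backward`) are taken from this PR; the exact constructor arguments and the `boost` return signature are assumptions and may differ from the final code. The ColossalAI imports are kept inside `main()` so the sketch reads standalone; it would be launched with `torchrun`/`colossalai run` in practice.

```python
def main():
    # Assumed API surface from this PR; run under a distributed launcher.
    import torch
    import colossalai
    from colossalai.booster import Booster
    from colossalai.booster.plugin import GeminiPlugin

    colossalai.launch_from_torch(config={})

    model = torch.nn.Linear(8, 8).cuda()
    optimizer = torch.optim.Adam(model.parameters())
    criterion = torch.nn.MSELoss()

    # Wrap everything with the Gemini plugin via the booster.
    plugin = GeminiPlugin()  # default placement/offload settings assumed
    booster = Booster(plugin=plugin)
    model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

    # One training step: forward, booster-managed backward, optimizer step.
    x = torch.rand(4, 8).cuda()
    loss = criterion(model(x), torch.rand(4, 8).cuda())
    booster.backward(loss, optimizer)
    optimizer.step()


if __name__ == "__main__":
    main()
```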

Compatibility report

Passed models(59): ['diffusers_clip_model', 'diffusers_clip_text_model', 'timm_cait', 'timm_convmixer', 'timm_efficientnetv2', 'timm_vision_transformer', 'timm_deit', 'timm_coat', 'timm_deit3', 'timm_ese_vovnet19b_dw', 'timm_hardcorenas_a', 'timm_inception_v3', 'timm_regnetv_040', 'timm_tnt_b_patch16_224', 'timm_vgg', 'timm_dpn', 'timm_densenet', 'timm_rexnet', 'torchaudio_convtasnet', 'torchaudio_emformer', 'torchaudio_wav2letter_waveform', 'torchaudio_wav2letter_mfcc', 'torchaudio_wav2vec2_base', 'deepfm_densearch', 'deepfm_overarch', 'deepfm_sparsearch', 'dlrm_densearch', 'dlrm_overarch', 'dlrm_sparsearch', 'torchvision_alexnet', 'torchvision_densenet121', 'torchvision_efficientnet_b0', 'torchvision_mobilenet_v2', 'torchvision_mnasnet0_5', 'torchvision_regnet_x_16gf', 'torchvision_shufflenet_v2_x0_5', 'torchvision_squeezenet1_0', 'torchvision_vgg11', 'torchvision_efficientnet_v2_s', 'transformers_albert_for_masked_lm', 'transformers_albert_for_sequence_classification', 'transformers_albert_for_token_classification', 'transformers_albert_for_question_answering', 'transformers_albert_for_multiple_choice', 'transformers_bert_lm_head_model', 'transformers_bert_for_masked_lm', 'transformers_bert_for_sequence_classification', 'transformers_bert_for_token_classification', 'transformers_bert_for_next_sentence', 'transformers_bert_for_mcq', 'transformers_gpt', 'transformers_gpt_lm', 'transformers_gpt_for_token_classification', 'transformers_gpt_for_sequence_classification', 'transformers_opt', 'transformers_opt_for_causal_lm', 'transformers_t5', 'transformers_t5_for_conditional_generation', 'transformers_t5_encoder_model']

Failed models(37): ['diffusers_clip_vision_model', 'timm_resnet', 'timm_beit', 'timm_beitv2', 'timm_eca_nfnet', 'timm_efficientformer', 'timm_hrnet_w18_small', 'timm_nf_ecaresnet101', 'timm_nf_regnet_b0', 'timm_skresnet18', 'timm_wide_resnet50_2', 'timm_convit', 'timm_dm_nfnet', 'timm_swin_transformer', 'torchaudio_conformer', 'torchaudio_deepspeech', 'torchaudio_hubert_base', 'torchaudio_wavernn', 'torchaudio_tacotron', 'deepfm_interactionarch', 'deepfm_simpledeepfmnn', 'dlrm', 'dlrm_interactionarch', 'torchvision_googlenet', 'torchvision_inception_v3', 'torchvision_mobilenet_v3_small', 'torchvision_resnet18', 'torchvision_resnext50_32x4d', 'torchvision_wide_resnet50_2', 'torchvision_vit_b_16', 'torchvision_convnext_base', 'torchvision_swin_s', 'transformers_albert', 'transformers_albert_for_pretraining', 'transformers_bert', 'transformers_bert_for_pretraining', 'transformers_gpt_double_heads']

Failure reasons:

diffusers_clip_vision_model: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not
compatible with ZeroDDP.\n', 'Reduction failed at followed parameters:\n\tvision_model.embeddings.class_embedding\n\tvision_model.embeddings.patch_embedding.weight\n\tvision_model
.embeddings.position_embedding.weight\n\tvision_model.pre_layrnorm.weight\n\tvision_model.pre_layrnorm.bias\n\tvision_model.encoder.layers.0.self_attn.k_proj.weight\n\tvision_mode
l.encoder.layers.0.self_attn.k_proj.bias\n\tvision_model.encoder.layers.0.self_attn.v_proj.weight\n\tvision_model.encoder.layers.0.self_attn.v_proj.bias\n\tvision_model.encoder.la
yers.0.self_attn.q_proj.weight\n\tvision_model.encoder.layers.0.self_attn.q_proj.bias\n\tvision_model.encoder.layers.0.self_attn.out_proj.weight\n\tvision_model.encoder.layers.0.s
elf_attn.out_proj.bias\n\tvision_model.encoder.layers.0.layer_norm1.weight\n\tvision_model.encoder.layers.0.layer_norm1.bias\n\tvision_model.encoder.layers.0.mlp.fc1.weight\n\tvis
ion_model.encoder.layers.0.mlp.fc1.bias\n\tvision_model.encoder.layers.0.mlp.fc2.weight\n\tvision_model.encoder.layers.0.mlp.fc2.bias\n\tvision_model.encoder.layers.0.layer_norm2.
weight\n\tvision_model.encoder.layers.0.layer_norm2.bias\n\tvision_model.encoder.layers.1.self_attn.k_proj.weight\n\tvision_model.encoder.layers.1.self_attn.k_proj.bias\n\tvision_
model.encoder.layers.1.self_attn.v_proj.weight\n\tvision_model.encoder.layers.1.self_attn.v_proj.bias\n\tvision_model.encoder.layers.1.self_attn.q_proj.weight\n\tvision_model.enco
der.layers.1.self_attn.q_proj.bias\n\tvision_model.encoder.layers.1.self_attn.out_proj.weight\n\tvision_model.encoder.layers.1.self_attn.out_proj.bias\n\tvision_model.encoder.laye
rs.1.layer_norm1.weight\n\tvision_model.encoder.layers.1.layer_norm1.bias\n\tvision_model.encoder.layers.1.mlp.fc1.weight\n\tvision_model.encoder.layers.1.mlp.fc1.bias\n\tvision_m
odel.encoder.layers.1.mlp.fc2.weight\n\tvision_model.encoder.layers.1.mlp.fc2.bias\n\tvision_model.encoder.layers.1.layer_norm2.weight\n\tvision_model.encoder.layers.1.layer_norm2
.bias\n\tvision_model.encoder.layers.2.self_attn.k_proj.weight\n\tvision_model.encoder.layers.2.self_attn.k_proj.bias\n\tvision_model.encoder.layers.2.self_attn.v_proj.weight\n\tv
ision_model.encoder.layers.2.self_attn.v_proj.bias\n\tvision_model.encoder.layers.2.self_attn.q_proj.weight\n\tvision_model.encoder.layers.2.self_attn.q_proj.bias\n\tvision_model.
encoder.layers.2.self_attn.out_proj.weight\n\tvision_model.encoder.layers.2.self_attn.out_proj.bias\n\tvision_model.encoder.layers.2.layer_norm1.weight\n\tvision_model.encoder.lay
ers.2.layer_norm1.bias\n\tvision_model.encoder.layers.2.mlp.fc1.weight\n\tvision_model.encoder.layers.2.mlp.fc1.bias\n\tvision_model.encoder.layers.2.mlp.fc2.weight\n\tvision_mode
l.encoder.layers.2.mlp.fc2.bias\n\tvision_model.encoder.layers.2.layer_norm2.weight\n\tvision_model.encoder.layers.2.layer_norm2.bias\n\tvision_model.encoder.layers.3.self_attn.k_
proj.weight\n\tvision_model.encoder.layers.3.self_attn.k_proj.bias\n\tvision_model.encoder.layers.3.self_attn.v_proj.weight\n\tvision_model.encoder.layers.3.self_attn.v_proj.bias\
n\tvision_model.encoder.layers.3.self_attn.q_proj.weight\n\tvision_model.encoder.layers.3.self_attn.q_proj.bias\n\tvision_model.encoder.layers.3.self_attn.out_proj.weight\n\tvisio
n_model.encoder.layers.3.self_attn.out_proj.bias\n\tvision_model.encoder.layers.3.layer_norm1.weight\n\tvision_model.encoder.layers.3.layer_norm1.bias\n\tvision_model.encoder.laye
rs.3.mlp.fc1.weight\n\tvision_model.encoder.layers.3.mlp.fc1.bias\n\tvision_model.encoder.layers.3.mlp.fc2.weight\n\tvision_model.encoder.layers.3.mlp.fc2.bias\n\tvision_model.enc
oder.layers.3.layer_norm2.weight\n\tvision_model.encoder.layers.3.layer_norm2.bias\n\tvision_model.encoder.layers.4.self_attn.k_proj.weight\n\tvision_model.encoder.layers.4.self_a
ttn.k_proj.bias\n\tvision_model.encoder.layers.4.self_attn.v_proj.weight\n\tvision_model.encoder.layers.4.self_attn.v_proj.bias\n\tvision_model.encoder.layers.4.self_attn.q_proj.w
eight\n\tvision_model.encoder.layers.4.self_attn.q_proj.bias\n\tvision_model.encoder.layers.4.self_attn.out_proj.weight\n\tvision_model.encoder.layers.4.self_attn.out_proj.bias\n\
tvision_model.encoder.layers.4.layer_norm1.weight\n\tvision_model.encoder.layers.4.layer_norm1.bias\n\tvision_model.encoder.layers.4.mlp.fc1.weight\n\tvision_model.encoder.layers.
4.mlp.fc1.bias\n\tvision_model.encoder.layers.4.mlp.fc2.weight\n\tvision_model.encoder.layers.4.mlp.fc2.bias\n\tvision_model.encoder.layers.4.layer_norm2.weight\n\tvision_model.en
coder.layers.4.layer_norm2.bias\n\tvision_model.encoder.layers.5.self_attn.k_proj.weight\n\tvision_model.encoder.layers.5.self_attn.k_proj.bias\n\tvision_model.encoder.layers.5.se
lf_attn.v_proj.weight\n\tvision_model.encoder.layers.5.self_attn.v_proj.bias\n\tvision_model.encoder.layers.5.self_attn.q_proj.weight\n\tvision_model.encoder.layers.5.self_attn.q_
proj.bias\n\tvision_model.encoder.layers.5.self_attn.out_proj.weight\n\tvision_model.encoder.layers.5.self_attn.out_proj.bias\n\tvision_model.encoder.layers.5.layer_norm1.weight\n
\tvision_model.encoder.layers.5.layer_norm1.bias\n\tvision_model.encoder.layers.5.mlp.fc1.weight\n\tvision_model.encoder.layers.5.mlp.fc1.bias\n\tvision_model.encoder.layers.5.mlp
.fc2.weight\n\tvision_model.encoder.layers.5.mlp.fc2.bias\n\tvision_model.encoder.layers.5.layer_norm2.weight\n\tvision_model.encoder.layers.5.layer_norm2.bias\n\tvision_model.enc
oder.layers.6.self_attn.k_proj.weight\n\tvision_model.encoder.layers.6.self_attn.k_proj.bias\n\tvision_model.encoder.layers.6.self_attn.v_proj.weight\n\tvision_model.encoder.layer
s.6.self_attn.v_proj.bias\n\tvision_model.encoder.layers.6.self_attn.q_proj.weight\n\tvision_model.encoder.layers.6.self_attn.q_proj.bias\n\tvision_model.encoder.layers.6.self_att
n.out_proj.weight\n\tvision_model.encoder.layers.6.self_attn.out_proj.bias\n\tvision_model.encoder.layers.6.layer_norm1.weight\n\tvision_model.encoder.layers.6.layer_norm1.bias\n\
tvision_model.encoder.layers.6.mlp.fc1.weight\n\tvision_model.encoder.layers.6.mlp.fc1.bias\n\tvision_model.encoder.layers.6.mlp.fc2.weight\n\tvision_model.encoder.layers.6.mlp.fc
2.bias\n\tvision_model.encoder.layers.6.layer_norm2.weight\n\tvision_model.encoder.layers.6.layer_norm2.bias\n\tvision_model.encoder.layers.7.self_attn.k_proj.weight\n\tvision_mod
el.encoder.layers.7.self_attn.k_proj.bias\n\tvision_model.encoder.layers.7.self_attn.v_proj.weight\n\tvision_model.encoder.layers.7.self_attn.v_proj.bias\n\tvision_model.encoder.l
ayers.7.self_attn.q_proj.weight\n\tvision_model.encoder.layers.7.self_attn.q_proj.bias\n\tvision_model.encoder.layers.7.self_attn.out_proj.weight\n\tvision_model.encoder.layers.7.
self_attn.out_proj.bias\n\tvision_model.encoder.layers.7.layer_norm1.weight\n\tvision_model.encoder.layers.7.layer_norm1.bias\n\tvision_model.encoder.layers.7.mlp.fc1.weight\n\tvi
sion_model.encoder.layers.7.mlp.fc1.bias\n\tvision_model.encoder.layers.7.mlp.fc2.weight\n\tvision_model.encoder.layers.7.mlp.fc2.bias\n\tvision_model.encoder.layers.7.layer_norm2
.weight\n\tvision_model.encoder.layers.7.layer_norm2.bias\n\tvision_model.encoder.layers.8.self_attn.k_proj.weight\n\tvision_model.encoder.layers.8.self_attn.k_proj.bias\n\tvision
_model.encoder.layers.8.self_attn.v_proj.weight\n\tvision_model.encoder.layers.8.self_attn.v_proj.bias\n\tvision_model.encoder.layers.8.self_attn.q_proj.weight\n\tvision_model.enc
oder.layers.8.self_attn.q_proj.bias\n\tvision_model.encoder.layers.8.self_attn.out_proj.weight\n\tvision_model.encoder.layers.8.self_attn.out_proj.bias\n\tvision_model.encoder.lay
ers.8.layer_norm1.weight\n\tvision_model.encoder.layers.8.layer_norm1.bias\n\tvision_model.encoder.layers.8.mlp.fc1.weight\n\tvision_model.encoder.layers.8.mlp.fc1.bias\n\tvision_
model.encoder.layers.8.mlp.fc2.weight\n\tvision_model.encoder.layers.8.mlp.fc2.bias\n\tvision_model.encoder.layers.8.layer_norm2.weight\n\tvision_model.encoder.layers.8.layer_norm
2.bias\n\tvision_model.encoder.layers.9.self_attn.k_proj.weight\n\tvision_model.encoder.layers.9.self_attn.k_proj.bias\n\tvision_model.encoder.layers.9.self_attn.v_proj.weight\n\t
vision_model.encoder.layers.9.self_attn.v_proj.bias\n\tvision_model.encoder.layers.9.self_attn.q_proj.weight\n\tvision_model.encoder.layers.9.self_attn.q_proj.bias\n\tvision_model
.encoder.layers.9.self_attn.out_proj.weight\n\tvision_model.encoder.layers.9.self_attn.out_proj.bias\n\tvision_model.encoder.layers.9.layer_norm1.weight\n\tvision_model.encoder.la
yers.9.layer_norm1.bias\n\tvision_model.encoder.layers.9.mlp.fc1.weight\n\tvision_model.encoder.layers.9.mlp.fc1.bias\n\tvision_model.encoder.layers.9.mlp.fc2.weight\n\tvision_mod
el.encoder.layers.9.mlp.fc2.bias\n\tvision_model.encoder.layers.9.layer_norm2.weight\n\tvision_model.encoder.layers.9.layer_norm2.bias\n\tvision_model.encoder.layers.10.self_attn.
k_proj.weight\n\tvision_model.encoder.layers.10.self_attn.k_proj.bias\n\tvision_model.encoder.layers.10.self_attn.v_proj.weight\n\tvision_model.encoder.layers.10.self_attn.v_proj.
bias\n\tvision_model.encoder.layers.10.self_attn.q_proj.weight\n\tvision_model.encoder.layers.10.self_attn.q_proj.bias\n\tvision_model.encoder.layers.10.self_attn.out_proj.weight\
n\tvision_model.encoder.layers.10.self_attn.out_proj.bias\n\tvision_model.encoder.layers.10.layer_norm1.weight\n\tvision_model.encoder.layers.10.layer_norm1.bias\n\tvision_model.e
ncoder.layers.10.mlp.fc1.weight\n\tvision_model.encoder.layers.10.mlp.fc1.bias\n\tvision_model.encoder.layers.10.mlp.fc2.weight\n\tvision_model.encoder.layers.10.mlp.fc2.bias\n\tv
ision_model.encoder.layers.10.layer_norm2.weight\n\tvision_model.encoder.layers.10.layer_norm2.bias\n\tvision_model.encoder.layers.11.self_attn.k_proj.weight\n\tvision_model.encod
er.layers.11.self_attn.k_proj.bias\n\tvision_model.encoder.layers.11.self_attn.v_proj.weight\n\tvision_model.encoder.layers.11.self_attn.v_proj.bias\n\tvision_model.encoder.layers
.11.self_attn.q_proj.weight\n\tvision_model.encoder.layers.11.self_attn.q_proj.bias\n\tvision_model.encoder.layers.11.self_attn.out_proj.weight\n\tvision_model.encoder.layers.11.s
elf_attn.out_proj.bias\n\tvision_model.encoder.layers.11.layer_norm1.weight\n\tvision_model.encoder.layers.11.layer_norm1.bias\n\tvision_model.encoder.layers.11.mlp.fc1.weight\n\t
vision_model.encoder.layers.11.mlp.fc1.bias\n\tvision_model.encoder.layers.11.mlp.fc2.weight\n\tvision_model.encoder.layers.11.mlp.fc2.bias\n\tvision_model.encoder.layers.11.layer
_norm2.weight\n\tvision_model.encoder.layers.11.layer_norm2.bias\n\tvision_model.post_layernorm.weight\n\tvision_model.post_layernorm.bias')
timm_resnet: Output 0 of AliasBackward0 is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the
autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can
fix this by cloning the output of the custom Function.
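The view+inplace failures (timm_resnet, timm_hrnet_w18_small, timm_skresnet18, timm_wide_resnet50_2) can be reproduced outside Gemini: a custom `torch.autograd.Function` that returns its input as-is produces an aliased output, and modifying that output in place is forbidden. A hypothetical standalone repro, with the `clone()` fix the error message suggests:

```python
import torch


class Passthrough(torch.autograd.Function):
    """Custom Function whose output aliases its input (returned as-is)."""

    @staticmethod
    def forward(ctx, x):
        return x  # input returned as-is -> output is a view of the input

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


x = torch.ones(3, requires_grad=True)

# In-place on the aliased output raises the "view ... modified inplace" error.
raised = False
try:
    y = Passthrough.apply(x)
    y.mul_(2)
except RuntimeError:
    raised = True

# Fix: clone the Function's output so in-place ops act on fresh memory.
x.grad = None
z = Passthrough.apply(x).clone()
z.mul_(2)
z.sum().backward()
print(x.grad)  # tensor([2., 2., 2.])
```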
timm_beit: Parameter `blocks.11.attn.relative_position_bias_table` failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
timm_beitv2: Parameter `blocks.11.attn.relative_position_bias_table` failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
timm_eca_nfnet: 'NoneType' object has no attribute 'detach'
timm_efficientformer: Parameter `stages.3.blocks.4.token_mixer.attention_biases` failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
timm_hrnet_w18_small: Output 0 of AliasBackward0 is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is)
and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden.
You can fix this by cloning the output of the custom Function.
timm_nf_ecaresnet101: 'NoneType' object has no attribute 'detach'
timm_nf_regnet_b0: 'NoneType' object has no attribute 'detach'
timm_skresnet18: Output 0 of AliasBackward0 is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and
the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You
can fix this by cloning the output of the custom Function.
timm_wide_resnet50_2: Output 0 of AliasBackward0 is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is)
and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden.
You can fix this by cloning the output of the custom Function.
timm_convit: expected scalar type Float but found Half
timm_dm_nfnet: 'NoneType' object has no attribute 'detach'
timm_swin_transformer: Parameter `layers.3.blocks.1.attn.relative_position_bias_table` failed at the gradient reduction. Some unsupported torch function is operated upon this
parameter.
torchaudio_conformer: Parameter `conformer_layers.3.self_attn.in_proj_weight` failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
torchaudio_deepspeech: Parameter `bi_rnn.weight_ih_l0` failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
torchaudio_wavernn: Parameter `rnn2.weight_ih_l0` failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
torchaudio_tacotron: Parameter `decoder.decoder_rnn.weight_ih` failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
deepfm_interactionarch: expected scalar type Half but found Float
deepfm_simpledeepfmnn: expected scalar type Half but found Float
dlrm: expected scalar type Half but found Float
dlrm_interactionarch: optimizer got an empty parameter list
torchvision_googlenet: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not compatible with ZeroDDP.\n',
'Reduction failed at followed parameters:\n\tconv1.conv.weight\n\tconv1.bn.weight\n\tconv1.bn.bias\n\tconv2.conv.weight\n\tconv2.bn.weight\n\tconv2.bn.bias\n\tconv3.conv.weight\n\
tconv3.bn.weight\n\tconv3.bn.bias\n\tinception3a.branch1.conv.weight\n\tinception3a.branch1.bn.weight\n\tinception3a.branch1.bn.bias\n\tinception3a.branch2.0.conv.weight\n\tincept
ion3a.branch2.0.bn.weight\n\tinception3a.branch2.0.bn.bias\n\tinception3a.branch2.1.conv.weight\n\tinception3a.branch2.1.bn.weight\n\tinception3a.branch2.1.bn.bias\n\tinception3a.
branch3.0.conv.weight\n\tinception3a.branch3.0.bn.weight\n\tinception3a.branch3.0.bn.bias\n\tinception3a.branch3.1.conv.weight\n\tinception3a.branch3.1.bn.weight\n\tinception3a.br
anch3.1.bn.bias\n\tinception3a.branch4.1.conv.weight\n\tinception3a.branch4.1.bn.weight\n\tinception3a.branch4.1.bn.bias\n\tinception3b.branch1.conv.weight\n\tinception3b.branch1.
bn.weight\n\tinception3b.branch1.bn.bias\n\tinception3b.branch2.0.conv.weight\n\tinception3b.branch2.0.bn.weight\n\tinception3b.branch2.0.bn.bias\n\tinception3b.branch2.1.conv.wei
ght\n\tinception3b.branch2.1.bn.weight\n\tinception3b.branch2.1.bn.bias\n\tinception3b.branch3.0.conv.weight\n\tinception3b.branch3.0.bn.weight\n\tinception3b.branch3.0.bn.bias\n\
tinception3b.branch3.1.conv.weight\n\tinception3b.branch3.1.bn.weight\n\tinception3b.branch3.1.bn.bias\n\tinception3b.branch4.1.conv.weight\n\tinception3b.branch4.1.bn.weight\n\ti
nception3b.branch4.1.bn.bias\n\tinception4a.branch1.conv.weight\n\tinception4a.branch1.bn.weight\n\tinception4a.branch1.bn.bias\n\tinception4a.branch2.0.conv.weight\n\tinception4a
.branch2.0.bn.weight\n\tinception4a.branch2.0.bn.bias\n\tinception4a.branch2.1.conv.weight\n\tinception4a.branch2.1.bn.weight\n\tinception4a.branch2.1.bn.bias\n\tinception4a.branc
h3.0.conv.weight\n\tinception4a.branch3.0.bn.weight\n\tinception4a.branch3.0.bn.bias\n\tinception4a.branch3.1.conv.weight\n\tinception4a.branch3.1.bn.weight\n\tinception4a.branch3
.1.bn.bias\n\tinception4a.branch4.1.conv.weight\n\tinception4a.branch4.1.bn.weight\n\tinception4a.branch4.1.bn.bias\n\tinception4b.branch1.conv.weight\n\tinception4b.branch1.bn.we
ight\n\tinception4b.branch1.bn.bias\n\tinception4b.branch2.0.conv.weight\n\tinception4b.branch2.0.bn.weight\n\tinception4b.branch2.0.bn.bias\n\tinception4b.branch2.1.conv.weight\n
\tinception4b.branch2.1.bn.weight\n\tinception4b.branch2.1.bn.bias\n\tinception4b.branch3.0.conv.weight\n\tinception4b.branch3.0.bn.weight\n\tinception4b.branch3.0.bn.bias\n\tince
ption4b.branch3.1.conv.weight\n\tinception4b.branch3.1.bn.weight\n\tinception4b.branch3.1.bn.bias\n\tinception4b.branch4.1.conv.weight\n\tinception4b.branch4.1.bn.weight\n\tincept
ion4b.branch4.1.bn.bias\n\tinception4c.branch1.conv.weight\n\tinception4c.branch1.bn.weight\n\tinception4c.branch1.bn.bias\n\tinception4c.branch2.0.conv.weight\n\tinception4c.bran
ch2.0.bn.weight\n\tinception4c.branch2.0.bn.bias\n\tinception4c.branch2.1.conv.weight\n\tinception4c.branch2.1.bn.weight\n\tinception4c.branch2.1.bn.bias\n\tinception4c.branch3.0.
conv.weight\n\tinception4c.branch3.0.bn.weight\n\tinception4c.branch3.0.bn.bias\n\tinception4c.branch3.1.conv.weight\n\tinception4c.branch3.1.bn.weight\n\tinception4c.branch3.1.bn
.bias\n\tinception4c.branch4.1.conv.weight\n\tinception4c.branch4.1.bn.weight\n\tinception4c.branch4.1.bn.bias\n\tinception4d.branch1.conv.weight\n\tinception4d.branch1.bn.weight\
n\tinception4d.branch1.bn.bias\n\tinception4d.branch2.0.conv.weight\n\tinception4d.branch2.0.bn.weight\n\tinception4d.branch2.0.bn.bias\n\tinception4d.branch2.1.conv.weight\n\tinc
eption4d.branch2.1.bn.weight\n\tinception4d.branch2.1.bn.bias\n\tinception4d.branch3.0.conv.weight\n\tinception4d.branch3.0.bn.weight\n\tinception4d.branch3.0.bn.bias\n\tinception
4d.branch3.1.conv.weight\n\tinception4d.branch3.1.bn.weight\n\tinception4d.branch3.1.bn.bias\n\tinception4d.branch4.1.conv.weight\n\tinception4d.branch4.1.bn.weight\n\tinception4d
.branch4.1.bn.bias\n\tinception4e.branch1.conv.weight\n\tinception4e.branch1.bn.weight\n\tinception4e.branch1.bn.bias\n\tinception4e.branch2.0.conv.weight\n\tinception4e.branch2.0
.bn.weight\n\tinception4e.branch2.0.bn.bias\n\tinception4e.branch2.1.conv.weight\n\tinception4e.branch2.1.bn.weight\n\tinception4e.branch2.1.bn.bias\n\tinception4e.branch3.0.conv.
weight\n\tinception4e.branch3.0.bn.weight\n\tinception4e.branch3.0.bn.bias\n\tinception4e.branch3.1.conv.weight\n\tinception4e.branch3.1.bn.weight\n\tinception4e.branch3.1.bn.bias
\n\tinception4e.branch4.1.conv.weight\n\tinception4e.branch4.1.bn.weight\n\tinception4e.branch4.1.bn.bias\n\tinception5a.branch1.conv.weight\n\tinception5a.branch1.bn.weight\n\tin
ception5a.branch1.bn.bias\n\tinception5a.branch2.0.conv.weight\n\tinception5a.branch2.0.bn.weight\n\tinception5a.branch2.0.bn.bias\n\tinception5a.branch2.1.conv.weight\n\tinceptio
n5a.branch2.1.bn.weight\n\tinception5a.branch2.1.bn.bias\n\tinception5a.branch3.0.conv.weight\n\tinception5a.branch3.0.bn.weight\n\tinception5a.branch3.0.bn.bias\n\tinception5a.br
anch3.1.conv.weight\n\tinception5a.branch3.1.bn.weight\n\tinception5a.branch3.1.bn.bias\n\tinception5a.branch4.1.conv.weight\n\tinception5a.branch4.1.bn.weight\n\tinception5a.bran
ch4.1.bn.bias\n\tinception5b.branch1.conv.weight\n\tinception5b.branch1.bn.weight\n\tinception5b.branch1.bn.bias\n\tinception5b.branch2.0.conv.weight\n\tinception5b.branch2.0.bn.w
eight\n\tinception5b.branch2.0.bn.bias\n\tinception5b.branch2.1.conv.weight\n\tinception5b.branch2.1.bn.weight\n\tinception5b.branch2.1.bn.bias\n\tinception5b.branch3.0.conv.weigh
t\n\tinception5b.branch3.0.bn.weight\n\tinception5b.branch3.0.bn.bias\n\tinception5b.branch3.1.conv.weight\n\tinception5b.branch3.1.bn.weight\n\tinception5b.branch3.1.bn.bias\n\ti
nception5b.branch4.1.conv.weight\n\tinception5b.branch4.1.bn.weight\n\tinception5b.branch4.1.bn.bias\n\taux1.conv.conv.weight\n\taux1.conv.bn.weight\n\taux1.conv.bn.bias\n\taux1.f
c1.weight\n\taux1.fc1.bias\n\taux1.fc2.weight\n\taux1.fc2.bias\n\taux2.conv.conv.weight\n\taux2.conv.bn.weight\n\taux2.conv.bn.bias\n\taux2.fc1.weight\n\taux2.fc1.bias\n\taux2.fc2
.weight\n\taux2.fc2.bias\n\tfc.weight\n\tfc.bias')
torchvision_inception_v3: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not compatible with
ZeroDDP.\n', 'Reduction failed at followed parameters:\n\tConv2d_1a_3x3.conv.weight\n\tConv2d_1a_3x3.bn.weight\n\tConv2d_1a_3x3.bn.bias\n\tConv2d_2a_3x3.conv.weight\n\tConv2d_2a_3
x3.bn.weight\n\tConv2d_2a_3x3.bn.bias\n\tConv2d_2b_3x3.conv.weight\n\tConv2d_2b_3x3.bn.weight\n\tConv2d_2b_3x3.bn.bias\n\tConv2d_3b_1x1.conv.weight\n\tConv2d_3b_1x1.bn.weight\n\tC
onv2d_3b_1x1.bn.bias\n\tConv2d_4a_3x3.conv.weight\n\tConv2d_4a_3x3.bn.weight\n\tConv2d_4a_3x3.bn.bias\n\tMixed_5b.branch1x1.conv.weight\n\tMixed_5b.branch1x1.bn.weight\n\tMixed_5b
.branch1x1.bn.bias\n\tMixed_5b.branch5x5_1.conv.weight\n\tMixed_5b.branch5x5_1.bn.weight\n\tMixed_5b.branch5x5_1.bn.bias\n\tMixed_5b.branch5x5_2.conv.weight\n\tMixed_5b.branch5x5_
2.bn.weight\n\tMixed_5b.branch5x5_2.bn.bias\n\tMixed_5b.branch3x3dbl_1.conv.weight\n\tMixed_5b.branch3x3dbl_1.bn.weight\n\tMixed_5b.branch3x3dbl_1.bn.bias\n\tMixed_5b.branch3x3dbl
_2.conv.weight\n\tMixed_5b.branch3x3dbl_2.bn.weight\n\tMixed_5b.branch3x3dbl_2.bn.bias\n\tMixed_5b.branch3x3dbl_3.conv.weight\n\tMixed_5b.branch3x3dbl_3.bn.weight\n\tMixed_5b.bran
ch3x3dbl_3.bn.bias\n\tMixed_5b.branch_pool.conv.weight\n\tMixed_5b.branch_pool.bn.weight\n\tMixed_5b.branch_pool.bn.bias\n\tMixed_5c.branch1x1.conv.weight\n\tMixed_5c.branch1x1.bn
.weight\n\tMixed_5c.branch1x1.bn.bias\n\tMixed_5c.branch5x5_1.conv.weight\n\tMixed_5c.branch5x5_1.bn.weight\n\tMixed_5c.branch5x5_1.bn.bias\n\tMixed_5c.branch5x5_2.conv.weight\n\t
Mixed_5c.branch5x5_2.bn.weight\n\tMixed_5c.branch5x5_2.bn.bias\n\tMixed_5c.branch3x3dbl_1.conv.weight\n\tMixed_5c.branch3x3dbl_1.bn.weight\n\tMixed_5c.branch3x3dbl_1.bn.bias\n\tMi
xed_5c.branch3x3dbl_2.conv.weight\n\tMixed_5c.branch3x3dbl_2.bn.weight\n\tMixed_5c.branch3x3dbl_2.bn.bias\n\tMixed_5c.branch3x3dbl_3.conv.weight\n\tMixed_5c.branch3x3dbl_3.bn.weig
ht\n\tMixed_5c.branch3x3dbl_3.bn.bias\n\tMixed_5c.branch_pool.conv.weight\n\tMixed_5c.branch_pool.bn.weight\n\tMixed_5c.branch_pool.bn.bias\n\tMixed_5d.branch1x1.conv.weight\n\tMi
xed_5d.branch1x1.bn.weight\n\tMixed_5d.branch1x1.bn.bias\n\tMixed_5d.branch5x5_1.conv.weight\n\tMixed_5d.branch5x5_1.bn.weight\n\tMixed_5d.branch5x5_1.bn.bias\n\tMixed_5d.branch5x
5_2.conv.weight\n\tMixed_5d.branch5x5_2.bn.weight\n\tMixed_5d.branch5x5_2.bn.bias\n\tMixed_5d.branch3x3dbl_1.conv.weight\n\tMixed_5d.branch3x3dbl_1.bn.weight\n\tMixed_5d.branch3x3
dbl_1.bn.bias\n\tMixed_5d.branch3x3dbl_2.conv.weight\n\tMixed_5d.branch3x3dbl_2.bn.weight\n\tMixed_5d.branch3x3dbl_2.bn.bias\n\tMixed_5d.branch3x3dbl_3.conv.weight\n\tMixed_5d.bra
nch3x3dbl_3.bn.weight\n\tMixed_5d.branch3x3dbl_3.bn.bias\n\tMixed_5d.branch_pool.conv.weight\n\tMixed_5d.branch_pool.bn.weight\n\tMixed_5d.branch_pool.bn.bias\n\tMixed_6a.branch3x
3.conv.weight\n\tMixed_6a.branch3x3.bn.weight\n\tMixed_6a.branch3x3.bn.bias\n\tMixed_6a.branch3x3dbl_1.conv.weight\n\tMixed_6a.branch3x3dbl_1.bn.weight\n\tMixed_6a.branch3x3dbl_1.
bn.bias\n\tMixed_6a.branch3x3dbl_2.conv.weight\n\tMixed_6a.branch3x3dbl_2.bn.weight\n\tMixed_6a.branch3x3dbl_2.bn.bias\n\tMixed_6a.branch3x3dbl_3.conv.weight\n\tMixed_6a.branch3x3
dbl_3.bn.weight\n\tMixed_6a.branch3x3dbl_3.bn.bias\n\tMixed_6b.branch1x1.conv.weight\n\tMixed_6b.branch1x1.bn.weight\n\tMixed_6b.branch1x1.bn.bias\n\tMixed_6b.branch7x7_1.conv.wei
ght\n\tMixed_6b.branch7x7_1.bn.weight\n\tMixed_6b.branch7x7_1.bn.bias\n\tMixed_6b.branch7x7_2.conv.weight\n\tMixed_6b.branch7x7_2.bn.weight\n\tMixed_6b.branch7x7_2.bn.bias\n\tMixe
d_6b.branch7x7_3.conv.weight\n\tMixed_6b.branch7x7_3.bn.weight\n\tMixed_6b.branch7x7_3.bn.bias\n\tMixed_6b.branch7x7dbl_1.conv.weight\n\tMixed_6b.branch7x7dbl_1.bn.weight\n\tMixed
_6b.branch7x7dbl_1.bn.bias\n\tMixed_6b.branch7x7dbl_2.conv.weight\n\tMixed_6b.branch7x7dbl_2.bn.weight\n\tMixed_6b.branch7x7dbl_2.bn.bias\n\tMixed_6b.branch7x7dbl_3.conv.weight\n\
tMixed_6b.branch7x7dbl_3.bn.weight\n\tMixed_6b.branch7x7dbl_3.bn.bias\n\tMixed_6b.branch7x7dbl_4.conv.weight\n\tMixed_6b.branch7x7dbl_4.bn.weight\n\tMixed_6b.branch7x7dbl_4.bn.bia
s\n\tMixed_6b.branch7x7dbl_5.conv.weight\n\tMixed_6b.branch7x7dbl_5.bn.weight\n\tMixed_6b.branch7x7dbl_5.bn.bias\n\tMixed_6b.branch_pool.conv.weight\n\tMixed_6b.branch_pool.bn.wei
ght\n\tMixed_6b.branch_pool.bn.bias\n\tMixed_6c.branch1x1.conv.weight\n\tMixed_6c.branch1x1.bn.weight\n\tMixed_6c.branch1x1.bn.bias\n\tMixed_6c.branch7x7_1.conv.weight\n\tMixed_6c
.branch7x7_1.bn.weight\n\tMixed_6c.branch7x7_1.bn.bias\n\tMixed_6c.branch7x7_2.conv.weight\n\tMixed_6c.branch7x7_2.bn.weight\n\tMixed_6c.branch7x7_2.bn.bias\n\tMixed_6c.branch7x7_
3.conv.weight\n\tMixed_6c.branch7x7_3.bn.weight\n\tMixed_6c.branch7x7_3.bn.bias\n\tMixed_6c.branch7x7dbl_1.conv.weight\n\tMixed_6c.branch7x7dbl_1.bn.weight\n\tMixed_6c.branch7x7db
l_1.bn.bias\n\tMixed_6c.branch7x7dbl_2.conv.weight\n\tMixed_6c.branch7x7dbl_2.bn.weight\n\tMixed_6c.branch7x7dbl_2.bn.bias\n\tMixed_6c.branch7x7dbl_3.conv.weight\n\tMixed_6c.branc
h7x7dbl_3.bn.weight\n\tMixed_6c.branch7x7dbl_3.bn.bias\n\tMixed_6c.branch7x7dbl_4.conv.weight\n\tMixed_6c.branch7x7dbl_4.bn.weight\n\tMixed_6c.branch7x7dbl_4.bn.bias\n\tMixed_6c.b
ranch7x7dbl_5.conv.weight\n\tMixed_6c.branch7x7dbl_5.bn.weight\n\tMixed_6c.branch7x7dbl_5.bn.bias\n\tMixed_6c.branch_pool.conv.weight\n\tMixed_6c.branch_pool.bn.weight\n\tMixed_6c
.branch_pool.bn.bias\n\tMixed_6d.branch1x1.conv.weight\n\tMixed_6d.branch1x1.bn.weight\n\tMixed_6d.branch1x1.bn.bias\n\tMixed_6d.branch7x7_1.conv.weight\n\tMixed_6d.branch7x7_1.bn
.weight\n\tMixed_6d.branch7x7_1.bn.bias\n\tMixed_6d.branch7x7_2.conv.weight\n\tMixed_6d.branch7x7_2.bn.weight\n\tMixed_6d.branch7x7_2.bn.bias\n\tMixed_6d.branch7x7_3.conv.weight\n
\tMixed_6d.branch7x7_3.bn.weight\n\tMixed_6d.branch7x7_3.bn.bias\n\tMixed_6d.branch7x7dbl_1.conv.weight\n\tMixed_6d.branch7x7dbl_1.bn.weight\n\tMixed_6d.branch7x7dbl_1.bn.bias\n\t
Mixed_6d.branch7x7dbl_2.conv.weight\n\tMixed_6d.branch7x7dbl_2.bn.weight\n\tMixed_6d.branch7x7dbl_2.bn.bias\n\tMixed_6d.branch7x7dbl_3.conv.weight\n\tMixed_6d.branch7x7dbl_3.bn.we
ight\n\tMixed_6d.branch7x7dbl_3.bn.bias\n\tMixed_6d.branch7x7dbl_4.conv.weight\n\tMixed_6d.branch7x7dbl_4.bn.weight\n\tMixed_6d.branch7x7dbl_4.bn.bias\n\tMixed_6d.branch7x7dbl_5.c
onv.weight\n\tMixed_6d.branch7x7dbl_5.bn.weight\n\tMixed_6d.branch7x7dbl_5.bn.bias\n\tMixed_6d.branch_pool.conv.weight\n\tMixed_6d.branch_pool.bn.weight\n\tMixed_6d.branch_pool.bn
.bias\n\tMixed_6e.branch1x1.conv.weight\n\tMixed_6e.branch1x1.bn.weight\n\tMixed_6e.branch1x1.bn.bias\n\tMixed_6e.branch7x7_1.conv.weight\n\tMixed_6e.branch7x7_1.bn.weight\n\tMixe
d_6e.branch7x7_1.bn.bias\n\tMixed_6e.branch7x7_2.conv.weight\n\tMixed_6e.branch7x7_2.bn.weight\n\tMixed_6e.branch7x7_2.bn.bias\n\tMixed_6e.branch7x7_3.conv.weight\n\tMixed_6e.bran
ch7x7_3.bn.weight\n\tMixed_6e.branch7x7_3.bn.bias\n\tMixed_6e.branch7x7dbl_1.conv.weight\n\tMixed_6e.branch7x7dbl_1.bn.weight\n\tMixed_6e.branch7x7dbl_1.bn.bias\n\tMixed_6e.branch
7x7dbl_2.conv.weight\n\tMixed_6e.branch7x7dbl_2.bn.weight\n\tMixed_6e.branch7x7dbl_2.bn.bias\n\tMixed_6e.branch7x7dbl_3.conv.weight\n\tMixed_6e.branch7x7dbl_3.bn.weight\n\tMixed_6
e.branch7x7dbl_3.bn.bias\n\tMixed_6e.branch7x7dbl_4.conv.weight\n\tMixed_6e.branch7x7dbl_4.bn.weight\n\tMixed_6e.branch7x7dbl_4.bn.bias\n\tMixed_6e.branch7x7dbl_5.conv.weight\n\tM
ixed_6e.branch7x7dbl_5.bn.weight\n\tMixed_6e.branch7x7dbl_5.bn.bias\n\tMixed_6e.branch_pool.conv.weight\n\tMixed_6e.branch_pool.bn.weight\n\tMixed_6e.branch_pool.bn.bias\n\tAuxLog
its.conv0.conv.weight\n\tAuxLogits.conv0.bn.weight\n\tAuxLogits.conv0.bn.bias\n\tAuxLogits.conv1.conv.weight\n\tAuxLogits.conv1.bn.weight\n\tAuxLogits.conv1.bn.bias\n\tAuxLogits.f
c.weight\n\tAuxLogits.fc.bias\n\tMixed_7a.branch3x3_1.conv.weight\n\tMixed_7a.branch3x3_1.bn.weight\n\tMixed_7a.branch3x3_1.bn.bias\n\tMixed_7a.branch3x3_2.conv.weight\n\tMixed_7a
.branch3x3_2.bn.weight\n\tMixed_7a.branch3x3_2.bn.bias\n\tMixed_7a.branch7x7x3_1.conv.weight\n\tMixed_7a.branch7x7x3_1.bn.weight\n\tMixed_7a.branch7x7x3_1.bn.bias\n\tMixed_7a.bran
ch7x7x3_2.conv.weight\n\tMixed_7a.branch7x7x3_2.bn.weight\n\tMixed_7a.branch7x7x3_2.bn.bias\n\tMixed_7a.branch7x7x3_3.conv.weight\n\tMixed_7a.branch7x7x3_3.bn.weight\n\tMixed_7a.b
ranch7x7x3_3.bn.bias\n\tMixed_7a.branch7x7x3_4.conv.weight\n\tMixed_7a.branch7x7x3_4.bn.weight\n\tMixed_7a.branch7x7x3_4.bn.bias\n\tMixed_7b.branch1x1.conv.weight\n\tMixed_7b.bran
ch1x1.bn.weight\n\tMixed_7b.branch1x1.bn.bias\n\tMixed_7b.branch3x3_1.conv.weight\n\tMixed_7b.branch3x3_1.bn.weight\n\tMixed_7b.branch3x3_1.bn.bias\n\tMixed_7b.branch3x3_2a.conv.w
eight\n\tMixed_7b.branch3x3_2a.bn.weight\n\tMixed_7b.branch3x3_2a.bn.bias\n\tMixed_7b.branch3x3_2b.conv.weight\n\tMixed_7b.branch3x3_2b.bn.weight\n\tMixed_7b.branch3x3_2b.bn.bias\
n\tMixed_7b.branch3x3dbl_1.conv.weight\n\tMixed_7b.branch3x3dbl_1.bn.weight\n\tMixed_7b.branch3x3dbl_1.bn.bias\n\tMixed_7b.branch3x3dbl_2.conv.weight\n\tMixed_7b.branch3x3dbl_2.bn
.weight\n\tMixed_7b.branch3x3dbl_2.bn.bias\n\tMixed_7b.branch3x3dbl_3a.conv.weight\n\tMixed_7b.branch3x3dbl_3a.bn.weight\n\tMixed_7b.branch3x3dbl_3a.bn.bias\n\tMixed_7b.branch3x3d
bl_3b.conv.weight\n\tMixed_7b.branch3x3dbl_3b.bn.weight\n\tMixed_7b.branch3x3dbl_3b.bn.bias\n\tMixed_7b.branch_pool.conv.weight\n\tMixed_7b.branch_pool.bn.weight\n\tMixed_7b.branc
h_pool.bn.bias\n\tMixed_7c.branch1x1.conv.weight\n\tMixed_7c.branch1x1.bn.weight\n\tMixed_7c.branch1x1.bn.bias\n\tMixed_7c.branch3x3_1.conv.weight\n\tMixed_7c.branch3x3_1.bn.weigh
t\n\tMixed_7c.branch3x3_1.bn.bias\n\tMixed_7c.branch3x3_2a.conv.weight\n\tMixed_7c.branch3x3_2a.bn.weight\n\tMixed_7c.branch3x3_2a.bn.bias\n\tMixed_7c.branch3x3_2b.conv.weight\n\t
Mixed_7c.branch3x3_2b.bn.weight\n\tMixed_7c.branch3x3_2b.bn.bias\n\tMixed_7c.branch3x3dbl_1.conv.weight\n\tMixed_7c.branch3x3dbl_1.bn.weight\n\tMixed_7c.branch3x3dbl_1.bn.bias\n\t
Mixed_7c.branch3x3dbl_2.conv.weight\n\tMixed_7c.branch3x3dbl_2.bn.weight\n\tMixed_7c.branch3x3dbl_2.bn.bias\n\tMixed_7c.branch3x3dbl_3a.conv.weight\n\tMixed_7c.branch3x3dbl_3a.bn.
weight\n\tMixed_7c.branch3x3dbl_3a.bn.bias\n\tMixed_7c.branch3x3dbl_3b.conv.weight\n\tMixed_7c.branch3x3dbl_3b.bn.weight\n\tMixed_7c.branch3x3dbl_3b.bn.bias\n\tMixed_7c.branch_poo
l.conv.weight\n\tMixed_7c.branch_pool.bn.weight\n\tMixed_7c.branch_pool.bn.bias\n\tfc.weight\n\tfc.bias')
torchvision_mobilenet_v3_small, torchvision_resnet18, torchvision_resnext50_32x4d, torchvision_wide_resnet50_2, torchvision_convnext_base: Output 0 of AliasBackward0 is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.
torchvision_vit_b_16: Parameter `encoder.layers.encoder_layer_11.self_attention.in_proj_weight` failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
torchvision_swin_s: Parameter `features.7.1.attn.relative_position_bias_table` failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
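The AliasBackward0 failures above can be reproduced outside Gemini with a few lines of plain PyTorch. This is a minimal sketch (the `PassThrough` Function is a hypothetical stand-in for the custom ops inside those torchvision models, not code from this PR): returning an input as-is from a custom `autograd.Function` makes the output alias the input, so a later in-place op is rejected, and cloning the output is the fix the error message suggests.

```python
import torch

class PassThrough(torch.autograd.Function):
    """Custom Function that returns its input as-is, so its output
    aliases the input -- the situation AliasBackward0 tracks."""
    @staticmethod
    def forward(ctx, x):
        return x  # returned as-is: output is a view of the input

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

x = torch.ones(3, requires_grad=True)
y = PassThrough.apply(x)
try:
    y.mul_(2)  # in-place write to a view created inside a custom Function
    raised = False
except RuntimeError:
    raised = True
print("view+inplace rejected:", raised)

# The fix the error message suggests: clone the Function's output, so the
# in-place op touches storage the custom backward does not alias.
x = torch.ones(3, requires_grad=True)
z = PassThrough.apply(x).clone()
z.mul_(2)
z.sum().backward()
print(x.grad)  # tensor([2., 2., 2.])
```

Skipping these models in the test suite sidesteps the issue; fixing them would mean cloning inside the offending custom Functions upstream.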
transformers_albert: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not compatible with ZeroDDP.\n',
'Reduction failed at followed parameters:\n\tembeddings.word_embeddings.weight\n\tembeddings.position_embeddings.weight\n\tembeddings.token_type_embeddings.weight\n\tembeddings.La
yerNorm.weight\n\tembeddings.LayerNorm.bias\n\tencoder.embedding_hidden_mapping_in.weight\n\tencoder.embedding_hidden_mapping_in.bias\n\tencoder.albert_layer_groups.0.albert_layer
s.0.full_layer_layer_norm.weight\n\tencoder.albert_layer_groups.0.albert_layers.0.full_layer_layer_norm.bias\n\tencoder.albert_layer_groups.0.albert_layers.0.attention.query.weigh
t\n\tencoder.albert_layer_groups.0.albert_layers.0.attention.query.bias\n\tencoder.albert_layer_groups.0.albert_layers.0.attention.key.weight\n\tencoder.albert_layer_groups.0.albe
rt_layers.0.attention.key.bias\n\tencoder.albert_layer_groups.0.albert_layers.0.attention.value.weight\n\tencoder.albert_layer_groups.0.albert_layers.0.attention.value.bias\n\tenc
oder.albert_layer_groups.0.albert_layers.0.attention.dense.weight\n\tencoder.albert_layer_groups.0.albert_layers.0.attention.dense.bias\n\tencoder.albert_layer_groups.0.albert_lay
ers.0.attention.LayerNorm.weight\n\tencoder.albert_layer_groups.0.albert_layers.0.attention.LayerNorm.bias\n\tencoder.albert_layer_groups.0.albert_layers.0.ffn.weight\n\tencoder.a
lbert_layer_groups.0.albert_layers.0.ffn.bias\n\tencoder.albert_layer_groups.0.albert_layers.0.ffn_output.weight\n\tencoder.albert_layer_groups.0.albert_layers.0.ffn_output.bias\n
\tpooler.weight\n\tpooler.bias')
transformers_albert_for_pretraining: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not compatible with
ZeroDDP.\n', 'Reduction failed at followed parameters:\n\talbert.embeddings.word_embeddings.weight\n\talbert.embeddings.position_embeddings.weight\n\talbert.embeddings.token_type_
embeddings.weight\n\talbert.embeddings.LayerNorm.weight\n\talbert.embeddings.LayerNorm.bias\n\talbert.encoder.embedding_hidden_mapping_in.weight\n\talbert.encoder.embedding_hidden
_mapping_in.bias\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.full_layer_layer_norm.weight\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.full_layer_layer_norm
.bias\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.attention.query.weight\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.attention.query.bias\n\talbert.encoder
.albert_layer_groups.0.albert_layers.0.attention.key.weight\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.attention.key.bias\n\talbert.encoder.albert_layer_groups.0.albe
rt_layers.0.attention.value.weight\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.attention.value.bias\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.attention.d
ense.weight\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.attention.dense.bias\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.attention.LayerNorm.weight\n\talbe
rt.encoder.albert_layer_groups.0.albert_layers.0.attention.LayerNorm.bias\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.ffn.weight\n\talbert.encoder.albert_layer_groups.
0.albert_layers.0.ffn.bias\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.ffn_output.weight\n\talbert.encoder.albert_layer_groups.0.albert_layers.0.ffn_output.bias\n\talb
ert.pooler.weight\n\talbert.pooler.bias\n\tpredictions.bias\n\tpredictions.LayerNorm.weight\n\tpredictions.LayerNorm.bias\n\tpredictions.dense.weight\n\tpredictions.dense.bias\n\t
sop_classifier.classifier.weight\n\tsop_classifier.classifier.bias')
transformers_bert: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not compatible with ZeroDDP.\n',
'Reduction failed at followed parameters:\n\tembeddings.word_embeddings.weight\n\tembeddings.position_embeddings.weight\n\tembeddings.token_type_embeddings.weight\n\tembeddings.La
yerNorm.weight\n\tembeddings.LayerNorm.bias\n\tencoder.layer.0.attention.self.query.weight\n\tencoder.layer.0.attention.self.query.bias\n\tencoder.layer.0.attention.self.key.weigh
t\n\tencoder.layer.0.attention.self.key.bias\n\tencoder.layer.0.attention.self.value.weight\n\tencoder.layer.0.attention.self.value.bias\n\tencoder.layer.0.attention.output.dense.
weight\n\tencoder.layer.0.attention.output.dense.bias\n\tencoder.layer.0.attention.output.LayerNorm.weight\n\tencoder.layer.0.attention.output.LayerNorm.bias\n\tencoder.layer.0.in
termediate.dense.weight\n\tencoder.layer.0.intermediate.dense.bias\n\tencoder.layer.0.output.dense.weight\n\tencoder.layer.0.output.dense.bias\n\tencoder.layer.0.output.LayerNorm.
weight\n\tencoder.layer.0.output.LayerNorm.bias\n\tencoder.layer.1.attention.self.query.weight\n\tencoder.layer.1.attention.self.query.bias\n\tencoder.layer.1.attention.self.key.w
eight\n\tencoder.layer.1.attention.self.key.bias\n\tencoder.layer.1.attention.self.value.weight\n\tencoder.layer.1.attention.self.value.bias\n\tencoder.layer.1.attention.output.de
nse.weight\n\tencoder.layer.1.attention.output.dense.bias\n\tencoder.layer.1.attention.output.LayerNorm.weight\n\tencoder.layer.1.attention.output.LayerNorm.bias\n\tencoder.layer.
1.intermediate.dense.weight\n\tencoder.layer.1.intermediate.dense.bias\n\tencoder.layer.1.output.dense.weight\n\tencoder.layer.1.output.dense.bias\n\tencoder.layer.1.output.LayerN
orm.weight\n\tencoder.layer.1.output.LayerNorm.bias\n\tpooler.dense.weight\n\tpooler.dense.bias')
transformers_bert_for_pretraining: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not compatible with
ZeroDDP.\n', 'Reduction failed at followed parameters:\n\tbert.embeddings.word_embeddings.weight\n\tbert.embeddings.position_embeddings.weight\n\tbert.embeddings.token_type_embedd
ings.weight\n\tbert.embeddings.LayerNorm.weight\n\tbert.embeddings.LayerNorm.bias\n\tbert.encoder.layer.0.attention.self.query.weight\n\tbert.encoder.layer.0.attention.self.query.
bias\n\tbert.encoder.layer.0.attention.self.key.weight\n\tbert.encoder.layer.0.attention.self.key.bias\n\tbert.encoder.layer.0.attention.self.value.weight\n\tbert.encoder.layer.0.
attention.self.value.bias\n\tbert.encoder.layer.0.attention.output.dense.weight\n\tbert.encoder.layer.0.attention.output.dense.bias\n\tbert.encoder.layer.0.attention.output.LayerN
orm.weight\n\tbert.encoder.layer.0.attention.output.LayerNorm.bias\n\tbert.encoder.layer.0.intermediate.dense.weight\n\tbert.encoder.layer.0.intermediate.dense.bias\n\tbert.encode
r.layer.0.output.dense.weight\n\tbert.encoder.layer.0.output.dense.bias\n\tbert.encoder.layer.0.output.LayerNorm.weight\n\tbert.encoder.layer.0.output.LayerNorm.bias\n\tbert.encod
er.layer.1.attention.self.query.weight\n\tbert.encoder.layer.1.attention.self.query.bias\n\tbert.encoder.layer.1.attention.self.key.weight\n\tbert.encoder.layer.1.attention.self.k
ey.bias\n\tbert.encoder.layer.1.attention.self.value.weight\n\tbert.encoder.layer.1.attention.self.value.bias\n\tbert.encoder.layer.1.attention.output.dense.weight\n\tbert.encoder
.layer.1.attention.output.dense.bias\n\tbert.encoder.layer.1.attention.output.LayerNorm.weight\n\tbert.encoder.layer.1.attention.output.LayerNorm.bias\n\tbert.encoder.layer.1.inte
rmediate.dense.weight\n\tbert.encoder.layer.1.intermediate.dense.bias\n\tbert.encoder.layer.1.output.dense.weight\n\tbert.encoder.layer.1.output.dense.bias\n\tbert.encoder.layer.1
.output.LayerNorm.weight\n\tbert.encoder.layer.1.output.LayerNorm.bias\n\tbert.pooler.dense.weight\n\tbert.pooler.dense.bias\n\tcls.predictions.bias\n\tcls.predictions.transform.d
ense.weight\n\tcls.predictions.transform.dense.bias\n\tcls.predictions.transform.LayerNorm.weight\n\tcls.predictions.transform.LayerNorm.bias\n\tcls.seq_relationship.weight\n\tcls
.seq_relationship.bias')
transformers_gpt_double_heads: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not compatible with
ZeroDDP.\n', 'Reduction failed at followed parameters:\n\ttransformer.wte.weight\n\ttransformer.wpe.weight\n\ttransformer.h.0.ln_1.weight\n\ttransformer.h.0.ln_1.bias\n\ttransform
er.h.0.attn.c_attn.weight\n\ttransformer.h.0.attn.c_attn.bias\n\ttransformer.h.0.attn.c_proj.weight\n\ttransformer.h.0.attn.c_proj.bias\n\ttransformer.h.0.ln_2.weight\n\ttransform
er.h.0.ln_2.bias\n\ttransformer.h.0.mlp.c_fc.weight\n\ttransformer.h.0.mlp.c_fc.bias\n\ttransformer.h.0.mlp.c_proj.weight\n\ttransformer.h.0.mlp.c_proj.bias\n\ttransformer.h.1.ln_
1.weight\n\ttransformer.h.1.ln_1.bias\n\ttransformer.h.1.attn.c_attn.weight\n\ttransformer.h.1.attn.c_attn.bias\n\ttransformer.h.1.attn.c_proj.weight\n\ttransformer.h.1.attn.c_pro
j.bias\n\ttransformer.h.1.ln_2.weight\n\ttransformer.h.1.ln_2.bias\n\ttransformer.h.1.mlp.c_fc.weight\n\ttransformer.h.1.mlp.c_fc.bias\n\ttransformer.h.1.mlp.c_proj.weight\n\ttran
sformer.h.1.mlp.c_proj.bias\n\ttransformer.ln_f.weight\n\ttransformer.ln_f.bias\n\tmultiple_choice_head.summary.weight\n\tmultiple_choice_head.summary.bias')
torchaudio_hubert_base: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not compatible
with ZeroDDP.\n', 'Reduction failed at followed parameters:\n\tfeature_extractor.conv_layers.0.layer_norm.weight\n\tfeature_extractor.conv_layers.0.layer_norm.bias\n\tfeature_extr
actor.conv_layers.0.conv.weight\n\tfeature_extractor.conv_layers.1.conv.weight\n\tfeature_extractor.conv_layers.2.conv.weight\n\tfeature_extractor.conv_layers.3.conv.weight\n\tfea
ture_extractor.conv_layers.4.conv.weight\n\tfeature_extractor.conv_layers.5.conv.weight\n\tfeature_extractor.conv_layers.6.conv.weight\n\tencoder.feature_projection.layer_norm.wei
ght\n\tencoder.feature_projection.layer_norm.bias\n\tencoder.feature_projection.projection.weight\n\tencoder.feature_projection.projection.bias\n\tencoder.transformer.pos_conv_emb
ed.conv.bias\n\tencoder.transformer.pos_conv_embed.conv.weight_g\n\tencoder.transformer.pos_conv_embed.conv.weight_v\n\tencoder.transformer.layer_norm.weight\n\tencoder.transforme
r.layer_norm.bias\n\tencoder.transformer.layers.0.attention.k_proj.weight\n\tencoder.transformer.layers.0.attention.k_proj.bias\n\tencoder.transformer.layers.0.attention.v_proj.we
ight\n\tencoder.transformer.layers.0.attention.v_proj.bias\n\tencoder.transformer.layers.0.attention.q_proj.weight\n\tencoder.transformer.layers.0.attention.q_proj.bias\n\tencoder
.transformer.layers.0.attention.out_proj.weight\n\tencoder.transformer.layers.0.attention.out_proj.bias\n\tencoder.transformer.layers.0.layer_norm.weight\n\tencoder.transformer.la
yers.0.layer_norm.bias\n\tencoder.transformer.layers.0.feed_forward.intermediate_dense.weight\n\tencoder.transformer.layers.0.feed_forward.intermediate_dense.bias\n\tencoder.trans
former.layers.0.feed_forward.output_dense.weight\n\tencoder.transformer.layers.0.feed_forward.output_dense.bias\n\tencoder.transformer.layers.0.final_layer_norm.weight\n\tencoder.
transformer.layers.0.final_layer_norm.bias\n\tencoder.transformer.layers.1.attention.k_proj.weight\n\tencoder.transformer.layers.1.attention.k_proj.bias\n\tencoder.transformer.lay
ers.1.attention.v_proj.weight\n\tencoder.transformer.layers.1.attention.v_proj.bias\n\tencoder.transformer.layers.1.attention.q_proj.weight\n\tencoder.transformer.layers.1.attenti
on.q_proj.bias\n\tencoder.transformer.layers.1.attention.out_proj.weight\n\tencoder.transformer.layers.1.attention.out_proj.bias\n\tencoder.transformer.layers.1.layer_norm.weight\
n\tencoder.transformer.layers.1.layer_norm.bias\n\tencoder.transformer.layers.1.feed_forward.intermediate_dense.weight\n\tencoder.transformer.layers.1.feed_forward.intermediate_de
nse.bias\n\tencoder.transformer.layers.1.feed_forward.output_dense.weight\n\tencoder.transformer.layers.1.feed_forward.output_dense.bias\n\tencoder.transformer.layers.1.final_laye
r_norm.weight\n\tencoder.transformer.layers.1.final_layer_norm.bias\n\tencoder.transformer.layers.2.attention.k_proj.weight\n\tencoder.transformer.layers.2.attention.k_proj.bias\n
\tencoder.transformer.layers.2.attention.v_proj.weight\n\tencoder.transformer.layers.2.attention.v_proj.bias\n\tencoder.transformer.layers.2.attention.q_proj.weight\n\tencoder.tra
nsformer.layers.2.attention.q_proj.bias\n\tencoder.transformer.layers.2.attention.out_proj.weight\n\tencoder.transformer.layers.2.attention.out_proj.bias\n\tencoder.transformer.la
yers.2.layer_norm.weight\n\tencoder.transformer.layers.2.layer_norm.bias\n\tencoder.transformer.layers.2.feed_forward.intermediate_dense.weight\n\tencoder.transformer.layers.2.fee
d_forward.intermediate_dense.bias\n\tencoder.transformer.layers.2.feed_forward.output_dense.weight\n\tencoder.transformer.layers.2.feed_forward.output_dense.bias\n\tencoder.transf
ormer.layers.2.final_layer_norm.weight\n\tencoder.transformer.layers.2.final_layer_norm.bias\n\tencoder.transformer.layers.3.attention.k_proj.weight\n\tencoder.transformer.layers.
3.attention.k_proj.bias\n\tencoder.transformer.layers.3.attention.v_proj.weight\n\tencoder.transformer.layers.3.attention.v_proj.bias\n\tencoder.transformer.layers.3.attention.q_p
roj.weight\n\tencoder.transformer.layers.3.attention.q_proj.bias\n\tencoder.transformer.layers.3.attention.out_proj.weight\n\tencoder.transformer.layers.3.attention.out_proj.bias\
n\tencoder.transformer.layers.3.layer_norm.weight\n\tencoder.transformer.layers.3.layer_norm.bias\n\tencoder.transformer.layers.3.feed_forward.intermediate_dense.weight\n\tencoder
.transformer.layers.3.feed_forward.intermediate_dense.bias\n\tencoder.transformer.layers.3.feed_forward.output_dense.weight\n\tencoder.transformer.layers.3.feed_forward.output_den
se.bias\n\tencoder.transformer.layers.3.final_layer_norm.weight\n\tencoder.transformer.layers.3.final_layer_norm.bias\n\tencoder.transformer.layers.4.attention.k_proj.weight\n\ten
coder.transformer.layers.4.attention.k_proj.bias\n\tencoder.transformer.layers.4.attention.v_proj.weight\n\tencoder.transformer.layers.4.attention.v_proj.bias\n\tencoder.transform
er.layers.4.attention.q_proj.weight\n\tencoder.transformer.layers.4.attention.q_proj.bias\n\tencoder.transformer.layers.4.attention.out_proj.weight\n\tencoder.transformer.layers.4
.attention.out_proj.bias\n\tencoder.transformer.layers.4.layer_norm.weight\n\tencoder.transformer.layers.4.layer_norm.bias\n\tencoder.transformer.layers.4.feed_forward.intermediat
e_dense.weight\n\tencoder.transformer.layers.4.feed_forward.intermediate_dense.bias\n\tencoder.transformer.layers.4.feed_forward.output_dense.weight\n\tencoder.transformer.layers.
4.feed_forward.output_dense.bias\n\tencoder.transformer.layers.4.final_layer_norm.weight\n\tencoder.transformer.layers.4.final_layer_norm.bias\n\tencoder.transformer.layers.5.atte
ntion.k_proj.weight\n\tencoder.transformer.layers.5.attention.k_proj.bias\n\tencoder.transformer.layers.5.attention.v_proj.weight\n\tencoder.transformer.layers.5.attention.v_proj.
bias\n\tencoder.transformer.layers.5.attention.q_proj.weight\n\tencoder.transformer.layers.5.attention.q_proj.bias\n\tencoder.transformer.layers.5.attention.out_proj.weight\n\tenc
oder.transformer.layers.5.attention.out_proj.bias\n\tencoder.transformer.layers.5.layer_norm.weight\n\tencoder.transformer.layers.5.layer_norm.bias\n\tencoder.transformer.layers.5
.feed_forward.intermediate_dense.weight\n\tencoder.transformer.layers.5.feed_forward.intermediate_dense.bias\n\tencoder.transformer.layers.5.feed_forward.output_dense.weight\n\ten
coder.transformer.layers.5.feed_forward.output_dense.bias\n\tencoder.transformer.layers.5.final_layer_norm.weight\n\tencoder.transformer.layers.5.final_layer_norm.bias\n\tencoder.
transformer.layers.6.attention.k_proj.weight\n\tencoder.transformer.layers.6.attention.k_proj.bias\n\tencoder.transformer.layers.6.attention.v_proj.weight\n\tencoder.transformer.l
ayers.6.attention.v_proj.bias\n\tencoder.transformer.layers.6.attention.q_proj.weight\n\tencoder.transformer.layers.6.attention.q_proj.bias\n\tencoder.transformer.layers.6.attenti
on.out_proj.weight\n\tencoder.transformer.layers.6.attention.out_proj.bias\n\tencoder.transformer.layers.6.layer_norm.weight\n\tencoder.transformer.layers.6.layer_norm.bias\n\tenc
oder.transformer.layers.6.feed_forward.intermediate_dense.weight\n\tencoder.transformer.layers.6.feed_forward.intermediate_dense.bias\n\tencoder.transformer.layers.6.feed_forward.
output_dense.weight\n\tencoder.transformer.layers.6.feed_forward.output_dense.bias\n\tencoder.transformer.layers.6.final_layer_norm.weight\n\tencoder.transformer.layers.6.final_la
yer_norm.bias\n\tencoder.transformer.layers.7.attention.k_proj.weight\n\tencoder.transformer.layers.7.attention.k_proj.bias\n\tencoder.transformer.layers.7.attention.v_proj.weight
\n\tencoder.transformer.layers.7.attention.v_proj.bias\n\tencoder.transformer.layers.7.attention.q_proj.weight\n\tencoder.transformer.layers.7.attention.q_proj.bias\n\tencoder.tra
nsformer.layers.7.attention.out_proj.weight\n\tencoder.transformer.layers.7.attention.out_proj.bias\n\tencoder.transformer.layers.7.layer_norm.weight\n\tencoder.transformer.layers
.7.layer_norm.bias\n\tencoder.transformer.layers.7.feed_forward.intermediate_dense.weight\n\tencoder.transformer.layers.7.feed_forward.intermediate_dense.bias\n\tencoder.transform
er.layers.7.feed_forward.output_dense.weight\n\tencoder.transformer.layers.7.feed_forward.output_dense.bias\n\tencoder.transformer.layers.7.final_layer_norm.weight\n\tencoder.tran
sformer.layers.7.final_layer_norm.bias\n\tencoder.transformer.layers.8.attention.k_proj.weight\n\tencoder.transformer.layers.8.attention.k_proj.bias\n\tencoder.transformer.layers.
8.attention.v_proj.weight\n\tencoder.transformer.layers.8.attention.v_proj.bias\n\tencoder.transformer.layers.8.attention.q_proj.weight\n\tencoder.transformer.layers.8.attention.q
_proj.bias\n\tencoder.transformer.layers.8.attention.out_proj.weight\n\tencoder.transformer.layers.8.attention.out_proj.bias\n\tencoder.transformer.layers.8.layer_norm.weight\n\te
ncoder.transformer.layers.8.layer_norm.bias\n\tencoder.transformer.layers.8.feed_forward.intermediate_dense.weight\n\tencoder.transformer.layers.8.feed_forward.intermediate_dense.
bias\n\tencoder.transformer.layers.8.feed_forward.output_dense.weight\n\tencoder.transformer.layers.8.feed_forward.output_dense.bias\n\tencoder.transformer.layers.8.final_layer_no
rm.weight\n\tencoder.transformer.layers.8.final_layer_norm.bias\n\tencoder.transformer.layers.9.attention.k_proj.weight\n\tencoder.transformer.layers.9.attention.k_proj.bias\n\ten
coder.transformer.layers.9.attention.v_proj.weight\n\tencoder.transformer.layers.9.attention.v_proj.bias\n\tencoder.transformer.layers.9.attention.q_proj.weight\n\tencoder.transfo
rmer.layers.9.attention.q_proj.bias\n\tencoder.transformer.layers.9.attention.out_proj.weight\n\tencoder.transformer.layers.9.attention.out_proj.bias\n\tencoder.transformer.layers
.9.layer_norm.weight\n\tencoder.transformer.layers.9.layer_norm.bias\n\tencoder.transformer.layers.9.feed_forward.intermediate_dense.weight\n\tencoder.transformer.layers.9.feed_fo
rward.intermediate_dense.bias\n\tencoder.transformer.layers.9.feed_forward.output_dense.weight\n\tencoder.transformer.layers.9.feed_forward.output_dense.bias\n\tencoder.transforme
r.layers.9.final_layer_norm.weight\n\tencoder.transformer.layers.9.final_layer_norm.bias\n\tencoder.transformer.layers.10.attention.k_proj.weight\n\tencoder.transformer.layers.10.
attention.k_proj.bias\n\tencoder.transformer.layers.10.attention.v_proj.weight\n\tencoder.transformer.layers.10.attention.v_proj.bias\n\tencoder.transformer.layers.10.attention.q_
proj.weight\n\tencoder.transformer.layers.10.attention.q_proj.bias\n\tencoder.transformer.layers.10.attention.out_proj.weight\n\tencoder.transformer.layers.10.attention.out_proj.b
ias\n\tencoder.transformer.layers.10.layer_norm.weight\n\tencoder.transformer.layers.10.layer_norm.bias\n\tencoder.transformer.layers.10.feed_forward.intermediate_dense.weight\n\t
encoder.transformer.layers.10.feed_forward.intermediate_dense.bias\n\tencoder.transformer.layers.10.feed_forward.output_dense.weight\n\tencoder.transformer.layers.10.feed_forward.
output_dense.bias\n\tencoder.transformer.layers.10.final_layer_norm.weight\n\tencoder.transformer.layers.10.final_layer_norm.bias\n\tencoder.transformer.layers.11.attention.k_proj
.weight\n\tencoder.transformer.layers.11.attention.k_proj.bias\n\tencoder.transformer.layers.11.attention.v_proj.weight\n\tencoder.transformer.layers.11.attention.v_proj.bias\n\te
ncoder.transformer.layers.11.attention.q_proj.weight\n\tencoder.transformer.layers.11.attention.q_proj.bias\n\tencoder.transformer.layers.11.attention.out_proj.weight\n\tencoder.t
ransformer.layers.11.attention.out_proj.bias\n\tencoder.transformer.layers.11.layer_norm.weight\n\tencoder.transformer.layers.11.layer_norm.bias\n\tencoder.transformer.layers.11.f
eed_forward.intermediate_dense.weight\n\tencoder.transformer.layers.11.feed_forward.intermediate_dense.bias\n\tencoder.transformer.layers.11.feed_forward.output_dense.weight\n\ten
coder.transformer.layers.11.feed_forward.output_dense.bias\n\tencoder.transformer.layers.11.final_layer_norm.weight\n\tencoder.transformer.layers.11.final_layer_norm.bias')

Test

(screenshot of test results attached)

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests
  • I have added docstrings for all the functions/methods I implemented


@ver217 added the "Run Build and Test" and "API" labels on Mar 30, 2023
Review comment thread on colossalai/booster/plugin/gemini_plugin.py
@github-actions
Contributor

The code coverage for the changed files is 38%.

Click me to view the complete report
Name                                                   Stmts   Miss  Cover
--------------------------------------------------------------------------
colossalai/booster/plugin/__init__.py                      4      0   100%
colossalai/booster/plugin/gemini_plugin.py               116     67    42%
tests/test_booster/test_plugin/test_gemini_plugin.py      71     51    28%
--------------------------------------------------------------------------
TOTAL                                                    191    118    38%
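The figures in the coverage table combine in the usual way: each Cover value is (Stmts − Miss) / Stmts, rounded to a whole percent, and the TOTAL row sums the per-file columns before dividing. Recomputing from the rows above (file names copied from the report):

```python
# Recompute the coverage report's percentages from its per-file rows.
rows = {
    "colossalai/booster/plugin/__init__.py": (4, 0),
    "colossalai/booster/plugin/gemini_plugin.py": (116, 67),
    "tests/test_booster/test_plugin/test_gemini_plugin.py": (71, 51),
}
for name, (stmts, miss) in rows.items():
    print(f"{name}: {100 * (stmts - miss) / stmts:.0f}%")

total_stmts = sum(s for s, _ in rows.values())  # 191
total_miss = sum(m for _, m in rows.values())   # 118
print(f"TOTAL: {100 * (total_stmts - total_miss) / total_stmts:.0f}%")  # -> TOTAL: 38%
```

The headline "38%" is therefore dominated by the new plugin file (42%) and its test file (28%), the latter being low because the test process exits early on skipped models.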

@ver217 merged commit 5f2e34e into hpcaitech:main on Mar 31, 2023
@ver217 deleted the feature/booster-gemini branch on Mar 31, 2023 at 08:10

Labels

API (related to API changes)

Development

Successfully merging this pull request may close these issues.

[booster] implement plugin for Gemini

3 participants