
Getting inferior results when using distributed training #3


Description

@0x4f5da2

I was very fortunate to read your paper and found it very inspiring. While training the model I ran into a few small problems, and I hope you can kindly help me with them.

I'm trying to train the model with 3 RTX 2080Ti GPUs using:

CUDA_VISIBLE_DEVICES=0,1,2 python -m torch.distributed.launch --nproc_per_node=3 scripts/train_hoim.py --debug --lr_warm_up -p ./logs/self-train-batch6_c/ --batch_size 2 --nw 5 --w_RCNN_loss_bbox 10.0 --epochs 22 --lr 0.003 --distributed

I encountered an issue with SyncBatchNorm, which raised "expected at least 3D input (got 2D input)". According to my investigation, SyncBatchNorm is not compatible with a BatchNorm1d that receives 2D input in PyTorch 1.2, so I substituted the BatchNorm1d in faster_rcnn_hoim.py with a workaround:

import torch.nn as nn

class MyBatchNorm1d(nn.Module):
    """Wraps BatchNorm2d so SyncBatchNorm conversion also works on 2D (N, C) inputs."""

    def __init__(self, *args):
        super(MyBatchNorm1d, self).__init__()
        self.bn2d = nn.BatchNorm2d(*args)

    def forward(self, x):
        # Expand (N, C) to (N, C, 1, 1) so BatchNorm2d / SyncBatchNorm accepts it,
        # then squeeze the trailing singleton dimensions back out.
        x = x[..., None, None]
        x = self.bn2d(x)
        return x[..., 0, 0]
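For context, here is a minimal sketch of how I plug the workaround in, assuming the training script converts the model with torch.nn.SyncBatchNorm.convert_sync_batchnorm before wrapping it in DistributedDataParallel (which is how the SyncBatchNorm error would arise). The layer names and dimensions below are hypothetical, not the actual ones in faster_rcnn_hoim.py; the point is that the inner BatchNorm2d still gets converted, but now always sees a 4D tensor:

import torch.nn as nn

# Hypothetical embedding head; in the real code the BatchNorm1d lives inside
# faster_rcnn_hoim.py and the dimensions differ.
embedding_dim = 256
head = nn.Sequential(
    nn.Linear(2048, embedding_dim),
    MyBatchNorm1d(embedding_dim),   # used in place of nn.BatchNorm1d(embedding_dim)
)

# After torch.distributed.init_process_group(...) has been called:
# head = nn.SyncBatchNorm.convert_sync_batchnorm(head)
# head = nn.parallel.DistributedDataParallel(head.cuda(), device_ids=[local_rank])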

However, I ended up with inferior performance, even compared with a model trained with batch_size=2 on a single GPU.

I noticed you used a single GPU in the experiments of your paper. I'm wondering whether you encountered a similar issue, or whether I did not configure the code correctly.

Thanks!
