
Getting inferior results when using distributed training #3


Description

@0x4f5da2

I was very fortunate to read your paper and found it very inspiring. While training the model I ran into a few small problems, and I hope you can kindly help me with them.

I'm trying to train the model with 3 RTX 2080Ti GPUs using:

CUDA_VISIBLE_DEVICES=0,1,2 python -m torch.distributed.launch --nproc_per_node=3 scripts/train_hoim.py --debug --lr_warm_up -p ./logs/self-train-batch6_c/ --batch_size 2 --nw 5 --w_RCNN_loss_bbox 10.0 --epochs 22 --lr 0.003 --distributed

I encountered an issue with SyncBatchNorm, which raised "expected at least 3D input (got 2D input)". According to my investigation, SyncBatchNorm is not compatible with a BatchNorm1d that receives 2D input in PyTorch 1.2, so I substituted the BatchNorm1d in faster_rcnn_hoim.py with a workaround:

import torch.nn as nn

class MyBatchNorm1d(nn.Module):
    """Wraps BatchNorm2d so SyncBatchNorm conversion also works on 2D (N, C) inputs."""

    def __init__(self, *args):
        super(MyBatchNorm1d, self).__init__()
        self.bn2d = nn.BatchNorm2d(*args)

    def forward(self, x):
        # Expand (N, C) to (N, C, 1, 1) so BatchNorm2d / SyncBatchNorm accepts it,
        # then squeeze the trailing singleton dimensions back out.
        x = x[..., None, None]
        x = self.bn2d(x)
        return x[..., 0, 0]
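For context, here is a minimal sketch of how I plug the workaround in, assuming the training script converts the model with torch.nn.SyncBatchNorm.convert_sync_batchnorm before wrapping it in DistributedDataParallel (which is how the SyncBatchNorm error would arise). The layer names and dimensions below are hypothetical, not the actual ones in faster_rcnn_hoim.py; the point is that the inner BatchNorm2d still gets converted, but now always sees a 4D tensor:

import torch.nn as nn

# Hypothetical embedding head; in the real code the BatchNorm1d lives inside
# faster_rcnn_hoim.py and the dimensions differ.
embedding_dim = 256
head = nn.Sequential(
    nn.Linear(2048, embedding_dim),
    MyBatchNorm1d(embedding_dim),   # used in place of nn.BatchNorm1d(embedding_dim)
)

# After torch.distributed.init_process_group(...) has been called:
# head = nn.SyncBatchNorm.convert_sync_batchnorm(head)
# head = nn.parallel.DistributedDataParallel(head.cuda(), device_ids=[local_rank])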

However, I ended up with inferior performance, even compared with a model trained with batch_size=2 on a single GPU.

I noticed you used a single GPU in the experiments of your paper. I'm wondering whether you encountered a similar issue, or whether I did not configure the code correctly.

Thanks!
