I was very fortunate to read your paper and found it very inspiring. I ran into a few small problems while training the model and hope you can kindly answer them.
I'm trying to train the model on three RTX 2080Ti GPUs with the following command:
```shell
CUDA_VISIBLE_DEVICES=0,1,2 python -m torch.distributed.launch --nproc_per_node=3 scripts/train_hoim.py --debug --lr_warm_up -p ./logs/self-train-batch6_c/ --batch_size 2 --nw 5 --w_RCNN_loss_bbox 10.0 --epochs 22 --lr 0.003 --distributed
```
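For context, my understanding is that the distributed path converts the model's batch-norm layers with `nn.SyncBatchNorm.convert_sync_batchnorm` (the model below is just an illustrative stand-in, not the repo's actual HOIM network):

```python
import torch.nn as nn

# Illustrative stand-in for the network; the real model lives in the repo.
model = nn.Sequential(nn.Linear(10, 16), nn.BatchNorm1d(16))

# convert_sync_batchnorm swaps every BatchNorm*d layer for SyncBatchNorm;
# in PyTorch 1.2 SyncBatchNorm's forward rejects 2D (N, C) input.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(type(sync_model[1]).__name__)  # -> SyncBatchNorm
```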
I encountered an issue with SyncBatchNorm, which raised "expected at least 3D input (got 2D input)". As far as I can tell, in PyTorch 1.2 SyncBatchNorm is not compatible with BatchNorm1d applied to 2D input, so I replaced the BatchNorm1d in faster_rcnn_hoim.py with the following workaround:
```python
class MyBatchNorm1d(nn.Module):
    def __init__(self, *args):
        super(MyBatchNorm1d, self).__init__()
        self.bn2d = nn.BatchNorm2d(*args)

    def forward(self, x):
        # (N, C) -> (N, C, 1, 1) so BatchNorm2d accepts it, then squeeze back
        x = x[..., None, None]
        x = self.bn2d(x)
        return x[..., 0, 0]
```
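To double-check the wrapper itself, I verified that routing a 2D input through BatchNorm2d this way matches `nn.BatchNorm1d` numerically (a minimal self-contained sketch):

```python
import torch
import torch.nn as nn

class MyBatchNorm1d(nn.Module):
    """BatchNorm1d replacement that routes 2D input through BatchNorm2d."""
    def __init__(self, *args):
        super(MyBatchNorm1d, self).__init__()
        self.bn2d = nn.BatchNorm2d(*args)

    def forward(self, x):
        # (N, C) -> (N, C, 1, 1), normalize, then squeeze back to (N, C)
        return self.bn2d(x[..., None, None])[..., 0, 0]

torch.manual_seed(0)
x = torch.randn(8, 16)
ref, wrapped = nn.BatchNorm1d(16), MyBatchNorm1d(16)
ref.train(); wrapped.train()
# Batch statistics over N for BN1d equal those over N*1*1 for BN2d,
# so the two outputs should agree up to floating-point tolerance.
print(torch.allclose(ref(x), wrapped(x), atol=1e-5))
```

So the wrapper does not change the math; the performance gap seems to come from the distributed setup itself, not from this substitution.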
However, the distributed run ends up with worse performance than a model trained on a single GPU with batch_size=2.
I noticed that the experiments in your paper were run on a single GPU. Did you encounter a similar issue, or have I misconfigured the training?
Thanks!