assert bool((rel_pair_idx == pair_idx[vr_indices]).all()) #2

@tsamoura

Dear authors,

Congratulations on the very nice work! I ran your code for SGDET and got an assertion error. In particular, I ran this command:

CUDA_VISIBLE_DEVICES=6 \
python tools/relation_train_net.py \
 --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
 MODEL.ROI_RELATION_HEAD.USE_GT_BOX False \
 MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False \
 MODEL.ROI_RELATION_HEAD.PREDICTOR RUNetPredictor \
 SOLVER.IMS_PER_BATCH 1 \
 TEST.IMS_PER_BATCH 1 \
 DTYPE "float16" \
 SOLVER.PRE_VAL True \
 SOLVER.BASE_LR 0.0025 \
 MODEL.ROI_RELATION_HEAD.L21_LOSS 0.7 \
 MODEL.PRETRAINED_DETECTOR_CKPT ~/checkpoints/pretrained_faster_rcnn/model_final.pth \
 OUTPUT_DIR ~/checkpoints/runet-sgdet

and I got the exception:

maskrcnn_benchmark INFO: -------------------------------
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Traceback (most recent call last):
  File "tools/relation_train_net.py", line 379, in <module>
    main()
  File "tools/relation_train_net.py", line 372, in main
    model = train(cfg, args.local_rank, args.distributed, logger)
  File "tools/relation_train_net.py", line 147, in train
    loss_dict = model(images, targets)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ru_net/RU-Net/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 52, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets, logger)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ru_net/RU-Net/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 69, in forward
    x, detections, loss_relation = self.relation(features, detections, targets, logger)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ru_net/RU-Net/maskrcnn_benchmark/modeling/roi_heads/relation_head/relation_head.py", line 94, in forward
    refine_logits, relation_logits, add_losses = self.predictor(proposals, rel_pair_idxs, full_pair_idxs, rel_labels, rel_binarys, roi_features, union_features, logger)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ru_net/RU-Net/maskrcnn_benchmark/modeling/roi_heads/relation_head/roi_relation_predictors.py", line 819, in forward
    assert bool((rel_pair_idx == pair_idx[vr_indices]).all())
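To help narrow this down, the bare assert could be given a descriptive message that shows which pairs disagree. The following is only a minimal sketch: `describe_pair_mismatch` is a hypothetical helper that is not part of RU-Net, and it assumes that `rel_pair_idx` and `pair_idx[vr_indices]` (names taken from the traceback) are N x 2 collections of integer index pairs.

```python
def describe_pair_mismatch(rel_pair_idx, selected_pair_idx, max_rows=5):
    """Report how two collections of (subject, object) index pairs differ.

    Hypothetical debugging helper, not part of RU-Net. Accepts torch tensors,
    NumPy arrays, or nested lists; anything exposing .tolist() is converted
    to plain Python lists before comparison.
    """
    def to_list(x):
        return x.tolist() if hasattr(x, "tolist") else [list(row) for row in x]

    expected = to_list(rel_pair_idx)
    actual = to_list(selected_pair_idx)

    # Different lengths mean the two tensors do not describe the same set of pairs.
    if len(expected) != len(actual):
        return (f"length mismatch: {len(expected)} sampled pairs "
                f"vs {len(actual)} selected pairs")

    # Collect the row indices where the pairs disagree.
    bad = [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
    if not bad:
        return "pairs match"

    shown = bad[:max_rows]
    lines = [f"{len(bad)} of {len(expected)} pairs differ; first {len(shown)}:"]
    for i in shown:
        lines.append(f"  row {i}: rel_pair_idx={expected[i]} selected={actual[i]}")
    return "\n".join(lines)


# Assumed usage at the failing line in roi_relation_predictors.py:
# assert bool((rel_pair_idx == pair_idx[vr_indices]).all()), \
#     describe_pair_mismatch(rel_pair_idx, pair_idx[vr_indices])
```

With this in place, the AssertionError message would list the first few differing rows, which might show whether the pairs are reordered or whether different pairs were selected.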

Note that I get the same assertion error with multiple GPUs, i.e., when running this command:

CUDA_VISIBLE_DEVICES=6,7 \
python -m torch.distributed.launch \
 --master_port 15026 \
 --nproc_per_node=2 \
 tools/relation_train_net.py \
 --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
 MODEL.ROI_RELATION_HEAD.USE_GT_BOX False \
 MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False \
 MODEL.ROI_RELATION_HEAD.PREDICTOR RUNetPredictor \
 SOLVER.IMS_PER_BATCH 2 \
 TEST.IMS_PER_BATCH 2 \
 DTYPE "float16" \
 SOLVER.PRE_VAL True \
 SOLVER.BASE_LR 0.0025 \
 MODEL.ROI_RELATION_HEAD.L21_LOSS 0.7 \
 MODEL.PRETRAINED_DETECTOR_CKPT ~/checkpoints/pretrained_faster_rcnn/model_final.pth \
 OUTPUT_DIR ~/checkpoints/runet-sgdet-2gpus

Any suggestions for fixing the issue?

Many thanks!
