I tried to reproduce your paper, but it's not clear how the Two-View Transformer is integrated into your framework. Also, for the pretrained Faster R-CNN (ResNeXt-101), did you use the original Visual Genome dataset, or the filtered version with fewer categories?