Hello, thank you for such amazing work!
Could you please provide some detail on how you evaluated the Attend-and-Excite method? Since this method requires token indices as an input, how did you select the tokens when producing the results for your VISOR benchmark (Table 3 in the paper)?