Attend-and-Excite

Hello, thank you for such amazing work!

Could you please provide some detail on how you evaluated the Attend-and-Excite method? Since this method requires token indices as input, how did you select the tokens when producing the results for your VISOR benchmark (Table 3 in the paper)?