In long_range_main.py the use_tvm arg defaults to False, and the sample scripts never set it. But if this arg is False, it seems that pyramidal attention is not used anywhere in the model, even though it is the main contribution of the paper.
So should this arg be set to True when I want to use pyramidal attention to reduce the computational cost?