Hello, after reading the code alongside your paper, I have the following questions about vit_csra:
In the code, the class token is not passed to the final CSRA module, so why is a class token still defined in "VIT_CSRA"?
Has the final MLP head that the Vision Transformer uses for classification simply been removed?
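To make the first question concrete, here is a minimal sketch of what I understand the forward pass to do. It is only my reading of the behaviour, not the repository's code: the function name `drop_class_token` and the toy two-dimensional embeddings are made up for illustration.

```python
# Hypothetical illustration; none of these names come from the repository.

def drop_class_token(tokens):
    """Return the sequence without its first entry (assumed to be the class token)."""
    return tokens[1:]

# Toy embeddings for one image: one class token plus three patch tokens.
cls_token = [0.0, 0.0]
patch_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# The transformer blocks see the class token prepended to the patch tokens.
sequence = [cls_token] + patch_tokens

# As I understand it, only the patch tokens reach the final CSRA module.
head_input = drop_class_token(sequence)

print(len(sequence), len(head_input))
```

If that reading is right, the class token takes part in self-attention inside the blocks but is discarded before classification, which is what prompts the question.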