Some questions about the vision transformer

Hello, after reading the code alongside your paper, I have the following questions (about vit_csra):

In the code, the class token is not part of the input to the final CSRA module, so why is a class token still defined in "VIT_CSRA"?
Has the final MLP head, which the vision transformer uses for classification, simply been removed?