Is your feature request related to a problem? Please describe.
Currently, the ViT backbone is designed to serve only as a backbone for segmentation models, in particular the UNETR model. It would be desirable for it to support classification as well.
Describe the solution you'd like
A PR that addresses this by adding a class embedding (a class token, self.cls_token) to the patch embedding block used in the ViT class.
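
To make the request concrete, here is a minimal sketch of how a class token could be wired into a patch embedding block, assuming plain PyTorch. The names `PatchEmbeddingWithClsToken`, `TinyViTClassifier`, and the `classification` flag are hypothetical and only for illustration; they are not the existing API, and the generic `nn.TransformerEncoder` stands in for the actual ViT blocks.

```python
import torch
import torch.nn as nn


class PatchEmbeddingWithClsToken(nn.Module):
    """Hypothetical 2D patch embedding that can prepend a learnable class token."""

    def __init__(self, in_channels, img_size, patch_size, hidden_size, classification=False):
        super().__init__()
        self.classification = classification
        n_patches = (img_size // patch_size) ** 2
        # Non-overlapping patches are projected with a strided convolution.
        self.proj = nn.Conv2d(in_channels, hidden_size, kernel_size=patch_size, stride=patch_size)
        # Reserve one extra position for the class token when classifying.
        n_tokens = n_patches + 1 if classification else n_patches
        self.position_embeddings = nn.Parameter(torch.zeros(1, n_tokens, hidden_size))
        if classification:
            self.cls_token = nn.Parameter(torch.zeros(1, 1, hidden_size))

    def forward(self, x):
        x = self.proj(x).flatten(2).transpose(1, 2)  # (batch, n_patches, hidden_size)
        if self.classification:
            # Broadcast the single learnable token over the batch and prepend it.
            cls = self.cls_token.expand(x.shape[0], -1, -1)
            x = torch.cat((cls, x), dim=1)
        return x + self.position_embeddings


class TinyViTClassifier(nn.Module):
    """Hypothetical classifier that reads the class token after the encoder."""

    def __init__(self, in_channels=1, img_size=32, patch_size=8,
                 hidden_size=64, num_heads=4, num_layers=2, num_classes=3):
        super().__init__()
        self.embed = PatchEmbeddingWithClsToken(
            in_channels, img_size, patch_size, hidden_size, classification=True
        )
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads,
            dim_feedforward=4 * hidden_size, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.norm = nn.LayerNorm(hidden_size)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        tokens = self.norm(self.encoder(self.embed(x)))
        # Token 0 is the class token; its final state feeds the classification head.
        return self.head(tokens[:, 0])


if __name__ == "__main__":
    model = TinyViTClassifier()
    logits = model(torch.randn(2, 1, 32, 32))
    print(logits.shape)
```

With `classification=False` the block returns only the patch tokens, so the segmentation use case (e.g. UNETR) would presumably be unaffected; with `classification=True` the sequence grows by one token, which any consumer that reshapes the tokens back into a feature map would need to account for.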