Could you please explain how the alpha parameter affects the training of FVIT? Why does VITB use 0.7, while VITL uses 0.95?