Is your feature request related to a problem? Please describe.
With a ToDevice transform placed after ToTensor or EnsureType, we could move the data to the GPU and leverage CacheDataset to cache the result, avoiding a repeated CPU -> GPU copy in every epoch. It would also allow subsequent transforms to run on the GPU for further acceleration.
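To illustrate the intended benefit, here is a minimal, dependency-free sketch. It is not MONAI code: `FakeToDevice`, `UncachedDataset`, and `SimpleCacheDataset` are hypothetical stand-ins, and the "copy" only tags the item and increments a counter instead of touching a real GPU. It assumes the cache stores the output of the device-transfer step, so the transfer happens once per item rather than once per item per epoch.

```python
from typing import Callable, Dict, List, Sequence


class FakeToDevice:
    """Hypothetical stand-in for the requested ToDevice transform.

    No real GPU is involved: the item is only tagged with a device name
    and a counter records how many simulated CPU -> GPU copies happened.
    """

    def __init__(self, device: str = "cuda:0") -> None:
        self.device = device
        self.copies = 0

    def __call__(self, item: Dict) -> Dict:
        self.copies += 1
        return {**item, "device": self.device}


class UncachedDataset:
    """Applies every transform on each access (one copy per item per epoch)."""

    def __init__(self, data: Sequence[Dict], transforms: Sequence[Callable]) -> None:
        self.data: List[Dict] = list(data)
        self.transforms: List[Callable] = list(transforms)

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Dict:
        item = self.data[index]
        for transform in self.transforms:
            item = transform(item)
        return item


class SimpleCacheDataset(UncachedDataset):
    """Toy cache (assumption): runs all transforms once up front and stores the results."""

    def __init__(self, data: Sequence[Dict], transforms: Sequence[Callable]) -> None:
        super().__init__(data, transforms)
        self._cache = [UncachedDataset.__getitem__(self, i) for i in range(len(self.data))]

    def __getitem__(self, index: int) -> Dict:
        # Served from the cache, so no transform (and no copy) runs here.
        return self._cache[index]


def run_epochs(dataset: UncachedDataset, epochs: int) -> None:
    """Iterate over the whole dataset once per epoch."""
    for _ in range(epochs):
        for i in range(len(dataset)):
            _ = dataset[i]


raw = [{"image": [float(i)], "device": "cpu"} for i in range(4)]

uncached_copy = FakeToDevice()
run_epochs(UncachedDataset(raw, [uncached_copy]), epochs=3)

cached_copy = FakeToDevice()
cached_ds = SimpleCacheDataset(raw, [cached_copy])
run_epochs(cached_ds, epochs=3)

print(uncached_copy.copies, cached_copy.copies)  # 12 4
```

With 4 items and 3 epochs, the uncached pipeline performs 12 simulated copies while the cached one performs 4, all during cache construction. Any transform placed after the device transfer would then receive data that is already on the target device.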