[FEATURE]: [PyTorch] per-channel FP8 quantization

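Implement per-channel scaling for FP8 quantization in PyTorch: compute one scale factor per channel instead of a single scale for the whole tensor.
Support PyTorch's native FP8 dtypes, such as `torch.float8_e4m3fn` and `torch.float8_e5m2`.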
References:
- PyTorch tensor data types: https://pytorch.org/docs/stable/tensors.html#id7
- FP8 Formats for Deep Learning (arXiv:2209.05433): https://arxiv.org/pdf/2209.05433