Support AMP: FP8 (O1) + BF16 (O2), using current scaling.
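For readers unfamiliar with "current scaling", here is a minimal stdlib-only sketch of per-tensor FP8 quantization. It assumes the E4M3 FP8 format (largest finite value 448, 3 mantissa bits). It also assumes that current scaling means the scale factor is derived from the amax of the tensor being cast at that step, rather than from an amax history as in delayed scaling. The helper names (`round_to_e4m3`, `quantize_current_scaling`, `dequantize`) are hypothetical and are not this project's implementation. The sketch does not cover the BF16 (O2) path.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format (assumed target format)


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest value on the E4M3 grid (no clamping here)."""
    if abs(v) < 2.0 ** -6:
        step = 2.0 ** -9  # subnormal spacing below the smallest normal (2**-6)
    else:
        _, e = math.frexp(v)  # v = m * 2**e with 0.5 <= |m| < 1
        step = 2.0 ** (e - 4)  # 1 implicit bit + 3 mantissa bits
    return round(v / step) * step


def quantize_current_scaling(xs):
    """Scale by the *current* tensor's amax, then round to FP8 and clamp."""
    amax = max((abs(v) for v in xs), default=0.0)
    scale = FP8_E4M3_MAX / amax if amax > 0 else 1.0  # map amax onto the FP8 max
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, round_to_e4m3(v * scale))) for v in xs]
    return q, scale


def dequantize(q, scale):
    """Undo the scaling to recover approximate original values."""
    return [v / scale for v in q]


q, scale = quantize_current_scaling([1.0, 0.3])
print(scale, q, dequantize(q, scale))
```

Because the scale is recomputed from each tensor as it is cast, no amax history has to be stored. The cost is an extra reduction over the tensor before every cast.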