Currently, RyuJIT generates AVX instructions (VEX-encoding) only when AVX2 is available:
- Floating-point calculations and SIMD code use VEX-encoded instructions on AVX2-capable machines (Haswell and above).
- SIMD vectors (`System.Numerics.Vector<T>`) are 256 bits wide (YMM) on AVX2-capable machines and 128 bits wide (XMM) on machines that support only AVX (Sandy Bridge) or a lower ISA level.
However, we will broadly use AVX instructions via Intel hardware intrinsics even if the underlying hardware has no AVX2. On such machines, the current codegen strategy still emits legacy SSE encodings for floating-point calculations and `System.Numerics.Vectors` code, so mixing them with `Avx` intrinsics may trigger AVX-SSE transition penalties. The new VEX-encoding selection strategy should be:
- Floating-point calculations use VEX-encoded instructions on AVX-capable machines (Sandy Bridge and above).
- SIMD vectors (`System.Numerics.Vector<T>`) are 256 bits wide (YMM) on AVX2-capable machines and 128 bits wide (XMM) on machines that support only AVX (Sandy Bridge) or a lower ISA level. // no change
- SIMD code (`System.Numerics.Vectors`) is compiled to instructions that have the VEX.128 prefix and operate on XMM registers on AVX-capable machines, and to instructions that have the VEX.256 prefix and operate on YMM registers on AVX2-capable machines.
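
To make the difference between the two strategies concrete, here is a minimal C++ sketch that encodes only the rules listed above. All names (`IsaLevel`, `CodegenChoice`, `currentStrategy`, `proposedStrategy`) are hypothetical and are not taken from the RyuJIT sources; the real JIT makes these decisions in its own way.

```cpp
// Hypothetical ISA levels, ordered from lowest to highest.
enum class IsaLevel { Sse2, Avx, Avx2 };

// Hypothetical summary of the encoding decisions discussed in this issue.
struct CodegenChoice {
    bool fpUsesVex;    // floating-point calculations use VEX encoding
    bool simdUsesVex;  // System.Numerics.Vectors code uses VEX encoding
    int  vectorBits;   // size of Vector<T>: 128 (XMM) or 256 (YMM)
};

// Current behavior as described above: VEX encoding only on AVX2 machines.
CodegenChoice currentStrategy(IsaLevel isa) {
    const bool avx2 = (isa == IsaLevel::Avx2);
    return { avx2, avx2, avx2 ? 256 : 128 };
}

// Proposed behavior: VEX encoding on any AVX-capable machine.
// Vector<T> keeps its size, so SIMD code would use VEX.128 on AVX-only
// machines and VEX.256 on AVX2 machines.
CodegenChoice proposedStrategy(IsaLevel isa) {
    const bool avx  = (isa >= IsaLevel::Avx);
    const bool avx2 = (isa == IsaLevel::Avx2);
    return { avx, avx, avx2 ? 256 : 128 };
}
```

The only inputs on which the two functions disagree are AVX-only machines (`IsaLevel::Avx`): the current strategy leaves floating-point and SIMD code in legacy SSE encoding there, while the proposed one moves both to VEX encoding without changing the size of `Vector<T>`.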
I will provide a PR after finishing dotnet/coreclr#14020.