Status: Open
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue)
Description
When using Vector256<T>.Zero, I would expect the zero vector to be materialized once, kept in a register, and reused. Instead, the JIT emits a separate vxorps for every use.
AVX2 has 16 YMM registers, so there should be room to keep one of them dedicated to zero.
Below is one example. I can get the desired behavior by using an explicit zero-vector local instead of Vector256<T>.Zero.
However, simply assigning Vector256<byte>.Zero to a local is not enough; only the extra Avx2.Xor(zero, zero) makes the JIT keep it in a single register.
Using Vector256<T>.Zero directly:
var byteVector = Vector256.LoadUnsafe<byte>(ref spanRef);
var low = Avx2.UnpackLow(byteVector, Vector256<byte>.Zero);
var high = Avx2.UnpackHigh(byteVector, Vector256<byte>.Zero);
var added = Avx2.Add(low.AsInt16(), high.AsInt16());
added = Avx2.HorizontalAdd(added, Vector256<short>.Zero);
added = Avx2.HorizontalAdd(added, Vector256<short>.Zero);
added = Avx2.HorizontalAdd(added, Vector256<short>.Zero);
//ASM
mov rax, bword ptr [rcx]
vmovdqu ymm0, ymmword ptr[rax]
vxorps ymm1, ymm1, ymm1
vpunpcklbw ymm1, ymm0, ymm1
vxorps ymm2, ymm2, ymm2
vpunpckhbw ymm0, ymm0, ymm2
vpaddw ymm0, ymm1, ymm0
vxorps ymm1, ymm1, ymm1
vphaddw ymm0, ymm0, ymm1
vxorps ymm1, ymm1, ymm1
vphaddw ymm0, ymm0, ymm1
vxorps ymm1, ymm1, ymm1
vphaddw ymm0, ymm0, ymm1
With an explicit zero local and the xor workaround:
var byteVector = Vector256.LoadUnsafe<byte>(ref spanRef);
var zero = Vector256<byte>.Zero;
zero = Avx2.Xor(zero, zero); // workaround: keeps zero in a single register
var low = Avx2.UnpackLow(byteVector, zero);
var high = Avx2.UnpackHigh(byteVector, zero);
var added = Avx2.Add(low.AsInt16(), high.AsInt16());
added = Avx2.HorizontalAdd(added, zero.AsInt16());
added = Avx2.HorizontalAdd(added, zero.AsInt16());
added = Avx2.HorizontalAdd(added, zero.AsInt16());
//ASM
mov rax, bword ptr [rcx]
vxorps ymm0, ymm0, ymm0
vmovdqu ymm1, ymmword ptr[rax]
vpunpcklbw ymm2, ymm1, ymm0
vpunpckhbw ymm1, ymm1, ymm0
vpaddw ymm1, ymm2, ymm1
vphaddw ymm1, ymm1, ymm0
vphaddw ymm1, ymm1, ymm0
vphaddw ymm1, ymm1, ymm0
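To compare both variants side by side, the two snippets can be wrapped into a small self-contained program. This is a sketch under stated assumptions: the class name `ZeroReuseRepro`, the method names, and the `ReadOnlySpan<byte>` parameter are hypothetical (the original snippets only show `ref spanRef`), and `Vector256.LoadUnsafe` requires .NET 7 or later. Each method is marked `NoInlining` so it is compiled as its own unit, and `AggressiveOptimization` so the JIT produces optimized code immediately instead of tier-0 code. On .NET 7+ the generated assembly can be dumped by setting the `DOTNET_JitDisasm` environment variable to a method name, e.g. `DOTNET_JitDisasm=WithConstantZero`.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical repro harness; names are illustrative, not from the original issue.
public static class ZeroReuseRepro
{
    // Pattern from the issue: Vector256<T>.Zero at every use site.
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
    public static Vector256<short> WithConstantZero(ReadOnlySpan<byte> span)
    {
        if (span.Length < Vector256<byte>.Count)
            throw new ArgumentException("Need at least 32 bytes.", nameof(span));

        ref byte spanRef = ref MemoryMarshal.GetReference(span);
        var byteVector = Vector256.LoadUnsafe(ref spanRef);
        var low = Avx2.UnpackLow(byteVector, Vector256<byte>.Zero);   // zero-extend bytes 0-7 / 16-23
        var high = Avx2.UnpackHigh(byteVector, Vector256<byte>.Zero); // zero-extend bytes 8-15 / 24-31
        var added = Avx2.Add(low.AsInt16(), high.AsInt16());
        added = Avx2.HorizontalAdd(added, Vector256<short>.Zero);
        added = Avx2.HorizontalAdd(added, Vector256<short>.Zero);
        added = Avx2.HorizontalAdd(added, Vector256<short>.Zero);
        return added;
    }

    // Workaround from the issue: xor the zero local with itself.
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
    public static Vector256<short> WithHoistedZero(ReadOnlySpan<byte> span)
    {
        if (span.Length < Vector256<byte>.Count)
            throw new ArgumentException("Need at least 32 bytes.", nameof(span));

        ref byte spanRef = ref MemoryMarshal.GetReference(span);
        var byteVector = Vector256.LoadUnsafe(ref spanRef);
        var zero = Vector256<byte>.Zero;
        zero = Avx2.Xor(zero, zero);
        var low = Avx2.UnpackLow(byteVector, zero);
        var high = Avx2.UnpackHigh(byteVector, zero);
        var added = Avx2.Add(low.AsInt16(), high.AsInt16());
        added = Avx2.HorizontalAdd(added, zero.AsInt16());
        added = Avx2.HorizontalAdd(added, zero.AsInt16());
        added = Avx2.HorizontalAdd(added, zero.AsInt16());
        return added;
    }

    public static void Main()
    {
        if (!Avx2.IsSupported)
        {
            Console.WriteLine("AVX2 is not supported on this machine.");
            return;
        }

        byte[] data = new byte[32];
        for (int i = 0; i < data.Length; i++) data[i] = (byte)i;

        var a = WithConstantZero(data);
        var b = WithHoistedZero(data);

        // HorizontalAdd works per 128-bit lane, so after three rounds element 0
        // holds the sum of bytes 0..15 and element 8 the sum of bytes 16..31.
        Console.WriteLine($"{a.GetElement(0)} {a.GetElement(8)} same={a.Equals(b)}");
    }
}
```

On an AVX2 machine this prints `120 376 same=True`: both variants compute the same result, so any difference is purely in the generated code (repeated `vxorps` versus a single zeroed register).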
category:cq
theme:cse
skill-level:intermediate
cost:medium
impact:small