Closed
Labels
area-CodeGen-coreclr, enhancement, optimization, tenet-performance
Description
For nodes with read-modify-write (RMW) semantics, we currently mark op2 as delayFree in a number of scenarios: https://github.com/dotnet/coreclr/blob/master/src/jit/lsraxarch.cpp#L659
This can produce suboptimal codegen in certain cases, such as when op1 and op2 are the same value (i.e. the same local) and op2 is the "final use" of that value: the register allocator then emits a register copy that is not needed.
One example of this is:
    public static Vector128<long> Test2(long value)
    {
        var tmp = Sse2.ConvertScalarToVector128Int64(value);
        return Sse2.UnpackLow(tmp, tmp);
    }

Where this currently (with dotnet/coreclr#16808) generates:
    66480F6EC2           movd     xmm0, rdx
    0F28C8               movaps   xmm1, xmm0
    660F6CC8             punpcklqdq xmm1, xmm0
    0F1109               movups   xmmword ptr [rcx], xmm1
    488BC1               mov      rax, rcx

Not setting delay free for op2 allows us to elide the unneeded copy and instead generate:
    66480F6EC2           movd     xmm0, rdx
    660F6CC0             punpcklqdq xmm0, xmm0
    0F1101               movups   xmmword ptr [rcx], xmm0
    488BC1               mov      rax, rcx

category:cq
theme:register-allocator
skill-level:intermediate
cost:small
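
The condition described in this issue can be modelled with a small predicate. This is only an illustrative sketch, not the code in lsraxarch.cpp: the `Operand` struct and the `NeedsDelayFree` function are hypothetical names invented here, and the model assumes that the only reason to delay-free op2 is to stop the result register (shared with op1) from overwriting a different value that op2 still needs to read.

```cpp
// Hypothetical, simplified stand-in for an operand of an RMW node.
// This is NOT the JIT's GenTree/RefPosition representation.
struct Operand
{
    int  localNum;  // which local variable the operand reads
    bool isLastUse; // true if this read is the final use of that local
};

// Returns true when op2 should be kept "delay free", i.e. its register
// stays reserved until after the node's result register has been defined.
// Assumption: the result of an RMW node shares a register with op1, so the
// hazard is the result landing on a register holding a *different* op2 value.
bool NeedsDelayFree(const Operand& op1, const Operand& op2)
{
    const bool sameValue = (op1.localNum == op2.localNum);

    // The case from this issue: both operands read the same local and op2 is
    // its final use, so writing the result over that register is harmless.
    if (sameValue && op2.isLastUse)
    {
        return false;
    }

    // Every other case is treated conservatively in this sketch.
    return true;
}
```

In this model, `Sse2.UnpackLow(tmp, tmp)` corresponds to two operands with the same `localNum` where op2 is the last use, so the predicate returns `false` and nothing forces the extra `movaps`.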