Update lsra to not mark op2 as delay free for certain RMW nodes #9896

@tannergooding

For nodes with RMW (read-modify-write) semantics, we currently mark op2 as delayFree in a number of scenarios: https://github.com/dotnet/coreclr/blob/master/src/jit/lsraxarch.cpp#L659

This can cause suboptimal codegen in certain scenarios, such as when op1 and op2 are the same value (i.e. the same local) and op2 is the final use of that value. Because op2 is delay free, its register cannot be reused for the target, so an extra copy is emitted.

One example of this is:

public static Vector128<long> Test2(long value)
{
    var tmp = Sse2.ConvertScalarToVector128Int64(value);
    return Sse2.UnpackLow(tmp, tmp);
}

With dotnet/coreclr#16808, this currently generates:

66480F6EC2           movd     xmm0, rdx
0F28C8               movaps   xmm1, xmm0
660F6CC8             punpcklqdq xmm1, xmm0
0F1109               movups   xmmword ptr [rcx], xmm1
488BC1               mov      rax, rcx

Not setting delay free for op2 allows us to elide the unneeded copy and instead generate:

66480F6EC2           movd     xmm0, rdx
660F6CC0             punpcklqdq xmm0, xmm0
0F1101               movups   xmmword ptr [rcx], xmm0
488BC1               mov      rax, rcx
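
The condition described above can be sketched as a small toy model. This is not the JIT's actual code: `Operand`, `localNum`, `isLastUse`, and `NeedsDelayFree` are hypothetical names made up for illustration, and the model assumes delay free only matters when op2 is a last use (otherwise its register stays live anyway).

```cpp
// Hypothetical, simplified stand-in for an operand of an RMW node.
// None of these names come from the JIT sources.
struct Operand
{
    int  localNum;  // which local variable this operand reads
    bool isLastUse; // true if this is the final use of that local
};

// Toy predicate: should op2 be kept live past the point where the
// RMW node's target register is written?
bool NeedsDelayFree(const Operand& op1, const Operand& op2)
{
    // Same local: the target (which must match op1's register) already
    // holds op2's value, so sharing the register cannot clobber anything.
    if (op1.localNum == op2.localNum)
    {
        return false;
    }

    // Different locals: in this model only a last-use op2 is at risk of
    // having its register handed to the target before it is read.
    return op2.isLastUse;
}
```

In the `Sse2.UnpackLow(tmp, tmp)` case both operands are the same local, so this model would not mark op2 as delay free, which matches the shorter sequence above.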

category:cq
theme:register-allocator
skill-level:intermediate
cost:small

Labels

area-CodeGen-coreclr, enhancement, optimization, tenet-performance
