
RyuJIT SIMD: front-end optimizations for SIMD types #6742

@sivarv


There are quite a few front-end optimizations that RyuJIT could perform on SIMD vector types:

1) Constant vector propagation.
For example, the back end (lowerxarch.cpp) checks whether an (in)equality operation compares against a zero vector, but it only looks at op2. If the front-end phases propagated constant vectors and ensured that they appear as op2 of a comparison, the back end would be more likely to apply this optimization.
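A minimal sketch of this front-end step, written in C++ (the language RyuJIT is implemented in) over a hypothetical toy IR rather than RyuJIT's real GenTree nodes. Constants known for a local are propagated into both operands of an (in)equality node, and because (in)equality is commutative, a constant operand is then swapped into op2. The names `Operand`, `Compare`, `propagate`, `optimizeCompare`, and `backEndSeesZeroCompare` are illustrative assumptions, not JIT APIs.

```cpp
#include <array>
#include <map>
#include <utility>

using Vec4i = std::array<int, 4>;

// Hypothetical toy operand: either a constant vector or a local variable.
struct Operand {
    bool isConstVector;
    Vec4i value;   // valid only when isConstVector is true
    int localNum;  // valid only when isConstVector is false
};

// Toy (in)equality node. == and != are commutative, so op1 and op2 may be
// swapped without changing the result.
struct Compare {
    bool isEqual;  // true for ==, false for !=
    Operand op1;
    Operand op2;
};

bool isZeroVector(const Operand& o) {
    if (!o.isConstVector) return false;
    for (int lane : o.value)
        if (lane != 0) return false;
    return true;
}

// Stand-in for the lowering check described above: only op2 is inspected.
bool backEndSeesZeroCompare(const Compare& c) { return isZeroVector(c.op2); }

// Constant propagation: replace a local with the constant vector it is
// known to hold at this point (e.g. after `v = Vector<int>.Zero`).
void propagate(Operand& o, const std::map<int, Vec4i>& knownConstants) {
    if (o.isConstVector) return;
    auto it = knownConstants.find(o.localNum);
    if (it != knownConstants.end()) {
        o.isConstVector = true;
        o.value = it->second;
    }
}

// Front-end pass: propagate into both operands, then move a constant
// operand into op2 so the back-end check can recognize it.
void optimizeCompare(Compare& c, const std::map<int, Vec4i>& knownConstants) {
    propagate(c.op1, knownConstants);
    propagate(c.op2, knownConstants);
    if (c.op1.isConstVector && !c.op2.isConstVector)
        std::swap(c.op1, c.op2);
}
```

For `v = Vector<int>.Zero; if (v == x)`, the zero vector starts out hidden behind a local in op1; after `optimizeCompare` it sits in op2, where the back-end check can see it.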

2) CSE'ing of operations on SIMD types
The costs computed in SetEvalOrder() need to be updated for SIMD types, since CSE relies on them to decide which expressions are worth eliminating. Depending on the target, the SIMD vector size is either 16 bytes (SSE2 machines) or 32 bytes (AVX2 machines). Therefore these costs cannot be static constants; they should be a function of the vector size.

When there are two successive indexed accesses to the same SIMD vector, RyuJIT repeats the vextractf128/shift sequence for each access instead of reusing the extracted value. Consider the following code:

```csharp
offset = 2;
if (_vectorUlongSpan < 4 || (u = vector64[2]) == 0)
{
    offset = 3;
    if (_vectorUlongSpan < 4 || (u = vector64[3]) == 0)
    {
        // ...
    }
}
```

RyuJIT generated the following assembly for it:

```asm
vextractf128 xmm1,ymm0,1
vmovd       rdi,xmm1
test        rdi,rdi
jne         00007FFF07225841
mov         esi,3
vextractf128 xmm1,ymm0,1
vpsrldq     ymm1,ymm1,8
vmovd       rdi,xmm1
```

The second `vextractf128 xmm1,ymm0,1` recomputes a value that is already available. One possible approach is to expand SIMD vector index operations into Extract and Shift operations early in the front end. The CSE phase could then evaluate the Extract operation into a temp once and replace every later redundant occurrence with that temp.
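The expand-then-CSE route can be sketched on a hypothetical toy IR (not RyuJIT's). `expandIndex` lowers `vector64[lane]` on a 4 x 64-bit vector into `extract128` (select the 128-bit half), an optional 8-byte `shiftBytes`, and `movq`, mirroring the instruction sequence shown above. `cse` then performs local common-subexpression elimination: the first instruction with a given (op, src, imm) is kept, and later duplicates are dropped with their uses renamed. It assumes every instruction is pure and every temp is assigned exactly once; all names are illustrative.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical toy instruction: dst = op(src, imm).
struct Instr {
    std::string op;  // "extract128", "shiftBytes" or "movq"
    int dst;         // destination temp id
    int src;         // source temp id
    int imm;         // immediate operand
};

// Expand vector64[lane] (4 x 64-bit lanes held in temp `vec`) into
// extract128 (pick the 128-bit half), an optional 8-byte shift for the
// odd lane of that half, and a movq to a scalar temp.
int expandIndex(std::vector<Instr>& code, int& nextTemp, int vec, int lane) {
    int half = nextTemp++;
    code.push_back({"extract128", half, vec, lane / 2});
    int src = half;
    if (lane % 2 != 0) {
        int shifted = nextTemp++;
        code.push_back({"shiftBytes", shifted, half, 8});
        src = shifted;
    }
    int scalar = nextTemp++;
    code.push_back({"movq", scalar, src, 0});
    return scalar;
}

// Local CSE: reuse the temp of an earlier identical (op, src, imm)
// computation and rename all later uses of the dropped destination.
std::vector<Instr> cse(const std::vector<Instr>& code) {
    std::map<std::tuple<std::string, int, int>, int> available;
    std::map<int, int> rename;
    std::vector<Instr> out;
    for (Instr in : code) {
        auto r = rename.find(in.src);
        if (r != rename.end()) in.src = r->second;
        auto key = std::make_tuple(in.op, in.src, in.imm);
        auto found = available.find(key);
        if (found != available.end()) {
            rename[in.dst] = found->second;  // duplicate: drop it
            continue;
        }
        available[key] = in.dst;
        out.push_back(in);
    }
    return out;
}
```

Expanding accesses to lanes 2 and 3 yields five instructions, including two identical `extract128` operations. After `cse`, four remain, and only one of them is an `extract128`.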

3) Loop unrolling
To iterate over the individual elements of a vector, one typically writes:

```csharp
for (int i = 0; i < Vector<int>.Count; ++i)
{
    // ...
    int element = V[i];
    // ...
}
```
4) Elimination of GT_SIMD_CHK when the index is not a constant, in loops like the one above.
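The reasoning that allows a GT_SIMD_CHK to be dropped is ordinary range-check elimination: if the induction variable provably stays within [0, Vector<T>.Count), the check can never fail. A minimal sketch, assuming a hypothetical `CountedLoop` summary whose limit the JIT knows as a constant (for a given target, `Vector<T>.Count` is fixed when the method is compiled). `CountedLoop` and `simdCheckIsRedundant` are illustrative names, not JIT APIs.

```cpp
// Hypothetical summary of a counted loop: for (i = init; i < limit; i += step).
struct CountedLoop {
    long long init;
    long long limit;
    long long step;
};

// A GT_SIMD_CHK-style check (0 <= i && i < elementCount) on the induction
// variable i can never fail if every value i takes inside the loop lies in
// [0, elementCount). For an increasing loop that exits when i >= limit,
// those values lie in [init, limit), so checking both endpoints suffices.
bool simdCheckIsRedundant(const CountedLoop& loop, long long elementCount) {
    if (loop.step <= 0) return false;  // only increasing loops are handled
    return loop.init >= 0 && loop.limit <= elementCount;
}
```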

Indexing a vector with a non-constant index forces the vector to be written to memory and the required element to be loaded back from memory. When the loop is small enough, it may be beneficial to unroll it so that SIMD vectors are indexed with constant indices. One such opportunity is the FindFirstEqualByte() method in the Kestrel server.
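A small C++ sketch of the unrolling transformation, using a hypothetical 4-lane `Vec4` type as a stand-in for a 256-bit `Vector<ulong>` (it is not Kestrel's code). The rolled loop indexes lanes with a runtime variable through the bounds-checked `at()`, while the unrolled version uses only compile-time lane numbers through `get<I>()`, the form a JIT could lower to extracts with immediate operands and no bounds check. `Vec4`, `firstNonZeroLaneRolled`, and `firstNonZeroLaneUnrolled` are illustrative names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 256-bit Vector<ulong> (4 x 64-bit lanes).
struct Vec4 {
    std::array<std::uint64_t, 4> lanes;

    // Constant-index access: the lane is a compile-time template argument.
    template <std::size_t I>
    std::uint64_t get() const { return std::get<I>(lanes); }

    // Variable-index access: needs a runtime bounds check (the analogue of
    // GT_SIMD_CHK) and, in the JIT case, a round trip through memory.
    std::uint64_t at(std::size_t i) const { return lanes.at(i); }
};

// Rolled form: the lane index is a loop variable.
int firstNonZeroLaneRolled(const Vec4& v) {
    for (std::size_t i = 0; i < v.lanes.size(); ++i)
        if (v.at(i) != 0) return static_cast<int>(i);
    return -1;
}

// Unrolled form: every access uses a constant lane index.
int firstNonZeroLaneUnrolled(const Vec4& v) {
    if (v.get<0>() != 0) return 0;
    if (v.get<1>() != 0) return 1;
    if (v.get<2>() != 0) return 2;
    if (v.get<3>() != 0) return 3;
    return -1;
}
```

Both functions return the same result for every input. Only the unrolled one has the constant-index shape that the `offset = 2` / `offset = 3` code earlier in this issue was written by hand to obtain.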

5) Loop hoisting of constant vectors

For example:

```csharp
// The following constant vectors can be hoisted out of the loop below:
// Vector<T>.One, Vector<T>.Zero, and new Vector<T>(T val).
for (int i = 0; i < n; i++)
{
    a = Vector<int>.One + b;

    pi = new Vector<float>(3.1412f);
    // ...
    if (x == Vector<long>.Zero)
    {
        // ...
    }
}
```
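Hoisting constant vectors is a restricted form of loop-invariant code motion. A minimal sketch on a hypothetical toy IR (the `Operand`, `Stmt`, `Loop`, and `hoistConstantVectors` names are illustrative): each distinct constant-vector operand, matched textually here for simplicity, is materialized once into a temp in the loop preheader, and its uses in the body are rewritten to read that temp. Building a constant vector has no side effects, so doing it once before the loop is safe even if the loop runs zero times.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR: an operand is either a constant-vector expression
// (which depends on nothing computed inside the loop) or a variable.
struct Operand {
    std::string text;
    bool isConstantVector;
};

struct Stmt {
    std::string dst;
    std::string op;
    std::vector<Operand> operands;
};

struct Loop {
    std::vector<Stmt> preheader;  // runs once, before the loop
    std::vector<Stmt> body;       // runs on every iteration
};

// Each distinct constant is materialized once into a temp in the preheader
// and its uses in the body are rewritten to read the temp. The statements
// themselves stay in the body, so assignments keep per-iteration semantics.
void hoistConstantVectors(Loop& loop) {
    std::map<std::string, std::string> tempFor;  // constant text -> temp
    for (Stmt& s : loop.body) {
        for (Operand& o : s.operands) {
            if (!o.isConstantVector) continue;
            auto it = tempFor.find(o.text);
            if (it == tempFor.end()) {
                std::string temp = "tmp" + std::to_string(tempFor.size());
                loop.preheader.push_back({temp, "copy", {o}});
                it = tempFor.emplace(o.text, temp).first;
            }
            o = Operand{it->second, false};
        }
    }
}
```

Applied to a body modeled on the C# loop above, the three constants (`Vector<int>.One`, `new Vector<float>(3.1412f)`, `Vector<long>.Zero`) move into the preheader as `tmp0`..`tmp2`, and a repeated use of `Vector<int>.One` reuses `tmp0` instead of creating a new temp.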
6) Promotion of structs containing SIMD-type fields.
This is tracked by issue dotnet/coreclr#7508.

category:cq
theme:vector-codegen
skill-level:intermediate
cost:large
impact:medium

Metadata

Assignees: No one assigned

Labels:
- area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
- enhancement: Product code improvement that does NOT require public API changes/additions
- optimization
- tenet-performance: Performance related issue
