This repository was archived by the owner on Dec 18, 2018. It is now read-only.

An optimization to Kestrel Server Seek() method #1141

@sivarv

The Seek() method has the following logic:

#if !DEBUG
                    // Check will be Jitted away https://github.com/dotnet/coreclr/issues/1079
                    if (Vector.IsHardwareAccelerated)
                    {
#endif
                        if (following >= _vectorSpan)
                        {
                            var data = new Vector<byte>(array, index);
                            var byte0Equals = Vector.Equals(data, byte0Vector);
                            var byte1Equals = Vector.Equals(data, byte1Vector);
                            var byte2Equals = Vector.Equals(data, byte2Vector);

                            if (!byte0Equals.Equals(Vector<byte>.Zero))
                            {
                                byte0Index = FindFirstEqualByte(ref byte0Equals);
                            }
                            if (!byte1Equals.Equals(Vector<byte>.Zero))
                            {
                                byte1Index = FindFirstEqualByte(ref byte1Equals);
                            }
                            if (!byte2Equals.Equals(Vector<byte>.Zero))
                            {
                                byte2Index = FindFirstEqualByte(ref byte2Equals);
                            }

Since byte0Equals, byte1Equals and byte2Equals are passed by reference to FindFirstEqualByte(), RyuJIT marks them as addr-exposed locals and, as a result, does not register-allocate them. When they are then used in an if condition such as "!byte0Equals.Equals(Vector<byte>.Zero)", they are loaded from memory. That is, on an AVX2-capable machine, 3 SIMD vectors (32 bytes each) are written to memory and loaded back from it, and stack frame space for 3 SIMD vectors has to be reserved.

This can be avoided by slightly re-arranging the code as follows:

Vector<byte> tmp;
var data = new Vector<byte>(array, index);

var byte0Equals = Vector.Equals(data, byte0Vector);
if (!byte0Equals.Equals(Vector<byte>.Zero))
{
    // Make a copy and pass it by ref.
    // As a result byte0Equals will not be marked as addr-exposed and will be reg allocated
    // Note that making a copy under h/w acceleration is equal to a reg-to-reg or reg-to-mem move
    tmp = byte0Equals;
    byte0Index = FindFirstEqualByte(ref tmp);
}

var byte1Equals = Vector.Equals(data, byte1Vector);
if (!byte1Equals.Equals(Vector<byte>.Zero))
{
    tmp = byte1Equals;
    byte1Index = FindFirstEqualByte(ref tmp);
}

var byte2Equals = Vector.Equals(data, byte2Vector);
if (!byte2Equals.Equals(Vector<byte>.Zero))
{
    tmp = byte2Equals;
    byte2Index = FindFirstEqualByte(ref tmp);
}

With the above change, the byte0Equals, byte1Equals and byte2Equals SIMD vectors will remain in registers, since they are no longer addr-exposed locals. Only tmp needs a stack slot instead of three vectors, which reduces stack frame space by 2*32 bytes. Furthermore, since the lifetimes of byte0Equals, byte1Equals and byte2Equals are non-overlapping, register pressure is also reduced.
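For anyone who wants to try the pattern in isolation, here is a minimal, self-contained sketch of the copy-before-ref idea. It is not the Kestrel code: SeekSketch, FirstIndexDirectRef and FirstIndexViaCopy are hypothetical names, and FindFirstEqualByte here is a simple scalar stand-in rather than the real implementation. The sketch only shows that both shapes return the same index. Whether the JIT actually keeps the vector in a register has to be confirmed by inspecting the generated code.

```csharp
using System;
using System.Numerics;

public static class SeekSketch
{
    // Hypothetical stand-in for Kestrel's FindFirstEqualByte:
    // a scalar scan for the first non-zero lane of the comparison mask.
    private static int FindFirstEqualByte(ref Vector<byte> byteEquals)
    {
        for (int i = 0; i < Vector<byte>.Count; i++)
        {
            if (byteEquals[i] != 0)
            {
                return i;
            }
        }
        return -1;
    }

    // Original shape: the comparison result itself is passed by ref,
    // so its address is taken.
    public static int FirstIndexDirectRef(byte[] array, int index, byte value)
    {
        var data = new Vector<byte>(array, index);
        var byteEquals = Vector.Equals(data, new Vector<byte>(value));
        if (!byteEquals.Equals(Vector<byte>.Zero))
        {
            return FindFirstEqualByte(ref byteEquals);
        }
        return -1;
    }

    // Rearranged shape: only the copy has its address taken; the
    // comparison result is used by value in the zero check.
    public static int FirstIndexViaCopy(byte[] array, int index, byte value)
    {
        var data = new Vector<byte>(array, index);
        var byteEquals = Vector.Equals(data, new Vector<byte>(value));
        if (!byteEquals.Equals(Vector<byte>.Zero))
        {
            Vector<byte> tmp = byteEquals;
            return FindFirstEqualByte(ref tmp);
        }
        return -1;
    }
}
```

Both methods read Vector<byte>.Count bytes starting at index, so the array must hold at least that many bytes from that offset. This corresponds to the "following >= _vectorSpan" check in the original code.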
