The Seek() method has the following logic:
    #if !DEBUG
        // Check will be Jitted away https://github.com/dotnet/coreclr/issues/1079
        if (Vector.IsHardwareAccelerated)
        {
    #endif
            if (following >= _vectorSpan)
            {
                var data = new Vector<byte>(array, index);
                var byte0Equals = Vector.Equals(data, byte0Vector);
                var byte1Equals = Vector.Equals(data, byte1Vector);
                var byte2Equals = Vector.Equals(data, byte2Vector);
                if (!byte0Equals.Equals(Vector<byte>.Zero))
                {
                    byte0Index = FindFirstEqualByte(ref byte0Equals);
                }
                if (!byte1Equals.Equals(Vector<byte>.Zero))
                {
                    byte1Index = FindFirstEqualByte(ref byte1Equals);
                }
                if (!byte2Equals.Equals(Vector<byte>.Zero))
                {
                    byte2Index = FindFirstEqualByte(ref byte2Equals);
                }
Because byte0Equals, byte1Equals and byte2Equals are passed by reference to FindFirstEqualByte(), RyuJIT marks them as addr-exposed locals and does not register-allocate them. Each one is therefore stored to the stack and loaded back from memory when it is used in a condition such as "!byte0Equals.Equals(Vector<byte>.Zero)". That is, on an AVX2-capable machine, three SIMD vectors (32 bytes each) are written to memory and loaded back, and stack frame space for three SIMD vectors has to be reserved.
This can be avoided by slightly re-arranging the code as follows:
    Vector<byte> tmp;
    var data = new Vector<byte>(array, index);
    var byte0Equals = Vector.Equals(data, byte0Vector);
    if (!byte0Equals.Equals(Vector<byte>.Zero))
    {
        // Make a copy and pass it by ref.
        // As a result byte0Equals will not be marked as addr-exposed and will be reg allocated
        // Note that making a copy under h/w acceleration is equal to a reg-to-reg or reg-to-mem move
        tmp = byte0Equals;
        byte0Index = FindFirstEqualByte(ref tmp);
    }
    var byte1Equals = Vector.Equals(data, byte1Vector);
    if (!byte1Equals.Equals(Vector<byte>.Zero))
    {
        tmp = byte1Equals;
        byte1Index = FindFirstEqualByte(ref tmp);
    }
    var byte2Equals = Vector.Equals(data, byte2Vector);
    if (!byte2Equals.Equals(Vector<byte>.Zero))
    {
        tmp = byte2Equals;
        byte2Index = FindFirstEqualByte(ref tmp);
    }
With the above change, the byte0Equals, byte1Equals and byte2Equals SIMD vectors remain in registers because they are no longer addr-exposed locals. Only tmp is addr-exposed, so the stack frame needs one vector slot instead of three, a saving of 2*32 bytes on an AVX2-capable machine. Further, since the lifetimes of byte0Equals, byte1Equals and byte2Equals no longer overlap, register pressure is also reduced.
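For anyone who wants to try the pattern outside of Seek(), here is a minimal self-contained sketch. The names SeekSketch and IndexOfByte are made up for illustration, and FindFirstEqualByte below is a hypothetical stand-in that scans lanes one at a time; the real helper is not shown in this issue and likely works differently. Whether the JIT actually keeps the compare result in a register depends on the RyuJIT version, so treat this as a starting point for inspecting the generated code rather than a measurement.

```csharp
using System;
using System.Numerics;

public static class SeekSketch
{
    // Hypothetical stand-in for the helper named in the issue: returns the
    // index of the first non-zero lane, or -1 if no lane is set.
    public static int FindFirstEqualByte(ref Vector<byte> equals)
    {
        for (int i = 0; i < Vector<byte>.Count; i++)
        {
            if (equals[i] != 0)
            {
                return i;
            }
        }
        return -1;
    }

    // Searches one vector-sized window of 'array' starting at 'index'.
    // Returns the absolute index of the first match, or -1 if there is none.
    public static int IndexOfByte(byte[] array, int index, byte value)
    {
        var data = new Vector<byte>(array, index);
        var matches = Vector.Equals(data, new Vector<byte>(value));
        if (!matches.Equals(Vector<byte>.Zero))
        {
            // Copy first and pass the copy by ref, so that only 'tmp'
            // has its address taken, not 'matches'.
            var tmp = matches;
            return index + FindFirstEqualByte(ref tmp);
        }
        return -1;
    }

    public static void Main()
    {
        var buffer = new byte[Vector<byte>.Count];
        buffer[3] = (byte)'\n';
        Console.WriteLine(IndexOfByte(buffer, 0, (byte)'\n')); // prints 3
    }
}
```

The sketch uses a single compare to keep it short; with three compares as in Seek(), the same tmp local can be reused for each call.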