-
Notifications
You must be signed in to change notification settings - Fork 5.3k
ARM64 intrinsics support for Utf8String.Experimental #39103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
|
Tagging subscribers to this area: @tannergooding |
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Utf8String.Experimental/src/System.Utf8String.Experimental.csproj
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Utf8String.Experimental/src/System/Runtime/Intrinsics/Intrinsics.Shims.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
we don't need this exception anymore? Or should we add a check for AdvSimd as well? Refers to: src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs:1190 in 3acd93c. [](commit_id = 3acd93c604043f6f9cadddb24dfae595a2dc9090, deletion_comment = True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just like a second pair of eyes here: Based on Levi's explanation, this looks correct to me. The result of AdvSimd.AddSaturate(utf16VectorFirst.AsUInt16(), asciiMaskForAddSaturate)) will have the 0x8000 bit set for non-Ascii values and ContainsNonAsciiValue ANDs with 0x8000800080008000, so the logic is equivalent I think. @kunalspathak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's my understanding. The & NonAsciiDataSeenMask is needed in Sse2 case because during MoveMask() we convert data from ushort to byte, but to find non-ascii, we need to check the alternate lanes of byte variants of what MoveMask returned and hence we & 0b_1010_1010_1010_1010. But for AdvSimd, in ContainsNonAsciiValue(ushort) you are already operating on ushort and there as per @GrabYourPitchforks explaination, you just need to check if there is any lane whose MSB is set (as a result of AddSaturate() ) .
I still need to double check your code around NonAsciiDataSeenMask and GetMaskOfMostSignificantBits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be StoreSelectedScalar with an index of 0? @kunalspathak . Store will store both the elements from asciiVector otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me re-check this logic tomorrow (or rather today morning :))
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be simplified down to
Vector128<short> vecWide = Vector128.CreateScalarUnsafe(value).AsInt16();
Vector64<byte> lower = AdvSimd.ExtractNarrowingSaturateUnsignedLower(vecWide);
AdvSimd.StoreSelectedScalar(ref outputBuffer, lower.AsUInt32(), 0);and you don't need a call to ExtractNarrowingSaturateUnsignedUpper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StoreSelectedScalar only takes in a byte*. I think I'd have to use Unsafe code for this? I can still avoid the Upper call though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pgovind It takes a pointer address, correct. But its type will depend on the type of the second arg (i.e. SIMD value)
We need a check for Refers to: src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs:968 in e3b64de. [](commit_id = e3b64de85f368fc5373dc4798e7afd69266195fa, deletion_comment = False) |
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetMaskOfMostSignificantBits [](start = 30, length = 28)
Alright, so on careful inspection it looks like you can optimize this method for byte variant (all the places in GetIndexOfFirstNonAsciiByte_Sse2OrArm64() that calls this method). This is essentially 2nd use-case in my comment here - #38653 (comment). For the mask that you get for byte version, you just do TrailingZeroCount(mask) on it. So you don't care about the exact mask but the first lane that has MSB set. So you can achieve the same thing as done in #38707.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int GetIndexOfFirstNonAsciiByte(Vector128<byte> value)
{
if (!AdvSimd.Arm64.IsSupported || !BitConverter.IsLittleEndian)
{
throw new PlatformNotSupportedException();
}
Vector128<byte> mostSignificantBitIsSet = AdvSimd.ShiftRightArithmetic(value.AsSByte(), 7).AsByte();
Vector128<byte> extractedBits = AdvSimd.And(mostSignificantBitIsSet, s_bitmask);
// collapse mask to lower bits
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
ulong mask = extractedBits.AsUInt64().ToScalar();
// calculate the index
int index = BitOperations.TrailingZeroCount(mask) >> 2;
Debug.Assert((mask != 0) ? index < 16 : index >= 16);
return index;
}I would also rename the method used in byte version to GetIndexOfFirstNonAsciiByte().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can optimize this method for char variant as well (all the places in GetIndexOfFirstNonAsciiChar_Sse2OrArm64() that calls this method). Even this falls in 2nd category where you just use the returned value and call TrailingZeroCount() on it. (Just double check my observation). So here is my thinking.
I am pasting @GrabYourPitchforks 's response just to have all this info in single place:
" 0xAAAA is correct. The way this logic works is that it adds 0x7F80 to every 16-bit ushort in the input, saturating at 0xFFFF (to avoid integer overflow back to a negative value).
This means that if the original value is in the range 0x0000 .. 0x007F (ASCII), the add operation will result in 0x7F80 .. 0x7FFF.
If the original value is in the range 0x0080 .. 0x807F (non-ASCII), the add operation will result in 0x8000 .. 0xFFFF.
And if the original value is in the range 0x8080 .. 0xFFFF (non-ASCII), the add operation will result in 0xFFFF due to saturation.
The non-ASCII case always results in the sum having the 0x8000 bit set, but it doesn't always result in the sum having the 0x0080 bit set. Hence 0xAAAA - not 0xFFFF - is the correct comparison against the pmovmskb result.
This is also called out in the comment a few lines above, which should hint that alternating bits of the pmovmskb result should be ignored."
If you have non-Ascii value between 0x80 .. 0xFFFF, the AddSaturate() operation will give you a value between 0x8000 ... 0xFFFF. Then you can do similar trick as above, except do an arithmetic right shift by 15. That way, you end up getting 0xFFFF in all lanes that originally had values between 0x80...0xFFFF. After that you do And with slightly modified s_bitMask to get the mask and from that get trailing zero count to get the lane. If all lanes contain ASCII, then the trailing count would 8 and thats the amount you need to advance the buffer pointer.
[MethodImpl(MethodImplOptions.NoInlining)]
public static int GetIndexOfFirstNonAsciiUShort(Vector128<ushort> value)
{
if (!AdvSimd.Arm64.IsSupported || !BitConverter.IsLittleEndian)
{
throw new PlatformNotSupportedException();
}
// This can be readonly static
Vector128<ushort> s_bitmask = Vector128.Create((uint)0x1000001).AsUInt16();
Vector128<ushort> mostSignificantBitIsSet = AdvSimd.ShiftRightArithmetic(value.AsInt16(), 15).AsUInt16();
Vector128<ushort> extractedBits = AdvSimd.And(mostSignificantBitIsSet, s_bitmask);
// collapse mask to lower bits
extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
ulong mask = extractedBits.AsUInt64().ToScalar();
// calculate the index
int index = BitOperations.TrailingZeroCount(mask) >> 3;
Debug.Assert((mask != 0) ? index < 8 : index >= 8);
return index;
}With this suggestion, the callers of this method that pass the value as a result of Saturate() has to change:
-
It no longer need to case to
.AsByte()but can simply doGetIndexOfFirstNonAsciiUShort(AdvSimd.AddSaturate(firstVector, asciiMaskForAddSaturate)). -
You no longer have to do the check of
currentMaskreturned from this method withNonAsciiDataSeenMask. So your AdvSimd won't have the check forif ((currentMask & NonAsciiDataSeenMask) != 0). Instead it will be
if (indexOfNonAsciiUShort < 8) { // renamed currentMask to indexOfNonAsciiUShort
goto FoundNonAsciiDataInCurrentMask;
}-
In the label
FoundNonAsciiDataInCurrentMask, you no longer have to docurrentMask &= NonAsciiDataSeenMask;andBitOperations.TrailingZeroCount(currentMask) - 1, but instead can directly dopBuffer = (char*)((byte*)pBuffer + indexOfNonAsciiUShort); -
Update assert for this case to be
Debug.Assert(indexOfNonAsciiUShort >= 8).
By doing this, you save 2 AddPairwise instructions in this operation. But please double check the functionality with various inputs to make sure the logic is correct. I might have missed something in the snippet I gave.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, in point 3 above, inside FoundNonAsciiDataInCurrentMask , the buffer is byte* advanced with bytes factor, so you might want to do similar when you use indexOfNonAsciiUShort in the arithmetic.
Update the comments to relevant ARM64 instruction as well. Refers to: src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs:757 in e3b64de. [](commit_id = e3b64de85f368fc5373dc4798e7afd69266195fa, deletion_comment = False) |
same here. Refers to: src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs:758 in e3b64de. [](commit_id = e3b64de85f368fc5373dc4798e7afd69266195fa, deletion_comment = False) |
81624ba to
589c470
Compare
|
I'm going to close this PR pending our discussions on UTF8. |
First cut. Putting it up to see how CI turns out.
Edit:
As part of this PR, I will also refactor the
USE_INTERNAL_ACCESSIBILITYconst to move it within a csproj -- Done