Skip to content

Conversation

@pgovind
Copy link

@pgovind pgovind commented Jul 10, 2020

First cut. Putting it up to see how CI turns out.

Edit:
As part of this PR, I will also refactor the USE_INTERNAL_ACCESSIBILITY const to move it within a csproj -- Done

@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost
Copy link

ghost commented Jul 10, 2020

Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.

@kunalspathak
Copy link
Contributor

        }

we don't need this exception anymore? Or should we add a check for AdvSimd as well?


Refers to: src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs:1190 in 3acd93c. [](commit_id = 3acd93c604043f6f9cadddb24dfae595a2dc9090, deletion_comment = True)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just like a second pair of eyes here: Based on Levi's explanation, this looks correct to me. The result of AdvSimd.AddSaturate(utf16VectorFirst.AsUInt16(), asciiMaskForAddSaturate)) will have the 0x8000 bit set for non-Ascii values and ContainsNonAsciiValue ANDs with 0x8000800080008000, so the logic is equivalent I think. @kunalspathak

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's my understanding. The & NonAsciiDataSeenMask is needed in Sse2 case because during MoveMask() we convert data from ushort to byte, but to find non-ascii, we need to check the alternate lanes of byte variants of what MoveMask returned and hence we & 0b_1010_1010_1010_1010. But for AdvSimd, in ContainsNonAsciiValue(ushort) you are already operating on ushort and there as per @GrabYourPitchforks explaination, you just need to check if there is any lane whose MSB is set (as a result of AddSaturate() ) .

I still need to double check your code around NonAsciiDataSeenMask and GetMaskOfMostSignificantBits.

Copy link
Author

@pgovind pgovind Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be StoreSelectedScalar with an index of 0? @kunalspathak . Store will store both the elements from asciiVector otherwise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me re-check this logic tomorrow (or rather today morning :))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be simplified down to

Vector128<short> vecWide = Vector128.CreateScalarUnsafe(value).AsInt16();
Vector64<byte> lower = AdvSimd.ExtractNarrowingSaturateUnsignedLower(vecWide);
AdvSimd.StoreSelectedScalar(ref outputBuffer, lower.AsUInt32(), 0);

and you don't need a call to ExtractNarrowingSaturateUnsignedUpper

Copy link
Author

@pgovind pgovind Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StoreSelectedScalar only takes in a byte*. I think I'd have to use Unsafe code for this? I can still avoid the Upper call though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pgovind It takes a pointer address, correct. But its type will depend on the type of the second arg (i.e. SIMD value)

@kunalspathak
Copy link
Contributor

            firstVector = Sse2.LoadVector128((ushort*)pBuffer); // unaligned load

We need a check for AdvSimd here.


Refers to: src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs:968 in e3b64de. [](commit_id = e3b64de85f368fc5373dc4798e7afd69266195fa, deletion_comment = False)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetMaskOfMostSignificantBits [](start = 30, length = 28)

Alright, so on careful inspection it looks like you can optimize this method for byte variant (all the places in GetIndexOfFirstNonAsciiByte_Sse2OrArm64() that calls this method). This is essentially 2nd use-case in my comment here - #38653 (comment). For the mask that you get for byte version, you just do TrailingZeroCount(mask) on it. So you don't care about the exact mask but the first lane that has MSB set. So you can achieve the same thing as done in #38707.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int GetIndexOfFirstNonAsciiByte(Vector128<byte> value)
{
    if (!AdvSimd.Arm64.IsSupported || !BitConverter.IsLittleEndian)
    {
        throw new PlatformNotSupportedException();
    }

    Vector128<byte> mostSignificantBitIsSet = AdvSimd.ShiftRightArithmetic(value.AsSByte(), 7).AsByte();
    Vector128<byte> extractedBits = AdvSimd.And(mostSignificantBitIsSet, s_bitmask);

    // collapse mask to lower bits
    extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
    ulong mask = extractedBits.AsUInt64().ToScalar();

    // calculate the index
    int index = BitOperations.TrailingZeroCount(mask) >> 2;
    Debug.Assert((mask != 0) ? index < 16 : index >= 16);
    return index;
}

I would also rename the method used in byte version to GetIndexOfFirstNonAsciiByte().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can optimize this method for char variant as well (all the places in GetIndexOfFirstNonAsciiChar_Sse2OrArm64() that calls this method). Even this falls in 2nd category where you just use the returned value and call TrailingZeroCount() on it. (Just double check my observation). So here is my thinking.

I am pasting @GrabYourPitchforks 's response just to have all this info in single place:

" 0xAAAA is correct. The way this logic works is that it adds 0x7F80 to every 16-bit ushort in the input, saturating at 0xFFFF (to avoid integer overflow back to a negative value).

This means that if the original value is in the range 0x0000 .. 0x007F (ASCII), the add operation will result in 0x7F80 .. 0x7FFF.
If the original value is in the range 0x0080 .. 0x807F (non-ASCII), the add operation will result in 0x8000 .. 0xFFFF.
And if the original value is in the range 0x8080 .. 0xFFFF (non-ASCII), the add operation will result in 0xFFFF due to saturation.

The non-ASCII case always results in the sum having the 0x8000 bit set, but it doesn't always result in the sum having the 0x0080 bit set. Hence 0xAAAA - not 0xFFFF - is the correct comparison against the pmovmskb result.

This is also called out in the comment a few lines above, which should hint that alternating bits of the pmovmskb result should be ignored."

If you have non-Ascii value between 0x80 .. 0xFFFF, the AddSaturate() operation will give you a value between 0x8000 ... 0xFFFF. Then you can do similar trick as above, except do an arithmetic right shift by 15. That way, you end up getting 0xFFFF in all lanes that originally had values between 0x80...0xFFFF. After that you do And with slightly modified s_bitMask to get the mask and from that get trailing zero count to get the lane. If all lanes contain ASCII, then the trailing count would 8 and thats the amount you need to advance the buffer pointer.

[MethodImpl(MethodImplOptions.NoInlining)]
public static int GetIndexOfFirstNonAsciiUShort(Vector128<ushort> value)
{
    if (!AdvSimd.Arm64.IsSupported || !BitConverter.IsLittleEndian)
    {
        throw new PlatformNotSupportedException();
    }

    // This can be readonly static
    Vector128<ushort> s_bitmask = Vector128.Create((uint)0x1000001).AsUInt16();
    
    Vector128<ushort> mostSignificantBitIsSet = AdvSimd.ShiftRightArithmetic(value.AsInt16(), 15).AsUInt16();
     Vector128<ushort> extractedBits = AdvSimd.And(mostSignificantBitIsSet, s_bitmask);
    // collapse mask to lower bits
     extractedBits = AdvSimd.Arm64.AddPairwise(extractedBits, extractedBits);
    ulong mask = extractedBits.AsUInt64().ToScalar();
    // calculate the index
    int index = BitOperations.TrailingZeroCount(mask) >> 3;
    Debug.Assert((mask != 0) ? index < 8 : index >= 8);
    return index;
}

With this suggestion, the callers of this method that pass the value as a result of Saturate() has to change:

  1. It no longer need to case to .AsByte() but can simply do GetIndexOfFirstNonAsciiUShort(AdvSimd.AddSaturate(firstVector, asciiMaskForAddSaturate)).

  2. You no longer have to do the check of currentMask returned from this method with NonAsciiDataSeenMask. So your AdvSimd won't have the check for if ((currentMask & NonAsciiDataSeenMask) != 0). Instead it will be

if (indexOfNonAsciiUShort < 8) { // renamed currentMask to indexOfNonAsciiUShort
   goto FoundNonAsciiDataInCurrentMask;
}
  1. In the label FoundNonAsciiDataInCurrentMask, you no longer have to do currentMask &= NonAsciiDataSeenMask; and BitOperations.TrailingZeroCount(currentMask) - 1, but instead can directly do pBuffer = (char*)((byte*)pBuffer + indexOfNonAsciiUShort);

  2. Update assert for this case to be Debug.Assert(indexOfNonAsciiUShort >= 8).

By doing this, you save 2 AddPairwise instructions in this operation. But please double check the functionality with various inputs to make sure the logic is correct. I might have missed something in the snippet I gave.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in point 3 above, inside FoundNonAsciiDataInCurrentMask , the buffer is byte* advanced with bytes factor, so you might want to do similar when you use indexOfNonAsciiUShort in the arithmetic.

@kunalspathak
Copy link
Contributor

        Vector128<ushort> asciiMaskForTestZ = Vector128.Create((ushort)0xFF80); // used for PTEST on supported hardware

Update the comments to relevant ARM64 instruction as well.


Refers to: src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs:757 in e3b64de. [](commit_id = e3b64de85f368fc5373dc4798e7afd69266195fa, deletion_comment = False)

@kunalspathak
Copy link
Contributor

        Vector128<ushort> asciiMaskForAddSaturate = Vector128.Create((ushort)0x7F80); // used for PADDUSW

same here.


Refers to: src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs:758 in e3b64de. [](commit_id = e3b64de85f368fc5373dc4798e7afd69266195fa, deletion_comment = False)

@jeffhandley
Copy link
Member

I'm going to close this PR pending our discussions on UTF8.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants