Vectorize SpanHelpers.IndexOf for byte#17143
Conversation
| return (int)(byte*)(index + 7); | ||
| #if !netstandard10 | ||
| VectorLength: | ||
| // Already checked, but for jit branch elmination |
There was a problem hiding this comment.
What does this do for the jit branch elimination?
There was a problem hiding this comment.
Is always false when no vector support so should make the function's asm smaller?
There was a problem hiding this comment.
The JIT has dead code elimination. When no vector support, the never taken branch at the top should be enough to make this code. I do not think that this extra check can ever help anything.
| return (int)(byte*)(index + 7); | ||
| #if !netstandard10 | ||
| VectorLength: | ||
| // Already checked, but for jit branch elmination |
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private static Vector<byte> GetVector(byte vectorByte) | ||
| { | ||
| #if !NETCOREAPP |
There was a problem hiding this comment.
This needs to be defined in the .csproj; and it should be lowercase.
|
Think I got the csproj changes correct? |
Yes. |
|
I have been wondering whether it would be worth it to check for a few matches before we take the vectorized path - to have more balanced performance profile over all possible inputs. Would you mind collecting some numbers about it? |
|
LGTM otherwise. |
Am guessing it will make initial start better, but then the following worse. E.g. if the match was at byte 9 and the start up checked the first 8 bytes then went vector. The byte 9 match would pay the cost for both the 8 byte individual search and then the vector start up cost. Also for something like "StartsWith" you check the first byte and subsequent rather than doing an index of? Will try though... |
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private static Vector<byte> GetVector(byte vectorByte) | ||
| { | ||
| #if !netcoreapp |
There was a problem hiding this comment.
Isn't !netcoreapp === netstandard1.0? In which case, why do we need to define netcoreapp as a constant?
There was a problem hiding this comment.
netstandard1.1+ can also be full framework and mono
There was a problem hiding this comment.
The fix is currently only in netcoreapp jit 1.2.0+
| 0x03ul << 32 | | ||
| 0x02ul << 40 | | ||
| 0x01ul << 48) + 1; | ||
| #endif |
There was a problem hiding this comment.
This contains nested #if branches where the parent covers a large body of the code. I think we should break it up to surround individual methods instead.
That is,
#if !netstandard10
// Vector sub-search adapted from https://github.com/aspnet/KestrelHttpServer/pull/1138
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static int LocateFirstFoundByte(Vector<byte> match)
{
...
}
#endifWould avoid confusion if methods get moved around/etc.
There was a problem hiding this comment.
!netstandard10 covers whether the vectors package is available
!netcoreapp covers a bug in the jit regarding vectors that has been fixed in netcoreapp 1.2; but hasn't been released in full framework; so it is a subset of !netstandard10
There was a problem hiding this comment.
#if'd the individual methods where applicable
|
LGTM |
|
@KrzysztofCwalina, can this be merged? |
|
Yes. Looks good to me. |
|
Is this only for fast span? |
It is for both. |
|
Ok good. |
Extension methods in the SpanExtensions class only have one implementation (which lives in corefx). |
|
Still think Clear and Fill should move to Extensions rather than instance methods defined on both spans |
Hey @benaadams, will you be providing the requested perf numbers. If you are too busy elsewhere, I am more than happy to run some benchmark tests and provide the requested data. Let me know :) |
|
@benaadams, I think you are right about the cost of the 9th item, but here is what I am thinking:
.. just a thought. |
|
K, so... I've had a d'oh moment. I've not been able to check as have been have some issues with the build (now resolved) and not been abe to build and use coreclr+corefx from src. However... as they are extension methods I don't actually need to build coreclr+corefx to test 😝 |
|
On netcoreapp 1.1 (netcoreapp 2.0 jit has had Vector fixes in this area, but not sure how to benchmark it other than in a frankenbuild, will try in morning) Vector16 Vector32 Looking at the results; it probably does need some kind of blend. Also will try to use |
|
For netcoreapp 1.1 Vector cut overs are about 2x Vector.Count (e.g. 32 and 64) |
|
Have an idea... |
|
It it worth merging this now and then following up with startup change? i.e. /cc @davidfowl |
|
There is some bad codegen int the Vector path; trying to build with coreclr 2.0 to see if its fixed |
|
Yeah, I think we should merge and then optimize the front. @davidfowl? @ahsonkhan? |
|
Having trouble getting the asm for 2.0 but there is a bunch of code gen it hits that should have been improved: e.g. dotnet/coreclr#7407, https://github.com/dotnet/coreclr/issues/7843 |
|
Can we merge? @jkotas |
|
LGTM |
/cc @jkotas @KrzysztofCwalina @davidfowl @ahsonkhan