SpanHelpers IndexOfAny-methods use sentinel-value#25970
Conversation
Instead of having _valueLength_-times an if-check in the loop, a sentinel value is used and hence only one final checks needs to done. Perf-win depends on the size of _valueLength_, the greater the more win.
|
Do you have any numbers to demonstrate the perf-win? |
|
@jkotas I'm just running better benchmarks and will post results here. |
| return 0; // A zero-length sequence is always treated as "found" at the start of the search space. | ||
|
|
||
| int index = -1; | ||
| int index = int.MinValue; |
There was a problem hiding this comment.
You may be able to just use int index = -1; here, and then you do not need to normalize it at the end.
There was a problem hiding this comment.
Yeah, that makes sense on LastIndexOf -- it appears I didn't think too much about it, only about the "forward-version".
There was a problem hiding this comment.
I'll apply the same in SpanHelper.T.cs LastIndexOf-method.
| { | ||
| index = (index == -1 || index < tempIndex) ? tempIndex : index; | ||
| } | ||
| index = (tempIndex == -1 || index > tempIndex) ? index : tempIndex; |
There was a problem hiding this comment.
The check for tempIndex == -1 can be omitted here I think.
There was a problem hiding this comment.
Due the change above (line 80) yes, because index is initialized to -1.
As of PR-feedback.
| { | ||
| index = (index == -1 || index > tempIndex) ? tempIndex : index; | ||
| } | ||
| index = (tempIndex == -1 || tempIndex > index) ? index : tempIndex; |
There was a problem hiding this comment.
You can use the same leaner pattern here as in LastIndexOfAny. Just need to use unsigned comparison:
index = ((uint)index > (uint)tempIndex) ? index : tempIndex;
There was a problem hiding this comment.
index = ((uint)tempIndex > (uint)index) ? index : tempIndex;
is correct.
| { | ||
| index = (index == -1 || index < tempIndex) ? tempIndex : index; | ||
| } | ||
| index = (index > tempIndex) ? index : tempIndex; |
There was a problem hiding this comment.
I think that the JIT will generate sligthly better code if this is written as if (tempIndex > index) index = tempIndex;. (You can check the disassembly.)
There was a problem hiding this comment.
You are right.
Simple test with SharpLab (I hope this quite near to real JIT, and JIT for x86 is now also RuyJIT):
public class C {
public int A(int tempIndex)
{
int index = -1;
index = (index > tempIndex) ? index : tempIndex;
return index;
}
public int B(int tempIndex)
{
int index = -1;
if (index > tempIndex) index = tempIndex;
return index;
}
}produces:
C.A(Int32)
L0000: push ebp
L0001: mov ebp, esp
L0003: cmp edx, 0xffffffff
L0006: jl L000c
L0008: mov eax, edx
L000a: jmp L000f
L000c: or eax, 0xffffffff
L000f: pop ebp
L0010: ret
C.B(Int32)
L0000: or eax, 0xffffffff
L0003: cmp edx, 0xffffffff
L0006: jge L000a
L0008: mov eax, edx
L000a: ret
As of PR-feedback.
ResultsCode for benchmarks: https://gist.github.com/gfoidl/842f3ca4a80e532515f4476cd92cae66 Span.IndexOfAnyBenchmarkDotNet=v0.10.11, OS=Windows 7 SP1 (6.1.7601.0)
Processor=Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge), ProcessorCount=8
Frequency=2241054 Hz, Resolution=446.2186 ns, Timer=TSC
.NET Core SDK=2.1.2
[Host] : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
DefaultJob : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
Span.LastIndexOfAnyHere I can't measure any notable difference. When running the benchmark multiple times once IndexOfAny is slightly faster, and once the method from this PR is faster. Old JITed codeNote: method got inlined 000007fe`773adb60 System.SpanHelpers.LastIndexOfAny(Byte ByRef, Int32, Byte ByRef, Int32)
int index = -1;
^^^^^^^^^^^^^^^
000007fe`773adb77 41beffffffff mov r14d,0FFFFFFFFh
for (int i = 0; i < valueLength; i++)
^^^^^^^^^
000007fe`773adb7d 4533ff xor r15d,r15d
for (int i = 0; i < valueLength; i++)
^^^^^^^^^^^^^^^
000007fe`773adb80 85f6 test esi,esi
000007fe`773adb82 7e2f jle 000007fe`773adbb3
000007fe`773adb84 4963d7 movsxd rdx,r15d
000007fe`773adb87 0fb61413 movzx edx,byte ptr [rbx+rdx]
000007fe`773adb8b 488bcf mov rcx,rdi
000007fe`773adb8e 448bc5 mov r8d,ebp
000007fe`773adb91 e802f3ffff call 000007fe`773ace98
if (tempIndex != -1)
^^^^^^^^^^^^^^^^^^^^
000007fe`773adb96 83f8ff cmp eax,0FFFFFFFFh
000007fe`773adb99 7410 je 000007fe`773adbab
index = (index == -1 || index < tempIndex) ? tempIndex : index;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
000007fe`773adb9b 4183feff cmp r14d,0FFFFFFFFh
000007fe`773adb9f 7407 je 000007fe`773adba8
000007fe`773adba1 443bf0 cmp r14d,eax
000007fe`773adba4 7c02 jl 000007fe`773adba8
for (int i = 0; i < valueLength; i++)
^^^
000007fe`773adba6 eb03 jmp 000007fe`773adbab
000007fe`773adba8 448bf0 mov r14d,eax
000007fe`773adbab 41ffc7 inc r15d
000007fe`773adbae 443bfe cmp r15d,esi
000007fe`773adbb1 7cd1 jl 000007fe`773adb84
return index;
^^^^^^^^^^^^^
000007fe`773adbb3 418bc6 mov eax,r14dNew JITed codeNote: method got inlined 000007fe`773adb60 System.SpanHelpers.LastIndexOfAnyNew(Byte ByRef, Int32, Byte ByRef, Int32)
int index = -1;
^^^^^^^^^^^^^^^
000007fe`773adb77 41beffffffff mov r14d,0FFFFFFFFh
for (int i = 0; i < valueLength; i++)
^^^^^^^^^
000007fe`773adb7d 4533ff xor r15d,r15d
for (int i = 0; i < valueLength; i++)
^^^^^^^^^^^^^^^
000007fe`773adb80 85f6 test esi,esi
000007fe`773adb82 7e22 jle 000007fe`773adba6
000007fe`773adb84 4963d7 movsxd rdx,r15d
000007fe`773adb87 0fb61413 movzx edx,byte ptr [rbx+rdx]
000007fe`773adb8b 488bcf mov rcx,rdi
000007fe`773adb8e 448bc5 mov r8d,ebp
000007fe`773adb91 e802f3ffff call 000007fe`773ace98
if (tempIndex > index) index = tempIndex;
^^^^^^^^^^^^^^^^^^^^^^
000007fe`773adb96 413bc6 cmp eax,r14d
000007fe`773adb99 7e03 jle 000007fe`773adb9e
if (tempIndex > index) index = tempIndex;
^^^^^^^^^^^^^^^^^^
000007fe`773adb9b 448bf0 mov r14d,eax
for (int i = 0; i < valueLength; i++)
^^^
000007fe`773adb9e 41ffc7 inc r15d
000007fe`773adba1 443bfe cmp r15d,esi
000007fe`773adba4 7cde jl 000007fe`773adb84
return index;
^^^^^^^^^^^^^
000007fe`773adba6 418bc6 mov eax,r14d |
| return 0; // A zero-length sequence is always treated as "found" at the start of the search space. | ||
|
|
||
| int index = -1; | ||
| int index = int.MaxValue; |
There was a problem hiding this comment.
You should be able to start with -1 here and avoid the normalization at the end.
There was a problem hiding this comment.
Great -- the "uint-trick" opens the door for this 😄
Span.LastIndexOfAnyI updated the benchmark-code, so that a the last position with the last value a match is found, i.e. more accurate measuring for the loop under test. Here are the results more in the way I would expect: BenchmarkDotNet=v0.10.11, OS=Windows 7 SP1 (6.1.7601.0)
Processor=Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge), ProcessorCount=8
Frequency=2241054 Hz, Resolution=446.2186 ns, Timer=TSC
.NET Core SDK=2.1.2
[Host] : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
DefaultJob : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
|
* SpanHelpers IndexOfAny-methods use sentinel-value Instead of having _valueLength_-times an if-check in the loop, a sentinel value is used and hence only one final checks needs to done. Perf-win depends on the size of _valueLength_, the greater the more win. * LastIndexOf initializes index with -1 instead of int.MinValue As of PR-feedback. * Better checks based on if instead of ternary operator As of PR-feedback. * SpanHelpers.byte.cs IndexOfAny can avoid int.MaxValue due to unit-trick As of PR-Feedback Commit migrated from dotnet/corefx@016659c
Instead of having valueLength-times an if-check in the loop, a sentinel value is used and hence only one final checks needs to done.
Perf-win depends on the size of valueLength, the greater the more win.