Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

SpanHelpers IndexOfAny-methods use sentinel-value#25970

Merged
jkotas merged 4 commits into
dotnet:masterfrom
gfoidl:master
Dec 18, 2017
Merged

SpanHelpers IndexOfAny-methods use sentinel-value#25970
jkotas merged 4 commits into
dotnet:masterfrom
gfoidl:master

Conversation

@gfoidl
Copy link
Copy Markdown
Member

@gfoidl gfoidl commented Dec 17, 2017

Instead of having valueLength-times an if-check in the loop, a sentinel value is used and hence only one final checks needs to done.

Perf-win depends on the size of valueLength, the greater the more win.

Instead of having _valueLength_-times an if-check in the loop,
a sentinel value is used and hence only one final checks needs to done.

Perf-win depends on the size of _valueLength_, the greater the more win.
@jkotas
Copy link
Copy Markdown
Member

jkotas commented Dec 17, 2017

Do you have any numbers to demonstrate the perf-win?

@gfoidl
Copy link
Copy Markdown
Member Author

gfoidl commented Dec 17, 2017

@jkotas I'm just running better benchmarks and will post results here.

@dotnet dotnet deleted a comment from begay Dec 17, 2017
return 0; // A zero-length sequence is always treated as "found" at the start of the search space.

int index = -1;
int index = int.MinValue;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be able to just use int index = -1; here, and then you do not need to normalize it at the end.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense on LastIndexOf -- it appears I didn't think too much about it, only about the "forward-version".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll apply the same in SpanHelper.T.cs LastIndexOf-method.

{
index = (index == -1 || index < tempIndex) ? tempIndex : index;
}
index = (tempIndex == -1 || index > tempIndex) ? index : tempIndex;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for tempIndex == -1 can be omitted here I think.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due the change above (line 80) yes, because index is initialized to -1.

@dotnet dotnet deleted a comment from begay Dec 17, 2017
{
index = (index == -1 || index > tempIndex) ? tempIndex : index;
}
index = (tempIndex == -1 || tempIndex > index) ? index : tempIndex;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the same leaner pattern here as in LastIndexOfAny. Just need to use unsigned comparison:

index = ((uint)index > (uint)tempIndex) ? index : tempIndex;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index = ((uint)tempIndex > (uint)index) ? index : tempIndex;
is correct.

{
index = (index == -1 || index < tempIndex) ? tempIndex : index;
}
index = (index > tempIndex) ? index : tempIndex;
Copy link
Copy Markdown
Member

@jkotas jkotas Dec 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the JIT will generate sligthly better code if this is written as if (tempIndex > index) index = tempIndex;. (You can check the disassembly.)

Copy link
Copy Markdown
Member Author

@gfoidl gfoidl Dec 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.

Simple test with SharpLab (I hope this quite near to real JIT, and JIT for x86 is now also RuyJIT):

public class C {
    public int A(int tempIndex)
    {
     	int index = -1;
        
        index = (index > tempIndex) ? index : tempIndex;
        
        return index;
    }
    
    public int B(int tempIndex)
    {
     	int index = -1;
        
        if (index > tempIndex) index = tempIndex;
        
        return index;
    }
}

produces:

C.A(Int32)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: cmp edx, 0xffffffff
    L0006: jl L000c
    L0008: mov eax, edx
    L000a: jmp L000f
    L000c: or eax, 0xffffffff
    L000f: pop ebp
    L0010: ret

C.B(Int32)
    L0000: or eax, 0xffffffff
    L0003: cmp edx, 0xffffffff
    L0006: jge L000a
    L0008: mov eax, edx
    L000a: ret

@gfoidl
Copy link
Copy Markdown
Member Author

gfoidl commented Dec 17, 2017

Results

Code for benchmarks: https://gist.github.com/gfoidl/842f3ca4a80e532515f4476cd92cae66

Span.IndexOfAny

BenchmarkDotNet=v0.10.11, OS=Windows 7 SP1 (6.1.7601.0)
Processor=Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge), ProcessorCount=8
Frequency=2241054 Hz, Resolution=446.2186 ns, Timer=TSC
.NET Core SDK=2.1.2
  [Host]     : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT

Method BufferSize ValuesSize Mean Error StdDev Scaled
IndexOfAny 1024 4 429.3 ns 0.4782 ns 0.4239 ns 1.00
IndexOfAnyNew 1024 4 394.9 ns 0.4722 ns 0.3687 ns 0.92
IndexOfAny 1024 10 1,033.3 ns 1.7612 ns 1.5613 ns 1.00
IndexOfAnyNew 1024 10 950.1 ns 2.4577 ns 1.9188 ns 0.92
IndexOfAny 4096 4 1,181.2 ns 0.8200 ns 0.7670 ns 1.00
IndexOfAnyNew 4096 4 1,147.8 ns 0.8821 ns 0.8252 ns 0.97
IndexOfAny 4096 10 2,924.2 ns 8.1831 ns 6.3888 ns 1.00
IndexOfAnyNew 4096 10 2,828.9 ns 2.2138 ns 2.0707 ns 0.97

Span.LastIndexOfAny

Here I can't measure any notable difference. When running the benchmark multiple times once IndexOfAny is slightly faster, and once the method from this PR is faster.
For me this seems strange, because the JITed code is shorter and more precise in the new variant, so it "must" be faster. Maybe my code for the benchmark is not ideal to reflect this exactly.

Old JITed code

Note: method got inlined

000007fe`773adb60 System.SpanHelpers.LastIndexOfAny(Byte ByRef, Int32, Byte ByRef, Int32)
			int index = -1;
   ^^^^^^^^^^^^^^^
000007fe`773adb77 41beffffffff    mov     r14d,0FFFFFFFFh
			for (int i = 0; i < valueLength; i++)
        ^^^^^^^^^
000007fe`773adb7d 4533ff          xor     r15d,r15d
			for (int i = 0; i < valueLength; i++)
                   ^^^^^^^^^^^^^^^
000007fe`773adb80 85f6            test    esi,esi
000007fe`773adb82 7e2f            jle     000007fe`773adbb3
000007fe`773adb84 4963d7          movsxd  rdx,r15d
000007fe`773adb87 0fb61413        movzx   edx,byte ptr [rbx+rdx]
000007fe`773adb8b 488bcf          mov     rcx,rdi
000007fe`773adb8e 448bc5          mov     r8d,ebp
000007fe`773adb91 e802f3ffff      call    000007fe`773ace98
				if (tempIndex != -1)
    ^^^^^^^^^^^^^^^^^^^^
000007fe`773adb96 83f8ff          cmp     eax,0FFFFFFFFh
000007fe`773adb99 7410            je      000007fe`773adbab
					index = (index == -1 || index < tempIndex) ? tempIndex : index;
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
000007fe`773adb9b 4183feff        cmp     r14d,0FFFFFFFFh
000007fe`773adb9f 7407            je      000007fe`773adba8
000007fe`773adba1 443bf0          cmp     r14d,eax
000007fe`773adba4 7c02            jl      000007fe`773adba8
			for (int i = 0; i < valueLength; i++)
                                    ^^^
000007fe`773adba6 eb03            jmp     000007fe`773adbab
000007fe`773adba8 448bf0          mov     r14d,eax
000007fe`773adbab 41ffc7          inc     r15d
000007fe`773adbae 443bfe          cmp     r15d,esi
000007fe`773adbb1 7cd1            jl      000007fe`773adb84
			return index;
   ^^^^^^^^^^^^^
000007fe`773adbb3 418bc6          mov     eax,r14d

New JITed code

Note: method got inlined

000007fe`773adb60 System.SpanHelpers.LastIndexOfAnyNew(Byte ByRef, Int32, Byte ByRef, Int32)
			int index = -1;
   ^^^^^^^^^^^^^^^
000007fe`773adb77 41beffffffff    mov     r14d,0FFFFFFFFh
			for (int i = 0; i < valueLength; i++)
        ^^^^^^^^^
000007fe`773adb7d 4533ff          xor     r15d,r15d
			for (int i = 0; i < valueLength; i++)
                   ^^^^^^^^^^^^^^^
000007fe`773adb80 85f6            test    esi,esi
000007fe`773adb82 7e22            jle     000007fe`773adba6
000007fe`773adb84 4963d7          movsxd  rdx,r15d
000007fe`773adb87 0fb61413        movzx   edx,byte ptr [rbx+rdx]
000007fe`773adb8b 488bcf          mov     rcx,rdi
000007fe`773adb8e 448bc5          mov     r8d,ebp
000007fe`773adb91 e802f3ffff      call    000007fe`773ace98
				if (tempIndex > index) index = tempIndex;
    ^^^^^^^^^^^^^^^^^^^^^^
000007fe`773adb96 413bc6          cmp     eax,r14d
000007fe`773adb99 7e03            jle     000007fe`773adb9e
				if (tempIndex > index) index = tempIndex;
                           ^^^^^^^^^^^^^^^^^^
000007fe`773adb9b 448bf0          mov     r14d,eax
			for (int i = 0; i < valueLength; i++)
                                    ^^^
000007fe`773adb9e 41ffc7          inc     r15d
000007fe`773adba1 443bfe          cmp     r15d,esi
000007fe`773adba4 7cde            jl      000007fe`773adb84
			return index;
   ^^^^^^^^^^^^^
000007fe`773adba6 418bc6          mov     eax,r14d

return 0; // A zero-length sequence is always treated as "found" at the start of the search space.

int index = -1;
int index = int.MaxValue;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to start with -1 here and avoid the normalization at the end.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great -- the "uint-trick" opens the door for this 😄

@gfoidl
Copy link
Copy Markdown
Member Author

gfoidl commented Dec 18, 2017

Span.LastIndexOfAny

I updated the benchmark-code, so that a the last position with the last value a match is found, i.e. more accurate measuring for the loop under test.

Here are the results more in the way I would expect:

BenchmarkDotNet=v0.10.11, OS=Windows 7 SP1 (6.1.7601.0)
Processor=Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge), ProcessorCount=8
Frequency=2241054 Hz, Resolution=446.2186 ns, Timer=TSC
.NET Core SDK=2.1.2
  [Host]     : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT

Method BufferSize ValuesSize Mean Error StdDev Scaled ScaledSD
LastIndexOfAny 1024 4 366.1 ns 3.141 ns 2.623 ns 1.00 0.00
LastIndexOfAnyNew 1024 4 363.6 ns 1.825 ns 1.707 ns 0.99 0.01
LastIndexOfAny 1024 10 1,035.3 ns 2.025 ns 1.465 ns 1.00 0.00
LastIndexOfAnyNew 1024 10 1,033.0 ns 8.327 ns 6.954 ns 1.00 0.01
LastIndexOfAny 4096 4 1,122.3 ns 5.109 ns 4.779 ns 1.00 0.00
LastIndexOfAnyNew 4096 4 1,126.3 ns 2.386 ns 1.863 ns 1.00 0.00
LastIndexOfAny 4096 10 3,408.7 ns 66.348 ns 99.307 ns 1.00 0.00
LastIndexOfAnyNew 4096 10 3,301.6 ns 18.005 ns 15.961 ns 0.97 0.03

Copy link
Copy Markdown
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jkotas jkotas merged commit 016659c into dotnet:master Dec 18, 2017
@karelz karelz added this to the 2.1.0 milestone Dec 28, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* SpanHelpers IndexOfAny-methods use sentinel-value

Instead of having _valueLength_-times an if-check in the loop,
a sentinel value is used and hence only one final checks needs to done.

Perf-win depends on the size of _valueLength_, the greater the more win.

* LastIndexOf initializes index with -1 instead of int.MinValue

As of PR-feedback.

* Better checks based on if instead of ternary operator

As of PR-feedback.

* SpanHelpers.byte.cs IndexOfAny can avoid int.MaxValue due to unit-trick

As of PR-Feedback


Commit migrated from dotnet/corefx@016659c
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants