Create optimized TryRead<T>. by JeremyKuhne · Pull Request #2419 · dotnet/corefxlab

JeremyKuhne · 2018-08-07T03:47:04Z

Move helpers to instance methods, split out slow paths and inline fast paths where needed.

Add some perf tests.

Overhead from implicit struct copies, method calls was significant. The best pattern I can come up with is to keep the fast (single span) reads in tight methods with callouts for aggregation. Changed the pattern of peek to return a direct span rather than a copy when it isn't necessary.

Before:

    Method |     Mean |    Error |   StdDev |
---------- |---------:|---------:|---------:|
 ReadInt32 | 328.9 us | 2.700 us | 2.525 us |
 PeekSpan  | 2.589 ms | 0.0125 ms | 0.0117 ms |

After

    Method |     Mean |     Error |    StdDev |
---------- |---------:|----------:|----------:|
 ReadInt32 | 102.1 us | 0.8000 us | 0.7483 us |
  PeekSpan | 240.5 us | 2.4859 us | 2.2037 us |

Roughly 3x / 10x improvement for binary reads. Parsing reads from 20% (split input) - 37% improvement in the benchmarks I added.

Move helpers to instance methods, split out slow paths and inline fast paths where needed. Add some perf tests.

JeremyKuhne · 2018-08-07T03:55:37Z

I also removed endianness in the binary read. Doesn't seem worth taking a hit for. If we really want it we can add slower overloads.

davidfowl · 2018-08-07T06:48:33Z

If you're going to pick an endianess to support, you'd want big endian since most networking protocols are big endian.

davidfowl · 2018-08-07T06:55:53Z

cc @mgravell @benaadams

ahsonkhan · 2018-08-07T07:39:20Z

        {
-            var unread = reader.UnreadSegment;
-            if (Utf8Parser.TryParse(unread, out value, out int consumed))
+            const int MaxLength = 15;


What happens if the buffer contains leading zeros? Would MaxLength = 15 work in that case?

Also, where is 15 coming from?

You have to be at the start- only the user can know what is relevant. 15 is the maximum size of the incoming string in the given format.

ahsonkhan · 2018-08-07T07:47:14Z

    <EmbeddedResource Include="System.Text.Primitives\SampleTexts\*.txt" />
  </ItemGroup>
+  <ItemGroup>
+    <Compile Include="..\System.Buffers.Primitives.Tests\BufferSegment.cs" Link="BufferSegment.cs" />


Include a project reference to the entire System.Buffers.Primitives.Tests instead, similar to other project references?

https://github.com/dotnet/corefxlab/pull/2419/files#diff-2d9cd050a80433427e0c8791e9b28ed5R51

Tried that first, it didn't work out. We link in those files in multiple test projects. I should probably move the helpers to a shared project.

Tried that first, it didn't work out.

What ended up breaking?

Also, I tried to run the Reader_Binary tests locally, and i can't. I am getting the following errors for some reason. Have you come across this before @JeremyKuhne?

// Validating benchmarks: Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=99, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf16toUtf8 has 7 Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf16toUtf8 has 7 Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=9999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf16toUtf8 has 7 Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=99, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf32toUtf16 has 7 Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf32toUtf16 has 7 Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=9999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf32toUtf16 has 7 Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=99, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf8toUtf16 has 7 Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf8toUtf16 has 7 Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=9999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf8toUtf16 has 7

Any ideas? cc @adamsitnik

ahsonkhan · 2018-08-07T07:50:33Z

+{
+    public class Reader_Binary
+    {
+        static byte[] s_array;


nit: add access modifiers explicitly

Yeah, sorry, copying an existing test. I'll fix that one too.

ahsonkhan · 2018-08-07T07:55:39Z

+        public void PeekSpan()
+        {
+            const int Count = 1000;
+            const int Iterations = 100000;


Why are we manually setting up the iterations here? Is the objective here to measure reader.Peek and amortize the overhead of the setup of Span/BufferReader (which are ref structs and can't be saved as fields in global setup)?

cc @adamsitnik

@ahsonkhan yes, you are right.

Yes, I couldn't hold the reader as you mention.

@adamsitnik, do you plan to add support for ref struct fields in BDN?

@ahsonkhan no, at least as of today. It would imply too many changes for our engine (today we derive a class from type with benchmarks, with structs it's impossible) +the stack only has too many limitations, example: it can't be used for [MemberData] because it requires to cast to object

public IEnumerator<object[]> GetEnumerator() { yield return new object[] { 1, 2, 3 }; yield return new object[] { -4, -6, -10 }; yield return new object[] { -2, 2, 0 }; yield return new object[] { int.MinValue, -1, int.MaxValue }; }

we also can't use Func<Span<T>> because it can't be generic argument ;(

ahsonkhan · 2018-08-07T08:03:09Z

+        [GlobalSetup]
+        public void Setup()
+        {
+            s_array = new byte[100000];


nit: use underscores as digit separators

ahsonkhan · 2018-08-07T08:06:47Z

    {
+        public static ReadOnlySequence<byte> CreateSplitBuffer(byte[] buffer, int minSize, int maxSize)
+        {
+            if (buffer == null || buffer.Length == 0 || minSize <= 0 || maxSize <= 0)


Also if minSize > maxSize?

ahsonkhan · 2018-08-07T08:13:48Z

-            var unread = reader.UnreadSegment;
-            if (Utf8Parser.TryParse(unread, out value, out int consumed))
+            var unread = UnreadSegment;
+            if (Utf8Parser.TryParse(unread, out value, out int consumed) && consumed < unread.Length)


Why do we need to compare consumed with unread.Length here? Isn't parsing bool known to be of constant length (i.e. either 4 or 5)? Shouldn't we return true if TryParse returns true, regardless?

For bool I can see dropping out. The others don't know if there is more valid data beyond a segment. I'll change this one and add a comment.

ahsonkhan · 2018-08-07T08:15:23Z

        {
-            var unread = reader.UnreadSegment;
-            if (Utf8Parser.TryParse(unread, out value, out int consumed))
+            var unread = UnreadSegment;


nit: avoid using var here.

I'll clean up the existing vars.

benaadams · 2018-08-07T08:48:25Z

Overhead from implicit struct copies, method calls was significant.

Can you add more detail/examples to "Inlined Struct wrapper .ctors are non-zero cost" https://github.com/dotnet/coreclr/issues/18542

benaadams · 2018-08-07T09:12:47Z

If you're going to pick an endianess to support, you'd want big endian since most networking protocols are big endian.

When using can always add a non-branching byteswap after (using BitConverter.IsLittleEndian taken to a readonly static to change it to a Jit const); though probably also want @GrabYourPitchforks' "[WIP] Add support for BSWAP intrinsic" dotnet/coreclr#18398

Custom protocols; IPC and file formats aren't necessarily big endian (e.g. why convert to big endian then back to little endian when 90% of the machines are little endian these days)

mgravell · 2018-08-07T09:21:59Z

on endianness - yah, I remember having lots of convos about whether it made sense to worry about endianness at this level, when you don't know the type (individual fields vs entire struct, etc)

But: endianness is important, and despite what @davidfowl says; plenty use "little" too :)

but that's OK; I can use

SomeStruct val // TryRead etc
if(!BitConverter.IsLittleEndian) val.Foo = BinaryPrimitives.ReverseEndianness(val.Foo);

which should inline / JIT-hoist nicely enough.

It would be nice if there was a:

var x = BinaryPrimitives.InterpretAsLittleEndian(val);
var y = BinaryPrimitives.InterpretAsBigEndian(val);

but... I'll settle for what I have :)

example impl, to show what I mean:

public static int InterpretAsLittleEndian(int val)
    => BitConverter.IsLittleEndian ? val : ReverseEndianness(val);
public static int InterpretAsBigEndian(int val)
    => BitConverter.IsLittleEndian ? ReverseEndianness(val) : val;

I can always add that kind of this in local utility methods, of course.

Drawaes · 2018-08-07T09:24:10Z

There has been a solid push in low latency finance recently to move away from bigendian formats because it's just a waste of cycles to have to reverse everything so yes little endian is important.

benaadams · 2018-08-07T10:56:41Z

Ah, BitConverter.IsLittleEndian is an intrinsic now in CoreClr dotnet/coreclr#9789

JeremyKuhne · 2018-08-07T20:26:47Z

Can you add more detail/examples to "Inlined Struct wrapper .ctors are non-zero cost"

@benaadams I didn't look at the resulting IL, I just measured. Here are the numbers I saw (before breaking the slow path out of the main method).

Use static method in class that takes instance

    Method |     Mean |    Error |   StdDev |
---------- |---------:|---------:|---------:|
 ReadInt32 | 318.0 us | 5.955 us | 5.279 us |

Use instance method

    Method |     Mean |    Error |   StdDev |
---------- |---------:|---------:|---------:|
 ReadInt32 | 272.9 us | 5.203 us | 5.110 us |

JeremyKuhne · 2018-08-07T21:19:23Z

I've added some endian sample methods public unsafe bool TryReadInt32LittleEndian(out int value). If you're aligned with the proper platform it ends up being just about as fast:

                 Method |     Mean |     Error |    StdDev |
----------------------- |---------:|----------:|----------:|
              ReadInt32 | 101.7 us | 0.5264 us | 0.4924 us |
    ReadInt32_BigEndian | 137.2 us | 1.0393 us | 0.9722 us |
 ReadInt32_LittleEndian | 103.5 us | 0.6634 us | 0.6206 us |

@mgravell, @davidfowl, @Drawaes, @benaadams Thoughts?

benaadams · 2018-08-07T21:23:51Z

I like it 👍

Don't think unsigned are needed? Would rather keep signature count lower.

JeremyKuhne · 2018-08-07T23:18:41Z

Still taking feedback of course- merging in the current state.

davidfowl · 2018-08-08T06:08:29Z

+            Debug.Assert(UnreadSegment.Length < sizeof(T));
+
+            // Not enough data in the current segment, try to peek for the data we need.
+            byte* buffer = stackalloc byte[sizeof(T)];


Why isn't this directly using stackalloc Span?

Because it won't let you pass it to Peek (for fear of being captured as the reader is a ref struct). Readonly methods should address some of this. I'm discussing with @jaredpar etc. about the possibilities of adding a feature that allows methods to take spans and specify that they won't stash the incoming span argument.

Create optimized TryRead<T>.

8aa5a31

Move helpers to instance methods, split out slow paths and inline fast paths where needed. Add some perf tests.

JeremyKuhne requested review from ahsonkhan and davidfowl August 7, 2018 03:47

ahsonkhan reviewed Aug 7, 2018

View reviewed changes

ahsonkhan approved these changes Aug 7, 2018

View reviewed changes

joshfree approved these changes Aug 7, 2018

View reviewed changes

Address feedback, add int endian reader methods

3eaa5e2

benaadams approved these changes Aug 7, 2018

View reviewed changes

Fill out endian readers

6e9b077

Don't think unsigned are needed? Would rather keep signature count lower.

JeremyKuhne merged commit 138c21a into dotnet:master Aug 7, 2018

JeremyKuhne deleted the binaryread branch August 7, 2018 23:18

davidfowl reviewed Aug 8, 2018

View reviewed changes

Conversation

JeremyKuhne commented Aug 7, 2018

Uh oh!

JeremyKuhne commented Aug 7, 2018

Uh oh!

davidfowl commented Aug 7, 2018

Uh oh!

davidfowl commented Aug 7, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adamsitnik Aug 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

benaadams commented Aug 7, 2018

Uh oh!

benaadams commented Aug 7, 2018

Uh oh!

mgravell commented Aug 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Drawaes commented Aug 7, 2018

Uh oh!

benaadams commented Aug 7, 2018

Uh oh!

JeremyKuhne commented Aug 7, 2018

Uh oh!

JeremyKuhne commented Aug 7, 2018

Uh oh!

benaadams commented Aug 7, 2018

Uh oh!

JeremyKuhne commented Aug 7, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

adamsitnik Aug 7, 2018 •

edited

Loading

mgravell commented Aug 7, 2018 •

edited

Loading