Skip to content
This repository was archived by the owner on Aug 2, 2023. It is now read-only.

Create optimized TryRead<T>.#2419

Merged
JeremyKuhne merged 3 commits into
dotnet:masterfrom
JeremyKuhne:binaryread
Aug 7, 2018
Merged

Create optimized TryRead<T>.#2419
JeremyKuhne merged 3 commits into
dotnet:masterfrom
JeremyKuhne:binaryread

Conversation

@JeremyKuhne
Copy link
Copy Markdown
Member

Move helpers to instance methods, split out slow paths and inline fast paths where needed.

Add some perf tests.

Overhead from implicit struct copies, method calls was significant. The best pattern I can come up with is to keep the fast (single span) reads in tight methods with callouts for aggregation. Changed the pattern of peek to return a direct span rather than a copy when it isn't necessary.

Before:

    Method |     Mean |    Error |   StdDev |
---------- |---------:|---------:|---------:|
 ReadInt32 | 328.9 us | 2.700 us | 2.525 us |
 PeekSpan  | 2.589 ms | 0.0125 ms | 0.0117 ms |

After

    Method |     Mean |     Error |    StdDev |
---------- |---------:|----------:|----------:|
 ReadInt32 | 102.1 us | 0.8000 us | 0.7483 us |
  PeekSpan | 240.5 us | 2.4859 us | 2.2037 us |

Roughly 3x / 10x improvement for binary reads. Parsing reads from 20% (split input) - 37% improvement in the benchmarks I added.

Move helpers to instance methods, split out slow paths and inline fast paths where needed.

Add some perf tests.
@JeremyKuhne
Copy link
Copy Markdown
Member Author

I also removed endianness in the binary read. Doesn't seem worth taking a hit for. If we really want it we can add slower overloads.

@davidfowl
Copy link
Copy Markdown
Member

If you're going to pick an endianess to support, you'd want big endian since most networking protocols are big endian.

@davidfowl
Copy link
Copy Markdown
Member

cc @mgravell @benaadams

{
var unread = reader.UnreadSegment;
if (Utf8Parser.TryParse(unread, out value, out int consumed))
const int MaxLength = 15;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the buffer contains leading zeros? Would MaxLength = 15 work in that case?

Also, where is 15 coming from?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to be at the start- only the user can know what is relevant. 15 is the maximum size of the incoming string in the given format.

Comment thread tests/Benchmarks/Benchmarks.csproj Outdated
<EmbeddedResource Include="System.Text.Primitives\SampleTexts\*.txt" />
</ItemGroup>
<ItemGroup>
<Compile Include="..\System.Buffers.Primitives.Tests\BufferSegment.cs" Link="BufferSegment.cs" />
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include a project reference to the entire System.Buffers.Primitives.Tests instead, similar to other project references?

https://github.com/dotnet/corefxlab/pull/2419/files#diff-2d9cd050a80433427e0c8791e9b28ed5R51

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried that first, it didn't work out. We link in those files in multiple test projects. I should probably move the helpers to a shared project.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried that first, it didn't work out.

What ended up breaking?

Also, I tried to run the Reader_Binary tests locally, and i can't. I am getting the following errors for some reason. Have you come across this before @JeremyKuhne?

// Validating benchmarks:
Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=99, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf16toUtf8 has 7
Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf16toUtf8 has 7
Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=9999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf16toUtf8 has 7
Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=99, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf32toUtf16 has 7
Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf32toUtf16 has 7
Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=9999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf32toUtf16 has 7
Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=99, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf8toUtf16 has 7
Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf8toUtf16 has 7
Only 1 benchmark method in a group can have "Baseline = true" applied to it, group DefaultJob-[Length=9999, CodePointInfo=Syste(...)Point [43]] in class EncodeFromUtf8toUtf16 has 7

Any ideas? cc @adamsitnik

{
public class Reader_Binary
{
static byte[] s_array;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add access modifiers explicitly

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry, copying an existing test. I'll fix that one too.

public void PeekSpan()
{
const int Count = 1000;
const int Iterations = 100000;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we manually setting up the iterations here? Is the objective here to measure reader.Peek and amortize the overhead of the setup of Span/BufferReader (which are ref structs and can't be saved as fields in global setup)?

cc @adamsitnik

Copy link
Copy Markdown
Member

@adamsitnik adamsitnik Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahsonkhan yes, you are right.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I couldn't hold the reader as you mention.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamsitnik, do you plan to add support for ref struct fields in BDN?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahsonkhan no, at least as of today. It would imply too many changes for our engine (today we derive a class from type with benchmarks, with structs it's impossible) +the stack only has too many limitations, example: it can't be used for [MemberData] because it requires to cast to object

 public IEnumerator<object[]> GetEnumerator()
    {
        yield return new object[] { 1, 2, 3 };
        yield return new object[] { -4, -6, -10 };
        yield return new object[] { -2, 2, 0 };
        yield return new object[] { int.MinValue, -1, int.MaxValue };
    }

we also can't use Func<Span<T>> because it can't be generic argument ;(

[GlobalSetup]
public void Setup()
{
s_array = new byte[100000];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use underscores as digit separators

{
public static ReadOnlySequence<byte> CreateSplitBuffer(byte[] buffer, int minSize, int maxSize)
{
if (buffer == null || buffer.Length == 0 || minSize <= 0 || maxSize <= 0)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if minSize > maxSize?

var unread = reader.UnreadSegment;
if (Utf8Parser.TryParse(unread, out value, out int consumed))
var unread = UnreadSegment;
if (Utf8Parser.TryParse(unread, out value, out int consumed) && consumed < unread.Length)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to compare consumed with unread.Length here? Isn't parsing bool known to be of constant length (i.e. either 4 or 5)? Shouldn't we return true if TryParse returns true, regardless?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For bool I can see dropping out. The others don't know if there is more valid data beyond a segment. I'll change this one and add a comment.

{
var unread = reader.UnreadSegment;
if (Utf8Parser.TryParse(unread, out value, out int consumed))
var unread = UnreadSegment;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: avoid using var here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clean up the existing vars.

@benaadams
Copy link
Copy Markdown
Member

Overhead from implicit struct copies, method calls was significant.

Can you add more detail/examples to "Inlined Struct wrapper .ctors are non-zero cost" https://github.com/dotnet/coreclr/issues/18542

@benaadams
Copy link
Copy Markdown
Member

If you're going to pick an endianess to support, you'd want big endian since most networking protocols are big endian.

When using can always add a non-branching byteswap after (using BitConverter.IsLittleEndian taken to a readonly static to change it to a Jit const); though probably also want @GrabYourPitchforks' "[WIP] Add support for BSWAP intrinsic" dotnet/coreclr#18398

Custom protocols; IPC and file formats aren't necessarily big endian (e.g. why convert to big endian then back to little endian when 90% of the machines are little endian these days)

@mgravell
Copy link
Copy Markdown
Contributor

mgravell commented Aug 7, 2018

on endianness - yah, I remember having lots of convos about whether it made sense to worry about endianness at this level, when you don't know the type (individual fields vs entire struct, etc)

But: endianness is important, and despite what @davidfowl says; plenty use "little" too :)

but that's OK; I can use

SomeStruct val // TryRead etc
if(!BitConverter.IsLittleEndian) val.Foo = BinaryPrimitives.ReverseEndianness(val.Foo);

which should inline / JIT-hoist nicely enough.

It would be nice if there was a:

var x = BinaryPrimitives.InterpretAsLittleEndian(val);
var y = BinaryPrimitives.InterpretAsBigEndian(val);

but... I'll settle for what I have :)

example impl, to show what I mean:

public static int InterpretAsLittleEndian(int val)
    => BitConverter.IsLittleEndian ? val : ReverseEndianness(val);
public static int InterpretAsBigEndian(int val)
    => BitConverter.IsLittleEndian ? ReverseEndianness(val) : val;

I can always add that kind of this in local utility methods, of course.

@Drawaes
Copy link
Copy Markdown
Contributor

Drawaes commented Aug 7, 2018

There has been a solid push in low latency finance recently to move away from bigendian formats because it's just a waste of cycles to have to reverse everything so yes little endian is important.

@benaadams
Copy link
Copy Markdown
Member

Ah, BitConverter.IsLittleEndian is an intrinsic now in CoreClr dotnet/coreclr#9789

@JeremyKuhne
Copy link
Copy Markdown
Member Author

Can you add more detail/examples to "Inlined Struct wrapper .ctors are non-zero cost"

@benaadams I didn't look at the resulting IL, I just measured. Here are the numbers I saw (before breaking the slow path out of the main method).

Use static method in class that takes instance

    Method |     Mean |    Error |   StdDev |
---------- |---------:|---------:|---------:|
 ReadInt32 | 318.0 us | 5.955 us | 5.279 us |

Use instance method

    Method |     Mean |    Error |   StdDev |
---------- |---------:|---------:|---------:|
 ReadInt32 | 272.9 us | 5.203 us | 5.110 us |

@JeremyKuhne
Copy link
Copy Markdown
Member Author

I've added some endian sample methods public unsafe bool TryReadInt32LittleEndian(out int value). If you're aligned with the proper platform it ends up being just about as fast:

                 Method |     Mean |     Error |    StdDev |
----------------------- |---------:|----------:|----------:|
              ReadInt32 | 101.7 us | 0.5264 us | 0.4924 us |
    ReadInt32_BigEndian | 137.2 us | 1.0393 us | 0.9722 us |
 ReadInt32_LittleEndian | 103.5 us | 0.6634 us | 0.6206 us |

@mgravell, @davidfowl, @Drawaes, @benaadams Thoughts?

@benaadams
Copy link
Copy Markdown
Member

I like it 👍

Don't think unsigned are needed? Would rather keep signature count lower.
@JeremyKuhne JeremyKuhne merged commit 138c21a into dotnet:master Aug 7, 2018
@JeremyKuhne JeremyKuhne deleted the binaryread branch August 7, 2018 23:18
@JeremyKuhne
Copy link
Copy Markdown
Member Author

Still taking feedback of course- merging in the current state.

Debug.Assert(UnreadSegment.Length < sizeof(T));

// Not enough data in the current segment, try to peek for the data we need.
byte* buffer = stackalloc byte[sizeof(T)];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this directly using stackalloc Span?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it won't let you pass it to Peek (for fear of being captured as the reader is a ref struct). Readonly methods should address some of this. I'm discussing with @jaredpar etc. about the possibilities of adding a feature that allows methods to take spans and specify that they won't stash the incoming span argument.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants