Skip to content

API review notes for UTF-8 JsonReader#73

Merged
terrajobst merged 3 commits into
dotnet:masterfrom
ahsonkhan:JsonReaderReview
Nov 7, 2018
Merged

API review notes for UTF-8 JsonReader#73
terrajobst merged 3 commits into
dotnet:masterfrom
ahsonkhan:JsonReaderReview

Conversation

@ahsonkhan
Copy link
Copy Markdown
Contributor

@ahsonkhan ahsonkhan commented Oct 31, 2018

[JSON RFC](https://tools.ietf.org/html/rfc8259)

* We have several different input sources that we could receive JSON data from:
- Pipe (sync/async)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async only

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Will fix.

Sequence and hence it is not worth adding separate types. This leads to a
single type having some fields/properties that are not relevant for both
inputs but provides a single layer for the bottom of the JSON stack.
* It is possible to build **synchronous** stream and pipe-based readers on top
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipe-based readers can't be sync.

* We plan to provide stream support by providing a higher-level convenience type
that uses the `Utf8JsonReader` as the work horse to make stitching left over
data easier for the caller.
* There are plans to ship a Pipe to Stream adapter to help with the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main purpose of the adapter is Stream -> Pipe so I'm not 100% sure I understand this part. Does this mean that there won't be an easy way to use a pipe directly and parse a JSON payload?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main purpose of the adapter is Stream -> Pipe

I would have expected we'd have an adapter in both directions. Is that not the plan?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes both directions will exist but I don’t want to have to adapt a pipe to a stream in order to parse JSON.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes both directions will exist

Good

1.3, and provide a portable library).
- YSharp already has a package named [System.Text.Json](https://www.nuget.org/packages/System.Text.Json/) published on nuget.
We would need to replace it.
2) Should we pick the curretn API design or go with the alternative (B, below)?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: curretn

1) What namespace and assembly should these types live in? Can it be a new
assembly (it would have to be if we want to support older netstandard, say
1.3, and provide a portable library).
- YSharp already has a package named [System.Text.Json](https://www.nuget.org/packages/System.Text.Json/) published on nuget.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seriously... We need to preserve the System.* prefix. cc @terrajobst

of the low level "span and sequence" JsonReader. Stream would use the
span-based ctor, while Pipe would use the sequence-based ctor. However, we
don't plan to provide any pipe based higher-level type. Depending on
System.IO.Pipelines would be inverting the expected layering.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it invert the expected layering? As a separate reader type, it could live in either System.IO.Pipelines.dll or a higher-level assembly. I'm not saying we need to add it, but let's not let layering be the reason we don't.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. You make a good point that the pipe-based json reader can live in any higher-level assembly, so it doesn't make sense to not provide one because of layering.

support on top of the low level "span and sequence" JsonReader. It will
require new internal static methods and careful construction of the state
structs. Creating a new instance of the ref struct on every Read/ReadAsync is
not a viable solution since it is too slow.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you quantify the "too slow" part?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's 2-3x slower. I will clarify in the notes. For example see this diff:
https://www.diffchecker.com/rMwi4bCx

Note that the JsonReader struct is quite large (168+ bytes) and many of the Read operations take < 100 ns, while the ctor takes 20-40 ns.

And the results (just measuring synchronous read):

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17763
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-009706
  [Host]     : .NET Core 2.1.2 (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT
  Job-XIDFHR : .NET Core 2.1.3 (CoreCLR 4.6.26725.06, CoreFX 4.6.26725.05), 64bit RyuJIT

IterationCount=5  WarmupCount=3  
Method IsDataCompact TestCase Mean Error StdDev Scaled ScaledSD Gen 0 Allocated
CurrentReader True BasicJson 498.06 ns 41.597 ns 10.805 ns 1.00 0.00 - 0 B
RefStructStreamReader True BasicJson 770.21 ns 12.822 ns 3.330 ns 1.55 0.03 - 0 B
ClassStreamReaderWithNewRefStructReader True BasicJson 1,786.75 ns 179.893 ns 46.727 ns 3.59 0.11 0.0229 112 B
ClassStreamReaderWithStatic True BasicJson 750.87 ns 35.707 ns 9.275 ns 1.51 0.03 0.0372 160 B
CurrentReader True HelloWorld 90.96 ns 4.526 ns 1.176 ns 1.00 0.00 - 0 B
RefStructStreamReader True HelloWorld 339.26 ns 6.541 ns 1.699 ns 3.73 0.05 - 0 B
ClassStreamReaderWithNewRefStructReader True HelloWorld 503.95 ns 27.721 ns 7.200 ns 5.54 0.10 0.0257 112 B
ClassStreamReaderWithStatic True HelloWorld 245.16 ns 9.455 ns 2.456 ns 2.70 0.04 0.0377 160 B
CurrentReader True Json400B 751.07 ns 38.120 ns 9.902 ns 1.00 0.00 - 0 B
RefStructStreamReader True Json400B 1,109.63 ns 98.998 ns 25.714 ns 1.48 0.04 - 0 B
ClassStreamReaderWithNewRefStructReader True Json400B 2,553.46 ns 98.878 ns 25.683 ns 3.40 0.05 0.0229 112 B
ClassStreamReaderWithStatic True Json400B 1,022.42 ns 39.739 ns 10.322 ns 1.36 0.02 0.0362 160 B
CurrentReader True Json400KB 795,532.02 ns 41,610.493 ns 10,808.171 ns 1.00 0.00 - 0 B
RefStructStreamReader True Json400KB 839,487.23 ns 26,588.312 ns 6,906.215 ns 1.06 0.02 - 0 B
ClassStreamReaderWithNewRefStructReader True Json400KB 1,948,951.44 ns 455,567.198 ns 118,331.885 ns 2.45 0.14 - 112 B
ClassStreamReaderWithStatic True Json400KB 933,847.15 ns 67,642.311 ns 17,569.839 ns 1.17 0.02 - 160 B
CurrentReader True Json40KB 73,969.82 ns 3,540.797 ns 919.709 ns 1.00 0.00 - 0 B
RefStructStreamReader True Json40KB 79,708.97 ns 3,805.099 ns 988.360 ns 1.08 0.02 - 0 B
ClassStreamReaderWithNewRefStructReader True Json40KB 181,556.41 ns 22,898.544 ns 5,947.812 ns 2.45 0.08 - 112 B
ClassStreamReaderWithStatic True Json40KB 88,575.84 ns 1,936.204 ns 502.922 ns 1.20 0.01 - 160 B
CurrentReader True Json4KB 5,657.49 ns 117.912 ns 30.627 ns 1.00 0.00 - 0 B
RefStructStreamReader True Json4KB 6,261.78 ns 351.670 ns 91.345 ns 1.11 0.02 - 0 B
ClassStreamReaderWithNewRefStructReader True Json4KB 17,110.21 ns 1,012.294 ns 262.939 ns 3.02 0.04 - 112 B
ClassStreamReaderWithStatic True Json4KB 6,673.03 ns 277.201 ns 72.002 ns 1.18 0.01 0.0305 160 B

Options = JsonReaderOptions.SkipComments
};

while (json.Read())
Copy link
Copy Markdown

@redknightlois redknightlois Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For high-performance scenarios this switch loop will be your worst offender; but that is a problem of all tokenize based parsers. You will hit the limit of performance very fast, I know because we already hit it. https://github.com/ravendb/ravendb/blob/v4.1/src/Sparrow/Json/BlittableJsonReaderObject.cs#L972

We are experimenting on a different paradigm of parsing where you scan and flag first, and then you can query the representation directly what you want instead. Loosely based on the ideas from https://www.microsoft.com/en-us/research/publication/mison-fast-json-parser-data-analytics/

For example:

  • Where's the next object and get a Span<byte> over it instead that you can further parse or ignore entirely, etc.
  • Get all the children Span<byte> for this level.
  • Iterate over all array value Span<byte> which you can parse further if you want to.
  • etc.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, Utf8JsonReader is an abstraction that you pay for. E.g. if you have a fixed schema or if you are not worried about having a reasonable API, you can certainly get higher performance by avoiding this abstraction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants