API review notes for UTF-8 JsonReader#73
Conversation
| [JSON RFC](https://tools.ietf.org/html/rfc8259) | ||
|
|
||
| * We have several different input sources that we could receive JSON data from: | ||
| - Pipe (sync/async) |
There was a problem hiding this comment.
Good catch. Will fix.
| Sequence and hence it is not worth adding separate types. This leads to a | ||
| single type having some fields/properties that are not relevant for both | ||
| inputs but provides a single layer for the bottom of the JSON stack. | ||
| * It is possible to build **synchronous** stream and pipe-based readers on top |
There was a problem hiding this comment.
pipe-based readers can't be sync.
| * We plan to provide stream support by providing a higher-level convenience type | ||
| that uses the `Utf8JsonReader` as the work horse to make stitching left over | ||
| data easier for the caller. | ||
| * There are plans to ship a Pipe to Stream adapter to help with the |
There was a problem hiding this comment.
The main purpose of the adapter is Stream -> Pipe so I'm not 100% sure I understand this part. Does this mean that there won't be an easy way to use a pipe directly and parse a JSON payload?
There was a problem hiding this comment.
The main purpose of the adapter is Stream -> Pipe
I would have expected we'd have an adapter in both directions. Is that not the plan?
There was a problem hiding this comment.
Yes both directions will exist but I don’t want to have to adapt a pipe to a stream in order to parse JSON.
There was a problem hiding this comment.
Yes both directions will exist
Good
| 1.3, and provide a portable library). | ||
| - YSharp already has a package named [System.Text.Json](https://www.nuget.org/packages/System.Text.Json/) published on nuget. | ||
| We would need to replace it. | ||
| 2) Should we pick the curretn API design or go with the alternative (B, below)? |
| 1) What namespace and assembly should these types live in? Can it be a new | ||
| assembly (it would have to be if we want to support older netstandard, say | ||
| 1.3, and provide a portable library). | ||
| - YSharp already has a package named [System.Text.Json](https://www.nuget.org/packages/System.Text.Json/) published on nuget. |
There was a problem hiding this comment.
Seriously... We need to preserve the System.* prefix. cc @terrajobst
| of the low level "span and sequence" JsonReader. Stream would use the | ||
| span-based ctor, while Pipe would use the sequence-based ctor. However, we | ||
| don't plan to provide any pipe based higher-level type. Depending on | ||
| System.IO.Pipelines would be inverting the expected layering. |
There was a problem hiding this comment.
Why would it invert the expected layering? As a separate reader type, it could live in either System.IO.Pipelines.dll or a higher-level assembly. I'm not saying we need to add it, but let's not let layering be the reason we don't.
There was a problem hiding this comment.
That's true. You make a good point that the pipe-based json reader can live in any higher-level assembly, so it doesn't make sense to not provide one because of layering.
| support on top of the low level "span and sequence" JsonReader. It will | ||
| require new internal static methods and careful construction of the state | ||
| structs. Creating a new instance of the ref struct on every Read/ReadAsync is | ||
| not a viable solution since it is too slow. |
There was a problem hiding this comment.
Can you quantify the "too slow" part?
There was a problem hiding this comment.
Sure, it's 2-3x slower. I will clarify in the notes. For example see this diff:
https://www.diffchecker.com/rMwi4bCx
Note that the JsonReader struct is quite large (168+ bytes) and many of the Read operations take < 100 ns, while the ctor takes 20-40 ns.
And the results (just measuring synchronous read):
BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17763
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-009706
[Host] : .NET Core 2.1.2 (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT
Job-XIDFHR : .NET Core 2.1.3 (CoreCLR 4.6.26725.06, CoreFX 4.6.26725.05), 64bit RyuJIT
IterationCount=5 WarmupCount=3
| Method | IsDataCompact | TestCase | Mean | Error | StdDev | Scaled | ScaledSD | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| CurrentReader | True | BasicJson | 498.06 ns | 41.597 ns | 10.805 ns | 1.00 | 0.00 | - | 0 B |
| RefStructStreamReader | True | BasicJson | 770.21 ns | 12.822 ns | 3.330 ns | 1.55 | 0.03 | - | 0 B |
| ClassStreamReaderWithNewRefStructReader | True | BasicJson | 1,786.75 ns | 179.893 ns | 46.727 ns | 3.59 | 0.11 | 0.0229 | 112 B |
| ClassStreamReaderWithStatic | True | BasicJson | 750.87 ns | 35.707 ns | 9.275 ns | 1.51 | 0.03 | 0.0372 | 160 B |
| CurrentReader | True | HelloWorld | 90.96 ns | 4.526 ns | 1.176 ns | 1.00 | 0.00 | - | 0 B |
| RefStructStreamReader | True | HelloWorld | 339.26 ns | 6.541 ns | 1.699 ns | 3.73 | 0.05 | - | 0 B |
| ClassStreamReaderWithNewRefStructReader | True | HelloWorld | 503.95 ns | 27.721 ns | 7.200 ns | 5.54 | 0.10 | 0.0257 | 112 B |
| ClassStreamReaderWithStatic | True | HelloWorld | 245.16 ns | 9.455 ns | 2.456 ns | 2.70 | 0.04 | 0.0377 | 160 B |
| CurrentReader | True | Json400B | 751.07 ns | 38.120 ns | 9.902 ns | 1.00 | 0.00 | - | 0 B |
| RefStructStreamReader | True | Json400B | 1,109.63 ns | 98.998 ns | 25.714 ns | 1.48 | 0.04 | - | 0 B |
| ClassStreamReaderWithNewRefStructReader | True | Json400B | 2,553.46 ns | 98.878 ns | 25.683 ns | 3.40 | 0.05 | 0.0229 | 112 B |
| ClassStreamReaderWithStatic | True | Json400B | 1,022.42 ns | 39.739 ns | 10.322 ns | 1.36 | 0.02 | 0.0362 | 160 B |
| CurrentReader | True | Json400KB | 795,532.02 ns | 41,610.493 ns | 10,808.171 ns | 1.00 | 0.00 | - | 0 B |
| RefStructStreamReader | True | Json400KB | 839,487.23 ns | 26,588.312 ns | 6,906.215 ns | 1.06 | 0.02 | - | 0 B |
| ClassStreamReaderWithNewRefStructReader | True | Json400KB | 1,948,951.44 ns | 455,567.198 ns | 118,331.885 ns | 2.45 | 0.14 | - | 112 B |
| ClassStreamReaderWithStatic | True | Json400KB | 933,847.15 ns | 67,642.311 ns | 17,569.839 ns | 1.17 | 0.02 | - | 160 B |
| CurrentReader | True | Json40KB | 73,969.82 ns | 3,540.797 ns | 919.709 ns | 1.00 | 0.00 | - | 0 B |
| RefStructStreamReader | True | Json40KB | 79,708.97 ns | 3,805.099 ns | 988.360 ns | 1.08 | 0.02 | - | 0 B |
| ClassStreamReaderWithNewRefStructReader | True | Json40KB | 181,556.41 ns | 22,898.544 ns | 5,947.812 ns | 2.45 | 0.08 | - | 112 B |
| ClassStreamReaderWithStatic | True | Json40KB | 88,575.84 ns | 1,936.204 ns | 502.922 ns | 1.20 | 0.01 | - | 160 B |
| CurrentReader | True | Json4KB | 5,657.49 ns | 117.912 ns | 30.627 ns | 1.00 | 0.00 | - | 0 B |
| RefStructStreamReader | True | Json4KB | 6,261.78 ns | 351.670 ns | 91.345 ns | 1.11 | 0.02 | - | 0 B |
| ClassStreamReaderWithNewRefStructReader | True | Json4KB | 17,110.21 ns | 1,012.294 ns | 262.939 ns | 3.02 | 0.04 | - | 112 B |
| ClassStreamReaderWithStatic | True | Json4KB | 6,673.03 ns | 277.201 ns | 72.002 ns | 1.18 | 0.01 | 0.0305 | 160 B |
| Options = JsonReaderOptions.SkipComments | ||
| }; | ||
|
|
||
| while (json.Read()) |
There was a problem hiding this comment.
For high-performance scenarios this switch loop will be your worst offender; but that is a problem of all tokenize based parsers. You will hit the limit of performance very fast, I know because we already hit it. https://github.com/ravendb/ravendb/blob/v4.1/src/Sparrow/Json/BlittableJsonReaderObject.cs#L972
We are experimenting on a different paradigm of parsing where you scan and flag first, and then you can query the representation directly what you want instead. Loosely based on the ideas from https://www.microsoft.com/en-us/research/publication/mison-fast-json-parser-data-analytics/
For example:
- Where's the next object and get a
Span<byte>over it instead that you can further parse or ignore entirely, etc. - Get all the children
Span<byte>for this level. - Iterate over all array value
Span<byte>which you can parse further if you want to. - etc.
There was a problem hiding this comment.
Yep, Utf8JsonReader is an abstraction that you pay for. E.g. if you have a fixed schema or if you are not worried about having a reasonable API, you can certainly get higher performance by avoiding this abstraction.
cc @joshfree, @KrzysztofCwalina, @steveharter, @JeremyKuhne, @bartonjs, @GrabYourPitchforks, @steveharter, @terrajobst, @davidfowl, @JamesNK, @glennc, @stephentoub