Skip to content

Conversation

@jozkee
Copy link
Member

@jozkee jozkee commented Sep 16, 2020

... point values on Utf8JsonReader and JsonDocument.

The internal field Utf8JsonReader._numberFormat is set to 'e' only if the JSON number that was read was in scientific notation format and is then passed as the standardFormat argument on Utf8Parser.TryParse method.

There is no difference from parsing using the default format, except for the next condition, which is granted to always evaluate as false when called from Utf8JsonReader:

if ((!textUsedExponentNotation) && (standardFormat == 'E' || standardFormat == 'e'))
{
bytesConsumed = 0;
return false;
}

With that said, the _numberFormat field and all the related logic can be safely removed.
This change was motivated by #39363 (comment).

…point values on Utf8JsonReader and JsonDocument
@jozkee jozkee added this to the 6.0.0 milestone Sep 16, 2020
@jozkee jozkee self-assigned this Sep 16, 2020
@ahsonkhan
Copy link
Contributor

This change makes sense.

There is no difference from parsing using the default format

There is a subtle difference. The only time we'd use 'e'/'E' standard format is if we require the number to be in scientific notation form, since Utf8Parser.TryParse(...) would return false for "normalized" inputs in that case.

Default, G, and e/E, all allow exponents, but e/E is the only one that requires them. Looking back even to 2.1 (when these APIs were originally introduced), this behavior hasn't changed:
https://github.com/dotnet/corefx/blob/2b5ec6d425c44a8dac3ccdcc355ce63b7fc80ddc/src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Float.cs#L77
I also don't think there is any performance impact of using default vs e, both for number in scientific notation and normal notation. It might be worth doing a quick test to validate.

One of the benefits of keeping the number format field is that it acts as validation that the data being passed through the layers is correct (i.e. the token that the reader processed, is what is being passed to the parser). If while reading the JSON, we found the number was in scientific notation, then it is useful to pass 'e' to the Utf8Parser API so that it can verify that indeed that is the case (and rejects input like 123, if there happened to be a data flow bug).

However, given the extensive set of tests we already have, and since we benefit from reducing the state needed to be tracked which simplifies the logic, it is worth it to remove that field.

// A regular number like 123 isn't valid, it needs to be in scientific notation, like 123e0
// This validation is not required when it comes to JSON.
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123"), out float val, out _, 'e')); // false
Console.WriteLine(val); // 0
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123"), out val, out _, 'E')); // false
Console.WriteLine(val); // 0

// Currently, casing doesn't matter, since 'e' and 'E' both allow 123e1 and 123E1.	
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123e1"), out val, out _, 'e')); // true
Console.WriteLine(val); // 1230
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123E1"), out val, out _, 'e')); // true
Console.WriteLine(val); // 1230
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123e1"), out val, out _, 'E')); // true
Console.WriteLine(val); // 1230
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123E1"), out val, out _, 'E')); // true
Console.WriteLine(val); // 1230

// The behavior is identical, if we don't pass in a standard format, and use the default one.
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123"), out val, out _)); // true
Console.WriteLine(val); // 123
		
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123e1"), out val, out _)); // true
Console.WriteLine(val); // 1230
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123E1"), out val, out _)); // true
Console.WriteLine(val); // 1230

For some context, the use of this field was introduced in the following PR, to avoid having to loop through and check if the number span contained an 'e': dotnet/corefx#34386, which as you point out, wasn't necessary to begin with.

Copy link
Contributor

@ahsonkhan ahsonkhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!


if (Utf8Parser.TryParse(span, out decimal tmp, out int bytesConsumed, _numberFormat)
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal bool TryGetDecimalCore(out decimal value, ReadOnlySpan<byte> span)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the out parameter is normally last or towards the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other TryGet*Core methods have this same parameter ordering and since this is internal I think is no much value on changing the ordering here, perhaps that can be made in a clean-up PR later.

@jozkee jozkee merged commit 527f9ae into dotnet:master Sep 17, 2020
@jozkee jozkee deleted the remove_numberformat branch September 17, 2020 22:45
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants