Remove unnecessary logic to determine E format when parsing floating... #42298

jozkee · 2020-09-16T03:30:36Z

... point values on Utf8JsonReader and JsonDocument.

The internal field Utf8JsonReader._numberFormat is set to 'e' only if the JSON number that was read was in scientific notation format and is then passed as the standardFormat argument on Utf8Parser.TryParse method.

There is no difference from parsing using the default format, except for the next condition, which is granted to always evaluate as false when called from Utf8JsonReader:

runtime/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Float.cs

Lines 104 to 108 in 33dba95

    
           if ((!textUsedExponentNotation) && (standardFormat == 'E' || standardFormat == 'e')) 
        
           { 
        
               bytesConsumed = 0; 
        
               return false; 
        
           }

With that said, the _numberFormat field and all the related logic can be safely removed.
This change was motivated by #39363 (comment).

…point values on Utf8JsonReader and JsonDocument

ahsonkhan · 2020-09-16T07:12:48Z

This change makes sense.

There is no difference from parsing using the default format

There is a subtle difference. The only time we'd use 'e'/'E' standard format is if we require the number to be in scientific notation form, since Utf8Parser.TryParse(...) would return false for "normalized" inputs in that case.

Default, G, and e/E, all allow exponents, but e/E is the only one that requires them. Looking back even to 2.1 (when these APIs were originally introduced), this behavior hasn't changed:
https://github.com/dotnet/corefx/blob/2b5ec6d425c44a8dac3ccdcc355ce63b7fc80ddc/src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Float.cs#L77
I also don't think there is any performance impact of using default vs e, both for number in scientific notation and normal notation. It might be worth doing a quick test to validate.

One of the benefits of keeping the number format field is that it acts as validation that the data being passed through the layers is correct (i.e. the token that the reader processed, is what is being passed to the parser). If while reading the JSON, we found the number was in scientific notation, then it is useful to pass 'e' to the Utf8Parser API so that it can verify that indeed that is the case (and rejects input like 123, if there happened to be a data flow bug).

However, given the extensive set of tests we already have, and since we benefit from reducing the state needed to be tracked which simplifies the logic, it is worth it to remove that field.

// A regular number like 123 isn't valid, it needs to be in scientific notation, like 123e0
// This validation is not required when it comes to JSON.
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123"), out float val, out _, 'e')); // false
Console.WriteLine(val); // 0
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123"), out val, out _, 'E')); // false
Console.WriteLine(val); // 0

// Currently, casing doesn't matter, since 'e' and 'E' both allow 123e1 and 123E1.	
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123e1"), out val, out _, 'e')); // true
Console.WriteLine(val); // 1230
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123E1"), out val, out _, 'e')); // true
Console.WriteLine(val); // 1230
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123e1"), out val, out _, 'E')); // true
Console.WriteLine(val); // 1230
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123E1"), out val, out _, 'E')); // true
Console.WriteLine(val); // 1230

// The behavior is identical, if we don't pass in a standard format, and use the default one.
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123"), out val, out _)); // true
Console.WriteLine(val); // 123
		
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123e1"), out val, out _)); // true
Console.WriteLine(val); // 1230
Console.WriteLine(Utf8Parser.TryParse(Encoding.UTF8.GetBytes("123E1"), out val, out _)); // true
Console.WriteLine(val); // 1230

For some context, the use of this field was introduced in the following PR, to avoid having to loop through and check if the number span contained an 'e': dotnet/corefx#34386, which as you point out, wasn't necessary to begin with.

ahsonkhan

Nice!

steveharter · 2020-09-17T12:26:34Z

src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.TryGet.cs


-            if (Utf8Parser.TryParse(span, out decimal tmp, out int bytesConsumed, _numberFormat)
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        internal bool TryGetDecimalCore(out decimal value, ReadOnlySpan<byte> span)


nit: the out parameter is normally last or towards the end.

All other TryGet*Core methods have this same parameter ordering and since this is internal I think is no much value on changing the ordering here, perhaps that can be made in a clean-up PR later.

Remove unnecessary logic to determine E format when parsing floating …

c3e0b60

…point values on Utf8JsonReader and JsonDocument

jozkee added the area-System.Text.Json label Sep 16, 2020

jozkee added this to the 6.0.0 milestone Sep 16, 2020

jozkee requested review from layomia and steveharter as code owners September 16, 2020 03:30

jozkee self-assigned this Sep 16, 2020

ahsonkhan approved these changes Sep 16, 2020

View reviewed changes

layomia approved these changes Sep 17, 2020

View reviewed changes

steveharter reviewed Sep 17, 2020

View reviewed changes

jozkee merged commit 527f9ae into dotnet:master Sep 17, 2020

jozkee deleted the remove_numberformat branch September 17, 2020 22:45

layomia mentioned this pull request Sep 20, 2020

Add JsonNumberHandling(Attribute) & support for (de)serializing numbers from/to string #39363

Merged

ghost locked as resolved and limited conversation to collaborators Dec 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remove unnecessary logic to determine E format when parsing floating... #42298

Remove unnecessary logic to determine E format when parsing floating... #42298

Uh oh!

jozkee commented Sep 16, 2020

Uh oh!

ahsonkhan commented Sep 16, 2020

Uh oh!

ahsonkhan left a comment

Uh oh!

steveharter Sep 17, 2020

Uh oh!

jozkee Sep 17, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	if ((!textUsedExponentNotation) && (standardFormat == 'E' \|\| standardFormat == 'e'))
	{
	bytesConsumed = 0;
	return false;
	}

Remove unnecessary logic to determine E format when parsing floating... #42298

Remove unnecessary logic to determine E format when parsing floating... #42298

Uh oh!

Conversation

jozkee commented Sep 16, 2020

Uh oh!

ahsonkhan commented Sep 16, 2020

Uh oh!

ahsonkhan left a comment

Choose a reason for hiding this comment

Uh oh!

steveharter Sep 17, 2020

Choose a reason for hiding this comment

Uh oh!

jozkee Sep 17, 2020

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants