Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic. by tannergooding · Pull Request #25901 · dotnet/coreclr

tannergooding · 2019-07-26T17:45:00Z

This resolves https://github.com/dotnet/coreclr/issues/25857 and also brings in a nice little perf increase for the managed implementation.

Before Intrinsic

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
Round	4.169 us	0.0134 us	0.0119 us	4.168 us	4.153 us	4.187 us	-	-	-	-
Round	4.183 us	0.0118 us	0.0111 us	4.181 us	4.165 us	4.201 us	-	-	-	-

After Intrinsic

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
Round	4.150 us	0.0064 us	0.0060 us	4.151 us	4.141 us	4.161 us	-	-	-	-
Round	4.164 us	0.0107 us	0.0100 us	4.166 us	4.146 us	4.179 us	-	-	-	-

Before Managed

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
Round	32.12 us	0.0903 us	0.0845 us	32.14 us	31.94 us	32.24 us	-	-	-	-
Round	25.03 us	0.1173 us	0.1040 us	25.02 us	24.81 us	25.20 us	-	-	-	-

After Managed

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
Round	17.29 us	0.0432 us	0.0404 us	17.27 us	17.23 us	17.36 us	-	-	-	-
Round	15.77 us	0.0150 us	0.0140 us	15.77 us	15.75 us	15.80 us	-	-	-	-

…intrinsic and managed form are deterministic.

tannergooding · 2019-07-26T17:46:02Z

CC. @danmosemsft.

We want to backport this to nc3.0 given it impacts determinism under tiering, correct?

tannergooding · 2019-07-26T17:46:19Z

CC. @EgorBo

danmoseley · 2019-07-26T17:59:06Z

We want to backport this to nc3.0 given it impacts determinism under tiering, correct?

The impact certainly meets the bar - numeric behavior changing behavior spontanously is a recipe for heisenbugs.

But the change seems large and potentially risky. Thoughts about that? Do we have enough coverage on different platforms?

Is there a way to run all CoreFX tests without tiering (ie compare a run with all "optimized JIT" and another run with all "quick JIT") to see whether any other behavior is different in the "no tiering" regime?

cc @jkotas @kouvel

danmoseley · 2019-07-26T17:59:58Z

Is there a more targeted, safer fix that could be done for 3.0 only? I don't care about the perf improvement for 3.0

danmoseley · 2019-07-26T18:02:59Z

Are there more tests we can usefully add (on top of testing with and without tiering)? That could also reduce risk

tannergooding · 2019-07-26T18:07:27Z

Is there a more targeted, safer fix that could be done for 3.0 only? I don't care about the perf improvement for 3.0

This is the targeted fix and is using a known good/existing implementation. It just happens to bring in a perf benefit at the same time.

Are there more tests we can usefully add (on top of testing with and without tiering)? That could also reduce risk

Tests have to be added to CoreFX, and I am working on getting those up.

tannergooding · 2019-07-26T18:09:14Z

-            // This shortcut is necessary to workaround precision loss in borderline cases on some platforms

-            if (x == (float)((int)x))
+            // This is based on the 'Berkeley SoftFloat Release 3e' algorithm


For reference: https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f32_roundToInt.c

The irrelevant support for the other rounding modes and error reporting was not included.
Code comments were added for readability, they do not exist in the reference.

tannergooding · 2019-07-26T18:09:39Z


-            // If the number has no fractional part do nothing
-            // This shortcut is necessary to workaround precision loss in borderline cases on some platforms
+            // This is based on the 'Berkeley SoftFloat Release 3e' algorithm


For reference: https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f64_roundToInt.c

The irrelevant support for the other rounding modes and error reporting was not included.
Code comments were added for readability, they do not exist in the reference.

It might be worth adding that note (i.e. that the other rounding modes from that algorithm are not included).

danmoseley · 2019-07-26T18:14:28Z

Also can you briefly educate me how tiering caused htis?

tannergooding · 2019-07-26T18:29:17Z

Also can you briefly educate me how tiering caused htis?

Under Tier 0, the JIT will just use the pre-jitted System.Math.Round and System.MathF.Round implementations. These implementations have a bug and do not always return the IEEE compliant result.

After a certain number of calls (30), the JIT will rejit these methods for the current hardware. On machines with SSE4.1 support (most machines in the past 12 years), the JIT will treat these methods as "intrinsic" and just directly emit the roundsd and roundss instruction instead of compiling the C# implementation. These instructions are IEEE compliant.

This difference can cause cold vs hot methods to return different results for some inputs (for example, as is the case with 0.499999969f).

tannergooding · 2019-07-26T18:30:22Z

This PR just updates the implementation to use a known good/correct implementation which should always return the correct result (and therefore the behavior should always match between the various tiers).

EgorBo · 2019-07-26T18:35:07Z

I found the problematic values using this script:

using System;
using System.Runtime.CompilerServices;

class Program
{
    static void Main()
    {
        Random rand = new Random();
        for (int i = 0; i < int.MaxValue; i++)
        {
            double rd = rand.NextDouble();
            int ri = rand.Next(int.MinValue, int.MaxValue);

            if (SSE41Round((float)rd) != ManagedRound((float)rd))
            {
                Console.WriteLine(rd);
            }

            if (SSE41Round((float)(rd * ri)) != ManagedRound((float)(rd * ri)))
            {
                Console.WriteLine(rd);
            }
        }

        Console.WriteLine("Done.");
    }

    // aka R2R
    [MethodImpl(MethodImplOptions.NoOptimization | MethodImplOptions.NoInlining)]
    static float ManagedRound(float x) => MathF.Round(x);

    // aka tier1 (or tier0 if mscorlib is not prejitted)
    [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.NoInlining)]
    static float SSE41Round(float x) => MathF.Round(x);
}

jkotas · 2019-07-26T18:46:56Z

Also can you briefly educate me how tiering caused htis?

This bug is about observable floating point behavior differences with optimizations on vs. off. Tiering switches between optimization off and on mid-flight that makes this type of bugs to be more severe.

This specific bug is not new. It has been in the product for a long time. The following program will produce difference results for dotnet run vs. dotnet run -c Release on .NET Core 2.1:

using System;

class Program
{
    static float x = 0.499999969f;

    static void Main(string[] args)
    {
        Console.WriteLine(MathF.Round(x));
    }
}

What is our overall confidence that floating point computation produce exact same results with optimizations on vs. optimizations off? If we are not confident in this, "safe fix" may be to disable tiering for any methods with floating point math.

EgorBo · 2019-07-26T19:57:55Z

@jkotas at least it couldn't happen during the same app session/process at some random point.
I was told hotspot folks had several similar interp/C1/C2 issues with FP methods.

I guess it makes sense to revise all XX.IsSupported paths for bugs like this one (because as far as I understand R2R contains only software fallbacks). Or as you said - always tier1 all methods with specific intrinsics inside.

UPD it looks like only System.Numerics.Matrix4x4 touches floating point intrinsics.

…n 0.5 and less than 1.0

tannergooding · 2019-07-26T20:21:54Z

What is our overall confidence that floating point computation produce exact same results with optimizations on vs. optimizations off? If we are not confident in this, "safe fix" may be to disable tiering for any methods with floating point math.

I created and ran the following program for the entire 32-bit floating point range. Aside from NaN differences (software preserves the exact NaN, hardware normalizes to a single NaN), the results are identical: https://gist.github.com/tannergooding/a55001ef69b6ec0746fc05468786cf80

tannergooding · 2019-07-26T20:23:12Z

(I also found a bug in the codegen for Sse41.Round*(Vector128<T>) and will be putting up a fix momentarily; this bug does not impact the 2 argument overloads).

tannergooding · 2019-07-26T20:28:07Z

Logged https://github.com/dotnet/coreclr/issues/25904 and opened #25905 for the aforementioned codegen issue.

danmoseley · 2019-07-28T20:56:29Z

So there are 3 implementations

roundsd and roundss (when not Tier0 if SSE4.1 available)
MathF/Math (otherwise)
FloatingPointUtils::round (used when evaluating constant expressions during code generation?)

tannergooding · 2019-07-29T02:09:57Z

So there are 3 implementations

Correct, although the second two are meant to be identical copies of each other (the former in C# and the latter in C++).

tannergooding · 2019-08-02T19:42:23Z

Could I get review/sign-off on this for .NET 5 at least, and then we can take it to tactics on Tuesday?

Or are you wanting a different change for release/3.0?

danmoseley · 2019-08-02T20:32:27Z

@jkotas is back Monday, I am interested in his take and someone on @dotnet/jit-contrib to help choose correct fix for 3.0 if any. If we want to do something different for 3.0, I suggest to put that smaller change in master as well, so it gets more coverage, and we can later put the better fix (if it is this) in master.

If we are not confident in this, "safe fix" may be to disable tiering for any methods with floating point math.

@kouvel thoughts on the above? It seems this is not a new bug, it is just not encountered easily without tiering.

tannergooding · 2019-08-02T20:44:25Z

The confidence is fairly high given:

I created and ran the following program for the entire 32-bit floating point range. Aside from NaN differences (software preserves the exact NaN, hardware normalizes to a single NaN), the results are identical: https://gist.github.com/tannergooding/a55001ef69b6ec0746fc05468786cf80

danmoseley · 2019-08-02T20:45:53Z

The confidence is fairly high given:

Point taken. @dotnet/jit-contrib ? On the numerics side, @GrabYourPitchforks any concern?

jkotas · 2019-08-03T06:04:02Z

This fix for Math.Round looks fine to me.

What is our overall confidence that floating point computation produce exact same results with optimizations on vs. optimizations off?

I meant floating point computation in general, not specific to Math.Round. For example, https://github.com/dotnet/coreclr/issues/20561 was a similar bug where floating point computation returned different results with optimizations on and off. How many more similar bugs are lurking there? What can we do to find them?

tannergooding · 2019-08-05T14:23:08Z

This fix for Math.Round looks fine to me.

Am I good to merge this and put up the backport to release/3.0 for tactics tomorrow then?

How many more similar bugs are lurking there? What can we do to find them?

There are still a few known IEEE bugs and they are all tracked by issues. I don't think we have any other known ones for Debug vs Release or Tiered vs non-Tiered right now.

As for finding them, I think all we can really do here is compare against known good implementations (the implementations that are slower but have been the "gold standard" for many years) and increase test coverage for known problematic values (not just the special values like positive/negative 1, 0, Infinity, and NaN; but also midpoint values, denormal values, min/max of the domain for the given function, etc).

danmoseley · 2019-08-05T14:32:57Z

I'm fine with merging and porting if this is signed off on

AndyAyersMS

Do we know when the original regression was first introduced?

CarolEidt · 2019-08-05T15:41:03Z


-            // If the number has no fractional part do nothing
-            // This shortcut is necessary to workaround precision loss in borderline cases on some platforms
+            // This is based on the 'Berkeley SoftFloat Release 3e' algorithm


It might be worth adding that note (i.e. that the other rounding modes from that algorithm are not included).

CarolEidt · 2019-08-05T15:48:09Z

Do we know when the original regression was first introduced?

I believe that, as discussed above, this is a long-standing bug, but the difference in behavior was not readily observable without tiering.

tannergooding · 2019-08-05T15:48:15Z

@AndyAyersMS, there would have been Debug vs Release differences since ~January of last year when support for generating roundss/roundsd was enabled: #14736

The managed implementation has been "incorrect" and not matched the IEEE compliant behavior for as far back as I can find.

Edit: As Carol called out, the tiering difference which impacts the same program is new in .NET Core 3.0.

tannergooding · 2019-08-05T21:54:45Z

It might be worth adding that note (i.e. that the other rounding modes from that algorithm are not included).

I'll add this in a follow up PR here.

…t so that the intrinsic and managed form are deterministic. (#26017) * Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic. (#25901) * Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic. * Fixing the Math.Round and MathF.Round handling for values greater than 0.5 and less than 1.0 * Applying formatting patch. * Adding a comment about only having the roundToNearestTiesToEven code path

…intrinsic and managed form are deterministic. (dotnet/coreclr#25901) * Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic. * Fixing the Math.Round and MathF.Round handling for values greater than 0.5 and less than 1.0 * Applying formatting patch. Commit migrated from dotnet/coreclr@c384e36

Updating Math.Round and MathF.Round to be IEEE compliant so that the …

e248d22

…intrinsic and managed form are deterministic.

tannergooding commented Jul 26, 2019

View reviewed changes

Fixing the Math.Round and MathF.Round handling for values greater tha…

1676e11

…n 0.5 and less than 1.0

tannergooding mentioned this pull request Jul 26, 2019

Ensure that we check for implicit ivals for single argument intrinsics before the SIMDScalar handling. #25905

Merged

Applying formatting patch.

7e81291

danmoseley requested review from AndyAyersMS and jkotas August 2, 2019 20:29

AndyAyersMS approved these changes Aug 5, 2019

View reviewed changes

CarolEidt approved these changes Aug 5, 2019

View reviewed changes

tannergooding merged commit c384e36 into dotnet:master Aug 5, 2019

tannergooding mentioned this pull request Aug 5, 2019

[release/3.0] Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic. #26017

Merged

Conversation

tannergooding commented Jul 26, 2019

Before Intrinsic

After Intrinsic

Before Managed

After Managed

Uh oh!

tannergooding commented Jul 26, 2019

Uh oh!

tannergooding commented Jul 26, 2019

Uh oh!

danmoseley commented Jul 26, 2019

Uh oh!

danmoseley commented Jul 26, 2019

Uh oh!

danmoseley commented Jul 26, 2019

Uh oh!

tannergooding commented Jul 26, 2019

Uh oh!

tannergooding Jul 26, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tannergooding Jul 26, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

CarolEidt Aug 5, 2019

Choose a reason for hiding this comment

Uh oh!

danmoseley commented Jul 26, 2019

Uh oh!

tannergooding commented Jul 26, 2019

Uh oh!

tannergooding commented Jul 26, 2019

Uh oh!

EgorBo commented Jul 26, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jkotas commented Jul 26, 2019

Uh oh!

EgorBo commented Jul 26, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tannergooding commented Jul 26, 2019

Uh oh!

tannergooding commented Jul 26, 2019

Uh oh!

tannergooding commented Jul 26, 2019

Uh oh!

danmoseley commented Jul 28, 2019

Uh oh!

tannergooding commented Jul 29, 2019

Uh oh!

tannergooding commented Aug 2, 2019

Uh oh!

danmoseley commented Aug 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tannergooding commented Aug 2, 2019

Uh oh!

danmoseley commented Aug 2, 2019

Uh oh!

jkotas commented Aug 3, 2019

Uh oh!

tannergooding commented Aug 5, 2019

Uh oh!

danmoseley commented Aug 5, 2019

Uh oh!

AndyAyersMS left a comment

Choose a reason for hiding this comment

Uh oh!

CarolEidt Aug 5, 2019

Choose a reason for hiding this comment

Uh oh!

CarolEidt commented Aug 5, 2019

Uh oh!

tannergooding commented Aug 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tannergooding Jul 26, 2019 •

edited

Loading

tannergooding Jul 26, 2019 •

edited

Loading

EgorBo commented Jul 26, 2019 •

edited

Loading

EgorBo commented Jul 26, 2019 •

edited

Loading

danmoseley commented Aug 2, 2019 •

edited

Loading

tannergooding commented Aug 5, 2019 •

edited

Loading