Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic.#25901

Merged
tannergooding merged 3 commits into
dotnet:masterfrom
tannergooding:fix-25857
Aug 5, 2019
Merged

Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic.#25901
tannergooding merged 3 commits into
dotnet:masterfrom
tannergooding:fix-25857

Conversation

@tannergooding
Copy link
Copy Markdown
Member

This resolves https://github.com/dotnet/coreclr/issues/25857 and also brings in a nice little perf increase for the managed implementation.

Before Intrinsic

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
Round 4.169 us 0.0134 us 0.0119 us 4.168 us 4.153 us 4.187 us - - - -
Round 4.183 us 0.0118 us 0.0111 us 4.181 us 4.165 us 4.201 us - - - -

After Intrinsic

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
Round 4.150 us 0.0064 us 0.0060 us 4.151 us 4.141 us 4.161 us - - - -
Round 4.164 us 0.0107 us 0.0100 us 4.166 us 4.146 us 4.179 us - - - -

Before Managed

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
Round 32.12 us 0.0903 us 0.0845 us 32.14 us 31.94 us 32.24 us - - - -
Round 25.03 us 0.1173 us 0.1040 us 25.02 us 24.81 us 25.20 us - - - -

After Managed

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
Round 17.29 us 0.0432 us 0.0404 us 17.27 us 17.23 us 17.36 us - - - -
Round 15.77 us 0.0150 us 0.0140 us 15.77 us 15.75 us 15.80 us - - - -

…intrinsic and managed form are deterministic.
@tannergooding
Copy link
Copy Markdown
Member Author

CC. @danmosemsft.

We want to backport this to nc3.0 given it impacts determinism under tiering, correct?

@tannergooding
Copy link
Copy Markdown
Member Author

CC. @EgorBo

@danmoseley
Copy link
Copy Markdown
Member

We want to backport this to nc3.0 given it impacts determinism under tiering, correct?

The impact certainly meets the bar - numeric behavior changing behavior spontanously is a recipe for heisenbugs.

But the change seems large and potentially risky. Thoughts about that? Do we have enough coverage on different platforms?

Is there a way to run all CoreFX tests without tiering (ie compare a run with all "optimized JIT" and another run with all "quick JIT") to see whether any other behavior is different in the "no tiering" regime?

cc @jkotas @kouvel

@danmoseley
Copy link
Copy Markdown
Member

Is there a more targeted, safer fix that could be done for 3.0 only? I don't care about the perf improvement for 3.0

@danmoseley
Copy link
Copy Markdown
Member

Are there more tests we can usefully add (on top of testing with and without tiering)? That could also reduce risk

@tannergooding
Copy link
Copy Markdown
Member Author

Is there a more targeted, safer fix that could be done for 3.0 only? I don't care about the perf improvement for 3.0

This is the targeted fix and is using a known good/existing implementation. It just happens to bring in a perf benefit at the same time.

Are there more tests we can usefully add (on top of testing with and without tiering)? That could also reduce risk

Tests have to be added to CoreFX, and I am working on getting those up.

// This shortcut is necessary to workaround precision loss in borderline cases on some platforms

if (x == (float)((int)x))
// This is based on the 'Berkeley SoftFloat Release 3e' algorithm
Copy link
Copy Markdown
Member Author

@tannergooding tannergooding Jul 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference: https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f32_roundToInt.c

The irrelevant support for the other rounding modes and error reporting was not included.
Code comments were added for readability, they do not exist in the reference.


// If the number has no fractional part do nothing
// This shortcut is necessary to workaround precision loss in borderline cases on some platforms
// This is based on the 'Berkeley SoftFloat Release 3e' algorithm
Copy link
Copy Markdown
Member Author

@tannergooding tannergooding Jul 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference: https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f64_roundToInt.c

The irrelevant support for the other rounding modes and error reporting was not included.
Code comments were added for readability, they do not exist in the reference.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding that note (i.e. that the other rounding modes from that algorithm are not included).

@danmoseley
Copy link
Copy Markdown
Member

Also can you briefly educate me how tiering caused htis?

@tannergooding
Copy link
Copy Markdown
Member Author

Also can you briefly educate me how tiering caused htis?

Under Tier 0, the JIT will just use the pre-jitted System.Math.Round and System.MathF.Round implementations. These implementations have a bug and do not always return the IEEE compliant result.

After a certain number of calls (30), the JIT will rejit these methods for the current hardware. On machines with SSE4.1 support (most machines in the past 12 years), the JIT will treat these methods as "intrinsic" and just directly emit the roundsd and roundss instruction instead of compiling the C# implementation. These instructions are IEEE compliant.

This difference can cause cold vs hot methods to return different results for some inputs (for example, as is the case with 0.499999969f).

@tannergooding
Copy link
Copy Markdown
Member Author

This PR just updates the implementation to use a known good/correct implementation which should always return the correct result (and therefore the behavior should always match between the various tiers).

@EgorBo
Copy link
Copy Markdown
Member

EgorBo commented Jul 26, 2019

I found the problematic values using this script:

using System;
using System.Runtime.CompilerServices;

class Program
{
    static void Main()
    {
        Random rand = new Random();
        for (int i = 0; i < int.MaxValue; i++)
        {
            double rd = rand.NextDouble();
            int ri = rand.Next(int.MinValue, int.MaxValue);

            if (SSE41Round((float)rd) != ManagedRound((float)rd))
            {
                Console.WriteLine(rd);
            }

            if (SSE41Round((float)(rd * ri)) != ManagedRound((float)(rd * ri)))
            {
                Console.WriteLine(rd);
            }
        }

        Console.WriteLine("Done.");
    }

    // aka R2R
    [MethodImpl(MethodImplOptions.NoOptimization | MethodImplOptions.NoInlining)]
    static float ManagedRound(float x) => MathF.Round(x);

    // aka tier1 (or tier0 if mscorlib is not prejitted)
    [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.NoInlining)]
    static float SSE41Round(float x) => MathF.Round(x);
}

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Jul 26, 2019

Also can you briefly educate me how tiering caused htis?

This bug is about observable floating point behavior differences with optimizations on vs. off. Tiering switches between optimization off and on mid-flight that makes this type of bugs to be more severe.

This specific bug is not new. It has been in the product for a long time. The following program will produce difference results for dotnet run vs. dotnet run -c Release on .NET Core 2.1:

using System;

class Program
{
    static float x = 0.499999969f;

    static void Main(string[] args)
    {
        Console.WriteLine(MathF.Round(x));
    }
}

What is our overall confidence that floating point computation produce exact same results with optimizations on vs. optimizations off? If we are not confident in this, "safe fix" may be to disable tiering for any methods with floating point math.

@EgorBo
Copy link
Copy Markdown
Member

EgorBo commented Jul 26, 2019

@jkotas at least it couldn't happen during the same app session/process at some random point.
I was told hotspot folks had several similar interp/C1/C2 issues with FP methods.

I guess it makes sense to revise all XX.IsSupported paths for bugs like this one (because as far as I understand R2R contains only software fallbacks). Or as you said - always tier1 all methods with specific intrinsics inside.

UPD it looks like only System.Numerics.Matrix4x4 touches floating point intrinsics.

@tannergooding
Copy link
Copy Markdown
Member Author

What is our overall confidence that floating point computation produce exact same results with optimizations on vs. optimizations off? If we are not confident in this, "safe fix" may be to disable tiering for any methods with floating point math.

I created and ran the following program for the entire 32-bit floating point range. Aside from NaN differences (software preserves the exact NaN, hardware normalizes to a single NaN), the results are identical: https://gist.github.com/tannergooding/a55001ef69b6ec0746fc05468786cf80

@tannergooding
Copy link
Copy Markdown
Member Author

(I also found a bug in the codegen for Sse41.Round*(Vector128<T>) and will be putting up a fix momentarily; this bug does not impact the 2 argument overloads).

@tannergooding
Copy link
Copy Markdown
Member Author

Logged https://github.com/dotnet/coreclr/issues/25904 and opened #25905 for the aforementioned codegen issue.

@danmoseley
Copy link
Copy Markdown
Member

So there are 3 implementations

  1. roundsd and roundss (when not Tier0 if SSE4.1 available)
  2. MathF/Math (otherwise)
  3. FloatingPointUtils::round (used when evaluating constant expressions during code generation?)

@tannergooding
Copy link
Copy Markdown
Member Author

So there are 3 implementations

Correct, although the second two are meant to be identical copies of each other (the former in C# and the latter in C++).

@tannergooding
Copy link
Copy Markdown
Member Author

Could I get review/sign-off on this for .NET 5 at least, and then we can take it to tactics on Tuesday?

Or are you wanting a different change for release/3.0?

@danmoseley
Copy link
Copy Markdown
Member

danmoseley commented Aug 2, 2019

@jkotas is back Monday, I am interested in his take and someone on @dotnet/jit-contrib to help choose correct fix for 3.0 if any. If we want to do something different for 3.0, I suggest to put that smaller change in master as well, so it gets more coverage, and we can later put the better fix (if it is this) in master.

If we are not confident in this, "safe fix" may be to disable tiering for any methods with floating point math.

@kouvel thoughts on the above? It seems this is not a new bug, it is just not encountered easily without tiering.

@tannergooding
Copy link
Copy Markdown
Member Author

The confidence is fairly high given:

I created and ran the following program for the entire 32-bit floating point range. Aside from NaN differences (software preserves the exact NaN, hardware normalizes to a single NaN), the results are identical: https://gist.github.com/tannergooding/a55001ef69b6ec0746fc05468786cf80

@danmoseley
Copy link
Copy Markdown
Member

The confidence is fairly high given:

Point taken. @dotnet/jit-contrib ? On the numerics side, @GrabYourPitchforks any concern?

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Aug 3, 2019

This fix for Math.Round looks fine to me.

What is our overall confidence that floating point computation produce exact same results with optimizations on vs. optimizations off?

I meant floating point computation in general, not specific to Math.Round. For example, https://github.com/dotnet/coreclr/issues/20561 was a similar bug where floating point computation returned different results with optimizations on and off. How many more similar bugs are lurking there? What can we do to find them?

@tannergooding
Copy link
Copy Markdown
Member Author

This fix for Math.Round looks fine to me.

Am I good to merge this and put up the backport to release/3.0 for tactics tomorrow then?

How many more similar bugs are lurking there? What can we do to find them?

There are still a few known IEEE bugs and they are all tracked by issues. I don't think we have any other known ones for Debug vs Release or Tiered vs non-Tiered right now.

As for finding them, I think all we can really do here is compare against known good implementations (the implementations that are slower but have been the "gold standard" for many years) and increase test coverage for known problematic values (not just the special values like positive/negative 1, 0, Infinity, and NaN; but also midpoint values, denormal values, min/max of the domain for the given function, etc).

@danmoseley
Copy link
Copy Markdown
Member

I'm fine with merging and porting if this is signed off on

Copy link
Copy Markdown
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know when the original regression was first introduced?


// If the number has no fractional part do nothing
// This shortcut is necessary to workaround precision loss in borderline cases on some platforms
// This is based on the 'Berkeley SoftFloat Release 3e' algorithm
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding that note (i.e. that the other rounding modes from that algorithm are not included).

@CarolEidt
Copy link
Copy Markdown

Do we know when the original regression was first introduced?

I believe that, as discussed above, this is a long-standing bug, but the difference in behavior was not readily observable without tiering.

@tannergooding
Copy link
Copy Markdown
Member Author

tannergooding commented Aug 5, 2019

@AndyAyersMS, there would have been Debug vs Release differences since ~January of last year when support for generating roundss/roundsd was enabled: #14736

The managed implementation has been "incorrect" and not matched the IEEE compliant behavior for as far back as I can find.

Edit: As Carol called out, the tiering difference which impacts the same program is new in .NET Core 3.0.

@tannergooding tannergooding merged commit c384e36 into dotnet:master Aug 5, 2019
@tannergooding
Copy link
Copy Markdown
Member Author

It might be worth adding that note (i.e. that the other rounding modes from that algorithm are not included).

I'll add this in a follow up PR here.

wtgodbe pushed a commit that referenced this pull request Aug 7, 2019
…t so that the intrinsic and managed form are deterministic. (#26017)

* Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic. (#25901)

* Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic.

* Fixing the Math.Round and MathF.Round handling for values greater than 0.5 and less than 1.0

* Applying formatting patch.

* Adding a comment about only having the roundToNearestTiesToEven code path
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…intrinsic and managed form are deterministic. (dotnet/coreclr#25901)

* Updating Math.Round and MathF.Round to be IEEE compliant so that the intrinsic and managed form are deterministic.

* Fixing the Math.Round and MathF.Round handling for values greater than 0.5 and less than 1.0

* Applying formatting patch.


Commit migrated from dotnet/coreclr@c384e36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Math.Round returns different values for the same input.

6 participants