
Revert "std.math: proper support for 64-bit real in exp()" #3483

Closed
ibuclaw wants to merge 1 commit into master from revert-3285-exp

Conversation

@ibuclaw (Member) commented Jul 10, 2015

Reverts #3285

Because no one has responded to any of my complaints or concerns for over a month. @kinke @WalterBright @MartinNowak

@ibuclaw (Member Author):

To recap what I've already raised, for the third time:

This line is not using the RealFormat enum; it should be F.realFormat == RealFormat.ieeeExtended, and likewise the static if below (F.realFormat == RealFormat.ieeeDouble).

Contributor:

Don't you think performing these 2 changes in a proper commit would be just as much work as this revert+description?

@ibuclaw (Member Author):

I raised a PR (#3388) a month ago; no one responded.

@MartinNowak (Member)

Revert or fix (#3514), your decision.

@schveiguy (Member)

Is this PR still necessary? I don't understand the code enough to tell.

@kinke (Contributor) commented Nov 16, 2015

The RealFormat enum is now used due to #3514. I've run into these tests again recently while trying to enable LLVM math intrinsics for LDC. I'll re-run the tests when I'm home and see if the D version of std.math.exp(-0x1.64p+9L) for 64-bit reals on Win64 still yields the same result diverging from GDC (@ibuclaw: how exactly did you produce your result?).

@ibuclaw (Member Author) commented Nov 16, 2015

(@ibuclaw: how exactly did you produce your result?).

Compile with -mlong-double-64, any platform such as Linux x86 will work.

@kinke (Contributor) commented Nov 16, 2015

I still get 0x0.06f84920bb2d3p-1022 for std.math.exp(). core.stdc.math.exp() (VS 2015 runtime) on the other hand yields 0x0.06f84920bb2d4p-1022, i.e., Iain's result. Maybe it's just one of the many non-hex literals which gets parsed slightly differently by LDC (I think we use MS strtod() for that).
The LLVM intrinsic (most likely just using the MS C implementation) also yields ...4.

@ibuclaw (Member Author) commented Nov 16, 2015

@kinke - How does std.math.exp return doubles vs. core.stdc.math.exp? Are you forcing a different ABI for extern(D) in LLVM?

@ibuclaw (Member Author) commented Nov 16, 2015

BTW: The limits/values I have for real == double are:

enum real C1 = 6.93145751953125E-1;
enum real C2 = 1.42860682030941723212E-6;

enum real OF = 7.09782712893383996843E2;   // log(2 ^^ 1024)
enum real UF = -7.08396418532264106224E2;  // log(2 ^^ -1022)

The first two shouldn't make a difference, as the 80-bit literals should round correctly to them.
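For illustration, the two constants above look like a classic Cody-Waite split of ln 2: C1 is exactly representable in only 15 mantissa bits (so multiples of it can be subtracted from x without rounding error during argument reduction), and C2 carries the remainder. A minimal sketch checking that, assuming nothing beyond the constants quoted above:

```python
import math

# The two constants above: apparently a Cody-Waite split of ln 2.
C1 = 6.93145751953125E-1
C2 = 1.42860682030941723212E-6

# C1 is exactly 22713 / 2^15, i.e. representable in just 15 mantissa bits,
# so n*C1 is exact during argument reduction.
assert C1 * 32768 == 22713.0

# Together the pair approximates ln 2 to (beyond) double precision.
assert abs((C1 + C2) - math.log(2.0)) < 1e-15
print("C1 + C2 =", C1 + C2, " ln 2 =", math.log(2.0))
```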

@kinke (Contributor) commented Nov 16, 2015

The 2 latter literals make a difference for me - I get 0 when using them. ;)
UF is seriously different (-7.08396418532264106224E2 vs. -7.451332191019412076235E2). Seems like your limit doesn't allow subnormals at all.

I don't know how you handle the details in GDC with -mlong-double-64. For Win64, given Microsoft's lack of x87 support in their runtime (64-bit long double), we (LDC) opted to go for native 64-bit reals. So even the literals are parsed as 64-bit doubles by a Win64 LDC build (compiled by MSVC), and all real computations (incl. CTFE!) are obviously performed in 64 bits. So core.stdc.math.exp() and core.stdc.math.expl() are equivalent and yield a double. The only difference for real with extern(D) is the mangling.

@ibuclaw (Member Author) commented Nov 16, 2015

The 2 latter literals make a difference for me - I get 0 when using them. ;)
UF is seriously different (-7.08396418532264106224E2 vs. -7.451332191019412076235E2). Seems like your limit doesn't allow subnormals at all.

Yes, you are correct! That serves me right for assuming the documentation is correct. :-D

Actual limits for real == double:

enum real OF =  7.09782712893383996732E2;    // log(double.max)
enum real UF = -7.451332191019412076235E2;   // log(2**-1075)

Sorry about that.
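A minimal sketch of where these corrected limits come from, assuming standard IEEE binary64: OF is log(double.max), and UF is log(2^-1075), below which even the smallest subnormal (2^-1074) can no longer be reached after rounding:

```python
import math
import sys

# Overflow limit: exp(x) overflows a double once x exceeds log(double.max).
OF = math.log(sys.float_info.max)

# Underflow limit: log(2^-1075). Below it, even the smallest subnormal
# double (2^-1074) rounds away, so exp(x) == 0.
UF = -1075 * math.log(2.0)

assert abs(OF - 7.09782712893383996732E2) < 1e-9
assert abs(UF - -7.451332191019412076235E2) < 1e-9
assert math.exp(UF - 1.0) == 0.0   # hard underflow to zero

try:                               # past OF, exp() overflows
    math.exp(OF + 1.0)
    overflowed = False
except OverflowError:
    overflowed = True
assert overflowed
```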

@ibuclaw (Member Author) commented Nov 16, 2015

we (LDC) opted to go for native 64-bit reals. So even the literals are parsed as 64-bit doubles by a Win64 LDC build (compiled by MSVC), and all real computations (incl. CTFE!) are obviously performed in 64 bits.

Yes, I seem to have forgotten that we are talking about doubles. Well, it's been rare, but I have seen cases where returning off the stack does one thing and returning in registers does another. Just wanted to make sure I wasn't seeing the same thing again. :-)

@ibuclaw (Member Author) commented Nov 16, 2015

I don't know how you handle the details in GDC with -mlong-double-64.

Should be identical to using double. Perhaps a better or alternate platform to test would be ARM?

@kinke (Contributor) commented Nov 16, 2015

Yep, ARM results would be nice (for both GDC and LDC). But I'm pretty sure ...4 is correct, given that MS returns it too. I'm just wondering where the divergence comes from (still present with your actual constants).

@kinke (Contributor) commented Nov 16, 2015

Btw, what about improving feqrel() for subnormals?
In this case, we have a diff of 1 subnormal unit (2^-1022, min exponent), but as the step is from 3=011b to 4=100b, the 3 least significant mantissa bits differ. If the step was from 7=0111b to 8=1000b, the last 4 bits would differ, you get the idea.

We always use assert(feqrel(a, b) >= T.mant_dig - maxDifferingBits). Imho, a

bool feqrel(T)(T a, T b, int maxDifferingBits);

would be more convenient. And with that vital maxDifferingBits information, we could then treat subnormals differently. E.g., we could compute the absolute mantissa difference d and then simply return (d >> maxDifferingBits) == 0.
What do you think?

Edit: Oh feqrel() is currently not that bad for subnormals, i.e., it seems to do the above, but for some reason subtracts 1 from the intuitive result (double.mant_dig == 53):

feqrel(0x0.06f84920bb2d3p-1022, 0x0.06f84920bb2d4p-1022); //  1=    1b => returns 51
feqrel(0x0.06fffffffffffp-1022, 0x0.0700000000007p-1022); //  8= 1000b => returns 48
feqrel(0x0.06fffffffffffp-1022, 0x0.070000000000ep-1022); // 15= 1111b => returns 48
feqrel(0x0.06fffffffffffp-1022, 0x0.070000000000fp-1022); // 16=10000b => returns 47

This also means that the test would currently pass on Win64 LDC anyway when using 0x0.06f84920bb2d4p-1022L.
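The proposal and the observed behaviour above can be sketched in a few lines of Python. The helper names (feqrel_within, observed_feqrel_subnormal) are made up for illustration, and the trailing "- 1" merely models the puzzling extra bit feqrel() subtracts for subnormals, rather than anything specified:

```python
import struct

def dbits(x: float) -> int:
    """Raw IEEE-754 bit pattern of a double."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def feqrel_within(a: float, b: float, max_differing_bits: int) -> bool:
    """Sketch of the proposed predicate: for two same-sign subnormals the
    bit patterns are consecutive integers, so the absolute difference d of
    the patterns is the mantissa difference; accept iff d fits in
    max_differing_bits bits."""
    d = abs(dbits(a) - dbits(b))
    return (d >> max_differing_bits) == 0

def observed_feqrel_subnormal(a: float, b: float, mant_dig: int = 53) -> int:
    """Model of what feqrel() apparently returns for subnormal doubles:
    mant_dig minus the width of the mantissa difference, minus one extra
    bit (the '- 1' noted above)."""
    d = abs(dbits(a) - dbits(b))
    return mant_dig - d.bit_length() - 1

# Reproduce the four observations from the comment above:
f = float.fromhex
assert observed_feqrel_subnormal(f('0x0.06f84920bb2d3p-1022'), f('0x0.06f84920bb2d4p-1022')) == 51
assert observed_feqrel_subnormal(f('0x0.06fffffffffffp-1022'), f('0x0.0700000000007p-1022')) == 48
assert observed_feqrel_subnormal(f('0x0.06fffffffffffp-1022'), f('0x0.070000000000ep-1022')) == 48
assert observed_feqrel_subnormal(f('0x0.06fffffffffffp-1022'), f('0x0.070000000000fp-1022')) == 47
```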

@ibuclaw (Member Author) commented Nov 17, 2015

I'm just wondering where the divergence comes from (still present with your actual limits).

Yeah, just making sure that it's not a value that's out of bounds from what it is meant to handle.

I'll see if I have an ARM virtual machine lingering around...

@ibuclaw (Member Author) commented Nov 17, 2015

I needed to apply #3386

ARM passes the tests. Hmm...

@kinke (Contributor) commented Nov 17, 2015

ARM passes the tests.

Why not? If it returns ...4 instead of the expected ...3, feqrel() returns 51 = real.mant_dig - 2, and the assert isn't triggered.
Your pull #3386 will come in handy; I've simply disabled the IEEE flags checks for LDC and non-x87 reals. ;)

@ibuclaw (Member Author) commented Nov 18, 2015

Why not? If it returns ...4 instead of the expected ...3, feqrel() returns 51 = real.mant_dig - 2, and the assert isn't triggered.

The 'hmmm' was because on x86 using doubles it asserts. Using x86 with reals asserts for me too; I'll put together a full report table on the matter.

@ibuclaw (Member Author) commented Nov 18, 2015

After some more testing, it seems that -mlong-double-64 is fundamentally broken in my environment at the moment. Strange, because it wasn't before. :-)

Oh well, here are the feqrel results for ARM and x86/64. Just a point of note: rewriting real exp(real) into double exp(real) and running the unittests on x86 reveals the same precision as ARM.

Input                          ARM   ARM (libm)   x86/64   x86/64 (libm)
1.0                             52       53          64         64
0.5                             53       53          64         64
3.0                             52       52          63         64
0x1.1p+13 / 0x1.6p+9            53       53          52         52
0x1.7p+13 / 0x1.7p+9            53       53          64         64
0x1p+80                         53       53          64         64
real.infinity                   53       53          64         64
-0x1.18p+13 / -0x1.6p+9         53       53          51         51
-0x1.625p+13                   N/A      N/A          53         53
-0x1.62dafp+13 / -0x1.64p+9     51       51          54         54
-0x1.643p+13 / -0x1.743p+9      53       53          64         64
-0x1.645p+13 / -0x1.8p+9        53       53          64         64
-0x1p30                         53       53          64         64

@kinke (Contributor) commented Nov 18, 2015

So the only minor divergences of your ARM results from my Win64 results are for 1.0 and 3.0 inputs (involving the E constant) as well as that -0x1.64p+9. Your exp(1.0L) != E is particularly interesting.

re-writing real exp(real) into double exp(real) and running the unittests on x86 reveals the same precision as ARM.

How can that be? So you only changed the return type from real to double, casting the 80-bit result down to 64-bit? I'll create a table for LDC (x86 Win64 with 64-bit reals, and x86 Linux with 80-bit reals) too once I'm home. Also note that libm's expl() yields quite different results; see ldc-developers@07a11cf#commitcomment-14404644 (where I tried to enable the llvm.exp intrinsic, which seems to just forward to the libm C implementation).

Edit: My x86 libm results correspond almost perfectly to your x86 results. The only difference is for input 3.0, for which you get 63 and I got 64. So I guess you're not using the Phobos implementation (x87 inline assembly).

@ibuclaw (Member Author) commented Nov 18, 2015

casting the 80-bit result down to 64-bit?

No, by using the 64-bit branch explicitly.

@ibuclaw (Member Author) commented Nov 18, 2015

So I guess you're not using the Phobos implementation (x87 inline assembly).

Correct. GDC doesn't support DMD-style inline assembly. I'm testing the generic version, which shows that we can only guarantee double precision in the near overflow/underflow tests.

@kinke (Contributor) commented Nov 18, 2015

I'm pretty sure it's the other way around: the expected values for 80-bit reals seem crappy. If using DMD-style inline assembly, std.math.exp(real) is implemented as exp2(LOG2E*x), and exp2() uses x87 assembly... I'm currently gathering LDC results.
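The suspicion is easy to check: the product LOG2E*x is rounded to machine precision before exp2() ever sees it, so exp2(LOG2E*x) targets a slightly wrong argument no matter how exact exp2() itself is. A minimal sketch using the stdlib decimal module as a high-precision reference; LOG2E here is just 1/ln 2 in double precision standing in for the real constant, and the input is the contentious one from the tables:

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 40                 # plenty of guard digits

LOG2E = 1.0 / math.log(2.0)            # a double approximation of log2(e)
x = float.fromhex('-0x1.64p+9')        # the contentious input, -712.0

y = LOG2E * x                          # second rounding: information lost here
exact    = Decimal(x).exp()            # high-precision exp(x)
via_exp2 = Decimal(2) ** Decimal(y)    # the value exp2(LOG2E * x) aims for

# The rounded argument alone already shifts the target by ~1e-13 relative,
# so an exp() built on exp2(LOG2E * x) starts from a slightly wrong place.
rel_err = abs((via_exp2 - exact) / exact)
assert Decimal(0) < rel_err < Decimal('1e-10')
print("relative error of the exp2-based route:", rel_err)
```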

@dnadlinger (Contributor)

It's probably obvious to you anyway, but I'd not trust the x87 results at all; various functions are known to contain bugs. Arbitrary-precision libraries would probably be the better reference, and IIRC libm was quite good as well.

@kinke (Contributor) commented Nov 18, 2015

LDC results for std.math.exp(real x), feqrel() results in [brackets]:

1. x86, 64-bit real:

x             expected                   Generic D version               LLVM intrinsic (MSVC++)
0x1.000p+0    0x1.5bf0a8b145769p+1       0x1.5bf0a8b14576ap+1 [52]       0x1.5bf0a8b145769p+1 [53]
0x1.000p-1    0x1.a61298e1e069cp+0       0x1.a61298e1e069cp+0 [53]       0x1.a61298e1e069cp+0 [53]
0x1.800p+1    0x1.415e5bf6fb105p+4       0x1.415e5bf6fb106p+4 [52]       0x1.415e5bf6fb106p+4 [52]
0x1.600p+9    0x1.93bf4ec282efbp+1015    0x1.93bf4ec282efbp+1015 [53]    0x1.93bf4ec282efbp+1015 [53]
0x1.700p+9    inf                        inf [53]                        inf [53]
0x1.000p+80   inf                        inf [53]                        inf [53]
inf           inf                        inf [53]                        inf [53]
-0x1.600p+9   0x1.44a3824e5285fp-1016    0x1.44a3824e5285fp-1016 [53]    0x1.44a3824e5285fp-1016 [53]
-0x1.640p+9   0x0.06f84920bb2d3p-1022    0x0.06f84920bb2d3p-1022 [53]    0x0.06f84920bb2d4p-1022 [51]
-0x1.743p+9   0x0.0000000000001p-1022    0x0.0000000000001p-1022 [53]    0x0.0000000000001p-1022 [53]
-0x1.800p+9   0x0p+0                     0x0p+0 [53]                     0x0p+0 [53]
-0x1.000p+30  0x0p+0                     0x0p+0 [53]                     0x0p+0 [53]

2. x86, 80-bit real (the version using DMD-style inline assembly always returns the expected value):

x               expected                     Generic D version                 LLVM intrinsic (libm)
0x8p-3          0xa.df85458a2bb4a9bp-2       0xa.df85458a2bb4a9bp-2 [64]       0xa.df85458a2bb4a9bp-2 [64]
0x8p-4          0xd.3094c70f034de4cp-3       0xd.3094c70f034de4cp-3 [64]       0xd.3094c70f034de4cp-3 [64]
0xcp-2          0xa.0af2dfb7d882f97p+1       0xa.0af2dfb7d882f96p+1 [63]       0xa.0af2dfb7d882f97p+1 [64]
0x8.8p+10       0x9.4d77ff7e4763228p+12554   0x9.4d77ff7e47624b3p+12554 [52]   0x9.4d77ff7e47624b3p+12554 [52]
0xb.8p+10       inf                          inf [64]                          inf [64]
0x8p+77         inf                          inf [64]                          inf [64]
inf             inf                          inf [64]                          inf [64]
-0x8.cp+10      0xa.f25faa5a4036dc8p-12930   0xa.f25faa5a40381a7p-12930 [51]   0xa.f25faa5a40381a8p-12930 [51]
-0xb.128p+10    0xd.35eb451ce88f9aep-16361   0xd.35eb451ce88ff62p-16361 [53]   0xd.35eb451ce88ff62p-16361 [53]
-0xb.16d78p+10  0x6.5b14f4c09dc0874p-16385   0x6.5b14f4c09dc076ap-16385 [54]   0x6.5b14f4c09dc076ap-16385 [54]
-0xb.218p+10    0x0.000000000000002p-16385   0x0.000000000000002p-16385 [64]   0x0.000000000000002p-16385 [64]
-0xb.228p+10    0x0p+0                       0x0p+0 [64]                       0x0p+0 [64]
-0x8p+27        0x0p+0                       0x0p+0 [64]                       0x0p+0 [64]

@ibuclaw (Member Author) commented Nov 18, 2015

Updated my table with results for libm tests (slash GCC builtins).

@ibuclaw (Member Author) commented Nov 18, 2015

0x0.06f84920bb2d4p-1022 [51]

OK, so my ARM tests, libm, and llvm intrinsics all seem to agree on this. So shall we just alter the expected value to 0x0.06f84920bb2d4p-1022L - meaning that Win64 will be the odd one out?

@kinke (Contributor) commented Nov 18, 2015

Yep, perfectly fine with that.

But more importantly, what about the 80-bit reference results? There are two options:

  • As they are imprecise, allow for up to 13 diverging mantissa bits.
  • Use proper ones (libm/Wolfram), as all non-assembly versions pretty much agree on them as well. That means that exp() would need to be fixed, most likely by just eliminating the special case for x87 and using the generic version instead.

The latter will probably result in more failing assertions. And obviously require your pull for the IEEE flags.

@ibuclaw (Member Author) commented Nov 18, 2015

That means that exp() would need to be fixed, most likely by just eliminating the special case for x87 and using the generic version instead.

Well, what I read from it is that it's not clear how accurate the x87 test/expected pairs are. If libm disagrees with a divergence of up to 13 mantissa bits, maybe they are just plain wrong?

@kinke (Contributor) commented Nov 18, 2015

maybe they are just plain wrong?

Of course they are. I bet they've just been obtained by running the current implementation, which multiplies the value and feeds exp2() with it, which can only be an approximation.
Ironically, Walter wouldn't let me use a limit of 13 bits in my original pull (as preparation for LLVM intrinsics) because he likes to keep the tests tight. If only they were testing for correct results ;) - but I didn't go for the underlying issue back then.

@ibuclaw (Member Author) commented Nov 19, 2015

Yep, perfectly fine with that.

Actually, what's the result of feqrel with 0x0.06f84920bb2d4p-1022 on Win64? If it's less than 51, then it might not be worth it.

@kinke (Contributor) commented Nov 19, 2015

I've replied to your new pull. Given the diff of one subnormal unit, feqrel() returns real.mant_dig - 2 = 51 (feqrel() subtracts an additional bit for subnormals, no idea why).

@ibuclaw (Member Author) commented Nov 19, 2015

I have raised a new bug: https://issues.dlang.org/show_bug.cgi?id=15365

So I guess this is good for closure.

@ibuclaw ibuclaw closed this Nov 19, 2015
@ibuclaw ibuclaw deleted the revert-3285-exp branch November 19, 2015 20:48