Skip to content

Begin replacing floatTraits pointer math with RealRep union#4336

Closed
tsbockman wants to merge 2 commits intodlang:masterfrom
tsbockman:float_union
Closed

Begin replacing floatTraits pointer math with RealRep union#4336
tsbockman wants to merge 2 commits intodlang:masterfrom
tsbockman:float_union

Conversation

@tsbockman
Copy link
Contributor

@tsbockman tsbockman commented May 18, 2016

Pros:

  • Prepare the way for CTFE floating-point math by switching from explicit pointer reinterpretation to a union.
  • union code is easier to read and understand.
  • union code is also easier to write, thus freeing mental energy for performance optimizations.
  • Consolidate tons of near-duplicate code-paths into (usually) two: one for ibmExtended, and one for everything else.
  • Adding support for new IEEE-style floating-point types (such as half or quad) gets easier as more of std.math is converted to use RealRep. (ucent would help here, though.)
  • By the time all of std.math has been converted, I expect that (in net) RealRep will have reduced the total line count substantially.
  • It may be possible to replace some of the inline assembler with RealRep code; the codegen seems to be very clean, for the most part.

Cons:

  • Many large diffs (starting with this one) will be required to complete the conversion.
  • std.math will become substantially more dependant upon inlining for good performance.

TODO: Make this publically available through std.bitmanip once the API
is stable and tested.
*/
union RealRep(N)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be at least package as long as it's not exposed publicly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That whole section was already marked package, starting on line 214.

@tsbockman
Copy link
Contributor Author

tsbockman commented May 20, 2016

I have now tuned this PR to produce the best assembly code that I can.

I optimized to minimize memory access, instruction count, branches, and embedded constant size, in about that order. (Note that the runtime code affected by this PR is all bitwise integer ops, so there are no slow, precision-destroying FPU ops to worry about.)

Here is a summary of the net change. -X ins means the union version is X instructions shorter. -Y mem means that it contains Y fewer memory accesses:

DMD64 DMD32 LDC64 LDC32 GDC64
isNaN:80 -2 ins -8 ins, -6 mem 0 0 0
isNaN:64 -5 ins, -2 mem -10 ins, -5 mem 0 0 +1 ins
isNaN:32 0 0 0 0 +2 ins
isFinite:80 -1 ins -1 ins 0 0 0
isFinite:64 0 -1 ins 0 0 -2 mem
isFinite:32 -1 ins -1 ins 0 0 -2 mem
isInfinity:80 -4 ins -14 ins, -6 mem 0 0 0
isInfinity:64 -1 ins 0 0 0 -1 ins
isInfinity:32 0 0 0 0 0
isNormal:80 -1 ins -1 ins +1 ins +1 ins -1 ins
isNormal:64 -1 ins -1 ins +1 ins +1 ins -1 ins, -2 mem
isNormal:32 0 -1 ins +1 ins +1 ins -1 ins, -2 mem
isSubnormal:80 0 0 0 -1 ins 0
isSubnormal:64 -1 ins, -1 mem +2 ins -4 ins -4 ins -1 ins, -3 mem
isSubnormal:32 +1 ins +1 ins 0 0 +3 ins
signbit:80 -1 ins 0 0 0 0
signbit:64 +1 ins 0 0 0 -2 mem
signbit:32 -1 ins 0 0 0 -1 ins, -2 mem
isIdentical:80 0 0 0 0 +4 ins, +4 mem
isIdentical:64 -15 ins, -12 mem -15 ins, -8 mem -13 ins, -10 mem -15 ins, -6 mem -13 ins, -10 mem
isIdentical:32 -14 ins, -10 mem -18 ins, -10 mem -13 ins, -10 mem -18 ins, -8 mem -13 ins, -10 mem
copysign:80 +2 ins +3 ins 0
copysign:64 +3 ins, +1 mem +7 ins, +5 mem +5 ins, -4 mem
copysign:32 +3 ins, +1 mem +2 ins +5 ins, -4 mem

Overall, I expect that the new union version is either the same speed, or faster compared to the old one. I do wish I could figure out why DMD doesn't like my copysign() implementation though, particularly for double on 32-bit.

EDIT: LDC uses the llvm_copysign() intrinsic for copysign(); that's why my version compared poorly. I have removed it from the comparison, since there is no need for the inferior RealRep version on that platform.

@tsbockman
Copy link
Contributor Author

Ping @9il and @ibuclaw . I know this is a lot to review, but can you guys at least take a first look and let me know if this is worth pursuing?

I want to start working on follow-up pull requests to convert the rest of the pointer-based code in std.math, but don't want to waste my time if my whole approach is unacceptable for some reason.

@tsbockman tsbockman force-pushed the float_union branch 2 times, most recently from 202d8ba to 8160b90 Compare May 31, 2016 13:01
@tsbockman
Copy link
Contributor Author

Converting std.math to use unions is a large project, and I don't want it hanging over my head indefinitely. Since no one will tell me whether this is wanted or not, I'm going to close it now.

I might reopen if someone asks me to, soon.

@tsbockman tsbockman closed this Jun 16, 2016
@WalterBright WalterBright reopened this Jun 17, 2016
@WalterBright
Copy link
Member

There is good work in here. Please do not close.

std/math.d Outdated
alias Mant;
+/

static if (N.mant_dig == 11 && N.max_exp == 0x6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use decimals for N.max_exp also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used hexadecimal for N.max_exp because for all other formats, N.max_exp == (1 << (expDig - 1)) and hexadecimal makes it easy to verify at a glance that the numbers are actually correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact... this one should be (1 << (expDig - 1)) == 0x10, also; I don't know where I got 0x6 from. I'll fix that now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough

expDig = 11;
alias Mant = ulong;
}
else static if (N.mant_dig == 53 && N.max_exp == 0x4000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@tsbockman
Copy link
Contributor Author

On the forum, Simen Kjærås questioned the need for a separate ibmExtended code path.

While I don't think it can be consolidated with the others, I am wondering if we should just drop it completely. What do you guys think, @ibuclaw and @kinke ? Is imbExtended something D needs to support, or should we mandate that all floating-point formats be IEEE-like?

@ibuclaw
Copy link
Member

ibuclaw commented May 17, 2017

How difficult a change would it be? If trivial then I'd say let's just deal with it when someone has the hardware (or emulator) to test.

@tsbockman
Copy link
Contributor Author

tsbockman commented May 17, 2017

How difficult a change would it be?

I don't really know how difficult adding ibmExtended support will be; it seems fairly easy so far, but there are a lot of functions in std.math that I haven't worked on yet. (I started with the easy stuff to keep this first PR small.)

Even if I succeed in writing code that is correct, I definitely can't optimize it much without access to a test environment.

(Removing the existing incomplete ibmExtended support is easy - I'll just add a static assert when that format is detected and delete all the special code paths for it outside of RealRep.)

@kinke
Copy link
Contributor

kinke commented May 17, 2017

From https://gcc.gnu.org/wiki/Ieee128PowerPC:

[...] the PowerPC compiler uses a software type known as IBM extended double type. The IBM extended double type is a pair of double values that give the user more bits of precision, but no additional range for the mantissa. All of the support for IBM extended double is done via software emulation (there are deprecated instructions to load and store a pair of floating point values, but current hardware no longer supports these instructions).

As D real is defined as the largest FP type in hardware (with LDC violating this for Windows/MSVC already) and I don't recall anyone requesting support for it, I'm all for getting rid of ibmExtended completely. I'd suggest adding a new commit removing it, so that it's still in the git history in case someone needs it in the future (interop with C libs like glibc compiled with -mlong-double-128 or so).

Nice PR btw (no time for a proper review though).

@kinke
Copy link
Contributor

kinke commented May 17, 2017

Btw such a FP union might come in handy already in druntime instead of Phobos; core.internal.convert could definitely use it.

@tsbockman
Copy link
Contributor Author

druntime

OK. I'll keep that in mind for the future.

Currently my plan is to keep RealRep private to the std.math package until the entire thing has been converted to use it. That way, the API design won't get frozen before it's been adequately tested.

We can look into publicising it and moving it to druntime or std.bitmanip later.

@WalterBright
Copy link
Member

Note that CTFE supports the special casts:

*cast(uint*)&float
*cast(uint*)&double

in order to support bit manipulation of floating point values. Changing these to unions will break CTFE support.

@tsbockman
Copy link
Contributor Author

@WalterBright I can manually lower the union code to an equivalent struct that uses those casts internally, if necessary.

Are those really the only supported casts, though? What about ulong*, ushort*, etc.? What about real?

@Biotronic
Copy link
Contributor

Those are the only supported casts. Hence, real is just a mess in CTFE. I actually started writing a patch for DMD to support the missing casts a year ago or so - maybe I should pick it up again and add a PR. Either that, or this PR is basically stranded until NewCTFE comes around.

@ibuclaw
Copy link
Member

ibuclaw commented May 18, 2017

When I moved type painting out of ctfe, I wrote it with the intention of it being a simple addition to paint basic and vector types to static array and vice versa [1].

Where all you need is to add the case for Tsarray, and do a loop passing the pointer adjusted buffer to encodeXXX or decodeXXX. Then allowing the cast in ctfe as being valid where Target.paintAsType is called from.

I had also intended to use this for union support too, but I never got round to it.

[1] https://github.com/dlang/dmd/blob/master/src/ddmd/target.d#L374

@tsbockman
Copy link
Contributor Author

@ibuclaw @Biotronic @WalterBright I have opened a DMD PR to extend the repainting capabilities as needed.

{
RealRep!double lo;
RealRep!double hi;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some testing, can safely say that this is wrong. The most significant part is always first, regardless of of endian-ness on ibmExtended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants