Generalize CTFE float/int repainting to support many more types.#6811

Closed
tsbockman wants to merge 1 commit into dlang:master from tsbockman:ctfe_real

Conversation

@tsbockman
Contributor

This enhances paintFloatInt() to support repainting between any combination of byte/ubyte/short/ushort/int/uint/long/ulong/float/double/real, as well as small static arrays derived from any of those types (such as ushort[5]).

Repainting is permitted as long as the from type has a size equal to or greater than the to type. 80-bit real is a special case: the padding area does not count when casting to it, but does when casting from it.

I believe this PR provides the capabilities needed to get all of std.math working at CTFE using the same code path as runtime (aside from inline assembler), albeit in a rather ugly fashion.

TODO: Mostly tests.

@tsbockman
Contributor Author

tsbockman commented May 19, 2017

Failing tests:

I fixed this first one by returning false from isFloatIntPaint() in the trivial case where both element types have the same size and kind (fromEle.isintegral() == toEle.isintegral() && fromEle.size() == toEle.size()). But I don't understand the compiler well enough to work out why it occurred in the first place, so reviewers take note:

compilable/interpret3.d(1709): Error: assert(arr[0] == 3) failed
compilable/interpret3.d(1719):        called from here: test10243()
compilable/interpret3.d(1719):        while evaluating: `static assert(test10243())`
compilable/interpret3.d(2165): Error: *& a = 15 cannot be evaluated at compile time
compilable/interpret3.d(2169):        called from here: (*function () => true)()
compilable/interpret3.d(2163):        while evaluating: `static assert((*function () => true)())`

This test (fail_compilation/ctfe14207.d) also "fails", but I'm not sure why we need it?

/*
TEST_OUTPUT:
---
fail_compilation/ctfe14207.d(13): Error: cannot convert &immutable(ulong) to ubyte[8]* at compile time
fail_compilation/ctfe14207.d(18):        called from here: nativeToBigEndian()
fail_compilation/ctfe14207.d(22):        called from here: digest()
---
*/

ubyte[8] nativeToBigEndian()
{
    immutable ulong res = 1;
    return *cast(ubyte[8]*) &res;
}

auto digest()
{
    ubyte[8] bits = nativeToBigEndian();
    return bits;
}

enum h = digest();

I know that repainting needs to handle endian-ness properly, but it's not difficult to do so provided that the endian-ness of both the host and the target are known. What predicate should I fill in here?

        // If host != target endian-ness, swap the byte order for bp[0 .. eleSize] here.
        if ( ??? )
        {

@jpf91
Contributor

jpf91 commented May 19, 2017

version(BigEndian)
    enum hostBigEndian = true;
else
    enum hostBigEndian = false;

// If host != target endian-ness, swap the byte order for bp[0 .. eleSize] here.
if (hostBigEndian != target.bigEndian)

I think the target endianness information is not exposed to the frontend currently. You'll either have to extend target.d, or maybe you can use the encodeInteger/decodeInteger functions. (DMD is currently hardcoded for little-endian targets, so you can simply return a hardcoded value in target.d for DMD. LDC and GDC can properly determine the target endianness.)

Member

@ibuclaw ibuclaw left a comment

Seems OK on the whole.

extern (C++) bool isFloatIntPaint(Type from, Type to)
{
return from.size() == to.size() && (from.isintegral() && to.isfloating() || from.isfloating() && to.isintegral());
const toUsed = (to.ty == Tfloat80) ? 80/8 : to.size();
Member

Remove special case for reals.

Member

Or maybe you are trying to do Target.realsize - Target.realpad here...

Contributor Author

Is there a Treal ENUMTY value I can use instead of Tfloat80 then?

Because as long as it's checking against Tfloat80, it ought to say 80/8, not Target.realsize - Target.realpad: there is no guarantee that real will always be Tfloat80.

Member

I know it's an unfortunate name, but Tfloat80 is treated as the largest supported floating-point type in gdc and ldc. Tfloat80 can even be a double!

Contributor Author

@tsbockman tsbockman May 19, 2017

Gross! But thanks for explaining; I'll fix my code...

return false;

Type fromEle = (from.ty == Tsarray) ? (cast(TypeArray) from).next : from;
Type toEle = (to.ty == Tsarray) ? (cast(TypeArray) to).next : to;
Member

fromElem/toElem? Unless the latter clashes with a backend function, in which case I think another appropriate name would be tbfrom/tbto.

Contributor Author

What does "tb" stand for?

Member

tb is the result of running t.toBasetype()

// Write the integer value of 'e' into an unsigned byte buffer.
extern (C++) static void encodeInteger(Expression e, ubyte* buffer)
// Write the bit patterns representing the value of `e` into `buffer`.
extern (C++) void paintEncode(Expression e, ubyte* buffer)
Member

This can be made private, remove the extern(C++) too.

assert (e.op == TOK.TOKarrayliteral);
arr = cast(ArrayLiteralExp) e;
aLen = cast(int) arr.elements.dim;
eleSize = cast(int) (cast(TypeArray) arr.type).next.size();
Member

Same as before, using ele looks strange to me.

break;
case Tfloat80:
*(cast(real*) bp) = cast(real) value;
bp[80/8 .. eleSize] = 0; // Clear the padding area for consistency.
Member

Target.realsize - Target.realpad

bp[80/8 .. eleSize] = 0; // Clear the padding area for consistency.
break;
default:
assert (0);
Member

Make it a final switch then?

Contributor Author

OK.

{
const last = eleSize - 1;
for (size_t x = 0; x <= last; ++x)
bp[x] = bp[last - x];
Member

I know this is inside a static if (false), but you probably want to use swap.

break;
const last = eleSize - 1;
for (size_t x = 0; x <= last; ++x)
bp[x] = bp[last - x];
Member

Or maybe my thoughts about endianness don't apply here, as dmd doesn't internally represent data as the target sees it.

@ibuclaw
Member

ibuclaw commented May 19, 2017

Other notes: Missing test cases, should also check the behaviour of type punning a zero sized array.

@ibuclaw
Member

ibuclaw commented May 19, 2017

I know that repainting needs to handle endian-ness properly, but it's not difficult to do so provided that the endian-ness of both the host and the target are known. What predicate should I fill in here?

I probably wouldn't worry about it, dmd doesn't pretend to support mixed endianness anyway. I'd put it in the later queue as other compilers should already be handling this.

@ibuclaw
Member

ibuclaw commented May 19, 2017

However the testsuite should be endianness aware.

@UplinkCoder
Member

@tsbockman in general I dislike it when people extend the feature set of ctfe...

Also, the implementation needs to be cleaned up.

@ibuclaw
Member

ibuclaw commented May 19, 2017

in general I dislike it when people extend the feature set of ctfe...

Why? Because that means you need to extend your feature set? 😉

@UplinkCoder
Member

Why? Because that means you need to extend your feature set? 😉

Exactly.

@UplinkCoder
Member

@tsbockman don't support Treal at all.
Keep it to well-defined IEEE types and that ought to be it.

@tsbockman
Contributor Author

tsbockman commented May 19, 2017

don't support Treal at all.

std.math makes heavy use of real. Practically disallowing its use in CTFE is a severe restriction.

Also, real already works in CTFE, in general - it's just repainting that doesn't. So, it's not like you can just leave it out, anyway; there's lots of CTFE code out there that uses real.

@WalterBright
Member

This test (fail_compilation/ctfe14207.d) also "fails", but I'm not sure why we need it?

The 14207 indicates it came from fixing https://issues.dlang.org/show_bug.cgi?id=14207 which gives a use case.

@tsbockman
Contributor Author

https://issues.dlang.org/show_bug.cgi?id=14207 which gives a use case.

Given that the test case works at run time, I see no reason that it shouldn't be made to work at compile time, as well; I'll replace that fail_compilation test with a normal one to verify a correct result, instead.

@ibuclaw
Member

ibuclaw commented May 19, 2017

The 14207 indicates it came from fixing https://issues.dlang.org/show_bug.cgi?id=14207 which gives a use case.

Looks like ICE -> Error to me. This PR makes it so we go from Error -> OK.

@tsbockman
Contributor Author

It looks like everything is passing now. I still need to add a bunch more tests before this is ready to merge, though.

@tsbockman tsbockman force-pushed the ctfe_real branch 2 times, most recently from f6689ad to 081643a Compare May 19, 2017 19:56
@yebblies
Contributor

This really is ugly. If we're just doing this so that we can get std.math working, why not do it in a much less ugly way?

Is there anything there that can't be done with a few intrinsics instead of pointer casting? eg getFloatExponent could do the masking and shifting under the hood, and will encourage the math code to be trivially portable. This also avoids the special-casing of 80-bit reals.

I've always seen ctfe preventing pointer casting and union reinterpretation as being a feature, not a bug.

@ibuclaw
Member

ibuclaw commented May 21, 2017

Is there anything there that can't be done with a few intrinsics instead of pointer casting?

Bit setting may or may not be more awkward, see for instance std.math.floor() or ceil().

In any case, the implementation under the hood would still remain the same. Whether the user code is close to it or far from it doesn't really matter.

I have no objection to adding new properties to provide accessors; someone may request a DIP, though.

@yebblies
Contributor

TBH it looks like floor would be a lot simpler with intrinsics. I agree with @UplinkCoder about increasing the ctfe feature set too, unless this has much broader use I'm not aware of.

Most similar stuff can be done in ctfe without reinterpretation, because you can reliably extract bits from integers. Intrinsics to extract bits from floats would IIUC complete this to the point that hashing and other low-level manipulations can be done in a clean and portable way.

This would also mean you could write low-level floating point code without ever having to care about padding. And unlike casting/unions, the intrinsics could be overloaded to provide the same interface for library float types.

@tsbockman What do you think about this?

@PetarKirov
Member

I agree with @UplinkCoder about increasing the ctfe feature set too...

Intrinsics also increase the feature set, but do so by inventing a different language dialect incompatible with the rest of the language. That is, they make writing generic code harder because you would need different code for CTFE and runtime - the same mistake that C++ makes over and over again. Having a clean language that allows code reuse is much more important than inventing new intrinsics for every single use case. Type repainting is a key part of every systems programmer's toolbox. Let's not add more hacks to the language.

@yebblies
Contributor

That is, they make writing generic code harder because you would need different code for CTFE and runtime

No, because the intrinsics would work at runtime. There is nothing clean about the current use of unions, take a look at the code in std.math.

@tsbockman
Contributor Author

@yebblies Most of the benefits which you claim intrinsics would have (portability, cleaner code, etc.) over unions or reinterpret casts really just come from consolidating the union logic into one place.

That one place can be a type in phobos or druntime, or it can be some intrinsic functions implemented in the compiler. Either way, it makes no difference as far as portability, correctness, and code simplification goes.

As to the CTFE feature set - the arbitrary, undocumented restrictions placed upon CTFE code are a frequent source of frustration for me, especially since (last time I checked) if (__ctfe) breaks inlining in DMD.

There is nothing clean about the current use of unions, take a look at the code in std.math.

std.math is a mess because the union logic is spread out all over the module. Consolidating it in one place behind a sensible API will fix this. That's the point of my in-progress rewrite.

@yebblies
Contributor

@tsbockman

That one place can be a type in phobos or druntime, or it can be some intrinsic functions implemented in the compiler. Either way, it makes no difference as far as portability, correctness, and code simplification goes.

It makes a difference to the complexity of the supported feature set in CTFE. Adding this feature does not make it easier to specify the exact feature set either.

It sounds like you're saying an intrinsic-based API would work for your purpose as long as it works at runtime and compile time. Or even that you're already designing such an API, and it just needs to be moved to druntime and added to the builtins list. Am I misunderstanding?

@tsbockman
Contributor Author

tsbockman commented May 23, 2017

Adding this feature does not make it easier to specify the exact feature set either.

The restrictions themselves are the main problem; the lack of documentation is just salt in the wound. Expanding the CTFE feature set in the direction of the run time feature set lessens the need for separate documentation.

It sounds like you're saying an intrinsic based api would work for your purpose as long as it works at runtime and compile time. Our even that you're already designing such an api, and it just needs to be moved to druntime and added to the builtins list. Am i misunderstanding?

An appropriate intrinsic API would look similar to my RealRep union type, if it were chopped apart into a bunch of stand-alone functions. However, I haven't finalized the design because I figure the best way to design something that really does what's needed (including performance) is to keep it private until all of std.math has been rewritten to use it, revising the API as I go.

I wouldn't want to commit to a specific list of intrinsics right now, because while it's easy to write a list that is functionally complete, it's harder to figure out which set gives optimal performance.

@tsbockman
Contributor Author

With further testing, I see that (with or without this PR) paintFloatInt() only works properly for rvalues.

Is there a reasonable way to extend this to work with lvalues, too? If not, I might just close this...

@ibuclaw
Member

ibuclaw commented May 23, 2017

Exactly.

If you can't do constant propagation, and that includes pointer offset and devirtualization at compile time, then I would consider that a bug in ctfe. You should be able to support everything that a compiler optimizer is able to do, and then some.

@ibuclaw
Member

ibuclaw commented May 23, 2017

Is there a reasonable way to extend this to work with lvalues, too? If not, I might just close this...

I would consider restricting it to rvalues reasonable for the time being. Lvalues would, in the worst case, just require double painting.

*cast(int*) & f = 42;
(*cast(byte[4]*) & f)[0] = 1;

It's easier just to let the user do the rewrite as:

int i = 42;
f = *cast(float*) & i;

byte[4] b = *cast(byte[4]*) & f;
b[0] = 1;
f = *cast(float*) & b;

Instead of thrusting this onto the compiler. At least that is my opinion until we perhaps switch to better CTFE technologies.

@tsbockman
Contributor Author

It's easier just to let the user do the rewrite

I thought about that, but that's only a viable option if we can demonstrate that it generates the same run time code. In particular, I am concerned about this case:

real r = 1.5;
ushort[5] u = *(cast(ushort[5]*) &r);
u[4] = u[4] | 0x8000;
r = *(cast(real*) &u);
assert(r == -1.5);

Given that ushort[5] does not fit in a single register, I really hope that the optimizer is smart enough to eliminate the unnecessary load and store of u[0 .. 4].

I'll investigate.

@tsbockman
Contributor Author

I really hope that the optimizer is smart enough to eliminate the unnecessary load and store of u[0 .. 4].

It's not:

real makeNegCT(real x)
{
  ushort[5] u = *(cast(ushort[5]*) &x);
  u[4] = u[4] | 0x8000;
  x = *(cast(real*) &u);
  return x;
}

real makeNegRT(real x)
{
  (cast(ushort*) &x)[4] |= 0x8000;
  return x;
}

makeNegRT() generates the desired code,

real makeNegRT(real):
 push   rax
 or     DWORD PTR [rsp+0x18],0x8000
 
 fld    TBYTE PTR [rsp+0x10]
 pop    rcx
 ret    
 nop    

But makeNegCT() does not:

real makeNegCT(real):
 sub    rsp,0x18
 lea    rsi,[rsp+0x20]
 lea    rdi,[rsp+0x8]
 movs   QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
 movs   BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
 movs   BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
 or     DWORD PTR [rsp+0x10],0x8000
 
 fld    TBYTE PTR [rsp+0x8]
 add    rsp,0x18
 ret    
 nop    DWORD PTR [rax+rax+0x0]

(That's from DMD via asm.dlang.org; the results from GDC are similar. LDC produces sub-optimal code for both.)

@tsbockman tsbockman force-pushed the ctfe_real branch 2 times, most recently from 687d139 to 855a3de Compare May 29, 2017 16:40
@AndrewEdwards
Contributor

@tsbockman - Can we get your eyes back on this? It has been quite a while. Is this a no-go, or are we at a point where this is now possible?

@tsbockman
Contributor Author

tsbockman commented Sep 8, 2020

Is this a no go or are we at a point where this is now possible?

@AndrewEdwards This was always possible, from a technical perspective. The only blocker was the core compiler development team making a decision as to whether/how they want this feature implemented. See the posts of UplinkCoder, yebblies, and ibuclaw, above.

In the years since this PR stalled, I have learned much more about how to write the kind of low-level floating-point code that it is designed to support. I think I could now design a good intrinsic API for that purpose, if it were decided that yebblies' solution is best. However, I still believe that expanding the CTFE engine's support for reinterpret casts would be much better for the language in the long run.

Regardless, I don't feel very motivated to actually work on any of this again: I largely quit trying to contribute code to the D project years ago because I have watched far too many pull requests (both my own and those of others) stall indefinitely waiting for leadership to make some important decision.

It seems that when debates like this don't lead to a clear and objective answer, often leadership refuses or forgets to make any decision at all, so that it just ends up being a big waste of time for would-be contributors like myself.

@Geod24
Member

Geod24 commented Sep 9, 2020

@tsbockman: If your goal is to get std.math working at compile time, there has been talk of reducing its dependence on real, driven by the performance impact of using it. Would that help achieve your objective?

Regarding the contribution process, I share the frustration, however things have improved quite a bit over the years, and I think if you were to resume contributing the experience would be much more pleasant. It's never going to be a walk in the park (we obviously need to set a high quality barrier for changing the language), but some unnecessary barriers were definitely lowered.

@WalterBright
Member

@tsbockman I do apologize for that. On a pragmatic note, smaller, more incremental changes are far more likely to get approved. It's not unusual to have a dozen or more separate PRs for a single change, for example, a recent series where I redid how code was generated for divide-by-constant.

@AndrewEdwards
Contributor

@tsbockman - I was specifically referring to this comment and the ensuing conversation with Iain.

With further testing, I see that (with or without this PR) paintFloatInt() only works properly for rvalues.

Is there a reasonable way to extend this to work with lvalues, too? If not, I might just close this..

Thanks for your response. I aim to assist in improving the contribution experience in general.

@WalterBright, @yebblies, @UplinkCoder. From what I understand, a firm decision on whether to implement this via intrinsics versus the current approach, and on whether to exclude or include real, is necessary before moving forward. Please discuss and decide on this matter so that it is clear which approach to take.

@tsbockman
Contributor Author

tsbockman commented Sep 9, 2020

I was specifically referring to this comment and the ensuing conversation with Iain.

If your goal is to get std.math working at compile time, there's been talks about reducing its dependence on real, driven by the performances impact of doing so. Would that help achieve your objective ?

My actual goal was to make it possible to do major refactoring and optimizations in std.math without introducing regressions in its existing partial CTFE support: see my Phobos PR, #4336, in particular the discussion starting here.

When this PR stalled without a clear path forward or alternative, I eventually just decided to fork std.math for my own personal use. My current version (which is not online anywhere at the moment):

  • Is similar in speed, or significantly faster depending on the task
  • Shares far more code between float, double, and real
  • Also supports half-precision floating point
  • Uses almost no assembly code
  • Is much easier to read and understand, as it is written in a higher-level style
  • Improves precision and/or correctness in some cases

However, it does not have good CTFE support, because there is no way in D as it currently stands to write some low-level floating-point algorithms in a way that is correct and efficient at both compile-time and run-time.

From what I understand, a firm decision on whether to implement this via intrinsics versus the current approach, and on whether to exclude or include real, is necessary before moving forward.

Correct. The alternatives are:

  1. Improve CTFE support for reinterpret casts and/or unions via this PR, and via future work to fix this for lvalues.
  2. Scrap this PR in favour of a new intrinsic API to solve the CTFE problem, as yebblies suggested above.
  3. Accept that there will be some regressions in std.math's CTFE support, in which case this PR isn't needed.
  4. Reject my work on std.math, in which case this PR isn't important and may as well be closed.

Of course this PR is not ready to merge as-is, but there is no point in me working on that unless the compiler dev team is pretty sure they want to take option (1).

@AndrewEdwards
Contributor

Ping @WalterBright, @yebblies, @UplinkCoder.

@tsbockman tsbockman closed this Oct 8, 2020