Generalize CTFE float/int repainting to support many more types.#6811

Closed
tsbockman wants to merge 1 commit into dlang:master from tsbockman:ctfe_real

Conversation

@tsbockman
Contributor

This enhances paintFloatInt() to support repainting between any combination of byte/ubyte/short/ushort/int/uint/long/ulong/float/double/real, as well as small static arrays derived from any of those types (such as ushort[5]).

Repainting is permitted as long as the from type has a size equal to or greater than the to type. 80-bit real is a special case: the padding area does not count when casting to it, but does when casting from it.

I believe this PR provides the capabilities needed to get all of std.math working at CTFE using the same code path as runtime (aside from inline assembler), albeit in a rather ugly fashion.

TODO: Mostly tests.

@tsbockman
Contributor Author

tsbockman commented May 19, 2017

Failing tests:

I fixed this first one by returning false from isFloatIntPaint() in the trivial case where both element types have the same size and kind (fromEle.isintegral() == toEle.isintegral() && fromEle.size() == toEle.size()). But I don't understand the compiler well enough to work out why it occurred in the first place, so reviewers take note:

compilable/interpret3.d(1709): Error: assert(arr[0] == 3) failed
compilable/interpret3.d(1719):        called from here: test10243()
compilable/interpret3.d(1719):        while evaluating: `static assert(test10243())`
compilable/interpret3.d(2165): Error: *& a = 15 cannot be evaluated at compile time
compilable/interpret3.d(2169):        called from here: (*function () => true)()
compilable/interpret3.d(2163):        while evaluating: `static assert((*function () => true)())`

This test (fail_compilation/ctfe14207.d) also "fails", but I'm not sure why we need it?

/*
TEST_OUTPUT:
---
fail_compilation/ctfe14207.d(13): Error: cannot convert &immutable(ulong) to ubyte[8]* at compile time
fail_compilation/ctfe14207.d(18):        called from here: nativeToBigEndian()
fail_compilation/ctfe14207.d(22):        called from here: digest()
---
*/

ubyte[8] nativeToBigEndian()
{
    immutable ulong res = 1;
    return *cast(ubyte[8]*) &res;
}

auto digest()
{
    ubyte[8] bits = nativeToBigEndian();
    return bits;
}

enum h = digest();

I know that repainting needs to handle endian-ness properly, but it's not difficult to do so provided that the endian-ness of both the host and the target are known. What predicate should I fill in here?

        // If host != target endian-ness, swap the byte order for bp[0 .. eleSize] here.
        if ( ??? )
        {

@jpf91
Contributor

jpf91 commented May 19, 2017

version(BigEndian)
    enum hostBigEndian = true;
else
    enum hostBigEndian = false;

// If host != target endian-ness, swap the byte order for bp[0 .. eleSize] here.
if (hostBigEndian != target.bigEndian)

I think the target endianness information is not exposed to the frontend currently. You'll either have to extend target.d, or maybe you can use the encodeInteger/decodeInteger functions. (DMD is currently hardcoded for little-endian targets, so you can simply return a hardcoded value in target.d for DMD. LDC and GDC can properly determine the target endianness.)

Member

@ibuclaw ibuclaw left a comment

Seems OK on the whole.

extern (C++) bool isFloatIntPaint(Type from, Type to)
{
return from.size() == to.size() && (from.isintegral() && to.isfloating() || from.isfloating() && to.isintegral());
const toUsed = (to.ty == Tfloat80) ? 80/8 : to.size();
Member

Remove special case for reals.

Member

Or maybe you are trying to do Target.realsize - Target.realpad here...

Contributor Author

Is there a Treal ENUMTY value I can use instead of Tfloat80 then?

Because as long as it's checking against Tfloat80, it ought to say 80/8, not Target.realsize - Target.realpad: there is no guarantee that real will always be Tfloat80.

Member

I know it's an unfortunate name, but Tfloat80 is treated as the largest supported floating-point type in gdc and ldc. Tfloat80 can even be a double!

Contributor Author

@tsbockman tsbockman May 19, 2017

Gross! But thanks for explaining; I'll fix my code...

return false;

Type fromEle = (from.ty == Tsarray) ? (cast(TypeArray) from).next : from;
Type toEle = (to.ty == Tsarray) ? (cast(TypeArray) to).next : to;
Member

fromElem/toElem? Unless the latter clashes with a backend function, in which case I think another appropriate name would be tbfrom/tbto.

Contributor Author

What does "tb" stand for?

Member

tb is the result of running t.toBasetype()

// Write the integer value of 'e' into an unsigned byte buffer.
extern (C++) static void encodeInteger(Expression e, ubyte* buffer)
// Write the bit patterns representing the value of `e` into `buffer`.
extern (C++) void paintEncode(Expression e, ubyte* buffer)
Member

This can be made private, remove the extern(C++) too.

assert (e.op == TOK.TOKarrayliteral);
arr = cast(ArrayLiteralExp) e;
aLen = cast(int) arr.elements.dim;
eleSize = cast(int) (cast(TypeArray) arr.type).next.size();
Member

Same as before, using ele looks strange to me.

break;
case Tfloat80:
*(cast(real*) bp) = cast(real) value;
bp[80/8 .. eleSize] = 0; // Clear the padding area for consistency.
Member

Target.realsize - Target.realpad

bp[80/8 .. eleSize] = 0; // Clear the padding area for consistency.
break;
default:
assert (0);
Member

Make it a final switch then?

Contributor Author

OK.

{
const last = eleSize - 1;
for (size_t x = 0; x <= last; ++x)
bp[x] = bp[last - x];
Member

I know this is inside a static if (false), but you probably want to use swap.

break;
const last = eleSize - 1;
for (size_t x = 0; x <= last; ++x)
bp[x] = bp[last - x];
Member

Or maybe my thoughts about endianness don't apply here, as dmd doesn't internally represent data as the target sees it.

@ibuclaw
Member

ibuclaw commented May 19, 2017

Other notes: Missing test cases, should also check the behaviour of type punning a zero sized array.

@ibuclaw
Member

ibuclaw commented May 19, 2017

I know that repainting needs to handle endian-ness properly, but it's not difficult to do so provided that the endian-ness of both the host and the target are known. What predicate should I fill in here?

I probably wouldn't worry about it, dmd doesn't pretend to support mixed endianness anyway. I'd put it in the later queue as other compilers should already be handling this.

@ibuclaw
Member

ibuclaw commented May 19, 2017

However the testsuite should be endianness aware.

@UplinkCoder
Member

@tsbockman in general I dislike it when people extend the feature set of ctfe...

Also, the implementation needs to be cleaned up.

@ibuclaw
Member

ibuclaw commented May 19, 2017

in general I dislike it when people extend the feature set of ctfe...

Why? Because that means you need to extend your feature set? 😉

@UplinkCoder
Member

Why? Because that means you need to extend your feature set? 😉

Exactly.

@UplinkCoder
Member

@tsbockman don't support Treal at all.
Keep it to well-defined IEEE types and that ought to be it.

@tsbockman
Contributor Author

tsbockman commented May 19, 2017

don't support Treal at all.

std.math makes heavy use of real. Practically disallowing its use in CTFE is a severe restriction.

Also, real already works in CTFE, in general - it's just repainting that doesn't. So, it's not like you can just leave it out, anyway; there's lots of CTFE code out there that uses real.

@WalterBright
Member

This test (fail_compilation/ctfe14207.d) also "fails", but I'm not sure why we need it?

The 14207 indicates it came from fixing https://issues.dlang.org/show_bug.cgi?id=14207 which gives a use case.

@tsbockman
Contributor Author

https://issues.dlang.org/show_bug.cgi?id=14207 which gives a use case.

Given that the test case works at run time, I see no reason that it shouldn't be made to work at compile time, as well; I'll replace that fail_compilation test with a normal one to verify a correct result, instead.

@ibuclaw
Member

ibuclaw commented May 19, 2017

The 14207 indicates it came from fixing https://issues.dlang.org/show_bug.cgi?id=14207 which gives a use case.

Looks like ICE -> Error to me. This PR makes it so we go from Error -> OK.

@tsbockman
Contributor Author

It looks like everything is passing now. I still need to add a bunch more tests before this is ready to merge, though.

@tsbockman tsbockman force-pushed the ctfe_real branch 2 times, most recently from f6689ad to 081643a Compare May 19, 2017 19:56
@yebblies
Contributor

This really is ugly. If we're just doing this so that we can get std.math working, why not do it in a much less ugly way?

Is there anything there that can't be done with a few intrinsics instead of pointer casting? eg getFloatExponent could do the masking and shifting under the hood, and will encourage the math code to be trivially portable. This also avoids the special-casing of 80-bit reals.

I've always seen ctfe preventing pointer casting and union reinterpretation as being a feature, not a bug.

@ibuclaw
Member

ibuclaw commented May 21, 2017

Is there anything there that can't be done with a few intrinsics instead of pointer casting?

Bit setting may or may not be more awkward, see for instance std.math.floor() or ceil().

In any case, the implementation under the hood would still remain the same. Whether the user code is close to it or far from it doesn't really matter.

I have no objection to adding new properties to provide accessors; someone may request a DIP, though.

@yebblies
Contributor

TBH it looks like floor would be a lot simpler with intrinsics. I agree with @UplinkCoder about increasing the ctfe feature set too, unless this has much broader use I'm not aware of.

Most similar stuff can be done in ctfe without reinterpretation, because you can reliably extract bits from integers. Intrinsics to extract bits from floats would IIUC complete this to the point that hashing and other low-level manipulations can be done in a clean and portable way.

This would also mean you could write low-level floating point code without ever having to care about padding. And unlike casting/unions, the intrinsics could be overloaded to provide the same interface for library float types.

@tsbockman What do you think about this?

@PetarKirov
Member

I agree with @UplinkCoder about increasing the ctfe feature set too...

Intrinsics also increase the feature set, but do so by inventing a different language dialect incompatible with the rest of the language. That is, they make writing generic code harder because you would need different code for CTFE and runtime - the same mistake that C++ makes over and over again. Having a clean language that allows code reuse is much more important than inventing new intrinsics for every single use case. Type repainting is a key part of every systems programmer's toolbox. Let's not add more hacks to the language.

@yebblies
Contributor

That is, they make writing generic code harder because you would need different code for CTFE and runtime

No, because the intrinsics would work at runtime. There is nothing clean about the current use of unions, take a look at the code in std.math.

@tsbockman
Contributor Author

@yebblies Most of the benefits which you claim intrinsics would have (portability, cleaner code, etc.) over unions or reinterpret casts really just come from consolidating the union logic into one place.

That one place can be a type in phobos or druntime, or it can be some intrinsic functions implemented in the compiler. Either way, it makes no difference as far as portability, correctness, and code simplification goes.

As to the CTFE feature set - the arbitrary, undocumented restrictions placed upon CTFE code are a frequent source of frustration for me, especially since (last time I checked) if (__ctfe) breaks inlining in DMD.

There is nothing clean about the current use of unions, take a look at the code in std.math.

std.math is a mess because the union logic is spread out all over the module. Consolidating it in one place behind a sensible API will fix this. That's the point of my in-progress rewrite.

@yebblies
Contributor

@tsbockman

That one place can be a type in phobos or druntime, or it can be some intrinsic functions implemented in the compiler. Either way, it makes no difference as far as portability, correctness, and code simplification goes.

It makes a difference to the complexity of the supported feature set in CTFE. Adding this feature does not make it easier to specify the exact feature set either.

It sounds like you're saying an intrinsic-based API would work for your purpose as long as it works at runtime and compile time. Or even that you're already designing such an API, and it just needs to be moved to druntime and added to the builtins list. Am I misunderstanding?

@tsbockman
Contributor Author

tsbockman commented May 23, 2017

Adding this feature does not make it easier to specify the exact feature set either.

The restrictions themselves are the main problem; the lack of documentation is just salt in the wound. Expanding the CTFE feature set in the direction of the run time feature set lessens the need for separate documentation.

It sounds like you're saying an intrinsic based api would work for your purpose as long as it works at runtime and compile time. Our even that you're already designing such an api, and it just needs to be moved to druntime and added to the builtins list. Am i misunderstanding?

An appropriate intrinsic API would look similar to my RealRep union type, if it were chopped apart into a bunch of stand-alone functions. However, I haven't finalized the design because I figure the best way to design something that really does what's needed (including performance) is to keep it private until all of std.math has been rewritten to use it, revising the API as I go.

I wouldn't want to commit to a specific list of intrinsics right now, because while it's easy to write a list that is functionally complete, it's harder to figure out which set gives optimal performance.

@tsbockman
Contributor Author

With further testing, I see that (with or without this PR) paintFloatInt() only works properly for rvalues.

Is there a reasonable way to extend this to work with lvalues, too? If not, I might just close this...

@ibuclaw
Member

ibuclaw commented May 23, 2017

Exactly.

If you can't do constant propagation, and that includes pointer offset and devirtualization at compile time, then I would consider that a bug in ctfe. You should be able to support everything that a compiler optimizer is able to do, and then some.

@ibuclaw
Member

ibuclaw commented May 23, 2017

Is there a reasonable way to extend this to work with lvalues, too? If not, I might just close this...

I would consider restricting it to rvalues reasonable for the time being. Lvalues would, in the worst case, just require double painting.

*cast(int*) & f = 42;
(*cast(byte[4]*) & f)[0] = 1;

It's easier just to let the user do the rewrite as:

int i = 42;
f = *cast(float*) & i;

byte[4] b = *cast(byte[4]*) & f;
b[0] = 1;
f = *cast(float*) & b;

Instead of thrusting this onto the compiler. At least that is my opinion until we perhaps switch to better CTFE technologies.

@tsbockman
Contributor Author

It's easier just to let the user do the rewrite

I thought about that, but that's only a viable option if we can demonstrate that it generates the same run time code. In particular, I am concerned about this case:

real r = 1.5;
ushort[5] u = *(cast(ushort[5]*) &r);
u[4] = u[4] | 0x8000;
r = *(cast(real*) &u);
assert(r == -1.5);

Given that ushort[5] does not fit in a single register, I really hope that the optimizer is smart enough to eliminate the unnecessary load and store of u[0 .. 4].

I'll investigate.

@tsbockman
Contributor Author

I really hope that the optimizer is smart enough to eliminate the unnecessary load and store of u[0 .. 4].

It's not:

real makeNegCT(real x)
{
  ushort[5] u = *(cast(ushort[5]*) &x);
  u[4] = u[4] | 0x8000;
  x = *(cast(real*) &u);
  return x;
}

real makeNegRT(real x)
{
  (cast(ushort*) &x)[4] |= 0x8000;
  return x;
}

makeNegRT() generates the desired code,

real makeNegRT(real):
 push   rax
 or     DWORD PTR [rsp+0x18],0x8000
 
 fld    TBYTE PTR [rsp+0x10]
 pop    rcx
 ret    
 nop    

But makeNegCT() does not:

real makeNegCT(real):
 sub    rsp,0x18
 lea    rsi,[rsp+0x20]
 lea    rdi,[rsp+0x8]
 movs   QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
 movs   BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
 movs   BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
 or     DWORD PTR [rsp+0x10],0x8000
 
 fld    TBYTE PTR [rsp+0x8]
 add    rsp,0x18
 ret    
 nop    DWORD PTR [rax+rax+0x0]

(That's from DMD via asm.dlang.org; the results from GDC are similar. LDC produces sub-optimal code for both.)

@tsbockman tsbockman force-pushed the ctfe_real branch 2 times, most recently from 687d139 to 855a3de Compare May 29, 2017 16:40
@AndrewEdwards
Contributor

@tsbockman - Can we get your eyes back on this? It has been quite a while. Is this a no-go, or are we at a point where this is now possible?

@tsbockman
Contributor Author

tsbockman commented Sep 8, 2020

Is this a no go or are we at a point where this is now possible?

@AndrewEdwards This was always possible, from a technical perspective. The only blocker was the core compiler development team making a decision as to whether/how they want this feature implemented. See the posts of UplinkCoder, yebblies, and ibuclaw, above.

In the years since this PR stalled, I have learned much more about how to write the kind of low-level floating-point code that it is designed to support. I think I could now design a good intrinsic API for that purpose, if it were decided that yebblies' solution is best. However, I still believe that expanding the CTFE engine's support for reinterpret casts would be much better for the language in the long run.

Regardless, I don't feel very motivated to actually work on any of this again: I largely quit trying to contribute code to the D project years ago because I have watched far too many pull requests (both my own and those of others) stall indefinitely waiting for leadership to make some important decision.

It seems that when debates like this don't lead to a clear and objective answer, often leadership refuses or forgets to make any decision at all, so that it just ends up being a big waste of time for would-be contributors like myself.

@Geod24
Member

Geod24 commented Sep 9, 2020

@tsbockman: If your goal is to get std.math working at compile time, there has been talk of reducing its dependence on real, driven by the performance impact of using it. Would that help achieve your objective?

Regarding the contribution process, I share the frustration, however things have improved quite a bit over the years, and I think if you were to resume contributing the experience would be much more pleasant. It's never going to be a walk in the park (we obviously need to set a high quality barrier for changing the language), but some unnecessary barriers were definitely lowered.

@WalterBright
Member

@tsbockman I do apologize for that. On a pragmatic note, smaller, more incremental changes are far more likely to get approved. It's not unusual to have a dozen or more separate PRs for a single change, for example, a recent series where I redid how code was generated for divide-by-constant.

@AndrewEdwards
Contributor

@tsbockman - I was specifically referring to this comment and the ensuing conversation with Iain.

With further testing, I see that (with or without this PR) paintFloatInt() only works properly for rvalues.

Is there a reasonable way to extend this to work with lvalues, too? If not, I might just close this..

Thanks for your response. I aim to assist in improving the contribution experience in general.

@WalterBright, @yebblies, @UplinkCoder. From what I understand, a firm decision on whether to implement this via intrinsics versus the current approach, and on whether to exclude or include real, is necessary before moving forward. Please discuss and decide on this matter so that it is clear which approach to take.

@tsbockman
Contributor Author

tsbockman commented Sep 9, 2020

I was specifically referring to this comment and the ensuing conversation with Iain.

If your goal is to get std.math working at compile time, there's been talks about reducing its dependence on real, driven by the performances impact of doing so. Would that help achieve your objective ?

My actual goal was to make it possible to do major refactoring and optimizations in std.math without introducing regressions in its existing partial CTFE support: see my Phobos PR, #4336, in particular the discussion starting here.

When this PR stalled without a clear path forward or alternative, I eventually just decided to fork std.math for my own personal use. My current version (which is not online anywhere at the moment):

  • Is similar in speed, or significantly faster depending on the task
  • Shares far more code between float, double, and real
  • Also supports half-precision floating point
  • Uses almost no assembly code
  • Is much easier to read and understand, as it is written in a higher-level style
  • Improves precision and/or correctness in some cases

However, it does not have good CTFE support, because there is no way in D as it currently stands to write some low-level floating-point algorithms in a way that is correct and efficient at both compile-time and run-time.

From what I understand, a firm decision on whether to implement this via intrinsics versus the current approach, and on whether to exclude or include real, is necessary before moving forward.

Correct. The alternatives are:

  1. Improve CTFE support for reinterpret casts and/or unions via this PR, and via future work to fix this for lvalues.
  2. Scrap this PR in favour of a new intrinsic API to solve the CTFE problem, as yebblies suggested above.
  3. Accept that there will be some regressions in std.math's CTFE support, in which case this PR isn't needed.
  4. Reject my work on std.math, in which case this PR isn't important and may as well be closed.

Of course this PR is not ready to merge as-is, but there is no point in me working on that unless the compiler dev team is pretty sure they want to take option (1).

@AndrewEdwards
Contributor

Ping @WalterBright, @yebblies, @UplinkCoder.

@tsbockman tsbockman closed this Oct 8, 2020