hypot template #2548

Closed
9il wants to merge 1 commit into dlang:master from 9il:fpt

@9il
Member

9il commented Sep 22, 2014

Win64 still fails. Help is needed from someone with Win64.

Blocked

DMD issue: (workaround)

Contents

  1. Remove FPTemporary usage in std.complex: "remove FPTemporary usage" #2637 (merged)
  2. Remove FPTemporary usage in std.numeric: "remove FPTemporary usage" #2637 (merged)
  3. Change hypot to template. <---------- fails (workaround)
  4. Clean up imports in std.complex: "std.complex: clean imports" #2659 (merged)
  5. Clean up imports in std.numeric: "std.numeric: clean imports" #2658 (merged)
  6. Change isfinite, isinf to isFinite, isInfinity in std.numeric and std.math: "std.math: deprecate backwards compatibility with Phobos1" #2636 (merged)

@9il
Member Author

9il commented Sep 22, 2014

assert(abs(cdouble(-1+1i)) == sqrt(2.0 ));

Fails on Windows_64

9il changed the title from "1. FPTemporary usage 2. hypot update" to "(Win64 fails, need help) 1. FPTemporary usage 2. hypot update" on Sep 22, 2014
@quickfur
Member

Could the failure be caused by roundoff error? Have you tried approxEqual instead? Or at least insert some writelns before the assert so that you can at least see what value was actually produced in the autotester.
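
A minimal sketch of the diagnostic suggested here, assuming a reasonably recent Phobos in which `hypot`, `sqrt` and `isClose` are available from `std.math` (`isClose` plays the role of `approxEqual`; the variable names are illustrative only). It prints both sides of the failing comparison in `%a` form and reports the exact and the tolerance-based comparison separately, so a roundoff difference can be told apart from a larger discrepancy.

```d
import std.math : hypot, isClose, sqrt;
import std.stdio : writefln;

void main()
{
    // Same operands as the failing assert: |-1 + 1i| versus sqrt(2).
    immutable double lhs = hypot(-1.0, 1.0);
    immutable double rhs = sqrt(2.0);

    // %a requests hexadecimal floating-point output.
    writefln("hypot(-1, 1) = %a", lhs);
    writefln("sqrt(2)      = %a", rhs);

    // Report both comparisons instead of asserting, so the autotester
    // log shows the values even when they differ.
    writefln("exactly equal: %s", lhs == rhs);
    writefln("close:         %s", isClose(lhs, rhs));
}
```

Printing rather than asserting keeps the run alive, which matters when the only feedback channel is the autotester log.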

@9il
Member Author

9il commented Sep 23, 2014

If possible, approxEqual shouldn't be used for things like this: this function is expected to be exact to full precision. I will try to test it tomorrow.

There is one thing that I can't understand:

auto abs(Num)(Num z) @safe pure nothrow @nogc
    if (is(Num* : const(cfloat*)) || is(Num* : const(cdouble*))
            || is(Num* : const(creal*)))

Why pointers?

@yebblies
Contributor

auto abs(Num)(Num z) @safe pure nothrow @nogc
    if (is(Num* : const(cfloat*)) || is(Num* : const(cdouble*))
            || is(Num* : const(creal*)))

To exclude types that convert to complex but aren't complex.
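
The effect of the pointer trick can be sketched with `double` in place of the built-in complex types, so the example does not depend on them. The names `convertsToDouble`, `isDoubleUpToQualifiers` and `Wrapper` are hypothetical and exist only for this illustration. A plain `is(T : double)` test accepts anything that implicitly converts, while the pointer form accepts only the type itself, modulo `const`/`immutable`.

```d
// Accepts any type with an implicit conversion to double.
enum convertsToDouble(T) = is(T : double);

// Accepts only double itself (possibly qualified): a T* converts to
// const(double*) only when T really is a double.
enum isDoubleUpToQualifiers(T) = is(T* : const(double*));

// A user type that converts to double through alias this.
struct Wrapper
{
    double value;
    alias value this;
}

// int and Wrapper convert to double, but pointers to them do not
// convert to pointers to double.
static assert( convertsToDouble!int);
static assert(!isDoubleUpToQualifiers!int);
static assert( convertsToDouble!Wrapper);
static assert(!isDoubleUpToQualifiers!Wrapper);

// double and its qualified variants pass the pointer test.
static assert( isDoubleUpToQualifiers!double);
static assert( isDoubleUpToQualifiers!(const double));
static assert( isDoubleUpToQualifiers!(immutable double));

// float converts to double by value, but float* is not a double*.
static assert( convertsToDouble!float);
static assert(!isDoubleUpToQualifiers!float);

void main() {}
```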

@9il
Member Author

9il commented Sep 26, 2014

@quickfur It seems that writefln and assert(..., format(...)) don't produce any output in the D autotester.

@quickfur
Member

The error seems to happen 1 line above your throw statement. I.e., this line:

assert(hypot(c.re, c.im) == sqrt(2.0));

@quickfur
Member

Oh, also, you probably should just use writeln instead of throwing an exception. Just in case the message somehow got lost in transit.

@9il
Member Author

9il commented Sep 26, 2014

Thanks!

@quickfur
Member

Whoa. Either there's a bug in std.format (%a is supposed to be exact, and those two numbers certainly look identical!), or there's a codegen bug that causes comparison of two equal values to fail.

@quickfur
Member

Maybe try writefln("%a", cast(double)(...)) and writefln("%a", cast(real)(...))? Something looks fishy here, since cdouble is supposed to have double precision, but writefln is only printing enough digits for float (single) precision.

@9il
Member Author

9il commented Sep 27, 2014

This float format is just another Microsoft feature XD

    version (linux)
    {
        assert(stream.data == "1.67 -0X1.47AE147AE147BP+0 nan",
                stream.data);
    }
    else version (OSX)
    {
        assert(stream.data == "1.67 -0X1.47AE147AE147BP+0 nan",
                stream.data);
    }
    else version (MinGW)
    {
        assert(stream.data == "1.67 -0XA.3D70A3D70A3D8P-3 nan",
                stream.data);
    }
    else version (CRuntime_Microsoft)
    {
        assert(stream.data == "1.67 -0X1.47AE14P+0 nan",
                stream.data);
    }
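
Since the expected strings above show the `%a` output differing between C runtimes (the CRuntime_Microsoft branch carries fewer hex digits), one way to sidestep the float formatter while debugging is to print the raw bit pattern of each `double` as an integer. A minimal sketch; `bitsOf` is a hypothetical helper, not a Phobos function:

```d
import std.math : hypot, sqrt;
import std.stdio : writefln;

// Reinterpret the 64 bits of a double as an unsigned integer.
ulong bitsOf(double x) @trusted pure nothrow @nogc
{
    return *cast(ulong*) &x;
}

void main()
{
    double s = sqrt(2.0);
    double h = hypot(-1.0, 1.0);

    // Integer formatting shows all 64 bits of each value, so any
    // difference in the mantissa is visible.
    writefln("sqrt(2)      bits: %016x", bitsOf(s));
    writefln("hypot(-1, 1) bits: %016x", bitsOf(h));
    writefln("identical bits:    %s", bitsOf(s) == bitsOf(h));
}
```

Two values that compare unequal with `==` but print identically under a truncated format will show different bit patterns here.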

9il changed the title from "(Win64 fails, need help) 1. FPTemporary usage 2. hypot update" to "(Blocked) 1. FPTemporary usage 2. hypot update" on Sep 27, 2014
@9il
Member Author

9il commented Sep 27, 2014

Blocked by DMD

@quickfur
Member

Did you file a dmd bug?

@9il
Member Author

9il commented Sep 27, 2014

Looks like https://issues.dlang.org doesn't work.

Software error:

Can't connect to the database.
Error: Access denied for user 'bugs'@'localhost' (using password: YES)
Is your database installed and up and running?
Do you have the correct username and password selected in localconfig?

For help, please send mail to the webmaster (webmaster@puremagic.com), giving this error message and the time and date of the error.

quickfur changed the title from "(Blocked) 1. FPTemporary usage 2. hypot update" to "1. FPTemporary usage 2. hypot update" on Sep 27, 2014
@quickfur
Member

https://issues.dlang.org is back online.

@DmitryOlshansky
Member

@9il would it be possible to break this up into smaller chunks?

@9il
Member Author

9il commented Oct 23, 2014

@DmitryOlshansky I will create a new PR for all changes except the hypot update.

@9il
Member Author

9il commented Oct 23, 2014

Or would 2-3 PRs be better?

@DmitryOlshansky
Member

2-3 small ones are faster to pull.

@9il
Member Author

9il commented Oct 24, 2014

ok

9il changed the title from "1. FPTemporary usage 2. hypot update" to "hypot update" on Oct 24, 2014
@9il
Member Author

9il commented Nov 7, 2014

All other changes are already in master.
Rebased.

@quickfur
Member

quickfur commented Nov 7, 2014

Unittests are failing, fix plz. :)

@9il
Member Author

9il commented Nov 7, 2014

@quickfur
This is a DMD issue.

@9il
Member Author

9il commented Nov 7, 2014

See main comment.

@quickfur
Member

quickfur commented Nov 7, 2014

Oh, the dmd bug is still not fixed? Nevermind then.

@DmitryOlshansky
Member

Seems to fail on some 32-bit platforms btw. @9il I have Win64 - where to dig?

@9il
Member Author

9il commented Jul 29, 2015

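Disassembling this code would help, though.

// case 1: sqrt of a runtime double
double two = 2.0;
auto a = sqrt(two);
// case 2: sqrt of a literal
auto a = sqrt(2.0);
// case 3: hypot on the parts of a cdouble
cdouble c = -1+1i;
auto a = hypot(c.re, c.im);
// case 4: abs of a cdouble
cdouble c = -1+1i;
auto a = abs(c);

Can you get it and paste it, please?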

@DmitryOlshansky
Member

After merging your pull and rebuilding phobos this D code:

import std.math;

void foo1(){
    double two = 2.0;
    auto a = sqrt(two);
}

void foo2(){
    auto a = sqrt(2.0);
}

void foo3(){
    cdouble c = -1+1i;
    auto a = hypot(c.re, c.im);
}

void foo4(){
    cdouble c = -1+1i;
    auto a = abs(c) ;
}

produces this ASM:

.text:0000000000000140 _D6test644foo1FZv proc near             
.text:0000000000000140
.text:0000000000000140 var_20          = qword ptr -20h
.text:0000000000000140 var_8           = qword ptr -8
.text:0000000000000140
.text:0000000000000140                 push    rbp
.text:0000000000000141                 mov     rbp, rsp
.text:0000000000000144                 sub     rsp, 20h
.text:0000000000000148                 movsd   xmm0, cs:_TMP0
.text:0000000000000151                 movsd   [rbp+var_8], xmm0
.text:0000000000000157                 fld     [rbp+var_8]
.text:000000000000015A                 fsqrt
.text:000000000000015C                 fstp    [rbp+var_20]
.text:000000000000015F                 movsd   xmm1, [rbp+var_20]
.text:0000000000000164                 lea     rsp, [rbp+0]
.text:0000000000000168                 pop     rbp
.text:0000000000000169                 retn
.text:0000000000000169 _D6test644foo1FZv endp

.text:0000000000000188 _D6test644foo2FZv proc near             
.text:0000000000000188
.text:0000000000000188 var_10          = qword ptr -10h
.text:0000000000000188
.text:0000000000000188                 push    rbp
.text:0000000000000189                 mov     rbp, rsp
.text:000000000000018C                 sub     rsp, 10h
.text:0000000000000190                 fld     cs:_TMP0
.text:0000000000000196                 fsqrt
.text:0000000000000198                 fstp    [rbp+var_10]
.text:000000000000019B                 movsd   xmm0, [rbp+var_10]
.text:00000000000001A0                 lea     rsp, [rbp+0]
.text:00000000000001A4                 pop     rbp
.text:00000000000001A5                 retn
.text:00000000000001A5 _D6test644foo2FZv endp

.text:00000000000001C0 _D6test644foo3FZv proc near             
.text:00000000000001C0
.text:00000000000001C0 var_10          = qword ptr -10h
.text:00000000000001C0 var_8           = qword ptr -8
.text:00000000000001C0
.text:00000000000001C0                 push    rbp
.text:00000000000001C1                 mov     rbp, rsp
.text:00000000000001C4                 sub     rsp, 10h
.text:00000000000001C8                 fld     cs:_TMP1
.text:00000000000001CE                 fld     cs:dbl_100
.text:00000000000001D4                 fstp    [rbp+var_8]
.text:00000000000001D7                 fstp    [rbp+var_10]
.text:00000000000001DA                 movsd   xmm1, [rbp+var_10]
.text:00000000000001E0                 movsd   xmm0, [rbp+var_8]
.text:00000000000001E6                 sub     rsp, 20h
.text:00000000000001EA                 movq    rdx, xmm1
.text:00000000000001EF                 movq    rcx, xmm0
.text:00000000000001F4                 call    _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd
.text:00000000000001F9                 add     rsp, 20h
.text:00000000000001FD                 lea     rsp, [rbp+0]
.text:0000000000000201                 pop     rbp
.text:0000000000000202                 retn
.text:0000000000000202 _D6test644foo3FZv endp

.text:0000000000000220 _D6test644foo4FZv proc near             ; DATA XREF: .pdata:$pdata$_D6test644foo4FZv↓o
.text:0000000000000220                                         ; .pdata:000000000000026C↓o
.text:0000000000000220
.text:0000000000000220 var_20          = qword ptr -20h
.text:0000000000000220 var_18          = qword ptr -18h
.text:0000000000000220 var_10          = qword ptr -10h
.text:0000000000000220 var_8           = qword ptr -8
.text:0000000000000220
.text:0000000000000220                 push    rbp
.text:0000000000000221                 mov     rbp, rsp
.text:0000000000000224                 sub     rsp, 20h
.text:0000000000000228                 fld     cs:_TMP1
.text:000000000000022E                 fld     cs:dbl_100
.text:0000000000000234                 fstp    [rbp+var_18]
.text:0000000000000237                 fstp    [rbp+var_20]
.text:000000000000023A                 fld     [rbp+var_20]
.text:000000000000023D                 fld     [rbp+var_18]
.text:0000000000000240                 fstp    [rbp+var_8]
.text:0000000000000243                 fstp    [rbp+var_10]
.text:0000000000000246                 lea     rcx, [rbp+var_10]
.text:000000000000024A                 sub     rsp, 20h
.text:000000000000024E                 call    _D3std4math10__T3absTrZ3absFNaNbNiNfrZd
.text:0000000000000253                 add     rsp, 20h
.text:0000000000000257                 lea     rsp, [rbp+0]
.text:000000000000025B                 pop     rbp
.text:000000000000025C                 retn
.text:000000000000025C _D6test644foo4FZv endp

@DmitryOlshansky
Member

Also, I printed T.stringof at line 562 of math.d, and T happens to be cfloat. I guess it's one of those codegen bugs with built-in complex numbers. I really, really wish we would remove them for good.

@9il
Member Author

9il commented Aug 3, 2015

Could you please add the asm with -O -inline -release?

@DmitryOlshansky
Member

foo1 and foo2 were optimized down to a single ret plus the stack push/pop.
The others are:

.text:0000000000000188 _D6test644foo3FZv proc near            
.text:0000000000000188
.text:0000000000000188 var_10          = qword ptr -10h
.text:0000000000000188 var_8           = qword ptr -8
.text:0000000000000188
.text:0000000000000188                 push    rbp
.text:0000000000000189                 mov     rbp, rsp
.text:000000000000018C                 sub     rsp, 10h
.text:0000000000000190                 fld     cs:_TMP1
.text:0000000000000196                 fld     cs:dbl_100
.text:000000000000019C                 fstp    [rbp+var_8]
.text:000000000000019F                 fstp    [rbp+var_10]
.text:00000000000001A2                 movsd   xmm1, cs:_TMP1
.text:00000000000001AB                 movsd   xmm0, [rbp+var_8]
.text:00000000000001B1                 sub     rsp, 20h
.text:00000000000001B5                 movq    rdx, xmm1
.text:00000000000001BA                 movq    rcx, xmm0
.text:00000000000001BF                 call    _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd
.text:00000000000001C4                 add     rsp, 20h
.text:00000000000001C8                 lea     rsp, [rbp+0]
.text:00000000000001CC                 pop     rbp
.text:00000000000001CD                 retn
.text:00000000000001CD _D6test644foo3FZv endp


.text:00000000000001E8 _D6test644foo4FZv proc near             
.text:00000000000001E8
.text:00000000000001E8 var_10          = qword ptr -10h
.text:00000000000001E8 var_8           = qword ptr -8
.text:00000000000001E8
.text:00000000000001E8                 push    rbp
.text:00000000000001E9                 mov     rbp, rsp
.text:00000000000001EC                 sub     rsp, 10h
.text:00000000000001F0                 fld     cs:_TMP1
.text:00000000000001F6                 fld     cs:dbl_100
.text:00000000000001FC                 fstp    [rbp+var_8]
.text:00000000000001FF                 fstp    [rbp+var_10]
.text:0000000000000202                 movsd   xmm1, cs:_TMP1
.text:000000000000020B                 movsd   xmm0, [rbp+var_8]
.text:0000000000000211                 sub     rsp, 20h
.text:0000000000000215                 movq    rdx, xmm1
.text:000000000000021A                 movq    rcx, xmm0
.text:000000000000021F                 call    _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd
.text:0000000000000224                 add     rsp, 20h
.text:0000000000000228                 lea     rsp, [rbp+0]
.text:000000000000022C                 pop     rbp
.text:000000000000022D                 retn
.text:000000000000022D _D6test644foo4FZv endp

@9il
Member Author

9il commented Aug 3, 2015

Nice, foo3 and foo4 look identical.
Can you also paste _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd?

@DmitryOlshansky
Member

.text:0000000000000248 _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd proc near
.text:0000000000000248
.text:0000000000000248 var_20          = qword ptr -20h
.text:0000000000000248 var_10          = qword ptr -10h
.text:0000000000000248 var_8           = qword ptr -8
.text:0000000000000248 arg_0           = qword ptr  10h
.text:0000000000000248 arg_8           = qword ptr  18h
.text:0000000000000248
.text:0000000000000248                 push    rbp
.text:0000000000000249                 mov     rbp, rsp
.text:000000000000024C                 sub     rsp, 20h
.text:0000000000000250                 movsd   [rbp+arg_0], xmm0
.text:0000000000000255                 movsd   [rbp+arg_8], xmm1
.text:000000000000025A                 fld     [rbp+arg_8]
.text:000000000000025D                 fabs
.text:000000000000025F                 fstp    [rbp+var_20]
.text:0000000000000262                 movsd   xmm1, [rbp+var_20]
.text:0000000000000267                 movsd   [rbp+var_10], xmm1
.text:000000000000026D                 movsd   [rbp+var_20], xmm1
.text:0000000000000272                 fld     [rbp+var_20]
.text:0000000000000275                 fld     [rbp+arg_0]
.text:0000000000000278                 fabs
.text:000000000000027A                 fstp    [rbp+var_20]
.text:000000000000027D                 movsd   xmm0, [rbp+var_20]
.text:0000000000000282                 movsd   [rbp+var_8], xmm0
.text:0000000000000288                 movsd   [rbp+var_20], xmm0
.text:000000000000028D                 fld     [rbp+var_20]
.text:0000000000000290                 fcomip  st, st(1)
.text:0000000000000292                 fstp    st
.text:0000000000000294                 jp      short loc_298
.text:0000000000000296                 jbe     short loc_2EE
.text:0000000000000298
.text:0000000000000298 loc_298:                                ; CODE XREF: _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+4C↑j
.text:0000000000000298                 movsd   xmm2, [rbp+var_10]
.text:000000000000029E                 movsd   [rbp+var_8], xmm2
.text:00000000000002A4                 fld     [rbp+arg_0]
.text:00000000000002A7                 fabs
.text:00000000000002A9                 fstp    [rbp+var_20]
.text:00000000000002AC                 movsd   xmm3, [rbp+var_20]
.text:00000000000002B1                 movsd   [rbp+var_10], xmm3
.text:00000000000002B7                 movsd   [rbp+var_20], xmm3
.text:00000000000002BC                 fld     [rbp+var_20]
.text:00000000000002BF                 fld     cs:_TMP2
.text:00000000000002C5                 fucomip st, st(1)
.text:00000000000002C7                 fstp    st
.text:00000000000002C9                 jp      short loc_2D1
.text:00000000000002CB                 jz      loc_456
.text:00000000000002D1
.text:00000000000002D1 loc_2D1:                                ; CODE XREF: _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+81↑j
.text:00000000000002D1                 fld     [rbp+var_8]
.text:00000000000002D4                 fld     cs:_TMP2
.text:00000000000002DA                 fucomip st, st(1)
.text:00000000000002DC                 fstp    st
.text:00000000000002DE                 jnz     short loc_2EE
.text:00000000000002E0                 jp      short loc_2EE
.text:00000000000002E2                 movsd   xmm0, [rbp+var_8]
.text:00000000000002E8                 lea     rsp, [rbp+0]
.text:00000000000002EC                 pop     rbp
.text:00000000000002ED                 retn
.text:00000000000002EE ; ---------------------------------------------------------------------------
.text:00000000000002EE
.text:00000000000002EE loc_2EE:                                ; CODE XREF: _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+4E↑j
.text:00000000000002EE                                         ; _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+96↑j ...
.text:00000000000002EE                 fld     [rbp+var_8]
.text:00000000000002F1                 fld     cs:_TMP3
.text:00000000000002F7                 fcomip  st, st(1)
.text:00000000000002F9                 fstp    st
.text:00000000000002FB                 ja      loc_39B
.text:0000000000000301                 jp      loc_39B
.text:0000000000000307                 movsd   xmm2, cs:_TMP4
.text:0000000000000310                 movsd   xmm3, [rbp+var_10]
.text:0000000000000316                 mulsd   xmm3, xmm2
.text:000000000000031A                 movsd   [rbp+var_10], xmm3
.text:0000000000000320                 movsd   xmm4, cs:_TMP4
.text:0000000000000329                 movsd   xmm0, [rbp+var_8]
.text:000000000000032F                 mulsd   xmm0, xmm4
.text:0000000000000333                 movsd   [rbp+var_8], xmm0
.text:0000000000000339                 movsd   xmm1, [rbp+var_10]
.text:000000000000033F                 movsd   xmm2, [rbp+var_10]
.text:0000000000000345                 mulsd   xmm2, xmm1
.text:0000000000000349                 movsd   [rbp+var_10], xmm2
.text:000000000000034F                 movsd   xmm3, [rbp+var_8]
.text:0000000000000355                 movsd   xmm4, [rbp+var_8]
.text:000000000000035B                 mulsd   xmm4, xmm3
.text:000000000000035F                 movsd   [rbp+var_8], xmm4
.text:0000000000000365                 movsd   xmm0, [rbp+var_10]
.text:000000000000036B                 movsd   xmm1, [rbp+var_8]
.text:0000000000000371                 addsd   xmm1, xmm0
.text:0000000000000375                 movsd   [rbp+var_8], xmm1
.text:000000000000037B                 fld     [rbp+var_8]
.text:000000000000037E                 fsqrt
.text:0000000000000380                 fstp    [rbp+var_20]
.text:0000000000000383                 movsd   xmm0, [rbp+var_20]
.text:0000000000000388                 movsd   xmm2, cs:_TMP5
.text:0000000000000391                 mulsd   xmm0, xmm2
.text:0000000000000395                 lea     rsp, [rbp+0]
.text:0000000000000399                 pop     rbp
.text:000000000000039A                 retn
.text:000000000000039B ; ---------------------------------------------------------------------------
.text:000000000000039B
.text:000000000000039B loc_39B:                                ; CODE XREF: _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+B3↑j
.text:000000000000039B                                         ; _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+B9↑j
.text:000000000000039B                 fld     [rbp+var_10]
.text:000000000000039E                 fld     cs:_TMP6
.text:00000000000003A4                 fcomip  st, st(1)
.text:00000000000003A6                 fstp    st
.text:00000000000003A8                 jb      loc_442
.text:00000000000003AE                 movsd   xmm1, cs:_TMP7
.text:00000000000003B7                 movsd   xmm2, [rbp+var_10]
.text:00000000000003BD                 mulsd   xmm2, xmm1
.text:00000000000003C1                 movsd   [rbp+var_10], xmm2
.text:00000000000003C7                 movsd   xmm3, cs:_TMP7
.text:00000000000003D0                 movsd   xmm4, [rbp+var_8]
.text:00000000000003D6                 mulsd   xmm4, xmm3
.text:00000000000003DA                 movsd   [rbp+var_8], xmm4
.text:00000000000003E0                 movsd   xmm0, [rbp+var_10]
.text:00000000000003E6                 movsd   xmm1, [rbp+var_10]
.text:00000000000003EC                 mulsd   xmm1, xmm0
.text:00000000000003F0                 movsd   [rbp+var_10], xmm1
.text:00000000000003F6                 movsd   xmm2, [rbp+var_8]
.text:00000000000003FC                 movsd   xmm3, [rbp+var_8]
.text:0000000000000402                 mulsd   xmm3, xmm2
.text:0000000000000406                 movsd   [rbp+var_8], xmm3
.text:000000000000040C                 movsd   xmm4, [rbp+var_10]
.text:0000000000000412                 movsd   xmm0, [rbp+var_8]
.text:0000000000000418                 addsd   xmm0, xmm4
.text:000000000000041C                 movsd   [rbp+var_8], xmm0
.text:0000000000000422                 fld     [rbp+var_8]
.text:0000000000000425                 fsqrt
.text:0000000000000427                 fstp    [rbp+var_20]
.text:000000000000042A                 movsd   xmm0, [rbp+var_20]
.text:000000000000042F                 movsd   xmm1, cs:_TMP8
.text:0000000000000438                 mulsd   xmm0, xmm1
.text:000000000000043C                 lea     rsp, [rbp+0]
.text:0000000000000440                 pop     rbp
.text:0000000000000441                 retn
.text:0000000000000442 ; ---------------------------------------------------------------------------
.text:0000000000000442
.text:0000000000000442 loc_442:                                ; CODE XREF: _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+160↑j
.text:0000000000000442                 fld     [rbp+var_10]
.text:0000000000000445                 fmul    cs:_TMP9
.text:000000000000044B                 fld     [rbp+var_8]
.text:000000000000044E                 fcomip  st, st(1)
.text:0000000000000450                 fstp    st
.text:0000000000000452                 jnb     short loc_462
.text:0000000000000454                 jp      short loc_462
.text:0000000000000456
.text:0000000000000456 loc_456:                                ; CODE XREF: _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+83↑j
.text:0000000000000456                 movsd   xmm0, [rbp+var_10]
.text:000000000000045C                 lea     rsp, [rbp+0]
.text:0000000000000460                 pop     rbp
.text:0000000000000461                 retn
.text:0000000000000462 ; ---------------------------------------------------------------------------
.text:0000000000000462
.text:0000000000000462 loc_462:                                ; CODE XREF: _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+20A↑j
.text:0000000000000462                                         ; _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd+20C↑j
.text:0000000000000462                 movsd   xmm4, [rbp+var_10]
.text:0000000000000468                 movsd   xmm1, [rbp+var_10]
.text:000000000000046E                 mulsd   xmm1, xmm4
.text:0000000000000472                 movsd   [rbp+var_10], xmm1
.text:0000000000000478                 movsd   xmm0, [rbp+var_8]
.text:000000000000047E                 movsd   xmm2, [rbp+var_8]
.text:0000000000000484                 mulsd   xmm2, xmm0
.text:0000000000000488                 movsd   [rbp+var_8], xmm2
.text:000000000000048E                 movsd   xmm3, [rbp+var_10]
.text:0000000000000494                 movsd   xmm4, [rbp+var_8]
.text:000000000000049A                 addsd   xmm4, xmm3
.text:000000000000049E                 movsd   [rbp+var_8], xmm4
.text:00000000000004A4                 fld     [rbp+var_8]
.text:00000000000004A7                 fsqrt
.text:00000000000004A9                 fstp    [rbp+var_20]
.text:00000000000004AC                 movsd   xmm0, [rbp+var_20]
.text:00000000000004B1                 lea     rsp, [rbp+0]
.text:00000000000004B5                 pop     rbp
.text:00000000000004B6                 retn
.text:00000000000004B6 _D3std4math14__T5hypotTdTdZ5hypotFNaNbNiNfxdxdZd endp

@DmitryOlshansky
Member

> Nice, foo3 and foo4 look identical.

OMG. Seems like the abs version calls the wrong function, and it's caused by the -inline switch. I tried all combinations of -O and -inline; -O has no negative effect.

@9il
Member Author

9il commented Aug 3, 2015

>> Nice, foo3 and foo4 look identical.
>
> OMG. Seems like the abs version calls the wrong function, and it's caused by the -inline switch. I tried all combinations of -O and -inline; -O has no negative effect.

Why do you think so? Shouldn't abs call hypot for complex numbers?

@DmitryOlshansky
Member

> Why do you think so? Shouldn't abs call hypot for complex numbers?

Ehm right, I must be tired.

@9il
Member Author

9il commented Aug 3, 2015

The hypot code looks really crazy. Sorry for so many asm requests. Here is another one:

void foo5(){
    cdouble c = -1+1i;
    auto a = hypot(c.re, cast(double)c.im); //idouble -> double?
}

@9il
Member Author

9il commented Aug 3, 2015

foo5 is fine, you can ignore previous msg.

@9il
Member Author

9il commented Aug 3, 2015

I cannot find the wrong code.
Could you please upload the disassembly of the whole file compiled with -w -dip25 -m32 -fPIC -O -release -unittest -c (as in the auto-tester)?

@DmitryOlshansky
Member

64bit binary:
https://www.dropbox.com/s/plwv9ce3zs9vfk0/math_64.exe?dl=1

32bit binary (also asserts for me):
https://www.dropbox.com/s/ec9jwxjoap1k7n8/math.exe?dl=1

Both have debug info; I hope you can navigate them with a disassembler.

@9il
Member Author

9il commented Aug 3, 2015

otool -tvV ./math.exe
./math.exe: is not an object file

Please send the text file.

@DmitryOlshansky
Member

@9il
Member Author

9il commented Aug 3, 2015

I cannot find any of the fooX functions, because the function names are shown like sub_402DF0 proc near :(

@DmitryOlshansky
Member

I'm certain the 64-bit version has names. The 32-bit one sadly has broken debug symbols, and the disassembler fails to load them.

@9il
Member Author

9il commented Aug 3, 2015

I have 32 bit version now )

@9il
Member Author

9il commented Aug 3, 2015

But I cannot reproduce the error on Mac, because Darwin_64_32 does not fail. So my 32-bit binaries are useless.

@DmitryOlshansky
Member

Is there VirtualBox for Mac? At home my Linux sits in VBox on a Windows host, and I find it incredibly easy to switch between the two.

@9il
Member Author

9il commented Aug 3, 2015

I'll try with VB.

@wilzbach
Contributor

Merge conflict - please rebase :)
