Skip to content

Upgrade front-end & libs to v2.084.0#2946

Merged
kinke merged 44 commits intoldc-developers:masterfrom
kinke:merge-2.084
Jan 18, 2019
Merged

Upgrade front-end & libs to v2.084.0#2946
kinke merged 44 commits intoldc-developers:masterfrom
kinke:merge-2.084

Conversation

@kinke
Copy link
Member

@kinke kinke commented Dec 18, 2018

No description provided.

kinke added 11 commits December 18, 2018 02:31
std.internal.digest.sha_SSSE3 now supports Win64 too. It contains naked
DMD-style asm which isn't easily portable to LLVM asm. For Win32, there
are 2 more modules - the reason we aren't shipping with LTO libs for
Win32 yet (but for Win64).

I took a pragmatic approach by compiling these modules separately - only
generating the binary object file <skipping the bitcode one) and using it
for both binary and LTO libs.
It's been broken for core.math.ldexp since switching to the alias.
The std.math.ldexp builtins were removed upstream, which makes sense, as
they all rely on core.math.ldexp.
For proper OS and MODEL vars. E.g., OS is needed for the new stdcpp
test, as OS=osx needs `-stdlib=libc++` in the clang command line.
@kinke kinke changed the title WIP: Upgrade front-end & libs to v2.084.0-beta.1 WIP: Upgrade front-end & libs to v2.084.0 Jan 5, 2019
@kinke
Copy link
Member Author

kinke commented Jan 9, 2019

Nothing in particular, it's just the a lot is going on. ;)

kinke added 3 commits January 9, 2019 23:48
Cache length & ptr in DSliceValue, so that e.g. a pair constructed from
a constant length and some ptr keeps returning a constant length instead
of an extractvalue instruction every time the length is needed.

This enables checking for matching constant lengths when copying slices
and makes `test1()` in runnable/betterc.d work (memcpy instead of
_d_array_slice_copy call):

```
int[10] a1 = void;
int[10] a2 = void;
a1[] = a2[];
```

(more or less equivalent to `a1 = a2`, which is already optimized)
@kinke
Copy link
Member Author

kinke commented Jan 9, 2019

The slice copy optimization wasn't really needed either, but I thought it deserves a bit of attention.

int[10] a1 = void;
int[10] a2 = void;
a1[] = a2[];

Old IR (v1.13):

  %a1 = alloca [10 x i32], align 4                ; [#uses = 2, size/byte = 40]
  %a2 = alloca [10 x i32], align 4                ; [#uses = 1, size/byte = 40]
  %1 = bitcast [10 x i32]* %a2 to i32*            ; [#uses = 1]
  %2 = insertvalue { i64, i32* } { i64 10, i32* undef }, i32* %1, 1 ; [#uses = 2]
  %3 = bitcast [10 x i32]* %a1 to i32*            ; [#uses = 1]
  %4 = bitcast i32* %3 to i8*                     ; [#uses = 1]
  %.ptr = extractvalue { i64, i32* } %2, 1        ; [#uses = 1]
  %5 = bitcast i32* %.ptr to i8*                  ; [#uses = 1]
  %.len = extractvalue { i64, i32* } %2, 0        ; [#uses = 1]
  %6 = mul i64 %.len, 4                           ; [#uses = 1]
  call void @_d_array_slice_copy(i8* nocapture %4, i64 40, i8* nocapture %5, i64 %6) #1
  %7 = bitcast [10 x i32]* %a1 to i32*            ; [#uses = 1]
  %8 = insertvalue { i64, i32* } { i64 10, i32* undef }, i32* %7, 1 ; [#uses = 0]
  ret void

New IR:

  %a1 = alloca [10 x i32], align 4                ; [#uses = 2, size/byte = 40]
  %a2 = alloca [10 x i32], align 4                ; [#uses = 1, size/byte = 40]
  %1 = bitcast [10 x i32]* %a2 to i32*            ; [#uses = 2]
  %2 = insertvalue { i64, i32* } { i64 10, i32* undef }, i32* %1, 1 ; [#uses = 0]
  %3 = bitcast [10 x i32]* %a1 to i32*            ; [#uses = 1]
  %4 = bitcast i32* %3 to i8*                     ; [#uses = 1]
  %5 = bitcast i32* %1 to i8*                     ; [#uses = 1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %4, i8* align 1 %5, i64 40, i1 false)
  %6 = bitcast [10 x i32]* %a1 to i32*            ; [#uses = 1]
  %7 = insertvalue { i64, i32* } { i64 10, i32* undef }, i32* %6, 1 ; [#uses = 0]
  ret void

This optimization doesn't check for overlap, but neither does the existing optimization for static arrays on both sides (a1 = a2).


As to the actual _d_array_slice_copy -betterC issue, I'm just not convinced inlining in each backend is the way to go, considering the number of still missing similar functions and the hassle of emitting such code (compared to writing 3 a few lines in D), so I disabled that subtest for now.

@kinke
Copy link
Member Author

kinke commented Jan 10, 2019

Btw, the bounds are still checked for

int[10] a1 = void;
int[10] a2 = void;
a1[0..256] = a2[0..256];

(but no subsequent _d_array_slice_copy call anymore).

@kinke
Copy link
Member Author

kinke commented Jan 10, 2019

CircleCI OSX (64-bit only):

c++ -m64 -c runnable/extra-files/cpp_stdlib.cpp -o ../../../../ninja-ldc/dmd-testsuite-debug/runnable/cpp_stdlib.cpp.o -std=c++11 -std=c++11
runnable/extra-files/cpp_stdlib.cpp:1:10: fatal error: 'array' file not found
2980: #include <array>
2980:          ^~~~~~~
2980: 1 error generated.

Any idea what's missing? Incomplete Xcode installation?!

@kinke
Copy link
Member Author

kinke commented Jan 10, 2019

Btw @JohanEngelen, the strange LLVM 7.0 crashes for Travis/Linux are apparently back, master failed too...

@thewilsonator
Copy link
Contributor

-std=c++11 -std=c++11

One of those should be a -stdlib=libc++.

@JohanEngelen
Copy link
Member

considering the number of still missing similar functions

How many functions are we talking about?
What is _d_array_slice_copy supposed to do with -betterC when the slices overlap, lengths disagree, or nullptrs?

@kinke
Copy link
Member Author

kinke commented Jan 10, 2019

How many functions are we talking about?

I don't know exactly, but inlining every non-templated runtime function usable/useful in -betterC (_d_arrayassign_{l,r} etc.) sounds horrifying (in terms of work ;)).

What is _d_array_slice_copy supposed to do with -betterC when the slices overlap, lengths disagree, or nullptrs?

I guess it depends on the new -checkaction switch, but presumably either calling C assert or emitting an unreachable.

@JohanEngelen
Copy link
Member

We could try forcing cross module inlining them? (i.e. inlining them from D source)

@kinke
Copy link
Member Author

kinke commented Jan 10, 2019

Nope, the regular D version throws Errors, and exceptions aren't supported by -betterC. Or do you mean having specific betterC versions of these functions in druntime and cross-module-inlining them? That might work (edit: as would linking against a tiny betterC runtime library instead of druntime), but anyway, I don't see this as high priority. Walter chose to tackle/inline that specific use case and added a test for it; as an LDC -betterC user, I would go ahead and implement my own _d_array_slice_copy in a few lines for the time being (or just use -release if lazy).
Related #2425 is more than a year old.

This stuff (compiler support functions) needs to be tackled with a clear strategy IMO, for bare-metal and -betterC friendliness.

@kinke
Copy link
Member Author

kinke commented Jan 12, 2019

Wrt. 32-bit Linux dmd-testsuite failures, running a single one of them manually (via SSH directly on CI test machine) works. According to the logs, it seems like there are conflicts wrt. generated temp filenames for tests running in parallel (these files are removed immediately after each test):

Test runnable/a17.d failed:       ../../../../ninja-ldc/dmd-testsuite-debug_32/17048cdb74d71bd8683b1617baab89f0b04fa1f3132184ee8ffebe912b90308: No such file or directory
Test runnable/argufilem.d failed: ../../../../ninja-ldc/dmd-testsuite-debug_32/17048cdb74d71bd8683b1617baab89f0b04fa1f3132184ee8ffebe912b90308: No such file or directory
Test runnable/bug16146.d failed:  ../../../../ninja-ldc/dmd-testsuite-debug_32/17048cdb74d71bd8683b1617baab89f0b04fa1f3132184ee8ffebe912b90308: No such file or directory
Test runnable/cov2.d failed:      ../../../../ninja-ldc/dmd-testsuite-debug_32/17048cdb74d71bd8683b1617baab89f0b04fa1f3132184ee8ffebe912b90308: No such file or directory
...

I didn't spot relevant changes in d_do_test.d or std.random though after a quick glance. Edit: I looked at the wrong Phobos branch, there have been std.random changes indeed with v2.084, e.g., dlang/phobos#6580.

@kinke
Copy link
Member Author

kinke commented Jan 13, 2019

Argh I think I found the culprit:

private ulong bootstrapSeed() @nogc nothrow
{
    ulong result = void;
    enum ulong m = 0xc6a4_a793_5bd1_e995UL; // MurmurHash2_64A constant.
    void updateResult(ulong x)
    {
        x *= m;
        x = (x ^ (x >>> 47)) * m;
        result = (result ^ x) * m;
    }
    import core.thread : getpid, Thread;
    import core.time : MonoTime;

    updateResult(cast(ulong) cast(void*) Thread.getThis());
    updateResult(cast(ulong) getpid());
    updateResult(cast(ulong) MonoTime.currTime.ticks);
    //printf(".: bootstrap seed [early]: %llx\n", result);
    result = (result ^ (result >>> 47)) * m;
    result = result ^ (result >>> 47);
    //printf(".: bootstrap seed: %llx\n", result);
    return result;
}

With -O, the result of this is undef in IR, due to result not being initialized (= void) and the return value depending on its initialization... The intent probably was to make the initial seed somewhat more random, but it completely breaks down with the LLVM optimizer.

@JohanEngelen
Copy link
Member

Nice find. Small note: the optimizer plays no role. Using an uninitialized value is UB in D. https://dlang.org/spec/declaration.html#void_init

@kinke
Copy link
Member Author

kinke commented Jan 14, 2019

Finally green.

@n8sh
Copy link
Contributor

n8sh commented Jan 14, 2019

Upstreamed the std.random fix.
dlang/phobos#6833

@JohanEngelen
Copy link
Member

The core.thread issue with -O in Windows is probably because those tests are wrong.

mixin("asm pure nothrow @nogc { mov "~REG~", 0xFFFFFFFFFFFFFFFF; }");
zeroRegister.call();
mixin("asm pure nothrow @nogc { mov after, "~REG~"; }");

Allthough callee zeroRegister.call() shouldn't change the register tested (before==after the call), but the caller itself (testNonvolatileRegister) is allowed to change the registers in between those 3 lines of code: setting up the call to zeroRegister.call() may involve changing those registers.

@kinke
Copy link
Member Author

kinke commented Jan 14, 2019

I extracted the test in a separate module and looked at the optimized IR (not the asm) - it was fine, register being set, then the zeroRegister.call() call (the failing registers aren't used for passing/returning values across calls by the Win64 ABI), then register being read again. The inline asm constraints/clobbers were okay too. And it also worked as expected.
So I guess it's due to some additional inlining going on if directly in core.thread, where the Fiber implementation and the support functions reside. What's strange is that that module wasn't touched at all.

@kinke kinke changed the title WIP: Upgrade front-end & libs to v2.084.0 Upgrade front-end & libs to v2.084.0 Jan 16, 2019
@kinke kinke merged commit f5a5324 into ldc-developers:master Jan 18, 2019
@kinke kinke deleted the merge-2.084 branch January 18, 2019 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants