Optimize static array comparisons to a memcmp call for types for which this is valid. by JohanEngelen · Pull Request #1719 · ldc-developers/ldc

JohanEngelen · 2016-08-25T14:03:13Z

Resolves #1632.

kinke · 2016-08-25T17:46:21Z

gen/arrays.cpp

+  if (ltype->ty != Tsarray)
+    return false;
+
+  auto *elemType = ltype->nextOf()->toBasetype();


There's a function somewhere (I don't recall its name right now) which descends to the first non-static-array element type. By using that, you can get rid of the recursion in the function above.

You also need to check for a compatible rhs type. This is valid but obviously not suited for memcmp (and should be part of a test):

int[3] ia = [ 1, 2, 3 ]; short[3] sa = [ 1, 2, 3 ]; assert(ia == sa);

OK, didn't know that.
But the recursion isn't so bad, I think? validCompareWithMemcmpType will become recursive again when someone implements the logic for Tstruct, and then the Tsarray logic is also needed there.

Thanks a lot about the testcase, bad assumption on my part. I'll just check that both lhs and rhs types are exactly the same then?

Edit: interestingly, the memcmp call is not emitted for int[3] == short[3]. I wasn't expecting that lol. So it already works, but testcase needs to be added certainly. Gotta go now.

Alright, the recursion may really be needed for structs later on.

I'll just check that both lhs and rhs types are exactly the same then?

I always find this a bit tricky. 'Exactly the same' includes const/immutable modifiers, which don't matter here. There's a stripModifiers() function or so (which already offers recursion, which we need here), but I find that one a bit tedious to use. An idea might be checking the LLVM types: DtoMemType(l->type) == DtoMemType(r->type) (e.g., this would work for char[] == byte[] and even char[] == bool[], both are allowed by the front-end, and work for integer signedness-mismatches too).

Thanks for doing this, people will appreciate it I'm sure.

For when the types are different (int[3] == short[3] but also byte[3] == char[3]), the front-end lowers the code into a call to (e.g.) object._ArrayEq!(byte, char)._ArrayEq(byte[], char[]). So optimizing those will require much more and different work.
I will add a check that the types have to be equivalent (ignoring constness), just in case.

JohanEngelen · 2016-09-01T13:07:46Z

tests\debuginfo\codeview.d fails on Windows, but I don't see how it is related to this PR (no array comparisons happening in codeview.d afaict). @rainers Do you know what's going on? Thanks.

rainers · 2016-09-01T19:51:26Z

@rainers Do you know what's going on? Thanks.

Sorry for the trouble. I suspect it is again displaying a different definition of "string" from another DLL or static library. Making the symbol search case sensitive might help. You can switch this by adding a line // CDB: .symopt-1 before // CDB: dt string in codeview.d.

dnadlinger · 2016-09-01T21:38:46Z

gen/arrays.cpp

+bool validCompareWithMemcmp(DValue *l, DValue *r) {
+  auto *ltype = l->type->toBasetype();
+
+  // Only static arrays are potentially compared using memcmp.


Why not dynamic arrays? It seems like all this would require is an extra icmp for the length members.

And maybe a fast-return if the pointers match too, to optimize checks against the same memory.

All left for future work. It just adds complexity.

The comment is not very useful, since it exactly replicates the content of the code itself, yet leaves the question as to why unanswered.

I've improved the comments.

rainers · 2016-09-02T08:49:55Z

You can switch this by adding a line // CDB: .symopt-1 before // CDB: dt string in codeview.d.

Also improved symbol loading in #1731

kinke · 2016-09-02T14:57:06Z

AppVeyor jobs retriggered.

kinke · 2016-09-02T16:50:37Z

Wow, something bad is happening here. In master, tests/debuginfo/codeview.d takes 1:43 mins for x64 and 0:23 for x86 (when hovering over an AppVeyor log line, a tooltip pops up with a timestamp).

With this PR, the x86 test takes a massive 6:21, and the x64 job times out after more than 26 minutes for that single test! Pinging @rainers.

rainers · 2016-09-02T17:23:07Z

Pinging @rainers.

I suspect cdb is trying to load symbols from the MS symbol servers. Passing -snul (disables automatic symbol loading for unqualified names) to the cdb command line might help.

JohanEngelen · 2016-09-04T13:22:07Z

Now the Windows unittest fails on a uuid.d assert, line 944. I've looked at the IR (Mac and Windows), but for that particular comparison, memcmp is not used (it is a dyn array comparison).
Anyone know what's going on?

kinke · 2016-09-04T14:24:25Z

AppVeyor is very strange today. For your x64 run:

1/1 Test #957: lit-tests ........................   Passed  1230.67 sec

tests/codegen/inlining_templates.d takes 17 minutes. For my LLVM 3.9 final PR, it even took almost 25 minutes. So I'm not sure AppVeyor can be trusted today...

Edit: Sorry, this has nothing to do with your issue. It's the debuginfo tests with cdb that take forever (unfortunately, on my box too).

dnadlinger · 2016-09-04T14:41:18Z

On your box too? Should be easy to fix then? Otherwise, I'd say we back out the tests for now.

rainers · 2016-09-04T17:03:41Z

tests/codegen/inlining_templates.d takes 17 minutes

I suspect that it is again the next test that takes so long: codeview.d ;-( cvbasictypes.d also too more than 3 minutes.

rainers · 2016-09-04T17:05:51Z

It's the debuginfo tests with cdb that take forever (unfortunately, on my box too).

Can you run them without redirecting the output and see what it is doing? The only issue I've seen as a cause for slowdoen is loading symbols from the symbol servers.

dnadlinger · 2016-09-04T17:15:52Z

The only issue I've seen as a cause for slowdoen is loading symbols from the symbol servers.

But that should be disabled in master now?

rainers · 2016-09-04T17:22:47Z

But that should be disabled in master now?

Yes, that's what I thought. With cdb from the Windows 10 SDK, I also see a pause (less than a minute) when loading the symbols of the executable explicitely. That doesn't happen for cdb from the Windows 8.1 SDK.

dnadlinger · 2016-09-04T17:23:52Z

With cdb from the Windows 10 SDK, I also see a pause (less than a minute) when loading the symbols of the executable explicitely.

Do you mean, due to network traffic? 30s or whatever it is still seems awfully long for loading symbols for an executable.

rainers · 2016-09-04T17:34:29Z

Do you mean, due to network traffic? 30s or whatever it is still seems awfully long for loading symbols for an executable.

Yes. It is looking for codeview.exe`s symbols on the MS symbol servers, too. I'm trying to disable that...

rainers · 2016-09-04T17:42:23Z

I hope this helps: #1743

kinke · 2016-09-05T20:22:29Z

Anyone know what's going on?

Nope. I can reproduce it here though, it starts with -O2. It works fine if the unittest is isolated in its own module. But a printf in the original source right before the assertion reveals that str (the runtime array) is all set to nulls...

str = 0x000000C2199FF4E0 (00000000-0000-0000-0000-000000000000)
s   = 0x00007FF7276097B0 (8ab3060e-2cba-4f23-b74c-b52db3bdfb46)

kinke · 2016-09-05T20:32:12Z

Well, it's not all nulls, the dashes are part of the string. This is more complete:

// printf("id = 0x%p 0x%p\n", id.ulongs[0], id.ulongs[1]);
id = 0x234FBA2C0E06B38A 0x46FBBDB32DB54CB7
// printf("str = 0x%p (%.36s)\ns   = 0x%p (%.36s)\n", str.ptr, str.ptr, s.ptr, s.ptr);
str = 0x000000C43AAFF710 (00000000-0000-0000-0000-000000000000)
s   = 0x00007FF60AA297B0 (8ab3060e-2cba-4f23-b74c-b52db3bdfb46)

So it looks as if the this pointer in id.toString(str[]) (id is an enum struct std.uuid.UUID) could point to something else with all zeros.

kinke · 2016-09-05T20:43:12Z

Well, this could very well be another symptom of #1324, as we have a compile-time instance of a struct with a union again.

JohanEngelen · 2016-09-06T19:01:56Z

I don't know what's going on here :(
I can reproduce it too on Windows, but with very slight changes, the bug is gone. Definitely something with unions+CTFE. I can fix Phobos so the bug is gone but... :'( :'(
It is also very strange that it depends on the LLVM optimization level.

Edit: "fix Phobos" would mean changing "enum" to "auto" in the unittest, basically going from ctfe to runtime, hiding the union bug.

kinke · 2016-10-25T22:47:27Z

AppVeyor retriggered after merging #1846 and also verified locally - no change unfortunately, std.uuid:944 still asserts starting with -O2 on Win64. :(

JohanEngelen · 2016-10-26T07:41:07Z

:(

dnadlinger · 2017-02-09T23:45:09Z

@UplinkCoder: Sure. I might have implemented the struct/class reference lowering in LDC. My point is that the lack of identity is supposed to be non-observable at runtime. If it is (e.g. if the data is mutable, like in issue 15989), it's mainly an accepts-invalid DMD bug.

UplinkCoder · 2017-02-09T23:52:30Z

@klickverbot You know certainly alot more about ldc internals then I do (I know nothing about them).
@JohanEngelen If this were an issue with ctfe it should not be affected by the optimization level.
Maybe the optimizer assumes some things that are not true ?

kinke · 2017-02-10T15:45:22Z

The unoptimized IR (I've used the merge-2.072 branch, this PR not merged in) looks absolutely fine:

import std.uuid;

void main()
{
    import std.encoding : Char = AsciiChar;
    enum  utfstr = "8ab3060e-2cba-4f23-b74c-b52db3bdfb46";
    alias String = immutable(Char)[];
    enum String s = cast(String)utfstr;
    enum id = UUID(utfstr);
    Char[36] str;
    id.toString(str[]);
    assert(str == s);
}

%std.uuid.UUID = type { [16 x i8] }

; data for `enum UUID id`
@.arrayliteral = internal unnamed_addr constant [16 x i8] c"\8A\B3\06\0E,\BAO#\B7L\B5-\B3\BD\FBF" ; [#uses = 1]
; `enum String s`, used in the comparison
@.str = private unnamed_addr constant [37 x i8] c"8ab3060e-2cba-4f23-b74c-b52db3bdfb46\00" ; [#uses = 1]

  %str = alloca [36 x i8], align 1                ; [#uses = 4, size/byte = 36]
  ; enum UUID id
  %.structliteral = alloca %std.uuid.UUID, align 8 ; [#uses = 2, size/byte = 16]
  ; zero-initialize `str`
  %1 = bitcast [36 x i8]* %str to i8*             ; [#uses = 1]
  call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 36, i32 1, i1 false)
  ; [...]
  ; initialize single `id` field via memcpy from @.arrayliteral
  %4 = getelementptr inbounds %std.uuid.UUID, %std.uuid.UUID* %.structliteral, i32 0, i32 0 ; [#uses = 1, type = [16 x i8]*]
  %5 = bitcast [16 x i8]* %4 to i8*               ; [#uses = 1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %5, i8* getelementptr inbounds ([16 x i8], [16 x i8]* @.arrayliteral, i32 0, i32 0), i64 16, i32 1, i1 false)
  ; str[]
  %6 = bitcast [36 x i8]* %str to i8*             ; [#uses = 1]
  %7 = insertvalue { i64, i8* } { i64 36, i8* undef }, i8* %6, 1 ; [#uses = 1]
  ; id.toString(str[])
  call void @_D3std4uuid4UUID39__T8toStringTAE3std8encoding9AsciiCharZ8toStringMxFNaNbNiNfMAE3std8encoding9AsciiCharZv(%std.uuid.UUID* nonnull %.structliteral, { i64, i8* } %7) #0
  ; str[]
  %8 = bitcast [36 x i8]* %str to i8*             ; [#uses = 1]
  %9 = insertvalue { i64, i8* } { i64 36, i8* undef }, i8* %8, 1 ; [#uses = 1]
  ; str == s
  %10 = call i32 @_adEq2({ i64, i8* } %9, { i64, i8* } { i64 36, i8* getelementptr inbounds ([37 x i8], [37 x i8]* @.str, i32 0, i32 0) }, %object.TypeInfo* bitcast (%"typeid(AsciiChar[])"* @_D34TypeInfo_AE3std8encoding9AsciiChar6__initZ to %object.TypeInfo*)) #2 ; [#uses = 1]

My results back then via printf clearly indicated that the run-time data of id was garbage/zero-initialized. The this pointer of id (the %.structliteral alloca) is fine, and it's initialized. I don't think the LLVM optimizer chokes on something as simple as this. So it's very likely that the problem lies somewhere in the front-end when constructing enum id = UUID(utfstr); at compile-time and ending up with the (parsed) 16 bytes of UUID runtime data (@.arrayliteral in LLVM IR).

@UplinkCoder: Please have a look at the interesting comment in https://github.com/dlang/phobos/blob/master/std/uuid.d#L353.

kinke · 2017-02-16T22:58:24Z

For the record: I'm absolutely eager to merge this and am in favor of just using static immutable id = UUID(utfstr); for LDC to work around this apparently invalid or at least problematic unittest.

kinke · 2017-02-17T18:22:00Z

This PR doesn't affect the problematic unittest itself IR-wise; two slices are compared, and this PR only supports static arrays so far.

JohanEngelen · 2017-02-19T21:43:34Z

Shall I modify Phobos and merge this?
(and open a new issue with the reduced problematic testcase from this PR, so we can investigate there what is going on?)

dnadlinger · 2017-02-19T21:47:12Z

Seems reasonable, although introducing a known miscompilation leaves a bit of a stale taste… I don't have time to look into the issue any more closely right now, though.

JohanEngelen · 2017-02-19T21:52:51Z

Yeah... this is a strange bug. I'm not sure whether it is a miscompile or a unittest bug. My current feeling is that it is a unittest bug, exposed by aggressive optimization enabled by this PR.

dnadlinger · 2017-02-19T21:58:51Z

Yep… I'd say let's merge it and keep a close eye on it throughout the 1.2 beta phase.

JohanEngelen · 2017-02-22T22:53:26Z

green after phobos modification.

kinke · 2017-02-22T23:20:55Z

Stefan suggested a static immutable instead of auto, did you try that?

JohanEngelen · 2017-02-28T23:30:38Z

With static immutable we don't have to change the nothrow @nogc. So if that works, great. Updated the PR.

…h this is valid. Resolves ldc-developers#1632

JohanEngelen · 2017-02-28T23:39:36Z

Travis is not retriggering :(

kinke · 2017-03-01T19:57:31Z

Yay, finally. ;)

I don't think it'll make a huge impact in the current form. As soon as slices are supported (should be straight foward), that will definitely change, and I actually expect noticeable performance improvements for client code (incl. synthetic benchmarks, where LDC and D in general could even shine a bit more). And we should work on supporting suited structs soon too.

JohanEngelen mentioned this pull request Aug 25, 2016

Array equal code #1632

Closed

kinke reviewed Aug 25, 2016
View reviewed changes

JohanEngelen force-pushed the gh1632 branch from 7fc60e0 to fadbf56 Compare August 31, 2016 18:50

dnadlinger reviewed Sep 1, 2016
View reviewed changes

JohanEngelen force-pushed the gh1632 branch from 3f2f40f to 1efb1c3 Compare September 5, 2016 08:34

kinke mentioned this pull request Oct 23, 2016

Fix union layout and initialization #1846

Merged

JohanEngelen force-pushed the gh1632 branch from 1efb1c3 to 077efbe Compare February 16, 2017 20:44

JohanEngelen force-pushed the gh1632 branch from 077efbe to 02b7fff Compare February 19, 2017 22:31

JohanEngelen added a commit to JohanEngelen/ldc that referenced this pull request Feb 19, 2017

Update phobos: work around an issue exposed by PR ldc-developers#1719

16b3de2

JohanEngelen force-pushed the gh1632 branch from 02b7fff to 0794c2b Compare February 22, 2017 20:40

JohanEngelen added a commit to JohanEngelen/ldc that referenced this pull request Feb 22, 2017

Update phobos: work around an issue exposed by PR ldc-developers#1719

95eae71

Update phobos: work around an issue exposed by PR ldc-developers#1719

9ee217e

JohanEngelen force-pushed the gh1632 branch from 0794c2b to a8bf2d3 Compare February 28, 2017 23:30

JohanEngelen closed this Feb 28, 2017

JohanEngelen reopened this Feb 28, 2017

Optimize static array comparisons to a memcmp call for types for whic…

bd09cdc

…h this is valid. Resolves ldc-developers#1632

JohanEngelen force-pushed the gh1632 branch from a8bf2d3 to bd09cdc Compare February 28, 2017 23:38

JohanEngelen closed this Mar 1, 2017

JohanEngelen reopened this Mar 1, 2017

kinke merged commit 33f226e into ldc-developers:master Mar 1, 2017

JohanEngelen deleted the gh1632 branch March 1, 2017 20:20

JohanEngelen mentioned this pull request Mar 21, 2017

Implement optimization: compare slices using memcmp if valid #2047

Merged

Uh oh!

Conversation

JohanEngelen commented Aug 25, 2016

Uh oh!

kinke Aug 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JohanEngelen Aug 25, 2016

Choose a reason for hiding this comment

Uh oh!

JohanEngelen Aug 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kinke Aug 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JohanEngelen Aug 31, 2016

Choose a reason for hiding this comment

Uh oh!

JohanEngelen commented Sep 1, 2016

Uh oh!

rainers commented Sep 1, 2016

Uh oh!

dnadlinger Sep 1, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kinke Sep 1, 2016

Choose a reason for hiding this comment

Uh oh!

JohanEngelen Sep 1, 2016

Choose a reason for hiding this comment

Uh oh!

dnadlinger Sep 1, 2016

Choose a reason for hiding this comment

Uh oh!

JohanEngelen Sep 4, 2016

Choose a reason for hiding this comment

Uh oh!

rainers commented Sep 2, 2016

Uh oh!

kinke commented Sep 2, 2016

Uh oh!

kinke commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rainers commented Sep 2, 2016

Uh oh!

JohanEngelen commented Sep 4, 2016

Uh oh!

kinke commented Sep 4, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dnadlinger commented Sep 4, 2016

Uh oh!

rainers commented Sep 4, 2016

Uh oh!

rainers commented Sep 4, 2016

Uh oh!

dnadlinger commented Sep 4, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rainers commented Sep 4, 2016

Uh oh!

dnadlinger commented Sep 4, 2016

Uh oh!

rainers commented Sep 4, 2016

Uh oh!

rainers commented Sep 4, 2016

Uh oh!

kinke commented Sep 5, 2016

Uh oh!

kinke commented Sep 5, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kinke commented Sep 5, 2016

Uh oh!

kinke Aug 25, 2016 •

edited

Loading

JohanEngelen Aug 25, 2016 •

edited

Loading

kinke Aug 25, 2016 •

edited

Loading

dnadlinger Sep 1, 2016 •

edited

Loading

kinke commented Sep 2, 2016 •

edited

Loading

kinke commented Sep 4, 2016 •

edited

Loading

dnadlinger commented Sep 4, 2016 •

edited

Loading

kinke commented Sep 5, 2016 •

edited

Loading

JohanEngelen commented Sep 6, 2016 •

edited

Loading

kinke commented Feb 10, 2017 •

edited

Loading

kinke commented Feb 22, 2017 •

edited

Loading