Optimize static array comparisons to a memcmp call for types for which this is valid. #1719
  if (ltype->ty != Tsarray)
    return false;

  auto *elemType = ltype->nextOf()->toBasetype();
There's a function somewhere (I don't recall its name right now) which descends to the first non-static-array element type. By using that, you can get rid of the recursion in the function above.
You also need to check for a compatible rhs type. This is valid but obviously not suited for memcmp (and should be part of a test):
int[3] ia = [ 1, 2, 3 ];
short[3] sa = [ 1, 2, 3 ];
assert(ia == sa);
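A C++ analogue of this test (assuming the usual 32-bit `int` and 16-bit `short`) shows why the rhs element type has to be checked: the two arrays are equal element by element, but their bytes do not line up, so a memcmp over them reports a difference. `elementwiseEqual` and `naiveMemcmpEqual` are illustrative names only.

```cpp
#include <cstddef>
#include <cstring>

// Element-wise comparison: this is what == means for arrays whose element
// types differ (each short is promoted to int before comparing).
template <class T, class U, std::size_t N>
bool elementwiseEqual(const T (&a)[N], const U (&b)[N]) {
  for (std::size_t i = 0; i < N; ++i)
    if (!(a[i] == b[i]))
      return false;
  return true;
}

int ia[3] = {1, 2, 3};
short sa[3] = {1, 2, 3};

// A naive byte-wise comparison: wrong here, because each int occupies
// 4 bytes and each short 2, so the byte layouts of the arrays differ.
bool naiveMemcmpEqual() {
  return std::memcmp(ia, sa, sizeof(sa)) == 0;
}
```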
OK, didn't know that.
But the recursion isn't so bad, I think? validCompareWithMemcmpType will become recursive again when someone implements the logic for Tstruct, and then the Tsarray logic is also needed there.
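With Tstruct added, the recursion could look roughly like this C++ sketch. It reuses the name `validCompareWithMemcmpType` from the discussion, but the body and the type model (`Kind`, `TypeDesc`, `Field`) are hypothetical and far simpler than LDC's front-end types. The rules encoded here are a conservative assumption, not necessarily what this PR implements: integers and pointers compare bitwise, floating point does not, static arrays recurse into their element type, and structs qualify only without padding or a custom equality.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified type model; this is NOT LDC's Type/TY API.
enum class Kind { Integral, Pointer, Floating, StaticArray, Struct };

struct TypeDesc;

struct Field {
  const TypeDesc *type;
  std::size_t offset;  // byte offset within the struct
};

struct TypeDesc {
  Kind kind;
  std::size_t size;                // storage size in bytes
  const TypeDesc *elem = nullptr;  // element type (StaticArray only)
  std::vector<Field> fields;       // fields sorted by offset (Struct only)
  bool customEquals = false;       // e.g. a user-defined opEquals (Struct only)
};

// True if `a == b` on values of type t may be lowered to memcmp, i.e.
// bitwise equality coincides with the language's ==.
bool validCompareWithMemcmpType(const TypeDesc &t) {
  switch (t.kind) {
  case Kind::Integral:
  case Kind::Pointer:
    return true;
  case Kind::Floating:
    return false;  // NaN != NaN, and +0.0 == -0.0 despite different bits
  case Kind::StaticArray:
    return validCompareWithMemcmpType(*t.elem);  // recurse into element type
  case Kind::Struct: {
    if (t.customEquals)
      return false;
    std::size_t covered = 0;
    for (const Field &f : t.fields) {
      if (f.offset != covered)  // padding (or overlap) before this field
        return false;
      if (!validCompareWithMemcmpType(*f.type))
        return false;
      covered += f.type->size;
    }
    return covered == t.size;  // reject tail padding
  }
  }
  return false;
}
```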
Thanks a lot for the testcase; bad assumption on my part. I'll just check that both lhs and rhs types are exactly the same then?
Edit: interestingly, the memcmp call is not emitted for int[3] == short[3]. I wasn't expecting that, lol. So it already works, but a testcase certainly needs to be added. Gotta go now.
Alright, the recursion may really be needed for structs later on.
I'll just check that both lhs and rhs types are exactly the same then?
I always find this a bit tricky. 'Exactly the same' includes const/immutable modifiers, which don't matter here. There's a stripModifiers() function or so (which already offers the recursion we need here), but I find that one a bit tedious to use. An idea might be to compare the LLVM types instead: DtoMemType(l->type) == DtoMemType(r->type). That would work for char[] == byte[] and even char[] == bool[] (both are allowed by the front-end), and for integer signedness mismatches too.
Thanks for doing this, people will appreciate it I'm sure.
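The LLVM-type idea can be approximated outside LDC with a small C++ sketch. `sameMemoryRepresentation` is a hypothetical helper, not `DtoMemType`; it only considers integral element types, ignores const/volatile (as comparing memory types would), and accepts pairs of equal size. One assumption it makes explicit: below `int` width, mixed signedness is rejected, because integer promotion (the same rule applies in D) makes a signed byte holding -1 and an unsigned byte holding 255 compare unequal although their bytes are identical.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for "compare the LLVM memory types" of two element
// types: same size, cv-qualifiers ignored, integral types only.
template <class A, class B>
constexpr bool sameMemoryRepresentation() {
  using L = std::remove_cv_t<A>;
  using R = std::remove_cv_t<B>;
  if constexpr (!std::is_integral_v<L> || !std::is_integral_v<R>) {
    return false;  // floats, structs, etc. are out of scope for this sketch
  } else {
    if (sizeof(L) != sizeof(R))
      return false;
    // Below int width, values are promoted to int before comparing, so a
    // signedness mismatch changes the result; at int width and above, both
    // sides convert to the unsigned type and the bit patterns decide.
    return std::is_signed_v<L> == std::is_signed_v<R> ||
           sizeof(L) >= sizeof(int);
  }
}

static_assert(sameMemoryRepresentation<const int, int>(), "const is ignored");
static_assert(sameMemoryRepresentation<std::int32_t, std::uint32_t>(),
              "int-sized signedness mismatch is fine");
static_assert(!sameMemoryRepresentation<std::int8_t, std::uint8_t>(),
              "byte-sized signedness mismatch is not");
```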
When the types differ (int[3] == short[3], but also byte[3] == char[3]), the front-end lowers the comparison into a call to, e.g., object._ArrayEq!(byte, char)._ArrayEq(byte[], char[]). So optimizing those cases will require considerably more, and different, work.
I will add a check that the types have to be equivalent (ignoring constness), just in case.
Sorry for the trouble. I suspect it is again displaying a different definition of "string" from another DLL or static library. Making the symbol search case sensitive might help. You can switch this by adding a line

gen/arrays.cpp
bool validCompareWithMemcmp(DValue *l, DValue *r) {
  auto *ltype = l->type->toBasetype();

  // Only static arrays are potentially compared using memcmp.
Why not dynamic arrays? It seems like all this would require is an extra icmp for the length members.
And maybe a fast-return if the pointers match too, to optimize checks against the same memory.
All left for future work. It just adds complexity.
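The dynamic-array extension suggested above might look roughly like this C++ sketch. `Slice` and `sliceEquals` are hypothetical names; the sketch mirrors a D slice as a `{ length, ptr }` pair, assumes an element type whose `==` is plain bitwise equality, and includes the pointer-identity fast path.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical mirror of a D slice: the { length, ptr } pair that shows up
// as { i64, i8* } in LDC's IR.
template <class T>
struct Slice {
  std::size_t length;
  const T *ptr;
};

// memcmp-based slice equality; only valid for element types whose ==
// is bitwise equality (e.g. integers).
template <class T>
bool sliceEquals(Slice<T> a, Slice<T> b) {
  if (a.length != b.length)  // the extra length comparison
    return false;
  if (a.length == 0 || a.ptr == b.ptr)  // fast path: empty or same memory
    return true;
  return std::memcmp(a.ptr, b.ptr, a.length * sizeof(T)) == 0;
}
```

Checking `length == 0` before calling memcmp also avoids passing a possibly null pointer to it, which is undefined behavior in C and C++ even for a zero size.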
The comment is not very useful, since it exactly replicates the content of the code itself, yet leaves the question of why unanswered.
I've improved the comments.
Also improved symbol loading in #1731

AppVeyor jobs retriggered.

Wow, something bad is happening here. In master, With this PR, the x86 test takes a massive 6:21, and the x64 job times out after more than 26 minutes for that single test! Pinging @rainers.
I suspect cdb is trying to load symbols from the MS symbol servers. Passing

Now the Windows unittest fails on a uuid.d assert, line 944. I've looked at the IR (Mac and Windows), but for that particular comparison, memcmp is not used (it is a dynamic-array comparison).

AppVeyor is very strange today. For your x64 run:
Edit: Sorry, this has nothing to do with your issue. It's the debuginfo tests with cdb that take forever (unfortunately, on my box too).

On your box too? Should be easy to fix then? Otherwise, I'd say we back out the tests for now.
I suspect that it is again the next test that takes so long: codeview.d ;-( cvbasictypes.d also took more than 3 minutes.
Can you run them without redirecting the output and see what it is doing? The only cause of slowdowns I've seen is loading symbols from the symbol servers.
But that should be disabled in master now?
Yes, that's what I thought. With cdb from the Windows 10 SDK, I also see a pause (less than a minute) when explicitly loading the symbols of the executable. That doesn't happen with cdb from the Windows 8.1 SDK.
Do you mean due to network traffic? 30s or whatever it is still seems awfully long for loading the symbols of an executable.
Yes. It is looking for codeview.exe's symbols on the MS symbol servers, too. I'm trying to disable that...

I hope this helps: #1743
Nope. I can reproduce it here, though; it starts with

Well, it's not all nulls; the dashes are part of the string. This is more complete:
// printf("id = 0x%p 0x%p\n", id.ulongs[0], id.ulongs[1]);
id = 0x234FBA2C0E06B38A 0x46FBBDB32DB54CB7
// printf("str = 0x%p (%.36s)\ns = 0x%p (%.36s)\n", str.ptr, str.ptr, s.ptr, s.ptr);
str = 0x000000C43AAFF710 (00000000-0000-0000-0000-000000000000)
s = 0x00007FF60AA297B0 (8ab3060e-2cba-4f23-b74c-b52db3bdfb46)
So it looks as if the
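As a cross-check of the printf output above, this C++ sketch takes the 16 bytes of `@.arrayliteral` from the IR dump in this thread, formats them as a canonical UUID string and reads them as two little-endian 64-bit words. Both results agree with `s` and with the printed `id`, which suggests `id` holds the right bytes, while `str` looks like the string for an all-zero UUID. `uuidToString` and `loadLE64` are illustrative helpers, not Phobos's `std.uuid` code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// The UUID bytes as emitted in @.arrayliteral (\8A\B3\06\0E,\BAO#...).
constexpr std::array<std::uint8_t, 16> kIdBytes = {
    0x8A, 0xB3, 0x06, 0x0E, 0x2C, 0xBA, 0x4F, 0x23,
    0xB7, 0x4C, 0xB5, 0x2D, 0xB3, 0xBD, 0xFB, 0x46};

// Canonical 8-4-4-4-12 lowercase hex form (a stand-in for UUID.toString).
std::string uuidToString(const std::array<std::uint8_t, 16> &b) {
  static const char hex[] = "0123456789abcdef";
  std::string s;
  for (std::size_t i = 0; i < b.size(); ++i) {
    if (i == 4 || i == 6 || i == 8 || i == 10)
      s += '-';
    s += hex[b[i] >> 4];
    s += hex[b[i] & 0xF];
  }
  return s;
}

// Reads 8 bytes as a little-endian 64-bit word regardless of host byte
// order, matching what printf showed for id.ulongs[0] and id.ulongs[1].
std::uint64_t loadLE64(const std::array<std::uint8_t, 16> &b,
                       std::size_t offset) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < 8; ++i)
    v |= std::uint64_t{b[offset + i]} << (8 * i);
  return v;
}
```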
Well, this could very well be another symptom of #1324, as we have a compile-time instance of a struct with a union again.

I don't know what's going on here :(
Edit: "fix Phobos" would mean changing "enum" to "auto" in the unittest, basically going from CTFE to runtime, hiding the union bug.

AppVeyor retriggered after merging #1846 and also verified locally: no change, unfortunately.

:(

@UplinkCoder: Sure. I might have implemented the struct/class reference lowering in LDC. My point is that the lack of identity is supposed to be non-observable at runtime. If it is observable (e.g., if the data is mutable, like in issue 15989), it's mainly an accepts-invalid DMD bug.

@klickverbot You certainly know a lot more about LDC internals than I do (I know nothing about them).

The unoptimized IR (I've used the merge-2.072 branch, without this PR merged in) looks absolutely fine:
import std.uuid;
void main()
{
import std.encoding : Char = AsciiChar;
enum utfstr = "8ab3060e-2cba-4f23-b74c-b52db3bdfb46";
alias String = immutable(Char)[];
enum String s = cast(String)utfstr;
enum id = UUID(utfstr);
Char[36] str;
id.toString(str[]);
assert(str == s);
}

%std.uuid.UUID = type { [16 x i8] }
; data for `enum UUID id`
@.arrayliteral = internal unnamed_addr constant [16 x i8] c"\8A\B3\06\0E,\BAO#\B7L\B5-\B3\BD\FBF" ; [#uses = 1]
; `enum String s`, used in the comparison
@.str = private unnamed_addr constant [37 x i8] c"8ab3060e-2cba-4f23-b74c-b52db3bdfb46\00" ; [#uses = 1]
%str = alloca [36 x i8], align 1 ; [#uses = 4, size/byte = 36]
; enum UUID id
%.structliteral = alloca %std.uuid.UUID, align 8 ; [#uses = 2, size/byte = 16]
; zero-initialize `str`
%1 = bitcast [36 x i8]* %str to i8* ; [#uses = 1]
call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 36, i32 1, i1 false)
; [...]
; initialize single `id` field via memcpy from @.arrayliteral
%4 = getelementptr inbounds %std.uuid.UUID, %std.uuid.UUID* %.structliteral, i32 0, i32 0 ; [#uses = 1, type = [16 x i8]*]
%5 = bitcast [16 x i8]* %4 to i8* ; [#uses = 1]
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %5, i8* getelementptr inbounds ([16 x i8], [16 x i8]* @.arrayliteral, i32 0, i32 0), i64 16, i32 1, i1 false)
; str[]
%6 = bitcast [36 x i8]* %str to i8* ; [#uses = 1]
%7 = insertvalue { i64, i8* } { i64 36, i8* undef }, i8* %6, 1 ; [#uses = 1]
; id.toString(str[])
call void @_D3std4uuid4UUID39__T8toStringTAE3std8encoding9AsciiCharZ8toStringMxFNaNbNiNfMAE3std8encoding9AsciiCharZv(%std.uuid.UUID* nonnull %.structliteral, { i64, i8* } %7) #0
; str[]
%8 = bitcast [36 x i8]* %str to i8* ; [#uses = 1]
%9 = insertvalue { i64, i8* } { i64 36, i8* undef }, i8* %8, 1 ; [#uses = 1]
; str == s
%10 = call i32 @_adEq2({ i64, i8* } %9, { i64, i8* } { i64 36, i8* getelementptr inbounds ([37 x i8], [37 x i8]* @.str, i32 0, i32 0) }, %object.TypeInfo* bitcast (%"typeid(AsciiChar[])"* @_D34TypeInfo_AE3std8encoding9AsciiChar6__initZ to %object.TypeInfo*)) #2 ; [#uses = 1]

My results back then via printf clearly indicated that the run-time data of
@UplinkCoder: Please have a look at the interesting comment in https://github.com/dlang/phobos/blob/master/std/uuid.d#L353.

For the record: I'm absolutely eager to merge this and am in favor of just using

This PR doesn't affect the problematic unittest itself IR-wise; two slices are compared, and this PR only supports static arrays so far.

Shall I modify Phobos and merge this?

Seems reasonable, although introducing a known miscompilation leaves a bit of a stale taste… I don't have time to look into the issue any more closely right now, though.

Yeah... this is a strange bug. I'm not sure whether it is a miscompile or a unittest bug. My current feeling is that it is a unittest bug, exposed by aggressive optimization enabled by this PR.

Yep… I'd say let's merge it and keep a close eye on it throughout the 1.2 beta phase.

Green after the Phobos modification.

Stefan suggested a

With
…h this is valid. Resolves ldc-developers#1632

Travis is not retriggering :(

Yay, finally. ;) I don't think it'll make a huge impact in its current form. As soon as slices are supported (should be straightforward), that will definitely change, and I actually expect noticeable performance improvements for client code (incl. synthetic benchmarks, where LDC and D in general could even shine a bit more). And we should work on supporting suitable structs soon too.
Resolves #1632.