Implement optimization: compare slices using memcmp if valid #2047
kinke merged 5 commits into ldc-developers:master
@9il, is this relevant for you?
Nice that you followed up on this.
Makes perfect sense to me; it's the user's responsibility to provide a valid slice. The only exception, IMO, is that two slices of length 0 should compare equal regardless of their start addresses.
You mean
I'm rather surprised the front-end allows comparisons such as this. Isn't that a compile-time `false`, detectable in the front-end?
gen/arrays.cpp (outdated)
```cpp
assert(l_ptr && r_ptr && num_elements);
LLFunction *fn = getRuntimeFunction(loc, gIR->module, "memcmp");
assert(fn);
auto size_in_bytes = num_elements;
```
Fixed it. I kept the `l_ptr` underscore naming; I find it reads a little easier.
gen/arrays.cpp (outdated)
```cpp
irs.ir->CreateCondBr(lengthsCompareEqual, memcmpBB, memcmpEndBB);

// If lengths are equal: call memcmp.
// Note: it is UB if a slice pointers is null, thus no extra null checks are
```
I'm not convinced this is actually the case. Of course, if at least one of the arrays has non-zero length, any slice starting at nullptr would be invalid, and comparing it would hence be UB (well, on systems with an MMU where the page at 0x0 is not mapped). However, this isn't an interesting case in the first place: it never requires special treatment, and it either works (on targets where 0x0 is valid memory) or it doesn't.
Zero-length arrays are the only tricky case. However, for those I don't see how the GDC test case you linked above corroborates your conclusion. Do you have an example that clearly shows that `0x0[0 .. 0] == 0x0[0 .. 0]` is treated as UB in DMD?
I meant UB in terms of the language spec (upon UB, the system may or may not crash; it doesn't matter).
Calling memcmp with a nullptr is UB, even if the length argument is 0.
druntime's code does not check for length == 0, and neither does GDC's generated code.
This is an important point to check because LLVM could optimize basic blocks containing memcmp(0,0,0) in surprising ways; at the moment, LLVM doesn't do anything crazy and apparently assumes memcmp(0,0,0) returns 0: https://godbolt.org/g/twuhQc
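To make the zero-length point concrete, here is a minimal C++ sketch (not LDC's generated code) of slice equality on `(pointer, length)` pairs; the name `intSlicesEqual` is hypothetical. Because C's `memcmp` is formally UB for null pointers even with a byte count of 0, the sketch returns early for empty slices, which also makes two empty slices compare equal regardless of their start addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: equality of two int slices given as (pointer, length).
// memcmp is UB for null pointers even when the byte count is 0, so empty
// slices are handled before calling it; two empty slices are always equal.
bool intSlicesEqual(const int *lPtr, std::size_t lLength,
                    const int *rPtr, std::size_t rLength) {
  if (lLength != rLength)
    return false;
  if (lLength == 0)
    return true;
  // Non-empty slices are expected to point to valid memory.
  assert(lPtr != nullptr && rPtr != nullptr);
  return std::memcmp(lPtr, rPtr, lLength * sizeof(int)) == 0;
}
```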
I'm quite aware of the (standard-mandated) behaviour of C memcmp, hence the question. I was specifically interested in the case where you mentioned DMD would generate code crashing at runtime. In particular, this seemed odd because at least on the default Linux/BSD/macOS libcs, memcmp(…, …, 0) is required to always return 0. In that light, the fact that the TypeInfo.equals implementation does not check for null could also be seen as merely acknowledging that all DMD target platforms provide appropriate guarantees for memcmp, rather than as an indication that null comparisons are UB.
So shall I add to the comment that for zero-length arrays, things work out correctly?
(The crashing case was with non-zero-length arrays, so not so interesting.)
A test would be much preferred over a code comment.
Yeah, same here. I suppose implicit slicing of static arrays makes the types compatible?
We don't lower
Ah yeah right, forgot it again. ;)
This also fixes a pessimization where the `memcmp` call would become an `invoke` if the user provides his own `memcmp` prototype in the code (the prototype would not carry the `nounwind` function attribute).
(Branch updated from e459daf to bba06cd.)
gen/arrays.cpp (outdated)
```cpp
auto *l_ptr = DtoArrayPtr(l);
auto *r_ptr = DtoArrayPtr(r);
auto *l_size = DtoArrayLen(l);
```
`l_length`, please: I'm fond of having consistent semantics for size (size in bytes), length (number of elements) and, if unavoidable for LLVM, sizeInBits. ;)
[I can add a fixup commit too.]
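As an illustration of that naming convention (a hypothetical plain C++ sketch, not code from this PR): a *length* counts elements, a *size* counts bytes, and `memcmp` takes a size, so an element count has to be scaled by the element size before it is passed on. The helper names below are made up for the example.

```cpp
#include <climits>
#include <cstddef>

// Naming convention from the review, applied to plain C++:
//   length     -> number of elements
//   size       -> size in bytes (what memcmp expects)
//   sizeInBits -> size in bits (only where an API really needs it)
template <typename T>
constexpr std::size_t sizeFromLength(std::size_t length) {
  return length * sizeof(T);
}

template <typename T>
constexpr std::size_t sizeInBitsFromLength(std::size_t length) {
  return sizeFromLength<T>(length) * CHAR_BIT;
}
```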
```
@@ -1062,32 +1059,68 @@ bool validCompareWithMemcmp(DValue *l, DValue *r) {
  return validCompareWithMemcmpType(elemType);
```
4 lines up: So `Type::equivalent()` considers `bool[4]` and `bool[]` etc. equivalent? I wouldn't have expected that (I thought it would only deal with const/immutable/shared...), and I think it deserves an explicit comment.
After some testing: `bool[3]` and `bool[4]` are also considered equivalent!
Added a test for the optimized `false` result of `bool[3] == bool[4]`.
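For static arrays the length is part of the type, so a comparison like `bool[3] == bool[4]` can be folded to `false` without looking at any element. A rough C++17 analogue using `std::array`; the helper `staticArraysEqual` is hypothetical and assumes `T` is a type such as `bool` or `int` that can safely be compared bytewise.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: equality of two static arrays whose lengths N and M are
// part of their types. Assumes T can be compared bytewise (e.g. bool, int).
template <typename T, std::size_t N, std::size_t M>
bool staticArraysEqual(const std::array<T, N> &l, const std::array<T, M> &r) {
  if constexpr (N != M) {
    // Different static lengths: the result is known at compile time.
    (void)l;
    (void)r;
    return false;
  } else if constexpr (N == 0) {
    // Nothing to compare; data() may be null for empty arrays.
    return true;
  } else {
    return std::memcmp(l.data(), r.data(), N * sizeof(T)) == 0;
  }
}
```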
Nice!
Of course not! A bitwise comparison of floating point arrays would be a very bad idea...
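A small C++ sketch of why a bytewise comparison is wrong for floating-point values: `0.0 == -0.0` holds although the sign bits differ, and a NaN compares unequal to itself although its bytes are identical. The helper name `bitwiseAgreesWithValue` is made up for this illustration, which assumes IEEE 754 `double`s.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Hypothetical helper: reports whether comparing the raw bytes of two doubles
// (as memcmp would) gives the same answer as comparing their values with ==.
bool bitwiseAgreesWithValue(double a, double b) {
  const bool valueEqual = (a == b);
  const bool bytesEqual = std::memcmp(&a, &b, sizeof(double)) == 0;
  return valueEqual == bytesEqual;
}
```

For example, `bitwiseAgreesWithValue(0.0, -0.0)` and `bitwiseAgreesWithValue(nan, nan)` (with `nan = std::numeric_limits<double>::quiet_NaN()`) both return `false`, while `bitwiseAgreesWithValue(1.5, 1.5)` returns `true`.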
How does this relate to dlang/druntime#1792? If I understand correctly, the frontend will replace
The good thing about this LDC PR is that it does not require a DRuntime dependency and does not generate template bloat.
Seriously? The whole idea behind lowering to template functions is to reduce the dependency on DRuntime.
```cpp
// main.cpp
#include <array>
#include <cstdio>

int compareArrays(const int *p1, size_t len1, const int *p2, size_t len2);

int main()
{
    std::array<int, 3> arr1 = {1, 2, 3};
    std::array<int, 3> arr2 = {1, 2, 4};
    // data() portably yields a const int*; begin() is only a raw pointer on some standard libraries.
    int res = compareArrays(arr1.data(), arr1.size(), arr2.data(), arr2.size());
    printf("%d\n", res);
}
```
```d
// compare.d
extern(C++) pure nothrow @nogc
int compareArrays(scope const(int)* p1, size_t len1, scope const(int)* p2, size_t len2)
{
    return p1[0 .. len1] < p2[0 .. len2];
}
extern(C) void _d_dso_registry() {}
```
```console
$ ~/dlang/install.sh install dmd-nightly
Downloading and unpacking http://nightlies.dlang.org/dmd-master-2017-03-28/dmd.master.linux.tar.xz
######################################################################## 100.0%
dub-1.2.1 already installed
Run `source ~/dlang/dmd-master-2017-03-28/activate` in your shell to use dmd-master-2017-03-28.
This will setup PATH, LIBRARY_PATH, LD_LIBRARY_PATH, DMD, DC, and PS1.
Run `deactivate` later on to restore your environment.
$ source ~/dlang/dmd-master-2017-03-28/activate
$ g++ -std=c++11 -c main.cpp && \
  dmd -O -betterC -c compare.d && \
  g++ main.o compare.o -o d_array_compare
$ ./d_array_compare
1
$ size -A -d compare.o
compare.o :
section size addr
.text 0 0
.data 0 0
.bss 0 0
.rodata 0 0
.comment 0 0
.note 0 0
.note.GNU-stack 0 0
.data.rel.ro 0 0
.eh_frame 120 0
.text._Z13compareArraysPKimS0_m 60 0
.text._d_dso_registry 8 0
.text._D6object12__T5__cmpTiZ5__cmpFNaNbNiNexAixAiZi 148 0
minfo 0 0
.group.d_dso 20 0
.data.d_dso_rec 8 0
.text.d_dso_init 40 0
.dtors.d_dso_dtor 8 0
.ctors.d_dso_ctor 8 0
Total 420
$ nm compare.o
0000000000000000 t
0000000000000000 W _D6object12__T5__cmpTiZ5__cmpFNaNbNiNexAixAiZi
0000000000000000 T _d_dso_registry
                 U _d_dso_registry
                 U _GLOBAL_OFFSET_TABLE_
                 U __start_minfo
                 U __stop_minfo
0000000000000000 T _Z13compareArraysPKimS0_m
```
Of course 120 bytes for The important thing is that templates allow you to pay only for what you use and are also
@ZombineDev: It's great that we have guys like you following both LDC and DMD development; I don't have the time to follow DMD. It's nice that there's apparently some movement in this regard; currently (for LDC, i.e., 2.073), the array comparison sucks badly due to TypeInfo lookup etc. I see absolutely no reason for optimizing map/reduce or some other higher-level stuff, but equality checks are absolutely basic and potentially quite frequent, hence this work on optimizing them. After this, it's still lacking support for unpadded aggregates (without opEquals, floating-point members etc.), but the current upstream druntime __equals() appears to be as simple as possible, without any memcmp() optimizations at all. Note that LLVM knows the semantics of memcmp() and can elide the call; that's why we are keen on using it. I very much prefer coding this in druntime instead of the compiler. Something that can't be done in a library is telling the compiler that memcmp() never throws (nope,
I had seen the druntime stuff too and am divided on this issue, but chose this PR's solution.
And why would it be bad and annoying? Lowering of constructs into library implementations is a great way to reduce compiler complexity (assuming, of course, that the interface uses sufficiently non-leaky abstractions). |
The mixing of parsing, semantic analysis and optimization/lowering as it is done now discards information, and I often run into trouble because of it. Two things that come to mind are cross-module inlining and coverage. The replacement (by the parser) of
The optimization is to do a length-compare + memcmp-compare for dynamic array comparisons. A continuation of #1719.
Slice nullptr checks are not needed, because comparing nullptr slices is UB in D. (I can't find this in the spec; I deduced it from executables built with DMD crashing, and from GDC's optimized code: https://godbolt.org/g/qoNKSJ.)
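The overall strategy (compare lengths first, then call `memcmp` only when a bytewise comparison is valid for the element type) can be sketched generically in C++17. This is an illustration, not the LDC implementation: the validity test here is `std::has_unique_object_representations_v`, standing in for LDC's own `validCompareWithMemcmpType`, the fallback compares elements with `std::equal`, and `sliceEquals` is a hypothetical name. Unlike the D argument above, plain C++ does need the zero-length check, since passing a null pointer to `memcmp` is UB even for a zero byte count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical sketch of "length compare, then memcmp if valid" for
// (pointer, length) slices. memcmp is used only when equal values of T always
// have identical bytes (e.g. int); otherwise elements are compared with ==.
template <typename T>
bool sliceEquals(const T *lPtr, std::size_t lLength,
                 const T *rPtr, std::size_t rLength) {
  if (lLength != rLength)
    return false;  // cheap length compare first
  if (lLength == 0)
    return true;   // never pass possibly-null pointers to memcmp
  if constexpr (std::has_unique_object_representations_v<T>) {
    return std::memcmp(lPtr, rPtr, lLength * sizeof(T)) == 0;
  } else {
    return std::equal(lPtr, lPtr + lLength, rPtr);
  }
}
```

With `int` elements the `memcmp` path is taken; with `double` the element-wise path is taken, so `{0.0, 1.0}` and `{-0.0, 1.0}` compare equal.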
This also fixes a pessimization where the `memcmp` call would become an `invoke` if the user provides a `memcmp` prototype in the code (the prototype would not carry the `nounwind` function attribute). This is not as rare as it may seem: it can easily happen when the user imports one of the stdlib modules.
What's left is optimizing comparisons like `int[3] == int[2]`. See https://godbolt.org/g/ZBLkp3.