Skip to content

Implement optimization: compare slices using memcmp if valid#2047

Merged
kinke merged 5 commits intoldc-developers:masterfrom
JohanEngelen:dynarraymemcmp
Mar 29, 2017
Merged

Implement optimization: compare slices using memcmp if valid#2047
kinke merged 5 commits intoldc-developers:masterfrom
JohanEngelen:dynarraymemcmp

Conversation

@JohanEngelen
Copy link
Member

The optimization is to do a length-compare + memcmp-compare for dynamic array comparisons. A continuation of #1719 .
Slice nullptr checks are not needed, because comparing nullptr slices is UB in D. (I can't find this in the specs. I deduced it from crashing executables when built with DMD, and from GDC's optimized code https://godbolt.org/g/qoNKSJ)

This also fixes a pessimization where the memcmp call would become an invoke if the user provides a memcmp prototype in the code (the prototype would not carry the nounwind function attribute). This is not as rare as it may seem: it may quickly happen when the user imports one of the stdlib modules.

What's left is optimizing comparisons like int[3] == int[2]. See https://godbolt.org/g/ZBLkp3.

@JohanEngelen
Copy link
Member Author

@9il is this relevant for you?

@kinke
Copy link
Member

kinke commented Mar 21, 2017

Nice that you followed up on this.

comparing nullptr slices is UB in D

Makes perfect sense to me, it's the user's responsibility to provide a valid slice. Only exception being that two slices of length 0 should be equal IMO, regardless of the start addresses.

This is not as rare as it may seem: it may quickly happen when the user imports one of the stdlib modules.

You mean memcmp in druntime isn't marked nothrow?

What's left is optimizing comparisons like int[3] == int[2]

I'm rather surprised the front-end allows comparisons such as this. Isn't that a compile-time false, detectable in the front-end?

gen/arrays.cpp Outdated
assert(l_ptr && r_ptr && num_elements);
LLFunction *fn = getRuntimeFunction(loc, gIR->module, "memcmp");
assert(fn);
auto size_in_bytes = num_elements;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why no camelCase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it. I kept the l_ptr underscore business, it reads a little easier I find.

gen/arrays.cpp Outdated
irs.ir->CreateCondBr(lengthsCompareEqual, memcmpBB, memcmpEndBB);

// If lengths are equal: call memcmp.
// Note: it is UB if a slice pointers is null, thus no extra null checks are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this is indeed the case. Of course, if at least one of the arrays has non-zero length, then any slice starting at nullptr would be invalid, and hence comparing it UB (well, on systems with an MMU and the page at 0x0 not mapped). However, this isn't an interesting case in the first place – it never requires special treatment, and either works (on targets where 0x0 is valid memory) or doesn't.

Zero-length arrays are the only tricky case. However, there I don't see how the GDC test case you linked above corroborates your conclusion. Do you have an example that clearly shows that 0x0[0 .. 0] == 0x0[0 .. 0] is treated as UB in DMD?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant UB in terms of language spec (upon UB, the system can crash or not, it doesn't matter).
Calling memcmp with a nullptr is UB, also if the length argument is 0.
druntime's code does not check for length==0, neither does GDC's generated code.
This is an important point to check because LLVM could optimize basic blocks with memcmp(0,0,0) in a funny way; at the moment LLVM doesn't do anything crazy and apparently assumes memcmp(0,0,0) returns 0: https://godbolt.org/g/twuhQc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite aware of the (standard-mandated) behaviour of C memcmp, hence the question. I was specifically interested in the case where you mentioned DMD would generate code crashing at runtime. In particular, this seemed odd because at least on the default Linux/BSD/macOS libcs, memcmp(…, …, 0) is required to always return 0. In that light, the fact that the TypeInfo.equals implementation does not check for null could also be seen as merely acknowledging that all DMD target platforms provide appropriate guarantees for memcmp, rather than as an indication that null comparisons are UB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So shall I add to the comment that for zero-length arrays, things work out correctly?

(the crashing case was with non-zero length arrays, so not so interesting)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test would be much preferred over a code comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Of course :)

@dnadlinger
Copy link
Member

I'm rather surprised the front-end allows comparisons such as this. Isn't that a compile-time false, detectable in the front-end?

Yeah, same here. I suppose implicit slicing of static arrays makes the types compatible?

@dnadlinger
Copy link
Member

You mean memcmp in druntime isn't marked nothrow?

We don't lower nothrow to LLVM's nounwind, since Errors can still be thrown.

@kinke
Copy link
Member

kinke commented Mar 21, 2017

Ah yeah right, forgot it again. ;)

This also fixes a pessimization where the `memcmp` call would become an `invoke` if the user provides his own `memcmp` prototype in the code (the prototype would not carry the `nounwind` function attribute).
gen/arrays.cpp Outdated

auto *l_ptr = DtoArrayPtr(l);
auto *r_ptr = DtoArrayPtr(r);
auto *l_size = DtoArrayLen(l);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l_length - I'm fond of having consistent semantics for size (size in bytes), length (#elements) and, if unavoidable for LLVM, sizeInBits. ;)

[I can add a fixup commit too.]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -1062,32 +1059,68 @@ bool validCompareWithMemcmp(DValue *l, DValue *r) {
return validCompareWithMemcmpType(elemType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 lines up: So Type::equivalent() considers bool[4] and bool[] etc. equivalent? I wouldn't have expected that (only thought it would deal with const/immutable/shared...), and I think it deserves an explicit comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some testing: bool[3] and bool[4] are also considered equivalent !

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test for the optimized false result of bool[3]==bool[4].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking! 👍

@9il
Copy link

9il commented Mar 28, 2017

Nice!
Does this PR cover floating point numbers?

@kinke
Copy link
Member

kinke commented Mar 28, 2017

Of course not! A bitwise comparison of floating point arrays would be a very bad idea...

@PetarKirov
Copy link
Contributor

How does this relate to dlang/druntime#1792? If I understand correctly, the frontend will replace EqualExp with CallExp(__equals, ...), or NotExp(CallExp(__equals, ...)), so an EqualExp will no longer reach this line in ToElemVisitor::visit(EqualExp *e). IMO, the approach taken in dlang/dmd#6634 and dlang/dmd#6597 is superior because it removes complexity from the compiler (e.g. this PR and #1719) in favor increased flexibility in druntime. For example __cmp already employs memcmp (via core.internal.string.dstrcmp) in some cases. What would be the next step for LDC - recognizing specific instantiations of map and reduce in the compiler and adding special cases for them?!

@9il
Copy link

9il commented Mar 28, 2017

The good thing about this LDC PR is that it does not require DRuntime dependency and do not generate template bloat.

@PetarKirov
Copy link
Contributor

The good thing about this LDC PR is that it does not require DRuntime dependency and do not generate template bloat.

Seriously? The whole idea behind lowering to template functions is to reduce the dependency on DRuntime.
Have you even tried it? It's not that difficult:

main.cpp:

#include <array>
#include <cstdio>

int compareArrays(const int *p1, size_t len1, const int *p2, size_t len2);

int main()
{
    std::array<int, 3> arr1 = {1, 2, 3};
    std::array<int, 3> arr2 = {1, 2, 4};

    int res = compareArrays(arr1.begin(), arr1.size(), arr2.begin(), arr2.size());

    printf("%d\n", res);
}

compare.d:

extern(C++) pure nothrow @nogc
int compareArrays(scope const(int)* p1, size_t len1, scope const(int)* p2, size_t len2)
{
    return p1[0 .. len1] < p2[0 .. len2];
}

extern(C) void _d_dso_registry() {}
$ ~/dlang/install.sh install dmd-nightly
Downloading and unpacking http://nightlies.dlang.org/dmd-master-2017-03-28/dmd.master.linux.tar.xz
######################################################################## 100.0%
dub-1.2.1 already installed

Run `source ~/dlang/dmd-master-2017-03-28/activate` in your shell to use dmd-master-2017-03-28.
This will setup PATH, LIBRARY_PATH, LD_LIBRARY_PATH, DMD, DC, and PS1.
Run `deactivate` later on to restore your environment.

$ source ~/dlang/dmd-master-2017-03-28/activate

$ g++ -std=c++11 -c main.cpp && \
  dmd -O -betterC -c compare.d && \
  g++ main.o compare.o -o d_array_compare

$ ./d_array_compare
1

$ size -A -d compare.o
compare.o  :
section                                                size   addr
.text                                                     0      0
.data                                                     0      0
.bss                                                      0      0
.rodata                                                   0      0
.comment                                                  0      0
.note                                                     0      0
.note.GNU-stack                                           0      0
.data.rel.ro                                              0      0
.eh_frame                                               120      0
.text._Z13compareArraysPKimS0_m                          60      0
.text._d_dso_registry                                     8      0
.text._D6object12__T5__cmpTiZ5__cmpFNaNbNiNexAixAiZi    148      0
minfo                                                     0      0
.group.d_dso                                             20      0
.data.d_dso_rec                                           8      0
.text.d_dso_init                                         40      0
.dtors.d_dso_dtor                                         8      0
.ctors.d_dso_ctor                                         8      0
Total                                                   420

$ nm compare.o
0000000000000000 t
0000000000000000 W _D6object12__T5__cmpTiZ5__cmpFNaNbNiNexAixAiZi
0000000000000000 T _d_dso_registry
                 U _d_dso_registry
                 U _GLOBAL_OFFSET_TABLE_
                 U __start_minfo
                 U __stop_minfo
0000000000000000 T _Z13compareArraysPKimS0_m

Of course 120 bytes for _Z13compareArraysPKimS0_m + 148 bytes for
_D6object12__T5__cmpTiZ5__cmpFNaNbNiNexAixAiZi is not exactly stellar,
but surely if the code was compiled with LDC or GDC the results would at least a little better.

The important thing is that templates allow you to pay only for what you use and are also
much easier to modify. Just imagine working around a suboptimal code or a bug in the compiler
when you can just modify object.d as you want and be done with it
(you won't even need to recompile druntime). And object.d can actually be implemented
in your custom mir runtime library putting the codegen and link-time dependencies entirely under your control!

@kinke
Copy link
Member

kinke commented Mar 28, 2017

@ZombineDev: It's great that we have guys like you following both LDC and DMD development; I don't have the time to follow DMD. It's nice that there's apparently some movement in this regard; currently (for LDC, i.e., 2.073), the array comparison sucks badly due to TypeInfo lookup etc. I see absolutely no reason for optimizing map/reduce or some other higher-level stuff, but equality checks are absolutely basic and potentially quite frequent, hence this work on optimizing them. After this, it's still lacking support for unpadded aggregates (without opEquals, floating-point members etc.), but the current upstream druntime __equals() appears to be as simple as possible, without any memcmp() optimizations at all. Note that LLVM knows the semantics of memcmp() and can elide the call, that's why we are keen on using it.

I very much prefer coding this in druntime instead of the compiler. Something that can't be done in a library is telling the compiler that memcmp() never throws (nope, nothrow isn't enough). Arguably, LLVM should know, or we could detect some basic, well-known-not-to-throw functions in LDC.

@JohanEngelen
Copy link
Member Author

I had seen the druntime stuff too and am divided on this issue, but chose this PR's solution.
From my point of view, it is bad and annoying that the frontend does AST rewriting like that (and makes me wish we had our own frontend). The AST rewriting also adds complexity to the compiler of course, including semantic() calling another semantic(), that then depends on some druntime code, which in turn...
(in the extreme, you could make a similar case for doing virtual function calling using druntime...)

@dnadlinger
Copy link
Member

From my point of view, it is bad and annoying that the frontend does AST rewriting like that (and makes me wish we had our own frontend).

And why would it be bad and annoying? Lowering of constructs into library implementations is a great way to reduce compiler complexity (assuming, of course, that the interface uses sufficiently non-leaky abstractions).

@kinke kinke merged commit a405596 into ldc-developers:master Mar 29, 2017
kinke pushed a commit that referenced this pull request Mar 29, 2017
This also fixes a pessimization where the `memcmp` call would become an `invoke` if the user provides his own `memcmp` prototype in the code (the prototype would not carry the `nounwind` function attribute).
@JohanEngelen JohanEngelen deleted the dynarraymemcmp branch March 30, 2017 15:13
@JohanEngelen
Copy link
Member Author

The mixing of parsing and semantic analysis and optimization/lowering like it is done now is discarding information, and often I run into troubles because of it. Two things that come to mind are cross-module inlining and coverage. The replacement (by the parser) of __FILE__ means cross-module inlining code cannot detect whether a certain string is __FILE__ in code or not. And better coverage mapping (LLVM-style) cannot be done because, for example, loops are rewritten and because of things like this == rewriting (we cannot distinguish between a == b and __equals(a, b)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants