Skip to content

CMake: Allow compiling all D modules of a runtime lib at once#2231

Merged
kinke merged 8 commits intoldc-developers:masterfrom
kinke:runtimeCMake
Jul 27, 2017
Merged

CMake: Allow compiling all D modules of a runtime lib at once#2231
kinke merged 8 commits intoldc-developers:masterfrom
kinke:runtimeCMake

Conversation

@kinke
Copy link
Member

@kinke kinke commented Jul 22, 2017

Curious about the timings when building all D modules at once in a single command-line (like DMD does), I hacked the CMake script and performed a few tests (Win64, LDC/LLVM with assertions, quad-core CPU without SMT):

All at once, phobos2-ldc:        50.4 secs with -j1, ~1 GB RAM
             phobos2-ldc-debug:  14.6 secs with -j1
Separately,  phobos2-ldc:       109.5 secs with -j1, 35.1 secs with -j4
             phobos2-ldc-debug:  44.8 secs with -j1, 13.1 secs with -j4

So saving all the duplicate parsing work does pay off substantially, at the cost of much coarser parallelizability and having to recompile all modules when only a single one changes. Setting the CMake variable COMPILE_ALL_D_FILES_AT_ONCE=OFF restores the previous behavior of compiling each module separately.

Building druntime (debug+release), Phobos (debug+release) and profile-rt with -j4 used to take 60 secs on my box, and now with all-at-once compilation 50 secs. That meager difference is of course due to the
phobos2-ldc target taking that long; the other CPU cores are idling after a short while.

I also compiled the Phobos (release) unittests all at once, to check how much memory that requires. We are talking about 13 GB, so that's not really an option, so all unittest objects are still compiled separately.

So to take full advantage of this, the all target should be built alongside the unittests, so that all CPU cores have something to do.


To get this to work, I had to move from object file suffices (e.g., .../std/math-debug-shared_32.o) to different base directories, e.g., build-ldc/runtime/objects-debug-shared_32/std/math.o). This mitigates an issue due to conflicting object files when building LDC on a 32-bit host with MULTILIB=ON (@WebDrake).

@WebDrake
Copy link
Contributor

@kinke hah, it's nice to see that two separate issues can both be addressed by a simple change!

That said, it could be good to split out the object-file-suffix changes into a separate patch, precisely because they address more than just the present issue.

What release(s) are you thinking of targeting this towards?

@kinke
Copy link
Member Author

kinke commented Jul 22, 2017

it could be good to split out the object-file-suffix changes into a separate patch

Yep, I was thinking the same; will happen as part of later PR polishing.

two separate issues

Well separate compilation is not a real issue; If I had a Ryzen CPU with 16 logical cores, I'd most likely still want to compile separately. But for the majority of our CI boxes with 2 CPU cores, all-at-once compilation and some script tweaking could shave off maybe a minute or two and reduce the #lines in the logs.

The other motivation behind this approach is that it should be able to hide the template culling issues for Phobos 2.074 (for the normal libs, not the for the unittest libs) and potentially result in higher-performant Phobos due to more redundant template instantiations and so better inline-ability.

What release(s) are you thinking of targeting this towards?

I haven't really thought about that. If it works out nicely, the next one (1.4) I guess.

@kinke kinke force-pushed the runtimeCMake branch 3 times, most recently from b265858 to 20efbdf Compare July 22, 2017 20:13
@WebDrake
Copy link
Contributor

I haven't really thought about that. If it works out nicely, the next one (1.4) I guess.

That's what I assumed. So, absent some way to tweak the cmake invocation to work around the current build issues, that gives me a target for when I can reintroduce multilib support to the snap package. Thanks!

@kinke
Copy link
Member Author

kinke commented Jul 23, 2017

@joakim-noah: Cross-compiling the testrunner executables is getting in sight. TARGET_OS is the next step.

@kinke kinke force-pushed the runtimeCMake branch 5 times, most recently from 00d7919 to 4cb44a8 Compare July 23, 2017 07:16
@joakim-noah
Copy link
Contributor

joakim-noah commented Jul 23, 2017

Oh nice, I'll look at adapting my Android cross-compilation patch to this pull, neither of which use ldc to invoke the linker.

As for TARGET_OS, unfortunately there's no good way to differentiate between Linux and Android in CMake right now, but the Termux author has expressed interest in fixing that for native builds, termux/termux-packages#151, and of course this will always help with other cross-compiles.

@kinke kinke force-pushed the runtimeCMake branch 2 times, most recently from 61eb363 to 61fe0e6 Compare July 23, 2017 15:34
@kinke
Copy link
Member Author

kinke commented Jul 23, 2017

TARGET_SYSTEM added. So this is combination with #2198 should make cross-compiling the stdlibs, optionally including the test runners, work without too much hassle. Handling/naming of D_FLAGS, RT_CFLAGS and LD_FLAGS may need to tweaked.

There's a new alignment issue on Win64 when linking the testrunners via CMake which I still need to look into.

@kinke
Copy link
Member Author

kinke commented Jul 23, 2017

There's another issue with this PR, Semaphore randomly fails when building Phobos:

std/random.d(446): Error: template instance std.random.LinearCongruentialEngine!(uint, 16807, 0, 2147483647) does not match template declaration LinearCongruentialEngine(UIntType, UIntType a, UIntType c, UIntType m) if (isUnsigned!UIntType)
std/random.d(448): Error: template instance std.random.LinearCongruentialEngine!(uint, 48271, 0, 2147483647) does not match template declaration LinearCongruentialEngine(UIntType, UIntType a, UIntType c, UIntType m) if (isUnsigned!UIntType)
std/random.d(730): Error: template instance std.random.MersenneTwisterEngine!(uint, 32, 624, 397, 31, 2567483615u, 11, 7, 2636928640u, 15, 4022730752u, 18) does not match template declaration MersenneTwisterEngine(UIntType, uint w, uint n, uint m, uint r, UIntType a, uint u, uint s, UIntType b, uint t, UIntType c, uint l) if (isUnsigned!UIntType)
std/random.d(1024): Error: template instance std.random.XorshiftEngine!(uint, 32, 13, 17, 15) does not match template declaration XorshiftEngine(UIntType, UIntType bits, UIntType a, UIntType b, UIntType c) if (isUnsigned!UIntType)
std/random.d(1025): Error: template instance std.random.XorshiftEngine!(uint, 64, 10, 13, 10) does not match template declaration XorshiftEngine(UIntType, UIntType bits, UIntType a, UIntType b, UIntType c) if (isUnsigned!UIntType)
std/random.d(1026): Error: template instance std.random.XorshiftEngine!(uint, 96, 10, 5, 26) does not match template declaration XorshiftEngine(UIntType, UIntType bits, UIntType a, UIntType b, UIntType c) if (isUnsigned!UIntType)
std/random.d(1027): Error: template instance std.random.XorshiftEngine!(uint, 128, 11, 8, 19) does not match template declaration XorshiftEngine(UIntType, UIntType bits, UIntType a, UIntType b, UIntType c) if (isUnsigned!UIntType)
std/random.d(1028): Error: template instance std.random.XorshiftEngine!(uint, 160, 2, 1, 4) does not match template declaration XorshiftEngine(UIntType, UIntType bits, UIntType a, UIntType b, UIntType c) if (isUnsigned!UIntType)
std/random.d(1029): Error: template instance std.random.XorshiftEngine!(uint, 192, 2, 1, 4) does not match template declaration XorshiftEngine(UIntType, UIntType bits, UIntType a, UIntType b, UIntType c) if (isUnsigned!UIntType)
std/traits.d(6620): Error: static assert  "Type uint does not have an Unsigned counterpart"
std/traits.d(6386):        instantiated from here: Modifier!uint
std/traits.d(6624):        instantiated from here: ModifyTypePreservingTQ!(Impl, uint)
std/format.d(1480):        instantiated from here: Unsigned!uint
std/format.d(3486):        ... (3 instantiations, -v to show) ...
std/typecons.d(402):        instantiated from here: format!(char, uint, uint)
std/encoding.d(3468):        instantiated from here: Tuple!(BOM, "schema", ubyte[], "sequence")

The randomness is particularly worrying. I quicky looked at std.traits.isUnsigned; it's using staticIndexOf() to look up the type in 2 TypeTuples (integral & unsigned type lists)...

@kinke
Copy link
Member Author

kinke commented Jul 23, 2017

Another annoying issue: the test runner executables aren't relinked if they exist and their tested lib has been updated (due to a changed src module etc.). They are currently linked in manually via linker flags (pull in all objects...) and the build dependency is added via add_dependencies()...

@kinke
Copy link
Member Author

kinke commented Jul 23, 2017

The Win64 failure was partly due to incremental linking (CMake default) apparently affecting the alignment of globals and this triggering 3 bugs in druntime & Phobos.

std.variant was still failing; I quickly debugged and noted that it uses function addresses as markers and that there's a mismatch there. Turning on identical COMDAT folding for the MS linker, something we do by default when linking via LDC, gets rid of the failure.

@kinke
Copy link
Member Author

kinke commented Jul 25, 2017

The test runners are now correctly updated. So Semaphore is the last problem, and the failure seems to happen nearly every time now.

kinke added 8 commits July 26, 2017 23:28
Curious about the timings when building all D modules at once in a single
command-line (like DMD does), I hacked the CMake script and performed a
few tests (Win64, LDC/LLVM with assertions, quad-core CPU without SMT):

All at once, phobos2-ldc:        50.4 secs with -j1, ~1 GB RAM
             phobos2-ldc-debug:  14.6 secs with -j1
Separately,  phobos2-ldc:       109.5 secs with -j1, 35.1 secs with -j4
             phobos2-ldc-debug:  44.8 secs with -j1, 13.1 secs with -j4

So saving all the duplicate parsing work does pay off substantially, at
the cost of much coarser parallelizability and having to recompile all
modules when only a single one changes. Setting the CMake variable
`COMPILE_ALL_D_FILES_AT_ONCE=OFF` restores the previous behavior of
compiling each module separately.

Building druntime (debug+release), Phobos (debug+release) and profile-rt
with `-j4` used to take 60 secs on my box, and now with all-at-once
compilation 50 secs. That meager difference is of course due to the
phobos2-ldc target taking that long; the other CPU cores are idling after
a short while.

I also compiled the Phobos (release) unittests all at once, to check how
much memory that requires. We are talking about 13 GB, so that's not
really an option, so all unittest objects are still compiled separately.

So to take full advantage of this, the `all` target should be built
alongside the unittests, so that all CPU cores have something to do.

---

To get this to work, I had to move from object file suffices (e.g.,
`.../std/math-debug-shared_32.o`) to different base directories, e.g.,
`build-ldc/runtime/objects-debug-shared_32/std/math.o`). This mitigates an
issue due to conflicting object files when building LDC on a 32-bit host
with MULTILIB=ON.
Mostly as preparation for cross-compiling them and to simplify test
dependency handling. Also prevents relinking the testrunners on each CTest
invokation etc.

Instead of adding each unittest lib and each test runner executable as
separate test, with correct dependencies between them (even inbetween the
unittest libs - shared Phobos depending on static Phobos etc.), I chose to
reduce the complexity by only exposing the test runner executables as
tests, with no dependencies.

This may lead to issues when multiple test runners sharing common build
dependencies are built in parallel. But building them this way is not very
convenient anyway (e.g., no way to control number of build jobs for each
build-testrunner test), so just make sure everything works when
running the tests serially via `ctest [-j1]`, and focus more on easily
building the test runners externally via a new meta build target
`all-test-runners` => `ninja -j4 all all-test-runners && ctest -j4 ...`.
Primarily for cross-compilability. Also add CMake variable `C_SYSTEM_LIBS`
to allow the user to specify the names of the C libraries to be linked
against shared runtime libs and test runner executables.
Representing a list of

* the target's CMAKE_SYSTEM_NAME (Linux, Darwin, Windows, FreeBSD, ...),
* and the associated definitions of UNIX, APPLE and MSVC.

So the user would use something like `TARGET_SYSTEM=Linux;UNIX` when
targeting Linux, `Darwin;APPLE;UNIX` for OSX and `Windows;MSVC` for MSVC
targets.
CMake uses incremental linking for MSVC targets by default, with apparent
effect on the alignment of globals. A bunch of unittests have been failing
on Win64 without these fixes since letting CMake link the test runners.

Rainer has fixed the 2 Phobos locations upstream already; the
`core.thread` bug is still unfixed, I submitted PR ldc-developers#1888.
std.variant was still failing; I quickly debugged and noted that it uses
function addresses as markers and that there's a mismatch there. Turning
on identical COMDAT folding for the MS linker, something we do by default
when linking via LDC, gets rid of the failure.
add_dependencies() only makes sure the dependencies are up-to-date before
running the testrunner's target. So after hacking a source file, the
unittest libraries were correctly updated, but the test runners themselves
weren't relinked afterwards.
target_link_libraries() isn't an option when linking in the library
manually via custom linker flags to make sure all objects are pulled in.

Fortunately, the LINK_DEPENDS target property allows specifying a file
dependency, so that a testrunner is relinked when its library is newer.
@kinke kinke merged commit 5d86c1f into ldc-developers:master Jul 27, 2017
@kinke kinke deleted the runtimeCMake branch July 27, 2017 22:01
@kinke
Copy link
Member Author

kinke commented Jul 28, 2017

After merging merge-2.074 following this, Semaphore still failed with both ninja and make, this time when generating all and all-test-runners in parallel (all alone always worked). ninja tried to link a testrunner without building a unittest lib first (~400 build targets missing - 2.6k vs. 3k). make complained about the built ldc2 executable, 'text file not ready' (tried to invoke it as script via sh!). So the strangest of errors.

As both generators failed, I checked the CMake version; Semaphore is using Ubuntu 14.04's 2.8.12.2. Travis is using 3.2.x for Linux. So I switched to latest CMake (currently 3.9) for Semaphore, and it's working now [with ninja, haven't tested with make]. As there's no online documentation for CMake < 3, I'll leave it at that; let's just not use ancient CMake versions for testing.

@joakim-noah
Copy link
Contributor

I've started trying this out: one request I have is that the link.txt for the test runners always puts test_runner.o first, which is needed for the Android emulated TLS to work properly. I tried some CMake commands like add_dependencies but they don't work for a variable. Could you change the link order to this, if you know how?

@kinke
Copy link
Member Author

kinke commented Jul 30, 2017

I guess the issue is that the related unittest library is linked in manually via linker flags, and that those precede the objects in the CMake linker command line. In that case, the object would need to be specified as first linker flag. So maybe try set(full_linkflags ${test_runner_o} ${linkflags}) in both locations in build_test_runners().

@joakim-noah
Copy link
Contributor

That puts the test runner object file first, but then errors out when linking because it's listed twice, clashing symbols. If I remove the test runner object file from add_executable, it complains that no sources provided and CMake craps out. I hate dealing with CMake, any idea what to do?

@kinke
Copy link
Member Author

kinke commented Jul 30, 2017

Yeah it's a pain. For POSIX, making the testrunner a custom target with manual linking command ${CMAKE_C_COMPILER} -o ... ${test_runner_o} ${full_linkflags} could be an option.
Edit: But some dummy source would be much simpler.
Edit2: lol, that's what people do: file(WRITE null.c "") ... add_executable(name null.c)

@joakim-noah
Copy link
Contributor

Yeah, that did it, thanks. I'll submit this with my Android PR.

@kinke
Copy link
Member Author

kinke commented Jul 30, 2017

I implemented something in the meantime too, see #2253.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants