allow GC API profiling with --DRT-gcopt=profile:2 #1147
MartinNowak merged 4 commits into dlang:master from
8d8d494 to b7d237c
Just what I need right now ;). Can you please rebase it?
Also wanted to add something like runLocked already.
src/gc/gc.d
That's the only conversion you're saving by all this MonoTimeNative. Not worth the trouble IMO.
The larger performance hit was subtracting two MonoTimes to get a Duration. This caused a noticeable slowdown.
The actual problem I was trying to solve was that the time functions do not work without MonoTime.ticksPerSecond being initialized, but the GC is running before that. I did not want to add an initialization check everywhere.
I just noticed that I could get similar performance when using MonoTime.currTime.ticks. Unfortunately this still needs MonoTime._ticksPerSecond to be initialized.
I've replaced the code duplication with a runtime check for MonoTime._ticksPerSecond.
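A minimal sketch of the kind of runtime check described above, assuming the guard simply skips timing while the clock frequency is still zero. `currTicks` is a hypothetical helper name, and the public `MonoTime.ticksPerSecond` property stands in for the internal `_ticksPerSecond`; the merged code may differ.

```d
import core.time : MonoTime;

// Hypothetical helper: returns raw monotonic ticks, or 0 while the clock
// frequency has not been initialized yet (assumed to read as 0 until then).
long currTicks() nothrow @nogc
{
    if (MonoTime.ticksPerSecond == 0)
        return 0; // too early in startup: skip timing
    return MonoTime.currTime.ticks;
}
```

Working on raw ticks keeps the hot path to an integer subtraction; conversion to a `Duration` can wait until the summary is printed.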
b7d237c to 1f50e0a
Done. I wonder how the option parser could pass the review ;-)
Please note that "profile" is no longer a boolean option, so y/n don't work anymore. This is rather inconsistent if debug=PROFILE_API isn't set, because profile:2 has no meaning without it. I don't see a noticeable performance hit with PROFILE_API enabled but profile<2 on win32 on my laptop, so maybe we could always build it unconditionally. (I can't test it for Win64, still have to figure out how to reasonably build after the
What happened there?
I wouldn't be surprised, because the branch is easily predicted.
src/gc/gc.d
I get up to a few percent slowdown (worst in tree1), apparently because the delegate call isn't inlined.
Can you please split off the gc.config fixes so that they can go into 2.067?
The problem is linking the unittests:
Maybe it's just my build script, but I expect a lot of people will have trouble, especially if you just want to rebuild druntime/phobos within a distributed archive. Also note that the source folders in the distribution have a different layout than what you usually fetch from GitHub.
Done: #1173
I know, I know, it's a mess.
Don't sweat the small stuff.
So what do we do about the callback cost?
Anything but using a string mixin fails, but I'd like to avoid that. I'm trying to fix the inliner...
Well, good luck... I kind of gave up trying to optimize for dmd. But maybe improving the frontend inliner also helps LDC/GDC?
I got runLocked inlined, but it's probably a little unsafe. It's not so easy to determine whether a passed delegate might get modified during function execution or not.
I remember the GDC/LDC developers saying that they avoid the inliner.
I've changed runLocked to a pair of mixin templates doLock/doUnlock that inline perfectly. I don't see a difference to master when compiling without -debug=PROFILE_API.
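To illustrate the doLock/doUnlock idea, here is a rough sketch of how a pair of mixin templates can bracket a function body without a delegate call. Only the names `doLock`, `doUnlock` and `PROFILE_API` come from this thread; the lock stand-in (`acquire`/`release`), the counters, `finish` and `profiledMalloc` are hypothetical and not taken from gc.d.

```d
import core.time : MonoTime;

// Hypothetical stand-ins for the GC lock and its profiling counters.
int  lockDepth;    // 1 while the "lock" is held
long lockTicks;    // ticks spent acquiring the lock
long mallocTicks;  // ticks spent inside the locked region
long numMallocs;   // number of profiled calls

bool acquire() nothrow @nogc { ++lockDepth; return true; }
bool release() nothrow @nogc { --lockDepth; return true; }

// Books the timings, then releases the lock (only used with PROFILE_API).
bool finish(ref long time, ref long count, long tmStart, long tmLocked) nothrow @nogc
{
    ++count;
    lockTicks += tmLocked - tmStart;
    time += MonoTime.currTime.ticks - tmLocked;
    return release();
}

// Mixin templates may only contain declarations, so every step is a
// variable whose initializer performs the side effect.
mixin template doLock()
{
    debug (PROFILE_API)
        long tmStart = MonoTime.currTime.ticks;
    bool locked = acquire();
    debug (PROFILE_API)
        long tmLocked = MonoTime.currTime.ticks;
}

mixin template doUnlock(alias time, alias count)
{
    debug (PROFILE_API)
        bool unlocked = finish(time, count, tmStart, tmLocked);
    else
        bool unlocked = release();
}

// Example API entry point: the code between the two mixins runs "locked".
size_t profiledMalloc(size_t size)
{
    mixin doLock!();
    immutable result = size * 2; // placeholder for the real work
    mixin doUnlock!(mallocTicks, numMallocs);
    return result;
}
```

Because the mixins expand to plain local declarations at the call site, nothing is left for the inliner to handle, and without `-debug=PROFILE_API` only the lock and unlock calls remain.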
fix parsing "help"
print floats with %g instead of %f for a shorter display
Auto-merge toggled on
allow GC API profiling with --DRT-gcopt=profile:2
thx
Here's a benchmark result with API time:
Example output details for dlist:
malloc includes collections, GC.lock time not included in single function time.
Running it disabled has a tiny impact on performance, so I've made it optional with -debug=PROFILE_API.
This also corrects a copy and paste error in runbench which fails to extract the GC summary from gcx.log.
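For context, a sketch of how the option might be enabled from inside a program rather than on the command line, using druntime's `rt_options` hook. This assumes the runtime was built with `-debug=PROFILE_API`; per the discussion above, `profile:2` has no extra meaning otherwise.

```d
// Embeds the GC option in the executable; equivalent in intent to
// passing --DRT-gcopt=profile:2 on the command line.
extern(C) __gshared string[] rt_options = [ "gcopt=profile:2" ];
```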